biomaRt, or how to access Ensembl data from R

biomaRt is a Bioconductor package that makes accessing and retrieving Ensembl data from R very easy. The recent Bioconductor 3.1 release includes a new version of biomaRt packed with new Ensembl-friendly functions that let you connect to the Ensembl marts and retrieve data from them in record time.

To celebrate the new Bioconductor release, we’ve just launched a brand new mart documentation page. It covers the biomaRt package, but also how to combine species data and how to use the BioMart RESTful and Perl APIs.

Want to get Ensembl data from BioMart using biomaRt? Easy, just follow the simple guide below.

How can I install the biomaRt R package?

First, make sure R is installed on your computer. Then run the following commands from your R console to install the Bioconductor biomaRt package:

source("http://bioconductor.org/biocLite.R")
biocLite("biomaRt")
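The biocLite.R script reflects the Bioconductor 3.1 era; current Bioconductor releases install packages through the BiocManager package from CRAN instead. A minimal sketch, assuming network access to CRAN and Bioconductor (the CRAN mirror URL is just one possible choice):

```r
# Install biomaRt with BiocManager, the installer used by current
# Bioconductor releases in place of biocLite.R.
if (!requireNamespace("biomaRt", quietly = TRUE)) {
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    # An explicit CRAN mirror avoids the interactive mirror prompt
    install.packages("BiocManager", repos = "https://cloud.r-project.org")
  }
  # update = FALSE / ask = FALSE: install biomaRt only, without prompting
  # to update other installed packages
  BiocManager::install("biomaRt", update = FALSE, ask = FALSE)
}

library(biomaRt)
packageVersion("biomaRt")
```

Wrapping the install in requireNamespace() makes the snippet safe to re-run: it only downloads packages that are missing.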

What are the Ensembl marts?

The following commands list the currently available Ensembl marts:

> library(biomaRt)

> listEnsembl()

     biomart               version
1    ensembl               Ensembl Genes 80
2        snp               Ensembl Variation 80
3 regulation               Ensembl Regulation 80
4       vega               Vega 60
5      pride               PRIDE (EBI UK)

Which Ensembl species have Variation data?

The listDatasets function lists all the datasets, and therefore the species, available in a given mart.

> library(biomaRt)

> variation = useEnsembl(biomart="snp")

> listDatasets(variation)
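The Variation mart holds many datasets, so it can help to search the table that listDatasets() returns rather than scan it by eye. A minimal sketch, assuming the "snp" mart name used in this post (mart names come from listEnsembl() and may differ in later biomaRt releases) and matching on the free-text description column:

```r
library(biomaRt)

# Connect to the Variation mart, as in the example above
variation <- useEnsembl(biomart = "snp")

# listDatasets() returns a data frame with one row per dataset
datasets <- listDatasets(variation)

# Keep only the human dataset(s); the description wording varies between
# releases, so both "Homo sapiens" and "Human" are matched
human <- datasets[grepl("Homo sapiens|Human", datasets$description), ]
human
```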

What data can I get from the Variation mart (filters and attributes)?

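The listFilters and listAttributes functions list all the filters and attributes available for a given mart dataset. Filters restrict which records a query returns (for example, a list of rsIDs), while attributes choose which fields are returned for each record.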

> library(biomaRt)
 
> variation = useEnsembl(biomart="snp", dataset="hsapiens_snp")

> listFilters(variation)

> listAttributes(variation)
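Both lists are long, so a keyword search over their descriptions is often the quickest way to find the right name to pass to getBM(). A minimal sketch using base R grepl() on the returned data frames; the keyword "allele" is only an illustration. More recent biomaRt versions also provide searchFilters() and searchAttributes() for the same purpose.

```r
library(biomaRt)

# Same dataset as in the example above
variation <- useEnsembl(biomart = "snp", dataset = "hsapiens_snp")

# Both functions return data frames; the 'name' column holds the
# identifiers that getBM() expects
snp_filters <- listFilters(variation)
snp_attributes <- listAttributes(variation)

# Keep the entries whose description mentions "allele" (case-insensitive)
allele_filters <- snp_filters[grepl("allele", snp_filters$description, ignore.case = TRUE),
                              c("name", "description")]
allele_attributes <- snp_attributes[grepl("allele", snp_attributes$description, ignore.case = TRUE),
                                    c("name", "description")]
allele_filters
allele_attributes
```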


How can I get data about a variant using an rsID?

The following example retrieves the variation source, chromosome location, minor allele with its frequency and count, consequences, and Ensembl gene and transcript IDs for the variant “rs1333049”.

> library(biomaRt)
 
> variation = useEnsembl(biomart="snp", dataset="hsapiens_snp")

> rs1333049 <- getBM(attributes = c('refsnp_id', 'refsnp_source', 'chr_name', 'chrom_start', 'chrom_end',
+                                   'minor_allele', 'minor_allele_freq', 'minor_allele_count',
+                                   'consequence_allele_string', 'ensembl_gene_stable_id',
+                                   'ensembl_transcript_stable_id'),
+                    filters = 'snp_filter', values = "rs1333049", mart = variation)

> rs1333049
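getBM() is not limited to a single identifier: passing a character vector to values looks up several variants in one request instead of one request per rsID. A minimal sketch; the second rsID (rs7412) is only an illustrative choice, and the attribute list is a shortened version of the one above:

```r
library(biomaRt)

variation <- useEnsembl(biomart = "snp", dataset = "hsapiens_snp")

# Illustrative set of variants to look up in a single query
rs_ids <- c("rs1333049", "rs7412")

variants <- getBM(attributes = c("refsnp_id", "chr_name", "chrom_start",
                                 "minor_allele", "minor_allele_freq"),
                  filters = "snp_filter",
                  values = rs_ids,
                  mart = variation)
variants
```

The result has one row per variant location, with refsnp_id identifying which input each row belongs to.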

How can I get data on all genes on a chromosome?

The following example retrieves the Ensembl gene IDs, biotypes, HGNC symbols and genomic coordinates of all genes located on human chromosome Y.

> library(biomaRt)

> ensembl = useEnsembl(biomart="ensembl", dataset="hsapiens_gene_ensembl")

> chrY_genes <- getBM(attributes = c('ensembl_gene_id', 'gene_biotype', 'hgnc_symbol',
+                                    'chromosome_name', 'start_position', 'end_position'),
+                     filters = 'chromosome_name', values = "Y", mart = ensembl)

> chrY_genes 
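To narrow the query to part of a chromosome, several filters can be combined: they are applied together (AND), and their values are passed as a list in the same order as the filter names. A minimal sketch using the chromosome_name, start and end filters of the human gene dataset; the 1 to 10,000,000 bp window is only an illustration:

```r
library(biomaRt)

ensembl <- useEnsembl(biomart = "ensembl", dataset = "hsapiens_gene_ensembl")

# One value per filter, in the same order as 'filters'
region_genes <- getBM(attributes = c("ensembl_gene_id", "hgnc_symbol",
                                     "start_position", "end_position"),
                      filters = c("chromosome_name", "start", "end"),
                      values = list("Y", 1, 10000000),
                      mart = ensembl)
region_genes
```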

How can I get protein domain information mapped to an Ensembl gene ID?

The following example retrieves the Ensembl gene, transcript and protein IDs, together with the InterPro and Pfam domain IDs and their locations, for the Ensembl gene ID “ENSG00000198763”.

> library(biomaRt)

> ensembl = useEnsembl(biomart="ensembl", dataset="hsapiens_gene_ensembl")

> domain_location_ENSG00000198763 <- getBM(attributes = c('ensembl_gene_id', 'ensembl_transcript_id',
+                                                         'ensembl_peptide_id', 'interpro', 'interpro_start',
+                                                         'interpro_end', 'pfam', 'pfam_start', 'pfam_end'),
+                                          filters = 'ensembl_gene_id', values = "ENSG00000198763",
+                                          mart = ensembl)

> domain_location_ENSG00000198763
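When InterPro and Pfam attributes are requested in the same query, a domain can be repeated across several rows. If you only need the Pfam domains, a sketch like the following keeps just those columns, drops rows without a Pfam hit and removes duplicates. It assumes empty values may come back as either "" or NA, depending on the biomaRt version:

```r
library(biomaRt)

ensembl <- useEnsembl(biomart = "ensembl", dataset = "hsapiens_gene_ensembl")

# Request only the protein ID and the Pfam columns
domains <- getBM(attributes = c("ensembl_peptide_id", "pfam", "pfam_start", "pfam_end"),
                 filters = "ensembl_gene_id",
                 values = "ENSG00000198763",
                 mart = ensembl)

# Treat both NA and empty strings as "no Pfam domain"
has_pfam <- !is.na(domains$pfam) & domains$pfam != ""

# Keep each protein/domain/location combination once
pfam_domains <- unique(domains[has_pfam, ])
pfam_domains
```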

The Bioconductor biomaRt package and its complete documentation can be found on the biomaRt Bioconductor page.