biomaRt is a Bioconductor package that makes it easy to access and retrieve Ensembl data from R. The recent Bioconductor 3.1 release includes a new version of biomaRt with several new Ensembl-specific functions that let you connect to the Ensembl marts and retrieve data quickly.
To celebrate the new Bioconductor release, we’ve just launched a brand new mart documentation page. The new documentation covers the biomaRt package, as well as how to combine species data and how to use the BioMart RESTful and Perl APIs.
Want to get some Ensembl data from BioMart using biomaRt? Just follow the simple guide below.
How can I install the biomaRt R package?
First, make sure R is installed on your computer. Then run the following commands from your R terminal to install the Bioconductor biomaRt package:
source("http://bioconductor.org/biocLite.R")
biocLite("biomaRt")
What are the Ensembl marts?
The listEnsembl function lists the Ensembl marts that are currently available:
> library(biomaRt)
> listEnsembl()
     biomart                version
1    ensembl       Ensembl Genes 80
2        snp   Ensembl Variation 80
3 regulation  Ensembl Regulation 80
4       vega                Vega 60
5      pride         PRIDE (EBI UK)
Which Ensembl species have Variation data?
The listDatasets function will list all the species available for a given mart.
> library(biomaRt)
> variation = useEnsembl(biomart="snp")
> listDatasets(variation)
What data can I get from the Variation mart (filters and attributes)?
The listFilters and listAttributes functions will give you the list of all the filters and attributes available for a given mart.
> library(biomaRt)
> variation = useEnsembl(biomart="snp", dataset="hsapiens_snp")
> listFilters(variation)
> listAttributes(variation)
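The filter and attribute lists can be long, so it often helps to search them rather than read them top to bottom. The sketch below assumes that listFilters and listAttributes return data frames with "name" and "description" columns. The helper find_entries is hypothetical (it is not part of biomaRt), and the small table is a hand-made stand-in so that the example runs without a connection to Ensembl. Its attribute names come from the rsID example in this post; the descriptions are illustrative only.

```r
# Hypothetical helper (not part of biomaRt): keep the rows whose name or
# description matches a pattern, ignoring case.
find_entries <- function(tbl, pattern) {
  hit <- grepl(pattern, tbl$name, ignore.case = TRUE) |
    grepl(pattern, tbl$description, ignore.case = TRUE)
  tbl[hit, , drop = FALSE]
}

# Hand-made stand-in for the output of listAttributes(variation).
# Descriptions are illustrative, not copied from the live mart.
toy_attributes <- data.frame(
  name = c("refsnp_id", "chr_name", "minor_allele", "minor_allele_freq"),
  description = c("Variant name", "Chromosome name",
                  "Minor allele", "Minor allele frequency"),
  stringsAsFactors = FALSE
)

allele_attributes <- find_entries(toy_attributes, "allele")
print(allele_attributes$name)

# With a live connection, the same helper would be applied as:
# find_entries(listAttributes(variation), "allele")
```

The same helper should work unchanged on the output of listFilters, since both tables share the two columns it relies on.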
How can I get data about a variant using an rsID?
The following example retrieves the variation source, chromosome location, minor allele with its frequency and count, consequences, and Ensembl gene and transcript IDs for the variant “rs1333049”.
> library(biomaRt)
> variation = useEnsembl(biomart="snp", dataset="hsapiens_snp")
> rs1333049 <- getBM(attributes=c('refsnp_id','refsnp_source','chr_name','chrom_start','chrom_end','minor_allele','minor_allele_freq','minor_allele_count','consequence_allele_string','ensembl_gene_stable_id','ensembl_transcript_stable_id'), filters = 'snp_filter', values ="rs1333049", mart = variation)
> rs1333049
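The values argument of getBM also accepts a character vector, so several variants can be fetched in one query. When the IDs come from a spreadsheet or a text file, it can be worth tidying them first. The sketch below is one possible way to do that. The helper clean_rsids is hypothetical (it is not part of biomaRt), and rs429358 and rs7412 are extra example rsIDs that do not appear elsewhere in this post.

```r
# Hypothetical helper (not part of biomaRt): trim whitespace, drop anything
# that does not look like an rsID, and remove duplicates.
clean_rsids <- function(ids) {
  ids <- trimws(ids)
  ids <- ids[grepl("^rs[0-9]+$", ids)]
  unique(ids)
}

# Example input with a duplicate, stray whitespace and two invalid entries.
raw_ids <- c("rs1333049", " rs429358", "rs7412 ", "rs1333049", "not_an_id", "")
query_ids <- clean_rsids(raw_ids)
print(query_ids)

# With a live connection, the cleaned vector would be passed straight to getBM:
# getBM(attributes = c('refsnp_id', 'chr_name', 'chrom_start', 'chrom_end'),
#       filters = 'snp_filter', values = query_ids, mart = variation)
```

The result has one row or more per variant, so the refsnp_id column is what ties each row back to the ID that was queried.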
How can I get data on all genes on a chromosome?
The following example retrieves the Ensembl gene IDs, biotypes, HGNC symbols and genomic coordinates of all genes located on human chromosome Y.
> library(biomaRt)
> ensembl = useEnsembl(biomart="ensembl", dataset="hsapiens_gene_ensembl")
> chrY_genes <- getBM(attributes=c('ensembl_gene_id','gene_biotype','hgnc_symbol','chromosome_name','start_position','end_position'), filters = 'chromosome_name', values ="Y", mart = ensembl)
> chrY_genes
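getBM returns an ordinary data frame, so the result can be summarised with base R. As a minimal sketch, the snippet below counts genes per biotype. The helper count_by_biotype is hypothetical, and the toy table uses placeholder gene IDs (not real Ensembl identifiers) so that the example runs offline.

```r
# Hypothetical helper: count genes per biotype, most frequent first.
count_by_biotype <- function(genes) {
  counts <- table(genes$gene_biotype)
  sort(counts, decreasing = TRUE)
}

# Toy stand-in for chrY_genes; the IDs are placeholders, not real Ensembl IDs.
toy_genes <- data.frame(
  ensembl_gene_id = c("GENE_A", "GENE_B", "GENE_C"),
  gene_biotype = c("protein_coding", "pseudogene", "protein_coding"),
  stringsAsFactors = FALSE
)

biotype_counts <- count_by_biotype(toy_genes)
print(biotype_counts)

# With a live connection, the same helper would be applied as:
# count_by_biotype(chrY_genes)
```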
How can I get protein domain information mapped to an Ensembl Gene ID?
In the following example, you will be able to retrieve Ensembl Gene, Transcript and Protein IDs, Interpro and Pfam protein domain IDs and locations mapped to the Ensembl Gene ID “ENSG00000198763”.
> library(biomaRt)
> ensembl = useEnsembl(biomart="ensembl", dataset="hsapiens_gene_ensembl")
> domain_location_ENSG00000198763 <- getBM(attributes=c('ensembl_gene_id','ensembl_transcript_id','ensembl_peptide_id','interpro','interpro_start','interpro_end','pfam','pfam_start','pfam_end'), filters ='ensembl_gene_id', values ="ENSG00000198763", mart = ensembl)
> domain_location_ENSG00000198763
The biomaRt R package and its complete documentation can be found on the biomaRt Bioconductor page.