What’s new in e83:

  • Human: gene set updated to GENCODE 24, and new assembly patches (GRCh38.p5)
  • Manhattan plot track for LD
  • Advanced Filtering and Counts on Variant table
  • Minor allele frequency (MAF) filter on sequence mark-up views

Human gene set update and new assembly patches

The human gene set now corresponds to GENCODE 24 while the assembly has been updated to include new assembly patches for GRCh38.p5.

Manhattan plot track for linkage disequilibrium

This new linkage disequilibrium (LD) track is focused on a variant and displays the linked variants surrounding the focus variant. The track displays a Manhattan plot, using the r² and D′ values (from 0 to 1) on the Y axis. The new track is accessible from the Variation Linkage disequilibrium page, through the links in the new column “LD Manhattan plot”.
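For readers less familiar with the two statistics on the Y axis, the sketch below computes r² and D′ for a pair of biallelic variants using the standard population-genetics definitions. It is only an illustration: the function name `ld_stats` and its arguments are hypothetical, and nothing here is taken from the Ensembl LD pipeline itself.

```r
# Sketch: pairwise LD statistics from haplotype and allele frequencies.
# p_ab     : frequency of the haplotype carrying allele A at the focus
#            variant and allele B at the linked variant
# p_a, p_b : frequencies of alleles A and B
# (ld_stats is a hypothetical helper, not an Ensembl or biomaRt function)
ld_stats <- function(p_ab, p_a, p_b) {
  d <- p_ab - p_a * p_b
  # Largest |D| attainable given the allele frequencies
  d_max <- if (d >= 0) {
    min(p_a * (1 - p_b), (1 - p_a) * p_b)
  } else {
    min(p_a * p_b, (1 - p_a) * (1 - p_b))
  }
  d_prime <- if (d_max > 0) abs(d) / d_max else NA_real_
  r2 <- d^2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
  c(D = d, D_prime = d_prime, r2 = r2)
}

ld_stats(p_ab = 0.5,  p_a = 0.5, p_b = 0.5)  # complete LD: D' = 1, r2 = 1
ld_stats(p_ab = 0.25, p_a = 0.5, p_b = 0.5)  # linkage equilibrium: D = 0
```

Both values lie between 0 and 1, which is why they can share a single Y axis in the Manhattan plot.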

Advanced Filtering and Counts on Variant table

The functionality of the variant table has been further expanded to allow a wider range of filtering options. Filtering can now be applied by Minor Allele Frequency, SIFT and PolyPhen scores, Clinical Significance, Consequence Type and many other columns, using the buttons along the top of the variant table. For many of these filters, useful preset combinations of options are available within the popup, allowing rapid configuration of more complex combinations. In addition, row counts for each consequence type have been re-added to the existing Consequence Type filter. These are displayed in the popup that appears once the filter button has been pressed.

Filtering variants in RYR1 gene (ENSG00000196218).

MAF filter on sequence mark-up views

The variants displayed on all sequence mark-up views can now be filtered by minor allele frequency (MAF), allowing you to show or hide variants according to a range of frequencies (between 0.01% and 10%). This filtering is not on by default; to enable it, go to ‘Configure this page’ on any sequence view page and choose the value you want from the ‘Hide variants by frequency (MAF)’ drop-down menu.
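If you want to apply the same kind of cut-off to a table of variants you have downloaded, a minimal sketch in R could look like the following. The variant identifiers and frequencies are made up, `filter_by_maf` is a hypothetical helper, and the "hide below" / "hide above" behaviour is an assumption about how such a filter works, not a description of the website's implementation.

```r
# Toy variant table; the identifiers and frequencies are made up.
variants <- data.frame(
  id  = c("var1", "var2", "var3", "var4"),
  maf = c(0.0005, 0.02, 0.08, 0.35),
  stringsAsFactors = FALSE
)

# Hypothetical helper: hide variants on one side of a MAF cut-off.
# cutoff is a fraction, so 0.01 means 1%.
filter_by_maf <- function(df, cutoff, hide = c("below", "above")) {
  hide <- match.arg(hide)
  keep <- if (hide == "below") df$maf >= cutoff else df$maf <= cutoff
  df[keep, , drop = FALSE]
}

filter_by_maf(variants, cutoff = 0.01, hide = "below")$id  # "var2" "var3" "var4"
filter_by_maf(variants, cutoff = 0.10, hide = "above")$id  # "var1" "var2" "var3"
```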

Consequence filter for MAF.

Improving the image export and Ensembl mobile website

There are many updates to these features, which will be described in detail in separate blog posts. Look out for these blogs! Below is what the new image export wizard looks like.

Image export window

Other news

  • Mouse: updated to GENCODE M8 annotation
  • Rat: updated gene set, including manually annotated HAVANA annotation
  • Annotations now available in RDF format for all species on our FTP site
  • New human phenotype association data from Cancer Gene Census
  • RefSeq genomic to mRNA comparison attributes will be updated for human
  • New dbSNP145 variation data for chicken and pig

A complete list of the changes can be found on the Ensembl website.

Find out more about the new release, and ask the team questions, in our free webinar. Wednesday 16th December, 4pm GMT. Register here.

What’s new in e82:

Ensembl mobile website

We are very happy to announce the release of the mobile version of the Ensembl website, available at http://m.ensembl.org. This new website allows you to quickly search for genes, variants or phenotypes on your mobile device.

Support of VEP Plugins through the web interface, script and REST

The VEP can now be extended beyond its core functionality using a system of plugins. Plugins are a powerful way to extend, filter and manipulate the output of the VEP. More information regarding the VEP plugins can be found on the following documentation page.

Improved Variation tables

Variation tables for genes and transcripts have been reimplemented to effectively handle the large number of variants now known for many genes. At the same time, the ability to filter, sort, and select this data has been improved. Filtering by variant type is now achieved by selecting the “Type:” filter at the top of the main table. Further features and refinements are expected to be added in forthcoming releases.

Zebrafish development stage RNASeq data set

We’ve added sample-specific BAM files, splice junctions (introns) and gene models based on a range of zebrafish developmental stages and tissue samples.

Marking a region on images

A new feature to mark a selected region has been added to the location, gene and other views. Marking can be applied by drag-selecting a region and then using the zmenu to mark it, or by clicking on a feature on an image and then using the zmenu to mark the location of the feature.

LastZ replaced TBlat for pairwise alignments

We have replaced TBlat with LastZ and recomputed 9 pairwise alignments using LastZ. TBlat was used for distantly related species because it yielded higher genome coverage, but over time we have optimised the LastZ parameters so that it now gives a 50-100% increase in genome coverage.

Other news

  • Human variation data updates to dbSNP (144) including variants from the Exome Aggregation Consortium (ExAC)
  • Mouse: updated to GENCODE M7 including HAVANA annotation.
  • Improved data upload form
  • Improvements to PDF export
  • Export mode for projectors and print
  • Phenotype data updated for several species, including human, mouse, rat and horse

A complete list of the changes can be found on the Ensembl website.

Find out more about the new release, and ask the team questions, in our free webinar. Wednesday 7th October, 4pm BST. Register for free.

What’s new in e81:

Human gene set update and new assembly patches

The human gene set now corresponds to GENCODE 23 while the assembly has been updated to include new assembly patches for GRCh38.p3.

Mouse and zebrafish clone tracks

New Mouse clones

Mouse and zebrafish clone libraries have been imported from the NCBI clone database to replace our previous DAS tracks. The new clones tracks can be found under “Clones and misc regions” in the configuration menu on the left hand side, while the coordinates for the BAC ends can be found as tracks under “Simple features”, also from the configuration menu.

New mouse regulatory build

New regulatory mouse build

The Regulatory Build on Mouse was re-computed, converting the “old style” build to the “new style” build introduced on human in e!76. All Regulatory Builds in Ensembl are now updated to the new style. We have also increased the number of mouse cell types to 8.

Transcript sequence mark up

Transcript sequences can now be marked up to show exons as alternating upper and lower case characters, rather than grey/blue text. Simply tick the “Show exons as alternating upper/lower case” box in the “Configure this page” panel on Transcript cDNA or Transcript Protein pages.

This markup option will also carry over to the sequence export if RTF format is chosen.
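To reproduce this style of markup on your own sequences, for example in a script, the sketch below alternates the case of consecutive exons in a transcript sequence. The function `alternate_exon_case` and the toy sequence are hypothetical, and starting with upper case for the first exon is an assumption made for the illustration.

```r
# Hypothetical helper: exon_lengths gives the length of each exon, in
# transcript order, and must sum to the length of the sequence.
alternate_exon_case <- function(cdna, exon_lengths) {
  stopifnot(sum(exon_lengths) == nchar(cdna))
  ends   <- cumsum(exon_lengths)
  starts <- ends - exon_lengths + 1
  exons  <- substring(cdna, starts, ends)
  # Odd-numbered exons in upper case, even-numbered exons in lower case
  odd <- seq_along(exons) %% 2 == 1
  exons[odd]  <- toupper(exons[odd])
  exons[!odd] <- tolower(exons[!odd])
  paste(exons, collapse = "")
}

# Toy 14-base sequence split into exons of 5, 4 and 5 bases
alternate_exon_case("ATGGCCAAGTTTGA", c(5, 4, 5))  # "ATGGCcaagTTTGA"
```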

Other news

  • Mouse: updated to GENCODE M6 including HAVANA annotation with the new assembly patches (GRCm38.p4)
  • Annotations now available in GFF3 format for all our species on our FTP site
  • Phenotype data updated for several species, including human, mouse, sheep and chicken
  • Sheep: updated gene set including lincRNA genes

A complete list of the changes can be found on the Ensembl website.

biomaRt is a Bioconductor package that makes accessing and retrieving Ensembl data from R very easy. The recent Bioconductor 3.1 release includes a new version of biomaRt packed with many new Ensembl-friendly functions, allowing you to connect to the Ensembl marts and retrieve data in record time.

To celebrate the new Bioconductor release, we’ve just launched a brand new mart documentation page. This new documentation covers the biomaRt package as well as how to combine species data, BioMart RESTful access and the Perl API.

Want to get some Ensembl data from BioMart using biomaRt? Easy, just follow the simple guide below.

How can I install the biomaRt R package?

First make sure you have installed R on your computer. Then run the following commands from your R terminal to install the Bioconductor biomaRt R package:

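# biocLite() was the installer at the time of writing (Bioconductor 3.1);
# recent Bioconductor releases use BiocManager::install("biomaRt") instead.
source("http://bioconductor.org/biocLite.R")
biocLite("biomaRt")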

What are the Ensembl marts?

The following function lists the currently available Ensembl marts:

> library(biomaRt)

> listEnsembl()

     biomart               version
1    ensembl               Ensembl Genes 80
2        snp               Ensembl Variation 80
3 regulation               Ensembl Regulation 80
4       vega               Vega 60
5      pride               PRIDE (EBI UK)

Which Ensembl species have Variation data?

The listDatasets function will list all the species available for a given mart.

> library(biomaRt)

> variation = useEnsembl(biomart="snp")

> listDatasets(variation)

What data can I get from the Variation mart (filters and attributes)?

The listFilters and listAttributes functions will give you the list of all the filters and attributes available for a given mart.

> library(biomaRt)
 
> variation = useEnsembl(biomart="snp", dataset="hsapiens_snp")

> listFilters(variation)

> listAttributes(variation)

How can I get data about a variant using an rsID?

In the following example, you will be able to retrieve Variation source, Chromosome locations, Minor allele, Frequency and count, Consequences, Ensembl Gene and Transcript IDs for the Variation name “rs1333049”.

> library(biomaRt)
 
> variation = useEnsembl(biomart="snp", dataset="hsapiens_snp")

> rs1333049 <- getBM(attributes=c('refsnp_id','refsnp_source','chr_name','chrom_start','chrom_end','minor_allele','minor_allele_freq','minor_allele_count','consequence_allele_string','ensembl_gene_stable_id','ensembl_transcript_stable_id'), filters = 'snp_filter', values ="rs1333049", mart = variation)

> rs1333049

How can I get data on all genes on a chromosome?

In the following example, you will be able to retrieve Ensembl Gene IDs, HGNC symbols and biotypes located on the human chromosome Y.

> library(biomaRt)

> ensembl = useEnsembl(biomart="ensembl", dataset="hsapiens_gene_ensembl")

> chrY_genes <- getBM(attributes=c('ensembl_gene_id','gene_biotype','hgnc_symbol','chromosome_name','start_position','end_position'), filters = 'chromosome_name', values ="Y", mart = ensembl)

> chrY_genes 

How can I get protein domains information mapped to an Ensembl Gene ID?

In the following example, you will be able to retrieve Ensembl Gene, Transcript and Protein IDs, Interpro and Pfam protein domain IDs and locations mapped to the Ensembl Gene ID “ENSG00000198763”.

> library(biomaRt)

> ensembl = useEnsembl(biomart="ensembl", dataset="hsapiens_gene_ensembl")

> domain_location_ENSG00000198763 <- getBM(attributes=c('ensembl_gene_id','ensembl_transcript_id','ensembl_peptide_id','interpro','interpro_start','interpro_end','pfam','pfam_start','pfam_end'), filters ='ensembl_gene_id', values ="ENSG00000198763", mart = ensembl) 

> domain_location_ENSG00000198763

The Bioconductor BiomaRt R package and complete documentation can be found on the BiomaRt Bioconductor page.

What’s new in e80:

1000 Genomes phase 3 and dbSNP build 142

We are happy to announce that Human dbSNP 142 incorporating 1000 Genomes phase 3 data is now available for the GRCh38 assembly.

Gene Expression Atlas Widget

The Gene Expression Atlas widget has been embedded into Ensembl. You can now view where the gene is expressed anatomically and also which experiment it is associated with.

Updated zebrafish and rat gene annotation based on the GRCz10 and Rnor_6.0 assemblies

We are really excited to release the full gene annotation, dbSNP and microarray updates for zebrafish (GRCz10) and rat (Rnor_6.0).

Track label improvement

Some tracks in images now appear within sections, so that related tracks can be grouped within a category.

Each section is identified by a heading underlined in a certain colour, and each track within that section is marked with the same colour on the left-hand side.

Also, some tracks now have labels within the image itself, to allow longer descriptions. These in-image labels can be configured on or off via the configuration panel.

New user data track type: long-range interactions

We are very pleased to announce that Ensembl now supports long-range pairwise interaction data, which can be drawn as arcs on Region in Detail. Scores are indicated using a grey-to-black gradient, and labels can be displayed by selecting the appropriate track style from the configuration menu.

Initially we support the two formats developed by WashU for their Epigenomics browser. More information on both formats can be found in our online documentation.

We hope to support more formats in the future, so please let us know which formats you are currently using!

Other news

  • Mouse GENCODE M5 (GRCm38.p3): An updated version of the GENCODE gene set
  • We’ve imported sequence variants from:
  • New BioMart documentation
  • New export options for comparative views (homologues, gene trees and OrthoXML filtering)
  • Gap initiation update for BLAT and BLAST

A complete list of the changes can be found on the Ensembl website.

Ensembl 80 is scheduled for May 2015. Highlights include:

Variation data imports and updates

  • 1000 Genomes phase 3 studies will be imported for human
  • Variant locations will be added from the ExAC project
  • The latest sequence variants will be imported from:
    • dbSNP build 142 for human, mouse, zebrafish and cow
    • dbSNP build 143 for sheep and pig

Updated gene sets and annotations

  • Mouse GENCODE M5 (GRCm38.p3): An updated version of the GENCODE gene set
  • Updated zebrafish gene annotation based on the GRCz10 assembly
  • Updated rat gene annotation based on the new Rnor_6.0 assembly
  • RefSeq genomic to mRNA comparison attributes will be added for human

New web features

  • New export options for comparative views (homologues, gene trees and OrthoXML filtering)
  • New display styles for BigWig files on karyotype
  • Support for long-range interaction data

Other updates

  • The pairwise and multiple alignments have been updated to use the new Zebrafish and Rat assemblies

For more details on the declared intentions, please visit our Ensembl admin site. Please note that these are intentions and are not guaranteed to make it into the release.

After running our brand new Ensembl Regulatory Build on the human GRCh38 and GRCh37 assemblies, we spent some time revamping our current Regulation mart to make it faster, easier to use and pack it with brand new features. A complete re-design of the mart has been done in the background to make sure our mart can provide improved performance and deal with the data increase. The new Regulation mart can still be found on the Ensembl website under the BioMart tab.

The first thing that you will notice is that you can now access each regulation data type separately. This allows you to get the data quickly and makes the filter and attribute sections neater.

New Regulation mart dataset dropdown

Each Regulation section holds its own type of data, for example you can get the following data for human GRCh38:

  • Binding Motifs (TFBS Annotation)
    • Binding Matrix ID (e.g. MA0003.1)
    • Feature type data (e.g. BHLHE40, CTCFL,…)
  • Other Regulatory Regions
    • Feature type data (e.g. FANTOM predictions, VISTA Enhancers)
    • Identifiers (e.g. hs1, 1:922877-923268,…)
  • Regulatory Evidence (Regulatory Build Information)
    • Feature type data (e.g. ATF3, DNase1,…)
    • Cell type (e.g. A549, H1ESC,…)
    • Feature Type class (e.g. Histone, Polymerase, Transcription Factor, Open Chromatin)
    • Project name (e.g. ENCODE and Roadmap Epigenomics)
    • SRA Experiment Accession (e.g. SRX018823, SRX056730,…)
  • Regulatory Features (Regulatory Build Information)
    • Ensembl Regulatory Stable ID (e.g. ENSR00001516677)
    • Feature type (e.g. CTCF Binding Site, Enhancer, Open chromatin, Promoter, Promoter Flanking Region, TF binding site)
    • Cell type (e.g. A549, DND-41, GM12878, H1ESC,…)
  • Regulatory Segments (Segmentation Information)
    • Feature type (e.g. CTCF enriched, Predicted Enhancer, Predicted heterochromatin, Predicted low activity, Predicted Promoter flank, Predicted Promoter with TSS, Predicted Repressed, Predicted Transcribed Region)
    • Cell type (e.g. HeLa-S3, HepG2)
  • miRNA Target Regions (TarBase miRNA target predictions)
    • miRNA Identifier ID (e.g. hsa-miR-124-3p, hsa-miR-122-5p,…)
    • miRNA Accession ID (e.g. MIMAT0000069, MIMAT0000646,…)

We also have Binding Motifs, Regulatory Evidence, Regulatory Features and miRNA Target Regions data available for mouse, and Other Regulatory Regions available for both mouse and fruit fly.

In addition to the above, we have added the following brand new information:

  • Band Start/End, Marker Start/End and ENCODE Pilot Regions filters to the six Regulation data sections
  • SO name and accession to the six Regulation data sections
  • EFO Term accession to the Regulatory evidence, Regulatory features and Regulatory segments.
  • “Has evidence”, which denotes whether Regulatory features have supporting evidence on a particular cell type or not.
  • Chromosome Strand and Evidence to miRNA Target Regions.

Working with R?

Did you know that you can access all our marts using the BiomaRt Bioconductor R package?

To do this, first install the Bioconductor BiomaRt package: http://bioconductor.org/packages/release/bioc/html/biomaRt.html.

The following R code will then give you the chromosome location and scores for the human GRCh38 Binding matrix ID MA0005.2:

> library(biomaRt)

> ensembl_regulation = useMart(biomart="ENSEMBL_MART_FUNCGEN",host="www.ensembl.org",dataset="hsapiens_motif_feature")

> binding_matrix = getBM(attributes=c('binding_matrix_id','chromosome_name', 'chromosome_start', 'chromosome_end','chromosome_strand', 'score'), filters='motif_binding_matrix_id',values="MA0005.2", mart=ensembl_regulation)

The new Regulation mart is available for human, mouse and fruit fly on both www.ensembl.org and www.grch37.ensembl.org.

Ensembl 79 is scheduled for March 2015. Highlights include:

Updated gene sets and annotations

  • Human GENCODE release 22 (GRCh38.p2): An updated version of the GENCODE gene set, which combines Havana’s manual annotation and Ensembl’s evidence-based automatic annotation, will be released
  • Assembly patches will be added and annotated for the new human assembly GRCh38.p2
  • RefSeq to Ensembl model comparison attributes will be added for human
  • Fruitfly assembly will be updated to BDGP6

Variation data imports and updates

  • 1000 Genomes phase 3 studies will be imported for human
  • The latest sequence variants from dbSNP build 142 for human will be imported
  • New Global Alliance standards REST endpoints will be available for sets of Variation data
  • NextGen Project genotype data will be added from 3 sheep populations (Iranian Ovis aries, Iranian Ovis orientalis, Moroccan Ovis aries)
  • New rat strain-specific variants and genotypes, and QTLs and phenotypes from the Rat Genome Database (RGD)

New web features

  • Updated Gene gain/loss tree view
  • New summary statistics of the homologs predicted between each pair of species

For more details on the declared intentions, please visit our Ensembl admin site. Please note that these are intentions and are not guaranteed to make it into the release.

What’s new in e!77?

Imported Transcript Support Levels (TSLs) from UCSC

We have imported Transcript Support Levels (TSLs) from UCSC and we are displaying them as a new flag in the human and mouse Gene and Transcript tables. TSLs for human are based on the GENCODE 20 gene set and TSLs for mouse are based on the GENCODE M2 gene set. Transcript Support Level is a method to highlight the well-supported and poorly-supported transcript models for users.

New APPRIS flag for human and mouse

We’ve added a new flag for human and mouse in the Gene and Transcript tables to represent the principal isoform generated by APPRIS, a system that deploys a range of computational methods to provide value to the annotations of the human genome. It aims to select a single CDS for each gene as the principal isoform by combining protein structural information, functionally important residues and evidence from cross-species alignments. APPRIS is a joint project between Centro Nacional de Investigaciones Oncologicas and Instituto Nacional de Bioinformatica.

Improved alignments export

In this release we are continuing the upgrade of our Export interface; the second component to be released is alignments. To download an alignment, just navigate to one of the following pages:

Select an alignment, click on the “Download sequence” button, select your favourite output format from CLUSTALW, FASTA, Mega, MSF, Nexus, Pfam, Phylip, PSI, RTF (text alignment views only) or Stockholm, preview the output file, and save it.

New species

The Vervet-AGM (Chlorocebus sabaeus), also known as the Vervet Monkey or African Green Monkey, is new in this release. Its assembly, ChlSab1.1 (GCA_000409795.2), was produced by the Vervet Genomics Consortium. This species is particularly important when studying high blood pressure and AIDS, since it is a host for simian immunodeficiency virus (SIV). BAM files and RNAseq-based gene models are also available for this species.

Other news:

  • Data matrix configuration extended to all species with RNASeq data
  • Added new studies and updated other studies for human from DGVa
  • Updated human phenotype from ClinVar and Decipher
  • Updated mouse phenotype from IMPC

A complete list of the changes can be found on the Ensembl website.

Find out more at the Ensembl Release Webinar e77 (Thu, October 16, 2014 4:00 PM – 4.30 PM GMT). Register for free here: http://tinyurl.com/ensembl-oct-release

What’s new in e!76?

  • Updated human assembly to GRCh38 (GENCODE 20)
  • Updated mouse gene set (GENCODE M3)
  • New BLAST/BLAT
  • New Regulation displays
  • Improved sequence export
  • New species: Amazon molly and Olive baboon

New human assembly – GRCh38

We’re excited to release the full annotation of the new human genome assembly (GRCh38). This new assembly includes 24 chromosomes, mitochondrial DNA, 261 alternative reference loci, 127 unplaced scaffolds and 42 unlocalized scaffolds. Our comprehensive GRCh38 resources include updated variation data, results from the new regulatory annotation build, updated comparative genomics data and tissue-specific alignments of Human BodyMap 2.0 data. Full details in our GRCh38 blog series.

There are many reasons to switch to the new human assembly, but if you are not able to move just yet, our new archive website http://grch37.ensembl.org provides access to the previous assembly and annotation for those who need it.

New BLAST/BLAT

New BLAST Karyotype view

Release 76 includes our new BLAST/BLAT, which uses the same tools infrastructure as the new web-based VEP that came out in release 75. Highlights of this new version include the ability to save tickets in a user-friendly table, automatic result retrieval, improved speed and job tracking. We are also now using NCBI-BLAST to enable us to distribute our BLAST code more freely.

New Regulation displays

The display of regulatory regions (sequences that may be involved in gene regulation) has been updated to coincide with the release of the data from the new Ensembl Regulatory build. The major enhancement is a redesigned interface for selecting which evidence types to display and for which cell types, as well as how to display the evidence, for example as peaks or signals.

More details about the new regulation pipeline and displays can be found on the Regulatory Build documentation.

Improved sequence export

We are embarking on an upgrade of our Export interface to make it more intuitive; the first component to be released is DNA and peptide sequences. To download a sequence, just navigate to a sequence page, click on the “Download sequence” button, select your favourite output format, preview the output file and save it. Note that the “Export data” button has been disabled on pages that use this new interface, to avoid confusion.

New species

We are happy to announce annotation for two new species in this release.

The Olive baboon (Papio anubis) assembly, Panu_2.0 (GCA_000264685.1) was produced by the Baylor College of Medicine. This species is used for physiological and behavioural studies as well as comparative genomic studies. BAM files and RNA-seq based gene models are provided through a collaboration between the Nonhuman Primate Reference Transcriptome Resource (nhprtr.org) and the Human Genome Sequencing Center, Baylor College of Medicine (hgsc.bcm.edu).

Amazon molly

The Amazon molly (Poecilia formosa) assembly, Poecilia_formosa-5.1.2 (GCA_000485575.1) was produced by the Aquatic Genome Models Consortium. Amazon molly is used as a model for modern evolutionary biology and carcinogenicity studies, and is extremely easy to breed and rear in captivity. More information on the Amazon molly blog post.

Other news:

A complete list of the changes can be found on the Ensembl website.

Find out more at the Ensembl Release Webinar e76 (Wed, August 20, 2014 4:00 PM – 4.30 PM GMT). Register for free here: http://tinyurl.com/e76-webinar

Want to know more about GRCh38? Register for free for the Ensembl and GRC Webinar (Wed, September 17, 2014 4:00 PM – 4:30 PM GMT): http://tinyurl.com/GRCh38-webinar