Ensembl 100 has been released!

We are very excited to announce the release of Ensembl 100, along with Ensembl Genomes 47! Time has really flown for us. We moved from our beginnings as a browser with just one genome 20 years ago to an integrated resource for many species and data types in 2020. In this release we continue to scale up, bringing you 29 new genomes and a lot more.

Major Data Updates for Human and Mouse

Ensembl release 100 brings us up to GENCODE 34 for Human and GENCODE M25 for Mouse. We have also updated the genomic allele frequencies from the Genome Aggregation Database (gnomAD) on the GRCh38 assembly to version 3.

New Genomes

We have added 29 new species: three mammals, seven fish, six birds, four reptiles, eight plants and one mosquito. Here’s the complete list so you can check whether your favourite genome is among them.

Mammals:

Fish:

Birds:

Reptiles:

The genome of the Goode’s thornscrub tortoise is from the Vertebrate Genomes Project.

Plants:

Metazoa:

New Assemblies and/or Annotation

For one of our favourite animals, the Platypus (Ornithorhynchus anatinus), we have a new assembly (mOrnAna1.p.v1) and annotation from the Vertebrate Genomes Project. We also updated the assembly and annotation for the Northern pike (Esox lucius, Eluc_v4 assembly).

The fungus Zymoseptoria tritici, an important wheat pathogen causing septoria leaf blotch, got an additional gene set from the Max Planck Institute for Evolutionary Biology and a revised gene set from Rothamsted Research.

New Interface for Configuration of Multidimensional Track Hubs

In this release, we introduce a new interface for multidimensional track hubs, such as the Blueprint Hub. Here is where to find the interface and how to use it:

Once you have attached your track hub, click on the ‘Configure this page’ button on the Location tab. For the Blueprint Hub, you have two options: you can configure the ‘Blueprint Region’ or the ‘Blueprint Signal’ display. We will use ‘Blueprint Region’ as an example.

In the first step, you select your tracks from a matrix based on the two primary dimensions chosen by the Track Hub provider. For Blueprint, these are the sample description (here a cell type) and the experiment (here a data type) you are interested in. You can select or deselect all of them at once, or pick individual sample descriptions or experiments. The search box at the top of each tab helps you find the data you are interested in. Now click on the green ‘Filter tracks’ button at the bottom.

In the second step, you can refine your selection based on other dimensions of the data, for example analysis_group or analysis_type (note that this screen is not shown for Track Hubs with only two dimensions). Click on the cells in the matrix to select what you want to see. Now click on the green ‘Configure track display’ button at the bottom.

In the third step, you can configure your track display. Click on the cells in a similar matrix to select what you want to see. Finally, click on the green ‘View tracks’ button at the top to load and display the data you have selected, and you are done.
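To make the selection logic of the first two steps concrete, here is a minimal Python sketch. It is only an illustration of filtering on two primary dimensions and then refining on further ones, not how the Ensembl interface is implemented. The `Track` class, the function names and all sample, experiment and analysis_type values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Track:
    """A hypothetical track record; not an Ensembl data structure."""
    name: str
    sample: str        # first primary dimension (for Blueprint, a cell type)
    experiment: str    # second primary dimension (for Blueprint, a data type)
    other: dict = field(default_factory=dict)  # further dimensions, e.g. analysis_type


def filter_tracks(tracks, samples, experiments):
    """Step 1: keep tracks whose sample and experiment are both selected."""
    return [t for t in tracks if t.sample in samples and t.experiment in experiments]


def refine_tracks(tracks, **selected):
    """Step 2: keep tracks that match the selected values in every other dimension."""
    return [
        t for t in tracks
        if all(t.other.get(dim) in values for dim, values in selected.items())
    ]


# Made-up example data, for illustration only.
tracks = [
    Track("t1", "monocyte", "DNase", {"analysis_type": "peak"}),
    Track("t2", "monocyte", "H3K27ac", {"analysis_type": "peak"}),
    Track("t3", "neutrophil", "DNase", {"analysis_type": "signal"}),
    Track("t4", "neutrophil", "H3K27ac", {"analysis_type": "peak"}),
]

step1 = filter_tracks(tracks, samples={"monocyte", "neutrophil"}, experiments={"DNase"})
step2 = refine_tracks(step1, analysis_type={"peak"})
```

With this example data, `step1` holds the two DNase tracks and `step2` narrows them to the one with an analysis_type of peak. A hub with only two dimensions would skip `refine_tracks`, just as the second screen is skipped in the interface.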

New Data and Displays for Plants

We have new variation and comparative genomics data and displays for plants in this release. For Triticum aestivum (Bread wheat), the release brings a new linkage disequilibrium display that you can find on the Location tab. For Triticum turgidum (Durum wheat), we have five additional pairwise genome alignments with related grasses, namely Aegilops tauschii, Hordeum vulgare, Oryza sativa Japonica Group, Triticum aestivum and Triticum dicoccoides. You can find an example here. Finally, for rice, we computed a multiple genome alignment for the eleven Oryza taxa in Ensembl Plants. You can look at an example here. For this alignment, we used the Ensembl Compara Enredo-Pecan-Ortheus (EPO) pipeline with optimised parameters.

GRCh37

Following our consultation in 2019 and a blog post summarising the results, we have now removed all non-human data from our GRCh37 resources (grch37.ensembl.org, grch37.rest.ensembl.org and the public database at ensembldb.ensembl.org:3337). We have new data too: we updated our variation resource with new data versions including dbSNP 153, COSMIC 90 and ClinVar’s December 2019 release. We have generated regulatory feature annotations using the latest regulatory build and have updated our imported sets of enhancers (VISTA) and miRNA/gene interactions (Tarbase). We have also updated RefSeq annotations.
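If you script against the REST API, the human-only GRCh37 server changes how you might pick a server. Below is a minimal Python sketch that builds a gene lookup URL for either assembly. The main server rest.ensembl.org and the lookup/symbol endpoint are not mentioned in this post and are included from the public REST API. The helper name `lookup_symbol_url` and the local check that rejects non-human species on GRCh37 are assumptions made for illustration. The sketch only builds the URL and makes no request.

```python
from urllib.parse import quote

# REST servers per human assembly. The GRCh37 host is named in this post;
# the main (GRCh38) host is the standard public Ensembl REST server.
SERVERS = {
    "GRCh38": "https://rest.ensembl.org",
    "GRCh37": "https://grch37.rest.ensembl.org",
}


def lookup_symbol_url(symbol, species="homo_sapiens", assembly="GRCh38"):
    """Build a lookup/symbol URL (hypothetical helper, builds the URL only)."""
    if assembly not in SERVERS:
        raise ValueError(f"unknown assembly: {assembly}")
    # Assumption mirroring this release: GRCh37 resources hold human data only,
    # so other species are rejected locally instead of being sent to the server.
    if assembly == "GRCh37" and species != "homo_sapiens":
        raise ValueError("GRCh37 resources hold human data only")
    return (
        f"{SERVERS[assembly]}/lookup/symbol/"
        f"{quote(species)}/{quote(symbol)}?content-type=application/json"
    )


grch37_url = lookup_symbol_url("BRCA2", assembly="GRCh37")
grch38_url = lookup_symbol_url("BRCA2")
```

Here `grch37_url` points at grch37.rest.ensembl.org and `grch38_url` at the main server. The resulting URL can be passed to any HTTP client.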

Other Updates and Changes

  • Single exon genes for Sus scrofa (Pig)
  • Mitochondrial sequences and annotation for Macaca mulatta (Macaque)
  • Common name for Salmo trutta updated to Brown trout
  • Discontinuation of dN/dS analysis for vertebrates and plants
  • BioMart will no longer hold mappings of variants to transcripts with the biotypes lncRNA, processed_pseudogene and unprocessed_pseudogene. These data, including predicted functional consequences, will not be available for filtering and will not be reported as attributes. Such consequence information will still be available in browser views for variants or transcripts and will be reported by the Ensembl VEP.
  • Website searches will no longer return phenotype records for the following ambiguous search terms: Annotated by HGMD, Variant of unknown significance, NOT IN OMIM, not provided, not specified, None, ClinVar: phenotype not specified.
  • Retirement of two archive sites, dec2014.archive.ensembl.org (Ensembl 78) and mar2015.archive.ensembl.org (Ensembl 79)