What’s coming in Ensembl release 113 / Ensembl Genomes 60?

We expect to release Ensembl 113 and Ensembl Genomes 60 in October 2024. Below is a list of the updates we hope to include in the upcoming release. Please note, however, that we cannot guarantee that everything listed here will make it into the final release.

Vertebrates

GENCODE

The Havana team is planning to integrate a large number of long non-coding RNA (lncRNA) transcripts originating from the Capture Long-read Sequencing (CLS) project within the GENCODE consortium. The number of annotated lncRNA transcripts in Homo sapiens (Human) and Mus musculus (Mouse) will increase by roughly 130,000 in each species in release 113. We expect approximately 20,000 new lncRNA genes.

The default Gene track is changing from GENCODE Comprehensive to GENCODE Basic to enable the display of the increased number of transcripts. Additionally, the GENCODE Primary tag will be introduced. This is a new transcript subset which covers all human exons in a minimal set of transcripts. The tag will be represented in REST API responses and GFF3/GTF files as gencode_primary. The GENCODE Basic tag (currently labelled as basic) will be changed to the tag gencode_basic.
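
For scripts that filter GTF files by tag, the rename means checking for both the old and the new label during the transition. The snippet below is a minimal sketch, not an official tool. It assumes tags continue to appear in the GTF attribute column as `tag "…";` entries, and the example line and its identifiers are made up for illustration. GFF3 files encode attributes differently and would need their own parser.

```python
import re

# "basic" is the current label; "gencode_basic" and "gencode_primary"
# are the tag names announced for release 113.
BASIC_TAGS = {"basic", "gencode_basic"}
PRIMARY_TAGS = {"gencode_primary"}

# Assumption: each tag is written in the attribute column as: tag "value";
TAG_RE = re.compile(r'\btag "([^"]+)"')


def gtf_tags(line):
    """Return the set of tag values found in column 9 of a GTF line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 9:
        # Comment lines and malformed lines carry no attributes.
        return set()
    return set(TAG_RE.findall(fields[8]))


def has_tag(line, wanted):
    """True if the GTF line carries at least one of the wanted tags."""
    return bool(gtf_tags(line) & wanted)


# Made-up transcript line, for illustration only.
example = "\t".join([
    "1", "havana", "transcript", "1000", "2000", ".", "+", ".",
    'gene_id "ENSG00000000001"; transcript_id "ENST00000000001"; '
    'tag "gencode_basic"; tag "gencode_primary";',
])

print(has_tag(example, BASIC_TAGS))
print(has_tag(example, PRIMARY_TAGS))
```

Accepting both `basic` and `gencode_basic` lets the same script run against files from before and after the change.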

Havana manual annotation

Updated Havana manual gene annotation will become available for H. sapiens (Human; GRCh38) and M. musculus (Mouse). Moreover, the major histocompatibility complex (MHC) genes in Rattus norvegicus (Norway rat; mRatBN7.2) and Sus scrofa (Pig; Sscrofa11.1) have been manually updated together with additional immune genes. 

New gene annotation

Additional genome assemblies and gene annotation for breeds will be added for existing species:

  • Capra hircus (Goat): 2 breeds
  • Ovis aries (Sheep): 8 breeds
  • Sus scrofa (Pig): 8 breeds

Variation data

Our O. aries (Sheep; ARS-UI_Ramb_v2.0) displays have been updated to show variants from the European Variation Archive (EVA) release 5.

The ongoing incorporation of models from long-read sequencing data is driving a large increase in H. sapiens (Human; GRCh38) transcripts. To keep variant displays simple, we will only display pre-calculated transcript consequences for the GENCODE Primary set, which includes all exons on this assembly. The H. sapiens (Human; GRCh37) displays will remain unchanged. We will provide a VCF file on the FTP site containing variant annotations for the full set of GRCh38 transcripts.

The Ensembl Variant Effect Predictor (VEP) will support the GENCODE Primary transcript set to enable annotation of all potential variant consequences without duplication across multiple transcripts. The following will be supported:

  • optionally limiting Ensembl VEP annotation to this set
  • reporting the flag in Ensembl VEP output (as is currently done for MANE transcripts) as a boolean

gnomAD population allele frequency data for Human will be updated to v4.1. This will be available for the website, command-line and REST API instances of Ensembl VEP. We are also making REVEL and ClinPred scores available on the website and REST API versions of Ensembl VEP for the Human GRCh38 assembly.

Regulation

Regulatory annotation for H. sapiens (Human) and M. musculus (Mouse) will receive a major update. This will include regulatory features, the set of epigenomes (tissues and cell lines) and the associated regulatory activity. We will be retiring the Transcription Factor Binding Sites (TFBS) regulatory feature from Human and Mouse. We will also update the motif features for Human and add motif feature annotation to Mouse. 

The format of ENSR IDs will be updated to use additional characters (capital letters and the underscore symbol _). This affects all species. Additionally, regulatory annotation will be updated for all species to address minor inconsistencies, add further data and introduce the new ENSR IDs.
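
Code that validates ENSR IDs with a digits-only pattern may reject the new identifiers. The following sketch shows one way to loosen such a check. Both patterns are assumptions for illustration: the exact layout of the new IDs is not specified here, only that capital letters and the underscore become valid characters, and the example IDs are made up.

```python
import re

# Assumption: a strict pattern of the kind existing code might use
# for the current identifiers (prefix followed by digits only).
OLD_STYLE = re.compile(r"ENSR\d+")

# Assumption: a permissive pattern that also accepts capital letters
# and underscores after the ENSR prefix.
NEW_STYLE = re.compile(r"ENSR[0-9A-Z_]+")


def looks_like_ensr_id(text):
    """True if the whole string matches the permissive ENSR pattern."""
    return NEW_STYLE.fullmatch(text) is not None


# Made-up identifiers, for illustration only.
for candidate in ["ENSR00000000001", "ENSR1_ABC", "ENSG00000000001"]:
    print(candidate, looks_like_ensr_id(candidate))
```

The permissive pattern still accepts digits-only IDs, so it can be used for both the old and the new style.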

Microarray annotation will be retired for the following existing species:

  • Anas platyrhynchos (Mallard)
  • Aotus nancymaae (Nancy Ma’s night monkey)
  • Callithrix jacchus (Common marmoset)
  • Carlito syrichta (Philippine tarsier)
  • Cavia porcellus (Guinea pig)
  • Cercocebus atys (Sooty mangabey)
  • Colobus angolensis palliatus (Angolan colobus)
  • Cricetulus griseus (Chinese hamster) CHOK1GS
  • Cricetulus griseus (Chinese hamster) CriGri
  • Cyprinodon variegatus (Sheepshead minnow)
  • Fundulus heteroclitus (Mummichog)
  • Ictalurus punctatus (Channel catfish)
  • Mandrillus leucophaeus (Drill)
  • Mesocricetus auratus (Golden hamster)
  • Microcebus murinus (Gray mouse lemur)
  • Mus spretus (Algerian mouse)
  • Nannospalax galili (Middle East blind mole-rat)
  • Nomascus leucogenys (Northern white-cheeked gibbon)
  • Ornithorhynchus anatinus (Platypus)
  • Papio anubis (Olive baboon)
  • Piliocolobus tephrosceles (Ugandan red colobus)
  • Prolemur simus (Greater bamboo lemur)
  • Propithecus coquereli (Coquerel’s sifaka)
  • Rhinopithecus bieti (Black-and-white snub-nosed monkey)
  • Rhinopithecus roxellana (Golden snub-nosed monkey)
  • Saimiri boliviensis boliviensis (Black-capped squirrel monkey)
  • Theropithecus gelada (Gelada)

The following REST API regulation endpoints will be retired:

GET regulatory/species/:species/id/:id
GET regulatory/species/:species/epigenome
GET regulatory/species/:species/microarray/:microarray/vendor/:vendor
GET regulatory/species/:species/microarray
GET regulatory/species/:species/microarray/:microarray/probe/:probe
GET regulatory/species/:species/microarray/:microarray/probe_set/:probe_set
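
To find calls to these endpoints in existing scripts or logs, the templates listed above can be turned into patterns and matched against request paths. This is a minimal sketch. The helper names `template_to_regex` and `is_retired` are hypothetical, and it assumes each `:name` placeholder stands for exactly one path segment.

```python
import re

# Endpoint templates copied from the list of retired endpoints.
RETIRED_TEMPLATES = [
    "regulatory/species/:species/id/:id",
    "regulatory/species/:species/epigenome",
    "regulatory/species/:species/microarray/:microarray/vendor/:vendor",
    "regulatory/species/:species/microarray",
    "regulatory/species/:species/microarray/:microarray/probe/:probe",
    "regulatory/species/:species/microarray/:microarray/probe_set/:probe_set",
]


def template_to_regex(template):
    """Compile a template, treating each :name as one path segment."""
    parts = []
    for segment in template.split("/"):
        if segment.startswith(":"):
            parts.append("[^/]+")  # placeholder: any single segment
        else:
            parts.append(re.escape(segment))  # literal segment
    return re.compile("/".join(parts))


RETIRED_PATTERNS = [template_to_regex(t) for t in RETIRED_TEMPLATES]


def is_retired(path):
    """True if a request path (query string ignored) hits a retired endpoint."""
    path = path.split("?", 1)[0].strip("/")
    return any(p.fullmatch(path) for p in RETIRED_PATTERNS)


print(is_retired("/regulatory/species/homo_sapiens/microarray"))
print(is_retired("/overlap/region/human/7:140424943-140624564"))
```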

In addition, the other_regulatory and array_probe options in the feature parameter will be removed from the following endpoints:

GET overlap/region/:species/:region
GET overlap/id/:id
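
Clients that request these feature types from the overlap endpoints can drop them from the request. The sketch below builds an `overlap/region` URL with the two removed options filtered out, without making a network call. The function name is hypothetical, and the server address `https://rest.ensembl.org` and the use of repeated `feature` parameters are assumptions about how the client calls the API.

```python
from urllib.parse import urlencode

# Options being removed from the feature parameter in release 113.
REMOVED_FEATURES = {"other_regulatory", "array_probe"}


def overlap_region_url(species, region, features,
                       server="https://rest.ensembl.org"):
    """Build an overlap/region URL, skipping feature types being removed."""
    kept = [f for f in features if f not in REMOVED_FEATURES]
    if not kept:
        raise ValueError("no supported feature types left after filtering")
    # Assumption: one feature=... parameter per requested feature type.
    query = urlencode([("feature", f) for f in kept])
    return f"{server}/overlap/region/{species}/{region}?{query}"


url = overlap_region_url(
    "human",
    "7:140424943-140624564",
    ["gene", "other_regulatory", "regulatory", "array_probe"],
)
print(url)
```

Raising an error when nothing is left avoids sending a request that has no `feature` parameter at all.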

Ensembl Plants

Additional variation data from the Watkins collection will be added for Triticum aestivum (Common wheat; GCA_900519105.1).

Genome assemblies and gene annotation will be updated for the following species:

  • Marchantia polymorpha (Common liverwort; GCA_039105155.1)
  • Triticum aestivum (Common wheat; Paragon v2; GCA_949126075.1)

Additional genome assemblies and gene annotation will be added for the following existing species:

  • Brassica rapa (Field mustard; GCA_900412535.3)

Genome assemblies and gene annotation will be added for the following new species:

  • Arachis hypogaea (Peanut; GCA_003086295.3)
  • Lathyrus sativus (Grass pea; GCA_963859935.3)
  • Triticum timopheevii (Sanduri wheat; GCA_963921465.1)

Ensembl Metazoa

A new annotation source will be added for the existing Culex quinquefasciatus (Southern house mosquito; GCA_015732765.1) assembly.

Additional genome assemblies and gene annotation will be added for existing species:

  • Anopheles coluzzii (Mosquitos; GCA_943734685.1)
  • Anopheles funestus (African malaria mosquito; GCA_943734845.1)
  • Lutzomyia longipalpis (Sandfly; GCA_024334085.1)
  • Phlebotomus papatasi (Sandfly; GCA_024763615.2)

Genome assemblies and gene annotation will be added for the following new species:

  • Anastrepha ludens (Mexican fruit fly; GCA_028408465.1)
  • Anastrepha obliqua (West Indian fruit fly; GCA_027943255.1)
  • Bombus affinis (Rusty patched bumble bee; GCA_024516045.2)
  • Culex pipiens pallens (Common house mosquito; GCA_016801865.2)
  • Cylas formicarius (Sweet potato weevil; GCA_029955315.1)
  • Diorhabda carinulata (Northern tamarisk beetle; GCA_026250575.1)
  • Diorhabda sublineata (Subtropical tamarisk beetle; GCA_026230105.1)
  • Hylaeus anthracinus (Anthricinan yellow-faced bee; GCA_026225885.1)
  • Hylaeus volcanicus (Volcano masked bee; GCA_026283585.1)
  • Malaya genurostris (Mosquitos; GCA_030247185.2)
  • Microplitis demolitor (Parasitoid wasp; GCA_026212275.2)
  • Microplitis mediator (Endoparasitoid wasp; GCA_029852145.1)
  • Mytilus californianus (California mussel; GCA_021869535.1)
  • Phlebotomus argentipes (Sandfly; GCA_947086385.1)
  • Plodia interpunctella (Indianmeal moth; GCA_027563975.1)
  • Spodoptera frugiperda (Fall armyworm; GCA_023101765.3)
  • Topomyia yanbarensis (Mosquitos; GCA_030247195.1)
  • Toxorhynchites rutilus septentrionalis (Elephant mosquito; GCA_029784135.1)
  • Uranotaenia lowii (Sandfly; GCA_029784155.1)
  • Wyeomyia smithii (Pitcher plant mosquito; GCA_029784165.1)
  • Zeugodacus cucurbitae (Melon fly; GCA_028554725.2)

The following genomes will be removed:

  • Athalia rosae (Turnip sawfly; GCA_000344095.2)
  • Drosophila yakuba (Fruit fly; GCA_000005975.1)
  • Melitaea cinxia (Glanville fritillary; GCA_000716385.1)

Other updates and changes

  • The latest InterProScan version (5.69-101.0) will be run on all species, including those in Ensembl Bacteria
  • Cross-references for all plant and vertebrate species will be fully updated
  • UniProt references for all species will be updated
  • Ensembl Genomes 37 (October 2017), Ensembl Genomes 40 (July 2018) and Ensembl 97 (July 2019) archives will be retired