Expanding Ensembl Metazoa gene trees 

As of Ensembl 110 / Ensembl Genomes 57, gene trees within Ensembl Metazoa have been expanded to cover 275 species by dividing them into 3 taxonomic clade sets: Metazoa, Protostomia, and Insecta. In addition, the release and update frequency of metazoan gene trees will change, with Metazoa and Protostomia being updated in every even-numbered release and Insecta being updated in every odd-numbered release. Read on to find out more about this update.

Until Ensembl 109 / Ensembl Genomes 56, Ensembl Metazoa generated gene trees over a single set of 175 taxa, representing the most relevant invertebrate metazoan species hosted by Ensembl. With the volume of new metazoan genomes growing steadily and computational resources finite, changes are required to meet user demand while continuing to support and deliver up-to-date Ensembl comparative resources.

To increase the number of species, and thereby the taxonomic scope, covered by the comparative genomics resources of Ensembl Metazoa without affecting our release cycle, the number of taxonomic clade sets in our gene trees will expand from one to three from Ensembl 110 / Ensembl Genomes 57:

  • Metazoa
  • Protostomia
  • Insecta

With this change, we will be able to cover up to 275 species, 100 more than before. Each taxonomic set has been given a maximum size based on its relevance to our users and on current trends in clade expansion: Metazoa and Protostomia will each have a maximum of 75 species, whilst Insecta will cover up to 125 species. You can find the detailed list of species included in each set here.
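As a quick check of the arithmetic above, the set maxima can be summed in a few lines of Python. This is only an illustrative sketch: the names `MAX_SPECIES` and `PREVIOUS_SINGLE_SET` are made up for this example and are not part of any Ensembl API.

```python
# Maximum set sizes as stated in the text (illustrative names, not an Ensembl API).
MAX_SPECIES = {"Metazoa": 75, "Protostomia": 75, "Insecta": 125}

# Size of the single taxon set used up to Ensembl 109 / Ensembl Genomes 56.
PREVIOUS_SINGLE_SET = 175

total = sum(MAX_SPECIES.values())
print(total)                        # combined maximum across the three sets
print(total - PREVIOUS_SINGLE_SET)  # increase over the previous single set
```

Note that this total counts places in the sets. Because some key species are included in more than one set, the number of distinct species may be lower, which is consistent with the "up to" wording.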

The data available within each gene tree set will remain the same, but will be limited to the species included in that particular set. Importantly, we considered it essential to include key species, such as Drosophila melanogaster, in more than one of these sets (see Figure 1). As shown in Figure 2, all the available gene trees will be accessible as usual from the Comparative Genomics menu in the left side-panel.

The split into sets also affects homology prediction, particularly for species present in more than one gene tree set. For those cases, a priority system has been put in place to choose which information to preserve: Metazoa > Protostomia > Insecta. For any given species, the homology information displayed will come from the highest-priority set (the leftmost in that list) among all the gene tree sets the species belongs to. For instance, following Figure 1, Lingula anatina will display only the Metazoa homologies with Loa loa, even though both species are included in both the Metazoa and Protostomia sets.
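The priority rule can be sketched in a few lines of Python. This is a minimal illustration of the rule as described in this post, not the actual Ensembl implementation; the function name `default_homology_set` and the way set membership is passed in are assumptions made for the example.

```python
# Priority order as described in the text: Metazoa > Protostomia > Insecta.
PRIORITY = ["Metazoa", "Protostomia", "Insecta"]


def default_homology_set(memberships):
    """Return the highest-priority clade set among those a species belongs to.

    `memberships` is any collection of set names (hypothetical input format).
    """
    for clade_set in PRIORITY:
        if clade_set in memberships:
            return clade_set
    raise ValueError("species is not in any of the gene tree sets")


# Lingula anatina is in both the Metazoa and Protostomia sets (see Figure 1),
# so its displayed homologies come from the Metazoa set.
print(default_homology_set({"Metazoa", "Protostomia"}))
```

Under the same rule, a species present only in the Protostomia and Insecta sets would display its Protostomia homologies.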

The changes outlined above will have some impact on the release and update frequency of metazoan gene trees. From Ensembl 110 onwards, each Ensembl Metazoa release will deliver either Set A (Metazoa and Protostomia together) or Set B (Insecta), alternating between the two. That is, Metazoa and Protostomia gene trees will be updated in every even-numbered Ensembl release, while Insecta will first be released in Ensembl 111 / Ensembl Genomes 58 and then updated in every odd-numbered release thereafter.
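The alternating schedule can be expressed as a small Python function. This is a sketch of the rule stated above, keyed on the Ensembl release number; the function name `sets_updated_in` and the constant `FIRST_RELEASE` are hypothetical and only serve the example.

```python
# First release using the three-set scheme, per the text.
FIRST_RELEASE = 110


def sets_updated_in(ensembl_release):
    """Return the gene tree sets updated in a given Ensembl release number."""
    if ensembl_release < FIRST_RELEASE:
        raise ValueError("the three-set scheme starts at Ensembl 110")
    if ensembl_release % 2 == 0:
        # Set A: even-numbered releases.
        return ("Metazoa", "Protostomia")
    # Set B: odd-numbered releases, starting with Ensembl 111.
    return ("Insecta",)


for release in (110, 111, 112):
    print(release, sets_updated_in(release))
```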

We acknowledge that these changes to metazoan comparative resources are substantial, and we will therefore archive Ensembl 109. The Ensembl 109 archive will allow users who prefer to do so to keep using the single-taxon-set gene trees until the first full complement of all three new gene tree sets is made available in Ensembl 111.

Figure 1: The three gene tree sets with their maximum size, list of overlapping species between sets in Ensembl 110 / Ensembl Genomes 57 and priority of the default set displayed on the website. Priority of taxa set order: Metazoa > Protostomia > Insecta. 

Figure 2: New left-side menu options for the different gene tree sets. Metazoa and Protostomes will be available in Ensembl 110 / Ensembl Genomes 57, and Insects will be added from Ensembl 111 / Ensembl Genomes 58.

Authors: Jorge Álvarez Jarreta and Lahcen Campbell
Editors: Vasily Sitnik, Aleena Mushtaq, Benjamin Moore and Louisse Paola Mirabueno