146 new insect genomes on Ensembl Rapid Release

The latest update to the Ensembl Rapid Release brings 146 winged insect genome assemblies covering 94 species across the orders Lepidoptera and Hymenoptera. Most of these come from the Darwin Tree of Life project.

The Darwin Tree of Life (DToL) project aims to sequence the genomes of around 70,000 eukaryotic species in the British Isles and preserve their digital records. It is a multidisciplinary collaboration spanning genomics, ecology, and taxonomy, with the potential to revolutionise the way we do biology and conservation. Our role in this effort is to annotate these newly sequenced genomes and make the data freely available through the Ensembl genome browser. We have recently adapted our automated annotation pipelines to non-vertebrate eukaryotic genomes so that we can annotate a broader spectrum of biodiversity beyond vertebrates.

We distribute these new data through the Ensembl Rapid Release, our lightweight genome browser designed to release the latest genome annotations for a large number of vertebrate and non-vertebrate species every two weeks. The latest update includes 146 genome assemblies from 94 species of winged insects, belonging to 59 different genera:

[Figure: Number of new species across Hymenoptera and Lepidoptera]

All but 20 of these come from the DToL project. They complement the 14 DToL Lepidoptera species that we already host on the Ensembl Rapid Release site.

Because the genomes produced by the DToL are phased, both copies of each chromosome can be reconstructed for every diploid species. A genome assembly is usually a haploid representation of the genome, so each of these genomes is split into two assemblies: the primary haplotype and the alternate haplotype. The primary haplotypes have gone through an additional step of manual curation and are chromosome-level assemblies of the highest quality, recommended for downstream analysis. The alternate haplotypes have not been manually curated, so they may contain artificial duplications or missing sequence, and they are generally available only at scaffold level. For species where the alternate haplotype is of high quality, Ensembl has also generated a full gene set annotation, using the same process and data as for the primary haplotype. As a result, many of the new species have gene annotations on both the primary and the alternate haplotype. Two sets of annotated chromosomes capture more of each species' genetic diversity, and comparing them might reveal interesting biological differences.

For many of these species, these are the first molecular data ever produced. They will help refine their taxonomy, improve our understanding of their evolution, and support the protection and restoration of biodiversity.