The human pangenome: a more diverse human reference genome in Ensembl

The human pangenome, a high-quality collection of reference human genome sequences that captures diversity from different human populations better than the current human reference genome does, is now available through Ensembl.

The work was led by the international Human Pangenome Reference Consortium (HPRC), a group of 14 institutes, including EMBL's European Bioinformatics Institute (EMBL-EBI), funded by the National Human Genome Research Institute (NHGRI), part of the National Institutes of Health (NIH).

Researchers have released a new human pangenome reference, a high-quality collection of reference human genome sequences that captures substantially more diversity from different human populations than was previously available. Credit: Darryl Leja, NHGRI.

Genome sequences differ only slightly among individuals. In the case of humans, any two genomes are, on average, more than 99% identical. Small genomic differences contribute to each person’s uniqueness and can provide insights about their health, helping to diagnose disease and guide medical treatments. 

Because of these differences, relying on a single standard reference genome, as many studies currently do, has limitations: sequences that are missing from, or differ substantially from, that one reference can be overlooked or misrepresented. Whereas the existing reference genome is a single, linear sequence, the pangenome represents many different versions of the human genome sequence at the same time. This gives researchers more options when using the pangenome to analyse other human genome sequences.

The new pangenome reference is a collection of different genomes against which to compare an individual genome sequence. Like a subway map, the pangenome graph has many possible routes for a sequence to take, represented by the different colors. The detouring paths at the top of the image represent single nucleotide variants (SNVs), which are single-letter differences. The yellow path that loops around itself and repeats the same nucleotides represents a duplication variant. The pink path that loops counterclockwise and follows the nucleotide sequence backwards represents an inversion variant. At the bottom, the green and dark blue paths skip the C nucleotide on their route and represent a deletion variant. The light blue path, which has extra nucleotides in its route, represents an insertion variant.
Credit: Darryl Leja, NHGRI.
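To make the subway-map analogy more concrete, the following is a minimal Python sketch of a toy variation graph. It is a simplified illustration made up for this post, not the data model used by Ensembl or the HPRC: the node IDs, sequences and route names are all hypothetical, and real pangenome graphs are far larger and stored in dedicated file formats. Each node holds a short stretch of DNA, and each route (path) spells out one genome by listing the nodes it visits and the direction in which it reads them. Reading a node backwards means taking its reverse complement, which is how an inversion appears in the graph.

```python
# Hypothetical toy variation graph: node ID -> DNA sequence.
NODES = {
    1: "ACG",
    2: "T",    # one allele of a single nucleotide variant (SNV)
    3: "G",    # the alternative allele of that SNV
    4: "CAT",
    5: "C",    # a node that some routes skip (deletion)
    6: "GGA",
    7: "TT",   # extra sequence carried only by some routes (insertion)
}

# Translation table mapping each base to its complementary base.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Read a DNA sequence backwards on the opposite strand."""
    return seq.translate(COMPLEMENT)[::-1]


def spell_path(path: list[tuple[int, str]]) -> str:
    """Concatenate node sequences along a route of (node_id, strand) steps.

    '+' reads the node forwards; '-' reads its reverse complement.
    """
    parts = []
    for node_id, strand in path:
        seq = NODES[node_id]
        parts.append(seq if strand == "+" else reverse_complement(seq))
    return "".join(parts)


# Hypothetical routes through the same graph, one per kind of variant.
PATHS = {
    "reference":   [(1, "+"), (2, "+"), (4, "+"), (5, "+"), (6, "+")],
    "snv":         [(1, "+"), (3, "+"), (4, "+"), (5, "+"), (6, "+")],
    "deletion":    [(1, "+"), (2, "+"), (4, "+"), (6, "+")],
    "insertion":   [(1, "+"), (2, "+"), (4, "+"), (7, "+"), (5, "+"), (6, "+")],
    "inversion":   [(1, "+"), (2, "+"), (4, "-"), (5, "+"), (6, "+")],
    "duplication": [(1, "+"), (2, "+"), (4, "+"), (4, "+"), (5, "+"), (6, "+")],
}

for name, path in PATHS.items():
    print(f"{name:12} {spell_path(path)}")
```

Running it prints one sequence per route. For example, the deletion route spells ACGTCATGGA because it skips node 5, the C that the reference route passes through, while the inversion route reads node 4 (CAT) backwards as ATG.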

Expanding the range of genomes in the human reference to capture more human diversity will help advance personalised medicine by enabling clinicians to better tailor treatment to individual patients. This draft human pangenome reference includes the maternal and paternal genome sequences from 47 people, and the researchers aim to increase this number to 350 by mid-2024. The work, published in the journal Nature, is one of several papers published today by HPRC members. Most of the genomes used to create the human pangenome reference were collected as part of the 1000 Genomes Project, the largest public catalogue of human variation and genotype data from a wide range of populations.

Accessing the human pangenome data 

To help researchers understand how genes differ across the individual genomes represented in the human pangenome, Ensembl have mapped the high-quality gene annotations generated for the reference human genome as part of the GENCODE project across the pangenome.

The human pangenome sequences and annotation are openly accessible on the Ensembl human pangenome project page and through Ensembl Rapid Release.

More about the Human Pangenome Reference Consortium

The Human Pangenome Reference Consortium (HPRC) is a project funded by the National Human Genome Research Institute to sequence and assemble genomes from individuals of diverse populations in order to better represent the genomic landscape of humanity.

A list of the institutions involved in the HPRC is available on the project's main page.

Information about the range of populations included in the project can be found on the project's population sampling and representation page.

This blog was adapted from the NHGRI and EMBL-EBI press release.