The human pangenome, a high-quality collection of reference human genome sequences that captures the diversity of human populations better than the current human reference genome, is now available through Ensembl.
The work was led by the international Human Pangenome Reference Consortium (HPRC), a group of 14 institutes that includes EMBL’s European Bioinformatics Institute (EMBL-EBI). The consortium is funded by the National Human Genome Research Institute (NHGRI), part of the National Institutes of Health (NIH).
Genome sequences differ only slightly among individuals. In the case of humans, any two genomes are, on average, more than 99% identical. Small genomic differences contribute to each person’s uniqueness and can provide insights about their health, helping to diagnose disease and guide medical treatments.
These small differences mean that relying on one standard reference genome, as many studies currently do, has limitations. Whereas the previous reference was a single, linear genome sequence, the pangenome represents many different versions of the human genome sequence at the same time. This gives researchers a wider range of options when using it to analyse other human genome sequences.
Increasing the diversity represented in the human reference genome will help advance personalised medicine by enabling clinicians to better tailor treatment to individual patients. This draft human pangenome reference includes the maternal and paternal genome sequences from 47 people, and the researchers aim to increase this number to 350 by mid-2024. The work, published in the journal Nature, is one of several papers published today by HPRC members. Most of the genomes used to create the human pangenome reference were collected as part of the 1000 Genomes Project, the largest public catalogue of human variation and genotype data from a wide range of populations.
Accessing the human pangenome data
To help researchers understand how the genes present differ across the individual genomes in the human pangenome, Ensembl have taken the high-quality annotations of the reference human genome, generated as part of the GENCODE project, and mapped them across the pangenome.
The human pangenome sequences and annotation are openly accessible on the Ensembl human pangenome project page and through Ensembl Rapid Release.
More about the Human Pangenome Reference Consortium
The Human Pangenome Reference Consortium (HPRC) is a project funded by the National Human Genome Research Institute to sequence and assemble genomes from individuals across diverse populations, in order to better represent human genomic diversity.
Institutions involved in the HPRC can be found on the project’s main page.
Information about the range of populations included in the project can be found on the project’s population sampling and representation page.
This blog post was adapted from the NHGRI and EMBL-EBI press release.