Haplotype data from the 1000 Genomes Project available in Ensembl

Ensembl incorporated haplotype data from the 1000 Genomes Project into e!84, which was released in March 2016. These data allow you to view haplotypes (combinations of genomic sequence variants that are inherited together) and to follow how they track through individuals and populations.

Humans, unlike lab mice, are fairly outbred. Once a reference human genome was available, the logical next step was to sample the spectrum of genetic diversity, with the ultimate aim of understanding how such genetic variation might affect phenotype. Both the International HapMap Project and the 1000 Genomes Project undertook large-scale collection of variation data to determine how genotype frequencies differ from population to population, and together they have catalogued tens of millions of variants occurring in individuals around the world.

Heritable genetic variation arises in the germline, and can result either from replication errors that escape DNA polymerase proofreading or from recombination between homologous chromosomes during meiosis I. Types of variation include nucleotide changes, insertions and deletions, each of which can create variant genomic sequences, or alleles. Alleles that lie physically close together on the same chromosome are rarely separated by recombination, so they tend to be inherited together. Such co-inherited sets of alleles are called haplotypes, and they are passed through the germline from parent to child. Depending on their effects on fitness (neutral, negative or positive), haplotypes become more or less prevalent over time, and their frequencies in different populations typically reflect both population size and evolutionary pressures.

The 1000 Genomes Project, which assayed 2,504 individuals from 26 populations across five continents, ultimately discovered more than 88 million sequence variants. To gain a sense of their functional significance, variants that co-occur within a transcript's exons on the same chromosome copy have been grouped together to infer haplotypes, which can now be viewed and browsed on transcript pages in Ensembl.
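
As a rough illustration of how phased genotypes can be grouped into haplotypes, the Python sketch below splits each individual's phased calls into two chromosome copies and counts the distinct combinations of alternate alleles found together on one copy. It is a toy model, not Ensembl's pipeline: the variant positions, sample names and VCF-style "0|1" genotype strings are hypothetical, and unphased calls or multi-allelic sites are not handled.

```python
from collections import Counter

# Hypothetical toy data (not real 1000 Genomes calls): each variant is
# (position, ref, alt), and each sample has one phased genotype per variant.
# "0|1" means copy 1 carries the reference allele and copy 2 the alternate.
VARIANTS = [(100, "C", "G"), (250, "C", "T"), (310, "G", "A")]
GENOTYPES = {
    "sample1": ["0|1", "1|0", "0|1"],
    "sample2": ["1|1", "1|1", "1|1"],
    "sample3": ["0|0", "0|0", "0|0"],
}


def haplotypes_from_phased(variants, genotypes):
    """Count haplotypes, where a haplotype is the tuple of alternate alleles
    found together on one chromosome copy (an empty tuple = reference)."""
    counts = Counter()
    for calls in genotypes.values():
        copies = ([], [])  # the two chromosome copies of this sample
        for (pos, ref, alt), call in zip(variants, calls):
            # Phased calls use "|"; unphased "/" calls are not handled here.
            for copy, allele in zip(copies, call.split("|")):
                if allele == "1":
                    copy.append(f"{pos}{ref}>{alt}")
        for copy in copies:
            counts[tuple(copy)] += 1
    return counts


for haplotype, n in haplotypes_from_phased(VARIANTS, GENOTYPES).items():
    print(haplotype or "reference", n)
```

Note how sample1, although heterozygous at all three sites, contributes two different haplotypes; without phase information the same genotypes could not be assigned to chromosome copies.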

One example can be found below. Olfactory and taste receptor genes are highly variable, with many alleles that cause subtle person-to-person differences in smell and taste perception. TAS2R38, which encodes a taste receptor involved in perceiving bitter compounds such as phenylthiocarbamide (PTC), has a number of different alleles that combine into numerous haplotypes. Depending on your haplotype, you may be more or less sensitive to bitterness, which might also influence how much you like coffee, certain vegetables and dark beer.

By default, the table shows protein haplotypes, including the specific amino acid changes found in 1000 Genomes Project participants. Effects of the alleles on the protein sequence, such as stop codons or substitutions predicted to be deleterious (based on SIFT or PolyPhen scores), are flagged. You can also switch from the protein-haplotype view to the transcript-haplotype view, which displays the nucleotide substitutions that underlie each protein haplotype. Because the genetic code is redundant, several distinct transcript haplotypes can encode the same protein haplotype.
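
The relationship between transcript and protein haplotypes can be sketched by translating each transcript haplotype's coding sequence with the standard genetic code and grouping the sequences that yield the same protein. The sequences below are hypothetical, and the sketch does not attempt the consequence annotation (such as SIFT or PolyPhen predictions) that Ensembl displays.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code: codons enumerated in TCAG order (TTT, TTC, TTA, ...)
# paired with their one-letter amino acids; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate a coding sequence codon by codon, ignoring any trailing
    partial codon."""
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


def group_by_protein(transcript_haplotypes):
    """Map each protein sequence to the transcript haplotypes encoding it."""
    groups = defaultdict(list)
    for seq in transcript_haplotypes:
        groups[translate(seq)].append(seq)
    return dict(groups)


# Hypothetical sequences: the second differs from the first by a synonymous
# change (CTT -> CTC, both leucine), the third by a missense change
# (CTT -> CCT, leucine -> proline).
examples = ["ATGCTTGCA", "ATGCTCGCA", "ATGCCTGCA"]
for protein, seqs in group_by_protein(examples).items():
    print(protein, seqs)
```

Here two of the three transcript haplotypes collapse into a single protein haplotype, which is the redundancy the transcript-haplotype view makes visible.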

The tables also show haplotype frequency across the entire 1000 Genomes dataset and in different subpopulations. Frequency is given both as a proportion and as the number of copies of the haplotype identified. (Each participant carries two copies of every autosome, as do females of the X chromosome, and these copies may carry identical or different haplotypes at a given locus. The copy counts therefore reflect chromosomes rather than participants, and do not sum to the number of participants.)
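
The arithmetic behind the frequency columns can be sketched as follows, assuming every participant contributes two autosomal chromosome copies and that frequency is simply the number of copies divided by the total number of copies in the group. Participant IDs, population labels and haplotype names are hypothetical, and this is not Ensembl's own code.

```python
from collections import Counter

# Hypothetical toy data: participant -> (population, the pair of haplotypes
# carried on that participant's two autosomal chromosome copies).
PARTICIPANTS = {
    "p1": ("POP_A", ("H1", "H2")),
    "p2": ("POP_A", ("H1", "H1")),
    "p3": ("POP_B", ("H2", "H3")),
    "p4": ("POP_B", ("H2", "H2")),
}


def haplotype_frequencies(participants):
    """Return {population: {haplotype: (copies, frequency)}}, with an extra
    "ALL" entry, counting each chromosome copy separately."""
    counts_by_pop = {}
    for population, pair in participants.values():
        for group in (population, "ALL"):
            counts_by_pop.setdefault(group, Counter()).update(pair)
    result = {}
    for group, counts in counts_by_pop.items():
        total = sum(counts.values())  # two copies per participant
        result[group] = {h: (n, n / total) for h, n in counts.items()}
    return result


for group, table in haplotype_frequencies(PARTICIPANTS).items():
    print(group, table)
```

With four hypothetical participants, the copy counts in the "ALL" group add up to eight, twice the number of participants, and a homozygous participant such as p2 contributes two copies of the same haplotype.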

Understanding the composition of haplotypes, and how these grouped alleles track through populations, yields insight into selective pressures and helps link genotypes to functionally relevant phenotypes. By incorporating these data, Ensembl can provide a better representation of sequence variation and its functional consequences.

Ensembl Blog

News about the Ensembl Project and its genome browser