The Genome Reference Consortium (GRC) plans to release a new human assembly (GRCh38) later this year. Why is an update needed? The current human assembly (GRCh37), used by Ensembl, UCSC and NCBI, is already of high quality: the GRC reports an error rate of about 1 in 100,000 bases. However, the assembly still contains gaps, and a number of difficult loci remain unresolved. The new assembly addresses many of these issues. More reasons for the update can be found on the GRC blog.
The new assembly will be available in Ensembl next year (the third or fourth quarter of 2014). What happens between the public release of the updated human assembly and the Ensembl release of the annotated genome? A series of posts from our team will cover the work required to annotate genes, variants, and more to a high standard. These articles will describe our efforts not only to deliver high-quality annotation, but also to integrate the data in a way that is useful to our users. At least one post each month will cover our analysis of the genome in the following areas:
- Release cycle (how does it work?)
- The GRCh38 assembly
- The new regulation build: integration of ENCODE and Blueprint data
- Coordinate mapping from one assembly version to another
- Processing dbSNP and other sources of variants
- High-quality genes: annotating the GENCODE set
- Updates to the Variant Effect Predictor (VEP)
- Determining stable IDs for the new gene set
- Whole genome alignments: pairing up the new human assembly with other species
- Quality control: how do we check our data?
Keep an eye on our blog for more posts in this series, filed under the category “GRCh38 Ensembl”.