I am writing in my capacity as leader of the Ensembl project at the European Molecular Biology Laboratory’s European Bioinformatics Institute (EMBL-EBI), located near Cambridge, England. Ensembl is one of the world’s leading sources of genome information and a central aggregation point for genomic data.

Today the Wellcome Trust Sanger Institute and the European Bioinformatics Institute (EMBL-EBI) announced plans to reorganise the Ensembl project so that we can best leverage the strengths of Ensembl’s parent institutes to capitalise on emerging opportunities in genomics.

From our users’ perspective, all existing Ensembl services including our genome browser, APIs and regular releases will continue as usual. Behind the scenes, the current Ensembl services will be consolidated at EMBL-EBI to help us strengthen the existing resources and facilitate closer links with UniProt, Ensembl Genomes and the Expression Atlas, which are all based at EMBL-EBI. Ensembl will also continue to be closely involved in the GENCODE project coordinated at Sanger.

New Ensembl activities that focus on novel methods for storing and representing human variation will be based at the Sanger Institute. These efforts will be aligned with the aims of the global alliance for the secure sharing of genomic and clinical data. As part of the reorganisation, Ensembl will also be connected more closely with the DECIPHER project. DECIPHER is an interactive, web-based service to support sharing of likely functional, rare clinical variation and engagement with the clinical community.  These connections will improve clinically relevant access to the rich genome annotations provided within Ensembl.

This is an extraordinary time for genomics. Ensembl has supported and contributed to the dramatic advances in genomics over almost 15 years and we are excited about what the future holds.

In collaboration with the Neandertal Genome Project, we have created an Ensembl-style browser of the Neandertal data available at http://projects.ensembl.org/neandertal. A draft sequence of the Neandertal genome was published in the May 7 issue of Science.

The Neandertal browser includes the ability to visualise the Neandertal data using the new Resembl code developed in collaboration with Illumina. The Resembl code will be introduced in the 1000 Genomes browser later this month and in Ensembl over the summer.

Data include:
– Neandertal sequencing reads from all 6 Neandertal fossils
– Neandertal contigs/consensus from all individuals combined
– Modern human sequencing reads to put the divergence of the Neandertal genomes into perspective
– Selective sweep scan to detect positive selection in early modern humans
– A catalog of changes consisting of Neandertal alleles for positions of non-synonymous difference between human and chimpanzee

Full details of the data types and instructions for using our new display tools are available on the data information page.

Links are also provided from the Neandertal Browser home page to the raw sequence data stored at the EBI for the Neandertal genome project and the modern human genome data.

Further information about the project is available from the project page at the Max Planck Institute for Evolutionary Anthropology, from the genome paper and from other companion papers in the same issue of Science.

We thank Janet Kelso, Ed Green and Udo Stenzel at the MPI for assistance and Eugene Kulesha at the EBI for work to create the Neandertal browser.

Ensembl release 57 has been rescheduled for mid to late February 2010.

We had originally planned for release 57 to be this week, but our final quality checks identified a significant error in the unreleased data set. Because of this, we feel that our users would be better served by rescheduling the release to ensure that we provide the best possible data resources for the community.

On behalf of everyone in the project, thank you for your continued support of Ensembl and we wish you all the very best for the holiday season and the new year.

Ensembl is currently down due to a power outage in the data centre at the Sanger Institute last night. Power has been restored, but it will take some time to restore all of the services.

We are working to get things up and running and expect that Ensembl will be back mid to late morning UK time.

Ensembl has begun to incorporate data from genome-wide association studies. These data are being added in coordination with the European Genotype Archive, a new database resource at the EBI designed to provide a permanent archive for human variation data that is not available for unlimited public release because of ethical or individual privacy restrictions. The European Genotype Archive has recently launched with the raw data from the Wellcome Trust Case Control Consortium (WTCCC. 2007. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 447:661-678). In the future the EGA will provide additional array-based genotype data as well as data from re-sequencing and CNV studies. The EGA will also contain phenotype data.

Ensembl is incorporating summary data from genome-wide association studies represented in the EGA. The data generally represent the p-value for the association of each tested SNP (single nucleotide polymorphism) with the given phenotype.

The WTCCC summary data are now available in Ensembl as DAS tracks selectable from the “DAS Sources” menu on the CytoView and ContigView pages. The following menu items provide access to data for bipolar disorder (BD), coronary artery disease (CAD), Crohn’s disease (CD), hypertension (HT), type 1 diabetes (T1D) and type 2 diabetes (T2D):

WTCCC BD
WTCCC CAD
WTCCC CD
WTCCC HT
WTCCC T1D
WTCCC T2D

In future releases, GWAS data will be integrated into the Ensembl variation databases.

We will be adding additional data to both Ensembl and the European Genotype Archive as the data become available. We hope you find these new data resources useful.

Today the 1000 Genomes project was announced. By any measure this is a big deal.
The goal is simple: to create the most comprehensive and medically useful collection of human variation ever assembled by producing approximately 6 terabases of sequence. To put this amount of data in perspective, 6 terabases is more than 60 times the amount of data currently available in the DDBJ/GenBank/EMBL Archive, and that took more than 25 years to collect. At the peak production of the 1000 Genomes project, more than 8 billion basepairs per day will be sequenced. That is the data output of the entire human genome project every week. All made publicly available.
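As a rough sanity check on those numbers, here is a back-of-envelope sketch in Python. The three headline figures come straight from the paragraph above; the genome size of about 3 gigabases is an assumption added for illustration, and the function names are hypothetical. Treat the results as order-of-magnitude estimates only.

```python
# Back-of-envelope arithmetic for the figures quoted in the post.
TARGET_BASES = 6e12        # ~6 terabases of sequence (from the post)
ARCHIVE_RATIO = 60         # "more than 60 times" the archive (from the post)
PEAK_BASES_PER_DAY = 8e9   # "more than 8 billion basepairs per day" (from the post)
HUMAN_GENOME_BASES = 3e9   # ASSUMPTION: roughly 3 gigabases per human genome


def implied_archive_bases(target_bases, ratio):
    """Upper bound on the archive size implied by 'target is ratio times larger'."""
    return target_bases / ratio


def days_at_peak(target_bases, bases_per_day):
    """Days needed to produce the target if peak throughput were sustained."""
    return target_bases / bases_per_day


def genome_equivalents_per_week(bases_per_day, genome_bases):
    """Raw bases produced in a week, expressed in genome-sized units."""
    return bases_per_day * 7 / genome_bases


if __name__ == "__main__":
    archive = implied_archive_bases(TARGET_BASES, ARCHIVE_RATIO)
    days = days_at_peak(TARGET_BASES, PEAK_BASES_PER_DAY)
    weekly = genome_equivalents_per_week(PEAK_BASES_PER_DAY, HUMAN_GENOME_BASES)
    print(f"Implied archive size: at most {archive / 1e9:.0f} gigabases")
    print(f"Days at sustained peak rate: {days:.0f}")
    print(f"Genome-sized units of raw sequence per week: {weekly:.1f}")
```

Under these assumptions the archive comes out at no more than about 100 gigabases, the full 6 terabases would take about 750 days at a sustained peak rate, and a week at peak yields a little under 19 genome-sized units of raw sequence.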
The data generation rate and the short read length mean that the bioinformatics requirements for the project are equally ambitious (or terrifying, depending on your point of view). The EBI and NCBI, working together, are creating a joint DCC (data coordination centre) to collect, organise and provide the data to the world. Steve Sherry at the NCBI and I are eager to take this on.
At Ensembl we’ve been expecting this development and built support for re-sequencing data into our variation database a couple of years ago. So far, we have data for about 6 humans, 5 mouse strains, and a smattering of rat data. Small stuff compared to six months from now, but large enough that we have both experience and confidence dealing with the large-scale resequencing data. We are probably going to need both.
Check out more at http://www.1000genomes.org