DNA methylation is an epigenetic mark known to be important in many biological processes, notably in the regulation of gene expression.

In addition to the MeDIP-Chip methylation datasets for 17 cell lines and tissues from Rakyan et al. (2008), Ensembl now also provides RRBS (Reduced Representation Bisulfite Sequencing) data for 44 cell lines from ENCODE, in the form of DAS tracks. This data provides genome-scale information, at single-nucleotide resolution, on the methylation state of the genome in each cell line.

To visualize this data in the Location View, click ‘Configure this page’ in the left panel and select the appropriate DNA Methylation tracks within the Regulation section of the configuration panel.

The methylation state is indicated by a color gradient running from dark blue (highly methylated), through green, to yellow (low methylation).

We merged replicates by taking a weighted average of their methylation percentages, keeping only positions covered by at least 20 reads in the combined set.
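As a rough illustration of this merging step, here is a minimal Python sketch. It assumes that each replicate is weighted by its read count at a position, which the text does not state explicitly, and the function name, the dictionary layout and the position keys are hypothetical rather than part of any Ensembl or ENCODE tool.

```python
from collections import defaultdict

MIN_COMBINED_READS = 20  # threshold stated in the text


def merge_replicates(replicates, min_reads=MIN_COMBINED_READS):
    """Merge per-position methylation calls from several replicates.

    Each replicate is a dict mapping a position key, e.g. ("chr1", 10468),
    to a tuple (read_count, percent_methylated).
    Assumption: the weight of each replicate is its read count.
    """
    total_reads = defaultdict(int)
    weighted_sum = defaultdict(float)

    for replicate in replicates:
        for position, (reads, percent) in replicate.items():
            total_reads[position] += reads
            weighted_sum[position] += reads * percent

    merged = {}
    for position, reads in total_reads.items():
        # Keep only positions with enough reads across all replicates.
        if reads >= min_reads:
            merged[position] = (reads, weighted_sum[position] / reads)
    return merged
```

With this weighting, a position covered by 10 reads at 80% in one replicate and 30 reads at 40% in another is reported as 50% methylated over 40 reads, while a position with only 15 combined reads is dropped.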

More information regarding the ENCODE datasets can be seen here. We follow the ENCODE data policies regarding data usage.

We are glad to announce the launch of our latest installment.

Ensembl Release 63 (e63) includes a new high-coverage assembly for microbat (Myotis lucifugus), the most recent human and zebrafish manual gene annotations from Havana, and a fresh update of mouse variation data, among numerous other additions. The previous Ensembl release is archived at e62.ensembl.org.

Tracks on the Region in detail and Region overview pages can now be reordered by dragging them to a new position on the image. The strand of each track can still be identified by a colour and a text message shown when the mouse passes over the track bar.

The popular Variant Effect Predictor (VEP) tool has been updated in e63, with speed improvements and renewed support for variants that fall in regulatory regions.

Pie charts have been added to the human variation pages for the 1000 Genomes population allele frequencies.


A new configuration table facilitates the exploration of regulatory data, including the ability to search for specific markers of interest. To access this functionality, click on ‘Configure this page’ while on a Location View or a Regulatory Region View and select ‘Regulatory Evidence’.

A new genome assembly takes microbat from low to high coverage. A new genebuild has been performed on this assembly using the Ensembl gene annotation pipeline.

Users of our Perl API will certainly enjoy the new Doxygen-based API documentation, with an improved user interface, better support for object-oriented programming and a comprehensive search tool. There is also an updated Regulation API tutorial to help users access regulatory data programmatically.

More details on some of these changes will be posted soon, so keep an eye on our blog!

More information is also available on the Ensembl website.

Ensembl provides annotations indicating regions in the genome that are experimentally verified to be bound by transcription factors (from ChIP-Seq experiments). Within these regions, we now also provide precise transcription factor binding sites. To generate these binding sites, we make use of publicly available Position Weight Matrices (PWMs) from JASPAR.

Transcription factor binding sites can be seen as black boxes in the Regulatory Features track. If you click on a Regulatory Feature you can see information regarding the binding sites contained within that regulatory feature. This includes the binding matrix used and a binding score representing how well a particular site matches the binding matrix. Clicking on a specific black box within the regulatory feature will highlight the corresponding information on the menu (the darker blue line in the figure showing information for a CTCF binding site). Transcription factor binding sites are also displayed as evidence for a regulatory feature (as ‘Core PWM’ entries).

To generate these PWM matches, we take JASPAR matrices and find matches throughout the genome. We then use experimental binding data to stringently select high-confidence binding sites that fall within regions enriched in ChIP-Seq experiments for the corresponding factor. More details on this process can be found here.
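The two steps described above (scanning with a matrix, then keeping only matches inside ChIP-Seq enriched regions) can be sketched in Python as follows. This is an illustrative sketch only, not the Ensembl pipeline: the log-odds scoring, the pseudocount, the uniform background, the fixed score threshold and all function names are assumptions, and only the forward strand of an uppercase sequence is scanned.

```python
import math

BASES = "ACGT"


def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Turn a count matrix (one dict of base -> count per column)
    into log2-odds scores against a uniform background (assumed)."""
    matrix = []
    for column in counts:
        total = sum(column[base] for base in BASES) + 4 * pseudocount
        matrix.append({
            base: math.log2(((column[base] + pseudocount) / total) / background)
            for base in BASES
        })
    return matrix


def scan(sequence, matrix, threshold):
    """Return (start, end, score) for every forward-strand window whose
    summed score is at or above the threshold (half-open coordinates)."""
    width = len(matrix)
    matches = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in BASES for base in window):
            continue  # skip windows containing N or other symbols
        score = sum(column[base] for column, base in zip(matrix, window))
        if score >= threshold:
            matches.append((start, start + width, score))
    return matches


def within_peaks(matches, peaks):
    """Keep only matches fully contained in a ChIP-Seq enriched region.
    Peaks are (start, end) half-open intervals for the same factor."""
    return [
        match for match in matches
        if any(peak_start <= match[0] and match[1] <= peak_end
               for peak_start, peak_end in peaks)
    ]
```

The score returned by `scan` plays the role of the binding score mentioned earlier: the higher it is, the better the site matches the matrix. How the real thresholds are chosen from the experimental data is not covered by this sketch.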

Since release 58, the Ensembl Regulatory Build has been cell type specific. Regulatory Features are defined as sites of open chromatin which are potentially involved in gene regulation. These are built using data from different cell types, resulting in differing structures, attributes and classifications across the various cell types.

Release 59 also boasts greater coverage due to the incorporation of more data sets (see previous blog post), and a new ‘projection’ methodology. Projection allows Regulatory Features to be built on cell lines with sparse data, and also consolidates existing higher quality builds. The result of these changes is an increase in the number of Regulatory Features per cell type, as well as an increase in the number of features that are assigned a classification, such as Gene Associated or Promoter Associated.

The ‘Regulation’ panel has also been updated to reflect the new cell type specific nature of the build (see, for example, ENSR00000515919). The ‘Details by cell line’ view is now split into several sections, each showing data for a specific cell line. The uppermost section details a species ‘MultiCell’ cell line, showing data used to define the core regions of the given Regulatory Feature. More information on this view can be found here. We are always looking at ways to improve our Regulatory Build process; some of the areas we are currently considering are listed in our development road map.