Ensembl is holding a workshop for developers titled ‘Introduction to automatic gene annotation’. The workshop runs on 29-30 October 2013 at Cold Spring Harbor Laboratory, New York.

Registration for this workshop is free, but participants will need to cover their own accommodation and meal expenses. Please contact Bert (bert@ebi.ac.uk) for more details or to register.

Two Ensembl developers will present sessions on how to create your own core database, including the loading of a genome assembly into a database and the running of simple analyses using the Ensembl genebuild system.

Participants are expected to have programming experience, including a background in object-oriented programming. Good familiarity with Perl, a Unix/Linux environment, and MySQL is essential to follow the workshop and its programming examples. Knowledge of the Ensembl core API is also essential.

Topics to be presented:

  • Introduction to the Ensembl genebuild system, including data input types, generating protein-coding transcript models, and adding UTRs to these models
  • An introduction to assembly structure (toplevel, contigs, scaffolds, chromosomes)
  • Overview of the Ensembl Analysis and Pipeline APIs
  • Obtaining the Ensembl API (cvs checkout)
  • Core database schema
  • Tracking jobs in the system
  • Runnable and RunnableDB modules

Practical sessions:

  • Creating a genebuild database
  • Loading an assembly into the database
  • Running algorithms first on the command line and then using the pipeline
  • Understanding how the pipeline code interacts with the algorithms and the database
  • Understanding the pipeline’s job tracking system
  • Visualisation of results with Apollo

Would you like to join us? Please contact Bert (bert@ebi.ac.uk) for more details or to register.

Related Cold Spring Harbor Conference:
Genome Informatics 2013, 30 October to 2 November, Cold Spring Harbor, New York.

The Genome Reference Consortium (GRC) plans to release a new human assembly (GRCh38) later this year. What is the reason for an update? The current human assembly in Ensembl, UCSC and NCBI (GRCh37) is already of high quality: the GRC reports an error rate of about 1 in 100,000 bases. However, gaps remain in the assembly and a number of difficult loci are still to be resolved. The new update addresses many of these issues. More reasons for the update can be found on the GRC blog.

The new assembly will be available in Ensembl next year (the third or fourth quarter of 2014). What happens between the public release of the updated human assembly and the Ensembl release of the annotated genome? A series of posts from our team will cover the work required to annotate genes, variants, and more to a high standard. These articles will reflect our efforts not only to deliver high-quality annotation, but also to integrate the data in a way that is useful for our users. At least one post each month will reflect our thorough analysis of the genome in the following areas:

  • Release cycle (how does it work?)
  • The GRCh38 assembly
  • The new regulation build: integration of ENCODE and Blueprint data
  • Coordinate mapping from one assembly version to another
  • Processing dbSNP and other sources of variants
  • High-quality genes: annotating the GENCODE set
  • Updates to the Variant Effect Predictor (VEP)
  • Determining stable ids for the new gene set
  • Whole genome alignments: pairing up the new human assembly with other species
  • Quality control: how do we check our data?

Keep your eye on our blog for more posts in this series, marked with the category “GRCh38 Ensembl”.

Are the Ensembl databases and API a core part of your research? Did you develop tools and scripts to get exactly what you need from these resources? Why not share them with the community?

Ensembl are pleased to announce e!code, a directory of programming resources for use with the Ensembl datasets and codebase. It currently includes a selection of VEP plugins developed by the Ensembl team plus external contributions such as the Java API JEnsembl.

Please note that this new site is not a repository, only a central listing of available resources. If you would like to contribute, please be sure to read our contributor guidelines.

Visit our e!code mini-site (part of this blog) for more information!