We have just released the latest update of Ensembl Genomes.

The highlights of our new release are:

New Genomes

Ensembl Plants

Ensembl Fungi

Seven additional plant pathogens:

Ensembl Protists

Ensembl Metazoa

Ensembl Bacteria

  • More than 4,000 new genomes (15,944 genomes now in total)

New Data

Ensembl Metazoa

Ensembl Protists

  • DNA alignments: Pythium sp and Phytophthora sp

Ensembl Plants

New variation data in maize

  • Transcriptome and whole genome alignments: T. aestivum

Updated Data

In addition to New Genomes and New Data, we have also updated BioMarts, gene trees and comparative genomics in this new release.

Get in touch if you have any questions or comments.

The Ensembl Genomes team!

Do you want to annotate genes and transcripts of your favourite genome?
Will you be in Cambridge (UK) for the Genome Informatics 2014 meeting?
Have you worked with the Unix command line?

If your answer is yes to any of the above, you may want to attend our ‘Introduction to Ensembl automatic gene annotation’ workshop on 19-20th of September 2014. Registration is free, but participants need to cover their own accommodation, sustenance and transport expenses.

THIS COURSE IS NOW FULL. Registration is closed.

The workshop

Dan and Fergal from the Ensembl Genebuild team will show how to create your own core database for genome annotation, load a genome assembly and run some of the analyses using the Ensembl genebuild system.

Pre-requisites

Unix (or Linux) knowledge is mandatory. Participants are also expected to have some knowledge of relational databases (e.g. MySQL) and object-oriented programming (the Ensembl API uses Perl).

Topics

  • Introduction to the Ensembl genebuild system, including data input types, how to generate protein-coding transcript models, and add UTR to these models
  • Introduction to assembly structure (toplevel, contigs, scaffolds, chromosomes)
  • Core database schema
  • Tracking jobs in the system
  • Runnable and RunnableDB modules

Practical sessions

  • Creating a genebuild database
  • Loading an assembly into the database
  • Running algorithms first on the command line and then using the pipeline
  • Understanding how the pipeline code interacts with the algorithms and the database
  • Understanding the pipeline’s job tracking system
  • Visualisation of results with Apollo

Genome Informatics 2014

Our Ensembl Gene Annotation workshop will precede this year’s Genome Informatics conference taking place in Cambridge (UK) on 21-24th September.

Please see the Genome Informatics 2014 website for more details, including deadlines and programme.

We have just released the latest update of Ensembl Genomes.
The highlights of this release are:

New genomes

Wild rice, plant pathogens and peach are some of our new genomes

Ensembl Plants

Amborella trichopoda
Prunus persica (peach)
Oryza (rice) genomes

Ensembl Protists

Bigelowiella natans
Pythium (plant pathogens) genomes

Ensembl Metazoa

Onchocerca volvulus

Ensembl Bacteria

1,246 additional genomes (11,010 genomes now represented in total)

New data

Ensembl Plants

Whole genome alignments

Polyploid view and whole genome alignments in wheat (genomes A, B and D)

EST alignments
  • Prunus persica
  • wheat EST against bread wheat and its progenitors
Sorghum bicolor variation data based on the SAP panel (Morris et al. 2013)
Homoeologous genes between the wheat component A, B and D genomes

Updated data

Ensembl Fungi

  • Latest annotations for S. pombe from PomBase
  • GO annotation projected from S. cerevisiae and S. pombe to all species

Ensembl Metazoa

  • Latest EST-based genes for many species
  • Removal of haplotype regions in Anopheles gambiae
  • Peptide comparative genomes

Ensembl Plants

  • Updated UniGene alignments in T. aestivum against its progenitors
  • Peptide comparative genomes

Get in touch if you have any questions or comments.

The Ensembl Genomes team.

In our latest release of Ensembl, we launched a brand new web interface for the VEP (Variant Effect Predictor).

As “it says on the tin”, the VEP predicts the effect of variants (i.e. SNPs, indels and CNVs) on genes and regulatory elements. It tells you where your variants are located (e.g. introns, coding exons, transcription factor binding motifs), what effect they may have on protein coding sequences, and whether these effects might be deleterious or benign.

The VEP does this by mapping your variants against genes, transcripts, translations, and regulatory features that we annotate in Ensembl.
The Variant Effect Predictor can also be run against other gene sets: you can predict the effect of your variants on RefSeq genes too!
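
To make the idea of "mapping variants against transcripts" concrete, here is a minimal toy sketch. It is not the VEP's own code or algorithm: the function name, the exon coordinates and the three labels are all assumptions made up for illustration. It simply checks whether a position falls in an exon, in an intron, or outside a single transcript.

```python
from typing import List, Tuple

# Hypothetical representation: an exon is a 1-based, inclusive (start, end) pair.
Exon = Tuple[int, int]


def locate_variant(position: int, exons: List[Exon]) -> str:
    """Return 'exon', 'intron' or 'outside' for a position relative to one transcript."""
    if not exons:
        return "outside"
    ordered = sorted(exons)
    transcript_start = ordered[0][0]
    transcript_end = max(end for _, end in ordered)
    # Anything before the first exon or after the last one is outside the transcript.
    if position < transcript_start or position > transcript_end:
        return "outside"
    # Inside the transcript span: either within an exon...
    for start, end in ordered:
        if start <= position <= end:
            return "exon"
    # ...or in the gap between two exons.
    return "intron"


# Made-up transcript with three exons (illustrative coordinates only).
toy_exons = [(100, 200), (300, 400), (500, 650)]
for pos in (150, 250, 700):
    print(pos, locate_variant(pos, toy_exons))
```

The real tool goes much further, for example by working out the effect on the protein sequence and checking regulatory features, so treat this only as a picture of the first step.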

What is new?

The new web interface is more user-friendly and has lots of improvements:

Increased number of variants you can input

You can upload up to one million variants in a compressed format, with a 50MB file size limit. To upload these larger files, you simply need to log in. If you do not have an Ensembl account, you are missing out, as there are many perks of registering. It’s easy to do: just provide your name and email address. If you’d rather not register, the upload limit drops down to 5 MB, i.e. around 100,000 variants.

Display of results

We provide a summary statistics table and pie charts illustrating the different SO terms and the classes of coding consequences for the variants you input.

The new web interface provides user-friendly pie charts and summary statistics.

The results preview table with additional details is shown after the pie charts. You can apply a range of filters to any of the data fields and limit the results you see. The full or filtered results can be downloaded as VCF or tab-delimited text for import into Excel.

The new ticket tracker

You can run several jobs at the same time and come back to them at a later date via the ticket numbers assigned to them. You can easily edit and re-run previous jobs. These jobs will be kept on our servers for 30 days. If you register, though, the jobs will be kept for as long as you like.

Population data from the NHLBI Exome Sequencing Project (ESP)

Population frequency data for the 1000 Genomes and ESP projects.

The VEP provides frequency data for known variants from both the 1000 Genomes and NHLBI exome sequencing projects.

You can also use this frequency data to filter your variants: you may wish to exclude known variants with a frequency above 1%, for example.
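
If you download your results and want to apply the same kind of filter yourself, the logic could look like the sketch below. The record layout (a dictionary with `id` and `freq` keys) is an assumption for illustration and is not the actual VEP output format; variants with no known frequency are treated here as novel and kept.

```python
from typing import Dict, List


def filter_rare(records: List[Dict[str, object]], max_freq: float = 0.01) -> List[Dict[str, object]]:
    """Keep variants that have no known frequency or a frequency at or below max_freq."""
    kept = []
    for rec in records:
        freq = rec.get("freq")  # hypothetical field name
        if freq is None or freq <= max_freq:
            kept.append(rec)
    return kept


# Made-up records: one common variant, one rare variant, one without frequency data.
toy_records = [
    {"id": "var_a", "freq": 0.25},
    {"id": "var_b", "freq": 0.004},
    {"id": "var_c", "freq": None},
]
rare = filter_rare(toy_records)
print([rec["id"] for rec in rare])
```

The 1% threshold is the example used above; pass a different `max_freq` to change it.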

VEP results are linked to BioMart

The results table in the VEP is now directly linked to BioMart, a data export tool.
This allows you to retrieve additional data about known variants or the genes your variants affect.

You just need to select the attributes in BioMart, e.g. phenotype, orthologues, Gene Ontology terms, and you are ready to go.

Other ways to access the VEP

If you use a command line, you can run the VEP with our script on your own computer. With the Perl script, you can do everything you can do in the online version plus much, much more! It’s the most powerful way to use the VEP.

A couple of the VEP's functions (e.g. fetching variant consequences) are also available in the beta version of our language-agnostic REST API.
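
As a rough sketch of what a REST request for variant consequences can look like from Python: the server address and the `/vep/:species/id/:id` path below are assumptions based on the present-day public Ensembl REST service, and they may not match the beta version described in this post. The helper names are made up for illustration.

```python
import json
import urllib.request

# Assumption: address of the current public REST server, not the beta server.
REST_SERVER = "https://rest.ensembl.org"


def vep_id_url(species: str, variant_id: str) -> str:
    """Build the URL for looking up consequences of a known variant by its identifier."""
    return f"{REST_SERVER}/vep/{species}/id/{variant_id}"


def fetch_consequences(species: str, variant_id: str):
    """Send the request and return the decoded JSON response (needs network access)."""
    request = urllib.request.Request(
        vep_id_url(species, variant_id),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the URL does not contact the server.
print(vep_id_url("human", "rs699"))
```

Calling `fetch_consequences("human", "rs699")` would perform the actual request. Because the interface is language agnostic, the same URL can be used from any language or tool that speaks HTTP.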

Help on the VEP

Have a look at our video on the new online VEP interface and our documentation pages for help on the web interface and script versions of VEP.

If you have questions or comments, please get in touch with us.

Why did my gene change?

As a member of the Ensembl Outreach team, who is actively involved with training and user support, I often have to answer the question, “Why did the annotation of my favourite gene change?”

There are a few driving forces behind the changes in the annotation of any given gene.  Two of those are the growing number of sequences that are deposited in sequence databases nightly, and the updates to the genomic assembly of a given species.  Regardless of the reason, changes and improvements will result in a revised and refined annotation of our Ensembl geneset.

However, clinical researchers in particular may prefer to work in a more controlled, less changeable environment. You may wonder then: “Is there annotation in Ensembl that won’t change?”  Yes, there is!

Are there genes that don’t change?

There is a set of gene sequences where changes are strictly prohibited.
These are the LRG records. An LRG or Locus Reference Genomic has a fixed and stable reference sequence for reporting and diagnosing variants that cause diseases in humans.

More than 700 records have been annotated so far. They have been mapped everywhere in the human genome, with the exception of the Y chromosome and the mtDNA.

LRG loci currently annotated in human.

The majority of these 700 records (59%) are publicly available on the LRG website, whereas the remainder are still in the validation phase, which is carried out manually by LRG curators. The ultimate goal is to provide an LRG record for every single protein-coding gene in the human genome. It’s certainly a mammoth task, so genes with clinical implications will be prioritised.

Summary information of LRG_293 (BRCA2 gene)

Stable sequences allow clinical geneticists and the research community to report their variation data in a more controlled and stable framework. They will be able to perform consistent comparisons of variants reported in LRG coordinates against other databases and therefore be better equipped when diagnosing diseases.

Viewing LRGs in Ensembl

In addition to the LRG website, clinical geneticists and others can investigate any of the public LRG records in Ensembl too, where they can be viewed in the context of our comprehensive annotation of genes, variants and regulatory features, among many others.

Use Ensembl to search for an LRG and get all the variants that map to it. You can then check the functional impact of these variants. For more tips on how to investigate LRG records in Ensembl, contact us.

Variation consequences calculated with VEP.

Can I request my own LRG?

You can request an LRG record for a clinically relevant gene. For more details on how to submit the request, have a look at the LRG request page.

The Cold Spring Harbor Laboratory will be hosting a winter conference on Avian Model Systems in March this year, and the abstract deadline is fast approaching.

Prior to the meeting, the EMBL-EBI and the WTSI will run a two-day workshop on Avian Genomics with a focus on analyses of NGS data, such as RNA-Seq, ChIP-Seq, and on Ensembl Genome Browsing.

In the current version of Ensembl (release 74, December 2013), we provide detailed annotations of genes, transcripts and proteins for five birds, namely chicken, duck, zebra finch, flycatcher and turkey. On our Pre Ensembl site, we also display the preliminary analysis of the budgerigar genome.

Our gene annotation in Ensembl is built based on biological evidence that has been experimentally validated, such as mRNA, ESTs and proteins. For two out of the five birds listed above (i.e. chicken and flycatcher), we also used RNA-Seq data for the annotation of their genomes.

During this Ensembl Browser Workshop, we will be navigating the Ensembl browser to cover gene annotation, variation and comparative genomics data, and we will also introduce some of our genomic tools, such as BioMart and the VEP.

The deadline for abstract submission to the Meeting and the Pre-Meeting Genomic Workshop is January 24th.

If you want to attend this workshop, please contact Val Pakaluk.

 

We have just released the latest update of Ensembl Genomes.

This release contains the chromosome survey sequence for bread wheat cv Chinese Spring, generated by the International Wheat Genome Sequencing Consortium. More than 100,000 protein-coding genes predicted by MIPS are now available for the first time in Ensembl Plants. We also provide in this release functional annotation, genome alignments, and inferred evolutionary histories for bread wheat.

The other highlights of release 21 are:

New genomes

Ensembl Fungi

Verticillium dahliae and three new Schizosaccharomyces species (S. cryophilus, S. octosporus and S. japonicus)

Ensembl Protists

Emiliania huxleyi

Ensembl Metazoa

Dendroctonus ponderosae and Solenopsis invicta

Ensembl Bacteria

675 new genomes added (9,764 genomes now represented in total)

New data

Ensembl Fungi

Pairwise DNA alignments for Schizosaccharomyces and Magnaporthe species, and for Ustilago maydis and Sporisorium reilianum

Ensembl Protists

Pairwise alignments for Alveolates, Stramenopiles, Amoebozoa and Kinetoplastida

Ensembl Plants

  • Gene models for T. aestivum chromosome survey sequences, as described above
  • Additional wheat RNA-Seq data.
  • DNA-DNA alignments between bread wheat, rice, and Brachypodium distachyon.

MSU-7 gene models for the rice genome are now available, allowing visual comparison between this and the IRGSP set. Cross-references between the two gene sets are provided by rap-db, so searching and querying using either identifier space is possible.

Overview of MSU and IRGSP gene models in rice.

Updated data

Ensembl Fungi

  • Peptide comparative genomics and BioMart
  • Schizosaccharomyces pombe annotations from PomBase

Ensembl Metazoa

Peptide comparative genomics and BioMart

Ensembl Plants

  • Updated gene trees to include T. aestivum gene models
  • Updated GO terms for O. sativa, projected to other species
  • Updated repeat analysis (TREP) for wheat genomes

If you have any questions or comments, get in touch.

The Ensembl Genomes Team

Monday morning in London: hustle and bustle of the start of the working week.
Packed trains and tube. Huge crowds on their customary way to work.

Congestion on the London underground.

I too (armed with my laptop, a course booklet and a pen) was on my way to London to deliver training in Ensembl.

Aida Santaolalla and Anita Grigoriadis organised this Ensembl browser workshop for the cancer research community at Guy’s hospital, a large NHS hospital with breathtaking views of the River Thames.

View of London from Guy’s hospital.

We had 18 participants with mixed professional roles within the hospital: postdocs, PhD students, staff researchers, principal investigators, and a physician. All had one research theme in mind: cancer.

Cancer is a broad range of diseases characterised by unregulated cell growth due to varied and complex causes, one of them being genetic predisposition. During the workshop, I demonstrated, among other things, how Ensembl can be used to find genes and variants (i.e. SNPs, CNVs and indels) that may confer such genetic predisposition. The location of variants that are associated with breast cancer, for instance, can be seen in Ensembl.

Among the attendees, 40% were completely new to Ensembl and a whopping 80% had never used Ensembl BioMart before. The comments received in my feedback survey were very rewarding: ‘This was a very good workshop where I learned the basics of Ensembl software’, ‘This workshop was particularly useful in pointing out tools in Ensembl that I wouldn’t have known existed through personal investigation.’ And better still, every single participant indicated they would use Ensembl and Ensembl BioMart more often after this workshop, and they would recommend the workshop to a colleague.

Ensembl workshop delivered. Happy participants. Great feedback. Job done.
The day was far from over though. I still had to make my way out of the city.

Monday evening in London: hustle and bustle of the end of the first day of the working week. Packed trains and tube: huge crowds on their customary way home.

Can’t deny I was chuffed to bits to go back to Ensembl HQ in quiet, tranquil Cambridgeshire.

Now I’m ready for my next adventure: University of Pavia, Italy.
Maybe I will come to your institute one day.

If you want to organise a training session in Ensembl, you can follow in the footsteps of our hostesses at Guy’s, who contacted us to organise this successful Ensembl workshop in London for the cancer research community.

Some of the new genomes in Ensembl Fungi and Ensembl Plants

We are thrilled to announce that the latest update of Ensembl Genomes has been released. The highlights of our new release are:

New genomes
Ensembl Fungi
Blumeria graminis and Microbotryum violaceum

Ensembl Plants
First release of the Chromosome Survey Sequence for bread wheat cv Chinese Spring generated by the International Wheat Genome Sequencing Consortium

Ensembl Bacteria
2,802 new genomes included (9,089 genomes now represented in total)

New data
Ensembl Metazoa
New variation data for Anopheles gambiae (from VectorBase), originating from the Wellcome Trust Sanger Institute’s Malaria Programme’s Anopheles gambiae Genome Variation Project.

Ensembl Bacteria
Operon data for E. coli K-12 imported from RegulonDB
New cross-references added to Rhea and MetaCyc, and the Enzyme Classification of the IUBMB

Updated assemblies
Ensembl Plants
Oryza sativa (IRGSP v1.0)

Ensembl Metazoa
Acyrthosiphon pisum (AphidBase 2.0)
Apis mellifera (BeeBase 4.0)
Strongylocentrotus purpuratus (SpBase 3.0)

Updated data
Ensembl Fungi, Ensembl Protists and Ensembl Plants
Manually curated ontology annotations in model organisms projected to orthologous genes
Cross-references to PHI-base added for 4 additional species of plant pathogens

Assembly converter
The Ensembl Genomes Assembly converter, which can map feature coordinates from one version of a genome assembly to another, is now also available for all species in Ensembl Metazoa, Fungi, Plants, and Protists.
It can be accessed online or programmatically (Perl API or REST service).
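
To give a feel for what mapping coordinates between assembly versions involves, here is a minimal toy sketch. It is not the Assembly converter's implementation: the block table is invented, and it ignores strand changes, gaps and features that span several blocks, all of which a real conversion has to handle.

```python
from typing import List, NamedTuple, Optional


class Block(NamedTuple):
    """A made-up aligned block: old coordinates [old_start, old_end] start at new_start."""
    old_start: int
    old_end: int
    new_start: int


def convert_position(pos: int, blocks: List[Block]) -> Optional[int]:
    """Map a 1-based position from the old assembly to the new one, or None if unmapped."""
    for block in blocks:
        if block.old_start <= pos <= block.old_end:
            # Keep the same offset from the start of the block.
            return block.new_start + (pos - block.old_start)
    return None


# Illustrative mapping: the second block has shifted by 500 bases in the new assembly.
toy_blocks = [Block(1, 1000, 1), Block(1001, 2000, 1501)]
for pos in (500, 1200, 2500):
    print(pos, convert_position(pos, toy_blocks))
```

A position that falls outside every block has no equivalent in the new assembly in this sketch, which is why the function returns `None` rather than guessing.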

Any questions or comments? Contact us.

The Ensembl Genomes Team

P.S. The public MySQL server will be available shortly.

Ensembl Genomes release 19 is out!

The Ensembl Genomes browser.

The release contains new and updated genomes and associated data, which can be accessed via the Ensembl Genomes website, the REST web service, the Perl APIs and the FTP site.

The detailed features of this new release are:

Ensembl Metazoa
* New genomes
Rhodnius prolixus (a hemipteran, imported from VectorBase)
Tetranychus urticae (two-spotted spider mite, a chelicerate)
Crassostrea gigas (Pacific oyster)
Lottia gigantea (owl limpet)
Capitella teleta (polychaete worm)
Helobdella robusta (leech)

* New data
Synteny, based on orthology, between Drosophila melanogaster and Anopheles gambiae

* Updated data
Peptide comparative genomics and BioMart

Ensembl Fungi
* Updated genomes
S. pombe (PomBase v36)

* New data
Transcript profiling RNA-Seq for Zymoseptoria tritici (SRA study SRP017084)

* Updated data
Annotations of pathogenic phenotype imported from PHI-base

Ensembl Protists
* New genomes
Paramecium tetraurelia (Ciliate)

* Updated data
Peptide comparative genomics and BioMart

Ensembl Plants
* New genomes
Triticum urartu (wheat A-genome progenitor)
Aegilops tauschii (wheat D-genome progenitor)

The two new wheat genomes.

* New data
Bread wheat genome alignments to barley

* Updated data
Barley variations and BioMart

 

Ensembl Bacteria
* No significant updates

Get in touch to let us know what you think of our new release.

Note: the public MySQL server and FTP site will be available shortly.

The Ensembl Genomes Team