Cool stuff the Ensembl VEP can do: analysis with RefSeq transcripts

By default, VEP uses the Ensembl/GENCODE transcript set when analysing your variants, but you can also choose to use NCBI’s RefSeq transcripts.

Ensembl and NCBI use slightly different annotation strategies, which results in two different but overlapping transcript sets. VEP therefore also has an option to analyse your variants against both sets at the same time.
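As a rough illustration of the three choices described above, the sketch below builds a VEP command line for each transcript set. It assumes VEP is installed as `vep` and run in cache mode, and that the matching RefSeq or merged cache has already been downloaded. The helper name `build_vep_command` and the file names are hypothetical. `--refseq` and `--merged` are the VEP flags that select the RefSeq and combined caches.

```python
import shlex

# Map each transcript set to the VEP flag that selects the matching cache.
# The default Ensembl/GENCODE set needs no extra flag.
TRANSCRIPT_SET_FLAGS = {
    "ensembl": [],
    "refseq": ["--refseq"],
    "merged": ["--merged"],
}


def build_vep_command(input_file, output_file, transcript_set="ensembl"):
    """Return a VEP argument list for the chosen transcript set.

    This is a hypothetical helper for illustration; it only assembles
    arguments and does not run VEP.
    """
    if transcript_set not in TRANSCRIPT_SET_FLAGS:
        raise ValueError(f"unknown transcript set: {transcript_set!r}")
    command = [
        "vep",
        "--cache",
        "--input_file", input_file,
        "--output_file", output_file,
    ]
    command += TRANSCRIPT_SET_FLAGS[transcript_set]
    return command


if __name__ == "__main__":
    # Print one shell-ready command per transcript set.
    for name in TRANSCRIPT_SET_FLAGS:
        args = build_vep_command("variants.vcf", f"annotated_{name}.txt", name)
        print(shlex.join(args))
```

The resulting list can be passed to `subprocess.run` once the relevant cache is in place. Keeping the flags in a lookup table makes it harder to combine `--refseq` and `--merged` by accident.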

A key difference in strategy is that Ensembl/GENCODE transcripts are built from the reference assembly, but RefSeq transcripts are not, so the latter do not necessarily match the genome. As variants are called against the genome, this can confound analysis. Where possible, NCBI creates alignments of transcripts to the genome. Ensembl VEP can take these into account* when predicting molecular consequences and deriving HGVS descriptions.

Ensembl and NCBI are now collaborating to define a single default transcript for each human gene. You can flag these ‘MANE’ (Matched Annotation from NCBI and EMBL-EBI) transcripts when analysing variants with the Ensembl/GENCODE transcript set. Some additional information, such as protein annotations and quality measures like APPRIS and TSL, is only available for Ensembl transcripts, but SIFT and PolyPhen-2 predictions are available for both transcript sets.

* These “corrections” to transcript sequences are in our human VEP caches and are used by default. For other species, the alignments can be downloaded from NCBI and used on the command line with the --bam option. Corrections have been applied in VEP consequence calling since Ensembl version 90, and as of version 100 alignments are also taken into account in our HGVS calculation.