
Trixie the Triceratops
Ensembl produce high-quality gene annotation for a number of species, but reaching the standard we expect takes time. This means there are many species and strains for which we don’t have annotation yet. If you’re working with a species without Ensembl annotation (like Trixie the Triceratops here), or even a specific strain that we don’t have, you can still use VEP to predict the effect of variants on genes and transcripts, using your own annotation. All you need is a GFF or GTF file of the transcripts and a FASTA file of the genome.
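Your files need to be formatted correctly, compressed and indexed before you can use them with VEP. For a GFF or GTF file, that means sorting the records by position, compressing the file with bgzip and indexing it with tabix. After that, all you need to do is specify the location of the files in your VEP command: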
./vep -i input.vcf --gff data.gff.gz --fasta genome.fa.gz
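Getting an annotation file into shape for the command above might look roughly like the sketch below. It assumes that bgzip and tabix (from htslib) are installed; the file names and the toy GFF records are made up purely so that the example runs from start to finish, and your own annotation would take their place.

```shell
# Hypothetical file name; substitute your own annotation file.
gff=data.gff

# Toy, deliberately unsorted GFF3 records (made-up data for illustration).
printf '##gff-version 3\n' > "$gff"
printf '1\ttoy\tgene\t500\t900\t.\t+\t.\tID=gene:G2;biotype=protein_coding\n' >> "$gff"
printf '1\ttoy\tgene\t100\t300\t.\t+\t.\tID=gene:G1;biotype=protein_coding\n' >> "$gff"

# Drop header/comment lines, then sort by sequence name, start and end.
sort_gff() {
  grep -v '^#' "$1" | sort -t "$(printf '\t')" -k1,1 -k4,4n -k5,5n
}

sort_gff "$gff" > data.sorted.gff

# Compress and index only if bgzip and tabix are available on the PATH.
if command -v bgzip >/dev/null 2>&1 && command -v tabix >/dev/null 2>&1; then
  bgzip -c data.sorted.gff > data.gff.gz
  tabix -f -p gff data.gff.gz   # writes the index data.gff.gz.tbi
fi
```

A GTF file would be handled the same way, since tabix uses the same gff preset for both. A real annotation file also needs transcript, exon and CDS records for VEP to predict consequences, which the toy records here leave out.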
The online VEP tool can only work with databases loaded into Ensembl, so for analyses with custom data you will need to run the VEP script on the command line.
Custom variant annotation
If you have separate data that you would like to add to your variants, on top of what VEP offers, you can include these too. For example, if you have a VCF file containing allele frequencies or phenotypes in the INFO column, VEP can check whether any of your variants match those in the file, and copy any annotation you specify across to the INFO column of your VEP output. If you compare your variants to a custom bigWig file, VEP can report the value of the wiggle track at the locus of each variant. You can also check whether your variants overlap any regions in a custom BED or GTF/GFF file.
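As a rough sketch of the VCF case, the snippet below builds a command using VEP's --custom option in its comma-separated form (file, short name, file type, annotation type, coordinate flag, then INFO fields to copy). The file frequencies.vcf.gz and the label MyFreqs are hypothetical, and the VCF is assumed to be bgzip-compressed and tabix-indexed. More recent VEP releases document a key=value form for this option, so check the documentation for the version you have installed.

```shell
# Hypothetical custom file and label; AF is the INFO field to copy across.
# "exact" asks for matching variants only; "0" leaves coordinate reporting off.
custom_arg="frequencies.vcf.gz,MyFreqs,vcf,exact,0,AF"

cmd="./vep -i input.vcf --custom $custom_arg"
echo "$cmd"

# Run the command only if a local VEP script and the input file are present.
if [ -x ./vep ] && [ -f input.vcf ]; then
  $cmd
fi
```

For a BED, GTF/GFF or bigWig file, the file type would change accordingly and the annotation type would typically be overlap rather than exact, with no INFO fields listed.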