Cool stuff the VEP can do: optimisation

Some Variant Effect Predictor (VEP) jobs are small, with ten or fewer variants, and those are easy. Others are big: variant calling on a single whole human genome can give you five million variants! The more variants you have, the more computing power the VEP needs to process them, which can make it slow. But there are ways to speed it up.

Switch to the command line

The first way to speed up your VEP queries is to say goodbye to the online tool, and embrace the command line script. Once you have more than a couple of thousand variants, the convenience of the online tool is massively outweighed by the speed of the script. The documentation provides lots of help for installation, including a handy installation script, and if you get stuck, you can always contact us.

Install a cache

As you run the installation script, you’ll be prompted to install a cache. Caches contain all the data the VEP needs to run, stored in a compact format on your own computer. This means that the VEP no longer needs to communicate with our database to annotate your variants, and can work much more quickly. If you want HGVS notations in your output, you’ll also need a FASTA file, which the installation script will offer to download for you.
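To make the cache setup concrete, here is a minimal Python sketch that assembles a command line for a cache-backed run. It is only an illustration under a few assumptions: the executable name `vep` depends on your release (older releases ship the script under a different name), the file names are hypothetical, and `build_offline_vep_command` is a helper invented for this example, not part of the VEP. Check the flags against the documentation for your version before relying on them.

```python
import shlex


def build_offline_vep_command(input_path, output_path, fasta_path=None,
                              executable="vep"):
    """Return an argument list for a VEP run that uses a local cache.

    `executable` is an assumption: adjust it to match your installation.
    """
    command = [
        executable,
        "--input_file", input_path,
        "--output_file", output_path,
        "--cache",    # read annotation data from the locally installed cache
        "--offline",  # make no database connections at all
    ]
    if fasta_path is not None:
        # HGVS notations need reference sequence, supplied here by a FASTA file.
        command += ["--hgvs", "--fasta", fasta_path]
    return command


if __name__ == "__main__":
    # Hypothetical file names, printed as a shell-ready string.
    args = build_offline_vep_command("variants.vcf", "annotated.txt",
                                     fasta_path="reference.fa")
    print(shlex.join(args))
```

Building the arguments as a list rather than one long string means the same list can be passed straight to `subprocess.run` without worrying about quoting paths that contain spaces.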

Further optimisations

If you’ve really got a lot of variants, VEP supports parallelisation with the --fork option, allowing you to use multiple processor cores. Additionally, installing the Ensembl-XS package can reduce runtime by approximately 10% by replacing key VEP components with compiled C, and installing the Perl module Set::IntervalTree will allow for faster retrieval of overlapping variants. Sorting input and cache files into chromosomal order will also provide a speed-up. See our docs for further tips!
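As a rough illustration of two of these tips, the Python sketch below sorts VCF-style lines into chromosomal order and works out a matching --fork argument. Everything named here (`chromosome_sort_key`, `sort_vcf_lines`, `fork_arguments`) is a hypothetical helper written for this post, and the ordering assumed is 1 to 22, then X, Y and MT, with any other sequence names last. It reads the whole file into memory, so for millions of variants a dedicated sorting tool may serve you better.

```python
import os


def chromosome_sort_key(name):
    """Order numeric chromosomes first, then X, Y, MT, then other names."""
    label = name[3:] if name.lower().startswith("chr") else name
    if label.isdigit():
        return (0, int(label), "")
    special = {"X": 1, "Y": 2, "M": 3, "MT": 3}
    if label.upper() in special:
        return (1, special[label.upper()], "")
    return (2, 0, label)


def sort_vcf_lines(lines):
    """Keep header lines in place, then sort records by chromosome and position."""
    header = [line for line in lines if line.startswith("#")]
    records = [line for line in lines
               if line.strip() and not line.startswith("#")]

    def record_key(line):
        fields = line.split("\t")
        return (chromosome_sort_key(fields[0]), int(fields[1]))

    return header + sorted(records, key=record_key)


def fork_arguments(cores=None):
    """Return ["--fork", "<n>"] for n >= 2 cores, or [] to run unforked."""
    if cores is None:
        cores = os.cpu_count() or 1
    return ["--fork", str(cores)] if cores >= 2 else []


if __name__ == "__main__":
    example = [
        "#CHROM\tPOS\tID\tREF\tALT",
        "X\t5\t.\tA\tG",
        "2\t300\t.\tC\tT",
        "10\t7\t.\tG\tA",
    ]
    print("\n".join(sort_vcf_lines(example)))
    print(fork_arguments(4))
```

Using every available core is not always the fastest choice, since each fork needs its own memory, so it is worth trying a few values on a sample of your data.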