In Ensembl release 110, we have extended the analysis options available for structural variants (SVs) in Ensembl VEP, including more detailed molecular consequence predictions, more efficient integration of information from reference SV sets, support for breakend variant annotation, and the integration of CADD-SV scores.
Breakend variant annotation
Ensembl VEP now supports (simple) cases of complex rearrangements using breakend variants as described in the VCF 4.4 format specification, for instance:
#CHROM POS ID REF ALT QUAL FILTER INFO
1 234919885 bnd1 A [chr1:17124942[A . . .
22 22857058 bnd2 A A[chr5:228930[,A[chrX:109323[ . . .
For such input, Ensembl VEP will return all transcripts affected by the breakends, including breakends defined in the ALT field.
The MATEID and EVENT INFO tags are not currently supported but we will investigate how to handle these in the future.
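To make the ALT notation in the example above more concrete, here is a minimal Python sketch that splits a breakend ALT field into its individual mate positions, following the four bracket forms in the VCF specification (`t[p[`, `t]p]`, `]p]t`, `[p[t`). This is only an illustration and not how Ensembl VEP parses its input; the names `Breakend` and `parse_breakend_alt` are hypothetical.

```python
import re
from typing import List, NamedTuple


class Breakend(NamedTuple):
    """One mate position taken from a breakend ALT allele (hypothetical helper type)."""
    chrom: str
    pos: int
    bracket: str        # "[" or "]", as written in the ALT allele
    joined_after: bool  # True for t[p[ / t]p] forms, False for ]p]t / [p[t forms
    local_bases: str    # the base(s) written next to the brackets


# Matches a single breakend allele such as "A[chr5:228930[" or "[chr1:17124942[A".
_BND = re.compile(
    r"""
    ^(?P<pre>[ACGTNacgtn.]*)
    (?P<b1>[\[\]])
    (?P<chrom>[^\[\]]+):(?P<pos>\d+)
    (?P<b2>[\[\]])
    (?P<post>[ACGTNacgtn.]*)$
    """,
    re.VERBOSE,
)


def parse_breakend_alt(alt: str) -> List[Breakend]:
    """Parse a comma-separated ALT field containing only breakend alleles."""
    breakends = []
    for allele in alt.split(","):
        m = _BND.match(allele)
        if m is None or m["b1"] != m["b2"]:
            raise ValueError(f"not a breakend allele: {allele!r}")
        # Exactly one side of the brackets should carry the local base(s).
        if bool(m["pre"]) == bool(m["post"]):
            raise ValueError(f"ambiguous breakend allele: {allele!r}")
        breakends.append(
            Breakend(
                chrom=m["chrom"],
                pos=int(m["pos"]),
                bracket=m["b1"],
                joined_after=bool(m["pre"]),
                local_bases=m["pre"] or m["post"],
            )
        )
    return breakends


if __name__ == "__main__":
    for alt in ("[chr1:17124942[A", "A[chr5:228930[,A[chrX:109323["):
        print(alt, "->", parse_breakend_alt(alt))
```

Applied to the second example record, the function yields two mate positions (chr5 and chrX), which corresponds to the two breakends defined in that ALT field.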
More rapid reference data integration
When running Ensembl VEP locally, you can now more rapidly integrate results from reference sources – like phenotype assertions for SVs in ClinVar and frequency data from studies like gnomAD and the 1000 Genomes Project. The --custom option has been extended to allow you to specify:
- the type of overlap you wish to be reported, e.g. any, within, surrounding or exact
- the minimum percentage overlap
- whether the overlap should be reciprocal
The --custom option can be used with standard bioinformatics formats (bed, gff, gtf, vcf or bigwig) and, for clarity, now accepts key-value parameters. See Variant Effect Predictor Custom annotations for more information.
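The three overlap settings listed above can be illustrated with a short Python sketch. It is not Ensembl VEP's implementation: the function names are hypothetical, coordinates are assumed to be 1-based and inclusive, "within" is assumed to mean the input SV lies entirely inside the reference feature, "surrounding" is assumed to mean the input SV fully contains it, and the non-reciprocal percentage is assumed to be measured against the input SV. Check the Custom annotations documentation for the exact definitions.

```python
from typing import Tuple

Interval = Tuple[int, int]  # (start, end), 1-based inclusive (assumption)


def overlap_bp(a: Interval, b: Interval) -> int:
    """Number of bases shared by two inclusive intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def length(iv: Interval) -> int:
    return iv[1] - iv[0] + 1


def reports_overlap(
    sv: Interval,
    ref: Interval,
    overlap_type: str = "any",
    min_percent: float = 0.0,
    reciprocal: bool = False,
) -> bool:
    """Decide whether a reference feature would be reported for an input SV."""
    shared = overlap_bp(sv, ref)
    if shared == 0:
        return False

    if overlap_type == "any":
        type_ok = True
    elif overlap_type == "within":       # SV inside the reference feature (assumed)
        type_ok = ref[0] <= sv[0] and sv[1] <= ref[1]
    elif overlap_type == "surrounding":  # SV contains the reference feature (assumed)
        type_ok = sv[0] <= ref[0] and ref[1] <= sv[1]
    elif overlap_type == "exact":
        type_ok = sv == ref
    else:
        raise ValueError(f"unknown overlap type: {overlap_type!r}")
    if not type_ok:
        return False

    pct_of_sv = 100.0 * shared / length(sv)
    if not reciprocal:
        return pct_of_sv >= min_percent
    # Reciprocal: the threshold must hold from both points of view.
    pct_of_ref = 100.0 * shared / length(ref)
    return pct_of_sv >= min_percent and pct_of_ref >= min_percent


if __name__ == "__main__":
    sv, ref = (100, 199), (150, 349)  # 50 shared bases
    print(reports_overlap(sv, ref, min_percent=50))                   # 50% of the SV
    print(reports_overlap(sv, ref, min_percent=50, reciprocal=True))  # only 25% of ref
```

In this example the reference feature covers half of the input SV but the SV covers only a quarter of the reference feature, so requiring a reciprocal 50% overlap filters the match out.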
Other improvements
- VCF copy number tags are now used in consequence calculation, with CN=0 now treated as a deletion and CN=2 as a duplication.
- All VCF-supported SV types (INS, DEL, TDUP, DUP, CNV, INV, BND) are now supported in the Ensembl VEP default input and REST-style region formats. More information is available at Variant Effect Predictor Data formats.
- The CADD plugin now supports pre-calculated CADD-SV deleteriousness scores. These scores are calculated using a wide set of annotations across the range and in the vicinity of SVs.
- Inversions covering an entire transcript are now annotated as coding_transcript_variant (or non_coding_transcript_variant).
- More detailed predicted consequences (such as splice poly-pyrimidine tract variant) are not reported when an SV has a more disruptive impact such as overlapping an exon, simplifying the returned results.
- The impact classification of feature elongation/truncation consequences has been promoted from MODIFIER to HIGH.
These changes are also reflected on SV pages in the Ensembl browser, where ‘supporting evidence’ SVs are taken into account in the same way as the CN tags described above to predict the impact of insertions or deletions, where appropriate.
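As an illustration of the default input and REST-style region formats mentioned in the list above, the following Python sketch builds one line of each for an SV. The helper names are hypothetical, and the layouts (chromosome, start, end, allele, strand and an optional identifier for the default format; `chrom:start-end:strand/allele` for regions) follow the Data formats documentation as we understand it, so please check that page before relying on them.

```python
from typing import Optional

# SV types named in this post as supported in both formats.
SV_TYPES = {"INS", "DEL", "TDUP", "DUP", "CNV", "INV", "BND"}


def _check_type(sv_type: str) -> None:
    if sv_type not in SV_TYPES:
        raise ValueError(f"unsupported SV type: {sv_type!r}")


def to_default_format(
    chrom: str,
    start: int,
    end: int,
    sv_type: str,
    identifier: Optional[str] = None,
    strand: str = "+",
) -> str:
    """Build one whitespace-separated line in the assumed default input layout."""
    _check_type(sv_type)
    fields = [chrom, str(start), str(end), sv_type, strand]
    if identifier:
        fields.append(identifier)
    return "\t".join(fields)


def to_region_format(
    chrom: str, start: int, end: int, sv_type: str, strand: str = "+"
) -> str:
    """Build a REST-style region string in the assumed chrom:start-end:strand/allele layout."""
    _check_type(sv_type)
    strand_num = 1 if strand == "+" else -1
    return f"{chrom}:{start}-{end}:{strand_num}/{sv_type}"


if __name__ == "__main__":
    print(to_default_format("1", 160283, 471362, "DUP", identifier="sv1"))
    print(to_region_format("7", 100318423, 100321323, "DUP"))
```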
As always, if you have any questions or feedback, please let us know. We are eager for you to use the new functionality and hope it helps with interpreting new SVs. We intend to continue working on updates to expand SV options, and more enhancements are planned for future releases!