Known bugs in Ensembl | |
---|---|
Inconsistent transcript counts between GFF3 and GTF exported files |
|
Affects: Live site | Versions: Ensembl 102, Ensembl 103, Ensembl 104, Ensembl 105 |
Description: Following a bug report, we found that in particular cases our GFF3 and GTF FTP files can be inconsistent with each other.
Sometimes, depending on the data underlying our dumps, the number of transcripts retrieved for the same species differs between the two files. The main difference between the GTF and GFF3 dumps is that for GTF we get the transcripts from the gene ($gene->get_all_Transcripts), while for GFF3 we get the transcripts from the underlying slice ($transcript_adaptor->fetch_all_by_Slice): https://github.com/Ensembl/ensembl-production/blob/release/104/modules/Bio/EnsEMBL/Production/Pipeline/GFF3/DumpFile.pm#L199 This means that if a transcript extends beyond the boundaries of the slice, we might not dump it even though we dump its gene. We plan to fix this from 106 onwards. |
|
Workaround: None, other than using the most up-to-date datasets. |
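To check whether a given pair of files is affected, the transcript IDs in the two dumps can be compared directly. The following is a minimal sketch, not part of the Ensembl tooling. It assumes that GTF transcripts appear as `transcript` features carrying a `transcript_id "..."` attribute and that GFF3 transcripts carry an `ID=transcript:...` attribute. The function names are hypothetical.

```python
import re


def gtf_transcript_ids(lines):
    """Collect transcript_id values from GTF lines whose feature type is 'transcript'."""
    ids = set()
    for line in lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "transcript":
            continue
        match = re.search(r'transcript_id "([^"]+)"', cols[8])
        if match:
            ids.add(match.group(1))
    return ids


def gff3_transcript_ids(lines):
    """Collect IDs from GFF3 lines whose ID attribute is of the form 'transcript:<id>' (assumed convention)."""
    ids = set()
    for line in lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue
        match = re.search(r"(?:^|;)ID=transcript:([^;]+)", cols[8])
        if match:
            ids.add(match.group(1))
    return ids


def missing_from_gff3(gtf_lines, gff3_lines):
    """Return the sorted transcript IDs present in the GTF but absent from the GFF3."""
    return sorted(gtf_transcript_ids(gtf_lines) - gff3_transcript_ids(gff3_lines))
```

Each function takes any iterable of text lines, so it can be fed an open file handle, including one returned by `gzip.open(path, "rt")` for the compressed FTP files.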
1000 Genomes minor allele frequency incorrect for duplications |
|
Affects: Live site, Staging, GRCh37 | Expected Versions: Ensembl 103, 104 |
Description: Some insertion/deletion variants which can be described as duplications currently have incorrect global allele frequencies from the 1000 Genomes Project reported in the Ensembl variant and transcript views, BioMart and in Ensembl VEP. Versions 103 – 104 are affected. The continental population frequencies for these variants are correct and the problem can be identified by comparing the two. Example: rs199588481, where an ‘A’ is inserted adjacent to an ‘A’: the VCF reference allele ‘A’ is annotated as the minor allele, when the alternate allele ‘AA’ should be. This issue will be resolved in Ensembl VEP version 105, which will be released in the autumn. BioMart and the Ensembl browsers will be fixed for version 106. | |
Workaround: We advise ignoring these global frequencies and filtering using the continental frequencies instead. |
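The comparison and the continental-only filter can be sketched as follows. This is an illustration under stated assumptions, not Ensembl or VEP code. The frequencies are assumed to be available as a dictionary keyed by the 1000 Genomes super-population codes (AFR, AMR, EAS, EUR, SAS). The `ALL` key for the global value, the function names, the tolerance and the threshold are all hypothetical. The check relies on the fact that a global frequency is a weighted average of the population frequencies, so it should lie between the smallest and largest continental value. An affected variant whose global value happens to fall inside that range will not be flagged.

```python
CONTINENTAL = ("AFR", "AMR", "EAS", "EUR", "SAS")


def continental_values(freqs):
    """Return the continental frequencies that are present in the record."""
    return [freqs[pop] for pop in CONTINENTAL if freqs.get(pop) is not None]


def global_looks_inconsistent(freqs, tol=0.005):
    """Flag records whose global value ('ALL', hypothetical key) lies outside the continental range.

    tol is an assumed allowance for rounding in the reported frequencies.
    """
    values = continental_values(freqs)
    global_af = freqs.get("ALL")
    if global_af is None or not values:
        return False
    return global_af < min(values) - tol or global_af > max(values) + tol


def passes_rarity_filter(freqs, max_af=0.01):
    """Filter on the continental frequencies only, ignoring the global value."""
    values = continental_values(freqs)
    return bool(values) and max(values) <= max_af
```

A record with `ALL` at 0.9 but every continental value below 0.2 would be flagged by `global_looks_inconsistent`, and `passes_rarity_filter` would judge it on the continental values alone.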
Missing clone information for mouse GRCm39 |
|
Affects: Live site, Mirrors | Expected Versions: Ensembl 103 |
Description: The coordinates of the following clone libraries have not been loaded into the Mouse GRCm39 database, so they are not visible in the Location view: * B6Ng01 * C3H * CH25 * CH26 * CH28 * CH29 * CH33 * CH34 * CH36 * CT7 * DN * MHPN * MHPP * MM_DBa * MSMg01 * RP21 * RP22 * RP23 * RP2 * WI * bMQ |
|
Workaround: At the bottom of the Location view page, click on “View in archive site” and select “Ensembl 102: Nov 2020 (GRCm38.p6)”. You can then click on the cog to configure the page and view the clone information for the region of interest. Note that in most cases the coordinates will have shifted between GRCm38 and GRCm39. |
Mouse species.common_name needs to be patched |
|
Affects: Live site | Expected Versions: Ensembl 103 |
Description: In mouse GRCm39 the species.common_name is set to ‘house mouse’, a change from ‘mouse’ in earlier releases. The consequences of this are: 1) When searching from the home page, results from the reference mouse are shown with a species of ‘House Mouse’, whereas all strains are shown with a species of ‘Mouse’. This means that clicking on ‘Mouse’ does not show the results from the reference. 2) Whereas ‘Mouse’ is a favourite and therefore elevated in the list of species, ‘House Mouse’ is not. This means that you will need to expand the list to find ‘House Mouse’. |
|
Workaround: To filter search results to show the reference mouse, you will need to scroll down the long list of species to find ‘House Mouse’. |
Missing RFAM xrefs in mouse core database |
|
Affects: Live site, Mirrors, Staging | Expected Versions: Ensembl 103 |
Description: The RFAM xrefs are missing from the mouse core database. As a consequence, a variable number of genes and transcripts of the biotypes ‘misc_RNA’, ‘ribozyme’, ‘rRNA’, ‘snoRNA’ and ‘snRNA’, will get a clone-based name instead of the RFAM name. Descriptions may be empty for these genes and transcripts. | |
Workaround: none |
Microarray data not present in Ensembl Metazoa BioMart for Drosophila melanogaster |
|
Affects: Live site | Expected Versions: Ensembl 103 |
Description: No microarray data is available for Drosophila melanogaster in BioMart on Ensembl Metazoa. | |
Workaround: Since this data is present on both the Ensembl Metazoa and Ensembl sites, the workaround is to use Ensembl for Drosophila melanogaster for release 103. |
Some genes need to be updated for the wheat cultivar Stanley |
|
Affects: Live site | Expected Versions: Ensembl 102, Ensembl 103 |
Description: For the wheat cultivar Stanley, a different and newer genome assembly version has been uploaded to the sequence archives. The gene projections in Ensembl Plants refer to an older version of the Stanley assembly, which has not been uploaded to the archives. This results in an inconsistency between the GFF file and the genome assembly sequence file. However, the only difference between the two Stanley assembly versions is one scaffold that changed orientation (chr2A:1-5191484). That means that the overall gene content remains completely unchanged and the CDS and protein sequences in the fasta files remain valid. The only thing to note is that a few genes (those in the region of the flipped scaffold) in the GFF file will have incorrect coordinates relative to the latest Stanley assembly sequence. The corrected GFF (along with all other files) can also be accessed here: https://wheat.ipk-gatersleben.de and the annotation will be updated in Ensembl 104. |
|
Workaround: The corrected GFF (along with all other files) can also be accessed here: https://wheat.ipk-gatersleben.de |
Merged RNA-seq data not available for some sheep |
|
Affects: Live site | Expected Versions: Ensembl 101, Ensembl 102, Ensembl 103, Ensembl 104 |
Description: The RNA-seq merged BAM files and their associated tracks are not available for the Rambouillet sheep (Oar_rambouillet_v1.0) nor for the Texel sheep (Oar_v4.0). This also affects Ensembl Rapid Releases from 8 onwards. | |
Workaround: none |
GENCODE Basic track doesn’t get displayed |
|
Affects: Live site | Expected Versions: Ensembl 103 |
Description: It is not possible to display the GENCODE Basic gene annotation track in the genome browser. | |
Workaround: none |
Missing variant pathogenicity predictions for REVEL, MetaLR and MutationAssessor |
|
Affects: Live site, Mirrors, Staging, Test | Expected Versions: Ensembl 102, Ensembl 103 |
Description: Variant pathogenicity predictions from REVEL, MetaLR and MutationAssessor are missing from the Variant page > Genes and regulation view and from the Transcript page > Variant table view. This only affects human GRCh38 views. Predictions for CADD, SIFT and PolyPhen-2 are still available. This problem does not impact Ensembl VEP. |
|
Workaround: The scores can still be retrieved:
|
Genomes have been over-masked |
|
Affects: Live site | Expected Versions: Ensembl 102, Ensembl 103 |
Description: For some species, repeat-masked genomes have been masked using RepeatModeler libraries. We are not confident that this is not masking gene families, so we will remove this masking and mask the genomes using Repbase libraries only. | |
Workaround: For the time being, masked genomes have been masked using the RepeatModeler libraries. |
Regulation Mart missing |
|
Affects: Live site | Expected Versions: Ensembl 103 |
Description: Some datasets have unfortunately been left out of the regulation mart for this release. Missing datasets are:
We apologise for the inconvenience and are doing our best to restore these datasets for release 104. |
|
Workaround: In the meantime, you can use the archive site to retrieve the data. Unfortunately, mouse data on the archive site uses the GRCm38.p6 assembly and not the latest version, GRCm39. |
Non-current exons in human core database |
|
Affects: Live site | Expected Versions: Ensembl 103 |
Description: A large number of exons are erroneously labelled as non-current in the human core database (exon.is_current = 0). This bug may impact Ensembl API users, since several ExonAdaptor methods filter for current exons: fetch_all, fetch_by_stable_id and fetch_by_stable_id_version. This bug does not seem to affect the website. |
|
Workaround: If possible, use alternative API methods to fetch exons, such as fetch_all_by_Transcript. |
GO Term Filters not available in non-vertebrate BioMart |
|
Affects: Live site | Expected Versions: Ensembl 103 |
Description: GO Term Accession and GO Term Name filters do not show up in the Ensembl Genomes BioMart across all divisions. | |
Workaround: Please use the archived release 59, available at https://nov2020-plants.ensembl.org/biomart/martview (to navigate to other divisions, modify the URL by changing the division name). |
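Changing the division name in the URL can be sketched as below. This is only an illustration. The helper name and the list of divisions are assumptions, and the sketch does not check that an archive BioMart actually exists at the resulting host for each division.

```python
from urllib.parse import urlsplit, urlunsplit

# Archive BioMart URL given in the workaround above.
PLANTS_ARCHIVE = "https://nov2020-plants.ensembl.org/biomart/martview"

# Assumed division names; whether every one has an archive BioMart is not verified here.
DIVISIONS = ("plants", "fungi", "metazoa", "protists")


def archive_biomart_url(division, base=PLANTS_ARCHIVE):
    """Swap 'plants' in the host name of the archive URL for another division name."""
    if division not in DIVISIONS:
        raise ValueError(f"unknown division: {division}")
    parts = urlsplit(base)
    host = parts.netloc.replace("plants", division, 1)
    return urlunsplit(parts._replace(netloc=host))
```

For instance, `archive_biomart_url("metazoa")` replaces only the host portion and leaves the `/biomart/martview` path untouched.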
EPO and EPO extended MSAs not displayed correctly |
|
Affects: Live site | Expected Versions: Ensembl 103 |
Description: In e103, we found out that EPO and EPO Extended MSAs have been displayed incorrectly for quite some time (based on a later GitHub track, most likely since e86 or when Alignment(image) was made available). After some investigation we have found out that the bug is located in Compara’s AlignSliceAdaptor and was causing issues like species displayed twice or the ancestral sequence not having the correct information. | |
Workaround: none |
Broken/missing links for transcripts with biotypes “tRNA” and “IG” for RefSeq tracks |
|
Affects: Live site | Expected Versions: Ensembl 101, Ensembl 102, Ensembl 103, Ensembl 104 |
Description: When viewing the RefSeq track, the links to NCBI for transcripts with biotypes “tRNA” and “IG” are broken or incorrect. | |
Workaround: This will be fixed in an upcoming Ensembl release; in the meantime the links will be disabled. |
Compara ncRNA trees stats not described accurately |
|
Affects: Live site | Expected Versions: Ensembl 100, Ensembl 101, Ensembl 102, Ensembl 103, Ensembl 104 |
Description: The stats computed for ncRNA trees under the names `nb_genes_in_tree` and `nb_orphaned_genes` do not actually refer to the final trees but to the unfiltered clusters (an earlier stage). In Ensembl 103 we have corrected this problem so that the values match their names, but they will decrease significantly in at least 50% of the species reported. | |
Workaround: none |
Some protein coding genes mysteriously turned into nontranslating_CDS |
|
Affects: Live site | Expected Versions: Ensembl 101, Ensembl 102, Ensembl 103 |
Description: A user spotted that peptide fasta files are considerably shorter for pachysolen_tannophilus_nrrl_y_2460_gca_001661245 (fungus). It turns out that this is because in release 42 a lot of its protein coding genes were marked as nontranslating_CDS (although the underlying data and annotation have not changed). This needs investigation. | |
Workaround: none |
GRCh37 – COSMIC insertion coordinates off by +1 |
|
Affects: Live site | Expected Versions: Ensembl 100, Ensembl 101, Ensembl 102, Ensembl 103 |
Description: The coordinates for insertions imported from the COSMIC source are off by +1. For GRCh37 e100 and e101, 2.66% (253,428 / 9,511,409) of the COSMIC variation data is affected. | |
Workaround: The previous release can be used. GRCh37 release 99 contained 4,478,854 COSMIC variations. |
Drosophila melanogaster RNA gene cross-reference links do not work |
|
Affects: Live site, Mirrors, Staging | Expected Versions: Ensembl 99, Ensembl 100, Ensembl 101, Ensembl 102, Ensembl 103 |
Description: Rfam and miRBase cross-reference links do not work, because they use the FlyBase ID instead of the RNA gene. | |
Workaround: Search for the Rfam or miRBase ID on the respective websites. |
Extra character in Drosophila file dumps |
|
Affects: Live site | Expected Versions: Ensembl 103 |
Description: An extra `_` has been inserted in some GTF file paths during FTP dumps for Drosophila species (note the extra underscore after the assembly string): * ftp://ftp.ensemblgenomes.org/pub/metazoa/release-50/gtf/drosophila_virilis/Drosophila_virilis.dvir_caf1_.50.gtf.gz * ftp://ftp.ensemblgenomes.org/pub/metazoa/release-50/gtf/drosophila_yakuba/Drosophila_yakuba.dyak_caf1_.50.gtf.gz * ftp://ftp.ensemblgenomes.org/pub/metazoa/release-50/gtf/drosophila_ananassae/Drosophila_ananassae.dana_caf1_.50.gtf.gz * ftp://ftp.ensemblgenomes.org/pub/metazoa/release-50/gtf/drosophila_mojavensis/Drosophila_mojavensis.dmoj_caf1_.50.gtf.gz * ftp://ftp.ensemblgenomes.org/pub/metazoa/release-50/gtf/drosophila_pseudoobscura/Drosophila_pseudoobscura.Dpse_3.0_.50.gtf.gz * ftp://ftp.ensemblgenomes.org/pub/metazoa/release-50/gtf/drosophila_simulans/Drosophila_simulans.ASM75419v3_.50.gtf.gz |
|
Workaround: Add the extra `_` after the assembly string when building the file path. |
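For scripted downloads, the affected path can be built with the extra underscore in place. The sketch below simply reproduces the pattern of the URLs listed above. The function names and defaults are hypothetical, and the pattern is only known to apply to the Drosophila species and release listed in this entry.

```python
# Base of the FTP paths listed in the description above.
FTP_BASE = "ftp://ftp.ensemblgenomes.org/pub/metazoa"


def affected_gtf_name(species, assembly, release=50):
    """Build the GTF file name as dumped, with the extra '_' after the assembly string."""
    return f"{species}.{assembly}_.{release}.gtf.gz"


def affected_gtf_url(species, assembly, release=50, base=FTP_BASE):
    """Build the full FTP URL; the species directory is the lower-cased species name."""
    name = affected_gtf_name(species, assembly, release)
    return f"{base}/release-{release}/gtf/{species.lower()}/{name}"
```

Calling `affected_gtf_url("Drosophila_virilis", "dvir_caf1")` reproduces the first URL in the list above.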