The RefSeq column on our gene pages has changed.
We’re moving towards a more unified gene set with RefSeq, with biologically important transcripts being highlighted as MANE. This means displays you’re used to seeing will be updated to reflect these changes, which may affect the way you work with Ensembl.
On a gene page, you’ll see that the table of transcripts now has a RefSeq match column. In human GRCh38, this shows the versioned RefSeq NM transcript that is a 100% match to the Ensembl transcript in sequence, structure and UTRs. These transcripts have the flag MANE Select v0.5 in the Flags column of the table.
With the addition of the RefSeq match column, the previous RefSeq column has been removed from the table of transcripts. That column listed all transcripts or peptides linked to the Ensembl transcript, but those links reflected only sequence similarity and region overlap, not a perfect match.
Around 50% of human genes on GRCh38 will have a MANE Select transcript in Ensembl 96. The MANE project is only being done for GRCh38, so there is currently no data in the RefSeq match column for GRCh37 or for non-human species.
You can still find associated RefSeq transcripts and peptides on the External References page for a gene; click on External References in the menu on the left. In BioMart, you can filter the gene database to see only the MANE Select transcripts, and get the perfect NCBI matches, which are listed as RefSeq match transcript.
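If you export a transcript table to a tab-separated file, keeping only the MANE Select rows is a simple filter. The sketch below is a minimal illustration only: the column names, the identifiers and the helper `mane_select_pairs` are hypothetical placeholders, not real BioMart attribute names or real accessions.

```python
import csv
import io

# Hypothetical export: column names and identifiers are placeholders,
# not real BioMart attribute names or real accessions.
EXAMPLE_TSV = (
    "transcript_id\tflags\trefseq_match\n"
    "ENST_EXAMPLE_A.1\tMANE Select v0.5\tNM_EXAMPLE_A.2\n"
    "ENST_EXAMPLE_B.3\t\t\n"
    "ENST_EXAMPLE_C.1\tMANE Select v0.5\tNM_EXAMPLE_C.4\n"
)


def mane_select_pairs(tsv_text):
    """Return (Ensembl transcript, RefSeq match) pairs for MANE Select rows."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        (row["transcript_id"], row["refseq_match"])
        for row in reader
        # Keep rows flagged as MANE Select that also carry a RefSeq match.
        if row["flags"].startswith("MANE Select") and row["refseq_match"]
    ]


pairs = mane_select_pairs(EXAMPLE_TSV)
for ensembl_id, refseq_id in pairs:
    print(ensembl_id, "matches", refseq_id)
```

Both identifiers are kept with their version suffix, since the match is defined between specific versions of each transcript.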
We are monitoring the response to this. If this change affects you and you would like to tell us more about your use case, please let us know.
And more changes to come…
Genes that do not have a MANE Select transcript, in human and in other species, will have an Ensembl Select transcript. This will be chosen using the same criteria that we use to pick our top transcript to compare with NCBI for MANE Select, just without the NCBI comparison.
The gene trees are also set to change with MANE. These are calculated using a representative transcript for each gene, currently chosen based on coding length and support levels. In future, we plan to use the MANE Select or Ensembl Select transcript to calculate these trees instead.
We expect that in most cases the transcripts used to calculate the gene trees will not change, but some will. If the transcript does change, it is possible the underlying trees will change too, but we believe that the new trees will be more representative of biologically important transcripts. We don’t yet know how many genes are likely to be affected in this way.
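As a rough sketch of the planned selection order: prefer the MANE Select transcript, then the Ensembl Select, and otherwise fall back to a ranking in the spirit of the current approach. The `Transcript` class, its fields, the meaning of `support_level` and the exact tie-breaking are illustrative assumptions, not Ensembl's actual implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Transcript:
    # Hypothetical record; field names are illustrative only.
    name: str
    coding_length: int
    support_level: int  # assumed: lower number = better supported
    flag: Optional[str] = None  # "MANE Select", "Ensembl Select" or None


def representative(transcripts: List[Transcript]) -> Transcript:
    """Pick the transcript that would represent a gene in a gene tree."""
    for wanted in ("MANE Select", "Ensembl Select"):
        for t in transcripts:
            if t.flag == wanted:
                return t
    # Fallback: longest coding sequence, then best support
    # (the exact ordering here is an assumption).
    return max(transcripts, key=lambda t: (t.coding_length, -t.support_level))


gene = [
    Transcript("tx-201", coding_length=1500, support_level=1),
    Transcript("tx-202", coding_length=1200, support_level=1, flag="MANE Select"),
]
print(representative(gene).name)  # prints tx-202
```

In this made-up gene, a purely length-based rule would pick tx-201, while the flag-based rule picks tx-202. This is the kind of switch that could change a tree.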
How does MANE benefit me?
Matched Annotation from NCBI and EBI (MANE) will bring greater consistency to the reporting of gene-related data, such as variant positions. By standardising our transcripts, we ensure that anyone who finds an important A->G variant at position 5 of Ensembl’s BEND4-201 CDS can report that, knowing that someone else who prefers to use RefSeq will find the same A->G variant at the same position 5, rather than 12 bases away. This interoperability will speed up research, with no more need for time-consuming conversions or, worse, misinterpretations.
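To see why a shared transcript matters, consider how a CDS position is derived from a genomic position. The sketch below uses made-up forward-strand coordinates: two annotations of the same gene differ only in where the coding sequence starts, so the same genomic variant lands at different CDS positions. The helper `cds_position` and all coordinates are hypothetical and do not describe BEND4 or any real transcript.

```python
def cds_position(coding_exons, genomic_pos):
    """Map a 1-based genomic position to a 1-based CDS position.

    coding_exons: list of (start, end) inclusive coding segments on the
    forward strand, sorted by start. Returns None if the position does
    not fall in the coding sequence.
    """
    offset = 0  # coding bases in the segments already passed
    for start, end in coding_exons:
        if start <= genomic_pos <= end:
            return offset + (genomic_pos - start + 1)
        offset += end - start + 1
    return None


# Made-up example: annotation B starts its CDS 12 bases upstream of A.
annotation_a = [(1000, 1099), (2000, 2099)]
annotation_b = [(988, 1099), (2000, 2099)]

variant = 1004
print(cds_position(annotation_a, variant))  # 5
print(cds_position(annotation_b, variant))  # 17
```

When both resources annotate an identical transcript, as with a MANE Select, the two calls return the same position and no conversion is needed.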
By identifying the MANE Select transcript (the most biologically important transcript, as agreed between Ensembl and NCBI), we also highlight a single transcript to use universally for reporting variants. Other biologically important transcripts will eventually be added as MANE Plus, allowing you to easily filter your data to see only what we think is important.
Addendum: reversion for GRCh37
Since we do not have plans to introduce MANE for GRCh37, we have reverted the GRCh37 table to the old style, with the RefSeq column of related transcripts.