Are you looking for whole genomes, protein sequences, alignments or other genome-wide data from Ensembl?
Look no further; our FTP site is the place for you:
- Download our data from the current release only (i.e. Ensembl 78)
- Download our data from current and previous releases (including GRCh37)
Here are some of the data sets you can download in bulk, free of charge (file formats are given in brackets):
- DNA, cDNA, CDS, ncRNA sequences (FASTA)
- Annotations of our coding and non-coding genes (GTF)
- Annotations of regulatory elements for the human and mouse genomes (GFF)
- Variation data (VCF) for more than 20 Ensembl species
- RNA-seq reads (BAM) aligned against 25 genomes
- GERP scores to identify constrained elements (BED)
- Alignments of resequencing data for several species (EMF)
- Multiple and pairwise genome alignments (MAF)
- Ensembl databases for local installation (MySQL)
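As an illustration of working with one of these formats, here is a minimal Python sketch that reads a single line of a GTF annotation file. It assumes the standard nine tab-separated GTF columns and attributes written as `key "value";` pairs. The names `GtfRecord` and `parse_gtf_line` are hypothetical helpers made up for this example, not part of any Ensembl tool, and the sample line contains invented values.

```python
import re
from typing import NamedTuple, Optional


class GtfRecord(NamedTuple):
    """A simplified view of one GTF feature line (hypothetical helper type)."""
    seqname: str
    source: str
    feature: str
    start: int
    end: int
    strand: str
    attributes: dict


# Matches attribute pairs of the form: key "value"
# Assumption: values are double-quoted; unquoted values are ignored.
_ATTRIBUTE = re.compile(r'(\S+) "([^"]*)"')


def parse_gtf_line(line: str) -> Optional[GtfRecord]:
    """Parse one GTF line; return None for blank lines and '#' comments."""
    if not line.strip() or line.startswith("#"):
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    seqname, source, feature, start, end, _score, strand, _frame, attr_text = fields
    # If a key is repeated on the line, the last value wins in this sketch.
    attributes = dict(_ATTRIBUTE.findall(attr_text))
    return GtfRecord(seqname, source, feature, int(start), int(end), strand, attributes)


# Invented sample line, for illustration only.
sample = '1\texample\tgene\t100\t200\t.\t+\t.\tgene_id "G1"; gene_name "DEMO";\n'
record = parse_gtf_line(sample)
print(record.feature, record.start, record.end, record.attributes["gene_name"])
```

For a real annotation file you would loop over the lines of the downloaded (and, if needed, decompressed) file and skip the `None` results.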
How can the Ensembl FTP foster research?
Let’s look at coiled-coils: simple dimers in protein sequences that are found in many species and are believed to enable protein-protein interactions in a variety of biological processes.
Coiled-coil domains differ immensely from their globular counterparts, and distinct evolutionary constraints on them are expected. How conserved are coiled-coils? What has driven their evolution?
Intrigued by these questions, Surkont and Pereira-Leal (2015) set out on a journey to compare protein sequences across several vertebrates and yeast. They show that substitution patterns do differ between coiled-coil and globular regions, and they develop an evolutionary model that improves both the homology-based detection of coiled-coils and the inference of their phylogeny.
Where did Surkont and Pereira-Leal find these proteomes for their investigation? On our FTP site.
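If you would like to take a first look at a proteome yourself, the following Python sketch shows one way to read protein records from a FASTA file once you have downloaded it. It is only an illustration and not the method used in the paper: `read_fasta` and `open_text` are hypothetical helper names, and the file name in the usage note is a placeholder for whichever file you fetch.

```python
import gzip
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from the lines of a FASTA file."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def open_text(path: str):
    """Open a plain or gzip-compressed text file for reading."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "r")


# Tiny in-memory example with invented sequences.
demo = [">p1 example protein", "MKV", "LLA", "", ">p2", "MA"]
for name, seq in read_fasta(demo):
    print(name, len(seq))
```

With a downloaded file, usage would look like `with open_text("proteome.fa.gz") as handle: records = list(read_fasta(handle))`, where `proteome.fa.gz` stands in for the actual file name.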
Why not explore the Ensembl FTP site to see what we’ve got in store for you?
If you have any comments or questions, just get in touch.