Today I am happy to announce Ensembl’s migration away from CVS to Git as our primary version control system (VCS). This migration marks the end of nearly a year’s work to ensure that our Git repositories provide the same historical record as CVS.

To summarise the changes:

  • Ensembl’s code is now provided from our GitHub organisation at http://github.com/Ensembl
  • We have migrated Ensembl’s version history back to 1999
  • We will continue to back-port changes to CVS for the next 3 releases (support ending with release 77)
  • You can still download our API release tarballs from our FTP site
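With the code on GitHub, getting a copy takes one `git clone` per repository. The helper below is a minimal sketch, not an official Ensembl script. `clone_ensembl_repos` is a hypothetical name, and the helper assumes each repository is served at `<base URL>/<name>.git`, which is GitHub's layout for the Ensembl organisation.

```shell
# Hypothetical helper: clone each named repository from a base URL into the
# current directory. Assumes repositories are served as <base_url>/<name>.git.
clone_ensembl_repos() {
  local base_url=$1 repo
  shift
  for repo in "$@"; do
    if [ -d "$repo/.git" ]; then
      # Leave existing checkouts alone; update them with 'git pull' instead.
      echo "skipping $repo: already cloned"
    else
      git clone "$base_url/$repo.git" "$repo" || return 1
    fi
  done
}

# Usage (needs network access):
# clone_ensembl_repos https://github.com/Ensembl ensembl ensembl-variation ensembl-compara
```

Because Git is distributed, each clone carries the full history back to 1999, so `git log` works without contacting GitHub.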

CVS (Concurrent Versions System) was first released in 1990 and was based on an earlier system called RCS (released in 1982). It relies on a single centralised server to hold all previous revisions, and none of this history is held on the client. CVS also assumes that commits to files within the same project are independent of each other. When the Ensembl project started in 1999, we chose CVS because it was one of the best VCSs available in the open source community.

Since that choice, a new breed of VCS has appeared: decentralised/distributed version control systems (DVCS). These systems favour local copies of the repositories, removing the need to communicate with a centralised server, except when sending or receiving new commits, and work with sets of file changes as a single atomic block of work. According to Black Duck’s comparison of repositories, Git is the dominant DVCS in open source projects. We have decided to use the code hosting company GitHub as the location of our repositories. GitHub has been a major contributor behind the success of Git by providing an infrastructure that promotes social coding between developers.

Whilst CVS has been a very good servant to Ensembl, the time has come to move on to better tools. We have seen other projects within EMBL-EBI and the Wellcome Trust Sanger Institute make a similar transition to Git. None of them have looked back, citing better tooling, a larger support base and an ability to support both long-term and short-term development branches. We agree and cannot wait to start using this exciting technology.

Over the past few months a number of users have asked us how to install Ensembl and its dependencies on OSX. I have done this many times over the past 4 years and thought it best to share my personal best practice. There are alternatives to this method, such as supplementing the stock OSX Perl with extra libraries or using ActiveState for OSX, but I recommend neither. Apple never developed a package management tool that works well with Perl libraries, so upgrades carry a level of risk. As for ActiveState, it does not currently support DBD::mysql on OSX. Instead we will install a new version of Perl using Perlbrew, a Perl installation management tool.

This guide requires admin rights on your Mac and assumes some familiarity with the terminal. If you do not feel confident enough, try our Virtual Machine instead.

Pre-Flight Checks

You must have Xcode and GCC installed on your Mac. To check, run the following command and see whether you get a response similar to the one below:

> gcc -v
Using built-in specs.
Target: i686-apple-darwin11
Configured with: /private/var/tmp/llvmgcc42/llvmgcc42-2336.11~182/src/configure --disable-checking --enable-werror --prefix=/Applications/Xcode.app/Contents/Developer/usr/llvm-gcc-4.2 --mandir=/share/man --enable-languages=c,objc,c++,obj-c++ --program-prefix=llvm- --program-transform-name=/^[cg][^.-]*$/s/$/-4.2/ --with-slibdir=/usr/lib --build=i686-apple-darwin11 --enable-llvm=/private/var/tmp/llvmgcc42/llvmgcc42-2336.11~182/dst-llvmCore/Developer/usr/local --program-prefix=i686-apple-darwin11- --host=x86_64-apple-darwin11 --target=i686-apple-darwin11 --with-gxx-include-dir=/usr/include/c++/4.2.1
Thread model: posix
gcc version 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2336.11.00)

Later versions of OSX (Mavericks, 10.9) will install the GCC tools automatically once you confirm the prompt. If you are on an earlier version of OSX and do not get this prompt, follow the instructions below (this requires admin privileges):

  1. Install Xcode from the Apple AppStore
  2. Run it from Applications
  3. Install the command line tools by choosing the following from the Xcode menu:
    • Preferences
    • Downloads
    • Click “Install” by the Command Line Tools section

Additional information (with screenshots) is available from this Stack Overflow answer.
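Once the tools are installed, you can repeat the check from a script. This is a minimal sketch built on a hypothetical helper, `check_compiler`. It runs the compiler rather than just looking for it, because on Mavericks /usr/bin/gcc can exist as a stub that only offers to install the tools.

```shell
# Hypothetical helper: succeed only if the compiler actually runs.
# Running it (not just locating it) matters on Mavericks, where /usr/bin/gcc
# can be a stub that only offers to install the Command Line Tools.
check_compiler() {
  local cc_name=${1:-gcc}
  if "$cc_name" -v >/dev/null 2>&1; then
    echo "ok: $cc_name runs"
  else
    echo "problem: $cc_name is missing or not working; install the Command Line Tools" >&2
    return 1
  fi
}

# Usage: check_compiler || exit 1
```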

Installing Perlbrew

Firstly you need to install Perlbrew. This will create a directory called perl5 in your home directory. It will also ask to add commands to your shell’s profile (either .bashrc, .cshrc or .bash_profile) to bring the perlbrew binary onto your path. To install use the following commands:

> curl -L http://install.perlbrew.pl | bash

# Add this to the end of your ~/.bash_profile
> echo 'source $HOME/perl5/perlbrew/etc/bashrc' >> ~/.bash_profile
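If you re-run these setup steps, the `echo ... >>` line above adds a duplicate entry each time. Below is a minimal sketch of an idempotent alternative, using a hypothetical `append_once` helper.

```shell
# Hypothetical helper: append a line to a file only if that exact line is not
# already present, so re-running the setup does not add duplicates.
append_once() {
  local line=$1 file=$2
  touch "$file"
  grep -qxF "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Single quotes keep $HOME literal; it is expanded when the profile is read.
# append_once 'source $HOME/perl5/perlbrew/etc/bashrc' ~/.bash_profile
```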

Now install Perl 5.14.4 (you will have to wait a bit). The following command was run on a Mountain Lion installation (10.8.4). You must install a new version of Perl: modifying the system version of Perl on OSX (including installing module updates) is a very bad idea and can cause unintended side effects. To be safe, always install your own version:

> perlbrew install -j 5 --as 5.14.4 \
--thread --64all -Duseshrplib perl-5.14.4

Fetching perl 5.14.4 as /Users/user/perl5/perlbrew/dists/perl-5.14.4.tar.bz2
Installing /Users/user/perl5/perlbrew/build/perl-5.14.4 into ~/perl5/perlbrew/perls/5.14.4

This could take a while. You can run the following command on another shell to track the status:

tail -f ~/perl5/perlbrew/build.perl-5.14.4.log

5.14.4 is successfully installed.

Later versions of OSX and Perl can sometimes fail during this compilation step, citing problems with locale settings. If you see this, run the following command instead (--notest skips the test suite for the new Perl binary):

> perlbrew install --notest --as 5.14.4 --thread \
--64all -Duseshrplib perl-5.14.4

Now install cpanminus. This will be our CPAN client, and it makes installing modules a breeze.

> perlbrew install-cpanm

Now we will switch to using the new version of Perl by default and ensure that the switch worked.

> perlbrew switch 5.14.4
> perl -v | grep 'This is'
This is perl 5, version 14, subversion 4 (v5.14.4) built for darwin-thread-multi-2level
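`perl -v` shows the version, but it is also worth checking that the `perl` on your PATH is the perlbrew build rather than /usr/bin/perl. The sketch below uses a hypothetical `is_perlbrew_perl` helper. It assumes perlbrew's default root of ~/perl5/perlbrew, or `$PERLBREW_ROOT` if you have set it.

```shell
# Hypothetical helper: succeed if the given perl path lives under perlbrew's
# install root ($PERLBREW_ROOT if set, otherwise ~/perl5/perlbrew).
is_perlbrew_perl() {
  local perl_path=$1
  local root=${PERLBREW_ROOT:-$HOME/perl5/perlbrew}
  case "$perl_path" in
    "$root"/perls/*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage:
# is_perlbrew_perl "$(command -v perl)" || echo "still using the system perl"
```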

Installing MySQL Client Libraries

DBD::mysql requires access to libmysqlclient.18.dylib and the MySQL C headers to compile. MySQL’s Connector/C distribution ships with these files. However, I have always had more success with a server installation, and I like having a personal MySQL server to develop against. This guide only covers using a MySQL Server installation.

  1. Go to http://dev.mysql.com/downloads/mysql/
  2. Select a version compatible with your Mac
  3. I selected Mac OS X ver. 10.7 (x86, 64-bit), DMG Archive, MySQL Community Server 5.6.12. You may find a later version; if so, change all other commands accordingly
  4. Mount the DMG
  5. Install mysql-5.6.12-osx10.7-x86_64.pkg and double click the MySQL.prefPane
  6. Check that you can start MySQL (the DBD::mysql installation tests need a running server)
  7. Go to System Preferences > MySQL > Start MySQL Server
  8. Enter your system admin password
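Before moving on, you may want to confirm that the installer put the client library and headers where the later steps expect them. The sketch below assumes the default /usr/local/mysql prefix used in the commands that follow. `check_mysql_client` is a hypothetical helper.

```shell
# Hypothetical helper: report whether the MySQL client library and header
# that DBD::mysql compiles against exist under a prefix (default /usr/local/mysql).
check_mysql_client() {
  local prefix=${1:-/usr/local/mysql} status=0 f
  for f in lib/libmysqlclient.18.dylib include/mysql.h; do
    if [ -e "$prefix/$f" ]; then
      echo "found $prefix/$f"
    else
      echo "missing $prefix/$f" >&2
      status=1
    fi
  done
  return $status
}

# Usage: check_mysql_client || echo "check your MySQL installation"
```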

Installing core dependencies

Basic dependencies can be installed using the cpanm command. For core Ensembl that amounts to the database bindings, so let’s bring in DBI.

> cpanm DBI

Should you wish to run any core test suites you will also need the following packages:

> cpanm Test::Differences Test::Exception Test::Perl::Critic
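To confirm the modules installed into your perlbrew Perl, try loading each one. The sketch below uses a hypothetical `check_perl_modules` helper, which runs `perl -M<Module> -e 1` for each name.

```shell
# Hypothetical helper: try to load each module with the perl on your PATH and
# report which ones are still missing (install those with cpanm).
check_perl_modules() {
  local missing=0 module
  for module in "$@"; do
    if perl -M"$module" -e 1 >/dev/null 2>&1; then
      echo "ok: $module"
    else
      echo "missing: $module"
      missing=1
    fi
  done
  return $missing
}

# Usage:
# check_perl_modules DBI Test::Differences Test::Exception Test::Perl::Critic
```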

Installing DBD::mysql

Congratulations on getting this far. Now for the tricky bit. By default the required dynamic library is not on OSX’s default library search paths. You can solve this with one of the three options below. Once the library is available to OSX, install DBD::mysql with the following command (make sure your MySQL server is running, otherwise the module’s test suite will fail). I prefer the second option, symbolically linking the library into /usr/lib, but this does require admin rights.

> cpanm DBD::mysql

Option 1). Add MySQL’s lib directory onto the DYLD_LIBRARY_PATH

Works in any terminal session and does not require admin rights, but will not work if you use a GUI-based application to run Ensembl scripts. The export only lasts for the current shell, so add it to your ~/.bash_profile to make it permanent.

> export DYLD_LIBRARY_PATH=/usr/local/mysql/lib/:$DYLD_LIBRARY_PATH
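Running that export repeatedly keeps prepending the same directory, for example from a profile that is sourced more than once. And when DYLD_LIBRARY_PATH starts empty, the line above leaves a trailing `:`. Below is a minimal sketch of a guarded version, using a hypothetical `prepend_dyld_path` helper.

```shell
# Hypothetical helper: prepend a directory to DYLD_LIBRARY_PATH only if it is
# not already listed, without leaving a trailing ':' when the variable is empty.
prepend_dyld_path() {
  local dir=$1
  case ":${DYLD_LIBRARY_PATH}:" in
    *":$dir:"*) ;;  # already present, nothing to do
    *) DYLD_LIBRARY_PATH="$dir${DYLD_LIBRARY_PATH:+:$DYLD_LIBRARY_PATH}" ;;
  esac
  export DYLD_LIBRARY_PATH
}

# Usage (e.g. in ~/.bash_profile):
# prepend_dyld_path /usr/local/mysql/lib
```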

Option 2). Symbolically link the required library into /usr/lib

Works for all applications, but requires admin rights to create the symbolic link in /usr/lib.

> sudo ln -s /usr/local/mysql/lib/libmysqlclient.18.dylib /usr/lib/libmysqlclient.18.dylib
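`ln -s` fails if the link already exists, for instance when you repeat the setup. Below is a sketch of a guarded version with a hypothetical `link_library` helper. It calls sudo by default; setting `SUDO=` (empty) runs it without sudo for directories you own.

```shell
# Hypothetical helper: symlink a library into a directory unless something with
# that name is already there. Uses sudo by default (needed for /usr/lib);
# set SUDO= (empty) to run without it.
link_library() {
  local src=$1 dest_dir=$2
  local dest="$dest_dir/$(basename "$src")"
  local sudo_cmd=${SUDO-sudo}
  if [ -e "$dest" ] || [ -L "$dest" ]; then
    echo "already present: $dest"
  elif [ ! -e "$src" ]; then
    echo "source not found: $src" >&2
    return 1
  else
    $sudo_cmd ln -s "$src" "$dest"
  fi
}

# Usage:
# link_library /usr/local/mysql/lib/libmysqlclient.18.dylib /usr/lib
```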

Option 3). Set the library’s install name with install_name_tool

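This is the more official OSX approach. It records the library’s full path as its install name, so DBD::mysql links against that absolute path instead of relying on the search path. You will need to re-run it whenever you upgrade your MySQL installation, and it also requires admin rights.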

> sudo install_name_tool -id /usr/local/mysql-5.6.12-osx10.7-x86_64/lib/libmysqlclient.18.dylib /usr/local/mysql-5.6.12-osx10.7-x86_64/lib/libmysqlclient.18.dylib

Installing Ensembl

Nearly there. My best advice is to follow the installation instructions hosted on the Ensembl website. Once finished, verify that the installation works. Ensembl ships with a program called ping_ensembl.pl, which we will use to check that we can connect to Ensembl’s UK-based MySQL servers and find the human species.

> perl ~/src/ensembl/misc-scripts/ping_ensembl.pl
Installation is good. Connection to Ensembl works and you can query the human core database

The script will also try to diagnose any problems with missing dependencies. Remember, should you need to install any additional dependencies, use cpanm.

Congratulations

If you made it this far you should have a fully functional installation of Perl able to query Ensembl. More information on the API is available from our website along with tutorials covering the core, variation, comparative genomics and regulation APIs.

Should you have any issues then please do not hesitate to contact Helpdesk or follow our debug my Ensembl installation guide.

You may have noticed that release 71 made useastdb.ensembl.org accessible on port 3306 alongside the traditional Ensembl DB port 5306. This is in response to comments from users that many institutions and businesses do not allow access to remote resources on non-standard port numbers, and 5306 is one such port. We are beginning a migration that will see Ensembl host current release databases on the default MySQL port 3306 alongside 5306, as detailed in the following table:

Ensembl Databases and their Ports

Host                  | Database releases    | Release 71    | Release 72
ensembldb.ensembl.org | 47 and lower         | 3306 and 4306 | 4306
ensembldb.ensembl.org | 48 onwards           | 5306          | 3306 and 5306
useastdb.ensembl.org  | Current and previous | 3306 and 5306 | 3306 and 5306
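To pick a port in a script, you can encode the release 72 column of the table as a small lookup. This sketch uses a hypothetical `ensembldb_port` helper and covers only ensembldb.ensembl.org. The `anonymous` user in the usage line is Ensembl's public MySQL account.

```shell
# Hypothetical helper based on the release 72 column of the ports table:
# print the ensembldb.ensembl.org port that serves a given Ensembl release.
ensembldb_port() {
  local release=$1
  if [ "$release" -le 47 ]; then
    echo 4306   # MySQL v4 era databases
  else
    echo 3306   # release 48 onwards; 5306 also remains available
  fi
}

# Usage: connect to the server holding release 72 databases
# mysql --user=anonymous --host=ensembldb.ensembl.org --port="$(ensembldb_port 72)"
```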

Prior to release 48 all Ensembl databases were hosted on port 3306. Release 48 saw an upgrade of our MySQL deployment platform from v4 to v5 (http://lists.ensembl.org/ensembl-dev/msg03424.html). This required running two servers: 3306 hosting MySQL v4 databases and 5306 hosting v5 databases. This migration returns us to our original hosting pattern of making our latest releases available on 3306.

We hope that this move will help more users access our resources. Should you need more information then please do not hesitate to get in touch.

[Image: An example of output and documentation from the Ensembl REST Service]

We are pleased to announce the beta release of our programming-language-agnostic REST API, for Release 68 data, at beta.rest.ensembl.org. Our initial release provides access to:

  • Sequences (genomic, cDNA, CDS and protein)
  • VEP (Variant Effect Predictor)
  • Homologies
  • Gene Trees
  • Assembly and coordinate mapping

Data can be retrieved in JSON, XML and a variety of bioinformatics formats such as FASTA. Each endpoint is fully documented with live service responses and example clients in Perl, Python, Ruby and the Unix command line.
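For the command line, a request is an HTTP GET against an endpoint URL. The sketch below only builds the URL. `rest_url` is a hypothetical helper, and the `sequence/id/...` path and `content-type` query parameter are assumptions modelled on the service's style, so check each endpoint's documentation for the exact form.

```shell
# Hypothetical helper: build a REST request URL for the beta service.
# The endpoint path and the content-type query parameter are assumptions;
# check the endpoint documentation for the exact form.
rest_url() {
  local endpoint=$1 content_type=${2:-application/json}
  printf '%s/%s?content-type=%s\n' "http://beta.rest.ensembl.org" "$endpoint" "$content_type"
}

# Usage: fetch a gene's sequence as FASTA
# curl -s "$(rest_url sequence/id/ENSG00000139618 text/x-fasta)"
```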

Since 2006, Perl has been the only language with a supported API. Third-party alternatives are available but can lag in their support of new data. The REST service has been developed using Catalyst on top of the Perl API, which gives it a stable base for development and access to all of Ensembl’s functionality. Using the Perl API also means that any Ensembl-compatible resource can provide data using the same REST server. Our sister project, Ensembl Genomes, has already taken advantage of this and is hosting release 15 data at test.rest.ensemblgenomes.org.

Development is ongoing, so please let us know about any features you would like to see in a future release. Please send any feedback to helpdesk.

Ensembl 65 brought a major change to our core data model: we merged the stable id tables into their parent tables. The relationship between a stable id record and its parent record was 1:1, making these tables an unnecessary level of normalisation that increased the number of joins the API and MySQL had to perform. If you are using the Perl API this change will be transparent. However, if you use direct SQL, views have been provided that replicate the stable id tables so your SQL remains compatible. These views will be removed in Ensembl release 67. To support the new schema, queries should be performed against the parent table, for example:

  -- Original SQL
  select g.seq_region_start, g.seq_region_end
  from gene g join gene_stable_id gsi using (gene_id)
  where gsi.stable_id = 'ENSG00000139618';

  -- Should now be
  select g.seq_region_start, g.seq_region_end
  from gene g where g.stable_id = 'ENSG00000139618';

If you have any other queries about the changes then please contact helpdesk or our dev mailing list.