
I don't know exactly why, but installing science/math/statistics oriented Python packages on OS X has historically been a complete pain in the ass. It seems as though things have improved over the past few years with the development of custom disk images, meta-package installers and other fanciful things, but most of these solutions sacrifice the ability to upgrade the given packages or link against custom builds of supporting libraries due to overly aggressive sandboxing to ensure that things Just Work™.

I've written this as a reference for people who aren't familiar with the package management and environment separation that are common best practice in the software development/operations world, but who would nonetheless prefer to steer clear of the point-and-click installers [1].

Initial Setup

Before you do anything at all, make sure you have Xcode installed. If you already do, you can safely skip this step. If you don't and wish to avoid the 4.3GB download, you can install the Xcode command line tools instead, which clock in at a few hundred megabytes and include the compilers and build toolchains needed for every step that follows.

The easiest way to install the Xcode CLI tools is with xcode-select, which is included in OS X Mavericks:

$ xcode-select --install
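If you'd like to confirm that the toolchain is actually in place before moving on, here is a minimal sketch that runs on the system Python. The command_output helper is a hypothetical name used only for illustration; the sketch assumes that xcode-select -p prints the active developer directory and that cc is the compiler the CLI tools provide.

```python
from __future__ import print_function
import subprocess


def command_output(argv):
    """Run argv and return its stripped output, or None if it is missing or fails."""
    try:
        out = subprocess.check_output(argv, stderr=subprocess.STDOUT)
    except (OSError, subprocess.CalledProcessError):
        # OSError: the executable does not exist; CalledProcessError: non-zero exit.
        return None
    return out.decode("utf-8", "replace").strip()


if __name__ == "__main__":
    # Both commands are assumptions about a typical OS X setup.
    print("developer dir:", command_output(["xcode-select", "-p"]))
    print("compiler:     ", command_output(["cc", "--version"]))
```

A None for either line suggests the tools are not installed (or not on your $PATH) yet.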

Installing the Latest Python via Homebrew

You're probably not going to get very far with the built-in Python that OS X ships with. Perhaps you're one of the lucky ones and you've simply installed any and all Python packages to your system site-packages without issue. If you are, then relish this temporary luxury; it won't last for very long.

For the rest of us, we're going to install the latest Python 2 (2.7.8 as of this writing) via Homebrew.

Installing Homebrew itself is simple enough [2]:

$ ruby -e "$(curl -fsSL https://raw.github.com/Homebrew/homebrew/go/install)"

This will check for some required dependencies, make sure your system is up-to-date, and install the required scripts in the default destination of /usr/local. We won't go into much detail on this (there are plenty of guides on how to install Homebrew), but once the process has completed successfully and you've followed all the printed instructions, you can move on to the next step.

Installing the latest version of Python 2

Now for the big guy:

$ brew install python

Assuming that your $PATH has been updated (as suggested during the Homebrew installation) to include /usr/local/bin, the default python executable should be the one you just installed.

You can check this by running python --version and confirming that it matches the version reported by Homebrew, and by running which python to confirm that it points to /usr/local/bin/python and not /usr/bin/python. If it's the latter, your default Python binary is still the system one.
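The same check can be sketched in Python itself. This is only an approximation of what the shell does when it resolves a command name; find_on_path is a hypothetical helper written for this example, not something Homebrew or Python provides.

```python
from __future__ import print_function
import os
import sys


def find_on_path(name, path=None):
    """Return the first executable called `name` on a PATH-style string, or None."""
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        # An empty PATH entry conventionally means the current directory.
        candidate = os.path.join(directory or os.curdir, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


if __name__ == "__main__":
    print("first python on PATH:", find_on_path("python"))
    print("running interpreter: ", sys.executable)
    print("version:             ", sys.version.split()[0])
```

If the first line shows /usr/bin/python, then /usr/local/bin either isn't on your $PATH or comes after /usr/bin.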

As a nice side-effect of the above successful installation, we now have pip (the Python package manager) available as well. Let's put that to good use right away.

Virtualenv and Virtualenvwrapper

The virtualenv package, which is now a default module in Python 3.3 (albeit with a slightly different name and functionality), is essential for ensuring that you minimize your chances of ending up in dependency hell. From their introduction:

It creates an environment that has its own installation directories, that doesn’t share libraries with other virtualenv environments (and optionally doesn’t access the globally installed libraries either).

A tool that I frequently use is virtualenvwrapper, which is a very small set of smart defaults and command aliases to make working with virtualenvs more intuitive. Let's install that now, which will also install the base virtualenv package:

$ pip install virtualenvwrapper

Next, you'll want to add the following lines to the end of your shell startup file. This is most likely ~/.bashrc (or ~/.bash_profile, which is the file Bash reads for the login shells that Terminal opens on OS X), but if you've changed your default shell to something else such as zsh, then it could be different (e.g. ~/.zshrc):

export WORKON_HOME=$HOME/.virtualenvs
source /usr/local/bin/virtualenvwrapper.sh

The first line in the above code block indicates that new virtual environments created with virtualenvwrapper should be stored in $HOME/.virtualenvs. You can modify this as you see fit, but I generally stick with this default. I find that keeping all my virtualenvs in the same hidden folder in my home directory reduces the amount of clutter within individual projects, and makes it a bit more difficult to mistakenly add a whole virtual environment to version control [3].

The second line adds a few convenient aliases to your current shell environment for creating, activating, switching and removing environments:

  1. mkvirtualenv test: Creates a test environment and activates it automatically.
  2. workon scientific: Switches you to the (already created) scientific environment.
  3. workon: With no environment specified, prints all existing environments.
  4. deactivate: Disables the currently active environment, if any.
  5. rmvirtualenv statistics: Completely removes the statistics environment.

We'll use this to create an environment for installing our scientific packages:

$ mkvirtualenv scientific

This will create a blank scientific environment and activate it; you should see a (scientific) tag in your shell prompt [4].
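If the prompt tag doesn't show up, you can ask the interpreter directly. The following is a rough sketch, assuming that classic virtualenv sets sys.real_prefix and that venv-style environments make sys.base_prefix differ from sys.prefix; in_virtualenv is a hypothetical helper name.

```python
from __future__ import print_function
import os
import sys


def in_virtualenv(sys_module=sys):
    """Best-effort check for an active virtual environment."""
    # Classic virtualenv records the original interpreter prefix here.
    if hasattr(sys_module, "real_prefix"):
        return True
    # venv-style environments expose base_prefix, which differs from prefix when active.
    base_prefix = getattr(sys_module, "base_prefix", sys_module.prefix)
    return base_prefix != sys_module.prefix


if __name__ == "__main__":
    print("inside a virtualenv:", in_virtualenv())
    print("VIRTUAL_ENV:        ", os.environ.get("VIRTUAL_ENV"))
    print("sys.prefix:         ", sys.prefix)
```

Inside the scientific environment, sys.prefix should point somewhere under your WORKON_HOME rather than /usr/local.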

The list of things that virtualenvwrapper can do is quite extensive, if you're interested. I've only presented the very basics here, but learning the tool in more depth pays off quickly if you work with Python virtual environments all day long.
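There's not much magic involved, either: each environment is just a directory under WORKON_HOME. As a rough illustration, the sketch below lists them in the spirit of running workon with no arguments. It assumes an environment can be recognised by its bin/activate script, and list_virtualenvs is a hypothetical helper, not part of virtualenvwrapper.

```python
from __future__ import print_function
import os


def list_virtualenvs(workon_home=None):
    """Return the sorted names of environments found under workon_home."""
    if workon_home is None:
        default = os.path.join(os.path.expanduser("~"), ".virtualenvs")
        workon_home = os.environ.get("WORKON_HOME", default)
    if not os.path.isdir(workon_home):
        return []
    # Assumption: a directory counts as an environment if it has bin/activate.
    return sorted(
        name for name in os.listdir(workon_home)
        if os.path.isfile(os.path.join(workon_home, name, "bin", "activate"))
    )


if __name__ == "__main__":
    for name in list_virtualenvs():
        print(name)
```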

Almost There: Some Necessary and Optional Libraries

There are a few system libraries that you'll want to install before installing the Python packages that we've been working towards:

Freetype

$ brew install freetype

While this might seem like a silly requirement (Freetype is a cross-platform font engine), some packages require it for the programmatic generation of images that include text, such as plots in matplotlib [5].

Libxml2

$ brew install libxml2

You might not deal with XML files directly, but there are some libraries and/or packages that utilize XML as an intermediate data representation format. This library is the de facto standard that most userland packages will link against.
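To get a rough idea of whether these libraries are visible on your system, you can ask Python's ctypes machinery to look for them. This is only a sketch: ctypes.util.find_library uses its own search rules, which may differ from what a package's build script does, and it may report a copy that ships with the system rather than the Homebrew one. The locate_libraries helper is a hypothetical name.

```python
from __future__ import print_function
from ctypes.util import find_library


def locate_libraries(names):
    """Map each library name (without the 'lib' prefix) to what find_library reports."""
    return dict((name, find_library(name)) for name in names)


if __name__ == "__main__":
    for name, location in sorted(locate_libraries(["freetype", "xml2"]).items()):
        print("{0:10s} {1}".format(name, location or "NOT FOUND"))
```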

The Finale: Installing the Python Packages

Our stage is primed, and we can finally attempt to install the packages that we wanted in the first place. Within the active scientific environment, run the following command [6]:

$ pip install numpy scipy matplotlib pandas nltk ipython

Assuming that we've done everything correctly, this should take a few minutes to fetch the packages in question from the PyPI index, install them and their dependencies (some of which overlap, e.g. SciPy depends on NumPy), and compile any required C extensions.

A simple way to check that everything was installed correctly is to inspect the version strings for all of the relevant packages:

(scientific) $ ipython
Python 2.7.8 (default, Aug 24 2014, 21:26:19)

In [1]: import numpy

In [2]: print numpy.version.full_version
1.8.2

In [3]: import scipy

In [4]: print scipy.version.full_version
0.14.0

In [5]: import pandas

In [6]: print pandas.version.version
0.14.1

In [7]: import nltk

In [8]: print nltk.version_info
sys.version_info(major=2, minor=7, micro=8, releaselevel='final', serial=0)

In [9]: import matplotlib

In [10]: print matplotlib.__version__
1.4.0


If all the above version strings print without error, then there's a high likelihood that everything is working as expected.
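If you'd rather not type all of that by hand every time, the check can be scripted. Here is a minimal sketch, assuming each package exposes a __version__ attribute (as far as I know, all of the ones installed here do) and that the ipython package is imported as IPython; report_versions is a hypothetical helper name.

```python
from __future__ import print_function
import importlib


def report_versions(names):
    """Map each module name to its __version__, 'unknown', or None if it can't be imported."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
        else:
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    packages = ["numpy", "scipy", "matplotlib", "pandas", "nltk", "IPython"]
    for name, version in sorted(report_versions(packages).items()):
        print("{0:12s} {1}".format(name, version if version is not None else "NOT INSTALLED"))
```

Run it from within the scientific environment; anything reported as NOT INSTALLED failed to import there.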

Going Further

The virtual environment framework that we've set up is a robust one, and can be used for any number of projects. For the sake of simplicity we created a single scientific virtualenv, but there's nothing stopping you from creating different environments for any and all of your projects, as long as the latter are independent of one another.

Resist the temptation to install every package for every one of your projects into the same catch-all virtualenv, and you should manage to hold on to your sanity for just a bit longer than your colleagues.


[1] Familiarity with the command line is an obvious necessity, of course.

[2] There are some non-negligible security implications when running a remote script in this manner, but for the sake of simplicity we will not deviate from the official Homebrew installation instructions. If this worries you, it's always possible to first fetch the given installation script, inspect it, and run it locally. Additionally, you can always fetch the current tarball and extract it in /usr/local yourself.

[3] Adding an entire virtual environment to version control might seem like a good idea, but things are never as simple as they seem. A full virtualenv folder may contain packages with C modules that were compiled against your own architecture, so the moment someone running a slightly (or completely) different operating system downloads your project, they're going to have a hard time getting things to work.

[4] If you are using a shell other than Bash or Zsh, this environment tag may or may not appear. The way this works is that the script that is used to activate the virtualenv also modifies your current prompt string (the PS1 environment variable) to indicate the currently active virtualenv. As a result, there is a chance that this may not work if you're using a very special or non-standard shell configuration.

[5] I recently ran into a bug while attempting to install matplotlib that required a bit of source diving to determine that it was failing due to freetype not being available on my system. The bug has since been fixed, but as of this writing the fix has not been made available in a stable release.

[6] I've tacked on the ipython package, which many of you might already be using as an enhanced interactive shell (or even as an incredibly useful interactive notebook). It's possible to use the same ipython package installed in your system site-packages for all of your virtualenvs, but some unexpected behaviour might occur. As a result, it's suggested to install ipython into each virtualenv when required.