I don't know exactly why, but installing science-, math- and statistics-oriented Python packages on OS X has historically been a complete pain in the ass. Things seem to have improved over the past few years with the development of custom disk images, meta-package installers and other fanciful things, but most of these solutions sandbox their contents aggressively to ensure that things Just Work™, which sacrifices the ability to upgrade the bundled packages or to link against custom builds of supporting libraries.
I've written this as a reference for people who are not familiar with the intricacies of package management and environment separation that are common best practice in the software development/operations world, but who would nonetheless prefer to steer clear of the point-and-click installers [1].
Before you do anything at all, make sure you install Xcode. If you already have it installed, you can safely skip this step. If you do not have it installed and wish to avoid the 4.3GB download, you can install the Xcode command line tools instead, which clock in at a few hundred megabytes and include the relevant compilers and build toolchains that will be necessary for every step that follows.
The easiest way to install the Xcode CLI tools is with xcode-select, which is included in OS X Mavericks; running xcode-select --install from a terminal starts the download.
Installing the Latest Python via Homebrew
You're probably not going to get very far with the built-in Python that OS X ships with. Perhaps you're one of the lucky ones and you've simply installed any and all Python packages to your system site-packages without issue. If you are, then relish this temporary luxury; it won't last for very long.
For the rest of us, we're going to install the latest Python 2 (2.7.8 as of this writing) via Homebrew.
Installing Homebrew itself is simple enough: run the one-line installer given in the official Homebrew installation instructions [2].
This will check for some required dependencies, make sure your system is up-to-date, and install the required scripts in the default destination of /usr/local. We won't go into much detail on this (there are plenty of guides on how to install Homebrew), but once the process has completed successfully and you've followed all the printed instructions, you can move on to the next step.
Installing the latest version of Python 2
Now for the big guy: installing Python itself through Homebrew.
Assuming that your $PATH has been updated (as suggested during the Homebrew installation) to include /usr/local/bin, the default python executable should be the one you just installed.
You can check this by looking at the output of python --version and ensuring it's the same version reported by Homebrew, and by running which python to ensure that it points to /usr/local/bin/python and not /usr/bin/python; if it is the latter, your default Python binary is still the system default one.
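The same check can be scripted. The sketch below is an illustration rather than part of the original setup: the helper names find_on_path and is_system_python are made up for this example, and it assumes (as the text does) that /usr/bin/python is the interpreter shipped with OS X. It walks $PATH by hand so that it should run unchanged on the Python 2.7 installed above and on newer interpreters.

```python
import os
import sys


def find_on_path(command, search_path=None):
    """Return the first executable called `command` on the search path, or None."""
    if search_path is None:
        search_path = os.environ.get("PATH", "")
    for directory in search_path.split(os.pathsep):
        if not directory:
            continue  # skip empty PATH entries
        candidate = os.path.join(directory, command)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def is_system_python(path):
    """True when `path` is the interpreter that ships with OS X (assumed location)."""
    return path == "/usr/bin/python"


if __name__ == "__main__":
    found = find_on_path("python")
    print("first python on PATH: %s" % found)
    print("this interpreter:     %s (%s)" % (sys.executable, sys.version.split()[0]))
    if found is None:
        print("no executable named python on PATH")
    elif is_system_python(found):
        print("still the system Python; /usr/local/bin should come earlier in PATH")
```

Run it with whichever python your shell picks up; if the last message appears, revisit the $PATH change suggested by the Homebrew installer.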
As a nice side-effect of the above successful installation, we now have pip (the Python package manager) available as well. Let's put that to good use right away.
Virtualenv and Virtualenvwrapper
The virtualenv package, which is now a default module in Python 3.3 (albeit with a slightly different name and functionality), is essential for minimizing your chances of ending up in dependency hell. From its introduction:
It creates an environment that has its own installation directories, that doesn’t share libraries with other virtualenv environments (and optionally doesn’t access the globally installed libraries either).
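To see that isolation from inside Python, you can compare the interpreter's own prefix with the prefix of the installation it was created from. This is a minimal sketch, not something taken from the virtualenv documentation quoted above: the function names are made up, and it relies on two interpreter attributes, sys.real_prefix (set by older virtualenv releases) and sys.base_prefix (set by Python 3.3+ and by newer virtualenv releases).

```python
import sys


def base_prefix():
    """Return the prefix of the Python installation behind this interpreter."""
    # Older virtualenv releases record the original prefix in sys.real_prefix;
    # Python 3.3+ and newer virtualenv releases use sys.base_prefix instead.
    return getattr(sys, "real_prefix", None) or getattr(sys, "base_prefix", sys.prefix)


def in_virtual_environment():
    """True when the running interpreter belongs to a virtual environment."""
    return base_prefix() != sys.prefix


if __name__ == "__main__":
    print("prefix:      %s" % sys.prefix)
    print("base prefix: %s" % base_prefix())
    print("isolated:    %s" % in_virtual_environment())
```

Outside any environment the two prefixes match; inside one, sys.prefix points at the environment's own directory.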
A tool that I frequently use is virtualenvwrapper, which is a very small set of smart defaults and command aliases to make working with virtualenvs more intuitive. Let's install that now with pip (pip install virtualenvwrapper), which will also install the base virtualenv package.
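Next, you'll want to add two lines to the end of your shell startup file: one that sets the WORKON_HOME variable and one that sources the virtualenvwrapper.sh script installed alongside the package. Which file that is depends on your shell; if you've changed your default shell to something else such as zsh, then it could be a different one.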
The first line indicates that new virtual environments created with virtualenvwrapper should be stored in $HOME/.virtualenvs. You can modify this as you see fit, but I generally leave it as a good default. I find that keeping all my virtualenvs in the same hidden folder in my home directory reduces the amount of clutter within individual projects, and makes it a bit more difficult to mistakenly add a whole virtual environment to version control [3].
The second line adds a few convenient aliases to your current shell environment for creating, activating, switching and removing environments:
- mkvirtualenv test: Creates a test environment and activates it automatically.
- workon scientific: Switches you to the (already created) scientific environment.
- workon: When you don't specify an environment, this prints all existing environments.
- deactivate: Disables the currently active environment, if any.
- rmvirtualenv statistics: Completely removes the statistics environment.
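As a rough illustration of what the bare workon command reports, the sketch below lists the environments stored under a directory such as $HOME/.virtualenvs. It is an approximation under stated assumptions, not virtualenvwrapper's own code: list_virtualenvs and demo are hypothetical helpers, and a folder is counted as an environment only if it contains a bin/activate script.

```python
import os
import shutil
import tempfile


def list_virtualenvs(workon_home):
    """Return the sorted names of the environments found under `workon_home`."""
    if not os.path.isdir(workon_home):
        return []
    return sorted(
        name
        for name in os.listdir(workon_home)
        # Assumption: a real environment always ships a bin/activate script.
        if os.path.isfile(os.path.join(workon_home, name, "bin", "activate"))
    )


def demo():
    """Build a throwaway directory holding one fake environment and list it."""
    root = tempfile.mkdtemp()
    try:
        os.makedirs(os.path.join(root, "scientific", "bin"))
        open(os.path.join(root, "scientific", "bin", "activate"), "w").close()
        os.makedirs(os.path.join(root, "not-an-environment"))
        return list_virtualenvs(root)
    finally:
        shutil.rmtree(root)


if __name__ == "__main__":
    # WORKON_HOME is the variable virtualenvwrapper reads; fall back to its usual default.
    home = os.environ.get("WORKON_HOME", os.path.expanduser("~/.virtualenvs"))
    print("environments in %s: %s" % (home, list_virtualenvs(home)))
```

The demo helper only exists to show the behaviour without touching your real environments; it returns the single fake environment it created.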
We'll use this to create an environment for installing our scientific packages by running mkvirtualenv scientific. This will create a blank scientific environment and activate it; you should see a (scientific) tag in your shell prompt [4].
The list of things that virtualenvwrapper can do is quite extensive, if you're interested. I've only presented the very basics here, but learning what this tool can do in more depth pays off if you work with Python virtual environments all day long.
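Coming back to the (scientific) tag: if it does not show up in your prompt, you can still ask the environment itself. This is a minimal sketch, assuming only that the activation script exports the VIRTUAL_ENV variable (virtualenv's activate script does); active_environment is a made-up helper name.

```python
import os


def active_environment(environ=None):
    """Return the name of the active virtualenv, or None when none is active."""
    if environ is None:
        environ = os.environ
    # The activate script exports VIRTUAL_ENV with the environment's full path.
    path = environ.get("VIRTUAL_ENV")
    if not path:
        return None
    return os.path.basename(path.rstrip(os.sep))


if __name__ == "__main__":
    print("active environment: %s" % active_environment())
```

Passing a plain dictionary instead of os.environ makes the helper easy to try out without activating anything.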
Almost There: Some Necessary and Optional Libraries
There are a few system libraries that you'll want to install before the Python packages that we've been working towards:
While this might seem like a silly requirement (Freetype is a library that provides a cross-platform font engine), some packages require it for the programmatic generation of images that include text, such as plots in matplotlib [5].
You might not deal with XML files directly, but some libraries and packages use XML as an intermediate data representation format. This library is the de facto standard that most userland packages will link against.
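Before moving on, it can be worth confirming that the libraries are visible on your system. The sketch below uses ctypes.util.find_library from the standard library. The names in LIBRARIES are assumptions: freetype comes from the text above, while xml2 (libxml2) is only a guess at the XML library meant here. Note that finding a shared library does not guarantee that the headers needed for compilation are installed too.

```python
from ctypes.util import find_library

# Assumed names: "freetype" is named in the text, "xml2" (libxml2) is a guess.
LIBRARIES = ("freetype", "xml2")


def missing_libraries(names=LIBRARIES):
    """Return the subset of `names` that the system's library lookup cannot find."""
    return [name for name in names if find_library(name) is None]


if __name__ == "__main__":
    missing = missing_libraries()
    if missing:
        print("not found: %s" % ", ".join(missing))
    else:
        print("all libraries found")
```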
The Finale: Installing the Python Packages
Our stage is primed, and we can finally attempt to install the packages that we wanted in the first place. Within the active scientific environment, install them with pip [6].
Assuming that we've done everything correctly, this should take a few minutes to fetch the packages in question from the PyPI index, install them and their dependencies (some of which overlap, e.g. SciPy depends on NumPy), and compile any required C extensions.
A simple way to check that everything was installed correctly is to import each of the relevant packages in a Python shell and print its version string (the __version__ attribute).
If all of the version strings print without error, then there's a high likelihood that everything is working as expected.
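One way to script that check is sketched below. The PACKAGES tuple is an assumption based on the packages mentioned in this post (adjust it to whatever you actually installed), package_version and report are made-up helper names, and the sketch relies on the common but not universal convention that a package exposes a __version__ attribute.

```python
import importlib

# Assumed import names, taken from the packages mentioned in this post.
PACKAGES = ("numpy", "scipy", "matplotlib", "IPython")


def package_version(name):
    """Return the version string of module `name`, or None if it cannot be imported."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    # Not every module defines __version__, so fall back to a placeholder.
    return getattr(module, "__version__", "unknown")


def report(names=PACKAGES):
    """Map each module name to its version string (None when the import fails)."""
    return {name: package_version(name) for name in names}


if __name__ == "__main__":
    for name, version in sorted(report().items()):
        print("%-12s %s" % (name, version if version is not None else "NOT INSTALLED"))
```

Run it from within the active scientific environment; any line reading NOT INSTALLED points at a package whose import failed.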
The virtual environment framework that we've set up is a robust one, and can be used for any number of projects. For the sake of simplicity we created a single scientific virtualenv, but there's nothing stopping you from creating different environments for any and all of your projects, as long as the latter are independent of one another.
Resist the temptation to install every package for every one of your projects into the same catch-all virtualenv, and you should manage to hold on to your sanity for just a bit longer than your colleagues.
1. Familiarity with the command line is an obvious necessity, of course.
2. There are some non-negligible security implications when running a remote script in this manner, but for the sake of simplicity we will not deviate from the official Homebrew installation instructions. If this worries you, it's always possible to first fetch the given installation script, inspect it, and run it locally. Additionally, you can always fetch the current tarball and extract it yourself.
3. Adding an entire virtual environment to version control might seem like a good idea, but things are never as simple as they seem. Your virtualenv folder may contain packages with C modules that were compiled against your own architecture, so the moment that someone running a slightly (or completely) different operating system downloads your project, they're going to have a hard time getting things to work.
4. If you are using a shell other than Bash or Zsh, this environment tag may or may not appear. The way this works is that the script that is used to activate the virtualenv also modifies your current prompt string (the PS1 environment variable) to indicate the currently active virtualenv. As a result, there is a chance that this may not work if you're using a very special or non-standard shell configuration.
5. I recently ran into a bug while attempting to install matplotlib that required a bit of source diving to determine that it was failing due to freetype not being available on my system. The bug has since been fixed, but as of this writing the fix has not been made available in a stable release.
6. I've tacked on the ipython package, which many of you might already be using as an enhanced interactive shell (or even as an incredibly useful interactive notebook). It's possible to use the ipython package installed in your system site-packages for all of your virtualenvs, but some unexpected behaviour might occur. As a result, it's suggested to install ipython into each virtualenv when required.