So you have just bought a new MacBook, or you are looking into doing some data science work on your existing Mac, and you need to set up a development environment?

This article is a guide for installing and configuring a lean data science development environment on a new MacBook (M1+ or Intel chip).

To get started with data science, you need three basic components:

  1. A terminal application for launching applications and running programs
  2. A robust Python environment for development work
  3. A code editor/integrated development environment such as JupyterLab or VSCode

The setup guide includes the key tools required for data science workflows with a focus on lightweight tools (minimising installation of unnecessary or large packages) and automation (i.e. using the command line for installation of software where possible).

We will discuss the following tools that I use as the base for a lean and effective workbench for my data science projects:

  • Homebrew
  • Terminal (iterm2)
  • Python (pyenv)
  • Code Editor (Jupyterlab and Visual Studio Code)
  • Docker

Let’s get into it! 🤘

Installing a System Package Manager

Homebrew

The first step is to install a system package manager. I recommend Homebrew: it is simple to use and just works.

Homebrew is the most popular package manager for macOS. It is a free and open-source package management system which allows you to easily install applications straight from the command line.

We will use Homebrew to install most of the relevant software for our data science development environment.

Installation instructions are available on the Homebrew homepage. In short, paste the following command into the Terminal application and run it to install Homebrew.

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

The script will also install any relevant dependencies (e.g. Apple’s command line tools for Xcode) if they are not already installed. Note that this can take a while.

Follow the on-screen instructions to complete the installation.

After installing Homebrew, you might be prompted to add Homebrew to your PATH (see screenshot below). Adding Homebrew to your PATH variable allows you to use the brew command in the terminal.

Output from the Homebrew installation script, including instructions for adding Homebrew to your PATH

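Follow the instructions found in the ‘Next steps’ section of the terminal output to configure your shell. For example, on an Apple Silicon Mac you would use the snippet below with your username (on Intel Macs, Homebrew installs under /usr/local rather than /opt/homebrew, so use the path shown in your own terminal output):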

# add Homebrew to your path if prompted
echo 'eval "$(/opt/homebrew/bin/brew shellenv)"' >> /Users/<your-username>/.zprofile
eval "$(/opt/homebrew/bin/brew shellenv)"
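The path in the snippet above is the default for Apple Silicon. As a minimal sketch (brew_prefix_for_arch is a hypothetical helper name, not part of Homebrew), you could pick the default prefix from the CPU architecture and only load Homebrew into the current shell if it is really there:

```shell
# Hypothetical helper: map a CPU architecture to Homebrew's default prefix on macOS.
brew_prefix_for_arch() {
  case "$1" in
    arm64)  echo "/opt/homebrew" ;;  # Apple Silicon default
    x86_64) echo "/usr/local" ;;     # Intel default
    *)      return 1 ;;              # unknown architecture
  esac
}

# Work out the prefix for this machine; leave it empty if the architecture is unknown.
prefix="$(brew_prefix_for_arch "$(uname -m)")" || prefix=""

# Only touch the shell environment if brew exists at the expected location.
if [ -n "$prefix" ] && [ -x "$prefix/bin/brew" ]; then
  eval "$("$prefix/bin/brew" shellenv)"
  brew --version
else
  echo "Homebrew not found at the default prefix for $(uname -m)"
fi
```

If you installed Homebrew somewhere non-standard, trust the path printed by the installer over this sketch.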

Programs and Tools

Now that we have a package manager, let’s use it to install some key tools which will be the basis of our development environment setup.

Terminal

I prefer to use iterm2 as my terminal, instead of the standard Terminal application which ships with macOS.

iterm2 is very customisable and has lots of great features.

We can install it using Homebrew, or you can install it directly from the iterm2 website.

# install iterm2 using Homebrew
brew install --cask iterm2

I won’t go into how you can customise iterm2 (e.g. fonts, color schemes etc.) in this post, but there are lots of great tutorials on the web you can look into that explain various customisations.

Now that we have installed iterm2, we can use it for the rest of the setup.

Git

Git is essential for managing version control in your projects and collaborating effectively with others. It should be one of the first things you install on your system.

“Everything should be under version control” – The Pragmatic Programmer, David Thomas and Andrew Hunt

Git should already be installed as part of the Homebrew installation step above. However, you can also install it using Homebrew:

brew install git

Once installed you should set your global Git credentials (e.g. your GitHub username and email) to identify yourself on your future commits. For example:

git config --global user.name "John Doe"
git config --global user.email johndoe@example.com

You can also set up your SSH keys to authenticate to GitHub. Check out my other article on managing SSH keys for multiple accounts.

If you are new to Git, I highly recommend reading Ry's Git Tutorial. It is a free ebook covering the core aspects of using Git and collaborating with others.

Python

Now for the good stuff. Installing Python.

But first a word of warning…

⛔️ Do not mess with your system Python

Python already comes installed with macOS. If you type python or python3 into your terminal application, the Python REPL will open up. Avoid using, or installing packages into, the Python that is already installed on your system. I will refer to this as your ‘system Python’.

While it is possible to use the system Python, it is very risky, as you may make unintended changes to the environment which adversely affect other programs in your operating system. Additionally, if you ever break anything in your Python environment and want to completely uninstall and reinstall it, you could be in real trouble. It’s best not to touch it.

See this video from RealPython to understand more about system Python.

✅ Keep your development Python environments separate from the system Python

Instead, you should install a separate application such as conda or pyenv to manage your Python environments separately from the system Python.

I prefer to use pyenv to manage my Python versions and virtual environments.

Many tutorials, particularly for data science, will suggest using conda. However, I find conda (and even miniconda) to be bloated with unnecessary software.

Pyenv is lightweight and makes it very easy to manage multiple versions of Python on the same computer.

Unfortunately, the installation of pyenv is a bit more involved than conda, but I think the upfront effort is worth it in the end.

Installing pyenv

Pyenv can be a little tricky to install properly, so make sure you read the installation instructions on their GitHub page very carefully.

First you should install the following dependencies documented in the GitHub wiki page:

brew install openssl readline sqlite3 xz zlib

Next, install pyenv:

brew install pyenv

Configure your shell

It is easy to miss in the installation instructions, but you also need to configure your shell in order for pyenv to work properly.

The instructions for configuring your shell are (in my opinion) hidden in another part of the documentation – in step 2 of the Basic GitHub Checkout instructions.

macOS uses zsh as the default shell, and so does iterm2. Therefore you need to run the following commands to initialise pyenv when loading a new session:

echo 'eval "$(pyenv init --path)"' >> ~/.zprofile
echo 'eval "$(pyenv init -)"' >> ~/.zshrc

These commands add some information to your .zprofile and .zshrc configuration files which will automatically initialise pyenv every time you open your terminal.
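One caveat with echo ... >> is that running the command twice adds the same line twice. As a minimal sketch (append_once is a hypothetical helper, not something pyenv provides), you could append a line only when it is not already in the file. The demo writes to a scratch file so that nothing in your home directory changes:

```shell
# Hypothetical helper: append a line to a file only if that exact line is missing.
append_once() {
  line="$1"
  file="$2"
  touch "$file"
  # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Demonstrated on a scratch file; to apply it for real, pass ~/.zprofile or
# ~/.zshrc as the second argument instead.
scratch="$(mktemp)"
append_once 'eval "$(pyenv init -)"' "$scratch"
append_once 'eval "$(pyenv init -)"' "$scratch"   # second call adds nothing
cat "$scratch"
```

The single quotes matter: they stop your current shell from expanding $(pyenv init -) before the line is written to the file.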

Close and restart your iterm2 terminal for the changes to take effect.

Install a new Python version

After installing pyenv, the only version of Python available to us is still the system Python (avoid using it!).

# list installed Python versions
pyenv versions

A full list of pyenv commands is available in their COMMANDS.md page.

Only the system Python is available after installing pyenv

Therefore we need to install a separate version of Python to use for our projects – for example version 3.10.0:

# install separate python version 3.10.0
pyenv install 3.10.0

# list versions again
pyenv versions

Python 3.10.0 is now available to us

Note: I used v3.10.0 as an example; however, you can install any version of Python you like, even another copy of 3.8.9 that is separate from the system default. See the Python documentation for a full list of Python versions.
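To see which versions pyenv itself can install, pyenv install --list prints every available build. The list is long, so here is a rough sketch of filtering it down to the newest patch release of one series (latest_patch is a hypothetical helper written for this example, not a pyenv command):

```shell
# Hypothetical helper: read version names on stdin (one per line) and print
# the highest patch release of the series given as $1, e.g. "3.10".
latest_patch() {
  sed 's/^[[:space:]]*//' | awk -F. -v series="$1" '
    # keep only final releases such as 3.10.4 (skips 3.10-dev, 3.10.0rc1, ...)
    NF == 3 && ($1 "." $2) == (series "") && $3 ~ /^[0-9]+$/ && ($3 + 0) >= best {
      best = $3 + 0
      line = $0
    }
    END { if (line != "") print line }
  '
}

# Only query pyenv if it is installed and on the PATH.
if command -v pyenv >/dev/null 2>&1; then
  pyenv install --list | latest_patch 3.10
else
  echo "pyenv not found; install it first"
fi
```

You can then pass the printed version straight to pyenv install.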

Change the default Python version

You will notice there is an asterisk next to ‘system’ in the screenshot above. This means that the default Python interpreter is still the system Python.

We can use the pyenv global command to set the default Python version to the one we just installed.

# set default version to newly installed 3.10.0
pyenv global 3.10.0

After running the `pyenv global` command, we can see the Python version being used has changed to v3.10.0 from the default v3.8.9

If, in the future, you need or want to use a different Python version, you can easily install another python version using pyenv install <version> and then set it as the default using pyenv global <version>.
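A quick sanity check after switching versions is to look at where python3 resolves on your PATH. The sketch below is only illustrative: classify_python is a hypothetical helper, and it assumes pyenv lives in its default ~/.pyenv location (a custom PYENV_ROOT would show up as "other").

```shell
# Hypothetical helper: guess who owns a Python executable from its path.
# Assumes pyenv's default root directory (~/.pyenv).
classify_python() {
  case "$1" in
    */.pyenv/shims/*|*/.pyenv/versions/*) echo "pyenv" ;;
    /usr/bin/*|/System/*)                 echo "system" ;;
    "")                                   echo "missing" ;;
    *)                                    echo "other" ;;
  esac
}

# Resolve python3 on the PATH (empty if it is not found) and report the owner.
python_path="$(command -v python3 || true)"
echo "python3 -> ${python_path:-not found} ($(classify_python "$python_path"))"
```

If this still reports "system" after running pyenv global, the shell configuration step above has probably not been picked up yet, so restart your terminal and try again.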

Installing pyenv-virtualenv for virtual environment management

Now that we have a separate Python installation, we need a virtual environment manager to manage the environments for our individual Python projects.

Here is a primer on virtual environments, which explains the importance of keeping separate environments for each of your Python projects.

You could use the built-in venv Python module for virtual environments. However, I like to use a pyenv plugin called pyenv-virtualenv, which is easy to use and has some great features.

Similar to pyenv, we can use brew to install pyenv-virtualenv and then update our .zshrc config file to enable auto-activation of virtual environments.

Read the instructions in the pyenv-virtualenv GitHub README for more details.

# install pyenv-virtualenv plugin
brew install pyenv-virtualenv

# run this command to add a new line to your .zshrc
echo 'eval "$(pyenv virtualenv-init -)"' >> ~/.zshrc

Close and restart your iterm2 application for the changes to take effect.

pyenv-virtualenv is very easy to use. The main commands are documented on the GitHub README. As an example, you can create a new virtual environment using the following:

# create new virtualenv (e.g. pyenv virtualenv <python-version> <env-name>)
pyenv virtualenv 3.10.0 test_project

# activate the virtualenv
pyenv activate test_project

# list all available virtual environments
pyenv virtualenvs

A really nice feature of pyenv-virtualenv is that you can set the Python version and virtual environment for each directory so that the correct virtual environment is automatically activated when you navigate to that directory.

# create and navigate to an example directory called 'test_project'
mkdir test_project && cd test_project

# set the 'local' environment to use the 'test_project' virtualenv created above
pyenv local test_project

Notice how the prompt automatically changes when we navigate to the 'test_project' directory after running the `pyenv local` command. This signals that the virtual environment is automatically activated when entering the directory. Pretty handy, right?

The pyenv local command will create a .python-version file in your directory which records which virtualenv to use when you are in that directory.
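The .python-version file is plain text, and pyenv looks for it in the current directory and then in each parent directory, which is why the environment also applies inside subfolders of your project. The sketch below imitates that lookup in a throwaway directory. It is a simplified illustration (find_python_version_file is a hypothetical helper), not pyenv's actual implementation, which also considers the PYENV_VERSION variable and the global version.

```shell
# Hypothetical helper: walk up from a directory until a .python-version file is found.
find_python_version_file() {
  _dir="$(cd "${1:-.}" && pwd)" || return 1
  while :; do
    if [ -f "$_dir/.python-version" ]; then
      printf '%s\n' "$_dir/.python-version"
      return 0
    fi
    [ "$_dir" = "/" ] && return 1     # reached the filesystem root without a match
    _dir="$(dirname "$_dir")"
  done
}

# Demo in a scratch directory: the file sits in the project root,
# but the lookup starts from a nested subfolder.
demo_root="$(mktemp -d)"
mkdir -p "$demo_root/test_project/notebooks"
echo "test_project" > "$demo_root/test_project/.python-version"

found="$(find_python_version_file "$demo_root/test_project/notebooks")"
echo "resolved from: $found"
echo "environment:   $(cat "$found")"
```

Because the file is just text, it is also safe to inspect with cat or commit to version control alongside your project.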

You are now all good to go with your Python setup – just activate your virtual environment and use pip install to install packages into your environment as and when you need them!

JupyterLab

Most data scientists work with Jupyter notebooks and JupyterLab.

JupyterLab can be installed into your virtual environment just like any other Python package.

Note: You could install JupyterLab ‘globally’; however, I find it best to install it into each virtual environment you use. This ensures the JupyterLab version remains consistent within your environment and automatically makes the other packages from your current virtual environment available in the Jupyter notebook.

For example:

# activate virtual environment if not already activated
pyenv activate test_project

# install jupyterlab into your virtual environment
pip install jupyterlab

# open jupyter lab
jupyter lab

JupyterLab running in our virtual environment

Code Editor

Working from Jupyterlab is great, but for some tasks it is useful to have a more general purpose IDE (integrated development environment).

My preferred IDE (outside of the terminal) is Visual Studio Code.

I was recommended VSCode when I first started. There are many other good options for code editors, such as PyCharm, but I’ve had no issues with VSCode and find it pleasant to use so have stuck with it.

We can install VSCode from the command line (or from their website):

brew install --cask visual-studio-code

Once installed, you can access VSCode from your applications folder or directly from the terminal using the code command, which is quite cool.

Extensions

VSCode has many great features and extensions. My favourite extensions that I install are:

  • Python – Python code completion (IntelliSense) and linting support
  • Pylance – type checking tool (see my article on the importance of type hinting)
  • Jupyter – allows you to create and interact with Jupyter notebooks within VSCode
  • GitLens – valuable functionality to improve your Git workflow and keep track of changes

Other useful extensions are:

  • Docker – useful for interacting with Docker containers (see below)
  • Vim – Vim keybindings in VSCode!

VSCode extensions for Python development
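In keeping with the command line theme of this guide, the extensions listed above can also be installed from the terminal with code --install-extension. The sketch below uses what I understand to be the Marketplace identifiers for each extension; treat them as assumptions and check the identifier shown on each extension's page before relying on them. extension_ids is a hypothetical helper name.

```shell
# Hypothetical helper: print the assumed Marketplace IDs of the extensions above.
extension_ids() {
  cat <<'EOF'
ms-python.python
ms-python.vscode-pylance
ms-toolsai.jupyter
eamodio.gitlens
ms-azuretools.vscode-docker
vscodevim.vim
EOF
}

# Install each extension if the 'code' command is available on the PATH.
if command -v code >/dev/null 2>&1; then
  extension_ids | while read -r id; do
    # </dev/null stops 'code' from consuming the rest of the list on stdin
    code --install-extension "$id" </dev/null
  done
else
  echo "'code' is not on the PATH; install VSCode first"
fi
```

Keeping the list in one place makes it easy to reproduce the same editor setup on another machine.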

There is a good tutorial on using Python in VSCode on the VSCode website, which is worth checking out if you are new to VSCode.

Settings

You can specify various preferences in VSCode, either through their UI or in a settings.json file.

These settings can be useful for defining automatic actions when saving files and enabling other language specific features.

Here is my basic settings.json file, which automatically formats your files (using the black Python formatter) when you press save, organises your import statements, and enables linting and type checking.

// settings.json
{
    "editor.formatOnSaveMode": "file",
    "editor.formatOnSave": true,
    "editor.codeActionsOnSave": { "source.organizeImports": true },
    "python.linting.pylintEnabled": true,
    "python.linting.enabled": true,
    "python.analysis.typeCheckingMode": "strict",
    "python.formatting.provider": "black"
}

These settings help keep your projects and Python files tidy and compliant with PEP8 formatting conventions, which helps with readability.

Docker (Optional)

Docker is an open source platform for building containerised applications. Although general data science workflows and development can happen outside of Docker containers (e.g. in virtual environments), most applications will eventually be deployed in some sort of containerised environment in production.

I have marked installing Docker as optional because it isn’t necessarily required for data science. But developing your applications in a containerised environment is a great way to improve the reproducibility of your analysis and avoid the dreaded “It worked on my machine!” syndrome. It also reduces the gap between development environments and production which will enable you to deploy models faster and with fewer issues.

I highly recommend aspiring data scientists learn the basics of Docker. There is a fantastic YouTube channel called TechWorld with Nana which has an excellent beginner tutorial series for Docker – really worth checking out if you are just getting started with learning Docker.

To install Docker, we have to go to the Docker website. You should select and install the relevant program for your MacBook (i.e. Intel chip vs Apple chip).

Choose the relevant installation package for your MacBook chipset

Conclusion

There we have it! A functional data science environment on macOS.

This post has covered the steps to create a basic data science workbench on macOS:

  • We installed a system package manager (Homebrew) to easily install applications directly from the command line.
  • Next we installed a good terminal application (iterm2) and ensured we had Git installed and configured.
  • Then we moved onto setting up a robust Python environment using pyenv and the pyenv-virtualenv plugin.
  • The pyenv setup involved first installing some dependencies (e.g. openssl, readline, sqlite3, xz, zlib) and then installing pyenv using Homebrew. We also had to add some information to our .zprofile and .zshrc configuration files in order for pyenv to automatically initialise when opening a new terminal.
  • We discussed the importance of keeping the Python program that you use for development separate from the system Python in order to prevent you accidentally breaking something in your operating system.
  • Finally, we set up VSCode for code editing and installed Docker for containerised application development.

With Homebrew, iterm2, pyenv and VSCode installed and configured correctly you can get a long way towards a productive data science environment.
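If you want to repeat this setup on another Mac, the Homebrew installs from this post can be collected into a Brewfile and applied in one go with brew bundle. The sketch below is one possible way to do that: write_brewfile is a hypothetical helper, the file is written to a scratch directory, and the package list simply mirrors the tools installed above.

```shell
# Hypothetical helper: write a Brewfile listing the tools from this guide.
write_brewfile() {
  cat > "$1" <<'EOF'
# Command line tools
brew "git"
brew "pyenv"
brew "pyenv-virtualenv"

# Desktop applications
cask "iterm2"
cask "visual-studio-code"
EOF
}

# Write the Brewfile into a scratch directory.
brewfile="$(mktemp -d)/Brewfile"
write_brewfile "$brewfile"

# Install everything listed in the Brewfile, if Homebrew is available.
if command -v brew >/dev/null 2>&1; then
  brew bundle --file="$brewfile"
else
  echo "brew not found; Brewfile written to $brewfile"
fi
```

In practice you would probably keep the Brewfile in a dotfiles repository rather than a temporary directory, so that it is under version control like everything else.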

The next step to supercharge your productivity is customising your terminal application. In a future post I will walk through my terminal configuration using tmux + vim. Stay tuned! 🚀

Homebrew bonus 💡

Did you know you can also install ‘non-coding’ applications, such as Spotify, via Homebrew? 🤯

brew install --cask spotify

Further Reading