A Guide to Reproducible Python Environments
Writing code is just one part of software development. To make your code run reliably on different machines, managing environments is crucial. This article is an introduction to creating reproducible environments in Python using environment.yaml files, along with some additional tips and tricks.
What are Environments?
Environments in Python are isolated workspaces that contain a specific collection of packages and dependencies. By using environments, you can ensure that your code runs the same way it was developed, eliminating the “it works on my machine” problem.
Conda is an open-source package and environment management system. It can help you create, save, load, and switch between environments in Python.
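To see which environment a piece of code is actually running in, the interpreter can be asked directly. The following is a minimal sketch, not part of the original guide: the function name describe_environment is made up for illustration, and it relies on the CONDA_DEFAULT_ENV variable that Conda sets when an environment is activated.

```python
import os
import sys


def describe_environment():
    """Return basic facts about the environment this interpreter runs in."""
    return {
        # Version of the running interpreter, e.g. "3.11.4"
        "python_version": sys.version.split()[0],
        # Directory the interpreter and its packages live in
        "prefix": sys.prefix,
        # Inside a venv-style virtual environment, prefix differs from base_prefix
        "in_venv": sys.prefix != sys.base_prefix,
        # Set by "conda activate"; None if no Conda environment is active
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Running this in two different environments should show different prefix values, which is the isolation described above.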
Conda vs Anaconda vs Pip
You might say: Pip (+venv) can do all that, why should I use Conda?
The difference between Pip and Conda is that Pip installs only Python packages, whereas Conda installs packages which may contain software written in any language, including the Python interpreter itself.
Conda can be installed standalone or bundled with a collection of common data science packages, called Anaconda.
After installing Conda with e.g. Miniconda, Anaconda, or miniforge-mamba (see below), you can create a new environment by running:
conda create --name myenv
Here, “myenv” is the name you choose for your new environment. Once it is created, you can activate it with:
conda activate myenv
This allows you to work within a clean, isolated space, ensuring that the dependencies of different projects won’t interfere with one another. Environments you no longer need can be removed with:
conda remove -n myenv --all
Installing packages is equally straightforward. Just use the conda install command followed by the name of the package you want to add to your environment. Conda will automatically resolve and manage dependencies, simplifying the often complex task of package management.
Adding -y to the end of the conda install command lets conda skip the confirmation prompt between solving the environment and installing the packages.
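The -y flag matters most in scripts, where nobody is present to answer the prompt. As a rough sketch of how the create, install, and remove commands above could be driven from Python: the helper names (create_cmd, install_cmd, remove_cmd, run) and the packages python=3.11 and numpy are illustrative assumptions, not part of Conda or of the original guide. By default the commands are only printed, not executed.

```python
import shutil
import subprocess


def create_cmd(env_name, *packages):
    """Command to create an environment, optionally with initial packages."""
    return ["conda", "create", "--name", env_name, "-y", *packages]


def install_cmd(env_name, *packages):
    """Command to install packages into a named environment."""
    return ["conda", "install", "-n", env_name, "-y", *packages]


def remove_cmd(env_name):
    """Command to delete an environment and everything in it."""
    return ["conda", "remove", "-n", env_name, "--all", "-y"]


def run(cmd, dry_run=True):
    """Print the command; execute it only if dry_run is False and conda is on PATH."""
    print(" ".join(cmd))
    conda = shutil.which("conda")
    if dry_run or conda is None:
        return None
    # Use the resolved executable path so the call also works where conda is a wrapper script
    return subprocess.run([conda, *cmd[1:]], check=True)


if __name__ == "__main__":
    # Dry run: shows the three commands without touching any environment
    run(create_cmd("myenv", "python=3.11"))
    run(install_cmd("myenv", "numpy"))
    run(remove_cmd("myenv"))
```

Passing dry_run=False would actually execute the commands, so it is worth checking the printed output first.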
Conda Cheat Sheet
The CONDA CHEAT SHEET is a useful collection of the most commonly used conda commands.
Conda calls the repositories from which it obtains packages “channels”. These are the places that store the packages themselves as well as their requirements and version information.
Conda-Forge is a community-driven channel for Conda packages. It is often more up-to-date than the default Anaconda channel. To use Conda-Forge, you can include it under channels in your environment.yaml file, run conda config --add channels conda-forge, or use the mamba installer linked above, which uses the conda-forge channel by default.
An environment.yaml file allows you to define an environment in a text format. It specifies the channels to search for packages and the list of dependencies.
These files can either be stored in a central location such as your user home directory or per project in the project repository. The latter offers the advantage of allowing you to track your environment.yaml file with git alongside your code.
An example environment.yaml file for a CADET installation can be found in the post above.
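To make the file format concrete, here is a small sketch that renders a minimal environment.yaml as text. The function render_environment_yaml is a hypothetical helper, and the values (my_project, python=3.11, numpy) are placeholders rather than the CADET file referred to above. It only handles flat lists of channels and dependencies.

```python
def render_environment_yaml(name, channels, dependencies):
    """Render a minimal environment.yaml as text (flat lists only)."""
    lines = [f"name: {name}", "channels:"]
    lines += [f"  - {channel}" for channel in channels]
    lines.append("dependencies:")
    lines += [f"  - {dep}" for dep in dependencies]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    text = render_environment_yaml(
        name="my_project",
        channels=["conda-forge"],
        dependencies=["python=3.11", "numpy"],
    )
    # Prints:
    # name: my_project
    # channels:
    #   - conda-forge
    # dependencies:
    #   - python=3.11
    #   - numpy
    print(text, end="")
```

In practice the file is usually written by hand. The point here is only to show its three parts: the environment name, the channels, and the dependencies.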
To create an environment from this file, first create the file in a chosen location and then run:
conda env create -f environment.yaml
Alternatively, you can use the file as a template and set a custom environment name (e.g. my_project) for your project:
conda env create -f environment.yaml -n my_project
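If environments are created as part of a setup script, the two commands above can be assembled programmatically. This is a sketch under assumptions: env_create_cmd and create_from_file are illustrative names, not Conda APIs, and the second function does nothing unless both the file and a conda executable are found.

```python
import shutil
import subprocess
from pathlib import Path


def env_create_cmd(env_file="environment.yaml", name=None):
    """Build the 'conda env create' command, optionally overriding the name."""
    cmd = ["conda", "env", "create", "-f", str(env_file)]
    if name is not None:
        # -n replaces the name given inside the file
        cmd += ["-n", name]
    return cmd


def create_from_file(env_file="environment.yaml", name=None):
    """Create the environment if both the file and conda are available."""
    conda = shutil.which("conda")
    if conda is None or not Path(env_file).is_file():
        return None  # nothing to do
    cmd = env_create_cmd(env_file, name)
    return subprocess.run([conda, *cmd[1:]], check=True)


if __name__ == "__main__":
    # Only print the commands; call create_from_file() to actually run one
    print(" ".join(env_create_cmd()))
    print(" ".join(env_create_cmd(name="my_project")))
```

Keeping command construction separate from execution makes it possible to inspect or log the exact command before anything is installed.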