A Guide to Reproducible Python Environments and CADET Installations

A recommended workflow to install CADET-Process:

For more detail on each point, see the post below.

  1. Install a Python package manager. We recommend mamba or conda.
  2. Prepare an environment.yml text file. For example, it could contain:
name: cadet
channels:
  - conda-forge
dependencies:
  - python=3.10  # currently recommended for cadet-process installations 
  - pip
  - cadet
#  - jupyterlab  # optional, to run jupyter notebooks
#  - openpyxl    # optional, to open .xlsx files
#  - git         # optional, for version control
  - pip:
      - cadet-process
  3. For mamba: In your miniforge prompt, run mamba env create -f environment.yml
     For conda: In your conda prompt, run conda env create -f environment.yml
  4. Check out the tutorials, the documentation, or an example.
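
Once the environment has been created and activated, a quick check like the following can confirm that the pieces are in place. This is only a minimal sketch and not part of the official installation instructions. It assumes that CADET-Process is imported as `CADETProcess` and that the conda `cadet` package provides an executable called `cadet-cli`; adjust both names if your installation differs.

```python
import importlib.util
import shutil


def missing_modules(module_names):
    """Return the top-level module names that cannot be found in the active environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


def missing_executables(executable_names):
    """Return the executables that are not on the PATH."""
    return [name for name in executable_names if shutil.which(name) is None]


# Assumed names, see the note above.
modules = ["CADETProcess"]
executables = ["cadet-cli"]

print("Missing modules:", missing_modules(modules) or "none")
print("Missing executables:", missing_executables(executables) or "none")
```

Run the script with the Python interpreter of the activated environment. Otherwise it reports on whichever environment happens to be active.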

A Guide to Reproducible Python Environments

Writing code is just one part of software development. To make your code run reliably on different machines, managing environments is crucial. This article is an introduction to creating reproducible environments in Python using conda and environment.yaml files, along with some additional tips and tricks.

What are Environments?

Environments in Python are isolated workspaces that contain a specific collection of packages and dependencies. By using environments, you can ensure that your code runs the same way it was developed, eliminating the “it works on my machine” problem.

Conda

Conda is an open-source package and environment management system. It can help you create, save, load, and switch between environments in Python.

Conda vs Anaconda vs Pip

You might say: pip (+venv) can do all that, so why should I use conda?
The key difference is that pip installs only Python packages, whereas conda installs packages that may contain software written in any language.

Conda can be installed standalone or bundled with a collection of common data science packages, called Anaconda.

After installing conda (e.g. with miniconda, anaconda, or miniforge with mamba, see below), creating a new environment is as simple as running the command conda create --name myenv, where “myenv” is the name you choose for your new environment. Once created, you can activate this environment using conda activate myenv. This lets you work within a clean, isolated space, so that the dependencies of different projects do not interfere with one another. Environments you no longer need can be removed with the command conda remove -n myenv --all.
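
Before creating or removing an environment from a script, it can help to check which environments already exist. The following is a minimal sketch that parses the output of conda env list --json. The example paths are hypothetical, and the helper name `environment_names` is made up for illustration.

```python
import json
from pathlib import PurePath


def environment_names(conda_env_list_json):
    """Extract environment folder names from the output of `conda env list --json`."""
    env_paths = json.loads(conda_env_list_json)["envs"]
    # Each entry is a path; its last component is the environment's folder name.
    # The base environment appears under its installation folder name, not as "base".
    return [PurePath(path).name for path in env_paths]


# Hypothetical output for illustration; real paths depend on your installation.
# The real text could be obtained with:
#   subprocess.run(["conda", "env", "list", "--json"],
#                  capture_output=True, text=True, check=True).stdout
example_output = '{"envs": ["/home/user/miniforge3", "/home/user/miniforge3/envs/myenv"]}'

if "myenv" in environment_names(example_output):
    print("myenv already exists")
```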

Installing packages is equally straightforward. Just use the conda install command followed by the package name you want to add to your environment. Conda will automatically resolve and manage dependencies, simplifying the often complex task of package management.

Adding -y to the conda install command skips the confirmation prompt between solving the environment and installing the packages.
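
The -y flag is mostly useful in scripts, where nobody is present to answer the prompt. As a rough sketch, a non-interactive install command could be assembled like this; the function name `conda_install_command` and its parameters are made up for illustration, while --name, --yes, and --channel are the long forms of conda's -n, -y, and -c options.

```python
def conda_install_command(env_name, packages, channel=None, tool="conda"):
    """Build a non-interactive install command as an argument list."""
    command = [tool, "install", "--name", env_name, "--yes"]
    if channel is not None:
        command += ["--channel", channel]
    return command + list(packages)


command = conda_install_command("myenv", ["cadet"], channel="conda-forge")
print(" ".join(command))

# To actually run it (requires conda on the PATH):
#   import subprocess
#   subprocess.run(command, check=True)
```

Building the command as a list rather than a single string avoids quoting problems when it is passed to subprocess.run.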

Conda Cheat Sheet

The CONDA CHEAT SHEET is a useful collection of the most commonly used conda commands.

Conda-Forge

Conda calls the repositories from which it obtains packages “channels”. They store the packages themselves as well as their requirements and version information.

Conda-Forge is a community-driven channel for conda packages. It is often more up-to-date than the default Anaconda channel. There are three ways to use it: include it under channels in your environment.yaml file, run conda config --add channels conda-forge, or use the miniforge/mamba installer mentioned above, which uses the conda-forge channel by default.

environment.yaml Files

An environment.yaml file allows you to define an environment in a text format. It specifies the channels to use for package searching and the list of dependencies.

These files can be stored either in a central location, such as your home directory, or per project in the project repository. The latter lets you track your environment.yaml file with git alongside your code.

A sample environment file for a CADET installation can be found in the post above. It is named environment.yml there; conda accepts both the .yml and .yaml extensions.

To create an environment from this file, first create the file in a chosen location and then run:

conda env create -f environment.yaml

Alternatively, you can use the file as a template and set a custom name (e.g. my_project) for your project with the -n option:

conda env create -f environment.yaml -n my_project
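
If you maintain several projects with similar requirements, the environment file itself can be generated from a small script. The following is a minimal sketch using only the standard library. The function `render_environment_yaml` is made up for illustration and only covers the simple structure shown in the sample file (name, channels, dependencies, and an optional pip section).

```python
def render_environment_yaml(name, channels, dependencies, pip_dependencies=()):
    """Render a minimal conda environment file as text."""
    lines = [f"name: {name}", "channels:"]
    lines += [f"  - {channel}" for channel in channels]
    lines.append("dependencies:")
    lines += [f"  - {dependency}" for dependency in dependencies]
    if pip_dependencies:
        # Packages installed with pip are nested under a "pip:" entry.
        lines.append("  - pip:")
        lines += [f"      - {package}" for package in pip_dependencies]
    return "\n".join(lines) + "\n"


text = render_environment_yaml(
    name="my_project",
    channels=["conda-forge"],
    dependencies=["python=3.10", "pip", "cadet"],
    pip_dependencies=["cadet-process"],
)
print(text)

# To write the file next to your code:
#   from pathlib import Path
#   Path("environment.yaml").write_text(text)
```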

Mamba

Conda can be quite slow when working out which package versions to combine in an environment.

Mamba was developed to address this. It is a reimplementation of the conda package manager in C++ and advertises much faster dependency solving than conda. It can be used as a base Python environment manager instead of conda and can be installed, for example, with miniforge.

For most commands, mamba is a drop-in replacement for conda. For example, to install a new package, you can simply run:

mamba install -c conda-forge cadet

Or, to create a new environment:

mamba env create -f environment.yaml
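
Because the two tools share most of their command-line interface, a setup script can prefer mamba and fall back to conda. This is a minimal sketch; the function name `pick_package_manager` is made up for illustration, and the `which` parameter exists only to make the function easy to test.

```python
import shutil


def pick_package_manager(candidates=("mamba", "conda"), which=shutil.which):
    """Return the first candidate found on the PATH, or None if none is installed."""
    for candidate in candidates:
        if which(candidate) is not None:
            return candidate
    return None


tool = pick_package_manager()
if tool is None:
    print("Neither mamba nor conda was found on the PATH.")
else:
    # Print the command instead of running it, so the script has no side effects.
    print(f"{tool} env create -f environment.yaml")
```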

Installing a Custom Version of a Package From a Git Branch or Commit

You can also use pip within a conda/mamba environment to install packages directly from a specific Git branch or commit. This requires Git to be installed.

A word of warning: mixing conda and pip can lead to problems and might break your conda environment. Pip commands should therefore only be used within conda environments and not in the base environment.

pip install git+https://github.com/your-user/your-repo.git@your-branch

For example, to install the dev version of CADET-Process, run the following:

pip install git+https://github.com/fau-advanced-separations/CADET-Process.git@dev

or with SSH:

pip install git+ssh://git@github.com/fau-advanced-separations/CADET-Process.git@dev

or from GitLab, e.g. the CADET-RDM tool:

pip install git+ssh://git@jugit.fz-juelich.de/IBG-1/ModSim/cadet/CADET-RDM.git@master
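
All of the commands above follow the same pattern: git+, then the repository URL, then @ and a branch name, tag, or commit hash. As a small illustration, that pattern could be captured in a helper. The function name `pip_git_requirement` is made up for this sketch and is not part of pip.

```python
def pip_git_requirement(repo_url, ref):
    """Build a pip VCS requirement such as git+https://host/user/repo.git@ref."""
    if not repo_url.startswith(("https://", "ssh://")):
        raise ValueError("repo_url must start with https:// or ssh://")
    # ref may be a branch name, a tag, or a commit hash.
    return f"git+{repo_url}@{ref}"


requirement = pip_git_requirement(
    "https://github.com/fau-advanced-separations/CADET-Process.git", "dev"
)
print(f"pip install {requirement}")
```

Pinning a commit hash instead of a branch name makes the installation reproducible, because a branch such as dev keeps moving.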