A Conda workflow that, well, just works

Our team has been developing Python applications for almost five years now, including DevOps tooling, full-stack websites, a bit of data science, and a bunch of one-off, use-once, throw-away applications. Now overlay these Python applications onto development and deployment systems running Windows 10, macOS, and Linux, and try to keep the environments consistent with the same packages, let alone the same package versions. At the time, Microsoft was not yet shipping a stable version of Python as part of its native offerings. Moreover, the Python 3 virtual environment landscape was still in flux, pyenv is just weird, and managing venvs on Windows 10 seemed like a nightmare.

Enter Anaconda and its conda package and environment management tool. I'll not go into Anaconda's background, or why you should consider using their version of the Python ecosystem – you can Google as well as I can write. I will say, however, that Anaconda's website proclaims their Data Science bent, but their toolchain goes way beyond data science.

Installing Anaconda

The first step toward using Anaconda is installing it. For me, on a Linux desktop, this meant downloading the Linux installer, a shell script (Anaconda3-2020.02-Linux-x86_64.sh as of this post), and running it as the local user (not root). If you accept all of the defaults, Anaconda will insert itself into your local .bashrc and invoke its default base environment whenever you start up a terminal window. I prefer to start Anaconda's environment at a time of my choosing, so I excised the script fragment from .bashrc and dropped it verbatim into a separate shell file:

#!/bin/sh
# >>> conda initialize >>>
# !! Contents within this block are managed by 'conda init' !!
__conda_setup="$('/home/servilla/anaconda3/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then
    eval "$__conda_setup"
else
    if [ -f "/home/servilla/anaconda3/etc/profile.d/conda.sh" ]; then
        . "/home/servilla/anaconda3/etc/profile.d/conda.sh"
    else
        export PATH="/home/servilla/anaconda3/bin:$PATH"
    fi
fi
unset __conda_setup
# <<< conda initialize <<<

To start up the default base environment, I source the file that contains the above script fragment, > . ./bin/source-conda.sh, and magically the Anaconda environment comes to life, with a clue to its presence ((base) prepended to my shell prompt).
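
If you would rather not remember the path, a one-line alias in .bashrc makes this painless. This is a minimal sketch; the conda-start name and the ~/bin location are my own conventions, not anything conda provides:

# In ~/.bashrc: bring up conda only when asked
alias conda-start='. ~/bin/source-conda.sh'

With that in place, a quick conda-start in any terminal window activates the base environment on demand.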

Project Management

When using Anaconda for a Python project, you always interact with conda at the command line. So, what is conda? From the documentation, "Conda is an open source package management system and environment management system that runs on Windows, macOS and Linux." Perhaps surprisingly, conda supports languages other than Python, including "R, Ruby, Lua, Scala, Java, JavaScript, C/C++, FORTRAN, and more." I've only used conda with Python and R, and this post only describes how I use it to manage Python projects.

There are a couple of points I would like to make about using conda before getting into the details of my workflow: 1) conda works just like any other Python virtual environment tool, installing and updating Python packages (from the Anaconda repository) and keeping each environment isolated from its peers, and 2) conda seamlessly supports using pip to install packages from other package repositories (namely PyPI), especially packages that are not available through Anaconda.

My general workflow for developing Python applications with conda is likely typical: create a new virtual environment, install the needed packages, develop your code, save the environment specification (along with your code), copy the saved specification to a deployment server and recreate the environment there, install your code, done – wash, rinse, repeat.
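
In shell terms, that loop looks roughly like the following. This is a sketch, not a recipe – "project" and the package names are placeholders, and each command is covered in detail in the steps below:

# Development host
> conda create --name project --no-default-packages python=3.8
> conda activate project
> conda install <packages>           # pip install for anything not on Anaconda
> conda env export --from-history --file environment.yml
# ...develop, then commit code and environment.yml to Git...

# Deployment host
> conda env create --file environment.yml
> conda activate project
> pip install .                      # install your application code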

With that said, let's get into the nuts and bolts.

Step 1: Creating and activating a new environment

To create a new Python virtual environment, use the command conda create – straightforward. There are a few options to this command (see conda create --help), but the one I always use is --no-default-packages, which creates a new virtual environment with only the necessary packages installed – in other words, you get to select most of the batteries you require. You will also want the --name option, which sets the name of the environment you are creating. I always install Python as part of this step (python=<version>) to ensure I get the Python version I want (by default, conda installs the same Python version found in the base environment, which for me is Python 3.7.7). Let's create a new environment named "project" with Python 3.8:

> conda create --name project --no-default-packages python=3.8
Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /home/servilla/anaconda3/envs/project

  added / updated specs:
    - python=3.8


The following NEW packages will be INSTALLED:

  _libgcc_mutex      pkgs/main/linux-64::_libgcc_mutex-0.1-main
  ca-certificates    pkgs/main/linux-64::ca-certificates-2020.6.24-0
  certifi            pkgs/main/linux-64::certifi-2020.6.20-py38_0
  ld_impl_linux-64   pkgs/main/linux-64::ld_impl_linux-64-2.33.1-h53a641e_7
  libedit            pkgs/main/linux-64::libedit-3.1.20191231-h14c3975_1
  libffi             pkgs/main/linux-64::libffi-3.3-he6710b0_2
  libgcc-ng          pkgs/main/linux-64::libgcc-ng-9.1.0-hdf63c60_0
  libstdcxx-ng       pkgs/main/linux-64::libstdcxx-ng-9.1.0-hdf63c60_0
  ncurses            pkgs/main/linux-64::ncurses-6.2-he6710b0_1
  openssl            pkgs/main/linux-64::openssl-1.1.1g-h7b6447c_0
  pip                pkgs/main/linux-64::pip-20.1.1-py38_1
  python             pkgs/main/linux-64::python-3.8.3-hcff3b4d_2
  readline           pkgs/main/linux-64::readline-8.0-h7b6447c_0
  setuptools         pkgs/main/linux-64::setuptools-49.2.0-py38_0
  sqlite             pkgs/main/linux-64::sqlite-3.32.3-h62c20be_0
  tk                 pkgs/main/linux-64::tk-8.6.10-hbc83047_0
  wheel              pkgs/main/linux-64::wheel-0.34.2-py38_0
  xz                 pkgs/main/linux-64::xz-5.2.5-h7b6447c_0
  zlib               pkgs/main/linux-64::zlib-1.2.11-h7b6447c_3


Proceed ([y]/n)?

Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
#     $ conda activate project
#
# To deactivate an active environment, use
#
#     $ conda deactivate

As you can see, the "project" environment has been set up with a minimal set of packages (including Python 3.8.3) and is waiting to be activated: (base) > conda activate project.

Step 2: Adding packages from Anaconda and PyPi

Now, say I am working on a simple CLI application and would like to handle command-line options using click. To install click via conda, simply run conda install click (note that conda has many subcommands, including ones for searching and describing packages – read the docs):

> conda install click
Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /home/servilla/anaconda3/envs/project

  added / updated specs:
    - click


The following NEW packages will be INSTALLED:

  click              pkgs/main/noarch::click-7.1.2-py_0


Proceed ([y]/n)? 

Preparing transaction: done
Verifying transaction: done
Executing transaction: done

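As an aside, if you are unsure whether a package is available through Anaconda (or which versions are), conda search will tell you. These are standard conda subcommands, shown here without their output:

# List the versions of click available in the configured channels
> conda search click

# Show detailed metadata (dependencies, build strings) for each match
> conda search --info click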

In addition, I'd like to use a logging wrapper called daiquiri, but unfortunately it is not available through Anaconda's default distribution channel (it is, however, on the conda-forge channel). No big deal – just install it from PyPI with pip install daiquiri:

> pip install daiquiri
Collecting daiquiri
  Using cached daiquiri-2.1.1-py2.py3-none-any.whl (17 kB)
Processing /home/servilla/.cache/pip/wheels/5c/ea/22/e0e5f32a7d6a9d15791b539766bc1a091f412f05e846b15718/python_json_logger-0.1.11-py2.py3-none-any.whl
Installing collected packages: python-json-logger, daiquiri
Successfully installed daiquiri-2.1.1 python-json-logger-0.1.11
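
Alternatively, since daiquiri is published on the conda-forge channel, you could keep everything under conda by installing from that channel instead – a matter of taste, as mixing channels has trade-offs of its own:

> conda install --channel conda-forge daiquiri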

conda is very clear about the origin and version of installed packages; these can be seen with the conda list command:

> conda list
# packages in environment at /home/servilla/anaconda3/envs/project:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main  
ca-certificates           2020.6.24                     0  
certifi                   2020.6.20                py38_0  
click                     7.1.2                      py_0  
daiquiri                  2.1.1                    pypi_0    pypi
ld_impl_linux-64          2.33.1               h53a641e_7  
libedit                   3.1.20191231         h14c3975_1  
libffi                    3.3                  he6710b0_2  
libgcc-ng                 9.1.0                hdf63c60_0  
libstdcxx-ng              9.1.0                hdf63c60_0  
ncurses                   6.2                  he6710b0_1  
openssl                   1.1.1g               h7b6447c_0  
pip                       20.1.1                   py38_1  
python                    3.8.3                hcff3b4d_2  
python-json-logger        0.1.11                   pypi_0    pypi
readline                  8.0                  h7b6447c_0  
setuptools                49.2.0                   py38_0  
sqlite                    3.32.3               h62c20be_0  
tk                        8.6.10               hbc83047_0  
wheel                     0.34.2                   py38_0  
xz                        5.2.5                h7b6447c_0  
zlib                      1.2.11               h7b6447c_3 

You can see above that click version 7.1.2 was added to "project" from Anaconda (based on the empty Channel value, which implies "defaults") and that daiquiri version 2.1.1 and python-json-logger (a daiquiri dependency) version 0.1.11 were added from PyPI. These three packages were also added to the virtual environment's local site-packages directory, found at anaconda3/envs/project/lib/python3.8/site-packages:

certifi
certifi-2020.6.20-py3.8.egg-info
click
click-7.1.2.dist-info
daiquiri
daiquiri-2.1.1.dist-info
easy_install.py
pip
pip-20.1.1-py3.8.egg-info
pkg_resources
pythonjsonlogger
python_json_logger-0.1.11.dist-info
README.txt
setuptools
setuptools-49.2.0.post20200714-py3.8.egg-info
wheel
wheel-0.34.2-py3.8.egg-info

Step 3: Adding your local application source code

I'll not go into much detail here since everyone seems to have their own recipe for packaging their source code into their applications. In my case, I still rely on setuptools and a setup.py file to deploy local source code into the application through pip, which will either park the local package in the site-packages directory or link it back to the source location when using the --editable option (see the sketch below). We do not use PyPI to distribute our source code; rather, we pull source code directly from GitHub. This has implications when recreating the virtual environment, which I discuss below.
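
For the record, the pip invocations look like this (a sketch; the GitHub URL is a placeholder, not a real repository):

# From the project root: copy the package into site-packages
> pip install .

# Or link site-packages back to the working tree during development
> pip install --editable .

# Or install directly from a Git repository
> pip install git+https://github.com/<org>/<project>.git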

Step 4: Saving the "project" environment

conda provides a very nice toolchain for managing environments through the env interface. I use two variants of one command to save the environment specification into a YAML file for later recreation: 1) conda env export --file environment.yml saves detailed package information for the entire environment, and 2) conda env export --from-history --file environment.yml saves only the packages you explicitly installed with conda install.

The conda env export --file environment.yml command, by default, saves both the standard package version (as you would find on PyPI, for example) and the explicit build string found in the Anaconda package repository:

> conda env export --file environment.yml
name: project
channels:
  - defaults
dependencies:
  - _libgcc_mutex=0.1=main
  - ca-certificates=2020.6.24=0
  - certifi=2020.6.20=py38_0
  - click=7.1.2=py_0
  - ld_impl_linux-64=2.33.1=h53a641e_7
  - libedit=3.1.20191231=h14c3975_1
  - libffi=3.3=he6710b0_2
  - libgcc-ng=9.1.0=hdf63c60_0
  - libstdcxx-ng=9.1.0=hdf63c60_0
  - ncurses=6.2=he6710b0_1
  - openssl=1.1.1g=h7b6447c_0
  - pip=20.1.1=py38_1
  - python=3.8.3=hcff3b4d_2
  - readline=8.0=h7b6447c_0
  - setuptools=49.2.0=py38_0
  - sqlite=3.32.3=h62c20be_0
  - tk=8.6.10=hbc83047_0
  - wheel=0.34.2=py38_0
  - xz=5.2.5=h7b6447c_0
  - zlib=1.2.11=h7b6447c_3
  - pip:
    - daiquiri==2.1.1
    - python-json-logger==0.1.11
prefix: /home/servilla/anaconda3/envs/project

The build string is unique to the target operating system – in this case, Linux. If this build specification were used to recreate the environment on a Windows 10 or macOS host, conda would fail to find the correct packages and not complete. For this reason, if I want to recreate an environment across different operating systems, I often use the --no-builds option to strip the Anaconda-specific build information from the output:

> conda env export --no-builds --file environment.yml
name: project
channels:
  - defaults
dependencies:
  - _libgcc_mutex=0.1
  - ca-certificates=2020.6.24
  - certifi=2020.6.20
  - click=7.1.2
  - ld_impl_linux-64=2.33.1
  - libedit=3.1.20191231
  - libffi=3.3
  - libgcc-ng=9.1.0
  - libstdcxx-ng=9.1.0
  - ncurses=6.2
  - openssl=1.1.1g
  - pip=20.1.1
  - python=3.8.3
  - readline=8.0
  - setuptools=49.2.0
  - sqlite=3.32.3
  - tk=8.6.10
  - wheel=0.34.2
  - xz=5.2.5
  - zlib=1.2.11
  - pip:
    - daiquiri==2.1.1
    - python-json-logger==0.1.11
prefix: /home/servilla/anaconda3/envs/project

There is, however, another issue to contend with that is tied to both the target operating system and dependent packages. If, for example, your application requires upstream dependencies that are operating system specific (e.g., file-system-level packages often differ between Linux and Windows 10), even the standard conda env export may not work when recreating an environment that originates on Linux but is destined for Windows 10 or macOS. The --no-builds option does not help here either, since the problem lies with which packages are listed as dependencies, not with the build string being used. To alleviate this problem, you can use the conda env export --from-history command, which saves only the packages you explicitly installed with conda install, leaving out their conda-installed dependencies:

> conda env export --from-history --file environment.yml
name: project
channels:
  - defaults
dependencies:
  - python=3.8
  - click
prefix: /home/servilla/anaconda3/envs/project

Using this last method lets conda resolve the upstream dependencies by its own rules, and it often works without user intervention. Unfortunately, this greatly slimmed-down version of the environment does not include packages installed via pip. These must be merged in separately, by hand. If you did not specify package versions during installation, you may also want to "pin" them in the saved environment.yml file:

name: project
channels:
  - defaults
dependencies:
  - python=3.8
  - click=7.1.2
  - pip:
    - daiquiri==2.1.1
    - python-json-logger==0.1.11
prefix: /home/servilla/anaconda3/envs/project

To finish this step, push the locally developed source code, including the environment.yml file, to GitHub (or your favorite source code repository; or make a tarball, for that matter). As a backup, I usually also run pip freeze > requirements.txt and commit the output to Git, just in case.
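
Put together, finishing this step amounts to something like the following (a sketch; the commit message and remote setup will vary by project):

> conda env export --from-history --file environment.yml
> pip freeze > requirements.txt
> git add environment.yml requirements.txt
> git commit -m "Update environment specification"
> git push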

Step 5: Recreating the "project" environment

Recreating a conda environment is quite simple. As noted above, the environment.yml file is stored in GitHub along with the project application source code. First, clone the GitHub-hosted repository; this brings down the environment.yml file. Next, use the conda env create --file environment.yml command to recreate the original "project" environment. In addition to installing the Anaconda-based packages, conda will also install any packages originally installed by pip (in this case, both daiquiri and python-json-logger). Finally, activate the "project" virtual environment with conda activate project and install the application source code using pip.
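
On the deployment server, the whole sequence is just a handful of commands (the repository URL is a placeholder):

> git clone https://github.com/<org>/<project>.git
> cd <project>
> conda env create --file environment.yml
> conda activate project
> pip install .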

If you update your development environment and would like to push the changes to your deployment server, you only need to regenerate the environment.yml file, commit it along with any application source code changes you've made, and pull everything down to the deployment server. Once there, execute conda env update --file environment.yml – this will update your deployed application environment.
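
One caveat: by default, conda env update does not remove packages that have since been dropped from environment.yml; the --prune option asks conda to remove dependencies that are no longer listed in the file:

> conda env update --file environment.yml --prune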

As always, your mileage may vary.