A Conda workflow that, well, just works
Our team has been developing Python applications for almost five years now, including DevOps tooling, full-stack websites, a bit of data science, and a bunch of one-off, use-once, throwaway applications. Now overlay these Python applications on top of development and deployment systems running Windows 10, Mac OS X, and Linux, and try to keep environments consistent with the same packages, let alone the same package versions. At the time, Microsoft was not yet shipping a stable version of Python as part of its native offerings. Moreover, the Python 3 virtual environment landscape was still in flux, pyenv is just weird, and managing venvs on Windows 10 seemed like a nightmare.
Enter Anaconda and its conda package and environment management tool. I'll not go into Anaconda's background, or why you should consider using their version of the Python ecosystem – you can Google as well as I can write. I will say, however, that Anaconda's website proclaims their Data Science bent, but their tool-chain goes way beyond Data Science.
Installing Anaconda
The first step towards using Anaconda is installing it. For me, on a Linux desktop, I downloaded the Linux installer, a shell script (Anaconda3-2020.02-Linux-x86_64.sh as of this post), and ran it as the local user (not root). If you accept all of the defaults, Anaconda will insert itself into your local .bashrc and invoke its default base environment when you start up your terminal window. I prefer to start Anaconda's environment at my choosing, so I excised the script fragment from .bashrc and dropped it verbatim into a separate shell file:
#!/bin/sh

# >>> conda initialize >>>
# !! Contents within this block are managed by 'conda init' !!
__conda_setup="$('/home/servilla/anaconda3/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then
    eval "$__conda_setup"
else
    if [ -f "/home/servilla/anaconda3/etc/profile.d/conda.sh" ]; then
        . "/home/servilla/anaconda3/etc/profile.d/conda.sh"
    else
        export PATH="/home/servilla/anaconda3/bin:$PATH"
    fi
fi
unset __conda_setup
# <<< conda initialize <<<
To start up the default base environment, I source the file that contains the above script fragment: > . ./bin/source-conda.sh, and magically the Anaconda environment comes to life with a clue ((base), at least in my shell) to its presence.
Project Management
When using Anaconda for a Python project, you always interact with conda at the command line interface. So, what is conda? From the documentation, "Conda is an open source package management system and environment management system that runs on Windows, macOS and Linux." Surprisingly, conda supports languages other than Python, including "R, Ruby, Lua, Scala, Java, JavaScript, C/C++, FORTRAN, and more." I've only used conda with Python and R, and this post only describes how I use it to manage Python projects.
There are a couple of points I would like to make about using conda before getting into the details of my workflow: 1) conda works just like any other Python virtual environment tool to create and update Python packages (coming from the Anaconda repository) and to maintain the environment in isolation from other peer environments, and 2) conda seamlessly supports the use of Python's pip to install packages from other package repositories (namely PyPI), especially packages that are not available through Anaconda.
My general workflow using conda to develop Python applications is likely typical: create a new virtual environment, install the needed packages, develop your code, save the environment (including your code), copy the saved environment to a deployment server and re-create it there, install your code, done – wash, rinse, repeat.
With that said, let's get into the nuts and bolts.
Step 1: Creating and activating a new environment
To create a new Python virtual environment, use the command conda create – straightforward. There are a few options to this command (see conda create --help), but the one I always use is --no-default-packages, which creates a new virtual environment with only the necessary packages installed – in other words, you get to select most of the batteries you require. You will also want to use the --name option, which identifies the name of the environment you are creating. I always install Python as part of this step (python=<version>) to ensure I get the version of Python I want (by default, conda will install the Python version of the base environment, which for me is Python 3.7.7). Let's create a new environment named "project" with Python version 3.8:
> conda create --name project --no-default-packages python=3.8
Collecting package metadata (current_repodata.json): done
Solving environment: done
## Package Plan ##
environment location: /home/servilla/anaconda3/envs/project
added / updated specs:
- python=3.8
The following NEW packages will be INSTALLED:
_libgcc_mutex pkgs/main/linux-64::_libgcc_mutex-0.1-main
ca-certificates pkgs/main/linux-64::ca-certificates-2020.6.24-0
certifi pkgs/main/linux-64::certifi-2020.6.20-py38_0
ld_impl_linux-64 pkgs/main/linux-64::ld_impl_linux-64-2.33.1-h53a641e_7
libedit pkgs/main/linux-64::libedit-3.1.20191231-h14c3975_1
libffi pkgs/main/linux-64::libffi-3.3-he6710b0_2
libgcc-ng pkgs/main/linux-64::libgcc-ng-9.1.0-hdf63c60_0
libstdcxx-ng pkgs/main/linux-64::libstdcxx-ng-9.1.0-hdf63c60_0
ncurses pkgs/main/linux-64::ncurses-6.2-he6710b0_1
openssl pkgs/main/linux-64::openssl-1.1.1g-h7b6447c_0
pip pkgs/main/linux-64::pip-20.1.1-py38_1
python pkgs/main/linux-64::python-3.8.3-hcff3b4d_2
readline pkgs/main/linux-64::readline-8.0-h7b6447c_0
setuptools pkgs/main/linux-64::setuptools-49.2.0-py38_0
sqlite pkgs/main/linux-64::sqlite-3.32.3-h62c20be_0
tk pkgs/main/linux-64::tk-8.6.10-hbc83047_0
wheel pkgs/main/linux-64::wheel-0.34.2-py38_0
xz pkgs/main/linux-64::xz-5.2.5-h7b6447c_0
zlib pkgs/main/linux-64::zlib-1.2.11-h7b6447c_3
Proceed ([y]/n)?
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
# $ conda activate project
#
# To deactivate an active environment, use
#
# $ conda deactivate
As you can see, the "project" environment has been set up with a minimal set of packages (including Python 3.8.3) and is waiting to be activated: (base) > conda activate project.
Step 2: Adding packages from Anaconda and PyPi
Now, say I am working on a simple CLI application and would like to handle command line options using click. To install click via conda, you would simply run conda install click (note that conda has many options, including searching for and describing packages – read the docs):
> conda install click
Collecting package metadata (current_repodata.json): done
Solving environment: done
## Package Plan ##
environment location: /home/servilla/anaconda3/envs/project
added / updated specs:
- click
The following NEW packages will be INSTALLED:
click pkgs/main/noarch::click-7.1.2-py_0
Proceed ([y]/n)?
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
In addition, I'd like to use a logging wrapper called daiquiri, but unfortunately it is not available through Anaconda's default distribution channel (it is, however, on the conda-forge channel). No big deal; just install it from PyPI with pip install daiquiri:
> pip install daiquiri
Collecting daiquiri
Using cached daiquiri-2.1.1-py2.py3-none-any.whl (17 kB)
Processing /home/servilla/.cache/pip/wheels/5c/ea/22/e0e5f32a7d6a9d15791b539766bc1a091f412f05e846b15718/python_json_logger-0.1.11-py2.py3-none-any.whl
Installing collected packages: python-json-logger, daiquiri
Successfully installed daiquiri-2.1.1 python-json-logger-0.1.11
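With click installed, a minimal sketch of such a CLI might look like the following – the command name and option below are invented for illustration, not part of any of the packages above:

```python
# Hypothetical cli.py sketch using click; the command name,
# option, and argument are made up for illustration.
import click


@click.command()
@click.option("--count", default=1, show_default=True, help="Number of greetings.")
@click.argument("name")
def greet(count: int, name: str) -> None:
    """Greet NAME the given number of times."""
    for _ in range(count):
        click.echo(f"Hello, {name}!")


if __name__ == "__main__":
    greet()
```

Running python cli.py --count 2 World would then print the greeting twice.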
conda is very clear about the origin and version of installed packages; these can be seen with the conda list command:
> conda list
# packages in environment at /home/servilla/anaconda3/envs/project:
#
# Name Version Build Channel
_libgcc_mutex 0.1 main
ca-certificates 2020.6.24 0
certifi 2020.6.20 py38_0
click 7.1.2 py_0
daiquiri 2.1.1 pypi_0 pypi
ld_impl_linux-64 2.33.1 h53a641e_7
libedit 3.1.20191231 h14c3975_1
libffi 3.3 he6710b0_2
libgcc-ng 9.1.0 hdf63c60_0
libstdcxx-ng 9.1.0 hdf63c60_0
ncurses 6.2 he6710b0_1
openssl 1.1.1g h7b6447c_0
pip 20.1.1 py38_1
python 3.8.3 hcff3b4d_2
python-json-logger 0.1.11 pypi_0 pypi
readline 8.0 h7b6447c_0
setuptools 49.2.0 py38_0
sqlite 3.32.3 h62c20be_0
tk 8.6.10 hbc83047_0
wheel 0.34.2 py38_0
xz 5.2.5 h7b6447c_0
zlib 1.2.11 h7b6447c_3
You can see above that click version 7.1.2 was added to "project" from Anaconda (inferred from the empty Channel value, which implies "defaults") and that daiquiri version 2.1.1 and python-json-logger (a daiquiri dependency) version 0.1.11 were added from PyPI. These three packages were also added to the virtual environment's local package repository found in anaconda3/envs/project/lib/python3.8/site-packages:
certifi
certifi-2020.6.20-py3.8.egg-info
click
click-7.1.2.dist-info
daiquiri
daiquiri-2.1.1.dist-info
easy_install.py
pip
pip-20.1.1-py3.8.egg-info
pkg_resources
pythonjsonlogger
python_json_logger-0.1.11.dist-info
README.txt
setuptools
setuptools-49.2.0.post20200714-py3.8.egg-info
wheel
wheel-0.34.2-py3.8.egg-info
Step 3: Adding your local application source code
I'll not go into much detail here since everyone seems to have their own recipe for packaging their source code into their applications. In my case, I still rely on setuptools and the setup.py file for deploying local source code into the application through pip, which will either park the local package in the site-packages directory or link it back to the source location when using the --editable option. We do not use PyPI to distribute our source code, but rather rely on pulling it directly from GitHub. This does have implications when recreating the virtual environment, which I discuss below.
Step 4: Saving the "project" environment
conda provides a very nice tool-chain for managing environments through the env interface. I use two versions of this command to save the environment specification into a YAML file for later recreation: 1) conda env export --file environment.yml saves detailed package information for the entire environment, and 2) conda env export --from-history --file environment.yml saves only the packages you explicitly installed with conda install.
The conda env export --file environment.yml command, by default, saves both the standard package version (as you would find in PyPI, for example) and the explicit build version found in the Anaconda package repository:
> conda env export --file environment.yml
name: project
channels:
  - defaults
dependencies:
  - _libgcc_mutex=0.1=main
  - ca-certificates=2020.6.24=0
  - certifi=2020.6.20=py38_0
  - click=7.1.2=py_0
  - ld_impl_linux-64=2.33.1=h53a641e_7
  - libedit=3.1.20191231=h14c3975_1
  - libffi=3.3=he6710b0_2
  - libgcc-ng=9.1.0=hdf63c60_0
  - libstdcxx-ng=9.1.0=hdf63c60_0
  - ncurses=6.2=he6710b0_1
  - openssl=1.1.1g=h7b6447c_0
  - pip=20.1.1=py38_1
  - python=3.8.3=hcff3b4d_2
  - readline=8.0=h7b6447c_0
  - setuptools=49.2.0=py38_0
  - sqlite=3.32.3=h62c20be_0
  - tk=8.6.10=hbc83047_0
  - wheel=0.34.2=py38_0
  - xz=5.2.5=h7b6447c_0
  - zlib=1.2.11=h7b6447c_3
  - pip:
    - daiquiri==2.1.1
    - python-json-logger==0.1.11
prefix: /home/servilla/anaconda3/envs/project
The build version is unique to the target operating system – in this case, Linux. If this build specification were used to recreate the environment on a Windows 10 or Mac OS X host, conda would fail to find the correct packages and would not complete. For this reason, if I want to recreate an environment across different operating systems, I often use the --no-builds option to remove the Anaconda-specific build information from the output:
> conda env export --no-builds --file environment.yml
name: project
channels:
  - defaults
dependencies:
  - _libgcc_mutex=0.1
  - ca-certificates=2020.6.24
  - certifi=2020.6.20
  - click=7.1.2
  - ld_impl_linux-64=2.33.1
  - libedit=3.1.20191231
  - libffi=3.3
  - libgcc-ng=9.1.0
  - libstdcxx-ng=9.1.0
  - ncurses=6.2
  - openssl=1.1.1g
  - pip=20.1.1
  - python=3.8.3
  - readline=8.0
  - setuptools=49.2.0
  - sqlite=3.32.3
  - tk=8.6.10
  - wheel=0.34.2
  - xz=5.2.5
  - zlib=1.2.11
  - pip:
    - daiquiri==2.1.1
    - python-json-logger==0.1.11
prefix: /home/servilla/anaconda3/envs/project
There is, however, another issue to contend with, tied to both the target operating system and dependent packages. If your application requires upstream dependencies that are operating system specific (e.g., file system-level packages often differ between Linux and Windows 10), the standard conda env export may not work when recreating an environment that originates on Linux but is destined for Windows 10 or Mac OS X. The --no-builds option does not help here either, since the problem lies with which packages are listed as dependencies, not with the build version being used. To alleviate this, you can use the conda env export --from-history command, which saves only the packages you explicitly installed with conda install, leaving out their conda-installed dependencies:
> conda env export --from-history --file environment.yml
name: project
channels:
  - defaults
dependencies:
  - python=3.8
  - click
prefix: /home/servilla/anaconda3/envs/project
Using this last method allows conda to recreate any upstream dependencies using its own rules, and it often works without user intervention. Unfortunately, this greatly slimmed-down version of the environment does not include packages installed via pip; these must be merged in separately, a manual process. If you did not include package versions during installation, you may want to "pin" them in the saved environment.yml file:
name: project
channels:
  - defaults
dependencies:
  - python=3.8
  - click=7.1.2
  - pip:
    - daiquiri==2.1.1
    - python-json-logger==0.1.11
prefix: /home/servilla/anaconda3/envs/project
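That manual merge can be scripted. The sketch below is a hypothetical helper, not part of conda: it splices a pip section (built from pip freeze output that you have already filtered down to the pip-installed packages, since pip freeze also lists conda-installed ones) into the --from-history export, just before the prefix: line:

```python
# Hypothetical helper: splice pip-installed packages into the slim
# environment.yml produced by `conda env export --from-history`.
# Assumes `freeze_output` has been filtered to pip-only packages.

def merge_pip_requirements(env_yml: str, freeze_output: str) -> str:
    # Build the nested pip block using conda's export indentation.
    pip_block = ["  - pip:"]
    pip_block += [f"    - {req}" for req in freeze_output.splitlines() if req.strip()]
    merged = []
    for line in env_yml.splitlines():
        if line.startswith("prefix:"):
            # The pip block belongs at the end of the dependencies list,
            # which immediately precedes the prefix line.
            merged.extend(pip_block)
        merged.append(line)
    return "\n".join(merged) + "\n"
```

Feeding it the --from-history export above and the two pip-installed packages would reproduce the pinned environment.yml shown earlier.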
To finish this step, push the locally developed source code, including the environment.yml file, to GitHub (or your favorite source code repository; or make a tarball, for that matter). As a backup, I usually run pip freeze > requirements.txt and commit the output to Git, just in case.
Step 5: Recreating the "project" environment
Recreating a conda environment is quite simple. As noted above, the environment.yml file is stored in GitHub along with the project application source code. First, clone the GitHub-hosted repository; this will bring down the environment.yml file. Next, use the conda env create --file environment.yml command to recreate the original "project" environment. In addition to installing the Anaconda-based packages, conda will also install any packages installed by pip (in this case, both daiquiri and python-json-logger). Finally, activate the "project" virtual environment with conda activate project and install the application source code using pip.
If you update your development environment and would like to push the changes to your deployment server, you only need to regenerate the environment.yml file, along with any application source code changes you've made, and pull everything down to the deployment server. Once there, execute the conda env update --file environment.yml command – this will update your deployed application environment.
As always, your mileage may vary.