{ "metadata": { "kernelspec": { "display_name": "Bash", "language": "bash", "name": "bash" }, "language_info": { "codemirror_mode": "shell", "file_extension": ".sh", "mimetype": "text/x-sh", "name": "bash" } }, "nbformat": 4, "nbformat_minor": 5, "cells": [ { "id": "metadata", "cell_type": "markdown", "source": "
conda
.\n- Run our software from the command line.\n\n**Time Estimation: 30M**\nConda environments, like Python virtual environments, allow you to easily manage your installed packages and prevent conflicts between different projects’ dependencies. This tutorial follows the same structure as the virtualenv tutorial, but uses conda.
\n\n\nComment\nThis tutorial is significantly based on the Carpentries lesson “Intermediate Research Software Development”.
\n
If you are working with a Python project, you will often see something like the\nfollowing two lines somewhere at the top.
\nfrom matplotlib import pyplot as plt\nimport numpy as np\n
This means that our code requires two external libraries (also called third-party packages or dependencies):\n`numpy` and `matplotlib`.
Python applications often use external libraries that don’t come as part of the standard Python distribution. This means\nthat you will have to use a package manager tool to install them on your system.
\nApplications will also sometimes need a\nspecific version of an external library (e.g. because they require that a particular\nbug has been fixed in a newer version of the library), or a specific version of the Python interpreter.\nThis means that each Python application you work with may require a different setup and a different set of dependencies, so it\nis important to keep these configurations separate to avoid confusion between projects.\nThe solution to this problem is to create a self-contained\nenvironment per project, which contains a particular version of Python plus a number of\nadditional external libraries.
\nIf you see something like
\nimport pysam\n
You know you’ll need additional packages installed on your system, as `pysam` relies on htslib, a C library for working with high-throughput sequencing (HTS) data. This usually means installing additional software that is not always available from within Python’s packaging ecosystem.
\nConda environments go beyond virtual environments, and make it easier to develop, run, test and share code with others. In this tutorial, we learn how\nto set up an environment to develop our code and manage our external dependencies.
\n\n\nAgenda\nIn this tutorial, we will cover:
\n\n
So what exactly are conda environments, and why use them?
\nA conda environment is an isolated working copy of specific versions of\none or more packages and all of their dependencies.
\nIt is in fact simply a directory with a particular structure, containing\nlinks to the installed packages. This allows multiple side-by-side installations of different packages,\nor different versions of the same external library, to coexist on your machine,\nwith only one selected for each of your projects. You can therefore work on\na particular project without worrying about affecting other projects on your\nmachine.
\nAs more external libraries are added to your project over time, you can add them to\nits specific environment and avoid a great deal of confusion by having separate (smaller) environments\nfor each project rather than one huge global environment with potential package version clashes. Another big motivator\nfor using environments is that they make sharing your code with others much easier (as we will see shortly).\nHere are some typical scenarios where the usage of environments is highly recommended (almost unavoidable):
\nYou do not have to worry too much about specific versions of external libraries that your project depends on most of the time.\nConda environments enable you to always use the latest available version without specifying it explicitly.\nThey also enable you to use a specific older version of a package for your project, should you need to.
\n\n\n\nNote that you will not have separate package installations for each of your projects: they will only\never be installed once on your system (in `$CONDA/pkgs`) but will be referenced from different environments.
There are several commonly used command line tools for managing environments:
\n- `homebrew`, historically used on macOS to manage packages.
\n- `nix`, which has a steep learning curve but allows you to declare the state of your entire system.
\n- `conda`, a package and environment management system (also included as part of the Anaconda Python distribution often used by the scientific community).
\n- `docker` and `singularity`, which are somewhat similar to other environment managers, as they can have isolated images with software and dependencies.
\n\nWhile there are pros and cons to each of the above, all will do the job of managing\nenvironments for you, and it may be a matter of personal preference which one you go for. The Galaxy project is heavily invested in the Conda ecosystem and recommends it as an entry point, as it is the most generally useful and convenient. The BioConda ecosystem provides a very large number of packages for bioinformatics-specific purposes, which makes it a good choice in general.
\nPart of managing your (virtual) working environment involves installing, updating and removing external packages\non your system. The Conda command (`conda`) is most commonly used for this: it interacts with\none or more Conda repositories (e.g. Conda Forge, BioConda, etc.) and obtains the packages from them.
\n\n\nAnaconda is an open source Python\ndistribution commonly used for scientific programming. It conveniently installs Python, the package and environment manager `conda`, and a\nnumber of commonly used scientific computing packages, so you do not have to obtain them separately.\n`conda` is an independent command line tool (also available separately from the Anaconda distribution) with dual functionality: (1) it is a package manager that helps you find Python packages from\nremote package repositories and install them on your system, and (2) it is also a virtual environment manager. So, you can use `conda` for both tasks instead of using `venv` and `pip`.
`venv` and `pip` are considered the de facto standards for environment and package management for Python 3.\nHowever, the advantage of using Anaconda and `conda` is that you get (most of) the packages needed for\nscientific code development included with the distribution. If you are only collaborating with others who are also using\nAnaconda, you may find that `conda` satisfies all your needs.
It is good, however, to be aware of all these tools (`pip`, `venv`, `pyenv`, etc.),\nand use them accordingly. As you become more familiar with them you will realise that equivalent tools work in a similar\nway even though the command syntax may be different (and that there are equivalent tools for other programming languages\ntoo, to which your knowledge can be ported).
Let us have a look at how we can create and manage environments and their packages from the command line using `conda`.
We will use Miniconda, a commonly used minimal conda installer, in place of the full Anaconda distribution, which is larger and slower to download.
\n\n\nHands-on: Installing Conda via Miniconda\n\n
\n- Go to the Miniconda installation page and find the appropriate installer for your system.
\n- Download and run the script.
\n- You will probably need to close and restart your terminal.
\n- Check that you can run the `conda` command; if you cannot, something may have gone wrong.
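
One quick way to perform that last check is to ask the shell whether it can find `conda` and, if so, print its version. This is a minimal sketch; the variable name and the fallback message are illustrative choices for this example, not conda output.

```shell
# Check whether the conda command is visible to the current shell.
if command -v conda >/dev/null 2>&1; then
    conda_status=found
    conda --version   # prints the installed conda version
else
    conda_status=missing
    echo 'conda not found on PATH; try restarting your terminal' >&2
fi
echo conda is $conda_status
```

If the result is `missing` right after installation, restarting the terminal (or the notebook kernel) is usually the first thing to try.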
If you’re running on Linux and following this tutorial via a Jupyter/CoCalc notebook, and you agree to the Anaconda terms of service, you can simply run the following cell:
\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-1", "source": [ "wget -c https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh\n", "bash Miniconda3-latest-Linux-x86_64.sh -b\n", "conda init bash" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "bash" ], "id": "" } } }, { "id": "cell-2", "source": "Here you will need to restart your kernel, or if you’re in a desktop environment, restart your terminal.
\nLet’s install our first package, the new `libmamba` solver for Conda, as an example of how to install a package. A side benefit is that it will significantly speed up your package installations!
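
A sketch of the install command, assembled from the flags and the version pin that this section explains. The guard around it and the `solver_spec` variable are additions for this example, so that the cell does nothing if `conda` is not yet available. Recent conda releases already ship with `libmamba` as the default solver, so on those this step may be unnecessary.

```shell
# -y: do not prompt for confirmation; -q: suppress progress output.
# The package and version pin are the ones named in this tutorial.
solver_spec='conda-libmamba-solver=22.8'
if command -v conda >/dev/null 2>&1; then
    conda install -y -q $solver_spec
else
    echo 'conda not found on PATH; install Miniconda first' >&2
fi
```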
Here we see a few things:
\n- `-y`: installs without asking questions like “do you want to do this”. Generally people don’t use this, but in a Notebook environment it’s a bit nicer.
\n- `-q`: quiet installation; by default conda prints a lot of progress update messages.
\n- `conda-libmamba-solver=22.8`: the package, and the version of that package, that we wish to install.
\n\nWe’ll now configure conda to use the `libmamba` solver by default:
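
A minimal sketch of that configuration step. The name of the setting depends on the conda version: to the best of our knowledge, conda 22.11 and later call it `solver`, while releases contemporary with the 22.8 solver package used `experimental_solver`. The fallback below is therefore an assumption to verify against your installation (for example with `conda config --describe`).

```shell
if command -v conda >/dev/null 2>&1; then
    # Try the current key first; fall back to the older experimental key.
    conda config --set solver libmamba ||
        conda config --set experimental_solver libmamba
    solver_configured=attempted
else
    solver_configured=skipped
    echo 'conda not found on PATH; skipping' >&2
fi
```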
While we’re at it, let’s configure Conda to use the same default repositories as Galaxy:
\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-7", "source": [ "conda config --add channels bioconda\n", "conda config --add channels conda-forge" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "bash" ], "id": "" } } }, { "id": "cell-8", "source": "This will give us access to the vast repositories of BioConda (bioinformatics software) and Conda Forge (languages and libraries).
\nCreating a new environment is done by executing the following command:
\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-9", "source": [ "conda create -y -n my-env" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "bash" ], "id": "" } } }, { "id": "cell-10", "source": "where my-env
is an arbitrary name for this Conda environment. Environment names are global, so pick something meaningful when you create one!
For our project, let’s create an environment called `hts`.
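
A minimal sketch of creating that environment, reusing the `conda create -y -n` form shown for `my-env`. The `env_name` variable and the guard are additions for this example.

```shell
# Create an empty environment with the name used in the rest of this tutorial.
env_name=hts
if command -v conda >/dev/null 2>&1; then
    conda create -y -n $env_name
else
    echo 'conda not found on PATH; skipping' >&2
fi
```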
You can list all of the created environments with
\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-13", "source": [ "conda env list" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "bash" ], "id": "" } } }, { "id": "cell-14", "source": "You’ll notice that there is a base
environment created by default, where you can install packages and play around with Conda. We do not recommend installing things into the base
environment, if at all possible. Create a new environment for each tool you need to install.
\n\n\nConda’s package resolution takes into account every other package installed in an environment. Especially if you use R packages, this can result in environments taking an increasing amount of time to install new packages and resolve all of the dependencies.
\nBy using small, isolated environments, you keep package resolution fast.
\n
Once you’ve created an environment, you will need to activate it:
\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-15", "source": [ "conda activate hts" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "bash" ], "id": "" } } }, { "id": "cell-16", "source": "Activating the environment will change your command line’s prompt to show what environment\nyou are currently using (indicated by its name in round brackets at the start of the prompt),\nand modify the environment so that any packages you install will be available on the CLI.
\nWhen you’re done working on your project, you can exit the environment with:
\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-17", "source": [ "conda deactivate" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "bash" ], "id": "" } } }, { "id": "cell-18", "source": "If you’ve just done the deactivate
, ensure you reactivate the environment ready for the next part:
We noticed earlier that our code depends on two external libraries, `numpy` and `matplotlib`, as well as `pysam`, which depends on `htslib`. In order for the code to run on your machine, you need to\ninstall these dependencies into your environment.
To install the latest version of a package with `conda`, you use conda’s `install` command and specify the package’s name, e.g.:
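
A sketch of such an install for this project, assuming the `hts` environment is active and the `bioconda` and `conda-forge` channels are configured as described earlier. The `packages` variable is an addition for this example; conda also resolves whatever compiled libraries each package declares as dependencies.

```shell
# Python 3 plus the three libraries our code imports.
packages='python=3 numpy matplotlib pysam'
if command -v conda >/dev/null 2>&1; then
    # Left unquoted on purpose so that each spec becomes its own argument.
    conda install -y -q $packages
else
    echo 'conda not found on PATH; skipping' >&2
fi
```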
Note that we needed to pick a version of Python to use; here we specify `python=3`, meaning “any Python version that starts with 3”, so it won’t use Python 2.7 or a future Python 4.
If you run the `conda install` command on a package that is already installed, `conda` will notice this and do nothing.
To install a specific version of a package, give the package name followed by `=` and the version number, e.g.\n`conda install numpy=1.21.1`.
To specify a minimum version of a package, you can\ndo `conda install 'numpy>=1.20'` (the quotes stop the shell from treating `>` as a redirection).
To upgrade a package to the latest version, use e.g. `conda update numpy`. (If it is already at the latest version, conda will report this and leave the package unchanged.)
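
To see how these version specifications behave without changing anything, conda's `--dry-run` option can be used. The sketch below only reports what would happen; the variable names are illustrative, and whether a given pin can be satisfied depends on the Python version and channels in your environment.

```shell
exact_spec='numpy=1.21.1'    # exactly this version
minimum_spec='numpy>=1.20'   # this version or newer
if command -v conda >/dev/null 2>&1; then
    # --dry-run reports the planned changes without installing anything.
    conda install --dry-run $exact_spec || echo 'no solution for' $exact_spec >&2
    conda install --dry-run $minimum_spec || echo 'no solution for' $minimum_spec >&2
    conda update --dry-run numpy
else
    echo 'conda not found on PATH; skipping' >&2
fi
```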
To display information about the current environment:
\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-23", "source": [ "conda info" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "bash" ], "id": "" } } }, { "id": "cell-24", "source": "To display information about a particular package installed in your current environment:
\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-25", "source": [ "conda list python" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "bash" ], "id": "" } } }, { "id": "cell-26", "source": "To list all packages installed with pip
(in your current environment):
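
A minimal sketch: `conda list` without arguments prints every package in the active environment, including any that were installed with `pip`. The package count is an extra added for this example.

```shell
pkg_count=0
if command -v conda >/dev/null 2>&1; then
    conda list
    # Count installed packages, skipping the commented header lines.
    pkg_count=$(conda list | grep -vc '^#')
fi
echo $pkg_count packages found
```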
To uninstall a package installed in the environment, run `conda remove package-name`.\nYou can also supply a list of packages to uninstall at the same time.
conda
You are collaborating on a project with a team so, naturally, you will want to share your environment with your\ncollaborators so they can easily ‘clone’ your software project with all of its dependencies and everyone\ncan replicate equivalent environments on their machines. `conda` has a handy way of exporting,\nsaving and sharing environments.
To export your active environment, use the `conda env export` command to\nproduce a list of packages installed in the environment.\nA common convention is to put this list in an `environment.yml` file:
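
A sketch of the export step, assuming the environment you want to share is currently active. The second command only previews the result; the `export_file` variable is an addition for this example.

```shell
export_file=environment.yml
if command -v conda >/dev/null 2>&1; then
    # Write the active environment's package list to the file.
    conda env export > $export_file
    # Preview the top of the file: name, channels, then dependencies.
    head -n 15 $export_file
else
    echo 'conda not found on PATH; skipping' >&2
fi
```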
The first of the above commands will create an `environment.yml` file in your current directory.\nThe `environment.yml` file can then be committed to a version control system and\nget shipped as part of your software and shared with collaborators and/or users. They can then replicate your environment and\ninstall all the necessary packages from the project root as follows:
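
To make the round trip concrete, here is a hand-written example file followed by the command a collaborator would run. The file name `example-environment.yml`, the environment name `hts-copy` and the file contents are assumptions chosen so that nothing you exported is overwritten; a real `conda env export` pins exact versions of every package.

```shell
# Write an illustrative environment file (hypothetical contents).
cat > example-environment.yml <<'EOF'
name: hts-copy
channels:
  - conda-forge
  - bioconda
dependencies:
  - python=3
  - numpy
  - matplotlib
  - pysam
EOF

# Build a new environment from the file.
if command -v conda >/dev/null 2>&1; then
    conda env create --file example-environment.yml
else
    echo 'conda not found on PATH; wrote the example file only' >&2
fi
```

With a real project you would point `--file` at the `environment.yml` shipped in the repository instead.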
As your project grows, you may need to update your environment for a variety of reasons. For example, one of your project’s dependencies has\njust released a new version (dependency version number update), you need an additional package for data analysis\n(adding a new dependency) or you have found a better package and no longer need the older package (adding a new and\nremoving an old dependency). What you need to do in this case (apart from installing the new packages and removing those\nthat are no longer needed from your environment) is update the contents of the `environment.yml` file\naccordingly by re-running the `conda env export` command, and propagate the updated `environment.yml` file to your collaborators\nvia your code sharing platform (e.g. GitHub).
\n\n\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "cell_type": "markdown", "id": "final-ending-cell", "metadata": { "editable": false, "collapsed": false }, "source": [ "# Key Points\n\n", "- Environments keep Python versions and dependencies required by different projects separate.\n", "- An environment is itself a directory structure of software and libraries\n", "- Use `conda create -nFor a full list of options and commands, consult the official
\nconda
documentation