View markdown source on GitHub

Tool Dependencies and Conda

Contributors

Questions

Objectives

Requirements

last_modification Last modification: Sep 28, 2022

Planemo

These slides mirror the section on “Dependencies and Conda” in the Planemo Documentation.


Galaxy Dependencies


class: left, enlarge120

Example Tool (1 / 2)

From Planemo docs - the following example builds a tool for the seqtk seq command.

$ planemo tool_init --force \
                    --id 'seqtk_seq' \
                    --name 'Convert to FASTA (seqtk)' \
                    --requirement seqtk@1.2 \
                    --example_command 'seqtk seq -a 2.fastq > 2.fasta' \
                    --example_input 2.fastq \
                    --example_output 2.fasta \
                    --test_case \
                    --cite_url 'https://github.com/lh3/seqtk' \
                    --help_from_command 'seqtk seq'

Notice the --requirement seqtk@1.2.


class: left, enlarge120

Example Tool (2 / 2)

The --requirement seqtk@1.2 gets translated into the following Galaxy tool XML:

<requirements>
    <requirement type="package" version="1.2">seqtk</requirement>
</requirements>

Dependency Resolution

schematic of a galaxy server with dependency resolution via requirement tags at the top. On the left is the tool box with a number of xml files listed like seqtk_seq and seqtk_subseq. On the right is applications & libraries showing only a few tools like seqtk, all of the 3 multipoe subtools were collapsed

Speaker Notes


class: enlarge120

Conda logo

Package, dependency and environment management


class: enlarge120

###.image-25[Conda logo]
Conda Terminology

Conda recipes build packages that are published to channels.


class: enlarge120

.image-25[Conda logo]
Conda Key Features for Galaxy

Speaker Notes

Compared with the Tool Shed dependency management (tool_dependencies.xml), BioConda is:


###.image-25[Conda logo]
Conda Distributions



Schematic showing conda as a small circle, miniconda encompasses it and adds python and base packages. Anaconda encompasses all of it, adding 150 high quality packages.


class: enlarge120

###.image-25[conda logo]
Best Practice Channels

.footnote[If you are interested in Natural Language Processing or Cheminformatics you may be asking if these channels can still work for your tools. Despite the name Bioconda - it is really more about community and a set of best practices than about bioinformatics purity - many diverse packages have been integrated.]


class: left, enlarge120

###.image-25[conda logo]
Install and Configure

$ planemo conda_init
$ export PATH=$PATH:~/miniconda3/bin
$ which conda
/Users/john/miniconda3/bin/conda

Planemo installs Conda using miniconda and configured defaults designed to easy development.

This has already been done on Planemo machine.


class: enlarge120

###.image-25[conda logo]
Quickstart

Using Conda outside Planemo

$ conda create -n yaml pyyaml
$ conda env list
base        *  ~/miniconda3
yaml           ~/miniconda3/envs/yaml
$ conda activate yaml
(yaml) $
$ conda install pyyaml

class: enlarge150

Conda and Galaxy

Galaxy now automatically installs Conda when first launched and will use Bioconda and other channels for package resolution.


Installing Tools with Conda

Screnshot of tool installation menu


Managing Tool Dependencies

Screenshot of the galaxy page for managing tool dependencies


class: enlarge150

Conda and Planemo

Using Conda directly is generally package-centric, Planemo provides abstractions that are tool-centric.


class: left, enlarge120

The next few slides will use the seqtk example from Planemo’s documentation - this can be downloaded to follow along using the following command:

$ planemo project_init --template=seqtk_complete seqtk_example
$ cd seqtk_example

class: left

Linting Conda Dependencies

.enlarge120[Planemo can check if the requirements of a tool are available in best practice Conda channels using the --conda_requirements flag of planemo lint.]


$ planemo lint --conda_requirements seqtk_seq.xml
Linting tool /Users/john/workspace/planemo/docs/writing/seqtk_seq_v6.xml
  ...
Applying linter requirements_in_conda... CHECK
.. INFO: Requirement [seqtk@1.2] matches target in best practice Conda channel [bioconda].


.enlarge120[Notice Planemo indicates this tool is available and shows the channel it is available in.]


class: left

The Planemo conda_install command

.reduce70[```sh $ planemo conda_install seqtk_seq.xml Install conda target CondaTarget[seqtk,version=1.2] /home/john/miniconda3/bin/conda create -y –name __seqtk@1.2 seqtk=1.2 Fetching package metadata …………… Solving package specifications: ………. Package plan for installation in environment /home/john/miniconda3/envs/__seqtk@1.2: The following packages will be downloaded: package | build —————————|—————– seqtk-1.2 | 0 29 KB bioconda The following NEW packages will be INSTALLED: seqtk: 1.2-0 bioconda zlib: 1.2.8-3 Fetching packages … …. #

To deactivate this environment, use:

> source deactivate __seqtk@1.2

# $ which seqtk seqtk not found


Notice seqtk hasn't been placed on the `PATH`, an environment has been setup that Galaxy (when
used through Planemo) can leverage.

---

class: enlarge120

### The Planemo `conda_env` command

.reduce70[```sh
$ . <(planemo conda_env seqtk_seq.xml)
Deactivate environment with conda_env_deactivate
(seqtk_seq) $ which seqtk
/home/planemo/miniconda2/envs/jobdepsiJClEUfecc6d406196737781ff4456ec60975c137e04884e4f4b05dc68192f7cec4656/bin/seqtk
(seqtk_seq) $ seqtk seq

Usage:   seqtk seq [options] <in.fq>|<in.fa>

Options: -q INT    mask bases with quality lower than INT [0]
         -X INT    mask bases with quality higher than INT [255]
         -n CHAR   masked bases converted to CHAR; 0 for lowercase [0]
         -l INT    number of residues per line; 0 for 2^32-1 [0]
...
         -V        shift quality by '(-Q) - 33'
         -U        convert all bases to uppercases
         -S        strip of white spaces in sequences
(seqtk_seq) $ conda_env_deactivate
$
```]

---

## Using the Tool Environment

Now that we have verified the Conda environment setup with `conda_install` works properly on the
command-line, we can use our tool!

`planemo test` and `planemo serve` will use this environment by default now for this tool.

---

class: left

### Planemo `test`

.reduce70[```sh
$ planemo test seqtk_seq.xml
...
INFO  [galaxy.tools.actions] Handled output named output1 for tool seqtk_seq (20.136 ms)
INFO  [galaxy.tools.actions] Added output datasets to history (12.782 ms)
INFO  [galaxy.tools.actions] Verified access to datasets for Job[unflushed,tool_id=seqtk_seq] (10.954 ms)
INFO  [galaxy.tools.actions] Setup for job Job[unflushed,tool_id=seqtk_seq] complete, ready to flush (21.053 ms)
INFO  [galaxy.tools.actions] Flushed transaction for job Job[id=2,tool_id=seqtk_seq] (26.510 ms)
INFO  [galaxy.jobs.handler] (2) Job dispatched
DEBUG [galaxy.tools.deps] Using dependency seqtk version 1.2 of type conda
DEBUG [galaxy.tools.deps] Using dependency seqtk version 1.2 of type conda
INFO  [galaxy.jobs.command_factory] Built script [/tmp/tmpLvKwta/job_working_directory/000/2/tool_script.sh] for tool command [[ "$CONDA_DEFAULT_ENV" = "/Users/john/miniconda2/envs/__seqtk@1.2" ] || . /Users/john/miniconda2/bin/activate '/Users/john/miniconda2/envs/__seqtk@1.2' >conda_activate.log 2>&1 ; seqtk seq -a '/tmp/tmpLvKwta/files/000/dataset_1.dat' > '/tmp/tmpLvKwta/files/000/dataset_2.dat']
DEBUG [galaxy.tools.deps] Using dependency samtools version None of type conda
DEBUG [galaxy.tools.deps] Using dependency samtools version None of type conda
ok

----------------------------------------------------------------------
XML: /private/tmp/tmpLvKwta/xunit.xml
----------------------------------------------------------------------
Ran 1 test in 15.936s

OK
```]

.enlarge120[
The following line indicates the seqtk package was found:

[galaxy.tools.deps] Using dependency seqtk version 1.2 of type conda

]

---

<hands-on-title>Hands-on</hands-on-title>

![Cartoon of people jumping.](../../images/exercise.png)

---

<hands-on-title>Hands-on</hands-on-title>

#### The Goal
- Use the Planemo commands `conda_install`, `conda_env`, and `test` to practice the Galaxy tool dependency development lifecycle.

---

<hands-on-title>Hands-on</hands-on-title>

#### Steps

Run the following commands to practice working with Galaxy tools, Planemo, and Conda.


```bash
$ planemo project_init --template=seqtk_complete seqtk_example
$ cd seqtk_example
$ planemo conda_install seqtk_seq.xml
$ . &lt;(planemo conda_env seqtk_seq.xml)
$ planemo test seqtk_seq.xml

class: enlarge120

Finding the Correct Requirements & Packages

The previous example worked because a published Bioconda recipe named seqtk at version 1.2 was previously published, but how can these be found?

Two easy approaches are using planemo conda_search and using the Anaconda web search.


class: left

Using the Planemo conda_search Command

The Planemo conda_search command is a shortcut around conda search that searches best practice channels that Galaxy is configured to work with:

$ planemo conda_search seqt
Fetching package metadata ...............
seqtk                        r75                           0  bioconda
                             r82                           0  bioconda
                             r93                           0  bioconda
                             1.2                           0  bioconda

Alternatively, conda can be used directly:

$ $HOME/miniconda3/bin/conda search -c iuc -c conda-forge -c bioconda seqtk

Using Anaconda Search - https://anaconda.org

screensho of the anaconda homepage


Using Anaconda Search - https://anaconda.org

Screenshot of an anaconda search for the term seqtk

Speaker Notes

Notice that only one of these results is a best practice channel and so that is the only one that will be used by Galaxy by default.


Hands-on

Cartoon of people jumping.


Hands-on

The Goal


Hands-on

Steps

  1. Run the following commands to download an example tool to modify.
    $ planemo project_init --template conda_exercises conda_exercises
    $ cd conda_exercises/exercise1
    $ ls
    pear.xml              test-data
    
  2. Run planemo test pear.xml to verify the tool does not function without dependencies defined.
  3. Use --conda_requirements flag with planemo lint to verify it does indeed lack requirements.
  4. Use planemo conda_search or the Anaconda website to search for the correct package and version in a best practice channel.
  5. Update pear.xml with the correct requirement tags.
  6. Re-run the lint command from above to verify the tool now has the correct dependency definition.
  7. Re-run the test command from above to verify the tool test now works properly.

Writing a Conda recipe

.footnote[conda-build user guide
Bioconda guidelines]


class: left

meta.yaml

meta.yaml contains basic metadata about the recipe.

{% set version = "0.7.17" %}
{% set sha256 = "980b9591b61c60042c4a39b9e31ccaad8d17ff179d44d347997825da3fdf47fd" %}

package:
  name: bwa
  version: {{ version }}
source:
  url: https://github.com/lh3/bwa/archive/v{{ version }}.tar.gz
  sha256: {{ sha256 }}
  patches:
    - Makefile.patch
build:
  number: 7
about:
  home: https://github.com/lh3/bwa
  license: GPL3
  license_file: COPYING
  summary: The BWA read mapper.

meta.yaml > requirements

requirements:
  build:
    - {{ compiler('c') }}
  host:
    - zlib
  run:
    - zlib
    - perl


Preprocessing Selectors

requirements:
  build:
    - bz2file  # [py < 33]
    - typing  # [py27 or py34]

build:
  skip: True  # [osx]

.footer[.center[https://conda.io/projects/conda-build/en/latest/source/define-metadata.html#preprocessing-selectors ]]


class: left

Tests

meta.yaml should contain simple tests. These are commands executed at the end of conda build and expected to return 0 on success.

test:
  commands:
    - bowtie2 --version
test:
  commands:
    - bwa 2>&1 | grep 'index sequences in the'
test:
  commands:
    - '$R -e "library(''xcms'')"'

Please note that the Conda tests run inside the runtime environment and not in the build environment.

.footnote[.center[https://conda.io/projects/conda-build/en/latest/source/define-metadata.html#test-section ]]


From the Bioconda Guidelines:

An adequate test must be included in the recipe. An “adequate” test depends on the recipe, but must be able to detect a successful installation. While many packages may ship their own test suite (unit tests or otherwise), including these in the recipe is not recommended since it may timeout the build system on CircleCI. We especially want to avoid including any kind of test data in the repository.


build.sh

#!/bin/bash
./configure --prefix=$PREFIX
make
make install
#!/bin/bash
mkdir -p $PREFIX/bin
cp *.py $PREFIX/bin

.footnote[http://training.galaxyproject.org/training-material/topics/dev/tutorials/conda_sys/slides.html]


Skeletons

$ conda skeleton pypi <packagename>
$ conda skeleton cran <packagename>
$ bioconductor_skeleton.py <packagename>
$ conda skeleton cpan <packagename>`

These generate pre-filled recipes (not guaranteed to work out of the box) for specific programming environments.

.footnote[Building conda packages with conda skeleton]


Building

Once the recipe is ready to go, the conda build command can be used to build it.

$ $HOME/miniconda3/bin/conda build .
  1. BUILD START: Builds/Compiles the package
  2. BUILD START: Provides a .tar.bz2
  3. TEST START: Installs the .tar.bz2 previously generated
  4. TEST START: Launches the functional tests
  5. (Provides the .tar.bz2 path)

.footnote[If miniconda wasn’t configured with planemo conda_init, you may have
to run conda install conda-build before using the above command.]


.image-75[bioconda logo]


###.image-25[bioconda logo] contributing 1/2


###.image-25[bioconda logo] contributing 2/2

See Contributing with GitHub.


Planemo and --conda_use_local

By default, Galaxy and Planemo will ignore locally built packages.

Simply pass --conda_use_local to various Planemo commands (e.g. test, conda_install, or serve) to use the local package cache.

Enables developing Galaxy tools and Conda recipes in parallel.


Hands-on

Cartoon of people jumping.


Hands-on

The Goal


class: left

Hands-on

Before

If you have completed exercise1, open exercise2.

$ cd ../exercise2
$ ls
fleeqtk_seq.xml              test-data

This directory contains the outline of a tool for fleeqtk. fleeqtk is a fork of the project seqtk that many Planemo tutorials are built around and the example tool fleeqtk_seq.xml should be fairly familiar.


class: left

Hands-on

Steps

  1. Clone and branch Bioconda (https://github.com/bioconda/bioconda-recipes)
  2. Build a recipe for fleeqtk version 1.3. You may wish to use conda skeleton, start from scratch, or copy the recipe of seqtk and work from there - any of these strategies should work
    • fleeqtk 1.3 can be downloaded using the URL https://github.com/jmchilton/fleeqtk/archive/v1.3.tar.gz
    • fleeqtk can be built using make and installed with make install
  3. Use conda build to build the recipe
  4. Add a requirement for this new package in the example tool.
  5. Run planemo conda_install --conda_use_local fleeqtk_seq.xml to install the package for Galaxy
  6. Run planemo test fleeqtk_seq.xml to verify the tool and package work together

Advanced Topics in Conda Development


Jinja Templating

{% set name = "seqtk" %}
{% set version = "1.15.1" %}

package:
  name: {{ name }}
  version: {{ version }}

source:
    url: http://coolsoftware.com/{{ name }}/{{ version }}/{{ name }}-{{ version }}.zip

https://conda.io/projects/conda-build/en/latest/source/define-metadata.html#templating-with-jinja


class: left

Stable URLs

source:
  url: https://github.com/lh3/bwa/archive/v0.7.15.tar.gz
  md5: 54fdee953c5c256d36885a1c5c6b118c

While supported by Conda, git_url and git_rev are not as stable as a git tarball. Ideally a github repo should have tagged releases that are accessible as tarballs from the “releases” section of the github repo. In addition tarballs can be easily mirrored and Bioconda is saving a copy of every tarball so the recipe can be rebuild at any time.


class: left

Python

For PyPI packages

conda skeleton pypi <package_name>

Python - pysam’s build.sh

#!/bin/bash
# Remove gcc statements that do not work on older compilers for CentOS5
# support
sed -i'' -e 's/"-Wno-error=declaration-after-statement",//g' setup.py
sed -i'' -e 's/"-Wno-error=declaration-after-statement"//g' setup.py
# linking htslib, see:
# https://pysam.readthedocs.org/en/latest/installation.html#external
# https://github.com/pysam-developers/pysam/blob/v0.9.0/setup.py#L79
export CFLAGS="-I$PREFIX/include"
export CPPFLAGS="-I$PREFIX/include"
export LDFLAGS="-L$PREFIX/lib"

export HTSLIB_LIBRARY_DIR=$PREFIX/lib
export HTSLIB_INCLUDE_DIR=$PREFIX/include
$PYTHON setup.py install

Python - pysam’s meta.yaml

.reduce50[

{% set version = "0.15.2" %}
{% set samtools_version = "1.9" %}
{% set bcftools_version = "1.9" %}

package:
  name: pysam
  version: '{{ version }}'

source:
  url: https://github.com/pysam-developers/pysam/archive/v{{ version }}.tar.gz
  sha256: 8cb3dd70f0d825086ac059ec2445ebd2ec5f14af73e7f1f4bd358966aaee5ed3

build:
  number: 3
  binary_relocation: False # [linux]

requirements:
  build:
    - {{ compiler('c') }}
  host:
    - htslib
    - samtools {{ samtools_version }}
    - bcftools {{ bcftools_version }}
    - cython
    - python
    - setuptools
    - zlib
    - curl
    - libdeflate
  run:
    - samtools {{ samtools_version }}
    - bcftools {{ bcftools_version }}
    - python
    - curl
    - libdeflate

test:
  imports:
    - pysam
```]

---

## R

For CRAN packages

```sh
conda skeleton cran <packagename>

warning The majority of R packages on CRAN are generic and should therefore be submitted at Conda-Forge. Exceptions are r-* packages that depends on bioconductor-* packages.

Conda-Forge contribution guidelines


Java

https://bioconda.github.io/guidelines.html#java


Java - PeptideShaker’s build.sh

#!/bin/bash
set -eu -o pipefail

outdir=$PREFIX/share/$PKG_NAME-$PKG_VERSION-$PKG_BUILDNUM
mkdir -p $outdir
mkdir -p $PREFIX/bin
cp -R * $outdir/
cp $RECIPE_DIR/peptide-shaker.py $outdir/peptide-shaker
ls -l $outdir
ln -s $outdir/peptide-shaker $PREFIX/bin
chmod 0755 "${PREFIX}/bin/peptide-shaker"

class: reduce70

Java - PeptideShaker’s meta.yaml

...

source:
    url: http://genesis.ugent.be/maven2/eu/isas/peptideshaker/{{ name }}/{{ version }}/{{ name }}-{{ version }}.zip
    md5: 14a48413e28a25614f5fda2b381d7197

requirements:
  build:
  run:
    - openjdk >=6
    - python

test:
    commands:
      - peptide-shaker eu.isas.peptideshaker.cmd.PeptideShakerCLI
      - peptide-shaker eu.isas.peptideshaker.cmd.PeptideShakerCLI -Xms512m -Xmx1g

Perl

For CPAN packages

conda skeleton cpan <packagename>

class: reduce70

Perl - Module-Build

package:
  name: perl-module-build
  version: "0.4214"

source:
  url: https://cpan.metacpan.org/authors/id/L/LE/LEONT/Module-Build-0.4214.tar.gz
  md5: 7b7ca5a47bef48c50c8b5906ca3ac7fb

build:
  number: 0

requirements:
  host:
    - perl
    - perl-cpan-meta-yaml
    - perl-extutils-parsexs
    - perl-data-dumper
    # [...]
  run:
    - perl
    - perl-text-parsewords
    - perl-cpan-meta
    - perl-version
    # [...]

test:
  # Perl 'use' tests
  imports:
    - Module::Build
    - Module::Build::Base
    - Module::Build::Compat
    - Module::Build::Config
    # [...]

about:
  home: https://metacpan.org/pod/Module::Build
  license: perl_5
  summary: 'Build and install Perl modules

class: left

Metapackages

Metapackages tie together other packages. All they do is define dependencies. For example, the hubward-all metapackage specifies the various other conda packages needed to get full hubward installation running just by installing one package.

Other metapackages might tie together conda packages with a theme. For example, all UCSC utilities related to bigBed files, or a set of packages useful for variant calling.

https://bioconda.github.io/guidelines.html#metapackages


class: left

CircleCI Continuous Building

.image-70[circle ci logo]


CircleCI Command Line Interface (CLI)

.reduce70[bash $ curl -o /usr/local/bin/circleci https://circle-downloads.s3.amazonaws.com/releases/build_agent_wrapper/circleci ]

.reduce70[bash $ chmod +x /usr/local/bin/circleci ]

.reduce70[bash $ circleci build ]


Key Points

Thank you!

This material is the result of a collaborative work. Thanks to the Galaxy Training Network and all the contributors! Galaxy Training Network This material is licensed under the Creative Commons Attribution 4.0 International License.