Packaging Conda Environments for Nextflow and AWS Batch
If you are running Nextflow pipelines on AWS Batch you will run into a conundrum: AWS Batch requires your jobs to be containerized using Docker, but the Nextflow documentation warns:
The Conda environment feature is not supported by executors that use remote object storage as a work directory e.g. AWS Batch
While at first it may seem that you will need to painstakingly build Docker images around your favorite tools, this is not so! It turns out there is a quick and relatively painless recipe for turning a Conda environment into a Docker container.
Solution
Given a .yml file for a conda environment (e.g. created using conda env export > env.yml) and a Dockerfile with the following contents:
FROM mambaorg/micromamba:1.4.9
COPY --chown=$MAMBA_USER:$MAMBA_USER env.yml /tmp/env.yml
RUN micromamba install -y -n base -f /tmp/env.yml && \
    micromamba clean --all --yes
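AWS Batch pulls job images from a registry, so once the image builds you will want to push it somewhere Batch can reach, such as Amazon ECR. Here is a rough sketch; the image name (my-tools:1.0), account ID, region, and repository below are all placeholders:

# Build the image from the Dockerfile and env.yml above
docker build -t my-tools:1.0 .

# Authenticate Docker to ECR, then tag and push (account ID, region, and repo are hypothetical)
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com
docker tag my-tools:1.0 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-tools:1.0
docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-tools:1.0

The pushed image URI is what you point your Nextflow process container directive (or process.container in nextflow.config) at.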
That’s it, it really is just that easy!
Processes run with the micromamba container will automatically have the base environment activated, so everything installed from your env.yml will be on the $PATH when your Nextflow processes run.
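A quick way to sanity-check this before wiring the image into a pipeline is to run one of the environment's tools directly; samtools here is just a stand-in for whatever your env.yml actually installs:

# 'samtools' is hypothetical; substitute any tool from your env.yml
docker run --rm my-tools:1.0 samtools --version

No conda activate or micromamba activate is needed, because the image's entrypoint activates the base environment before your command runs.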
One thing to note is that the micromamba container runs as $MAMBA_USER, so if you are doing anything with volume mounts you might need to watch out for file permission issues.
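For example, you can compare the container's user with your own; if the UIDs differ, files the container writes into a bind-mounted directory will not be owned by you, and it may not be able to write into directories that only your host user can modify (the image name is again a placeholder):

# The container runs as $MAMBA_USER rather than root or your host user
docker run --rm my-tools:1.0 id

# Compare with your own UID on the host
id -u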
Hopefully this can save you some pain and time fiddling with dependencies!
References
micromamba-docker documentation.