If you are running Nextflow pipelines on AWS Batch, you will run into a conundrum: AWS Batch requires your jobs to be containerized with Docker, but the Nextflow documentation warns:

“The Conda environment feature is not supported by executors that use remote object storage as a work directory, e.g. AWS Batch.”

While at first it may seem that you will need to painstakingly build Docker images around your favorite tools, this is not so! There is a quick and relatively painless recipe for turning a Conda environment into a Docker container.

Solution

Given a .yml file for a Conda environment (e.g. created with conda env export > env.yml), create a Dockerfile with the following contents:

# start from the micromamba base image; micromamba-docker handles environment activation
FROM mambaorg/micromamba:1.4.9
COPY --chown=$MAMBA_USER:$MAMBA_USER env.yml /tmp/env.yml
# install the exported environment into the default "base" env, then trim package caches
RUN micromamba install -y -n base -f /tmp/env.yml && \
    micromamba clean --all --yes
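
Then build the image and push it to a registry that your Batch compute environment can pull from, typically Amazon ECR. A minimal sketch, where the repository name, account ID, and region are placeholders:

docker build -t my-pipeline-env .

# create an ECR repository and authenticate Docker to it (names/IDs are placeholders)
aws ecr create-repository --repository-name my-pipeline-env
aws ecr get-login-password --region us-east-1 | \
    docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com

# tag and push
docker tag my-pipeline-env:latest 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-pipeline-env:latest
docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-pipeline-env:latest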

That’s it; it really is just that easy! Processes running in the micromamba container will automatically have the base environment activated, so everything installed from your env.yml will be on the $PATH when your Nextflow processes run.
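
To wire this into Nextflow, point your processes at the pushed image in nextflow.config. A minimal sketch, assuming the hypothetical ECR image above, a Batch queue named my-batch-queue, and an S3 bucket for the work directory:

// nextflow.config -- AWS Batch executor with the micromamba-based image (all names are placeholders)
process {
    executor  = 'awsbatch'
    queue     = 'my-batch-queue'
    container = '123456789012.dkr.ecr.us-east-1.amazonaws.com/my-pipeline-env:latest'
}

aws {
    region = 'us-east-1'
}

// AWS Batch requires the work directory to live in object storage
workDir = 's3://my-bucket/nextflow-work'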

One thing to note is that the micromamba container runs as $MAMBA_USER, so if you are doing anything with volume mounts you may need to watch out for file permission issues.
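
You can see this locally when testing the image with a bind mount: files created inside the container end up owned by the container user on the host. A quick check, reusing the hypothetical image name from above:

# files written to the bind mount are owned by the container user, not you
docker run --rm -v "$PWD/data:/data" my-pipeline-env:latest touch /data/created-in-container.txt
ls -l data/

# one workaround: run with your host UID/GID
docker run --rm --user "$(id -u):$(id -g)" -v "$PWD/data:/data" my-pipeline-env:latest touch /data/created-as-host-user.txt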

Hopefully this can save you some pain and time fiddling with dependencies!

References

micromamba-docker documentation.
