3 minute read

BEAST 2.7 is out and promises to be better than ever. However, it’s not available on conda yet. This post will show you how to use BEAST 2.7 with Snakemake to automatically install BEAST 2.7 and its packages.

I’ve been using Snakemake for a few years now and it’s a great tool for managing bioinformatics workflows. Check out the Snakemake Tutorial for a quick introduction.

BEAST 2.7 isn’t currently available on conda, however, we can use a post-deployment script to install it into a Snakemake pipeline. Post-deployment scripts are run after a environment is created and can be used to install additional packages or perform other tasks.

The following script will download and install BEAST 2.7.6. It will also install the ReMASTER package (as an example). You can add additional packages by adding them to the PACKAGES array and change the BEAST version by changing the VERSION variable.

It’s not that important to understand the script, but I’ve added comments to explain what it’s doing. The script is designed to be platform agnostic and will download the appropriate version of BEAST 2.7 for your operating system.

#!env bash
set -o pipefail
set -e # Exit on error

# Define the version variable
VERSION="2.7.6"
# array of packages to install
PACKAGES=("ReMASTER")

# Function to download, install, and symlink
download_install_symlink() {
    os=$(uname -s)
    arch=$(uname -m)
    file=""

    # change to the envs directory
    cd $CONDA_PREFIX

    # Download and Install
    if [[ "$os" == "Darwin" ]]; then
        file="BEAST.v$VERSION.Mac.dmg"
        curl -LO "https://github.com/CompEvol/beast2/releases/download/v$VERSION/$file"
        hdiutil mount "$file"  # This mounts the dmg file
        cp -R "/Volumes/BEAST v$VERSION/BEAST $VERSION/" "$CONDA_PREFIX/lib/beast"
        hdiutil unmount "/Volumes/BEAST v$VERSION/"
    elif [[ "$os" == "Linux" ]]; then
        if [[ "$arch" == "x86_64" ]]; then
            file="BEAST.v$VERSION.Linux.x86.tgz"
        elif [[ "$arch" == "aarch64" ]]; then
            file="BEAST.v$VERSION.Linux.aarch64.tgz"
        else
            echo "Unsupported architecture"
            return 1
        fi
        curl -LO "https://github.com/CompEvol/beast2/releases/download/v$VERSION/$file"
        tar -xzvf "$file"
        mv beast "$CONDA_PREFIX/lib/beast"
    else
        echo "Unsupported operating system"
        return 1
    fi

    # Create symlinks
    for cmd in "$CONDA_PREFIX/lib/beast/bin/"*; do
        ln -sf "$cmd" "$CONDA_PREFIX/bin/"
    done
    
    # Remove the downloaded file
    rm -rf "$file"
}

# Call the function
download_install_symlink

# This script is used to add packages to beast after the beast.yaml env is installed
beast -version  # Need to call beast once (even just to query the version) to create support dirs

# Install packages
for package in "${PACKAGES[@]}"; do
    packagemanager -add "$package"
done

# Activate script: Set up LD_LIBRARY_PATH for beagle
echo -e 'export LD_LIBRARY_PATH_CONDA_BACKUP="${LD_LIBRARY_PATH:-}"\nexport LD_LIBRARY_PATH=$CONDA_PREFIX/lib:${LD_LIBRARY_PATH:-}' > "$CONDA_PREFIX/etc/conda/activate.d/beagle_activate.sh"

# Deactivate script: Restore original LD_LIBRARY_PATH and clean up
echo -e 'export LD_LIBRARY_PATH=${LD_LIBRARY_PATH_CONDA_BACKUP:-}\nunset LD_LIBRARY_PATH_CONDA_BACKUP\n[ -z "$LD_LIBRARY_PATH" ] && unset LD_LIBRARY_PATH' > "$CONDA_PREFIX/etc/conda/deactivate.d/beagle_deactivate.sh"

The script will be used with a Snakemake environment file. Post-deployment scripts are run after the env is installed and must be named <envname>.post-deploy.sh. For example, if your env is called beast.yaml, the post-deployment script above should be located next to the environment file and called beast.post-deploy.sh.

Here’s an example env file that installs beagle-lib (env must install something). The post-deployment script will run after and install BEAST 2.7 and ReMASTER.

# envs/beast.yaml
channels:
  - bioconda
  - conda-forge
dependencies:
  - beagle-lib==4.0.1

Here’s an example Snakefile that uses the BEAST 2.7 env.

# Snakefile
rule beast_version:
    conda: "envs/beast.yaml"
    output: "beast.version"
    shell: "beast -version > {output}"

The workflow directory should look like this:

workflow/
├── envs
│   ├── beast.post-deploy.sh
│   └── beast.yaml
└── Snakefile

You can run the workflow with the following command:

snakemake --cores 1 --use-conda -R beast_version  

This will install BEAST 2.7 (the first time you run it) and run the beast_version rule. The --use-conda flag tells Snakemake to use the env specified in the rule. The -R flag tells Snakemake to run the specified rule and all of its dependencies.

A complete example can be found at https://github.com/Wytamma/using-beast-2.7-with-snakemake.

Comments