
AlphaFold

This guide shows how to run the AlphaFold framework on the DCSR clusters using a Python virtual environment and the DCSR software stack.

Fans of Conda may also wish to check out https://github.com/kalininalab/alphafold_non_docker, from which much of the information below is taken - just make sure to module load gcc miniconda3 rather than following the exact procedure!

For details on how to run the model please see the Supplementary Information article.

 

Reference databases

The reference databases needed for AlphaFold are available in /dcsrsoft/reference/alphafold, so there is no need to download them.

$ ls /dcsrsoft/reference/alphafold/
bfd  mgnify  params  pdb70  pdb_mmcif  uniclust30  uniref90
 
Setting up a virtual environment

In order to satisfy a number of the dependencies, and to install AlphaFold itself, we use a Python virtual environment.

As usual we recommend that you create these environments in your project space on the /work filesystem. You can, of course, create one per lab and share it.

$ module load gcc python

$ cd /work/path/to/my/project

$ python -m venv alpha-venv

$ source alpha-venv/bin/activate
(alpha-venv) [ulambda@curnagl ]$ 

$ pip install alphafold
..

$ pip install --upgrade "jax[cuda111]" -f https://storage.googleapis.com/jax-releases/jax_releases.html
..

You can check what has been installed by running pip list inside the virtual environment.
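For example, a quick way to confirm that the alphafold and jax packages are present (the exact versions listed will depend on when you ran pip install):

$ pip list | grep -i -E 'alphafold|jax'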

 
AlphaFold and friends

Whilst we have already installed the AlphaFold Python package, it's useful to have the source code, which can be obtained with git:

git clone https://github.com/deepmind/alphafold.git

This will create a folder called alphafold.

Go into the directory (cd alphafold) and download a helper script:

wget https://raw.githubusercontent.com/kalininalab/alphafold_non_docker/main/run_alphafold.sh

As you will be running jobs via Slurm, please comment out (with #) the following lines in run_alphafold.sh so that if multiple GPUs are used they will all be visible:

# Export ENVIRONMENT variables (change me if required)
if [[ "$use_gpu" == true ]] ; then
    export CUDA_VISIBLE_DEVICES=0
fi
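If you prefer to make the change from the command line, the following sed one-liner (a sketch, assuming the block appears exactly as shown above) comments out the whole if ... fi block:

$ sed -i '/^if \[\[ "$use_gpu" == true \]\]/,/^fi/ s/^/# /' run_alphafold.sh

Afterwards, grep CUDA_VISIBLE_DEVICES run_alphafold.sh is a quick way to confirm the edit worked.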

It's also useful to make the helper script executable

$ chmod a+x run_alphafold.sh

Now we download some chemical data needed by the code

 wget -q -P alphafold/common/ https://git.scicore.unibas.ch/schwede/openstructure/-/raw/7102c63615b64735c4941278d92b554ec94415f8/modules/mol/alg/src/stereo_chemical_props.txt
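
A quick optional check that the file has landed in the expected place inside the repository:

$ ls -l alphafold/common/stereo_chemical_props.txt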

After all this the setup is complete and we are ready to go.

Running an example

 

Here we show running on an interactive node using Sinteractive, but the same logic applies to batch jobs, which will be needed for longer-running tasks.

$ Sinteractive -G 1 -t 2:00:00 -c 16 -m 64G
 
Sinteractive is running with the following options:
 
--gres=gpu:1 -c 16 --mem 64G -J interactive -p interactive -t 2:00:00
 
salloc: Granted job allocation 123456
salloc: Waiting for resource configuration
salloc: Nodes dnagpu001 are ready for job
[ulambda@dnagpu001 ]$ 
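
Before going any further it can be worth confirming that a GPU is visible on the node with the standard nvidia-smi tool (the card reported will depend on the node you land on):

$ nvidia-smi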

In order to make all the necessary tools available we first need to load some modules

$ module load gcc python hh-suite openmm hmmer pdbfixer kalign cuda cudnn

$ module list

Currently Loaded Modules:
  1) gcc/9.3.0     2) python/3.8.8   3) hh-suite/3.3.0   4) fftw/3.3.9   5) openmm/7.5.0   
  6) hmmer/3.3.2   7) pdbfixer/1.7   8) kalign/3.3.1     9) cuda/11.2.2  10) cudnn/8.1.1.33-11.2

We then activate the virtual environment

$ source /work/path/to/my/project/alpha-venv/bin/activate
(alpha-venv) [ulambda@dnagpu001 ~]$ 
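
As an optional sanity check you can ask JAX which devices it can see; with the cuda and cudnn modules loaded it should report a GPU device rather than falling back to the CPU:

$ python -c 'import jax; print(jax.devices())'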

Then change into the alphafold repository and launch a task. In the command below, -d points at the reference databases, -o sets the output directory, -m selects the model to run, -f is the input FASTA file, -t is the maximum template release date and -g enables the GPU.

$ cd /work/path/to/my/project/alphafold

$ ./run_alphafold.sh -d /dcsrsoft/reference/alphafold -o /scratch/ulambda/alphatest -m model_1 -f T1024.fasta -t 2021-05-01 -g true

2021-07-20 17:32:00.051940: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
..
..
2021-07-20 17:32:04.076381: I external/org_tensorflow/tensorflow/compiler/xla/service/service.cc:171] XLA service 0x318b090 initialized for platform Interpreter (this does not guarantee that XLA will be used). Devices:
2021-07-20 17:32:04.076419: I external/org_tensorflow/tensorflow/compiler/xla/service/service.cc:179]   StreamExecutor device (0): Interpreter, <undefined>
..
I0720 17:45:51.702012 140086992525120 utils.py:36] Started HHsearch query
I0720 17:47:37.402516 140086992525120 utils.py:40] Finished HHsearch query in 105.700 seconds
I0720 17:47:38.506151 140086992525120 hhblits.py:128] Launching subprocess "/dcsrsoft/spack/hetre/v1.2/spack/opt/spack/linux-rhel8-zen2/gcc-9.3.0/hh-suite-3.3.0-k3vfe6b2jsdl6cebrcmb3qoxav2gyukz/bin/hhblits -i T1024.fasta -cpu 4 -oa3m /tmp/tmpkv138q2u/output.a3m -o /dev/null -n 3 -e 0.001 -maxseq 1000000 -realign_max 100000 -maxfilt 100000 -min_prefilter_hits 1000 -d /dcsrsoft/reference/alphafold/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt -d /dcsrsoft/reference/alphafold/uniclust30/uniclust30_2018_08/uniclust30_2018_08"
I0720 17:47:38.572123 140086992525120 utils.py:36] Started HHblits query
..
..
 
Example batch script

A batch script, to be submitted via sbatch, which does the same thing as above. Note the gpu partition (-p gpu), the single GPU request (--gres=gpu:1) and the one day time limit (-t 1-0):

#!/bin/bash

#SBATCH -N 1
#SBATCH -n 1
#SBATCH -c 24
#SBATCH -p gpu
#SBATCH --gres=gpu:1
#SBATCH --gres-flags=enforce-binding
#SBATCH --mem 200G
#SBATCH -t 1-0

module purge

module load gcc python hh-suite openmm hmmer pdbfixer kalign cuda cudnn

source /work/path/to/my/project/alpha-venv/bin/activate

cd /work/path/to/my/project/alphafold/

./run_alphafold.sh -d /dcsrsoft/reference/alphafold -o /scratch/ulambda/alphatest -m model_1 -f T1024.fasta -t 2021-05-01 -g true
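
If the script is saved as, say, run_alpha.sbatch (the name here is just an example), it can be submitted and monitored in the usual way:

$ sbatch run_alpha.sbatch
$ squeue -u $USER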