This guide shows how to run the AlphaFold framework on the DCSR clusters using a Python virtual environment and the DCSR software stack.

Fans of Conda may also wish to check out from which a lot of the information below is taken - just make sure to module load gcc miniconda3 rather than following the exact procedure!

For details on how to run the model, please see the Supplementary Information article.

For some ideas on how to separate the CPU and GPU parts:

Alternatively, check out what has already been calculated.

Reference databases

The reference databases needed for AlphaFold have been made available in /reference/alphafold, so there is no need to download them. The directory name is the date on which the databases were downloaded.

$ ls /reference/alphafold/
bfd  mgnify  params  pdb70  pdb_mmcif  uniclust30  uniref90  20211104

New versions will be downloaded if required.
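
Before launching a long job it can be worth confirming that the database directory you intend to use contains everything expected. The following is a minimal sketch, not a DCSR-provided tool: the function name check_alphafold_dbs is made up for illustration, and the list of subdirectories is simply copied from the ls output above. Pass it whichever directory you will later hand to AlphaFold.

```shell
# Sketch: report any expected AlphaFold database directory that is missing.
# check_alphafold_dbs is a hypothetical helper name; the directory names
# are taken from the ls output shown in this guide.
check_alphafold_dbs() {
    local root="${1:-/reference/alphafold}"
    local missing=0
    local name
    for name in bfd mgnify params pdb70 pdb_mmcif uniclust30 uniref90; do
        if [ ! -d "$root/$name" ]; then
            echo "missing: $root/$name" >&2
            missing=1
        fi
    done
    # Return 0 when everything is present, 1 otherwise.
    return "$missing"
}
```

Once the function is defined, check_alphafold_dbs /reference/alphafold prints nothing and returns 0 if all seven directories exist, and lists the missing ones on standard error otherwise.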


Setting up a virtual environment

In order to satisfy a number of the dependencies and to install AlphaFold itself, we use a Python virtual environment.

As usual, we recommend that you create these environments in your project space in the /work filesystem. You can, of course, create one per lab and share it.

$ module load gcc python

$ cd /work/path/to/my/project

$ python -m venv alpha-venv

$ source alpha-venv/bin/activate
(alpha-venv) [ulambda@curnagl ]$ 

$ pip install alphafold

$ pip install --upgrade "jax[cuda111]" -f

You can check what has been installed by running pip list inside the virtual environment.
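
A quick sanity check at this point can save a failed job later. The sketch below assumes a bash shell; check_venv and show_jax_devices are illustrative names, not part of AlphaFold or the DCSR stack. The first relies on the VIRTUAL_ENV variable that the activate script sets, and the second calls jax.devices() to list the devices JAX can see, which should only include a GPU when run on a GPU node with the cuda modules loaded.

```shell
# Sketch: confirm that a virtual environment is active in this shell.
# check_venv is a hypothetical helper name.
check_venv() {
    if [ -z "${VIRTUAL_ENV:-}" ]; then
        echo "no virtual environment is active" >&2
        return 1
    fi
    echo "active virtual environment: $VIRTUAL_ENV"
}

# Sketch: print the devices visible to JAX (requires jax to be installed
# in the active environment). show_jax_devices is a hypothetical name.
show_jax_devices() {
    python -c 'import jax; print(jax.devices())'
}
```

Running check_venv after sourcing alpha-venv/bin/activate should print the path of the environment; in a fresh shell it returns a non-zero status instead.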

AlphaFold and friends

Whilst we have already installed the AlphaFold Python package, it is useful to have the source code, which can be obtained with git:

git clone

This will create a folder called alphafold.

Go into the directory (cd alphafold) and download a helper script.


As you will be running jobs via Slurm, please comment out (with #) the following lines in the helper script so that, if multiple GPUs are used, they will all be visible:

# Export ENVIRONMENT variables (change me if required)
if [[ "$use_gpu" == true ]] ; then

It is also useful to make the helper script executable:

$ chmod a+x

Now we download some chemical data needed by the code:

 wget -q -P alphafold/common/

After all this the setup is complete and we are ready to go.

Running an example


Here we run on an interactive node using Sinteractive, but the same logic applies to batch jobs, which will be needed for longer-running tasks.

$ Sinteractive -G 1 -t 2:00:00 -c 16 -m 64G
Sinteractive is running with the following options:
--gres=gpu:1 -c 16 --mem 64G -J interactive -p interactive -t 2:00:00
salloc: Granted job allocation 123456
salloc: Waiting for resource configuration
salloc: Nodes dnagpu001 are ready for job
[ulambda@dnagpu001 ]$ 

In order to make all the necessary tools available we first need to load some modules

$ module load gcc python hh-suite openmm hmmer pdbfixer kalign cuda cudnn

$ module list

Currently Loaded Modules:
  1) gcc/9.3.0     2) python/3.8.8   3) hh-suite/3.3.0   4) fftw/3.3.9   5) openmm/7.5.0   
  6) hmmer/3.3.2   7) pdbfixer/1.7   8) kalign/3.3.1     9) cuda/11.2.2  10) cudnn/
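
If a module fails to load, the first symptom is often a "command not found" error well into the run. As a hedged sketch, the helper below checks that a list of executables is on the PATH; check_tools is an illustrative name, and the tool names suggested in the usage comment (hhblits and hhsearch from hh-suite, jackhmmer from hmmer, kalign) are the binaries we would expect these modules to provide rather than an exhaustive list.

```shell
# Sketch: verify that each named executable can be found on the PATH.
# check_tools is a hypothetical helper name.
check_tools() {
    local missing=0
    local tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "not found on PATH: $tool" >&2
            missing=1
        fi
    done
    # Return 0 when every tool was found, 1 otherwise.
    return "$missing"
}

# Usage, after the module load command:
#   check_tools hhblits hhsearch jackhmmer kalign
```

The same call can be placed in a batch script just after the module load line so that a job stops early instead of failing part-way through the search stage.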

We then activate the virtual environment

$ source /work/path/to/my/project/alpha-venv/bin/activate
(alpha-venv) [ulambda@dnagpu001 ~]$ 

Then change into the alphafold repository and launch a task

$ cd /work/path/to/my/project/alphafold

$ ./ -d /reference/alphafold/20210719 -o /scratch/ulambda/alphatest -m model_1 -f T1024.fasta -t 2021-05-01 -g true

2021-07-20 17:32:00.051940: I tensorflow/stream_executor/platform/default/] Successfully opened dynamic library
2021-07-20 17:32:04.076381: I external/org_tensorflow/tensorflow/compiler/xla/service/] XLA service 0x318b090 initialized for platform Interpreter (this does not guarantee that XLA will be used). Devices:
2021-07-20 17:32:04.076419: I external/org_tensorflow/tensorflow/compiler/xla/service/]   StreamExecutor device (0): Interpreter, <undefined>
I0720 17:45:51.702012 140086992525120] Started HHsearch query
I0720 17:47:37.402516 140086992525120] Finished HHsearch query in 105.700 seconds
I0720 17:47:38.506151 140086992525120] Launching subprocess "/dcsrsoft/spack/hetre/v1.2/spack/opt/spack/linux-rhel8-zen2/gcc-9.3.0/hh-suite-3.3.0-k3vfe6b2jsdl6cebrcmb3qoxav2gyukz/bin/hhblits -i T1024.fasta -cpu 4 -oa3m /tmp/tmpkv138q2u/output.a3m -o /dev/null -n 3 -e 0.001 -maxseq 1000000 -realign_max 100000 -maxfilt 100000 -min_prefilter_hits 1000 -d /dcsrsoft/reference/alphafold/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt -d /dcsrsoft/reference/alphafold/uniclust30/uniclust30_2018_08/uniclust30_2018_08"
I0720 17:47:38.572123 140086992525120] Started HHblits query

Example batch script

The following job script, submitted via sbatch, does the same thing as the interactive example above.


#!/bin/bash
#SBATCH -n 1
#SBATCH -c 24
#SBATCH -p gpu
#SBATCH --gres=gpu:1
#SBATCH --gres-flags=enforce-binding
#SBATCH --mem 200G
#SBATCH -t 6:00:00

module purge

module load gcc python hh-suite openmm hmmer pdbfixer kalign cuda cudnn

source /work/path/to/my/project/alpha-venv/bin/activate

cd /work/path/to/my/project/alphafold/

./ -d /reference/alphafold/20210719 -o /scratch/ulambda/alphatest -m model_1 -f T1024.fasta -t 2021-05-01 -g true

The above analysis for T1024 takes approximately 2 hours with the resources requested.
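
The overall run time can be cross-checked against the timings.json file that the run writes to its output directory. The sketch below is one way to do this with awk; timings_total_hours is an illustrative name, and the function assumes the file holds one "name": value pair per line with the values in seconds. For the T1024 values reported in this guide it prints 2.01.

```shell
# Sketch: sum the per-stage times (seconds) in a timings.json file and
# print the total in hours. timings_total_hours is a hypothetical name.
# Assumes one "name": value pair per line; other lines are ignored.
timings_total_hours() {
    awk -F': *' '
        /^[ \t]*"/ {
            gsub(/,/, "", $2)   # drop the trailing comma, if any
            total += $2
        }
        END { printf "%.2f\n", total / 3600 }
    ' "$1"
}
```

For example, timings_total_hours /scratch/ulambda/alphatest/T1024/timings.json, where the exact location of the file under the output directory is an assumption. Comparing the individual entries also shows which stage dominates, which is useful when deciding how to split the CPU and GPU parts.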

The timings.json file shows the time spent in each stage, in seconds:

    "features": 7004.208073139191,
    "process_features_model_1": 8.682352781295776,
    "predict_and_compile_model_1": 148.41881656646729,
    "relax_model_1": 64.47628593444824