AlphaFold
This guide shows how to run the AlphaFold framework on the DCSR clusters using a Python virtual environment and the DCSR software stack.
Fans of Conda may also wish to check out https://github.com/kalininalab/alphafold_non_docker, from which much of the information below is taken. Just make sure to run module load gcc miniconda3 rather than following the exact procedure given there.
For details on how to run the model please see the Supplementary Information article.
Reference databases
The reference databases needed for AlphaFold have been made available in /dcsrsoft/reference/alphafold so there is no need to download them.
$ ls /dcsrsoft/reference/alphafold/
bfd  mgnify  params  pdb70  pdb_mmcif  uniclust30  uniref90
Setting up a virtual environment
In order to satisfy a number of the dependencies and to install AlphaFold itself we use a Python virtual environment.
As usual we recommend that you create these environments in your project space in the /work filesystem. You can, of course, create one per lab and share it.
$ module load gcc python
$ cd /work/path/to/my/project
$ python -m venv alpha-venv
$ source alpha-venv/bin/activate
(alpha-venv) [ulambda@curnagl ]$ 
$ pip install alphafold
..
$ pip install --upgrade "jax[cuda111]" -f https://storage.googleapis.com/jax-releases/jax_releases.html
..
You can check what has been installed by running pip list inside the virtual environment.
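To narrow the pip list output to the packages installed in this section, a small filter can help. The sketch below is only an illustration: the function name show_alphafold_packages is made up here, and the package names it matches (alphafold, jax, jaxlib) are assumed from the pip install commands above.

```shell
# Hypothetical helper: keep only the lines of `pip list` output that
# describe the packages installed in this guide.
# The package names are an assumption based on the pip commands above.
show_alphafold_packages() {
    grep -i -E '^(alphafold|jax|jaxlib)[[:space:]]'
}

# Usage, inside the activated virtual environment:
#   pip list | show_alphafold_packages
```

If the filter prints nothing, the packages were probably installed outside the virtual environment or the installation failed.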
AlphaFold and friends
Whilst we have already installed the AlphaFold Python package, it's useful to have the source code, which can be obtained with git.
$ git clone https://github.com/deepmind/alphafold.git
This will create a folder called alphafold.
Go into the directory (cd alphafold) and download a helper script.
$ wget https://raw.githubusercontent.com/kalininalab/alphafold_non_docker/main/run_alphafold.sh
As you will be running jobs via Slurm, please comment out (with #) the following lines in run_alphafold.sh so that, if multiple GPUs are used, they will all be visible.
# Export ENVIRONMENT variables (change me if required)
if [[ "$use_gpu" == true ]] ; then
    export CUDA_VISIBLE_DEVICES=0
fi
It's also useful to make the helper script executable.
$ chmod a+x run_alphafold.sh
Now we download some chemical data needed by the code.
$ wget -q -P alphafold/common/ https://git.scicore.unibas.ch/schwede/openstructure/-/raw/7102c63615b64735c4941278d92b554ec94415f8/modules/mol/alg/src/stereo_chemical_props.txt
After all this the setup is complete and we are ready to go.
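The manual edit to run_alphafold.sh described in this section can also be scripted. The following is a minimal sketch, assuming that the if block appears in the script exactly as quoted above, with the if and fi lines starting at the beginning of a line. The function name comment_out_gpu_pin is hypothetical. Inspect the result before relying on it, since the helper script may have changed upstream.

```shell
# Hypothetical helper: comment out the block that pins CUDA_VISIBLE_DEVICES
# in the helper script, so that the GPUs allocated by Slurm stay visible.
# Assumes the block starts with the `if [[ "$use_gpu" == true ]]` line and
# ends with a bare `fi`. A backup copy is kept next to the script as <name>.bak.
comment_out_gpu_pin() {
    sed -i.bak '/^if \[\[ "\$use_gpu" == true ]]/,/^fi$/ s/^/#/' "$1"
}

# Usage, from inside the alphafold directory:
#   comment_out_gpu_pin run_alphafold.sh
#   diff run_alphafold.sh.bak run_alphafold.sh
```

The sed range address selects every line from the if line to the closing fi and prefixes each with #, which is the same change as the manual edit.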
Running an example
Here we run on an interactive node using Sinteractive, but the same logic applies to batch jobs, which are needed for longer-running tasks.
$ Sinteractive -G 1 -t 2:00:00 -c 16 -m 64G
 
Sinteractive is running with the following options:
 
--gres=gpu:1 -c 16 --mem 64G -J interactive -p interactive -t 2:00:00
 
salloc: Granted job allocation 123456
salloc: Waiting for resource configuration
salloc: Nodes dnagpu001 are ready for job
[ulambda@dnagpu001 ]$ 
In order to make all the necessary tools available we first need to load some modules.
$ module load gcc python hh-suite openmm hmmer pdbfixer kalign cuda cudnn
$ module list
Currently Loaded Modules:
  1) gcc/9.3.0     2) python/3.8.8   3) hh-suite/3.3.0   4) fftw/3.3.9   5) openmm/7.5.0   
  6) hmmer/3.3.2   7) pdbfixer/1.7   8) kalign/3.3.1     9) cuda/11.2.2  10) cudnn/8.1.1.33-11.2
We then activate the virtual environment.
$ source /work/path/to/my/project/alpha-venv/bin/activate
(alpha-venv) [ulambda@dnagpu001 ~]$ 
Then change into the alphafold repository and launch a task.
$ cd /work/path/to/my/project/alphafold
$ ./run_alphafold.sh -d /dcsrsoft/reference/alphafold -o /scratch/ulambda/alphatest -m model_1 -f T1024.fasta -t 2021-05-01 -g true
2021-07-20 17:32:00.051940: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
..
..
2021-07-20 17:32:04.076381: I external/org_tensorflow/tensorflow/compiler/xla/service/service.cc:171] XLA service 0x318b090 initialized for platform Interpreter (this does not guarantee that XLA will be used). Devices:
2021-07-20 17:32:04.076419: I external/org_tensorflow/tensorflow/compiler/xla/service/service.cc:179]   StreamExecutor device (0): Interpreter, <undefined>
..
I0720 17:45:51.702012 140086992525120 utils.py:36] Started HHsearch query
I0720 17:47:37.402516 140086992525120 utils.py:40] Finished HHsearch query in 105.700 seconds
I0720 17:47:38.506151 140086992525120 hhblits.py:128] Launching subprocess "/dcsrsoft/spack/hetre/v1.2/spack/opt/spack/linux-rhel8-zen2/gcc-9.3.0/hh-suite-3.3.0-k3vfe6b2jsdl6cebrcmb3qoxav2gyukz/bin/hhblits -i T1024.fasta -cpu 4 -oa3m /tmp/tmpkv138q2u/output.a3m -o /dev/null -n 3 -e 0.001 -maxseq 1000000 -realign_max 100000 -maxfilt 100000 -min_prefilter_hits 1000 -d /dcsrsoft/reference/alphafold/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt -d /dcsrsoft/reference/alphafold/uniclust30/uniclust30_2018_08/uniclust30_2018_08"
I0720 17:47:38.572123 140086992525120 utils.py:36] Started HHblits query
..
..
Example batch script
The following batch script, submitted via sbatch, does the same thing as the interactive example above.
#!/bin/bash
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -c 24
#SBATCH -p gpu
#SBATCH --gres=gpu:1
#SBATCH --gres-flags=enforce-binding
#SBATCH --mem 200G
#SBATCH -t 1-0
module purge
module load gcc python hh-suite openmm hmmer pdbfixer kalign cuda cudnn
source /work/path/to/my/project/alpha-venv/bin/activate
cd /work/path/to/my/project/alphafold/
./run_alphafold.sh -d /dcsrsoft/reference/alphafold -o /scratch/ulambda/alphatest -m model_1 -f T1024.fasta -t 2021-05-01 -g true
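Because a batch job may wait in the queue before it starts, it can be worth failing early if something is missing. The function below is a sketch of such a pre-flight check and is not part of AlphaFold or the helper script. The name preflight is made up, the directory names come from the ls listing in the Reference databases section, and the tool names in the usage line (hhblits, hhsearch, jackhmmer, kalign) are assumed to be the binaries provided by the hh-suite, hmmer and kalign modules.

```shell
# Hypothetical pre-flight check for an AlphaFold batch job.
# Prints every missing database directory or tool to stderr and
# returns 1 if anything is missing, 0 otherwise.
preflight() {
    db_root="$1"   # e.g. /dcsrsoft/reference/alphafold
    shift          # remaining arguments: commands that must be on PATH
    rc=0
    # Directory names taken from the listing of the reference databases.
    for db in bfd mgnify params pdb70 pdb_mmcif uniclust30 uniref90; do
        if [ ! -d "$db_root/$db" ]; then
            echo "missing database directory: $db_root/$db" >&2
            rc=1
        fi
    done
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing tool: $tool" >&2
            rc=1
        fi
    done
    return "$rc"
}

# Usage in the batch script, after the module load line:
#   preflight /dcsrsoft/reference/alphafold hhblits hhsearch jackhmmer kalign || exit 1
```

Reporting every problem before returning, rather than stopping at the first one, means a single failed job is enough to see everything that needs fixing.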
