Software

DCSR Software Stack

What is it?

The DCSR provides a software environment including commonly used scientific tools and libraries.

The software is optimised to make best use of the CPUs, GPUs and the high-speed InfiniBand interconnect.

In order to create the environment we use the Spack package manager and Lmod.

Information on how to use the software stack can be found in our introductory course.
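As a quick reference, the stack is driven by the standard Lmod commands; a minimal sketch (the package name is just an example):

module spider gcc        # search for a package and see how to load it
module load gcc          # load the default version
module list              # show what is currently loaded
module purge             # unload everything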

For information on the deprecated Vital-IT software stack please see here.


Release and lifecycle

Each year we provide a new release of the software stack which fixes versions for key tools and libraries - the new stack is put in production during the annual maintenance in early January and the previous release remains available for one year.

The versions for key components are given in the following table:

Year    Release name  GCC     MPI             Intel     R      Python  CUDA
2021/2  Hêtre         9.3.0   MVAPICH2 2.3.5  2021.2.0  4.0.5  3.8.8   11.2
2022/3  Arolle        10.4.0  MVAPICH2 2.3.7  2022.1.0  4.2.1  3.9.13  11.6

Newer versions of tools may be made available during the year but the base versions will remain the default.
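If a newer version has been added, you can list the available versions and load a specific one explicitly; Lmod marks the default with (D). A sketch (the version number is hypothetical):

module avail python          # list the available versions; the default is marked (D)
module load python/3.10.8    # load a specific, non-default version (hypothetical version)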


Jura


On the Jura (sensitive data) cluster the stack is not loaded by default and must be activated with:

source /dcsrsoft/spack/bin/setup_dcsrsoft


Old software stack

The old (Vital-IT) software stack can be accessed on Curnagl via the following commands

$ source /dcsrsoft/bin/use_old_software 

##################################
#                                #
#  WARNING - USING OLD SOFTWARE  #
#                                #
##################################
 
$ module load Bioinformatics/Software/vital-it 

Please note that the old stack is not updated, no new tools can be added and there is no guarantee that it will work.

Compiling and running MPI codes

To illustrate the procedure we will compile and run an MPI hello world example from mpitutorial.com. First we download the source code:
$ wget https://raw.githubusercontent.com/mpitutorial/mpitutorial/gh-pages/tutorials/mpi-hello-world/code/mpi_hello_world.c

 

Compiling with GCC

To compile the code, we first need to load the gcc and mvapich2 modules:

$ module load gcc
$ module load mvapich2                                                                                                                                       
Then we can produce the executable called mpi_hello_world by compiling the source code mpi_hello_world.c:
$ mpicc mpi_hello_world.c -o mpi_hello_world
The mpicc tool is a wrapper around the gcc compiler that adds the correct options for linking MPI codes and if you are curious you can run mpicc -show to see what it does.
To run the executable we create a Slurm submission script called run_mpi_hello_world.sh, where we ask to run a total of 4 MPI tasks with (at most) 2 tasks per node:
#!/bin/bash

#SBATCH --time 00-00:05:00
#SBATCH --mem=2G
#SBATCH --ntasks 4
#SBATCH --ntasks-per-node 2
#SBATCH --cpus-per-task 1

module purge
module load gcc
module load mvapich2
module list

EXE=mpi_hello_world
[ ! -f  $EXE ] && echo "EXE $EXE not found." && exit 1

srun  $EXE
Finally, we submit our MPI job with:
$ sbatch run_mpi_hello_world.sh

Upon completion you should get something like:
...

Hello world from processor dna001.curnagl, rank 1 out of 4 processors
Hello world from processor dna001.curnagl, rank 3 out of 4 processors
Hello world from processor dna004.curnagl, rank 0 out of 4 processors
Hello world from processor dna004.curnagl, rank 2 out of 4 processors

It is important to check that you have a single group of 4 processors and not 4 groups of 1 processor. If that's the case, you can now compile and run your own MPI application.

The important part of the script is the srun $EXE line, as MPI jobs must be started with a job launcher in order to run multiple processes on multiple nodes.

Compiling with Intel

Rather than compiling with GCC and MVAPICH2, you can compile and run your MPI application with the tools from Intel. So, instead of loading the modules gcc and mvapich2, you load the modules intel and intel-oneapi-mpi:

$ module load intel
$ module load intel-oneapi-mpi 

To compile, use the Intel compiler wrapper mpiicc (rather than mpicc, which is a wrapper around gcc):

$ mpiicc mpi_hello_world.c -o mpi_hello_world

And to run, simply load the right modules accordingly:

#!/bin/bash

#SBATCH --time 00-00:05:00
#SBATCH --mem=2G
#SBATCH --ntasks 4
#SBATCH --ntasks-per-node 2
#SBATCH --cpus-per-task 1

module purge
module load intel
module load intel-oneapi-mpi
module list

EXE=mpi_hello_world
[ ! -f  $EXE ] && echo "EXE $EXE not found." && exit 1

srun $EXE

 

MATLAB on the clusters

The full version of MATLAB is only installed on the login and interactive nodes, so in order to run MATLAB jobs on the cluster you first need to compile your .m files and then run them using the MATLAB Runtime.

This is because the UNIL has a limited number of licences and with an HPC cluster it's easy to use them all.

The number of licences and available toolboxes is detailed here

Thankfully the compilation process isn't too complicated but there are a number of steps to follow and a few issues to be aware of.

Let's start with our MatrixCAB.m file

disp("Matrix A:");
A = [1, 2; 3, 4];
disp(A);

disp("Matrix B:");
B = [5, 6; 7, 8];
disp(B);

disp("Matrix C = A * B:");
C = A * B;
disp(C);


First of all we need to load the module that provides MATLAB

[ulambda@login ~]$ module load matlab
[ulambda@login ~]$ module list

Currently Loaded Modules:
  1) matlab/2021b

We now compile the MatrixCAB.m file with the mcc compiler which is now in the path.

$ mcc -v -m MatrixCAB.m 

Compiler version: 8.1 (R2021b)
Dependency analysis by REQUIREMENTS.
Parsing file "/users/ulambda/MatrixCAB.m"
	(referenced from command line).
Generating file "/users/ulambda/readme.txt".
Generating file "MatrixCAB.sh".

The compiler documentation can be found at https://ch.mathworks.com/help/compiler/mcc.html

Note that there are now 3 new files:

readme.txt

run_MatrixCAB.sh

MatrixCAB

If we take a look at the last file we see that it's an executable file

$ file MatrixCAB
MatrixCAB: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.32, BuildID[sha1]=ad76a4654419e7968208a77a172f103afe2d77c2, stripped

The curious are welcome to look at the output from ldd which shows what the executable is linked to.

$ module load matlab-runtime
$ ldd MatrixCAB


The readme.txt explains in great detail how to run the compiled object and the run_MatrixCAB.sh script is for launching the job.

In order to make use of the executable we need to load the MATLAB runtime environment module

module load matlab-runtime

Please note that the runtime has to correspond to the version of mcc used to compile the .m file. Please see the following page for the corresponding runtime and compiler versions:

https://ch.mathworks.com/products/compiler/matlab-runtime.html

On the DCSR clusters the modules are configured to have the same version naming scheme:

matlab-runtime/2021b    
matlab/2021b 

The runtime module sets the MCR_PATH variable which is needed by the run_MatrixCAB.sh script.

To launch the compiled MatrixCAB object we need to put all the elements together:

sh run_MatrixCAB.sh $MCR_PATH

Obviously this should be done on a compute node using a job script:

#!/bin/bash

#SBATCH --time 00-00:05:00
#SBATCH --nodes 1
#SBATCH --ntasks 1
#SBATCH --cpus-per-task 1
#SBATCH --mem 4000M

module load matlab-runtime/2021b

MATLAB_SCRIPT=MatrixCAB

sh run_$MATLAB_SCRIPT.sh $MCR_PATH

echo "Finished - next time I'll port my code to Julia"

Task farming with Matlab

When running many MATLAB jobs in parallel on the clusters, you will likely encounter stability issues, with some jobs failing randomly and others hanging (see the explanations from MATLAB support below). To solve the issue, you must set the MCR_CACHE_ROOT environment variable (see https://ch.mathworks.com/help/compiler_sdk/ml_code/mcr-component-cache-and-ctf-archive-embedding.html) so that all jobs do not share the same location (by default in your home directory).

For job arrays, you can adopt the following:

#!/bin/bash

#SBATCH --array=1-5
#SBATCH --partition cpu
#SBATCH --mem=8G
#SBATCH --time=00:15:00

module load matlab-runtime/2021b

# Create a task-specific MCR_CACHE_ROOT directory

mcr_cache_root=/tmp/$USER/MCR_CACHE_ROOT_${SLURM_ARRAY_JOB_ID}_${SLURM_ARRAY_TASK_ID}
mkdir -pv $mcr_cache_root
export MCR_CACHE_ROOT=$mcr_cache_root

### YOUR MATLAB ANALYSIS HERE

MATLAB_SCRIPT=MatrixCAB

sh run_$MATLAB_SCRIPT.sh $MCR_PATH

###

# Tidy up the place
rm -rv $mcr_cache_root


Explanations from Matlab support

When running a MATLAB Compiler standalone executable, the MCR_CACHE_ROOT location is used by the standalone executable to extract the deployable archive into. As the name suggests, the extracted archive is cached in this location, meaning the archive is extracted the very first time you run the application and then for consecutive runs the already extracted data from the cache is used.

There are mechanisms in place which try to ensure that when you run multiple instances of the same application at the same time, you do not run into any concurrency issues with this cache (e.g. a second instance should not also try to extract the archive if the first instance was already in the process of doing this). However, there are some limitations to these mechanisms; they were designed to deal with concurrency issues which might occur if an interactive user ran a handful of concurrent instances of the application. When doing this interactively, you are not starting all those instances at exactly the same point in time and there are at least a few seconds between starting each instance. If you are somehow starting a lot of instances at virtually the same time (through some shell script, or possibly even some cluster scheduler), this mechanism may break down. The likelihood of running into issues increases even more if the cache is located on a shared network drive, shared by multiple machines (which can definitely be the case for a home directory), and all these machines are running instances of the same application.

This is probably what you are running into then. Giving each instance its own cache location would prevent those issues altogether as there would be no concurrency in the first place.


Using Conda and Anaconda

Conda is a package manager system for Python and other tools and is widely used in some areas such as bioinformatics and data science. On personal computers it is a useful way to install a stack of tools.

The full documentation can be found at

https://docs.conda.io/projects/conda/en/latest/user-guide/index.html

Warning: Conda, whilst convenient, is not designed to be installed on multi-user compute clusters and we are unable to guarantee that tools installed via it will work correctly. This is especially true for any parallel (MPI) tools.

Setting up Conda

First load the appropriate modules

$ module load gcc miniconda3

To get the conda command to work with your bash shell, you need to type:

eval "$(command conda 'shell.bash' 'hook' 2> /dev/null)"

You can automate this so that it happens every time you log in by typing, the very first time you use it:

$ conda init bash

This command will hang on a sudo password prompt; just ignore it (Ctrl-C).

You will now probably need to log out and back in again to "activate" the changes.

Once you log in again conda should be available.

However, this is not recommended, especially if you are using different kinds of environments (e.g. Conda and Mamba). A convenient option is to define an alias in your ~/.bashrc by adding the following line at the end:

alias goconda="eval \"\$(command conda 'shell.bash' 'hook' 2> /dev/null)\""

Then each time you need Conda, after loading the module, you just type

goconda

Please ignore any messages about updating to a newer version of conda! 

Configuring Conda

By default Conda will put everything, including downloads, in your home directory. Due to the limited space available this is probably not what you want.

We strongly recommend that you create a .condarc file in your home directory with the following options:

pkgs_dirs:
  - /work/path/to/my/project/space

where the path is the path to your project space on /work - we do not recommend installing things in /scratch as they might be automatically deleted.

You may also wish to add a non-standard envs_dirs entry:

envs_dirs:
  - ~/myproject-envs

Please see the full condarc documentation for all the possible configuration options

https://docs.conda.io/projects/conda/en/latest/user-guide/configuration/use-condarc.html

Using Conda virtual environments

The basic commands for creating conda environments are: 

Creation
$ conda create --name $MY_CONDA_ENV_NAME
Activation
$ conda activate $MY_CONDA_ENV_NAME
Deactivation
$ conda deactivate
Environment in specific location

If you need to create an environment in a non-standard location:

$ conda create --prefix $MY_CONDA_ENV_PATH

$ conda activate $MY_CONDA_ENV_PATH

$ conda deactivate

Installing packages

The base commands are:

$ conda search $PACKAGE_NAME
$ conda install $PACKAGE_NAME
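Putting the pieces together, a hypothetical end-to-end session (the environment name, channels and package are placeholders; goconda is the alias defined above) might look like:

module load gcc miniconda3
goconda                                              # initialise conda in this shell

conda create --name my-tools                         # hypothetical environment name
conda activate my-tools
conda install -c conda-forge -c bioconda samtools    # hypothetical channels and package
conda deactivate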

Running Slurm jobs with conda

Since Conda needs some initialization before being used, an sbatch script must explicitly run bash in login mode. This can be done by adding the --login option to the shebang. Here is an example of an sbatch script using Conda:

#!/bin/bash --login

#SBATCH --time 00-00:05:00
#SBATCH --nodes 1
#SBATCH --ntasks 1
#SBATCH --cpus-per-task 1
#SBATCH --mem 4G

module load gcc miniconda3
conda activate $MY_CONDA_ENV_PATH
…

Using Mamba to install Conda packages

Mamba is an alternative to the Conda package manager. Its main advantage is much faster dependency resolution.

Setting up Mamba

The proposed setup is based on micromamba and doesn't require any installation or module loading on the cluster. You just have to add the following 2 lines to your ~/.bashrc file:

export PATH="$PATH:/dcsrsoft/spack/external/micromamba"
export MAMBA_ROOT_PREFIX="/work/FAC/INSTITUTE/PI/PROJECT/mamba_root"

Of course, replace /work/FAC/INSTITUTE/PI/PROJECT with the path corresponding to your project.

Then, you just have to run the initialization process with the following command:

micromamba shell init

Finally, log out from the cluster; the environment will be properly configured at your next login.

Using Mamba

Simply replace conda with micromamba in your commands. For instance:

micromamba create --prefix ./my_mamba_env
micromamba activate ./my_mamba_env
micromamba install busco -c conda-forge -c bioconda
busco -v
micromamba deactivate

Restriction

You cannot use Mamba with virtual environments created previously with Conda. Such environments must be recreated.
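A possible way to recreate such an environment, assuming the old Conda environment is still usable and that your micromamba accepts YAML environment files via -f (a sketch, not a tested recipe):

# Export the package list from the old Conda environment (name is hypothetical)
conda activate my_old_env
conda env export --no-builds > environment.yml
conda deactivate

# Recreate it with micromamba in a new location
micromamba create -f environment.yml --prefix ./my_mamba_env
micromamba activate ./my_mamba_env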

Deep Learning with GPUs

The training phase of your deep learning model may be very time-consuming. To accelerate this process you may want to use GPUs, and you will need to install the deep learning packages, such as Keras or PyTorch, properly. Here is a short guide on how to install some well-known deep learning packages in Python and R. If you encounter any problem during the installation, or if you need to install other deep learning packages (in Python, R or other programming languages), please send an email to helpdesk@unil.ch with the subject "DCSR: Deep Learning package installation", and we will try to help you.

Keras

We will install TensorFlow 2's implementation of the Keras API (tf.keras); see https://keras.io/about/

To install the packages in your home directory:

cd $HOME

Log into a GPU node:

Sinteractive -m 4G -G 1

Check that the GPU is visible:

nvidia-smi

Load parallel modules and python:

module purge
module load gcc/10.4.0 cuda/11.6.2 cudnn/8.4.0.27-11.6 python/3.9.13

Create a virtual environment. Here we will call it "venv_tensorflow_gpu", but you may choose another name:

python -m venv venv_tensorflow_gpu

Activate the virtual environment:

source venv_tensorflow_gpu/bin/activate

Install TensorFlow (which includes Keras):

pip install tensorflow

Check that TensorFlow was properly installed:

python -c 'import tensorflow; print(tensorflow.__version__)'

There might be a warning message and the output should be something like "2.5.0".

You may install extra packages that your deep learning code will use. For example:

pip install numpy
pip install scikit-learn
pip install pandas
pip install matplotlib

Deactivate your virtual environment and logout from the GPU node:

deactivate
exit

Comment

If you want to make your installation more reproducible, you may proceed as follows:

1. Create a file called "requirements.txt" and write the package names inside. You may also specify the package versions. For example:

tensorflow==2.5.0
numpy==1.19.5
scikit-learn==0.24.2
pandas==1.2.5
matplotlib==3.4.2

2. Proceed as above, but instead of installing the packages individually, type 

pip install -r requirements.txt

Run your deep learning code

To test your deep learning code (maximum 1h), say "my_deep_learning_code.py", you may use the interactive mode:

cd /scratch/username/

Sinteractive -p interactive -m 4G -G 1

module load gcc/10.4.0 cuda/11.6.2 cudnn/8.4.0.27-11.6 python/3.9.13

source $HOME/venv_tensorflow_gpu/bin/activate

Run your code:

python my_deep_learning_code.py

or copy/paste your code inside a python environment:

python

copy/paste your code. For example:

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.utils import to_categorical

etc

Comment

To confirm that TensorFlow is using the GPU:

import tensorflow as tf
tf.config.list_physical_devices("GPU")

or to obtain the number of GPUs available:

import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.list_physical_devices("GPU")))

Once you have finished testing your code, you must close your interactive session (by typing exit), and then run it on the cluster by using an sbatch script, say "my_sbatch_script.sh":

#!/bin/bash -l
#SBATCH --account your_account_id
#SBATCH --mail-type ALL
#SBATCH --mail-user firstname.surname@unil.ch

#SBATCH --chdir /scratch/username/
#SBATCH --job-name my_deep_learning_job
#SBATCH --output my_deep_learning_job.out

#SBATCH --partition gpu
#SBATCH --gres gpu:1
#SBATCH --gres-flags enforce-binding
#SBATCH --nodes 1
#SBATCH --ntasks 1
#SBATCH --cpus-per-task 1
#SBATCH --mem 10G
#SBATCH --time 01:00:00

module load gcc/10.4.0 cuda/11.6.2 cudnn/8.4.0.27-11.6 python/3.9.13

source $HOME/venv_tensorflow_gpu/bin/activate

python /PATH_TO_YOUR_CODE/my_deep_learning_code.py

To launch your job:

cd $HOME/PATH_TO_YOUR_SBATCH_SCRIPT/

sbatch my_sbatch_script.sh

Multi-GPU parallelism

If you want to use a single GPU, you do not need to tell Keras to use the GPU. Indeed, if a GPU is available, Keras will use it automatically.

On the other hand, if you want to use 2 (or more) GPUs (on the same node), you need to use a special TensorFlow strategy, called "tf.distribute.MirroredStrategy", in your python code "my_deep_learning_code.py": see the Keras documentation at https://keras.io/guides/distributed_training/. If no devices are specified in the constructor argument of the strategy, it will use all the available GPUs. If no GPUs are found, it will use the available CPUs.

This function implements single-machine multi-GPU data parallelism. It works in the following way: divide the batch data into multiple sub-batches, apply a model copy on each sub-batch, where every model copy is executed on a dedicated GPU, and finally concatenate the results (on CPU) into one big batch. For example, if your batch_size is 64 and you use 2 GPUs, then we will divide the input data into 2 sub-batches of 32 samples, process each sub-batch on one GPU, then return the full batch of 64 processed samples. This induces quasi-linear speedup.

And the sbatch script must contain the line:

#SBATCH --gres gpu:2

TensorBoard

To use TensorBoard on Curnagl, you need to modify your code as explained in https://keras.io/api/callbacks/tensorboard/ .

After your TensorBoard "logs" directory has been created, you need to proceed as follows:

[/scratch/pjacquet] Sinteractive -m 4G -G 1
Sinteractive is running with the following options:

--gres=gpu:1 -c 1 --mem 4G -J interactive -p interactive -t 1:00:00 --x11

salloc: Granted job allocation 2466209
salloc: Waiting for resource configuration
salloc: Nodes dnagpu001 are ready for job

You need to remember the GPU node's name dnagpuXXX. Here it is dnagpu001.

Then

[/scratch/pjacquet] module load gcc/10.4.0 cuda/11.6.2 cudnn/8.4.0.27-11.6 python/3.9.13

[/scratch/pjacquet] source $HOME/venv_tensorflow_gpu/bin/activate

(venv_tensorflow_gpu) [/scratch/pjacquet] ls
logs

(venv_tensorflow_gpu) [/scratch/pjacquet] tensorboard --logdir=./logs --port=6006

You will see the following message:

Serving TensorBoard on localhost; to expose to the network, use a proxy or pass --bind_all
TensorBoard 2.6.0 at http://localhost:6006/ (Press CTRL+C to quit)

On your laptop, you need to type:

ssh -J curnagl.dcsr.unil.ch -L 6006:localhost:6006 dnagpuXXX

where dnagpuXXX is the GPU node's name you used to launch TensorBoard (above it was dnagpu001).

Finally, on your laptop, you may use any web browser (e.g. Chrome) to open the page http://localhost:6006 (copy/paste this link into your web browser). You should then see TensorBoard with the information located in the "logs" folder.

TensorFlow

The installation of TensorFlow 2 is the same as for Keras, so please look at the above Keras installation.

Warning

In TensorFlow 1.15 and previous versions, the packages for CPU and GPU are offered separately:

pip install tensorflow==1.15 # CPU
pip install tensorflow-gpu==1.15 # GPU

PyTorch

To install the packages in your home directory:

cd $HOME

Log into a GPU node:

Sinteractive -m 4G -G 1

Check that the GPU is visible:

nvidia-smi

Load parallel modules and python:

module purge
module load gcc/10.4.0 cuda/11.6.2 cudnn/8.4.0.27-11.6 python/3.9.13

Create a virtual environment. Here we will call it "venv_pytorch_gpu", but you may choose another name:

python -m venv venv_pytorch_gpu

Activate the virtual environment:

source venv_pytorch_gpu/bin/activate

Install PyTorch:

pip install torch==1.9.1+cu111 torchvision==0.10.1+cu111 torchaudio==0.9.1 -f https://download.pytorch.org/whl/torch_stable.html

Check that PyTorch was properly installed:

python -c 'import torch; print(torch.__version__)'

There might be a warning message, and the output should be something like "1.9.1+cu111".

You may install extra packages that your deep learning code will use. For example:

pip install scikit-learn
pip install pandas
pip install matplotlib

Deactivate your virtual environment and logout from the GPU node:

deactivate
exit

Comment

If you want to make your installation more reproducible, you may proceed as follows:

1. Create a file called "requirements.txt" and write the package names inside. You may also specify the package versions. For example:

torch==1.8.1
torchvision==0.9.1
scikit-learn==0.24.2
pandas==1.2.4
matplotlib==3.4.2

2. Proceed as above, but instead of installing the packages individually, type 

pip install -r requirements.txt

Run your deep learning code

To test your deep learning code (maximum 1h), say "my_deep_learning_code.py", you may use the interactive mode:

cd /scratch/username/

Sinteractive -m 4G -G 1

module load gcc/10.4.0 cuda/11.6.2 cudnn/8.4.0.27-11.6 python/3.9.13

source $HOME/venv_pytorch_gpu/bin/activate

Run your code:

python my_deep_learning_code.py

or copy/paste your code inside a python environment:

python

copy/paste your code

Once you have finished testing your code, you must close your interactive session (by typing exit), and then run it on the cluster by using an sbatch script, say "my_sbatch_script.sh":

#!/bin/bash -l
#SBATCH --account your_account_id
#SBATCH --mail-type ALL
#SBATCH --mail-user firstname.surname@unil.ch

#SBATCH --chdir /scratch/username/
#SBATCH --job-name my_deep_learning_job
#SBATCH --output my_deep_learning_job.out

#SBATCH --partition gpu
#SBATCH --gres gpu:1
#SBATCH --gres-flags enforce-binding
#SBATCH --nodes 1
#SBATCH --ntasks 1
#SBATCH --cpus-per-task 1
#SBATCH --mem 10G
#SBATCH --time 01:00:00

module load gcc/10.4.0 cuda/11.6.2 cudnn/8.4.0.27-11.6 python/3.9.13

source $HOME/venv_pytorch_gpu/bin/activate

python /PATH_TO_YOUR_CODE/my_deep_learning_code.py

To launch your job:

cd $HOME/PATH_TO_YOUR_SBATCH_SCRIPT/

sbatch my_sbatch_script.sh

TensorBoard

You may use TensorBoard with PyTorch by looking at the documentation  

https://pytorch.org/tutorials/recipes/recipes/tensorboard_with_pytorch.html

and by adapting slightly the instructions above (see TensorBoard in Keras). 

R Keras

R Keras is an interface to Python Keras. In simple terms, this means that the Keras R package allows you to enjoy the benefits of R programming while having access to the capabilities of the Python Keras package.

To install the packages in your home directory:

cd $HOME

Log into a GPU node:

Sinteractive -m 4G -G 1

Check that the GPU is visible:

nvidia-smi

Load parallel modules and python:

module purge
module load gcc/10.4.0 cuda/11.6.2 cudnn/8.4.0.27-11.6 python/3.9.13 r/4.2.1

Launch an R environment:

R

Install the R Keras package by using a virtual environment (called "venv_r-tensorflow_gpu"):

install.packages("keras")

Would you like to use a personal library instead? (yes/No/cancel) yes

Would you like to create a personal library to install packages into? (yes/No/cancel) yes

And select Switzerland for the CRAN mirror.

library(keras)
library("tensorflow")

install_tensorflow(version = "2.5.0-gpu", method = "virtualenv", envname = "venv_r-tensorflow_gpu")

q()

This will install Keras and TensorFlow.

Comment

If you receive an error message concerning "conda", you may need to look in your .bashrc file for a conda init configuration and comment out this part.
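The block written by conda init into ~/.bashrc is delimited by marker comments and typically looks like the following; commenting out these lines (or removing the whole block) is usually enough:

# >>> conda initialize >>>
# !! Contents within this block are managed by 'conda init' !!
# ... lines added by conda init ...
# <<< conda initialize <<<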

Run your deep learning code

To test your deep learning code (maximum 1h), say "my_deep_learning_code.R", you may use the interactive mode:

Sinteractive -m 4G -G 1

module load gcc/10.4.0 cuda/11.6.2 cudnn/8.4.0.27-11.6 python/3.9.13 r/4.2.1

R

library(keras)
library("tensorflow")

copy/paste your code

Comment

To confirm that TensorFlow is using the GPU:

tf$config$list_physical_devices("GPU")

or to obtain the number of GPUs available:

print(length(tf$config$list_physical_devices("GPU")))

Once you have finished testing your code, you must close your interactive session (by typing exit), and then run it on the cluster by using an sbatch script, say "my_sbatch_script.sh":

#!/bin/bash -l
#SBATCH --account your_account_id
#SBATCH --mail-type ALL
#SBATCH --mail-user firstname.surname@unil.ch

#SBATCH --chdir /scratch/username/
#SBATCH --job-name my_deep_learning_job
#SBATCH --output my_deep_learning_job.out

#SBATCH --partition gpu
#SBATCH --gres gpu:1
#SBATCH --gres-flags enforce-binding
#SBATCH --nodes 1
#SBATCH --ntasks 1
#SBATCH --cpus-per-task 1
#SBATCH --mem 10G
#SBATCH --time 01:00:00

module load gcc/10.4.0 cuda/11.6.2 cudnn/8.4.0.27-11.6 python/3.9.13 r/4.2.1

Rscript /PATH_TO_YOUR_CODE/my_deep_learning_code.R

To launch your job:

cd $HOME/PATH_TO_YOUR_SBATCH_SCRIPT/

sbatch my_sbatch_script.sh

Multi-GPU parallelism

See the explanation under the Python Keras installation.

AlphaFold

The project home page where you can find the latest information is at https://github.com/deepmind/alphafold 

For details on how to run the model please see the Supplementary Information article

For some ideas on how to separate the CPU and GPU parts: https://github.com/Zuricho/ParallelFold.

Alternatively - check out what has already been calculated

Note on GPU usage

Whilst Alphafold makes use of GPUs for the inference part of the modelling, depending on the use case, this can be a small part of the running time as shown by the timings.json file that is produced for every run:

For the T1024 test case:

{
    "features": 6510.152379751205,
    "process_features_model_1_pred_0": 3.555035352706909,
    "predict_and_compile_model_1_pred_0": 124.84101128578186,
    "relax_model_1_pred_0": 25.707252502441406,
    "process_features_model_2_pred_0": 2.0465400218963623,
    "predict_and_compile_model_2_pred_0": 104.1096305847168,
    "relax_model_2_pred_0": 14.539108514785767,
    "process_features_model_3_pred_0": 1.7761900424957275,
    "predict_and_compile_model_3_pred_0": 82.07982850074768,
    "relax_model_3_pred_0": 13.683411598205566,
    "process_features_model_4_pred_0": 1.8073537349700928,
    "predict_and_compile_model_4_pred_0": 82.5819890499115,
    "relax_model_4_pred_0": 15.835367441177368,
    "process_features_model_5_pred_0": 1.9143474102020264,
    "predict_and_compile_model_5_pred_0": 77.47663712501526,
    "relax_model_5_pred_0": 14.72615647315979
}

That means that out of the ~2 hour run time, 1h48 is spent running "classical" code (mostly hhblits) and only ~10 minutes is spent on the GPU.

As such, do not request 2 GPUs as the potential speedup is negligible and this will block resources for other users.

For multimer modelling the GPU part can take longer and, depending on what you need, it might be worth turning off relaxation. Always check the timings.json file to see where time is being spent!
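For example, to pretty-print the timing breakdown of a finished run (the path is a placeholder; AlphaFold writes timings.json inside a sub-directory of the output directory named after the FASTA file):

python3 -m json.tool /path/to/output_dir/T1024/timings.json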

If we look at the overall efficiency of the job using seff we see:

Nodes: 1
Cores per node: 24
CPU Utilized: 03:28:24
CPU Efficiency: 7.33% of 1-23:21:36 core-walltime
Job Wall-clock time: 01:58:24
Memory Utilized: 81.94 GB
Memory Efficiency: 40.97% of 200.00 GB


Reference databases

The reference databases needed for AlphaFold have been made available in /reference/alphafold so there is no need to download them - the directory name is the date on which the databases were downloaded.

$ ls /reference/alphafold/
20210719  
20211104
20220414

New versions will be downloaded if required.

The versions correspond to:


Using containers

The AlphaFold project recommends using Docker to run the code, which works on cloud or personal resources but not on shared HPC systems, as the administrative access required for Docker is not permitted.

Singularity containers

We provide Singularity images which can be used on the DCSR clusters and these can be found in /dcsrsoft/singularity/containers/

The currently available images are:

When running the images directly it is necessary to provide all the paths to the databases, which is error-prone and tedious.

$ singularity run /dcsrsoft/singularity/containers/alphafold-v2.1.1.sif --helpshort
Full AlphaFold protein structure prediction script.
flags:

/app/alphafold/run_alphafold.py:
  --[no]benchmark: Run multiple JAX model evaluations to obtain a timing that excludes the compilation time, which should be more indicative of the time required for inferencing many proteins.
    (default: 'false')
  --bfd_database_path: Path to the BFD database for use by HHblits.
  --data_dir: Path to directory of supporting data.
  --db_preset: <full_dbs|reduced_dbs>: Choose preset MSA database configuration - smaller genetic database config (reduced_dbs) or full genetic database config  (full_dbs)
    (default: 'full_dbs')
  --fasta_paths: Paths to FASTA files, each containing a prediction target that will be folded one after another. If a FASTA file contains multiple sequences, then it will be folded as a multimer. Paths should be separated by commas. All FASTA paths must have a unique basename as the basename is used
    to name the output directories for each prediction.
    (a comma separated list)
  --hhblits_binary_path: Path to the HHblits executable.
    (default: '/opt/conda/bin/hhblits')
  --hhsearch_binary_path: Path to the HHsearch executable.
    (default: '/opt/conda/bin/hhsearch')
  --hmmbuild_binary_path: Path to the hmmbuild executable.
    (default: '/usr/bin/hmmbuild')
  --hmmsearch_binary_path: Path to the hmmsearch executable.
    (default: '/usr/bin/hmmsearch')
  --is_prokaryote_list: Optional for multimer system, not used by the single chain system. This list should contain a boolean for each fasta specifying true where the target complex is from a prokaryote, and false where it is not, or where the origin is unknown. These values determine the pairing
    method for the MSA.
    (a comma separated list)
  --jackhmmer_binary_path: Path to the JackHMMER executable.
    (default: '/usr/bin/jackhmmer')
  --kalign_binary_path: Path to the Kalign executable.
    (default: '/usr/bin/kalign')
  --max_template_date: Maximum template release date to consider. Important if folding historical test sets.
  --mgnify_database_path: Path to the MGnify database for use by JackHMMER.
  --model_preset: <monomer|monomer_casp14|monomer_ptm|multimer>: Choose preset model configuration - the monomer model, the monomer model with extra ensembling, monomer model with pTM head, or multimer model
    (default: 'monomer')
  --obsolete_pdbs_path: Path to file containing a mapping from obsolete PDB IDs to the PDB IDs of their replacements.
  --output_dir: Path to a directory that will store the results.
  --pdb70_database_path: Path to the PDB70 database for use by HHsearch.
  --pdb_seqres_database_path: Path to the PDB seqres database for use by hmmsearch.
  --random_seed: The random seed for the data pipeline. By default, this is randomly generated. Note that even if this is set, Alphafold may still not be deterministic, because processes like GPU inference are nondeterministic.
    (an integer)
  --small_bfd_database_path: Path to the small version of BFD used with the "reduced_dbs" preset.
  --template_mmcif_dir: Path to a directory with template mmCIF structures, each named <pdb_id>.cif
  --uniclust30_database_path: Path to the Uniclust30 database for use by HHblits.
  --uniprot_database_path: Path to the Uniprot database for use by JackHMMer.
  --uniref90_database_path: Path to the Uniref90 database for use by JackHMMER.
  --[no]use_precomputed_msas: Whether to read MSAs that have been written to disk. WARNING: This will not check if the sequence, database or configuration have changed.
    (default: 'false')

Try --helpfull to get a list of all flags.

To run the container - here we are using a GPU, so the --nv flag must be used to make the GPU visible inside the container:

module load singularity

singularity run --nv /dcsrsoft/singularity/containers/alphafold-v2.1.1.sif <OPTIONS>


Helper Scripts

In order to make life simpler there is a wrapper script: run_alphafold_2.2.0.py - this can be found at: 

/dcsrsoft/singularity/containers/run_alphafold_2.2.0.py

Please copy it to your working directory 

$ python3 run_alphafold_2.2.0.py -h

usage: run_alphafold_2.2.0.py [-h] --fasta-paths FASTA_PATHS [FASTA_PATHS ...] [--max-template-date MAX_TEMPLATE_DATE] [--db-preset {reduced_dbs,full_dbs}] [--model-preset {monomer,monomer_casp14,monomer_ptm,multimer}] [--num-multimer-predictions-per-model NUM_MULTIMER_PREDICTIONS_PER_MODEL] [--benchmark]
                              [--use-precomputed-msas] [--data-dir DATA_DIR] [--docker-image DOCKER_IMAGE] [--output-dir OUTPUT_DIR] [--use-gpu] [--run-relax] [--enable-gpu-relax] [--gpu-devices GPU_DEVICES] [--cpus CPUS]

Singularity launch script for Alphafold v2.2.0

optional arguments:
  -h, --help            show this help message and exit
  --fasta-paths FASTA_PATHS [FASTA_PATHS ...], -f FASTA_PATHS [FASTA_PATHS ...]
                        Paths to FASTA files, each containing one sequence. All FASTA paths must have a unique basename as the basename is used to name the output directories for each prediction.
  --max-template-date MAX_TEMPLATE_DATE, -t MAX_TEMPLATE_DATE
                        Maximum template release date to consider (ISO-8601 format - i.e. YYYY-MM-DD). Important if folding historical test sets.
  --db-preset {reduced_dbs,full_dbs}
                        Choose preset model configuration - no ensembling with uniref90 + bfd + uniclust30 (full_dbs), or 8 model ensemblings with uniref90 + bfd + uniclust30 (casp14).
  --model-preset {monomer,monomer_casp14,monomer_ptm,multimer}
                        Choose preset model configuration - the monomer model, the monomer model with extra ensembling, monomer model with pTM head, or multimer model
  --num-multimer-predictions-per-model NUM_MULTIMER_PREDICTIONS_PER_MODEL
                        How many predictions (each with a different random seed) will be generated per model. E.g. if this is 2 and there are 5 models then there will be 10 predictions per input. Note: this FLAG only applies if model_preset=multimer
  --benchmark, -b       Run multiple JAX model evaluations to obtain a timing that excludes the compilation time, which should be more indicative of the time required for inferencing many proteins.
  --use-precomputed-msas
                        Whether to read MSAs that have been written to disk instead of running the MSA tools. The MSA files are looked up in the output directory, so it must stay the same between multiple runs that are to reuse the MSAs. WARNING: This will not check if the sequence, database or configuration
                        have changed.
  --data-dir DATA_DIR, -d DATA_DIR
                        Path to directory with supporting data: AlphaFold parameters and genetic and template databases. Set to the target of download_all_databases.sh.
  --docker-image DOCKER_IMAGE
                        Alphafold docker image.
  --output-dir OUTPUT_DIR, -o OUTPUT_DIR
                        Output directory for results.
  --use-gpu             Enable NVIDIA runtime to run with GPUs.
  --run-relax           Whether to run the final relaxation step on the predicted models. Turning relax off might result in predictions with distracting stereochemical violations but might help in case you are having issues with the relaxation stage.
  --enable-gpu-relax    Run relax on GPU if GPU is enabled.
  --gpu-devices GPU_DEVICES
                        Comma separated list of devices to pass to NVIDIA_VISIBLE_DEVICES.
  --cpus CPUS, -c CPUS  Number of CPUs to use.


An example batch script using the helper script is:

#!/bin/bash

#SBATCH -N 1
#SBATCH -n 1
#SBATCH -c 24
#SBATCH -p gpu
#SBATCH --gres=gpu:1
#SBATCH --gres-flags=enforce-binding
#SBATCH --mem 200G
#SBATCH -t 6:00:00

module purge
module load singularity

export SINGULARITY_BINDPATH="/scratch,/dcsrsoft,/users,/work,/reference"

./run_alphafold_2.2.0.py --data-dir /reference/alphafold/20220414 --cpus 24 --use-gpu --fasta-paths ./T1024.fasta --output-dir /scratch/ulambda/alphafold/runtest


Alphafold without containers


Fans of Conda may also wish to check out https://github.com/kalininalab/alphafold_non_docker.  Just make sure to module load gcc miniconda3 rather than following the exact procedure!



R on the clusters

R is provided via the DCSR software stack

Interactive mode

To load R:

module load gcc r
R
# Then you can use R interactively
> ...

Batch mode

To use R in batch mode, you have to use Rscript to launch your script. Here is an example sbatch script, run_r.sh:

#!/bin/bash

#SBATCH --time 00-00:20:00
#SBATCH --nodes 1
#SBATCH --ntasks 1
#SBATCH --cpus-per-task 1
#SBATCH --mem 4G

module load gcc r

Rscript my_r_script.R

Then, just submit the job to Slurm:

sbatch run_r.sh

Package installation

A number of core packages are installed centrally - you can see what is available by using the library() function. Given the number of packages and the multiple versions available, other packages should be installed by the user.

Installing R packages is pretty straightforward thanks to the install.packages() function. However, be careful, since it might fill your home directory very quickly. For big packages with a large number of dependencies, like adegenet for instance, you will probably reach the quota before the end of the installation. Here is a solution to mitigate that problem:

rm -rf $HOME/R
mkdir -p /work/FAC/FBM/DEE/my_py/default/jdoe/R
cd $HOME
ln -s /work/FAC/FBM/DEE/my_py/default/jdoe/R

Handling dependencies

Sometimes R packages depend on external libraries. In most cases the library is already installed on the cluster; you just need to load the corresponding module before trying to install the package from the R session.

If the package installation is still failing, you need to define the following variables. For example, if the package depends on the gsl and mpfr libraries, we need to do the following:

module load gsl mpfr
export CPATH=$GSL_ROOT/include:$MPFR_ROOT/include
export LIBRARY_PATH=$GSL_ROOT/lib:$MPFR_ROOT/lib

Setting up an alternate personal library

If you want to set up an alternate location where to install R packages, you can proceed as follows:

mkdir -p ~/R/my_personal_lib2

# If you already have a ~/.Renviron file, make a backup
cp -iv ~/.Renviron ~/.Renviron_backup                  

echo 'R_LIBS_USER=~/R/my_personal_lib2' > ~/.Renviron

Then relaunch R. Packages will then be installed under ~/R/my_personal_lib2.

Software local installation

This page gives an example of a local installation of software, i.e. software that will only be available to you. For simplicity, we assume here that the software you want to install is available as a single binary file.

To be executable from anywhere, the binary must be placed in a directory contained in your PATH environment variable. Here we use a directory called "bin" in your home directory:

$ mkdir ~/bin

Then, edit your ~/.bashrc file to add the newly created directory to your search path by adding this line:

export PATH=~/bin:$PATH

Then reload your .bashrc to take into account this change:

$ source ~/.bashrc

Now, you can simply copy your binary to ~/bin and it will be available from anywhere for execution:

$ cp /path/to/downloaded/my_binary ~/bin

Finally, make sure your binary is executable:

$ chmod +x ~/bin/my_binary

 

 

 

DCSR GitLab service

What is it?

The DCSR hosted version control service (https://gitlab.dcsr.unil.ch) is primarily intended for the users of the "sensitive" data clusters which do not have direct internet access. It is not an official UNIL wide version control service!

It is accessible from both the sensitive data services and the UNIL network. From outside the UNIL network a VPN connection is required. It is open to all registered users of the DCSR facilities and is hosted on reliable hardware.

Should I use it?

If you are a user of the sensitive data clusters/services then the answer is yes.

For other users it may well be more convenient to use internet accessible services such as c4science.ch or GitHub.com as these allow for external collaborations and do not require VPN access or an account on the DCSR systems.

Running Busco

A Singularity container is available for version 4.0.6 of Busco. To run it, you need to proceed as follows:

$ module load singularity
$ export SINGULARITY_BINDPATH="/scratch,/users,/work"

Some configuration files included in the container must be copied to a writable location, so create a directory in your /scratch space, e.g. called "busco_config":

$ mkdir /path/to/busco_config

Then copy the configuration files out of the container to the newly created directory:

$ singularity exec /dcsrsoft/singularity/containers/busco-4.0.6 cp -rv /opt/miniconda/config/. /path/to/busco_config

Now we need to set the AUGUSTUS_CONFIG_PATH environment variable to the newly created and populated busco_config directory:

$ export AUGUSTUS_CONFIG_PATH=/path/to/busco_config

Finally, you should now be able to run a test dataset from busco (see https://gitlab.com/ezlab/busco/-/tree/master/test_data/eukaryota):

$ curl -O https://gitlab.com/ezlab/busco/-/raw/master/test_data/eukaryota/genome.fna

And launch the analysis. 
Note: in $AUGUSTUS_CONFIG_PATH you have a copy of the default config.ini used here, so you can copy it, modify it and pass it to the --config option of the following command:

$ singularity exec /dcsrsoft/singularity/containers/busco-4.0.6 busco --config /opt/miniconda/config/config.ini -i genome.fna -c 8 -m geno -f --out test_eukaryota

Then download the reference log:

curl -O https://gitlab.com/ezlab/busco/-/raw/master/test_data/eukaryota/expected_log.txt

And compare to the one you generated.
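If you need a customised BUSCO configuration, one way to use the copy mentioned in the note above is (a sketch; the paths are placeholders):

cp $AUGUSTUS_CONFIG_PATH/config.ini /path/to/busco_config/my_config.ini
# ... edit my_config.ini as required, then point --config at it:
singularity exec /dcsrsoft/singularity/containers/busco-4.0.6 busco --config /path/to/busco_config/my_config.ini -i genome.fna -c 8 -m geno -f --out test_eukaryota_custom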

Offline installation on Jura

Installing new software on Jura is complicated because the cluster does not have Internet access. This page covers the installation of R, BioConductor, and Python packages.

R packages

For packages in CRAN

Since the Jura cluster is not connected to the Internet, it is not possible to install R packages directly. A local CRAN mirror has been deployed to ease the installation; it can be used as follows:

module load gcc r
R
>install.packages(c('dplyr','ggplot2','cluster'), repos='http://mirror.dcsr.unil.ch/cran/')

For packages not in CRAN

For packages not available in the local CRAN mirror, you will have to go through the following procedure:

STEP 1: download the package on a machine connected to the internet, e.g.:

wget http://cnsgenomics.com/software/gsmr/static/gsmr_1.0.9.tar.gz

STEP 2: transfer the package to Jura using the standard procedure detailed here

STEP 3: log into Jura

STEP 4: launch R

module load gcc r
R

STEP 5: install the package with:

> install.packages("/path/to/gsmr_1.0.9.tar.gz", repos = NULL, type="source")

For packages with the source on a Git server

For packages provided on a Git server, you will have to first build a package to transfer to Jura before you can proceed with the installation of the package per se. From a machine connected to the internet:

STEP 1: clone the Git repository in directory hereinafter referred to as "/path/to"

git clone https://github.com/jean997/causeSims.git

STEP 2: launch R

module load gcc r
R

STEP 3: build the R package

> require("devtools")
> build("causeSims")

which should output something along these lines:

> build("causeSims")
✔ checking for file ‘/path/to/causeSims/DESCRIPTION’ ...
─ preparing ‘causeSims’:
✔ checking DESCRIPTION meta-information ...
─ checking for LF line-endings in source and make files and shell scripts
─ checking for empty or unneeded directories
─ building ‘causeSims_0.1.0.tar.gz’
  Warning: invalid uid value replaced by that for user 'nobody'
  
[1] "/path/to/causeSims_0.1.0.tar.gz"

Then continue with STEPS 2, 3, 4 & 5 of paragraph "For packages not in CRAN" above.


BioConductor packages

BioConductor is a project that provides bioinformatics packages for R. A local BioConductor mirror has been deployed to ease the installation.

First, you have to define a `~/.Rprofile` file with, at least, the following content:

options(
    BioC_mirror = "http://mirror.dcsr.unil.ch/bioconductor",
    repos = "http://mirror.dcsr.unil.ch/cran"
)
options(
    BIOCONDUCTOR_ONLINE_VERSION_DIAGNOSIS = FALSE
)

 

It is very important to have an empty new line at the end of the file!

Then you can launch R:

module load gcc r
R

And install the BioConductor package manager:

> install.packages("http://mirror.dcsr.unil.ch/cran/src/contrib/Archive/BiocManager/BiocManager_1.30.10.tar.gz", repos=NULL, type="source")
> install.packages("http://mirror.dcsr.unil.ch/bioconductor/BiocVersion_3.12.0.tar.gz", repos=NULL, type="source")
> library(BiocManager)

The first step might ask if you want to use a personal library; you can answer yes to both questions.

Finally, you can install the BioConductor packages, for instance edgeR:

> BiocManager::install("edgeR")

At the end of the installation, R might ask you if you want to update the BiocManager package. Please don't, since the newer version does not work with the installed version of R.

Python packages

Thankfully Python packages are somewhat easier to deal with - here we use PyTorch as an example

First, on a system that has internet access, use pip3 download:

mkdir torch

cd torch

pip3 download torch torchvision
Collecting torch
  Downloading https://files.pythonhosted.org/packages/76/58/668ffb25215b3f8231a550a227be7f905f514859c70a65ca59d28f9b7f60/torch-1.5.0-cp37-cp37m-manylinux1_x86_64.whl (752.0MB)
     |████████████████████████████████| 752.0MB 33kB/s 
  Saved ./torch-1.5.0-cp37-cp37m-manylinux1_x86_64.whl
Collecting torchvision
  Downloading https://files.pythonhosted.org/packages/7b/ed/a894f274a7733d6492e438a5831a95b507c5ec777edf6d8c3b97574e08c4/torchvision-0.6.0-cp37-cp37m-manylinux1_x86_64.whl (6.6MB)
     |████████████████████████████████| 6.6MB 15.4MB/s 
  Saved ./torchvision-0.6.0-cp37-cp37m-manylinux1_x86_64.whl
Collecting numpy
  Using cached https://files.pythonhosted.org/packages/1f/df/7988fbbdc8c9b8efb575029498ad84b77e023a3e4623e85068823a102b1d/numpy-1.18.4-cp37-cp37m-manylinux1_x86_64.whl
  Saved ./numpy-1.18.4-cp37-cp37m-manylinux1_x86_64.whl
Collecting future
  Downloading https://files.pythonhosted.org/packages/45/0b/38b06fd9b92dc2b68d58b75f900e97884c45bedd2ff83203d933cf5851c9/future-0.18.2.tar.gz (829kB)
     |████████████████████████████████| 829kB 23.1MB/s 
  Saved ./future-0.18.2.tar.gz
Collecting pillow>=4.1.1
  Downloading https://files.pythonhosted.org/packages/ab/f8/d3627cc230270a6a4eedee32974fbc8cb26c5fdb8710dd5ea70133640022/Pillow-7.1.2-cp37-cp37m-manylinux1_x86_64.whl (2.1MB)
     |████████████████████████████████| 2.1MB 15.3MB/s 
  Saved ./Pillow-7.1.2-cp37-cp37m-manylinux1_x86_64.whl
Successfully downloaded torch torchvision numpy future pillow

This will download all the required files which can then be copied to the system without internet access. 

Then we can move the torch directory to Jura using the SFTP server.

Finally, we can use pip / pip3 to install the packages from the downloaded files (the torch directory).

$ pip install --user --no-index --find-links=torch torch torchvision

Please be aware that some Python packages take up a lot of space and you may wish to set a non-standard installation directory via the --target option of pip - see the pip documentation for full details.
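For example, a sketch of installing the downloaded files into a project directory on /work (the path is a placeholder) and making the packages visible to Python; note that --target cannot be combined with --user, so the latter is dropped here:

pip install --no-index --find-links=torch --target=/work/path/to/my/project/python-libs torch torchvision
export PYTHONPATH=/work/path/to/my/project/python-libs:$PYTHONPATH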

SWITCHfilesender from the cluster

Switch Filesender

Filesender is a service provided by SWITCH to transfer files over HTTP. Normally files are uploaded via a web browser, but this is not possible from the DCSR clusters.

In order to avoid having to transfer the files to your local computer first, it is possible to use the Filesender command line tools as explained below.

 

Configuring the CLI tools

 

Connect to https://filesender.switch.ch, then go to the Profile tab.

[Screenshot: Filesender profile tab]

Then click on "Create API secret" to generate a code that will be used to allow you to authenticate

[Screenshot: "Create API secret" button]

This will generate a long string like

ab56bf28434d1fba1d5f6g3aaf8776e55fd722df205197

This code should never be shared 

Then connect to Curnagl and run the following commands to download the CLI tool and the configuration:

cd

mkdir ~/.filesender

wget https://filesender.switch.ch/clidownload.php -O filesender.py

wget https://filesender.switch.ch/clidownload.php?config=1 -O ~/.filesender/filesender.py.ini

You will then need to edit the ~/.filesender/filesender.py.ini file using your preferred tool

You need to enter your username as shown in the Filesender profile, and the API key that you generated.

Note that at present, unlike other SWITCH services, this is not your EduID account!

[system]
base_url = https://filesender.switch.ch/filesender2/rest.php
default_transfer_days_valid = 20

[user]
username = Ursula.Lambda@unil.ch
apikey = ab56bf28434d1fba1d5f6g3aaf8776e55fd722df205197

 

Transferring files

 

Now that we have done this, we can transfer files - note that the modules must be loaded in order to have a Python with the required libraries.

[ulambda@login ~]$ module load gcc python

[ulambda@login ~]$ python3 filesender.py -p -r ethz.collaborator@protonmail.ch results.zip 

Uploading: /users/ulambda/results.zip 0-5242880 0%
Uploading: /users/ulambda/results.zip 5242880-10485760 6%
Uploading: /users/ulambda/results.zip 10485760-15728640 11%
Uploading: /users/ulambda/results.zip 15728640-20971520 17%
Uploading: /users/ulambda/results.zip 20971520-26214400 23%
Uploading: /users/ulambda/results.zip 26214400-31457280 29%
Uploading: /users/ulambda/results.zip 31457280-36700160 34%
Uploading: /users/ulambda/results.zip 36700160-41943040 40%
Uploading: /users/ulambda/results.zip 41943040-47185920 46%
Uploading: /users/ulambda/results.zip 47185920-52428800 52%
Uploading: /users/ulambda/results.zip 52428800-57671680 57%
Uploading: /users/ulambda/results.zip 57671680-62914560 63%
Uploading: /users/ulambda/results.zip 62914560-68157440 69%
Uploading: /users/ulambda/results.zip 68157440-73400320 74%
Uploading: /users/ulambda/results.zip 73400320-78643200 80%
Uploading: /users/ulambda/results.zip 78643200-83886080 86%
Uploading: /users/ulambda/results.zip 83886080-89128960 92%
Uploading: /users/ulambda/results.zip 89128960-91575794 97%
Uploading: /users/ulambda/results.zip 91575794 100%

An email will be sent to ethz.collaborator@protonmail.ch, who can then download the file.

 

Filetransfer from the cluster

filetransfer.dcsr.unil.ch

https://filetransfer.dcsr.unil.ch is a service provided by the DCSR to allow you to transfer files to and from external collaborators.

This is an alternative to SWITCHFileSender and the space available is 6TB with a maximum per user limit of 4TB - this space is shared between all users so it is unlikely that you will be able to transfer 4TB of data at once.

The filetransfer service is based on LiquidFiles and the user guide  is available at https://man.liquidfiles.com/userguide.html

In order to transfer files to and from the DCSR clusters without using the web browser it is also possible to use the CLI tools as explained below


Configuring the service


First you need to go to the web interface at https://filetransfer.dcsr.unil.ch and log in using your UNIL username (e.g. ulambda for Ursula Lambda) and password. This is not your EduID password but rather the one you use to connect to the clusters.

Once connected, go to Settings (the cog symbol in the top right corner), then the API tab.


[Screenshot: API tab in the filetransfer settings]

The API key is how you authenticate from the clusters and this secret should never be shared. It can be reset via the yellow button.


Transferring files from the cluster


Connect to the login node and load the liquidfiles module

[ulambda@login ~]$ module load liquidfiles

[ulambda@login ~]$ liquidfiles 
Usage:
	liquidfiles <command> <command_args>

Valid commands are:
	attach                 Uploads given files to server.
	attach_chunk           Uploads given chunk of file to server.
	delete_attachments     Deletes the given attachments.
	delete_filelink        Deletes the given filelink.
	download               Download given files.
	file_request           Sends the file request to specified user.
	filedrop               Sends the file(s) by filedrop.
	filelink               Uploads given file and creates filelink on it.
	filelinks              Lists the available filelinks.
	get_api_key            Retrieves api key for the specified user.
	messages               Lists the available messages.
	send                   Sends the file(s) to specified user.

Type 'liquidfiles help <command_name>' to see command specific options and usage.

Abnormal exit codes:
	1     Command line arguments are invalid - Invalid command name, missing required argument, invalid value for specific argument.
	2     CURL error - Can't connect to host, connection timeout, certificate check failure, etc.
	3     Error during file upload - Invalid API key, Invalid filename, etc.
	4     Error during file send to user.
	5     Error in file system - Can't open file, etc.

For example, you can upload a file and create a filelink.
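A sketch of such an upload, assuming the filelink command accepts the same --server and --api_key options as the attach_chunk example further down (run liquidfiles help filelink for the exact syntax; the API key is a placeholder):

module load liquidfiles
liquidfiles filelink --server=https://filetransfer.dcsr.unil.ch --api_key=<YOUR_API_KEY> results.zip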

 

You can then connect to the web interface from your workstation to manage the files and send messages as required.

As preparing and uploading files can take a while, we recommend performing this in a tmux session, which means that even if your connection to the cluster is lost, the process continues and you can reconnect later.
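For example, to run the upload inside a tmux session:

tmux new -s transfer      # start a new session called "transfer" and run the upload in it
# detach with Ctrl-b then d; if the connection drops, log back in and reattach:
tmux attach -t transfer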

 

Transferring large files


If a single file upload doesn't work and it is not possible to split the data into multiple smaller files, then the following information may be useful.


Staging the files

We recommend that you create TAR files containing the data you wish to transfer and stage them in your /scratch space. Depending on the data type, it can be useful to compress them first.

$ cd /scratch/ulambda
$ mkdir mytransfer
$ cd mytransfer
$ tar -cvf mydata.tar /work/path/to/my/data

Then calculate the checksum of the file to be transferred:

$ sha256sum mydata.tar
7aac249b9ec0835361f44c84921a194e587a38daecadf302e9dec44386c9fb36  mydata.tar
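The recipient can then verify the downloaded file against this checksum, for example:

echo "7aac249b9ec0835361f44c84921a194e587a38daecadf302e9dec44386c9fb36  mydata.tar" | sha256sum -c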

Split the file and transfer chunks

Whilst it might be possible to transfer huge files in a single upload, this isn't recommended; above ~100 GB, please follow the procedure given below.

Split the file into chunks

$ split --verbose -d -a4 -b1G mydata.tar
creating file 'x0000'
creating file 'x0001'
creating file 'x0002'
creating file 'x0003'
..
..
creating file 'x0102'

In the staging directory this will create files of 1 GB in size (the last chunk being smaller) - here Ursula's file is 102.5 GB so there are 103 chunks.

Use a loop and the attach_chunk command

First we need to know how many files there are

$ ls x* | wc -l
103

This is because we need to tell the service how many chunks the file has been split into so it knows when the upload is complete.

Now we note our API key and use the following bash loop (this can also be put in a script). 

$ module load liquidfiles

$ for a in `seq -w 0 102`; do liquidfiles attach_chunk --server=https://filetransfer.dcsr.unil.ch --api_key=9MUQeF5nG899lHdCtg --chunk=$a --chunks=103 --filename=mydata.tar x0$a; done

Uploading chunk 'x0000'.
100% [================================================================================]
Current chunk uploaded successfully.
Uploading chunk 'x0001'.
100% [================================================================================]
Current chunk uploaded successfully.
..
Uploading chunk 'x0102'.
100% [================================================================================]

All chunks of file uploaded successfully. ID: FP0LAQ9FGFAosPNioe6ZyQ

Alternatively, we can use variables, which makes the loop cleaner and easier to put in a script:

module load liquidfiles

SERVER=https://filetransfer.dcsr.unil.ch
KEY=9MUQeF5nG899lHdCtg
CHUNKS=103
MYFILE=mydata.tar
NC=`expr $CHUNKS - 1`

for a in `seq -w 0 $NC`; do liquidfiles attach_chunk --server=$SERVER --api_key=$KEY --chunk=$a --chunks=$CHUNKS --filename=$MYFILE x0$a; done

A shell script that does the same thing is:

#!/bin/bash

for a in `seq -w 0 102`; do
    liquidfiles attach_chunk --server=https://filetransfer.dcsr.unil.ch --api_key=9MUQeF5nG899lHdCtg --chunk=$a --chunks=103 --filename=mydata.tar x0$a
done

Once all the chunks are uploaded the file will be assembled/processed and after a short while it will be visible in the web interface.

Here we see a previously uploaded file of 304 GB called my file.ffdata

[Screenshot: the uploaded file listed in the web interface]


Cleaning up

Once the file is uploaded please don't forget to clean up the TAR file and the chunks.

$ cd /scratch/ulambda/mytransfer
$ rm *
$ cd ..
$ rmdir mytransfer



CryoSPARC

First of all, if you plan to use CryoSPARC on the cluster, please contact us to get a port number (you will understand later why it's important).

CryoSPARC can be used on Curnagl and benefits from the Nvidia A100 GPUs. This page presents an installation in the /work storage location so that it can be shared among the members of the same project. The purpose is to help you with the installation, but in case of problems, don't hesitate to consult the official documentation.

1. Get a license

A free license can be obtained for non-commercial use from Structura Biotechnology.

You will receive an email containing your license ID. It is similar to:
235e3142-d2b0-17eb-c43a-9c2461c1234d

2. Prerequisites

Before starting the installation we suppose that:

  - you have received your license ID (in this example 235e3142-d2b0-17eb-c43a-9c2461c1234d)
  - you have chosen an installation directory in your project space (in this example /work/FAC/FBM/DMF/ulambda/cryosparc)
  - the DCSR has given you a port number (in this example 45678)

Obviously you must not use these example values; they must be adapted to your own situation.

3. Install CryoSPARC

First, connect to the Curnagl login node using your favourite SSH client and follow the next steps.

Define the 3 prerequisite variables:

export LICENSE_ID="235e3142-d2b0-17eb-c43a-9c2461c1234d"
export CRYOSPARC_ROOT=/work/FAC/FBM/DMF/ulambda/cryosparc
export CRYOSPARC_PORT=45678

Create some directories and download the packages

mkdir -p $CRYOSPARC_ROOT
mkdir -p $CRYOSPARC_ROOT/database
mkdir -p $CRYOSPARC_ROOT/scratch
mkdir -p $CRYOSPARC_ROOT/curnagl_config
cd $CRYOSPARC_ROOT
curl -L https://get.cryosparc.com/download/master-latest/$LICENSE_ID -o cryosparc_master.tar.gz
curl -L https://get.cryosparc.com/download/worker-latest/$LICENSE_ID -o cryosparc_worker.tar.gz
tar xf cryosparc_master.tar.gz
tar xf cryosparc_worker.tar.gz

Create $CRYOSPARC_ROOT/curnagl_config/cluster_info.json 

Use your favourite editor to fill the file with the following content:

{
"qdel_cmd_tpl": "scancel {{ cluster_job_id }}",
"worker_bin_path": "/work/FAC/FBM/DMF/ulambda/cryosparc/cryosparc_worker/bin/cryosparcw",
"title": "curnagl",
"cache_path": "/work/FAC/FBM/DMF/ulambda/cryosparc/scratch",
"qinfo_cmd_tpl": "sinfo --format='%.8N %.6D %.10P %.6T %.14C %.5c %.6z %.7m %.7G %.9d %20E'",
"qsub_cmd_tpl": "sbatch {{ script_path_abs }}",
"qstat_cmd_tpl": "squeue -j {{ cluster_job_id }}",
"cache_quota_mb": 1000000,
"send_cmd_tpl": "{{ command }}",
"cache_reserve_mb": 10000,
"name": "curnagl"
}

Pay attention to the worker_bin_path and cache_path entries: they must be adapted to your setup. cache_reserve_mb and cache_quota_mb might also have to be modified, depending on your needs.

Create $CRYOSPARC_ROOT/curnagl_config/cluster_script.sh

Use your favourite editor to fill the file with the following content:

#!/bin/bash
#SBATCH --job-name=cryosparc_{{ project_uid }}_{{ job_uid }}
#SBATCH --partition=gpu
#SBATCH --time=4:00:00
#SBATCH --output={{ job_log_path_abs }}
#SBATCH --error={{ job_log_path_abs }}
#SBATCH --nodes=1
#SBATCH --mem={{ (ram_gb*1024*4)|int }}M
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task={{ num_cpu }}
#SBATCH --gres=gpu:{{ num_gpu }}
#SBATCH --gres-flags=enforce-binding

module load gcc cuda

available_devs=""
for devidx in $(seq 0 15);    # GPU device indices start at 0
do
    if [[ -z $(nvidia-smi -i $devidx --query-compute-apps=pid --format=csv,noheader) ]] ; then
        if [[ -z "$available_devs" ]] ; then
            available_devs=$devidx
        else
            available_devs=$available_devs,$devidx
        fi
    fi
done
export CUDA_VISIBLE_DEVICES=$available_devs

srun {{ run_cmd }}

Install CryoSPARC master

cd $CRYOSPARC_ROOT/cryosparc_master
./install.sh --license $LICENSE_ID --hostname curnagl --dbpath $CRYOSPARC_ROOT/database --port $CRYOSPARC_PORT

At the end of the installation process, the installer asks you if you want to modify your ~/.bashrc file, please answer yes.

Start CryoSPARC and create a user

export PATH=$CRYOSPARC_ROOT/cryosparc_master/bin:$PATH
cryosparcm start
cryosparcm createuser --email "ursula.lambda@unil.ch" --password "ursulabestpassword" --username "ulambda" --firstname "Ursula" --lastname "Lambda"

Of course, when creating the user, you have to use your own information; the password should not be your UNIL password.

Install CryoSPARC worker

First you have to connect to a GPU node:

Sinteractive -G1 -m8G

Once you are connected to the node:

export LICENSE_ID="235e3142-d2b0-17eb-c43a-9c2461c1234d"
export CRYOSPARC_ROOT=/work/FAC/FBM/DMF/ulambda/cryosparc
module load gcc cuda
cd $CRYOSPARC_ROOT/cryosparc_worker
./install.sh --license $LICENSE_ID --cudapath $CUDA_HOME

At the end of the process, you can logout.

Configure the cluster workers

cd $CRYOSPARC_ROOT/curnagl_config
cryosparcm cluster connect

4. Connection to the web interface

You have to create a tunnel from your laptop to the Curnagl login node:

ssh -N -L 8080:localhost:45678 ulambda@curnagl.dcsr.unil.ch

Please note that the port 45678 must be modified according to the one that DCSR gave you, and ulambda must be replaced with your UNIL login.

Then you can open a web browser at the following address: http://localhost:8080.

[Screenshot: the CryoSPARC login page]

Here you have to use the credentials defined when you created a user.

5. Working with CryoSPARC

When you start working with CryoSPARC on Curnagl, you have to start it from the login node:

cryosparcm start

When you have finished, you should stop CryoSPARC in order to avoid wasting resources on the Curnagl login node:

cryosparcm stop
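
If you are not sure whether an instance is already running, the cryosparcm tool also provides a status subcommand which summarises the state of the master processes:

cryosparcm status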

Sandbox containers

Container basics


For how to use Singularity/Apptainer containers please see our course at: http://dcsrs-courses.ad.unil.ch/courses/r_python_singularity/


Sandboxes

A container image (the .sif file) is read-only and its contents cannot be changed, which makes it perfect for distribution, safe in the knowledge that it should run exactly as it was created.

Sometimes, especially when developing things, it's very useful to be able to interactively modify a container and this is what sandboxes are for.

Please be aware that anything done by hand is not reproducible so all steps should be transferred to the container definition file.


Creating and modifying a sandbox


Note that the steps here should be run on the cluster login node (curnagl.dcsr.unil.ch) as it is currently the only machine with the configuration in place to allow containers to be built.

To start you need a basic definition file - this can be an empty OS or something more complicated that already has some configuration.

In the following example we will use a definition that installs the latest version of R. We will then try and install extra packages before creating the immutable SIF image.


Here's our file which we save as newR.def

BootStrap: docker
From: ubuntu:20.04

%post
  apt update
  apt install -y locales gnupg-agent wget   # wget is needed below to fetch the CRAN signing key
  sed -i '/^#.* en_.*.UTF-8 /s/^#//' /etc/locale.gen
  sed -i '/^#.* fr_.*.UTF-8 /s/^#//' /etc/locale.gen
  locale-gen

  # install two helper packages we need
  apt install -y --no-install-recommends software-properties-common dirmngr

  # add the signing key (by Michael Rutter) for these repos
  wget -qO- https://cloud.r-project.org/bin/linux/ubuntu/marutter_pubkey.asc | tee -a /etc/apt/trusted.gpg.d/cran_ubuntu_key.asc
  
  apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys 51716619E084DAB9

  # add the R 4.0 repo from CRAN -- adjust 'focal' to 'groovy' or 'bionic' as needed
  add-apt-repository "deb https://cloud.r-project.org/bin/linux/ubuntu $(lsb_release -cs)-cran40/"

  apt install -y --no-install-recommends r-base

Create the sandbox

Change to your scratch space /scratch/username and:

$ module load singularity

$ singularity build --fakeroot --sandbox newR newR.def

WARNING: The underlying filesystem on which resides "/scratch/username/newR" won't allow to set ownership, as a consequence the sandbox could not preserve image's files/directories ownerships
INFO:    Starting build...
Getting image source signatures
Copying blob d7bfe07ed847 [--------------------------------------] 0.0b / 0.0b
Copying config 2772dfba34 done  
..
..
..
Processing triggers for libc-bin (2.31-0ubuntu9.9) ...
Processing triggers for systemd (245.4-4ubuntu3.17) ...
Processing triggers for mime-support (3.64ubuntu1) ...
INFO:    Creating sandbox directory...
INFO:    Build complete: newR


This will create a directory called newR which is the writable container image. Have a look inside and see what's there!


Run and edit the image


Before running the container we need to set up the filesystems that will be visible inside - here we want /users  and /scratch to be visible

$ export SINGULARITY_BINDPATH="/users,/scratch"

$ mkdir newR/users
$ mkdir newR/scratch

Now we launch the image with an interactive shell

$ singularity shell --writable --fakeroot newR/

Singularity> 

On the command line we can then work interactively with the image.

As we are going to be installing R packages we know that we need some extra tools:

Singularity> apt-get install make gcc g++ gfortran

Now we can launch R and install some packages

Singularity> R

R version 4.2.1 (2022-06-23) -- "Funny-Looking Kid"
Copyright (C) 2022 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
..

> install.packages('tibble')
Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)
also installing the dependencies ‘glue’, ‘cli’, ‘utf8’, ‘ellipsis’, ‘fansi’, ‘lifecycle’, ‘magrittr’, ‘pillar’, ‘rlang’, ‘vctrs’

trying URL 'https://cloud.r-project.org/src/contrib/glue_1.6.2.tar.gz'
Content type 'application/x-gzip' length 106510 bytes (104 KB)
==================================================
downloaded 104 KB

..
..

** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (tibble)

Keep iterating until things are correct, but don't forget to write down all the steps and transfer them to the definition file to allow for future reproducible builds.
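
For example (a sketch based on the interactive steps above), the %post section of newR.def would gain the extra build tools and the R package installation so that the next build is fully reproducible:

  # extra build tools needed to compile R packages from source
  apt install -y make gcc g++ gfortran

  # install the R packages non-interactively
  Rscript -e "install.packages('tibble', repos='https://cloud.r-project.org')"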


Sandbox to SIF


$ singularity build --fakeroot R-4.2.1-production.sif  newR/

You will now have a SIF file that can be used in the normal way

$ singularity run R-4.2.1-production.sif R

R version 4.2.1 (2022-06-23) -- "Funny-Looking Kid"
Copyright (C) 2022 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
..

>

Remember that files on /scratch will be automatically deleted if there isn't enough free space, so save your definition files in a git repository and move the SIF images to your project space in /work.
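
For example (hypothetical project path, adapt it to your own):

$ cp R-4.2.1-production.sif /work/FAC/.../my_project/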

Course software for decision trees / random forests

In the practicals, we will use only a small dataset and will need only little computing power and memory. You can therefore do the practicals on various computing platforms. However, since the participants may use various types of computers and software, we recommend using the UNIL JupyterLab for the practicals.

If you choose to work on the UNIL JupyterLab, you do not need to prepare anything since all the necessary libraries are already installed there. In all cases, you will receive a guest username during the course, so you will be able to work on the UNIL JupyterLab.

Otherwise, if you prefer to work on your laptop or on Curnagl, please make sure you have a working installation before the day of the course, as we will be unable to provide any assistance with this on the day itself. If you have difficulties with the installation on Curnagl we can help you, so please contact us before the course at helpdesk@unil.ch with the subject: DCSR ML course.

On the other hand, if the libraries cannot be installed on your laptop, we will unfortunately not be able to help you (there are too many particular cases), so you will need to use the UNIL JupyterLab during the course.

JupyterLab

Here are some instructions for using the UNIL JupyterLab to do the practicals.

You need to be able to access the eduroam wifi with your UNIL account or via the UNIL VPN.

Go to the webpage:  https://jupyter.dcsr.unil.ch/jupyter

Enter the login and password that you have received during the course. Due to a technical issue, you may receive a warning message "Your connection is not private". This is OK. So please proceed by clicking on the advanced button and then on "Proceed to dcsrs-jupyter.ad.unil.ch (unsafe)".

Python

Click on the "ML" square button in the Notebook panel.

Copy / paste the commands from the html practical file to the Jupyter Notebook.  

To execute a command, click on "Run the selected cells and advance" (the right arrow), or SHIFT + RETURN.

When you have finished the practicals, select File / Log out.

R

Click on the "ML R" square button in the Notebook panel.

Copy / paste the commands from the html practical file to the Jupyter Notebook.

To execute a command, click on "Run the selected cells and advance" (the right arrow), or SHIFT + RETURN.

When you have finished the practicals, select File / Log out.

Laptop

You may need to install development tools including a C and Fortran compiler (e.g. Xcode on Mac, gcc and gfortran on Linux, Visual Studio on Windows).

Python installation

Here are some instructions for installing decision tree and random forest libraries on your laptop. You need Python >= 3.7.

For Mac and Linux

We will use a terminal to install the libraries.

Let us create a virtual environment. Open  your terminal and type:

python3 -m venv mlcourse

source mlcourse/bin/activate

pip3 install scikit-learn pandas matplotlib graphviz seaborn

You can terminate the current session:

deactivate

exit

TO DO THE PRACTICALS (today or another day):

You can use any Python IDE (e.g. Jupyter Notebook or PyCharm), but you need to launch it after activating the virtual environment. For example, for Jupyter Notebook:

source mlcourse/bin/activate

pip3 install notebook

jupyter notebook
For Windows

If you do not have Python installed, you can use either Conda: https://docs.conda.io/en/latest/miniconda.html or Python official installer: https://www.python.org/downloads/windows/ 

Let us create a virtual environment. Open  your terminal and type:

C:\Users\user>python -m venv mlcourse

C:\Users\user>mlcourse\Scripts\activate.bat

(mlcourse) C:\Users\user>

(mlcourse) C:\Users\user>pip3 install scikit-learn pandas matplotlib graphviz seaborn

You can terminate the current session:

(mlcourse) C:\Users\user>deactivate

C:\Users\user>

TO DO THE PRACTICALS (today or another day):

You can use any Python IDE (e.g. Jupyter Notebook or PyCharm), but you need to launch it after activating the virtual environment. For example, for Jupyter Notebook:

C:\Users\user>mlcourse\Scripts\activate.bat

(mlcourse) C:\Users\user>pip3 install notebook

(mlcourse) C:\Users\user>jupyter notebook

Information: Use Control-C to stop this server.

R installation

Here are some instructions for installing decision tree and random forest libraries on your laptop.

You need R >= 4.0. Run R in your terminal or launch RStudio.

For Windows users, you can download R here: https://cran.r-project.org/bin/windows/base/

REMARK: The R libraries will be installed in your home directory. To allow it, you must answer yes to the questions:

Would you like to use a personal library instead? (yes/No/cancel) yes

Would you like to create a personal library to install packages into? (yes/No/cancel) yes

And select Switzerland for the CRAN mirror.

install.packages("rpart")

install.packages("rpart.plot")

install.packages("randomForest")

install.packages("tidyverse")

The installation of "tidyverse" may report some conflicts, but do not worry: you should still be able to do the practicals without problems.

You can terminate the current R session:

q()

Save workspace image? [y/n/c]: n

TO DO THE PRACTICALS (today or another day):

Simply run R in your terminal or launch RStudio.

Curnagl

For the practicals, it will be convenient to be able to copy/paste text from a web page to the terminal on Curnagl, so please make sure you can do this before the course. You also need to make sure that your terminal has an X server.

For Mac users, download and install XQuartz (X server): https://www.xquartz.org/

For Windows users, download and install the MobaXterm terminal (which includes an X server). Click on the "Installer edition" button on the following webpage: https://mobaxterm.mobatek.net/download-home-edition.html

For Linux users, you do not need to install anything.

Python installation

Here are some instructions for installing decision tree and random forest libraries on the UNIL cluster called Curnagl. Open a terminal on your laptop and type (if you are located outside the UNIL you will need to activate the UNIL VPN):

ssh -Y < my unil username >@curnagl.dcsr.unil.ch

Here and in what follows we added the brackets < > to emphasize the username, but you should not write them in the command. Enter your UNIL password.

For Windows users with the MobaXterm terminal: launch MobaXterm, click on "Start local terminal" and type the command ssh -Y < my unil username >@curnagl.dcsr.unil.ch, then enter your UNIL password. You should then be on Curnagl. Alternatively, launch MobaXterm, click on the Session icon and then on the SSH icon. Fill in: remote host = curnagl.dcsr.unil.ch, specify username = < my unil username >. Finally, click OK and enter your password. If you are asked "do you want to save password?", say No if you are not sure. You should then be on Curnagl.

See also the documentation: https://wiki.unil.ch/ci/books/high-performance-computing-hpc/page/ssh-connection-to-dcsr-cluster

cd /work/TRAINING/UNIL/CTR/rfabbret/cours_hpc/

mkdir < my unil username >

cd < my unil username >

For convenience, you will install the libraries from the login (frontal) node. Note, however, that it is normally recommended to install libraries from the interactive partition, using Sinteractive -m 4G -c 1.

module load gcc python/3.9.13

python -m venv mlcourse

source mlcourse/bin/activate

pip install scikit-learn pandas matplotlib graphviz seaborn

You can terminate the current session:

deactivate

exit

TO DO THE PRACTICALS (today or another day):

ssh -Y < my unil username >@curnagl.dcsr.unil.ch

cd /work/TRAINING/UNIL/CTR/rfabbret/cours_hpc/< my unil username >

For convenience, you will work directly on the login (frontal) node to do the practicals. Note, however, that working directly on the login node is normally not allowed, and you should use Sinteractive -m 4G -c 1.

module load gcc python/3.9.13

source mlcourse/bin/activate

python

R installation

Here are some instructions for installing decision tree and random forest libraries on the UNIL cluster called Curnagl. Open a terminal on your laptop and type (if you are located outside the UNIL you will need to activate the UNIL VPN):

ssh -Y < my unil username >@curnagl.dcsr.unil.ch

Here and in what follows we added the brackets < > to emphasize the username, but you should not write them in the command. Enter your UNIL password.

For Windows users with the MobaXterm terminal: launch MobaXterm, click on "Start local terminal" and type the command ssh -Y < my unil username >@curnagl.dcsr.unil.ch, then enter your UNIL password. You should then be on Curnagl. Alternatively, launch MobaXterm, click on the Session icon and then on the SSH icon. Fill in: remote host = curnagl.dcsr.unil.ch, specify username = < my unil username >. Finally, click OK and enter your password. If you are asked "do you want to save password?", say No if you are not sure. You should then be on Curnagl.

See also the documentation: https://wiki.unil.ch/ci/books/high-performance-computing-hpc/page/ssh-connection-to-dcsr-cluster

cd /work/TRAINING/UNIL/CTR/rfabbret/cours_hpc/

mkdir < my unil username >

cd < my unil username >

For convenience, you will install the libraries from the login (frontal) node. Note, however, that it is normally recommended to install libraries from the interactive partition, using Sinteractive -m 4G -c 1.

module load gcc python/3.9.13 r/4.2.1

R

REMARK: The R libraries will be installed in your home directory. To allow it, you must answer yes to the questions:

Would you like to use a personal library instead? (yes/No/cancel) yes

Would you like to create a personal library to install packages into? (yes/No/cancel) yes

And select Switzerland for the CRAN mirror.

install.packages("rpart")

install.packages("rpart.plot")

install.packages("randomForest")

install.packages("tidyverse")

The installation of "tidyverse" may report some conflicts, but do not worry: you should still be able to do the practicals without problems.

You can terminate the current R session:

q()

Save workspace image? [y/n/c]: n

TO DO THE PRACTICALS (today or another day):

ssh -Y < my unil username >@curnagl.dcsr.unil.ch

cd /work/TRAINING/UNIL/CTR/rfabbret/cours_hpc/< my unil username >

For convenience, you will work directly on the login (frontal) node to do the practicals. Note, however, that working directly on the login node is normally not allowed, and you should use Sinteractive -m 4G -c 1.

module load gcc python/3.9.13 r/4.2.1

R

Course software for introductory deep learning

In the practicals, we will use only a small dataset and will need only little computing power and memory. You can therefore do the practicals on various computing platforms. However, since the participants may use various types of computers and software, we recommend using the UNIL JupyterLab for the practicals.

If you choose to work on the UNIL JupyterLab, you do not need to prepare anything since all the necessary libraries are already installed there. In all cases, you will receive a guest username during the course, so you will be able to work on the UNIL JupyterLab.

Otherwise, if you prefer to work on your laptop or on Curnagl, please make sure you have a working installation before the day of the course, as we will be unable to provide any assistance with this on the day itself. If you have difficulties with the installation on Curnagl we can help you, so please contact us before the course at helpdesk@unil.ch with the subject: DCSR ML course.

On the other hand, if the libraries cannot be installed on your laptop, we will unfortunately not be able to help you (there are too many particular cases), so you will need to use the UNIL JupyterLab during the course.

JupyterLab

Here are some instructions for using the UNIL JupyterLab to do the practicals.

You need to be able to access the eduroam wifi with your UNIL account or via the UNIL VPN.

Go to the webpage:  https://jupyter.dcsr.unil.ch/jupyter

Enter the login and password that you have received during the course. Due to a technical issue, you may receive a warning message "Your connection is not private". This is OK. So please proceed by clicking on the advanced button and then on "Proceed to dcsrs-jupyter.ad.unil.ch (unsafe)".

Python

Click on the "ML" square button in the Notebook panel.

Copy / paste the commands from the html practical file to the Jupyter Notebook.  

To execute a command, click on "Run the selected cells and advance" (the right arrow), or SHIFT + RETURN.

When using TensorFlow, you may receive a warning

2022-09-22 11:01:12.232756: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2022-09-22 11:01:12.232856: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.

Do not worry: by default, TensorFlow tries to use a GPU and, since none is available, it prints a warning and falls back to the CPU (which is enough for our course).

When you have finished the practicals, select File / Log out.

R

Click on the "ML R" square button in the Notebook panel.

Copy / paste the commands from the html practical file to the Jupyter Notebook.

To execute a command, click on "Run the selected cells and advance" (the right arrow), or SHIFT + RETURN.

When using TensorFlow, you may receive a warning

2022-09-22 11:01:12.232756: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2022-09-22 11:01:12.232856: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.

Do not worry: by default, TensorFlow tries to use a GPU and, since none is available, it prints a warning and falls back to the CPU (which is enough for our course).

When you have finished the practicals, select File / Log out.

Laptop

You may need to install development tools including a C and Fortran compiler (e.g. Xcode on Mac, gcc and gfortran on Linux, Visual Studio on Windows).

Python installation

Here are some instructions for installing Keras with TensorFlow as the backend (for Python 3), and other libraries, on your laptop. You need Python >= 3.8.

For Linux

We will use a terminal to install the libraries.

Let us create a virtual environment. Open  your terminal and type:

python3 -m venv mlcourse

source mlcourse/bin/activate

pip3 install tensorflow scikit-learn scikeras eli5 pandas matplotlib notebook keras-tuner

You may need to choose the right library versions, for example tensorflow==2.12.0

To check that Tensorflow was installed:

python3 -c "import tensorflow; print(tensorflow.version.VERSION)"

There might be a warning message (see above) and the output should be something like "2.12.0".

You can terminate the current session:

deactivate

exit

TO DO THE PRACTICALS (today or another day):

You can use any Python IDE (e.g. Jupyter Notebook or PyCharm), but you need to launch it after activating the virtual environment. For example, for Jupyter Notebook:

source mlcourse/bin/activate

jupyter notebook
For Mac

We will use a terminal to install the libraries.

Let us create a virtual environment. Open  your terminal and type:

python3 -m venv mlcourse

source mlcourse/bin/activate

pip3 install tensorflow-macos==2.12.0 scikit-learn==1.2.2 scikeras eli5 pandas matplotlib notebook keras-tuner

If you receive an error message such as:

ERROR: Could not find a version that satisfies the requirement tensorflow-macos (from versions: none)
ERROR: No matching distribution found for tensorflow-macos

Then, try the following command:

SYSTEM_VERSION_COMPAT=0 pip3 install tensorflow-macos==2.12.0 scikit-learn==1.2.2 scikeras eli5 pandas matplotlib notebook keras-tuner

If you have a Mac with an M1 or more recent chip (if you are not sure, have a look at "About this Mac"), you can also install the tensorflow-metal library to accelerate training on Mac GPUs (but this is not necessary for the course):

pip3 install tensorflow-metal

To check that Tensorflow was installed:

python3 -c "import tensorflow; print(tensorflow.version.VERSION)"

There might be a warning message (see above) and the output should be something like "2.12.0".

You can terminate the current session:

deactivate

exit

TO DO THE PRACTICALS (today or another day):

You can use any Python IDE (e.g. Jupyter Notebook or PyCharm), but you need to launch it after activating the virtual environment. For example, for Jupyter Notebook:

source mlcourse/bin/activate

jupyter notebook
For Windows

If you do not have Python installed, you can use either Conda: https://docs.conda.io/en/latest/miniconda.html (see the instructions here: https://conda.io/projects/conda/en/latest/user-guide/install/windows.html) or Python official installer: https://www.python.org/downloads/windows/ 

We will use a terminal to install the libraries.

Let us create a virtual environment. Open your terminal and type:

python -m venv mlcourse

mlcourse\Scripts\activate.bat

pip3 install tensorflow scikit-learn scikeras eli5 pandas matplotlib notebook keras-tuner

You may need to choose the right library versions, for example tensorflow==2.12.0

To check that Tensorflow was installed:

python -c "import tensorflow; print(tensorflow.version.VERSION)"

There might be a warning message (see above) and the output should be something like "2.12.0".

You can terminate the current session:

deactivate

TO DO THE PRACTICALS (today or another day):

You can use any Python IDE (e.g. Jupyter Notebook or PyCharm), but you need to launch it after activating the virtual environment. For example, for Jupyter Notebook:

mlcourse\Scripts\activate.bat

jupyter notebook

R installation

Here are some instructions for installing Keras with TensorFlow as the backend, and other libraries, on your laptop. The R keras package is actually an interface to the Python Keras package. In simple terms, this means that the keras R package allows you to enjoy the benefits of R programming while having access to the capabilities of the Python Keras package.

You need R >= 4.0 and Python >= 3.8. 

REMARK: The R libraries will be installed in your home directory. To allow it, you must answer yes to the questions:

Would you like to use a personal library instead? (yes/No/cancel) yes

Would you like to create a personal library to install packages into? (yes/No/cancel) yes

And select Switzerland for the CRAN mirror.

For Mac and Linux

Run the following commands on your terminal:

cd ~

ls .virtualenvs

# Create this directory only if you receive an error message 
# saying that this directory does not exist
mkdir .virtualenvs

Then

cd ~/.virtualenvs

python3 -m venv r-reticulate

source r-reticulate/bin/activate

# For Linux
pip3 install tensorflow scikit-learn scikeras eli5 pandas matplotlib notebook keras-tuner

# For Mac
pip3 install tensorflow-macos==2.12.0 scikit-learn==1.2.2 scikeras eli5 pandas matplotlib notebook keras-tuner

deactivate

You must name the environment 'r-reticulate', as otherwise the R keras package won't be able to find it.

You may need to choose the right library versions, for example tensorflow==2.12.0

Run R in your terminal and type

install.packages("keras")

install.packages("reticulate")

install.packages("ggplot2")

install.packages("ggfortify")

To check that Keras was properly installed:

library(keras)

library(tensorflow)

is_keras_available(version = NULL)

There might be a warning message (see above) and the output should be something like "TRUE".

You can terminate the current R session:

q()

Save workspace image? [y/n/c]: n

TO DO THE PRACTICALS (today or another day):

Then you can either run R in your terminal or launch RStudio.

For Windows

If you do not have Python installed, you can use Conda: https://docs.conda.io/en/latest/miniconda.html (see the instructions here: https://conda.io/projects/conda/en/latest/user-guide/install/windows.html).

Run the following commands:

install.packages("keras")

library(keras)

library(tensorflow)

install_tensorflow(method="conda", envname="r-reticulate", version="2.9.2")

install.packages("ggplot2")
install.packages("ggfortify")

You must name the environment 'r-reticulate', as otherwise the R keras package won't be able to find it. To test that the installation is correct, type:

tf$constant("Hello Tensorflow!")

You should obtain messages such as "Loaded Tensorflow version 2.9.2" and "tf.Tensor(b'Hello Tensorflow!', shape=(), dtype=string)".

You can terminate the current R session:

q()

Save workspace image? [y/n/c]: n

TO DO THE PRACTICALS (today or another day):

Simply run R in your terminal or launch RStudio.

Curnagl

For the practicals, it will be convenient to be able to copy/paste text from a web page to the terminal on Curnagl, so please make sure you can do this before the course. You also need to make sure that your terminal has an X server.

For Mac users, download and install XQuartz (X server): https://www.xquartz.org/

For Windows users, download and install the MobaXterm terminal (which includes an X server). Click on the "Installer edition" button on the following webpage: https://mobaxterm.mobatek.net/download-home-edition.html

For Linux users, you do not need to install anything.

When testing if TensorFlow was properly installed (see below) you may receive a warning

2022-03-16 12:15:00.564218: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /dcsrsoft/spack/hetre/v1.2/spack/opt/spack/linux-rhel8-zen2/gcc-9.3.0/python-3.8.8-tb3aceqq5wzx4kr5m7s5m4kzh4kxi3ex/lib:/dcsrsoft/spack/hetre/v1.2/spack/opt/spack/linux-rhel8-zen2/gcc-9.3.0/tcl-8.6.11-aonlmtcje4sgqf6gc4d56cnp3mbbhvnj/lib:/dcsrsoft/spack/hetre/v1.2/spack/opt/spack/linux-rhel8-zen2/gcc-9.3.0/tk-8.6.11-2gb36lqwohtzopr52c62hajn4tq7sf6m/lib:/dcsrsoft/spack/hetre/v1.2/spack/opt/spack/linux-rhel8-zen/gcc-8.3.1/gcc-9.3.0-nwqdwvso3jf3fgygezygmtty6hvydale/lib64:/dcsrsoft/spack/hetre/v1.2/spack/opt/spack/linux-rhel8-zen/gcc-8.3.1/gcc-9.3.0-nwqdwvso3jf3fgygezygmtty6hvydale/lib
2022-03-16 12:15:00.564262: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.

Do not worry: by default, TensorFlow tries to use a GPU and, since none is available, it prints a warning and falls back to the CPU (which is enough for our course).

Python installation

Here are some instructions for installing Keras with TensorFlow as the backend (for Python 3), and other libraries, on the UNIL cluster called Curnagl. Open a terminal on your laptop and type (if you are located outside the UNIL, you will need to activate the UNIL VPN):

ssh -Y < my unil username >@curnagl.dcsr.unil.ch

Here and in what follows we added the brackets < > to emphasize the username, but you should not write them in the command. Enter your UNIL password.

For Windows users with the MobaXterm terminal: launch MobaXterm, click on "Start local terminal" and type the command ssh -Y < my unil username >@curnagl.dcsr.unil.ch, then enter your UNIL password. You should then be on Curnagl. Alternatively, launch MobaXterm, click on the Session icon and then on the SSH icon. Fill in: remote host = curnagl.dcsr.unil.ch, specify username = < my unil username >. Finally, click OK and enter your password. If you are asked "do you want to save password?", say No if you are not sure. You should then be on Curnagl.

See also the documentation: https://wiki.unil.ch/ci/books/high-performance-computing-hpc/page/ssh-connection-to-dcsr-cluster

cd /work/TRAINING/UNIL/CTR/rfabbret/cours_hpc

mkdir < my unil username >

cd < my unil username >

For convenience, you will install the libraries from the login (frontal) node. Note, however, that it is normally recommended to install libraries from the interactive partition, using Sinteractive -m 4G -c 1.

git clone https://c4science.ch/source/DL_INTRO.git

module load gcc python/3.9.13

python -m venv mlcourse

source mlcourse/bin/activate

pip install -r DL_INTRO/requirements.txt

To check that TensorFlow was installed:

python -c 'import tensorflow; print(tensorflow.version.VERSION)'

There might be a warning message (see above) and the output should be something like "2.9.2".

You can terminate the current session:

deactivate

exit

TO DO THE PRACTICALS (today or another day):

ssh -Y < my unil username >@curnagl.dcsr.unil.ch

cd /work/TRAINING/UNIL/CTR/rfabbret/cours_hpc/< my unil username >

For convenience, you will work directly on the login (frontal) node to do the practicals. Note, however, that working directly on the login node is normally not allowed, and you should use Sinteractive -m 4G -c 1.

module load gcc python/3.9.13

source mlcourse/bin/activate

python

R installation

Here are some instructions for installing Keras with TensorFlow as the backend, and other libraries, on the UNIL cluster called Curnagl. The R keras package is actually an interface to the Python Keras package. In simple terms, this means that the keras R package allows you to enjoy the benefits of R programming while having access to the capabilities of the Python Keras package. Open a terminal on your laptop and type (if you are located outside the UNIL, you will need to activate the UNIL VPN):

ssh -Y < my unil username >@curnagl.dcsr.unil.ch

Here and in what follows we added the brackets < > to emphasize the username, but you should not write them in the command. Enter your UNIL password.

For Windows users with the MobaXterm terminal: launch MobaXterm, click on "Start local terminal" and type the command ssh -Y < my unil username >@curnagl.dcsr.unil.ch, then enter your UNIL password. You should then be on Curnagl. Alternatively, launch MobaXterm, click on the Session icon and then on the SSH icon. Fill in: remote host = curnagl.dcsr.unil.ch, specify username = < my unil username >. Finally, click OK and enter your password. If you are asked "do you want to save password?", say No if you are not sure. You should then be on Curnagl.

See also the documentation: https://wiki.unil.ch/ci/books/high-performance-computing-hpc/page/ssh-connection-to-dcsr-cluster

module load gcc cmake openblas python/3.9.13 r/4.2.1

git clone https://c4science.ch/source/DL_INTRO.git

mkdir -p ~/.virtualenvs

cd ~/.virtualenvs

python -m venv r-reticulate

source r-reticulate/bin/activate

pip install -r ~/DL_INTRO/requirements.txt

pip --no-cache install --no-binary numpy numpy==1.25.0

For convenience, you will install the libraries from the login (frontal) node. Note, however, that it is normally recommended to install libraries from the interactive partition, using Sinteractive -m 4G -c 1.

REMARK: The R libraries will be installed in your home directory. To allow it, you must answer yes to the questions:

Would you like to use a personal library instead? (yes/No/cancel) yes

Would you like to create a personal library to install packages into? (yes/No/cancel) yes

And select Switzerland for the CRAN mirror.

R

install.packages("keras")

install.packages("reticulate")

install.packages("ggplot2")

install.packages("ggfortify")

To check that Keras was properly installed:

library(keras)

library(tensorflow)

is_keras_available(version = NULL)

There might be a warning message (see above) and the output should be something like "TRUE".

You can terminate the current R session:

q()

Save workspace image? [y/n/c]: n

TO DO THE PRACTICALS (today or another day):

ssh -Y < my unil username >@curnagl.dcsr.unil.ch

cd /work/TRAINING/UNIL/CTR/rfabbret/cours_hpc/< my unil username >

For convenience, you will work directly on the login (frontal) node to do the practicals. Note, however, that working directly on the login node is normally not allowed, and you should use Sinteractive -m 4G -c 1.

module load gcc python/3.9.13 r/4.2.1

R

Rstudio on the Curnagl cluster

RStudio can be run on the Curnagl cluster from within a Singularity container, with an interactive interface provided in the web browser of any given workstation.

Running RStudio interactively on the clusters is only meant for testing. Development must be carried out on the users' workstations, and production runs must be performed with R scripts in batch mode.

Preparatory steps

  1. If the workstation is outside of the campus, first connect to the VPN
  2. Login to the cluster
  3. Create/choose a folder under the /scratch or the /work filesystems under your project (ex. /work/FAC/.../rstudio); this folder will appear as your HOME inside the Rstudio environment, and we will refer to it as ${WORK}
  4. (This step is optional and only applies if you need RStudio version >4.2.2) Create the singularity image inside the cluster (substitute ${WORK} appropriately):

    [me@curnagl ~]$ module load singularity
    [me@curnagl ~]$ singularity pull --dir="${WORK}" --name=rstudio-server.sif docker://rocker/rstudio
    This last step might take a while...

The batch script

Create a file rstudio-server.sbatch with the following contents (it must be on the cluster, but the exact location does not matter):

#!/bin/bash -l

#SBATCH --account ACCOUNT_NAME
#SBATCH --mail-type BEGIN 
#SBATCH --mail-user <first.lastname>@unil.ch

#SBATCH --chdir ${WORK}
#SBATCH --job-name rstudio-server
#SBATCH --signal=USR2
#SBATCH --output=rstudio-server.job.%j

#SBATCH --partition interactive

#SBATCH --nodes 1
#SBATCH --ntasks 1
#SBATCH --cpus-per-task 1
#SBATCH --mem 8G
#SBATCH --time 01:59:59
#SBATCH --export NONE

module load gcc python singularity

RSTUDIO_CWD=$(pwd)
RSTUDIO_SIF="/dcsrsoft/singularity/containers/rstudio-4.2.2.sif"
LOCAL_PORT=8787

# Create temp directory for ephemeral content to bind-mount in the container
RSTUDIO_TMP=$(mktemp --tmpdir -d rstudio.XXX)

mkdir -p -m 700 \
        ${RSTUDIO_TMP}/run \
        ${RSTUDIO_TMP}/tmp \
        ${RSTUDIO_TMP}/var/lib/rstudio-server

mkdir -p ${RSTUDIO_CWD}/.R

cat > ${RSTUDIO_TMP}/database.conf <<END
provider=sqlite
directory=/var/lib/rstudio-server
END

# Set OMP_NUM_THREADS to prevent OpenBLAS (and any other OpenMP-enhanced
# libraries used by R) from spawning more threads than the number of processors
# allocated to the job.
#
# Set R_LIBS_USER to a path specific to rocker/rstudio to avoid conflicts with
# personal libraries from any R installation in the host environment

cat > ${RSTUDIO_TMP}/rsession.sh <<END
#!/bin/sh

export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}
export R_LIBS_USER=${RSTUDIO_CWD}/.R
export PATH=${PATH}:/usr/lib/rstudio-server/bin
exec rsession "\${@}"
END

chmod +x ${RSTUDIO_TMP}/rsession.sh

SINGULARITY_BIND+="${RSTUDIO_CWD}:${RSTUDIO_CWD},"
SINGULARITY_BIND+="${RSTUDIO_TMP}/run:/run,"
SINGULARITY_BIND+="${RSTUDIO_TMP}/tmp:/tmp,"
SINGULARITY_BIND+="${RSTUDIO_TMP}/database.conf:/etc/rstudio/database.conf,"
SINGULARITY_BIND+="${RSTUDIO_TMP}/rsession.sh:/etc/rstudio/rsession.sh,"
SINGULARITY_BIND+="${RSTUDIO_TMP}/var/lib/rstudio-server:/var/lib/rstudio-server,"
SINGULARITY_BIND+="/users:/users,/scratch:/scratch,/work:/work"
export SINGULARITY_BIND

# Do not suspend idle sessions.
# Alternative to setting session-timeout-minutes=0 in /etc/rstudio/rsession.conf
export SINGULARITYENV_RSTUDIO_SESSION_TIMEOUT=0

export SINGULARITYENV_USER=$(id -un)
export SINGULARITYENV_PASSWORD=$(openssl rand -base64 15)

# get unused socket per https://unix.stackexchange.com/a/132524
# tiny race condition between the python & singularity commands
readonly PORT=$(python -c 'import socket; s=socket.socket(); s.bind(("", 0)); print(s.getsockname()[1]); s.close()')
cat 1>&2 <<END
1. SSH tunnel from your workstation using the following command:

   ssh -n -N -J ${SINGULARITYENV_USER}@curnagl.dcsr.unil.ch -L ${LOCAL_PORT}:localhost:${PORT} ${SINGULARITYENV_USER}@${HOSTNAME}

   and point your web browser to http://localhost:${LOCAL_PORT}

2. log in to RStudio Server using the following credentials:

   user: ${SINGULARITYENV_USER}
   password: ${SINGULARITYENV_PASSWORD}

When done using RStudio Server, terminate the job by:

1. Exit the RStudio Session ("power" button in the top right corner of the RStudio window)
2. Issue the following command on the login node:

      scancel -f ${SLURM_JOB_ID}
END

singularity exec --home ${RSTUDIO_CWD} --cleanenv ${RSTUDIO_SIF} \
    rserver --www-port ${PORT} \
            --auth-none=0 \
            --auth-pam-helper-path=pam-helper \
            --auth-stay-signed-in-days=30 \
            --auth-timeout-minutes=0 \
            --rsession-path=/etc/rstudio/rsession.sh \
            --server-user=${SINGULARITYENV_USER}

SINGULARITY_EXIT_CODE=$?
echo "rserver exited $SINGULARITY_EXIT_CODE" 1>&2
exit $SINGULARITY_EXIT_CODE

You need to carefully replace, at the beginning of the file, the following elements:

  - ACCOUNT_NAME: your Slurm account
  - <first.lastname>@unil.ch: your UNIL e-mail address
  - ${WORK}: the full path of the folder chosen in the preparatory steps (write it out explicitly, as #SBATCH directives do not expand shell variables)

Running Rstudio

Submit a job for running Rstudio from within the cluster with:

[me@curnagl ~]$ sbatch rstudio-server.sbatch

You will receive a notification by e-mail as soon as the job is running.

A new file ${WORK}/rstudio-server.job.### (with ### some given job id number) is then automatically created. Its contents will give you instructions on how to proceed in order to start a new Rstudio remote session from your workstation.
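
For example, to display the instructions once the file has appeared (assuming a single such file):

cat ${WORK}/rstudio-server.job.*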

You will have 2 hours to test your code (the job's time limit).

Rstudio on the Urblauna cluster

RStudio can be run on the Urblauna cluster from within a Singularity container, with an interactive interface provided in the web browser of a Guacamole session.

Running RStudio interactively on the clusters is only meant for testing. Development must be carried out on the users' workstations, and production runs must be performed with R scripts in batch mode.

Preparatory steps on Curnagl side

A few operations have to be executed on the Curnagl cluster:

  1. Create a directory in your /work project dedicated to be used as an R library, for instance:
    mkdir /work/FAC/FBM/DBC/mypi/project/R_ROOT 
  2. Optional: install the required R packages, for instance ggplot2:
    module load gcc r
    export R_LIBS_USER=/work/FAC/FBM/DBC/mypi/project/R_ROOT
    R
    > install.packages("ggplot2")

The batch script

Create a file rstudio-server.sbatch with the following contents (it must be on the cluster, but the exact location does not matter):

#!/bin/bash -l

#SBATCH --account <<<ACCOUNT_NAME>>>
#SBATCH --job-name rstudio-server
#SBATCH --signal=USR2
#SBATCH --output=rstudio-server.job
#SBATCH --nodes 1
#SBATCH --ntasks 1
#SBATCH --cpus-per-task 1
#SBATCH --mem 8G
#SBATCH --time 02:00:00
#SBATCH --partition interactive
#SBATCH --export NONE

RLIBS_USER_DIR=<<<RLIBS_PATH>>>
RSTUDIO_CWD=~
RSTUDIO_SIF="/dcsrsoft/singularity/containers/rstudio-4.2.1.sif"

module load gcc python singularity
module load r
RLIBS_DIR=${R_ROOT}/rlib/R/library
module unload r


# Create temp directory for ephemeral content to bind-mount in the container
RSTUDIO_TMP=$(mktemp --tmpdir -d rstudio.XXX)

mkdir -p -m 700 \
        ${RSTUDIO_TMP}/run \
        ${RSTUDIO_TMP}/tmp \
        ${RSTUDIO_TMP}/var/lib/rstudio-server

mkdir -p ${RSTUDIO_CWD}/.R

cat > ${RSTUDIO_TMP}/database.conf <<END
provider=sqlite
directory=/var/lib/rstudio-server
END

# Set OMP_NUM_THREADS to prevent OpenBLAS (and any other OpenMP-enhanced
# libraries used by R) from spawning more threads than the number of processors
# allocated to the job.
#
# Set R_LIBS_USER to a path specific to rocker/rstudio to avoid conflicts with
# personal libraries from any R installation in the host environment

cat > ${RSTUDIO_TMP}/rsession.sh <<END
#!/bin/sh

export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}
export R_LIBS=${RLIBS_DIR}
export R_LIBS_USER=${RLIBS_USER_DIR}
export PATH=${PATH}:/usr/lib/rstudio-server/bin
exec rsession "\${@}"
END

chmod +x ${RSTUDIO_TMP}/rsession.sh

SINGULARITY_BIND+="${RSTUDIO_CWD}:${RSTUDIO_CWD},"
SINGULARITY_BIND+="${RSTUDIO_TMP}/run:/run,"
SINGULARITY_BIND+="${RSTUDIO_TMP}/tmp:/tmp,"
SINGULARITY_BIND+="${RSTUDIO_TMP}/database.conf:/etc/rstudio/database.conf,"
SINGULARITY_BIND+="${RSTUDIO_TMP}/rsession.sh:/etc/rstudio/rsession.sh,"
SINGULARITY_BIND+="${RSTUDIO_TMP}/var/lib/rstudio-server:/var/lib/rstudio-server,"
SINGULARITY_BIND+="/users:/users,/scratch:/scratch,/work:/work,/dcsrsoft"
export SINGULARITY_BIND

# Do not suspend idle sessions.
# Alternative to setting session-timeout-minutes=0 in /etc/rstudio/rsession.conf
export SINGULARITYENV_RSTUDIO_SESSION_TIMEOUT=0

export SINGULARITYENV_USER=$(id -un)
export SINGULARITYENV_PASSWORD=$(openssl rand -base64 15)

# get unused socket per https://unix.stackexchange.com/a/132524
# tiny race condition between the python & singularity commands
readonly PORT=$(python -c 'import socket; s=socket.socket(); s.bind(("", 0)); print(s.getsockname()[1]); s.close()')
cat 1>&2 <<END
1. open the Guacamole web browser to http://${HOSTNAME}:${PORT}

2. log in to RStudio Server using the following credentials:

   user: ${SINGULARITYENV_USER}
   password: ${SINGULARITYENV_PASSWORD}

When done using RStudio Server, terminate the job by:

1. Exit the RStudio Session ("power" button in the top right corner of the RStudio window)
2. Issue the following command on the login node:

      scancel -f ${SLURM_JOB_ID}
END

#singularity exec --env R_LIBS=${RLIBS_DIR} --home ${RSTUDIO_CWD} --cleanenv ${RSTUDIO_SIF} \
singularity exec --home ${RSTUDIO_CWD} --cleanenv ${RSTUDIO_SIF} \
    rserver --www-port ${PORT} \
            --auth-none=0 \
            --auth-pam-helper-path=pam-helper \
            --auth-stay-signed-in-days=30 \
            --auth-timeout-minutes=0 \
            --rsession-path=/etc/rstudio/rsession.sh \
            --server-user=${SINGULARITYENV_USER}

SINGULARITY_EXIT_CODE=$?
echo "rserver exited $SINGULARITY_EXIT_CODE" 1>&2
exit $SINGULARITY_EXIT_CODE

You need to carefully replace, at the beginning of the file, the following elements:

  - <<<ACCOUNT_NAME>>>: your Slurm account
  - <<<RLIBS_PATH>>>: the path of the R library directory created in the preparatory steps (for instance /work/FAC/FBM/DBC/mypi/project/R_ROOT)

Running Rstudio

Submit a job for running Rstudio from within the cluster with:

[me@curnagl ~]$ sbatch rstudio-server.sbatch

Once the job is running (you can check that with Squeue), a new file rstudio-server.job is then automatically created. Its contents will give you instructions on how to proceed in order to start a new Rstudio remote session from Guacamole.

In this script we have reserved 2 hours of run time.

JupyterLab on the curnagl cluster

JupyterLab can be run on the Curnagl cluster for testing purposes only, as an intermediate step in porting applications from regular workstations to Curnagl.

The installation is made inside a python virtual environment, and this tutorial covers the installation of the following kernels: IPyKernel (python), IRKernel (R), IJulia (julia), MATLAB kernel (matlab), IOctave (octave), stata_kernel (stata) and sas_kernel (sas).

If the workstation is outside of the campus, first connect to the VPN.

Creating the virtual environment

First create/choose a folder ${WORK} under the /scratch or the /work filesystems under your project (ex. WORK=/work/FAC/.../my_project). The following needs to be run only once on the cluster (preferably on an interactive computing node):

module load gcc python
python -m venv ${WORK}/jlab_venv
${WORK}/jlab_venv/bin/pip install jupyterlab ipykernel numpy matplotlib

The IPyKernel is automatically available. The other kernels need to be installed according to your needs.
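
At any point you can check which kernels are currently registered by listing them from the virtual environment:

${WORK}/jlab_venv/bin/jupyter kernelspec list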

Installing the kernels

Each time you start a new session on the cluster, remember to define the variable ${WORK} according to the path you chose when creating the virtual environment.
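
For example (using the same placeholder path as above):

export WORK=/work/FAC/.../my_project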

IRKernel

module load gcc r
export R_LIBS_USER=${WORK}/jlab_venv/lib/Rlibs
mkdir -p ${R_LIBS_USER}
echo "install.packages('IRkernel', repos='https://stat.ethz.ch/CRAN/', lib=Sys.getenv('R_LIBS_USER'))" | R --no-save
source ${WORK}/jlab_venv/bin/activate
echo "IRkernel::installspec()" | R --no-save
deactivate

IJulia

module load gcc julia
export JULIA_DEPOT_PATH=${WORK}/jlab_venv/lib/Jlibs
julia -e 'using Pkg; Pkg.add("IJulia")'

MATLAB kernel

${WORK}/jlab_venv/bin/pip install matlab_kernel matlabengine==9.11.19

IOctave

${WORK}/jlab_venv/bin/pip install octave_kernel
echo "c.OctaveKernel.plot_settings = dict(backend='gnuplot')" > ~/.jupyter/octave_kernel_config.py

stata_kernel

module load stata-se
${WORK}/jlab_venv/bin/pip install stata_kernel
${WORK}/jlab_venv/bin/python -m stata_kernel.install
sed -i "s/^stata_path = None/stata_path = $(echo ${STATA_SE_ROOT} | sed 's/\//\\\//g')\/stata-se/" ~/.stata_kernel.conf
sed -i 's/stata_path = \(.*\)stata-mp/stata_path = \1stata-se/' ~/.stata_kernel.conf

sas_kernel

module load sas
${WORK}/jlab_venv/bin/pip install sas_kernel
sed -i "s/'\/opt\/sasinside\/SASHome/'$(echo ${SAS_ROOT} | sed 's/\//\\\//g')/g" ${WORK}/jlab_venv/lib64/python3.9/site-packages/saspy/sascfg.py

Running JupyterLab

Before running JupyterLab, you need to start an interactive session!

Sinteractive

Take note of the name of the node the interactive session is running on, which you will need later. To get it, you can type:

hostname

If you didn't install all of the kernels, ignore the corresponding lines in the commands below. The execution order is important: loading the gcc module should always be done before activating the virtual environment.

# Load python
module load gcc python

# IOctave (optional)
module load octave gnuplot

# IRKernel (optional)
export R_LIBS_USER=${WORK}/jlab_venv/lib/Rlibs

# IJulia (optional)
export JULIA_DEPOT_PATH=${WORK}/jlab_venv/lib/Jlibs

# JupyterLab environment
source ${WORK}/jlab_venv/bin/activate

# Launch JupyterLab (on the shell a link that can be copied on the browser will appear)
cd ${WORK}
jupyter-lab

deactivate

Before you can copy and paste the link into your favorite browser, you will need to establish an SSH tunnel to the interactive node. From a UNIX-like workstation, you can establish the tunnel to the Curnagl node with the following command (replace <username> with your user name and <hostname> with the name of the node obtained above; the <port> number is taken from the link and is typically 8888):

ssh -n -N -J <username>@curnagl.dcsr.unil.ch -L <port>:localhost:<port> <username>@<hostname>

You will be prompted for your password. When you have finished, you can close the tunnel with Ctrl-C.

Note on Python/R/Julia modules and packages

The modules you install manually from JupyterLab in Python, R or Julia end up inside the JupyterLab virtual environment (${WORK}/jlab_venv). They are hence isolated and independent from your Python/R/Julia instances outside of the virtual environment.

JupyterLab with C++ on the curnagl cluster

JupyterLab can be run on the Curnagl cluster for testing purposes only, as an intermediate step in porting applications from regular workstations to Curnagl.

This tutorial intends to setup JupyterLab on the cluster together with the support for the C++ programming language, through the xeus-cling kernel. Besides the IPyKernel kernel for the python language, which is natively supported, we will also provide the option to install support for the following kernels: IRKernel (R), IJulia (julia), MATLAB kernel (matlab), IOctave (octave), stata_kernel (stata) and sas_kernel (sas).

These instructions are hence related to the JupyterLab on the curnagl cluster tutorial, but the implementation is very different because a JIT compiler is necessary in order to interactively process C++ code. Instead of using a python virtual environment in order to isolate and install JupyterLab, the kernels and the corresponding dependencies, we use micromamba.

Setup of the micromamba virtual environment

First create or choose a folder ${WORK} under the /scratch or /work filesystem of your project (e.g. WORK=/work/FAC/.../my_project). The following needs to be run only once on the cluster (preferably on an interactive compute node):

module load gcc python
export MAMBA_ROOT=/dcsrsoft/spack/external/micromamba
export MAMBA_ROOT_PREFIX="${WORK}/micromamba"
eval "$(${MAMBA_ROOT}/micromamba shell hook --shell=bash)"
micromamba create -y --prefix ${WORK}/jlab_menv python==3.9.13 jupyterlab ipykernel numpy matplotlib xeus-cling -c conda-forge

The IPyKernel and the xeus-cling kernel for handling C++ are now available. The other kernels need to be installed according to your needs.

Installing the optional kernels

Each time you start a new session on the cluster, remember to define the variable ${WORK} according to the path you chose when creating the virtual environment.

IRKernel

module load gcc r
export R_LIBS_USER=${WORK}/jlab_menv/lib/Rlibs
mkdir ${R_LIBS_USER}
echo "install.packages('IRkernel', repos='https://stat.ethz.ch/CRAN/', lib=Sys.getenv('R_LIBS_USER'))" | R --no-save
export MAMBA_ROOT=/dcsrsoft/spack/external/micromamba
export MAMBA_ROOT_PREFIX="${WORK}/micromamba"
eval "$(${MAMBA_ROOT}/micromamba shell hook --shell=bash)"
echo "IRkernel::installspec()" | micromamba run --prefix ${WORK}/jlab_menv R --no-save

IJulia

module load gcc julia
export JULIA_DEPOT_PATH=${WORK}/jlab_menv/lib/Jlibs
julia -e 'using Pkg; Pkg.add("IJulia")'

MATLAB kernel

${WORK}/jlab_menv/bin/pip install matlab_kernel matlabengine==9.11.19

IOctave

${WORK}/jlab_menv/bin/pip install octave_kernel
echo "c.OctaveKernel.plot_settings = dict(backend='gnuplot')" > ~/.jupyter/octave_kernel_config.py

stata_kernel

module load stata-se
${WORK}/jlab_menv/bin/pip install stata_kernel
${WORK}/jlab_menv/bin/python -m stata_kernel.install
sed -i "s/^stata_path = None/stata_path = $(echo ${STATA_SE_ROOT} | sed 's/\//\\\//g')\/stata-se/" ~/.stata_kernel.conf
sed -i 's/stata_path = \(.*\)stata-mp/stata_path = \1stata-se/' ~/.stata_kernel.conf

sas_kernel

module load sas
${WORK}/jlab_menv/bin/pip install sas_kernel
sed -i "s/'\/opt\/sasinside\/SASHome/'$(echo ${SAS_ROOT} | sed 's/\//\\\//g')/g" ${WORK}/jlab_venv/lib64/python3.9/site-packages/saspy/sascfg.py

Running JupyterLab

Before running JupyterLab, you need to start an interactive session!

Sinteractive

Take note of the name of the node on which the interactive session is running, as you will need it later. To get it, type:

hostname

If you did not install all of the kernels, simply ignore the corresponding lines in the commands below. The execution order matters: the gcc module must always be loaded before activating the virtual environment.

# Load python and setup the environment for micromamba to work
module load gcc python
export MAMBA_ROOT=/dcsrsoft/spack/external/micromamba
export MAMBA_ROOT_PREFIX="${WORK}/micromamba"
eval "$(${MAMBA_ROOT}/micromamba shell hook --shell=bash)"

# IOctave (optional)
module load octave gnuplot

# IRKernel (optional)
export R_LIBS_USER=${WORK}/jlab_menv/lib/Rlibs

# IJulia (optional)
export JULIA_DEPOT_PATH=${WORK}/jlab_menv/lib/Jlibs

# Launch JupyterLab (a link to copy into your browser will be printed in the shell)
cd ${WORK}
micromamba run --prefix ${WORK}/jlab_menv jupyter-lab

Before you can copy and paste the link into your favorite browser, you will need to establish an SSH tunnel to the interactive node. From a UNIX-like workstation, you can establish the SSH tunnel to the curnagl node with the following command (replace <username> with your user name and <hostname> with the name of the node you obtained above; the <port> number is taken from the link and is typically 8888):

ssh -n -N -J <username>@curnagl.dcsr.unil.ch -L <port>:localhost:<port> <username>@<hostname>

You will be prompted for your password. When you have finished, you can close the tunnel with Ctrl-C.

Note on Python/R/Julia modules and packages

The modules you install manually from JupyterLab in Python, R or Julia end up inside the JupyterLab virtual environment (${WORK}/jlab_menv). They are hence isolated and independent from your Python/R/Julia instances outside of the virtual environment.

Dask on curnagl

In order to use Dask on Curnagl you have to install at least the following packages: dask, distributed and dask-jobqueue (the latter is only needed for the Slurm cluster mode described below).

Note: please make sure to use Dask version 2022.11.0 or later. Previous versions have bugs on worker nodes that make them very slow when using several threads.
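A minimal installation sketch (the virtual environment name and location are only examples; any Python environment will do):

module load gcc python
python -m venv ${WORK}/dask_venv
${WORK}/dask_venv/bin/pip install "dask[distributed]>=2022.11.0" dask-jobqueue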

Dask makes it easy to parallelize computations: you can run computationally intensive methods in parallel by assigning them to different CPU resources.

For example:

def cpu_intensive_method(x, y, z):
    # CPU computations
    return x + 1


# client is a dask.distributed Client (see the full examples below)
futures = []
for x, y, z in zip(list_x, list_y, list_z):
    future = client.submit(cpu_intensive_method, x, y, z)
    futures.append(future)

result = client.gather(futures)

This documentation covers two ways of using Dask: a local cluster (on a single node) and a Slurm cluster (where Dask submits its own Slurm jobs).

Local cluster

The Python script looks like:

import dask
from dask.distributed import Client, LocalCluster

def compute(x):
    """CPU demanding code"""
    return x + 1


if __name__ == "__main__":

    cluster = LocalCluster()
    client = Client(cluster)

    parameters = [1, 2, 3, 4]
    futures = []
    for x in parameters:
        future = client.submit(compute, x)
        futures.append(future)

    result = client.gather(futures)

The calls to LocalCluster and Client must be placed inside the if __name__ == "__main__": block. For more information, see: https://docs.dask.org/en/stable/scheduling.html

The LocalCluster() call deploys N workers, each using T threads, such that N x T equals the number of cores reserved through Slurm. Dask balances the number of workers against the number of threads per worker in order to take advantage of GIL-free workloads such as NumPy and Pandas.
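For illustration (a sketch; the numbers assume a 16-core Slurm allocation), you can also set the split explicitly:

from dask.distributed import Client, LocalCluster

if __name__ == "__main__":
    # 4 workers x 4 threads = the 16 cores reserved with --ntasks 16
    cluster = LocalCluster(n_workers=4, threads_per_worker=4)
    client = Client(cluster)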

SLURM script:

#!/bin/bash -l

#SBATCH --job-name dask_job
#SBATCH --ntasks 16
#SBATCH -N 1
#SBATCH --partition cpu
#SBATCH --cpus-per-task 1
#SBATCH --time 01:00:00
#SBATCH --output=dask_job-%j.out
#SBATCH --error=dask_job-%j.error


python script.py

Make sure to include the parameter -N 1, otherwise Slurm may allocate tasks on different nodes, which makes the Dask local cluster fail. You should adapt the parameter --ntasks: since we are using just one machine, it can be anything between 1 and 48. Keep in mind that the smaller the number, the faster your job will start; you can choose to run with fewer processes for a longer time.

Slurm cluster

The Python script can be launched directly from the front end, but you need to keep your session open with a tool such as tmux or screen, otherwise your jobs will be cancelled.

In your Python script you should put something like:

import dask
from dask.distributed import Client
from dask_jobqueue import SLURMCluster

def compute(x):
    """CPU demanding code"""
    return x + 1


if __name__ == "__main__":

    cluster = SLURMCluster(cores=8, memory="40GB")
    client = Client(cluster)

    cluster.adapt(maximum_jobs=5, interval="10000 ms")

    parameters = [1, 2, 3, 4]
    futures = []
    for x in parameters:
        future = client.submit(compute, x)
        futures.append(future)

    result = client.gather(futures)

In this case Dask will launch Slurm jobs with 8 cores and 40GB of memory. The parameters memory and cores are mandatory. There are two methods to launch jobs: adapt and scale. adapt launches and kills jobs depending on the load of your computation and on how many computations can run in parallel; you can put a limit on the number of jobs that will be launched. The parameter interval is necessary and needs to be set to 10000 ms to avoid killing jobs too early.

scale creates a static infrastructure composed of a fixed number of jobs, specified with the parameter jobs. Example:

scale(jobs=10)

This launches 10 jobs regardless of the load and of the amount of computation you generate.
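As a short sketch contrasting the two methods (the numbers are illustrative):

# adaptive: Dask launches between 0 and 5 Slurm jobs, depending on the workload
cluster.adapt(maximum_jobs=5, interval="10000 ms")

# static alternative: always keep 10 Slurm jobs running
# cluster.scale(jobs=10)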

Some facts about Slurm jobs and DASK

Keep in mind that the computation depends on the availability of resources: if the jobs are not running, your computation will not start. So if you think that your computation is stuck, first verify that jobs have been submitted and that they are running, using the command: squeue -u $USER.

By default the walltime of each job is set to 30 minutes; use the parameter walltime if you think that individual computations will last longer than the default.
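For example, a sketch requesting two-hour jobs instead of the default (the other values are taken from the examples above):

cluster = SLURMCluster(cores=8, memory="40GB", walltime="02:00:00")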

Slurm job files will be generated in the same directory from which you launch your Python command.

Jobs will be killed by Dask when there is no more computation to be done. If you see a message such as:

slurmstepd: error: *** JOB 25260254 ON dna051 CANCELLED AT 2023-03-01T11:00:19 ***

This is completely normal and does not mean that there was an error in your computation.

Optimal number of workers

Both LocalCluster and SLURMCluster automatically balance the number of workers and the number of threads per worker. You can set the number of workers with the parameter n_workers. If most of the computation relies on NumPy or Pandas, it is preferable to have only one worker (n_workers=1). If most of the computation is pure Python code, you should use as many workers as possible. Example:

Local cluster:

LocalCluster(n_workers=int(os.environ['SLURM_NTASKS']))

Slurm cluster:

SLURMCluster(cores=8, memory="40GB", n_workers=8)

Example

Here is an example that illustrates the use of Dask. The code runs 40 multiplications of random matrices of size N x N; each computation returns the sum of all the elements of the result matrix:

import os
import time
import numpy as np
from dask.distributed import Client, LocalCluster
from dask_jobqueue import SLURMCluster

SIZE = 9192

def compute(tag):
    np.random.seed(tag)
    A = np.random.random((SIZE,SIZE))
    B = np.random.random((SIZE,SIZE))
    start = time.time()
    C = np.dot(A,B)
    end = time.time()
    elapsed = end-start                                                                                                                                       
    return elapsed, np.sum(C)

if __name__ == "__main__":

#    cluster = LocalCluster(n_workers=int(os.environ['SLURM_NTASKS']))
    cluster = SLURMCluster(cores=8, memory="40GB", n_workers=8)
    client = Client(cluster)

    cluster.adapt(maximum_jobs=5, interval="10000 ms")
    N_ITER = 40

    futures = []
    for i in range(N_ITER):
        future = client.submit(compute, i)
        futures.append(future)

    results = client.gather(futures)                                                                                              
    print(results)



Running the Isca framework on the cluster

Isca is a framework for the idealized modelling of the global circulation of planetary atmospheres at varying levels of complexity and realism. The framework is an outgrowth of models from GFDL designed for Earth's atmosphere, but it may readily be extended into other planetary regimes.

Installation

First of all define a folder ${WORK} on the /work or the /scratch filesystem (somewhere where you have write permissions):

export WORK=/work/FAC/...
mkdir -p ${WORK}

Load the following relevant modules and create a python virtual environment:

module load gcc/10.4.0
module load mvapich2/2.3.7
module load netcdf-c/4.8.1-mpi
module load netcdf-fortran/4.5.4
module load python/3.9.13

python -m venv ${WORK}/isca_venv

Install the required python modules:

${WORK}/isca_venv/bin/pip install dask f90nml ipykernel Jinja2 numpy pandas pytest sh==1.14.3 tqdm xarray

Download and install the Isca framework:

cd ${WORK}
git clone https://github.com/ExeClim/Isca
cd Isca/src/extra/python
${WORK}/isca_venv/bin/pip install -e .

Patch the Isca makefile:

sed -i 's/-fdefault-double-8$/-fdefault-double-8 \\\n           -fallow-invalid-boz -fallow-argument-mismatch/' ${WORK}/Isca/src/extra/python/isca/templates/mkmf.template.gfort

Create the environment file for curnagl:

cat << EOF > ${WORK}/Isca/src/extra/env/curnagl-gfortran
echo Loading basic gfortran environment

module load gcc/10.4.0
module load mvapich2/2.3.7
module load netcdf-c/4.8.1-mpi
module load netcdf-fortran/4.5.4

# this defaults to ia64, but we will use gfortran, not ifort
export GFDL_MKMF_TEMPLATE=gfort

export F90=mpifort
export CC=mpicc
EOF

Compiling and running the Held-Suarez dynamical core test case

Compilation takes place automatically at runtime. After logging in to the cluster, create a SLURM script file start.sbatch with the following contents:

#!/bin/bash -l

#SBATCH --account ACCOUNT_NAME
#SBATCH --mail-type ALL 
#SBATCH --mail-user <first.lastname>@unil.ch

#SBATCH --chdir ${WORK}
#SBATCH --job-name isca_held-suarez
#SBATCH --output=isca_held-suarez.job.%j

#SBATCH --partition cpu

#SBATCH --nodes 1
#SBATCH --ntasks 1
#SBATCH --cpus-per-task 16
#SBATCH --mem 8G
#SBATCH --time 00:29:59
#SBATCH --export ALL

module load gcc/10.4.0
module load mvapich2/2.3.7
module load netcdf-c/4.8.1-mpi
module load netcdf-fortran/4.5.4

WORK=$(pwd)

export GFDL_BASE=${WORK}/Isca
export GFDL_ENV=curnagl-gfortran
export GFDL_WORK=${WORK}/isca_work
export GFDL_DATA=${WORK}/isca_gfdl_data

export C_INCLUDE_PATH=${NETCDF_C_ROOT}/include
export LIBRARY_PATH=${NETCDF_C_ROOT}/lib

sed -i "s/^NCORES =.*$/NCORES = $(echo ${SLURM_CPUS_PER_TASK:-1})/" ${GFDL_BASE}/exp/test_cases/held_suarez/held_suarez_test_case.py

${WORK}/isca_venv/bin/python $GFDL_BASE/exp/test_cases/held_suarez/held_suarez_test_case.py

You need to carefully replace, at the beginning of the file, the following elements: ACCOUNT_NAME by your actual project name, <first.lastname>@unil.ch by your e-mail address (or add an additional # in front of the two --mail lines if you don't wish to receive job notifications), and ${WORK} in the --chdir directive by the actual path of your working folder (Slurm does not expand environment variables in #SBATCH directives).

Then you can simply start the job:

sbatch start.sbatch

Running the MPAS framework on the cluster

The Model for Prediction Across Scales (MPAS) is a collaborative project for developing atmosphere, ocean and other earth-system simulation components for use in climate, regional climate and weather studies.

Compilation

First of all define a folder ${WORK} on the /work or the /scratch filesystem (somewhere where you have write permissions):

export WORK=/work/FAC/...
mkdir -p ${WORK}

Load the following relevant modules:

module load gcc/10.4.0
module load mvapich2/2.3.7
module load parallel-netcdf/1.12.2
module load parallelio/2.5.9-mpi

export PIO=$PARALLELIO_ROOT
export PNETCDF=$PARALLEL_NETCDF_ROOT

Download the MPAS framework:

cd ${WORK}
git clone https://github.com/MPAS-Dev/MPAS-Model

Patch the MPAS Makefile:

sed -i 's/-ffree-form/-ffree-form -fallow-argument-mismatch/' ${WORK}/MPAS-Model/Makefile

Compile:

cd ${WORK}/MPAS-Model

make gfortran CORE=init_atmosphere AUTOCLEAN=true PRECISION=single OPENMP=true USE_PIO2=true
make gfortran CORE=atmosphere AUTOCLEAN=true PRECISION=single OPENMP=true USE_PIO2=true

Running a basic global simulation

Here we aim to run a basic global simulation, just to test that the framework runs. We need to proceed in three steps:

  1. Process time-invariant fields, which will be interpolated onto a given mesh; this step produces a "static" file
  2. Interpolate time-varying meteorological and land-surface fields from intermediate files (produced by the ungrib component of the WRF Pre-processing System); this step produces an "init" file
  3. Run the basic simulation

Create the run folder and link to the binary files

cd ${WORK}
mkdir -p run
cd run
ln -s ${WORK}/MPAS-Model/init_atmosphere_model
ln -s ${WORK}/MPAS-Model/atmosphere_model

Get the mesh files

cd ${WORK}
wget https://www2.mmm.ucar.edu/projects/mpas/atmosphere_meshes/x1.40962.tar.gz
cd run
tar xvzf ../x1.40962.tar.gz

Create the configuration files for the "static" run

The namelist.init_atmosphere file:

cat << EOF > ${WORK}/run/namelist.init_atmosphere
&nhyd_model
config_init_case = 7
/
&data_sources
config_geog_data_path = '${WORK}/WPS_GEOG/'
config_landuse_data = 'MODIFIED_IGBP_MODIS_NOAH'
config_topo_data = 'GMTED2010'
config_vegfrac_data = 'MODIS'
config_albedo_data = 'MODIS'
config_maxsnowalbedo_data = 'MODIS'
/
&preproc_stages
config_static_interp = true
config_native_gwd_static = true
config_vertical_grid = false
config_met_interp = false
config_input_sst = false
config_frac_seaice = false
/
EOF

The streams.init_atmosphere file:

cat << EOF > ${WORK}/run/streams.init_atmosphere
<streams>
<immutable_stream name="input"
                  type="input"
                  precision="single"
                  filename_template="x1.40962.grid.nc"
                  input_interval="initial_only" />

<immutable_stream name="output"
                  type="output"
                  filename_template="x1.40962.static.nc"
                  packages="initial_conds"
                  output_interval="initial_only" />
</streams>
EOF
Proceed to the "static" run

You will need to make sure that the folder ${WORK}/WPS_GEOG exists and contains all the appropriate data.

First create a start_mpas_init.sbatch file (carefully replace ACCOUNT_NAME by your actual project name and type your e-mail address in the --mail-user line, or add an additional # in front of the --mail lines if you don't wish to receive job notifications):

cat << EOF > ${WORK}/run/start_mpas_init.sbatch
#!/bin/bash -l

#SBATCH --account ACCOUNT_NAME
#SBATCH --mail-type ALL 
#SBATCH --mail-user <first.lastname>@unil.ch

#SBATCH --chdir ${WORK}/run
#SBATCH --job-name mpas_init
#SBATCH --output=mpas_init.job.%j

#SBATCH --partition cpu

#SBATCH --nodes 1
#SBATCH --ntasks 1
#SBATCH --cpus-per-task 1
#SBATCH --mem 8G
#SBATCH --time 00:59:59
#SBATCH --export ALL

module load gcc/10.4.0
module load mvapich2/2.3.7
module load parallel-netcdf/1.12.2
module load parallelio/2.5.9-mpi

export PIO=\$PARALLELIO_ROOT

srun ./init_atmosphere_model
EOF

Now start the job with sbatch start_mpas_init.sbatch and at the end of the run, make sure that the log file ${WORK}/run/log.init_atmosphere.0000.out displays no error.

Create the configuration files for the "init" run

The namelist.init_atmosphere file:

cat << EOF > ${WORK}/run/namelist.init_atmosphere
&nhyd_model
config_init_case = 7
config_start_time = '2014-09-10_00:00:00'
/
&dimensions
config_nvertlevels = 55
config_nsoillevels = 4
config_nfglevels = 38
config_nfgsoillevels = 4
/
&data_sources
config_met_prefix = 'GFS'
config_use_spechumd = false
/
&vertical_grid
config_ztop = 30000.0
config_nsmterrain = 1
config_smooth_surfaces = true
config_dzmin = 0.3
config_nsm = 30
config_tc_vertical_grid = true
config_blend_bdy_terrain = false
/
&preproc_stages
config_static_interp = false
config_native_gwd_static = false
config_vertical_grid = true
config_met_interp = true
config_input_sst = false
config_frac_seaice = true
/
EOF

The streams.init_atmosphere file:

cat << EOF > ${WORK}/run/streams.init_atmosphere
<streams>
<immutable_stream name="input"
                  type="input"
                  filename_template="x1.40962.static.nc"
                  input_interval="initial_only" />

<immutable_stream name="output"
                  type="output"
                  filename_template="x1.40962.init.nc"
                  packages="initial_conds"
                  output_interval="initial_only" />
</streams>
EOF
Proceed to the "init" run

Start the job again with sbatch start_mpas_init.sbatch and, at the end of the run, make sure that the log file ${WORK}/run/log.init_atmosphere.0000.out displays no error.

Create the configuration file for the global simulation

The namelist.atmosphere file:

cat << EOF > ${WORK}/run/namelist.atmosphere
&nhyd_model
    config_time_integration_order = 2
    config_dt = 720.0
    config_start_time = '2014-09-10_00:00:00'
    config_run_duration = '0_03:00:00'
    config_split_dynamics_transport = true
    config_number_of_sub_steps = 2
    config_dynamics_split_steps = 3
    config_h_mom_eddy_visc2 = 0.0
    config_h_mom_eddy_visc4 = 0.0
    config_v_mom_eddy_visc2 = 0.0
    config_h_theta_eddy_visc2 = 0.0
    config_h_theta_eddy_visc4 = 0.0
    config_v_theta_eddy_visc2 = 0.0
    config_horiz_mixing = '2d_smagorinsky'
    config_len_disp = 120000.0
    config_visc4_2dsmag = 0.05
    config_w_adv_order = 3
    config_theta_adv_order = 3
    config_scalar_adv_order = 3
    config_u_vadv_order = 3
    config_w_vadv_order = 3
    config_theta_vadv_order = 3
    config_scalar_vadv_order = 3
    config_scalar_advection = true
    config_positive_definite = false
    config_monotonic = true
    config_coef_3rd_order = 0.25
    config_epssm = 0.1
    config_smdiv = 0.1
/
&damping
    config_zd = 22000.0
    config_xnutr = 0.2
/
&limited_area
    config_apply_lbcs = false
/
&io
    config_pio_num_iotasks = 0
    config_pio_stride = 1
/
&decomposition
    config_block_decomp_file_prefix = 'x1.40962.graph.info.part.'
/
&restart
    config_do_restart = false
/
&printout
    config_print_global_minmax_vel = true
    config_print_detailed_minmax_vel = false
/
&IAU
    config_IAU_option = 'off'
    config_IAU_window_length_s = 21600.
/
&physics
    config_sst_update = false
    config_sstdiurn_update = false
    config_deepsoiltemp_update = false
    config_radtlw_interval = '00:30:00'
    config_radtsw_interval = '00:30:00'
    config_bucket_update = 'none'
    config_physics_suite = 'mesoscale_reference'
/
&soundings
    config_sounding_interval = 'none'
/
EOF

The streams.atmosphere file:

cat << 'EOF' > ${WORK}/run/streams.atmosphere
<streams>
<immutable_stream name="input"
                  type="input"
                  filename_template="x1.40962.init.nc"
                  input_interval="initial_only" />

<immutable_stream name="restart"
                  type="input;output"
                  filename_template="restart.$Y-$M-$D_$h.$m.$s.nc"
                  input_interval="initial_only"
                  output_interval="1_00:00:00" />

<stream name="output"
        type="output"
        filename_template="history.$Y-$M-$D_$h.$m.$s.nc"
        output_interval="6:00:00" >
</stream>

<stream name="diagnostics"
        type="output"
        filename_template="diag.$Y-$M-$D_$h.$m.$s.nc"
        output_interval="3:00:00" >
</stream>

<immutable_stream name="iau"
                  type="input"
                  filename_template="x1.40962.AmB.$Y-$M-$D_$h.$m.$s.nc"
                  filename_interval="none"
                  packages="iau"
                  input_interval="initial_only" />

<immutable_stream name="lbc_in"
                  type="input"
                  filename_template="lbc.$Y-$M-$D_$h.$m.$s.nc"
                  filename_interval="input_interval"
                  packages="limited_area"
                  input_interval="none" />

</streams>
EOF

Run the whole simulation

You will need to copy relevant data to the run folder:

cp ${WORK}/MPAS-Model/{GENPARM.TBL,LANDUSE.TBL,OZONE_DAT.TBL,OZONE_LAT.TBL,OZONE_PLEV.TBL,RRTMG_LW_DATA,RRTMG_SW_DATA,SOILPARM.TBL,VEGPARM.TBL} ${WORK}/run/.

Then create a start_mpas.sbatch file (carefully replace ACCOUNT_NAME by your actual project name and type your e-mail address in the --mail-user line, or add an additional # in front of the --mail lines if you don't wish to receive job notifications):

cat << EOF > ${WORK}/run/start_mpas.sbatch
#!/bin/bash -l

#SBATCH --account ACCOUNT_NAME
#SBATCH --mail-type ALL 
#SBATCH --mail-user <first.lastname>@unil.ch

#SBATCH --chdir ${WORK}/run
#SBATCH --job-name mpas_atmosphere
#SBATCH --output=mpas_atmosphere.job.%j

#SBATCH --partition cpu

#SBATCH --nodes 1
#SBATCH --ntasks 1
#SBATCH --cpus-per-task 16
#SBATCH --mem 8G
#SBATCH --time 00:59:59
#SBATCH --export ALL

module load gcc/10.4.0
module load mvapich2/2.3.7
module load parallel-netcdf/1.12.2
module load parallelio/2.5.9-mpi

export PIO=\$PARALLELIO_ROOT

srun ./atmosphere_model
EOF

Now start the job with sbatch start_mpas.sbatch and at the end of the run, make sure that the log file ${WORK}/run/log.atmosphere.0000.out displays no error.

Run OpenFOAM codes on Curnagl

Script to run OpenFOAM code

You are using OpenFOAM on your computer and you need more resources? Let's move to Curnagl!

OpenFOAM typically uses MPI. Here is a bash script to run your parallelized OpenFOAM code; NTASKS should be replaced by the number of processors you want to use for your OpenFOAM case. It is good practice to put your OpenFOAM commands in a bash file instead of calling them directly from the sbatch file.
For instance, create openfoam.sh in which you call your OpenFOAM commands (replace the commands with yours):

#!/bin/bash
# First command
decomposePar ...
# Second command: if it is a parallel command, CALL IT WITH SRUN
srun snappyHexMesh -parallel ...

Then, create a sbatch file to run your OpenFOAM bash file on Curnagl:

#!/bin/bash -l 

#SBATCH --job-name openfoam  
#SBATCH --output openfoam.out 

#SBATCH --partition cpu 
#SBATCH --nodes 1  
#SBATCH --ntasks NTASKS 
#SBATCH --cpus-per-task 1 
#SBATCH --mem 8G  
#SBATCH --time 00:30:00
#SBATCH --export NONE

module purge
module load gcc/10.4.0 mvapich2/2.3.7 openfoam/2206 

export SLURM_EXPORT_ENV=ALL

# RUN YOUR BASH OPENFOAM CODE HERE
bash ./openfoam.sh

Please note that parallelized OpenFOAM commands must be launched with srun, not mpirun. For a complete MPI overview on Curnagl, please refer to the compiling and running MPI codes wiki.


How do I transfer my OpenFOAM code to Curnagl?


You can upload your OpenFOAM code with FileZilla, or copy data to the cluster with the scp command.

Example: I want to copy test.py to Curnagl. I run the following command:

scp test.py <username>@curnagl.dcsr.unil.ch:/YOUR_PATH_ON_CURNAGL

Where YOUR_PATH_ON_CURNAGL is something like /users/username/work/my_folder.

In these commands, do not forget to replace <username> with your own username.

This transfer can be done for any file type: .py, .csv, .h, images...

To copy a whole folder, use the command scp -r.
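For example, to copy a whole OpenFOAM case directory (the folder name is only an illustration):

scp -r my_openfoam_case <username>@curnagl.dcsr.unil.ch:/YOUR_PATH_ON_CURNAGL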

For more details, refer to the transfer files to/from Curnagl wiki page.

Compiling software using cluster libraries

If you see the following error when compiling a code on the cluster:

fatal error: XXXX.h: No such file or directory 

That means that the software you are trying to compile needs a specific header file provided by a third-party library. In order to use a third-party library, the compiler mainly needs two things: the location of the header files (include path) and the location of the library files (link path).

By default on Linux systems those files are searched for in standard paths such as /usr and /lib. There are two ways to tell the compiler where else to look: through the Makefile or through compiler environment variables.

Makefile

Makefiles typically provide the following variables: CFLAGS, CXXFLAGS, FFLAGS and LDFLAGS.

The first three variables are used to pass extra options to a specific compiler and language (C, C++ and Fortran respectively). The last variable is meant to pass the -L and -l options, which are used by the linker.

Example

CFLAGS+= -I/usr/local/cuda/include
LDFLAGS+= -L/usr/local/cuda/lib -lcudnn

Here we tell the compiler where to find the include files and where the libraries are located. These variables should already be present in the Makefile and used during the compilation process.

GCC Variables

If you are using GCC, you can export the following environment variables:

export CPATH=/usr/local/cuda/include
export LIBRARY_PATH=/usr/local/cuda/lib

This has the same effect as setting the variables in the Makefile. This procedure is very useful when you do not have access to the Makefile or when the Makefile variables are not used during compilation.
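With these variables exported, a plain compile line then finds the headers and the library without extra -I or -L options, for example (the source file name is only illustrative):

gcc my_code.c -lcudnn -o my_code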

Using cluster libraries

On the cluster, libraries are provided by modules, which means that you need to tell the compiler to look for header files and libraries in non-standard locations. The procedure is the following: load the module, inspect it with module show to find the corresponding XXX_ROOT variable, and use that variable in the compiler options or environment variables described above.

Example

$ module load cuda
$ module show cuda
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   /dcsrsoft/spack/arolle/v1.0/spack/share/spack/lmod/Zen2-IB/Core/cuda/11.6.2.lua:
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
whatis("Name : cuda")
whatis("Version : 11.6.2")
whatis("Target : zen")
whatis("Short description : CUDA is a parallel computing platform and programming model invented by NVIDIA. It enables dramatic increases in computing performance by harnessing the power of the graphics processing unit (GPU).")
help([[CUDA is a parallel computing platform and programming model invented by
NVIDIA. It enables dramatic increases in computing performance by
harnessing the power of the graphics processing unit (GPU). Note: This
package does not currently install the drivers necessary to run CUDA.
These will need to be installed manually. See:
https://docs.nvidia.com/cuda/ for details.]])
depends_on("libxml2/2.9.13")
prepend_path("LD_LIBRARY_PATH","/dcsrsoft/spack/arolle/v1.0/spack/opt/spack/linux-rhel8-zen/gcc-8.4.1/cuda-11.6.2-rswplbcorqlt6ywhcnbdisk6puje4ejf/lib64")
prepend_path("PATH","/dcsrsoft/spack/arolle/v1.0/spack/opt/spack/linux-rhel8-zen/gcc-8.4.1/cuda-11.6.2-rswplbcorqlt6ywhcnbdisk6puje4ejf/bin")
prepend_path("CMAKE_PREFIX_PATH","/dcsrsoft/spack/arolle/v1.0/spack/opt/spack/linux-rhel8-zen/gcc-8.4.1/cuda-11.6.2-rswplbcorqlt6ywhcnbdisk6puje4ejf/")
setenv("CUDA_HOME","/dcsrsoft/spack/arolle/v1.0/spack/opt/spack/linux-rhel8-zen/gcc-8.4.1/cuda-11.6.2-rswplbcorqlt6ywhcnbdisk6puje4ejf")
setenv("CUDA_ROOT","/dcsrsoft/spack/arolle/v1.0/spack/opt/spack/linux-rhel8-zen/gcc-8.4.1/cuda-11.6.2-rswplbcorqlt6ywhcnbdisk6puje4ejf")

You can observe that the module defines the variable CUDA_ROOT, which is the one that should be used:

export CFLAGS="-I$CUDA_ROOT/include"
LDFLAGS+= -L$(CUDA_ROOT)/lib64/stubs -L$(CUDA_ROOT)/lib64/ -lcuda -lcudart -lcublas -lcurand

This is quite a complex example; sometimes you only need -L$(XXX_ROOT)/lib.

Example for R package

In the case of an R package, we do not have control over the Makefile, so the only option is to use the GCC variables. For an R package that depends on the gsl and mpfr libraries, we need to do the following:

module load gsl mpfr
export CPATH=$GSL_ROOT/include:$MPFR_ROOT/include
export LIBRARY_PATH=$GSL_ROOT/lib:$MPFR_ROOT/lib
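With these variables exported in the same shell session, installing the R package should then pick up the headers and libraries, for example (the package name here is hypothetical):

module load gcc r
echo "install.packages('SomeGslDependentPackage', repos='https://stat.ethz.ch/CRAN/')" | R --no-save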



Course software for Image Analysis with CNNs

You can do the practicals on various computing platforms. However, since the participants may use various types of computers and software, we recommend using the UNIL JupyterLab for the practicals.

If you choose to work on the UNIL JupyterLab, you do not need to prepare anything, since all the necessary libraries are already installed there. In all cases, you will receive a guest username during the course, so you will be able to work on the UNIL JupyterLab.

Otherwise, if you prefer to work on your laptop or on Curnagl, please make sure you have a working installation before the day of the course, as we will be unable to provide any assistance with this on the day itself. If you have difficulties with the installation, we can help you before the course: please contact us at helpdesk@unil.ch with the subject: DCSR ML course.

Before the course, we will send you all the files that are needed to do the practicals.

JupyterLab

Here are some instructions for using the UNIL JupyterLab to do the practicals.

Go to the webpage: https://jupyter.dcsr.unil.ch/jupyter

Enter the login and password that you have received during the course.

Image Classification

We have already prepared your workspace, including the data and notebook. However, in case there is a problem, you can follow the following instructions.

Click on the button "New Folder" (the small logo of of folder with a "+" sign) and name it "models".

Click again on the same button "New Folder" and name it "images".

Double click on the "images" folder that you have just created. 

Click on the button "Upload Files" (the vertical arrow logo) and upload the three images (car.jpeg, frog.jpeg and ship.jpeg) that are included in "images" directory you have received for this course. 

Click on the folder logo (just on top of "Name") to come out of the "images" folder. 

Double click on the "models" folder and then click on the button "Upload Files" to upload all the "models.keras" and "models.npy" files that are included in the "models" directory you have received for this course. 

Click on the folder logo (just on top of "Name") to come out of the "models" folder. 

To work with the html file "Convolutional_Neural_Networks.html":

To work with the notebook "Convolutional_Neural_Networks.ipynb":

In the practical code (i.e. the Python code in the html or ipynb file), the following paths were set:

platform = "jupyter"

PATH_IMAGES = "./images"

PATH_MODELS = "./models"

To execute a command, click on "Run the selected cells and advance" (the right arrow), or SHIFT + RETURN.

When using TensorFlow, you may receive a warning

2022-09-22 11:01:12.232756: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2022-09-22 11:01:12.232856: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.

You should not worry: by default TensorFlow tries to use GPUs and, since there are none, it writes a warning and falls back to CPUs (which is enough for our course).
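If you want to double-check, you can run the following in a notebook cell; an empty list means that no GPU is visible and TensorFlow will use the CPU:

import tensorflow as tf
print(tf.config.list_physical_devices('GPU'))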

When you have finished the practicals, select File / Log out.

Image Segmentation

Now click on the "ImageProcessing" square button in the Notebook panel.

Copy / paste the commands from the html practical file to the Jupyter Notebook.

To execute a command, click on "Run the selected cells and advance" (the right arrow), or SHIFT + RETURN.

Laptop

You may need to install development tools including a C and Fortran compiler (e.g. Xcode on Mac, gcc and gfortran on Linux, Visual Studio on Windows).

Image Classification

Please decide in which folder (or path) you want to do the practicals and go there:

cd THE_PATH_WHERE_I_DO_THE_PRACTICALS

Then you need to create two folders:

mkdir images
mkdir models

Please copy the three images (car.jpeg, frog.jpeg and ship.jpeg) from the "images" folder you have received for this course into the "images" folder you just created, and copy all the "models.keras" and "models.npy" files from the "models" directory you have received into the "models" folder.

In the practical code (i.e. the Python code in the html file), you will need to set the paths as follows:

platform = "laptop"

PATH_IMAGES = "./images"

PATH_MODELS = "./models"

Here are some instructions for installing Keras with TensorFlow as the backend (for Python 3), and other libraries, on your laptop. You need Python >= 3.8.

For Linux

We will use a terminal to install the libraries.

Let us create a virtual environment. Open  your terminal and type:

python3 -m venv mlcourse

source mlcourse/bin/activate

pip3 install tensorflow tf-keras-vis scikit-learn matplotlib numpy h5py notebook

You may need to choose the right library versions, for example tensorflow==2.12.0

To check that Tensorflow was installed:

python3 -c "import tensorflow; print(tensorflow.version.VERSION)"

There might be a warning message (see above) and the output should be something like "2.12.0".

You can terminate the current session:

deactivate

exit

TO DO THE PRACTICALS (today or another day):

You can use any Python IDE (e.g. Jupyter Notebook or PyCharm), but you need to launch it after activating the virtual environment. For example, for Jupyter Notebook:

source mlcourse/bin/activate

jupyter notebook
For Mac

We will use a terminal to install the libraries.

Let us create a virtual environment. Open  your terminal and type:

python3 -m venv mlcourse

source mlcourse/bin/activate

pip3 install tensorflow-macos==2.12.0 tf-keras-vis scikit-learn matplotlib numpy h5py notebook

If you receive an error message such as:

ERROR: Could not find a version that satisfies the requirement tensorflow-macos (from versions: none)
ERROR: No matching distribution found for tensorflow-macos

Then, try the following command:

SYSTEM_VERSION_COMPAT=0 pip3 install tensorflow-macos==2.12.0 scikit-learn==1.2.2 scikeras eli5 pandas matplotlib notebook keras-tuner

If you have a Mac with an M1 or more recent chip (if you are not sure, have a look at "About this Mac"), you can also install the tensorflow-metal library to accelerate training on Mac GPUs (but this is not necessary for the course):

pip3 install tensorflow-metal

To check that Tensorflow was installed:

python3 -c "import tensorflow; print(tensorflow.version.VERSION)"

There might be a warning message (see above) and the output should be something like "2.12.0".

You can terminate the current session:

deactivate

exit

TO DO THE PRACTICALS (today or another day):

You can use any Python IDE (e.g. Jupyter Notebook or PyCharm), but you need to launch it after activating the virtual environment. For example, for Jupyter Notebook:

source mlcourse/bin/activate

jupyter notebook
For Windows

If you do not have Python installed, you can use either Conda: https://docs.conda.io/en/latest/miniconda.html (see the instructions here: https://conda.io/projects/conda/en/latest/user-guide/install/windows.html) or Python official installer: https://www.python.org/downloads/windows/ 

We will use a terminal to install the libraries.

Let us create a virtual environment. Open  your terminal and type:

python -m venv mlcourse

mlcourse\Scripts\activate.bat

pip3 install tensorflow tf-keras-vis scikit-learn matplotlib numpy h5py notebook

You may need to choose the right library versions, for example tensorflow==2.12.0

To check that Tensorflow was installed:

python -c "import tensorflow; print(tensorflow.version.VERSION)"

There might be a warning message (see above) and the output should be something like "2.12.0".

You can terminate the current session:

deactivate

TO DO THE PRACTICALS (today or another day):

You can use any Python IDE (e.g. Jupyter Notebook or PyCharm), but you need to launch it after activating the virtual environment. For example, for Jupyter Notebook:

mlcourse\Scripts\activate.bat

jupyter notebook

Image Segmentation

This part of the course must be done on the UNIL Jupyter Lab but some instructions on how to install the libraries on your laptop will be given at the end of the course.

Curnagl

For the practicals, it will be convenient to be able to copy/paste text from a web page to the terminal on Curnagl, so please make sure you can do it before the course. You also need to make sure that your terminal has an X server.

For Mac users, download and install XQuartz (X server): https://www.xquartz.org/

For Windows users, download and install the MobaXterm terminal (which includes an X server). Click on the "Installer edition" button on the following webpage: https://mobaxterm.mobatek.net/download-home-edition.html

For Linux users, you do not need to install anything.

When testing if TensorFlow was properly installed (see below) you may receive a warning

2022-03-16 12:15:00.564218: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /dcsrsoft/spack/hetre/v1.2/spack/opt/spack/linux-rhel8-zen2/gcc-9.3.0/python-3.8.8-tb3aceqq5wzx4kr5m7s5m4kzh4kxi3ex/lib:/dcsrsoft/spack/hetre/v1.2/spack/opt/spack/linux-rhel8-zen2/gcc-9.3.0/tcl-8.6.11-aonlmtcje4sgqf6gc4d56cnp3mbbhvnj/lib:/dcsrsoft/spack/hetre/v1.2/spack/opt/spack/linux-rhel8-zen2/gcc-9.3.0/tk-8.6.11-2gb36lqwohtzopr52c62hajn4tq7sf6m/lib:/dcsrsoft/spack/hetre/v1.2/spack/opt/spack/linux-rhel8-zen/gcc-8.3.1/gcc-9.3.0-nwqdwvso3jf3fgygezygmtty6hvydale/lib64:/dcsrsoft/spack/hetre/v1.2/spack/opt/spack/linux-rhel8-zen/gcc-8.3.1/gcc-9.3.0-nwqdwvso3jf3fgygezygmtty6hvydale/lib
2022-03-16 12:15:00.564262: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.

You should not worry: by default TensorFlow tries to use GPUs and, since there are none, it writes a warning and falls back to CPUs (which is enough for our course).

Image Classification

Here are some instructions for installing Keras with TensorFlow as the backend (for Python 3), and other libraries, on the UNIL cluster called Curnagl. Open a terminal on your laptop and type (if you are located outside UNIL you will need to activate the UNIL VPN):

ssh -Y < my unil username >@curnagl.dcsr.unil.ch

Here and in what follows we added the brackets < > to emphasize the username, but you should not write them in the command. Enter your UNIL password.

For Windows users with the MobaXterm terminal: launch MobaXterm, click on Start local terminal and type the command ssh -Y < my unil username >@curnagl.dcsr.unil.ch, then enter your UNIL password. You should then be on Curnagl. Alternatively, launch MobaXterm, click on the session icon and then on the SSH icon. Fill in: remote host = curnagl.dcsr.unil.ch, specify username = < my unil username >. Finally, click OK and enter your password. If you are asked "do you want to save the password?", say No if you are not sure. Then you should be on Curnagl.

See also the documentation: https://wiki.unil.ch/ci/books/high-performance-computing-hpc/page/ssh-connection-to-dcsr-cluster

You can do the practicals in your /scratch directory or in the course group "cours_hpc" if you have asked us in advance:

cd /work/TRAINING/UNIL/CTR/rfabbret/cours_hpc

mkdir < my unil username >

cd < my unil username >

You need to make two directories:

mkdir images

mkdir models

Clone the following git repository:

git clone https://c4science.ch/source/CNN_Classification.git

Copy the images from CNN_Classification to images:

cp CNN_Classification/*jpeg images

You also need to upload all the "models.keras" and "models.npy" files that are included in the "models" directory you have received for this course, and move them to the "models" folder on Curnagl.

Let us install libraries from the interactive partition:

Sinteractive -m 10G -G 1

module load gcc/10.4.0 cuda/11.6.2 cudnn/8.4.0.27-11.6 python/3.9.13

python -m venv mlcourse

source mlcourse/bin/activate

pip install -r CNN_Classification/requirements.txt

To check that TensorFlow was installed:

python -c 'import tensorflow; print(tensorflow.version.VERSION)'

There might be a warning message (see above) and the output should be something like "2.9.1".

You can terminate the current session:

deactivate

exit

TO DO THE PRACTICALS (today or another day):

ssh -Y < my unil username >@curnagl.dcsr.unil.ch

cd /work/TRAINING/UNIL/CTR/rfabbret/cours_hpc/< my unil username >

You can do the practicals on the interactive partition:

Sinteractive -m 10G -G 1

module load gcc/10.4.0 cuda/11.6.2 cudnn/8.4.0.27-11.6 python/3.9.13

source mlcourse/bin/activate

python

In the practical code (i.e. the Python code in the html file), you will need to set the paths as follows:

platform = "curnagl"

PATH_IMAGES = "./images"

PATH_MODELS = "./models"

Image Segmentation

On demand. If you work in a project in which you need to use Curnagl to do segmentations, please contact us.