Run multiple JAX model evaluations to obtain a timing that excludes the compilation time; this should be more indicative of the time required to run inference on many proteins (default: 'false')
```
An example batch script using the helper script is:
```
#!/bin/bash
#SBATCH -c 24
#SBATCH -p gpu
#SBATCH --gres=gpu:1
#SBATCH --gres-flags=enforce-binding
#SBATCH --mem 200G
#SBATCH -t 6:00:00
module purge
module load singularityce
export SINGULARITY_BINDPATH="/scratch,/dcsrsoft,/users,/work,/reference"
bash /dcsrsoft/singularity/containers/run_alphafold_032e2f2.sh -d /reference/alphafold/20221206 -t 2022-12-06 -n 24 -g true -f ./T1024.fasta -o /scratch/ulambda/alphafold/runtest
```
#### Alphafold without containers
Fans of Conda may also wish to check out [https://github.com/kalininalab/alphafold\_non\_docker](https://github.com/kalininalab/alphafold_non_docker). Just make sure to `module load gcc miniconda3` rather than following the exact procedure!
# Alphafold 3
**Disclaimer:** this page is provided for experimental support only!
**Disclaimer 2**: pay attention to the terms of use provided [here](https://github.com/google-deepmind/alphafold3/blob/main/WEIGHTS_TERMS_OF_USE.md)!
The project home page, where you can find the latest information, is [here](https://github.com/google-deepmind/alphafold3).
### Using Alphafold 3 through a container
The Apptainer/Singularity container for Alphafold 3 is available at `/dcsrsoft/singularity/containers/alphafold-v3.sif`.
As stated on the Github page, it is possible to test Alphafold 3 with the following JSON input (named `fold_input.json`):
```json
{
  "name": "2PV7",
  "sequences": [
    {
      "protein": {
        "id": ["A", "B"],
        "sequence": "GMRESYANENQFGFKTINSDIHKIVIVGGYGKLGGLFARYLRASGYPISILDREDWAVAESILANADVVIVSVPINLTLETIERLKPYLTENMLLADLTSVKREPLAKMLEVHTGAVLGLHPMFGADIASMAKQVVVRCDGRFPERYEWLLEQIQIWGAKIYQTNATEHDHNMTYIQALRHFSTFANGLHLSKQPINLANLLALSSPIYRLELAMIGRLFAQDAELYADIIMDKSENLAVIETLKQTYDEALTFFENNDRQGFIDAFHKVRDWFGDYSEQFLKESRQLLQQANDLKQG"
      }
    }
  ],
  "modelSeeds": [1],
  "dialect": "alphafold3",
  "version": 1
}
```
To ease the use of Alphafold 3, we have downloaded:
- the databases to `/reference/alphafold3/db`
- the model to `/reference/alphafold3/model`
Here is an example of a Slurm job that can be used to run Alphafold 3 with the above JSON file:
```bash
#!/bin/bash -l
#SBATCH --time 2:00:00
#SBATCH --nodes 1
#SBATCH --ntasks 1
#SBATCH --partition gpu
#SBATCH --gres gpu:1
#SBATCH --gres-flags enforce-binding
#SBATCH --cpus-per-task 8
#SBATCH --mem=64G
dcsrsoft use 20241118
module load apptainer
export APPTAINER_BINDPATH="/scratch,/work,/users,/reference"
mkdir -p output
apptainer run --nv /dcsrsoft/singularity/containers/alphafold-v3.sif --json_path=fold_input.json --output_dir=output --model_dir=/reference/alphafold3/model --db_dir=/reference/alphafold3/db
```
# CryoSPARC
First of all, if you plan to use CryoSPARC on the cluster, please contact us to get a port number (you will understand later why it's important).
CryoSPARC can be used on Curnagl and benefit from its Nvidia A100 GPUs. This page presents an installation in the /work storage location, so that it can be shared among the members of the same project. The purpose is to help you with the installation, but in case of problems, don't hesitate to look at the [official documentation](https://guide.cryosparc.com/setup-configuration-and-management/how-to-download-install-and-configure).
## 1. Get a license
A free license can be obtained for non-commercial use from [Structura Biotechnology](https://guide.cryosparc.com/setup-configuration-and-management/how-to-download-install-and-configure/obtaining-a-license-id).
You will receive an email containing your license ID. It is similar to:
235e3142-d2b0-17eb-c43a-9c2461c1234d
## 2. Prerequisites
Before starting the installation we suppose that:
- DCSR gave you the following port number: 45678
- you want to install Cryosparc to the following location: /work/FAC/FBM/DMF/ulambda/cryosparc
- your license ID is: 235e3142-d2b0-17eb-c43a-9c2461c1234d
Obviously, you must not use these example values; they must be adapted to your own setup.
## 3. Install CryoSPARC
First, connect to the Curnagl login node using your favourite SSH client and follow the next steps.
#### Define the 3 prerequisite variables
```shell
export LICENSE_ID="235e3142-d2b0-17eb-c43a-9c2461c1234d"
export CRYOSPARC_ROOT=/work/FAC/FBM/DMF/ulambda/cryosparc
export CRYOSPARC_PORT=45678
```
#### Create some directories and download the packages
```shell
mkdir -p $CRYOSPARC_ROOT
mkdir -p $CRYOSPARC_ROOT/database
mkdir -p $CRYOSPARC_ROOT/scratch
mkdir -p $CRYOSPARC_ROOT/curnagl_config
cd $CRYOSPARC_ROOT
curl -L https://get.cryosparc.com/download/master-latest/$LICENSE_ID -o cryosparc_master.tar.gz
curl -L https://get.cryosparc.com/download/worker-latest/$LICENSE_ID -o cryosparc_worker.tar.gz
tar xf cryosparc_master.tar.gz
tar xf cryosparc_worker.tar.gz
```
#### Create `$CRYOSPARC_ROOT/curnagl_config/cluster_info.json`
Use your favourite editor to fill the file with the following content:
```JSON
{
    "qdel_cmd_tpl": "scancel {{ cluster_job_id }}",
    "worker_bin_path": "/work/FAC/FBM/DMF/ulambda/cryosparc/cryosparc_worker/bin/cryosparcw",
    "title": "curnagl",
    "cache_path": "/work/FAC/FBM/DMF/ulambda/cryosparc/scratch",
    "qinfo_cmd_tpl": "sinfo --format='%.8N %.6D %.10P %.6T %.14C %.5c %.6z %.7m %.7G %.9d %20E'",
    "qsub_cmd_tpl": "sbatch {{ script_path_abs }}",
    "qstat_cmd_tpl": "squeue -j {{ cluster_job_id }}",
    "cache_quota_mb": 1000000,
    "send_cmd_tpl": "{{ command }}",
    "cache_reserve_mb": 10000,
    "name": "curnagl"
}
```
Pay attention to the `worker_bin_path` and `cache_path` variables: they must be adapted to your setup. `cache_reserve_mb` and `cache_quota_mb` might also have to be modified, depending on your needs.
#### Create `$CRYOSPARC_ROOT/curnagl_config/cluster_script.sh`
Use your favourite editor to fill the file with the following content:
```shell
#!/bin/bash
#SBATCH --job-name=cryosparc_{{ project_uid }}_{{ job_uid }}
#SBATCH --partition={{ "gpu" if num_gpu > 0 else "cpu" }}
#SBATCH --time=12:00:00
#SBATCH --output={{ job_log_path_abs }}
#SBATCH --error={{ job_log_path_abs }}
#SBATCH --nodes=1
#SBATCH --mem={{ (ram_gb*1024*2)|int }}M
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task={{ num_cpu }}
#SBATCH --gres=gpu:{{ num_gpu }}
#SBATCH --gres-flags=enforce-binding
module load gcc cuda
# GPU device indices are 0-based: scan devices 0-15 and keep those
# without a running compute process.
available_devs=""
for devidx in $(seq 0 15); do
    if [[ -z $(nvidia-smi -i $devidx --query-compute-apps=pid --format=csv,noheader) ]]; then
        if [[ -z "$available_devs" ]]; then
            available_devs=$devidx
        else
            available_devs=$available_devs,$devidx
        fi
    fi
done
export CUDA_VISIBLE_DEVICES=$available_devs
srun {{ run_cmd }}
```
#### Install CryoSPARC master
```
cd $CRYOSPARC_ROOT/cryosparc_master
./install.sh --license $LICENSE_ID --hostname curnagl --dbpath $CRYOSPARC_ROOT/database --port $CRYOSPARC_PORT
```
At the end of the installation process, the installer asks whether you want to modify your `~/.bashrc` file; please answer yes.
#### Start CryoSPARC and create a user
```
export PATH=$CRYOSPARC_ROOT/cryosparc_master/bin:$PATH
cryosparcm start
cryosparcm createuser --email "ursula.lambda@unil.ch" --password "ursulabestpassword" --username "ulambda" --firstname "Ursula" --lastname "Lambda"
```
Of course, when creating the user, you have to use appropriate information; in particular, the password must not be your UNIL password.
#### Install CryoSPARC worker
First you have to connect to a GPU node:
```shell
Sinteractive -G1 -m8G
```
Once you are connected to the node:
```shell
export LICENSE_ID="235e3142-d2b0-17eb-c43a-9c2461c1234d"
export CRYOSPARC_ROOT=/work/FAC/FBM/DMF/ulambda/cryosparc
module load gcc cuda
cd $CRYOSPARC_ROOT/cryosparc_worker
./install.sh --license $LICENSE_ID --cudapath $CUDA_HOME
```
At the end of the process, you can logout.
#### Configure the cluster workers
```shell
cd $CRYOSPARC_ROOT/curnagl_config
cryosparcm cluster connect
```
## 4. Connection to the web interface
You have to create a tunnel from your laptop to the Curnagl login node:
```shell
ssh -N -L 8080:localhost:45678 ulambda@curnagl.dcsr.unil.ch
```
Please note that the port 45678 **must** be modified according to the one that DCSR gave you, and ulambda **must** be replaced with your UNIL login.
Then you can open a web browser at the following address: [http://localhost:8080](http://localhost:8080).
![](https://wiki.unil.ch/ci/uploads/images/gallery/2022-01/image-1643304261513.png)
Here you have to use the credentials defined when you created a user.
## 5. Working with CryoSPARC
When you start working with CryoSPARC on Curnagl, you have to start it from the login node:
```shell
cryosparcm start
```
When you have finished, you should stop CryoSPARC in order to avoid wasting resources on Curnagl login node:
```shell
cryosparcm stop
```
# Compiling and running MPI codes
To illustrate the procedure we will compile and run an MPI hello world example from [mpitutorial.com](https://mpitutorial.com/). First we download the source code:
```
$ wget https://raw.githubusercontent.com/mpitutorial/mpitutorial/gh-pages/tutorials/mpi-hello-world/code/mpi_hello_world.c
```
### Compiling with GCC
To compile the code, we first need to load the gcc and mvapich2 modules:
```
$ module load gcc mvapich2
```
Then we can produce the executable called `mpi_hello_world` by compiling the source code `mpi_hello_world.c`:
```
$ mpicc mpi_hello_world.c -o mpi_hello_world
```
The `mpicc` tool is a wrapper around the gcc compiler that adds the correct options for compiling and linking MPI codes; if you are curious, you can run `mpicc -show` to see what it does.
To run the executable we create a Slurm submission script called `run_mpi_hello_world.sh`, where we ask to run a total of 4 MPI tasks with (at max) 2 tasks per node:
```
#!/bin/bash
#SBATCH --time 00-00:05:00
#SBATCH --mem=2G
#SBATCH --ntasks 4
#SBATCH --ntasks-per-node 2
#SBATCH --cpus-per-task 1
module purge
module load gcc
module load mvapich2
module list
EXE=mpi_hello_world
[ ! -f $EXE ] && echo "EXE $EXE not found." && exit 1
srun $EXE
```
Finally, we submit our MPI job with:
```
$ sbatch run_mpi_hello_world.sh
```
Upon completion you should get something like:
```
...
Hello world from processor dna001.curnagl, rank 1 out of 4 processors
Hello world from processor dna001.curnagl, rank 3 out of 4 processors
Hello world from processor dna004.curnagl, rank 0 out of 4 processors
Hello world from processor dna004.curnagl, rank 2 out of 4 processors
```
It is important to check that you have a single group of 4 processors and not 4 groups of 1 processor. If that is the case, you can now compile and run your own MPI application.
The important part of the script is `srun $EXE`, as MPI jobs must be started with a job launcher in order to run multiple processes on multiple nodes.
# Deep Learning with GPUs
The training phase of your deep learning model may be very time consuming. To accelerate this process you may want to use GPUs, and you will need to install the deep learning packages, such as Keras or PyTorch, properly. Here is a short documentation on how to install some well known deep learning packages in Python. If you encounter any problem during the installation, or if you need to install other deep learning packages (in Python, R or other programming languages), please send an email to helpdesk@unil.ch with the subject DCSR: Deep Learning package installation, and we will try to help you.
### TensorFlow and Keras
We will install the TensorFlow 2's implementation of the Keras API (tf.keras); see [https://keras.io/about/](https://keras.io/about/)
To install the packages in your work directory:
```
cd /work/PATH_TO_YOUR_PROJECT
```
Log into a GPU node:
```
Sinteractive -m 4G -G 1
```
Check that the GPU is visible:
```
nvidia-smi
```
If it works properly you should see a message including an NVIDIA table. If you instead receive an error message such as "nvidia-smi: command not found" it means there is a problem.
To use TensorFlow on NVIDIA GPUs we recommend the use of NVIDIA containers, which include TensorFlow and its dependencies, such as CUDA and cuDNN, that are necessary for GPU acceleration. The NVIDIA containers also include various Python libraries, and Python itself, in such a way that everything is compatible with the version of TensorFlow you choose. Nevertheless, if you prefer to use the virtual environment method, please look at the instructions in the comments below.
```
module load singularityce/4.1.0
export SINGULARITY_BINDPATH="/scratch,/dcsrsoft,/users,/work,/reference"
```
We have already downloaded several versions of TensorFlow:
```
/dcsrsoft/singularity/containers/tensorflow/tensorflow-ngc-24.05-2.15.sif
/dcsrsoft/singularity/containers/tensorflow/tensorflow-ngc-24.01-2.14.sif
/dcsrsoft/singularity/containers/tensorflow/tensorflow-ngc-23.10-2.13.sif
/dcsrsoft/singularity/containers/tensorflow/tensorflow-ngc-23.07-2.12.sif
/dcsrsoft/singularity/containers/tensorflow/tensorflow-ngc-23.03-2.11.sif
/dcsrsoft/singularity/containers/tensorflow/tensorflow-ngc-22.12-2.10.sif
```
Here the last two numbers indicate the TensorFlow version, for example "tensorflow-ngc-24.05-2.15.sif" corresponds to TensorFlow version "2.15". In case you want to use another version, see the instructions in the comments below.
To run it:
```
singularity run --nv /dcsrsoft/singularity/containers/tensorflow/tensorflow-ngc-24.05-2.15.sif
```
You may receive a few error messages such as “not a valid test operator”, but this is ok and should not cause any problem. You should see a message by NVIDIA including the TensorFlow version. The prompt should now start with "Singularity>" emphasising that you are working within a singularity container.
To check that TensorFlow was properly installed:
```
Singularity> python -c 'import tensorflow; print(tensorflow.__version__)'
```
There might be a few warning messages such as "Unable to register", but this is ok, and the output should be something like "2.15.0".
To confirm that TensorFlow is using the GPU:
```
Singularity> python -c 'import tensorflow as tf; gpus = tf.config.list_physical_devices("GPU"); print("Num GPUs Available: ", len(gpus)); print("GPUs: ", gpus)'
```
You can check the list of python libraries available:
```
Singularity> pip list
```
Notice that on top of TensorFlow several well known libraries, such as "notebook", "numpy", "pandas", "scikit-learn" and "scipy", were installed in the container. The great news here is that NVIDIA made sure that all these libraries were compatible with TensorFlow so there should not be any version incompatibilities.
If necessary you may install extra packages that your deep learning code will use. For that you should create a virtual environment. Here we will call it "venv\_tensorflow\_gpu", but you may choose another name:
```
Singularity> python -m venv --system-site-packages venv_tensorflow_gpu
```
Activate the virtual environment:
```
Singularity> source venv_tensorflow_gpu/bin/activate
```
To install for example "tf\_keras\_vis":
```
(venv_tensorflow_gpu) Singularity> pip install tf_keras_vis
```
Deactivate your virtual environment and logout from singularity and the GPU node:
```
(venv_tensorflow_gpu) Singularity> deactivate
Singularity> exit
exit
```
#### Comments
##### Reproducibility
The container version specifies all Python libraries versions, ensuring consistency across different environments. If you also use a virtual environment and want to make your installation more reproducible, you may proceed as follows:
1\. Create a file called "requirements.txt" and write the package names inside. You may also specify the package versions. For example:
```
tf_keras_vis==0.8.7
```
2\. Proceed as above, but instead of installing the packages individually, type
```
pip install -r requirements.txt
```
##### Build your own container
Go to the webpage: [https://docs.nvidia.com/deeplearning/frameworks/tensorflow-release-notes/index.html](https://docs.nvidia.com/deeplearning/frameworks/tensorflow-release-notes/index.html)
Click on the latest release, which is "TensorFlow Release 24.05" at the time we're writing this documentation, and scroll down to see the table "NVIDIA TensorFlow Container Versions". It will show you the container versions and associated TensorFlow versions. For example, if you want to use TensorFlow 2.14 you could select the container 24.01.
Go to the webpage: [https://catalog.ngc.nvidia.com/orgs/nvidia/containers/tensorflow/tags](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/tensorflow/tags)
Select the appropriate container, for 24.01 it is "nvcr.io/nvidia/tensorflow:24.01-tf2-py3". Do not choose any "-igpu" containers because they do not work on the UNIL clusters.
Choose a name for the container, for example "tensorflow-ngc-24.01-tf2.14.sif", and create the following file by using your favorite editor:
```
cd /scratch/username/
vi tensorflow-ngc.def
```
```
Bootstrap: docker
From: nvcr.io/nvidia/tensorflow:24.01-tf2-py3
%post
apt-get update && apt -y upgrade
PYTHONVERSION=$(python3 --version|cut -f2 -d\ | cut -f-2 -d.)
apt-get install -y bash wget gzip locales python$PYTHONVERSION-venv git
sed -i '/^#.* en_.*.UTF-8 /s/^#//' /etc/locale.gen
sed -i '/^#.* fr_.*.UTF-8 /s/^#//' /etc/locale.gen
locale-gen
```
Note that if you choose a different container version, you will need to replace "24.01" with the appropriate container version in the script.
You can now download the container:
```
module load singularityce/4.1.0
export SINGULARITY_DISABLE_CACHE=1
singularity build --fakeroot tensorflow-ngc-24.01-tf2.14.sif tensorflow-ngc.def
mv tensorflow-ngc-24.01-tf2.14.sif /work/PATH_TO_YOUR_PROJECT
```
That's it. You can then use it as explained above.
Warning: Do not build a singularity container while logged into a GPU node; it will not work. But of course you will need to log into a GPU node to use the container, as shown below.
##### Use a virtual environment
Using containers is convenient because it is often difficult to install TensorFlow directly within a virtual environment. The reason is that TensorFlow has several dependencies and we must load or install the correct versions of them. Here are some instructions:
```
cd /work/PATH_TO_YOUR_PROJECT
Sinteractive -m 4G -G 1
module load python/3.10.13 tk/8.6.11 tcl/8.6.12
python -m venv venv_tensorflow_gpu
source venv_tensorflow_gpu/bin/activate
pip install "tensorflow[and-cuda]==2.14.0" "numpy<2"
```
#### Run your deep learning code
To test your deep learning code (maximum 1h), say "my\_deep\_learning\_code.py", you may use the interactive mode:
```
cd /PATH_TO_YOUR_CODE/
Sinteractive -m 4G -G 1
module load singularityce/4.1.0
export SINGULARITY_BINDPATH="/scratch,/dcsrsoft,/users,/work,/reference"
singularity run --nv /dcsrsoft/singularity/containers/tensorflow/tensorflow-ngc-24.05-2.15.sif
source /work/PATH_TO_YOUR_PROJECT/venv_tensorflow_gpu/bin/activate
```
Run your code:
```
python my_deep_learning_code.py
```
or copy/paste your code inside a python environment:
```
python
# then copy/paste your code, for example:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.utils import to_categorical
# etc.
```
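To make this concrete, here is a minimal, self-contained script that could serve as "my\_deep\_learning\_code.py"; the dataset and layer sizes are invented for the example:
```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.utils import to_categorical

# A small random classification dataset (illustration only).
x = np.random.random((256, 16)).astype("float32")
y = to_categorical(np.random.randint(0, 4, size=(256,)), num_classes=4)

# A small fully-connected network.
model = Sequential([
    Dense(32, activation="relu", input_shape=(16,)),
    Dense(4, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.fit(x, y, batch_size=32, epochs=3)

# Confirm which GPUs TensorFlow can see.
print("GPUs visible:", tf.config.list_physical_devices("GPU"))
```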
Once you have finished testing your code, you must close your interactive session (by typing exit), and then run it on the cluster by using an sbatch script, say "my\_sbatch\_script.sh":
```
#!/bin/bash -l
#SBATCH --account your_account_id
#SBATCH --mail-type ALL
#SBATCH --mail-user firstname.surname@unil.ch
#SBATCH --chdir /scratch/username/
#SBATCH --job-name my_deep_learning_job
#SBATCH --output my_deep_learning_job.out
#SBATCH --partition gpu
#SBATCH --gres gpu:1
#SBATCH --gres-flags enforce-binding
#SBATCH --nodes 1
#SBATCH --ntasks 1
#SBATCH --cpus-per-task 1
#SBATCH --mem 10G
#SBATCH --time 01:00:00
module load singularityce/4.1.0
export SINGULARITY_BINDPATH="/scratch,/dcsrsoft,/users,/work,/reference"
# To use only singularity (keep ONE of the two following definitions)
export singularity_python="singularity run --nv /dcsrsoft/singularity/containers/tensorflow/tensorflow-ngc-24.05-2.15.sif python"
# To use singularity together with your virtual environment (this definition overrides the one above)
export singularity_python="singularity run --nv /dcsrsoft/singularity/containers/tensorflow/tensorflow-ngc-24.05-2.15.sif /work/PATH_TO_YOUR_PROJECT/venv_tensorflow_gpu/bin/python"
$singularity_python /PATH_TO_YOUR_CODE/my_deep_learning_code.py
```
To launch your job:
```
cd PATH_TO_YOUR_SBATCH_SCRIPT/
sbatch my_sbatch_script.sh
```
Remember that you should write the output files in your /scratch directory.
#### Multi-GPU parallelism
If you want to use a single GPU, you do not need to tell Keras to use the GPU. Indeed, if a GPU is available, Keras will use it automatically.
On the other hand, if you want to use 2 (or more) GPUs (on the same node), you need to use a special TensorFlow function, called "tf.distribute.MirroredStrategy", in your python code "my\_deep\_learning\_code.py"; see the Keras documentation: [https://keras.io/guides/distributed\_training/](https://keras.io/guides/distributed_training/). If no devices are specified in the constructor argument of the strategy, it will use all the available GPUs. If no GPUs are found, it will use the available CPUs.
This function implements single-machine multi-GPU data parallelism. It works in the following way: divide the batch data into multiple sub-batches, apply a model copy to each sub-batch, where every model copy is executed on a dedicated GPU, and finally concatenate the results (on the CPU) into one big batch. For example, if your batch\_size is 64 and you use 2 GPUs, the input data is divided into 2 sub-batches of 32 samples, each sub-batch is processed on one GPU, and the full batch of 64 processed samples is returned. This typically yields a quasi-linear speedup.
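As an illustration, here is a minimal sketch of the change in your code; the model and the random data are invented for the example:
```python
import numpy as np
import tensorflow as tf

# With no constructor arguments, MirroredStrategy uses all visible GPUs
# (and falls back to the CPU if none are found).
strategy = tf.distribute.MirroredStrategy()
print("Number of replicas:", strategy.num_replicas_in_sync)

# Build and compile the model inside the strategy scope so that its
# variables are mirrored across the GPUs.
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu", input_shape=(20,)),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# The training call is unchanged: each batch of 64 is split across the replicas.
x = np.random.random((640, 20)).astype("float32")
y = np.random.randint(0, 10, size=(640,))
model.fit(x, y, batch_size=64, epochs=1)
```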
And the sbatch script must contain the line:
```
#SBATCH --gres gpu:2
```
### TensorBoard
To use TensorBoard on Curnagl, you need to modify your code as explained in [https://keras.io/api/callbacks/tensorboard/](https://keras.io/api/callbacks/tensorboard/).
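For reference, a minimal, self-contained sketch of such a modification; the model and data are dummies for illustration, and the callback writes its event files to the "logs" directory used below:
```python
import numpy as np
import tensorflow as tf

# Dummy model and data, for illustration only.
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
model.compile(optimizer="adam", loss="mse")
x = np.random.random((32, 4)).astype("float32")
y = np.random.random((32, 1)).astype("float32")

# The TensorBoard callback writes event files under ./logs,
# the directory you will point TensorBoard at below.
tb = tf.keras.callbacks.TensorBoard(log_dir="./logs")
model.fit(x, y, epochs=2, callbacks=[tb])
```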
After your TensorBoard "logs" directory has been created, you need to proceed as follows:
```
[/scratch/pjacquet] Sinteractive -m 4G -G 1
```
```
Sinteractive is running with the following options:
--gres=gpu:1 -c 1 --mem 4G -J interactive -p interactive -t 1:00:00 --x11
salloc: Granted job allocation 2466209
salloc: Waiting for resource configuration
salloc: Nodes dnagpu001 are ready for job
```
You need to remember the GPU node's name dnagpuXXX. Here it is dnagpu001.
Then
```
[/scratch/pjacquet] module load singularityce/4.1.0
[/scratch/pjacquet] export SINGULARITY_BINDPATH="/scratch,/dcsrsoft,/users,/work,/reference"
[/scratch/pjacquet] singularity run --nv /dcsrsoft/singularity/containers/tensorflow/tensorflow-ngc-24.05-2.15.sif
Singularity> source /work/PATH_TO_YOUR_PROJECT/venv_tensorflow_gpu/bin/activate
(venv_tensorflow_gpu) Singularity> ls
logs
(venv_tensorflow_gpu) Singularity> tensorboard --logdir=./logs --port=6006
```
You will see the following message:
```
Serving TensorBoard on localhost; to expose to the network, use a proxy or pass --bind_all
TensorBoard 2.6.0 at http://localhost:6006/ (Press CTRL+C to quit)
```
On your laptop, you need to type:
```
ssh -J curnagl.dcsr.unil.ch -L 6006:localhost:6006 dnagpuXXX
```
where dnagpuXXX is the GPU node's name you used to launch TensorBoard (above it was dnagpu001).
Finally, on your laptop, you may use any web browser (e.g. Chrome) to open the page [http://localhost:6006](http://localhost:6006). You should then see TensorBoard with the information located in the "logs" folder.
### PyTorch
To install the packages in your work directory:
```
cd /work/PATH_TO_YOUR_PROJECT
```
Log into a GPU node:
```
Sinteractive -m 4G -G 1
```
Check that the GPU is visible:
```
nvidia-smi
```
If it works properly you should see a message including an NVIDIA table. If you instead receive an error message such as "nvidia-smi: command not found" it means there is a problem.
To use PyTorch on NVIDIA GPUs we recommend the use of NVIDIA containers, which include PyTorch and its dependencies, such as CUDA and cuDNN, that are necessary for GPU acceleration. The NVIDIA containers also include various Python libraries, and Python itself, in such a way that everything is compatible with the version of PyTorch you choose. Nevertheless, if you prefer to use the virtual environment method, please look at the instructions in the comments below.
```
module load singularityce/4.1.0
export SINGULARITY_BINDPATH="/scratch,/dcsrsoft,/users,/work,/reference"
```
We have already downloaded several versions of PyTorch:
```
/dcsrsoft/singularity/containers/pytorch/pytorch-ngc-24.05-2.4.sif
/dcsrsoft/singularity/containers/pytorch/pytorch-ngc-24.04-2.3.sif
/dcsrsoft/singularity/containers/pytorch/pytorch-ngc-24.01-2.2.sif
/dcsrsoft/singularity/containers/pytorch/pytorch-ngc-23.10-2.1.sif
/dcsrsoft/singularity/containers/pytorch/pytorch-ngc-23.05-2.0.sif
```
Here the last two numbers indicate the PyTorch version, for example "pytorch-ngc-24.05-2.4.sif" corresponds to PyTorch version "2.4". In case you want to use another version, see the instructions in the comments below.
To run it:
```
singularity run --nv /dcsrsoft/singularity/containers/pytorch/pytorch-ngc-24.05-2.4.sif
```
You may receive a few error messages such as “not a valid test operator”, but this is ok and should not cause any problem. You should see a message by NVIDIA including the PyTorch version. The prompt should now start with "Singularity>" emphasising that you are working within a singularity container.
To check that PyTorch was properly installed:
```
Singularity> python -c 'import torch; print(torch.__version__)'
```
There might be a few warning messages such as "Unable to register", but this is ok, and the output should be something like "2.4.0".
To confirm that PyTorch is using the GPU:
```
Singularity> python -c 'import torch; cuda_available = torch.cuda.is_available(); num_gpus = torch.cuda.device_count(); gpus = [torch.cuda.get_device_name(i) for i in range(num_gpus)]; print("Num GPUs Available: ", num_gpus); print("GPUs: ", gpus)'
```
You can check the list of python libraries available:
```
Singularity> pip list
```
Notice that on top of PyTorch several well known libraries, such as "notebook", "numpy", "pandas", "scikit-learn" and "scipy", were installed in the container. The great news here is that NVIDIA made sure that all these libraries were compatible with PyTorch so there should not be any version incompatibilities.
If necessary you may install extra packages that your deep learning code will use. For that you should create a virtual environment. Here we will call it "venv\_pytorch\_gpu", but you may choose another name:
```
Singularity> python -m venv --system-site-packages venv_pytorch_gpu
```
Activate the virtual environment:
```
Singularity> source venv_pytorch_gpu/bin/activate
```
To install for example "captum":
```
(venv_pytorch_gpu) Singularity> pip install captum
```
Deactivate your virtual environment and logout from singularity and the GPU node:
```
(venv_pytorch_gpu) Singularity> deactivate
Singularity> exit
exit
```
#### Comments
##### Reproducibility
The container version specifies all Python libraries versions, ensuring consistency across different environments. If you also use a virtual environment and want to make your installation more reproducible, you may proceed as follows:
1\. Create a file called "requirements.txt" and write the package names inside. You may also specify the package versions. For example:
```
captum==0.7.0
```
2\. Proceed as above, but instead of installing the packages individually, type
```
pip install -r requirements.txt
```
##### Build your own container
Go to the webpage: [https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/index.html](https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/index.html)
Click on the latest release, which is "PyTorch Release 24.05" at the time we're writing this documentation, and scroll down to see the table "NVIDIA PyTorch Container Versions". It will show you the container versions and associated PyTorch versions. For example, if you want to use PyTorch 2.4 you could select the container 24.05.
Go to the webpage: [https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch/tags](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch/tags)
Select the appropriate container, for 24.05 it is "nvcr.io/nvidia/pytorch:24.05-py3". Do not choose any "-igpu" containers because they do not work on the UNIL clusters.
Choose a name for the container, for example "pytorch-ngc-24.05-2.4.sif", and create the following file by using your favorite editor:
```
cd /scratch/username/
vi pytorch-ngc.def
```
```
Bootstrap: docker
From: nvcr.io/nvidia/pytorch:24.05-py3
%post
apt-get update && apt -y upgrade
PYTHONVERSION=$(python3 --version|cut -f2 -d\ | cut -f-2 -d.)
apt-get install -y bash wget gzip locales python$PYTHONVERSION-venv git
sed -i '/^#.* en_.*.UTF-8 /s/^#//' /etc/locale.gen
sed -i '/^#.* fr_.*.UTF-8 /s/^#//' /etc/locale.gen
locale-gen
```
Note that if you choose a different container version, you will need to replace "24.05" with the appropriate container version in the script.
You can now download the container:
```
module load singularityce/4.1.0
export SINGULARITY_DISABLE_CACHE=1
singularity build --fakeroot pytorch-ngc-24.05-2.4.sif pytorch-ngc.def
mv pytorch-ngc-24.05-2.4.sif /work/PATH_TO_YOUR_PROJECT
```
That's it. You can then use it as explained above.
Warning: Do not build a singularity container while logged into a GPU node; it will not work. But of course you will need to log into a GPU node to use the container, as shown below.
##### Use a virtual environment
Using containers is convenient because it is often difficult to install PyTorch directly within a virtual environment. The reason is that PyTorch has several dependencies and we must load or install the correct versions of them. Here are some instructions:
```
cd /work/PATH_TO_YOUR_PROJECT
Sinteractive -m 4G -G 1
module load python/3.10.13 cuda/11.8.0 cudnn/8.7.0.84-11.8
python -m venv venv_pytorch_gpu
source venv_pytorch_gpu/bin/activate
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
```
#### Run your deep learning code
To test your deep learning code (maximum 1h), say "my\_deep\_learning\_code.py", you may use the interactive mode:
```
cd /PATH_TO_YOUR_CODE/
Sinteractive -m 4G -G 1
module load singularityce/4.1.0
export SINGULARITY_BINDPATH="/scratch,/dcsrsoft,/users,/work,/reference"
singularity run --nv /dcsrsoft/singularity/containers/pytorch/pytorch-ngc-24.05-2.4.sif
source /work/PATH_TO_YOUR_PROJECT/venv_pytorch_gpu/bin/activate
```
Run your code:
```
python my_deep_learning_code.py
```
or copy/paste your code inside a python environment:
```
python
# then copy/paste your code
```
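To make this concrete, here is a minimal, self-contained script that could serve as "my\_deep\_learning\_code.py"; the model and the random data are invented for the example:
```python
import torch
import torch.nn as nn

# Use the GPU if one is visible, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Using device:", device)

# A tiny regression model trained on random data (illustration only).
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1)).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

x = torch.randn(256, 16, device=device)
y = torch.randn(256, 1, device=device)

for epoch in range(3):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    print(f"epoch {epoch}: loss = {loss.item():.4f}")
```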
Once you have finished testing your code, you must close your interactive session (by typing exit), and then run it on the cluster by using an sbatch script, say "my\_sbatch\_script.sh":
```
#!/bin/bash -l
#SBATCH --account your_account_id
#SBATCH --mail-type ALL
#SBATCH --mail-user firstname.surname@unil.ch
#SBATCH --chdir /scratch/username/
#SBATCH --job-name my_deep_learning_job
#SBATCH --output my_deep_learning_job.out
#SBATCH --partition gpu
#SBATCH --gres gpu:1
#SBATCH --gres-flags enforce-binding
#SBATCH --nodes 1
#SBATCH --ntasks 1
#SBATCH --cpus-per-task 1
#SBATCH --mem 10G
#SBATCH --time 01:00:00
module load singularityce/4.1.0
export SINGULARITY_BINDPATH="/scratch,/dcsrsoft,/users,/work,/reference"
# To use only singularity (keep ONE of the two following definitions)
export singularity_python="singularity run --nv /dcsrsoft/singularity/containers/pytorch/pytorch-ngc-24.05-2.4.sif python"
# To use singularity together with your virtual environment (this definition overrides the one above)
export singularity_python="singularity run --nv /dcsrsoft/singularity/containers/pytorch/pytorch-ngc-24.05-2.4.sif /work/PATH_TO_YOUR_PROJECT/venv_pytorch_gpu/bin/python"
$singularity_python /PATH_TO_YOUR_CODE/my_deep_learning_code.py
```
To launch your job:
```
cd $HOME/PATH_TO_YOUR_SBATCH_SCRIPT/
sbatch my_sbatch_script.sh
```
### TensorBoard
You may use TensorBoard with PyTorch by following the documentation at
[https://pytorch.org/tutorials/recipes/recipes/tensorboard\_with\_pytorch.html](https://pytorch.org/tutorials/recipes/recipes/tensorboard_with_pytorch.html)
and by slightly adapting the instructions above (see TensorBoard in the TensorFlow and Keras section).
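For reference, a minimal sketch using PyTorch's SummaryWriter; the logged values are dummies, and the "logs" directory is the one you point TensorBoard at:
```python
import torch
from torch.utils.tensorboard import SummaryWriter

# Event files are written under ./logs.
writer = SummaryWriter(log_dir="./logs")
for step in range(100):
    # Log a dummy scalar under the tag "train/loss".
    writer.add_scalar("train/loss", torch.rand(1).item(), step)
writer.close()
```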
# Software local installation
This page gives an example of a local installation of software, i.e. software that will only be available to you. For simplicity we assume here that the software you want to install is available as a single binary file.
To be executable from anywhere, the binary must be placed in a directory listed in your PATH environment variable. Here we use a directory called "bin" in your home directory:
```
$ mkdir ~/bin
```
Then, edit your ~/.bashrc file to add the newly created directory to your search path by adding this line:
`export PATH=~/bin:$PATH`
Then reload your .bashrc to take into account this change:
```
$ source ~/.bashrc
```
Now, you can simply copy your binary to ~/bin and it will be available from anywhere for execution:
```
$ cp /path/to/downloaded/my_binary ~/bin
```
Finally, make sure your binary is executable:
```
$ chmod +x ~/bin/my_binary
```
# Rstudio on the Urblauna cluster
Rstudio can be run on the Urblauna cluster from within a singularity container, with an interactive interface provided on the web browser of a [Guacamole](https://u-web.dcsr.unil.ch/) session.
Running interactively with Rstudio on the clusters is only meant for testing. Development must be carried out on the users' workstations, and production runs must be accomplished [from within R scripts/codes in batch mode](https://wiki.unil.ch/ci/books/high-performance-computing-hpc/page/r-on-the-clusters-old/).
The `Rstudio` command is now available in the `r-light` module. First make a reservation with `Sinteractive`, asking for the right amount of resources, then launch the command `Rstudio`.
### Procedure
```bash
Sinteractive # specify here the right amount of resources
module load r-light
Rstudio
```
The procedure below is now deprecated!
#### Preparatory steps on Curnagl side
A few operations have to be executed on the Curnagl cluster:
1. Create a directory in your /work project dedicated to be used as an R library, for instance:
`mkdir /work/FAC/FBM/DBC/mypi/project/R_ROOT`
2. Optional: install required R packages, for instance `ggplot2`:
`module load gcc r`
`export R_LIBS_USER=/work/FAC/FBM/DBC/mypi/project/R_ROOT`
Then launch `R` and run `install.packages("ggplot2")`.
#### The batch script
Create a file **rstudio-server.sbatch** with the following contents (it must be on the cluster, but the exact location does not matter):
```bash
#!/bin/bash -l
#SBATCH --account <<<ACCOUNT_NAME>>>
#SBATCH --job-name rstudio-server
#SBATCH --signal=USR2
#SBATCH --output=rstudio-server.job
#SBATCH --nodes 1
#SBATCH --ntasks 1
#SBATCH --cpus-per-task 1
#SBATCH --mem 8G
#SBATCH --time 02:00:00
#SBATCH --partition interactive
#SBATCH --export NONE
RLIBS_USER_DIR=<<<RLIBS_PATH>>>
RSTUDIO_CWD=~
RSTUDIO_SIF="/dcsrsoft/singularity/containers/rstudio-4.3.2.sif"
module load python singularityce
module load r
RLIBS_DIR=${R_ROOT}/rlib/R/library
module unload r
# Create temp directory for ephemeral content to bind-mount in the container
RSTUDIO_TMP=$(mktemp --tmpdir -d rstudio.XXX)
mkdir -p -m 700 \
${RSTUDIO_TMP}/run \
${RSTUDIO_TMP}/tmp \
${RSTUDIO_TMP}/var/lib/rstudio-server
mkdir -p ${RSTUDIO_CWD}/.R
cat > ${RSTUDIO_TMP}/database.conf < ${RSTUDIO_TMP}/rsession.sh <&2 <&2
exit $SINGULARITY_EXIT_CODE
```
You need to carefully replace, at the beginning of the file, the following elements:
- On line 3: **<<<ACCOUNT\_NAME>>>** must be replaced with the project id that was attributed to your PI for the given project
- On line 14: **<<<RLIBS\_PATH>>>** must be replaced with the **absolute path** (e.g. */work/FAC/.../R\_ROOT*) to the folder you created in the preparatory steps
#### Running Rstudio
Submit a job for running Rstudio from within the cluster with:
```
[me@urblauna ~]$ sbatch rstudio-server.sbatch
```
Once the job is running (you can check that with `Squeue`), a new file rstudio-server.job is automatically created. Its contents will give you instructions on how to proceed in order to start a new Rstudio remote session from Guacamole.
Note that the script above reserves 2 hours of run time; adjust the `--time` directive to your needs.
# DCSR GitLab service
**What is it?**
The DCSR hosted version control service ([https://gitlab.dcsr.unil.ch](https://gitlab.dcsr.unil.ch)) is primarily intended for the users of the "sensitive" data clusters which do not have direct internet access. It is not an official UNIL wide version control service!
It is accessible from both the sensitive data services and the UNIL network. From outside the UNIL network a VPN connection is required. It is open to all registered users of the DCSR facilities and is hosted on reliable hardware.
**Should I use it?**
If you are a user of the sensitive data clusters/services then the answer is yes.
For other users it may well be more convenient to use internet accessible services such as c4science.ch or GitHub.com as these allow for external collaborations and do not require VPN access or an account on the DCSR systems.
# Running Busco
A Singularity container is available for version 4.0.6 of Busco. To run it, you need to proceed as follows:
```
$ module load singularityce
$ export SINGULARITY_BINDPATH="/scratch,/users,/work"
```
Some configuration files included in the container must be copied to a writable location, so create a directory in your /scratch, e.g. called "busco\_config":
```
$ mkdir /path/to/busco_config
```
Then we copy the configuration files out of the container to the newly created directory:
```
$ singularity exec /dcsrsoft/singularity/containers/busco-4.0.6 cp -rv /opt/miniconda/config/. /path/to/busco_config
```
Now we need to set the AUGUSTUS\_CONFIG\_PATH environment variable to the newly created and populated busco\_config directory:
```
$ export AUGUSTUS_CONFIG_PATH=/path/to/busco_config
```
Finally, you should now be able to run a test dataset from busco (see [https://gitlab.com/ezlab/busco/-/tree/master/test\_data/eukaryota](https://gitlab.com/ezlab/busco/-/tree/master/test_data/eukaryota)):
```
$ curl -O https://gitlab.com/ezlab/busco/-/raw/master/test_data/eukaryota/genome.fna
```
And launch the analysis.
Note: `$AUGUSTUS_CONFIG_PATH` contains a copy of the default `config.ini` used here, so you can copy it, modify it, and pass it to the `--config` option of the following command:
```
$ singularity exec /dcsrsoft/singularity/containers/busco-4.0.6 busco --config /opt/miniconda/config/config.ini -i genome.fna -c 8 -m geno -f --out test_eukaryota
```
Then download the reference log:
```
curl -O https://gitlab.com/ezlab/busco/-/raw/master/test_data/eukaryota/expected_log.txt
```
And compare to the one you generated.
# SWITCHfilesender from the cluster
#### Switch Filesender
Filesender is a service provided by SWITCH to transfer files over HTTP. Normally files are uploaded via a web browser, but this is not possible from the DCSR clusters.
To avoid having to transfer the files to your local computer first, you can use the Filesender command line tools as explained below.
#### Configuring the CLI tools
Connect to [https://filesender.switch.ch](https://filesender.switch.ch) then go to the profile tab
![](https://wiki.unil.ch/ci/uploads/images/gallery/2022-01/screenshot-2022-01-13-at-15-14-02.png)
Then click on "Create API secret" to generate a code that will be used to authenticate you.
![](https://wiki.unil.ch/ci/uploads/images/gallery/2022-01/screenshot-2022-01-13-at-15-14-37.png)
This will generate a long string like:
`ab56bf28434d1fba1d5f6g3aaf8776e55fd722df205197`
This code should never be shared.
Then connect to Curnagl and run the following commands to download the CLI tool and the configuration:
```
cd
mkdir ~/.filesender
wget https://filesender.switch.ch/clidownload.php -O filesender.py
wget https://filesender.switch.ch/clidownload.php?config=1 -O ~/.filesender/filesender.py.ini
```
You will then need to edit the `~/.filesender/filesender.py.ini` file using your preferred tool.
You need to enter your username as shown in your Filesender profile and the API key that you generated.
*Note that at present, unlike the other Switch services, this is not your EduID account!*
```
[system]
base_url = https://filesender.switch.ch/filesender2/rest.php
default_transfer_days_valid = 20
[user]
username = Ursula.Lambda@unil.ch
apikey = ab56bf28434d1fba1d5f6g3aaf8776e55fd722df205197
```
#### Transferring files
Now that we have done this we can transfer files - note that the modules must be loaded in order to have a python with the required libraries.
```
[ulambda@login ~]$ module load gcc python
[ulambda@login ~]$ python3 filesender.py -p -r ethz.collaborator@protonmail.ch results.zip
Uploading: /users/ulambda/results.zip 0-5242880 0%
Uploading: /users/ulambda/results.zip 5242880-10485760 6%
Uploading: /users/ulambda/results.zip 10485760-15728640 11%
Uploading: /users/ulambda/results.zip 15728640-20971520 17%
Uploading: /users/ulambda/results.zip 20971520-26214400 23%
Uploading: /users/ulambda/results.zip 26214400-31457280 29%
Uploading: /users/ulambda/results.zip 31457280-36700160 34%
Uploading: /users/ulambda/results.zip 36700160-41943040 40%
Uploading: /users/ulambda/results.zip 41943040-47185920 46%
Uploading: /users/ulambda/results.zip 47185920-52428800 52%
Uploading: /users/ulambda/results.zip 52428800-57671680 57%
Uploading: /users/ulambda/results.zip 57671680-62914560 63%
Uploading: /users/ulambda/results.zip 62914560-68157440 69%
Uploading: /users/ulambda/results.zip 68157440-73400320 74%
Uploading: /users/ulambda/results.zip 73400320-78643200 80%
Uploading: /users/ulambda/results.zip 78643200-83886080 86%
Uploading: /users/ulambda/results.zip 83886080-89128960 92%
Uploading: /users/ulambda/results.zip 89128960-91575794 97%
Uploading: /users/ulambda/results.zip 91575794 100%
```
A mail will be sent to the recipient, who can then download the file.
# Filetransfer from the cluster
#### filetransfer.dcsr.unil.ch
[https://filetransfer.dcsr.unil.ch](https://filetransfer.dcsr.unil.ch) is a service provided by the DCSR to allow you to transfer files to and from external collaborators.
This is an alternative to SWITCHFileSender and the space available is 6TB with a maximum per user limit of 4TB - this space is shared between all users so it is unlikely that you will be able to transfer 4TB of data at once.
The filetransfer service is based on LiquidFiles and the user guide is available at [https://man.liquidfiles.com/userguide.html](https://man.liquidfiles.com/userguide.html)
In order to transfer files to and from the DCSR clusters without using the web browser it is also possible to use the CLI tools as explained below
#### Configuring the service
First you need to connect to the web interface at [https://filetransfer.dcsr.unil.ch](https://filetransfer.dcsr.unil.ch) and connect using your UNIL username (e.g. ulambda for Ursula Lambda) and password. This is not your EduID password but rather the one you use to connect to the clusters.
Once connected go to settings (the cog symbol in the top right corner) then the API tab
![](https://wiki.unil.ch/ci/uploads/images/gallery/2022-01/screenshot-2022-01-25-at-10-11-35.png)
The API key is how you authenticate from the clusters and this secret should never be shared. It can be reset via the yellow button.
#### Transferring files from the cluster
Connect to the login node and load the liquidfiles module
```
[ulambda@login ~]$ module load liquidfiles
[ulambda@login ~]$ liquidfiles
Usage:
liquidfiles
Valid commands are:
attach Uploads given files to server.
attach_chunk Uploads given chunk of file to server.
delete_attachments Deletes the given attachments.
delete_filelink Deletes the given filelink.
download Download given files.
file_request Sends the file request to specified user.
filedrop Sends the file(s) by filedrop.
filelink Uploads given file and creates filelink on it.
filelinks Lists the available filelinks.
get_api_key Retrieves api key for the specified user.
messages Lists the available messages.
send Sends the file(s) to specified user.
Type 'liquidfiles help <command>' to see command specific options and usage.
Abnormal exit codes:
1 Command line arguments are invalid - Invalid command name, missing required argument, invalid value for specific argument.
2 CURL error - Can't connect to host, connection timeout, certificate check failure, etc.
3 Error during file upload - Invalid API key, Invalid filename, etc.
4 Error during file send to user.
5 Error in file system - Can't open file, etc.
```
For example to upload a file and create a file link
```
liquidfiles filelink --server=https://filetransfer.dcsr.unil.ch --api_key=9MUQeF5nG899lHdCtg myfile.dat
```
You can then connect to the web interface from your workstation to manage the files and send messages as required.
As preparing and uploading files can take a while, we recommend performing this in a tmux session: even if your connection to the cluster is lost, the process continues and you can reconnect later.
#### Transferring large files
***If using a single file upload doesn't work and it is not possible to split the data into multiple smaller files then the following information may be useful***
##### Staging the files
We recommend that you create TAR files containing the data you wish to transfer and stage this in your /scratch space. Depending on the data type it can be useful to compress it first.
```
$ cd /scratch/ulambda
$ mkdir mytransfer
$ cd mytransfer
$ tar -cvf mydata.tar /work/path/to/my/data
```
Then calculate the checksum of the file to be transferred:
```
$ sha256sum mydata.tar
7aac249b9ec0835361f44c84921a194e587a38daecadf302e9dec44386c9fb36 mydata.tar
```
##### Split the file and transfer chunks
Whilst it might be possible to transfer huge files in one upload, it isn't recommended and above ~100GB we recommend that you follow the procedure given below.
**Split the file into chunks**
```
$ split --verbose -d -a4 -b1G mydata.tar
creating file 'x0000'
creating file 'x0001'
creating file 'x0002'
creating file 'x0003'
..
..
creating file 'x0102'
```
In the staging directory this will create chunks of exactly 1 GB in size (the last one may be smaller). Here Ursula's file is 102.5 GB, so there are 103 chunks.
**Use a loop and the attach\_chunk command**
First we need to know how many files there are
```
$ ls x* | wc -l
103
```
This is because we need to tell the service how many chunks the file has been split into, so it knows when the upload is complete.
Now we note our API key and use the following bash loop (this can also be put in a script).
```
$ module load liquidfiles
$ for a in `seq -w 0 102`; do liquidfiles attach_chunk --server=https://filetransfer.dcsr.unil.ch --api_key=9MUQeF5nG899lHdCtg --chunk=$a --chunks=103 --filename=mydata.tar x0$a; done
Uploading chunk 'x0000'.
100% [================================================================================]
Current chunk uploaded successfully.
Uploading chunk 'x0001'.
100% [================================================================================]
Current chunk uploaded successfully.
..
Uploading chunk 'x0102'.
100% [================================================================================]
All chunks of file uploaded successfully. ID: FP0LAQ9FGFAosPNioe6ZyQ
```
Alternatively we can also use variables which makes the loop cleaner and easier to put in a script:
```
module load liquidfiles
SERVER=https://filetransfer.dcsr.unil.ch
KEY=9MUQeF5nG899lHdCtg
CHUNKS=103
MYFILE=mydata.tar
NC=`expr $CHUNKS - 1`
for a in `seq -w 0 $NC`; do liquidfiles attach_chunk --server=$SERVER --api_key=$KEY --chunk=$a --chunks=$CHUNKS --filename=$MYFILE x0$a; done
```
A shell script that does the same thing is:
```
#!/bin/bash
for a in `seq -w 0 102`; do
liquidfiles attach_chunk --server=https://filetransfer.dcsr.unil.ch --api_key=9MUQeF5nG899lHdCtg --chunk=$a --chunks=103 --filename=mydata.tar x0$a
done
```
Once all the chunks are uploaded the file will be assembled/processed and after a short while it will be visible in the web interface.
Here we see a previously uploaded file of 304 GB called my file.ffdata
![](https://wiki.unil.ch/ci/uploads/images/gallery/2022-02/screenshot-2022-02-11-at-20-19-32.png)
##### Cleaning up
Once the file is uploaded please don't forget to clean up the TAR file and the chunks.
```
$ cd /scratch/ulambda/mytransfer
$ rm *
$ cd ..
$ rmdir mytransfer
```
# R on the clusters (old)
R is provided via the [DCSR software stack](https://wiki.unil.ch/ci/books/high-performance-computing-hpc/page/dcsr-software-stack)
### Interactive mode
To load R:
```shell
module load r
R
# Then you can use R interactively
> ...
```
### Batch mode
While using R in batch mode, you have to use `Rscript` to launch your script. Here is an example of sbatch script, `run_r.sh`:
```shell
#!/bin/bash
#SBATCH --time 00-00:20:00
#SBATCH --cpus-per-task 1
#SBATCH --mem 4G
module load r
Rscript my_r_script.R
```
Then, just submit the job to Slurm:
```shell
sbatch run_r.sh
```
### Package installation
A number of core packages are installed centrally - you can see what is available by using the `library()` function. Given the number of packages and multiple versions available other packages should be installed by the user.
Installing R packages is pretty straightforward thanks to the [install.packages()](https://stat.ethz.ch/R-manual/R-devel/library/utils/html/install.packages.html) function. However, be careful, since it might fill your home directory very quickly. For big packages with a large number of dependencies, like `adegenet` for instance, you will probably reach the quota before the end of the installation. Here is a solution to mitigate that problem:
- Remove your current R library (or set up an alternate one as explained in the section [Setting up an alternate personal library](#bkmrk-setting-up-an-altern) below):
```shell
rm -rf $HOME/R
```
- Create a new library in your work directory (obviously modify the path according to your situation):
```
mkdir -p /work/FAC/FBM/DEE/my_py/default/jdoe/R
```
- Create a symlink so that the R library points to the work directory:
```shell
cd $HOME
ln -s /work/FAC/FBM/DEE/my_py/default/jdoe/R
```
- Install your R packages
#### Handling dependencies
Sometimes R packages depend on external libraries. In most cases the library is already installed on the cluster; you just need to load the corresponding module before trying to install the package from the R session.
If the installation of the package still fails, you need to define the following variables. For example, if your package depends on the gsl and mpfr libraries:
```bash
module load gsl mpfr
export CPATH=$GSL_ROOT/include:$MPFR_ROOT/include
export LIBRARY_PATH=$GSL_ROOT/lib:$MPFR_ROOT/lib
```
### Setting up an alternate personal library
If you want to set up an alternate location where to install R packages, you can proceed as follows:
```
mkdir -p ~/R/my_personal_lib2
# If you already have a ~/.Renviron file, make a backup
cp -iv ~/.Renviron ~/.Renviron_backup
echo 'R_LIBS_USER=~/R/my_personal_lib2' > ~/.Renviron
```
Then relaunch R. Packages will then be installed under `~/R/my_personal_lib2`.
# Sandbox containers
#### Container basics
For how to use Singularity/Apptainer containers please see our course at: [http://dcsrs-courses.ad.unil.ch/r\_python\_singularity/r\_python\_singularity.html](http://dcsrs-courses.ad.unil.ch/r_python_singularity/r_python_singularity.html)
#### Sandboxes
A container image (the .sif file) is read only and its contents cannot be changed, which makes containers perfect for distribution, safe in the knowledge that they should run exactly as they were created.
Sometimes, especially when developing things, it's very useful to be able to interactively modify a container and this is what sandboxes are for.
Please be aware that anything done by hand is not reproducible so all steps should be transferred to the container definition file.
#### Creating and modifying a sandbox
Note that the steps here should be run on the cluster login node (curnagl.dcsr.unil.ch) as it is currently the only machine with the configuration in place to allow containers to be built.
To start you need a basic definition file - this can be an empty OS or something more complicated that already has some configuration.
In the following example we will use a definition that installs the latest version of R. We will then try and install extra packages before creating the immutable SIF image.
Here's our file which we save as `newR.def`
```
Bootstrap: docker
From: ubuntu:20.04
%post
apt update
# wget is needed below to fetch the CRAN signing key
apt install -y locales gnupg-agent wget
sed -i '/^#.* en_.*.UTF-8 /s/^#//' /etc/locale.gen
sed -i '/^#.* fr_.*.UTF-8 /s/^#//' /etc/locale.gen
locale-gen
# install two helper packages we need
apt install -y --no-install-recommends software-properties-common dirmngr
# add the signing key (by Michael Rutter) for these repos
wget -qO- https://cloud.r-project.org/bin/linux/ubuntu/marutter_pubkey.asc | tee -a /etc/apt/trusted.gpg.d/cran_ubuntu_key.asc
apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys 51716619E084DAB9
# add the R 4.0 repo from CRAN -- adjust 'focal' to 'groovy' or 'bionic' as needed
add-apt-repository "deb https://cloud.r-project.org/bin/linux/ubuntu $(lsb_release -cs)-cran40/"
apt install -y --no-install-recommends r-base
```
##### Create the sandbox
Change to your scratch space /scratch/username and:
```
$ module load singularityce
$ singularity build --fakeroot --sandbox newR newR.def
WARNING: The underlying filesystem on which resides "/scratch/username/newR" won't allow to set ownership, as a consequence the sandbox could not preserve image's files/directories ownerships
INFO: Starting build...
Getting image source signatures
Copying blob d7bfe07ed847 [--------------------------------------] 0.0b / 0.0b
Copying config 2772dfba34 done
..
..
..
Processing triggers for libc-bin (2.31-0ubuntu9.9) ...
Processing triggers for systemd (245.4-4ubuntu3.17) ...
Processing triggers for mime-support (3.64ubuntu1) ...
INFO: Creating sandbox directory...
INFO: Build complete: newR
```
This will create a directory called newR which is the writable container image. Have a look inside and see what there is!
##### Run and edit the image
Before running the container we need to set up the filesystems that will be visible inside - here we want /users and /scratch to be visible
```
$ export SINGULARITY_BINDPATH="/users,/scratch"
$ mkdir newR/users
$ mkdir newR/scratch
```
Now we launch the image with an interactive shell
```
$ singularity shell --writable --fakeroot newR/
Singularity>
```
On the command line we can then work interactively with the image.
As we are going to be installing R packages we know that we need some extra tools:
```
Singularity> apt-get install make gcc g++ gfortran
```
Now we can launch R and install some packages
```
Singularity> R
R version 4.2.1 (2022-06-23) -- "Funny-Looking Kid"
Copyright (C) 2022 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
..
> install.packages('tibble')
Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)
also installing the dependencies ‘glue’, ‘cli’, ‘utf8’, ‘ellipsis’, ‘fansi’, ‘lifecycle’, ‘magrittr’, ‘pillar’, ‘rlang’, ‘vctrs’
trying URL 'https://cloud.r-project.org/src/contrib/glue_1.6.2.tar.gz'
Content type 'application/x-gzip' length 106510 bytes (104 KB)
==================================================
downloaded 104 KB
..
..
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (tibble)
```
Keep iterating until things are correct, but don't forget to write down all the steps and transfer them to the definition file to allow for future reproducible builds.
##### Sandbox to SIF
```
$ singularity build --fakeroot R-4.2.1-production.sif newR/
```
You will now have a SIF file that can be used in the normal way
```
$ singularity run R-4.2.1-production.sif R
R version 4.2.1 (2022-06-23) -- "Funny-Looking Kid"
Copyright (C) 2022 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
..
>
```
Remember that files on /scratch will be automatically deleted if there isn't enough free space so save your definitions in a git repository and move the SIF images to your project space in /work
# Course software for decision trees / random forests
In the practicals, we will use only a small dataset and will need only a little computing power and memory. You can therefore do the practicals on various computing platforms. However, since participants may use various types of computers and software, we recommend using the UNIL JupyterLab for the practicals.
- [JupyterLab](https://wiki.unil.ch/ci/books/high-performance-computing-hpc/page/course-software-for-decision-trees-random-forests#bkmrk-jupyterlab): Working on the cloud is convenient because the installation of the Python and R packages is already done and you will be working with a Jupyter Notebook style even if you use R. Note, however, that the UNIL JupyterLab will only be active during the course and for one week following its completion, so in the long term you should use either your laptop or Curnagl.
- [Laptop](https://wiki.unil.ch/ci/books/high-performance-computing-hpc/page/course-software-for-decision-trees-random-forests#bkmrk-laptop): This is good if you want to work directly on your laptop, but you will need to install the required libraries on your laptop. Warning: We will give general instructions on how to install the libraries on your laptop but it is sometimes tricky to find the right library versions and we will not be able to help you with the installation. The installation should take about 15 minutes.
- [Curnagl](https://wiki.unil.ch/ci/books/high-performance-computing-hpc/page/course-software-for-decision-trees-random-forests#bkmrk-curnagl): This is efficient if you are used to work on a cluster or if you intend to use one in the future to work on large projects. If you have an account you can work on your /scratch folder or ask us to be part of the course project but please contact us at least a week before the course. If you do not have an account to access the UNIL cluster Curnagl, please contact us at least a week before the course so that we can give you a temporary account. The installation should take about 15 minutes. Note that it is also possible to use JupyterLab on Curnagl: see [https://wiki.unil.ch/ci/books/high-performance-computing-hpc/page/jupyterlab-on-the-curnagl-cluster](https://wiki.unil.ch/ci/books/high-performance-computing-hpc/page/jupyterlab-on-the-curnagl-cluster)
If you choose to work on the UNIL JupyterLab, then you do not need to prepare anything, since all the necessary libraries will already be installed there. In all cases, you will receive a guest username during the course, so you will be able to work on the UNIL JupyterLab.
Otherwise, if you prefer to work on your laptop or on Curnagl, please make sure you have a working installation before the day of the course, as we will be unable to provide any assistance with this on the day itself.
If you have difficulties with the installation on Curnagl we can help you, so please contact us before the course at helpdesk@unil.ch with the subject: DCSR ML course.
On the other hand, if you are unable to install the libraries on your laptop, we will unfortunately not be able to help you (there are too many special cases), so you will need to use the UNIL JupyterLab during the course.
Before the course, we will send you all the files that are needed to do the practicals.
### **JupyterLab**
Here are some instructions for using the UNIL JupyterLab to do the practicals.
You need to be able to access the eduroam wifi with your UNIL account or via the UNIL VPN.
Go to the webpage: [https://jupyter.dcsr.unil.ch/jupyter](https://jupyter.dcsr.unil.ch/jupyter)
Enter the login and password that you received during the course. Due to a technical issue, you may receive a warning message "Your connection is not private". This is expected: proceed by clicking on the "Advanced" button and then on "Proceed to dcsrs-jupyter.ad.unil.ch (unsafe)".
#### **Python**
Click on the "ML" square button in the Notebook panel.
Copy / paste the commands from the html practical file to the Jupyter Notebook.
To execute a command, click on "Run the selected cells and advance" (the right arrow), or SHIFT + RETURN.
When you have finished the practicals, select File / Log out.
#### **R**
Click on the "ML R" square button in the Notebook panel.
Copy / paste the commands from the html practical file to the Jupyter Notebook.
To execute a command, click on "Run the selected cells and advance" (the right arrow), or SHIFT + RETURN.
When you have finished the practicals, select File / Log out.
### **Laptop**
You may need to install development tools including a C and Fortran compiler (e.g. Xcode on Mac, gcc and gfortran on Linux, Visual Studio on Windows).
#### **Python installation**
Here are some instructions for installing decision tree and random forest libraries on your laptop. You need Python >= 3.7.
##### **For Mac and Linux**
We will use a terminal to install the libraries.
Let us create a virtual environment. Open your terminal and type:
```
python3 -m venv mlcourse
source mlcourse/bin/activate
pip3 install scikit-learn pandas matplotlib graphviz seaborn
```
You can terminate the current session:
```
deactivate
exit
```
**TO DO THE PRACTICALS (today or another day):**
You can use any Python IDE (e.g. Jupyter Notebook or PyCharm), but you need to launch it after activating the virtual environment. For example, for Jupyter Notebook:
```
source mlcourse/bin/activate
pip3 install notebook
jupyter notebook
```
##### **For Windows**
If you do not have Python installed, you can use either Conda: [https://docs.conda.io/en/latest/miniconda.html](https://docs.conda.io/en/latest/miniconda.html) or Python official installer: [https://www.python.org/downloads/windows/](https://www.python.org/downloads/windows/)
Let us create a virtual environment. Open your terminal and type:
```
C:\Users\user>python -m venv mlcourse
C:\Users\user>mlcourse\Scripts\activate.bat
(mlcourse) C:\Users\user>
(mlcourse) C:\Users\user>pip3 install scikit-learn pandas matplotlib graphviz seaborn
```
You can terminate the current session:
```
(mlcourse) C:\Users\user>deactivate
C:\Users\user>
```
**TO DO THE PRACTICALS (today or another day):**
You can use any Python IDE (e.g. Jupyter Notebook or PyCharm), but you need to launch it after activating the virtual environment. For example, for Jupyter Notebook:
```
C:\Users\user>mlcourse\Scripts\activate.bat
(mlcourse) C:\Users\user>pip3 install notebook
(mlcourse) C:\Users\user>jupyter notebook
```
**Information:** Use Control-C to stop this server.
#### **R installation**
Here are some instructions for installing decision tree and random forest libraries on your laptop.
You need R >= 4.0. Run R in your terminal or launch RStudio.
For Windows users, you can download R here: [https://cran.r-project.org/bin/windows/base/](https://cran.r-project.org/bin/windows/base/ "https://cran.r-project.org/bin/windows/base/")
REMARK: The R libraries will be installed in your home directory. To allow it, you must answer yes to the questions:
Would you like to use a personal library instead? (yes/No/cancel) yes
Would you like to create a personal library to install packages into? (yes/No/cancel) yes
And select Switzerland for the CRAN mirror.
```
install.packages("rpart")
install.packages("rpart.plot")
install.packages("randomForest")
install.packages("tidyverse")
```
The installation of "tidyverse" may lead to some conflicts, but do not worry: you will still be able to do the practicals.
You can terminate the current R session:
```
q()
```
Save workspace image? \[y/n/c\]: n
**TO DO THE PRACTICALS (today or another day):**
Simply run R in your terminal or launch RStudio.
### **Curnagl**
For the practicals, it will be convenient to be able to copy/paste text from a web page to the terminal on Curnagl, so please make sure you can do this before the course. You also need to make sure that your terminal has an X server.
For Mac users, download and install XQuartz (X server): [https://www.xquartz.org/](https://www.xquartz.org/)
For Windows users, download and install the MobaXterm terminal (which includes an X server). Click on the "Installer edition" button on the following webpage: [https://mobaxterm.mobatek.net/download-home-edition.html](https://mobaxterm.mobatek.net/download-home-edition.html)
For Linux users, you do not need to install anything.
#### **Python installation**
Here are some instructions for installing decision tree and random forest libraries on the UNIL cluster called Curnagl. Open a terminal on your laptop and type (if you are located outside the UNIL you will need to activate the UNIL VPN):
```
ssh -Y < my unil username >@curnagl.dcsr.unil.ch
```
Here and in what follows we added the brackets < > to emphasize the username, but you should not write them in the command. Enter your UNIL password.
For Windows users with the MobaXterm terminal: launch MobaXterm, click on "Start local terminal" and type the command ssh -Y < my unil username >@curnagl.dcsr.unil.ch, then enter your UNIL password. You should then be on Curnagl. Alternatively, launch MobaXterm, click on the session icon and then on the SSH icon. Fill in: remote host = curnagl.dcsr.unil.ch, username = < my unil username >. Finally, click OK and enter your password. If you are asked "do you want to save password?", say No if you are not sure. You should then be on Curnagl.
See also the documentation: [https://wiki.unil.ch/ci/books/high-performance-computing-hpc/page/ssh-connection-to-dcsr-cluster](https://wiki.unil.ch/ci/books/high-performance-computing-hpc/page/ssh-connection-to-dcsr-cluster)
```
cd /scratch/< my unil username >
# or
cd /work/TRAINING/UNIL/CTR/rfabbret/cours_hpc/
mkdir < my unil username >
cd < my unil username >
```
For convenience, you will install the libraries from the frontal node. Note, however, that it is normally recommended to install libraries from the interactive partition, using `Sinteractive -m 4G -c 1`.
```
module load python/3.10.13
python -m venv mlcourse
source mlcourse/bin/activate
pip install scikit-learn pandas matplotlib graphviz seaborn
```
You can terminate the current session:
```
deactivate
exit
```
**TO DO THE PRACTICALS (today or another day):**
```
ssh -Y < my unil username >@curnagl.dcsr.unil.ch
cd /scratch/< my unil username >
# or
cd /work/TRAINING/UNIL/CTR/rfabbret/cours_hpc/< my unil username >
```
For convenience, you will work directly on the frontal node to do the practicals. Note, however, that working directly on the frontal node is normally not allowed, and you should use `Sinteractive -m 4G -c 1`.
```
module load python/3.10.13
source mlcourse/bin/activate
python
```
#### **R installation**
Here are some instructions for installing decision tree and random forest libraries on the UNIL cluster called Curnagl. Open a terminal on your laptop and type (if you are located outside the UNIL you will need to activate the UNIL VPN):
```
ssh -Y < my unil username >@curnagl.dcsr.unil.ch
```
Here and in what follows we added the brackets < > to emphasize the username, but you should not write them in the command. Enter your UNIL password.
For Windows users with the MobaXterm terminal: launch MobaXterm, click on "Start local terminal" and type the command ssh -Y < my unil username >@curnagl.dcsr.unil.ch, then enter your UNIL password. You should then be on Curnagl. Alternatively, launch MobaXterm, click on the session icon and then on the SSH icon. Fill in: remote host = curnagl.dcsr.unil.ch, username = < my unil username >. Finally, click OK and enter your password. If you are asked "do you want to save password?", say No if you are not sure. You should then be on Curnagl.
See also the documentation: [https://wiki.unil.ch/ci/books/high-performance-computing-hpc/page/ssh-connection-to-dcsr-cluster](https://wiki.unil.ch/ci/books/high-performance-computing-hpc/page/ssh-connection-to-dcsr-cluster)
```
cd /scratch/< my unil username >
# or
cd /work/TRAINING/UNIL/CTR/rfabbret/cours_hpc/
mkdir < my unil username >
cd < my unil username >
```
For convenience, you will install the libraries from the frontal node. Note, however, that it is normally recommended to install libraries from the interactive partition, using `Sinteractive -m 4G -c 1`.
```
module load r/4.3.2
R
```
REMARK: The R libraries will be installed in your home directory. To allow it, you must answer yes to the questions:
Would you like to use a personal library instead? (yes/No/cancel) yes
Would you like to create a personal library to install packages into? (yes/No/cancel) yes
And select Switzerland for the CRAN mirror.
```
install.packages("rpart")
install.packages("rpart.plot")
install.packages("randomForest")
install.packages("tidyverse")
```
The installation of "tidyverse" may lead to some conflicts, but do not worry: you will still be able to do the practicals.
You can terminate the current R session:
```
q()
```
Save workspace image? \[y/n/c\]: n
**TO DO THE PRACTICALS (today or another day):**
```
ssh -Y < my unil username >@curnagl.dcsr.unil.ch
cd /scratch/< my unil username >
# or
cd /work/TRAINING/UNIL/CTR/rfabbret/cours_hpc/< my unil username >
```
For convenience, you will work directly on the frontal node to do the practicals. Note, however, that working directly on the frontal node is normally not allowed, and you should use `Sinteractive -m 4G -c 1`.
```
module load r/4.3.2
R
```
# Course software for introductory deep learning
In the practicals we will use only a small dataset and will need only a little computing power and memory, so you can do the practicals on various computing platforms. However, since participants may use various types of computers and software, we recommend using the UNIL JupyterLab for the practicals.
- [JupyterLab](https://wiki.unil.ch/ci/books/high-performance-computing-hpc/page/course-software-for-introductory-deep-learning#bkmrk-jupyterlab): Working in the cloud is convenient because the installation of the Python and R packages is already done, and you will be working in a Jupyter Notebook environment even if you use R. Note, however, that the UNIL JupyterLab will only be active during the course and for one week following its completion, so in the long term you should use either your laptop or Curnagl.
- [Laptop](https://wiki.unil.ch/ci/books/high-performance-computing-hpc/page/course-software-for-introductory-deep-learning#bkmrk-laptop): This is good if you want to work directly on your laptop, but you will need to install the required libraries yourself. Warning: we will give general instructions on how to install the libraries on your laptop, but it is sometimes tricky to find the right library versions and we will not be able to help you with the installation. The installation should take about 15 minutes.
- [Curnagl](https://wiki.unil.ch/ci/books/high-performance-computing-hpc/page/course-software-for-introductory-deep-learning#bkmrk-curnagl): This is efficient if you are used to working on a cluster or if you intend to use one in the future for large projects. If you have an account, you can work in your /scratch folder or ask to be added to the course project (please contact us at least a week before the course). If you do not have an account on the UNIL cluster Curnagl, please contact us at least a week before the course so that we can give you a temporary account. The installation should take about 15 minutes. Note that it is also possible to use JupyterLab on Curnagl: see [https://wiki.unil.ch/ci/books/high-performance-computing-hpc/page/jupyterlab-on-the-curnagl-cluster](https://wiki.unil.ch/ci/books/high-performance-computing-hpc/page/jupyterlab-on-the-curnagl-cluster)
If you choose to work on the UNIL JupyterLab, then you do not need to prepare anything, since all the necessary libraries will already be installed there. In all cases, you will receive a guest username during the course, so you will be able to work on the UNIL JupyterLab.
Otherwise, if you prefer to work on your laptop or on Curnagl, please make sure you have a working installation before the day of the course, as we will be unable to provide any assistance with this on the day itself.
If you have difficulties with the installation on Curnagl we can help you, so please contact us before the course at helpdesk@unil.ch with the subject: DCSR ML course.
On the other hand, if you are unable to install the libraries on your laptop, we will unfortunately not be able to help you (there are too many special cases), so you will need to use the UNIL JupyterLab during the course.
Before the course, we will send you all the files that are needed to do the practicals.
### **JupyterLab**
Here are some instructions for using the UNIL JupyterLab to do the practicals.
You need to be able to access the eduroam wifi with your UNIL account or via the UNIL VPN.
Go to the webpage: [https://jupyter.dcsr.unil.ch/jupyter](https://jupyter.dcsr.unil.ch/jupyter)
Enter the login and password that you received during the course. Due to a technical issue, you may receive a warning message "Your connection is not private". This is expected: proceed by clicking on the "Advanced" button and then on "Proceed to dcsrs-jupyter.ad.unil.ch (unsafe)".
#### **Python**
Click on the "ML" square button in the Notebook panel.
Copy / paste the commands from the html practical file to the Jupyter Notebook.
To execute a command, click on "Run the selected cells and advance" (the right arrow), or SHIFT + RETURN.
When using TensorFlow, you may receive a warning
2022-09-22 11:01:12.232756: W tensorflow/stream\_executor/platform/default/dso\_loader.cc:64\] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2022-09-22 11:01:12.232856: I tensorflow/stream\_executor/cuda/cudart\_stub.cc:29\] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
You should not worry. By default, TensorFlow is trying to use GPUs and since there are no GPUs, it writes a warning and decides to use CPUs (which is enough for our course).
When you have finished the practicals, select File / Log out.
#### **R**
Click on the "ML R" square button in the Notebook panel.
Copy / paste the commands from the html practical file to the Jupyter Notebook.
To execute a command, click on "Run the selected cells and advance" (the right arrow), or SHIFT + RETURN.
When using TensorFlow, you may receive a warning
2022-09-22 11:01:12.232756: W tensorflow/stream\_executor/platform/default/dso\_loader.cc:64\] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2022-09-22 11:01:12.232856: I tensorflow/stream\_executor/cuda/cudart\_stub.cc:29\] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
You should not worry. By default, TensorFlow is trying to use GPUs and since there are no GPUs, it writes a warning and decides to use CPUs (which is enough for our course).
When you have finished the practicals, select File / Log out.
### **Laptop**
You may need to install development tools including a C and Fortran compiler (e.g. Xcode on Mac, gcc and gfortran on Linux, Visual Studio on Windows).
#### **Python installation**
Here are some instructions for installing Keras with TensorFlow as the backend (for Python 3), and other libraries, on your laptop. You need Python >= 3.8.
##### **For Linux**
We will use a terminal to install the libraries.
Let us create a virtual environment. Open your terminal and type:
```
python3 -m venv mlcourse
source mlcourse/bin/activate
pip3 install tensorflow scikit-learn scikeras eli5 pandas matplotlib notebook keras-tuner
```
You may need to choose the right library versions, for example `tensorflow==2.12.0`
To check that Tensorflow was installed:
```
python3 -c "import tensorflow; print(tensorflow.version.VERSION)"
```
There might be a warning message (see above) and the output should be something like "2.12.0".
You can terminate the current session:
```
deactivate
exit
```
**TO DO THE PRACTICALS (today or another day):**
You can use any Python IDE (e.g. Jupyter Notebook or PyCharm), but you need to launch it after activating the virtual environment. For example, for Jupyter Notebook:
```
source mlcourse/bin/activate
jupyter notebook
```
##### **For Mac**
We will use a terminal to install the libraries.
Let us create a virtual environment. Open your terminal and type:
```
python3 -m venv mlcourse
source mlcourse/bin/activate
pip3 install tensorflow-macos==2.12.0 scikit-learn==1.2.2 scikeras eli5 pandas matplotlib notebook keras-tuner
```
If you receive an error message such as:
ERROR: Could not find a version that satisfies the requirement tensorflow-macos (from versions: none)
ERROR: No matching distribution found for tensorflow-macos
Then, try the following command:
```
SYSTEM_VERSION_COMPAT=0 pip3 install tensorflow-macos==2.12.0 scikit-learn==1.2.2 scikeras eli5 pandas matplotlib notebook keras-tuner
```
If you have a Mac with an M1 or more recent chip (if you are not sure, have a look at "About this Mac"), you can also install the tensorflow-metal library to accelerate training on Mac GPUs (but this is not necessary for the course):
```
pip3 install tensorflow-metal
```
To check that Tensorflow was installed:
```
python3 -c "import tensorflow; print(tensorflow.version.VERSION)"
```
There might be a warning message (see above) and the output should be something like "2.12.0".
You can terminate the current session:
```
deactivate
exit
```
**TO DO THE PRACTICALS (today or another day):**
You can use any Python IDE (e.g. Jupyter Notebook or PyCharm), but you need to launch it after activating the virtual environment. For example, for Jupyter Notebook:
```
source mlcourse/bin/activate
jupyter notebook
```
##### **For Windows**
If you do not have Python installed, you can use either Conda: [https://docs.conda.io/en/latest/miniconda.html](https://docs.conda.io/en/latest/miniconda.html) (see the instructions here: [https://conda.io/projects/conda/en/latest/user-guide/install/windows.html](https://conda.io/projects/conda/en/latest/user-guide/install/windows.html)) or Python official installer: [https://www.python.org/downloads/windows/](https://www.python.org/downloads/windows/)
We will use a terminal to install the libraries.
Let us create a virtual environment. Open your terminal and type:
```
python -m venv mlcourse
mlcourse\Scripts\activate.bat
pip3 install tensorflow scikit-learn scikeras eli5 pandas matplotlib notebook keras-tuner
```
You may need to choose the right library versions, for example `tensorflow==2.12.0`
To check that Tensorflow was installed:
```
python -c "import tensorflow; print(tensorflow.version.VERSION)"
```
There might be a warning message (see above) and the output should be something like "2.12.0".
You can terminate the current session:
```
deactivate
```
**TO DO THE PRACTICALS (today or another day):**
You can use any Python IDE (e.g. Jupyter Notebook or PyCharm), but you need to launch it after activating the virtual environment. For example, for Jupyter Notebook:
```
mlcourse\Scripts\activate.bat
jupyter notebook
```
#### **R installation**
Here are some instructions for installing Keras with TensorFlow as the backend, and other libraries, on your laptop. The R keras package is actually an interface to the Python Keras: it allows you to enjoy the benefits of R programming while having access to the capabilities of the Python Keras package.
You need R >= 4.0 and Python >= 3.8.
REMARK: The R libraries will be installed in your home directory. To allow it, you must answer yes to the questions:
Would you like to use a personal library instead? (yes/No/cancel) yes
Would you like to create a personal library to install packages into? (yes/No/cancel) yes
And select Switzerland for the CRAN mirror.
##### **For Mac, Windows and Linux**
Run the following commands on your terminal:
```
cd ~
ls .virtualenvs
# Create this directory only if you receive an error message
# saying that this directory does not exist
mkdir .virtualenvs
```
Then
```
cd ~/.virtualenvs
python3 -m venv r-reticulate
source r-reticulate/bin/activate
# For Windows and Linux
pip3 install tensorflow scikit-learn scikeras eli5 pandas matplotlib notebook keras-tuner
# For Mac
pip3 install tensorflow-macos==2.12.0 scikit-learn==1.2.2 scikeras eli5 pandas matplotlib notebook keras-tuner
deactivate
```
You must name the environment 'r-reticulate', as otherwise the R keras package won't be able to find it.
You may need to choose the right library versions, for example `tensorflow==2.12.0`
Run R in your terminal and type
```
install.packages("keras")
install.packages("reticulate")
install.packages("ggplot2")
install.packages("ggfortify")
```
To check that Keras was properly installed:
```
library(keras)
library(tensorflow)
is_keras_available(version = NULL)
```
There might be a warning message (see above) and the output should be something like "TRUE".
You can terminate the current R session:
```
q()
```
Save workspace image? \[y/n/c\]: n
**TO DO THE PRACTICALS (today or another day):**
Then you can either run R in your terminal or launch RStudio.
### **Curnagl**
For the practicals, it will be convenient to be able to copy/paste text from a web page to the terminal on Curnagl, so please make sure you can do this before the course. You also need to make sure that your terminal has an X server.
For Mac users, download and install XQuartz (X server): [https://www.xquartz.org/](https://www.xquartz.org/)
For Windows users, download and install the MobaXterm terminal (which includes an X server). Click on the "Installer edition" button on the following webpage: [https://mobaxterm.mobatek.net/download-home-edition.html](https://mobaxterm.mobatek.net/download-home-edition.html)
For Linux users, you do not need to install anything.
When testing if TensorFlow was properly installed (see below) you may receive a warning
2022-03-16 12:15:00.564218: W tensorflow/stream\_executor/platform/default/dso\_loader.cc:64\] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD\_LIBRARY\_PATH: /dcsrsoft/spack/hetre/v1.2/spack/opt/spack/linux-rhel8-zen2/gcc-9.3.0/python-3.8.8-tb3aceqq5wzx4kr5m7s5m4kzh4kxi3ex/lib:/dcsrsoft/spack/hetre/v1.2/spack/opt/spack/linux-rhel8-zen2/gcc-9.3.0/tcl-8.6.11-aonlmtcje4sgqf6gc4d56cnp3mbbhvnj/lib:/dcsrsoft/spack/hetre/v1.2/spack/opt/spack/linux-rhel8-zen2/gcc-9.3.0/tk-8.6.11-2gb36lqwohtzopr52c62hajn4tq7sf6m/lib:/dcsrsoft/spack/hetre/v1.2/spack/opt/spack/linux-rhel8-zen/gcc-8.3.1/gcc-9.3.0-nwqdwvso3jf3fgygezygmtty6hvydale/lib64:/dcsrsoft/spack/hetre/v1.2/spack/opt/spack/linux-rhel8-zen/gcc-8.3.1/gcc-9.3.0-nwqdwvso3jf3fgygezygmtty6hvydale/lib
2022-03-16 12:15:00.564262: I tensorflow/stream\_executor/cuda/cudart\_stub.cc:29\] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
You should not worry. By default, TensorFlow is trying to use GPUs and since there are no GPUs, it writes a warning and decides to use CPUs (which is enough for our course).
#### **Python installation**
Here are some instructions for installing Keras with TensorFlow as the backend (for Python 3), and other libraries, on the UNIL cluster called Curnagl. Open a terminal on your laptop and type (if you are located outside the UNIL you will need to activate the UNIL VPN):
```
ssh -Y < my unil username >@curnagl.dcsr.unil.ch
```
Here and in what follows we added the brackets < > to emphasize the username, but you should not write them in the command. Enter your UNIL password.
For Windows users with the MobaXterm terminal: launch MobaXterm, click on "Start local terminal" and type the command ssh -Y < my unil username >@curnagl.dcsr.unil.ch, then enter your UNIL password. You should then be on Curnagl. Alternatively, launch MobaXterm, click on the session icon and then on the SSH icon. Fill in: remote host = curnagl.dcsr.unil.ch, username = < my unil username >. Finally, click OK and enter your password. If you are asked "do you want to save password?", say No if you are not sure. You should then be on Curnagl.
See also the documentation: [https://wiki.unil.ch/ci/books/high-performance-computing-hpc/page/ssh-connection-to-dcsr-cluster](https://wiki.unil.ch/ci/books/high-performance-computing-hpc/page/ssh-connection-to-dcsr-cluster)
```
cd /scratch/< my unil username >
# or
cd /work/TRAINING/UNIL/CTR/rfabbret/cours_hpc
mkdir < my unil username >
cd < my unil username >
```
For convenience, you will install the libraries from the frontal node. Note, however, that it is normally recommended to install libraries from the interactive partition, using `Sinteractive -m 4G -c 1`.
```
git clone https://c4science.ch/source/DL_INTRO.git
module load python/3.10.13
python -m venv mlcourse
source mlcourse/bin/activate
pip install -r DL_INTRO/requirements.txt
```
To check that TensorFlow was installed:
```
python -c 'import tensorflow; print(tensorflow.version.VERSION)'
```
There might be a warning message (see above) and the output should be something like "2.9.2".
You can terminate the current session:
```
deactivate
exit
```
**TO DO THE PRACTICALS (today or another day):**
```
ssh -Y < my unil username >@curnagl.dcsr.unil.ch
cd /scratch/< my unil username >
# or
cd /work/TRAINING/UNIL/CTR/rfabbret/cours_hpc/< my unil username >
```
For convenience, you will work directly on the frontal node to do the practicals. Note, however, that working directly on the frontal node is normally not allowed, and you should use `Sinteractive -m 4G -c 1`.
```
module load gcc python/3.10.13
source mlcourse/bin/activate
python
```
#### **R installation**
Here are some instructions for installing Keras with TensorFlow as the backend, and other libraries, on the UNIL cluster called Curnagl. The R keras package is actually an interface to the Python Keras: it allows you to enjoy the benefits of R programming while having access to the capabilities of the Python Keras package. Open a terminal on your laptop and type (if you are located outside the UNIL you will need to activate the UNIL VPN):
```
ssh -Y < my unil username >@curnagl.dcsr.unil.ch
```
Here and in what follows we added the brackets < > to emphasize the username, but you should not write them in the command. Enter your UNIL password.
For Windows users with the MobaXterm terminal: launch MobaXterm, click on "Start local terminal" and type the command ssh -Y < my unil username >@curnagl.dcsr.unil.ch, then enter your UNIL password. You should then be on Curnagl. Alternatively, launch MobaXterm, click on the session icon and then on the SSH icon. Fill in: remote host = curnagl.dcsr.unil.ch, username = < my unil username >. Finally, click OK and enter your password. If you are asked "do you want to save password?", say No if you are not sure. You should then be on Curnagl.
See also the documentation: [https://wiki.unil.ch/ci/books/high-performance-computing-hpc/page/ssh-connection-to-dcsr-cluster](https://wiki.unil.ch/ci/books/high-performance-computing-hpc/page/ssh-connection-to-dcsr-cluster)
```
cd ~
module load python/3.10.13 r-light/4.4.1
git clone https://c4science.ch/source/DL_INTRO.git
mkdir -p ~/.virtualenvs
cd ~/.virtualenvs
python -m venv r-reticulate
source r-reticulate/bin/activate
pip install -r ~/DL_INTRO/requirements.txt
```
For convenience, you will install the libraries from the frontal node. Note, however, that it is normally recommended to install libraries from the interactive partition, using `Sinteractive -m 4G -c 1`.
REMARK: The R libraries will be installed in your home directory. To allow it, you must answer yes to the questions:
Would you like to use a personal library instead? (yes/No/cancel) yes
Would you like to create a personal library to install packages into? (yes/No/cancel) yes
And select Switzerland for the CRAN mirror.
```
R
install.packages("keras")
install.packages("ggplot2")
install.packages("ggfortify")
```
To check that Keras was properly installed:
```
library(keras)
library(tensorflow)
is_keras_available(version = NULL)
```
There might be a warning message (see above) and the output should be something like "TRUE".
You can terminate the current R session:
```
q()
```
Save workspace image? \[y/n/c\]: n
**TO DO THE PRACTICALS (today or another day):**
```
ssh -Y < my unil username >@curnagl.dcsr.unil.ch
```
For convenience, you will work directly on the frontal node to do the practicals. Note, however, that working directly on the frontal node is normally not allowed, and you should use `Sinteractive -m 4G -c 1`.
```
cd ~
module load python/3.10.13 r-light/4.4.1
R
```
# JupyterLab on the curnagl cluster
JupyterLab can be run on the curnagl cluster for testing purposes only, as an intermediate step in porting applications from regular workstations to curnagl.
The installation is made inside a python virtual environment, and this tutorial covers the installation of the following kernels: IPyKernel (**python**), IRKernel (**R**), IJulia (**julia**), MATLAB kernel (**matlab**), IOctave (**octave**), stata\_kernel (**stata**) and sas\_kernel (**sas**).
If the workstation is outside of the campus, first [connect to the VPN](https://www.unil.ch/ci/reseau-unil-chez-soi#guides-dinstallation).
### Creating the virtual environment
First create/choose a folder ${WORK} on the **/scratch** or the **/work** filesystem within your project (ex. WORK=*/work/FAC/.../my\_project*). The following needs to be run only once on the cluster (preferably on an interactive computing node):
```bash
module load gcc python
python -m venv ${WORK}/jlab_venv
${WORK}/jlab_venv/bin/pip install jupyterlab ipykernel numpy matplotlib
```
The IPyKernel is automatically available. The other kernels need to be installed according to your needs.
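At any point you can check which kernels are registered by listing the kernel specifications with the environment's own jupyter:
```bash
${WORK}/jlab_venv/bin/jupyter kernelspec list
```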
### Installing the kernels
**Each time you start a new session on the cluster, remember to define the variable ${WORK} according to the path you chose when creating the virtual environment.**
#### IRKernel
```bash
module load gcc r
export R_LIBS_USER=${WORK}/jlab_venv/lib/Rlibs
mkdir -p ${R_LIBS_USER}
echo "install.packages('IRkernel', repos='https://stat.ethz.ch/CRAN/', lib=Sys.getenv('R_LIBS_USER'))" | R --no-save
source ${WORK}/jlab_venv/bin/activate
echo "IRkernel::installspec()" | R --no-save
deactivate
```
#### IJulia
```bash
module load gcc julia
export JULIA_DEPOT_PATH=${WORK}/jlab_venv/lib/Jlibs
julia -e 'using Pkg; Pkg.add("IJulia")'
```
#### MATLAB kernel
```bash
${WORK}/jlab_venv/bin/pip install matlab_kernel matlabengine==9.11.19
```
#### IOctave
```bash
${WORK}/jlab_venv/bin/pip install octave_kernel
echo "c.OctaveKernel.plot_settings = dict(backend='gnuplot')" > ~/.jupyter/octave_kernel_config.py
```
#### stata\_kernel
```bash
module load stata-se
${WORK}/jlab_venv/bin/pip install stata_kernel
${WORK}/jlab_venv/bin/python -m stata_kernel.install
sed -i "s/^stata_path = None/stata_path = $(echo ${STATA_SE_ROOT} | sed 's/\//\\\//g')\/stata-se/" ~/.stata_kernel.conf
sed -i 's/stata_path = \(.*\)stata-mp/stata_path = \1stata-se/' ~/.stata_kernel.conf
```
#### sas\_kernel
```bash
module load sas
${WORK}/jlab_venv/bin/pip install sas_kernel
sed -i "s/'\/opt\/sasinside\/SASHome/'$(echo ${SAS_ROOT} | sed 's/\//\\\//g')/g" ${WORK}/jlab_venv/lib64/python3.9/site-packages/saspy/sascfg.py
```
### Running JupyterLab
**Before running JupyterLab, you need to start an interactive session!**
```bash
Sinteractive
```
Take note of the name of the running node, which you will need later. On curnagl, you can type:
```bash
hostname
```
If you didn't install all of the kernels, skip the corresponding lines in the commands below. **The execution order is important, in the sense that loading the gcc module should always be done before activating virtual environments.**
```bash
# Load python
module load gcc python
# IOctave (optional)
module load octave gnuplot
# IRKernel (optional)
export R_LIBS_USER=${WORK}/jlab_venv/lib/Rlibs
# IJulia (optional)
export JULIA_DEPOT_PATH=${WORK}/jlab_venv/lib/Jlibs
# JupyterLab environment
source ${WORK}/jlab_venv/bin/activate
# Launch JupyterLab (on the shell a link that can be copied on the browser will appear)
cd ${WORK}
jupyter-lab
deactivate
```
Before you can copy and paste the link into your favorite browser, you will need to establish an SSH tunnel to the interactive node. From a UNIX-like workstation, you can establish the SSH tunnel to the curnagl node with the following command (replace <username> with your user name, <hostname> with the name of the node you obtained above, and <port> with the port number from the link, typically 8888):
```
ssh -n -N -J <username>@curnagl.dcsr.unil.ch -L <port>:localhost:<port> <username>@<hostname>
```
You will be prompted for your password. When you have finished, you can close the tunnel with Ctrl-C.
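For example, with a hypothetical user `jdoe`, node `dna051` and port 8888, the command becomes:
```
ssh -n -N -J jdoe@curnagl.dcsr.unil.ch -L 8888:localhost:8888 jdoe@dna051
```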
### Note on Python/R/Julia modules and packages
The modules you install manually from JupyterLab in Python, R or Julia end up inside the JupyterLab virtual environment (${WORK}/jlab\_venv). They are hence isolated and independent from your Python/R/Julia instances outside of the virtual environment.
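If you prefer to add a Python package from a shell on the cluster instead, use the environment's own pip (the package name here is just an example):
```bash
${WORK}/jlab_venv/bin/pip install seaborn
```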
# JupyterLab with C++ on the curnagl cluster
JupyterLab can be run on the curnagl cluster for testing purposes only, as an intermediate step in porting applications from regular workstations to curnagl.
This tutorial intends to setup JupyterLab on the cluster together with the support for the C++ programming language, through the [xeus-cling kernel](https://github.com/jupyter-xeus/xeus-cling). Besides the IPyKernel kernel for the python language, which is natively supported, we will also provide the option to install support for the following kernels: IRKernel (**R**), IJulia (**julia**), MATLAB kernel (**matlab**), IOctave (**octave**), stata\_kernel (**stata**) and sas\_kernel (**sas**).
These instructions are hence related to the [JupyterLab on the curnagl cluster](https://wiki.unil.ch/ci/books/high-performance-computing-hpc/page/jupyterlab-on-the-curnagl-cluster) tutorial, but the implementation is very different because a JIT compiler is necessary in order to interactively process C++ code. Instead of using a python virtual environment in order to isolate and install JupyterLab, the kernels and the corresponding dependencies, we use [micromamba](https://mamba.readthedocs.io/en/latest/user_guide/micromamba.html).
### Setup of the micromamba virtual environment
First create/choose a folder ${WORK} on the **/scratch** or the **/work** filesystem within your project (ex. WORK=*/work/FAC/.../my\_project*). The following needs to be run only once on the cluster (preferably on an interactive computing node):
```bash
module load gcc python
export MAMBA_ROOT=/dcsrsoft/spack/external/micromamba
export MAMBA_ROOT_PREFIX="${WORK}/micromamba"
eval "$(${MAMBA_ROOT}/micromamba shell hook --shell=bash)"
micromamba create -y --prefix ${WORK}/jlab_menv python==3.9.13 jupyterlab ipykernel numpy matplotlib xeus-cling -c conda-forge
```
The IPyKernel and the xeus-cling kernel for handling C++ are now available. The other kernels need to be installed according to your needs.
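As a quick sanity check, you can list the kernels registered inside the environment (assuming the micromamba shell hook from the previous step is still active):
```bash
micromamba run --prefix ${WORK}/jlab_menv jupyter kernelspec list
```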
### Installing the optional kernels
**Each time you start a new session on the cluster, remember to define the variable ${WORK} according to the path you chose when creating the virtual environment.**
#### IRKernel
```bash
module load gcc r
export R_LIBS_USER=${WORK}/jlab_menv/lib/Rlibs
mkdir -p ${R_LIBS_USER}
echo "install.packages('IRkernel', repos='https://stat.ethz.ch/CRAN/', lib=Sys.getenv('R_LIBS_USER'))" | R --no-save
export MAMBA_ROOT=/dcsrsoft/spack/external/micromamba
export MAMBA_ROOT_PREFIX="${WORK}/micromamba"
eval "$(${MAMBA_ROOT}/micromamba shell hook --shell=bash)"
echo "IRkernel::installspec()" | micromamba run --prefix ${WORK}/jlab_menv R --no-save
```
#### IJulia
```bash
module load gcc julia
export JULIA_DEPOT_PATH=${WORK}/jlab_menv/lib/Jlibs
julia -e 'using Pkg; Pkg.add("IJulia")'
```
#### MATLAB kernel
```bash
${WORK}/jlab_menv/bin/pip install matlab_kernel matlabengine==9.11.19
```
#### IOctave
```bash
${WORK}/jlab_menv/bin/pip install octave_kernel
echo "c.OctaveKernel.plot_settings = dict(backend='gnuplot')" > ~/.jupyter/octave_kernel_config.py
```
#### stata\_kernel
```bash
module load stata-se
${WORK}/jlab_menv/bin/pip install stata_kernel
${WORK}/jlab_menv/bin/python -m stata_kernel.install
sed -i "s/^stata_path = None/stata_path = $(echo ${STATA_SE_ROOT} | sed 's/\//\\\//g')\/stata-se/" ~/.stata_kernel.conf
sed -i 's/stata_path = \(.*\)stata-mp/stata_path = \1stata-se/' ~/.stata_kernel.conf
```
#### sas\_kernel
```bash
module load sas
${WORK}/jlab_menv/bin/pip install sas_kernel
sed -i "s/'\/opt\/sasinside\/SASHome/'$(echo ${SAS_ROOT} | sed 's/\//\\\//g')/g" ${WORK}/jlab_venv/lib64/python3.9/site-packages/saspy/sascfg.py
```
### Running JupyterLab
**Before running JupyterLab, you need to start an interactive session!**
```bash
Sinteractive
```
Take note of the name of the running node, which you will need later. On curnagl, you can type:
```bash
hostname
```
If you didn't install all of the kernels, skip the corresponding lines in the commands below. **The execution order is important, in the sense that loading the gcc module should always be done before activating virtual environments.**
```bash
# Load python and setup the environment for micromamba to work
module load gcc python
export MAMBA_ROOT=/dcsrsoft/spack/external/micromamba
export MAMBA_ROOT_PREFIX="${WORK}/micromamba"
eval "$(${MAMBA_ROOT}/micromamba shell hook --shell=bash)"
# IOctave (optional)
module load octave gnuplot
# IRKernel (optional)
export R_LIBS_USER=${WORK}/jlab_menv/lib/Rlibs
# IJulia (optional)
export JULIA_DEPOT_PATH=${WORK}/jlab_menv/lib/Jlibs
# Launch JupyterLab (on the shell a link that can be copied on the browser will appear)
cd ${WORK}
micromamba run --prefix ${WORK}/jlab_menv jupyter-lab
```
Before you can copy and paste the link into your favorite browser, you will need to establish an SSH tunnel to the interactive node. From a UNIX-like workstation, you can establish the SSH tunnel to the curnagl node with the following command (replace <username> with your user name, <hostname> with the name of the node you obtained above, and <port> with the port number from the link, typically 8888):
```
ssh -n -N -J <username>@curnagl.dcsr.unil.ch -L <port>:localhost:<port> <username>@<hostname>
```
You will be prompted for your password. When you have finished, you can close the tunnel with Ctrl-C.
### Note on Python/R/Julia modules and packages
The modules you install manually from JupyterLab in Python, R or Julia end up inside the JupyterLab virtual environment (${WORK}/jlab\_menv). They are hence isolated and independent from your Python/R/Julia instances outside of the virtual environment.
# Dask on curnagl
In order to use Dask on Curnagl you need the following packages:
- dask
- dask-jobqueue
Note: please make sure to use version 2022.11.0 or later. Previous versions have bugs that make worker nodes very slow when using several threads.
Dask makes it easy to parallelize computations: you can run compute-intensive methods in parallel by assigning them to different CPU resources.
For example:
```python
def cpu_intensive_method(x, y, z):
    # CPU computations
    return x + 1

# `client` is a dask.distributed Client, and list_x, list_y, list_z
# are your input data (see the complete examples below)
futures = []
for x, y, z in zip(list_x, list_y, list_z):
    future = client.submit(cpu_intensive_method, x, y, z)
    futures.append(future)
result = client.gather(futures)
```
This documentation proposes two modes of use:
- LocalCluster: this mode is very simple and can be used to parallelize computations easily by submitting just one job to the cluster. This is a good starting point.
- SlurmCluster: this mode handles more parallelism by distributing work across several machines. It can monitor the load and automatically submit new jobs to increase parallelism.
### Local cluster
The Python script looks like this:
```python
import dask
from dask.distributed import Client, LocalCluster

def compute(x):
    """CPU-demanding code"""

if __name__ == "__main__":
    cluster = LocalCluster()
    client = Client(cluster)
    parameters = [1, 2, 3, 4]
    futures = []
    for x in parameters:
        future = client.submit(compute, x)
        futures.append(future)
    result = client.gather(futures)
```
Calls to LocalCluster and Client should be placed inside the `if __name__ == "__main__":` block. For more information, you can check the following link: [https://docs.dask.org/en/stable/scheduling.html](https://docs.dask.org/en/stable/scheduling.html)
The method LocalCluster() will deploy N workers, each using T threads, such that N×T equals the number of cores reserved by SLURM. Dask balances the number of workers against the number of threads per worker; the goal is to take advantage of GIL-free workloads such as NumPy and Pandas.
SLURM script:
```bash
#!/bin/bash -l
#SBATCH --job-name dask_job
#SBATCH --ntasks 16
#SBATCH -N 1
#SBATCH --partition cpu
#SBATCH --cpus-per-task 1
#SBATCH --time 01:00:00
#SBATCH --output=dask_job-%j.out
#SBATCH --error=dask_job-%j.error
python script.py
```
Make sure to include the parameter `-N 1`, otherwise SLURM may allocate tasks on different nodes, which will make the Dask local cluster fail. You should adapt the parameter `--ntasks`: as we are using just one machine, we can choose between 1 and 48. Keep in mind that the smaller the number, the faster your job will start. You can choose to run with fewer processes but for a longer time.
### Slurm cluster
The Python script can be launched directly from the frontend, but you need to keep your session open with tools such as `tmux` or `screen`, otherwise your jobs will be cancelled.
In your Python script you should put something like:
```python
import dask
from dask.distributed import Client
from dask_jobqueue import SLURMCluster

def compute(x):
    """CPU-demanding code"""

if __name__ == "__main__":
    cluster = SLURMCluster(cores=8, memory="40GB")
    client = Client(cluster)
    cluster.adapt(maximum_jobs=5, interval="10000 ms")
    parameters = [1, 2, 3, 4]
    futures = []
    for x in parameters:
        future = client.submit(compute, x)
        futures.append(future)
    result = client.gather(futures)
```
In this case Dask will launch jobs with 8 cores and 40GB of memory. The parameters `memory` and `cores` are mandatory. There are two methods to launch jobs: `adapt` and `scale`. `adapt` will launch/kill jobs taking into account the load of your computation and how many computations you can run in parallel. You can put a limit on the number of jobs that will be launched. The parameter `interval` is necessary and needs to be set to `10000 ms` to avoid killing jobs too early.
`scale` will create a static infrastructure composed of a fixed number of jobs, specified with the parameter `jobs`. For example, `scale(jobs=10)` will launch 10 jobs independently of the load and the amount of computation you generate.
#### Some facts about Slurm jobs and Dask
Keep in mind that your computation depends on the availability of resources: if the jobs are not running, your computation will not start. So if you think your computation is stuck, first verify that the jobs have been submitted and are running, using the command `squeue -u $USER`.
By default the walltime is set to 30 min; use the parameter `walltime` (e.g. `SLURMCluster(cores=8, memory="40GB", walltime="01:00:00")`) if you think that each individual computation will last longer than the default time.
Slurm output files will be generated in the directory from which you launch your Python command.
Jobs will be killed by Dask when there is no more computation to be done. If you see the message:
`slurmstepd: error: *** JOB 25260254 ON dna051 CANCELLED AT 2023-03-01T11:00:19 ***`
this is completely normal and does not mean that there was an error in your computation.
### Optimal number of workers
Both LocalCluster and SLURMCluster automatically balance the number of workers and the number of threads per worker. You can choose the number of workers using the parameter `n_workers`. If most of the computation relies on NumPy or Pandas, it is preferable to have only one worker (`n_workers=1`). If most of the computation is pure Python code, you should use as many workers as possible. Example:
Local cluster:
`LocalCluster(n_workers=int(os.environ['SLURM_NTASKS']))`
Slurm cluster:
`SLURMCluster(cores=8, memory="40GB", n_workers=8)`
### Example
Here is an example which illustrates the use of Dask. The code runs 40 multiplications of random matrices of size N×N; each computation returns the sum of all the elements of the result matrix:
```python
import os
import time
import numpy as np
from dask.distributed import Client, LocalCluster
from dask_jobqueue import SLURMCluster

SIZE = 9192

def compute(tag):
    np.random.seed(tag)
    A = np.random.random((SIZE, SIZE))
    B = np.random.random((SIZE, SIZE))
    start = time.time()
    C = np.dot(A, B)
    end = time.time()
    elapsed = end - start
    return elapsed, np.sum(C)

if __name__ == "__main__":
    # cluster = LocalCluster(n_workers=int(os.environ['SLURM_NTASKS']))
    cluster = SLURMCluster(cores=8, memory="40GB", n_workers=8)
    client = Client(cluster)
    cluster.adapt(maximum_jobs=5, interval="10000 ms")
    N_ITER = 40
    futures = []
    for i in range(N_ITER):
        future = client.submit(compute, i)
        futures.append(future)
    results = client.gather(futures)
    print(results)
```
# Running the Isca framework on the cluster
Isca is a framework for the idealized modelling of the global circulation of planetary atmospheres at varying levels of complexity and realism. The framework is an outgrowth of models from GFDL designed for Earth's atmosphere, but it may readily be extended into other planetary regimes.
### Installation
First of all define a folder ${WORK} on the /work or the /scratch filesystem (somewhere where you have write permissions):
```bash
export WORK=/work/FAC/...
mkdir -p ${WORK}
```
Load the following relevant modules and create a python virtual environment:
```bash
dcsrsoft use arolle
module load gcc/10.4.0
module load mvapich2/2.3.7
module load netcdf-c/4.8.1-mpi
module load netcdf-fortran/4.5.4
module load python/3.9.13
python -m venv ${WORK}/isca_venv
```
Install the required python modules:
```bash
${WORK}/isca_venv/bin/pip install dask f90nml ipykernel Jinja2 numpy pandas pytest sh==1.14.3 tqdm xarray
```
Download and install the Isca framework:
```bash
cd ${WORK}
git clone https://github.com/ExeClim/Isca
cd Isca/src/extra/python
${WORK}/isca_venv/bin/pip install -e .
```
Patch the Isca makefile:
```bash
sed -i 's/-fdefault-double-8$/-fdefault-double-8 \\\n -fallow-invalid-boz -fallow-argument-mismatch/' ${WORK}/Isca/src/extra/python/isca/templates/mkmf.template.gfort
```
Create the environment file for curnagl:
```bash
cat << EOF > ${WORK}/Isca/src/extra/env/curnagl-gfortran
echo Loading basic gfortran environment
# this defaults to ia64, but we will use gfortran, not ifort
export GFDL_MKMF_TEMPLATE=gfort
export F90=mpifort
export CC=mpicc
EOF
```
### Compiling and running the Held-Suarez dynamical core test case
Compilation takes place automatically at runtime. After logging in to the cluster, create a SLURM script file `start.sbatch` with the following contents:
```bash
#!/bin/bash -l
#SBATCH --account ACCOUNT_NAME
#SBATCH --mail-type ALL
#SBATCH --mail-user <first.lastname>@unil.ch
#SBATCH --chdir ${WORK}
#SBATCH --job-name isca_held-suarez
#SBATCH --output=isca_held-suarez.job.%j
#SBATCH --partition cpu
#SBATCH --nodes 1
#SBATCH --ntasks 1
#SBATCH --cpus-per-task 16
#SBATCH --mem 8G
#SBATCH --time 00:29:59
#SBATCH --export ALL
dcsrsoft use arolle
module load gcc/10.4.0
module load mvapich2/2.3.7
module load netcdf-c/4.8.1-mpi
module load netcdf-fortran/4.5.4
WORK=$(pwd)
export GFDL_BASE=${WORK}/Isca
export GFDL_ENV=curnagl-gfortran
export GFDL_WORK=${WORK}/isca_work
export GFDL_DATA=${WORK}/isca_gfdl_data
export C_INCLUDE_PATH=${NETCDF_C_ROOT}/include
export LIBRARY_PATH=${NETCDF_C_ROOT}/lib
sed -i "s/^NCORES =.*$/NCORES = $(echo ${SLURM_CPUS_PER_TASK:-1})/" ${GFDL_BASE}/exp/test_cases/held_suarez/held_suarez_test_case.py
${WORK}/isca_venv/bin/python $GFDL_BASE/exp/test_cases/held_suarez/held_suarez_test_case.py
```
You need to carefully replace the following elements at the beginning of the file:
- ***ACCOUNT\_NAME*** (on the `--account` line) with the project id that was attributed to your PI for the given project
- ***<first.lastname>@unil.ch*** (on the `--mail-user` line) with your e-mail address (or double-comment that line with an additional '#' if you don't wish to receive e-mail notifications about the status of the job)
- ***${WORK}*** (on the `--chdir` line) with the **absolute path** (ex. */work/FAC/.../isca*) to the folder you created during the installation steps
You can also adjust the `--cpus-per-task`, `--mem` and `--time` values for the job (the present values are appropriate for the default Held-Suarez example).
Then you can simply start the job:
```bash
sbatch start.sbatch
```
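You can then follow the job with the usual Slurm commands, for example (replace `<jobid>` with the id printed by `sbatch`):
```bash
squeue -u $USER
tail -f isca_held-suarez.job.<jobid>
```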
# Running the MPAS framework on the cluster
The Model for Prediction Across Scales (MPAS) is a collaborative project for developing atmosphere, ocean and other earth-system simulation components for use in climate, regional climate and weather studies.
### Compilation
First of all define a folder ${WORK} on the /work or the /scratch filesystem (somewhere where you have write permissions):
```bash
export WORK=/work/FAC/...
mkdir -p ${WORK}
```
Load the following relevant modules:
```bash
module load gcc/11.4.0
module load mvapich2/2.3.7-1
module load parallel-netcdf/1.12.3
module load parallelio/2.6.2
export PIO=$PARALLELIO_ROOT
export PNETCDF=$PARALLEL_NETCDF_ROOT
```
Download the MPAS framework:
```bash
cd ${WORK}
git clone https://github.com/MPAS-Dev/MPAS-Model --depth 1 --branch $(curl -sL https://api.github.com/repos/MPAS-Dev/MPAS-Model/releases/latest | grep -i "tag_name" | awk -F '"' '{print $4}')
```
This is going to download the source code of the latest release of MPAS. The last version that was successfully tested on the `curnagl` cluster with the present instructions is `v8.1.0` and future versions might need some adjustments to compile and run.
Patch the MPAS Makefile:
```bash
sed -i 's/-ffree-form/-ffree-form -fallow-argument-mismatch/' ${WORK}/MPAS-Model/Makefile
sed -i 's/ mpi_f08_test//' ${WORK}/MPAS-Model/Makefile
```
This is going to force MPAS to use the old MPI wrapper for Fortran 90. When compiling with GCC older than version 12.0, a bug in the C binding interoperability feature ([https://gcc.gnu.org/bugzilla/show\_bug.cgi?id=104100](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104100)) used by the MPI wrapper for Fortran 2008 breaks the code. If you are compiling with GCC 12.0 or newer, you do not need this patch, and the new wrapper will be used successfully.
Compile:
```bash
cd ${WORK}/MPAS-Model
make gfortran CORE=init_atmosphere AUTOCLEAN=true PRECISION=single OPENMP=true USE_PIO2=true
make gfortran CORE=atmosphere AUTOCLEAN=true PRECISION=single OPENMP=true USE_PIO2=true
```
### Running a basic global simulation
Here we aim at running a basic global simulation, just to test that the framework runs. We need to proceed in three steps:
1. Process time-invariant fields, which will be interpolated onto a given mesh; this step produces a "static" file
2. Interpolate time-varying meteorological and land-surface fields from intermediate files (produced by the ungrib component of the WRF Pre-processing System); this step produces an "init" file
3. Run the basic simulation
##### Create the run folder and link to the binary files
```bash
cd ${WORK}
mkdir -p run
cd run
ln -s ${WORK}/MPAS-Model/init_atmosphere_model
ln -s ${WORK}/MPAS-Model/atmosphere_model
```
##### Get the mesh files
```bash
cd ${WORK}
wget https://www2.mmm.ucar.edu/projects/mpas/atmosphere_meshes/x1.40962.tar.gz
wget https://www2.mmm.ucar.edu/projects/mpas/atmosphere_meshes/x1.40962_static.tar.gz
cd run
tar xvzf ../x1.40962.tar.gz
tar xvzf ../x1.40962_static.tar.gz
```
##### Create the configuration files for the "static" run
The `namelist.init_atmosphere` file:
```bash
cat << EOF > ${WORK}/run/namelist.init_atmosphere
&nhyd_model
config_init_case = 7
/
&data_sources
config_geog_data_path = '${WORK}/WPS_GEOG/'
config_landuse_data = 'MODIFIED_IGBP_MODIS_NOAH'
config_topo_data = 'GMTED2010'
config_vegfrac_data = 'MODIS'
config_albedo_data = 'MODIS'
config_maxsnowalbedo_data = 'MODIS'
/
&preproc_stages
config_static_interp = true
config_native_gwd_static = true
config_vertical_grid = false
config_met_interp = false
config_input_sst = false
config_frac_seaice = false
/
EOF
```
The `streams.init_atmosphere` file:
```bash
cat << EOF > ${WORK}/run/streams.init_atmosphere
EOF
```
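The XML stream definitions for the static step go between the two heredoc markers. A minimal sketch, assuming the x1.40962 mesh downloaded above (the stream names and filename templates follow MPAS-Atmosphere tutorial conventions and should be checked against the official documentation):
```bash
# illustrative stream definitions -- verify against the MPAS-Atmosphere user's guide
cat << EOF > ${WORK}/run/streams.init_atmosphere
<streams>
<immutable_stream name="input"
                  type="input"
                  filename_template="x1.40962.grid.nc"
                  input_interval="initial_only" />
<immutable_stream name="output"
                  type="output"
                  filename_template="x1.40962.static.nc"
                  output_interval="initial_only" />
</streams>
EOF
```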
##### Proceed to the "static" run
You will need to make sure that the folder `${WORK}/WPS_GEOG` exists and contains all the appropriate data.
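If you do not have this geographical dataset yet, it can be downloaded from the WRF users' page; a sketch, assuming the mandatory high-resolution fields are enough for this test case (verify the archive name on the WRF/WPS download page):
```bash
# illustrative download -- check the WRF users' page for the current archive
cd ${WORK}
wget https://www2.mmm.ucar.edu/wrf/src/wps_files/geog_high_res_mandatory.tar.gz
tar xvzf geog_high_res_mandatory.tar.gz
```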
First create a `start_mpas_init.sbatch` file (carefully replace `ACCOUNT_NAME` on the `--account` line by your actual project name, and type your e-mail address on the `--mail-user` line, or double-comment that line with an additional `#` if you don't wish to receive job notifications):
```bash
cat << EOF > ${WORK}/run/start_mpas_init.sbatch
#!/bin/bash -l
#SBATCH --account ACCOUNT_NAME
#SBATCH --mail-type ALL
#SBATCH --mail-user <first.lastname>@unil.ch
#SBATCH --chdir ${WORK}/run
#SBATCH --job-name mpas_init
#SBATCH --output=mpas_init.job.%j
#SBATCH --partition cpu
#SBATCH --nodes 1
#SBATCH --ntasks 1
#SBATCH --cpus-per-task 1
#SBATCH --mem 8G
#SBATCH --time 00:59:59
#SBATCH --export ALL
module load gcc/11.4.0
module load mvapich2/2.3.7-1
module load parallel-netcdf/1.12.3
module load parallelio/2.6.2
export PIO=\$PARALLELIO_ROOT
export PNETCDF=\$PARALLEL_NETCDF_ROOT
export LD_LIBRARY_PATH=\$PARALLELIO_ROOT/lib:\$PARALLEL_NETCDF_ROOT/lib:\$LD_LIBRARY_PATH
srun ./init_atmosphere_model
EOF
```
Now start the job with `sbatch start_mpas_init.sbatch` and at the end of the run, make sure that the log file `${WORK}/run/log.init_atmosphere.0000.out` displays no error.
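For example, a quick check from the command line (`grep` exits with a non-zero status when nothing matches, so the message is only printed when the log is clean):
```bash
grep -i error ${WORK}/run/log.init_atmosphere.0000.out || echo "no errors found"
```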
##### Create the configuration files for the "init" run
The `namelist.init_atmosphere` file:
```bash
cat << EOF > ${WORK}/run/namelist.init_atmosphere
&nhyd_model
config_init_case = 7
config_start_time = '2014-09-10_00:00:00'
/
&dimensions
config_nvertlevels = 55
config_nsoillevels = 4
config_nfglevels = 38
config_nfgsoillevels = 4
/
&data_sources
config_met_prefix = 'GFS'
config_use_spechumd = false
/
&vertical_grid
config_ztop = 30000.0
config_nsmterrain = 1
config_smooth_surfaces = true
config_dzmin = 0.3
config_nsm = 30
config_tc_vertical_grid = true
config_blend_bdy_terrain = false
/
&preproc_stages
config_static_interp = false
config_native_gwd_static = false
config_vertical_grid = true
config_met_interp = true
config_input_sst = false
config_frac_seaice = true
/
EOF
```
The `streams.init_atmosphere` file:
```bash
cat << EOF > ${WORK}/run/streams.init_atmosphere
EOF
```
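As above, the XML body is not shown. A minimal content for the "init" step, again based on the standard MPAS-Atmosphere tutorial (file names are assumptions), reads the static file produced in the previous step and writes the init file:
```bash
cat << EOF > ${WORK}/run/streams.init_atmosphere
<streams>
<immutable_stream name="input"
                  type="input"
                  filename_template="x1.40962.static.nc"
                  input_interval="initial_only" />
<immutable_stream name="output"
                  type="output"
                  filename_template="x1.40962.init.nc"
                  packages="initial_conds"
                  output_interval="initial_only" />
</streams>
EOF
```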
##### Proceed to the "init" run
Just start the job again with `sbatch start_mpas_init.sbatch` and, at the end of the run, make sure that the log file `${WORK}/run/log.init_atmosphere.0000.out` displays no error.
##### Create the configuration file for the global simulation
The `namelist.atmosphere` file:
```bash
cat << EOF > ${WORK}/run/namelist.atmosphere
&nhyd_model
config_time_integration_order = 2
config_dt = 720.0
config_start_time = '2014-09-10_00:00:00'
config_run_duration = '0_03:00:00'
config_split_dynamics_transport = true
config_number_of_sub_steps = 2
config_dynamics_split_steps = 3
config_h_mom_eddy_visc2 = 0.0
config_h_mom_eddy_visc4 = 0.0
config_v_mom_eddy_visc2 = 0.0
config_h_theta_eddy_visc2 = 0.0
config_h_theta_eddy_visc4 = 0.0
config_v_theta_eddy_visc2 = 0.0
config_horiz_mixing = '2d_smagorinsky'
config_len_disp = 120000.0
config_visc4_2dsmag = 0.05
config_w_adv_order = 3
config_theta_adv_order = 3
config_scalar_adv_order = 3
config_u_vadv_order = 3
config_w_vadv_order = 3
config_theta_vadv_order = 3
config_scalar_vadv_order = 3
config_scalar_advection = true
config_positive_definite = false
config_monotonic = true
config_coef_3rd_order = 0.25
config_epssm = 0.1
config_smdiv = 0.1
/
&damping
config_zd = 22000.0
config_xnutr = 0.2
/
&limited_area
config_apply_lbcs = false
/
&io
config_pio_num_iotasks = 0
config_pio_stride = 1
/
&decomposition
config_block_decomp_file_prefix = 'x1.40962.graph.info.part.'
/
&restart
config_do_restart = false
/
&printout
config_print_global_minmax_vel = true
config_print_detailed_minmax_vel = false
/
&IAU
config_IAU_option = 'off'
config_IAU_window_length_s = 21600.
/
&physics
config_sst_update = false
config_sstdiurn_update = false
config_deepsoiltemp_update = false
config_radtlw_interval = '00:30:00'
config_radtsw_interval = '00:30:00'
config_bucket_update = 'none'
config_physics_suite = 'mesoscale_reference'
/
&soundings
config_sounding_interval = 'none'
/
EOF
```
The `streams.atmosphere` file:
```bash
cat << 'EOF' > ${WORK}/run/streams.atmosphere
EOF
```
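The XML body is not shown here either. Since the run-time `streams.atmosphere` also references `stream_list.atmosphere.*` files, the simplest approach is probably to start from the defaults that the build should have placed at the top of `${WORK}/MPAS-Model` (an assumption about the build layout) and check that the `input` stream points at the init file produced above:
```bash
cp ${WORK}/MPAS-Model/streams.atmosphere ${WORK}/run/
cp ${WORK}/MPAS-Model/stream_list.atmosphere.* ${WORK}/run/
# verify that the "input" stream's filename_template matches x1.40962.init.nc
grep filename_template ${WORK}/run/streams.atmosphere
```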
##### Run the whole simulation
You will need to copy relevant data to the run folder:
```bash
cp ${WORK}/MPAS-Model/{GENPARM.TBL,LANDUSE.TBL,OZONE_DAT.TBL,OZONE_LAT.TBL,OZONE_PLEV.TBL,RRTMG_LW_DATA,RRTMG_SW_DATA,SOILPARM.TBL,VEGPARM.TBL} ${WORK}/run/.
```
Then create a `start_mpas.sbatch` file (carefully replace `ACCOUNT_NAME` on the `#SBATCH --account` line with your actual project name and put your e-mail address on the `#SBATCH --mail-user` line, or comment that line out with an additional `#` if you don't wish to receive job notifications):
```bash
cat << EOF > ${WORK}/run/start_mpas.sbatch
#!/bin/bash -l
#SBATCH --account ACCOUNT_NAME
#SBATCH --mail-type ALL
#SBATCH --mail-user @unil.ch
#SBATCH --chdir ${WORK}/run
#SBATCH --job-name mpas
#SBATCH --output=mpas.job.%j
#SBATCH --partition cpu
#SBATCH --nodes 1
#SBATCH --ntasks 1
#SBATCH --cpus-per-task 16
#SBATCH --mem 8G
#SBATCH --time 00:59:59
#SBATCH --export ALL
module load gcc/11.4.0
module load mvapich2/2.3.7-1
module load parallel-netcdf/1.12.3
module load parallelio/2.6.2
export PIO=\$PARALLELIO_ROOT
export PNETCDF=\$PARALLEL_NETCDF_ROOT
export LD_LIBRARY_PATH=\$PARALLELIO_ROOT/lib:\$PARALLEL_NETCDF_ROOT/lib:\$LD_LIBRARY_PATH
srun ./atmosphere_model
EOF
```
Now start the job with `sbatch start_mpas.sbatch` and at the end of the run, make sure that the log file `${WORK}/run/log.atmosphere.0000.out` displays no error.
# Run OpenFOAM codes on Curnagl
### Script to run OpenFOAM code
##### **You are using OpenFOAM on your computer and you need more resources. Let's move to Curnagl!**
OpenFOAM usually uses MPI. Here is a bash script to run your parallelized OpenFOAM code. NTASKS should be replaced by the number of processes you want your OpenFOAM code to use. It is good practice to put your OpenFOAM commands in a bash file instead of calling them directly in the sbatch file.
For instance, create `openfoam.sh` in which you call your OpenFOAM code (replace commands with yours):
```bash
#!/bin/bash
# First command
decomposePar ...
# Second command: if you are using a parallel command, CALL IT WITH THE SRUN COMMAND
srun snappyHexMesh -parallel ...
```
Then, create a sbatch file to run your OpenFOAM bash file on Curnagl:
```bash
#!/bin/bash -l
#SBATCH --job-name openfoam
#SBATCH --output openfoam.out
#SBATCH --partition cpu
#SBATCH --nodes 1
#SBATCH --ntasks NTASKS
#SBATCH --cpus-per-task 1
#SBATCH --mem 8G
#SBATCH --time 00:30:00
#SBATCH --export NONE
module purge
module load gcc/10.4.0 mvapich2/2.3.7 openfoam/2206
export SLURM_EXPORT_ENV=ALL
# RUN YOUR BASH OPENFOAM CODE HERE
bash ./openfoam.sh
```
Please note that your parallelized OpenFOAM code should not be run via `mpirun` but via `srun`. For a complete MPI overview on Curnagl, please refer to the [compiling and running MPI codes](https://wiki.unil.ch/ci/books/service-de-calcul-haute-performance-%28hpc%29/page/compiling-and-running-mpi-codes "compiling and running MPI codes") wiki page.
### How do I transfer my OpenFOAM code to Curnagl?
You can upload your OpenFOAM code with FileZilla, or copy data to the cluster with the `scp` command.
Example: I want to copy test.py to Curnagl. I run the following command:
`scp test.py @curnagl.dcsr.unil.ch:/YOUR_PATH_ON_CURNAGL`
Where `YOUR_PATH_ON_CURNAGL` is something like `/users/username/work/my_folder`.
In these commands, do not forget to use your own username.
**This transfer can be done for any file type: .py, .csv, .h, images...**
**To copy a folder, use the command `scp -r`.**
**For more details, refer to [transfer files to/from Curnagl](https://wiki.unil.ch/ci/books/high-performance-computing-hpc/page/transfer-files-tofrom-curnagl "transfer files to/from Curnagl") wiki.**
# Compiling software using cluster libraries
If you see the following error when compiling a code on the cluster:
```bash
fatal error: XXXX.h: No such file or directory
```
This means that the software you are trying to compile needs a specific header file provided by a third-party library. In order to use a third-party library, the compiler mainly needs two things:
- a header file: XXXX.h
- the binary of the library: XXXX.so
By default on Linux systems, these files are located in standard paths such as /usr and /lib. There are two ways to tell the compiler where to look for them: through the Makefile or using compiler environment variables.
### Makefile
Makefiles provide the following [Variables](https://www.gnu.org/software/make/manual/make.html#Implicit-Variables) :
- CFLAGS
- CXXFLAGS
- FFLAGS
- LDFLAGS
The first three variables are used to pass extra options to a specific compiler and language (C, C++ and Fortran respectively). The last variable is meant to pass the `-L` and `-l` options, which are used by the linker.
**Example**
```bash
CFLAGS+= -I/usr/local/cuda/include
LDFLAGS+= -L/usr/local/cuda/lib -lcudnn
```
Here we tell the compiler where to find the include files and where the libraries are located. These variables should already be present in the Makefile and used during the compilation process.
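If the Makefile ignores the environment, the same variables can also be passed on the `make` command line, where they override the Makefile's own assignments. A sketch using the same illustrative CUDA paths as above:
```bash
make CFLAGS="-I/usr/local/cuda/include" LDFLAGS="-L/usr/local/cuda/lib -lcudnn"
```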
### GCC variables
If you are using GCC, you can use the following [variables](https://gcc.gnu.org/onlinedocs/gcc/Environment-Variables.html):
- CPATH
- LIBRARY\_PATH
```bash
# export the variables so that child processes (the compiler) see them
export CPATH=/usr/local/cuda/include
export LIBRARY_PATH=/usr/local/cuda/lib
```
This has the same effect as modifying the variables in the Makefile. This procedure is very useful when you do not have access to the Makefile or when the Makefile variables are not used during compilation.
### Using cluster libraries
On the cluster, libraries are provided by modules, which means that you need to tell the compiler to look for header files and binary files in special locations. The procedure is the following:
- load the library: `module load XXX`
- find the name of the ROOT variable by executing: `module show XXX`
- use that variable in the CFLAGS and LDFLAGS definitions
**Example**
```bash
$ module load cuda
$ module show cuda
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
/dcsrsoft/spack/arolle/v1.0/spack/share/spack/lmod/Zen2-IB/Core/cuda/11.6.2.lua:
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
whatis("Name : cuda")
whatis("Version : 11.6.2")
whatis("Target : zen")
whatis("Short description : CUDA is a parallel computing platform and programming model invented by NVIDIA. It enables dramatic increases in computing performance by harnessing the power of the graphics processing unit (GPU).")
help([[CUDA is a parallel computing platform and programming model invented by
NVIDIA. It enables dramatic increases in computing performance by
harnessing the power of the graphics processing unit (GPU). Note: This
package does not currently install the drivers necessary to run CUDA.
These will need to be installed manually. See:
https://docs.nvidia.com/cuda/ for details.]])
depends_on("libxml2/2.9.13")
prepend_path("LD_LIBRARY_PATH","/dcsrsoft/spack/arolle/v1.0/spack/opt/spack/linux-rhel8-zen/gcc-8.4.1/cuda-11.6.2-rswplbcorqlt6ywhcnbdisk6puje4ejf/lib64")
prepend_path("PATH","/dcsrsoft/spack/arolle/v1.0/spack/opt/spack/linux-rhel8-zen/gcc-8.4.1/cuda-11.6.2-rswplbcorqlt6ywhcnbdisk6puje4ejf/bin")
prepend_path("CMAKE_PREFIX_PATH","/dcsrsoft/spack/arolle/v1.0/spack/opt/spack/linux-rhel8-zen/gcc-8.4.1/cuda-11.6.2-rswplbcorqlt6ywhcnbdisk6puje4ejf/")
setenv("CUDA_HOME","/dcsrsoft/spack/arolle/v1.0/spack/opt/spack/linux-rhel8-zen/gcc-8.4.1/cuda-11.6.2-rswplbcorqlt6ywhcnbdisk6puje4ejf")
setenv("CUDA_ROOT","/dcsrsoft/spack/arolle/v1.0/spack/opt/spack/linux-rhel8-zen/gcc-8.4.1/cuda-11.6.2-rswplbcorqlt6ywhcnbdisk6puje4ejf")
```
You can observe the variable `CUDA_ROOT`, which is the one that should be used:
```
# in the shell, for configure-style builds:
export CFLAGS="-I$CUDA_ROOT/include"
# or in the Makefile:
LDFLAGS+= -L$(CUDA_ROOT)/lib64/stubs -L$(CUDA_ROOT)/lib64/ -lcuda -lcudart -lcublas -lcurand
```
This is quite a complex example; sometimes you only need `-L$(XXX_ROOT)/lib`.
**Example for R package**
In the case of an R package, we do not have control over the Makefile, so the only option is to use the GCC variables. For an R package that depends on the gsl and mpfr libraries, we need to do the following:
```
module load gsl mpfr
export CPATH=$GSL_ROOT/include:$MPFR_ROOT/include
export LIBRARY_PATH=$GSL_ROOT/lib:$MPFR_ROOT/lib
```
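As an illustration, the installation of such a package could then look like this (`Rmpfr` is only an example of a package that links against mpfr, and the module name `r` is an assumption about the local module tree):
```bash
module load gcc r gsl mpfr
export CPATH=$GSL_ROOT/include:$MPFR_ROOT/include
export LIBRARY_PATH=$GSL_ROOT/lib:$MPFR_ROOT/lib
# the compiler invoked by R now finds mpfr.h and libmpfr via the variables above
R -e 'install.packages("Rmpfr", repos = "https://cloud.r-project.org")'
```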
# Course software for Image Analysis with CNNs
You can do the practicals on various computing platforms. However, since the participants may use various types of computers and software, we recommend using the UNIL JupyterLab for the practicals.
- [JupyterLab](https://wiki.unil.ch/ci/books/high-performance-computing-hpc/page/course-software-for-image-analysis-with-cnns#bkmrk-jupyterlab): Working on the cloud is convenient because the installation of the Python packages is already done and you will be working with a Jupyter Notebook style. Note, however, that the UNIL JupyterLab will only be active during the course and for one week following its completion, so in the long term you should use either your laptop or Curnagl.
- [Laptop](https://wiki.unil.ch/ci/books/high-performance-computing-hpc/page/course-software-for-image-analysis-with-cnns#bkmrk-laptop): This is good if you want to work directly on your laptop, but you will need to install the required libraries on your laptop. Warning: We will give general instructions on how to install the libraries on your laptop but it is sometimes tricky to find the right library versions and we will not be able to help you with the installation. The installation should take about 15 minutes.
- [Curnagl](https://wiki.unil.ch/ci/books/high-performance-computing-hpc/page/course-software-for-image-analysis-with-cnns#bkmrk-curnagl): This is efficient if you are used to working on a cluster or if you intend to use one in the future to work on large projects. If you have an account, you can work in your /scratch folder or ask to be part of the course project, but please contact us at least a week before the course. If you do not have an account to access the UNIL cluster Curnagl, please contact us at least a week before the course so that we can give you a temporary account. The installation should take about 15 minutes. Note that it is also possible to use JupyterLab on Curnagl: see [https://wiki.unil.ch/ci/books/high-performance-computing-hpc/page/jupyterlab-on-the-curnagl-cluster](https://wiki.unil.ch/ci/books/high-performance-computing-hpc/page/jupyterlab-on-the-curnagl-cluster)
If you choose to work on the UNIL JupyterLab, then you do not need to prepare anything since all the necessary libraries will already be installed on the UNIL JupyterLab. In all cases, you will receive a guest username during the course, so you will be able to work on the UNIL JupyterLab.
Otherwise, if you prefer to work on your laptop or on Curnagl, please make sure you have a working installation before the day of the course as on the day we will be unable to provide any assistance with this.
If you have difficulties with the installation on Curnagl we can help you, so please contact us before the course at helpdesk@unil.ch with subject: DCSR ML course.
On the other hand, if you are unable to install the libraries on your laptop, we will unfortunately not be able to help you (there are too many particular cases), so you will need to use the UNIL Jupyter Lab during the course.
Before the course, we will send you all the files that are needed to do the practicals.
### **JupyterLab**
Here are some instructions for using the UNIL JupyterLab to do the practicals.
Go to the webpage: [https://jupyter.dcsr.unil.ch/jupyter](https://jupyter.dcsr.unil.ch/jupyter)
Enter the login and password that you have received during the course.
#### **Image Classification**
We have already prepared your workspace, including the data and notebook. However, in case there is a problem, you can follow these instructions.
Click on the button "New Folder" (the small logo of of folder with a "+" sign) and name it "models".
Click again on the same button "New Folder" and name it "images".
Double click on the "images" folder that you have just created.
Click on the button "Upload Files" (the vertical arrow logo) and upload the three images (car.jpeg, frog.jpeg and ship.jpeg) that are included in "images" directory you have received for this course.
Click on the folder logo (just on top of "Name") to come out of the "images" folder.
Double click on the "models" folder and then click on the button "Upload Files" to upload all the "models.keras" and "models.npy" files that are included in the "models" directory you have received for this course.
Click on the folder logo (just on top of "Name") to come out of the "models" folder.
To work with the html file "Convolutional\_Neural\_Networks.html":
- Click on the "CNN" square button in the Notebook panel
- Copy / paste the commands from the html practical file to the Jupyter Notebook
To work with the notebook "Convolutional\_Neural\_Networks.ipynb":
- Upload the notebook "Convolutional\_Neural\_Networks.ipynb"
- Double click on "Convolutional\_Neural\_Networks.ipynb"
- Change the "ipykernel" (top right button "Python 3 ipykernel") to CNN
In the practical code (i.e. the Python code in the html or ipynb file), the following paths were set:
platform = "jupyter"
PATH\_IMAGES = "./images"
PATH\_MODELS = "./models"
To execute a command, click on "Run the selected cells and advance" (the right arrow), or SHIFT + RETURN.
When using TensorFlow, you may receive a warning
2022-09-22 11:01:12.232756: W tensorflow/stream\_executor/platform/default/dso\_loader.cc:64\] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2022-09-22 11:01:12.232856: I tensorflow/stream\_executor/cuda/cudart\_stub.cc:29\] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
You should not worry. By default, TensorFlow is trying to use GPUs and since there are no GPUs, it writes a warning and decides to use CPUs (which is enough for our course).
When you have finished the practicals, select File / Log out.
#### **Image Segmentation**
Now click on the "ImageProcessing" square button in the Notebook panel.
Copy / paste the commands from the html practical file to the Jupyter Notebook.
To execute a command, click on "Run the selected cells and advance" (the right arrow), or SHIFT + RETURN.
### **Laptop**
You may need to install development tools including a C and Fortran compiler (e.g. Xcode on Mac, gcc and gfortran on Linux, Visual Studio on Windows).
#### **Image Classification**
Please decide in which folder (or path) you want to do the practicals and go there:
```
cd THE_PATH_WHERE_I_DO_THE_PRACTICALS
```
Then you need to create two folders:
```
mkdir images
mkdir models
```
Please copy/paste the three images (car.jpeg, frog.jpeg and ship.jpeg) from the "images" folder you have received for this course into your "images" folder, and copy/paste all the "models.keras" and "models.npy" files from the "models" directory you have received into your "models" folder.
In the practical code (i.e. the Python code in the html file), you will need to set the paths as follows:
platform = "laptop"
PATH\_IMAGES = "./images"
PATH\_MODELS = "./models"
Here are some instructions for installing Keras with TensorFlow as the backend (for Python 3), and other libraries, on your laptop. You need Python >= 3.8.
##### **For Linux**
We will use a terminal to install the libraries.
Let us create a virtual environment. Open your terminal and type:
```
python3 -m venv mlcourse
source mlcourse/bin/activate
pip3 install tensorflow tf-keras-vis scikit-learn matplotlib numpy h5py notebook
```
You may need to choose the right library versions, for example `tensorflow==2.12.0`.
To check that Tensorflow was installed:
```
python3 -c "import tensorflow; print(tensorflow.version.VERSION)"
```
There might be a warning message (see above) and the output should be something like "2.12.0".
You can terminate the current session:
```
deactivate
exit
```
**TO DO THE PRACTICALS (today or another day):**
You can use any Python IDE (e.g. Jupyter Notebook or PyCharm), but you need to launch it after activating the virtual environment. For example, for Jupyter Notebook:
```
source mlcourse/bin/activate
jupyter notebook
```
##### **For Mac**
We will use a terminal to install the libraries.
Let us create a virtual environment. Open your terminal and type:
```
python3 -m venv mlcourse
source mlcourse/bin/activate
pip3 install tensorflow-macos==2.12.0 tf-keras-vis scikit-learn matplotlib numpy h5py notebook
```
If you receive an error message such as:
ERROR: Could not find a version that satisfies the requirement tensorflow-macos (from versions: none)
ERROR: No matching distribution found for tensorflow-macos
Then, try the following command:
```
SYSTEM_VERSION_COMPAT=0 pip3 install tensorflow-macos==2.12.0 scikit-learn==1.2.2 scikeras eli5 pandas matplotlib notebook keras-tuner
```
If you have a Mac with an M1 or more recent chip (if you are not sure, have a look at "About this Mac"), you can also install the tensorflow-metal library to accelerate training on Mac GPUs (but this is not necessary for the course):
```
pip3 install tensorflow-metal
```
To check that Tensorflow was installed:
```
python3 -c "import tensorflow; print(tensorflow.version.VERSION)"
```
There might be a warning message (see above) and the output should be something like "2.12.0".
You can terminate the current session:
```
deactivate
exit
```
**TO DO THE PRACTICALS (today or another day):**
You can use any Python IDE (e.g. Jupyter Notebook or PyCharm), but you need to launch it after activating the virtual environment. For example, for Jupyter Notebook:
```
source mlcourse/bin/activate
jupyter notebook
```
##### **For Windows**
If you do not have Python installed, you can use either Conda: [https://docs.conda.io/en/latest/miniconda.html](https://docs.conda.io/en/latest/miniconda.html) (see the instructions here: [https://conda.io/projects/conda/en/latest/user-guide/install/windows.html](https://conda.io/projects/conda/en/latest/user-guide/install/windows.html)) or Python official installer: [https://www.python.org/downloads/windows/](https://www.python.org/downloads/windows/)
We will use a terminal to install the libraries.
Let us create a virtual environment. Open your terminal and type:
```
python -m venv mlcourse
mlcourse\Scripts\activate.bat
pip install tensorflow tf-keras-vis scikit-learn matplotlib numpy h5py notebook
```
You may need to choose the right library versions, for example `tensorflow==2.12.0`.
To check that Tensorflow was installed:
```
python -c "import tensorflow; print(tensorflow.version.VERSION)"
```
There might be a warning message (see above) and the output should be something like "2.12.0".
You can terminate the current session:
```
deactivate
```
**TO DO THE PRACTICALS (today or another day):**
You can use any Python IDE (e.g. Jupyter Notebook or PyCharm), but you need to launch it after activating the virtual environment. For example, for Jupyter Notebook:
```
mlcourse\Scripts\activate.bat
jupyter notebook
```
#### **Image Segmentation**
This part of the course must be done on the UNIL Jupyter Lab but some instructions on how to install the libraries on your laptop will be given at the end of the course.
### **Curnagl**
For the practicals, it will be convenient to be able to copy/paste text from a web page to the terminal on Curnagl, so please make sure you can do it before the course. You also need to make sure that your terminal has an X server.
For Mac users, download and install XQuartz (X server): [https://www.xquartz.org/](https://www.xquartz.org/)
For Windows users, download and install the MobaXterm terminal (which includes an X server). Click on the "Installer edition" button on the following webpage: [https://mobaxterm.mobatek.net/download-home-edition.html](https://mobaxterm.mobatek.net/download-home-edition.html)
For Linux users, you do not need to install anything.
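Once connected with `ssh -Y` (see the instructions below), a quick way to check that X forwarding is active is to inspect the `DISPLAY` variable (the value shown in the comment is only an example):
```bash
echo $DISPLAY   # should print something like "localhost:10.0"; empty output means no X forwarding
```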
When testing if TensorFlow was properly installed (see below) you may receive a warning
2022-03-16 12:15:00.564218: W tensorflow/stream\_executor/platform/default/dso\_loader.cc:64\] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD\_LIBRARY\_PATH: /dcsrsoft/spack/hetre/v1.2/spack/opt/spack/linux-rhel8-zen2/gcc-9.3.0/python-3.8.8-tb3aceqq5wzx4kr5m7s5m4kzh4kxi3ex/lib:/dcsrsoft/spack/hetre/v1.2/spack/opt/spack/linux-rhel8-zen2/gcc-9.3.0/tcl-8.6.11-aonlmtcje4sgqf6gc4d56cnp3mbbhvnj/lib:/dcsrsoft/spack/hetre/v1.2/spack/opt/spack/linux-rhel8-zen2/gcc-9.3.0/tk-8.6.11-2gb36lqwohtzopr52c62hajn4tq7sf6m/lib:/dcsrsoft/spack/hetre/v1.2/spack/opt/spack/linux-rhel8-zen/gcc-8.3.1/gcc-9.3.0-nwqdwvso3jf3fgygezygmtty6hvydale/lib64:/dcsrsoft/spack/hetre/v1.2/spack/opt/spack/linux-rhel8-zen/gcc-8.3.1/gcc-9.3.0-nwqdwvso3jf3fgygezygmtty6hvydale/lib
2022-03-16 12:15:00.564262: I tensorflow/stream\_executor/cuda/cudart\_stub.cc:29\] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
You should not worry. By default, TensorFlow is trying to use GPUs and since there are no GPUs, it writes a warning and decides to use CPUs (which is enough for our course).
#### **Image Classification**
Here are some instructions for installing Keras with TensorFlow as the backend (for Python 3), and other libraries, on the UNIL cluster called Curnagl. Open a terminal on your laptop and type (if you are located outside the UNIL network you will need to activate the UNIL VPN):
```
ssh -Y < my unil username >@curnagl.dcsr.unil.ch
```
Here and in what follows we added the brackets < > to emphasize the username, but you should not write them in the command. Enter your UNIL password.
For Windows users with the MobaXterm terminal: launch MobaXterm, click on "Start local terminal" and type the command `ssh -Y < my unil username >@curnagl.dcsr.unil.ch`. Enter your UNIL password. Then you should be on Curnagl. Alternatively, launch MobaXterm, click on the session icon and then click on the SSH icon. Fill in: remote host = curnagl.dcsr.unil.ch, specify username = < my unil username >. Finally, click OK and enter your password. If you get the question "do you want to save password?", say No if you are not sure. Then you should be on Curnagl.
See also the documentation: [https://wiki.unil.ch/ci/books/high-performance-computing-hpc/page/ssh-connection-to-dcsr-cluster](https://wiki.unil.ch/ci/books/high-performance-computing-hpc/page/ssh-connection-to-dcsr-cluster)
You can do the practicals in your /scratch directory or in the course group "cours\_hpc" if you have asked us in advance:
```
cd /scratch/< my unil username >
or
cd /work/TRAINING/UNIL/CTR/rfabbret/cours_hpc
mkdir < my unil username >
cd < my unil username >
```
You need to make two directories:
```
mkdir images
mkdir models
```
Clone the following git repository:
```
git clone https://c4science.ch/source/CNN_Classification.git
```
Copy the images from CNN\_Classification to images:
```
cp CNN_Classification/*jpeg images
```
You also need to upload all the "models.keras" and "models.npy" files that are included in the "models" directory you have received for this course, and move them to the "models" folder on Curnagl.
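For example, from your laptop (not from Curnagl), the files can be uploaded with `scp`; the target path below is illustrative:
```bash
scp -r models/ < my unil username >@curnagl.dcsr.unil.ch:/scratch/< my unil username >/models
```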
Let us install the libraries on the interactive partition:
```
Sinteractive -m 10G -G 1
module load python/3.10.13 cuda/11.8.0 cudnn/8.7.0.84-11.8
python -m venv mlcourse
source mlcourse/bin/activate
pip install -r CNN_Classification/requirements.txt
```
To check that TensorFlow was installed:
```
python -c 'import tensorflow; print(tensorflow.version.VERSION)'
```
There might be a warning message (see above) and the output should be something like "2.9.1".
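If you want to confirm that TensorFlow also sees the GPU requested with `Sinteractive -G 1`, you can additionally run:
```bash
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
```
An empty list `[]` means that TensorFlow is falling back to the CPU.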
You can terminate the current session:
```
deactivate
exit
```
**TO DO THE PRACTICALS (today or another day):**
```
ssh -Y < my unil username >@curnagl.dcsr.unil.ch
cd /scratch/< my unil username >
or
cd /work/TRAINING/UNIL/CTR/rfabbret/cours_hpc/< my unil username >
```
You can do the practicals on the interactive partition:
```
Sinteractive -m 10G -G 1
module load python/3.10.13 cuda/11.8.0 cudnn/8.7.0.84-11.8
source mlcourse/bin/activate
python
```
In the practical code (i.e. the Python code in the html file), you will need to set the paths as follows:
platform = "curnagl"
PATH\_IMAGES = "./images"
PATH\_MODELS = "./models"
#### **Image Segmentation**
On demand. If you work in a project in which you need to use Curnagl to do segmentations, please contact us.
# Course software for Text Analysis with LLMs
You can do the practicals on various computing platforms. However, since the participants may use various types of computers and software, we recommend using the UNIL JupyterLab for the practicals.
- [JupyterLab](https://wiki.unil.ch/ci/books/high-performance-computing-hpc/page/course-software-for-text-analysis-with-llms#bkmrk-jupyterlab): Working on the cloud is convenient because the installation of the Python packages is already done and you will be working with a Jupyter Notebook style. Note, however, that the UNIL JupyterLab will only be active during the course and for one week following its completion, so in the long term you should use either your laptop or Curnagl.
- [Laptop](https://wiki.unil.ch/ci/books/high-performance-computing-hpc/page/course-software-for-text-analysis-with-llms#bkmrk-laptop): This is good if you want to work directly on your laptop, but you will need to install the required libraries on your laptop. Warning: We will give general instructions on how to install the libraries on your laptop but it is sometimes tricky to find the right library versions and we will not be able to help you with the installation. The installation should take about 15 minutes.
- [Curnagl](https://wiki.unil.ch/ci/books/high-performance-computing-hpc/page/course-software-for-text-analysis-with-llms#bkmrk-curnagl): This is efficient if you are used to working on a cluster or if you intend to use one in the future to work on large projects. If you have an account, you can work in your /scratch folder or ask to be part of the course project, but please contact us at least a week before the course. If you do not have an account to access the UNIL cluster Curnagl, please contact us at least a week before the course so that we can give you a temporary account. The installation should take about 15 minutes. Note that it is also possible to use JupyterLab on Curnagl: see [https://wiki.unil.ch/ci/books/high-performance-computing-hpc/page/jupyterlab-on-the-curnagl-cluster](https://wiki.unil.ch/ci/books/high-performance-computing-hpc/page/jupyterlab-on-the-curnagl-cluster)
If you choose to work on the UNIL JupyterLab, then you do not need to prepare anything since all the necessary libraries will already be installed on the UNIL JupyterLab. In all cases, you will receive a guest username during the course, so you will be able to work on the UNIL JupyterLab.
Otherwise, if you prefer to work on your laptop or on Curnagl, please make sure you have a working installation before the day of the course as on the day we will be unable to provide any assistance with this.
If you have difficulties with the installation on Curnagl we can help you, so please contact us before the course at helpdesk@unil.ch with subject: DCSR ML course.
On the other hand, if you are unable to install the libraries on your laptop, we will unfortunately not be able to help you (there are too many particular cases), so you will need to use the UNIL Jupyter Lab during the course.
Before the course, we will send you all the files that are needed to do the practicals.
### **JupyterLab**
Here are some instructions for using the UNIL JupyterLab to do the practicals.
Go to the webpage: [https://jupyter.dcsr.unil.ch/jupyter](https://jupyter.dcsr.unil.ch/jupyter)
Enter the login and password that you have received during the course.
We have already prepared your workspace, including the data and notebook.
Double click on "Transformers\_with\_Hugging\_Face.ipynb"
Change the "ipykernel" (top right button "Python 3 ipykernel") to LLM
In the notebook, check that
platform = "jupyter"
To execute a command, click on "Run the selected cells and advance" (the right arrow), or SHIFT + RETURN.
When you have finished the practicals, select File / Log out.
### **Laptop**
You may need to install development tools including a C and Fortran compiler (e.g. Xcode on Mac, gcc and gfortran on Linux, Visual Studio on Windows).
Please decide in which folder (or path) you want to do the practicals, go there and copy the notebook there:
```
cd THE_PATH_WHERE_I_DO_THE_PRACTICALS
```
In the notebook, set
platform = "laptop"
Here are some instructions for installing PyTorch and other libraries on your laptop. You need Python >= 3.8.
##### **For Linux**
We will use a terminal to install the libraries.
Let us create a virtual environment. Open your terminal and type:
```
python3 -m venv mlcourse
source mlcourse/bin/activate
pip3 install torch torchvision torchinfo transformers accelerate datasets sentencepiece pandas scikit-learn matplotlib sacremoses notebook ipywidgets gdown wget
```
You may need to choose the right library versions.
To check that PyTorch was installed:
```
python3 -c "import torch; print(torch.__version__)"
```
There might be a warning message (see above) and the output should be something like "2.3.0".
You can terminate the current session:
```
deactivate
exit
```
**TO DO THE PRACTICALS (today or another day):**
You can use any Python IDE (e.g. Jupyter Notebook or PyCharm), but you need to launch it after activating the virtual environment. For example, for Jupyter Notebook:
```
source mlcourse/bin/activate
jupyter notebook
```
##### **For Mac**
We will use a terminal to install the libraries.
Let us create a virtual environment. Open your terminal and type:
```
python3 -m venv mlcourse
source mlcourse/bin/activate
pip3 install torch torchvision torchinfo transformers accelerate datasets sentencepiece pandas scikit-learn matplotlib sacremoses notebook ipywidgets gdown wget
```
You may need to choose the right library versions.
To check that PyTorch was installed:
```
python3 -c "import torch; print(torch.__version__)"
```
There might be a warning message (see above) and the output should be something like "2.3.0".
You can terminate the current session:
```
deactivate
exit
```
**TO DO THE PRACTICALS (today or another day):**
You can use any Python IDE (e.g. Jupyter Notebook or PyCharm), but you need to launch it after activating the virtual environment. For example, for Jupyter Notebook:
```
source mlcourse/bin/activate
jupyter notebook
```
##### **For Windows**
If you do not have Python installed, you can use either Conda: [https://docs.conda.io/en/latest/miniconda.html](https://docs.conda.io/en/latest/miniconda.html) (see the instructions here: [https://conda.io/projects/conda/en/latest/user-guide/install/windows.html](https://conda.io/projects/conda/en/latest/user-guide/install/windows.html)) or Python official installer: [https://www.python.org/downloads/windows/](https://www.python.org/downloads/windows/)
We will use a terminal to install the libraries.
Let us create a virtual environment. Open your terminal and type:
```
python -m venv mlcourse
mlcourse\Scripts\activate.bat
pip install torch torchvision torchinfo transformers accelerate datasets sentencepiece pandas scikit-learn matplotlib sacremoses notebook ipywidgets gdown wget
```
You may need to choose the right library versions.
To check that PyTorch was installed:
```
python3 -c "import torch; print(torch.__version__)"
```
There might be a warning message (see above) and the output should be something like "2.3.0".
You can terminate the current session:
```
deactivate
```
**TO DO THE PRACTICALS (today or another day):**
You can use any Python IDE (e.g. Jupyter Notebook or PyCharm), but you need to launch it after activating the virtual environment. For example, for Jupyter Notebook:
```
mlcourse\Scripts\activate.bat
jupyter notebook
```
### **Curnagl**
For the practicals, it will be convenient to be able to copy/paste text from a web page to the terminal on Curnagl, so please make sure you can do it before the course. You also need to make sure that your terminal has an X server.
For Mac users, download and install XQuartz (X server): [https://www.xquartz.org/](https://www.xquartz.org/)
For Windows users, download and install the MobaXterm terminal (which includes an X server). Click on the "Installer edition" button on the following webpage: [https://mobaxterm.mobatek.net/download-home-edition.html](https://mobaxterm.mobatek.net/download-home-edition.html)
For Linux users, you do not need to install anything.
Here are some instructions for installing PyTorch and other libraries on the UNIL cluster called Curnagl. Open a terminal on your laptop and type (if you are located outside the UNIL network you will need to activate the UNIL VPN):
```
ssh -Y < my unil username >@curnagl.dcsr.unil.ch
```
Here and in what follows we added the brackets < > to emphasize the username, but you should not write them in the command. Enter your UNIL password.
For Windows users with the MobaXterm terminal: launch MobaXterm, click on "Start local terminal" and type the command `ssh -Y < my unil username >@curnagl.dcsr.unil.ch`. Enter your UNIL password. Then you should be on Curnagl. Alternatively, launch MobaXterm, click on the session icon and then click on the SSH icon. Fill in: remote host = curnagl.dcsr.unil.ch, specify username = < my unil username >. Finally, click OK and enter your password. If you get the question "do you want to save password?", say No if you are not sure. Then you should be on Curnagl.
See also the documentation: [https://wiki.unil.ch/ci/books/high-performance-computing-hpc/page/ssh-connection-to-dcsr-cluster](https://wiki.unil.ch/ci/books/high-performance-computing-hpc/page/ssh-connection-to-dcsr-cluster)
You can do the practicals in your /scratch directory or in the course group "cours\_hpc" if you have asked us in advance:
```
cd /scratch/< my unil username >
or
cd /work/TRAINING/UNIL/CTR/rfabbret/cours_hpc
mkdir < my unil username >
cd < my unil username >
```
Clone the following git repository:
```
git clone https://c4science.ch/diffusion/13379/llm_course.git
```
Let us install the libraries on the interactive partition:
```
Sinteractive -m 10G -G 1
module load python/3.10.13 cuda/11.8.0 cudnn/8.7.0.84-11.8
python -m venv mlcourse
source mlcourse/bin/activate
pip install -r llm_course/requirements.txt
```
To check that PyTorch was installed:
```
python3 -c "import torch; print(torch.__version__)"
```
There might be a warning message (see above) and the output should be something like "2.3.0".
You can terminate the current session:
```
deactivate
exit
```
**TO DO THE PRACTICALS (today or another day):**
```
ssh -Y < my unil username >@curnagl.dcsr.unil.ch
cd /scratch/< my unil username >
or
cd /work/TRAINING/UNIL/CTR/rfabbret/cours_hpc/< my unil username >
```
You can do the practicals on the interactive partition:
```
Sinteractive -m 10G -G 1
module load python/3.10.13 cuda/11.8.0 cudnn/8.7.0.84-11.8
source mlcourse/bin/activate
python
```
In the notebook, you will need to set the platform as follows:
platform = "curnagl"
During the practicals, if you receive an error message "Disk quota exceeded", you will need to make some space in your home directory, for example by deleting the `.cache` directory.
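A minimal sketch of how to find and free that space, assuming (as is common with these practicals) that most of it sits in the Hugging Face download cache, which can safely be deleted since models are re-downloaded on demand:
```bash
# show what takes up space in your home directory, largest entries last
du -sh ~/.cache/* 2>/dev/null | sort -h
# remove the Hugging Face model cache (it will be re-downloaded when needed)
rm -rf ~/.cache/huggingface
```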
# Performance of LLM backends and models in Curnagl
## Introduction
This page shows the performance of Llama and Mistral models on Curnagl hardware. We have measured the token throughput, which should give you an idea of what is possible using Curnagl resources. Training and inference times for different tasks can be estimated from these results.
---
## Models and backends tested
### Tested Models
**Llama3**
- Official access to Meta Llama3 models: [Meta Llama3 models on Hugging Face](https://huggingface.co/meta-llama)
- [Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct)
- [Meta-Llama-3.1-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct)
**Mistral**
- Official access to Mistral models: [Mistral models on MistralAI website](https://docs.mistral.ai/getting-started/models/models_overview/)
- Access to Mistral models on Hugging Face: [Mistral models on Hugging Face](https://huggingface.co/mistralai)
- [mistral-7B-Instruct-v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3)
- [Mixtral-8x7B-v0.1-Instruct](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1)
---
### Tested Backends
- [**vLLM**](https://github.com/vllm-project/vllm)
vLLM backend provides efficient memory usage and fast token sampling. This backend is ideal for testing Llama3 and Mistral models in environments that require high-speed responses and low latency.
- [**llama.cpp**](https://github.com/ggerganov/llama.cpp)
llama.cpp was primarily developed for Llama models, but it can be applied to other LLMs. This optimized backend provides efficient inference on GPUs.
- [**Transformers**](https://huggingface.co/docs/transformers)
The Hugging Face Transformers library is one of the most widely used LLM toolkits, if not the most widely used. It is easy to use and supports a wide range of models and backends. One of its main advantages is its quick setup, which enables fast experimentation across architectures.
- [**mistral-inference**](https://github.com/mistralai/mistral-inference)
This is the official inference backend for Mistral. It is (supposed to be) optimized for Mistral's architecture, thus increasing model performance. However, our benchmark results do not show any Mistral-specific advantage, as llama.cpp seems to perform better.
---
## Hardware description
Three different types of GPUs have been used to benchmark LLM models:
- A100, which is available on Curnagl ([official documentation](https://www.nvidia.com/en-us/data-center/a100/))
- GH200, which will be available soon on Curnagl ([official documentation](https://resources.nvidia.com/en-us-grace-cpu/grace-hopper-superchip?ncid=no-ncid))
- L40, which will be available soon on Curnagl ([official documentation](https://www.nvidia.com/en-us/data-center/l40/) and [specifications](https://resources.nvidia.com/en-us-l40s/l40s-datasheet-28413?ncid=no-ncid))
Here are their specifications:
| Characteristics | A100 | GH200 | L40S |
| ---- | ---- | ---- | ---- |
| Number of nodes at UNIL | 8 | 1 | 8 |
| GPU memory (GB) | 40 | 80 | 48 |
| Number of CPUs per NUMA node | 48 | 72 | 8 |
| Memory bandwidth, up to (TB/s) | 1.9 | 4 | 0.86 |
| FP64 performance (teraFlops) | 9.7 | 34 | NA |
| FP64 Tensor Core performance (teraFlops) | 19.5 | 67 | NA |
| FP32 performance (teraFlops) | 19.5 | 67 | 91.6 |
| TF32 performance (teraFlops) | 156 | 494 | 183 |
| TF32 performance with sparsity (teraFlops) | 312 | 989 | 366 |
| FP16 performance (teraFlops) | 312 | 990 | 362 |
| INT8 performance (teraOPS) | 624 | 1979 | 733 |
Depending on the code you are running, one GPU may better suit your requirements and expectations.
**Note:** These architectures are not powerful enough to train Large Language Models.
**Note:** Our benchmarks aim to determine which GPU types should be provided to researchers. If you require new GPUs for your research, feel free to reach out to us through the Help Desk. If you and other researchers agree on the same GPU request, we will do our best to provide new resources that meet your needs.
---
## Inference latency results
This [ShareGPT chat dataset](https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/blob/main/ShareGPT_V3_unfiltered_cleaned_split.json) has been used to benchmark the models.
In order to guarantee the reproducibility of results and to be able to compare the different benchmarks, we set the following parameters (an example invocation mapping these settings onto llama.cpp flags follows the list):
- The maximum number of tokens to generate is set to `400`
- The temperature, which controls the output randomness, is set to `0`
- The context size, which is the number of tokens the model can process within a single input, is set to `default`, i.e. the maximum context size of the model (e.g. 131072 for Llama3.1)
- The GPU is used exclusively
- All models are loaded in FP16 (no quantization)
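As an illustration, here is how these settings could map onto llama.cpp command-line flags (the binary name and the GGUF file name are assumptions that depend on the llama.cpp version and on how the model was exported):
```bash
# -n 400   : maximum number of tokens to generate
# --temp 0 : deterministic output
# -ngl 99  : offload all layers to the GPU (GPU-exclusive run)
./llama-cli -m mistral-7b-instruct-v0.3-f16.gguf -n 400 --temp 0 -ngl 99 -p "Hello"
```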
### Mistral models
#### mistral-7B-Instruct-v0.3
| Backend results (tokens/second) | A100 | GH200 | L40 |
| ---- | ---- | ---- | ---- |
| vllm | 74.1 | - | - |
| llama.cpp | 53.8 | 138.4 | 42.8 |
| Transformers | 30 | 41.3 | 21.6 |
| mistral-inference | 23.4 | - | 25 |
#### Mixtral-8x7B-v0.1-Instruct
| Backend results (tokens/second) | A100 | GH200 | L40 |
| ---- | ---- | ---- | ---- |
| llama.cpp | NA | NA | 23.4 |
| Transformers | NA | NA | 8.5 |
### Llama models
#### 8B Instruct
| Backend results (tokens/second) | A100 | GH200 | L40 |
| ---- | ---- | ---- | ---- |
| llama.cpp | 62.645 | 100.845 | 43.387 |
| Transformers | 31.650 | 43.321 | 21.062 |
| vllm | 44.686 | - | 45.176 |
#### 70B Instruct
| Backend results (tokens/second) | L40 |
| ---- | ---- |
| llama.cpp | 5.029 |
| Transformers | 2.372 |
| vllm | 30.945 |
## Conclusions
- Mixtral 8x7B and Llama 70B Instruct contain tens of billions of parameters, so the memory required for inference can only be provided by several GPUs in the same machine or by combining VRAM and host RAM. This of course degrades performance, because data has to be transferred between the two types of memory, which can be slow. The GH200, with its very high memory bandwidth, performs well in these cases.
- Using a distributed setup adds a lot of latency.
- The Transformers backend offers a good trade-off between learning curve and performance.
- Backends offer the possibility to configure a context size. This parameter has no impact on performance (token throughput), but it is correlated with the amount of VRAM consumed. Therefore, if you want to optimize memory consumption, you should set the context size to an appropriate value.
- The GH200 offers the best inference speed, but it can be difficult to set up and to install libraries on.
- The results shown here were obtained without any optimization. There are optimizations that can be applied, such as quantization and flash attention.