
Deep Learning with GPUs

The training phase of a deep learning model can be very time consuming. To accelerate this process you may want to use GPUs, in which case the deep learning packages, such as Keras or PyTorch, must be installed properly. This page is a short documentation on how to install some well-known deep learning packages in Python and R. If you encounter any problem during the installation, or if you need to install other deep learning packages (in Python, R or other programming languages), please send an email to helpdesk@unil.ch with the subject "DCSR: Deep Learning package installation" and we will try to help you.

Keras

To install the packages in your home directory:

cd $HOME

Log into a GPU node:

Sinteractive -p interactive -m 4G -G 1

Check that the GPU is visible:

nvidia-smi

Load the required modules and Python:

module purge
module load gcc/9.3.0 cuda/11.2.2 cudnn/8.1.1.33-11.2 python/3.8.8

Create a virtual environment. Here we will call it "venv_keras_gpu", but you may choose another name:

virtualenv -p python venv_keras_gpu

Activate the virtual environment:

source venv_keras_gpu/bin/activate

Install TensorFlow and Keras:

pip install tensorflow
pip install keras

Check that Keras was properly installed:

python -c 'import keras; print(keras.__version__)'

A warning message may appear, but the output should be something like "2.4.3".

You may install extra packages that your deep learning code will use. For example:

pip install numpy
pip install scikit-learn
pip install pandas
pip install matplotlib

Deactivate your virtual environment and log out from the GPU node:

deactivate
exit

Comment

If you want to make your installation more reproducible, you may proceed as follows:

1. Create a file called "requirements.txt" and write the package names inside. You may also specify the package versions. For example:

tensorflow==2.4.1
keras==2.4.3
numpy==1.19.5
scikit-learn==0.24.2
pandas==1.2.4
matplotlib==3.4.2

2. Proceed as above, but instead of installing the packages individually, type 

pip install -r requirements.txt

Run your deep learning code

To test your deep learning code (maximum 1h), say "my_deep_learning_code.py", you may use the interactive mode:

cd /scratch/username/

Sinteractive -p interactive -m 4G -G 1

module load gcc cuda cudnn python/3.8.8

source $HOME/venv_keras_gpu/bin/activate

Run your code:

python my_deep_learning_code.py

or copy/paste your code inside an interactive Python session:

python

copy/paste your code
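
If you do not have a working script yet, here is a minimal sketch of a hypothetical "my_deep_learning_code.py" that trains a small network on random data, just to check that the environment works (the layer sizes and data are arbitrary):

# Minimal sketch: train a small dense network on random (toy) data.
import numpy as np
from tensorflow import keras

# Arbitrary toy data: 1000 samples, 20 features, binary labels.
x = np.random.random((1000, 20))
y = np.random.randint(2, size=(1000, 1))

model = keras.Sequential([
    keras.layers.Dense(64, activation="relu", input_shape=(20,)),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# If a GPU is visible, Keras/TensorFlow uses it automatically.
model.fit(x, y, epochs=5, batch_size=64)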

Comment

To confirm that TensorFlow is using the GPU:

import tensorflow as tf
tf.config.list_physical_devices("GPU")

or to obtain the number of GPUs available:

import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.list_physical_devices("GPU")))

Once you have finished testing your code, you must close your interactive session (by typing exit), and then run it on the cluster by using an sbatch script, say "my_sbatch_script.sh":

#!/bin/bash -l
#SBATCH --account your_account_id
#SBATCH --mail-type ALL
#SBATCH --mail-user firstname.surname@unil.ch

#SBATCH --chdir /scratch/username/
#SBATCH --job-name my_deep_learning_job
#SBATCH --output my_deep_learning_job.out

#SBATCH --partition gpu
#SBATCH --gres gpu:1
#SBATCH --gres-flags enforce-binding
#SBATCH --nodes 1
#SBATCH --ntasks 1
#SBATCH --cpus-per-task 1
#SBATCH --mem 10G
#SBATCH --time 01:00:00

module load gcc cuda cudnn python/3.8.8

source $HOME/venv_keras_gpu/bin/activate

python /PATH_TO_YOUR_CODE/my_deep_learning_code.py

To launch your job:

cd $HOME/PATH_TO_YOUR_SBATCH_SCRIPT/

sbatch my_sbatch_script.sh

Multi-GPU parallelism

If you want to use a single GPU, you do not need to tell Keras to use the GPU. Indeed, if a GPU is available, Keras will use it automatically.

On the other hand, if you want to use 2 (or more) GPUs on the same node, you need to tell Keras explicitly. For that, use the special Keras function "multi_gpu_model" in your Python code "my_deep_learning_code.py":

from keras.utils import multi_gpu_model

parallel_model = multi_gpu_model(model, gpus=2)

This function implements single-machine multi-GPU data parallelism (so gpus >=2). It works in the following way: divide the input data into multiple sub-batches, apply a model copy on each sub-batch, where every model copy is executed on a dedicated GPU, and finally concatenate the results (on CPU) into one big batch. For example, if your batch_size is 64 and you use gpus=2, then we will divide the input data into 2 sub-batches of 32 samples, process each sub-batch on one GPU, then return the full batch of 64 processed samples. This induces quasi-linear speedup. For more information, see the Keras documentation https://faroit.com/keras-docs/2.1.2/utils/#multi_gpu_model
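
As an illustration, here is a minimal sketch of how the call above is typically combined with compile and fit, using toy data (note that multi_gpu_model is only available in older Keras versions, as in the documentation linked above):

# Minimal sketch: data parallelism over 2 GPUs with multi_gpu_model (older Keras API).
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.utils import multi_gpu_model

# Toy data, for illustration only.
x = np.random.random((1000, 20))
y = np.random.randint(2, size=(1000, 1))

model = Sequential([Dense(64, activation="relu", input_shape=(20,)),
                    Dense(1, activation="sigmoid")])

# Replicate the model on 2 GPUs; each batch of 64 is split into two sub-batches of 32.
parallel_model = multi_gpu_model(model, gpus=2)
parallel_model.compile(optimizer="adam", loss="binary_crossentropy")
parallel_model.fit(x, y, epochs=5, batch_size=64)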

And the sbatch script must contain the line:

#SBATCH --gres gpu:2

TensorFlow

The installation of TensorFlow is the same as for Keras (see the Keras instructions above), except that you do not need to install Keras and that you may want to call your virtual environment "venv_tensorflow_gpu".
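
Once installed, you may check the TensorFlow version in the same way as for Keras:

python -c 'import tensorflow as tf; print(tf.__version__)'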

Warning

In TensorFlow 1.15 and previous versions, the packages for CPU and GPU are offered separately:

pip install tensorflow==1.15 # CPU
pip install tensorflow-gpu==1.15 # GPU

PyTorch

To install the packages in your home directory:

cd $HOME

Log into a GPU node:

Sinteractive -p interactive -m 4G -G 1

Check that the GPU is visible:

nvidia-smi

Load the required modules and Python:

module purge
module load gcc cuda cudnn python/3.8.8

Create a virtual environment. Here we will call it "venv_pytorch_gpu", but you may choose another name:

virtualenv -p python venv_pytorch_gpu

Activate the virtual environment:

source venv_pytorch_gpu/bin/activate

Install PyTorch:

pip install torch
pip install torchvision

Check that PyTorch was properly installed:

python -c 'import torch; print(torch.__version__)'

A warning message may appear, but the output should be something like "1.8.1".
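
To also confirm that PyTorch can see the GPU, you may run, for example, the following inside a Python session:

import torch
print(torch.cuda.is_available())   # should be True on a GPU node
print(torch.cuda.device_count())   # number of visible GPUs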

You may install extra packages that your deep learning code will use. For example:

pip install scikit-learn
pip install pandas
pip install matplotlib

Deactivate your virtual environment and log out from the GPU node:

deactivate
exit

Comment

If you want to make your installation more reproducible, you may proceed as follows:

1. Create a file called "requirements.txt" and write the package names inside. You may also specify the package versions. For example:

torch==1.8.1
torchvision==0.9.1
scikit-learn==0.24.2
pandas==1.2.4
matplotlib==3.4.2

2. Proceed as above, but instead of installing the packages individually, type 

pip install -r requirements.txt

Run your deep learning code

To test your deep learning code (maximum 1h), say "my_deep_learning_code.py", you may use the interactive mode:

cd /scratch/username/

Sinteractive -p interactive -m 4G -G 1

module load gcc cuda cudnn python/3.8.8

source $HOME/venv_pytorch_gpu/bin/activate

Run your code:

python my_deep_learning_code.py

or copy/paste your code inside an interactive Python session:

python

copy/paste your code
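
Note that, unlike Keras, PyTorch does not move computations to the GPU automatically: you must transfer the model and the tensors to the GPU explicitly. Here is a minimal sketch of a hypothetical "my_deep_learning_code.py" doing so on random (toy) data:

# Minimal sketch: train a small network on the GPU with PyTorch (toy data).
import torch
import torch.nn as nn

# Select the GPU if available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Toy data and a small model, moved explicitly to the chosen device.
x = torch.randn(1000, 20, device=device)
y = torch.randint(0, 2, (1000, 1), device=device, dtype=torch.float32)

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(),
                      nn.Linear(64, 1), nn.Sigmoid()).to(device)
criterion = nn.BCELoss()
optimizer = torch.optim.Adam(model.parameters())

for epoch in range(5):
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()
    print(f"epoch {epoch}: loss {loss.item():.4f}")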

Once you have finished testing your code, you must close your interactive session (by typing exit), and then run it on the cluster by using an sbatch script, say "my_sbatch_script.sh":

#!/bin/bash -l
#SBATCH --account your_account_id
#SBATCH --mail-type ALL
#SBATCH --mail-user firstname.surname@unil.ch

#SBATCH --chdir /scratch/username/
#SBATCH --job-name my_deep_learning_job
#SBATCH --output my_deep_learning_job.out

#SBATCH --partition gpu
#SBATCH --gres gpu:1
#SBATCH --gres-flags enforce-binding
#SBATCH --nodes 1
#SBATCH --ntasks 1
#SBATCH --cpus-per-task 1
#SBATCH --mem 10G
#SBATCH --time 01:00:00

module load gcc cuda cudnn python/3.8.8

source $HOME/venv_pytorch_gpu/bin/activate

python /PATH_TO_YOUR_CODE/my_deep_learning_code.py

To launch your job:

cd $HOME/PATH_TO_YOUR_SBATCH_SCRIPT/

sbatch my_sbatch_script.sh

R Keras

R Keras is an interface to Python Keras. In simple terms, the Keras R package allows you to enjoy the benefits of R programming while having access to the capabilities of the Python Keras package.

To install the packages in your home directory:

cd $HOME

Log into a GPU node:

Sinteractive -p interactive -m 4G -G 1

Check that the GPU is visible:

nvidia-smi

Load the required modules, Python and R:

module purge
module load gcc cuda cudnn python/3.8.8 r/4.0.4

Launch an R environment:

R

Install the R Keras package by using a virtual environment (called "venv_r-tensorflow_gpu"):

install.packages("keras")

Would you like to use a personal library instead? (yes/No/cancel) yes

Would you like to create a personal library to install packages into? (yes/No/cancel) yes

And select Switzerland for the CRAN mirror.

library(keras)
library("tensorflow")

install_tensorflow(version = "gpu", method = "virtualenv", envname = "venv_r-tensorflow_gpu")

q()

This will install Keras and TensorFlow.

Comment

If you receive an error message concerning "conda", you may need to look at your .bashrc file for a conda init block and comment it out.

Run your deep learning code

To test your deep learning code (maximum 1h), say "my_deep_learning_code.R", you may use the interactive mode:

Sinteractive -p interactive -m 4G -G 1

module load gcc cuda cudnn python/3.8.8 r/4.0.4

R

library(keras)
library("tensorflow")

copy/paste your code

Comment

To confirm that TensorFlow is using the GPU:

tf$config$list_physical_devices("GPU")

or to obtain the number of GPUs available:

print(length(tf$config$list_physical_devices("GPU")))

Once you have finished testing your code, you must close your interactive session (by typing exit), and then run it on the cluster by using an sbatch script, say "my_sbatch_script.sh":

#!/bin/bash -l
#SBATCH --account your_account_id
#SBATCH --mail-type ALL
#SBATCH --mail-user firstname.surname@unil.ch

#SBATCH --chdir /scratch/username/
#SBATCH --job-name my_deep_learning_job
#SBATCH --output my_deep_learning_job.out

#SBATCH --partition gpu
#SBATCH --gres gpu:1
#SBATCH --gres-flags enforce-binding
#SBATCH --nodes 1
#SBATCH --ntasks 1
#SBATCH --cpus-per-task 1
#SBATCH --mem 10G
#SBATCH --time 01:00:00

module load gcc cuda cudnn python/3.8.8 r/4.0.4

Rscript /PATH_TO_YOUR_CODE/my_deep_learning_code.R

To launch your job:

cd $HOME/PATH_TO_YOUR_SBATCH_SCRIPT/

sbatch my_sbatch_script.sh

Multi-GPU parallelism

If you want to use a single GPU, you do not need to tell R Keras to use the GPU. Indeed, if a GPU is available, R Keras will use it automatically.

On the other hand, if you want to use 2 (or more) GPUs on the same node, you need to tell R Keras explicitly. For that, use the special R Keras function "multi_gpu_model" in your R code "my_deep_learning_code.R":

library(keras)
library(tensorflow)

parallel_model <- multi_gpu_model(model, gpus = 2)

This function implements single-machine multi-GPU data parallelism (so gpus >= 2). It works in the following way: divide the input data into multiple sub-batches, apply a model copy on each sub-batch, where every model copy is executed on a dedicated GPU, and finally concatenate the results (on CPU) into one big batch. For example, if your batch_size is 64 and you use gpus=2, then we will divide the input data into 2 sub-batches of 32 samples, process each sub-batch on one GPU, then return the full batch of 64 processed samples. This induces quasi-linear speedup. For more information, see the R Keras documentation: https://keras.rstudio.com/reference/multi_gpu_model.html

And the sbatch script must contain the line:

#SBATCH --gres gpu:2