Requesting and using GPUs

GPU Nodes

The axiomgpu partition contains a number of GPU-equipped nodes.

Currently there are 87 nodes, each with 2 NVIDIA A100 GPUs. One additional node is in the interactive partition.

Requesting GPUs

To access the GPUs, you must request them via SLURM, just as you do for other resources such as CPUs and memory.

The flag required is --gres=gpu:1 for 1 GPU per node and --gres=gpu:2 for 2 GPUs per node. 

 An example job script is as follows:

#!/bin/bash

#SBATCH --nodes 1
#SBATCH --ntasks 1
#SBATCH --cpus-per-task 12
#SBATCH --mem 64G
#SBATCH --time 12:00:00

# NOTE - GPUs are in the axiomgpu partition

#SBATCH --partition axiomgpu
#SBATCH --gres gpu:1
#SBATCH --gres-flags enforce-binding

# Set up my modules

module purge
# Load any other modules your code needs here
module load cuda

# Check that the GPU is visible

nvidia-smi

# Run my GPU-enabled Python code

python mygpucode.py 

If the #SBATCH --gres gpu:1 line is omitted, no GPUs will be visible to the job, even if they are present on the compute node.

If you request one GPU it will always be seen as device 0.
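
As a minimal sketch, a job could confirm which devices it was given before starting real work. This assumes that SLURM exports CUDA_VISIBLE_DEVICES for jobs that request GPUs via --gres, which is common but depends on how the cluster is configured. The helper name count_visible_gpus is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: count the entries in a comma-separated device list,
# such as the value of CUDA_VISIBLE_DEVICES.
count_visible_gpus() {
    local devices="${1:-}"
    if [ -z "$devices" ]; then
        echo 0
        return
    fi
    # Split the list on commas and count the resulting fields.
    local IFS=','
    local -a ids
    read -r -a ids <<< "$devices"
    echo "${#ids[@]}"
}

# Assumption: SLURM sets CUDA_VISIBLE_DEVICES when --gres gpu:N is requested.
echo "Visible GPUs: $(count_visible_gpus "${CUDA_VISIBLE_DEVICES:-}")"

# List the devices by index if nvidia-smi is available on this node.
if command -v nvidia-smi > /dev/null 2>&1; then
    nvidia-smi -L || echo "nvidia-smi could not query any GPU"
fi
```

With a single GPU requested, the list would be expected to contain one entry, and CUDA code would address it as device 0.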

The #SBATCH --gres-flags enforce-binding option ensures that the allocated CPUs are on the same PCI bus as the GPU(s), which greatly improves memory bandwidth. This may mean that you have to wait longer for resources to be allocated, but it is strongly recommended.

If you select 2 GPUs then we strongly advise also requesting #SBATCH --exclusive to have all the resources of the node available to your job.
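
For a two-GPU job, the header of a job script might look like the sketch below. It reuses the partition and time limit from the single-GPU example as assumptions, and the final echo is purely illustrative; adjust the values to your own workload.

```shell
#!/bin/bash

#SBATCH --nodes 1
#SBATCH --ntasks 1
#SBATCH --time 12:00:00
#SBATCH --partition axiomgpu

# Request both GPUs on the node and keep the CPUs close to them
#SBATCH --gres gpu:2
#SBATCH --gres-flags enforce-binding

# Reserve the whole node for this job
#SBATCH --exclusive

# SLURM_JOB_ID is set by SLURM inside a job; the fallback only matters
# when the script is run outside the scheduler for a quick check.
echo "Job ${SLURM_JOB_ID:-unknown} requested 2 GPUs with exclusive node access"
```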

 

Using CUDA

To use the CUDA toolkit, load the cuda module:

module load cuda

This loads the nvcc compiler and CUDA libraries. 
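
A quick way to confirm that the toolkit is usable is to check for nvcc and compile a small source file, as in the sketch below. The file name mykernel.cu and the helper have_command are hypothetical placeholders, and the -arch=sm_80 flag assumes the target GPU is an A100 (compute capability 8.0).

```shell
#!/bin/bash
# Hypothetical helper: succeed if the named command is on the PATH.
have_command() {
    command -v "$1" > /dev/null 2>&1
}

# mykernel.cu is a placeholder for your own CUDA source file.
source_file="mykernel.cu"

if ! have_command nvcc; then
    echo "nvcc not found: run 'module load cuda' first" >&2
elif [ -f "$source_file" ]; then
    # Assumption: the target GPU is an A100 (compute capability 8.0, sm_80).
    nvcc -arch=sm_80 -O2 -o mykernel "$source_file"
else
    # No source file yet: just report the compiler version.
    nvcc --version
fi
```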

The NVIDIA CUDA samples are available at /software/external/cuda/10.2/samples. Please make sure to copy them to your home or scratch space before trying to edit and compile them.

There is also a cudnn module for the DNN tools and libraries.

 

Containers and GPUs

Singularity containers can make use of GPUs, but to make the GPUs visible inside the container environment an extra flag, --nv, must be passed to Singularity:

module load singularity

singularity run --nv mycontainer.sif

The full documentation is at https://sylabs.io/guides/3.5/user-guide/gpu.html
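
To check that a container can actually see the GPU, one option is to run nvidia-smi inside it with the --nv flag, as sketched below. The function name check_container_gpu is hypothetical and mycontainer.sif is the placeholder image name used above; the sketch assumes it runs inside a job that has been allocated a GPU.

```shell
#!/bin/bash
# Hypothetical helper: run nvidia-smi inside a container with GPU support.
# Returns non-zero if Singularity or the image is missing, or if the check fails.
check_container_gpu() {
    local image="$1"
    if ! command -v singularity > /dev/null 2>&1; then
        echo "singularity not found: run 'module load singularity' first" >&2
        return 1
    fi
    if [ ! -f "$image" ]; then
        echo "container image not found: $image" >&2
        return 1
    fi
    # --nv makes the host NVIDIA driver libraries and tools visible in the container.
    singularity exec --nv "$image" nvidia-smi
}

# mycontainer.sif is a placeholder; pass your own image as the first argument.
check_container_gpu "${1:-mycontainer.sif}" || echo "GPU check did not run or failed"
```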