Requesting and using GPUs

GPU Nodes

Both Curnagl and Urblauna have nodes with GPUs - on Curnagl these are in a separate partition.

Curnagl

Currently there are 7 nodes, each with 2 NVIDIA A100 GPUs. One additional GPU node is in the interactive partition.

Urblauna

Currently there are 2 nodes, each with 2 NVIDIA A100 GPUs. Each physical GPU is partitioned into 2 instances with 20GB of memory each, so each node appears to have 4 distinct GPUs. These GPUs are also available interactively.

Requesting GPUs

In order to use the GPUs they must be requested via SLURM, just like other resources such as CPUs and memory.

The flag required is --gres=gpu:1 for 1 GPU per node and --gres=gpu:2 for 2 GPUs per node. 

 An example job script is as follows:

#!/bin/bash -l

#SBATCH --nodes 1
#SBATCH --ntasks 1
#SBATCH --cpus-per-task 12
#SBATCH --mem 64G
#SBATCH --time 12:00:00

# GPU partition request only for Curnagl 
#SBATCH --partition gpu

#SBATCH --gres gpu:1
#SBATCH --gres-flags enforce-binding

# Set up my modules

module purge
module load my list of modules
module load cuda

# Check that the GPU is visible

nvidia-smi

# Run my GPU-enabled python code

python mygpucode.py 

If the #SBATCH --gres gpu:1 directive is omitted then no GPUs will be visible, even if they are present on the compute node.

If you request one GPU it will always be seen as device 0.

The #SBATCH --gres-flags enforce-binding option ensures that the CPUs allocated will be on the same PCI bus as the GPU(s), which greatly improves the memory bandwidth. This may mean that you have to wait longer for resources to be allocated, but it is strongly recommended.
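
From inside a job you can verify what has actually been allocated. The following commands could be added to the job script after the module loads (a minimal sketch; the output depends on your request):

# List the GPUs visible to the job (numbering starts at device 0)
nvidia-smi -L
echo "CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"

# Show the GPU/CPU affinity matrix to confirm the CPU binding
nvidia-smi topo -m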

Partitions

The #SBATCH --partition directive can take different options depending on whether you are on Curnagl or on Urblauna.

Curnagl:

  • cpu (default)
  • gpu
  • interactive

Urblauna:

  • urblauna (default)
  • interactive
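
For an interactive session with a GPU, the same --gres flag is used. A minimal sketch using plain srun (the CPU, memory and time values are illustrative and should be adapted to your needs):

# Request 1 GPU, 8 CPUs and 32GB of memory for a 2 hour interactive shell
srun --partition interactive --gres gpu:1 --cpus-per-task 8 --mem 32G --time 2:00:00 --pty bash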

Using CUDA

In order to use the CUDA toolkit there is a module available:

module load cuda

This loads the nvcc compiler and the CUDA libraries. There is also a cudnn module for the cuDNN (deep neural network) tools/libraries.
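
As a quick check that the toolkit is working, a CUDA source file can be compiled directly with nvcc. A minimal sketch, where saxpy.cu is a hypothetical source file and sm_80 is the compute capability of the A100:

module load cuda

# Check which toolkit version was loaded
nvcc --version

# Compile a (hypothetical) CUDA source file for the A100 (compute capability 8.0)
nvcc -O2 -arch=sm_80 -o saxpy saxpy.cu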


Containers and GPUs

Singularity containers can make use of GPUs, but in order to make them visible to the container environment an extra flag, --nv, must be passed to Singularity:

module load singularity

singularity run --nv mycontainer.sif

The full documentation is at https://sylabs.io/guides/3.5/user-guide/gpu.html
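
In a batch job the --nv flag is simply combined with the GPU request described above. A minimal sketch, where mycontainer.sif is a placeholder for your own image:

#!/bin/bash -l

#SBATCH --nodes 1
#SBATCH --ntasks 1
#SBATCH --cpus-per-task 12
#SBATCH --mem 64G
#SBATCH --time 12:00:00

# GPU partition request only for Curnagl
#SBATCH --partition gpu

#SBATCH --gres gpu:1
#SBATCH --gres-flags enforce-binding

module purge
module load singularity

# The --nv flag makes the allocated GPU visible inside the container
singularity run --nv mycontainer.sif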