Job Templates

Here you can find example job script templates for a variety of job types:

  1. Single-threaded tasks
  2. Array jobs
  3. Multi-threaded tasks
  4. MPI tasks
  5. Hybrid MPI/OpenMP tasks
  6. GPU tasks
  7. MPI+GPU tasks

You can copy and paste the examples to use as a base. Don't forget to edit the account and e-mail address, as well as the software you want to use!

For all the possible things you can ask for, see the official documentation at https://slurm.schedmd.com/sbatch.html
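
For example, if you save one of the templates below to a file such as job.sh (the name here is just an example), you can submit it and check on its progress with the standard Slurm commands:

sbatch job.sh
squeue -u $USER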

Single-threaded tasks

Here we want to use a tool that cannot make use of more than one CPU at a time.

The important things to know are:

  • How long do I expect the job to run for?
  • How much memory do I think I need?
  • Do I want e-mail notifications?
  • What modules (or other software) do I need to load?
#!/bin/bash

#SBATCH --nodes 1
#SBATCH --ntasks 1
#SBATCH --cpus-per-task 1
#SBATCH --partition cpu
#SBATCH --mem 8G
#SBATCH --time 12:00:00

#SBATCH --account ulambda_gruyere
#SBATCH --mail-type END,FAIL 
#SBATCH --mail-user ursula.lambda@unil.ch

# Load the required software: e.g.
# module purge
# module load gcc
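
# Finally, run the program. As in the MPI examples further down, mycode.x is
# just a placeholder - replace it with the command you actually want to run

./mycode.x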

Array jobs

Here we want to run an array job where there are N almost identical jobs that differ only in the input parameters.

In this example we use 1 CPU per task, but you can use more (see the multi-threaded task example).

See our introductory course for more details

The important things to know are:

  • How long do I expect each individual job to run for?
  • How much memory do I think I need per individual job?
  • How many array elements do I have?
  • How am I going to prepare my inputs for the elements?
  • Do I want e-mail notifications?
  • What modules (or other software) do I need to load?
#!/bin/bash

#SBATCH --nodes 1
#SBATCH --ntasks 1
#SBATCH --cpus-per-task 1
#SBATCH --mem 8G
#SBATCH --partition cpu
#SBATCH --time 12:00:00
#SBATCH --array=1-100

#SBATCH --account ulambda_gruyere
#SBATCH --mail-type END,FAIL 
#SBATCH --mail-user ursula.lambda@unil.ch

# Extract the parameters from a file (one line per job array element)

INPUT=$(sed -n ${SLURM_ARRAY_TASK_ID}p in.list)

# Load the required software: e.g.
# module purge
# module load gcc
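
# Run one element of the array with the extracted parameters. in.list must
# contain at least as many lines as there are array elements (here 100) and
# mycode.x is just a placeholder for your own program

./mycode.x $INPUT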

Multi-threaded tasks

Here we want to use a tool that makes use of more than one CPU at a time.

The important things to know are:

  • How long do I expect the job to run for?
  • How much memory do I think I need?
  • How many cores can the task use efficiently?
  • How do I tell the code how many cores/threads it should use?
  • Do I want e-mail notifications?
  • What modules (or other software) do I need to load?

Note that on the DCSR clusters the variable OMP_NUM_THREADS is set to the same value as cpus-per-task, but here we set it explicitly as an example.

#!/bin/bash

#SBATCH --nodes 1
#SBATCH --ntasks 1
#SBATCH --cpus-per-task 8
#SBATCH --mem 64G
#SBATCH --partition cpu
#SBATCH --time 12:00:00

#SBATCH --account ulambda_gruyere
#SBATCH --mail-type END,FAIL 
#SBATCH --mail-user ursula.lambda@unil.ch

# Set the number of threads for OpenMP codes

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

# Load the required software: e.g.
# module purge
# module load gcc
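
# Run the multi-threaded program - an OpenMP code will use OMP_NUM_THREADS
# threads (mycode.x is just a placeholder for your own program)

./mycode.x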

MPI tasks

Here we want to use code that uses MPI to allow for distributed memory parallel calculations.

The important things to know are:

  • How many ranks (MPI tasks) do I want to run?
  • How does my code performance scale as I increase the number of ranks?
  • How much memory do I think I need per rank?
  • How long do I expect the job to run for?
  • Do I want e-mail notifications?
  • What modules (or other software) do I need to load?

Here we give the example of a code that we know runs efficiently with ~100 ranks, so we choose 96 as this completely fills two compute nodes (48 cores each).

With MPI jobs, always choose a number of tasks that entirely fills nodes: 48 / 96 / 144 / 192 etc. This is where the --ntasks-per-node directive is useful.

As we know that we are using entire nodes, it makes sense to ask for all the memory even if we don't need it.

#!/bin/bash

#SBATCH --nodes 2
#SBATCH --ntasks-per-node 48 
#SBATCH --cpus-per-task 1
#SBATCH --mem 500G
#SBATCH --partition cpu
#SBATCH --time 12:00:00

#SBATCH --account ulambda_gruyere
#SBATCH --mail-type END,FAIL 
#SBATCH --mail-user ursula.lambda@unil.ch

# Load the required software: e.g.
# module purge
# module load gcc mvapich2

# MPI codes must be launched with srun

srun mycode.x

Hybrid MPI/OpenMP tasks

Here we want to run a hybrid MPI/OpenMP code where each MPI rank uses OpenMP for shared memory parallelisation.

Based on the code and the CPU architecture we know that 12 threads per rank is efficient - always run tests to find the best ratio of threads per rank (one way of doing so is sketched after the example script below)!

The important things to know are:

  • How many ranks (MPI tasks) do I want to run?
  • How does my code performance scale as I increase the number of ranks and threads per rank?
  • How much memory do I think I need per rank (taking into account the OpenMP threads)?
  • How long do I expect the job to run for?
  • Do I want e-mail notifications?
  • What modules (or other software) do I need to load?
#!/bin/bash

#SBATCH --nodes 2
#SBATCH --ntasks-per-node 4 
#SBATCH --cpus-per-task 12
#SBATCH --mem 500G
#SBATCH --partition cpu
#SBATCH --time 12:00:00

#SBATCH --account ulambda_gruyere
#SBATCH --mail-type END,FAIL 
#SBATCH --mail-user ursula.lambda@unil.ch

# Load the required software: e.g.
# module purge
# module load gcc mvapich2

# Set the number of threads for the OpenMP tasks (12 in this case)

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

# MPI codes must be launched with srun

srun mycode.x
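
As an illustration of how to test different geometries, the sketch below assumes the script above has been saved as hybrid.sh (a name chosen purely for this example) and relies on the fact that sbatch options given on the command line take precedence over the #SBATCH directives in the script:

# Try 8x6, 4x12 and 2x24 ranks x threads per node - each combination fills a 48-core node
for threads in 6 12 24; do
    sbatch --ntasks-per-node $((48 / threads)) --cpus-per-task $threads hybrid.sh
done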

GPU tasks

Here we want to run a code that makes use of one GPU and one CPU core. Some codes are able to use multiple GPUs and CPU cores, but please check how the performance scales!

The important things to know are:

  • How many GPUs do I need (1 or 2)?
  • How does my code performance scale as I increase the number of GPUs?
  • How much memory do I think I need for the CPU part of the job?
  • How long do I expect the job to run for?
  • Do I want e-mail notifications?
  • What modules (or other software) do I need to load?

Note the use of the --gres-flags enforce-binding directive to ensure that the CPU part of the code is on the same bus as the GPU used so as to maximise memory bandwidth.

In this example we run a single task with one CPU core and one GPU on one node.

#!/bin/bash

#SBATCH --nodes 1
#SBATCH --ntasks-per-node 1 
#SBATCH --cpus-per-task 1
#SBATCH --mem 500G
#SBATCH --partition gpu
#SBATCH --time 12:00:00
#SBATCH --gres gpu:1
#SBATCH --gres-flags enforce-binding

#SBATCH --account ulambda_gruyere
#SBATCH --mail-type END,FAIL 
#SBATCH --mail-user ursula.lambda@unil.ch

# Load the required software: e.g.
# module purge
# module load gcc cuda
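
# Optionally check which GPU has been allocated: e.g.
# nvidia-smi

# Run the GPU code (mycode.x is just a placeholder for your own program)
./mycode.x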

MPI+GPU tasks

Here we have a code that uses MPI for distributed memory parallelisation with one GPU per rank for computation.

The important things to know are:

  • How many GPUs per rank do I need (probably 1)?
  • How does my code performance scale as I increase the number of ranks?
  • How much memory do I think I need for the CPU part of the job?
  • How long do I expect the job to run for?
  • Do I want e-mail notifications?
  • What modules (or other software) do I need to load?

Note the use of the --gres-flags enforce-binding directive to ensure that the CPU part of the code is on the same bus as the GPU used so as to maximise memory bandwidth.

In this example we run 2 tasks per node over 4 nodes for a total of 8 ranks and 8 GPUs.

#!/bin/bash

#SBATCH --nodes 4
#SBATCH --ntasks-per-node 2 
#SBATCH --cpus-per-task 8
#SBATCH --mem 500G
#SBATCH --partition gpu
#SBATCH --time 12:00:00
#SBATCH --gpus-per-task 1
#SBATCH --gres-flags enforce-binding

#SBATCH --account ulambda_gruyere
#SBATCH --mail-type END,FAIL 
#SBATCH --mail-user ursula.lambda@unil.ch

# Load the required software: e.g.
# module purge
# module load gcc mvapich2 cuda

# MPI codes must be launched with srun

srun mycode.x