Using the Clusters

How to run a job on Curnagl

Overview

Suppose that you have finished writing your code, say a python code called <my_code.py>, and you want to run it on the cluster Curnagl. You will need to submit a job (a bash script) with information such as the number of CPUs you want to use and the amount of RAM you will need. This information is processed by the job scheduler (a software installed on the cluster) and your code is then executed. The job scheduler used on Curnagl is called SLURM (Simple Linux Utility for Resource Management). It is free, open-source software used by many of the world's computer clusters.

The partitions

The clusters contain several partitions (sets of compute nodes dedicated to different means). To list them, type

sinfo

As you can see, there are three partitions:
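Based on the partitions used throughout this guide, these are cpu (the general compute nodes), gpu (the GPU nodes) and interactive (used by Sinteractive sessions).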

Each partition is associated with a submission queue. A queue is essentially a waiting line for your compute job to be matched with an available compute resource. Those resources become available once a compute job from a previous user is completed.

Note that the nodes may be in different states: idle=not used, alloc=used, down=switched off, etc. Depending on what you want to do, you should choose the appropriate partition/submission queue.

The sbatch script

To execute your python code on the cluster, you need to make a bash script, say <my_script.sh>, specifying the information needed to run your python code (you may want to use nano, vim or emacs as an editor on the cluster). Here is an example:

#!/bin/bash -l

#SBATCH --account project_id 
#SBATCH --mail-type ALL 
#SBATCH --mail-user firstname.surname@unil.ch

#SBATCH --chdir /scratch/<your_username>/
#SBATCH --job-name my_code 
#SBATCH --output my_code.out

#SBATCH --partition cpu

#SBATCH --nodes 1 
#SBATCH --ntasks 1 
#SBATCH --cpus-per-task 8 
#SBATCH --mem 10G 
#SBATCH --time 00:30:00 
#SBATCH --export NONE

module load gcc/10.4.0 python/3.9.13

python3 /PATH_TO_YOUR_CODE/my_code.py

Here we have used the command "module load gcc/10.4.0 python/3.9.13" before "python3 /PATH_TO_YOUR_CODE/my_code.py" to load the required libraries and make the corresponding programs available.

To display the list of available modules or to search for a package:

module avail
module spider package_name

For example, to load bowtie2:

module load gcc/9.3.0 bowtie2/2.4.2
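To check which modules are currently loaded in your session:

module list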

To display information of the sbatch command, including the SLURM options:

sbatch --help
sbatch --usage

Finally, you submit the bash script as follows:

sbatch my_script.sh

Important: We recommend storing the above bash script and your python code in your home folder, and storing your main input data in your work space. The data can then be read by your python code. Finally, you should write your results to your scratch space.
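For instance, inside the job script the layout could look like the following sketch (all paths are placeholders to adapt to your own spaces):

INPUT_DIR=/work/<path_to_your_project>/input
OUTPUT_DIR=/scratch/<your_username>/results

mkdir -p $OUTPUT_DIR
python3 ~/my_code.py $INPUT_DIR/data.csv > $OUTPUT_DIR/results.txt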

To show the state (R=running or PD=pending) of your jobs, type:

squeue

If you realize that you made a mistake in your code or in the SLURM options, you can cancel the job:

scancel JOBID
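For example, to list your own jobs and then cancel one of them (the job ID below is illustrative):

squeue -u <your_username>
scancel 1234567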

An interactive session

Often it is convenient to work interactively on the cluster before submitting a job. Remember that when you connect to the cluster you are actually located on the front-end machine and you must NOT run any code there. Instead you should connect to a node by using the Sinteractive command as shown below.


[ulambda@login ~]$ Sinteractive -c 1 -m 8G -t 01:00:00
 
interactive is running with the following options:

-c 1 --mem 8G -J interactive -p interactive -t 01:00:00 --x11

salloc: Granted job allocation 172565
salloc: Waiting for resource configuration
salloc: Nodes dna020 are ready for job
[ulambda@dna020 ~]$  hostname
dna020.curnagl

You can then run your code.

Hint: If you are having problems with a job script then copy and paste the lines one at a time from the script into an interactive session - errors are much more obvious this way.

You can see the available options by passing the -h option.

[ulambda@login1 ~]$ Sinteractive -h
Usage: Sinteractive [-t] [-m] [-A] [-c] [-J]

Optional arguments:
    -t: time required in hours:minutes:seconds (default: 1:00:00)
    -m: amount of memory required (default: 8G)
    -A: Account under which this job should be run
    -R: Reservation to be used
    -c: number of CPU cores to request (default: 1)
    -J: job name (default: interactive)
    -G: Number of GPUs (default: 0)


To logout from the node, simply type:

exit

Embarrassingly parallel jobs

Suppose you have 14 configuration files in <path_to_configurations> and you want to process them in parallel by using your python code <my_code.py>. This is an example of embarrassingly parallel programming where you run 14 independent jobs in parallel, each with a different set of parameters specified in your configuration files. One way to do it is to use an array type:

#!/bin/bash -l

#SBATCH --account project_id 
#SBATCH --mail-type ALL 
#SBATCH --mail-user firstname.surname@unil.ch 

#SBATCH --chdir /scratch/<your_username>/
#SBATCH --job-name my_code 
#SBATCH --output=my_code_%A_%a.out

#SBATCH --partition cpu
#SBATCH --ntasks 1

#SBATCH --cpus-per-task 8 
#SBATCH --mem 10G 
#SBATCH --time 00:30:00 
#SBATCH --export NONE

#SBATCH --array=0-13

module load gcc/10.4.0 python/3.9.13

FILES=(/path_to_configurations/*)

python3 /PATH_TO_YOUR_CODE/my_code.py ${FILES[$SLURM_ARRAY_TASK_ID]}

The above allocations (for example time=30 minutes) apply to each individual job in your array.

Similarly, if the configuration files are simple numbers:

#!/bin/bash -l

#SBATCH --account project_id 
#SBATCH --mail-type ALL 
#SBATCH --mail-user firstname.surname@unil.ch 

#SBATCH --chdir /scratch/<your_username>/
#SBATCH --job-name my_code 
#SBATCH --output=my_code_%A_%a.out

#SBATCH --partition cpu 
#SBATCH --ntasks 1

#SBATCH --cpus-per-task 8 
#SBATCH --mem 10G 
#SBATCH --time 00:30:00 
#SBATCH --export NONE

#SBATCH --array=0-13

module load gcc/10.4.0 python/3.9.13

ARGS=(0.1 2.2 3.5 14 51 64 79.5 80 99 104 118 125 130 100)

python3 /PATH_TO_YOUR_CODE/my_code.py ${ARGS[$SLURM_ARRAY_TASK_ID]}

Another way to run embarrassingly parallel jobs is by using one-line SLURM commands. For example, this may be useful if you want to run your python code on all the files with a bam extension in a folder:

for file in *.bam
do
sbatch --account project_id --mail-type ALL --mail-user firstname.surname@unil.ch \
--chdir /scratch/<your_username>/ --job-name my_code --output my_code-%j.out --partition cpu \
--nodes 1 --ntasks 1 --cpus-per-task 8 --mem 10G --time 00:30:00 \
--wrap "module load gcc/10.4.0 python/3.9.13; python3 /PATH_TO_YOUR_CODE/my_code.py $file"
done

MPI jobs

Suppose you are using MPI codes locally and you want to launch them on Curnagl. 

The example below is a SLURM script running an MPI code, mpicode (which can be written in C, Python, Fortran, etc.), on a single node (i.e. --nodes 1) using NTASKS cores without multi-threading (i.e. --cpus-per-task 1). In this example, the memory required is 32GB in total. To run an MPI code, only the gcc and mvapich2 modules need to be loaded. Add any further modules required by your code after those two.

Instead of the mpirun command, you must use the srun command, which is the equivalent way to launch MPI codes on the cluster. To learn more about srun, see the srun --help documentation.

#!/bin/bash -l 

#SBATCH --account project_id  
#SBATCH --mail-type ALL  
#SBATCH --mail-user firstname.surname@unil.ch  

#SBATCH --chdir /scratch/<your_username>/ 
#SBATCH --job-name testmpi 
#SBATCH --output testmpi.out 

#SBATCH --partition cpu 
#SBATCH --nodes 1  
#SBATCH --ntasks NTASKS 
#SBATCH --cpus-per-task 1 
#SBATCH --mem 32G  
#SBATCH --time 01:00:00  

module purge
module load gcc/10.4.0 mvapich2/2.3.7  

srun mpicode 

For a complete MPI overview on Curnagl, please refer to the compiling and running MPI codes wiki page.

Good practice



What projects am I part of and what is my default account?

In order to find out what projects you are part of on the clusters, you can use the Sproject tool:

$ Sproject 

The user ulambda ( Ursula Lambda ) is in the following project accounts
  
   ulambda_default
   ulambda_etivaz
   ulambda_gruyere
 
Their default account is: ulambda_default

If Sproject is called without any arguments, it tells you what projects/accounts you are in.

To find out what projects other users are in, you can call Sproject with the -u option:

$ Sproject -u nosuchuser

The user nosuchuser ( I really do not exist ) is in the following project accounts
..
..

 

Providing access to external collaborators

In order to allow non-UNIL collaborators to use the HPC clusters there are a few steps, which are detailed below.

Please note that the DCSR does not accredit external collaborators as this is a centralised process.

The procedures for different user groups are explained at https://www.unil.ch/ci/ui

  1. The external collaborator must first obtain an EduID via www.eduid.ch
  2. The external collaborator must ask for a UNIL account using this form. The external collaborator must give the name of the PI in the form (The PI is "sponsoring" the account)
  3. the PI to whom the external collaborator is connected must use this application to add the collaborator to the appropriate project. Log into the application if necessary (top right), and click on the "Manage members list / Gérer la liste de membres" icon for your project. Usernames always have 8 characters (e.g. Greta Thunberg's username would be gthunber)
  4. the external collaborator needs to use the UNIL VPN:

    https://www.unil.ch/ci/fr/home/menuinst/catalogue-de-services/reseau-et-telephonie/acces-hors-campus-vpn/documentation.html

Once on the VPN, the external collaborator can then log in to the HPC cluster as if they were inside the UNIL network.

Requesting and using GPUs

GPU Nodes

Both Curnagl and Urblauna have nodes with GPUs - on Curnagl these are in a separate partition.

Curnagl

Currently there are 7 nodes, each with 2 NVIDIA A100 GPUs. One additional node is in the interactive partition.

Urblauna

Currently there are 2 nodes, each with 2 NVIDIA A100 GPUs. Each GPU is partitioned into 2 instances with 20GB of memory each, so each node appears to have 4 distinct GPUs. These GPUs are also available interactively.

Requesting GPUs

In order to access the GPUs they need to be requested via SLURM as one does for other resources such as CPUs and memory. 

The flag required is --gres=gpu:1 for 1 GPU per node and --gres=gpu:2 for 2 GPUs per node. 

 An example job script is as follows:

#!/bin/bash -l

#SBATCH --nodes 1
#SBATCH --ntasks 1
#SBATCH --cpus-per-task 12
#SBATCH --mem 64G
#SBATCH --time 12:00:00

# GPU partition request only for Curnagl 
#SBATCH --partition gpu

#SBATCH --gres gpu:1
#SBATCH --gres-flags enforce-binding

# Set up my modules

module purge
module load my list of modules
module load cuda

# Check that the GPU is visible

nvidia-smi

# Run my GPU enable python code

python mygpucode.py 

If the #SBATCH --gres gpu:1 is omitted then no GPUs will be visible even if they are present on the compute node. 

If you request one GPU it will always be seen as device 0.

The #SBATCH --gres-flags enforce-binding option ensures that the CPUs allocated will be on the same PCI bus as the GPU(s) which greatly improves the memory bandwidth. This may mean that you have to wait longer for resources to be allocated but it is strongly recommended.

Partitions

The #SBATCH --partition can take different options depending on whether you are on Curnagl or on Urblauna.

Curnagl:

Urblauna:

Using CUDA

In order to use the CUDA toolkit there is a module available

module load cuda

This loads the nvcc compiler and CUDA libraries. There is also a cudnn module for the DNN tools/libraries.


Containers and GPUs

Singularity containers can make use of GPUs but in order to make them visible to the container environment an extra flag "--nv" must be passed to Singularity

module load singularity

singularity run --nv mycontainer.sif

The full documentation is at https://sylabs.io/guides/3.5/user-guide/gpu.html


How do I run a job for more than 3 days?

The simple answer is that you can't without special authorisation. Please do not simply submit such jobs and then ask for a time extension!

If you think that you need to run for longer than 3 days then please do the following:

Contact us via helpdesk@unil.ch and explain what the problem is.

We will then get in touch with you to analyse your code and suggest performance or workflow improvements to either allow it to complete within the required time or to allow it to be run in steps using checkpoint/restart techniques.

Recent cases involve codes that were predicted to take months to run now finishing in a few days after a bit of optimisation.

If the software cannot be optimised, there is the possibility of using a checkpoint mechanism. More information is available on the checkpoint page

Access NAS DCSR from the cluster

The NAS is available from the login node only under /nas. The folder hierarchy is:

/nas/FAC/<your_faculty>/<your_department>/<your_PI>/<your_project>

Cluster -> NAS

To copy a file to the new NAS:

cp /path/to/file /nas/FAC/<your_faculty>/<your_department>/<your_PI>/<your_project>

To copy a folder to the new NAS:

cp -r /path/to/folder /nas/FAC/<your_faculty>/<your_department>/<your_PI>/<your_project>

For more complex operations, consider using rsync. For the documentation see the man page:

man rsync

or check out this link.
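For instance, to synchronise a whole folder to the NAS while preserving timestamps and permissions (the source path is a placeholder):

rsync -av /path/to/folder /nas/FAC/<your_faculty>/<your_department>/<your_PI>/<your_project>/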

NAS -> cluster

As above, just swapping the source and destination:

cp /nas/FAC/<your_faculty>/<your_department>/<your_PI>/<your_project>/file /path/to/dest
cp -r /nas/FAC/<your_faculty>/<your_department>/<your_PI>/<your_project>/folder /path/to/dest

SSH connection to DCSR cluster

This page presents how to connect to DCSR cluster depending on your operating system.

Linux

SSH is installed by default on most common Linux distributions, so no extra package needs to be installed.

Connection with a password

To connect using a password, just run the following command:

ssh username@curnagl.dcsr.unil.ch

Of course, replace username in the command line with your UNIL login, and use your UNIL password.

Connection with a key

To connect with a key, you first have to generate the key on your laptop. This can be done as follows:

ssh-keygen -t ed25519
Generating public/private ed25519 key pair.
Enter file in which to save the key (/home/ejeanvoi/.ssh/id_ed25519): /home/ejeanvoi/.ssh/id_dcsr_cluster
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/ejeanvoi/.ssh/id_dcsr_cluster
Your public key has been saved in /home/ejeanvoi/.ssh/id_dcsr_cluster.pub
The key fingerprint is:
SHA256:8349RPk/2AuwzazGul4ki8xQbwjGj+d7AiU3O7JY064 ejeanvoi@archvm
The key's randomart image is:
+--[ED25519 256]--+
|                 |
|    .            |
|     + .       . |
|    ..=+o     o  |
|     o=+S+ o . . |
|     =*+oo+ * . .|
|    o *=..oo Bo .|
|   . . o.o.oo.+o.|
|     E..++=o   oo|
+----[SHA256]-----+

By default, it suggests saving the private key to ~/.ssh/id_ed25519 and the public key to ~/.ssh/id_ed25519.pub. You can hit "Enter" when the question is asked if you don't use any other key. Otherwise, you can choose another path, for instance ~/.ssh/id_dcsr_cluster as in the example above.

Then, you have to enter a passphrase (twice). This is optional but you are strongly encouraged to choose a strong passphrase.

Once the key is created, you have to copy the public key to the cluster. This can be done as follows:

[ejeanvoi@archvm ~]$ ssh-copy-id -i /home/ejeanvoi/.ssh/id_dcsr_cluster ejeanvoi@curnagl.dcsr.unil.ch
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/ejeanvoi/.ssh/id_dcsr_cluster.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
ejeanvoi@curnagl.dcsr.unil.ch's password:

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'ejeanvoi@curnagl.dcsr.unil.ch'"
and check to make sure that only the key(s) you wanted were added.

Thanks to the -i option, you can specify the path to the private key; here we use /home/ejeanvoi/.ssh/id_dcsr_cluster to match the beginning of the example. You are asked to enter your UNIL password to access the cluster, and behind the scenes the public key is automatically copied to the cluster.

Finally, you can connect to the cluster using your key, and this time you will be asked to enter the passphrase of the key (and not the UNIL password):

[ejeanvoi@archvm ~]$ ssh -i /home/ejeanvoi/.ssh/id_dcsr_cluster ejeanvoi@curnagl.dcsr.unil.ch
Enter passphrase for key '.ssh/id_dcsr_cluster':
Last login: Fri Nov 26 10:25:05 2021 from 130.223.6.87
[ejeanvoi@login ~]$
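As a convenience, you can also add an entry to your ~/.ssh/config file so that the right key is used automatically (this is standard OpenSSH configuration; the host alias is just an example):

Host curnagl
    HostName curnagl.dcsr.unil.ch
    User ejeanvoi
    IdentityFile ~/.ssh/id_dcsr_cluster

With this entry in place, ssh curnagl is enough to connect.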

Remote graphical interface

To visualize a graphical application running from the cluster, you have to connect using -X option:

ssh -X username@curnagl.dcsr.unil.ch

macOS

Like Linux, macOS has native SSH support, so nothing special has to be installed, except for the graphical part.

Connection with a password

This is similar to the Linux version described above.

Connection with a key

This is similar to the Linux version described above.

Remote graphical interface

To enable graphical visualization over SSH, you have to install an X server. The most common one is XQuartz, and it can be installed like any other .dmg application.

Then, you have to add the following line at the beginning of the ~/.ssh/config file (if the file doesn't exist, you can create it):

XAuthLocation /opt/X11/bin/xauth

Finally, just add -X flag to the ssh command and run your graphical applications:

image-1637921404046.png

Windows

To access the DCSR clusters from a Windows host, you have to use an SSH client.

Several options are available:

We present here only MobaXterm (since it's a great tool that also allows you to transfer files with a GUI) and the PowerShell option. For both, we'll see how to connect through SSH with a password and with a key.

MobaXterm

Connection with a password

After opening MobaXterm, you have to create a new session:

image-1637855599086.png

Then you have to configure the connection:

image-1637855844680.png

Then you can choose to save or not your password in MobaXterm:

image-1637855958519.png

Finally, you are connected to Curnagl:

image-1637855982735.png

You can see, on the left panel, a file browser. This represents your files on the cluster and it can be used to edit small text files, or to download/upload files to/from your laptop.

Connection with a key

First you have to create a key:

image-1637856210025.png

A new window opens where you can choose the type of key (Ed25519 is a good choice):

image-1637856320671.png

During the key generation, you have to move the mouse over the window to create entropy:

image-1637856429206.png

When the key is generated, copy the public key into a text document:

image-1637858153947.png

Then, choose a passphrase (very important to protect your private key), and save the private key on your computer:

image-1637858382625.png

Once the private key is saved, you can create a new SSH session that uses a private key:

image-1637858608767.png

The first time you connect, you will be prompted to enter the password of your UNIL account:

image-1637858679413.png

Once connected to the cluster, append the content of your public key to a file called ~/.ssh/authorized_keys. This can be done using the following command:

echo "PUBLIC_KEY" >> ~/.ssh/authorized_keys

(of course replace PUBLIC_KEY in the previous command with the value of your public key pasted from the text file)

image-1637858969167.png

The next time you connect, you will be prompted to enter the SSH key passphrase instead of the UNIL account password:

image-1637859097534.png

Remote graphical interface

With MobaXterm, it's very easy to use a remote graphical interface. You just have to make sure the "X11-Forwarding" option is checked when you create the session (it should be checked by default):

image-1637928096473.png

And then, once connected, you can run any graphical application:

image-1637930839430.png

 

 

SSH from PowerShell

Connection with a password

First, you have to run Windows PowerShell:

image-1637859384206.png

Once the terminal is open, you can just run the following command, add Curnagl to the list of known hosts, and enter your password (UNIL account):

ssh username@curnagl.dcsr.unil.ch

image-1637859622117.png

Connection with a key

First you have to open Windows Powershell:

image-1637860320009.png

Then you have to generate the key with the following command:

ssh-keygen -t ed25519

You can accept the default name for the key (just hit Enter), and then you have to choose a passphrase:

image-1637860426985.png

Then you have to print the content of the public key, connect to Curnagl using the password method (with your UNIL password), and execute the following command:

echo "PUBLIC_KEY" >> ~/.ssh/authorized_keys

(of course replace PUBLIC_KEY in the previous command with the value of your public key pasted from the terminal)

Once this is done, you can exit the session and connect again. This time you will be asked for the passphrase of the SSH key instead of your UNIL account password:

image-1637860990146.png

 

Checkpoint SLURM jobs

Introduction

As you have probably noticed, execution time for jobs on the DCSR clusters is limited to 3 days. For jobs that take more than 3 days and cannot be optimized or divided into smaller jobs, the DCSR clusters provide a checkpoint mechanism. This mechanism saves the state of the application to disk, resubmits the same job, and restores the state of the application from the point at which it was stopped. The checkpoint mechanism is based on CRIU, which uses low-level operating system mechanisms, so in theory it should work for most applications.

How to use it

In order to use it, you need to do two things:

Job modifications

Make the following changes to your job scripts:

  1. You need to source the script /dcsrsoft/spack/external/ckptslurmjob/scripts/ckpt_methods.sh
  2. Add the setup_ckpt call
  3. Use launch_app to call your application
  4. (optional) add --error and --output to the slurm parameters. This will create two separate files for standard output and standard error. If you need to process the output of your application later, you are encouraged to add these parameters; otherwise you will see some errors or warnings from the checkpoint mechanism mixed in with your output. If your application generates custom output files, you do not need these options.

The script below summarizes those changes:

#!/bin/sh
#SBATCH --job-name job1
#SBATCH -N 1
#SBATCH --cpus-per-task 4
#SBATCH --partition cpu
#SBATCH --time 02:00:00
#SBATCH --mem=16G
#SBATCH --error job1-%j.error
#SBATCH --output job1-%j.out

source /dcsrsoft/spack/external/ckptslurmjob/scripts/ckpt_methods.sh

setup_ckpt

launch_app $APP


The --time parameter does not limit the duration of the job but is used to schedule the checkpoint. For example, with --time 02:00:00, after 2 hours the job will be checkpointed and rescheduled some minutes later. You can put any value from 1 hour to 3 days; a good value is something in the middle, such as 10 or 12 hours. The checkpoint uses low-level operating system mechanisms, so it should work for most applications; however, there could be errors with some exotic applications. That is why it is a good idea to put nothing longer than 12 hours for the time limit, as it will allow you to find out early whether the application is compatible with checkpointing.

Launching the job

Before launching your job, please use the following recipe:

export SBATCH_OPEN_MODE="append"
export SBATCH_SIGNAL=B:USR1@60
sbatch job.sh


SBATCH_SIGNAL=B:USR1@60 means that the checkpoint mechanism has 60 seconds to create the checkpoint. For some memory-hungry applications the checkpoint can take longer. Refer to the Performance section below for some indications.

In addition to the out and error files produced by SLURM, the execution of the job will generate:

  1. checkpoint-JOB_ID.log: checkpoint logs
  2. checkpoint-JOB_ID-files: application checkpoint files. Please do not delete this directory until your job has finished otherwise the job will fail.

Job examples:

#!/bin/sh
#SBATCH --job-name job1
#SBATCH -N 1
#SBATCH --cpus-per-task 1
#SBATCH --partition cpu
#SBATCH --time 02:00:00
#SBATCH --mem=16G

source /dcsrsoft/spack/external/ckptslurmjob/scripts/ckpt_methods.sh

setup_ckpt

launch_app ../pi_css5 400000000

Multithread application:

#!/bin/sh
#SBATCH --job-name job1
#SBATCH -N 1
#SBATCH --cpus-per-task 4
#SBATCH --partition cpu
#SBATCH --time 02:00:00
#SBATCH --mem=16G

export OMP_NUM_THREADS=4
module load gcc

source /dcsrsoft/spack/external/ckptslurmjob/scripts/ckpt_methods.sh
setup_ckpt

launch_app /home/user1/lu.C.x

Tensorflow:

#!/bin/sh
#SBATCH --job-name job1
#SBATCH -N 1
#SBATCH --cpus-per-task 4
#SBATCH --partition cpu
#SBATCH --time 02:00:00
#SBATCH --mem=16G

export OMP_NUM_THREADS=4
source ../tensorflow_env/bin/activate

source /dcsrsoft/spack/external/ckptslurmjob/scripts/ckpt_methods.sh
setup_ckpt

launch_app python run_tensorflow.py

Samtools:

#!/bin/sh
#SBATCH --job-name job1
#SBATCH -N 1
#SBATCH --cpus-per-task 1
#SBATCH --partition cpu
#SBATCH --time 02:00:00
#SBATCH --mem=16G

module load gcc samtools

source /dcsrsoft/spack/external/ckptslurmjob/scripts/ckpt_methods.sh
setup_ckpt

launch_app samtools sort /users/user1/samtools/HG00154.mapped.ILLUMINA.bwa.GBR.low_coverage.20101123.bam -o sorted_file.bam

Again, before launching a job you need to use the following recipe:

export SBATCH_OPEN_MODE="append"
export SBATCH_SIGNAL=B:USR1@60
sbatch job.sh

Complex job scripts

If your job script looks like this:

#!/bin/sh
#SBATCH --job-name job1
#SBATCH -N 1
#SBATCH --cpus-per-task 1
#SBATCH --partition cpu
#SBATCH --time 02:00:00
#SBATCH --mem=16G

module load gcc samtools

source /dcsrsoft/spack/external/ckptslurmjob/scripts/ckpt_methods.sh
setup_ckpt

command_1
command_2
command_3
command_4
launch_app command_n

Only command_n will be checkpointed. The rest of the commands will be executed each time the job is restored. This can be a problem in the following cases:

  1. command_1, command_2 ... take a considerable amount of time to execute
  2. command_1, command_2 generate input for command_n. This will make the checkpoint fail if the input file differs in size

For those cases, we suggest wrapping all those commands inside a shell script (e.g. script.sh) and checkpointing that shell script:

command_1
command_2
command_3
command_4
command_n

Job example:

#!/bin/sh
#SBATCH --job-name job1
#SBATCH -N 1
#SBATCH --cpus-per-task 1
#SBATCH --partition cpu
#SBATCH --time 02:00:00
#SBATCH --mem=16G

module load gcc samtools

source /dcsrsoft/spack/external/ckptslurmjob/scripts/ckpt_methods.sh
setup_ckpt

launch_app script.sh

Performance

Checkpointing can take some time, and the time is directly proportional to the amount of memory used by the application. Here are some numbers:

Memory size (GB)    Checkpoint time (secs)
4.6                 3.2
7.1                 5.6
9                   7.74
18                  15

In theory, with 60 seconds it should be possible to checkpoint an application that takes up to ~50 GB of RAM. Feel free to change SBATCH_SIGNAL=B:USR1@60. A good rule of thumb is to count 1.5 seconds per GB of memory allocated.
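For example, following this rule of thumb, an application using about 40 GB of RAM needs roughly 40 x 1.5 = 60 seconds (the default), whereas for about 120 GB you could set:

export SBATCH_SIGNAL=B:USR1@180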

Java applications

In order to checkpoint Java applications, we have to use two parameters when launching the application:

-XX:-UsePerfData

This deactivates the creation of the directory /tmp/hsperfdata_$USER, which would otherwise make the checkpoint restoration fail

-XX:+UseSerialGC

This enables the serial garbage collector, which deactivates the parallel garbage collector. The parallel garbage collector generates a GC thread per computation thread, making the restoration of the checkpoint more difficult due to the large number of threads.
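Putting the two options together, a checkpointed Java job would launch the application along these lines (my_app.jar is a placeholder for your own jar file):

launch_app java -XX:-UsePerfData -XX:+UseSerialGC -jar my_app.jar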

Snakemake

In order to use the checkpoint mechanism with snakemake, you need to adapt the SLURM profile used to submit jobs to the cluster. Normally the SLURM profile defines the following options:

We need to modify how jobs are submitted to SLURM; the idea is to wrap the snakemake jobscript inside another job. This enables us to checkpoint all processes launched by snakemake.

The procedure consists of the following steps (based on the SLURM profile plugin provided here: https://github.com/Snakemake-Profiles/slurm)

Create checkpoint script

Please create the following script and call it job-checkpoint.sh:

#!/bin/bash

source /dcsrsoft/spack/external/ckptslurmjob/scripts/ckpt_methods.sh

setup_ckpt

launch_app $1

Make it executable with chmod +x job-checkpoint.sh. This script should be placed in the same directory as the other slurm scripts used by the profile.

Modify slurm-scripts

We need to modify the sbatch command used. Normally the jobscript is passed as a parameter; instead, we pass our aforementioned checkpoint script first and the snakemake jobscript as its parameter, as shown below:
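# Note: this function is an excerpt from the profile's submission script;
# the imports (os, re, subprocess as sp) and format_sbatch_options are
# assumed to be already defined there and are not repeated here.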

def submit_job(jobscript, **sbatch_options):
    """Submit jobscript and return jobid."""
    options = format_sbatch_options(**sbatch_options)
    try:
        # path of our checkpoint script
        jobscript_ckpt = os.path.join(os.path.dirname(__file__),'job-checkpoint.sh')
        # we tell sbatch to execute the chekcpoint script first and we pass 
        # jobscript as a parameter
        cmd = ["sbatch"] + ["--parsable"] + options + [jobscript_ckpt] + [jobscript]
        res = sp.check_output(cmd)
    except sp.CalledProcessError as e:
        raise e
    # Get jobid
    res = res.decode()
    try:
        jobid = re.search(r"(\d+)", res).group(1)
    except Exception as e:
        raise e
    return jobid
  

Ideally, we need to pass extra options to sbatch in order to control output and error files:

sbatch_options = { "output" : "{rule}_%j.out", "error" : "{rule}_%j.error"}

This is necessary to isolate errors and warnings raised by the checkpoint mechanism in a separate error file (as explained at the beginning of this page). This is only valid for the official slurm profile, as it handles the snakemake wildcards defined in the Snakefile (e.g. rule).

Export necessary variables

You still need to export some variables before launching snakemake:

export SBATCH_OPEN_MODE="append"
export SBATCH_SIGNAL=B:USR1@1800
snakemake --profile slurm-chk/ --verbose 

With this configuration, the checkpoint will start 30 min before the end of the job.

Limitations

Urblauna access and data transfer

Connecting to Urblauna

The Urblauna cluster is intended for the processing of sensitive data and as such comes with a number of restrictions.

All access requires the use of two factor authentication and outside of the UNIL/CHUV networks the UNIL VPN is required.

Note for CHUV users: in case of problems connecting to Urblauna please contact your local IT team to ensure that the network connection is authorised.

2 Factor authentication

When your account is activated on Urblauna you will receive an email from noreply@unil.ch that contains a link to the QR code to set up the 2 factor authentication - this is not the same code as for EduID!

To import the QR code you first need to install an application on your phone such as Google Authenticator or FreeOTP+. Alternatively desktop applications such as KeePassXC can also be used.

If you lose the secret then please contact us in order to generate a new one.

Web interface

As for Jura there is a web interface (Guacamole) that allows for a graphical connection to the Urblauna login node

To connect go to u-web.dcsr.unil.ch

You will then be prompted to enter your username and password followed by the 2FA code that you received

urblauna_rdp.png

SSH interface

There is also SSH terminal access which may be more convenient for many operations. Unlike connections to Curnagl no X11 forwarding or use of tunnels is permitted. The use of scp to copy data is also blocked.

To connect:

ssh username@u-ssh.dcsr.unil.ch

You will then be prompted for your UNIL password and the 2FA code that you received as follows:

% ssh ulambda@u-ssh.dcsr.unil.ch

(ulambda@u-ssh.dcsr.unil.ch) Password: 
(ulambda@u-ssh.dcsr.unil.ch) Verification code: 

Last login: Wed Jan 18 13:25:46 2023 from 130.223.123.456

[ulambda@urblauna ~]$ 

Please note that these are not the EduID password and 2FA code!

Data Transfer

An SFTP server allows you to import and export data.

From Laptop or Jura to Urblauna

Here is the procedure to transfer a file, say mydata.txt, from your Laptop or Jura to Urblauna.

From your Laptop or Jura:

cd path_to_my_data

sftp <username>@u-sftp.dcsr.unil.ch

You will be prompted for your password and the two factor authentication code as for an SSH connection to Urblauna.

sftp> put mydata.txt

sftp> exit

Your file "mydata.txt" will be in /scratch/username/.

Data is copied to/from your scratch directory ( /scratch/username ) and once there it should be moved to the appropriate storage space such as /data or /archive - please remember that the scratch space is automatically cleaned up.
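For example, once the upload has completed, move the file out of scratch (the destination path is a placeholder to adapt to your project):

mv /scratch/<username>/mydata.txt /data/<path_to_your_project>/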

From Urblauna to Laptop

Here is the procedure to transfer a file, say mydata.txt, from Urblauna to your Laptop.

Log into Urblauna and type:

cp path_to_my_data /scratch/username/

From your Laptop:

sftp <username>@u-sftp.dcsr.unil.ch

You will be prompted for your password and the two factor authentication code as for an SSH connection to Urblauna.

sftp> get mydata.txt

sftp> exit

Your file "mydata.txt" will be in your current working directory.

Jura to Urblauna Migration

Migration schedule and deadlines


Once the Urblauna cluster is officially in service the decommissioning process for Jura will begin.

The timescale is still under discussion but we expect that Jura will be fully offline after six months.


Compute migration


Urblauna offers a huge gain in performance and capacity so you are encouraged to transfer your workflows as soon as possible. If you are still using the Vital-IT software stack we encourage you to transition to the new and supported DCSR stack.

In order to allow for straightforward installation of user tools the Curnagl /work filesystem is available in read-only mode on Urblauna.

To be able to use this your PI should create a new project via https://conference.unil.ch/research-resource-requests/ with the following characteristics:

Request field                                   Input
Default project?                                No
Name                                            (e.g.) Group Software
Short title (shared storage name)               (e.g.) software
What kind of data will you reuse or generate?   Normal data


Then select the Storage and Compute bundles - we recommend asking for the following: 

Field                                                    Input
HPC Cluster - estimate of the CPU (core) time in Hours   100
File storage - Storage Volume Quota in Gigabytes         0
Storage on /work - Storage Volume Quota in Gigabytes     100
Storage on /work - Max. nb of files                      2000000


Once provisioned the space will be available under /work in the standard organisational structure. All required software can be installed here and then used directly from Urblauna. 



Data migration


See https://wiki.unil.ch/ci/books/high-performance-computing-hpc/page/urblauna for details about the data spaces available on Urblauna

It is your responsibility to migrate all required data from the Jura scratch and data spaces to Urblauna

The /archive filesystem is visible on both clusters and does not require any migration

The new /data filesystem has been initialised with the same quotas as the (old) Jura /data space - if you require more space please contact us via the helpdesk.

In order to move data from the Jura scratch filesystem to Urblauna you can use the u-sftp.dcsr.unil.ch SFTP service - See the SFTP documentation for more information


Job Templates

Here you can find example job script templates for a variety of job types

  1. Single-threaded tasks
  2. Array jobs
  3. Multi-threaded tasks
  4. MPI tasks
  5. Hybrid MPI/OpenMP tasks
  6. GPU tasks
  7. MPI+GPU tasks

You can copy and paste the examples to use as a base - don't forget to edit the account and e-mail address as well as which software you want to use!

For all the possible things you can ask for see the official documentation at https://slurm.schedmd.com/sbatch.html

Single threaded tasks

Here we want to use a tool that cannot make use of more than one CPU at a time.

The important things to know are:

#!/bin/bash

#SBATCH --nodes 1
#SBATCH --ntasks 1
#SBATCH --cpus-per-task 1
#SBATCH --partition cpu
#SBATCH --mem 8G
#SBATCH --time 12:00:00

#SBATCH --account ulambda_gruyere
#SBATCH --mail-type END,FAIL 
#SBATCH --mail-user ursula.lambda@unil.ch

# Load the required software: e.g.
# module purge
# module load gcc

Array jobs

Here we want to run an array job where there are N almost identical jobs that differ only in the input parameters.

In this example we use 1 CPU per task but you can obviously use more (see the multi-threaded task example)

See our introductory course for more details

The important things to know are:

#!/bin/bash

#SBATCH --nodes 1
#SBATCH --ntasks 1
#SBATCH --cpus-per-task 1
#SBATCH --mem 8G
#SBATCH --partition cpu
#SBATCH --time 12:00:00
#SBATCH --array=1-100

#SBATCH --account ulambda_gruyere
#SBATCH --mail-type END,FAIL 
#SBATCH --mail-user ursula.lambda@unil.ch

# Extract the parameters from a file (one line per job array element)

INPUT=$(sed -n ${SLURM_ARRAY_TASK_ID}p in.list)

# Load the required software: e.g.
# module purge
# module load gcc
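# Run your tool on the extracted parameter (hypothetical example):
# mytool $INPUT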

Multi-threaded tasks

Here we want to use a tool that makes use of more than one CPU at a time.

The important things to know are:

Note that on the DCSR clusters the variable OMP_NUM_THREADS is set to the same value as cpus-per-task but here we set it explicitly as an example

#!/bin/bash

#SBATCH --nodes 1
#SBATCH --ntasks 1
#SBATCH --cpus-per-task 8
#SBATCH --mem 64G
#SBATCH --partition cpu
#SBATCH --time 12:00:00

#SBATCH --account ulambda_gruyere
#SBATCH --mail-type END,FAIL 
#SBATCH --mail-user ursula.lambda@unil.ch

# Set the number of threads for OpenMP codes

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

# Load the required software: e.g.
# module purge
# module load gcc

MPI tasks

Here we want to use code that uses MPI to allow for distributed memory parallel calculations.

The important things to know are:

Here we give the example of a code that we know runs efficiently with ~100 ranks so we choose 96 as this completely fills two compute nodes.

With MPI tasks always choose a number of tasks that entirely fills nodes: 48 / 96 / 144 / 192 etc - this is where the --ntasks-per-node directive is useful.

As we know that we are using the entire node it makes sense to ask for all the memory even if we don't need it.

#!/bin/bash

#SBATCH --nodes 2
#SBATCH --ntasks-per-node 48 
#SBATCH --cpus-per-task 1
#SBATCH --mem 500G
#SBATCH --partition cpu
#SBATCH --time 12:00:00

#SBATCH --account ulambda_gruyere
#SBATCH --mail-type END,FAIL 
#SBATCH --mail-user ursula.lambda@unil.ch

# Load the required software: e.g.
# module purge
# module load gcc mvapich2

# MPI codes must be launched with srun

srun mycode.x

Hybrid MPI/OpenMP tasks

Here we want to run a hybrid MPI/OpenMP code where each MPI rank uses OpenMP for shared memory parallelisation.

Based on the code and the CPU architecture we know that 12 threads per rank is efficient - always run tests to find the best ratio of threads per rank!

The important things to know are:

#!/bin/bash

#SBATCH --nodes 2
#SBATCH --ntasks-per-node 4 
#SBATCH --cpus-per-task 12
#SBATCH --mem 500G
#SBATCH --partition cpu
#SBATCH --time 12:00:00

#SBATCH --account ulambda_gruyere
#SBATCH --mail-type END,FAIL 
#SBATCH --mail-user ursula.lambda@unil.ch

# Load the required software: e.g.
# module purge
# module load gcc mvapich2

# Set the number of threads for the OpenMP tasks (12 in this case)

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

# MPI codes must be launched with srun

srun mycode.x

GPU tasks

Here we want to run a code that makes use of one GPU and one CPU core - some codes are able to use multiple GPUs and CPU cores but please check how the performance scales!

The important things to know are:

Note the use of the --gres-flags enforce-binding directive to ensure that the CPU part of the code is on the same bus as the GPU used so as to maximise memory bandwidth.

In this example we run a single task on one node with one CPU core and one GPU.

#!/bin/bash

#SBATCH --nodes 1
#SBATCH --ntasks-per-node 1 
#SBATCH --cpus-per-task 1
#SBATCH --mem 500G
#SBATCH --partition gpu
#SBATCH --time 12:00:00
#SBATCH --gres gpu:1
#SBATCH --gres-flags enforce-binding

#SBATCH --account ulambda_gruyere
#SBATCH --mail-type END,FAIL 
#SBATCH --mail-user ursula.lambda@unil.ch

# Load the required software: e.g.
# module purge
# module load gcc cuda

MPI+GPU tasks

Here we have a code that uses MPI for distributed memory parallelisation with one GPU per rank for computation.

The important things to know are:

Note the use of the --gres-flags enforce-binding directive to ensure that the CPU part of the code is on the same bus as the GPU used so as to maximise memory bandwidth.

In this example we run 2 tasks per node over 4 nodes for a total of 8 ranks and 8 GPUs.

#!/bin/bash

#SBATCH --nodes 4
#SBATCH --ntasks-per-node 2 
#SBATCH --cpus-per-task 8
#SBATCH --mem 500G
#SBATCH --partition gpu
#SBATCH --time 12:00:00
#SBATCH --gpus-per-task 1
#SBATCH --gres-flags enforce-binding

#SBATCH --account ulambda_gruyere
#SBATCH --mail-type END,FAIL 
#SBATCH --mail-user ursula.lambda@unil.ch

# Load the required software: e.g.
# module purge
# module load gcc mvapich2 cuda

# MPI codes must be launched with srun

srun mycode.x

Urblauna Guacamole / RDP issues

Resolving connection problems

There can sometimes be communication issues between the web based RDP service (Guacamole) and the RDP client on the login node.

If you are continuously redirected to the page in the image below then you will need to clean up the processes on the login node.

rdf_fail.png

To do so connect using SSH to u-ssh.dcsr.unil.ch and run the following commands making sure to replace the username ulambda with your own username and the session ids with those returned by the command:

$ loginctl list-sessions | grep ulambda | grep c[1-9]

     c3 123456 ulambda           
    c13 123456 ulambda
    
$ loginctl terminate-session c3 c13    

You will then be able to reconnect via u-web.dcsr.unil.ch

Urblauna migration

Urblauna

Urblauna is the new UNIL cluster for sensitive data and will replace the Jura cluster.

stockage-horus is the name used for the Jura cluster when connecting from the CHUV.

Note: "HORUS" is an acronym for HOspital Research Unified data and analytics Services

Documentation

As well as this page there is documentation at:

https://wiki.unil.ch/ci/books/high-performance-computing-hpc/page/urblauna-access-and-data-transfer

https://wiki.unil.ch/ci/books/high-performance-computing-hpc/page/jura-to-urblauna-migration

https://wiki.unil.ch/ci/books/high-performance-computing-hpc/page/urblauna-guacamole-rdp-issues

Nearly all the documentation for Curnagl is also applicable - see the HPC Wiki

The introductory course for using the clusters is available HERE

The slides for our other courses can be consulted HERE

These courses are next planned for the 13th, 14th and 15th of June 2023 and take place in the Biophore auditorium

Support

Please contact the DCSR via helpdesk@unil.ch and start the mail subject with "DCSR Urblauna"

Do not send mails to dcsr-support - they will be ignored.

Specifications

Total cores: 864

Total memory: 18TB

Memory to core ratio: 21 GB/core

For those of you who have already used Curnagl, things will be very familiar.

If the initial resources are found to be insufficient then more capacity can be easily added.

An Urblauna looks like:

urblauna.jpg

How to connect

For Jura, the connection method differs depending on whether one connects from the CHUV or the UNIL network - for Urblauna it is the same for everyone.

The SSH and Web interfaces can be used simultaneously.

Two Factor Authentication

You should have received a QR code which allows you to set up the 2FA - if you lose your code then let us know and we will generate a new one for you.

SSH

% ssh ulambda@u-ssh.dcsr.unil.ch

(ulambda@u-ssh.dcsr.unil.ch) Password: 
(ulambda@u-ssh.dcsr.unil.ch) Verification code: 

Last login: Wed Jan 18 13:25:46 2023 from 130.223.123.456

[ulambda@urblauna ~]$

The 2FA code is cached for 1 hour in case that you connect again.

X11 Forwarding and SSH tunnels are blocked as is SCP

Web

Go to u-web.dcsr.unil.ch and you will be asked for your username and password followed by the 2FA code:

urblauna_rdp.png

This will send you to a web based graphical desktop that should be familiar for those who already use jura.dcsr.unil.ch

Note that until now the CHUV users did not have this connection option.

urblauna_desktop.png

Data Transfer

The principle method to get data in/out of Urblauna is using the SFTP protocol

On Urblauna your /scratch/<username> space is used as the buffer when transferring data.

% sftp ulambda@u-sftp.dcsr.unil.ch
(ulambda@u-sftp.dcsr.unil.ch) Password: 
(ulambda@u-sftp.dcsr.unil.ch) Verification code: 
Connected to u-sftp.dcsr.unil.ch.

sftp> pwd
Remote working directory: /ulambda

sftp> put mydata.tar.gz 
Uploading mydata.tar.gz to /ulambda/mydata.tar.gz

The file will then be visible from urblauna at /scratch/ulambda/mydata.tar.gz

For graphical clients such as Filezilla you need to use the interactive login type so as to be able to enter the 2FA code.

Coming soon

There will be an SFTP endpoint u-archive.dcsr.unil.ch that will allow transfer to the /archive filesystem without 2FA from specific IP addresses at the CHUV.

This service will be what stockage-horus was originally supposed to be!

This is on request and must be validated by the CHUV security team.

What's new

There are a number of changes between Jura and Urblauna that you need to be aware of:

CPUs

The nodes each have two AMD Zen3 CPUs with 24 cores for a total of 48 cores per node.

In your jobs please ask for the following core counts:

Do not ask for core counts like 20 / 32 / 40 as this makes no sense given the underlying architecture. We recommend running scaling tests to find the optimal level of parallelism for multi-threaded codes.

Unlike for Jura all the CPUs are identical so all nodes will provide the same performance.

GPUs

There are two GPU-equipped nodes in Urblauna and each A100 card has been partitioned so as to provide a total of 8 GPUs with 20GB of memory each.

To request a GPU use the --gres Slurm directive

#SBATCH --gres gpu:1

Or interactively with Sinteractive and the -G option

$ Sinteractive -G 1
 
Sinteractive is running with the following options:
 
--gres=gpu:1 -c 1 --mem 8G -J interactive -p interactive -t 1:00:00
 
salloc: Granted job allocation 20000394
salloc: Waiting for resource configuration
salloc: Nodes snagpu001 are ready for job

$ nvidia-smi 
..
+-----------------------------------------------------------------------------+
| MIG devices:                                                                |
+------------------+----------------------+-----------+-----------------------+
| GPU  GI  CI  MIG |         Memory-Usage |        Vol|         Shared        |
|      ID  ID  Dev |           BAR1-Usage | SM     Unc| CE  ENC  DEC  OFA  JPG|
|                  |                      |        ECC|                       |
|==================+======================+===========+=======================|
|  0    1   0   0  |     19MiB / 19968MiB | 42      0 |  3   0    2    0    0 |
|                  |      0MiB / 32767MiB |           |                       |
+------------------+----------------------+-----------+-----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

MPI

This is now possible... Ask us for more details if needed.

/data

The /data filesystem is structured in the same way as on Jura but it is not the same filesystem

This space is on reliable storage but there are no backups or snapshots.

The initial quotas are the same as on Jura - if you wish to increase the limit then just ask us. With 1PB available, all reasonable requests will be accepted.

The Jura /data filesystem is available in read-only at /jura_data

/scratch

Unlike on Jura, /scratch is now organised per user as on Curnagl, and as it is considered temporary space there is no fee associated with it.

There are no quotas, but if utilisation is greater than 90% then files older than 2 weeks will be removed automatically.

/users

The /users home directory filesystem is also new - the Jura home directories can be accessed in read-only at /jura_home.

/work

The Curnagl /work filesystem is visible in read-only from inside Urblauna. This is very useful for being able to install software on an Internet connected system.

/reference

This is intended to host widely used datasets

The /db set of biological databases can be found at /reference/bio_db/

/archive

This is exactly the same /archive as on Jura so there is nothing to do!

The DCSR software stack

This is now the default stack and is identical to Curnagl. It is still possible to use the old Vital-IT /software but this is deprecated and no support can be provided.

For how to do this see the documentation at Old software stack

Note: The version of the DCSR software stack currently available on Jura (source /dcsrsoft/spack/bin/setup_dcsrsoft) is one release behind that of Curnagl/Urblauna. To have the same stack you can use the dcsrsoft use old command:

[ulambda@urblauna ~]$ dcsrsoft use old
Switching to the old software stack

There's lots of information on how to use this in our introductory course

Installing your own software

We encourage you to ask for a project on Curnagl (HPC normal data) which will allow you to install tools and then be able to use them directly inside Urblauna.

See the documentation for further details

For those who use Conda don't forget to make sure that all the directories are in your project /work space

https://wiki.unil.ch/ci/books/high-performance-computing-hpc/page/using-conda-and-anaconda

nano .condarc


pkgs_dirs:
  - /work/path/to/my/project/space
  
envs_dirs:
  - /work/path/to/my/project/space

For R packages it's easy to set an alternative library location:

echo 'R_LIBS_USER=/work/path/to/project/Rlib' > ~/.Renviron

This will need to be run on both Curnagl and Urblauna and will allow you to install packages when connected to the internet and run them inside the air gapped environment.

Jura decommissioning

Urblauna provides a generational leap in performance and capacity compared to Jura so we hope that the interest to migrate is obvious.

If you need help transferring your workflows from Jura to Urblauna then just ask us!

Transfer files to/from Curnagl

How do I transfer my code to Curnagl? How do I get my results back on my computer?

 

There are two main options to transfer data to/from Curnagl:

1.  You are familiar with terminal/bash command lines: use the `scp` command, see below.

2.  You are used to working with graphical interfaces: use FileZilla, see below.


First option: scp command


scp <FILE_TO_COPY> <FOLDER_WHERE_YOU_PASTE_IT>

scp -r <FOLDER_TO_COPY> <FOLDER_WHERE_YOU_PASTE_IT>

The second form is for transferring a folder: to transfer a folder, add the recursive option -r after scp.


From your own computer to the cluster. 

Nothing is better than an example to understand this command. Suppose you have a file (of any type you want!) called tutorial on your own computer. Here are the steps to copy this file to the Curnagl cluster:

  1. Depending on your distribution.

LINUX: open a terminal in the folder where the file tutorial is located, or open a terminal and then use the cd command to go to the right place.

MAC: type terminal in the search field, choose `terminal`, then use the cd command to go to the right place.

WINDOWS: type cmd in the menu, choose Command prompt or Powershell, then use the cd command to go to the right place.

2. Open a second terminal. Connect to Curnagl with the  ssh command you are used to.

This step is not mandatory but it allows you to get the path where you want to paste tutorial. One tip: in case the path where you want to paste tutorial is very long (e.g. /users/<username>/<project>/<sub_project>/<sub_sub_project>/<sub_sub_sub_project>), or simply to avoid mistakes when writing the path, use the pwd command in the right folder on this second terminal connected to Curnagl, copy the whole path and paste it at the end of the scp command (see below).

You now have two open terminals: one where you are on your own computer and one where you are connected to Curnagl. Suppose you want to copy/paste tutorial to /users/<username>/<project> on Curnagl, where <username> is your username and <project> is the folder where you want to paste tutorial. 

3. In the terminal from step 1 (which can access the tutorial file since you should be in the right folder), type the following command (it will ask for your password):

scp tutorial <username>@curnagl.dcsr.unil.ch:/users/<username>/<project>

You can check whether the copy worked: use the ls command on Curnagl and check whether the tutorial file is there.


From the cluster to your own computer. 

Only step 3 changes: 

scp <username>@curnagl.dcsr.unil.ch:/users/<username>/<project>/tutorial .

If you do not want to paste it in the current folder (that is what the . at the end of the above command stands for), simply replace . with the correct path.



Second option: FileZilla

First, you must install FileZilla on your computer. Please refer to: https://filezilla-project.org/ (install client version, more documentation on https://wiki.filezilla-project.org/Client_Installation). 

Here are the steps to transfer data to/from Curnagl with FileZilla.

  1. Open FileZilla. Perform a quick connection to Curnagl. Fill in `Host` with sftp://curnagl.dcsr.unil.ch
    filezilla_sftp_1.png
  2. Then fill in `Username` and `Password`, and set `Port` to 22. Click on `Quickconnect`. Refer to the screenshot below.
    filezilla_sftp_2.png
  3. You now have the remote site window on the right.
    1.  To transfer data to Curnagl: click and move file/folder from the left window (local site) to the right window (remote site).
    2. Inversely, to transfer data from Curnagl: click and move file/folder from the right window (remote) to the left window (local site).

Instead of `/home/margot/` on the left local site (respectively `/users/msirdey/` on the right remote site), you should see your own path on your computer (respectively on Curnagl).

FileZilla keeps remote sites in memory. For future transfers, click on the arrow on the right of `Quickconnect` and choose `sftp://curnagl.dcsr.unil.ch`.