S3 storage


Introduction

What is S3?

Amazon S3 (Simple Storage Service) is a scalable object storage service used for storing and retrieving any amount of data at any time. It organizes data into containers called “buckets.” Each bucket can store an unlimited number of objects, which are the fundamental entities stored in S3.

Understanding S3 Bucket structure

S3 Endpoint Access

Accessing S3 works like accessing any other web service over HTTP: the endpoint URL has the same structure as a typical web address.

An S3 endpoint address typically looks like this: https://dnsname.com/bucket-name/object-key

For example, if you have a bucket named my-bucket and an object with the key folder1/file.txt, the S3 URL would be: https://dnsname.com/my-bucket/folder1/file.txt
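The URL layout described above can be sketched in a few lines of Python. This is only an illustration: the helper name s3_object_url is hypothetical, and it assumes the path-style layout shown here (endpoint, then bucket, then object key).

```python
from urllib.parse import quote


def s3_object_url(endpoint: str, bucket: str, key: str) -> str:
    """Build a path-style S3 URL: <endpoint>/<bucket>/<object-key>."""
    # Percent-encode the key, but keep "/" so folder-like prefixes stay readable.
    return f"{endpoint.rstrip('/')}/{bucket}/{quote(key, safe='/')}"


print(s3_object_url("https://dnsname.com", "my-bucket", "folder1/file.txt"))
# prints: https://dnsname.com/my-bucket/folder1/file.txt
```

Keys containing spaces or other special characters are percent-encoded, so the resulting URL stays valid.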

IAM Key Pairs

To access and manage your S3 resources securely, you use IAM (Identity and Access Management) key pairs instead of a traditional login and password. An IAM key pair consists of an Access Key ID and a Secret Access Key, which together authenticate your requests. The Access Key ID identifies the key pair; the Secret Access Key is used to sign requests and must be kept private.

Unlike a traditional login and password, different IAM key pairs can be attached to different sets of permissions defined in their policy files. These policies control what actions the keys are allowed to perform, enhancing security by ensuring that each key pair has only the necessary permissions for its intended tasks.
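To give an idea of what such a policy can look like, here is a minimal sketch that builds a read-only policy in the AWS IAM JSON syntax. The bucket name is hypothetical, and whether the policies attached to your keys use exactly this form is an assumption; the policy actually applied to a key pair is defined when the bucket is provisioned.

```python
import json

BUCKET = "my-bucket"  # hypothetical bucket name, for illustration only

# Read-only access: list the bucket content and download objects, nothing else.
read_only_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": [f"arn:aws:s3:::{BUCKET}"],  # the bucket itself
        },
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": [f"arn:aws:s3:::{BUCKET}/*"],  # every object in it
        },
    ],
}

print(json.dumps(read_only_policy, indent=2))
```

A key pair limited in this way could browse and download, but any upload or delete request would be refused.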

Request S3 bucket

To request an S3 bucket, you have two options: attach it to an existing project, or create a new project with S3 storage.

Attach S3 bucket to an existing project

In that case, send an email to helpdesk@unil.ch (with a subject starting with "DCSR add S3 bucket to project") providing the following information:

Add S3 bucket to new project

When requesting a new project using the dedicated web application, on the resource information selection screen, click on advanced selection as follows:

image.png

Then choose the "Object Storage, NO BACKUP" article:

image.png

Finally, specify your requirements:

image.png

Software to access S3 bucket

From Curnagl cluster (command line tools)

Rclone

 

awscli

 

From your laptop or a workstation

Command line tools

Following the official documentation, you can install Rclone (https://rclone.org/install/) or awscli (https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html) on your laptop/workstation.

Both tools can then be used as described in the section above on access from the Curnagl cluster.

Cyberduck

Cyberduck can be installed from https://cyberduck.io/download/.

 

 

Share files from a bucket with presign keys

Purpose

AWS presigned URLs (called presign keys on this page) grant temporary access to objects in an S3 bucket without requiring the recipient to have any security credentials.

This is particularly useful for sharing files securely or allowing temporary uploads without exposing your AWS credentials.

Warning: only files can be shared this way, so if you want to share a folder, you have to create an archive of this folder first.
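If you do not have an archiving tool at hand, the folder can be packed with Python's standard library. This is a minimal sketch; the helper name archive_folder and the paths in the usage comment are hypothetical.

```python
import tarfile
from pathlib import Path


def archive_folder(folder: str, archive_path: str) -> str:
    """Pack a folder into a gzip-compressed tar archive and return its path."""
    folder_path = Path(folder).resolve()
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname keeps only the folder name inside the archive,
        # not the full absolute path on your machine.
        tar.add(str(folder_path), arcname=folder_path.name)
    return archive_path


# Usage (hypothetical paths):
# archive_folder("to_share", "to_share.tar.gz")
```

The resulting archive is a single file, so it can be uploaded to the bucket and shared like any other object.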

Prerequisite

The bucket must be open to access from the outside. If this is not the case, send an email to helpdesk@unil.ch with a subject starting with "DCSR S3 bucket".

Create a presign key with aws-cli

You can either install awscli on your laptop, since it's a simple Python package, or use it from the cluster (there is a module called awscli-v2 that you can simply load).

Configuration

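If it's not configured yet, you can run the aws configure command. This will ask you for the Access Key ID, the Secret Access Key, the default region name, and the default output format.

This will create two files: ~/.aws/credentials, which holds the key pair, and ~/.aws/config, which holds the region and output format.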

Presign key creation

Suppose you want to share the file to_share/important_file.gz from the bucket recn-fac-fbm-dep-greatpi-data. You can use the following command:

aws --endpoint-url=https://s3.unil.ch s3 presign \
    s3://recn-fac-fbm-dep-greatpi-data/to_share/important_file.gz \
    --expires-in 604800

The --expires-in parameter sets how long the link stays valid, expressed in seconds. The maximum validity is 7 days (604800 seconds, the value used above).
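Since the value has to be given in seconds, a small conversion helper avoids arithmetic mistakes. The sketch below is illustrative only; the function name expires_in_seconds is hypothetical, and the 7-day cap simply mirrors the limit stated above.

```python
from datetime import timedelta

MAX_VALIDITY = timedelta(days=7)  # maximum validity stated above


def expires_in_seconds(validity: timedelta) -> int:
    """Convert a validity period into a value usable with --expires-in."""
    if validity <= timedelta(0):
        raise ValueError("validity must be positive")
    if validity > MAX_VALIDITY:
        raise ValueError("validity cannot exceed 7 days")
    return int(validity.total_seconds())


print(expires_in_seconds(timedelta(days=7)))    # prints: 604800
print(expires_in_seconds(timedelta(hours=12)))  # prints: 43200
```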

The aws s3 presign command returns a link that you can share with your external collaborator.
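A collaborator who receives such a link may want to know when it stops working. The sketch below reads the expiry from the link itself, assuming the link carries the X-Amz-Date and X-Amz-Expires query parameters used by Signature Version 4 presigned URLs; if your links use a different signing scheme, these parameters will be absent. The function name and the shortened example URL are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlsplit


def presigned_url_expiry(url: str) -> datetime:
    """Return the UTC time at which a SigV4-style presigned URL expires."""
    params = parse_qs(urlsplit(url).query)
    # X-Amz-Date is the signing time in UTC, formatted as YYYYMMDDTHHMMSSZ.
    signed_at = datetime.strptime(
        params["X-Amz-Date"][0], "%Y%m%dT%H%M%SZ"
    ).replace(tzinfo=timezone.utc)
    # X-Amz-Expires is the validity in seconds, i.e. the --expires-in value.
    return signed_at + timedelta(seconds=int(params["X-Amz-Expires"][0]))


# Hypothetical, shortened link: a real one carries more parameters (signature, credential).
example = (
    "https://s3.unil.ch/recn-fac-fbm-dep-greatpi-data/to_share/important_file.gz"
    "?X-Amz-Date=20240101T120000Z&X-Amz-Expires=604800"
)
print(presigned_url_expiry(example).isoformat())
# prints: 2024-01-08T12:00:00+00:00
```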

Create a presign key with Cyberduck

Once your connection to the bucket is configured in Cyberduck, you can browse it. Right-click the file you want to share, choose "Copy URL", and pick one of the three expiration options (1 hour, 1 day, 1 week). Then paste the result somewhere to get the link.

image.png