# S3 storage

# Introduction

## What is S3?

Amazon S3 (Simple Storage Service) is a scalable object storage service used for storing and retrieving any amount of data at any time. It organizes data into containers called “buckets.” Each bucket can store an unlimited number of objects, which are the fundamental entities stored in S3.

## Understanding S3 Bucket structure

- Buckets: These are the top-level containers in S3. Each bucket has a unique name and is used to store objects.
- Objects: These are the files stored in a bucket. Each object is identified by a unique key (or ID) within the bucket.
- Object Keys: While S3 does not have a traditional file system hierarchy, it uses a flat namespace. The / character in object keys is used to simulate a directory structure, making it easier to organize and manage objects. However, these are not actual directories but part of the object’s key.
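
For example, a bucket might contain the following object keys. S3 stores these as three flat keys, but most tools will display `folder1/` and `folder2/` as if they were directories (names are illustrative):

```
folder1/file.txt
folder1/notes.csv
folder2/report.pdf
```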

## S3 Endpoint Access

Accessing S3 is similar to accessing any other web service over HTTP, which most users are already familiar with. The endpoint URL follows the same structure as a typical web address, making it straightforward to understand and use.

An S3 endpoint address typically looks like this: https://dnsname.com/bucket-name/object-key

- Endpoint: [https://dnsname.com](https://dnsname.com)
- Bucket Name: bucket-name
- Object Key: object-key

For example, if you have a bucket named my-bucket and an object with the key folder1/file.txt, the S3 URL would be: https://dnsname.com/my-bucket/folder1/file.txt

## IAM Key Pairs

To access and manage your S3 resources securely, you will use IAM (Identity and Access Management) key pairs instead of a traditional login and password. An IAM key pair consists of an Access Key ID and a Secret Access Key. These keys are used to authenticate your requests to AWS services:

- Access Key ID: this is similar to a username.
- Secret Access Key: this is similar to a password and should be kept secure.

Unlike a traditional login and password, different IAM key pairs can be attached to different sets of permissions defined in their policy files. These policies control what actions the keys are allowed to perform, enhancing security by ensuring that each key pair has only the necessary permissions for its intended tasks.
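
As an illustration, a policy granting read-only access to a single bucket might look like the following (standard AWS policy syntax; the bucket name is a placeholder):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::my-bucket",
        "arn:aws:s3:::my-bucket/*"
      ]
    }
  ]
}
```

A key pair attached to this policy could list and download objects from `my-bucket` but could not upload, delete, or touch any other bucket.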

# Request S3 bucket

To request an S3 bucket you have two options: either attach it to an existing project, or create a new project with S3 storage.

## Attach S3 bucket to an existing project

In that case, send an email to <helpdesk@unil.ch> (with a subject starting with "DCSR add S3 bucket to project") providing the following information:

- project name
- size of the bucket in GB
- whether access from outside UNIL should be allowed, and if so, whether it should be read-only
- if access from outside UNIL is allowed, whether it should be limited to a set of IP addresses or network ranges

## Add S3 bucket to new project

When requesting a new project using the dedicated [web application](https://requests.dcsr.unil.ch/), on the resource information selection screen, click on advanced selection as follows:

[![image.png](https://wiki.unil.ch/ci/uploads/images/gallery/2025-03/scaled-1680-/yYsimage.png)](https://wiki.unil.ch/ci/uploads/images/gallery/2025-03/yYsimage.png)

Then choose the "Object Storage, NO BACKUP" article:

[![image.png](https://wiki.unil.ch/ci/uploads/images/gallery/2025-03/scaled-1680-/Oj8image.png)](https://wiki.unil.ch/ci/uploads/images/gallery/2025-03/Oj8image.png)

Finally, specify your requirements:

[![image.png](https://wiki.unil.ch/ci/uploads/images/gallery/2025-03/scaled-1680-/Bl2image.png)](https://wiki.unil.ch/ci/uploads/images/gallery/2025-03/Bl2image.png)

# Software to access S3 bucket

## From Curnagl cluster (command line tools)

#### Rclone
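
A minimal sketch of an Rclone setup on Curnagl, assuming a module named `rclone` is available and using the `s3.unil.ch` endpoint referenced later on this page (the remote name `dcsr`, bucket name, and placeholder keys are illustrative):

```bash
# Load Rclone on Curnagl (module name assumed)
module load rclone

# Declare an S3 remote non-interactively; keys are placeholders
rclone config create dcsr s3 \
    provider Other \
    access_key_id YOUR_ACCESS_KEY_ID \
    secret_access_key YOUR_SECRET_ACCESS_KEY \
    endpoint https://s3.unil.ch

# List objects in a bucket, then copy a file into it
rclone ls dcsr:my-bucket
rclone copy localfile.txt dcsr:my-bucket/folder1/
```

You can also run `rclone config` without arguments for an interactive, guided setup.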

#### awscli
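
A sketch of the equivalent workflow with awscli, assuming the `awscli-v2` module mentioned later on this page (bucket and file names are illustrative):

```bash
# Load the AWS CLI on Curnagl
module load awscli-v2

# One-time credential setup (prompts for keys, region, output format)
aws configure

# List the contents of a bucket through the DCSR endpoint
aws --endpoint-url=https://s3.unil.ch s3 ls s3://my-bucket/

# Upload, then download, an object
aws --endpoint-url=https://s3.unil.ch s3 cp local.txt s3://my-bucket/folder1/local.txt
aws --endpoint-url=https://s3.unil.ch s3 cp s3://my-bucket/folder1/local.txt ./local.txt
```

Note that `--endpoint-url=https://s3.unil.ch` must be passed on every invocation, since the CLI targets AWS endpoints by default.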

## From your laptop or a workstation

#### Command line tools

Following the official documentation, you can install Rclone ([https://rclone.org/install/](https://rclone.org/install/)) or awscli ([https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html)) on your laptop/workstation.

Both tools can then be used as described in the section above dedicated to the Curnagl cluster.

#### Cyberduck

Cyberduck can be installed from [https://cyberduck.io/download/](https://cyberduck.io/download/).

# Share files from a bucket with presign keys

## Purpose

AWS presigned URLs (or presign keys) are used to grant temporary access to objects in Amazon S3 without requiring the recipient to have AWS security credentials. Here are the main purposes:

- **Download Access**: You can generate a presigned URL to allow someone to download an object from your S3 bucket without needing their own AWS credentials
- **Upload Access**: Similarly, you can create a presigned URL to permit someone to upload a specific object to your S3 bucket
- **Time-Limited Access**: The access provided by a presigned URL is time-limited, meaning it will expire after a specified duration

This is particularly useful for sharing files securely or allowing temporary uploads without exposing your AWS credentials.

**<span style="color: rgb(224, 62, 45);">Warning: only files can be shared this way, so if you want to share a folder, you have to create an archive of this folder first.</span>**

## Prerequisite

The bucket has to be open to the outside. If that is not the case, send an email to <helpdesk@unil.ch> with a subject starting with "DCSR S3 bucket".

## Create a presign key with `aws-cli`

You can either install awscli on your laptop, since it is a simple Python package, or use it from the cluster (there is a module called `awscli-v2` that you can simply load).

#### Configuration

If it is not configured yet, you can run the `aws configure` command. This will ask you for:

- AWS Access Key ID: you have to provide the read/write access key corresponding to your bucket
- AWS Secret Access Key: you have to provide the read/write secret key corresponding to your bucket
- Default region name: you can put `us-east-1`
- Default output format: you can put `json`

This will create two files:

- ~/.aws/config
- ~/.aws/credentials
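
As an illustration, the resulting files look roughly like this (the key values are placeholders):

```
# ~/.aws/credentials
[default]
aws_access_key_id = YOUR_ACCESS_KEY_ID
aws_secret_access_key = YOUR_SECRET_ACCESS_KEY

# ~/.aws/config
[default]
region = us-east-1
output = json
```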

#### Presign key creation

Suppose you want to share the file `to_share/important_file.gz` from the bucket `recn-fac-fbm-dep-greatpi-data`; you can use the following command:

```bash
aws --endpoint-url=https://s3.unil.ch s3 presign \
    s3://recn-fac-fbm-dep-greatpi-data/to_share/important_file.gz \
    --expires-in 604800
```

The value passed to the `--expires-in` parameter is the validity period of the link, expressed in seconds. The maximum validity is 7 days.
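
The 7-day maximum corresponds to the 604800 seconds used in the example above, which you can compute directly in the shell:

```bash
# --expires-in is given in seconds; 7 days is the maximum allowed
echo $((7 * 24 * 3600))   # prints 604800
```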

This command will return a link that can be shared with your external collaborator.
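
On the recipient's side, the link works with any HTTP client; for example, with `curl` (the URL is truncated here, the real link includes the full signature query string):

```bash
# Download the shared object using the presigned URL (truncated here)
curl -o important_file.gz \
    "https://s3.unil.ch/recn-fac-fbm-dep-greatpi-data/to_share/important_file.gz?X-Amz-..."
```

The URL must be quoted, since the signature query string contains characters the shell would otherwise interpret.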

## Create a presign key with Cyberduck

Once your connection to the bucket is configured in Cyberduck, you can browse it. Right-click the file you want to share, choose "Copy URL", and pick one of the three expiration options (1 hour, 1 day, 1 week). Finally, paste the result somewhere to get the link.

[![image.png](https://wiki.unil.ch/ci/uploads/images/gallery/2025-03/scaled-1680-/aosimage.png)](https://wiki.unil.ch/ci/uploads/images/gallery/2025-03/aosimage.png)