S3 storage
- Introduction
- Request S3 bucket
- Software to access S3 bucket
- Share files from a bucket with presign keys
Introduction
What is S3?
Amazon S3 (Simple Storage Service) is a scalable object storage service used for storing and retrieving any amount of data at any time. It organizes data into containers called “buckets.” Each bucket can store an unlimited number of objects, which are the fundamental entities stored in S3.
Understanding S3 Bucket structure
- Buckets: These are the top-level containers in S3. Each bucket has a unique name and is used to store objects.
- Objects: These are the files stored in a bucket. Each object is identified by a unique key (or ID) within the bucket.
- Object Keys: While S3 does not have a traditional file system hierarchy, it uses a flat namespace. The / character in object keys is used to simulate a directory structure, making it easier to organize and manage objects. However, these are not actual directories but part of the object’s key.
S3 Endpoint Access
Accessing S3 is similar to accessing any other web service over HTTP, which most users are already familiar with. The endpoint URL follows the same structure as a typical web address, making it straightforward to understand and use.
An S3 endpoint address typically looks like this: https://dnsname.com/bucket-name/object-key
- Endpoint: https://dnsname.com
- Bucket Name: bucket-name
- Object Key: object-key
For example, if you have a bucket named my-bucket and an object with the key folder1/file.txt, the S3 URL would be: https://dnsname.com/my-bucket/folder1/file.txt
IAM Key Pairs
To access and manage your S3 resources securely, you will use IAM (Identity and Access Management) key pairs instead of a traditional login and password. An IAM key pair consists of an Access Key ID and a Secret Access Key. These keys are used to authenticate your requests to AWS services:
- Access Key ID: this is similar to a username
- Secret Access Key: this is similar to a password and should be kept secure.
Unlike a traditional login and password, different IAM key pairs can be attached to different sets of permissions defined in their policy files. These policies control what actions the keys are allowed to perform, enhancing security by ensuring that each key pair has only the necessary permissions for its intended tasks.
Request S3 bucket
To request S3 bucket you have two options. Either you want to attach it to an existing project or you want to create a new project with S3 storage.
Attach S3 bucket to an existing project
In that case send an email to helpdesk@unil.ch (with subject starting with "DCSR add S3 bucket to project") and providing the following information:
- project name
- size of the bucket in GB
- access allowed from outside UNIL (in read only mode only?) or not
- if access is allowed from outside UNIL, should it be limited to a set of IP or network ranges?
Add S3 bucket to new project
When requesting a new project using the dedicated web application, on the resource information selection screen, click on advanced selection as follows:
Then choose "Object Storage, NO BACKUP" article :
Finally, specify your requirements:
Software to access S3 bucket
From Curnagl cluster (command line tools)
Rclone
awscli
From your laptop or a workstation
Command line tools
Following the official documentation, you can install Rclone (https://rclone.org/install/) or awscli (https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html) on your laptop/workstation.
Then both tools can be used as described in the above part dedicated to use from Curnagl cluster.
Cyberduck
Cyberduck can be installed from https://cyberduck.io/download/.
Share files from a bucket with presign keys
Purpose
AWS presigned URLs (or presign keys) are used to grant temporary access to objects in Amazon S3 without requiring the recipient to have AWS security credentials. Here are the main purposes:
- Download Access: You can generate a presigned URL to allow someone to download an object from your S3 bucket without needing their own AWS credentials
- Upload Access: Similarly, you can create a presigned URL to permit someone to upload a specific object to your S3 bucket
- Time-Limited Access: The access provided by a presigned URL is time-limited, meaning it will expire after a specified duration
This is particularly useful for sharing files securely or allowing temporary uploads without exposing your AWS credentials.
Warning: only files can be shared this way, so if you want to share a folder, you have to create an archive of this folder first.
Prerequisite
The bucket has to be opened on the outside, if it's not the case, send an email to helpdesk@unil.ch with the subject starting with "DCSR S3 bucket".
Create a presign key with aws-cli
You can either install awscli on your laptop since it's a simple Python package, or use it from the cluster (there is a module called awscli-v2
that you can simploy load).
Configuration
If it's not configured yet, you can run aws configure
command. This will ask you for:
- AWS Access Key ID: you have to provide the read/write access key corresponding to your bucket
- AWS Secret Access Key: you have to provide the read/write secret key corresponding to your bucket
- Default region name: you can put
us-east-1
- Default output format: you can put
json
This will create two files:
- ~/.aws/config
- ~/.aws/credentials
Presign key creation
Let's consider that you want the file to_share/important_file.gz
from the bucket recn-fac-fbm-dep-greatpi-data
, you can use the following command:
aws --endpoint-url=https://s3.unil.ch s3 presign \
s3://recn-fac-fbm-dep-greatpi-data/to_share/important_file.gz \
--expires-in 604800
The value defined with --expires-in parameter is the validity of the link expressed in seconds. The maximum validity is 7 days.
This command will return a link that can be shared with your external collaborator.
Create a presign key with Cyberduck
Once your connection to the bucket is configured with Cyberduck, you can browse it. Then right click to file you want to share, then choose "Copy URL", and choose one of the 3 expiration options (1 hour, 1 day, 1 week). Finally, just paste the result somewhere to get the link.