Introduction

 

Amazon S3 (Simple Storage Service) is the flexible, cloud-hosted object storage service provided by Amazon Web Services. Touted as offering eleven 9s (99.999999999%) of durability and 99.99% availability, S3 is a web-accessible, highly scalable data storage solution that supports on-premises backups, logging, static web hosting, and cloud processing. This guide will briefly describe different options for interacting with S3 but will focus in depth on using the AWS CLI to work with the service.

Requirements

Before starting this guide, you should:

  • Be familiar with managing Linux or Unix systems from the command line
  • Have Python 2.6.5 or higher installed (required by the AWS CLI on Linux or Unix)
  • Be able to install packages using pip

 

AWS Command Line Installation
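The CLI itself is installed with pip, as noted in the requirements. A minimal installation looks like this; depending on your environment, you may need sudo or a virtualenv:

## Install the AWS CLI using pip
$> pip install awscli

## Verify the installation
$> aws --version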

 

Besides the official CLI, a number of third-party S3 tools are also available, but this guide focuses on the AWS CLI.

Once the AWS CLI is installed, you are ready to proceed.

Setting Up Credentials
Creating A User

Before you can start interacting with S3, sign in to the AWS console and create a set of access keys using AWS Identity and Access Management (IAM). *See AWS IAM Best Practices. Once logged into the AWS console, navigate to Services > IAM > Users and click Create New Users.

 

[Image: Creating a new user in the IAM console]

 

Enter a username, e.g. s3user, and click Next: Permissions. Select the Attach existing policies directly tab, type "s3" into the search filter, and select the AmazonS3FullAccess policy.

 

[Image: Attaching the AmazonS3FullAccess policy]

 

Click Next: Review for a final review of your new user, then click Complete to create the user and generate its access keys.

 

[Image: Reviewing the new user and downloading access keys]

 

Remember to download and securely save the Access Key ID and Secret Access Key, because this is the last time they will be available for download or viewing.

The policy you attached can also be represented as JSON (JavaScript Object Notation); it allows all actions on all S3 resources. *By default, all actions on all resources are implicitly denied.

 

// JSON representation of S3 Full Access Policy
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": "s3:*",
    "Resource": "*"
  }]
}
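For tighter control, the same JSON structure can scope access to a single bucket instead of all resources. The sketch below is an illustration only, using the hypothetical bucket name my-s3:

// Hypothetical policy limiting full access to one bucket and its objects
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": "s3:*",
    "Resource": [
      "arn:aws:s3:::my-s3",
      "arn:aws:s3:::my-s3/*"
    ]
  }]
}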

 

The access keys associated with this user can now be used to interact with S3, and you are ready to configure your local AWS CLI client.

Configure the CLI

Configuring your AWS CLI can be done using the aws configure command or by setting environment variables.

 

Using aws configure Command:

## Enter the following commands using the keys created earlier

$> aws configure
 AWS Access Key ID [None]: <YOUR_AWS_ACCESS_KEY_ID>
 AWS Secret Access Key [None]: <YOUR_AWS_SECRET_ACCESS_KEY>
 Default region name [None]: us-east-1 # Use your applicable region
 Default output format [None]: text # Formats: json (default), text, table

 

Setting environment variables:

## Enter the following commands using the keys created earlier

$ export AWS_ACCESS_KEY_ID=<YOUR_AWS_ACCESS_KEY_ID>
$ export AWS_SECRET_ACCESS_KEY=<YOUR_AWS_SECRET_ACCESS_KEY>
$ export AWS_DEFAULT_REGION=us-east-1 # Use your applicable region

 

For more information on setting up AWS CLI configuration and creating different profiles, refer to the official documentation.
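For example, credentials can be stored under a named profile and selected per command (the profile name s3user below is illustrative):

## Store credentials under a named profile
$> aws configure --profile s3user

## Use the profile with any command
$> aws s3 ls --profile s3user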

 

Working with the CLI

The AWS CLI is a comprehensive tool for interacting with Amazon's web services. This guide will focus on the aws s3 and aws s3api commands and their subcommands.

Exploring the Commands

When using a command-line tool, viewing its built-in manual or help output is the best way to start getting comfortable with it. Amazon has integrated thorough documentation into its CLI tool for each service and command.

 

## Getting Started

## A high level synopsis of the tool and services
$> aws help

## An overview of the available s3 commands and their arguments
$> aws s3 help

## An overview of the specified command's arguments and options
$> aws s3 <command> help

 

Creating a Bucket

Before you can start moving data to S3, you will first need to create a bucket.

 

## Creating a new bucket in us-west-1 region
$> aws s3 mb s3://my-s3 --region us-west-1

# output
make_bucket: my-s3

## Your bucket is now ready to use!
## http://my-s3.s3.amazonaws.com/

* Note: Naming the bucket "my-s3" is impractical and for example purposes only. Amazon S3 bucket names must be globally unique and follow DNS naming conventions. See Amazon's bucket restrictions.
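In practice, a name scoped with your own domain or project is far more likely to be unique; the name below is hypothetical:

## A more realistic, globally unique bucket name (hypothetical)
$> aws s3 mb s3://example-com-app-backups --region us-west-1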

Interacting with Data in S3

The aws s3 service provides commands that allow you to work with S3 as if it were a filesystem. These will be the primary commands you use to move data to, from, and between S3 and your local network.

 

Copy files locally to the new S3 bucket.

*This example assumes the local directory contains two text files: test1.txt and test2.txt.

## Create two test files, then copy one to the bucket directory "files"
$> echo "test" >> test1.txt && echo "test" >> test2.txt
$> aws s3 cp test1.txt s3://my-s3/files/test1.txt

#Output
upload: ./test1.txt to s3://my-s3/files/test1.txt

## Recursively copy files in the current directory matching pattern *.txt
$> aws s3 cp --recursive --exclude "*" --include "*.txt" \
   ./ s3://my-s3/files/

#Output
upload: ./test1.txt to s3://my-s3/files/test1.txt
upload: ./test2.txt to s3://my-s3/files/test2.txt
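
The cp command works in the other direction as well; for example, to download a file back from the bucket:

## Copy a file from S3 to the local directory
$> aws s3 cp s3://my-s3/files/test1.txt ./test1-copy.txt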

 

List and summarize the newly uploaded files

## List files in the bucket directory “files”
$> aws s3 ls s3://my-s3/files/

 

#output
2016-11-20 12:21:15       1700 test1.txt
2016-11-22 08:20:46       1300 test2.txt

## Recursively summarize files in directory and make them human readable
$> aws s3 ls s3://my-s3/files --summarize --human-readable --recursive

#output
2016-11-20 12:21:15 1.7 KiB files/test1.txt
2016-11-22 08:20:46 1.3 KiB files/test2.txt

Total Objects: 2
Total Size: 3 KiB
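
Running ls without a path lists all buckets in your account:

## List all buckets owned by the account
$> aws s3 ls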

 

Move files in S3 to a new directory

## Move test1.txt to be renamed testone.txt
$> aws s3 mv s3://my-s3/files/test1.txt s3://my-s3/files/testone.txt

 

 # output
 move: s3://my-s3/files/test1.txt to s3://my-s3/files/testone.txt 

## Recursively move objects in directory to new directory
$> aws s3 mv --recursive s3://my-s3/files s3://my-s3/temp

 

 # output
 move: s3://my-s3/files/testone.txt to s3://my-s3/temp/testone.txt
 move: s3://my-s3/files/test2.txt to s3://my-s3/temp/test2.txt

 

Sync local directory to S3 directory

## Sync synced_dir to S3
$> mkdir synced_dir && echo "test" >> synced_dir/test1.txt \
   && echo "test" >> synced_dir/test2.txt

 

$> aws s3 sync ./synced_dir s3://my-s3/synced_dir

 #output
 upload: synced_dir/test1.txt to s3://my-s3/synced_dir/test1.txt 
 upload: synced_dir/test2.txt to s3://my-s3/synced_dir/test2.txt

## Update test1.txt and run sync again
$> echo "update" >> synced_dir/test1.txt

$> aws s3 sync ./synced_dir s3://my-s3/synced_dir

# output
upload: synced_dir/test1.txt to s3://my-s3/synced_dir/test1.txt

## Delete test1.txt and let sync delete S3 copy
$> rm -rf synced_dir/test1.txt

$> aws s3 sync --delete ./synced_dir s3://my-s3/synced_dir

# output
delete: s3://my-s3/synced_dir/test1.txt
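
Because sync --delete removes remote data, the --dryrun flag is a safe way to preview what a sync would do before running it for real:

## Preview the sync without making any changes
$> aws s3 sync --delete --dryrun ./synced_dir s3://my-s3/synced_dir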

 

Remove S3 resources

## Remove individual file from temp dir
$> aws s3 rm s3://my-s3/temp/testone.txt

 # output
 delete: s3://my-s3/temp/testone.txt

## Recursively remove files in the temp directory matching pattern "*.txt"
$> aws s3 rm --recursive --exclude "*" --include "*.txt" s3://my-s3/temp

# output
delete: s3://my-s3/temp/test2.txt

## Delete all resources within the bucket
$> aws s3 rm --recursive s3://my-s3

# output
delete: s3://my-s3/synced_dir/test2.txt

 

Delete S3 Bucket

## Remove bucket and any remaining items within it
$> aws s3 rb --force s3://my-s3

# output
remove_bucket: my-s3

 

S3 Events and Logging
Using CloudTrail To Log S3 Events

Amazon's CloudTrail is a managed web service that makes it easy to track API calls made against your account. It now allows you to track S3 object-level events (e.g., create, delete, list) for multiple designated buckets and prefixes.

In this example, you will set up logging for S3 events from the AWS dashboard. First, create a new trail named s3-event-logs with a new bucket, your-s3-event-logging, to receive the events, and click Turn On.

 

[Image: Creating the s3-event-logs trail in CloudTrail]

 

Select the newly created s3-event-logs trail and scroll down to the Event Selectors section. Click the pencil icon to edit, select your desired bucket(s), track All events, and click Save.

 

[Image: Editing the trail's Event Selectors]

 

CloudTrail is now tracking your events from S3!
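
The same trail can also be created from the CLI. A minimal sketch, assuming the trail and bucket names used above; note that the logging bucket must already have a bucket policy that allows CloudTrail to write to it:

## Create the trail pointing at the logging bucket
$> aws cloudtrail create-trail --name s3-event-logs \
   --s3-bucket-name your-s3-event-logging

## Start recording events
$> aws cloudtrail start-logging --name s3-event-logs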

 

Publishing, Versioning, and Lifecycle
Data Publishing

Private, restricted, temporary, and public access are some of the available options for sharing data on S3 with other applications or users. These access control lists (ACLs) can be assigned at the bucket or object level. The following commands give an overview of the possibilities.

 

## Enable full bucket access by email
$> aws s3api put-bucket-acl --bucket <s3-bucket-name> \
   --grant-full-control emailaddress=user@example.com

## Enable public read of bucket
$> aws s3api put-bucket-acl --bucket <s3-bucket-name> \
   --acl public-read

## Enable full object access by email
$> aws s3api put-object-acl --bucket <s3-bucket-name> --key test.txt \
   --grant-full-control emailaddress=user@example.com

## Enable public read of object
$> aws s3api put-object-acl --bucket <s3-bucket-name> --key test.txt \
   --acl public-read
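
For temporary access, the CLI can also generate a time-limited presigned URL for an object without changing its ACL:

## Generate a presigned URL valid for one hour (3600 seconds)
$> aws s3 presign s3://<s3-bucket-name>/test.txt --expires-in 3600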

Check out Amazon's documentation on S3 Access Control for more in-depth configuration options.

 

For public datasets, S3 also offers a Requester Pays option so that bucket owners do not have to bear the download costs. To set up a Requester Pays bucket from the AWS console, select the S3 bucket, click Properties in the top-right corner, and scroll down to Requester Pays. Check Enabled and click Save.

[Image: Enabling Requester Pays in bucket Properties]

Versioning

S3 allows multiple variations of the same data object to be kept, providing more robust and resilient data storage.

 

This is a quick overview of enabling and verifying bucket versioning.

## Enable versioning on a bucket
$> aws s3api put-bucket-versioning \
   --bucket <s3-bucket-name> \
   --versioning-configuration Status=Enabled

## Verify versioning was enabled
$> aws s3api get-bucket-versioning \
   --bucket <s3-bucket-name>

## Suspend versioning on a bucket
$> aws s3api put-bucket-versioning \
   --bucket <s3-bucket-name> \
   --versioning-configuration Status=Suspended
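
Once versioning is enabled, every overwrite creates a new version of the object; you can inspect the stored versions with:

## List all object versions stored in the bucket
$> aws s3api list-object-versions \
   --bucket <s3-bucket-name>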

Check out Amazon's documentation on S3 Versioning for more in-depth configuration options.

Lifecycle

Object lifecycle rules allow data to be transitioned to infrequent-access storage, archived, or removed. Setting defined timelines makes storage more efficient and cost effective.

This example creates a lifecycle policy for log data: objects under the logs/ prefix move to infrequent-access storage after 30 days, then to archive after 90 days, and are finally removed after 365 days.

First, copy and paste the configuration into a file.

## Save as lifecycle.json

{
  "Rules": [
    {
      "ID": "log-cycle",
      "Prefix": "logs/",
      "Status": "Enabled",
      "Transitions": [
        {
          "Days": 30,
          "StorageClass": "STANDARD_IA"
        },
        {
          "Days": 90,
          "StorageClass": "GLACIER"
        }
      ],
      "Expiration": {
        "Days": 365
      }
    }
  ]
}

 

Next, we will apply the lifecycle configuration to our bucket.

## Create the bucket lifecycle
$> aws s3api put-bucket-lifecycle-configuration \
 --bucket <s3-bucket-name> \
 --lifecycle-configuration file://lifecycle.json

## Verify lifecycle was created
$> aws s3api get-bucket-lifecycle-configuration \
 --bucket <s3-bucket-name>

 

## Delete lifecycle from bucket
$> aws s3api delete-bucket-lifecycle \
 --bucket <s3-bucket-name>

Check out Amazon's documentation on S3 Lifecycle for more in-depth configuration options.

 
