S3cmd is a command-line program that lets you back up your Linux box to Amazon S3. Amazon S3 gives you practically unlimited storage and, as long as you have the bandwidth, you can use it from any location. There are two ways to run a backup: you can copy all the files over to an S3 bucket (the put command), or you can use the sync command to sync file changes on a regular basis.

Step one is to install the program. If you don’t have your own Linux box or Linux server, you can sign up on Linux Academy and practice this on your own instance. Once you’re ready and at the Linux command prompt on a Debian or Ubuntu system, type “apt-get install s3cmd” as root (or prefix it with sudo).

Once this is installed, you need an Amazon S3 bucket, which you can create after signing up at http://aws.amazon.com. Select “security credentials” from the top right corner of the login screen; this gives you access to your API keys.

Now you can configure s3cmd to connect to your Amazon S3 account. At the command prompt, type “s3cmd --configure”. Enter your access key and secret key when prompted, press Enter to accept the remaining defaults, and save your connection settings. Once this process is finished, you should arrive back at the command prompt. At this point you can verify that your install worked by typing “s3cmd ls”, which lists all the available buckets in your Amazon S3 account.
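If you’d like to script that verification step, here is a minimal sketch. It assumes s3cmd saved its settings to the default location, ~/.s3cfg; check_s3cmd_setup is a made-up helper name, not something s3cmd provides.

```shell
# Sketch: confirm s3cmd is installed and configured before relying on it.
# check_s3cmd_setup is a hypothetical helper name, not part of s3cmd.

check_s3cmd_setup() {
    # s3cmd must be available at the command prompt.
    if ! command -v s3cmd >/dev/null 2>&1; then
        echo "s3cmd is not installed" >&2
        return 1
    fi
    # "s3cmd --configure" saves its settings to ~/.s3cfg by default.
    if [ ! -f "$HOME/.s3cfg" ]; then
        echo "no config found; run: s3cmd --configure" >&2
        return 2
    fi
    # Listing buckets only succeeds when the saved keys are accepted.
    s3cmd ls
}
```

Calling check_s3cmd_setup prints your buckets when everything is in place; a return status of 1 or 2 tells you which step is still missing.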

Let’s say you want to back up the /home/pinehead folder to an Amazon S3 bucket. You have two options. The first is to put the files into the bucket, which simply copies everything over and overwrites any existing files with the new ones.
:$ s3cmd put -r /home/pinehead s3://pinehead-bkup
Note: we use -r when copying folders so that they are copied recursively.

s3://pinehead-bkup is the S3 bucket that you are backing up to. To verify the copy worked, list the contents of the bucket:
:$ s3cmd ls s3://pinehead-bkup
You should see your pinehead folder inside the bucket. If you only want to copy the contents of the pinehead folder and not the folder itself, add “/” to the end of the source path:
:$ s3cmd put -r /home/pinehead/ s3://pinehead-bkup
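To see the difference the trailing “/” makes before uploading anything, you can lean on s3cmd’s --dry-run option, which only reports what would be transferred. The sketch below is one way to do that; preview_put is a hypothetical helper name, and s3cmd may still contact S3 during a dry run, so it assumes your configuration already works.

```shell
# Sketch: preview both forms of the put command without uploading anything.
# preview_put is a hypothetical helper; --dry-run is s3cmd's own option.

preview_put() {
    # Strip one trailing slash so both variants start from the same path.
    local folder="${1%/}" bucket="$2"

    echo "Without trailing slash (the folder itself is uploaded):"
    s3cmd put -r --dry-run "$folder" "$bucket"

    echo "With trailing slash (only the folder's contents are uploaded):"
    s3cmd put -r --dry-run "$folder/" "$bucket"
}
```

For example, preview_put /home/pinehead s3://pinehead-bkup shows both listings side by side so you can pick the layout you want in the bucket.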

A real backup syncs only the changes instead of copying everything each time. To do that, replace “put” with “sync” in the command and run it. Then change or add some files or folders in your directory and run the command again; you’ll notice that only the changed or added files were synced to the Amazon S3 bucket.
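Here is a small sketch of the sync step wrapped in a function that first checks the folder exists, so a typo in the path fails early instead of reaching s3cmd. The name sync_to_s3 is made up for this example.

```shell
# Sketch: sync a local folder to a bucket, refusing paths that do not exist.
# sync_to_s3 is a hypothetical helper name, not an s3cmd command.

sync_to_s3() {
    local folder="$1" bucket="$2"

    if [ ! -d "$folder" ]; then
        echo "not a directory: $folder" >&2
        return 1
    fi

    # Same command as put, with "sync" in its place.
    s3cmd sync -r "$folder" "$bucket"
}
```

Run sync_to_s3 /home/pinehead s3://pinehead-bkup once, change a file, and run it again to watch only the changed file go up.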

[Screenshot: s3cmd sync output after a file has changed]

[Screenshot: s3cmd sync output when no changes have occurred and nothing has synced]

To make this a backup, you need to schedule it to run on a regular basis. In this case, we’ll create a daily cron job.
:$ crontab -e
Add the following line: 0 5 * * * s3cmd sync -r /home/pinehead s3://pinehead-bkup

Of course, use your own folder name and your own bucket name. This runs your backup every day at 5 a.m. and syncs only the changes that occurred on the file system.
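Cron runs quietly, so it can help to have the job write a log you can check later. The following is a rough sketch of a wrapper script, not the only way to do it; the function name, the log location and the script path in the note below are all assumptions for illustration.

```shell
#!/usr/bin/env bash
# Sketch of a wrapper script for the cron job. The function name and the
# default log location are assumptions for illustration.

LOG_FILE="${LOG_FILE:-$HOME/s3-backup.log}"

run_backup() {
    local src="$1" bucket="$2" status=0

    echo "$(date '+%Y-%m-%d %H:%M:%S') starting sync of $src" >> "$LOG_FILE"

    # Append s3cmd's output (and any errors) to the log.
    s3cmd sync -r "$src" "$bucket" >> "$LOG_FILE" 2>&1 || status=$?

    echo "$(date '+%Y-%m-%d %H:%M:%S') finished with exit status $status" >> "$LOG_FILE"
    return "$status"
}

# Run only when a source folder and a bucket are passed as arguments.
if [ "$#" -ge 2 ]; then
    run_backup "$1" "$2"
fi
```

If you saved this as /home/pinehead/bin/s3-backup.sh (a hypothetical path) and made it executable, the crontab line would become: 0 5 * * * /home/pinehead/bin/s3-backup.sh /home/pinehead s3://pinehead-bkup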

13 responses to “How to Backup Linux to Amazon S3 Using s3cmd”

  2. Steven says:

    Thank you for this excellent walk through – while not on Ubuntu, I was able to get the s3cmd package downloaded and installed at s3tools.org and once installed, the rest of your tutorial made the backup a breeze!

  3. Alex Chejlyk says:

    Excellent tutorial!
    Thanks for taking the time to put it up, it is most definitely appreciated.



  4. Pär says:

    Thanks for the tutorial! Would you like to share how much data you back up, and what it costs you? I’m finding it hard to understand what Amazon’s pricing model would result in for me.

  5. PY says:

    Wow in under 10 minutes I was backing up 3 linux instances to s3, many thanks!

  6. Jeff P says:

    awesome tutorial man

  7. keith says:

    this is cool and useful… but this is an “asynchronous mirror”, not a backup. the reason for the distinction is, if you delete a file locally, and your “backup” runs, it will delete (sync), your remote file too. same if you corrupt a file or you are hit with ransomware. maybe s3 will backup your mirror automatically but i’m under the impression it does not.

    • bobby V says:

      If you enable versioning on your bucket that should help here. Bad thing about doing “backups” in this way is that it’s hard to get a snapshot of the entire filesystem on a given date. Picking out single files from a few days back is fine though.

      Amazon has a script that “backs up” EFS to given “date buckets” using hardlinks, which is similar to what you would want to do here. It would probably work, given a bit of tweaking, to retrieve an S3 backed-up FS to a given date: https://github.com/awslabs/data-pipeline-samples/tree/master/samples/EFSBackup

  8. Vaggelis says:

    That is great. How can we set it with a cron or recurring somehow?

  9. Nikhil says:

    Thanks a lot for the article. It really helped me to sync with our s3. Two updates from my side:
    1. The command to configure is: s3cmd --configure
    2. The command to check the configuration is: s3cmd ls

    One additional tip: to sync from S3 to the local computer, use: s3cmd sync -r s3://account/folder /var/www/html

    Just sharing if this info helps someone.
