Using AWS Data Pipeline with DynamoDB

Hands-On Lab

 


Craig Arcuri

AWS Training Architect II, Content

Length

00:30:00

Difficulty

Intermediate

This learning activity allows the student to use AWS Data Pipeline to export a DynamoDB table to an S3 storage location. Data Pipeline can run the export at specified times, when specified events occur (such as data uploads or changes in the DynamoDB table), or on pipeline activation. In this learning activity, the student configures Data Pipeline to export the data upon activation of the pipeline. To perform this task, Data Pipeline creates an EMR cluster (a group of m4.xlarge instances). Once the pipeline is activated, the data export to S3 begins. After a few minutes the operation completes, and the S3 bucket can be checked for the exported file.



Introduction

In this hands-on lab, we'll create a data pipeline to export a DynamoDB table to S3.

Log in to the AWS environment with the cloud_user credentials provided on the lab page. Once inside the AWS account, make sure you are using us-east-1 (N. Virginia) as the selected region.

Data Pipeline Export from DynamoDB to S3

Before we begin, there's one little thing we should do now to save us a few seconds later:

  1. From the AWS Management Console Dashboard, navigate to VPC.
  2. Click Subnets.
  3. Copy the Subnet ID of the first subnet listed, and paste it into a note or text file. We'll need it in a few minutes.
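If you'd rather not click through the console, a minimal boto3 sketch like the one below lists the account's Subnet IDs so one can be noted for later (this assumes the lab's cloud_user credentials are already configured for the SDK in us-east-1):

    # Minimal sketch: list Subnet IDs with boto3 so one can be noted for the
    # pipeline's EMR cluster. Assumes the lab credentials are configured locally.
    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")
    for subnet in ec2.describe_subnets()["Subnets"]:
        print(subnet["SubnetId"], subnet["AvailabilityZone"])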

Create One Data Pipeline

  1. Navigate to Data Pipeline.
  2. Click Get started now.
  3. For a Name, enter "backupdynamodb".
  4. In the first section, for a Source, keep Build using a template selected, and choose the template Export DynamoDB table to S3.
  5. In the Parameters section, enter a Source DynamoDB table name of "LinuxAcademy". (Note: You must use this exact name, as it's the name of the table we'll pull our data from.)
  6. Click into Output S3 folder, and select the provided folder.
  7. In the Schedule section, for Run, select on pipeline activation.
  8. In the Pipeline Configurations section, under S3 location for logs, click the folder icon and select the provided location.
  9. In the Security/Access section, select Custom for IAM roles, and select the provided roles for Pipeline role and EC2 instance role.
  10. Click Edit in Architect.
  11. In Resources, change the master and core instance types to m4.xlarge.
  12. Click Add an optional field..., select Subnet Id, and paste in the subnet ID we copied earlier.
  13. Click Activities, and set Resize Cluster Before Running to false.
  14. Click Save.
  15. Click Activate.
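For reference, the same kind of pipeline can also be created and activated programmatically. The boto3 sketch below is only a skeleton under stated assumptions: the EmrCluster fragment mirrors the Architect settings from the steps above (m4.xlarge instances and the subnet ID), the object IDs and subnet value are hypothetical placeholders, and the activity, data nodes, and schedule objects that the console template generates are omitted for brevity.

    # Skeleton sketch with boto3 -- NOT the full definition the "Export DynamoDB
    # table to S3" template generates. Object IDs and the subnet value below are
    # hypothetical placeholders; the activity, data node, and schedule objects
    # are omitted, so the definition is validated before activating.
    import boto3

    dp = boto3.client("datapipeline", region_name="us-east-1")

    pipeline_id = dp.create_pipeline(
        name="backupdynamodb", uniqueId="backupdynamodb-lab"
    )["pipelineId"]

    emr_cluster = {
        "id": "EmrClusterForBackup",
        "name": "EmrClusterForBackup",
        "fields": [
            {"key": "type", "stringValue": "EmrCluster"},
            {"key": "masterInstanceType", "stringValue": "m4.xlarge"},
            {"key": "coreInstanceType", "stringValue": "m4.xlarge"},
            {"key": "subnetId", "stringValue": "subnet-0123456789abcdef0"},  # subnet ID copied earlier
        ],
    }

    result = dp.put_pipeline_definition(
        pipelineId=pipeline_id, pipelineObjects=[emr_cluster]
    )
    if result.get("errored"):
        print("Definition incomplete:", result.get("validationErrors"))
    else:
        dp.activate_pipeline(pipelineId=pipeline_id)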

Check for S3 Folder from Data Pipeline Export

  1. Navigate to S3.
  2. Click the listed S3 bucket.
  3. Verify two folders have been created.
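The same check can be made from the SDK. The sketch below lists the keys in the bucket; the bucket name is a placeholder for the one provided in your lab environment:

    # List the bucket's contents with boto3 to confirm the export output exists.
    # The bucket name is a placeholder for the lab-provided bucket.
    import boto3

    s3 = boto3.client("s3", region_name="us-east-1")
    response = s3.list_objects_v2(Bucket="your-lab-bucket-name")
    for obj in response.get("Contents", []):
        print(obj["Key"])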

Conclusion

Congratulations on completing this lab!