Amazon DynamoDB Data Modeling
Mark Richman, AWS Training Architect II
Welcome to the Amazon DynamoDB Data Modeling course.
In this course, you will learn:
- NoSQL concepts and DynamoDB's features
- How to optimize DynamoDB for runtime performance and cost
- DynamoDB data modeling patterns: 1:N, N:M, hierarchical, GSI overloading, and more
- Write sharding, sparse indexes, and materialized aggregations
- Strategies to migrate data from an RDBMS to DynamoDB
You'll further solidify your learning with real-world examples and hands-on labs.
By the end of this course, you'll have the necessary skills and knowledge to effectively build scalable, cost-efficient, high-performance applications using DynamoDB.
Thank you for taking the course — let's get started!
About the Training Architect
A little bit about me — your instructor for this course!
My name is Mark Richman, and I'll be your training architect for this course. I'm really excited to be helping you on your training journey.
I'd love to hear from you, so please feel free to reach out to me in our Slack community or on social media. I'm @mrichman on Twitter, GitHub, and just about everywhere else.
In this lesson, we discuss relational databases and their inherent faults. Next, we explain the "sparse data" problem and how NoSQL — and DynamoDB in particular — solves this problem. Finally, we define the goals of data modeling and how it impacts the performance and cost profile of your DynamoDB-based applications.
Understanding DynamoDB Data Structures
DynamoDB and Tables
In this lesson, we cover what DynamoDB is, how it combines the best of wide column and key-value data stores, and some basic terminology. We also discuss the two capacity modes (On-Demand vs. Provisioned) and how billing is calculated.
Items and Attributes
Partition and Sort Keys
In this lesson, we discuss partition and sort keys. We cover their purpose and how they relate to performance with respect to partitions.
In this lesson, we discuss the available data types in DynamoDB.
Indexes in DynamoDB are very different from those in relational databases. Indexes store your data in different ways to facilitate different access patterns. In this lesson, we discuss global secondary indexes (GSIs) and local secondary indexes (LSIs) — including how to create them and their features and limitations — and walk through a usage example in the AWS Management Console.
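As a rough sketch of what creating a GSI looks like in practice, here is the shape of an `update_table` request built with Python. The table, index, and attribute names ("Orders", "StatusDateIndex", "status", "order_date") are hypothetical, not from the course:

```python
# Sketch: adding a global secondary index to an existing table.
# All names here are illustrative, not from the course material.
gsi_update = {
    "AttributeDefinitions": [
        {"AttributeName": "status", "AttributeType": "S"},
        {"AttributeName": "order_date", "AttributeType": "S"},
    ],
    "GlobalSecondaryIndexUpdates": [
        {
            "Create": {
                "IndexName": "StatusDateIndex",
                "KeySchema": [
                    {"AttributeName": "status", "KeyType": "HASH"},
                    {"AttributeName": "order_date", "KeyType": "RANGE"},
                ],
                # ALL copies every attribute into the index; KEYS_ONLY or
                # INCLUDE reduce index storage at the cost of fetch-backs.
                "Projection": {"ProjectionType": "ALL"},
            }
        }
    ],
}

# With boto3, this dict would be passed to the low-level client:
# boto3.client("dynamodb").update_table(TableName="Orders", **gsi_update)
```

One limitation worth remembering: GSIs can be added or removed at any time, while LSIs can only be defined when the table is created.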
Demo: Creating a Table
In this lesson, we learn how to create a table in DynamoDB. We will supply partition and sort keys and create a few items. We also see how to perform query and scan operations on our table, as well as how to apply filter expressions.
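To make the query-plus-filter idea concrete, here is a minimal sketch of a Query request against the low-level API. The table name, key names, and attribute values are hypothetical examples, not taken from the demo:

```python
# Sketch of a Query request with a filter expression.
# Table and attribute names ("Orders", PK/SK, "status") are illustrative.
query_params = {
    "TableName": "Orders",
    # The key condition is evaluated against the index; only the partition
    # key supports equality, and the sort key supports range operators.
    "KeyConditionExpression": "PK = :pk AND begins_with(SK, :sk_prefix)",
    # The filter is applied AFTER items are read, so it does not reduce
    # the read capacity consumed, only the data returned.
    "FilterExpression": "#st = :status",
    "ExpressionAttributeNames": {"#st": "status"},  # alias a reserved-looking name
    "ExpressionAttributeValues": {
        ":pk": {"S": "CUSTOMER#123"},
        ":sk_prefix": {"S": "ORDER#"},
        ":status": {"S": "SHIPPED"},
    },
}

# With boto3 this would be executed as:
# boto3.client("dynamodb").query(**query_params)
```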
Basics of DynamoDB Data Modeling
This lesson introduces the technical implementation of partition keys, including what they are and how they are used with DynamoDB's hashing function to store and retrieve data. We'll also look at how partitions influence performance and support our applications' access patterns. We'll see how partitions work to:
- Help distribute our workloads evenly
- Spread reads and writes across partitions for performance
We'll also see which partition key design choices can cause performance issues like throttling.
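The hashing behavior can be illustrated with a toy simulation. DynamoDB's internal hash function is not public, so this sketch substitutes MD5; the point is only that high-cardinality keys spread across partitions while a single hot key always lands on the same one:

```python
import hashlib

def partition_for(partition_key: str, num_partitions: int = 4) -> int:
    """Toy stand-in for DynamoDB's internal hash function (which is not public):
    hash the partition key, then map it onto one of the physical partitions."""
    digest = hashlib.md5(partition_key.encode()).hexdigest()
    return int(digest, 16) % num_partitions

# Many distinct keys spread evenly across all partitions...
spread = {partition_for(f"USER#{i}") for i in range(1000)}

# ...but repeated writes to one key always hit the same partition,
# which is exactly how a "hot key" throttles a single partition.
hot = {partition_for("USER#1") for _ in range(1000)}
```

Running this, `spread` covers every partition while `hot` contains just one, which is the intuition behind choosing high-cardinality partition keys.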
NoSQL Data Modeling Concepts
With NoSQL databases and DynamoDB in particular, we take a very different approach to data modeling. In this lesson, we look at some data modeling concepts you'll need to understand to become successful with DynamoDB.
In this lesson, we take a look at modeling one-to-many relationships within a single DynamoDB table.
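As a minimal sketch of the pattern, assume a hypothetical customer/orders model: the parent and its children share a partition key, so a single Query on the PK (optionally with a sort key prefix) returns the whole item collection. The local `query_collection` helper below just simulates that Query:

```python
# One-to-many in a single table: a customer and its orders share a
# partition key. All entity names and key formats are illustrative.
items = [
    {"PK": "CUSTOMER#123", "SK": "PROFILE#123", "name": "Alice"},
    {"PK": "CUSTOMER#123", "SK": "ORDER#2024-01-05#0001", "total": 45},
    {"PK": "CUSTOMER#123", "SK": "ORDER#2024-02-10#0002", "total": 80},
]

def query_collection(table, pk, sk_prefix=""):
    """Local stand-in for Query(PK = :pk AND begins_with(SK, :prefix))."""
    return [i for i in table if i["PK"] == pk and i["SK"].startswith(sk_prefix)]

everything = query_collection(items, "CUSTOMER#123")           # profile + orders
orders = query_collection(items, "CUSTOMER#123", "ORDER#")     # just the orders
```

The sort key prefix ("PROFILE#", "ORDER#") is what lets one Query serve both "get the customer" and "get the customer's orders".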
Adjacency Lists and Many-to-Many Relationships
Adjacency lists are a design pattern that provides a way to represent graph data (nodes and edges) in DynamoDB. We'll use this pattern to implement many-to-many relationships in a single DynamoDB table.
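A minimal sketch of the pattern, using hypothetical student/course entities: each node is an item whose PK equals its SK, and each edge is an item whose PK is the source node and SK is the target node. Querying a PK returns a node plus its outgoing edges; a GSI keyed on SK gives the reverse direction:

```python
# Adjacency list pattern: nodes have PK == SK, edges have PK = source,
# SK = target. Entity names here are illustrative, not from the course.
def node(node_id, attrs):
    return {"PK": node_id, "SK": node_id, **attrs}

def edge(source_id, target_id, attrs):
    return {"PK": source_id, "SK": target_id, **attrs}

graph = [
    node("STUDENT#1", {"name": "Alice"}),
    node("COURSE#DB101", {"title": "Databases"}),
    # Many-to-many: an edge in each direction, each able to carry its
    # own attributes about the relationship.
    edge("STUDENT#1", "COURSE#DB101", {"enrolled": "2024-01-05"}),
    edge("COURSE#DB101", "STUDENT#1", {"grade": "A"}),
]

# Query PK = "STUDENT#1": the student node plus everything it points to.
student_side = [i for i in graph if i["PK"] == "STUDENT#1"]
```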
In this lesson, we're going to model hierarchical data in DynamoDB, all while searching for diapers at our local Target store.
In this lesson, we take a look at a technique called GSI overloading, which allows us to hold many different types of data items at one time — all in a single DynamoDB table.
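As a sketch of GSI overloading with hypothetical employee/project entities: different item types write their values into the same generic index attributes (here called GSI1PK/GSI1SK), so one index can serve several access patterns instead of needing a GSI per entity type:

```python
# GSI overloading: unrelated entity types reuse generic index attributes
# (GSI1PK / GSI1SK), so one GSI answers several queries. Names illustrative.
items = [
    {"PK": "EMP#1",  "SK": "EMP#1",  "GSI1PK": "DEPT#SALES", "GSI1SK": "EMP#1"},
    {"PK": "EMP#2",  "SK": "EMP#2",  "GSI1PK": "DEPT#SALES", "GSI1SK": "EMP#2"},
    {"PK": "PROJ#9", "SK": "PROJ#9", "GSI1PK": "DEPT#SALES", "GSI1SK": "PROJ#9"},
]

def query_gsi1(table, gsi1pk):
    """Local stand-in for a Query against the overloaded GSI."""
    return [i for i in table if i.get("GSI1PK") == gsi1pk]

# One query returns the department's employees AND its projects;
# the SK prefix tells the application which entity type each item is.
sales = query_gsi1(items, "DEPT#SALES")
```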
In this lesson, we cover a technique called write sharding. One way to better distribute writes across a partition key space in DynamoDB is to expand the space. For write-heavy items, you can avoid hot keys by applying a random suffix to partition keys.
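A minimal sketch of the suffix approach, with an assumed shard count of 10 (the count and key format are illustrative). On the write side a random suffix spreads items across shards; on the read side you must query every shard key and merge the results:

```python
import random

NUM_SHARDS = 10  # illustrative; chosen based on the item's write rate

def sharded_key(base_key: str) -> str:
    """Write side: append a random shard suffix so a hot partition key
    is spread across NUM_SHARDS physical key values."""
    return f"{base_key}.{random.randint(1, NUM_SHARDS)}"

def all_shard_keys(base_key: str):
    """Read side: to reassemble the data, query each shard key and merge."""
    return [f"{base_key}.{n}" for n in range(1, NUM_SHARDS + 1)]

# e.g. writes for the date "2024-03-01" land on 2024-03-01.1 ... 2024-03-01.10
```

The trade-off is explicit: writes scale across shards, but reads now cost one query per shard (or a parallel fan-out).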
In this lesson, we talk about sparse indexes and how we can use them to speed up queries and save some money.
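The money-saving property comes from how DynamoDB projects items: an item only appears in a GSI if it has the index's key attribute. This sketch simulates that with a hypothetical "unshipped" attribute that exists only on open orders:

```python
# Sparse index: only items carrying the index key attribute appear in
# the GSI. "unshipped" and the order data are illustrative examples.
orders = [
    {"PK": "ORDER#1", "total": 10, "unshipped": "2024-03-01"},
    {"PK": "ORDER#2", "total": 99},                 # shipped: attribute removed
    {"PK": "ORDER#3", "total": 25, "unshipped": "2024-03-02"},
]

def sparse_gsi(items, key_attr):
    """Local stand-in for what DynamoDB projects into a sparse GSI."""
    return [i for i in items if key_attr in i]

# The index stays small: querying it scans only pending orders,
# not the entire order history.
pending = sparse_gsi(orders, "unshipped")
```

Deleting the attribute when an order ships removes the item from the index automatically, which keeps both query cost and index storage down.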
In this lesson, we cover how to use the data coming from DynamoDB streams to aggregate our voting data in real time and use Lambda to scale up without any operational overhead.
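A minimal sketch of the Lambda side, assuming a vote item with a hypothetical "candidate" attribute. The event shape follows the DynamoDB stream record format; the write-back of the aggregate (an atomic ADD via update_item) is left as a comment since it needs a live table:

```python
# Sketch of a Lambda handler that tallies new votes from a DynamoDB stream.
# The "candidate" attribute and tally key format are illustrative.
def handler(event, context):
    tallies = {}
    for record in event["Records"]:
        # Only count newly inserted vote items.
        if record["eventName"] != "INSERT":
            continue
        candidate = record["dynamodb"]["NewImage"]["candidate"]["S"]
        tallies[candidate] = tallies.get(candidate, 0) + 1

    # For each candidate you would then atomically bump the aggregate item:
    # table.update_item(
    #     Key={"PK": f"TALLY#{candidate}"},
    #     UpdateExpression="ADD votes :n",
    #     ExpressionAttributeValues={":n": n},
    # )
    return tallies
```

Because Lambda scales with the stream's shards, the aggregation keeps up with write volume without any servers to manage, which is the "materialized aggregation" idea in miniature.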
Analyzing Data Workloads
Qualifying Your Workload’s Requirements
Simple Case Study
The next several lessons are going to cover a process of identifying and defining data access requirements of an application. This lesson provides an overview of a simple case study to illustrate the techniques used in this module of the course.
Defining an Entity–Relationship Model
In this lesson, we learn about the entity–relationship modeling process, showing how to map data entities and the relationships between those entities.
Documenting Access Patterns
To start designing a DynamoDB table that performs well and minimizes cost, we need to first identify the access patterns required by the applications it needs to support. In this lesson, we walk through the process of identifying key access patterns your DynamoDB tables will support.
In this lesson, we talk about the various phases of the data lifecycle — the flow of data from creation and storage to when that data ultimately becomes obsolete and is destroyed.
Interpreting Access Patterns
Optimizing for Read/Write Workloads
In this lesson, we talk about read-heavy and write-heavy workloads, as well as what the data model considerations are for each case.
Scenario 1 - User Profile Microservice
In this section of the course, we're going to step through a process of identifying and defining data access requirements of an application. Our example application is a microservice that provides flight information to travel agents.
Entities and Access Patterns
In this brief lesson, we will identify the access patterns in our application. From these access patterns, we can extract the domain entities, as well as the various queries that will inform our indexes.
In this lesson, we will:
- Explore our source CSV flight data.
- Define a DynamoDB data model to suit our access patterns.
- Deploy a SAM application for our microservice.
- Load data into DynamoDB.
- Test our microservice via the API Gateway endpoint.
Scenario 2 - Relational to DynamoDB Migration
In this section of the course, we'll discuss a strategy for migrating from an RDBMS to DynamoDB that takes into account all the concepts we've covered so far. We'll migrate a sample database from MySQL into DynamoDB using the AWS Database Migration Service (DMS).
Data Analysis Phase
The purpose of the data analysis phase is to understand the composition of the source data and to identify the data access patterns used by the application. This information is required input into the data modeling phase. It is also essential for understanding the cost and performance of running a workload on DynamoDB.
Data Modeling Phase
In this phase, the table, partition and sort keys, and secondary indexes will be defined. The data model produced in this phase must support the data access patterns described in the data analysis phase.
Testing Phase
The testing phase is the most important part of the migration strategy. It is during this phase that the entire migration process will be tested end-to-end. In this lesson, we'll cover the basic tests that a comprehensive test plan should minimally contain.
Data Migration Phase
In this lesson, we'll work through the data migration phase. Using the AWS Database Migration Service (DMS), we'll extract our source data from MySQL, transform it into the structure supporting our new DynamoDB data model, and load that data into DynamoDB.
Access Patterns and Supporting Indexes
- Find employee by employee ID: base table, query on emp_no
- Find all employees for a given last name: GSI PK: last_name
- Find all employees for a given title: GSI PK: title
- Find all employees for a given department: GSI PK: department_name
- Find all employees with salary >= 130,000: GSI PK: salary >= 130000
- Find all current department managers: query the GSI once for each department_name

Employees DynamoDB Table Definition
MySQL Employees Sample Database
- AMI ID: amzn_lin2_mysql57_employees (ami-0aa8df173b9f8c0d3)
- Minimum instance type: t3a.2xlarge
- Minimum 10 GB EBS (gp2)
- Security group: SSH (22/tcp), MySQL (3306/tcp)
- Connect via ssh -i <keypair.pem> ec2-user@<public ip>
- MySQL read-only user: cloud_user
- MySQL root user (local only): root

mysql -ucloud_user -pbettertogether employees

mysql> show tables;
+----------------------+
| Tables_in_employees  |
+----------------------+
| current_dept_emp     |
| departments          |
| dept_emp             |
| dept_emp_latest_date |
| dept_manager         |
| employees            |
| salaries             |
| titles               |
+----------------------+
8 rows in set (0.00 sec)
Create Materialized View in MySQL
Note: You must connect to MySQL as root in order to create tables or views.

mysql -uroot -pStr0ngpass! employees
CREATE VIEW dms_source AS
SELECT employees.*,
       titles.title,
       titles.from_date AS title_from_date,
       titles.to_date AS title_to_date,
       salaries.salary,
       salaries.from_date AS salary_from_date,
       salaries.to_date AS salary_to_date,
       dept_emp.dept_no AS department_number,
       departments.dept_name AS department_name,
       dept_emp.from_date AS dept_from_date,
       dept_emp.to_date AS dept_to_date,
       CONCAT_WS(' ', manager.first_name, manager.last_name) AS manager_name,
       manager.emp_no AS manager_emp_no
FROM employees
INNER JOIN titles ON employees.emp_no = titles.emp_no
INNER JOIN salaries ON employees.emp_no = salaries.emp_no
INNER JOIN dept_emp ON employees.emp_no = dept_emp.emp_no
INNER JOIN departments ON dept_emp.dept_no = departments.dept_no
INNER JOIN dept_manager dm ON dm.dept_no = departments.dept_no
INNER JOIN employees manager ON manager.emp_no = dm.emp_no;
CREATE TABLE dms_source_materialized SELECT * FROM dms_source;

The CREATE TABLE command should take approximately three minutes to complete:

mysql> CREATE TABLE dms_source_materialized SELECT * FROM dms_source;
Query OK, 13849320 rows affected (2 min 51.05 sec)
Records: 13849320  Duplicates: 0  Warnings: 0

mysql> select count(emp_no) from dms_source_materialized;
+---------------+
| count(emp_no) |
+---------------+
|      13849320 |
+---------------+
1 row in set (8.69 sec)

Create DMS Access Role
- Attach the AmazonDynamoDBFullAccess managed policy.
- Note the service access role ARN.

Create DMS Replication Instance
- Engine version: 3.3.1 (or latest)

Create Source Endpoint
- MySQL username: cloud_user

Create Target Endpoint
- Name: employees-dynamodb

Create DMS Task
- Table mappings: rules.json

Testing the Access Patterns
Can't See DMS Task CloudWatch Logs?
- Create an IAM role with trusted entity DMS.
- Attach the AmazonDMSCloudWatchLogsRole managed policy.
- Save the role as dms-cloudwatch-logs-role.

Tip: Increasing DMS Task Logging Level to Debug
https://aws.amazon.com/premiumsupport/knowledge-center/dms-enable-debug-logging/
aws dms modify-replication-task --replication-task-arn <task-arn> --replication-task-settings file://task-settings-debug.json
Evolving a Data Model
So, what happens when our DynamoDB table is in a live production environment and we need to support a new access pattern? What's the developer experience here, and how do we update our application without interruption?
Thanks for taking this course! Please don't forget to press the thumbs-up button on all the lessons you liked! In this lesson, we look at some other courses you may enjoy next.

Related Courses
- Amazon DynamoDB Deep Dive
- AWS Certified SysOps Administrator - Associate (SOA-C01)
- Automating AWS with Lambda, Python, and Boto3