Using Kinesis Data Firehose and Kinesis Data Analytics

Hands-On Lab

 

Miles Baker

AWS Training Architect II

Length: 01:00:00

Difficulty: Intermediate

Introduction

Easily ingesting data from numerous sources and making timely decisions is becoming a critical and core capability for many businesses. In this lab, we provide hands-on experience using Amazon Kinesis Data Firehose to capture, transform, and load data streams into Amazon S3 and perform near real-time analytics with Amazon Kinesis Data Analytics.

Lab Prerequisites

  • Understand how to log into and use the AWS Management Console.
  • Understand AWS Elastic Compute Cloud (EC2) basics.
  • Understand AWS Command Line Interface (CLI) basics.

Scenario

Our boss has asked us to load data into Amazon Simple Storage Service (S3). The data ingestion needs to be reliable, and we need a solution with little to no ongoing administration. Additionally, the solution needs to scale automatically to meet demand. Kinesis Data Firehose is perfect for this situation.

Additionally, we've been asked to analyze the data as it streams in so we can get a sense of the values and identify data anomalies. Eventually, our team will build capabilities to respond to customer data in real time. Amazon Kinesis Data Analytics is perfect for this situation.

In this lab, we create a Kinesis Data Firehose stream, run a script to generate simulated data, and set up some basic analytical queries in Kinesis Data Analytics to look at the data.

Please log in to the AWS Console using the cloud_user credentials provided in the lab instructions.

Once inside the AWS account, make sure you are using us-east-1 (N. Virginia) as the selected region.

Connecting to the Lab

  1. Log in to the AWS Management Console using the credentials provided on the lab instructions page. Make sure you're using the us-east-1 region.

Create a Kinesis Data Firehose Delivery Stream

  1. Click Services in the top menu.

  2. Enter "Kinesis" in the search box.

  3. Click Kinesis.

  4. Click Get started.

  5. Click Create delivery stream.

  6. Enter "captains-kfh" in the Delivery stream name box. Do not include the quotes.

  7. Scroll to the bottom and click Next.

  8. Click Next.

  9. Under S3 destination, click Create new.

  10. Enter a unique name in the S3 bucket name box. Start the name with "kfh-ml-".

  11. Click Create S3 bucket.

  12. Scroll to the bottom and click Next.

  13. Enter a value of 1 as the Buffer size (in MB).

  14. Enter a value of 60 as the Buffer interval (in seconds).

  15. Scroll to the bottom and, next to IAM role, click Create new or choose.

  16. Click the combo box next to IAM Role and select the role provided for the lab.

  17. Click the combo box next to Policy Name and select FirehoseDeliveryRole.

  18. Click Allow.

  19. Click Next.

  20. Click Create delivery stream. (A programmatic sketch of these steps follows this list.)
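
For reference, the delivery stream configured above can also be created programmatically. The sketch below assumes boto3 (the Python AWS SDK) and uses placeholder ARNs standing in for the role and bucket created in the console steps; it is illustrative, not part of the lab steps.

    import boto3

    firehose = boto3.client("firehose", region_name="us-east-1")

    # The ARNs below are placeholders; substitute the IAM role and the
    # kfh-ml- bucket created in the console steps above.
    firehose.create_delivery_stream(
        DeliveryStreamName="captains-kfh",
        DeliveryStreamType="DirectPut",
        ExtendedS3DestinationConfiguration={
            "RoleARN": "arn:aws:iam::111111111111:role/FirehoseDeliveryRole",
            "BucketARN": "arn:aws:s3:::kfh-ml-example",
            # Match the console settings: 1 MB buffer size, 60-second interval.
            "BufferingHints": {"SizeInMBs": 1, "IntervalInSeconds": 60},
        },
    )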

Stream Data to the New Kinesis Data Firehose Delivery Stream

  1. Open a terminal. One option is to open a new tab and navigate to the Linux Academy web terminal.

  2. Connect to the server using the credentials provided on the hands-on lab page.

  3. Verify the existence of the provided Python file.

    ls
  4. Run the command below (a sketch of what such a producer script might contain appears after this list).

    python write-to-kinesis-firehose-space-captains.py
  5. Return to the AWS tab showing the Firehose delivery streams.

  6. Click captains-kfh.

  7. Click the Monitoring tab.

  8. Refresh the page using the refresh button until data appears.

  9. Click the Details tab.

  10. Under Amazon S3 destination, click the link for the S3 bucket.

  11. Click through the date-based folders to verify the existence of a data record file.

  12. Navigate back to the terminal and cancel the script by pressing Ctrl-C.

  13. Copy the data files. Replace BUCKET_NAME with the name of the S3 bucket created in the previous task.

    aws s3 sync s3://BUCKET_NAME/ .
  14. Verify the file contents. Replace FILE_NAME with the name of the local file created in the previous step.

    tail -1 FILE_NAME
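
The contents of write-to-kinesis-firehose-space-captains.py aren't reproduced in this guide, but a producer feeding this delivery stream could look roughly like the sketch below. It assumes boto3, and the captain names and rating range are invented purely for illustration:

    import json
    import random
    import time

    import boto3

    firehose = boto3.client("firehose", region_name="us-east-1")

    # Hypothetical sample data; the lab script defines its own captains and fields.
    CAPTAINS = ["Picard", "Janeway", "Kirk", "Sisko"]

    while True:
        record = {"captain": random.choice(CAPTAINS), "rating": random.randint(1, 10)}
        # Firehose delivers the bytes as-is, so a trailing newline keeps the
        # records line-delimited in the S3 objects it writes.
        firehose.put_record(
            DeliveryStreamName="captains-kfh",
            Record={"Data": (json.dumps(record) + "\n").encode("utf-8")},
        )
        time.sleep(1)

As with the provided script, a producer like this runs until you cancel it with Ctrl-C.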

Create a Kinesis Data Analytics Application

  1. Restart the script.

    python write-to-kinesis-firehose-space-captains.py
  2. Navigate back to the AWS console.

  3. Click Data Analytics from the left-hand menu.

  4. Click Create application.

  5. Enter "popular-space-captains" in the box next to Application name.

  6. Enter "popular-space-captains" in the box next to Description.

  7. Click Create application.

  8. Click Connect streaming data.

  9. Click the Kinesis Firehose delivery stream radio button.

  10. Click the combo box next to Kinesis Firehose delivery stream and select captains-kfh.

  11. Under Access permissions, select the Choose from IAM roles that Kinesis Analytics can assume option.

  12. Click the combo box next to IAM role and select the role automatically provided.

  13. Scroll to the bottom and click Discover schema.

  14. Click Save and continue.

  15. Click Go to SQL editor.

  16. Click Yes, start application.

  17. Open a new tab and navigate to the GitHub repository provided in the lab.

  18. Click kinesis-analytics-popular-captains.sql.

  19. Click Raw.

  20. Select all of the text and copy it to the clipboard.

  21. Navigate back to the Kinesis window and paste the text into the box under Real-time analytics.

  22. Click Save and run SQL. After some time, the data should appear at the bottom of the window. (A programmatic sketch of creating this application follows this list.)
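
For reference, the same application can be created with the boto3 kinesisanalytics client. In the sketch below, the ARNs and record columns are assumptions standing in for what the Discover schema step infers automatically; the SQL itself still comes from the lab's repository:

    import boto3

    analytics = boto3.client("kinesisanalytics", region_name="us-east-1")

    analytics.create_application(
        ApplicationName="popular-space-captains",
        ApplicationDescription="popular-space-captains",
        Inputs=[
            {
                "NamePrefix": "SOURCE_SQL_STREAM",
                "KinesisFirehoseInput": {
                    # Placeholder ARNs; use the delivery stream and IAM role from the lab.
                    "ResourceARN": "arn:aws:firehose:us-east-1:111111111111:deliverystream/captains-kfh",
                    "RoleARN": "arn:aws:iam::111111111111:role/kinesis-analytics-role",
                },
                "InputSchema": {
                    "RecordFormat": {
                        "RecordFormatType": "JSON",
                        "MappingParameters": {
                            "JSONMappingParameters": {"RecordRowPath": "$"}
                        },
                    },
                    # Assumed columns; in the console, Discover schema fills these in.
                    "RecordColumns": [
                        {"Name": "captain", "SqlType": "VARCHAR(32)", "Mapping": "$.captain"},
                        {"Name": "rating", "SqlType": "INTEGER", "Mapping": "$.rating"},
                    ],
                },
            }
        ],
    )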

Create a Kinesis Data Analytics Anomaly Detection Application

  1. Delete the current query.

  2. Navigate back to the GitHub tab.

  3. Click the root directory link.

  4. Click kinesis-analytics-rating-anomaly.sql.

  5. Click Raw.

  6. Select all of the text and copy it to the clipboard.

  7. Navigate back to the Kinesis window and paste the text into the box under Real-time analytics.

  8. Click Save and run SQL. After some time, the data should appear at the bottom of the window.

Conclusion

Congratulations, you've completed this hands-on lab!