Building a Pipeline to Ingest and Analyze Streaming Data

Hands-On Lab

 


Derek Morgan

Training Architect

Length: 00:30:00

Difficulty: Intermediate

In this live AWS environment, you will learn how to use the AWS SDK for Python (boto3) to interact with Kinesis Data Streams. Specifically, you will learn how to create a Kinesis Data Stream, send data to it, and get data from it.


Introduction

In this hands-on lab, we will learn how to create, send data to, and get data from a Kinesis Data Stream.

Solution

Log in to the live AWS environment using the credentials provided. Make sure you're in the N. Virginia (us-east-1) region throughout the lab. In the AWS console, navigate to Kinesis.

Then, open a new browser tab and log in to the Jupyter notebook using the provided credentials. Once you're logged in, open the KinesisSDK.ipynb notebook.

Creating and Interacting with a Kinesis Data Stream Using the AWS SDK for Python (Boto3)

  1. In the Jupyter notebook, run the command listed in the first cell.
  2. Once it's finished running, run the next cell.
  3. Run the next cell to set some variables.
  4. Run the next cell to create the Kinesis Data Stream.
  5. In the AWS console, on the Kinesis page, click Get started.
  6. Click Create data stream.
  7. Click Dashboard in the left-hand menu. If you don't yet see the data stream we created, refresh the page.
  8. In the Jupyter notebook, run the next cell to define a function that will add data to our stream.
  9. Once the Kinesis dashboard confirms the stream has been created, run the next cell in the Jupyter notebook to call the function a few times and send data into the stream.
  10. Run the next cell to read the data back from the stream. (A rough sketch of what these notebook cells do appears after this list.)
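
For reference, below is a rough sketch of how these notebook cells might use boto3. The stream name (our-penguin-stream) and region (us-east-1) come from the lab, but the shard count, the record fields (species, body_mass_g), and the sample values are illustrative assumptions; the lab's own notebook cells may differ.

    import json

    import boto3

    # Assumed names: the lab works with the stream "our-penguin-stream" in us-east-1.
    kinesis = boto3.client("kinesis", region_name="us-east-1")
    stream_name = "our-penguin-stream"

    # Create the data stream (the notebook's "create" cell does something similar).
    kinesis.create_stream(StreamName=stream_name, ShardCount=1)

    # Wait until the stream is ACTIVE before writing to it.
    kinesis.get_waiter("stream_exists").wait(StreamName=stream_name)

    def put_penguin_record(species, body_mass_g):
        """Send one JSON record into the stream (hypothetical field names)."""
        return kinesis.put_record(
            StreamName=stream_name,
            Data=json.dumps({"species": species, "body_mass_g": body_mass_g}),
            PartitionKey=species,
        )

    # Send a few records, as the lab does by re-running the "put" cell.
    for species, mass in [("adelie", 3700), ("gentoo", 5000), ("chinstrap", 3733)]:
        put_penguin_record(species, mass)

    # Read the data back: grab a shard iterator, then fetch records from it.
    shard_id = kinesis.describe_stream(StreamName=stream_name)["StreamDescription"]["Shards"][0]["ShardId"]
    iterator = kinesis.get_shard_iterator(
        StreamName=stream_name,
        ShardId=shard_id,
        ShardIteratorType="TRIM_HORIZON",
    )["ShardIterator"]
    for record in kinesis.get_records(ShardIterator=iterator, Limit=10)["Records"]:
        print(json.loads(record["Data"]))

The partition key determines which shard a record lands on; with a single shard any key works, but keying on the species keeps related records together if the stream is later resharded.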

Analyzing Streaming Data with SQL and Kinesis Analytics

  1. In the Jupyter notebook, run the next cell to use Kinesis Analytics to analyze the streaming data.
  2. In the AWS console, click our data stream to open it.
  3. Right-click Data Analytics in the left-hand menu to open it in a new browser tab.
  4. On the Kinesis Analytics page, click Create application.
  5. Give it an Application name of "penguin-app", and click Create application.
  6. On the penguin-app page, click Connect streaming data.
  7. On the Connect streaming data source page, set the following values:
    • Source: Kinesis stream
    • Kinesis stream: our-penguin-stream
    • Record pre-processing: Disabled
    • Access permissions: Choose from IAM roles that Kinesis Analytics can assume
    • IAM role: Select the one listed
  8. Click Discover schema.
  9. Click Save and continue.
  10. On the penguin-app page, click Go to SQL editor.
  11. In the dialog, click Yes, start application.
  12. In the Jupyter notebook, copy the SQL code for Kinesis Analytics. (An illustrative example of this kind of query appears after this list.)
  13. On the Real-time analytics page in the AWS console, paste in the code.
  14. Click Save and run SQL.
  15. After a couple of minutes, you should see the real-time analytics start populating below.
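
The exact SQL is provided in the notebook; the sketch below is only an illustration of the kind of query Kinesis Analytics runs against the stream. SOURCE_SQL_STREAM_001 is the default name Kinesis Analytics gives the in-application input stream, while the column names (species, body_mass_g), the destination stream name, and the 60-second tumbling window are assumptions.

    -- Illustrative only: aggregate the streaming records over a 60-second tumbling window.
    CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
        "species"       VARCHAR(32),
        "avg_body_mass" DOUBLE
    );

    CREATE OR REPLACE PUMP "STREAM_PUMP" AS
        INSERT INTO "DESTINATION_SQL_STREAM"
        SELECT STREAM "species",
               AVG("body_mass_g") AS "avg_body_mass"
        FROM "SOURCE_SQL_STREAM_001"
        GROUP BY "species",
                 STEP("SOURCE_SQL_STREAM_001".ROWTIME BY INTERVAL '60' SECOND);

The pump continuously inserts the SELECT results into DESTINATION_SQL_STREAM, and that output stream is what populates the Real-time analytics tab.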

Conclusion

Congratulations on completing this hands-on lab!