
Creating and Configuring Kinesis Streams and Kinesis Firehose in AWS

Hands-On Lab

 


Length: 01:00:00

Difficulty: Intermediate

One of the foundational elements of any Big Data pipeline is the actual "pipes". Kinesis is one way data can flow between applications, storage, and databases. By using Kinesis, we can ensure our data arrives safely at its intended destination. In this learning activity, we will work through several ways to implement and use Kinesis and Kinesis Firehose so data gets where it needs to go.



Introduction

In this hands-on lab, we work through several ways to implement and use Kinesis and Kinesis Firehose so data gets where it needs to go.

Solution

Log in to the live AWS environment using the credentials provided. Make sure you're in the N. Virginia (us-east-1) region throughout the lab.

Open a terminal window and log in to the server via SSH using the credentials provided on the lab page.

Using boto3 to Write to a Firehose Stream

Create Delivery Stream

  1. In the AWS console, navigate to Kinesis.
  2. Click Get started.
  3. Click Create delivery stream.
  4. For Delivery stream name, enter linuxacademy-courses.
  5. In the Choose source section, choose Direct PUT or other sources.
  6. Click Next.
  7. Leave the settings on the Process records page as-is, and click Next.
  8. On the Choose destination page, in the S3 destination section, click Create new.
  9. In the Create S3 bucket dialog, give it a unique name (e.g., "la-courses-" with a series of random numbers at the end).
  10. Click Create S3 bucket.
  11. Click Next.
  12. On the Configure settings page, set the following values:
    • S3 buffer conditions
      • Buffer size: 1 MB
      • Buffer interval: 60 seconds
    • S3 compression and encryption
      • Leave settings as-is
    • Error logging
      • Error logging: Disabled
    • IAM role
      • Click Create new or choose.
      • In the Role Summary section, set:
        • IAM Role: Select the listed role
        • Policy Name: FirehoseDeliveryRole
      • Click Allow.
  13. Click Next.
  14. Click Create delivery stream.
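
For reference, the same delivery stream can be created programmatically. Below is a minimal boto3 sketch, assuming placeholder bucket and IAM role ARNs stand in for the ones created in the console steps above:

    import boto3

    firehose = boto3.client('firehose', region_name='us-east-1')

    # Placeholder ARNs -- substitute the bucket and role you actually created.
    bucket_arn = 'arn:aws:s3:::la-courses-123456'
    role_arn = 'arn:aws:iam::123456789012:role/firehose_delivery_role'

    firehose.create_delivery_stream(
        DeliveryStreamName='linuxacademy-courses',
        DeliveryStreamType='DirectPut',
        ExtendedS3DestinationConfiguration={
            'RoleARN': role_arn,
            'BucketARN': bucket_arn,
            'BufferingHints': {
                'SizeInMBs': 1,          # matches the 1 MB buffer size above
                'IntervalInSeconds': 60  # matches the 60-second buffer interval
            }
        }
    )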

Write Data to the Stream and Send to S3 Bucket

  1. In the terminal, run the provided script (a sketch of a similar script follows this list):

    python write-to-kinesis-firehose.py
  2. In the AWS console, open S3 in a new browser tab.

  3. Click the bucket we just created to open it.

  4. After a minute or so, refresh the view; the bucket should populate with data.

  5. In the terminal, press Ctrl+C to stop the script.

  6. Run the following command:

    aws s3 cp --recursive s3://<BUCKET NAME>/ .
  7. Remove the downloaded 2018 folder (Firehose writes objects under a year/month/day/hour prefix, hence the folder name):

    rm -rf 2018
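
The write-to-kinesis-firehose.py script is provided on the lab server. A minimal sketch of a similar script, assuming hypothetical sample records and the linuxacademy-courses stream name, could use boto3's put_record like this:

    import json
    import time

    import boto3

    firehose = boto3.client('firehose', region_name='us-east-1')

    # Hypothetical sample records -- the lab script generates its own data.
    courses = [
        {'course': 'Big Data Essentials', 'category': 'AWS'},
        {'course': 'Kinesis Deep Dive', 'category': 'AWS'},
    ]

    while True:
        for course in courses:
            # Each record must be a byte string; a trailing newline keeps
            # records separated once Firehose concatenates them in S3.
            firehose.put_record(
                DeliveryStreamName='linuxacademy-courses',
                Record={'Data': (json.dumps(course) + '\n').encode('utf-8')}
            )
        time.sleep(1)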

Using the Kinesis Agent

Create Another Delivery Stream

  1. In the AWS console, on the Firehose delivery streams page, click Create delivery stream.
  2. For Delivery stream name, enter firehose-1.
  3. In the Choose source section, choose Direct PUT or other sources.
  4. Click Next.
  5. Leave the settings on the Process records page as-is, and click Next.
  6. On the Choose destination page, in the S3 destination section, click Create new.
  7. In the Create S3 bucket dialog, give it a unique name (e.g., "apachelog" with a series of random numbers at the end).
  8. Click Create S3 bucket.
  9. Click Next.
  10. On the Configure settings page, set the following values:
    • S3 buffer conditions
      • Buffer size: 1 MB
      • Buffer interval: 60 seconds
    • S3 compression and encryption
      • Leave settings as-is
    • Error logging
      • Error logging: Disabled
    • IAM role
      • Click Create new or choose.
      • In the Role Summary section, set:
        • IAM Role: Select the listed role
        • Policy Name: FirehoseDeliveryRole
      • Click Allow.
  11. Click Next.
  12. Click Create delivery stream.

Install Kinesis Agent

  1. In the terminal window, run:

    sudo yum -y install aws-kinesis-agent

Copy File to Directory

  1. Copy the provided agent configuration into place (a sample configuration is sketched after these steps):

    sudo cp firehose-agent.json /etc/aws-kinesis/agent.json
  2. Start the Kinesis Agent:

    sudo service aws-kinesis-agent start
  3. In the browser, refresh the S3 console to see the new bucket listed; after a minute or so, it should contain data.

  4. In the terminal, run:

    aws s3 cp --recursive s3://<NEW BUCKET NAME>/ .
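
The firehose-agent.json file is provided with the lab; placing it at /etc/aws-kinesis/agent.json tells the Kinesis Agent which files to tail and which delivery stream to write to. A minimal sketch of such a configuration, assuming an Apache access log path (the lab's actual paths may differ), looks like this:

    {
      "firehose.endpoint": "firehose.us-east-1.amazonaws.com",
      "flows": [
        {
          "filePattern": "/var/log/httpd/access_log*",
          "deliveryStream": "firehose-1"
        }
      ]
    }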

Create a New Delivery Stream

  1. In the AWS console, on the Firehose delivery streams page, click Create delivery stream.
  2. For Delivery stream name, enter firehose-2.
  3. In the Choose source section, choose Direct PUT or other sources.
  4. Click Next.
  5. Leave the settings on the Process records page as-is, and click Next.
  6. On the Choose destination page, in the S3 destination section, click Create new.
  7. In the Create S3 bucket dialog, give it a unique name (e.g., "apachelogjson" with a series of random numbers at the end).
  8. Click Create S3 bucket.
  9. Click Next.
  10. On the Configure settings page, set the following values:
    • S3 buffer conditions
      • Buffer size: 1 MB
      • Buffer interval: 60 seconds
    • S3 compression and encryption
      • Leave settings as-is
    • Error logging
      • Error logging: Disabled
    • IAM role
      • Click Create new or choose.
      • In the Role Summary section, set:
        • IAM Role: Select the listed role
        • Policy Name: FirehoseDeliveryRole
      • Click Allow.
  11. Click Next.
  12. Click Create delivery stream.

Copy File

  1. In the terminal window, copy the transform configuration into place (a sketch of a similar configuration follows this list):

    sudo cp firehose-transform-agent.json /etc/aws-kinesis/agent.json
  2. Restart the Kinesis Agent:

    sudo service aws-kinesis-agent restart
  3. In the AWS console, refresh S3 to see the new bucket added.

  4. In the terminal window, remove the 2018 folder:

    rm -rf 2018
  5. Run the following:

    aws s3 cp --recursive s3://<NEWEST BUCKET NAME>/ .
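
The firehose-transform-agent.json configuration differs from the previous one by converting each Apache log line to JSON before delivery, using the agent's built-in LOGTOJSON processing option. A sketch, again assuming a hypothetical log path:

    {
      "firehose.endpoint": "firehose.us-east-1.amazonaws.com",
      "flows": [
        {
          "filePattern": "/var/log/httpd/access_log*",
          "deliveryStream": "firehose-2",
          "dataProcessingOptions": [
            {
              "optionName": "LOGTOJSON",
              "logFormat": "COMMONAPACHELOG"
            }
          ]
        }
      ]
    }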

Reading from a Kinesis Stream

  1. In the AWS console, navigate to Kinesis > Data Streams.

  2. Click Create Kinesis stream.

  3. For Kinesis stream name, enter kinesis-1.

  4. For Number of shards, enter 1.

  5. Click Create Kinesis stream.

  6. In the terminal window, run:

    sudo cp firehose-and-streams-agent.json /etc/aws-kinesis/agent.json
  7. Restart the Kinesis Agent:

    sudo service aws-kinesis-agent restart
  8. Read the Kinesis stream (a sketch of a similar consumer script follows):

    python read-kinesis-stream.py
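
The read-kinesis-stream.py script is provided in the lab environment. A minimal sketch of a similar consumer, assuming the single-shard kinesis-1 stream created above, polls the shard with get_records:

    import time

    import boto3

    kinesis = boto3.client('kinesis', region_name='us-east-1')

    # Single-shard stream, so we only need the first shard ID.
    stream = kinesis.describe_stream(StreamName='kinesis-1')
    shard_id = stream['StreamDescription']['Shards'][0]['ShardId']

    # Start reading at the newest records.
    shard_iterator = kinesis.get_shard_iterator(
        StreamName='kinesis-1',
        ShardId=shard_id,
        ShardIteratorType='LATEST'
    )['ShardIterator']

    while True:
        response = kinesis.get_records(ShardIterator=shard_iterator, Limit=100)
        for record in response['Records']:
            print(record['Data'].decode('utf-8'))
        shard_iterator = response['NextShardIterator']
        time.sleep(1)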

Conclusion

Congratulations on completing this hands-on lab!