Back up Messages to an S3 Bucket in Kafka

Hands-On Lab

 

Chad Crowell

DevOps Training Architect II in Content

Length: 01:45:00

Difficulty: Intermediate


Back up Messages to an S3 Bucket in Kafka

The Scenario

Kafka is known for replicating data among brokers to account for broker failure. Sometimes this isn't enough, or perhaps you are sharing this data with a third-party application. Backing up your messages to an alternate source can save a lot of time and energy when you're trying to configure access to a cluster, or when delivering that data quickly across the world.

We're going to run through how to make that happen by setting Kafka up so that it writes out to an AWS S3 bucket.

Logging In

There are a few different passwords and keys we need to worry about in this lab. As we go through, just note which ones we need for the operation we're performing at the moment.

Connect and Start the Kafka Cluster

In the Bastion host, start a container and open a shell to that container:

sudo docker run -ti --rm --name kafka-cluster --network host confluentinc/docker-demo-base:3.3.0

Notice that we've miraculously become root. Let's get into the /tmp directory:

cd /tmp

Now we can start up the Kafka cluster:

confluent start

Once that starts successfully, we can move on.
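
If you want to double-check that everything came up before moving on, the same confluent CLI bundled in this image has a status subcommand. This is an optional sanity check, not a required lab step:

confluent status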

Create a New S3 Bucket

Install the awscli tool (once we've updated our system):

apt update
apt install -y awscli

Configure the AWS CLI with our credentials. Note that the cloud_user Access Key and Secret Access Key are sitting back on the hands-on lab page:

aws configure

AWS Access Key ID: [ACCESS_KEY]
AWS Secret Access Key: [SECRET_ACCESS_KEY]
Default region name: us-east-1
Default output format: [None]
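
If you'd rather skip the interactive prompts, the standard AWS CLI environment variables do the same job (this is just an alternative, not a lab requirement):

export AWS_ACCESS_KEY_ID=[ACCESS_KEY]
export AWS_SECRET_ACCESS_KEY=[SECRET_ACCESS_KEY]
export AWS_DEFAULT_REGION=us-east-1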

Create a new bucket in the us-east-1 region. Make sure the bucket name is globally unique:

aws s3api create-bucket --region us-east-1 --bucket [UNIQUE_BUCKET_NAME]
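
To confirm the bucket actually exists before we wire it into the connector, a quick listing of our buckets works (optional check):

aws s3 ls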

Add the new bucket name to the configuration file for the S3 connector. First, install a text editor:

apt install -y vim

Then edit the properties file:

vim /etc/kafka-connect-s3/quickstart-s3.properties

Change the s3.region and s3.bucket.name lines to the following:

s3.region=us-east-1
s3.bucket.name=[UNIQUE_BUCKET_NAME]
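
The rest of the file can stay as-is. For reference, the stock quickstart-s3.properties looks roughly like the following (treat this as a sketch, since exact defaults can differ by Connect version). Note that flush.size=3 means the connector writes one S3 object for every three records, which is why we'll produce 9 messages:

name=s3-sink
connector.class=io.confluent.connect.s3.S3SinkConnector
tasks.max=1
topics=s3_topic
s3.region=us-east-1
s3.bucket.name=[UNIQUE_BUCKET_NAME]
s3.part.size=5242880
flush.size=3
storage.class=io.confluent.connect.s3.storage.S3Storage
format.class=io.confluent.connect.s3.format.avro.AvroFormat
schema.generator.class=io.confluent.connect.storage.hive.schema.DefaultSchemaGenerator
partitioner.class=io.confluent.connect.storage.partitioner.DefaultPartitioner
schema.compatibility=NONE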

Start a Producer to a New Topic Named s3_topic and Write at Least 9 Messages

Now let's open an Avro console producer to the topic, and include a schema:

kafka-avro-console-producer --broker-list localhost:9092 --topic s3_topic --property value.schema='{"type":"record","name":"myrecord","fields":[{"name":"f1","type":"string"}]}'

Type the 9 messages following the defined schema:

{"f1": "value1"}
{"f1": "value2"}
{"f1": "value3"}
{"f1": "value4"}
{"f1": "value5"}
{"f1": "value6"}
{"f1": "value7"}
{"f1": "value8"}
{"f1": "value9"}

Press Ctrl + C to exit the session.
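
If you'd like to confirm the messages landed in the topic before wiring up the connector, you can read them back with the Avro console consumer (an optional check, assuming the Schema Registry is on its default port, 8081):

kafka-avro-console-consumer --bootstrap-server localhost:9092 --topic s3_topic --from-beginning --max-messages 9 --property schema.registry.url=http://localhost:8081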

Start the Connector and Verify the Messages Are in the S3 Bucket

We can load the S3 sink connector, which starts it with the configuration file we just edited:

confluent load s3-sink
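
If you're curious whether the connector actually started, the Kafka Connect REST API (listening on port 8083 by default) reports its status. This check is optional:

curl -s localhost:8083/connectors/s3-sink/status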

We'll get some JSON output, with our bucket name in there somewhere. Copy that name, and use it in the command to list its objects:

aws s3api list-objects --bucket [OUR_BUCKET_NAME]
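
If the JSON is hard to scan, a recursive listing gives a friendlier view. With the stock connector settings, keys typically land under a topics/s3_topic/partition=0/ prefix, and with flush.size=3 our 9 messages should have produced 3 objects (exact key names depend on the partitioner and offsets):

aws s3 ls s3://[OUR_BUCKET_NAME] --recursive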

Conclusion

The JSON output here shows that we have managed to get objects from our Kafka cluster stored in an AWS S3 bucket. This is exactly what we were aiming to do. Congratulations!