Streaming Data Using Kafka Streams to Count Words

Hands-On Lab

 

Chad Crowell

DevOps Training Architect II in Content

Length: 01:00:00

Difficulty: Intermediate

Kafka Streams is a library for per-event processing of records. You can use it to process data as soon as it arrives rather than waiting for a batch to accumulate. In this hands-on lab, we use Kafka Streams to read plain-text input and process it in real time, counting how many times each word appears in the input stream.

Introduction

In this hands-on lab, we use the WordCount demo application that ships with the Kafka binaries. The application is already built, so we won't write it from scratch. We create an input topic and an output topic, run the WordCount streams application to count how often each word occurs in the input stream, and view the results with the Kafka console consumer. To do this, we pass the appropriate properties to the console consumer so that it deserializes and formats the records for display in the console. When the console shows a count for each word, we've successfully completed this lab.
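The deserializer properties matter because the counts are written to the output topic as 64-bit longs, which Kafka's LongSerializer encodes as 8 big-endian bytes rather than as readable text. The sketch below only illustrates that byte layout; `long_to_wire_hex` is a hypothetical helper written for this example and is not part of Kafka.

```shell
# Hypothetical helper (not part of Kafka): print the 8 big-endian bytes
# that a Long value such as a word count occupies in a record.
long_to_wire_hex() {
  # Format as 16 hex digits, then put a space between each byte.
  printf '%016x\n' "$1" | sed 's/../& /g; s/ $//'
}

# A count of 3 is stored as seven zero bytes followed by 03.
long_to_wire_hex 3
```

Without a Long deserializer, the console consumer would treat those bytes as text and print unreadable characters instead of the number.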

Solution

  1. Begin by logging in to the lab server using the credentials provided on the hands-on lab page.

    ssh cloud_user@PUBLIC_IP_ADDRESS

Set Up the Environment

  1. Change to the content-kafka-deep-dive directory.

    cd content-kafka-deep-dive/
  2. Build the Kafka cluster.

    docker-compose up -d --build
  3. Return to your home directory and extract the Kafka archive.

    cd
    
    tar -xvf kafka_2.12-2.2.0.tgz
  4. Rename the new directory.

    mv kafka_2.12-2.2.0 kafka
  5. Install Java.

    sudo apt install default-jdk
  6. Verify the version.

    java -version

Write to a Kafka Stream and Open a Kafka Console Consumer

  1. Change to the kafka directory.

    cd kafka/
  2. Create an input topic.

    bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic streams-plaintext-input
  3. Create an output topic.

    bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic streams-wordcount-output
  4. Open a Kafka console producer.

    bin/kafka-console-producer.sh --broker-list localhost:9092 --topic streams-plaintext-input
  5. Enter the following messages.

    kafka streams is great
    kafka processes messages in real time
    kafka helps real information streams
  6. Use Ctrl-C to exit the producer.

  7. Open a Kafka console consumer.

    bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
        --topic streams-wordcount-output \
        --from-beginning \
        --formatter kafka.tools.DefaultMessageFormatter \
        --property print.key=true \
        --property print.value=true \
        --property key.deserializer=org.apache.kafka.common.serialization.StringDeserializer \
        --property value.deserializer=org.apache.kafka.common.serialization.LongDeserializer
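As an alternative to typing the messages interactively in steps 4 through 6, the console producer can read them from standard input. The following is a minimal sketch, assuming it is run from the kafka directory. The file path and the `INPUT_FILE` variable are assumptions made for this example, and the existence check is there only so the script does nothing harmful when run elsewhere.

```shell
# Assumed file location for this example; any writable path works.
INPUT_FILE="${INPUT_FILE:-/tmp/wordcount-input.txt}"

# Write the three lab messages, one record per line.
cat > "$INPUT_FILE" <<'EOF'
kafka streams is great
kafka processes messages in real time
kafka helps real information streams
EOF

# Send the file to the input topic when the producer script is available;
# otherwise only report that the file was written.
if [ -x bin/kafka-console-producer.sh ]; then
  bin/kafka-console-producer.sh --broker-list localhost:9092 \
    --topic streams-plaintext-input < "$INPUT_FILE"
else
  echo "bin/kafka-console-producer.sh not found; wrote $INPUT_FILE only"
fi
```

The producer exits on its own when it reaches the end of the file, so no Ctrl-C is needed in this variant.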

Run the WordCountDemo on the Kafka Stream

  1. Open a new terminal to the lab server.

    ssh cloud_user@PUBLIC_IP_ADDRESS
  2. Change to the kafka directory and run the WordCount application.

    cd kafka
    
    bin/kafka-run-class.sh org.apache.kafka.streams.examples.wordcount.WordCountDemo
  3. Switch back to the terminal with the open Kafka consumer and verify the word counts.
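To know what to look for when verifying, the expected final counts can be computed locally from the three input messages. This is a sketch that assumes words are lowercased and split on whitespace; `word_counts` is a hypothetical helper and does not involve Kafka at all. The console consumer may show the words in a different order, and it may print intermediate counts for a word before its final value.

```shell
# Hypothetical helper: lowercase the input, split it into one word per
# line, and print each distinct word with its count, separated by a tab.
word_counts() {
  tr '[:upper:]' '[:lower:]' \
    | tr -s '[:space:]' '\n' \
    | grep -v '^$' \
    | LC_ALL=C sort \
    | uniq -c \
    | awk '{ printf "%s\t%s\n", $2, $1 }'
}

# Count the words in the three messages entered in the lab.
printf '%s\n' \
  'kafka streams is great' \
  'kafka processes messages in real time' \
  'kafka helps real information streams' \
  | word_counts
```

For this input, kafka should end at 3, real and streams at 2, and each of the remaining eight words at 1.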

Conclusion

Congratulations! You've completed this hands-on lab.