Skip to main content

Joining Datasets with KSQL

Hands-On Lab

 

Photo of Will Boyd

Will Boyd

DevOps Team Lead in Content

Length

00:30:00

Difficulty

Intermediate

KSQL provides a SQL-like interface for most of the operations you perform using Kafka Streams. Like Kafka Streams, KSQL is capable of joining multiple streams into a single dataset. In this lab, we will work with joins in KSQL by writing a persistent streaming query that joins two streams. This will give you some hands-on experience with joins in KSQL.

What are Hands-On Labs?

Hands-On Labs are scenario-based learning environments where learners can practice without consequences. Don't compromise a system or waste money on expensive downloads. Practice real-world skills without the real-world risk, no assembly required.

Joining Datasets with KSQL

Introduction

KSQL provides a SQL-like interface for most of the operations you perform using Kafka Streams. Like Kafka Streams, KSQL is capable of joining multiple streams into a single data set. In this lab, we will work with joins in KSQL by writing a persistent streaming query that joins two streams. This will give you some hands-on experience with joins in KSQL.

Solution

Log in to the lab server using the credentials provided on the hands-on lab page:

ssh cloud_user@PUBLIC_IP_ADDRESS

Create Streams for Both Input Topics

  1. Start a KSQL session:

    sudo ksql
  2. Set auto.offset.reset to earliest so that all streams will process the existing test data:

    SET 'auto.offset.reset' = 'earliest';
  3. View the data in the member_signups topic:

    PRINT 'member_signups' FROM BEGINNING;
  4. Create a stream for the member_signups topic:

    CREATE STREAM member_signups
      (lastname VARCHAR,
        firstname VARCHAR)
      WITH (KAFKA_TOPIC='member_signups',
        VALUE_FORMAT='DELIMITED');
  5. View the data in the member_contact topic:

    PRINT 'member_contact' FROM BEGINNING;

    You may need to hit Ctrl+C to stop the process before the next step.

  6. Create a stream for the member_contact topic:

    CREATE STREAM member_contact
      (email VARCHAR)
      WITH (KAFKA_TOPIC='member_contact',
        VALUE_FORMAT='DELIMITED');

Create a Persistent Streaming Query to Join the Two Streams and Output the Result

  1. Create a persistent streaming query to join the two streams:

    CREATE STREAM member_email_list AS
      SELECT member_signups.firstname, member_signups.lastname, member_contact.email
      FROM member_signups
      INNER JOIN member_contact WITHIN 365 DAYS ON member_signups.rowkey = member_contact.rowkey;
  2. Check the output topic to verify the correct data is present:

    PRINT 'MEMBER_EMAIL_LIST' FROM BEGINNING;

Conclusion

Congratulations on successfully completing this hands-on lab!