Exploring Kafka

Message brokers such as Kafka are a very popular back-end technology that testers may encounter, especially in a microservices-based product architecture.

Message brokers and message queues differ from API-based request and response services in that a component subscribes to messages that can arrive at any time, rather than specifically issuing a GET for some data.
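
To make that contrast concrete, here is a minimal sketch of a subscribing consumer. It assumes the kafka-python client library (pip install kafka-python) and a broker reachable as kafka:9092 – the hands-on examples below use only the command line tools:

# Minimal sketch - assumes kafka-python and a broker at kafka:9092
from kafka import KafkaConsumer

consumer = KafkaConsumer('simple-topic', bootstrap_servers='kafka:9092')

# unlike a GET request, this loop blocks and yields messages
# whenever the broker happens to deliver them
for message in consumer:
    print(message.key, message.value)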

There is a good introduction to Kafka and the concepts of event streaming here: http://kafka.apache.org/documentation/#introduction

Let’s start exploring. Firstly, you’ll need to install Docker, which is a container virtualization platform. Once installed, clone the repository from https://github.com/ObjectiveTester/AllThingsTesting.com-examples.git and open a terminal in the directory containing docker-compose.yml:

AllThingsTesting.com-examples/Kafka

Then type the following commands:

docker-compose up -d
docker-compose exec kafka bash

This downloads, installs and starts the necessary Kafka components, then opens a terminal session on the Kafka broker so we can issue commands to it.

In a second terminal, run this command (as before, in the same directory as docker-compose.yml):

docker-compose exec kafka bash

Produce and consume

In the first terminal, create a message topic with:

kafka-topics --create --topic simple-topic \
--bootstrap-server kafka:9092 \
--replication-factor 1 --partitions 1

Then run a command that accepts input to create messages:

kafka-console-producer --topic simple-topic \
--broker-list kafka:9092 \
--property parse.key=true --property key.separator=":"

In the other terminal, run a command to read a specific number of messages:

kafka-console-consumer --topic simple-topic \
--bootstrap-server kafka:9092 \
--consumer-property group.id=test-group \
--property print.key=true --property key.separator=":" \
--max-messages 2

In the producer terminal, type in line by line (or copy and paste in one block):

k1:v1
k2:v2
k3:v3
k4:v4
k5:v5
k6:v6
k7:v7
k8:v8

This will create 8 key/value pairs, and the consumer process will show the first 2 before exiting.
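
The consumer output should look roughly like this (the exact summary line may vary between Kafka versions):

k1:v1
k2:v2
Processed a total of 2 messages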

So far so good. Re-run the consumer to read the next 2 messages (3 & 4).

Now, read all the remaining messages with:

kafka-console-consumer --topic simple-topic \
--bootstrap-server kafka:9092 \
--consumer-property group.id=test-group \
--property print.key=true --property key.separator=":"

Reading from the start

Next, let’s re-read all the data. In the ‘consumer’ terminal, terminate the command with Ctrl-C and then type:

kafka-console-consumer --topic simple-topic \
--bootstrap-server kafka:9092 \
--property print.key=true --property key.separator=":" \
--from-beginning

This will re-display all the messages in the topic. Now, enter another key and value in the ‘producer’ terminal. That will also be displayed in the ‘consumer’ terminal.

Moving offsets

An offset is a record, kept per consumer group, of how far the group’s consumers have read so far. Multiple instances of a service may be consuming messages, those services may be interrupted, and the application using the messages may not be capable of dealing with repeated messages. ‘Rewinding’ the offset can sometimes be useful or necessary.
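
Before doing this with the CLI, here is a sketch of what ‘rewinding’ looks like programmatically, again assuming the kafka-python client:

# Minimal sketch - assumes kafka-python and a broker at kafka:9092
from kafka import KafkaConsumer, TopicPartition

consumer = KafkaConsumer(bootstrap_servers='kafka:9092',
                         group_id='test-group')
partition = TopicPartition('simple-topic', 0)

# take manual control of the partition assignment, then rewind
consumer.assign([partition])
consumer.seek(partition, 4)  # the next read starts at offset 4

for message in consumer:
    print(message.key, message.value)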

In the ‘consumer’ terminal, terminate the running command with Ctrl-C and then set the offset for a consumer group with:

kafka-consumer-groups --bootstrap-server kafka:9092 \
--group test-group --topic simple-topic \
--reset-offsets --to-offset 4 --execute

Then restart the consumer process, specifying the same consumer group:

kafka-console-consumer --topic simple-topic \
--bootstrap-server kafka:9092 \
--consumer-property group.id=test-group \
--property print.key=true --property key.separator=":"

The consumer will re-read all messages from the specified offset, starting with k5:v5. If you enter another value in the ‘producer’ terminal, that will also be displayed by the consumer.

To read messages from a given offset on a specific partition, you can also run:

kafka-console-consumer --topic simple-topic \
--bootstrap-server kafka:9092 \
--property print.key=true --property key.separator=":" \
--partition 0 --offset 4

The documentation discusses message delivery semantics, which will be an important test consideration depending on how the application using Kafka has been designed. There is more detail available here: https://kafka.apache.org/documentation/#semantics
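
For example, a consumer that commits its offset only after successfully processing each message gives at-least-once delivery: if it crashes after processing but before committing, the message is re-read on restart – exactly the kind of repeated-message behaviour mentioned earlier. A sketch, again assuming kafka-python, where process() stands in for real message handling:

# Sketch of at-least-once consumption - assumes kafka-python
from kafka import KafkaConsumer

def process(message):
    # hypothetical handler - stands in for real message processing
    print(message.key, message.value)

consumer = KafkaConsumer('simple-topic',
                         bootstrap_servers='kafka:9092',
                         group_id='test-group',
                         enable_auto_commit=False)

for message in consumer:
    process(message)   # if this raises, the offset below is never
    consumer.commit()  # committed and the message is re-read
                       # after a restart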

Partitions

Terminate the processes in the two terminals with Ctrl-C. In one of the terminals, create a new topic with 2 partitions:

kafka-topics --create --topic multi-partition \
--bootstrap-server kafka:9092 \
--replication-factor 1 --partitions 2

Then run a command that accepts input to create messages:

kafka-console-producer --topic multi-partition \
--broker-list kafka:9092 \
--property parse.key=true --property key.separator=":"

Then start a 3rd terminal and run the docker command to start a shell on the Kafka broker (as before, in the same directory as docker-compose.yml):

docker-compose exec kafka bash

In the 2nd terminal:

kafka-console-consumer --topic multi-partition \
--bootstrap-server kafka:9092 \
--property print.key=true --property key.separator=":" \
--partition 0

In the 3rd terminal:

kafka-console-consumer --topic multi-partition \
--bootstrap-server kafka:9092 \
--property print.key=true --property key.separator=":" \
--partition 1

Then copy and paste key value pairs into the producer terminal as before.

The messages will be written to both partitions of the topic. By default, the partition written to is determined by a hash of the key, but this can be configured to be round-robin, or the producer can specify a particular partition (although not with the command line tool kafka-console-producer).
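
Programmatic producers make this choice explicit. A sketch, again assuming the kafka-python client (the Java client offers equivalent options):

# Sketch - assumes kafka-python and a broker at kafka:9092
from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers='kafka:9092')

# default: the partition is chosen by hashing the key
producer.send('multi-partition', key=b'k1', value=b'v1')

# explicit: the producer names the partition itself
producer.send('multi-partition', partition=1, key=b'k2', value=b'v2')

producer.flush()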

Other commands

Some other commands to try out. Generate some test load against a topic:

kafka-producer-perf-test --topic multi-partition \
--producer-props bootstrap.servers=kafka:9092 \
--throughput 10000 --record-size 10 --num-records 100

List, describe and delete topics:

kafka-topics --bootstrap-server kafka:9092 --list

kafka-topics --bootstrap-server kafka:9092 --describe \
--topic <topic>

kafka-topics --bootstrap-server kafka:9092 --delete \
--topic <topic>

Replication & Retention

The example Kafka service is purely for experimental purposes – a real-world implementation will be designed with fault tolerance and longevity of data in mind, and these design decisions will have implications for the test approach.

The documentation covers these subjects here:
https://kafka.apache.org/documentation/#replication

https://kafka.apache.org/documentation/#compaction

Next steps

This has been a very brief look at Kafka, with some hands-on examples to explain the basic concepts. How organisations use Kafka in production varies widely, so if you are working on a project involving Kafka, speak to a developer or architect for more information on its configuration and how you can add value by testing the message broker.