So far the following ideas have been introduced: topic, message, partition, producer, consumer and broker. By now, you should understand how Kafka stores messages on disk using commit log, topics and partitions. You should also know how a message is structured.
It’s time to introduce consumer groups, which are the missing piece of message distribution in Kafka.
In the previous post I mentioned that there are certain rules, which define how messages are delivered to partitions. Let’s assume the same round robin selection will be used as before.
Hire Event Exercise
In order to illustrate how consumer groups are used, let’s first define an example scenario. I will use the hire topic with two partitions introduced earlier in the series. There will be a single consumer that will be sending messages to the topic. When a person is hired by a fictitious company Building Architects & Co., that person is registered in an HCM Application. On the first day of work, each new employee is required to have their working suit ready and needs to be assigned to their first task based on the current stream of work and their declared experience. Those two events happen in different programmes in the company – let’s call them Equipment Registry and Work Manager. The picture below illustrates required flow:
It’s not hard to guess that Kafka and the hire topic will try to meet those requirements. Producer – the HCM Application – will be sending hire events to the hire topic. Each message will go in a round robin fashion to partitions 0 and 1. In this example John Smith hire event was placed in the partition number 0.
The question is: how do consumers know which messages are for them, which to read and which one they already read? All messages that are sent to the hire topic should be delivered to both applications: Work Manager and Equipment Registry.
Message distribution is managed with a concept of the consumer group:
- consumer group tells the broker what kind of service is connecting to it
- consumer group can be compared to the class in Java, and a consumer that belongs to that group can be thought of as an instance of that class
- each consumer can belong to one group only
- one group can have many consumers
- the maximum number of active consumers in a group is equal to the number of partitions in a topic:
no of partitions >= active consumers in a group
Kafka tracks which messages have been read on two levels simultaneously:
- consumer group level
- partition level
It means that consumers belonging to the same consumer group can not read messages from the same partition. If there are two consumers in a group and a topic consists of two partitions, one consumer will read from partition 0 and another from partition 1. It will be best to illustrate use cases of consumers and consumer groups.
UC1: Topic with a single partition and a single consumer
All messages from partition 0 are received by Consumer A.
UC2: Topic with two partitions and a single consumer
Red message from partition 0 and blue message from partition 1 are received by Consumer A.
UC3: Topic with two partitions and two consumers in the same CG
Consumer A and Consumer B both belong to the same consumer group. Red message from partition 0 is received by Consumer A, blue message from partition 1 goes this time to Consumer B.
UC4: Topic with two partitions, four consumers and two CGs
Consumer A and D belong to the grey group, Consumer B and C to the green group. Each consumer in a group is attached to a single partition. Each message from a single partition goes to exactly one consumer per group: red message is received by A and B, blue message is received by C and D. In other words a group receives all messages from all partitions.
As you can see, the number of consumers in a group has to meet the inequality defined above, i.e. there can be as many active consumers in a single group as there are partitions in a topic.
Solving Hire Event Exercise
Below is an illustration of a solution for the hire event exercise:
In the solution Work Manager belongs to WM Group and Equipment Registry to ER Group. Thanks to this setup, both applications can receive all messages sent by the HCM Application.
Solving problem from Part 1
Overcoming Message Queue Limitation
The challenge with message queue was adding a third service (D), such that all messages were received by that service. As you probably remember, this was not possible with queues, since only some messages would go to D. With Kafka it is possible by:
- using 2 partitions and the same consumer group for B and C to load balance work between those services
- using third service D in another consumer group to read from partitions 0 and 1 so that it receives all messages
Overcoming Publish – Subscribe Limitation
The requirement was to load balance services B and C. With the traditional publish – subscribe pattern, all messages would be received by B, C, B’ and C’, hence the work would be doubled rather than load balanced. In order to achieve that in Kafka, the following setup can be used:
- 2 partitions and 2 consumer groups
- grouping services: B and B’, C and C’ to load balance work between them
I added a data table as a destination to illustrate that services B and B’, C and C’ are delivering the same functionality.
This post completes a set of basic Kafka Ideas, which provide reasoning why one would like to use Kafka. In the future Kafka Ideas you will see how to install, configure and use Kafka. There are also advanced topics which were omitted for simplicity, however are worth analysing to understand Kafka even better.