What is Apache Kafka?
Apache Kafka can be thought of as a message broker. It has the following characteristics:
- allows sending messages between two parties
- allows one-to-one (peer to peer, queue) or one-to-many (broadcast, topic) message delivery
- persists messages
What ideas are behind Kafka, and how does it differ from a classical broker? In this series of posts you’ll find out how Apache Kafka works and learn to run and use a Kafka cluster.
Common Messaging Patterns
Existing Message Oriented Middleware (MOM), such as Microsoft MSMQ, Amazon SQS, or JMS implementations like Apache ActiveMQ, JBoss Messaging and HornetQ, widely uses two main message passing patterns:
- message queue
- publish – subscribe
Message Queue
In this pattern a producer sends a message to a queue, and each message is received by only one recipient (consumer). In the JMS standard, message ordering is not guaranteed. Message queues usually guarantee that a message is processed only once.
A common use case is load balancing the work required to process messages, which can be achieved by having multiple consumers read from the same queue. If message ordering is not required, this is a perfect fit. Another use case is allowing producers to continue their work even when the recipients of the data are unavailable, for example due to a system failure. This is why queues usually offer persistent storage of messages.
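The load balancing behaviour can be illustrated with a minimal in-memory sketch. This is a toy model, not a real broker or the JMS API; the `MessageQueue` class, the `drain` helper and the consumer names are hypothetical and exist only for this example.

```python
from collections import deque
from itertools import cycle


class MessageQueue:
    """Toy in-memory point-to-point queue (hypothetical, not a real broker)."""

    def __init__(self):
        self._messages = deque()

    def send(self, message):
        self._messages.append(message)

    def receive(self):
        # Removing the message is what makes delivery exclusive:
        # once one consumer takes it, no other consumer can see it.
        return self._messages.popleft() if self._messages else None


def drain(queue, consumer_names):
    """Let consumers take turns pulling messages until the queue is empty."""
    received = {name: [] for name in consumer_names}
    for name in cycle(consumer_names):
        message = queue.receive()
        if message is None:
            break
        received[name].append(message)
    return received


queue = MessageQueue()
for i in range(6):
    queue.send(f"job-{i}")

# Two competing consumers split the six jobs between them.
work = drain(queue, ["B", "C"])
print(work)
```

Each job ends up with exactly one of the two consumers, which is the property that makes a queue suitable for spreading work.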
Publish – Subscribe
This pattern allows a publisher to broadcast a message to many subscribers using a topic. Message producers publish to a topic, and recipients subscribe to it. Only subscribers that are active at the moment a message is sent, or durable subscribers, receive a copy of the message.
This pattern can be used to propagate changes to all components that need to know about them. A new hire event, for example, can be sent to the applications used by different departments to set up the employee’s accounts and prepare equipment for their first day of work.
Since existing products already offer patterns that cover most cases, why would we bother using Kafka? It turns out that Kafka combines the two basic approaches into one scalable solution, allowing its users to solve challenges that appear in growing and evolving systems.
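A minimal sketch of the fan-out behaviour, again as a toy in-memory model rather than a real broker. The `Topic` class and the subscriber names are hypothetical, and durable subscriptions are deliberately left out, so a subscriber only sees messages published after it subscribed.

```python
class Topic:
    """Toy in-memory topic (hypothetical, non-durable subscribers only)."""

    def __init__(self):
        self._subscribers = {}

    def subscribe(self, name):
        # Each subscriber gets its own inbox.
        inbox = []
        self._subscribers[name] = inbox
        return inbox

    def publish(self, message):
        # Every subscriber registered at publish time gets its own copy.
        for inbox in self._subscribers.values():
            inbox.append(message)


topic = Topic()
hr_inbox = topic.subscribe("hr")
topic.publish("new-hire: Alice")   # only "hr" is subscribed at this point
it_inbox = topic.subscribe("it")
topic.publish("new-hire: Bob")     # both subscribers receive this one

print(hr_inbox)
print(it_inbox)
```

The late subscriber misses the first event, which mirrors the rule that only active (or durable) subscribers receive a copy.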
Message Queue Limitation
Let’s consider the following scenario using a message queue: service A writes to a queue, and two consumers (services B and C) are used to load balance the processing of messages. Then another type of consumer comes into play (service D, which could be an audit service) that is required to receive the same data as B and C. With message queues it is not possible to add that third service to the same queue, since it would take only some of the messages, and those messages would no longer be received by services B and C.
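The problem can be seen in a short simulation. It is a sketch under simplifying assumptions: the queue is a plain `deque`, consumers take strict turns, and the service names and message names are made up for illustration.

```python
from collections import deque
from itertools import cycle

queue = deque(f"order-{i}" for i in range(6))
consumers = ["B", "C", "D"]  # D is the hypothetical audit service
received = {name: [] for name in consumers}

# Consumers take turns; popleft() removes the message for everyone else.
for name in cycle(consumers):
    if not queue:
        break
    received[name].append(queue.popleft())

# Messages that the two original workers still get to process.
workers_saw = received["B"] + received["C"]

print(received["D"])   # the audit service sees only part of the stream
print(workers_saw)     # and the workers lose exactly those messages
```

The audit service gets an incomplete view, and the messages it takes never reach B or C.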
Publish – Subscribe Limitation
In a different scenario, service A publishes messages to a topic. That topic has two subscribers: services B and C. What happens if the volume of messages is very high and I would like to load balance the work by adding services B’ and C’? That isn’t possible in a publish – subscribe pattern, since every subscriber receives every message. Instead of sharing the load, each new service would simply process all of the messages again.
Kafka mitigates those limitations using a mechanism common to both cases: the concept of a topic. However, it is a topic on steroids and the main idea of Apache Kafka, which will be described in the next part of the series.
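A small sketch of why adding a subscriber does not help. It assumes a toy model in which a topic is just a dictionary of inboxes; the names `B`, `C` and `B-prime` are placeholders for the services in the scenario.

```python
# Hypothetical subscribers: B-prime was added hoping to share B's load.
subscribers = {"B": [], "C": [], "B-prime": []}
messages = [f"event-{i}" for i in range(4)]

for message in messages:
    # A topic copies each message to every subscriber.
    for inbox in subscribers.values():
        inbox.append(message)

# B and B-prime end up with identical, complete inboxes.
print(subscribers["B"] == subscribers["B-prime"])
```

Both services handle the full stream, so the work is duplicated rather than divided.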