Apache Kafka Ideas – Part 2

The Topic, the Message and the Partition

Traditional messaging patterns, namely message queue and publish-subscribe, have some limitations as a result of their design.

The previous post, Apache Kafka Ideas – Part 1, introduced a couple of messaging use cases. To model those cases with Kafka, it is important to understand its core ideas. At the very heart of Kafka are topics and partitions. This post explains the basic concepts behind them.

The Topic

A topic is an abstract entity that acts as a mailbox for messages. Each message in Kafka has to be sent to a topic. Kafka uses the following naming conventions:

  • message senders are called Producers
  • message recipients are called Consumers
  • a single Kafka instance is called a Broker
  • producers send data to a broker
  • consumers poll for data from a broker

Kafka message exchange

The Message

The image below illustrates Kafka’s message structure:

Kafka Message

  • message Key is optional
  • message Value is required, it contains the actual payload
  • message Timestamp is also optional; the sender can set it, but the broker's configuration decides whether that value is kept or overridden (for example, replaced with the time the broker appended the message to its log)
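
The three fields above can be modelled in a few lines of Python. This is only an illustrative sketch: the `Message` class and its field names are made up for this post and are not part of any Kafka client library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Message:
    """Toy model of the fields listed above (hypothetical, not a Kafka client type)."""
    value: bytes                        # required: the actual payload
    key: Optional[bytes] = None         # optional
    timestamp_ms: Optional[int] = None  # optional: may be set by the sender


# A message carrying only the required value.
plain = Message(value=b"hello")

# A message with all three fields set by the sender.
full = Message(value=b"hello", key=b"user-42", timestamp_ms=1_500_000_000_000)
```

Leaving `key` and `timestamp_ms` as `None` mirrors the "optional" wording in the list; what the broker does with a missing or supplied timestamp is outside this sketch.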

The Partition

Partitions, unlike topics, have a physical representation in Kafka. Each message is persisted on disk within a partition:

  • partitions are folders that aggregate messages on a disk
  • the partition name, or folder name if you like, follows this pattern:
    [topic_name]-[partition_number]
  • partitions contain log files (*.log)
  • log files are physical containers for messages
  • each topic must have at least one partition
  • partition numbering starts from 0

Kafka partitions

Physical representation of a topic named hire:

Kafka partitions physical

The additional *.index and *.timeindex files help Kafka look up messages that are stored on disk. /tmp/kafka-logs is the logs root defined in the broker's configuration. In the example above, the topic's name is hire. The partitions (folders hire-0 and hire-1) are stored directly under the logs root, so no separate folder is maintained for the topic itself.
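
To make the naming pattern concrete, here is a small sketch that derives the partition folder paths for a topic. The helper `partition_dirs` is hypothetical and only builds strings; it does not touch the file system or talk to a broker.

```python
from pathlib import PurePosixPath


def partition_dirs(logs_root, topic, partition_count):
    """Return the folder paths following the [topic_name]-[partition_number] pattern."""
    if partition_count < 1:
        raise ValueError("a topic must have at least one partition")
    root = PurePosixPath(logs_root)
    # Partition numbering starts from 0.
    return [str(root / f"{topic}-{number}") for number in range(partition_count)]


print(partition_dirs("/tmp/kafka-logs", "hire", 2))
# ['/tmp/kafka-logs/hire-0', '/tmp/kafka-logs/hire-1']
```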

There are certain rules that define how messages are delivered to a particular partition – these will be covered later in the series. For now, let's assume that when a message is sent, it goes to the partition given by the following equation – let's call it round robin selection:

partition = message_no % partition_count
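
As a quick illustration of this simplified rule (not of Kafka's actual partitioning logic), the equation can be written as a function. The name `select_partition` is an assumption made for this example.

```python
def select_partition(message_no, partition_count):
    """Simplified round robin selection: partition = message_no % partition_count."""
    if partition_count < 1:
        raise ValueError("a topic must have at least one partition")
    return message_no % partition_count


# Seven consecutive messages sent to a topic with three partitions.
assignments = [select_partition(message_no, 3) for message_no in range(7)]
print(assignments)  # [0, 1, 2, 0, 1, 2, 0]
```

Messages cycle through partitions 0, 1 and 2 and then start again from 0.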

In Kafka, a partition is a structured commit log. It means that:

  • messages are appended in the order they are received
  • each message has a corresponding position in the log, called the offset
  • offset and order of messages are maintained per partition
  • offset numbering starts from 0

Kafka Offset
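
The properties listed above can be sketched with a toy in-memory model. `PartitionLog` and `Topic` are hypothetical classes written for this post; they combine the append-only log with the simplified round robin rule from earlier and say nothing about how Kafka stores data internally.

```python
class PartitionLog:
    """Toy append-only log: the offset is the position of a message in the list."""

    def __init__(self):
        self._messages = []

    def append(self, value):
        offset = len(self._messages)  # first message gets offset 0
        self._messages.append(value)
        return offset

    def read(self, offset):
        return self._messages[offset]


class Topic:
    """Toy topic that spreads messages over its partitions in round robin order."""

    def __init__(self, name, partition_count):
        self.name = name
        self.partitions = [PartitionLog() for _ in range(partition_count)]
        self._message_no = 0

    def send(self, value):
        partition = self._message_no % len(self.partitions)
        self._message_no += 1
        offset = self.partitions[partition].append(value)
        return partition, offset


topic = Topic("hire", 2)
results = [topic.send(value) for value in ["a", "b", "c", "d", "e"]]
print(results)  # [(0, 0), (1, 0), (0, 1), (1, 1), (0, 2)]
```

Each pair is (partition, offset). Both partitions start counting from 0 independently, which is what "offset and order of messages are maintained per partition" means in practice.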

Offset is never reset! It is a Java long (a 64-bit signed integer), which means that 2^63 – 1 messages can be stored in one partition. If 1 million messages were sent per second, a producer would need about 292,471 years (counting 365-day years) to fill up the whole partition.
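
The estimate can be checked with a few lines of integer arithmetic. The constant names are chosen for this example, and a year is assumed to have 365 days.

```python
MAX_OFFSET = 2**63 - 1                 # largest value of a 64-bit signed integer
MESSAGES_PER_SECOND = 1_000_000
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumes 365-day years

seconds_to_fill = MAX_OFFSET // MESSAGES_PER_SECOND
years_to_fill = seconds_to_fill // SECONDS_PER_YEAR
print(years_to_fill)  # 292471
```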

In the next post we will discuss consumer groups, which will finally allow us to define an alternative to the traditional message queue and publish-subscribe patterns.
