How Kafka Achieves High Throughput: A Breakdown of Its Log-Centric Architecture

Kafka routinely handles millions of messages per second on commodity hardware. This performance isn't accidental. It stems from deliberate architectural choices centered around log-based storage, OS-level optimizations, and minimal coordination between readers and writers. This post breaks down the core mechanisms that enable Kafka's high-throughput design. 1. Append-Only Log Storage Each Kafka topic is split into partitions, and each partition is an append-only log. It is essentially a durable, ordered sequence of messages that are immutable once written. To manage growing data size efficiently, Kafka breaks each partition’s log into multiple segment files. A segment is a file on disk that stores a continuous range of messages. New messages are always written to the active segment using low-level system calls like write(). This write lands in the OS page cache, not written to disk immediately. Kafka delays calling fsync() to flush data to disk, relying instead on configurable flush policies (based on time or size). This reduces disk I/O and improves performance, at the cost of brief durability gaps. Kafka mitigates this through replication across brokers. Over time, when a segment reaches a size threshold, it is closed and a new one is created. Older segments become read-only and are subject to log retention, compaction, or deletion based on topic settings. By aligning its write path with sequential disk I/O, Kafka avoids random seeks entirely. This makes reads and writes fast and predictable, even on spinning disks, and scales well with data volume. 2. Outperforming Traditional Queues with Sequential I/O Traditional messaging systems often manage message delivery using per-consumer tracking and persistence mechanisms that can result in random disk access, especially during acknowledgment, redelivery, or crash recovery. While these systems are efficient in memory, random I/O patterns on disk introduce performance bottlenecks. For spinning disks, a single seek can take around 10 milliseconds, and disks can only perform one seek at a time. Kafka sidesteps this entirely by relying on sequential I/O. Writes are appended, and reads proceed in order. This design significantly improves disk efficiency, especially under load. By decoupling performance from data volume and enabling concurrent read/write access, Kafka makes efficient use of low-cost storage hardware, such as commodity SATA drives, without sacrificing performance. 3. Speeding Up Seeks with Lightweight Indexing Each segment file is accompanied by lightweight offset and timestamp indexes. These allow consumers to seek directly to specific message positions without scanning entire files, ensuring fast lookup even on large datasets. Since Kafka consumers track their own offsets and messages are immutable, there is no need to update shared state for acknowledgments or deletions. This eliminates coordination between readers and writers, reducing contention and enabling true parallelism. 4. Batching to Maximize I/O Efficiency High-throughput systems must avoid the overhead of processing one message at a time. Kafka uses a message set abstraction to batch messages: Producers group messages before sending them. Brokers perform a single disk write per batch. Consumers fetch large batches with a single network call. This batching reduces system calls, disk seeks, and protocol overhead. As a result, throughput improves significantly. 5. Zero-Copy Data Transfer with sendfile() Conventional data transfer involves multiple memory copies: Disk to kernel space (page cache) Kernel to user space (application buffer) User space back to kernel (socket buffer) Kernel to NIC buffer (for network) Kafka avoids this overhead using the sendfile() system call. This enables zero-copy data transfer from the page cache directly to the network stack, bypassing user space entirely. This reduces CPU usage and memory pressure, allowing near wire-speed data transfer even under heavy load. 6. Long-Term Retention Without Performance Loss Kafka’s append-only log model enables long-term message retention, even for days or weeks, without degrading performance. Because reads and writes are decoupled, and messages are not mutated post-write, old data remains accessible without impacting current workloads. This supports powerful use cases like: Replaying messages for state recovery Late-arriving consumer processing Time-travel debugging and auditing Conclusion Kafka’s high throughput is the result of system and architectural decisions that work together by design. Its log-centric model avoids random I/O, minimizes coordination, and takes full advantage of OS-level features like the page cache and zero-copy transfers. The result: Kafka handles massive data volumes not through abstract complexity, but by working with the OS instead of against it. Append

Apr 20, 2025 - 06:40

How Kafka Achieves High Throughput: A Breakdown of Its Log-Centric Architecture

Kafka routinely handles millions of messages per second on commodity hardware. This performance isn't accidental. It stems from deliberate architectural choices centered around log-based storage, OS-level optimizations, and minimal coordination between readers and writers.

This post breaks down the core mechanisms that enable Kafka's high-throughput design.

1. Append-Only Log Storage

Each Kafka topic is split into partitions, and each partition is an append-only log. It is essentially a durable, ordered sequence of messages that are immutable once written.

To manage growing data size efficiently, Kafka breaks each partition’s log into multiple segment files. A segment is a file on disk that stores a continuous range of messages. New messages are always written to the active segment using low-level system calls like write(). This write lands in the OS page cache, not written to disk immediately.

Kafka delays calling fsync() to flush data to disk, relying instead on configurable flush policies (based on time or size). This reduces disk I/O and improves performance, at the cost of brief durability gaps. Kafka mitigates this through replication across brokers.

Over time, when a segment reaches a size threshold, it is closed and a new one is created. Older segments become read-only and are subject to log retention, compaction, or deletion based on topic settings.

By aligning its write path with sequential disk I/O, Kafka avoids random seeks entirely. This makes reads and writes fast and predictable, even on spinning disks, and scales well with data volume.

2. Outperforming Traditional Queues with Sequential I/O

Traditional messaging systems often manage message delivery using per-consumer tracking and persistence mechanisms that can result in random disk access, especially during acknowledgment, redelivery, or crash recovery. While these systems are efficient in memory, random I/O patterns on disk introduce performance bottlenecks. For spinning disks, a single seek can take around 10 milliseconds, and disks can only perform one seek at a time.

Kafka sidesteps this entirely by relying on sequential I/O. Writes are appended, and reads proceed in order. This design significantly improves disk efficiency, especially under load.

By decoupling performance from data volume and enabling concurrent read/write access, Kafka makes efficient use of low-cost storage hardware, such as commodity SATA drives, without sacrificing performance.

3. Speeding Up Seeks with Lightweight Indexing

Each segment file is accompanied by lightweight offset and timestamp indexes. These allow consumers to seek directly to specific message positions without scanning entire files, ensuring fast lookup even on large datasets.

Since Kafka consumers track their own offsets and messages are immutable, there is no need to update shared state for acknowledgments or deletions. This eliminates coordination between readers and writers, reducing contention and enabling true parallelism.

4. Batching to Maximize I/O Efficiency

High-throughput systems must avoid the overhead of processing one message at a time. Kafka uses a message set abstraction to batch messages:

Producers group messages before sending them.
Brokers perform a single disk write per batch.
Consumers fetch large batches with a single network call.

This batching reduces system calls, disk seeks, and protocol overhead. As a result, throughput improves significantly.

5. Zero-Copy Data Transfer with `sendfile()`

Conventional data transfer involves multiple memory copies:

Disk to kernel space (page cache)
Kernel to user space (application buffer)
User space back to kernel (socket buffer)
Kernel to NIC buffer (for network)

Kafka avoids this overhead using the sendfile() system call. This enables zero-copy data transfer from the page cache directly to the network stack, bypassing user space entirely.

This reduces CPU usage and memory pressure, allowing near wire-speed data transfer even under heavy load.

6. Long-Term Retention Without Performance Loss

Kafka’s append-only log model enables long-term message retention, even for days or weeks, without degrading performance. Because reads and writes are decoupled, and messages are not mutated post-write, old data remains accessible without impacting current workloads.

This supports powerful use cases like:

Replaying messages for state recovery
Late-arriving consumer processing
Time-travel debugging and auditing

Conclusion

Kafka’s high throughput is the result of system and architectural decisions that work together by design. Its log-centric model avoids random I/O, minimizes coordination, and takes full advantage of OS-level features like the page cache and zero-copy transfers.

The result: Kafka handles massive data volumes not through abstract complexity, but by working with the OS instead of against it.

Appendix: Key Terms

write(): A system call that transfers data from user space to the OS page cache.
Page cache: A memory buffer managed by the kernel.
fsync(): Forces data in the page cache to be flushed to disk.
sendfile(): A system call that sends data from a file directly to the network without copying to user space.
Sequential I/O: Reading or writing data in a linear order. Much faster than random I/O, especially on HDDs.
Random I/O: Accessing data at non-contiguous disk locations. This causes performance degradation due to disk seeks.

Further Reading:

Kafka Official Design Documentation