Why Kafka? A Developer-Friendly Guide to Event-Driven Architecture

What is Kafka?
Kafka is an open-source distributed event streaming platform designed for handling real-time data feeds.
Originally developed at LinkedIn and later open-sourced under the Apache Software Foundation, Kafka is now widely used for building high-throughput, fault-tolerant, and scalable data pipelines, real-time analytics, and event-driven architectures.
What Problem Does Kafka Solve?
Before Kafka, traditional message queues like RabbitMQ and ActiveMQ were widely used, but they had limitations in handling massive, high-throughput real-time data streams.
Kafka was designed to address these issues by providing:
- Large-scale data handling – Kafka is optimized for ingesting, storing, and distributing high-volume data streams across distributed systems.
- Fault tolerance – Kafka replicates data across multiple nodes, ensuring that even if a broker fails, data remains available.
- Durability – Messages persist on disk, allowing consumers to replay events when needed.
- Support for event-driven architecture – It enables asynchronous communication between microservices, making it ideal for modern cloud applications.
When to Use Kafka
Kafka is the right choice when you need:
- High-throughput, real-time data processing – Ideal for log processing, financial transactions, and IoT data streams.
- Microservices decoupling – Kafka acts as an intermediary, allowing microservices to communicate asynchronously without direct dependencies.
- Event-driven systems – If your architecture revolves around reacting to changes (e.g., a user event triggering multiple downstream actions), Kafka is a solid choice.
- Reliable message delivery with persistence – Unlike traditional message queues, which typically delete messages once they are consumed, Kafka retains messages for a configurable retention period, ensuring durability and replayability.
- Scalability and fault tolerance – Kafka’s distributed nature allows it to scale horizontally while maintaining fault tolerance through replication.
How Kafka Works
Kafka consists of several key components:
1. Message
A message is the smallest unit of data in Kafka.
It can be a JSON object, a string, or any binary data.
Messages may have an associated key; Kafka hashes the key to choose a partition, so messages with the same key always land in the same partition.
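To make that concrete, here is a rough sketch of a single message as the kafkajs client (used later in this guide) represents it; the key, payload, and header values are invented for illustration:

// A message is just a key/value pair plus optional headers.
// Messages that share a key are routed to the same partition.
const message = {
  key: "user-42",                              // optional; drives partition routing
  value: JSON.stringify({ action: "signup" }), // payload: a string or Buffer
  headers: { source: "web" },                  // optional metadata
};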
2. Topic
A topic is a logical channel where messages are sent by producers and read by consumers. Topics help categorize messages (e.g., logs, transactions, orders).
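As an illustrative sketch (the topic name and settings are made up), a topic can also be created programmatically with the kafkajs admin client rather than relying on auto-creation:

const { Kafka } = require("kafkajs");

const kafka = new Kafka({ clientId: "topic-admin", brokers: ["localhost:9092"] });
const admin = kafka.admin();

async function createOrdersTopic() {
  await admin.connect();
  // Create a hypothetical "orders" topic with 3 partitions on a single broker.
  await admin.createTopics({
    topics: [{ topic: "orders", numPartitions: 3, replicationFactor: 1 }],
  });
  await admin.disconnect();
}

createOrdersTopic().catch(console.error);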
3. Producer
A producer is a Kafka client that publishes messages to a topic. Messages can be sent in three ways:
- Fire and forget – The producer sends the message without waiting for confirmation, ensuring maximum speed but risking data loss.
- Synchronous send – The producer waits for an acknowledgment from Kafka before proceeding, ensuring reliability but adding latency.
- Asynchronous send – The producer sends the message and registers a callback that runs when the broker responds, offering a balance between speed and reliability.
Kafka also allows configuring acknowledgments (acks) to balance consistency and performance; both the send styles above and the acks setting are sketched in code after this list:
- ACK 0 – No acknowledgment required (fastest but riskier).
- ACK 1 – The message is acknowledged when the leader broker receives it (faster but less safe).
- ACK All – The message is acknowledged only when all in-sync replicas confirm receipt (slowest but safest).
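The following kafkajs sketch puts the send styles and the acks setting together; the topic name and payloads are invented, and acks is expressed as 0, 1, or -1 ("all") on each send call:

const { Kafka } = require("kafkajs");

const kafka = new Kafka({ clientId: "demo-producer", brokers: ["localhost:9092"] });
const producer = kafka.producer();

async function demo() {
  await producer.connect();

  // 1. Fire and forget: don't await the result; fastest, but failures go unnoticed.
  const fireAndForget = producer.send({
    topic: "orders",
    acks: 0, // no acknowledgment requested
    messages: [{ key: "user-42", value: "order-created" }],
  });

  // 2. Synchronous-style send: await the acknowledgment before moving on.
  await producer.send({
    topic: "orders",
    acks: -1, // wait for all in-sync replicas (slowest, safest)
    messages: [{ key: "user-42", value: "order-paid" }],
  });

  // 3. Asynchronous send: continue immediately and react in a callback.
  const asyncSend = producer
    .send({
      topic: "orders",
      acks: 1, // leader-only acknowledgment (the middle ground)
      messages: [{ key: "user-42", value: "order-shipped" }],
    })
    .then((meta) => console.log("delivered:", meta))
    .catch(console.error);

  // Let the non-awaited sends finish before disconnecting.
  await Promise.all([fireAndForget, asyncSend]);
  await producer.disconnect();
}

demo().catch(console.error);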
Producer Optimizations
- Message Compression & Batching – Kafka producers can batch and compress messages before sending them to brokers. This improves throughput and reduces disk usage but increases CPU overhead.
- Avro Serializer/Deserializer – Using Avro instead of JSON requires defining schemas upfront, but it improves performance and reduces storage consumption.
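For example, with kafkajs (topic names and payloads invented for illustration), compression is a per-request option and several topics can be batched into a single request; finer-grained batching knobs such as batch.size and linger.ms live in the Java producer's configuration rather than in kafkajs:

const { Kafka, CompressionTypes } = require("kafkajs");

const kafka = new Kafka({ clientId: "batch-producer", brokers: ["localhost:9092"] });
const producer = kafka.producer();

async function sendCompressedBatch() {
  await producer.connect();
  // Group messages for several topics into one GZIP-compressed request.
  await producer.sendBatch({
    compression: CompressionTypes.GZIP,
    topicMessages: [
      { topic: "logs", messages: [{ value: "app started" }, { value: "user login" }] },
      { topic: "metrics", messages: [{ value: JSON.stringify({ cpu: 0.42 }) }] },
    ],
  });
  await producer.disconnect();
}

sendCompressedBatch().catch(console.error);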
4. Partition
Kafka topics are divided into partitions, which allow for parallel processing and scalability.
Messages in a partition are ordered and immutable.
5. Consumer
A consumer reads messages from partitions and keeps track of its position using an offset.
Consumers can reset offsets to reprocess older messages.
Kafka consumers work on a polling model, meaning they continuously request data from the broker rather than the broker pushing data to them.
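A minimal kafkajs consumer sketch (topic and group names are invented) that subscribes, polls under the hood, and logs each message together with its offset:

const { Kafka } = require("kafkajs");

const kafka = new Kafka({ clientId: "email-service", brokers: ["localhost:9092"] });
const consumer = kafka.consumer({ groupId: "email-notifications" });

async function run() {
  await consumer.connect();
  // fromBeginning replays the topic from the start when the group has no committed offset.
  await consumer.subscribe({ topic: "orders", fromBeginning: true });

  // kafkajs polls the broker in a loop and invokes this handler for every message.
  await consumer.run({
    eachMessage: async ({ topic, partition, message }) => {
      console.log(`${topic}[${partition}] @ ${message.offset}: ${message.value.toString()}`);
    },
  });
}

run().catch(console.error);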
Consumer Optimizations
Partition Assignment Strategies – how partitions are distributed across the consumers in a group:
- Range – each consumer is assigned a consecutive block of partitions.
- Round Robin – partitions are distributed evenly across consumers.
- Sticky – tries to minimize assignment changes during rebalancing.
- Cooperative Sticky – like Sticky, but rebalances incrementally so unaffected consumers keep processing.
Batch Size Configuration – Consumers can define how many records or how much data should be retrieved per poll cycle (see the configuration sketch below).
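Here is a rough kafkajs sketch of the fetch-size knobs (the group name and sizes are invented). Note that kafkajs uses a round-robin style assigner by default; the Sticky and Cooperative Sticky strategies are selected in the Java client via partition.assignment.strategy rather than through these options:

const { Kafka } = require("kafkajs");

const kafka = new Kafka({ clientId: "order-processor", brokers: ["localhost:9092"] });

// Fetch sizing: how much data the consumer asks for per poll, and how long the
// broker may hold the request open when not enough data has accumulated yet.
const consumer = kafka.consumer({
  groupId: "order-processing",
  minBytes: 1024,                    // wait until at least 1 KB is available
  maxBytes: 5 * 1024 * 1024,         // cap a whole fetch response at 5 MB
  maxBytesPerPartition: 1024 * 1024, // cap any single partition at 1 MB per fetch
  maxWaitTimeInMs: 500,              // but answer within 500 ms regardless
});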
6. Consumer Group
A consumer group is a set of consumers that work together to process messages from a topic.
Kafka ensures that a single partition is consumed by only one consumer within a group, preserving ordering within each partition.
7. Offset Management
When a consumer reads a message, it updates its offset – the position of the last processed message. Offsets can be committed in two ways (a manual-commit sketch follows this list):
- Auto-commit – Kafka automatically commits the offset at regular intervals.
- Manual commit – The application explicitly commits the offset, either synchronously or asynchronously.
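A manual-commit sketch with kafkajs (topic and group names are invented); autoCommit is disabled so the application stays in control, and committing offset + 1 marks the current message as processed:

const { Kafka } = require("kafkajs");

const kafka = new Kafka({ clientId: "db-writer", brokers: ["localhost:9092"] });
const consumer = kafka.consumer({ groupId: "db-writers" });

async function run() {
  await consumer.connect();
  await consumer.subscribe({ topic: "orders", fromBeginning: true });

  await consumer.run({
    autoCommit: false, // take over offset management from the library
    eachMessage: async ({ topic, partition, message }) => {
      // ... persist the message to the database here ...
      // Commit the next offset to read, i.e. the current offset + 1.
      await consumer.commitOffsets([
        { topic, partition, offset: (Number(message.offset) + 1).toString() },
      ]);
    },
  });
}

run().catch(console.error);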
8. Broker
A broker is a Kafka server that stores messages, assigns offsets, and handles client requests.
Multiple brokers form a Kafka cluster for scalability and fault tolerance.
9. Zookeeper
Zookeeper manages metadata, tracks brokers, and handles leader elections.
However, newer Kafka versions replace ZooKeeper with KRaft, Kafka's built-in Raft-based metadata quorum, and the ZooKeeper dependency is being phased out.
Example: Kafka in Action
To understand Kafka better, let's look at a simple example where a producer sends messages to a topic, and two different consumers process those messages separately: one simulating an email notification service and the other storing messages in a database.
Setup Kafka (docker-compose.yml)
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:latest
    container_name: zookeeper
    restart: always
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
  kafka:
    image: confluentinc/cp-kafka:latest
    container_name: kafka
    restart: always
    depends_on:
      - zookeeper
    ports:
      - "9092:9092"
      - "29092:29092"
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092,PLAINTEXT_INTERNAL://kafka:29092
      KAFKA_LISTENERS: PLAINTEXT://0.0.0.0:9092,PLAINTEXT_INTERNAL://0.0.0.0:29092
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_INTERNAL:PLAINTEXT
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
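With this file saved as docker-compose.yml, the local stack can be started with docker compose up -d (or docker-compose up -d on older installs); the broker is then reachable from the host at localhost:9092.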
Producer Code (producer.js)
const { Kafka } = require("kafkajs");

const kafka = new Kafka({
  clientId: "family-producer",
  brokers: ["localhost:9092"],
});

const producer = kafka.producer();

async function sendMessage() {
  await producer.connect();
  console.log("