Building a Scalable Event-Driven Pipeline with MongoDB, Docker, and Kafka

In modern DevOps workflows, handling real-time data streams efficiently is crucial for building scalable applications. In this guide, we'll explore how to set up an event-driven pipeline using MongoDB, Docker, and Kafka to handle high-throughput data processing with ease.

Imagine an e-commerce platform processing millions of orders in real time. Our setup ensures seamless, fault-tolerant data streaming between services.

1. Why Event-Driven Architectures?

Traditional architectures rely on periodic batch jobs and struggle with real-time processing and horizontal scalability. Event-driven systems address these problems by:

  • Decoupling components for greater scalability.
  • Processing data in real time instead of in batches.
  • Enhancing fault tolerance through asynchronous messaging.

Kafka serves as the central message broker, while MongoDB acts as a persistent data store for event logs and structured data.

2. Setting Up MongoDB with Docker

To run MongoDB in a containerized environment, use the following Docker Compose setup:

version: '3.8'
services:
  mongodb:
    image: mongo:latest
    container_name: mongodb
    restart: always
    ports:
      - "27017:27017"
    environment:
      MONGO_INITDB_ROOT_USERNAME: root
      MONGO_INITDB_ROOT_PASSWORD: example
    volumes:
      - mongodb_data:/data/db
volumes:
  mongodb_data:

Run MongoDB with:

docker-compose up -d

Now, MongoDB is up and running on port 27017.
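
To confirm the container accepts authenticated connections, you can run a quick ping through mongosh (bundled with the official image), using the credentials from the Compose file above:

docker exec -it mongodb mongosh -u root -p example --authenticationDatabase admin --eval 'db.adminCommand({ ping: 1 })'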

3. Deploying Kafka in Docker

This setup uses ZooKeeper for broker coordination (newer Kafka releases can also run without it in KRaft mode). We'll deploy both services with Docker Compose:

services:
  zookeeper:
    image: confluentinc/cp-zookeeper:latest
    container_name: zookeeper
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
    ports:
      - "2181:2181"

  kafka:
    image: confluentinc/cp-kafka:latest
    container_name: kafka
    depends_on:
      - zookeeper
    ports:
      - "9092:9092"
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      # Two listeners: one for clients inside the Docker network, one for the host
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:29092,PLAINTEXT_HOST://localhost:9092
      KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT
      # Single broker, so internal topics can only be replicated once
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1

Start Kafka with:

docker-compose up -d

Check Kafka logs to confirm it's running:

docker logs -f kafka
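
Before wiring up the connector, it helps to create the events topic that the sink will read from and confirm it exists. With the listener configuration above, the broker is reachable inside the Docker network at kafka:29092:

docker exec -it kafka kafka-topics --bootstrap-server kafka:29092 --create --topic events --partitions 1 --replication-factor 1
docker exec -it kafka kafka-topics --bootstrap-server kafka:29092 --list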

4. Connecting Kafka & MongoDB

Kafka Connect streams data between Kafka and external systems such as MongoDB. It runs as its own worker process with a REST API on port 8083, which is not part of the Compose files above, so we need to add one (see the sketch below).
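
A minimal worker can be added as one more service in the Kafka Compose file. The following sketch makes a few assumptions beyond the original setup: the service/container name connect, the internal listener kafka:29092, and plain JSON converters without schemas:

  connect:
    image: confluentinc/cp-kafka-connect:latest
    container_name: connect
    depends_on:
      - kafka
    ports:
      - "8083:8083"
    environment:
      CONNECT_BOOTSTRAP_SERVERS: kafka:29092
      CONNECT_REST_PORT: 8083
      CONNECT_REST_ADVERTISED_HOST_NAME: connect
      CONNECT_GROUP_ID: connect-cluster
      CONNECT_CONFIG_STORAGE_TOPIC: _connect-configs
      CONNECT_OFFSET_STORAGE_TOPIC: _connect-offsets
      CONNECT_STATUS_STORAGE_TOPIC: _connect-status
      CONNECT_CONFIG_STORAGE_REPLICATION_FACTOR: 1
      CONNECT_OFFSET_STORAGE_REPLICATION_FACTOR: 1
      CONNECT_STATUS_STORAGE_REPLICATION_FACTOR: 1
      CONNECT_KEY_CONVERTER: org.apache.kafka.connect.json.JsonConverter
      CONNECT_VALUE_CONVERTER: org.apache.kafka.connect.json.JsonConverter
      CONNECT_KEY_CONVERTER_SCHEMAS_ENABLE: "false"
      CONNECT_VALUE_CONVERTER_SCHEMAS_ENABLE: "false"
      CONNECT_PLUGIN_PATH: /usr/share/java,/usr/share/confluent-hub-components

After another docker-compose up -d, the worker's REST API should answer on http://localhost:8083.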

Step 1: Install the MongoDB Kafka Connector

The confluent-hub CLI ships with the Connect worker image (not the broker image), so install the connector there and restart the worker so it picks up the plugin:

docker exec -it connect confluent-hub install --no-prompt mongodb/kafka-connect-mongodb:latest
docker restart connect

Step 2: Configure Kafka Connector

Create a mongo-sink.json file:

{
  "name": "mongo-sink-connector",
  "config": {
    "connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
    "topics": "events",
    "connection.uri": "mongodb://root:example@mongodb:27017",
    "database": "eventDB",
    "collection": "eventLogs"
  }
}

Apply the configuration:

curl -X POST -H "Content-Type: application/json" --data @mongo-sink.json http://localhost:8083/connectors
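
If the POST succeeds, the worker reports the connector and its task as RUNNING; you can check at any time with:

curl -s http://localhost:8083/connectors/mongo-sink-connector/status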

Now, Kafka will stream events directly into MongoDB!
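
To verify the pipeline end to end, produce a test event to the events topic and then look for it in MongoDB. The commands assume the container names and listeners used above:

echo '{"orderId": 1, "status": "created"}' | docker exec -i kafka kafka-console-producer --bootstrap-server kafka:29092 --topic events

docker exec -it mongodb mongosh -u root -p example --authenticationDatabase admin --eval 'db.getSiblingDB("eventDB").eventLogs.find().limit(5)'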