Streaming Postgres Changes with Debezium and Kafka Connect: A Hands-On Tutorial

Looking for a simpler solution? If you're reading this tutorial, you want to reliably stream changes out of your Postgres database. You've turned to Debezium, only to discover that the official Debezium documentation leaves a lot to be desired. I hit the same snag, and decided to save future developers similar toil by writing this guide. If you are looking for an easier, faster alternative to Debezium, we built Sequin to provide the same guarantees as Debezium while eliminating all this complexity.

Introduction

In this tutorial, you'll set up a complete CDC pipeline using Debezium (version 3.1) with Kafka Connect to capture changes from a PostgreSQL database. By the end, you'll have a working system that streams every insert, update, and delete operation from your database into Apache Kafka topics.

What you'll learn:

  • How to set up a local development environment for Debezium with Docker Compose
  • How to configure PostgreSQL for logical replication
  • How to set up a Debezium connector to capture changes
  • How to observe and work with CDC events from your database

Let's get started!

Prerequisites

Before diving in, make sure you have:

  • Docker Engine and Docker Compose v2 installed on your system
  • curl command-line tool for making HTTP requests
  • jq for pretty-printing the JSON responses you'll get back
  • Basic familiarity with PostgreSQL and command-line operations

You'll be doing everything in Docker and your CLI - Debezium doesn't come with a UI or additional tooling.

The Architecture

Here's what you're setting up:

  1. A PostgreSQL database with logical replication enabled
  2. Apache Kafka (which requires ZooKeeper in this version) for message streaming
  3. Kafka Connect with the Debezium PostgreSQL connector to provide CDC on the Postgres database and stream those changes to Kafka topics
  4. A simple "customers" table that you'll monitor for changes

When you're done, any change to the customers table will be captured by Debezium and sent as an event to a Kafka topic, which you can then consume from any application.

Step 1: Setting Up the Infrastructure with Docker Compose

First, create your environment using Docker Compose. This will spin up all the necessary services in isolated containers.

Create a docker-compose.yml file

Create a new directory and then create a new file called docker-compose.yml:

mkdir debezium-example
cd debezium-example
touch docker-compose.yml

Copy and paste the following into the docker-compose.yml:

services:
  zookeeper:
    image: quay.io/debezium/zookeeper:3.1
    ports: ["2181:2181"]

  kafka:
    image: quay.io/debezium/kafka:3.1
    depends_on: [zookeeper]
    ports: ["29092:29092"]
    environment:
      ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_LISTENERS: INTERNAL://0.0.0.0:9092,EXTERNAL://0.0.0.0:29092
      KAFKA_ADVERTISED_LISTENERS: INTERNAL://kafka:9092,EXTERNAL://localhost:29092
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: INTERNAL:PLAINTEXT,EXTERNAL:PLAINTEXT
      KAFKA_INTER_BROKER_LISTENER_NAME: INTERNAL
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1

  connect:
    image: quay.io/debezium/connect:3.1
    depends_on: [kafka]
    ports: ["8083:8083"]
    environment:
      BOOTSTRAP_SERVERS: kafka:9092
      GROUP_ID: 1
      CONFIG_STORAGE_TOPIC: connect_configs
      OFFSET_STORAGE_TOPIC: connect_offsets
      STATUS_STORAGE_TOPIC: connect_statuses
      KEY_CONVERTER_SCHEMAS_ENABLE: "false"
      VALUE_CONVERTER_SCHEMAS_ENABLE: "false"

  postgres:
    image: debezium/postgres:15
    ports: ["5432:5432"]
    command: postgres -c wal_level=logical -c max_wal_senders=10 -c max_replication_slots=10
    environment:
      POSTGRES_USER: dbz
      POSTGRES_PASSWORD: dbz
      POSTGRES_DB: example

Understanding the docker-compose.yml

Briefly, this file defines the following services - all of which are needed to run Debezium:

  • ZooKeeper: Provides distributed configuration and synchronization for Kafka
  • Kafka: The message broker that will store your change data events
  • Kafka Connect: The framework that runs your Debezium connector
  • PostgreSQL: Your database with logical replication enabled

Notice the PostgreSQL configuration:

  • You're using the debezium/postgres:15 image, which comes with the necessary logical decoding plugins
  • You set wal_level=logical to enable logical replication
  • You configure max_wal_senders and max_replication_slots to allow multiple replication connections

Start the containers

Back in your terminal, ensure you are in the debezium-example directory containing your docker-compose.yml file, and run:

docker compose up -d

This command starts all the services in detached mode. To verify that all containers are running:

docker compose ps

You should see all four containers (zookeeper, kafka, connect, and postgres) with a status of "Up".

$ docker ps
CONTAINER ID   IMAGE                            COMMAND                  CREATED          STATUS          PORTS                                        NAMES
c218e8a9fc67   quay.io/debezium/connect:3.1     "/docker-entrypoint.…"   59 minutes ago   Up 59 minutes   8778/tcp, 0.0.0.0:8083->8083/tcp, 9092/tcp   debezium-connect-1
5ffe2fc31745   quay.io/debezium/kafka:3.1       "/docker-entrypoint.…"   59 minutes ago   Up 59 minutes   9092/tcp, 0.0.0.0:29092->29092/tcp           debezium-kafka-1
1b1de4194458   debezium/postgres:15             "docker-entrypoint.s…"   59 minutes ago   Up 59 minutes   0.0.0.0:5432->5432/tcp                       debezium-postgres-1
d5d87a35b013   quay.io/debezium/zookeeper:3.1   "/docker-entrypoint.…"   59 minutes ago   Up 59 minutes   2888/tcp, 0.0.0.0:2181->2181/tcp, 3888/tcp   debezium-zookeeper-1
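
With everything up, it's worth confirming that the logical replication settings from the compose file actually took effect inside Postgres. A quick check, using the dbz credentials defined above:

docker compose exec postgres \
  psql -U dbz -d example \
  -c "SELECT name, setting FROM pg_settings WHERE name IN ('wal_level','max_wal_senders','max_replication_slots');"

You should see logical, 10, and 10 respectively.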

Step 2: Preparing PostgreSQL for CDC

Now that the required infrastructure is running, you need to configure PostgreSQL for change data capture. You’ll connect to the Postgres instance you just created and add a new replication user for Debezium, set up a demo table, and configure the table for full row captures.

Create a replication user

Debezium connects to Postgres as a user with replication privileges. For this demo you can skip creating one: the dbz user defined in docker-compose.yml is a superuser, so it already has everything Debezium needs, and the connector configuration later in this tutorial uses it directly. In production, you'd create a dedicated role instead.
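
If you want to mirror that production setup, here's a minimal sketch. The debezium_user name and password are placeholders; the role would also need SELECT on the captured tables and rights on the publication, which the Debezium documentation covers in detail:

docker compose exec postgres \
  psql -U dbz -d example \
  -c "CREATE ROLE debezium_user WITH REPLICATION LOGIN PASSWORD 'secret';"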

Create the demo table

Create a simple customers table that you'll monitor for changes:

docker compose exec postgres \
  psql -U dbz -d example \
  -c "CREATE TABLE customers (id SERIAL PRIMARY KEY, name VARCHAR(255), email VARCHAR(255));"

This table has three columns:

  • id: An auto-incrementing primary key
  • name: A customer's name
  • email: A customer's email address

If you'd like to explore the database more directly, you can connect with any PostgreSQL client (like TablePlus) using these connection details:

  • Host: localhost
  • Port: 5432
  • Database: example
  • User: dbz
  • Password: dbz
  • SSL Mode: disable

Or using a connection string:

postgresql://dbz:dbz@localhost:5432/example?sslmode=disable

This allows you to explore the database schema, run queries, and make changes that will be captured by Debezium.

Configure full row images

By default, PostgreSQL's logical replication only includes the primary key and changed columns in update events. To get the full "before" and "after" state of rows, you need to set the REPLICA IDENTITY to FULL:

docker compose exec postgres \
  psql -U dbz -d example \
  -c "ALTER TABLE customers REPLICA IDENTITY FULL;"

This setting ensures that when a row is updated or deleted, the entire row's data (before the change) is included in the WAL (Write-Ahead Log) entry, giving you complete information about the change in the resulting Kafka message produced by Debezium.
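
To double-check the setting, inspect pg_class; relreplident should come back as f (for FULL):

docker compose exec postgres \
  psql -U dbz -d example \
  -c "SELECT relreplident FROM pg_class WHERE oid = 'customers'::regclass;"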

Step 3: Setting Up the Debezium Connector

Now, configure and register the Debezium PostgreSQL connector with Kafka Connect.

Create the connector configuration

Create a file named register-postgres.json with the following content:

{
  "name": "example-connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "postgres",
    "database.port": "5432",
    "database.user": "dbz",
    "database.password": "dbz",
    "database.dbname": "example",
    "topic.prefix": "example",
    "slot.name": "example_slot",
    "publication.autocreate.mode": "filtered",
    "table.include.list": "public.customers"
  }
}

This configuration:

  • Names the connector example-connector
  • Specifies the PostgreSQL connection details
  • Sets a topic prefix of example (resulting in a topic named example.public.customers)
  • Creates a replication slot named example_slot
  • Configures the publication to only include the tables you specify
  • Limits change capture to only the public.customers table
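
Before registering it, you can optionally confirm that the Connect worker has the Debezium Postgres connector class installed by listing the available plugins:

curl -s http://localhost:8083/connector-plugins | jq

Look for io.debezium.connector.postgresql.PostgresConnector in the output.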

Register the connector

Register the connector with Kafka Connect's REST API running on port 8083:

curl -X POST -H "Content-Type: application/json" \
     --data @register-postgres.json \
     http://localhost:8083/connectors

If successful, you should receive a JSON object that looks similar to the configuration in your register-postgres.json file. This means that Kafka Connect has started the Debezium connector, which is now listening for changes to your customers table.
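
You can also verify on the Postgres side that the connector created its replication slot:

docker compose exec postgres \
  psql -U dbz -d example \
  -c "SELECT slot_name, plugin, active FROM pg_replication_slots;"

You should see example_slot using the pgoutput plugin (the connector's default) with active set to t.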

Verify the connector is running

You can check the status of your connector:

curl -s http://localhost:8083/connectors/example-connector/status | jq

{
  "name": "example-connector",
  "connector": {
    "state": "RUNNING",
    "worker_id": "172.20.0.5:8083"
  },
  "tasks": [
    {
      "id": 0,
      "state": "RUNNING",
      "worker_id": "172.20.0.5:8083"
    }
  ],
  "type": "source"
}

You should see that the connector is in the RUNNING state. If you encounter any issues, you can check the Kafka Connect logs:

docker compose logs -f connect

Step 4: Generating and Observing Change Events

Now for the exciting part! Make some changes to your database and watch the CDC events flow into Kafka.

Start a Kafka consumer

First, open a terminal window and start a console consumer to watch the Kafka topic:

docker compose exec kafka /kafka/bin/kafka-console-consumer.sh --bootstrap-server kafka:9092 --topic example.public.customers --from-beginning

This command connects to Kafka and subscribes to the example.public.customers topic, displaying all messages as they arrive.
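
If nothing shows up, confirm the topic exists; Debezium creates example.public.customers during the initial snapshot or on the first captured change:

docker compose exec kafka /kafka/bin/kafka-topics.sh --bootstrap-server kafka:9092 --list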

Insert a new customer

In a new terminal window, insert a record into the customers table:

docker compose exec postgres \
  psql -U dbz -d example \
  -c "INSERT INTO customers(name,email) VALUES ('Alice','alice@example.com');"

In your consumer terminal, you should see a JSON message appear that looks something like this:

{
  "before": null,
  "after": {
    "id": 1,
    "name": "Alice",
    "email": "alice@example.com"
  },
  "source": {
    "version": "3.1.0.Final",
    "connector": "postgresql",
    "name": "example",
    "ts_ms": 1620000000000,
    "snapshot": "false",
    "db": "example",
    "sequence": "[\"12345678\",\"12345678\"]",
    "schema": "public",
    "table": "customers",
    "txId": 123,
    "lsn": 12345678
  },
  "op": "c",
  "ts_ms": 1620000000001
}

This is the Debezium change event format, also known as the "envelope." It contains:

  • before: The previous state of the row (null for inserts)
  • after: The new state of the row
  • source: Metadata about the event source
  • op: The operation type ("c" for create/insert)
  • ts_ms: Timestamp of when the event was processed
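
Because each message is plain JSON (you disabled schemas in the compose file), you can pipe the consumer through jq on your host to pull out just the fields you care about. The -T flag disables the pseudo-TTY so the pipe receives clean output:

docker compose exec -T kafka /kafka/bin/kafka-console-consumer.sh \
  --bootstrap-server kafka:9092 \
  --topic example.public.customers \
  --from-beginning | jq '{op: .op, after: .after}'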

Update the customer

Now let's update the customer record:

docker compose exec postgres \
  psql -U dbz -d example \
  -c "UPDATE customers SET name='Alice Updated' WHERE id=1;"

In your consumer terminal, you should see another JSON message, this time with both before and after fields populated:

{
  "before": {
    "id": 1,
    "name": "Alice",
    "email": "alice@example.com"
  },
  "after": {
    "id": 1,
    "name": "Alice Updated",
    "email": "alice@example.com"
  },
  "source": { /* ... */ },
  "op": "u",
  "ts_ms": 1620000000002
}

Note that op now has a value of "u" for update, and both before and after states are included.

Delete the customer

Finally, let's delete the customer:

docker compose exec postgres \
  psql -U dbz -d example \
  -c "DELETE FROM customers WHERE id=1;"

You'll see a final message with op set to "d" (delete):

{
  "before": {
    "id": 1,
    "name": "Alice Updated",
    "email": "alice@example.com"
  },
  "after": null,
  "source": { /* ... */ },
  "op": "d",
  "ts_ms": 1620000000003
}

Notice that for deletes, after is null since the row no longer exists after the operation. By default, Debezium also follows the delete event with a tombstone message (a Kafka record with a null value) for the same key, so that Kafka log compaction can eventually discard every message for the deleted row; in the console consumer this typically shows up as a null line after the delete event.

Step 5: Cleaning Up

When you're finished experimenting, clean up the environment:

docker compose down -v

This stops all containers and removes volumes created by Docker Compose.

Understanding the Architecture

Now that you've seen Debezium in action, it’s helpful to discuss what is happening behind the scenes.

  1. PostgreSQL Write-Ahead Log (WAL): When data changes in PostgreSQL, those changes are written to the WAL for durability. With logical replication enabled, these changes can be decoded into logical change events.
  2. Debezium PostgreSQL Connector: Debezium creates a replication slot in PostgreSQL and subscribes to changes. It reads from the WAL, transforms the binary changes into structured events, and sends them to Kafka.
  3. Kafka Connect: This framework manages the connector lifecycle and ensures reliable delivery of events to Kafka, handling failures and tracking offsets (you can peek at those stored offsets, as shown after this list).
  4. Kafka Topics: Each table's changes are published to a dedicated topic, allowing consumers to subscribe only to the tables they care about.
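
Those offsets live in the connect_offsets topic you configured in docker-compose.yml. If you're curious, you can peek at what Kafka Connect has recorded for the connector (the keys identify the connector; the values include the Postgres LSN it has confirmed):

docker compose exec kafka /kafka/bin/kafka-console-consumer.sh \
  --bootstrap-server kafka:9092 \
  --topic connect_offsets \
  --from-beginning \
  --property print.key=true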

Conclusion

You've successfully set up a change data capture pipeline using Debezium and Kafka Connect. You've seen how to capture inserts, updates, and deletes from PostgreSQL and stream them to Kafka in real-time.

In this Kafka Connect runtime, Debezium relies on Kafka (and the Kafka ecosystem) to provide capabilities like single message transforms, message retention, and efficient scaling. Now that the system is running, get familiar with these tools.
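
For example, many consumers don't need the full envelope. Debezium ships a single message transform, ExtractNewRecordState, that unwraps each event down to just its after state. A sketch of the extra keys you'd add to the config object in register-postgres.json:

"transforms": "unwrap",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
"transforms.unwrap.drop.tombstones": "false"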

From here, you'll want to tailor this template to your Postgres database. Ensure you enable logical replication in your Postgres instance, create a user for Debezium, and then buckle up for Java logs.