Migrating 2.4 TB PostgreSQL to a Cloud-Native Lakehouse Without Downtime
CDC-Based Migration with Debezium, Iceberg, and Zero Business Disruption
The Challenge
A European B2B SaaS company had accumulated 2.4 TB of operational data in a self-managed PostgreSQL 14 cluster running on bare metal. Analytics queries ran directly against the production database, adding significant read load, causing occasional replication lag, and, during quarter-end reporting, producing perceptible slowdowns for end users. The data team needed a dedicated analytical platform but faced hard constraints: zero downtime for operational workloads and a migration window of under 30 days.
Our Solution
We designed a migration strategy built on change data capture (CDC) with Debezium, avoiding any bulk-export ETL that would have required downtime. Illustrative sketches of each phase follow the description.

**Phase 1 — Initial Snapshot (Week 1)**

We used pglogical to create a consistent snapshot replica, then loaded it into S3 as Parquet using Spark on EMR. The bulk load took 14 hours and ran entirely in the background.

**Phase 2 — CDC Streaming (Weeks 1–3)**

We deployed Debezium on Kubernetes, connected it to a dedicated replication slot on the Postgres primary, and streamed WAL events to Kafka topics, one per table. A Flink consumer materialised these events as Iceberg table updates on S3, preserving full ACID guarantees.

**Phase 3 — Validation & Cutover (Week 4)**

We ran automated row-count and checksum validation across all 340 tables. Once the Iceberg replica held steady at under one second of lag behind production, we switched the analytics workloads to Trino-over-Iceberg and decommissioned the Postgres analytics replica.
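A minimal sketch of the Phase 1 bulk load, assuming the snapshot replica is read over JDBC and landed in a bucket named `s3://lakehouse-raw` (hostnames, credentials, table name, and partition bounds are all hypothetical):

```python
# Sketch of the Phase 1 bulk load: read one table from the consistent
# snapshot replica over JDBC and land it as Parquet on S3.
# All connection details and bounds below are illustrative.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("initial-snapshot-load")
    .getOrCreate()
)

# Partitioning the JDBC scan parallelises the read; a serial scan of a
# 2.4 TB database would not fit a 14-hour window.
orders = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://snapshot-replica:5432/app")
    .option("dbtable", "public.orders")
    .option("user", "etl_reader")
    .option("password", "...")
    .option("numPartitions", 32)
    .option("partitionColumn", "id")
    .option("lowerBound", 1)
    .option("upperBound", 500_000_000)
    .load()
)

# Land the snapshot as Parquet; the Iceberg tables in Phase 2 are
# built on top of this data.
orders.write.mode("overwrite").parquet("s3://lakehouse-raw/public/orders/")
```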
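The Phase 2 connector registration, sketched against the Kafka Connect REST API using Debezium 2.x property names (hostnames, slot name, and credentials are hypothetical; the Flink-to-Iceberg consumer is omitted here):

```python
# Sketch of registering the Debezium Postgres connector with Kafka Connect.
# Connection details are illustrative; the Phase 1 snapshot already exists,
# so Debezium starts from the WAL only (snapshot.mode = never).
import requests

connector = {
    "name": "pg-cdc",
    "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "database.hostname": "pg-primary.internal",
        "database.port": "5432",
        "database.user": "debezium",
        "database.password": "...",
        "database.dbname": "app",
        "topic.prefix": "app",              # one topic per table: app.<schema>.<table>
        "plugin.name": "pgoutput",          # built-in logical decoding plugin (PG10+)
        "slot.name": "debezium_migration",  # dedicated replication slot on the primary
        "snapshot.mode": "never",           # Phase 1 already produced the snapshot
        "table.include.list": "public\\..*",
    },
}

resp = requests.post("http://connect:8083/connectors", json=connector, timeout=30)
resp.raise_for_status()
```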
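A simplified sketch of the Phase 3 row-count validation, assuming the `psycopg2` and `trino` Python clients (hostnames and the table list are hypothetical; the real run covered all 340 tables, added checksums, and compared counts at a consistent point rather than against a moving primary):

```python
# Sketch of per-table row-count validation: Postgres primary vs. the
# Iceberg replica queried through Trino. Connection details are illustrative.
import psycopg2
import trino

TABLES = ["public.orders", "public.invoices", "public.customers"]

pg = psycopg2.connect(host="pg-primary.internal", dbname="app",
                      user="etl_reader", password="...")
tr = trino.dbapi.connect(host="trino.internal", port=8080,
                         user="validator", catalog="iceberg", schema="public")

for table in TABLES:
    with pg.cursor() as cur:
        cur.execute(f"SELECT count(*) FROM {table}")
        pg_count = cur.fetchone()[0]
    cur2 = tr.cursor()
    cur2.execute(f"SELECT count(*) FROM {table}")
    trino_count = cur2.fetchone()[0]
    status = "OK" if pg_count == trino_count else "MISMATCH"
    print(f"{table}: postgres={pg_count} iceberg={trino_count} [{status}]")
```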
Results & Impact
Client
Confidential SaaS Platform (Series A, Europe)
Frequently Asked Questions
Does CDC work with all PostgreSQL versions?
Debezium supports PostgreSQL 9.6+. You need the logical replication feature enabled (wal_level = logical) and either the pgoutput plugin (built-in since PG10) or decoderbufs. AWS RDS and Aurora PostgreSQL both support this configuration.
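A quick preflight check for the settings described above, assuming `psycopg2` and hypothetical connection parameters:

```python
# Verify that the Postgres instance is ready for logical replication.
# wal_level must be 'logical' before Debezium can create a replication slot.
import psycopg2

conn = psycopg2.connect(host="pg-primary.internal", dbname="app",
                        user="postgres", password="...")
with conn.cursor() as cur:
    cur.execute("SHOW wal_level")
    wal_level = cur.fetchone()[0]
    cur.execute("SHOW max_replication_slots")
    max_slots = cur.fetchone()[0]

print(f"wal_level={wal_level} (need 'logical'), max_replication_slots={max_slots}")
```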
What happens if Kafka goes down during CDC streaming?
Debezium stores its offset in Kafka Connect's distributed offset store. On restart, it resumes from the last committed offset. Kafka's replication ensures no data loss. The Iceberg sink handles duplicate events idempotently via the Flink CDC merge operator.
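The committed offsets are plain Kafka records and can be inspected directly. A sketch with the `kafka-python` client, assuming the default offsets topic name `connect-offsets` (broker address hypothetical):

```python
# Inspect the positions Debezium will resume from after a restart.
# Keys identify the connector; values carry the last committed WAL position.
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "connect-offsets",
    bootstrap_servers="kafka.internal:9092",
    auto_offset_reset="earliest",
    consumer_timeout_ms=5000,  # stop once the backlog is drained
)

for msg in consumer:
    key = msg.key.decode() if msg.key else None
    value = msg.value.decode() if msg.value else None  # None = tombstone
    print(key, "->", value)
```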
Want similar results for your business?
Let's discuss your project. Free consultation, no obligation.
Start a Conversation