Data Engineering

Migrating 2.4 TB PostgreSQL to a Cloud-Native Lakehouse Without Downtime

CDC-Based Migration with Debezium, Iceberg, and Zero Business Disruption

Data Engineering
PostgreSQL
Apache Iceberg
Debezium
CDC
Migration

The Challenge

A European B2B SaaS company had accumulated 2.4 TB of operational data in a self-managed PostgreSQL 14 cluster on bare metal. Analytics queries ran directly against the production database, causing significant read load, occasional replication lag, and, during quarter-end reporting, perceptible slowdowns for end users. The data team needed a dedicated analytical platform but faced hard constraints: zero downtime for operational workloads and a migration window of less than 30 days.

Our Solution

We designed a CDC-based migration strategy using Debezium for change data capture, avoiding any ETL-style bulk export that would require downtime.

Phase 1 — Initial Snapshot (Week 1)
Used pglogical to create a consistent snapshot replica, then loaded it into S3 as Parquet using Spark on EMR. This took 14 hours and happened entirely in the background.

Phase 2 — CDC Streaming (Weeks 1–3)
Deployed Debezium on Kubernetes, connected it to a dedicated replication slot on the Postgres primary, and streamed WAL events to Kafka topics (one per table). A Flink consumer materialised these events as Iceberg table updates on S3, maintaining full ACID compliance.

Phase 3 — Validation & Cutover (Week 4)
Ran automated row count and checksum validation across all 340 tables. When the Iceberg replica achieved less than 1 second of lag behind production, we switched the analytics workloads to Trino-over-Iceberg and decommissioned the analytics replica on Postgres.
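The Phase 3 validation can be sketched roughly as below: per table, compare a row count plus an order-insensitive checksum between the PostgreSQL source and the Iceberg replica. The function names and the in-memory row lists are illustrative assumptions; the production run fetched rows from PostgreSQL and Trino per table.

```python
import hashlib


def table_checksum(rows):
    """Order-insensitive checksum: MD5 each row, XOR the digests together.

    `rows` is any iterable of tuples, e.g. as returned by a DB cursor.
    XOR makes the result independent of row order, so the two engines
    need not return rows in the same order. (Note: identical duplicate
    rows cancel under XOR, a known limitation of this style of check.)
    """
    acc = 0
    count = 0
    for row in rows:
        digest = hashlib.md5(repr(row).encode("utf-8")).digest()
        acc ^= int.from_bytes(digest, "big")
        count += 1
    return count, acc


def validate_table(source_rows, replica_rows):
    """True when row count and checksum match between source and replica."""
    return table_checksum(source_rows) == table_checksum(replica_rows)
```

In practice this would run once per table (340 times here), with any mismatch blocking the cutover.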

Results & Impact

Zero downtime achieved — production PostgreSQL untouched during migration
Query performance for analytics improved by 8–40× (Trino columnar reads vs PostgreSQL row scans)
Production database read load reduced by 65% (analytics traffic fully offloaded)
Storage cost reduced by 40% (Parquet columnar compression vs PostgreSQL heap files)
Migration completed in 22 days — 8 days ahead of the 30-day target

Client

Confidential SaaS Platform (Series A, Europe)

Frequently Asked Questions

Does CDC work with all PostgreSQL versions?

Debezium supports PostgreSQL 9.6+. You need the logical replication feature enabled (wal_level = logical) and either the pgoutput plugin (built-in since PG10) or decoderbufs. AWS RDS and Aurora PostgreSQL both support this configuration.
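That version logic can be captured in a small pre-flight helper like the following. The function name and return values are illustrative, not part of any Debezium API:

```python
def choose_decoding_plugin(pg_major_version, wal_level):
    """Pick a Debezium logical-decoding plugin for a PostgreSQL setup.

    Debezium requires wal_level = logical. The built-in pgoutput plugin
    ships with PostgreSQL 10 and newer; the oldest supported version
    (9.6) needs an external plugin such as decoderbufs.
    """
    if wal_level != "logical":
        raise ValueError("set wal_level = logical before deploying Debezium")
    if pg_major_version < 9.6:
        raise ValueError("Debezium supports PostgreSQL 9.6 and newer")
    return "pgoutput" if pg_major_version >= 10 else "decoderbufs"
```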

What happens if Kafka goes down during CDC streaming?

Debezium stores its offset in Kafka Connect's distributed offset store. On restart, it resumes from the last committed offset. Kafka's replication ensures no data loss. The Iceberg sink handles duplicate events idempotently via the Flink CDC merge operator.
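The idempotent-merge behaviour can be illustrated with a toy materialiser: each event is an upsert or delete keyed by primary key, so replaying an event after a restart leaves the table state unchanged. This only mimics the effect of the Flink-to-Iceberg merge; the real sink commits Iceberg snapshots, not an in-memory dict, and the simplified event shape below is an assumption.

```python
def apply_event(table, event):
    """Apply one CDC event to an in-memory table keyed by primary key.

    `event` mirrors a simplified Debezium payload: op "c"/"u" (create or
    update, carrying the full row in "after") or "d" (delete). Because
    each op is a keyed upsert or delete, applying the same event twice
    yields the same state, which is why duplicates are harmless.
    """
    key = event["key"]
    if event["op"] in ("c", "u"):
        table[key] = event["after"]
    elif event["op"] == "d":
        table.pop(key, None)
    return table
```

Replaying the last committed batch after a Kafka Connect restart therefore converges to the same table contents instead of producing duplicate rows.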

Want similar results for your business?

Let's discuss your project. Free consultation, no obligation.

Start a Conversation