PostgreSQL vs YugabyteDB: distributed Postgres-style—amazing or overkill

Most database arguments start like philosophy and end like an incident report. Your app is fine—until you need two things at once: Postgres semantics and global availability. Then the bill comes due. Sometimes it’s a performance bill. Sometimes it’s an operations bill. Sometimes it’s a “why is the checkout page hanging only in São Paulo?” bill.

If you’re trying to decide between “single Postgres, well-run” and “distributed Postgres-style, Raft-backed, everywhere,” this is the production-grounded view: what breaks, what’s expensive, what’s brilliant, and what’s overkill.

What you’re really choosing (it’s not features)

On paper, PostgreSQL and YugabyteDB’s YSQL can both look like “SQL with tables, indexes, transactions, JSON, and a query planner.” That’s not the decision. The decision is:

  • Do you want to scale mostly by making one database instance better (bigger box, faster disks, tuned queries, read replicas, partitioning), or
  • Do you want to scale by distributing the database’s storage and consensus across nodes, accepting higher baseline complexity to buy failure tolerance and horizontal growth?

PostgreSQL is the gold standard for “simple that can be sophisticated.” It lets you run a clean production system where performance work is mostly about indexes, vacuum, memory, and not doing dumb things with connection storms.

YugabyteDB is a distributed SQL database with a Postgres-compatible API (YSQL) built over a distributed storage layer (DocDB) and Raft consensus. It wants to be the answer when your requirements read like a ransom note: “must survive node loss, must scale writes, must run across zones/regions, must be Postgres-ish, must still do ACID.”

The sharp edge: a distributed database changes the physics of your app. Many queries become “a little networked.” Some become “a lot networked.” You’ll also start caring about placement, leaders, tablet splits, and the difference between local and global transactions. You can keep your SQL, but you’re swapping one kind of expertise for another.

Rule of thumb I’ll defend: If your biggest pain is “we can’t keep Postgres up” or “we keep hitting CPU on one writer,” do not start with distributed SQL. Start with boring Postgres done right. If your biggest pain is “our business requires active-active across zones/regions with low RPO and no single-writer bottleneck,” YugabyteDB (or similar) becomes a real option.

Quick history and facts that matter in production

Some context helps cut through marketing. Here are concrete facts that change how you operate these systems:

  1. PostgreSQL’s roots go back to POSTGRES (1986), long before “cloud native” was a twinkle in a VC deck. The modern PostgreSQL project has been shipping reliable releases for decades, and the culture values correctness over novelty.
  2. MVCC in Postgres (multi-version concurrency control) is why readers don’t block writers (mostly). It’s also why vacuum exists, and why “dead tuples” are a real operational thing.
  3. Streaming replication (WAL-based) arrived in Postgres 9.0 and became mainstream over the 9.x era, enabling practical hot standbys. This shaped the default HA pattern: one primary, replicas, failover orchestration.
  4. Postgres physical replication is all-or-nothing at the instance level. It’s great for whole-cluster failover, not for sharding your write load across many primaries without careful app-level partitioning.
  5. Raft became the consensus workhorse of modern distributed systems in the 2010s. It’s popular because it’s easier to reason about than Paxos, not because it’s free. Every write pays some coordination tax.
  6. YugabyteDB uses a distributed storage engine (DocDB) inspired by LSM-tree designs. That means compactions, SSTables, and different IO behaviors than Postgres heap storage.
  7. “Distributed SQL” is a second swing at an old idea. Shared-nothing databases and distributed transactions are not new; what’s new is better automation, better networking, and the willingness to spend more CPU to simplify application design.
  8. Google Spanner (2012) normalized the idea of globally distributed ACID with strong consistency, at the cost of careful time/consensus engineering. Many systems (including the broader market category) borrowed the ambition, even if they differ in implementation.
  9. Postgres extensions are a superpower (PostGIS, pg_stat_statements, etc.). In “Postgres-compatible” systems, extension support is often partial, and that changes what you can instrument and how you solve problems.

Joke #1: A distributed database is just a regular database that has discovered teleportation is unreliable and now holds grudges about it.

Architecture differences that change your pager life

PostgreSQL: one primary reality, plus replicas

In classic Postgres production, you have one writable primary. Replicas follow via WAL. Failover promotes a replica. Reads can be offloaded, but writes are single-node.

This is not a weakness; it’s a design that keeps the core simpler and predictable. With good disks, sane schema design, and connection pooling, Postgres can do a lot. And the failure modes are well-understood: disk saturation, replication lag, bloat, autovacuum misbehavior, checkpoint storms, bad queries, and connection spikes.

The operational win is that you can usually answer “where is the truth?” with “on the primary.” That sounds small until you’re debugging a heisenbug at 3 a.m.
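
If you ever need that answer from inside a session, two standard Postgres functions settle it; the second query is a small sketch meant to run on a replica:

-- Returns false on the primary, true on a replica.
SELECT pg_is_in_recovery();

-- On a replica: roughly how far behind replay is (sketch; run on the standby).
SELECT now() - pg_last_xact_replay_timestamp() AS replay_delay;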

YugabyteDB: distributed storage, tablet leaders, and consensus everywhere

YugabyteDB splits data into tablets (shards). Each tablet is replicated (typically RF=3) and has a leader and followers. Writes go through Raft to commit. Reads may be served from leaders or followers depending on consistency settings and query paths.

The result: you can lose nodes and still serve traffic. You can scale storage and write throughput by adding nodes. But you’ve also introduced:

  • Leader placement issues (latency depends on where leaders are).
  • Hot tablet issues (one shard gets hammered, becomes the bottleneck).
  • Background work (compactions, tablet splitting, rebalancing) that competes with foreground traffic.
  • More ways to be “up” but slow (quorum exists, but tail latency is ugly).

The operational model shifts. You don’t just manage one database process; you manage a small distributed system with its own control plane and failure semantics.

Storage engine differences: heap+WAL vs LSM-ish reality

Postgres stores rows in heap files, uses WAL for durability, and relies on vacuum to reclaim space because MVCC keeps old versions around until they’re safe to remove.

YugabyteDB’s underlying storage behaves more like an LSM-tree engine: writes are log-structured, then compacted. That often means write amplification and compaction debt can become performance problems. It’s not “worse,” it’s different. Your disks and your monitoring need to match the difference.

Transactions: local vs distributed is where your latency goes to live

Postgres transactions are local to the instance. Locking and visibility rules are all in-process. Network is not part of commit latency (unless you opt into synchronous replication, covered below).

In a distributed SQL system, some transactions touch multiple tablets. That means extra coordination. If those tablets are in different zones/regions, now your commit path includes WAN latency. The database can still be correct. Your users will still be annoyed.

Practical implication: In YugabyteDB, data modeling and locality choices (tablespaces/placement, partitioning keys, access patterns) can make the difference between “fast enough” and “why is every request 80 ms slower than yesterday.”
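
A concrete sketch of what those locality choices can look like in YSQL (the table and columns are hypothetical, and the hash/split syntax is the YSQL form as I recall it, so verify against your version):

-- Hash on customer_id so one customer's rows land on one tablet;
-- created_at DESC supports "latest 50 orders for a customer" queries.
CREATE TABLE orders (
    customer_id bigint,
    order_id    bigint,
    created_at  timestamptz NOT NULL DEFAULT now(),
    total       numeric,
    PRIMARY KEY (customer_id HASH, created_at DESC, order_id)
) SPLIT INTO 16 TABLETS;  -- pre-split to avoid early hot tablets (assumed YSQL syntax)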

Latency, consistency, and cost: the triangle you can’t escape

Let’s talk about the constraints you can’t negotiate with:

Latency: baseline matters more than averages

In Postgres, a simple indexed lookup can be a few hundred microseconds to a couple milliseconds on decent hardware, assuming cache hits and a calm system. Tail latency tends to come from IO stalls, lock contention, or garbage work like vacuum falling behind.

In distributed systems, baseline latency includes coordination. Even if the median looks okay, your p95/p99 can jump when leadership moves, when a node has a GC hiccup, when compactions spike IO, or when a cross-tablet transaction hits the slowest participant.

Consistency: what your product people mean vs what your code does

Postgres gives you strong consistency inside the instance, and replication is usually asynchronous unless you configure synchronous replication. That means you choose between “no data loss on primary failure” and “don’t add latency to every commit.” You can do synchronous, but it’s a business decision disguised as a config file.
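
For reference, that “config file business decision” looks roughly like this; a sketch using standard settings, with hypothetical standby names (they must match each replica’s application_name):

-- Require at least one of the named standbys to confirm each commit (quorum syntax, Postgres 10+).
ALTER SYSTEM SET synchronous_standby_names = 'ANY 1 ("pg-replica-1", "pg-replica-2")';
ALTER SYSTEM SET synchronous_commit = 'on';
SELECT pg_reload_conf();  -- both settings apply on reload, no restart needed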

YugabyteDB typically uses synchronous replication across a tablet’s Raft group for writes. That’s a stronger default durability posture, but latency depends on replica placement. You can also configure read consistency tradeoffs. Your system will let you make it fast; it will also let you make it weird.
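
One example of a “make it weird” knob is follower reads: serving slightly stale reads from the nearest follower instead of the leader. The session settings below are from memory, so treat the exact GUC names as an assumption and check your version’s docs:

-- Assumed YSQL session settings for follower reads (names may differ by version).
SET default_transaction_read_only = on;      -- follower reads apply to read-only transactions
SET yb_read_from_followers = true;           -- allow reads from the closest follower
SET yb_follower_read_staleness_ms = 30000;   -- accept data up to ~30 s stale
SELECT count(*) FROM user_profiles;          -- hypothetical read-only query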

Cost: you pay for resilience with extra machines and extra work

A well-run Postgres cluster might be: one beefy primary, two replicas, a connection pooler, and good backups. Your scaling lever is “bigger primary” until it’s not.

A YugabyteDB cluster wants multiple nodes even for small workloads, because redundancy is part of the design. With replication factor 3, your raw storage cost is roughly 3x (plus overhead), and you need headroom for rebalancing and compactions. You’re buying operational smoothness under failure, but you’re paying in nodes, network, and complexity.

Joke #2: The nice thing about consensus algorithms is they guarantee agreement—mostly about how long your deploy is going to take.

A paraphrased idea about reliability (one quote, kept honest)

Paraphrased idea, attributed to John Allspaw: “Your system’s real behavior is what it does under stress and failure, not what the architecture diagram promised.”

Postgres compatibility: what “Postgres-style” buys you—and what it doesn’t

YugabyteDB’s YSQL speaks the Postgres wire protocol and aims for Postgres semantics. That’s valuable: existing drivers, ORMs, and a lot of SQL knowledge transfer.

But “compatible” is not “identical,” and production systems live in the differences. Here are the places teams trip:

  • Extensions: In Postgres, extensions solve real problems (observability, types, indexing). In YugabyteDB, extension support is more limited. If you rely on a specific extension, you might be redesigning instead of migrating.
  • Planner and performance characteristics: Even if a query is valid SQL, execution costs differ. Distributed scans and distributed joins can behave like “fine in staging, expensive in prod” when data size crosses a threshold.
  • Isolation and locking nuance: The goal is Postgres-like behavior, but distributed concurrency control has constraints. Read the fine print on isolation levels and transaction retries.
  • Operational tooling: Postgres has a deep bench of tooling. YugabyteDB has its own tooling, metrics, and operational workflows. You’ll get good visibility, but you’ll relearn some instincts.

If your application uses vanilla SQL, standard indexes, and your biggest problem is scaling beyond a single primary, YugabyteDB can feel like magic. If your application leans heavily on Postgres-specific features and extensions, distributed Postgres-style can feel like moving apartments and discovering your couch doesn’t fit in the new elevator.

Practical tasks: commands, outputs, decisions (12+)

These are the real checks I run (or ask someone to run) when deciding whether Postgres is enough, whether a YugabyteDB cluster is healthy, or why something is slow. Each includes: command, what the output means, and what decision you make.

Task 1 — Postgres: confirm replication state and lag

cr0x@server:~$ psql -h pg-primary -U postgres -d postgres -c "select application_name, state, sync_state, write_lag, flush_lag, replay_lag from pg_stat_replication;"
 application_name |   state   | sync_state | write_lag | flush_lag | replay_lag
------------------+-----------+------------+-----------+-----------+------------
 pg-replica-1      | streaming | async      | 00:00:00  | 00:00:00  | 00:00:01
 pg-replica-2      | streaming | async      | 00:00:00  | 00:00:00  | 00:00:02
(2 rows)

Meaning: Replicas are streaming; replay lag is low seconds. Decision: Read scaling and failover posture look normal. If lag grows, investigate network/disk on replicas or long-running queries delaying replay.

Task 2 — Postgres: find the worst queries by total time

cr0x@server:~$ psql -h pg-primary -U postgres -d appdb -c "select queryid, calls, total_time, mean_time, rows, left(query, 80) as q from pg_stat_statements order by total_time desc limit 5;"
 queryid  | calls | total_time | mean_time | rows  | q
----------+-------+------------+-----------+-------+--------------------------------------------------------------------------------
 98122311 | 12000 | 955432.12  | 79.61     | 12000 | select * from orders where customer_id = $1 order by created_at desc limit 50
 11220091 |  1500 | 440010.55  | 293.34    |  1500 | update inventory set qty = qty - $1 where sku = $2
(2 rows)

Meaning: One query dominates total time; another has high mean latency. (On PostgreSQL 13 and later, the columns are total_exec_time and mean_exec_time.) Decision: Optimize/index the first if it’s hot; investigate contention or a missing index for the update. If this is already “good enough,” you might not need distributed SQL.
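
If that first query is the hot path, the usual fix is a composite index matching its filter and sort order; a sketch, assuming the column names implied by the query (and matching the index name that shows up later in the Task 11 plan):

-- Builds without blocking writes; serves WHERE customer_id = $1 ORDER BY created_at DESC.
CREATE INDEX CONCURRENTLY orders_customer_created_idx
    ON orders (customer_id, created_at DESC);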

Task 3 — Postgres: check bloat pressure and vacuum health

cr0x@server:~$ psql -h pg-primary -U postgres -d appdb -c "select relname, n_dead_tup, n_live_tup, last_vacuum, last_autovacuum from pg_stat_user_tables order by n_dead_tup desc limit 5;"
     relname     | n_dead_tup | n_live_tup |     last_vacuum      |   last_autovacuum
----------------+------------+------------+----------------------+----------------------
 order_events   |   8200000  |  51000000  |                      | 2025-12-30 09:01:12
 sessions       |   1900000  |  12000000  | 2025-12-29 03:11:55  | 2025-12-30 08:57:48
(2 rows)

Meaning: Dead tuples are high; autovacuum is running but may be falling behind. Decision: Tune autovacuum thresholds per table, give autovacuum more resources (cost limits, autovacuum_work_mem), and check for long-running transactions holding back cleanup. Don’t “solve” this by migrating databases.

Task 4 — Postgres: see active locks and who’s blocking

cr0x@server:~$ psql -h pg-primary -U postgres -d appdb -c "select blocked.pid as blocked_pid, blocker.pid as blocker_pid, blocked.query as blocked_query, blocker.query as blocker_query from pg_locks blocked join pg_stat_activity blocked_sa on blocked_sa.pid = blocked.pid join pg_locks blocker on blocker.locktype = blocked.locktype and blocker.database is not distinct from blocked.database and blocker.relation is not distinct from blocked.relation and blocker.page is not distinct from blocked.page and blocker.tuple is not distinct from blocked.tuple and blocker.pid != blocked.pid join pg_stat_activity blocker_sa on blocker_sa.pid = blocker.pid where not blocked.granted;"
 blocked_pid | blocker_pid |           blocked_query           |         blocker_query
------------+-------------+-----------------------------------+--------------------------------
      29411 |       28703 | update inventory set qty = qty-1  | vacuum full inventory
(1 row)

Meaning: A VACUUM FULL (which takes an exclusive lock) is blocking an update; plain vacuum rarely blocks writers. Decision: Avoid VACUUM FULL on hot tables (schedule a maintenance window or use a tool like pg_repack), tune vacuum cost settings, or redesign transaction patterns. If your app is lock-sensitive, distributed won’t magically remove contention.
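
A shorter way to get the same picture on Postgres 9.6+ is pg_blocking_pids(), which saves you the giant self-join above:

-- Sessions that are waiting, and the PIDs blocking them.
SELECT pid,
       pg_blocking_pids(pid) AS blocked_by,
       wait_event_type,
       left(query, 60) AS query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;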

Task 5 — Postgres: confirm checkpoint pressure

cr0x@server:~$ psql -h pg-primary -U postgres -d postgres -c "select checkpoints_timed, checkpoints_req, checkpoint_write_time, checkpoint_sync_time from pg_stat_bgwriter;"
 checkpoints_timed | checkpoints_req | checkpoint_write_time | checkpoint_sync_time
------------------+-----------------+-----------------------+----------------------
             1021 |             980 |               7123456 |               502311
(1 row)

Meaning: Many requested checkpoints; write/sync times are high. Requested (as opposed to timed) checkpoints usually mean WAL fills max_wal_size before checkpoint_timeout fires. Decision: Raise max_wal_size, tune checkpoint_timeout and checkpoint_completion_target, revisit shared_buffers, and ensure the disk can handle write bursts. Again: fix the fundamentals before shopping for a new database.

Task 6 — Postgres: check connection storms and pooler need

cr0x@server:~$ psql -h pg-primary -U postgres -d postgres -c "select state, count(*) from pg_stat_activity group by 1 order by 2 desc;"
        state        | count
---------------------+-------
 idle                |   420
 active              |    65
 idle in transaction |    12
(3 rows)

Meaning: Hundreds of idle sessions; some “idle in transaction” (bad). Decision: Add/confirm connection pooling (PgBouncer), fix app transaction handling, and set statement/idle timeouts. Many “Postgres is slow” complaints are really “too many connections.”
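
Two guardrails worth setting while you sort out the pooler (the role name is hypothetical and the values are illustrative):

-- Kill sessions that sit idle inside a transaction, and cap runaway statements.
ALTER ROLE app_user SET idle_in_transaction_session_timeout = '60s';
ALTER ROLE app_user SET statement_timeout = '30s';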

Task 7 — YugabyteDB: check overall cluster status

cr0x@server:~$ yb-admin -master_addresses yb-master-1:7100,yb-master-2:7100,yb-master-3:7100 list_all_tablet_servers
UUID                                 RPC Host/Port       State
3d6b1f7a-2c3b-4f5f-ae3b-7d4d7bbad001 yb-tserver-1:9100   ALIVE
c4f05c1e-9d2b-43d4-a822-ef6e8f9a0002 yb-tserver-2:9100   ALIVE
a9e7f2c0-8b7f-4b32-a1a9-1d8d1dbb0003 yb-tserver-3:9100   ALIVE
(3 rows)

Meaning: All tablet servers are alive. Decision: If one is missing or DEAD, stop here and stabilize the cluster before doing performance analysis; “slow” often means “degraded replication.”

Task 8 — YugabyteDB: inspect tablet leaders and replication health

cr0x@server:~$ yb-admin -master_addresses yb-master-1:7100,yb-master-2:7100,yb-master-3:7100 list_tablets appdb.orders
Tablet-UUID                             Range                       Leader-UUID                             RF
b8c9d1110c3b4f9ea0d0000000000001        [hash 0x00, 0x55)           3d6b1f7a-2c3b-4f5f-ae3b-7d4d7bbad001   3
b8c9d1110c3b4f9ea0d0000000000002        [hash 0x55, 0xaa)           c4f05c1e-9d2b-43d4-a822-ef6e8f9a0002   3
b8c9d1110c3b4f9ea0d0000000000003        [hash 0xaa, 0xff)           a9e7f2c0-8b7f-4b32-a1a9-1d8d1dbb0003   3

Meaning: Leadership is balanced across tservers. Decision: If leaders concentrate on one node, expect hotspots and uneven CPU. Rebalance leaders or revisit placement/partitioning.

Task 9 — YugabyteDB: check for under-replicated tablets

cr0x@server:~$ yb-admin -master_addresses yb-master-1:7100,yb-master-2:7100,yb-master-3:7100 list_under_replicated_tablets
Tablet-UUID                             Table
(0 rows)

Meaning: No under-replicated tablets. Decision: If this shows entries, fix replication first (disk, network, node health). Under-replication inflates latency and risk.

Task 10 — YugabyteDB (YSQL): find slow queries at the SQL layer

cr0x@server:~$ ysqlsh -h yb-tserver-1 -U yugabyte -d appdb -c "select queryid, calls, total_time, mean_time, left(query, 80) from pg_stat_statements order by mean_time desc limit 5;"
 queryid  | calls | total_time | mean_time | left
----------+-------+------------+-----------+--------------------------------------------------------------------------------
 77190012 |  3300 | 222000.11  | 67.27     | select * from orders where customer_id = $1 order by created_at desc limit 50
(1 row)

Meaning: Same story as Postgres: your app’s hot paths are visible. Decision: Before blaming “distributed,” check indexing and query shape. Distributed systems punish full scans more harshly.

Task 11 — YugabyteDB: validate whether a query is local or distributed (explain)

cr0x@server:~$ ysqlsh -h yb-tserver-1 -U yugabyte -d appdb -c "explain (analyze, dist, costs off) select * from orders where customer_id = 42 order by created_at desc limit 50;"
                               QUERY PLAN
------------------------------------------------------------------------
 Limit (actual time=8.211..8.227 rows=50 loops=1)
   ->  Index Scan using orders_customer_created_idx on orders (actual time=8.210..8.221 rows=50 loops=1)
       Storage Read Requests: 3
       Storage Write Requests: 0
       Storage Execution Time: 6.902 ms
 Planning Time: 0.312 ms
 Execution Time: 8.411 ms
(7 rows)

Meaning: “Storage Read Requests: 3” hints at multiple storage-layer read requests (often multiple tablets). Decision: If this number grows with data size, design for locality (hash partition key, colocate related tables, or avoid cross-partition joins).
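
If small, frequently joined tables keep forcing cross-tablet reads, YugabyteDB’s colocation can keep them on a single tablet. The syntax below is my recollection (treat it as an assumption to verify against your version’s docs), and the names are hypothetical:

-- Assumed YSQL colocation syntax: small lookup tables share one tablet by default,
-- while large tables opt out and stay hash-sharded on their own tablets.
CREATE DATABASE appdb WITH COLOCATION = true;
-- inside appdb:
CREATE TABLE countries (code text PRIMARY KEY, name text);
CREATE TABLE order_events (id bigint PRIMARY KEY, payload jsonb) WITH (COLOCATION = false);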

Task 12 — YugabyteDB: check compaction/LSM pressure via disk and IO symptoms

cr0x@server:~$ iostat -xz 1 3
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          18.22    0.00    6.11   21.40    0.00   54.27

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz aqu-sz  %util
nvme0n1         120.0   8200.0     0.0    0.0    4.10    68.3     950.0  64200.0     0.0    0.0   24.80    67.6   9.10  98.5

Meaning: Disk is pegged; write await is high. In an LSM-ish engine, compactions can drive sustained writes. Decision: Add IO capacity, tune compactions, throttle background work, or add nodes to spread load. If your budget can’t support fast disks, distributed storage will be a bad time.

Task 13 — Network reality check: cross-zone/region latency

cr0x@server:~$ ping -c 5 yb-tserver-3
PING yb-tserver-3 (10.10.3.21) 56(84) bytes of data.
64 bytes from 10.10.3.21: icmp_seq=1 ttl=62 time=18.9 ms
64 bytes from 10.10.3.21: icmp_seq=2 ttl=62 time=19.4 ms
64 bytes from 10.10.3.21: icmp_seq=3 ttl=62 time=18.7 ms
64 bytes from 10.10.3.21: icmp_seq=4 ttl=62 time=20.1 ms
64 bytes from 10.10.3.21: icmp_seq=5 ttl=62 time=19.2 ms

--- yb-tserver-3 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4006ms
rtt min/avg/max/mdev = 18.7/19.2/20.1/0.5 ms

Meaning: ~19 ms RTT is fine for humans, expensive for consensus. Decision: If Raft quorums span these links, commit latency will inherit it. Keep quorums within a low-latency domain when possible; use geo-partitioning strategies rather than pretending physics doesn’t exist.

Task 14 — Postgres: verify your backups are real (not aspirational)

cr0x@server:~$ pgbackrest --stanza=appdb check
stanza: appdb
status: ok
cipher: none
db (current)
  wal archive min/max (0000000100002A9F000000D1/0000000100002A9F000000F4) ok
  database size 145.3GB, backup size 52.1GB

Meaning: WAL archiving works; backups are consistent. Decision: If this fails, fix backups before scaling anything. Reliability is not a database feature; it’s an operating habit.

Task 15 — YugabyteDB: basic tablet count and skew check

cr0x@server:~$ yb-admin -master_addresses yb-master-1:7100,yb-master-2:7100,yb-master-3:7100 list_tablet_servers
UUID                                 Host           Tablets  Leaders
3d6b1f7a-2c3b-4f5f-ae3b-7d4d7bbad001 yb-tserver-1   620      240
c4f05c1e-9d2b-43d4-a822-ef6e8f9a0002 yb-tserver-2   610      210
a9e7f2c0-8b7f-4b32-a1a9-1d8d1dbb0003 yb-tserver-3   590      170

Meaning: Tablet and leader distribution is uneven. Decision: Investigate rebalancer settings and leader movement. Skew often maps directly to skewed CPU and tail latency.

Fast diagnosis playbook: find the bottleneck without reading tea leaves

When something is “slow,” you need a triage order. Not a ten-page runbook. A three-pass filter that narrows the problem fast.

First: is the system degraded or just busy?

  • Postgres: check replication health and disk saturation. If the primary is IO-bound, everything else is downstream noise.
  • YugabyteDB: check node liveness and under-replication. A cluster can be “up” while Raft groups are unhealthy, and that’s when latency goes feral.

Second: is it one query/path or everything?

  • Look at pg_stat_statements (available in Postgres and, via the YSQL layer, in YugabyteDB) to find the top consumers by mean and total time.
  • If one query dominates, you likely have a schema/index/query problem.
  • If everything is slow, you likely have a resource bottleneck (CPU, IO, network) or a systemic behavior (vacuum/compaction, connection storms, leader imbalance).

Third: is latency coming from compute, IO, or network?

  • Compute: high CPU, run queue; look for hot leaders (YB) or bad plans (PG).
  • IO: high iowait, high await, device utilization near 100%.
  • Network: cross-zone RTT, packet loss, or a design that forces quorum across distance.

Decision shortcut: If you can attribute p95 latency to network RTT and distributed commit paths, you need data locality changes—not “more nodes.” If you can attribute it to a few queries, you need indexes and better access patterns—not a new database.

Three corporate mini-stories from the trenches

1) Incident caused by a wrong assumption: “read replicas will save us”

A mid-sized SaaS company ran Postgres with two read replicas. The app was mostly reads, so the team moved “heavy endpoints” to replicas and felt accomplished. For a while, everything was fine. Then they launched a feature that wrote an audit row on every read (compliance folks were thrilled).

The assumption was that the endpoint was “read-heavy,” so it stayed routed to replicas. But the ORM wrote audit events in the same request path. On replicas, those writes failed, retried, and fell back to the primary. The primary suddenly saw a surge of writes, plus a surge of connection churn from retries. CPU spiked; lock waits followed; p95 turned into p999.

The incident was noisy because nothing was “down.” The primary was alive. Replication was alive. The user experience was just miserable. The app logs were full of misleading “could not execute statement” warnings that looked like transient network issues.

The fix was embarrassingly straightforward: route all endpoints that can write—even “a tiny write”—to the primary, and separate audit ingestion into an async pipeline with a buffer. They also added a guardrail: in the app, any DB connection tagged as read-only would reject writes at the driver layer.

Takeaway: Scaling reads with replicas is real, but “read-only” must be enforced. And if your product roadmap quietly changes read paths into write paths, your architecture needs to notice.

2) Optimization that backfired: “let’s crank up parallelism and batch everything”

An e-commerce platform hit checkout slowness and decided to “optimize the database.” They increased batch sizes for inventory updates and ran more workers in parallel. The idea: fewer transactions, more throughput, less overhead.

On Postgres, the bigger transactions held locks longer. Deadlocks increased. Autovacuum started lagging because long-running transactions prevented cleanup. Bloat crept in; indexes grew; cache hit rate fell; IO went up. The system became slower for everyone, not just checkout.

Later, they tested a distributed SQL option (YugabyteDB) and applied the same instinct: bigger batches, more parallel workers. This time the backfire looked different. Hot shards formed around popular SKUs. A few tablets became leaders for the busiest keys, and those tablet leaders were hammered. Compactions spiked on the hot nodes. Latency became spiky in a way the team wasn’t used to: “mostly fine, occasionally awful.”

They recovered by doing the opposite of what felt “efficient”: smaller transactions, tighter key distribution, and per-SKU contention controls. They also introduced idempotency and sane retry policies because distributed transactions will ask for retries when conflicts happen.

Takeaway: “Batch more” is not a universal performance trick. It changes lock duration in Postgres and hotspot intensity in distributed systems. Sometimes the optimization is just moving the pain into a more expensive shape.

3) Boring but correct practice that saved the day: rehearsed restores

A payments-adjacent company ran Postgres for core ledger data and experimented with YugabyteDB for a globally available user profile service. They weren’t glamorous about operations. They were strict.

Every month, they ran a restore drill. Not a theoretical one. An actual “restore to a new environment and run a verification suite” drill. They validated WAL continuity, checked row counts, and ran a set of invariants: balances sum correctly, foreign keys match, recent writes exist.

One day, a storage subsystem bug corrupted a subset of blocks on a replica. The primary was still fine, but the failover candidate was now questionable. Because they practiced, the team didn’t guess. They immediately removed the suspect replica from promotion and restored a clean replica from backup while keeping the primary stable.

Two months later, when a separate incident required promoting a replica during a maintenance window gone sideways, they knew exactly which nodes were safe and how long it would take to rebuild. Nobody improvised on production data. Nobody got creative with `rm -rf`.

Takeaway: Practice restores. It’s unsexy, it’s time-consuming, and it prevents the kind of “we lost data and also our dignity” outage that ends careers.

Common mistakes: symptom → root cause → fix

1) Symptom: Postgres p95 latency spikes every few minutes

Root cause: Checkpoint bursts and IO saturation (often combined with small checkpoint_timeout and insufficient disk throughput).

Fix: Raise max_wal_size and checkpoint_timeout, set checkpoint_completion_target, ensure WAL and data are on fast storage, and watch pg_stat_bgwriter. Confirm the OS isn’t swapping.
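
A minimal sketch of those knobs (sizes are illustrative, not recommendations):

-- Requested checkpoints usually mean WAL filled max_wal_size before checkpoint_timeout.
ALTER SYSTEM SET max_wal_size = '8GB';
ALTER SYSTEM SET checkpoint_timeout = '15min';
ALTER SYSTEM SET checkpoint_completion_target = 0.9;
SELECT pg_reload_conf();  -- all three apply on reload, no restart needed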

2) Symptom: Postgres tables keep growing, performance degrades over days

Root cause: Vacuum can’t keep up (autovacuum thresholds too high, long transactions, or insufficient maintenance_work_mem).

Fix: Tune autovacuum per hot table, shorten transaction lifetimes, give autovacuum more memory and a higher cost limit (autovacuum_work_mem, autovacuum_vacuum_cost_limit), and monitor n_dead_tup.
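
Per-table tuning looks roughly like this (thresholds are illustrative; the table name reuses the churn-heavy one from Task 3’s output):

-- Vacuum after ~2% dead tuples instead of the default 20%,
-- and let autovacuum do more work per cycle on this table.
ALTER TABLE order_events SET (
    autovacuum_vacuum_scale_factor = 0.02,
    autovacuum_vacuum_cost_limit   = 2000
);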

3) Symptom: “We added read replicas but writes got slower”

Root cause: Connection explosion and pool misconfiguration; replicas didn’t reduce primary work because app still hits primary for writes and many reads.

Fix: Use a pooler (PgBouncer), reduce max connections, and move truly read-only workloads. Validate by measuring primary CPU/IO before/after.

4) Symptom: YugabyteDB is “up” but latency is terrible after a node loss

Root cause: Under-replicated tablets or ongoing re-replication; Raft groups are working harder, and leaders may have moved.

Fix: Check under-replicated tablets, stabilize disks/network, allow rebalancing to finish, and avoid running large schema changes during recovery.

5) Symptom: YugabyteDB has one node with high CPU and the rest look bored

Root cause: Leader concentration or hotspot tablets driven by skewed key access patterns.

Fix: Rebalance leaders, increase tablet count appropriately, and redesign partition keys to spread writes. Consider colocating or geo-partitioning for locality.

6) Symptom: Multi-region YugabyteDB writes feel “randomly slow”

Root cause: Quorum spans high-latency links; commit path includes WAN RTT, and leader placement may be suboptimal.

Fix: Keep Raft quorums within a region/zone where possible, use follower reads or geo-partitioning for reads, and set expectations: strong global writes cost latency.

7) Symptom: Application sees frequent transaction retries in YugabyteDB

Root cause: High contention keys, hot rows, or long transactions increasing conflict probability.

Fix: Reduce contention (shard counters, avoid “global sequence row”), shorten transactions, implement retry with jitter, and ensure idempotency.
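
The “shard the counter” trick in plain SQL, as a sketch (names and shard count are illustrative):

-- Spread one logical counter across 16 rows so increments don't all hit the same key.
CREATE TABLE counter_shards (
    name  text,
    shard int,
    value bigint NOT NULL DEFAULT 0,
    PRIMARY KEY (name, shard)
);

INSERT INTO counter_shards (name, shard)
SELECT 'orders_placed', s FROM generate_series(0, 15) AS s;

-- Increment a random shard instead of a single hot row.
UPDATE counter_shards
SET value = value + 1
WHERE name = 'orders_placed' AND shard = floor(random() * 16)::int;

-- Read the total by summing the shards.
SELECT sum(value) FROM counter_shards WHERE name = 'orders_placed';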

8) Symptom: “We migrated and now analytics queries are worse”

Root cause: Distributed scans and joins are expensive; data locality and indexing aren’t aligned with analytic access patterns.

Fix: Separate OLTP from analytics, use materialized views/ETL, or keep analytics on Postgres/warehouse. Distributed OLTP is not a free analytics engine.
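
If some analytics must stay in the OLTP database, pre-aggregation is usually the least painful compromise. A sketch, assuming a hypothetical orders table with a total column:

-- Roll up daily totals once, refresh on a schedule instead of scanning orders per request.
CREATE MATERIALIZED VIEW daily_order_totals AS
SELECT date_trunc('day', created_at) AS day,
       count(*)                      AS orders,
       sum(total)                    AS revenue
FROM orders
GROUP BY 1;

-- A unique index lets REFRESH ... CONCURRENTLY avoid blocking readers.
CREATE UNIQUE INDEX ON daily_order_totals (day);
REFRESH MATERIALIZED VIEW CONCURRENTLY daily_order_totals;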

Checklists / step-by-step plan

Step-by-step: decide if Postgres is enough

  1. Measure, don’t vibe: enable pg_stat_statements and capture 7–14 days of workload shape (top queries by total/mean time, call counts).
  2. Fix the obvious: missing indexes, bad ORM patterns, N+1 queries, “idle in transaction,” and connection pool misconfig.
  3. Get vacuum under control: per-table autovacuum tuning for churn-heavy tables; confirm long transactions aren’t blocking cleanup.
  4. Confirm disk headroom: iostat and filesystem utilization; check for checkpoint pressure and WAL write saturation.
  5. Implement HA the boring way: at least one replica, automated failover, backups with restore drills, and a pooler.
  6. Scale reads properly: replicas for read-only, caching where safe, and reduce round trips.
  7. Scale writes honestly: if one primary is the limit, evaluate partitioning/sharding at app layer or consider distributed SQL.

Step-by-step: decide if YugabyteDB is justified

  1. Write down the non-negotiables: “survive zone loss,” “active-active,” “write scaling,” “RPO/RTO.” If you can’t articulate these, you’re shopping for dopamine.
  2. Model latency budget: what p95 do you need, and what RTT exists between placements? If your commit path spans 20–50 ms RTT, you will feel it.
  3. Prototype the hot path: not a generic benchmark. Your actual top 5 queries, your actual transaction shapes, your actual indexes.
  4. Design for locality: choose partition keys and placement to keep most transactions local. Treat cross-region distributed transactions as a special case.
  5. Plan retries: implement idempotency and transaction retry handling in the app. Distributed systems don’t ask; they tell.
  6. Budget for ops: monitoring for leader skew, tablet splits, compactions, and node health. Assign ownership; “the vendor will handle it” is not an on-call plan.

Migration checklist (if you must)

  1. Inventory Postgres features used: extensions, custom types, triggers, logical replication, stored procedures.
  2. Identify compatibility gaps early; rewrite the risky parts first.
  3. Set up dual writes or CDC in a staging environment; validate correctness with invariants.
  4. Run load tests that include failure: kill a node, add latency, fill disks.
  5. Define rollback criteria and rehearse it. If rollback is “we’ll figure it out,” you won’t.

FAQ

1) Is YugabyteDB “just Postgres but distributed”?

No. YSQL aims for Postgres compatibility at the SQL and protocol layers, but the storage and replication model is different. That changes performance and failure modes.

2) Can I get active-active with Postgres?

Not in the straightforward “multi-writer with strong consistency” sense. You can do logical replication, multi-primary patterns, or app-level sharding, but each comes with sharp edges and conflict handling. For true distributed writes without app-sharding, you’re in distributed SQL territory.

3) Will YugabyteDB automatically be faster than Postgres?

No. For many OLTP workloads that fit on one good Postgres primary, Postgres will be faster and simpler. YugabyteDB buys you resilience and horizontal scaling, not free latency reduction.

4) What’s the most common reason YugabyteDB migrations fail?

Assuming “Postgres-compatible” means “drop-in replacement,” and then discovering that an extension, query pattern, or operational expectation doesn’t translate cleanly. Compatibility is real, but not magical.

5) When is distributed SQL actually the right choice?

When your business requirements demand high availability across failure domains, scaling writes beyond a single node, and you can’t or won’t shard in the application. Also when downtime costs more than extra nodes.

6) How do retries change application design in distributed systems?

You need idempotency for write operations and structured retry logic with jitter and bounded attempts. Otherwise, retries turn transient conflicts into thundering herds.

7) Does multi-region always mean “users everywhere get low latency”?

No. Multi-region can mean “available everywhere,” not “fast everywhere.” Strongly consistent writes across regions pay WAN latency. To get low latency everywhere, you usually need locality strategies and sometimes relaxed consistency for reads.

8) Can I use YugabyteDB for analytics too?

You can run analytical queries, but distributed OLTP engines often make large scans and joins expensive. Many teams keep OLTP in Postgres or YugabyteDB and move analytics to a dedicated system.

9) What should I monitor that’s different between them?

Postgres: vacuum progress, bloat, locks, checkpoints, replication lag, connection counts. YugabyteDB: tablet health, leader distribution, under-replication, compaction debt, rebalancing, and cross-node latency.

10) If I’m small today, should I “future-proof” with YugabyteDB?

Usually no. Run Postgres well first. The best “future-proofing” is clean schema design, query discipline, and operational hygiene. Migrations are always harder than people budget for.

Practical next steps

If you want a decision that holds up under production stress, do this in order:

  1. Make your current Postgres boringly excellent: pg_stat_statements, connection pooling, vacuum tuning, sane timeouts, verified backups, rehearsed restores.
  2. Quantify the scaling wall: is it CPU, IO, lock contention, or single-writer throughput? Bring graphs, not opinions.
  3. Write down the availability requirement in operational terms: what failure domains must you survive, what RPO/RTO, and what user-facing latency budget.
  4. If you truly need distributed writes and multi-domain resilience: prototype YugabyteDB using your real workload, measure tail latency, and practice failure (node loss, disk pressure, latency injection).
  5. Choose what you can operate: the best database is the one your team can keep fast, correct, and calm at 2 a.m.

Postgres is still the default because it’s predictable and deeply understood. YugabyteDB is compelling when the problem statement is genuinely distributed. If you’re not sure which problem you have, you probably have a Postgres problem.
