You bought “three small servers” because budgets love threes. Then you deployed a distributed SQL database because marketing loves the word “resilient.”
Now your dashboard’s p95 latency looks like it started taking the scenic route, and every incident review includes the phrase “but it worked fine locally.”
This is the latency tax nobody puts on the slide: distributed correctness has a very real per-query cost, and small servers are where that cost becomes loud.
Not catastrophic. Just loud enough to ruin your sleep.
The latency tax: what you pay for consensus
If you remember one thing, make it this: MySQL’s default OLTP path is “do the write locally, then replicate later.”
CockroachDB’s default OLTP path is “coordinate the write with a quorum, then commit.”
On a decent network and healthy disks, quorum writes are fine. On small servers with commodity storage and 1Gbps networking (or worse: virtualized “best effort”),
quorum writes become a latency amplifier. Not always in median latency. In tail latency. The kind users notice and SREs inherit.
The operational difference isn’t subtle:
- MySQL: A single primary can be extremely fast. It is also a single point where physics is kind and correctness is local. Replicas are for read scale and failover. Consistency is a policy choice.
- CockroachDB: Every write participates in consensus for the range it touches. Consistency is the default, not a bolt-on. Latency is the fee.
A practical framing: if your app is sensitive to p99 latency and you’re running small nodes, distributed SQL can be like hiring a committee to approve every email you send.
One quote that survives a lot of incident reviews: "Everything fails, all the time" (Werner Vogels).
He wasn’t talking about your three-node cluster specifically, but he might as well have been.
Where the tax shows up (and why it surprises people)
The surprise isn’t that a distributed system has overhead. The surprise is where it manifests.
You’ll look at CPU and see plenty of headroom. You’ll look at average query time and feel okay. Then you’ll open the histogram and see p95/p99 spiking.
Common places the tax appears:
- Quorum writes: a commit needs acknowledgements from a majority of replicas for the range. That’s at least one round trip, sometimes more.
- Read consistency: strongly consistent reads can need coordination (leaseholder placement matters). Stale reads can be faster, but you must opt in and understand the consequences (see the follower-read sketch after this list).
- Range splits and rebalances: background work isn’t “free.” On small boxes, it competes directly with foreground latency.
- Compactions: LSM-based storage engines trade write amplification for read efficiency. That amplification becomes a disk and CPU tax at inconvenient times.
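In CockroachDB, that opt-in usually takes the shape of a follower read: you accept a few seconds of staleness so a nearby replica can answer instead of the leaseholder. A minimal sketch, assuming a placeholder table app.orders with placeholder columns, using the same insecure-mode CLI as the tasks below (follower-read availability can depend on version and licensing, so verify on your cluster):
cr0x@server:~$ cockroach sql --insecure -e "SELECT id, status FROM app.orders AS OF SYSTEM TIME follower_read_timestamp() WHERE customer_id = 42;"
The clause buys you exactly the coordination you are choosing not to pay for: no leaseholder hop, bounded staleness instead.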
Two numbers that should haunt your sizing spreadsheet
You can’t “optimize” away physics, only route around it.
- Network RTT between nodes: every millisecond matters when your commit path includes cross-node coordination. A 1–2ms RTT sounds fine until you need multiple hops under load.
- Storage fsync latency: if your storage layer occasionally turns fsync into a 20–100ms adventure, distributed consensus will faithfully incorporate that adventure into user-visible latency.
Joke #1: A three-node cluster on tiny VMs is like a three-person relay race where everyone insists on tying their shoes between handoffs.
Facts and history you can use in arguments
These aren’t trivia for a pub quiz. They explain why the systems behave the way they do.
- MySQL shipped in the mid-1990s and became the default for web OLTP because it was easy to deploy, easy to replicate, and fast on a single box.
- InnoDB became the default MySQL storage engine (after years of MyISAM dominance), bringing crash recovery, row-level locking, and real transactions to mainstream deployments.
- Most MySQL HA patterns historically favored asynchronous replication because it’s simple and keeps primary latency low; the tradeoff is potential data loss on failover.
- Consensus algorithms like Raft were designed to make distributed consensus easier to understand and implement than Paxos, not free in latency.
- CockroachDB was built with Raft from the beginning, aiming for PostgreSQL-like SQL semantics with a shared-nothing, geo-distributed design.
- Google Spanner popularized “distributed SQL with strong consistency” using TrueTime; CockroachDB targets similar goals without specialized clock hardware, which affects how uncertainty and transaction retries show up.
- LSM-tree storage engines became common in distributed databases because they handle high write throughput well, but they require compaction—i.e., background I/O that can bite tail latency.
- The CAP theorem era made “consistency vs availability” a boardroom conversation; distributed SQL systems typically pick consistency and partition tolerance, forcing availability compromises during partitions.
- p99 latency became an SRE obsession because user experience is dominated by tail behavior, not averages; distributed coordination is a classic tail-latency generator.
Why small servers make distributed SQL look worse
“Small servers” usually means some combination of: fewer cores, less RAM, slower NVMe (or worse: network-attached), less predictable neighbors, and a network that was sized for “normal” traffic.
Distributed SQL does not hate small servers. It just refuses to pretend they are big servers.
Small nodes don’t just reduce capacity; they reduce slack
Slack is the invisible margin that keeps tail latency stable.
On a big machine, compactions, flushes, or minor network jitter might be absorbed.
On a small node, the same background work competes with foreground traffic and wins more often than you’d like.
Three-node clusters are operationally seductive (and mechanically harsh)
Three nodes is the minimum shape that looks like “a cluster.” It’s also the minimum shape where any disturbance can become user-visible.
Lose one node and you’re in “no more failures please” mode.
- MySQL on three nodes typically means 1 primary + 2 replicas. The primary stays hot; replicas are mostly passengers until failover.
- CockroachDB on three nodes means all three are in the write path (quorum). Every node’s health matters every minute.
The quiet killer: unevenness
Small fleets are rarely homogeneous. One node is on a slightly noisier hypervisor.
One disk is slightly slower.
One NIC is sharing interrupts with something rude.
Distributed systems tend to run at the pace of their worst participant—especially at p99.
MySQL: the low-latency baseline with sharp edges
MySQL’s superpower is boring: one machine, one commit log, one cache hierarchy.
If your workload fits on one primary, MySQL can deliver very low latency with predictable behavior.
You can absolutely run MySQL badly, but you have to work at it.
What MySQL is great at (on small servers)
- Low-latency single-row writes when your schema and indexes are sane.
- Stable p95/p99 when storage is decent and you avoid contention hotspots.
- Operational simplicity for teams that can tolerate a single primary and have a clear failover plan.
MySQL’s failure modes are not subtle
- Primary saturation: you don’t get graceful degradation; you get a queue.
- Replication lag: read scaling and failover safety depend on a lagging async stream unless you pay the sync cost.
- Failover correctness: split brain is avoidable, but not automatic; you need tooling and discipline.
A MySQL reality check
If you need strong consistency across multiple writers in multiple locations, MySQL won’t magically become a distributed database.
You can build that with semi-sync replication, group replication, or external coordination—but you’ll earn every pager.
CockroachDB: correctness at scale, even when you didn’t need scale
CockroachDB’s pitch is simple: keep SQL, lose the single-primary fragility.
Writes are replicated. Reads can be served from the “right” place. Failovers are automated and fast.
It is real engineering, not fairy dust.
What CockroachDB is great at (even on small servers)
- Surviving node failures without manual failover rituals.
- Scaling reads and writes by adding nodes (within reason and with attention to locality).
- Consistent semantics that keep application logic simpler than hand-rolled replication correctness.
What CockroachDB demands from you
- Respect for latency: you’re coordinating, so measure network and disk like you mean it.
- Understanding of contention: hotspots don’t disappear in a cluster; they become distributed hotspots with retries.
- Careful locality decisions: leaseholders and replicas must be placed intentionally if you care about latency.
Transaction retries: the “feature” that feels like a bug
CockroachDB will retry transactions under contention or uncertainty. That’s part of how it preserves correctness.
The symptom is application-level errors (or driver retries) and latency spikes that look random until you correlate them with contention metrics.
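If your driver or framework does not retry for you, the documented client-side pattern is a savepoint named cockroach_restart plus a retry when the server returns SQLSTATE 40001. A minimal SQL sketch; the UPDATE and its table are placeholders:
BEGIN;
SAVEPOINT cockroach_restart;
-- transactional work goes here (placeholder statement)
UPDATE app.orders SET status = 'paid' WHERE id = 1001;
-- if every statement succeeded:
RELEASE SAVEPOINT cockroach_restart;
COMMIT;
-- if anything failed with SQLSTATE 40001 ("restart transaction"):
-- ROLLBACK TO SAVEPOINT cockroach_restart; then re-issue the work and attempt RELEASE/COMMIT again
The syntax matters less than the policy: retries must be bounded and measured, because every retry is latency your user is already experiencing.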
Joke #2: Distributed SQL is the only product that can sell you “no single point of failure” and still fail your query because two clocks disagreed politely.
Workload-by-workload: who wins and why
Simple OLTP (single-row reads/writes, tight p99 SLO)
On small servers, MySQL usually wins on raw latency. The commit is local. The cache is local. The code path is mature and short.
CockroachDB can meet reasonable SLOs here, but you’ll feel the quorum and storage engine costs more, especially if your nodes are underpowered or your network is twitchy.
If your product lives or dies on 5–20ms p99 for small transactions, don’t casually swap in a consensus-based database and hope it “tunes out.”
High availability with automatic failover
CockroachDB wins on operational behavior. If you’re not staffed to run crisp MySQL failover drills, you will eventually fail over badly.
CockroachDB makes the “boring” thing (node loss) boring.
MySQL can be highly available, but you must implement and rehearse it. The technology exists. The organizational consistency often doesn’t.
Write scaling (multiple writers, horizontal growth)
MySQL’s horizontal write scaling is not its native comfort zone.
You can shard, but then your app becomes a routing layer. That’s a legitimate architecture, but it’s a commitment.
CockroachDB is designed to scale writes by splitting ranges and distributing them. On small nodes, you may still be limited by per-node disk and CPU,
but at least the design goal matches the requirement.
Read scaling
MySQL read replicas are blunt but effective. They’re also operationally forgiving: if a replica lags, you route around it.
CockroachDB can distribute reads too, but you must understand leaseholders and locality; otherwise reads bounce around the cluster and pay extra hops.
Multi-region latency
If you’re truly multi-region, the speed of light becomes your product manager. Strong consistency across oceans is expensive.
CockroachDB gives you tools to place data and control read/write locality, but if you demand single-digit milliseconds globally, your demand is wrong.
MySQL in multi-region often becomes a primary-in-one-region truth with async replicas elsewhere. That’s fast locally and “eventually correct” remotely.
Sometimes that’s exactly what you want.
Schema evolution and operational friction
Both have schema change pitfalls. MySQL has decades of operational folk wisdom for online schema changes.
CockroachDB supports online schema changes but can still surprise you with backfills and their impact on small clusters.
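On CockroachDB, those backfills at least show up as jobs you can watch and, if needed, pause. For example:
cr0x@server:~$ cockroach sql --insecure -e "SELECT job_id, job_type, status, fraction_completed FROM [SHOW JOBS] WHERE job_type = 'SCHEMA CHANGE';"
If a backfill lands in peak hours on a small cluster, PAUSE JOB now and RESUME JOB off-peak is often the cheapest mitigation available.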
Practical tasks: commands, outputs, and the decision they drive
You don’t fix distributed latency with vibes. You fix it with measurements that point to the slowest part of the commit path:
network, disk, contention, or topology.
Below are concrete tasks. Each one includes (1) a command, (2) representative output, (3) what it means, and (4) the decision you make.
Run them on your real nodes. Preferably during a load test that resembles production, not during a quiet Tuesday afternoon.
Task 1: Confirm network RTT between database nodes
cr0x@server:~$ ping -c 5 db-node-2
PING db-node-2 (10.0.0.12) 56(84) bytes of data.
64 bytes from 10.0.0.12: icmp_seq=1 ttl=64 time=0.78 ms
64 bytes from 10.0.0.12: icmp_seq=2 ttl=64 time=0.91 ms
64 bytes from 10.0.0.12: icmp_seq=3 ttl=64 time=1.02 ms
64 bytes from 10.0.0.12: icmp_seq=4 ttl=64 time=0.85 ms
64 bytes from 10.0.0.12: icmp_seq=5 ttl=64 time=0.80 ms
--- db-node-2 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4092ms
rtt min/avg/max/mdev = 0.78/0.87/1.02/0.09 ms
Meaning: Sub-millisecond to ~1ms RTT is decent for a small cluster. If you see 3–10ms inside one “datacenter,” expect p99 pain under quorum writes.
Decision: If RTT is high or jittery, fix network placement (same rack/zone), NIC offload settings, and noisy-neighbor issues before blaming the database.
Task 2: Measure TCP latency and retransmits under load
cr0x@server:~$ ss -ti dst 10.0.0.12:26257 | head -n 12
ESTAB 0 0 10.0.0.11:52144 10.0.0.12:26257
cubic wscale:7,7 rto:204 rtt:1.3/0.4 ato:40 mss:1448 pmtu:1500 rcvmss:1448 advmss:1448 cwnd:10 bytes_acked:812345 segs_out:12034 segs_in:11877 send 89.2Mbps lastsnd:12 lastrcv:12 lastack:12 pacing_rate 178Mbps retrans:3/12034
Meaning: rtt shows observed RTT and variance. retrans indicates packet loss or congestion.
Decision: If retransmits climb during spikes, your tail latency isn’t “database tuning.” It’s congestion. Fix QoS, bandwidth, MTU mismatches, or noisy east-west traffic.
Task 3: Check disk latency distribution (the real villain)
cr0x@server:~$ iostat -x 1 5
Linux 6.5.0 (db-node-1) 12/30/2025 _x86_64_ (8 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
18.20 0.00 6.40 3.60 0.00 71.80
Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz aqu-sz %util
nvme0n1 210.0 8120.0 0.0 0.0 1.20 38.7 480.0 24576.0 0.0 0.0 14.80 51.2 7.40 92.0
Meaning: w_await at ~15ms with %util ~92% means writes are queuing. That’s where p99 goes to die.
Decision: If disk util is consistently high or await spikes, move to faster storage, separate WAL devices, reduce write amplification, or lower compaction pressure before changing SQL knobs.
Task 4: Validate filesystem and mount options
cr0x@server:~$ findmnt -no TARGET,SOURCE,FSTYPE,OPTIONS /var/lib
/var/lib /dev/nvme0n1p2 ext4 rw,relatime,discard,data=ordered
Meaning: Options like discard can add latency on some setups; ext4 defaults are usually fine, but the details matter.
Decision: If you see network filesystems, thin-provisioning weirdness, or unsafe options, fix storage first. Distributed consensus doesn’t forgive flaky disks.
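If discard is your suspect, the usual move is to drop it from the mount and rely on periodic TRIM instead. A hedged example; confirm the mount point and your distro's fstrim unit before copying:
cr0x@server:~$ sudo mount -o remount,nodiscard /var/lib
cr0x@server:~$ sudo systemctl enable --now fstrim.timer
Also remove discard from /etc/fstab, or the next reboot quietly reverts your experiment.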
Task 5: Check CPU steal time (virtualization tax)
cr0x@server:~$ mpstat 1 5 | tail -n 7
12:22:10 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
12:22:11 PM  all   15.40    0.00    6.90    2.10    0.00    4.80    0.00    0.00    0.00   70.80
12:22:12 PM  all   17.20    0.00    8.10    2.40    0.00    6.20    0.00    0.00    0.00   66.10
12:22:13 PM  all   14.80    0.00    6.30    1.90    0.00    7.60    0.00    0.00    0.00   69.40
Meaning: %steal here is 0.00. If it's non-trivial, the hypervisor is borrowing your CPU time, and that inflates coordination latency.
Decision: High steal → move to dedicated instances, pin vCPUs, or accept that your p99 is now a shared resource with your neighbors’ batch jobs.
Task 6: Measure fsync behavior quickly with fio
cr0x@server:~$ fio --name=fsync-test --directory=/var/lib/dbtest --size=1G --bs=4k --rw=write --ioengine=sync --fdatasync=1 --runtime=30 --time_based --direct=1
fsync-test: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=sync, iodepth=1
fio-3.33
Starting 1 process
fsync-test: (groupid=0, jobs=1): err= 0: pid=22811: Tue Dec 30 12:23:40 2025
write: IOPS=4200, BW=16.4MiB/s (17.2MB/s)(492MiB/30001msec); 0 zone resets
clat (usec): min=70, max=24500, avg=220, stdev=410
clat percentiles (usec):
| 95.00th=[ 310], 99.00th=[ 950], 99.90th=[ 8500]
Meaning: The 99.90th at 8.5ms is okay-ish; if you see 50–200ms at the 99.9th, your database will periodically “freeze” at p99.
Decision: Bad fsync tail → change storage class, stop trusting write caches that lie about flushes, ensure proper power-loss protection, or move the WAL (Pebble) or InnoDB redo logs to better media.
Task 7: MySQL—check InnoDB log flush policy
cr0x@server:~$ mysql -e "SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit';"
+--------------------------------+-------+
| Variable_name | Value |
+--------------------------------+-------+
| innodb_flush_log_at_trx_commit | 1 |
+--------------------------------+-------+
Meaning: 1 is safest: flush on every commit. 2 or 0 can lower latency but risks losing recent commits on crash.
Decision: If you’re comparing MySQL to CockroachDB, keep this at 1 for honesty. If you change it, write the risk into your incident runbook.
Task 8: MySQL—verify semi-sync is actually on (or not)
cr0x@server:~$ mysql -e "SHOW STATUS LIKE 'Rpl_semi_sync_master_status';"
+-----------------------------+-------+
| Variable_name | Value |
+-----------------------------+-------+
| Rpl_semi_sync_master_status | OFF |
+-----------------------------+-------+
Meaning: If semi-sync is OFF, primary commits are not waiting for replica acknowledgement. That’s why MySQL is fast, and also why failover can lose data.
Decision: If the business requires “no data loss,” you either enable semi-sync (paying latency) or you stop pretending async replication meets that requirement.
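For reference, "enable semi-sync" on the primary looks like the classic plugin setup below. MySQL 8.0.26+ spells these names with "source" instead of "master", and the 1000ms timeout is a placeholder you must choose deliberately, because it defines when the primary silently falls back to async:
cr0x@server:~$ mysql -e "INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';"
cr0x@server:~$ mysql -e "SET GLOBAL rpl_semi_sync_master_enabled = 1; SET GLOBAL rpl_semi_sync_master_timeout = 1000;"
Replicas need the matching replica-side plugin enabled too, and that fallback behavior is exactly what bites people during network trouble (see mini-story 2).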
Task 9: MySQL—spot replication lag and decide what “fresh” means
cr0x@server:~$ mysql -e "SHOW REPLICA STATUS\G" | egrep "Seconds_Behind_Source|Replica_IO_Running|Replica_SQL_Running"
Replica_IO_Running: Yes
Replica_SQL_Running: Yes
Seconds_Behind_Source: 42
Meaning: 42 seconds behind is not “a little.” It’s a different database.
Decision: Route reads that need freshness to primary, tune replication (parallel apply, schema/index fixes), or accept eventual consistency explicitly in application behavior.
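Parallel apply is usually the first tuning lever worth pulling. A hedged example on the replica; 4 workers is an arbitrary starting point, and pre-8.0.26 versions spell this slave_parallel_workers:
cr0x@server:~$ mysql -e "STOP REPLICA SQL_THREAD; SET GLOBAL replica_parallel_workers = 4; START REPLICA SQL_THREAD;"
If lag persists while the workers sit idle, the bottleneck is more likely a single hot table, row-based events against tables without a primary key, or plain I/O saturation.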
Task 10: CockroachDB—check node health and liveness
cr0x@server:~$ cockroach node status --insecure
id | address | sql_address | build | started_at | updated_at | is_available | is_live
-----+-------------------------------+---------------+---------+----------------------------------+----------------------------------+--------------+----------
1 | db-node-1:26257 | db-node-1:26257 | v23.2.6 | 2025-12-30 11:02:17.123456+00:00 | 2025-12-30 12:25:01.123456+00:00 | true | true
2 | db-node-2:26257 | db-node-2:26257 | v23.2.6 | 2025-12-30 11:02:19.223456+00:00 | 2025-12-30 12:25:00.923456+00:00 | true | true
3 | db-node-3:26257 | db-node-3:26257 | v23.2.6 | 2025-12-30 11:02:21.323456+00:00 | 2025-12-30 12:25:01.003456+00:00 | true | true
Meaning: All nodes live and available. If one flips to unavailable intermittently, expect write latency spikes and transaction retries.
Decision: Fix node stability before tuning SQL. “It flaps sometimes” is not a minor detail in consensus systems.
Task 11: CockroachDB—inspect range distribution and under-replication
cr0x@server:~$ cockroach node status --ranges --insecure
  id |     address     | is_available | is_live | replicas_leaders | replicas_leaseholders | ranges | ranges_unavailable | ranges_underreplicated
-----+-----------------+--------------+---------+------------------+-----------------------+--------+--------------------+------------------------
   1 | db-node-1:26257 | true         | true    |              190 |                   190 |    540 |                  0 |                      0
   2 | db-node-2:26257 | true         | true    |              145 |                   145 |    540 |                  0 |                      0
   3 | db-node-3:26257 | true         | true    |              205 |                   205 |    540 |                  0 |                      0
Meaning: Reasonably balanced. If one node holds most leaseholders, it becomes the latency hot spot for reads/writes on those ranges.
Decision: If skewed, adjust locality constraints, add capacity, or investigate why rebalancing is stuck (often disk or network).
Task 12: CockroachDB—look for contention and transaction retries in SQL stats
cr0x@server:~$ cockroach sql --insecure -e 'SELECT application_name, sum("count") - sum(first_attempt_count) AS retries, sum("count") AS stmts FROM crdb_internal.node_statement_statistics GROUP BY application_name ORDER BY retries DESC LIMIT 5;'
  application_name | retries | stmts
-------------------+---------+--------
  web-api          |    1842 | 98412
  worker           |     210 | 12001
  console          |       0 |  1422
Meaning: Your API is getting retried a lot. That is time the user experiences as latency (even if the transaction ultimately succeeds).
Decision: Find hot keys, reduce contention (shard counters, batch writes differently), and ensure your client retries are configured correctly rather than failing loudly.
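The classic contention fix is the sharded counter: split one hot row into N rows and sum them on read. A sketch with placeholder names; 8 shards is arbitrary:
CREATE TABLE page_views (page_id INT, shard INT, views INT NOT NULL DEFAULT 0, PRIMARY KEY (page_id, shard));
-- writers hit a random shard, spreading updates across 8 rows instead of one hot row
INSERT INTO page_views (page_id, shard, views) VALUES (42, floor(random() * 8)::INT, 1)
  ON CONFLICT (page_id, shard) DO UPDATE SET views = page_views.views + 1;
-- readers sum the shards
SELECT sum(views) FROM page_views WHERE page_id = 42;
You trade one contended key for a slightly more expensive read, which on a hot-counter workload is almost always a good deal.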
Task 13: CockroachDB—identify leaseholder locality for a hot table
cr0x@server:~$ cockroach sql --insecure -e "SHOW RANGES FROM TABLE app.orders WITH DETAILS;"
start_key | end_key | range_id | replicas | lease_holder
------------+---------+----------+----------+--------------
NULL | /1000 | 84 | {1,2,3} | 2
/1000 | /2000 | 91 | {1,2,3} | 2
/2000 | /3000 | 99 | {1,2,3} | 1
Meaning: Leaseholders concentrated on node 2 means node 2 handles the “coordination” for those ranges. If node 2 is slower, everyone is slower.
Decision: Pin leaseholders via zone configs/locality, or fix node 2’s disk/network/CPU so it stops being the cluster’s mood ring.
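Pinning means a zone configuration with lease preferences. The locality tier below (+zone=rack1) is a placeholder and must match a --locality value your nodes were actually started with:
cr0x@server:~$ cockroach sql --insecure -e "ALTER TABLE app.orders CONFIGURE ZONE USING num_replicas = 3, lease_preferences = '[[+zone=rack1]]';"
Preferences are advisory, not magic: if the preferred node is unhealthy, leases move anyway, which is usually what you want.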
Task 14: Compare query latency distributions from the client side
cr0x@server:~$ pt-query-digest --type slowlog /var/log/mysql/mysql-slow.log | head -n 12
# 2.1s user time, 20ms system time, 28.11M rss, 100.00% CPU
# Query_time distribution
# 1us
# 10us
# 100us ################################################
# 1ms ###############################################
# 10ms ######
# 100ms ##
# 1s #
Meaning: The distribution tells you if you have a small number of pathological slow queries or broad tail degradation.
Decision: Pathological few → index/plan fixes. Broad tail → look at storage/network/lock contention and background work.
Task 15: Check Linux scheduler and IRQ imbalance (small-node classic)
cr0x@server:~$ cat /proc/interrupts | egrep "eth0|nvme" | head -n 6
48: 18823401 0 0 0 IR-PCI-MSI 524288-edge eth0-TxRx-0
49: 0 0 0 0 IR-PCI-MSI 524289-edge eth0-TxRx-1
50: 0 0 0 0 IR-PCI-MSI 524290-edge eth0-TxRx-2
51: 0 0 0 0 IR-PCI-MSI 524291-edge eth0-TxRx-3
Meaning: If all interrupts land on one CPU (like above), you can get weird latency under load even with “low” average CPU usage.
Decision: Enable irqbalance or pin queues properly; for small nodes, this can be the difference between stable p99 and chaos.
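Concretely, the fix is either the balancing daemon or explicit pinning, not both at once (irqbalance will overwrite manual masks). A hedged example of each option; IRQ 49 and the hex mask 2 (CPU1) are placeholders taken from the output above:
cr0x@server:~$ sudo systemctl enable --now irqbalance
cr0x@server:~$ echo 2 | sudo tee /proc/irq/49/smp_affinity
On tiny nodes, keeping NIC queues and NVMe interrupts off the core your busiest database threads live on is often worth more than a week of SQL tuning.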
Fast diagnosis playbook
When latency spikes, you don’t have time for philosophy. You need a short sequence that finds the bottleneck with high probability.
Here’s the playbook I use when someone says, “Cockroach is slow,” or “MySQL is suddenly lagging,” on small servers.
First: is it network, disk, or contention?
- Check node-to-node RTT and retransmits (ping + ss). If RTT/jitter or retransmits correlate with latency spikes, stop and fix the network.
- Check disk await and utilization (iostat). If writes are queuing (w_await elevated, %util high), stop and fix storage or write amplification.
- Check transaction retries / lock waits (Cockroach SQL stats for retries; MySQL performance_schema or InnoDB status for lock waits). If retries/waits spike, you have contention, not "slow hardware" (see the example after this list).
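For the MySQL half of that third check, the sys schema ships a ready-made view. For example:
cr0x@server:~$ mysql -e "SELECT waiting_pid, waiting_query, blocking_pid, wait_age FROM sys.innodb_lock_waits\G"
For the CockroachDB half, the retry query from Task 12 plays the same role.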
Second: confirm topology and locality are sane
- CockroachDB: leaseholder placement for hot ranges; range distribution balance; under-replication.
- MySQL: primary/replica roles, replication lag, read routing, and whether semi-sync is unexpectedly enabled (or disabled).
Third: identify background work stealing your lunch
- CockroachDB: compactions, range rebalancing, schema backfills. On small nodes these can be loud.
- MySQL: purge lag, checkpoint pressure, long transactions, or a backup job doing “polite” I/O that isn’t actually polite.
Stop conditions (a.k.a. avoid random tuning)
- If iostat shows high await, do not touch SQL settings yet.
- If packet retransmits spike, do not redesign schema yet.
- If retries/lock waits are dominant, do not buy new hardware before fixing the hotspot.
Three corporate-world mini-stories (anonymized, plausible, technically accurate)
Mini-story 1: The incident caused by a wrong assumption
A mid-sized SaaS company migrated a user profile service from MySQL to CockroachDB. The architecture doc said, “We need HA; a three-node cluster gives us that.”
The team was smart, the code was clean, and the migration rehearsals went well—on a staging environment with low load and friendly neighbors.
The assumption that failed: “If each node is under 30% CPU, we have plenty of headroom.”
Production latency spiked after launch, but CPU stayed calm. The on-call engineer stared at CPU graphs like they were going to confess.
Meanwhile, users were waiting an extra 200–600ms for profile updates during peak traffic.
The real bottleneck was storage tail latency on one node. Its SSD had occasional long fsync pauses.
MySQL had masked this before because the primary was on a different box, and replication lag wasn’t user-facing.
In CockroachDB, that one node was part of the quorum for a large share of ranges, so its pauses became quorum pauses.
The fix was embarrassingly non-database: replace the drive and rebalance leaseholders.
They also changed procurement rules: no “mystery SSDs,” and no mixed storage classes inside a consensus cluster.
The lesson that stuck: with distributed SQL, a single slow node isn’t just “a bit slower.” It becomes the cluster’s personality.
Mini-story 2: The optimization that backfired
An e-commerce platform ran MySQL with async replication. Checkout was fast, but leadership wanted “stronger consistency” during outages.
Someone proposed enabling semi-sync replication to ensure at least one replica acknowledged writes before commit.
It sounded like a free safety upgrade. It was not.
They enabled semi-sync during a maintenance window and saw only a modest increase in average latency.
A week later, the first real network disturbance hit: a top-of-rack switch started dropping packets intermittently.
The primary began waiting on replica acks that arrived late or not at all.
Latency didn’t just increase. It became spiky. Checkout traffic piled up, thread pools saturated, and then the app tier started timing out.
The incident looked like “database is slow.” The root cause was “network is unreliable, and we put it in the commit path.”
The backfired optimization wasn’t semi-sync itself; it was enabling it without validating east-west network quality and without timeouts tuned to reality.
After the incident, they kept semi-sync—but only after placing replicas on a better network segment and setting clear behavior for when semi-sync falls back.
The lesson: anything that adds an acknowledgement to the commit path turns network hiccups into user-visible pain. This is true in MySQL and it’s the default in CockroachDB.
Mini-story 3: The boring but correct practice that saved the day
A financial services team ran CockroachDB on small but decent nodes. They didn’t have infinite budget, so they leaned into discipline:
stable instance types, dedicated disks, and a strict change calendar for schema migrations.
Their least glamorous practice was also the most valuable: before every release, they ran a fixed “tail-latency acceptance test” that measured p95 and p99 across a set of critical transactions,
while simultaneously running a background compaction-like I/O job to simulate noisy conditions.
One day, the test failed. Median latency looked fine; p99 doubled. The change? A new index on a high-write table plus a backfill that increased write amplification.
The developers argued the code was correct, and they were right. But production correctness includes physics.
Because they had the test, they caught it before customers did. They rescheduled the backfill to off-peak, adjusted constraints to keep leaseholders near the API tier,
and added throttling for the migration job.
The day was saved by a boring routine and a refusal to treat p99 as “someone else’s metric.”
Common mistakes: symptoms → root cause → fix
These are the patterns that show up in incident channels. The symptoms are repetitive; the root causes are usually boring.
The fixes are specific, not inspirational.
CockroachDB: p99 write latency spikes every few minutes
- Symptoms: brief latency cliffs, transaction retries, nodes “healthy” at a glance.
- Root cause: storage compactions or flushes competing with foreground writes; or one node with long fsync tail.
- Fix: validate fsync tail with fio, reduce disk saturation, use better storage, and ensure nodes are homogeneous. Consider separating workloads and sizing for compaction headroom.
CockroachDB: reads are unexpectedly slow in a “single region” cluster
- Symptoms: read latency higher than expected, even for simple SELECTs.
- Root cause: leaseholders not located near the SQL gateway/app tier; queries bouncing between nodes.
- Fix: inspect leaseholder distribution, set locality/zone constraints, and route SQL connections appropriately.
CockroachDB: application sees sporadic serialization failures
- Symptoms: retryable errors, increased end-to-end request time, “random” failures under load.
- Root cause: contention on hot rows/ranges; long transactions; counter-style updates.
- Fix: redesign hotspots (sharded counters, batching), shorten transactions, add appropriate indexes, and ensure client retry logic is correct and bounded.
MySQL: replicas lag during peak and never catch up
- Symptoms: rising Seconds_Behind_Source, delayed reads, risky failovers.
- Root cause: slow queries on replica, insufficient apply parallelism, heavy DDL, or I/O saturation.
- Fix: optimize the hot queries, enable/adjust parallel replication, isolate replicas for reads, and schedule DDL carefully.
MySQL: sudden latency increase after “making it safer”
- Symptoms: commits slower, app timeouts, CPU normal.
- Root cause: enabling semi-sync or stricter durability settings without verifying network/disk tail latency.
- Fix: measure RTT and retransmits, set sane timeouts, place replicas better, and decide explicitly what failure semantics you want.
Both: average latency is fine, users still complain
- Symptoms: dashboards show “green,” support tickets say “slow sometimes.”
- Root cause: tail latency from contention, GC/compaction, background jobs, or network jitter.
- Fix: instrument and alert on p95/p99, correlate with disk/network metrics, and make tail latency part of release criteria.
Checklists / step-by-step plan
Checklist A: choosing MySQL vs CockroachDB for small servers
- If you need single-digit ms p99 for OLTP in one location: start with MySQL on a strong primary. Add replicas for reads and backups. Don’t apologize.
- If you need automated failover with minimal operational toil and can tolerate a latency premium: CockroachDB is a reasonable choice, but size disks and network like they matter (because they do).
- If you need multi-writer without sharding the app: CockroachDB fits the shape. MySQL will force sharding or careful coordination patterns.
- If your nodes are “tiny” and storage is mediocre: don’t pick distributed SQL for performance. Pick it for availability and operational behavior, and accept the tax.
- If the business demands “no data loss”: make sure you’re not relying on async semantics in MySQL, and make sure your CockroachDB cluster can actually keep quorum under failure.
Checklist B: making CockroachDB behave on small nodes
- Keep nodes as identical as possible (CPU, RAM, disk type, kernel settings).
- Measure and control fsync tail latency; don’t guess.
- Watch transaction retries and contention; fix hotspots in schema and access patterns.
- Check leaseholder locality for hot ranges; don’t assume the database “knows” your app topology.
- Schedule schema changes and backfills; throttle them; treat them as production events.
- Leave headroom for compactions and rebalancing. “80% disk busy” is not headroom.
Checklist C: making MySQL safe without destroying latency
- Keep innodb_flush_log_at_trx_commit=1 unless you can explain the risk in one sentence to a non-engineer.
- Define what consistency you need for reads; route accordingly (primary vs replicas).
- Rehearse failover. A runbook you’ve never executed is fiction.
- Monitor replication lag and set thresholds that trigger operational action, not just dashboards.
- Keep backups and restores boring and tested; replication is not a backup.
- Do online schema changes with proven tooling and a rollback plan.
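"Proven tooling" in practice usually means gh-ost or pt-online-schema-change. A hedged sketch of the latter; the database, table, and column are placeholders, and --dry-run always comes before --execute:
cr0x@server:~$ pt-online-schema-change --alter "ADD COLUMN archived_at DATETIME NULL" D=app,t=orders --max-lag=5 --dry-run
cr0x@server:~$ pt-online-schema-change --alter "ADD COLUMN archived_at DATETIME NULL" D=app,t=orders --max-lag=5 --execute
The --max-lag flag pauses the copy when replicas fall behind, which is the entire point of doing this carefully on a small fleet.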
FAQ
1) Is CockroachDB “slower” than MySQL?
On small servers for low-latency OLTP, often yes—especially at p95/p99—because quorum replication and LSM compactions add unavoidable work.
On workloads that need scale-out writes or automated failover, “slower per transaction” can still be “faster for the business” because outages cost more than milliseconds.
2) Why does CockroachDB latency look fine on average but bad at p99?
Because tail latency is where coordination and background storage work show up: occasional slow fsync, compaction bursts, replica hiccups, network jitter, and retries.
Medians hide these; users don’t.
3) Can I make CockroachDB as fast as a single-node MySQL primary?
You can narrow the gap with good hardware, locality-aware configuration, and contention fixes. You can’t remove quorum and still claim the same failure semantics.
If you need single-node latency, run a single-node database for that path.
4) Is a three-node CockroachDB cluster enough?
It’s the minimum for quorum-based HA, and it works. It’s also fragile in the sense that any maintenance or node instability consumes your safety margin immediately.
If you can afford it, five nodes buys operational breathing room and often better tail behavior.
5) What’s the MySQL equivalent of CockroachDB’s strong consistency?
MySQL can approximate stronger durability/consistency using semi-sync replication or group replication, but you’re changing the commit path and taking a latency hit.
The key difference is that CockroachDB is designed around that model; MySQL can do it, but it’s easier to misconfigure and harder to reason about during failures.
6) Which one is easier to run with a small team?
If your workload fits on one primary, MySQL is usually simpler day-to-day.
If you need automated failover and can’t afford manual HA operations, CockroachDB can reduce toil—provided you invest in measuring disk/network and understanding contention.
7) Does CockroachDB require fast NVMe?
It doesn’t require it to function. It requires it to feel good under load, especially on small nodes.
Slow or jittery storage shows up as quorum commit latency and compaction stalls.
8) Can I use CockroachDB on 1Gbps networking?
Yes, but you must watch retransmits and east-west congestion, and you should keep nodes close (low RTT, low jitter).
If your network is shared and bursty, your p99 will reflect that reality.
9) Why do small clusters feel “fine” until traffic grows a bit?
Because the first thing you run out of is slack, not CPU. Background work and queueing effects rise nonlinearly.
At small scale, the system has enough idle time to hide sins; at moderate scale, it starts charging interest.
10) If I’m already on MySQL, when should I move to CockroachDB?
Move when your pain is primarily around failover, multi-writer growth, operational fragility, or the complexity of sharding and correctness.
Don’t move because you want a trendy database. If your main pain is query latency, a distributed system is an expensive way to buy more latency.
Conclusion: what to do next week
If you’re on small servers and debating MySQL vs CockroachDB, treat it like an engineering decision, not a branding decision.
MySQL is the latency baseline. CockroachDB is the availability baseline. You can tune both, but you can’t negotiate away their defaults.
Practical next steps that actually change outcomes:
- Measure RTT and disk fsync tail latency on your real nodes. If either is ugly, fix that before changing databases.
- Pick your truth model: async replication (fast, riskier) vs quorum/strong consistency (safer, slower). Write it down.
- Run a p99-focused load test with background I/O running. If you don’t test tails, you’ll discover them in production.
- If you choose CockroachDB on small nodes: invest in homogeneous hardware, locality/leaseholder awareness, and contention diagnosis from day one.
- If you choose MySQL: invest in disciplined failover, replication lag monitoring, and tested backups. Don’t let “simple” become “untended.”
The latency tax isn’t a hidden fee. It’s on the receipt. You just have to look at p99 to see it.