MySQL vs SQLite Concurrency: Why Writes Become a Traffic Cliff

The graph looks fine, until it doesn’t. Latency is flat, CPU is bored, and then a small increase in write traffic turns your app into a queueing museum exhibit:
everyone stands around waiting to be allowed inside.

If you’ve ever watched an “SQLite is fast” prototype go to production and suddenly start throwing database is locked errors, you’ve met the traffic cliff.
This isn’t a moral failing. It’s physics, locks, and a set of tradeoffs that SQLite and MySQL make very differently.

The traffic cliff: what you’re actually seeing

“Traffic cliff” is what it feels like when a system’s throughput doesn’t degrade gracefully. You add 10% more write load and suddenly you get 10x worse tail
latency, timeouts, retries, and a thundering herd of clients politely re-attempting the same thing they just did.

In SQLite, the cliff is usually caused by lock contention and a single-writer design. There are nuances (WAL mode changes the picture, and the OS filesystem matters),
but the big idea stays: concurrent writers pile up behind a lock. As the queue grows, latency explodes. It’s not subtle.

In MySQL (specifically InnoDB), the cliff still exists—but you have more gears. Row-level locking, MVCC, background flushing, and an engine designed for concurrent
connections mean writes tend to degrade more gradually, and you get more tuning knobs and more ways to shoot yourself in the foot.

One useful mental model: SQLite is a very fast, very reliable single-lane bridge with great signage. MySQL is a highway interchange with multiple ramps, speed limits,
and a surprising number of ways to cause a traffic jam if you put a couch in the left lane.

Interesting facts and history that matter operationally

  • SQLite started in 2000 as an embedded database for internal use, designed to be small, reliable, and require zero administration—still its superpower.
  • SQLite is in the public domain, which is a big reason it’s everywhere: fewer licensing anxieties, more “just ship it” adoption.
  • SQLite’s testing culture is extreme: it’s famous for heavy automated testing, including fault-injection and simulation of I/O errors—good news when your disk lies.
  • WAL mode arrived later (in 2010, with SQLite 3.7.0) to improve read concurrency by separating reads from writes via a write-ahead log.
  • MySQL’s InnoDB became the default storage engine in 5.5 (after years of MyISAM being the common default), shifting the “normal” behavior toward transactions and row-level locks.
  • InnoDB uses MVCC (multi-version concurrency control): readers don’t block writers the same way, which is why read-heavy workloads can stay calm under writes.
  • SQLite’s “single writer” is a design choice, not a missing feature. It simplifies correctness across platforms and filesystems.
  • Filesystem semantics matter more for SQLite than people expect. Network filesystems can break locking assumptions or make them painfully slow.
  • MySQL’s durability is configurable via log flush policies, which means you can choose between “safe” and “fast” and then forget what you chose.

Concurrency models: SQLite vs MySQL in plain machinery

SQLite: a database file with a locking protocol

SQLite is a library that reads and writes a single database file (plus, in WAL mode, the -wal log and the -shm shared-memory WAL index as sidecar files). Your application process
links it (directly or via a wrapper), and queries operate through normal file I/O with a coordination mechanism to keep the file consistent.

The concurrency story hinges on locks. In rollback-journal mode (still the default unless you opt into WAL), SQLite takes locks on the database file as it moves through states:
shared, reserved, pending, exclusive. The details are precise, but operationally it means this: at some point, a writer needs an exclusive lock to commit, and while
that lock is held, other writers are blocked—and, depending on mode and timing, readers can be blocked too.

In WAL mode, it gets better for reads: multiple readers can read the main database while a writer appends to the WAL. But writes still serialize: only one writer
at a time can commit to the WAL. WAL gives you read concurrency and often better performance under mixed read/write, but it does not turn SQLite into a
multi-writer engine.

MySQL InnoDB: a server with concurrent transaction machinery

MySQL is a client/server database. You connect over a socket, and the server manages concurrency inside a storage engine (InnoDB, typically).
InnoDB uses row-level locks, transactions, undo logs, redo logs, background flushing, and MVCC snapshots. It’s built to keep many clients moving at once.

Writes still contend: on hot rows, on indexes, on auto-increment locks (less of an issue than it once was), on metadata locks, and on redo log flushing.
But you can frequently spread the pain. With good schema and query design, concurrent writers can proceed as long as they aren’t hammering the same records.

SQLite asks: “How do we keep the file consistent and portable?” InnoDB asks: “How do we keep 500 clients happy while the disk is on fire?”

Why writes in SQLite cliff: the queueing explanation

Let’s talk about why this fails suddenly rather than gradually.

When you have a single critical section (the writer lock), you’ve built a queue. As long as your arrival rate of write transactions is less than your service rate
(how fast the single writer can complete each transaction), things look normal. The moment you get close to saturation, the queue length starts to grow quickly.
With random arrivals and variable transaction times, the tail gets ugly fast.
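
To make that concrete, here is a minimal sketch of the math, using the simplest single-queue model (M/M/1) as a rough approximation of a single serialized writer; the 200 commits/second service rate is an invented number for illustration:

# Rough illustration of the write cliff, not a benchmark.
# M/M/1 mean time in system = 1 / (service_rate - arrival_rate);
# real commit times are spikier than this, which makes the tail worse.

def mean_write_latency(arrival_rate: float, service_rate: float) -> float:
    if arrival_rate >= service_rate:
        return float("inf")  # past saturation the queue grows without bound
    return 1.0 / (service_rate - arrival_rate)

service_rate = 200.0  # hypothetical: the single writer can do 200 commits/second
for arrival_rate in (100, 160, 180, 190, 198, 199.5):
    latency_ms = mean_write_latency(arrival_rate, service_rate) * 1000
    print(f"{arrival_rate:6.1f} writes/s -> mean latency {latency_ms:8.1f} ms")

Going from 180 to 198 writes per second is a 10% bump in load and a 10x jump in mean latency, which is exactly the shape of the cliff.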

SQLite writes tend to be “bursty” because of transaction boundaries and fsync behavior. Many applications do tiny transactions:
insert one row, commit; update one row, commit. Each commit is a durability event that may require syncing. That adds a big, spiky service-time component.
Suddenly your single-lane bridge has a toll booth that occasionally stops traffic to count coins.

The cliff is amplified by client retry behavior. When SQLite returns SQLITE_BUSY or database is locked, libraries and apps often retry.
Retrying is fine when contention is brief. Under sustained load, it becomes self-harm: you increase the arrival rate exactly when the system is saturated.
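
If you do retry, make it bounded and add backoff so retries stop amplifying load at the worst moment. A minimal sketch using Python’s standard sqlite3 module; the SQL, table, and path are placeholders:

import random
import sqlite3
import time

def write_with_bounded_retry(db_path, sql, params, max_attempts=5):
    # timeout=2.0 is the connection-level busy handler: wait up to 2s for the lock.
    conn = sqlite3.connect(db_path, timeout=2.0)
    try:
        for attempt in range(1, max_attempts + 1):
            try:
                with conn:  # one transaction: commits on success, rolls back on error
                    conn.execute(sql, params)
                return
            except sqlite3.OperationalError as exc:
                if "locked" not in str(exc) or attempt == max_attempts:
                    raise  # not contention, or out of patience: surface the error
                # Exponential backoff with jitter so retries don't arrive in lockstep.
                time.sleep(min(2.0, 0.05 * 2 ** attempt) * random.uniform(0.5, 1.5))
    finally:
        conn.close()

# Hypothetical usage:
# write_with_bounded_retry("/var/lib/app/app.db",
#                          "UPDATE jobs SET state=? WHERE id=?", ("done", 42))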

A single-writer database under peak load is like a one-person meeting where everyone interrupts: somehow nobody makes progress, but everyone leaves tired.

How WAL changes the cliff (but doesn’t remove it)

WAL mode lets readers avoid blocking on a writer. That’s huge in read-heavy workloads: the cliff can move to the right.
But the writer is still single. Worse, WAL introduces checkpointing, which is a kind of deferred work: you append to WAL until you decide to checkpoint and merge changes
back into the main database. Checkpointing costs I/O and can block or slow things depending on settings and pressure.

If you’re unlucky, you’ve traded “every write blocks reads sometimes” for “reads are smooth until the checkpoint monster wakes up.”
That can look like periodic latency spikes, or a cliff that appears on a schedule.

Filesystem and storage stack: SQLite’s hidden dependency

SQLite’s locks rely on OS file locking semantics. On local disks with sane filesystems, it’s predictable. On some network filesystems, it’s either broken,
emulated poorly, or extremely slow. Even when “supported,” latency variation can turn minor contention into a constant busy-loop.

MySQL also cares about I/O latency, obviously, but its concurrency control is not built on “every client process coordinates with file locks on the same file.”
It’s built on a dedicated server process that controls access to storage structures.

Why MySQL usually doesn’t cliff the same way

MySQL’s InnoDB engine has a different failure profile. It can absolutely fall over, but the common path is not “one writer lock blocks everyone.”
Instead, it’s a mix of:

  • Hot row/index contention: many writers updating the same row(s) or same index leaf page.
  • Transaction log pressure: redo log flushes become the limiter, especially with strict durability.
  • Buffer pool and flushing: dirty pages pile up; flushing becomes urgent; throughput collapses if I/O can’t keep up.
  • Lock waits and deadlocks: not a single global write lock, but enough waits to stall the application.
  • Connection storms: too many client threads, context switching, memory blowups.

The key difference: if 50 clients write to 50 different rows, InnoDB can often do that concurrently. If 50 clients write to the same “counter” row, it will serialize.
But you can redesign that (shard counters, use append-only tables, batch updates). With SQLite, redesign often means “reduce writers or batch harder.”
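
The sharded-counter version of that redesign looks roughly like this. A sketch, assuming a counter_shards(name, shard, value) table with a unique key on (name, shard) and the third-party PyMySQL driver; names, credentials, and the shard count are placeholders:

import random
import pymysql  # any DB-API MySQL driver works the same way

N_SHARDS = 16  # more shards: less contention per row, slightly costlier reads

def bump_counter(conn, name, amount=1):
    # Spread increments across N_SHARDS rows instead of hammering one hot row.
    shard = random.randrange(N_SHARDS)
    with conn.cursor() as cur:
        cur.execute(
            "INSERT INTO counter_shards (name, shard, value) VALUES (%s, %s, %s) "
            "ON DUPLICATE KEY UPDATE value = value + VALUES(value)",
            (name, shard, amount),
        )
    conn.commit()

def read_counter(conn, name):
    # Reads pay a small aggregation cost; writes stop serializing on one row.
    with conn.cursor() as cur:
        cur.execute("SELECT COALESCE(SUM(value), 0) FROM counter_shards WHERE name = %s", (name,))
        return int(cur.fetchone()[0])

# Hypothetical usage:
# conn = pymysql.connect(host="db.internal", user="app", password="change-me", database="prod")
# bump_counter(conn, "page_views")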

“Hope is not a strategy,” as the old operations saying goes.
It applies perfectly here: “we won’t have concurrent writes” is not an architecture.

WAL, checkpoints, and the “surprise write storm”

WAL mode in SQLite is commonly recommended, and for good reason: it improves read/write concurrency by letting readers read the stable database while writes go to the log.
But WAL introduces operational work you can’t ignore: checkpoints.

A checkpoint merges WAL contents back into the main database file. That’s I/O heavy. If it can’t keep up, the WAL grows. If the WAL grows, it can slow down reads
(because readers may need to consult the WAL), increase disk usage, and lengthen recovery time after crashes.

Under steady write load, you can end up with periodic spikes when checkpointing triggers. Under bursty load, you can get “write storms” where the system tries to catch up,
and suddenly your previously smooth performance turns into a sawtooth.

The cliff pattern often looks like this:

  • Low/moderate load: everything is fast, WAL is small, checkpoints are cheap.
  • Approaching saturation: writers wait more, WAL grows, checkpoint cost grows.
  • Over saturation: writers stack up, checkpointing competes for the same disk bandwidth, lock waits increase, clients retry, and you’re paging someone at 02:00.

Transactions, autocommit, and the death-by-small-writes pattern

SQLite’s write cliff is frequently self-inflicted via transaction shape.

Many apps run in autocommit mode: every INSERT/UPDATE is its own transaction. That means every write pays the full “commit” cost: lock acquisition, journal/WAL work,
and durability sync behavior. If you do 1,000 updates one by one, you’re not doing 1,000 updates—you’re doing 1,000 commits. Your disk becomes a metronome.

Batch writes inside explicit transactions. It’s not optional. If your application can’t do that, SQLite is the wrong tool once concurrency and throughput matter.
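
Here is what batching looks like with Python’s standard sqlite3 module, as a minimal sketch (the events table is a placeholder): one transaction, one commit, one durability event for the whole batch.

import sqlite3

conn = sqlite3.connect("/var/lib/app/app.db", timeout=5.0)
conn.execute("PRAGMA journal_mode=WAL")
conn.execute("CREATE TABLE IF NOT EXISTS events (id INTEGER PRIMARY KEY, payload TEXT)")

rows = [(i, f"payload-{i}") for i in range(1000)]  # pretend these are 1,000 queued writes

# Anti-pattern: one commit (and potentially one sync) per row.
# for row in rows:
#     with conn:
#         conn.execute("INSERT INTO events (id, payload) VALUES (?, ?)", row)

# Batched: one transaction, one commit for all 1,000 rows.
with conn:
    conn.executemany("INSERT INTO events (id, payload) VALUES (?, ?)", rows)

conn.close()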

MySQL also benefits from batching, but it tolerates smaller transactions better because it’s designed around concurrent commit, group commit, and background I/O.
You still want good transaction hygiene, but you’re less likely to hit an immediate cliff from “normal” multi-user behavior.

Practical tasks: commands, outputs, and decisions

These are the checks I actually run when someone says “writes got slow” or “SQLite locked up” or “MySQL is timing out.” Each task includes:
command, what the output means, and what decision you make next.

Task 1: Confirm SQLite journal mode and critical pragmas

cr0x@server:~$ sqlite3 /var/lib/app/app.db 'PRAGMA journal_mode; PRAGMA synchronous; PRAGMA busy_timeout;'
wal
2
0

Meaning: journal_mode is WAL. synchronous=2 is FULL (durable, slower). busy_timeout=0 means “fail fast” on contention.
Decision: Set a sane busy timeout (e.g., 2000–10000 ms) to reduce immediate failures, and review synchronous based on durability needs.
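
If the decision lands on “set a sane timeout and revisit synchronous,” the application side of that is a few lines at connection setup. A minimal sketch with Python’s standard sqlite3 module; the values are examples, not a durability recommendation:

import sqlite3

def open_app_db(path):
    # timeout maps to SQLite's busy handler: wait up to 5s on contention instead of failing instantly.
    conn = sqlite3.connect(path, timeout=5.0)
    conn.execute("PRAGMA journal_mode=WAL")    # readers stop blocking behind the writer
    conn.execute("PRAGMA synchronous=NORMAL")  # fewer syncs; in WAL mode recent commits can be lost on power failure, but the file stays consistent
    conn.execute("PRAGMA busy_timeout=5000")   # covers code paths that bypass the connect() timeout
    return conn

If every commit must survive a power cut, keep synchronous=FULL and pay the latency.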

Task 2: Check WAL file size and checkpoint pressure

cr0x@server:~$ ls -lh /var/lib/app/app.db*
-rw-r----- 1 app app 1.2G Dec 30 01:12 /var/lib/app/app.db
-rw-r----- 1 app app 768M Dec 30 01:13 /var/lib/app/app.db-wal
-rw-r----- 1 app app  32K Dec 30 01:12 /var/lib/app/app.db-shm

Meaning: A 768M WAL suggests checkpoints aren’t keeping up or aren’t running.
Decision: Investigate checkpoint settings and whether the app is leaving long-running read transactions open (which can prevent checkpoints).
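
You can ask SQLite about checkpoint progress directly. A minimal sketch with the standard sqlite3 module: wal_checkpoint returns (busy, wal_frames, checkpointed_frames), and busy=1 means the checkpoint could not run to completion, commonly because a long-lived read transaction still needs old WAL frames.

import sqlite3

conn = sqlite3.connect("/var/lib/app/app.db", timeout=5.0)

# TRUNCATE tries to checkpoint everything and reset the -wal file to zero length.
busy, wal_frames, checkpointed = conn.execute("PRAGMA wal_checkpoint(TRUNCATE)").fetchone()
print(f"busy={busy} wal_frames={wal_frames} checkpointed={checkpointed}")

if busy == 1:
    print("checkpoint blocked; look for long-running readers holding old snapshots")

conn.close()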

Task 3: Inspect SQLite for long-running transactions (app-level symptom check)

cr0x@server:~$ lsof /var/lib/app/app.db | head
COMMAND   PID USER   FD   TYPE DEVICE SIZE/OFF     NODE NAME
app      2140  app   17u   REG  252,0 1291845632 1048577 /var/lib/app/app.db
app      2140  app   18u   REG  252,0  805306368 1048578 /var/lib/app/app.db-wal
app      2199  app   17u   REG  252,0 1291845632 1048577 /var/lib/app/app.db

Meaning: Multiple processes/threads have the DB open; that’s normal. It doesn’t show locks directly, but it tells you who is in the game.
Decision: If you see unexpected processes, stop them. If many workers share one DB file on network storage, re-evaluate architecture immediately.

Task 4: Quick-and-dirty contention reproduction with busy timeouts

cr0x@server:~$ sqlite3 /var/lib/app/app.db 'PRAGMA busy_timeout=2000; BEGIN IMMEDIATE; SELECT 1;'
2000
1

Meaning: BEGIN IMMEDIATE attempts to get a write reservation early. If this blocks or returns BUSY quickly, you have writer contention.
Decision: If this consistently struggles during incidents, reduce concurrent writers or batch writes; WAL alone won’t save you.
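
The application-side version of the same idea: start write transactions with BEGIN IMMEDIATE so contention is hit at BEGIN, where busy_timeout can wait politely, rather than mid-transaction on a lock upgrade. A sketch with Python’s standard sqlite3 module; the jobs table is a placeholder:

import sqlite3

conn = sqlite3.connect("/var/lib/app/app.db", timeout=5.0)
conn.execute("CREATE TABLE IF NOT EXISTS jobs (id INTEGER PRIMARY KEY, state TEXT)")
conn.isolation_level = "IMMEDIATE"  # implicit BEGINs become BEGIN IMMEDIATE

# The write reservation is taken up front, so this transaction won't discover
# halfway through that another writer got there first.
with conn:
    conn.execute("UPDATE jobs SET state = ? WHERE id = ?", ("running", 42))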

Task 5: Check filesystem type (SQLite cares more than you want)

cr0x@server:~$ findmnt -no SOURCE,FSTYPE,OPTIONS /var/lib/app
/dev/nvme0n1p2 ext4 rw,relatime

Meaning: Local ext4: good baseline. If you see nfs/cifs/fuse, treat that as a major risk for locking and latency.
Decision: Move SQLite to local storage or move to a server database.

Task 6: Measure disk latency during write spikes

cr0x@server:~$ iostat -xz 1 5
Linux 6.8.0 (server)  12/30/2025  _x86_64_ (8 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           6.12    0.00    2.10    9.80    0.00   81.98

Device            r/s     w/s   rKB/s   wKB/s  avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
nvme0n1         120.0   980.0   6400  51200      97.5     8.20    8.40    2.10    9.20   0.62  68.0

Meaning: iowait is elevated (~10%) and write await is ~9 ms. That’s not catastrophic, but if it spikes to tens or hundreds of ms, commits will stall and lock queues grow.
Decision: If storage latency spikes correlate with lock errors, prioritize I/O: faster disk, less sync, larger transactions, fewer writers.

Task 7: Check MySQL engine and baseline configuration

cr0x@server:~$ mysql -uroot -p -e "SHOW VARIABLES LIKE 'default_storage_engine'; SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit';"
+------------------------+--------+
| Variable_name          | Value  |
+------------------------+--------+
| default_storage_engine | InnoDB |
+------------------------+--------+
+--------------------------------+-------+
| Variable_name                  | Value |
+--------------------------------+-------+
| innodb_flush_log_at_trx_commit | 1     |
+--------------------------------+-------+

Meaning: InnoDB with flush-at-commit=1 is durable but more sensitive to fsync latency.
Decision: If you can tolerate losing up to ~1 second of commits in a crash, consider value 2; otherwise invest in low-latency storage and group commit.

Task 8: Check MySQL for lock waits and contention

cr0x@server:~$ mysql -uroot -p -e "SHOW ENGINE INNODB STATUS\G" | sed -n '1,120p'
*************************** 1. row ***************************
  Type: InnoDB
  Name:
Status:
=====================================
2025-12-30 01:15:02 0x7f1c6c0d9700 INNODB MONITOR OUTPUT
=====================================
------------
TRANSACTIONS
------------
Trx id counter 5829101
Purge done for trx's n:o < 5829000 undo n:o < 0 state: running
History list length 2113
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 5829088, ACTIVE 12 sec
2 lock struct(s), heap size 1136, 1 row lock(s)
MySQL thread id 118, OS thread handle 139759, query id 99102 app 10.0.0.24 updating
UPDATE counters SET value=value+1 WHERE id=1

Meaning: A transaction updating the same counter row is classic hot-row contention. Active 12 seconds is a smell.
Decision: Fix schema/workload: shard counters, use buffered increments, or redesign to avoid a single hot row.

Task 9: Check MySQL process list for pileups

cr0x@server:~$ mysql -uroot -p -e "SHOW FULL PROCESSLIST;" | head
Id	User	Host	db	Command	Time	State	Info
118	app	10.0.0.24:51122	prod	Query	12	updating	UPDATE counters SET value=value+1 WHERE id=1
119	app	10.0.0.25:51140	prod	Query	11	Waiting for row lock	UPDATE counters SET value=value+1 WHERE id=1
120	app	10.0.0.26:51188	prod	Query	11	Waiting for row lock	UPDATE counters SET value=value+1 WHERE id=1

Meaning: Many sessions waiting for row lock: not a “MySQL is slow” problem; it’s an application contention problem.
Decision: Reduce concurrency to that hotspot, change query pattern, or move to append-only with aggregation.

Task 10: Check InnoDB deadlocks quickly

cr0x@server:~$ mysql -uroot -p -e "SHOW ENGINE INNODB STATUS\G" | grep -n "LATEST DETECTED DEADLOCK" -A25
247:LATEST DETECTED DEADLOCK
248:------------------------
249:2025-12-30 01:10:44 0x7f1c6c1da700
250:*** (1) TRANSACTION:
251:TRANSACTION 5829051, ACTIVE 1 sec starting index read
252:mysql tables in use 1, locked 1
253:LOCK WAIT 2 lock struct(s), heap size 1136, 1 row lock(s)

Meaning: Deadlocks aren’t “bad,” they’re a concurrency signal. InnoDB will pick a victim and roll it back.
Decision: Make transactions shorter and consistent in lock order; add appropriate indexes; handle deadlock retries in the application.
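
Handling deadlock retries in the application means replaying the whole transaction, not just the failed statement, because InnoDB has already rolled the victim back. A sketch assuming the third-party PyMySQL driver; MySQL reports the deadlock victim with error code 1213:

import time
import pymysql

DEADLOCK = 1213           # ER_LOCK_DEADLOCK
LOCK_WAIT_TIMEOUT = 1205  # ER_LOCK_WAIT_TIMEOUT, usually worth retrying too

def run_transaction(conn, statements, max_attempts=3):
    # statements: list of (sql, params) executed as one transaction.
    for attempt in range(1, max_attempts + 1):
        try:
            with conn.cursor() as cur:
                for sql, params in statements:
                    cur.execute(sql, params)
            conn.commit()
            return
        except pymysql.MySQLError as exc:
            conn.rollback()
            errno = exc.args[0] if exc.args else None
            if errno not in (DEADLOCK, LOCK_WAIT_TIMEOUT) or attempt == max_attempts:
                raise
            time.sleep(0.05 * attempt)  # brief backoff before replaying the transaction

# Hypothetical usage:
# run_transaction(conn, [
#     ("UPDATE accounts SET balance = balance - %s WHERE id = %s", (10, 1)),
#     ("UPDATE accounts SET balance = balance + %s WHERE id = %s", (10, 2)),
# ])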

Task 11: Observe MySQL throughput and flushing pressure

cr0x@server:~$ mysql -uroot -p -e "SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_dirty'; SHOW GLOBAL STATUS LIKE 'Innodb_log_waits';"
+--------------------------------+-------+
| Variable_name                  | Value |
+--------------------------------+-------+
| Innodb_buffer_pool_pages_dirty | 8421  |
+--------------------------------+-------+
+------------------+-------+
| Variable_name    | Value |
+------------------+-------+
| Innodb_log_waits | 187   |
+------------------+-------+

Meaning: Dirty pages indicate pending flush work; Innodb_log_waits counts how often the redo log buffer was too small and a commit had to wait for it to be flushed.
Decision: If log waits keep climbing, look at innodb_log_buffer_size and redo log sizing, and at storage latency; the redo path is your write-throughput ceiling.

Task 12: Validate connection count and thread behavior in MySQL

cr0x@server:~$ mysql -uroot -p -e "SHOW GLOBAL STATUS LIKE 'Threads_connected'; SHOW VARIABLES LIKE 'max_connections';"
+-------------------+-------+
| Variable_name     | Value |
+-------------------+-------+
| Threads_connected | 812   |
+-------------------+-------+
+-----------------+-------+
| Variable_name   | Value |
+-----------------+-------+
| max_connections | 1000  |
+-----------------+-------+

Meaning: 812 connections is a lot unless you planned for it. Connection storms can look like “database is slow” when it’s really thread scheduling.
Decision: Use connection pooling, cap concurrency, and validate that the app isn’t creating one connection per request.
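
One concrete way to do “pool connections and cap concurrency” on the application side, sketched with SQLAlchemy (my choice for the example; any pooler works, and the DSN, credentials, and pool sizes are placeholders):

from sqlalchemy import create_engine, text

# pool_size + max_overflow caps this process at 15 concurrent MySQL sessions,
# no matter how many requests arrive; extra callers wait for a free connection
# instead of piling hundreds of new sessions onto the server.
engine = create_engine(
    "mysql+pymysql://app:change-me@db.internal/prod",
    pool_size=10,
    max_overflow=5,
    pool_pre_ping=True,   # discard dead connections instead of handing them to requests
    pool_recycle=3600,    # recycle hourly to stay ahead of server-side idle timeouts
)

with engine.connect() as conn:
    conn.execute(text("SELECT 1"))

Multiply the per-process cap by the number of app instances and keep the total comfortably under max_connections.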

Task 13: Check SQLite compile/runtime version (because behavior changes)

cr0x@server:~$ sqlite3 --version
3.45.2 2024-03-12 11:06:23 df5c253c0b3dd24916e4ec7cf77d3db5294cc9fd3f24a94d9c0b6ec8d1c4a2c7

Meaning: Version tells you what WAL and locking fixes exist. Old versions can have different defaults and performance characteristics.
Decision: If you’re on something ancient bundled with an OS image, upgrade the library/app runtime where possible.

Task 14: Determine if you’re accidentally on a network filesystem

cr0x@server:~$ mount | grep -E ' /var/lib/app |nfs|cifs|fuse' | head
/dev/nvme0n1p2 on /var/lib/app type ext4 (rw,relatime)

Meaning: Local mount. Good. If you saw NFS/CIFS/FUSE here, you’d have an immediate action item.
Decision: If it’s remote, stop and re-architect: local disk for SQLite or move to MySQL/Postgres.

Task 15: Spot “death by autocommit” in application logs (pattern check)

cr0x@server:~$ grep -E "SQLITE_BUSY|database is locked|BEGIN|COMMIT" /var/log/app/app.log | tail -n 8
2025-12-30T01:12:10Z db=sqlite msg="BEGIN"
2025-12-30T01:12:10Z db=sqlite msg="COMMIT"
2025-12-30T01:12:10Z db=sqlite err="database is locked"
2025-12-30T01:12:10Z db=sqlite msg="BEGIN"
2025-12-30T01:12:10Z db=sqlite msg="COMMIT"

Meaning: BEGIN/COMMIT pairs per operation and immediate lock errors: you’re doing tiny transactions and colliding.
Decision: Batch writes inside explicit transactions; add backpressure; reduce worker concurrency.

Fast diagnosis playbook

When write performance collapses, you don’t have time to philosophize about ACID. You need a fast branching path.
Here’s the order that usually finds the bottleneck quickly.

First: identify the lock type and where contention is happening

  • SQLite: Are you seeing SQLITE_BUSY / database is locked? Check PRAGMA busy_timeout, WAL size, and whether you have multiple writers.
  • MySQL: Are queries “Waiting for row lock” or “Waiting for table metadata lock”? Use SHOW FULL PROCESSLIST and SHOW ENGINE INNODB STATUS.

Second: check I/O latency and durability settings

  • Run iostat -xz. If write await spikes, commits will stall.
  • SQLite: check PRAGMA synchronous. MySQL: check innodb_flush_log_at_trx_commit.

Third: confirm transaction shape and concurrency limits

  • Look for autocommit loops: many tiny transactions.
  • Verify worker pool sizes, request concurrency, and retry behavior.
  • In MySQL, check connection count and whether a pool exists.

Fourth: decide “tune” vs “redesign”

  • If it’s SQLite and you need multiple concurrent writers: assume redesign/migration.
  • If it’s MySQL and you have a hot row/index: redesign query patterns and data model; tuning alone won’t beat physics.
  • If it’s I/O: buy latency, reduce sync frequency safely, or reduce write volume.

Common mistakes: symptoms → root cause → fix

1) “database is locked” appears only at peak traffic

Symptoms: Sporadic lock errors, retries, timeouts, and a sudden jump in tail latency.

Root cause: Writer lock contention and queueing; too many concurrent writers; busy_timeout too low or zero.

Fix: Batch writes in explicit transactions; set a sensible busy_timeout; reduce writer concurrency; move to MySQL if multi-writer is a requirement.

2) WAL file grows endlessly

Symptoms: .db-wal grows large; disk usage creeps; periodic latency spikes.

Root cause: Checkpoints not running or blocked by long-running readers; checkpoint settings not matched to workload.

Fix: Ensure readers don’t hold transactions open; schedule/trigger checkpoints appropriately; consider smaller read transactions or snapshot usage patterns.

3) “SQLite is fast on my laptop” but slow in containers

Symptoms: Great dev performance; production shows lock timeouts and I/O stalls.

Root cause: Different storage: overlay filesystems, network volumes, throttled IOPS, or higher fsync latency.

Fix: Put SQLite on local, low-latency persistent storage; measure fsync cost; or move to MySQL where the server can absorb variability better.

4) MySQL write latency spikes without obvious locks

Symptoms: Queries “execute” but take much longer; CPU is moderate; disk busy.

Root cause: Redo log flush pressure or aggressive dirty page flushing; storage latency.

Fix: Improve disk latency; tune redo log size; verify flush policies; reduce transaction commit rate via batching.

5) MySQL is “down” but the real problem is connections

Symptoms: Errors about too many connections; response times degrade across the board.

Root cause: Connection-per-request behavior, pool misconfiguration, or retry storms.

Fix: Enforce pooling, cap concurrency, and apply backpressure; monitor Threads_connected and application queue depth.

6) “We added an index and writes got worse”

Symptoms: Reads improved, but insert/update throughput fell; lock time increased.

Root cause: Every extra index adds maintenance work to each write (write amplification) and another set of hot pages to contend on; monotonically increasing keys funnel inserts onto the same rightmost leaf, while random keys scatter dirty pages everywhere.

Fix: Keep indexes minimal; consider composite indexes carefully; avoid redundant indexes; use append-friendly keys when possible.

Three corporate mini-stories from the trenches

Mini-story 1: The incident caused by a wrong assumption

A mid-sized product team built a job runner that processed events from a message queue. Each worker wrote a status row to SQLite: enqueue time, start time, finish time,
and a couple counters. It ran great in staging. It ran great for the first month in production.

The assumption was quiet and deadly: “Workers are independent, so database writes are independent.” They were not. All workers wrote to the same file, on the same
volume, and each job produced multiple small transactions because the ORM was in autocommit mode.

Peak traffic arrived (predictably, at the same time marketing scheduled a campaign). Workers increased from a handful to dozens. Suddenly the queue length grew faster
than processing. Workers retried on lock errors, which increased write attempts. CPU stayed low. The service looked “fine” if you only watched CPU.

The tell was the log: a wall of database is locked and tiny transactions. The fix was not heroics: reduce writer concurrency, batch per-job writes into
one transaction, and move state storage to MySQL for multi-writer durability. The rest of the week was spent explaining to leadership why “fast embedded DB” doesn’t mean
“shared write database server.”

Mini-story 2: The optimization that backfired

Another team wanted “fewer network hops” and moved a small write-heavy feature from MySQL to SQLite embedded in an API service. Their logic was seductive:
no network, no TCP overhead, no server, no connection pools. Just a file. Performance tests looked great—on one instance.

Then the feature became popular. They scaled the API horizontally, because that’s what you do when you have a stateless service. Each instance pointed at the same
shared volume (a managed network filesystem) so they could “share the database.” That’s where the optimism met filesystem reality.

The system didn’t just slow down. It developed a personality: occasional pauses, lock timeouts, and weird behavior where a deploy would “fix” it for an hour.
WAL files grew, checkpointing became erratic, and the network filesystem’s locking semantics introduced extra latency variance. Debugging was miserable because each
instance was technically “healthy,” just blocked.

They backed out the change. Not because SQLite is bad, but because they tried to use it as a multi-node shared database. The “optimization” removed a network hop and
replaced it with a distributed lock and unpredictable storage latency. You can’t out-clever physics; you can only choose which part of physics you want to pay.

Mini-story 3: The boring but correct practice that saved the day

A payments-adjacent team ran MySQL for transactional workloads and used SQLite on edge devices for local caching. They had a rule: no SQLite database file was allowed
to have more than one active writer per process group, and every write path had an explicit transaction boundary and a bounded queue.

It was boring. It required code reviews that asked annoying questions like “What’s your maximum transaction time?” and “Where is backpressure applied?”
They also had synthetic load tests that modeled peak write rates and forced lock contention in pre-prod.

One day, a new feature accidentally introduced a write-on-read pattern: a read endpoint updated a “last_seen” field on every request. In MySQL, this would have been
noisy but survivable; on the edge SQLite caches, it turned reads into writes and threatened to serialize everything.

The safety rails did their job. The bounded queue filled, the service degraded predictably (some updates dropped, reads stayed fast), and alerts fired before the devices
became unusable. They reverted the change and replaced “write last_seen” with a periodic batch update. Nobody got paged at 02:00. This is what “boring” buys you.

Checklists / step-by-step plan

If you insist on running SQLite under concurrent load

  1. Turn on WAL unless you have a specific reason not to. Verify with PRAGMA journal_mode;.
  2. Batch writes: wrap multiple statements in one explicit transaction. Measure commit rate, not query rate.
  3. Set busy_timeout so you get controlled waiting instead of immediate failures; then cap retries at the application layer.
  4. Enforce a single writer with a queue (in-process or via a dedicated writer service); see the sketch after this list. Multiple writers aren’t “parallel,” they’re “contending.”
  5. Keep transactions short; do not hold read transactions open across slow application logic.
  6. Pin the database to local storage. Avoid network filesystems for shared writes.
  7. Watch WAL and checkpoint behavior: alert on WAL growth and checkpoint stalls.
  8. Load test with realistic concurrency, including retries and timeouts, not just “single-threaded benchmarks.”
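
Item 4 in practice: a dedicated writer thread draining a bounded queue and committing in batches. A minimal in-process sketch using only the Python standard library; the table, batch size, and queue depth are placeholders:

import queue
import sqlite3
import threading

write_q = queue.Queue(maxsize=10_000)  # bounded queue = built-in backpressure

def writer_loop(db_path, batch_size=200):
    conn = sqlite3.connect(db_path, timeout=5.0)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("CREATE TABLE IF NOT EXISTS events (ts REAL, payload TEXT)")
    while True:
        batch = [write_q.get()]                     # block until there is work
        while len(batch) < batch_size:
            try:
                batch.append(write_q.get_nowait())  # then drain whatever else is queued
            except queue.Empty:
                break
        with conn:                                  # one transaction per batch, one commit
            conn.executemany("INSERT INTO events (ts, payload) VALUES (?, ?)", batch)
        for _ in batch:
            write_q.task_done()

threading.Thread(target=writer_loop, args=("/var/lib/app/app.db",), daemon=True).start()

# Producers just call write_q.put((timestamp, payload)); when the queue is full,
# put() blocks, which is exactly the backpressure step 4 asks for.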

If you’re choosing MySQL for concurrency

  1. Use InnoDB (yes, still worth saying) and confirm durable settings match business requirements.
  2. Design away hot spots: avoid single-row counters; avoid global locks disguised as “status tables.”
  3. Index responsibly: every index is a write tax; pay only the taxes you need.
  4. Pool connections and cap concurrency; do not let every request spawn a new session.
  5. Monitor lock waits and deadlocks and treat them as feedback about your transaction design.
  6. Test under write bursts, not just steady-state. Many outages happen when buffers fill and flushing becomes urgent.

Migration trigger checklist: when SQLite is the wrong hill to die on

  • You need multiple concurrent writers across threads/processes and can’t serialize them without hurting latency.
  • You need horizontal scaling with shared state across nodes.
  • You depend on networked storage for the database file.
  • You can’t batch writes because the product requires immediate commit per event at high rate.
  • You need online schema changes and operational tooling that expects a server database.

SQLite is a fantastic tool, but using it as a shared write database is like using a screwdriver as a chisel: it works until it becomes a story you tell at onboarding.

FAQ

1) Is SQLite really single-writer?

Practically, yes: only one writer can commit at a time. WAL improves read concurrency, not multi-writer throughput.
You can have multiple connections issuing writes, but they serialize on the writer lock.

2) Does WAL mode make SQLite safe for high concurrency?

It makes it safer for read-heavy concurrency and mixed workloads, because readers don’t block on a writer the same way.
It doesn’t remove the single-writer bottleneck; it often just moves it or changes the shape (checkpoint spikes).

3) Why do writes cliff instead of slowly getting worse?

Because you’re saturating a serialized resource. Queueing theory: as utilization approaches 100%, wait times explode.
Add retries and variable commit time, and the explosion becomes theatrical.

4) Can I fix SQLite concurrency by adding more CPU?

Usually no. The limiter is lock serialization and fsync/I/O latency, not CPU. Faster storage and fewer commits help more than more cores.

5) Is “database is locked” always bad?

It’s a signal. A few busy responses under bursts can be fine if you have backpressure and bounded retries.
Persistent lock errors mean your write arrival rate exceeds what the single writer can service, or transactions are too long.

6) Why does MySQL handle concurrent writes better?

InnoDB supports row-level locks and MVCC, so independent writes can proceed concurrently if they don’t touch the same rows/index pages.
It still has bottlenecks, but they’re often localized and redesignable.

7) When does MySQL still behave like a “traffic cliff”?

Hot rows (single counters), big secondary indexes under heavy insert, redo log pressure, or connection storms.
MySQL can degrade gradually, but it can also fall off a cliff if you build a hotspot and then pile on retries.

8) Should I reduce durability to improve write throughput?

Sometimes, but do it deliberately. SQLite PRAGMA synchronous and MySQL innodb_flush_log_at_trx_commit are business decisions.
If you can’t tolerate data loss, don’t “optimize” yourself into a compliance incident.

9) Can I run SQLite on NFS if I’m careful?

You can try, and you might even get lucky for a while. But locking semantics and latency variance make it a high-risk bet.
If you need shared writes across nodes, prefer a database server.

10) What’s the simplest safe pattern for SQLite in production?

Single writer (queue), WAL mode, batched transactions, local disk, short-lived read transactions, and bounded retries with backpressure.
If that sounds like you’re building a database server, you’re learning the right lesson.

Next steps you can actually do this week

If you’re already on SQLite and feeling the cliff:

  • Measure: confirm journal mode, synchronous, busy_timeout; inspect WAL size; correlate lock errors with disk latency.
  • Fix the shape: batch writes in explicit transactions; cap writer concurrency; add backpressure; stop infinite retries.
  • Stabilize: ensure local storage; prevent long-running read transactions; watch checkpoint behavior.
  • Decide: if you need multi-writer concurrency across processes/nodes, plan migration to MySQL (or another server database) instead of “tuning harder.”

If you’re choosing between MySQL and SQLite for a new system:

  • Use SQLite when you want embedded simplicity, mostly reads, bounded write concurrency, and local durability.
  • Use MySQL when you need many concurrent writers, centralized state across multiple app instances, operational tooling, and predictable scaling under load.

The cliff isn’t mysterious. It’s a contract you accidentally signed: “one writer at a time.” If that contract matches your workload, SQLite is brilliant.
If it doesn’t, the cliff is not a bug. It’s the bill.
