You don’t notice write concurrency until you do. Everything is fine in dev, fine in staging, fine at 2 a.m. with one user.
Then the business launches a feature, your job queue goes hot, and suddenly your “simple” database starts returning
“database is locked” like it’s auditioning for a recurring role.
This is where the PostgreSQL vs SQLite decision stops being philosophical and becomes operational.
Concurrent writers aren’t a nice-to-have; they’re the difference between a system that degrades gracefully and one that
turns into a polite denial-of-service against itself.
The blunt answer: who wins?
For concurrent writers, PostgreSQL wins. Not by a little. By design.
SQLite is a brilliant embedded database that optimizes for simplicity, zero administration, and reliability in a single-file
footprint. It can handle multiple readers concurrently. It can even handle “multiple writers” in the sense that many
processes may attempt writes. But it ultimately serializes writes through locking around the database file, especially around
committing transactions.
PostgreSQL is a server database built to accept many client connections and to sustain a steady stream of transactions from
many concurrent sessions. It uses MVCC (multi-version concurrency control) to decouple readers from writers and has mature lock
management, background processes, WAL (write-ahead log) shipping options, and the tooling to tell you what’s stuck and why.
If your workload includes multiple independent workers inserting/updating frequently—think web requests, background jobs, ingestion
pipelines, or anything “queue-like”—PostgreSQL is the boring, correct choice. SQLite can still be correct, but only when you design
around its write serialization (batching, single-writer pattern, WAL mode with realistic expectations).
One sentence rule: if write contention is part of the problem statement, don’t pick a single-file database and then act surprised
when the file becomes the bottleneck.
Interesting facts and historical context
- SQLite began in 2000 as an embedded database for internal tooling. Its “serverless” design is a feature, not an omission.
- PostgreSQL traces back to the 1980s (POSTGRES project at UC Berkeley). That lineage shows in its focus on concurrency and extensibility.
- SQLite’s entire database is a single file (plus optional shared memory / WAL files). That’s operationally convenient and concurrency-hostile.
- SQLite’s WAL mode (introduced in 2010) improved reader/writer concurrency dramatically, but did not turn it into a multi-writer engine.
- PostgreSQL’s WAL exists to guarantee durability and enable crash recovery; it also enables replication and point-in-time recovery.
- SQLite is everywhere: phones, browsers, appliances, games. The typical workload is local, short transactions, low writer concurrency.
- PostgreSQL’s MVCC means updates create new row versions; old versions stick around until vacuumed. That’s the price of high concurrency.
- SQLite’s “database is locked” is not a mystery; it is the engine telling you: “I’m doing a write; wait your turn.”
- PostgreSQL lock diagnostics are first-class via system catalogs; you can see blockers, waiters, and query text in production.
What “concurrent writers” actually means in each engine
Concurrency is not parallelism; it’s scheduling under contention
When people say “concurrent writers,” they often mean “I have N workers and I want throughput roughly proportional to N.”
That expectation is reasonable in PostgreSQL (until you saturate CPU, I/O, locks, or commit latency). In SQLite, it’s the
wrong mental model unless you’ve centralized writes.
A writer isn’t just INSERT or UPDATE. It’s:
- acquiring locks,
- writing pages / WAL records,
- fsync’ing (durability),
- updating indexes,
- and potentially waiting on other writers.
If the system spends 20ms per transaction waiting on durable commit, you can’t brute-force your way to 10,000 TPS
by starting more threads. You’ll just create a lock-wait festival.
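For a sense of scale, here is the back-of-envelope math as a tiny Python sketch; the 20ms figure is the hypothetical commit latency from the paragraph above, not a measurement.
# Back-of-envelope: durable-commit ceiling for one serialized write lane.
commit_latency_s = 0.020                      # hypothetical 20 ms per durable commit
serialized_tps = 1 / commit_latency_s
print(f"one lane, one row per commit: ~{serialized_tps:.0f} commits/s")   # ~50

# Batching raises row throughput without adding lanes (or lock waiters).
rows_per_txn = 200
print(f"one lane, {rows_per_txn} rows per commit: ~{serialized_tps * rows_per_txn:.0f} rows/s")  # ~10,000
The way past the ceiling is batching and group commit, not more threads fighting over the same lock.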
SQLite: “one lane, merge politely”
SQLite uses locks on the database file (and in WAL mode, coordination files) to ensure consistency. In rollback journal mode
(the older default), a writer blocks readers during commit phases. WAL mode improves that: readers don’t block writers and writers
don’t block readers as much. But writers still serialize. Only one writer can commit at a time.
You can have multiple processes attempting to write concurrently. SQLite will queue them via lock contention. If you don’t
configure timeouts and retry logic, you’ll see failures.
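What “timeouts and retry logic” looks like in practice: a minimal sketch using Python’s built-in sqlite3 module. The database path matches the examples later in this article; the events table and the backoff numbers are illustrative, not prescriptive.
import random
import sqlite3
import time

# timeout=5.0 wires up SQLite's busy handler: wait up to 5 s for the write lock.
conn = sqlite3.connect("/var/lib/app/app.db", timeout=5.0)
conn.execute("PRAGMA journal_mode=WAL")

def write_with_retry(sql, params, attempts=5):
    for attempt in range(attempts):
        try:
            with conn:                       # commits on success, rolls back on error
                conn.execute(sql, params)
            return
        except sqlite3.OperationalError as exc:
            if "locked" not in str(exc) or attempt == attempts - 1:
                raise
            # Exponential backoff with jitter so retries don't stampede in lockstep.
            time.sleep((2 ** attempt) * 0.05 + random.uniform(0, 0.05))

write_with_retry("INSERT INTO events(ts, type) VALUES (?, ?)", (time.time(), "click"))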
PostgreSQL: “multiple lanes, traffic lights, and a control room”
PostgreSQL is designed for concurrent sessions. Writers generally don’t block readers due to MVCC. Writers can block each other,
but typically only on the same rows (or on heavyweight locks like schema changes).
PostgreSQL has:
- row-level locks for conflicting updates,
- transaction snapshots,
- background writer and checkpointer processes,
- WAL buffers and group commit,
- and introspection views for diagnosing waits.
The key: concurrency is part of the architecture, not an “it might work if you’re careful” mode.
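To make “writers generally don’t block readers” concrete, here is a small two-session sketch using the psycopg2 driver. The DSN and the accounts table are placeholders; the point is that the reader returns immediately with the last committed row version while the writer’s transaction is still open.
import psycopg2

dsn = "dbname=appdb user=app"              # placeholder connection string

writer = psycopg2.connect(dsn)             # session 1: holds an open write transaction
reader = psycopg2.connect(dsn)             # session 2: concurrent reader

with writer.cursor() as w:
    # Uncommitted update: the row lock is held until commit or rollback.
    w.execute("UPDATE accounts SET balance = balance - 10 WHERE id = 1")

    with reader.cursor() as r:
        # MVCC: the reader is not blocked; it sees the last committed row version.
        r.execute("SELECT balance FROM accounts WHERE id = 1")
        print("reader sees:", r.fetchone()[0])

writer.commit()                            # only another writer on the same row would have waited
reader.close()
writer.close()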
SQLite under write contention: what really happens
Locking modes and why your “fast” benchmark lies
SQLite looks incredible in a single-threaded benchmark. It’s in-process, there’s no network, no server context switches,
and the query planner is sharp. Then you add writers.
In rollback journal mode, a writer needs to lock the database for parts of the write transaction. The lock escalates through
phases (shared, reserved, pending, exclusive), and other connections may be blocked or rejected depending on timing.
In WAL mode, readers can proceed while a writer appends to the WAL, but commits still serialize.
What you feel operationally:
- write latency spikes under contention,
- “database is locked” errors appear unless you wait/retry,
- checkpoints become a hidden performance lever,
- and “just add more workers” stops working quickly.
WAL mode helps, but it’s not a multi-writer miracle
WAL mode is usually the right choice if you’re doing any meaningful concurrency in SQLite. It keeps readers from blocking writers
in the common case. But WAL introduces a new moving part: checkpoints. If checkpoints can’t keep up, the WAL file grows, and readers
can be forced to scan more history. And your filesystem now has more write patterns to deal with.
You still get a single writer at the point of commit. That’s the architectural constraint. You can manage it. You can even make it
work nicely. But you don’t “tune” your way into true concurrent writes.
Durability settings: you are trading safety for speed
SQLite’s PRAGMA synchronous and PRAGMA journal_mode are where teams quietly make deals with the reliability devil.
If you set synchronous=OFF, you can absolutely get higher throughput. You can also absolutely lose committed transactions
on power failure. Decide like an adult: write down the durability requirement, then configure accordingly.
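A sketch of what that written-down decision can look like in connection setup, again with Python’s sqlite3. The synchronous=NORMAL choice assumes the documented WAL-mode behavior: a crash may roll back the most recent commits but should not corrupt the file. If that trade-off is unacceptable, keep FULL and pay the latency.
import sqlite3

conn = sqlite3.connect("/var/lib/app/app.db", timeout=5.0)

# The durability decision, encoded where code review can see it:
conn.execute("PRAGMA journal_mode=WAL")    # readers and the writer stop blocking each other
conn.execute("PRAGMA synchronous=NORMAL")  # WAL-safe: a crash may drop the last commits, not the file
conn.execute("PRAGMA busy_timeout=5000")   # wait for the write lock instead of failing instantly

# If the WAL file grows, checkpoint explicitly during a quiet window:
conn.execute("PRAGMA wal_checkpoint(TRUNCATE)")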
Joke #1: SQLite is “serverless” the way instant ramen is “cooking.” It works, but don’t pretend it’s the same kitchen.
PostgreSQL under write contention: what really happens
MVCC: the reason readers don’t scream when writers show up
PostgreSQL’s MVCC means readers see a snapshot of the database as of the start of their transaction (or statement, depending on isolation).
Writers create new row versions. Readers keep reading old versions until they’re done. That’s the core concurrency win.
The trade-off is cleanup. Old row versions (dead tuples) must be vacuumed. If you ignore vacuum, you pay later in bloat, cache churn,
and index inefficiency. PostgreSQL will not save you from neglect; it will merely wait until you’re busiest.
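One quick way to keep an eye on that cleanup debt is to watch dead tuples per table via pg_stat_user_tables. A sketch with psycopg2; the connection string is a placeholder.
import psycopg2

with psycopg2.connect("dbname=appdb user=app") as conn, conn.cursor() as cur:
    cur.execute("""
        SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
        FROM pg_stat_user_tables
        ORDER BY n_dead_tup DESC
        LIMIT 5
    """)
    for relname, live, dead, last_av in cur.fetchall():
        # A steadily growing dead/live ratio means vacuum is falling behind.
        print(f"{relname}: {dead} dead vs {live} live, last autovacuum {last_av}")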
WAL, commits, and why fsync is your actual throughput ceiling
Every durable commit means WAL must reach stable storage. PostgreSQL can do group commit: multiple transactions can share an fsync.
That’s how it sustains high commit rates on decent storage. But the underlying physics still apply: durable commits are gated by write latency.
If your I/O subsystem has 3–10ms latency for flushes under load, your max durable commits per second is not infinite, regardless of CPU.
This is where SRE reality shows up: performance tuning quickly becomes storage engineering.
Locking: the real failure mode is not “locks exist,” it’s “locks surprise you”
PostgreSQL has robust lock semantics. That’s not the issue. The issue is teams failing to understand where locks come from:
- long transactions holding row locks,
- schema migrations taking ACCESS EXCLUSIVE locks,
- autovacuum or VACUUM FULL,
- foreign key checks under high write rates,
- hot spots on the same index pages (especially with monotonically increasing keys).
You can see blockers and waiters in PostgreSQL. Use that. Don’t guess.
A paraphrased idea, with attribution: Werner Vogels often emphasizes that “everything fails, all the time”; design systems so failure is normal, not exceptional.
Joke #2: If you put SQLite behind a web server and call it “distributed,” congratulations—you’ve invented a very small traffic jam.
Latency, fsync, and the storage layer (where most “DB problems” live)
Concurrent writers aren’t only a database engine story. They’re a storage story. Durable transactions mean you flush.
Flush means the kernel and the device agree data is on stable media. That operation has latency and variance.
Variance is the killer: p99 commit latency is what determines queue backup, lock wait buildup, and tail latency in services.
SQLite typically uses filesystem locking and writes to a single file (plus journal/WAL). On many filesystems, contention on that file
and metadata can show up quickly under many writers. PostgreSQL spreads work across multiple files (relation segments) and maintains WAL,
but it still ultimately depends on storage flush behavior.
The practical implication: if PostgreSQL is slow under write load, you can often fix it with better IOPS/latency, WAL tuning,
checkpoint tuning, schema/index changes, batching, or partitioning. If SQLite is slow under many writers, the fix is usually architectural:
reduce concurrency at the database file, or move to a server database.
Fast diagnosis playbook
When “writes are slow” hits production, don’t start with opinion wars. Start with evidence. Here’s the order that minimizes wasted time.
First: confirm the bottleneck class (lock, CPU, or I/O)
- SQLite: are you seeing “database is locked” or long waits on writes? That’s contention, not “mysterious slowness.”
- PostgreSQL: check wait events. If sessions wait on locks, it’s contention. If they wait on WAL or I/O, it’s storage/commit. If CPU is pegged, it’s compute/query/index work.
Second: isolate commit latency vs query work
- If transactions are small but commits are slow, look at fsync, WAL flush, and storage latency.
- If commits are fast but statements are slow, look at indexes, hot rows, and query plans.
Third: check for “one big transaction” holding everyone hostage
- Long-running transactions block vacuum in PostgreSQL, which increases bloat and can cascade into write amplification.
- In SQLite, a transaction that stays open during a batch can keep the writer lock longer than you expected.
Fourth: verify your concurrency control is intentional
- SQLite: do you have a busy timeout and retry/backoff? Is WAL mode enabled? Are you checkpointing sensibly?
- PostgreSQL: do you have a connection pool? Are you saturating the server with too many concurrent writers?
Practical tasks with commands, outputs, and decisions
These are the kinds of tasks you run during an incident or a migration decision. Each one includes: the command, what the output means,
and what decision you make from it. Commands are realistic; adapt paths and service names.
Task 1: Find SQLite journal mode and synchronous level
cr0x@server:~$ sqlite3 /var/lib/app/app.db 'PRAGMA journal_mode; PRAGMA synchronous;'
wal
2
Meaning: journal mode is WAL; synchronous=2 (FULL). You’re paying for durability on each transaction.
Decision: If latency is too high, first try batching writes or reducing commit frequency, not turning durability off.
Only relax synchronous if business explicitly accepts data loss on crash.
Task 2: Check SQLite for lock errors in logs
cr0x@server:~$ journalctl -u app.service -n 200 | grep -E 'database is locked|SQLITE_BUSY' | tail
Dec 30 09:22:11 server app[1842]: ERROR db write failed: database is locked
Dec 30 09:22:12 server app[1842]: ERROR db write failed: SQLITE_BUSY in insert_event
Meaning: the application is not waiting long enough, or contention is high enough that timeouts expire.
Decision: Add busy_timeout, implement retry with jitter, and reduce the number of concurrent writers (single-writer queue pattern).
If writers are truly independent and high-rate, plan migration to PostgreSQL.
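The single-writer queue pattern from that decision, as a minimal sketch using Python’s queue and threading modules. The table and batch size are illustrative; the point is that workers enqueue rows and exactly one thread owns the write transaction.
import queue
import sqlite3
import threading
import time

work = queue.Queue()                         # every worker enqueues; one thread writes

def writer_loop(db_path):
    conn = sqlite3.connect(db_path, timeout=5.0)
    conn.execute("PRAGMA journal_mode=WAL")
    while True:
        batch = [work.get()]                 # block until there is something to write
        while not work.empty() and len(batch) < 500:
            batch.append(work.get())         # drain a batch: fewer commits, more rows per commit
        with conn:                           # one short transaction per batch
            conn.executemany("INSERT INTO events(ts, type) VALUES (?, ?)", batch)

threading.Thread(target=writer_loop, args=("/var/lib/app/app.db",), daemon=True).start()

# Workers call this instead of opening their own write transactions:
work.put((time.time(), "click"))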
Task 3: Confirm SQLite busy timeout at runtime
cr0x@server:~$ sqlite3 /var/lib/app/app.db 'PRAGMA busy_timeout;'
0
Meaning: no busy timeout. SQLite will fail immediately when it can’t get the lock.
Decision: Set a sane timeout in the application connection setup (e.g., 2000–10000ms depending on SLO) and add retries.
Task 4: Observe WAL growth (checkpoint pressure)
cr0x@server:~$ ls -lh /var/lib/app/app.db*
-rw-r----- 1 app app 1.2G Dec 30 09:23 /var/lib/app/app.db
-rw-r----- 1 app app 3.8G Dec 30 09:23 /var/lib/app/app.db-wal
-rw-r----- 1 app app 32K Dec 30 09:23 /var/lib/app/app.db-shm
Meaning: WAL is huge compared to the database. Checkpointing is not keeping up or is blocked by long readers.
Decision: Identify long read transactions; adjust checkpoint strategy (application-level wal_checkpoint if appropriate),
or move write-heavy workload away from SQLite.
Task 5: Check for long-running SQLite readers (common WAL checkpoint blocker)
cr0x@server:~$ lsof /var/lib/app/app.db | head
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
app 1842 app 12u REG 253,0 1288490189 49155 /var/lib/app/app.db
worker 1910 app 10u REG 253,0 1288490189 49155 /var/lib/app/app.db
report 2201 app 8u REG 253,0 1288490189 49155 /var/lib/app/app.db
Meaning: multiple processes have the DB open. That’s normal, but long-lived report jobs can keep snapshots open.
Decision: In SQLite, shorten read transactions and avoid holding a transaction open while streaming results. Move long reporting jobs off the live database: run them against exported data now, or against read replicas once you’re on PostgreSQL.
Task 6: PostgreSQL: see who is waiting and on what
cr0x@server:~$ psql -d appdb -c "SELECT pid, usename, wait_event_type, wait_event, state, now()-query_start AS age, left(query,80) AS query FROM pg_stat_activity WHERE state <> 'idle' ORDER BY age DESC LIMIT 8;"
 pid  | usename | wait_event_type |  wait_event  | state  |   age    | query
------+---------+-----------------+--------------+--------+----------+------------------------------------------------
 6241 | app     | Lock            | tuple        | active | 00:01:12 | UPDATE orders SET status='paid' WHERE id=$1
 6310 | app     | IO              | DataFileRead | active | 00:00:49 | INSERT INTO events(ts, type, payload) VALUES...
 6188 | app     | WAL             | WALWrite     | active | 00:00:20 | INSERT INTO events(ts, type, payload) VALUES...
Meaning: you have at least three bottleneck types: row-level lock waits (tuple), data file reads, and WAL writes.
Decision: For Lock/tuple, find the blocker and reduce hot-row contention. For WAL, investigate commit/WAL device latency and checkpoint settings.
For DataFileRead, check cache hit rate and missing indexes.
Task 7: PostgreSQL: find the blocking query
cr0x@server:~$ psql -d appdb -c "SELECT blocked.pid AS blocked_pid, blocker.pid AS blocker_pid, blocked.query AS blocked_query, blocker.query AS blocker_query FROM pg_stat_activity blocked JOIN pg_locks bl ON bl.pid=blocked.pid AND NOT bl.granted JOIN pg_locks kl ON kl.locktype=bl.locktype AND kl.database IS NOT DISTINCT FROM bl.database AND kl.relation IS NOT DISTINCT FROM bl.relation AND kl.page IS NOT DISTINCT FROM bl.page AND kl.tuple IS NOT DISTINCT FROM bl.tuple AND kl.virtualxid IS NOT DISTINCT FROM bl.virtualxid AND kl.transactionid IS NOT DISTINCT FROM bl.transactionid AND kl.classid IS NOT DISTINCT FROM bl.classid AND kl.objid IS NOT DISTINCT FROM bl.objid AND kl.objsubid IS NOT DISTINCT FROM bl.objsubid AND kl.pid != bl.pid JOIN pg_stat_activity blocker ON blocker.pid=kl.pid;"
blocked_pid | blocker_pid | blocked_query | blocker_query
------------+------------+------------------------------------+------------------------------------------
6241 | 6177 | UPDATE orders SET status='paid'... | UPDATE orders SET status='paid'...
Meaning: writers are colliding on the same rows (or adjacent hot rows).
Decision: Fix application logic: avoid multiple workers fighting over the same row set; use SELECT ... FOR UPDATE SKIP LOCKED work-queue patterns; partition hot tables.
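The SKIP LOCKED work-queue pattern from that decision, sketched with psycopg2. The jobs table, its columns, and the DSN are placeholders for illustration.
import psycopg2

def claim_and_process_one(dsn="dbname=appdb user=app"):
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        # Each worker claims a different pending row; rows locked by other
        # workers are skipped, so nobody queues up behind the same job.
        cur.execute("""
            SELECT id, payload
            FROM jobs
            WHERE status = 'pending'
            ORDER BY id
            LIMIT 1
            FOR UPDATE SKIP LOCKED
        """)
        row = cur.fetchone()
        if row is None:
            return False                     # nothing pending right now
        job_id, payload = row
        # ... do the actual work here, keeping this transaction short ...
        cur.execute("UPDATE jobs SET status = 'done' WHERE id = %s", (job_id,))
    return True                              # the with-block committed claim + status change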
Task 8: PostgreSQL: check commit and WAL pressure via stats
cr0x@server:~$ psql -d appdb -c "SELECT datname, xact_commit, xact_rollback, blks_read, blks_hit, temp_files, temp_bytes FROM pg_stat_database WHERE datname='appdb';"
 datname | xact_commit | xact_rollback | blks_read | blks_hit | temp_files | temp_bytes
---------+-------------+---------------+-----------+----------+------------+------------
 appdb   |     8921341 |         31221 |   1842290 | 44211987 |       1842 |  987654321
Meaning: heavy commit volume and notable temp usage. Temp files often mean sorts/hashes spilling to disk.
Decision: If temp spills correlate with write slowdown, tune work_mem for the specific query paths or add indexes to avoid big sorts.
Commit volume suggests batching or using fewer transactions if application can tolerate it.
Task 9: PostgreSQL: watch checkpoint behavior (classic write-latency culprit)
cr0x@server:~$ psql -d appdb -c "SELECT checkpoints_timed, checkpoints_req, checkpoint_write_time, checkpoint_sync_time, buffers_checkpoint, buffers_backend FROM pg_stat_bgwriter;"
checkpoints_timed | checkpoints_req | checkpoint_write_time | checkpoint_sync_time | buffers_checkpoint | buffers_backend
------------------+-----------------+-----------------------+----------------------+-------------------+----------------
122 | 987 | 932112 | 121009 | 8123344 | 223445
Meaning: many requested checkpoints (forced by WAL volume) and large checkpoint write time. Backend writes exist too.
Decision: Reduce checkpoint spikes: increase max_wal_size, tune checkpoint_timeout, raise checkpoint_completion_target.
Then validate storage can sustain the write rate.
Task 10: PostgreSQL: measure cache hit ratio quickly
cr0x@server:~$ psql -d appdb -c "SELECT round(100.0 * sum(blks_hit) / nullif(sum(blks_hit)+sum(blks_read),0),2) AS cache_hit_pct FROM pg_stat_database;"
cache_hit_pct
--------------
96.01
Meaning: decent hit rate, but 96% under heavy load can still mean a lot of reads.
Decision: If write throughput is gated by reads (indexes missing, random I/O), focus on query plans and indexes before buying hardware.
Task 11: System-level: confirm storage latency under load
cr0x@server:~$ iostat -xz 1 3
Linux 6.8.0 (server) 12/30/2025 _x86_64_ (8 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
18.22 0.00 6.11 14.90 0.00 60.77
Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz aqu-sz %util
nvme0n1 120.0 12288.0 0.0 0.00 1.80 102.40 950.0 97280.0 0.0 0.00 12.40 102.40 4.10 96.00
Meaning: write await ~12ms with 96% utilization and significant iowait. Your “database problem” is now a storage problem.
Decision: Reduce synchronous commit frequency (batching/grouping), move WAL to faster storage, or upgrade device/RAID/cache settings.
Also validate you’re not throttled by cloud volume limits.
Task 12: PostgreSQL: check WAL generation rate
cr0x@server:~$ psql -d appdb -c "SELECT now(), pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(),'0/0')) AS wal_since_boot;"
now | wal_since_boot
------------------------------+----------------
2025-12-30 09:25:33.11452+00 | 384 GB
Meaning: you’re producing a lot of WAL. That’s normal for heavy writes, but it drives checkpoints and replication bandwidth.
Decision: If WAL volume is unexpectedly high, check for update churn (frequent updates to the same rows), bloated indexes, or unneeded indexes.
Consider partitioning or reducing secondary indexes on ingest tables.
Task 13: PostgreSQL: find top write queries quickly
cr0x@server:~$ psql -d appdb -c "SELECT calls, rows, mean_exec_time, left(query,90) AS query FROM pg_stat_statements WHERE query ILIKE 'insert%' OR query ILIKE 'update%' ORDER BY mean_exec_time DESC LIMIT 5;"
calls | rows | mean_exec_time | query
-------+--------+----------------+------------------------------------------------------------------------------------------
8821 | 8821 | 14.22 | INSERT INTO events(ts, type, payload) VALUES ($1,$2,$3)
1102 | 1102 | 11.90 | UPDATE orders SET status=$1, updated_at=now() WHERE id=$2
Meaning: the inserts are materially expensive. Could be WAL/fsync, indexes, or contention.
Decision: If insert mean time tracks WAL waits, focus on commit and WAL device. If it tracks CPU, reduce indexes or batch inserts using multi-row insert or COPY.
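A sketch of the batching side of that decision with psycopg2: a multi-row insert via execute_values, and COPY from an in-memory buffer. Table and column names mirror the earlier examples; treat the code as a shape, not a drop-in.
import io
import psycopg2
from psycopg2.extras import execute_values

rows = [("2025-12-30T09:25:00Z", "click", "{}")] * 1000        # pretend batch of events

with psycopg2.connect("dbname=appdb user=app") as conn, conn.cursor() as cur:
    # Option 1: one multi-row INSERT, one commit, instead of 1000 tiny transactions.
    execute_values(cur, "INSERT INTO events (ts, type, payload) VALUES %s", rows)

    # Option 2: COPY from memory, usually the fastest bulk path for ingest tables.
    buf = io.StringIO("".join(f"{ts}\t{typ}\t{payload}\n" for ts, typ, payload in rows))
    cur.copy_expert("COPY events (ts, type, payload) FROM STDIN", buf)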
Task 14: SQLite: test contention by simulating two writers (quick repro)
cr0x@server:~$ sqlite3 /tmp/t.db 'PRAGMA journal_mode=WAL; CREATE TABLE IF NOT EXISTS t(id INTEGER PRIMARY KEY, v TEXT);'
wal
cr0x@server:~$ printf "BEGIN IMMEDIATE;\nINSERT INTO t(v) VALUES('a');\nSELECT 'writer1 holding';\n.shell sleep 2\nCOMMIT;\n" | sqlite3 /tmp/t.db & sleep 0.3; sqlite3 /tmp/t.db "PRAGMA busy_timeout=500; INSERT INTO t(v) VALUES('b');"; wait
writer1 holding
Error: database is locked
Meaning: even in WAL mode, while the first writer holds its write transaction open, a second writer that can’t wait long enough fails.
Decision: Keep transactions short; set busy timeout; don’t hold write transactions while doing application work.
If you need real concurrent writes, stop fighting physics: move to PostgreSQL.
Three corporate-world mini-stories (anonymized, plausible, painful)
Incident: the wrong assumption (“SQLite supports concurrent writes, right?”)
A mid-sized team shipped a “lightweight analytics” feature inside a service that already handled user requests.
They chose SQLite because it was easy: a file, a migration script, and no new infrastructure. They even enabled WAL mode.
In staging, it flew. In production, the service ran with 12 worker processes and a bursty job queue.
The first symptom was subtle: sporadic 500s during traffic spikes, all pointing to failed inserts. The error text was clear:
database is locked. But the team read it like a transient glitch rather than a design constraint.
They added retries. It got “better,” then worse.
The real problem was that each worker opened a transaction, did a few inserts, and then performed a network call before committing.
That network call sometimes took hundreds of milliseconds. During that time, the writer lock was held. Other workers queued, timed out,
retried, and created a self-amplifying storm of lock attempts.
They fixed it in two phases. First, they moved the network call outside the transaction and shortened the transaction scope.
That stabilized the service. Second, they migrated analytics writes to PostgreSQL, leaving SQLite only for local caching on edge nodes.
It stopped being dramatic. It became boring. Boring is the goal.
Optimization that backfired: “Let’s increase concurrency to speed up ingestion”
Another company had a PostgreSQL-backed ingestion pipeline writing events into a single large table.
In an attempt to hit a new throughput target, they increased the number of ingestion workers from “a few” to “a lot.”
CPU stayed fine. Network stayed fine. Latency exploded anyway.
They assumed the database would scale linearly with worker count. Instead, the system hit contention and I/O limits.
Autovacuum started to work harder. Checkpoints became more frequent due to WAL volume.
Commit time p99 climbed, which backed up the queue, which increased parallelism further because “workers were idle waiting.”
They also had a well-meaning index on a high-cardinality JSON-derived field that almost never got used for reads.
Every insert paid for it. Under low concurrency, it was tolerable. Under high concurrency, it was the tax that pushed them over the edge.
The system wasn’t “slow.” It was doing exactly what they asked, just not what they wanted.
The fix wasn’t “more tuning.” It was: reduce worker count to match the storage commit capacity, remove or delay the expensive index,
and batch inserts using COPY during peak ingestion windows. After that, throughput improved and latency stabilized.
They learned the old SRE lesson: concurrency is a tool, not a virtue.
Boring but correct practice that saved the day: “We sized WAL and practiced restores”
A payments-adjacent system (not the ledger, but close enough to matter) ran PostgreSQL with steady write load.
The engineering manager insisted on three things that nobody found exciting:
scheduled vacuum review, WAL and checkpoint settings documented with rationale, and routine restore drills to a separate environment.
One afternoon, a schema migration deployed with a new index creation. It wasn’t a catastrophic migration, but it increased write
amplification and WAL generation more than expected. Replication lag grew. Alerts fired. The on-call could see it in the stats:
more requested checkpoints, more WAL volume, rising commit latency.
Because the team had already sized max_wal_size and checkpoint targets conservatively, the system didn’t immediately thrash itself.
Because they practiced restores, rolling back the migration and re-deploying a safer version wasn’t terrifying.
And because vacuum health was monitored, bloat didn’t compound the issue.
The incident was still annoying, but it stayed within “annoying.” No data loss. No multi-day recovery epic.
The boring practices didn’t make headlines. They prevented them.
Common mistakes: symptoms → root cause → fix
1) Symptom: SQLite “database is locked” during peak traffic
Root cause: too many concurrent writers, no busy timeout, long transactions, or writes performed while holding a transaction open.
Fix: enable WAL mode; set busy_timeout; keep transactions short; implement retry with jitter; centralize writes through a single writer; migrate write-heavy paths to PostgreSQL.
2) Symptom: SQLite WAL file grows without bound
Root cause: checkpoints not running, checkpoints blocked by long readers, or an app pattern that keeps read transactions open.
Fix: shorten read transactions; avoid streaming reads while a transaction stays open; run periodic checkpoints; consider separating reporting from OLTP writes.
3) Symptom: PostgreSQL inserts slow down as concurrency increases
Root cause: commit latency bound (WAL flush), storage saturation, or too many small transactions.
Fix: batch inserts; use COPY; ensure WAL is on low-latency storage; tune checkpoints (max_wal_size, checkpoint_completion_target); consider async commit only if acceptable.
4) Symptom: PostgreSQL writers stuck waiting on locks
Root cause: hot rows, queue patterns without SKIP LOCKED, long transactions, or DDL locking during peak load.
Fix: redesign hot spots; use SELECT ... FOR UPDATE SKIP LOCKED; keep transactions short; run DDL with lock-friendly patterns; schedule heavy migrations off-peak.
5) Symptom: PostgreSQL bloat and worsening write amplification over time
Root cause: autovacuum falling behind, long transactions preventing cleanup, or frequent updates to the same rows.
Fix: monitor vacuum; fix long transactions; adjust autovacuum thresholds per table; reduce churn; partition high-churn data.
6) Symptom: “We added indexes and it got slower”
Root cause: every insert/update must maintain every index; write path became heavier than read benefit justifies.
Fix: keep only indexes that pay for themselves; delay index creation until after backfills; use partial indexes; benchmark write cost under realistic concurrency.
Checklists / step-by-step plan
Decision checklist: should this workload be SQLite or PostgreSQL?
- Count your writers. If you expect multiple independent processes/threads to write frequently, default to PostgreSQL.
- Define durability. If you can’t tolerate losing “successful” writes on crash, don’t use unsafe SQLite pragmas. PostgreSQL defaults are safer.
- Transaction shape matters. If you can batch and tolerate a single-writer pattern, SQLite can work.
- Operational requirements. If you need replication, PITR, online schema changes, and introspection: PostgreSQL.
- Deployment environment. If you can’t run a DB server (edge devices, offline apps), SQLite is a gift.
Step-by-step: making SQLite behave under modest write concurrency
- Enable WAL mode.
- Set a busy timeout and implement retry with jitter.
- Keep write transactions short; never hold them across network calls.
- Batch writes: fewer commits, more work per transaction.
- Control checkpointing if WAL grows (and validate long readers aren’t blocking).
- If your write queue keeps growing, stop optimizing and migrate.
Step-by-step: stabilizing PostgreSQL write throughput
- Measure wait events and lock contention in pg_stat_activity.
- Check checkpoints and WAL pressure in pg_stat_bgwriter.
- Validate storage latency with iostat; confirm WAL device performance.
- Reduce transaction count by batching; use COPY for ingestion.
- Remove expensive unused indexes on hot ingest tables.
- Fix hot-row contention (queue patterns, counters, status updates).
- Ensure vacuum keeps up; fix long transactions.
- Add capacity only after you’ve confirmed what’s saturated.
FAQ
1) Can SQLite handle multiple writers at all?
Multiple connections can attempt writes, but SQLite serializes the actual write/commit work via locking. Under contention, you’ll see waits or SQLITE_BUSY.
It’s workable if you keep transactions short and accept that writes queue behind each other.
2) Does WAL mode make SQLite “concurrent” like PostgreSQL?
WAL mode improves reader/writer concurrency. It does not turn SQLite into a multi-writer engine. Writers still serialize at commit.
WAL also introduces checkpoint behavior that you must understand, or you’ll get surprise disk usage and latency.
3) Why does PostgreSQL tolerate many writers better?
MVCC reduces reader/writer blocking, row-level locks localize conflicts, and the system is built around concurrent sessions with background processes.
It also supports group commit, which helps amortize fsync cost across many transactions.
4) What’s the real limit for PostgreSQL write throughput?
Often: durable commit latency (WAL flush) and storage throughput. After that: lock contention on hot rows/indexes, checkpoint behavior, and CPU for index maintenance.
The limit is usually not “PostgreSQL can’t,” it’s “your storage and schema disagree with your workload.”
5) Should I just disable fsync/synchronous settings to get speed?
Only if you’re comfortable losing data on crash/power failure. In SQLite, PRAGMA synchronous=OFF is a real data-loss lever.
In PostgreSQL, changing durability settings has similar trade-offs. Write down the acceptable loss window first.
6) Is connection pooling required for PostgreSQL writers?
If you have many application instances, yes. Not because pooling magically speeds up queries, but because too many connections cause overhead
and can increase contention. Pooling helps you control concurrency and keep the database in its efficient operating range.
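A minimal sketch of capping writer concurrency in the application with psycopg2’s built-in pool. The pool size and DSN are placeholders, and many stacks use an external pooler such as PgBouncer instead; the point is that concurrency becomes a number you chose, not a side effect of worker count.
from psycopg2.pool import SimpleConnectionPool

# Hard cap: at most 10 connections from this process, however many workers you run.
pool = SimpleConnectionPool(minconn=1, maxconn=10, dsn="dbname=appdb user=app")

conn = pool.getconn()
try:
    with conn, conn.cursor() as cur:         # "with conn" commits or rolls back
        cur.execute("UPDATE orders SET status = %s WHERE id = %s", ("paid", 42))
finally:
    pool.putconn(conn)                       # always return it; leaked connections are contention too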
7) When is SQLite the right choice even with writes?
Local-first apps, single-user systems, edge devices, CI/test tooling, caching layers, or when you can enforce a single writer and batch updates.
If you can’t enforce those constraints, SQLite becomes a fragile dependency.
8) What’s the most common migration path from SQLite to PostgreSQL?
Usually: keep SQLite for local cache/offline mode, move authoritative writes to PostgreSQL, add a sync process, then phase out SQLite writes.
The key is to stop treating SQLite as a shared write hub for concurrent workers.
9) Why do I see PostgreSQL lock waits when I’m “only inserting”?
Inserts still hit indexes, sequences, foreign keys, and sometimes hot pages. If many sessions insert into the same table with the same index pattern,
you can see contention. Also, inserts that update “status” rows or counters create hot rows immediately.
10) What’s the simplest safe pattern for a work queue?
In PostgreSQL: a queue table with SELECT ... FOR UPDATE SKIP LOCKED and short transactions. In SQLite: a single writer process that dequeues and writes,
with readers consuming snapshots as needed.
Next steps you can actually execute
If you’re choosing between PostgreSQL and SQLite for a system with concurrent writers, decide based on who owns contention.
SQLite makes your application own it. PostgreSQL shares the burden and gives you the dashboards, levers, and escape hatches.
- If you’re on SQLite and seeing lock errors: enable WAL, set busy timeout, shorten transactions, and centralize writes. If the workload still needs many writers, schedule a PostgreSQL migration.
- If you’re on PostgreSQL and writes are slow: check wait events and blockers, then check WAL/checkpoints, then check storage latency. Fix hot rows and transaction shape before buying hardware.
- In both cases: measure p95/p99 commit latency and transaction rate. That’s the ground truth for “concurrent writers.”
The “winner” isn’t a moral judgment. It’s a fit-for-purpose call. For concurrent writers in production, PostgreSQL is the grown-up tool.
Use SQLite where its single-file simplicity is a superpower—not where it becomes your outage generator.