You ship a service, everything looks fine in staging, and then production traffic arrives with a cruel little gift: latency.
Not the “we need faster queries” kind. The “why are we paying for a database server just to wait on it” kind.
Sometimes the fastest database is the one you don’t have to talk to over a socket. Sometimes the boring file on disk,
sitting next to your app, quietly outperforms a perfectly respectable MySQL cluster—at least for the workload you actually have.
The “free speed” premise: where SQLite wins without trying
SQLite is a library. MySQL is a service. That single sentence explains 70% of the “SQLite was faster” benchmark screenshots
people wave around like a victory flag.
If you run SQLite in-process, your app calls into a library, which reads a local file (or page cache) and returns rows. No TCP.
No connection pool drama. No server thread scheduling. No auth handshake. No proxy hop. No waiting behind other clients.
It’s the database equivalent of walking to the fridge instead of ordering delivery and arguing with the intercom.
This is what I mean by “free speed”: speed you get by removing moving parts, not by being clever.
Your query plan might be the same complexity; your data might be the same size; your storage might even be the same disk.
But the path length—CPU instructions, context switches, syscalls, wakeups—gets shorter.
The trap is thinking this means SQLite is “better” in general. It’s not. It’s better in specific, common scenarios:
local-first apps, edge workloads, read-heavy services with low write concurrency, and systems where operational simplicity
is a feature, not an afterthought.
First short joke (because this topic deserves one): SQLite has a fantastic ops story—mostly because it refuses to have ops.
Two mental models: database server vs database file
MySQL: separate failure domain, separate performance envelope
MySQL is a network service with its own process, memory, threads, and I/O scheduler. That separation is a superpower:
it isolates database work from application crashes, allows many clients to connect, and supports advanced replication
and clustering patterns.
But separation costs. Every request crosses a boundary: client library → socket → kernel networking → server → storage engine.
Each boundary introduces overhead and additional queueing points. Under load, queueing dominates.
This is why your “simple SELECT by primary key” can go from sub-millisecond to tens of milliseconds without the query itself changing.
SQLite: the database is a file, the server is your process
SQLite runs inside your process space. It reads and writes a single database file (plus optional journal/WAL files),
using OS locks for concurrency coordination. It leans heavily on the OS page cache and benefits from locality.
The killer detail: in many workloads, data is already in memory (page cache). SQLite can hit those pages without a round trip.
When you benchmark “SELECT 1 row,” you’re measuring function calls and cache hits—not client-server protocol overhead.
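To make that concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. The file name and the users table are made up; the shape of the call path is the point: call into the library, get a row back, never leave the process.

import sqlite3
import time

# A minimal sketch of the in-process read path. The table and values are made up;
# the point is that a point read is a library call plus (usually) a page-cache hit.
conn = sqlite3.connect("app.db")
conn.execute("PRAGMA journal_mode=WAL")
conn.execute("CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT OR REPLACE INTO users VALUES (42, 'alice')")
conn.commit()

start = time.perf_counter()
for _ in range(100_000):
    conn.execute("SELECT name FROM users WHERE id = ?", (42,)).fetchone()
elapsed = time.perf_counter() - start

# No socket, no auth handshake, no server-side scheduling: mostly function calls
# and a warm B-tree page.
print(f"{elapsed / 100_000 * 1e6:.1f} microseconds per point read")
conn.close()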
So what are you choosing, really?
You’re choosing how much you want to pay for shared access, remote access, and multi-writer concurrency.
MySQL is optimized for “many clients, many writers, central authority.”
SQLite is optimized for “one application owns the data, reads are frequent, writes are coordinated.”
If you’re building a web app with hundreds of concurrent write transactions from many app servers, SQLite is not a bargain.
If you’re building a service with mostly reads, a modest write rate, and a desire to remove infra,
SQLite can be unreasonably effective.
Facts & history that change how you reason about both
- SQLite (2000) was designed by D. Richard Hipp to be embedded, serverless, and self-contained, targeting reliability and portability over feature sprawl.
- SQLite’s “public domain” approach (effectively no licensing friction) helped it land everywhere: browsers, phones, routers, printers, and desktop apps.
- MySQL (mid-1990s) emerged in an era where “database” meant a separate server process and where shared hosting demanded multi-tenant access.
- InnoDB became the default MySQL storage engine because it brought crash recovery, transactions, and row-level locking to a world that had learned to fear table locks.
- SQLite added WAL mode to drastically improve read concurrency by separating readers from writers, a turning point for real-world app workloads.
- SQLite uses a compact B-tree design optimized for local storage and predictable performance, which is why it behaves so well on small devices.
- MySQL replication shaped modern ops patterns (read replicas, failover, binlogs), but those benefits come with operational responsibility.
- SQLite’s test culture is famously intense: huge automated test coverage and aggressive fuzzing have made it one of the most battle-tested pieces of infrastructure you already depend on.
Workloads: the exact shapes where SQLite beats MySQL (and where it doesn’t)
SQLite wins when the “database” is mostly a local index
Think: a job runner tracking task state, a service caching API responses, a CLI tool storing metadata, a desktop app,
an edge collector buffering events, or a single-node web service with a clear read pattern.
In these setups, MySQL’s strengths—multi-client arbitration, remote access, heavy concurrency—are underused.
You’re paying for a taxi to drive across the street.
SQLite wins when reads dominate and writes are coordinated
SQLite can serve many concurrent readers efficiently, especially in WAL mode. Writes, however, are serialized at the database level.
If your write rate is low or if you can funnel writes through one worker (or one leader), you can get excellent throughput with low latency.
MySQL wins when you need sustained multi-writer concurrency
MySQL with InnoDB is built for concurrent writes. Row-level locking, MVCC, background flushing, and separate buffer pools are
all engineered for the “many writers” world. If your system has frequent writes from many app instances and you can’t coordinate them,
SQLite becomes a lock contention generator.
MySQL wins when you need remote access and shared ownership
If multiple services or teams need to access the same dataset, centralizing the database is often the correct organizational decision,
not just a technical one. You want access control, auditing, backups, and predictable multi-client behavior.
SQLite wins when you need “ship it with the app” simplicity
Shipping SQLite is like shipping a library. Shipping MySQL is shipping an ecosystem: config, upgrades, backups, monitoring,
user management, and the occasional “why is it swapping” postmortem.
Second short joke: The easiest database migration is the one you never do, which is why people keep SQLite around like an old but reliable hoodie.
The latency budget: your query is innocent, the trip is guilty
In production, most “database slowness” is not the query plan. It’s everything else:
connection churn, thread scheduling, lock waits, fsync waits, noisy neighbors, and network tail latency.
SQLite removes the network and server scheduling layers. That’s the free speed. If your workload is small enough to fit into RAM
(or mostly in OS cache), and if writes are modest, the median and tail latency can be dramatically better.
MySQL can be extremely fast too—but it needs competent tuning and stable conditions. SQLite needs less help to be decent,
because there’s less to tune.
What you’re actually measuring in benchmarks
When someone posts a benchmark saying “SQLite is 3× faster than MySQL,” ask:
- Are they using localhost sockets or real network?
- Are they reusing connections or reconnecting every query?
- Is MySQL fsyncing on every transaction while SQLite is not (or vice versa)?
- Is the dataset in cache for one and not the other?
- Are they measuring single-thread or real concurrency?
- Are they using WAL mode and sensible synchronous settings?
If the benchmark doesn’t answer those questions, it’s a story, not evidence.
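For contrast, here is a sketch of what an honest SQLite-side micro-benchmark can look like. The file, table, and counts are invented; the knobs are the ones from the list above: explicit durability, a reused connection, a deliberately warmed cache, and commits that actually hit the journal.

import sqlite3
import time

# A sketch of keeping the SQLite side of a micro-benchmark honest.
# File name, table, and row counts are made up; the knobs are the point.
conn = sqlite3.connect("bench.db")
conn.execute("PRAGMA journal_mode=WAL")
conn.execute("PRAGMA synchronous=FULL")   # state durability explicitly; don't benchmark OFF by accident
conn.execute("CREATE TABLE IF NOT EXISTS kv (k INTEGER PRIMARY KEY, v TEXT)")
conn.executemany("INSERT OR REPLACE INTO kv VALUES (?, ?)",
                 ((i, "x" * 100) for i in range(50_000)))
conn.commit()                             # the cache is now deliberately warm; say so in the write-up

start = time.perf_counter()
for i in range(2_000):
    conn.execute("UPDATE kv SET v = ? WHERE k = ?", ("y" * 100, i))
    conn.commit()                         # one transaction per write, so fsync cost is included
print(f"{(time.perf_counter() - start) / 2_000 * 1e3:.2f} ms per committed write")
conn.close()

If the MySQL side of the comparison then gets a fresh TCP connection per query and a different flush setting, the chart compares configurations, not databases.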
Durability and crash semantics: what “safe” really means
The first serious question SREs ask is not “how fast,” it’s “what happens when the host falls over mid-write.”
Both MySQL and SQLite can be durable. Both can also be configured into foot-guns.
SQLite durability knobs: journal mode and synchronous
SQLite’s durability is governed primarily by journaling mode (DELETE, TRUNCATE, PERSIST, MEMORY, WAL)
and PRAGMA synchronous (OFF, NORMAL, FULL, EXTRA). WAL mode typically improves concurrency and write throughput
by appending to a WAL file and checkpointing later.
The uncomfortable truth: many apps “benchmark” SQLite with synchronous=OFF and then act surprised when power loss causes corruption.
That’s not a database problem, it’s a decision problem.
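A minimal sketch of making that decision on purpose, assuming Python's standard sqlite3 module and a hypothetical database path: set the journal mode and synchronous level at startup, then log what you actually got.

import sqlite3

# A minimal sketch: make the durability choice explicit at startup instead of
# inheriting whatever a driver default or an old benchmark script left behind.
# The path is hypothetical; the PRAGMA names and values are standard SQLite.
conn = sqlite3.connect("/var/lib/myapp/app.db")

conn.execute("PRAGMA journal_mode=WAL")     # readers keep reading while the writer appends to the WAL
conn.execute("PRAGMA synchronous=NORMAL")   # in WAL mode: recent commits can be lost on power loss,
                                            # but the database file stays consistent
# Use synchronous=FULL if losing even the last committed transaction is unacceptable.
# synchronous=OFF is the benchmark foot-gun above: fast right up until the power blinks.

mode = conn.execute("PRAGMA journal_mode").fetchone()[0]
sync = conn.execute("PRAGMA synchronous").fetchone()[0]
print(f"journal_mode={mode} synchronous={sync}")  # log it so the choice shows up in the next postmortem

The print is not decoration; settings you cannot see in logs are settings you will rediscover during an incident.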
MySQL durability knobs: innodb_flush_log_at_trx_commit and friends
MySQL’s durability lives in InnoDB: redo logs, doublewrite buffer, buffer pool flushing. The famous knob is
innodb_flush_log_at_trx_commit. Set it to 1 and you fsync at commit (durable, slower).
Set it to 2 or 0 and you trade durability for throughput.
Both systems let you choose. The key is to choose consciously, document it, and test failure behavior.
A paraphrased reliability idea worth keeping
Paraphrased idea from John Allspaw: reliability comes from designing for failure and learning from it, not from pretending failure won’t happen.
Concurrency: locks, MVCC, and why “multiple writers” is a lifestyle choice
SQLite: one writer at a time, by design
SQLite allows multiple readers and one writer. WAL mode improves the reader story because readers can continue reading the old snapshot while a writer appends to WAL.
But if you have multiple concurrent writers, they queue.
The outcome is not “it breaks.” The outcome is latency spikes, busy timeouts, and occasionally the kind of thundering herd
where everyone retries at once and makes it worse.
MySQL/InnoDB: built for concurrent writers, but not magic
InnoDB offers row-level locks and MVCC to allow concurrent transactions. But contention exists:
hot rows, hot secondary indexes, auto-increment locks (depending on configuration), metadata locks, and buffer pool pressure.
MySQL can handle high concurrency—until your schema or query pattern turns it into a lock wait simulator.
You don’t get concurrency for free; you get it with careful indexing and transaction design.
Coordination strategies that make SQLite viable
- Single-writer architecture: funnel writes through one process or thread; readers can be many (a minimal sketch follows this list).
- Batch writes: fewer transactions, larger commits (within reason).
- Use WAL + busy_timeout: reduce spurious failures under light contention.
- Keep transactions short: “Do work, then write” beats “write while thinking.”
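Here is what the single-writer pattern can look like in practice: a minimal sketch assuming Python's standard sqlite3, threading, and queue modules, with a hypothetical events table and database path.

import queue
import sqlite3
import threading

# A minimal sketch of the single-writer pattern. Many threads enqueue writes;
# exactly one thread owns write transactions, so SQLite's one-writer rule is
# enforced by the architecture instead of by retries.
DB = "/var/lib/myapp/app.db"
write_queue: queue.Queue = queue.Queue(maxsize=10_000)   # bounded queue = built-in backpressure

def writer_loop() -> None:
    conn = sqlite3.connect(DB)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("PRAGMA busy_timeout=5000")   # wait up to 5s for the lock instead of failing instantly
    conn.execute("CREATE TABLE IF NOT EXISTS events (kind TEXT, payload TEXT)")
    while True:
        batch = [write_queue.get()]            # block until there is work
        while not write_queue.empty() and len(batch) < 200:
            batch.append(write_queue.get())    # opportunistic batching: fewer, larger commits
        with conn:                             # one transaction per batch; commits or rolls back as a unit
            for sql, params in batch:
                conn.execute(sql, params)

threading.Thread(target=writer_loop, daemon=True).start()

# Request handlers enqueue; none of them ever touch the write lock directly.
write_queue.put(("INSERT INTO events (kind, payload) VALUES (?, ?)", ("note_created", "hello")))

In a real service the writer thread lives for the lifetime of the process and the puts happen inside request handlers; the bounded queue gives you explicit backpressure instead of a retry storm.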
Operational overhead: the hidden tax MySQL adds (and SQLite doesn’t)
MySQL isn’t “hard,” but it’s a system. It needs patching, backups, privileges, replication management, disk provisioning,
monitoring, alerting, and humans who remember what they did six months ago.
SQLite’s ops model is: back up a file; monitor disk; verify you aren’t corrupting it; and don’t let ten writers fight in a hallway.
That simplicity is worth money.
The flip side: SQLite pushes responsibility to the application boundary. File placement, filesystem semantics, container storage,
and backup consistency are now your problem. If you treat it like a magical blob, it will treat you like an amateur.
Three corporate mini-stories from the trenches
Mini-story 1: The incident caused by a wrong assumption
A product team built a small internal dashboard service. It was read-heavy and mostly served cached analytics results.
They chose SQLite to avoid standing up MySQL. Reasonable.
Then the service got “upgraded” to support user annotations. Writes were small, but they happened in bursts:
every morning, dozens of people opened the dashboard at the same time, created notes, and updated tags.
The team assumed “small writes” meant “no big deal.”
On the first Monday after launch, the service started returning intermittent 500s. The app logs showed “database is locked.”
The on-call did what on-calls do: increased retries. The error rate got worse because now every client retried at the same time,
effectively turning “one writer” into “a queue with a megaphone.”
The fix wasn’t to abandon SQLite. The fix was to treat it like what it is: a single-writer database. They introduced a write queue
(a single background worker doing transactions), shortened transaction scope, enabled WAL, and set a sane busy timeout.
The error rate dropped to zero and latency normalized.
The wrong assumption wasn’t “SQLite is fast.” The wrong assumption was “write concurrency doesn’t matter if writes are small.”
Concurrency doesn’t care about your feelings.
Mini-story 2: The optimization that backfired
Another team ran MySQL for a session store with moderate read/write rates. They were chasing p99 latency and noticed fsync waits.
Someone suggested lowering durability because “sessions are ephemeral.” They changed InnoDB flush behavior to reduce fsync pressure.
Latency improved immediately. The change was celebrated.
Two weeks later, a host reboot during maintenance caused a chunk of recent session writes to vanish. Users were logged out,
carts were lost, and customer support got an unscheduled workout.
The postmortem wasn’t dramatic. It was boring, which is worse. The team had silently redefined the meaning of “committed.”
They optimized for benchmarks and forgot to optimize for user experience.
They rolled back the durability change and fixed latency the right way: larger redo logs, better buffer pool sizing,
and transaction batching at the application layer. MySQL got stable again.
The lesson: durability is part of your product, not just your database config.
Mini-story 3: The boring but correct practice that saved the day
A small fleet of edge collectors buffered telemetry locally and uploaded in batches. They used SQLite on each device.
Writes were frequent but coordinated: one ingestion process wrote; uploaders read.
The team did something painfully unsexy: they tested power loss and filesystem full conditions.
During one deployment, a bug caused upload retries to explode. Devices started filling disks.
The ingestion process began failing writes with “disk I/O error.” This could have turned into silent data loss,
because edge fleets are excellent at failing quietly.
But they had two guardrails: (1) disk space monitoring with a hard cutoff that paused ingestion before total exhaustion,
and (2) a periodic integrity check on the SQLite DB that ran during low-traffic windows.
When the incident hit, devices stopped ingesting before corrupting the database, sent a clear health signal,
and recovered automatically once the retry bug was fixed. No heroic hand-edits of database files.
No mystery corruption.
Boring practice, saved day: proactive failure testing plus simple, explicit backpressure.
Practical tasks: commands, outputs, and decisions (12+)
Below are real tasks I’d run in production or in a staging environment that actually resembles production.
Each includes: command, what the output means, and the decision it drives.
Task 1: Confirm SQLite journal mode and synchronous level
cr0x@server:~$ sqlite3 /var/lib/myapp/app.db "PRAGMA journal_mode; PRAGMA synchronous;"
wal
2
Meaning: WAL mode is enabled. synchronous=2 means FULL (durability-focused).
The numeric values map to 0=OFF, 1=NORMAL, 2=FULL, 3=EXTRA.
Decision: If you’re seeing write latency spikes, consider NORMAL for acceptable durability in many cases,
but only if your product can tolerate losing the last transaction on power loss. Document the tradeoff.
Task 2: Check SQLite for lock contention via busy_timeout and quick write test
cr0x@server:~$ sqlite3 /var/lib/myapp/app.db "PRAGMA busy_timeout=5000; BEGIN IMMEDIATE; SELECT 'got write lock'; COMMIT;"
got write lock
Meaning: The process acquired a write lock quickly. If it stalls or errors, you have a writer already active.
Decision: If this frequently waits, your architecture needs a single-writer queue or reduced transaction scope.
Don’t “fix” it by adding retries everywhere.
Task 3: Observe active SQLite access patterns (file locks and writers)
cr0x@server:~$ sudo lsof /var/lib/myapp/app.db | head
myapp 1187 appuser 12u REG 259,0 52428800 1048577 /var/lib/myapp/app.db
myapp 1187 appuser 13u REG 259,0 8388608 1048578 /var/lib/myapp/app.db-wal
myapp 1187 appuser 14u REG 259,0 32768 1048579 /var/lib/myapp/app.db-shm
Meaning: WAL and SHM files exist and are open. That’s expected in WAL mode.
Many processes holding the file open can hint at multi-writer risk.
Decision: If you see lots of different PIDs opening the DB for writes, redesign so only one component writes.
Task 4: Measure SQLite database and WAL growth (checkpoint pressure)
cr0x@server:~$ ls -lh /var/lib/myapp/app.db /var/lib/myapp/app.db-wal
-rw------- 1 appuser appuser 48M Dec 30 09:41 /var/lib/myapp/app.db
-rw------- 1 appuser appuser 512M Dec 30 09:43 /var/lib/myapp/app.db-wal
Meaning: WAL is much larger than the main DB. Checkpointing may not be happening (or is blocked by long readers).
Decision: Investigate long-running reads; consider manual or tuned checkpointing; ensure readers aren’t holding snapshots forever.
WAL bloat can turn into disk pressure and slow checkpoints.
Task 5: Force and inspect a SQLite checkpoint result
cr0x@server:~$ sqlite3 /var/lib/myapp/app.db "PRAGMA wal_checkpoint(TRUNCATE);"
0|0|0
Meaning: The three numbers are busy (1 if the checkpoint was blocked), pages written to the WAL, and pages checkpointed back into the main database. All zeros often means there was nothing to do or it was already checkpointed.
If busy is 1, checkpointing couldn't proceed due to active readers.
Decision: If busy pages persist, find and fix long-lived readers (leaky connections, streaming queries).
Task 6: Run a quick SQLite integrity check (catch corruption early)
cr0x@server:~$ sqlite3 /var/lib/myapp/app.db "PRAGMA quick_check;"
ok
Meaning: Structural integrity is OK. If you see anything else, treat it as urgent.
Decision: If corruption appears: stop writes, snapshot the file, and restore from a known-good backup.
Then investigate storage, power-loss behavior, and unsafe PRAGMA settings.
Task 7: Check filesystem mount options (fsync semantics matter)
cr0x@server:~$ findmnt -no SOURCE,FSTYPE,OPTIONS /var/lib/myapp
/dev/nvme0n1p2 ext4 rw,relatime,errors=remount-ro
Meaning: You’re on ext4 with typical options. If you see exotic options (like disabling barriers),
durability assumptions can break.
Decision: If you need strict durability, keep conservative mount options and avoid “performance” flags you can’t explain in a postmortem.
Task 8: See if disk is your real bottleneck (iostat)
cr0x@server:~$ iostat -x 1 3
Linux 6.5.0 (server) 12/30/2025 _x86_64_ (8 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
12.00 0.00 6.00 18.00 0.00 64.00
Device r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
nvme0n1 120.0 400.0 3.2 18.5 86.0 5.2 11.8 2.1 14.7 0.6 31.0
Meaning: Non-trivial iowait and elevated write await suggest fsync/flush pressure.
Decision: If disk latency is high, no query optimization will save you. Reduce sync frequency carefully (if allowed),
batch writes, or move to faster storage.
Task 9: Check MySQL server health and immediate contention hints
cr0x@server:~$ mysql -e "SHOW GLOBAL STATUS LIKE 'Threads_running'; SHOW GLOBAL STATUS LIKE 'Questions';"
+-----------------+-------+
| Variable_name   | Value |
+-----------------+-------+
| Threads_running | 64    |
+-----------------+-------+
+---------------+----------+
| Variable_name | Value    |
+---------------+----------+
| Questions     | 12893412 |
+---------------+----------+
Meaning: Many running threads can indicate CPU saturation, lock waits, or a thundering herd.
Decision: If Threads_running is high and latency is high, check for lock waits and slow queries before adding more app workers.
Task 10: Identify MySQL lock waits in InnoDB
cr0x@server:~$ mysql -e "SHOW ENGINE INNODB STATUS\G" | sed -n '1,120p'
*************************** 1. row ***************************
Type: InnoDB
Name:
Status:
=====================================
2025-12-30 09:48:12 0x7f2c1c1fe700 INNODB MONITOR OUTPUT
=====================================
...
LATEST DETECTED DEADLOCK
------------------------
...
TRANSACTIONS
------------
Trx id counter 1829341
Purge done for trx's n:o < 1829200 undo n:o < 0 state: running
History list length 1234
...
Meaning: This output tells you if you’re deadlocking, building up history list length (undo),
or stuck on locks.
Decision: If you see deadlocks or huge history list length, shorten transactions and add indexes to reduce lock scope.
If the “LATEST DETECTED DEADLOCK” repeats, fix the application pattern, not the database.
Task 11: Check MySQL durability setting that affects fsync behavior
cr0x@server:~$ mysql -e "SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit';"
+--------------------------------+-------+
| Variable_name                  | Value |
+--------------------------------+-------+
| innodb_flush_log_at_trx_commit | 1     |
+--------------------------------+-------+
Meaning: Value 1 means redo log is flushed to disk at each commit (durable).
Decision: If latency is dominated by fsync and you can tolerate minimal data loss, you might choose 2.
If you cannot tolerate it, keep 1 and fix performance elsewhere (batching, storage, schema).
Task 12: Inspect connection churn (a silent MySQL latency killer)
cr0x@server:~$ mysql -e "SHOW GLOBAL STATUS LIKE 'Connections'; SHOW GLOBAL STATUS LIKE 'Aborted_connects';"
+---------------+--------+
| Variable_name | Value  |
+---------------+--------+
| Connections   | 904221 |
+---------------+--------+
+------------------+-------+
| Variable_name    | Value |
+------------------+-------+
| Aborted_connects | 1203  |
+------------------+-------+
Meaning: Very high Connections relative to steady QPS often means you’re connecting too frequently.
Aborted_connects hints at auth/network issues or resource limits.
Decision: If connections are churning: fix pooling, raise timeouts, and stop doing “connect per request.”
If you can’t fix that soon, SQLite’s in-process model may genuinely outperform you for the same workload.
Task 13: Verify whether MySQL is reading from disk or cache (buffer pool pressure)
cr0x@server:~$ mysql -e "SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_reads'; SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read_requests';"
+--------------------------+--------+
| Variable_name            | Value  |
+--------------------------+--------+
| Innodb_buffer_pool_reads | 498221 |
+--------------------------+--------+
+-----------------------------------+-----------+
| Variable_name                     | Value     |
+-----------------------------------+-----------+
| Innodb_buffer_pool_read_requests  | 289223112 |
+-----------------------------------+-----------+
Meaning: Buffer pool reads are physical reads; read requests are logical reads. A low ratio is good.
A rising Innodb_buffer_pool_reads rate means you’re missing cache and hitting disk.
Decision: If you’re missing cache: increase buffer pool, reduce working set, add indexes, or move hot data elsewhere.
If your working set is tiny, SQLite with OS cache may be simpler and faster.
Task 14: Check where your SQLite file lives (containers and network storage gotchas)
cr0x@server:~$ df -T /var/lib/myapp/app.db
Filesystem Type 1K-blocks Used Available Use% Mounted on
/dev/nvme0n1p2 ext4 205113320 80422344 114123456 42% /
Meaning: The DB is on local ext4. Good. If it’s on NFS or a weird overlay, your locking and fsync guarantees can get exciting.
Decision: Keep SQLite on local storage unless you deeply understand the semantics of your network filesystem and have tested them under failure.
Fast diagnosis playbook: what to check first/second/third
When “the database is slow,” your job is to find which queue is building up. Don’t start by rewriting SQL.
Start by locating the waiting room.
First: is it latency from crossing boundaries?
- MySQL: check connection churn, pool sizing, DNS, TLS overhead, proxy hops.
- SQLite: check if the DB is on local disk and whether you accidentally put it on network storage or a contended volume.
If you’re doing connect-per-request to MySQL, that’s your bottleneck until proven otherwise.
If SQLite is on a slow network mount, that’s your bottleneck until proven otherwise.
Second: is it lock contention?
- MySQL: inspect InnoDB status for lock waits/deadlocks; look for hot rows and long transactions.
- SQLite: look for “database is locked,” long-running readers blocking checkpoints, and multiple writers.
Contention shows up as spiky latency and timeouts while CPU might look “fine.” That’s classic queueing.
Third: is it disk flush / fsync pressure?
- Both: check iowait, disk await, and whether you are forcing sync on every transaction.
- SQLite: inspect WAL size growth; checkpoint behavior; synchronous settings.
- MySQL: watch redo log flush behavior and buffer pool misses.
If the disk is slow, the database is slow. There’s no debate club meeting required.
Finally: only now, examine query plans
Query plans matter, but in many real incidents, the “bad query” is the one that holds a lock too long or causes cache churn.
Fix the waiting first. Then optimize the SQL.
Common mistakes: symptom → root cause → fix
1) SQLite returns “database is locked” under load
Symptom: sporadic failures or long waits on writes, often during traffic spikes.
Root cause: multiple concurrent writers; long transactions; missing busy_timeout; WAL misconfiguration; checkpoints blocked by long readers.
Fix: funnel writes through a single writer; enable WAL; set busy_timeout; keep transactions short; ensure readers close promptly; tune checkpointing.
2) SQLite benchmarks look great, production loses data after crash
Symptom: corruption or missing recent writes after power loss/reboot.
Root cause: unsafe durability settings (e.g., synchronous=OFF), or storage layer lying about flushes.
Fix: use WAL with sensible synchronous level; keep SQLite on local storage; test crash/power-loss; implement backups and integrity checks.
3) MySQL is “slow” but CPU is low and disks are fine
Symptom: high query latency with low resource utilization.
Root cause: lock waits, connection storms, or queueing at the server due to too many threads.
Fix: reduce connection churn; fix pooling; inspect InnoDB lock waits and deadlocks; shorten transactions; add needed indexes.
4) MySQL gets fast in benchmarks, then painful in production
Symptom: great p50, terrible p99; periodic stalls.
Root cause: buffer pool misses, fsync bursts, background flushing, or replication lag causing app-level backpressure.
Fix: size buffer pool; avoid hot indexes; batch writes; monitor redo and flush behavior; ensure replicas aren’t overloaded if you rely on them.
5) SQLite on Kubernetes behaves unpredictably
Symptom: weird latency, lock errors, or data disappears after rescheduling.
Root cause: DB file stored on ephemeral container FS, overlay layers, or a volume with unexpected locking semantics.
Fix: use a proper persistent volume with tested semantics; keep the DB local to a node when possible; treat pod rescheduling as a failure scenario and plan for it.
6) “We can just put SQLite on NFS so all pods share it”
Symptom: corruption risks, lock weirdness, performance collapse.
Root cause: network filesystem semantics, lock manager behavior, and fsync guarantees that don’t match SQLite’s assumptions.
Fix: don’t. If you need shared access across nodes, use a database server (MySQL/Postgres) or a replicated local-first architecture with explicit sync.
Checklists / step-by-step plan
Decision checklist: should this workload be SQLite?
- Is the dataset owned by one application? If multiple independent services must write, prefer MySQL.
- Is write concurrency low or coordinatable? If yes, SQLite remains on the table.
- Can you keep the DB on local storage? If no, think very hard; SQLite hates “surprising” storage.
- Is the working set small and hot? SQLite plus OS cache can be brutally fast.
- Do you need replication, failover, and remote access? If yes, MySQL wins unless you build those layers yourself.
- Is operational simplicity a top requirement? SQLite gives you fewer knobs and fewer pager alerts.
SQLite production setup plan (boring, correct, repeatable)
- Place the DB on a persistent, local filesystem with known semantics (ext4/xfs on real disks).
- Enable WAL and set a busy timeout in the application startup.
- Decide durability (synchronous) explicitly and write it down in your runbook.
- Keep transactions short; do not hold a transaction while making network calls.
- Implement a periodic PRAGMA quick_check during low-traffic windows.
- Back up with a consistent method (e.g., the SQLite online backup API or controlled snapshots) and test restores (a minimal sketch follows this list).
- Monitor WAL growth and disk free space; implement backpressure before disk-full.
- Design for single-writer, many-reader patterns; add a write queue if needed.
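As referenced in the list above, here is a minimal sketch of the backup and integrity-check jobs, assuming Python's standard sqlite3 module and hypothetical paths; Connection.backup() wraps SQLite's online backup API, and quick_check is the same pragma from Task 6.

import sqlite3

# A minimal sketch of the two boring maintenance jobs from the list above.
# Paths are hypothetical; run this from cron or a scheduler during a low-traffic window.
SRC = "/var/lib/myapp/app.db"
DST = "/var/backups/myapp/app-backup.db"

def consistent_backup() -> None:
    src = sqlite3.connect(SRC)
    dst = sqlite3.connect(DST)
    try:
        # Connection.backup() uses SQLite's online backup API: it copies a consistent
        # snapshot while the application keeps reading and writing.
        src.backup(dst)
    finally:
        src.close()
        dst.close()

def integrity_ok() -> bool:
    conn = sqlite3.connect(SRC)
    try:
        return conn.execute("PRAGMA quick_check").fetchone()[0] == "ok"
    finally:
        conn.close()

if __name__ == "__main__":
    consistent_backup()
    if not integrity_ok():
        raise SystemExit("quick_check failed: stop writes, snapshot the file, restore from backup")

Restore the copy somewhere disposable and run quick_check against it as well; that is the difference between having a backup and hoping you do.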
MySQL “stop the bleeding” plan when you suspect it’s slower than it should be
- Verify connection pooling and reduce churn; check Connections growth rate.
- Check lock waits/deadlocks; find hot rows and long transactions.
- Check disk latency and iowait; fsync stalls can dominate p99.
- Inspect buffer pool misses; if you’re reading from disk constantly, you’re already behind.
- Only then: tune indexes and query plans for the actual top queries.
FAQ
1) Is SQLite “faster” than MySQL?
Sometimes, yes—especially for read-heavy workloads where the overhead of client-server communication dominates.
But MySQL can outperform SQLite under heavy concurrent writes and complex multi-client workloads.
2) When does SQLite beat MySQL in real production?
When the DB is local, the working set is hot, reads dominate, and writes are coordinated (single-writer or low contention).
Also when you want to remove infra and simplify operations.
3) Can I use SQLite for a web app?
Yes, if you run a single instance or can route writes to a single writer and mostly serve reads.
If you have multiple stateless app servers all writing concurrently, SQLite will eventually punish you with lock contention.
4) Is WAL mode always the right answer for SQLite?
Usually for concurrent reads, yes. WAL improves reader/writer behavior. But it introduces checkpointing considerations and extra files.
You must monitor WAL growth and ensure readers don’t hold snapshots forever.
5) Is SQLite safe on network filesystems?
Treat “safe” as “proven safe under your exact filesystem, mount options, and failure modes.” In practice, shared network filesystems are a frequent source of pain.
If you need shared access across nodes, use MySQL (or another server DB) instead of trying to make SQLite act like one.
6) What about backups for SQLite?
Backing up a file is easy. Backing it up consistently while the app is writing is the real requirement.
Use SQLite’s online backup approach or stop writes briefly. Then test restores; a backup you haven’t restored is a rumor.
7) What about migrations: start with SQLite, move to MySQL later?
That’s a valid strategy if you design for it: keep SQL portable where you can, avoid SQLite-specific quirks, and build a migration pipeline early.
Don’t wait until you’re on fire to invent data export.
8) Why does MySQL sometimes have worse tail latency than SQLite?
Because it has more queueing points: network, thread scheduling, lock waits, buffer pool misses, fsync bursts, replication side effects.
SQLite’s simpler path can produce nicer p99—until write contention appears.
9) Can I scale SQLite with replicas?
Not in the MySQL sense. You can replicate the file or stream changes, but then you’re building a distributed system.
If you need straightforward replication and failover, MySQL is the grown-up choice.
10) If SQLite is so good, why doesn’t everyone use it for everything?
Because “everything” includes multi-tenant access, many concurrent writers, remote clients, and strong operational primitives like replication.
SQLite is a scalpel, not a Swiss army knife.
Next steps you can actually do this week
If you’re deciding between MySQL and SQLite—or trying to rescue a system that chose poorly—do the following in order:
- Write down your workload shape: read/write ratio, peak concurrent writers, dataset size, durability requirements, deployment topology.
- Measure boundary costs: connection churn and network latency for MySQL; storage placement and lock contention for SQLite.
- Run a realistic benchmark: same dataset, warmed cache and cold cache, real concurrency, and production-like durability settings.
- Pick the simplest architecture that meets requirements: if SQLite meets them, enjoy the free speed and fewer moving parts.
- If you need MySQL, commit to operating it well: pooling, monitoring, backups, and a schema that respects concurrency.
The goal isn’t to win a database argument. The goal is to ship a system that’s fast because it’s sane—and reliable because it’s honest about failure.