You shipped “search” because product asked for it, and you figured: “It’s just text. Add an index. Done.”
Two weeks later the CEO is typing “refund policy” and getting nothing, or everything, or results that change between refreshes.
Meanwhile your on-call phone is teaching your pillow new swear words.
SQLite full-text can be shockingly good. PostgreSQL full-text can be deceptively sharp-edged.
The difference isn’t “toy database vs real database.” The difference is what fails first: correctness, ranking, concurrency, or operational control.
If you run production systems, you want the failure modes predictable.
The decision frame: what you’re actually choosing
“PostgreSQL vs SQLite” is the wrong question. The question is:
where do you want your complexity to live?
SQLite puts complexity in the application process and the filesystem. PostgreSQL puts it in a server with knobs, introspection, and hard boundaries.
For full-text search (FTS), the real decision hinges on:
- Concurrency model: SQLite is great until write contention becomes your personality. PostgreSQL is built for concurrent writes.
- Ranking and linguistics: both can be “good enough,” but PostgreSQL gives you more structured control; SQLite gives you speed and portability.
- Operational visibility: Postgres has mature stats, query plans, and server-level instrumentation. SQLite requires you to be your own DBA.
- Data lifecycle: backups, replication, corruption handling, and schema evolution look different when the database is a file.
- Failure containment: a wedged SQLite writer can stall your app. A wedged Postgres query can be killed without killing the whole process.
Opinionated guidance:
If your app is single-node, write-light, and you want embedded simplicity, SQLite FTS5 is a legit choice.
If you need multi-writer concurrency, online maintenance, and robust introspection, use PostgreSQL and don’t apologize for it.
One quote that’s stuck with ops people for decades, and it applies here:
“Hope is not a strategy.”
— usually attributed to Vince Lombardi
Facts and history that matter more than benchmarks
Benchmarks are fun until they become procurement. Here are concrete context points that affect real systems.
(This is the part where someone says “but my laptop test…” and you quietly close the tab.)
- SQLite is older than many “enterprise” stacks. It started around 2000, built for reliability and zero-admin embedding.
- SQLite FTS is not one thing. FTS3/4 came earlier; FTS5 is newer and has better extensibility and features like built-in BM25 ranking.
- PostgreSQL full-text search shipped long before it was trendy. tsearch2 existed as an external module in the early 2000s; it was integrated into core PostgreSQL later (8.3, in 2008).
- SQLite’s concurrency constraint is architectural. It’s not “slow”; it’s that a single database file has locking semantics that can bottleneck writers.
- WAL mode was a big deal for SQLite. It improved read/write concurrency by separating the write-ahead log from the main database file.
- PostgreSQL’s GIN index made text search practical at scale. GIN is purpose-built for “contains many keys” data types, like tsvector terms.
- Both systems tokenize; neither is “Google.” You’ll manage stemming, stopwords, and synonyms yourself, and you’ll get blamed anyway.
- SQLite is famously tested. Its test suite is intense and long-running; it’s one reason it’s trusted in embedded systems.
- PostgreSQL is a platform, not just a database. Extensions, custom dictionaries, and server-side jobs change what “full-text search” can look like.
Joke #1 (short, relevant): Full-text search is where product discovers language is ambiguous, and engineering discovers product is also ambiguous.
How SQLite FTS5 and PostgreSQL tsearch actually work
SQLite FTS5: inverted index living in a file
FTS5 stores an inverted index: terms → lists of row IDs (and positions, depending on options).
The “table” you query isn’t a normal table; it’s a virtual table backed by FTS index structures.
You write rows, it tokenizes text using a tokenizer (like unicode61), and updates the index.
The nice part: it’s fast, portable, and comes along for the ride with your app.
The spicy part: index maintenance can be write-heavy, and if you misconfigure journaling or checkpointing, your tail latencies show up in the worst possible place: user requests.
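A minimal sketch of that setup, using the same docs and docs_fts names that show up in the tasks later in this article (external-content mode, so the FTS table stores only the index):

-- Content table plus an external-content FTS5 index (mirrors the schema in Task 3 below)
CREATE TABLE docs(id INTEGER PRIMARY KEY, title TEXT, body TEXT);
CREATE VIRTUAL TABLE docs_fts USING fts5(
  title, body,
  content='docs', content_rowid='id',
  tokenize='unicode61'
);

-- Query with MATCH; the built-in rank column is BM25, where smaller is better, so sort ascending
SELECT rowid FROM docs_fts WHERE docs_fts MATCH 'refund policy' ORDER BY rank LIMIT 20;

With external content you are responsible for keeping the index in sync with the content table; the trigger pattern is shown in Task 3 below.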
PostgreSQL tsearch: lexemes, dictionaries, and indexes
PostgreSQL full-text search revolves around tsvector (the document as normalized terms with positions) and tsquery (the query).
You can build a tsvector from columns on the fly, but in production you usually store it (generated column or trigger) and index it with GIN.
Postgres adds structure: configurations, dictionaries, stopwords, and weighting.
That structure makes it more controllable for multilingual or domain-specific search, but it also gives you more ways to misconfigure it.
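A minimal sketch of the “store it and index it” pattern, assuming PostgreSQL 12+ for generated columns (column names match the tasks later in this article):

-- Stored, weighted tsvector maintained by a generated column, indexed with GIN
CREATE TABLE docs (
  id    bigint PRIMARY KEY,
  title text,
  body  text,
  tsv   tsvector GENERATED ALWAYS AS (
          setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
          setweight(to_tsvector('english', coalesce(body,  '')), 'B')
        ) STORED
);
CREATE INDEX docs_tsv_gin ON docs USING gin (tsv);

-- Query the stored vector so the planner can use the GIN index
SELECT id FROM docs WHERE tsv @@ plainto_tsquery('english', 'refund policy');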
Ranking: BM25-ish vs ts_rank
SQLite FTS5 includes a BM25 ranking function; it’s not magic, but it’s reasonably aligned with what people expect from search.
PostgreSQL provides ts_rank and friends; with weights and normalization you can get good results, but you’ll tune it.
Translation: if you want “decent ranking out of the box,” SQLite can surprise you.
If you want “ranking you can explain to your legal team,” PostgreSQL gives you more levers.
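A sketch of what “tuning it” tends to look like on the Postgres side; the weights array and normalization flag here are illustrative starting points, not recommendations:

-- Weighted ts_rank (weights apply to labels D,C,B,A in that order) with length normalization (flag 1)
SELECT id, ts_rank('{0.1, 0.2, 0.4, 1.0}', tsv, q, 1) AS score
FROM docs, plainto_tsquery('english', 'refund policy') AS q
WHERE tsv @@ q
ORDER BY score DESC
LIMIT 20;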
When SQLite surprises you (in the good way)
1) Embedded search with low operational overhead
You deploy a single binary (or container) and a single database file.
You don’t need a DB server, connection pooling, migrations across nodes, or service discovery.
For internal tools, desktop apps, edge devices, and single-tenant deployments, SQLite FTS5 is a gift.
2) Read-heavy workloads with local data
If your access pattern is mostly reads and occasional batch writes, SQLite in WAL mode can be very fast.
Local IO, no network hop, no server contention, and a mature query engine.
3) Small-to-medium corpora with sane query patterns
For tens of thousands to a few million documents (depending on document size and hardware), SQLite FTS5 can feel “too fast to be real.”
It’s real. It’s also easy to outgrow if you add:
frequent updates, multiple writers, or high query concurrency under one file lock regime.
4) Portable indexing and repeatable test environments
Shipping a reproducible dataset for integration tests is easier when it’s a file.
If you’ve ever tried to debug ranking differences between staging and prod across Postgres config drift, you know why this matters.
When SQLite doesn’t surprise you (and hurts anyway)
Concurrency: the writer choke point
SQLite can do concurrent reads with WAL, but writes still serialize.
For FTS, writes are not “cheap inserts”; they also update the index. Under steady write load, you’ll see lock waits and tail latency spikes.
If you need multiple app instances writing to the same DB, you’re in for a hard conversation with physics.
Yes, you can put the file on a network filesystem. No, you shouldn’t, unless you enjoy undefined behavior as a hobby.
Operational control and introspection
Postgres gives you pg_stat_statements, slow query logs, EXPLAIN (ANALYZE, BUFFERS), background vacuum behavior, and more.
SQLite gives you pragmas and whatever tracing you wire up. That’s fine—until your incident commander asks “what changed?”
Hotspot updates and merge behavior
Frequent updates to the same set of documents can fragment FTS index structures and increase merge work.
If you don’t schedule maintenance (or you schedule it during peak), you can create a self-inflicted DoS.
Correctness gotchas: tokenization and locale
“Why doesn’t ‘résumé’ match ‘resume’?” is not a philosophical question; it’s a tokenizer configuration question.
SQLite’s tokenizers are configurable but not infinite. PostgreSQL’s text search configs can be more nuanced, especially with custom dictionaries.
Joke #2 (short, relevant): SQLite concurrency is like a tiny coffee shop: the espresso is great, but only one barista can steam milk at a time.
What PostgreSQL gives you that SQLite won’t
Hard multi-user concurrency and isolation
PostgreSQL is built for many sessions doing reads and writes at once, with MVCC to keep readers and writers from stepping on each other.
If you’re running search in a web app with multiple workers and background jobs, this matters immediately.
Indexing options and predictable query planning
With PostgreSQL you can:
- Persist tsvector and index it with GIN.
- Use partial indexes to scope search to “active” documents.
- Use generated columns to avoid triggers.
- Use GIN tuning (like fastupdate) and monitor bloat.
Better tooling for “the day after launch”
Full-text search gets political after launch.
Someone will demand “why is this result above that one?” or “can we exclude internal docs?”
PostgreSQL’s visibility into plans, stats, and background maintenance makes it easier to debug without guesswork.
But: Postgres isn’t a search engine either
If you need fuzzy matching, typo tolerance, synonyms at scale, query-time analytics, or distributed indexing, you’re heading toward a dedicated search system.
Use SQLite/Postgres FTS when you want database-native search, not a second infrastructure stack.
Practical tasks: commands, outputs, and decisions
These are real “what do I do at 2 a.m.” tasks. Each includes a command, representative output, what it means, and the decision you make.
The commands are runnable; adjust paths, database names, and users to your environment.
Task 1: Confirm SQLite journal mode (WAL or not)
cr0x@server:~$ sqlite3 app.db "PRAGMA journal_mode;"
wal
What it means: wal enables better read/write concurrency than delete or truncate.
Decision: If it’s not wal and you have concurrent reads during writes, switch to WAL and retest tail latency.
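If you do flip it, the pragma reports the mode it actually landed in; unlike most pragmas, journal_mode=WAL persists in the database file itself:

cr0x@server:~$ sqlite3 app.db "PRAGMA journal_mode=WAL;"
wal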
Task 2: Check SQLite busy timeout (avoid immediate “database is locked”)
cr0x@server:~$ sqlite3 app.db "PRAGMA busy_timeout;"
0
What it means: 0 means callers fail immediately on lock contention.
Decision: Set a nonzero timeout in the application connection (or PRAGMA) if brief contention is expected; otherwise fix writer contention at the source.
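busy_timeout is per-connection, so setting it from the CLI proves nothing about your app. The sketch below is the shape of what the application should run on every connection it opens; the value is illustrative:

-- Run at connection-open time in the application; value in milliseconds
PRAGMA busy_timeout = 5000;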
Task 3: Verify FTS5 table exists and is actually FTS5
cr0x@server:~$ sqlite3 app.db ".schema docs_fts"
CREATE VIRTUAL TABLE docs_fts USING fts5(title, body, content='docs', content_rowid='id', tokenize = 'unicode61');
What it means: It’s an FTS5 virtual table with external content.
Decision: If you used external content, ensure triggers or rebuild processes keep it in sync, or you’ll serve stale/empty results.
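The sync triggers for external content follow a fixed pattern: the special 'delete' insert tells FTS5 which index entries to remove. A minimal sketch for the schema above; all three triggers are required for correctness:

CREATE TRIGGER docs_ai AFTER INSERT ON docs BEGIN
  INSERT INTO docs_fts(rowid, title, body) VALUES (new.id, new.title, new.body);
END;
CREATE TRIGGER docs_ad AFTER DELETE ON docs BEGIN
  INSERT INTO docs_fts(docs_fts, rowid, title, body) VALUES ('delete', old.id, old.title, old.body);
END;
CREATE TRIGGER docs_au AFTER UPDATE ON docs BEGIN
  INSERT INTO docs_fts(docs_fts, rowid, title, body) VALUES ('delete', old.id, old.title, old.body);
  INSERT INTO docs_fts(rowid, title, body) VALUES (new.id, new.title, new.body);
END;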
Task 4: Run an FTS5 integrity check
cr0x@server:~$ sqlite3 app.db "INSERT INTO docs_fts(docs_fts) VALUES('integrity-check');"
What it means: No output usually means it passed. Errors indicate index corruption or mismatch.
Decision: If it fails, plan a rebuild (the 'rebuild' command, Task 5) and scrutinize filesystem/IO issues and abrupt process termination patterns.
Task 5: Rebuild an FTS5 external-content index (planned maintenance)
cr0x@server:~$ sqlite3 app.db "INSERT INTO docs_fts(docs_fts) VALUES('rebuild');"
What it means: FTS5 rebuilds index from the content table.
Decision: Schedule this off-peak; if rebuild time is unacceptable, you’re likely at the point where Postgres (or a search engine) becomes saner.
Task 6: See if SQLite is checkpointing WAL aggressively (or not at all)
cr0x@server:~$ sqlite3 app.db "PRAGMA wal_checkpoint(TRUNCATE);"
0|0|0
What it means: Output is busy|log|checkpointed. All zeros means no frames pending.
Decision: If log grows huge in production, add controlled checkpointing (app-level) and monitor IO latency.
Task 7: Inspect SQLite page size and cache size assumptions
cr0x@server:~$ sqlite3 app.db "PRAGMA page_size; PRAGMA cache_size;"
4096
-2000
What it means: 4KB pages; a negative cache_size is in KiB, so -2000 is roughly 2 MB of page cache.
Decision: If your workload is read-heavy and you’re thrashing disk, increase cache size in the application connection, but validate memory impact per process.
Task 8: Identify “database is locked” errors in application logs
cr0x@server:~$ journalctl -u app.service --since "1 hour ago" | grep -i "database is locked" | tail
Dec 30 01:12:40 host app[2219]: sqlite error: database is locked (SQLITE_BUSY) on INSERT INTO docs_fts ...
Dec 30 01:12:41 host app[2219]: sqlite error: database is locked (SQLITE_BUSY) on UPDATE docs SET ...
What it means: You have writer contention or long transactions holding locks.
Decision: Shorten transactions, move indexing updates to a single writer queue, or stop pretending the file is a multi-writer database server.
Task 9: Find large WAL files (symptom of checkpointing or write bursts)
cr0x@server:~$ ls -lh app.db app.db-wal app.db-shm
-rw-r--r-- 1 app app 12G Dec 30 01:10 app.db
-rw-r--r-- 1 app app 3.8G Dec 30 01:12 app.db-wal
-rw-r--r-- 1 app app 32K Dec 30 00:55 app.db-shm
What it means: WAL is huge; recovery/checkpoint cost and disk pressure are coming.
Decision: Investigate long readers preventing checkpoints, add periodic checkpoints, and verify disk has headroom; otherwise you’ll “discover” ENOSPC during peak traffic.
Task 10: PostgreSQL: confirm your text search config
cr0x@server:~$ psql -d app -c "SHOW default_text_search_config;"
default_text_search_config
----------------------------
pg_catalog.english
(1 row)
What it means: Your default parsing/stemming is English.
Decision: If you have multilingual content, don’t leave this at default and hope. Choose per-document config or store multiple vectors.
Task 11: PostgreSQL: see if you have the right index type for tsvector
cr0x@server:~$ psql -d app -c "\d+ docs"
Table "public.docs"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
id | bigint | | not null | | plain | | |
title | text | | | | extended | | |
body | text | | | | extended | | |
tsv | tsvector| | | | extended | | |
Indexes:
"docs_pkey" PRIMARY KEY, btree (id)
"docs_tsv_gin" gin (tsv)
What it means: You have a GIN index on tsv. Good.
Decision: If this index is missing, add it before you blame Postgres for being “slow.”
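If it is missing on a live table, build it without blocking writers; note that CONCURRENTLY can’t run inside a transaction block and takes longer:

CREATE INDEX CONCURRENTLY docs_tsv_gin ON docs USING gin (tsv);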
Task 12: PostgreSQL: inspect the actual plan for a search query
cr0x@server:~$ psql -d app -c "EXPLAIN (ANALYZE, BUFFERS) SELECT id FROM docs WHERE tsv @@ plainto_tsquery('english','refund policy') ORDER BY ts_rank(tsv, plainto_tsquery('english','refund policy')) DESC LIMIT 20;"
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=295.69..295.74 rows=20 width=12) (actual time=4.112..4.168 rows=20 loops=1)
   Buffers: shared hit=812
   ->  Sort  (cost=295.69..296.07 rows=152 width=12) (actual time=4.110..4.156 rows=152 loops=1)
         Sort Key: (ts_rank(tsv, plainto_tsquery('english'::regconfig, 'refund policy'::text))) DESC
         Sort Method: top-N heapsort  Memory: 26kB
         ->  Bitmap Heap Scan on docs  (cost=13.18..290.31 rows=152 width=12) (actual time=0.310..3.870 rows=152 loops=1)
               Recheck Cond: (tsv @@ plainto_tsquery('english'::regconfig, 'refund policy'::text))
               Heap Blocks: exact=148
               ->  Bitmap Index Scan on docs_tsv_gin  (cost=0.00..13.14 rows=152 width=0) (actual time=0.210..0.210 rows=152 loops=1)
                     Index Cond: (tsv @@ plainto_tsquery('english'::regconfig, 'refund policy'::text))
 Planning Time: 0.214 ms
 Execution Time: 4.242 ms
(12 rows)
What it means: The GIN index is used (via a Bitmap Index Scan), every buffer is a cache hit, and about 4 ms end to end is healthy.
Decision: If you see a sequential scan instead, you likely missed the index, wrapped the column in a function the planner can’t match, or the query matches so many rows that scanning the table is genuinely cheaper.
Task 13: PostgreSQL: check if autovacuum is keeping up (bloat affects GIN too)
cr0x@server:~$ psql -d app -c "SELECT relname, n_dead_tup, last_autovacuum FROM pg_stat_user_tables ORDER BY n_dead_tup DESC LIMIT 5;"
relname | n_dead_tup | last_autovacuum
---------+------------+-------------------------------
docs | 842113 | 2025-12-30 00:41:12+00
events | 12044 | 2025-12-29 23:58:06+00
(2 rows)
What it means: Lots of dead tuples in docs; updates/deletes are piling up.
Decision: Tune autovacuum for that table, reduce update churn, or accept that search indexes will bloat and slow down over time.
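Per-table autovacuum settings are the usual first move; the thresholds below are illustrative and should be validated against the table’s actual churn:

-- Vacuum docs when ~5% of rows are dead instead of the global default (~20%)
ALTER TABLE docs SET (
  autovacuum_vacuum_scale_factor = 0.05,
  autovacuum_analyze_scale_factor = 0.02
);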
Task 14: PostgreSQL: find slowest queries by total time (if pg_stat_statements is enabled)
cr0x@server:~$ psql -d app -c "SELECT calls, total_exec_time::int AS total_ms, mean_exec_time::numeric(10,2) AS mean_ms, rows, left(query,80) AS q FROM pg_stat_statements ORDER BY total_exec_time DESC LIMIT 5;"
calls | total_ms | mean_ms | rows | q
-------+----------+---------+------+--------------------------------------------------------------------------------
10213 | 912345 | 89.33 | 0 | SELECT id FROM docs WHERE tsv @@ plainto_tsquery($1,$2) ORDER BY ts_rank...
1400 | 221100 | 157.92 | 1 | UPDATE docs SET body = $1, tsv = to_tsvector($2, $1) WHERE id = $3
(2 rows)
What it means: Your search query dominates total execution time; update path is also heavy.
Decision: Decide whether to optimize query/ranking, cache results, precompute vectors, or reduce update frequency.
Task 15: Check IO saturation on Linux (it’s usually the disk, not your tokenizer)
cr0x@server:~$ iostat -xz 1 3
avg-cpu: %user %nice %system %iowait %steal %idle
12.41 0.00 6.02 21.15 0.00 60.42
Device r/s w/s rkB/s wkB/s await %util
nvme0n1 112.0 980.0 8200.0 65200.0 18.40 97.50
What it means: Disk is nearly pegged; average wait is high. Search latency will track this.
Decision: Stop tuning ranking functions and start fixing IO: faster storage, reduce write amplification, batch writes, or move search off the hot disk.
Task 16: Check file descriptor pressure (SQLite in-process apps can hit limits)
cr0x@server:~$ cat /proc/$(pgrep -n app)/limits | grep "Max open files"
Max open files 1024 1048576 files
What it means: Soft limit is 1024; your app may hit it with logs, sockets, and DB files.
Decision: Raise limits if you’re seeing intermittent IO errors; it’s boring, but it prevents fake “database problems.”
Fast diagnosis playbook
You have slow search or timeouts. You don’t have time for a thesis.
Do this in order. Stop when you find the smoking crater.
First: decide whether you’re CPU-bound, IO-bound, or lock-bound
- IO-bound: high %iowait, disk %util near 100%, growing WAL, slow fsyncs.
- CPU-bound: high user CPU, stable IO, tokenization/ranking heavy, large result sets sorted.
- Lock-bound: “database is locked” (SQLite), long-running transactions (Postgres), blocked writers.
Second: verify the index is being used
- SQLite: confirm you’re using docs_fts MATCH ? and not doing LIKE '%term%' on the content table.
- Postgres: EXPLAIN (ANALYZE, BUFFERS); look for GIN index usage, not a seq scan.
Third: check write amplification and background maintenance
- SQLite: WAL size, checkpoint frequency, FTS rebuild/merge behavior.
- Postgres: autovacuum lag, dead tuples, GIN pending list behavior, bloat.
Fourth: inspect query shape and ranking
- Are you sorting thousands of matches to return 20?
- Are you doing prefix queries everywhere?
- Are you computing to_tsvector on the fly instead of using a stored vector?
Fifth: decide if the architecture is wrong
If the bottleneck is “many writers” and you’re on SQLite, tuning is a coping mechanism.
If the bottleneck is “search is now a product feature with SLAs,” Postgres might be fine, but a dedicated search system may be inevitable.
Common mistakes: symptom → root cause → fix
1) Symptom: “database is locked” spikes during peak
Root cause: Multiple writers or long transactions in SQLite; FTS index updates amplify writes.
Fix: Single-writer pattern (queue), shorten transactions, set busy_timeout, enable WAL, move to Postgres if multi-writer is real.
2) Symptom: search results missing for recently updated documents
Root cause: External content FTS table out of sync (triggers missing, failed, or disabled during bulk load).
Fix: Recreate triggers, run FTS5 rebuild, add invariant checks in CI (count mismatches), and alert on drift.
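A crude but effective drift check you can run in CI or a periodic job; matching row counts don’t prove the index is correct, but a mismatch proves it isn’t:

-- External-content FTS drift check: a count mismatch means missing or failed sync triggers
SELECT (SELECT count(*) FROM docs)     AS docs_rows,
       (SELECT count(*) FROM docs_fts) AS fts_rows;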
3) Symptom: Postgres search query uses sequential scan
Root cause: No GIN index on tsvector, or query wraps tsvector in a function, or low selectivity terms.
Fix: Persist tsvector, add GIN index, use correct operators (@@), consider partial indexes, adjust configuration/stopwords.
4) Symptom: ranking looks random or “worse than before”
Root cause: Tokenizer/dictionary changes, stopword list changes, mixing languages, or switching from phrase queries to bag-of-words.
Fix: Version your search config, write regression tests with golden queries, and treat ranking as an API contract.
5) Symptom: disk usage grows steadily and performance degrades
Root cause: Postgres bloat (dead tuples), GIN bloat, SQLite WAL not checkpointing, frequent updates to large text fields.
Fix: Tune autovacuum; reduce update churn; in SQLite, manage checkpoints; in Postgres, consider periodic REINDEX/maintenance windows where warranted.
6) Symptom: “works on my machine” but fails in prod with non-ASCII text
Root cause: Different tokenization rules, collations, or text search configs between environments.
Fix: Pin tokenizer/config explicitly; build test cases for Unicode and punctuation; stop relying on defaults.
7) Symptom: latency spikes every few minutes
Root cause: SQLite checkpointing or FTS maintenance bursts; Postgres autovacuum or IO pressure from unrelated workloads.
Fix: Correlate with logs/metrics; schedule maintenance; isolate storage; add jitter/backoff to batch writers.
8) Symptom: search returns too many irrelevant matches
Root cause: Query construction too permissive (OR-heavy), no field weighting, no phrase matching, stopwords removed too aggressively.
Fix: Weight title vs body (Postgres weights; SQLite separate columns and ranking tweaks), add phrase queries for common patterns, tune stopwords.
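In SQLite the weighting happens at query time through bm25’s per-column arguments (argument order follows the FTS table’s column order); an illustrative sketch:

-- Title matches count 10x more than body matches; bm25 is lower-is-better, so sort ascending
SELECT rowid FROM docs_fts
WHERE docs_fts MATCH 'refund policy'
ORDER BY bm25(docs_fts, 10.0, 1.0)
LIMIT 20;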
Checklists / step-by-step plan
Choosing SQLite FTS5: the “don’t get paged” checklist
- Enable WAL mode and verify it stays enabled across deployments.
- Set busy_timeout (or app-level retry with jitter) deliberately.
- Use a single-writer pattern if you have frequent writes; treat FTS updates as write-heavy.
- Define checkpoint policy; monitor WAL size and disk headroom.
- If using external content, maintain triggers and add a periodic integrity check.
- Pin tokenizer configuration and test Unicode cases.
- Have a rebuild plan (time it); don’t discover rebuild takes hours during an incident.
- Backups: snapshot the DB file safely (and understand what “safe” means in WAL mode).
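One safe, boring option that works in WAL mode without stopping writes, assuming SQLite 3.27+ for VACUUM INTO and enough disk headroom at the target path:

cr0x@server:~$ sqlite3 app.db "VACUUM INTO '/backups/app-$(date +%F).db'"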
Choosing PostgreSQL tsearch: the “make it boring” checklist
- Persist tsvector (generated column or trigger) and add a GIN index.
- Choose default_text_search_config intentionally; don’t leave multilingual content to luck.
- Use EXPLAIN (ANALYZE, BUFFERS) to confirm index usage before tuning anything else.
- Enable pg_stat_statements so you can find the real expensive queries.
- Tune autovacuum for the tables that churn; watch dead tuples and bloat.
- Decide your ranking function and freeze it behind a stable interface.
- Plan for migrations (config changes, dictionary changes) as versioned rollouts.
Migrating from SQLite FTS to PostgreSQL tsearch: step-by-step
- Inventory tokenization behavior (what matches today) and write a regression suite of queries and expected top results.
- Export documents and IDs; keep stable identifiers so you can compare results across systems.
- In Postgres, create a tsvector column with the closest config; index it with GIN.
- Run offline comparison: recall/precision for your regression queries, not synthetic benchmarks.
- Roll out dual-write or asynchronous indexing; don’t block user writes on search indexing during migration.
- Flip reads gradually; monitor latency and relevance complaints separately.
- Retire SQLite search only after you can rebuild Postgres indexes and recover from failures quickly.
Three corporate mini-stories from the trenches
Incident: the wrong assumption (“SQLite is just a file, so it’ll be faster”)
A mid-sized SaaS team shipped a document search feature embedded in the main app process using SQLite FTS5.
It was meant to reduce dependencies: no extra Postgres cluster, no search service, no new on-call rotation.
The first month was fine. The dataset was modest, and most writes were from a nightly import.
Then they added user-generated notes with autosave. Writes became constant.
They assumed WAL mode meant “basically concurrent,” and scaled the app horizontally. Each instance wrote to the same database file on shared storage.
The symptoms were classic: random timeouts, occasional “database is locked,” and CPU graphs that looked calm while users screamed.
The incident wasn’t a single crash. It was a slow-motion collapse.
One writer held a lock longer than expected, others backed up, request threads piled up, and the app stopped serving traffic long before the database “failed.”
They learned the hard way that a single-file lock boundary is not a distributed concurrency model.
The fix was not clever. They moved search to Postgres on a dedicated instance, queued indexing work, and kept SQLite for local dev and tests.
The most valuable change was social: they wrote down the concurrency assumptions in the design doc so it couldn’t happen again.
Optimization that backfired: “Let’s rebuild the index nightly to keep it fast”
Another team had a large SQLite FTS5 index and noticed query latency creeping up over time.
Someone proposed a nightly full rebuild of the FTS table. It worked in staging, it was easy to automate, and it made charts look good for exactly one day.
They rolled it out with confidence and a calendar invite.
The rebuild ran on the same host as the app. It saturated disk IO for an hour, then two.
Read queries got slower during the rebuild, so users retried, creating more load. The job overlapped with morning traffic because time zones exist.
Support opened tickets. Engineering blamed the network. The network team blamed DNS, as tradition demands.
The backfire wasn’t that rebuilding is “bad.” It was that rebuilding is an IO event.
The team was treating the database file like an internal cache, but it had become a primary user-facing feature.
They needed controlled maintenance, not a blunt-force reset.
The eventual fix: they stopped rebuilding and instead managed write patterns, checkpointing, and index health.
For bigger churn they moved to Postgres, where they could reindex online-ish within windows and isolate IO better.
The lesson was boring: don’t schedule a disk storm next to your customers.
Boring but correct practice: “We tested relevance like we test APIs”
A third company ran Postgres full-text search for internal knowledge base documents.
Nothing fancy: tsvector, GIN index, and a handful of ranking tweaks.
Their secret weapon wasn’t technology. It was discipline.
They kept a small corpus of “golden queries” with expected top results.
Every change to dictionaries, configs, or ranking weights ran through CI and produced a diff report.
Engineers learned quickly that “minor tweak” can mean “support ticket avalanche.”
One day, a routine OS package update changed a collation-related behavior in a staging environment, which then changed tokenization outcomes in a subtle way.
The regression suite caught it. They blocked the rollout and investigated.
It wasn’t a dramatic outage. It was a quiet non-event, the best kind.
When the time came to add another language, they didn’t argue about feelings. They added vectors per language and expanded the golden set.
It cost them some storage and some time.
It saved them from weeks of “why is search worse” meetings where everyone is right and nobody is useful.
FAQ
1) Is SQLite FTS5 “good enough” for production?
Yes, if you’re single-node (or single-writer), mostly read-heavy, and you treat maintenance (WAL, checkpoints, rebuilds) as a real operational concern.
It fails predictably when you violate its concurrency assumptions.
2) Is PostgreSQL full-text search slower than SQLite?
Not inherently. SQLite can feel faster because it’s embedded and avoids network hops.
Postgres can be extremely fast with a stored tsvector + GIN, but you must confirm index usage and keep autovacuum healthy.
3) Should I compute to_tsvector at query time in Postgres?
Almost never for a real workload. Persist the vector (generated column or trigger), index it, and keep queries simple enough for the planner to use the index.
4) Can SQLite handle multiple app instances reading and writing?
Reads: yes, especially with WAL. Writes: they serialize. If you have frequent writes from multiple instances, you’ll hit lock contention and latency spikes.
Use a single writer queue or move to a server database.
5) Why do I get different results between environments?
Tokenizer/config drift. SQLite tokenizers and Postgres text search configurations are defaults until they’re not.
Pin configs, version them, and test Unicode and punctuation explicitly.
6) How do I improve relevance without building a whole search stack?
Start with field weighting (title > body), phrase queries for common patterns, and stopword tuning.
Then add a regression suite of golden queries so relevance doesn’t regress silently.
7) When should I stop tuning and switch from SQLite to Postgres?
When the bottleneck is concurrency (writers), operational visibility, or when rebuild/maintenance windows become unacceptable.
If the fix requires “make users wait less by hoping,” you’re done.
8) When should I stop using Postgres FTS and adopt a search engine?
When you need typo tolerance, synonyms, complex scoring, faceting/aggregations at scale, or distributed indexing.
Also when search becomes a core product feature with its own SLAs and you need specialized tooling.
9) Does WAL mode in SQLite make writes concurrent?
It improves read/write concurrency, not write/write concurrency. One writer still wins at a time.
WAL mostly prevents readers from blocking on writers.
10) Is GIN always the right index for Postgres text search?
Usually, yes. GiST exists and can be useful in some cases, but GIN is the standard choice for tsvector containment queries.
Validate with EXPLAIN, not superstition.
Practical next steps
If you’re deciding today:
- Pick SQLite FTS5 if you can commit to a single-writer model, WAL, explicit checkpointing, and file-based backup discipline.
- Pick PostgreSQL tsearch if you need concurrency, introspection, predictable operations, and you’re willing to manage a database server.
If you’re already in pain:
- Run the fast diagnosis playbook. Identify whether you’re lock-, IO-, or CPU-bound.
- Prove index usage (SQLite: MATCH; Postgres: EXPLAIN (ANALYZE, BUFFERS)).
- Stabilize maintenance: WAL checkpoints (SQLite), autovacuum/bloat (Postgres).
- Lock down configs and add golden-query regression tests. Relevance is a contract, not a vibe.
- If concurrency is the limiting factor, stop negotiating with the physics of file locking and move the workload to Postgres.
The grown-up version of “database choice” is choosing which incidents you want to debug.
SQLite surprises you with how far it can go. PostgreSQL surprises you with how much you can control.
Choose based on the surprises you can afford.