MySQL vs PostgreSQL Full-Text Search: When Built-In Is Enough and When It’s a Trap

Someone, somewhere, has promised your stakeholders “Google-like search” on top of your transactional database. Then a customer types two words and gets zero results, or the CPU graph climbs like it’s trying to escape gravity.

Built-in full-text search can be a gift. It can also be a slow-motion outage with a relevance problem. MySQL and PostgreSQL both offer full-text search features that are solid—until you ask them to be a dedicated search engine. The trick is knowing where the line is, and having a playbook for the day it stops being cute.

The decision line: when built-in search is enough

Here’s the blunt version: built-in full-text search is enough when your search is an internal feature, not your product. It’s enough when you can explain your requirements in a paragraph and the paragraph doesn’t contain the words “did you mean,” “synonyms,” “fuzzy,” “personalized ranking,” “autocomplete,” or “near real-time analytics.”

Built-in is enough when…

  • Data volume is moderate and your search scope is constrained. Think: searching tickets in a helpdesk tool, knowledge base articles, product names, internal docs.
  • Latency requirements are “app fast,” not “search-engine fast under chaos.” If P95 can be 200–500 ms and nobody is live-filtering on every keystroke, you can often stay inside the database.
  • Ranking can be “good enough,” and you’re willing to tune it by adjusting weights and stopwords—not by training models.
  • Consistency matters more than relevance sophistication. Database search is transactionally consistent if you design it that way; external engines add index lag and another failure domain.
  • You can live with limitations like tokenization quirks, language stemming choices, and the occasional “why doesn’t it match this exact phrase?” ticket.

Built-in search becomes a trap when…

  • You’re building a search product. If search relevance is your differentiator, don’t pretend a general-purpose DB engine will replace a search stack indefinitely.
  • Your queries mix full-text with high-cardinality filters and sorting and you expect it to scale linearly. It won’t. It will scale like a committee.
  • Multi-tenant search is “search across all tenants” and your index design isn’t explicitly built for it. You’ll pay in CPU, memory, and unpleasant lock contention.
  • “Just add an index” becomes your operating model. Full-text indexes are not regular B-tree indexes. You can bloat them, churn them, and turn maintenance into a part-time job.
  • You need advanced features: fuzzy matching, synonyms per tenant, complex scoring, per-field analyzers, highlighting, “more like this,” query suggestions, typo tolerance, or vector + lexical hybrid search.

If you’re on the fence: start with built-in search to validate behavior and user needs, but design the data flow so you can later mirror documents to a dedicated engine without rewriting the whole app. Put another way: don’t make the database your search cluster and then act surprised when it starts behaving like one.

How MySQL full-text search actually works (and fails)

MySQL full-text search is deceptively simple: add a FULLTEXT index, use MATCH() AGAINST(), ship it. Under the hood, your storage engine matters (InnoDB today, MyISAM historically), your tokenization rules matter, and your “relevance” will be shaped by defaults you didn’t know you agreed to.
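
For concreteness, here is the minimal shape of that pattern, using the articles(title, body) table that appears in the tasks below:

-- One-time setup: a FULLTEXT index covering both columns.
ALTER TABLE articles ADD FULLTEXT INDEX ft_title_body (title, body);

-- MATCH() names the same column list the index covers;
-- AGAINST() doubles as a relevance score you can select.
SELECT id, title,
       MATCH(title, body) AGAINST('incident response') AS score
FROM articles
WHERE MATCH(title, body) AGAINST('incident response')
LIMIT 20;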

InnoDB FULLTEXT: what you’re really buying

InnoDB implements FULLTEXT by maintaining auxiliary tables for the inverted index. It’s integrated, transactional enough for most applications, but it has its own resource profile: background index maintenance, potentially heavy IO during large updates, and sensitivity to stopwords and minimum token sizes.

Natural language vs boolean mode: two different beasts

Natural language mode is “just search this text” and return a relevance score. Boolean mode adds operators and lets you force inclusion/exclusion and prefix matching. In production, boolean mode becomes popular because users want “must include X,” but boolean mode also makes it easier to accidentally write queries that return too much, too little, or scan more than you think.
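
A hedged side-by-side, reusing the same articles table:

-- Natural language mode: no operators, results scored by relevance.
SELECT id FROM articles
WHERE MATCH(title, body) AGAINST('database replication' IN NATURAL LANGUAGE MODE)
LIMIT 20;

-- Boolean mode: +term is required, -term is excluded, term* is a prefix.
-- Powerful, and easy to turn into an accidental table-wide candidate set.
SELECT id FROM articles
WHERE MATCH(title, body) AGAINST('+status +page incid*' IN BOOLEAN MODE)
LIMIT 20;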

Failure modes you will see in MySQL

  • “Why does searching ‘to be’ return nothing?” Stopwords and minimum word length. Your system is “working as designed,” which is rarely comforting.
  • Relevance looks random because the scoring model is coarse and heavily influenced by term frequency and document length in a way that may not match your domain.
  • Index build/update costs surprise you. Bulk updates or frequent edits to large text fields can create write amplification and replication lag.
  • Collation and charset issues affect tokenization and comparisons. You can get “same-looking string, different tokens” problems when Unicode normalization isn’t consistent.
  • Query plans degrade when you combine full-text predicates with other filters and sorting. MySQL can choose suboptimal plans or force temporary tables.

One operational reality: when MySQL FULLTEXT goes sideways, the symptoms look like “the database is slow.” Search becomes the noisy neighbor that steals the CPU budget from the rest of your app. And since it’s inside the DB, it’s harder to isolate without either rate-limiting the feature or moving it out.

How PostgreSQL full-text search actually works (and fails)

PostgreSQL full-text search (FTS) is a toolkit. You build tsvector documents, query them with tsquery, index them with GIN (usually) or GiST (sometimes), and rank results with ts_rank or friends. It’s more explicit than MySQL. That’s good because you can tune it, and bad because you can tune it wrong.

tsvector, tsquery, and why normalization isn’t optional

Postgres doesn’t “search the raw text” in the same way. It parses text into lexemes (normalized tokens), typically applying stemming based on a text search configuration (like english). That means you must decide which language config you’re using and whether stemming is desirable. It’s fantastic for “running” matching “run.” It’s terrible when your product codes or legal citations get mangled.
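
You can watch both behaviors from any psql session; nothing here assumes more than a default install:

-- The english config stems, so 'running' is stored as the lexeme 'run'.
SELECT to_tsvector('english', 'running') @@ to_tsquery('english', 'run');  -- returns t

-- The simple config keeps tokens verbatim (lowercased only): no match.
SELECT to_tsvector('simple', 'running') @@ to_tsquery('simple', 'run');    -- returns f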

Index choice: GIN is fast to query, not free to maintain

GIN indexes are the workhorse for FTS because they handle the inverted-index shape well. Reads are fast. Writes can be expensive. If you update documents frequently, GIN pending lists and vacuum behavior become your new hobby. (Nobody chooses that hobby on purpose.)
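
When churn hits, these are usually the first knobs to reach for; a sketch assuming the docs_fts_gin index from the tasks below, with a limit value you would tune rather than copy:

-- fastupdate (on by default) buffers new entries in a pending list;
-- lowering gin_pending_list_limit (in kB) flushes it sooner, trading
-- write throughput for steadier read latency.
ALTER INDEX docs_fts_gin SET (gin_pending_list_limit = 1024);

-- Vacuum merges the pending list into the main index and clears dead entries.
VACUUM (ANALYZE) docs;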

Failure modes you will see in PostgreSQL

  • GIN index bloat and slowdowns if you have heavy churn and insufficient vacuum, or if you’re indexing large documents without thought.
  • Ranking disputes because the default ranking functions aren’t your product manager’s idea of “obvious.” You’ll need weights, normalization, and sometimes separate fields.
  • Language config mismatch leads to “why doesn’t ‘analysis’ match ‘analyses’?” or “why does searching a part number match nonsense?”
  • Bad query construction (using to_tsquery directly on user input) turns punctuation into syntax errors and transforms user queries into operational incidents.
  • Combining FTS with filtering and sorting can trigger large bitmap index scans and memory pressure if you don’t structure the query and indexes deliberately.

Postgres rewards you for being explicit: separate vectors per field, weighted ranking, generated columns, partial indexes per tenant, and clean query construction via plainto_tsquery or websearch_to_tsquery. It also punishes you for winging it in production.
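
A minimal sketch of that explicitness, assuming docs.fts was built as a weighted vector (title as weight A, body as weight D):

-- Assume fts was populated once as:
--   setweight(to_tsvector('english', title), 'A') ||
--   setweight(to_tsvector('english', body),  'D')
SELECT id, title,
       ts_rank(fts, websearch_to_tsquery('english', 'incident response')) AS rank
FROM docs
WHERE fts @@ websearch_to_tsquery('english', 'incident response')
ORDER BY rank DESC
LIMIT 20;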

Interesting facts and history you can weaponize

  1. MyISAM had full-text first, long before InnoDB supported it; many “MySQL FTS opinions” are fossilized from the MyISAM era.
  2. InnoDB FULLTEXT stores its index in auxiliary tables, which is why large rebuilds and heavy updates can look like mysterious internal IO storms.
  3. PostgreSQL’s tsearch predates today’s “search everywhere” trend; it’s been part of Postgres for many major versions, evolving from earlier contrib modules into core.
  4. GIN indexes were designed for composite values (arrays, JSON-like structures, lexeme sets). Full-text search is one of the best matches for their design.
  5. Postgres FTS has multiple dictionaries and configurations (stemming, stopwords, simple parsing). That flexibility is why it adapts well across languages—if you actually configure it.
  6. MySQL has minimum token size rules that historically caused “short word” misses; people discover this only after a CEO searches for a two-letter product line.
  7. Both systems make tradeoffs around stopwords: removing them reduces index size and noise, but can destroy relevance for domains where common words matter (legal text, titles, song lyrics).
  8. FTS relevance is not a universal truth. MySQL and Postgres score differently because their models differ; migrating engines changes results even if data is identical.
  9. Replication magnifies pain: an FTS-heavy workload can increase binlog/WAL volume and amplify lag during bulk reindexing or mass updates.

One paraphrased idea worth keeping on a sticky note comes from Werner Vogels: everything fails, all the time, so design for what breaks. Treat search as a failure domain, even when it lives in the database.

Joke #1: Full-text search is like office coffee: when it’s good nobody mentions it, and when it’s bad everyone files a ticket at once.

Three corporate mini-stories from the trenches

Mini-story 1: The incident caused by a wrong assumption

A mid-sized SaaS company added “search all customer notes” into their admin panel. It shipped fast: one MySQL table, one FULLTEXT index, a new endpoint that ran MATCH(notes) AGAINST(?). It worked in staging. It even worked in production—until it didn’t.

Support reported “search is down” during business hours, but the rest of the app felt sluggish too. On-call saw CPU pegged and queries piling up. The team assumed the full-text index made searches “basically O(1).” The wrong assumption was subtler: they believed that adding a full-text predicate would always narrow results early and cheaply.

In reality, the query also included tenant filters, date range filters, and a sort by “most recently updated.” The optimizer chose a plan that did far more work than expected, and the “sort by recency” forced temporary tables for large candidate sets. Under certain common search terms, the candidate set was massive.

The incident ended with a rate-limit on the endpoint, a UI change to require at least 3 non-stopword characters, and a redesign: they split “search” from “sort,” prefiltered by tenant and recent activity first, and only then applied the full-text match. Later they added a separate search service. The lesson wasn’t “MySQL full-text is bad.” It was: the optimizer is not your product manager, and “indexed” doesn’t mean “cheap.”

Mini-story 2: The optimization that backfired

A marketplace team on PostgreSQL tried to speed up listing search by building a single big tsvector that concatenated title, description, and seller notes. They weighted everything equally. They also stored it as a generated column and indexed it with GIN. Queries got faster, and the team declared victory.

Then writes got slower. Not “a little.” They started seeing periodic spikes in WAL generation and autovacuum lag. The application had a feature that let sellers tweak descriptions frequently, often in bursts (think: seasonal updates). Each edit churned the GIN index. The GIN pending list grew, vacuum couldn’t keep up, and query latency became spiky. The “optimization” had moved cost from reads to writes, and their system was write-heavy.

They tried to tune autovacuum, which helped, then backfired again when it competed with normal workload and caused IO contention. Eventually they broke the vector into smaller fields, reduced indexed content (seller notes stopped being searchable), and moved “notes search” to an async index updated out-of-band. Relevance improved too because weights became meaningful instead of “everything is equally important.”

The lesson: the biggest vector is rarely the best vector. Index only what you need, and remember GIN is a query accelerator, not a free lunch.

Mini-story 3: The boring but correct practice that saved the day

An enterprise app ran Postgres full-text search for internal documentation across multiple departments. Nothing fancy: tsvector per document, GIN index, and a handful of filters. What was fancy was their operational discipline: they had a dedicated “search canary” query in monitoring, executed every minute with a known term set.

One Tuesday, they saw the canary latency climb gradually over an hour. No outage yet, just a slope. On-call checked vacuum stats, found autovacuum was falling behind for the table storing the documents. They also saw increased dead tuples due to a new feature that updated docs more frequently.

Because they had baselines, they responded before it became user-visible: they adjusted autovacuum thresholds for that table, scheduled a targeted VACUUM (ANALYZE) during a quiet period, and temporarily throttled the doc update batch job. No incident review meeting, no angry internal emails, just a quiet return to normal.

This is the boring practice: one synthetic query, tracked over time, tied to a specific subsystem. It’s not glamorous. It’s cheaper than heroics.

Joke #2: The fastest way to improve search relevance is to rename the feature “keyword filter” and watch expectations fall to match reality.

Hands-on tasks: commands, output, meaning, decision

These are real tasks you can run during a rollout, incident, or postmortem. Each includes a command, sample output, what it means, and what decision you make.

Task 1 (MySQL): Confirm FULLTEXT indexes exist and what they cover

cr0x@server:~$ mysql -e "SHOW INDEX FROM articles WHERE Index_type='FULLTEXT'\G"
*************************** 1. row ***************************
        Table: articles
   Non_unique: 1
     Key_name: ft_title_body
 Seq_in_index: 1
  Column_name: title
   Index_type: FULLTEXT
*************************** 2. row ***************************
        Table: articles
   Non_unique: 1
     Key_name: ft_title_body
 Seq_in_index: 2
  Column_name: body
   Index_type: FULLTEXT

Meaning: You have a composite FULLTEXT index on title and body. Note that MySQL requires the MATCH() column list to match a FULLTEXT index definition exactly: MATCH(title, body) can use ft_title_body, but MATCH(body) alone cannot, and it will fail with “Can't find FULLTEXT index matching the column list” unless a separate index on body exists.

Decision: Ensure your query’s MATCH(title, body) matches the indexed column list. If not, change the query or index; don’t rely on optimizer magic.

Task 2 (MySQL): Check stopword and token-size settings that explain “missing results”

cr0x@server:~$ mysql -e "SHOW VARIABLES LIKE 'innodb_ft_%';"
+---------------------------+-------+
| Variable_name             | Value |
+---------------------------+-------+
| innodb_ft_enable_stopword | ON    |
| innodb_ft_max_token_size  | 84    |
| innodb_ft_min_token_size  | 3     |
+---------------------------+-------+

Meaning: Tokens shorter than 3 characters aren’t indexed. Stopwords are enabled. Two-letter searches will quietly fail.

Decision: If your domain has important short tokens (SKUs, codes), plan a config change and index rebuild—or redesign to store a separate normalized field for codes and search it with B-tree indexes.
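
One hedged version of that redesign, assuming a hypothetical sku column on articles (MySQL 5.7+ for generated columns):

-- Normalize once at write time...
ALTER TABLE articles
  ADD COLUMN sku_norm VARCHAR(64) GENERATED ALWAYS AS (UPPER(sku)) STORED,
  ADD INDEX idx_sku_norm (sku_norm);

-- ...then short codes bypass FULLTEXT and its token-size rules entirely.
SELECT id FROM articles WHERE sku_norm = 'X2';
SELECT id FROM articles WHERE sku_norm LIKE 'X2%';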

Task 3 (MySQL): Validate whether the optimizer is using FULLTEXT or doing something expensive

cr0x@server:~$ mysql -e "EXPLAIN SELECT id FROM articles WHERE MATCH(title, body) AGAINST('incident response' IN NATURAL LANGUAGE MODE) AND tenant_id=42 ORDER BY updated_at DESC LIMIT 20\G"
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: articles
         type: fulltext
possible_keys: ft_title_body,idx_tenant_updated
          key: ft_title_body
      key_len: 0
          ref:
         rows: 12000
        Extra: Using where; Using filesort

Meaning: It’s using FULLTEXT (type: fulltext), but also doing a filesort. That’s often where latency goes to die under load.

Decision: Consider changing the UX/query: first get top N matches (no ORDER BY updated_at), then sort in app; or maintain a separate “recent per tenant” search index strategy; or accept less strict ordering.
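
A hedged sketch of the two-step shape; the 200-row window is an assumption you would tune:

-- Step 1: bounded candidate set from FULLTEXT, no recency sort in SQL.
SELECT id, updated_at
FROM articles
WHERE tenant_id = 42
  AND MATCH(title, body) AGAINST('incident response' IN NATURAL LANGUAGE MODE)
LIMIT 200;

-- Step 2: sort those at-most-200 rows by updated_at in application code,
-- so the database never filesorts the full match set.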

Task 4 (MySQL): Identify full-text heavy queries in the slow log

cr0x@server:~$ sudo awk '/MATCH\(|AGAINST\(/ {print}' /var/log/mysql/mysql-slow.log | head -n 5
# Query_time: 1.842  Lock_time: 0.000 Rows_sent: 20  Rows_examined: 250000
SELECT id,title FROM articles WHERE MATCH(title,body) AGAINST('status page' IN BOOLEAN MODE) AND tenant_id=42 ORDER BY updated_at DESC LIMIT 20;
# Query_time: 1.221  Lock_time: 0.000 Rows_sent: 0  Rows_examined: 180000
SELECT id FROM articles WHERE MATCH(body) AGAINST('to be' IN NATURAL LANGUAGE MODE) AND tenant_id=42 LIMIT 20;

Meaning: You have queries examining huge row counts relative to rows returned. Also note “to be” returns 0 (likely stopwords/min token size).

Decision: Add guardrails: minimum query length, stopword-aware UI messaging, and query rewrites that apply selective filters first.

Task 5 (MySQL): Check for replication lag caused by indexing churn

cr0x@server:~$ mysql -e "SHOW SLAVE STATUS\G" | egrep "Seconds_Behind_Master|Slave_IO_Running|Slave_SQL_Running"
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Seconds_Behind_Master: 143

Meaning: You’re 143 seconds behind. FULLTEXT-heavy updates can amplify this during bursts.

Decision: If search updates are causing lag, decouple indexing (async), batch updates off-peak, or move search out of the primary DB path.

Task 6 (PostgreSQL): Confirm FTS index type and size (bloat early warning)

cr0x@server:~$ psql -d appdb -c "\di+ public.*fts*"
                                           List of relations
 Schema |     Name      | Type  |  Owner   |   Table   | Persistence | Access method |  Size  | Description
--------+---------------+-------+----------+-----------+-------------+---------------+--------+-------------
 public | docs_fts_gin  | index | app_user | docs      | permanent   | gin           | 845 MB |
(1 row)

Meaning: GIN index is 845 MB. That may be fine—or it may be a bloat symptom, depending on table size and churn.

Decision: Compare index growth over time. If it grows faster than the table, investigate vacuum and update patterns.

Task 7 (PostgreSQL): Inspect autovacuum/vacuum health for the FTS table

cr0x@server:~$ psql -d appdb -c "SELECT relname,n_live_tup,n_dead_tup,last_autovacuum,last_vacuum FROM pg_stat_user_tables WHERE relname IN ('docs');"
 relname | n_live_tup | n_dead_tup |     last_autovacuum     |        last_vacuum
---------+------------+------------+-------------------------+-------------------------
 docs    |    2100342 |     482991 | 2025-12-28 09:42:11+00  |
(1 row)

Meaning: Nearly half a million dead tuples. That’s a vacuum backlog risk, and for GIN-backed FTS it can translate into worse query latency and larger indexes.

Decision: Tune autovacuum for this table (lower thresholds), and consider reducing update churn on the indexed text.

Task 8 (PostgreSQL): See whether your query is using GIN and how expensive it is

cr0x@server:~$ psql -d appdb -c "EXPLAIN (ANALYZE, BUFFERS) SELECT id FROM docs WHERE tenant_id=42 AND fts @@ websearch_to_tsquery('english','incident response') ORDER BY updated_at DESC LIMIT 20;"
                                                                  QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=2312.44..2312.49 rows=20 width=16) (actual time=118.230..118.244 rows=20 loops=1)
   Buffers: shared hit=18543 read=412
   ->  Sort  (cost=2312.44..2320.12 rows=3071 width=16) (actual time=118.228..118.237 rows=20 loops=1)
         Sort Key: updated_at DESC
         Sort Method: top-N heapsort  Memory: 34kB
         ->  Bitmap Heap Scan on docs  (cost=122.50..2231.64 rows=3071 width=16) (actual time=24.911..113.775 rows=5208 loops=1)
               Recheck Cond: (fts @@ websearch_to_tsquery('english'::regconfig, 'incident response'::text))
               Filter: (tenant_id = 42)
               Buffers: shared hit=18512 read=412
               ->  Bitmap Index Scan on docs_fts_gin  (cost=0.00..121.73 rows=16347 width=0) (actual time=23.021..23.022 rows=16221 loops=1)
                     Index Cond: (fts @@ websearch_to_tsquery('english'::regconfig, 'incident response'::text))
 Planning Time: 1.112 ms
 Execution Time: 118.410 ms
(13 rows)

Meaning: GIN is used, but tenant_id is applied as a row filter during the bitmap heap scan: you fetch every FTS match from the heap, then discard most of them. Buffers show substantial work.

Decision: Consider a composite strategy: add a partial index per tenant (if tenant count is small), or store tenant_id into the document identity and partition by tenant, or maintain a separate fts per tenant shard.
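
If a handful of tenants dominate search traffic, a partial index is the cheapest version of that idea; a sketch with a hypothetical per-tenant index name:

-- Only rows for tenant 42 enter this GIN index.
CREATE INDEX docs_fts_gin_t42 ON docs USING GIN (fts) WHERE tenant_id = 42;

-- The planner uses it only when the query's predicate implies the index's.
SELECT id FROM docs
WHERE tenant_id = 42
  AND fts @@ websearch_to_tsquery('english', 'incident response')
LIMIT 20;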

Task 9 (PostgreSQL): Confirm text search configuration and stemming behavior

cr0x@server:~$ psql -d appdb -c "SHOW default_text_search_config;"
 default_text_search_config
----------------------------
 pg_catalog.english
(1 row)

Meaning: Default config is English. If you’re indexing multilingual content, this is a relevance and correctness risk.

Decision: Choose configs per document language, or use simple for non-linguistic tokens. Don’t pretend one stemming config fits all.
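
One hedged pattern: store the configuration next to the document and use it on both write and read paths. This assumes fts is an application-maintained column (not a generated one) and a hypothetical lang column:

-- Each document carries its own text search configuration.
ALTER TABLE docs ADD COLUMN lang regconfig NOT NULL DEFAULT 'english';

-- Build the vector with the document's own config...
UPDATE docs SET fts = to_tsvector(lang, coalesce(title,'') || ' ' || coalesce(body,''));

-- ...and query with the same config the document was indexed under.
SELECT id FROM docs
WHERE lang = 'german'::regconfig
  AND fts @@ websearch_to_tsquery('german', 'fehlerbehandlung');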

Task 10 (PostgreSQL): Validate what tokens are actually indexed

cr0x@server:~$ psql -d appdb -c "SELECT to_tsvector('english','Running runs ran easily') AS v;"
                     v
-------------------------------------------
 'easili':4 'ran':3 'run':1,2
(1 row)

Meaning: “Running” and “runs” both normalize to the lexeme “run,” and “easily” becomes “easili,” while the irregular “ran” is left alone. This is why stemming can help or hurt.

Decision: If stemming breaks domain terms, switch configs or maintain separate fields with simple dictionaries for codes/names.

Task 11 (PostgreSQL): Catch “unsafe tsquery” construction before it becomes a page

cr0x@server:~$ psql -d appdb -c "SELECT to_tsquery('english','foo:bar');"
ERROR:  syntax error in tsquery: "foo:bar"

Meaning: User input can generate syntax errors if you use to_tsquery directly.

Decision: Use plainto_tsquery or websearch_to_tsquery for user input. Treat raw to_tsquery as an internal API.
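
For contrast, the same kind of hostile input through the user-input-safe constructors; punctuation becomes a separator, not syntax:

-- No error: plainto_tsquery ANDs the tokens together.
SELECT plainto_tsquery('english', 'foo:bar');
--  'foo' & 'bar'

-- websearch_to_tsquery additionally understands quotes, OR, and -exclusion,
-- and is documented to never raise a syntax error on user input.
SELECT websearch_to_tsquery('english', '"foo bar" or baz -qux');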

Task 12 (System): Identify whether search is CPU-bound or IO-bound on the database host

cr0x@server:~$ vmstat 1 5
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 3  0      0  81244  52120 912340    0    0   210   180  790 1460 45 12 40  3  0
 6  1      0  65010  52080 910120    0    0  4520  1100 1200 2400 32 10 35 23  0
 7  2      0  64122  51990 905332    0    0  6120  1450 1350 2600 28 11 33 28  0
 5  1      0  63200  51920 900110    0    0  5900  1500 1300 2500 29 12 34 25  0
 4  0      0  62910  51860 899820    0    0  4800  1200 1150 2300 31 11 37 21  0

Meaning: wa (IO wait) spikes to 20–28%. That suggests storage is a bottleneck, not just CPU. Full-text queries often turn into “read a lot of pages” workloads when selectivity is low.

Decision: If IO wait is high, prioritize cache hit rate, index selectivity, and storage performance. If CPU is pegged with low IO wait, focus on query plans, tokenization costs, and concurrency limits.

Task 13 (PostgreSQL): Find the top time-consuming statements (including search)

cr0x@server:~$ psql -d appdb -c "SELECT calls, total_exec_time::bigint AS ms, mean_exec_time::numeric(10,2) AS mean_ms, rows, query FROM pg_stat_statements ORDER BY total_exec_time DESC LIMIT 3;"
 calls |   ms   | mean_ms |  rows  |                              query
-------+--------+---------+--------+-------------------------------------------------------------------
  8123 | 984221 |  121.21 | 158214 | SELECT id FROM docs WHERE tenant_id=$1 AND fts @@ websearch_to_tsquery('english',$2) ORDER BY updated_at DESC LIMIT 20
  1102 | 312110 |  283.22 |   1102 | UPDATE docs SET body=$1, fts=to_tsvector('english',$1) WHERE id=$2
   989 | 221004 |  223.45 |    989 | SELECT count(*) FROM docs WHERE fts @@ plainto_tsquery('english',$1)
(3 rows)

Meaning: Your top consumers include both search reads and FTS update writes. That’s the classic “search feature taxes everything” profile.

Decision: Decide whether to (a) reduce update frequency for indexed fields, (b) move vector updates off the request path, or (c) move search to a separate system.

Task 14 (MySQL): Confirm whether your FULLTEXT query is returning low selectivity (too many matches)

cr0x@server:~$ mysql -e "SELECT COUNT(*) AS matches FROM articles WHERE MATCH(title, body) AGAINST('status' IN NATURAL LANGUAGE MODE);"
+---------+
| matches |
+---------+
| 182344  |
+---------+

Meaning: A very common term matches a huge percentage of documents. That’s low selectivity; it drives heavy work, sorting pain, and cache misses.

Decision: Add filters, require additional terms, tune stopwords, or switch UX to scoped search (e.g., within a project/tenant/date range).

Fast diagnosis playbook

When search gets slow, you don’t have time for a philosophical debate about engines. You need to find the bottleneck in minutes.

First: is it query plan, resource saturation, or data skew?

  • Check system saturation: CPU vs IO wait. If IO wait is high, you’re reading too much from disk or storage is struggling. If CPU is high with low IO wait, you’re doing expensive ranking/parsing or too much concurrency.
  • Check query plan: Are you using the FTS index? Are you sorting huge candidate sets? Are filters applied early or late?
  • Check selectivity: Common-term searches returning massive match sets? That’s not “performance,” that’s math.

Second: separate “search read” pain from “index write” pain

  • Read pain: slow selects, buffer misses, large bitmap scans, filesorts, temp tables, high rows examined.
  • Write pain: slow updates, replication lag, WAL/binlog spikes, vacuum/autovacuum behind, lock contention around maintenance.

Third: decide the containment strategy

  • Contain now: rate-limit search, add minimum query length, disable expensive sorting, cap results, return partial results.
  • Fix next: add or correct indexes, change query structure, tune vacuum/autovacuum, rewrite tokenization strategy.
  • Re-architect: if search is a top resource consumer, isolate it: read replicas for search, dedicated DB, or external search engine.

Common mistakes: symptom → root cause → fix

1) “Search returns nothing for short terms”

Symptom: Two-letter product codes or short names never match.

Root cause: MySQL innodb_ft_min_token_size too high, stopwords enabled; Postgres stemming/tokenization config unsuitable for codes.

Fix: For MySQL, adjust token size and rebuild indexes (plan downtime/maintenance). For Postgres, index codes separately with B-tree or use simple config on dedicated fields.
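
The MySQL half of that fix, sketched; it assumes the ft_title_body index from the tasks above and a maintenance window, since the variable is read-only at runtime:

-- 1. In my.cnf (requires a server restart):
--      innodb_ft_min_token_size = 2
-- 2. Then rebuild every FULLTEXT index so short tokens get indexed:
ALTER TABLE articles DROP INDEX ft_title_body;
ALTER TABLE articles ADD FULLTEXT INDEX ft_title_body (title, body);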

2) “Search got slow after we added ORDER BY updated_at”

Symptom: Query uses FTS index but latency spikes and CPU/IO climbs.

Root cause: Sorting large candidate sets; MySQL filesort/temp tables; Postgres sort after bitmap heap scan.

Fix: Change UX (sort by rank, not updated time), use two-step retrieval, or precompute “recent docs” per tenant and search within that window.

3) “Postgres FTS suddenly spiky, then stays bad”

Symptom: Periodic latency spikes; index size grows; vacuum seems behind.

Root cause: GIN index bloat + vacuum lag; heavy updates to indexed text fields.

Fix: Tune autovacuum per table; consider reducing churn, batching updates, or rebuilding indexes during maintenance. If churn is business-required, isolate search workload.

4) “Some user searches error out in Postgres”

Symptom: A subset of queries 500 with tsquery syntax errors.

Root cause: Using to_tsquery directly on user input.

Fix: Replace with websearch_to_tsquery (best UX) or plainto_tsquery. Log rejected tokens, don’t crash.

5) “MySQL search feels inconsistent between environments”

Symptom: Staging returns results; prod doesn’t, or scoring differs.

Root cause: Different stopword lists, token size, collation/charset, or engine/version differences.

Fix: Standardize MySQL config across environments; explicitly set collations/charsets; treat stopword list as configuration you test.

6) “Search slows down everything else in the database”

Symptom: When search traffic increases, unrelated endpoints get slower.

Root cause: Shared CPU/IO/cache; full-text scans and ranking compete with OLTP workload; queueing effects.

Fix: Concurrency limit search; move search reads to replicas; isolate to separate database or service when it becomes a top contributor.

7) “Results look dumb: common documents dominate”

Symptom: Long docs or spammy docs always rank first.

Root cause: Ranking not tuned; weights equal; normalization not aligned with your domain; low-quality text fields included.

Fix: Weight title higher than body; exclude boilerplate; store structured fields separately; in Postgres use weighted vectors and rank normalization; in MySQL consider boolean mode constraints or additional filters.

8) “Multi-tenant search is slow even with indexes”

Symptom: Tenant filter present but query still reads huge volumes.

Root cause: FTS index doesn’t incorporate tenant_id, so you get matches across all tenants then filter.

Fix: Partition by tenant, use separate tables per tenant (if feasible), or maintain per-tenant indexes/partial indexes; at minimum, redesign to search within a tenant-scoped subset first.

Checklists / step-by-step plan

Step-by-step: shipping built-in FTS without hating your future self

  1. Write down “search contract”: supported operators, minimum query length, supported languages, ranking rules, and what “phrase” means. If you can’t describe it, users will discover it the hard way.
  2. Choose scope boundaries: tenant-only search, date windows, document types. Add filters that reduce candidate sets early.
  3. Decide tokenization and language configs (Postgres) or stopwords/token sizes (MySQL). Make it explicit in config management.
  4. Index only what matters: don’t shove every text blob into the search vector/index. Treat indexed text as a cost center.
  5. Define “safe query building”: never parse user input as syntax without escaping/translation. Postgres: prefer websearch_to_tsquery. MySQL: sanitize boolean operators if you expose them.
  6. Test with adversarial queries: common words, empty results, punctuation, Unicode, extremely long strings, “copy/paste from Word.”
  7. Install guardrails: request limits, timeouts, circuit breakers, and “sorry, narrow your search” responses.
  8. Monitor one canary query and track P95, buffer reads, and replication lag. Search failures often show up as gradual degradation.
  9. Plan your exit: document how you’d mirror documents to a dedicated search system. Even if you never use it, the design discipline helps.

Checklist: deciding between MySQL and Postgres built-in FTS

  • If you need flexibility and tunable relevance: Postgres wins. You can weight fields, choose dictionaries, and build more structured search behavior.
  • If you want minimal moving parts and accept simpler scoring: MySQL FULLTEXT can be fine, especially for straightforward keyword search with constraints.
  • If you have high write churn on indexed text: be cautious with both; Postgres GIN maintenance and MySQL auxiliary index updates both bite. Consider async indexing patterns.
  • If multi-language matters: Postgres is typically easier to configure correctly—provided you actually do it.
  • If your query needs “web search” semantics: Postgres websearch_to_tsquery is a practical advantage for user-entered queries.

Checklist: signs you should move to a dedicated search engine

  • Search is among top 3 consumers of DB CPU or IO.
  • You need fuzzy matching, synonyms, autocomplete, highlighting, per-field analyzers, or hybrid lexical+vector ranking.
  • You’re adding per-tenant relevance rules and your schema is starting to look like a search engine’s config file.
  • Incidents repeatedly involve “search traffic made the database sad.”
  • Your org can operate another stateful system without treating it like a pet.

FAQ

1) Is MySQL FULLTEXT “bad”?

No. It’s just opinionated and limited. It’s great for simple keyword search with constraints. It becomes painful when you need sophisticated relevance or when search load competes with OLTP load.

2) Is PostgreSQL full-text search a replacement for Elasticsearch/OpenSearch?

Sometimes, for internal or moderate search features. If you need typo tolerance, rich analyzers, fast faceting at scale, or advanced ranking pipelines, you’ll miss a dedicated search engine quickly.

3) Why do my Postgres searches behave oddly with punctuation and special characters?

Because tsquery has syntax. If you build tsquery directly from user input, punctuation can become operators or syntax errors. Use websearch_to_tsquery or plainto_tsquery.

4) Why is “ORDER BY updated_at” so expensive with FTS?

Because FTS returns a set of matching documents; ordering by a different attribute often requires sorting a large candidate set. If the match set is big, you sort a lot. Consider ordering by rank, prefiltering, or two-step retrieval.

5) How do I make ranking better in Postgres?

Split fields (title, body, tags), assign weights, and use weighted vectors. Also consider removing boilerplate text from the indexed field. Ranking is as much data hygiene as it is math.

6) What’s the best way to handle multi-tenant full-text search?

Scope search tightly to tenant first. For Postgres, consider partitioning by tenant or partial indexes when tenant counts are manageable. For MySQL, ensure tenant filters are selective and avoid global searches that then filter.

7) Should I store tsvector as a generated column?

Generated columns are convenient, but they put indexing cost on the write path. If you have heavy churn, consider async updates or updating vectors in batches, depending on your consistency needs.
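
For reference, a minimal sketch of the generated-column variant (PostgreSQL 12+); the explicit 'english' argument matters because generated columns require an immutable expression:

ALTER TABLE docs
  ADD COLUMN fts tsvector GENERATED ALWAYS AS (
    setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
    setweight(to_tsvector('english', coalesce(body,  '')), 'D')
  ) STORED;

CREATE INDEX docs_fts_gin ON docs USING GIN (fts);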

8) How do I know if the problem is the index or the query?

Look at the plan and buffers/rows examined. If the FTS index is used but you still read a lot, you likely have low selectivity or expensive ordering/filtering. If the index isn’t used, it’s query construction or planner stats.

9) Can I do phrase search with built-in FTS?

Postgres supports phrase search via phraseto_tsquery and related features, but it’s not identical to “exact substring match.” MySQL boolean mode has operators but phrase behavior is not equivalent to search engines with positional indexing tuned for highlighting.
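
A quick illustration in psql; the stemmed lexemes shown assume the english config:

-- A phrase compiles to a positional operator between lexemes,
-- not a substring test against the raw text.
SELECT phraseto_tsquery('english', 'incident response');
--  'incid' <-> 'respons'

-- websearch_to_tsquery treats double-quoted input the same way.
SELECT websearch_to_tsquery('english', '"incident response"');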

10) What’s the single biggest operational risk with built-in full-text search?

Search load is spiky and user-driven. It turns your primary database into a shared compute pool for unpredictable queries, and that’s how non-search endpoints get collateral damage.

Practical next steps

If you’re already running built-in full-text search, your job is to keep it from quietly becoming the top consumer of your database.

  1. Add a search canary (one query, one dashboard, one alert). Track latency, rows examined, and buffer reads over time.
  2. Implement guardrails: minimum query length, rate limits, timeouts, and a sane maximum result window.
  3. Run the diagnostic tasks above in a quiet window and record baselines: index sizes, vacuum stats, slow queries, plan shapes.
  4. Decide your containment strategy: replicas for search reads, or isolating search to a separate service when it becomes a major resource consumer.
  5. Make relevance a product decision, not an accidental outcome of defaults. If you need more than weights and tokenization, stop negotiating with your database and budget for a dedicated search engine.

Built-in full-text search can be a sharp tool. Just don’t use it as a hammer on a problem that’s clearly a nail gun.
