MySQL vs Elasticsearch for Ecommerce Search: Why SQL Collapses Under Filters

Your search page is fast until customers use it like customers. “Shoes” works. Then someone clicks
size 10, black, in stock, brand X, under $120, sorts by
“best selling”, and your database starts making the kind of life choices that end in paging and tears.

This is the moment teams realize: “We built search on SQL.” Not because it was the right tool, but because
MySQL was already there and everyone was afraid of adding another system. The bill arrives later, with interest,
and it’s paid in latency, lock contention, and dead-on-arrival feature requests like “can we show counts for each filter option?”

Why SQL search collapses under filters

Ecommerce “search” isn’t one query. It’s a conversation between a human and your catalog:
type something, constrain it, constrain it again, sort, paginate, and demand that the UI update
in under a few hundred milliseconds while also showing facet counts. Each click adds constraints,
and each constraint multiplies the ways your database can get it wrong.

The failure mode: combinatorial pain, not one slow query

With MySQL, the typical pattern is:

  • One query to fetch results (often with joins for price, inventory, category, attributes).
  • Several queries to compute facet counts (brand counts, size counts, color counts, etc.).
  • Another query for “did you mean” or synonyms or relevance scoring hacks.

If you do all of that live against normalized tables, you’re asking a transactional database
to behave like a search engine. It will try. It will also set your CPUs on fire.

What “filters” do to SQL: they destroy selectivity assumptions

B-tree indexes (MySQL’s default for InnoDB) are great at narrowing down a dataset when
predicates are selective and align with an index prefix. Filters in faceted search don’t politely align.
They look like this (rolled into a single query in the sketch after this list):

  • WHERE category IN (...) (multiple values, often non-selective at the top of the tree)
  • AND brand_id IN (...) (multi-select, user-controlled)
  • AND price BETWEEN ... (range predicates kill the rest of composite indexes)
  • AND (title LIKE '%term%' OR description LIKE '%term%') (leading wildcard = index is decorative)
  • AND attr_id=... AND attr_value IN (...) repeated N times (EAV join explosion)
  • ORDER BY popularity DESC with deep pagination (filesort + scanning)
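
Rolled together, one storefront request ends up looking something like this. A minimal sketch against a hypothetical schema with a products table and an EAV-style product_attr table (table names, attribute IDs, and values are illustrative, not from any specific codebase):

SELECT p.id, p.title, p.price
FROM products p
JOIN product_attr size_attr  ON size_attr.product_id = p.id
                            AND size_attr.attr_id = 3            -- "size", hypothetical attribute id
                            AND size_attr.value   = '10'
JOIN product_attr color_attr ON color_attr.product_id = p.id
                            AND color_attr.attr_id = 7           -- "color", hypothetical attribute id
                            AND color_attr.value IN ('black', 'navy')
WHERE p.category_id IN (12, 13)
  AND p.brand_id IN (4, 9, 21)
  AND p.price BETWEEN 0 AND 120
  AND p.in_stock = 1
  AND (p.title LIKE '%shoes%' OR p.description LIKE '%shoes%')   -- leading wildcard: the index is decorative
ORDER BY p.popularity DESC
LIMIT 20 OFFSET 2000;

Every clause is individually reasonable; together there is no single index that covers the whole thing, and the optimizer has to pick its poison.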

MySQL’s optimizer has to choose a plan. For complex predicates, it guesses, based on statistics
that may be stale, coarse, or wrong for skewed data (which product catalogs absolutely are).
When it guesses wrong, it doesn’t get “10% slower.” It gets “timeouts and a meeting.”

The “facet counts” trap: you’re running analytics on OLTP

Facet counts are basically group-by aggregations over a filtered set. That’s analytical work.
In SQL, you implement it with GROUP BY across joined tables, and you either:

  • Run a big aggregation per facet (N queries per request), or
  • Try to do a mega-query with conditional aggregates, then cry when it becomes unmaintainable.

The tragedy is that each facet query repeats most of the work: apply the same filters, scan similar rows,
join the same tables, and then count. You can cache, but filters are user-defined, and the combinatorics
of caching explode unless your traffic is tiny or your catalog is tiny.
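
For concreteness, here is roughly what the two options look like, assuming (generously) that the hot attributes are already flattened into columns on products; with EAV you would stack the joins from the previous section on top of each of these:

-- Option 1: one aggregation per facet, the same WHERE clause repeated N times
SELECT brand_id, COUNT(*) AS cnt
FROM products
WHERE category_id IN (12, 13) AND price BETWEEN 0 AND 120 AND in_stock = 1
GROUP BY brand_id;

SELECT color, COUNT(*) AS cnt
FROM products
WHERE category_id IN (12, 13) AND price BETWEEN 0 AND 120 AND in_stock = 1
GROUP BY color;

-- Option 2: one mega-query with conditional aggregates (MySQL evaluates the comparisons as 0/1)
SELECT
  SUM(brand_id = 4)             AS brand_4,
  SUM(brand_id = 9)             AS brand_9,
  SUM(color = 'black')          AS color_black,
  SUM(price < 50)               AS under_50,
  SUM(price BETWEEN 50 AND 120) AS mid_price
FROM products
WHERE category_id IN (12, 13) AND in_stock = 1;

Option 1 multiplies round trips; option 2 multiplies columns and still scans the same rows on every request.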

One dry operational truth: MySQL is predictable; user filters are not

MySQL shines when you can predict access patterns and index for them. Faceted search is adversarial:
users are effectively generating ad-hoc queries with weird combinations. MySQL can handle some of it,
but the performance envelope collapses quickly once you add:

  • multiple joins for attributes
  • range filters + sorting
  • deep paging
  • facet aggregations
  • multi-language stemming/synonyms/relevance tuning

Joke #1: Running faceted search on a fully normalized schema is like towing a boat with a bicycle—possible,
but everyone watching you is nervous.

How MySQL actually executes your “simple” search

If you want to understand why SQL search collapses, you need to internalize what MySQL is optimized for:
point lookups and narrow index range scans with transactional consistency, on row-stores, with B-tree indexes.
Search engines are built around inverted indexes, columnar-ish doc values, cached filter bitsets, and a scoring model.
Different physics.

B-tree indexes: excellent at “find rows like this,” mediocre at “compute relevance”

A B-tree index excels when the predicate reduces the candidate set quickly. It struggles when:

  • You need to match tokens inside text without pre-tokenizing (LIKE '%term%').
  • You need ranking by relevance (requires scoring, term frequencies, field boosts).
  • You need OR-heavy logic that broadens the scan (brand IN (...) OR category IN (...)).
  • You need aggregation counts across many dimensions (facets).

MySQL FULLTEXT helps, but it doesn’t solve faceting

MySQL’s FULLTEXT indexes can provide token-based matching for certain workloads,
and in InnoDB they’re real and useful. But ecommerce search wants more:

  • Prefix, fuzziness, typo tolerance
  • Synonyms and “did you mean”
  • Relevance tuning by fields (title vs description)
  • Structured filters that must be fast
  • Facet counts and “show me how many items match if I click this”

Even if FULLTEXT gets the textual match right, the join-and-filter work that follows can still be your bottleneck.
You end up with a two-stage query: text match yields IDs, then you join filters, then you sort. Each stage has its own
performance cliffs.

EAV schemas: the silent killer

Attribute systems are often implemented as EAV (entity-attribute-value): a product has many attributes,
each stored as rows. This is flexible and terrible for search filters. A query like “size=10 and color=black”
becomes multiple self-joins or subqueries against the attribute table. On a big catalog, that’s a join explosion
with low selectivity predicates. The optimizer can’t always reorder joins profitably, and even when it can, the
intermediate result sets get large fast.

Sorting and pagination: “ORDER BY popularity” is not free

Sorting is a second bottleneck that shows up after you “fixed” filtering. If your query selects many candidates
and then sorts by something not covered by an index, MySQL may do a filesort. That isn’t literally “sort on disk”
every time, but it can spill, and it burns CPU and memory.

Deep pagination (LIMIT 20 OFFSET 2000) is particularly nasty: MySQL still has to find and discard the first
2000 rows after applying filters and ordering. That’s wasted work per request, and it scales linearly with offset.
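
The standard escape hatch is keyset (“seek”) pagination: remember the sort key and id of the last row returned and filter past it instead of counting an offset. A minimal sketch, assuming popularity plus id gives a deterministic order and the application passes the last-seen values back as bound parameters:

-- First page
SELECT id, title, popularity
FROM products
WHERE category_id IN (12, 13)
ORDER BY popularity DESC, id DESC
LIMIT 20;

-- Next page: :last_popularity and :last_id are bound by the application
-- from the final row of the previous page
SELECT id, title, popularity
FROM products
WHERE category_id IN (12, 13)
  AND (popularity < :last_popularity
       OR (popularity = :last_popularity AND id < :last_id))
ORDER BY popularity DESC, id DESC
LIMIT 20;

It cannot jump straight to page 87, but almost nobody legitimately needs page 87.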

Query plan stability: production is where “sometimes” becomes “always”

The biggest operational headache is not that one query is slow. It’s that the same query shape becomes slow depending on:

  • which category is selected
  • which brands are selected
  • time of day (buffer pool warm vs cold)
  • data skew (one brand has 40% of products)
  • freshness (new products not in stats)

You see p95 latency jump, then you blame the network, then you blame the application, then you discover one user
clicked an “All colors” filter that’s implemented as IN (...hundreds of ids...). It’s always the innocuous UX feature.

One line that’s been painfully reliable in operations, repeated so often in SRE circles it’s practically a proverb: “Hope is not a strategy.”

How Elasticsearch makes filters cheap (and why it’s still not magic)

Elasticsearch is built on Lucene, which is built for search. That sounds obvious and tautological,
but it has practical consequences: the index structure is an inverted index for text plus columnar-style
doc values for aggregations and sorting. Filters become set operations on bitsets, not join puzzles.

Inverted index: the core advantage

In an inverted index, each term points to a posting list of document IDs that contain it. Intersect posting lists,
apply filters as precomputed sets, score, and you’re done. The key is that the work scales with the number of matches,
not the number of rows scanned across joined tables.

Filters are cacheable and composable

In Elasticsearch, a filter (say brand:Acme) often becomes a bitset: a bitmap of documents that match.
Combine filters via AND/OR quickly. Many filters can be cached effectively because they’re reused across requests,
especially common facets like “in stock” or “free shipping.”

Facet counts (aggregations) are first-class citizens. Elasticsearch can compute counts by brand, size, color, etc.,
using doc values and segment-level optimizations. You’re not writing N different GROUP BY queries and praying
that the optimizer doesn’t interpret your schema as a personal challenge.
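
The Elasticsearch equivalent is a single request: the filtered query plus one aggregation per facet, all evaluated in the same pass. A minimal sketch against a hypothetical products index (field names are assumptions chosen to line up with the mapping examples later in this article):

curl -s -H 'Content-Type: application/json' http://localhost:9200/products/_search -d '{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {"terms": {"category_id": [12, 13]}},
        {"range": {"price": {"lte": 120}}},
        {"term": {"in_stock": true}}
      ]
    }
  },
  "aggs": {
    "brands": {"terms": {"field": "brand_id"}},
    "colors": {"terms": {"field": "color"}},
    "price_bands": {"range": {"field": "price", "ranges": [{"to": 50}, {"from": 50, "to": 120}]}}
  }
}'

One round trip, every facet count computed from the same filtered set; size is 0 here because only the counts are needed, and a real page would request hits in the same call.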

Denormalization is the price of performance

Elasticsearch performs best when each product is a single document containing the fields you filter and sort on.
That means denormalizing: include brand name, category path, attributes, availability state, maybe even computed rank.
OLTP engineers often resist denormalization on principle. Search makes you do it anyway, just in a separate system.
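
Concretely, the indexed document looks less like your relational schema and more like what the search page needs. A sketch of one denormalized product document (fields and values are illustrative):

{
  "id": 812345,
  "title": "Trail runner pro",
  "brand": "Acme",
  "brand_id": 12,
  "category_path": ["shoes", "running", "trail"],
  "color": "black",
  "sizes": ["9", "10", "11"],
  "price": 119.00,
  "in_stock": true,
  "popularity": 8741,
  "indexed_at": "2025-12-30T11:58:00Z"
}

Everything you filter, facet, or sort on lives inside the document; the joins happened at index time, not query time.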

Operational reality check: Elasticsearch buys you performance, not simplicity

Elasticsearch is not free:

  • Indexing pipelines can fail. Backfills can take hours.
  • Mapping mistakes are forever (or at least until reindex).
  • Cluster sizing is real engineering, not vibes.
  • Consistency is eventual by default; you must design around it.
  • You now run two systems: MySQL for truth, Elasticsearch for search.

But if your product needs fast faceting and relevance, Elasticsearch is the right kind of complexity.
It matches the shape of the problem.

Joke #2: Elasticsearch is the friend who helps you move fast—then asks you to help them move a cluster next weekend.

Interesting facts and historical context (the stuff that explains today’s tradeoffs)

  1. Lucene predates Elasticsearch by about a decade. Doug Cutting started Lucene in 1999; Elasticsearch arrived in 2010 as a distributed layer with an HTTP/JSON API on top of it.
  2. MySQL’s InnoDB became the default engine in MySQL 5.5. That shift mattered: transactions, crash recovery, and buffer pool behavior shape what “fast” looks like.
  3. FULLTEXT in InnoDB only arrived in MySQL 5.6, years after MyISAM had it. The feature exists, but the ecosystem of analyzers and scoring in Lucene is far deeper.
  4. Early ecommerce “search” was often category browsing plus LIKE. That legacy still haunts schemas: EAV tables, keyword tables, and hand-rolled relevance columns.
  5. Faceted navigation became mainstream with large catalogs in the 2000s. Once “filter by size/color/brand” became table stakes, OLTP databases started getting abused.
  6. Elasticsearch popularized “schema on write” for search. Your mapping decisions at index time strongly affect query-time performance and correctness.
  7. Columnar doc values were a turning point for aggregations. Without doc values, “counts by facet” at scale would be far more expensive.
  8. Distributed search clusters made relevance a production concern. You don’t just query; you tune analyzers, synonyms, stemming, and scoring across shards.

Three corporate mini-stories from the trenches

1) Incident caused by a wrong assumption: “It’s indexed, so it’s fast”

A mid-market retailer had a MySQL-backed search endpoint that “worked fine” in staging. Production was another story.
The endpoint joined products to inventory, prices, categories, and attributes, then sorted by popularity. They had
added indexes everywhere. Someone even had a spreadsheet.

Traffic spiked during a seasonal campaign, and p95 search latency went from “acceptable” to “you can refresh the page
and still be waiting.” The database CPU was high but not pinned. The real pain was I/O and buffer pool churn:
the query touched too many pages, too unpredictably.

The wrong assumption was subtle: they believed that because each table had an index on the joined keys, the join
would remain cheap under filters. But the filters were on attributes stored in EAV rows, so the join order
changed with different filters. The optimizer picked a plan that looked good for common filters and catastrophic
for a specific category with huge attribute diversity.

The incident response wasn’t glamorous. They temporarily disabled multi-select facets for that category and forced
a narrower subset of filters. Latency recovered immediately. The root fix was to move faceted filtering and ranking
into Elasticsearch, with MySQL serving only as the catalog of record.

The postmortem note that mattered: “Index presence is not index usefulness.” If a predicate isn’t selective, your
index is a suggestion, not a solution.

2) Optimization that backfired: caching the SQL results

Another company attempted the classic move: “Let’s cache search responses in Redis.” They hashed the request parameters
and cached the full JSON response for 10 minutes. On paper, it reduced load. In practice, it created new failure modes.

The catalog changed frequently: prices updated, inventory fluctuated, promotions started and stopped. The cache served
stale results that violated business rules (out-of-stock items shown as available, discounted prices missing). Customer
support noticed. Marketing noticed. The CFO noticed.

They tightened the TTL. Load returned. They added cache invalidation hooks from the catalog pipeline. It became a complex,
brittle event-driven system, and they still couldn’t invalidate all combinations of filters. The combinatorics beat them:
you can’t efficiently invalidate “all queries that include brand X and any color and price between A and B.”

The final fix was boring: stop caching full responses and instead cache stable sub-results (like filter bitsets) in the
search engine, where the primitives are designed for it. They introduced Elasticsearch for query-time and used MySQL for
writes and truth. Response caching remained, but only for top-level category pages with predictable parameters.

3) Boring but correct practice that saved the day: dual-write plus replayable indexing

A larger org did something unfashionable: they treated search indexing as a first-class production pipeline.
Product updates flowed into a durable queue. A consumer updated Elasticsearch. Another consumer updated MySQL.
And every event was replayable from storage.

One afternoon, a mapping bug slipped in: a numeric field was indexed as text. Facet counts went weird, sorting broke,
and someone tried to hot-fix it by changing the mapping. Elasticsearch politely refused because mappings aren’t that flexible.
The team did not panic. That’s the whole point of boring process.

They created a new index with the correct mapping, replayed events from the durable log, validated counts against MySQL,
and flipped an alias. Downtime was avoided; the “blast radius” was limited to temporary relevance oddities during reindex.

Nothing about this was clever. It was just disciplined: immutable event log, idempotent indexing, alias-based cutovers,
and dashboards that compare search index docs to source-of-truth counts. When your search is a product feature,
that’s what adulthood looks like.

Fast diagnosis playbook

When search gets slow, your job is not to argue about which system is “better.” Your job is to find the bottleneck in minutes,
not days, and make a safe decision under pressure.

First: classify the slowness (app vs database vs search cluster)

  • Check p95/p99 at the edge: is the whole endpoint slow or only specific query shapes?
  • Compare “no filters” vs “many filters”: if filters cause a superlinear jump, it’s plan/aggregation work.
  • Check error budgets: timeouts, 5xx, circuit breaker trips.

Second: isolate query-time work from data freshness work

  • MySQL: is it slow because of scans/sorts, or because you’re blocked (locks)?
  • Elasticsearch: is it slow because of heavy aggregations, or because the cluster is busy merging segments / GC / disk-bound?

Third: look for the classic cliffs

  • MySQL cliffs: filesort, temporary tables, “rows examined” explosion, EAV joins, OFFSET deep paging, stale stats.
  • Elasticsearch cliffs: high cardinality aggregations, too many shards, fielddata blowups, slow disks, heap pressure, refresh/merge contention.

Fourth: take the lowest-risk mitigation

  • Disable the single worst facet or sort option temporarily.
  • Cap OFFSET (or require “search after”).
  • Reduce aggregation size.
  • Fall back to keyword-only without counts.

Fifth: decide architecture, not heroics

If your product requires rich filtering + facet counts + relevance, move the query workload to a search engine.
If your product is “admin search” with a few filters, MySQL is fine—just do it intentionally.

Practical tasks with commands, outputs, and decisions

These are the tasks I run in production when someone says “search is slow.” Each task includes:
a runnable command, a realistic sample output, what it means, and the decision you make.

Task 1: Identify top slow MySQL statements (by total time)

cr0x@server:~$ sudo pt-query-digest /var/log/mysql/mysql-slow.log | head -n 25
# 120s user time, 2s system time, 33.12M rss, 101.02M vsz
# Current date: Tue Dec 30 12:10:07 2025
# Hostname: db1
# Files: /var/log/mysql/mysql-slow.log
# Overall: 1.2k total, 37 unique, 0 QPS, 0x concurrency ____________
# Time range: 2025-12-30T10:00:00 to 2025-12-30T12:00:00
# Attribute          total     min     max     avg     95%  stddev  median
# ============     ======= ======= ======= ======= ======= ======= =======
# Exec time          3500s   150ms     12s      3s      9s     2s   1.5s
# Rows examine        3.1G       0   12.0M   2.6M   8.9M   2.9M   2.1M
# Query 1: 45% 0.12 QPS, 0.40x concurrency, ID 0xA1B2C3D4 at byte 12345
# Scores: V/M = 0.98
# Time range: 2025-12-30T10:10:00 to 2025-12-30T11:55:00
# Attribute    pct   total     min     max     avg     95%  stddev  median
# ============ === ======= ======= ======= ======= ======= ======= =======
# Count         22     270
# Exec time     45   1580s   800ms     12s      6s     10s     2s      5s
# Rows examine  62   1.9G   400k   12.0M   7.1M  11.2M   2.4M   6.8M

What it means: your “search query” isn’t one unlucky slow statement; it scans millions of rows per request and dominates total execution time.

Decision: Pull that query shape, run EXPLAIN ANALYZE, and either redesign indexes/schema or move the workload out of MySQL.

Task 2: Find currently running MySQL queries and their time

cr0x@server:~$ mysql -e "SHOW FULL PROCESSLIST\G" | sed -n '1,35p'
*************************** 1. row ***************************
     Id: 91234
   User: app
   Host: 10.0.2.15:53912
     db: catalog
Command: Query
   Time: 18
  State: Sending data
   Info: SELECT p.id, p.title FROM products p
         JOIN product_attr pa ON pa.product_id=p.id
         WHERE p.category_id IN (12,13) AND pa.attr_id=7 AND pa.value IN ('black','navy')
         ORDER BY p.popularity DESC LIMIT 20 OFFSET 2000

What it means: “Sending data” for 18 seconds often means scan + join + sort, not network I/O.

Decision: Mitigate deep pagination and examine whether the join predicate is selective enough to use indexes.

Task 3: Inspect query plan with runtime metrics (MySQL 8)

cr0x@server:~$ mysql -e "EXPLAIN ANALYZE SELECT p.id FROM products p JOIN product_attr pa ON pa.product_id=p.id WHERE p.category_id IN (12,13) AND pa.attr_id=7 AND pa.value IN ('black','navy') ORDER BY p.popularity DESC LIMIT 20 OFFSET 2000;"
+----------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                          |
+----------------------------------------------------------------------------------------------------------------------------------+
| -> Limit/Offset: 20/2000  (cost=84512.23 rows=2020) (actual time=15123.412..15123.900 rows=20 loops=1)
    -> Sort: p.popularity DESC  (cost=84512.23 rows=40123) (actual time=15123.410..15123.650 rows=2020 loops=1)
        -> Nested loop inner join  (cost=40211.11 rows=40123) (actual time=112.310..15012.000 rows=420000 loops=1)
            -> Index range scan on p using idx_category (category_id in (12,13))  (cost=1123.00 rows=70000) (actual time=2.100..50.000 rows=68000 loops=1)
            -> Index lookup on pa using idx_attr (attr_id=7, value in ('black','navy')) (cost=0.50 rows=6) (actual time=0.200..0.210 rows=6 loops=68000) |
+----------------------------------------------------------------------------------------------------------------------------------+

What it means: 68k outer rows drive 68k repeated index lookups; the intermediate result balloons to ~420k rows; sorting all of them just to skip 2,000 and return 20 is wasted work.

Decision: Avoid OFFSET paging, precompute/denormalize attributes for common filters, or move filtering/sorting to Elasticsearch.

Task 4: Verify index definitions and prefix usefulness

cr0x@server:~$ mysql -e "SHOW INDEX FROM product_attr\G" | sed -n '1,35p'
*************************** 1. row ***************************
        Table: product_attr
   Non_unique: 1
     Key_name: idx_attr
 Seq_in_index: 1
  Column_name: attr_id
    Collation: A
  Cardinality: 32
*************************** 2. row ***************************
        Table: product_attr
   Non_unique: 1
     Key_name: idx_attr
 Seq_in_index: 2
  Column_name: value
    Collation: A
  Cardinality: 2100
*************************** 3. row ***************************
        Table: product_attr
   Non_unique: 1
     Key_name: idx_attr
 Seq_in_index: 3
  Column_name: product_id
    Collation: A
  Cardinality: 800000

What it means: Low cardinality for attr_id suggests many rows per attribute; the index helps less than you think under broad categories.

Decision: Consider composite indexes that match your most common query patterns, or remodel attributes (flatten hot attributes into columns or a denormalized table).

Task 5: Check if MySQL is spilling to disk due to temp tables

cr0x@server:~$ mysql -e "SHOW GLOBAL STATUS LIKE 'Created_tmp%';"
+-------------------------+----------+
| Variable_name           | Value    |
+-------------------------+----------+
| Created_tmp_disk_tables | 184203   |
| Created_tmp_files       | 9132     |
| Created_tmp_tables      | 512004   |
+-------------------------+----------+

What it means: A high ratio of disk temp tables indicates sorts/group-bys that don’t fit memory or can’t use indexes.

Decision: Reduce GROUP BY complexity, ensure indexes support grouping, or stop doing facet counts in MySQL.

Task 6: Confirm buffer pool pressure (InnoDB)

cr0x@server:~$ mysql -e "SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%';"
+---------------------------------------+-----------+
| Variable_name                         | Value     |
+---------------------------------------+-----------+
| Innodb_buffer_pool_read_requests      | 982341223 |
| Innodb_buffer_pool_reads              | 18233411  |
+---------------------------------------+-----------+

What it means: Buffer pool reads (physical) are significant. Search queries are churning pages and missing cache.

Decision: Either provision more memory (temporary relief) or reduce working set by moving search off MySQL / denormalizing / caching differently.

Task 7: Check MySQL for lock waits impacting search

cr0x@server:~$ mysql -e "SELECT * FROM performance_schema.data_lock_waits\G" | sed -n '1,40p'
*************************** 1. row ***************************
REQUESTING_ENGINE_LOCK_ID: 139941888:106:4:5:12345
REQUESTING_ENGINE_TRANSACTION_ID: 812345
BLOCKING_ENGINE_LOCK_ID: 139941888:106:4:5:12345
BLOCKING_ENGINE_TRANSACTION_ID: 812100
REQUESTING_THREAD_ID: 2221
BLOCKING_THREAD_ID: 2199

What it means: Search latency may be from lock contention, often due to writes or long transactions that block reads (or metadata locks).

Decision: Fix the write path, shorten transactions, and separate read replicas for search if you must keep SQL search.

Task 8: Measure Elasticsearch search latency and shard health

cr0x@server:~$ curl -s http://localhost:9200/_cat/health?v
epoch      timestamp cluster status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1767096300 12:05:00  es-prod yellow          6         6   420 210    0    0       12             0                  -                 97.1%

What it means: Yellow with unassigned shards can add latency and risk, especially if replicas are missing and nodes are under pressure.

Decision: Restore replica allocation capacity, fix disk watermarks, or reduce shard count before you chase query tuning.

Task 9: Identify slow Elasticsearch queries via search profile

cr0x@server:~$ curl -s -H 'Content-Type: application/json' http://localhost:9200/products/_search -d '{
  "profile": true,
  "size": 20,
  "query": {
    "bool": {
      "must": [{"match": {"title": "shoes"}}],
      "filter": [
        {"terms": {"brand_id": [12, 15, 18]}},
        {"term": {"in_stock": true}},
        {"range": {"price": {"gte": 50, "lte": 120}}}
      ]
    }
  },
  "aggs": {"brands": {"terms": {"field": "brand_id"}}}
}' | sed -n '1,35p'
{
  "took" : 38,
  "timed_out" : false,
  "_shards" : { "total" : 12, "successful" : 12, "skipped" : 0, "failed" : 0 },
  "hits" : { "total" : { "value" : 18432, "relation" : "eq" }, "hits" : [ ... ] },
  "profile" : { "shards" : [ { "searches" : [ { "query" : [ { "type" : "BooleanQuery", "time_in_nanos" : 8200000 } ] } ] } ] }
}

What it means: took: 38ms is fine; profiling shows where time goes (query vs aggregation vs fetch).

Decision: If aggregations dominate, reduce cardinality, use filter aggs, or precompute some facets.

Task 10: Detect too many shards (an easy Elasticsearch self-own)

cr0x@server:~$ curl -s http://localhost:9200/_cat/indices/products?v
health status index    uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   products Zk8p3bYfQkOaK9mD1n2x3Q  24   1   820000        12000     96gb        48gb

What it means: 24 primary shards for <1M docs is often excessive. Oversharding increases overhead and latency.

Decision: Reindex into fewer shards, or use rollover/ILM patterns if you truly need time-based indices (most product catalogs don’t).

Task 11: Check Elasticsearch JVM heap pressure indicators

cr0x@server:~$ curl -s http://localhost:9200/_nodes/stats/jvm?filter_path=nodes.*.jvm.mem.heap_used_percent,nodes.*.name | sed -n '1,25p'
{
  "nodes" : {
    "9a1b" : { "name" : "es-data-1", "jvm" : { "mem" : { "heap_used_percent" : 78 } } },
    "9a1c" : { "name" : "es-data-2", "jvm" : { "mem" : { "heap_used_percent" : 83 } } }
  }
}

What it means: High sustained heap usage correlates with GC pauses and latency spikes, especially with heavy aggregations.

Decision: Reduce aggregation load, fix mappings to avoid fielddata, scale out data nodes, or tune heap sizing carefully.

Task 12: Validate mappings for filter fields (keyword vs text)

cr0x@server:~$ curl -s http://localhost:9200/products/_mapping | sed -n '1,45p'
{
  "products" : {
    "mappings" : {
      "properties" : {
        "brand" : { "type" : "text" },
        "brand_id" : { "type" : "integer" },
        "color" : { "type" : "keyword" },
        "title" : { "type" : "text" },
        "in_stock" : { "type" : "boolean" },
        "price" : { "type" : "scaled_float", "scaling_factor" : 100 }
      }
    }
  }
}

What it means: Filtering/aggregating on brand as text is a trap; you want keyword (or use brand.keyword subfield).

Decision: Fix mapping in a new index and reindex. Don’t “work around” it with runtime scripts unless you enjoy latency.
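
A fix like this is usually two calls: create a corrected index, then copy documents into it with the _reindex API. A minimal sketch (the products_v2 name and the trimmed-down mapping are assumptions for illustration):

curl -s -X PUT -H 'Content-Type: application/json' http://localhost:9200/products_v2 -d '{
  "mappings": {
    "properties": {
      "brand":    { "type": "keyword" },
      "brand_id": { "type": "integer" },
      "color":    { "type": "keyword" },
      "title":    { "type": "text" },
      "in_stock": { "type": "boolean" },
      "price":    { "type": "scaled_float", "scaling_factor": 100 }
    }
  }
}'

curl -s -X POST -H 'Content-Type: application/json' http://localhost:9200/_reindex -d '{
  "source": { "index": "products" },
  "dest":   { "index": "products_v2" }
}'

Once counts and spot-check queries match, point the application at the new index via an alias swap (see the migration checklist below).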

Task 13: Verify MySQL statistics freshness (optimizer sanity)

cr0x@server:~$ mysql -e "ANALYZE TABLE products, product_attr;"
+----------------------+---------+----------+----------+
| Table                | Op      | Msg_type | Msg_text |
+----------------------+---------+----------+----------+
| catalog.products     | analyze | status   | OK       |
| catalog.product_attr | analyze | status   | OK       |
+----------------------+---------+----------+----------+

What it means: Stats refresh can change query plans dramatically if they were stale.

Decision: If this “fixes” search, you still need a durable plan: schedule analyze, or redesign queries to be robust.

Task 14: Detect deep pagination abuse at the application layer

cr0x@server:~$ sudo awk '{print $7}' /var/log/nginx/access.log | grep -E 'search\?' | sed 's/.*offset=\([0-9]\+\).*/\1/' | sort -n | tail -n 10
2000
2400
3000
3600
4200
5000
6200
8000
12000
20000

What it means: Users (or bots) are requesting offsets that cause linear work in MySQL and non-trivial work in search engines.

Decision: Cap offset, require cursor-based paging, and apply bot mitigation. Deep paging is a product decision disguised as a technical one.

Task 15: Check disk latency (the hidden dependency for both systems)

cr0x@server:~$ iostat -x 1 3
Linux 6.5.0 (db1)  12/30/2025  _x86_64_  (32 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           12.1    0.0     3.2    18.7    0.0    66.0

Device            r/s     w/s   rkB/s   wkB/s  await  svctm  %util
nvme0n1         120.0   340.0  9800.0 40200.0  18.4   1.2   92.0

What it means: High await and near-saturated %util mean storage latency is throttling you. Search workloads amplify random reads.

Decision: If disk is the limiter, no amount of query tuning will fully save you. Reduce I/O (better indexes/denormalization) or upgrade storage.

Common mistakes: symptoms → root cause → fix

1) Symptom: “Search is fast for some categories, terrible for others”

Root cause: Data skew + stale/inaccurate statistics lead to unstable MySQL query plans; certain categories blow up join cardinality.

Fix: Run EXPLAIN ANALYZE on representative categories; refresh stats; add targeted composite indexes; consider moving faceting to Elasticsearch.

2) Symptom: “Adding one more filter makes latency jump from 200ms to 8s”

Root cause: Predicate changes join order or disables index usage; range predicate or OR condition breaks composite index prefix; temporary tables/filesort appear.

Fix: Rewrite query to reduce ORs; avoid leading wildcard LIKE; flatten hot attributes; stop OFFSET; or offload to Elasticsearch filter context.

3) Symptom: “Facet counts are wrong or missing under load”

Root cause: Caching full responses with stale inventory/price; or running counts on replicas with lag; or mixing eventual search index with real-time stock.

Fix: Decide what must be strongly consistent (usually checkout, not search). Use near-real-time indexing and separate “availability” logic with clear SLAs.

4) Symptom: “MySQL CPU is fine but response time is awful”

Root cause: Disk I/O wait from large scans, poor locality, or buffer pool misses; sorting spills to disk.

Fix: Measure Innodb_buffer_pool_reads and disk latency; reduce rows examined; provision memory/storage; stop doing analytical faceting in MySQL.

5) Symptom: “Elasticsearch is slow after we added more fields to the index”

Root cause: Mapping explosion; high-cardinality aggregations; too many keyword fields; increased doc values memory and disk footprint.

Fix: Audit mappings; only index what you query; disable indexing for non-search fields; prefer numeric IDs for facets; reduce aggregation sizes.

6) Symptom: “Elasticsearch cluster looks healthy but queries time out”

Root cause: Oversharding, slow merges, heap pressure, or disks near saturation; query fan-out multiplies per-shard overhead.

Fix: Reduce shard count via reindex; improve storage; watch heap and GC; tune refresh/merge; scale out data nodes.

7) Symptom: “We used MySQL FULLTEXT and it still can’t do what we need”

Root cause: FULLTEXT handles token matching, but structured filtering + facet counts + relevance tuning are the real workload.

Fix: Treat FULLTEXT as a partial tool for limited cases. If you need faceting and relevance, use Elasticsearch (or another dedicated search engine).

8) Symptom: “Search results change order between requests”

Root cause: Non-deterministic tie-breaking in SQL sorts; concurrent updates; in Elasticsearch, scoring ties and segment merges can reorder equal scores.

Fix: Add stable secondary sorts (e.g., product_id), use deterministic sort keys, and be explicit about sorting behavior.
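
The tie-break is one extra sort key in either system; using product_id as the deterministic secondary key is an assumption here, and any unique field works.

MySQL (append a unique column as the final sort key):

  ORDER BY popularity DESC, id DESC

Elasticsearch (same idea in the sort clause):

  "sort": [ { "popularity": "desc" }, { "product_id": "asc" } ]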

Checklists / step-by-step plan

When MySQL is acceptable for “search” (do this, not that)

  • Use MySQL for admin search, customer service tooling, and small catalogs with simple filters.
  • Avoid EAV joins on hot paths. If you must filter on attributes, precompute a denormalized table keyed by product_id.
  • Keep queries index-friendly: no leading wildcard LIKE; limit OR logic; avoid deep OFFSET.
  • Measure rows examined and temporary tables; don’t rely on “it has an index.”

When Elasticsearch is the right call (most ecommerce storefronts)

  • You need faceted navigation with counts.
  • You need relevance ranking and tuning across fields.
  • You need typo tolerance, synonyms, stemming, multi-language analyzers.
  • You need consistent low latency under unpredictable filter combinations.

Step-by-step migration plan: MySQL truth, Elasticsearch search

  1. Define the contract: what fields are searchable, filterable, sortable; what freshness you guarantee.
  2. Design the document model: one product document with denormalized fields, plus nested attributes if needed.
  3. Create mappings intentionally: keyword for exact matches/facets, text for analyzed search, numeric for ranges, scaled_float for prices.
  4. Build an indexing pipeline: queue-based events, idempotent updates, retry logic, dead-letter handling.
  5. Backfill safely: build a new index, bulk load, validate counts and sample queries, then cut over with aliases (see the sketch after this list).
  6. Implement dual reads during rollout: compare MySQL and ES results for a subset of traffic to catch mapping/logic drift.
  7. Guardrails: cap aggregations, cap page depth, rate-limit bots, and track query shapes.
  8. Operate it like production: dashboards for heap/disk/latency, alerting on indexing lag, runbooks for reindex.
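
The alias cutover in step 5 is the part worth sketching, because it is what makes reindexing boring. Assuming the application always queries an alias named products and the physical indices carry versioned names (products_v1 and products_v2 are illustrative):

curl -s -X POST -H 'Content-Type: application/json' http://localhost:9200/_aliases -d '{
  "actions": [
    { "remove": { "index": "products_v1", "alias": "products" } },
    { "add":    { "index": "products_v2", "alias": "products" } }
  ]
}'

Both actions in a single _aliases request are applied atomically, so there is no window where the alias points at nothing or at two indices at once.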

Hard rules that prevent recurring incidents

  • Never ship a new facet without load testing on realistic filter combinations.
  • Never let UI generate unbounded IN (...) lists without a cap.
  • Always treat deep pagination as a feature decision with an explicit limit.
  • Always keep MySQL as source of truth; search index is a projection.
  • Always make reindex repeatable (aliases + replayable source).

FAQ

1) Can I keep everything in MySQL if I add more indexes?

For simple search, yes. For faceted ecommerce search with many filters and counts, no. Indexes can reduce pain,
but they don’t change the fundamental mismatch: joins + group-bys + ranking are not MySQL’s happy place at scale.

2) What if I use MySQL FULLTEXT?

FULLTEXT can help with token matching, but faceting and relevance tuning still tend to push you toward a search engine.
FULLTEXT also won’t rescue you from EAV joins, deep paging, and expensive facet counts.

3) Why do filters make SQL slower than text search alone?

Because filters often force joins (attributes, inventory, price tables) and kill index selectivity. The candidate set stays large,
then sorting and pagination multiply work. In Elasticsearch, many filters become cached set intersections.

4) Is Elasticsearch always faster?

For search-style workloads (text + filters + aggregations), usually. For transactional lookups and strict consistency, no.
Elasticsearch can also be slower if you overshard, mis-map fields, or run heavy aggregations on weak hardware.

5) How do I keep inventory “real-time” if Elasticsearch is eventually consistent?

Decide what “real-time” means for the UX. Common patterns: index inventory with frequent updates; treat “in stock” as near-real-time;
or validate availability at add-to-cart/checkout time. Don’t pretend search results are your final truth.

6) What’s the biggest schema mistake people make in Elasticsearch for ecommerce?

Mapping facet fields as text instead of keyword, and indexing everything “just in case.”
It inflates doc values, slows aggregations, and makes heap usage spiky.

7) How do I avoid deep pagination problems?

Prefer cursor-based pagination. In MySQL, use “seek method” (remember last seen sort key + id). In Elasticsearch, use
search_after for stable sorts. Also: cap maximum page depth. Nobody is buying page 500.
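
A minimal search_after sketch; the values in search_after are simply the sort values of the last hit on the previous page, and the product_id tie-breaker field is an assumption:

curl -s -H 'Content-Type: application/json' http://localhost:9200/products/_search -d '{
  "size": 20,
  "query": { "match": { "title": "shoes" } },
  "sort": [
    { "popularity": "desc" },
    { "product_id": "asc" }
  ],
  "search_after": [8741, 812345]
}'

Every page repeats the same query and sort; only the search_after values change, so the work per page stays roughly flat instead of growing with depth.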

8) Can I do a hybrid: MySQL filter first, then Elasticsearch?

You can, but it’s often a complexity tax with weak benefits. If MySQL provides a small candidate set (highly selective),
then sure. But most storefront filters aren’t selective enough, and you end up doing two expensive queries instead of one.

9) What’s the operational minimum to run Elasticsearch sanely?

Solid disks, enough heap, sensible shard counts, monitoring on heap/disk/latency, and a reindex strategy using aliases.
Also: an indexing pipeline with retries and a way to detect lag.

10) When should I not use Elasticsearch?

If your “search” is really a small admin filter UI, or your catalog is tiny and stable, MySQL is simpler.
If your team cannot operate a distributed system, either invest in that capability or choose a managed solution.

Conclusion: practical next steps

If your ecommerce search includes multiple filters, facet counts, relevance ranking, and frequent sorting changes,
stop trying to out-index the problem in MySQL. You’ll spend weeks shaving milliseconds until a new facet reintroduces seconds.
That’s not engineering; it’s denial with metrics.

Do this instead:

  1. Measure: capture slow query shapes, rows examined, temp tables, and disk latency.
  2. Put guardrails on the UX: cap deep paging, cap multi-select explosion, rate-limit bots.
  3. Decide the contract: what must be fresh, what can be near-real-time, and what correctness means.
  4. Move search to Elasticsearch when the feature set demands it: faceting, relevance, typo tolerance.
  5. Operate the pipeline: replayable indexing, alias cutovers, dashboards for indexing lag and cluster health.

MySQL remains the right place for orders, inventory truth, pricing rules, and everything you’ll explain to auditors.
Elasticsearch is where you let customers explore your catalog at speed without turning your database into a space heater.
