MySQL vs MongoDB for Reporting and Analytics: Why Teams Crawl Back to SQL

If you’ve ever watched a CEO refresh a dashboard while your on-call phone warms up, you know this pain: reporting traffic doesn’t degrade gently. It arrives all at once, asks “just one more slice,” and then your database starts making unfamiliar noises.

Teams love MongoDB for shipping product features fast. Then reporting shows up. Suddenly you’re debugging aggregation pipelines at 2 a.m., and everyone develops a nostalgic affection for SQL joins they used to complain about.

Why teams crawl back to SQL (and why it’s not hypocrisy)

Reporting and analytics are a different sport from OLTP. Your product database is built for small, predictable reads and writes: fetch user, update cart, insert order. Reporting is built for large scans, heavy grouping, broad filters, and “compare this month to last month across six dimensions.” It’s also built for humans with time zones, opinions, and Excel.

MongoDB can do analytics, especially with the aggregation framework, but many teams discover the same uncomfortable truth: document stores optimize for application-shaped access patterns, not analyst-shaped ones. Analysts don’t think in “documents”; they think in relationships and dimensions. They will ask questions your original document design actively resists.

SQL systems (MySQL, Postgres, SQL Server, etc.) have a long, grim history of being beaten into shape for reporting. That history matters. Optimizers, join strategies, stable semantics, mature tools—this is accumulated operational scar tissue. MongoDB is improving, but when your VP wants a retention cohort chart by noon, you want boring, legible, well-trodden paths.

Also: reporting has politics. People want consistent numbers across teams. That means consistent definitions. SQL schemas and views turn definitions into shared, reviewable artifacts. MongoDB often turns them into application code snippets and pipeline fragments sprinkled across repos like glitter—forever.

One short joke, as a treat: MongoDB is schema-less the way a teenager’s room is furniture-less. Technically true, operationally expensive.

Historical context and interesting facts (the bits people forget)

Here are some concrete facts and timeline bits that explain why SQL keeps “winning back” reporting workloads. Not vibes—actual history.

  1. SQL standardization started in the 1980s. That means decades of tooling: JDBC/ODBC, BI connectors, query planners, and a shared mental model across companies.
  2. MySQL shipped in the mid-1990s and became the default web database for a generation. Reporting patterns were learned in production the hard way: replicas, summary tables, rollups, and partitioning.
  3. MongoDB emerged around 2009 during a wave of “NoSQL” systems meant to scale horizontally and move fast. Many were designed primarily for application workloads, not ad-hoc analytics.
  4. The rise of JSON (and later JSON in SQL) blurred the line. SQL systems adopted JSON columns and functions; MongoDB adopted richer query and aggregation capabilities. Convergence happened, but not evenly.
  5. “Schema-on-read” became popular in big data (Hadoop-era thinking). It worked for batch processing, but operational reporting wants fast, interactive queries. Schema-on-read is rarely free.
  6. Columnar storage and warehouses changed expectations. Teams got used to dashboards that scan millions of rows quickly. Once you’ve tasted that, a document store doing broad group-by can feel like pedaling a truck.
  7. BI tooling evolved around SQL. Even when tools support MongoDB, the “golden path” features (semantic layers, caching, lineage, permissioning) are almost always SQL-first.
  8. Replication patterns are older than most microservices. MySQL replication and read scaling patterns are deeply understood operationally, including failure modes and recovery playbooks.

All of this adds up to a simple operational rule: reporting success is less about “can the database run the query” and more about “can your org operate the database under unpredictable human demand.” SQL systems have had more time in that arena.

Reporting workload reality: what dashboards do to databases

Reporting queries are rude. They don’t use your carefully designed primary keys. They don’t respect your cache. They love time ranges. They want to group by columns you didn’t index because they didn’t exist when you shipped v1.

The three things dashboards do that break “application database” assumptions

  • They cause fan-out: one page load triggers many queries (filters, totals, charts, drilldowns). Multiply by a Monday morning log-in storm.
  • They skew access patterns: instead of “get document by id,” you get “scan last 90 days and compute distribution.”
  • They demand consistency in definitions: “active user,” “revenue,” “churn,” “conversion” mean different things until you pin them down. SQL schemas, views, and stored logic help you enforce shared definitions.

Analytics is not just query speed

Speed matters, but so do:

  • Predictability: stable query performance across parameter values and data growth.
  • Explainability: you need to show how the number was computed, not just deliver it.
  • Operational isolation: reporting should not take down your checkout flow.
  • Schema governance: if your “events” have 19 shapes, somebody is going to compute “DAU” wrong.

There’s a reason many mature shops run reporting off replicas, warehouses, or separate data marts. They learned that mixing OLTP and ad-hoc analytics is like letting customers into your kitchen during dinner rush.

Why MySQL tends to win for reporting

1) Joins are a feature, not a moral failure

Reporting is relational by nature: orders relate to customers, customers relate to segments, segments relate to campaigns, campaigns relate to spend. Even if the app stores “customer snapshot” inside an order document, analysts will still ask for customer-level attributes, updated segmentation, and corrections. Joins are how you connect facts and dimensions without rewriting history.

MySQL’s join engine, optimizer, and indexing model are built around this. When you write a query that joins five tables with selective predicates, MySQL has a fighting chance of doing something sane. In MongoDB, you’ll often either denormalize (and pay for it later) or use $lookup (and then rediscover joins, but with fewer guardrails and sometimes more memory pain).
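
For reference, the questions analysts keep asking look like this in SQL. A minimal sketch with assumed table and column names (orders, customers, campaigns), not your actual schema:

-- Revenue by campaign and current customer segment, last 30 days.
-- Table and column names are illustrative assumptions.
SELECT
    ca.name            AS campaign,
    cu.segment         AS segment,
    DATE(o.created_at) AS day,
    SUM(o.total)       AS revenue
FROM orders o
JOIN customers cu ON cu.id = o.customer_id
JOIN campaigns ca ON ca.id = cu.campaign_id
WHERE o.created_at >= CURRENT_DATE - INTERVAL 30 DAY
GROUP BY ca.name, cu.segment, DATE(o.created_at);

MySQL gets to choose join order and indexes for a query like this; that flexibility is exactly what ad-hoc reporting leans on.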

2) Query plans are inspectable and operationally actionable

In MySQL, EXPLAIN, histograms, and optimizer traces give you structured clues. You can ask “why did it pick this index,” “why did it sort,” “why did it scan,” and get an answer that maps to known tuning patterns.

This matters in incident response. When dashboards melt down production, you don’t want philosophical debates. You want to see the plan, fix the index, add a covering index, rewrite the query, or move the workload. Fast.

3) The ecosystem is relentlessly SQL-shaped

BI tools, governance tools, access control patterns, auditing, query logging, and even “how to teach new hires to do reporting” are simpler when the storage layer is SQL. Your finance team doesn’t want to learn aggregation pipelines. They want a view named revenue_daily.
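
That view is not a metaphor; it is usually about ten lines. A minimal sketch, assuming hypothetical orders columns (status, total, refund_total); the point is that the definition of "revenue" lives in one reviewable object instead of ten dashboard queries:

-- Shared definition of daily revenue: paid orders, net of refunds.
-- Column names are illustrative assumptions.
CREATE OR REPLACE VIEW revenue_daily AS
SELECT
    DATE(o.created_at)            AS day,
    SUM(o.total - o.refund_total) AS revenue,
    COUNT(*)                      AS paid_orders
FROM orders o
WHERE o.status = 'paid'
GROUP BY DATE(o.created_at);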

4) Replicas, partitioning, and summary tables are boring and effective

Reporting in MySQL usually ends up with some combination of:

  • Read replicas for dashboard traffic
  • Partitioned fact tables (often by date)
  • Pre-aggregated summary tables for hot metrics
  • Materialized rollups (yes, you are building a mini-warehouse)

None of this is trendy. It is, however, how you keep the product running while the org asks questions.
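
For the summary-table piece, the shape is usually something like the sketch below: a small pre-aggregated table plus an incremental, re-runnable refresh. Names are assumptions; the recompute window (two days here) is whatever covers your late-arriving data:

-- Pre-aggregated rollup for a hot dashboard metric (illustrative names).
CREATE TABLE IF NOT EXISTS orders_daily_rollup (
    day         DATE NOT NULL,
    segment     VARCHAR(64) NOT NULL,
    revenue     DECIMAL(18,2) NOT NULL,
    order_count INT NOT NULL,
    PRIMARY KEY (day, segment)
);

-- Incremental, idempotent refresh: recompute only the recent window.
INSERT INTO orders_daily_rollup (day, segment, revenue, order_count)
SELECT DATE(o.created_at), c.segment, SUM(o.total), COUNT(*)
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.created_at >= CURRENT_DATE - INTERVAL 2 DAY
GROUP BY DATE(o.created_at), c.segment
ON DUPLICATE KEY UPDATE
    revenue     = VALUES(revenue),
    order_count = VALUES(order_count);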

Where MongoDB is genuinely good (and where it isn’t)

MongoDB shines when the document shape matches the query shape

If your reporting is “show the user profile with embedded preferences and recent activity” and you built documents exactly for that, MongoDB can be fast and pleasant. Aggregations over a single collection with selective filters and correct indexes can also perform well.

MongoDB struggles when reporting becomes cross-entity and evolving

The failure mode looks like this:

  • Data starts denormalized for speed.
  • Definitions evolve (“active” changes, segmentation changes, corrections arrive).
  • Now you have to backfill embedded fields, reconcile inconsistencies, and explain which version of truth each chart uses.
  • Someone introduces $lookup pipelines to “join” collections.
  • Memory spikes, intermediate results balloon, and the cluster behaves like it’s allergic to your CFO.

MongoDB reporting can be fine when you treat it like a source, not the reporting engine

A very common stable architecture is: MongoDB for OLTP-ish document use cases → stream/ETL into a SQL system or warehouse for reporting. That’s not an insult to MongoDB. It’s specialization. The same way you don’t run Kubernetes control plane workloads on the same nodes as your batch Spark jobs unless you enjoy chaos.

Joins vs embedding: the reporting tax you always pay eventually

Embedding is seductive. One read, everything you need, no joins. Great—until your business wants history and consistency. The minute you embed mutable attributes (customer tier, address, segment) into fact-like documents (orders, events), you’re choosing a versioning model. Usually it’s “whatever it was at write time.” Sometimes that’s correct. Often it isn’t.

Mutable dimensions are where NoSQL denormalization hurts

Reporting wants the ability to ask:

  • “What was revenue by current segment?”
  • “What was revenue by segment at time of purchase?”
  • “What happens if we reclassify these customers?”

In SQL, this is a schema decision: slowly changing dimensions, snapshot tables, or “type 2” history. In MongoDB, teams often hack it by storing multiple fields, backfilling, or writing ad-hoc reconciliation pipelines. That can work, but it’s less explicit and easier to get wrong.
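
To make the SQL version concrete: "segment at time of purchase" is an as-of join against explicit history. A minimal, type-2-flavored sketch with assumed table and column names:

-- One row per segment assignment interval (illustrative names).
CREATE TABLE IF NOT EXISTS customer_segment_history (
    customer_id BIGINT NOT NULL,
    segment     VARCHAR(64) NOT NULL,
    valid_from  DATETIME NOT NULL,
    valid_to    DATETIME NOT NULL DEFAULT '9999-12-31',
    PRIMARY KEY (customer_id, valid_from)
);

-- Revenue by segment *as of purchase time*: join on the interval.
SELECT h.segment, SUM(o.total) AS revenue
FROM orders o
JOIN customer_segment_history h
  ON h.customer_id = o.customer_id
 AND o.created_at >= h.valid_from
 AND o.created_at <  h.valid_to
GROUP BY h.segment;

-- Revenue by *current* segment: same join, but against the open-ended
-- rows only (WHERE h.valid_to = '9999-12-31').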

Second short joke (and we’re done):

Denormalization is like packing for a trip by wearing every shirt at once. You’ll save luggage space until you need to change clothes.

Query plans, indexes, and why “it worked in staging” is a lie

Staging data is polite. Production data is creative. Analytics queries amplify that creativity because they touch more rows, more ranges, and more weird corners.

MySQL: predictable failure modes

In MySQL reporting incidents, the usual culprits are:

  • Missing composite indexes for common filter+group patterns
  • Sorting large result sets due to ORDER BY on non-indexed columns
  • Join order mistakes when statistics are stale or skewed
  • Temporary table spills (disk temp tables) during aggregation
  • Replica lag when dashboards hammer replicas

The good news: these issues are diagnosable with standard tools and typically fixable without redesigning the entire data model.

MongoDB: performance cliffs via aggregation and memory

MongoDB can behave well until it doesn’t. Typical cliffs:

  • $group with high cardinality keys creates massive in-memory state
  • $lookup joins explode intermediate results
  • Index usage is fine for the first $match, then you’re in pipeline land
  • Queries get slower as documents become more varied (wide documents, arrays)
  • Sharding keys optimized for OLTP don’t match reporting filters

One reliability quote (paraphrased)

A paraphrased idea from Richard Cook's work on how complex systems fail: success in operations often hides the work that keeps failure from happening.

Reporting is where hidden work becomes visible. If you want dashboards that don’t embarrass you, you need explicit design: schemas, indexes, isolation, and routine maintenance.

Three corporate mini-stories from the trenches

Mini-story 1: The incident caused by a wrong assumption

Company A ran their core product on MongoDB. Orders were documents with embedded customer snapshots: name, email, tier, region. The assumption was simple: “We’ll always want the customer info as it was when the order happened.” That was true for receipts and customer support, so everyone nodded and moved on.

Then the business launched a compliance-driven reclassification project. Regions changed. Customers got remapped. Finance wanted revenue by current region for the last eight quarters to match new reporting. The dashboard team built an aggregation pipeline over the orders collection and grouped by the embedded region field. It returned numbers. They were wrong. Quietly wrong.

When the discrepancy surfaced, engineering tried to “fix the data” by backfilling embedded region fields across historical orders. That took days, stressed the cluster, and broke the original assumption for customer support: the receipt now displayed a region the customer didn’t live in at purchase time. Two truths, one field.

The eventual fix was not glamorous: normalize dimensions into a SQL reporting store, keep order facts immutable, and model region as a slowly changing dimension with explicit “as-of” joins. The reporting layer got two views: “region_at_purchase” and “region_current.” The dashboard stopped lying. People stopped yelling. This is what adulthood looks like in data systems.

Mini-story 2: The optimization that backfired

Company B used MySQL and hit reporting latency. Someone proposed an optimization: “We’ll precompute everything nightly into one wide table so dashboards never join.” It sounded efficient. It also sounded like something a tired team says when they’ve been paging for months.

The wide table worked for about two weeks. Then a product launch introduced new attributes. The ETL grew conditional logic. Columns multiplied. The table became wide enough that most queries read far more data than necessary. InnoDB buffer pool churned. Replication lag got worse because the nightly job wrote a huge amount of data in a short window.

Worse: because the job ran nightly, the business started expecting “today’s” numbers at 10 a.m. They got yesterday. Teams began running partial backfills during the day. That collided with peak traffic and created a predictable daily incident window. The “optimization” turned into an operational schedule of pain.

The recovery was a step back to basics: keep normalized fact and dimension tables, build a small set of targeted summary tables for the truly hot metrics, and refresh them incrementally (hourly or streaming) with idempotent logic. Dashboards got faster, and the system stopped producing a daily surprise.

Mini-story 3: The boring but correct practice that saved the day

Company C had both MongoDB (product) and MySQL (reporting). The practice that saved them wasn’t a fancy architecture. It was a weekly routine: review the slow query log, review replica lag graphs, and run a small suite of “top dashboard queries” against production-like data with EXPLAIN/explain() captured.

It was dull. It also created a shared language between data analysts and SRE. Analysts learned to phrase questions in ways that didn’t require scanning the universe. SRE learned which metrics were business-critical and deserved summary tables or dedicated replicas.

When the company had a traffic spike from an unexpected partnership launch, reporting load increased sharply. The on-call engineer already knew the two worst queries, the indexes they required, and the safe throttles. They turned on a read-only reporting replica for dashboards and rate-limited a “download CSV” endpoint that had a habit of doing full table scans.

No heroics. No all-hands. Just predictable operations built from a boring habit. The partnership launch became a revenue story, not an outage story.

Practical tasks (commands, outputs, and decisions)

These are real tasks you can run today. Each includes: command, what the output means, and the decision you make from it. Use them as a working checklist, not trivia.

Task 1: MySQL — identify the top time-consuming query patterns

cr0x@server:~$ sudo pt-query-digest /var/log/mysql/mysql-slow.log | head -n 40
# 120s user time, 2.1s system time, 28.50M rss, 120.00M vsz
# Query_time distribution
#   1us
#  10ms
# 100ms  ####
#    1s  ##########
#   10s  #####
# Profile
# Rank Query ID           Response time Calls R/Call  V/M   Item
# ==== ================== ============= ===== ======= ===== ============
#    1 0xA1B2C3D4E5F6A7B8  420.1234  1200 0.3501  0.02  SELECT orders...
#    2 0x1122334455667788  180.0000   200 0.9000  0.01  SELECT users...

Meaning: You get which query shapes cost the most total time, not just the single slowest. Reporting pain is usually “a moderately slow query called a lot.”

Decision: Fix rank #1–#3 first: add indexes, rewrite, or move them to a replica/summary table. Ignore the rare unicorn query until the herd stops stampeding.

Task 2: MySQL — check replica lag before blaming the query

cr0x@server:~$ mysql -e "SHOW SLAVE STATUS\G" | egrep "Seconds_Behind_Master|Slave_IO_Running|Slave_SQL_Running"
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Seconds_Behind_Master: 187

Meaning: Your reporting replica is ~3 minutes behind. Dashboards may show “wrong” numbers and analysts will call it a data bug.

Decision: If lag is unacceptable, reduce write pressure (batch ETL), increase replica capacity, or route some dashboards to cached results. Don’t tune a query on a replica that’s drowning in apply lag.

Task 3: MySQL — verify what the optimizer is actually doing

cr0x@server:~$ mysql -e "EXPLAIN FORMAT=tree SELECT c.segment, DATE(o.created_at) d, SUM(o.total) s FROM orders o JOIN customers c ON c.id=o.customer_id WHERE o.created_at >= '2025-01-01' AND o.created_at < '2025-02-01' GROUP BY c.segment, d;"
EXPLAIN: -> Aggregate using temporary table  (cost=...)
    -> Nested loop inner join  (cost=...)
        -> Index range scan on orders using idx_orders_created_at  (cost=...)
        -> Single-row index lookup on customers using PRIMARY (id=o.customer_id)

Meaning: The query uses a range scan on orders.created_at, then PK lookups to customers, then a temp table for grouping.

Decision: If temp table is huge, consider a composite index to support group-by patterns (or a summary table). If the range scan is too broad, partition by date or tighten predicates.

Task 4: MySQL — find whether temp tables are spilling to disk

cr0x@server:~$ mysql -e "SHOW GLOBAL STATUS LIKE 'Created_tmp%';"
+-------------------------+----------+
| Variable_name           | Value    |
+-------------------------+----------+
| Created_tmp_disk_tables | 48219    |
| Created_tmp_tables      | 51200    |
+-------------------------+----------+

Meaning: A large portion of temp tables are hitting disk, often due to big sorts/group-bys.

Decision: Tune queries/indexes to reduce temp tables, or adjust memory limits carefully. For reporting, prefer pre-aggregation instead of “bigger temp space” as your strategy.

Task 5: MySQL — check buffer pool pressure (are you caching anything?)

cr0x@server:~$ mysql -e "SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_reads'; SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read_requests';"
+--------------------------+-----------+
| Variable_name            | Value     |
+--------------------------+-----------+
| Innodb_buffer_pool_reads | 9812234   |
+--------------------------+-----------+
+----------------------------------+-------------+
| Variable_name                    | Value       |
+----------------------------------+-------------+
| Innodb_buffer_pool_read_requests | 2048123456  |
+----------------------------------+-------------+

Meaning: Many logical reads are served from memory; some physical reads still happen. If Innodb_buffer_pool_reads climbs rapidly during dashboard use, you’re thrashing.

Decision: Consider separating reporting to replicas with bigger memory, adding covering indexes, or offloading to a warehouse. Don’t let dashboards evict OLTP hot pages.

Task 6: MySQL — confirm you have the right composite index (or don’t)

cr0x@server:~$ mysql -e "SHOW INDEX FROM orders;"
+--------+------------------------+-------------+--------------+-------------+------------+------------+
| Table  | Key_name               | Column_name | Seq_in_index | Cardinality | Non_unique | Index_type |
+--------+------------------------+-------------+--------------+-------------+------------+------------+
| orders | PRIMARY                | id          | 1            | 50000000    | 0          | BTREE      |
| orders | idx_orders_created_at  | created_at  | 1            | 3650        | 1          | BTREE      |
| orders | idx_orders_customer_id | customer_id | 1            | 2000000     | 1          | BTREE      |
+--------+------------------------+-------------+--------------+-------------+------------+------------+

Meaning: You have single-column indexes but not a composite one like (created_at, customer_id) or (created_at, status) that matches reporting filters.

Decision: Add composite indexes for the top reporting predicates. Do it on a replica first, measure, then roll forward. Indexes are not free—optimize for the queries you actually run.
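
If the digest and the plan both point at a created_at filter plus a customer_id join or group, the candidate fix is a composite index along these lines (names mirror the example output above; verify on a replica before rolling forward):

-- Supports the date-range filter and serves customer_id from the index.
-- Append total as well if you want it covering for SUM(total) queries.
ALTER TABLE orders ADD INDEX idx_orders_created_customer (created_at, customer_id);

-- Then re-check what the optimizer does with it:
EXPLAIN FORMAT=tree
SELECT customer_id, SUM(total)
FROM orders
WHERE created_at >= '2025-01-01' AND created_at < '2025-02-01'
GROUP BY customer_id;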

Task 7: MongoDB — identify heavy aggregations and whether they spill

cr0x@server:~$ mongosh --quiet --eval 'db.orders.explain("executionStats").aggregate([{ $match: { createdAt: { $gte: ISODate("2025-01-01"), $lt: ISODate("2025-02-01") } } },{ $group: { _id: "$segment", revenue: { $sum: "$total" } } }], { allowDiskUse: true })'
{
  "stages": [
    { "$cursor": { "queryPlanner": { "winningPlan": { "stage": "IXSCAN" } }, "executionStats": { "totalDocsExamined": 4200000, "totalKeysExamined": 4200000 } } },
    { "$group": { "usedDisk": true, "spills": 3 } }
  ]
}

Meaning: The grouping spilled to disk (usedDisk: true). That’s a latency and IO warning light.

Decision: Reduce cardinality of group keys, pre-aggregate, or move the workload out of MongoDB. If disk spill is common for core dashboards, you’re doing analytics in the wrong place.

Task 8: MongoDB — check if a query is using the intended index

cr0x@server:~$ mongosh --quiet --eval 'db.orders.find({createdAt: {$gte: ISODate("2025-01-01"), $lt: ISODate("2025-02-01")}, status: "paid"}).sort({createdAt: -1}).explain("executionStats").queryPlanner.winningPlan'
{
  "stage": "FETCH",
  "inputStage": {
    "stage": "IXSCAN",
    "keyPattern": { "createdAt": 1 },
    "direction": "forward"
  }
}

Meaning: The planner uses the createdAt index, but the status filter isn't part of it, so the scan fetches and filters more documents than necessary.

Decision: Add a compound index like {status: 1, createdAt: 1} if this is a top query. If you also sort by createdAt, choose index order carefully.

Task 9: MongoDB — find current operations causing pain

cr0x@server:~$ mongosh --quiet --eval 'db.currentOp({ "active": true, "secs_running": { $gte: 5 } }).inprog.map(op => ({secs: op.secs_running, ns: op.ns, op: op.op, command: op.command && Object.keys(op.command)[0]}))'
[
  { "secs": 42, "ns": "prod.orders", "op": "command", "command": "aggregate" },
  { "secs": 19, "ns": "prod.events", "op": "command", "command": "find" }
]

Meaning: Long-running operations include an aggregation on orders. That’s often a dashboard or export.

Decision: Kill or throttle the known offender, add guardrails (timeouts, maxTimeMS), and move that query to a reporting store.

Task 10: OS — check disk latency when analytics “randomly slows down”

cr0x@server:~$ iostat -x 1 3
avg-cpu:  %user %nice %system %iowait  %steal  %idle
          12.00  0.00   6.00    18.00    0.00  64.00

Device            r/s     w/s   rkB/s   wkB/s  await  svctm  %util
nvme0n1         820.0  1100.0  64000   72000   28.5   0.9   97.0

Meaning: await is high and %util is saturated. Your database is IO-bound right now, not “CPU slow.”

Decision: Reduce large scans (indexes/partitioning), move reporting to replicas on separate storage, or schedule heavy jobs off-peak. Don’t keep tuning query syntax if the disk is pegged.

Task 11: OS — confirm whether memory pressure is causing swap storms

cr0x@server:~$ vmstat 1 5
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  1  524288  10240  12000  98000   15   25  1800  2400  900 1200 20  8 50 22  0

Meaning: Swap-in/out (si/so) is non-zero under load. That’s a database performance killer, especially for aggregations.

Decision: Stop running reporting on undersized nodes. Add memory, isolate reporting, or cap concurrency. A swapping database is a polite way of saying “I refuse.”

Task 12: MySQL — measure connection and thread saturation

cr0x@server:~$ mysql -e "SHOW GLOBAL STATUS LIKE 'Threads_running'; SHOW GLOBAL STATUS LIKE 'Connections'; SHOW VARIABLES LIKE 'max_connections';"
+-----------------+-------+
| Variable_name   | Value |
+-----------------+-------+
| Threads_running | 184   |
+-----------------+-------+
+---------------+----------+
| Variable_name | Value    |
+---------------+----------+
| Connections   | 9812231  |
+---------------+----------+
+-----------------+------+
| Variable_name   | Value|
+-----------------+------+
| max_connections | 500  |
+-----------------+------+

Meaning: Lots of concurrent threads; you’re not at max connections, but you may be CPU context switching or hitting internal contention.

Decision: Add connection pooling, reduce dashboard parallelism, or route to a replica. If reporting traffic causes thread spikes, isolate it.

Task 13: MySQL — find the biggest tables and index bloat (storage reality check)

cr0x@server:~$ mysql -e "SELECT table_name, ROUND((data_length+index_length)/1024/1024/1024,2) AS size_gb FROM information_schema.tables WHERE table_schema='prod' ORDER BY (data_length+index_length) DESC LIMIT 10;"
+------------+---------+
| table_name | size_gb |
+------------+---------+
| events     | 420.55  |
| orders     | 88.12   |
| users      | 12.40   |
+------------+---------+

Meaning: If your biggest table is also the one powering dashboards, you’re scanning a lot of data whether you admit it or not.

Decision: Partition by date, archive, or move events to a warehouse. If “events” is 400+ GB and used for interactive reporting, stop pretending this is just OLTP.

Task 14: MongoDB — check index inventory and spot missing compound indexes

cr0x@server:~$ mongosh --quiet --eval 'db.orders.getIndexes().map(i => i.key)'
[
  { "_id": 1 },
  { "createdAt": 1 },
  { "customerId": 1 }
]

Meaning: Only single-field indexes exist. Aggregations filtering by multiple fields will examine too many documents.

Decision: Add compound indexes aligned to your top dashboard filters (e.g., {status: 1, createdAt: 1}). If your analytics requires a dozen such indexes, that’s a hint you should ETL into SQL/warehouse instead.

Task 15: MySQL — validate binlog/replication settings for reporting replicas

cr0x@server:~$ mysql -e "SHOW VARIABLES LIKE 'binlog_format'; SHOW VARIABLES LIKE 'sync_binlog';"
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| binlog_format | ROW   |
+---------------+-------+
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| sync_binlog   | 1     |
+---------------+-------+

Meaning: Durable binlogging is on. Great for correctness, but it can add write overhead. Your reporting strategy shouldn’t rely on aggressive writes during peak times.

Decision: If reporting ETL is hammering primary writes, move ETL off the primary path (CDC, stream, or replica-based extraction). Don’t “optimize” durability settings because dashboards are slow.

Fast diagnosis playbook: find the bottleneck in minutes

This is the playbook I want on the wall when a dashboard incident hits. You don’t have time to be elegant. You have time to be correct.

First: decide whether it’s query cost, concurrency, or infrastructure

  1. Check saturation: CPU, IO, memory, network. If IO is pegged or swapping, tuning indexes won’t save you mid-incident.
  2. Check concurrency: number of running threads/ops. Reporting often fails from too many “reasonable” queries in parallel.
  3. Check top offenders: slow query digest (MySQL) or currentOp/slow logs (MongoDB).

Second: isolate the workload

  1. Route dashboards to a read replica or dedicated reporting node if you have it.
  2. Rate-limit exports and “download CSV” endpoints (they are stealth batch jobs).
  3. Turn on caching for the top charts if correctness tolerates slight staleness.

Third: prove the plan and fix the highest-leverage thing

  • MySQL: run EXPLAIN, add the missing composite index, or rewrite to reduce rows scanned. If grouping is huge, build a rollup table.
  • MongoDB: run explain("executionStats") for the pipeline, look for docs examined vs returned, and disk spills. If it’s spilling and cardinality is high, stop and offload.

Fourth: prevent recurrence

  • Create a “dashboard query budget” (max runtime, max scanned docs, max concurrency).
  • Add scheduled slow query review.
  • Define shared metrics in views/tables, not in application code fragments.

Common mistakes: symptoms → root cause → fix

1) Symptom: dashboards get slower every week without code changes

Root cause: data growth + missing partitioning/archival; queries scan wider time ranges over time.

Fix: partition fact tables by date (MySQL), build summary tables, or ETL to a warehouse. In MongoDB, introduce time-bucket collections or offload analytics.
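
In MySQL, date partitioning of a fact table is mostly mechanical once every unique key includes the partition column. A minimal sketch with assumed names; the payoff is pruning recent-window scans and dropping old months as a metadata operation:

-- Monthly range partitions on an events fact table (illustrative names).
CREATE TABLE events_fact (
    event_id   BIGINT NOT NULL,
    event_date DATE NOT NULL,
    event_type VARCHAR(64) NOT NULL,
    payload    JSON,
    PRIMARY KEY (event_id, event_date)
)
PARTITION BY RANGE COLUMNS (event_date) (
    PARTITION p2025_01 VALUES LESS THAN ('2025-02-01'),
    PARTITION p2025_02 VALUES LESS THAN ('2025-03-01'),
    PARTITION pmax     VALUES LESS THAN (MAXVALUE)
);

-- Retiring an old month is no longer a giant DELETE:
-- ALTER TABLE events_fact DROP PARTITION p2025_01;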

2) Symptom: the “same query” is fast sometimes and terrible other times

Root cause: parameter sensitivity and data skew. One customer or segment has 100× the data of the others, and the optimizer picks a plan that's fine for small segments but awful for big ones.

Fix: in MySQL, add composite indexes aligned to selective predicates; consider query rewrites; keep stats accurate. In MongoDB, ensure the initial $match is selective and indexed; avoid high-cardinality $group without pre-aggregation.

3) Symptom: reporting “works” but numbers don’t match across teams

Root cause: metric definitions implemented ad-hoc in pipelines or application code; no shared semantic layer.

Fix: define metrics in SQL views or controlled transformations; version them; add test datasets. Treat metric definitions like API contracts.

4) Symptom: MongoDB cluster becomes unstable during business reviews

Root cause: large aggregations with disk spills; $lookup joins; inadequate indexes; sharding key optimized for writes, not reporting.

Fix: move analytics to SQL/warehouse; pre-aggregate; redesign shard key for analytical queries only if you accept OLTP tradeoffs. Most teams should not fight this battle in MongoDB.

5) Symptom: MySQL primary slows down when analysts run queries

Root cause: running reporting on the primary; large reads and temp tables evict OLTP cache; locks/contention.

Fix: separate reporting via replicas; enforce read-only users; cap query runtime; use summary tables. Protect the primary like it pays your salary (it does).

6) Symptom: replica lag spikes during ETL or “refresh metrics” jobs

Root cause: heavy write bursts, large transactions, or ETL that updates many rows; replica can’t apply fast enough.

Fix: incremental updates, chunking, idempotent loads, and scheduling. Consider CDC into a separate reporting store rather than hammering MySQL with updates.

7) Symptom: adding indexes “did nothing”

Root cause: wrong index order, low selectivity, functions on columns (non-sargable predicates), or the query actually bottlenecks on grouping/sorting.

Fix: inspect plans; avoid wrapping indexed columns in functions; add covering/composite indexes; or pre-aggregate. Indexes aren’t magic; they’re math with consequences.
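
The "functions on columns" trap deserves one concrete example. Assuming an index on orders.created_at, the first form hides the column behind functions and forces a scan; the second expresses the same month as a range the index can serve:

-- Non-sargable: YEAR()/MONTH() prevent use of the created_at index.
SELECT SUM(total) FROM orders
WHERE YEAR(created_at) = 2025 AND MONTH(created_at) = 1;

-- Sargable: same result, written as a range the index understands.
SELECT SUM(total) FROM orders
WHERE created_at >= '2025-01-01' AND created_at < '2025-02-01';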

Checklists / step-by-step plan

Decision checklist: should reporting run on MongoDB directly?

  • Are most reports single-collection with selective predicates? If yes, maybe.
  • Do you need frequent cross-entity joins, changing definitions, or dimensional modeling? If yes, plan for SQL/warehouse.
  • Do you need BI tool compatibility with minimal friction? SQL wins.
  • Are you seeing aggregation disk spills or memory pressure? That’s your exit sign.

Step-by-step plan: build a sane reporting path without drama

  1. Inventory top dashboards: list the top 20 queries by frequency and business importance.
  2. Classify each query: OLTP-ish (point lookups), analytical (scan/group), export/batch (big dumps).
  3. Pick an isolation strategy:
    • MySQL: read replicas for dashboards, separate replica for exports.
    • MongoDB: separate analytics node only if you can tolerate eventual consistency; otherwise ETL out.
  4. Model the data explicitly:
    • Define facts (orders, events) and dimensions (customer, product, campaign).
    • Decide which dimensions need history (slowly changing) vs snapshots.
  5. Create a minimal semantic layer: SQL views for shared definitions of metrics and filters.
  6. Add the right indexes: based on real query patterns, not intuition.
  7. Pre-aggregate hot metrics: daily/hourly summaries, incrementally updated.
  8. Set guardrails:
    • Query timeouts, concurrency limits, max export size.
    • Separate roles for analysts vs service accounts.
  9. Operationalize it:
    • Slow query review cadence.
    • Replica lag alerts and runbooks.
    • Capacity planning tied to dashboard usage.
  10. Migrate gradually: start with one dashboard, validate numbers, then expand.

Migration checklist: moving analytics from MongoDB to MySQL (or a SQL layer)

  • Define canonical event/order schema (with types and nullability).
  • Decide idempotency keys for loads (e.g., event_id); see the sketch after this checklist.
  • Backfill historical data in chunks; validate per-day totals.
  • Run dual-write or CDC-based sync until confidence is high.
  • Freeze definitions in views; require review for changes.
  • Cut dashboards over one-by-one; keep old pipeline for rollback window.
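
The idempotency-key item above usually reduces to an upsert keyed on event_id. A minimal sketch with assumed names; replaying a backfill chunk then cannot double-count:

-- Reporting-side events table keyed on the natural id (illustrative names).
CREATE TABLE IF NOT EXISTS events_reporting (
    event_id    BIGINT PRIMARY KEY,
    occurred_at DATETIME NOT NULL,
    event_type  VARCHAR(64) NOT NULL,
    amount      DECIMAL(18,2)
);

-- Loader upsert: safe to re-run for any chunk.
INSERT INTO events_reporting (event_id, occurred_at, event_type, amount)
VALUES (?, ?, ?, ?)
ON DUPLICATE KEY UPDATE
    occurred_at = VALUES(occurred_at),
    event_type  = VALUES(event_type),
    amount      = VALUES(amount);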

FAQ

1) Is MongoDB “bad” for analytics?

No. It’s good at analytics that match its strengths: single-collection queries with good indexes and reasonable grouping cardinality. It’s less good as a general-purpose BI backend under ad-hoc joins and evolving business definitions.

2) Why do joins matter so much for reporting?

Because reporting questions are relational: facts plus dimensions. If you avoid joins by embedding everything, you’re choosing a versioning model and paying with backfills, duplication, and inconsistent truth.

3) Can’t MongoDB just use $lookup and be fine?

Functionally, $lookup is a join, but it runs inside a pipeline model that can create large intermediate results and memory pressure. It can work for small joins. It often becomes fragile at scale, especially under ad-hoc queries.

4) Why not run reporting on the MySQL primary if it’s “fast enough”?

Because “fast enough” is a lie you tell yourself until a board deck is due. Reporting load is bursty and unpredictable, and it can evict OLTP hot pages, increase lock contention, and trigger cascading latency. Use replicas or a separate reporting store.

5) What’s the fastest win for MySQL reporting performance?

Usually: add the correct composite index for the top dashboard query pattern, and route reporting traffic to a read replica. The second fastest win is a summary table for a hot metric.

6) What’s the fastest win for MongoDB reporting performance?

Make the first $match selective and indexed; add compound indexes that match filter patterns; use allowDiskUse when appropriate. If the query still spills and scans millions of documents, stop optimizing and offload.

7) Should we store analytics events in MongoDB at all?

If your event use case is application-centric (“show user activity feed”), MongoDB can be fine. For BI-grade event analytics, you’ll likely want a system optimized for scans, aggregations, and compression. Many teams keep MongoDB as the source and replicate events out.

8) How do we keep reporting numbers consistent across teams?

Define metrics centrally (SQL views/controlled transforms), version changes, and test with fixed datasets. Avoid metric logic living in scattered application code or one-off pipelines.

9) Is MySQL enough, or do we need a warehouse?

If your reporting is operational (recent time windows, moderate data volume, interactive dashboards), MySQL with replicas and summary tables can be enough. If you’re doing deep history, wide scans, and heavy multi-dimensional slicing, a warehouse will reduce pain.

10) What if our team already knows MongoDB and not SQL?

Learn enough SQL to operate reporting sanely. Reporting is a business interface, not an engineering preference contest. SQL literacy pays rent.

Conclusion: pragmatic next steps

If you’re choosing between MySQL and MongoDB specifically for reporting and analytics, choose SQL unless you can clearly prove your reporting questions are document-shaped and will stay that way. They won’t. The org will change its mind. It always does.

Do this next:

  1. Move reporting off the primary database path: replicas, a reporting cluster, or a separate SQL store.
  2. Write down your top 20 reporting queries and make them first-class citizens: indexes, plans, and budgets.
  3. Stop embedding mutable dimensions into immutable facts unless you explicitly want “as-written truth.” Model history intentionally.
  4. Build a small semantic layer (views/tables) so the company can argue about definitions without arguing about pipelines.
  5. Operationalize the boring habits: slow query review, replica lag alerts, and load testing on production-like data.

Teams “crawl back to SQL” because SQL is older, yes—but mostly because it’s operationally legible under pressure. When reporting becomes a business-critical system, legibility beats cleverness. Every time.
