MySQL vs MariaDB on NVMe vs SATA SSD: why your DB still feels slow (and how to prove it)

You bought NVMe. You migrated the data directory. You posted the “we’re modern now” message in the team channel.
And the app still times out at 9:05 a.m. like it’s running on a laptop hard drive from 2007.

Here’s the uncomfortable truth: storage upgrades don’t fix database performance problems unless you understand what kind of I/O you’re actually doing.
“NVMe is fast” is not a diagnosis. It’s a purchase order.

What’s actually slow: latency, fsync, CPU, locks, or the query

When someone says “the database is slow,” what they usually mean is “the user is waiting.”
The wait can be caused by disk, sure. But it’s just as often:

  • CPU (bad plan, missing index, or simply too many concurrent requests)
  • Locks/transactions (row locks, gap locks, metadata locks, or long-running transactions)
  • Commit path latency (fsync, redo log flush, doublewrite behavior)
  • Read amplification (random reads because your working set doesn’t fit in memory)
  • Write amplification (small writes triggering huge background work)
  • Network and DNS nonsense (yes, this still happens)

NVMe can deliver insane throughput. But databases don’t buy throughput; they buy predictable latency.
One 99.9th percentile fsync spike can ruin your p95 API latency graphs for the entire morning.
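
If you want a number for that instead of a feeling, fio can measure fsync-style latency directly.
A minimal sketch, assuming fio is installed and /var/lib/mysql-scratch is a hypothetical throwaway directory
on the same filesystem as your datadir (never point this at live data files):

cr0x@server:~$ fio --name=fsync-probe --directory=/var/lib/mysql-scratch --rw=write --bs=4k --size=256m \
    --ioengine=sync --fdatasync=1 --runtime=60 --time_based=1   # the directory above is an example path

Read the fsync/fdatasync latency percentiles in the output; the p99/p99.9 lines predict commit-path pain far better
than the bandwidth summary does.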

My operational rule: if you haven’t measured where time goes, you’re not troubleshooting; you’re narrating.

One quote worth keeping in your head while you measure: “Hope is not a strategy,” a line commonly attributed to General Gordon R. Sullivan, co-author of Hope Is Not a Method.

MySQL vs MariaDB: where behavior diverges in the real world

They’re cousins, not twins

For most application teams, MySQL and MariaDB “feel” the same until you’re paged at 02:00 and need to interpret
a thread dump, a replication delay, or a performance regression after a minor version bump.
On storage, both heavily depend on the same physics: page caches, writeback, fsync behavior, and the shape of your workload.
But their defaults, features, and edge behaviors can differ enough to change your bottleneck.

Commit and flushing behavior: the boring stuff that makes you slow

The commit path is where NVMe gets the most attention: log writes and flushes.
The catch is that your system’s flush behavior is a three-way negotiation between
InnoDB settings, your filesystem + mount options, and your drive firmware/controller.
The knobs that set the terms of that negotiation (a minimal config sketch follows the list):

  • innodb_flush_log_at_trx_commit decides when redo is flushed on commit (durability vs latency).
  • sync_binlog influences how often binlog is fsync’d (replication safety vs write latency).
  • innodb_flush_method (often O_DIRECT) affects double caching and write patterns.
  • redo log size and checkpoint pressure can turn a decent device into a jitter factory.
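
To make those knobs concrete, here is a minimal, illustrative my.cnf sketch. Values are examples, not recommendations;
innodb_redo_log_capacity exists in MySQL 8.0.30+ only, while MariaDB sizes redo through innodb_log_file_size.

[mysqld]
# Full durability: flush and fsync redo on every commit, fsync the binlog for every transaction
innodb_flush_log_at_trx_commit = 1
sync_binlog                    = 1
# Bypass the OS page cache for data files to avoid double caching
innodb_flush_method            = O_DIRECT
# Illustrative size, MySQL 8.0.30+ only (MariaDB uses innodb_log_file_size)
innodb_redo_log_capacity       = 8G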

Performance instrumentation differences you’ll notice

Modern MySQL has a strong Performance Schema story and a big ecosystem around it. MariaDB has good instrumentation too,
but you’ll find differences in table names, counters, and which waits are exposed in what shape.
Don’t pick a database because a blog told you it’s “faster.” Pick based on:
operational clarity, replication behavior you can live with, and your team’s ability to debug it under pressure.

Opinionated advice: if you can’t answer “what is the top wait event at peak” in 60 seconds, you’re flying blind regardless of brand.
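
One way to answer it in well under 60 seconds, assuming the sys schema is present (bundled with MySQL 5.7+ and 8.0;
shipped with MariaDB from 10.6):

cr0x@server:~$ mysql -e "SELECT events, total_latency FROM sys.waits_global_by_latency LIMIT 5;"

If wait/io/file/innodb/innodb_log_file tops that list, you are in commit-path territory before you have opened a single dashboard.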

NVMe vs SATA SSD: the parts that matter to databases

NVMe is not just “faster SSD”

SATA SSD performance was shaped by the AHCI interface: a single command queue of 32 outstanding commands, higher per-command
overhead, and a design that made sense when storage was pretending to be a spinning disk. NVMe is a different model: up to
64K queues with up to 64K commands each, low overhead, and much higher potential IOPS.
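
You can see the queue-model difference on a recent kernel without running a benchmark: the block layer exposes one
directory per hardware queue. A quick sketch, assuming the nvme0n1/sda device names used later in this article:

cr0x@server:~$ ls /sys/block/nvme0n1/mq | wc -l    # NVMe hardware queues, often one per CPU
cr0x@server:~$ ls /sys/block/sda/mq | wc -l        # usually 1 for a SATA/AHCI device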

If your workload is single-threaded, or dominated by fsync latency, or blocked on row locks,
NVMe will not feel like magic. It’ll feel like you upgraded your car’s engine while towing a trailer with the parking brake on.

Databases care about the ugly details

  • Latency distribution (p99 matters more than “up to 3.5GB/s”)
  • Write cache behavior (volatile cache + power-loss protection changes everything)
  • Thermal throttling (NVMe drives do it quietly, like a polite saboteur)
  • Queue depth and parallelism (a single busy thread won’t saturate NVMe)
  • Steady-state performance (after SLC cache is exhausted)

Joke #1: Buying NVMe to fix a slow query is like buying a faster elevator because you forgot where you parked.
The ride is smoother, but you’re still lost.

When SATA can be “fine”

If your dataset fits in memory, reads are mostly served from the buffer pool and OS caches. Storage matters primarily for:
redo log writes, binary log writes, and occasional page flushes.
A decent SATA SSD can handle that—until concurrency rises and fsync storms show up.

NVMe tends to shine when you have high concurrency random I/O, large working sets,
and lots of write pressure—or when you need low latency under load, not just high peak throughput on a marketing slide.

Interesting facts and context (the stuff people forget)

  1. MariaDB started in 2009 as a fork created when Oracle moved to acquire Sun Microsystems, which owned MySQL.
  2. InnoDB became the default storage engine with MySQL 5.5; it wasn’t always the norm: MyISAM used to be common, with very different I/O behavior (table-level locks, no crash-safe redo log).
  3. NVMe 1.0 shipped in 2011, and the “NVMe makes everything fast” myth arrived shortly after, often without queue-depth-aware testing.
  4. fsync semantics vary by filesystem and mount options; two servers with identical SSDs can show wildly different commit latency.
  5. Write amplification is real: a single logical page change can cascade into redo logging, doublewrite buffering, flushing, and background merges.
  6. IOPS numbers are negotiable: vendors quote them at specific queue depths and access patterns that rarely match OLTP behavior.
  7. Linux I/O schedulers changed over time; NVMe often uses a different default scheduler than SATA, impacting latency under load.
  8. Power-loss protection matters: “enterprise” SSDs often have capacitors; consumer SSDs often don’t. This changes safe write caching and durability.
  9. Replication depends on binlog fsync: pushing durability to “fast” settings can make replicas fall behind in different, confusing ways.

How to prove the bottleneck: 12+ practical tasks with commands

These are not “benchmarks for bragging.” They’re probes. Run them during a slow period and during a normal period.
Capture output. Make a call based on evidence.

Task 1: Confirm what storage you’re actually using (NVMe vs SATA) and how it’s connected

cr0x@server:~$ lsblk -d -o NAME,MODEL,TRAN,ROTA,SIZE
NAME    MODEL               TRAN ROTA  SIZE
nvme0n1 Samsung SSD 990 PRO nvme    0  1.8T
sda     Samsung SSD 860 EVO sata    0  1.8T

What it means: TRAN tells you the interface. ROTA=0 means “non-rotational” (SSD).
Decision: if your datadir is still on sda, your NVMe purchase is currently decorative.

Task 2: Verify the MySQL/MariaDB datadir and which filesystem it lives on

cr0x@server:~$ mysql -e "SHOW VARIABLES LIKE 'datadir';"
+---------------+-----------------+
| Variable_name | Value           |
+---------------+-----------------+
| datadir       | /var/lib/mysql/ |
+---------------+-----------------+

cr0x@server:~$ df -Th /var/lib/mysql
Filesystem     Type  Size  Used Avail Use% Mounted on
/dev/nvme0n1p2 ext4  1.8T  620G  1.1T  37% /var/lib/mysql

What it means: you’re on ext4 backed by NVMe partition nvme0n1p2.
Decision: note filesystem type; it affects durability semantics and tuning options.

Task 3: Check mount options that influence write barriers and latency

cr0x@server:~$ findmnt -no TARGET,SOURCE,FSTYPE,OPTIONS /var/lib/mysql
/var/lib/mysql /dev/nvme0n1p2 ext4 rw,relatime,errors=remount-ro

What it means: basic ext4 mount. No exotic options shown.
Decision: if you see oddities like data=writeback or disabled barriers, treat it as a risk review, not a “performance tweak.”

Task 4: Confirm InnoDB flush method and commit durability knobs

cr0x@server:~$ mysql -e "SHOW VARIABLES WHERE Variable_name IN ('innodb_flush_method','innodb_flush_log_at_trx_commit','sync_binlog');"
+--------------------------------+----------+
| Variable_name                  | Value    |
+--------------------------------+----------+
| innodb_flush_log_at_trx_commit | 1        |
| innodb_flush_method            | O_DIRECT |
| sync_binlog                    | 1        |
+--------------------------------+----------+

What it means: full durability: flush redo on every commit; fsync binlog on every transaction.
Decision: if latency is your problem and you can tolerate some risk, you may consider relaxing one knob—but only after measuring fsync waits (Task 10).

Task 5: Measure device health and whether it’s throttling or erroring

cr0x@server:~$ sudo smartctl -a /dev/nvme0n1 | sed -n '1,25p'
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.8.0] (local build)
=== START OF INFORMATION SECTION ===
Model Number:                       Samsung SSD 990 PRO
Serial Number:                      S7X...
Firmware Version:                   5B2QJXD7
PCI Vendor/Subsystem ID:            0x144d
IEEE OUI Identifier:                0x002538
Total NVM Capacity:                 2,000,398,934,016 [2.00 TB]
Unallocated NVM Capacity:           0
Controller ID:                      4
Number of Namespaces:               1
Namespace 1 Size/Capacity:          2,000,398,934,016 [2.00 TB]

What it means: device is recognized and provides firmware info.
Decision: if you see critical warnings, media errors, or high temperature, stop “tuning MySQL” and fix the hardware path first.

Task 6: Check NVMe temperature and thermal throttling signals

cr0x@server:~$ sudo smartctl -a /dev/nvme0n1 | grep -E "Temperature:|Critical Warning|Thermal"
Critical Warning:                   0x00
Temperature:                        64 Celsius

What it means: 64°C is warm but not catastrophic for many drives; throttling thresholds vary.
Decision: if temps climb during peak and latency spikes match, add airflow/heatsink or move the device; don’t argue with thermodynamics.

Task 7: Inspect the kernel’s view of I/O scheduler and queueing

cr0x@server:~$ cat /sys/block/nvme0n1/queue/scheduler
[mq-deadline] none kyber bfq

cr0x@server:~$ cat /sys/block/sda/queue/scheduler
[mq-deadline] none kyber bfq

What it means: mq-deadline is active.
Decision: if you’re chasing tail latency, scheduler choice can matter. But change it only with before/after latency evidence, not vibes.
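
If you do test a different scheduler, the change can be made at runtime and reverted just as quickly; it does not
survive a reboot. A sketch, assuming nvme0n1 is the device under test:

cr0x@server:~$ echo none | sudo tee /sys/block/nvme0n1/queue/scheduler    # runtime-only change
cr0x@server:~$ cat /sys/block/nvme0n1/queue/scheduler                     # the active scheduler is shown in brackets

Capture latency before and after; if the change genuinely helps, persist it with a udev rule instead of a forgotten one-liner.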

Task 8: Validate whether you’re CPU bound during “storage problems”

cr0x@server:~$ mpstat -P ALL 1 5
Linux 6.8.0 (server)  12/29/2025  _x86_64_ (16 CPU)

12:01:22 PM  CPU   %usr %nice %sys %iowait %irq %soft %steal %idle
12:01:23 PM  all   62.1  0.0   9.2    1.1   0.0   0.6    0.0  27.0

What it means: low iowait, high user CPU. That screams “queries/compute” more than “disk is slow.”
Decision: move attention to query plans, contention, and buffer pool efficiency before replacing drives again.
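
A quick companion probe when CPU is high: how many sessions are actively executing right now versus merely connected.

cr0x@server:~$ mysql -e "SHOW GLOBAL STATUS WHERE Variable_name IN ('Threads_running','Threads_connected');"

If Threads_running sits well above your core count during the slow window, you are queueing on CPU or locks, and no
storage purchase will change that.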

Task 9: Confirm disk-level pressure and latency with iostat

cr0x@server:~$ iostat -x 1 5
Linux 6.8.0 (server)  12/29/2025  _x86_64_ (16 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          58.10    0.00    8.90    2.40    0.00   30.60

Device            r/s     rkB/s   rrqm/s  %rrqm  r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm  w_await wareq-sz  aqu-sz  %util
nvme0n1         520.0  68400.0     0.0    0.0    2.10   131.5     880.0  99200.0     0.0    0.0    8.70   112.7     8.3   78.0

What it means: write await around 8.7ms indicates meaningful pressure. Treat %util with suspicion: for devices that serve requests in parallel (NVMe especially) it is not a saturation metric, and a sync-heavy workload can keep even a fast drive “busy.”
Decision: correlate with InnoDB log waits and fsync time; if await spikes align with commit latency, tune the log/checkpoint path, not the buffer pool.
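
If the host runs more than just the database, confirm which process is generating that write stream; pidstat (from the
same sysstat package as mpstat and iostat) gives a per-process view:

cr0x@server:~$ pidstat -d 1 5    # per-process kB_rd/s, kB_wr/s and iodelay, five one-second samples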

Task 10: Identify MySQL/MariaDB wait classes (especially I/O and log)

cr0x@server:~$ mysql -e "SELECT EVENT_NAME, COUNT_STAR, SUM_TIMER_WAIT/1000000000000 AS seconds_waited
FROM performance_schema.events_waits_summary_global_by_event_name
WHERE EVENT_NAME LIKE 'wait/io/file/%' OR EVENT_NAME LIKE 'wait/synch/%'
ORDER BY SUM_TIMER_WAIT DESC
LIMIT 10;"
+------------------------------------------+------------+----------------+
| EVENT_NAME                               | COUNT_STAR | seconds_waited |
+------------------------------------------+------------+----------------+
| wait/io/file/innodb/innodb_log_file      | 1823345    | 913.42         |
| wait/io/file/innodb/innodb_data_file     | 937223     | 411.08         |
| wait/synch/mutex/innodb/log_sys_mutex    | 8234432    | 210.55         |
+------------------------------------------+------------+----------------+

What it means: log file waits dominate. That often maps to fsync latency or log mutex contention.
Decision: focus on redo log sizing, flush settings, and commit concurrency; NVMe alone won’t fix a serialized log bottleneck.
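
To see which files those waits actually land on (redo vs data files vs binlog), Performance Schema keeps a per-file
summary; a sketch using the same picoseconds-to-seconds conversion as above:

cr0x@server:~$ mysql -e "SELECT FILE_NAME, COUNT_STAR, SUM_TIMER_WAIT/1000000000000 AS seconds_waited
FROM performance_schema.file_summary_by_instance
ORDER BY SUM_TIMER_WAIT DESC
LIMIT 5;"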

Task 11: Check InnoDB checkpoint pressure and dirty page behavior

cr0x@server:~$ mysql -e "SHOW ENGINE INNODB STATUS\G" | sed -n '1,120p'
...
Log sequence number          834992230144
Log flushed up to            834992229120
Last checkpoint at           834991100928
...
Modified db pages            214833
Pending writes: LRU 0, flush list 12, single page 0
...

What it means: pending flush list writes and lots of modified pages can mean background flushing is struggling.
Decision: if checkpoint age stays high and pending writes climb during peak, consider larger redo logs and review innodb_io_capacity settings after measuring actual device capability.
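
The number worth computing from that output is the checkpoint age: Log sequence number minus Last checkpoint at.
Here it is tiny; if it climbs toward a large fraction of your redo capacity at peak, InnoDB switches to aggressive
flushing to protect the log, and latency cliffs follow.

cr0x@server:~$ echo $(( 834992230144 - 834991100928 ))    # checkpoint age in bytes, from the status output above
1129216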

Task 12: Verify buffer pool hit rate and whether reads are coming from disk

cr0x@server:~$ mysql -e "SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%';"
+---------------------------------------+------------+
| Variable_name                         | Value      |
+---------------------------------------+------------+
| Innodb_buffer_pool_read_requests      | 9823344123 |
| Innodb_buffer_pool_reads              | 92233441   |
+---------------------------------------+------------+

What it means: lots of disk reads relative to requests can mean the working set doesn’t fit in memory or queries scan too much.
Decision: if disk reads spike during slow periods, either add memory / resize buffer pool or fix the queries that scan.
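
Counters accumulated since startup can hide what is happening right now; the current miss rate matters more than the
lifetime ratio. mysqladmin can print per-interval deltas; a sketch with 10-second samples:

cr0x@server:~$ mysqladmin -r -i 10 -c 6 extended-status | grep -E 'Innodb_buffer_pool_(reads|read_requests)'

The first sample is absolute, the rest are deltas; a steady stream of thousands of Innodb_buffer_pool_reads per
interval means real disk reads now, not stale history.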

Task 13: Confirm redo log sizing (and whether it’s absurdly small)

cr0x@server:~$ mysql -e "SHOW VARIABLES LIKE 'innodb_redo_log_capacity'; SHOW VARIABLES LIKE 'innodb_log_file_size';"
+--------------------------+------------+
| Variable_name            | Value      |
+--------------------------+------------+
| innodb_redo_log_capacity | 2147483648 |
+--------------------------+------------+
+----------------------+-------+
| Variable_name        | Value |
+----------------------+-------+
| innodb_log_file_size | 0     |
+----------------------+-------+

What it means: using capacity-based redo sizing (2GiB here). Tiny logs can force frequent checkpoints; huge logs can hide problems but increase crash recovery time.
Decision: if log waits dominate and checkpoint pressure is high, increasing redo capacity is often a clean win—verify recovery objectives first.
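
On MySQL 8.0.30+ redo capacity is a dynamic variable, so resizing does not require a restart; MariaDB sizes redo
through innodb_log_file_size instead. A hedged sketch with an illustrative 8 GiB value (make it permanent in my.cnf
only after checking recovery-time expectations):

cr0x@server:~$ mysql -e "SET GLOBAL innodb_redo_log_capacity = 8 * 1024 * 1024 * 1024;"    # illustrative size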

Task 14: Validate that you’re not secretly limited by filesystem free space or fragmentation pressure

cr0x@server:~$ df -h /var/lib/mysql
Filesystem      Size  Used Avail Use% Mounted on
/dev/nvme0n1p2  1.8T  1.6T  140G  92% /var/lib/mysql

What it means: 92% full is a performance smell. SSDs and filesystems behave worse when nearly full; MySQL can also struggle with purge and temp space.
Decision: free space is performance. Budget it like you budget CPU.
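
Before deleting or moving anything, find out what the space actually is (data, binlogs, undo, temp); a quick
breakdown that stays on one filesystem:

cr0x@server:~$ sudo du -xsh /var/lib/mysql/* | sort -h | tail -n 10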

Task 15: Check temporary table spill behavior (often misdiagnosed as “disk is slow”)

cr0x@server:~$ mysql -e "SHOW GLOBAL STATUS LIKE 'Created_tmp%';"
+-------------------------+---------+
| Variable_name           | Value   |
+-------------------------+---------+
| Created_tmp_disk_tables | 184223  |
| Created_tmp_tables      | 231002  |
+-------------------------+---------+

What it means: many temp tables go to disk. That can punish both SATA and NVMe, especially if tmpdir sits on slower storage.
Decision: identify the queries creating temp tables; adjust indexes or query shape first, then tune temp table sizes carefully.
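
If the sys schema is available, it can point at the offending statements directly; a sketch (view and column names
are from the MySQL sys schema, so check availability on your MariaDB version first):

cr0x@server:~$ mysql -e "SELECT query, exec_count, disk_tmp_tables
FROM sys.statements_with_temp_tables
ORDER BY disk_tmp_tables DESC
LIMIT 5;"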

Task 16: Watch replication for I/O-induced lag (commit stalls can ripple)

cr0x@server:~$ mysql -e "SHOW SLAVE STATUS\G" | egrep "Seconds_Behind_Master|Slave_SQL_Running|Slave_IO_Running|Relay_Log_Space"
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Seconds_Behind_Master: 87
Relay_Log_Space: 2143027200

What it means: lag and large relay logs can be caused by slow apply due to fsync pressure, insufficient parallelism, or heavy reads.
(On MySQL 8.0.22+ the statement is SHOW REPLICA STATUS and the lag field is Seconds_Behind_Source; recent MariaDB accepts both spellings.)
Decision: if lag rises during write peaks, treat storage + durability settings as part of replication performance, not a separate world.
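
One frequent contributor to apply-side lag is single-threaded apply. The variable names differ (MySQL:
replica_parallel_workers, formerly slave_parallel_workers; MariaDB: slave_parallel_threads), so a wildcard keeps the
check portable:

cr0x@server:~$ mysql -e "SHOW VARIABLES LIKE '%parallel%';"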

Fast diagnosis playbook

This is the “I have 15 minutes and a manager hovering behind me” version. Don’t debate architecture. Find the bottleneck.

First: confirm whether the database is waiting on I/O or something else

  • Run mpstat: if iowait is low and CPU is high, it’s not “the disk” (Task 8).
  • Run iostat -x: look at await and %util for your datadir device (Task 9).
  • Check top wait events in Performance Schema: if log file I/O dominates, you’re in commit path trouble (Task 10).

Second: determine if it’s read pressure, write pressure, or commit latency

  • Read pressure: buffer pool reads rising (Task 12), lots of disk reads in iostat, slow queries showing full scans.
  • Write pressure: dirty pages + pending flushes + checkpoint lag in InnoDB status (Task 11).
  • Commit latency: log file waits and mutex waits, plus user-visible latency spikes on write endpoints (Task 10 + Task 4).

Third: validate the storage path and eliminate “it’s NVMe so it’s fine” assumptions

  • Confirm datadir actually sits on NVMe (Task 1–2).
  • Check device health/temperature (Task 5–6).
  • Confirm filesystem/mount options aren’t doing something surprising (Task 3).

Fourth: decide what to change (and what not to touch)

  • If waits point to queries/locks: fix queries, indexes, transaction scope. Don’t tweak filesystem flags to compensate for missing indexes.
  • If waits point to redo/binlog fsync: tune redo sizing and commit policy with clear durability requirements.
  • If the device shows throttling/errors: fix hardware, cooling, firmware, or placement before touching DB settings.

Common mistakes: symptom → root cause → fix

1) “We moved to NVMe but writes are still spiky”

Symptom: p95 latency spikes mostly on write endpoints; CPU looks okay.

Root cause: commit path is serialized by redo log flushing, binlog fsync, or log mutex contention; device throughput doesn’t help.

Fix: measure log waits (Task 10), check durability knobs (Task 4), increase redo capacity (Task 13), and validate binlog fsync policy matches business needs.

2) “It’s slow only during reports”

Symptom: OLTP workload slows when analytics/report queries run.

Root cause: buffer pool churn + long scans + temp table spills; storage becomes the victim, not the culprit.

Fix: identify top scanning queries, add the right indexes, offload reports to replica, and watch Created_tmp_disk_tables (Task 15).

3) “I/O wait is low but users complain”

Symptom: low system iowait; app timeouts persist.

Root cause: lock contention, hot rows, or CPU saturation. Storage upgrade won’t touch it.

Fix: inspect waits (Task 10), review slow query log and transaction duration, reduce contention (batching, indexes, smaller transactions).

4) “Replication lag appeared after we ‘optimized durability’”

Symptom: replicas fall behind during peaks; master seems fast.

Root cause: relaxing fsync settings changed workload characteristics; replicas now handle bursts differently, or the binlog/relay log I/O pattern worsened.

Fix: re-check sync_binlog and commit policies; measure relay log space growth and apply behavior (Task 16).

5) “NVMe is fast, but the server freezes for seconds”

Symptom: periodic stalls; graphs show sharp latency cliffs.

Root cause: thermal throttling, device firmware quirks, or filesystem writeback storms when nearly full.

Fix: check temperature and warnings (Task 6), keep free space (Task 14), and verify kernel/device behavior during stalls.

6) “We benchmarked 3GB/s, so production should be fine”

Symptom: synthetic throughput is great; production is not.

Root cause: wrong metric. OLTP is about small random reads/writes, fsync latency, and tail behavior under concurrency.

Fix: use workload-shaped tests, focus on p99 fsync and log waits, and compare under realistic concurrency (Task 9–11).

Three corporate mini-stories from the trenches

Mini-story #1: The incident caused by a wrong assumption

A mid-sized SaaS company migrated their primary database from SATA SSD to NVMe during a planned maintenance window.
They did the right basic steps: copy data, update fstab, verify datadir. The change request said “expect 3–5x I/O improvement.”
The next morning, support lit up: random checkout failures, weird spikes in order latency, and a few alarming deadlocks.

The on-call team assumed the NVMe drive was defective because the symptoms were “I/O-ish.” They ran a quick sequential write test,
got big numbers, and concluded “hardware is fine.” Then they spent hours changing kernel parameters and disabling things they didn’t understand.
Latency got worse. Naturally.

The real problem wasn’t the device. It was a single hot table with a long-running transaction pattern introduced the week before.
NVMe made some parts faster, which increased concurrency, which increased lock contention, which increased tail latency.
The storage change coincided with the incident, so it took the blame. Humans love a neat story.

When they finally checked Performance Schema waits, the top wait was not disk I/O. It was lock waits and a log mutex wait caused by
heavy commit contention. They fixed the application transaction scope, added an index that removed a scan, and the “NVMe incident” vanished.
The postmortem lesson wasn’t “don’t buy NVMe.” It was “don’t promote correlation to causation just because it fits in a Slack message.”

Mini-story #2: The optimization that backfired

Another organization had a MariaDB cluster on decent NVMe, but commit latency was hurting their API.
Someone suggested the classic move: relax durability. They changed innodb_flush_log_at_trx_commit and sync_binlog
to reduce fsync frequency. The write latency dropped. Everyone cheered. The change was declared “safe” because “we have replicas.”

Two weeks later, a power event hit one rack. Not catastrophic—just enough to reboot several nodes.
One primary came back with missing transactions that had been acknowledged to clients. Replicas didn’t save them because the lost transactions
had never made it to stable media on the primary in the first place, and the failover automation promoted the wrong node.
Now they had a data integrity incident. Those are the ones that make people speak quietly in meetings.

The backfire wasn’t “durability settings are evil.” It was the mismatch between business requirements and engineering choices.
If you must relax durability, you do it deliberately: with power-loss-protected devices, clear recovery procedures,
and an explicit agreement on what “acknowledged write” means.

They reverted the settings, then fixed the actual bottleneck: log contention and checkpoint pressure.
They resized redo logs, confirmed cooling to avoid throttling, and separated binlogs onto a different volume.
The API latency improved again—without lying to users about durability.

Mini-story #3: The boring but correct practice that saved the day

A financial services team ran MySQL on NVMe, but they treated storage as a reliability component, not a performance sticker.
They had a routine: weekly checks of SMART health, temperature baselines under peak, and a small set of “golden commands” captured in runbooks.
They also tracked p99 fsync time as a first-class metric.

One Tuesday, they noticed a subtle increase in write latency—nothing dramatic, just enough to nudge API p95 upward.
The usual “it’s the network” theories started floating around. The on-call engineer didn’t argue; they pulled the runbook and ran the checks.
SMART showed temperatures trending higher, and iostat showed w_await rising under load.

They found a failed fan in the chassis. The NVMe drive had begun throttling during peak.
Replace fan, restore airflow, latency returns to baseline. No outage, no emergency maintenance, no executive summary full of hand-waving.
The fix was boring. The result was beautiful.

Joke #2: The best performance optimization is sometimes a $20 fan—because electrons also prefer not to be roasted.

Checklists / step-by-step plan

Step-by-step: isolate “storage is slow” from “database is slow”

  1. Verify placement: confirm datadir and tmpdir are on the intended device (Task 1–2).
  2. Confirm system pressure: mpstat + iostat during a slow window (Task 8–9).
  3. Check DB waits: Performance Schema top waits and InnoDB status (Task 10–11).
  4. Classify workload:

    • If log waits dominate: commit/fsync path.
    • If data file reads dominate: memory/query plan problem.
    • If mutex/lock waits dominate: contention/transaction design problem.
  5. Validate hardware health: SMART warnings and temperature (Task 5–6).
  6. Make one change at a time, with a before/after capture and rollback plan.

Checklist: NVMe readiness for MySQL/MariaDB

  • Drive has power-loss protection if you depend on safe write caching for latency.
  • Thermals are monitored; chassis airflow is validated under peak.
  • Filesystem choice and mount options are standardized across fleet.
  • Redo/binlog settings match business durability requirements (not just benchmarks).
  • Capacity headroom exists (don’t run at 90%+ full unless you enjoy surprises).
  • You can answer: “Top wait event at peak?” and “p99 fsync time?”

Checklist: MySQL vs MariaDB decision points (pragmatic edition)

  • Compatibility: confirm feature parity you rely on (replication mode, GTIDs, tooling).
  • Observability: pick the one your team can debug quickly with existing skills and dashboards.
  • Upgrade discipline: choose the ecosystem where you can patch regularly with predictable behavior.
  • Operational tooling: backups, verification, failover, schema migrations—this matters more than microbenchmarks.

FAQ

1) Should NVMe always outperform SATA SSD for MySQL?

Not always in ways you’ll feel. If you’re CPU-bound, lock-bound, or your working set fits in memory, NVMe won’t change much.
NVMe shines under high concurrency random I/O and when you need lower tail latency under sustained write load.

2) Why does “%iowait” stay low even when queries are slow?

Because the process might be waiting on locks, CPU scheduling, memory stalls, or internal MySQL mutexes.
Also, iowait is a CPU accounting metric, not a direct measure of disk latency. Use iostat and DB wait events instead.

3) Is innodb_flush_log_at_trx_commit=2 safe?

It’s a durability trade. With the value 2, redo is written to the OS on every commit but fsync’d about once per second,
so an OS crash or power loss can lose up to roughly a second of acknowledged transactions; a mysqld-only crash doesn’t, because the OS page cache still holds the writes.
Safe depends on your business contract with reality. Decide explicitly, and don’t hide it behind “performance tuning.”

4) My log file waits dominate. What’s the first knob to turn?

First measure. Then consider redo log capacity (too small causes checkpoint churn) and binlog fsync frequency.
If durability must remain strict (1/1), you’ll likely need better latency devices, PLP SSDs, or reduced commit concurrency at the app layer.

5) Does filesystem choice (ext4 vs XFS) matter for database performance?

It can, mostly through latency behavior under writeback and how it handles metadata and allocation patterns.
The bigger win is consistency and measurability across hosts. Pick one, standardize mount options, and baseline p99 fsync latency.

6) What’s the easiest way to tell if I’m read-bound or write-bound?

Look at Innodb_buffer_pool_reads and iostat reads vs writes, then cross-check Performance Schema.
High disk reads and data file waits imply read pressure. High log waits and pending flushes imply write/commit pressure.

7) How can a “faster disk” make lock contention worse?

By increasing throughput of some operations, you can increase concurrency and hot-spot pressure elsewhere.
Faster commits can cause more transactions to collide on the same rows or indexes, pushing contention to the front of the line.

8) Are consumer NVMe drives okay for production databases?

Sometimes, for non-critical workloads. For durability-sensitive systems, lack of power-loss protection and inconsistent steady-state latency
are common deal-breakers. Even when they’re “fast,” their failure modes are rarely polite.

9) MySQL vs MariaDB: which is faster on NVMe?

Depends on version, workload, configuration, and features used. In real operations, “faster” is less important than
“predictably debuggable” and “behaves well under failure.” Benchmark your own workload and compare wait profiles, not just QPS.

10) What single metric should I alert on to catch storage-related DB pain early?

If you can only pick one, alert on commit-related latency or log file wait time (from DB instrumentation) plus device temperature.
Those two catch a surprising amount of real-world pain.

Conclusion: practical next steps

NVMe is a great tool. It’s not a substitute for understanding your database’s wait profile.
If your MySQL or MariaDB system still feels slow after moving from SATA SSD to NVMe, assume nothing and measure everything that matters:
commit path, checkpoint pressure, buffer pool misses, and device health.

  1. Run the fast diagnosis playbook during a slow window and capture outputs.
  2. Classify the bottleneck (read pressure vs write/commit pressure vs contention/CPU).
  3. Fix the highest-leverage constraint first: query plans/locks, redo sizing/checkpoints, or hardware thermals/health.
  4. Make one change at a time. Keep a rollback. Baseline p95/p99 latency, not just throughput.

Your goal isn’t “NVMe everywhere.” Your goal is a database that behaves predictably at peak, fails honestly, and can be debugged quickly by tired humans.
