Spinning disks don’t fail gracefully. They fail like a meeting invite: quietly at first, then suddenly everyone is late and nobody knows why.
When MariaDB or PostgreSQL lands on HDD and the workload stops fitting in memory, you get a specific kind of misery: high latency, stalled commits,
angry application timeouts, and a dashboard that looks like a heart monitor.
The practical question isn’t “Which database is faster?” It’s: which one degrades less badly when the disk turns into the bottleneck,
and what knobs actually work when you can’t “just add SSDs” this quarter.
The blunt answer: who suffers more on HDD
Under disk pressure on HDD, both suffer. But they suffer in different ways, and the “who suffers more” answer depends on
what kind of disk pressure you mean.
- Random-read pressure (working set doesn’t fit in RAM, many point lookups):
  MariaDB/InnoDB often degrades harder. InnoDB’s performance cliff on HDD is steep when buffer pool misses turn into random seeks.
- Write/commit pressure (many transactions, durability on):
  PostgreSQL can look worse if WAL (and fsync behavior) lands on the same slow device and checkpoint tuning is neglected.
  You’ll see commit latency spikes and “checkpoint storms.”
- Mixed OLTP (typical web workload):
  MariaDB tends to be more sensitive to random I/O and secondary index patterns; PostgreSQL tends to be more sensitive to WAL/checkpoint/vacuum pacing.
If you force me into a headline: MariaDB tends to “suffer more” on HDD when you are IOPS-bound on random reads;
PostgreSQL tends to “suffer more” when you are fsync/WAL/checkpoint bound on writes and you tune nothing.
In real production, you can make either one terrible with the right combination of defaults, schema, and wishful thinking.
Opinionated operational guidance: if you must run on HDD, you should bias toward
fewer random I/Os, bigger memory, slower but steadier flushing, and isolation between sequential write paths and random read paths.
Both databases can be made survivable on HDD, but neither is forgiving when your workload turns into a seek-fest.
What HDD hates (and what it tolerates)
HDD isn’t “slow.” It’s inconsistent. Sequential throughput can look okay on paper. The killer is seek latency and queueing.
Once you’re doing lots of random I/O at low block sizes, a single disk behaves like a tiny gate that everyone tries to squeeze through at once.
HDD is allergic to random I/O
Random reads are expensive because the head must move. Add concurrency and you don’t get “parallelism,” you get “musical chairs.”
OLTP patterns (point lookups, secondary index probes, nested loops) turn into scattered reads and the disk becomes the global mutex.
HDD tolerates sequential writes… until it doesn’t
Sequential log writes can be okay on HDD: append-heavy, mostly linear. That’s where PostgreSQL WAL and InnoDB redo logs
can be your friend—if the system keeps them sequential. But checkpointing, doublewrite, page flushing, and vacuum can all turn
“sequential-ish” into “random-ish,” and your queue depth turns into a punchline.
Joke #1: Running a write-heavy database on a single HDD is like doing distributed systems with sticky notes—technically possible, socially expensive.
The HDD bottleneck you actually observe: latency, not bandwidth
On production incidents, the signal is usually:
await times jump, application p99 latency follows, and throughput drops.
People stare at “MB/s” graphs and miss the real story: a 10ms seek becomes 100ms when the queue piles up, and now your
transaction’s fsync is waiting behind a vacuum and a checkpoint.
Why InnoDB and PostgreSQL behave differently under pressure
InnoDB (MariaDB): buffer pool misses become random reads
InnoDB is a page-based engine. Data and indexes live in B-trees. When a page isn’t in the buffer pool, it’s fetched from disk.
On SSD, you mostly pay a small penalty. On HDD, each miss can mean a seek.
InnoDB’s strengths—adaptive hash index, change buffering, background flushing—help, but they can’t repeal physics.
If your hot set doesn’t fit in RAM and the access pattern is random, InnoDB becomes a random I/O generator.
- Doublewrite buffer: protects against partial page writes (important on HDD), but adds write overhead.
- Redo log: sequential-ish, but flush policies can force frequent fsync.
- Change buffer (insert buffer): can defer some secondary index maintenance, reducing random writes, but later merges can spike I/O.
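A minimal sketch for checking how those mechanisms are set on a running server. The variable names are standard InnoDB settings; note that the change buffer has been deprecated and then removed in recent MariaDB releases, so that variable may be absent on your version.
cr0x@server:~$ sudo mariadb -e "SHOW GLOBAL VARIABLES WHERE Variable_name IN ('innodb_doublewrite','innodb_flush_log_at_trx_commit','innodb_change_buffering','innodb_io_capacity');"
If innodb_flush_log_at_trx_commit is 1 and everything shares one spindle, every commit pays for a head movement somewhere.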
PostgreSQL: WAL is sequential; checkpoints decide whether you’ll sleep
PostgreSQL also uses fixed-size pages (blocks) and a shared buffer cache. But its write path is centered on WAL:
commit writes WAL records, and data pages are written later by background writer / checkpointer.
On HDD, WAL being sequential is good. The trap is that PostgreSQL’s durability model and checkpointing can create bursts:
a big checkpoint can flush many dirty buffers quickly, causing random writes and contention with reads. If you tune poorly,
you get latency sawtooths: calm, then chaos, repeating on schedule.
- WAL fsync: commit latency depends on fsync behavior and WAL device performance.
- Checkpoints: force dirty pages to disk; too frequent or too spiky equals I/O storms.
- Autovacuum: keeps bloat in check but can create background read/write load; misconfigured, it becomes a silent disk tax.
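For orientation, a hedged postgresql.conf sketch of the checkpoint-smoothing direction described above. The numbers are illustrative starting points for an HDD box, not recommendations; test them against your own tail latency.
# postgresql.conf (illustrative values; measure before trusting them)
max_wal_size = 8GB                    # more WAL headroom = fewer forced ("requested") checkpoints
checkpoint_timeout = 15min            # checkpoints on a calmer schedule
checkpoint_completion_target = 0.9    # spread checkpoint writes across most of the interval
wal_compression = on                  # fewer WAL bytes written, at some CPU cost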
Where HDD pain hides: write amplification and background work
When people say “disk pressure,” they often mean “the database is doing more I/O than I think it should.”
That’s usually write amplification:
- Too many dirty pages flushed per unit of useful work (bad checkpoint/flushing behavior).
- Bloat and fragmentation that makes reads/writes touch more pages.
- Indexes that multiply writes (every insert/update hits multiple B-trees).
- Replication/logging settings that force synchronous disk behavior at higher frequency than needed.
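One concrete way to hunt the “indexes that multiply writes” item: list indexes that are rarely read but still maintained on every insert/update. A minimal sketch, assuming statistics have had time to accumulate since the last reset:
cr0x@server:~$ sudo -u postgres psql -c "select schemaname, relname, indexrelname, idx_scan, pg_size_pretty(pg_relation_size(indexrelid)) as size from pg_stat_user_indexes order by idx_scan asc, pg_relation_size(indexrelid) desc limit 10;"
An index with idx_scan near zero and a size in gigabytes is a write tax with no read benefit; just confirm it isn’t backing a unique or foreign-key constraint before dropping it.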
A reliability quote that belongs on every ops wall
Hope is not a strategy.
— Gen. Gordon R. Sullivan
On HDD, hoping the kernel cache or “some tuning later” will save you is how you end up doing incident response with your lunch getting cold.
Disk pressure failure modes you’ll actually see
1) The “everything is slow but CPU is idle” incident
CPU looks bored, load average is high, and the database threads are “running” but not doing anything productive.
On Linux, that’s often iowait and blocked tasks. HDD queue depth climbs, await climbs, and your p95 latency turns into an apology.
2) Commit latency spikes (fsync stalls)
In PostgreSQL, this often shows up as slow commits when WAL fsync is delayed by saturated disk.
In MariaDB/InnoDB, redo log flushes and fsync can similarly stall commits depending on flush settings.
3) Background maintenance fights your foreground traffic
Checkpoints, vacuum/autovacuum, and InnoDB page cleaners are “helpful” until they aren’t.
On HDD, the worst pattern is: background work becomes aggressive, disk latency spikes, and then foreground work slows,
which causes more backlog, which causes more dirty pages, which makes the background work more aggressive. It’s a feedback loop with a pager.
4) The “we added an index and everything got worse” surprise
Indexes speed reads but multiply writes. On HDD, that write multiplier is a tax you pay with interest: more pages dirtied, more flushing,
more random I/O, more bloat. If you don’t measure, you’ll optimize yourself into an outage.
5) Slow replication and “IO thread” backlog
Disk saturation slows writing of WAL/binlogs and slows applying changes. Replicas fall behind, then failover becomes exciting in the wrong way.
HDD gives you fewer chances to be sloppy about where logs land and how many fsyncs you demand per second.
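Two quick lag checks, one per engine; both use standard views and commands, and the thresholds that matter are yours. Run the first on the PostgreSQL primary and the second on the MariaDB replica.
cr0x@server:~$ sudo -u postgres psql -c "select application_name, state, pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn)) as replay_lag from pg_stat_replication;"
cr0x@server:~$ sudo mariadb -e "SHOW SLAVE STATUS\G" | egrep 'Slave_IO_Running|Slave_SQL_Running|Seconds_Behind_Master'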
Fast diagnosis playbook
This is the order I use when a database on HDD is melting down. It’s designed to find the bottleneck quickly, not to be “complete.”
You can do this in 5–10 minutes if you have access.
First: confirm it’s disk latency (not CPU, not locks)
- Check await and %util per device. If await is high and util is pegged, you’re I/O bound.
- Check if the DB is waiting on fsync or buffer I/O vs waiting on locks.
- Confirm memory pressure: if you’re swapping or the cache hit rate fell off a cliff, HDD is being asked to do random reads it can’t handle.
Second: identify whether reads or writes are driving the pain
- Look at read/write IOPS and average request size. Random reads look like small reads with high await.
- Check database stats: buffer cache hit ratio, WAL/redo generation, checkpoint activity, dirty page flushing.
Third: pick the least-worst mitigation
- If random reads: increase effective cache (RAM, buffer pool/shared_buffers), reduce working set, fix queries/indexes.
- If write bursts: smooth checkpoints/flushing, separate WAL/redo onto its own device if possible, reduce sync frequency only if acceptable.
- If maintenance is the bully: throttle vacuum or page cleaners, schedule heavy tasks off-peak, and address bloat/fragmentation properly.
Practical tasks: commands, outputs, decisions (12+)
These are production-grade checks. Each includes: command, what output means, and the decision you make.
Run them on the database host (and sometimes inside psql/mariadb). Adjust paths and credentials to your environment.
Task 1: Check per-disk latency and saturation
cr0x@server:~$ iostat -x 1 5
Linux 6.5.0 (db01) 12/30/2025 _x86_64_ (8 CPU)
Device r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await %util
sda 35.0 120.0 1.2 6.8 128 7.50 58.2 42.1 62.9 99.4
Meaning: %util near 100% plus await tens of ms indicates the disk is saturated. avgqu-sz shows a queue.
Decision: treat this as an I/O incident. Stop blaming CPU. Next, determine whether reads or writes dominate and whether WAL/redo shares the same device.
Task 2: See if the kernel is spending time in iowait
cr0x@server:~$ vmstat 1 5
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
3 12 0 51200 12000 820000 0 0 980 5400 900 1800 10 6 40 44 0
Meaning: high b (blocked) and high wa means threads are waiting on disk.
Decision: proceed to DB-side wait analysis; also check for swapping (si/so) because swap on HDD is performance arson.
Task 3: Confirm you’re not swapping (or worse, thrashing)
cr0x@server:~$ free -h
total used free shared buff/cache available
Mem: 31Gi 26Gi 120Mi 1.1Gi 4.9Gi 2.2Gi
Swap: 4.0Gi 1.8Gi 2.2Gi
Meaning: swap in use on a DB host is a red flag; available is dangerously low.
Decision: reduce memory footprint, adjust DB caches to avoid OOM but prevent swapping, or add RAM. On HDD, swap use often correlates with random read storms.
Task 4: Identify top I/O consumers at process level
cr0x@server:~$ sudo iotop -oPa
Total DISK READ: 18.25 M/s | Total DISK WRITE: 72.10 M/s
PID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND
2211 be/4 postgres 2.10 M/s 18.50 M/s 0.00 % 89.00 % postgres: checkpointer
1998 be/4 postgres 0.50 M/s 9.20 M/s 0.00 % 45.00 % postgres: autovacuum worker
1870 be/4 mysql 1.20 M/s 22.30 M/s 0.00 % 70.00 % mariadbd
Meaning: you can see if checkpointer/autovacuum (Postgres) or mariadbd (InnoDB flushing) is dominating writes.
Decision: if background processes dominate, tune checkpoint/vacuum (Postgres) or flushing settings (InnoDB), and separate log/data paths if possible.
Task 5: Verify filesystem mount options (barriers, atime)
cr0x@server:~$ mount | egrep ' /var/lib/(postgresql|mysql) '
/dev/sda2 on /var/lib/postgresql type ext4 (rw,relatime,errors=remount-ro)
/dev/sda3 on /var/lib/mysql type ext4 (rw,relatime,errors=remount-ro)
Meaning: relatime is fine; noatime can reduce metadata writes. The bigger question is whether logs and data share spindles.
Decision: consider noatime where appropriate; more importantly, plan device separation for WAL/redo and data on HDD.
Task 6: Check PostgreSQL wait events for I/O vs locks
cr0x@server:~$ sudo -u postgres psql -c "select wait_event_type, wait_event, count(*) from pg_stat_activity where state='active' group by 1,2 order by 3 desc;"
wait_event_type | wait_event | count
-----------------+---------------+-------
IO | DataFileRead | 18
IO | WALWrite | 6
Lock | relation | 2
Meaning: many sessions stuck on DataFileRead indicates random reads; WALWrite indicates log write pressure.
Decision: if DataFileRead dominates, chase cache hit rate, bad queries, and bloat. If WALWrite dominates, focus on WAL device and checkpoint tuning.
Task 7: PostgreSQL checkpoint behavior (are you causing storms?)
cr0x@server:~$ sudo -u postgres psql -c "select checkpoints_timed, checkpoints_req, checkpoint_write_time, checkpoint_sync_time, buffers_checkpoint from pg_stat_bgwriter;"
checkpoints_timed | checkpoints_req | checkpoint_write_time | checkpoint_sync_time | buffers_checkpoint
------------------+-----------------+-----------------------+----------------------+-------------------
120 | 310 | 9876543 | 432109 | 9823410
Meaning: high checkpoints_req vs timed suggests you’re hitting max_wal_size or checkpoint triggers often; large write/sync times suggest painful flushes.
Decision: increase max_wal_size, tune checkpoint_timeout and checkpoint_completion_target, and consider moving WAL to a dedicated spindle.
Task 8: PostgreSQL cache hit ratio (quick and dirty)
cr0x@server:~$ sudo -u postgres psql -c "select sum(blks_hit) as hit, sum(blks_read) as read, round(100.0*sum(blks_hit)/nullif(sum(blks_hit)+sum(blks_read),0),2) as hit_pct from pg_stat_database;"
hit | read | hit_pct
-----------+---------+---------
987654321 | 5432109 | 99.45
Meaning: ~99% can still be bad if the remaining 1% is huge at your QPS. On HDD, that 1% can be your whole outage.
Decision: if hit_pct drops during incident, increase memory effectiveness and reduce working set (indexes, query plans, partitioning, caching in app).
Task 9: PostgreSQL find heavy queries causing reads/writes
cr0x@server:~$ sudo -u postgres psql -c "select queryid, calls, rows, shared_blks_read, shared_blks_hit, (shared_blks_read*8) as read_kb, left(query,120) from pg_stat_statements order by shared_blks_read desc limit 5;"
queryid | calls | rows | shared_blks_read | shared_blks_hit | read_kb | left
----------+-------+-------+------------------+-----------------+---------+-------------------------------
12345678 | 9200 | 18400 | 812345 | 91234567 | 6498760 | SELECT * FROM events WHERE ...
Meaning: a few queries often dominate reads. On HDD, that’s your target list for query/index fixes.
Decision: fix the top offenders: add/adjust indexes (carefully), remove N+1 patterns, avoid large sorts/hashes on disk, reduce returned columns.
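Task 9 assumes pg_stat_statements is already enabled. If it isn’t, a minimal setup sketch follows; note that the ALTER SYSTEM line replaces any existing shared_preload_libraries value, the restart belongs in a quiet window, and the service name varies by distro.
cr0x@server:~$ sudo -u postgres psql -c "ALTER SYSTEM SET shared_preload_libraries = 'pg_stat_statements';"
cr0x@server:~$ sudo systemctl restart postgresql
cr0x@server:~$ sudo -u postgres psql -c "CREATE EXTENSION IF NOT EXISTS pg_stat_statements;"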
Task 10: MariaDB check InnoDB buffer pool and read pressure
cr0x@server:~$ sudo mariadb -e "SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%';"
+---------------------------------------+------------+
| Variable_name | Value |
+---------------------------------------+------------+
| Innodb_buffer_pool_read_requests | 9876543210 |
| Innodb_buffer_pool_reads | 43210987 |
+---------------------------------------+------------+
Meaning: Innodb_buffer_pool_reads are physical reads. On HDD, if this climbs fast, you’re doing disk seeks.
The ratio gives you a clue; the rate during incident is more important than the lifetime total.
Decision: if physical reads spike, increase innodb_buffer_pool_size (within RAM limits), fix indexes/queries, and reduce table scans.
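If you decide to grow the buffer pool and the RAM genuinely exists, recent MariaDB versions can resize it online. A hedged sketch with an illustrative 20 GB target; check the resize status afterwards and persist the value in server config, or the next restart quietly undoes your fix.
cr0x@server:~$ sudo mariadb -e "SET GLOBAL innodb_buffer_pool_size = 20 * 1024 * 1024 * 1024;"
cr0x@server:~$ sudo mariadb -e "SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_resize_status';"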
Task 11: MariaDB check redo log flush behavior and fsync pressure
cr0x@server:~$ sudo mariadb -e "SHOW GLOBAL VARIABLES LIKE 'innodb_flush_log_at_trx_commit'; SHOW GLOBAL STATUS LIKE 'Innodb_os_log_fsyncs';"
+-------------------------------+-------+
| Variable_name | Value |
+-------------------------------+-------+
| innodb_flush_log_at_trx_commit| 1 |
+-------------------------------+-------+
+---------------------+----------+
| Variable_name | Value |
+---------------------+----------+
| Innodb_os_log_fsyncs| 55443322 |
+---------------------+----------+
Meaning: innodb_flush_log_at_trx_commit=1 is safest (fsync every commit). On HDD with high TPS, this can be brutal.
Innodb_os_log_fsyncs indicates how often.
Decision: do not change durability lightly. If the business accepts losing up to about a second of commits on an OS crash or power loss, setting it to 2 reduces fsync frequency; otherwise, move the redo log to faster storage or reduce the commit rate (batching).
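If batching is the route you pick, the sketch below shows the idea: group non-interactive writes into one transaction so the log is flushed once per batch instead of once per row. Table and column names are illustrative.
-- one log flush for the whole batch instead of one per row
START TRANSACTION;
INSERT INTO import_events (payload) VALUES ('row 1');
INSERT INTO import_events (payload) VALUES ('row 2');
-- ... a few hundred rows per batch is a common starting point ...
COMMIT;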
Task 12: MariaDB inspect InnoDB dirty page flushing and stalls
cr0x@server:~$ sudo mariadb -e "SHOW ENGINE INNODB STATUS\G" | egrep -i 'Modified db pages|buffer pool size|free buffers|Pending writes|Page cleaner'
Modified db pages 245678
Buffer pool size 2097152
Free buffers 1024
Pending writes: LRU 0, flush list 37, single page 0
Page cleaner: 1000ms intended loop took 8450ms
Meaning: many modified pages + page cleaner loop taking far longer than intended implies InnoDB can’t flush fast enough.
On HDD this often precedes stalls and sudden latency explosions.
Decision: tune innodb_io_capacity/innodb_io_capacity_max to match HDD reality (not fantasy), reduce write rate, and ensure redo log sizing is sane.
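“HDD reality” for a single 7200 RPM spindle is on the order of 150-200 random IOPS. A hedged sketch of setting InnoDB’s flushing targets to match; both variables are dynamic, and the values should be persisted in server config once they prove out.
cr0x@server:~$ sudo mariadb -e "SET GLOBAL innodb_io_capacity = 200; SET GLOBAL innodb_io_capacity_max = 400;"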
Task 13: Check whether WAL/redo and data are on the same device
cr0x@server:~$ lsblk -o NAME,SIZE,TYPE,MOUNTPOINTS
NAME SIZE TYPE MOUNTPOINTS
sda 1.8T disk
├─sda1 512M part /boot
├─sda2 900G part /var/lib/postgresql
└─sda3 900G part /var/lib/mysql
Meaning: everything on one disk means WAL/redo writes and data reads compete for head movement. On HDD, that’s a self-inflicted wound.
Decision: split: put WAL/binlog/redo on a different spindle (or RAID10 set) if you can. Even a “mediocre” dedicated disk can stabilize latency.
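If a second spindle appears, relocating PostgreSQL WAL is mostly mundane. A hedged sketch assuming a Debian-style layout and a new disk mounted at /mnt/wal (hypothetical path); it needs a brief stop of the cluster.
cr0x@server:~$ sudo systemctl stop postgresql
cr0x@server:~$ sudo mv /var/lib/postgresql/16/main/pg_wal /mnt/wal/pg_wal
cr0x@server:~$ sudo ln -s /mnt/wal/pg_wal /var/lib/postgresql/16/main/pg_wal
cr0x@server:~$ sudo chown -h postgres:postgres /var/lib/postgresql/16/main/pg_wal
cr0x@server:~$ sudo systemctl start postgresql
The same idea applies to MariaDB by pointing innodb_log_group_home_dir and the log_bin path at the other device in server config.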
Task 14: Identify disk queueing and scheduler
cr0x@server:~$ cat /sys/block/sda/queue/scheduler
[mq-deadline] none kyber bfq
Meaning: scheduler affects latency under mixed workloads. For HDD, a deadline-type scheduler often behaves better than “none.”
Decision: if you’re using none on HDD, consider mq-deadline (test first). Don’t cargo-cult; measure await and tail latency.
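A hedged sketch of switching the scheduler at runtime; the change does not survive a reboot, so persist it with a udev rule or kernel parameter only after it has proven itself in your await graphs.
cr0x@server:~$ echo mq-deadline | sudo tee /sys/block/sda/queue/scheduler
cr0x@server:~$ cat /sys/block/sda/queue/scheduler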
Task 15: PostgreSQL WAL volume and location
cr0x@server:~$ sudo -u postgres psql -c "show data_directory; show wal_level; show synchronous_commit;"
data_directory
-------------------------
/var/lib/postgresql/16/main
(1 row)
wal_level
-----------
replica
(1 row)
synchronous_commit
-------------------
on
(1 row)
Meaning: WAL settings determine how much WAL is generated and when commits wait. If synchronous_commit is on (default), commits wait for WAL flush.
Decision: keep synchronous_commit on unless you explicitly accept losing recent transactions. For HDD relief, move WAL to separate device and smooth checkpoints.
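One middle ground worth knowing: synchronous_commit can be relaxed per transaction, so only writes you explicitly mark can be lost on a crash (never corrupted). A minimal sketch; the table name is illustrative.
BEGIN;
SET LOCAL synchronous_commit = off;   -- applies to this transaction only
INSERT INTO activity_log (message) VALUES ('non-critical event');
COMMIT;                               -- returns without waiting for the WAL flush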
Task 16: PostgreSQL bloat indicator (quick table size reality check)
cr0x@server:~$ sudo -u postgres psql -c "select relname, pg_size_pretty(pg_total_relation_size(relid)) as total, n_live_tup, n_dead_tup from pg_stat_user_tables order by pg_total_relation_size(relid) desc limit 5;"
relname | total | n_live_tup | n_dead_tup
-----------+--------+------------+------------
events | 220 GB | 410000000 | 98000000
sessions | 85 GB | 150000000 | 72000000
Meaning: high dead tuples implies bloat. On HDD, bloat is extra pages = extra seeks = extra suffering.
Decision: tune autovacuum for that table, consider partitioning, and schedule vacuum/rewrite operations carefully (they’re I/O heavy).
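Per-table autovacuum tuning keeps the change scoped to the table that actually hurts. A hedged sketch against the events table from the output above; the factors are illustrative starting points.
cr0x@server:~$ sudo -u postgres psql -c "ALTER TABLE events SET (autovacuum_vacuum_scale_factor = 0.02, autovacuum_analyze_scale_factor = 0.02);"
Lower scale factors mean vacuum runs more often on smaller batches, which on HDD usually beats one giant pass during peak.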
Three corporate-world mini-stories from the trenches
Mini-story 1: The incident caused by a wrong assumption
A mid-sized SaaS company ran MariaDB on a pair of big HDDs in RAID1. The workload was classic OLTP: sessions, billing events,
and a couple of “helpful” analytics queries that kept sneaking into production.
The team assumed RAID1 meant “read performance is basically fine.” They also assumed their buffer pool hit ratio was “good enough”
because it was usually above 98%.
Then marketing shipped a feature that added a “search by email prefix” endpoint. It did a query that looked harmless,
but in practice it missed indexes under certain collation settings and produced range scans on a large table.
During peak hours, those scans caused a wave of buffer pool churn. The miss rate didn’t look dramatic as a percentage.
The absolute number of physical reads did.
On HDD, the random reads were lethal. Not because the disk couldn’t do throughput—it could read plenty of megabytes per second.
But it couldn’t seek fast enough. Latency rose, queries piled up, and connection pools saturated. The app started retrying.
The retries doubled the pressure. You know the rest.
The fix wasn’t exotic. They added the correct composite index, forced the query to use it, and then raised the buffer pool
so that the hot working set actually fit. They also put the “analytics-ish” endpoints onto a replica where slow queries could fail
without turning the primary into a random I/O generator.
The lesson that stuck: RAID1 doesn’t make random seeks cheap. It just gives you two sets of heads to disappoint you.
Mini-story 2: The optimization that backfired
A payments-adjacent platform ran PostgreSQL on HDD because the compliance environment was “stable” and procurement moved at glacier speed.
They got hammered by commit latency during batch import windows.
Someone proposed a “simple fix”: make checkpoints happen more frequently so each checkpoint writes less and finishes faster.
They reduced checkpoint_timeout and kept max_wal_size small. Checkpoints did finish “faster.”
Also, they happened constantly. That turned what could have been a smoother background write pattern into a near-permanent state of flushing.
The checkpointer and backend writes competed with reads. Autovacuum still existed, because entropy is undefeated.
The user-visible symptom was brutal: p99 latency developed a rhythmic pattern.
Every few minutes, the API would slow down enough to trigger timeouts.
The on-call person saw “checkpoint complete” messages in logs like a metronome from hell.
The eventual fix was the opposite: allow WAL to accumulate more (bigger max_wal_size), increase
checkpoint_completion_target so writes spread out, and move WAL onto its own disk.
Import windows became boring again. Nobody throws a party for “boring,” but that’s what you want.
Mini-story 3: The boring but correct practice that saved the day
A large internal enterprise system ran MariaDB and PostgreSQL side-by-side for different services.
Everything lived on HDD arrays because the environment was built years ago and the “fast tier” was reserved for other systems.
The team couldn’t change hardware quickly, so they did something unfashionable: they made I/O boring.
They split log paths from data paths wherever possible. PostgreSQL WAL got dedicated spindles. MariaDB redo logs and binlogs were separated too.
They also enforced a rule: no unreviewed schema changes during peak hours, and every new index required measuring the write cost on staging
with production-like data volumes.
They scheduled vacuum-heavy operations and InnoDB maintenance tasks off-peak, and they watched “background work” metrics like hawks:
checkpoint write time, buffer pool reads, dirty pages, and replication lag.
When incidents happened, they had a fast diagnosis playbook and people actually used it instead of launching a blame-based routing protocol.
The result wasn’t “fast.” It was stable. The business didn’t notice the disks were slow because the system stopped
oscillating between fine and on-fire. The team slept. Sleep is a feature.
Common mistakes: symptoms → root cause → fix
1) Symptom: p99 latency spikes every few minutes (PostgreSQL)
Root cause: checkpoint bursts (max_wal_size too small, checkpoint settings too aggressive, WAL and data fighting).
Fix: increase max_wal_size, raise checkpoint_completion_target (spread writes), and move WAL to a separate disk if possible.
2) Symptom: slow “simple” SELECTs, CPU low, disk await high (MariaDB)
Root cause: buffer pool misses creating random reads; missing or wrong indexes; hot set doesn’t fit RAM.
Fix: increase innodb_buffer_pool_size, fix indexes/query plans, reduce table scans, consider partitioning or caching.
3) Symptom: commits intermittently stall; “fsync” shows up everywhere
Root cause: log device saturated; too many transactions forcing frequent fsync; WAL/binlog/redo sharing HDD with data.
Fix: separate logs onto dedicated spindles; batch writes; keep durability settings unless you explicitly accept losing data.
4) Symptom: replication lag increases during peak
Root cause: primary disk saturated; replica applying changes can’t keep up; background maintenance steals I/O budget.
Fix: throttle maintenance, tune apply parallelism where applicable, separate log/data, and reduce write amplification (indexes, bloat).
5) Symptom: after adding an index, everything slows down
Root cause: write amplification (more index maintenance), more dirty pages, more flushing and random I/O.
Fix: add only indexes that pay for themselves; consider partial/covering indexes (Postgres), drop unused indexes, and validate with write-heavy testing.
6) Symptom: autovacuum “randomly” hurts performance (PostgreSQL)
Root cause: bloat and dead tuples accumulated; autovacuum is forced to work harder during peak, scanning heaps and indexes.
Fix: tune table-level autovacuum settings; keep dead tuples low; consider partitioning; schedule manual vacuum/rewrite where necessary.
7) Symptom: InnoDB stalls with “page cleaner” warnings
Root cause: flushing can’t keep up; dirty pages accumulate; checkpoint age pressure forces aggressive flushing.
Fix: adjust innodb_io_capacity appropriately, increase redo log capacity where suitable, reduce write spikes, and ensure buffer pool isn’t wildly oversized for the disk.
8) Symptom: “We have plenty of MB/s headroom” but latency is awful
Root cause: you’re IOPS/seek-bound, not bandwidth-bound. Small random I/O kills you while MB/s charts stay smug.
Fix: focus on await/queue depth, cache hit rates, and reducing random access; stop using throughput graphs as comfort blankets.
Checklists / step-by-step plan (HDD survival)
Step-by-step: stabilize an incident in progress
- Confirm I/O saturation: iostat -x shows high await + high util.
- Stop the bleeding: temporarily throttle batch jobs, heavy reports, and long-running maintenance.
- Separate reads from writes where possible: route analytics to replicas; pause non-critical writes.
- Identify top queries: use pg_stat_statements or the slow query log; kill the worst offenders if safe.
- Reduce checkpoint/vacuum aggression (Postgres): avoid emergency tuning that increases checkpoint frequency.
- Check memory pressure: if swapping, reduce caches or restart safely after fixing configuration (carefully) to reclaim memory.
Checklist: preventing random read death spirals
- Size buffer caches realistically: MariaDB innodb_buffer_pool_size, Postgres shared_buffers plus OS cache.
- Keep working sets small: remove unused indexes, archive cold data, partition large append-only tables.
- Fix query plans: avoid sequential scans on hot paths; watch nested loops that do many index lookups.
- Prefer fewer, better indexes over “index everything.”
Checklist: preventing write burst disasters
- Separate WAL/redo/binlog from data on HDD.
- Postgres: increase WAL room and spread checkpoints (max_wal_size, checkpoint_completion_target).
- MariaDB: monitor dirty pages, page cleaner lag, redo log pressure; tune I/O capacity to match real device behavior.
- Watch autovacuum and InnoDB background flushing as first-class workloads, not “background magic.”
Checklist: when HDD is non-negotiable
- Use RAID10 over RAID5 for random write-heavy workloads if you have to choose.
- Keep fsync durability on unless the business signs off on losing transactions.
- Measure tail latency, not just throughput.
- Document the “fast diagnosis” playbook and practice it.
Interesting facts & short history (the context people forget)
- InnoDB wasn’t always default in MySQL: MyISAM used to be common, and it behaved very differently under write pressure and crash recovery.
- MariaDB forked from MySQL in 2009: governance concerns after the Sun/Oracle acquisition pushed many teams to hedge.
- PostgreSQL’s WAL design is old and proven: WAL-based crash recovery has been central for decades, and it’s why sequential log I/O matters so much.
- InnoDB uses a doublewrite buffer by default: it’s specifically about protecting against partial page writes, which matters most in power-loss-prone environments and on HDD-era storage.
- PostgreSQL checkpoints are intentionally tunable: defaults aim for safety, not for “HDD with a mean workload and no patience.”
- Autovacuum was introduced to reduce operational burden: but mis-tuned autovacuum is a classic source of unexpected I/O on spinning disks.
- Linux I/O schedulers changed with blk-mq: what worked on older kernels isn’t always best now; HDD still benefits from latency-aware scheduling.
- HDD firmware lies sometimes: write caches and reordering can make latency unpredictable; databases compensate with fsync semantics and conservative assumptions.
Joke #2: The fastest way to learn about HDD seek time is to put your database on one and watch your career seek new opportunities.
FAQ
1) If I’m stuck with HDD, should I pick MariaDB or PostgreSQL?
Pick based on your workload and operational maturity. If you can control queries and keep the hot set in RAM, both can behave.
If you expect random-read pressure, PostgreSQL often degrades more predictably; if you expect high commit rates and can’t separate WAL, MariaDB may feel smoother until fsync pressure hits.
The real answer: choose the one your team can tune and operate well, then design for fewer random I/Os.
2) What’s the single best hardware change on HDD?
Split log writes from data reads. Put PostgreSQL WAL (and MariaDB redo/binlog) on dedicated spindles or a separate array.
It reduces head contention and stabilizes commit latency.
3) Should I turn off fsync or relax durability settings to survive?
Only with explicit business approval. PostgreSQL fsync=off is asking for corruption after a crash.
MariaDB innodb_flush_log_at_trx_commit=2 can be acceptable in some cases (you risk losing up to ~1 second of transactions),
but make it a product decision, not a midnight hack.
4) Why do my graphs show decent MB/s but the app is timing out?
Because HDD is IOPS/latency-bound. A few MB/s of small random reads can saturate seeks and create huge queueing delays.
Watch await, queue depth, and DB wait events—not just throughput.
5) Does increasing PostgreSQL shared_buffers fix HDD pain?
Sometimes, but it’s not magic. PostgreSQL also relies heavily on the OS page cache. Oversizing shared_buffers can reduce effective OS cache and backfire.
On HDD, the goal is: keep the hot set cached somewhere and avoid swap.
6) Does increasing innodb_buffer_pool_size always help MariaDB on HDD?
It helps until it causes swapping or starves the OS. On HDD, swapping is catastrophic. Grow buffer pool carefully and measure physical reads and latency.
Also ensure your queries actually benefit; a bad query will happily scan whatever you cache.
7) What’s the most common PostgreSQL HDD tuning mistake?
Under-allocating WAL space and triggering frequent requested checkpoints, then wondering why latency spikes rhythmically.
Give WAL room and spread checkpoint I/O.
8) What’s the most common InnoDB HDD tuning mistake?
Pretending the disk is faster than it is. Setting innodb_io_capacity too high can cause aggressive flushing patterns that fight foreground reads.
Set it based on measured device capability and watch page cleaner behavior.
9) Can RAID fix this?
RAID can help by adding spindles (more heads) and better redundancy, but it won’t turn random I/O into cheap I/O.
RAID10 is generally friendlier for mixed random workloads than parity RAID. Still, the biggest wins usually come from cache fit and workload shaping.
10) Are there schema patterns that are especially painful on HDD?
Yes: wide secondary-index-heavy tables with frequent updates; unbounded “events” tables without partitioning; and designs that force many random lookups per request.
On HDD, you want locality and fewer page touches per transaction.
Practical next steps
If you run MariaDB or PostgreSQL on HDD and disk pressure is already a thing, don’t start with tuning knobs.
Start with measurement, then architecture, then knobs.
- Measure latency properly: iostat -x, wait events, and DB stats during peak.
- Separate logs from data: WAL/redo/binlog isolation is the highest ROI change on spinning disks.
- Make the hot set fit: right-size buffer caches and stop swapping. If you can’t, shrink the working set (partition/archive/drop unused indexes).
- Fix the top 5 queries: it’s usually not “the database,” it’s one or two query patterns turning HDD into a seek generator.
- Smooth background work: checkpoint tuning (Postgres) and flushing behavior (InnoDB) should aim for steady-state, not heroics.
- Write down your playbook: use the fast diagnosis order, and rehearse it before the next incident rehearses you.