MySQL vs MariaDB: innodb_buffer_pool_size—the copy-paste tuning mistake that kills performance

Nothing ruins a calm on-call shift like “the database is slow” with no other context, followed by a graph that looks like a seismograph. You check CPU: fine. You check queries: mostly normal. Then you notice the host is swapping, IO is pegged, and your database latency just learned how to time travel.

A depressing amount of this comes from one well-meaning, copy-pasted line: innodb_buffer_pool_size. Same parameter name in MySQL and MariaDB. Similar default behavior. Different surrounding ecosystem, different memory accounting realities, and different failure modes when you overshoot. It’s a tuning knob that can make you look like a genius—or like someone who “optimized” production into a crater.

What the buffer pool really is (and what it is not)

The InnoDB buffer pool is the primary cache for InnoDB data and indexes. When things are healthy, it’s where reads come from and where dirty pages sit before being flushed to disk. When things are unhealthy, it’s where your memory budget goes to die.

What it caches

  • Data pages (table rows stored in pages)
  • Index pages (B-tree nodes)
  • Insert buffer / change buffer entries (depending on workload)
  • Adaptive hash index structures (if enabled, workload-dependent)

What it does not cover

This is where the copy-paste tuning mistake starts. People set buffer pool to “80% of RAM” and declare victory, forgetting that InnoDB is not the only thing in memory.

  • Per-connection memory (sort buffers, join buffers, temp tables) can dwarf your expectations under concurrency.
  • Binary log cache, replication buffers, and thread stacks exist even when you don’t think about them.
  • InnoDB log buffer and internal memory structures aren’t “free.”
  • OS page cache matters, even with InnoDB. Starve it and you can punish metadata IO, binlog IO, and anything not served from the buffer pool.
  • Other daemons (backup agents, monitoring, sidecars) don’t stop consuming memory just because you wrote a confident config file.

If you want a mental model: innodb_buffer_pool_size is the biggest predictable bucket, not the whole pool party.
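
A quick way to feel how big “everything else” already is on a live server: compare the configured buffer pool against the server process’s resident memory. A minimal sketch, assuming the process is named mariadbd or mysqld; the gap between the two numbers is roughly what your buffer pool math is ignoring:

cr0x@server:~$ mysql -NBe "SELECT ROUND(@@innodb_buffer_pool_size/1024/1024/1024, 1) AS buffer_pool_gib;"
cr0x@server:~$ ps -o rss= -C mariadbd,mysqld | awk '{printf "%.1f GiB resident\n", $1/1024/1024}'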

Joke #1: Copy-pasting database configs is like borrowing someone else’s prescription glasses: you’ll definitely see something, just not what you need.

MySQL vs MariaDB: why copy-paste hurts

At the parameter level, MySQL and MariaDB often look interchangeable. At the operational level, they’re siblings who grew up in different houses and now have different habits.

Same knob, different surrounding defaults

In both products, the buffer pool is the dominant InnoDB memory consumer. But the performance outcome of “set it high” depends on:

  • Version-specific behaviors (especially around flushing, page clean threads, and how metrics are exposed)
  • Distribution packaging defaults (systemd limits, cgroup memory ceilings, tmpdir locations)
  • Other engine features (MariaDB and MySQL have diverged in several subsystems; even when InnoDB is “InnoDB-ish,” the integration points differ)
  • Observability differences (what you can easily measure changes what you can safely tune)

The copy-paste failure mode

The classic mistake goes like this:

  1. You have a MySQL server where buffer pool is set to 75–80% RAM. Life is okay.
  2. You deploy MariaDB (or vice versa) on hosts with “the same RAM.”
  3. You reuse the same my.cnf stanza with the same buffer pool size.
  4. The new environment has different headroom requirements: more connections, different temp table behavior, different filesystem cache needs, different kernel memory pressure, different cgroup limits, or simply different workload.
  5. You hit memory pressure. The kernel swaps. Latency explodes. Your IO subsystem becomes a heat lamp.

The parameter didn’t change. The physics did.

Why oversizing kills performance in a very specific way

When you set innodb_buffer_pool_size too large, you don’t get “slightly worse.” You get “different system.” The host starts reclaiming memory aggressively, potentially swapping out database memory that you assumed was hot. Now reads that were memory hits become major faults. Writes stall behind IO. InnoDB’s flush behavior can become erratic because the OS is now the scheduler of your pain.

Even without swap, starving the OS page cache can hurt: binary logs, relay logs, table metadata, and filesystem journaling compete for the scraps. Your database starts behaving like it’s running on a network drive from 2009.

Interesting facts and historical context (short, useful, not trivia)

  1. InnoDB wasn’t always the default. MySQL historically defaulted to MyISAM; the modern expectation that “everything is InnoDB” is relatively recent in production culture.
  2. The buffer pool used to be simpler to reason about. Older versions had fewer moving parts (and fewer observability surfaces), so “just make it big” looked safer than it actually was.
  3. MariaDB forked from MySQL after the Sun/Oracle era shifts. The forks preserved many variable names, which is great for familiarity and terrible for copy-paste overconfidence.
  4. Buffer pool instances were introduced to reduce contention. Splitting the buffer pool into instances reduced mutex contention on multi-core systems, but it also created another tuning dimension people guess at.
  5. Linux swapping behavior changed operationally over time. Kernel defaults and cgroup usage in containerized deployments made “memory” a less stable promise than “RAM installed.”
  6. Performance Schema changed how we diagnose memory and waits. Better instrumentation created better tuning—if you actually enable and query it.
  7. SSD adoption changed the penalty curve. On spinning disks, a cache miss was slow; on SSDs, it’s “less slow,” which tempted teams to underinvest in correct caching until concurrency proved them wrong.
  8. Cloud instance types rewired expectations. RAM-to-IO ratios, burst credits, and network-attached storage made the same buffer pool size behave differently across environments.
  9. Transparent Huge Pages (THP) became a repeat offender. Not a database setting, but it interacts with memory allocation patterns in ways that show up as stalls and latency spikes.

Fast diagnosis playbook

This is what you do when your application team is yelling and you have 15 minutes to find the bottleneck. The goal is not perfect tuning. The goal is identifying whether innodb_buffer_pool_size is involved and whether you’re dying from memory pressure, IO saturation, or lock contention.

First: confirm the host is not lying to you about memory

  1. Check swap activity and major faults (quick commands are sketched after this list). If swapping is active, stop treating query tuning as the first step.
  2. Check cgroup limits. In containers or systemd slices, “free -h” can look fine while the database is boxed in.
  3. Check OOM killer logs. If the kernel is shooting processes, your buffer pool size is now a crime scene.
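
A minimal sketch of those three checks, assuming the process is named mariadbd or mysqld and you can read the kernel journal (the cgroup limit itself is covered in Task 6 below):

# Active swapping right now: non-zero si/so is bad news
cr0x@server:~$ vmstat 1 3 | awk 'NR>2 {print "si="$7, "so="$8}'
# Recent OOM killer activity in the kernel log
cr0x@server:~$ journalctl -k --since "2 hours ago" | grep -iE 'out of memory|oom-kill' | tail -n 5
# Major page faults accumulated by the database process
cr0x@server:~$ ps -o pid,maj_flt,rss,comm -C mariadbd,mysqld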

Second: determine if reads are cache hits or disk IO

  1. Buffer pool hit rate and reads. Look for rising physical reads and a low hit ratio under steady workload.
  2. IO wait and device saturation. If the disks are pegged, your “database slowness” might be “storage is the bottleneck.”

Third: check if writes are stalling behind flushing

  1. Dirty page percentage. If it’s high and not coming down, you’re flush-bound.
  2. Checkpoint age / log pressure. If redo log pressure is high, writes will stall regardless of buffer pool size; a rough way to approximate checkpoint age is sketched below.
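
Not every version exposes checkpoint age as a neat status variable, but you can approximate it from the engine status text. A rough sketch that assumes the usual INNODB MONITOR layout; treat the number as an estimate, not gospel:

cr0x@server:~$ mysql -e "SHOW ENGINE INNODB STATUS\G" | awk '/Log sequence number/ {lsn=$NF} /Last checkpoint at/ {cp=$NF} END {printf "%.0f MiB of redo since last checkpoint\n", (lsn-cp)/1024/1024}'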

Decision tree (quick and blunt)

  • Swapping or OOM? Reduce memory usage first. Usually that means buffer pool down, connection memory under control, or both.
  • Not swapping, but IO saturated and hit rate low? Buffer pool may be too small or workload doesn’t fit. Increase carefully if you have headroom.
  • IO saturated with high hit rate? Likely write pressure, flushing config, fsync, binlog, or storage latency. Don’t “solve” it by inflating buffer pool.

Practical tasks: commands, outputs, and decisions (12+)

These are real commands you can run on a Linux host and a MySQL/MariaDB server. Each task includes (1) the command, (2) what typical output means, and (3) the decision you make.

Task 1: Confirm which server you’re actually running

cr0x@server:~$ mysql --version
mysql  Ver 15.1 Distrib 10.11.6-MariaDB, for Linux (x86_64) using readline 5.1

Meaning: This is MariaDB 10.11.x. If you expected Oracle MySQL 8.0, your runbook just became fanfiction.

Decision: Use MariaDB-specific instrumentation and be cautious assuming MySQL 8.0 behaviors or defaults.

Task 2: Check current buffer pool size and related knobs

cr0x@server:~$ mysql -NBe "SHOW VARIABLES WHERE Variable_name IN ('innodb_buffer_pool_size','innodb_buffer_pool_instances','innodb_page_size','innodb_flush_method','innodb_flush_neighbors')"
innodb_buffer_pool_size	25769803776
innodb_page_size	16384
innodb_flush_method	O_DIRECT
innodb_flush_neighbors	0

Meaning: 24 GiB buffer pool, 16 KiB pages, direct IO enabled. Also note what didn’t come back: innodb_buffer_pool_instances, because recent MariaDB runs a single buffer pool and dropped the variable, while MySQL 8.0 still has it. That’s exactly the kind of quiet divergence that makes copy-pasted configs lie. O_DIRECT reduces double-buffering but increases your dependence on the buffer pool being correctly sized.

Decision: Compare this size against real memory limits and total MySQL footprint. Don’t assume “24 GiB is fine” because the box has “32 GiB installed.”

Task 3: Check memory at OS level (and whether swap is being used)

cr0x@server:~$ free -h
               total        used        free      shared  buff/cache   available
Mem:            31Gi        29Gi       512Mi       1.2Gi       1.6Gi       1.1Gi
Swap:          4.0Gi       2.6Gi       1.4Gi

Meaning: Swap is active and significant. “Available” memory is low. The kernel is under pressure.

Decision: Stop tuning queries first. Plan to reduce MySQL memory usage (buffer pool and/or per-connection buffers), or add RAM, or lower concurrency.

Task 4: Verify swapping is current, not historical

cr0x@server:~$ vmstat 1 5
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 3  1 2700000 480000  24000 820000  12   45  1800  2400  900 1400 18  6 48 28  0
 4  2 2702000 470000  24000 810000  20   60  2000  2600 1100 1600 20  7 41 32  0
 2  1 2705000 460000  24000 800000  10   30  1700  2100  980 1500 17  5 52 26  0

Meaning: Non-zero si/so indicates swap-in and swap-out happening now. CPU wa (IO wait) is high.

Decision: Treat this as an emergency: reduce memory pressure quickly or expect cascading latency.

Task 5: Confirm the service memory constraints (systemd)

cr0x@server:~$ systemctl show mariadb -p MemoryMax -p MemoryHigh -p TasksMax
MemoryMax=infinity
MemoryHigh=infinity
TasksMax=3096

Meaning: No explicit systemd memory cap here. If you’re containerized, this won’t tell the whole story.

Decision: If this is in Kubernetes/Docker, check cgroup limits next. If it’s bare metal/VM, proceed to MySQL process memory.

Task 6: Check cgroup memory limit (containers or slices)

cr0x@server:~$ cat /sys/fs/cgroup/memory.max
21474836480

Meaning: This is a cgroup v2 limit: the process group is capped at 20 GiB. (On a plain host, check the service’s slice instead, for example /sys/fs/cgroup/system.slice/mariadb.service/memory.max.) If your buffer pool is 24 GiB, you have engineered an OOM event with paperwork.

Decision: Lower innodb_buffer_pool_size below the cgroup limit with headroom for connections and OS overhead, or raise the limit.

Task 7: Find the real MySQL/MariaDB process RSS and mapping behavior

cr0x@server:~$ ps -o pid,cmd,rss,vsz --sort=-rss -C mysqld,mariadbd | head -n 5
    PID CMD                            RSS      VSZ
   1423 /usr/sbin/mariadbd        28765432 32298124

Meaning: RSS ~27.4 GiB, VSZ ~30.8 GiB. The process is already close to RAM. This does not include all kernel memory or page cache.

Decision: If swapping/pressure exists, reduce buffer pool and/or other buffers; consider connection limits.

Task 8: Check MySQL status: buffer pool usage, reads, and dirty pages

cr0x@server:~$ mysql -NBe "SHOW GLOBAL STATUS WHERE Variable_name IN ('Innodb_buffer_pool_pages_total','Innodb_buffer_pool_pages_free','Innodb_buffer_pool_read_requests','Innodb_buffer_pool_reads','Innodb_buffer_pool_pages_dirty','Innodb_buffer_pool_wait_free')"
Innodb_buffer_pool_pages_total	1572864
Innodb_buffer_pool_pages_free	1024
Innodb_buffer_pool_read_requests	9842331123
Innodb_buffer_pool_reads	83422119
Innodb_buffer_pool_pages_dirty	210432
Innodb_buffer_pool_wait_free	9821

Meaning: Buffer pool is basically full (only 1024 free pages). Physical reads are substantial. wait_free indicates threads waiting for free pages—classic sign of pressure and/or poor flushing.

Decision: If host memory is tight, you may be oversized and swapping. If host memory is fine, increase buffer pool (carefully) or fix IO/flushing. Don’t guess: correlate with OS metrics.

Task 9: Compute buffer pool hit ratio (quick approximation)

cr0x@server:~$ mysql -NBe "SELECT ROUND((1- (SELECT VARIABLE_VALUE FROM information_schema.GLOBAL_STATUS WHERE VARIABLE_NAME='Innodb_buffer_pool_reads')/(SELECT VARIABLE_VALUE FROM information_schema.GLOBAL_STATUS WHERE VARIABLE_NAME='Innodb_buffer_pool_read_requests'))*100,2) AS hit_ratio_percent;"
99.15

Meaning: Hit ratio ~99.15%. That sounds great, but it can still be insufficient at large scale. 0.85% misses on billions of requests is a lot of disk reads.

Decision: If IO is saturated and misses correlate with workload, buffer pool might be too small. If IO is saturated but misses aren’t rising, look at writes/binlogs/fsync/flush behavior.
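
Because the ratio hides volume, it helps to get the absolute miss rate: sample Innodb_buffer_pool_reads twice and divide by the interval. A minimal sketch:

cr0x@server:~$ R1=$(mysql -NBe "SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_reads'" | awk '{print $2}'); sleep 10; R2=$(mysql -NBe "SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_reads'" | awk '{print $2}'); echo "$(( (R2 - R1) / 10 )) physical reads per second"

Thousands of physical reads per second against a saturated device is a capacity conversation; a few dozen is not, no matter how ugly the ratio looks.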

Task 10: Check dirty page percentage and flushing pressure

cr0x@server:~$ mysql -NBe "SELECT ROUND(100*(SELECT VARIABLE_VALUE FROM information_schema.GLOBAL_STATUS WHERE VARIABLE_NAME='Innodb_buffer_pool_pages_dirty')/(SELECT VARIABLE_VALUE FROM information_schema.GLOBAL_STATUS WHERE VARIABLE_NAME='Innodb_buffer_pool_pages_total'),2) AS dirty_pct;"
13.38

Meaning: Dirty pages ~13%. Not insane by itself. But if it climbs and stays high with stalls, you’re behind on flushing.

Decision: If write latency is spiking and dirty pct stays elevated, examine IO capacity, flush settings, and redo log sizing/pressure. Don’t “solve” it with a bigger buffer pool unless you have IO headroom.

Task 11: Check current connection count and max connections

cr0x@server:~$ mysql -NBe "SHOW GLOBAL STATUS LIKE 'Threads_connected'; SHOW VARIABLES LIKE 'max_connections';"
Threads_connected	482
max_connections	2000

Meaning: 482 live connections. With permissive max_connections, you can get memory death by a thousand thread stacks and per-session buffers.

Decision: If memory pressure exists, cap connections, use pooling, and audit per-connection buffers. Buffer pool sizing without connection discipline is wishful thinking.

Task 12: Inspect per-connection memory defaults that often explode

cr0x@server:~$ mysql -NBe "SHOW VARIABLES WHERE Variable_name IN ('sort_buffer_size','join_buffer_size','read_buffer_size','read_rnd_buffer_size','tmp_table_size','max_heap_table_size','thread_stack')"
sort_buffer_size	262144
join_buffer_size	262144
read_buffer_size	131072
read_rnd_buffer_size	262144
tmp_table_size	16777216
max_heap_table_size	16777216
thread_stack	299008

Meaning: Defaults are modest here, but “modest × hundreds of connections” adds up, and temp tables can spill to disk or consume memory depending on query patterns.

Decision: If you see memory pressure with high concurrency, don’t just shrink buffer pool—also fix connection pooling and query patterns creating large temp tables.
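
To get a feel for the ceiling, multiply the per-connection buffers by max_connections. This deliberately overstates reality (buffers are allocated on demand, and some queries allocate more than one), but it shows what your config has implicitly promised. A sketch:

cr0x@server:~$ mysql -NBe "SELECT ROUND(@@max_connections * (@@sort_buffer_size + @@join_buffer_size + @@read_buffer_size + @@read_rnd_buffer_size + @@thread_stack) / 1024/1024/1024, 1) AS worst_case_gib;"

With the defaults above and max_connections at 2000, that’s roughly 2.3 GiB before temp tables even enter the picture.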

Task 13: Check whether temp tables are hitting disk

cr0x@server:~$ mysql -NBe "SHOW GLOBAL STATUS LIKE 'Created_tmp%tables';"
Created_tmp_disk_tables	1284921
Created_tmp_tables	4422103

Meaning: A meaningful fraction of temp tables are on disk. That’s extra IO, often misdiagnosed as “buffer pool too small.”

Decision: Review queries and indexes; consider increasing tmp table limits only if you’ve proven memory headroom and the workload benefits.

Task 14: Identify IO saturation and latency on the database volume

cr0x@server:~$ iostat -x 1 3
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          18.22    0.00    6.11   31.45    0.00   44.22

Device            r/s     w/s   rkB/s   wkB/s  await  aqu-sz  %util
nvme0n1         620.0  1140.0 49600.0 88400.0   9.80    3.10  96.00

Meaning: Device is ~96% utilized, await ~10ms. If this is NVMe, 10ms is suspiciously high and usually means queueing and contention.

Decision: If IO is near saturation, a bigger buffer pool might help reads if you have RAM headroom, but it won’t fix write stalls caused by fsync/binlog/flush configuration. Confirm what kind of IO dominates.

Task 15: Check top IO consumers (is it mysqld or something else?)

cr0x@server:~$ sudo iotop -o -b -n 3
Total DISK READ: 45.21 M/s | Total DISK WRITE: 86.73 M/s
  PID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND
 1423 be/4 mysql     42.10 M/s   80.22 M/s   12.00 %  92.00 % mariadbd
 2211 be/4 root       0.10 M/s    6.20 M/s    0.00 %   4.00 % backup-agent

Meaning: MariaDB is the dominant IO consumer and is swapping in (SWAPIN%). That’s the nightmare combo: IO-bound and swap-thrashing.

Decision: Immediate mitigation: reduce memory footprint and stop swap; consider temporarily lowering concurrency or shedding load. Then tune.

Task 16: Check InnoDB engine status for the story behind the metrics

cr0x@server:~$ mysql -e "SHOW ENGINE INNODB STATUS\G" | sed -n '1,120p'
*************************** 1. row ***************************
  Type: InnoDB
  Name: 
Status: 
=====================================
2025-12-29 10:12:19 0x7f3c9c2fe700 INNODB MONITOR OUTPUT
=====================================
BUFFER POOL AND MEMORY
----------------------
Total large memory allocated 27487790694
Dictionary memory allocated 612345678
Buffer pool size   1572864
Free buffers       1024
Database pages     1559000
Old database pages 574000
Modified db pages  210432
Pending reads 0
Pending writes: LRU 0, flush list 48, single page 0
Pages made young 112334455, not young 99887766
...

Meaning: You can see pending writes and modified pages. Pending flush list writes indicate flushing pressure; dictionary memory indicates extra footprint beyond the buffer pool.

Decision: If pending writes stay non-zero and grow, investigate IO capacity and flushing configuration; if dictionary memory is huge, consider schema/index bloat and metadata caching pressure.

Three corporate mini-stories from the trenches

Mini-story 1: The incident caused by a wrong assumption

They migrated a customer-facing service from MySQL to MariaDB because procurement liked the licensing story and the engineers liked the idea of “drop-in compatibility.” The config file came along for the ride, including a proud innodb_buffer_pool_size set to a fat percentage of RAM.

The rollout looked fine in staging. Of course it did—staging had lower concurrency and fewer background jobs. In production, the new MariaDB nodes lived inside a container runtime with a memory limit that nobody on the database team had been told about. The host had plenty of RAM. The container didn’t.

Two hours after cutover, the latency curve started climbing in a smooth, horrifying line. Then it snapped into spikes. At first they blamed queries. Then they blamed the load balancer. Then they blamed “the cloud.” Eventually someone ran a single command in the right place and saw the cgroup limit.

The buffer pool was larger than the memory limit. The kernel did what kernels do: OOM kills or swap-thrash until the service looks like it’s haunted. They reduced the buffer pool, capped connections, and the system recovered—without changing a single query.

The wrong assumption was not about MySQL vs MariaDB. It was believing “RAM on the box” equals “RAM available to mysqld.” In 2025, that’s not an assumption. That’s a bug.

Mini-story 2: The optimization that backfired

A different team had a write-heavy workload and saw disk reads creeping up. Someone proposed the classic fix: make the buffer pool huge so more pages stay hot. They bumped it aggressively during a maintenance window, watched cache hit rate improve, and celebrated.

Then the replica lag started. Not immediately. A few days later, during peak traffic, replication fell behind and never caught up. The primary was “fine,” but replicas were drowning. Application read traffic was routed to replicas, so users still felt the pain.

The root cause wasn’t mystical. With the buffer pool inflated, the OS page cache got squeezed. Binary logs and relay logs stopped being cached effectively. The replica IO thread and SQL thread turned into a constant parade of cache misses and fsync waits. Meanwhile, the bigger buffer pool also increased the amount of dirty data that could accumulate, raising the cost of catch-up during bursts.

They rolled the buffer pool back down slightly, made sure redo/binlog IO had breathing room, and used metrics to confirm the trade-off. The “optimization” wasn’t wrong in principle—it was wrong in context. Performance is a budget, not a wish.

Mini-story 3: The boring but correct practice that saved the day

This one is unglamorous, which is why it worked. A company running mixed MySQL and MariaDB fleets had a standard practice: every database host reported (1) effective memory limits, (2) mysqld RSS, (3) swap activity, and (4) buffer pool metrics in the same dashboard. Not as a “nice to have.” As a deployment gate.

When a team requested a larger buffer pool “because reads are slow,” SRE required a capacity worksheet: current RSS, expected connection count, per-connection buffer caps, and a measured IO profile. No worksheet, no change.

One Friday, a new service launched and accidentally opened far more connections than expected. The database didn’t die. It got slower, but it stayed up. Why? Because they had left headroom intentionally and had set sensible connection limits. The buffer pool wasn’t set at some macho percentage; it was set at what the system could afford.

They fixed the connection storm, performance returned, and nobody had to do the 2 a.m. ritual of “turn it off and on again” on a database that was paging itself to death.

How to size innodb_buffer_pool_size safely

Here’s the opinionated guidance: stop using one percentage rule across fleets. Use a method. If you want a rule of thumb, fine—earn it by measuring.

Step 1: Know your deployment model

  • Bare metal / VM: You usually have predictable RAM, but you still share with the OS page cache and background services.
  • Container / cgroup-limited: The limit is the truth. Everything else is storytelling.

Step 2: Budget memory explicitly (don’t hand-wave “other”)

A practical production budget looks like this (a rough arithmetic sketch follows the list):

  • Buffer pool: the big chunk
  • Connection memory: worst-case-ish estimate, or enforce limits so worst-case can’t happen
  • InnoDB overhead: data dictionary, internal structures, adaptive hash (if used), purge, etc.
  • OS and filesystem cache: especially important for binlogs/relay logs and metadata
  • Operational headroom: backups, schema changes, load spikes, and the fact that your future self is going to do something risky
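
A minimal arithmetic sketch of that budget. The numbers are placeholders, not recommendations; plug in your own limits and measurements:

cr0x@server:~$ TOTAL=32         # GiB of RAM, or the cgroup limit if that's lower
cr0x@server:~$ OS_CACHE=4       # OS plus page cache for binlogs, relay logs, metadata
cr0x@server:~$ CONNECTIONS=4    # worst-case-ish connection memory (see Task 12)
cr0x@server:~$ INNODB_OTHER=2   # dictionary, log buffer, adaptive hash, purge, misc
cr0x@server:~$ HEADROOM=2       # backups, schema changes, spikes, your future self
cr0x@server:~$ echo "$(( TOTAL - OS_CACHE - CONNECTIONS - INNODB_OTHER - HEADROOM )) GiB for innodb_buffer_pool_size"
20 GiB for innodb_buffer_pool_size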

Step 3: Treat swap as a sign you lost

In database land, swap is rarely “fine.” It’s a performance cliff. If you see active swap-in/swap-out on a database host, it’s not “using all the resources.” It’s choking.

Joke #2: If your database is swapping, it’s not “multi-tasking.” It’s just trying to remember where it put its keys.
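
Two quick checks for whether you’ve already lost: how much of the database process is sitting in swap, and how eager the kernel is to put it there. A sketch, assuming the process is named mariadbd (swap in mysqld if that’s what you run):

cr0x@server:~$ grep VmSwap /proc/$(pgrep -o -x mariadbd)/status
cr0x@server:~$ cat /proc/sys/vm/swappiness

A non-trivial VmSwap on a busy database is a cue to shrink the memory footprint, not to tune queries.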

Step 4: Use buffer pool instances correctly

Buffer pool instances help reduce contention, but there’s a sanity range:

  • Too few instances on large pools can increase contention.
  • Too many instances can increase overhead and reduce efficiency. You also make some metrics noisier.

Pick a reasonable number for your size and CPU, then measure. Don’t cargo-cult "8" because you saw it in a blog post from the era of SATA SSDs. (And note this is effectively a MySQL-only knob these days: recent MariaDB runs a single buffer pool and no longer exposes the setting.)

Step 5: Don’t starve IO subsystems by starving the OS cache

Especially in replication-heavy environments, binlog and relay log IO can decide your fate. Even if InnoDB reads are served from buffer pool, your operational reality involves logs, metadata, and other file IO. If you give everything to the buffer pool, you force the rest of the system to go to disk for every step. Then you blame InnoDB. Classic.
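
A blunt sanity check before you grow the buffer pool any further; compare this against the configured pool from Task 2:

cr0x@server:~$ awk '/MemAvailable/ {printf "%.1f GiB available for page cache and everything else\n", $2/1024/1024}' /proc/meminfo

If that number is a rounding error, your logs and metadata are already doing their IO the hard way.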

One reliability quote (paraphrased idea)

Paraphrased idea: reliability comes from designing for failure, not assuming it won’t happen. — John Allspaw

Common mistakes: symptom → root cause → fix

This section is meant to feel uncomfortably familiar.

1) Symptom: sudden latency spikes after “memory tuning”

Root cause: Buffer pool set too high, host begins swapping or reclaiming aggressively.

Fix: Reduce innodb_buffer_pool_size to restore headroom; cap connections; confirm swap is disabled or not used; verify cgroup limits.

2) Symptom: high IO utilization, but buffer pool hit rate looks “good”

Root cause: IO is dominated by writes (redo, flushing, binlogs) or temp tables, not by InnoDB reads.

Fix: Inspect dirty pages, pending flushes, binlog fsync behavior, temp table usage; tune flushing and storage; don’t blindly increase buffer pool.

3) Symptom: replicas lag more after increasing buffer pool

Root cause: OS page cache got squeezed; relay log/binlog IO becomes uncached; replicas become IO-latency bound.

Fix: Reduce buffer pool slightly; ensure log IO has cache headroom; consider separating logs to different devices where appropriate.

4) Symptom: performance degrades only during peak connections

Root cause: Per-connection memory spikes; too many threads; temp tables create IO and memory pressure.

Fix: Use connection pooling; lower max_connections; review per-session buffer sizes; fix queries that spill to disk.

5) Symptom: OOM killer takes out mysqld after a restart

Root cause: Buffer pool and other allocations happen quickly at startup; combined with cgroup limits or low headroom, you trip OOM.

Fix: Right-size buffer pool to the true limit; ensure system has headroom during warm-up; consider staggered restarts and preflight checks.
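
A preflight sketch along those lines, assuming cgroup v2 and the container-style path from Task 6 (adjust the path for a systemd slice):

cr0x@server:~$ BP=$(mysql -NBe "SELECT @@innodb_buffer_pool_size;")
cr0x@server:~$ LIMIT=$(cat /sys/fs/cgroup/memory.max)
cr0x@server:~$ if [ "$LIMIT" = "max" ]; then echo "no cgroup memory cap"; elif [ "$BP" -ge "$LIMIT" ]; then echo "buffer pool >= cgroup limit: this restart will not end well"; else echo "fits, with $(( (LIMIT - BP) / 1024 / 1024 )) MiB left for everything else"; fi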

6) Symptom: “We increased buffer pool but nothing improved”

Root cause: Working set doesn’t fit anyway, or bottleneck is elsewhere (locks, CPU, storage latency, network, replication).

Fix: Measure buffer pool miss volume, not just ratio; identify top wait events; analyze IO latency; tune queries and indexes based on real bottlenecks.

Checklists / step-by-step plan

Checklist A: Pre-change safety checks (do this before touching buffer pool)

  1. Confirm server flavor and version (mysql --version).
  2. Confirm memory limits (systemd and cgroups).
  3. Record current buffer pool metrics: size, pages free, reads, wait_free, dirty pages.
  4. Record OS metrics: swap usage, major faults, IO utilization, IO latency.
  5. Record connection metrics: Threads_connected, max_connections, and peak.
  6. Check temp table usage and disk temp table creation. (A one-shot baseline snapshot is sketched after this checklist.)
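
One way to capture most of that checklist as a single artifact you can attach to the change ticket. A rough sketch; the output path is just an example:

cr0x@server:~$ { date; mysql --version; free -h; vmstat 1 3; mysql -NBe "SHOW VARIABLES WHERE Variable_name IN ('innodb_buffer_pool_size','max_connections')"; mysql -NBe "SHOW GLOBAL STATUS WHERE Variable_name LIKE 'Innodb_buffer_pool%' OR Variable_name IN ('Threads_connected','Created_tmp_disk_tables','Created_tmp_tables')"; } > /tmp/bp_baseline_$(date +%F_%H%M).txt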

Checklist B: Step-by-step tuning plan that won’t ruin your weekend

  1. Decide the goal. Are you trying to reduce read IO? Reduce latency spikes? Stop swap? These are different problems.
  2. Set a headroom target. Pick a concrete number (e.g., keep several GiB free/available, and keep swap inactive).
  3. Adjust buffer pool in small increments. Big jumps hide causality.
  4. Watch the right metrics for at least one workload cycle. Not five minutes. A cycle.
  5. If swap appears, back out immediately. Don’t “wait and see.” Swap doesn’t heal itself during peak load.
  6. After stabilization, reassess query patterns. If you’re spilling temp tables or doing full scans, the buffer pool is not your only lever.

Checklist C: Emergency mitigation when you are already swapping

  1. Reduce incoming load (rate limiting, read shedding, disable non-critical jobs).
  2. Lower connection concurrency quickly (application pool limits, proxy limits).
  3. Reduce buffer pool size at the next safe opportunity (modern MySQL and MariaDB can resize online, but shrinking under load can stall; a sketch follows this checklist).
  4. Stop memory-hungry background tasks (ad-hoc analytics queries, backups if they are thrashing IO).
  5. Confirm swap activity stops (vmstat, iotop).
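
If you’re on a version with online resize (MySQL 5.7+ or MariaDB 10.2+), you can shrink without a restart. The new size gets rounded to the chunk size, and shrinking under load can stall while pages are withdrawn, so treat this as a sketch with a placeholder target, not a recipe:

cr0x@server:~$ mysql -e "SET GLOBAL innodb_buffer_pool_size = 16 * 1024 * 1024 * 1024;"
cr0x@server:~$ mysql -NBe "SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_resize_status';"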

FAQ

1) Can I use “80% of RAM” for innodb_buffer_pool_size on both MySQL and MariaDB?

You can, but you’re choosing superstition over engineering. Start with a conservative size that leaves real headroom, then adjust based on swap activity, IO metrics, and workload behavior.

2) Why did increasing innodb_buffer_pool_size make performance worse?

Most often: you induced swapping or squeezed the OS page cache so hard that non-InnoDB IO (logs, metadata, temp files) became disk-bound. Bigger cache isn’t better if it causes the kernel to evict the wrong things.

3) Is swapping always bad for MySQL/MariaDB?

Active swapping is almost always bad. A tiny amount of swap usage that never changes can be historical residue; continuous swap-in/swap-out during load is a performance emergency.

4) How do I know if reads are the problem or writes are the problem?

Correlate buffer pool misses (Innodb_buffer_pool_reads), device utilization/await (iostat -x), and InnoDB pending flushes/dirty pages (SHOW ENGINE INNODB STATUS). If IO is saturated but misses aren’t rising, suspect writes or temp tables.

5) Does using O_DIRECT change how I should size the buffer pool?

Yes. With O_DIRECT, you reduce double-buffering and rely more on InnoDB caching. That can improve predictability, but it also punishes undersizing and makes correct headroom planning more critical.

6) Should I set innodb_buffer_pool_instances manually?

Only if you’re on MySQL, you have a reason, and you’re measuring contention and throughput; recent MariaDB runs a single buffer pool and no longer exposes the knob. Too few instances can bottleneck on mutexes; too many is overhead. Pick a sane value and revisit if you see contention symptoms.

7) Why is my buffer pool hit ratio high but latency still bad?

Because hit ratio is not throughput, and it hides the absolute number of misses. Also, latency can come from writes, fsync behavior, lock waits, replication, network, or storage latency even when reads hit cache.

8) What’s the safest way to change innodb_buffer_pool_size?

Safest means: planned change window, small increments, rollback plan, and monitoring. Whether it’s dynamic depends on version and engine behavior; don’t assume online resize will be painless under load.

9) Is MariaDB “worse” or “better” than MySQL for buffer pool behavior?

Neither in a universal sense. The problem is assuming they’re identical. Operationally, differences in packaging, defaults, and workload context matter more than brand allegiance.

10) If I have plenty of RAM, should I just max out the buffer pool?

No. You still need OS cache, headroom for connections, and space for operational bursts (backups, schema changes, failovers). Use the RAM to buy stability, not to win a benchmark screenshot.

Next steps (what to do this week)

Do these in order. They’re boring. They work.

  1. Inventory reality: On every database node, record MySQL/MariaDB version, cgroup/systemd memory limits, RAM, swap settings, and current buffer pool size.
  2. Build a minimal dashboard: Swap activity, mysqld RSS, device await/%util, buffer pool reads vs read requests, dirty page percentage, Threads_connected.
  3. Pick a headroom policy: Decide what “safe” means (no active swap; a minimum available memory; connection cap).
  4. Fix connection discipline: If you can’t predict concurrency, you can’t predict memory. Use pooling; lower max_connections to something you can actually support.
  5. Tune buffer pool with measurement: Increase or decrease based on observed IO and swap behavior, not on inherited percentages.
  6. Write the runbook you wish you had: Include the fast diagnosis playbook and the 12+ commands above. Future you will be tired and unimpressed by vague guidance.

If you take only one lesson: innodb_buffer_pool_size isn’t a “set-and-forget” magic number. It’s a contract between InnoDB, the OS, your workload, and your operational discipline. Break the contract, and production will enforce the penalty clause.
