Some databases fail loudly. MySQL and MariaDB in Docker often fail politely: they keep serving traffic while quietly turning your SSD into a percussion instrument.
The performance cliff usually isn’t your schema or your “one weird query.” It’s defaults: Docker storage drivers, filesystem semantics, memory limits, fsync behavior, and a few database settings that were reasonable in 2008 and are disastrous inside a container in 2026.
The real enemy: mismatch between database expectations and container reality
MySQL and MariaDB were born assuming a fairly boring contract with the operating system:
- “When I call fsync, my data is on stable storage.”
- “When I allocate memory, it stays mine.”
- “When I write to disk, latency is somewhat predictable.”
- “When I use temporary tables, I have enough local disk.”
Docker doesn’t break that contract on purpose. It just adds layers: union filesystems, copy-on-write, cgroup limits, virtualized filesystems on macOS/Windows, and volume drivers with wildly different durability and latency characteristics. Your database is still trying to be correct. Your container platform is trying to be flexible. Correctness and flexibility can coexist—if you stop trusting defaults.
Here’s the thesis: treat a containerized database as a storage appliance. Storage is a first-class dependency. The CPU is usually fine. The network is rarely the first problem. It’s almost always I/O latency, fsync behavior, or memory pressure disguised as “slow queries.”
Interesting facts and a little history (because defaults have a backstory)
These are small facts, but each explains a “why is it like this?” moment you’ll hit when tuning MySQL/MariaDB in Docker:
- InnoDB became the default engine in MySQL 5.5, largely because it handles crashes and concurrency better than MyISAM. That also means you inherited InnoDB’s strong opinions about fsync and redo logs.
- MariaDB forked from MySQL in 2009 after Oracle acquired Sun. The fork preserved compatibility, but the operational defaults and performance characteristics have diverged in subtle ways over time.
- Docker’s overlay filesystems were designed for images, not databases. Copy-on-write is fantastic for layering application files; it’s a tax on workloads that write small random blocks all day.
- InnoDB’s doublewrite buffer exists because partial page writes happen (power loss, controller weirdness, kernel bugs). It trades write amplification for safety—an ugly but usually correct trade.
- MySQL’s query cache was removed in MySQL 8.0 because it caused contention and unpredictable performance. If you’re still “tuning query cache” inside a container, you’re probably on an old build and you have bigger problems.
- Linux’s O_DIRECT and fsync semantics vary by filesystem and mount options. Database durability isn’t a universal truth; it’s a negotiated settlement between software layers.
- cgroups made “available memory” a lie inside containers for years. Modern MySQL is better at reading cgroup limits, but plenty of images and older versions still size buffers based on host RAM.
- SSD latency cliffs are real: when you saturate the device queue or trigger garbage collection, latency spikes first, throughput later. Your database will time out long before your monitoring shows “disk is 100% busy.”
And one reliability quote—because it’s still the only lesson that matters:
“Hope is not a strategy.” — General Gordon R. Sullivan
Fast diagnosis playbook: find the bottleneck in 15 minutes
If a containerized MySQL/MariaDB instance is slow, don’t start with SQL. Start with physics. The database is waiting on something.
First: confirm what “slow” means
- Is it higher latency per query, or reduced throughput, or both?
- Is it only writes, only reads, or everything?
- Is it periodic (spikes) or constant?
Second: check disk latency and fsync pressure
- Look for high redo log fsync time and dirty page stalls.
- Confirm you’re not writing to overlay2 or a thin network volume with terrible sync latency.
Third: check memory and cgroup limits
- Are you OOM-killing mysqld, or swapping, or shrinking page cache into oblivion?
- Is the InnoDB buffer pool sized for host memory instead of the container limit?
Fourth: check CPU steal and throttling
- CPU starvation masquerades as “random” DB slowness, especially under bursty workloads.
Fifth: only now read the slow query log
- If the slow queries are waiting on “waiting for handler commit” or I/O, you’re back to disk and fsync.
- If they’re CPU-bound with bad plans, then tune SQL and indexes.
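To make the fifth step concrete before you open the slow log: a quick, read-only look at what sessions are actually waiting on, assuming you can reach the instance with the mysql client (the grouping query is a sketch, not a monitoring solution):
cr0x@server:~$ docker exec -it mysql-prod bash -lc "mysql -e \"SELECT state, COUNT(*) AS sessions FROM information_schema.processlist GROUP BY state ORDER BY sessions DESC LIMIT 10;\""
If the top states are commit or log-flush related, you’re in the storage bucket; if sessions are busy sorting or sending data while CPU is pegged, go read query plans.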
That’s the sequence. When people skip it, they tune the wrong thing for weeks.
The defaults that ruin performance (and what to do instead)
1) Writing your datadir on overlay2 (or any union filesystem)
The classic Docker anti-pattern: you run MySQL in a container, forget to mount a real volume, and the datadir lives in the container’s writable layer. It works. It benchmarks terribly. It also makes upgrades and backups harder because your state is glued to an ephemeral layer.
Why it hurts: overlay2 is copy-on-write. Databases write a lot of small updates, and InnoDB writes in patterns that trigger metadata churn. Even if you “have SSD,” the filesystem layer can add latency and CPU overhead.
Do this instead: put /var/lib/mysql on a named volume or a bind mount on a real filesystem. In production, prefer local volumes on XFS/ext4 with sane mount options, or a carefully chosen CSI volume with known sync latency.
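A minimal sketch of the volume version, assuming the official mysql image and a named volume called mysql-data (image tag, names, password, and memory limit are placeholders):
cr0x@server:~$ docker volume create mysql-data
cr0x@server:~$ docker run -d --name mysql-prod \
    --memory=2g \
    -e MYSQL_ROOT_PASSWORD=change-me \
    -v mysql-data:/var/lib/mysql \
    mysql:8.0
The only flag that matters for this point is the -v mount of /var/lib/mysql; without it, the datadir quietly lands in the container’s writable layer.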
2) Assuming “Docker volume” means “fast” (it doesn’t)
A Docker “volume” is an abstraction. The backing store might be a local directory on ext4. Or NFS. Or a cloud block device. Or a distributed filesystem. The latency profile can be “fine” or “why is commit taking 200ms?”
Do this instead: measure fsync latency on the exact volume driver you use. Treat storage like a dependency with an SLO.
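One way to put a number on it, assuming fio is installed on the host, the default local volume driver (so volume data lives under /var/lib/docker/volumes, as in the tasks below), and a throwaway volume so you never write into a live datadir:
cr0x@server:~$ docker volume create fsync-test
cr0x@server:~$ sudo fio --name=commit-sim --directory=/var/lib/docker/volumes/fsync-test/_data \
    --rw=write --bs=4k --size=256m --fdatasync=1 --runtime=60 --time_based
cr0x@server:~$ docker volume rm fsync-test
The --fdatasync=1 flag forces a sync after every 4k write, which is a rough stand-in for a commit-heavy workload. The fsync latency percentiles in fio’s output (especially p99) are the numbers to hold against your SLO, not the MB/s line.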
3) Default InnoDB durability settings + slow sync storage = sadness
Two variables matter most for write-heavy workloads:
- innodb_flush_log_at_trx_commit (usually 1 by default)
- sync_binlog (often 1 in more careful setups, but varies)
innodb_flush_log_at_trx_commit=1 means InnoDB flushes redo to disk at every commit. On storage with high fsync latency, commit latency becomes storage latency. If you also have binary logging and sync_binlog=1, you’re paying twice.
What to do: decide your durability requirements explicitly. For many internal systems, innodb_flush_log_at_trx_commit=2 is an acceptable risk trade. For financial systems, maybe not. But make the decision, don’t inherit it.
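A hedged example of what “deciding explicitly” looks like, assuming a config fragment mounted into the official image’s /etc/mysql/conf.d directory; the values shown are the strict, full-durability settings, written down so they’re a decision rather than an inheritance:
# /etc/mysql/conf.d/durability.cnf: keep the WHY next to the WHAT
[mysqld]
# 1 = flush and sync redo at every commit; commit latency includes storage fsync latency
innodb_flush_log_at_trx_commit = 1
# 1 = sync the binlog at every commit; relaxing this needs an explicit, signed-off risk statement
sync_binlog = 1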
Joke #1: If you set innodb_flush_log_at_trx_commit=0 in production, your database is now powered by vibes and optimism.
4) Binary logs on slow storage (and no plan for them)
Binary logs aren’t optional if you want replication or point-in-time recovery. They’re also a steady stream of writes that can become the dominant I/O. In Docker, people routinely forget:
- Where binlogs are stored (same datadir unless configured)
- How quickly they grow under write load
- That purging requires policy
Fix: size storage for binlog retention, enable automatic expiration, and monitor.
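A sketch of the policy half, assuming MySQL 8.x (MariaDB and older MySQL use expire_logs_days instead); the retention value is a placeholder to size against your recovery needs and disk budget:
[mysqld]
# keep three days of binlogs; size the volume for peak write rate x retention, plus headroom
binlog_expire_logs_seconds = 259200
# cap individual binlog files so rotation and purging stay predictable
max_binlog_size = 1G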
5) Container memory limits + InnoDB buffer pool defaults = cache thrash
Inside a container, memory is political. MySQL might see host memory (depending on version and config) and happily allocate a large buffer pool. Then the cgroup limit kicks in, and the kernel starts killing things or reclaiming aggressively.
Symptoms: periodic stalls, OOM kills, “random” query latency spikes, and an OS that looks calm while the container burns.
Fix: set innodb_buffer_pool_size based on the container limit, not host RAM; a sizing sketch follows this list. Leave room for:
- connection buffers (per-connection allocations add up)
- sort/join buffers under load
- InnoDB internal overhead
- filesystem cache (yes, still relevant)
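A rough starting point for the 2 GiB limit used in the tasks below; the split is an assumption to adapt against your connection count and workload, not a formula:
[mysqld]
# roughly half of the 2 GiB cgroup limit; the rest covers connections, sort/join
# buffers, InnoDB overhead, and a little filesystem cache
innodb_buffer_pool_size = 1G
# cap concurrency so per-connection buffers can't multiply past the limit
max_connections = 200
Then watch real memory usage under load; Task 6 below shows what it looks like when the math is wrong.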
6) Too many connections (because defaults are generous)
max_connections is often set high “just in case.” In containers, that “just in case” becomes “just in swap.” Each connection can allocate per-thread memory. Under bursts, memory balloons and the kernel does what kernels do: it punishes you for lying about capacity.
Fix: cap max_connections, use pooling, and size per-thread buffers sanely. Your database is not a concert venue; it doesn’t need infinite standing room.
7) tmpdir and temp tables landing on the wrong disk
Large sorts, ALTER TABLE operations, and complex queries may spill to disk. Inside containers, tmpdir might land on a small root filesystem layer, not your big data volume.
Fix: set tmpdir to a path on the same fast volume (or a dedicated fast volume), and monitor free space. If you’re on Kubernetes, this is where ephemeral storage limits surprise people.
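A minimal sketch, assuming a dedicated volume mounted at /var/lib/mysql-tmp (the volume name and path are placeholders; the directory must exist and be writable by the mysqld user):
[mysqld]
# temp tables, sorts, and online ALTERs spill here instead of the container's root filesystem
tmpdir = /var/lib/mysql-tmp
Mount a volume at that path (for example, -v mysql-tmp:/var/lib/mysql-tmp) and monitor its free space the same way you monitor the datadir.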
8) Missing or mis-tuned redo log sizing
Redo logs that are too small cause frequent checkpoints, which increases background flushing and amplifies write pressure. Too large can increase crash recovery time. In container land, the “too small” case is more common because defaults are conservative and storage is often slower than expected.
Fix: tune redo log capacity for your write rate and recovery objectives. On modern MySQL, you’ll be thinking in terms of innodb_redo_log_capacity; on MariaDB and older MySQL, you’ll be working with log file size and count.
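A hedged example of both flavors in one fragment; the numbers are placeholders to size against write rate and recovery objectives, and you use one flavor or the other, never both:
[mysqld]
# MySQL 8.0.30+: a single knob for total redo space
innodb_redo_log_capacity = 4G
# MariaDB / older MySQL: size the redo log file(s) instead (older MySQL also has a file-count setting)
# innodb_log_file_size = 2G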
9) “We’ll just use network storage” (and then fsync is 20ms)
Remote storage can work. It can also quietly turn your commit path into a round trip across a network and three layers of caching. Many distributed storage systems are optimized for throughput, not fsync latency.
Fix: validate sync write latency before you bet your database on it. If you must use networked volumes, prefer those with predictable durability semantics and low tail latency. Measure p99, not averages.
10) Not pinning CPU or observing throttling
Database performance loves consistency. CPU throttling and contention can look like I/O waits because the database thread doesn’t get scheduled to finish work. In containers, you can accidentally run MySQL on “best effort” CPU while batch jobs stampede the node.
Fix: set appropriate CPU requests/limits (or Docker CPU quotas) and watch throttling metrics. If you see throttling under normal load, you’re underprovisioned or misconfigured.
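A quick way to see whether throttling is actually happening, assuming cgroup v2 (on cgroup v1, look for nr_throttled and throttled_time in the cpu controller’s cpu.stat):
cr0x@server:~$ docker exec -it mysql-prod bash -lc "cat /sys/fs/cgroup/cpu.stat"
If nr_throttled and throttled_usec keep climbing under normal load, mysqld is being paused mid-work; raise the CPU quota or move the noisy neighbors.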
11) Running on macOS/Windows Docker Desktop and expecting Linux-native I/O
Dev environments are where performance myths are born. Docker Desktop uses a VM. Bind mounts go through translation layers. File I/O can be dramatically slower, especially sync-heavy patterns like InnoDB redo.
Fix: for realistic performance tests, run on Linux with a real volume. For local dev, accept slower I/O or switch to named volumes rather than bind mounts.
12) Treating MySQL config as “inside the container” instead of “part of the service”
If your configuration is baked into the image with no externalization, you will eventually roll a change that requires rebuilding images under pressure. This is how “simple” incidents become long ones.
Fix: mount config, version it, and make it observable (SHOW VARIABLES should match your intent).
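A minimal sketch, assuming the official mysql/mariadb images (both read extra config from /etc/mysql/conf.d) and a versioned config fragment at a host path of your choosing:
cr0x@server:~$ docker run -d --name mysql-prod \
    -v mysql-data:/var/lib/mysql \
    -v /srv/mysql/conf/tuning.cnf:/etc/mysql/conf.d/tuning.cnf:ro \
    -e MYSQL_ROOT_PASSWORD=change-me \
    mysql:8.0
cr0x@server:~$ docker exec -it mysql-prod bash -lc "mysql -e \"SHOW VARIABLES LIKE 'innodb_buffer_pool_size';\""
The second command is the “observable” part: SHOW VARIABLES should match the file you versioned, and if it doesn’t, you’ve found your incident before it found you.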
Practical tasks: commands, outputs, and decisions (12+)
These are production-grade checks. Each includes: a command, what the output means, and the decision you make from it.
Task 1: Confirm where MySQL is actually writing data (overlay vs volume)
cr0x@server:~$ docker inspect -f '{{ .Mounts }}' mysql-prod
[{volume mysql-data /var/lib/docker/volumes/mysql-data/_data /var/lib/mysql local true }]
Meaning: /var/lib/mysql is on a named volume, not the container layer.
Decision: If you don’t see a mount for /var/lib/mysql, stop and fix that before tuning anything else.
Task 2: Identify the Docker storage driver (overlay2, devicemapper, etc.)
cr0x@server:~$ docker info --format '{{ .Driver }}'
overlay2
Meaning: overlay2 is in use. Fine for containers. Not fine for database state unless you mount volumes correctly.
Decision: If the datadir is not a volume, expect pain. If it is a volume, you’ve mostly avoided overlay2 penalties.
Task 3: Check filesystem type and mount options backing the volume
cr0x@server:~$ docker run --rm -v mysql-data:/var/lib/mysql alpine sh -c "df -T /var/lib/mysql && mount | grep ' /var/lib/docker'"
Filesystem Type 1K-blocks Used Available Use% Mounted on
/dev/nvme0n1p2 xfs 488245288 124332456 363912832 26% /var/lib/docker
/dev/nvme0n1p2 on /var/lib/docker type xfs (rw,relatime,attr2,inode64,logbufs=8,logbsize=32k)
Meaning: This volume lives on XFS with typical options.
Decision: If you see NFS or a remote mount here, treat fsync latency as your primary suspect for write performance.
Task 4: Check container memory limit vs what MySQL thinks it has
cr0x@server:~$ docker inspect -f '{{ .HostConfig.Memory }}' mysql-prod
2147483648
cr0x@server:~$ docker exec -it mysql-prod bash -lc "cat /sys/fs/cgroup/memory.max 2>/dev/null || cat /sys/fs/cgroup/memory/memory.limit_in_bytes"
2147483648
Meaning: The container is limited to 2 GiB.
Decision: Set innodb_buffer_pool_size to something like 1.0–1.4 GiB depending on workload and connection count, not “80% of the host.”
Task 5: Verify actual InnoDB buffer pool size
cr0x@server:~$ docker exec -it mysql-prod bash -lc "mysql -N -e \"SHOW VARIABLES LIKE 'innodb_buffer_pool_size';\""
innodb_buffer_pool_size 1073741824
Meaning: Buffer pool is 1 GiB.
Decision: If it’s tiny (like 128 MiB) on a busy instance, expect constant disk reads and poor performance.
Task 6: Check for OOM kills and memory pressure signals
cr0x@server:~$ docker exec -it mysql-prod bash -lc "dmesg -T | tail -n 5"
[Fri Jan 3 09:28:41 2026] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=docker-3b1a...,mems_allowed=0,oom_memcg=/docker/3b1a...
[Fri Jan 3 09:28:41 2026] Killed process 1973 (mysqld) total-vm:3121044kB, anon-rss:1632100kB, file-rss:0kB, shmem-rss:0kB
Meaning: The kernel killed mysqld due to cgroup memory limits.
Decision: Reduce MySQL memory usage (buffers, connections) or raise the container memory limit. Also check for restart loops and assess the corruption risk that comes with repeated hard kills.
Task 7: Inspect redo log fsync pressure and commit stalls
cr0x@server:~$ docker exec -it mysql-prod bash -lc "mysql -e \"SHOW ENGINE INNODB STATUS\\G\" | sed -n '1,120p'"
...
Log sequence number 912345667890
Log flushed up to 912345660000
Last checkpoint at 912345100000
0 pending log flushes, 0 pending chkp writes
...
Meaning: If you see many pending log flushes or a big gap between “Log sequence number” and “Log flushed up to,” your storage can’t keep up with redo fsync.
Decision: Improve storage latency, tune redo capacity, consider durability trade-offs (innodb_flush_log_at_trx_commit), and reduce write amplification.
Task 8: Confirm durability-related variables (the ones people “assume”)
cr0x@server:~$ docker exec -it mysql-prod bash -lc "mysql -e \"SHOW VARIABLES WHERE Variable_name IN ('innodb_flush_log_at_trx_commit','sync_binlog','innodb_doublewrite');\""
+------------------------------+-------+
| Variable_name | Value |
+------------------------------+-------+
| innodb_doublewrite | ON |
| innodb_flush_log_at_trx_commit | 1 |
| sync_binlog | 1 |
+------------------------------+-------+
Meaning: You’re paying full durability cost (redo fsync per commit + binlog fsync per commit).
Decision: Keep it if you need it. If you don’t, change it intentionally and document the risk profile.
Task 9: Measure real disk latency from inside the container (quick and dirty)
cr0x@server:~$ docker exec -it mysql-prod bash -lc "dd if=/dev/zero of=/var/lib/mysql/dd-test.bin bs=4k count=25000 oflag=dsync 2>&1 | tail -n 2"
25000+0 records in
25000+0 records out
Meaning: With oflag=dsync, each block is synced. If this command takes seconds longer than expected, your sync write latency is bad. (The dd output timing line is omitted here because it varies, but you’ll see it on your system.)
Decision: If sync writes are slow, stop blaming indexes. Fix storage or durability settings. Either way, delete dd-test.bin from the datadir when you’re done.
Task 10: Check whether you’re accidentally using a bind mount with poor semantics (Desktop, CIFS, etc.)
cr0x@server:~$ docker inspect -f '{{ range .Mounts }}{{ .Type }} {{ .Source }} -> {{ .Destination }}{{ "\n" }}{{ end }}' mysql-prod
volume /var/lib/docker/volumes/mysql-data/_data -> /var/lib/mysql
Meaning: It’s a Docker-managed volume; good start.
Decision: If you see bind to a path that lives on a slow/remote FS, you’ve found a likely bottleneck.
Task 11: Check temp table and tmpdir behavior (disk spills)
cr0x@server:~$ docker exec -it mysql-prod bash -lc "mysql -e \"SHOW VARIABLES LIKE 'tmpdir'; SHOW GLOBAL STATUS LIKE 'Created_tmp_disk_tables';\""
+---------------+----------+
| Variable_name | Value |
+---------------+----------+
| tmpdir | /tmp |
+---------------+----------+
+-------------------------+-------+
| Variable_name | Value |
+-------------------------+-------+
| Created_tmp_disk_tables | 48291 |
+-------------------------+-------+
Meaning: tmpdir is /tmp, and you have many disk-based temp tables.
Decision: Put tmpdir on a fast volume with enough space. Also investigate queries causing temp spills.
Task 12: Spot CPU throttling inside containers
cr0x@server:~$ docker stats --no-stream mysql-prod
CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
3b1a1c2d3e4f mysql-prod 398.23% 1.62GiB / 2GiB 81.0% 1.2GB / 900MB 40GB / 12GB 58
Meaning: High CPU usage can be normal. The key is whether you’re hitting CPU quotas (not shown here) and seeing latency spikes correlated with throttling.
Decision: If CPU is high and latency is high, check query plans. If CPU is low but latency is high, check I/O and locks first.
Task 13: Confirm slow query log is enabled and where it writes
cr0x@server:~$ docker exec -it mysql-prod bash -lc "mysql -e \"SHOW VARIABLES LIKE 'slow_query_log'; SHOW VARIABLES LIKE 'slow_query_log_file';\""
+----------------+-------+
| Variable_name | Value |
+----------------+-------+
| slow_query_log | ON |
+----------------------+-------------------------------+
| Variable_name | Value |
+----------------------+-------------------------------+
| slow_query_log_file | /var/lib/mysql/mysql-slow.log |
+----------------------+-------------------------------+
Meaning: Logging is on and lands on the data volume.
Decision: If it lands in the container filesystem and you rotate poorly, you can fill root and crash MySQL. Put logs on persistent storage and rotate.
Task 14: Check how many threads and connections you’re actually running
cr0x@server:~$ docker exec -it mysql-prod bash -lc "mysql -e \"SHOW GLOBAL STATUS LIKE 'Threads_connected'; SHOW VARIABLES LIKE 'max_connections';\""
+-------------------+-------+
| Variable_name | Value |
+-------------------+-------+
| Threads_connected | 412 |
+-------------------+-------+
+-----------------+-------+
| Variable_name | Value |
+-----------------+-------+
| max_connections | 800 |
+-----------------+-------+
Meaning: You’re running hot on connections. Even if queries are fast, per-thread memory can wreck your container.
Decision: Add pooling, cap max connections, and measure memory growth during bursts.
Three corporate mini-stories (anonymized, painfully familiar)
Mini-story 1: The incident caused by a wrong assumption
They migrated a legacy app from VMs to containers because “it’s just packaging.” The database went too, because that sounded neat and modern. The team used a popular MySQL image, ran it in Docker Swarm, and used a bind mount to a host directory that looked persistent enough.
On day three, the p95 write latency doubled, then doubled again. The app didn’t crash; it just got slower until the UI looked like it was rendered by a very patient intern. Engineers dove into query plans, added indexes, and argued about ORM behavior. The slow query log showed a bunch of innocent INSERTs taking hundreds of milliseconds.
Eventually someone checked the host: the bind mount path lived on a network share mounted for “convenience,” because the ops team wanted easy access for backups. Nobody called it NFS in the Docker compose file; it was just a directory. Fsync latency was brutal, and the tail latencies were worse.
The fix was boring: move the datadir to local block storage with predictable sync latency, then implement proper backups via logical dumps and binlogs (and later physical backups). Latency dropped immediately. The indexes they added weren’t harmful, but they weren’t the problem.
Mini-story 2: The optimization that backfired
A different company had a write-heavy service. Someone read that innodb_flush_log_at_trx_commit=2 can be faster, and they were right. They changed it, saw great benchmarks, and rolled it out broadly.
For a month, everything looked good. Throughput improved, and the storage graphs calmed down. Then a node crashed hard—kernel panic, unclean shutdown, the whole spectacle. MySQL restarted, recovered, and the service came back. Except some recently acknowledged writes were gone.
No one was happy, but nobody was surprised either. The service didn’t have a clear durability contract: it acknowledged transactions before they were fully durable, and the application layer assumed “acknowledged means permanent.” The optimization was technically correct but operationally unowned.
They eventually restored strict durability on critical tables and implemented idempotency for write paths where they could tolerate replay. The real fix wasn’t the config; it was aligning application semantics with storage reality.
Mini-story 3: The boring but correct practice that saved the day
A team running MariaDB in containers had one habit: every performance complaint started with the same three checks—volume type, fsync latency, and container memory headroom. It wasn’t glamorous, but it made incidents short.
During a peak traffic event, they saw replication lag climbing. App engineers immediately blamed “a bad query rollout.” The SRE on call checked disk latency on the primary and replicas. The replicas were fine; the primary had periodic spikes in sync write latency.
The root cause wasn’t MySQL at all. A scheduled antivirus scan (yes, on Linux) was crawling the volume mountpoint and hammering metadata. Because they had the habit of checking storage first, they found it in minutes, not hours.
They excluded the database directories from scanning, documented it, and moved on. The best part: nobody had to learn a new database parameter at 2 a.m.
Common mistakes: symptoms → root cause → fix
This section is meant to be used mid-incident when someone is asking why “the DB is slow” and the room is getting loud.
1) Symptom: INSERT/UPDATE latency spikes, reads mostly fine
- Root cause: fsync latency (redo log flush, binlog sync) on slow storage or network volume.
- Fix: move datadir/binlogs to lower-latency storage; measure sync writes; tune durability only with explicit risk sign-off.
2) Symptom: MySQL restarts “randomly,” container exits with 137
- Root cause: OOM kill from cgroup memory limit; buffer pool + per-connection memory exceeded limit.
- Fix: reduce buffer pool, cap connections, use pooling, increase memory limit, and verify memory usage under load.
3) Symptom: Slow queries show “Copying to tmp table” or disk temp tables increase
- Root cause: tmpdir on slow or small filesystem; temp spills due to sorts/group by operations.
- Fix: move tmpdir to fast volume; increase tmp table thresholds carefully; fix queries/indexes causing spills.
4) Symptom: “Disk full” but datadir volume has space
- Root cause: binlogs or slow logs growing without rotation; tmpdir on container root; Docker root filesystem filling.
- Fix: configure log rotation; set binlog expiration; move tmpdir and logs onto the correct volume; monitor both Docker root and DB volumes.
5) Symptom: Throughput collapses during backups
- Root cause: backup method saturating I/O or locking tables; snapshotting on storage that penalizes copy-on-write; reading from the same volume as redo writes.
- Fix: schedule backups with I/O limits; use replica for backups; use physical backups if appropriate; measure impact and set an SLO.
6) Symptom: Replication lag grows after moving to containers
- Root cause: replicas on different storage class; CPU throttling; binlog fsync cost; too-small relay log space or slow disk.
- Fix: standardize storage; check throttling; tune replication settings; ensure relay logs and datadir are on fast persistent volumes.
7) Symptom: Performance is fine until a load spike, then everything times out
- Root cause: connection storm; max_connections too high; thread creation and memory explosion.
- Fix: pool connections; cap concurrency at app layer; set max_connections based on capacity; consider thread pool features where available.
8) Symptom: CPU looks low but queries are slow
- Root cause: I/O wait, lock contention, or throttling not visible in naive metrics.
- Fix: check InnoDB status for waits; check disk latency; check container CPU throttling metrics; inspect lock waits.
Joke #2: Containers don’t make your database faster; they just help it travel with its bad habits.
Checklists / step-by-step plan
Step-by-step: production-ready MySQL/MariaDB in Docker
- Pick the storage class first. Local SSD-backed block storage beats network storage for low-latency commits, unless you’ve proven otherwise.
- Mount datadir as a volume. No exceptions. If you can’t mount a volume, you don’t have a database; you have a demo.
- Decide durability explicitly. Document innodb_flush_log_at_trx_commit and sync_binlog with a clear risk statement.
- Size memory against container limits. Buffer pool + overhead + peak connections must fit with headroom.
- Limit connections and enforce pooling. Use a sane max_connections. Do not let your app “discover” the limit via outages.
- Put tmpdir somewhere safe. Fast, large, monitored. Same goes for logs.
- Enable observability. Slow query log, performance schema (when appropriate), key InnoDB metrics.
- Plan backups and restores as a system. Test restore time and correctness. If you can’t restore, you don’t have backups.
- Load test on the same platform. Docker Desktop benchmarks are for feelings, not capacity planning.
- Rehearse failure. Kill -9 the container in staging and verify recovery behavior and data integrity assumptions.
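A minimal rehearsal of that last step, assuming a staging container named mysql-staging; add real data-integrity checks for your own schema on top:
cr0x@server:~$ docker kill --signal=SIGKILL mysql-staging
cr0x@server:~$ docker start mysql-staging
cr0x@server:~$ docker logs --since 10m mysql-staging | grep -i -E "crash recovery|corrupt|error"
You want to see InnoDB complete crash recovery cleanly and the service return within your stated recovery objective. If you only ever test graceful shutdowns, you haven’t tested the failure you’ll actually get.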
Minimal “day 1” configuration decisions (write them down)
- Where is /var/lib/mysql stored, and what backs it?
- What’s the expected p95 and p99 fsync latency?
- What’s the container memory limit, and how is buffer pool sized?
- What’s the maximum allowed concurrent connections?
- Where do binlogs and slow logs live, and how are they rotated?
- What is your restore procedure, and when did you last run it?
FAQ
1) Should I run MySQL/MariaDB in Docker in production?
Yes, if you treat it like a stateful service with real storage engineering. No, if your plan is “it worked on my laptop.” Containers don’t remove the need for ops discipline; they make it more urgent.
2) Named volume or bind mount for /var/lib/mysql?
On Linux servers, either can work. Named volumes are often simpler operationally. Bind mounts are fine if you control the filesystem, mount options, and backups. On Docker Desktop, named volumes usually outperform bind mounts for databases.
3) overlay2 is my Docker driver—am I doomed?
No. You’re doomed only if your database files live on overlay2’s writable layer. Use a real volume and you mostly bypass the worst of it.
4) Is innodb_flush_log_at_trx_commit=2 “safe”?
It’s a business decision disguised as a config knob. It can lose up to ~1 second of transactions on crash (depending on timing). If the application can retry safely (idempotency) or tolerate minor loss, it may be acceptable. If not, keep it at 1.
5) Why is performance fine in the morning and terrible in the afternoon?
Often: storage contention (other workloads on the node), background jobs, backups, log rotation, or SSD garbage collection under sustained writes. Measure disk latency over time and correlate with workload events.
6) My slow query log shows simple queries taking forever. How?
Because “simple” SQL can still require fsync at commit, wait on locks, or wait for buffer pool flushes. Look for commit waits, I/O waits, and lock waits before rewriting queries.
7) Should I disable InnoDB doublewrite to speed things up?
Only if you fully understand the risk and your storage stack guarantees atomic page writes (many don’t). Disabling it can improve write performance but increases corruption risk on partial writes. Most teams should leave it on and fix storage latency instead.
8) How do I know if temp tables are hurting me?
Check Created_tmp_disk_tables and watch disk usage where tmpdir points. If disk temp tables spike during slow periods, you likely have queries spilling to disk or tmpdir is slow.
9) Is replication slower in containers?
Not inherently. But containers make it easier to accidentally run primaries and replicas on different storage types, different CPU quotas, or different noisy neighbors. Consistency of infrastructure matters more than “container vs VM.”
10) What’s the single best performance win?
Put your datadir and redo/binlogs on low-latency storage and verify fsync behavior. Most “MySQL is slow in Docker” complaints are storage semantics problems wearing a SQL costume.
Conclusion: next steps you can do today
If you’re running MySQL/MariaDB in Docker and performance is a mystery, stop guessing. Do these in order:
- Confirm /var/lib/mysql is on a real volume, not overlay.
- Measure sync write latency on that volume (not “disk throughput”).
- Verify container memory limits and buffer pool sizing match reality.
- Check durability knobs and confirm they match the business contract.
- Move tmpdir/logs/binlogs onto the right storage and rotate them.
Then tune queries. Then tune indexes. Not the other way around.