Local dev is a friendly liar. Your laptop has one user, fast NVMe, a warm filesystem cache, and a Docker Desktop stack that “helpfully” smooths over sharp edges. Production has none of that. Production has real concurrency, real I/O latency, real upgrades, and the kind of failure modes that arrive at 03:12 with a calendar invite titled “SEV-1”.
If you run MySQL or MariaDB in Docker containers, the gap between “it starts” and “it survives” is where most teams bleed. The engines are similar—until they aren’t. The images look interchangeable—until a minor version changes a default. The storage is “just a volume”—until fsync becomes your new religion.
What breaks in production (and why)
When someone says “MySQL and MariaDB are basically the same,” what they usually mean is: “my application ran a few queries and didn’t crash.” Production requires stronger guarantees: predictable startup, stable defaults, safe upgrades, consistent performance under load, and recoverability when something goes sideways.
The big categories of “worked locally” failure
- Authentication & client compatibility: MySQL 8 changed defaults that older clients don’t love. MariaDB went its own way.
- Persistent storage semantics: bind mounts vs named volumes vs network volumes behave differently under fsync, permissions, and latency.
- Resource limits: your laptop overcommits memory and CPU. Production containers get cgroups and enforcement.
- Startup ordering: Compose “depends_on” isn’t readiness. Your app hits the DB while it’s still doing crash recovery.
- Configuration drift: local uses defaults; prod uses a config file; staging uses an environment variable. Surprise, you’re running three different databases.
- Upgrade paths: a minor image bump can change defaults, redo log formats, or replication behavior.
- SQL mode/collation differences: subtle differences manifest as “why is prod rejecting this insert?”
Here’s the theme: MySQL and MariaDB are not “just containers.” They’re stateful systems with sharp durability and performance edges. Docker makes them easy to start; it does not make them easy to run.
One short joke (1/2): Running a database in a container is like putting a piano on a skateboard—possible, but you’d better care about the floor.
Pick a stance: what you should do
If you’re shipping to production:
- Pin versions. Explicit tags. No latest. Ever.
- Choose the engine intentionally. If you need Oracle MySQL features or strict compatibility with MySQL 8 behavior, use MySQL. If you need MariaDB-specific features or have operational reasons (like certain distributions), use MariaDB. Don’t “swap later.”
- Design storage first. Decide where the data lives, how it’s backed up, and how it’s restored before you ship a single query.
- Measure fsync latency. Not IOPS marketing. Actual fsync behavior on your actual storage path.
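A minimal way to measure it, assuming fio is installed on the host and that the test directory sits on the same filesystem that will back the data volume (adjust /var/tmp/fsync-test to match your real storage path; it is only a placeholder here):

cr0x@server:~$ mkdir -p /var/tmp/fsync-test
cr0x@server:~$ fio --name=fsync-test --directory=/var/tmp/fsync-test --rw=write --bs=4k --size=128m \
    --ioengine=sync --fdatasync=1 --runtime=60 --time_based
cr0x@server:~$ rm -rf /var/tmp/fsync-test

Recent fio versions report fsync/fdatasync latency percentiles in the output; that number is your commit latency floor. Single-digit milliseconds is workable. Tens of milliseconds means the storage path needs attention before any MySQL tuning does.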
Interesting facts and historical context (you’ll actually use)
These aren’t trivia-night facts. They explain why defaults differ, why compatibility is not guaranteed, and why “drop-in replacement” has an expiration date.
- MariaDB was created as a fork of MySQL after the Sun/Oracle acquisitions, to keep a community-governed path alive. That history explains why the names are similar and the divergence is… political and technical.
- MySQL 8 changed the default authentication plugin to caching_sha2_password, which broke older clients and tooling that expected mysql_native_password.
- MySQL 8 removed the query cache (it was a foot-gun under concurrency). If you were relying on it in MySQL 5.7, you were relying on a trap.
- MariaDB’s optimizer and storage engine ecosystem diverged over time (e.g., Aria, MyRocks in certain contexts, and different InnoDB/XtraDB lineage historically). The net effect: “same query” doesn’t mean “same plan.”
- GTID replication exists in both worlds, but implementation details differ enough that cross-engine replication/migration needs planning, not vibes.
- Docker official images are opinionated packaging. Entry-point scripts, default users, permissions, and initialization logic are part of the product you’re running.
- Linux filesystems and drivers matter. Overlay filesystems, network filesystems, and copy-on-write layers influence fsync cost and metadata behavior.
- Durability is configurable. Settings like innodb_flush_log_at_trx_commit trade data safety for speed. Many “fast” container guides quietly set durability to “optional.”
MySQL vs MariaDB in containers: the differences that bite
1) Image behavior is part of your DB
When you run mysql:8.0 or mariadb:11, you’re not only choosing a database engine. You’re choosing:
- How initialization works (/docker-entrypoint-initdb.d behavior, charset defaults, root password handling).
- What the default config looks like, and where it’s loaded from.
- What user IDs the service runs as, affecting volume permissions.
In local dev, you blow away volumes without remorse. In prod, your entrypoint scripts should never be the only thing standing between “restart” and “data directory is unusable.”
2) Authentication and TLS: it’s not just “a password”
The classic failure: app uses an older connector. Locally, you ran MariaDB or MySQL 5.7. In prod, someone chose MySQL 8 “because it’s newer.” The app starts failing logins with errors about authentication plugins or RSA public keys.
Make a decision: either upgrade the client libraries to support MySQL 8 defaults, or explicitly configure MySQL 8 to use mysql_native_password for that user (with eyes open about security posture). For MariaDB, the auth story is different enough that you should validate connector compatibility early.
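If you choose the mysql_native_password route, the change is per-user, not server-wide. A minimal sketch against the app user shown in the tasks later; the password is a placeholder, and newer MySQL release series restrict or remove this plugin, so confirm it exists on your version first:

cr0x@server:~$ docker exec db mysql -uroot -p"$MYSQL_ROOT_PASSWORD" \
    -e "ALTER USER 'app'@'%' IDENTIFIED WITH mysql_native_password BY 'CHANGE_ME'; SELECT user, host, plugin FROM mysql.user WHERE user='app';"

Treat this as a bridge, not a destination: the plugin exists for compatibility, and the long-term fix is still a connector upgrade.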
3) SQL modes and collations: “works” until someone inserts real data
Local data sets are clean. Production data is a landfill with business value. Differences in:
- sql_mode (strictness around invalid dates, truncated strings, and implicit conversions)
- default character set (e.g., utf8mb4 vs older utf8)
- collation defaults (sorting and comparison rules)
…can turn silent local behavior into loud production breakage.
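A cheap way to catch this drift before it bites: capture the same handful of variables from each environment and diff the files. A sketch, assuming staging-defaults.txt is a file you produced the same way against staging:

cr0x@server:~$ docker exec db mysql -uroot -p"$MYSQL_ROOT_PASSWORD" \
    -e "SELECT @@version, @@sql_mode, @@character_set_server, @@collation_server\G" > prod-defaults.txt
cr0x@server:~$ diff staging-defaults.txt prod-defaults.txt

Anything that shows up in the diff is a decision you made implicitly; make it explicit in config or schema.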
4) Performance: same query, different plan, different pain
MySQL and MariaDB share a lot of DNA, but they can choose different execution plans under load. Docker adds its own performance tax: filesystem overlay behavior, storage drivers, and constrained I/O that doesn’t show up on a developer laptop.
If you’re serious: treat each engine + version as a separate platform. Benchmark the workload you actually have, not a synthetic one-liner.
5) Replication and migration: “compatible enough” isn’t a strategy
At some point, a team wants to migrate MariaDB to MySQL, or the other way, because of support contracts, managed service offerings, or feature needs. Containerization doesn’t make that easier; it makes it easier to do casually and therefore dangerously.
Logical dumps (mysqldump) are slow but portable. Physical copies of data directories are fast but brittle across versions and engines. Replication can be elegant but demands compatibility discipline.
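For the logical-dump route, a sketch of the portable option, assuming a database named appdb (the name is a placeholder, and extra flags such as GTID handling depend on your setup):

cr0x@server:~$ docker exec db mysqldump -uroot -p"$MYSQL_ROOT_PASSWORD" \
    --single-transaction --routines --triggers --databases appdb > appdb-dump.sql

It is slow on big datasets, but --single-transaction gives a consistent snapshot for InnoDB tables without locking the world, and the resulting SQL is the format most likely to restore cleanly into a different engine or major version.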
Docker storage: where performance and durability go to argue
Databases do two things all day: read and write. Docker storage determines whether those writes are fast, durable, and recoverable—or “fast until they aren’t,” which is a polite way to describe data corruption and extended outages.
Named volumes vs bind mounts: pick intentionally
- Named volume: Docker-managed path, usually stable permissions, fewer surprises. Operationally nicer for many setups.
- Bind mount: you control the exact host path. Great for predictable backup integration and debugging, but easy to mess up with ownership, SELinux/AppArmor contexts, and filesystem choices.
If you bind-mount a directory from a host filesystem with unusual semantics (network FS, some distributed filesystems, misconfigured NFS), you can get stalls on fsync or outright corruption risk.
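In command form, the two options look like this. A sketch where the host path and UID are examples and you would pick one approach, not both; the official images typically run the server as UID 999, but verify with docker exec db id mysql before copying ownership numbers:

cr0x@server:~$ docker volume create prod-mysql-data
cr0x@server:~$ docker run -d --name db -v prod-mysql-data:/var/lib/mysql \
    -e MYSQL_ROOT_PASSWORD="$MYSQL_ROOT_PASSWORD" mysql:8.0.36

Or, with a bind mount, you pre-create the directory with the right owner and take responsibility for the filesystem underneath it:

cr0x@server:~$ sudo install -d -o 999 -g 999 /srv/mysql-data
cr0x@server:~$ docker run -d --name db -v /srv/mysql-data:/var/lib/mysql \
    -e MYSQL_ROOT_PASSWORD="$MYSQL_ROOT_PASSWORD" mysql:8.0.36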
fsync latency is your “it worked locally” tax
In local dev, fsync is quick and the cache is warm. In production, fsync can be slow due to:
- networked storage
- RAID controller cache policy
- cloud block storage behavior
- copy-on-write overhead
- host-level contention
When fsync is slow, InnoDB log flushes are slow. Then commits are slow. Then your app times out. Then someone “fixes” it by turning durability down. And now you’re gambling with your paycheck.
One quote (paraphrased): Werner Vogels’ idea that everything fails, all the time; design for it rather than hoping it won’t happen.
Container memory limits and InnoDB behavior
InnoDB wants memory for buffer pool, log buffers, and caches. Containers get limits. If you don’t size innodb_buffer_pool_size and friends with cgroups in mind, the kernel OOM killer will eventually make a decision on your behalf. It will not be the decision you wanted.
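A sketch of sizing both sides together, using the official image’s behavior of passing extra arguments straight to mysqld; the numbers are illustrative, not a recommendation:

cr0x@server:~$ docker run -d --name db --memory=2g --cpus=2 \
    -v prod-mysql-data:/var/lib/mysql -e MYSQL_ROOT_PASSWORD="$MYSQL_ROOT_PASSWORD" \
    mysql:8.0.36 --innodb-buffer-pool-size=1G --max-connections=200

The point is the pairing: every time the cgroup limit changes, the buffer pool and connection math must change with it. Leave real headroom for per-connection buffers, temporary tables, and the binlog cache.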
Health checks vs readiness
A container that is “up” is not necessarily “ready.” After a crash or hard restart, MySQL/MariaDB may perform crash recovery. The process accepts TCP connections early, but queries might block or fail until recovery finishes. This is where naive orchestration turns a transient incident into a cascading failure.
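A readiness-style health check runs a query, not a port probe. A sketch with docker run flags; the intervals are examples, and the same idea maps to a Compose healthcheck plus depends_on with condition: service_healthy:

cr0x@server:~$ docker run -d --name db -v prod-mysql-data:/var/lib/mysql \
    -e MYSQL_ROOT_PASSWORD="$MYSQL_ROOT_PASSWORD" \
    --health-cmd='mysql -uroot -p"$MYSQL_ROOT_PASSWORD" -e "SELECT 1"' \
    --health-interval=10s --health-timeout=5s --health-retries=5 --health-start-period=120s \
    mysql:8.0.36
cr0x@server:~$ docker inspect --format '{{.State.Health.Status}}' db

The single quotes around --health-cmd are deliberate: the variable expands inside the container when the check runs, not on the host. The generous start period matters too, because after an unclean stop, crash recovery can legitimately take minutes, and you want “starting,” not a restart loop.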
Practical tasks: commands, outputs, and what decisions to make
Below are production-grade tasks you can run today. Each one includes: command, a realistic output sample, what it means, and the decision you make from it. Do these before you declare victory.
Task 1: Confirm which engine and version you are actually running
cr0x@server:~$ docker exec -it db mysql -uroot -p"$MYSQL_ROOT_PASSWORD" -e "SELECT @@version, @@version_comment;"
+-----------+------------------------------+
| @@version | @@version_comment            |
+-----------+------------------------------+
| 8.0.36    | MySQL Community Server - GPL |
+-----------+------------------------------+
What it means: This is Oracle MySQL 8.0.36, not MariaDB. Version comment matters; MariaDB prints its own branding.
Decision: Use engine-specific docs and migration notes. Don’t apply MariaDB tuning or assumptions to MySQL 8.
Task 2: Check the container image tag and digest (pinning)
cr0x@server:~$ docker inspect --format '{{.Config.Image}} {{index .RepoDigests 0}}' db
mysql:8.0 mysql@sha256:3a5f2c8e7f2c2c6f2d2f4c1c0b0b7c6d9e8f1a2b3c4d5e6f7a8b9c0d1e2f3a4
What it means: You’re running a tag and there’s a specific digest. Tags move; digests don’t.
Decision: Pin to a digest for production rollouts (or at least pin minor version and control upgrades).
Task 3: See whether you’re using a named volume or bind mount for data
cr0x@server:~$ docker inspect db --format '{{json .Mounts}}'
[{"Type":"volume","Name":"prod-mysql-data","Source":"/var/lib/docker/volumes/prod-mysql-data/_data","Destination":"/var/lib/mysql","Driver":"local","Mode":"z","RW":true,"Propagation":""}]
What it means: A Docker named volume backs /var/lib/mysql.
Decision: Ensure your backup and restore procedures know where this volume lives, and that the host storage is suitable (filesystem, latency, monitoring).
Task 4: Verify filesystem type and mount options under the data directory
cr0x@server:~$ docker exec -it db bash -lc "df -T /var/lib/mysql && mount | grep ' /var/lib/mysql ' -n || true"
Filesystem     Type 1K-blocks     Used Available Use% Mounted on
/dev/nvme0n1p2 ext4  96143228 31287124  59921844  35% /var/lib/mysql
What it means: Data is on ext4; good baseline. If you see overlayfs or an NFS mount here, stop and reconsider.
Decision: Avoid putting InnoDB data on network filesystems unless you fully understand durability and locking semantics.
Task 5: Confirm the data directory and critical InnoDB durability settings
cr0x@server:~$ docker exec -it db mysql -uroot -p"$MYSQL_ROOT_PASSWORD" -e "SHOW VARIABLES WHERE Variable_name IN ('datadir','innodb_flush_log_at_trx_commit','sync_binlog');"
+--------------------------------+-----------------+
| Variable_name                  | Value           |
+--------------------------------+-----------------+
| datadir                        | /var/lib/mysql/ |
| innodb_flush_log_at_trx_commit | 1               |
| sync_binlog                    | 1               |
+--------------------------------+-----------------+
What it means: Full durability: flush logs every commit, sync binlog every commit.
Decision: Keep this for production unless you’ve explicitly accepted data loss on crash. If performance is bad, fix storage and configuration before turning these down.
Task 6: Check authentication plugin defaults and user configuration (MySQL 8 classic issue)
cr0x@server:~$ docker exec -it db mysql -uroot -p"$MYSQL_ROOT_PASSWORD" -e "SELECT user, host, plugin FROM mysql.user WHERE user IN ('app','root');"
+------+-----------+-----------------------+
| user | host      | plugin                |
+------+-----------+-----------------------+
| app  | %         | caching_sha2_password |
| root | localhost | caching_sha2_password |
+------+-----------+-----------------------+
What it means: The app user uses caching_sha2_password. Older clients may fail.
Decision: Upgrade client connector or change plugin per-user (and test). Don’t “fix” this by downgrading the server without a plan.
Task 7: Validate SQL mode and catch local/prod drift
cr0x@server:~$ docker exec -it db mysql -uroot -p"$MYSQL_ROOT_PASSWORD" -e "SELECT @@sql_mode\G"
*************************** 1. row ***************************
@@sql_mode: ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION
What it means: Strict modes are enabled. Apps that relied on lax behavior can break.
Decision: Prefer fixing the application and schema rather than loosening SQL mode globally. If you must adjust, do it explicitly and document it.
Task 8: Check default character set and collation (data correctness, not cosmetics)
cr0x@server:~$ docker exec -it db mysql -uroot -p"$MYSQL_ROOT_PASSWORD" -e "SHOW VARIABLES LIKE 'character_set_server'; SHOW VARIABLES LIKE 'collation_server';"
+----------------------+---------+
| Variable_name        | Value   |
+----------------------+---------+
| character_set_server | utf8mb4 |
+----------------------+---------+
+------------------+--------------------+
| Variable_name    | Value              |
+------------------+--------------------+
| collation_server | utf8mb4_0900_ai_ci |
+------------------+--------------------+
What it means: MySQL 8 uses newer collations by default. MariaDB differs. Sorting/comparison can change across environments.
Decision: Standardize charset/collation in schema migrations. Don’t leave it to server defaults.
Task 9: Inspect container logs for crash recovery, permission issues, and plugin failures
cr0x@server:~$ docker logs --tail=80 db
2025-12-31T00:11:21.443915Z 0 [System] [MY-013169] [Server] /usr/sbin/mysqld (mysqld 8.0.36) initializing of server in progress as process 1
2025-12-31T00:11:22.119203Z 1 [System] [MY-013576] [InnoDB] InnoDB initialization has started.
2025-12-31T00:11:27.991130Z 1 [System] [MY-013577] [InnoDB] InnoDB initialization has ended.
2025-12-31T00:11:28.250984Z 0 [System] [MY-010931] [Server] /usr/sbin/mysqld: ready for connections.
What it means: Clean startup. If you see “permission denied,” “upgrade after crash,” or repeated restarts, treat it as a storage/config issue, not an app issue.
Decision: Make readiness checks reflect actual usability (e.g., a simple query) and investigate slow recovery if it repeats.
Task 10: Confirm resource limits (cgroups) to avoid mystery OOMs
cr0x@server:~$ docker inspect db --format 'Memory={{.HostConfig.Memory}} NanoCpus={{.HostConfig.NanoCpus}}'
Memory=2147483648 NanoCpus=2000000000
What it means: 2 GiB memory limit and 2 CPU cores worth of quota are set.
Decision: Tune buffer pool and connection limits to fit. If you run MySQL with unlimited defaults inside a 2 GiB box, the OOM killer will eventually host a surprise party.
Task 11: Watch runtime memory and check for OOM kill history
cr0x@server:~$ docker stats --no-stream db
CONTAINER ID   NAME   CPU %     MEM USAGE / LIMIT   MEM %    NET I/O         BLOCK I/O      PIDS
a1b2c3d4e5f6   db     185.23%   1.92GiB / 2.00GiB   96.00%   1.2GB / 1.1GB   9.8GB / 14GB   61
What it means: You’re flirting with the memory limit. That’s how you get OOM kills during spikes.
Decision: Reduce max_connections, right-size buffer pool, or raise memory. Also evaluate query patterns causing spikes.
Task 12: Check InnoDB buffer pool sizing vs container memory
cr0x@server:~$ docker exec -it db mysql -uroot -p"$MYSQL_ROOT_PASSWORD" -e "SHOW VARIABLES LIKE 'innodb_buffer_pool_size';"
+-------------------------+------------+
| Variable_name           | Value      |
+-------------------------+------------+
| innodb_buffer_pool_size | 1073741824 |
+-------------------------+------------+
What it means: Buffer pool is 1 GiB. On a 2 GiB container, this might be reasonable—if the rest (connections, temp tables, sort buffers) is controlled.
Decision: Keep a margin. If you’re routinely at 90% memory, reduce buffer pool or fix per-connection memory usage.
Task 13: Measure slow queries and whether you’re CPU-bound or I/O-bound
cr0x@server:~$ docker exec -it db mysql -uroot -p"$MYSQL_ROOT_PASSWORD" -e "SHOW GLOBAL STATUS LIKE 'Slow_queries'; SHOW VARIABLES LIKE 'slow_query_log';"
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| Slow_queries  | 1842  |
+---------------+-------+
+----------------+-------+
| Variable_name  | Value |
+----------------+-------+
| slow_query_log | ON    |
+----------------+-------+
What it means: Slow query count is rising. Logging is enabled, so you can inspect and fix.
Decision: Pull slow logs, fix indexes/query patterns, and verify that production uses the same query plans as staging.
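To pull the slow log, find where it actually lives first; by default the file sits under the datadir and is named after the host, so the path below is an example, and the threshold is whatever your latency budget says rather than a magic number:

cr0x@server:~$ docker exec db mysql -uroot -p"$MYSQL_ROOT_PASSWORD" \
    -e "SHOW VARIABLES LIKE 'slow_query_log_file'; SHOW VARIABLES LIKE 'long_query_time';"
cr0x@server:~$ docker exec db tail -n 100 /var/lib/mysql/a1b2c3d4e5f6-slow.log

Aggregate before you fix: tools like pt-query-digest or mysqldumpslow summarize by query fingerprint, which beats reading raw entries one by one.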
Task 14: Validate binary logging and format (replication and point-in-time recovery)
cr0x@server:~$ docker exec -it db mysql -uroot -p"$MYSQL_ROOT_PASSWORD" -e "SHOW VARIABLES LIKE 'log_bin'; SHOW VARIABLES LIKE 'binlog_format';"
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| log_bin       | ON    |
+---------------+-------+
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| binlog_format | ROW   |
+---------------+-------+
What it means: Binlogs are on and in ROW format—generally the safer replication format for correctness.
Decision: If you want point-in-time recovery, keep binlogs and ship them. If you don’t, be honest: your recovery plan is “restore last backup and lose data.”
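If the answer is “we want point-in-time recovery,” make retention explicit and rehearse the replay. A sketch against MySQL 8; the binlog file name, the timestamp, and restore-db are placeholders, with restore-db assumed to be a container already restored from the last full backup:

cr0x@server:~$ docker exec db mysql -uroot -p"$MYSQL_ROOT_PASSWORD" \
    -e "SHOW VARIABLES LIKE 'binlog_expire_logs_seconds'; SHOW BINARY LOGS;"
cr0x@server:~$ docker exec db mysqlbinlog --stop-datetime="2025-12-31 00:10:00" /var/lib/mysql/binlog.000042 > replay.sql
cr0x@server:~$ docker exec -i restore-db mysql -uroot -p"$MYSQL_ROOT_PASSWORD" < replay.sql

Ship binlogs off the host on a schedule; binlogs that die with the volume recover nothing.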
Task 15: Confirm that your app isn’t accidentally using localhost/socket assumptions
cr0x@server:~$ docker exec -it app bash -lc "getent hosts db && nc -vz db 3306"
172.20.0.3 db
Connection to db 3306 port [tcp/mysql] succeeded!
What it means: The app container can resolve and reach the DB by service name over TCP.
Decision: Standardize on TCP for container-to-container connectivity unless you have a deliberate reason to share a UNIX socket via volume (rare and fragile).
Task 16: Check for table corruption or crash recovery warnings (InnoDB health)
cr0x@server:~$ docker exec -it db mysql -uroot -p"$MYSQL_ROOT_PASSWORD" -e "SHOW ENGINE INNODB STATUS\G" | sed -n '1,80p'
*************************** 1. row ***************************
Type: InnoDB
Name:
Status:
=====================================
2025-12-31 00:22:10 0x7f1a2c1ff700 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 44 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 102 srv_active, 0 srv_shutdown, 188 srv_idle
srv_master_thread log flush and writes: 290
What it means: You can inspect for “LATEST DETECTED DEADLOCK” and I/O waits. Absence of glaring errors is good; presence of long waits points to I/O or contention.
Decision: If you see high “log i/o” waits or “fsync” stalls, investigate storage latency first.
Fast diagnosis playbook
You have an incident. The app is slow or down. People are typing “any updates?” in chat like it’s a load test. Here’s the order that finds the bottleneck quickly.
First: Is the database actually healthy and ready?
- Check container restarts and logs for crash recovery loops, permission errors, and upgrade messages.
- Run a trivial query from inside the container (SELECT 1;). If that blocks, you’re not “up.” See the command sketch after this list.
- Confirm disk space and inode pressure on the host path backing the volume.
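A sketch of that first pass as commands; adjust the container name and volume path to yours:

cr0x@server:~$ docker inspect db --format 'Status={{.State.Status}} Restarts={{.RestartCount}} OOMKilled={{.State.OOMKilled}}'
cr0x@server:~$ docker logs --since 15m db 2>&1 | grep -iE 'error|denied|recover|corrupt' | tail -n 20
cr0x@server:~$ docker exec db mysql -uroot -p"$MYSQL_ROOT_PASSWORD" -e "SELECT 1;"
cr0x@server:~$ df -h /var/lib/docker/volumes && df -i /var/lib/docker/volumes

If SELECT 1 hangs, stop debugging the application and start debugging the database and its storage.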
Second: Is it I/O latency (fsync) or CPU contention?
- If commits are slow and concurrency is high, suspect fsync latency and storage.
- If CPU is pegged and queries are slow, suspect missing indexes, bad plans, or too many threads.
- If memory is near limit, suspect swapping (host) or OOM kills (container).
Third: Is it a config drift or compatibility regression?
- Compare sql_mode, charset/collation, and auth plugins between local/staging/prod.
- Confirm image tag and digest. Was there an “innocent” redeploy?
- Check client connector versions and TLS requirements.
Fourth: Is it the network path and name resolution?
- Validate service discovery inside the Docker network.
- Confirm port mappings and firewall rules when crossing hosts.
- Look for ephemeral connection storms (pool misconfiguration).
Common mistakes: symptoms → root cause → fix
1) Symptom: App can’t log in after moving to MySQL 8
Root cause: Auth plugin mismatch (caching_sha2_password vs older client expectations).
Fix: Upgrade the client connector, or set that user’s plugin to mysql_native_password and re-issue the password; then schedule a proper upgrade.
2) Symptom: Random “permission denied” on startup after redeploy
Root cause: Volume ownership/SELinux labels don’t match the container’s runtime UID/GID; bind mount path created by root on host.
Fix: Standardize volume creation; apply correct ownership on host path; use consistent security context; avoid ad-hoc manual directory creation.
3) Symptom: Writes are slow only in production, reads seem fine
Root cause: fsync latency on production storage path (network volume, slow disk, or host contention). In dev, your disk is fast and idle.
Fix: Measure fsync behavior; move data to local SSD/NVMe or properly provisioned block storage; avoid overlay for datadir; tune I/O scheduler and host.
4) Symptom: “Too many connections” during traffic spikes
Root cause: Connection pooling misconfigured, containers auto-scaling without DB capacity planning, or low max_connections relative to workload.
Fix: Fix pool settings first (timeouts, max open connections). Then size max_connections with memory accounting (per-connection buffers).
5) Symptom: Queries behave differently between MariaDB and MySQL
Root cause: Different optimizer choices, different collations, or SQL mode differences.
Fix: Pin engine + version across environments; run EXPLAIN in prod-like conditions; standardize schema defaults for charset/collation and SQL mode.
6) Symptom: Container “healthy” but app gets timeouts right after restart
Root cause: Health check only verifies process is running; DB is still in crash recovery or warming buffers.
Fix: Health check should run a query (and optionally verify replication state). Gate app startup on readiness, not “container started.”
7) Symptom: “Disk full” inside container, host has free space
Root cause: You’re writing to the container layer (overlay) instead of the volume; or the volume is on a different filesystem that is full.
Fix: Ensure datadir points into the mounted volume; verify mounts; clean up binlogs and tmpdir with a retention policy.
8) Symptom: Replication breaks after “minor” version change
Root cause: Binlog/GTID compatibility expectations violated; different defaults or deprecated behaviors across versions/engines.
Fix: Upgrade with a staged plan; validate replication in a pre-prod environment with real traffic patterns; pin versions until validated.
Three corporate mini-stories from the trenches
Mini-story 1: The incident caused by a wrong assumption
The team had a tidy Docker Compose file. A MySQL service, an app service, and a volume. Dev was happy. Staging was “fine.” Production went live and died quietly: intermittent login failures and a spike in 500s that looked like an app regression.
The assumption was subtle: “MySQL is MySQL.” Locally, they had been running MariaDB because it started faster and felt familiar. In production, a platform engineer swapped it to mysql:8 for standardization. No malice; just a well-intentioned cleanup.
The app used an older MySQL connector embedded in a base image that hadn’t been updated in a while. Under light load it sometimes worked (thanks to connection reuse and a forgiving test path). Under real load, new connections ramped up and authentication failed with plugin errors.
The fix wasn’t complicated: update the connector and standardize the auth expectations. The damage was the time lost because everyone debugged “networking” for an hour. The lesson was sharper: engine selection is part of your application contract, not an infrastructure detail you can hot-swap on a Friday.
Mini-story 2: The optimization that backfired
A different org had a performance problem: writes were slow. Someone spotted the usual suspects—innodb_flush_log_at_trx_commit=1 and sync_binlog=1. They changed them to 2 and 0 respectively. Latency improved. The incident channel calmed down. High fives all around.
Two weeks later, a host reboot happened. Not even a dramatic one. After restart, the application was missing recent transactions. Not a lot. Just enough to be painful, because the missing rows were customer-facing and time-sensitive.
Now the team had two problems: fix the data and fix the trust. They had to reconcile from external systems, reprocess events, and explain why durability had been traded away without a written decision. The root issue wasn’t the settings themselves; it was using them as a performance band-aid instead of fixing storage latency and I/O contention.
The eventual solution was boring: move the database volume onto properly provisioned block storage, cap connection storms, and tune slow queries. Durability settings went back to strict, because “fast but wrong” is not a feature.
Mini-story 3: The boring but correct practice that saved the day
Another team ran MariaDB in containers for an internal platform. They were not glamorous, but they were disciplined. Every deployment pinned image digests. Every config change went through a review. Backups ran nightly with a restore test every sprint.
One morning, after a routine node maintenance event, the database failed to start. The logs showed corruption symptoms in InnoDB tablespace files. Nobody celebrated, but nobody panicked either. They followed the runbook: stop writes, snapshot what’s left, restore last known good backup, replay binlogs up to the incident boundary.
They were back in service with limited data loss exposure because their retention and binlog shipping were designed ahead of time. The root cause turned out to be a storage layer issue on that node, not MariaDB itself. But the reason it didn’t become a career event was simple: they practiced recovery as a normal thing, not a once-a-year fire drill.
One short joke (2/2): Backups are like parachutes—if you wait until you need one to check it, you’ve already made a choice.
Checklists / step-by-step plan
Step-by-step: move from “local success” to “production survivability”
- Choose the engine and version deliberately. Document why you’re using MySQL vs MariaDB, and pin the exact version.
- Standardize config injection. Prefer a mounted config file or a controlled config mechanism; avoid mixing env vars and multiple config fragments without a policy.
- Define storage. Decide: named volume vs bind mount; host filesystem; storage class; IOPS/durability requirements.
- Set durability consciously. Keep innodb_flush_log_at_trx_commit=1 and sync_binlog=1 unless you have signed-off data loss tolerance.
- Set memory limits and tune to fit. Cap max_connections, size the buffer pool, and verify container OOM behavior.
- Implement readiness checks. A health check that runs a query beats “port open.”
- Log and export metrics. Slow query log, error log, and core MySQL/MariaDB metrics (connections, buffer pool hit ratio, redo log waits).
- Backups with restore tests. Schedule restores in a disposable environment; validate schema and application compatibility.
- Upgrade rehearsal. Rehearse version upgrades with production-like data size and traffic patterns. Measure recovery times.
- Incident runbooks. Have a playbook for: high latency, replication lag, disk full, and startup failures.
Deployment checklist (print this before you ship)
- Image tag pinned and reviewed; ideally digest pinned.
- Data directory on a volume (not container layer).
- Filesystem and mount options reviewed for DB suitability.
- Memory/CPU limits set; MySQL/MariaDB tuned accordingly.
- Auth plugin/client compatibility validated.
- Charset/collation standardized in schema, not left to defaults.
- Durability settings explicitly decided and documented.
- Backups configured; restore tested.
- Monitoring and alerts set for disk, CPU, memory, and query latency.
- Readiness/health checks validate real query success.
FAQ
1) Can I switch from MariaDB to MySQL by just changing the Docker image?
You can try, and sometimes it even starts. But data directory formats, system tables, and defaults differ across versions. Plan a migration: logical dump/restore or replication-based cutover.
2) Why is MySQL 8 breaking my app when MariaDB worked?
Most commonly: authentication plugin differences, SQL mode strictness, and collation defaults. Confirm with SELECT @@version_comment, user plugins, and @@sql_mode.
3) Are named volumes safer than bind mounts?
Operationally, named volumes reduce foot-guns with permissions and accidental writes to the container layer. Bind mounts are fine when you control the host and have strong practices around ownership, labels, and backups.
4) Why is the database slower in Docker than on bare metal?
If the datadir is truly on a volume, the overhead can be small. The big slowdowns usually come from the underlying storage (networked volumes, slow disks, contention) and fsync behavior—not Docker itself.
5) Should I disable fsync-related durability settings for performance?
Only if you can tolerate losing committed transactions on crash. For production systems that users trust, fix storage latency and query patterns first. Changing durability is a business decision, not a tuning tweak.
6) What’s the quickest way to confirm I/O is my bottleneck?
Look for InnoDB status showing log waits, correlate with slow commit latency, and check host disk metrics. In incidents, the “writes slow, reads fine” pattern is a loud hint.
7) Why does Compose “depends_on” not solve startup issues?
It controls start order, not readiness. MySQL/MariaDB might accept connections before it can serve queries reliably (crash recovery, initialization, migrations).
8) Can I use the same backup strategy for MySQL and MariaDB containers?
Conceptually yes—logical dumps and physical backups exist for both—but tooling options and compatibility details differ. The real rule: test restores with the same engine/version you will run.
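A restore test can be as small as a throwaway container, a dump file, and one sanity query. A sketch, assuming the appdb-dump.sql from the dump sketch earlier and a table you know should have rows (appdb.orders is a placeholder):

cr0x@server:~$ docker run -d --name restore-test -e MYSQL_ROOT_PASSWORD="$MYSQL_ROOT_PASSWORD" \
    --health-cmd='mysql -uroot -p"$MYSQL_ROOT_PASSWORD" -e "SELECT 1"' --health-interval=5s mysql:8.0.36
cr0x@server:~$ until [ "$(docker inspect --format '{{.State.Health.Status}}' restore-test)" = "healthy" ]; do sleep 5; done
cr0x@server:~$ docker exec -i restore-test mysql -uroot -p"$MYSQL_ROOT_PASSWORD" < appdb-dump.sql
cr0x@server:~$ docker exec restore-test mysql -uroot -p"$MYSQL_ROOT_PASSWORD" -e "SELECT COUNT(*) FROM appdb.orders;"
cr0x@server:~$ docker rm -fv restore-test

Automate it on a schedule and alert on failure; a restore test that depends on someone remembering is the same as no restore test.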
9) What’s the safest way to run schema migrations in containers?
Run migrations as an explicit job with a lock/leader mechanism, and gate app rollout on migration success. Don’t let multiple replicas race to migrate at startup.
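For the lock itself, both MySQL and MariaDB offer GET_LOCK, which works as a crude leader election as long as everything happens in one session. A sketch only; appdb and the ALTER are placeholders, and real migration tools implement the same idea with a lock table plus proper abort-on-timeout handling:

cr0x@server:~$ docker exec -i db mysql -uroot -p"$MYSQL_ROOT_PASSWORD" appdb <<'SQL'
-- 1 = lock acquired, 0 = timed out; a real job must abort instead of continuing on 0
SELECT GET_LOCK('schema_migration', 300);
-- the migration itself goes here
ALTER TABLE orders ADD INDEX idx_created_at (created_at);
SELECT RELEASE_LOCK('schema_migration');
SQL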
10) How do I prevent accidental upgrades when rebuilding images?
Pin base images, pin DB image tags/digests, and treat version changes as change-managed events with a rollback plan.
Conclusion: next steps that prevent re-learning this
“It worked locally” is usually true. It’s just irrelevant. Production is a different machine with different failure modes, and databases amplify every tiny lie your environment tells you.
Do these next, in order:
- Pick the engine and pin the version (and stop pretending MySQL and MariaDB are interchangeable at runtime).
- Audit your storage path for the datadir: filesystem type, mount options, and real fsync latency under load.
- Standardize defaults: sql_mode, charset/collation, authentication expectations—make them explicit.
- Implement readiness checks and recovery drills: a database you can’t restore is a rumor, not a system.
- Run the practical tasks above and keep their outputs in your runbook so you can compare before/after changes.
If you do nothing else: stop using “latest,” stop treating the data directory as a casual folder, and stop letting defaults decide correctness. Production rewards boredom. Earn it.