MySQL vs MariaDB: Kubernetes readiness—probes, restarts, and data safety

Your MySQL pod “looks fine” until the node reboots, Kubernetes starts doing what it was designed to do, and suddenly the database is in a restart loop with a volume that refuses to mount. Meanwhile, the application team calls it “a Kubernetes issue,” and the platform team calls it “a database issue.” Congratulations: you’re the adult in the room now.

This is a practical, slightly opinionated guide to running MySQL and MariaDB on Kubernetes without turning probes into a denial-of-service, restarts into corruption, or “stateless mindset” into data loss. We’ll talk about what breaks, why it breaks, and the commands you’ll actually run at 2 a.m.

What Kubernetes actually does to your database (and why it hurts)

Kubernetes is consistent about one thing: it is indifferent. It will restart your process if the probe fails. It will move your pod if the node is drained. It will kill containers when resources are tight. It will also do this while your database is mid-flush, mid-recovery, or mid-transaction, because it has no concept of “this is a sacred cow with a write-ahead log.”

Databases are state machines with expensive invariants. InnoDB’s invariants revolve around redo/undo logs, doublewrite (depending on settings), and durable fsync. MariaDB and MySQL share a lot of DNA here, but Kubernetes pressure exposes the edges differently—especially around startup time, probe behavior, and replication stacks you might be using (async replication, semi-sync, or Galera).

The three Kubernetes behaviors that matter most for data services:

  • Probes are traffic gates and kill switches. Readiness decides whether you get traffic. Liveness decides whether you get to live. Startup exists because liveness is otherwise a bully.
  • Restarts are “normal.” You don’t get to treat them as exceptional. Your database must be able to restart safely and quickly, repeatedly.
  • Storage is attached and detached by controllers (and occasionally humans). Volume attach/mount delays, stale attachments, and slow fsync can dominate everything.

If you only remember one operational truth: for stateful systems, Kubernetes isn’t “self-healing,” it’s “self-retrying.” Your job is to make retries safe.

MySQL vs MariaDB: what matters in Kubernetes

MySQL and MariaDB are close enough to lull teams into assuming operational equivalence. That assumption is the seed of several very expensive outages.

Core engine behavior: mostly similar, but watch the defaults and edge cases

In typical Kubernetes deployments you’ll run InnoDB on both. Crash recovery exists on both. Both can recover from abrupt termination. But the devil is in:

  • Startup duration variance. After an unclean shutdown, InnoDB crash recovery time can range from seconds to “your liveness probe is now a weapon.” Recovery depends on redo log size, checkpoint age, IO throughput, and fsync behavior on your storage class.
  • Client tooling differences. Some images ship mysqladmin, some ship mariadb-admin wrappers, and some ship minimal clients. Probes and init scripts that hardcode one tool tend to fail on the other at the worst time.
  • GTID and replication details. MariaDB GTID is not MySQL GTID; they are not drop-in compatible. If you’re migrating or mixing tooling, the “it’s just GTID” story turns into a long night.

Replication stacks: async vs Galera changes how probes should behave

“MySQL on Kubernetes” often means one primary plus replicas (async replication), or an operator-managed cluster. “MariaDB on Kubernetes” often means either the same, or Galera (wsrep) because it’s easy to sell “multi-primary” to management.

Probes for async replication can be simple: is mysqld alive and accepting connections, and (for readiness) is replication reasonably caught up? Probes for Galera need to consider cluster state: a node can accept TCP and still be unsafe to serve writes (or even reads) depending on donor/desync state.

Operators and images: Kubernetes readiness is a product decision

Your operational experience is often determined by the operator/image choice more than the database brand. Some operators implement sane startupProbes, graceful termination hooks, and recovery gates. Others ship optimistic defaults and let you learn by fire.

My opinion: if you run stateful services without an opinionated operator (or a carefully maintained in-house chart), you’ve effectively chosen “pet database on cattle orchestration.” That’s not a strategy; it’s a vibe.

Interesting facts and historical context (the parts people forget)

  1. MariaDB was created in 2009 as a community-led fork after Oracle announced its acquisition of Sun Microsystems (and thus MySQL).
  2. MySQL’s default character set changed over time (notably toward utf8mb4 in modern versions), and this impacts schema compatibility and index length in real upgrades.
  3. MariaDB’s GTID implementation is different from MySQL’s; tooling that assumes MySQL GTID semantics can mis-diagnose replication health on MariaDB.
  4. Galera replication became a MariaDB “headline feature” for many enterprises, but it’s a different failure model than async replication: membership and quorum are operational requirements, not nice-to-haves.
  5. Kubernetes added startupProbe relatively late (compared to liveness/readiness), largely because too many real workloads had slow-but-correct startups that were getting killed.
  6. InnoDB’s crash recovery cost is not linear with data size; it’s tied to redo volume and checkpoint behavior, which is why a “small database” can still restart painfully slowly after IO stalls.
  7. fsync behavior changed across cloud storage generations; what was “fine” on local SSD can become a restart storm on network-attached volumes with higher latency variance.
  8. MySQL 8 introduced a transactional data dictionary, which improved consistency but also made some upgrade/downgrade and recovery behaviors different from older MySQL and from MariaDB lines.

Readiness, liveness, and startup probes that don’t sabotage you

Probes are where Kubernetes touches your database every few seconds. Done well, they prevent traffic from hitting a sick pod. Done badly, they create the sickness.

Rules of thumb (opinionated, because you need them)

  • Never use liveness to check “is the database logically healthy.” Liveness should answer: “Is the process stuck beyond recovery?” If you make liveness depend on replication lag or a complex SQL query, you will kill perfectly recoverable pods.
  • Readiness can be strict. Readiness is the right place to gate traffic based on “can serve queries now” and (optionally) “is caught up enough.”
  • Use startupProbe for crash recovery windows. If you don’t, liveness will murder your InnoDB recovery. It will then restart, re-enter recovery, and get murdered again. That loop is a classic.
  • Prefer exec probes using local socket when possible. TCP probes can pass while SQL is wedged. SQL probes can fail when DNS is slow. Socket-based local admin checks reduce moving parts.

What a good readiness check looks like

Readiness should be cheap and deterministic. A common pattern: connect locally and run SELECT 1. If you’re a replica, optionally check replication lag. If you’re in Galera, check wsrep state.

Avoid “expensive truth” queries like scanning large tables or probing metadata that locks. You want “can accept a trivial query and return quickly.” That correlates well with user experience, and it won’t collapse your pod under probe load.

What a good liveness check looks like

Liveness should be conservative. For MySQL/MariaDB, a decent liveness probe is a local ping using admin tooling with a short timeout. If it can’t respond to a ping for N consecutive checks, something is wedged.

Startup probes: buy time for recovery

After unclean shutdown, InnoDB can legitimately take minutes to recover. Your probes must grant that time. This isn’t “being lenient.” This is correctness.

Joke #1: A liveness probe that kills mysqld during crash recovery is like checking a patient’s pulse by unplugging the ventilator.

Probe timeouts: the silent killer

Many probe failures are not “database down,” they’re “probe timeout too short for occasional storage latency.” If your PVC is backed by network storage, a 1-second timeout is an outage generator.
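
To make this concrete, here is a minimal sketch of all three probes for a container named mysql. Every number is an assumption to be replaced with your measured worst case, and the /healthcheck/client.cnf path assumes a client config file mounted from a Secret so the password never lands in the probe’s argv (Task 4 below shows why that matters):

# Sketch only: tune periods, timeouts, and thresholds to your storage, not your optimism.
startupProbe:
  exec:
    command: ["mysqladmin", "--defaults-extra-file=/healthcheck/client.cnf", "ping"]
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 90          # 90 x 10s = 15 minutes of budget for InnoDB crash recovery
livenessProbe:
  exec:
    command: ["mysqladmin", "--defaults-extra-file=/healthcheck/client.cnf", "ping"]
  periodSeconds: 10
  timeoutSeconds: 5             # generous: network volumes have latency spikes
  failureThreshold: 3
readinessProbe:
  exec:
    command: ["mysql", "--defaults-extra-file=/healthcheck/client.cnf", "-e", "SELECT 1"]
  periodSeconds: 5
  timeoutSeconds: 3
  failureThreshold: 3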

Restarts, shutdown semantics, and InnoDB crash recovery

Kubernetes termination is a signal plus a deadline. It sends SIGTERM, waits terminationGracePeriodSeconds, then SIGKILL. Your database needs enough grace to flush, update state, and exit cleanly.

Graceful shutdown buys you faster startup

Clean shutdown: little or no redo to apply, faster startup, fewer scary log messages, less time in “not ready.” Unclean shutdown: recovery time that depends on IO, which you can’t control during a node event.
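
A sketch of the shutdown side (the grace period, marker path, and sleep are assumptions; the pattern is what matters: go NotReady first, then let SIGTERM arrive, then leave enough time for a clean InnoDB shutdown):

spec:
  terminationGracePeriodSeconds: 300        # enough to flush a dirty buffer pool under load
  containers:
    - name: mysql
      lifecycle:
        preStop:
          exec:
            # Assumes your readiness check also looks for this marker, so the pod drops out of
            # Service endpoints before mysqld ever sees SIGTERM.
            command: ["sh", "-c", "touch /var/run/mysqld/draining && sleep 15"]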

Restart loops are usually probe-policy bugs

If a pod restarts repeatedly and the logs show crash recovery starting each time, your probes are too aggressive or you’re not using startupProbe. Kubernetes is not “healing” your DB; it’s interrupting the healing.

A reliability idea (paraphrased)

“Hope is not a strategy.” — paraphrased idea commonly attributed to engineers and operators in reliability circles. Treat it as guidance: design for failure, don’t wish it away.

Storage and data safety: PVCs, filesystems, and the fsync tax

If you want a database to be durable, you must pay for durable writes. Kubernetes doesn’t change that. It just makes the billing more confusing because the slow part is now a StorageClass.

PVC semantics that matter

  • Access mode (RWO vs RWX): Most database volumes should be RWO. RWX network filesystems can work, but performance and locking semantics vary; don’t casually drop MySQL on NFS and expect happiness.
  • Reclaim policy: “Delete” has its place, but you should not discover it in production after deleting a StatefulSet.
  • Attachment/mount delays: A pod can be scheduled quickly but wait a long time for volume attach. Probes and timeouts must account for this.
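
A minimal volumeClaimTemplates sketch that reflects these points (the storage class name mirrors the one in Task 6 below and is an assumption; use whatever your cluster actually provides, and check its reclaim policy separately with kubectl get storageclass):

volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]        # RWO: one node mounts it read-write at a time
      storageClassName: fast-ssd            # assumption; benchmark it with your write pattern
      resources:
        requests:
          storage: 200Gi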

fsync and friends

The durability knobs (like innodb_flush_log_at_trx_commit and sync_binlog) interact with the underlying storage. On low-latency local SSD, fsync per commit is manageable. On a busy network volume with latency spikes, it can become a throughput cliff.

Turning down durability to “fix performance” is a business decision, even if you pretend it’s a technical one. If you relax fsync, you are choosing how much data loss you can tolerate on node failure.
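
If that decision is made, make it declaratively so it survives review and audit instead of living in a one-off SET GLOBAL. A sketch of the config fragment, shown here at the durable defaults (the ConfigMap name is an assumption):

apiVersion: v1
kind: ConfigMap
metadata:
  name: mysql-durability
data:
  durability.cnf: |
    [mysqld]
    # Durable defaults: fsync redo on every commit, fsync the binlog on every commit.
    innodb_flush_log_at_trx_commit = 1
    sync_binlog = 1
    # Anything lower trades a bounded data-loss window for latency; record who accepted it.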

Filesystem and mount options

ext4 and XFS are the usual choices. What matters more is consistency and observability: you need to know what you’re actually running on. Also, if you’re using overlay filesystems in weird ways, stop. Databases want boring block storage.

Joke #2: There are two kinds of storage: the kind you benchmark, and the kind that benchmarks you in production.

Practical tasks: commands, expected output, and the decision you make

These are real tasks you can run from an admin machine with kubectl access, plus a few inside-container checks. Each includes what the output implies and what decision you should make next.

Task 1: See if you’re dealing with a probe problem or a crash

cr0x@server:~$ kubectl -n prod get pod mysql-0 -o wide
NAME     READY   STATUS             RESTARTS   AGE   IP           NODE
mysql-0  0/1     CrashLoopBackOff   7          18m   10.42.3.17   node-7

Meaning: CrashLoopBackOff with multiple restarts suggests either the process exits or liveness kills it. Not enough information yet.

Decision: Inspect events and previous logs next; don’t tweak random MySQL settings yet.

Task 2: Read pod events to catch liveness/readiness killing you

cr0x@server:~$ kubectl -n prod describe pod mysql-0 | sed -n '/Events:/,$p'
Events:
  Type     Reason     Age                   From               Message
  ----     ------     ----                  ----               -------
  Warning  Unhealthy  2m (x12 over 16m)      kubelet            Liveness probe failed: command timed out: context deadline exceeded
  Normal   Killing    2m (x12 over 16m)      kubelet            Container mysql failed liveness probe, will be restarted

Meaning: The container is being killed by liveness timeout, not necessarily crashing.

Decision: Add/adjust startupProbe and increase probe timeoutSeconds. Also check storage latency before blaming MySQL.

Task 3: Compare current vs previous container logs

cr0x@server:~$ kubectl -n prod logs mysql-0 -c mysql --previous --tail=60
2025-12-31T01:12:17.812345Z 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=187654321
2025-12-31T01:12:46.991201Z 0 [Note] InnoDB: 128 out of 1024 pages recovered
2025-12-31T01:12:48.000901Z 0 [Note] mysqld: ready for connections.
2025-12-31T01:12:49.103422Z 0 [Warning] Aborted connection 12 to db: 'unconnected' user: 'healthcheck' host: 'localhost' (Got timeout reading communication packets)

Meaning: It actually reached “ready for connections” and then got killed/failed checks due to timeouts.

Decision: Probe query likely too slow or timeout too short under IO pressure. Fix probes first; then measure IO.

Task 4: Confirm the probe definitions (you’d be shocked how often they’re wrong)

cr0x@server:~$ kubectl -n prod get pod mysql-0 -o jsonpath='{.spec.containers[0].livenessProbe.exec.command}{"\n"}{.spec.containers[0].readinessProbe.exec.command}{"\n"}{.spec.containers[0].startupProbe.exec.command}{"\n"}'
[mysqladmin ping -h 127.0.0.1 -uroot -p$(MYSQL_ROOT_PASSWORD)]
[mysql -h 127.0.0.1 -uroot -p$(MYSQL_ROOT_PASSWORD) -e SELECT 1]
[]

Meaning: No startupProbe configured. Liveness uses password expansion in a way that may not work depending on shell handling.

Decision: Add startupProbe and prefer a wrapper script that handles secrets safely. Also avoid putting passwords directly in command args.
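
One way to implement that decision (the Secret name, file names, and wrapper script are assumptions for illustration): mount credentials as a my.cnf-style file and let the probes call a thin wrapper instead of passing -p on the command line.

volumes:
  - name: healthcheck
    secret:
      secretName: mysql-healthcheck         # assumption: holds client.cnf and ping.sh
containers:
  - name: mysql
    volumeMounts:
      - name: healthcheck
        mountPath: /healthcheck
        readOnly: true
    startupProbe:
      exec:
        command: ["sh", "/healthcheck/ping.sh"]   # hypothetical wrapper around mysqladmin ping
      periodSeconds: 10
      failureThreshold: 90

Here client.cnf would carry a [client] section with user and password, which mysqladmin and mysql pick up via --defaults-extra-file.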

Task 5: Check termination grace period (clean shutdown matters)

cr0x@server:~$ kubectl -n prod get pod mysql-0 -o jsonpath='{.spec.terminationGracePeriodSeconds}{"\n"}'
30

Meaning: 30 seconds is often too short for a busy InnoDB instance to shut down cleanly.

Decision: Raise it (often 120–300s depending on workload) and ensure your container handles SIGTERM properly.

Task 6: Verify the PVC and whether the volume is stuck attaching

cr0x@server:~$ kubectl -n prod get pvc -l app=mysql
NAME              STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS
data-mysql-0       Bound    pvc-3d5b2c9a-6c62-4a9c-a0c2-1f2a2e6b9b8a   200Gi      RWO            fast-ssd

Meaning: PVC is bound; attachment might still be slow, but at least provisioning isn’t failing.

Decision: If pod is Pending, check attach events; if running but slow, move to IO diagnostics.

Task 7: Check node-level pressure that triggers evictions or throttling

cr0x@server:~$ kubectl describe node node-7 | sed -n '/Conditions:/,/Addresses:/p'
Conditions:
  Type             Status  LastHeartbeatTime                 Reason              Message
  ----             ------  -----------------                 ------              -------
  MemoryPressure   False   2025-12-31T01:15:10Z              KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     True    2025-12-31T01:15:10Z              KubeletHasDiskPressure       kubelet has disk pressure
  PIDPressure      False   2025-12-31T01:15:10Z              KubeletHasSufficientPID      kubelet has sufficient PID available

Meaning: DiskPressure True can cause evictions and can correlate with IO stalls.

Decision: Fix node disk pressure (image GC, log cleanup, bigger root disk). Don’t tune MySQL to “solve” node starvation.

Task 8: Look at container restarts with exit codes (process crash vs kill)

cr0x@server:~$ kubectl -n prod get pod mysql-0 -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}{" "}{.status.containerStatuses[0].lastState.terminated.reason}{"\n"}'
137 Error

Meaning: Exit code 137 usually means SIGKILL (often liveness kill or OOM kill).

Decision: Check events for OOMKilled; otherwise treat as probe/termination-grace issue.

Task 9: Check if you’re OOM-killing mysqld during recovery

cr0x@server:~$ kubectl -n prod describe pod mysql-0 | grep -E 'OOMKilled|Reason:|Last State' -n
118:    Last State:     Terminated
119:      Reason:       OOMKilled

Meaning: Memory limit is too low for workload or recovery phase; InnoDB may allocate spikes (buffer pool, caches, sort buffers depending on config).

Decision: Increase memory limits, reduce InnoDB buffer pool, and ensure you’re not using aggressive per-connection buffers with high max connections.
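
A sketch of the memory side (all numbers are assumptions; the relationship is the point): keep real headroom between innodb_buffer_pool_size and the container limit, because recovery and per-connection buffers spike above steady state.

containers:
  - name: mysql
    resources:
      requests:
        memory: "8Gi"
        cpu: "2"
      limits:
        memory: "8Gi"       # match the request so the scheduler's picture equals reality
        cpu: "4"
    # With an 8Gi limit, size innodb_buffer_pool_size around 5-6Gi, not 7.5Gi.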

Task 10: Inspect live MySQL/MariaDB status quickly (inside the pod)

cr0x@server:~$ kubectl -n prod exec -it mysql-0 -c mysql -- bash -lc 'mysqladmin -uroot -p"$MYSQL_ROOT_PASSWORD" ping && mysql -uroot -p"$MYSQL_ROOT_PASSWORD" -e "SHOW GLOBAL STATUS LIKE '\''Uptime'\'';"'
mysqld is alive
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| Uptime        | 43    |
+---------------+-------+

Meaning: Server is up and answering. If readiness still fails, the readiness probe is wrong or too strict.

Decision: Align readiness logic with what you just proved: basic connectivity plus minimal correctness gates.

Task 11: Check InnoDB recovery indicators and durability settings

cr0x@server:~$ kubectl -n prod exec -it mysql-0 -c mysql -- bash -lc 'mysql -uroot -p"$MYSQL_ROOT_PASSWORD" -e "SHOW VARIABLES WHERE Variable_name IN (\"innodb_flush_log_at_trx_commit\",\"sync_binlog\",\"innodb_doublewrite\");"'
+--------------------------------+-------+
| Variable_name                  | Value |
+--------------------------------+-------+
| innodb_doublewrite             | ON    |
| innodb_flush_log_at_trx_commit | 1     |
| sync_binlog                    | 1     |
+--------------------------------+-------+

Meaning: This is durability-max mode. Good for data safety, potentially brutal on slow storage.

Decision: If latency is unacceptable, fix storage first. Only relax these if the business explicitly accepts data loss risk.

Task 12: Measure replication lag (async replica case)

cr0x@server:~$ kubectl -n prod exec -it mysql-1 -c mysql -- bash -lc 'mysql -uroot -p"$MYSQL_ROOT_PASSWORD" -e "SHOW SLAVE STATUS\G" | egrep "Seconds_Behind_Master|Slave_IO_Running|Slave_SQL_Running"'
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Seconds_Behind_Master: 187

Meaning: Replica is behind. It may be alive but not ready for serving reads if your app expects fresh data.

Decision: Use readiness to gate traffic based on a lag threshold appropriate for your app. Don’t use liveness for this.
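
A readiness sketch that gates on lag (the 300-second threshold, the credentials file, and the reliance on Seconds_Behind_Master are all assumptions; pick the metric and limit your application actually tolerates):

readinessProbe:
  exec:
    command:
      - sh
      - -c
      - |
        # NotReady if replication is stopped (NULL) or lag exceeds the threshold.
        lag=$(mysql --defaults-extra-file=/healthcheck/client.cnf -e "SHOW SLAVE STATUS\G" \
          | awk '/Seconds_Behind_Master/ {print $2}')
        [ -n "$lag" ] && [ "$lag" != "NULL" ] && [ "$lag" -lt 300 ]
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3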

Task 13: Measure wsrep state (MariaDB Galera case)

cr0x@server:~$ kubectl -n prod exec -it mariadb-0 -c mariadb -- bash -lc 'mysql -uroot -p"$MYSQL_ROOT_PASSWORD" -e "SHOW STATUS LIKE '\''wsrep_%'\'';" | egrep "wsrep_local_state_comment|wsrep_cluster_status|wsrep_ready"'
wsrep_cluster_status	Primary
wsrep_local_state_comment	Synced
wsrep_ready	ON

Meaning: Node is in Primary component, synced, and ready. This is the state you want before declaring readiness.

Decision: If wsrep_ready is OFF or state is Donor/Joining, keep the pod NotReady to avoid inconsistent reads/writes.
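
The Galera variant of the same idea, again as a sketch (credentials handling is the same assumption as before):

readinessProbe:
  exec:
    command:
      - sh
      - -c
      - |
        # Ready only when the node is in the Primary component, Synced, and wsrep_ready is ON.
        out=$(mysql --defaults-extra-file=/healthcheck/client.cnf -N -e "SHOW STATUS LIKE 'wsrep_%'")
        echo "$out" | grep -q "wsrep_cluster_status.*Primary" &&
        echo "$out" | grep -q "wsrep_local_state_comment.*Synced" &&
        echo "$out" | grep -q "wsrep_ready.*ON"
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3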

Task 14: See if the filesystem is actually what you think it is

cr0x@server:~$ kubectl -n prod exec -it mysql-0 -c mysql -- bash -lc 'df -T /var/lib/mysql | tail -n +2'
/dev/nvme1n1  ext4  205113212  83412228  111202456  43% /var/lib/mysql

Meaning: ext4 on a block device is typical. If you see nfs or something surprising, adjust expectations and probes.

Decision: If it’s a network filesystem, be much more conservative with timeouts and consider switching storage class for production OLTP.

Task 15: Watch for slow fsync symptoms in MySQL status

cr0x@server:~$ kubectl -n prod exec -it mysql-0 -c mysql -- bash -lc 'mysql -uroot -p"$MYSQL_ROOT_PASSWORD" -e "SHOW GLOBAL STATUS LIKE \"Innodb_os_log_fsyncs\"; SHOW GLOBAL STATUS LIKE \"Innodb_log_waits\";"'
+----------------------+--------+
| Variable_name        | Value  |
+----------------------+--------+
| Innodb_os_log_fsyncs | 983211 |
+----------------------+--------+
+------------------+-------+
| Variable_name    | Value |
+------------------+-------+
| Innodb_log_waits | 1249  |
+------------------+-------+

Meaning: Innodb_log_waits indicates contention on log flushing; often correlated with IO latency or too-small log buffers/log files.

Decision: Investigate storage latency and consider log configuration changes. If running on noisy network volumes, fix that first.

Fast diagnosis playbook: find the bottleneck before you blame the wrong thing

When a database pod is flapping or slow on Kubernetes, you can burn hours debating “DB vs platform.” Don’t. Follow a sequence that quickly tells you where reality lives.

First: determine if Kubernetes is killing it, or it’s crashing

  1. Pod events: liveness failures, OOMKilled, eviction, volume mount issues.
  2. Exit code: 137 (kill) vs a mysqld crash stack trace.
  3. Previous logs: did it reach “ready for connections”?

Second: determine if storage is the bottleneck

  1. Node conditions: DiskPressure, IO saturation hints.
  2. Startup/recovery time: if it only fails during crash recovery windows, suspect slow IO and too-aggressive probes.
  3. Database indicators: log waits, stalled checkpoints, replication apply lag that correlates with write load.

Third: determine if your probe logic is overfitted

  1. Does readiness depend on replication lag? Good. But is the threshold sensible for your app?
  2. Does liveness depend on replication lag? Bad. Change it.
  3. Is the probe invoking the right client? MariaDB images and MySQL images aren’t always symmetrical.

Fourth: verify shutdown and restart policy

  1. terminationGracePeriodSeconds long enough?
  2. preStop hook to stop accepting traffic before SIGTERM?
  3. startupProbe exists and is tuned for worst-case recovery?

Three corporate-world mini-stories (all too real)

Mini-story 1: The incident caused by a wrong assumption

A mid-sized SaaS company migrated from VMs to Kubernetes in phases. The first services moved were stateless, and everything went smoothly. Confidence grew. Then they moved their “simple MySQL replica” used for reporting. The plan was to copy the old VM configuration, slap it into a StatefulSet, and add a readiness probe that checked replication lag to protect dashboards from stale data.

The wrong assumption: “If replication is behind, the pod is unhealthy.” The readiness probe was implemented as a liveness probe by accident—same script, copied into the wrong stanza. It worked fine in staging because the dataset was small, and lag rarely exceeded a few seconds.

Production had a daily burst of writes and an analytics job that created periodic IO spikes. During the burst, lag grew. The liveness probe failed. Kubernetes restarted the replica. MySQL started crash recovery, which was slower under the same IO pressure. The liveness probe failed again. Restart loop. The dashboards broke, then the application started failing because it had been quietly using that replica for some read paths.

The fix was boring: move lag checks to readiness only, add startupProbe with a generous budget, and stop routing any critical application reads to a replica without explicit staleness tolerance. The postmortem’s uncomfortable lesson: the “reporting replica” was never actually isolated. It was just socially isolated.

Mini-story 2: The optimization that backfired

A large enterprise wanted faster write throughput on MariaDB in Kubernetes. They were running durable settings and complained about commit latency during peak hours. Someone proposed relaxing durability: set innodb_flush_log_at_trx_commit=2 and sync_binlog=0, because “the storage is redundant anyway.”

Throughput improved. Latency graphs looked better. People celebrated and moved on. A few weeks later, a node failure occurred during a heavy write period. Kubernetes rescheduled the pod, MariaDB started, crash recovery completed quickly—because it had less durable state to reconcile.

Then came the quiet part: a set of recent transactions were missing. Not corrupted. Missing. The application behaved inconsistently because some business events never happened. The logs did not scream. They whispered. The company spent days reconciling data using upstream event streams and customer support tickets. It was the worst kind of failure: the database was “healthy,” but the business wasn’t.

The eventual policy: if you relax durability, you must document the exact data loss window you accept and build compensating controls (idempotent writes, event sourcing, reconciliation jobs). Otherwise, keep 1/1 and pay for real storage. The optimization “worked” until it became a financial audit problem.

Mini-story 3: The boring but correct practice that saved the day

A fintech ran MySQL on Kubernetes with a strict discipline: every StatefulSet had a startupProbe tuned to worst-case recovery, termination grace long enough for clean shutdown, and a preStop hook that marked the pod NotReady before SIGTERM. They also ran periodic restore tests from backups into a scratch namespace. No one loved doing it. It wasn’t glamorous.

One afternoon, a routine cluster upgrade drained nodes faster than expected due to a misconfigured disruption budget elsewhere. Several MySQL pods were terminated while under load. A few had to do crash recovery on restart. It was not pretty, but it was controlled: pods went NotReady before termination, traffic moved away, and the grace period allowed most instances to shut down cleanly.

The real save: one replica came up with a broken data directory due to an unrelated storage backend glitch. Instead of improvising, the on-call followed the runbook they had practiced: cordon the affected node, detach the volume, provision a fresh PVC, and restore from last known good backup. The service degraded but stayed up. No one had to “just delete the pod and see.”

Their advantage wasn’t better engineers. It was fewer surprises. They had rehearsed the boring moves until they were automatic.

Common mistakes: symptoms → root cause → fix

1) CrashLoopBackOff during crash recovery

Symptoms: Pod restarts every 30–90 seconds; logs show InnoDB crash recovery starting repeatedly.

Root cause: Liveness probe is failing during legitimate recovery; no startupProbe; timeout too short; storage latency spikes.

Fix: Add startupProbe with enough failureThreshold×periodSeconds to cover worst-case recovery. Increase liveness timeout. Keep liveness cheap (admin ping), not heavy SQL.

2) Pod is Running but never Ready

Symptoms: STATUS Running, but READY 0/1; app sees no endpoints.

Root cause: Readiness probe checks replication lag with an unrealistically strict threshold; or probe connects via DNS/service that isn’t ready; or wrong credentials/utility in image.

Fix: Use localhost socket; ensure probe binary exists; loosen readiness criteria to what the app can tolerate; separate “serving reads” vs “serving writes” if needed.

3) Random “server has gone away” during node drains

Symptoms: Short spikes of client errors during deployment/upgrade; logs show abrupt disconnects.

Root cause: No preStop hook to remove pod from endpoints before termination; termination grace too short; connection draining not implemented.

Fix: Add preStop to flip readiness (or sleep after marking NotReady). Increase termination grace. Consider proxy layer for connection draining.

4) Replicas fall behind permanently after a restart

Symptoms: Seconds_Behind_Master grows and never recovers; IO/SQL threads run but slow.

Root cause: Storage throughput insufficient for apply rate; CPU limits throttling; single-threaded apply; or long transactions on primary.

Fix: Fix storage class/IOPS first; raise CPU limits; tune replication apply parallelism where applicable; reduce long transactions.

5) Data directory corruption after “simple redeploy”

Symptoms: mysqld fails to start; errors about missing tablespaces or redo log mismatch.

Root cause: Two pods mounted the same RWO volume due to manual intervention or broken controller; or container image changed datadir permissions; or init containers reinitialized accidentally.

Fix: Enforce StatefulSet identity and volume claims; avoid manual “attach elsewhere” hacks; lock down init logic so it never wipes an existing datadir.
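
A sketch of the “never wipe an existing datadir” guard as an init container (image and marker file are assumptions; the point is that any bootstrap logic must be a no-op when data already exists):

initContainers:
  - name: datadir-guard
    image: mysql:8.0              # assumption: anything with a shell and the same volume works
    command:
      - sh
      - -c
      - |
        # If a system tablespace exists, this is NOT a fresh volume: skip all initialization.
        if [ -f /var/lib/mysql/ibdata1 ]; then
          echo "Existing datadir found; bootstrap steps must not run."
          exit 0
        fi
        echo "Empty datadir; safe for first-time initialization."
    volumeMounts:
      - name: data
        mountPath: /var/lib/mysql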

6) “It’s slow” but only on Kubernetes

Symptoms: Same config is fast on VMs, slow on k8s; commits are spiky.

Root cause: Network-attached storage latency variance; CPU throttling due to low limits; noisy neighbor effects; mis-sized buffer pool for memory limit.

Fix: Benchmark storage class; reserve CPU/memory appropriately; ensure buffer pool fits inside memory limit; keep durability knobs stable unless you accept loss.

Checklists / step-by-step plan

Checklist A: Probe design for MySQL/MariaDB (do this before production)

  1. Use startupProbe to protect crash recovery (budget for worst case, not average).
  2. Liveness: simple local ping with short SQL-free check; conservative timeouts.
  3. Readiness: local connection + SELECT 1; optionally check replica lag or wsrep readiness.
  4. Avoid secrets in argv when possible; prefer env vars or config file readable only by mysql user.
  5. Set probe timeouts for your storage, not your optimism.

Checklist B: Restart safety and shutdown correctness

  1. terminationGracePeriodSeconds sized for flush/shutdown under load.
  2. preStop hook that stops routing traffic before SIGTERM (readiness gate or proxy drain).
  3. PDBs that prevent multiple critical pods from being disrupted at once.
  4. Pod anti-affinity so you don’t put primary and replica on the same failure domain.
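
Sketches for items 3 and 4 (label selectors and the topology key are assumptions; match them to your StatefulSet’s actual labels and failure domains):

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: mysql-pdb
spec:
  maxUnavailable: 1                         # never let a drain take more than one DB pod
  selector:
    matchLabels:
      app: mysql
---
# In the StatefulSet pod template:
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: mysql
        topologyKey: kubernetes.io/hostname   # or topology.kubernetes.io/zone to spread across zones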

Checklist C: Data safety posture (decide explicitly)

  1. Durability settings: choose innodb_flush_log_at_trx_commit and sync_binlog based on acceptable loss window.
  2. Storage class: pick a class with predictable fsync latency; test it with the same access pattern.
  3. Backups: automated, encrypted, with restore drills in a scratch namespace.
  4. Schema migrations: staged and reversible when possible; don’t run long blocking migrations during peak.

FAQ

1) Should I use MySQL or MariaDB on Kubernetes?

If you need maximum compatibility with MySQL ecosystem tooling (and MySQL 8 features), pick MySQL. If you’re committed to MariaDB features like certain Galera-based patterns, pick MariaDB. Operationally, both can be run safely—but only if you design probes and storage correctly.

2) Do I need an operator?

If you’re running production writes, yes—either a mature operator or a very disciplined internal chart with runbooks. Stateful reliability comes from automation of the ugly edge cases: bootstrap, failover, backups, and safe upgrades.

3) Can liveness probe check replication lag?

No. That’s a readiness concern. Liveness killing a lagging replica turns “temporarily behind” into “permanently dead.”

4) Why does my database restart take so long sometimes?

Unclean shutdown triggers crash recovery. Recovery time depends on redo to apply and the IO throughput/latency of your volume. On slow or variable storage, it can swing wildly.

5) Is it safe to run MySQL/MariaDB on network storage?

It can be safe if the storage provides predictable latency and correct semantics, but performance and tail latency often suffer. For heavy OLTP, local SSD or high-quality block storage with good fsync characteristics is usually the safer bet.

6) Should I set innodb_flush_log_at_trx_commit=2 for performance?

Only if the business accepts losing up to ~1 second of transactions on crash (and you understand binlog durability too). If you can’t tolerate that, don’t “optimize” it. Fix storage and resource isolation instead.

7) How should readiness behave for Galera (MariaDB) nodes?

Gate readiness on wsrep state: cluster status Primary, local state Synced, and wsrep_ready=ON. Otherwise, you’ll route traffic to nodes that are joining or donating and get weird client behavior.

8) What’s the #1 cause of data loss in Kubernetes database setups?

Human decisions dressed up as defaults: relaxed durability without explicit acceptance, deleting StatefulSets with “Delete” reclaim policy, or broken backup/restore discipline. Kubernetes rarely deletes your data accidentally; people do.

9) Do probes create load on the database?

Yes. If you probe too frequently, run heavy queries, or use too many concurrent healthcheck connections, you can cause the very latency that fails the probe. Keep checks cheap and rate-limited.

10) What’s the simplest safe probe approach?

StartupProbe: generous mysqladmin ping. Liveness: the same ping with conservative timeout and period. Readiness: local connect + SELECT 1, plus optional replication/wsrep gating.

Next steps you can do this week

If you’re currently running MySQL or MariaDB on Kubernetes, here’s a practical sequence that improves safety without requiring a migration or a new operator tomorrow:

  1. Add startupProbe to every database pod, tuned to worst-case crash recovery on your storage class.
  2. Audit liveness probes and remove anything that checks logical health (replication lag, wsrep, long SQL).
  3. Increase termination grace and add a preStop hook to stop routing traffic before termination.
  4. Measure storage tail latency during peak and correlate with probe failures and commit latency; if it’s spiky, fix storage before tuning MySQL knobs.
  5. Run a restore drill into a scratch namespace. If it’s painful, it’s not a backup; it’s a wish.
  6. Write down your durability posture (1/1 vs relaxed), with the business owner’s explicit acceptance if you choose data loss risk.

The goal isn’t “never restart.” The goal is “restarts are survivable, predictable, and boring.” Kubernetes will keep retrying. Make sure it’s retrying something safe.
