The page is blank. Or worse: it loads instantly with that one line that makes every on-call engineer sigh—“Error establishing a database connection.” Your CEO is refreshing. Your marketing team is “just checking something.” Your monitoring is screaming like a tea kettle.
This error is not a diagnosis. It’s WordPress saying, “I asked the database a question and got silence.” Your job is to figure out whether that silence is a dead database, a wrong address, a credential problem, a saturated server, or a storage subsystem quietly eating itself.
What the error actually means (and what it doesn’t)
WordPress renders “Error establishing a database connection” when it can’t complete its initial connection handshake to MySQL/MariaDB (or another MySQL-compatible server). The failure happens before WordPress can run queries for options, users, or posts, so you get no theme, no content, just the message.
Technically, WordPress is doing something like:
- Read DB settings from wp-config.php (DB_NAME, DB_USER, DB_PASSWORD, DB_HOST).
- Initialize the database client driver (mysqli).
- Open a TCP connection (typically port 3306) or a local socket.
- Authenticate.
- Select the database schema.
If any of those steps fail, the output looks the same. That’s why the fastest recoveries start with narrowing the failure domain: network vs process vs credentials vs saturation vs storage.
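A quick way to narrow it is to replay those steps outside WordPress, so you see the real mysqli error instead of the generic page. A minimal sketch, using the illustrative host, user, and password from Task 5 below; substitute your own values from wp-config.php:
cr0x@server:~$ php -r '
  // Hypothetical probe: repeat the WordPress handshake by hand to surface the raw error.
  mysqli_report(MYSQLI_REPORT_OFF);   // return false instead of throwing (PHP 8.1+ defaults to exceptions)
  $db = mysqli_init();
  if (!@mysqli_real_connect($db, "db.internal", "wpuser", "correcthorsebatterystaple", "wpdb")) {
      echo "connect failed: [", mysqli_connect_errno(), "] ", mysqli_connect_error(), PHP_EOL;
      exit(1);
  }
  echo "connected: ", mysqli_get_server_info($db), PHP_EOL;
'
connect failed: [2002] Connection refused
An error code like 2002 points at the listener, network, or firewall; 1045 means credentials; 1040 means the connection cap. That one line is more diagnosis than the WordPress page will ever give you.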
What it’s not: usually not a “WordPress bug.” WordPress is just the messenger. Sometimes the messenger is annoying, but you still need the message.
Two common patterns:
- Hard down: database service is stopped, crashed, or unreachable. The fix is operational.
- Soft down: database is alive but refusing connections (too many clients, slow storage, lock contention, DNS problem, auth issue). The fix is often capacity or configuration.
One quote to staple to your incident process:
“Hope is not a strategy.” — General Gordon R. Sullivan
Fast diagnosis playbook (first/second/third)
When your site is down, “be thorough” is how you miss your SLA. Be fast, then be thorough. Here’s the order that wins in production.
First: confirm what’s broken from the outside
- Hit the site from a clean network path (not your office VPN) and confirm it’s not a caching layer serving stale error pages.
- Check if /wp-admin/ shows the same error. If yes, it’s not theme-level rendering; it’s early DB init.
- Check server health dashboards (CPU, load, memory, disk latency). If disk latency is spiking, treat the database as guilty until proven innocent.
Second: isolate network vs service
- If DB is remote: test TCP reachability to port 3306 from the web host.
- If DB is local: test the Unix socket exists and MySQL is listening.
- Validate DNS resolution for DB_HOST if it’s a hostname, not an IP.
Third: credentials and capacity
- Verify wp-config.php matches the actual DB credentials and host.
- Check MySQL for “Too many connections,” slow queries, or InnoDB recovery.
- Check error logs: PHP-FPM, Nginx/Apache, MySQL error log, and system journal.
Rule of thumb: if you can connect with the same credentials from the web server, WordPress can too. If you can’t, stop editing WordPress and fix the platform.
Joke #1: The database isn’t “down,” it’s just taking a personal day—right after you started yours.
Hands-on recovery tasks (commands, outputs, decisions)
Below are practical tasks you can run during an incident. Each includes: a command, what the output means, and the decision you make next. This is the stuff you actually do at 02:17.
Task 1: Confirm MySQL/MariaDB service status
cr0x@server:~$ sudo systemctl status mariadb --no-pager
● mariadb.service - MariaDB 10.11.6 database server
Loaded: loaded (/lib/systemd/system/mariadb.service; enabled; preset: enabled)
Active: active (running) since Thu 2025-12-26 01:58:12 UTC; 15min ago
Docs: man:mariadbd(8)
https://mariadb.com/kb/en/library/systemd/
Main PID: 2143 (mariadbd)
Status: "Taking your SQL requests now..."
Tasks: 37 (limit: 18956)
Memory: 1.2G
CPU: 2min 41.121s
CGroup: /system.slice/mariadb.service
└─2143 /usr/sbin/mariadbd
Meaning: If it’s active (running), the DB daemon exists. That does not mean it’s accepting connections, but it’s alive.
Decision: If inactive, failed, or restarting, jump to logs (Task 4) and storage checks (Task 11/12) before you blindly restart again.
Task 2: See if MySQL is listening on the expected interface/port
cr0x@server:~$ sudo ss -lntp | grep -E ':(3306|33060)\b'
LISTEN 0 151 127.0.0.1:3306 0.0.0.0:* users:(("mariadbd",pid=2143,fd=22))
Meaning: This DB listens only on localhost. If your WordPress is on a different host, it will fail every time.
Decision: If WordPress is remote, fix bind-address or use a private network listener. If local, ensure DB_HOST points to 127.0.0.1 or the socket correctly.
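To see which interface the daemon was configured to use, check the bind-address directive. The path below is the Debian/Ubuntu MariaDB layout; adjust for your distribution:
cr0x@server:~$ sudo grep -Rn 'bind-address' /etc/mysql/
/etc/mysql/mariadb.conf.d/50-server.cnf:27:bind-address            = 127.0.0.1
Changing it to the private interface IP (or 0.0.0.0 plus strict firewall rules) and restarting MariaDB is the usual fix for a remote web tier; leave it on localhost when everything runs on one box.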
Task 3: Verify DNS resolution for DB_HOST
cr0x@server:~$ getent hosts db.internal
10.10.2.15 db.internal
Meaning: Name resolves. If it hangs or returns nothing, you have a DNS or resolver issue (common during network incidents).
Decision: If broken, temporarily pin with an /etc/hosts entry during incident mitigation, then fix DNS properly after.
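For the temporary pin, something like this (using the IP from the output above); remove it once DNS is fixed, or it will bite the next migration:
cr0x@server:~$ echo '10.10.2.15 db.internal' | sudo tee -a /etc/hosts
10.10.2.15 db.internal
cr0x@server:~$ getent hosts db.internal
10.10.2.15      db.internal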
Task 4: Check the database error log for crash recovery, corruption, or disk issues
cr0x@server:~$ sudo tail -n 60 /var/log/mysql/error.log
2025-12-26 02:10:44 0 [Note] InnoDB: Starting crash recovery.
2025-12-26 02:10:44 0 [Note] InnoDB: 128 out of 128 rollback segments are active.
2025-12-26 02:10:47 0 [Note] InnoDB: Crash recovery finished.
2025-12-26 02:10:47 0 [Warning] Aborted connection 431 to db: 'wpdb' user: 'wpuser' host: '10.10.3.21' (Got timeout reading communication packets)
Meaning: Crash recovery happened (so there was a crash or unclean shutdown). Aborted connections suggest timeouts or network stalls.
Decision: If crash recovery loops or shows I/O errors, stop blaming WordPress. Check storage latency and filesystem health (Tasks 11–12).
Task 5: Test connectivity from the web server using the same credentials as WordPress
cr0x@server:~$ grep -E "DB_(NAME|USER|PASSWORD|HOST)" /var/www/html/wp-config.php
define('DB_NAME', 'wpdb');
define('DB_USER', 'wpuser');
define('DB_PASSWORD', 'correcthorsebatterystaple');
define('DB_HOST', 'db.internal');
cr0x@server:~$ mysql -h db.internal -u wpuser -p -e "SELECT 1;"
Enter password:
1
1
Meaning: If this succeeds, networking + credentials are fine. WordPress should be able to connect, so the issue is likely PHP runtime, socket mismatch, or transient saturation.
Decision: If it fails, the error message from mysql is gold—act on it. “Access denied” means credentials/permissions. “Can’t connect” means network/listener/firewall. “Unknown MySQL server host” means DNS.
Task 6: Check whether MySQL is refusing due to max_connections
cr0x@server:~$ mysql -h db.internal -u root -p -e "SHOW GLOBAL STATUS LIKE 'Threads_connected'; SHOW VARIABLES LIKE 'max_connections';"
Enter password:
+-------------------+-------+
| Variable_name | Value |
+-------------------+-------+
| Threads_connected | 498 |
+-------------------+-------+
+-----------------+-------+
| Variable_name | Value |
+-----------------+-------+
| max_connections | 500 |
+-----------------+-------+
Meaning: You’re at the cliff edge. New connections will get refused; WordPress will throw the generic error.
Decision: Immediate mitigation: stop the connection storm (cache, rate-limit, kill abusive clients, restart PHP-FPM carefully). Then fix root cause: persistent connections strategy, query performance, PHP-FPM process counts, and connection pooling (where applicable).
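If you need a few minutes of breathing room while you shed load, a runtime bump is possible. It’s a bandage: it costs memory per connection and does not survive a restart.
cr0x@server:~$ mysql -h db.internal -u root -p -e "SET GLOBAL max_connections = 600; SHOW VARIABLES LIKE 'max_connections';"
Enter password:
+-----------------+-------+
| Variable_name   | Value |
+-----------------+-------+
| max_connections | 600   |
+-----------------+-------+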
Task 7: Identify top connection sources and what they’re doing
cr0x@server:~$ mysql -h db.internal -u root -p -e "SHOW FULL PROCESSLIST\G" | sed -n '1,80p'
Enter password:
*************************** 1. row ***************************
Id: 91231
User: wpuser
Host: 10.10.3.21:51344
db: wpdb
Command: Query
Time: 24
State: Sending data
Info: SELECT option_name, option_value FROM wp_options WHERE autoload = 'yes'
*************************** 2. row ***************************
Id: 91244
User: wpuser
Host: 10.10.3.21:51362
db: wpdb
Command: Query
Time: 29
State: Copying to tmp table
Info: SELECT SQL_CALC_FOUND_ROWS wp_posts.ID FROM wp_posts ...
Meaning: Long-running queries, temp tables, and “Sending data” can indicate missing indexes, huge result sets, or storage slowness.
Decision: If you see a repeating expensive query, you can temporarily kill the worst offenders to restore service, then add indexes or fix the plugin/theme generating it.
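To rank offenders instead of scrolling the full processlist, the same data is queryable in information_schema (illustrative output):
cr0x@server:~$ mysql -h db.internal -u root -p -e "SELECT id, user, time, state, LEFT(info, 50) AS query_head FROM information_schema.processlist WHERE command = 'Query' ORDER BY time DESC LIMIT 5;"
Enter password:
+-------+--------+------+----------------------+----------------------------------------------------+
| id    | user   | time | state                | query_head                                         |
+-------+--------+------+----------------------+----------------------------------------------------+
| 91244 | wpuser |   29 | Copying to tmp table | SELECT SQL_CALC_FOUND_ROWS wp_posts.ID FROM wp_pos |
| 91231 | wpuser |   24 | Sending data         | SELECT option_name, option_value FROM wp_options W |
+-------+--------+------+----------------------+----------------------------------------------------+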
Task 8: Kill a runaway query (surgical, not angry)
cr0x@server:~$ mysql -h db.internal -u root -p -e "KILL 91244;"
Enter password:
Meaning: The thread is terminated. If it respawns instantly, something is hammering the DB (cron, traffic, bot, plugin).
Decision: If killing helps briefly, put in traffic controls (WAF rules, caching, admin lockdown) and hunt the source.
Task 9: Check PHP-FPM health and backlog (web tier can be the villain)
cr0x@server:~$ sudo systemctl status php8.2-fpm --no-pager
● php8.2-fpm.service - The PHP 8.2 FastCGI Process Manager
Loaded: loaded (/lib/systemd/system/php8.2-fpm.service; enabled; preset: enabled)
Active: active (running) since Thu 2025-12-26 01:55:02 UTC; 18min ago
Main PID: 1887 (php-fpm8.2)
Status: "Processes active: 52, idle: 1, Requests: 10432, slow: 219, Traffic: 0.6req/sec"
Meaning: If you have many active workers, few idle, and slow requests climbing, PHP is saturated. Saturated PHP often creates more DB load through retries, timeouts, and thundering herds.
Decision: If PHP-FPM is pegged, reduce concurrency or add capacity. Don’t crank it up blindly; you’ll turn DB refusal into a full-server collapse.
Task 10: Validate the DB_HOST socket vs TCP mismatch
cr0x@server:~$ sudo ls -l /var/run/mysqld/
total 4
srwxrwxrwx 1 mysql mysql 0 Dec 26 01:58 mysqld.sock
-rw-r--r-- 1 mysql mysql 5 Dec 26 01:58 mysqld.pid
cr0x@server:~$ php -r 'echo ini_get("mysqli.default_socket"), PHP_EOL;'
/var/run/mysqld/mysqld.sock
Meaning: Socket exists and PHP expects the same socket path. If PHP expects a different socket, local connections can fail even when the DB is running.
Decision: If mismatched, set DB_HOST to 127.0.0.1 to force TCP (quick mitigation) or correct socket settings in PHP config (proper fix).
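As a hedged example, after the mitigation edit the relevant wp-config.php line would look like this:
cr0x@server:~$ grep "DB_HOST" /var/www/html/wp-config.php
define('DB_HOST', '127.0.0.1'); // mitigation: forces TCP instead of the mismatched Unix socket
WordPress also accepts an explicit socket path in DB_HOST (for example 'localhost:/var/run/mysqld/mysqld.sock') if you’d rather keep local connections on the socket while you fix the PHP configuration.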
Task 11: Check disk space and inode exhaustion (the classic “DB died” impersonator)
cr0x@server:~$ df -h /var/lib/mysql
Filesystem Size Used Avail Use% Mounted on
/dev/sda2 100G 99G 1.0G 99% /var
cr0x@server:~$ df -i /var/lib/mysql
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/sda2 6553600 6553200 400 100% /var
Meaning: You’re out of inodes. MySQL can be “running” but unable to create temp tables, redo logs, or new files. WordPress sees connection failures or timeouts.
Decision: Free space/inodes immediately (logs, caches, old backups), then schedule a proper partitioning/storage fix. Inode exhaustion is a structural problem, not a mood.
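To find where the inodes went, GNU du can count them; illustrative output below (orphaned PHP session files are a classic culprit):
cr0x@server:~$ sudo du --inodes -x /var 2>/dev/null | sort -n | tail -n 4
  41210   /var/cache/nginx
 123907   /var/lib/mysql
6201544   /var/lib/php/sessions
6550112   /var
Clearing the sessions directory (and fixing session garbage collection) frees inodes without touching the database files.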
Task 12: Look for I/O stalls that turn DB into a liar (it says “running,” it’s actually stuck)
cr0x@server:~$ iostat -xz 1 3
Linux 6.1.0 (server) 12/26/2025 _x86_64_ (8 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
10.21 0.00 5.02 42.11 0.00 42.66
Device r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 8.00 120.00 256.0 6144.0 98.0 12.4 102.5 15.2 108.4 2.10 96.0
Meaning: High %iowait and high await with near-100% utilization: your disk is the bottleneck. MySQL queries will stall; clients time out; WordPress reports “can’t connect.”
Decision: Reduce write pressure (stop noisy jobs, reduce logging bursts), move DB to faster storage, or scale vertically. If this is a shared VM disk, you may be a victim of noisy neighbors.
Task 13: Check for filesystem errors or read-only remount
cr0x@server:~$ mount | grep ' /var '
/dev/sda2 on /var type ext4 (ro,relatime,errors=remount-ro)
cr0x@server:~$ sudo dmesg | tail -n 20
[12345.678901] EXT4-fs error (device sda2): ext4_journal_check_start:83: Detected aborted journal
[12345.678905] EXT4-fs (sda2): Remounting filesystem read-only
Meaning: The filesystem remounted read-only due to errors. MySQL may keep running but can’t write, and then everything unravels.
Decision: Treat as a storage incident. Stop MySQL cleanly if possible, plan fsck, and restore from backups if needed. Don’t keep poking; you can worsen corruption.
Task 14: Confirm WordPress is not pointing at the wrong database host after a migration
cr0x@server:~$ sudo -u www-data php -r "require '/var/www/html/wp-config.php'; echo DB_HOST, PHP_EOL;"
db.internal
cr0x@server:~$ ping -c 1 db.internal
PING db.internal (10.10.2.15) 56(84) bytes of data.
64 bytes from 10.10.2.15: icmp_seq=1 ttl=63 time=0.493 ms
Meaning: WordPress config resolves and host is reachable at the ICMP level. (ICMP isn’t proof of TCP service, but it’s a quick sanity check.)
Decision: If ping works but MySQL doesn’t, focus on firewall and DB listener. If both fail, you have routing or DNS issues.
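To check the actual MySQL port from the web host without installing anything, bash’s /dev/tcp pseudo-device is enough:
cr0x@server:~$ timeout 3 bash -c ': </dev/tcp/db.internal/3306' && echo "tcp/3306 reachable" || echo "tcp/3306 blocked, filtered, or not listening"
tcp/3306 reachable
If ICMP works but this fails, look at the DB host’s firewall and the bind-address from Task 2.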
Task 15: Validate user grants from the DB side (remote host mismatch is a classic)
cr0x@server:~$ mysql -u root -p -e "SHOW GRANTS FOR 'wpuser'@'10.10.3.%';"
Enter password:
+--------------------------------------------------------------------------------------------------+
| Grants for wpuser@10.10.3.% |
+--------------------------------------------------------------------------------------------------+
| GRANT SELECT, INSERT, UPDATE, DELETE, CREATE, DROP, INDEX, ALTER ON `wpdb`.* TO `wpuser`@`10.10.3.%` |
+--------------------------------------------------------------------------------------------------+
Meaning: The user is allowed from that subnet. If you only have 'wpuser'@'localhost', remote connections will fail with “Access denied.”
Decision: Fix grants to match reality. After migrations, “localhost” assumptions are the silent killers.
Task 16: Spot a table crash and do a safe-ish repair (MyISAM) or a safer check (InnoDB)
cr0x@server:~$ mysql -h db.internal -u root -p -e "CHECK TABLE wpdb.wp_options;"
Enter password:
+------------------+-------+----------+--------------------------------+
| Table | Op | Msg_type | Msg_text |
+------------------+-------+----------+--------------------------------+
| wpdb.wp_options | check | status | OK |
+------------------+-------+----------+--------------------------------+
Meaning: Basic integrity check says OK. If you see “corrupt” on MyISAM tables, REPAIR TABLE may help. For InnoDB, corruption is a bigger story.
Decision: If corruption appears and you’re on InnoDB: prioritize restoring from backups or using InnoDB recovery modes carefully. “Repair” is not a magic wand for InnoDB.
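If you are forced into InnoDB recovery mode to extract data, keep the level as low as possible and treat the instance as disposable afterwards. A minimal sketch, assuming a Debian/Ubuntu-style MariaDB config directory and a hypothetical drop-in file name:
cr0x@server:~$ sudo tee /etc/mysql/mariadb.conf.d/zz-emergency.cnf >/dev/null <<'EOF'
# TEMPORARY: let InnoDB start despite corruption so data can be dumped, then remove this file.
# Levels 4 and above can permanently lose data; do not go there casually.
[mysqld]
innodb_force_recovery = 1
EOF
cr0x@server:~$ sudo systemctl restart mariadb && sudo mysqldump --all-databases > /root/emergency-dump.sql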
Common mistakes: symptom → root cause → fix
This section is deliberately specific. Generic advice is how outages become recurring calendar events.
1) Symptom: error appears after a DNS change or migration
- Root cause: DB_HOST points to an old hostname/IP, or DNS TTL + caching keeps sending some servers to the old DB.
- Fix: Confirm DB_HOST on every web node; flush local DNS caches where applicable; keep the DB endpoint stable (VIP/CNAME) and migrate behind it.
2) Symptom: intermittent errors under traffic spikes, fine at low traffic
- Root cause: MySQL hits max_connections, PHP-FPM has too many workers, or you have a thundering herd on uncached pages.
- Fix: Add caching for read-heavy routes, cap PHP-FPM concurrency, increase DB capacity, and stop “scale web only” fantasies if DB is the choke point.
3) Symptom: DB service “running” but connections time out
- Root cause: I/O stalls, filesystem remounted read-only, or InnoDB stuck in recovery/flush pressure.
- Fix: Measure disk latency, check dmesg and mount flags, reduce write load, and move the DB to proper storage if needed.
4) Symptom: “Access denied for user” after changing the password
- Root cause: WordPress uses old credentials; config management didn’t propagate; or user grants are host-specific and changed host identity.
- Fix: Update wp-config.php consistently, verify grants for the connecting host, and rotate secrets through a controlled process (not Slack paste-and-pray).
5) Symptom: error appears only on some pages (admin works, public doesn’t)
- Root cause: caching/CDN serving stale error pages; or a plugin triggers heavy queries only on certain routes.
- Fix: Purge caches, bypass CDN for diagnostics, then profile the slow queries tied to those routes.
6) Symptom: error appeared right after enabling a “performance” plugin
- Root cause: object cache misconfigured (bad Redis endpoint), aggressive cron/heartbeat behavior, or plugin floods DB with uncached queries.
- Fix: Disable the plugin to restore service, then reintroduce with measured load testing. Performance work without measurements is just arts and crafts.
7) Symptom: “MySQL server has gone away” and then WordPress shows connection error
- Root cause: max packet too small for large queries, idle connection timeouts, DB restart, or network interruptions.
- Fix: Check DB logs for restarts, tune timeouts and max_allowed_packet if appropriate, and address network reliability between tiers.
Storage and filesystem failure modes that masquerade as “DB down”
Here’s the storage engineer’s unpopular truth: many “database connection” incidents are storage incidents wearing a fake mustache.
Disk latency is a connectivity problem in disguise
If InnoDB can’t flush, it will apply backpressure. Queries queue. Threads pile up. Connections hit timeouts. From WordPress’s perspective, the DB might as well be unplugged.
Watch for:
- High fsync pressure: redo log flushes stall under slow disks.
- Swap storms: DB memory pressure pushes pages out; latency explodes.
- Write amplification: temp tables on disk, big sorts, “Copying to tmp table” states.
Inode exhaustion: the silly way to go down
Databases create files, temp tables, binlogs, and redo logs. If you’re out of inodes, the OS will politely refuse. MySQL will un-politely degrade. Your WordPress will just throw the same generic error, because of course it will.
Read-only remount: the quiet catastrophe
Ext4 with errors=remount-ro is doing you a favor: it’s trying not to make corruption worse. But it turns every write into a failure, and databases are mostly writes disguised as reads.
Networked storage and “the DB is slow today”
If your database is on network-attached storage, an overloaded storage array can translate into seconds of latency at the block level. MySQL will patiently wait. Clients will not. If you’re in the cloud on shared volumes, you can also get throttled. The OS rarely announces it with fanfare; it just becomes sluggish.
Prevention that actually works (not vibes)
Prevention is not “add more CPU.” For WordPress DB reliability, prevention is about controlling concurrency, stabilizing storage latency, and making failures visible before they become a homepage message.
1) Put the database on storage that behaves under pressure
- Use SSD/NVMe-class storage for MySQL’s data directory and redo logs.
- Prefer local NVMe over “mystery IOPS” network volumes for high-traffic sites.
- Monitor disk latency (await, %util) like you monitor CPU.
2) Cap concurrency at the web tier
If PHP-FPM is allowed to spawn a small army, it will. Then every worker opens DB connections, and your database becomes the world’s saddest nightclub bouncer.
- Set reasonable pm.max_children based on CPU, memory, and DB capacity (a sizing sketch follows this list).
- Use caching for anonymous traffic. Most WordPress pages do not need to hit the DB per request.
- Rate-limit login endpoints and XML-RPC if you don’t need it.
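A back-of-the-envelope sizing sketch for pm.max_children: measure what an average FPM worker costs, divide the memory you’re willing to give PHP by that, then sanity-check the result against what the database can serve. Process and service names below are assumptions (Debian-style PHP 8.2):
cr0x@server:~$ ps -o rss= -C php-fpm8.2 | awk '{sum+=$1; n++} END {printf "avg worker RSS: %d MB over %d workers\n", sum/n/1024, n}'
avg worker RSS: 78 MB over 54 workers
cr0x@server:~$ free -m | awk '/^Mem:/ {print "available MB:", $7}'
available MB: 9200
cr0x@server:~$ echo $(( 9200 * 70 / 100 / 78 ))   # give PHP roughly 70% of available memory
82
So pm.max_children around 80 is defensible here; then keep that number multiplied across all web nodes comfortably below the database’s max_connections, with headroom for cron, replication, and admin sessions.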
3) Make “DB connectivity” a monitored SLO, not a superstition
Monitor:
- TCP connect time from web nodes to DB (synthetic checks; a minimal probe sketch follows this list).
- MySQL Threads_connected, Threads_running, and aborted connections.
- Disk latency and filesystem error counters.
- Slow query log volume spikes.
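A minimal synthetic probe you can run from each web node via cron or your monitoring agent. Everything here is an illustrative sketch: the script path, the credentials file, and the threshold are assumptions, and the password belongs in a root-readable options file, never on the command line:
cr0x@server:~$ cat /usr/local/bin/wp-db-probe.sh
#!/usr/bin/env bash
# Hypothetical probe: time one real connect + trivial query from the web tier.
# Exit codes map to monitoring severity: 0 OK, 1 warning, 2 critical.
set -u
THRESHOLD_MS=500
start_ns=$(date +%s%N)
if ! mysql --defaults-extra-file=/etc/wp-db-probe.cnf -h db.internal -e 'SELECT 1;' >/dev/null 2>&1; then
    echo "CRITICAL: connect or query failed"
    exit 2
fi
elapsed_ms=$(( ( $(date +%s%N) - start_ns ) / 1000000 ))
if (( elapsed_ms > THRESHOLD_MS )); then
    echo "WARNING: connect+query took ${elapsed_ms} ms"
    exit 1
fi
echo "OK: ${elapsed_ms} ms"
cr0x@server:~$ sudo /usr/local/bin/wp-db-probe.sh
OK: 14 ms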
4) Backups that restore, not backups that exist
A backup that hasn’t been restored is a bedtime story. Practice restores on a schedule, including:
- Logical dumps for portability (mysqldump or equivalent).
- Physical backups for speed (file-level consistent snapshots or backup tools).
- Point-in-time recovery if you have binlogs and a plan.
5) Avoid single points of failure when it matters
Not every WordPress site needs HA, but if your business does, build it:
- Primary/replica with automated failover (carefully tested).
- Separate web tier from DB tier so scaling one doesn’t destabilize the other.
- Health checks that detect “DB accepts connections and can run queries,” not just “port is open.”
6) Keep WordPress sane: reduce query load and surprises
- Audit plugins. Every plugin is a potential DBA you didn’t hire.
- Use object caching properly (and monitor it).
- Keep WordPress core and PHP updated for performance and stability improvements.
Joke #2: A plugin promised “one-click optimization,” which is true—one click is all it takes to optimize your site into an outage.
Checklists / step-by-step plan
During the incident: 15-minute stabilization checklist
- Confirm impact: Is it all pages or a subset? Is the error cached by CDN?
- Check DB health: service status, listener, logs.
- Test from web host: connect using mysql with WordPress credentials.
- Check saturation: connections, running threads, slow queries.
- Check storage: disk full, inode full, I/O wait, read-only remount.
- Mitigate: reduce concurrency (temporarily), kill runaway queries, disable offending plugin if needed.
- Communicate: set a clear status update cadence with what you know and what you’re doing next.
After stabilization: root-cause checklist (same day)
- Extract error windows from MySQL, PHP-FPM, and web server logs.
- Correlate with metrics: connections, disk latency, CPU, memory, network errors.
- Identify the trigger: traffic spike, deployment, plugin change, storage event, DNS change.
- Write down the exact failure mode (e.g., max_connections reached, inode exhaustion, read-only remount).
- Implement a prevention change with a rollback plan.
- Add/adjust monitors so this becomes a page earlier next time (or never again).
Hardening checklist (weekly/monthly)
- Restore-test backups.
- Review plugin list and remove dead weight.
- Review slow queries and index opportunities.
- Check disk growth trends and inode usage trends.
- Patch OS, DB, PHP with a controlled process.
- Review capacity: PHP-FPM worker counts vs DB max_connections vs hardware.
Three corporate mini-stories from the trenches
Mini-story 1: The incident caused by a wrong assumption
The company had a “simple” architecture: a couple of web nodes and a managed database service. The team migrated WordPress to new web servers to get a newer PHP version and better CPU. Everything looked fine in staging. Production cutover was scheduled for a quiet morning.
The cutover happened, DNS flipped, and traffic arrived. Within minutes: Error establishing a database connection. The database metrics looked normal. The DB was “up.” The web nodes were healthy. Naturally, everyone stared at WordPress like it had personally betrayed them.
The wrong assumption was embedded in a single line: DB_HOST was set to localhost on the old servers, because the old architecture had the DB on the same host years ago. During earlier migrations, someone had “temporarily” used an SSH tunnel and never cleaned up the configuration logic. On the new servers, there was no local MySQL. WordPress tried to connect to itself. It failed predictably.
Fixing it was trivial: point DB_HOST to the managed DB endpoint, deploy, done. The lasting lesson wasn’t “update DB_HOST.” It was: don’t let infrastructure drift live inside application configs without tests that assert the topology. The team added a preflight check to validate DB connectivity from each web node before any DNS flip.
Mini-story 2: The optimization that backfired
A different org had a performance mandate. Pages were slow, and someone suggested: “Let’s increase PHP-FPM workers so requests don’t queue.” It sounded reasonable. The web servers had spare CPU. The change got rolled out during business hours because the risk “seemed low.”
The result was immediate: WordPress started throwing database connection errors intermittently. Not a full outage—just enough to ruin conversions and make customer support feel haunted. MySQL was hitting connection limits, and query latency shot up. The DB wasn’t sized for the new concurrency. More PHP workers meant more simultaneous DB connections, more cache misses, and more pressure on InnoDB flushes.
The team tried the usual instinctive move: increase max_connections. That bought minutes, then the server ran out of memory and started swapping. Now everything was slow: the DB, the web nodes, even SSH logins. This was no longer “a WordPress problem.” This was a resource exhaustion cascade.
The rollback fixed it. The forward fix was boring: add caching for anonymous traffic, set sane PHP-FPM limits, add indexes for the worst queries, and scale the DB storage/IOPS. The biggest “optimization” was simply not generating work that didn’t need to exist.
Mini-story 3: The boring but correct practice that saved the day
A media company ran WordPress as part of a broader platform. The DB was a single primary with a warm replica. Nothing fancy. The exciting part was their discipline: routine restore tests and a documented failover runbook that lived in the same repo as infra code.
One afternoon, the primary DB host experienced filesystem errors after a kernel update and reboot. It came back, but the data volume remounted read-only. MySQL started, then progressively failed as it tried to write. WordPress displayed the database connection error. They had about five minutes of confusion, then the on-call recognized the pattern: “running, but not writing.”
They didn’t attempt heroics on a damaged filesystem. They executed the runbook: promote the replica, update the DB endpoint, and restart web nodes to flush stale connections. Traffic recovered quickly. Later, they repaired the primary offline and re-seeded it properly.
The saving practice wasn’t magical automation. It was rehearsed, dull competence: backups validated, failover steps known, and a team culture that didn’t treat filesystems as immortal.
Interesting facts and context (short and concrete)
- Fact 1: WordPress started in 2003, and the MySQL dependency has been there from the beginning; the “DB connection” error has been haunting admins for two decades.
- Fact 2: For many years, WordPress’s default database driver was the old mysql extension; modern installs use mysqli, which changes how sockets and timeouts behave.
- Fact 3: MySQL’s “Too many connections” behavior is famously abrupt: once you hit the cap, new sessions fail immediately, even if the server is otherwise healthy.
- Fact 4: InnoDB became the default MySQL storage engine in MySQL 5.5; before that, MyISAM was common, and “table crashed” repairs were more routine.
- Fact 5: DNS TTLs can keep old DB endpoints alive in client caches far longer than expected, especially across container resolvers and local caching daemons.
- Fact 6: The Unix socket vs TCP confusion is older than WordPress: clients may default to sockets for “localhost” but TCP for “127.0.0.1,” which can change auth paths and permissions.
- Fact 7: Many WordPress plugins implement their own caching layers poorly; a misconfigured object cache can increase DB load instead of reducing it.
- Fact 8: InnoDB crash recovery time is proportional to redo log size and workload; after a crash, “DB is up” can still mean “DB is busy repairing itself.”
- Fact 9: Filesystems remounting read-only on error is a safety feature, not a bug—databases just happen to hate it intensely.
FAQ
1) Why does WordPress show the same error for different failures?
Because the failure happens at connection time, before WordPress can load options or run meaningful queries. It can’t distinguish “wrong password” from “DB host on fire” without exposing details. You need platform-level checks.
2) Should I restart MySQL immediately?
Only if you’ve checked logs and storage symptoms first. Restarting can help if the daemon is wedged, but it can also make crash recovery worse or hide the real issue. If you see disk errors or read-only mounts, restarting is cargo cult.
3) How do I know if it’s credentials vs connectivity?
Run mysql -h ... -u ... -p -e "SELECT 1;" from the web server using the credentials in wp-config.php. “Access denied” means auth/grants. “Can’t connect” means network/listener/firewall. Timeouts often mean saturation or I/O stalls.
4) Why does it happen only during traffic spikes?
Because concurrency multiplies everything. More requests → more PHP workers → more DB connections → more lock contention and flush pressure. If you’re near capacity, a spike tips you into connection refusal and timeouts.
5) Is increasing max_connections a good fix?
Sometimes it’s a short-term bandage. Long-term, it can backfire by increasing memory use and context switching. Fix the demand side (caching, query optimization, rate limiting) and the supply side (DB resources) together.
6) Can a full disk cause a database connection error?
Yes. Disk full or inode full can prevent MySQL from creating temp tables or writing logs, leading to stalls and failures that manifest as connection problems at the application layer.
7) What’s the fastest safe mitigation if the DB is overloaded?
Reduce load first: enable/verify full-page caching for anonymous traffic, rate-limit abusive endpoints, and cap PHP-FPM workers. Killing the worst DB queries can buy time, but it’s not a strategy by itself.
8) How do I prevent plugin-caused DB meltdowns?
Run plugin changes like deployments: staged rollout, metrics comparison, and a rollback plan. Also audit plugins quarterly. Less code is less chaos.
9) Why does switching DB_HOST from localhost to 127.0.0.1 sometimes “fix” it?
Because it forces TCP instead of a Unix socket. If the socket path is wrong or permissions are odd, TCP can bypass the issue. It’s a valid mitigation; the real fix is consistent socket configuration.
10) If the database is remote, what’s the top hidden culprit?
Network path reliability and name resolution. A “small” DNS change, firewall rule, or routing issue can look exactly like a DB outage from WordPress’s point of view.
Conclusion: next steps you can do today
This WordPress error is a symptom, not a cause. Treat it like a production outage: isolate the failure domain fast, apply surgical mitigations, then fix the underlying capacity/config/storage issues so it doesn’t recur.
- Write your fast-diagnosis runbook using the tasks above, tailored to your environment (local DB vs managed DB, socket vs TCP).
- Add two monitors today: web-to-DB synthetic query latency, and disk latency on the DB host/volume.
- Cap concurrency (PHP-FPM and web server) based on DB capacity, not wishful thinking.
- Restore-test backups this week. Not “verify they exist.” Restore them.
- Audit plugins and remove anything that’s unmaintained, redundant, or “temporarily installed” since 2019.