WordPress High TTFB: Speed Up Server Response Without Magic Plugins

High TTFB is the kind of performance problem that makes everyone confident and wrong at the same time. Marketing says “the site is slow,” a plugin vendor says “install this,” and your hosting provider says “it’s WordPress.” Meanwhile your users stare at a blank page waiting for the first byte like it’s 2006 and we’re all on dial-up again.

This is not a plugin roundup. This is a production playbook for getting your server response time under control: measure where the first byte is stuck (network, TLS, web server, PHP, database, cache, storage), fix the bottleneck, and avoid changes that look fast on a chart but melt under real traffic.

TTFB reality check: what it is (and what it isn’t)

Time To First Byte is the time from the client starting the request to the client receiving the first byte of the response body. That includes:

  • DNS resolution (sometimes, depending on tool)
  • TCP connect
  • TLS handshake
  • Request routing (load balancer, reverse proxy)
  • Web server processing and upstream waits
  • Application work (PHP + WordPress)
  • Database queries
  • Cache misses, cache stampedes, and “why is Redis at 100% CPU?” moments
  • Storage latency (especially on shared or bursty volumes)

TTFB is not a pure “backend time.” It’s an end-to-end metric seen by the client. That’s why it’s both useful and easy to misuse. If you measure TTFB from your laptop on hotel Wi‑Fi, you’re learning about the hotel. If you measure it from inside the same datacenter, you’re learning about your stack.

Also: a low TTFB doesn’t guarantee a fast page. It just guarantees the server started talking quickly. You can still ship 7 MB of JavaScript and set the browser on fire. But when TTFB is high, it’s rarely a front-end problem. It’s the system not getting to “first byte” promptly—usually waiting on something.

Opinionated rule: don’t treat high TTFB as “WordPress is slow.” Treat it as “a request is blocked.” Then find what it’s blocked on.

Interesting facts and a little history

  1. TTFB predates Core Web Vitals. Engineers have used it for decades because it maps well to server-side latency and network handshake cost.
  2. WordPress started in 2003 as a fork of b2/cafelog. Early shared hosting shaped many defaults: PHP, MySQL, and “keep it simple” caching assumptions.
  3. PHP opcode caching used to be optional and chaotic. Before OPcache became standard, running WordPress meant repeatedly parsing PHP on every request—TTFB pain was normal.
  4. HTTP/1.1 keep-alive (late 1990s) was a quiet performance revolution. Without it, repeated connects massively inflated perceived latency.
  5. TLS handshakes used to be expensive. Modern TLS 1.3 reduced round trips, and session resumption is a big deal for repeat visitors.
  6. MySQL query cache was removed (effectively deprecated then gone) because it caused lock contention and unpredictable performance in write-heavy workloads—common in WP admin and WooCommerce.
  7. “Page caching fixes everything” was true-ish when sites were mostly anonymous traffic. Logged-in users, personalization, and carts changed the math.
  8. Object caching for WordPress matured late. Persistent object cache (Redis/Memcached) became mainstream as WP sites got more dynamic and plugin-heavy.
  9. CDNs didn’t kill TTFB problems. They can hide them for cached pages, but dynamic requests and cache misses still go home to your origin like it owes them money.

Fast diagnosis playbook

When TTFB spikes, don’t start by installing plugins, changing themes, or arguing in Slack. Start by answering three questions quickly:

1) Is it network/TLS or server/app?

  • Measure TTFB from outside and inside the server/VPC.
  • If internal TTFB is fine but external is bad: think DNS, TLS handshake, load balancer, routing, packet loss, or WAF.
  • If internal is also bad: it’s your app stack (PHP, DB, cache, storage, CPU contention).

2) Is it one endpoint or everything?

  • Homepage only? Might be a slow query or a theme/template doing too much.
  • wp-admin only? Might be auth, external calls, or DB lock contention.
  • Everything (including static assets)? Think web server saturation, CPU steal, or upstream network limits.

3) Is it “slow all the time” or “slow under concurrency”?

  • Slow even at 1 request: look for single-request bottlenecks (DNS lookups from PHP, slow DB query, storage latency, PHP startup/opcache misconfig).
  • Fast at 1 request but slow at 20: look for pool sizing, DB connections, locks, cache stampedes, rate limiting, CPU saturation. A quick way to compare the two modes is sketched below.
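
A minimal comparison with nothing but curl and GNU xargs; example.com and the concurrency of 20 are placeholders, and the timings shown are illustrative:

cr0x@server:~$ curl -o /dev/null -s -w '%{time_starttransfer}\n' https://example.com/
0.213
cr0x@server:~$ seq 1 20 | xargs -P 20 -I{} curl -o /dev/null -s -w '%{time_starttransfer}\n' https://example.com/ | sort -n | tail -n 3
1.654
1.701
1.922

Fast alone but slow in a pack means queueing somewhere: PHP-FPM children, DB connections, or CPU.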

Stop condition: Once you can say “we are waiting on X for Y milliseconds,” you’re done diagnosing and you can start fixing. If you can’t say that, you’re still guessing.

Measure first: build a timeline of one request

TTFB is a single number. You need a timeline. The goal is to break the request into phases and attach numbers to each phase:

  • Name lookup (DNS)
  • Connect (TCP)
  • App connect (TLS handshake)
  • Pre-transfer (request sent, waiting for response)
  • Start transfer (TTFB itself)
  • Total (full response)

From the server side, you want:

  • Nginx/Apache timing: request time, upstream response time
  • PHP-FPM timing: request duration, slowlog stack traces
  • DB timing: slow query log, lock waits, buffer pool hit rate
  • System timing: CPU pressure, run queue, I/O latency, network retransmits

Joke #1: If your monitoring is “refresh the page and squint,” you don’t have monitoring; you have a ritual.

One quote, because it’s evergreen: Hope is not a strategy. — James Cameron.

Practical tasks with commands: what to run, what it means, what you decide

These are not “try this maybe.” These are the moves you can make on a Linux WordPress host with Nginx + PHP-FPM + MariaDB/MySQL (similar ideas apply to Apache). Each task includes a command, example output, what the output means, and the decision you make.

Task 1: Measure TTFB from a client with curl (baseline)

cr0x@server:~$ curl -o /dev/null -s -w 'dns=%{time_namelookup} connect=%{time_connect} tls=%{time_appconnect} ttfb=%{time_starttransfer} total=%{time_total}\n' https://example.com/
dns=0.012 connect=0.031 tls=0.082 ttfb=1.247 total=1.392

Meaning: DNS/connect/TLS are small; TTFB is large. The server/app is slow to produce the first byte.

Decision: Focus on origin processing (Nginx upstream waits, PHP-FPM, DB, caches), not on network or TLS.

Task 2: Measure TTFB from inside the server/VPC (separate network from app)

cr0x@server:~$ curl -o /dev/null -s -w 'connect=%{time_connect} tls=%{time_appconnect} ttfb=%{time_starttransfer} total=%{time_total}\n' --resolve example.com:443:127.0.0.1 https://example.com/
connect=0.000 tls=0.004 ttfb=1.106 total=1.219

Meaning: Still slow from localhost (--resolve pins example.com to 127.0.0.1, so TLS validation and vhost routing still match). This is not your ISP, not DNS, not the CDN.

Decision: Instrument the server: web server logs, PHP-FPM status/slowlog, DB slow queries, system resource checks.

Task 3: Split static vs dynamic quickly

cr0x@server:~$ curl -o /dev/null -s -w 'ttfb=%{time_starttransfer} total=%{time_total}\n' https://example.com/wp-content/uploads/2025/01/logo.png
ttfb=0.045 total=0.046

Meaning: Static is fine; dynamic is the problem.

Decision: Don’t waste time tuning sendfile or gzip right now. Go straight to PHP/DB/caching.

Task 4: Turn on request timing in Nginx access logs (see upstream time)

cr0x@server:~$ sudo grep -R "log_format" /etc/nginx/nginx.conf /etc/nginx/conf.d 2>/dev/null
/etc/nginx/nginx.conf:log_format main '$remote_addr - $host [$time_local] "$request" $status $body_bytes_sent '
/etc/nginx/nginx.conf:'"$http_referer" "$http_user_agent" rt=$request_time urt=$upstream_response_time uct=$upstream_connect_time';

Meaning: You already have variables for request_time and upstream timings. Good.

Decision: Ensure the active server block uses that log_format and then inspect slow requests.

Task 5: Find the slowest requests in Nginx logs (real URLs, real timings)

cr0x@server:~$ sudo awk '{for(i=1;i<=NF;i++) if($i ~ /^rt=/){sub("rt=","",$i); print $i, $0}}' /var/log/nginx/access.log | sort -nr | head -n 5
2.914 203.0.113.10 - example.com [27/Dec/2025:10:11:09 +0000] "GET /product/widget/ HTTP/2.0" 200 84512 "-" "Mozilla/5.0" rt=2.914 urt=2.902 uct=0.000
2.501 203.0.113.11 - example.com [27/Dec/2025:10:10:58 +0000] "GET / HTTP/2.0" 200 112340 "-" "Mozilla/5.0" rt=2.501 urt=2.489 uct=0.000

Meaning: request_time (rt) roughly equals upstream_response_time (urt). Nginx isn’t the bottleneck; it’s waiting on upstream (PHP-FPM).

Decision: Move to PHP-FPM: pool saturation, slow scripts, external calls, OPcache.

Task 6: Check PHP-FPM pool health and saturation

cr0x@server:~$ curl -s 'http://127.0.0.1/fpm-status?full' | sed -n '1,25p'
pool:                 www
process manager:      dynamic
start time:           27/Dec/2025:09:31:02 +0000
start since:          2430
accepted conn:        18291
listen queue:         17
max listen queue:     94
listen queue len:     128
idle processes:       0
active processes:     24
total processes:      24
max active processes: 24
max children reached: 63
slow requests:        211

Meaning: listen queue exists and max children reached is non-zero. PHP-FPM is saturated: requests are waiting before they even start running PHP.

Decision: Tune PHP-FPM: increase pm.max_children if CPU/RAM allow, and reduce per-request cost (OPcache, DB, caching). Also confirm you’re not hiding a slow DB behind “more children.”

Task 7: Check PHP-FPM slowlog for where time is going

cr0x@server:~$ sudo tail -n 30 /var/log/php8.2-fpm/www-slow.log
[27-Dec-2025 10:11:09]  [pool www] pid 22107
script_filename = /var/www/html/index.php
[0x00007f2a2c1a3f40] curl_exec() /var/www/html/wp-includes/Requests/src/Transport/Curl.php:205
[0x00007f2a2c19b8a0] request() /var/www/html/wp-includes/Requests/src/Session.php:214
[0x00007f2a2c193250] wp_remote_get() /var/www/html/wp-includes/http.php:197
[0x00007f2a2c18aa10] some_plugin_check_updates() /var/www/html/wp-content/plugins/some-plugin/plugin.php:812

Meaning: A plugin is doing an outbound HTTP call during page generation. That will murder TTFB the moment the remote service is slow or blocked.

Decision: Remove/replace the plugin behavior, move it to cron/background, or cache the result aggressively. Also verify egress DNS and firewall rules.

Task 8: Verify OPcache is enabled and not starved

cr0x@server:~$ php -i | grep -E 'opcache.enable|opcache.memory_consumption|opcache.interned_strings_buffer|opcache.max_accelerated_files|opcache.validate_timestamps'
opcache.enable => On => On
opcache.memory_consumption => 128 => 128
opcache.interned_strings_buffer => 8 => 8
opcache.max_accelerated_files => 10000 => 10000
opcache.validate_timestamps => On => On

Meaning: OPcache is on, but 128MB and 10k files can be tight for plugin-heavy WordPress. Timestamp validation on production also adds filesystem checks.

Decision: Increase OPcache memory and max files; consider setting opcache.validate_timestamps=0 on immutable deployments (with a restart on deploy).
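
One caveat: php -i reflects the CLI configuration, and PHP-FPM reads its own ini tree, which may differ. A sketch of checking the FPM side directly, assuming a Debian-style layout:

cr0x@server:~$ grep -Rh '^opcache' /etc/php/8.2/fpm/ 2>/dev/null
opcache.enable=1
opcache.memory_consumption=128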

Task 9: Check database slow queries (you can’t cache your way out of locks)

cr0x@server:~$ sudo tail -n 20 /var/log/mysql/mysql-slow.log
# Time: 2025-12-27T10:10:58.114221Z
# Query_time: 1.873  Lock_time: 0.001 Rows_sent: 20  Rows_examined: 945321
SET timestamp=1766830258;
SELECT option_name, option_value FROM wp_options WHERE autoload = 'yes';

Meaning: That options autoload query is scanning far too many rows. Usually it’s bloated autoloaded options (plugins love stuffing junk in there) or missing/ineffective indexing combined with table bloat.

Decision: Reduce autoload bloat, review wp_options size, and confirm proper indexes. Consider object caching to avoid repeated options hits, but fix the underlying bloat too.
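
To quantify the bloat before you fix it, two queries help; a sketch assuming the database is named wp and the default wp_ table prefix (output illustrative):

cr0x@server:~$ mysql wp -e "SELECT ROUND(SUM(LENGTH(option_value))/1024/1024,1) AS autoload_mb, COUNT(*) AS cnt FROM wp_options WHERE autoload='yes';"
+-------------+------+
| autoload_mb | cnt  |
+-------------+------+
|        38.4 | 1872 |
+-------------+------+
cr0x@server:~$ mysql wp -e "SELECT option_name, LENGTH(option_value) AS bytes FROM wp_options WHERE autoload='yes' ORDER BY bytes DESC LIMIT 5;"

More than a few MB of autoload is a tax on every uncached request; the second query names the offenders, usually a plugin that never cleaned up after itself.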

Task 10: Check InnoDB buffer pool health (are you doing disk I/O for reads?)

cr0x@server:~$ mysql -e "SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%';"
+---------------------------------------+----------+
| Variable_name                         | Value    |
+---------------------------------------+----------+
| Innodb_buffer_pool_read_requests      | 98765432 |
| Innodb_buffer_pool_reads              | 1234567  |
+---------------------------------------+----------+

Meaning: A lot of reads are coming from disk (Innodb_buffer_pool_reads is not tiny). That adds latency and spikes TTFB under load.

Decision: Increase innodb_buffer_pool_size (within RAM limits), reduce dataset size (cleanup), and check storage latency.
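
From those two counters you can compute the miss ratio in place; with the numbers above it comes out to about 1.25%, which on a busy site means constant disk reads:

cr0x@server:~$ mysql -Nse "SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%';" | awk '{v[$1]=$2} END {printf "miss_ratio=%.2f%%\n", 100*v["Innodb_buffer_pool_reads"]/v["Innodb_buffer_pool_read_requests"]}'
miss_ratio=1.25%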

Task 11: Spot DB lock waits in real time (the “everything is stuck” mode)

cr0x@server:~$ mysql -e "SHOW PROCESSLIST;" | head -n 12
Id	User	Host	db	Command	Time	State	Info
291	wpuser	localhost	wp	Query	12	Sending data	SELECT * FROM wp_posts WHERE post_status='publish' ORDER BY post_date DESC LIMIT 10
305	wpuser	localhost	wp	Query	9	Locked	UPDATE wp_options SET option_value='...' WHERE option_name='woocommerce_sessions'

Meaning: Queries are waiting on locks. This often looks like “random TTFB spikes” because some requests get stuck behind a write or a long-running query.

Decision: Identify the locking query, fix the pattern (indexes, reduce session writes, avoid autocommit storms), and consider isolating heavy admin jobs away from front-end traffic.

Task 12: Check CPU pressure and run queue (are requests waiting for CPU?)

cr0x@server:~$ uptime
 10:12:33 up 12 days,  3:41,  2 users,  load average: 18.42, 17.90, 16.85

Meaning: Load average is very high. If you have, say, 8 vCPUs, you’re over capacity and PHP workers will queue.

Decision: Reduce per-request CPU cost (cache, OPcache, fewer plugins), cap concurrency, or add CPU. Don’t blindly increase PHP-FPM children on a CPU-saturated box.

Task 13: Find I/O latency spikes (storage can dominate TTFB)

cr0x@server:~$ iostat -xz 1 3
Linux 6.1.0 (server) 	12/27/2025 	_x86_64_	(8 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          22.10    0.00    6.30    9.80    0.00   61.80

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s  w_await aqu-sz  %util
nvme0n1         210.0  18560.0     0.0   0.00   18.40    88.38    95.0   9120.0   42.10   5.12  98.20

Meaning: High await and near-100% util: storage is saturated. DB reads/writes and PHP filesystem stats will stall.

Decision: Move to faster storage, reduce I/O (buffer pool, caching), investigate noisy neighbors (shared disks), and check for backup jobs or log storms.

Task 14: Check network retransmits and drops (quiet latency killers)

cr0x@server:~$ netstat -s | egrep -i 'retrans|segments retransmited|listen|overflow' | head -n 12
    14572 segments retransmited
    0 listen queue overflows
    0 listen queue drops

Meaning: Retransmits are present. If they climb quickly, you have packet loss or congestion.

Decision: Check interface errors, upstream network, and load balancer health. High retransmits can inflate TTFB even when the app is fine.

Task 15: Verify Nginx isn’t buffering you into latency (rare, but real)

cr0x@server:~$ sudo nginx -T 2>/dev/null | egrep -n 'proxy_buffering|fastcgi_buffering|fastcgi_read_timeout|keepalive_timeout' | head -n 20
42:	keepalive_timeout 65;
117:	fastcgi_read_timeout 60s;
118:	fastcgi_buffering on;

Meaning: Buffering is on. This typically helps throughput, but can mask streaming responses and make debugging harder. Timeouts are normal.

Decision: Don’t flip buffering off as a superstition. Use it intentionally (mostly leave it on). If you have long-running requests, fix the cause, not the timeout.

Task 16: Confirm HTTP keep-alive and connection reuse (reduce handshake cost)

cr0x@server:~$ curl -I -s https://example.com/ | egrep -i 'server:|connection:|keep-alive|alt-svc|via'
server: nginx
connection: keep-alive

Meaning: Keep-alive is enabled. Good. If you saw connection: close everywhere, you’d pay extra handshake overhead repeatedly.

Decision: Keep it. Tune timeouts sanely. Don’t “optimize” by closing connections early unless you like self-inflicted latency.

Task 17: Look for external calls from WordPress at the network layer

cr0x@server:~$ sudo ss -tpn | grep php-fpm | head -n 10
ESTAB 0 0 10.0.0.10:48722 93.184.216.34:443 users:(("php-fpm8.2",pid=22107,fd=15))
ESTAB 0 0 10.0.0.10:48724 151.101.2.132:443 users:(("php-fpm8.2",pid=22111,fd=17))

Meaning: PHP-FPM workers are making outbound HTTPS connections while serving requests. That’s a TTFB time bomb.

Decision: Identify which plugin/theme does it; move to background jobs; cache results; enforce timeouts; block unexpected egress if you can.
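
To turn an IP in that output into a culprit, search the codebase for the usual outbound-call entry points; a sketch (the match list is not exhaustive):

cr0x@server:~$ grep -RIl --include='*.php' -e 'wp_remote_get' -e 'wp_remote_post' -e 'curl_exec' /var/www/html/wp-content/plugins | head -n 5
/var/www/html/wp-content/plugins/some-plugin/plugin.php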

Task 18: Quick “what changed?” sanity check (package updates, deploys, restarts)

cr0x@server:~$ sudo journalctl -p warning -S "2 hours ago" | head -n 12
Dec 27 09:02:11 server php-fpm8.2[1022]: WARNING: [pool www] server reached pm.max_children setting (24), consider raising it
Dec 27 09:05:44 server mariadbd[881]: Aborted connection 305 to db: 'wp' user: 'wpuser' host: 'localhost' (Got timeout reading communication packets)

Meaning: You have corroboration: PHP-FPM saturation and DB communication timeouts (possibly due to overload).

Decision: Treat it as capacity/latency, not as “mystery WordPress slowness.” Fix pool sizing and DB performance; consider scaling.

Fixes that actually reduce TTFB

1) Stop doing work on every request: page cache where it makes sense

For anonymous traffic, full-page caching is the single biggest lever. Not because it’s trendy, but because it removes PHP and DB from the request path. You can do this at:

  • Nginx fastcgi_cache (a minimal sketch follows at the end of this subsection)
  • Reverse proxy cache (Varnish, or a managed edge cache)
  • CDN cache for HTML (careful with cookies and vary logic)

But you have to be honest about your site:

  • If you’re WooCommerce-heavy, most valuable traffic is logged-in or cookie’d. Page cache hit rate may be low.
  • If you have heavy personalization, caching becomes a routing problem: what varies and why?

Decision-making tip: if your homepage is cacheable but product pages are not, caching still wins. Fix what you can cache first.
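
If you go the Nginx route, here is a minimal fastcgi_cache sketch. Treat every name as an assumption: the cache path, the WPCACHE zone, and the cookie list all need to match your setup, and the skip-cache cookies shown are the usual WordPress/WooCommerce suspects, not a complete list.

# http{} level: define the cache (path and zone name are illustrative)
fastcgi_cache_path /var/cache/nginx/wp levels=1:2 keys_zone=WPCACHE:100m inactive=60m;

# server{} level: skip cache for logged-in users, carts, and POSTs
set $skip_cache 0;
if ($http_cookie ~* "wordpress_logged_in|wp-postpass|woocommerce_cart") { set $skip_cache 1; }
if ($request_method = POST) { set $skip_cache 1; }

# inside the PHP location, alongside your existing fastcgi_pass
fastcgi_cache WPCACHE;
fastcgi_cache_key "$scheme$request_method$host$request_uri";
fastcgi_cache_valid 200 301 10m;
fastcgi_cache_bypass $skip_cache;
fastcgi_no_cache $skip_cache;
add_header X-Cache-Status $upstream_cache_status;

Validate with two curls: the second hit should show X-Cache-Status: HIT and a TTFB in the tens of milliseconds.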

2) Persistent object cache: Redis (or Memcached) for the parts WordPress repeats

WordPress does the same lookups repeatedly: options, post meta, term relationships, transients. A persistent object cache reduces DB round trips and helps TTFB for dynamic pages; a minimal wiring sketch follows the list below.

What to watch:

  • Object cache does not fix slow external HTTP calls.
  • Object cache does not fix DB lock contention caused by write patterns.
  • Object cache can backfire if you turn it into a shared garbage heap with no key hygiene and no eviction strategy.
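
If you go the Redis route, the common wiring is WP-CLI plus the redis-cache plugin, which installs the object-cache drop-in. A sketch, assuming Redis already runs locally and noting that the wp redis commands come from that plugin, not core WP-CLI:

cr0x@server:~$ wp plugin install redis-cache --activate
cr0x@server:~$ wp redis enable
cr0x@server:~$ wp redis status
Status: Connected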

3) PHP-FPM: tune for your hardware, not your hopes

PHP-FPM settings are easy to change and easier to ruin. Your goal is to avoid queuing while keeping CPU and RAM stable.

  • Symptom: listen queue and max children reached climb during spikes.
  • Fix: Increase pm.max_children if you have headroom, and reduce per-request time so children free up faster.
  • Anti-fix: Doubling children on a RAM-tight box. You’ll swap, and swapping is just latency with extra steps.

Compute a rough ceiling: measure memory per PHP-FPM process under load, then set children so total doesn’t exceed available RAM minus DB and OS cache needs.
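
A rough way to get that number, assuming the php-fpm8.2 process name from this host (the master process skews the average slightly, so treat it as an estimate):

cr0x@server:~$ ps --no-headers -o rss -C php-fpm8.2 | awk '{sum+=$1; n++} END {printf "procs=%d avg_rss=%.0f MB\n", n, sum/n/1024}'
procs=25 avg_rss=92 MB

At roughly 92 MB per child, a box with 8 GB left over after the DB and OS supports about 89 children on paper; in practice, cap well below that and let the queue metrics tell you if it is enough.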

4) OPcache: give PHP a brain and enough room to use it

Without OPcache, PHP re-parses scripts and burns CPU and filesystem calls. With OPcache but too little memory, it thrashes and you’re back to slow starts.

Do this:

  • Increase opcache.memory_consumption for plugin-heavy sites.
  • Increase opcache.max_accelerated_files enough to cover theme + plugins.
  • Prefer immutable deploy patterns; disable timestamp validation when safe.
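
A sketch of the relevant ini lines with illustrative values for a plugin-heavy site; the file path assumes a Debian-style layout, and validate_timestamps=0 is only safe if every deploy restarts or reloads PHP-FPM:

; /etc/php/8.2/fpm/conf.d/99-opcache.ini
opcache.memory_consumption=256
opcache.interned_strings_buffer=16
opcache.max_accelerated_files=50000
opcache.validate_timestamps=0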

5) Database: make the hot queries cheap and the lock waits rare

Most WordPress DB pain is not exotic. It’s the basics you already know you should do:

  • Keep wp_options autoload under control. Plugins bloat it; you pay the tax on every request.
  • Index what you query. Many plugins ship questionable queries and assume the DB will “figure it out.” The DB will. It will figure out how to be slow.
  • Give InnoDB enough buffer pool. If reads hit disk, TTFB becomes a storage benchmark.
  • Watch lock-heavy tables (sessions, options, postmeta) under write load.
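
For the buffer pool and slow-log bullets, a minimal server-side sketch, assuming a Debian-style MariaDB layout; the values are placeholders to size against your dataset and remaining RAM, not recommendations:

# /etc/mysql/mariadb.conf.d/60-tuning.cnf
[mysqld]
innodb_buffer_pool_size = 4G
slow_query_log          = 1
long_query_time         = 1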

6) Storage: eliminate tail latency, not just average latency

TTFB suffers from tail latency. The 99th percentile matters because that’s what users remember and what your monitoring alerts on.

Common storage TTFB killers:

  • Shared network volumes with burst credits (fast until you’re not)
  • Busy backups running on the same disk as MySQL
  • Log storms (debug logs, access logs, slow logs all on one volume)
  • Filesystem metadata churn (tons of stat() calls when OPcache timestamp validation is on and the realpath cache is undersized)

7) External calls: make them asynchronous or make them disappear

If WordPress waits on third-party APIs during request processing, you’ve turned your SLA into their hobby. Put hard timeouts on outbound calls, cache results, and move non-critical work to cron/queue.
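
WordPress ships a blunt but effective switch for this in wp-config.php: it blocks all outbound HTTP from WordPress except an allowlist. Treat this as a sketch to test in staging first; the host list is illustrative and the constant will break any plugin that legitimately needs another endpoint:

// wp-config.php: fail fast instead of hanging on third parties
define('WP_HTTP_BLOCK_EXTERNAL', true);
define('WP_ACCESSIBLE_HOSTS', 'api.wordpress.org,downloads.wordpress.org');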

Joke #2: Third-party APIs are like elevators: the moment you’re late, every single one is busy.

8) CDN and edge: use it to reduce origin hits, not to hide origin fires

A CDN helps when you can cache HTML and static assets. It also helps reduce TLS and TCP costs for global users. But if your origin TTFB is 1.5s for uncached dynamic pages, your CDN is just a polite messenger delivering bad news faster.

Three corporate-world mini-stories (and the lessons they paid for)

Story 1: The incident caused by a wrong assumption

The team had a WordPress site that was “fine” in the office and “awful” for users overseas. They pulled a few Lighthouse reports and blamed the theme. A sprint later, the theme was lighter and the problem was… unchanged.

During a production incident review, someone finally measured TTFB from two places: from the server itself and from a remote region. Inside the VPC, TTFB was consistently low. From outside, it spiked and wobbled. The application wasn’t slow; the path to the application was.

The load balancer was terminating TLS, and a WAF rule set was doing deep inspection on every request—including cached ones. Under peak traffic, the inspection queue grew, and TTFB grew with it. Everyone assumed “WAF is basically free.” It was not.

They fixed it by tightening the rule set, bypassing inspection for known static paths, and adding capacity where it mattered. The theme changes helped a little, but the real fix was admitting the wrong assumption: “TTFB equals PHP time.” It didn’t.

Story 2: The optimization that backfired

A different company decided to win the TTFB battle by increasing PHP-FPM pm.max_children aggressively. The graph looked great in staging. In production, it was a spectacular success for about ten minutes.

Then the database started timing out. Not because it was under-provisioned, but because the new PHP concurrency stampeded it. The number of simultaneous queries spiked, buffer pool churn increased, and lock contention showed up in places nobody had profiled. TTFB went from “meh” to “meltdown.”

To make it spicier, memory usage jumped. The box began swapping. Once swap enters the chat, latency becomes interpretive dance: sometimes fine, sometimes horrifying, always hard to reason about.

The eventual fix was boring: cap PHP concurrency to what the DB could handle, enable persistent object caching, clean up a few pathological queries, and size the DB buffer pool properly. They also added a load test that measured queueing, not just average latency. The lesson: if you “optimize” by increasing parallelism, you’re not optimizing—you’re negotiating with your bottleneck.

Story 3: The boring but correct practice that saved the day

A media site ran WordPress behind Nginx and a caching layer. They had a habit I love: whenever performance looked weird, they pulled three logs before touching anything—Nginx access log with upstream timing, PHP-FPM slowlog, and MySQL slow query log.

One afternoon, TTFB spiked intermittently. The on-call could have done the usual dance: restart PHP, clear caches, blame the last deploy. Instead, they followed the habit. Nginx showed upstream response time spiking. PHP slowlog showed workers stuck in filesystem calls. MySQL slow log was quiet.

They checked storage metrics and found latency spikes aligned with an automated backup that had drifted into business hours after a timezone config change. Nobody had to guess. Nobody had to argue. They moved the backup window and added an alert on sustained disk await.

This wasn’t heroic engineering. It was plain operational hygiene. The kind that looks unimpressive until it saves your weekend.

Common mistakes: symptom → root cause → fix

  • Symptom: High TTFB only for first request after deploy/restart
    Root cause: Cold OPcache, cold page/object cache, JIT warmup, DNS cache empty
    Fix: Warm caches (preload URLs; a warm-up sketch follows this list), ensure OPcache sized correctly, avoid frequent restarts, and pre-resolve critical outbound DNS if you must call out.
  • Symptom: TTFB spikes during traffic peaks, fine off-peak
    Root cause: PHP-FPM saturation and queueing, DB connection limits, cache stampede
    Fix: Right-size pm.max_children, add page/object caching, implement cache locking or request coalescing, and reduce expensive uncached endpoints.
  • Symptom: wp-admin feels slow, front-end mostly okay
    Root cause: Admin requests bypass page cache; heavy queries, plugin update checks, remote calls
    Fix: Profile admin endpoints; disable or schedule update checks; fix slow queries; separate admin traffic if needed.
  • Symptom: Static assets fast, HTML slow
    Root cause: PHP/DB work dominates; no effective cache; slow external calls
    Fix: Add page cache where safe; add persistent object cache; remove synchronous remote calls; tune OPcache and DB.
  • Symptom: Random long TTFB outliers (p95/p99 ugly)
    Root cause: Storage tail latency, DB lock waits, noisy neighbor effects, cron jobs overlapping traffic
    Fix: Move DB to stable low-latency storage; reduce lock contention; reschedule heavy cron/backup; add alerts on I/O latency and lock waits.
  • Symptom: Increasing PHP-FPM children made it worse
    Root cause: DB became the bottleneck; memory pressure caused swapping
    Fix: Back down concurrency; measure per-process RSS; tune DB and caches first; scale vertically/horizontally with intent.
  • Symptom: External monitoring shows high TTFB, internal checks look fine
    Root cause: DNS, TLS handshake, WAF/LB queueing, packet loss
    Fix: Measure with curl timings from multiple regions; inspect LB/WAF metrics; tune TLS resumption; fix network loss; bypass expensive inspection for safe paths.
  • Symptom: TTFB worsened after enabling a cache plugin
    Root cause: Cache configuration causes lock contention, cache purge storms, or disables OPcache-friendly patterns
    Fix: Prefer caching at Nginx/proxy where possible; ensure cache keys are correct; avoid purging entire cache on minor changes; validate with load tests.
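
The warm-up mentioned above can be a one-liner. A minimal sketch, assuming your sitemap lives at /sitemap.xml and GNU grep/xargs are available (if you use a sitemap index, point this at one of the child sitemaps); timings shown are illustrative:

cr0x@server:~$ curl -s https://example.com/sitemap.xml | grep -oP '(?<=<loc>)[^<]+' | head -n 50 | xargs -n 1 -P 4 curl -s -o /dev/null -w '%{http_code} %{time_starttransfer} %{url_effective}\n'
200 0.038 https://example.com/
200 0.176 https://example.com/blog/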

Checklists / step-by-step plan

Step-by-step: from “TTFB is bad” to a specific fix in one working session

  1. Establish baselines: run curl timing from your laptop and from the server. Save the numbers.
  2. Confirm scope: test homepage, a typical post, a product page, and a static asset. Identify “dynamic-only” slowness.
  3. Read the web server’s truth: enable/verify Nginx rt and urt fields. Find top slow URLs.
  4. Check PHP-FPM saturation: look at listen queue and max children reached. Decide if you’re queueing.
  5. Turn on PHP slowlog (if not already) with a threshold you can live with (e.g., 2s); the pool directives are sketched after this list. Capture stack traces during slowness.
  6. Check for outbound calls: slowlog stacks and ss output. If present, fix that first—nothing else matters.
  7. Check DB slow queries and locks: enable slow query logging; inspect processlist for locks.
  8. Check storage latency: use iostat. If await and util are bad, treat storage as a primary suspect.
  9. Implement one change at a time: cache layer, OPcache tuning, DB buffer pool sizing, PHP pool sizing.
  10. Validate with the same measurements: curl timing and log timings. Keep before/after evidence.
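
For step 5, the two pool directives that matter, sketched with the paths and threshold used elsewhere in this article (Debian-style layout assumed):

; /etc/php/8.2/fpm/pool.d/www.conf
slowlog = /var/log/php8.2-fpm/www-slow.log
request_slowlog_timeout = 2s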

Operational checklist: keep TTFB from regressing next month

  • Access logs include request time and upstream response time.
  • PHP-FPM status endpoint is protected and monitored (queue, active processes, max children reached).
  • Slow logs are enabled (PHP and DB) with sane retention.
  • Backups and heavy cron jobs are scheduled and monitored for runtime drift.
  • Deploy process warms caches or at least avoids simultaneous cold starts across all nodes.
  • There’s a load test that measures p95/p99, not just average.
  • Plugin policy exists: anything doing synchronous external I/O in the request path is treated as production risk.

FAQ

1) What’s a “good” TTFB for WordPress?

For cached HTML at the edge or proxy: tens of milliseconds to low hundreds. For uncached dynamic WordPress: aiming for sub-400ms at p50 is reasonable on a healthy stack. Your p95 is what will hurt you publicly.

2) Why is TTFB high even when CPU usage looks low?

Because you can be blocked without being CPU-bound: waiting on disk I/O, DB locks, outbound HTTP calls, DNS timeouts, or a saturated PHP-FPM queue where workers are stuck elsewhere.

3) Do I need a caching plugin?

Not necessarily. Caching at the server/proxy layer is often more predictable and faster. Plugins can help manage cache purges, but they can also add complexity and failure modes. If you use one, treat it like infrastructure: configure, measure, and test.

4) If I add Redis, will TTFB magically drop?

No. Redis helps when the DB is doing repetitive reads (options, meta, taxonomy). It won’t fix slow PHP code, remote calls, or DB lock contention from write-heavy behavior. It’s a lever, not a religion.

5) Should I just increase PHP-FPM pm.max_children?

Only if you’ve proven you’re queueing and you have CPU/RAM headroom. Otherwise you’ll increase concurrency until you hit the real bottleneck (often the DB) and make latency worse.

6) Why does TTFB spike randomly at night?

Because “night” is when cron jobs and backups come out. Look for DB dumps, filesystem snapshots, log rotation, malware scans, or plugin scheduled jobs drifting into peak windows.

7) How do I know if the database is the bottleneck without fancy APM?

Correlate: Nginx upstream response time spikes + PHP slowlog shows time in DB calls + MySQL slow log/lock waits show activity. If DB time dominates, you’ll see it across those signals.

8) Does HTTP/2 or HTTP/3 reduce TTFB?

They can reduce connection overhead and improve multiplexing, especially on lossy networks. But if your origin is spending 1–2 seconds building HTML, the protocol won’t save you. Fix the origin first, then enjoy the protocol gains.

9) Is a CDN worth it if my TTFB is high?

Yes for static assets and cacheable HTML, because it reduces origin load and improves global latency. But don’t use it as an excuse to ignore origin performance. Cache misses will still reveal the truth.

10) Why is wp-admin slower after I “optimized” the front-end?

Because most optimizations you did were for cacheable pages. wp-admin is dynamic, often bypasses caches, and triggers extra plugin behaviors (update checks, analytics, remote APIs). Profile it separately.

Next steps you can do this week

  1. Get the numbers: curl timing from inside and outside, and Nginx logs with upstream timings.
  2. Prove or disprove PHP-FPM queueing: enable and check /fpm-status; watch max children reached and listen queue during load.
  3. Enable PHP slowlog and wait for it to tell you the uncomfortable truth (usually external calls or one expensive code path).
  4. Turn on DB slow query log for a day, then fix the top offenders and autoload bloat.
  5. Validate storage latency with iostat during peak. If you’re on bursty or shared volumes, plan to move.
  6. Add caching deliberately: page cache for anonymous traffic; object cache for dynamic repetition; keep cache keys and purge behavior sane.
  7. Retest and lock it in: same curl measurements, same log inspection, and a small load test. Write down the results so future-you doesn’t repeat the same detective work.

If you do the measuring and you still can’t explain where the time goes, that’s your signal to add tracing (even lightweight) and to stop guessing. Production systems don’t respond to vibes.
