WordPress Performance: The Cache Setup That Actually Survives Real Traffic


Traffic spikes don’t kill WordPress. Unplanned cache behavior does. The site looks fine at 30 requests per second… until the homepage cache expires,
a purge fires, logged-in users bypass cache, and suddenly PHP and MySQL are doing CrossFit against your will.

This is the practical, production-grade caching setup that holds up when marketing “just sends” a campaign, a crawler gets excited, or your CEO
refreshes the homepage 400 times to “test it.” We’ll build a layered cache, define what must not be cached, and make the purge story boring—because
boring is how uptime happens.

What actually survives real traffic (and what doesn’t)

WordPress performance conversations often start with plugins. Real traffic doesn’t care about your plugin list. It cares about whether your
architecture can serve the 95% of requests that should be identical without running PHP at all.

A survivable setup has three properties:

  • Layering: CDN/edge cache, origin full-page cache, object cache, and only then compute.
  • Correct bypass rules: logged-in sessions, carts, previews, and admin must not be cached the same way as public pages.
  • Controlled invalidation: you purge narrowly and predictably, and you avoid stampedes when caches expire.

If your “cache” still means “hit PHP, then maybe use a plugin to store HTML on disk,” you’re optimizing the wrong thing. Disk I/O and PHP workers
are not where you want your peak traffic to land.

The goal: for anonymous traffic, return a cached response from the CDN or Nginx in a few milliseconds, with PHP asleep and MySQL blissfully unaware.
For logged-in and transactional flows, make the dynamic path efficient and stable. You can’t cache your way out of a broken checkout, but you can
stop the homepage from lighting it on fire.

Facts and historical context (short, useful, mildly alarming)

  1. WordPress started (2003) in an era when “caching” often meant “turn on gzip and pray.” Modern traffic patterns are harsher: bots, scrapers, and link previews act like permanent load tests.
  2. Memcached predates Redis and was a go-to object cache for early WordPress scaling; Redis later gained popularity because persistence, data structures, and operational tooling improved the day-2 experience.
  3. Varnish popularized HTTP caching in front of dynamic apps; Nginx later absorbed much of that role for many teams with FastCGI cache and simpler ops.
  4. “Cookie explosion” is a modern WordPress tax: analytics, A/B testing, chat widgets, consent banners—each cookie can silently destroy cache hit rates if you vary on it.
  5. HTTP/2 didn’t solve backend bottlenecks: it improved connection handling and multiplexing, but if PHP-FPM workers are saturated, the browser still waits politely.
  6. Most WordPress sites are read-heavy until they aren’t. A marketing campaign can convert a “mostly read” site into “everyone hits /wp-admin/admin-ajax.php” in minutes.
  7. MySQL query cache is gone (deprecated in 5.7.20, removed in MySQL 8.0). If you’re still relying on it as a concept, you’re living in a museum with very fast Wi‑Fi.
  8. OPcache is not optional for PHP performance; without it, PHP recompiles scripts under load and you’re paying CPU for the privilege of being slow.

The target architecture: layered caching you can reason about

Here’s the stack that actually behaves under pressure:

  • CDN (edge): caches public pages and assets; enforces sane cache keys; shields origin from spikes.
  • Origin Nginx full-page cache: FastCGI cache for anonymous GET/HEAD; microcaching for certain endpoints if needed.
  • PHP-FPM + OPcache: tuned worker counts; stable memory; no “max_children roulette.”
  • Redis object cache: caches expensive WP object lookups; avoids repeated DB hits for options/transients.
  • MySQL: tuned for concurrency; slow query visibility; indexes that match reality.
  • Observability: cache hit metrics, upstream timings, and request classification (cached vs bypass vs dynamic).

You’ll notice what’s missing: “a plugin that promises 10x speed.” Plugins can help with integration and purge hooks, but the core survival mechanism
is at the HTTP layer. That’s where you stop load from entering your runtime in the first place.

One paraphrased idea that belongs taped to every on-call laptop, attributed to Werner Vogels (reliability/architecture): Everything fails eventually; design so failure modes are contained and predictable.

Joke #1: Cache invalidation is hard. The other hard problem is explaining to finance why “more CPU” didn’t fix “too many requests.”

Layer 1: CDN caching that doesn’t break your site

If you have a CDN and your origin still sees most anonymous traffic, your CDN is basically a very expensive TLS terminator.
The first job is making the CDN confidently cache what’s safe.

What the CDN should cache

  • Static assets: images, CSS, JS, fonts. Long TTL, immutable filenames if possible.
  • Public HTML pages for anonymous users: homepage, posts, categories, tags, marketing pages.
  • Some API responses if you control them and they’re public (rare for WordPress unless you front a read-only endpoint).

What the CDN must not cache (or must vary carefully)

  • Anything that changes per user: logged-in pages, admin, account, checkout, cart.
  • Responses that set cookies or depend on cookies unless your cache key is disciplined.
  • Preview pages, drafts, password-protected posts.

Practical advice: treat Set-Cookie as a toxic waste symbol for public HTML. If the origin is setting cookies on anonymous pageviews
because a plugin “needs it,” your CDN hit rate will die quietly.

Cache key discipline: your hit rate depends on it

By default, many CDNs can vary cache on a long list: query strings, headers, and sometimes cookies. That is a hit-rate murder weapon.
For WordPress, you typically want:

  • Vary on URL path and a controlled set of query parameters (often none for HTML).
  • Ignore most cookies for anonymous traffic, but bypass cache when specific WordPress cookies exist.
  • Honor Cache-Control from origin only if you trust your origin to be correct. Most origins don’t start out correct.

If your CDN supports “serve stale on error” and “serve stale while revalidating,” turn those on for public content. It’s the difference between
“origin is sick but users are fine” and “origin is sick and everyone knows.”

Cache key rules: cookies, headers, and why your hit rate is lying

WordPress sets and reads a few cookies that matter:

  • wordpress_logged_in_*: user is logged in. Bypass full-page cache and CDN HTML cache.
  • wp-postpass_*: password-protected content. Don’t cache publicly.
  • woocommerce_cart_hash, woocommerce_items_in_cart: transactional state. Treat as bypass signals.
  • comment_author_*: can affect page rendering for comment forms. Usually best to bypass or vary cautiously.

Now the more subtle problem: third-party cookies. Many plugins set cookies for “A/B test group,” “referral source,” “consent,” “chat session,” and
so on. If your caching layer varies by cookie or if the origin sends different HTML based on these cookies, you fragment your cache into tiny pieces.

Decision rule: if a cookie does not materially change the HTML for anonymous users, it must not be part of the cache key. If it does change
the HTML, you need to decide whether that personalization is worth the performance cost. In most corporate environments, the answer is “no, not on the
critical landing pages.”
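The bypass decision above can be sketched as a small shell function. This is a minimal illustration, not a drop-in rule set: the function name should_bypass is hypothetical, the cookie names are the ones listed earlier in this section, and a real deployment would express the same logic in the CDN or Nginx configuration.

```shell
# Classify a request's Cookie header as BYPASS or CACHE.
# should_bypass is a hypothetical helper; the cookie names come from the list above.
should_bypass() {
  cookie_header="$1"
  # Match only known auth/cart/password/comment cookies at the start of a cookie pair.
  pattern='(^|; *)(wordpress_logged_in_|wp-postpass_|woocommerce_cart_hash=|woocommerce_items_in_cart=|comment_author_)'
  if printf '%s' "$cookie_header" | grep -Eq "$pattern"; then
    echo BYPASS
  else
    echo CACHE
  fi
}

# Third-party cookies alone do not trigger a bypass.
should_bypass 'consent=yes; ab_group=B'
# A logged-in cookie does.
should_bypass 'consent=yes; wordpress_logged_in_abc123=user'
```

The design choice is an allowlist of bypass signals: any cookie not named in the pattern is ignored, so a new marketing cookie cannot silently fragment the cache.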

Layer 2: Origin full-page cache (Nginx FastCGI) done properly

Origin full-page caching is the workhorse. When the CDN misses (cold edge, purge, geo-specific behavior), your origin should still serve cached HTML
without running WordPress. Nginx FastCGI cache does this reliably if you configure it with grown-up rules.

Why FastCGI cache beats “HTML file cache plugins” in production

  • Concurrency: Nginx can serve cached responses with minimal overhead, even under high load.
  • Isolation: Cache sits in front of PHP, so PHP worker exhaustion doesn’t immediately take down public pages.
  • Control: You can define cache keys, bypass rules, and TTLs in a single place, and you can observe it.

Baseline Nginx FastCGI cache pattern (conceptual)

You want a cache key that ignores junk, a bypass for logged-in/cart/admin cookies, and a protection against stampedes.
Use fastcgi_cache_lock so one request warms the cache while others wait briefly instead of dogpiling PHP.

Practical TTL guidance:

  • Public posts/pages: 5–30 minutes at origin is common if you have purge hooks; longer if you don’t.
  • Homepage: often shorter if it changes frequently, but still cached.
  • Search pages: tricky; often cache briefly (30–120 seconds) or skip if personalized.

Microcaching (1–5 seconds) is a legitimate tool for endpoints you can’t fully cache but that get hammered (some AJAX endpoints). Don’t lead with it.
Use it when you have evidence.

Layer 3: Object cache (Redis) without turning it into a dumpster

Object caching helps most when your pages aren’t fully cached: logged-in dashboards, WooCommerce flows, and any site with lots of dynamic blocks.
It reduces repeated database queries for options, transients, and repeated lookups inside a single request and across requests.

But Redis object caching has a failure mode: it can become a shared junk drawer of unbounded keys, unpredictable TTLs, and “helpful” plugins storing
entire rendered fragments. When Redis starts evicting hot keys or swapping, it won’t just be slow—it will be creatively slow.

Rules that keep Redis helpful

  • Cap memory: set maxmemory and a sane eviction policy (often allkeys-lfu for mixed workloads).
  • Keep Redis local when possible: network latency adds up. If it must be remote, keep it close and monitor p99.
  • Use a clear prefix: avoid key collisions and make it possible to purge by prefix if needed.
  • Watch fragmentation and evictions: evictions are not “normal.” They’re a symptom of “we’re guessing.”

When object cache does not help

  • If you already serve most traffic from CDN + full-page cache, object cache doesn’t change the mainline.
  • If your bottleneck is PHP CPU due to heavy template logic, Redis won’t save you much.
  • If your database is slow because of missing indexes, caching can mask the issue until it can’t.

Layer 4: PHP-FPM and OPcache (because cache misses are real)

You can build a great cache and still melt down because the uncached path is unstable. Logged-in users, admin screens, and purge warmups will hit PHP.
If PHP-FPM is tuned like a hobby project, it will behave like one.

PHP-FPM: tune for memory first, then concurrency

The classic mistake is setting pm.max_children based on CPU count. WordPress is memory-hungry. You set max children based on:
(available RAM – safety margin) / average PHP process RSS.
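As a worked example of that formula, here is a minimal sketch in shell arithmetic. The function name max_children and the input numbers are illustrative assumptions: 8 GiB available to PHP, a 1 GiB safety margin, and the 142000 KB average RSS that appears as sample output in Task 7 later in this article.

```shell
# Compute a memory-based ceiling for pm.max_children.
# All arguments are in KB; max_children is a hypothetical helper name.
max_children() {
  avail_kb="$1"     # RAM you are willing to give PHP-FPM
  margin_kb="$2"    # headroom for the OS, Nginx, Redis, spikes
  avg_rss_kb="$3"   # measured average RSS per PHP-FPM worker
  echo $(( (avail_kb - margin_kb) / avg_rss_kb ))
}

# 8 GiB available, 1 GiB margin, 142000 KB per worker.
max_children 8388608 1048576 142000
```

Integer division rounds down, which is the direction you want here: rounding up is how swapping starts.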

You also need to observe:

  • max children reached events: means requests are queueing.
  • slow log entries: means specific scripts are taking too long.
  • request duration distribution: p95 and p99 matter more than average.

OPcache: the cheapest speedup you can buy

With OPcache enabled and sized correctly, PHP avoids recompiling scripts and keeps hot code in memory.
Under load, OPcache mis-sizing shows up as high CPU, frequent cache restarts when memory fills, and “why is this slower after deploy?”

Layer 5: MySQL tuning for WordPress workloads

WordPress is not exotic. It’s a classic web app: lots of reads, some writes, and a few queries that become monstrous when your dataset grows.
Most WordPress database pain comes from:

  • Missing indexes for plugin-created tables or meta queries.
  • Autoloaded options bloat in wp_options.
  • Slow admin queries that nobody tests under production data volume.
  • Connection storms when caches expire and PHP workers spike.

The database should not be your page cache. It should be a durable source of truth. If your homepage needs 200 queries, caching hides it; it doesn’t
fix it. But with the right cache layers, DB load becomes predictable and you can tune and index based on real outliers.

Cache invalidation and purge: predictable beats clever

“Purge everything on publish” is the WordPress equivalent of pulling the fire alarm to test it. Sure, the alarm works. Now everyone’s outside and
angry.

Invalidation should be:

  • Narrow: purge the URL that changed and the small set of pages that reference it (category page, homepage, feeds).
  • Rate-limited: especially for bulk updates, imports, and editorial workflows.
  • Observable: you can see purge volume and correlate it to cache hit rate drops and origin load.

Stampede protection matters. If you purge a popular URL and 5,000 users request it at once, you want one request to rebuild and the rest to wait or
receive stale content briefly.
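A narrow purge can be sketched as a function that expands one changed post into a short, explicit list of URLs. This is an illustration under assumptions: purge_set is a hypothetical name, the /category/<slug>/ and /feed/ paths assume WordPress pretty permalinks with default bases, and the actual purge call to your CDN or origin is deliberately left out.

```shell
# Print the URLs to purge when one post changes.
# purge_set is a hypothetical helper; path layout assumes default WordPress permalinks.
purge_set() {
  base="$1"        # e.g. https://example.com
  post_path="$2"   # e.g. /hello-world/
  category="$3"    # category slug the post belongs to
  printf '%s\n' \
    "${base}${post_path}" \
    "${base}/" \
    "${base}/category/${category}/" \
    "${base}/feed/"
}

# One changed post expands to four URLs, not the whole site.
purge_set https://example.com /hello-world/ news
```

Feeding this list to a purge client one URL at a time, with a delay between calls, is one simple way to get the rate limiting described above.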

Joke #2: The easiest way to increase cache hit rate is to publish less. Editors have not embraced this revolutionary idea.

Fast diagnosis playbook: find the bottleneck in minutes

When the site is slow, you don’t have time for philosophy. You need a fast classification: are we missing cache, saturated in PHP, or blocked on the
database? Here’s the order that finds answers quickly.

First: confirm where time is spent (edge vs origin vs upstream)

  1. Check CDN/edge cache status headers (HIT/MISS/BYPASS). If you’re mostly MISS, fix caching rules before touching PHP.
  2. Check origin response headers for FastCGI cache status (HIT/MISS/BYPASS/EXPIRED).
  3. Measure TTFB from outside and compare to origin timings. If edge is fast but origin is slow, you have an origin problem but users may be fine.

Second: check saturation signals

  1. PHP-FPM: max children reached, listen queue, slowlog entries.
  2. MySQL: running threads, slow queries, lock waits.
  3. CPU and memory: swapping, steal time (VM), iowait (storage).

Third: identify the request class causing pain

  1. Is it /wp-admin/admin-ajax.php?
  2. Is it search?
  3. Is it a crawler ignoring robots?
  4. Is it a purge storm?

The fastest path to stability is often not “optimize WordPress.” It’s “stop sending dynamic traffic to WordPress.”
That means caching, rate limiting, and bypass discipline.

Practical tasks (commands + outputs + decisions)

These are the tasks I actually run during incident response and performance tuning. Each one includes: command, what the output means, and the
decision you make.

Task 1: Measure TTFB and confirm cache headers from the edge

cr0x@server:~$ curl -s -D- -o /dev/null https://example.com/ | egrep -i '^(HTTP/|age:|cache-control:|cf-cache-status:|x-cache:|via:|server-timing:)'
HTTP/2 200
cache-control: public, max-age=300
age: 142
cf-cache-status: HIT
via: 1.1 varnish

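Meaning: cf-cache-status: HIT and a non-zero age suggest the CDN is serving cached HTML. Good. This command only inspects headers; for the TTFB number itself, run curl -s -o /dev/null -w '%{time_starttransfer}\n' against the same URL.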

Decision: If you see MISS for your top pages during normal traffic, fix CDN cache rules and cookie bypass logic before touching origin tuning.

Task 2: Compare edge vs origin directly (bypass CDN)

cr0x@server:~$ curl -s -D- -o /dev/null --resolve example.com:443:203.0.113.10 https://example.com/ | egrep -i '^(HTTP/|x-fastcgi-cache:|x-cache:|server:|cache-control:)'
HTTP/2 200
server: nginx
cache-control: public, max-age=300
x-fastcgi-cache: HIT

Meaning: You hit the origin IP; x-fastcgi-cache: HIT means Nginx is serving cached HTML without PHP.

Decision: If origin is MISS while CDN is MISS, expect a meltdown: you’re running WordPress for anonymous traffic. Fix origin full-page caching immediately.

Task 3: Check Nginx cache hit ratio from logs (quick and dirty)

cr0x@server:~$ sudo awk '{print $NF}' /var/log/nginx/access.log | sort | uniq -c
  81234 HIT
  10421 MISS
   3211 BYPASS
    442 EXPIRED

Meaning: This assumes your log format ends with cache status. High HIT is healthy; lots of BYPASS indicates cookie rules or auth traffic.

Decision: If BYPASS is high for pages that should be public, find which cookies are causing bypass and stop setting them for anonymous users.
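To turn those counts into a single number you can track, a minimal sketch follows. It makes the same assumption as the command above (the cache status is the last field of each log line), and hit_ratio is a hypothetical helper name.

```shell
# Print the percentage of log lines whose last field is HIT.
# Reads log lines on stdin; assumes cache status is the final field.
hit_ratio() {
  awk '
    { total++; if ($NF == "HIT") hits++ }
    END {
      if (total > 0) {
        printf "%.1f\n", 100 * hits / total
      } else {
        print "n/a"
      }
    }'
}

# Example with four synthetic log lines: three HIT, one MISS.
printf 'GET / HIT\nGET /a HIT\nGET /b MISS\nGET /c HIT\n' | hit_ratio
```

On a real server you would pipe /var/log/nginx/access.log into it. For the sample counts shown above, the ratio works out to roughly 85%.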

Task 4: Identify top URLs causing origin load

cr0x@server:~$ sudo awk '{print $7}' /var/log/nginx/access.log | sort | uniq -c | sort -nr | head
  15432 /wp-admin/admin-ajax.php
   9821 /
   6110 /wp-json/wp/v2/posts
   5880 /product/widget-1/
   4122 /?s=widget

Meaning: This shows the busiest endpoints. Admin AJAX and search are usual suspects.

Decision: If /wp-admin/admin-ajax.php dominates, investigate what’s calling it (front-end features, heartbeat, plugins), then rate-limit or cache/microcache where safe.

Task 5: Check PHP-FPM saturation (status page)

cr0x@server:~$ curl -s http://127.0.0.1/php-fpm-status | egrep -i 'pool|process manager|active processes|idle processes|max active processes|listen queue'
pool:                 www
process manager:      dynamic
active processes:     48
idle processes:       2
max active processes: 50
listen queue:         37

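Meaning: Nearly every worker is busy (48 active, 2 idle) and the listen queue is non-zero, so requests are waiting for workers. Note that max active processes is the peak since FPM started, not the configured limit; the ceiling itself is pm.max_children.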

Decision: If memory allows, increase pm.max_children. If memory doesn’t allow, reduce dynamic load via caching/bypass rules and fix expensive endpoints.
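If you want to alert on queueing rather than eyeball it, the listen queue value can be pulled out of the status text with a small filter. This is a sketch: fpm_listen_queue is a hypothetical name, and it assumes the plain-text status format shown above.

```shell
# Extract the current listen queue depth from PHP-FPM status text on stdin.
# Matches the exact "listen queue" field, not "max listen queue" or "listen queue len".
fpm_listen_queue() {
  awk -F': *' '$1 == "listen queue" { print $2 }'
}

# Example with two synthetic status lines.
printf 'listen queue:         37\nmax listen queue:     40\n' | fpm_listen_queue
```

Any sustained value above zero means requests are waiting for a worker, which is the signal the decision above keys on.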

Task 6: Confirm “max children reached” events

cr0x@server:~$ sudo journalctl -u php8.2-fpm --since "30 min ago" | egrep -i 'max_children|max children'
Feb 04 10:12:09 web1 php-fpm8.2[1203]: [WARNING] [pool www] server reached pm.max_children setting (50), consider raising it

Meaning: Hard evidence of saturation.

Decision: Treat this as a capacity planning signal. Either increase workers (RAM permitting) or reduce uncached traffic reaching PHP.

Task 7: Estimate PHP worker memory to set max_children safely

cr0x@server:~$ ps -ylC php-fpm8.2 --sort:rss | awk 'NR==1{print} NR>1{rss+=$8; n++} END{printf "avg_rss_kb=%d\n", rss/n}'
S   UID     PID    PPID  C PRI  NI   RSS    SZ WCHAN  TTY          TIME CMD
avg_rss_kb=142000

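Meaning: Average resident memory per worker is ~142MB. That’s typical for WordPress with plugins. RSS counts shared pages (including OPcache shared memory), so treat it as a conservative upper bound.

Decision: If you have, say, 8GB free for PHP, you can’t set 200 workers: at ~142MB each, the budget runs out below 60 workers, before any safety margin. Go past that and you’ll swap, and performance will die slowly and publicly.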

Task 8: Check OPcache status and memory pressure

cr0x@server:~$ php -i | egrep -i 'opcache.enable|opcache.memory_consumption|opcache.interned_strings_buffer|opcache.max_accelerated_files'
opcache.enable => On => On
opcache.memory_consumption => 256
opcache.interned_strings_buffer => 16
opcache.max_accelerated_files => 20000

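Meaning: OPcache is enabled and sized, but memory may still be insufficient depending on codebase size. Note that php -i reports the CLI configuration; confirm PHP-FPM loads the same values, since the two SAPIs can read different ini files.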

Decision: If deploys cause CPU spikes and cache resets, increase OPcache memory and validate that you aren’t frequently restarting PHP-FPM.

Task 9: Check Redis health for object cache

cr0x@server:~$ redis-cli INFO memory | egrep -i 'used_memory_human|maxmemory_human|mem_fragmentation_ratio'
used_memory_human:1.42G
maxmemory_human:2.00G
mem_fragmentation_ratio:1.57

Meaning: Redis is using 1.42G of 2G, fragmentation is moderate/high. Fragmentation can rise under churn.

Decision: If fragmentation climbs and latency rises, consider tuning eviction policy, reducing key churn (bad plugins), or provisioning headroom.

Task 10: Check Redis evictions (a quiet performance killer)

cr0x@server:~$ redis-cli INFO stats | egrep -i 'evicted_keys|expired_keys|keyspace_hits|keyspace_misses'
keyspace_hits:98234123
keyspace_misses:7712231
expired_keys:412332
evicted_keys:118443

Meaning: Evictions mean Redis is under memory pressure and is discarding keys you wanted cached.

Decision: If evictions are ongoing during peak, increase Redis memory, change what you store, or set TTLs more sanely. Don’t pretend evictions are fine.
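Evictions are easier to judge next to the hit ratio. The sketch below derives it from the same INFO stats output; redis_hit_ratio is a hypothetical helper name, and the only assumption about the input is the key:value format shown above.

```shell
# Compute the Redis keyspace hit ratio (percent) from INFO stats text on stdin.
redis_hit_ratio() {
  # redis-cli output may carry CRLF line endings; strip them first.
  tr -d '\r' | awk -F: '
    $1 == "keyspace_hits"   { hits = $2 }
    $1 == "keyspace_misses" { misses = $2 }
    END {
      total = hits + misses
      if (total > 0) {
        printf "%.1f\n", 100 * hits / total
      } else {
        print "n/a"
      }
    }'
}

# Example using the sample numbers above.
printf 'keyspace_hits:98234123\nkeyspace_misses:7712231\n' | redis_hit_ratio
```

A healthy-looking ratio alongside a climbing evicted_keys count usually means the hot set still fits but the long tail is being thrown away, which is worth knowing before you buy more memory.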

Task 11: Check MySQL threads and contention quickly

cr0x@server:~$ mysql -e "SHOW GLOBAL STATUS LIKE 'Threads_running'; SHOW GLOBAL STATUS LIKE 'Threads_connected';"
+-----------------+-------+
| Variable_name   | Value |
+-----------------+-------+
| Threads_running | 64    |
+-----------------+-------+
+-------------------+-------+
| Variable_name     | Value |
+-------------------+-------+
| Threads_connected | 210   |
+-------------------+-------+

Meaning: Many running threads suggest the DB is busy; a high connected count suggests connection pressure.

Decision: If Threads_running spikes when cache expires, you have a stampede problem. Add cache locks, stale serving, and reduce bypass traffic.

Task 12: Find slow queries (top offenders)

cr0x@server:~$ sudo mysqldumpslow -s t -t 10 /var/log/mysql/mysql-slow.log
Count: 18  Time=2.31s (41s)  Lock=0.00s (0s)  Rows=1.0 (18), root[root]@localhost
  SELECT option_value FROM wp_options WHERE option_name = 'autoload_big_blob' LIMIT 1;
Count: 9  Time=1.97s (17s)  Lock=0.12s (1s)  Rows=1200.0 (10800), app[app]@10.0.0.12
  SELECT * FROM wp_postmeta WHERE meta_key = '...' AND meta_value LIKE '...%';

Meaning: Options lookups and unindexed meta queries are killing you. The second query screams “plugin doing a meta search.”

Decision: Fix autoload bloat, add targeted indexes where safe, and consider redesigning queries (or disabling the plugin feature). Caching alone won’t make unindexed LIKE fast.

Task 13: Check autoloaded options size (common WordPress landmine)

cr0x@server:~$ mysql -e "SELECT ROUND(SUM(LENGTH(option_value))/1024/1024,2) AS autoload_mb FROM wp_options WHERE autoload='yes';"
+-------------+
| autoload_mb |
+-------------+
| 18.47       |
+-------------+

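Meaning: 18MB of autoloaded options means every request loads all of it into memory. This grows silently over time. On WordPress 6.6 and later, autoloaded rows may also carry the values 'on', 'auto', or 'auto-on', so widen the WHERE clause if this query returns suspiciously little.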

Decision: Audit which options are autoloaded, disable autoload for large blobs, and fix the plugin/theme creating it.

Task 14: Confirm cache-control and vary headers from origin

cr0x@server:~$ curl -s -D- -o /dev/null --resolve example.com:443:203.0.113.10 https://example.com/ | egrep -i 'cache-control:|pragma:|expires:|vary:|set-cookie:'
cache-control: public, max-age=300
vary: Accept-Encoding

Meaning: No Set-Cookie on public homepage, and Vary isn’t exploding on irrelevant headers.

Decision: If you see Set-Cookie on public pages, find the plugin setting it and stop it. Otherwise you’re paying for cache and not getting cache.

Task 15: Detect bot spikes and bad actors quickly

cr0x@server:~$ sudo awk -F\" '{print $6}' /var/log/nginx/access.log | sort | uniq -c | sort -nr | head
  22110 Mozilla/5.0 (compatible; SomeBot/1.0; +http://bot.example)
  11842 Mozilla/5.0 (Linux; Android 10) AppleWebKit/537.36 Chrome/121.0 Mobile
   7441 Mozilla/5.0 (compatible; AnotherCrawler/2.1)

Meaning: A single bot dominates traffic. That can be fine if cached; disastrous if it bypasses cache.

Decision: If bots are hammering dynamic endpoints, add rate limits, tighten robots rules, and ensure public pages are cacheable so bots are cheap.

Task 16: Validate Nginx upstream response time distribution

cr0x@server:~$ sudo awk '{print $(NF-1)}' /var/log/nginx/access.log | sed 's/upstream_response_time=//' | awk -F, '{print $1}' | sort -n | tail -n 5
0.842
1.003
1.217
2.884
5.991

Meaning: This assumes your log format writes an upstream_response_time= field second to last. Your slowest upstream responses are multiple seconds. Those are likely cache misses hitting PHP/MySQL.

Decision: Investigate the slow endpoints and align cache policy: either cache them, or make them cheaper (indexes, code fixes), or protect them (rate limits).
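The five slowest requests are a starting point, but p95 and p99 say more about what users feel. A minimal nearest-rank percentile sketch follows; percentile is a hypothetical helper name, and it expects one numeric value per line on stdin (for example the output of the extraction pipeline above, minus the final sort and tail).

```shell
# Print the p-th percentile (nearest-rank method) of numbers read from stdin.
# p must be an integer between 1 and 100.
percentile() {
  p="$1"
  sort -n | awk -v p="$p" '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      # Nearest rank: ceil(p/100 * N), computed with integer arithmetic.
      i = int((NR * p + 99) / 100)
      if (i < 1) i = 1
      print v[i]
    }'
}

# Median of the five sample timings shown above.
printf '0.842\n1.003\n1.217\n2.884\n5.991\n' | percentile 50
```

With a full log, comparing percentile 50 against percentile 99 quickly shows whether you have a uniformly slow upstream or a fast one with a painful tail.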

Common mistakes: symptoms → root cause → fix

1) CDN hit rate is low even though pages are “cacheable”

Symptoms: CDN shows many MISS/BYPASS; origin load is high during spikes.

Root cause: Cache key varies on cookies or query strings that shouldn’t vary; origin sends Set-Cookie on anonymous HTML.

Fix: Strip irrelevant cookies from cache key; bypass only on WordPress auth/cart cookies; stop plugins from setting cookies for anonymous users; normalize query strings.

2) “Works in staging” but production melts on cache expiry

Symptoms: Every few minutes, latency spikes; PHP and DB surge; then it “recovers.”

Root cause: Cache stampede: many clients miss at once; no cache locking; short TTL on hot pages; purge storms.

Fix: Enable cache lock at origin, serve stale while revalidating at CDN, stagger TTLs (jitter), and stop purging everything.
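TTL jitter is simple enough to show in a few lines. The sketch below is an assumption-laden illustration: jittered_ttl is a hypothetical name, it relies on the shell's $RANDOM (bash and similar shells), and how you feed the result into your cache layer depends on your setup.

```shell
# Return a TTL within +/- pct percent of base, so hot pages do not expire in lockstep.
# Uses $RANDOM, which is available in bash; base is in seconds, pct is an integer.
jittered_ttl() {
  base="$1"
  pct="$2"
  spread=$(( base * pct / 100 ))
  # Pick an offset uniformly from [-spread, +spread].
  offset=$(( RANDOM % (2 * spread + 1) - spread ))
  echo $(( base + offset ))
}

# A 300-second TTL with 10% jitter lands somewhere between 270 and 330 seconds.
jittered_ttl 300 10
```

Jitter only spreads expiries out; it does not replace cache locking, because a single very hot URL can still stampede on its own.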

3) WooCommerce pages randomly show wrong content

Symptoms: Cart contents leak, “Hello Bob” appears for Alice, prices vary unexpectedly.

Root cause: Full-page caching applied to personalized/transactional pages; cache key ignores session state.

Fix: Bypass cache on cart/checkout/my-account, bypass on WooCommerce cookies, and never cache responses that set session cookies.

4) Admin is slow while public site is fast

Symptoms: Editors complain; wp-admin timeouts; public pages are fine.

Root cause: Admin bypasses full-page cache and hits heavy queries (postmeta searches, autoload bloat) and slow PHP execution.

Fix: Add Redis object cache, reduce autoloaded options, fix slow queries, tune PHP-FPM for logged-in concurrency, and consider separating admin traffic if needed.

5) After enabling Redis, performance gets worse

Symptoms: Higher latency, timeouts, Redis CPU high, evictions.

Root cause: Redis under-provisioned; key churn and eviction; remote Redis latency; using Redis for giant transient blobs.

Fix: Increase memory, choose sane eviction policy, keep Redis near the app, audit plugins storing large values, and monitor evictions and latency.

6) “Just add more PHP workers” causes swapping and collapse

Symptoms: Load average climbs; iowait spikes; everything slows; kernel logs show memory pressure.

Root cause: Worker count set beyond RAM capacity; per-worker RSS underestimated; OPcache or PHP memory leaks; no headroom.

Fix: Measure RSS, set max_children based on memory, add headroom, and reduce dynamic traffic via caching. Scaling wrong is worse than not scaling.

7) Purge runs cause brief outages

Symptoms: Right after deploy/publish, origin spikes; CDN goes MISS; user-facing latency spikes.

Root cause: Purging broad patterns (everything); no warmup; no cache lock; TTL too short.

Fix: Purge only changed URLs, rate limit purge, use cache lock, optionally prewarm critical pages, and allow stale on the edge.

Three corporate mini-stories from the trenches

Mini-story 1: The incident caused by a wrong assumption

A mid-sized company ran WordPress as the marketing front door for a product that had grown up fast. They had a CDN, a managed database, and a
container platform. It looked modern. The performance was “fine” until it wasn’t.

The wrong assumption: “If we set Cache-Control: public, the CDN will cache HTML and we’re done.” In reality, a consent plugin set a
cookie on the first pageview for every user, and their CDN configuration varied cache on all cookies by default. So every user got their own private
cache entry. Hit rate cratered without anyone noticing, because the dashboard’s “requests served” still looked impressive.

Then they launched a campaign. The homepage was suddenly a PHP endpoint again. PHP-FPM maxed out, MySQL connections spiked, and the health checks
started flapping. The CDN didn’t protect them because it wasn’t caching the thing they thought it was caching.

The fix was boring and effective: bypass cache only on specific WordPress cookies, ignore the consent cookie in the cache key, and stop setting
marketing cookies on HTML responses when not needed. They also added an origin FastCGI cache as a backstop.

The lesson: in performance work, the enemy isn’t complexity. It’s implicit behavior. If you can’t explain your cache key in one sentence,
you don’t have a cache—you have a rumor.

Mini-story 2: The optimization that backfired

Another team wanted faster “freshness” for editors. They reduced cache TTLs on everything to 30 seconds and added aggressive purge-on-publish for
good measure. The public site felt snappier for content updates. Editors were happy. For about a week.

The backfire showed up during normal weekday traffic. Every 30 seconds, popular pages expired at roughly the same time. The CDN and origin cache
experienced synchronized misses. Suddenly, thousands of requests per minute were rebuilding identical pages by running PHP and hitting MySQL. It was
a textbook stampede, except it was self-inflicted.

They tried to “fix” it by increasing PHP-FPM workers. That pushed the servers into memory pressure and intermittent swapping. Latency got worse and
less predictable. The monitoring graphs looked like modern art: spiky, colorful, and not suitable for decision-making.

The real fix: longer origin TTLs, cache locking, edge stale-while-revalidate, and targeted purge for changed URLs only. They added jitter to TTLs for
a few hot routes so expiries didn’t align. Freshness remained acceptable because purges were correct, not because TTLs were tiny.

The lesson: lowering TTLs is not “more real-time.” It’s “more load.” If you need freshness, build invalidation you can trust.

Mini-story 3: The boring practice that saved the day

A large enterprise ran multiple WordPress properties behind a shared edge. The team was not flashy. They had runbooks, tested restores, and a habit
of measuring before changing anything. Their biggest “innovation” was writing down what cookies meant.

During a major product announcement, traffic surged and then surged again when a news aggregator picked it up. What should have been a fire drill
turned into a mildly tense hour of watching dashboards.

The reason: they had a strict caching policy for anonymous traffic, with explicit bypass rules for auth/cart cookies, and they logged cache status on
every request. When the edge started seeing more MISS due to regional propagation lag, the origin FastCGI cache was already warm and lock-protected.
PHP utilization rose slightly but never spiked into queueing territory.

A bot did start hammering search queries, which were not cached. Their rate limit rules caught it, returning 429s without punishing real users.
Editors continued publishing without purging the entire site because their purge integration was narrow and rate-limited.

The lesson: boring practices—documented cache keys, cache status logging, and rate limiting—are not bureaucracy. They’re how you avoid explaining an
outage to people who don’t want to learn what a cookie is.

Checklists / step-by-step plan

Phase 1: Make anonymous traffic cheap (1–2 days)

  1. Define “anonymous” precisely: no wordpress_logged_in_*, no WooCommerce cart cookies, no post password cookies.
  2. Enable CDN caching for public HTML with a controlled cache key; bypass only on known auth/cart cookies.
  3. Ensure origin does not set cookies on public pages. Fix the plugin doing it or configure it to avoid anonymous cookies.
  4. Deploy origin full-page cache (Nginx FastCGI) with cache lock enabled and clear bypass rules.
  5. Log cache status at CDN (if possible) and at origin. If you can’t measure hit rate, you’re guessing.

Phase 2: Stabilize the dynamic path (2–5 days)

  1. Enable OPcache and confirm it’s properly sized.
  2. Tune PHP-FPM max children based on measured RSS and available RAM.
  3. Deploy Redis object cache if logged-in traffic and admin are significant. Set maxmemory and monitor evictions.
  4. Turn on slow logs for PHP-FPM and MySQL. Identify top offenders before “optimizing.”
  5. Fix autoload bloat in wp_options. This often yields outsized gains.

Phase 3: Make invalidation safe (ongoing)

  1. Implement narrow purges on publish/update: changed page + homepage + relevant archives, not “purge all.”
  2. Rate limit purge calls and batch them for bulk operations.
  3. Enable stale serving at the edge for brief windows during revalidation or origin errors.
  4. Prewarm selectively (homepage and top landing pages) after deploys, not the entire sitemap.
  5. Continuously validate bypass rules when marketing adds new scripts or plugins add cookies.

FAQ

1) Do I really need both a CDN cache and an origin full-page cache?

Yes, if you care about surviving real traffic. The CDN reduces global latency and absorbs spikes, but cache misses happen (purges, cold edges,
geography, headers). Origin cache is your backstop so misses don’t automatically become PHP work.

2) Isn’t a WordPress caching plugin enough?

Plugins help with purge hooks and some integration, but the most robust caching happens at the HTTP layer (CDN + Nginx). Plugin file caching can be
fine on small sites, but it’s not the architecture you want to bet your incident response on.

3) How do I know if a cookie is killing my cache hit rate?

Inspect response headers for Set-Cookie on public pages and review your CDN cache key rules. Then correlate cache status (HIT/MISS) with
the presence of certain cookies in requests. If “anonymous” requests still carry dozens of cookies, expect fragmentation.

4) Should I cache search results?

Sometimes. If search is public and not personalized, short TTL caching (or microcaching) can help a lot. If search depends on user state or includes
per-user pricing, be careful. Alternatively, rate limit abusive query patterns and consider improving search implementation.

5) What about WooCommerce?

Cache product pages for anonymous users. Bypass cache for cart, checkout, and account pages, and for requests with cart/session cookies. Also watch
AJAX endpoints and fragments; those can generate surprising load.

6) Redis or Memcached for object cache?

Both can work. Redis tends to win on operational flexibility and tooling in many orgs. The key is not the brand; it’s memory sizing, eviction
policy, latency, and plugin discipline.

7) How long should my TTLs be?

Long enough to be useful, short enough to be correct. If you have reliable purges, TTL can be longer. If purges are flaky, TTL becomes your safety
net. For many sites, 5–30 minutes at origin for public pages is a sane starting point, with edge caching often longer.

8) Should I enable “cache everything” at the CDN?

Not blindly. You can cache public HTML aggressively, but you must implement explicit bypass rules for logged-in and transactional cookies. “Cache
everything” without cookie discipline is how you get security incidents disguised as performance wins.

9) My public pages are fast, but API calls are slow. What then?

Identify which endpoints are slow and whether they should be cached or rate-limited. For wp-json, decide what’s public and stable. For
admin AJAX, investigate callers and reduce chatter. A fast homepage doesn’t help if the rest of the app is a denial-of-service generator.

Next steps you can ship this week

  • Instrument cache status end-to-end: edge cache status, origin FastCGI cache status, upstream response time.
  • Fix the cache key: bypass only on WordPress auth/cart cookies; ignore the rest for anonymous HTML.
  • Deploy origin FastCGI cache with lock: stop stampedes from hitting PHP.
  • Make purges narrow: purge changed URLs and a small dependency set; rate limit bulk operations.
  • Tune PHP-FPM based on memory: measure RSS; set max_children safely; confirm no swapping.
  • Audit autoload options: reduce bloat; remove plugin-generated blobs from autoload.
  • Protect hot dynamic endpoints: rate limit abusive patterns; microcache only with evidence.

The outcome you want is not “a fast site in the lab.” It’s a site that stays fast when real people arrive with real browsers, real bots,
and real chaos. Layered caching, disciplined bypass rules, and boring invalidation get you there.
