WordPress 100% CPU usage: find the plugin/bot hammering your site

When WordPress pins a core at 100%, it doesn’t feel like “performance tuning.” It feels like your site is melting in public while you’re trying to explain to someone non-technical that, yes, the server is “up,” but no, it’s not actually working.

The good news: CPU spikes are usually diagnosable. The bad news: people tend to guess. They disable random plugins, restart PHP-FPM like it’s a ritual, and declare victory until the next spike. Let’s do this like we run production systems: measure, attribute, decide, then fix without breaking checkout.

Fast diagnosis playbook (do this first)

This is the “stop the bleeding and identify the attacker” path. Don’t optimize. Don’t refactor. Don’t argue with the graph. Follow the chain of custody from CPU to process to request to code.

1) Confirm it’s real CPU saturation (not just “load”)

  • Check CPU, run queue, steal time (VMs lie when neighbors are noisy).
  • Decide: if steal time is high, your app may be innocent; your hypervisor is not.

2) Identify which process family is burning CPU

  • Is it php-fpm workers? A single runaway process? mysqld? Something else?
  • Decide: if PHP is hot, next step is request attribution. If MySQL is hot, jump to slow queries.

3) Attribute CPU to a request path and client

  • Correlate PHP-FPM slowlog, Nginx/Apache access logs, and top IPs/URLs.
  • Decide: if one URL or one IP dominates, mitigate at the edge now (WAF/rate limit) while you debug deeper.

4) Find the WordPress entry point

  • Common culprits: wp-login.php, xmlrpc.php, /wp-json/, admin-ajax.php, WooCommerce endpoints, search, sitemap generation.
  • Decide: if it’s auth endpoints, treat as hostile traffic. If it’s admin-ajax, treat as plugin/theme behavior until proven bot.

5) Identify the plugin/theme or query

  • Use PHP-FPM slowlog stack traces and MySQL slow query log.
  • Decide: disable/replace the plugin, patch config, or cache aggressively—based on evidence, not vibes.

6) Apply a safe temporary cap

  • Rate-limit abusive paths, enable caching, and set sane PHP-FPM limits.
  • Decide: choose controlled degradation (429s on abusive endpoints) over total site death.

What “100% CPU” really means on WordPress

In WordPress land, “CPU at 100%” usually means PHP is doing too much work too often. It can also mean too many PHP workers are runnable at once because a plugin triggered expensive queries or remote calls. Or it means MySQL is on fire and PHP is just waiting, but your graphs are too coarse to tell.

Some reality checks:

  • One core pegged on a multi-core box can still be catastrophic: each PHP-FPM worker handles one request at a time, so if your hottest endpoint serializes work (locks, sessions, cache stampedes), everything queues behind that one core.
  • Load average is not CPU. Load can be I/O wait, runnable queue, or stuck processes. Check iowait and run queue.
  • “But we have caching” doesn’t mean you’re safe. Logged-in users, cart/checkout, admin-ajax, and personalized pages bypass most page caches by design.

Paraphrased idea from Werner Vogels (Amazon CTO): Everything fails eventually; design so you can detect, limit blast radius, and recover fast.

And yes, sometimes the cause is hilariously mundane: a cron job running every minute because someone copied a “performance” snippet from a forum.

Short joke #1: WordPress doesn’t “randomly” hit 100% CPU. It’s just very committed to the chaos you allowed.

Interesting facts & context (short, useful history)

  1. WordPress launched in 2003 as a fork of b2/cafelog; it inherited the “PHP renders everything on request” model that makes per-request cost matter.
  2. wp-cron isn’t real cron by default. It’s triggered by site traffic, which means “more visitors” can accidentally mean “more cron runs.”
  3. xmlrpc.php was a compatibility win (remote publishing, mobile clients), then became a favorite target for credential stuffing and pingback amplification.
  4. admin-ajax.php became a Swiss Army knife for plugins because it was convenient; it’s also an unintentional load generator when used for chatty front-end polling.
  5. PHP-FPM replaced mod_php as the common deployment pattern because it isolated pools and improved stability, but it also made “max_children” mistakes easier to make at scale.
  6. Object caching shifted bottlenecks: adding Redis/Memcached can reduce MySQL load, but it can also hide bad code until cache misses or evictions cause stampedes.
  7. WooCommerce changed the traffic shape: carts, sessions, AJAX fragments, and logged-in behavior invalidate full-page caching more than a blog ever did.
  8. HTTP/2 reduced connection overhead, but it can increase request concurrency, which makes expensive endpoints fail faster if you don’t rate-limit.
  9. “Headless” WP increased API usage: heavy /wp-json/ traffic can act like a low-grade DDoS if queries are unbounded or uncached.

Practical tasks: commands, outputs, decisions (the core toolkit)

You want repeatable tasks that move you from “CPU bad” to “this plugin and this endpoint, from these IPs, at this rate.” Below are tasks you can run on a typical Linux VPS or VM with Nginx/Apache + PHP-FPM + MySQL/MariaDB. Each includes: command, example output, what it means, and the decision to make.

Task 1: Confirm CPU saturation vs steal time

cr0x@server:~$ mpstat -P ALL 1 5
Linux 6.5.0 (wp-prod-01)  12/27/2025  _x86_64_  (4 CPU)

12:01:11 PM  CPU   %usr %nice  %sys %iowait  %irq %soft  %steal %idle
12:01:12 PM  all  92.10  0.00  6.40   0.20   0.00  0.80   0.00  0.50
12:01:12 PM    0  99.00  0.00  1.00   0.00   0.00  0.00   0.00  0.00
12:01:12 PM    1  88.00  0.00 11.00   1.00   0.00  0.00   0.00  0.00
12:01:12 PM    2  92.00  0.00  7.00   1.00   0.00  0.00   0.00  0.00
12:01:12 PM    3  89.00  0.00 10.00   1.00   0.00  0.00   0.00  0.00

Meaning: High %usr with near-zero %steal means your workload is actually burning CPU. Low %idle confirms saturation.

Decision: Keep digging on-app. If %steal were high (>5–10%), you’d escalate to the hosting layer or migrate.
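
If you want the run-queue check from step 1 in the same view, vmstat shows runnable tasks (the r column) next to the CPU split. The numbers below are illustrative for the same 4-core box:

cr0x@server:~$ vmstat 1 3
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
12  0      0 412340  88120 903440    0    0     5    62  880 2100 92  7  1  0  0
11  0      0 411980  88120 903460    0    0     0    48  910 2215 93  6  1  0  0

An r of 11–12 on 4 cores means roughly 8 tasks are waiting for a CPU at any instant. That queue is your latency.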

Task 2: Find which processes are consuming CPU right now

cr0x@server:~$ ps -eo pid,ppid,cmd,%cpu,%mem --sort=-%cpu | head -n 12
  PID  PPID CMD                         %CPU %MEM
19421 19310 php-fpm: pool www           88.4  2.1
19455 19310 php-fpm: pool www           72.9  2.0
19470 19310 php-fpm: pool www           65.1  2.1
19310     1 php-fpm: master process      2.1  0.4
 2214     1 nginx: worker process        1.3  0.3
 1870     1 mysqld                       0.9 10.5

Meaning: PHP-FPM workers are the burners, not Nginx. MySQL is not the main hot spot (yet).

Decision: Switch to PHP request attribution: slowlog, access logs, and upstream timings.

Task 3: Check PHP-FPM pool saturation (are you queueing?)

cr0x@server:~$ sudo ss -s
Total: 817 (kernel 0)
TCP:   613 (estab 402, closed 162, orphaned 0, synrecv 0, timewait 162/0), ports 0

Transport Total     IP        IPv6
RAW       0         0         0
UDP       6         4         2
TCP       451       380       71
INET      457       384       73
FRAG      0         0         0

Meaning: Lots of established TCP sessions can mean normal traffic… or slow responses stacking up.

Decision: Check Nginx upstream timing and PHP-FPM status next.

Task 4: Enable and read PHP-FPM status (if available)

cr0x@server:~$ curl -s http://127.0.0.1/fpm-status | head
pool:                 www
process manager:      dynamic
start time:           27/Dec/2025:11:02:13 +0000
start since:          3540
accepted conn:        918245
listen queue:         37
max listen queue:     221
listen queue len:     128

Meaning: A non-zero listen queue means requests are waiting for a free PHP worker. That’s user-visible latency.

Decision: You can temporarily raise pm.max_children if RAM allows, but first find why workers are slow (expensive code, remote calls, DB). Scaling bad work just makes more bad work.
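
If fpm-status isn’t wired up yet, it takes one pool directive and one Nginx location. A minimal sketch, assuming the same Debian-style PHP 8.2 paths used elsewhere in this article:

# /etc/php/8.2/fpm/pool.d/www.conf
pm.status_path = /fpm-status

# Nginx server block: expose it to localhost only
location = /fpm-status {
    allow 127.0.0.1;
    deny all;
    include fastcgi_params;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    fastcgi_pass unix:/run/php/php8.2-fpm.sock;
}

Reload both services afterward (systemctl reload php8.2-fpm nginx). PHP-FPM answers the status path internally, so no script file needs to exist on disk.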

Task 5: Check Nginx top URLs by request rate

cr0x@server:~$ sudo awk '{print $7}' /var/log/nginx/access.log | sort | uniq -c | sort -nr | head
  9412 /wp-admin/admin-ajax.php
  3120 /wp-login.php
  1788 /wp-json/wp/v2/posts?per_page=100
   904 /?s=shoes
   611 /xmlrpc.php

Meaning: admin-ajax.php dominates. That’s rarely “normal browsing.” It’s usually a plugin feature, a theme script, or a bot poking at endpoints.

Decision: Identify which action= is hot, then map it to a plugin.

Task 6: Break down admin-ajax by action parameter

cr0x@server:~$ sudo grep "admin-ajax.php" /var/log/nginx/access.log | awk -F'action=' '{print $2}' | awk '{print $1}' | cut -d'&' -f1 | sort | uniq -c | sort -nr | head
  8122 wc_fragment_refresh
   901 elementor_ajax
   214 wpforms_submit
   143 heartbeat

Meaning: wc_fragment_refresh is the classic WooCommerce cart fragments refresh call. It can be legitimate, but it’s also famously chatty and cache-hostile. One caveat: actions sent only in POST bodies never show up in the query string, so an empty result here doesn’t exonerate admin-ajax.

Decision: If traffic is real shoppers, you optimize the Woo path and caching rules. If it’s bots, you rate-limit/deny based on behavior.
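
If you conclude the fragments traffic isn’t worth its cost, one common mitigation is dequeuing the script outside cart and checkout. A minimal mu-plugin sketch — the file name is arbitrary, wc-cart-fragments is WooCommerce’s script handle, and you should test on staging because mini-cart widgets that depend on fragments will stop live-updating:

cr0x@server:~$ sudo -u www-data tee /var/www/html/wp-content/mu-plugins/calm-fragments.php >/dev/null <<'PHP'
<?php
// Sketch: dequeue WooCommerce cart fragments everywhere except cart/checkout.
add_action( 'wp_enqueue_scripts', function () {
    if ( function_exists( 'is_cart' ) && ! is_cart() && ! is_checkout() ) {
        wp_dequeue_script( 'wc-cart-fragments' );
    }
}, 20 );
PHP

Create wp-content/mu-plugins first if it doesn’t exist. Mu-plugins load unconditionally, which is exactly what you want for an incident toggle: no activation step, one file to delete to roll back.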

Task 7: Identify top client IPs hammering the endpoint

cr0x@server:~$ sudo grep "admin-ajax.php" /var/log/nginx/access.log | awk '{print $1}' | sort | uniq -c | sort -nr | head
  3022 203.0.113.50
  1877 198.51.100.77
   944 192.0.2.19
   611 10.0.0.12

Meaning: One IP generating thousands of requests is suspicious unless it’s your own monitoring, a load test, or a known reverse proxy.

Decision: If these are not your proxies, block or rate-limit immediately at the edge or firewall. Then keep diagnosing root cause for the rest of the traffic.
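
For an immediate single-IP block on the host itself, plain iptables works. Just remember that behind a CDN or reverse proxy, $1 in your logs may be the proxy, not the client:

cr0x@server:~$ sudo iptables -I INPUT -s 203.0.113.50 -j DROP

This is a tourniquet, not a fix. Prefer rate limits at the edge for anything beyond a handful of IPs, and write down what you blocked so it doesn’t become permanent folklore.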

Task 8: Confirm whether requests are slow in Nginx upstream timing

cr0x@server:~$ sudo awk '{print $(NF-1),$7,$1}' /var/log/nginx/access.log | head -n 5
0.842 /wp-admin/admin-ajax.php 203.0.113.50
1.102 /wp-admin/admin-ajax.php 203.0.113.50
0.019 /wp-login.php 198.51.100.77
0.611 /wp-admin/admin-ajax.php 192.0.2.19
0.955 /wp-json/wp/v2/posts?per_page=100 198.51.100.77

Meaning: If you’ve configured a log format that includes upstream response time, you can see whether the backend is slow. Values near 1s for admin-ajax at scale will cook CPU.

Decision: Slow backend means you need PHP slowlog + WP profiling, not just blocking.
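
If your current format lacks timings, here’s a minimal sketch that extends the default combined format so the awk above still works ($7 stays the URL, $(NF-1) becomes upstream time). The format name wp_timing is an arbitrary choice:

# nginx.conf, http {} context
log_format wp_timing '$remote_addr - $remote_user [$time_local] "$request" '
                     '$status $body_bytes_sent "$http_referer" "$http_user_agent" '
                     '$upstream_response_time $request_time';
access_log /var/log/nginx/access.log wp_timing;

$upstream_response_time is what PHP-FPM took; $request_time includes the client. A big gap between the two points at slow clients or buffering, not your code.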

Task 9: Enable PHP-FPM slowlog and catch stack traces

cr0x@server:~$ sudo grep -nE "request_slowlog_timeout|slowlog" /etc/php/8.2/fpm/pool.d/www.conf
308:request_slowlog_timeout = 5s
309:slowlog = /var/log/php-fpm/www-slow.log

Meaning: Requests taking longer than 5 seconds will dump a backtrace to the slowlog file.

Decision: Turn it on during an incident window. If you can’t afford logging overhead, set it higher (10–15s) and keep it temporary.

cr0x@server:~$ sudo tail -n 25 /var/log/php-fpm/www-slow.log
[27-Dec-2025 12:03:41]  [pool www] pid 19455
script_filename = /var/www/html/wp-admin/admin-ajax.php
[0x00007f2a1c8b2a40] mysqli_query() /var/www/html/wp-includes/wp-db.php:2345
[0x00007f2a1c8b28b0] _do_query() /var/www/html/wp-includes/wp-db.php:2263
[0x00007f2a1c8b27f0] query() /var/www/html/wp-includes/wp-db.php:3307
[0x00007f2a1c8b2460] get_results() /var/www/html/wp-includes/wp-db.php:3650
[0x00007f2a1c8b1f90] wc_get_products() /var/www/html/wp-content/plugins/woocommerce/includes/wc-product-functions.php:1201
[0x00007f2a1c8b1c00] my_custom_fragments() /var/www/html/wp-content/plugins/some-fragments/plugin.php:88

Meaning: You now have a path from a hot endpoint to a specific plugin file and function. That’s gold.

Decision: Disable/replace the plugin or adjust its settings. If it’s your custom code, fix it. If it’s WooCommerce core behavior, consider fragment caching tweaks or reduce calls.

Task 10: Use WP-CLI to list plugins and their status

cr0x@server:~$ cd /var/www/html && sudo -u www-data wp plugin list --status=active
+---------------------+----------+-----------+---------+
| name                | status   | update    | version |
+---------------------+----------+-----------+---------+
| woocommerce         | active   | available | 8.9.2   |
| elementor           | active   | none      | 3.24.0  |
| some-fragments      | active   | none      | 1.6.1   |
| redis-cache         | active   | none      | 2.5.3   |
+---------------------+----------+-----------+---------+

Meaning: Confirms the suspect plugin is active and identifies update state (sometimes you’re running a known-bad version).

Decision: If an update exists and changelog indicates performance/security fixes, schedule it. If you need immediate relief, deactivate the offender off-peak or on a canary.

Task 11: Temporarily disable a suspected plugin (controlled test)

cr0x@server:~$ cd /var/www/html && sudo -u www-data wp plugin deactivate some-fragments
Plugin 'some-fragments' deactivated.

Meaning: Removes the code path from production without editing files directly.

Decision: Watch CPU and request rate. If CPU drops and errors don’t spike, you found your culprit. If errors spike, roll back and try a narrower mitigation (rate limit, caching, feature toggle).
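
A low-tech way to watch the effect while the plugin is off, assuming nothing beyond procps:

cr0x@server:~$ watch -n 5 "ps -eo %cpu,cmd --sort=-%cpu | head -n 6"

Give it a few minutes before declaring anything. Opcache warmup and page-cache refill can make the first minute look worse than the steady state.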

Task 12: Check MySQL for slow queries (is DB the real bottleneck?)

cr0x@server:~$ sudo mysql -e "SHOW FULL PROCESSLIST\G" | head -n 30
*************************** 1. row ***************************
     Id: 12891
   User: wpuser
   Host: 127.0.0.1:51862
     db: wordpress
Command: Query
   Time: 9
  State: Sending data
   Info: SELECT SQL_CALC_FOUND_ROWS wp_posts.ID FROM wp_posts WHERE 1=1 AND wp_posts.post_type IN ('product') AND (wp_posts.post_status = 'publish') ORDER BY wp_posts.post_date DESC LIMIT 0, 48

Meaning: Queries taking 9 seconds are not “fine.” “Sending data” often implies large scans, poor indexes, or huge result sets.

Decision: Enable slow query log, inspect query patterns, and fix indexes or reduce query cost (pagination, constraints, caching). Consider limiting expensive API queries.

Task 13: Turn on MySQL slow query log (temporary) and inspect

cr0x@server:~$ sudo mysql -e "SET GLOBAL slow_query_log = 'ON'; SET GLOBAL long_query_time = 1;"
cr0x@server:~$ sudo tail -n 20 /var/log/mysql/slow.log
# Time: 2025-12-27T12:06:01.123456Z
# User@Host: wpuser[wpuser] @ localhost []
# Query_time: 2.941  Lock_time: 0.001 Rows_sent: 48  Rows_examined: 504812
SET timestamp=1766837161;
SELECT SQL_CALC_FOUND_ROWS wp_posts.ID FROM wp_posts WHERE 1=1 AND wp_posts.post_type IN ('product') ORDER BY wp_posts.post_date DESC LIMIT 0, 48;

Meaning: Rows_examined in the hundreds of thousands for a tiny page result is classic “you’re scanning the ocean to find a fish.”

Decision: Identify which page/API triggers it, then consider query rewrite via plugin settings, adding indexes (carefully), or caching at object/page level.
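
Before touching indexes, ask the optimizer what that query actually does. A sketch against the slow-log query above (output illustrative and trimmed; your plan will differ):

cr0x@server:~$ sudo mysql wordpress -e "EXPLAIN SELECT wp_posts.ID FROM wp_posts WHERE wp_posts.post_type IN ('product') ORDER BY wp_posts.post_date DESC LIMIT 0, 48\G"
*************************** 1. row ***************************
  select_type: SIMPLE
        table: wp_posts
         type: ref
possible_keys: type_status_date
          key: type_status_date
         rows: 504812
        Extra: Using where; Using filesort

A filesort over half a million rows to return 48 results is the CPU bill right there. Also note the SQL_CALC_FOUND_ROWS in the original query: it forces MySQL to count every match. If the plugin builds it through WP_Query, enabling no_found_rows (or fixing pagination) often helps more than any new index.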

Task 14: Check disk I/O wait (CPU might be “busy waiting” elsewhere)

cr0x@server:~$ iostat -x 1 3
Device            r/s   w/s  rkB/s  wkB/s  await  %util
nvme0n1          2.1  18.2   44.0  512.0   1.20  22.5

Meaning: Low await and moderate %util means storage is probably not the limiter. If await is high (tens/hundreds ms) and %util is pegged, your “CPU issue” is actually I/O pressure.

Decision: If storage is the bottleneck, fix I/O (optimize DB, move to faster disk, reduce logging, tune buffer pool) before messing with PHP.

Task 15: Catch a live PHP worker backtrace with perf (when you’re desperate)

cr0x@server:~$ sudo perf top -p 19421
Samples: 2K of event 'cpu-clock', 4000 Hz, Event count (approx.): 510000000
Overhead  Shared Object        Symbol
  18.20%  php-fpm8.2           zend_execute_ex
  11.35%  php-fpm8.2           zif_preg_match
   8.74%  libpcre2-8.so.0.11.2 pcre2_match_8
   6.10%  php-fpm8.2           zim_spl_autoload_call

Meaning: Heavy regex (preg_match) inside PHP can be plugin behavior (filtering, content parsing, security scanning) or a theme doing “clever” stuff per request.

Decision: Combine with slowlog to pinpoint which plugin calls it. If a security plugin is scanning content on every request, it might be time for a calmer product.

Is it a plugin, a bot, or the platform?

Most CPU incidents fall into one of three buckets:

  • Hostile traffic: credential stuffing on wp-login.php, XML-RPC abuse, brute force on REST endpoints, aggressive scraping, or “SEO tools” that behave like locusts.
  • Plugin/theme behavior: admin-ajax polling, expensive product queries, image processing, analytics beacons, page builders doing server-side rendering or dynamic CSS generation.
  • Platform constraints: too few CPU cores, too little RAM (swap thrash), noisy neighbor steal time, mis-tuned PHP-FPM, MySQL under-provisioned.

The key move is to attribute by request path and by client. A plugin problem shows up as many different IPs hitting the same endpoint. A bot problem often shows a small set of IPs or ASNs, weird user agents, or high request rates with low session continuity.

The two-minute split test

If the hot endpoint is wp-login.php or xmlrpc.php, assume hostile until proven otherwise. If the hot endpoint is admin-ajax.php with a consistent action, assume plugin/theme until proven otherwise. If it’s /wp-json/ with per_page=100 or unbounded queries, assume “API design problem.”

Short joke #2: If your “marketing automation” hits admin-ajax.php 50 times a second, it’s not automation; it’s a tiny denial-of-service with a budget code.

Three corporate-world mini-stories (things that actually happen)

Mini-story 1: The incident caused by a wrong assumption

A mid-sized company ran WordPress as the public face of a product suite. Their SRE rotation treated it as “static-ish content,” so they put a CDN in front, enabled page caching, and moved on. Everyone slept better. Then a Tuesday morning arrived with a familiar smell: graphs that look like cliffs.

CPU was pegged, PHP-FPM listen queue was climbing, and the homepage was slow. The first assumption was the classic one: “The CDN must be down.” It wasn’t. Cache hit ratio was excellent. That’s what made the incident confusing: the public pages were cached, so why was PHP dying?

The answer lived in access logs. A botnet was hammering /wp-json/wp/v2/users and /wp-json/wp/v2/posts with large page sizes and search parameters. The CDN happily forwarded them because the requests looked like “normal GETs.” The team had assumed “GET is safe” and “CDN == shield.” Neither is true.

The fix was boring and effective: rate-limit REST endpoints at the edge, add stricter caching headers where safe, and block user enumeration endpoints. They also tightened the API to cap per_page and disabled unnecessary routes. CPU dropped instantly, and the lesson stuck: the attack surface includes every dynamic read endpoint, not just login forms.

Mini-story 2: The optimization that backfired

Another organization had recurring CPU spikes during campaign launches. Someone proposed a “simple win”: increase PHP-FPM pm.max_children and MySQL connections so the server can “handle more concurrency.” It sounded reasonable. It made the graphs look better for about an hour.

Then the site got slower. Not just slower—erratic. Some requests returned quickly; others took ages. CPU remained high, load average climbed, and MySQL started showing long query times. Eventually the box began swapping, and the kernel started killing processes. The team had successfully scaled their bottleneck into a full-system collapse.

Root cause: they increased PHP concurrency without reducing per-request cost. More workers meant more simultaneous expensive queries, which overwhelmed the database buffer pool. The added memory pressure pushed the system into swap, turning “CPU problem” into “everything problem.”

The real fix was less glamorous: cap PHP workers to fit RAM, enable PHP-FPM slowlog, identify the plugin generating unindexed product queries, and change its behavior. After that, they could raise concurrency carefully. They learned the hard way that “more workers” is not optimization; it’s a multiplier.

Mini-story 3: The boring but correct practice that saved the day

A global brand had a WordPress instance supporting a big press event. They’d been burned before, so they ran a dull regimen: weekly plugin inventory, staged updates, baseline performance tests, and a standard “traffic spike” runbook. No heroics, just habits.

On event day, CPU rose as expected. Then it rose more. They didn’t panic. The runbook started with: identify hot endpoint, top IPs, PHP-FPM queue, MySQL slow queries. Within minutes they found a surge of requests to a search endpoint that bypassed page cache. It was real user traffic, not bots.

Because they had pre-built mitigations, they flipped on a cached search results layer (short TTL), temporarily reduced search complexity (fewer fields), and set a sane rate limit for abusive patterns. The site stayed up. No one noticed the trade-off except the people watching graphs.

The saving practice wasn’t a fancy tool. It was having instrumentation and a predictable workflow before the crisis. “Boring” is what you call reliability when it works.

Common WordPress CPU hotspots (what’s usually guilty)

1) wp-login.php and credential stuffing

High CPU plus lots of POSTs to wp-login.php is nearly always brute force or credential stuffing. Even failed logins cost CPU: password hashing, session handling, and plugin hooks that run on auth attempts.

What to do: enforce rate limits, add bot challenges at the edge, disable user enumeration, and consider moving admin behind VPN or IP allowlists if the org can tolerate it.

2) xmlrpc.php (pingback + brute force)

XML-RPC can be abused both for login attempts and for pingback amplification. Many sites don’t need it. Disabling it often reduces attack surface dramatically.
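
A minimal Nginx sketch for turning it off, assuming you’ve confirmed nothing (Jetpack, legacy mobile apps) still needs XML-RPC:

location = /xmlrpc.php {
    # 444 is Nginx-specific: close the connection without sending a response.
    return 444;
}

Either this or deny all keeps the request away from PHP entirely; 444 just skips giving bots anything to parse.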

3) admin-ajax.php (polling and fragments)

Admin AJAX is the #1 “plugin did it” CPU source. Front-end scripts poll for updates (chat widgets, page builders, analytics), WooCommerce refreshes fragments, and some plugins use admin-ajax as their API because it’s there.

Tell: request rate is huge, endpoint is constant, many IPs, and slowlog shows plugin functions.

4) REST API endpoints with unbounded queries

REST endpoints can be hammered by scrapers or by your own front-end if you’ve gone headless. If you allow big per_page, expensive filters, or deep meta queries, you’ve built a CPU shredder with a nice interface.
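
A minimal edge sketch for both problems. The zone name and rates are illustrative, the users block assumes you don’t need public user listings, and the try_files line assumes a standard WordPress permalink setup:

# http {} context
limit_req_zone $binary_remote_addr zone=wp_api:10m rate=60r/m;

# server {} context: block user enumeration outright
location ~ ^/wp-json/wp/v2/users {
    return 403;
}

# throttle the rest of the REST API per client IP
location /wp-json/ {
    limit_req zone=wp_api burst=20 nodelay;
    try_files $uri $uri/ /index.php?$args;
}

Capping per_page belongs in the application layer, but the rate limit buys you time either way.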

5) Search and filtering (LIKE queries, meta queries)

WordPress search is famously expensive on large datasets, especially with WooCommerce and custom fields. Meta queries on unindexed meta tables are a slow-motion disaster.

6) wp-cron storms

Traffic-triggered cron means spikes can cause more cron runs, which cause more CPU, which slows the site, which increases overlap. It’s a feedback loop with a polite name.

7) “Security” plugins doing per-request scanning

Some security plugins inspect every request deeply (regex, file checks, IP reputation calls). They can be useful, but they can also become your biggest CPU consumer. Measure them like any other plugin.

Mitigations that work (and ones that bite later)

Block and rate-limit at the edge first

If hostile traffic is part of the story, do not “fix it in PHP.” PHP is the wrong layer for volumetric abuse. Use a WAF/CDN, Nginx rate limiting, or firewall rules.

Nginx rate limiting examples

If you control Nginx, you can throttle abusive endpoints. The goal isn’t to punish legitimate users. It’s to prevent a single actor from consuming all workers.

cr0x@server:~$ sudo nginx -T | grep -n "limit_req_zone" | head
53:limit_req_zone $binary_remote_addr zone=wp_login:10m rate=5r/m;
54:limit_req_zone $binary_remote_addr zone=wp_ajax:10m rate=30r/m;

Meaning: Defines per-IP request rates. Login should be slow; AJAX can be higher but not infinite.

Decision: Apply these to the right locations, then reload Nginx and watch 429 rates.
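
One way to watch those 429s, assuming the default combined log format (status is field $9); counts below are illustrative:

cr0x@server:~$ sudo awk '$9 == 429 {print $7}' /var/log/nginx/access.log | sort | uniq -c | sort -nr | head
   212 /wp-login.php
    18 /wp-admin/admin-ajax.php

429s piling up on wp-login.php means the limit is working. 429s on shopper-facing AJAX means the zone is too tight: loosen burst before touching rate.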

cr0x@server:~$ sudo grep -n -A 4 "location = /wp-login.php" /etc/nginx/sites-enabled/default
121:location = /wp-login.php {
122-    limit_req zone=wp_login burst=10 nodelay;
123-    include snippets/fastcgi-php.conf;
124-    fastcgi_pass unix:/run/php/php8.2-fpm.sock;
125-}

Meaning: Login attempts are throttled. Bursts allow short spikes; sustained abuse gets 429.

Decision: If you see user complaints, loosen burst slightly but keep rate low. Brute force is patient.

Make wp-cron predictable

Disable traffic-triggered WP-Cron and run it via real cron. This is one of those changes that reduces “mystery spikes.”

cr0x@server:~$ sudo -u www-data wp config set DISABLE_WP_CRON true --raw
Success: Updated the constant 'DISABLE_WP_CRON' in the 'wp-config.php' file.

Meaning: WordPress stops spawning cron on page loads.

Decision: Add a system cron entry to run wp-cron.php at a sane interval.

cr0x@server:~$ sudo crontab -u www-data -l | tail -n 3
*/5 * * * * cd /var/www/html && wp cron event run --due-now >/dev/null 2>&1

Meaning: Cron is now time-driven, not traffic-driven.

Decision: If you have high background workload (WooCommerce actions), adjust interval or use Action Scheduler runners more carefully.

Use caching like you mean it (but don’t pretend it covers everything)

Full-page cache is great for anonymous content. Object cache is great for repeated DB lookups. Neither magically makes WooCommerce checkout cheap. What they do is reduce repeated work so your CPU is spent on real user actions, not repeated queries for the same menu.

Be wary of “cache everything” plugins that add layers without observability. If you can’t measure hit rate and eviction patterns, you’re operating a rumor.
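
If you run the redis-cache plugin from the earlier task, the hit rate is one command away. The counters below are illustrative:

cr0x@server:~$ redis-cli info stats | grep -E "keyspace_hits|keyspace_misses"
keyspace_hits:1882341
keyspace_misses:90412

That’s roughly a 95% hit ratio, which is healthy. A ratio that drops during spikes suggests evictions or key churn: check maxmemory and the eviction policy before blaming PHP.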

Right-size PHP-FPM (do not set it to “infinite”)

CPU spikes often start as slow requests, then become queueing. If you set pm.max_children too low, you queue. Too high, you run out of RAM and thrash. There’s a sweet spot, and it depends on average PHP worker memory usage.

cr0x@server:~$ ps --no-headers -o rss -C php-fpm8.2 | awk '{sum+=$1; n++} END{print "avg_rss_kb=" int(sum/n) ", workers=" n}'
avg_rss_kb=89234, workers=24

Meaning: Average worker RSS is ~87MB. Multiply by max children and add overhead for MySQL, OS cache, and Nginx.

Decision: If you have 2GB RAM, 24 workers at 87MB is already ~2.1GB, which is not possible without swapping. Cap it lower, then reduce per-request memory usage.
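
A back-of-envelope sketch for the cap, using the measured ~87MB per worker. The 0.8 headroom factor is an assumption, and $7 is the “available” column in modern free output:

cr0x@server:~$ free -m | awk '/^Mem:/ { printf "suggested pm.max_children ≈ %d\n", ($7 * 0.8) / 87 }'
suggested pm.max_children ≈ 11

Leave real headroom for MySQL’s buffer pool and the OS page cache. A WordPress box with no spare memory for caching just moves the pain to the database.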

Common mistakes: symptom → root cause → fix

This is the section where you recognize your own incident. It’s fine. We’ve all been there. The difference is whether you write it down and stop repeating it.

1) Symptom: CPU 100%, “load average” huge; site times out

Root cause: PHP-FPM listen queue growing because requests are slow, not because you need more workers.

Fix: Enable PHP-FPM slowlog and attribute slow paths; block/rate-limit abusive endpoints; reduce per-request work; then tune max children based on RAM.

2) Symptom: CPU high after enabling a cache plugin

Root cause: Cache warmup or cache preloading hitting every URL; or cache misses causing stampedes under concurrency.

Fix: Disable aggressive warmup in production; add cache locking if supported; lower concurrency of warmers; ensure object cache is correctly configured.

3) Symptom: spikes every few minutes like clockwork

Root cause: WP-Cron or Action Scheduler running heavy tasks, triggered by traffic or misconfigured schedule.

Fix: Disable WP-Cron and run real cron; inspect scheduled events; fix tasks that do too much per run.

4) Symptom: lots of admin-ajax requests; CPU melts under “normal” traffic

Root cause: WooCommerce fragments refresh, page builder editor assets, or front-end polling (heartbeat, chat) generating high-frequency uncached calls.

Fix: Reduce frequency; disable fragments where safe; cache responses; ensure these endpoints aren’t hit for anonymous users unnecessarily.

5) Symptom: MySQL CPU high; PHP moderate; queries show “Sending data”

Root cause: unindexed queries (meta queries, LIKE searches), large tables, or expensive sorts.

Fix: enable slow query log; identify query source; add indexes carefully; change plugin settings to avoid worst queries; add object caching.

6) Symptom: CPU high only during crawls; pages are mostly cacheable

Root cause: bots bypassing cache via query strings, headers, or hitting uncached endpoints like sitemaps and feeds.

Fix: cache sitemaps; set caching rules for common bot patterns; rate-limit; block abusive user agents that ignore robots.

7) Symptom: random 502/504 errors appear when CPU spikes

Root cause: upstream timeouts (Nginx/Apache waiting for PHP-FPM), insufficient workers, or workers stuck on slow DB/remote calls.

Fix: correlate error logs with PHP-FPM slowlog; increase timeouts only after you reduce request duration; avoid masking slow backend by increasing timeouts forever.

Checklists / step-by-step plan

Step-by-step: find the hammering bot or plugin in 30–60 minutes

  1. Verify CPU is real: check mpstat for steal and iowait.
  2. Identify hottest process: ps sorted by CPU. Usually PHP-FPM.
  3. Check PHP queue: FPM status page; confirm queue growth.
  4. Find top endpoints: parse access logs for top URL paths.
  5. Find top clients: top IPs for hot endpoints. Decide block/rate-limit now.
  6. Enable PHP-FPM slowlog (short window). Capture backtraces.
  7. Map slowlog to plugin/theme: file paths under wp-content/plugins or theme directories.
  8. Confirm with a controlled test: temporarily deactivate plugin via WP-CLI or feature flag; watch CPU and latency.
  9. If DB is suspicious: check processlist and slow query log; map queries to endpoints.
  10. Apply durable fix: plugin replacement/config change, caching, rate limits, cron changes, query optimization.
  11. Write a tiny runbook note: what endpoint, what plugin, what mitigation, what metric confirms health.

Checklist: safe mitigations you can apply during an incident

  • Rate-limit wp-login.php, xmlrpc.php, and abusive /wp-json/ routes.
  • Block obvious bad IPs (but prefer rate limits over whack-a-mole blocks).
  • Enable PHP-FPM slowlog temporarily; collect evidence.
  • Scale vertically only if steal time is low and you have evidence you’re CPU-limited (not I/O or DB locked).
  • Cap PHP-FPM workers to avoid swap death.
  • Disable WP-Cron and run a real cron.
  • Disable the specific plugin causing the slowlog traces if the site can tolerate it.
  • Turn off expensive non-essential features (related products, live search, fancy filters) for the duration.

Checklist: changes to schedule after the incident

  • Keep a baseline: request rate, p95 latency, PHP-FPM queue, top endpoints.
  • Implement structured access logs including upstream time.
  • Keep slow query log available (even if normally off) and know how to enable it fast.
  • Establish a “plugin performance budget”: avoid plugins that do heavy work on every request.
  • Update cadence with staging and rollback plan.

FAQ

1) How do I know if it’s bots or real users?

Look for concentration by IP, user agent, and behavior. Bots often hit the same endpoint at high rates, ignore cookies, and have low diversity of pages. Real users browse.

2) Should I just add more CPU cores?

Only after you’ve attributed the work. Scaling can buy time, but it also increases your bill and can amplify DB bottlenecks. If one endpoint is abusive, block it first.

3) Is admin-ajax.php always bad?

No. It’s a legitimate mechanism. It becomes bad when it’s used as a high-frequency API for anonymous users or when responses trigger expensive DB queries.

4) What’s the fastest way to find the plugin causing load?

PHP-FPM slowlog stack traces are the fastest reliable method. Access logs tell you what is hot; slowlog tells you which code is slow.

5) My CPU is high but PHP-FPM isn’t. What then?

Check MySQL CPU and processlist. Also check I/O wait and swap usage. Sometimes it’s search indexing, image processing, backup jobs, or even another tenant on the same host.

6) Can caching fix WooCommerce CPU spikes?

Partially. You can cache anonymous pages and some fragments, but cart/checkout and logged-in behavior will stay dynamic. Focus on reducing admin-ajax chatter and expensive queries.

7) Should I disable xmlrpc.php?

If you don’t use it (most sites don’t), yes—disable or block it. If you must keep it, rate-limit it heavily and monitor login attempts.

8) Is Redis object cache always a win?

It’s a win when it reduces repeated DB reads and you have a stable cache. It can backfire if it masks inefficient code until eviction storms or if misconfiguration causes constant misses.

9) Why do CPU spikes correlate with cron?

Because WP-Cron can trigger on page loads, which creates feedback loops under traffic. Moving it to real cron turns surprise spikes into scheduled work.

10) What’s the single most common cause of WordPress 100% CPU?

High request volume to a dynamic endpoint that bypasses cache, combined with slow plugin code or slow DB queries. It’s rarely “WordPress core” alone.

Conclusion: next steps you can do today

If your WordPress site is pegging CPU, don’t treat it like a mystery. Treat it like an investigation with evidence.

  1. Run the fast diagnosis playbook: identify process → endpoint → client → code path.
  2. Turn on PHP-FPM slowlog for a short window and capture backtraces during the spike.
  3. Parse access logs to find top URLs and top IPs. Rate-limit abusive endpoints immediately.
  4. Map stack traces to plugins and disable/replace the offenders with controlled tests.
  5. Make cron predictable and right-size PHP-FPM to avoid queueing and swap thrash.
  6. Write down what you learned: the endpoint, the plugin, the mitigation, and the metric that proves it’s fixed. Future-you will be less angry.

CPU is not a moral failing. It’s a bill you’re paying in real time. Get the receipt: the request, the plugin, the query, the client. Then make it stop.
