Thumbnails break in WordPress the way “quick changes” break production: silently, at scale, right before someone important notices. You deploy a new theme, tweak image sizes, switch CDNs, or offload media to object storage, and suddenly half your product pages are stretching a 150×150 square into a tragic billboard.
The worst part: regenerating thumbnails sounds harmless until it turns into a CPU blender, a disk I/O stampede, or a cache-miss festival. This is the practical, SRE-grade way to regenerate thumbnails safely—without downtime, without praying, and without discovering at 2 a.m. that your servers are allergic to ImageMagick.
What’s actually broken (and what isn’t)
“Thumbnails are broken” is a symptom, not a diagnosis. WordPress image rendering is a pipeline: upload original → generate intermediate sizes → store metadata in the database → serve the right file via theme and srcset → optionally rewrite via CDN/offload plugin → optionally cache at multiple layers.
Common thumbnail failure modes
- Missing intermediate files: the resized files were never generated, were deleted, or live in a different storage backend now.
- Wrong paths/URLs: database points to one URL pattern, filesystem contains another, or a CDN rewrite is misconfigured.
- Metadata mismatch: the resized files exist, but wp_postmeta contains stale sizes or old dimensions.
- Permissions/ownership: WordPress can read originals but can’t write new sizes into wp-content/uploads.
- Image processing failures: ImageMagick/GD errors, memory limits, policy restrictions, or timeouts under load.
- Cache lies: thumbnails are fixed, but users still see old 404s or stale HTML with old srcset.
Regenerating thumbnails fixes only one class of problem: missing intermediate sizes and the stale metadata that describes them. It does not fix a broken CDN rewrite, a bad Nginx location block, or an S3 bucket policy that denies reads.
An idea often attributed to systems reliability folks like Werner Vogels fits here, paraphrased: everything fails; design so recovery is routine, not a heroic event.
That’s the posture you want: routine, bounded, observable regeneration.
Fast diagnosis playbook
If you only have 10 minutes before someone starts a “war room,” do this in order. You’re looking for the bottleneck: storage, CPU, PHP workers, or cache/CDN.
1) Confirm what’s failing: 404, wrong size, or processing error
- If the browser shows 404s for resized filenames (e.g., image-300x200.jpg), you likely need regeneration or filesystem/offload alignment.
- If you see 200 OK but the image is stretched or cropped wrong, it’s usually stale metadata or a theme requesting a size that doesn’t exist.
- If admin uploads fail or regeneration throws errors, it’s usually ImageMagick/GD limits, PHP memory, or permissions.
2) Check server load shape: CPU vs I/O vs PHP-FPM saturation
- High CPU: image processing is real work; fix batching and tool choice (ImageMagick vs GD) and limits.
- High disk I/O wait: storage is the bottleneck; throttle harder, move temp dirs, or regenerate off-host.
- PHP-FPM maxed: your regeneration is stealing workers from live traffic; isolate it (CLI user, separate pool, or cron window).
3) Verify storage backend and URL rewriting
- If you offload to S3-compatible storage, confirm intermediate sizes are stored and served, not just originals.
- If you use a CDN, confirm it’s not caching 404 responses for resized files.
Then pick the least risky remediation: regenerate in controlled batches, warm caches, and purge only what you must. Downtime is for database migrations and existential dread, not thumbnails.
Interesting facts and historical context
Short, concrete facts that explain why WordPress thumbnails can feel like a haunted photocopier:
- WordPress stores image size metadata per attachment in wp_postmeta (the _wp_attachment_metadata blob), so “fixing files” without updating metadata can still serve a wrong srcset.
- Intermediate sizes are named by convention (e.g., -300x200), which makes 404 diagnosis easy, but also makes CDN-cached misses very “sticky.”
- Historically, many hosts compiled PHP with GD only, so some sites’ image output subtly changes when ImageMagick becomes available (sharpening, color profiles, alpha handling).
- WordPress’ responsive images (srcset) arrived in core years after themes learned “hard-coded sizes,” so legacy themes often request sizes that don’t match registered image sizes.
- EXIF orientation is a repeating source of pain: phones rotate in metadata; some processors apply it differently, so regenerated images can “flip” versus older ones.
- Object-storage offload plugins changed the meaning of “file exists,” because the truth is now split between local disk, remote bucket, and database rewrite rules.
- Some CDNs cache 404s by default; after you regenerate thumbnails, users keep seeing missing images until you purge negative cache entries.
- ImageMagick has a security history (policy restrictions are common), so hosts often ship it with conservative limits that break large images during regeneration.
- WordPress can generate multiple sizes per upload, which means one “simple” regeneration can multiply storage writes fast, especially with retina/extra sizes from plugins.
Choose your regeneration approach (and the blast radius)
Option A: WP-CLI regeneration (recommended for production)
WP-CLI is boring in the best way: it’s scriptable, batchable, and you can run it under a low-priority user with strict limits. It’s also easier to observe and stop.
Use it when: you control the shell, can schedule off-peak windows, and want a reproducible workflow.
Option B: WordPress admin plugins (acceptable for small sites)
Plugins that regenerate thumbnails from wp-admin are fine until they’re not. They run inside the same PHP-FPM workers that serve users, and they’re more sensitive to timeouts. They’re also harder to throttle precisely.
Use it when: the media library is small, traffic is low, and you don’t have shell access.
Option C: Background job queue (best for high-scale, multi-host)
If you run WordPress like a real application—with job workers, queues, and separate pools—thumbnail generation belongs there. You can rate-limit, retry, and keep web response latency clean.
Use it when: you have multiple app servers, high traffic, and you’re already operating workers.
What you should avoid
- Regenerating everything at once on a single host in the middle of the day.
- Blindly purging the entire CDN cache “just in case.” That’s a self-inflicted DDoS against your origin.
- Changing image sizes and regenerating before verifying theme usage. You’ll generate gigabytes of sizes nobody requests.
Joke #1: Thumbnail regeneration is like going to the gym—going too hard on day one mostly just proves you can still feel pain.
Practical tasks with commands: what to run, what the output means, what to do next
These are production-safe tasks you can run while the site stays up. The point is to measure, decide, and proceed in controlled steps. I’ll assume a typical Linux host with WordPress installed at /var/www/html. Adjust paths to match your world.
Task 1: Confirm WordPress sees the uploads directory correctly
cr0x@server:~$ cd /var/www/html && wp option get upload_path
...output...
What the output means: Empty output usually means default (wp-content/uploads). A non-empty value means WordPress is using a custom path.
Decision: If it’s custom, validate that path exists and is writable; regeneration will follow it.
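A quick hedged check, assuming your PHP pool runs as www-data (substitute your actual web user):
cr0x@server:~$ sudo -u www-data test -w /var/www/html/wp-content/uploads && echo writable || echo NOT writable
If this prints NOT writable, fix ownership before any regeneration; otherwise every batch will fail identically.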
Task 2: Check the active image editor (ImageMagick vs GD)
cr0x@server:~$ cd /var/www/html && wp eval 'print_r(wp_get_image_editor("wp-content/uploads/".date("Y")."/".date("m")."/test.jpg"));'
...output...
What the output means: You’ll either see an editor object (e.g., Imagick) or an error if the test file doesn’t exist or processing fails.
Decision: If Imagick fails with policy/resource errors, switch to GD temporarily or fix ImageMagick limits before you regenerate at scale.
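You can inspect ImageMagick’s limits directly; these are standard ImageMagick commands, though the output format varies by version:
cr0x@server:~$ identify -list resource
cr0x@server:~$ identify -list policy
Low memory or disk resource limits and restrictive policy entries show up here before they show up as failed regenerations at 2 a.m.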
Task 3: Verify disk space before you generate thousands of files
cr0x@server:~$ df -h /var/www/html/wp-content/uploads
...output...
What the output means: You care about available space and filesystem type. Low free space means regeneration will partially succeed and then fail—leaving a mess.
Decision: If free space is tight, do one of: increase disk, regenerate on a bigger volume, or reduce generated sizes.
Task 4: Estimate uploads size and file count (I/O impact)
cr0x@server:~$ du -sh /var/www/html/wp-content/uploads && find /var/www/html/wp-content/uploads -type f | wc -l
...output...
What the output means: Total size + file count hints at how long any scan/regeneration will take, and how hard you’ll hit inode/cache layers.
Decision: Huge counts push you toward batching, off-peak windows, and careful cache strategy.
Task 5: Identify the specific missing thumbnail variant from logs
cr0x@server:~$ sudo tail -n 50 /var/log/nginx/access.log
...output...
What the output means: Look for GET /wp-content/uploads/...-300x200.jpg with status 404 (or 403).
Decision: If 404s are consistently for certain sizes, confirm those sizes exist in WordPress’ registered image sizes and/or metadata.
Task 6: Confirm the file truly doesn’t exist on disk
cr0x@server:~$ ls -lah /var/www/html/wp-content/uploads/2025/11 | head
...output...
What the output means: If you see originals but not the expected -WxH variants, it’s likely a generation/metadata issue.
Decision: Proceed to metadata inspection and controlled regeneration.
Task 7: Inspect attachment metadata for a known broken image
cr0x@server:~$ cd /var/www/html && wp post list --post_type=attachment --posts_per_page=5 --orderby=date --order=DESC
...output...
What the output means: You get attachment IDs. Pick one that’s showing broken thumbnails.
Decision: Use the ID to print metadata next.
cr0x@server:~$ cd /var/www/html && wp post meta get 12345 _wp_attachment_metadata
...output...
What the output means: You’ll see serialized or JSON-ish data depending on context; inside are sizes entries with filenames and dimensions.
Decision: If metadata references sizes that are missing on disk, regeneration is justified. If metadata has no sizes, generation likely failed originally.
Task 8: See which image sizes WordPress thinks it should generate
cr0x@server:~$ cd /var/www/html && wp eval 'global $_wp_additional_image_sizes; print_r(array_merge(get_intermediate_image_sizes(), array_keys((array)$_wp_additional_image_sizes)));'
...output...
What the output means: A list of size names: thumbnail, medium, large, plus theme/plugin-defined sizes.
Decision: If you see a long list of custom sizes that nobody uses, consider disabling them before regenerating to avoid pointless work and storage growth.
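One hedged way to trim sizes is an mu-plugin using the core intermediate_image_sizes_advanced filter, which runs during metadata generation (including WP-CLI regeneration). The size name below is a placeholder; substitute the real unused names from Task 8:
cr0x@server:~$ cat > /var/www/html/wp-content/mu-plugins/trim-image-sizes.php <<'PHP'
<?php
// Sketch: drop sizes nobody requests before a bulk regeneration.
// 'hypothetical-retina-2x' is a placeholder size name, not a real default.
add_filter('intermediate_image_sizes_advanced', function ($sizes) {
    unset($sizes['hypothetical-retina-2x']);
    return $sizes;
});
PHP
Remove the mu-plugin afterward if other workflows still need those sizes.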
Task 9: Canary regeneration for a single attachment
cr0x@server:~$ cd /var/www/html && wp media regenerate 12345 --yes
...output...
What the output means: It will report generated sizes or errors. Errors often mention memory, ImageMagick policy, or permissions.
Decision: If one image fails, do not start a bulk regen. Fix the failure mode first; otherwise you’ll just manufacture a longer incident.
Task 10: Run bulk regeneration in bounded batches (protect production)
First, generate a list of attachment IDs and feed them in chunks. This gives you a stop button and makes progress measurable.
cr0x@server:~$ cd /var/www/html && wp post list --post_type=attachment --field=ID --posts_per_page=-1 --orderby=ID --order=ASC > /tmp/attachment-ids.txt
...output...
What the output means: A file of IDs. No output is normal; check the file size if you’re suspicious.
Decision: If the list is enormous, you’ll run batches over hours/days with throttling.
cr0x@server:~$ split -l 200 /tmp/attachment-ids.txt /tmp/ids-batch-
...output...
What the output means: Creates files like /tmp/ids-batch-aa, each with 200 IDs.
Decision: Batch size is a throttle knob. Smaller batches mean less burst load and easier rollback.
cr0x@server:~$ cd /var/www/html && time xargs -a /tmp/ids-batch-aa -n 1 wp media regenerate --yes
...output...
What the output means: You’ll see per-ID progress and a wall-clock time. If it’s slow, your bottleneck is likely CPU or storage.
Decision: If latency spikes or load climbs, pause between batches, reduce batch size, or run only off-peak. Don’t be brave; be consistent.
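A minimal loop sketch for running every batch with a pause in between; the 60-second sleep is an arbitrary starting point, not a recommendation:
cr0x@server:~$ cd /var/www/html && for f in /tmp/ids-batch-*; do xargs -a "$f" -n 1 wp media regenerate --yes; sleep 60; done
Ctrl+C between batches loses nothing: regeneration is safe to resume per attachment.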
Task 11: Throttle CPU and I/O priority for regeneration
cr0x@server:~$ cd /var/www/html && nice -n 15 ionice -c2 -n7 xargs -a /tmp/ids-batch-ab -n 1 wp media regenerate --yes
...output...
What the output means: Same WP-CLI output, but the OS deprioritizes the process relative to web traffic.
Decision: If you share the box with live traffic, use this by default. If you have dedicated workers, you can relax it.
Task 12: Watch PHP-FPM saturation while regeneration runs
cr0x@server:~$ sudo ss -s
...output...
What the output means: High established connections or a spike in TCP states can indicate overload. Pair with FPM status if enabled.
Decision: If connection counts climb and page latency worsens, throttle regeneration harder or isolate it to a separate host/pool.
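A crude but portable companion check is counting PHP-FPM processes; the process name varies by distro and PHP version, so adjust the pattern:
cr0x@server:~$ watch -n 5 "ps aux | grep -c '[p]hp-fpm'"
If the count sits at your pm.max_children ceiling while regeneration runs, web requests are queueing behind image work.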
Task 13: Verify regenerated files exist and metadata updated
cr0x@server:~$ ls -lah /var/www/html/wp-content/uploads/2025/11 | grep -- '-300x' | head
...output...
What the output means: Presence of new resized variants indicates file generation succeeded.
Decision: If files exist but front-end still serves old URLs, you’re in cache/HTML territory: purge selectively and ensure srcset is rebuilt.
cr0x@server:~$ cd /var/www/html && wp post meta get 12345 _wp_attachment_metadata | head
...output...
What the output means: You should see updated size entries. If not, regeneration might have written files but failed to persist metadata.
Decision: Investigate database write permissions, object cache oddities, or errors in WP-CLI output.
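If a persistent object cache (Redis, Memcached) is in play, stale metadata can survive there too. Flushing it is a blunt instrument with real cost on a busy site, but it removes one variable:
cr0x@server:~$ cd /var/www/html && wp cache flush
If metadata reads correctly only after a flush, your object cache was serving the stale copy.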
Task 14: Confirm the HTTP layer serves the regenerated file (not a cached 404)
cr0x@server:~$ curl -I https://example.com/wp-content/uploads/2025/11/image-300x200.jpg
...output...
What the output means: You care about HTTP status, cache headers, and possibly a CDN header indicating HIT/MISS.
Decision: If it’s still 404 but file exists on origin, your CDN or rewrite rules are lying. Purge the specific path and re-check origin directly.
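To separate CDN behavior from origin truth, force curl at the origin directly. The IP below is a documentation placeholder; substitute your origin’s address:
cr0x@server:~$ curl -I --resolve example.com:443:203.0.113.10 https://example.com/wp-content/uploads/2025/11/image-300x200.jpg
A 200 from origin plus a 404 through the CDN means purge the object; a 404 from origin means keep fixing the origin first.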
Task 15: Spot ImageMagick policy or memory errors fast
cr0x@server:~$ cd /var/www/html && wp media regenerate 12345 --yes --debug
...output...
What the output means: Debug output often shows underlying errors like “not authorized” (policy.xml) or memory exhaustion.
Decision: If ImageMagick is blocked by policy, fix policy/limits or force GD; don’t keep retrying blindly.
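One hedged way to force GD temporarily is an mu-plugin using the core wp_image_editors filter; treat it as scaffolding and remove it once ImageMagick is fixed:
cr0x@server:~$ cat > /var/www/html/wp-content/mu-plugins/force-gd.php <<'PHP'
<?php
// Sketch: make WordPress use GD while ImageMagick policy/limits are being repaired.
add_filter('wp_image_editors', function () {
    return ['WP_Image_Editor_GD'];
});
PHP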
Task 16: If offloading to object storage, verify thumbnails exist remotely
cr0x@server:~$ aws s3 ls s3://my-media-bucket/wp-content/uploads/2025/11/ | head
...output...
What the output means: You should see resized variants alongside originals. If you only see originals, your offload plugin may not be pushing intermediate sizes.
Decision: Fix the offload configuration and re-run regeneration (or re-sync) so new sizes are uploaded too.
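If the plugin can’t re-sync on its own, a dry-run of a manual sync shows what’s missing. The bucket follows the earlier example; your offload plugin may map local paths to a different prefix:
cr0x@server:~$ aws s3 sync /var/www/html/wp-content/uploads s3://my-media-bucket/wp-content/uploads --dryrun | head
Every (dryrun) upload line is a file the bucket lacks; rerun without --dryrun only after confirming the prefix mapping matches what your URLs expect.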
Joke #2: If you regenerate thumbnails on Friday afternoon, you’re not a risk-taker—you’re a weekend enthusiast.
Three corporate mini-stories from the thumbnail trenches
1) The incident caused by a wrong assumption
The site was a marketing-heavy WordPress install with a modest catalog and a lot of landing pages. Someone changed theme image sizes to “sharpen performance.” The team assumed WordPress would lazily generate the new sizes on demand when requested. That’s not how it works: WordPress generates intermediate sizes at upload time, and regeneration is a deliberate action.
Within hours, the new theme started requesting a size name that didn’t exist for older media. The HTML shipped srcset entries pointing at -768x512 variants that were never created. The CDN, doing its job a little too well, cached a pile of 404 responses. Users didn’t just see missing images; they saw missing images reliably.
The first response was to “purge everything.” That caused a thundering herd back to origin, which was already busy with a plugin-based regeneration attempt running through wp-admin. PHP-FPM workers were now split between real users and image processing. Page generation slowed down, timeouts rose, and suddenly this was “an outage,” not “a media issue.”
The fix was painfully unglamorous: stop the plugin regen, identify the missing sizes, regenerate in CLI batches with low priority, purge only the negative-cache paths, then let caches warm naturally. The take-away that stuck: the assumption wasn’t just wrong—it created a chain reaction across CDN, origin, and PHP worker pools.
2) The optimization that backfired
A different team tried to be clever. They moved uploads to network-attached storage so all app servers could read the same files. The promise was “no more rsync, no more drift.” It worked for serving static originals. Thumbnail regeneration was another story.
Regeneration is write-heavy and metadata-heavy. Every attachment becomes multiple writes, plus stats, plus directory scans. The NAS performed fine under steady reads but got hammered by lots of small writes and metadata operations. iowait spiked, PHP processes stacked up, and the web tier began to behave like it was CPU-bound when it wasn’t.
They “optimized” by increasing concurrency: running multiple regeneration processes in parallel. That made the NAS the single shared bottleneck and turned a slow job into a slow incident. At the same time, backups started to miss their window because the storage system was busy processing a storm of tiny file operations.
The eventual approach was to run regeneration on a separate worker host with local fast storage, then sync results in controlled waves. Less elegant than the original idea, but it kept production latency sane. The moral: scaling concurrency without understanding storage behavior is just a way to find new ceilings—loudly.
3) The boring but correct practice that saved the day
This one didn’t make anyone’s résumé, which is exactly why it worked. A team had a standing practice: before any theme change or media-handling plugin change, they ran a small canary regeneration on a subset of attachments, and they recorded time-per-image plus peak load.
When they migrated from GD to ImageMagick (for quality reasons), their canary immediately surfaced failures on large PNGs due to conservative ImageMagick limits. No customer impact yet, because they hadn’t touched production behavior. They adjusted limits and PHP memory in a controlled change window, reran the canary, and only then proceeded.
Later, when they needed a full regeneration after adding new sizes for responsive design, they already had a known-safe batch size, a known-safe throttle, and a runbook with “stop conditions.” They ran it over two nights with a simple progress log and no drama.
The “boring” practice was measuring first and changing one variable at a time. It didn’t look impressive in a sprint demo, but it prevented the sort of cascading failure that turns a media task into an executive update.
Checklists / step-by-step plan (no downtime)
Step 0: Decide what “fixed” means
- Are thumbnails missing (404), wrong (distorted), or stale (old crop)?
- Is the problem global or limited to specific years/months in uploads?
- Is object storage/CDN involved?
Step 1: Stabilize production before you touch regeneration
- Freeze theme and plugin changes until regeneration is done.
- If the site is already degraded, stop any in-dashboard regeneration plugins first.
- Confirm you have disk space headroom and working backups of database + uploads manifest.
Step 2: Canary regeneration
- Pick 10 attachments: mix JPEG/PNG, old/new, large/small.
- Regenerate those IDs via WP-CLI.
- Verify files exist and the front-end serves correct sizes.
- Measure time per image and watch CPU/iowait.
Step 3: Choose the execution model
- Single host + low traffic: run batches with nice/ionice, pause between batches.
- Multi-host or high traffic: run on a dedicated worker or temporarily scale up and isolate processing.
- Offload to S3: confirm intermediate sizes are uploaded and served; regeneration alone may only fix local.
Step 4: Run batches with explicit stop conditions
Stop conditions should be written down before you start; a scripted sketch follows this list:
- Page response latency increases beyond your normal tolerance.
- CPU or iowait crosses a threshold you know correlates with user pain.
- Error rate for WP-CLI regeneration exceeds a small percentage (processing failures usually repeat).
- Disk space drops below a safe margin.
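A minimal guard-script sketch that encodes two of those stop conditions; it assumes the batch files from Task 10, and the thresholds are placeholders you must replace with your own baselines:
cr0x@server:~$ cat > /tmp/regen-guarded.sh <<'SH'
#!/usr/bin/env bash
# Sketch: run batches, but stop when load or free disk crosses a written-down threshold.
# MAX_LOAD and MIN_FREE_KB are placeholders; derive real values from your canary run.
set -euo pipefail
MAX_LOAD=4.0
MIN_FREE_KB=$((10 * 1024 * 1024))  # 10 GB
cd /var/www/html
for f in /tmp/ids-batch-*; do
  load=$(cut -d ' ' -f1 /proc/loadavg)
  free=$(df -Pk wp-content/uploads | awk 'NR==2 {print $4}')
  if awk -v l="$load" -v m="$MAX_LOAD" 'BEGIN { exit !(l > m) }'; then
    echo "STOP: load $load exceeds $MAX_LOAD"; exit 1
  fi
  if [ "$free" -lt "$MIN_FREE_KB" ]; then
    echo "STOP: only ${free} KB free under uploads"; exit 1
  fi
  # set -e stops the run after any batch where a regeneration failed (xargs exits non-zero).
  xargs -a "$f" -n 1 wp media regenerate --yes
  sleep 60
done
SH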
Step 5: Cache strategy: purge narrowly, warm deliberately
- Purge only the thumbnail paths that were returning 404 or stale content.
- Prefer allowing caches to refill naturally; avoid full-site purges.
- If your CDN caches 404s, explicitly purge those objects after regeneration.
Step 6: Post-run verification
- Sample across content types: posts, product pages, category grids, search results.
- Check srcset correctness (sizes exist and are reachable).
- Confirm object storage contains the generated sizes if you serve from it.
- Confirm backups and monitoring thresholds return to baseline.
Common mistakes: symptom → root cause → fix
1) Symptom: thumbnails 404, but originals load
Root cause: intermediate sizes missing, or metadata references sizes not present (common after theme size changes or failed past generations).
Fix: regenerate thumbnails in batches via WP-CLI; verify disk permissions; purge CDN 404 cache entries for affected paths.
2) Symptom: thumbnails exist on disk, but still 404 through HTTP
Root cause: web server config denies access, wrong document root, or offload/CDN rewrites point somewhere else.
Fix: validate Nginx/Apache location rules for /wp-content/uploads; check symlinks; check CDN origin path mapping.
3) Symptom: images look stretched or wrong crop after regeneration
Root cause: theme requests a size name that is registered differently now; or hard-coded dimensions conflict with generated sizes.
Fix: audit registered sizes and theme usage; regenerate only after sizes are finalized; update templates to use proper functions and size names.
4) Symptom: regeneration fails with memory errors
Root cause: PHP memory_limit too low for large images; ImageMagick resource limits too conservative.
Fix: raise memory limits appropriately, reduce batch size, and test canary images again; consider using GD temporarily for problematic formats.
5) Symptom: site becomes slow during regeneration even with batching
Root cause: shared bottleneck: disk I/O, network storage metadata ops, or PHP-FPM worker contention.
Fix: apply nice/ionice; move regeneration to a dedicated worker host; run during off-peak; reduce concurrency to one process.
6) Symptom: only some years/months are broken
Root cause: partial migration, rsync missed older folders, or permissions differ on older directories.
Fix: verify directory ownership recursively; confirm all upload paths exist; regenerate targeted ranges first.
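WP-CLI can scope the ID list by date, because extra flags pass through to WP_Query (year and monthnum are standard WP_Query parameters):
cr0x@server:~$ cd /var/www/html && wp post list --post_type=attachment --field=ID --year=2023 --monthnum=6 > /tmp/ids-2023-06.txt
Feed the resulting file into the same split-and-batch workflow from Task 10.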
7) Symptom: srcset includes sizes that don’t exist
Root cause: stale attachment metadata; or a plugin adds sizes then removed them, leaving old metadata behind.
Fix: regenerate attachments to refresh metadata; ensure size list is stable; clear page caches so HTML updates propagate.
8) Symptom: thumbnails exist, but CDN serves old broken image
Root cause: CDN cached the broken response, sometimes including 404 caching or stale object version.
Fix: purge specific objects (not everything); if versioning is available, change cache key via query-string/versioned filenames.
FAQ
1) Do I have to take WordPress down to regenerate thumbnails?
No. You do have to treat regeneration like a background batch job. Use WP-CLI, throttle it, and watch load. Downtime is optional; chaos is not.
2) Why didn’t WordPress generate the new thumbnail sizes automatically?
WordPress generates intermediate sizes when images are uploaded. When you change registered sizes later (theme switch, settings change), older attachments don’t retroactively get new sizes unless you regenerate.
3) What’s safer: ImageMagick or GD?
ImageMagick often produces better results and supports more operations, but it’s also more likely to run into policy/resource limits in hosted environments. GD is simpler and sometimes more predictable under strict constraints. In production, “safer” means “works for your largest real images under load.” Canary test before choosing.
4) Can I regenerate only missing sizes instead of everything?
Yes, within limits. Recent WP-CLI versions support wp media regenerate --only-missing, which skips sizes that already exist; how well that works depends on how broken the metadata is. A common compromise is to regenerate only attachments from a date range or only specific post types, then expand. If you can’t reliably detect missing sizes, batch everything slowly.
5) Why are thumbnails correct in wp-admin but wrong on the front-end?
wp-admin often shows a single size or uses different markup than your theme. The front-end uses srcset, custom sizes, and caching layers. If HTML is cached, it can keep referencing old filenames even after regeneration.
6) What if my uploads are offloaded to S3-compatible storage?
Regeneration on the app host may only create local intermediate sizes. You must ensure your offload plugin uploads the new variants and that URLs rewrite to the bucket/CDN correctly. Verify remote existence, not just local.
7) How do I avoid saturating PHP-FPM workers?
Don’t regenerate via wp-admin on a busy site. Use WP-CLI with nice/ionice, keep concurrency low, and run in off-peak windows. If you can, use a dedicated worker host so web workers stay for web requests.
8) Should I purge the full CDN cache after regeneration?
Almost never. Purge only what changed or what was cached incorrectly (especially cached 404s). A full purge shifts all traffic back to origin and can create a new incident that has nothing to do with thumbnails.
9) Why does regeneration sometimes change image appearance?
Different processing libraries handle sharpening, color profiles, and EXIF orientation differently. Also, your theme’s crop settings might have changed. Expect some differences—verify the most visible content types before doing the full run.
10) Can I run multiple regeneration processes in parallel to finish faster?
You can, but you probably shouldn’t on shared storage or a single origin host. Parallel regeneration is an easy way to turn a long task into a short outage. If you want speed, add dedicated workers and measure storage behavior first.
Conclusion: next steps you can execute today
Broken thumbnails are rarely “just thumbnails.” They’re your system telling you where it’s fragile: storage assumptions, metadata drift, CDN behavior, and background work competing with live traffic.
Do this next:
- Run the fast diagnosis playbook and identify whether you’re dealing with missing files, stale metadata, or cache/offload misrouting.
- Canary regenerate 10 attachments via WP-CLI and confirm both filesystem and HTTP behavior.
- Generate an attachment ID list and run bounded batches with nice/ionice, with explicit stop conditions.
- Purge narrowly (especially cached 404s) and let caches warm back up instead of detonating the whole CDN.
- Write down what you learned: safe batch size, average seconds per image, and the failure modes you hit. Future-you will be annoyingly grateful.
If you treat regeneration like production work—observable, throttled, reversible—you won’t need downtime. You’ll just need patience and the maturity to not “speed things up” by making them louder.