WordPress: The Media Library Cleanup That Doesn’t Break URLs

Nothing makes a marketing org panic like a homepage full of broken images. Nothing makes an SRE panic like a “quick cleanup” on production that quietly turns into a distributed 404 generator.

You can absolutely shrink a bloated WordPress Media Library, reduce storage bills, and speed up backups—without breaking URLs. But you have to treat it like a production change: measure first, delete last, and keep an escape hatch.

What you’re actually cleaning (and why URLs break)

“Media cleanup” sounds like deleting a bunch of JPEGs. In WordPress, it’s messier:

  • Filesystem objects: usually under wp-content/uploads/YYYY/MM/, plus generated sizes (thumbnails) and sometimes “-scaled” variants.
  • Database rows: attachments are posts in wp_posts with post_type='attachment'.
  • Metadata: attachment metadata in wp_postmeta (notably _wp_attached_file and _wp_attachment_metadata) tells WordPress which files exist and what sizes were generated.
  • References: posts/pages store image references in HTML, block JSON, shortcodes, theme options, widgets, and sometimes in plugin tables.
  • CDN/offload layers: your “real” files may be on S3/Cloud Storage, with local copies optional.

URLs break for three main reasons:

  1. You deleted a file that is still referenced (by content, a builder, a theme option, a social card, or a plugin table).
  2. You changed the URL mapping (site URL change, offload plugin settings change, different bucket/path, or rewriting uploads path).
  3. You changed the filename (dedup/rename/optimization step that “helpfully” re-encoded, re-saved, or relocated images).

Media cleanup that doesn’t break URLs is not primarily a deletion problem. It’s a reference integrity problem.

Operational truth: WordPress doesn’t maintain a comprehensive “this image is used by these posts” index. You build that map yourself, or you accept risk. In production, accept as little as possible.

Joke #1: Deleting media on a live WordPress site without a plan is like pruning a bonsai with a chainsaw—technically a tool, emotionally a mistake.

Interesting facts & historical context (short, useful)

  1. Attachments are posts. WordPress has stored media as rows in wp_posts since early versions, which is why you can “edit” an image like a post.
  2. Thumbnails changed the game. Automatic intermediate sizes existed early, but modern responsive image behavior (like srcset) made “one upload → many files” the default.
  3. The “-scaled” suffix is a relatively modern artifact. Since version 5.3, WordPress generates a “-scaled” copy of very large uploads (over 2560 px in width or height by default) to prevent gigantic originals from breaking layouts and consuming memory.
  4. EXIF handling has been a consistent footgun. Orientation metadata can make a picture look “rotated wrong” depending on how it’s processed and regenerated.
  5. The uploads path is configurable. WordPress can store files outside the default uploads directory, but many plugins assume the default path anyway.
  6. CDNs made URL permanence matter. The moment a campaign email goes out, those media URLs become quasi-immutable public API endpoints.
  7. Builders don’t always store plain HTML. Page builders often serialize content into JSON-ish blobs; simple grep misses usage.
  8. Some offload plugins treat the bucket as source of truth. Local cleanup can be safe—or catastrophic—depending on whether the offload is “copy” or “move.”
  9. WP-CLI became the adult supervision. Operationally, WP-CLI is the difference between repeatable maintenance and “I clicked around and hoped.”

Non-negotiable principles for safe media cleanup

1) URLs are contracts

Marketing sees image URLs as “assets.” Engineering should see them as public interfaces. Once a URL is published—web pages, emails, PDFs, social previews—changing it is a breaking change.

2) Make a deletion plan that assumes you’re wrong

You will miss a reference on the first pass. The plan must include:

  • A reversible stage (quarantine/move instead of delete).
  • A monitoring window (watch 404s, origin fetches, and error rates).
  • A rollback (move files back, restore DB rows, revert redirect rules).

3) Separate “orphan in DB” from “unused in content”

An attachment can be absent from the Media Library UI (or not obviously referenced) and still be used:

  • In theme options (customizer, header logo, Open Graph image).
  • In widgets/menus.
  • In CSS background images.
  • In builder plugin tables.

4) Don’t confuse “duplicate bytes” with “safe to dedupe”

Two files can be identical but have different URLs that are both in use. Dedupe without redirects is just a slow-motion outage.

5) Your backup strategy is part of the cleanup strategy

If your backups are slow, expensive, or unreliable, you’ll be tempted to “just delete.” Fix the backup pipeline and you’ll make calmer decisions. Also: test restores. A backup you haven’t restored is just an expensive feeling.

6) Confirm what actually serves media

Is it local disk? A mounted NFS/EFS? An object store via a plugin? A CDN pulling from origin? If you don’t know the serving path, you can’t predict the blast radius.

One quote worth keeping on a sticky note:

“Hope is not a strategy.” — General Gordon R. Sullivan

Fast diagnosis playbook

When someone says “the Media Library is huge and the site is slow,” don’t immediately reach for deletion scripts. First, identify what kind of pain you have: storage pressure, backup pain, admin slowness, front-end slowness, or CDN/origin churn.

First: confirm the symptom is real and current

  • Storage: filesystem usage and inode exhaustion.
  • Backups: runtime, incremental behavior, object count.
  • Frontend: 404s on uploads, origin bandwidth spikes, cache miss rate.
  • Admin: slow media grid/search due to DB slowness or huge metadata rows.

Second: find the bottleneck domain

  • Disk-bound: slow I/O, too many tiny files, slow network filesystem.
  • DB-bound: slow queries on wp_posts/wp_postmeta, no indexes for your search patterns, autoload bloat interfering with admin.
  • Network-bound: CDN cache misses, origin unreachable, wrong cache headers, offload plugin misconfiguration.

Third: choose the least risky lever

  • If the issue is backup duration, implement incremental backups or exclude caches before deleting media.
  • If the issue is frontend bandwidth, fix caching and image sizing before “cleanup.”
  • If the issue is storage pressure, quarantine old media first; don’t delete blind.

Practical tasks (commands + output + decisions)

These are production-grade tasks: each includes a command, sample output, what the output means, and the decision you make. Run them on a staging clone first. Then run them in production with a change window and a rollback plan.

Task 1: Check disk usage and inode pressure (the “are we actually full?” test)

cr0x@server:~$ df -h /var/www/html/wp-content/uploads
Filesystem      Size  Used Avail Use% Mounted on
/dev/nvme0n1p2  200G  176G   24G  89% /

Meaning: 89% used is uncomfortable but not an emergency. If this is 95%+ you’re in “incidents happen here” territory.

Decision: If >90%, prioritize safe quarantine + expansion plan; avoid long-running scripts that generate temp files.

cr0x@server:~$ df -i /var/www/html/wp-content/uploads
Filesystem      Inodes   IUsed   IFree IUse% Mounted on
/dev/nvme0n1p2 13107200 8123456 4983744   62% /

Meaning: Inode usage is fine. If inode usage hits ~90% with plenty of GB free, you’re suffering from “too many files,” not “too much data.”

Decision: If inode-bound, focus on thumbnail sprawl and cache directories, not just big originals.

Task 2: Measure uploads footprint by year/month (find the “why is 2019 enormous?” clue)

cr0x@server:~$ du -h --max-depth=2 /var/www/html/wp-content/uploads | sort -h | tail -n 10
4.2G    /var/www/html/wp-content/uploads/2022
6.8G    /var/www/html/wp-content/uploads/2023
7.1G    /var/www/html/wp-content/uploads/2021
9.6G    /var/www/html/wp-content/uploads/2020
31G     /var/www/html/wp-content/uploads/2019
58G     /var/www/html/wp-content/uploads

Meaning: One year dominates. That often correlates with a campaign, a migration, or a plugin that generated tons of sizes.

Decision: Target analysis on the outlier year first. It’s where you’ll get the safest ROI.

Task 3: Find the biggest files (storage wins without touching URLs)

cr0x@server:~$ find /var/www/html/wp-content/uploads -type f -printf '%s %p\n' | sort -nr | head -n 5
52428800 /var/www/html/wp-content/uploads/2019/07/booth-video-poster.png
41943040 /var/www/html/wp-content/uploads/2019/08/trade-show-wall.jpg
39845888 /var/www/html/wp-content/uploads/2020/01/hero-background.tif
36700160 /var/www/html/wp-content/uploads/2021/11/webinar-slide-01.png
33554432 /var/www/html/wp-content/uploads/2019/09/product-shot-raw.jpg

Meaning: You likely have “web-hostile” formats (TIFF, raw-ish JPGs, huge PNGs) that never should have been uploaded.

Decision: Prefer replacement + redirects (keep original URL alive) or leave originals and fix rendering sizes. Don’t bulk-delete.

Task 4: Confirm WordPress thinks uploads live where you think they do

cr0x@server:~$ wp option get upload_path

Meaning: Empty output usually means “default uploads path.” If it prints a custom path, your cleanup scripts must follow it.

Decision: If custom path exists, audit any plugins and Nginx/Apache rules that assume default uploads.

cr0x@server:~$ wp option get upload_url_path

Meaning: Empty output means WordPress builds URLs from site URL + uploads path.

Decision: If this is set (or offload plugin overwrites), URL mapping may differ from filesystem layout.

Task 5: Inventory attachment counts and file types (scope the work)

cr0x@server:~$ wp db query "SELECT post_mime_type, COUNT(*) AS c FROM wp_posts WHERE post_type='attachment' GROUP BY post_mime_type ORDER BY c DESC LIMIT 10;"
+----------------+--------+
| post_mime_type | c      |
+----------------+--------+
| image/jpeg     | 48210  |
| image/png      | 12340  |
| image/webp     | 3210   |
| application/pdf| 2890   |
| image/gif      | 740    |
+----------------+--------+

Meaning: You have a large image library and a decent number of PDFs. PDFs often get embedded in downloads and emails; treat them like permanent.

Decision: Set different policies: images can be optimized; PDFs rarely should be deleted unless you can prove no usage.

Task 6: Identify attachments missing files (DB points to non-existent disk objects)

cr0x@server:~$ wp eval '
global $wpdb;
$ids=$wpdb->get_col("SELECT ID FROM {$wpdb->posts} WHERE post_type=\"attachment\" LIMIT 2000");
$missing=0;
foreach ($ids as $id){
  $file=get_attached_file($id);
  if ($file && !file_exists($file)) { $missing++; }
}
echo "checked=".count($ids)." missing=$missing\n";
'
checked=2000 missing=37

Meaning: Some attachments point to files that aren’t on disk (maybe offloaded, maybe deleted, maybe path changed).

Decision: Before deleting anything else, fix missing-file issues: they pollute your analysis and can indicate offload mismatch.

Task 7: Check whether media is offloaded (don’t delete your origin by accident)

cr0x@server:~$ wp plugin list --status=active
+---------------------------+----------+--------+---------+
| name                      | status   | update | version |
+---------------------------+----------+--------+---------+
| amazon-s3-and-cloudfront  | active   | none   | 3.2.2   |
| wordpress-seo             | active   | none   | 22.4    |
| woocommerce               | active   | none   | 8.6.1   |
+---------------------------+----------+--------+---------+

Meaning: An offload plugin is active. The filesystem may not be the authoritative store.

Decision: Verify offload mode (copy vs move). If it “moves” to object storage, local files may already be absent, and deleting “orphans” locally does nothing useful.

Task 8: Find references to a specific media URL in post content (spot-check methodology)

cr0x@server:~$ wp db query "SELECT ID, post_title FROM wp_posts WHERE post_type IN ('post','page') AND post_status IN ('publish','draft') AND post_content LIKE '%/wp-content/uploads/2019/07/trade-show-wall.jpg%';"
+-----+--------------------------+
| ID  | post_title               |
+-----+--------------------------+
| 912 | Summer trade show recap  |
+-----+--------------------------+

Meaning: At least one post directly references the URL. Deleting it will create a visible break.

Decision: If referenced, keep the file or replace it while preserving the URL (see redirect strategy later).

Task 9: Find attachments that are not attached to a parent post (not the same as unused)

cr0x@server:~$ wp db query "SELECT COUNT(*) AS unattached FROM wp_posts WHERE post_type='attachment' AND post_parent=0;"
+------------+
| unattached |
+------------+
| 39122      |
+------------+

Meaning: Many media items are “unattached.” This is normal for modern editors and builders; it does not prove unused.

Decision: Don’t use post_parent=0 as a deletion filter. Use reference scanning + access logs.
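
A minimal sketch of the log side of that check, assuming the default Nginx “combined” log format (the same field positions Task 11 relies on) and a candidate list with one URL path per line. The function names and the candidates file are hypothetical, not part of any tool:

```shell
# Print every uploads path that was requested at least once, whatever the status.
# ASSUMPTION: nginx "combined" format, where field 7 is the request path.
requested_uploads() {
  awk '$7 ~ /^\/wp-content\/uploads\// { sub(/\?.*$/, "", $7); print $7 }' "$@" |
    LC_ALL=C sort -u
}

# Print the candidates (one URL path per line) that never appear in the given logs.
never_requested() {
  local candidates="$1" seen
  shift
  seen=$(mktemp)
  requested_uploads "$@" > "$seen"
  # comm -23 keeps lines that exist only in the first (candidate) list.
  LC_ALL=C sort -u "$candidates" | LC_ALL=C comm -23 - "$seen"
  rm -f "$seen"
}
```

Run it as never_requested candidates.txt /var/log/nginx/access.log (add rotated logs as extra arguments, decompressed first). Absence from the logs is weak evidence on its own: if a CDN sits in front, cached hits never reach origin, so combine this with CDN logs and the reference scan before trusting the list.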

Task 10: Validate thumbnails explosion (how many files per attachment)

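cr0x@server:~$ wp eval '
$id = 12345; // example attachment ID
$meta = wp_get_attachment_metadata($id);
// Non-image attachments (or broken rows) may have no usable metadata.
if (!is_array($meta) || empty($meta["file"])) { echo "no image metadata for $id\n"; return; }
echo "file=".$meta["file"]."\n";
echo "sizes=".count($meta["sizes"] ?? [])."\n";
'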
file=2019/07/trade-show-wall.jpg
sizes=18

Meaning: One upload generated 18 derivatives. Multiply by 50k images and your inode count tells a story.

Decision: If sizes are excessive, reduce registered image sizes (theme/plugins) before regenerating anything.
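
To see how much of the file count is derivatives across a whole directory, a rough sketch like this can help. It assumes generated sizes follow the name-WIDTHxHEIGHT.ext naming (for example trade-show-wall-1024x768.jpg); count_derivatives is a hypothetical helper name:

```shell
# Count files that look like generated sizes versus everything else.
# Heuristic only: an upload that was named "photo-800x600.jpg" by its author
# is counted as a derivative too.
count_derivatives() {
  find "$1" -type f | awk '
    /-[0-9]+x[0-9]+\.[A-Za-z0-9]+$/ { derived++; next }
    { other++ }
    END { printf "derivatives=%d other=%d\n", derived, other }
  '
}
```

Point it at one year first, for example count_derivatives /var/www/html/wp-content/uploads/2019. A ratio far above the number of sizes you actually use is a hint that old themes or plugins left sizes behind.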

Task 11: Use access logs to find hot missing media (what users are actually seeing)

cr0x@server:~$ sudo awk '$9==404 && $7 ~ /\/wp-content\/uploads\// {print $7}' /var/log/nginx/access.log | sort | uniq -c | sort -nr | head -n 10
52 /wp-content/uploads/2019/07/trade-show-wall.jpg
31 /wp-content/uploads/2020/01/hero-background.tif
18 /wp-content/uploads/2018/03/old-logo.png
11 /wp-content/uploads/2019/07/trade-show-wall-1024x768.jpg
9 /wp-content/uploads/2017/12/banner-winter.jpg

Meaning: These are the URLs that are currently failing in production. Some are originals; some are generated sizes. That’s important.

Decision: Fix the top offenders first. If the 404s are mostly intermediate sizes, regeneration or redirects may be required.

Task 12: Confirm whether a “missing” file exists on disk (avoid chasing ghosts)

cr0x@server:~$ ls -la /var/www/html/wp-content/uploads/2019/07/trade-show-wall.jpg
-rw-r----- 1 www-data www-data 41943040 Jul 12  2019 /var/www/html/wp-content/uploads/2019/07/trade-show-wall.jpg

Meaning: File exists. If users still get 404, your issue is likely routing, permissions, CDN/origin mismatch, or offload rules.

Decision: Don’t delete anything. Diagnose web serving path (Nginx alias, CDN behavior, file perms).

Task 13: Check permissions and ownership in uploads (classic silent break)

cr0x@server:~$ namei -l /var/www/html/wp-content/uploads/2019/07/trade-show-wall.jpg
f: /var/www/html/wp-content/uploads/2019/07/trade-show-wall.jpg
drwxr-xr-x root     root     /
drwxr-xr-x root     root     var
drwxr-xr-x root     root     www
drwxr-xr-x root     root     html
drwxr-xr-x www-data www-data wp-content
drwxr-x--- www-data www-data uploads
drwxr-x--- www-data www-data 2019
drwxr-x--- www-data www-data 07
-rw-r----- www-data www-data trade-show-wall.jpg

Meaning: Directories are 750 and the file is 640. If the Nginx workers run as www-data (or as a user in the www-data group), that is fine. If they run as any other user, requests fail with 403 or 404, depending on config.

Decision: Fix permissions before cleanup. Otherwise you’ll misclassify “unreachable” as “unused.”

Task 14: Quarantine suspected orphans safely (move, don’t delete)

cr0x@server:~$ mkdir -p /var/www/html/wp-content/uploads-quarantine
cr0x@server:~$ rsync -a --remove-source-files /var/www/html/wp-content/uploads/2017/ /var/www/html/wp-content/uploads-quarantine/2017/
cr0x@server:~$ find /var/www/html/wp-content/uploads/2017 -type f | head

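Meaning: The last command prints nothing because the files were moved out; rsync leaves the now-empty directory tree behind. This is reversible: you can move them back quickly. Note that rsync copies each file and then removes the source, so it is slower than a rename; if the quarantine sits on the same filesystem, a plain mv of the directory is close to instant.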

Decision: Quarantine old media in chunks (by year/month). Then monitor 404s. If none spike, delete quarantine later.
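
The rollback half deserves to exist before you need it. A minimal sketch, assuming the quarantine mirrors the uploads layout as in the rsync command above; restore_from_quarantine is a hypothetical helper, and the relative path is whatever your 404 list tells you is still wanted:

```shell
# Move one relative path (e.g. 2017/03/banner.jpg) from quarantine back into uploads.
restore_from_quarantine() {
  local quarantine="$1" uploads="$2" rel="$3"
  if [ ! -f "$quarantine/$rel" ]; then
    echo "not in quarantine: $rel" >&2
    return 1
  fi
  # Recreate the year/month directory in case it was removed.
  mkdir -p "$uploads/$(dirname "$rel")"
  # -n: never overwrite a file that is already back in place.
  mv -n "$quarantine/$rel" "$uploads/$rel"
}
```

Example: restore_from_quarantine /var/www/html/wp-content/uploads-quarantine /var/www/html/wp-content/uploads 2017/03/banner.jpg. If the restored URL is an original, its generated sizes probably need to come back with it.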

Task 15: Add a temporary Nginx fallback for quarantined files (URL safety net)

cr0x@server:~$ sudo nginx -T | sed -n '1,120p'
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful
# ...snip...

Meaning: Config is valid. Now you can safely add a location block (in a change window) that tries the normal file, then the quarantine path.

Decision: For a migration/quarantine period, you can serve from quarantine to keep URLs working while you confirm unused status.

Example location logic (conceptual; implement in the right server block and test): try normal path first, then quarantine.
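
One way that logic could look, printed from a small shell helper so you can review the snippet before pasting it into the right server block. This is a sketch under assumptions: the docroot is /var/www/html, the quarantine lives at wp-content/uploads-quarantine as in Task 14, and print_quarantine_fallback and the media_path capture name are made up for illustration:

```shell
# Print a candidate nginx location block for review; it does not touch any config file.
# ASSUMPTIONS: uploads and uploads-quarantine both live under <docroot>/wp-content/.
print_quarantine_fallback() {
  local docroot="${1:-/var/www/html}"
  cat <<EOF
location ~ ^/wp-content/uploads/(?<media_path>.+)\$ {
    root $docroot;
    # Serve the normal file if present, else the quarantined copy, else 404.
    try_files \$uri /wp-content/uploads-quarantine/\$media_path =404;
}
EOF
}
```

Regex locations are evaluated in order, and an earlier static-asset block (or a ^~ prefix location) may win over this one, so check how it interacts with your existing rules. Run nginx -t before reloading, and remove the block when the quarantine is deleted.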

Task 16: Track media-related DB bloat (attachment meta can be huge)

cr0x@server:~$ wp db query "SELECT ROUND(SUM(LENGTH(meta_value))/1024/1024,1) AS mb FROM wp_postmeta WHERE meta_key='_wp_attachment_metadata';"
+------+
| mb   |
+------+
| 892.4|
+------+

Meaning: Nearly a gigabyte of attachment metadata. That can affect backups and DB performance.

Decision: Don’t “optimize” by deleting metadata. Instead, reduce derivative sizes going forward and regenerate selectively.

Task 17: Spot-check that “regeneration” won’t alter URLs (it can, indirectly)

cr0x@server:~$ wp eval '
$id=12345;
echo get_attached_file($id)."\n";
'
/var/www/html/wp-content/uploads/2019/07/trade-show-wall.jpg

Meaning: Original file path is stable. Regenerating thumbnails should not change the original’s URL, but it can add/remove intermediate files that the front-end references via srcset.

Decision: If you regenerate, verify srcset output in rendered HTML and ensure the CDN/origin has the intermediate sizes.

Joke #2: “We’ll just regenerate thumbnails” is WordPress for “I would like to schedule a surprise load test.”

Three corporate mini-stories from the trenches

Mini-story 1: The incident caused by a wrong assumption (unattached = unused)

The company had a content team that loved landing pages and hated waiting on engineering. Over a few years, they moved from the classic editor to a page builder, then to blocks, then back to the builder “for speed.” The media library ballooned. Backups got slower. Somebody finally said the obvious: “Let’s delete unattached media.”

An engineer ran a query for attachments with post_parent=0 and deleted them. It felt scientific. It was also wrong in the specific way WordPress likes to be wrong: modern editors frequently upload images that never get “attached” to a parent post, even when they’re used everywhere.

The outage didn’t show up as a full site failure. It was worse. The top navigation still loaded. The hero images were gone on high-value pages. Campaign pages looked like they were built in 1998. The support channel filled with screenshots and the kind of messages that start with “Is it just me or…”.

The postmortem was unglamorous: there was no reference index, no quarantine, and no log-based validation. They restored from backup, but the restore also rolled back unrelated content changes and created a second mess. The real fix was boring: stage a reference scan, quarantine first, then watch 404s and roll back quickly.

Mini-story 2: The optimization that backfired (dedupe + rename for “cleanliness”)

A different org wanted to reduce storage and “standardize” filenames. Their plan: detect identical images, keep one canonical copy, and rename everything to a clean scheme like brand-product-usecase-001.jpg. They even had a spreadsheet. This is the point where you can hear an SRE inhale sharply.

They rewrote image URLs in post content using a search/replace, then removed the “duplicate” files. It looked fine in a quick spot check. The problem: not all references lived in post content. Some were in theme options, some in CSS, some embedded in old PDFs, and some cached in a headless consumer that scraped content nightly.

For two weeks, random images failed depending on which path a request took: cached HTML still pointed to old filenames, and the CDN cached 404s aggressively. The helpdesk couldn’t reproduce consistently. Engineering blamed the CDN. The CDN blamed origin. Origin blamed “the website.” Classic.

They ended up implementing redirects from old to new URLs, but because the file paths were changed in bulk, the redirect map was huge and brittle. The final lesson: dedupe and renames are fine only if you treat old URLs as permanent and provide durable redirects (or never change the URL at all).

Mini-story 3: The boring but correct practice that saved the day (quarantine + log verification)

Another team had a media library that grew quietly until their shared storage started to squeal. They didn’t panic-delete. They cloned production to staging, wrote a script to list candidate “orphans,” and then did something deeply unfashionable: they reviewed a sample manually with the content team.

Then they quarantined by year: moved the oldest year of uploads to a separate directory and added a temporary web-server fallback to serve from quarantine if the file was requested. They left it in place for a few weeks. During that time, they watched access logs and 404s, and they got real evidence of what was still being requested.

They found surprising long-tail usage: an image from an old blog post was still embedded in a partner’s wiki, and a PDF from a retired campaign was still linked in a sales deck. Because quarantine still served the files, nobody noticed a break. The team simply moved those specific assets back to the main tree and marked them “do not delete.”

After the monitoring window, they deleted the remaining quarantined files, reduced backup time, and avoided a URL incident entirely. The result wasn’t dramatic. It was better: nobody outside engineering noticed, which is the highest compliment production systems can receive.

Common mistakes: symptom → root cause → fix

1) Symptom: sudden spike of 404s under /wp-content/uploads/

Root cause: Files deleted or moved without redirects; or CDN cache now missing objects after offload setting change.

Fix: Restore/quarantine rollback first. Then implement fallback serving (temporary) and rebuild a reference map. Only delete after a monitoring window.

2) Symptom: images load in admin preview but not on the public site

Root cause: Admin uses authenticated routes or different domain; public site uses CDN domain or different path mapping.

Fix: Compare the output of wp option get siteurl and wp option get home with your offload/CDN settings. Validate the exact URL in browser dev tools and trace it to origin.

3) Symptom: random missing sizes like -1024x768 while originals exist

Root cause: Intermediate sizes deleted, but content references them via srcset or hardcoded size URLs.

Fix: Regenerate thumbnails selectively, or add rewrite rules that fall back from missing intermediate sizes to the original (careful: can increase bandwidth).
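
Before choosing between regeneration and a fallback rule, it helps to know which missing sizes still have a full-size file to fall back to. A small sketch, assuming the usual name-WIDTHxHEIGHT.ext naming; original_for is a hypothetical helper:

```shell
# Map ".../name-1024x768.jpg" to ".../name.jpg".
# Paths without a size suffix pass through unchanged.
original_for() {
  printf '%s\n' "$1" | sed -E 's/-[0-9]+x[0-9]+(\.[A-Za-z0-9]+)$/\1/'
}
```

Feed it the missing URLs from your 404 list and check whether each mapped path exists on disk. The mapped name is a guess, not a guarantee: the full-size file may carry a “-scaled” suffix or may be gone as well.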

4) Symptom: Media Library grid is slow, search is painful

Root cause: DB performance issues, huge attachment tables, slow storage, or admin doing expensive queries; sometimes object cache is absent.

Fix: Profile DB queries; ensure indexes for common patterns; add object cache (Redis/Memcached) where appropriate; avoid mass regeneration during business hours.

5) Symptom: “cleanup” frees almost no space

Root cause: You deleted DB entries but not filesystem objects, or you’re using offload where local disk isn’t the main footprint, or thumbnails dominate.

Fix: Measure filesystem by year and by file type. Confirm where media is stored. Count intermediate sizes per image. Clean the thing that’s actually big.
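
For the “by file type” part, a rough sketch that sums bytes per extension. It relies on GNU find (-printf), as Task 3 already does; bytes_by_extension is a hypothetical helper name:

```shell
# Sum bytes per lowercase file extension under a directory, largest first.
bytes_by_extension() {
  find "$1" -type f -printf '%s %f\n' | awk '
    {
      size = $1
      name = $0
      sub(/^[0-9]+ /, "", name)          # keep the file name, spaces included
      ext = "(none)"
      if (match(name, /\.[^.]+$/)) ext = tolower(substr(name, RSTART + 1))
      total[ext] += size
    }
    END { for (e in total) printf "%.0f %s\n", total[e], e }
  ' | sort -nr
}
```

Example: bytes_by_extension /var/www/html/wp-content/uploads | head. If the top line is png or tif, you have a format problem more than a quantity problem.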

6) Symptom: after migration, old image URLs redirect to homepage or 301 loop

Root cause: Over-broad rewrite rules or canonical redirect behavior from WordPress/SEO plugins.

Fix: Make redirects specific (uploads path only), test with curl, and avoid rewriting everything to /. Ensure redirects preserve path segments.

7) Symptom: deleting “unused” images breaks PDFs and email templates

Root cause: References exist outside WordPress posts (PDFs, email HTML stored elsewhere, CRM templates, external websites hotlinking).

Fix: Use access logs + CDN logs to detect external hits. Treat high-hit legacy media as “public API.” Keep or redirect.

8) Symptom: cleanup job times out or pegs CPU/I/O

Root cause: Scanning millions of files on network storage, or regenerating thumbnails in bulk, or running DB LIKE queries without constraints.

Fix: Chunk work by year/month, run off-hours, cap concurrency, and prefer log-driven targeting over full scans.

Checklists / step-by-step plan

Phase 0: Decide what “doesn’t break URLs” means for you

  • Strict: Every historical URL returns the same asset bytes forever. (Common for brand/legal assets.)
  • Practical: Every historical URL returns a valid asset (maybe optimized/replaced), no 404s. (Most marketing sites.)
  • Loose: Important URLs kept; long-tail may 404. (Fine only if you accept broken embeds.)

Pick one. Write it down. Make it a constraint for every decision.

Phase 1: Inventory and baseline

  1. Measure disk and inodes (df -h, df -i).
  2. Measure uploads by year/month (du --max-depth).
  3. List biggest offenders (find ... -printf '%s %p\n' | sort -nr).
  4. Count attachments by mime type (DB query).
  5. Check for offload plugins and confirm their mode.
  6. Baseline 404s for uploads in logs (top missing URLs).

Phase 2: Build a reference model (good enough, not perfect)

You’re trying to answer: “If I remove this file, who screams?” WordPress won’t answer this for you.

  • Start with direct references in post_content: search for /wp-content/uploads/ patterns. It’s crude but catches a lot.
  • Include attachment IDs usage: blocks often reference IDs, not URLs. Scan for "id":123 patterns in block content where feasible.
  • Include theme options: header logos, favicons, Open Graph defaults, background images.
  • Include known plugin tables: builders, sliders, galleries.
  • Overlay access logs: requests are the truth serum. If a URL is hit, it’s used by something, even if that something is an old PDF on a partner site.
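
The first two bullets can be sketched as a pair of filters that read post content on stdin. Both are deliberately crude, and both names are hypothetical. The URL filter misses JSON-escaped paths (\/wp-content\/uploads\/...) that some builders store, and the ID filter matches any "id":N in the content, so it over-reports rather than under-reports:

```shell
# Print unique uploads paths referenced by plain URL in the content on stdin.
extract_upload_paths() {
  grep -oE "/wp-content/uploads/[^\"' )<>?#]+" | sort -u
}

# Print unique numeric IDs from block attributes such as {"id":123,...}.
# Intersect the result with real attachment IDs before using it.
extract_block_ids() {
  grep -oE '"id":[0-9]+' | cut -d: -f2 | sort -un
}
```

One way to feed them: wp db query "SELECT post_content FROM wp_posts WHERE post_status IN ('publish','draft')" --skip-column-names | extract_upload_paths > referenced.txt. Anything on that list is off the quarantine candidates by definition.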

Phase 3: Quarantine, monitor, then delete

  1. Quarantine by directory (oldest year first). Move files to a quarantine path on the same filesystem if possible (fast moves).
  2. Optional safety net: temporary server rule to serve from quarantine if not found in primary path.
  3. Monitor: uploads 404s, top missing URLs, and error budget signals for 1–4 weeks (depends on traffic patterns).
  4. Restore exceptions: move back files that are still requested or referenced.
  5. Delete quarantine when the request curve stays flat.
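
For the monitoring step, “the request curve stays flat” is easier to judge with a number per day. A minimal sketch, again assuming the Nginx “combined” log format; uploads_404_by_day is a hypothetical helper:

```shell
# Count uploads 404s per day, printed in the order the days first appear in the logs.
# ASSUMPTION: field 4 looks like "[10/Oct/2024:13:55:36", field 7 is the path, field 9 the status.
uploads_404_by_day() {
  awk '$9 == 404 && $7 ~ /^\/wp-content\/uploads\// {
         day = substr($4, 2, 11)
         if (!(day in count)) order[++n] = day
         count[day]++
       }
       END { for (i = 1; i <= n; i++) print order[i], count[order[i]] }' "$@"
}
```

Capture the output once before the quarantine as a baseline. A site with years of history rarely has zero uploads 404s, so what you are watching for is a change, not an absolute value.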

Phase 4: Prevent re-bloat

  • Reduce unnecessary intermediate sizes registered by theme/plugins.
  • Enforce max upload dimensions for editors (policy + tooling).
  • Enable server-side image optimization carefully (don’t rename files; don’t change URLs).
  • Review who can upload media and what formats are allowed.
  • Schedule periodic audits (quarterly), not “five-year purge parties.”

Redirect strategy: when you must change paths, don’t improvise

If you’re migrating uploads to a new path or domain (CDN or bucket), the safe method is: keep old URLs working via redirects or a rewrite that preserves the full relative path.

  • Best: keep the same URL and change only the backend storage (origin pulls from object store, CDN updated).
  • Next best: 301 from old path to new path with identical relative structure.
  • Avoid: 302 “temporary forever,” or redirect everything to homepage, or rewriting query strings without testing.
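
“Identical relative structure” is checkable. A small sketch of the comparison, assuming old URLs contain /wp-content/uploads/ and that you already have the redirect target in hand; redirect_preserves_path is a hypothetical helper:

```shell
# Succeed when the redirect target keeps everything after /wp-content/uploads/.
redirect_preserves_path() {
  local old="$1" target="$2" rel
  rel="${old#*/wp-content/uploads/}"     # e.g. 2019/07/a.jpg
  case "$target" in
    */"$rel") return 0 ;;
    *)        return 1 ;;
  esac
}
```

To get the target for a live URL, curl -s -o /dev/null -w '%{http_code} %{redirect_url}' URL prints the status and where the redirect points. A redirect to the homepage fails this check immediately, which is exactly the failure you want to catch before users do.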

FAQ

1) Can I safely delete “unattached” media?

No, not as a rule. post_parent=0 often means “uploaded via modern editor/builder,” not “unused.” Use reference scans and access logs.

2) What’s the safest first cleanup that yields real space?

Start with outlier directories (one huge year/month) and obvious oversized files. Then quarantine that slice and monitor. You’ll learn the system without risking everything.

3) If I regenerate thumbnails, will it break URLs?

Regenerating thumbnails usually doesn’t change the original file URL. It can break pages that reference specific intermediate size filenames if those sizes change or disappear. Test srcset output and verify the generated files exist where the web server and CDN expect them.

4) How do I keep URLs stable while optimizing images?

Optimize in-place without renaming and without changing directory structure. If your optimizer renames files (or converts to WebP with new filenames), you need redirects or you accept broken links.

5) I use S3/offload. Should I clean local uploads at all?

Maybe. First confirm if the offload mode keeps local copies. If local is just a cache, deleting local might increase origin fetches or cause latency spikes. If object storage is authoritative, your cleanup target is the bucket, not the server disk.

6) Why do I see files on disk that aren’t in the Media Library?

Common causes: failed imports, manual uploads via FTP, old plugin behavior, or deleted attachment posts without removing files. Disk reality and DB reality drift over time—plan for it.

7) Do I need a plugin to find unused media?

Not strictly. You can build a solid process with WP-CLI, DB queries, and logs. Plugins can help, but they also add assumptions—especially around builders and custom fields. Validate before trusting.

8) What monitoring should I use during quarantine?

Track 404 rate for uploads paths, top missing URLs, and any CDN origin fetch spikes. Also watch user-facing synthetic checks on key pages that include lots of images.

9) How long should I keep quarantine before deleting?

Long enough to cover your traffic cycle. For B2B sites, 2–4 weeks is usually safer than 2–4 days. For high-traffic consumer sites, you may see enough signal in 48–72 hours, but long-tail embeds still exist.

10) What if legal/compliance requires deleting assets?

Then your “don’t break URLs” constraint changes: you may need URLs to return 410 Gone or a replacement asset. Do it deliberately: log it, document it, and avoid silent 404s.

Conclusion: what to do next week

Media cleanup that doesn’t break URLs is a reliability exercise wearing a content-management hat. You don’t “clean a library.” You manage a public asset API with years of consumers, most of whom will never file a ticket.

Practical next steps:

  1. Run the baseline tasks: disk, inodes, top years, top missing URLs, offload verification.
  2. Pick one old year/month as a pilot. Quarantine it, don’t delete it.
  3. Add a temporary safety net (server fallback) if you can, and watch uploads 404s daily.
  4. Move back exceptions, then delete quarantine only after the monitoring window stays quiet.
  5. Stop re-bloat: reduce image sizes, enforce upload policies, and schedule quarterly audits.

If you do it this way, the cleanup is almost disappointingly calm. That’s the point. Production systems reward adults.
