Soft 404s are the SEO version of a false fire alarm: nothing is burning, but everyone’s still evacuating. You publish a page, it loads fine in your browser, and yet Google Search Console flags it as “Soft 404” and quietly drops it from the index. Traffic declines, stakeholders panic, and you get to explain—again—that “works on my laptop” is not a monitoring strategy.
Soft 404s hit WordPress sites often enough to feel personal. WordPress sites love friendly themes, plugins, caching layers, and “helpful” redirects. Those same comforts can teach Googlebot that your “real page” behaves like a missing page. Let’s fix the reality, not just the report.
What a soft 404 actually is (and why Google cares)
A “soft 404” is Google saying: “This looks like a missing page, even though your server didn’t send a proper 404.” Usually the server returns 200 OK (or sometimes 302 or 403) along with content that resembles an error page, a near-empty page, a “no results” page, or a generic template.
Google’s job is to keep garbage out of the index. If a URL returns 200 but behaves like “sorry, nothing here,” indexing it would pollute search results. So Google tries to infer intent: missing content, thin content, duplicate boilerplate, or deceptive behaviors caused by redirects/caching.
Operationally, soft 404s are not “an SEO thing.” They are a distributed systems thing: caches, CDNs, redirect logic, authentication edges, bot treatment, and inconsistent status codes. Googlebot is just another client with strong opinions.
One practical rule: your site should be boring to crawlers. Predictable status codes, consistent content, minimal surprise. Surprise is for product launches, not HTTP responses.
Why WordPress sites trigger soft 404s
1) “Pretty” 404 pages that return 200
Many WordPress themes render a beautiful “Page not found” page but forget the part where the server sends 404 Not Found. Sometimes it’s the theme, sometimes it’s a plugin, sometimes it’s a caching layer that stored a 404 page and serves it as 200.
2) Search results, tag archives, and pagination that look empty
WordPress can generate lots of URLs: /tag/, /author/, /page/99/, on-site search results, date archives. When those pages have little content (or “No posts found”), Google may treat them as soft 404s—especially if you return 200 with a template that is mostly boilerplate.
3) Redirect “cleanup” that erases meaning
Plugins that “fix” broken links by redirecting everything to the homepage are classic soft 404 machines. From a user perspective it feels friendly. From Google’s perspective it’s a mismatch: a specific URL that should be missing is being replaced by a generic page. That’s not a fix; it’s hiding evidence.
4) Maintenance modes, WAFs, and bot blocks that are too clever
Some sites return 200 with “maintenance” content, or they block bots and return an HTML “Access denied” page with 200. Googlebot then indexes the denial page or declares a soft 404. If you need to block, do it cleanly with proper status codes (503 for maintenance with Retry-After, 401/403 for auth) and consistent behavior.
5) Inconsistent rendering: bot sees one thing, humans see another
Geo rules, A/B tests, personalization, consent banners, and JS rendering issues can change page content. If Googlebot gets a stripped page, an interstitial, or an empty shell, it may score it as “missing.” Sometimes nothing is “wrong” on your side—until you test as Googlebot and see the sad truth.
Joke #1: A soft 404 is like your coworker saying “I’m not quitting” while packing their desk into a box.
Fast diagnosis playbook (check these in order)
If you want speed, stop guessing. Use a tight loop: check response code, check what content is served, check whether it changes for Googlebot, then trace the source layer.
First: verify HTTP status and final destination
- Does the URL return 200 while displaying a 404 message?
- Is there a redirect chain that ends at the homepage or a search page?
- Is the canonical pointing somewhere else?
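The status-and-destination check can be scripted. Here is a minimal sketch; the helper name final_status and the captured header chain are illustrative, not part of any standard tooling.

```shell
#!/bin/sh
# Extract the final HTTP status from a chain of header dumps (curl -sSIL output).
final_status() {
  awk 'toupper($1) ~ /^HTTP\// { code = $2 } END { print code }'
}

# Usage against a live URL (commented out; needs network):
#   curl -sSIL https://example.com/suspect-url | final_status
# Offline demonstration on a captured two-hop redirect chain:
printf 'HTTP/2 301\nlocation: https://example.com/\n\nHTTP/2 200\n' | final_status
# → 200
```

If the final status is 200 but the original URL should be gone, you have found your mismatch.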
Second: compare what Googlebot gets vs what you get
- Fetch with a Googlebot user agent. Compare HTML length, title, robots meta, canonical, and body content.
- Check if a WAF/CDN is injecting challenges or interstitials.
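The bot-versus-human comparison boils down to fetching twice and diffing the essentials. A sketch, assuming you save each fetch to a file first; compare_fetches and the stand-in HTML are hypothetical.

```shell
#!/bin/sh
# Compare two fetches of the same URL (e.g., browser UA vs Googlebot UA).
# Save them first, for example:
#   curl -sS -A "Mozilla/5.0" "$URL" > human.html
#   curl -sS -A "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "$URL" > bot.html
compare_fetches() {
  h_len=$(wc -c < "$1"); b_len=$(wc -c < "$2")
  h_title=$(grep -io '<title>[^<]*' "$1" | head -n 1)
  b_title=$(grep -io '<title>[^<]*' "$2" | head -n 1)
  printf 'human: %s bytes, %s\nbot:   %s bytes, %s\n' "$h_len" "$h_title" "$b_len" "$b_title"
}

# Offline demonstration with tiny stand-in files:
tmp=$(mktemp -d)
printf '<title>Widget</title><p>real content</p>' > "$tmp/human.html"
printf '<title>Access Denied</title>' > "$tmp/bot.html"
compare_fetches "$tmp/human.html" "$tmp/bot.html"
```

A large byte-count gap or a title like “Access Denied” on the bot side is your answer.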
Third: isolate the layer that lies
- Origin server (WordPress/PHP) returning wrong status?
- Cache layer rewriting or caching error pages incorrectly?
- Plugin creating redirects or “smart” 404 handling?
Fourth: decide the intent of the URL
- Should it exist? Make it substantial and indexable.
- Should it be gone? Return 404 or 410 and do not redirect to irrelevant pages.
- Should it exist but not be indexed? Return 200 with noindex, but keep content useful for users.
How Google decides a page is “soft 404”
Google doesn’t publish the full classifier (and you don’t want them to), but we can infer the signals:
- Content similarity to known error templates: “Not found,” “no longer available,” “nothing here,” etc.
- Thin content: mostly navigation, header/footer, no main content.
- Unexpected redirects: lots of missing URLs redirecting to a generic page.
- Low unique value: faceted archives or auto-generated pages with near-zero differentiation.
- Fetch/render anomalies: blocked resources, JS errors, interstitials, consent walls, WAF challenges.
- Consistency over time: the same URL flipping between “real content” and “empty” based on cache state or geo.
Google also looks at the broader site pattern. If thousands of URLs all behave like “empty pages” with 200, the crawler gets skeptical fast.
One operational quote that belongs on every on-call runbook: You build it, you run it.
— Werner Vogels
Practical tasks: commands, outputs, and decisions
Below are hands-on checks you can run from a server or a workstation. Each task includes: a command, sample output, what it means, and the decision you make next. Do these in order if you’re debugging. Do them selectively if you’re auditing.
Task 1: Check the final HTTP status and redirect chain
cr0x@server:~$ curl -sSIL https://example.com/suspect-url | sed -n '1,25p'
HTTP/2 301
date: Sat, 27 Dec 2025 10:12:11 GMT
location: https://example.com/
cache-control: max-age=3600
server: nginx
HTTP/2 200
date: Sat, 27 Dec 2025 10:12:11 GMT
content-type: text/html; charset=UTF-8
cache-control: max-age=300
server: nginx
What it means: The URL redirects to the homepage. If this happens for missing pages, Google often flags soft 404 because the destination is unrelated.
Decision: If the URL is truly gone, stop redirecting it to the homepage. Return 404 or 410. If it moved, redirect to the closest equivalent page, not a generic one.
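To audit more than one URL, feed a list of “original final” pairs through a filter that flags chains ending at the site root. A sketch; the helper name flag_home_redirects is mine, and you would build the pairs with curl’s `-w '%{url_effective}'`.

```shell
#!/bin/sh
# Given lines of "original_url final_url", print originals whose redirect
# chain ends at the bare homepage.
flag_home_redirects() {
  awk '$2 ~ /^https?:\/\/[^\/]+\/?$/ { print $1 }'
}

# Build the input with curl (commented out; needs network, one line per URL):
#   printf '%s %s\n' "$u" "$(curl -sSL -o /dev/null -w '%{url_effective}' "$u")"
# Offline demonstration:
printf '%s\n' \
  'https://example.com/old-post https://example.com/' \
  'https://example.com/moved https://example.com/new-home-for-it/' |
  flag_home_redirects
# → https://example.com/old-post
```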
Task 2: Verify the body contains a “not found” template despite 200
cr0x@server:~$ curl -sS https://example.com/missing-page | grep -Ei 'not found|404|no posts found|nothing here' | head
404
Sorry, the page you are looking for could not be found.
What it means: Your site is telling users it’s missing, but you might still be returning 200.
Decision: Fix status codes at the source (theme/template or WordPress hooks), then re-test. A pretty 404 page is fine; a 200 404 page is not.
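You can turn the grep above into a reusable soft-404 detector: flag anything that returns 200 while the body reads like an error page. A sketch; the phrase list is an assumption, so extend it with your theme’s exact wording.

```shell
#!/bin/sh
# Heuristic: does a response body read like a "not found" page?
looks_like_404() {
  grep -Eiq 'not found|no posts found|nothing here|page does not exist'
}

# Usage against a live URL (commented out; needs network):
#   status=$(curl -sS -o body.html -w '%{http_code}' https://example.com/missing-page)
#   if [ "$status" = "200" ] && looks_like_404 < body.html; then
#     echo "soft-404 candidate"
#   fi
# Offline demonstration:
printf 'Oops! Page not found.' | looks_like_404 && echo soft-404-candidate
# → soft-404-candidate
```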
Task 3: Compare what Googlebot sees vs a normal browser
cr0x@server:~$ curl -sS -A "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" https://example.com/product/widget | sed -n '1,20p'
<html>
<head>
<title>Access Denied</title>
<meta name="robots" content="noindex,nofollow">
</head>
<body>Request blocked.</body>
What it means: A WAF/CDN rule is treating Googlebot differently. Even if humans see the product page, Google sees a block page.
Decision: Fix bot access. Allow Googlebot through (carefully), or return an honest status code (403) and accept that it won’t index. Don’t serve “Access denied” as 200.
Task 4: Confirm the real status code from origin (bypass CDN if possible)
cr0x@server:~$ curl -sSIL --resolve example.com:443:203.0.113.10 https://example.com/missing-page | sed -n '1,12p'
HTTP/2 200
date: Sat, 27 Dec 2025 10:14:02 GMT
content-type: text/html; charset=UTF-8
server: nginx
x-cache: MISS
What it means: By pinning DNS to the origin IP, you see the origin returns 200. The bug is likely WordPress/theme or origin config, not the CDN.
Decision: Fix WordPress handling (template, rewrite rules, plugins) so missing pages produce 404/410.
Task 5: Check headers for caching of error-like pages
cr0x@server:~$ curl -sSIL https://example.com/missing-page | egrep -i 'HTTP/|cache-control|age|x-cache|cf-cache-status|vary|location'
HTTP/2 200
cache-control: public, max-age=86400
age: 43122
x-cache: HIT
vary: Accept-Encoding
What it means: Your CDN/proxy is caching the “missing” experience for a day. If that page is actually a transient error or misrouted request, Google will repeatedly see the bad version.
Decision: Do not cache 404 templates as 200. Set correct status codes and consider different TTLs for error responses. Purge the bad cache entries after fixing.
Task 6: Inspect robots directives and canonical tags
cr0x@server:~$ curl -sS https://example.com/suspect-url | egrep -i 'rel="canonical"|meta name="robots"' | head
<link rel="canonical" href="https://example.com/" />
What it means: The canonical points to the homepage. That can be legitimate in rare cases, but if many pages canonicalize to home, Google will treat them as duplicates or soft 404-ish junk.
Decision: Fix canonical generation. Each real page should canonicalize to itself; truly non-canonical pages should be handled via redirects or noindex with a clear reason.
Task 7: Find WordPress “empty archive” patterns in access logs
cr0x@server:~$ sudo awk '$7 ~ /\/tag\/|\/author\/|\/page\/[0-9]+\/|\/\?s=/' /var/log/nginx/access.log | tail -n 5
198.51.100.21 - - [27/Dec/2025:10:10:01 +0000] "GET /tag/obsolete/ HTTP/2.0" 200 18432 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
198.51.100.21 - - [27/Dec/2025:10:10:03 +0000] "GET /author/ghostwriter/ HTTP/2.0" 200 18012 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
198.51.100.21 - - [27/Dec/2025:10:10:06 +0000] "GET /page/99/ HTTP/2.0" 200 17655 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
What it means: Googlebot is crawling thin/empty archive-like URLs that often produce soft 404 flags.
Decision: Decide whether those archives deserve indexing. If not, set noindex on them, or tighten internal linking and sitemap generation so you’re not inviting crawlers to useless URLs.
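To quantify how much crawl budget Googlebot burns on these URLs, a one-liner over the access log works. A sketch; count_archive_hits is my name, and it assumes the default Nginx “combined” log format, where field 7 is the request path.

```shell
#!/bin/sh
# Count Googlebot requests to archive-style URLs in combined-format log lines.
count_archive_hits() {
  awk '$7 ~ /\/(tag|author)\/|\/page\/[0-9]+\// && /Googlebot/ { n++ } END { print n + 0 }'
}

# Usage on a real log (commented out):
#   count_archive_hits < /var/log/nginx/access.log
# Offline demonstration with two sample lines:
printf '%s\n' \
  '198.51.100.21 - - [27/Dec/2025:10:10:01 +0000] "GET /tag/obsolete/ HTTP/2.0" 200 18432 "-" "Googlebot/2.1"' \
  '203.0.113.5 - - [27/Dec/2025:10:10:02 +0000] "GET /real-article/ HTTP/2.0" 200 99000 "-" "Mozilla/5.0"' |
  count_archive_hits
# → 1
```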
Task 8: Check if WordPress is sending the right status for 404 pages
cr0x@server:~$ curl -sS -o /dev/null -D - https://example.com/this-should-404 | head -n 8
HTTP/2 200
date: Sat, 27 Dec 2025 10:16:31 GMT
content-type: text/html; charset=UTF-8
server: nginx
What it means: Clear red flag. A missing page returns 200.
Decision: Fix at the origin: verify WordPress 404 handling, disable suspect plugins, review rewrite rules, and ensure your server does not rewrite missing URLs to index.php without proper 404 logic.
Task 9: Identify “redirect everything” rules in Nginx
cr0x@server:~$ sudo nginx -T 2>/dev/null | egrep -n 'try_files|error_page|return 301|rewrite' | head -n 25
45: try_files $uri $uri/ /index.php?$args;
88: error_page 404 =200 /index.php;
132: rewrite ^/old/(.*)$ / permanent;
What it means: error_page 404 =200 /index.php; is a classic foot-gun. It converts a real 404 into a 200 response, often serving a theme 404 template as success.
Decision: Remove or correct it. If you need custom error pages, keep the 404 status. In Nginx, prefer error_page 404 /404.html; (without forcing =200), or let WordPress handle 404s but ensure it sends correct status.
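This foot-gun is easy to scan for across a whole config dump. A sketch; scan_for_404_to_200 is a made-up helper name, and the regex only catches the exact `=200` rewrite pattern shown above.

```shell
#!/bin/sh
# Scan an Nginx config dump for directives that convert 404 into 200.
scan_for_404_to_200() {
  grep -En 'error_page[[:space:]]+404[[:space:]]+=200'
}

# Usage on a live server (commented out):
#   sudo nginx -T 2>/dev/null | scan_for_404_to_200
# Offline demonstration: only the first directive is a problem.
printf 'error_page 404 =200 /index.php;\nerror_page 404 /404.html;\n' | scan_for_404_to_200
# → 1:error_page 404 =200 /index.php;
```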
Task 10: Inspect Apache for error document misconfiguration
cr0x@server:~$ sudo apachectl -t -D DUMP_VHOSTS 2>/dev/null | head
VirtualHost configuration:
*:80 example.com (/etc/apache2/sites-enabled/000-default.conf:1)
cr0x@server:~$ sudo grep -R "ErrorDocument 404" -n /etc/apache2/sites-enabled /var/www 2>/dev/null | head
/etc/apache2/sites-enabled/000-default.conf:23:ErrorDocument 404 /index.php
What it means: If Apache routes 404s to /index.php without preserving the 404 status, WordPress can end up responding 200 with a 404-looking page.
Decision: Use a dedicated 404 document that keeps the status, or ensure your PHP/WordPress layer sets the correct code when it renders “not found.”
Task 11: Confirm WordPress can produce a true 404 via WP-CLI evaluation
cr0x@server:~$ cd /var/www/html
cr0x@server:~$ wp eval 'echo is_404() ? "is_404\n" : "not_404\n";'
not_404
What it means: Outside a real request context this is only a sanity check, but it can reveal that WordPress isn’t treating certain “missing” patterns as 404s (common with custom rewrites or plugins).
Decision: Audit rewrite rules and custom post type routing. If a plugin hijacks routing, disable it and retest. If it’s custom code, fix the query and 404 handling so WordPress calls a 404 properly.
Task 12: Detect “soft 404 by thinness” using content length
cr0x@server:~$ curl -sS https://example.com/tag/obsolete/ | wc -c
8123
cr0x@server:~$ curl -sS https://example.com/real-article/ | wc -c
128944
What it means: The tag page is tiny compared to real content. That’s not always wrong, but if it’s mostly boilerplate plus “No posts found,” Google will likely soft-404 it.
Decision: Either improve it (add unique copy, show real items) or mark it noindex and stop linking to it in ways that make it look like a core page.
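One way to make “tiny compared to real content” objective is to compare each page against a boilerplate baseline, such as the byte size of your 404 template or an empty archive. A sketch; thinness_ratio and the sample numbers are illustrative, and the threshold needs calibration per theme.

```shell
#!/bin/sh
# Rough thinness check: page size as a percentage of a boilerplate baseline.
thinness_ratio() {  # args: page_bytes baseline_bytes → integer percent
  echo $(( $1 * 100 / $2 ))
}

# Usage with live fetches (commented out; needs network):
#   page=$(curl -sS "$URL" | wc -c)
#   base=$(curl -sS "$BOILERPLATE_URL" | wc -c)
#   thinness_ratio "$page" "$base"
# Offline demonstration with the sizes from the task above:
thinness_ratio 8123 7900    # → 102  (barely above boilerplate: soft-404 bait)
thinness_ratio 128944 7900  # → 1632 (clearly real content)
```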
Task 13: Check sitemap entries for URLs that shouldn’t exist
cr0x@server:~$ curl -sS https://example.com/sitemap.xml | grep -Eo '<loc>[^<]+' | head
https://example.com/
https://example.com/tag/obsolete/
https://example.com/page/99/
What it means: Your sitemap is advertising thin/empty URLs. Google will crawl them, then judge them, then report them. Fair.
Decision: Fix sitemap generation (SEO plugin settings). Only include index-worthy URLs. Sitemaps are a contract; don’t sign contracts you can’t honor.
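While you wait for the plugin fix, you can audit the contract yourself: extract every `<loc>` and flag archive-style entries. A sketch; flag_sitemap_junk is my name, and the junk patterns are assumptions to adjust for your site.

```shell
#!/bin/sh
# Pull <loc> entries out of a sitemap and flag archive-style URLs that
# usually don't belong there.
flag_sitemap_junk() {
  grep -Eo '<loc>[^<]+' | sed 's/<loc>//' | grep -E '/(tag|author)/|/page/[0-9]+/|\?s='
}

# Usage (commented out; needs network):
#   curl -sS https://example.com/sitemap.xml | flag_sitemap_junk
# Offline demonstration:
printf '<loc>https://example.com/</loc><loc>https://example.com/tag/obsolete/</loc>' |
  flag_sitemap_junk
# → https://example.com/tag/obsolete/
```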
Task 14: Verify the server returns 410 for permanently removed content
cr0x@server:~$ curl -sS -o /dev/null -D - https://example.com/old-campaign-2019 | head -n 6
HTTP/2 410
date: Sat, 27 Dec 2025 10:20:09 GMT
content-type: text/html; charset=UTF-8
server: nginx
What it means: 410 Gone clearly communicates permanence. Google typically drops these faster than 404.
Decision: Use 410 for intentionally removed pages you don’t plan to replace, especially if they keep getting crawled. Use 404 for unknown/mistyped URLs.
Task 15: Verify no “helpful” redirect plugin is masking 404s
cr0x@server:~$ wp plugin list --status=active
+---------------------+----------+--------+---------+
| name | status | update | version |
+---------------------+----------+--------+---------+
| redirection | active | none | 5.4.2 |
| wp-super-cache | active | none | 1.9.4 |
| seo-plugin | active | none | 21.2 |
+---------------------+----------+--------+---------+
What it means: Redirect plugins are fine—until someone enables a global rule like “redirect all 404s to homepage.”
Decision: Audit rules. Delete global 404-to-home rules. Replace with specific redirects that map old URLs to the closest new URLs.
Task 16: Confirm your 404 page is marked as 404 (not cached as 200)
cr0x@server:~$ curl -sS -o /dev/null -D - https://example.com/definitely-missing | egrep -i 'HTTP/|cache-control|age|x-cache|via'
HTTP/2 404
cache-control: no-cache, must-revalidate, max-age=0
x-cache: MISS
What it means: This is what you want: correct status, conservative caching.
Decision: Ship it. Then purge any stale cached 200s and ask for recrawl in Search Console for key URLs after fixes.
Three corporate mini-stories from the trenches
Mini-story #1: The incident caused by a wrong assumption
A mid-sized e-commerce company migrated from an old CMS to WordPress with a shiny new theme. The team did what teams always do: tested the top 50 pages manually. Everything “worked.” They shipped the cutover Friday afternoon, because of course they did.
Within a week, Search Console lit up with soft 404s across product URLs. The product pages were still there for humans, but Google started dropping them. Paid traffic kept converting, organic traffic slumped, and the CEO learned the phrase “index coverage report.”
The wrong assumption: “If the page loads, the status must be 200 and that’s good.” The theme rendered a “product not available” template when inventory was zero—but it returned 200 and also set the canonical to the category page. That’s not a product page; that’s a politely disguised missing page.
They fixed it by making out-of-stock pages return 200 only if they included real product details and alternatives. Truly discontinued products returned 410 with specific, relevant redirects where possible. They also corrected canonical tags to self-reference for real products. Indexing recovered gradually as Google recrawled; the key was consistency and not flapping behavior.
Mini-story #2: The optimization that backfired
A media site wanted to reduce origin load. They pushed aggressive caching at the CDN and added a rule: cache HTML for 24 hours, including “error pages,” because it improved hit ratio and made dashboards look calm.
Then WordPress had a brief database hiccup. For a few minutes, many pages rendered a minimal template saying “No posts found” (because queries failed). The origin still returned 200 since the PHP layer didn’t throw a hard error. The CDN happily cached those responses for a day.
Humans complained first, but the real damage was quieter: Googlebot crawled during the bad window, saw a bunch of nearly empty 200 pages, and flagged them as soft 404. Even after the database recovered, Google continued to see cached emptiness.
The fix was unglamorous: stop caching HTML responses that match error-like patterns, cache 5xx briefly, and give 404s a sane TTL. They also added an application-level health gate: if WordPress can’t query content, return 503, not a “successful” empty page. Performance metrics got slightly worse; indexing and user trust got better. Pick your poison carefully.
Mini-story #3: The boring but correct practice that saved the day
A regulated B2B SaaS company ran WordPress for documentation and landing pages. Their SRE team insisted on something that looked bureaucratic: a weekly crawl-and-verify job that sampled URLs from the sitemap and checked status, canonical, and content length.
It wasn’t fancy. It ran from a CI runner with curl and dumped results into a spreadsheet and an alert channel. Marketing occasionally rolled their eyes. SRE occasionally rolled theirs back. Peace reigned.
During a plugin update, a caching plugin began serving the 404 template for some paginated archives due to an edge-case bug. The pages returned 200 with “Nothing Found.” The weekly job caught the anomaly within hours, not weeks, and they rolled back before Google reprocessed a meaningful chunk of the site.
The practice didn’t win awards. It did prevent an indexing mess, which is the closest thing ops has to winning awards anyway.
Joke #2: The quickest way to create soft 404s is to “simplify redirects” right before a holiday weekend. The second quickest is to say that out loud.
Common mistakes: symptom → root cause → fix
1) “Soft 404” on URLs that show a 404 message to users
Symptom: Page says “Not Found,” but server returns 200.
Root cause: Theme renders 404 template while the server status stays 200, or Nginx/Apache rewrites 404 to index.
Fix: Ensure the HTTP response is 404. Remove error_page 404 =200 patterns. Verify with curl -I and with a Googlebot user agent.
2) Missing pages redirected to homepage
Symptom: Lots of old URLs redirect to /; Search Console flags soft 404.
Root cause: Redirect plugin “helpfully” sending all 404s to home, or blanket rewrite rules.
Fix: Replace blanket redirects with specific 301s to relevant replacements. If there is no replacement, return 404 or 410.
3) Archive pages flagged as soft 404
Symptom: Tag, author, date, or pagination URLs listed as soft 404.
Root cause: Empty archives or ultra-thin pages with mostly boilerplate.
Fix: Decide: either make them valuable (curated content, meaningful intro text, real internal linking), or set noindex and remove from sitemaps.
4) “Soft 404” spikes after enabling CDN caching
Symptom: Search Console starts flagging many URLs soon after caching changes.
Root cause: Cache stored transient empty/error responses as 200 and served them long-lived to Googlebot.
Fix: Adjust caching rules: don’t cache error templates, keep 404/5xx TTL conservative, purge after fixes, and avoid caching personalized or bot-challenged responses.
5) Google sees “Access denied” or interstitial pages
Symptom: Soft 404 or “Crawled – currently not indexed” for important URLs; fetch-as-bot shows block page.
Root cause: WAF, bot protection, geo rules, or consent walls served to Googlebot.
Fix: Allow Googlebot (verify by reverse DNS if you’re strict), or serve proper 403/401 and accept non-indexing. Don’t serve block pages as 200.
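The strict verification flow is: reverse DNS on the claimed IP, check the name lands in googlebot.com or google.com, then forward-resolve the name and confirm it returns the original IP. A sketch of the suffix check only; valid_googlebot_ptr is a made-up helper, and the full flow still needs the forward-confirm step.

```shell
#!/bin/sh
# Check that a PTR name is plausibly Google's crawler infrastructure.
valid_googlebot_ptr() {
  case "$1" in
    *.googlebot.com|*.google.com) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage with a live lookup (commented out; needs DNS):
#   ptr=$(host 66.249.66.1 | awk '{print $NF}' | sed 's/\.$//')
#   valid_googlebot_ptr "$ptr" && echo "plausible Googlebot"
# Offline demonstration:
valid_googlebot_ptr "crawl-66-249-66-1.googlebot.com" && echo plausible
# → plausible
```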
6) Canonical tags collapsing many pages to one
Symptom: Many URLs have canonical pointing to homepage or a category root; indexing drops.
Root cause: SEO plugin misconfiguration, theme bug, or a misguided “avoid duplicates” hack.
Fix: Set canonical to self for real pages. Use canonicalization intentionally, not as a panic button.
7) JSON/REST endpoints or query URLs indexed and flagged
Symptom: Search Console shows soft 404 for weird URLs like query strings or API endpoints.
Root cause: Internal search links exposed, parameterized pages crawlable, or plugin generating junk URLs.
Fix: Stop generating crawl paths: restrict internal links, consider noindex for search results, and ensure non-HTML endpoints aren’t included in sitemaps.
Checklists / step-by-step plan
Step 1: Triage the URLs
- Bucket A: Should exist and be indexed (money pages, key docs).
- Bucket B: Should exist but not be indexed (internal search, some archives).
- Bucket C: Should not exist (deleted campaigns, typos, spammy query URLs).
Do not use one global behavior for all buckets. That’s how you get soft 404s—and meetings.
Step 2: For Bucket A, make the page undeniably real
- Return 200 with substantial main content.
- Ensure canonical is self-referential.
- Avoid “no results” pages masquerading as content pages.
- Fix rendering differences for Googlebot.
- Ensure the page is not blocked by WAF, auth, or robots directives.
Step 3: For Bucket B, keep it honest and intentional
- Return 200 but add noindex via meta robots (or X-Robots-Tag if you control headers).
- Remove from XML sitemaps.
- Reduce internal links that promote them as important.
Step 4: For Bucket C, make it disappear properly
- Return 404 for unknown/mistyped URLs.
- Return 410 for intentionally removed content you will not replace.
- Only redirect when there is a relevant replacement. Redirecting to home is rarely relevant.
Step 5: Fix the platform behaviors that create soft 404s at scale
- Theme: Verify 404 templates don’t override status codes.
- Plugins: Audit redirect and SEO plugins for blanket rules and canonical behavior.
- Server config: Remove 404-to-200 rewrites and “catch-all” hacks.
- CDN: Don’t cache transient empty responses as long-lived HTML.
- Monitoring: Add a scheduled crawl that samples important URLs and verifies status and content size.
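The monitoring bullet above deserves a sketch. The core is a verdict function over status and body size; everything here (the verdict name, the 5000-byte floor) is an assumption to calibrate against your own templates.

```shell
#!/bin/sh
# Core of a boring scheduled check: status + body size → verdict.
verdict() {  # args: http_status body_bytes
  if [ "$1" = "200" ] && [ "$2" -lt 5000 ]; then
    echo SUSPECT   # 200 but suspiciously thin: possible soft 404
  elif [ "$1" = "200" ] || [ "$1" = "404" ] || [ "$1" = "410" ]; then
    echo OK        # honest success or honest absence
  else
    echo CHECK     # redirects, 5xx, blocks: investigate
  fi
}

# Wire it to sampled sitemap URLs in CI (commented out; needs network):
#   while read -r u; do
#     s=$(curl -sS -o body.html -w '%{http_code}' "$u")
#     echo "$u $(verdict "$s" "$(wc -c < body.html)")"
#   done < sampled-urls.txt
# Offline demonstration:
verdict 200 800     # → SUSPECT
verdict 200 120000  # → OK
verdict 302 0       # → CHECK
```

Dump the output into a spreadsheet or alert channel, exactly as in the third mini-story. Boring is the point.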
Step 6: Validate and then ask Google to re-evaluate
- Re-test with curl -I and with a Googlebot user agent.
- Purge caches after the fix.
- In Search Console, request reindexing for top affected URLs. Don’t bother for thousands; fix systemic issues and let crawling catch up.
Facts and historical context (useful, not trivia)
- Fact 1: “Soft 404” isn’t an HTTP status. It’s a search engine classification layered on top of your actual response.
- Fact 2: Early web servers often served custom error pages, but the HTTP status code still mattered; search engines learned to distrust “pretty errors” with 200.
- Fact 3: WordPress’s permalink and rewrite flexibility is a double-edged sword: it makes many URLs “resolve,” even when no content exists.
- Fact 4: The industry trend toward SPAs and JS-heavy themes increased cases where crawlers receive “empty shells,” which can look like soft 404s when rendering fails.
- Fact 5: CDN adoption improved performance but amplified bugs: one bad response cached globally can change what crawlers see for days.
- Fact 6: Search engines treat mass homepage redirects from removed URLs as a quality issue; it resembles “sneaky redirects” and low-value behavior.
- Fact 7: 410 Gone has existed in HTTP for decades but is underused; it can speed up removal when you truly mean “gone forever.”
- Fact 8: Parameters and faceted navigation exploded with e-commerce and CMS tags; search engines started classifying many near-empty combinations as low value or soft-404-like.
- Fact 9: Bot management grew into a product category; accidentally challenging or blocking Googlebot is now a common cause of indexing anomalies.
FAQ
1) Is a soft 404 always bad?
It’s bad if it hits URLs you want indexed. If Google flags a junk URL as soft 404, that’s arguably a free cleanup service. The problem is when your important pages get misclassified.
2) Should I redirect all 404s to the homepage to “save SEO”?
No. That’s a classic soft 404 trigger. Redirect only when there’s a relevant replacement. Otherwise return 404 or 410.
3) What’s better: 404 or 410?
404 means “not found (maybe temporary).” 410 means “gone (intentional).” Use 410 for deliberate removals you don’t plan to restore, especially if they keep getting crawled.
4) Can thin content alone cause soft 404?
Yes. If a page is mostly boilerplate with little unique main content, Google may treat it as effectively missing. This often happens on empty tag archives, search result pages, and deep pagination.
5) My WordPress 404 page looks fine—why does Google still complain?
Because Google cares about the status code and the intent. If you return 200 with a 404-like message, or canonical everything to home, Google will call it what it is: not a real page.
6) Could caching cause soft 404 even if the origin is correct now?
Absolutely. A CDN can cache a transient empty response as 200. Googlebot hits the cached version, not your freshly fixed origin, until TTL expires or you purge.
7) Should I “noindex” pages that are soft 404?
If the page should exist and be indexed, fix it instead of noindexing it. If the page should exist but not be indexed (like internal search), noindex is appropriate—provided it’s still useful for users.
8) How long does it take Google to recover after fixes?
Depends on crawl frequency and site size. Important pages may update within days; long-tail pages can take weeks. The fastest path is: fix systemic issues, purge caches, then request reindexing for top URLs.
9) Do I need to change WordPress core to fix this?
Usually no. Most causes are theme behavior, plugin redirects/canonicals, server config (404-to-200 rewrites), and caching/WAF rules. Start there.
10) Why would Google mark a real page as soft 404 during a migration?
Migrations often create redirect chains, mismatched canonicals, thin placeholder pages, or temporary maintenance responses served as 200. Google sees inconsistency and classifies defensively.
Conclusion: next steps that actually move the needle
Soft 404s aren’t mystical. They’re a mismatch between what your server claims (200) and what your content communicates (“nothing here”), often amplified by redirects and caches. Fix the mismatch, and Google usually calms down.
- Pick 10 flagged URLs and run the fast diagnosis playbook: status, redirects, bot view, cache behavior.
- Remove blanket 404-to-home redirects. Replace with specific, relevant mappings or return 404/410.
- Make 404s real 404s. Audit Nginx/Apache rules and theme behavior that forces 200.
- Decide what’s index-worthy. If archives/search pages are thin, either enrich them or noindex and drop them from sitemaps.
- Fix cache policy so transient emptiness doesn’t become a 24-hour global truth.
- Add a boring weekly crawl check. Boring is reliable. Reliable is profitable.
If you do the above, Search Console will stop tattling, Googlebot will stop distrusting your site, and you’ll get fewer emergency pings about “traffic mysteriously dropping.” Which is the real KPI.