You submitted your sitemap. Search Console nodded politely. Days later: “Couldn’t fetch.” Or worse: “Success,” but nothing shows up in the index. Meanwhile marketing is asking if “Google is down” and your boss is asking for an ETA like indexing is a deploy.
Indexing isn’t a button. It’s a pipeline: fetch → parse → trust → schedule → crawl → decide. Your WordPress sitemap is just one artifact in that pipeline, and it fails in very specific, diagnosable ways. Let’s debug it like adults with shell access.
What “sitemap not indexing” actually means
People say “my sitemap isn’t indexing” when they mean one of four things:
- Google can’t fetch the sitemap (network, DNS, TLS, auth, robots, WAF, 4xx/5xx, redirect loops).
- Google fetches it but can’t parse it (wrong content-type, HTML instead of XML, broken XML, gzip issues, huge files, invalid dates, invalid URLs).
- Google parses it but ignores the URLs (blocked by robots, noindex, canonical mismatch, soft 404, duplicates, low quality, spam signals).
- Google indexes some URLs but not others (crawl budget, faceted URLs, poor internal linking, thin content, parameter bloat).
A sitemap is not an indexing guarantee. It’s a hint. A good hint helps discovery and prioritization; a bad hint is just noise that gets deprioritized. Your job is to make it fetchable, parseable, and trustworthy.
One operational reality: Search Console “Submitted” or even “Success” is not a pass/fail test. It’s a status of one step in a multi-step pipeline. Treat it like an HTTP 200 from a reverse proxy: pleasant, but not proof the app works.
Facts and historical context (why this is harder than it looks)
- Sitemaps are relatively new. The XML Sitemaps protocol was introduced in 2005; before that, discovery was mostly links and luck.
- Search engines treat sitemaps as hints, not commands. That “lastmod” you set is advisory; if it looks unreliable, it gets discounted.
- Google has supported sitemap index files for years because a single sitemap is capped by the protocol (50,000 URLs and 50MB uncompressed). Big sites must split.
- WordPress didn’t ship a native sitemap until WordPress 5.5 (2020). Before that, plugins dominated—and some still conflict with core behavior.
- Robots.txt predates sitemaps by over a decade. It’s from the mid-1990s, and it still quietly ruins modern SEO when someone copies a “temporary” block into production.
- Canonical tags changed the game. If your sitemap URL disagrees with your canonical, Google often trusts the canonical and ignores the sitemap entry.
- CDNs became default infrastructure. Great for latency. Also great at caching the wrong thing forever if you let them.
- HTTPS migration bugs are evergreen. Mixed http/https in sitemaps remains one of the most common “looks fine to humans” failures.
- WAFs and bot mitigation are now common. Many “security” presets challenge Googlebot with JS/CAPTCHA—then you wonder why discovery collapses.
Fast diagnosis playbook (first/second/third)
First: confirm what Google is trying to fetch
- In Search Console, check the exact sitemap URL you submitted (http vs https, www vs apex, trailing slash, path).
- Check the sitemap fetch status and “Last read.” If “Couldn’t fetch,” stop theorizing and start doing HTTP.
- Pick one URL from the sitemap that should index. Inspect it in Search Console (URL Inspection) and note: “Crawled as,” “Indexing allowed?,” “User-declared canonical,” “Google-selected canonical.”
Second: reproduce from the outside with curl
- Fetch the sitemap with curl and follow redirects. Verify status code, content-type, and body start.
- Verify it is XML (or gzipped XML), not HTML, not a login page, not a cached error page.
- Verify the redirect chain is short and stable (1–2 hops max).
Third: trace the failure inside your stack
- Check server logs for Googlebot fetch attempts and responses (status codes, bytes, user agent, edge location if behind CDN).
- Check robots.txt, meta robots tags, and HTTP headers (X-Robots-Tag).
- Check plugin conflicts (core sitemap vs Yoast/Rank Math) and caching layers returning wrong variants.
If you do those three phases in that order, you’ll usually find the bottleneck in under 20 minutes. If you skip to “reinstall SEO plugin,” you’re just rebooting the printer.
Failure modes that block sitemap indexing
1) The sitemap URL is “right” in your head, wrong in reality
WordPress can expose multiple sitemap endpoints depending on core and plugins:
- Core: /wp-sitemap.xml (and related indexes).
- Yoast: /sitemap_index.xml
- Rank Math: /sitemap_index.xml (same path, different generator)
Common mess: you submit /wp-sitemap.xml but a plugin disables core and serves something else, or your reverse proxy rewrites the path and hands back a 200 HTML error page. Google doesn’t “figure it out.” It just fails or ignores.
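A quick way to stop guessing is to probe each candidate endpoint from outside and compare status, content-type, and where it lands. A minimal sketch; paths and output are illustrative, swap in your host:

cr0x@server:~$ for p in /wp-sitemap.xml /sitemap_index.xml /sitemap.xml; do
  printf '%-20s ' "$p"
  curl -sS -o /dev/null -L -w "%{http_code} %{content_type} -> %{url_effective}\n" "https://www.example.com$p"
done
/wp-sitemap.xml      404 text/html; charset=UTF-8 -> https://www.example.com/wp-sitemap.xml
/sitemap_index.xml   200 application/xml; charset=UTF-8 -> https://www.example.com/sitemap_index.xml
/sitemap.xml         200 application/xml; charset=UTF-8 -> https://www.example.com/sitemap_index.xml

Whichever endpoint returns 200 with an XML content-type is the one to submit; the others should either redirect to it or 404, not serve a second competing sitemap.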
2) Robots.txt blocks the sitemap or the URLs inside it
Blocking the sitemap fetch is obvious. Blocking the URLs listed in the sitemap is sneakier: the sitemap can be fetched and parsed, but none of its URLs are eligible to crawl.
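You can cross-check the two artifacts directly. A rough sketch (host, paths, and output are illustrative; it does a plain prefix match and ignores wildcards and Allow rules): download one child sitemap, then count how many of its URLs fall under each Disallow prefix in robots.txt.

cr0x@server:~$ host=https://www.example.com
cr0x@server:~$ curl -sS "$host/post-sitemap.xml" -o /tmp/post-sitemap.xml
cr0x@server:~$ curl -sS "$host/robots.txt" | tr -d '\r' \
  | awk -F': *' 'tolower($1)=="disallow" && $2!="" {print $2}' \
  | while read -r prefix; do
      hits=$(grep -Eo "<loc>[^<]+</loc>" /tmp/post-sitemap.xml | grep -c "$host$prefix")
      echo "Disallow: $prefix -> $hits sitemap URLs affected"
    done
Disallow: /wp-admin/ -> 0 sitemap URLs affected
Disallow: /private/ -> 37 sitemap URLs affected

Any non-zero count means you are handing Google URLs it isn't allowed to crawl; either remove them from the sitemap or unblock them.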
3) Noindex, X-Robots-Tag, or a security header shuts the door
WordPress sites often accumulate noindex in three places:
- Meta tag in HTML (<meta name="robots" content="noindex">), often set by “Discourage search engines” or an SEO plugin toggle.
- HTTP header (X-Robots-Tag: noindex), usually set by nginx/Apache rules or a security plugin.
- Robots directives in robots.txt that disallow crawling.
4) Canonical mismatch and duplicate URL variants
If your sitemap lists http://example.com/page but the page canonicals to https://www.example.com/page/, Google will often treat the sitemap entry as a low-quality hint. Multiply that by thousands and the sitemap becomes background noise.
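A rough way to measure how bad it is: sample a handful of sitemap URLs and compare each one with the canonical the page actually declares. A sketch, assuming the canonical tag is emitted as rel="canonical" followed by href (the order most SEO plugins use); output is illustrative.

cr0x@server:~$ curl -sS https://www.example.com/post-sitemap.xml \
  | grep -Eo "<loc>[^<]+</loc>" | sed 's/<[^>]*>//g' | head -n 5 \
  | while read -r u; do
      c=$(curl -sS "$u" | grep -Eio 'rel="canonical" href="[^"]+"' | grep -Eo 'https?://[^"]+')
      [ "$c" = "$u" ] && echo "OK    $u" || echo "DIFF  $u -> ${c:-no canonical found}"
    done
OK    https://www.example.com/hello-world/
DIFF  http://example.com/about/ -> https://www.example.com/about/

Every DIFF line is a URL where the sitemap and the page disagree; Google will usually side with the canonical.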
5) The sitemap returns HTML (or JSON) instead of XML
This happens due to caching, WAF challenges, “maintenance mode,” or forced login pages. Your browser may render something that looks plausible; Googlebot sees a different variant. If your CDN is doing device or bot-based variation, congratulations: you built a split-brain.
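One fast sanity check: fetch the same sitemap with a browser-ish user agent and a Googlebot user agent, then diff the first chunk of each response. Silence means both variants match; any output means some layer is serving bots a different body. (Caveat: this runs from your IP, so a WAF keying on IP reputation can still behave differently for real Googlebot.)

cr0x@server:~$ diff \
  <(curl -sS -A "Mozilla/5.0" https://www.example.com/sitemap_index.xml | head -c 300) \
  <(curl -sS -A "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" https://www.example.com/sitemap_index.xml | head -c 300)
cr0x@server:~$ echo $?
0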
6) 200 OK with an error payload (soft failures)
Operationally, the worst bug is 200 OK with a body that says “Error”. Some plugins and themes do this. Some reverse proxies do this when upstream is down and a stale cached page is served. Google will parse garbage and move on.
7) Redirect chains, loops, and “helpful” normalization
Three redirects are not a strategy. Long redirect chains waste crawl budget and sometimes break fetch. Redirect loops are self-explanatory and yet still happen because someone “fixed” both www→apex and apex→www at different layers.
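curl will count the hops for you; anything above one or two for a sitemap URL deserves a ticket. Output is illustrative.

cr0x@server:~$ curl -sS -o /dev/null -L --max-redirs 10 \
  -w "hops=%{num_redirects} status=%{http_code} final=%{url_effective}\n" \
  http://example.com/sitemap_index.xml
hops=2 status=200 final=https://www.example.com/sitemap_index.xml

Two hops here (http→https, then apex→www) means normalization is happening at two layers; collapse it into one rule at one layer.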
8) Server errors and rate limiting
Googlebot is polite, but it will back off when you throw 429s or inconsistent 5xx. If your sitemap fetch yields 503 during peak, Search Console might show intermittent fetch failures. That translates to delayed discovery, and “delayed discovery” becomes “not indexing” in executive language.
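Your access logs already show whether Googlebot is getting pushed back. A quick status-code distribution makes intermittent 429/5xx visible (assumes the default nginx combined log format, where the status code is field 9; counts are illustrative):

cr0x@server:~$ sudo awk '/Googlebot/ {print $9}' /var/log/nginx/access.log | sort | uniq -c | sort -rn
   1843 200
     92 301
     41 503
     17 429

Even a few percent of 503/429 is enough to slow crawling; chase those before arguing about content quality.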
9) Large sitemaps and slow generation
Dynamic sitemap generation can be expensive on WordPress. If generating /sitemap_index.xml triggers heavy database queries and times out behind your proxy, you’ll see partial output, truncated XML, or 504s. Splitting sitemaps helps, but caching and pre-generation help more.
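One way to make generation cheap without touching plugin code is a short-lived cache at the web server for sitemap paths only. A minimal nginx sketch, assuming PHP-FPM behind fastcgi; the zone name, regex, and socket path are examples, and the fastcgi_cache_path line belongs in the http block:

fastcgi_cache_path /var/cache/nginx/sitemaps levels=1:2 keys_zone=sitemaps:10m max_size=50m inactive=30m;

# Inside your existing server block:
location ~ ^/(wp-sitemap[^/]*\.xml|sitemap_index\.xml|[^/]*-sitemap[0-9]*\.xml)$ {
    fastcgi_cache sitemaps;
    fastcgi_cache_key "$scheme$host$request_uri";
    fastcgi_cache_valid 200 10m;                       # serve the cached copy for 10 minutes
    fastcgi_cache_use_stale error timeout updating http_500 http_503;
    add_header X-Sitemap-Cache $upstream_cache_status; # HIT/MISS for debugging

    include fastcgi_params;
    fastcgi_param SCRIPT_FILENAME $document_root/index.php;
    fastcgi_pass unix:/run/php/php8.2-fpm.sock;        # adjust to your pool
}

With something like that in place, Googlebot hammering child sitemaps hits nginx, not PHP and MySQL.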
10) Bad lastmod hygiene (trust erosion)
If every URL shows the same lastmod timestamp, every day, forever, Google learns it’s meaningless. If your sitemap always claims “everything changed,” Google will treat it like the boy who cried deploy.
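You can quantify the problem in one line: count how many URLs share the same lastmod value. Output is illustrative.

cr0x@server:~$ curl -sS https://www.example.com/post-sitemap.xml \
  | grep -Eo "<lastmod>[^<]+</lastmod>" | sort | uniq -c | sort -rn | head -n 3
   4982 <lastmod>2025-12-27T10:00:00+00:00</lastmod>
      3 <lastmod>2025-12-20T08:41:12+00:00</lastmod>
      1 <lastmod>2025-11-02T14:05:33+00:00</lastmod>

If one timestamp dominates like that, lastmod is being stamped at generation time instead of reflecting real edits, and Google will learn to ignore it.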
Paraphrased idea from Werner Vogels: Everything fails all the time; build systems that tolerate failure and recover quickly.
It applies to SEO plumbing, too: build for correctness, observability, and graceful degradation.
Short joke #1: A sitemap that “indexes itself” is like a pager that acknowledges its own alerts—comforting, but not useful.
Practical tasks: commands, outputs, and decisions (12+)
These tasks assume you have shell access to a host that can reach your site (a bastion, a CI runner, or even your web server). Replace example.com with your domain. The point is not the exact command; it’s the decision you make from the output.
Task 1: Fetch the sitemap and verify status, redirects, and content-type
cr0x@server:~$ curl -sSIL -A "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" https://example.com/sitemap_index.xml
HTTP/2 301
location: https://www.example.com/sitemap_index.xml
HTTP/2 200
content-type: application/xml; charset=UTF-8
cache-control: max-age=300
What it means: One redirect to the canonical host, then 200 with XML content-type. That’s healthy.
Decision: If you see 403/503/429, fix edge/WAF/rate limits first. If content-type is text/html, go hunt the layer returning HTML.
Task 2: Confirm the body is actually XML (not a login page)
cr0x@server:~$ curl -sS -A "Googlebot" https://www.example.com/sitemap_index.xml | head -n 5
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://www.example.com/post-sitemap.xml</loc>
    <lastmod>2025-12-20T08:41:12+00:00</lastmod>
What it means: Starts with XML declaration and a sitemapindex node. Good.
Decision: If you see HTML (<!doctype html>) or a “Please enable cookies” page, your WAF/CDN is serving the wrong variant to bots.
Task 3: Validate sitemap XML well-formedness quickly
cr0x@server:~$ curl -sS https://www.example.com/post-sitemap.xml | xmllint --noout -
cr0x@server:~$ echo $?
0
What it means: Exit code 0 means the XML is well-formed.
Decision: Non-zero means broken XML. Fix generation (plugin/theme), truncation (timeouts), or compression (double gzip) before anything else.
Task 4: Check for accidental noindex via HTTP headers
cr0x@server:~$ curl -sSI https://www.example.com/ | egrep -i "x-robots-tag|location|content-type"
content-type: text/html; charset=UTF-8
x-robots-tag: noindex, nofollow
What it means: Your entire site is being noindexed at the header level.
Decision: Remove or scope that header. If it’s from nginx/Apache, fix config. If it’s from a security plugin, disable that feature or whitelist.
Task 5: Check robots.txt fetch and contents (including sitemap directive)
cr0x@server:~$ curl -sS https://www.example.com/robots.txt
User-agent: *
Disallow: /wp-admin/
Disallow: /
Sitemap: https://www.example.com/sitemap_index.xml
What it means: Disallow: / blocks everything. The sitemap directive doesn’t override it.
Decision: Remove the global disallow (unless you truly want the site hidden). If staging needs it, don’t copy staging robots.txt to prod.
Task 6: Check the sitemap URLs for mixed host/protocol and redirects
cr0x@server:~$ curl -sS https://www.example.com/post-sitemap.xml | grep -Eo "<loc>[^<]+</loc>" | head
<loc>http://example.com/hello-world/</loc>
<loc>http://example.com/about/</loc>
<loc>http://example.com/contact/</loc>
What it means: The sitemap is emitting http URLs on the apex domain, not your canonical https://www host.
Decision: Fix WordPress Site URL/Home URL settings, plugin settings, and any hardcoded filters. Then resubmit the sitemap. Don’t rely on redirects as a “fix.”
Task 7: Sample a URL from the sitemap and verify canonical and robots meta
cr0x@server:~$ curl -sS https://www.example.com/hello-world/ | egrep -i 'rel="canonical"|name="robots"'
<link rel="canonical" href="https://www.example.com/hello-world/" />
<meta name="robots" content="index, follow" />
What it means: Canonical matches the URL and it’s indexable.
Decision: If canonical points elsewhere or robots meta shows noindex, fix the template/SEO plugin rules. If it’s intentional (tag archives, internal search), remove those URLs from sitemap.
Task 8: Detect if the sitemap is being gzipped or double-compressed
cr0x@server:~$ curl -sSI -H "Accept-Encoding: gzip" https://www.example.com/post-sitemap.xml | egrep -i "content-encoding|content-type"
content-type: application/xml; charset=UTF-8
content-encoding: gzip
What it means: Gzip is enabled; that’s fine.
Decision: If you see gzipped content but the body is not actually gzip (or it’s gzip inside gzip), fix server/CDN compression settings. Googlebot is patient, not psychic.
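To tell single gzip from double gzip, ask for the raw gzipped response (plain Accept-Encoding, not --compressed, so curl doesn't decode it for you), decompress exactly once, and see what's left. Output is illustrative.

cr0x@server:~$ curl -sS -H "Accept-Encoding: gzip" https://www.example.com/post-sitemap.xml | gunzip | file -
/dev/stdin: XML 1.0 document, ASCII text

XML after one gunzip is healthy. If file still reports gzip compressed data, some pair of layers (plugin plus server, or server plus CDN) is compressing twice; turn one of them off.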
Task 9: Confirm the sitemap is not blocked by basic auth or IP allowlists
cr0x@server:~$ curl -sSIL https://www.example.com/sitemap_index.xml | head
HTTP/2 401
www-authenticate: Basic realm="Restricted"
What it means: The sitemap is behind authentication. Google can’t fetch it.
Decision: Remove auth from public site paths or selectively protect admin-only areas. If this is a staging environment, don’t submit its sitemap.
Task 10: Check nginx access logs for Googlebot sitemap fetches and status codes
cr0x@server:~$ sudo grep -E "Googlebot|sitemap" /var/log/nginx/access.log | tail -n 5
203.0.113.10 - - [27/Dec/2025:10:21:12 +0000] "GET /sitemap_index.xml HTTP/2.0" 200 1249 "-" "Googlebot/2.1"
203.0.113.10 - - [27/Dec/2025:10:21:13 +0000] "GET /post-sitemap.xml HTTP/2.0" 503 182 "-" "Googlebot/2.1"
What it means: Index fetched, but a child sitemap intermittently fails with 503.
Decision: Fix origin stability for child sitemaps (timeouts, PHP-FPM saturation, database). Google will treat flaky sitemap hosts as unreliable.
Task 11: Look for PHP-FPM saturation that causes intermittent 504/503
cr0x@server:~$ sudo tail -n 8 /var/log/php8.2-fpm.log
[27-Dec-2025 10:21:13] WARNING: [pool www] server reached pm.max_children setting (20), consider raising it
[27-Dec-2025 10:21:13] WARNING: [pool www] child 1942 said into stderr: "script_filename = /var/www/html/index.php"
What it means: Your PHP pool is maxed out. Sitemap generation can tip it over during crawls.
Decision: Add caching for sitemap endpoints, raise capacity carefully, or reduce expensive DB queries. Don’t just crank pm.max_children without memory math.
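The memory math is one pipeline away: average worker RSS times pm.max_children should fit in the RAM you can spare for PHP. A sketch; the process name varies by distro and PHP version, and the numbers are illustrative.

cr0x@server:~$ ps --no-headers -o rss -C php-fpm8.2 \
  | awk '{sum+=$1; n++} END {printf "workers=%d avg_rss=%.0fMB projected_at_20=%.0fMB\n", n, sum/n/1024, 20*sum/n/1024}'
workers=14 avg_rss=87MB projected_at_20=1740MB

If the projection doesn't fit alongside MySQL and the OS page cache, raising pm.max_children just moves the failure from 503s to the OOM killer.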
Task 12: Verify WordPress “Discourage search engines” setting via wp-cli
cr0x@server:~$ cd /var/www/html
cr0x@server:~$ wp option get blog_public
0
What it means: WordPress is set to discourage indexing (often sets noindex via plugins/themes or affects robots output).
Decision: Set it to 1 in production, then confirm actual output headers/meta/robots.
cr0x@server:~$ wp option update blog_public 1
Success: Updated 'blog_public' option.
Task 13: Confirm which sitemap generator is active (plugin conflicts)
cr0x@server:~$ wp plugin list --status=active
+--------------------+--------+-----------+---------+
| name | status | update | version |
+--------------------+--------+-----------+---------+
| wordpress-seo | active | none | 22.5 |
| rank-math | active | available | 1.0.225 |
| wp-super-cache | active | none | 1.12.3 |
+--------------------+--------+-----------+---------+
What it means: Two SEO plugins active. This is how you get dueling canonicals, dueling sitemaps, and dueling blame.
Decision: Pick one SEO plugin. Disable the other. Then confirm the sitemap endpoint and canonical behavior again.
Task 14: Verify that the sitemap isn’t cached incorrectly by your CDN
cr0x@server:~$ curl -sSI https://www.example.com/sitemap_index.xml | egrep -i "cf-cache-status|age|via|x-cache"
cf-cache-status: HIT
age: 86400
What it means: The CDN is serving a day-old cached sitemap. That can be fine—unless it cached an error page or stale content after a migration.
Decision: Purge sitemap URLs, set a sane TTL, and consider “Cache Everything” exceptions. Sitemaps should be cacheable, but not permanently wrong.
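Purging just the sitemap URLs on deploy is cheap to automate. A hedged example against Cloudflare's purge-by-URL API ($CF_ZONE_ID and $CF_API_TOKEN are placeholders; other CDNs expose equivalent purge endpoints, and the response shown is illustrative):

cr0x@server:~$ curl -sS -X POST "https://api.cloudflare.com/client/v4/zones/$CF_ZONE_ID/purge_cache" \
  -H "Authorization: Bearer $CF_API_TOKEN" \
  -H "Content-Type: application/json" \
  --data '{"files":["https://www.example.com/sitemap_index.xml","https://www.example.com/post-sitemap.xml"]}'
{"success":true,"errors":[],"messages":[],"result":{"id":"..."}}

Wire that into the deploy pipeline and the “day-old cached sitemap” class of problems mostly disappears.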
Task 15: Check database performance impact if sitemaps are dynamic
cr0x@server:~$ mysql -NBe "SHOW FULL PROCESSLIST" | head
12345 root localhost wp Query 2 Sending data SELECT ID, post_title FROM wp_posts WHERE post_status='publish' ORDER BY post_modified DESC LIMIT 50000
What it means: Sitemap generation may be doing heavy queries. On shared DBs, this competes with page loads.
Decision: Cache sitemap output, pre-generate, or tune queries/indexes. If your sitemap endpoint is “slow,” Googlebot will learn it and fetch less often.
Three corporate mini-stories from the trenches
Mini-story 1: The incident caused by a wrong assumption
They migrated a WordPress marketing site behind a CDN and a shiny new WAF. The team assumed: “If the homepage works in a browser, Google can crawl it.” That assumption belonged in a museum next to Netscape.
Search Console started showing sitemap fetch errors. The SEO manager escalated, the platform team shrugged, and everyone pointed at everyone. The sitemap URL returned 200 and looked fine—in a normal browser. When fetched with a bot user agent, the WAF served a JavaScript challenge page. Still 200. Still “fine” to anyone not actually parsing it as XML.
The fix was embarrassingly simple: a WAF rule to allow verified crawlers to fetch sitemap endpoints and key content paths without challenges. The real work was building the discipline: curl tests in CI, a “bot fetch” synthetic check, and a policy that “security controls must be observable.”
After the change, indexing recovered slowly over days. That delay caused more drama than the outage itself. But that’s the deal: crawling is eventually consistent, and the pipeline has backoff. If your org can’t handle delayed gratification, don’t break discovery.
Mini-story 2: The optimization that backfired
A different company tried to speed things up by caching everything at the edge. Their CDN rule: cache HTML for 24 hours, ignore query strings, and “optimize” content. The site felt fast. The dashboards went green. Everyone celebrated and went home early.
Then they launched a new product line and updated hundreds of pages. Internally, pages were updated. Externally, Googlebot mostly saw old content. The sitemap updated on the origin, but the CDN kept serving an older cached sitemap index with stale lastmod values—and occasionally a cached 503 response that the origin had served during a deploy.
Search Console reported oddities: “Sitemap can be read, but has errors,” and URLs stayed “Discovered – currently not indexed” for longer than normal. The team tried to “force indexing” by resubmitting, which did nothing, because the fetch still hit the stale edge object.
The fix: carve out explicit caching behavior for sitemap endpoints (short TTL, no transformation, cache-key includes host, and purge on deploy). They also stopped auto-updating lastmod for unchanged pages because it trained Google to ignore it. The site got slightly less “perfect” in Lighthouse. Indexing got a lot more predictable. Pick your battles.
Mini-story 3: The boring but correct practice that saved the day
At a large enterprise, WordPress was only one of many platforms. They had a dull policy: every public hostname must have a standard “crawlability runbook” and a weekly automated check that validates robots, sitemap, and a few canonical pages. No exceptions. No heroics.
One Friday, a DNS change intended for staging leaked into production: the apex domain started redirecting to a maintenance host for a subset of users. Browsers sometimes worked thanks to cached HSTS and happy-path routing. Googlebot, however, followed redirects like it was paid per hop and ended up at a 404 page with a 200 status code (because of course it did).
The automated check caught it within minutes. Not because it was clever—because it was boring: curl fetch, validate XML, verify canonical, alert on non-XML content-type. The on-call had a clear diff of what changed, and rolled back DNS before the crawl pipeline fully backed off.
Nothing dramatic happened. And that’s the point. The best SEO outage is the one that doesn’t become a meeting.
Short joke #2: The only thing more optimistic than a sitemap is the project plan that says “indexing: 2 days.”
Common mistakes: symptom → root cause → fix
“Couldn’t fetch” in Search Console
- Symptom: Search Console shows “Couldn’t fetch” or “Fetch failed.”
- Root cause: 401/403 from WAF, auth, geo blocks, or IP allowlists; 5xx due to origin instability; DNS/TLS issues.
- Fix: Reproduce with curl using a Googlebot UA; check access logs; whitelist bot traffic for sitemap endpoints; stabilize origin and remove auth from public paths.
“Sitemap is HTML” or “Invalid format”
- Symptom: Search Console complains about format, or reports parsing errors.
- Root cause: CDN/WAF returns an HTML challenge, a login page, or a maintenance page; PHP timeout truncates XML.
- Fix: Ensure content-type: application/xml; bypass transformations for sitemap routes; add caching for sitemap generation; increase upstream timeouts only if you also address slowness.
“Success” but URLs don’t index
- Symptom: Sitemap status is “Success,” but coverage shows “Discovered – currently not indexed” or “Crawled – currently not indexed.”
- Root cause: URLs blocked by robots/noindex; canonical mismatch; thin/duplicate content; internal linking weak; too many low-value URLs.
- Fix: Pick representative URLs and inspect them; remove noindex and fix canonicals; prune sitemaps to only index-worthy URLs; improve internal links to high-value pages.
Only some sitemaps work (index file ok, child sitemaps fail)
- Symptom: Sitemap index fetches, but child sitemap intermittently fails or returns 5xx.
- Root cause: Dynamic generation heavy; PHP-FPM maxed; DB slow; caching inconsistent; rate limiting.
- Fix: Cache child sitemap output; pre-generate; split by post type/date; tune PHP-FPM and DB; add monitoring on sitemap endpoints.
Indexing collapsed after HTTPS or domain migration
- Symptom: URLs drop out; Search Console shows duplicate and alternate canonical problems.
- Root cause: Mixed http/https and www/apex variants in sitemap; old canonicals; redirect loops; inconsistent host normalization between CDN and origin.
- Fix: Make one canonical host/protocol; update WordPress settings; regenerate sitemaps; keep redirects simple and consistent at one layer; audit canonical tags.
Mass “Excluded by ‘noindex’ tag”
- Symptom: Coverage shows many excluded due to noindex.
- Root cause: “Discourage search engines” toggled; SEO plugin templates applied noindex to posts; HTTP X-Robots-Tag set broadly.
- Fix: Confirm with curl; fix at the narrowest responsible layer; then remove noindexed URLs from the sitemap (don’t list what you refuse to index).
“Submitted URL seems to be a Soft 404”
- Symptom: Google treats pages as soft 404 even though they return 200.
- Root cause: Thin content, “no results found” templates, blocked content to bots, or error pages returning 200.
- Fix: Return proper 404/410 for missing content; avoid cloaking; improve real content and internal links; remove junk URLs from sitemaps.
Checklists / step-by-step plan
Step-by-step: make the sitemap fetchable and trustworthy
- Pick one sitemap generator. Core or one plugin. Disable the rest. Conflicts are not “redundancy.”
- Lock your canonical host/protocol. Decide on https and either www or apex. Enforce it with a single redirect rule at the edge or origin (not both fighting).
- Validate sitemap endpoint behavior. 200, XML content-type, sane caching, no transforms, no auth, no challenges.
- Validate child sitemaps. Randomly sample 5 child sitemaps and 10 URLs. If you don’t sample, you’re trusting the most optimistic part of your stack.
- Remove low-value URLs from sitemaps. No tag archives unless you mean it. No internal search. No paginated junk. No parameter variants.
- Fix robots/noindex/canonical mismatches. Don’t list URLs you disallow, noindex, or canonicalize away. That’s just wasting everyone’s time.
- Make sitemap generation cheap. Cache output. If it’s dynamic and expensive, pre-generate or use object caching. Googlebot traffic should not be your load test.
- Resubmit sitemap after real changes. Not as a ritual. Resubmit when you changed the endpoint, host, protocol, or removed blocks.
- Monitor it like an API. Synthetic checks that validate content-type and XML parseability beat waiting for Search Console to sulk.
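A minimal synthetic check sketch for that last step, suitable for cron or CI. The URL is a placeholder; swap the echo lines for your real alerting.

#!/usr/bin/env bash
# Fails loudly if the sitemap stops being fetchable, XML-typed, or parseable.
set -euo pipefail

URL="https://www.example.com/sitemap_index.xml"
UA="Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
TMP=$(mktemp)
trap 'rm -f "$TMP"' EXIT

# One request: save the body, capture status code and content-type.
read -r code ctype < <(curl -sS -A "$UA" -o "$TMP" -w "%{http_code} %{content_type}" "$URL")

[ "$code" = "200" ]              || { echo "ALERT: $URL returned HTTP $code"; exit 1; }
echo "$ctype" | grep -qi "xml"   || { echo "ALERT: $URL content-type is '$ctype', not XML"; exit 1; }
xmllint --noout "$TMP"           || { echo "ALERT: $URL body is not well-formed XML"; exit 1; }

echo "OK: $URL is fetchable, XML, and parseable"

Run it every few minutes from outside your network and you'll hear about a broken sitemap before Search Console does.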
Operational checklist: before blaming Google
- Can you fetch the sitemap from a non-browser client?
- Does it return XML and validate?
- Are there any 4xx/5xx/429 responses in logs for sitemap paths?
- Is robots.txt permissive for the content you want indexed?
- Do sample URLs return indexable signals (no noindex, canonical matches)?
- Is your CDN caching or transforming sitemap responses?
- Did you recently deploy, migrate, change DNS, or enable a WAF mode?
Minimal “known good” sitemap hygiene
- Use absolute URLs in <loc>, canonical host only.
- Split sitemaps when large; use a sitemap index.
- Set lastmod only when content actually changes.
- Don’t include URLs that are noindex, redirected, or blocked by robots.
- Serve with correct content-type and stable caching.
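For reference, a minimal well-formed child sitemap that follows those rules looks like this (URLs and dates are illustrative; lastmod appears only where the content actually changed):

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/hello-world/</loc>
    <lastmod>2025-12-20T08:41:12+00:00</lastmod>
  </url>
  <url>
    <loc>https://www.example.com/about/</loc>
  </url>
</urlset>

If your generator can't produce something this boring, that's the bug.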
FAQ
Why does Search Console say “Success” but my pages still aren’t indexed?
Because “Success” means Google fetched and parsed the sitemap. Indexing depends on the URLs themselves: robots/noindex, canonical choice, perceived quality, duplicates, and crawl scheduling.
Which sitemap should I submit for WordPress: wp-sitemap.xml or sitemap_index.xml?
Submit the one your site actually serves as the authoritative sitemap index. If you use an SEO plugin that provides /sitemap_index.xml, submit that. If you rely on core, submit /wp-sitemap.xml. Don’t submit both unless you enjoy duplicate signals.
Can a CDN break sitemap indexing?
Absolutely. CDNs can cache stale sitemaps, transform content, or serve bot challenges. Make sitemap endpoints predictable: short TTL, no HTML rewriting, no bot challenges, and purge on deploy/migration.
Do I need a Sitemap directive in robots.txt?
It helps but isn’t required if you submit in Search Console. It’s still worth adding because other crawlers use it, and it acts like a signpost when debugging.
Should I include category/tag archives in my sitemap?
Only if those pages are genuinely valuable landing pages with unique content. If they’re thin, duplicate, or autogenerated sludge, noindex them and remove them from the sitemap.
How often should my sitemap update?
Whenever meaningful content changes. Avoid updating lastmod for every URL on every request; it trains search engines to ignore your timestamps. Cache the sitemap and regenerate on publish/update events.
Does WordPress “Discourage search engines” block indexing completely?
It can, depending on your theme/plugins and how they implement it. Treat it as a red alert in production. Verify actual signals with curl: robots meta and any X-Robots-Tag headers.
What’s the fastest way to tell if Googlebot is being blocked?
Check your access logs for requests with Googlebot UA to sitemap paths and key pages, then verify status codes and bytes. Combine that with curl using a Googlebot UA from outside your network.
Do redirects in a sitemap matter?
Yes. A few redirects won’t kill you, but sitemaps should list final canonical URLs. If everything redirects, you’re wasting crawl budget and signaling poor hygiene.
My sitemap has 50,000+ URLs. Is that a problem?
It’s a problem if it’s one file. Split it and use a sitemap index. More importantly: make sure those URLs deserve indexing. Large sitemaps often hide a quality problem, not a technical one.
Conclusion: next steps that actually move the needle
If you want your WordPress sitemap to drive indexing, stop treating it like a magical SEO artifact and start treating it like an API endpoint with strict consumers.
- Run the fast diagnosis playbook: confirm the exact URL, curl it like Googlebot, then verify logs and directives.
- Fix fetchability first: 200 OK, correct XML, no auth, no WAF challenges, no redirect circus.
- Fix trust signals next: canonical consistency, noindex/robots alignment, and only include URLs you actually want indexed.
- Make it operational: monitor sitemap endpoints, cache intelligently, and test crawlability after every migration or CDN/security change.
Do that, and “not indexing” turns from an anxious mystery into a straightforward incident with a root cause and a change ticket. The way it should have been all along.