WordPress Multilingual: The Polylang Trap That Creates Duplicate Pages


You wake up to a Search Console alert: “Duplicate, submitted URL not selected as canonical.”
Traffic is down. Your content team swears nothing changed. Your SEO agency pings you with a spreadsheet full of near-identical URLs that differ only by /en/ vs no prefix, or a mysterious ?lang=en.

This is the Polylang trap: the site “works” for humans, but the URL surface area quietly explodes. Crawlers don’t forgive ambiguity.
And caches love making it worse.

What “duplicate pages” actually means in Polylang land

“Duplicate pages” is overloaded. In a Polylang setup it can mean at least four different things, and each demands a different fix.

1) Duplicate content from multiple URLs resolving to the same language page

Example: /about/ and /en/about/ both serve English. Or /de/uber-uns/ and /uber-uns/?lang=de.
Humans click one. Google crawls both, then picks one as canonical, sometimes the wrong one.

2) Duplicate “entities” in WordPress: posts created twice

This is the “we have two English pages titled About” scenario. Usually caused by importers, page builders, or a translation workflow that created a new post instead of linking a translation.
This is messier because it’s not just URL hygiene; it’s data integrity.

3) Duplicate taxonomy archives and term pages

Categories and tags can multiply. A translated category slug might exist both translated and untranslated. Worse: misconfigured language filtering can expose the same term under multiple language contexts.

4) Duplicate cache objects that serve the wrong language under the right URL

This is the quiet killer: /fr/produit/ sometimes returns English because the cache key ignored language. Then Polylang tries to “fix” it with redirects.
Result: redirect loops, mixed canonicals, and a crawler party you didn’t invite.

The right question is not “do we have duplicates?” It’s “what layer is duplicating: routing, canonicalization, data, or cache?”
Diagnose that first. Fixing the wrong layer is how you end up with a bilingual site that’s also bi-directionally broken.

How Polylang creates duplicates (the usual mechanisms)

Language in URL: directory, subdomain, or parameter

Polylang supports language negotiation via URL directory (e.g., /en/), subdomain (e.g., en.example.com), or parameter (e.g., ?lang=en).
Each has different failure modes.

  • Directory-based is generally the least bad for SEO, but it demands strict redirects so only one form exists.
  • Subdomain-based makes caching and cookie scoping trickier but isolates languages cleanly if done well.
  • Parameter-based is the easiest way to manufacture duplicates, because lots of systems treat query strings as optional “same page.” Crawlers don’t.
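When you audit logs or a crawl export, it helps to label each URL by the mode it actually uses. The helper below is a hypothetical sketch, not part of Polylang or WP-CLI. It assumes the language codes en, fr, and de, so edit LANGS for your site.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Polylang): report how a URL carries its language.
# Assumption: the site's language codes are en, fr and de; edit LANGS to match yours.
LANGS="en|fr|de"

lang_mode() {
  local url="$1" rest host path="/"
  rest="${url#*://}"                               # drop the scheme
  host="${rest%%/*}"                               # everything before the first slash
  [[ "$rest" == */* ]] && path="/${rest#*/}"       # path plus any query string

  local re_param="[?&]lang=(${LANGS})(&|\$)"
  local re_dir="^/(${LANGS})(/|\$)"
  local re_sub="^(${LANGS})\."

  if [[ "$url" =~ $re_param ]]; then
    echo "parameter ${BASH_REMATCH[1]}"
  elif [[ "$path" =~ $re_dir ]]; then
    echo "directory ${BASH_REMATCH[1]}"
  elif [[ "$host" =~ $re_sub ]]; then
    echo "subdomain ${BASH_REMATCH[1]}"
  else
    echo "none"                                    # unprefixed form (default language, if any)
  fi
}
```

Run it over any URL list (for example, while read -r u; do lang_mode "$u"; done < urls.txt | cut -d' ' -f1 | sort | uniq -c, with urls.txt being whatever export you have). On a directory-mode site, a nonzero count of parameter lines is a leak worth chasing.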

The “default language” ambiguity

Most Polylang duplicate incidents start with the default language being reachable in two ways:
/about/ and /en/about/.
Someone decides “both are fine.” They aren’t.

Pick one. Redirect the other. Then enforce it at the edge (Nginx/CDN), not only in PHP where it’s slower and easier to bypass.
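The rule is small enough to write down and test before it goes into Nginx or the CDN. A minimal sketch, assuming English is the default language and you chose the unprefixed form; the function name and DEFAULT_LANG are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical sketch: the redirect decision for the default language.
# Assumption: English is the default and the unprefixed form (/about/) is canonical.
DEFAULT_LANG="en"

# Prints "301 <target>" when the path must redirect, or "200" when it is already canonical.
default_lang_decision() {
  local path="$1"
  local prefix="/${DEFAULT_LANG}/"
  if [[ "$path" == "/${DEFAULT_LANG}" || "$path" == "$prefix" ]]; then
    echo "301 /"                                  # /en or /en/ -> home page
  elif [[ "$path" == "$prefix"* ]]; then
    echo "301 /${path#"$prefix"}"                 # /en/about/ -> /about/
  else
    echo "200"                                    # unprefixed or another language
  fi
}
```

If you chose to prefix the default language instead, invert it: unprefixed paths get a 301 into /en/. Either way, the function is a test oracle for your edge rules, not a replacement for them.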

Canonicals and hreflang that drift apart

Canonical tags tell crawlers which URL is the preferred version. hreflang tells crawlers how language/region variants relate.
When they disagree—say canonical points to /about/ but hreflang lists /en/about/—you’ve told Google two different stories.
Google will choose a third story.

Sitemaps that list both forms

If your sitemap emits both /en/about/ and /about/ (or mixes query parameter variants), you’ve escalated the issue from “possible duplicate” to “invited duplicate.”
Sitemaps are a declaration of intent. If you list garbage, you get garbage indexing.

Caching that ignores language

Caches need a key. If language is decided by cookie, header, or query param, and you don’t vary the cache key accordingly, you will serve the wrong language.
Polylang may then redirect based on the detected language, causing loops and duplicate crawl paths.
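The "first request wins" effect is easy to reproduce with a toy cache. This is an illustration only (bash 4+ for associative arrays), not a model of how any particular CDN works internally: origin_render is a hypothetical stand-in for WordPress, and the language arrives as a cookie value on the same URL.

```shell
#!/usr/bin/env bash
# Toy full-page cache for illustration only (bash 4+). Not a model of any real CDN.
declare -A CACHE
RESPONSE=""

# Stand-in for WordPress: renders the page in whatever language the cookie asks for.
origin_render() { echo "page=$1 lang=$2"; }

# fetch KEY_MODE PATH LANG_COOKIE; the result is left in $RESPONSE
# (no command substitution, so the cache survives between calls).
#   KEY_MODE=path       key ignores the language cookie (the bug)
#   KEY_MODE=path+lang  key includes the language cookie (the fix)
fetch() {
  local mode="$1" path="$2" lang="$3" key
  if [ "$mode" = "path" ]; then key="$path"; else key="$path|$lang"; fi
  if [ -z "${CACHE[$key]+set}" ]; then
    CACHE[$key]="$(origin_render "$path" "$lang")"   # miss: render and store
  fi
  RESPONSE="${CACHE[$key]}"                          # hit, or the entry just stored
}

fetch path /about/ en;      echo "$RESPONSE"   # page=/about/ lang=en  (miss, stored)
fetch path /about/ fr;      echo "$RESPONSE"   # page=/about/ lang=en  (hit: wrong language)
fetch path+lang /about/ fr; echo "$RESPONSE"   # page=/about/ lang=fr  (separate entry)
```

With the path-only key, the French request gets the English page because that key was filled first; adding the language to the key separates the entries. Moving the language into the URL path achieves the same separation without any cookie handling.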

Joke #1: Caches are like toddlers—if you don’t set clear rules, they’ll happily hand you the wrong thing with total confidence.

Facts and context that change how you debug

  1. WordPress core wasn’t built as multilingual-first. Internationalization exists, but multiple-language content routing is plugin territory, which means the “source of truth” is fragmented.
  2. Canonical tags became mainstream SEO tooling in 2009. A lot of WordPress SEO behavior assumes a single canonical per content object; multilingual introduces “canonical per variant.”
  3. hreflang is not a ranking boost; it’s a disambiguation hint. If you get it wrong, you don’t just miss out—you create confusion about which URL belongs in which index.
  4. Search engines treat query parameters as separate URLs unless proven otherwise. Parameter-based language negotiation is basically duplication with a nicer UI.
  5. Many HTTP caches and CDNs leave cookies out of the cache key by default. If language selection is stored in a cookie, your cache must explicitly vary on it, or you must avoid cookie-based language selection for cached pages.
  6. CDNs can normalize URLs in surprising ways. Some configurations drop or reorder query parameters, which can merge languages into the same cache object.
  7. Robots and prefetchers don’t behave like browsers. They might not accept cookies, might not run JS, and will happily crawl alternate language links at scale.
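  8. Polylang stores languages and translation links as taxonomy terms (language and post_translations for posts, term_language and term_translations for terms), not in a single table. If you migrate, import, or copy posts without preserving those relationships, translations become orphans, and orphans get duplicated during “fixes.”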
  9. Permalink changes are URL migrations. Switching from ?lang= to /en/ is not “a setting.” It’s a full redirect plan, a cache purge plan, and a reindex plan.

One reliability principle applies here, paraphrasing an idea often attributed to John Allspaw: incidents come from normal work interacting in unexpected ways, not from a single bad person.
Multilingual duplication is exactly that: normal plugin behavior + normal caching + normal SEO tooling = a weird emergent mess.

Fast diagnosis playbook

When someone says “Polylang is creating duplicate pages,” they’re usually describing a symptom seen in analytics or SEO tools.
Your job is to locate the duplication layer quickly.

First: determine whether duplicates are URL-level or content-level

  • Do multiple URLs return the same HTML (same language, same content)? That’s URL-level duplication (redirect/canonical/sitemap/caching).
  • Do multiple WordPress posts/pages exist with the same language and similar content? That’s content-level duplication (data/workflow/import).

Second: verify canonical + hreflang coherence on a single affected page

  • Canonical must point to the preferred URL form for that language.
  • The hreflang set must be complete, consistent, self-referential (each variant lists itself), and reciprocal (every variant it lists links back to it).
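Both checks can be scripted for a single page. The sketch below is grep and sed, not an HTML parser: it assumes double-quoted attributes and only checks that the canonical and the page's own hreflang entry both equal the URL you fetched. Reciprocity needs every variant fetched, which this does not do. link_href and check_coherence are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: compare canonical and self-hreflang against the fetched URL.
# Assumption: double-quoted attributes. This is a grep sketch, not an HTML parser.

# Print the href of the first <link> tag containing the given attribute text (HTML on stdin).
link_href() {
  grep -oE "<link[^>]*$1[^>]*>" | head -n 1 | sed -E 's/.*href="([^"]*)".*/\1/'
}

# check_coherence URL LANG < page.html  -> OK, MISMATCH ..., or MISSING ...
check_coherence() {
  local url="$1" lang="$2" html canonical self
  html="$(cat)"
  canonical="$(printf '%s\n' "$html" | link_href 'rel="canonical"')"
  self="$(printf '%s\n' "$html" | link_href "hreflang=\"$lang\"")"
  if [ -z "$canonical" ] || [ -z "$self" ]; then
    echo "MISSING canonical=${canonical:-none} hreflang=${self:-none}"
  elif [ "$canonical" = "$url" ] && [ "$self" = "$url" ]; then
    echo "OK"
  else
    echo "MISMATCH url=$url canonical=$canonical hreflang[$lang]=$self"
  fi
}
```

Typical use is curl -sS https://example.com/en/about/ | check_coherence https://example.com/en/about/ en, repeated for a handful of URLs per language.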

Third: check caching variance

  • If language changes based on cookie/header/query param, ensure cache keys vary accordingly.
  • Check if the CDN is caching HTML for logged-out users and whether it distinguishes language.

Fourth: audit redirects for default language

  • Pick a single canonical URL scheme for default language and enforce with 301s.
  • Eliminate “two doors” into the same content. Crawlers will use both doors.

Hands-on tasks: commands, outputs, decisions

These are practical checks you can run from a shell on a web node or a bastion with access.
Each task includes (1) a command, (2) what the output means, and (3) the decision you make.
Adjust domains and paths to your environment.

Task 1: Confirm whether two URLs return identical content

cr0x@server:~$ curl -sS -D- https://example.com/about/ -o /tmp/a.html | sed -n '1,20p'
HTTP/2 200
content-type: text/html; charset=UTF-8
cache-control: public, max-age=600
...
cr0x@server:~$ curl -sS https://example.com/en/about/ -o /tmp/b.html && sha256sum /tmp/a.html /tmp/b.html
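9c4f2a1be07d3e58a6b1f0c2d94e7a35...  /tmp/a.html
9c4f2a1be07d3e58a6b1f0c2d94e7a35...  /tmp/b.html

Meaning: Identical hashes strongly suggest the same HTML is served at both URLs. That’s URL duplication, not editorial duplication. Confirm the files aren’t empty first (the SHA-256 of zero bytes starts with e3b0c442). The converse is weaker: nonces, timestamps, or cache comments can make identical pages hash differently, so a mismatch alone doesn’t prove the content differs.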

Decision: Choose one URL form and redirect the other with a 301; then align canonical and sitemap to the winner.

Task 2: Inspect canonical and hreflang on the page

cr0x@server:~$ curl -sS https://example.com/en/about/ | grep -Eo '<link[^>]+(canonical|alternate)[^>]*>' | head
<link rel="canonical" href="https://example.com/about/" />
<link rel="alternate" hreflang="en" href="https://example.com/en/about/" />
<link rel="alternate" hreflang="fr" href="https://example.com/fr/a-propos/" />

Meaning: The canonical points to /about/ while the page’s own hreflang entry points to /en/about/. That mismatch is a classic indexing headache.

Decision: Make canonical consistent with your chosen URL scheme (likely /en/about/ if you prefix all languages, or /about/ if default language is unprefixed and only one path exists).

Task 3: Follow redirects and see if language enforcement is happening

cr0x@server:~$ curl -sS -I -L https://example.com/about/ | sed -n '1,40p'
HTTP/2 200
content-type: text/html; charset=UTF-8
...

Meaning: No redirects. If /about/ and /en/about/ both 200, you’ve got two indexable URLs.

Decision: Add a 301 redirect for one of the forms, ideally at Nginx/CDN level.

Task 4: Check if cookies change the language and therefore must vary cache

cr0x@server:~$ curl -sS -I https://example.com/about/ | grep -i 'set-cookie'
set-cookie: pll_language=en; path=/; secure; HttpOnly; SameSite=Lax

Meaning: Polylang is setting a language cookie. If your cache doesn’t vary on this cookie, users can see the wrong language.

Decision: Either (a) avoid cookie-driven language switching for cached pages by enforcing language in URL, or (b) configure cache variation correctly (often painful and expensive).

Task 5: Verify whether cache varies by cookie/header (response hints)

cr0x@server:~$ curl -sS -I https://example.com/en/about/ | grep -iE '^(vary|x-cache|cf-cache-status|age):'
vary: Accept-Encoding
x-cache: HIT
age: 531

Meaning: Vary does not mention cookie or a language header, and cache is hitting. If language selection depends on cookie, this is suspicious.

Decision: Fix cache keying (edge rules) or re-architect language negotiation to be URL-based for anonymous traffic.
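A small header check makes the Vary question mechanical. This hypothetical helper reads response headers on stdin (for example, from curl -sS -I) and assumes the language would travel in the Cookie or Accept-Language request header:

```shell
#!/usr/bin/env bash
# Hypothetical helper: does the Vary header include a dimension that carries language?
# Assumption: language travels in the Cookie or Accept-Language request header.
vary_covers_language() {
  local vary
  vary="$(tr -d '\r' | grep -i '^vary:' | cut -d: -f2- | tr '[:upper:]' '[:lower:]')"
  case "$vary" in
    *cookie*|*accept-language*) echo "varies: ${vary# }" ;;
    *)                          echo "no language dimension in Vary" ;;
  esac
}
```

A language-aware Vary is necessary but not sufficient: some CDNs build cache keys from their own settings and honor little of Vary beyond Accept-Encoding, so confirm the key in the CDN configuration as well.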

Task 6: Compare HTML language markers between two variants

cr0x@server:~$ curl -sS https://example.com/fr/a-propos/ | grep -Eo '<html[^>]*>' | head -n 1
<html lang="en-US">

Meaning: French URL returning lang="en-US" strongly suggests wrong-language content or theme misconfiguration.

Decision: Treat as a cache bleed or template bug; verify Polylang language context and cache rules before touching SEO settings.
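To run this check across many URLs, compare the html lang value with the language implied by the path. A sketch assuming directory mode with two-letter prefixes and English as the unprefixed default; expected_lang and check_html_lang are hypothetical helpers, and a genuine two-letter page slug (such as /us/) would fool the prefix test.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: compare <html lang> with the language implied by the URL path.
# Assumptions: directory mode, two-letter prefixes, English as the unprefixed default.
DEFAULT_LANG="en"

expected_lang() {
  local re='^/([a-z]{2})(/|$)'
  if [[ "$1" =~ $re ]]; then
    echo "${BASH_REMATCH[1]}"
  else
    echo "$DEFAULT_LANG"
  fi
}

# check_html_lang PATH < page.html
check_html_lang() {
  local path="$1" want got
  want="$(expected_lang "$path")"
  got="$(grep -oE '<html[^>]*lang="[^"]+"' | head -n 1 | sed -E 's/.*lang="([^"]+)"/\1/')"
  if [ -n "$got" ] && [ "${got%%-*}" = "$want" ]; then   # en-US -> en
    echo "OK $path $got"
  else
    echo "WRONG-LANG $path expected=$want got=${got:-none}"
  fi
}
```

Pipe each page in, for example curl -sS https://example.com/fr/a-propos/ | check_html_lang /fr/a-propos/, and treat any WRONG-LANG line as a cache or template problem first.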

Task 7: Check whether your sitemap lists duplicates

cr0x@server:~$ curl -sS https://example.com/sitemap.xml | grep -Eo '<loc>[^<]+' | sed 's/<loc>//' | head
https://example.com/about/
https://example.com/en/about/
https://example.com/fr/a-propos/

Meaning: Sitemap explicitly includes both default and prefixed English URL.

Decision: Fix sitemap generation (SEO plugin + Polylang integration) so only canonical URLs are listed.
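To find collisions without eyeballing, reduce each loc entry to the form it should collapse to and report keys that appear more than once. A sketch assuming English is the default, unprefixed language; sitemap_collisions is a hypothetical name, and the parsing is line-oriented grep and awk rather than a real XML parser.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report sitemap URLs that collapse to the same page once the
# default-language prefix (/en/) and any query string are removed. Reads XML on stdin.
# Assumptions: English is the unprefixed default; <loc> values are plain absolute URLs.
DEFAULT_LANG="en"

sitemap_collisions() {
  grep -oE '<loc>[^<]+</loc>' \
    | sed -E 's#</?loc>##g' \
    | awk -v pfx="/${DEFAULT_LANG}/" '
        {
          key = $0
          sub(/[?].*/, "", key)                  # ignore ?lang=... and other params
          i = index(key, pfx)
          # drop /en/ only when it is the first path segment (right after the host)
          if (i > 0 && substr(key, 1, i - 1) ~ "^https?://[^/]+$")
            key = substr(key, 1, i) substr(key, i + length(pfx))
          if (key in urls) urls[key] = urls[key] " " $0
          else urls[key] = $0
          count[key]++
        }
        END { for (k in count) if (count[k] > 1) print k " <- " urls[k] }'
}
```

If sitemap.xml is an index of child sitemaps, run it against each child.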

Task 8: Search access logs for language parameter crawl storms

cr0x@server:~$ sudo awk '$7 ~ /lang=/ {count++} END {print count+0}' /var/log/nginx/access.log
18427

Meaning: Lots of requests carry lang= in the request URI ($7 in Nginx’s default combined log format). Either internal links are leaking parameter-based URLs, or bots discovered them.

Decision: Stop generating parameter URLs, 301 them to directory/subdomain equivalents, and remove them from sitemaps and internal links.
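Working out each 301 target is mechanical once you fix the assumptions. This sketch assumes directory mode, English as the unprefixed default, and that the path already carries the requested language's own slug, as in /uber-uns/?lang=de. A URL like /about/?lang=fr would map to /fr/about/, which may not exist, so check generated targets for a 200 before deploying them. param_to_dir is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: the 301 target for a parameter-based language URL.
# Assumptions: directory mode, English is the unprefixed default, and the path already
# carries the requested language's own slug (as in /uber-uns/?lang=de).
# Prints the URI unchanged when there is no usable lang parameter.
DEFAULT_LANG="en"

param_to_dir() {
  local uri="$1" path query lang="" kept="" pair
  local -a parts
  local re_code='^[a-z]{2}$'
  path="${uri%%\?*}"
  if [ "$path" = "$uri" ]; then echo "$uri"; return; fi       # no query string
  query="${uri#*\?}"
  IFS='&' read -r -a parts <<< "$query"
  for pair in "${parts[@]}"; do
    if [[ "$pair" == lang=* ]]; then
      lang="${pair#lang=}"
    else
      kept="${kept:+$kept&}$pair"                                # keep other parameters
    fi
  done
  if ! [[ "$lang" =~ $re_code ]]; then echo "$uri"; return; fi  # missing or odd lang value
  [ "$lang" != "$DEFAULT_LANG" ] && path="/${lang}${path}"      # /uber-uns/ -> /de/uber-uns/
  echo "${path}${kept:+?$kept}"
}
```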

Task 9: Confirm whether different query strings are cached as the same object

cr0x@server:~$ curl -sS -I "https://example.com/about/?lang=en" | grep -iE '^(x-cache|cf-cache-status|age):'
x-cache: HIT
age: 590
cr0x@server:~$ curl -sS -I "https://example.com/about/?lang=fr" | grep -iE '^(x-cache|cf-cache-status|age):'
x-cache: HIT
age: 590

Meaning: Same age and HIT pattern suggests the cache might be ignoring query strings or normalizing them.

Decision: Fix CDN/Nginx cache key to include query string where appropriate, or (better) eliminate query-string language mode entirely.

Task 10: Validate WordPress sees the correct home URL per request

cr0x@server:~$ wp option get home
https://example.com
cr0x@server:~$ wp option get siteurl
https://example.com

Meaning: Baseline looks normal. This check matters because mismatched home/siteurl can produce mixed canonicals and redirects that appear “multilingual.”

Decision: If these differ or are wrong (http vs https), fix them before blaming Polylang.

Task 11: Inspect Polylang language configuration quickly

cr0x@server:~$ wp plugin list --status=active | grep -i polylang
polylang                      3.6.2   active
cr0x@server:~$ wp option get polylang
Error: Could not get 'polylang' option. Does it exist?

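Meaning: Treat the error as a clue rather than proof of anything. Polylang normally keeps its settings, including the URL mode, in a polylang option, so a miss can mean you queried the wrong site (pass --url on multisite) or a different install than you think. Languages and translation links are not in that option at all: they are taxonomy terms (language, post_translations).

Decision: Verify the URL mode in Polylang’s settings screen and inspect the language taxonomy directly (for example, wp term list language); don’t assume one CLI call gives you the whole picture.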

Task 12: Identify duplicate posts by title within a language (content-level duplication)

cr0x@server:~$ wp db query "SELECT t.slug AS lang, p.post_title, COUNT(*) AS c, GROUP_CONCAT(p.ID) AS ids
FROM wp_posts p
JOIN wp_term_relationships tr ON tr.object_id = p.ID
JOIN wp_term_taxonomy tt ON tt.term_taxonomy_id = tr.term_taxonomy_id AND tt.taxonomy = 'language'
JOIN wp_terms t ON t.term_id = tt.term_id
WHERE p.post_type = 'page' AND p.post_status = 'publish'
GROUP BY t.slug, p.post_title
HAVING c > 1
ORDER BY c DESC LIMIT 10;"
+------+------------+---+---------+
| lang | post_title | c | ids     |
+------+------------+---+---------+
| en   | About      | 2 | 311,947 |
+------+------------+---+---------+

Meaning: Polylang records a post’s language as a term in its language taxonomy, so grouping by that term and the title finds same-language duplicates. Here two published English pages are both titled About (IDs 311 and 947). The query assumes the default wp_ table prefix; adjust it to match $table_prefix in wp-config.php.

Decision: If you truly have duplicate post objects, fix the translation mapping (link translations) or delete/redirect the unintended duplicates. Don’t “canonical-tag” your way out of a data problem.

Task 13: Check Nginx for rewrite rules that create shadow URLs

cr0x@server:~$ sudo grep -RInE 'rewrite.*(lang|/en/)|return 30[12].*/en/|try_files.*\$args' /etc/nginx | head
/etc/nginx/sites-enabled/example.conf:47:    rewrite ^/about/$ /en/about/ permanent;
/etc/nginx/sites-enabled/example.conf:63:    try_files $uri $uri/ /index.php?$args;

Meaning: You may have hand-rolled redirects. The try_files with $args preserves query strings, which can keep parameter-based language URLs alive.

Decision: Make redirects consistent and aggressive: if query-string language mode is not desired, strip/redirect it. Also ensure you’re not rewriting only some paths and leaving others duplicated.

Task 14: Confirm the database doesn’t contain both slug variants for the same language

cr0x@server:~$ wp db query "SELECT post_name, COUNT(*) c FROM wp_posts WHERE post_type='page' AND post_status='publish' GROUP BY post_name HAVING c > 1 ORDER BY c DESC LIMIT 10;"
+-----------+---+
| post_name | c |
+-----------+---+
| about     | 2 |
+-----------+---+

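Meaning: Two published pages share the slug about. WordPress normally appends a suffix (about-2) when a page’s slug collides with a sibling’s, so a surviving collision usually means different parent pages, an import that bypassed that check, or pages in different languages where your setup allows shared slugs. This query doesn’t group by language, so check each pair’s parent and language before acting; multilingual routing can make any of these look like “duplicate language pages.”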

Decision: Fix the content model: unique slugs per language context, and ensure translations are linked rather than duplicated.

Task 15: Check for mixed language output being served from cache (spot-check)

cr0x@server:~$ for u in /en/about/ /fr/a-propos/; do echo "== $u"; curl -sS https://example.com$u | grep -Eo '<title>[^<]+</title>' | head -n1; done
== /en/about/
<title>About - Example</title>
== /fr/a-propos/
<title>About - Example</title>

Meaning: French URL returns English title. This is almost never “SEO.” It’s caching variance, broken translation mapping, or language detection falling back.

Decision: Disable full-page cache temporarily to confirm, then correct cache keying and purge.

Task 16: Verify the canonical tag uses the public scheme and host

cr0x@server:~$ curl -sS https://example.com/en/about/ | grep -Eo '<link rel="canonical" href="[^"]+' | head -n1
<link rel="canonical" href="http://example.com/en/about/

Meaning: Canonical points to HTTP while the site is HTTPS. That creates “duplicates” by scheme, and multilingual just makes the graph bigger.

Decision: Fix WordPress URL settings, reverse proxy headers (X-Forwarded-Proto), and SEO plugin config so canonicals use the public scheme.

Failure modes by layer (WordPress, plugins, web, CDN, bots)

WordPress layer: permalinks and hierarchical pages

WordPress routing is deterministic until you introduce multiple “valid” URLs for the same content. Then you’re in ambiguity territory:
hierarchical pages, attachments, and auto-generated rewrite rules can give crawlers multiple paths.

If your default language is unprefixed, the default-language permalink must be the only reachable variant. If it’s reachable both prefixed and unprefixed, you’ve created a second identity.
WordPress won’t stop you. WordPress is polite like that.

Polylang layer: language negotiation and translation mapping

Polylang is good at what it does: attach language context, build alternate links, and let you translate content.
The trap is assuming it also governs caching, redirects at the edge, and sitemap behavior of other plugins. It doesn’t.

Translation mapping matters. If a page exists in two languages but they aren’t linked as translations, Polylang can treat them as independent pages.
Then a well-meaning editor duplicates a page “to translate it,” and now you have two pages in the same language because someone clicked the wrong thing in a dropdown.

SEO plugin layer: canonicals, sitemaps, and robots directives

Most SEO plugins have multilingual integration, but it’s not magic. When integrations fail, they fail silently.
You end up with:

  • Sitemaps that list both prefixed and unprefixed default language URLs.
  • Canonicals that ignore the current language context.
  • Alternate links that are correct but point to non-canonical URLs.

Web server layer: redirects, normalization, and query-string preservation

Nginx/Apache configs often preserve query strings by default. That’s normally correct.
In multilingual setups, it can keep “dead” language modes alive forever: ?lang= continues to resolve and gets indexed.

Normalization rules can also create duplicates: trailing slash vs no slash, uppercase vs lowercase, www vs apex. Multiply those by 5 languages and you’ve built a URL farm.
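A reference normalizer makes those rules testable. The sketch below assumes HTTPS, the apex host (www. stripped), and a permalink structure with trailing slashes, leaving file-like paths such as images alone; normalize_url is a hypothetical name, and the real redirects still belong in Nginx or the CDN.

```shell
#!/usr/bin/env bash
# Hypothetical reference normalizer (the real redirects belong in Nginx or the CDN).
# Assumptions: HTTPS only, apex host (www. stripped), trailing-slash permalinks,
# and file-like paths (last segment contains a dot) left as they are.
normalize_url() {
  local url="$1" rest host path="/" query="" last
  rest="${url#*://}"                                   # drop the scheme
  host="${rest%%/*}"
  [[ "$rest" == */* ]] && path="/${rest#*/}"
  if [[ "$path" == *\?* ]]; then                       # split off the query string
    query="?${path#*\?}"
    path="${path%%\?*}"
  fi
  host="$(printf '%s' "$host" | tr '[:upper:]' '[:lower:]')"
  host="${host#www.}"                                  # www -> apex
  last="${path##*/}"
  if [[ "$path" != */ && "$last" != *.* ]]; then
    path="$path/"                                      # /fr/a-propos -> /fr/a-propos/
  fi
  echo "https://${host}${path}${query}"
}
```

Language prefixes pass through untouched, so this composes with whichever default-language rule you chose.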

CDN layer: cache keys and normalization

CDNs reduce load and increase performance. They also make bugs global in about 45 seconds.
If the CDN caches HTML and language varies by cookie or header, you must configure the cache key correctly.
If you can’t, don’t cache HTML that varies by cookie. Cache static assets only, or move to URL-based language routing.

Bot layer: how crawlers discover the duplicates

Duplicates usually become visible because:

  • Internal links expose both variants (menus, language switchers, breadcrumbs, canonical tag mistakes).
  • Sitemaps list them.
  • Redirect chains expose them.
  • External links include the “wrong” form, and your site accepts it without redirecting.

Don’t waste time blaming “Google being dumb.” If you allow two URLs, crawlers will use two URLs. That’s not a bug; that’s their job.

Three corporate mini-stories from the multilingual trenches

Mini-story 1: The incident caused by a wrong assumption

A mid-sized B2B company rolled out a bilingual site: English default, French added for a new market.
They used Polylang with language directories and kept the default language accessible both as / and /en/ “because marketing wanted the /en/ look.”

The assumption: canonical tags would make it fine. The SEO plugin would pick one, right?
It did pick one—sometimes. For some templates it emitted canonical without /en/. For others it included it. The header navigation linked to /en/, the footer linked to unprefixed.
So crawlers saw two internal link graphs with equal authority.

The result wasn’t immediate fire. It was slow rot: more crawl budget spent, index oscillation, and content showing up in the wrong language for branded searches.
Then the support team noticed something embarrassing: customers in France were landing on English pages because the “preferred” canonical in the index was English without the directory, and Google considered it the main page.

The fix was boring: pick one URL scheme, 301 the other, regenerate sitemaps, and purge caches. Rankings stabilized over a few weeks.
The real lesson: never allow two “valid” URLs for the same language variant. Canonicals are not an excuse to be indecisive.

Mini-story 2: The optimization that backfired

An ecommerce team had performance problems during campaigns. Someone enabled full-page caching at the CDN for all anonymous traffic.
Great graphs. Load dropped. Page speed improved. Everyone celebrated.

Two days later, customer service reports: “Spanish pages randomly show English.” It wasn’t random. The CDN cache key ignored the Polylang cookie and didn’t vary by Accept-Language.
The first request to a URL won; everyone else got that cached language.

Polylang tried to correct language based on cookie and redirected some users. Those redirects were also cached incorrectly.
The site developed a nice new feature: redirect loops that only happened in one region, behind one ISP, with one set of cookies. Classic.

The postmortem was straightforward. The optimization was valid in a monolingual world and destructive in a multilingual one.
They rolled back HTML caching, kept static asset caching, then reintroduced HTML caching only after moving language selection fully into the URL and varying cache keys properly.

Mini-story 3: The boring but correct practice that saved the day

A media organization ran Polylang across six languages. They’d been burned before, so they enforced a rule:
every language variant must have exactly one canonical URL, and every non-canonical variant must 301 within one hop.

They also had a scheduled job that sampled a few dozen URLs per language, checked canonical and hreflang coherence, and alerted if:
the canonical didn’t match the request language path, or if an hreflang target returned a redirect chain.

One Friday, a theme update changed how the language switcher generated links. It started emitting parameter-based URLs (?lang=) for some templates.
The monitoring caught it within the hour: sudden appearance of lang= in logs and a spike in non-canonical 200 responses.

They reverted the theme change, added a defensive redirect for ?lang= to the directory form, purged caches, and moved on.
No drama, no SEO cliff. Just a small blip. The secret ingredient was not genius; it was having an opinion and enforcing it continuously.

Joke #2: The best multilingual strategy is like a good on-call rotation—unexciting, documented, and nobody talks about it at parties.

Common mistakes: symptoms → root cause → fix

1) Symptom: /en/ and non-/en/ both index for English

Root cause: Default language reachable via two URL forms; no strict redirects; sitemap includes both.

Fix: Choose one canonical scheme for default language. Add 301 redirects for the other. Ensure canonical tags and sitemaps output only the canonical scheme.

2) Symptom: French URL sometimes shows English content

Root cause: Cache key missing language dimension (cookie/header/query); CDN or Nginx fastcgi cache serving wrong variant.

Fix: Move language negotiation to URL directories/subdomains for anonymous users; or vary cache by language cookie/header and confirm with repeated curl tests. Purge caches.

3) Symptom: Search Console says “Alternate page with proper canonical tag” for thousands of URLs

Root cause: Parameter-based URLs (?lang=) or trailing slash variants crawlable; canonicals point elsewhere but pages still 200.

Fix: 301 parameter variants to canonical directory URLs; normalize trailing slashes; remove duplicates from sitemap; ensure internal links don’t emit parameters.

4) Symptom: hreflang warnings (no return tags, wrong language codes)

Root cause: Incomplete translation mapping, removed languages without cleanup, or SEO plugin integration failing on some templates.

Fix: Ensure each language variant lists a complete hreflang set including self; fix translation links; verify codes (e.g., en, fr) and region variants if used.

5) Symptom: Duplicate category/tag archive pages per language

Root cause: Taxonomy translation misconfiguration; term slugs duplicated; archives not filtered by language consistently.

Fix: Decide whether taxonomy slugs are translated; enforce one approach; ensure archive queries filter by language; redirect unintended archives.

6) Symptom: After migrating domains, old language URLs still resolve

Root cause: Redirect rules too generic; caches hold old HTML; mixed canonicals (http/https, www/apex) keep old variants alive.

Fix: Implement explicit host/scheme redirects; purge CDN; verify canonicals on representative pages; check response headers for correct host and scheme.

7) Symptom: Editors see multiple “About” pages in the same language

Root cause: Translation workflow created new posts without linking translations; imports duplicated content; page builder templates cloned content objects.

Fix: Deduplicate content in WordPress: link translations properly, delete/merge extras, and implement editorial guardrails (roles, workflow, training).

8) Symptom: Redirect loops when switching languages

Root cause: Conflicting redirects (CDN + Nginx + Polylang), cache serving wrong language triggers redirect, or forced trailing slash rules fight language prefix rules.

Fix: Map redirect logic centrally (prefer edge), reduce redirect layers, test with curl -I -L, and ensure cache serves correct content per URL.
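Redirect rules can be checked offline before they reach production. This hypothetical helper follows "from to" pairs (one rule per line, collected by hand from Nginx, CDN, and plugin settings) and flags chains longer than one hop and loops. It complements curl -I -L rather than replacing it, since it only sees the rules you listed.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: follow redirect rules offline and flag chains and loops (bash 4+).
# RULES_FILE holds "from to" pairs, one per line.
# Prints "<start> OK <hops> <final>", "<start> CHAIN <hops> <final>", or "<start> LOOP".
check_chain() {
  local rules="$1" start="$2" url="$2" next hops=0
  local -A visited=()
  while next="$(awk -v u="$url" '$1 == u { print $2; exit }' "$rules")" && [ -n "$next" ]; do
    if [ -n "${visited[$url]+x}" ]; then echo "$start LOOP"; return; fi
    visited[$url]=1
    url="$next"
    hops=$((hops + 1))
  done
  if [ "$hops" -le 1 ]; then
    echo "$start OK $hops $url"                      # zero or one hop: acceptable
  else
    echo "$start CHAIN $hops $url"                   # collapse to a single 301
  fi
}
```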

Checklists / step-by-step plan

Step-by-step: choose a single URL truth and enforce it

  1. Pick a URL strategy: directories or subdomains. For most sites: directories.
  2. Decide default language behavior: prefixed or unprefixed. If unprefixed, ensure prefixed default redirects away. If prefixed, ensure unprefixed redirects into it.
  3. Normalize scheme and host: one HTTPS host; redirect the rest.
  4. Remove parameter-based language URLs: 301 them to canonical directory/subdomain equivalents.
  5. Make canonicals match: canonical must match the chosen URL scheme for that language variant.
  6. Make hreflang coherent: complete set, correct codes, self-referential, and targets should be 200 (not redirecting).
  7. Fix sitemap output: only canonical URLs, correct alternates, no parameter variants.
  8. Audit internal links: menus, footers, breadcrumbs, related posts, language switcher—no mixed forms.
  9. Fix caching: ensure cache varies by language, or cache only content that is language-invariant. Prefer URL-based language for anonymous HTML caching.
  10. Purge aggressively: CDN + server cache + plugin cache. Then validate with curl sampling.
  11. Monitor: log queries for ?lang=, track 200 responses on non-canonical forms, watch for canonical/hreflang drift after releases.
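Step 11 can start as a cron-friendly awk one-liner. This sketch assumes Nginx's default combined log format ($7 is the request URI, $9 the status) and English as the unprefixed default; noncanonical_200s is a hypothetical name. It counts 200 responses on ?lang= URLs and on /en/-prefixed paths, both of which should be redirecting.

```shell
#!/usr/bin/env bash
# Hypothetical monitoring sketch: 200 responses on forms that should redirect.
# Assumptions: Nginx "combined" log format ($7 = request URI, $9 = status) and English
# as the unprefixed default. Reads the files given as arguments, or stdin.
noncanonical_200s() {
  awk '
    $9 == 200 && $7 ~ /[?&]lang=/                          { param++ }
    $9 == 200 && ($7 == "/en" || index($7, "/en/") == 1)   { prefixed++ }
    END { printf "lang_param_200=%d default_prefix_200=%d\n", param, prefixed }
  ' "$@"
}
```

Alert when either number rises after a release; a healthy site trends toward zero on both. If you prefix the default language instead, swap the second pattern for unprefixed paths.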

Release checklist (the one you actually follow)

  • For 5 representative pages per language: confirm single-hop to canonical and correct canonical tag.
  • Confirm lang attribute in HTML matches the language URL.
  • Confirm language switcher links don’t use query params.
  • Confirm sitemap contains only canonical URL forms.
  • Confirm caching headers are sane and vary appropriately (or caching is disabled for HTML if it can’t be varied correctly).
  • Search logs for sudden spikes in non-canonical forms (/en/ duplication, ?lang=, mixed trailing slashes).

Data hygiene checklist (prevents editorial duplicates)

  • Define “translation creation” workflow: always create translations via Polylang’s linking UI, not copy/paste new pages.
  • Restrict who can publish in secondary languages until the process is stable.
  • Run periodic duplicate-title/duplicate-slug reports and review them with content ops.
  • Before imports: test on staging and verify translation mapping survives migration.

FAQ

1) Is Polylang “bad for SEO”?

No. Polylang is fine. The trap is letting multiple URL forms resolve to the same content and assuming canonicals will paper over it.
SEO hates ambiguity more than it hates any specific plugin.

2) Should the default language be prefixed (/en/) or not?

Either can work. Pick one and enforce it strictly.
If you want maximum consistency and fewer edge cases, prefix everything, including default. If you want prettier default URLs, keep default unprefixed—but ensure /en/ redirects away everywhere.

3) Why do I see ?lang= URLs even though I use language directories?

Usually a theme component, language switcher, or plugin is generating parameterized links.
Sometimes it’s a fallback behavior when Polylang can’t resolve a translation. Treat it as a bug: parameter URLs should 301 to the directory form.

4) Can I just add noindex to the duplicates?

You can, but it’s rarely the best first fix. If duplicates are reachable and internally linked, crawlers will keep spending time on them.
Prefer 301 redirects to a single canonical URL. Use noindex only when redirects aren’t feasible (rare) or for special cases like filtered listings.

5) My CDN caches HTML. How do I avoid mixed-language pages?

Make language selection part of the URL (/fr/, /en/) and configure the cache key to include the full path.
Avoid cookie-based language negotiation for cacheable anonymous HTML. If you must use cookies, vary cache by that cookie explicitly and test it.

6) Why does Search Console show duplicates after I fixed redirects?

Indexing is not instant. Also, you might still be emitting duplicates via sitemap, internal links, or canonicals.
Confirm that the non-canonical URL now returns a 301, and that the canonical page’s canonical tag points to itself, not a different variant.

7) What about translated taxonomies—should category slugs be translated?

Decide based on audience and scale. Translated slugs can be better UX, but they increase complexity.
If you translate taxonomy slugs, ensure you don’t also expose untranslated archives for the same language. One language, one term archive URL.

8) Does switching permalink structure cause duplicate pages?

It can. Changing permalink structures changes URL identities. In multilingual setups, it multiplies the blast radius.
Treat it as a migration: map old → new with 301s, update sitemaps, purge caches, and verify canonical/hreflang after the change.

9) What’s the quickest proof that the cache is the culprit?

Hit two language URLs repeatedly and see whether content flips or whether both return the same <title> or <html lang> value.
If disabling cache (temporarily) fixes it, you’ve got cache variance problems, not Polylang “duplicating pages.”

10) Should I move from Polylang to another multilingual plugin?

Only if your real problem is workflow or missing features. If your problem is duplicate URLs, you can recreate the same mess with any plugin.
Fix URL truth, canonicals, sitemaps, and caching first. Then evaluate tools.

Practical next steps

The Polylang trap isn’t a single bug. It’s the system doing exactly what you allowed: multiple URL forms, inconsistent canonicals, and caches that don’t speak “language.”
You don’t solve it with a plugin toggle and optimism.

Do this next, in order:

  1. Pick one canonical URL scheme per language (including a hard decision on default language prefixing).
  2. 301 everything else at the edge. One hop. No debate.
  3. Make canonical + hreflang tell the same story across templates.
  4. Fix sitemap emissions so you stop feeding duplicates to crawlers.
  5. Audit caching keys; if language is not in the URL, don’t cache anonymous HTML until it is.
  6. Put guardrails in place: a small monitoring script, a log check for ?lang=, and a release checklist that tests multilingual routing like it matters—because it does.

Multilingual WordPress can be stable and fast. It just can’t be vague. Make one URL the truth, and make everything else apologize with a 301.
