You upgraded to Ubuntu 24.04 and now “some sites work, some sites don’t.” It’s never the same set twice.
Your monitoring says the internet is fine. Your coworkers insist it’s “DNS” with the smug serenity of people who have never read a packet capture.
What’s usually happening: you’re dual-stack (IPv4 + IPv6), but your IPv6 path is kind of alive—alive enough to be preferred, broken enough to time out.
That creates the worst user experience in networking: intermittent failure that looks like superstition.
What “random breakage” usually means on dual-stack
When users say “random sites don’t load,” they’re not describing randomness. They’re describing selection effects.
Some destinations publish AAAA records (IPv6 addresses), others don’t. Some CDNs steer you to an IPv6 edge that’s reachable, others steer you to one that isn’t.
Some sites are behind networks that require PMTUD to work correctly; others aren’t. Some endpoints use QUIC over UDP; others stick to TCP.
On a dual-stack host, many client stacks will try IPv6 first if a AAAA record exists. If the IPv6 route is broken or the DNS server returns bad IPv6 answers,
you get timeouts, long page loads, or “works on my phone, not on my laptop” arguments. Eventually, Happy Eyeballs may fall back to IPv4, but “eventually”
is not a user experience strategy.
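If you want to see that ordering on your own host, glibc's resolver will show you the sorted list applications actually get; on a healthy dual-stack machine the IPv6 address sorts first. The output below is illustrative, using the same addresses example.com resolves to later in this article:
cr0x@server:~$ getent ahosts example.com
2606:2800:220:1:248:1893:25c8:1946 STREAM example.com
2606:2800:220:1:248:1893:25c8:1946 DGRAM
2606:2800:220:1:248:1893:25c8:1946 RAW
93.184.216.34   STREAM
93.184.216.34   DGRAM
93.184.216.34   RAW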
Your goal isn’t to “make IPv6 go away.” Your goal is to make IPv6 correct so dual-stack behaves like a boring utility.
Boring is good. Boring pays rent.
Joke #1: Disabling IPv6 because it’s flaky is like removing your smoke detector because it beeps when the battery is low. Quiet, yes. Smart, no.
Facts and context that explain why this happens
- Fact 1: IPv6 was standardized in the late 1990s, but enterprise rollouts lagged for decades; “mostly working” deployments are common.
- Fact 2: Many OSes prefer IPv6 by default when both A and AAAA exist, guided by RFC 6724 address selection rules.
- Fact 3: “Happy Eyeballs” (initially RFC 6555, later updated) exists specifically because broken IPv6 paths were so common.
- Fact 4: Router Advertisements (RA) can configure IPv6 without DHCPv6; this convenience also makes misconfiguration easier to hide.
- Fact 5: DNS64/NAT64 can make IPv6-only clients reach IPv4 services; when deployed accidentally or half-broken, it creates surreal failures.
- Fact 6: In IPv4, routers could fragment packets, which often masked path MTU problems; IPv6 routers never fragment, so PMTUD has to work end to end.
- Fact 7: Some VPN clients and “security agents” still mishandle IPv6, especially split tunneling and DNS routing for AAAA.
- Fact 8: Cloud providers and CDNs often have different IPv6 and IPv4 edges; you are not always hitting “the same server, just different IPs.”
- Fact 9: systemd-resolved changed how many Linux desktops and servers do stub resolution; it’s fast when right and confusing when wrong.
Fast diagnosis playbook (check first/second/third)
First: confirm whether the failures correlate with AAAA (IPv6) lookups
If broken sites reliably have AAAA records and working sites don’t, stop pretending it’s “random.” It’s IPv6 preference + a broken IPv6 path.
If both A and AAAA exist and IPv4 works but IPv6 hangs, you’re looking at routing, MTU, firewall, or DNS transport issues.
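A quick way to check that correlation from the affected host (dig ships in Ubuntu's bind9-dnsutils package; the hostnames below are placeholders, substitute a few failing and working sites of your own):
cr0x@server:~$ for h in broken-site.example working-site.example; do
>   printf '%-28s %s\n' "$h" "$(dig +short AAAA "$h" | head -n 1)"
> done
Hosts that print an IPv6 address and also fail in the browser are your pattern.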
Second: validate the local host’s IPv6 configuration and default route
The fastest way to lose an afternoon is to debug application behavior before you’ve confirmed the kernel has a valid IPv6 address,
a default route, and functioning neighbor discovery.
Third: test basic IPv6 reachability and DNS behavior separately
DNS problems and packet-forwarding problems can look identical from a browser. Split them early:
(1) can you resolve AAAA quickly, and (2) can you connect to the resolved IPv6 address quickly?
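For part (2), you can take DNS back out of the picture with curl's --resolve, which pins the hostname to a literal address while keeping SNI and certificate validation intact. The address below is the AAAA answer from the lookup in Task 5; substitute whatever your resolver returned:
cr0x@server:~$ curl -I --max-time 8 --resolve 'example.com:443:[2606:2800:220:1:248:1893:25c8:1946]' https://example.com/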
Fourth: if it’s “connects but stalls,” suspect MTU/PMTUD or a stateful firewall
TCP handshake succeeds, TLS stalls, HTTP hangs, QUIC fails: that’s where MTU black holes and firewall state timeouts live.
Diagnose with targeted pings and packet captures, not vibes.
Practical tasks: commands, outputs, decisions (12+)
These are the tasks I actually run on an Ubuntu 24.04 host when dual-stack feels haunted. Each task includes what the output means and what decision you make next.
Run them in order; you’ll usually catch the culprit by task 7–10.
Task 1: confirm the host has global IPv6 and the right interface
cr0x@server:~$ ip -6 addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 state UNKNOWN
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: enp0s31f6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP
inet6 fe80::a00:27ff:fe4e:66a1/64 scope link
valid_lft forever preferred_lft forever
inet6 2001:db8:120:34::25/64 scope global dynamic mngtmpaddr
valid_lft 7100sec preferred_lft 3500sec
Meaning: You want a scope global address on the interface you expect (not only fe80::/64 link-local).
If you only see link-local, IPv6 won’t reach the internet without some special tunneling.
Decision: No global IPv6? Stop. Fix RA/DHCPv6 on the network or netplan config before debugging browsers.
Task 2: verify the IPv6 default route exists and points somewhere sane
cr0x@server:~$ ip -6 route show
2001:db8:120:34::/64 dev enp0s31f6 proto kernel metric 256 pref medium
fe80::/64 dev enp0s31f6 proto kernel metric 256 pref medium
default via fe80::1 dev enp0s31f6 proto ra metric 1024 pref medium
Meaning: The default route via a link-local gateway (like fe80::1) learned via RA is normal.
What’s not normal is having no default, or multiple defaults flapping between interfaces.
Decision: Missing default route? Focus on router advertisements (or static routing) before DNS. Multiple defaults? You may be dealing with VPNs, Wi‑Fi + Ethernet, or mis-set route metrics.
Task 3: check neighbor discovery isn’t failing (L2/L3 reality check)
cr0x@server:~$ ip -6 neigh show dev enp0s31f6
fe80::1 lladdr 00:11:22:33:44:55 router STALE
Meaning: If the router neighbor entry never appears, or sits in FAILED, you don’t have basic IPv6 adjacency.
Decision: If neighbor discovery is broken, suspect VLANs, Wi‑Fi isolation, RA-guard, switch security, or an overzealous firewall on the local segment.
Task 4: test raw IPv6 connectivity to a known address (bypass DNS)
cr0x@server:~$ ping -6 -c 3 2606:4700:4700::1111
PING 2606:4700:4700::1111(2606:4700:4700::1111) 56 data bytes
64 bytes from 2606:4700:4700::1111: icmp_seq=1 ttl=57 time=9.89 ms
64 bytes from 2606:4700:4700::1111: icmp_seq=2 ttl=57 time=10.12 ms
64 bytes from 2606:4700:4700::1111: icmp_seq=3 ttl=57 time=10.02 ms
--- 2606:4700:4700::1111 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2002ms
Meaning: Successful ICMPv6 doesn’t prove TCP will work, but failure here is a giant red sign: IPv6 is fundamentally broken.
Decision: If this fails but IPv4 ping works, focus on routing/firewall/ISP/VPN rather than application settings.
Task 5: confirm DNS returns AAAA and A, and how fast
cr0x@server:~$ resolvectl query example.com
example.com: 2606:2800:220:1:248:1893:25c8:1946
93.184.216.34
-- Information acquired via protocol DNS in 42.0ms.
-- Data is authenticated: no
Meaning: You got both AAAA and A quickly. If queries take seconds or return only AAAA when the site actually has A too, that’s resolver/path weirdness.
Decision: Slow DNS? Investigate systemd-resolved upstream servers, EDNS behavior, or a VPN DNS capture issue.
Task 6: inspect which DNS servers are in play (per-link matters)
cr0x@server:~$ resolvectl status
Global
Protocols: -LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
resolv.conf mode: stub
Current DNS Server: 2001:4860:4860::8888
DNS Servers: 2001:4860:4860::8888 8.8.8.8
Link 2 (enp0s31f6)
Current Scopes: DNS
Protocols: +DefaultRoute -LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
Current DNS Server: 2001:4860:4860::8888
DNS Servers: 2001:4860:4860::8888 8.8.8.8
Meaning: systemd-resolved can have per-interface DNS. A VPN often injects DNS on its link, which can change AAAA answers or break them.
Decision: If DNS servers differ by link, confirm the default route and DNS link match your intent (especially with VPN split tunnels).
Task 7: force IPv6 and IPv4 connections separately to the same hostname
cr0x@server:~$ curl -6 -I --max-time 8 https://example.com
HTTP/2 200
cache-control: max-age=604800
content-type: text/html; charset=UTF-8
server: ECS (nyb/1D2A)
cr0x@server:~$ curl -4 -I --max-time 8 https://example.com
HTTP/2 200
cache-control: max-age=604800
content-type: text/html; charset=UTF-8
server: ECS (nyb/1D2A)
Meaning: If -4 works and -6 times out (or stalls mid-transfer), you have a true IPv6 path issue.
Decision: IPv6-only failing? Go to MTU and firewall tasks. Both failing? That’s broader connectivity, proxy, or TLS interception trouble.
Task 8: check route choice for a specific IPv6 destination
cr0x@server:~$ ip -6 route get 2606:4700:4700::1111
2606:4700:4700::1111 from :: via fe80::1 dev enp0s31f6 proto ra metric 1024 pref medium
Meaning: Confirms which interface/gateway will be used. If this shows a VPN interface or an unexpected NIC, you found your “randomness.”
Decision: Wrong egress interface? Fix route metrics, netplan config, or VPN policy routing.
Task 9: test MTU/PMTUD symptoms with a “do not fragment” style probe (IPv6-style)
cr0x@server:~$ ping -6 -c 3 -s 1452 -M do 2606:4700:4700::1111
PING 2606:4700:4700::1111(2606:4700:4700::1111) 1452 data bytes
1460 bytes from 2606:4700:4700::1111: icmp_seq=1 ttl=57 time=10.88 ms
1460 bytes from 2606:4700:4700::1111: icmp_seq=2 ttl=57 time=10.95 ms
1460 bytes from 2606:4700:4700::1111: icmp_seq=3 ttl=57 time=11.01 ms
--- 2606:4700:4700::1111 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2001ms
Meaning: -s 1452 plus the ICMPv6 and IPv6 headers is exactly 1500 bytes, and -M do forbids local fragmentation. This isn’t perfect PMTUD validation, but if small pings work and larger ones fail or hang, suspect an MTU black hole.
VPNs, tunnels, and PPPoE are repeat offenders.
Decision: If large packets fail, clamp MSS on the edge, adjust interface MTU, or fix ICMPv6 Packet Too Big being blocked.
Task 10: verify ICMPv6 isn’t being “secured” to death locally
cr0x@server:~$ sudo nft list ruleset | sed -n '1,160p'
table inet filter {
chain input {
type filter hook input priority filter; policy drop;
ct state established,related accept
iif "lo" accept
ip6 nexthdr icmpv6 icmpv6 type { echo-request, echo-reply, nd-neighbor-solicit, nd-neighbor-advert, nd-router-solicit, nd-router-advert, packet-too-big, time-exceeded, parameter-problem } accept
tcp dport { 22 } accept
}
chain forward {
type filter hook forward priority filter; policy drop;
}
chain output {
type filter hook output priority filter; policy accept;
}
}
Meaning: On IPv6, ICMPv6 is not optional. If someone blocked “all ICMP” as a security reflex, you can break neighbor discovery and PMTUD.
Decision: Ensure essential ICMPv6 types are permitted. If you’re on a server with policy drop, be explicit.
Task 11: check the kernel’s IPv6 disable switches (yes, people flip them)
cr0x@server:~$ sysctl net.ipv6.conf.all.disable_ipv6 net.ipv6.conf.default.disable_ipv6
net.ipv6.conf.all.disable_ipv6 = 0
net.ipv6.conf.default.disable_ipv6 = 0
Meaning: If these are 1, IPv6 is disabled at the kernel level. Sometimes this is done “temporarily” and then fossilizes.
Decision: If disabled, re-enable and fix the real issue. If enabled, keep going—your problem is subtler (and therefore more fun).
Task 12: confirm what /etc/resolv.conf actually points to
cr0x@server:~$ ls -l /etc/resolv.conf
lrwxrwxrwx 1 root root 39 Apr 19 10:07 /etc/resolv.conf -> ../run/systemd/resolve/stub-resolv.conf
Meaning: Ubuntu commonly uses systemd’s stub resolver. If you expected NetworkManager or a static resolv.conf, your mental model is outdated.
Decision: If it’s stub-resolv.conf, use resolvectl for truth. If it’s a static file, ensure it contains reachable DNS servers over both stacks.
Task 13: compare A-only and AAAA-only lookup timing (then test DNS transport separately)
cr0x@server:~$ resolvectl query -4 example.com
example.com: 93.184.216.34
-- Information acquired via protocol DNS in 16.8ms.
cr0x@server:~$ resolvectl query -6 example.com
example.com: 2606:2800:220:1:248:1893:25c8:1946
-- Information acquired via protocol DNS in 2105ms.
Meaning: -4 and -6 restrict the lookup to A or AAAA records; they don’t choose the transport to the resolver. Even so, an AAAA lookup that takes two seconds while A returns instantly will make “random sites” feel broken, and it frequently points at a resolver that is hard to reach over IPv6.
Decision: Confirm where the slowness lives (record type vs transport, see below), then fix IPv6 reachability to your resolver, or temporarily prefer an IPv4 resolver address while you repair IPv6 routing/MTU.
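To test DNS transport specifically, query a resolver’s IPv4 address and its IPv6 address directly and compare the timings (dig is in bind9-dnsutils; the resolver addresses are the Google ones this example host already uses):
cr0x@server:~$ dig +tries=1 +time=2 @8.8.8.8 example.com A | grep 'Query time'
cr0x@server:~$ dig +tries=1 +time=2 @2001:4860:4860::8888 example.com A | grep 'Query time'
If the query to the IPv6 resolver address times out or is consistently slower, the transport is your problem, independent of which record type you asked for.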
Task 14: capture proof (short, targeted tcpdump)
cr0x@server:~$ sudo tcpdump -ni enp0s31f6 ip6 and host 2606:4700:4700::1111
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on enp0s31f6, link-type EN10MB (Ethernet), snapshot length 262144 bytes
12:14:01.233445 IP6 2001:db8:120:34::25 > 2606:4700:4700::1111: ICMP6, echo request, seq 1, length 64
12:14:01.243110 IP6 2606:4700:4700::1111 > 2001:db8:120:34::25: ICMP6, echo reply, seq 1, length 64
Meaning: You’re verifying packets leave and replies return on the expected interface. If requests leave and nothing returns, that’s upstream.
If nothing leaves, that’s local routing/firewall.
Decision: Use this to prove where the failure sits before changing configs “just to see.”
The big root causes: DNS, routing, MTU, RA/DHCPv6, and firewalls
1) DNS answers are fine; DNS transport is not
One of the nastier variants: your resolver is reachable over IPv4 and IPv6, but IPv6 to the resolver is broken (MTU, routing asymmetry, firewall).
systemd-resolved may prefer the IPv6 resolver address, stall queries, and then retry. From the application’s perspective: “some domains time out.”
The fix isn’t to remove AAAA records (you can’t) or disable IPv6 (you shouldn’t). The fix is to make the IPv6 path to DNS reliable, or to constrain
which DNS servers are used on which interfaces so you don’t prefer a broken transport.
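A hedged containment while you repair the path: point the affected link at a resolver address you can actually reach. Note this is runtime-only state in systemd-resolved and will be overwritten the next time the link is reconfigured; make the durable change in netplan or the VPN profile.
cr0x@server:~$ sudo resolvectl dns enp0s31f6 8.8.8.8
cr0x@server:~$ resolvectl status enp0s31f6 | grep -A1 'DNS Servers'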
2) Default route exists, but the wrong one wins
Dual-stack failures love multi-homing: Ethernet + Wi‑Fi, LAN + VPN, Docker bridges, and “helpful” network tools that add routes.
You can have a perfectly good IPv4 default route and a broken IPv6 default route, and the host will still choose IPv6 for destinations with AAAA.
Fix the route metrics, or remove the rogue RA source. If you’re in an enterprise, be suspicious of guest Wi‑Fi that advertises IPv6 but doesn’t actually route it.
It happens more often than you’d think, because someone enabled IPv6 on an edge device by checkbox and never verified upstream.
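When you suspect competing defaults, list them side by side; the lowest metric wins. The second route below is illustrative, assuming a hypothetical VPN interface named wg0:
cr0x@server:~$ ip -6 route show default
default via fe80::1 dev enp0s31f6 proto ra metric 1024 pref medium
default dev wg0 proto static metric 50 pref medium
If the winner isn’t the link you intended, fix the metric at its source (netplan, the VPN client’s routing policy, or the RA), not with one-off ip route commands you’ll forget about.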
3) MTU black holes (the “connect then stall” classic)
IPv6 requires PMTUD to work properly; routers won’t fragment for you. If ICMPv6 Packet Too Big messages are blocked somewhere,
or a tunnel reduces MTU and no one tells endpoints, TCP can connect and then hang on larger TLS records or HTTP responses.
VPNs are common culprits. So are firewalls with default-deny rulesets that accidentally drop ICMPv6 types they don’t recognize.
When you hear “it loads small pages but not big ones,” don’t argue. Go test MTU.
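If you control a Linux box acting as the IPv6 edge, the standard mitigation is clamping TCP MSS to the path MTU on forwarded traffic. A minimal sketch, assuming your router runs nftables with an inet-family table named filter and a forward chain (adapt the names to your ruleset; this belongs on the router, not on every laptop):
cr0x@server:~$ sudo nft add rule inet filter forward tcp flags syn tcp option maxseg size set rt mtu
MSS clamping sidesteps PMTUD for TCP; it doesn’t fix it. Letting ICMPv6 Packet Too Big through end to end is still the real repair.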
4) RA and DHCPv6 aren’t aligned with reality
IPv6 config can come from Router Advertisements (SLAAC), DHCPv6, or both.
A network can hand you a global IPv6 address but not a working default route, or hand you DNS servers that aren’t reachable.
You end up “configured” but not functional.
On Ubuntu clients, netplan + NetworkManager or systemd-networkd can behave differently with respect to RA and DHCPv6.
Make sure you know which renderer is active and which component is supposed to accept RA.
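A quick, hedged check of which component is actually in charge of the interface:
cr0x@server:~$ systemctl is-active systemd-networkd NetworkManager
cr0x@server:~$ networkctl status enp0s31f6 | head -n 8
On a NetworkManager-managed desktop, nmcli device show enp0s31f6 tells the same story. The point is to know which daemon is supposed to accept RAs and run DHCPv6 before you go reading its logs.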
5) Firewalls: “block ICMP” is not a plan
On IPv4, you can sometimes get away with sloppy ICMP policy. On IPv6, you can’t.
Neighbor discovery, router discovery, and PMTUD rely on ICMPv6.
Blocking it doesn’t make you secure; it makes you blind and intermittently broken.
systemd-resolved on Ubuntu 24.04: what to trust, what to verify
Ubuntu 24.04 uses systemd-resolved in the common default setups. It’s doing caching, per-link DNS, split DNS, and it presents a local stub resolver.
That’s fine. It’s also a frequent source of confusion because people inspect /etc/resolv.conf and think they’ve learned something.
Often they’ve learned that there is a symlink.
What I trust
- resolvectl status is the current truth for which DNS servers are used per interface and globally.
- resolvectl query is a decent way to measure DNS response time without bringing in extra tools.
- systemd-resolved caching is generally stable and speeds up common lookups.
What I verify
- Whether a VPN link injected DNS servers and became the default route for IPv6 (often accidentally).
- Whether DNS servers are reachable over IPv6 without MTU or firewall issues.
- Whether split-DNS rules exist that send some domains to internal resolvers that don’t serve AAAA or time out.
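That last one is quick to check; per-link routing domains show up like this (the tun0 link and the domain are hypothetical):
cr0x@server:~$ resolvectl domain
Global:
Link 2 (enp0s31f6):
Link 4 (tun0): ~corp.example
Any domain listed with a ~ is routed only to that link’s DNS servers, which is exactly how “one team’s internal sites are broken” happens.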
If you suspect systemd-resolved itself is wedged, you can restart it—after you’ve captured symptoms. Restarting first is how bugs escape.
cr0x@server:~$ sudo systemctl restart systemd-resolved
cr0x@server:~$ resolvectl statistics
Transactions
Current Transactions: 0
Total Transactions: 1249
Cache
Current Cache Size: 317
Cache Hits: 802
Cache Misses: 447
DNSSEC Verdicts
Secure: 0
Insecure: 1249
Bogus: 0
Meaning: You’re confirming the daemon is healthy and caching. This does not prove upstream is healthy.
Decision: If restarts “fix it,” you still have a root cause; you just added a ritual. Go back to transport reachability and per-link DNS.
Netplan gotchas that quietly wreck IPv6
Netplan is great when you treat it as declarative configuration and keep it boring. It’s less great when you hand-edit YAML under pressure at 2 a.m.
YAML is a language designed to punish overconfidence.
Renderer confusion: NetworkManager vs systemd-networkd
On desktops, NetworkManager is typical. On servers, systemd-networkd is common. Their behavior around RA, DHCPv6, and DNS can differ.
When troubleshooting, identify the renderer in use and don’t mix assumptions.
cr0x@server:~$ sudo netplan get
network:
  version: 2
  renderer: networkd
  ethernets:
    enp0s31f6:
      dhcp4: true
      dhcp6: true
      accept-ra: true
Meaning: This host accepts RA and uses DHCPv6. If you set accept-ra: false by accident, you can lose the default route.
Decision: If IPv6 is partially configured, verify dhcp6 and accept-ra match your network design.
Bad static IPv6 + RA: the “two truths” problem
A static IPv6 address with the wrong prefix length, combined with RA-provided routes, can yield bizarre selection behavior.
The host thinks it has a global address; remote replies go elsewhere or return via a different path.
IPv4 still works, so everyone blames “the app.”
cr0x@server:~$ ip -6 route show table main | head
default via fe80::1 dev enp0s31f6 proto ra metric 1024 pref medium
2001:db8:120:34::/64 dev enp0s31f6 proto kernel metric 256 pref medium
2001:db8:120:99::/64 dev enp0s31f6 proto kernel metric 256 pref medium
Meaning: Two on-link global prefixes on one interface can be valid, but it’s often a sign of old static config colliding with current RA.
Decision: If you didn’t intend multi-prefix addressing, clean it up. Reduce variables before debugging higher layers.
Common mistakes (symptom → root cause → fix)
1) Symptom: “Some sites time out, but ping6 works”
Root cause: ICMPv6 echo works, but TCP/UDP is blocked or NAT64/DNS64 is interfering, or MTU is broken for larger packets.
Fix: Test with curl -6 -I, then probe MTU. Verify firewall rules allow outbound IPv6 and essential ICMPv6 types, especially Packet Too Big.
2) Symptom: “IPv6 DNS lookups are slow; IPv4 DNS is fast”
Root cause: Resolver reachable via IPv4, but IPv6 path to the resolver is broken (routing/MTU/firewall), or the resolver’s IPv6 is rate-limited upstream.
Fix: Change DNS to a reachable resolver over IPv6, or temporarily prefer IPv4 DNS transport while repairing IPv6 reachability to DNS.
3) Symptom: “Works on Wi‑Fi, breaks on Ethernet (or vice versa)”
Root cause: Different RAs, different DNS servers, or different IPv6 default routes and metrics per link.
Fix: Compare resolvectl status and ip -6 route on each link. Fix route metrics or stop the rogue RA source.
4) Symptom: “After enabling a VPN, random sites break”
Root cause: VPN advertises itself as default route for IPv6 but doesn’t carry IPv6 traffic properly; or it hijacks DNS and mishandles AAAA.
Fix: Ensure the VPN supports IPv6 end-to-end, or configure split tunneling correctly. Validate with ip -6 route get for failing destinations.
5) Symptom: “Browser spins, but curl -4 works immediately”
Root cause: IPv6 is preferred; Happy Eyeballs fallback delays make the browser look hung.
Fix: Fix the IPv6 path (don’t hide it). As a temporary mitigation, adjust DNS or routing so IPv6 is used only when actually functional.
6) Symptom: “Only big downloads fail over IPv6”
Root cause: MTU black hole or blocked ICMPv6 Packet Too Big.
Fix: Allow ICMPv6 PTB. Clamp MSS at the edge device if you control it. Reduce tunnel MTU or fix PMTUD end-to-end.
7) Symptom: “IPv6 address present, but no default route”
Root cause: RA not received, RA ignored (accept-ra: false), RA filtered (RA-guard), or router not advertising a default.
Fix: Confirm RA reception with networkd/NetworkManager logs; correct netplan; fix RA-guard configuration on the switch.
8) Symptom: “Everything works for a while, then breaks until reconnect”
Root cause: Privacy (temporary) addresses rotate and an upstream ACL or fragile neighbor cache can’t keep up; or stale routes remain from a transient interface.
Fix: Look for route flaps and address changes. Prefer stable addressing for servers; for clients, fix upstream filtering and routing stability.
Three corporate mini-stories from the trenches
Mini-story 1: The incident caused by a wrong assumption
A mid-sized company rolled out Ubuntu 24.04 to a developer fleet. Within a day, the internal helpdesk had a new category: “internet flaky.”
Developers reported that package installs would stall, some SaaS dashboards would half-load, and authentication redirects occasionally failed.
The network team insisted their WAN was clean. The endpoint team insisted it was a browser update. Everyone was correct, which is the most annoying kind of correct.
The wrong assumption was small and deadly: “If the client has an IPv6 address, IPv6 must be working.” On their guest Wi‑Fi and some office floors,
access points were sending Router Advertisements with a global prefix (so clients configured IPv6), but upstream routing for that prefix was missing on one distribution switch.
IPv6 packets left the client and vanished into the void. IPv4 kept working, so nobody got paged for “internet down.”
The symptom pattern looked random because only destinations with AAAA records triggered the broken IPv6 path. And because different CDNs returned AAAA for different regions,
two people could sit next to each other and hit different failing edges.
The fix was boring: stop advertising what you can’t route. They corrected RA configuration on the affected SSIDs and validated that every advertised prefix had an upstream route.
They also added a simple client-side check in their device onboarding: if IPv6 default route exists, verify a short list of IPv6 connectivity tests before declaring “network healthy.”
Mini-story 2: The optimization that backfired
Another org decided to “optimize DNS latency” by standardizing on IPv6 resolvers only. The rationale sounded modern and tidy: fewer legacy dependencies, better future-proofing,
and a single stack to troubleshoot. They rolled it out via endpoint management, replacing mixed IPv4/IPv6 resolvers with IPv6-only addresses.
It worked well in headquarters. It worked well in the data centers. Then remote offices started reporting that “Microsoft stuff is slow” and “some sites never load.”
The team chased browser caches, endpoint security agents, and even CPU throttling—because the failures didn’t reproduce on the corporate LAN.
The backfire was classic MTU. Several remote sites used a WAN link with encapsulation overhead; effective MTU was smaller than 1500.
IPv4 had years of scar tissue in their environment: MSS clamping was already in place for IPv4-heavy traffic patterns. IPv6 didn’t get the same care.
DNS over IPv6 to the resolvers would intermittently stall when responses crossed the MTU boundary and ICMPv6 Packet Too Big messages were mishandled by an edge firewall profile.
Rolling back to mixed resolvers immediately reduced impact, but the real fix was to repair the MTU story for IPv6: allow essential ICMPv6, clamp MSS where appropriate,
and validate with real payload sizes. The lesson was not “IPv6 is bad.” The lesson was: if you optimize a system you don’t measure, you’ve just invented a new incident class.
Mini-story 3: The boring but correct practice that saved the day
A financial services company (the kind that loves change freezes and hates surprises) had a strict practice: every network change required a dual-stack health check from a canary host.
Not a fancy synthetic monitoring platform—just a script running on a small Ubuntu VM in each site, logging DNS timing, IPv4/IPv6 route sanity, and a few HTTP HEAD requests over both stacks.
During a routine firewall firmware upgrade, an engineer applied a “hardened” policy template. The template permitted IPv4 ICMP types they expected,
but for IPv6 it defaulted to “deny unknown ICMPv6.” Neighbor discovery and Packet Too Big were collateral damage.
Within minutes, canary logs showed IPv6 HTTP requests stalling while IPv4 stayed healthy.
The boring practice saved the day because it caught the regression before users noticed. The on-call didn’t have to interpret vague tickets like “Slack is weird.”
They had timestamps, failure modes, and a clear diff: IPv6 broke exactly when the policy template was applied.
Fixing it was equally boring: explicitly allow the ICMPv6 types needed for ND, RA, and PMTUD. The postmortem was short, calm, and useful.
Nobody learned an exciting new trick. Everyone kept their weekend.
Checklists / step-by-step plan
Step-by-step: make dual-stack correct on Ubuntu 24.04
1) Confirm the symptom is IPv6-related. Use curl -4 vs curl -6 for a failing hostname. If IPv6 fails, stop blaming the browser.
2) Validate IPv6 address + default route. Check ip -6 addr and ip -6 route. No global address or no default route means you’re not ready for application debugging.
3) Verify neighbor discovery. Check ip -6 neigh. If the router entry is missing or FAILED, you have L2/L3 problems or RA filtering.
4) Test IPv6 reachability without DNS. Ping a known IPv6 address. If it fails, focus on upstream routing/firewall/VPN.
5) Test DNS timing and transport. Use resolvectl query; compare -4 and -6 timings. Slow IPv6 DNS is still an IPv6 outage.
6) Confirm which DNS servers are being used per link. resolvectl status. If a VPN injects DNS, verify that’s intended and that it works over IPv6.
7) Check MTU and PMTUD signals. Use larger IPv6 pings and look for stalls. If you control the edge, ensure ICMPv6 Packet Too Big is permitted.
8) Audit firewall policy for IPv6 reality. Use nft list ruleset. Allow essential ICMPv6 types. Don’t “secure” the protocol into failure.
9) Lock in netplan correctness. Confirm renderer, RA acceptance, and DHCPv6 settings. Apply changes carefully and keep them minimal.
10) Capture evidence before and after. A short tcpdump is worth a thousand Slack threads. Save it with a timestamp and interface name.
Operational checklist: keep it from returning
- Maintain a dual-stack canary check (DNS timing + HTTP over v4/v6) per site/VLAN/VPN profile; a minimal sketch follows this list.
- Require change reviewers to consider ICMPv6 rules explicitly, not as an afterthought.
- Track where RAs originate; treat “rogue RA” as a real failure mode, not a theory.
- Document route metrics and policy routing when VPNs are involved.
- When you change MTU anywhere, validate IPv6 PMTUD with real payload sizes.
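Here is a hedged sketch of such a canary, assuming curl and dig (bind9-dnsutils) are installed; the URLs and resolver addresses are illustrative, and a real deployment would add its own targets, thresholds, and log shipping:
#!/usr/bin/env bash
# dual-stack-canary.sh -- minimal sketch, not a monitoring product
set -u
ts=$(date -Is)
# Route sanity: is there an IPv6 default route at all?
ip -6 route show default | grep -q '^default' || echo "$ts route6 MISSING_DEFAULT"
# HTTP HEAD over each stack, with total time
for url in https://example.com/ https://www.ubuntu.com/; do
  for fam in 4 6; do
    if t=$(curl -"$fam" -sS -o /dev/null --head --max-time 8 -w '%{time_total}' "$url" 2>/dev/null); then
      echo "$ts http$fam $url ${t}s"
    else
      echo "$ts http$fam $url FAIL"
    fi
  done
done
# DNS timing to one resolver over IPv4 and over IPv6 transport
for ns in 8.8.8.8 2001:4860:4860::8888; do
  dig +tries=1 +time=2 @"$ns" example.com A 2>/dev/null \
    | awk -v ts="$ts" -v ns="$ns" '/Query time/ {print ts, "dns@" ns, $4, "msec"}'
done
Run it from cron or a systemd timer on one small VM per site/VLAN/VPN profile and keep the output; the value is in the diff when something changes.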
One quote worth keeping on your monitor
“Hope is not a strategy.” — Gene Kranz (paraphrased idea)
Substitute “disabling IPv6” for “hope” and you’ve got the moral of this entire mess.
FAQ
1) Why do only some sites break?
Because only some sites publish AAAA records, and only some CDNs steer you to an IPv6 edge your network can’t reach reliably.
Your browser tries IPv6 first, hits a dead path, then eventually falls back to IPv4.
2) Should I just disable IPv6 on Ubuntu 24.04?
As a temporary containment measure on a single host during an incident, maybe. As a “fix,” no.
Disabling IPv6 hides the failure and guarantees you won’t notice when your network is half-broken again.
3) If ping6 works, doesn’t that prove IPv6 is fine?
No. ICMPv6 echo can succeed while TCP/UDP fails due to firewall policy, MTU/PMTUD issues, or asymmetric routing.
Treat ping6 as a starting point, not a verdict.
4) What’s the single most common real cause?
Broken IPv6 routing advertised by RA (clients get addresses and prefer IPv6) combined with upstream not actually routing that prefix.
Second place: MTU black holes caused by blocked ICMPv6 Packet Too Big.
5) How do I know if systemd-resolved is the problem?
If resolvectl query is slow or inconsistent, and especially if resolvectl status shows unreachable DNS servers,
the resolver is exposing a transport issue. The daemon itself is rarely the root cause.
6) Why does a VPN make it worse?
VPNs often change routes and DNS servers. If the VPN claims IPv6 default routing but doesn’t carry IPv6 traffic properly, your host will send IPv6 into a tunnel to nowhere.
Or the VPN’s DNS resolver mishandles AAAA queries. Either way: dual-stack preference amplifies the breakage.
7) What ICMPv6 types must be allowed?
At minimum: neighbor solicitation/advertisement, router solicitation/advertisement, packet-too-big, time-exceeded, and parameter-problem.
Exact policy depends on your environment, but “block all ICMPv6” is a self-inflicted outage generator.
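As a concrete sketch, the rule below permits exactly that set in an inet-family input chain; the table and chain names (inet filter input) mirror the ruleset shown in Task 10 and are assumptions about your environment:
cr0x@server:~$ sudo nft add rule inet filter input icmpv6 type { nd-neighbor-solicit, nd-neighbor-advert, nd-router-solicit, nd-router-advert, packet-too-big, time-exceeded, parameter-problem } accept
On hosts you monitor with ping, also allow echo-request and echo-reply, as the Task 10 ruleset does.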
8) How do MTU issues present in browsers?
You’ll see connections that start but don’t finish: TLS handshakes stalling, large assets timing out, or QUIC failing while TCP sometimes works.
If small payloads succeed and larger ones fail, suspect MTU/PMTUD immediately.
9) Is this specific to Ubuntu 24.04?
The underlying failure modes are universal. What changes between releases is the plumbing: resolver behavior, netplan defaults, network manager versions,
and how aggressively apps prefer IPv6. Ubuntu 24.04 is modern enough that it won’t politely avoid your broken IPv6.
10) What’s the quickest safe mitigation while I fix the network?
Prefer stable DNS transport and stable routing. For example, ensure DNS servers are reachable over IPv6 (or temporarily use IPv4 resolvers),
and prevent a broken interface/VPN from being the IPv6 default route. Avoid kernel-level IPv6 disable unless you’re containing an active incident.
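If you need a host-local stopgap that doesn’t disable IPv6, the RFC 6724 policy table in /etc/gai.conf can be tilted so glibc’s getaddrinfo prefers IPv4 destinations. This only affects programs that rely on getaddrinfo ordering (browsers with their own Happy Eyeballs logic may behave differently), and it should be removed once IPv6 is repaired:
cr0x@server:~$ echo 'precedence ::ffff:0:0/96  100' | sudo tee -a /etc/gai.conf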
Conclusion: next steps that won’t age badly
Dual-stack is supposed to be boring. When it isn’t, it’s almost never “mystical IPv6.” It’s the same old operational sins:
advertising routes you can’t carry, blocking control-plane packets you actually need, and treating MTU as a theoretical concept.
Do this next, in order: run the fast diagnosis playbook, isolate whether it’s DNS transport or routing/MTU, and then fix the network so IPv6 is either
truly functional or not advertised. Don’t teach your fleet to live with a broken stack. That’s how you end up with policies that only work on Tuesdays.
Joke #2: The good news is IPv6 has enough addresses for every device you own. The bad news is it also has enough ways to misconfigure them.