You can load dashboards all day. Slack works. SSH connects instantly. Then you try to copy a 4 GB file over the VPN and it freezes at 3% like it’s waiting for approval from Legal.
The sender keeps “sending,” the receiver keeps “waiting,” and everyone blames “the network” with the confidence of someone who has never looked at a packet capture.
This is one of the most common VPN failure modes in production: small interactive traffic survives, while large transfers stall, crawl, or randomly reset.
The culprit is usually MTU—more precisely, how MTU interacts with encapsulation overhead, TCP behavior, and Path MTU Discovery (PMTUD) when ICMP gets filtered.
The mental model: why “web works” but big files don’t
MTU (Maximum Transmission Unit) is the largest IP packet payload your interface is willing to send without fragmentation.
Ethernet’s classic MTU is 1500 bytes. Many VPNs run over Ethernet. Many people assume the VPN inherits that 1500. That assumption has hurt more careers than any single bug in a service.
VPNs add headers. Sometimes a lot of headers.
Your “inner” packet (the one your application thinks it’s sending) gets wrapped inside an “outer” packet (the one the Internet actually carries).
If the outer path can’t handle the size, the packet must be fragmented or the sender must reduce packet size.
Here’s where the “web works” illusion comes from:
- Interactive stuff is tiny. SSH keystrokes and API calls typically fit in a few hundred bytes. Even if the MTU is wrong, plenty of those packets slip through.
- Web traffic is resilient. Browsers retry. CDNs tolerate loss. Some sites use smaller initial congestion windows. And many paths silently clamp TCP MSS or just happen to have enough headroom.
- Big transfers hit the MTU ceiling constantly. SCP, rsync, SMB, NFS over TCP—these try to fill the pipe. That means lots of full-sized TCP segments. If those segments get black-holed, throughput collapses.
- PMTUD is fragile in corporate networks. It relies on ICMP “Fragmentation Needed” messages returning to the sender. Firewalls love dropping ICMP because it “looks scary.”
When PMTUD is broken, you get a classic “MTU black hole”: large packets disappear, nobody informs the sender, TCP retransmits the same doomed segment, and the transfer “hangs.”
Joke #1: MTU problems are like office chairs—nobody thinks about them until their back goes out mid-quarter.
Interesting facts and historical context (MTU edition)
- 1500 bytes wasn’t a law of physics. It’s a design choice from early Ethernet tradeoffs: efficiency vs. collision domain behavior and hardware constraints.
- IP fragmentation exists because the Internet was stitched from many networks. Early IP had to cross links with different maximum frame sizes, so fragmentation was a compatibility hack.
- IPv6 changed the rules: routers don’t fragment. In IPv6, fragmentation is done only by the sender, which makes PMTUD even more critical.
- PPPoE’s 1492 MTU is a classic gotcha. Eight bytes of PPPoE overhead means 1500 doesn’t fit, and VPNs layered on top get even tighter.
- “Jumbo frames” are common inside data centers. 9000-byte MTUs improve efficiency, but mixed MTUs across overlays (VXLAN/Geneve) create failure modes that look exactly like VPN MTU issues.
- PMTUD has been known to fail since the 1990s. The “black hole” problem is old: ICMP filtering breaks the feedback loop.
- TCP MSS clamping became popular because people gave up on PMTUD. Not elegant, but it’s pragmatic when you don’t control every firewall in the path.
- IPsec can add more overhead than people expect. Depending on mode (tunnel vs transport), NAT traversal, and algorithms, the overhead swings—and your safe MTU changes.
What MTU pain looks like in the wild
MTU issues rarely show up as “packet too big.” They show up as weirdness: selective failure, inconsistent behavior across apps, and performance that looks like a bad day at an ISP.
The symptoms that should make you suspect MTU first:
- SCP/rsync/SMB transfers stall after some data is sent, often at a repeatable offset.
- HTTPS works, but uploading a large artifact times out.
- SSH works, but running git clone over the VPN is painfully slow or hangs.
- Some sites work, some don’t, especially ones that send large cookies/headers or use larger TLS records.
- VPN works from one network but not another (home fiber vs. coffee shop vs. corporate guest Wi‑Fi).
- Things break only when you enable “full tunnel” or route additional subnets.
Underneath those symptoms is usually one of these mechanical causes:
oversized packets + DF set (don’t fragment) + broken PMTUD = black hole.
Or fragmentation happens, but it happens badly (lossy fragments, reassembly issues, CPU overhead, stateful firewall pain).
PMTUD and the ICMP messages everyone drops
Path MTU Discovery is supposed to be simple: send a packet with DF set, and if a router can’t forward it, the router replies with an ICMP message indicating the next-hop MTU.
The sender reduces size and tries again. Everyone is happy.
In practice, PMTUD is a negotiation conducted by two hosts through a chain of devices that may:
block ICMP, rewrite headers, encapsulate packets, or “helpfully” normalize traffic.
If the ICMP “Fragmentation Needed” (IPv4) or “Packet Too Big” (IPv6) message is filtered, the sender keeps transmitting oversized packets and gets no feedback.
That’s the entire black-hole story. A single security rule written in 2014 (“drop all ICMP”) can kneecap a modern VPN in 2025.
Here’s the opinionated take: do not solve PMTUD failures by hoping PMTUD will start working.
In corporate reality, you should assume some middlebox will drop ICMP at the worst possible place.
Your job is to pick a safe MTU or clamp MSS so TCP never sends packets that need PMTUD in the first place.
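If you want to see whether that feedback is coming back at all, tracepath (from iputils) probes the path and reports the PMTU it learns along the way. The target and output below are illustrative, reusing the example peer address from later in this article:
cr0x@server:~$ tracepath -n 198.51.100.20
 1?: [LOCALHOST]                      pmtu 1500
 1:  203.0.113.1                                           1.042ms pmtu 1492
 1:  203.0.113.1                                           1.113ms
 2:  192.0.2.33                                            9.877ms
 3:  198.51.100.20                                        28.415ms reached
     Resume: pmtu 1492 hops 3 back 3
If large transfers die but tracepath cheerfully reports the full interface MTU end-to-end, treat that as evidence the “too big” messages are being filtered, not as proof the path is clean.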
VPN encapsulation overhead: the bytes you don’t see
Encapsulation is not free. Every VPN protocol adds headers (and sometimes padding) that consume space inside the outer MTU.
If your physical interface MTU is 1500 and your VPN adds 80 bytes of overhead, your inner packet cannot be 1500 anymore.
Overhead varies. It varies by protocol, by mode, by whether NAT traversal is used, and by whether you’re on IPv4 or IPv6 on the outer transport.
Here are ballpark numbers you can reason with (not promises, just working estimates):
- WireGuard over IPv4/UDP: 60 bytes of overhead (20-byte IPv4 + 8-byte UDP + 32-byte WireGuard header and auth tag); the common default tunnel MTU of 1420 leaves extra margin on a 1500-byte underlay.
- WireGuard over IPv6/UDP: 80 bytes of overhead (40-byte IPv6 header instead of 20); 1420 on a 1500-byte underlay accounts for exactly this case.
- IPsec ESP tunnel mode: commonly 50–80+ bytes; NAT-T adds more (UDP encapsulation).
- OpenVPN over UDP: overhead depends on cipher, HMAC, and framing options; on a 1500-byte underlay a workable tunnel MTU typically lands in the 1400–1472 range depending on tuning.
- GRE/VXLAN overlays: add their own overhead; when stacked with VPN, you can lose 100+ bytes quickly.
The correct way to think: you have an outer path MTU, and you must choose an inner MTU that always fits after adding overhead.
If you don’t know the path MTU, you measure. If you can’t measure reliably, you choose conservatively and clamp.
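To make the arithmetic concrete, here is the kind of budget you might write down for WireGuard riding a PPPoE uplink, using the working estimates above as assumptions rather than guarantees:
outer path MTU (PPPoE uplink)            1492
- outer IPv4 header                       -20
- UDP header                               -8
- WireGuard header + auth tag             -32
= largest inner packet that fits         1432
tunnel MTU you actually configure        1420 (rounded down for margin)
If the same clients might also connect over an IPv6 underlay, redo the math with a 40-byte outer header (1492 - 80 = 1412), which is exactly why a flat 1400 is often the boring, safer choice.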
Joke #2: Encapsulation overhead is like meetings—nobody budgets for it, and somehow it eats the whole afternoon.
How to pick the “right” VPN MTU
There isn’t one magic MTU. There is a safe MTU for your outer path and your encapsulation stack.
Picking it is an engineering exercise: measure the path, subtract overhead, and verify with real traffic.
Step 1: figure out the smallest outer MTU you’ll encounter
If you control the underlay (a private WAN), you may know it’s 1500 end-to-end or 9000 end-to-end.
If your VPN rides the public Internet, assume you do not control it.
Consumer ISPs, PPPoE, LTE, and hotel Wi‑Fi can reduce MTU in ways that change by location.
A pragmatic target for Internet-based VPNs is often:
- MTU 1420 for WireGuard, or
- MTU 1400 as a conservative “works most places” baseline when you have mixed networks and unknown middleboxes.
Step 2: decide whether you will rely on PMTUD or clamp MSS
If you can guarantee ICMP is permitted end-to-end and you have visibility into firewalls, PMTUD can work.
Most organizations cannot guarantee that across partner networks, remote worker networks, and random guest Wi‑Fi.
My default recommendation:
set a safe tunnel MTU and also clamp TCP MSS at the VPN edge for defense-in-depth.
It’s boring. It works. It’s the kind of boring that keeps your incident channel quiet.
Step 3: validate with “do not fragment” tests and real transfers
A ping test with DF set is not the whole story, but it’s a fast way to find the failure boundary.
Then validate with an actual bulk transfer (iperf3, scp of a large file, or rsync) because some paths treat ICMP differently than UDP/TCP.
Fast diagnosis playbook (first/second/third)
When someone says “VPN is slow” and you suspect MTU, you want a short deterministic path to truth.
Here’s the fastest order that catches most real-world failures without turning into a week-long archaeology dig.
First: confirm the symptom is size-dependent
- Small HTTP requests work; large uploads fail.
- SSH connects; scp of a multi-GB file stalls.
- Ping works with small payload; fails with larger payload + DF.
Second: find the effective MTU on the tunnel and its underlay
- Check the tunnel interface MTU.
- Check the physical egress interface MTU.
- Look for PPPoE (1492), LTE weirdness, or cloud networking constraints.
Third: detect PMTUD failure and decide clamp vs. allow ICMP
- Run DF pings to binary-search the max size.
- Capture traffic to see retransmits and missing ICMP.
- If ICMP is blocked somewhere you can’t control: clamp MSS and lower MTU.
Fourth: validate the fix with a real bulk transfer
- Use iperf3 or scp/rsync of a multi-GB file.
- Watch for retransmits, out-of-order, and throughput stability.
- Confirm you didn’t break something else (like an overlay network).
Practical tasks (commands, outputs, decisions)
The point of MTU troubleshooting is to stop guessing.
Below are concrete tasks you can run on Linux hosts and typical VPN gateways.
Each task includes: a command, example output, what it means, and what decision you make next.
Task 1: Check MTU on all interfaces (find the tunnel and the real egress)
cr0x@server:~$ ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
    link/ether 0a:1b:2c:3d:4e:5f brd ff:ff:ff:ff:ff:ff
5: wg0: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1420 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/none
What it means: The tunnel (wg0) is set to 1420, underlay eth0 is 1500. This is a sane starting point for WireGuard.
Decision: If the tunnel MTU is 1500 while underlay is 1500, you’re almost certainly oversized after overhead. Plan to lower tunnel MTU or clamp MSS.
Task 2: Inspect the route and learn which interface actually carries VPN traffic
cr0x@server:~$ ip route get 1.1.1.1
1.1.1.1 via 203.0.113.1 dev eth0 src 203.0.113.10 uid 1000
cache
What it means: Your Internet egress is eth0. That’s the underlay for the VPN if your VPN peers are on the Internet.
Decision: If the egress interface is a PPPoE device (often ppp0) or a VLAN with lower MTU, use that value as the upper bound for your outer packet size.
Task 3: Check for PPPoE or other reduced-MTU links
cr0x@server:~$ ip -d link show dev ppp0
6: ppp0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1492 qdisc fq_codel state UNKNOWN mode DEFAULT group default qlen 3
link/ppp
What it means: Outer MTU is 1492, not 1500. If you assumed 1500, your VPN inner MTU must be reduced further.
Decision: Lower tunnel MTU and/or clamp MSS accordingly. Don’t “optimize” this away unless you control the access method.
Task 4: Quick DF ping to detect fragmentation/black hole (IPv4)
cr0x@server:~$ ping -M do -s 1472 -c 3 198.51.100.20
PING 198.51.100.20 (198.51.100.20) 1472(1500) bytes of data.
ping: local error: message too long, mtu=1420
ping: local error: message too long, mtu=1420
ping: local error: message too long, mtu=1420
--- 198.51.100.20 ping statistics ---
3 packets transmitted, 0 received, +3 errors, 100% packet loss, time 2053ms
What it means: Your local system already knows an MTU constraint (1420) on the path/interface used. A 1500-byte packet won’t fit.
Decision: Don’t send TCP segments that require 1500-byte packets over this path. Set tunnel MTU to match reality or clamp MSS.
Task 5: Binary search the maximum DF payload that works
cr0x@server:~$ ping -M do -s 1392 -c 2 198.51.100.20
PING 198.51.100.20 (198.51.100.20) 1392(1420) bytes of data.
1400 bytes from 198.51.100.20: icmp_seq=1 ttl=57 time=28.1 ms
1400 bytes from 198.51.100.20: icmp_seq=2 ttl=57 time=27.8 ms
--- 198.51.100.20 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1002ms
rtt min/avg/max/mdev = 27.825/27.948/28.071/0.123 ms
What it means: 1420 total IP packet size works (1392 payload + 28 bytes ICMP+IP header). That suggests a safe path MTU near 1420 for this flow.
Decision: Choose tunnel MTU at or below this boundary (account for VPN overhead). For WireGuard, 1420 is often correct; for heavier encapsulations, go lower.
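If you’d rather not binary-search by hand, a short loop does it for you. This is a minimal sketch (the script name and bounds are mine, not a standard tool); it assumes iputils ping, an IPv4 target, and a path where “fits or doesn’t” behaves monotonically:
#!/usr/bin/env bash
# df-mtu-probe.sh -- binary-search the largest ICMP payload that passes with DF set.
# Usage: ./df-mtu-probe.sh 198.51.100.20
target="$1"
lo=1200          # assumed to always fit; lower it if even this fails
hi=1473          # exclusive upper bound: 1472 payload = a full 1500-byte packet
while [ $((hi - lo)) -gt 1 ]; do
  mid=$(( (lo + hi) / 2 ))
  if ping -M do -c 1 -W 2 -s "$mid" "$target" >/dev/null 2>&1; then
    lo=$mid      # this size made it through
  else
    hi=$mid      # too big (or lost); shrink
  fi
done
echo "Largest working payload: $lo bytes -> usable path MTU about $((lo + 28)) bytes"
Run it once against the VPN peer’s public endpoint and once through the tunnel to an inside address, and you have both the outer limit and the effective inner limit.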
Task 6: Detect PMTUD being blocked (classic black hole pattern)
cr0x@server:~$ ping -M do -s 1412 -c 2 198.51.100.20
PING 198.51.100.20 (198.51.100.20) 1412(1440) bytes of data.
--- 198.51.100.20 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 2038ms
What it means: No reply, and no ICMP “fragmentation needed” came back either. (Contrast with Task 4, where the local stack already knew the limit and errored immediately; here the packet leaves and vanishes.) This might be plain loss, but if smaller sizes work reliably, it’s suspicious.
Decision: Assume an MTU black hole until proven otherwise. Prefer MSS clamping and a conservative tunnel MTU. Also check firewall rules for ICMP type 3 code 4 (IPv4) and ICMPv6 type 2.
Task 7: Check TCP MSS and observed path behavior with ss
cr0x@server:~$ ss -ti dst 198.51.100.20
ESTAB 0 0 10.0.0.5:48216 198.51.100.20:443
cubic wscale:7,7 rto:204 rtt:32.4/4.1 ato:40 mss:1380 pmtu:1420 rcvmss:1380 advmss:1380 cwnd:10 bytes_sent:128734 bytes_acked:128734
What it means: The connection is using MSS 1380 and PMTU 1420. That’s consistent: 1420 minus 40 bytes of IPv4+TCP headers = 1380 (TCP options such as timestamps can shave a few more bytes off the payload).
Decision: If you see MSS near 1460 but your tunnel MTU is lower, you likely have a mismatch and should clamp or fix MTU.
Task 8: Verify WireGuard’s configured MTU and peer path
cr0x@server:~$ wg show
interface: wg0
public key: 8n3r...redacted...
listening port: 51820
peer: pQ1k...redacted...
endpoint: 198.51.100.20:51820
allowed ips: 10.42.0.0/16
latest handshake: 1 minute, 12 seconds ago
transfer: 1.23 GiB received, 2.98 GiB sent
What it means: The tunnel is alive and moving data. MTU issues won’t necessarily stop handshakes; they punish bulk transfers and certain packet sizes.
Decision: If handshakes are fine but throughput is awful, focus on MTU/MSS/PMTUD rather than keys or routing.
Task 9: Check and set interface MTU (temporary change for testing)
cr0x@server:~$ sudo ip link set dev wg0 mtu 1380
cr0x@server:~$ ip link show dev wg0
5: wg0: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1380 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
link/none
What it means: You forced a smaller MTU on the tunnel. This reduces fragmentation risk at the cost of slightly more overhead per byte transferred.
Decision: If big transfers become stable after lowering MTU, you’ve confirmed MTU was the bottleneck. Now calculate the highest safe value and apply permanently in config management.
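For WireGuard interfaces managed with wg-quick, the persistent version of that change is a single line in the interface config. The path and addresses here are illustrative:
cr0x@server:~$ sudo grep -A3 '\[Interface\]' /etc/wireguard/wg0.conf
[Interface]
Address = 10.42.0.10/16
MTU = 1380
PrivateKey = (redacted)
wg-quick applies the MTU when it brings the interface up, so expect a brief reconnect when you roll it out; if your fleet uses systemd-networkd or NetworkManager instead, set the equivalent MTU option there rather than relying on an ad-hoc ip link command at boot.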
Task 10: Clamp TCP MSS on a Linux gateway with iptables
cr0x@server:~$ sudo iptables -t mangle -A FORWARD -o wg0 -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu
cr0x@server:~$ sudo iptables -t mangle -S FORWARD
-P FORWARD ACCEPT
-A FORWARD -o wg0 -p tcp -m tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu
What it means: For TCP SYN packets leaving via wg0, MSS will be adjusted to fit the discovered PMTU (or interface MTU).
Decision: Use this when you cannot trust PMTUD end-to-end or when clients have inconsistent MTU settings. Then verify with ss -ti that MSS is reduced.
Task 11: Do the same with nftables (common in modern distros)
cr0x@server:~$ sudo nft add table inet mangle
cr0x@server:~$ sudo nft 'add chain inet mangle forward { type filter hook forward priority -150; }'
cr0x@server:~$ sudo nft add rule inet mangle forward oifname "wg0" tcp flags syn tcp option maxseg size set rt mtu
cr0x@server:~$ sudo nft list ruleset
table inet mangle {
chain forward {
type filter hook forward priority -150; policy accept;
oifname "wg0" tcp flags syn tcp option maxseg size set rt mtu
}
}
What it means: MSS is set based on route MTU. This is the nftables equivalent of “clamp to PMTU.”
Decision: Prefer nftables if that’s what your fleet uses. Make it persistent using your distro’s mechanism (not shown here), and test after reboot.
Task 12: Confirm ICMP “fragmentation needed” isn’t being dropped locally
cr0x@server:~$ sudo iptables -S INPUT
-P INPUT DROP
-A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
-A INPUT -p icmp -j DROP
What it means: Unsolicited ICMP is dropped at the host firewall. The RELATED state match above it should still admit errors tied to tracked connections, but rule order, NAT, and the other firewalls in the path make that a fragile thing to stake PMTUD on.
Decision: Allow needed ICMP types, or clamp MSS so you don’t rely on ICMP. If you’re on IPv6, “drop all ICMPv6” is a reliable way to break your network in creative ways.
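A minimal sketch of what “allow needed ICMP types” can look like with iptables/ip6tables, assuming the ruleset shown above; insert (-I) rather than append so the accepts land before the blanket drop, and adapt this to whatever policy engine you actually run:
cr0x@server:~$ sudo iptables -I INPUT -p icmp --icmp-type fragmentation-needed -j ACCEPT
cr0x@server:~$ sudo ip6tables -I INPUT -p icmpv6 --icmpv6-type packet-too-big -j ACCEPT
The host is rarely the only place ICMP dies, so mirror the same intent in the perimeter and VPN-edge firewalls too.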
Task 13: Capture evidence of black-holing (retransmits, no ICMP back)
cr0x@server:~$ sudo tcpdump -ni wg0 'tcp port 443 or icmp' -vv
tcpdump: listening on wg0, link-type RAW (Raw IP), snapshot length 262144 bytes
12:15:01.102334 IP 10.42.0.10.48216 > 10.42.0.20.443: Flags [P.], seq 1:1361, ack 1, win 501, length 1360
12:15:01.304982 IP 10.42.0.10.48216 > 10.42.0.20.443: Flags [P.], seq 1:1361, ack 1, win 501, length 1360
12:15:01.708115 IP 10.42.0.10.48216 > 10.42.0.20.443: Flags [P.], seq 1:1361, ack 1, win 501, length 1360
What it means: Retransmissions of the same segment without progress. If you never see ICMP “too big” messages and smaller MSS works, you have a PMTUD/MTU problem.
Decision: Reduce tunnel MTU and clamp MSS. Then retest and confirm retransmits drop.
Task 14: Measure throughput and stability with iperf3 (TCP)
cr0x@server:~$ iperf3 -c 10.42.0.20 -t 15
Connecting to host 10.42.0.20, port 5201
[ 5] local 10.42.0.10 port 46012 connected to 10.42.0.20 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 6.50 MBytes 54.5 Mbits/sec 12 34.0 KBytes
[ 5] 1.00-2.00 sec 2.00 MBytes 16.8 Mbits/sec 31 19.8 KBytes
[ 5] 2.00-3.00 sec 1.25 MBytes 10.5 Mbits/sec 45 14.1 KBytes
[ 5] 3.00-4.00 sec 0.00 Bytes 0.00 bits/sec 60 10.0 KBytes
What it means: Retransmits climbing and throughput collapsing to zero is consistent with black-holed segments or severe loss. MTU issues often present like this once you hit the “bad size.”
Decision: Lower MTU / clamp MSS and rerun. A good result is stable bitrate with low retransmits.
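Before you touch gateway config, you can test the “smaller segments fix it” hypothesis from one client: iperf3 can request a specific MSS with -M. The values and output below are illustrative:
cr0x@server:~$ iperf3 -c 10.42.0.20 -t 15 -M 1360
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 11.2 MBytes 94.0 Mbits/sec 0 318 KBytes
[ 5] 1.00-2.00 sec 11.1 MBytes 93.4 Mbits/sec 0 318 KBytes
If forcing a smaller MSS turns the transfer stable with near-zero retransmits, you have strong evidence for MTU/MSS rather than generic packet loss.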
Task 15: Spot local offload features that can confuse captures (not usually the root cause)
cr0x@server:~$ sudo ethtool -k eth0 | grep -E '^(tcp-segmentation|generic-segmentation|generic-receive)-offload'
tcp-segmentation-offload: on
generic-segmentation-offload: on
generic-receive-offload: on
What it means: Offloads can make packet captures look odd (big “super packets” on host side). They don’t usually cause MTU black holes, but they can make you misread evidence.
Decision: If you’re deep in packet analysis, temporarily disable offloads for clarity (not as a “fix”), then re-enable to avoid performance regression.
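If you do switch offloads off for a capture session, it’s a two-command round trip; the feature names are the short aliases ethtool accepts:
cr0x@server:~$ sudo ethtool -K eth0 gro off gso off tso off
cr0x@server:~$ sudo ethtool -K eth0 gro on gso on tso on
With offloads off, tcpdump shows on-the-wire segment sizes instead of 64 KB “super packets,” which makes MTU analysis much less confusing.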
Task 16: Verify MTU on a remote client (because your gateway isn’t the only liar)
cr0x@server:~$ ssh user@client 'ip link show dev wg0'
7: wg0: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/none
What it means: Client wg0 is at 1500. That client may send inner packets that don’t fit once encapsulated, depending on the path.
Decision: Standardize MTU across clients via config management (or push a profile if you’re using a managed client). Consistency beats hero debugging.
Three corporate mini-stories (because production is a teacher)
1) The incident caused by a wrong assumption
A mid-sized company rolled out a new site-to-site VPN between a cloud VPC and an on-prem office.
Connectivity tests looked clean: pings worked, DNS worked, people could SSH to bastions. Everyone declared success.
Two days later, the CI system started failing to upload build artifacts to an internal registry reachable only over that VPN.
The build logs showed timeouts and retries. Developers blamed the registry. The registry team blamed the CI runners. SRE got pulled in because the incident channel was getting that special “this is urgent but nobody knows why” energy.
Someone noticed a pattern: small images pushed fine. Large layers stalled indefinitely.
The wrong assumption was simple and common: “The network is Ethernet, so MTU is 1500 everywhere.”
The on-prem uplink used PPPoE. The outer MTU was 1492. The VPN added overhead on top. PMTUD was broken because an old firewall policy dropped ICMP type 3 code 4.
So large TCP segments disappeared into the void, and TCP did what TCP does: retransmit, back off, and suffer quietly.
The fix wasn’t heroic. They lowered the tunnel MTU to a safe value and clamped MSS on the gateway.
Uploads became boring. The incident ended.
The postmortem action item was sharper: stop using “ping works” as a proxy for “bulk traffic works,” and stop dropping all ICMP by default.
2) The optimization that backfired
Another org had a remote workforce and a VPN client that defaulted to MTU 1420.
A network engineer, trying to squeeze more throughput out of large file transfers, pushed a policy to set MTU to 1500 “to match Ethernet.”
They also disabled MSS clamping because it “shouldn’t be necessary.”
For a week, it looked like a win in the office: faster transfers to certain internal services over a well-behaved ISP link.
Then the helpdesk queue exploded with remote users: video calls were fine, but uploading reports to an internal portal failed, and some web apps loaded without CSS.
The failures were inconsistent across ISPs and locations, which is exactly how MTU problems present when the underlay varies.
The backfire came from the real world: some users were on LTE hotspots with smaller effective MTUs; some were behind consumer routers doing odd things with ICMP; some had ISP paths that simply wouldn’t pass 1500+encapsulation reliably.
PMTUD should have negotiated smaller sizes, but it couldn’t, because ICMP “too big” messages were not making it through consistently.
The rollback to 1420 fixed most users instantly. They reintroduced MSS clamping at the VPN edge, then ran a controlled experiment to see if any subset could safely use a larger MTU.
The conclusion was annoyingly predictable: a globally higher MTU was not worth the support cost. They kept 1420 and focused on throughput via better routing and capacity planning.
3) The boring but correct practice that saved the day
A financial services team ran multiple VPN types: WireGuard for engineers, IPsec tunnels for partners, and an overlay network in Kubernetes.
They had a rule: every new tunnel must ship with an MTU test and an MSS policy, plus a one-page runbook that included the “DF ping binary search” procedure.
Nobody loved this rule. Everyone followed it because it prevented weekends from being destroyed.
One Friday, a partner changed their firewall posture and started blocking more ICMP.
Their IPsec tunnel stayed “up” but large messages to a settlement API started failing sporadically. Alerts triggered, but the team didn’t panic; they had a playbook and a known-good baseline.
They ran their standard checks: DF ping tests, tcpdump for retransmits, and verification of MSS clamping on the edge.
The result: the clamp was already preventing oversized TCP segments, so the impact was limited to a subset of non-TCP traffic and a specific service that sent larger UDP payloads than expected.
Because the tunnel MTU was already conservative, the blast radius was smaller than it could have been.
They adjusted the application payload size for that UDP path and coordinated with the partner to allow the necessary ICMP types.
The “boring” practice—standard MTU settings, clamping, and repeatable tests—turned what could have been a multi-team outage into a contained operational ticket.
Common mistakes: symptoms → root cause → fix
1) Symptom: SCP stalls; SSH interactive is fine
Root cause: TCP segments sized for 1500 MTU are being black-holed after VPN encapsulation; PMTUD/ICMP feedback is blocked.
Fix: Lower tunnel MTU (e.g., 1420→1380 if needed) and clamp MSS at the VPN edge. Verify with ss -ti and a real transfer.
2) Symptom: Some websites load, others hang or lose assets (CSS/images)
Root cause: Mixed packet sizes in TLS/HTTP; larger responses trigger fragmentation issues. Often seen with full-tunnel VPN and MTU mismatch.
Fix: Clamp MSS on outbound TCP from the client or VPN gateway. If you manage clients, standardize tunnel MTU.
3) Symptom: VPN “connects” but throughput is terrible and retransmits are high
Root cause: Fragmentation or black-hole loss at a specific size boundary; sometimes aggravated by an overlay-on-overlay stack (VXLAN over VPN, etc.).
Fix: Measure max DF size, reduce MTU, and avoid stacking encapsulations without budgeting MTU. If you must stack, bake MTU math into design.
4) Symptom: Works from office network, fails from home or LTE
Root cause: Underlay MTU differs (PPPoE/LTE), and your chosen tunnel MTU is too high for some paths. PMTUD may not function across consumer gear.
Fix: Use conservative MTU for remote-access VPNs and clamp MSS. Stop treating the office as “the Internet.”
5) Symptom: IPv6 traffic over VPN is flaky; IPv4 is okay
Root cause: ICMPv6 filtered (especially “Packet Too Big”), which breaks IPv6 PMTUD. Also, IPv6 headers are larger, reducing effective payload.
Fix: Permit required ICMPv6 types or clamp MSS / reduce MTU more aggressively for IPv6. Don’t blanket-drop ICMPv6.
6) Symptom: Kubernetes pods can reach services, but large responses fail
Root cause: CNI MTU mismatch vs. node/tunnel MTU; overlay adds overhead; pod MTU too large; PMTUD blocked inside overlays.
Fix: Set CNI MTU explicitly to fit the underlay and any VPN. Then clamp MSS at node egress if needed.
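A quick way to check for that mismatch, assuming a debug pod whose image ships iproute2 (the pod name and numbers are illustrative):
cr0x@server:~$ kubectl exec -it netshoot -- ip link show dev eth0
3: eth0@if42: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default
    link/ether 9a:10:22:33:44:55 brd ff:ff:ff:ff:ff:ff
If the pod thinks 1500 but the node’s overlay or VPN budget tops out lower, large responses hit exactly the black hole described above; set the CNI’s MTU explicitly so pods inherit a value that fits.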
7) Symptom: UDP-based apps break (VoIP okay, but large UDP payloads fail)
Root cause: UDP doesn’t have MSS negotiation like TCP; oversized datagrams get fragmented and fragments are lost/dropped; or DF behavior differs.
Fix: Reduce application payload size; avoid large UDP datagrams over the VPN; ensure fragmentation is not required or is handled reliably.
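To find the largest UDP datagram that survives the tunnel without fragmentation, you can probe with iperf3 in UDP mode and an explicit payload length; the sizes and output below are illustrative:
cr0x@server:~$ iperf3 -u -c 10.42.0.20 -b 5M -l 1372 -t 5
[ ID] Interval Transfer Bitrate Jitter Lost/Total Datagrams
[ 5] 0.00-5.00 sec 2.98 MBytes 5.00 Mbits/sec 0.042 ms 0/2279 (0%) sender
Watch the Lost/Total column as you raise -l: if loss jumps as soon as the payload would need fragmentation (roughly tunnel MTU minus 28 bytes for IPv4/UDP), keep the application’s datagrams below that line.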
Checklists / step-by-step plan
Checklist A: When you’re on call and need a fix in under an hour
- Confirm size-dependence. Try a large scp/rsync/iperf3; compare with a small HTTP request.
- Read current MTU values. Run ip link show on both ends; identify tunnel and egress.
- Run DF ping tests. Binary search payload sizes to find the boundary.
- Lower tunnel MTU temporarily. Test 1420 → 1380 → 1360 as needed.
- Clamp MSS at the VPN edge. Use iptables/nftables to clamp to PMTU.
- Retest bulk transfer. iperf3 + scp of a multi-GB file; ensure retransmits drop.
- Make it persistent. Update VPN config and firewall config management; add a regression test to your runbook.
Checklist B: When you’re designing a VPN and want to avoid the incident entirely
- Inventory encapsulation layers. VPN + overlay + VLAN + PPPoE is not a vibe; it’s an MTU budget.
- Pick a conservative baseline MTU. For Internet remote access, start around 1420 (WireGuard) or 1400 if you have mixed paths.
- Standardize MTU on clients. Don’t let endpoints guess; provide a config.
- Clamp MSS at boundaries. Especially where traffic transitions from “inside” to “tunnel.”
- Allow essential ICMP. Permit fragmentation-needed/packet-too-big. Don’t drop all ICMP because a scanner once yelled at you.
- Test from hostile networks. LTE, hotel Wi‑Fi, home PPPoE, and whatever your finance team uses on the road.
- Write the runbook. Include exact commands, expected outputs, and rollback procedures.
Checklist C: What to avoid (because you’ll regret it)
- Setting tunnel MTU to 1500 “because Ethernet.”
- Dropping all ICMP/ICMPv6 without understanding PMTUD.
- Assuming one successful ping means your MTU is correct.
- Stacking tunnels/overlays without explicitly calculating headroom.
- Rolling out MTU changes fleet-wide without a canary network that includes consumer ISPs.
FAQ
1) Why do small packets work but large ones stall?
Because small packets fit under the effective MTU even after VPN overhead. Large packets exceed it, get dropped or fragmented, and PMTUD may be blocked.
TCP then retransmits and backs off, which looks like a “hang.”
2) What’s a good default MTU for WireGuard?
1420 is a common, practical default on Ethernet underlays with 1500 MTU. If you see black-hole symptoms on some networks, try 1380 or 1360.
Then validate with DF pings and real transfers.
3) Should I rely on PMTUD instead of clamping MSS?
If you control the entire path and allow the required ICMP messages, PMTUD can work fine.
In mixed corporate/consumer reality, MSS clamping is often the safer operational choice—especially for remote-access VPNs.
4) Doesn’t lowering MTU reduce performance?
Slightly, yes: more packets for the same data, more per-packet overhead.
But the “performance” you get from a too-large MTU is imaginary if it causes retransmits, fragmentation, or stalls. Stability beats theoretical efficiency.
5) What about jumbo frames (9000 MTU)? Can I use a bigger tunnel MTU?
Only if the entire underlay supports it end-to-end and every encapsulation layer is accounted for.
Mixed MTUs are common, and one 1500-MTU hop will ruin your day. Measure; don’t assume.
6) If I clamp MSS, do I still need to set tunnel MTU correctly?
Clamping MSS protects TCP flows. It does nothing for UDP or non-TCP protocols.
A sane tunnel MTU reduces fragmentation risk across the board, so you usually want both: safe tunnel MTU + MSS clamping.
7) Why is IPv6 more sensitive to this?
Routers don’t fragment IPv6 packets. The sender must adapt based on ICMPv6 “Packet Too Big.”
If ICMPv6 is filtered, PMTUD fails hard, and you get black holes more reliably than with IPv4.
8) What’s the difference between MTU and MSS?
MTU is the maximum IP packet size on a link. MSS is the maximum TCP payload size in a segment.
MSS is typically MTU minus IP+TCP headers (and minus options). Clamping MSS prevents TCP from generating packets that exceed the path MTU.
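For a concrete IPv4 example under the assumptions used throughout this article:
tunnel MTU                 1420
- IPv4 header               -20
- TCP header                -20
= maximum MSS              1380
TCP options such as timestamps reduce the per-segment payload a little further, which is why observed MSS values often sit slightly below that number.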
9) Can I “fix” this by allowing fragmentation?
You can, but it’s a trade: fragmentation increases loss sensitivity (lose one fragment, lose the whole packet), burdens middleboxes, and can amplify performance issues.
Prefer avoiding fragmentation by choosing an appropriate MTU and clamping MSS.
10) What’s a good operational rule for ICMP in firewalls?
Permit the ICMP messages required for PMTUD (IPv4 fragmentation-needed, IPv6 packet-too-big) and basic error reporting.
Don’t blanket-drop ICMP. That policy is how you get “the VPN is haunted” tickets.
One quote worth keeping on your wall
Paraphrased idea from Werner Vogels: “Everything fails, all the time—design for it.” Applied here: assume PMTUD fails somewhere and build guardrails (MTU/MSS) anyway.
Conclusion: practical next steps
If big files stall over your VPN while basic web browsing still works, stop chasing ghosts in the application layer.
MTU and PMTUD are the usual suspects, and the fix is usually a combination of a conservative tunnel MTU and MSS clamping.
Next steps you can do today:
- Run DF ping tests to find the actual size boundary.
- Standardize tunnel MTU across endpoints (don’t let clients freestyle).
- Clamp TCP MSS at the VPN boundary to avoid relying on PMTUD.
- Audit firewall rules: allow essential ICMP/ICMPv6 messages.
- Validate with a real bulk transfer and keep the commands in a runbook.
When MTU is right, the VPN becomes boring. That’s the goal. Boring networks move data and keep you out of meetings.