Nothing makes you question your career choices like a “simple” IPsec VPN that works from one network, then dies the moment you move the client behind a home router, hotel Wi‑Fi, or a corporate NAT farm. The logs say the tunnel is “up.” The users say the app is “down.” Your firewall says it’s “allowing everything.” And yet: no traffic.
IPsec behind NAT fails in a handful of repeatable ways. The trick is to stop guessing which one you have. NAT‑Traversal (NAT‑T) is supposed to make it boring. When it isn’t boring, the failure is usually loud—just not in the place you’re looking.
What NAT actually does to IPsec (and why it’s awkward)
Let’s get the vocabulary straight, because IPsec failures are often vocabulary failures with extra steps.
IKE is control plane; ESP is data plane
In the modern world you’re usually dealing with IKEv2. It negotiates keys and parameters over UDP port 500. Once the peers agree, the actual encrypted traffic flows via:
- ESP (Encapsulating Security Payload): IP protocol 50 (not TCP/UDP). This is the classic, NAT-hostile mode.
- NAT-T encapsulated ESP: ESP wrapped inside UDP/4500. This is how IPsec survives NAT.
So you need two things to succeed: IKE has to complete, and the data plane has to carry traffic. You can have IKE “up” with a dead data path. That is not a paradox. That is Tuesday.
Why NAT breaks “native” ESP
NAT devices rewrite IP addresses (and often ports). ESP doesn’t have ports. Many NAT boxes can’t maintain stable state for a protocol that doesn’t give them a 5‑tuple (src IP, dst IP, src port, dst port, protocol). Some can do “ESP passthrough,” which is a marketing term meaning “we tried.”
Worse: IPsec integrity covers parts of the packet that NAT likes to rewrite. So if you use AH (Authentication Header, protocol 51), NAT rewriting invalidates the integrity check. AH and NAT are basically incompatible by design. ESP can survive NAT if the NAT device tracks it and doesn’t touch the protected fields, but that’s not something you should bet production on.
NAT-T’s core trick
NAT‑T changes the outer transport to UDP, so NAT devices can track a port-mapped flow. After NAT detection, both peers switch to UDP/4500. The encrypted payload (ESP) becomes a UDP payload. The NAT device becomes happy because it can now do what it does all day: keep UDP state.
But NAT‑T adds overhead. That matters for MTU and fragmentation. And it relies on NAT state timeouts and keepalives. That matters for “it connects then dies after 30 seconds.”
One paraphrased idea, often attributed to Werner Vogels, fits reliability work: Everything fails; design so it fails predictably and recovers automatically.
NAT‑T issues are predictable once you know the handful of traps.
The three most common “behind NAT” architectures
- Road warrior: laptop/phone behind consumer NAT to a VPN gateway. Usually easiest—if UDP/500 and UDP/4500 are allowed.
- Site-to-site with one NATed side: branch office behind ISP router doing NAT. Works, but you must plan for dynamic public IP and NAT mappings.
- Double NAT / CGNAT: you’re behind two NATs (home router + carrier-grade NAT). NAT‑T can work, but you’re now in the land of aggressive timeouts, port rewriting, and “please just let me use WireGuard.”
NAT-T in practice: ports, encapsulation, and state
NAT detection and the switch to UDP/4500
IKEv2 does NAT detection by hashing IP/port tuples and comparing what each side thinks the outer addresses/ports are. If the hashes don’t match, NAT exists somewhere on the path. Then the peers typically agree to use UDP encapsulation on port 4500.
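Concretely, each peer sends a hash of what it believes the outer addressing to be during IKE_SA_INIT, and the receiver recomputes the hash from the packet it actually got. A sketch of the comparison (the payloads are SHA-1 digests per the IKEv2 spec):

NAT_DETECTION_SOURCE_IP      = SHA1( SPIi | SPIr | sender's IP   | sender's port )
NAT_DETECTION_DESTINATION_IP = SHA1( SPIi | SPIr | receiver's IP | receiver's port )
received hash != hash recomputed from the packet's real addresses  =>  a NAT rewrote something en route

If the peer's source hash doesn't match the address the packet actually arrived from, the peer is behind NAT; if its destination hash doesn't match your own address, you are.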
If you see IKE messages on UDP/500 but never see UDP/4500, either NAT wasn’t detected (rare), NAT‑T isn’t enabled/negotiated (common misconfig), or a firewall drops UDP/4500 (very common).
Keepalives: because UDP state evaporates
NAT devices time out UDP mappings quickly—sometimes in 30 seconds, sometimes in a few minutes. NAT‑T uses periodic keepalives (often 20 seconds-ish) to keep that mapping alive. If keepalives are disabled or blocked, the tunnel comes up, traffic passes briefly, then dies until rekey or reconnect.
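On strongSwan, for example, the keepalive interval is a single setting in strongswan.conf; the value below is illustrative, and the only rule that matters is that it stays comfortably below the worst NAT timeout on the path:

# /etc/strongswan.conf (excerpt)
charon {
    keep_alive = 20s    # NAT-T keepalive interval while the SA is otherwise idle
}

Other stacks have an equivalent knob; the name changes, the constraint does not.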
Short joke #1: NAT is like a goldfish: it forgets your UDP flow exists the moment you stop waving packets in front of it.
MTU, fragmentation, and why “works for ping” is meaningless
NAT‑T adds overhead: an outer IP header, a UDP header, and the usual ESP overhead (SPI, sequence number, IV, padding, ICV). If you don’t account for it, you get path MTU problems. The tunnel “works” for small packets (pings, SSH banners), then large packets blackhole. You’ll see stalled TCP sessions, retransmits, or “the website loads but downloads fail.”
PMTUD often fails across IPsec + NAT because ICMP “Fragmentation Needed” messages are blocked or don’t map back cleanly. So you may have to clamp MSS or set a lower MTU explicitly.
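The arithmetic is boring but worth writing down once. Exact numbers depend on IP version, cipher, and padding, so treat this as a ballpark sketch for IPv4 with AES-GCM over NAT-T:

  1500  physical MTU
-   20  outer IPv4 header
-    8  UDP header (NAT-T)
-    8  ESP header (SPI + sequence number)
-    8  GCM IV
-    2+ ESP padding, pad length, next header
-   16  ICV (integrity tag)
= ~1438 usable for the inner packet

which is why clamping TCP MSS to roughly 1360 leaves a comfortable margin for the inner IP and TCP headers plus any extra surprises on the path.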
NAT-T with multiple clients behind the same NAT
Multiple clients behind one NAT connecting to the same gateway is normal. The NAT device uses different source ports for each client’s UDP/4500 flow. If the VPN gateway or middleboxes assume “one peer per public IP,” you get flapping tunnels, wrong SA selection, or one user stealing another user’s session. This is almost always a broken policy lookup or an overly strict peer ID expectation.
IPsec policy selection and “rightid/leftid”
When NAT is involved, IP addresses are less stable identifiers. Use IKE identities (FQDN, email-like IDs, certificate subjectAltName) rather than “peer must come from X public IP.” Site-to-site behind a dynamic IP? Match strictly on identity, constrain it with certificates (or at least a strong, unique PSK), and keep traffic selectors narrow.
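A minimal sketch in strongSwan's swanctl.conf syntax (names and subnets are illustrative); the point is that the peer's address is left open while its identity is not:

connections {
    branch01 {
        version = 2
        remote_addrs = %any              # don't pin the NATed peer to one public IP
        local {
            auth = pubkey
            id = vpn-gw.example
        }
        remote {
            auth = pubkey
            id = branch01.example        # match on certificate identity instead
        }
        children {
            net {
                local_ts = 10.10.0.0/16
                remote_ts = 10.50.20.0/24
            }
        }
    }
}

Other platforms spell it differently (leftid/rightid, peer ID, IKE ID), but the principle is the same: authenticate the identity, not the NAT's address of the day.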
Fast diagnosis playbook (first/second/third)
This is the order that actually saves time in production. Don’t start by reading every log line since 2019.
First: prove packets exist (UDP/500 and UDP/4500) on both ends
- Capture on the client side: do you send UDP/500? Do you later send UDP/4500?
- Capture on the gateway WAN interface: do you receive them?
- If you see UDP/500 but not UDP/4500 after NAT detection, assume UDP/4500 is blocked until proven otherwise.
Second: confirm what state each peer thinks it has (IKE SA vs CHILD SA)
- Is IKE SA established but no CHILD SA? That’s usually proposal mismatch or policy/traffic selector mismatch.
- Is CHILD SA installed but traffic doesn’t pass? That’s usually routing, firewall, or MTU/fragmentation.
- Does it pass for 30–120 seconds then die? That’s usually NAT mapping timeout / keepalives.
Third: check MTU/MSS and asymmetric routing
- Run a DF ping (or tracepath) across the tunnel and find the real usable MTU.
- Clamp TCP MSS on the tunnel interface or at the gateway if needed.
- Verify return path: IPsec is allergic to asymmetric routing when stateful firewalls are involved.
If you do those three steps, you solve a large majority of “behind NAT” incidents without a war room.
Interesting facts and short history (the bits that explain today’s mess)
- IPsec was designed in an era of IPv6 optimism. The original mindset assumed end-to-end addressing would be common; NAT was not the star of the show.
- AH (IP protocol 51) essentially can’t survive NAT. NAT modifies headers AH authenticates. ESP is the practical choice in NATed environments.
- NAT-T became standard because “ESP passthrough” wasn’t reliable. Vendors shipped hacks; interoperability was… aspirational.
- UDP/4500 is the convention for NAT-T. It’s now widely recognized by firewalls as “the IPsec NAT-T port,” which is both helpful and a fingerprint.
- NAT-T uses a non-ESP marker. IKE messages sent on UDP/4500 are prefixed with four zero bytes (the non-ESP marker); ESP-in-UDP packets start with a non-zero SPI instead, which is how receivers tell the two apart.
- Early NAT-T drafts competed. Different implementations existed before standardization; legacy gear sometimes still shows odd behavior.
- UDP NAT timeouts vary wildly. Consumer routers, enterprise firewalls, and CGNAT boxes all choose different defaults; your keepalive settings need to match the worst case.
- Mobile networks are hostile by default. Some carriers and hotspots throttle or reshape UDP flows; IPsec over NAT-T can look like “random UDP” to them.
- IKEv2 improved resilience. Compared to IKEv1, IKEv2’s state machine and MOBIKE support handle address changes better—when enabled and supported.
Hands-on tasks: commands, expected output, and decisions
Below are practical tasks you can run on Linux-based clients/gateways. The commands are realistic; adapt interface names and IPs. Each task includes: command, what the output means, and the decision you make.
Task 1: Confirm you have a NAT in the path (client-side)
cr0x@server:~$ ip -br addr
lo UNKNOWN 127.0.0.1/8 ::1/128
wlan0 UP 192.168.1.50/24 fe80::a00:27ff:fe4e:66a1/64
Meaning: Your client has a private RFC1918 address (192.168.1.50). You are behind NAT.
Decision: Assume NAT-T is required (natively UDP-based VPNs like WireGuard have their own rules). For IPsec, you’ll need UDP/500 and UDP/4500 reachability.
Task 2: Check default route and egress interface (client-side)
cr0x@server:~$ ip route show default
default via 192.168.1.1 dev wlan0 proto dhcp metric 600
Meaning: Traffic exits via 192.168.1.1 on wlan0. That’s your NAT gateway.
Decision: If the VPN fails only on this network, suspect this NAT device or its upstream policy.
Task 3: Verify UDP/500 and UDP/4500 are not locally blocked (client firewall)
cr0x@server:~$ sudo nft list ruleset | sed -n '1,120p'
table inet filter {
        chain input {
                type filter hook input priority 0; policy accept;
        }
        chain output {
                type filter hook output priority 0; policy accept;
        }
}
Meaning: Local firewall is permissive (policy accept). If yours is “drop,” look for explicit allows.
Decision: If you see drops for UDP/500 or UDP/4500, fix locally before blaming the network.
Task 4: Prove the client sends IKE packets (tcpdump)
cr0x@server:~$ sudo tcpdump -ni wlan0 udp port 500 or udp port 4500
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on wlan0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
12:10:41.102345 IP 192.168.1.50.55612 > 203.0.113.10.500: isakmp: parent_sa ikev2_init[I]
12:10:41.256789 IP 203.0.113.10.500 > 192.168.1.50.55612: isakmp: parent_sa ikev2_init[R]
Meaning: IKEv2 INIT is happening on UDP/500. So basic reachability exists.
Decision: Move on to see whether NAT-T (UDP/4500) begins after NAT detection.
Task 5: Confirm the switch to NAT-T (UDP/4500) actually happens
cr0x@server:~$ sudo tcpdump -ni wlan0 udp port 4500
listening on wlan0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
12:10:42.001122 IP 192.168.1.50.4500 > 203.0.113.10.4500: UDP-encap: NONESP-encap: isakmp: parent_sa ikev2_auth[I]
12:10:42.140055 IP 203.0.113.10.4500 > 192.168.1.50.4500: UDP-encap: NONESP-encap: isakmp: parent_sa ikev2_auth[R]
12:10:43.000211 IP 192.168.1.50.4500 > 203.0.113.10.4500: UDP-encap: ESP(spi=0xc1a2b3c4,seq=0x1)
Meaning: You have NAT-T (UDP/4500) and you see ESP-in-UDP packets. That’s the correct shape.
Decision: If UDP/4500 packets are absent, fix firewall/ACL/NAT rules in the path. If they’re present but no traffic works, look at CHILD SA and routing/MTU.
Task 6: If you never see UDP/4500, test whether the network blocks it
cr0x@server:~$ sudo nmap -sU -p 4500 203.0.113.10
Starting Nmap 7.94 ( https://nmap.org ) at 2025-12-28 12:12 UTC
Nmap scan report for 203.0.113.10
Host is up.
PORT STATE SERVICE
4500/udp open|filtered nat-t-ike
Meaning: UDP scans can’t reliably distinguish “open” from “filtered.” But if it returns “closed,” you have a strong signal something rejects it.
Decision: If the environment is suspect (hotel/cafe), try a different network or move IPsec to a more permissive transport (or use IKEv2 over TCP if supported, but that’s vendor-specific and not universally sane).
Task 7: On the VPN gateway, confirm packets arrive on WAN
cr0x@server:~$ sudo tcpdump -ni eth0 udp port 500 or udp port 4500
listening on eth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
12:10:41.103111 IP 198.51.100.25.55612 > 203.0.113.10.500: isakmp: parent_sa ikev2_init[I]
12:10:42.001901 IP 198.51.100.25.4500 > 203.0.113.10.4500: UDP-encap: NONESP-encap: isakmp: parent_sa ikev2_auth[I]
Meaning: The gateway sees the client as 198.51.100.25 (the NAT’s public IP), not the private 192.168.1.50. That’s expected.
Decision: If the gateway sees UDP/500 but not UDP/4500, the block is between client and gateway (often the NAT device, upstream firewall, or ISP).
Task 8: Check IKE/CHILD SA state on a strongSwan gateway
cr0x@server:~$ sudo swanctl --list-sas
ikev2-vpn: #12, ESTABLISHED, IKEv2, 3c4d5e6f7a8b9c0d_i* 9f8e7d6c5b4a3210_r
  local  'vpn-gw.example' @ 203.0.113.10[4500]
  remote 'roadwarrior' @ 198.51.100.25[4500]
  AES_GCM_16-256/PRF_HMAC_SHA2_256/ECP_256
  established 62s ago, rekeying in 51m
  net-traffic{24}: INSTALLED, TUNNEL, ESP in UDP
    local  10.10.0.0/16
    remote 10.50.20.0/24
    AES_GCM_16-256/ECP_256, 12345678_i 9abcdef0_o, rekeying in 47m
    bytes_i 10492, bytes_o 8891, packets_i 120, packets_o 97
Meaning: IKE SA is established and a CHILD SA exists. Counters increase. Data plane is moving.
Decision: If CHILD SA is missing or stuck in “INSTALLING,” troubleshoot proposals, PSK/certs, and traffic selectors. If counters stay at zero, troubleshoot routing/firewall/MTU.
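If you suspect a selector mismatch, compare what was negotiated (above) with what is configured. On strongSwan you can dump the loaded connections; exact output varies by version, so treat this as the general shape:

cr0x@server:~$ sudo swanctl --list-conns
ikev2-vpn: IKEv2, no reauthentication, rekeying every 14400s
  local:  %any
  remote: %any
  local public key authentication:
    id: vpn-gw.example
  remote public key authentication:
    id: roadwarrior
  net-traffic: TUNNEL, rekeying every 3600s
    local:  10.10.0.0/16
    remote: 10.50.20.0/24

If the configured selectors and the installed CHILD SA disagree, fix the configuration; the kernel only honors what was actually negotiated.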
Task 9: Detect “tunnel up, no traffic” via xfrm state/policy (Linux gateway)
cr0x@server:~$ sudo ip xfrm state | sed -n '1,60p'
src 203.0.113.10 dst 198.51.100.25
        proto esp spi 0x12345678 reqid 1 mode tunnel
        replay-window 32 flag af-unspec
        aead rfc4106(gcm(aes)) 0xaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 128
        encap type espinudp sport 4500 dport 4500 addr 0.0.0.0
src 198.51.100.25 dst 203.0.113.10
        proto esp spi 0x9abcdef0 reqid 1 mode tunnel
        replay-window 32 flag af-unspec
        aead rfc4106(gcm(aes)) 0xbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb 128
        encap type espinudp sport 4500 dport 4500 addr 0.0.0.0
Meaning: Kernel SAs exist, and the encap lines confirm the kernel is doing ESP-in-UDP (NAT-T). If the SAs are missing, the negotiation didn’t install them.
Decision: If state exists but policy doesn’t match your subnets, you have a selector mismatch. If both exist, move to routing and MTU checks.
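To see the policies themselves, dump them the same way (example output; your priorities and addresses will differ):

cr0x@server:~$ sudo ip xfrm policy | head -n 8
src 10.10.0.0/16 dst 10.50.20.0/24
        dir out priority 371327
        tmpl src 203.0.113.10 dst 198.51.100.25
                proto esp reqid 1 mode tunnel
src 10.50.20.0/24 dst 10.10.0.0/16
        dir fwd priority 371327
        tmpl src 198.51.100.25 dst 203.0.113.10
                proto esp reqid 1 mode tunnel

The src/dst pairs are your traffic selectors as the kernel sees them. If the subnets your users actually use aren't covered, packets never enter the tunnel, no matter how healthy the SAs look.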
Task 10: Verify NAT-T keepalives by watching periodic small packets
cr0x@server:~$ sudo tcpdump -ni eth0 udp port 4500 and host 198.51.100.25
listening on eth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
12:15:00.000100 IP 198.51.100.25.4500 > 203.0.113.10.4500: UDP-encap: ESP(spi=0xc1a2b3c4,seq=0x35)
12:15:20.002210 IP 198.51.100.25.4500 > 203.0.113.10.4500: UDP-encap: ESP(spi=0xc1a2b3c4,seq=0x36)
Meaning: Something refreshes the NAT mapping at least every ~20 seconds (here it is real ESP traffic; an idle tunnel should instead show tiny NAT-keepalive packets). If you see a burst, then silence, then a reconnect, that smells like UDP state timeout.
Decision: Enable/adjust NAT keepalives (client and/or gateway) and shorten DPD/keepalive intervals to survive aggressive NATs.
Task 11: Find the effective path MTU with tracepath
cr0x@server:~$ tracepath -n 10.10.1.10
1?: [LOCALHOST] pmtu 1500
1: 10.50.20.1 2.145ms
2: 10.10.1.10 5.621ms reached
Resume: pmtu 1380 hops 2 back 2
Meaning: Path MTU appears to be 1380. That’s lower than Ethernet 1500 due to encapsulation overhead and possibly an upstream constraint.
Decision: Set tunnel/interface MTU around 1380 (or slightly lower) and/or clamp TCP MSS. If PMTUD is flaky, be conservative.
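If the tunnel is exposed as an interface (vti0 is used here as an illustrative name, the same one that appears in the routing check later), setting the MTU explicitly is one command:

cr0x@server:~$ sudo ip link set dev vti0 mtu 1380
cr0x@server:~$ ip link show dev vti0
5: vti0@NONE: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1380 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ipip 203.0.113.10 peer 198.51.100.25

Make it persistent in whatever manages your interfaces, or the next reboot quietly undoes the fix.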
Task 12: Confirm PMTUD/fragmentation symptoms with DF ping
cr0x@server:~$ ping -M do -s 1400 10.10.1.10 -c 3
PING 10.10.1.10 (10.10.1.10) 1400(1428) bytes of data.
ping: local error: message too long, mtu=1380
ping: local error: message too long, mtu=1380
ping: local error: message too long, mtu=1380
--- 10.10.1.10 ping statistics ---
3 packets transmitted, 0 received, +3 errors, 100% packet loss
Meaning: The kernel already knows the effective MTU toward this destination is 1380 (from the tunnel interface or a cached route), so a 1428-byte DF packet can’t be sent as-is. If large DF pings instead vanish silently with no error at all, you have a PMTUD blackhole on the path.
Decision: Lower MTU on the VPN interface and/or clamp TCP MSS (commonly to ~1360–1380 range depending on your encapsulation).
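A quick confirmation with a size that should fit (1352 bytes of ICMP payload + 28 bytes of headers = 1380 on the wire; the numbers are illustrative):

cr0x@server:~$ ping -M do -s 1352 -c 2 10.10.1.10
PING 10.10.1.10 (10.10.1.10) 1352(1380) bytes of data.
1360 bytes from 10.10.1.10: icmp_seq=1 ttl=63 time=6.11 ms
1360 bytes from 10.10.1.10: icmp_seq=2 ttl=63 time=5.87 ms

--- 10.10.1.10 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms

If 1352 passes and 1372 does not, you've bracketed the real usable MTU and can set MTU/MSS with confidence instead of superstition.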
Task 13: Clamp TCP MSS with nftables (gateway)
cr0x@server:~$ sudo nft add table inet mangle
cr0x@server:~$ sudo nft add chain inet mangle forward '{ type filter hook forward priority -150; }'
cr0x@server:~$ sudo nft add rule inet mangle forward tcp flags syn tcp option maxseg size set 1360
cr0x@server:~$ sudo nft list table inet mangle
table inet mangle {
        chain forward {
                type filter hook forward priority -150; policy accept;
                tcp flags syn tcp option maxseg size set 1360
        }
}
Meaning: New TCP connections traversing the gateway get MSS clamped to 1360.
Decision: If MTU blackholing is your issue, this usually fixes “some sites work, others stall” without touching every client.
Task 14: Verify firewall allows UDP/500 and UDP/4500 on the gateway
cr0x@server:~$ sudo nft list ruleset | grep -E 'udp dport (500|4500)'
udp dport 500 accept
udp dport 4500 accept
Meaning: The gateway firewall is allowing IKE and NAT-T.
Decision: If these rules are missing (or ordered after a drop), add them. If they exist, the block is upstream or on the client side.
Task 15: Check conntrack state for UDP/4500 (NAT device / gateway doing NAT)
cr0x@server:~$ sudo conntrack -L -p udp --dport 4500 | head
udp 17 28 src=192.168.1.50 dst=203.0.113.10 sport=4500 dport=4500 src=203.0.113.10 dst=198.51.100.25 sport=4500 dport=4500 [ASSURED] mark=0 use=1
Meaning: There’s a NAT/conntrack mapping with 28 seconds remaining (example). If it drops to zero and disappears quickly, your timeout is too aggressive.
Decision: Increase UDP timeout if you control the NAT, or reduce keepalive interval if you don’t.
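If the NAT box is a Linux machine you control, the relevant conntrack timeouts are ordinary sysctls (defaults vary by kernel; the values shown are just an example):

cr0x@server:~$ sudo sysctl net.netfilter.nf_conntrack_udp_timeout net.netfilter.nf_conntrack_udp_timeout_stream
net.netfilter.nf_conntrack_udp_timeout = 30
net.netfilter.nf_conntrack_udp_timeout_stream = 120
cr0x@server:~$ sudo sysctl -w net.netfilter.nf_conntrack_udp_timeout_stream=180
net.netfilter.nf_conntrack_udp_timeout_stream = 180

The first value governs new/unreplied UDP flows, the second governs established ("assured") ones; keepalives have to arrive faster than whichever applies to your flow.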
Task 16: Confirm the VPN subnet is actually routed to the tunnel
cr0x@server:~$ ip route get 10.10.1.10
10.10.1.10 dev vti0 src 10.50.20.2 uid 0
cache
Meaning: The kernel will send traffic to 10.10.1.10 via vti0. Good.
Decision: If it routes out the WAN instead, you have a routing/policy routing issue, not an IPsec issue.
Three corporate mini-stories from the NAT-T trenches
Mini-story #1: The incident caused by a wrong assumption
The setup looked routine: a partner needed a site-to-site IPsec tunnel into a staging environment. The partner’s engineer provided a single “peer IP” and their internal subnets. We whitelisted UDP/500 and UDP/4500 from that peer IP. Tunnel came up in the lab. Everyone declared victory.
Two days later, staging went dark—only for that partner. Our dashboards showed the IKE SA established, then flapping every few minutes. Packet capture on our gateway showed IKE packets coming from a different public IP than the one we whitelisted. The security team insisted we were under attack because “the peer IP changed.”
The wrong assumption was simple: that the partner had a stable public IP. They didn’t. They were behind an upstream NAT pool and the egress IP changed based on load, not on time. Their “peer IP” was the address their firewall UI showed, not what the internet saw on Tuesdays.
Fix was equally simple and equally political: we stopped binding the tunnel to a single source IP and instead bound it to certificate identity, strict proposals, and narrowed traffic selectors. We still constrained source IPs, but to a small provider range approved through change control. The tunnel stopped flapping. The incident ended. Everyone pretended this was the plan all along.
Lesson: if you’re matching on public IP alone, you’re betting reliability on someone else’s NAT architecture. That’s not “security.” That’s roulette with paperwork.
Mini-story #2: The optimization that backfired
An internal network team wanted faster failover for remote users. Their logic: reduce keepalive/DPD chatter to save bandwidth and CPU, because “the tunnel is mostly idle.” They lengthened NAT keepalive intervals and increased DPD timeouts. On paper, fewer packets, fewer interrupts, happier firewalls.
On the first day, everything looked fine. Then tickets started: “VPN connects but drops when I stop typing.” You could set a watch to it: around a minute of inactivity and the tunnel would become a zombie. The client UI still showed “connected,” but traffic went nowhere. Reconnect fixed it—until the next coffee break.
We captured traffic on the gateway. The NAT mapping upstream timed out at 30 seconds for UDP. The keepalive had been stretched beyond that. So the mapping died silently, and the next real packet was dropped because the NAT device had forgotten where to send it. When the client eventually retransmitted or rekeyed, it punched a new mapping. Users experienced it as random drops.
We rolled back the “optimization,” set keepalives to a value proven against the worst NAT we could find, and documented it as a non-negotiable SLA parameter. We also taught the team a harsh truth: bandwidth saved by disabling keepalives is often repaid with interest in user pain.
Mini-story #3: The boring but correct practice that saved the day
Another org, different vibe: heavily regulated, change-controlled, and allergic to heroic troubleshooting. Their IPsec standard required two things that seemed overkill until they weren’t: mandatory packet capture points and a written “expected packet shape” per tunnel (UDP/500 first, then UDP/4500, then ESP-in-UDP).
During an ISP migration, a subset of remote sites stopped passing traffic. The tunnel status pages looked green. The application owners escalated immediately, because of course they did.
The on-call SRE pulled up the runbook and did the boring steps. Capture on WAN: UDP/500 and UDP/4500 arrived. Capture on inside interface: no decrypted traffic. That narrowed it to policy selection, not reachability. They compared the installed traffic selectors to the standard and found the ISP migration changed the branch’s NAT behavior—source ports were being remapped more aggressively, and the branch firewall was matching peers by public IP and port combination. That broke when the NAT chose a different port.
The fix was a config change they already had pre-approved: peer match by identity, not by 5‑tuple. They applied it, verified counters increased, and closed the incident. No drama. No midnight calls with vendors. Just the dull satisfaction of a runbook that works.
Common mistakes: symptom → root cause → fix
This section is intentionally specific. If you can’t map your symptom to a root cause, you’re not diagnosing—you’re collecting feelings.
1) Symptom: IKE SA establishes on UDP/500, then nothing; no UDP/4500 seen
Root cause: NAT-T not negotiated or UDP/4500 blocked. Sometimes caused by disabling NAT-T on one peer or middlebox filtering “unknown UDP.”
Fix: Enable NAT-T on both peers. Allow inbound/outbound UDP/4500. Verify with tcpdump on both ends. If a firewall does “IPsec helper,” consider disabling the helper if it mangles traffic—helpers are famous for being helpful like a raccoon in your kitchen.
2) Symptom: Tunnel “up” but no traffic; CHILD SA missing
Root cause: Proposal mismatch (cipher/PRF/DH) or traffic selector mismatch (subnets don’t align). NAT doesn’t cause this directly but often reveals it when peers pick different IDs.
Fix: Compare proposals and selectors. Use narrow, explicit subnets. Confirm IDs (leftid/rightid). If one side expects a public IP as ID but the other sends an FQDN, negotiation may succeed partially and then fail at CHILD SA.
3) Symptom: Traffic passes for ~30–120 seconds, then stalls until reconnect
Root cause: NAT mapping timeout for UDP/4500; keepalives too infrequent or blocked.
Fix: Reduce NAT keepalive interval; enable DPD with reasonable timers. If behind CGNAT or mobile networks, tune aggressively. Watch UDP/4500 continuity with tcpdump.
4) Symptom: Ping works, but large downloads stall; some websites load, others hang
Root cause: MTU/PMTUD failure due to NAT-T overhead and blocked ICMP fragmentation-needed messages.
Fix: Clamp MSS (gateway is easiest), or reduce interface MTU on tunnel/VTI. Validate with tracepath and DF ping. Don’t “fix” this by disabling encryption features; fix packet sizing.
5) Symptom: One user behind a NAT can connect; second user fails or kicks the first off
Root cause: Gateway identifies peers by source IP only, or policy lookup collides when multiple clients share one NAT public IP.
Fix: Match peers by IKE identity, not IP. Ensure unique IDs/certs. Avoid configs that assume “one peer per IP.”
6) Symptom: Works from some networks, fails from hotels/cafes
Root cause: UDP/4500 filtered, rate-limited, or shaped; captive portals and “security” Wi‑Fi gear can be aggressively dumb.
Fix: Provide a fallback (different network, alternative VPN transport, or gateway options). At minimum, detect and message clearly: “Your network blocks UDP/4500.”
7) Symptom: IKE completes, CHILD SA installs, but return traffic never arrives
Root cause: Asymmetric routing or missing routes; decrypted traffic exits the wrong interface; stateful firewall drops return packets.
Fix: Validate routing with ip route get. Ensure policy routing if needed. Confirm that the protected subnets are routed back into the tunnel.
8) Symptom: Rekey causes outages every N minutes
Root cause: Rekey timing mismatch, buggy NAT devices that drop state during rekey, or mismatched lifetimes causing overlapping SAs to misroute.
Fix: Align lifetimes; use sane rekey margins; update firmware; reduce complexity. If you can’t update the NAT device, shorten the problem window with more deterministic timers.
Short joke #2: If your VPN only works until rekey, congratulations: you’ve built a periodic outage generator with enterprise-grade encryption.
Checklists / step-by-step plan
Checklist A: Bring up IPsec NAT-T behind NAT (site-to-site or road warrior)
- Decide identities first. Use FQDN/cert identities. Avoid “peer must come from this IP” unless it truly is static.
- Open the right ports. Inbound/outbound UDP/500 and UDP/4500 to the gateway. If you insist on allowing ESP (protocol 50), fine—but don’t rely on it behind NAT.
- Enable NAT-T explicitly on both peers if your platform makes it optional.
- Confirm proposals match. Cipher, integrity, PRF, DH group; keep it modern and interoperable.
- Define traffic selectors/subnets carefully. No overlaps, and no 0.0.0.0/0 unless you actually mean it.
- Plan MTU/MSS. Start with conservative MTU (e.g., ~1380) or clamp MSS at the gateway.
- Set keepalives/DPD for the worst NAT. You’re tuning for the crankiest device on the path, not the nicest.
- Instrument. Capture points (client egress, gateway WAN). Log IKE and CHILD SA states and byte counters.
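For the capture points, a rotating tcpdump on the gateway WAN is usually enough; a sketch (path and rotation interval are illustrative):

cr0x@server:~$ sudo tcpdump -ni eth0 -G 3600 -w /var/tmp/ike-%Y%m%d-%H%M.pcap 'udp port 500 or udp port 4500'
tcpdump: listening on eth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes

One file per hour, only IKE/NAT-T traffic, and you have evidence ready the next time someone swears nothing changed.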
Checklist B: When it’s broken, isolate layer-by-layer
- Packet shape: Do you see UDP/500? Do you see UDP/4500? Do you see ESP-in-UDP?
- State: Is IKE established? Is CHILD SA installed? Are counters incrementing?
- Policy: Do selectors match expected subnets? Are routes correct on both sides?
- Transport: Are you hitting MTU limits? Are ICMP errors blocked? Is UDP timing out?
- Middleboxes: Any ALG/IPsec helper/inspection doing “smart” things? Consider turning that “smart” off.
Checklist C: Hardening choices that make NAT-T less fragile
- Prefer IKEv2. Better state management; supports mobility features like MOBIKE when available.
- Use certificates for anything corporate. PSKs scale poorly and lead to sloppy peer matching.
- Standardize timers. DPD + NAT keepalives consistent across peers; avoid vendor defaults roulette.
- Baseline MTU/MSS. Pick a known-good setting and document it per tunnel type.
- Monitor byte counters, not “tunnel up.” “Up” is a UI concept. Bytes are physics.
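A crude but effective check on a strongSwan gateway (the counters echo Task 8's output; field names vary a bit by version):

cr0x@server:~$ sudo swanctl --list-sas | grep -E 'bytes_[io]'
    bytes_i 10492, bytes_o 8891, packets_i 120, packets_o 97

Sample it twice a few minutes apart; if the numbers don't move while users claim to be working, believe the numbers.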
FAQ
1) Why does IPsec work on my phone hotspot but not my office Wi‑Fi?
Different middleboxes. Office Wi‑Fi often includes firewalls, proxies, IDS/IPS, and “helpful” UDP filtering. Phone hotspots commonly allow UDP/4500 because carriers expect VPN use. Capture traffic: if UDP/4500 never leaves or never arrives, you’ve found the culprit.
2) Do I need to allow ESP (IP protocol 50) if I have NAT-T?
Not strictly. NAT‑T encapsulates ESP inside UDP/4500, so the path only needs UDP. Allowing ESP can help if there is no NAT and both peers prefer native ESP, but behind NAT it’s not reliable. Open UDP/500 and UDP/4500 first.
3) My logs say “NAT detected,” but then negotiation fails. What now?
NAT detection succeeding just means both sides noticed address translation. Failure after that is usually UDP/4500 blocked, NAT-T disabled on one side, or an identity/selector mismatch that shows up during AUTH/CHILD SA creation. Confirm the switch to UDP/4500 with tcpdump.
4) What’s the difference between DPD and NAT keepalives?
DPD (Dead Peer Detection) checks whether the peer is alive at the IKE layer. NAT keepalives are small periodic packets intended to keep UDP NAT mappings alive. You often want both: keepalives to prevent silent NAT expiration, and DPD to detect actual peer failure.
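On strongSwan they are two different settings in two different files, which is a common source of confusion; a sketch with illustrative values:

# swanctl.conf: per-connection DPD (IKE-level liveness check)
connections {
    ikev2-vpn {
        dpd_delay = 30s
    }
}

# strongswan.conf: global NAT-T keepalive interval (tiny UDP packets, not IKE messages)
charon {
    keep_alive = 20s
}

Keepalives keep the NAT mapping warm; DPD notices when the peer is actually gone. Tuning one does not replace the other.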
5) Why does the tunnel show “connected” but traffic doesn’t pass?
Because “connected” usually reflects IKE SA status. Data requires a CHILD SA plus correct routing/policy plus non-broken MTU. Check CHILD SA counters and run ip route get for a protected IP. If counters don’t move, traffic isn’t even entering the tunnel.
6) Is double NAT (home router + ISP CGNAT) a deal-breaker for NAT-T?
Not necessarily, but it increases the chance of aggressive UDP timeouts and port rewriting. If you can’t control the path, tune keepalives aggressively and consider mobility support (MOBIKE) for clients that change networks.
7) Should I lower MTU on the client or the gateway?
Prefer gateway-side MSS clamping for TCP: one change fixes many clients. For non-TCP protocols or if you control the client fleet, setting a lower tunnel/interface MTU can be cleaner. Validate with DF ping/tracepath.
8) How do I tell if UDP/4500 is blocked without Wireshark access on the client?
Check gateway captures: if you never see UDP/4500 from that user/network, it’s blocked upstream. If you do see it but the client says “connecting,” the issue is likely on the client host firewall or VPN software. You can also compare behavior across networks: if it works on LTE but not on that Wi‑Fi, the Wi‑Fi is guilty until proven innocent.
9) Can “IPsec passthrough” on a router break NAT-T?
Yes. Some devices implement helpers/ALGs that try to track and rewrite IKE/ESP. When wrong, they mangle packets or state, especially with multiple clients. If you can, disable IPsec helper features and rely on plain UDP forwarding/stateful NAT.
10) Why do multiple users behind the same NAT cause weirdness on some gateways?
If the gateway indexes sessions by source IP alone, two users share that IP and collide. Proper implementations use IKE identities and SPI values to disambiguate. Fix is configuration (peer matching by ID) or upgrading a gateway that still thinks NAT is a temporary fad.
Conclusion: what to change Monday morning
IPsec NAT‑T isn’t magic; it’s a compatibility layer for a network design (NAT everywhere) that IPsec didn’t originally assume. When the VPN won’t come up behind NAT, it’s rarely mysterious. It’s usually one of four things: UDP/4500 blocked, NAT mapping timing out, MTU blackholing, or identity/policy mismatches made worse by address translation.
Practical next steps:
- Instrument with packet captures at client egress and gateway WAN. Prove UDP/500 and UDP/4500 in both directions.
- Stop trusting “tunnel up.” Monitor CHILD SA byte counters and route correctness.
- Standardize timers (DPD + keepalives) for the worst NAT you expect, not the best lab network.
- Make MTU boring with MSS clamping or explicit MTU, then document the chosen values.
- Match peers by identity (cert/FQDN) instead of brittle public IP assumptions, especially for NATed branches and remote users.
Do those, and NAT‑T becomes what it was always meant to be: unremarkable plumbing. The best VPN incident is the one you never hear about.