You click “Connect.” The little VPN icon lights up. Everyone relaxes. Then nothing loads. Slack spins, browsers time out, and you start hearing the phrase “but it says connected” like it’s a legal contract.
L2TP/IPsec is old, widely implemented, and still surprisingly common in corporate environments because it’s built-in on many clients. It’s also the kind of VPN that can “connect” while delivering exactly zero usable connectivity—because the control plane succeeded and the data plane is quietly on fire.
What “connected” really means in L2TP/IPsec
L2TP/IPsec is two protocols in a trench coat:
- IPsec provides encryption and transport for packets, typically using IKEv1 (often “L2TP/IPsec PSK” in client UIs). It negotiates keys and establishes SAs (Security Associations). If this works, you can truthfully say “the tunnel is up.”
- L2TP runs inside IPsec and sets up a PPP session. PPP is where you get an IP address, DNS servers, and sometimes a default route or other pushed routes.
So “connected” could mean any of these:
- IKE succeeded; IPsec SAs exist; L2TP never came up. UI might still claim success depending on the client.
- L2TP came up; PPP authenticated; client got an IP; but routes didn’t change the way you think they did.
- Routes changed; packets flow; but NAT/firewall on the server drops return traffic.
- Everything is “fine” except DNS is pointing at a resolver you can’t reach over the VPN.
- Everything is “fine” until MTU/fragmentation kills large packets and web browsing becomes interpretive dance.
Rule one: treat “connected” as a hint, not a diagnosis.
Joke #1: A VPN that says “connected” but passes no traffic is like a meeting that could’ve been an email—technically occurred, functionally useless.
Interesting facts and context (why this stack behaves like this)
Some context makes the weirdness less mysterious and the fixes more predictable. Here are a few concrete, historically relevant facts:
- L2TP is basically L2F + PPTP DNA. It was standardized in the late 1990s as a way to tunnel PPP over IP networks. PPP assumptions still leak everywhere.
- L2TP provides no encryption. That’s why “L2TP/IPsec” is the usual pairing. L2TP is the session layer; IPsec is the security layer.
- Most clients use IKEv1 for L2TP/IPsec. Modern IPsec deployments prefer IKEv2, but L2TP stacks often stick to IKEv1 for compatibility.
- NAT-Traversal became the default reality. IPsec originally didn’t love NAT. NAT-T encapsulates ESP in UDP/4500 so it survives typical consumer routers.
- L2TP uses UDP/1701. In L2TP/IPsec, UDP/1701 is usually protected by IPsec; if you see it in the clear on the internet, you’re probably looking at a misconfiguration.
- Windows “connected” does not guarantee a default route change. Depending on settings (split tunneling, metrics, “use default gateway on remote network”), your internet route might stay local.
- PPP options still matter. Things like MRU/MTU, DNS assignment, and “defaultroute” settings can make or break usability.
- Many L2TP servers are built from older components. Common Linux stacks involve strongSwan/Libreswan + xl2tpd + pppd—each with its own logs, timers, and failure modes.
- ESP is IP protocol 50, not TCP/UDP. Firewalls that “allow IPsec” but forget protocol 50 create very specific half-working situations (or force NAT-T unexpectedly).
Fast diagnosis playbook (check 1, 2, 3)
If you’re on call, you don’t have time to rediscover networking. Here’s the fastest path to the bottleneck.
1) Is it routing or DNS?
- Test raw IP reachability (ping a public IP like 1.1.1.1 or your corporate public endpoint).
- Test DNS separately (resolve a name using the DNS servers you think you have).
If IP works and DNS doesn’t: stop touching IPsec. Fix DNS assignment or reachability.
2) Is traffic leaving the client over the tunnel?
- Inspect the routing table on the client. Look for a default route via PPP/tunnel interface (full tunnel) or specific routes (split tunnel).
- Check interface metrics: “right route, wrong priority” is a classic.
3) Is traffic returning?
- On the server, capture packets on the VPN interface and the uplink. If you see packets from clients going out but no return translation, it’s NAT/firewall.
- Check IP forwarding and NAT/MASQUERADE rules.
4) Does it die only for some sites or large transfers?
- That’s MTU/fragmentation until proven otherwise. Test with “don’t fragment” pings and clamp MSS.
5) Is it intermittent?
- Look for UDP timeout/NAT state issues, rekeying problems, or multiple clients behind one NAT with identical IDs.
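If you have a shell on a Linux client, steps 1-3 collapse into a one-minute triage. This is a minimal sketch: the cr0x@client prompt, the 10.10.0.53 resolver, and the test hostnames are illustrative stand-ins for whatever your deployment actually pushes.
cr0x@client:~$ ping -c 2 -W 2 1.1.1.1                                          # raw IP reachability, no DNS involved
cr0x@client:~$ dig example.com +short                                          # resolution via whatever resolver the client actually uses
cr0x@client:~$ dig @10.10.0.53 example.com +short                              # resolution via the resolver you believe was pushed
cr0x@client:~$ ip route get 1.1.1.1                                            # which interface internet-bound traffic really uses
cr0x@client:~$ curl -sS -o /dev/null -w '%{http_code}\n' https://example.com/  # full TCP+TLS path; catches MTU pain
If the first command works and the second fails, it’s DNS. If ip route get names your LAN gateway on a supposedly full-tunnel client, it’s routing. If everything passes except curl, start suspecting MTU.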
The real failure modes: routing, NAT, DNS, MTU, firewall, and policy
Failure mode A: The client never sends internet traffic into the tunnel (routing/metrics)
L2TP/IPsec is often deployed as “full tunnel” (all traffic goes through VPN), but clients are frequently configured as split tunnel by accident—or by an old “optimization” nobody documented.
Common patterns:
- Default route still points to the local gateway, not PPP.
- A default route exists via VPN, but has a higher metric than the local one.
- Only corporate subnets are routed into the tunnel; internet stays local. Users interpret this as “no internet” because corporate DNS forces internal resolution paths.
Decision: determine whether you want split tunnel or full tunnel. Then make the routing table match that intent.
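A minimal client-side check of that intent, assuming a Linux client with iproute2 (interface names are illustrative):
cr0x@client:~$ ip route show default
# Full tunnel: expect a default via the ppp interface, or one with a lower metric than the LAN default.
# Split tunnel: the default stays on the LAN gateway; you expect explicit corporate prefixes via ppp0 instead.
cr0x@client:~$ ip route get 10.10.0.53
# Shows exactly which interface and next hop the kernel picks for one concrete destination, metrics included.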
Failure mode B: Traffic enters the tunnel but dies on the server (missing forwarding, NAT, or firewall)
This is the most common “connected but no internet” root cause on Linux L2TP concentrators. The PPP session is up. Client has an IP. Packets arrive. Then the server shrugs and drops them on the floor because:
- net.ipv4.ip_forward is off.
- NAT masquerade isn’t set for the VPN client pool toward the uplink.
- Firewall rules allow IKE/ESP/L2TP but block forwarded traffic.
- Reverse path filtering (rp_filter) discards asymmetric flows.
Decision: decide if this VPN is an internet egress path. If yes, you must forward and NAT. If no, don’t pretend it is—push split-tunnel routes and correct DNS.
Failure mode C: DNS is wrong (and everything “looks down”)
VPN “internet” is often just DNS. The browser needs DNS to turn names into IP addresses. If PPP pushes DNS servers that are only reachable on an internal network you didn’t route, you get:
- Ping to 1.1.1.1 works.
- Ping to example.com fails.
- Apps that hardcode IPs work; everything else doesn’t.
Decision: align DNS with routing. Full tunnel can use internal resolvers; split tunnel must use reachable resolvers (public or a DNS forwarder reachable via the split routes).
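A quick client-side check that DNS and routing actually agree, sketched for a Linux client running systemd-resolved; 10.10.0.53 and intranet.corp.example are placeholders:
cr0x@client:~$ resolvectl status ppp0                            # which resolvers the client associates with the VPN link (use cat /etc/resolv.conf elsewhere)
cr0x@client:~$ ip route get 10.10.0.53                           # is the pushed resolver even reachable via the tunnel routes?
cr0x@client:~$ dig @10.10.0.53 intranet.corp.example +short      # internal name via the internal resolver
cr0x@client:~$ dig @1.1.1.1 example.com +short                   # public name via a public resolver, for contrast
If the last query works while the one before it times out, the resolver is unreachable under the current routing policy, and no amount of IPsec tuning will help.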
Failure mode D: MTU and fragmentation (the “some sites load, others don’t” classic)
L2TP + IPsec adds overhead. Then NAT-T adds more. Then PPP adds its own. If your path MTU is tight (common with PPPoE, LTE, hotel Wi‑Fi, or “helpful” middleboxes), large packets get blackholed.
Symptoms include:
- SSH works, web pages half-load, TLS handshakes fail intermittently.
- Small pings succeed; larger ones fail when DF is set.
- Speed tests start then crater.
Fix: clamp TCP MSS on the server (and sometimes on clients) and/or set PPP MTU/MRU to a conservative value (often 1400-ish, but verify).
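One way to verify rather than guess, sketched for a Linux client: sweep DF pings and note the largest size that gets through. The size list is arbitrary, and 28 bytes accounts for the IP and ICMP headers; on a split-tunnel client, substitute an internal address so the probe actually crosses the tunnel.
cr0x@client:~$ for s in 1472 1440 1400 1360 1300; do ping -M do -s $s -c 1 -W 2 1.1.1.1 >/dev/null 2>&1 && echo "ok   $((s+28)) bytes" || echo "fail $((s+28)) bytes"; done
The largest “ok” approximates the usable path MTU; set PPP MTU/MRU and your MSS clamp comfortably below it.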
Failure mode E: Firewall and NAT-T edge cases (UDP 4500, ESP, and “state”)
IPsec is sensitive to firewalls that are “almost” configured. Allowing UDP/500 and UDP/4500 but not ESP (protocol 50) can force NAT-T or break non-NAT paths. Some environments block UDP/4500 outright. Others allow it but drop long-lived idle UDP mappings, which causes “connects, then no traffic after a few minutes.”
Decision: decide whether you require NAT-T and ensure keepalives/rekey timers match your real-world NAT timeouts.
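A hedged strongSwan sketch of that decision, assuming the legacy ipsec.conf setup and the l2tp-psk connection name used in the examples later in this article; the timer values are starting points, not gospel:
# /etc/strongswan.d/charon.conf (or strongswan.conf, depending on packaging): NAT-T keepalive interval
charon {
    keep_alive = 20s
}
# /etc/ipsec.conf: dead peer detection, so stale SAs are cleared instead of silently blackholing
conn l2tp-psk
    dpddelay=30s
    dpdtimeout=120s
    dpdaction=clear
Tune keep_alive below the shortest NAT UDP timeout you actually observe in the field, not the one in the vendor datasheet.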
Failure mode F: Overlapping subnets (you routed yourself into a paradox)
If the client’s local LAN uses the same subnet as the corporate network (example: both are 192.168.1.0/24), the client will send traffic to the local LAN instead of the tunnel. The VPN might be perfect; the routing decision is not.
Fix: avoid common RFC1918 ranges for corporate networks if you can. If you can’t, use policy-based routing, assign non-overlapping pools, or NAT the VPN client side (last resort, but sometimes the only one that works at scale).
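A quick way to watch the overlap decide for you, assuming a Linux client whose home LAN owns 192.168.1.0/24 and a corporate host at 192.168.1.20 (both illustrative):
cr0x@client:~$ ip route get 192.168.1.20
# If this names your Wi‑Fi/LAN interface instead of ppp0, the kernel is (correctly) preferring the directly connected local subnet.
cr0x@client:~$ ip -br addr show | grep '192\.168\.1\.'
# Confirms which local interface actually owns the overlapping prefix.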
Joke #2: Overlapping subnets are like naming two coworkers “Chris” and then blaming the email system when the wrong one replies.
Practical tasks with commands: what you run, what it means, what you decide
Below are field-tested tasks you can run while the incident is happening. Commands are Linux-centric on the server side (strongSwan/xl2tpd/pppd), but the logic transfers to other platforms.
Each task includes: command, realistic output, what it means, and what decision you make.
Task 1: Confirm IPsec SAs exist (server)
cr0x@server:~$ sudo ipsec statusall
Status of IKE charon daemon (strongSwan 5.9.8, Linux 6.5.0, x86_64):
uptime: 2 hours, since Dec 27 08:12:41 2025
worker threads: 16 of 16 idle
Connections:
l2tp-psk: %any...%any IKEv1
Security Associations (1 up, 0 connecting):
l2tp-psk[3]: ESTABLISHED 6 minutes ago, 203.0.113.44[203.0.113.44]...198.51.100.10[198.51.100.10]
l2tp-psk{7}: INSTALLED, TUNNEL, reqid 1, ESP in UDP SPIs: c2a3d8c4_i c8a1aa9b_o
Means: IPsec is up. If users still have no internet, the problem is likely PPP/routing/NAT/DNS/MTU, not IKE negotiation.
Decision: Move on to PPP session verification and forwarding. Don’t waste time rotating PSKs “just in case.”
Task 2: Verify xl2tpd is accepting sessions (server)
cr0x@server:~$ sudo journalctl -u xl2tpd -n 30 --no-pager
Dec 27 10:01:12 server xl2tpd[1290]: Connection established to 203.0.113.44, 1701, LNS session 4087
Dec 27 10:01:12 server xl2tpd[1290]: Call established with 203.0.113.44, Local: 53291, Remote: 22714, Serial: 1
Dec 27 10:01:13 server pppd[24410]: pppd 2.4.9 started by root, uid 0
Dec 27 10:01:13 server pppd[24410]: peer from calling number 203.0.113.44 authorized
Dec 27 10:01:13 server pppd[24410]: local IP address 10.10.50.1
Dec 27 10:01:13 server pppd[24410]: remote IP address 10.10.50.101
Dec 27 10:01:13 server pppd[24410]: primary DNS address 10.10.0.53
Means: PPP is up and a client got 10.10.50.101 with DNS 10.10.0.53. Good. Now verify routes and reachability to 10.10.0.53 (and internet) from the server.
Decision: If DNS is internal (10.10.0.53), ensure the client can route to 10.10.0.0/16 via VPN or full tunnel.
Task 3: Confirm the PPP interface exists and has addresses (server)
cr0x@server:~$ ip -br addr show | egrep 'ppp|eth0|ens'
ens3 UP 198.51.100.10/24
ppp0 UP 10.10.50.1/32
Means: ppp0 is up with local /32. That’s normal for PPP. Clients are on the other side with their own /32s.
Decision: Ensure you have a route to the client pool via ppp interfaces or that your firewall rules match ppp+. Many people mistakenly write rules for “10.10.50.0/24 via ppp0” and then wonder why it’s messy.
Task 4: Check IP forwarding (server)
cr0x@server:~$ sysctl net.ipv4.ip_forward
net.ipv4.ip_forward = 0
Means: The server will not forward packets between interfaces. VPN clients will connect, but won’t reach anything beyond the server.
Decision: Turn it on persistently and immediately if this host is intended to route.
Task 5: Enable IP forwarding (server)
cr0x@server:~$ sudo sysctl -w net.ipv4.ip_forward=1
net.ipv4.ip_forward = 1
Means: Forwarding enabled now. You still need firewall/NAT correctness.
Decision: If this fixes it, make it persistent via /etc/sysctl.d/*.conf and keep reading: forwarding without firewall rules is how you get surprises.
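A minimal persistence sketch; the file name is arbitrary, so pick whatever fits your conventions:
cr0x@server:~$ echo 'net.ipv4.ip_forward = 1' | sudo tee /etc/sysctl.d/99-vpn-forward.conf
cr0x@server:~$ sudo sysctl --system | grep ip_forward
# sysctl --system reloads every *.conf under /etc/sysctl.d, so the second command proves the setting survives a reload (and therefore a reboot).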
Task 6: Check NAT (MASQUERADE) for the VPN client pool (server)
cr0x@server:~$ sudo iptables -t nat -S | egrep 'MASQUERADE|10\.10\.50'
-A POSTROUTING -s 10.10.50.0/24 -o ens3 -j MASQUERADE
Means: Outbound NAT exists for the client pool 10.10.50.0/24 toward ens3 (internet uplink). That’s the usual requirement for “VPN gives internet.”
Decision: If this line is missing, add it. If it’s present but traffic still fails, look at filter rules and rp_filter.
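If it is missing, the rule is one line; a sketch using the pool and uplink from the earlier outputs (adjust both to your reality, and persist the rule with iptables-persistent or your config management):
cr0x@server:~$ sudo iptables -t nat -A POSTROUTING -s 10.10.50.0/24 -o ens3 -j MASQUERADE
cr0x@server:~$ sudo iptables -t nat -S POSTROUTING | grep MASQUERADE
# Confirms the rule landed; runtime iptables changes do not survive a reboot on their own.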
Task 7: Check FORWARD policy and counters (server)
cr0x@server:~$ sudo iptables -S FORWARD
-P FORWARD DROP
-A FORWARD -i ppp+ -o ens3 -j ACCEPT
-A FORWARD -i ens3 -o ppp+ -m state --state ESTABLISHED,RELATED -j ACCEPT
Means: Forward default is DROP (good default). You allow ppp-to-internet and return traffic. Also good.
Decision: If you have DROP without these accepts, add them. If accepts exist but still no traffic, check that the interface names match reality (ppp+ helps) and that nftables isn’t actually in control.
Task 8: Identify whether nftables is the real firewall (server)
cr0x@server:~$ sudo nft list ruleset | head -n 25
table inet filter {
chain input {
type filter hook input priority 0; policy drop;
iif "lo" accept
ct state established,related accept
udp dport {500, 4500, 1701} accept
}
chain forward {
type filter hook forward priority 0; policy drop;
iifname "ppp0" oifname "ens3" accept
iifname "ens3" oifname "ppp0" ct state established,related accept
}
}
Means: nftables is active and has specific rules for ppp0 only. If clients land on ppp1/ppp2, forwarding might fail.
Decision: Replace the ppp0 matches with a wildcard (iifname "ppp*"; nftables wildcards use * where iptables uses +) or use sets. Otherwise the first client works and the second calls you at 2 a.m.
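As a sketch, here is the forward chain above rewritten with wildcards, assuming the same inet filter table; load it from your persistent ruleset (for example /etc/nftables.conf) with nft -f rather than patching the live ruleset by hand:
chain forward {
    type filter hook forward priority 0; policy drop;
    iifname "ppp*" oifname "ens3" accept
    iifname "ens3" oifname "ppp*" ct state established,related accept
}
After reloading, re-run nft list ruleset and confirm the wildcard rules are what the kernel actually holds.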
Task 9: Validate the server can reach the internet (server)
cr0x@server:~$ ping -c 2 1.1.1.1
PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
64 bytes from 1.1.1.1: icmp_seq=1 ttl=57 time=12.4 ms
64 bytes from 1.1.1.1: icmp_seq=2 ttl=57 time=12.1 ms
--- 1.1.1.1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1002ms
Means: The concentrator has basic connectivity. If VPN clients don’t, it’s not “the internet is down.”
Decision: Focus on forwarding/NAT/route selection for client traffic.
Task 10: Observe whether client packets traverse the server (tcpdump)
cr0x@server:~$ sudo tcpdump -ni ppp0 icmp or udp port 53
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on ppp0, link-type PPP, snapshot length 262144 bytes
10:04:21.112233 IP 10.10.50.101 > 1.1.1.1: ICMP echo request, id 4112, seq 1, length 64
10:04:22.114455 IP 10.10.50.101 > 1.1.1.1: ICMP echo request, id 4112, seq 2, length 64
Means: Client is sending traffic through the tunnel to the server. If you don’t see replies, the issue is return path: NAT, firewall, or upstream routing.
Decision: Capture on ens3 next to see if requests go out and whether replies come back.
Task 11: Confirm NAT translation occurs (server, capture on uplink)
cr0x@server:~$ sudo tcpdump -ni ens3 icmp and host 1.1.1.1
listening on ens3, link-type EN10MB (Ethernet), snapshot length 262144 bytes
10:04:21.112900 IP 198.51.100.10 > 1.1.1.1: ICMP echo request, id 4112, seq 1, length 64
10:04:21.125001 IP 1.1.1.1 > 198.51.100.10: ICMP echo reply, id 4112, seq 1, length 64
Means: NAT is working: the packet source is the server’s public IP (198.51.100.10), and replies return. If clients still don’t get replies, your forward chain back to PPP may be blocking, or rp_filter is dropping.
Decision: Check forward rules and rp_filter, then check whether replies reach ppp0.
Task 12: Check reverse path filtering (rp_filter) (server)
cr0x@server:~$ sysctl net.ipv4.conf.all.rp_filter net.ipv4.conf.ens3.rp_filter net.ipv4.conf.ppp0.rp_filter
net.ipv4.conf.all.rp_filter = 1
net.ipv4.conf.ens3.rp_filter = 1
net.ipv4.conf.ppp0.rp_filter = 1
Means: Strict rp_filter can drop packets when the kernel thinks the return path is “wrong,” which happens with tunnels, NAT, policy routing, or asymmetric routing.
Decision: Set rp_filter to 0 (off) or 2 (loose) on relevant interfaces. For VPN concentrators, loose mode is often the sane compromise.
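A sketch of the loose-mode change with persistence; the file name is arbitrary. The kernel uses the maximum of conf.all and the per-interface value, and ppp interfaces created later inherit conf.default, so setting those two covers new sessions as well:
cr0x@server:~$ sudo sysctl -w net.ipv4.conf.all.rp_filter=2 net.ipv4.conf.default.rp_filter=2
cr0x@server:~$ printf 'net.ipv4.conf.all.rp_filter = 2\nnet.ipv4.conf.default.rp_filter = 2\n' | sudo tee /etc/sysctl.d/98-vpn-rpfilter.conf
# 2 = loose: accept the packet if any route back to the source exists, which is usually what tunnel concentrators want.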
Task 13: Test DNS resolution via the pushed DNS server (server-side sanity check)
cr0x@server:~$ dig @10.10.0.53 example.com +short
93.184.216.34
Means: The internal resolver answers. That doesn’t prove the client can reach it, but it removes one variable.
Decision: If DNS resolves here but not for clients, it’s routing/firewall between VPN pool and DNS network, or client is not actually using that DNS.
Task 14: Check that PPP is pushing the intended DNS and routes (server config)
cr0x@server:~$ sudo sed -n '1,120p' /etc/ppp/options.xl2tpd
ipcp-accept-local
ipcp-accept-remote
ms-dns 10.10.0.53
ms-dns 10.10.0.54
noccp
auth
mtu 1400
mru 1400
proxyarp
Means: DNS is pushed and MTU is set. If users report “some sites load,” MTU here is a good sign. If MTU is default (often 1500), you may need to adjust.
Decision: If you don’t control MTU/MRU here, plan to add it and clamp MSS at the firewall.
Task 15: Detect MTU blackhole with DF ping (from server toward a client path)
cr0x@server:~$ ping -M do -s 1472 -c 2 1.1.1.1
PING 1.1.1.1 (1.1.1.1) 1472(1500) bytes of data.
ping: local error: message too long, mtu=1476
ping: local error: message too long, mtu=1476
--- 1.1.1.1 ping statistics ---
2 packets transmitted, 0 received, +2 errors, 100% packet loss, time 1024ms
Means: The route toward 1.1.1.1 already has an MTU of 1476, so the kernel rejects the 1500-byte DF packet locally without even sending it. With a tight path MTU like this, if ICMP fragmentation-needed messages are also blocked somewhere, clients will experience “random” failures.
Decision: Clamp MSS (e.g., to 1360–1410 range depending on overhead) and/or set PPP MTU/MRU lower. Verify with iterative tests.
Task 16: Clamp TCP MSS for forwarded VPN traffic (server)
cr0x@server:~$ sudo iptables -t mangle -A FORWARD -i ppp+ -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu
cr0x@server:~$ sudo iptables -t mangle -S FORWARD | tail -n 2
-A FORWARD -i ppp+ -p tcp -m tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu
Means: SYN packets will negotiate an MSS that fits the real path MTU, reducing fragmentation pain.
Decision: If this improves web/TLS reliability immediately, keep it and make it persistent. Still validate PMTUD (ICMP) isn’t being blocked upstream.
Task 17: Verify client pool routes exist (server routing)
cr0x@server:~$ ip route show | egrep '10\.10\.50|default'
default via 198.51.100.1 dev ens3
10.10.50.101 dev ppp0 scope link
Means: For PPP, routes are often per-client /32 routes. If you expected a /24, that expectation is the bug, not the kernel.
Decision: Ensure firewall rules and monitoring understand this. Use ppp+ interface matching rather than relying on a neat subnet route.
Task 18: Confirm UDP/4500 and ESP are allowed on the perimeter (server)
cr0x@server:~$ sudo iptables -S INPUT | egrep 'udp.*(500|4500|1701)|esp'
-A INPUT -p udp -m udp --dport 500 -j ACCEPT
-A INPUT -p udp -m udp --dport 4500 -j ACCEPT
-A INPUT -p udp -m udp --dport 1701 -j ACCEPT
-A INPUT -p esp -j ACCEPT
Means: The common ports and ESP are allowed. If ESP is missing and NAT-T is not used, you’ll have “connects sometimes” depending on client NAT conditions.
Decision: Allow ESP when feasible. If you can’t, enforce NAT-T and ensure 4500 is reliable end-to-end.
Three corporate mini-stories from the trenches
Mini-story 1: The incident caused by a wrong assumption
The company had a legacy L2TP/IPsec concentrator that “always worked.” A small migration moved it from one VM host to another. Same IP. Same firewall rules. Same config files. The change request was boring, which is usually a good sign.
Monday morning, the helpdesk saw a flood: VPN connects, but “internet is dead.” People could reach some internal services but not others, and most external sites were unreachable. The networking team immediately blamed the upstream ISP because that’s what everyone blames when the browser spins.
The wrong assumption: “If the tunnel is up, routing is correct.” In reality, the new hypervisor’s network used a different interface name (ens192 instead of ens3). NAT rules still referenced the old device. Packets came in over ppp0, hit POSTROUTING, and… nothing matched. No masquerade. No internet.
It got worse: a few engineers could reach internal services because split tunnel routes still worked for internal subnets, so the symptoms looked inconsistent. That inconsistency wasted hours. They were hunting for a flaky network when the issue was a deterministic mismatch.
The fix was brutally simple: change the NAT rule to match the real uplink interface name (or better, match by routing table / use nftables sets). Then they added a boot-time check that asserts “if ppp+ exists, NAT rule must exist.” The lesson stuck: never key critical routing/NAT behavior on an interface name you don’t control.
Mini-story 2: The optimization that backfired
A security review flagged that VPN users were hairpinning all internet traffic through the data center. Someone proposed split tunneling as an optimization: “Only route corporate subnets through the VPN; leave everything else local. Less load, better performance.” On paper, correct. In production, comedy.
They pushed split tunnel routes without aligning DNS. PPP continued to push internal DNS resolvers because “that’s what we’ve always done.” Now remote laptops were resolving public domains via internal resolvers reachable only through the VPN, but the queries went into the tunnel while the answers came back over the local interface—or didn’t come back at all due to stateful firewalling and asymmetric routing on the client side.
The symptom was loud: “VPN connects, no internet.” The reality was narrower: no name resolution, intermittent internal access, and some apps (hardcoded IPs) still worked. Users don’t report “DNS resolution failure”; they report “nothing works.” Fair.
They “fixed” it temporarily by telling users to hardcode public DNS servers. That created a different problem: internal names stopped resolving, and security folks were unhappy about leaking internal queries. They eventually did the boring correct thing: provide a VPN-accessible DNS forwarder that answers internal zones and recurses for external, and ensure DNS queries follow the same routing policy as everything else.
Optimization is fine. But split tunneling is a routing policy change, not a CPU upgrade. Treat it with the same caution you’d give to changing database isolation levels: you’ll discover the edge cases the hard way.
Mini-story 3: The boring but correct practice that saved the day
A different org ran L2TP/IPsec for a fleet of industrial laptops. Not glamorous. But the SRE team had a habit: every VPN change required a scripted smoke test from an external runner. It performed: connect, verify assigned IP, ping a public IP, resolve a name, fetch a small HTTPS page, and download a larger file to test MTU/MSS.
One Friday, a routine firewall policy cleanup removed “weird legacy rules.” The next deployment pipeline ran the smoke test and failed at “fetch HTTPS page.” Ping worked. DNS worked. HTTP sometimes worked. HTTPS failed. This is the part where most teams start randomly restarting services. They didn’t.
The script included an MTU probe and flagged fragmentation failure. The firewall cleanup had also removed ICMP “fragmentation-needed” allowance and an MSS clamping rule. Without MSS clamping, TLS packets got blackholed on certain paths. The test caught it before humans did.
Rollback was immediate. They reintroduced a controlled MSS clamp rule, documented why it existed, and added a firewall unit test: “VPN clients must complete TLS handshake to a public endpoint.” Boring. Correct. Saved the weekend.
Common mistakes: symptom → root cause → fix
This section is opinionated because production needs opinions. These are the patterns that create “connected but no internet” tickets.
1) Symptom: “Connected” but can’t ping any IP (even 1.1.1.1)
- Root cause: Client traffic is not routed into the tunnel; or PPP session isn’t actually passing data; or server drops forwarding.
- Fix: On client, verify default route and interface metrics. On server, enable IP forwarding and allow FORWARD traffic from ppp+ to uplink, with NAT if internet egress is intended.
2) Symptom: Can ping public IPs but websites don’t load
- Root cause: DNS misconfiguration (unreachable DNS servers or client not using pushed DNS).
- Fix: Push reachable DNS via PPP options (ms-dns). Ensure routing allows access to those resolvers. For split tunnel, use a DNS forwarder reachable over the tunnel.
3) Symptom: Only some sites load; HTTPS is flaky; large downloads fail
- Root cause: MTU blackhole or fragmentation issues due to L2TP/IPsec overhead and blocked ICMP.
- Fix: Clamp TCP MSS on the server for forwarded VPN traffic. Set PPP MTU/MRU (e.g., 1400) and retest. Permit necessary ICMP types if you control firewalls.
4) Symptom: Internal resources work; internet does not
- Root cause: Split tunnel by design or by accident; missing NAT for internet egress; or policy forbids internet through VPN.
- Fix: Decide policy. If full tunnel is required, push default route (client-side setting) and configure NAT/forwarding. If split tunnel is desired, push only internal routes and provide sensible DNS.
5) Symptom: Internet works, but only for the first connected client
- Root cause: Firewall rules match a single interface (ppp0) instead of all PPP interfaces (ppp+), or NAT is tied to a specific interface that changes.
- Fix: Use ppp+ wildcards (iptables) or iifname/oifname patterns/sets (nftables). Avoid brittle interface name assumptions.
6) Symptom: Connect succeeds on some networks but not others (hotel Wi‑Fi, LTE)
- Root cause: UDP 4500 blocked or aggressively timed out; NAT-T issues; intermediate devices mangling ESP/UDP.
- Fix: Ensure NAT-T is enabled and keepalives are configured. If UDP is unreliable, consider migrating away from L2TP/IPsec to IKEv2 or TLS-based VPNs where feasible.
7) Symptom: “Connected” but can’t access corporate subnet that overlaps home subnet
- Root cause: Overlapping RFC1918 address space; client routes prefer local LAN.
- Fix: Change corporate addressing (best), or NAT VPN client subnets on the concentrator, or use policy routing per-client. At minimum, stop using 192.168.0.0/24 or 192.168.1.0/24 for anything you expect roaming clients to access.
8) Symptom: It worked yesterday; today connects but no traffic after a firewall change
- Root cause: Allowed IKE/L2TP ports but blocked forwarding, DNS, ICMP fragmentation-needed, or return traffic state.
- Fix: Audit rules for: UDP/500, UDP/4500, UDP/1701 (as needed), ESP, plus FORWARD and NAT. Restore MSS clamp and required ICMP if MTU issues appear.
Checklists / step-by-step plan
Checklist 1: Decide the policy (full tunnel vs split tunnel)
- Full tunnel: Client default route goes through VPN. You must NAT and forward on the server. DNS can be internal.
- Split tunnel: Only corporate prefixes go through VPN. You must push those routes. DNS must not create dependency on unreachable internal resolvers (use a forwarder reachable through the tunnel).
If you don’t make this decision explicit, your system will make it implicitly, and it will choose chaos.
Checklist 2: Server-side essentials (Linux L2TP/IPsec concentrator)
- IPsec SAs established (strongSwan/libreswan status shows INSTALLED).
- xl2tpd sees sessions; pppd authenticates; assigns client IPs; pushes DNS if desired.
- net.ipv4.ip_forward = 1.
- NAT masquerade for client pool out the uplink (if full tunnel/internet egress).
- Firewall FORWARD allows ppp+ to uplink and return ESTABLISHED traffic.
- MSS clamping for forwarded TCP SYNs, or conservative PPP MTU/MRU.
- rp_filter set to loose/off where necessary.
- Logs: strongSwan, xl2tpd, pppd all collected and searchable.
Checklist 3: Client-side essentials (generic)
- Confirm you got an IP on the VPN interface and a DNS configuration you understand.
- Check route table: do you have a default route via VPN (full tunnel) or specific routes (split tunnel)?
- Test: ping public IP, resolve a name, curl an HTTPS endpoint.
- If only HTTPS fails: suspect MTU/MSS and fragmentation blackholes.
- If corporate resources fail from home: suspect overlapping subnets.
Step-by-step recovery plan during an outage
- Confirm scope. One user vs everyone; specific networks vs all networks; specific client OS vs all.
- Prove IPsec is up. If IPsec isn’t up, fix IKE/PSK/certs/ports first. If it is up, stop blaming IKE.
- Prove PPP is up. Look for assigned client IP and DNS in pppd logs.
- Prove routing intent. Full tunnel or split tunnel? Verify client routes match the policy.
- Prove server forwarding/NAT. tcpdump on ppp+ and uplink; verify MASQUERADE and FORWARD rules.
- Prove DNS. Resolve via the DNS server you’re pushing; confirm reachability across routes.
- Prove MTU. If symptoms are “some sites,” enable MSS clamping and set PPP MTU/MRU.
- Stabilize. Make changes persistent, add monitoring, and write down what you changed and why.
Operational principle (one quote)
“Hope is not a strategy.” — paraphrased idea attributed to operations and reliability leaders
Translation: if your VPN works only when “the network is nice,” it doesn’t work. Make it deterministic.
FAQ
1) Why does L2TP/IPsec say “connected” when nothing works?
Because “connected” often means IKE succeeded and SAs are installed. Actual usability depends on PPP, routing, NAT, firewall, DNS, and MTU. Different layers succeed independently.
2) Is this usually a client problem or server problem?
Most of the time it’s server-side forwarding/NAT/firewall, or a policy mismatch (split vs full tunnel). Client-side routing metrics and DNS selection are the next most common.
3) If internal resources work but internet doesn’t, what’s the most likely cause?
Either split tunnel is active (by design or accident), or NAT masquerade for the VPN client pool is missing. Decide whether VPN should provide internet egress. Then implement it explicitly.
4) If ping to 1.1.1.1 works but websites don’t, what should I do first?
Check DNS. Verify what resolvers the client is using and whether those resolvers are reachable via the VPN routes. DNS problems are the fastest “it’s down” illusion in the business.
5) What MTU should I use for L2TP/IPsec?
There isn’t one universal value, but 1400 is a common starting point for PPP MTU/MRU in L2TP/IPsec with NAT-T. Better: clamp MSS to PMTU and validate with DF pings and real HTTPS transfers.
6) Do I need to allow UDP/1701 through the firewall?
For L2TP/IPsec, UDP/1701 is used for L2TP control/data and is usually protected by IPsec. Practically, you often still allow UDP/1701 to the VPN server, plus UDP/500, UDP/4500, and ESP. Your exact rule set depends on your IPsec mode and whether NAT-T is used.
7) What about clients behind the same NAT? Can that break things?
Yes. Multiple clients behind one NAT can trigger NAT-T edge cases, especially with IKEv1 and identical identifiers. Ensure your IPsec configuration supports multiple peers and uses unique IDs/usernames, and tune keepalives so UDP mappings don’t expire.
8) Can overlapping subnets really cause “no internet”?
Yes, indirectly. If DNS or proxies route through a corporate subnet that overlaps a home LAN, the client sends traffic locally and it fails. Users interpret that as “internet down” because browsers and apps depend on those internal services.
9) Should we keep using L2TP/IPsec?
If you need maximum compatibility with built-in clients, you’ll keep seeing it. If you can choose, prefer IKEv2 or a modern TLS-based VPN that behaves better through NAT and has cleaner client tooling. But if you’re stuck with L2TP/IPsec, you can still make it reliable—by being explicit about routing, DNS, and MTU.
10) What’s the single most effective monitoring check for this?
Run an external synthetic test that establishes a VPN session and then performs: ping public IP, DNS lookup, and HTTPS GET of a small page plus a larger download. That catches routing, DNS, and MTU regressions quickly.
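A minimal shape for that check, assuming a probe host that has already brought the VPN session up; the URLs are placeholders to swap for endpoints you actually care about:
#!/bin/sh
# Minimal VPN data-plane smoke test; exits non-zero at the first failing layer.
set -e
ping -c 2 -W 2 1.1.1.1                                                        # routing: raw IP through the tunnel
dig +short example.com | grep -q .                                            # DNS: whatever resolver the probe actually uses
curl -fsS -o /dev/null https://example.com/                                   # small HTTPS: TCP plus TLS handshake
curl -fsS -o /dev/null --max-time 60 https://example.com/large-test-object    # larger transfer: catches MTU/MSS regressions
echo "VPN data plane OK"
Wire its exit code into whatever alerting you already trust; the point is that it fails the way users fail, not the way dashboards fail.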
Next steps (what to change so you don’t keep re-living this)
Fixing today’s incident is fine. Preventing the next one is better, and honestly less exhausting.
- Write down your intent. Full tunnel or split tunnel. Which DNS. Which client pool. Which subnets. Make it a one-page runbook that matches reality.
- Make forwarding/NAT deterministic. Enable ip_forward persistently, use interface wildcards or routing-based matching, and keep firewall rules version-controlled.
- Standardize MTU handling. Add MSS clamping for forwarded traffic and set PPP MTU/MRU intentionally. Treat PMTUD failures as a known risk.
- Stop debugging by UI status. Monitor IPsec SAs, PPP sessions, and end-to-end data-plane tests separately.
- Add a smoke test. One script from outside your network that validates connectivity the way users experience it. Run it after every change.
- Plan a migration. If L2TP/IPsec is a legacy anchor, define a path to IKEv2 or another modern solution. Not because it’s trendy—because operational simplicity is a feature.
If you do only one thing: add packet captures and a synthetic VPN test to your incident workflow. “Connected but no internet” stops being a mystery and becomes a checklist item.