You can reach it from the office. You can reach it from the rack next door. The health check is green on the private subnet. Then a customer on the internet tries it and everything falls apart like a cheap folding chair.
This is the classic “works on LAN, fails on WAN” failure. It’s rarely the application. It’s usually routing, NAT, firewall state, or an MTU/PMTUD faceplant. The good news: you can prove which one in minutes—if you stop guessing and start interrogating the packet path.
The mental model: LAN success is not evidence of WAN health
When something “works on LAN,” you’ve proven only one thing: hosts on the same routing domain can complete a round trip. That’s it. WAN traffic introduces:
- Different source IPs (public networks vs RFC1918). Your server, firewall, and upstream might treat them differently.
- NAT state (SNAT/DNAT/masquerade) that doesn’t exist on pure LAN paths.
- Different return paths (asymmetric routing) when you have multiple gateways, VPNs, or “helpful” policy routing.
- Different MTUs (PPPoE, tunnels, overlays) that can silently kill certain flows.
- Different ACL zones (cloud security groups, edge firewalls, ISP filtering) that aren’t tested by LAN traffic.
In practice, “LAN OK, WAN broken” collapses into a few repeat offenders:
- Wrong default route (or wrong metric) for return traffic.
- Missing SNAT/masquerade for egress, or broken DNAT for ingress.
- Firewall allows LAN subnets but drops “unknown” sources or related/established state.
- Reverse path filtering drops packets because Linux thinks the source shouldn’t arrive on that interface.
- MTU/PMTUD black holes: SYN works, TLS stalls, “it’s flaky.”
Don’t start by restarting services. Start by proving how a packet enters, how it’s translated (if at all), and how it exits. If you can’t describe the path in one sentence, you’re not diagnosing—you’re hoping.
Interesting facts & history (because networks have baggage)
- RFC1918 private addressing (1996) made NAT the default crutch for IPv4 exhaustion; it also normalized “it works internally” as a misleading comfort signal.
- Linux netfilter/iptables landed in the 2.4 kernel era, replacing ipchains; the mental model of “tables/chains/hooks” still matters even when you’re using nftables today.
- Conntrack state is why “allow established/related” works—and why a full conntrack table makes a healthy firewall behave like it’s haunted.
- Reverse path filtering (rp_filter) was designed to reduce spoofing; in multi-homed systems it can drop legitimate traffic and create WAN-only failures.
- Path MTU Discovery relies on ICMP “Fragmentation needed” messages; filtering ICMP can break large flows while small pings still succeed.
- TCP MSS clamping became a common workaround for tunnel MTU problems; it’s useful, but it also hides root causes and can reduce performance.
- Asymmetric routing is common in the real world (dual uplinks, SD-WAN, ECMP), but stateful firewalls often assume symmetry and punish you for being creative.
- NAT reflection (hairpin NAT) is the reason internal clients can sometimes reach a service via its public IP; when it’s missing, you get confusing “LAN broken, WAN fine” too.
- nftables unified IPv4/IPv6 filtering semantics and improved performance/expressiveness, but the transition period left many systems with mixed tooling and mismatched expectations.
One quote worth keeping on a sticky note:
“Hope is not a strategy.” — Gen. Gordon R. Sullivan
That line gets repeated in ops because it’s painfully true. You don’t “feel” routing. You measure it.
Fast diagnosis playbook (first/second/third)
First: confirm the symptom is really WAN-specific
- Test from outside your network (cell hotspot, external probe, cloud VM).
- Test by IP and by name (DNS can masquerade as routing).
- Test TCP and ICMP separately (firewalls treat them differently).
Second: prove routing and return path on the server
- Run ip route get <client_ip> from the server: the chosen egress interface and source IP must make sense.
- Check multiple routes and metrics; the "correct" default route isn't useful if it isn't preferred.
- If multi-homed or VPN-enabled: inspect ip rule and per-table routes.
Third: prove NAT and state on the edge (or the host, if it’s doing NAT)
- Look for missing DNAT/SNAT, wrong interface matches, or rules that only hit LAN subnets.
- Inspect conntrack entries while testing; if packets arrive but no state is created, your hook/rule order is wrong.
- Validate rp_filter and MTU/PMTUD behavior before you blame “the ISP.”
If you do only one thing: run tcpdump on the ingress and egress interfaces during an external test. Packets don’t lie; humans do.
Practical tasks: commands, outputs, and decisions (12+)
These are meant to be run on Debian/Ubuntu servers and/or the Linux gateway doing routing/NAT. Each task includes what the output means and what decision you make next.
Task 1 — Confirm interfaces and addresses (sanity first)
cr0x@server:~$ ip -br addr
lo UNKNOWN 127.0.0.1/8 ::1/128
ens160 UP 10.20.0.10/24 fe80::a00:27ff:fe12:3456/64
ens192 UP 203.0.113.10/29 fe80::a00:27ff:fe98:7654/64
Meaning: This box is multi-homed: private LAN on ens160, public/WAN on ens192. Multi-homed is where “LAN works, WAN fails” goes to breed.
Decision: Assume return-path and rp_filter risk. Proceed to routing and rules, not application logs.
Task 2 — Read the routing table like you mean it
cr0x@server:~$ ip route
default via 10.20.0.1 dev ens160 proto dhcp metric 100
default via 203.0.113.9 dev ens192 proto static metric 200
10.20.0.0/24 dev ens160 proto kernel scope link src 10.20.0.10 metric 100
203.0.113.8/29 dev ens192 proto kernel scope link src 203.0.113.10 metric 200
Meaning: Two default routes. The LAN default has lower metric (100), so it wins. WAN clients might hit the public IP, but replies may try to leave via the LAN gateway. That’s asymmetry.
Decision: Fix default route preference or implement policy routing so traffic entering on WAN returns via WAN.
Task 3 — Ask the kernel: “how would you reach this WAN client?”
cr0x@server:~$ ip route get 198.51.100.77
198.51.100.77 via 10.20.0.1 dev ens160 src 10.20.0.10 uid 0
cache
Meaning: Replies to a public client would egress the private interface with a private source address. That’s not “a bit wrong.” It’s a full explanation of your outage.
Decision: Add policy routing or remove the competing default route. Don’t touch the app.
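A minimal sketch of that routing fix, assuming the gateways and interfaces from Task 2 (10.0.0.0/8 stands in for whatever internal aggregate you actually use); note that DHCP will re-add its default unless you also fix the persistent network config:
# Option A: keep both defaults but make the WAN one win on metric
sudo ip route del default via 203.0.113.9 dev ens192 metric 200
sudo ip route add default via 203.0.113.9 dev ens192 metric 50
# Option B: drop the LAN default entirely and route only internal prefixes via the LAN gateway
sudo ip route del default via 10.20.0.1 dev ens160
sudo ip route add 10.0.0.0/8 via 10.20.0.1 dev ens160
Re-run ip route get 198.51.100.77 afterwards; it should now pick ens192 with the public source address.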
Task 4 — Identify policy routing that silently hijacks traffic
cr0x@server:~$ ip rule show
0: from all lookup local
1000: from 203.0.113.10 lookup wan
32766: from all lookup main
32767: from all lookup default
Meaning: There is a rule for traffic sourced from the public IP to use table wan. That helps, but only if the traffic is actually sourced from 203.0.113.10.
Decision: Ensure inbound connections to the public IP keep that source for replies (no SNAT to LAN), and ensure services bind correctly. Validate ip route show table wan.
Task 5 — Inspect the WAN routing table used by policy routing
cr0x@server:~$ ip route show table wan
default via 203.0.113.9 dev ens192
203.0.113.8/29 dev ens192 scope link src 203.0.113.10
Meaning: Table is sane. If WAN still fails, the issue is likely that replies aren’t matching this rule (wrong source IP) or packets are being dropped (rp_filter/firewall/NAT).
Decision: Validate actual source IP used in replies with tcpdump.
Task 6 — Check reverse path filtering (the “anti-spoofing” that eats your lunch)
cr0x@server:~$ sysctl net.ipv4.conf.all.rp_filter net.ipv4.conf.ens160.rp_filter net.ipv4.conf.ens192.rp_filter
net.ipv4.conf.all.rp_filter = 1
net.ipv4.conf.ens160.rp_filter = 1
net.ipv4.conf.ens192.rp_filter = 1
Meaning: Strict rp_filter (1) on a multi-homed host can drop packets when the “best route back” doesn’t match the ingress interface. Which is exactly your situation with two defaults.
Decision: Set rp_filter to loose mode (2) on relevant interfaces, or fix routing symmetry so strict mode can remain.
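A minimal sketch, assuming ens192 is the interface taking the asymmetric traffic; setting both the "all" and per-interface keys avoids surprises from how the two values combine:
sudo sysctl -w net.ipv4.conf.all.rp_filter=2
sudo sysctl -w net.ipv4.conf.ens192.rp_filter=2
# Persist it so a reboot doesn't quietly reintroduce the drops
printf 'net.ipv4.conf.all.rp_filter = 2\nnet.ipv4.conf.ens192.rp_filter = 2\n' | sudo tee /etc/sysctl.d/90-rp-filter.conf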
Task 7 — Verify IP forwarding (if the host is a router/NAT box)
cr0x@server:~$ sysctl net.ipv4.ip_forward
net.ipv4.ip_forward = 0
Meaning: If this machine is supposed to route between LAN and WAN, it currently refuses.
Decision: Enable forwarding and then validate firewall/NAT. If it’s not supposed to route, stop and rethink the design (you might be debugging the wrong box).
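If routing really is this box's job, a minimal sketch to enable and persist forwarding before you move on to firewall/NAT checks:
sudo sysctl -w net.ipv4.ip_forward=1
echo 'net.ipv4.ip_forward = 1' | sudo tee /etc/sysctl.d/99-forwarding.conf
sudo sysctl --system   # re-applies sysctl.d so you know the persisted value is the one that wins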
Task 8 — Inspect nftables ruleset (modern Debian/Ubuntu default)
cr0x@server:~$ sudo nft list ruleset
table inet filter {
chain input {
type filter hook input priority 0; policy drop;
ct state established,related accept
iif "ens160" ip saddr 10.20.0.0/24 accept
tcp dport 22 accept
}
chain forward {
type filter hook forward priority 0; policy drop;
ct state established,related accept
iif "ens160" oif "ens192" accept
}
}
table ip nat {
chain postrouting {
type nat hook postrouting priority 100; policy accept;
oif "ens192" masquerade
}
}
Meaning: Input policy is drop. It accepts established/related and anything from LAN subnet. It also allows SSH from anywhere. But it does not allow the WAN to reach your service port. That’s a clean, boring explanation.
Decision: Add explicit WAN accept rules for the required ports, and keep them tight. Don’t “policy accept” your way into an incident report.
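A minimal sketch, assuming the table/chain names from the ruleset above and 443 as the public service port; fold it into /etc/nftables.conf once it's proven:
sudo nft add rule inet filter input iif ens192 tcp dport 443 ct state new accept
sudo nft list chain inet filter input   # confirm the rule landed where you expect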
Task 9 — If you’re on iptables, check both filter and nat tables
cr0x@server:~$ sudo iptables -S
-P INPUT DROP
-P FORWARD DROP
-P OUTPUT ACCEPT
-A INPUT -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A INPUT -s 10.20.0.0/24 -j ACCEPT
-A INPUT -p tcp -m tcp --dport 22 -j ACCEPT
Meaning: Same story as nft: WAN traffic to your app port is not allowed. LAN works because LAN source subnet is allowed.
Decision: Add an allow rule for the service port, scoped to expected sources if possible. If the service is behind DNAT, you also need FORWARD chain rules.
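A minimal iptables sketch, assuming the same interfaces and a service on 443; the FORWARD rule only matters if this box is also the DNAT gateway:
sudo iptables -A INPUT -i ens192 -p tcp --dport 443 -m conntrack --ctstate NEW -j ACCEPT
sudo iptables -A FORWARD -i ens192 -o ens160 -d 10.20.0.10 -p tcp --dport 443 -m conntrack --ctstate NEW -j ACCEPT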
Task 10 — Validate DNAT/port-forwarding rules (edge box or host-based)
cr0x@gateway:~$ sudo nft list table ip nat
table ip nat {
chain prerouting {
type nat hook prerouting priority -100; policy accept;
iif "ens192" tcp dport 443 dnat to 10.20.0.10:443
}
chain postrouting {
type nat hook postrouting priority 100; policy accept;
oif "ens192" masquerade
}
}
Meaning: Port 443 from WAN is forwarded to an internal host. If WAN can’t connect, either packets aren’t arriving at ens192, the FORWARD chain blocks it, or the internal host’s default route/NAT reply is wrong.
Decision: Capture traffic on both interfaces and check FORWARD chain counters.
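To see whether the forward path actually matches, a sketch that assumes an inet filter table like the one in Task 8; the counter keyword is what makes the hits visible:
sudo nft add rule inet filter forward iif ens192 oif ens160 ip daddr 10.20.0.10 tcp dport 443 counter accept
sudo watch -n1 'nft list chain inet filter forward'   # packet counts should climb during the external test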
Task 11 — Watch conntrack while testing from WAN
cr0x@gateway:~$ sudo conntrack -E -p tcp --dport 443
[NEW] tcp 6 120 SYN_SENT src=198.51.100.77 dst=203.0.113.10 sport=51234 dport=443 [UNREPLIED] src=10.20.0.10 dst=198.51.100.77 sport=443 dport=51234
Meaning: The gateway sees an inbound SYN and has created a NATed connection, but it’s [UNREPLIED]. The internal server didn’t answer, or the answer didn’t return through this gateway.
Decision: tcpdump on internal interface to see if the SYN hits the server; tcpdump on server to see if it replies; verify server default route points back to this gateway.
Task 12 — tcpdump on WAN: do packets arrive at all?
cr0x@gateway:~$ sudo tcpdump -ni ens192 tcp port 443
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on ens192, link-type EN10MB (Ethernet), snapshot length 262144 bytes
12:04:21.123456 IP 198.51.100.77.51234 > 203.0.113.10.443: Flags [S], seq 1234567890, win 64240, options [mss 1460,sackOK,TS val 1 ecr 0,nop,wscale 7], length 0
Meaning: Inbound SYN reaches the gateway. This is not an ISP block. This is now your problem (congratulations).
Decision: Capture on LAN interface too to confirm forwarding and DNAT are working.
Task 13 — tcpdump on LAN: does the forwarded SYN reach the server?
cr0x@gateway:~$ sudo tcpdump -ni ens160 host 10.20.0.10 and tcp port 443
listening on ens160, link-type EN10MB (Ethernet), snapshot length 262144 bytes
12:04:21.124001 IP 198.51.100.77.51234 > 10.20.0.10.443: Flags [S], seq 1234567890, win 64240, options [mss 1460,sackOK,TS val 1 ecr 0,nop,wscale 7], length 0
Meaning: DNAT is working; the server is seeing traffic with the original client IP (good). Now the server must reply back via the gateway so SNAT and state tracking can finish the handshake.
Decision: Check the server’s route back to the client and run tcpdump on the server egress.
Task 14 — tcpdump on server: does it reply, and from which source IP?
cr0x@server:~$ sudo tcpdump -ni ens160 tcp port 443 and host 198.51.100.77
listening on ens160, link-type EN10MB (Ethernet), snapshot length 262144 bytes
12:04:21.124050 IP 198.51.100.77.51234 > 10.20.0.10.443: Flags [S], seq 1234567890, win 64240, options [mss 1460,sackOK,TS val 1 ecr 0,nop,wscale 7], length 0
12:04:21.124090 IP 10.20.0.10.443 > 198.51.100.77.51234: Flags [R.], seq 0, ack 1234567891, win 0, length 0
Meaning: The server is rejecting (RST) on 443, so the app/service isn’t listening or is bound to a different address. LAN “works” might be hitting a different VIP/port, or a local proxy.
Decision: Validate listening sockets and bindings. If it’s a SYN-ACK instead, then verify that the reply goes back to the gateway (next task).
Task 15 — Check listening sockets and bind addresses
cr0x@server:~$ sudo ss -lntp | egrep ':443|:80'
LISTEN 0 4096 127.0.0.1:443 0.0.0.0:* users:(("nginx",pid=2031,fd=7))
Meaning: The service is bound to loopback only. It will work from “local” tests and maybe from a LAN reverse proxy, but it will fail from forwarded WAN traffic hitting 10.20.0.10.
Decision: Rebind to the correct interface (0.0.0.0 or specific IP) and retest from WAN.
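A hypothetical fix for the binding shown above: in the nginx vhost, change the listen directive from 127.0.0.1:443 to 0.0.0.0:443 (or to 10.20.0.10:443, the DNAT target), then:
sudo nginx -t && sudo systemctl reload nginx
sudo ss -lntp | grep ':443'   # should now show 0.0.0.0:443 or 10.20.0.10:443, not 127.0.0.1:443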
Task 16 — Check MTU quickly when “SYN works, TLS stalls”
cr0x@server:~$ ping -M do -s 1472 198.51.100.77 -c 2
PING 198.51.100.77 (198.51.100.77) 1472(1500) bytes of data.
ping: local error: message too long, mtu=1492
ping: local error: message too long, mtu=1492
--- 198.51.100.77 ping statistics ---
2 packets transmitted, 0 received, +2 errors, 100% packet loss, time 1026ms
Meaning: Your effective MTU is 1492 (classic PPPoE). If you assume 1500 and block ICMP, you can create a WAN-only failure where small packets work and larger ones hang.
Decision: Fix MTU end-to-end or clamp MSS on the tunnel/edge. Don’t just “open random ports.”
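A minimal sketch for the interface-MTU route, using the 1492 measured above (persist it in your netplan/interfaces config or it reverts on reboot); an MSS-clamping sketch appears later in the MTU pattern section:
sudo ip link set dev ens192 mtu 1492
ping -M do -s 1464 198.51.100.77 -c 2   # 1464 + 28 = 1492; should now pass without "message too long"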
Joke #1: NAT is like office politics—nobody admits it exists, but it decides who gets to talk to whom.
Routing/NAT patterns that cause LAN-only success
1) The server has two default routes, and the wrong one wins
This is the #1 pattern on multi-homed systems: one interface for management/LAN, one for public/WAN, sometimes a third for VPN. Linux picks the route with the lowest metric. If DHCP hands you a default gateway on the LAN NIC and you add a static default for WAN, you’ve created a lottery the “wrong” route keeps winning.
WAN request comes in on the WAN interface, but the reply goes out the LAN gateway. Upstream drops it because the source IP is wrong, or because the stateful firewall in the middle never sees the reply.
What to do: choose one default route in main, and use policy routing for the exceptions. If you must keep multiple defaults, be explicit about metrics and source addresses.
2) DNAT works, but FORWARD is dropping
On a Linux gateway, DNAT in PREROUTING can be perfect and still nothing works because the FORWARD chain is default drop and has no allow rule for the forwarded service. People check “the NAT rules” and stop there. That’s how you spend a Tuesday night with tcpdump and regret.
What to do: treat NAT and filtering as separate layers. You need both: the translation and the permission.
3) Missing SNAT/masquerade for egress
LAN hosts can reach the gateway, can reach internal services, can even resolve DNS. But they can’t reach the internet because packets go out with private source IPs and die upstream. Locally, everything looks fine.
What to do: confirm POSTROUTING SNAT/masquerade exists and matches the correct outgoing interface. Then confirm return traffic is permitted and conntrack isn’t exhausted.
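A quick check-and-fix sketch, assuming the interfaces and subnets from the earlier tasks; if the capture shows RFC1918 sources leaving the WAN interface, the masquerade rule is missing or not matching:
sudo tcpdump -ni ens192 src net 10.20.0.0/24   # should be silent once masquerade works
sudo nft add rule ip nat postrouting oif ens192 ip saddr 10.20.0.0/24 masquerade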
4) rp_filter drops packets only from “weird” sources
Strict reverse path filtering is fine on single-homed hosts. On multi-homed or policy-routed systems, it can drop legitimate WAN packets because the kernel says: “if I were to route back to that source, I wouldn’t use this interface, so it must be spoofed.”
That’s a reasonable security posture, until you’re the one explaining why only WAN fails. Loose mode (2) is often the right compromise on edge systems with multiple uplinks.
5) MTU/PMTUD black hole
Small packets work. Pings work. SYN/SYN-ACK works. Then TLS handshake stalls, HTTP uploads hang, or gRPC streams reset. The culprit is often PMTUD broken by filtered ICMP, plus a reduced path MTU because of PPPoE or tunnels.
What to do: measure the MTU, allow essential ICMP types, and clamp MSS when you must.
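A minimal nftables sketch, assuming the inet filter table from Task 8: the first rules keep PMTUD's ICMP/ICMPv6 signals alive, the last clamps MSS to the route MTU for forwarded TCP:
sudo nft add rule inet filter input icmp type destination-unreachable accept
sudo nft add rule inet filter forward icmp type destination-unreachable accept
sudo nft add rule inet filter input icmpv6 type packet-too-big accept
sudo nft add rule inet filter forward tcp flags syn tcp option maxseg size set rt mtu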
6) Hairpin NAT confusion
Sometimes the report is “works from the internet, fails from inside the office when using the public hostname.” That’s hairpin NAT (NAT reflection). It’s the mirror maze of NAT: useful when you need it, disorienting when you don’t have it.
What to do: implement split-horizon DNS (internal DNS returns internal IP) or configure hairpin NAT correctly on the edge device.
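If you go the NAT-reflection route, a minimal gateway sketch, assuming the addresses from Task 10 (public 203.0.113.10, internal server 10.20.0.10); split-horizon DNS is usually the cleaner fix:
# LAN client hits the public IP: translate the destination to the internal server
sudo nft add rule ip nat prerouting iif ens160 ip daddr 203.0.113.10 tcp dport 443 dnat to 10.20.0.10:443
# Hairpinned flow: also rewrite the source so the server replies via the gateway, not straight back to the client
sudo nft add rule ip nat postrouting ip saddr 10.20.0.0/24 ip daddr 10.20.0.10 tcp dport 443 masquerade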
Joke #2: If you think your routing is “simple,” you’re either lucky or you haven’t looked at ip rule yet.
Three corporate mini-stories (anonymized, plausible, painfully familiar)
Mini-story #1 — The outage caused by a wrong assumption
They migrated a customer-facing service from a single NIC VM to a two-NIC VM: one interface for “internal east-west,” one for “public ingress.” The engineer doing the change had the right instinct: “separate traffic, reduce risk.”
The wrong assumption was subtle: they assumed the public NIC would automatically become the default route for replies to public clients. It didn’t. DHCP on the internal network provided a default gateway with a lower metric, and Linux happily used it.
From the internal network, the service looked perfect. Monitoring from inside the VPC was green. The on-call rotated, saw internal checks passing, and told the product team the issue was “probably DNS propagation.” It wasn’t. External clients saw SYN-ACKs coming from the wrong source IP or not at all, depending on where the asymmetric path got filtered.
The fix wasn’t heroic. They removed the internal default gateway, used a specific route for internal subnets, and added policy routing for the few outbound calls that needed the internal interface. The postmortem’s real lesson: a passing LAN test is not a WAN test, and route metrics are operational configuration, not trivia.
Mini-story #2 — The optimization that backfired
A platform team wanted to “standardize firewall rules” across fleets. They moved from a permissive default policy to a default-drop with a small allowlist. That’s usually the right direction—until you deploy it without understanding traffic sources.
They allowed application ports from the corporate RFC1918 space and from a partner subnet. They forgot that real customers don’t come from those ranges. In staging, everything passed because staging tests ran from inside the network. In production, WAN traffic hit the edge, got forwarded correctly, and then died at the host firewall’s input chain.
The on-call spent hours checking load balancer health checks and TLS certs because “it works internally.” Eventually someone ran nft list ruleset and noticed the allow rule was scoped to internal sources only. The “optimization” was correct in spirit but incomplete in scope.
They fixed it by defining explicit zones: internal, partner, internet. Internet got a minimal allowlist for the public service ports, rate-limited, logged, and monitored. They also added an external synthetic check as a release gate. The real backfire wasn’t security. It was assuming internal tests represented the world.
Mini-story #3 — The boring practice that saved the day
A finance company ran a small but strict runbook: every time a new public endpoint shipped, the engineer had to attach three artifacts to the ticket: ip route get to a public probe, a tcpdump snippet showing SYN/SYN-ACK, and the firewall rule diff.
People complained. It felt bureaucratic. Then a new VPN client rollout changed routing priorities on a subset of hosts. The VPN pushed a new default route with a better metric. Internally, everything still worked. Externally, endpoints went dark intermittently—only for traffic sourced from certain networks, depending on which egress was chosen.
Because they had “boring artifacts,” the on-call compared today’s ip route and ip rule outputs to last week’s ticket. The difference was obvious: the VPN default route was now winning. They added a policy rule for the service IP and pinned return traffic to the right table. No guesswork, no superstition.
The practice wasn’t glamorous. It was reproducible evidence. That’s what saved the day, not a war room.
Common mistakes: symptoms → root cause → fix
1) Symptom: works from LAN IP, fails from public IP
Root cause: service bound to 127.0.0.1 or to the LAN address only.
Fix: change bind/listen address to 0.0.0.0 or the public/VIP, and confirm with ss -lntp. If behind DNAT, bind to the internal address that DNAT targets.
2) Symptom: external SYN arrives, but no SYN-ACK returns
Root cause: asymmetric return path (wrong default route, wrong metric, missing policy routing) or rp_filter drops replies/requests.
Fix: run ip route get <client_ip>, adjust routes/metrics or add ip rule + per-table default. Set rp_filter to loose where appropriate.
3) Symptom: DNAT looks correct; still no connectivity
Root cause: FORWARD chain drop, or missing allow rule for forwarded traffic.
Fix: add explicit allow in forward chain for iif wan oif lan and destination port; keep established/related allowed.
4) Symptom: LAN hosts can’t reach the internet, but gateway can
Root cause: missing SNAT/masquerade for LAN egress, or SNAT tied to the wrong interface name.
Fix: add masquerade in postrouting for the WAN interface, confirm with tcpdump that egress packets use the public source.
5) Symptom: ping works, HTTP works for small pages, uploads/TLS hang
Root cause: MTU/PMTUD black hole (often ICMP blocked) over PPPoE/tunnels.
Fix: allow ICMP “fragmentation needed,” lower MTU on interfaces, or clamp TCP MSS on the edge.
6) Symptom: some external networks work, others fail
Root cause: upstream filtering/peering, or policy routing based on source prefix, or an ACL scoped to “known” subnets.
Fix: test from multiple external probes; check firewall rules for overly specific source matches; check BGP/edge policy if applicable.
7) Symptom: works on IPv4, fails on IPv6 (or vice versa)
Root cause: dual-stack partial configuration: DNS returns AAAA but firewall/routes aren’t ready, or NAT64 assumptions.
Fix: explicitly test -4/-6, ensure nftables has inet rules covering both, and confirm routing tables for v6.
8) Symptom: internal clients can’t reach service via public hostname
Root cause: missing hairpin NAT or lack of split-horizon DNS.
Fix: implement split DNS or configure NAT reflection; do not “fix” it by opening the firewall wider.
Checklists / step-by-step plan
Step-by-step: diagnose a public service that works only on LAN
- Reproduce from outside. Use a true external vantage point. If you can’t, stop—your diagnosis will be biased.
- Confirm DNS vs routing. Compare dig results from inside and outside; test by IP directly.
- Check the service is actually listening. Use ss -lntp and verify bind addresses match the traffic path.
- Trace packet arrival. tcpdump on WAN interface: do SYNs arrive?
- Trace forwarding/NAT. tcpdump on LAN side of the gateway: does DNAT forward the SYN to the server?
- Trace server reply. tcpdump on server egress: does it reply, and via which interface/source IP?
- Confirm routing decision. ip route get <client_ip> must show the correct egress interface and source.
- Confirm firewall decisions. nftables/iptables counters should increment where you expect. If they don’t, your rule doesn’t match reality.
- Check rp_filter and conntrack. rp_filter for multi-homing; conntrack table size if state is missing or flapping.
- Only then look at the application. By now you’ll know whether the app never saw packets, rejected them, or replied into a void.
Step-by-step: fix return-path issues safely (multi-homed host)
- Pick one primary default route in main. Make it the egress for most traffic.
- Create a dedicated routing table (e.g., wan) with its own default gateway.
- Add an ip rule matching traffic sourced from the public IP (or fwmark) to use that table (see the sketch after this list).
- Set rp_filter to loose mode where multi-homing demands it, or keep strict mode if you can guarantee symmetry.
- Validate with ip route get for representative client IPs.
- Validate with tcpdump: SYN in, SYN-ACK out the same edge.
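A minimal sketch of those steps, assuming the interfaces and addresses from the earlier tasks and a table named wan with ID 100:
echo '100 wan' | sudo tee -a /etc/iproute2/rt_tables
sudo ip route add default via 203.0.113.9 dev ens192 table wan
sudo ip route add 203.0.113.8/29 dev ens192 src 203.0.113.10 table wan
sudo ip rule add from 203.0.113.10 lookup wan priority 1000
sudo ip route flush cache
ip route get 198.51.100.77 from 203.0.113.10   # should now choose ens192 via 203.0.113.9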
Step-by-step: validate NAT on a Linux gateway
- Confirm forwarding is enabled: sysctl net.ipv4.ip_forward.
- Confirm NAT rules exist (PREROUTING for DNAT, POSTROUTING for SNAT/masquerade).
- Confirm FORWARD chain allows the flow (new and established).
- Confirm conntrack entries change from UNREPLIED to replied during a test (see the snippet after this list).
- Confirm the internal host’s default route points to the NAT gateway (or has a route back to the client subnet).
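For the conntrack step, a quick way to watch the state, assuming 443 as the forwarded port:
sudo conntrack -L -p tcp --dport 443   # snapshot: look for [UNREPLIED] vs [ASSURED] entries
sudo conntrack -E -p tcp --dport 443   # live events while you run the external test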
FAQ
1) If LAN works, doesn’t that prove the firewall is fine?
No. Many firewalls allow RFC1918 sources broadly and treat internet sources as hostile. LAN success often proves only that your LAN allow rule matches.
2) Why does curl from the server itself work, but WAN fails?
Local curl doesn’t traverse DNAT, doesn’t prove routing symmetry, and might hit loopback bindings. Always test from an external host and watch packet flow.
3) How do I quickly detect asymmetric routing?
Run ip route get <external_client_ip> and compare with the ingress interface. If the reply would exit a different interface, you have asymmetry risk.
4) Should I disable rp_filter?
Prefer rp_filter=2 (loose) on multi-homed interfaces rather than disabling everywhere. If you can make routing symmetric, strict mode (1) is fine and safer.
5) I use nftables but tooling still shows iptables rules. Which is real?
On Debian/Ubuntu, iptables may be backed by nft (iptables-nft). The kernel evaluates nftables. Use nft list ruleset to see the truth.
6) Why do I see UNREPLIED in conntrack?
The first packet was seen and state was created, but no return traffic completed the flow. That points to server not responding, return path wrong, or reply blocked/dropped.
7) Can MTU really break only WAN?
Yes. LAN paths often stay at 1500. WAN edges (PPPoE, VPNs, tunnels) may reduce MTU. With ICMP blocked, PMTUD fails and large packets disappear.
8) What’s the safest way to open a port for WAN access?
Add an explicit allow rule for the service port on the ingress path (host or gateway), keep default-drop, and add logging/rate limiting where appropriate.
9) My service is behind DNAT. Where should I troubleshoot first: gateway or server?
Start at the gateway WAN interface: do packets arrive? Then LAN interface: do they forward? Then server: does it reply? That sequence prevents daydreaming.
10) I changed routes but nothing changed. Why?
Route cache and existing conntrack state can keep old behavior briefly. Flush route cache (ip route flush cache) and retest with new connections.
Conclusion: practical next steps
If you’re staring at “works on LAN, fails on WAN,” stop treating it as a mystery. It’s a packet path problem until proven otherwise.
- Run ip route and ip route get for a real external client IP. Fix return-path symmetry or add policy routing.
- Inspect nft list ruleset (or iptables) and confirm WAN traffic is explicitly permitted where it must be (INPUT for host services, FORWARD for DNAT).
- Use tcpdump on ingress and egress interfaces during an external test. Confirm: arrival, translation, forwarding, reply.
- Check rp_filter and MTU when symptoms are selective or “flaky.”
- Turn your findings into a tiny runbook for your team: the commands, the expected outputs, and the one decision each output drives.
The goal isn’t to memorize every netfilter hook. It’s to build a habit: never guess when the kernel will tell you what it’s doing.