“No route to host” is the kind of error that makes smart people do dumb things—like restarting networking on a production box because it “usually fixes it.”
Sometimes it does. Often it just adds a second incident on top of the first.
On Ubuntu 24.04, this message usually means the kernel couldn’t figure out how to get packets to the destination at all, or it tried and got a hard failure back (often from ARP or routing).
This guide is the checklist I wish everyone used: ARP, gateway, routes, VLANs, ACLs, and the handful of commands that end the guessing quickly.
What “No route to host” actually means on Linux
On Linux, “No route to host” is the text of the EHOSTUNREACH error; its close sibling ENETUNREACH reads “Network is unreachable.” That’s not an application being dramatic; it’s the kernel saying:
“I cannot deliver this packet, and I have a specific reason that isn’t ‘timeout’.”
The tricky part is that multiple failure modes can surface as the same user-facing error:
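- No usable route in the routing table (no default route, wrong prefix route, wrong VRF/table, or an explicit unreachable route). When the lookup finds nothing at all, the error is usually the sibling “Network is unreachable” instead.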
- Neighbor resolution failure (ARP for IPv4, NDP for IPv6) — the route exists, but you can’t find the next-hop MAC address.
- Immediate ICMP unreachable returned by a router, firewall, or host (policy-based block that chooses “unreachable” over “drop”).
- Local policy says “don’t” (nftables, policy routing, RP filter, or a host route pointing to the void).
Here’s the practical implication: if you treat every “No route to host” as “routing is wrong,” you’ll waste time. Sometimes routing is fine; the gateway is alive; you’re just not getting ARP replies because a VLAN is wrong or a security control is silently separating you from your next hop.
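The errno is the first piece of data you get, so it helps to turn it into a starting layer. Below is a minimal Python sketch that assumes you catch the OSError from a failed connect(). The hint strings and the `HINTS`/`explain` names are my own shorthand for the failure modes above, not anything the kernel reports.

```python
import errno

# Hypothetical triage hints keyed by errno. The kernel only hands you the
# number; the "where to look" text is shorthand for the failure modes above.
HINTS = {
    errno.EHOSTUNREACH: "neighbor (ARP/NDP) failure, ICMP unreachable from upstream, or a local reject",
    errno.ENETUNREACH: "no usable route: check ip route get, the default route, and ip rule",
    errno.ETIMEDOUT: "packets vanished: look for silent drops rather than rejects",
    errno.ECONNREFUSED: "something answered with a TCP RST or port-unreachable: the path works, the port is refused",
}


def explain(exc: OSError) -> str:
    """Return 'ERRNAME: hint' for an OSError raised by connect()."""
    name = errno.errorcode.get(exc.errno, f"errno {exc.errno}")
    return f"{name}: {HINTS.get(exc.errno, 'not one of the usual suspects')}"


if __name__ == "__main__":
    print(explain(OSError(errno.EHOSTUNREACH, "No route to host")))
```

Treat the output as a starting point. EHOSTUNREACH alone can’t tell an ARP failure from an upstream reject; the tasks below do that.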
One quote I keep around for incidents like this, a paraphrase of an idea from W. Edwards Deming: “Without data, you’re just another person with an opinion.”
This checklist is data-first: run a command, read output, make a decision.
Fast diagnosis playbook (first/second/third)
You want the fastest path to the bottleneck. Not the most elegant. Not the one that makes the network team happiest. The fastest.
First: prove local routing intent (do we know where to send packets?)
- Check the route the kernel will use: ip route get <dest>.
- Confirm you have a default route if the destination is off-subnet.
- Confirm your source address selection matches what you think (multi-homed hosts love surprises).
Second: prove L2 adjacency to your next hop (can we ARP for the gateway?)
- Ping the default gateway (or at least trigger ARP), then check the ip neigh state.
- If ARP is stuck in INCOMPLETE/FAILED, you’re not reaching your gateway at Layer 2. Stop blaming routes.
Third: prove policy isn’t blocking you (ACL/firewall/security group/RPF)
- Check local nftables/ufw rules.
- Capture on the egress interface: are ARP requests leaving? Are replies arriving?
- If packets leave and never come back, it’s upstream (switch/VLAN/ACL) or the gateway itself.
Joke #1: The fastest way to “fix” a network issue is to reboot something—right up until you discover you rebooted the only box with the logs.
Interesting facts & context (because systems have history)
- ARP predates the modern Internet boom. It was specified in 1982 (RFC 826). We’re still debugging the same basic mechanism in 2025.
- Linux neighbor states are a whole finite-state machine. Those REACHABLE, STALE, DELAY, PROBE, and FAILED labels aren’t decoration; they’re kernel behavior cues.
- ICMP “host unreachable” is sometimes a courtesy. Many firewalls drop silently; some explicitly reject with ICMP, which makes “No route to host” show up instantly instead of timing out.
- Reverse path filtering (rp_filter) is older than most cloud VPCs. It exists to reduce spoofing, but it still breaks asymmetric routing in real systems.
- Ubuntu’s networking stack changed culturally more than technically. Netplan (YAML → renderer) made config management friendlier, but it also added a layer where “looks right” can still render wrong.
- “No route to host” is not the same as “Connection timed out.” “No route” usually means an immediate hard failure; “timeout” often means packets vanish (drop/blackhole).
- Gratuitous ARP exists to reduce downtime. Hosts announce “this IP is at this MAC” to update caches—useful during failover, dangerous during misconfig.
- Policy routing has been in Linux for decades. Multiple routing tables + rules are powerful, and also a reliable way to create invisible connectivity failures.
A mental model: L2 vs L3 vs policy
When Ubuntu says “No route to host,” you should immediately categorize the failure. Not by vibes—by layers and decisions the kernel makes.
Layer 3: “Which next hop?”
The kernel consults routing tables (possibly multiple via policy rules). If it cannot find a route, you get a fast error. If it finds a route, it chooses:
destination prefix → next-hop (gateway or on-link) → egress interface → source IP.
Layer 2: “What MAC address?”
If the next hop is on-link (including your default gateway), Linux must resolve the MAC address via ARP (IPv4) or NDP (IPv6).
If ARP fails, routing can be perfect and you still won’t reach anything. This is why the “route looks right” crowd loses arguments in incidents.
Policy and security controls: “Are we allowed?”
The packet may be blocked locally (nftables/ufw), rejected upstream (ACL, security group, firewall), or dropped due to anti-spoof checks (rp_filter).
These failures can present as immediate unreachable or as silence. “No route to host” often means an explicit reject somewhere.
Practical tasks: commands, outputs, and decisions (12+)
These are the tasks I run in production. Each one includes: command, what the output means, and the decision you make next.
Run them in order if you want a clean narrative, or jump around if you already suspect a layer.
Task 1: Reproduce the failure with a deterministic tool
cr0x@server:~$ nc -vz -w 3 10.40.12.50 22
nc: connect to 10.40.12.50 port 22 (tcp) failed: No route to host
Meaning: This is a kernel-level error, not an SSH client whim. If you see “No route to host” here, you can stop tweaking SSH configs.
Decision: Move to route selection (ip route get) and neighbor checks.
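If nc isn’t available, or you want the timing recorded, a small Python probe does the same job. This is a sketch: `tcp_probe` is a hypothetical helper and the 0.5-second “fast” cutoff is an assumption. Elapsed time matters because an instant failure usually means something rejected you, while a full timeout usually means something dropped you.

```python
import errno
import socket
import time


def tcp_probe(host: str, port: int, timeout: float = 3.0) -> tuple:
    """Attempt one TCP connection.

    Returns (outcome, elapsed_seconds, speed), where outcome is "OK",
    "TIMEOUT", or an errno name such as "EHOSTUNREACH".
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            outcome = "OK"
    except socket.timeout:
        # Nothing came back before the deadline: typical of a silent drop.
        outcome = "TIMEOUT"
    except OSError as exc:
        outcome = errno.errorcode.get(exc.errno, f"errno {exc.errno}")
    elapsed = time.monotonic() - start
    # Assumed threshold: anything under 0.5 s counts as an immediate answer.
    speed = "fast" if elapsed < 0.5 else "slow"
    return outcome, elapsed, speed


if __name__ == "__main__":
    print(tcp_probe("10.40.12.50", 22))
```

On the host in this example you would expect an EHOSTUNREACH outcome flagged as fast. A TIMEOUT flagged as slow points at silent drops instead.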
Task 2: Ask the kernel which route it would use
cr0x@server:~$ ip route get 10.40.12.50
10.40.12.50 via 10.40.12.1 dev ens192 src 10.40.12.20 uid 1000
cache
Meaning: Linux believes it should send traffic to gateway 10.40.12.1 via ens192, using source IP 10.40.12.20.
Decision: If this is wrong (wrong gateway, wrong interface, wrong source), fix routing/netplan. If it’s right, test the next hop with ARP/ping.
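When you check this on more than one host, comparing output by eye gets old. Below is a minimal Python sketch. It assumes the text format shown above: destination first, then keyword/value pairs such as via, dev, and src. `parse_route_get` and `surprises` are hypothetical helper names. A failed lookup comes back as an `RTNETLINK answers: ...` line instead, and the parser reports that as an error.

```python
# Route types that `ip route get` prints in front of the destination address.
SPECIAL_TYPES = {"local", "broadcast", "multicast"}


def parse_route_get(output: str) -> dict:
    """Parse the first line of `ip route get <dest>` text into a dict."""
    lines = output.strip().splitlines()
    if not lines:
        return {}
    first = lines[0]
    if first.startswith("RTNETLINK answers:"):
        # e.g. "RTNETLINK answers: Network is unreachable" when no route matches.
        return {"error": first.split(":", 1)[1].strip()}
    tokens = first.split()
    info = {}
    if tokens and tokens[0] in SPECIAL_TYPES:
        info["type"] = tokens.pop(0)
    if not tokens:
        return info
    info["dest"] = tokens[0]
    # Keywords that are followed by exactly one value.
    for key in ("via", "dev", "src", "table"):
        if key in tokens[1:]:
            idx = tokens.index(key, 1)
            if idx + 1 < len(tokens):
                info[key] = tokens[idx + 1]
    return info


def surprises(info: dict, expected: dict) -> list:
    """Describe every field that differs from what you expected to see."""
    return [
        f"{key}: expected {want}, got {info.get(key)}"
        for key, want in expected.items()
        if info.get(key) != want
    ]
```

An empty `surprises(...)` list means the kernel’s decision matches your intent, so move on to the next hop. Anything else names the exact field to fix.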
Task 3: Confirm you actually have the expected routes
cr0x@server:~$ ip route show
default via 10.40.12.1 dev ens192 proto dhcp src 10.40.12.20 metric 100
10.40.12.0/24 dev ens192 proto kernel scope link src 10.40.12.20 metric 100
Meaning: Default route exists. Subnet route exists. This is not “no route in the table.”
Decision: Move down the stack to ARP/neighbor resolution for 10.40.12.1.
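For scripted checks, iproute2 can emit JSON with `ip -j route show`. The sketch below lists default routes with the preferred (lowest metric) one first. It assumes the JSON keys `dst`, `gateway`, `dev`, and `metric` as printed by current iproute2 builds, where metric is omitted when it is 0. Check the output on your version before relying on it.

```python
import json


def default_routes(ip_json: str) -> list:
    """(gateway, dev, metric) for each default route, lowest metric (preferred) first."""
    routes = json.loads(ip_json)
    found = [
        (r.get("gateway"), r.get("dev"), r.get("metric", 0))
        for r in routes
        if r.get("dst") == "default"
    ]
    return sorted(found, key=lambda item: item[2])
```

An empty list is the “no default route” case. More than one entry on a multi-homed host is worth a second look.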
Task 4: Check link state and carrier (don’t skip this)
cr0x@server:~$ ip link show ens192
2: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
link/ether 00:50:56:aa:bb:cc brd ff:ff:ff:ff:ff:ff
Meaning: LOWER_UP tells you the NIC has carrier. If it’s NO-CARRIER or state DOWN, your problem is physical/virtual link level.
Decision: If not LOWER_UP, stop and fix the link (vSwitch port group, cable, bond, driver). If it is, continue.
Task 5: Check the neighbor (ARP) entry for your gateway
cr0x@server:~$ ip neigh show dev ens192
10.40.12.1 INCOMPLETE
10.40.12.77 lladdr 00:25:90:12:34:56 STALE
Meaning: INCOMPLETE for the gateway means ARP requests went out (or were attempted) but no reply was learned. This is a huge clue.
Decision: This is likely VLAN mismatch, switch port isolation, upstream ACL, wrong subnet mask, or the gateway being down. Validate ARP traffic with tcpdump.
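Here is the same check as a minimal Python sketch: it pulls one address out of `ip neigh show` text and returns its state and MAC. It assumes the usual layout: address first, `lladdr <mac>` when known, and the NUD state as the last field. `neighbor_entry` and `NO_ADJACENCY` are hypothetical names.

```python
# NUD states that mean "we never got (or lost) the MAC for this neighbor".
NO_ADJACENCY = {"INCOMPLETE", "FAILED"}


def neighbor_entry(neigh_output: str, ip: str):
    """Find `ip` in `ip neigh show` output; return (state, mac) or None if absent."""
    for line in neigh_output.splitlines():
        fields = line.split()
        if not fields or fields[0] != ip:
            continue
        mac = None
        if "lladdr" in fields and fields.index("lladdr") + 1 < len(fields):
            mac = fields[fields.index("lladdr") + 1]
        # The NUD state (REACHABLE, STALE, INCOMPLETE, FAILED, ...) is printed last.
        return fields[-1], mac
    return None
```

If the gateway’s state is in NO_ADJACENCY, go straight to the ARP capture. A None result means no resolution has been attempted yet, so ping the gateway first.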
Task 6: Force ARP activity by pinging the gateway
cr0x@server:~$ ping -c 2 -W 1 10.40.12.1
PING 10.40.12.1 (10.40.12.1) 56(84) bytes of data.
--- 10.40.12.1 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1010ms
Meaning: Ping didn’t get replies. That alone doesn’t prove ARP failure—ICMP could be blocked—but combined with INCOMPLETE neighbor state, it’s strong evidence.
Decision: Capture ARP. If you don’t see replies, stop assuming “routing.” This is adjacency.
Task 7: Capture ARP on the egress interface
cr0x@server:~$ sudo tcpdump -ni ens192 arp
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on ens192, link-type EN10MB (Ethernet), snapshot length 262144 bytes
12:41:05.112233 ARP, Request who-has 10.40.12.1 tell 10.40.12.20, length 28
12:41:06.113244 ARP, Request who-has 10.40.12.1 tell 10.40.12.20, length 28
Meaning: Requests leave your host. No replies arrive. That’s not a local firewall issue; ARP isn’t filtered by nftables in the usual IP chains anyway.
Decision: Escalate to switching/VLAN/portgroup/VRF/gateway health. If you do see a reply, your host should learn it; then look at rp_filter, policy routing, or upstream ACLs.
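Long captures are easier to judge with a counter. The sketch below tallies ARP requests for one target and collects the MACs in any replies from it. It assumes default `tcpdump -n` text like the lines above. The regexes and `arp_summary` are my own and may need adjusting for verbose (-v/-e) output.

```python
import re

# Patterns for default (non-verbose) `tcpdump -n arp` lines.
REQUEST = re.compile(r"ARP, Request who-has ([\d.]+)(?: \([^)]*\))? tell ([\d.]+)")
REPLY = re.compile(r"ARP, Reply ([\d.]+) is-at ([0-9a-fA-F:]+)")


def arp_summary(lines, target: str) -> dict:
    """Count requests for `target` and collect the MACs it answered with."""
    requests, macs = 0, set()
    for line in lines:
        req = REQUEST.search(line)
        if req and req.group(1) == target:
            requests += 1
            continue
        rep = REPLY.search(line)
        if rep and rep.group(1) == target:
            macs.add(rep.group(2).lower())
    if requests and not macs:
        verdict = "requests leave, nothing answers: escalate to L2 (VLAN/port/gateway)"
    elif macs:
        verdict = "replies seen: adjacency works, look at rp_filter, policy routing, ACLs"
    else:
        verdict = "no ARP for the target captured: trigger traffic (ping) and capture again"
    return {"requests": requests, "replies": sorted(macs), "verdict": verdict}
```

More than one distinct MAC in `replies` for the same IP is a finding in its own right: either a duplicate address or a failover in progress.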
Task 8: Verify IP addressing and prefix are what you think
cr0x@server:~$ ip -br addr show ens192
ens192 UP 10.40.12.20/24 fe80::250:56ff:feaa:bbcc/64
Meaning: You’re configured for /24. If the network is actually /25 or /23, you may be ARPing directly for addresses that aren’t on-link, sending on-link traffic to the gateway, or ending up with a gateway that no longer sits inside your on-link prefix.
Decision: If prefix mismatch is suspected, confirm with network design/source of truth and fix netplan or DHCP scope. Don’t “try /16 to see if it works.” That’s how you create a second incident.
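The prefix decides whether the kernel ARPs for a destination directly or hands it to the gateway. This short sketch uses Python’s standard `ipaddress` module to show the same host flipping between the two as the mask changes. The addresses 10.40.12.200 and 10.40.13.5 are illustrative. The same check tells you whether your gateway still sits inside the prefix.

```python
import ipaddress


def on_link(local: str, other: str) -> bool:
    """True if `other` sits inside the subnet implied by `local` (address/prefix)."""
    return ipaddress.ip_address(other) in ipaddress.ip_interface(local).network


if __name__ == "__main__":
    for prefix in ("/24", "/25", "/23"):
        local = "10.40.12.20" + prefix
        print(local,
              "10.40.12.200 on-link:", on_link(local, "10.40.12.200"),
              "| 10.40.13.5 on-link:", on_link(local, "10.40.13.5"))
```

Running it shows .200 as on-link under /24 and /23 but not /25. 10.40.13.5 is on-link only under /23.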
Task 9: Check policy routing rules (the silent saboteur)
cr0x@server:~$ ip rule show
0: from all lookup local
1000: from 10.40.12.20 lookup 100
32766: from all lookup main
32767: from all lookup default
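Meaning: Traffic sourced from 10.40.12.20 consults routing table 100 first. If table 100 holds a wrong route (or an explicit unreachable/blackhole route), that traffic follows it despite a perfect main table. If table 100 simply has no matching route, the lookup falls through to the next rule.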
Decision: Inspect table 100. If it’s wrong, fix the rule or populate routes correctly.
Task 10: Inspect the alternate routing table (if rules exist)
cr0x@server:~$ ip route show table 100
default via 10.40.12.254 dev ens192
Meaning: Table 100 sends default traffic to 10.40.12.254. If your real gateway is 10.40.12.1, you’ve found a misroute. If .254 doesn’t exist, ARP will fail and you’ll see unreachable behavior.
Decision: Fix the policy routing rule or the table’s default route. Then re-test ip route get and ip neigh.
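Rule-plus-table interplay is easier to reason about with a model. The Python simulation below is simplified: it assumes IPv4, only `from <prefix|all> lookup <table>` rules, and routes reduced to a prefix plus a description. Real rules also match on fwmark, iif/oif, and more. What it captures is this: a table with no covering route passes the lookup to the next rule, while any covering route, even a wrong one, ends the search.

```python
import ipaddress


def best_match(routes, dest):
    """Longest-prefix match inside one table; None if no route covers dest."""
    addr = ipaddress.ip_address(dest)
    best = None
    for prefix, action in routes:
        net = ipaddress.ip_network("0.0.0.0/0" if prefix == "default" else prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, action)
    return None if best is None else best[1]


def policy_lookup(rules, tables, src, dest):
    """Walk (priority, from_selector, table) rules in priority order.

    Returns (table, action) from the first table with a covering route.
    A table with no covering route falls through to the next rule.
    """
    for _prio, from_sel, table in sorted(rules):
        if from_sel != "all" and ipaddress.ip_address(src) not in ipaddress.ip_network(from_sel):
            continue
        action = best_match(tables.get(table, []), dest)
        if action is not None:
            return table, action
    return None, "no route (lookup fell through every rule)"


if __name__ == "__main__":
    # Illustrative data mirroring Tasks 9 and 10; 10.40.99.7 is a made-up off-subnet target.
    rules = [(0, "all", "local"), (1000, "10.40.12.20", "100"),
             (32766, "all", "main"), (32767, "all", "default")]
    tables = {
        "main": [("default", "via 10.40.12.1"), ("10.40.12.0/24", "on-link")],
        "100": [("default", "via 10.40.12.254")],
    }
    print(policy_lookup(rules, tables, "10.40.12.20", "10.40.99.7"))  # table 100 wins
    print(policy_lookup(rules, tables, "10.40.12.21", "10.40.99.7"))  # falls to main
```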
Task 11: Check reverse path filtering (rp_filter) for asymmetric routes
cr0x@server:~$ sysctl net.ipv4.conf.all.rp_filter net.ipv4.conf.ens192.rp_filter
net.ipv4.conf.all.rp_filter = 1
net.ipv4.conf.ens192.rp_filter = 1
Meaning: rp_filter=1 is “strict.” In multi-homed environments or asymmetric routing, strict RPF can drop replies and produce confusing unreachables/timeouts.
Decision: If you have asymmetric routing by design, consider rp_filter=2 (loose) or an architecture change. Don’t disable globally unless you enjoy threat models you can’t explain.
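Two details trip people up here. First, per the kernel’s ip-sysctl documentation, the value applied to an interface is the higher of conf.all.rp_filter and conf.<iface>.rp_filter. Lowering only the interface setting does nothing while all=1. Second, strict and loose differ only in which interface the route back to the source may use. The following is a simplified Python model (IPv4, one next hop per route); the function names are hypothetical.

```python
import ipaddress


def effective_rp_filter(all_value: int, iface_value: int) -> int:
    """The kernel applies the higher of conf.all.rp_filter and conf.<iface>.rp_filter."""
    return max(all_value, iface_value)


def rpf_accepts(mode: int, in_iface: str, src: str, routes) -> bool:
    """Simplified reverse-path check for a packet from `src` arriving on `in_iface`.

    routes: list of (prefix, egress_iface). mode 0 = off, 1 = strict, 2 = loose.
    """
    if mode == 0:
        return True
    addr = ipaddress.ip_address(src)
    covering = []
    for prefix, dev in routes:
        net = ipaddress.ip_network("0.0.0.0/0" if prefix == "default" else prefix)
        if addr in net:
            covering.append((net, dev))
    if not covering:
        return False  # no route back to the source at all: both modes drop
    if mode == 2:
        return True  # loose: any route back is good enough
    best_dev = max(covering, key=lambda item: item[0].prefixlen)[1]
    return best_dev == in_iface  # strict: best route back must use the arrival interface
```

For example, with a default route on ens192 and 10.50.0.0/16 on ens224, a reply from 10.50.1.9 arriving on ens192 fails strict mode but passes loose mode. That is the asymmetric-routing trap in one line.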
Task 12: Check local firewall (nftables/ufw) for rejects
cr0x@server:~$ sudo nft list ruleset | sed -n '1,120p'
table inet filter {
chain input {
type filter hook input priority filter; policy accept;
}
chain forward {
type filter hook forward priority filter; policy accept;
}
chain output {
type filter hook output priority filter; policy accept;
}
}
Meaning: This host isn’t blocking anything with nftables. If you see reject with icmpx type host-unreachable, that can explain immediate “No route to host.”
Decision: If output chain rejects traffic to the destination or subnet, fix rules. If ruleset is empty/accept, look upstream.
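On a busy host the ruleset is longer than a screen. This quick text scan, sketched in Python, lists every rule that contains a reject verdict along with its table and chain. It assumes the indented `nft list ruleset` layout shown above. For anything beyond triage, `nft -j list ruleset` gives structured JSON instead. `find_rejects` is a hypothetical name.

```python
def find_rejects(ruleset: str) -> list:
    """Return (table, chain, rule) for every rule in `nft list ruleset` text that rejects."""
    hits, table, chain = [], None, None
    for raw in ruleset.splitlines():
        line = raw.strip()
        if line.startswith("table "):
            table = line.removesuffix("{").strip()
        elif line.startswith("chain "):
            chain = line.split()[1]
        elif "reject" in line.split():
            # Match the verdict as a whole token, not as a substring of a comment.
            hits.append((table, chain, line))
    return hits
```

An empty result on a ruleset like the one above means there is no local reject to explain an instant “No route to host.”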
Task 13: Verify UFW status (because someone always enabled it “temporarily”)
cr0x@server:~$ sudo ufw status verbose
Status: inactive
Meaning: UFW isn’t participating. If active, check for outbound denies or route blocks (less common, but real).
Decision: If UFW is active and suspicious, either adjust policy or temporarily allow for diagnosis—document the change and revert.
Task 14: Check for a stale/wrong ARP entry (rare, but quick)
cr0x@server:~$ ip neigh show 10.40.12.1
10.40.12.1 lladdr 00:11:22:33:44:55 STALE
Meaning: You have a MAC for the gateway. If it’s wrong (gateway failover, VRRP move), you can briefly blackhole traffic.
Decision: Flush the entry and re-learn. If it immediately relearns the same MAC and still fails, the issue is beyond your host.
cr0x@server:~$ sudo ip neigh flush 10.40.12.1 dev ens192
10.40.12.1 dev ens192 flushed
Task 15: Validate netplan rendered config (Ubuntu 24.04 reality)
cr0x@server:~$ sudo netplan get
network:
version: 2
ethernets:
ens192:
dhcp4: true
Meaning: Netplan is using DHCP here. If you expect static routes/gateway, you either aren’t applying the right config or DHCP is overriding your intent.
Decision: Confirm the netplan YAML in /etc/netplan/, then netplan try (carefully) or coordinate a maintenance window for disruptive changes.
Task 16: Check systemd-networkd or NetworkManager depending on renderer
cr0x@server:~$ networkctl status ens192
● 2: ens192
Link File: /usr/lib/systemd/network/99-default.link
Network File: /run/systemd/network/10-netplan-ens192.network
State: routable (configured)
Online state: online
Address: 10.40.12.20
fe80::250:56ff:feaa:bbcc
Gateway: 10.40.12.1
Meaning: The renderer believes you’re routable and has a gateway. That doesn’t prove ARP works, but it does prove config is present.
Decision: If networkctl shows no gateway, fix configuration. If it shows a gateway but ARP fails, investigate upstream L2/ACL.
Task 17: Trace the path and see where “unreachable” appears
cr0x@server:~$ tracepath -n 10.40.12.50
1?: [LOCALHOST] pmtu 1500
1: 10.40.12.1 0.314ms
2: no reply
3: no reply
Too many hops: pmtu 1500
Meaning: You can reach hop 1 (gateway). If your earlier ARP was failing, this would not happen. So now you’re past L2.
Decision: If hop 1 is reachable but destination isn’t, investigate routing/ACL beyond the first hop: firewall rules, inter-VLAN ACLs, security groups, or missing return routes.
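To record the point where the path goes dark, this sketch returns the last hop that answered in `tracepath -n` output. It assumes the line layout shown above: a hop number, an optional `?`, a colon, then an address or `no reply`. `last_answering_hop` is a hypothetical name.

```python
import re

# " 1:  10.40.12.1   0.314ms", " 2:  no reply", " 1?: [LOCALHOST] ..."
HOP = re.compile(r"^\s*(\d+)\??:\s+(\S+)")


def last_answering_hop(tracepath_output: str):
    """Return (hop_number, address) of the last hop that answered, or None."""
    last = None
    for line in tracepath_output.splitlines():
        m = HOP.match(line)
        if not m:
            continue
        hop, addr = int(m.group(1)), m.group(2)
        if addr in ("no", "[LOCALHOST]"):  # "no reply" lines and the local pseudo-hop
            continue
        last = (hop, addr)
    return last
```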
Task 18: Capture ICMP unreachable to prove an ACL reject
cr0x@server:~$ sudo tcpdump -ni ens192 icmp or icmp6
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on ens192, link-type EN10MB (Ethernet), snapshot length 262144 bytes
12:55:22.443322 IP 10.40.12.1 > 10.40.12.20: ICMP host 10.40.12.50 unreachable, length 68
Meaning: The gateway is explicitly telling you the host is unreachable. That’s not your Ubuntu box being confused. That’s a routing/ACL decision upstream.
Decision: Escalate with evidence: include timestamp, ICMP type/code, and the router IP that sent it. Ask: is there a route to 10.40.12.50? Is there an ACL blocking?
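For the escalation itself, this sketch pulls the timestamp, sender, recipient, and ICMP detail out of non-verbose `tcpdump -n` lines. It covers IPv4 only and assumes the single-line format shown above; ICMPv6 and `-v` output use different layouts. `unreachable_evidence` is a hypothetical name.

```python
import re

ICMP_UNREACH = re.compile(
    r"^(?P<ts>\S+) IP (?P<sender>\S+) > (?P<to>\S+): "
    r"ICMP (?P<detail>.*unreachable.*?)(?:, length \d+)?$"
)


def unreachable_evidence(lines) -> list:
    """One dict (ts, sender, to, detail) per ICMP unreachable line in the capture."""
    found = []
    for line in lines:
        m = ICMP_UNREACH.match(line.strip())
        if m:
            found.append(m.groupdict())
    return found
```

The `sender` field is the answer to “who is rejecting?” and is the first thing the network team will ask for.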
Joke #2: ARP is like office gossip—if nobody answers, it doesn’t mean you’re wrong; it means you’re talking to the wrong room.
Three corporate mini-stories from the trenches
Mini-story 1: The incident caused by a wrong assumption
A mid-sized company had a clean separation between “apps” and “databases” networks. A new Ubuntu 24.04 VM was deployed in the apps subnet.
The engineer assumed “default gateway can route to everything internal,” because that’s how the previous environment worked.
The new VM threw “No route to host” when connecting to a database. The routing table looked fine. The default route was present. The team spent an hour debating netplan syntax and whether systemd-networkd had a bug.
Meanwhile, the database team insisted nothing had changed.
The breakthrough came from tcpdump: an ICMP “host unreachable” returned instantly from the gateway. That meant this wasn’t a silent drop, and it wasn’t local.
The gateway was actively refusing to forward.
The wrong assumption: internal routing was universal. In reality, the network team had implemented inter-VLAN ACLs months earlier, allowing only specific app subnets to reach the database VLAN.
This VM landed in a “general apps” VLAN that wasn’t on the allow list.
Fix was boring: update the ACL to allow the correct subnet and ports, and update the provisioning workflow so database-bound VMs are placed into the right segment.
The key lesson: “default route exists” doesn’t mean “policy allows.”
Mini-story 2: The optimization that backfired
Another org wanted faster failover for a pair of routers doing first-hop redundancy.
They tuned ARP and neighbor timers on servers “to speed convergence,” and also reduced certain retry/backoff behaviors.
It worked in a lab. In production, it created an intermittent storm: during a gateway failover, some Linux hosts would mark the neighbor entry as FAILED quickly,
then refuse to talk to the gateway until the next resolution window. Clients saw bursts of “No route to host,” which is a great way to start an argument between SRE and Network Engineering.
Packet captures showed ARP requests going out during failover, but replies arrived a beat later—late enough to miss the newly shortened patience threshold.
The “optimization” shaved milliseconds off the happy path and added seconds to the unhappy path.
The rollback was immediate. Then they did the correct optimization: improve gateway failover signaling (gratuitous ARP/ND announcements), validate switch behavior, and test on realistic latency/jitter.
Leave host neighbor timers close to defaults unless you can model the consequences and you’re prepared to own them.
Mini-story 3: The boring but correct practice that saved the day
A financial services shop had an unglamorous rule: every incident report must include ip route get, ip neigh, and a 30-second tcpdump snippet.
The team grumbled. It felt like paperwork. Until it didn’t.
One afternoon, multiple Ubuntu 24.04 nodes started failing to reach an internal API with “No route to host.” The on-call pulled the standard triage bundle.
Routes were correct. Gateways were reachable. But tcpdump showed ICMP host-unreachable messages coming from a firewall VIP.
That immediately narrowed it: upstream device was rejecting, not dropping. The firewall team could search logs by the VIP and timestamp.
It turned out a newly deployed ACL template had a “deny any any” placed above a specific allow, but only for one zone pair.
The fix took minutes once the right evidence was presented. The boring practice—consistent diagnostic artifacts—prevented a multi-hour blame spiral.
Nobody had to “try a reboot.” Nobody had to touch netplan. The incident stayed one incident.
Common mistakes: symptom → root cause → fix
This section is blunt on purpose. These are the traps that keep “No route to host” alive long after it should be dead.
1) Symptom: “No route to host” to anything off-subnet
Root cause: Missing default route or wrong gateway.
Fix: Verify ip route. Fix netplan or DHCP. Confirm with ip route get 8.8.8.8 (or an internal off-subnet IP).
2) Symptom: Route exists, but gateway is INCOMPLETE in ip neigh
Root cause: ARP replies not reaching host: wrong VLAN, switch port mis-tagging, port security, gateway down, or L2 isolation.
Fix: tcpdump -ni <if> arp. If requests leave but no replies, escalate to L2/network team with evidence. Validate VM portgroup/VLAN tag.
3) Symptom: Immediate “No route to host” but ARP is fine and gateway pings
Root cause: Upstream device is sending ICMP unreachable (ACL reject, missing route to destination, or policy routing on gateway).
Fix: Capture ICMP with tcpdump. Identify the sender IP (often the gateway/firewall). Fix ACL/route upstream.
4) Symptom: Only one source IP fails on a multi-IP host
Root cause: A policy routing rule sends that source into a routing table with a wrong or unreachable route.
Fix: ip rule show and ip route show table X. Align rules with intended egress/gateway.
5) Symptom: Works one direction, fails the other; errors vary between timeout and unreachable
Root cause: Asymmetric routing combined with strict RPF on one side.
Fix: Check rp_filter and upstream anti-spoof controls. Use loose mode where justified, or fix the routing symmetry.
6) Symptom: After a gateway failover, some hosts get “No route to host” for minutes
Root cause: Stale neighbor cache + delayed gratuitous ARP/ND, or aggressive timer tuning.
Fix: Ensure gateway sends gratuitous ARP/ND on failover. Consider flushing neighbor entries on impacted hosts only as a last resort.
7) Symptom: Only specific ports show “No route to host” (not “connection refused”)
Root cause: A firewall is rejecting with ICMP unreachable for certain ports/flows, or a middlebox policy is classifying traffic.
Fix: Capture ICMP unreachables with tcpdump during the failed connection attempt. Confirm with the network/security team; adjust ACL policy.
8) Symptom: IPv6 destination fails with “No route to host,” IPv4 works
Root cause: Missing IPv6 default route or NDP failure; RA disabled; firewall blocks ICMPv6 (which breaks IPv6 in non-obvious ways).
Fix: Use ip -6 route and ip -6 neigh. Ensure ICMPv6 is allowed appropriately; confirm router advertisements if used.
Checklists / step-by-step plan
This is the “do it every time” section. Print it. Put it in your incident runbook. Make it somebody’s job to stop the room from freestyle debugging.
Checklist A: Local host sanity (Ubuntu 24.04)
- Confirm the interface is up and has carrier. Use ip link show. If there is no carrier, fix virtual/physical connectivity before anything else.
- Confirm addressing and prefix. Use ip -br addr. If the prefix is wrong, routes and ARP behavior will lie to you.
- Confirm the default route and the specific route. Use ip route show and ip route get <dest>.
- Check policy routing. Use ip rule show. If non-default rules exist, inspect those tables.
- Check the neighbor entry for the next hop (gateway). Use ip neigh show. INCOMPLETE is your big red arrow.
- Check the local firewall for rejects. Use nft list ruleset and ufw status.
- Capture traffic for 30 seconds. Use tcpdump for ARP and ICMP. This is your evidence and your compass.
Checklist B: “Is it L2, L3, or policy?” decision tree
- If ip route get fails (no route): fix routing/default gateway/policy routing.
- If a route exists but gateway ARP is INCOMPLETE: it’s adjacency (VLAN/tagging/switch/gateway down).
- If gateway ARP resolves and the gateway responds but the destination fails: check upstream routing/ACLs. Capture ICMP unreachables.
- If nothing replies and nothing rejects (timeouts): look for drops (ACL drop, security group drop, MTU blackhole, upstream firewall silent drop).
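Here is the same tree as a function, mainly useful for writing it into a runbook or a chat bot. It is a sketch: the inputs are things you read off the earlier tasks, and the returned labels are this guide’s categories, not kernel output. `classify` is a hypothetical name.

```python
def classify(route_found: bool, gw_neigh_state, dest_outcome: str) -> str:
    """Map what the earlier tasks showed onto this guide's L2 / L3 / policy buckets.

    gw_neigh_state: NUD state of the gateway from `ip neigh` (None if no entry).
    dest_outcome: how the connection to the destination failed, e.g.
    "EHOSTUNREACH" or "TIMEOUT".
    """
    if not route_found:
        return "L3: fix routing, the default gateway, or policy routing"
    if gw_neigh_state is None:
        return "L2: no neighbor entry yet; ping the gateway and re-check ip neigh"
    if gw_neigh_state in ("INCOMPLETE", "FAILED"):
        return "L2: adjacency problem (VLAN/tagging/switch/gateway down)"
    if dest_outcome == "TIMEOUT":
        return "policy (silent): look for drops (ACL, security group, MTU blackhole, firewall)"
    return "policy/upstream: check routing and ACLs past the gateway; capture ICMP unreachables"
```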
Checklist C: Escalation packet (what to send to the network/security team)
If you need another team, don’t send “it doesn’t work.” Send a tight bundle:
- ip route get <dest> output (shows next hop, interface, source IP)
- ip route show and ip rule show (proves local intent)
- ip neigh show for the gateway and destination (shows ARP state)
- 30 seconds of tcpdump showing ARP requests/replies and/or ICMP unreachables (shows who is rejecting)
- Exact timestamp and destination/port (so they can correlate logs)
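Collecting the bundle by hand under pressure is where items go missing. Below is a minimal Python sketch that runs the commands and stitches their output into one timestamped report. It assumes iproute2, coreutils `timeout`, and tcpdump are installed, and that you run it with root privileges for the capture. `collect_bundle` and its pluggable `run` callable are hypothetical; the callable lets you dry-run the report without touching the network.

```python
import datetime
import shlex
import subprocess


def _run(cmd: str) -> str:
    """Run one command without a shell and return stdout plus stderr."""
    try:
        proc = subprocess.run(shlex.split(cmd), capture_output=True, text=True, timeout=60)
        return proc.stdout + proc.stderr
    except (OSError, subprocess.TimeoutExpired) as exc:
        return f"<failed: {exc}>"


def collect_bundle(dest: str, iface: str, port=None, run=_run) -> str:
    """Run the escalation-bundle commands and return one timestamped report."""
    commands = [
        f"ip route get {dest}",
        "ip route show",
        "ip rule show",
        f"ip neigh show dev {iface}",
        # coreutils `timeout` ends the capture after 30 s; -l line-buffers tcpdump output.
        f"timeout 30 tcpdump -l -ni {iface} arp or icmp or icmp6",
    ]
    target = dest if port is None else f"{dest} port {port}"
    stamp = datetime.datetime.now(datetime.timezone.utc).isoformat(timespec="seconds")
    sections = [f"# triage bundle for {target} via {iface} at {stamp}"]
    for cmd in commands:
        sections.append(f"$ {cmd}\n{run(cmd).rstrip()}")
    return "\n\n".join(sections)


if __name__ == "__main__":
    print(collect_bundle("10.40.12.50", "ens192", port=22))
```

Start the script, then reproduce the failing connection from another terminal while the 30-second capture runs, so the ARP or ICMP evidence lands in the same report.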
FAQ
1) Why do I get “No route to host” when the route table looks correct?
Because routing is only step one. Linux can know the next hop but still fail ARP/NDP for that next hop.
Check ip neigh for the gateway; INCOMPLETE or FAILED means you don’t have Layer 2 reachability to the next hop.
2) What’s the fastest single command to start with?
ip route get <dest>. It tells you interface, gateway, and chosen source IP. If that output surprises you, you’ve found the problem category.
3) How can an ACL cause “No route to host” instead of a timeout?
Some ACLs/firewalls are configured to reject rather than drop. Reject can generate ICMP “host unreachable” (or similar),
which your kernel surfaces as “No route to host” immediately. Capture ICMP with tcpdump to prove it.
4) Is “No route to host” the same as “Destination Host Unreachable” from ping?
They’re related but not identical. “Destination Host Unreachable” is ping’s interpretation of ICMP unreachable or local routing failure.
“No route to host” is a socket error your applications see. Both often point to unreachable at L2/L3, but you still need to locate where the unreachable is generated.
5) Why does it only fail for one destination IP, not the whole subnet?
Common causes: a missing specific route upstream, an ACL targeting that host, the destination host being down, or ARP conflict/duplication for that IP.
Use tracepath and tcpdump for ICMP unreachables to identify the rejecting hop.
6) How does Ubuntu 24.04 netplan factor into this?
Netplan is the configuration layer; the kernel makes forwarding decisions. Netplan mistakes show up as wrong addresses, wrong routes, wrong gateways, wrong renderer.
Verify with netplan get and networkctl status (or NetworkManager tools if that’s your renderer).
7) Can DNS cause “No route to host”?
DNS can make you connect to the wrong IP, but the error is still about reaching that IP. If you suspect DNS, resolve the name to an address,
then run ip route get for that address and proceed normally.
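Here is a small sketch of that first step. It uses Python’s `socket.getaddrinfo` to list every address a name resolves to (IPv4 and IPv6), so you can run ip route get against each one. `resolve_all` is a hypothetical helper.

```python
import socket


def resolve_all(name: str) -> list:
    """Every address `name` resolves to, in resolver order, without duplicates."""
    seen = []
    for _family, _type, _proto, _canon, sockaddr in socket.getaddrinfo(
        name, None, proto=socket.IPPROTO_TCP
    ):
        if sockaddr[0] not in seen:
            seen.append(sockaddr[0])
    return seen


if __name__ == "__main__":
    for addr in resolve_all("localhost"):
        print(f"ip route get {addr}")
```

If a name returns both address families, test both. An IPv6 answer can fail with “No route to host” while IPv4 works.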
8) What if ARP works for the gateway, but connections still say “No route to host”?
Then the unreachable is likely being generated upstream (gateway/firewall) or locally by policy routing or firewall rejects.
Capture ICMP on the host during a failed connection attempt; if you see ICMP unreachable from a router/firewall, you have your culprit.
9) How do I differentiate “dropped” vs “rejected” quickly?
A reject often fails immediately and may show ICMP unreachable in tcpdump. A drop usually times out. That’s not perfect, but it’s a strong heuristic.
Always confirm with a capture on the egress interface.
10) Should I flush ARP cache as a fix?
As a diagnostic nudge, sometimes. As a habit, no. Flushing neighbor entries can mask a systemic problem (bad failover signaling, VLAN mis-tagging, duplicate IPs).
Use it to confirm behavior, then fix the underlying cause.
Conclusion: next steps that prevent repeats
“No route to host” is not a riddle. It’s a request: show me the route decision, show me the neighbor state, show me the policy outcome.
Do those three things and the problem usually stops being mysterious within minutes.
Practical next steps:
- Standardize your triage bundle: ip route get, ip neigh, ip rule, and a short tcpdump.
- Teach the team to treat INCOMPLETE neighbor entries as “stop and check VLAN/gateway,” not “try another route.”
- Document which subnets are allowed to talk to which services. “Implicitly routable” is a myth in segmented networks.
- In change management, require proof of connectivity from at least one host in each segment after ACL/routing updates.