You can SSH into the box from the local VLAN. You can ping the gateway. The Wi‑Fi icon swears everything is fine. But the moment you try apt update, open a web page, or hit an external API: dead air.
This failure mode is the networking equivalent of being locked inside the office while your badge still opens the lobby door. The machine is “connected”… to somewhere. Just not to the internet. The culprit is usually routing: the default gateway is wrong, missing, or being outvoted by policy routing you forgot existed.
The mental model: packets don’t believe your UI
“Connected” typically means: the link is up, you got an IP address, and something responded to basic local traffic. It does not mean your outbound packets have a working default route, that replies will return on the same path, or that DNS resolves the way your applications need.
On Debian/Ubuntu, “no internet” usually reduces to one of these:
- No usable default route (missing, points to the wrong gateway, or has the wrong metric).
- Asymmetric routing (packets leave via one interface, replies come back via another and get dropped by rp_filter, firewall, or state tracking).
- Policy routing (ip rules and tables choose a different route than you think).
- DNS lies (resolving to unroutable addresses, captive portals, stale split-horizon zones).
- MTU/PMTUD problems (TLS handshakes hang, pings work, TCP stalls).
- VPN/tunnel side effects (split tunnel done wrong is just full outage with extra steps).
Be ruthless about separating layers: link, IP, routing, DNS, and application. Most time is wasted when people treat these as one vibe.
One quote to keep you honest: “Hope is not a strategy.” — Gen. Gordon R. Sullivan
Joke #1: If your default route is missing, your packets are basically writing “Return to sender” on themselves and walking into the void.
Fast diagnosis playbook (first/second/third)
First: confirm what “no internet” means in packets
- Can you reach the local gateway? If not, stop. You’re not debugging “internet.” You’re debugging L2/L3 adjacency.
- Can you reach a public IP by number? If yes but DNS fails, it’s DNS. If no, it’s routing/firewall/MTU.
- What route does the kernel actually pick? Don’t guess. Ask with ip route get.
Second: isolate routing vs policy routing vs NAT
- Check the default route and metrics (ip route).
- Check policy rules (ip rule) and the referenced tables (ip route show table X).
- Check reverse path filtering if you have multiple interfaces (sysctl net.ipv4.conf.*.rp_filter).
Third: test the “hard cases” that fake you out
- MTU / PMTUD: if ICMP works but HTTPS stalls, test with “do not fragment” pings and smaller payloads.
- Per-app proxy settings: curl works, browser doesn’t—or the reverse.
- VPN/WireGuard routes: AllowedIPs or pushed routes can replace your default route (a config sketch follows below).
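For orientation, here is roughly what that difference looks like in a WireGuard peer definition; wg-quick installs routes for whatever AllowedIPs lists, and the endpoint and prefixes below are purely illustrative:

# /etc/wireguard/wg0.conf (peer section) -- illustrative values
[Peer]
PublicKey = <peer-public-key>
Endpoint = vpn.example.com:51820

# Split tunnel: only these prefixes are routed into the tunnel
AllowedIPs = 192.0.2.0/24, 198.51.100.0/24

# Full tunnel: this single line takes over your default route
# AllowedIPs = 0.0.0.0/0, ::/0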
Interesting facts and historical context (routing edition)
- Policy routing in Linux isn’t new. The multiple routing table design (iproute2) has been around since the late 1990s, and it’s still misunderstood daily.
- The “default route” is just a route. In the kernel it’s typically 0.0.0.0/0 or default—no magic, just the least specific match.
- Linux chooses routes by longest prefix, then metric. That’s why a more specific route can override your default, even if you never meant it to.
- Reverse path filtering (rp_filter) was designed for anti-spoofing. It breaks multi-homed hosts if you don’t tune it. Security features often do this: useful until you scale reality.
- systemd-resolved popularized stub resolvers on 127.0.0.53. When DNS fails, people blame “127.0.0.53” like it personally insulted them. It’s usually fine; your upstream isn’t.
- Netplan is a generator, not a daemon. It writes configuration for NetworkManager or systemd-networkd. Debug the renderer, not the YAML aesthetics.
- IPv6 can “win” unexpectedly. Many clients prefer IPv6; if you have broken IPv6 routing, you get slow failures that feel like packet loss.
- Default route wars happen on laptops. Plug in Ethernet, keep Wi‑Fi, add a VPN—now you have three contenders for “default.” The kernel will pick one. Your brain will pick the wrong one.
Practical tasks: commands, outputs, and decisions (12+)
These are the tasks I actually run when a box claims it’s connected but behaves like it’s on a desert island. Each task includes: a command, what the output means, and the decision you make from it.
Task 1: Verify link state and interface names
cr0x@server:~$ ip -br link
lo UNKNOWN 00:00:00:00:00:00 <LOOPBACK,UP,LOWER_UP>
ens18 UP 52:54:00:12:34:56 <BROADCAST,MULTICAST,UP,LOWER_UP>
wlp2s0 DOWN 10:fe:ed:aa:bb:cc <NO-CARRIER,BROADCAST,MULTICAST,UP>
Meaning: ens18 is up and has carrier. wlp2s0 is administratively up but has no carrier (not associated).
Decision: Focus on ens18. Don’t troubleshoot Wi‑Fi DNS when the Wi‑Fi isn’t even connected.
Task 2: Confirm IP addressing and scope
cr0x@server:~$ ip -br addr show
lo UNKNOWN 127.0.0.1/8 ::1/128
ens18 UP 10.20.30.41/24 fe80::5054:ff:fe12:3456/64
Meaning: You have IPv4 10.20.30.41/24. No global IPv6, only link-local.
Decision: For internet tests, use IPv4 first to keep variables low. If IPv6 is expected, you have a separate problem (RA/DHCPv6).
Task 3: Check the default route (the usual villain)
cr0x@server:~$ ip route show
10.20.30.0/24 dev ens18 proto kernel scope link src 10.20.30.41
Meaning: No default via .... You can talk to the local subnet and that’s it.
Decision: Fix gateway provisioning (DHCP, static config, netplan, NM). Don’t touch DNS yet.
Task 4: Ask DHCP what it thinks it gave you
cr0x@server:~$ sudo journalctl -u systemd-networkd --since "10 min ago" | tail -n 20
...ens18: DHCPv4 address 10.20.30.41/24 via 10.20.30.1
...ens18: Configured gateway: 10.20.30.1
...ens18: Setting route: dst: 0.0.0.0/0, gw: 10.20.30.1
Meaning: networkd claims it set a default route via 10.20.30.1, but ip route didn’t show it.
Decision: Suspect another manager overwrote it, or policy routing moved it to a non-main table.
Task 5: Show routes in all tables (not just main)
cr0x@server:~$ ip route show table all | sed -n '1,120p'
default via 10.20.30.1 dev ens18 table 100
10.20.30.0/24 dev ens18 proto kernel scope link src 10.20.30.41
local 10.20.30.41 dev ens18 table local proto kernel scope host src 10.20.30.41
broadcast 10.20.30.255 dev ens18 table local proto kernel scope link src 10.20.30.41
Meaning: The default route exists, but it’s in table 100, not the main table.
Decision: Check ip rule to see when table 100 is used. If no rule matches your traffic, you have “connected but no internet.”
Task 6: Inspect policy rules and priorities
cr0x@server:~$ ip rule show
0: from all lookup local
1000: from 10.20.30.0/24 lookup 100
32766: from all lookup main
32767: from all lookup default
Meaning: Only traffic with source in 10.20.30.0/24 should use table 100. That seems fine—unless your source address is different (containers, policy marks, secondary IPs).
Decision: Test real route selection with ip route get for both an external IP and with explicit from.
Task 7: Ask the kernel which route it will take
cr0x@server:~$ ip route get 1.1.1.1
1.1.1.1 dev ens18 src 10.20.30.41 uid 1000
cache
Meaning: The kernel chose ens18 and source 10.20.30.41. If you saw “unreachable” here, you’re done: routing is broken.
Decision: If ip route get is sane but ping fails, look at firewall/NAT upstream. If it’s insane, fix routing locally.
Task 8: Test gateway reachability (L3 adjacency)
cr0x@server:~$ ping -c 2 10.20.30.1
PING 10.20.30.1 (10.20.30.1) 56(84) bytes of data.
64 bytes from 10.20.30.1: icmp_seq=1 ttl=64 time=0.341 ms
64 bytes from 10.20.30.1: icmp_seq=2 ttl=64 time=0.305 ms
--- 10.20.30.1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1013ms
rtt min/avg/max/mdev = 0.305/0.323/0.341/0.018 ms
Meaning: You can reach the gateway. L2/L3 to the first hop is fine.
Decision: If internet is still dead, your default route, NAT, or upstream routing is the issue—not cabling.
Task 9: Test internet by IP (remove DNS from the story)
cr0x@server:~$ ping -c 2 1.1.1.1
PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
64 bytes from 1.1.1.1: icmp_seq=1 ttl=56 time=12.4 ms
64 bytes from 1.1.1.1: icmp_seq=2 ttl=56 time=12.2 ms
--- 1.1.1.1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1002ms
rtt min/avg/max/mdev = 12.2/12.3/12.4/0.1 ms
Meaning: Raw connectivity to the internet works.
Decision: If browsers/apt still fail, it’s DNS, proxying, MTU, or application-level restrictions.
Task 10: Check DNS resolution path (systemd-resolved aware)
cr0x@server:~$ resolvectl status
Global
Protocols: -LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
resolv.conf mode: stub
Current DNS Server: 10.20.30.53
DNS Servers: 10.20.30.53 10.20.30.54
Meaning: You’re using systemd-resolved stub mode, with upstream DNS servers in your LAN.
Decision: If resolution fails, test those DNS servers directly and confirm routing to them exists. Also check split DNS domains if a VPN is active.
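A quick way to test those upstreams directly, bypassing the 127.0.0.53 stub, assuming dig is installed (bind9-dnsutils on current Debian/Ubuntu); the server IPs are the ones from the output above:

# Ask each upstream directly, with a short timeout so failures are obvious
dig +time=2 +tries=1 @10.20.30.53 deb.debian.org A
dig +time=2 +tries=1 @10.20.30.54 deb.debian.org A

# And confirm the kernel actually has a route to them
ip route get 10.20.30.53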
Task 11: Verify name resolution and distinguish SERVFAIL vs timeout
cr0x@server:~$ resolvectl query deb.debian.org
deb.debian.org: 2a04:4e42:83::644
2a04:4e42:400::644
151.101.130.132
151.101.194.132
151.101.66.132
151.101.2.132
-- Information acquired via protocol DNS in 27.1ms.
-- Data is authenticated: no
Meaning: DNS works and returns both AAAA and A records. If this hung, you’d suspect DNS reachability. If it returned wrong IPs, you’d suspect split-horizon or captive interception.
Decision: If DNS works but HTTP fails, move to MTU, proxy, or firewall checks.
Task 12: Trace the actual path (quick and dirty)
cr0x@server:~$ traceroute -n 1.1.1.1 | sed -n '1,8p'
traceroute to 1.1.1.1 (1.1.1.1), 30 hops max, 60 byte packets
1 10.20.30.1 0.410 ms 0.361 ms 0.342 ms
2 198.51.100.1 2.103 ms 2.034 ms 2.012 ms
3 203.0.113.9 6.821 ms 6.774 ms 6.736 ms
4 1.1.1.1 12.504 ms 12.412 ms 12.380 ms
Meaning: You have a path beyond the gateway. If it died at hop 1, your gateway lacks upstream/NAT or blocks ICMP TTL exceeded. If it dies later, it’s upstream routing/firewall.
Decision: If hop 1 is fine but hop 2+ is dead, escalate to network team or fix your upstream NAT/router.
Task 13: Detect MTU blackholes (the “TLS hangs” classic)
cr0x@server:~$ ping -c 2 -M do -s 1472 1.1.1.1
PING 1.1.1.1 (1.1.1.1) 1472(1500) bytes of data.
ping: local error: message too long, mtu=1400
--- 1.1.1.1 ping statistics ---
2 packets transmitted, 0 received, +2 errors, 100% packet loss, time 1007ms
Meaning: Your interface MTU is 1400 (or path forces 1400). Sending 1500-byte frames with DF fails locally.
Decision: Set MTU appropriately on the interface or tunnel. If a VPN is involved, the tunnel MTU is usually the real constraint.
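A minimal sketch of confirming the working size and applying it, assuming the constraint really is 1400 as in the output above (the ip link change is not persistent; put the final value in netplan/NM/networkd):

# 1372 bytes of payload + 28 bytes of IP/ICMP headers = 1400-byte packets
ping -c 2 -M do -s 1372 1.1.1.1

# Apply the MTU to the interface for now; persist it in the renderer afterwards
sudo ip link set dev ens18 mtu 1400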
Task 14: Check reverse path filter when multi-homed
cr0x@server:~$ sysctl net.ipv4.conf.all.rp_filter net.ipv4.conf.ens18.rp_filter
net.ipv4.conf.all.rp_filter = 1
net.ipv4.conf.ens18.rp_filter = 1
Meaning: Strict rp_filter is enabled. On multi-homed hosts with asymmetric routing, this can drop valid replies.
Decision: If you truly need multi-homing/policy routing, set rp_filter to 2 (loose) on relevant interfaces, or design routes to avoid asymmetry.
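A minimal sketch of loosening it per interface; the kernel uses the maximum of the “all” and per-interface values, so setting the interface to 2 is enough here, and the drop-in filename is arbitrary:

# Loose mode on the multi-homed interface, applied live
sudo sysctl -w net.ipv4.conf.ens18.rp_filter=2

# Persist across reboots
echo 'net.ipv4.conf.ens18.rp_filter = 2' | sudo tee /etc/sysctl.d/90-rp-filter.conf
sudo sysctl --system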
Task 15: Confirm what manager is controlling the interface
cr0x@server:~$ networkctl status ens18 | sed -n '1,30p'
● 2: ens18
Link File: /usr/lib/systemd/network/99-default.link
Network File: /run/systemd/network/10-netplan-ens18.network
Type: ether
State: routable (configured)
Online state: online
Address: 10.20.30.41
Gateway: 10.20.30.1
DNS: 10.20.30.53
Meaning: systemd-networkd owns this interface, and netplan generated its config into /run/systemd/network.
Decision: Don’t “fix” this using NetworkManager unless you want dueling configs. Make the fix in netplan or networkd, then apply.
Routing failures that look like “no internet”
1) Missing default gateway (most common, least dramatic)
If ip route lacks a default via, you don’t have internet access. Period. You can still reach local hosts on-link, which fuels the illusion that “networking works.” It doesn’t; it’s unfinished.
Fix paths:
- DHCP expected: ensure the DHCP client runs on the correct interface and isn’t blocked. Check logs for “no DHCPOFFERS received.”
- Static expected: configure gateway explicitly in netplan, ifupdown, or NetworkManager, but only one of them (a quick sketch follows below).
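A minimal sketch of the stopgap plus verification, using the gateway from this article’s examples; the persistent fix still belongs in whichever manager owns the interface:

# Emergency, non-persistent default route
sudo ip route add default via 10.20.30.1 dev ens18

# Prove the kernel now has a path, then make it persistent in netplan/NM/networkd
ip route get 1.1.1.1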
2) Two default gateways, one wrong metric
Laptops love this. Servers can do it too when someone adds a “temporary” interface and forgets it. If you have two defaults, Linux will choose based on metric (lower is preferred) and protocol preferences. Your traffic may exit through an interface that can’t NAT, can’t reach upstream, or is quarantined.
Metrics are not advisory. They’re the tiebreaker that decides whether your packets go out through working internet or a management VLAN with no route to the world.
3) A default route exists, but in the wrong table
This is the policy routing trap: ip route looks fine in table 100, but your traffic uses main. Or vice versa. The machine is “connected” in the sense that it has routes somewhere. But they’re not being used for the traffic you care about.
4) Asymmetric routing + rp_filter = “it pings sometimes”
Asymmetric routing is not automatically wrong. It’s wrong when your host, or something upstream, enforces symmetry. Linux with strict rp_filter=1 will drop packets if the reverse path check fails. Stateful firewalls and NAT devices upstream can also punish asymmetry.
Symptoms are fun: outbound SYNs leave, SYN-ACKs return on a different interface and disappear. Some destinations work, others don’t. People blame “the internet.” The internet is innocent.
5) MTU and PMTUD problems: the stealth outage
ICMP ping at 56 bytes works. DNS works. But HTTPS hangs. SSH sometimes connects but file transfers stall. This is often path MTU discovery failing—especially across tunnels or misconfigured jumbo frames.
Modern networks often filter ICMP too aggressively. That breaks PMTUD. Then large packets get blackholed and TCP spends its day learning helplessness.
Policy routing: the silent bouncer at the door
If you only ever had one interface and one gateway, you could ignore policy routing forever. The moment you add a VPN, a second NIC, a container bridge with special rules, or “just one more subnet,” policy routing shows up—quietly—and starts deciding which table gets consulted for which packets.
The three pieces you must keep straight
- Rules (ip rule): match traffic by source, destination, fwmark, iif, uidrange, etc., and choose a routing table.
- Tables (ip route show table X): contain routes, including defaults. The main table is just one of them.
- Route selection (ip route get): what the kernel actually does for a given packet.
What goes wrong in the real world
Rule matches the wrong source. You add a rule “from 10.0.0.0/24 lookup 100” assuming the server always uses 10.0.0.10. Then Docker or a secondary IP becomes the source for some traffic. Now half your traffic consults table main (no default), half consults table 100 (has default). It’s a partial outage dressed as randomness.
Rules get reordered. Priorities matter. A broad rule at priority 100 can shadow a specific rule at 1000. Always read priorities as “smaller number wins first.” It’s not a suggestion; it’s the control flow.
Marks without documentation. Someone used iptables -t mangle to mark packets for special routing. Then they left the company. Now your host has a second routing universe keyed by a hex number nobody remembers.
A sane approach for multi-homing
If you genuinely need two uplinks (say, a management network and a production internet uplink), do source-based routing intentionally:
- One table per uplink with a default route and on-link routes.
- One rule per source prefix selecting its table.
- Explicit route metrics in the main table to avoid accidental defaults.
- rp_filter tuned appropriately (2 is common for multi-homed, but understand the security trade-off). A command sketch follows after this list.
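A minimal sketch of that design, with made-up interface names, addresses, and table numbers; these commands are not persistent, so encode the same intent in your renderer once it works:

# Table 101 = production uplink (ens19), table 102 = management uplink (ens18)
sudo ip route add 198.51.100.0/24 dev ens19 src 198.51.100.10 table 101
sudo ip route add default via 198.51.100.1 dev ens19 table 101
sudo ip route add 10.20.30.0/24 dev ens18 src 10.20.30.41 table 102
sudo ip route add default via 10.20.30.1 dev ens18 table 102

# One rule per source prefix, explicit priorities
sudo ip rule add from 198.51.100.10/32 lookup 101 priority 1001
sudo ip rule add from 10.20.30.41/32 lookup 102 priority 1002

# Verify what the kernel will do for each source address
ip route get 1.1.1.1 from 198.51.100.10
ip route get 1.1.1.1 from 10.20.30.41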
Joke #2: Policy routing is like office politics—if you don’t know it exists, it’s already deciding your future.
Netplan, systemd-networkd, NetworkManager: who owns the truth?
Ubuntu (and sometimes Debian derivatives) can involve three layers of “network configuration” that people confuse:
- Netplan: YAML that generates config for a renderer. It doesn’t manage interfaces at runtime.
- systemd-networkd: a network manager for servers; configured via .network files and often fed by netplan.
- NetworkManager: common on desktops, laptops, and some servers; can manage Wi‑Fi, VPN, and roaming.
Pick one renderer per interface. Don’t split-brain it. If NetworkManager manages ens18 while systemd-networkd also thinks it manages ens18, you’ll get route flapping that looks like “intermittent internet.” It’s not intermittent; it’s being rewritten.
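If both managers are installed and networkd/netplan should own a server NIC, one common way to keep NetworkManager’s hands off it is a keyfile drop-in; the path and interface name here are illustrative:

# /etc/NetworkManager/conf.d/99-unmanaged-ens18.conf
[keyfile]
unmanaged-devices=interface-name:ens18

# Reload and confirm the device now shows as "unmanaged"
sudo systemctl reload NetworkManager
nmcli dev status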
How to tell which one is active
- systemd-networkd: networkctl status shows “configured” and points to a Network File.
- NetworkManager: nmcli dev status shows state; routes often appear with proto dhcp but that’s not exclusive.
Netplan: the two lines people forget
For static IPv4, you need an address and a route. Names vary by version; the most portable modern way is to define default via routes explicitly. Example patterns are fine, but production needs you to verify what netplan generated in /run/systemd/network and what the kernel installed in ip route.
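A minimal static IPv4 example for recent netplan versions, using the addresses and DNS servers from this article’s examples (older netplan releases used the now-deprecated gateway4 key instead of a routes entry):

# /etc/netplan/01-static-ens18.yaml
network:
  version: 2
  ethernets:
    ens18:
      addresses: [10.20.30.41/24]
      routes:
        - to: default
          via: 10.20.30.1
      nameservers:
        addresses: [10.20.30.53, 10.20.30.54]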
When you change netplan, apply it in a way that won’t strand you on a remote system. netplan try exists for a reason: it gives you a rollback timer if you lose connectivity.
Three corporate mini-stories (anonymized, plausible, and annoying)
Mini-story 1: The outage caused by a wrong assumption
The incident started as a routine migration: a fleet of Ubuntu VMs moved from one hypervisor cluster to another. Same VLANs, same IP ranges, same security groups. Everyone was calm. That was the first mistake.
Within minutes, application nodes began failing health checks. Local monitoring could reach them. SSH worked from the bastion in the same subnet. But the apps couldn’t reach external payment endpoints. “Connected, but no internet” at scale is a special kind of humiliation.
The team assumed the default gateway was being handed out by DHCP, because that’s how it was done in the old cluster. In the new cluster, the network team had moved that subnet to static addressing with DHCP only handing out IPs (yes, that’s a thing people do), and the gateway option was missing. VMs booted with an address but no default route.
Diagnosis was fast once someone stopped staring at dashboards and ran ip route on a broken node. No default route. Fix was equally boring: update the provisioning to set the gateway explicitly (netplan in this case) and redeploy. The real fix was cultural: stop assuming that “DHCP means you get a gateway.” It usually does—until it doesn’t.
Mini-story 2: The optimization that backfired
A performance-minded engineer decided to “clean up routing” on a multi-homed Debian host. The box had two NICs: one for management, one for production traffic, plus a VPN interface used for specific partner APIs. It had accumulated years of routing cruft.
The optimization: add strict reverse path filtering globally (net.ipv4.conf.all.rp_filter=1) and remove what looked like “duplicate routes.” The goal was to reduce spoofing risk and make routing deterministic.
It worked in the lab. Then it hit production. Suddenly, only some external calls failed, and only from some source IPs. The failures looked like random timeouts. TCP SYNs went out through production, but replies sometimes returned through the VPN because of policy routing plus specific partner routes. With strict rp_filter, those replies got dropped as “martians.” The system wasn’t under attack; it was under-specified.
The postmortem was short and unromantic: strict rp_filter is fine on single-homed systems. On multi-homed systems with asymmetric paths, it’s a foot-gun unless you design around it. They reverted to rp_filter=2 on the relevant interfaces and documented the routing policy instead of “optimizing” it into mystery.
Mini-story 3: The boring but correct practice that saved the day
A storage platform team ran a mixed environment: Ubuntu for compute nodes, Debian for some infrastructure services, all with a standardized “network sanity” script baked into their incident response.
It was dull. It printed interface status, default routes, policy rules, DNS status, and an ip route get for a known external IP. It also captured journalctl snippets for the active network renderer. Nobody bragged about it. That’s how you know it was good.
One afternoon, a subset of hosts lost internet access after a VPN client update. The UI showed “connected.” The script immediately highlighted the real change: a new ip rule with a higher priority than expected, forcing most traffic into a table with a default route via the VPN interface. The VPN wasn’t intended as a full tunnel and didn’t allow general internet egress.
Because they had the script output from before and after, the team could point to a concrete delta in routing policy rather than arguing about “DNS seems slow.” Rollback was clean. The long-term fix was to pin the VPN’s routing rules to only the partner prefixes. The boring practice saved hours because it reduced the problem to facts the kernel could not deny.
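A minimal sketch of that kind of sanity script, assuming systemd-networkd is the renderer and 1.1.1.1 is an acceptable probe target; adjust the journalctl unit on NetworkManager hosts:

#!/usr/bin/env bash
# net-sanity.sh -- capture the routing facts the kernel cannot deny
set -u
out="/var/tmp/net-sanity-$(hostname)-$(date +%Y%m%dT%H%M%S).log"
{
  echo "=== links ===";             ip -br link
  echo "=== addresses ===";         ip -br addr
  echo "=== routes (main) ===";     ip route show
  echo "=== routes (all tables) ==="; ip route show table all
  echo "=== policy rules ===";      ip rule show
  echo "=== route decision ===";    ip route get 1.1.1.1
  echo "=== DNS ===";               resolvectl status
  echo "=== renderer log tail ==="; journalctl -u systemd-networkd --since "15 min ago" --no-pager | tail -n 50
} > "$out" 2>&1
echo "wrote $out"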
Common mistakes: symptom → root cause → fix
- Symptom: Can ping gateway, can’t ping 1.1.1.1
  Root cause: Missing default route or wrong gateway IP
  Fix: Add/repair default via <gateway> in the correct manager (netplan/NM/networkd). Verify with ip route get 1.1.1.1.
- Symptom: Ping 1.1.1.1 works, but names don’t resolve
  Root cause: Broken DNS server reachability, wrong DNS server, or VPN split DNS misconfigured
  Fix: Check resolvectl status, query with resolvectl query, ensure routes exist to DNS servers, fix DHCP option 6 or netplan nameservers.
- Symptom: DNS works, ping works, HTTPS hangs or times out
  Root cause: MTU/PMTUD blackhole (often tunnel/VPN), or ICMP blocked upstream
  Fix: Identify path MTU with DF pings; set interface/tunnel MTU appropriately; avoid blocking ICMP “fragmentation needed.”
- Symptom: Works on Wi‑Fi, fails on Ethernet (or vice versa)
  Root cause: Two default routes; wrong metric; policy routing selects unintended uplink
  Fix: Remove the unintended default, adjust metrics, or add explicit policy rules. Validate with ip route and ip route get.
- Symptom: Only some destinations fail; others fine; failures “random”
  Root cause: Asymmetric routing + rp_filter, or selective upstream filtering on one egress path
  Fix: Tune rp_filter for multi-homing; ensure return path symmetry or correct policy rules; confirm with tcpdump on both interfaces.
- Symptom: Local LAN reachable; internet dead after VPN connects
  Root cause: VPN installed a default route (full tunnel) or higher-priority policy rule
  Fix: Adjust VPN split tunnel configuration so only required prefixes route via VPN; verify routing tables and ip rule priorities.
- Symptom: “It works as root but not as a service” (or vice versa)
  Root cause: Policy routing by UID, fwmarks from iptables/nft, or per-service network namespaces
  Fix: Inspect ip rule for uidrange/fwmark; check service sandboxing; confirm with ip route get ... uid or nsenter into the namespace.
- Symptom: Route exists but traffic still doesn’t leave
  Root cause: Wrong source IP chosen; missing src selection; ARP flux; multiple addresses
  Fix: Pin source in routes or rules; verify with ip route get; check ip addr for secondary IPs.
Checklists / step-by-step plan
Checklist A: Single-homed host (one NIC, one gateway)
- Confirm interface is up and has carrier: ip -br link.
- Confirm IP address and mask: ip -br addr.
- Confirm default route exists in main: ip route.
- Confirm you can reach gateway: ping -c 2 <gw>.
- Confirm you can reach a public IP: ping -c 2 1.1.1.1.
- Confirm DNS servers and resolution: resolvectl status and resolvectl query.
- If HTTPS fails, test MTU: ping -M do -s 1472 1.1.1.1 and adjust.
Checklist B: Multi-homed host (two NICs, maybe VPN)
- List all defaults and metrics: ip route show | grep -E '^default'.
- Show policy rules: ip rule. If you didn’t expect rules, congratulations: you have rules.
- Show all tables: ip route show table all. Look for default routes in non-main tables.
- Ask the kernel for the chosen path: ip route get 1.1.1.1 and ip route get 1.1.1.1 from <source-ip>.
- Check rp_filter: sysctl net.ipv4.conf.all.rp_filter. If strict and multi-homed, consider loosening per-interface.
- If VPN is involved, verify what routes it pushed/installed: ip route show before/after VPN up.
- Validate return traffic with tcpdump on both interfaces if you suspect asymmetry (a sketch follows after this list).
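A minimal sketch of that check, with the second interface name as an assumption; run the tcpdumps in separate terminals:

# Terminal 1: the interface the flow should use
sudo tcpdump -ni ens18 'tcp port 443'

# Terminal 2: the interface it should not use (second NIC, VPN, ...)
sudo tcpdump -ni ens19 'tcp port 443'

# Terminal 3: generate a known flow, then compare where SYNs leave and where SYN-ACKs return
curl -4 -sS -o /dev/null https://deb.debian.org/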
Checklist C: Make the fix without bricking remote access
- When using netplan on Ubuntu, prefer sudo netplan try first (local console recommended).
- When changing routes on a remote box, add new routes before removing old ones.
- Keep a persistent out-of-band path (console, iLO/IPMI, cloud serial console). “I’ll just SSH back in” is not a plan.
- After changes, validate with ip route, ip rule, ip route get, and a real application test (curl to an external endpoint).
FAQ
1) Why does the desktop say “Connected” when I have no internet?
Because “connected” usually means link + IP. It doesn’t validate a default route, DNS, or upstream NAT. The kernel doesn’t care about icons.
2) What’s the single fastest command to prove routing is the issue?
ip route get 1.1.1.1. If it returns “unreachable” or chooses a bizarre interface/source, you’ve found your category of problem.
3) I have a default route, but still no internet. Now what?
Check whether replies come back the same way (asymmetry) and whether policy routing forces traffic into a different table. Also test MTU if HTTPS stalls.
4) Why does /etc/resolv.conf show 127.0.0.53? Is that broken?
Not inherently. That’s systemd-resolved’s local stub. The real upstream servers are in resolvectl status. Fix the upstream or the routing to it.
5) How do multiple default routes happen?
DHCP on two interfaces, Wi‑Fi plus Ethernet, VPN clients that install a default route, or static config layered on top of DHCP. Linux will pick one based on metrics and rule/table logic.
6) What’s the difference between “main table” routing and policy routing?
Main table routing is the default lookup. Policy routing adds rules that choose different tables based on packet attributes (source, mark, incoming interface, UID). If you have rules, “main” may not matter for the traffic you care about.
7) Should I disable rp_filter to fix this?
Don’t disable it blindly. On single-homed hosts, strict rp_filter is fine. On multi-homed hosts, set it to loose (2) per interface if asymmetry is legitimate. Better: design routing to be symmetric where possible.
8) Why does ping work but apt/curl fails?
Ping is small and simple. apt/curl hits DNS, TCP, TLS, proxies, and often larger packets. MTU/PMTUD issues and DNS issues are the most common reasons for this mismatch.
9) Can IPv6 cause “no internet” even if IPv4 is okay?
Yes. Many clients prefer IPv6; broken IPv6 routing can lead to slow failures. Test with curl -4 vs curl -6 and confirm IPv6 default routes and RA/DHCPv6 status.
10) I fixed it with a manual ip route add default. How do I make it persistent?
Put the configuration in the system’s network manager (netplan, NetworkManager, ifupdown, or networkd). Manual ip route changes vanish on reboot or link flap—and they will come back to haunt you.
Next steps you can actually do
If you take nothing else from this: stop trusting “connected” and start interrogating the kernel. Run ip route, ip rule, and ip route get. Those three will tell you what the host believes about the world.
Then do the boring fixes, in order:
- Ensure exactly one intended default route exists for the traffic class you care about (or multiple, but with explicit metrics and rules).
- Make policy routing explicit, documented, and minimal. If you can’t explain every ip rule, you don’t own your routing.
- Validate DNS via resolvectl and test by IP to separate concerns.
- If applications hang, treat MTU as guilty until proven innocent.
- Persist changes in the correct manager (netplan/NM/networkd) and avoid split-brain configuration.
Production networking isn’t hard because the commands are hard. It’s hard because humans keep layering “just one more thing” until the host becomes a choose-your-own-adventure novel. Your job is to make it a grocery list again.