Symptom: a multi-homed Ubuntu 24.04 box “mostly works” until it doesn’t. Some inbound connections hang. Replies vanish. Monitoring says the link is fine. Your app team insists it’s DNS. It’s not.
Reality: the kernel is silently dropping packets because rp_filter decided your routing looks suspicious. And if you have two NICs, two uplinks, VRFs, policy routing, Kubernetes, VIPs, or anything resembling real life, suspicion becomes a lifestyle.
Case #54: the multi-homing outage you didn’t see coming
We’ll frame this like a real incident, because that’s how you’ll meet it: half an outage with excellent vibes.
A service node has two uplinks:
- eth0 to the “internal” network (east-west traffic, cluster, storage).
- eth1 to a “dmz/edge” network (north-south traffic, partners, monitoring, sometimes VPN).
Both links are up. Both have addresses. There is policy-based routing to make certain source IPs prefer certain gateways. There may also be a VIP for failover (VRRP/keepalived), or a WireGuard interface that becomes the default route for a subset of traffic.
Then someone upgrades to Ubuntu 24.04 or rebuilds a node and applies “security hardening” sysctls. Suddenly:
- Inbound SYNs arrive on eth1, but SYN-ACK never leaves.
- ICMP ping works from some networks but not others.
- Connections from one partner ASN flap every few minutes.
- Traffic to storage works; traffic to the load balancer doesn’t; Kubernetes liveness checks are a coin toss.
The first hour goes to blaming the obvious: firewall rules, MTU, a “bad switch port”, and—inevitably—DNS. Meanwhile the kernel is discarding packets because the reverse-path check can’t find a route back via the same interface it arrived on, so it assumes spoofing and drops.
One quote that should live in your runbooks: “Hope is not a strategy.” It’s an old reliability-engineering aphorism, and rp_filter incidents are exactly where it earns its keep.
Here’s the blunt truth: multi-homing without explicit routing design is just gambling with more cables.
What reverse path filtering actually does (and why it bites)
rp_filter is Linux’s source validation feature. The goal is good: drop packets that claim to come from a source address that the system would not route back to. This blocks some spoofing and reduces the blast radius of bad routes. In small, single-homed setups, it’s usually invisible and mostly helpful.
The mental model
When a packet arrives on interface X with source IP S, the kernel asks: “If I were to send a packet to S, would it go out interface X?”
- If the answer is “yes”, accept it.
- If the answer is “no”, drop it (or treat it specially depending on mode).
Strict vs loose vs off
Linux implements this with net.ipv4.conf.*.rp_filter:
- 0 (off): do not perform reverse path filtering.
- 1 (strict): incoming interface must match the route back to the source. Great for a single default route. Dangerous for asymmetry.
- 2 (loose): source must be reachable via some interface (route exists), but not necessarily the incoming one. This is usually the sane setting for multi-homed hosts.
Multi-homing creates asymmetry on purpose. Traffic can legitimately enter via one interface and leave via another because of:
- policy routing (ip rule)
- multiple default gateways (even “accidentally”)
- ECMP decisions in upstream networks
- NAT or VIP advertisement changes (VRRP/keepalived)
- tunnels (WireGuard, IPsec) where “return path” is not the same physical NIC
Strict rp_filter treats those legitimate packets like forgeries. You get “random” packet loss that correlates with which path upstream chose, which is why you can stare at the box for an hour and swear it’s haunted.
Joke #1: Reverse path filtering is like a bouncer who checks whether you would leave through the same door you entered. Great for fire codes, terrible for buildings with more than one exit.
Interesting facts & context (why this keeps happening)
- rp_filter exists because spoofing used to be casual. In early, less-filtered Internet days, forged source addresses were common enough that host-side checks mattered.
- It’s part of a broader family called “source address validation”. Networks implement similar concepts at edges, but host validation still helps when upstream filtering is inconsistent.
- Loose mode was built specifically to tolerate asymmetry. Multi-homing and complex routing aren’t “edge cases” anymore; they’re Tuesdays.
- The setting is per-interface and also “all”/“default”, and the effective value on an interface is the numeric maximum of conf/all and the per-interface setting. People change one knob and assume it covers the others; it doesn’t, not the way you think. A quick check follows this list.
- Containers and Kubernetes add interfaces you didn’t ask for. CNI plugins create veth pairs, bridges, and routes that can change the reverse-path decision unexpectedly.
- VRRP/keepalived VIPs can trigger rp_filter drops during failover. A VIP may arrive on one interface while the “best” return route is elsewhere during convergence.
- Cloud networking often produces asymmetric return paths. Secondary NICs, source/destination checks, and overlay routing mean packets may arrive “from” places your main table wouldn’t choose.
- rp_filter interacts with policy routing in non-obvious ways. Reverse-path lookup might not consult the routing table you expected unless rules and source selection are correct.
- People confuse rp_filter with firewalling. The drop happens before you see it in iptables/nftables logs unless you explicitly log kernel martians.
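A quick way to see what the kernel will actually enforce per interface, given the max rule above. A minimal sketch; the interface names are this article’s examples:

cr0x@server:~$ for i in all default eth0 eth1; do printf '%-8s %s\n' "$i" "$(sysctl -n net.ipv4.conf.$i.rp_filter)"; done
all      1
default  1
eth0     1
eth1     2

Read it with the max rule in mind: with all=1, eth1=2 yields an effective 2 (loose) on eth1, but setting eth1=0 would still leave eth1 at an effective 1 (strict).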
Fast diagnosis playbook
If you’re on call and you have 10 minutes before someone escalates, do this in order. The goal is to prove or eliminate rp_filter quickly, then decide whether you need routing fixes or sysctl changes.
1) Confirm the symptom is directional and interface-specific
- Can you see inbound packets on the expected interface?
- Do replies leave, and if so, which interface?
- Is the failure tied to a particular source network?
2) Check rp_filter values (all, default, and the interface)
If rp_filter=1 anywhere relevant, assume it’s guilty until proven innocent.
3) Do a route lookup for the source IP from the interface context
Use ip route get and make sure the egress interface matches the ingress interface if strict mode is enabled. If it doesn’t match, strict mode will drop.
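You can also ask the kernel to simulate the inbound packet itself: with iif, ip route get performs the input-path lookup, source validation included. A hedged example using this article’s addresses (client 203.0.113.55 arriving on eth1 toward 172.20.5.20):

cr0x@server:~$ ip route get 172.20.5.20 from 203.0.113.55 iif eth1
RTNETLINK answers: Invalid cross-device link

That error is typically the reverse-path check rejecting the flow; once routing and rp_filter agree, the same command prints a normal route instead.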
4) Look for kernel logs: martians and reverse-path complaints
Enable temporary logging if needed, but don’t leave it on forever in high-traffic systems unless you enjoy filling disks.
5) If it’s multi-homed by design, move to loose mode (2) or fix policy routing
Strict mode is a fine security setting for single-homed servers. Multi-homed servers are not single-homed servers. Treat them differently.
Practical tasks: commands, outputs, decisions (12+)
These are the tasks I actually run when a multi-homed Ubuntu box starts dropping traffic “for no reason”. Each includes: command, example output, what it means, and what decision you make.
Task 1: List interfaces and addresses (spot multi-homing and VIPs)
cr0x@server:~$ ip -br addr
lo UNKNOWN 127.0.0.1/8 ::1/128
eth0 UP 10.10.10.20/24
eth1 UP 172.20.5.20/24
wg0 UP 10.99.0.2/32
Meaning: you have at least two L3 domains plus a tunnel. Asymmetry is not a hypothetical.
Decision: treat this host as multi-homed; strict rp_filter is likely wrong unless routing is very carefully pinned.
Task 2: Show default routes and metrics (spot accidental “two defaults”)
cr0x@server:~$ ip route show default
default via 10.10.10.1 dev eth0 proto dhcp src 10.10.10.20 metric 100
default via 172.20.5.1 dev eth1 proto static src 172.20.5.20 metric 200
Meaning: two default routes exist. Linux will prefer the lower metric, but inbound traffic can arrive on either interface depending on upstream routing.
Decision: either implement policy routing by source/interface, or expect rp_filter strict to drop.
Task 3: Check rp_filter globally and per-interface (the usual smoking gun)
cr0x@server:~$ sysctl net.ipv4.conf.all.rp_filter net.ipv4.conf.default.rp_filter net.ipv4.conf.eth0.rp_filter net.ipv4.conf.eth1.rp_filter
net.ipv4.conf.all.rp_filter = 1
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.eth0.rp_filter = 1
net.ipv4.conf.eth1.rp_filter = 1
Meaning: strict mode everywhere.
Decision: if this host is truly multi-homed (and not “one interface is dead weight”), change to loose mode (2) for the relevant interfaces or redesign routing so strict validation holds.
Task 4: Confirm the problem with a route lookup to a failing source (reverse-path logic)
cr0x@server:~$ ip route get 203.0.113.55
203.0.113.55 via 10.10.10.1 dev eth0 src 10.10.10.20 uid 0
cache
Meaning: the kernel would send traffic to 203.0.113.55 via eth0. If packets from 203.0.113.55 arrive on eth1, strict rp_filter will drop them.
Decision: either fix routing so return path uses eth1 for that source (policy routing), or relax rp_filter.
Task 5: Capture on the ingress interface (prove packets arrive)
cr0x@server:~$ sudo tcpdump -c 5 -ni eth1 host 203.0.113.55 and tcp port 443
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), snapshot length 262144 bytes
12:10:01.100001 IP 203.0.113.55.51512 > 172.20.5.20.443: Flags [S], seq 123456789, win 64240, options [mss 1460,sackOK,TS val 1 ecr 0], length 0
12:10:02.100002 IP 203.0.113.55.51512 > 172.20.5.20.443: Flags [S], seq 123456789, win 64240, options [mss 1460,sackOK,TS val 2 ecr 0], length 0
^C
2 packets captured
Meaning: SYNs arrive on eth1. If your service is listening, you should see SYN-ACK leaving. If you don’t, suspect kernel drop or local firewall.
Decision: capture egress too; if nothing leaves, move to rp_filter/martian checks.
Task 6: Capture on the expected egress interface (see if replies leave elsewhere)
cr0x@server:~$ sudo tcpdump -c 5 -ni eth0 host 203.0.113.55 and tcp port 443
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
^C
0 packets captured
Meaning: no replies leaving eth0 for that flow (at least during capture). Combine with Task 5 and you likely have drops before the TCP stack responds, or replies are blocked elsewhere.
Decision: check kernel logs and rp_filter; verify the service is listening; check nftables/iptables only after you’ve ruled out rp_filter.
Task 7: Inspect kernel logs for martians and rp_filter drops
cr0x@server:~$ sudo journalctl -k --since "10 min ago" | tail -n 20
Dec 30 12:09:58 server kernel: IPv4: martian source 172.20.5.20 from 203.0.113.55, on dev eth1
Dec 30 12:09:58 server kernel: ll header: 00000000: 00 11 22 33 44 55 66 77 88 99 aa bb 08 00
Dec 30 12:09:58 server kernel: IPv4: martian source 172.20.5.20 from 203.0.113.55, on dev eth1
Meaning: the kernel is telling you it considers that source invalid on that interface. That’s reverse-path validation in action (or related martian checks).
Decision: switch to loose rp_filter or fix policy routing so the reverse path matches.
Task 8: Temporarily set rp_filter to loose mode (quick test, not yet permanent)
cr0x@server:~$ sudo sysctl -w net.ipv4.conf.eth1.rp_filter=2
net.ipv4.conf.eth1.rp_filter = 2
Meaning: eth1 now uses loose mode.
Decision: retest the failing client. If the issue disappears immediately, you’ve confirmed root cause. Then make a permanent, documented change (and consider the security implications).
Task 9: Make rp_filter persistent with a dedicated sysctl file
cr0x@server:~$ printf '%s\n' \
'net.ipv4.conf.all.rp_filter=2' \
'net.ipv4.conf.default.rp_filter=2' \
'net.ipv4.conf.eth0.rp_filter=2' \
'net.ipv4.conf.eth1.rp_filter=2' | sudo tee /etc/sysctl.d/60-multihome-rpfilter.conf
net.ipv4.conf.all.rp_filter=2
net.ipv4.conf.default.rp_filter=2
net.ipv4.conf.eth0.rp_filter=2
net.ipv4.conf.eth1.rp_filter=2
cr0x@server:~$ sudo sysctl --system | tail -n 8
* Applying /etc/sysctl.d/60-multihome-rpfilter.conf ...
net.ipv4.conf.all.rp_filter = 2
net.ipv4.conf.default.rp_filter = 2
net.ipv4.conf.eth0.rp_filter = 2
net.ipv4.conf.eth1.rp_filter = 2
Meaning: settings will survive reboot and apply predictably.
Decision: commit this in your config management and annotate why; “security hardening” without exceptions is how you get midnight calls.
Task 10: Validate policy routing rules (if you prefer strict rp_filter)
cr0x@server:~$ ip rule show
0: from all lookup local
100: from 172.20.5.20 lookup 200
110: from 10.10.10.20 lookup 100
32766: from all lookup main
32767: from all lookup default
Meaning: traffic sourced from each IP uses a different table. That’s the beginning of sanity.
Decision: inspect tables 100 and 200 to ensure each has the correct default gateway and on-link routes. If a table is incomplete, reverse-path checks and reply routing will still be wrong.
Task 11: Inspect the custom routing tables (are defaults and connected routes present?)
cr0x@server:~$ ip route show table 100
default via 10.10.10.1 dev eth0 src 10.10.10.20
10.10.10.0/24 dev eth0 proto kernel scope link src 10.10.10.20
cr0x@server:~$ ip route show table 200
default via 172.20.5.1 dev eth1 src 172.20.5.20
172.20.5.0/24 dev eth1 proto kernel scope link src 172.20.5.20
Meaning: each table is self-contained (default plus connected subnet). That’s what you want.
Decision: if you want strict rp_filter, you still need to guarantee that return routes match ingress. Policy routing helps, but you must also ensure source address selection and application binds behave as expected.
Task 12: Route lookup with explicit source (validate your policy routing)
cr0x@server:~$ ip route get 203.0.113.55 from 172.20.5.20
203.0.113.55 from 172.20.5.20 via 172.20.5.1 dev eth1 uid 0
cache
Meaning: if the response will be sourced from 172.20.5.20, it exits via eth1. This aligns with strict reverse-path expectations for flows arriving on eth1.
Decision: if this doesn’t match, fix ip rule and routing tables rather than turning rp_filter off blindly.
Task 13: Check for “martian logging” settings (helpful during incident)
cr0x@server:~$ sysctl net.ipv4.conf.all.log_martians net.ipv4.conf.eth1.log_martians
net.ipv4.conf.all.log_martians = 0
net.ipv4.conf.eth1.log_martians = 0
Meaning: the kernel may be dropping without telling you.
Decision: temporarily enable logging on the suspect interface during diagnosis; disable afterward to avoid log spam.
Task 14: Temporarily enable martian logging (targeted, time-boxed)
cr0x@server:~$ sudo sysctl -w net.ipv4.conf.eth1.log_martians=1
net.ipv4.conf.eth1.log_martians = 1
Meaning: you’ll now see evidence in journalctl -k when reverse-path logic rejects traffic.
Decision: reproduce the problem once, capture the logs, then turn it back off.
Task 15: Verify ARP flux-related settings (often co-troubles with multihoming)
cr0x@server:~$ sysctl net.ipv4.conf.all.arp_ignore net.ipv4.conf.all.arp_announce
net.ipv4.conf.all.arp_ignore = 0
net.ipv4.conf.all.arp_announce = 0
Meaning: default ARP behavior can be “creative” on multi-homed hosts (responding on the “wrong” interface). Not rp_filter, but it creates similarly weird symptoms.
Decision: if you use VIPs or multiple subnets, consider tightening ARP behavior along with rp_filter settings, especially on bare metal or L2-adjacent networks.
Three corporate mini-stories from the trenches
Mini-story 1: the outage caused by a wrong assumption
The company had a pair of “identical” API nodes in two data centers. Each node had two NICs: one for internal services, one for partner traffic. The design doc said “partner traffic uses eth1.” That line was treated as physics.
A network change happened upstream—routine maintenance, a BGP preference tweak, nothing dramatic. One partner’s return path started arriving via a different edge, and packets began landing on eth0 on one site due to a new L3 hop. The API node still replied via eth1 because of policy routing keyed off the source address it chose for replies.
Strict rp_filter was enabled as part of a baseline hardening pack. It had always been enabled. It had also always been lucky. When the asymmetry arrived, the kernel started dropping inbound packets as “martians.” The app team saw intermittent 5xx errors. The SRE on call saw no interface errors, no firewall denies, no CPU spikes. Just a slow drip of failed sessions.
The wrong assumption was subtle: that “partner traffic uses eth1” meant it would arrive on eth1. In routed networks, “should” is an aspiration, not a guarantee.
The fix wasn’t heroic. They switched to loose mode on the partner-facing interface and updated policy routing to be consistent. Then they added a regression test: route lookups from each source IP to representative partner prefixes. The postmortem line that mattered: “We assumed symmetry; the network did not sign that contract.”
Mini-story 2: an optimization that backfired
A platform team wanted to reduce lateral movement risk. They pushed a new sysctl bundle across fleets: strict rp_filter, martian logging off (to reduce “noise”), and some additional knobs that looked good in a compliance spreadsheet.
It worked on the web tier: single NIC, single default route, predictable. So the team rolled it into the base image used for everything, including stateful systems with replication networks, backup networks, and occasionally a “migration NIC” that came and went.
The backfire was delayed. A month later, a storage-adjacent service started failing only during backup windows. Why? The backup system used a different network, and during those hours the service would originate some traffic with the “backup” source address due to an overly broad application bind. Replies took a different interface than the inbound requests. Strict rp_filter interpreted some inbound flows as spoofed and dropped them, but only when certain routes were present.
The optimization—tightening validation everywhere—created an intermittent bug that aligned with operational schedules. The worst kind of intermittent: reproducible only when you’re tired and your coffee is gone.
They recovered by splitting sysctl profiles: one for single-homed stateless tiers, one for multi-homed and routing-complex systems. They also changed the “quiet logs” approach. You don’t need martian logs all day, but you need an easy switch to turn them on during incidents. Quiet systems are pleasant; quiet failures are expensive.
Mini-story 3: the boring but correct practice that saved the day
A regulated enterprise ran multi-homed bastion hosts: one NIC to the admin network, one NIC to a controlled partner segment. The environment was full of change requests, and people loved “quick fixes.” These hosts refused quick fixes.
The team had a rule: every multi-homed node must have a small routing self-test that runs on boot and on network reload. It executed ip route get for a handful of representative destinations, from each source address, and compared outputs to a known-good expectation. If it drifted, it raised an alert before users felt it.
One day after a kernel update and a netplan change, the test failed: return path from the partner IP started preferring the admin default route. Nobody noticed in normal activity because the only affected flow was a low-volume monitoring callback. But the test noticed, loudly.
The fix was a one-line ip rule ordering correction and a deliberate choice: keep rp_filter=2 on the partner interface because asymmetric ingress could still happen. The key point is the boring part: the problem was caught by a deterministic check, not by hero debugging in production.
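A minimal sketch of such a self-test. The probe list is illustrative; you would encode your own known-good sources, destinations, and expected egress devices:

#!/bin/sh
# routing-selftest.sh: compare 'ip route get' answers to a known-good map.
fail=0
check() { # check <destination> <source-ip> <expected-egress-dev>
    dev=$(ip route get "$1" from "$2" 2>/dev/null | grep -o 'dev [^ ]*' | head -n1)
    if [ "$dev" != "dev $3" ]; then
        echo "ROUTE DRIFT: $1 from $2 -> ${dev:-no route} (expected dev $3)" >&2
        fail=1
    fi
}
check 203.0.113.55 172.20.5.20 eth1   # partner traffic must return via eth1
check 10.10.10.100 10.10.10.20 eth0   # internal traffic stays on eth0
exit $fail

Wire it to boot, to network reload hooks, and to alerting, and it tattles exactly like the story says.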
Joke #2: The most reliable system is the one that fails in staging. The second most reliable is the one that tattles immediately.
Designing multi-homing that survives rp_filter
Pick your philosophy: strict correctness or operational tolerance
You have two defensible approaches. Mixing them without thinking is how you get “case #54”.
Approach A: Keep strict rp_filter, design for symmetry
This can work, but it’s work:
- Ensure traffic entering on interface X will always route back to the source via X.
- Implement policy routing: ip rule keyed by source IP (and sometimes fwmarks) to select a routing table per interface/uplink.
- Make each routing table complete: connected routes plus default (and any required specific routes).
- Ensure applications bind to the correct source addresses or use appropriate SO_BINDTODEVICE/bind() behavior where necessary.
This is great when you truly control both sides of the path (private WAN, predictable upstream). It’s less great on the public Internet or in clouds with “helpful” routing.
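For concreteness, the policy-routing skeleton from the tasks above looks like this at the command level. A sketch using this article’s example addresses; these commands are runtime-only, so persist the equivalent via netplan or config management:

# Internal uplink: table 100, used for traffic sourced from 10.10.10.20.
ip route add 10.10.10.0/24 dev eth0 scope link src 10.10.10.20 table 100
ip route add default via 10.10.10.1 dev eth0 src 10.10.10.20 table 100
ip rule add from 10.10.10.20 lookup 100 priority 110

# DMZ/edge uplink: table 200, used for traffic sourced from 172.20.5.20.
ip route add 172.20.5.0/24 dev eth1 scope link src 172.20.5.20 table 200
ip route add default via 172.20.5.1 dev eth1 src 172.20.5.20 table 200
ip rule add from 172.20.5.20 lookup 200 priority 100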
Approach B: Use loose rp_filter for multi-homed, focus on reachability and logging
This is what I recommend for most multi-homed hosts that face unpredictable paths:
- Set rp_filter=2 on relevant interfaces (or globally, if the host is consistently multi-homed).
- Keep policy routing anyway, because you still want stable egress behavior and correct source selection.
- Enable martian logs only during troubleshooting, or route them to rate-limited logging pipelines.
- Rely on upstream anti-spoofing at network edges, plus host firewalls, plus least privilege. rp_filter is not your only control.
Ubuntu 24.04 specifics: where people get surprised
Ubuntu 24.04 itself isn’t malicious. The surprise usually comes from what you changed at the same time:
- new baseline images
- new hardening sysctl sets
- netplan rewrites that inadvertently add a second default route or change metrics
- kernel upgrades that shift interface naming or driver behavior and reorder routes
- container networking upgrades (CNI versions) that add rules and routes
The failure mode feels like “Ubuntu networking is flaky.” It’s not. Your routing policy is inconsistent with strict source validation. The kernel is simply enforcing what you asked for.
How to decide: a practical rubric
- If the server is single-homed and stays that way: strict (1) is fine and often desirable.
- If the server has two active uplinks: default to loose (2) unless you have a strong reason and strong routing discipline.
- If the server uses VIPs, VRRP, load balancer direct server return, or anycast-ish tricks: assume asymmetry; use loose (2) and test failovers.
- If the server runs Kubernetes with multiple CNIs or special routing: loose (2) is usually safer; then validate with ip route get per pod network as needed.
- If security policy requires strict: enforce symmetry with policy routing and explicit application binding, and add continuous tests so it doesn’t drift.
Common mistakes: symptom → root cause → fix
1) “Inbound connections time out, but only from some networks”
Symptom: certain client subnets can’t connect; others are fine. SYNs arrive, no SYN-ACK.
Root cause: strict rp_filter drops packets arriving on an interface that isn’t the preferred return interface for that source (asymmetric path).
Fix: set net.ipv4.conf.<if>.rp_filter=2 or implement source-based routing so return path matches ingress.
2) “It broke right after we added a second default route for resilience”
Symptom: connectivity is flaky after adding a backup gateway.
Root cause: two defaults plus strict rp_filter creates a mismatch when traffic arrives on the higher-metric interface.
Fix: don’t rely on “backup default route” alone; use policy routing and/or routing protocols. If you keep dual defaults, prefer rp_filter loose mode.
3) “Firewall logs show nothing, but packets vanish”
Symptom: nftables/iptables counters don’t move; tcpdump sees inbound packets; the app doesn’t respond.
Root cause: rp_filter drop happens before firewall hooks you’re watching, and martian logging is off.
Fix: check sysctl net.ipv4.conf.*.rp_filter, enable log_martians briefly, then set rp_filter to loose or fix routing.
4) “VRRP failover causes a brief outage, then it recovers”
Symptom: a VIP moves, and some clients fail for 10–60 seconds.
Root cause: during convergence, the “best return route” changes faster/slower than VIP advertisement; strict rp_filter drops packets arriving on the “wrong” interface.
Fix: use rp_filter loose on VIP-carrying interfaces, and validate ARP behavior (arp_ignore/arp_announce) to avoid interface confusion.
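The usual ARP tightening, shown as a hedged example. These are the classic values from the LVS/keepalived world: arp_ignore=1 answers ARP only when the target IP lives on the receiving interface, and arp_announce=2 picks the best local source address for ARP:

cr0x@server:~$ sudo sysctl -w net.ipv4.conf.all.arp_ignore=1 net.ipv4.conf.all.arp_announce=2
net.ipv4.conf.all.arp_ignore = 1
net.ipv4.conf.all.arp_announce = 2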
5) “WireGuard works outbound, but inbound handshake is unreliable”
Symptom: tunnel comes up sometimes; remote peer sees retries.
Root cause: packets to the tunnel endpoint arrive on one NIC, return path prefers another due to default route/metrics; strict rp_filter drops.
Fix: rp_filter loose, plus ensure routing for the peer endpoint is pinned to the correct uplink if you need deterministic behavior.
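Pinning the peer endpoint is one host route. A sketch with a hypothetical peer endpoint of 198.51.100.10 that must always use the eth1 uplink:

cr0x@server:~$ sudo ip route add 198.51.100.10/32 via 172.20.5.1 dev eth1
cr0x@server:~$ ip route get 198.51.100.10
198.51.100.10 via 172.20.5.1 dev eth1 src 172.20.5.20 uid 0
    cache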
6) “We set rp_filter=2 in /etc/sysctl.conf but it’s still 1 after reboot”
Symptom: runtime values revert.
Root cause: another sysctl.d file overrides it later, or cloud-init/hardening applies after your file.
Fix: put a dedicated file with an ordering that wins (higher lexical order), then verify with sysctl --system output.
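Finding the file that wins takes two commands; the 50-hardening.conf name below is illustrative:

cr0x@server:~$ grep -rn rp_filter /etc/sysctl.d/ /run/sysctl.d/ /usr/lib/sysctl.d/ 2>/dev/null
/etc/sysctl.d/60-multihome-rpfilter.conf:1:net.ipv4.conf.all.rp_filter=2
/usr/lib/sysctl.d/50-hardening.conf:12:net.ipv4.conf.all.rp_filter=1
cr0x@server:~$ sudo sysctl --system | grep -E 'Applying|rp_filter'

The second command shows apply order and the resulting assignments in one shot; the last assignment printed is the one you live with.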
7) “Only one interface is affected”
Symptom: eth1 drops, eth0 doesn’t.
Root cause: per-interface rp_filter differs, or only one interface receives asymmetric traffic.
Fix: inspect per-interface settings, and remember the kernel enforces the numeric max of all and the per-interface value. Don’t assume all makes every interface behave the same.
Checklists / step-by-step plan
Checklist A: stop the bleeding (incident mode)
- Prove ingress: tcpdump on the suspected interface for the failing client IP/port.
- Check rp_filter: read net.ipv4.conf.all/default/<if>.rp_filter.
- Check route back: ip route get <client-ip> and compare the expected egress interface.
- Look for martians: journalctl -k. If nothing, enable log_martians=1 temporarily on the interface.
- Fast mitigation: set rp_filter to loose (2) on the affected interface(s), retest.
- Roll back risky changes: if a hardening bundle was applied fleet-wide, stop rollout and isolate multi-homed roles.
Checklist B: make it correct (after the incident)
- Decide your model: strict-with-symmetry or loose-with-policy-routing.
- Remove accidental dual defaults: if both defaults are necessary, document why and ensure metrics and rules are intentional.
- Implement policy routing: one table per uplink/source IP. Keep tables complete.
- Validate with route probes: run ip route get tests from each source IP to representative destinations.
- Make sysctls persistent: dedicated file in /etc/sysctl.d/; verify ordering.
- Document exceptions: multi-homed hosts aren’t “less secure”; they require different controls.
- Add a drift detector: a simple script that alerts when rp_filter changes or when routing tables lose required routes.
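The sysctl half of that drift detector can be almost embarrassingly simple. A sketch; the expected value 2 is this article’s multi-homed baseline:

#!/bin/sh
# rp-filter-drift.sh: alert when rp_filter deviates from the documented baseline.
rc=0
for i in all default eth0 eth1; do
    v=$(sysctl -n "net.ipv4.conf.$i.rp_filter")
    [ "$v" = "2" ] || { echo "DRIFT: net.ipv4.conf.$i.rp_filter=$v (expected 2)" >&2; rc=1; }
done
exit $rc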
Checklist C: hardening without self-sabotage
- Classify hosts: single-homed, multi-homed, VIP, tunnel endpoints, Kubernetes nodes.
- Apply rp_filter by class: strict for single-homed; loose for multi-homed and VIP/tunnel roles unless symmetry is enforced.
- Log tactically: keep martian logs off by default, but provide an on-demand toggle and rate limits (see the sketch after this checklist).
- Test failure modes: failover events, link down/up, DHCP renew, netplan apply, CNI restart.
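That on-demand toggle can be a single time-boxed line you run in tmux during an incident. A sketch: 600 seconds is an arbitrary window, and eth1 is this article’s suspect interface:

cr0x@server:~$ sudo sh -c 'sysctl -w net.ipv4.conf.eth1.log_martians=1; sleep 600; sysctl -w net.ipv4.conf.eth1.log_martians=0'
net.ipv4.conf.eth1.log_martians = 1
net.ipv4.conf.eth1.log_martians = 0

It turns logging back off by itself, even if you get pulled into a bridge call and forget.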
FAQ
1) Is rp_filter a firewall?
No. It’s source validation in the kernel routing path. It can drop packets before your firewall rules or logs make the situation obvious.
2) Should I set rp_filter to 0 everywhere to “fix networking”?
Only if you enjoy trading one class of outage for a security regression. For multi-homed hosts, use 2 (loose) in most cases, and keep your routing policy sane.
3) What’s the safest setting for multi-homed Ubuntu servers?
rp_filter=2 is usually the best balance. Pair it with policy routing so egress is deterministic. If you can enforce symmetry reliably, strict can be viable—but don’t pretend it’s free.
4) Why does this show up after an Ubuntu 24.04 upgrade?
Because upgrades often coincide with new sysctl baselines, netplan changes, different default route metrics, or new interfaces (tunnels, containers). rp_filter didn’t suddenly become mean; your environment became more complex.
5) How do I confirm rp_filter is the culprit in minutes?
Check sysctl net.ipv4.conf.<if>.rp_filter, run ip route get <source-ip>, and look for martian logs. Flip the interface to loose mode temporarily and retest. If it fixes it, you have your answer.
6) Does loose mode (2) allow spoofing?
It relaxes interface matching, not reachability. The source must still be routable somewhere. It’s less strict, so yes, it’s a weaker anti-spoof control than strict mode—use upstream filtering and host firewalling as the primary controls.
7) Why do I see “martian source” in logs?
That’s the kernel telling you it received a packet with a source address that fails sanity checks for the interface, often because of rp_filter strict mode or route inconsistencies.
8) Can I keep strict rp_filter and still do policy routing?
Yes, but you must be disciplined: correct ip rule order, complete routing tables, and predictable source address selection. If any of those drift, strict rp_filter will punish you quickly.
9) Should Kubernetes nodes use strict rp_filter?
Usually not unless your CNI and routing are designed for it. Multi-interface nodes, overlay networks, and service VIPs make asymmetry common. Loose mode is safer operationally.
10) Where do I set this persistently on Ubuntu 24.04?
Create a file in /etc/sysctl.d/ (for example 60-multihome-rpfilter.conf), then apply with sysctl --system and verify the final values.
Conclusion: next steps that won’t wake you at 03:00
Reverse path filtering is one of those features that looks like “free security” until you run a real network. Multi-homing, VIPs, tunnels, and policy routing are normal in production. Strict rp_filter treats “normal” like “criminal.”
Do this:
- Classify hosts by networking complexity. Stop applying one sysctl profile to everything.
- For multi-homed roles, set rp_filter=2 (loose) unless you can prove symmetry end-to-end.
- Implement policy routing so egress is deterministic and debugging is humane.
- Add a routing self-test (a few ip route get checks) to catch drift before customers do.
- Keep a martian logging toggle available for incidents, and turn it off again after you’ve got the evidence.
If you take nothing else: multi-homing is a design, not a checkbox. rp_filter just enforces whether you actually designed it.