Debian 13 Dual NIC Routing: Stop Asymmetric Routes and Random Drops (Case #53)


Dual-NIC Debian boxes fail in a very specific, very infuriating way: everything “works” until it doesn’t. SSH hangs for 20 seconds,
health checks flap, storage replication stalls, and TCP streams reset with no pattern besides “it’s worse at peak hours.”

The culprit is often asymmetric routing: packets arrive on NIC A and leave on NIC B, so stateful devices (or your own kernel) decide you’re lying.
This is case #53: two interfaces, two gateways, one server, and a pile of random drops you can’t reproduce on demand.

The failure shape: how asymmetry looks in production

Dual NICs are supposed to be boring. One NIC for “frontend,” one for “backend,” or maybe one for management and one for storage.
In the real world, the boundary erodes. Someone adds a second default route “just in case.” Or a routing daemon learns something
you didn’t mean to advertise. Or you plug both NICs into networks with their own opinions about your packets.

Asymmetric routing is not “bad” in the abstract. The internet runs on asymmetry all day. The problem is asymmetry through
stateful choke points: firewalls, NAT, load balancers, conntrack, DSR appliances, and sometimes the Linux kernel itself
when reverse-path filtering is enabled aggressively.

Typical symptoms:

  • Intermittent TCP resets, mostly on long-lived flows (replication, database, iSCSI/NFS, API streaming).
  • SSH sometimes stalls right after login or during scp; retries “fix it.”
  • Monitoring shows packet loss, but only from certain source subnets.
  • Incoming traffic lands on one interface, but replies exit the other (seen in tcpdump).
  • Kernel logs showing martian source, or replies that silently vanish when rp_filter is strict.

If you have two default gateways on the same box and no policy routing, you don’t have redundancy.
You have a coin flip with consequences.

Interesting facts and context (yes, networking has lore)

  1. Policy routing in Linux is older than many “modern” cloud patterns. The ip rule framework arrived in the late 1990s with Linux 2.2 and matured in 2.4/2.6.
  2. Reverse path filtering (rp_filter) was popularized to blunt spoofing. It’s a security feature that doubles as a foot-gun in multi-homed servers.
  3. Asymmetric routing isn’t inherently wrong. It becomes wrong when a middlebox expects to see both directions of a flow on the same path.
  4. Linux has multiple route selection mechanisms. Longest-prefix match happens inside a table, then routing rules decide which table is consulted and when.
  5. ARP flux is a real thing. If Linux answers ARP for the “wrong” interface, peers send traffic to the wrong MAC and you chase ghosts.
  6. ECMP (equal-cost multipath) can look like “random drops.” It’s deterministic per flow hash, but the application sees it as chaos when middleboxes disagree.
  7. Conntrack is not just a firewall detail. Even “allow all” stateful rules depend on conntrack; out-of-path return traffic gets labeled INVALID and dropped.
  8. Systemd-networkd changed the default experience for many admins. Debian 13 makes it easy to produce multiple default routes unless you’re deliberate.

Fast diagnosis playbook

When packets “randomly drop,” you don’t have time for philosophical networking debates. You want a fast funnel:
confirm asymmetry, identify who’s dropping, then make routing deterministic.

1) Prove (or disprove) asymmetric routing in 5 minutes

  • Check routing tables and rules: ip route, ip rule.
  • Capture ingress/egress on both NICs during one failing flow: tcpdump on both interfaces.
  • Confirm source IP selection: ip route get <dest> from <src>.
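Before touching anything, freeze the current state so you can diff it after the fix. This is a minimal sketch in POSIX sh, assuming iproute2 and procps sysctl are installed. snapshot_routing_state is a hypothetical helper, and the /tmp directory layout is an arbitrary choice:

```shell
#!/bin/sh
# snapshot_routing_state [DIR] (hypothetical helper): save the step-1 views
# into DIR (default: a timestamped directory under /tmp) and print DIR.
snapshot_routing_state() {
  local outdir="${1:-/tmp/routing-snapshot-$(date +%Y%m%d-%H%M%S)}"
  mkdir -p "$outdir" || return 1
  # One file per view. "|| :" keeps going if a tool is missing or fails;
  # stderr is redirected too, so any error text from a command lands in its file.
  ip -br addr                   >"$outdir/addr.txt"        2>&1 || :
  ip route show table main      >"$outdir/routes-main.txt" 2>&1 || :
  ip rule show                  >"$outdir/rules.txt"       2>&1 || :
  sysctl -a --pattern rp_filter >"$outdir/rp_filter.txt"   2>&1 || :
  printf '%s\n' "$outdir"
}
```

Run it once before the change and once after, then compare the two directories with diff -r.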

2) Find the dropper: kernel, firewall, or network

  • Look for rp_filter/martians: journalctl -k, sysctl -a --pattern rp_filter.
  • Look for conntrack INVALID drops: nft list ruleset and counters, or iptables -L -v if you’re still living dangerously.
  • Check NIC errors/drops: ip -s link, ethtool -S.

3) Make routing deterministic (don’t “tune” first)

  • Pick one egress per source subnet using policy routing.
  • Set rp_filter to loose (2) on multi-homed hosts unless you truly know what you’re enforcing.
  • Stop advertising/using two default routes without metrics and rules.

Joke #1: If your routing depends on “whichever gateway feels lucky today,” congratulations—you’ve invented load balancing, but without the benefits.

Mental model: Linux routing, rules, and why Debian 13 surprises you

You debug dual NIC routing by understanding three layers of decision-making:
address selection, routing table lookup, and policy routing rules.
Most outages come from assuming Linux will “just reply out the same interface it came in on.”
That assumption is adorable. It’s also false.

Route selection: tables

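Linux maintains several routing tables. Most routes live in the main table, which is what plain ip route shows.
Within a table, longest-prefix match wins. Among routes with the same prefix, the lowest metric wins, and a multipath (ECMP) route spreads flows across its nexthops by hash.

Dual-NIC problems start when both NICs add a default route to main. Linux then chooses an egress by metric if the metrics differ.
If they are equal, the kernel uses one of them and may switch when neighbor reachability changes (or hashes per flow, if the defaults were installed as one multipath route).
Either way, the choice ignores which interface the traffic arrived on, so it won’t align with your network’s expectations.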
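To make the order of a single-table lookup concrete (longest prefix first, then lowest metric), here is a toy model in sh and awk. It is a sketch, not the kernel’s algorithm. It ignores rules, scope, TOS, multipath nexthops and the kernel’s own tie-breaking. lpm_pick is a hypothetical name:

```shell
#!/bin/sh
# lpm_pick DEST (hypothetical helper): read "ip route show"-style lines on
# stdin and print the route a single-table lookup would pick for DEST.
# Longest prefix wins, then lowest metric (unset metric = 0). Simplified:
# on a full tie the first line listed wins, and typed routes
# (unreachable, blackhole, ...) are skipped.
lpm_pick() {
  awk -v dest="$1" '
    function ip2n(s,   o) {
      split(s, o, ".")
      return ((o[1] * 256 + o[2]) * 256 + o[3]) * 256 + o[4]
    }
    BEGIN { d = ip2n(dest); bestlen = -1 }
    $1 != "default" && $1 !~ /^[0-9]/ { next }
    {
      if ($1 == "default") { net = "0.0.0.0"; len = 0 }
      else if (index($1, "/")) { split($1, p, "/"); net = p[1]; len = p[2] + 0 }
      else { net = $1; len = 32 }
      metric = 0
      for (i = 2; i < NF; i++) if ($i == "metric") metric = $(i + 1) + 0
      # Same network iff both addresses agree on the top "len" bits.
      div = 2 ^ (32 - len)
      if (int(d / div) != int(ip2n(net) / div)) next
      if (len > bestlen || (len == bestlen && metric < bestmetric)) {
        bestlen = len; bestmetric = metric; best = $0
      }
    }
    END { if (bestlen >= 0) print best }
  '
}
```

Usage: ip route show table main | lpm_pick 203.0.113.50. Cross-check against ip route get, which remains the authoritative answer.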

Policy routing: rules

ip rule lets you choose which table to consult based on packet properties: source address, fwmark, incoming interface,
TOS, UID ranges, and more. In practice, for dual NIC servers, source-based routing is the workhorse:
“Traffic sourced from 192.0.2.10 uses table 100, which has default via gateway A.”
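A rough way to see which tables a given source address walks through, in order, is to replay the ip rule show output. This is a hypothetical helper that only evaluates from selectors. It skips rules with other selectors (iif, fwmark and so on). It also does not model the fact that the kernel moves on to the next rule when a table has no matching route:

```shell
#!/bin/sh
# rule_tables_for_src SRC (hypothetical helper): read "ip rule show" output
# on stdin and print, in priority order, the tables that traffic sourced
# from SRC would consult. Only "from" selectors are evaluated.
rule_tables_for_src() {
  awk -v src="$1" '
    function ip2n(s,   o) {
      split(s, o, ".")
      return ((o[1] * 256 + o[2]) * 256 + o[3]) * 256 + o[4]
    }
    function src_matches(sel,   p, len, div) {
      if (sel == "all") return 1
      if (index(sel, "/")) { split(sel, p, "/"); len = p[2] + 0 }
      else { p[1] = sel; len = 32 }
      div = 2 ^ (32 - len)
      return int(ip2n(src) / div) == int(ip2n(p[1]) / div)
    }
    # Rules with extra selectors are not modelled here, so skip them.
    / (iif|oif|fwmark|uidrange|ipproto|sport|dport|tos) / { next }
    {
      sel = ""; tbl = ""
      for (i = 1; i < NF; i++) {
        if ($i == "from") sel = $(i + 1)
        if ($i == "lookup") tbl = $(i + 1)
      }
      if (sel != "" && tbl != "" && src_matches(sel)) out = out (out == "" ? "" : " ") tbl
    }
    END { print out }
  '
}
```

With no custom rules, every source walks local, main and then default. That is exactly why both NICs end up sharing main’s default routes.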

Reverse path filtering: rp_filter

rp_filter checks whether the source address of an incoming packet is reachable via the interface it arrived on.
With strict mode (1), multi-homing can break because the “best route back” might be via the other NIC.
Loose mode (2) is typically what you want for multi-homed hosts: it verifies reachability, but not necessarily via the same interface.

Conntrack and stateful filtering

If you use nftables/iptables with stateful rules (most people do, sometimes without realizing), asymmetry can cause packets to be classified as
INVALID by any conntrack instance that saw only one direction of the flow: an upstream stateful firewall, or this host itself if it forwards traffic between its NICs. Then a perfectly valid packet gets dropped.
The packet isn’t “bad.” Your topology is.

ARP and neighbor selection

Another quiet villain is ARP behavior on hosts with multiple interfaces in the same L2 domain or overlapping prefixes.
Linux might answer ARP requests on one interface with the MAC of another, or choose a source address that confuses peers.
The result: traffic arrives where you didn’t expect, and your “routing fix” becomes a whack-a-mole session.

Practical tasks (commands, outputs, decisions)

The following tasks are written like you’re on call and the pager is still warm. Each includes:
a command, a realistic snippet of output, what it means, and the decision you make from it.

Task 1 — Inventory interfaces and addresses

cr0x@server:~$ ip -br addr
lo               UNKNOWN        127.0.0.1/8 ::1/128
enp1s0           UP             192.0.2.10/24 fe80::a00:27ff:fe12:3456/64
enp2s0           UP             198.51.100.10/24 fe80::a00:27ff:fe65:4321/64

Meaning: Two NICs, two IPv4 subnets. This is fine. It becomes not fine when both claim to be “the internet” (two defaults).

Decision: Confirm what each subnet is for (frontend/backend/management). Write it down. If nobody knows, you’re already in trouble.

Task 2 — Check the main routing table for multiple defaults

cr0x@server:~$ ip route show table main
default via 192.0.2.1 dev enp1s0 proto dhcp metric 100
default via 198.51.100.1 dev enp2s0 proto dhcp metric 100
192.0.2.0/24 dev enp1s0 proto kernel scope link src 192.0.2.10 metric 100
198.51.100.0/24 dev enp2s0 proto kernel scope link src 198.51.100.10 metric 100

Meaning: Two equal-metric defaults. The kernel uses one and may flip to the other. Your firewall expects every packet of a flow on the same path.

Decision: You need policy routing or a single default. Don’t “just change metrics” unless you’re okay with failover behavior and you test it.
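To catch this in monitoring or in a post-change check, count the default routes in main and flag any metric that appears more than once. This is a sketch that reads ip route output on stdin. check_default_routes and its output format are hypothetical:

```shell
#!/bin/sh
# check_default_routes (hypothetical helper): read "ip route show table main"
# output on stdin, print a one-line summary, and return 1 when more than one
# default route exists. Shared metrics (unset metric = 0) are listed.
check_default_routes() {
  awk '
    $1 == "default" {
      n++
      m = 0
      for (i = 2; i < NF; i++) if ($i == "metric") m = $(i + 1) + 0
      seen[m]++
      if (seen[m] == 2) dup = dup " " m
    }
    END {
      printf "defaults=%d", n + 0
      if (dup != "") printf " shared-metric:%s", dup
      printf "\n"
      if (n > 1) exit 1
    }
  '
}
```

The non-zero exit status makes it usable as a monitoring check: ip route show table main | check_default_routes.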

Task 3 — Inspect routing policy rules

cr0x@server:~$ ip rule show
0:      from all lookup local
32766:  from all lookup main
32767:  from all lookup default

Meaning: No policy routing rules. Everything consults main, so asymmetry is on the menu.

Decision: Add source-based rules and per-interface tables (or remove the extra default route).

Task 4 — Ask Linux “how would you route this?” for each source

cr0x@server:~$ ip route get 203.0.113.50 from 192.0.2.10
203.0.113.50 via 198.51.100.1 dev enp2s0 src 192.0.2.10 uid 0
    cache

Meaning: Traffic sourced as 192.0.2.10 would egress via enp2s0. That’s classic asymmetry fuel.

Decision: Implement “source 192.0.2.10 uses gateway 192.0.2.1” via policy routing.
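This question is worth asking for every source address after every change, reboot and DHCP renewal, so it is worth scripting. This is a sketch that parses the dev field from ip route get. egress_dev and check_egress are hypothetical helpers. The expected interface per source comes from your own routing intent:

```shell
#!/bin/sh
# egress_dev (hypothetical helper): read "ip route get" output on stdin and
# print the interface after the first "dev" keyword.
egress_dev() {
  awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# check_egress DEST SRC EXPECTED_DEV (hypothetical helper): ask the kernel
# which interface traffic from SRC to DEST would use and compare.
check_egress() {
  local got
  got=$(ip route get "$1" from "$2" 2>/dev/null | egress_dev)
  if [ "$got" = "$3" ]; then
    echo "OK   from $2 to $1 via $got"
  else
    echo "FAIL from $2 to $1 via ${got:-?} (expected $3)"
    return 1
  fi
}
```

For this host, run check_egress 203.0.113.50 192.0.2.10 enp1s0 and check_egress 203.0.113.50 198.51.100.10 enp2s0. Before the fix, the first one fails.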

Task 5 — Check rp_filter (strict mode is a frequent silent killer)

cr0x@server:~$ sysctl net.ipv4.conf.all.rp_filter net.ipv4.conf.enp1s0.rp_filter net.ipv4.conf.enp2s0.rp_filter
net.ipv4.conf.all.rp_filter = 1
net.ipv4.conf.enp1s0.rp_filter = 1
net.ipv4.conf.enp2s0.rp_filter = 1

Meaning: Strict reverse-path filtering is enabled. With multi-homing, it often drops legitimate packets.

Decision: Set to loose (2) or disable (0) depending on your threat model and topology. Loose is the sane default for dual NIC routing.
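When reading these values, note that the per-interface setting is not the whole story. The kernel’s ip-sysctl documentation says the maximum of conf/all/rp_filter and conf/<interface>/rp_filter is used. So all=0 does not turn the check off for an interface still set to 1, and all=2 makes every interface at least loose. Here is a small sketch that computes the effective mode. The helper names are hypothetical, and the /proc/sys paths are the standard ones:

```shell
#!/bin/sh
# rp_filter_mode N (hypothetical helper): name for a numeric rp_filter value.
rp_filter_mode() {
  case "$1" in
    0) echo off ;;
    1) echo strict ;;
    2) echo loose ;;
    *) echo "unknown($1)" ;;
  esac
}

# effective_rp_filter ALL IFACE (hypothetical helper): the kernel applies
# max(all, iface); print the effective value and its name.
effective_rp_filter() {
  local eff="$1"
  [ "$2" -gt "$eff" ] && eff="$2"
  echo "$eff $(rp_filter_mode "$eff")"
}

# show_effective_rp_filter DEV... : live check against /proc/sys.
show_effective_rp_filter() {
  local all dev
  all=$(cat /proc/sys/net/ipv4/conf/all/rp_filter) || return 1
  for dev in "$@"; do
    echo "$dev: $(effective_rp_filter "$all" "$(cat "/proc/sys/net/ipv4/conf/$dev/rp_filter")")"
  done
}
```

For this case: show_effective_rp_filter enp1s0 enp2s0.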

Task 6 — Look for kernel hints: martians, rp_filter drops, neighbor weirdness

cr0x@server:~$ journalctl -k --since "2 hours ago" | tail -n 8
Dec 30 09:11:04 server kernel: IPv4: martian source 192.0.2.10 from 203.0.113.50, on dev enp1s0
Dec 30 09:11:04 server kernel: ll header: 00000000: 00 1b 21 aa bb cc 00 11 22 33 44 55 08 00
Dec 30 09:12:18 server kernel: nf_conntrack: table full, dropping packet

Meaning: “Martian source” often correlates with rp_filter or routing inconsistency. Also: conntrack table is full, which creates drops that look “random.”

Decision: Fix routing first. Then address conntrack sizing if needed. If conntrack is full, you’re not debugging routing—you’re debugging overload too.
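To tell a trickle from a flood, count both message types over the same window. This is a sketch, assuming journald output and the two message strings shown above. count_drop_hints is a hypothetical helper. Martians are only logged when log_martians is enabled (net.ipv4.conf.all.log_martians or the per-interface key), so a zero martian count does not clear rp_filter:

```shell
#!/bin/sh
# count_drop_hints (hypothetical helper): read kernel log lines on stdin and
# count martian reports and conntrack-table-full drops.
count_drop_hints() {
  awk '
    /martian source/                 { martian++ }
    /table full, dropping packet/    { ctfull++ }
    END { printf "martian=%d conntrack_full=%d\n", martian + 0, ctfull + 0 }
  '
}
```

Usage: journalctl -k --since "2 hours ago" | count_drop_hints.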

Task 7 — Check conntrack utilization

cr0x@server:~$ sysctl net.netfilter.nf_conntrack_count net.netfilter.nf_conntrack_max
net.netfilter.nf_conntrack_count = 262144
net.netfilter.nf_conntrack_max = 262144

Meaning: You’re at the ceiling. New flows that need a conntrack entry get dropped; that’s the “table full, dropping packet” line from Task 6. Pain.

Decision: If this host handles many connections (proxies, NAT, busy API), increase max and confirm memory headroom. Also reduce idle timeouts where appropriate.
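A percentage is easier to alert on than two raw counters. This is a minimal sketch that reads the same two values from /proc/sys (they exist only while the nf_conntrack module is loaded). The helper names are hypothetical, and the 80% default threshold is an arbitrary example:

```shell
#!/bin/sh
# conntrack_pct COUNT MAX (hypothetical helper): integer percent in use.
conntrack_pct() {
  [ "$2" -gt 0 ] || { echo 0; return; }
  echo $(( $1 * 100 / $2 ))
}

# conntrack_check [THRESHOLD] (hypothetical helper): print usage and return 1
# when usage reaches THRESHOLD percent (default 80).
conntrack_check() {
  local threshold="${1:-80}" count max pct
  count=$(cat /proc/sys/net/netfilter/nf_conntrack_count) || return 2
  max=$(cat /proc/sys/net/netfilter/nf_conntrack_max) || return 2
  pct=$(conntrack_pct "$count" "$max")
  echo "conntrack: $count/$max ($pct%)"
  [ "$pct" -lt "$threshold" ]
}
```

With the numbers above, conntrack_pct 262144 262144 prints 100 and conntrack_check returns 1.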

Task 8 — Inspect nftables rules and counters for INVALID drops

cr0x@server:~$ nft list ruleset | sed -n '1,120p'
table inet filter {
  chain input {
    type filter hook input priority 0; policy drop;
    ct state established,related accept
    ct state invalid counter packets 1843 bytes 110580 drop
    iif "lo" accept
    tcp dport 22 accept
  }
}

Meaning: INVALID is being dropped, and the counter is climbing. Asymmetric paths are a top cause.

Decision: Don’t “accept INVALID” as a band-aid. Fix routing so packets are consistently tracked.
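“Climbing” is the operative word: one reading tells you little. This is a sketch that samples the INVALID counter twice and prints the difference. It assumes the table and chain names from the ruleset above (inet filter input) and a single ct state invalid counter rule. The helper names are hypothetical:

```shell
#!/bin/sh
# invalid_packets (hypothetical helper): read nft list output on stdin and
# print the packet count of the first "ct state invalid counter" rule.
invalid_packets() {
  awk '/ct state invalid counter/ {
         for (i = 1; i < NF; i++) if ($i == "packets") { print $(i + 1); exit }
       }'
}

# invalid_rate [SECONDS] (hypothetical helper): sample twice (needs root for
# nft) and print how many INVALID packets were counted in between.
invalid_rate() {
  local a b
  a=$(nft list chain inet filter input | invalid_packets)
  sleep "${1:-10}"
  b=$(nft list chain inet filter input | invalid_packets)
  echo "invalid drops in ${1:-10}s: $(( ${b:-0} - ${a:-0} ))"
}
```

Run invalid_rate 30 before and after the routing fix. A non-zero delta that drops to zero is the stability you are looking for.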

Task 9 — Check interface statistics for real drops vs routing drops

cr0x@server:~$ ip -s link show dev enp1s0
2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
    link/ether 00:1b:21:aa:bb:cc brd ff:ff:ff:ff:ff:ff
    RX:  bytes  packets  errors  dropped  missed   mcast
     91433921  812334   0       0        0        1290
    TX:  bytes  packets  errors  dropped  carrier  collsns
     88122301  799221   0       0        0        0

Meaning: No NIC-level drops. So “random drops” likely happen at policy/firewall/conntrack, or upstream.

Decision: Stay focused: routing rules, rp_filter, conntrack, firewall state.

Task 10 — Capture traffic on both NICs to prove asymmetry

cr0x@server:~$ sudo tcpdump -ni enp1s0 host 203.0.113.50 and tcp port 443 -c 6
09:14:22.110001 IP 203.0.113.50.443 > 192.0.2.10.53122: Flags [S.], seq 1200, ack 900, win 65160, options [mss 1460], length 0
09:14:22.110050 IP 203.0.113.50.443 > 192.0.2.10.53122: Flags [.], ack 901, win 65160, length 0
cr0x@server:~$ sudo tcpdump -ni enp2s0 host 203.0.113.50 and tcp port 443 -c 6
09:14:22.110120 IP 192.0.2.10.53122 > 203.0.113.50.443: Flags [.], ack 1201, win 501, length 0
09:14:22.110200 IP 192.0.2.10.53122 > 203.0.113.50.443: Flags [P.], seq 901:1041, ack 1201, win 501, length 140

Meaning: SYN/ACK arrived on enp1s0, but ACK/data left on enp2s0. That’s asymmetry, proven with receipts.

Decision: Implement policy routing so traffic sourced from 192.0.2.10 exits enp1s0 (or make the upstream path symmetric, which is harder).
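Save both captures as text (for example tcpdump -ni enp1s0 -l host 203.0.113.50 > enp1s0.txt, and the same for enp2s0). Counting directions per interface then turns a hunch into a verdict. This is a sketch that understands only tcpdump’s default one-line IPv4 format. flow_dirs is a hypothetical helper:

```shell
#!/bin/sh
# flow_dirs LOCAL_IP (hypothetical helper): read tcpdump -n one-line IPv4
# output on stdin and count packets to LOCAL_IP (in) and from it (out).
flow_dirs() {
  awk -v me="$1." '
    $2 == "IP" && $4 == ">" {
      if (index($3, me) == 1) out++
      else if (index($5, me) == 1) inb++
    }
    END { printf "in=%d out=%d\n", inb + 0, out + 0 }
  '
}
```

Run flow_dirs 192.0.2.10 < enp1s0.txt and flow_dirs 192.0.2.10 < enp2s0.txt. For the capture above, enp1s0 shows only inbound packets and enp2s0 only outbound ones, which is the asymmetric signature.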

Task 11 — Check neighbor/ARP behavior for flux

cr0x@server:~$ ip neigh show nud reachable nud stale | head
192.0.2.1 dev enp1s0 lladdr 00:11:22:33:44:55 REACHABLE
198.51.100.1 dev enp2s0 lladdr 00:11:22:33:44:66 STALE

Meaning: Nothing obviously broken here. But if you see the same neighbor IP reachable via two interfaces, or lots of FAILED entries, suspect ARP flux or L2 design problems.

Decision: If networks overlap or share a VLAN, apply ARP controls (later section) and fix addressing.
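The “same neighbor IP on two interfaces” pattern is easy to miss by eye on a busy host. This is a sketch that flags it from ip neigh show output. dup_neighbors is a hypothetical helper:

```shell
#!/bin/sh
# dup_neighbors (hypothetical helper): read "ip neigh show" output on stdin
# and print each neighbor address seen on more than one interface.
dup_neighbors() {
  awk '
    $2 == "dev" {
      if (!(($1, $3) in seen)) {
        seen[$1, $3] = 1
        devs[$1] = devs[$1] (devs[$1] == "" ? "" : ",") $3
        n[$1]++
      }
    }
    END { for (ip in n) if (n[ip] > 1) print ip, devs[ip] }
  '
}
```

Usage: ip neigh show | dup_neighbors. Empty output is what you want.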

Task 12 — Verify per-source routing once policy routing is added

cr0x@server:~$ ip route get 203.0.113.50 from 192.0.2.10
203.0.113.50 via 192.0.2.1 dev enp1s0 src 192.0.2.10 uid 0
    cache

Meaning: Now the egress matches the source subnet. This is the deterministic behavior you want.

Decision: Re-run tcpdump validation and watch nft counters. If INVALID drops stop climbing, you just bought stability.

Task 13 — Confirm no accidental ECMP defaults remain

cr0x@server:~$ ip route show default
default via 192.0.2.1 dev enp1s0 proto static metric 100
default via 198.51.100.1 dev enp2s0 proto static metric 200

Meaning: You can still keep two defaults with different metrics for failover, but your policy rules must be consistent.

Decision: Prefer a single default in main, and put the other default only in a dedicated table. Mixed models confuse future-you.

Task 14 — Check systemd-networkd state (Debian 13 reality)

cr0x@server:~$ networkctl status enp1s0 | sed -n '1,40p'
● 2: enp1s0
                 Link File: /usr/lib/systemd/network/99-default.link
              Network File: /etc/systemd/network/10-enp1s0.network
                      State: routable (configured)
               Online state: online
                    Address: 192.0.2.10/24
                    Gateway: 192.0.2.1

Meaning: networkd is managing your routing. That’s good if you configure it intentionally, and chaotic if you let DHCP spray defaults.

Decision: Put policy routing into networkd config so it survives reboot and doesn’t depend on a heroic on-call paste.

Fix patterns that actually hold up

Pattern A: Source-based policy routing (recommended for most dual-NIC servers)

Goal: traffic sourced from interface A’s IP uses interface A’s gateway; traffic sourced from interface B’s IP uses interface B’s gateway.
This stops asymmetric replies without needing upstream changes.

You implement this with:

  • Two routing tables (one per NIC)
  • Two ip rule entries, each matching one interface’s source address
  • Routes in each table: connected subnet + default via its gateway

Immediate (runtime) configuration with iproute2

cr0x@server:~$ sudo ip route add 192.0.2.0/24 dev enp1s0 src 192.0.2.10 table 100
cr0x@server:~$ sudo ip route add default via 192.0.2.1 dev enp1s0 table 100
cr0x@server:~$ sudo ip route add 198.51.100.0/24 dev enp2s0 src 198.51.100.10 table 200
cr0x@server:~$ sudo ip route add default via 198.51.100.1 dev enp2s0 table 200
cr0x@server:~$ sudo ip rule add from 192.0.2.10/32 table 100 priority 1000
cr0x@server:~$ sudo ip rule add from 198.51.100.10/32 table 200 priority 1001

This works, but it disappears on reboot unless you persist it. Don’t be that person.
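Hand-typing these during an incident is how a wrong table number reaches production. This is a sketch that prints the three commands for one NIC without running them, so you can review them first. policy_routing_cmds and its argument order are hypothetical:

```shell
#!/bin/sh
# policy_routing_cmds DEV ADDR SUBNET GATEWAY TABLE PRIORITY (hypothetical):
# print the iproute2 commands for one NIC's source-based routing.
policy_routing_cmds() {
  [ $# -eq 6 ] || { echo "usage: policy_routing_cmds DEV ADDR SUBNET GW TABLE PRIO" >&2; return 2; }
  echo "ip route add $3 dev $1 src $2 table $5"
  echo "ip route add default via $4 dev $1 table $5"
  echo "ip rule add from $2/32 table $5 priority $6"
}
```

For this host: policy_routing_cmds enp1s0 192.0.2.10 192.0.2.0/24 192.0.2.1 100 1000 and policy_routing_cmds enp2s0 198.51.100.10 198.51.100.0/24 198.51.100.1 200 1001. Pipe the output to sudo sh only after reading it.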

Persistent configuration with systemd-networkd (Debian 13-friendly)

Example /etc/systemd/network/10-enp1s0.network:

cr0x@server:~$ sudo sed -n '1,200p' /etc/systemd/network/10-enp1s0.network
[Match]
Name=enp1s0

[Network]
Address=192.0.2.10/24
Gateway=192.0.2.1
DNS=192.0.2.53

[RoutingPolicyRule]
From=192.0.2.10/32
Table=100
Priority=1000

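[Route]
Destination=192.0.2.0/24
Scope=link
PreferredSource=192.0.2.10
Table=100

[Route]
Destination=0.0.0.0/0
Gateway=192.0.2.1
Table=100

And /etc/systemd/network/20-enp2s0.network, with no Gateway= under [Network], so main keeps a single default (via enp1s0) and enp2s0’s default lives only in table 200:

cr0x@server:~$ sudo sed -n '1,200p' /etc/systemd/network/20-enp2s0.network
[Match]
Name=enp2s0

[Network]
Address=198.51.100.10/24
DNS=198.51.100.53

[RoutingPolicyRule]
From=198.51.100.10/32
Table=200
Priority=1001

[Route]
Destination=198.51.100.0/24
Scope=link
PreferredSource=198.51.100.10
Table=200

[Route]
Destination=0.0.0.0/0
Gateway=198.51.100.1
Table=200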

Then restart networkd:

cr0x@server:~$ sudo systemctl restart systemd-networkd
cr0x@server:~$ ip rule show | sed -n '1,10p'
0:      from all lookup local
1000:   from 192.0.2.10 lookup 100
1001:   from 198.51.100.10 lookup 200
32766:  from all lookup main
32767:  from all lookup default

Pattern B: One default route, one “special” network (best when one NIC is truly private)

If enp2s0 is strictly a storage network and should never be used for internet or client responses, don’t give it a default route at all.
Give it only the connected subnet route, and maybe a few explicit routes to storage peers.

This eliminates an entire class of outages. The interface becomes “dumb pipe to subnet X.”
The best routing policy is the one you don’t need.
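For a storage-only NIC, the networkd file simply has no Gateway= and no default route. This is a sketch that prints such a file for static addressing. private_nic_network is a hypothetical helper, and the peer subnet 10.20.0.0/16 behind router 198.51.100.254 is a placeholder, not part of this case. If the NIC must use DHCP instead, the [DHCPv4] section’s UseGateway=no option (available in recent systemd versions) keeps the lease from installing a default route:

```shell
#!/bin/sh
# private_nic_network DEV ADDR/PREFIX [PEER_SUBNET PEER_GW] (hypothetical):
# print a systemd-networkd .network file with no Gateway= and no default
# route, optionally adding one explicit route to a storage peer subnet.
private_nic_network() {
  cat <<EOF
[Match]
Name=$1

[Network]
Address=$2
# Intentionally no Gateway=: this NIC must never carry default-route traffic.
EOF
  if [ -n "$3" ] && [ -n "$4" ]; then
    cat <<EOF

[Route]
Destination=$3
Gateway=$4
EOF
  fi
}
```

For example: private_nic_network enp2s0 198.51.100.10/24 10.20.0.0/16 198.51.100.254 | sudo tee /etc/systemd/network/20-enp2s0.network, then restart systemd-networkd.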

Pattern C: fwmark-based routing (for complex apps and VIPs)

If you have multiple source addresses on one interface (VIPs), containers, or you need to steer only certain traffic,
you can mark packets in nftables and route based on fwmark.
This is more powerful and more error-prone. Use it when source-based rules aren’t enough.
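As an illustration only, here is a hypothetical case: send locally generated iSCSI traffic (TCP 3260) via table 200 no matter what main prefers. The nftables chain uses type route, so changing the mark on an outgoing packet triggers a new route lookup. The table name steer, mark 0x1 and rule priority 900 are arbitrary. One sharp edge: the source address is normally chosen before the mark is applied, so the application may also need to bind 198.51.100.10 (or you need SNAT). The sketch only prints the commands:

```shell
#!/bin/sh
# fwmark_steer_cmds PORT MARK TABLE PRIORITY (hypothetical helper): print nft
# and ip commands that mark locally generated TCP traffic to PORT and route
# marked packets via TABLE. Review before piping to "sudo sh".
fwmark_steer_cmds() {
  [ $# -eq 4 ] || { echo "usage: fwmark_steer_cmds PORT MARK TABLE PRIO" >&2; return 2; }
  echo "nft add table ip steer"
  # type route: a mark change in this output chain forces a new route lookup.
  echo "nft add chain ip steer output '{ type route hook output priority mangle; }'"
  echo "nft add rule ip steer output tcp dport $1 meta mark set $2"
  echo "ip rule add fwmark $2 table $3 priority $4"
}
```

Usage: fwmark_steer_cmds 3260 0x1 200 900, then review and pipe to sudo sh.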

rp_filter: set it deliberately, not by superstition

On multi-homed hosts, strict rp_filter is often incompatible with policy routing and legitimate asymmetry.
Loose mode is the usual compromise: it still requires a route back to the source, just not via the same interface.

cr0x@server:~$ sudo sysctl -w net.ipv4.conf.all.rp_filter=2
net.ipv4.conf.all.rp_filter = 2
cr0x@server:~$ sudo sysctl -w net.ipv4.conf.default.rp_filter=2
net.ipv4.conf.default.rp_filter = 2
cr0x@server:~$ sudo sysctl -w net.ipv4.conf.enp1s0.rp_filter=2
net.ipv4.conf.enp1s0.rp_filter = 2
cr0x@server:~$ sudo sysctl -w net.ipv4.conf.enp2s0.rp_filter=2
net.ipv4.conf.enp2s0.rp_filter = 2

Persist via /etc/sysctl.d/99-multihome.conf:

cr0x@server:~$ sudo tee /etc/sysctl.d/99-multihome.conf >/dev/null <<'EOF'
net.ipv4.conf.all.rp_filter=2
net.ipv4.conf.default.rp_filter=2
EOF
cr0x@server:~$ sudo sysctl --system | tail -n 4
* Applying /etc/sysctl.d/99-multihome.conf ...
net.ipv4.conf.all.rp_filter = 2
net.ipv4.conf.default.rp_filter = 2

ARP controls: prevent “answering on the wrong NIC”

If both NICs sit on the same L2 or you have overlapping routes, tune ARP behavior to reduce flux:
arp_ignore and arp_announce.
This is not always required, but when it is, it’s the difference between sanity and interpretive dance.

cr0x@server:~$ sudo sysctl -w net.ipv4.conf.all.arp_ignore=1
net.ipv4.conf.all.arp_ignore = 1
cr0x@server:~$ sudo sysctl -w net.ipv4.conf.all.arp_announce=2
net.ipv4.conf.all.arp_announce = 2

A reliability quote (paraphrased idea)

Paraphrased idea (attributed to Richard Cook): “Success is not the absence of failure; it’s the presence of adaptive capacity.”

Three corporate-world mini-stories

Mini-story 1: The incident caused by a wrong assumption

A mid-sized SaaS company ran Debian servers with two NICs: one public-ish VLAN for customer traffic, one private VLAN for database and backup.
The team assumed replies would leave via the same NIC that received the request. They’d “seen it work” in a lab.

One Monday, customer logins started flapping. Not fully down; just slow, spiky, and weird. The load balancer showed SYN/ACKs coming back late,
and half the TLS handshakes failed. Engineers chased CPU, then certificates, then the load balancer. The graphs were all slightly wrong in different ways.

The actual issue was simple: a DHCP renewal on the private NIC reintroduced a default route with the same metric as the public NIC.
Suddenly, some replies to customer traffic went out the private gateway. The perimeter firewall dropped them because it had no state for that direction.
From the server’s perspective, it “sent” the packet. From the customer’s perspective, the server ghosted.

The fix wasn’t heroic: pin one default route in main, use source-based policy routing for the secondary interface, and stop accepting default routes over DHCP on the backend VLAN.
The postmortem had one line worth framing: “We assumed symmetry; we configured randomness.”

Mini-story 2: The optimization that backfired

Another organization tried to “optimize latency” by enabling strict rp_filter everywhere.
Their security baseline treated it as free spoofing protection. It was rolled out with automation across a fleet that included multi-homed hosts.

A few weeks later, storage replication started dropping under load, but only between certain racks. TCP would establish, transfer some data,
then stall. Retries succeeded. Everyone blamed the storage vendor. Then they blamed MTU. Then someone blamed the switch ASIC, because it’s always the switch ASIC when you’re tired.

It turned out policy routing was already in place, but strict rp_filter didn’t like the return path lookup in certain cases where the “best route” differed from the ingress.
The kernel quietly dropped legitimate packets as martians. Replication fell back to retries and timeouts, turning a routing policy mismatch into a throughput collapse.

The “optimization” made the system brittle. Loose rp_filter restored stability, and they kept spoofing protection where it belonged: at the network edge,
with explicit ACLs, not wishful sysctls.

Mini-story 3: The boring but correct practice that saved the day

A large internal platform team had a rule: every multi-homed host must have a one-page “routing intent” note in the repo.
It described which interface owns which source addresses, what the default route is, and what policy rules exist. No poetry. Just truth.

During a datacenter migration, a new upstream firewall cluster was introduced. A small subset of services began seeing intermittent 502s.
The application teams escalated; the network team swore nothing changed “for those VLANs.” It smelled like an L7 problem, but the packet loss smelled like L3.

The SRE on call opened the routing intent note and immediately noticed the host was designed to reply out a specific NIC using table 200.
A quick ip rule check showed those rules were missing on the new image variant. The system booted fine. It also booted wrong.

They restored the intended networkd config, redeployed, and the issue evaporated. No drama. No war room. No blaming the firewall’s feelings.
Boring documentation plus deterministic config is not glamorous, but it’s the difference between an incident and a Slack message.

Joke #2: The only thing more random than asymmetric routing is the meeting where everyone insists it’s “definitely DNS.”

Common mistakes: symptoms → root cause → fix

1) SSH hangs or stalls intermittently

Symptoms: SSH connects, then pauses; scp stalls; retries help.

Root cause: Replies exit the wrong NIC; stateful firewall or upstream ACL drops return traffic.

Fix: Source-based policy routing per interface; ensure only one default in main; validate with tcpdump on both NICs.

2) nftables shows rising INVALID drops

Symptoms: ct state invalid drop counter climbs; users see random failures.

Root cause: Conntrack sees only one direction due to asymmetry (or conntrack table full).

Fix: Fix routing symmetry first; then size conntrack properly and ensure you’re not unnecessarily tracking high-volume traffic.

3) “Martian source” logs appear

Symptoms: Kernel logs martians; packets appear on “wrong” interface.

Root cause: Strict rp_filter on multi-homed host, or incorrect routes/rules.

Fix: Set rp_filter to loose (2) and implement policy routing so route lookup aligns with reality.

4) Works until DHCP renew, then breaks

Symptoms: Every few hours/days things go sideways; reboot “fixes” temporarily.

Root cause: DHCP installs or changes default routes/metrics; routing becomes non-deterministic again.

Fix: Disable default route via DHCP on secondary NIC, or isolate DHCP to one interface; persist routes via networkd.

5) Only some remote subnets fail

Symptoms: Most clients fine; one region/provider flaps.

Root cause: Those clients’ return path hits a different upstream (different firewall/NAT rules), making asymmetry visible.

Fix: Enforce deterministic egress per source; ensure upstream sees consistent paths.

6) “We set metrics, so it should be fine”

Symptoms: Mostly stable, but failover or partial failures produce blackholes.

Root cause: Metrics decide default route preference, but don’t guarantee replies follow ingress interface, especially with multiple sources.

Fix: Use policy routing; metrics are for preference, not correctness.

7) Storage or replication traffic melts under load

Symptoms: Throughput collapses, retransmits spike, timeouts.

Root cause: Conntrack table exhaustion, or stateful filtering plus asymmetry, or both.

Fix: Avoid tracking traffic that doesn’t need it; size conntrack; fix routing so flows stay consistent.

8) ARP weirdness: traffic arrives on the wrong NIC even with correct routing

Symptoms: Peers send packets to “wrong” MAC; failover behaves strangely.

Root cause: ARP flux: host answers ARP from the wrong interface; or overlapping L2 domains.

Fix: Separate L2 domains; tune arp_ignore/arp_announce; ensure unique subnets and clean VLAN design.

Checklists / step-by-step plan

Step-by-step: stop asymmetric routing on Debian 13 (production-safe order)

  1. Capture current state (before you touch anything):

    • ip -br addr
    • ip route show table main
    • ip rule show
    • sysctl net.ipv4.conf.all.rp_filter

    Decision: confirm you’re actually multi-homed (not just aliases) and whether two defaults exist.

  2. Prove asymmetry with tcpdump on both NICs during one failing flow.

    Decision: if ingress and egress differ for the same flow, proceed with policy routing. If not, you might have MTU, congestion, or upstream filtering.

  3. Decide your intent:

    • One NIC is the default for everything, the other is private-only (best).
    • Both NICs serve real clients/peers and must route correctly (policy routing required).

    Decision: write the intent in a comment in config. “We’ll remember” is not a plan.

  4. Implement policy routing (tables + rules) either via networkd or ifupdown.

    Decision: prefer networkd persistence on Debian 13 if it’s already managing interfaces.

  5. Set rp_filter to loose (2) for multi-homed hosts.

    Decision: if security insists on strict, require them to sign the incident review when it breaks. Loose mode is the realistic compromise.

  6. Validate with ip route get for representative destinations from each source address.

    Decision: if any source chooses the “wrong” gateway, routing tables/rules are incomplete.

  7. Re-check firewall counters (nft INVALID drops, etc.).

    Decision: if INVALID drops still rise, verify conntrack isn’t full and that traffic is not bypassing expected path.

  8. Load test or replay production-like traffic.

    Decision: if failures only appear under load, check conntrack sizing, IRQ saturation, qdisc, and upstream policing—not just routing.

  9. Make it durable:

    • Commit networkd files and sysctl config to your config management.
    • Document routing intent and a “known-good” output of ip rule and ip route show table 100/200.

    Decision: if the fix can be undone by DHCP renew, you didn’t fix it.

Operational checklist: after-change verification (10 minutes)

  • Confirm rules: ip rule show
  • Confirm tables: ip route show table 100, ip route show table 200
  • Confirm rp_filter: sysctl net.ipv4.conf.all.rp_filter
  • Confirm nft counters stabilize: nft list chain inet filter input (or equivalent)
  • Confirm traffic symmetry with tcpdump during a test transaction
  • Confirm no surprise defaults via DHCP after renew (or just wait for renew window)

FAQ

1) Can I just set route metrics and call it done?

Metrics decide preference, not correctness. They don’t guarantee that replies use the same interface as the source address,
and they don’t fix conntrack/stateful device expectations. Use policy routing for correctness; use metrics for preference/failover.

2) Do I really need policy routing if the subnets are different?

If there is only one default route and the other interface has no default, you can often avoid policy routing.
If both interfaces have gateways or you source traffic from both addresses, policy routing is the safe design.

3) What rp_filter setting should I use on dual-NIC hosts?

Typically 2 (loose). Strict (1) frequently drops legitimate traffic in multi-homed setups.
Disable (0) only if you understand the spoofing implications and have compensating controls.

4) Why do drops look random?

Because routing decisions can vary per flow (ECMP hash), per cache entry, or after DHCP renewals.
Add conntrack state and upstream firewall expectations, and you get failures that depend on timing and traffic shape.

5) Does this apply to IPv6 too?

Yes, but the mechanisms differ: IPv6 has no rp_filter sysctl (reverse-path checks live in the firewall, for example an nftables fib expression), and source address selection rules are richer.
Policy routing exists for IPv6 with ip -6 rule and per-table routes; test carefully because IPv6 has multiple valid source addresses by design.

6) I see “nf_conntrack: table full.” Is that routing?

Not directly, but it creates packet loss that looks similar. Fix routing asymmetry first, then size conntrack.
If you’re tracking millions of short-lived flows, you may need both more capacity and better filtering strategy.

7) Should I accept INVALID packets to stop drops?

No. That’s like disabling a smoke alarm because it’s loud. INVALID often means asymmetry, timeout mismatch, or attack traffic.
Fix the path so packets become valid again.

8) How do I handle failover between gateways?

If you truly need failover, keep deterministic routing per source and use controlled mechanisms:
dynamic routing (FRR, with BFD for fast failure detection), tracked routes, or explicit metric changes with automation. Avoid “two equal defaults” unless you deliberately built a multipath route and understand the end-to-end path.

9) What about bonding (LACP)? Wouldn’t that solve it?

Bonding solves a different problem: link redundancy/aggregation at L2. It can help if both links are to the same L2 domain and you want one logical interface.
It does not replace correct L3 routing when the networks/gateways are different.

10) Why does it only break when we add a firewall?

Because stateful devices enforce symmetry for the flows they track. Without them, the internet might tolerate the asymmetry.
Once you introduce state, the path becomes part of the contract.

Next steps you should take this week

Fixing case #53 isn’t just about making the drops stop today. It’s about making sure the next change doesn’t resurrect them.
Here’s the practical to-do list that survives staff turnover and “temporary” network experiments.

  1. Decide and document routing intent for every dual-NIC host: which NIC owns which traffic, and why.
  2. Eliminate accidental dual defaults: one default in main, or explicit policy routing with separate tables.
  3. Set rp_filter consciously: loose mode for multi-homing, persist via sysctl.d.
  4. Validate with three tools: ip route get, tcpdump on both NICs, and firewall counters (nft/conntrack).
  5. Make config persistent in systemd-networkd (or your chosen network manager) and commit it to automation.
  6. Watch conntrack if you’re stateful: capacity, timeouts, and whether you’re tracking traffic you don’t need.
  7. Run a post-change rehearsal: simulate a gateway flap if you claim failover, and see what actually happens.

Dual NIC routing isn’t hard. What’s hard is pretending it’s not there until it picks a busy day to remind you.
Make it deterministic and go back to solving problems that are at least interesting.
