Linux Firewall: The Clean nftables Layout That Stays Readable at 500 Rules

You don’t notice a firewall until it hurts. Usually at 02:13, when “a tiny change” turns SSH into a ghost,
your on-call phone into a smoke alarm, and your CEO into a part-time network engineer.

nftables can be elegant and fast—but only if you stop treating it like a bigger iptables.
This is a production layout that stays readable when the ruleset grows past 500 rules, survives audits,
and lets you debug like an adult.

Why layout matters more than syntax

nftables is not hard because of its syntax. It’s hard because your ruleset is a living system:
it grows by accretion, it’s modified under pressure, and it’s read by people who didn’t write it
(including Future You, who is the least forgiving reviewer alive).

When you cross ~200 rules, two things happen:

  • Change risk spikes. A small change can move a packet path from “accept” to “drop” with no obvious diff signal.
  • Debug time explodes. You stop reasoning about policy and start grepping for strings and praying.

The cure is structure. Not “comments everywhere” structure. Real structure:
predictable chain boundaries, strict naming, sets/maps for identity, and a ruleset that reflects how packets actually flow.

Opinion: if your nftables file is a single 1,200-line blob, you don’t have a firewall. You have a future incident report.

A few facts and historical context (so you stop repeating 2012 mistakes)

Short, concrete context points that matter in production:

  1. nftables entered the mainline Linux kernel in 3.13 (2014). It’s not “new” anymore; your bad habits are.
  2. iptables is really a front-end to netfilter. nftables is a newer front-end, designed to replace the iptables family and reduce duplication.
  3. nftables uses a VM-like bytecode approach. That’s why rules can be expressed compactly, and why sets/maps can radically reduce rule count.
  4. iptables historically had separate tools for IPv4/IPv6/arptables/ebtables. nftables unifies them, which is great—until you accidentally “unify” two policies that should remain separate.
  5. Conntrack (stateful tracking) long predates nftables. nftables didn’t invent statefulness; it made it easier to apply consistently.
  6. nftables supports atomic ruleset replacement. This is operational gold: you can load a new ruleset as a transaction, not a risky sequence of edits.
  7. Sets were a breakthrough for performance and readability. Instead of 200 “ip saddr X accept” lines, you get one rule plus a set definition.
  8. nftables has better introspection primitives. Counters, handles, and structured listings are designed for debugging and tooling—not just for humans staring at text.
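
Point 7 in miniature—the before/after that sets buy you (addresses here are illustrative, not from any real policy):

  # Before: one rule per source, 200 times over
  ip saddr 198.51.100.10 tcp dport 22 accept
  ip saddr 198.51.100.11 tcp dport 22 accept
  # ...198 more near-identical lines...

  # After: identity lives in a set, behavior lives in one rule
  set set_mgmt_v4 {
    type ipv4_addr
    elements = { 198.51.100.10, 198.51.100.11 }
  }

  ip saddr @set_mgmt_v4 tcp dport 22 accept

Adding the 201st management host is now a data change, not a logic change—and the diff shows exactly that.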

One quote that operations folks should tattoo on their change process:
Hope is not a strategy. — widely attributed to General Gordon R. Sullivan

Design principles that scale past 500 rules

1) One packet path, one story

A readable ruleset tells the story of a packet. For example: inbound packet hits input → sanity checks →
established/related → “allowed services” → rate-limited logs → drop.

If your input chain alternates between “allow nginx” and “block bogons” and “allow monitoring” and “drop fragments” in no order,
you’ve forced the reader to simulate the entire chain in their head. That doesn’t scale.

2) Default deny at chain boundaries, not sprinkled everywhere

“Default deny” is not “drop packets in 90 places.” It’s one deliberate policy decision at the end of a path
(or as a chain policy), with exceptions pulled forward.

Sprinkling random drop rules makes auditing impossible and creates “shadow policies” that nobody remembers.

3) Use sets/maps for identity; use chains for behavior

Sets and maps are your readability budget. You spend them to keep rules short.
Chains express behavior and ordering. If you do the opposite (chains for identity, rules for behavior),
you’ll end up with a combinatorial mess.
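
One way to keep that split honest is a verdict map: the map holds identity (interfaces, in this sketch), and each behavior lives in its own named chain. The chain names from_wan and from_lan are made up for illustration and would need to exist in the same table:

  map map_if_trust {
    type ifname : verdict
    elements = { "eth0" : jump from_wan, "eth1" : jump from_lan }
  }

  chain input {
    # one line of dispatch instead of interleaved per-interface rules
    iifname vmap @map_if_trust
  }

Adding an interface means adding a map element; changing what a trust level does means editing one chain. The two concerns never collide in a diff.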

4) Separate “edge” from “service host” from “transit”

The fastest way to create a broken ruleset is to mix router logic (forwarding/NAT), host logic (local services),
and “this box is also a VPN endpoint” logic in one chain.

Use separate tables or at least separate include files. Make it impossible to accidentally change NAT while “just opening a port.”

5) Be explicit about what’s allowed to talk to the box itself

Most production pain comes from control-plane traffic: SSH, config management, monitoring, time sync, service discovery.
Treat it as a first-class policy, not an afterthought.

6) Logging must be useful or it must not exist

Log storms don’t “help debugging.” They help your SIEM bill.
Log only at decision points, rate-limit it, and include a prefix you can grep.

Joke #1: If your firewall logs everything, congratulations—you’ve invented a very expensive random number generator.

7) Prefer atomic reloads, and test like you mean it

Production rule changes should be: validate → dry-run (or at least parse) → apply atomically → verify counters/behavior.
Not “edit live in a terminal and hope the TCP session stays up.”

8) Don’t chase micro-optimizations until you can explain your ruleset

nftables is fast. Your real bottleneck is usually not “two extra comparisons,” it’s “nobody knows which rule is active.”
Optimize for operability first. Your future incidents will thank you.

The reference layout: files, tables, chains, and naming

File layout (the part auditors love)

Keep /etc/nftables.conf tiny. It should load your real rules from a directory.
This makes reviews easy, enables partial ownership (network team owns NAT file, platform team owns services file),
and prevents merge conflicts that look like spaghetti.

  • /etc/nftables.conf — entrypoint
  • /etc/nftables.d/00-defs.nft — constants, sets, maps, interface names
  • /etc/nftables.d/10-filter-base.nft — base chains: input/output/forward skeleton
  • /etc/nftables.d/20-filter-services.nft — service allows (ingress to local services)
  • /etc/nftables.d/30-filter-management.nft — SSH/monitoring/config management
  • /etc/nftables.d/40-nat.nft — NAT (only if needed)
  • /etc/nftables.d/90-debug.nft — optional debug rules (disabled by default)
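
A minimal sketch of that debug include—rules ship commented out, so loading the file is a no-op until you deliberately enable something:

  table inet filter {
    chain in_debug {
      # Uncomment during an incident, reload atomically, revert when done:
      # tcp dport 22 meta nftrace set 1
      # tcp dport 22 limit rate 5/second log prefix "nft dbg ssh "
    }
  }

An unreferenced chain loads fine and costs effectively nothing; wire a jump in_debug into input only while you are actually debugging.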

Naming conventions that survive teams

Names aren’t vanity. They’re the only way to debug quickly without paging the person who wrote the rules three years ago.
Use predictable prefixes:

  • Tables: inet filter, ip nat (and ip6 nat only if you must)
  • Base chains: input, output, forward
  • User chains: in_sanity, in_established, in_allow_mgmt, in_allow_services, in_log_drop
  • Sets: prefix with set_ and describe content: set_mgmt_v4, set_bogon_v4, set_allowed_tcp_services
  • Maps: prefix with map_ and describe mapping: map_if_trust, map_service_ports

Why “inet filter” is the default table

Use table inet filter for host firewalling when possible. You get one policy for IPv4 and IPv6,
fewer duplicated rules, and fewer “we forgot IPv6 existed” incidents.

But don’t be dogmatic: modern kernels (Linux 5.2+) also support NAT in the inet family, though many deployments still keep it in ip/ip6.
Either way, keep NAT separate from filter. Mixing them is like storing chainsaws in the cutlery drawer.
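
What “separate” looks like in practice—a sketch of /etc/nftables.d/40-nat.nft for a box that also routes; interfaces and internal subnets are illustrative, and $IF_WAN is assumed to come from the defs file:

  table ip nat {
    chain prerouting {
      type nat hook prerouting priority dstnat; policy accept;
      # illustrative port-forward to an internal host
      iifname $IF_WAN tcp dport 8080 dnat to 192.168.1.10:80
    }

    chain postrouting {
      type nat hook postrouting priority srcnat; policy accept;
      # masquerade LAN clients leaving via the WAN interface
      oifname $IF_WAN ip saddr 192.168.0.0/16 masquerade
    }
  }

Opening a port on the host now never touches this file, and a NAT refactor never touches the filter table.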

Chain structure: small, purposeful, and boring

Base chains should be thin and mostly just jump to sub-chains. Sub-chains should each have one job.
When you do this, counters and traces become meaningful: you can see which stage drops traffic.

A clean ruleset example (with includes)

This is a layout that stays readable at 500 rules because the rule count lives in sets, not in repeated lines.
It also keeps debugging predictable because packet flow is staged.

cr0x@server:~$ sudo sed -n '1,160p' /etc/nftables.conf
#!/usr/sbin/nft -f

flush ruleset

include "/etc/nftables.d/*.nft"
cr0x@server:~$ sudo sed -n '1,260p' /etc/nftables.d/00-defs.nft
define IF_WAN = "eth0"
define IF_LAN = "eth1"
define IF_LOOP = "lo"

table inet filter {
  set set_mgmt_v4 {
    type ipv4_addr
    flags interval
    # auto-merge: overlapping elements (the .10 host inside the /24) coalesce
    # instead of failing to load
    auto-merge
    elements = { 198.51.100.10, 198.51.100.0/24 }
  }

  set set_mgmt_v6 {
    type ipv6_addr
    flags interval
    elements = { 2001:db8:100::/64 }
  }

  set set_bogon_v4 {
    type ipv4_addr
    flags interval
    elements = { 0.0.0.0/8, 10.0.0.0/8, 100.64.0.0/10, 127.0.0.0/8, 169.254.0.0/16,
                 172.16.0.0/12, 192.0.2.0/24, 192.168.0.0/16, 198.18.0.0/15, 224.0.0.0/4, 240.0.0.0/4 }
  }

  set set_allowed_tcp_services {
    type inet_service
    elements = { 22, 80, 443, 9100 }
  }

  set set_allowed_udp_services {
    type inet_service
    elements = { 123 }
  }
}
cr0x@server:~$ sudo sed -n '1,260p' /etc/nftables.d/10-filter-base.nft
table inet filter {
  chain input {
    type filter hook input priority 0; policy drop;

    iifname $IF_LOOP accept

    jump in_sanity
    jump in_established
    jump in_allow_mgmt
    jump in_allow_services
    jump in_log_drop
  }

  chain forward {
    type filter hook forward priority 0; policy drop;

    jump fwd_sanity
    jump fwd_established
    jump fwd_allow
    jump fwd_log_drop
  }

  chain output {
    type filter hook output priority 0; policy accept;

    jump out_sanity
  }

  chain in_sanity {
    ct state invalid drop

    ip saddr @set_bogon_v4 drop
    ip6 saddr ::/128 drop

    # return, not accept: "accept" is a final verdict even in a sub-chain,
    # and would bypass the mgmt/services stages for every matching packet
    meta l4proto { tcp, udp, icmp, ipv6-icmp } return
    drop
  }

  chain in_established {
    ct state { established, related } accept
  }

  chain in_log_drop {
    limit rate 10/second burst 20 packets log prefix "nft in drop " flags all counter
    drop
  }

  chain out_sanity {
    ct state invalid drop
  }

  chain fwd_sanity {
    ct state invalid drop
  }

  chain fwd_established {
    ct state { established, related } accept
  }

  chain fwd_log_drop {
    limit rate 10/second burst 20 packets log prefix "nft fwd drop " flags all counter
    drop
  }
}
cr0x@server:~$ sudo sed -n '1,260p' /etc/nftables.d/30-filter-management.nft
table inet filter {
  chain in_allow_mgmt {
    tcp dport 22 ip saddr @set_mgmt_v4 accept
    tcp dport 22 ip6 saddr @set_mgmt_v6 accept
  }
}
cr0x@server:~$ sudo sed -n '1,300p' /etc/nftables.d/20-filter-services.nft
table inet filter {
  chain in_allow_services {
    tcp dport { 80, 443 } accept
    tcp dport 9100 accept
    udp dport 123 accept
  }
}

Notes on the example:

  • Base chains are short. They jump into staged chains, which makes packet flow easy to narrate.
  • Policy drop on input/forward. Output is accept by default for most servers; lock it down only when you have a reason.
  • Sanity chain is strict. It drops invalid early and handles bogons. It also has a “protocol allowlist” for L4 primitives so nonsense is dropped fast.
  • Management traffic is isolated. SSH allow rules are in a dedicated chain with source restrictions.
  • Logging is rate-limited and late. You log only drops at the end of the input/forward paths.

Joke #2: “Temporary” firewall rules have the same lifecycle as “temporary” tables in the database: they outlive your team.

Practical tasks: commands, outputs, and what you decide from them

These are real tasks you’ll do on real boxes. Each includes the command, a realistic-looking output,
what the output means, and the decision you make.

Task 1: Confirm nftables is active and which loader your distro uses

cr0x@server:~$ systemctl status nftables
● nftables.service - nftables
     Loaded: loaded (/lib/systemd/system/nftables.service; enabled; preset: enabled)
     Active: active (exited) since Tue 2026-02-04 09:11:22 UTC; 2h 13min ago
       Docs: man:nft(8)
    Process: 612 ExecStart=/usr/sbin/nft -f /etc/nftables.conf (code=exited, status=0/SUCCESS)
   Main PID: 612 (code=exited, status=0/SUCCESS)

Meaning: The service applied the ruleset and exited cleanly; “active (exited)” is normal for a oneshot service like nftables.

Decision: If this is inactive/failed, you fix service management first; debugging rules without a consistent loader is chaos.

Task 2: Dump the active ruleset exactly as the kernel sees it

cr0x@server:~$ sudo nft list ruleset
table inet filter {
	chain input {
		type filter hook input priority filter; policy drop;
		iifname "lo" accept
		jump in_sanity
		jump in_established
		jump in_allow_mgmt
		jump in_allow_services
		jump in_log_drop
	}
	chain in_allow_mgmt {
		tcp dport 22 ip saddr @set_mgmt_v4 accept
		tcp dport 22 ip6 saddr @set_mgmt_v6 accept
	}
	chain in_log_drop {
		limit rate 10/second burst 20 packets log prefix "nft in drop " flags all counter
		drop
	}
}

Meaning: This is the truth. Not your file, not your git repo—the active kernel ruleset.

Decision: If the output doesn’t match your expected includes, you have a loader mismatch or stale config deployment.

Task 3: Validate syntax before you apply it (prevent self-lockout)

cr0x@server:~$ sudo nft -c -f /etc/nftables.conf

Meaning: No output and exit code 0 means parse/check succeeded.

Decision: If validation fails, fix that first; do not “try it live.” If you must apply remotely, validation is non-negotiable.

Task 4: Apply changes atomically and verify success

cr0x@server:~$ sudo nft -f /etc/nftables.conf

Meaning: No output means the transaction committed. nft loads the file atomically—either the whole ruleset applies or nothing changes.

Decision: Immediately follow with counters checks (Task 7) and service reachability tests. “Loaded” is not “correct.”
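
If you’re applying over SSH and nervous (you should be), snapshot the live ruleset first and arm a timed rollback you cancel only after proving you still have access. A sketch assuming at(1) is installed; the backup path is illustrative:

cr0x@server:~$ sudo nft list ruleset > /root/nft-last-good.nft
cr0x@server:~$ echo 'nft -f /root/nft-last-good.nft' | sudo at now + 2 minutes
cr0x@server:~$ sudo nft -f /etc/nftables.conf

Open a second session to verify SSH still works, then cancel the pending rollback with atrm (job number from sudo atq). If you locked yourself out, you do nothing—the rollback restores the last known good ruleset.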

Task 5: Show chain handles (for precise deletions, audits, and tooling)

cr0x@server:~$ sudo nft -a list chain inet filter input
table inet filter {
	chain input { # handle 1
		type filter hook input priority filter; policy drop;
		iifname "lo" accept # handle 5
		jump in_sanity # handle 6
		jump in_established # handle 7
		jump in_allow_mgmt # handle 8
		jump in_allow_services # handle 9
		jump in_log_drop # handle 10
	}
}

Meaning: Handles are stable identifiers for rules in the active ruleset.

Decision: If you need to surgically remove one rule during an incident, use handles, not line-number guessing.
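
The surgical removal itself is one command—the handle value here is from the listing above and will differ on your box, so always re-run nft -a list first:

cr0x@server:~$ sudo nft delete rule inet filter input handle 9

Re-list with -a afterwards to confirm exactly one rule disappeared and nothing else shifted.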

Task 6: List sets and confirm they loaded as intended

cr0x@server:~$ sudo nft list set inet filter set_mgmt_v4
table inet filter {
	set set_mgmt_v4 {
		type ipv4_addr
		flags interval
		elements = { 198.51.100.0/24 }
	}
}

Meaning: Your identity data is present. Interval sets may coalesce entries.

Decision: If a management allow depends on this set, confirm it before blaming “SSH issues” on the network.

Task 7: Check counters to see what’s actually being hit

cr0x@server:~$ sudo nft list chain inet filter in_log_drop
table inet filter {
	chain in_log_drop {
		limit rate 10/second burst 20 packets log prefix "nft in drop " flags all counter packets 41 bytes 2870
		drop
	}
}

Meaning: 41 packets hit the drop logger. That’s not theoretical—something is being denied.

Decision: If counters spike after a deployment, you bisect policy changes. If counters are zero but users complain, the issue is elsewhere (routing, app, upstream ACL).

Task 8: Verify conntrack state behavior (common source of “it works once” bugs)

cr0x@server:~$ sudo conntrack -S
cpu=0 found=18231 invalid=12 ignore=0 insert=40121 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=0

Meaning: You have invalid packets (12). Some invalid is normal on the internet; huge numbers suggest asymmetric routing or broken offload.

Decision: If invalid climbs rapidly, investigate routing symmetry and offload settings; don’t just “allow invalid” to make graphs look calm.

Task 9: Use nft monitor during a change window

cr0x@server:~$ sudo nft monitor
add rule inet filter in_allow_services tcp dport 8443 counter accept

Meaning: Live view of rules being added/removed. Great for verifying automation did what it claims.

Decision: If you see unexpected churn, stop and review deployment tooling; config management loops can thrash your ruleset.

Task 10: Trace a packet path (the fastest way to find “which rule ate it”)

cr0x@server:~$ sudo nft insert rule inet filter input meta nftrace set 1
cr0x@server:~$ sudo nft monitor trace
trace id 3c2a inet filter input packet: iif "eth0" ip saddr 203.0.113.55 ip daddr 198.51.100.20 tcp sport 51234 tcp dport 22
trace id 3c2a inet filter input rule jump in_sanity
trace id 3c2a inet filter in_sanity verdict continue
trace id 3c2a inet filter input rule jump in_established
trace id 3c2a inet filter in_established verdict continue
trace id 3c2a inet filter input rule jump in_allow_mgmt
trace id 3c2a inet filter in_allow_mgmt verdict continue
trace id 3c2a inet filter input rule jump in_allow_services
trace id 3c2a inet filter in_allow_services verdict continue
trace id 3c2a inet filter input rule jump in_log_drop
trace id 3c2a inet filter in_log_drop verdict drop

Meaning: This trace shows the evaluation path. The packet is not in your mgmt source set, so it doesn’t match accept rules and falls through to drop later.

Decision: You either add the source to set_mgmt_v4 (if legitimate) or you tell the user “no, that’s the point.”

Task 11: Watch kernel logs for rate-limited firewall drops

cr0x@server:~$ sudo journalctl -k -n 5
Feb 04 11:22:19 server kernel: nft in drop IN=eth0 OUT= MAC=52:54:00:aa:bb:cc SRC=203.0.113.55 DST=198.51.100.20 LEN=60 TOS=0x00 PREC=0x00 TTL=49 ID=53113 DF PROTO=TCP SPT=51234 DPT=22 WINDOW=64240 RES=0x00 SYN URGP=0
Feb 04 11:22:20 server kernel: nft in drop IN=eth0 OUT= MAC=52:54:00:aa:bb:cc SRC=203.0.113.8 DST=198.51.100.20 LEN=52 TOS=0x00 PREC=0x00 TTL=51 ID=0 DF PROTO=TCP SPT=60211 DPT=443 WINDOW=1024 RES=0x00 SYN URGP=0

Meaning: Your logging prefix works, and you can correlate with service complaints.

Decision: If you see legitimate traffic dropped, fix allow rules. If you see garbage, keep logging minimal and consider upstream filtering.

Task 12: Confirm you didn’t forget IPv6

cr0x@server:~$ ip -6 addr show dev eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500
    inet6 2001:db8:200::20/64 scope global
       valid_lft forever preferred_lft forever

Meaning: The host has a global IPv6 address. If you only wrote IPv4 rules, you may have accidentally exposed services over v6—or blocked them unexpectedly.

Decision: Use inet table for filter, confirm v6 sources in management sets, and explicitly allow/deny services over IPv6 as intended.

Task 13: Confirm that a port is actually listening before blaming the firewall

cr0x@server:~$ sudo ss -lntp | head
State  Recv-Q Send-Q Local Address:Port  Peer Address:Port Process
LISTEN 0      4096   0.0.0.0:22         0.0.0.0:*     users:(("sshd",pid=811,fd=3))
LISTEN 0      4096   0.0.0.0:80         0.0.0.0:*     users:(("nginx",pid=1002,fd=6))
LISTEN 0      4096   0.0.0.0:443        0.0.0.0:*     users:(("nginx",pid=1002,fd=7))

Meaning: Services are listening on IPv4. If users can’t connect, the firewall may be at fault—or routing/security groups/upstream ACLs.

Decision: If it’s not listening, fix the service. If it is listening, continue with nft counters/trace and upstream checks.

Task 14: Spot rule bloat by counting rules and hunting duplicates

cr0x@server:~$ sudo nft list ruleset | wc -l
892

Meaning: Line count is a crude metric, but it’s a smell test. If you expected 250 lines and got 892, you likely duplicated rules or expanded generated content.

Decision: Convert repeated literals to sets/maps, split includes by domain, and stop generating near-duplicates in automation.

Task 15: Confirm offload/fast path isn’t undermining visibility

cr0x@server:~$ ethtool -k eth0 | egrep 'gro|gso|tso|rx-checksumming|tx-checksumming'
rx-checksumming: on
tx-checksumming: on
tcp-segmentation-offload: on
generic-segmentation-offload: on
generic-receive-offload: on

Meaning: Offloads are enabled. That’s fine, but some environments get weird with conntrack and certain encapsulations.

Decision: If you see high invalid conntrack or odd traces, test disabling specific offloads during a maintenance window—not as a random superstition.

Fast diagnosis playbook

When you’re under pressure, you don’t “review the ruleset.” You run a tight loop that finds the bottleneck quickly.
Here’s a fast sequence that works for most “traffic is blocked” or “traffic is slow” reports.

First: prove the service and the socket

  1. Check if the process is listening on the expected IP/port (ss -lntp).
  2. Confirm local connectivity (curl to localhost or nc -vz 127.0.0.1 PORT if applicable).
  3. Confirm it’s listening on IPv6 if clients use v6 (look for [::] local addresses in the ss -lntp output).

If the service isn’t listening, the firewall is innocent. Treat innocence as a valuable asset.

Second: prove packets arrive at the host

  1. Capture at the interface for a short window (tcpdump -ni eth0 port 443).
  2. If nothing arrives, investigate routing, upstream firewall, load balancer health checks, or security groups.

Third: prove nftables is the decision point

  1. Check counters on your drop/log chain(s) (nft list chain with counters).
  2. Enable a short-lived trace using nftrace and nft monitor trace to locate the exact stage.
  3. Confirm sets/maps contain expected elements (management IPs, service ports).

Fourth: isolate the specific policy break

  1. Is the traffic new or existing? If existing flows work but new ones fail, conntrack state or a new rule ordering is likely.
  2. Is it IPv4-only, IPv6-only, or both? “Both” often means chain policy; “one” often means missing family rules or inet misuse.
  3. Is there NAT involved? If yes, check NAT tables separately; don’t chase filter rules for a NAT bug.

Fifth: fix with minimal blast radius

  1. Prefer adding to an existing set/map rather than adding a new one-off rule.
  2. Prefer adding an allow rule in the correct dedicated chain (mgmt vs services) rather than in base chain.
  3. Reload atomically and verify with counters and a synthetic test.

Common mistakes: symptoms → root cause → fix

1) Symptom: SSH works from the office but not from VPN

Root cause: Management allow is bound to a source set that doesn’t include VPN egress ranges, or the VPN uses IPv6 and you only allowed v4.

Fix: Add VPN egress subnet to set_mgmt_v4/set_mgmt_v6. Validate with trace, then reload atomically.

2) Symptom: “It worked after restart, then broke later”

Root cause: Conntrack state allowed established flows, but new flows hit a new deny rule, or time-based changes (DHCP, dynamic IPs) moved clients out of allowed sets.

Fix: Check ct state placement; keep established,related early. For dynamic client IPs, use stable identity (VPN address pool, bastion) rather than random office ISP ranges.

3) Symptom: IPv6 exposure or IPv6 outage “out of nowhere”

Root cause: You wrote v4-only rules while the host has global v6, or you used ip table for filter thinking it covers both families.

Fix: Use table inet filter for host firewalling. Explicitly allow/deny services over v6. Verify with ip -6 addr and nft list ruleset.

4) Symptom: High CPU on a busy edge box after “improving logging”

Root cause: Logging too early in the chain, logging accepts, or missing rate limits causes kernel log overhead and userspace processing storms.

Fix: Log only final drops, rate-limit, and use concise prefixes. If you need observability, use counters and tracing surgically.

5) Symptom: Packet drops appear random during high traffic

Root cause: Conntrack table pressure, invalid state spikes, or asymmetric routing causing established flows to look invalid.

Fix: Check conntrack stats (conntrack -S), ensure routing symmetry, and don’t “accept invalid” as a band-aid. Investigate offload and encapsulation interactions.

6) Symptom: Automation keeps “fixing” your manual emergency change

Root cause: Config management re-applies a desired state without acknowledging incident edits.

Fix: During incident: update the source-of-truth repo quickly (even a hotfix branch), or temporarily pause the nftables role on that host. After incident: codify the emergency change properly (ideally as set elements, not ad-hoc rules).

7) Symptom: NAT works for some hosts but not others

Root cause: NAT and filter interleaved in confusing ways; forward chain allows traffic but postrouting masquerade is missing or too narrow.

Fix: Separate NAT into ip nat (and ip6 nat if needed). Verify with packet captures and counters. Keep forward allows and NAT rules aligned by interface/subnet identity.

Three corporate mini-stories (because you’ll recognize the smell)

Mini-story 1: The incident caused by a wrong assumption

A mid-size SaaS company moved from legacy iptables scripts to nftables using a “simple translation.”
The team assumed their old rule order didn’t matter because “it’s just allow lists.”
They also assumed IPv6 was irrelevant because their load balancer terminated connections over IPv4.

The new nftables ruleset went live during a quiet afternoon. It passed the basic smoke test:
the web service was reachable, and SSH from the bastion worked. Everyone went home feeling mature.
Two hours later, monitoring started showing sporadic failures from a subset of nodes in one region.

The real issue: those nodes preferred IPv6 for internal service discovery. The firewall had an ip family filter table,
and the IPv6 traffic was hitting a mostly-empty path. Some services were unintentionally exposed; others were silently blocked,
depending on which process bound to [::].

The debugging took longer than it should have because the ruleset wasn’t staged.
There was no “management chain,” no “services chain,” just a long list of mixed rules. Tracing would have shown the break in minutes,
but nobody had a trace workflow or a safe debug include.

The fix was boring: consolidate host firewalling into table inet filter, stage the chain flow,
and make IPv6 an explicit decision instead of an accidental side effect.
The postmortem’s most useful line was: “IPv6 is not a feature. It’s a thing that already exists.”

Mini-story 2: The optimization that backfired

A financial services team ran a high-traffic API tier and wanted to squeeze latency.
Someone noticed “too many rules” and decided to compress the firewall by moving a bunch of logic into clever maps,
doing aggressive early drops, and logging every reject for “better security visibility.”

On paper, it looked great: fewer lines in the ruleset and lots of structured matching.
In reality, they created a ruleset that nobody could read during an incident.
The “clever map” encoded multiple behaviors (allow, drop, log) in a way that required a mental compiler to understand.

The backfire came with load: log volume spiked, kernel time increased, and the SIEM pipeline started lagging.
Under pressure, someone disabled logging entirely—by deleting a shared chain used in three places—because the layout did not separate concerns.
That took down visibility right when the team needed it.

The recovery was a redesign: maps used only for identity (port groups and source groups),
behavior stayed in explicit chains, and logging moved to end-of-path drop chains with strict rate limits.
Latency improved slightly, but the real win was operational: the next incident took 15 minutes instead of half a day.

Mini-story 3: The boring but correct practice that saved the day

A large enterprise ran thousands of Linux VMs with nftables managed by configuration management.
Their security policy changed quarterly, and every quarter there was a fear: “this is the change that locks us out.”
One team quietly implemented a practice nobody celebrated: every nftables change had to be validated with nft -c,
deployed atomically, and verified with counters plus one synthetic connection test from a known probe host.

Months later, a rushed change landed late on a Friday (because of course it did).
The new policy accidentally removed a required UDP port for time sync on a subset of hosts.
The change deployed, and within minutes, their verification pipeline flagged a counter spike in the drop chain and failing synthetic checks.

The on-call didn’t need to guess. They had a staged ruleset, so the counters pointed straight at in_allow_services missing UDP 123.
They updated the service port set, re-validated, and reloaded atomically. Downtime was limited and contained.
Nobody wrote a heroic Slack thread. That’s how you know it was good engineering.

The lesson: the most valuable firewall feature is not a match expression. It’s a disciplined change process that assumes humans will be tired.

Checklists / step-by-step plan

Step-by-step: build a readable ruleset that scales

  1. Choose the table families: Use inet for filter, separate ip/ip6 for NAT if needed.
  2. Define interfaces once: Use define IF_WAN, IF_LAN, etc. Avoid magic strings in rules.
  3. Create identity sets: management sources, monitoring sources, internal subnets, allowed service ports.
  4. Create staged chains: sanity → established → management → services → log/drop.
  5. Keep base chains thin: Base chains should mostly jump; don’t hide policy in them.
  6. Log only at end-of-path: One drop logger per path (input/forward), rate-limited, with consistent prefixes.
  7. Enforce ordering: Make sure ct state established,related is early. Invalid drops are early too.
  8. Use atomic reloads: Always deploy via nft -f after nft -c.
  9. Verify with counters: Confirm expected chains increment under traffic; confirm drop counters stay reasonable.
  10. Keep debug tooling ready: Maintain a disabled-by-default debug include you can enable in emergencies (trace, temporary logging).

Checklist: pre-change safety for remote hosts

  • Out-of-band access exists (console/ILO/IPMI/serial) or a known bastion path is tested.
  • nft -c -f /etc/nftables.conf succeeds.
  • You have a rollback method: last known good config available locally.
  • You’re not mixing “NAT refactor” with “open one port.”
  • You have a live verification command ready (curl, nc, or a monitoring probe).

Checklist: ongoing hygiene (the stuff that keeps 500 rules readable)

  • Every new allow rule must land in the correct chain (mgmt vs services vs transit).
  • Every repeated literal (same subnet/port group) becomes a set within two iterations.
  • Every log rule is rate-limited and has an agreed prefix.
  • Ruleset changes are reviewed with “packet story” in mind: can a reader narrate the path in 60 seconds?
  • Audit quarterly: prune dead ports, prune dead source ranges, collapse duplicates into sets.

FAQ

1) Should I use chain policy drop or an explicit final drop rule?

For base chains like input and forward, a chain policy of drop is clean and obvious.
Still keep an explicit in_log_drop chain that logs and drops—because a policy drop doesn’t log by itself.

2) Why not log every drop in multiple places?

Because you’ll drown. Centralize logging at the end of the path, rate-limit it, and use counters elsewhere.
If you need detail, enable temporary trace for a short window.

3) Are sets always faster?

Usually, yes—especially for long lists of IPs/ports. But the bigger win is maintainability.
Sets also let you update membership without rewriting rule logic.
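
For example, allowing one more service port is a set-element change, not a rule change (the port value is illustrative):

cr0x@server:~$ sudo nft add element inet filter set_allowed_tcp_services '{ 8443 }'

Mirror the change into 00-defs.nft afterwards: runtime additions don’t survive the next full reload of the config files.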

4) Where do I put “bogon” filtering?

Put it in a sanity chain early in input (and forward if you route).
Keep the list as a set, and review it. Don’t block RFC1918 on internal interfaces unless you enjoy self-inflicted outages.

5) How do I avoid accidentally blocking DNS/NTP/monitoring egress?

Start with output policy accept for most servers.
If you must restrict egress, do it as a separate project with full dependency mapping and good observability.
Egress lockdown without inventory is a great way to learn how many “optional” dependencies you actually have.

6) What’s the safest way to migrate from iptables to nftables?

Don’t do a blind translation. Re-express intent using sets and staged chains.
Run side-by-side in a controlled environment where possible, validate with traffic tests, then cut over with an atomic load.

7) Should I put management access and service access in the same chain?

No. Separate them. Management is control-plane, services are data-plane.
They have different source restrictions, different auditing requirements, and different incident responses.

8) How do I debug “some clients can connect, others can’t”?

Check whether the failing clients share a source range not included in your sets.
Use nft monitor trace to see the exact mismatch. Confirm if the clients are using IPv6.

9) Is nftables stateful by default?

No. You decide how to use conntrack state with ct state matches.
Most production host firewalls allow established,related early and treat invalid as drop.

10) How do I keep the ruleset readable when product teams demand “just one more exception”?

Force exceptions into data (set elements) rather than logic (new custom rules).
If the exception changes behavior, it needs its own chain with a name that admits what it is.
Shame is an underrated governance tool.

Next steps that won’t ruin your weekend

A readable nftables ruleset is not a style preference. It’s a reliability feature.
At 500 rules, you’re not fighting packets—you’re fighting entropy.

Do these next:

  1. Split your monolith into includes: defs, base, mgmt, services, nat, debug.
  2. Convert repeated IP/port lists into sets. Keep identity in sets; behavior in chains.
  3. Stage your packet path: sanity → established → allow mgmt → allow services → log/drop.
  4. Adopt the change loop: nft -c → atomic load → verify counters → trace only when needed.
  5. Write down your naming conventions and enforce them in review. You’re building an on-call tool, not a poem.

If you do nothing else: make the packet story obvious. When the pager goes off, clarity is the only performance metric that matters.
