Debian 13: nf_conntrack table full — why connections fail and how to tune safely


If your Debian 13 host suddenly starts “dropping the internet” while CPU is fine and disks are bored, check the kernel log. When you see nf_conntrack: table full, dropping packet, you’re not looking at a vague warning. You’re looking at a hard capacity limit in the kernel’s connection tracking subsystem, and new flows are now paying the price.

This is one of those failure modes that makes smart people say questionable things like “but TCP should retry.” Sure. It does. While your load balancer flips out, your users refresh, your probes go red, and everyone learns that the network stack is a finite resource.

What conntrack is (and why Debian 13 cares)

nf_conntrack is Linux’s connection tracking system. It keeps state about network flows so Netfilter can do stateful firewalling and NAT. That “state” is stored in a kernel table: entries for flows like TCP, UDP, and some ICMP conversations. It’s not optional if you’re doing common things like:

  • Stateful firewall rules (ct state established,related / -m conntrack --ctstate)
  • NAT / masquerade (typical for containers, Kubernetes nodes, home routers, and “temporary” corporate hacks that become permanent)
  • Load balancers or reverse proxies on a host that’s also doing NAT for something else
  • Anything that relies on tracking replies (FTP helpers, some SIP behavior, and other joys from the past)

When the table fills up, the kernel can’t allocate a new entry for a new flow. So it drops packets that would create new state. Existing flows may limp along, but user-visible behavior is “random” failures: some connections work, some hang, some retransmit, some time out.

Interesting facts and context (because this mess has a history)

  1. Conntrack came out of the Netfilter era in the early 2000s to enable stateful firewalling and NAT in Linux without external appliances.
  2. Historically, a lot of “my firewall is fine” incidents were actually conntrack exhaustion—because filtering rules were correct, but the state table wasn’t big enough.
  3. In modern container stacks, conntrack became a shared resource: a single Kubernetes node can track traffic for many pods and services, so bursts hit harder.
  4. The default sizing of conntrack is intentionally conservative to avoid large memory reservations on small machines. Good for laptops. Bad for busy gateways.
  5. NAT consumes conntrack entries aggressively, because each translation has to be tracked; add short-lived connections and the churn skyrockets.
  6. UDP “connections” are fake but still tracked, and their timeouts can quietly keep entries around long after the application stopped caring.
  7. nf_conntrack_buckets (hash table buckets) affects lookup performance; a big table with too few buckets is like a phone book with one letter tab.
  8. Load testing often misses conntrack pressure because it focuses on application throughput, not on flow churn and timeouts.
  9. Some outages blamed on “DDoS” were actually normal traffic plus a timeout change that kept entries alive longer.

One operational rule: conntrack problems rarely show up during calm periods. They show up during deploys, failovers, flash crowds, or when someone changes timeouts without telling you. Which is why you should treat conntrack as capacity planning, not as an emergency knob.

Why connections fail when the table is full

When a packet arrives that requires conntrack state (new TCP SYN, new UDP “session,” a reply that needs NAT mapping), the kernel tries to allocate a conntrack entry. If it can’t, you get a log message and a dropped packet. New connections fail in ways that look like:

  • TCP SYNs retransmitting until the client gives up (looks like packet loss)
  • DNS timeouts (UDP queries dropped, retries happen, latency spikes)
  • Outgoing HTTP calls hanging (especially from busy NATing nodes)
  • Intermittent service discovery failures in Kubernetes

Existing established TCP flows often continue because their conntrack entries already exist. This creates the classic “half the world works” symptom: your SSH session stays alive, but new logins fail. The incident chat fills with superstition.

Joke #1: Conntrack exhaustion is the only time “stateful” networking becomes “stateless” on purpose, and nobody enjoys the purity.

The real bottleneck: entries, not bandwidth

Conntrack capacity is measured in number of tracked flows, not bits per second. A host can have plenty of bandwidth and still fail because it ran out of entries due to:

  • Lots of short-lived connections (microservices with chatty HTTP clients)
  • High fan-out (one service calling dozens of others per request)
  • UDP workloads (DNS, QUIC, telemetry) with timeouts set too high
  • NAT gateways for many clients (each client’s connections accumulate)
  • Attack traffic, scans, or bot noise (even when rate-limited elsewhere)
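
A rough way to put numbers on it: steady-state entries ≈ new flows per second × average entry lifetime. For example (illustrative figures, not measurements), a NAT node creating 2,000 new flows per second whose entries live about 120 seconds on average will sit near 240,000 entries before any burst, which is already uncomfortably close to a 262,144 ceiling. Retention time matters as much as raw flow rate.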

A single quote, because operations has philosophers too

A paraphrased idea, attributed to John Allspaw (reliability/operations): “Reliability comes from designing for failure, not pretending it won’t happen.”

Fast diagnosis playbook

This is the order that finds the bottleneck fast, without getting lost in 40 counters and a rabbit hole of “maybe it’s DNS.”

  1. First: confirm it’s conntrack exhaustion, not just packet loss.

    • Check kernel logs for “table full”.
    • Check current usage vs max.
  2. Second: identify what’s creating entries.

    • Count conntrack entries by protocol/state.
    • Look for huge UDP volumes, or piles of SYN_SENT/SYN_RECV.
    • Check whether the host is doing NAT (containers, Kubernetes, VPN, classic masquerade).
  3. Third: decide whether to scale max, shorten timeouts, or stop tracking.

    • If you’re routinely near max: increase capacity and buckets, then validate memory impact.
    • If you’re only spiking: reduce timeouts for the offender and/or fix fan-out or retry storms.
    • If you track traffic you don’t need: disable tracking for those flows (carefully).

Everything else—CPU profiling, fancy dashboards, blaming the firewall—is for after you’ve confirmed table pressure. Conntrack is either full or it isn’t.

Practical tasks: commands, outputs, decisions

These are real tasks you can run on Debian 13. Each one includes what the output means and the decision you make. Run them as root or with appropriate privileges.

Task 1: Confirm the kernel is dropping due to conntrack

cr0x@server:~$ sudo journalctl -k -g "nf_conntrack" -n 20 --no-pager
Dec 29 10:41:12 server kernel: nf_conntrack: table full, dropping packet
Dec 29 10:41:12 server kernel: nf_conntrack: table full, dropping packet
Dec 29 10:41:13 server kernel: nf_conntrack: table full, dropping packet

Meaning: This is the smoking gun. The kernel is refusing new conntrack allocations and dropping packets.

Decision: Stop guessing. Move to measuring usage and identifying what’s filling the table.

Task 2: See current usage vs configured max

cr0x@server:~$ sudo sysctl net.netfilter.nf_conntrack_count net.netfilter.nf_conntrack_max
net.netfilter.nf_conntrack_count = 262144
net.netfilter.nf_conntrack_max = 262144

Meaning: You’re at the ceiling. Count equals max, and new flows will be dropped.

Decision: You need relief: either raise max (after checking memory), reduce timeouts, or reduce tracked flows.

Task 3: Check conntrack memory consumption (slab)

cr0x@server:~$ sudo grep -E 'nf_conntrack|conntrack' /proc/slabinfo | head -n 3
nf_conntrack         262144 262144   320  256    8 : tunables    0    0    0 : slabdata   1024   1024      0
nf_conntrack_expect       0      0    96   42    1 : tunables    0    0    0 : slabdata      0      0      0

Meaning: The nf_conntrack slab shows how many objects are allocated and their size. Here, entries are fully allocated.

Decision: Estimate memory impact before raising max. If you’re memory-tight, timeouts or flow reduction may be safer.
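
If you want a number in MiB rather than raw slab columns, multiply object count by object size. A quick sketch (approximate: it ignores per-slab overhead):

cr0x@server:~$ sudo awk '/^nf_conntrack / {printf "%.0f MiB\n", $3 * $4 / 1048576}' /proc/slabinfo
80 MiB

With the sample numbers above (262,144 objects at 320 bytes each) that works out to about 80 MiB, so a doubled max means budgeting roughly double that for when the table fills.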

Task 4: If conntrack-tools is installed, get a quick summary

cr0x@server:~$ sudo conntrack -S
cpu=0 found=1483921 invalid=121 insert=927314 insert_failed=8421 drop=8421 early_drop=0 error=0 search_restart=0

Meaning: insert_failed and drop rising indicate allocation failures—usually table full, sometimes hash pressure.

Decision: If insert_failed is climbing, you’re actively dropping new flows. Mitigate now; tune after.

Task 5: Count entries by protocol (quick heat map)

cr0x@server:~$ sudo conntrack -L | awk '{print $1}' | sort | uniq -c | sort -nr | head
210944 tcp
50877 udp
323 icmp

Meaning: TCP dominates here. If UDP dominates, you’ll look harder at DNS/QUIC/telemetry and UDP timeouts.

Decision: Choose your next drill-down: TCP state distribution or UDP timeout churn.

Task 6: For TCP, find which states are ballooning

cr0x@server:~$ sudo conntrack -L -p tcp | awk '{print $4}' | sort | uniq -c | sort -nr | head
80421 ESTABLISHED
61211 TIME_WAIT
45433 SYN_SENT
12912 CLOSE_WAIT

Meaning: Lots of SYN_SENT can mean outbound connection storms or upstream not answering. Lots of TIME_WAIT can reflect short-lived connections and churn.

Decision: If SYN_SENT is high, look for retries and upstream reachability; if TIME_WAIT is high, focus on app connection reuse and timeouts.

Task 7: Identify top talkers by destination (spot a runaway dependency)

cr0x@server:~$ sudo conntrack -L -p tcp | awk '{for(i=1;i<=NF;i++){if($i ~ /^dst=/){split($i,a,"="); print a[2]; break}}}' | sort | uniq -c | sort -nr | head
64221 10.10.8.21
33110 10.10.8.22
12044 192.0.2.50

Meaning: One or two destinations are consuming a large chunk of the table. That’s usually a dependency outage, a retry storm, or a NAT aggregation point.

Decision: Pull logs/metrics for the clients talking to those IPs. Consider rate limiting or circuit breaking before touching sysctls.

Task 8: Check whether NAT is in play (nftables)

cr0x@server:~$ sudo nft list table ip nat
table ip nat {
  chain postrouting {
    type nat hook postrouting priority srcnat; policy accept;
    oifname "eth0" masquerade
  }
}

Meaning: The host is doing masquerade NAT. Every translated flow needs tracking, and bursty clients can fill the table quickly.

Decision: Treat this host like a gateway. Size conntrack for aggregate client behavior, not for one service.

Task 9: Check bucket count and max (hash sizing)

cr0x@server:~$ sudo sysctl net.netfilter.nf_conntrack_buckets net.netfilter.nf_conntrack_max
net.netfilter.nf_conntrack_buckets = 65536
net.netfilter.nf_conntrack_max = 262144

Meaning: A common guideline is nf_conntrack_max around 4× buckets (not a law, but a decent starting point). Here that ratio is exactly 4.

Decision: If you increase max substantially, revisit buckets too. Otherwise you’ll pay CPU for long hash chains under load.

Task 10: Validate timeouts currently set (the hidden capacity lever)

cr0x@server:~$ sudo sysctl net.netfilter.nf_conntrack_tcp_timeout_established net.netfilter.nf_conntrack_udp_timeout net.netfilter.nf_conntrack_udp_timeout_stream
net.netfilter.nf_conntrack_tcp_timeout_established = 432000
net.netfilter.nf_conntrack_udp_timeout = 30
net.netfilter.nf_conntrack_udp_timeout_stream = 180

Meaning: Established TCP flows are held for 5 days here. That’s not “wrong,” but on busy NAT boxes it can be disastrous if clients churn and don’t close cleanly.

Decision: If you see many established entries that are stale (client gone), consider reducing the established timeout—carefully and with awareness of long-lived connections.

Task 11: Inspect per-namespace behavior (containers/Kubernetes)

cr0x@server:~$ sudo lsns -t net | head
        NS TYPE NPROCS   PID USER   NETNSID NSFS COMMAND
4026531993 net     245     1 root unassigned       /lib/systemd/systemd
4026532458 net      12  3112 root unassigned       /usr/bin/containerd

Meaning: Multiple network namespaces exist, but conntrack is still primarily a global kernel resource (with some per-netns accounting depending on setup). Containers can drive flow churn unexpectedly.

Decision: If this is a node, include pod/service traffic in sizing. Don’t size based only on “the app on the host.”

Task 12: Measure connection churn at the socket layer (correlate with conntrack)

cr0x@server:~$ ss -s
Total: 10648 (kernel 0)
TCP:   7342 (estab 1249, closed 5621, orphaned 0, synrecv 18, timewait 5621/0), ports 0

Transport Total     IP        IPv6
RAW       0         0         0
UDP       211       198       13

Meaning: timewait is huge at the socket layer too, matching conntrack churn. This is often an application behavior problem (no keepalive, no pooling) more than a kernel problem.

Decision: If you can fix churn at the app/proxy, do that. Kernel tuning is not a substitute for sane connection reuse.
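
If you want to measure churn directly instead of inferring it from socket counters, conntrack-tools can emit one event per new flow. A minimal sketch (the 10-second window is arbitrary; divide the count by the window to get new flows per second):

cr0x@server:~$ sudo timeout 10 conntrack -E -e NEW | wc -l
18423

That would be roughly 1,800 new flows per second (illustrative output). Compare it against your timeouts: churn × retention is your steady-state table occupancy.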

Task 13: Quick mitigation—raise max immediately (temporary)

cr0x@server:~$ sudo sysctl -w net.netfilter.nf_conntrack_max=524288
net.netfilter.nf_conntrack_max = 524288

Meaning: The ceiling is higher now. Drops should stop if exhaustion was the only problem.

Decision: Treat this as a tourniquet. Follow up with bucket sizing, memory review, and a root-cause analysis of why entries grew.

Task 14: Make the change persistent (Debian sysctl.d)

cr0x@server:~$ echo 'net.netfilter.nf_conntrack_max = 524288' | sudo tee /etc/sysctl.d/99-conntrack-tuning.conf
net.netfilter.nf_conntrack_max = 524288
cr0x@server:~$ sudo sysctl --system | tail -n 5
* Applying /etc/sysctl.d/99-conntrack-tuning.conf ...
net.netfilter.nf_conntrack_max = 524288

Meaning: The setting will survive reboot and is applied now.

Decision: Only persist after you’ve confirmed memory headroom and you’re not masking a traffic storm.

Task 15: Adjust hash buckets safely (boot-time module option)

cr0x@server:~$ echo 'options nf_conntrack hashsize=131072' | sudo tee /etc/modprobe.d/nf_conntrack.conf
options nf_conntrack hashsize=131072
cr0x@server:~$ sudo update-initramfs -u
update-initramfs: Generating /boot/initrd.img-6.12.0-amd64

Meaning: hashsize is set at module load time; on many systems it’s effectively a boot-time decision. Updating initramfs makes it apply early.

Decision: Plan a reboot window if you need bucket changes. Don’t pretend you can “just sysctl it” in the middle of an outage.

Task 16: Spot obvious scanning/attack patterns (SYNs, invalids)

cr0x@server:~$ sudo conntrack -S | sed -n '1p'
cpu=0 found=1483921 invalid=121 insert=927314 insert_failed=8421 drop=8421 early_drop=0 error=0 search_restart=0

Meaning: If invalid starts spiking with drops, you may be tracking garbage: malformed packets, asymmetric routing, or a partial DDoS.

Decision: Consider upstream filtering and local rate limiting; tuning conntrack larger just gives attackers a bigger toy box.

Sizing strategy that won’t bite you later

There are three knobs people reach for:

  • nf_conntrack_max: maximum number of tracked entries.
  • nf_conntrack_buckets (or module hashsize): hash table buckets for lookups.
  • Timeouts: how long entries stick around, especially for UDP and half-open TCP states.

The safe tuning mindset is: capacity (max), efficiency (buckets), and retention (timeouts). You can “solve” a full table by raising max, but you may just be storing junk longer, spending more memory, and making lookups slower if you ignore buckets.

Step 1: Decide what kind of host this is

  • Pure server (no NAT, minimal firewall state): conntrack may barely matter. If it’s full, something is weird (like an unexpected NAT rule, a proxy sidecar, or local container networking).
  • Gateway/NAT node: conntrack is central. Size it based on aggregate clients and their behavior. This is not the place to be stingy.
  • Kubernetes node: it’s a gateway in disguise. Expect bursts during rollouts, autoscaling, and node drains.

Step 2: Estimate the required entry count (practical math, not fantasy)

Don’t over-model. Use observed peaks and add headroom:

  • Measure nf_conntrack_count during normal busy periods.
  • Measure it during deploys/failovers (the “bad day” baseline).
  • Set nf_conntrack_max to at least 2× the 99th percentile peak for production gateways, sometimes 3× if you have retry storms.

If you can’t measure peaks (new environment), start conservatively high for gateways and then validate memory. The cost of being too low is outages. The cost of being too high is memory usage and potential lookup overhead (which buckets can mitigate).
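
If there is no metrics pipeline yet, even a crude sampler collects the peak data this step needs. A minimal sketch as a cron.d entry (the path and one-minute interval are arbitrary choices, not a standard):

# /etc/cron.d/conntrack-sample (illustrative)
* * * * * root echo "$(date -Is) $(cat /proc/sys/net/netfilter/nf_conntrack_count)" >> /var/log/conntrack-count.log

A week of one-minute samples is enough to see normal peaks and deploy-time spikes; delete it once real monitoring exists.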

Step 3: Validate memory headroom

Conntrack entries live in kernel memory (slab). Raising nf_conntrack_max increases potential kernel memory use. The per-entry size varies by kernel and features, so treat it as “hundreds of bytes,” then confirm with slabinfo.
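
As a back-of-envelope example using the object size from the slabinfo task (roughly 320 bytes per entry): a ceiling of 524,288 entries is on the order of 160 MiB of slab if the table fills, before hash table overhead. The per-entry size varies by kernel and enabled features, so confirm on your own host rather than trusting the constant.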

On a machine that already flirts with OOM, cranking conntrack is like buying a bigger warehouse by moving into your kitchen.

Timeouts: the silent multiplier

Timeouts determine how long entries stick. This matters because many workloads create flows faster than you think, and a small change in retention time can double or triple steady-state entry count.

What timeouts usually cause pain

  • UDP timeout too high: common for “let’s be safe” tuning. It keeps dead UDP flows around and eats the table.
  • TCP established timeout too high on NAT gateways: can hold onto state for clients that vanished (mobile clients, flaky Wi‑Fi, crashy apps).
  • Half-open TCP timeouts: if under scan or SYN flood patterns, half-open entries can dominate.

Joke #2: Raising timeouts “to be safe” is like keeping every meeting invite forever because someday you might attend it retroactively.

A sane approach to timeouts

Don’t randomly set timeouts from blog posts. Instead:

  1. Identify the protocol/state dominating conntrack.
  2. Confirm it correlates with real user value (long-lived TCP sessions) or with churn/noise (DNS bursts, retries, scans).
  3. Adjust the smallest set of timeouts that reduce junk retention without breaking legitimate long-lived flows.

For many environments, the best fix is not a timeout tweak but stopping connection churn: HTTP keepalive, connection pooling, and sane client retry budgets. Conntrack should track real flows, not the byproduct of a thundering herd.
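
If you do end up adjusting timeouts, first enumerate what is actually configured so you are arguing about real values, not remembered ones:

cr0x@server:~$ sudo sysctl -a --pattern 'nf_conntrack.*timeout'
net.netfilter.nf_conntrack_icmp_timeout = 30
net.netfilter.nf_conntrack_tcp_timeout_established = 432000
net.netfilter.nf_conntrack_tcp_timeout_syn_sent = 120
net.netfilter.nf_conntrack_tcp_timeout_time_wait = 120
net.netfilter.nf_conntrack_udp_timeout = 30
net.netfilter.nf_conntrack_udp_timeout_stream = 180

(Output trimmed; values are this article's example host.) Change the one knob you can justify from the conntrack breakdown, document it, and re-measure.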

Hash buckets and performance: nf_conntrack_buckets

Even if you never hit the max, a poorly sized hash table can hurt. Conntrack lookups happen in the hot path of packet processing; long hash chains mean extra CPU per packet, which shows up as latency and softirq time.

What you can control

  • nf_conntrack_buckets is often read-only at runtime and set at module init.
  • Module option hashsize= is the common way to set it persistently.
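
To see the current value, read the sysctl or the module parameter directly. On some kernels the parameter is also writable at runtime, but verify that on your kernel before relying on it mid-incident:

cr0x@server:~$ cat /sys/module/nf_conntrack/parameters/hashsize
65536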

Guideline that works more often than it fails

A common operational ratio is:

  • nf_conntrack_max ≈ 4 × nf_conntrack_buckets

This keeps average chain lengths reasonable. It’s not magic. It’s a starting point. If your traffic has weird tuple distributions or you’re under attack, you still need observation.
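
Worked example: with nf_conntrack_max = 524288 (the value used earlier), the 4:1 guideline suggests hashsize = 131072, which is what the modprobe option in Task 15 sets; if you later move to 1,048,576 entries, plan for roughly 262,144 buckets.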

Debian 13 specifics: sysctl, systemd, and persistence

Debian 13 behaves like modern Debian: kernel params via sysctl, persistent configs under /etc/sysctl.d/, module options under /etc/modprobe.d/, and early boot handled by initramfs.

What to do on Debian 13

  • Put conntrack sysctls in a dedicated file like /etc/sysctl.d/99-conntrack-tuning.conf.
  • For hashsize, set options nf_conntrack hashsize=... in /etc/modprobe.d/nf_conntrack.conf and run update-initramfs -u.
  • Reboot during a window to apply bucket changes reliably.

Also: if your host uses containers, check that your “simple firewall” is not secretly doing NAT for bridge networks. Many are. Conntrack doesn’t care that it was “just for dev.”

Three corporate mini-stories from real life

Mini-story 1: The incident caused by a wrong assumption

The company had a Debian-based edge tier: a handful of hosts running a reverse proxy, plus some firewall rules for comfort. No NAT, no VPN, nothing fancy. They were convinced conntrack was irrelevant because “we’re not a router.”

Then a new security baseline landed: a stateful default-deny policy with broad “established/related” allowances. It was correct from a rules standpoint, and it did stop some garbage. It also quietly made conntrack mandatory for basically all inbound flows.

A few weeks later, a marketing campaign hit. Connection churn increased—lots of short TLS handshakes due to aggressive clients and a misconfigured keepalive. At peak, the kernel log started screaming about conntrack table full. The proxy looked “healthy” on CPU, and the network team blamed upstream packet loss. Meanwhile, new users couldn’t connect, but existing sessions stayed alive, which made the whole thing look like a partial outage.

The wrong assumption wasn’t “conntrack is bad.” The wrong assumption was “conntrack is only for NAT.” On a stateful firewall, conntrack is the product.

Fix: they increased nf_conntrack_max with a sane headroom factor, tuned the hashsize to match, and—more importantly—fixed keepalive and connection reuse in the proxy. After that, conntrack count stabilized at a fraction of max, and the “random” failures vanished.

Mini-story 2: The optimization that backfired

A platform team wanted lower latency for UDP-based telemetry and decided to “avoid drops” by increasing UDP conntrack timeouts. The change was rolled across a set of Debian 13 nodes acting as NAT gateways for branch sites and container workloads. Nobody did a capacity review because the nodes had plenty of RAM and “it’s just a timeout.”

For a while it looked fine. Then a routine deployment caused a minor brownout in a downstream collector. Clients retried. UDP flows churned. The longer timeout kept entries around longer, so the conntrack table filled with dead conversations that would never receive replies. New flows began failing, which triggered more retries, which created more flows. Classic positive feedback loop. The incident graph looked like a tidy exponential curve drawn by someone who hates you.

They tried raising nf_conntrack_max during the incident, which helped briefly but increased CPU due to poor bucket sizing and longer lookup chains. The net result was “it’s less broken but slower,” which is not a victory you want to announce.

The backfire wasn’t the idea of tuning. It was tuning without measuring churn and without understanding that retention time sets steady-state table occupancy. The correct fix ended up being: revert the UDP timeout increase, reduce retries at the client, and add backpressure in the pipeline. Afterward, they raised conntrack max moderately—with matching buckets—based on measured peaks, not vibes.

Mini-story 3: The boring but correct practice that saved the day

A different org had a policy: every gateway-like host had a “kernel capacity” dashboard with three boring lines: nf_conntrack_count, nf_conntrack_max, and slab usage for nf_conntrack. No fancy SLO math. Just the basics and an alert at 70% sustained.

During a supplier outage, their services started retrying outbound HTTPS calls more aggressively. That’s a client behavior issue, but it manifests as flow churn on the NAT node. Their conntrack alert fired before customers noticed anything. The on-call saw count rising, confirmed SYN_SENT dominating, and throttled the retrying job class. They also temporarily raised conntrack max because they had pre-calculated memory headroom and had a documented “safe bump” range.

No heroics, no war room. The incident ticket was almost embarrassing: “Prevented conntrack exhaustion by rate limiting retries; adjusted max within planned range.” That’s the kind of boring you want in production.

The saving practice was not a secret kernel flag. It was capacity observability, pre-approved tuning bounds, and a culture of treating retries as production traffic, not as free therapy for failing dependencies.

Common mistakes: symptoms → root cause → fix

1) “Some connections work, new ones fail” → conntrack full → raise max and stop churn

Symptoms: Existing SSH stays alive, but new SSH/HTTP connections hang. Kernel logs show drops.

Root cause: Table at nf_conntrack_max. New flows can’t be tracked.

Fix: Immediate: raise nf_conntrack_max (temporary). Then find the churn source and fix keepalive/retries; adjust buckets if you keep the higher max.

2) “We increased max but it’s still slow” → buckets too small → set hashsize and reboot

Symptoms: Drops stop, but latency rises and CPU in softirq increases.

Root cause: You increased max without increasing buckets, causing longer hash chains and slower lookups.

Fix: Set module hashsize (or adjust buckets if supported) so bucket count scales with max, then reboot in a window.

3) “It only happens during deploys” → retry storms + short-lived flows → fix client behavior

Symptoms: Conntrack count spikes during rollouts or failovers.

Root cause: Connection churn from retries, health checks, or service discovery flapping.

Fix: Reduce retries, add jitter, enable connection pooling, and make health checks less aggressive. Then size conntrack for realistic peak.

4) “UDP is eating everything” → UDP timeouts too high or noisy traffic → tune UDP and filter noise

Symptoms: Conntrack dominated by UDP entries; DNS failures are common.

Root cause: High UDP churn with timeouts retaining entries, plus possible scanning or telemetry noise.

Fix: Keep UDP timeouts sane, reduce telemetry fan-out, and filter obvious garbage at the edge when appropriate.

5) “It started after enabling Docker/Kubernetes” → hidden NAT and iptables rules → treat node as gateway

Symptoms: A “regular server” suddenly hits conntrack limits after container runtime install.

Root cause: Container networking adds NAT/forwarding and stateful rules, increasing tracked flows.

Fix: Inventory NAT rules, then size conntrack accordingly. Consider dedicated egress gateways for heavy workloads.

6) “conntrack invalid is huge” → asymmetric routing or offload weirdness → fix routing/paths

Symptoms: High invalid counts, strange drops, sometimes only one direction broken.

Root cause: Traffic returns via a different path than it left (asymmetric routing), or packets bypass tracking expectations due to topology/offload choices.

Fix: Fix routing symmetry or adjust design so stateful tracking sits in a consistent path. Avoid band-aids that just enlarge conntrack.

Checklists / step-by-step plan

Checklist: during an active incident (15–30 minutes)

  1. Confirm logs show nf_conntrack: table full and count equals max.
  2. Get a quick protocol/state breakdown (TCP vs UDP; SYN_SENT vs ESTABLISHED).
  3. Check if this host is doing NAT (nftables/iptables NAT chains).
  4. Apply a temporary max increase if you have memory headroom.
  5. Throttle the biggest churn source (retrying jobs, abusive clients, misbehaving health checks).
  6. Capture a short evidence bundle for later: sysctl values, conntrack summary, top destinations.

Checklist: after stabilization (same day)

  1. Decide the host role: gateway/NAT, k8s node, or pure server.
  2. Set persistent nf_conntrack_max based on measured peaks plus headroom.
  3. Plan bucket/hashsize adjustment to match the new max; schedule reboot.
  4. Review timeouts; revert any “be safe” increases that inflate retention.
  5. Fix churn at the source: keepalive, pooling, retry budgets, backoff/jitter.
  6. Add alerting at 70–80% sustained usage and a dashboard panel for count/max/slab.
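
For the alerting item, even a trivial check wired into whatever agent or scheduler you already run beats nothing. A minimal sketch, with an assumed 80% threshold and a hypothetical path:

#!/bin/sh
# /usr/local/sbin/conntrack-usage-check (illustrative path)
# Exit non-zero when conntrack usage crosses the threshold, so a monitor can page on it.
count=$(cat /proc/sys/net/netfilter/nf_conntrack_count)
max=$(cat /proc/sys/net/netfilter/nf_conntrack_max)
pct=$(( count * 100 / max ))
if [ "$pct" -ge 80 ]; then
    echo "conntrack at ${pct}% (${count}/${max})"
    exit 1
fi

Sustained readings matter more than single samples, so let the monitoring layer handle the "for how long" part.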

Checklist: long-term hardening (this quarter)

  1. Capacity test connection churn, not just throughput. Include deploy/failover scenarios.
  2. Document a safe emergency range for nf_conntrack_max based on memory headroom.
  3. If you run Kubernetes, treat conntrack as a node-level SLO dependency; size per node type.
  4. Consider moving heavy NAT/egress to dedicated gateways to isolate blast radius.
  5. Audit firewall rules for unnecessary tracking (carefully; don’t break stateful security).

FAQ

1) Does “nf_conntrack table full” mean my firewall rules are wrong?

No. It usually means your rules are working as designed (stateful), but the state table is too small for the traffic pattern or timeouts.

2) Can I just set nf_conntrack_max to something huge and forget it?

You can, but you’ll pay kernel memory and potentially CPU (especially if buckets aren’t scaled). Also, a bigger table can let scans/attacks consume more state before you notice.

3) How do I know if NAT is the reason?

Look for masquerade/SNAT rules in nftables or iptables NAT tables, and confirm the host is forwarding traffic for other clients or containers. NAT almost always increases conntrack pressure.

4) Why do DNS failures show up first?

DNS is UDP, short-lived, and latency-sensitive. When new UDP flows get dropped, clients retry and amplify churn. It’s often the first visible crack.

5) What’s the difference between nf_conntrack_buckets and nf_conntrack_max?

nf_conntrack_max is capacity (how many entries). Buckets determine lookup performance. A large max with too few buckets increases CPU cost per packet.

6) Do established TCP connections drop when the table is full?

Usually not immediately, because they already have entries. The pain is mostly for new connections and for traffic needing new NAT mappings.

7) Can I disable conntrack for some traffic?

Sometimes, yes (for specific flows where tracking is unnecessary). But it’s easy to break NAT, stateful firewall expectations, and reply traffic. Do it only when you understand the path and have tests.
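
As a concrete illustration of the "sometimes": nftables can mark chosen flows notrack in a raw-priority chain, before conntrack creates entries for them. A minimal sketch for an authoritative DNS server answering on UDP 53 with no NAT and no stateful rules that need those flows (an assumption-heavy example, not a recommendation):

table inet raw {
  chain prerouting {
    type filter hook prerouting priority raw; policy accept;
    udp dport 53 notrack
  }
  chain output {
    type filter hook output priority raw; policy accept;
    udp sport 53 notrack
  }
}

Untracked packets carry ct state untracked, so any rule elsewhere that expects established/related will not match them. Test before rolling out.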

8) Is this an iptables problem or an nftables problem?

Neither and both. iptables/nftables are rule frontends; conntrack is a kernel subsystem they rely on. Migrating to nftables won’t magically remove conntrack pressure.

9) What if conntrack_count isn’t near max but I still see drops?

Look at conntrack -S for insert failures, check bucket sizing, and consider whether you have bursts causing transient exhaustion or memory pressure. Also verify logs are current and not historical.

10) Does increasing nf_conntrack_max require a reboot?

No, you can change nf_conntrack_max at runtime via sysctl. Bucket/hashsize changes typically require module reload or reboot to apply safely.

Next steps you can actually do this week

Do these in order, and you’ll turn conntrack from a surprise outage into a managed capacity metric:

  1. Add visibility: graph nf_conntrack_count, nf_conntrack_max, and conntrack slab usage. Alert at sustained 70–80%.
  2. Classify your nodes: anything doing NAT or running container networking is a gateway. Treat it like one.
  3. Pick a sane max: base it on observed peaks with headroom; persist it via /etc/sysctl.d/.
  4. Match buckets to max: set module hashsize, update initramfs, and schedule a reboot window.
  5. Fix churn: reduce retries, add jitter, enable keepalive/pooling, and stop turning transient failures into traffic floods.
  6. Review timeouts: revert “just in case” increases; tune only with evidence and awareness of long-lived connections.

Conntrack is a shared kernel table. Treat it like disk space on a database server: finite, measurable, and capable of ruining your afternoon if you ignore it.
