You installed Fail2ban. You enabled an SSH jail. You even watched someone hammer port 22 like it owes them money. And yet: nobody gets banned. No firewall entries. No satisfying “banned” counters. Just silence.
In production, silence is rarely peace. It’s usually telemetry missing, a wrong backend, the wrong firewall family, or a jail that never matched a line. The fix isn’t “reinstall Fail2ban” (that’s how you lose an afternoon). The fix is verification: prove the data path, prove the match, prove the ban, and prove the ban is enforceable.
Fast diagnosis playbook
If you’re on-call, start here. This is the “find the bottleneck in five minutes” sequence. It’s opinionated because the alternative is wandering around config files while attacks continue.
First: is Fail2ban alive and watching the jail you think it is?
- Check service health and recent errors in the journal.
- List enabled jails and verify your target jail is present.
Second: is it seeing events (logs) and matching them (filters)?
- Confirm the backend (systemd journal vs log files).
- Confirm log source actually contains the failure lines you expect.
- Run a regex test against known log lines.
Third: is it trying to ban, and are bans being enforced?
- Manually ban a test IP and see whether firewall rules appear.
- Validate you’re on nftables or iptables and that Fail2ban is configured accordingly.
- Validate the action selected in the jail and that the firewall “family” matches the system reality.
The most common failure mode on modern Ubuntu isn’t Fail2ban itself. It’s a mismatch between where logs live (systemd-journald vs files) and which firewall layer is actually active (nftables vs iptables-legacy). Fail2ban can’t ban what it can’t see, and it can’t block what it can’t program.
The mental model: events → matches → bans → enforcement
Fail2ban is simple in the same way a smoke detector is simple. It listens. It recognizes a pattern. It takes an action. If you put it in the wrong room, or the batteries are dead, it “works” with impressive calm.
Think in four layers:
- Events: authentication failures appear somewhere (a file like /var/log/auth.log or the systemd journal).
- Matches: a jail’s filter regex matches those event lines and counts failures per IP over a window.
- Bans: when thresholds are exceeded (findtime/maxretry), Fail2ban decides to ban an IP.
- Enforcement: the ban action inserts rules into a firewall (nftables or iptables) or tells a service (like an app’s ACL) to block.
When Fail2ban “isn’t banning anything,” you don’t guess which layer failed. You prove it. In practice:
- No events: wrong log source, permissions, or the service logs elsewhere (container logs, syslog disabled, etc.).
- Events but no matches: wrong filter, wrong jail, wrong log format, or your service isn’t producing the failure strings your filter expects.
- Matches but no bans: thresholds too high, ignoreip catches everything, time settings off, or the jail isn’t enabled.
- Bans but no enforcement: wrong action, wrong firewall backend, firewall manager conflict, or rule order means your “block” never blocks.
Paraphrased idea (Gene Kim): “Reliability comes from fast feedback and small, reversible changes.” That’s what this workflow is: tight feedback loops that let you stop guessing.
Interesting facts and context (quick, useful, occasionally nerdy)
- Fail2ban predates the cloud era. It started in the early 2000s, when SSH brute force and FTP password spraying were the daily background noise of the Internet.
- It doesn’t “detect attackers.” It detects log patterns. If an attacker avoids generating your expected logs (or hits an endpoint you’re not monitoring), Fail2ban stays politely unaware.
- The original default target was iptables. The ecosystem slowly moved toward nftables; today, Linux firewalls often run via nftables even when you type “iptables.”
- systemd-journald changed the log game. On some setups, /var/log/auth.log is present; on others, the journal is the canonical source. Fail2ban can read either, but you must pick correctly.
- “iptables” can mean multiple things. There’s iptables-legacy and iptables-nft. The command name is the same, the backend differs, and rules may land somewhere you’re not looking.
- Ubuntu’s Uncomplicated Firewall (UFW) is a policy layer. Underneath it historically used iptables; on newer systems it may integrate with nftables. Fail2ban can coexist, but ordering matters.
- IPv6 is the quiet footgun. If your service is reachable via IPv6 and you only ban IPv4, your “ban” is a speed bump, not a block.
- DNS and reverse lookups can stall bans. If you enable features that cause slow DNS operations inside the ban pipeline, you can create weird “it’s banning hours later” behavior.
Joke #1: Fail2ban doesn’t “not work.” It works exactly as configured, which is a much less comforting sentence at 03:00.
The verification workflow: practical tasks with commands, meaning, decisions
This is the core of the piece: real checks, in a sane order, with “what it means” and “what you do next.” Run them on the host where Fail2ban is installed.
Task 1 — Confirm Fail2ban is actually running (and not flapping)
cr0x@server:~$ systemctl status fail2ban --no-pager
● fail2ban.service - Fail2Ban Service
Loaded: loaded (/usr/lib/systemd/system/fail2ban.service; enabled; preset: enabled)
Active: active (running) since Sun 2025-12-28 09:12:19 UTC; 1h 7min ago
Docs: man:fail2ban(1)
Main PID: 1247 (fail2ban-server)
Tasks: 5 (limit: 19020)
Memory: 39.8M
CPU: 1.421s
CGroup: /system.slice/fail2ban.service
└─1247 /usr/bin/python3 /usr/bin/fail2ban-server -xf start
What it means: “active (running)” is table stakes. If it’s “failed” or restarting, bans won’t stick and jails may never load.
Decision: If it’s not stable, go straight to Task 2 (journal errors) before touching config.
Task 2 — Read the last errors/warnings from Fail2ban
cr0x@server:~$ journalctl -u fail2ban -n 200 --no-pager
Dec 28 09:12:19 server fail2ban-server[1247]: Server ready
Dec 28 10:01:07 server fail2ban-server[1247]: WARNING Found no accessible config files for 'filter.d/sshd'. Skipping...
Dec 28 10:01:07 server fail2ban-server[1247]: ERROR Unable to read the filter 'sshd'
What it means: This is Fail2ban telling you it can’t load a filter, can’t open a log, can’t talk to the firewall, or can’t bind its socket. Believe it.
Decision: Fix the exact reported file/permission problem first. Don’t “tune bantime” while it can’t load the jail.
Task 3 — List running jails (the ones Fail2ban actually loaded)
cr0x@server:~$ sudo fail2ban-client status
Status
|- Number of jail: 2
`- Jail list: sshd, nginx-http-auth
What it means: If your expected jail isn’t listed, it isn’t running. Full stop.
Decision: If a jail is missing, validate config in Task 4 and check enablement in the jail file. No jail, no ban.
Task 4 — Validate configuration parse without starting a guessing contest
cr0x@server:~$ sudo fail2ban-client -d
...snip...
Jail sshd: backend = systemd
Jail sshd: enabled = true
Jail sshd: maxretry = 5
Jail sshd: findtime = 600
Jail sshd: bantime = 3600
Jail sshd: action = nftables[type=multiport]
...snip...
What it means: -d dumps the computed configuration after includes. This is where you catch “I edited jail.conf but jail.local overrides it” and other classics.
Decision: If the backend/action/paths aren’t what you expect, stop and correct overrides. Don’t debug the wrong reality.
Task 5 — Inspect a specific jail’s status (it’s your scoreboard)
cr0x@server:~$ sudo fail2ban-client status sshd
Status for the jail: sshd
|- Filter
| |- Currently failed: 1
| |- Total failed: 44
| `- File list: /var/log/auth.log
`- Actions
|- Currently banned: 0
|- Total banned: 0
`- Banned IP list:
What it means: “Total failed” rising means it’s reading logs and matching. “Total banned” staying at zero means thresholds aren’t met, ignore rules apply, or action is failing.
Decision: If “File list” points to a file you don’t have or don’t use (common on journal-based systems), jump to Tasks 6–8.
Task 6 — Confirm where authentication failures are logged on Ubuntu 24.04
cr0x@server:~$ ls -l /var/log/auth.log
-rw-r----- 1 syslog adm 148322 Dec 28 10:18 /var/log/auth.log
What it means: If /var/log/auth.log exists and is updating, file backend can work. If it’s missing or stale, you probably need the systemd backend.
Decision: If missing/stale, use Task 7 to confirm the journal has the events and then configure backend = systemd for the jail.
Task 7 — Prove the failures exist in the journal
cr0x@server:~$ sudo journalctl -u ssh --since "30 min ago" --no-pager | tail -n 20
Dec 28 10:09:41 server sshd[3188]: Failed password for invalid user admin from 203.0.113.90 port 49152 ssh2
Dec 28 10:09:44 server sshd[3188]: Failed password for root from 203.0.113.90 port 49153 ssh2
What it means: The system has the evidence Fail2ban needs. Now you must ensure the jail reads the same source.
Decision: If the journal has entries but Fail2ban’s jail “File list” shows /var/log/auth.log, switch to backend = systemd (and remove misleading logpath overrides).
Task 8 — Check the jail’s effective backend and logpath
cr0x@server:~$ sudo fail2ban-client get sshd backend
systemd
cr0x@server:~$ sudo fail2ban-client get sshd logpath
/var/log/auth.log
What it means: Yes, you can end up with a systemd backend and a stale logpath still configured (especially if you copied older snippets). That’s confusing and unnecessary.
Decision: If using systemd backend, remove logpath from that jail unless you have a specific reason. Keep the configuration boring.
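A minimal override that matches this advice could look like the following (a sketch; the file name is illustrative and your jail may carry other settings). It keeps the jail on the journal and drops the stale path:

# /etc/fail2ban/jail.d/sshd.local — hypothetical override file
[sshd]
enabled = true
backend = systemd
# no logpath: with backend = systemd, the journal is the source of truth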
Task 9 — Verify Fail2ban can read the log source (permissions and groups)
cr0x@server:~$ ps -o user,group,comm -p $(pgrep -xo fail2ban-server)
USER GROUP COMMAND
root root fail2ban-server
cr0x@server:~$ sudo -u root test -r /var/log/auth.log && echo "readable"
readable
What it means: On Ubuntu, Fail2ban typically runs as root, so file permissions are rarely the blocker. But if you harden it to run unprivileged, log access becomes real.
Decision: If you run Fail2ban as non-root, ensure it can read the logs and manipulate the firewall. If it can’t, it won’t.
Task 10 — Run a filter regex test against real data (stop hoping)
cr0x@server:~$ sudo fail2ban-regex /var/log/auth.log /etc/fail2ban/filter.d/sshd.conf --print-all-matched
Running tests
=============
Use failregex filter file : sshd, basedir: /etc/fail2ban
Use log file : /var/log/auth.log
Results
=======
Failregex: 12 total
|- #) [# of hits] regular expression
| 1) [12] Failed password for .* from <HOST>
Ignoreregex: 0 total
Summary
=======
Lines: 1828 lines, 0 ignored, 12 matched, 1816 missed
What it means: If this reports “0 matched,” Fail2ban can’t possibly ban based on that filter/log combination. Your logs may be in the journal, or your SSHD messages are different (PAM modules, localization, etc.).
Decision: Fix matchability before you touch thresholds. If it doesn’t match, it doesn’t matter what maxretry is.
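If the jail reads the journal, test the filter against the journal too, not against a file it no longer uses. Recent fail2ban-regex builds accept systemd-journal as the log argument (check your version’s man page before relying on it):

cr0x@server:~$ sudo fail2ban-regex systemd-journal /etc/fail2ban/filter.d/sshd.conf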
Task 11 — Confirm time window logic isn’t preventing bans
cr0x@server:~$ sudo fail2ban-client get sshd findtime
600
cr0x@server:~$ sudo fail2ban-client get sshd maxretry
5
cr0x@server:~$ sudo fail2ban-client get sshd bantime
3600
What it means: You ban when an IP produces maxretry matches inside findtime. If attackers spread attempts across time or across usernames with different log lines, you may not hit threshold.
Decision: If you have “Total failed” rising but no bans, temporarily lower maxretry to 2 and keep findtime reasonable (e.g., 10 minutes). Verify bans happen, then tune back up.
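One way to run that temporary test without editing files (a sketch; these runtime changes are lost on restart, and the jail name is assumed to be sshd):

cr0x@server:~$ sudo fail2ban-client set sshd maxretry 2
cr0x@server:~$ sudo fail2ban-client set sshd findtime 600

Once you have seen a ban fire, restore your normal values the same way or via your jail file.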
Task 12 — Check ignore rules (the silent immunity blanket)
cr0x@server:~$ sudo fail2ban-client get sshd ignoreip
127.0.0.1/8 ::1 10.0.0.0/8 192.168.0.0/16
What it means: If your traffic comes from a VPN or NAT range inside ignoreip, Fail2ban will calmly do nothing. This is a feature and also an outage generator.
Decision: Keep ignore lists narrow. Add your admin IPs, not entire corporate address space, unless you enjoy explaining why compromised laptops can brute force freely.
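For contrast, a narrow ignore list (a sketch; the file name and addresses are placeholders, not recommendations):

# /etc/fail2ban/jail.d/ignore.local — hypothetical
[DEFAULT]
ignoreip = 127.0.0.1/8 ::1 198.51.100.14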
Task 13 — Force a ban (controlled test) to verify the action layer
cr0x@server:~$ sudo fail2ban-client set sshd banip 203.0.113.90
1
cr0x@server:~$ sudo fail2ban-client status sshd
Status for the jail: sshd
|- Filter
| |- Currently failed: 0
| |- Total failed: 44
| `- File list: /var/log/auth.log
`- Actions
|- Currently banned: 1
|- Total banned: 1
`- Banned IP list: 203.0.113.90
What it means: This proves Fail2ban’s control plane can issue bans. It does not prove the firewall is enforcing it. That’s next.
Decision: If the ban shows up here, proceed to firewall validation. If it doesn’t, your jail/action is broken inside Fail2ban (or you used the wrong jail name).
Task 14 — Identify whether the system uses nftables or iptables for enforcement
cr0x@server:~$ sudo update-alternatives --display iptables
iptables - auto mode
link best version is /usr/sbin/iptables-nft
link currently points to /usr/sbin/iptables-nft
link iptables is /usr/sbin/iptables
/usr/sbin/iptables-nft - priority 20
/usr/sbin/iptables-legacy - priority 10
What it means: If iptables points to iptables-nft, then “iptables rules” may actually be nftables rules underneath. If you use a Fail2ban nftables action, you should inspect nftables directly.
Decision: Pick one enforcement path and observe it with the correct tooling. Don’t mix iptables-legacy with nftables unless you’re intentionally running parallel universes.
Task 15 — Verify nftables has a Fail2ban table/chain (if using nftables actions)
cr0x@server:~$ sudo nft list ruleset | sed -n '1,120p'
table inet filter {
chain input {
type filter hook input priority filter; policy accept;
ct state established,related accept
iif "lo" accept
tcp dport 22 accept
}
}
table inet f2b-table {
set f2b-sshd {
type ipv4_addr
elements = { 203.0.113.90 }
}
chain f2b-sshd {
ip saddr @f2b-sshd drop
return
}
}
What it means: You want to see an f2b table/chain or set with your banned IP. If it exists, Fail2ban did program nftables.
Decision: If the set/chain exists but traffic still passes, you probably have rule ordering issues (Fail2ban chain not hooked early enough), or your service is reachable via IPv6 while you banned IPv4 only.
Task 16 — Verify iptables has Fail2ban chains (if using iptables actions)
cr0x@server:~$ sudo iptables -S | sed -n '1,80p'
-P INPUT ACCEPT
-P FORWARD ACCEPT
-P OUTPUT ACCEPT
-N f2b-sshd
-A INPUT -p tcp -m tcp --dport 22 -j f2b-sshd
-A f2b-sshd -s 203.0.113.90/32 -j REJECT --reject-with icmp-port-unreachable
-A f2b-sshd -j RETURN
What it means: The presence of f2b-sshd chain and a jump from INPUT to it is the proof of enforcement configuration.
Decision: If you have a chain but no jump into it, the action only created the chain but didn’t attach it. That’s an action config issue.
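To check the jump explicitly, iptables -C returns success only if the exact rule exists (a sketch; copy the rule spec from your own iptables -S output, since chain names and ports depend on your jail and action):

cr0x@server:~$ sudo iptables -C INPUT -p tcp -m tcp --dport 22 -j f2b-sshd && echo "jump present"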
Task 17 — Verify the ban actually blocks packets (don’t trust config; test behavior)
cr0x@server:~$ sudo timeout 5 tcpdump -ni any 'host 203.0.113.90 and tcp port 22'
tcpdump: data link type LINUX_SLL2
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
10:21:02.112345 IP 203.0.113.90.49160 > 198.51.100.10.22: Flags [S], seq 123456789, win 64240, options [mss 1460,sackOK,TS val 1 ecr 0,nop,wscale 7], length 0
What it means: You see SYNs arriving. Now check whether the server replies (SYN-ACK) or drops/rejects. If your ban is “drop,” you should see incoming SYNs without replies.
Decision: If you still see SYN-ACK responses leaving, your firewall rule is not being hit (wrong chain/order/interface family), or the service is bound differently than you assume.
Task 18 — Unban your test IP and clean up
cr0x@server:~$ sudo fail2ban-client set sshd unbanip 203.0.113.90
1
What it means: You’re not trying to permanently ban documentation IPs (or your coworker’s hotel Wi-Fi). You’re verifying the pipeline.
Decision: If unban doesn’t remove firewall state, your action cleanup is broken, or you’re looking at the wrong firewall backend.
Ubuntu 24.04 firewall reality: nftables, iptables, and what Fail2ban actually touches
Ubuntu 24.04 sits in the era where nftables is the underlying truth, even when your muscle memory still types iptables. Fail2ban can use either, but you must align:
- nftables actions create nft tables/chains/sets directly. You validate with nft list ruleset.
- iptables actions insert iptables rules. On systems with iptables-nft, those rules may be translated into nftables structures, but you still manage them via iptables tools.
Pick one enforcement interface and observe it correctly
If you configure Fail2ban to use nftables actions, don’t validate bans using iptables -S and then declare it broken. That’s like checking the oil level by staring at the tires. Technically it’s all part of the car, but you’re not measuring the right thing.
UFW complicates, but doesn’t forbid, Fail2ban
UFW is a policy manager, not a firewall engine. Fail2ban’s classic iptables action often coexists with UFW because UFW leaves room for extra chains. But UFW can also enforce strict default policies and order its chains before or after Fail2ban chains depending on how you integrate.
Operational rule: if you use UFW, ensure Fail2ban’s blocking is evaluated early in the input path for the targeted ports. If you don’t know, test with Task 17 and stop theorizing.
IPv6: the “banned but still connecting” culprit
If your service listens on :: (IPv6 wildcard) and the client has IPv6 connectivity, an IPv4 ban won’t touch that connection. That can look like “Fail2ban isn’t banning” when it is, just on the wrong protocol family.
Practical approach: confirm what address family clients use, and configure Fail2ban actions that cover both IPv4 and IPv6 where appropriate.
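One hedged way to do that is to standardize on the nftables ban actions, which in recent Fail2ban releases handle both address families (a sketch; verify the action names against what actually ships in /etc/fail2ban/action.d/ on your version):

# /etc/fail2ban/jail.d/banaction.local — hypothetical
[DEFAULT]
banaction = nftables-multiport
banaction_allports = nftables-allports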
Filters and regex: when Fail2ban “works” but never matches
Filters are where your assumptions go to die. Fail2ban doesn’t understand “an attack.” It understands “a log line that matches this regex.” That’s both its power and its trap.
Know your log format: OpenSSH isn’t the only voice in auth
Between PAM, different SSHD settings, and sometimes custom patches, the failure strings can vary. Some environments emit “Invalid user” lines, others emit “Failed password,” others emit different fields for publickey failures. Your filter must match your reality.
Use fail2ban-regex as a gate, not as a debugging afterthought
If you do one disciplined thing: run fail2ban-regex against the exact log source you configured. If it doesn’t match, you don’t have a ban problem. You have a pattern problem.
Journald backends change how you scope
When using backend = systemd, Fail2ban uses journal queries rather than tailing a file. That’s usually better on modern systems, but it means old “logpath” wisdom can mislead you. In the journal world, the question becomes: is the unit name correct (e.g., ssh vs sshd) and are your journal permissions sufficient?
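If the unit name is your suspect, you can pin the journal match per jail in the same kind of override file shown earlier (a sketch; the unit name assumes Ubuntu’s ssh.service, and the shipped default lives in the sshd filter file, so compare before overriding):

[sshd]
backend = systemd
journalmatch = _SYSTEMD_UNIT=ssh.service + _COMM=sshd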
Joke #2: Regex is a write-only language—right up until the incident report demands you read it back.
Actions: when it matches, but nothing gets blocked
Once you’ve proven matches, the next failure class is “Fail2ban decided to ban, but enforcement did nothing.” That usually falls into one of these bins:
- Wrong firewall family: you used an iptables action on a system where you only inspect nftables, or vice versa.
- Rule ordering: the ban rule exists but is never reached because earlier rules accept the traffic (common with broad ACCEPT rules, or input hooks that bypass the chain).
- Interface mismatch: your service is on a different port/protocol than the action assumes, or you’re banning port 22 while SSH runs on 2222.
- IPv6 mismatch: you banned IPv4 but the client connects via IPv6.
- Permissions/capabilities: Fail2ban can’t actually program the firewall due to hardening or system policy.
Manual ban/unban is your best “unit test”
In real systems, you can spend hours waiting for a ban threshold to trigger. Don’t. Use Task 13 to force a ban and then inspect firewall state. If manual ban doesn’t create firewall entries, you have an action execution problem. If it does, but automatic bans don’t happen, you have an event/match/threshold problem.
Verify chain attachment, not just chain existence
It’s possible to have a beautifully populated Fail2ban chain that nothing ever jumps to. That’s the firewall equivalent of building a security guard booth in the parking lot and never staffing it.
For iptables, you need to see a jump from INPUT (or the relevant chain) into f2b-*. For nftables, you need the f2b chain referenced in the correct hook path, or a set used by a rule that actually runs for inbound traffic.
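Two quick, backend-appropriate checks (a sketch; the f2b names depend on your action configuration, so adjust the pattern to what your ruleset actually contains):

cr0x@server:~$ sudo iptables -S INPUT | grep f2b
cr0x@server:~$ sudo nft list ruleset | grep -B2 -A4 'f2b'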
Three corporate mini-stories from the banless trenches
1) The incident caused by a wrong assumption
The team had a small fleet of Ubuntu servers upgraded to a newer release. The migration checklist included “install Fail2ban” and “enable sshd jail,” and someone checked the box. A few days later, their authentication logs showed a sustained brute-force pattern: thousands of failed attempts, steady as a metronome.
Everyone assumed the ban system had it covered because it always had. The on-call engineer looked at fail2ban-client status and saw the sshd jail “running.” Comforting. They went back to other work.
The next morning, someone noticed CPU spikes correlated with authentication bursts. The spikes weren’t catastrophic, but they were enough to degrade a latency-sensitive service sharing the same VM class. When they finally pulled the jail status details, “Total failed” was ticking up and “Total banned” was stuck at zero.
The wrong assumption: “If the jail is running, it must be banning.” In reality, the jail was reading /var/log/auth.log, which existed but was stale because the system’s auth events were now in journald only. No live log input. No ban.
The fix was boring and fast: switch the jail to backend = systemd, remove the logpath override that pinned it to the stale file, then force-ban a test IP to validate nftables rules. Once the chain was visible and matched, the brute-force noise dropped immediately. The incident postmortem was short and slightly embarrassing, which is the best kind.
2) The optimization that backfired
A security-minded engineer wanted to reduce disk writes and log volume. They tightened rsyslog configuration and leaned more heavily on journald. That’s not inherently bad. But they also adjusted rotation and retention in a way that made some file-based logs disappear entirely. Nobody noticed because the services still worked and the journal still had data.
Fail2ban, however, was still configured with file backends and pointed at paths that used to exist. Since Fail2ban didn’t crash, the assumption became “it’s fine.” It started, it ran, it even reported jails. But it was effectively tailing empty air.
Then came the subtle part: they also increased findtime and bantime dramatically to “be tougher.” The intention was good—ban longer, ban harder. The effect was worse: when they later re-enabled logging, Fail2ban suddenly had a larger window to evaluate, but the regex filter matched additional benign lines due to a custom PAM message format. It started banning internal NAT egress IPs during a maintenance window.
Operations saw the result as a flaky network issue. Developers saw it as “SSH is broken.” Security saw it as “we are under attack.” The truth was: an optimization in logging plus aggressive ban tuning amplified a filter mismatch.
The recovery path was disciplined: revert to a minimal set of known-good filters, switch the backends to journald explicitly, reduce thresholds to sane defaults, and narrow ignoreip to the actual admin ranges. They kept journald as the primary log store, but they stopped pretending that invisible logs could be parsed by tools expecting files.
3) The boring but correct practice that saved the day
Another org ran a standardized “security controls verification” job after every OS patching cycle. It wasn’t fancy; it was a handful of commands captured into a runbook with expected outputs. One of those checks forced a temporary ban on a safe test IP and validated that a firewall rule appeared in the correct subsystem.
During a routine rollout, a new base image changed the iptables alternative from legacy to nft. Nothing else in the application stack broke. The deployment looked clean. But the verification job failed: the test ban appeared in Fail2ban’s status, yet the expected iptables chain wasn’t present. No matching rules.
Because the test was routine, it was noticed immediately. The team didn’t wait for an attacker to prove the gap. They simply updated the Fail2ban action to nftables and adjusted the validation to inspect nft list ruleset instead of grepping iptables output. Ten minutes later, the pipeline passed. They pushed the new baseline and moved on.
The best part: nobody had to argue in a meeting about whether bans “probably still work.” They had evidence. Boring, correct practice. It doesn’t win awards, but it keeps your nights quiet.
Common mistakes: symptom → root cause → fix
1) Symptom: Jail is listed, but “Total failed” stays at 0
Root cause: Wrong log source. You’re tailing a file that’s not being written, or the unit you query in journald is wrong.
Fix: Prove failures exist (Task 7) and align backend/logpath (Tasks 6–8). Don’t guess. Make the jail read what your system writes.
2) Symptom: “Total failed” increases, but “Total banned” stays 0 forever
Root cause: Thresholds too high, ignoreip includes the attacker’s source range, or your failures are spread across time beyond findtime.
Fix: Check thresholds and ignoreip (Tasks 11–12). Temporarily lower maxretry to validate the ban pipeline, then tune back.
3) Symptom: “Currently banned” shows IPs, but attackers still connect
Root cause: Enforcement mismatch: wrong action (iptables vs nftables), rule ordering, or IPv6 bypass.
Fix: Force-ban a test IP (Task 13), then verify in the correct firewall subsystem (Tasks 14–16) and validate packet behavior (Task 17). Add IPv6 coverage if needed.
4) Symptom: Fail2ban logs show “No such file or directory” for logpath
Root cause: Old jail snippets pointing to /var/log/auth.log or app logs that no longer exist because logging moved to journald or container stdout.
Fix: Switch to backend = systemd where appropriate, or ensure the service logs to a file you can read and rotate safely.
5) Symptom: Manual ban works, automatic bans never happen
Root cause: Filter mismatch: your regex doesn’t match the actual failure lines. Or the jail monitors the wrong port/service name.
Fix: Run fail2ban-regex (Task 10) against the real source and adjust the filter or jail parameters until you get matches.
6) Symptom: After enabling UFW, bans stop working
Root cause: Chain ordering changed; UFW rules accept traffic before Fail2ban gets a chance to drop it.
Fix: Ensure Fail2ban rules are evaluated early for the targeted ports. Validate with behavior testing (Task 17), not just rule presence.
7) Symptom: Fail2ban bans your colleagues during a deploy
Root cause: NAT: many people share one egress IP, and failures from multiple hosts accumulate under that address.
Fix: Add appropriate corporate NAT addresses to ignoreip (narrowly), or increase maxretry/findtime to avoid punishing shared egress. Also consider moving admin access behind a VPN with per-user identity.
8) Symptom: Fail2ban service starts, but jails don’t load after an edit
Root cause: Syntax error in a jail file or an include ordering issue. Fail2ban is strict, and one bad stanza can disable what you think is enabled.
Fix: Use fail2ban-client -d (Task 4) and journal errors (Task 2) to identify the bad file. Fix, then restart and re-check jail list.
Checklists / step-by-step plan
Checklist A — “I need bans working today” (30–60 minutes, disciplined)
- Confirm Fail2ban health: systemctl status fail2ban (Task 1).
- Read Fail2ban errors: journalctl -u fail2ban (Task 2).
- Verify your jail is running: fail2ban-client status (Task 3).
- Inspect jail details: fail2ban-client status sshd (Task 5).
- Prove failures exist in your chosen log source (Tasks 6–7).
- Prove the filter matches: fail2ban-regex (Task 10).
- Force-ban a test IP: fail2ban-client set sshd banip ... (Task 13).
- Verify firewall state with the correct tool (Tasks 14–16).
- Validate behavior with tcpdump (Task 17), then unban (Task 18).
Checklist B — Hardening without breaking bans (because “secure” is not “nonfunctional”)
- Decide on your enforcement backend: nftables or iptables-nft. Document it.
- Standardize jails in /etc/fail2ban/jail.d/*.local rather than editing jail.conf.
- Prefer backend = systemd on hosts where journald is authoritative.
- Keep ignoreip minimal; treat it like firewall allowlists (reviewed, justified, scoped).
- Cover IPv6 if your services are reachable via IPv6.
- Add a simple ban verification test to your post-deploy checks (manual ban, inspect firewall, unban).
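That last bullet can be as small as this sketch (the jail name, test IP, and nftables check are placeholders; swap the inspection command for iptables if that’s your enforcement path):

#!/bin/sh
# verify-f2b.sh — post-change ban verification (sketch, run as root)
JAIL=sshd
TESTIP=203.0.113.90   # documentation IP, harmless test target
RC=0

fail2ban-client set "$JAIL" banip "$TESTIP"
sleep 2
# Inspect the firewall layer you actually use; this assumes nftables actions.
if nft list ruleset | grep -q "$TESTIP"; then
  echo "OK: ban visible in nftables"
else
  echo "FAIL: ban not found in nftables ruleset" >&2
  RC=1
fi
fail2ban-client set "$JAIL" unbanip "$TESTIP"
exit "$RC"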
Checklist C — When you suspect rule ordering problems
- Force-ban a test IP (Task 13).
- Find the rule and chain presence (Tasks 15–16).
- Confirm the ban chain is actually referenced from the input path (iptables: jump from INPUT; nft: chain hook or referenced set).
- Test behavior with tcpdump (Task 17).
- Adjust action configuration or firewall policy order until packets stop being accepted.
FAQ
1) Why does Fail2ban show “Total failed” increasing but still not ban?
Because banning is threshold-driven. Verify maxretry, findtime, and ignoreip. Then temporarily lower maxretry to prove the pipeline works, and tune back.
2) Should I use backend = systemd on Ubuntu 24.04?
If your authentication failures live in journald (common), yes. Use the journal as the source of truth. File tailing is fine when the file is real, current, and consistently formatted.
3) How do I know whether I’m using nftables or iptables?
Check update-alternatives --display iptables and inspect the active rules with the matching tool. If Fail2ban uses nftables actions, inspect with nft list ruleset.
4) I see bans in Fail2ban status. Does that guarantee traffic is blocked?
No. That only proves Fail2ban’s internal state. Always validate enforcement by checking firewall rules and packet behavior (tcpdump or a controlled connection attempt).
5) Why does manual ban work but automatic ban doesn’t?
Manual ban bypasses filtering. Automatic bans depend on matching log lines. Use fail2ban-regex against the exact log source to prove your filter matches reality.
6) Can UFW interfere with Fail2ban?
Yes, usually through rule ordering or conflicting policy assumptions. The fix is not “disable UFW.” The fix is to ensure Fail2ban’s blocking is evaluated for the relevant traffic path and to verify behavior.
7) Why do I get banned from my own server sometimes?
NAT and shared egress. Multiple people failing logins from the same public IP accumulate retries. Add your admin egress IPs to ignoreip narrowly, or improve access patterns (VPN, bastion, keys).
8) Do I need to worry about IPv6 with Fail2ban?
If the service is reachable over IPv6, yes. Otherwise, you create a “ban works for half the Internet” situation, which is not the kind of success you want to report.
9) Is Fail2ban enough protection for SSH?
It’s a useful noise reducer and rate-limiter by ban. But the fundamentals still matter more: key-based auth, no password auth where possible, MFA on bastions, and least exposure on the network edge.
10) What’s the single best sanity check when someone says “Fail2ban isn’t working”?
Force a ban on a test IP, verify the firewall state, then verify packet behavior. That isolates enforcement from detection in minutes.
Conclusion: next steps you can do today
If Fail2ban isn’t banning on Ubuntu 24.04, treat it like any other production pipeline: input, processing, output, enforcement. Verify each layer with evidence. Don’t tweak knobs in the dark.
Do these three things next:
- Prove log visibility: confirm your failures exist where the jail reads them (journal vs file), and run fail2ban-regex against real data.
- Prove enforcement: manually ban a test IP and confirm rules appear in the correct firewall subsystem (nft vs iptables), then validate behavior with packets.
- Make it repeatable: add a small post-change verification step (ban, observe, unban) so upgrades don’t silently unhook your defenses.
Fail2ban is not magic. That’s good news. It means you can debug it like an adult: with a workflow, not a wish.