Rspamd: The Anti-Spam Tuning That Blocks Attacks Without Blocking People

February 26, 2026 • Read: 23 min

Your mail system doesn’t fail politely. It fails at 09:12 on a Tuesday when sales can’t reach prospects, invoices don’t arrive,
and someone in compliance starts saying the word “audit” like it’s a spell.

The hard part isn’t “blocking spam.” The hard part is blocking the right spam—during a flood—without nuking your CEO’s calendar invite
from a hotel Wi‑Fi IP block that looks like it was last cleaned in 2009.

The production mental model: Rspamd is a policy engine, not a magic box

If you approach Rspamd like a single “spam score” number that you tweak until the pager stops, you’ll be busy forever. If you approach it
like a policy engine—an opinionated classifier with modules, reputation, and enforcement—you’ll get predictable outcomes.

In production, you don’t optimize for “maximum spam caught.” You optimize for: (1) minimal false positives, (2) bounded resource usage under attack,
and (3) quick reversibility when your assumptions blow up.

The trick is building a layered decision pipeline:

Authenticate (SPF/DKIM/DMARC) and decide what failures mean for your domains vs everyone else.
Classify content (symbols, rules, Bayesian where appropriate) without letting it become an unbounded CPU festival.
Apply policy (ratelimits, greylisting, URL reputation, attachment controls) based on observed abuse patterns.
Deliver safely (rewrite subject, add headers, quarantine) so humans can recover when the classifier is wrong.

Here’s the part many teams miss: the mail system is an adversarial environment. Attackers adapt. Your tuning has to be resilient, not perfect.
“Perfect” is what you say right before you deploy something that blocks payroll.

Paraphrased idea from Werner Vogels: Everything fails; design for it, detect it fast, and recover without heroics.

Interesting facts and history that matter in practice

A few context points that actually affect how you run Rspamd day-to-day. Not trivia—things that change decisions.

Mail authentication is young compared to SMTP. SMTP dates to the early 1980s; SPF, DKIM, and DMARC came decades later, and most of the world is still catching up.
Rspamd was built around a modular symbol system. You don’t “get a score”; you get a bag of symbols with weights, and that’s why tuning is tractable.
Redis became the common state backend for modern filtering. Reputation, ratelimits, fuzzy hashes, and Bayesian state need fast shared memory; Redis is the pragmatic choice in many deployments.
Greylisting works because spammers optimize for throughput. Legit MTAs retry; cheap spam bots often don’t. It’s less magical now than it used to be, but still useful when tuned.
DMARC changed the social contract. It lets domain owners publish “reject/quarantine” intentions, which finally gives receivers permission to be strict for some domains.
“Spam” evolved into credential theft. The ROI shifted from selling pills to stealing logins. That changes which signals matter: brand impersonation, URL reputation, and display-name tricks.
Mail floods are often a smokescreen. Attackers mail-bomb an inbox while attempting password resets or fraudulent transactions elsewhere.
Modern spam uses legitimate infrastructure. Compromised SaaS accounts and marketing platforms send mail that “passes auth,” forcing you to lean on content and behavioral controls.

Joke #1: Email is the only protocol where “it’s probably fine” is traditionally validated by waiting three days and seeing who complains.

How Rspamd makes decisions (and where tuning actually lives)

Rspamd evaluates messages and produces:

Symbols: named detections like R_SPF_FAIL or R_DKIM_ALLOW.
Scores: weights applied to symbols, combined into an overall action score.
Actions: what to do at thresholds (no action, add header, greylist, rewrite subject, reject).

Actions are the policy. Scores are just the wiring.
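
To make the wiring concrete, here is a minimal sketch of how symbol weights add up to a score and how thresholds turn that score into an action. The thresholds and weights are made-up illustrative values, not Rspamd defaults, and the function names are hypothetical.

```python
# Illustrative thresholds and weights only; these are not Rspamd defaults.
# Ordered from strictest to mildest so the first match wins.
THRESHOLDS = [
    (15.0, "reject"),
    (8.0, "rewrite subject"),
    (6.0, "add header"),
]


def total_score(symbols):
    """Sum the weights of every symbol that fired on one message."""
    return sum(symbols.values())


def choose_action(score):
    """Return the strictest action whose threshold the score reaches."""
    for minimum, action in THRESHOLDS:
        if score >= minimum:
            return action
    return "no action"


# Hypothetical message: three symbols fired with these weights.
FIRED = {"R_SPF_FAIL": 2.0, "MIME_HTML_ONLY": 1.0, "RBL_DBL_SPAM": 5.0}
print(choose_action(total_score(FIRED)))
```

Changing a weight moves individual messages around; changing a threshold moves the whole policy. That is why the two deserve separate change reviews.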

Most production pain comes from conflating “score tuning” with “policy tuning.” The stable approach:

Keep actions conservative: don’t reject aggressively until you have proven low false positives.
Use quarantine or rewrite subject as intermediate steps while you learn your mail stream.
Favor ratelimits and greylisting for floods, because they control resource use even when classification is noisy.

The modules you should actually care about

A pragmatic “core set” for many environments:

SPF/DKIM/DMARC (auth signals, domain policy)
ARC (forwarding chains; avoids punishing legitimate list/forward scenarios)
RBL / SURBL (IP/domain reputation; be cautious with aggressive lists)
Reputation (history-based, usually Redis-backed)
Ratelimit (flood control)
Multimap (your custom allow/deny logic at scale)
Fuzzy (hash-based similarity; good for repeated campaigns)
Bayes (useful but easy to poison and easy to misunderstand)

Where to put custom logic

If you keep adding Lua scripts because “we needed one more rule,” you’ll eventually have a bespoke mail filter written by three former employees.
Prefer:

multimap for most custom policy (domains, senders, MIME types, subject patterns)
local.d/ overrides instead of editing stock config
small, testable Lua only when multimap can’t express the logic

Scoring strategy: stop chasing “the perfect score”

The scoring system is a machine for expressing trade-offs. Your job is to make those trade-offs explicit.

Start with actions, then tune weights

A common baseline:

add header at a relatively low score (inform downstream, enable searches)
rewrite subject a bit higher (makes end-users notice)
greylist used sparingly (especially for inbound from unknown senders)
reject only when you’re confident (or when an attack forces your hand)

Embrace “soft failures” for auth signals

SPF and DKIM are valuable, but the real world is messy: forwarding, broken DNS, misconfigured senders.
Use failures as signals that combine with others, not as single-point rejection triggers—except for domains with strong DMARC policies.

Build separate policies for “your domains” vs “the internet”

Outbound and inbound are different sports.

For inbound, you care about phishing, spoofing, and floods.
For outbound, you care about account compromise and reputation preservation.

Your own domains can be held to stricter standards: enforce DKIM signing, enforce From alignment, enforce rate controls. If your own systems can’t comply,
that’s not Rspamd’s fault. That’s a governance issue wearing a technical disguise.

Practical tasks: commands, outputs, and decisions (12+)

These are production-grade “do this now” tasks. Each has: a command, what the output means, and the decision you make from it.
Assumptions: Rspamd installed, systemd-based Linux, Redis available, and either Postfix or another MTA integrated.

Task 1: Confirm Rspamd is actually healthy (not just running)

cr0x@server:~$ systemctl status rspamd --no-pager
● rspamd.service - Rspamd daemon
     Loaded: loaded (/lib/systemd/system/rspamd.service; enabled; vendor preset: enabled)
     Active: active (running) since Wed 2026-02-04 08:10:21 UTC; 2h 12min ago
   Main PID: 1423 (rspamd)
      Tasks: 32 (limit: 18958)
     Memory: 410.2M
        CPU: 8min 33.120s
     CGroup: /system.slice/rspamd.service
             ├─1423 rspamd: main process
             ├─1431 rspamd: proxy process (localhost:11332)
             ├─1432 rspamd: controller process (localhost:11334)
             └─1440 rspamd: worker process (normal)

Meaning: “Active” is table stakes. Memory/CPU gives you an early hint about leaks, rule explosions, or flood conditions.
Multiple processes are normal; look for worker count and growth.

Decision: If memory is climbing without bound or CPU is pegged, skip tuning weights and go to the Fast diagnosis playbook.

Task 2: Check Rspamd configuration sanity before you change anything

cr0x@server:~$ rspamadm configtest
syntax OK

Meaning: “syntax OK” means Rspamd parsed the config and initialized its modules without errors.

Decision: If this fails, don’t reload and “see what happens.” Fix the config error first; mail queues make excellent time bombs.

Task 3: Verify controller access and see live action distributions

cr0x@server:~$ rspamc stat
Uptime: 7964 seconds
Messages scanned: 184295
Messages learned: 312
Messages with action reject: 3911, 2.12%
Messages with action add header: 22144, 12.01%
Messages with action rewrite subject: 8021, 4.35%
Messages with action no action: 150219, 81.52%
Scanned messages per second: 23.15
Actions histogram:
no action: 150219
add header: 22144
rewrite subject: 8021
reject: 3911

Meaning: This is your control chart. Sudden spikes in “reject” or “rewrite subject” usually correlate with an inbound campaign or a broken sender.

Decision: If reject rate jumps suddenly, you investigate symbols and sources before adjusting thresholds. Don’t “fix” a campaign by permanently lowering standards.
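
If you want the control chart as numbers rather than eyeballing it, a small parser is enough. This is a sketch that assumes the text layout shown in the sample output above; the function names and the "more than double the baseline" rule are illustrative choices, not anything Rspamd ships.

```python
import re

# Matches lines such as "Messages with action reject: 3911, 2.12%".
ACTION_LINE = re.compile(r"^Messages with action (.+): (\d+), [\d.]+%$")

STAT = """\
Messages scanned: 184295
Messages with action reject: 3911, 2.12%
Messages with action add header: 22144, 12.01%
Messages with action rewrite subject: 8021, 4.35%
Messages with action no action: 150219, 81.52%
"""


def action_counts(stat_text):
    """Map each action name to its message count."""
    counts = {}
    for line in stat_text.splitlines():
        match = ACTION_LINE.match(line.strip())
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


def reject_rate(counts):
    """Fraction of scanned messages that were rejected."""
    total = sum(counts.values())
    return counts.get("reject", 0) / total if total else 0.0


def rate_jumped(baseline, current, factor=2.0):
    """True when the current rate exceeds the baseline by the given factor."""
    return current > baseline * factor
```

Store the baseline from a normal day, then compare against it on a schedule. A jump is a prompt to pull samples, not a reason to edit thresholds.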

Task 4: Inspect top symbols to find what’s actually driving actions

cr0x@server:~$ rspamc counters
Symbol: R_SPF_FAIL: 12401
Symbol: DKIM_SIGNED: 98234
Symbol: DMARC_POLICY_REJECT: 821
Symbol: RBL_DBL_SPAM: 6402
Symbol: URL_COUNT_5: 4901
Symbol: MIME_HTML_ONLY: 17322
Symbol: BAYES_SPAM: 2501
Symbol: BAYES_HAM: 1811

Meaning: Counters show symbol frequency. High-frequency symbols aren’t always bad; they tell you what the environment looks like.

Decision: If a single symbol is causing most rejects, validate it against real mail samples. Fix the rule or weight, not the entire policy.

Task 5: Pull a specific message scan result (headers or EML)

cr0x@server:~$ rspamc -h localhost:11334 -P 'controllerpass' --json --compact symbols < /var/spool/mail/samples/suspect.eml
{"is_spam":true,"score":18.00,"required_score":15.00,"action":"reject","symbols":{"R_SPF_FAIL":{"score":2.00},"DMARC_POLICY_REJECT":{"score":4.00},"RBL_DBL_SPAM":{"score":5.00},"MIME_HTML_ONLY":{"score":1.00},"PHISHING":{"score":6.00}}}

Meaning: This is the “why.” It’s not enough to know the message was rejected; you need to see which symbols pushed it over.

Decision: If DMARC policy reject is firing for a domain that should be allowed (forwarders, mailing lists), you investigate ARC and authentication chain, not just weights.
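
When you are comparing many samples, it helps to rank symbols by contribution and ask whether a message would still cross the threshold without one of them. The sketch below assumes a JSON result shaped like the sample above (a top-level score, required_score, and a symbols map with a score per symbol); the helper names are hypothetical.

```python
import json

SAMPLE = (
    '{"score": 18.0, "required_score": 15.0, "action": "reject", "symbols": {'
    '"R_SPF_FAIL": {"score": 2.0}, "DMARC_POLICY_REJECT": {"score": 4.0}, '
    '"RBL_DBL_SPAM": {"score": 5.0}, "MIME_HTML_ONLY": {"score": 1.0}, '
    '"PHISHING": {"score": 6.0}}}'
)


def top_contributors(result_json, limit=3):
    """Return the highest-scoring symbols as (name, score) pairs."""
    result = json.loads(result_json)
    scored = [
        (name, info.get("score", 0.0))
        for name, info in result.get("symbols", {}).items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


def still_over_without(result_json, symbol):
    """Would the message still reach the threshold if this symbol had not fired?"""
    result = json.loads(result_json)
    contribution = result.get("symbols", {}).get(symbol, {}).get("score", 0.0)
    return result["score"] - contribution >= result["required_score"]
```

If removing one symbol flips the outcome for most of your false positives, that symbol is the thing to validate first.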

Task 6: Validate Redis health and latency (Rspamd depends on it more than you think)

cr0x@server:~$ redis-cli -h 127.0.0.1 -p 6379 INFO stats | grep -E 'instantaneous_ops_per_sec|keyspace_hits|keyspace_misses'
instantaneous_ops_per_sec:1240
keyspace_hits:9319921
keyspace_misses:42112

Meaning: Ops/sec gives load. Hits vs misses gives cache effectiveness. Lots of misses can mean a cold cache or key churn.

Decision: If ops/sec spikes during mail floods, make sure Redis isn’t saturating its single main thread, and consider isolating it or sizing the Rspamd worker count to match.
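
The hit ratio is simple arithmetic, but it is easier to alert on when it is a single number. A minimal sketch, assuming you feed it the raw text of INFO stats; the function names are hypothetical.

```python
INFO_STATS = """\
# Stats
instantaneous_ops_per_sec:1240
keyspace_hits:9319921
keyspace_misses:42112
"""


def parse_info(info_text):
    """Turn 'key:value' lines from redis-cli INFO into a dict of strings."""
    fields = {}
    for line in info_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        fields[key] = value
    return fields


def hit_ratio(fields):
    """Fraction of key lookups that found a key."""
    hits = int(fields["keyspace_hits"])
    misses = int(fields["keyspace_misses"])
    total = hits + misses
    return hits / total if total else 0.0
```

Both counters are cumulative since Redis started, so for trend detection compare the difference between two samples rather than the lifetime ratio.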

Task 7: Check Redis memory and eviction policy (to avoid “forgetting” reputations)

cr0x@server:~$ redis-cli INFO memory | grep -E 'used_memory_human|maxmemory_human|mem_fragmentation_ratio'
used_memory_human:2.13G
maxmemory_human:3.00G
mem_fragmentation_ratio:1.12

Meaning: If you set a maxmemory and Redis starts evicting, your reputation and ratelimit state becomes unreliable. Fragmentation hints at allocator behavior.

Decision: If memory is tight, either raise maxmemory or reduce what you store (modules, retention). “Let it evict” is a policy choice, not a default.
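
One way to turn "memory is tight" into a check is to compute headroom from the byte counters and look at evictions. This sketch assumes you pass in the used_memory and maxmemory values from INFO memory and evicted_keys from INFO stats; the 20% headroom floor is an arbitrary example, not a recommendation.

```python
def memory_headroom(used_bytes, max_bytes):
    """Fraction of maxmemory still free, or None when maxmemory is 0 (no limit)."""
    if max_bytes == 0:
        return None
    return 1.0 - used_bytes / max_bytes


def eviction_risk(used_bytes, max_bytes, evicted_keys, min_headroom=0.2):
    """Classify Redis memory state as 'evicting', 'tight', or 'ok'."""
    if evicted_keys > 0:
        return "evicting"
    headroom = memory_headroom(used_bytes, max_bytes)
    if headroom is not None and headroom < min_headroom:
        return "tight"
    return "ok"


# Roughly the 2.13G used out of 3.00G shown above, expressed in bytes.
USED = 2_287_070_085
LIMIT = 3_221_225_472
print(eviction_risk(USED, LIMIT, evicted_keys=0))
```

Anything other than "ok" means reputation and ratelimit state may already be less reliable than your policy assumes.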

Task 8: See if Rspamd is CPU-bound (rules too heavy, too many workers, or not enough)

cr0x@server:~$ pidstat -p $(pidof rspamd | tr ' ' ',') 1 3
Linux 6.1.0 (server)  02/04/2026  _x86_64_  (8 CPU)

12:40:11 PM   PID  %usr %system  %CPU  Command
12:40:12 PM  1423  1.00    0.00  1.00  rspamd
12:40:12 PM  1440 92.00    2.00 94.00  rspamd
12:40:12 PM  1441 88.00    1.00 89.00  rspamd

Meaning: Workers pegged near 100% are normal under load, but sustained pegging with growing queues means a bottleneck.

Decision: If CPU is saturated, reduce expensive checks (URL extraction depth, heavy regex, too many DNS lookups) or add capacity. Don’t “fix” CPU with lower reject thresholds.

Task 9: Measure DNS resolution health (many modules are DNS-shaped)

cr0x@server:~$ resolvectl statistics
Transactions: 219381
Cache Hits: 171224
Cache Misses: 48157
DNSSEC Verdicts Secure: 0
DNSSEC Verdicts Insecure: 0
DNSSEC Verdicts Bogus: 0

Meaning: High misses can mean poor caching or short TTLs. In mail filtering, DNS is often the hidden latency tax.

Decision: If misses are high and latency is visible, put a local caching resolver on the same host or LAN, and ensure Rspamd uses it.

Task 10: Test an RBL/SURBL lookup path (and detect timeouts)

cr0x@server:~$ dig +time=1 +tries=1 2.0.0.127.zen.spamhaus.org A +short
127.0.0.2

Meaning: A fast answer means your DNS path is working and the list is reachable. Timeouts here become per-message delays.

Decision: If lookups time out, either fix DNS egress, switch resolvers, or disable the list. Timeouts under load are how you self-DoS your mail gateway.
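
To track lookup latency over time instead of spot-checking with dig, you can time lookups against a budget. This is a sketch only: the resolver is passed in as a function so it can be swapped or faked, and the one-second budget is an assumption. Note that an "unlisted" answer from a blocklist normally surfaces as a lookup error, so the sketch measures elapsed time and does not treat errors as slowness.

```python
import time


def timed_lookup(resolve, name, budget_seconds=1.0):
    """Time one lookup and report whether it finished within the budget.

    `resolve` is any callable that takes a hostname and returns an answer,
    for example socket.gethostbyname. Lookup errors (OSError, which covers
    socket.gaierror) are recorded rather than raised.
    """
    start = time.monotonic()
    try:
        answer, error = resolve(name), None
    except OSError as exc:
        answer, error = None, str(exc)
    elapsed = time.monotonic() - start
    return {
        "name": name,
        "answer": answer,
        "error": error,
        "elapsed": elapsed,
        "within_budget": elapsed <= budget_seconds,
    }
```

In real use you might call timed_lookup(socket.gethostbyname, "2.0.0.127.zen.spamhaus.org") after importing socket. The budget here only measures; it does not cancel a slow lookup, so resolver timeouts still need to be set where the resolver is configured.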

Task 11: Confirm Postfix is actually handing mail to Rspamd (integration sanity)

cr0x@server:~$ postconf -n | grep -E 'smtpd_milters|non_smtpd_milters|milter_protocol|milter_default_action'
smtpd_milters = inet:127.0.0.1:11332
non_smtpd_milters = inet:127.0.0.1:11332
milter_protocol = 6
milter_default_action = accept

Meaning: If milters aren’t configured, you’re tuning a filter that never sees traffic. Default action “accept” is a safety choice during outages.

Decision: Keep milter_default_action=accept unless you enjoy explaining why the mail system failed closed at the worst possible moment.

Task 12: Watch live throughput and queue pressure (the “is this an attack?” check)

cr0x@server:~$ mailq | sed -n '1,7p;$p'
-Queue ID-  --Size-- ----Arrival Time---- -Sender/Recipient-------
A1B2C3D4E5     2480 Wed Feb  4 12:38:02  sender@example.net
                                         recipient@yourdomain.com

F6G7H8I9J0     3121 Wed Feb  4 12:38:02  sender2@random.tld
                                         recipient@yourdomain.com

-- 2814 Kbytes in 742 Requests.

Meaning: A growing queue indicates downstream slowness: Rspamd, DNS, content checks, or delivery issues.

Decision: If the queue grows during a spam wave, enable or tighten ratelimits/greylisting instead of only adjusting content scores.
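
"Growing" only means something if you sample the queue more than once. The sketch below pulls the request count from the mailq summary line and checks whether consecutive samples keep rising. It assumes the Postfix summary format shown above; the function names are hypothetical.

```python
import re

# Matches the final mailq line, e.g. "-- 2814 Kbytes in 742 Requests."
SUMMARY = re.compile(r"^-- (\d+) Kbytes in (\d+) Requests?\.$")


def queue_depth(mailq_text):
    """Return the number of queued messages reported by mailq."""
    lines = mailq_text.strip().splitlines()
    last = lines[-1].strip() if lines else ""
    if last == "Mail queue is empty":
        return 0
    match = SUMMARY.match(last)
    if not match:
        raise ValueError("unrecognized mailq summary: %r" % last)
    return int(match.group(2))


def is_growing(samples):
    """True when every sample is larger than the one before it."""
    return len(samples) >= 2 and all(b > a for a, b in zip(samples, samples[1:]))
```

Three or four samples a minute apart are usually enough to separate a burst that is draining from a queue that is falling behind.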

Task 13: Inspect Rspamd logs for timeouts and module pain

cr0x@server:~$ journalctl -u rspamd --since "30 min ago" | grep -Ei 'timeout|slow|error|redis' | tail -n 12
Feb 04 12:22:11 server rspamd[1440]: <0c1d3a>; lua; lua_rsa.lua:142: slow hash computation: 215ms
Feb 04 12:22:18 server rspamd[1441]: ; rbl; rbl.c:512: timeout while resolving zen.spamhaus.org for 203.0.113.44
Feb 04 12:23:04 server rspamd[1440]: ; redis; redis_pool.c:98: cannot connect to redis: Connection refused

Meaning: These messages point to exact modules causing latency or failure. RBL timeouts = DNS or egress. Redis connection issues = state modules failing open/closed.

Decision: Fix the dependency (DNS/Redis) before tuning rules. Otherwise you’re tuning on broken sensors.
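
When the log is noisy, a tally by dependency tells you where to look first. This sketch buckets lines using patterns taken from the sample output above; the patterns and category names are assumptions and would need adjusting to what your own logs actually say.

```python
import re
from collections import Counter

# Checked in order; the first matching pattern decides the category.
PATTERNS = [
    ("redis", re.compile(r"cannot connect to redis", re.IGNORECASE)),
    ("dns", re.compile(r"timeout while resolving", re.IGNORECASE)),
    ("slow", re.compile(r"\bslow\b", re.IGNORECASE)),
]

LINES = [
    "Feb 04 12:22:11 server rspamd[1440]: <0c1d3a>; lua; lua_rsa.lua:142: slow hash computation: 215ms",
    "Feb 04 12:22:18 server rspamd[1441]: ; rbl; rbl.c:512: timeout while resolving zen.spamhaus.org for 203.0.113.44",
    "Feb 04 12:23:04 server rspamd[1440]: ; redis; redis_pool.c:98: cannot connect to redis: Connection refused",
]


def classify(line):
    """Return the first matching category for a log line, or 'other'."""
    for category, pattern in PATTERNS:
        if pattern.search(line):
            return category
    return "other"


def tally(lines):
    """Count log lines per category."""
    return Counter(classify(line) for line in lines)
```

A tally dominated by one category points at the dependency to fix before any rule tuning.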

Task 14: Safely reload Rspamd after config changes

cr0x@server:~$ rspamadm configtest && systemctl reload rspamd && systemctl is-active rspamd
syntax OK
active

Meaning: Reload keeps process uptime while applying config changes; “active” confirms it didn’t crash-loop.

Decision: Always configtest before reload. You don’t want to learn syntax rules via mail outage.

Fast diagnosis playbook

When mail delivery gets slow, rejects spike, or users start forwarding screenshots with red circles: do this in order.
The goal is to find the bottleneck in minutes, not perform a full anthropology of your spam culture.

First: is it throughput, latency, or policy?

Throughput issue: queues growing, scan rate dropping, CPU pegged.
Latency issue: per-message delays, timeouts in logs, DNS stalls.
Policy issue: sudden false positives/negatives, action distribution shifts.

Second: check the three usual suspects

DNS:
- Look for RBL/SURBL timeouts in logs.
- Check caching resolver stats.
- Test representative lookups with dig using low timeout.
Redis:
- Is Redis up? Is it swapping? Is it evicting?
- Check latency spikes and connection errors.
CPU/memory pressure:
- Are Rspamd workers pegged? Are they spending time in one heavy module?
- Is the box thrashing due to memory or disk I/O (especially if Redis persists)?

Third: look at action histogram and top symbols

If “reject” or “rewrite subject” explodes, don’t immediately edit weights.
Pull 10 samples (a mix of rejected and borderline) and compare symbol sets. You’re looking for one of these:

A single bad reputation source (RBL returning false positives, DNS poisoning, network blocklists too aggressive)
Authentication chain changes (forwarders, new marketing platform, broken DKIM key rotation)
A new campaign pattern (new URLs, new attachment type, new subject tricks)

Fourth: choose the right lever

Attack flood: ratelimit + greylisting + connection controls.
Targeted phish: multimap/URL rules + strict DMARC handling for protected brands + subject/display-name heuristics.
False positives: quarantine + allowlists that are narrow + fix the specific module weight.
Performance collapse: reduce slow DNS-based modules, add caching, isolate Redis, scale workers.

Three corporate-world mini-stories from the trenches

1) The incident caused by a wrong assumption: “DKIM valid means safe”

A mid-size company moved to a single inbound mail gateway with Rspamd. They had just finished a multi-month “deliverability project,”
so authentication was the star of the show: SPF aligned, DKIM signed, DMARC enforced on their own domains.
They assumed that inbound mail with a valid DKIM signature was mostly legitimate because “spammers can’t do DKIM.”

Then came the phish wave. It wasn’t spoofing. It was sent from compromised accounts on a legitimate SaaS platform that signs outbound DKIM correctly.
The mail passed SPF and DKIM, and DMARC was aligned because the attacker used a lookalike domain with valid auth.
The content was minimal, the URL was freshly registered, and the payload lived behind a redirect chain that didn’t show up in basic checks.

The team’s first response was to crank up weights on content symbols. That helped a bit and hurt a lot: suddenly regular “one-line” replies and HR notifications
got rewritten subjects, because many minimal emails look suspicious in a vacuum.

The fix was less exciting and more effective: they stopped treating DKIM as a trust signal and started treating it as an identity signal.
They built multimap rules for protected brands and internal workflows, added URL reputation checks that resolved redirects safely,
and introduced a quarantine workflow for “auth-passing but suspicious” messages. DKIM still mattered, but it didn’t get to vote twice.

After that, the system caught the next wave without breaking normal mail. The team also learned a useful lesson:
passing authentication is increasingly common for bad mail, because bad actors borrow good infrastructure.

2) The optimization that backfired: “We’ll just enable every RBL”

Another organization had a spam problem and a procurement problem. They didn’t want to pay for anything new, and the team
found a long list of DNS-based blocklists. Someone proposed: “Enable them all. More lists, more protection.”
It sounds reasonable until you remember how DNS works under load.

They enabled a stack of RBL/SURBL providers with long timeouts and very little caching. For the first day, it looked amazing.
Rejects went up, spam complaints went down, and the team enjoyed that rare feeling of being right.

On day two, a flood hit. DNS lookups spiked, timeouts began, and Rspamd workers spent their time waiting for answers that never came.
Postfix queues grew. Legit mail got delayed by minutes. Then someone “fixed” it by increasing the number of Rspamd workers, which just multiplied DNS pressure.
The system wasn’t filtering spam anymore; it was DDoS’ing its own resolver.

Recovery was simple but not glamorous: they reduced the number of RBLs to a small set with proven signal quality,
put a local caching resolver in front, and set sane per-module timeouts so failures degraded gracefully.
They also kept metrics on DNS latency. It’s hard to argue with a graph that screams.

3) The boring but correct practice that saved the day: “Quarantine and review, always”

A regulated firm handled legal mail, finance mail, and the kind of HR mail that becomes evidence if mishandled.
They wanted aggressive spam rejection. The mail team refused. They implemented a quarantine policy instead:
most “high confidence” spam was rejected, but anything in a gray zone was quarantined for a defined retention period.

The quarantine was not a junk folder in a user mailbox. It was an operational control:
searchable, auditable, with role-based access and an incident workflow. Every false positive had a path to recovery.
Every release event logged who did it and why.

One afternoon a major partner rotated DKIM keys incorrectly and started failing alignment. Their mail looked like spoofing, and some of it landed in quarantine.
The business impact was real but contained: messages were delayed, not destroyed.

Because the team had the boring practice in place, the incident was a 30-minute operational exercise, not a multi-day blame storm.
They adjusted a temporary exception with tight scope, told the partner what broke, and removed the exception once the partner fixed DNS.
Nothing “clever” happened. That’s why it worked.

Tuning for modern attacks: floods, phish, and “business email cosplay”

Spam floods and mail-bombs: win by bounding resource usage

During a flood, content classification gets less reliable because attackers spray variations. The better move is to control the blast radius:
limit per-IP/per-sender rates, slow down unknowns, and protect your downstream MTA and mailboxes.

Use Rspamd’s ratelimit module and consider selective greylisting:

Ratelimit based on IP or ASN patterns when a campaign concentrates.
Ratelimit based on recipient during mail-bombs (one mailbox getting hammered).
Greylist unknown senders that fail basic hygiene checks (no valid rDNS, no auth, suspicious HELO), but avoid greylisting major providers that already have good reputation.
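
To see why rate limiting bounds work regardless of content, here is a sketch of a leaky bucket keyed by recipient. It illustrates the general technique under stated assumptions (a burst size and a steady leak rate); it is not Rspamd's implementation, and the class names and numbers are hypothetical.

```python
class LeakyBucket:
    """Allow short bursts, then hold the sender to a steady rate."""

    def __init__(self, burst, rate_per_second):
        self.burst = burst
        self.rate = rate_per_second
        self.level = 0.0
        self.last = None

    def allow(self, now):
        """Record one message at time `now` (seconds); False means limit it."""
        if self.last is not None:
            leaked = (now - self.last) * self.rate
            self.level = max(0.0, self.level - leaked)
        self.last = now
        if self.level + 1 > self.burst:
            return False
        self.level += 1
        return True


class RecipientLimiter:
    """One bucket per recipient, so a mail-bomb on one mailbox stays contained."""

    def __init__(self, burst, rate_per_second):
        self.burst = burst
        self.rate = rate_per_second
        self.buckets = {}

    def allow(self, recipient, now):
        key = recipient.lower()
        if key not in self.buckets:
            self.buckets[key] = LeakyBucket(self.burst, self.rate)
        return self.buckets[key].allow(now)


bucket = LeakyBucket(burst=3, rate_per_second=1.0)
results = [bucket.allow(0.0) for _ in range(4)]
limiter = RecipientLimiter(burst=1, rate_per_second=0.1)
```

The cost per message is a dictionary lookup and a little arithmetic, which is the point: the limiter stays cheap exactly when classification gets expensive.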

Credential phishing: treat URLs as the payload

Most phishing emails are URL delivery systems, not attachment delivery systems. Your tuning should reflect that:

Extract URLs reliably (but don’t let extraction become CPU hell).
Check URL reputation and newly registered domains.
Detect lookalike domains and brand bait in display names.
Be careful with redirect chains; attackers hide behind “legitimate” tracking domains.
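
Lookalike detection can start very simply: compare each domain in a message against a short list of brands you protect and flag near misses. The sketch below uses plain edit distance; the protected list and the distance cutoff of 2 are illustrative assumptions, and real deployments would also need to handle homoglyphs and subdomain tricks that edit distance misses.

```python
# Hypothetical list of domains you want to protect.
PROTECTED = ("example-bank.com", "payroll.example")


def edit_distance(a, b):
    """Levenshtein distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        current = [i]
        for j, char_b in enumerate(b, 1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution
            ))
        previous = current
    return previous[-1]


def lookalike_of(domain, protected=PROTECTED, max_distance=2):
    """Return the protected domain this one imitates, or None."""
    domain = domain.lower().rstrip(".")
    if domain in protected:
        return None  # the real thing is not a lookalike
    for brand in protected:
        if edit_distance(domain, brand) <= max_distance:
            return brand
    return None
```

A hit is a signal to combine with others (new registration, auth results, display name), not a verdict on its own.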

Impersonation and BEC-style attacks: authentication isn’t enough

Business email compromise often uses:

Display name spoofing (“CEO Name” from a random domain)
Reply-to manipulation (From looks fine; Reply-To goes elsewhere)
Thread hijacking (reusing subjects like “Invoice” and “Re: Payment”)

Rspamd can help with custom rules (multimap/Lua), but the real control is policy:
route suspicious “impersonation patterns” to quarantine or add a loud header for clients that display banners.
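
The first two patterns are cheap to check from headers alone. This sketch uses Python's standard email parsing to flag a protected display name arriving from the wrong domain and a Reply-To that points somewhere other than From. The VIP table and flag names are hypothetical, and the exact-name match is deliberately naive.

```python
from email import message_from_string
from email.utils import parseaddr

# Hypothetical table: lowercase display name -> domain that person really uses.
VIPS = {"jane doe": "example.com"}


def domain_of(address):
    """Return the lowercase domain part of an address, or '' if there is none."""
    return address.rsplit("@", 1)[-1].lower() if "@" in address else ""


def impersonation_flags(raw_message):
    """Return a list of impersonation patterns found in the headers."""
    message = message_from_string(raw_message)
    from_name, from_addr = parseaddr(message.get("From", ""))
    _, reply_addr = parseaddr(message.get("Reply-To", ""))
    flags = []
    expected = VIPS.get(from_name.strip().lower())
    if expected and domain_of(from_addr) != expected:
        flags.append("display-name-spoof")
    if reply_addr and domain_of(reply_addr) != domain_of(from_addr):
        flags.append("reply-to-mismatch")
    return flags


SPOOF = (
    "From: Jane Doe <jane.doe@freemail.example>\n"
    "Reply-To: payments@other.example\n"
    "Subject: Re: Payment\n\nPlease handle today.\n"
)
LEGIT = "From: Jane Doe <jane@example.com>\nSubject: Lunch\n\nNoon?\n"
```

Plenty of legitimate mail sets a different Reply-To (ticketing systems, newsletters), so these flags fit the quarantine-or-banner path better than outright rejection.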

Joke #2: The only thing more persistent than spam is the meeting invite for “Quick Sync” that keeps resurfacing like a haunted calendar entry.

Keeping humans unblocked: controlling false positives like an adult

False positives are not just “annoying.” They’re operational debt. People work around broken email by using personal accounts and chat apps,
and suddenly your data governance story turns into interpretive dance.

Design the “recovery path” before you tighten policy

Quarantine for borderline mail, with a defined review workflow.
Rewrite subject for medium confidence spam, so users can self-filter without losing mail.
Add headers for low confidence. Let downstream systems decide (SIEM, mailbox rules, user clients).

Prefer narrow allowlists over broad ones

“Allowlist the sender domain” is usually too broad. Prefer:

Allowlist a specific envelope sender + DKIM domain combination
Allowlist a specific IP range you control
Allowlist a known vendor’s dedicated sending domain (not their entire corporate domain)
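
A narrow allowlist entry can be modeled as a pair that must match together, plus an expiry so exceptions do not live forever. This is a sketch of the idea, not of how multimap stores it; the entries, dates, and function name are hypothetical.

```python
from datetime import date

# Hypothetical entries: (envelope sender, DKIM signing domain) -> expiry date.
ALLOW = {
    ("billing@vendor.example", "vendor.example"): date(2026, 6, 30),
}


def is_allowlisted(envelope_sender, passed_dkim_domains, today):
    """True only if sender and a passing DKIM domain match an unexpired entry."""
    sender = envelope_sender.strip().lower()
    for domain in passed_dkim_domains:
        expiry = ALLOW.get((sender, domain.strip().lower()))
        if expiry is not None and today <= expiry:
            return True
    return False
```

Requiring both halves means a spoofed envelope sender without the vendor's DKIM signature gets no benefit from the exception.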

Be strict where it’s safe: your own domains and high-risk brands

For mail that claims to be from your own domains, strictness is non-negotiable:
misaligned From should be punished. That’s how you stop spoofing and internal impersonation attempts.

For “protected brands” (banks, payroll providers, identity providers), strictness is also justified:
these are the domains attackers weaponize because users trust them.

Bayes: use it, but don’t let it drive the bus

Bayesian filtering can help, especially in stable mail environments. It can also:

Get poisoned by targeted campaigns
Overfit to internal jargon
Create invisible coupling between “training behavior” and mail outcomes

If you run Bayes, treat training as an operational process, not a hobby. Limit who can train, log training events, and periodically evaluate accuracy.

Common mistakes: symptom → root cause → fix

1) Symptom: sudden spike in rejected legitimate mail

Root cause: One reputation source turned “hot” (RBL/SURBL false positives) or DNS started timing out and failing in a biased way.

Fix: Identify the symbol driving rejects via rspamc symbols on samples. Reduce weight or disable that list, add DNS caching, set timeouts, and verify with live tests.

2) Symptom: queues growing, Rspamd CPU pegged, scan rate drops

Root cause: Too many expensive checks (RBLs, URL extraction, regex) + insufficient caching; or a flood pushing the system into worst-case behavior.

Fix: Tighten ratelimits/greylisting, reduce slow modules, ensure local caching resolver, and scale horizontally if needed. Measure before/after.

3) Symptom: “Spam passes” even with high scores configured

Root cause: Rspamd isn’t actually in the mail path (milter misconfig) or it’s failing open due to dependency issues (Redis down, timeouts).

Fix: Verify MTA milters, verify Rspamd logs, validate Redis connectivity. Don’t adjust scores until you confirm messages are being scanned.

4) Symptom: forwarded mail gets rejected due to DMARC

Root cause: Forwarding breaks SPF and can break DKIM; DMARC alignment fails and you enforce policy too strictly without ARC handling.

Fix: Enable and validate ARC processing, create targeted exceptions for known forwarders, and quarantine instead of reject for ambiguous forwarded flows.

5) Symptom: legitimate bulk mail (invoices, notifications) lands as spam

Root cause: Vendor misconfiguration: missing List-Unsubscribe headers, poor DKIM alignment, shared IP reputation, or “marketing template” content triggers.

Fix: Build narrow allow rules (specific sending domains/IPs), require DKIM alignment from vendors, and adjust weights only for the specific symbol patterns observed.

6) Symptom: intermittent failures, “cannot connect to redis”

Root cause: Redis on the same host is competing for memory/CPU, hitting maxclients, or being restarted by OOM killer.

Fix: Allocate resources properly, set Redis maxmemory with a deliberate eviction policy, monitor, and consider isolating Redis on dedicated infrastructure.

7) Symptom: “Everything is greylisted,” users complain about delays

Root cause: Greylisting applied too broadly (including major providers), or no allowlist is configured for known good senders.

Fix: Restrict greylisting to unknown/low-reputation sources, allowlist major MTAs where appropriate, and use ratelimit for floods instead of blanket greylist.

8) Symptom: false positives tied to one internal department

Root cause: Internal systems sending machine-generated HTML-only mail or odd MIME structures that look spammy.

Fix: Fix the sender to produce sane MIME, add proper DKIM, and only then add a narrow allow rule if needed.

Checklists / step-by-step plan

Step-by-step: harden Rspamd without detonating user trust

Baseline observability: action histogram, scan rate, queue depth, DNS latency, Redis health.
Set conservative actions: escalate from add header to rewrite subject to quarantine/greylist, and reach for reject only last.
Validate auth chain: SPF/DKIM/DMARC/ARC results on representative traffic (not just one happy-path sender).
Control floods: enable ratelimits; decide a recipient-protection plan for mail-bombs.
Choose reputation sources intentionally: fewer high-quality lists, with strict timeouts and caching.
Build multimap policy: protected brands, known vendors, internal systems, and “never reject” paths routed to quarantine.
Define exception workflow: who can allowlist, how long exceptions live, how you audit and remove them.
Run a false-positive drill: simulate a partner DKIM failure; confirm you can recover mail from quarantine fast.
Stress test: replay sample spam floods; verify scan rate and queue behavior under load.
Iterate weekly: small changes, measured impact, revert capability.

Checklist: before raising reject aggressiveness

Quarantine exists and is searchable with a defined SLA for review.
Top rejection symbols are understood with real examples.
DNS caching is in place and RBL timeouts are controlled.
Redis has headroom and no eviction surprises.
Forwarding paths are evaluated (ARC where needed).
On-call runbook exists for “partner broke DKIM” and “mail-bomb attack.”

Checklist: during an active attack

Confirm it’s an attack: queue growth + action histogram changes + source concentration.
Enable/tighten ratelimits for attack sources or recipients.
Consider temporary greylisting for unknown sources.
Reduce slow checks that amplify latency (extra RBLs, heavy URL extraction) until stable.
Quarantine suspicious auth-passing mail rather than rejecting if uncertain.
Post-incident: undo temporary broad controls, and convert what you learned into targeted rules.

FAQ

1) Should I reject spam at SMTP time or quarantine it?

Reject at SMTP time for high-confidence cases (clear malware/phish, DMARC failures for protected domains that publish p=reject, known bad IPs).
Quarantine for ambiguous cases. If you can’t recover mail reliably, you’re not ready to reject aggressively.

2) What’s the fastest way to reduce false positives?

Don’t “lower all scores.” Identify the top symbol(s) driving false positives by scanning samples and reviewing counters. Fix that module weight or rule,
and add a narrow exception for the specific sender behavior if needed.

3) Is greylisting still useful?

Yes, but only when selective. Blanket greylisting punishes legitimate bulk providers and creates user-visible delays.
Use it as a pressure valve for unknown sources or during floods, not as your primary filter.
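
For readers who have not looked inside greylisting, the classic form fits in a few lines: defer the first attempt from an unseen (IP, sender, recipient) triplet and accept a retry that arrives after a minimum delay. This sketch shows that classic triplet scheme under assumed timings; it is not a description of how Rspamd's greylisting module stores or hashes its state.

```python
class Greylist:
    """Classic triplet greylisting with a minimum retry delay and an expiry."""

    def __init__(self, min_delay=300, expire=86400):
        self.min_delay = min_delay  # seconds a sender must wait before retrying
        self.expire = expire        # seconds after which a triplet is forgotten
        self.first_seen = {}

    def check(self, ip, sender, recipient, now):
        """Return 'defer' or 'accept' for a delivery attempt at time `now`."""
        key = (ip, sender.lower(), recipient.lower())
        first = self.first_seen.get(key)
        if first is None or now - first > self.expire:
            self.first_seen[key] = now
            return "defer"
        if now - first < self.min_delay:
            return "defer"
        return "accept"


greylist = Greylist(min_delay=300, expire=86400)
```

The selectivity the answer above asks for lives outside this class: decide which sources are checked at all, and skip senders that already have good reputation.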

4) How many RBL/SURBL lists should I enable?

Fewer than you think. Choose a small number of high-signal lists, enforce tight timeouts, and make sure DNS caching is solid.
More lists can reduce spam—until they reduce your throughput more.

5) Why does spam pass SPF/DKIM/DMARC now?

Because attackers use compromised accounts and legitimate senders, or they register lookalike domains and set up auth properly.
Authentication proves identity of the sending domain, not the intent of the message.

6) Where should I keep local customizations?

In local.d/ overrides and multimap definitions. Avoid editing stock configs directly; it complicates upgrades and makes drift invisible.

7) How do I tune Rspamd under high load without making it worse?

Treat dependency latency as the enemy: DNS and Redis first. Use ratelimit/greylisting to bound work.
Disable or relax slow checks temporarily. Don’t add more workers unless you understand the shared bottleneck.

8) Do I need Bayesian filtering?

Not always. In diverse inbound mail environments, Bayes can be noisy and operationally expensive.
If you use it, lock down training, monitor accuracy, and ensure it’s not overpowering other signals.

9) How do I handle forwarded mail and mailing lists without getting phished?

Enable ARC evaluation where appropriate, quarantine ambiguous failures, and write explicit policy for known forwarders and lists.
Do not broadly “ignore DMARC.” That’s how spoofing returns.

Conclusion: next steps you can do this week

Rspamd will happily let you build a complicated scoring cathedral. Don’t. Build a policy machine that degrades gracefully,
is observable, and has a recovery path for humans.

Run rspamc stat and rspamc counters; screenshot baseline action rates and top symbols.
Validate DNS and Redis dependencies; fix timeouts and caching before touching weights.
Implement (or tighten) ratelimits for recipient mail-bombs and obvious source floods.
Set up quarantine for borderline cases; define who reviews it and how fast.
Create a multimap policy for protected brands and critical internal workflows.
Do one controlled tuning change, measure impact for a week, and keep a revert plan.

If you do those six things, you’ll block more attacks and block fewer people—which is the only metric that matters when the business is awake.