Rspamd False Positives: Tune Spam Scoring Without Letting Junk Through

You don’t notice spam filtering when it works. You notice it when a CFO’s wire confirmation gets rejected at 09:02 and suddenly you’re running a live incident in your inbox.

False positives are worse than spam. Spam wastes minutes; false positives burn trust, create shadow IT, and make people forward “important” mail through consumer accounts until the next audit. This is how you tune Rspamd scoring like an adult: evidence-first, reversible changes, and zero magical thinking.

The mental model: actions, symbols, and why false positives happen

Rspamd doesn’t “decide spam.” It sums signals. Each signal is a symbol with a score. The total score maps to an action: no action, add header, greylist, rewrite subject, reject, or quarantine (depending on your policy).

False positives usually come from one of four places:

  1. Bad assumptions about thresholds. Someone set reject = 6 because “6 feels right,” then forgot that a single broken DKIM plus a reputation hit can spike a clean newsletter.
  2. A symbol with an outsized score. A custom regexp rule that was meant to catch “invoice attached” now nails every “invoice” mention, including your own billing team.
  3. Misaligned authentication. SPF passes for one domain, DKIM signs as another, DMARC fails, and the message takes a scoring punch it doesn’t deserve—especially for forwarded mail and mailing lists.
  4. Training and reputation drift. Bayes trained on the wrong folder; fuzzy hashes learned a pattern that matches legitimate templates; historical reputation sources changed behavior.

Here’s the discipline: you tune symbols, not vibes. You tune per traffic class, not the entire internet. And you never “fix false positives” by simply raising thresholds until spam wins.

One dry truth from operations: “We’ll just bump the reject threshold” is the email equivalent of silencing a smoke alarm by removing the batteries. It works until it really, really doesn’t.

Also, one quote worth keeping on your wall. Werner Vogels' idea, paraphrased, is operational gold: everything fails all the time, so design for failure as a normal condition. Apply that here: assume some rules will be wrong, and build a process that catches and corrects them quickly.

Facts and context that actually matter in production

Nine quick context points, because production systems are built on history whether we admit it or not:

  • Fact 1: Rspamd was designed as a high-performance replacement for older monolithic filters, using a modular rule engine and async I/O to keep latency predictable under load.
  • Fact 2: The “symbol” model is intentionally transparent compared to many ML-only filters: you can explain why a message scored 9.3, which matters when Legal calls.
  • Fact 3: Greylisting was once a near-free win because early spam bots didn’t retry properly; modern spam infrastructure retries just fine, so greylisting is now a policy decision, not a cheat code.
  • Fact 4: DMARC adoption changed the scoring landscape: failing alignment isn’t automatically malicious, but it correlates strongly with abuse—especially for lookalike domains.
  • Fact 5: Mailing lists historically break DKIM because they modify headers/body; “DKIM fail” can mean “mailing list” more often than “spam.”
  • Fact 6: Autolearn (automatic Bayes training) has been a source of self-inflicted pain across multiple spam systems, not just Rspamd, because attackers can nudge your model with borderline content.
  • Fact 7: Redis became the de facto backing store for many Rspamd modules (Bayes, fuzzy, ratelimit, etc.) because low-latency lookups matter more than fancy persistence.
  • Fact 8: MIME parsing and URL extraction are perennial hotspots; performance regressions often show up as “false positives” because timeouts change which checks run.
  • Fact 9: “Legitimate bulk mail” is operationally closer to spam than to personal mail. Treat it as its own traffic class with its own policy.

Fast diagnosis playbook: find the bottleneck in minutes

When someone says “Rspamd is blocking legit mail,” you don’t start by editing scores. You start by proving what is happening.

First: confirm what action happened and why

  • Pull the message’s Rspamd result (from headers, logs, or controller).
  • Identify the top 3 scoring symbols.
  • Check whether the action (reject/quarantine/greylist) matches policy for that traffic class.

Second: check whether this is a systemic shift or a single sender

  • Compare today vs yesterday symbol frequency and mean scores.
  • Look for a sudden surge in one symbol (e.g., a DNSBL or DMARC alignment symbol).
  • Validate DNS health: timeouts are silent scoring modifiers because they change what checks run.

Third: verify the plumbing (Rspamd and dependencies)

  • Rspamd workers healthy? No crash loops? No backlog?
  • Redis latency and evictions? (Bayes/fuzzy data loss can change behavior.)
  • DNS resolver performance and correctness.

Fourth: apply the smallest possible fix

  • If one symbol is wrong, adjust that symbol or scope it (per domain, per IP, per sender).
  • If authentication alignment is noisy, tune those symbols but don’t neuter them.
  • If you must relax actions, do it temporarily and pair it with a compensating control (quarantine instead of reject; add headers and monitor).

Practical tasks: commands, outputs, and what you do next

These are the kinds of tasks you run during an incident or a tuning session. Each includes a command, a realistic snippet of output, what it means, and the decision you make next.

Task 1: Confirm Rspamd is running and which version you’re tuning

cr0x@server:~$ systemctl status rspamd --no-pager
● rspamd.service - Rspamd rapid spam filtering system
     Loaded: loaded (/lib/systemd/system/rspamd.service; enabled; vendor preset: enabled)
     Active: active (running) since Thu 2026-01-03 08:41:12 UTC; 2h 13min ago
   Main PID: 1342 (rspamd)
      Tasks: 18 (limit: 18910)
     Memory: 212.4M
        CPU: 7min 11.402s

What it means: It’s up. If it’s flapping, you’re not tuning scoring—you’re fixing reliability.

Decision: If stable, proceed to evidence gathering. If unstable, stop and fix crashes/config reload loops first.

Task 2: Inspect action thresholds and confirm you didn’t inherit a trap

cr0x@server:~$ rspamadm configdump actions
actions {
    add_header = 6;
    greylist = 4;
    reject = 15;
    subject = 8;
}

What it means: Greylist at 4 is aggressive; add_header at 6 is fine; reject at 15 is conservative. If your false positives are rejections with reject=15, a few symbols are likely massively overweighted or a bug is present.

Decision: If reject is low (e.g., 6–8), treat that as a likely contributor and move toward quarantine/add_header while tuning symbols.
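
If you do restructure thresholds, the change lives in local.d/actions.conf. A conservative sketch (the numbers are illustrative, not recommendations for your traffic mix):

```
# local.d/actions.conf -- illustrative thresholds; tune against your own corpus
greylist = 4;      # delay-and-retry for mildly suspicious mail
add_header = 6;    # tag as probable spam but deliver
reject = 15;       # reserve for high-confidence spam only
```

Keep reject well away from add_header; that gap is your safety margin for legitimate-but-messy mail.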

Task 3: Pull one specific message result from logs (by queue ID)

cr0x@server:~$ grep -F "q1bC7R2kQz" /var/log/rspamd/rspamd.log | tail -n 1
2026-01-03 10:55:21 #1342(normal) <9c1f0b>; task; rspamd_task_write_log: id: <q1bC7R2kQz@mail.example>, qid: q1bC7R2kQz, ip: 203.0.113.44, from: <billing@vendor.example>, (default: F (no action): [8.11/15.00] [DMARC_POLICY_REJECT(3.50){vendor.example;reject;},DKIM_REJECT(2.00){fail;},R_SPF_FAIL(1.80){-all;},MANY_INVISIBLE_PARTS(0.80){2;},MIME_HTML_ONLY(0.20){},ARC_NA(0.10){},ASN(0.01){AS64500;}] len: 48712, time: 110.2ms real, 36.7ms virtual, dns req: 18, digest: <d2c7...>, mime_rcpts: <ap@yourco.example>

What it means: Not rejected, but score 8.11. The big drivers are DMARC policy reject + DKIM fail + SPF fail. That’s not “random”; it’s alignment/auth failure.

Decision: Don’t lower DMARC/DKIM/SPF scores globally. Investigate whether the sender is misconfigured, whether forwarding broke SPF, or whether you’re receiving via a gateway that rewrites mail.

Task 4: Analyze symbol frequency spikes (cheap trend check)

cr0x@server:~$ awk -F'[][]' '/rspamd_task_write_log/ {print $4}' /var/log/rspamd/rspamd.log | \
tr ',' '\n' | sed 's/(.*//' | awk '{print $1}' | sort | uniq -c | sort -nr | head
  4821 R_DKIM_ALLOW
  3912 R_SPF_ALLOW
  1104 MIME_HTML_ONLY
   988 DMARC_POLICY_REJECT
   811 R_SPF_FAIL
   620 DKIM_REJECT
   507 MANY_INVISIBLE_PARTS

What it means: DMARC_POLICY_REJECT and DKIM_REJECT are high. Either you’re getting targeted spoofing (possible) or something changed in authentication checks (also possible).

Decision: Correlate with DNS changes, MTA path changes, or vendor onboarding. If only one vendor triggers, scope a fix per sender/domain instead of blunting global rules.

Task 5: Check DNS resolver health from the Rspamd host

cr0x@server:~$ resolvectl statistics
Transactions
Current Transactions: 0
Cache
  Current Cache Size: 4632
          Cache Hits: 191022
        Cache Misses: 22114
DNSSEC Verdicts
              Secure: 0
            Insecure: 0
               Bogus: 0

What it means: If cache misses explode and latency rises, Rspamd may time out DNS-based checks (SPF, DKIM key lookups, DNSBL), changing scoring behavior and sometimes creating false positives or inconsistent actions.

Decision: If DNS is sick, fix DNS first. Tuning scores in a degraded environment is how you bake in the outage as “policy.”

Task 6: Measure Redis latency and memory pressure

cr0x@server:~$ redis-cli -s /run/redis/redis-server.sock INFO stats | egrep 'instantaneous_ops_per_sec|keyspace_hits|keyspace_misses'
instantaneous_ops_per_sec:1420
keyspace_hits:982211
keyspace_misses:44102
cr0x@server:~$ redis-cli -s /run/redis/redis-server.sock INFO memory | egrep 'used_memory_human|maxmemory_human|evicted_keys'
used_memory_human:1.62G
maxmemory_human:2.00G
evicted_keys:18219

What it means: Evictions mean you’re forgetting things. Bayes and fuzzy can degrade unpredictably if Redis is constantly discarding state.

Decision: Stop tuning until Redis has headroom. Add memory, adjust maxmemory policy, or move Bayes/fuzzy to a dedicated Redis instance.
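
For the dedicated-instance route, the Redis side is a couple of lines. A sketch (the size and policy are assumptions; check Rspamd's documentation for eviction guidance for its data):

```
# redis.conf -- illustrative settings for an Rspamd-dedicated instance
maxmemory 4gb
# Evict only keys that carry a TTL, so long-lived Bayes state survives pressure
maxmemory-policy volatile-ttl
```

Watch evicted_keys after the change; it should flatline.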

Task 7: Use rspamc to analyze a suspect message (from a saved file)

cr0x@server:~$ rspamc -h /run/rspamd/rspamd.sock symbols /tmp/suspect.eml
Results for file: /tmp/suspect.eml (0.221 seconds)
Symbol: DMARC_POLICY_REJECT (score: 3.50)
Symbol: DKIM_REJECT (score: 2.00)
Symbol: R_SPF_FAIL (score: 1.80)
Symbol: MIME_HTML_ONLY (score: 0.20)
Message-ID: <1c9f...@vendor.example>
Action: no action
Spam: false
Score: 7.50 / 15.00

What it means: You can reproduce the score outside the MTA. That’s essential for safe tuning.

Decision: If reproducible, adjust config and re-test on the same corpus before deployment.

Task 8: Dump a symbol’s configured score and group

cr0x@server:~$ rspamadm configdump scores | egrep 'DMARC_POLICY_REJECT|DKIM_REJECT|R_SPF_FAIL'
DMARC_POLICY_REJECT = 3.5;
DKIM_REJECT = 2;
R_SPF_FAIL = 1.8;

What it means: Confirms you’re not chasing ghosts; those are real weights, not “dynamic.”

Decision: If you need to reduce pain, prefer scoping (per domain/sender) over reducing these globally.
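
When a small global delta really is warranted, symbol weights can be overridden in local.d/groups.conf. A minimal sketch (the new weight is illustrative):

```
# local.d/groups.conf -- illustrative symbol weight override
symbols = {
  "R_SPF_FAIL" {
    weight = 1.5;  # was 1.8; small delta, re-score the corpus before and after
  }
}
```

Prefer deltas of a few tenths; if a symbol needs to move by whole points, the rule itself is probably wrong.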

Task 9: Inspect per-domain policy overrides (common source of surprise)

cr0x@server:~$ rspamadm configdump settings
settings {
    yourco.example {
        priority = high;
        apply {
            actions {
                reject = 12;
            }
        }
    }
}

What it means: Someone lowered reject threshold only for your main domain (or raised it, etc.). This is often intentional, sometimes forgotten.

Decision: Validate whether this aligns with business requirements. If the override exists, document it and ensure monitoring matches the special policy.

Task 10: Check multimap allowlist hits (to see if you’re relying on duct tape)

cr0x@server:~$ grep -F "MULTIMAP" /var/log/rspamd/rspamd.log | tail -n 3
2026-01-03 10:52:11 #1342(normal) <7a2e1c>; lua; multimap.lua: MULTIMAP_ALLOWLIST_DOMAIN hit for domain vendor.example
2026-01-03 10:54:40 #1342(normal) <11bd77>; lua; multimap.lua: MULTIMAP_ALLOWLIST_IP hit for ip 198.51.100.9
2026-01-03 10:54:45 #1342(normal) <1f33a9>; lua; multimap.lua: MULTIMAP_ALLOWLIST_FROM hit for from billing@vendor.example

What it means: You’re already bypassing checks for some senders. That can be fine, but it’s a risk ledger item, not a victory lap.

Decision: If allowlisting is heavy, reduce scope: prefer DKIM-based allowlisting or restrict by IP ranges you control, and set expiration reviews.

Task 11: Validate that your configuration reloads cleanly

cr0x@server:~$ rspamadm configtest
syntax OK
configuration OK

What it means: You didn’t break parsing. This doesn’t prove your policy is good. It proves it won’t crash workers.

Decision: Only after configtest passes do you reload. Otherwise you’re debugging production with a blindfold.

Task 12: Reload Rspamd without restarting the whole box

cr0x@server:~$ systemctl reload rspamd
cr0x@server:~$ journalctl -u rspamd -n 5 --no-pager
Jan 03 11:02:10 server rspamd[1342]: config reloaded successfully
Jan 03 11:02:10 server rspamd[1342]: workers state: normal workers: 4

What it means: Reload succeeded; workers stayed up.

Decision: Immediately verify scoring on a test message corpus and monitor reject/quarantine rates for the next hour.

Task 13: Confirm MTA is receiving the Rspamd result headers

cr0x@server:~$ postqueue -p | head
-Queue ID-  --Size-- ----Arrival Time---- -Sender/Recipient-------
A1B2C3D4E5    48211 Thu Jan  3 11:03:12  billing@vendor.example
                                         ap@yourco.example
cr0x@server:~$ postcat -q A1B2C3D4E5 | egrep -i 'X-Spam|X-Rspamd|Authentication-Results' | head -n 20
X-Spam-Status: No, score=7.50 required=15.00
X-Spam-Action: no action
X-Spam-Score: 7.50
X-Spam-Level: *******
Authentication-Results: mx.yourco.example; dkim=fail header.d=vendor.example; spf=fail smtp.mailfrom=billing@vendor.example; dmarc=fail header.from=vendor.example

What it means: Headers confirm the same story. If headers are missing, your false positives might be an MTA policy problem, not Rspamd.

Decision: If the MTA is acting on a different signal than expected (e.g., milters, header checks), align the pipeline before you touch Rspamd scoring.

Task 14: Track action counts over a short interval

cr0x@server:~$ grep -oE 'default: [^ ]+ \([^)]*\)' /var/log/rspamd/rspamd.log | tail -n 2000 | sort | uniq -c | sort -nr
  1602 default: F (no action)
   287 default: R (reject)
    91 default: G (greylist)
    20 default: A (add header)

What it means: Rejects are relatively high. Whether that’s “bad” depends on your traffic mix. For corporate mail, 287 rejects in the last 2000 logged decisions is worth investigating.

Decision: Sample 20 rejects. If they’re mostly obvious spam, fine. If they include legitimate senders, find the recurring symbol(s) and fix those specifically.

Joke 1: Email is the only product where “deliverability” means “please let this person be interrupted by my message.”

A sane scoring strategy: change less, measure more

When you tune for fewer false positives, you’re negotiating between two failure modes:

  • Type I: spam delivered (annoying, risky)
  • Type II: legitimate mail blocked (business-breaking)

Most orgs claim they fear spam more. They don’t. They fear missing a real email more, because it triggers visible dysfunction. Your policy should admit that reality and codify it.

Pick actions based on business blast radius

Common actions and how they behave in the real world:

  • add_header: safest default for “maybe spam.” It keeps mail flowing while still letting users and SIEMs see the score. Downside: users will ignore headers until you train them (or you route based on headers).
  • greylist: decent for unknown senders, but painful for automated systems that don’t retry correctly. Use it when you can tolerate delivery delays and you know your legitimate senders behave like proper MTAs.
  • quarantine: the grown-up alternative to reject for medium confidence spam. It reduces false positive harm while keeping junk away from inboxes.
  • reject: reserve for high confidence. If you reject low-confidence mail, you’ve built an outage generator that triggers at random.

Change scoring like you change firewall rules

Four rules that keep you honest:

  1. Make one change at a time unless you’re rolling back a known-bad deploy.
  2. Prefer scoping to global changes: per domain, per recipient, per IP, per authenticated user.
  3. Keep a test corpus: at least 50 known-good and 200 known-bad messages (or whatever fits your environment). Re-score before/after.
  4. Log the decision: why you changed a symbol and what metric you expect to move.
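
Rule 3's before/after re-scoring can be scripted around rspamc. A minimal sketch: the corpus path and the 6.0 threshold are assumptions, and parse_score simply extracts the numeric score from rspamc's text output (the same format shown in Task 7 above):

```shell
#!/bin/sh
# parse_score: read `rspamc symbols` output on stdin, print the numeric score.
# Matches lines like "Score: 7.50 / 15.00".
parse_score() {
  awk '/^Score:/ { print $2 }'
}

# Illustrative loop: flag any known-good message that would cross add_header.
# for msg in /var/spool/corpus/ham/*.eml; do
#   s=$(rspamc symbols "$msg" | parse_score)
#   awk -v s="$s" 'BEGIN { exit !(s >= 6.0) }' && echo "WOULD FLAG: $msg ($s)"
# done
```

Run it before and after a change and diff the two flag lists; an empty diff is the goal.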

Where false positives usually hide: “good” mail that looks spammy

Legitimate messages that commonly accumulate spammy signals:

  • Vendors sending invoices from a new platform (new IPs, mismatched auth)
  • Marketing newsletters (URL-heavy, tracking, HTML-only, bulk patterns)
  • Security alerts (short text + scary subject + links)
  • Calendar invites and automated notifications (weird MIME structures)
  • Forwarded mail (SPF breaks; DMARC fails unless ARC is used correctly)

The goal isn’t to pretend these are “personal mail.” The goal is to identify them and treat them as a category with a policy.

Targeted tuning tools: multimap, groups, and per-domain policy

Targeted tuning is how you reduce false positives without lowering the drawbridge for spam. In Rspamd, you have three practical levers:

  • Settings (per domain/user/IP/recipient) to adjust actions, apply symbol changes, or skip checks.
  • Multimap for allowlists/blocklists based on headers, envelope, IP, or other selectors.
  • Symbol scores and groups to keep related checks balanced (auth, content, reputation).

Prefer “softening” over “disabling”

Disabling checks is tempting and usually wrong. If DMARC failures are causing false positives for a specific partner because they misconfigured their mail, you have choices:

  • Ask them to fix it. (Yes, you can. Vendors respond faster when invoices stop arriving.)
  • Temporarily reduce the penalty for that partner only.
  • Quarantine instead of reject for that class of mail until auth is corrected.

Example: per-domain settings to avoid rejecting borderline mail

For a domain where business impact is high (e.g., your own), you might choose quarantine at a lower score than reject, keeping rejects for truly egregious spam.

Key practice: use overrides for actions, not blanket allowlisting. Let the system still score; just change what you do with the result.
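
Concretely, such an override can live in local.d/settings.conf; the structure mirrors the configdump in Task 9. A sketch (the rule name, domain, and numbers are illustrative):

```
# local.d/settings.conf -- illustrative per-domain action override
finance_mail {
  priority = high;
  rcpt = "@yourco.example";
  apply {
    actions {
      add_header = 5;   # tag earlier
      reject = 18;      # reject later; route the middle band to quarantine
    }
  }
}
```

Scoring still runs in full; only the consequence changes.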

Example: multimap allowlist with guard rails

If you must allowlist, do it with a bias toward verifiable identity:

  • Allowlist a DKIM-signed domain pattern you trust (and verify it’s stable).
  • Allowlist an IP range only if the sender publishes it and it’s not shared infrastructure.
  • Avoid allowlisting by “From:” display name unless you enjoy being phished.
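
One way to express the DKIM-biased version is via the multimap module in local.d/multimap.conf. A sketch (the symbol name, map path, and score are assumptions; note the negative score softens rather than bypasses):

```
# local.d/multimap.conf -- illustrative DKIM-scoped allowlist
LOCAL_TRUSTED_DKIM {
  type = "dkim";
  map = "/etc/rspamd/local.d/maps/trusted_dkim.map";  # one domain per line
  score = -3.0;   # an offset, not a bypass: other symbols still apply
}
```

Pair the map with a review date; allowlists age like milk, not wine.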

Joke 2: The quickest way to get budget approval for spam filtering is to allow one phishing email through and call it a “user education initiative.”

Authentication and alignment: SPF, DKIM, DMARC without cargo culting

A lot of false positives are really “authentication disagreement” problems. You can’t tune your way out of a broken sender ecosystem—at least not cleanly. But you can understand the failure modes and choose policies that reduce collateral damage.

SPF: great until mail gets forwarded

SPF checks the connecting IP against the sender domain’s policy. Forwarding breaks SPF because the forwarding server’s IP isn’t authorized by the original sender. That’s not Rspamd being mean; that’s how SPF works.

Operational decision: if your environment heavily forwards mail (aliases to external systems, ticketing tools, etc.), don’t treat SPF fail as a near-reject signal. Keep it meaningful, but balanced.

DKIM: survives forwarding but breaks on modification

DKIM signs the message content. Mailing lists, footers, and “helpful” gateways that rewrite HTML can break DKIM. You’ll see DKIM fail spikes when a list server changes behavior or a security appliance starts “sanitizing” content.

Operational decision: a DKIM fail should be a strong signal when combined with suspicious content or spoofing indicators, not a standalone hammer.

DMARC: alignment is the point, and that’s why it hurts

DMARC checks whether SPF and/or DKIM align with the domain in the visible “From:”. This is how you stop obvious spoofing. It also punishes legitimate-but-messy mail flows.

Operational decision: treat DMARC policy reject/quarantine as important reputation/auth signals, but be careful about overreacting for known legitimate ecosystems (mailing lists, forwarding chains). Consider ARC validation if your flow supports it.

ARC: the complicated friend who is sometimes right

ARC (Authenticated Received Chain) can preserve authentication results through intermediaries. When it’s correctly deployed, it reduces false positives on forwarded mail. When it’s badly deployed, it adds more confusion.

Operational decision: if you see a lot of forwarded mail and ARC is present, validate whether your Rspamd build and configuration are evaluating it correctly. If ARC is consistently absent, that’s also a signal.

Bayes and fuzzy: training without poisoning yourself

Bayesian filtering is powerful and extremely easy to mess up. Fuzzy hashing is great for catching near-duplicate spam campaigns, and also easy to over-apply. The theme: control your inputs.

Bayes: use it, but don’t let it drive the car

Bayes in Rspamd can learn from messages you classify as ham/spam. The false positive trap happens when:

  • Users mark newsletters as spam because they’re annoyed, not because it’s actually spam.
  • Spam lands in a shared mailbox and is later moved around, confusing training pipelines.
  • Autolearn thresholds are set too aggressively, training on borderline messages.
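
Autolearn aggressiveness is configured alongside the classifier. A conservative sketch in local.d/classifier-bayes.conf (the thresholds are illustrative; the point is to learn only from confident verdicts):

```
# local.d/classifier-bayes.conf -- illustrative conservative autolearn
autolearn {
  spam_threshold = 12.0;  # learn spam only from near-reject scores
  ham_threshold = -2.0;   # learn ham only from clearly negative scores
}
```

Anything between the two thresholds is left for humans and curated training sets.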

Fuzzy: narrow scope, high confidence

Fuzzy works best when you hash high-signal elements: known spam templates, known phishing kits, and recurring payloads. It works worst when you hash common business templates (invoices, shipping notifications) because you will eventually hash yourself into rejecting half the internet.

Guard rails you should enforce

  • Require manual review for training sets used to change Bayes/fuzzy behavior.
  • Separate corp mail vs bulk mail training if you can. Bulk mail (even legitimate) shares traits with spam.
  • Monitor Bayes confidence distribution, not just “spam caught.” A Bayes model that is always uncertain is often starving or polluted.

Performance and backpressure: when filtering causes the false positives

This is the part people miss: false positives can be an artifact of timeouts and degraded checks.

Rspamd runs multiple checks: DNS lookups, Redis queries, URL extraction, MIME parsing, external reputation queries. When dependencies are slow, some checks time out. Depending on configuration, you might get:

  • Missing negative signals (spam slips through).
  • Missing positive signals (legit mail loses “allow” points and trends upward in score).
  • Fallback behavior that is conservative (e.g., temporary failures treated as suspicious).

What to monitor that correlates with false positives

  • DNS query latency and failure rates
  • Redis latency, evictions, persistence issues
  • Rspamd worker CPU time vs wall time (virtual vs real in logs)
  • Number of DNS requests per message (spikes can indicate a URL explosion campaign)

Design pattern: “quarantine on uncertainty”

If your environment is subject to dependency hiccups, consider a policy where borderline results go to quarantine rather than reject. It’s less dramatic. It also gives you a retrieval path for false positives.

Three corporate mini-stories from the trenches

Mini-story 1: The incident caused by a wrong assumption

A mid-sized company ran Rspamd behind a Postfix gateway. They had a clean, simple policy: anything over the reject threshold gets bounced. Someone assumed “reject means safer,” so they lowered reject from a conservative value to something closer to their add_header threshold. The change went in on a Friday. Of course.

Monday morning, finance complained that vendor remittances were “not arriving.” The mail wasn’t delayed; it was rejected. The bounces went to an unattended mailbox because the vendor used a no-reply address. So the vendor didn’t know, finance didn’t know, and the gateway quietly did exactly what it was told.

When we pulled a sample of rejected messages, most were clean but had DMARC alignment issues because the vendor had migrated to a new sending platform and misconfigured DKIM. Rspamd wasn’t being paranoid; it was being consistent. The mistake was the assumption that the reject threshold is just “the spam line.” It’s actually “the business outage line.”

The fix was dull and effective: restore the conservative reject threshold, route medium-confidence mail to quarantine, and create a process for vendor onboarding that checks SPF/DKIM/DMARC before the first invoice is sent.

Mini-story 2: The optimization that backfired

Another organization wanted to reduce mail latency. They noticed DNS queries per message were high during campaigns, so they “optimized” by switching resolvers and tightening timeouts. In a lab, it looked great: faster responses, fewer long tail delays.

In production, the new resolver occasionally rate-limited or dropped queries under burst load. Rspamd logs started showing longer “real” times and fewer successful lookups for SPF and DKIM keys. The funny part is that spam started slipping through and legitimate mail started scoring higher—because the “allow” symbols weren’t being awarded consistently.

The incident manifested as “Rspamd is inconsistent.” It wasn’t. The dependency was. After rolling back resolver changes, they added a local caching resolver on the same host network, increased timeout sanity, and monitored DNS failure rates as a first-class metric. Their latency improved and false positives dropped without touching scoring.

This is the lesson: performance tweaks to dependencies are policy changes in disguise. Treat them with the same change control you’d use for spam thresholds.

Mini-story 3: The boring but correct practice that saved the day

A global company had a habit that looked bureaucratic: every quarter, they reviewed their top 20 symbols by aggregate score contribution and their top 20 rejected sender domains. No heroics. Just boring reporting, a short meeting, and a ticket list.

One quarter, they noticed a steady climb in DMARC-related rejects from a legitimate SaaS provider used by HR. Before it hit crisis level, they opened a case with the provider, provided authentication failure evidence, and temporarily applied a per-sender policy that quarantined rather than rejected. HR never noticed.

Two weeks later, the provider fixed DKIM alignment. The company removed the temporary exception. No permanent allowlist scar tissue. No angry executives. No “why didn’t you tell us.”

This is the quiet win: routine review catches drift early, and temporary scoped exceptions prevent outages while keeping the rest of your posture intact.

Common mistakes: symptom → root cause → fix

Symptom: Legitimate mail is rejected with high scores dominated by authentication symbols.
Root cause: Sender misconfigured SPF/DKIM/DMARC, or forwarding/mailing lists breaking alignment.
Fix: Scope policy: quarantine instead of reject for that sender, and push the sender to fix alignment. Avoid globally reducing DMARC/DKIM scores.

Symptom: False positives spike at the same time as "DNS req" count increases in logs.
Root cause: DNS resolver latency/failure causes inconsistent check results.
Fix: Stabilize DNS (local cache, sane timeouts, capacity). Then re-evaluate scoring.

Symptom: Bayes suddenly labels a common business template as spam across multiple departments.
Root cause: Bad training input (users marking unwanted but legitimate bulk as spam), or autolearn too aggressive.
Fix: Review training pipeline; reduce autolearn; retrain from curated sets; separate bulk mail handling.

Symptom: You "fix" false positives by raising the reject threshold, and spam volume rises in inboxes within days.
Root cause: You changed the wrong control; the underlying symbol imbalance remains.
Fix: Revert the threshold change; identify top scoring symbols in false positives; tune or scope overrides; add a quarantine tier.

Symptom: A single internal application's mail is constantly flagged, but external mail is fine.
Root cause: The app sends malformed MIME, HTML-only mail, missing Date/Message-ID, or uses shared outbound infrastructure with poor reputation.
Fix: Fix the app's mail generation (proper headers, text part, consistent DKIM). If needed, add a per-IP policy with minimal relaxation while remediation occurs.

Symptom: Rspamd logs show normal scoring, but the MTA still rejects or reroutes mail.
Root cause: Another filter/milter/header check is acting, or the MTA interprets headers incorrectly.
Fix: Trace the pipeline; confirm the MTA policy maps to the Rspamd action; remove conflicting checks or align decisions.

Symptom: Random classification changes after a Redis restart or memory pressure event.
Root cause: Redis evicted Bayes/fuzzy/metadata; model state changed.
Fix: Provide memory headroom; avoid eviction; separate Redis instances; confirm persistence strategy; monitor evicted_keys.

Symptom: Allowlisting "fixes" the issue but phishing starts bypassing filters.
Root cause: Allowlisting by weak identifiers (From header, domain without DKIM), or too-broad IP ranges.
Fix: Replace broad allowlists with scoped settings and strong identity checks; use expirations and periodic review.

Checklists / step-by-step plan

Step-by-step: reduce false positives safely (without opening the floodgates)

  1. Collect evidence: 20–50 false positives with full headers and Rspamd symbol breakdown.
  2. Classify by traffic type: personal mail, vendor transactional, bulk/marketing, system alerts, mailing lists.
  3. Identify top recurring symbols in false positives (top 3 per message; aggregate frequency).
  4. Confirm dependency health: DNS, Redis, CPU saturation, worker latency.
  5. Pick the least risky mitigation:
    • Per-sender or per-domain settings override action (quarantine instead of reject).
    • Adjust one symbol score slightly (small deltas, e.g., -0.5 not -3.5).
    • Fix sender authentication or internal mail composition.
  6. Re-score a test corpus using rspamc before deploying changes.
  7. Deploy with reload, not restart, and watch logs for config reload success.
  8. Monitor action rates (reject/quarantine/greylist/add_header) for at least one business day.
  9. Backstop with quarantine for medium confidence until metrics stabilize.
  10. Document exceptions with owners and expiration review dates.

Checklist: what you should have in place before tuning

  • A quarantine or at least a recoverable path for false positives
  • Centralized logging and a way to search by Message-ID/queue ID
  • A curated ham/spam corpus for regression scoring
  • Dashboards for DNS latency, Redis evictions, Rspamd worker latency
  • A change log for score/settings changes (commit messages count)

Checklist: “do not do this” list

  • Do not disable DMARC/DKIM/SPF checks globally because one vendor can’t configure email.
  • Do not allowlist by From header alone.
  • Do not tune during an outage (DNS/Redis issues) and call it “policy.”
  • Do not set reject thresholds close to add_header thresholds in business email environments.
  • Do not autolearn aggressively unless you’re prepared to audit training inputs.

FAQ

1) Should I fix false positives by raising the reject threshold?

Only as a temporary containment measure, and only if you pair it with quarantine or add_header monitoring. Long-term, fix the symbols causing the spikes or scope the policy to the affected traffic.

2) What’s the safest action configuration for corporate mail?

Usually: add_header for low suspicion, quarantine for medium, reject for high. Greylisting can work, but it creates delays and brittle behavior for automated senders.

3) A vendor keeps failing DKIM and DMARC. Do I allowlist them?

Prefer a scoped policy: quarantine instead of reject for that sender domain and keep scoring intact. If you must allowlist, do it as narrowly as possible (stable DKIM identity or a tight IP range) and set a review date.

4) Why do mailing lists cause false positives?

They often modify messages (footers, subject tags, header changes), which can break DKIM. They also change the delivery path in ways that break SPF. DMARC then sees misalignment and scores it accordingly.

5) How do I know which symbol is “wrong”?

You don’t guess. Aggregate your false positives and find recurring top contributors. If one symbol appears in most of them and isn’t strongly correlated with actual spam in your environment, it’s a candidate for tuning or scoping.

6) Can Bayes cause false positives even if everything else is fine?

Yes. A polluted Bayes model can confidently mislabel common templates. Tighten training inputs, reduce autolearn aggressiveness, and retrain from curated sets if needed.

7) Why do false positives get worse when Redis is under memory pressure?

Evictions remove learned state and caches. That changes Bayes probabilities and fuzzy/reputation behavior. The system becomes less consistent, which feels like “random” classification changes.

8) Is greylisting still useful?

Sometimes. It’s useful against low-quality senders and some opportunistic spam, but it delays legitimate first-contact mail and many modern spammers retry correctly. Use it intentionally, not nostalgically.

9) How do I tune without letting more spam through?

Scope changes (per sender/domain), prefer quarantine for uncertainty, and keep a regression corpus. Don’t weaken global auth symbols unless you have strong compensating controls.

10) What’s a good sign that the problem is not scoring but infrastructure?

When symbol patterns shift with timeouts, DNS errors, or Redis evictions, and when “real time” in logs climbs while “virtual time” stays low. Fix dependencies first.

Conclusion: next steps you can do this week

If you take only a few actions, take these:

  1. Build a small false-positive corpus with full headers and Rspamd symbol breakdown. Stop tuning from anecdotes.
  2. Implement quarantine for medium confidence so you can recover mistakes without business impact.
  3. Audit your dependency health: DNS and Redis issues masquerade as scoring problems.
  4. Tune by scoping: per-domain/per-sender settings beat global score reductions almost every time.
  5. Set up a boring quarterly review of top symbols and top rejected senders. It’s the cheapest reliability work you’ll ever do.

False positives aren’t a mystery. They’re a signal that your environment changed, your assumptions were wrong, or your dependencies are wobbling. Treat them like an operational problem, not a superstition problem, and Rspamd will behave.
