Email SPF fails: the 5 record mistakes that break delivery (and fixes)


Email delivery failures rarely show up as one clean “down” alert. They show up as pipeline drift: marketing says campaigns are “underperforming,” sales claims leads “ghost,” and support starts getting screenshots of bounce messages that look like they were written by a committee of RFCs.

SPF is a frequent culprit because it’s both simple and incredibly easy to mess up in production DNS. The worst part: you can break SPF while thinking you improved it. Let’s stop doing that.

SPF in plain ops terms (what it really does)

SPF (Sender Policy Framework) is a DNS-published policy that tells a receiver which IPs are allowed to send mail using a domain in the SMTP “envelope-from” identity (the return-path domain). The receiver checks the connecting IP against that policy and produces a result: pass, fail, softfail, neutral, none, temperror, or permerror.

In production, SPF does two jobs:

  • Stops casual spoofing of your domain in the envelope-from (not necessarily the visible From: header).
  • Feeds downstream policy, especially DMARC. DMARC uses SPF (and/or DKIM) plus alignment rules to decide whether to quarantine or reject.

SPF is not magic. It doesn’t encrypt. It doesn’t prove a human typed the email. It does not survive forwarding in many cases. It’s a policy check tied to the SMTP client IP at the moment of delivery. That’s why a perfectly “valid” SPF record can still fail in the real world when your mail path changes.
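
To make the "tied to the client IP" point concrete, here is a minimal sketch that checks a connecting IP against only the ip4/ip6 terms of a record. The function name ip_is_authorized is hypothetical, and the sketch deliberately skips include, a, mx, exists, and redirect because those need DNS; a real evaluator has to follow them.

```python
import ipaddress

def ip_is_authorized(record, client_ip):
    """First-match check of client_ip against the ip4:/ip6: terms only.

    Assumption: DNS-based mechanisms (include, a, mx, exists, redirect)
    are ignored, so this under-reports what a full SPF evaluator allows.
    """
    ip = ipaddress.ip_address(client_ip)
    for term in record.split()[1:]:              # skip the leading v=spf1
        qualifier = term[0] if term[0] in "+-~?" else "+"
        name, _, value = term.lstrip("+-~?").partition(":")
        if name.lower() in ("ip4", "ip6") and value:
            network = ipaddress.ip_network(value, strict=False)
            if ip.version == network.version and ip in network:
                return qualifier == "+"          # first match decides
    return False

record = "v=spf1 ip4:203.0.113.0/24 ip6:2001:db8::/32 -all"
print(ip_is_authorized(record, "203.0.113.55"))   # True
print(ip_is_authorized(record, "198.51.100.77"))  # False
```

Move the same message to a different egress IP and the answer flips, even though the record never changed.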

One more nuance: SPF is evaluated on the envelope-from domain (Return-Path), not the pretty From: header people see. If your vendor uses bounces.vendor.example as return-path and your DMARC domain is example.com, SPF might pass but not align. That’s a separate problem with the same symptom: “DMARC failed.”

Paraphrased idea from Werner Vogels: You build it, you run it. Email is no exception. If you delegate sending to vendors, you still own the authentication outcomes.

Joke #1: SPF is like a nightclub bouncer checking the guest list. If you keep rewriting the list during the shift, don’t be surprised when nobody gets in.

Interesting facts and historical context

  • SPF predates DMARC by years. SPF became widely deployed in the mid-2000s; DMARC arrived later to unify SPF and DKIM under a domain-owner policy.
  • The “10 DNS lookup limit” is intentional friction. It exists to prevent a receiver from being forced into expensive recursive DNS work per message (a classic DoS vector).
  • TXT won the record-type fight. SPF once had a dedicated RR type, but TXT became the practical standard due to inconsistent DNS support and tooling.
  • SPF checks the connecting IP, not “the sender.” That’s why outbound relays, shared SaaS pools, and on-prem egress NAT are common foot-guns.
  • “~all” became popular because it felt safer. Softfail is often used during rollout, but many orgs never finish the migration to “-all,” leaving spoofing lanes open.
  • Receivers treat permerror harshly. A syntax error or lookup explosion can cause “permerror,” and some receivers effectively treat that as fail for enforcement decisions.
  • SPF macros exist and are rarely worth it. They’re powerful and fragile, and they can create unpredictable lookup behavior across receivers.
  • Alignment is why DMARC changed the game. SPF “pass” is not enough; it needs to align with the visible From: domain to satisfy DMARC via SPF.
  • Flattening SPF is a workaround, not a virtue. It trades DNS indirection for IP list maintenance, and it can go stale fast unless automated.

Fast diagnosis playbook

If you’re on-call and emails are bouncing, you don’t have time for philosophy. You need a deterministic sequence that finds the bottleneck quickly.

First: identify what identity is failing (SPF vs DMARC vs “some vendor thing”)

  1. Grab one real bounce or receiver header from a failing delivery. Look for: spf=, dmarc=, dkim=, Authentication-Results:, or explicit “SPF fail.”
  2. Extract the envelope-from domain (Return-Path) and the connecting IP (often shown in the bounce or in the SMTP log).
  3. Decide whether you’re fixing SPF, alignment, or routing. If SPF passes but DMARC fails, you may be dealing with alignment or DKIM.
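
The triage above can be roughed out in a few lines. This is a sketch, not a full Authentication-Results parser: auth_results and triage are hypothetical names, the regex only grabs the first spf/dkim/dmarc result in the header, and the suggested next steps are just the ones from this playbook.

```python
import re

def auth_results(header_value):
    """Map spf/dkim/dmarc to their first result keyword in the header."""
    results = {}
    pattern = r"\b(spf|dkim|dmarc)=([a-z]+)"
    for method, result in re.findall(pattern, header_value, re.I):
        results.setdefault(method.lower(), result.lower())
    return results

def triage(results):
    """Suggest where to look first, following the playbook's branches."""
    spf = results.get("spf", "none")
    dmarc = results.get("dmarc", "none")
    if spf == "pass" and dmarc == "fail":
        return "check alignment or DKIM"
    if spf in ("permerror", "temperror"):
        return "check the SPF record and DNS"
    if spf in ("fail", "softfail", "none"):
        return "check the sending IP and return-path domain"
    return "SPF looks fine; look elsewhere"

header = ("mx.recipient.example; spf=pass smtp.mailfrom=bounces.vendor.example; "
          "dkim=none; dmarc=fail header.from=example.com")
print(triage(auth_results(header)))   # check alignment or DKIM
```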

Second: validate DNS publication and propagation

  1. Query the authoritative view of DNS (not your laptop’s cached resolver). Confirm there’s exactly one coherent SPF policy and it starts with v=spf1.
  2. Compare multiple resolvers (public and internal). If results differ, it’s propagation or split-horizon DNS.
  3. Check TTLs to estimate when the world will converge. Don’t guess. Read the TTL.

Third: evaluate the SPF record as the receiver would

  1. Count DNS lookups (includes, a, mx, ptr, exists, redirect). If you’re near 10, assume you’re over 10 somewhere.
  2. Test the specific sender IP against the record. SPF is about the IP that connected, not the vendor brand name.
  3. Look for brittle mechanisms: ptr, macros, redirect chains, overly broad includes.

Fourth: decide the least risky fix

  • If you have multiple SPF TXT records: merge to one. That’s usually the fastest win.
  • If you’re over the lookup limit: reduce includes, remove a/mx if not needed, or flatten (with automation) if you must.
  • If you’re failing due to vendor sending from unexpected IP ranges: fix the sending path or the vendor config, not just the SPF record.
  • If DMARC is enforcing and SPF doesn’t align: fix alignment (custom return-path) or lean on DKIM alignment.

The 5 SPF record mistakes that break delivery (and the fixes)

1) Publishing multiple SPF TXT records for the same name

What happens: The receiver queries TXT for example.com and gets two records that both start with v=spf1. Per the SPF spec (RFC 7208), that’s a permerror. Many receivers treat permerror as fail or at least as “suspicious,” and your DMARC policy may punish it.

Why it happens in real companies: Different teams “own” different senders. Marketing adds one SPF record for a platform. IT adds another for the corporate relay. A vendor adds their own during onboarding. Nobody merges them because DNS UIs don’t scream when you do something self-destructive.

Fix: You get exactly one SPF record per domain (per DNS name). Merge mechanisms into a single string, keep it under limits, and remove the extra record(s).
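
A merge is mostly string work. Here is one possible sketch, assuming the hypothetical helper name merge_spf: it keeps terms in first-seen order, drops duplicates, and makes you choose the final all qualifier explicitly, because that is a policy decision and not something a script should guess. It refuses records that use redirect=, since those cannot be merged mechanically.

```python
def merge_spf(records, final):
    """Merge several v=spf1 records into one string ending in `final`."""
    merged = []
    for record in records:
        terms = record.split()
        if not terms or terms[0].lower() != "v=spf1":
            raise ValueError(f"not an SPF record: {record!r}")
        for term in terms[1:]:
            bare = term.lstrip("+-~?").lower()
            if bare.startswith("redirect="):
                raise ValueError("redirect= needs a manual merge")
            if bare == "all":
                continue                 # the default is chosen via `final`
            if term not in merged:
                merged.append(term)      # keep first-seen order, no duplicates
    return " ".join(["v=spf1", *merged, final])

print(merge_spf(
    ["v=spf1 ip4:203.0.113.10 -all",
     "v=spf1 include:spf.mailvendor.example ~all"],
    final="~all",
))
# v=spf1 ip4:203.0.113.10 include:spf.mailvendor.example ~all
```

Publishing the merged string still means deleting the old records; adding a third one makes things worse.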

Operational advice: Treat SPF like a shared library: one package, multiple contributors, strict review. If you can’t control who edits DNS, you can’t control delivery.

2) Blowing the 10 DNS lookup limit (permerror in disguise)

What happens: SPF evaluation allows at most 10 DNS “mechanism” lookups during processing (includes, redirects, a, mx, ptr, exists). It’s easy to exceed when you chain includes (vendor includes vendor includes vendor). When you cross the line, receivers return permerror. That’s not “temporary.” It’s “your policy is invalid.”
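
Counting by hand gets old fast. The sketch below counts worst-case lookups by walking include and redirect chains. To stay offline it takes a dict that stands in for DNS; count_lookups is a hypothetical name, and the _netblocks record contents are placeholders, not what those names really publish.

```python
import re

LOOKUP_TERMS = {"include", "a", "mx", "ptr", "exists", "redirect"}

def count_lookups(domain, zone, path=()):
    """Worst-case count of lookup-consuming terms, following the chain.

    zone maps a domain to its SPF record string (a stand-in for DNS).
    """
    if domain in path:
        raise ValueError(f"include loop at {domain}")
    total = 0
    for term in zone[domain].split()[1:]:
        body = term.lstrip("+-~?")
        name = re.split(r"[:/=]", body, maxsplit=1)[0].lower()
        if name not in LOOKUP_TERMS:
            continue                      # ip4, ip6, all cost nothing
        total += 1
        if name in ("include", "redirect"):
            target = re.split(r"[:=]", body, maxsplit=1)[1]
            total += count_lookups(target, zone, path + (domain,))
    return total

zone = {
    "example.com": "v=spf1 ip4:203.0.113.10 include:_spf.google.com -all",
    "_spf.google.com": "v=spf1 include:_netblocks.google.com "
                       "include:_netblocks2.google.com "
                       "include:_netblocks3.google.com ~all",
    # Placeholder contents for illustration only:
    "_netblocks.google.com": "v=spf1 ip4:192.0.2.0/24 ~all",
    "_netblocks2.google.com": "v=spf1 ip4:198.51.100.0/24 ~all",
    "_netblocks3.google.com": "v=spf1 ip6:2001:db8::/32 ~all",
}
print(count_lookups("example.com", zone))   # 4
```

One innocent-looking include costs four lookups here. Add two more vendors of that size and the budget is gone.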

Fix options (choose based on risk):

  • Prefer fewer includes. Remove senders you don’t use. Replace a and mx with explicit ip4/ip6 if you know the actual egress.
  • Use subdomains per sender. Put vendors on bounce.vendor.example.com and align DMARC via DKIM or custom return-paths. This avoids stuffing everything into the apex SPF record.
  • Flatten as a last resort. Replace include chains with literal IP ranges. Automate updates or accept that it will rot.
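
If you do flatten, generate the record instead of typing it. A rough sketch, assuming you already have the vendor's current CIDR list from somewhere you trust: flattened_record collapses adjacent ranges, and txt_chunks splits the result into strings of at most 255 characters, which is the per-string limit inside a TXT record. Both function names are hypothetical.

```python
import ipaddress

def flattened_record(cidrs, final):
    """Build one v=spf1 string from literal ranges, collapsing neighbors."""
    nets = [ipaddress.ip_network(c, strict=False) for c in cidrs]
    terms = []
    for version, prefix in ((4, "ip4"), (6, "ip6")):
        same_family = [n for n in nets if n.version == version]
        for net in ipaddress.collapse_addresses(same_family):
            terms.append(f"{prefix}:{net}")
    return " ".join(["v=spf1", *terms, final])

def txt_chunks(record, size=255):
    """Split a long record into TXT strings; receivers rejoin them as-is."""
    return [record[i:i + size] for i in range(0, len(record), size)]

record = flattened_record(
    ["203.0.113.0/25", "203.0.113.128/25", "198.51.100.7", "2001:db8::/32"],
    final="-all",
)
print(record)
print(txt_chunks(record))
```

Run it on a schedule and diff the output against what is published. A flattened record nobody regenerates is just a slow-motion outage.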

What to avoid: Don’t use ptr to “simplify” SPF. It adds lookups and unpredictability, and many receivers distrust it. Also don’t stack includes “just in case.” SPF is not a wishlist.

3) Using the wrong domain: SPF is on Return-Path, not From:

What happens: You publish SPF on example.com. Your SaaS platform uses bounce.vendor-mail.example.net as the envelope-from. Receivers check SPF for bounce.vendor-mail.example.net (the exact return-path domain in use), not example.com. Your SPF record might be perfect and still irrelevant.

Symptoms: You “pass SPF” for some streams (corporate MTA) and “fail SPF” for others (marketing platform), even though both appear to be “from example.com” to humans.

Fix:

  • Configure the vendor to use a custom return-path under your domain (common pattern: bounce.mail.example.com).
  • Publish SPF for that return-path domain that authorizes the vendor.
  • If DMARC is enforcing, ensure alignment: SPF can only satisfy DMARC if the envelope-from domain aligns with the From: domain (same domain or subdomain, depending on relaxed/strict alignment).
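
The alignment rule in that last bullet can be sketched as follows. One loud assumption: org_domain here just keeps the last two labels, which is wrong for suffixes like co.uk. Real DMARC implementations determine the organizational domain with the Public Suffix List. Both function names are hypothetical.

```python
def org_domain(domain):
    """Naive organizational domain: the last two labels.

    Assumption: good enough for example.com-style names only; real
    receivers consult the Public Suffix List.
    """
    labels = domain.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])

def spf_aligned(mail_from_domain, header_from_domain, strict=False):
    """Strict: exact match. Relaxed: same organizational domain."""
    a = mail_from_domain.lower().rstrip(".")
    b = header_from_domain.lower().rstrip(".")
    if strict:
        return a == b
    return org_domain(a) == org_domain(b)

print(spf_aligned("bounce.mail.example.com", "example.com"))               # True
print(spf_aligned("bounce.mail.example.com", "example.com", strict=True))  # False
print(spf_aligned("bounces.vendor.example", "example.com"))                # False
```

The third case is the common vendor default: SPF can pass all day for bounces.vendor.example and still contribute nothing to DMARC for example.com.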

Opinionated take: If a vendor cannot support custom return-path and aligned DKIM, treat them as a deliverability liability, not a partner.

4) Syntax errors, quoting issues, and “helpful” DNS UI formatting

What happens: SPF is text, but not free-form poetry. A missing space, a stray quote, or a record split incorrectly can flip “pass” to permerror. Some DNS providers auto-wrap long TXT records into multiple quoted strings (which is fine), while others create multiple TXT records (which is not fine). Some UIs try to be helpful and end up being destructive.

Common syntax faceplants:

  • Missing v=spf1 at the start.
  • Using commas instead of spaces: v=spf1,ip4:... (no).
  • Forgetting the all mechanism entirely (unmatched senders then default to neutral, which is rarely what you meant).
  • Typos in mechanisms: incldue:, ip:1.2.3.4 (wrong), or malformed CIDR.
  • Using uppercase in a way that some broken parsers mishandle (rare, but why tempt fate?).

Fix: Validate the published record exactly as receivers will read it. Use authoritative queries. Then run an SPF evaluator against a known sender IP. Don’t trust the DNS UI preview.
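
A small linter catches most of the faceplants above before they reach DNS. This is a sketch under assumptions, not a complete RFC 7208 validator: lint_spf is a hypothetical name, it does not check CIDR syntax or macros, and it treats any name=value term as a modifier and lets it through.

```python
import re

MECHANISMS = {"all", "include", "a", "mx", "ptr", "ip4", "ip6", "exists"}

def lint_spf(record):
    """Return a list of human-readable problems; empty means none found."""
    problems = []
    if "," in record:
        problems.append("comma found: terms are separated by spaces")
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        problems.append("record does not start with v=spf1")
    has_default = False
    for term in terms[1:]:
        body = term.lstrip("+-~?")
        name = re.split(r"[:/=]", body, maxsplit=1)[0].lower()
        if body[len(name):].startswith("="):       # modifier such as redirect=
            has_default = has_default or name == "redirect"
            continue
        if name not in MECHANISMS:
            problems.append(f"unknown mechanism: {term}")
        elif name == "all":
            has_default = True
    if not has_default:
        problems.append("no all mechanism or redirect: unmatched senders get neutral")
    return problems

for problem in lint_spf("v=spf1 incldue:spf.mailvendor.example ip:1.2.3.4"):
    print(problem)
```

Passing the linter only means the string is plausible. You still need the authoritative query and a real evaluation against a known sender IP.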

5) Getting the qualifier wrong: -all vs ~all vs ?all (and pretending it’s “just preference”)

What happens: The all mechanism sets the default for everything not explicitly allowed. The qualifier is your policy posture:

  • -all: hard fail. Unauthorized senders should be rejected.
  • ~all: softfail. “Probably unauthorized, but accept with suspicion.”
  • ?all: neutral. “No opinion.”
  • +all: pass everything. That’s not SPF; that’s performance art.

Fix: Use -all when you actually know your senders. Use ~all only as a temporary rollout state with a deadline. Use ?all rarely, and only when you truly cannot enumerate senders (and then accept the spoofing risk).
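
To see which posture a given record actually publishes, a tiny sketch like this maps the all term to the result unmatched senders will get. The name default_result is hypothetical, and the sketch ignores redirect=, so a record that relies on redirect will be reported as neutral here.

```python
QUALIFIER_RESULT = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}

def default_result(record):
    """Result for senders that match nothing before the all term."""
    for term in record.split()[1:]:
        if term.lstrip("+-~?").lower() == "all":
            qualifier = term[0] if term[0] in QUALIFIER_RESULT else "+"
            return QUALIFIER_RESULT[qualifier]   # a bare "all" means +all
    return "neutral"                             # no all term at all

print(default_result("v=spf1 ip4:203.0.113.10 -all"))   # fail
print(default_result("v=spf1 ip4:203.0.113.10 ~all"))   # softfail
print(default_result("v=spf1 ip4:203.0.113.10 all"))    # pass
```

Note the last case: a bare all with no qualifier is +all. That one bites people who assume "all" sounds restrictive.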

Joke #2: If your SPF ends with +all, you didn’t configure email security—you installed a “Welcome, attackers” doormat.

Common mistakes: symptoms → root cause → fix

SPF permerror appears suddenly after a “small DNS change”

Symptoms: Bounces mention “SPF permerror,” “too many DNS lookups,” or “invalid SPF.” It started within an hour of a DNS update.

Root cause: You crossed the 10-lookup limit by adding one more include, or you created multiple SPF TXT records.

Fix: Merge to a single SPF TXT record and reduce lookups. Remove unneeded includes; replace a/mx with explicit IP ranges when feasible.

SPF passes but DMARC fails (and the receiver still rejects)

Symptoms: Header shows spf=pass but dmarc=fail; bounces mention DMARC policy. Users report “it works to Gmail but not to corporate domains,” or vice versa.

Root cause: SPF passed for an envelope-from domain that does not align with the From: domain. Or DKIM is missing/broken and SPF is the only hope.

Fix: Configure custom return-path under your domain (alignment), or ensure DKIM signing aligns with the From: domain and survives the sending path.

SPF fails only for one vendor stream

Symptoms: Office 365 mail passes; marketing platform fails. Or transactional passes; support ticketing fails.

Root cause: The vendor is sending from IPs not covered by the SPF record for the relevant return-path domain; or the vendor uses a different envelope-from domain than you assumed.

Fix: Identify the connecting IP and the envelope-from domain from real headers. Update the correct SPF record (often a subdomain). Prefer vendor include for their sending pool if it stays within lookup limits.

SPF “none” appears even though you “set up SPF months ago”

Symptoms: Receivers show spf=none. You swear SPF exists.

Root cause: You published SPF on www.example.com or a different domain than the return-path domain. Or split-horizon DNS serves different answers externally.

Fix: Query the actual return-path domain used in the message. Validate externally visible authoritative DNS. Remove internal-only records or align internal/external zones.

SPF fails after enabling a new outbound relay or changing NAT

Symptoms: Everything was stable; then infrastructure changes happened (new egress IPs, new relay pool). Now deliveries degrade.

Root cause: SPF allows the old egress IPs only. The SMTP client IP changed.

Fix: Update SPF with the new egress IP ranges (or the relay’s published include). Also verify the relay is actually used for all mail streams; hybrid setups often leak direct-to-internet traffic.

Intermittent SPF results across recipients

Symptoms: Some recipients see pass, others fail or none. This varies by region/provider.

Root cause: DNS propagation differences, stale caches, or multiple authoritative nameservers not in sync. Sometimes it’s a DNSSEC failure or a resolver path issue.

Fix: Check authoritative servers directly and compare across resolvers. Ensure all NS serve identical SPF TXT. Fix DNS deployment process; lower TTL temporarily during controlled change windows.

Practical tasks (commands, output, decisions)

Below are field-tested tasks you can run from a Linux host with common tools. Each task includes: the command, representative output, what it means, and what decision you make.

Task 1: Pull the published SPF TXT record (simple view)

cr0x@server:~$ dig +short TXT example.com
"v=spf1 ip4:203.0.113.10 include:_spf.google.com -all"

What it means: There is at least one TXT record; one string looks like SPF.

Decision: If more than one line of output starts with v=spf1, you likely have the “multiple SPF TXT” permerror problem and should merge immediately.

Task 2: Detect multiple SPF records explicitly

cr0x@server:~$ dig TXT example.com +noall +answer
example.com.  300 IN TXT "v=spf1 ip4:203.0.113.10 -all"
example.com.  300 IN TXT "v=spf1 include:spf.mailvendor.example ~all"

What it means: Two SPF policies are published at the same owner name.

Decision: Treat as permerror risk. Replace both with one merged SPF string. Don’t “leave both and hope receivers pick one.” They shouldn’t.
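
If you want this check in a script, the sketch below parses the text that dig +short TXT prints and returns the SPF records it finds. It assumes one record per output line with quoted strings, which is how dig normally prints TXT data; spf_records is a hypothetical name. Several quoted strings on one line are joined, because that is one record; several lines starting with v=spf1 are the problem.

```python
import re

def spf_records(dig_short_output):
    """Return the SPF records found in `dig +short TXT` output."""
    records = []
    for line in dig_short_output.splitlines():
        # Join the quoted strings of one record without adding spaces.
        text = "".join(re.findall(r'"((?:[^"\\]|\\.)*)"', line))
        lowered = text.lower()
        if lowered == "v=spf1" or lowered.startswith("v=spf1 "):
            records.append(text)
    return records

output = '''"v=spf1 ip4:203.0.113.10 -all"
"v=spf1 include:spf.mailvendor.example ~all"
"google-site-verification=abc123"'''

found = spf_records(output)
print(len(found))   # 2
if len(found) > 1:
    print("multiple SPF records: merge them")
```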

Task 3: Query authoritative nameservers (avoid caching lies)

cr0x@server:~$ dig +short NS example.com
ns1.dnsprovider.example.
ns2.dnsprovider.example.
cr0x@server:~$ dig +noall +answer TXT example.com @ns1.dnsprovider.example
example.com.  300 IN TXT "v=spf1 ip4:203.0.113.10 include:_spf.google.com -all"

What it means: You’re inspecting what the authoritative server serves, not what some resolver cached yesterday.

Decision: If ns1 and ns2 differ, you have a DNS publication problem. Fix that first; email is downstream.

Task 4: Inspect TTL to estimate propagation time

cr0x@server:~$ dig TXT example.com +noall +answer
example.com.  3600 IN TXT "v=spf1 ip4:203.0.113.10 include:_spf.google.com -all"

What it means: TTL is 3600 seconds (one hour). Receivers may cache for up to that long (sometimes longer with broken resolvers).

Decision: If you need a fast rollback path, reduce TTL before risky changes during a planned window. During an incident, changing TTL won’t instantly flush caches.

Task 5: Check if your outbound IP is what you think it is

cr0x@server:~$ curl -s ifconfig.me
203.0.113.55

What it means: This host’s internet egress IP is 203.0.113.55.

Decision: If your SPF doesn’t include this IP (or your relay’s IP range), SPF will fail for direct sends from this host/network. Decide whether to route mail through the intended relay or update SPF.

Task 6: Confirm which SMTP client actually connects to the internet (Postfix logs)

cr0x@server:~$ sudo grep -E "to=<|relay=" /var/log/mail.log | tail -n 3
Jan 03 10:11:12 mx1 postfix/smtp[23144]: 3F2A0123: to=, relay=aspmx.l.google.com[142.250.115.27]:25, delay=1.2, delays=0.1/0.1/0.4/0.6, dsn=2.0.0, status=sent
Jan 03 10:12:01 mx1 postfix/smtp[23188]: 9ABCD456: to=, relay=203.0.113.200[203.0.113.200]:587, delay=0.9, delays=0.1/0.1/0.2/0.5, dsn=2.0.0, status=sent
Jan 03 10:12:45 mx1 postfix/smtp[23210]: 7CDE9012: to=, relay=aspmx.l.google.com[142.250.115.27]:25, delay=1.0, delays=0.1/0.1/0.3/0.5, dsn=2.0.0, status=sent

What it means: Some mail goes direct to recipients; some goes via a smarthost at 203.0.113.200.

Decision: Mixed routing increases SPF complexity because the connecting IP changes. Decide whether to enforce a single outbound path (preferred) or ensure SPF covers all possible egress IPs.

Task 7: Extract Return-Path and Authentication-Results from a raw message

cr0x@server:~$ sed -n '1,80p' message.eml | grep -Ei '^(return-path|authentication-results|received|from|dkim-signature):'
Return-Path: 
Authentication-Results: mx.recipient.example; spf=fail (sender IP is 198.51.100.77) smtp.mailfrom=bounce.mail.example.com; dkim=pass header.d=example.com; dmarc=pass header.from=example.com
Received: from mta77.vendor.example (mta77.vendor.example [198.51.100.77])
From: Billing 
DKIM-Signature: v=1; a=rsa-sha256; d=example.com; s=selector1; ...

What it means: SPF failed for bounce.mail.example.com from IP 198.51.100.77, but DKIM passed and DMARC passed (because DKIM aligned with example.com).

Decision: If DMARC passes via DKIM, SPF failure may be tolerable but still harmful for reputation. Decide whether to fix SPF anyway (recommended) or accept it temporarily while ensuring DKIM remains stable.

Task 8: Resolve the include chain (and spot hidden lookup bombs)

cr0x@server:~$ dig +short TXT _spf.google.com
"v=spf1 include:_netblocks.google.com include:_netblocks2.google.com include:_netblocks3.google.com ~all"

What it means: One include expands into more includes. That’s normal, but it consumes lookup budget.

Decision: When you stack multiple “big” includes, you’re betting you won’t hit 10 lookups. Count them. If you’re close, refactor before it breaks.

Task 9: Evaluate SPF locally for a specific sender IP (pragmatic method)

cr0x@server:~$ spfquery -ip 198.51.100.77 -sender bounces@bounce.mail.example.com -helo mta77.vendor.example
result: pass
smtp.mailfrom: bounce.mail.example.com
Received-SPF: pass (bounce.mail.example.com: domain of bounces@bounce.mail.example.com designates 198.51.100.77 as permitted sender)

What it means: Local SPF evaluation returns pass for that IP and sender. This is not identical to every receiver’s behavior, but it’s a strong signal.

Decision: If your test says pass but a receiver says fail, suspect they evaluated a different domain (alignment/return-path mismatch), saw different DNS (propagation/split-horizon), or hit lookup limits due to different resolution behavior.

Task 10: Verify you didn’t accidentally publish SPF on the wrong name

cr0x@server:~$ dig +short TXT www.example.com
"v=spf1 include:spf.mailvendor.example -all"
cr0x@server:~$ dig +short TXT example.com
"google-site-verification=abc123"

What it means: SPF exists on www.example.com but not on the apex example.com.

Decision: If your mail uses example.com in return-path, SPF is effectively missing. Move the SPF record to the correct domain name (or set it on the return-path domain you actually use).

Task 11: Check for split-horizon DNS (internal vs external answers)

cr0x@server:~$ dig +short TXT example.com @10.0.0.53
"v=spf1 ip4:10.10.10.10 -all"
cr0x@server:~$ dig +short TXT example.com @1.1.1.1
"v=spf1 ip4:203.0.113.10 include:_spf.google.com -all"

What it means: Internal resolver serves an internal-only SPF (nonsense to the public internet), while public resolvers see the real policy.

Decision: If internal systems send mail to the internet and evaluate SPF (or sign/route decisions depend on it), split-horizon can cause self-inflicted failures. Align zones or ensure internal resolvers can see the same public SPF view.
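
A cheap guard against internal-only records leaking into the public zone is to flag RFC 1918 addresses in whatever you are about to publish. A minimal sketch, with rfc1918_terms as a hypothetical name; it checks the three private IPv4 blocks explicitly and ignores IPv6.

```python
import ipaddress

RFC1918 = [ipaddress.ip_network(n)
           for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

def rfc1918_terms(record):
    """Return the ip4: terms that overlap private address space."""
    flagged = []
    for term in record.split()[1:]:
        name, _, value = term.lstrip("+-~?").partition(":")
        if name.lower() != "ip4" or not value:
            continue
        network = ipaddress.ip_network(value, strict=False)
        if any(network.overlaps(private) for private in RFC1918):
            flagged.append(term)
    return flagged

print(rfc1918_terms("v=spf1 ip4:10.10.10.10 -all"))
# ['ip4:10.10.10.10']
print(rfc1918_terms("v=spf1 ip4:203.0.113.10 include:_spf.google.com -all"))
# []
```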

Task 12: Confirm DKIM alignment when SPF is fragile

cr0x@server:~$ sed -n '1,120p' message.eml | grep -Ei '^(from|dkim-signature|authentication-results):'
From: Notifications 
DKIM-Signature: v=1; a=rsa-sha256; d=example.com; s=selector1; ...
Authentication-Results: mx.recipient.example; spf=softfail smtp.mailfrom=bounce.vendor.example; dkim=pass header.d=example.com; dmarc=pass header.from=example.com

What it means: DKIM passes for example.com, so DMARC passes via DKIM alignment even though SPF only reaches softfail and its envelope-from domain (bounce.vendor.example) does not align with example.com.

Decision: If you can’t quickly fix SPF (vendor constraints, forwarding, include limits), ensure DKIM is robust and aligned. But don’t leave SPF broken forever; it still impacts trust and some receiver heuristics.

Task 13: Validate that your SPF ends with a sane default

cr0x@server:~$ dig +short TXT example.com | tr ' ' '\n' | tail -n 1
-all"

What it means: The policy ends in -all, a clear stance.

Decision: If it ends in ~all and you believe you know all senders, schedule the move to -all after verifying logs for unknown senders.

Task 14: Inspect whether you’re using risky mechanisms (ptr, exists, macros)

cr0x@server:~$ dig +short TXT example.com
"v=spf1 ptr exists:%{i}._spf.example.com include:spf.mailvendor.example -all"

What it means: ptr and exists are in play; macros are used. This is advanced SPF and often fragile.

Decision: Unless you have a very specific need and you test across multiple receivers, remove ptr and macro-driven exists. Replace with explicit ip4/ip6 or stable vendor includes.
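
The same inspection is easy to script for review. This sketch flags ptr, macro usage, and an explicit or implied +all; risky_terms is a hypothetical name and the list of "risky" things is this article's opinion, not a standard.

```python
def risky_terms(record):
    """Return (term, reason) pairs for mechanisms worth a second look."""
    risky = []
    for term in record.split()[1:]:
        body = term.lstrip("+-~?").lower()
        if body == "ptr" or body.startswith("ptr:"):
            risky.append((term, "ptr mechanism"))
        elif "%{" in body:
            risky.append((term, "macro"))
        elif term.lower() in ("all", "+all"):
            risky.append((term, "passes everything"))
    return risky

record = "v=spf1 ptr exists:%{i}._spf.example.com include:spf.mailvendor.example -all"
for term, reason in risky_terms(record):
    print(f"{term}: {reason}")
```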

Three corporate mini-stories from the SPF trenches

Mini-story #1: The incident caused by a wrong assumption

The company had a clean setup on paper: DMARC at p=reject, DKIM signing through a central outbound relay, and a tidy SPF record. Someone approved a new HR platform to send onboarding emails. It was “just email,” so procurement moved faster than engineering.

The vendor’s onboarding checklist said, “Add this SPF record.” The HR admin did exactly that—adding a second TXT record at the apex with a new v=spf1. Nobody noticed because DNS changes don’t page anyone until they do.

Within hours, multiple receivers started returning permerror on SPF. DMARC enforcement kicked in where DKIM wasn’t present (some transactional streams didn’t sign—legacy app servers were still sending directly). Those messages started bouncing. The visible symptom wasn’t “SPF broke.” It was “password resets aren’t arriving.” That’s how you learn email is part of your authentication system.

Engineering eventually pulled a failing header, saw spf=permerror, and found two SPF TXT answers. They merged the policies into one record and removed the duplicate. Then they discovered something worse: half the company’s mail didn’t even go through the relay consistently.

The real fix wasn’t just merging TXT. They enforced outbound routing so app servers couldn’t send direct-to-internet, then tightened SPF with -all. The wrong assumption wasn’t technical; it was organizational: “someone else owns email.” In reality, email owns you.

Mini-story #2: The optimization that backfired

A platform team wanted to “simplify DNS” and reduce vendor dependencies. They replaced several includes with a flattened list of IP ranges. It looked professional: fewer lookups, deterministic evaluation, and faster SPF checks. They even wrote a small script to generate the flattened record.

Then change management happened. The script wasn’t integrated into CI/CD; it lived on a laptop. It worked until the person with the laptop went on vacation, and the vendor rotated sending infrastructure. That’s not hypothetical—large email platforms move IP space around for capacity and reputation management.

Within a week, sporadic SPF fails appeared. Only some recipients were affected because the vendor used different pools per region and per message type. The team chased “recipient-specific filtering” for two days before someone compared the failing connecting IP to the flattened SPF list.

The backfire wasn’t flattening itself; it was flattening without automation, monitoring, and an update cadence. They reverted to vendor includes, then reintroduced flattening properly via a scheduled job that fetched vendor ranges and published DNS through the standard pipeline, with rollback and diff visibility.

The lesson: reliability engineering hates “one-time improvements.” If the improvement creates a new maintenance obligation, either automate it or don’t do it.

Mini-story #3: The boring but correct practice that saved the day

A different org had a policy: every DNS change to email-related records (SPF, DKIM selectors, DMARC, MX) required a pull request in infrastructure-as-code, with a linter and a simple canary check. Nobody called it glamorous. It was mostly YAML and arguments about TTLs.

One Friday, a vendor account manager emailed an “urgent” request: add an include to SPF before end of day, or campaigns would be delayed. The marketing team escalated, because of course they did. The on-call engineer added the change through the usual PR.

The CI linter failed. It flagged that the new include would push the SPF evaluation over the lookup limit due to an existing chain. The engineer didn’t need to be a deliverability wizard; the process caught it.

They responded with a sane alternative: publish a dedicated subdomain for that vendor’s return-path and keep the apex SPF lean. The vendor accepted, campaigns went out, and nothing broke. The boring practice didn’t just “prevent an outage.” It prevented a Friday-night incident with a stakeholder audience.

Checklists / step-by-step plan

Step-by-step: fixing SPF safely in production

  1. Inventory senders. List every system that sends mail as your domain: corporate suite, ticketing, marketing, billing, monitoring, CRM, custom apps, and “mystery appliances.”
  2. Collect real evidence. For each sender, capture at least one delivered message and record:
    • Return-Path (envelope-from domain)
    • Connecting IP
    • Authentication-Results line
  3. Decide your identity strategy. Prefer:
    • One or two controlled return-path subdomains for third parties
    • DKIM aligned to the visible From: domain
    • DMARC policy that matches your confidence
  4. Build a single SPF record per domain name. Merge entries. Remove dead vendors. Avoid ptr. Avoid long include chains.
  5. Count lookups before publishing. Includes expand. Redirects count. A/MX count. Keep margin.
  6. Plan TTL. For planned changes, lower TTL (e.g., from 3600 to 300) at least one TTL cycle before the change. Then raise it back after stability.
  7. Publish and verify on authoritative NS. Check every NS answers the same record.
  8. Verify with multiple resolvers. You’re not debugging “your DNS,” you’re debugging “the internet’s view of your DNS.”
  9. Monitor outcomes. Track bounce reasons, DMARC aggregate signals if you have them, and vendor delivery dashboards. You’re looking for SPF permerrors and rising softfails.
  10. Move from ~all to -all on a deadline. Set a date. Make someone own it. Otherwise softfail becomes your forever-policy.

Checklist: pre-flight before you add a new vendor include

  • Do we know the vendor’s envelope-from domain and can it be a custom subdomain under ours?
  • Will this include chain keep us under 10 DNS lookups in worst case?
  • Are we adding a second SPF record by mistake?
  • Is DKIM aligned and stable for the stream?
  • Is DMARC enforcement going to punish mistakes immediately?
  • Can we roll back quickly (low TTL, prior record stored, change pipeline ready)?

Checklist: emergency mitigation when SPF is actively breaking

  • Stop making parallel DNS edits from multiple consoles. One change owner.
  • If multiple SPF TXT exist: merge and remove duplicates first.
  • If permerror due to lookups: remove the newest include and re-test. Restore delivery, then redesign.
  • If a new outbound IP appeared: route traffic through the known relay temporarily instead of expanding SPF blindly.
  • If DMARC is rejecting and you can’t fix SPF fast: ensure DKIM aligned signing is working for the affected stream (often the quickest functional bypass).

FAQ

1) Can I have multiple SPF records if I split them across TXT strings?

You can split a single SPF record across multiple quoted strings within one TXT record (some DNS providers do this automatically); receivers concatenate the strings without adding spaces. You cannot publish multiple separate TXT records that each begin with v=spf1 for the same name.

2) What’s the difference between SPF fail and SPF softfail?

-all produces fail for unauthorized senders; ~all produces softfail. Softfail is “not authorized, but maybe accept.” Some receivers treat softfail nearly as harshly as fail when combined with poor reputation or DMARC enforcement signals.

3) Why do I see SPF pass but still get rejected?

Because SPF pass doesn’t guarantee DMARC pass. DMARC requires alignment between the From: domain and the SPF-authenticated envelope-from domain (or DKIM alignment). You can also be rejected for content, reputation, blocklists, or policy unrelated to SPF.

4) Does SPF protect the visible From: header?

Not directly. SPF authenticates the envelope-from (Return-Path) domain. DMARC is what ties authentication (SPF/DKIM) to the visible From: domain via alignment rules.

5) How do forwarders break SPF?

Classic forwarding changes the connecting IP to the forwarder’s IP, but the envelope-from domain remains the original sender’s domain. Unless the original SPF authorizes the forwarder’s IP (it usually doesn’t), SPF fails. This is one reason DKIM is crucial, and why some forwarders implement SRS (Sender Rewriting Scheme) to preserve SPF compatibility.

6) Should I use “a” and “mx” mechanisms in SPF?

Only if you truly understand and control what those resolve to. They cost DNS lookups and can unintentionally authorize infrastructure that shouldn’t send mail. For controlled outbound, explicit ip4/ip6 or a known vendor include is usually safer.

7) What’s SPF “redirect” and why can it be dangerous?

redirect= hands evaluation to another domain’s SPF record. It’s neat for centralizing policy, but it adds dependency chains and lookup risk. If the target record changes or breaks, your domain breaks too.

8) How long does an SPF fix take to “work”?

It depends on TTL and caching behavior. The best-case is minutes if TTL is low and resolvers cooperate. Worst-case can be hours. Some receivers cache aggressively. Plan changes with TTL strategy and expect a propagation tail.

9) Is SPF enough by itself in 2026?

No. SPF alone is insufficient because it doesn’t survive forwarding reliably and it doesn’t authenticate the visible From: header. Use SPF plus DKIM plus DMARC. Treat SPF as necessary but not sufficient.

10) When should I move from ~all to -all?

After you’ve validated all legitimate senders are authorized and you have a rollback plan. In practice: run ~all briefly during discovery, then switch to -all once your inventory is accurate. Put a deadline on it so it actually happens.

Conclusion: next steps you can execute today

If SPF is breaking delivery, it’s usually not because SPF is “hard.” It’s because DNS changes are unreviewed, sender inventories are fictional, and vendor onboarding is treated like a copy-paste exercise. Fix the process, not just the string.

Do this next:

  1. Pull a real failing message and extract return-path domain and sender IP.
  2. Query authoritative DNS and confirm exactly one v=spf1 TXT record exists for that domain name.
  3. Check lookup budget; remove risky mechanisms and unnecessary includes.
  4. Align identities: use custom return-path subdomains for vendors and ensure DKIM aligns with your From: domain.
  5. Lock down change control for SPF/DKIM/DMARC in a pipeline with linting and rollback.

SPF isn’t glamorous, but it’s one of those infrastructure corners where “mostly right” still fails in production. Get it boring. Keep it boring. Your inbox metrics will thank you.
