SPF records tend to start as a tidy one-liner. Then marketing buys a new sender, support adds a ticketing tool, product ships “send from our domain” in an integration, and finance insists on invoice mail from a different provider. Six months later your SPF looks like a ransom note written in DNS.
The failure mode is always the same: someone adds one more include:, crosses the DNS-lookup limit, and mail delivery gets “creative.” Sometimes it’s a quiet drop into spam. Sometimes it’s a hard fail at big receivers. Either way, you’re debugging DNS at 2 a.m. while a VP forwards you screenshots of bounce messages.
Why SPF includes get ugly (and why it matters)
SPF is simple in spirit: publish which hosts are allowed to send mail for a domain. The trouble is how SPF expresses that simplicity: a policy language that can trigger DNS lookups during evaluation, and a hard ceiling on how many you’re allowed.
That ceiling is the center of most SPF pain: the “10 DNS-lookup limit.” If SPF evaluation needs more than 10 DNS lookups, evaluators should return permerror (permanent error), and many receivers treat that as “this policy is broken, so we’re going to be skeptical.” Skeptical is email’s polite term for “your message is going to take a scenic route to the inbox, if it arrives at all.”
Here’s why includes turn into a mess:
- Includes are transitive. Every include: can itself include other SPF records, each triggering more lookups. You don’t control what your vendor adds later.
- Includes hide work. Your SPF record looks short, but the effective policy can be a small novella of DNS.
- Third parties churn. Vendors change infrastructure. Their SPF changes. Sometimes they add more nested includes. You inherit their complexity.
- You keep adding senders. Nobody deletes old tools. They just “pause campaigns.” Which is corporate for “we’ll be back in 18 months.”
Operationally, SPF is a dependency graph problem. Not a string problem. Treat it like dependency management: inventory, pin your requirements, reduce optional dependencies, and monitor for drift.
One short joke, as a palate cleanser: SPF records are like group chats: every new “include” feels harmless until your phone catches fire.
What “breaking mail” actually looks like
You rarely get a clean outage. You get a slow bleed:
- New signups stop receiving verification email at certain ISPs.
- Sales emails start landing in spam “all of a sudden” (translation: it’s been weeks).
- DMARC reports show rising SPF failures from legitimate sources you forgot existed.
- Support tickets arrive with bounce codes that vary by receiver and by mood.
SPF isn’t the only signal receivers use, but it’s a foundational one. If it’s broken, everything else has to work harder. And in email, “harder” means “less reliably.”
Interesting facts and small history
Nine small context points, because understanding where SPF came from makes the current mess feel… inevitable.
- SPF predates modern cloud email sprawl. It was designed when “your outbound mail servers” were a small, mostly static set.
- The 10-lookup limit is deliberate. It’s an anti-abuse and anti-DoS guardrail: receivers shouldn’t be forced into unbounded DNS work per message.
- “Include” is effectively a conditional jump. include: says “if the sender passes that other policy, treat it as a pass here.” It’s not just a list merge.
- SPF checks the envelope-from domain, not the From: header. That surprises people even in 2026, and it’s why DMARC exists to bridge identity alignment.
- SPF can fail for forwarded mail. Classic forwarding often breaks SPF because the forwarding host isn’t authorized by the original domain’s SPF.
- Receivers disagree on edge behavior. The spec is clear on many things, but real-world handling of permerror, temperror, and ambiguous DNS responses varies.
- DNS TTL is an operational lever. Short TTLs help you roll changes quickly, but increase DNS traffic; long TTLs make mistakes linger.
- “SPF flattening” became a cottage industry. Because vendor include chains became too deep, people started pre-resolving includes to concrete IPs—sometimes safely, sometimes not.
- SPF is widely deployed because it’s cheap. Publishing a TXT record is easier than managing keys (DKIM) or enforcing alignment (DMARC), so SPF often becomes the “first and only” control.
A mental model: how SPF really evaluates
When a receiver evaluates SPF, it’s running a small program: it starts with the domain in the SMTP envelope (MAIL FROM) and asks DNS for that domain’s SPF policy (usually via a TXT record starting with v=spf1). Then it checks mechanisms left to right until one matches and returns an outcome.
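To make the "small program" idea concrete, here is a minimal sketch of left-to-right, first-match evaluation. It is deliberately incomplete: it only understands ip4, ip6, and all, and the function name `evaluate` and the sample record are illustrative assumptions, not any real library's API.

```python
import ipaddress

# Qualifier prefix -> SPF result; a mechanism with no prefix behaves like "+".
QUALIFIERS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}

def evaluate(record, client_ip):
    """First match wins. Handles only ip4, ip6, and all (illustrative subset)."""
    ip = ipaddress.ip_address(client_ip)
    for term in record.split()[1:]:          # skip the "v=spf1" version tag
        qualifier = term[0] if term[0] in QUALIFIERS else "+"
        mech = term.lstrip("+-~?")
        if mech == "all":
            return QUALIFIERS[qualifier]
        if mech.startswith(("ip4:", "ip6:")):
            net = ipaddress.ip_network(mech[4:], strict=False)
            if ip.version == net.version and ip in net:
                return QUALIFIERS[qualifier]
    return "neutral"                          # nothing matched and no "all"

record = "v=spf1 ip4:192.0.2.0/24 ip6:2001:db8::/32 ~all"
print(evaluate(record, "192.0.2.10"))    # pass
print(evaluate(record, "203.0.113.5"))   # softfail
```

Order matters: the first mechanism that matches decides the result, so a broad mechanism placed early can shadow everything after it.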
The mechanisms that matter for includes
- include: causes the evaluator to fetch and evaluate the included domain’s SPF. That costs DNS lookups (at least one TXT query, plus whatever the included record triggers).
- a and mx can trigger lookups too. If your record uses mx, SPF evaluation needs to query MX, then resolve A/AAAA for each MX target.
- exists triggers lookups, and it can be abused. Treat it as suspicious unless you have a very good reason.
- ptr is deprecated for good reasons (performance and spoofing). If you still have it, you are carrying historical debt with interest.
- ip4/ip6 are “free” from the DNS-lookup accounting perspective. They don’t require lookups during evaluation. They can still be operationally expensive to maintain.
- redirect= is like “replace my entire policy with this other domain’s policy.” It can be useful to centralize SPF for subdomains.
Why the lookup limit bites harder than you expect
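The limit counts the terms that cause DNS queries: the include, a, mx, ptr, and exists mechanisms and the redirect modifier. Each occurrence counts once, wherever it sits in the include chain. The resolution work behind a term can be larger than the count suggests: what feels like “one include” may be “one include plus five MX plus eight A/AAAA resolves,” and RFC 7208 puts separate caps on the address lookups behind mx and ptr.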
Also, SPF evaluation can involve both A and AAAA lookups. Even if you don’t use IPv6 intentionally, your DNS might publish AAAA records or your vendor might, and some evaluators query both.
Opinionated take: if you can’t explain your SPF record’s lookup count on a whiteboard from memory, you don’t own it. You’re renting it from chance.
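If the whiteboard is not handy, the accounting can be sketched in a few lines. This is a rough worst-case counter, not a real evaluator: it reads from a hypothetical in-memory `RECORDS` map instead of live DNS, does not expand macros, and counts every term even though a real evaluation stops at the first match. All domain names below are made up for illustration.

```python
# Hypothetical records standing in for DNS so the sketch runs offline.
RECORDS = {
    "example.com": "v=spf1 mx include:_spf.vendor-a.example include:_spf.vendor-b.example -all",
    "_spf.vendor-a.example": "v=spf1 include:_nb1.vendor-a.example include:_nb2.vendor-a.example ~all",
    "_nb1.vendor-a.example": "v=spf1 ip4:192.0.2.0/24 ~all",
    "_nb2.vendor-a.example": "v=spf1 ip4:198.51.100.0/24 ~all",
    "_spf.vendor-b.example": "v=spf1 a:out.vendor-b.example exists:%{i}.spf.vendor-b.example ~all",
}

def count_lookups(domain, records, path=()):
    """Count terms that cost a DNS lookup, following include: and redirect=."""
    if domain in path:                        # guard against include loops
        return 0
    total = 0
    for term in records.get(domain, "").split()[1:]:   # skip "v=spf1"
        body = term.lstrip("+-~?").lower()
        for prefix in ("include:", "redirect="):
            if body.startswith(prefix):
                target = body[len(prefix):]
                total += 1 + count_lookups(target, records, path + (domain,))
                break
        else:
            # Strip any ":domain" or "/cidr" suffix to get the mechanism name.
            name = body.split(":", 1)[0].split("/", 1)[0]
            if name in ("a", "mx", "ptr", "exists"):
                total += 1
    return total

print(count_lookups("example.com", RECORDS))  # 7
```

Two includes and one mx at the top level already cost seven of the ten, which is the whole point: the visible record undersells the budget it spends.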
One reliability quote (paraphrased)
Paraphrased idea (Werner Vogels, AWS): “Design so failure is expected, and build systems that keep working when parts fail.” SPF includes are parts.
Fast diagnosis playbook
This is what you do when deliverability is on fire and someone just pasted a bounce message into Slack.
First: identify which identity is failing
- Check the MAIL FROM / Return-Path domain in the failing message headers (or the sender system config). SPF evaluates that, not the visible From.
- Confirm the sending IP from the Received headers or your MTA logs.
- Ask: is this a new sender or a forwarded path? New sender suggests SPF inventory gap; forwarding suggests SPF will fail by design and you’ll need DKIM/DMARC strategy.
Second: determine whether it’s “policy wrong” or “policy broken”
- Query the SPF TXT record and confirm there’s exactly one SPF policy for that domain.
- Count DNS lookups (or at least identify the include chain depth). If you’re near or above 10, assume permerror in the field.
- Check for DNS issues: NXDOMAIN, SERVFAIL, timeouts, or split-horizon surprises.
Third: choose the least risky mitigation
- If lookup limit is the issue: remove nonessential includes, or move a sender to a subdomain with its own SPF, or temporarily authorize a concrete IP range with ip4/ip6 (with a tight change ticket to remove it later).
- If the sender IP is simply missing: add the correct authorization, but only after verifying the system truly sends with that domain’s envelope-from.
- If this is a forwarding break: don’t “fix” SPF by authorizing the forwarder; it’s whack-a-mole. Use DKIM/DMARC and/or ARC where applicable.
One rule: don’t “optimize” while you’re diagnosing. Stabilize first, then simplify.
Hands-on tasks: commands, outputs, and decisions
These are real operator tasks. Each one includes a command, an example output, what it means, and the decision you make next. Run them from a Linux host with standard tooling. If you don’t have these tools, install them on a jump box, not on your mail servers in the middle of an incident.
Task 1: Fetch the SPF record (and see if you have multiple)
cr0x@server:~$ dig +short TXT example.com
"v=spf1 include:_spf.google.com include:spf.protection.outlook.com -all"
"google-site-verification=abc123"
What it means: There is exactly one SPF policy string (the one starting with v=spf1). Good.
Decision: If you see more than one v=spf1 string returned, you must consolidate; multiple SPF records commonly cause permerror.
Task 2: Confirm the envelope-from domain in a message (local MTA)
cr0x@server:~$ postcat -q 1A2B3C4D | sed -n '1,40p'
*** ENVELOPE RECORDS ***
message_arrival_time: Tue Jan 2 12:01:09 2026
original_recipient: user@example.net
sender: bounce@mailer.example.com
*** MESSAGE CONTENTS ***
Return-Path: <bounce@mailer.example.com>
From: "Example Billing" <billing@example.com>
To: user@example.net
Subject: Invoice
What it means: SPF will be evaluated against mailer.example.com (Return-Path / sender), not example.com in the visible From.
Decision: Audit SPF for the envelope domain that actually sends, not the one marketing thinks it uses.
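When you only have a saved message rather than queue access, the same check can be sketched with Python's standard email module. The helper name `auth_domains` is an assumption for illustration, and it relies on the Return-Path header, which is normally added at final delivery and may be absent from a copy taken earlier in the path.

```python
from email import message_from_string
from email.utils import parseaddr

def auth_domains(raw_message):
    """Return the domain SPF evaluates and the domain the reader sees."""
    msg = message_from_string(raw_message)
    envelope = parseaddr(msg.get("Return-Path", ""))[1]
    visible = parseaddr(msg.get("From", ""))[1]
    # Everything after the last "@" is the domain; empty if there is no address.
    spf_domain = envelope.rpartition("@")[2].lower()
    from_domain = visible.rpartition("@")[2].lower()
    return {"spf_domain": spf_domain, "from_domain": from_domain}

RAW = (
    "Return-Path: <bounce@mailer.example.com>\n"
    'From: "Example Billing" <billing@example.com>\n'
    "To: user@example.net\n"
    "Subject: Invoice\n"
    "\n"
    "Body\n"
)
print(auth_domains(RAW))
```

If the two domains differ, that is not automatically a problem, but it tells you which SPF record to audit and whether DMARC alignment will depend on DKIM.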
Task 3: Rule out connectivity and TLS problems via OpenSSL (SMTP banner sanity)
cr0x@server:~$ openssl s_client -starttls smtp -crlf -connect mx1.receiver.net:25 </dev/null | sed -n '1,15p'
CONNECTED(00000003)
depth=2 C=US, O=Internet Security Research Group, CN=ISRG Root X1
verify return:1
depth=1 C=US, O=Let's Encrypt, CN=R3
verify return:1
depth=0 CN=mx1.receiver.net
verify return:1
250 mx1.receiver.net ESMTP ready
What it means: Not SPF directly, but you’re confirming you can reach a receiver and TLS negotiation isn’t the real issue during tests.
Decision: If connectivity/TLS fails, fix that before blaming SPF. Deliverability “outages” sometimes start with your own network.
Task 4: Inspect SPF includes recursively (basic)
cr0x@server:~$ dig +short TXT _spf.google.com
"v=spf1 ip4:74.125.0.0/16 ip4:173.194.0.0/16 include:_netblocks.google.com include:_netblocks2.google.com include:_netblocks3.google.com ~all"
What it means: That single include expands into three more includes plus multiple IP ranges.
Decision: Start building an include tree and a lookup count. If you’re already including two big providers, you’re likely close to the limit.
Task 5: Count how many SPF strings exist (guard against accidental duplication)
cr0x@server:~$ dig +short TXT example.com | grep -c "v=spf1"
1
What it means: One SPF record. Great.
Decision: If output is 2 or more, consolidate into one. Delete the extras. Don’t try to “split” SPF across records; SPF doesn’t work that way.
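The grep count is a fine first pass, but it also matches strings that merely contain v=spf1 somewhere. A slightly stricter check, sketched below under the assumption that you feed it the TXT values as a list of strings, only counts values that begin with the version tag followed by a space or the end of the string. The function names are hypothetical.

```python
def spf_records(txt_values):
    """Return only the TXT values that are SPF policies."""
    found = []
    for value in txt_values:
        text = value.strip().strip('"')       # tolerate dig-style quoting
        lowered = text.lower()
        if lowered == "v=spf1" or lowered.startswith("v=spf1 "):
            found.append(text)
    return found

def check_single_policy(txt_values):
    """Classify the domain: no SPF, exactly one policy, or a duplicate."""
    count = len(spf_records(txt_values))
    if count == 0:
        return "none"
    return "ok" if count == 1 else "permerror"

txt = [
    '"v=spf1 include:_spf.google.com include:spf.protection.outlook.com -all"',
    '"google-site-verification=abc123"',
]
print(check_single_policy(txt))  # ok
```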
Task 6: Check if a subdomain already has its own SPF record
cr0x@server:~$ dig +short TXT mailer.example.com
"v=spf1 include:spf.vendor-mail.com -all"
What it means: The vendor already supports sending from a delegated subdomain with separate SPF.
Decision: Prefer moving vendor senders to their own subdomain SPF rather than bloating the apex domain.
Task 7: Find MX records (and realize mx can cost you lookups)
cr0x@server:~$ dig +short MX example.com
10 mx01.mailhost.example.net.
20 mx02.mailhost.example.net.
What it means: If your SPF record uses mx, evaluation may resolve these names and their A/AAAA records. That’s multiple lookups.
Decision: Avoid mx in SPF unless you truly send outbound from your inbound MX hosts (rare in modern setups).
Task 8: Resolve A and AAAA for MX targets (lookup budgeting)
cr0x@server:~$ dig +short A mx01.mailhost.example.net
203.0.113.10
203.0.113.11
cr0x@server:~$ dig +short AAAA mx01.mailhost.example.net
2001:db8:10::10
What it means: Multiple addresses means additional resolution work for evaluators. Some will query both A and AAAA.
Decision: If you were planning to use mx or a in SPF, consider explicit ip4/ip6 instead for predictability (with a plan to maintain it).
Task 9: Measure DNS response time and detect flaky resolution
cr0x@server:~$ dig example.com TXT +stats | tail -n 5
;; Query time: 142 msec
;; SERVER: 10.0.0.53#53(10.0.0.53) (UDP)
;; WHEN: Tue Jan 2 12:20:11 UTC 2026
;; MSG SIZE rcvd: 232
What it means: 142 ms is not terrible, but if your includes require many lookups, latency compounds. If you see seconds, you have a DNS problem.
Decision: If DNS is slow or failing, SPF evaluation becomes unreliable. Fix DNS health (authoritative servers, resolvers, timeouts) before “simplifying” SPF.
Task 10: Check authoritative vs recursive answers (split-horizon traps)
cr0x@server:~$ dig @ns1.example-dns.net example.com TXT +norecurse +short
"v=spf1 include:_spf.google.com include:spf.protection.outlook.com -all"
What it means: You’re seeing what the authoritative server publishes. If this differs from your internal resolver answer, you have split-horizon DNS or caching issues.
Decision: Ensure the public authoritative record is correct; receivers use public DNS, not your internal view.
Task 11: Spot a broken include target (NXDOMAIN/SERVFAIL)
cr0x@server:~$ dig +short TXT spf.dead-vendor.example
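What it means: Empty output usually means NXDOMAIN or no TXT record; rerun without +short to see the status line. Under RFC 7208, an include whose target has no SPF record evaluates to permerror, so one dead vendor domain can break the policy for every sender whose evaluation reaches it.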
Decision: Remove dead includes immediately. If you still need that sender, replace the include with their current supported domain or explicit IPs.
Task 12: Detect oversize SPF TXT strings (DNS limits and quoting)
cr0x@server:~$ dig +short TXT example.com | awk 'length($0) {print length($0), $0}'
72 "v=spf1 include:_spf.google.com include:spf.protection.outlook.com -all"
33 "google-site-verification=abc123"
What it means: A single TXT character-string holds at most 255 bytes, so longer records are published as multiple quoted strings that evaluators concatenate; some tooling and humans mishandle this. Large SPF records can also exceed practical limits and get mangled.
Decision: Keep SPF compact. If you’re nearing size limits or seeing split strings, that’s a smell: simplify by architecture, not by cramming more text.
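The splitting rule is easy to get wrong by hand, so here is a small sketch of it. It assumes an ASCII record and a 255-byte limit per string; the function names and the generated sample record are illustrative. The detail that bites people is that the pieces are joined with no separator, so a space must live inside one of the chunks if two terms straddle a boundary.

```python
def split_txt(record, limit=255):
    """Split a record into chunks of at most `limit` bytes each."""
    data = record.encode("ascii")
    return [data[i:i + limit].decode("ascii") for i in range(0, len(data), limit)]

def join_txt(chunks):
    """Reassemble the way an evaluator does: concatenate, adding nothing."""
    return "".join(chunks)

# A deliberately bloated record to force a split.
long_record = "v=spf1 " + " ".join(f"ip4:198.51.100.{i}" for i in range(40)) + " -all"
chunks = split_txt(long_record)
print(len(long_record), [len(c) for c in chunks])
print(join_txt(chunks) == long_record)  # True
```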
Task 13: Validate that a given IP would pass SPF (local check with spfquery)
cr0x@server:~$ spfquery -ip 198.51.100.77 -sender bounce@mailer.example.com -helo mailer.example.com
pass
What it means: For this envelope domain and IP, SPF should pass.
Decision: If it fails, fix SPF or fix the sender configuration. If it passes locally but fails at receivers, suspect DNS propagation, lookup limits, or resolver differences.
Task 14: Check DMARC alignment clues (to avoid “fixing” the wrong layer)
cr0x@server:~$ dig +short TXT _dmarc.example.com
"v=DMARC1; p=quarantine; rua=mailto:dmarc@reports.example.com; adkim=s; aspf=s"
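What it means: adkim=s and aspf=s set strict alignment for both DKIM and SPF, so the authenticated domain must match the From domain exactly. If your envelope domain differs from the From domain, even by a subdomain, SPF may pass but still not align.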
Decision: When simplifying SPF, keep identity architecture in mind: align senders to the correct subdomain or ensure DKIM aligns.
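For scripting this check, a DMARC record is just semicolon-separated tag=value pairs. The parser below is a minimal sketch (the name `parse_dmarc` is an assumption) that also fills in the relaxed default for adkim and aspf when a record omits them.

```python
def parse_dmarc(record):
    """Parse a DMARC TXT value into a dict of tags."""
    tags = {}
    for part in record.strip().strip('"').split(";"):
        if "=" in part:
            key, _, value = part.partition("=")   # split on the first "=" only
            tags[key.strip().lower()] = value.strip()
    # Alignment modes default to relaxed ("r") when not published.
    tags.setdefault("adkim", "r")
    tags.setdefault("aspf", "r")
    return tags

policy = parse_dmarc(
    '"v=DMARC1; p=quarantine; rua=mailto:dmarc@reports.example.com; adkim=s; aspf=s"'
)
print(policy["p"], policy["adkim"], policy["aspf"])  # quarantine s s
```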
Task 15: Check recent mail logs for “SPF permerror” patterns (Postfix example)
cr0x@server:~$ sudo grep -i "spf" /var/log/mail.log | tail -n 8
Jan 2 12:12:41 mx postfix/smtpd[22191]: warning: SPF: permerror (too many DNS lookups) identity=mailfrom; client-ip=198.51.100.77; helo=mailer.example.com; envelope-from=bounce@mailer.example.com; receiver=mx
Jan 2 12:13:02 mx postfix/smtpd[22191]: warning: SPF: fail identity=mailfrom; client-ip=203.0.113.55; envelope-from=noreply@example.com
What it means: You have both a lookup-limit breach and a straightforward fail for another identity. Two different problems, same week. Classic.
Decision: Prioritize permerror first; it can tank deliverability even for legitimate traffic. Then handle individual missing senders.
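To see which identities are hurting most, a quick tally by envelope domain and result helps. The sketch below assumes log lines shaped like the sample output above; real SPF policy daemons and milters log in different formats, so treat the regular expression as something to adapt rather than a known Postfix format.

```python
import re
from collections import Counter

# Matches the sample format shown above: "SPF: <result> ... envelope-from=<addr>"
LINE = re.compile(r"SPF: (?P<result>\w+).*?envelope-from=(?P<sender>[^;\s]+)")

def tally(lines):
    """Count (envelope domain, SPF result) pairs across log lines."""
    counts = Counter()
    for line in lines:
        match = LINE.search(line)
        if match:
            domain = match.group("sender").rpartition("@")[2].lower()
            counts[(domain, match.group("result"))] += 1
    return counts

LOG = [
    "Jan 2 12:12:41 mx postfix/smtpd[22191]: warning: SPF: permerror (too many DNS lookups) identity=mailfrom; client-ip=198.51.100.77; helo=mailer.example.com; envelope-from=bounce@mailer.example.com; receiver=mx",
    "Jan 2 12:13:02 mx postfix/smtpd[22191]: warning: SPF: fail identity=mailfrom; client-ip=203.0.113.55; envelope-from=noreply@example.com",
]
for key, count in tally(LOG).items():
    print(key, count)
```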
Task 16: Verify TXT changes propagate (watch TTL and caching)
cr0x@server:~$ dig example.com TXT +noall +answer
example.com. 300 IN TXT "v=spf1 include:_spf.google.com include:spf.protection.outlook.com -all"
What it means: TTL is 300 seconds (5 minutes). That’s incident-friendly.
Decision: For planned migrations, lower TTL a day ahead. For stable operations, raise it to reduce resolver load and improve cache hit rates.
Simplification strategies that don’t blow up production
The goal is not “short SPF.” The goal is “predictable SPF that stays under limits, survives vendor changes, and matches how your mail actually flows.” Short is a side effect.
1) Stop treating the apex domain SPF as a junk drawer
Put third-party senders on subdomains whenever possible. This is the single highest-leverage move. Examples:
- mailer.example.com for marketing automation
- support.example.com for ticketing
- notify.example.com for product notifications
Then you publish separate SPF records per subdomain. Your apex SPF stays lean: only the infrastructure that truly sends mail as @example.com.
Tradeoff: You must ensure visible From domains and DKIM align with your DMARC policy, or you’ll fix SPF and still fail DMARC.
2) Prefer redirect= for consistency across a fleet of subdomains
If you own many subdomains that should share the same SPF, centralize it:
- v=spf1 redirect=_spf.example.com
- And then define the real policy at _spf.example.com
This doesn’t reduce lookups by magic, but it reduces administrative mistakes. It’s a control-plane simplification, not a data-plane one.
3) Be ruthless about removing stale senders
Most orgs keep includes for vendors they stopped using years ago. Includes cost you lookup budget and increase the blast radius of vendor changes.
Operational move: require an “owner” for each include with a quarterly re-attestation. No owner, no include.
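"No owner, no include" is easy to say and easy to automate. The sketch below assumes a hypothetical ownership registry (a dict keyed by include target, with an owner and a last-attested date) and a 92-day window standing in for "quarterly". None of these names come from a real tool.

```python
from datetime import date, timedelta

def unowned_or_stale(record, owners, today, max_age_days=92):
    """List include targets with no owner or an expired attestation."""
    flagged = []
    for term in record.split()[1:]:           # skip "v=spf1"
        body = term.lstrip("+-~?")
        if not body.lower().startswith("include:"):
            continue
        target = body[len("include:"):].lower()
        entry = owners.get(target)
        if entry is None:
            flagged.append((target, "no owner"))
        elif today - entry["attested"] > timedelta(days=max_age_days):
            flagged.append((target, "attestation expired"))
    return flagged

OWNERS = {   # hypothetical registry, e.g. loaded from a file in the DNS repo
    "_spf.google.com": {"owner": "it-ops", "attested": date(2026, 1, 1)},
    "spf.old-vendor.example": {"owner": "marketing", "attested": date(2025, 6, 1)},
}
RECORD = "v=spf1 include:_spf.google.com include:spf.old-vendor.example include:spf.mystery.example -all"
for target, reason in unowned_or_stale(RECORD, OWNERS, today=date(2026, 2, 1)):
    print(target, reason)
```

Run it in CI against the record you keep in source control and the quarterly conversation becomes a failing check instead of a meeting.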
4) Avoid mx and a mechanisms unless you really mean them
mx in SPF is a classic legacy trick: “if it can receive mail for us, it can send mail for us.” In modern environments, inbound and outbound are separate. Using mx in SPF often authorizes hosts that should never send mail as you.
Recommendation: If you have a small, stable outbound pool, use explicit ip4/ip6. If you don’t, use subdomains and vendor includes. Don’t use mx as a lazy shortcut.
5) Flattening: useful, risky, sometimes necessary
“Flattening” means replacing includes with the resulting IP ranges (ip4/ip6) so SPF evaluation doesn’t need as many DNS lookups. This can be effective for staying under the 10-lookup limit.
But flattening has sharp edges:
- Vendor IP ranges can change without notice. Your flattened record goes stale, and you fail legitimate mail.
- Large IP lists make TXT records long and more error-prone to edit.
- You may accidentally authorize more than intended if you flatten too broadly.
When flattening is acceptable: for senders with published, stable IP ranges and a change control process on your side to refresh them regularly. Treat it like updating firewall rules: scheduled, reviewed, and monitored.
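The refresh step boils down to a comparison: what you published versus what the vendor's include resolves to today. A minimal sketch of that diff follows, assuming you already have both lists of CIDR strings from your own resolver step (not shown). The function name and sample ranges are illustrative.

```python
import ipaddress

def _nets(cidrs):
    return [ipaddress.ip_network(c, strict=False) for c in cidrs]

def flatten_drift(published, current):
    """Compare a flattened record's ranges with freshly resolved vendor ranges."""
    pub, cur = _nets(published), _nets(current)
    # Vendor ranges not covered by anything we published: legitimate mail may fail.
    missing = [c for c in cur
               if not any(c.version == p.version and c.subnet_of(p) for p in pub)]
    # Published ranges the vendor no longer uses: we authorize more than needed.
    stale = [p for p in pub
             if not any(p.version == c.version and p.overlaps(c) for c in cur)]
    return {"missing": sorted(map(str, missing)), "stale": sorted(map(str, stale))}

published = ["192.0.2.0/24", "198.51.100.0/24"]
current = ["192.0.2.0/25", "203.0.113.0/24", "2001:db8::/32"]
print(flatten_drift(published, current))
```

A non-empty "missing" list is the urgent one; "stale" is a cleanup item that widens who can send as you.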
Second short joke: Flattening SPF is like meal-prepping: it saves time during the week, until you forget it in the fridge and it becomes a science project.
6) Make SPF a product: version it, test it, and stage it
SPF changes should go through the same discipline as production config:
- Keep the record in git (as text, plus an explanation comment in the repo).
- Have a test harness that counts lookups and validates syntax.
- Use a staged rollout via lower TTL, then deploy, then monitor DMARC reports and bounce patterns.
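The syntax half of such a harness does not need to be clever. Below is a hedged sketch of a linter that catches a few common mistakes; it is not a full RFC 7208 parser, the rule set is an assumption about what is worth flagging, and `lint_spf` is a made-up name.

```python
MECHANISMS = {"all", "include", "a", "mx", "ptr", "ip4", "ip6", "exists"}
NEEDS_ARGUMENT = {"include", "ip4", "ip6", "exists"}

def lint_spf(record):
    """Return a list of human-readable problems; empty means nothing flagged."""
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return ["record must start with v=spf1"]
    problems, names = [], []
    for term in terms[1:]:
        body = term.lstrip("+-~?")
        if body.lower().startswith(("redirect=", "exp=")):
            names.append(body.split("=", 1)[0].lower())
            continue
        name, _, arg = body.partition(":")
        name = name.split("/", 1)[0].lower()   # drop any "/cidr" suffix
        names.append(name)
        if name not in MECHANISMS:
            problems.append(f"unknown mechanism: {term}")
        elif name in NEEDS_ARGUMENT and not arg:
            problems.append(f"{name} needs an argument: {term}")
    if "all" in names:
        after_all = names[names.index("all") + 1:]
        if any(n in MECHANISMS for n in after_all):
            problems.append("mechanisms after all are never evaluated")
        if "redirect" in names:
            problems.append("redirect= is ignored when all is present")
    if "ptr" in names:
        problems.append("ptr is deprecated")
    return problems

print(lint_spf("v=spf1 include:_spf.example.net ip4:192.0.2.0/24 -all"))  # []
```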
7) Align identity architecture with DMARC, not against it
Teams often try to fix deliverability by stuffing more into SPF, when the real problem is identity mismatch. If your DMARC policy is strict, you must ensure either:
- SPF passes and aligns (envelope domain aligns with From domain), or
- DKIM passes and aligns (d= domain aligns with From domain)
That frequently pushes you toward “each sender uses its own subdomain, with aligned DKIM,” which also happens to reduce SPF complexity at the apex. Funny how good architecture solves multiple problems.
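The alignment rule itself is small enough to sketch. One loud assumption: `org_domain` below takes the last two labels as the organizational domain, which is wrong for suffixes like co.uk; real evaluators consult the Public Suffix List. The function names are illustrative.

```python
def org_domain(domain):
    # Assumption: naive "last two labels". Real DMARC uses the Public Suffix List.
    return ".".join(domain.lower().rstrip(".").split(".")[-2:])

def aligned(auth_domain, from_domain, mode="r"):
    """mode "s" = strict (exact match), "r" = relaxed (same organizational domain)."""
    auth = auth_domain.lower().rstrip(".")
    visible = from_domain.lower().rstrip(".")
    if mode == "s":
        return auth == visible
    return org_domain(auth) == org_domain(visible)

# Envelope on a delegated subdomain, visible From at the apex:
print(aligned("mailer.example.com", "example.com", mode="r"))  # True
print(aligned("mailer.example.com", "example.com", mode="s"))  # False
# Vendor default return-path on the vendor's own domain:
print(aligned("vendor.example.net", "example.com", mode="r"))  # False
```

The same function applies to DKIM by passing the d= domain as `auth_domain`, which is why a subdomain envelope plus an aligned DKIM signature is usually the comfortable combination.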
Three corporate mini-stories from the SPF trenches
Mini-story 1: The incident caused by a wrong assumption
A mid-size SaaS company ran outbound mail from three places: Google Workspace for humans, a transactional provider for product email, and a marketing platform for campaigns. Their SPF at the apex was already heavy, but it still passed most of the time, which is email’s way of encouraging bad behavior.
A new integration shipped: “Send invitations from your company domain.” Engineering assumed that meant adding a DKIM key and signing as example.com. Reasonable assumption. The integration team configured the vendor to use bounce@vendor.example.net as the envelope sender because it was the default, and nobody asked what “SPF identity” would be evaluated against.
Two weeks later, customer invites started bouncing at a large receiver. The bounce reason referenced SPF failures for vendor.example.net, not example.com. Support escalated to SRE. SRE stared at the apex SPF record for ten minutes before realizing it was irrelevant to this failure. Wrong domain. Wrong identity.
The fix wasn’t “add another include.” The fix was to move the vendor envelope sender to mailer.example.com and publish a dedicated SPF for that subdomain. DKIM alignment was updated to match. The incident ended quickly, but the postmortem was blunt: you can’t reason about email auth by looking at the From header and vibes.
Mini-story 2: The optimization that backfired
An enterprise IT team decided to “clean up DNS.” They noticed their SPF record used include: for a provider that, on inspection, expanded to lots of ip4 ranges. They’d heard of SPF flattening, and it sounded like a performance win. So they flattened it manually.
At first, it looked great: fewer DNS lookups, shorter include chain, fewer intermittent temperror tickets from receivers with strict timeouts. They declared victory and moved on. A month later, a slow trickle of “password reset never arrived” reports started showing up, from a subset of consumer mail providers.
The vendor had added new sending IPs. The include record updated. The flattened record didn’t. The mail didn’t fail everywhere; it failed just enough to be expensive and hard to prove. Nobody wanted to roll back because the flattened SPF had been presented as a security improvement. It wasn’t a lie, but it wasn’t the whole truth either.
They eventually rebuilt the approach: flattening was automated with scheduled refresh, changes were diffed and reviewed, and the list was constrained to only the vendor services actually in use. The backfire wasn’t flattening itself. It was flattening without treating it like a living dependency.
Mini-story 3: The boring but correct practice that saved the day
A global company had a reputation for being “slow” about email changes. Every SPF update required an owner, a ticket, a rollback plan, and a 24-hour observation period. People complained. Marketing complained louder. This is the natural order of things.
One morning, a well-known email vendor had a DNS outage. Their SPF include domain intermittently returned SERVFAIL. Many customers saw sporadic temperror during SPF evaluation. Some receivers deferred mail. Some treated it as suspicious. Chaos, but polite chaos.
This company had two advantages: (1) their apex SPF had only one third-party include, because almost all vendor mail used delegated subdomains, and (2) they had monitoring that sampled SPF resolution from multiple public resolvers and alerted on SERVFAIL and lookup count anomalies.
The response was boring: temporarily route critical transactional mail through a secondary provider already authorized on a subdomain, and wait out the vendor’s DNS recovery. No panicked edits to the apex SPF. No “just add this include” at 9 a.m. The post-incident review was equally boring: they documented the vendor dependency and kept the architecture. Boring, in production systems, is a compliment.
Common mistakes: symptom → root cause → fix
1) Symptom: intermittent failures, sometimes “permerror”
Root cause: too many DNS lookups due to nested includes, plus mx/a mechanisms inside include chains.
Fix: reduce dependencies: move vendors to subdomains, remove stale includes, avoid mx/a in SPF, consider managed flattening with refresh.
2) Symptom: “SPF fail” but you swear the From domain is correct
Root cause: you’re looking at the From header; SPF is checking the envelope-from domain (Return-Path).
Fix: inspect headers/logs to find MAIL FROM. Fix the envelope domain configuration or publish SPF for the actual envelope domain.
3) Symptom: SPF suddenly breaks after “a small DNS cleanup”
Root cause: multiple SPF records (multiple v=spf1 TXT strings) exist, or someone edited quoting/spacing and broke parsing.
Fix: publish exactly one SPF policy per domain; validate with tooling; keep SPF in source control.
4) Symptom: mail forwarding causes SPF failures
Root cause: forwarders send from their IPs; original domain’s SPF doesn’t authorize them.
Fix: don’t authorize the world. Use DKIM signing and DMARC alignment; for receivers/forwarders that support it, ARC can preserve auth results.
5) Symptom: SPF passes but DMARC fails
Root cause: SPF passes for an envelope domain that doesn’t align with the From domain under DMARC (especially with strict alignment).
Fix: align identities: use a matching envelope domain (a subdomain of the From domain is enough under relaxed alignment; strict alignment needs an exact match) and/or ensure DKIM d= aligns with From.
6) Symptom: SPF works for IPv4 senders, fails mysteriously for some receivers
Root cause: you’re sending over IPv6 from some hosts, but SPF only authorizes IPv4, or the include chain behaves differently with AAAA queries and timeouts.
Fix: inventory actual sending IPs including IPv6; authorize ip6: as needed; ensure outbound MTAs aren’t leaking unexpected IPv6 paths.
7) Symptom: vendor mail fails only for some of their regions
Root cause: you flattened SPF and missed new vendor IP pools, or the vendor has multiple SPF include domains for different products and you included the wrong one.
Fix: revert to vendor include if you can afford the lookup budget, or implement automated flatten refresh with review; verify you’re using the correct SPF include for the product tier.
8) Symptom: receivers report “temperror” in SPF
Root cause: DNS timeouts, SERVFAIL, or flaky authoritative servers for either your domain or an include target.
Fix: harden DNS (authoritative redundancy, resolver health), reduce lookup count to reduce time spent in DNS, monitor from multiple public resolvers.
Checklists / step-by-step plan
Step-by-step: simplify SPF includes safely
- Inventory senders. List every system that sends mail using your brand: human mailboxes, transactional, marketing, support, finance, monitoring, CI, and “product invites.” Tie each to an owner.
- For each sender, capture identities. From domain, envelope-from domain, DKIM d= domain, sending IP ranges (v4/v6), and whether it supports custom return-path.
- Draw the current SPF dependency graph. Apex SPF plus all includes, and the includes inside those includes. Count estimated DNS lookups.
- Decide domain strategy. Which senders truly need to send as @example.com? Everything else goes to delegated subdomains.
- Move one sender at a time to a subdomain. Configure vendor to use bounce@mailer.example.com (or similar), publish SPF for that subdomain, ensure DKIM aligns.
- Keep apex SPF minimal. Usually: corporate mail provider + primary transactional provider, and only if they must use the apex domain.
- Remove stale includes. If a tool is “not active,” remove it. If someone complains later, they can re-authorize intentionally with proper design.
- Lower TTL ahead of big changes. Do this at least a day before if you can. Raise TTL back after stabilization.
- Validate syntax and lookup budget. Run SPF evaluation tests for representative sender IPs and identities.
- Roll out and observe. Watch DMARC aggregate reports and inbound bounces. Expect lag due to caches and reporting cadence.
- Document the contract. For each include/subdomain, write down: owner, vendor, why it exists, what it authorizes, and how to rotate it.
- Monitor for drift. Includes can change under you. Track lookup count and DNS health as a continuous signal.
Operational checklist: before you add any new include:
- Do we actually need apex-domain sending, or can this be a subdomain?
- What is the vendor’s envelope-from and does it support a custom return-path?
- What’s our current lookup budget and what will this include cost when expanded?
- Who owns this sender and will re-attest quarterly?
- Do we have DKIM alignment that will satisfy DMARC?
- What’s the rollback if this include causes permerror?
Emergency checklist: “We hit permerror”
- Confirm permerror (too many DNS lookups) in logs or receiver feedback.
- Identify the last SPF change (diff your DNS records).
- Temporarily remove the least critical include(s) to get under the limit.
- If removal breaks a business-critical sender, move it to a subdomain with its own SPF as the real fix.
- After stabilization, perform a full sender inventory and simplify properly.
FAQ
1) Why is there a “10 DNS lookup” limit in SPF?
Because receivers must evaluate SPF for huge volumes of mail. Without a limit, SPF could be abused to force excessive DNS work per message, turning mail reception into a DNS stress test.
2) Does adding ip4: entries cost DNS lookups?
No. ip4/ip6 mechanisms don’t require DNS queries during SPF evaluation. They’re cheap for receivers, but you pay the maintenance cost.
3) Are a and mx “bad” in SPF?
Not inherently, but they are often used lazily and can balloon lookups. They also tend to authorize more hosts than you intended. In 2026, explicit IPs or vendor subdomains are usually safer.
4) What’s the difference between include: and redirect=?
include: says “if the included policy passes, treat it as a match here” and evaluation continues if it doesn’t. redirect= says “use that other domain’s policy as the policy for this domain” (typically as the final modifier).
5) Can I have multiple SPF TXT records to avoid length or complexity?
No. Publishing multiple v=spf1 records is a common mistake and often yields permerror. You need exactly one SPF policy per domain.
6) If we use DMARC, do we still need SPF?
Yes. DMARC relies on SPF and/or DKIM passing and aligning. SPF is still a primary input, and many receivers use SPF results beyond DMARC evaluation.
7) Why does forwarding break SPF, and what do we do about it?
Forwarders re-send from their own IPs, which aren’t in your SPF. The durable fix is DKIM signing with alignment, plus DMARC. In some ecosystems, ARC can help preserve authentication across forwarding.
8) Should we use ~all or -all?
Use -all when you’re confident your inventory is correct and you’re enforcing. Use ~all during migrations if you need telemetry and want to reduce hard failures. But don’t leave ~all forever; it becomes permanent indecision in DNS.
9) What’s the best way to simplify SPF without flattening?
Delegate. Move third-party mail to subdomains with dedicated SPF and aligned DKIM. Keep the apex SPF for only the few systems that must send as the apex domain.
10) How do we monitor SPF includes drifting over time?
Periodically resolve your SPF record, walk includes, and track lookup count and DNS failures from multiple resolvers. Alert on changes, especially new nested includes or new mechanisms like exists.
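The comparison step can be as simple as a set diff between two snapshots. The sketch below assumes each snapshot is a dict of domain to SPF string that your own resolver job has already collected (that part is not shown), and the names are hypothetical.

```python
def spf_drift(before, after):
    """Report terms added or removed per domain between two snapshots."""
    changes = {}
    for domain in sorted(set(before) | set(after)):
        old = set(before.get(domain, "").split()[1:])   # skip "v=spf1"
        new = set(after.get(domain, "").split()[1:])
        if old != new:
            changes[domain] = {
                "added": sorted(new - old),
                "removed": sorted(old - new),
            }
    return changes

last_week = {
    "example.com": "v=spf1 include:_spf.vendor.example -all",
    "_spf.vendor.example": "v=spf1 ip4:192.0.2.0/24 ~all",
}
today = {
    "example.com": "v=spf1 include:_spf.vendor.example -all",
    "_spf.vendor.example": "v=spf1 ip4:192.0.2.0/24 include:_eu.vendor.example ~all",
}
print(spf_drift(last_week, today))
```

Here the apex record did not change at all, yet the effective policy gained a nested include. That is exactly the kind of change worth an alert.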
Conclusion: practical next steps
If your SPF record is a pile of includes, you don’t have an SPF record. You have a dependency graph you’re not managing. The way out is not heroic DNS kung fu; it’s architecture and ownership.
- Today: pull your current SPF, confirm you have exactly one v=spf1, and identify whether you’re flirting with the 10-lookup limit.
- This week: inventory senders and move at least one major third-party sender to a delegated subdomain with its own SPF and aligned DKIM.
- This month: establish ownership per sender/include, implement a lightweight validation pipeline (syntax + lookup budget), and add monitoring for DNS failures and unexpected include drift.
SPF doesn’t need to be perfect. It needs to be operable. Keep it under the lookup limit, keep it aligned with how mail actually leaves your systems, and stop treating DNS as a place to hide complexity. Your inboxes—and your sleep—will improve.