Email: SPF includes are a mess — how to simplify without breaking mail

October 11, 2025 • February 3, 2026 • Read: 24 min

SPF records tend to start as a tidy one-liner. Then marketing buys a new sender, support adds a ticketing tool, product ships “send from our domain” in an integration, and finance insists on invoice mail from a different provider. Six months later your SPF looks like a ransom note written in DNS.

The failure mode is always the same: someone adds one more include:, crosses the DNS-lookup limit, and mail delivery gets “creative.” Sometimes it’s a quiet drop into spam. Sometimes it’s a hard fail at big receivers. Either way, you’re debugging DNS at 2 a.m. while a VP forwards you screenshots of bounce messages.

Why SPF includes get ugly (and why it matters)

SPF is simple in spirit: publish which hosts are allowed to send mail for a domain. The trouble is how SPF expresses that simplicity: a policy language that can trigger DNS lookups during evaluation, and a hard ceiling on how many you’re allowed.

That ceiling is the center of most SPF pain: the “10 DNS-lookup limit.” If SPF evaluation needs more than 10 DNS lookups, the spec (RFC 7208) requires evaluators to return permerror (permanent error), and many receivers treat that as “this policy is broken, so we’re going to be skeptical.” Skeptical is email’s polite term for “your message is going to take a scenic route to the inbox, if it arrives at all.”

Here’s why includes turn into a mess:

Includes are transitive. Every include: can itself include other SPF records, each triggering more lookups. You don’t control what your vendor adds later.
Includes hide work. Your SPF record looks short, but the effective policy can be a small novella of DNS.
Third parties churn. Vendors change infrastructure. Their SPF changes. Sometimes they add more nested includes. You inherit their complexity.
You keep adding senders. Nobody deletes old tools. They just “pause campaigns.” Which is corporate for “we’ll be back in 18 months.”

Operationally, SPF is a dependency graph problem. Not a string problem. Treat it like dependency management: inventory, pin your requirements, reduce optional dependencies, and monitor for drift.

One short joke, as a palate cleanser: SPF records are like group chats: every new “include” feels harmless until your phone catches fire.

What “breaking mail” actually looks like

You rarely get a clean outage. You get a slow bleed:

New signups stop receiving verification email at certain ISPs.
Sales emails start landing in spam “all of a sudden” (translation: it’s been weeks).
DMARC reports show rising SPF failures from legitimate sources you forgot existed.
Support tickets arrive with bounce codes that vary by receiver and by mood.

SPF isn’t the only signal receivers use, but it’s a foundational one. If it’s broken, everything else has to work harder. And in email, “harder” means “less reliably.”

Interesting facts and small history

Nine small context points, because understanding where SPF came from makes the current mess feel… inevitable.

SPF predates modern cloud email sprawl. It was designed when “your outbound mail servers” were a small, mostly static set.
The 10-lookup limit is deliberate. It’s an anti-abuse and anti-DoS guardrail: receivers shouldn’t be forced into unbounded DNS work per message.
“Include” is effectively a conditional jump. include: says “if the sender passes that other policy, treat it as a pass here.” It’s not just a list merge.
SPF checks the envelope-from domain, not the From: header. That surprises people even in 2026, and it’s why DMARC exists to bridge identity alignment.
SPF can fail for forwarded mail. Classic forwarding often breaks SPF because the forwarding host isn’t authorized by the original domain’s SPF.
Receivers disagree on edge behavior. The spec is clear on many things, but real-world handling of permerror, temperror, and ambiguous DNS responses varies.
DNS TTL is an operational lever. Short TTLs help you roll changes quickly, but increase DNS traffic; long TTLs make mistakes linger.
“SPF flattening” became a cottage industry. Because vendor include chains became too deep, people started pre-resolving includes to concrete IPs—sometimes safely, sometimes not.
SPF is widely deployed because it’s cheap. Publishing a TXT record is easier than managing keys (DKIM) or enforcing alignment (DMARC), so SPF often becomes the “first and only” control.

A mental model: how SPF really evaluates

When a receiver evaluates SPF, it’s running a small program: it starts with the domain in the SMTP envelope (MAIL FROM) and asks DNS for that domain’s SPF policy (usually via a TXT record starting with v=spf1). Then it checks mechanisms left to right until one matches and returns an outcome.
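To make “a small program” concrete, here is a minimal sketch of that left-to-right loop in Python. It is a teaching toy, not a validator: it assumes a dict named records stands in for DNS, it handles only ip4, ip6, include, all, and redirect=, and the name check_host is borrowed from the spec’s terminology rather than from any particular library.

```python
import ipaddress

QUALIFIERS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def check_host(ip, domain, records, depth=0):
    """Evaluate a tiny SPF subset: ip4, ip6, include, all, redirect=.

    `records` is a plain dict standing in for DNS (domain -> SPF string).
    """
    if depth > 10:                       # crude guard against include loops
        return "permerror"
    terms = (records.get(domain) or "").split()
    if not terms or terms[0] != "v=spf1":
        return "none"

    addr = ipaddress.ip_address(ip)
    redirect = None
    for term in terms[1:]:
        if term.startswith("redirect="):
            redirect = term.split("=", 1)[1]
            continue
        qualifier = "+"                  # default qualifier when none is written
        if term[0] in QUALIFIERS:
            qualifier, term = term[0], term[1:]
        name, _, arg = term.partition(":")

        if name == "all":
            matched = True
        elif name in ("ip4", "ip6"):
            net = ipaddress.ip_network(arg, strict=False)
            matched = addr.version == net.version and addr in net
        elif name == "include":
            inner = check_host(ip, arg, records, depth + 1)
            if inner in ("none", "permerror"):
                return "permerror"       # a broken include breaks the whole policy
            matched = inner == "pass"    # fail/softfail/neutral: simply no match
        else:
            return "permerror"           # anything outside this sketch's subset

        if matched:
            return QUALIFIERS[qualifier]

    if redirect is not None:             # only consulted when nothing matched
        result = check_host(ip, redirect, records, depth + 1)
        return "permerror" if result == "none" else result
    return "neutral"                     # default result when nothing matches
```

The detail worth noticing: a -all inside an included record does not end the outer evaluation. A non-pass result from include only means “no match here,” and the outer record keeps going.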

The mechanisms that matter for includes

include: causes the evaluator to fetch and evaluate the included domain’s SPF. That costs DNS lookups (at least one TXT query, plus whatever the included record triggers).
a and mx can trigger lookups too. If your record uses mx, SPF evaluation needs to query MX, then resolve A/AAAA for each MX target.
exists triggers lookups, and it can be abused. Treat it as suspicious unless you have a very good reason.
ptr is deprecated for good reasons (performance and spoofing). If you still have it, you are carrying historical debt with interest.
ip4/ip6 are “free” from the DNS-lookup accounting perspective. They don’t require lookups during evaluation. They can still be operationally expensive to maintain.
redirect= is like “replace my entire policy with this other domain’s policy.” It can be useful to centralize SPF for subdomains.

Why the lookup limit bites harder than you expect

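The limit counts the mechanisms and modifiers that cause DNS queries: include, a, mx, ptr, exists, and redirect. Each one counts once, including the ones nested inside includes. In actual DNS work, what feels like “one include” may be “one include plus five MX plus eight A/AAAA resolves.” The address lookups behind an mx term fall under a separate cap in RFC 7208 (at most 10 per mx term), but they still cost time and can still fail.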

Also, SPF evaluation can involve both A and AAAA lookups. Even if you don’t use IPv6 intentionally, your DNS might publish AAAA records or your vendor might, and some evaluators query both.
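For a first-pass budget, it helps to see which terms in a record are the expensive ones. The sketch below is a hedged illustration in Python: lookup_terms is a made-up helper name, it looks at one record only (it does not follow includes), and it classifies terms purely by name.

```python
LOOKUP_MECHANISMS = {"include", "a", "mx", "ptr", "exists"}


def lookup_terms(record):
    """Return the terms in one SPF record that count toward the 10-lookup limit."""
    costly = []
    for term in record.split()[1:]:          # skip the v=spf1 version tag
        if term.lower().startswith("redirect="):
            costly.append(term)
            continue
        bare = term.lstrip("+-~?")           # drop the qualifier, if any
        # The mechanism name ends at ':' (argument) or '/' (CIDR suffix, e.g. a/24).
        name = bare.replace("/", ":").split(":", 1)[0].lower()
        if name in LOOKUP_MECHANISMS:
            costly.append(term)
    return costly


record = "v=spf1 mx a:mail.example.com ip4:192.0.2.0/24 include:_spf.example.net ~all"
print(lookup_terms(record))
# ['mx', 'a:mail.example.com', 'include:_spf.example.net']
```

Three of ten spent before the include has even been expanded, which is the part that usually hurts.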

Opinionated take: if you can’t explain your SPF record’s lookup count on a whiteboard from memory, you don’t own it. You’re renting it from chance.

One reliability quote (paraphrased)

Paraphrased idea (Werner Vogels, AWS): “Design so failure is expected, and build systems that keep working when parts fail.” SPF includes are parts.

Fast diagnosis playbook

This is what you do when deliverability is on fire and someone just pasted a bounce message into Slack.

First: identify which identity is failing

Check the MAIL FROM / Return-Path domain in the failing message headers (or the sender system config). SPF evaluates that, not the visible From.
Confirm the sending IP from the Received headers or your MTA logs.
Ask: is this a new sender or a forwarded path? New sender suggests SPF inventory gap; forwarding suggests SPF will fail by design and you’ll need DKIM/DMARC strategy.

Second: determine whether it’s “policy wrong” or “policy broken”

Query the SPF TXT record and confirm there’s exactly one SPF policy for that domain.
Count DNS lookups (or at least identify the include chain depth). If you’re near or above 10, assume permerror in the field.
Check for DNS issues: NXDOMAIN, SERVFAIL, timeouts, or split-horizon surprises.

Third: choose the least risky mitigation

If lookup limit is the issue: remove nonessential includes, or move a sender to a subdomain with its own SPF, or temporarily authorize a concrete IP range with ip4/ip6 (with a tight change ticket to remove it later).
If the sender IP is simply missing: add the correct authorization, but only after verifying the system truly sends with that domain’s envelope-from.
If this is a forwarding break: don’t “fix” SPF by authorizing the forwarder; it’s whack-a-mole. Use DKIM/DMARC and/or ARC where applicable.

One rule: don’t “optimize” while you’re diagnosing. Stabilize first, then simplify.

Hands-on tasks: commands, outputs, and decisions

These are real operator tasks. Each one includes a command, an example output, what it means, and the decision you make next. Run them from a Linux host with standard tooling. If you don’t have these tools, install them on a jump box, not on your mail servers in the middle of an incident.

Task 1: Fetch the SPF record (and see if you have multiple)

cr0x@server:~$ dig +short TXT example.com
"v=spf1 include:_spf.google.com include:spf.protection.outlook.com -all"
"google-site-verification=abc123"

What it means: There is exactly one SPF policy string (the one starting with v=spf1). Good.

Decision: If you see more than one v=spf1 string returned, you must consolidate; multiple SPF records commonly cause permerror.
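If you want this check in a script rather than in your eyeballs, a minimal Python sketch follows. It assumes you already have the TXT values as a list of strings (for example, the lines printed by dig +short); spf_records is a hypothetical helper name, and any record split into several quoted strings is assumed to have been joined first.

```python
def spf_records(txt_answers):
    """Pick out SPF policies from a list of TXT record values.

    Each value is the record's text, with any split strings already joined.
    """
    found = []
    for value in txt_answers:
        text = value.strip().strip('"')
        lowered = text.lower()
        # A record is SPF only if the version tag is exactly "v=spf1".
        if lowered == "v=spf1" or lowered.startswith("v=spf1 "):
            found.append(text)
    return found


answers = [
    '"v=spf1 include:_spf.google.com include:spf.protection.outlook.com -all"',
    '"google-site-verification=abc123"',
]
print(len(spf_records(answers)))  # 1
```

Matching on the exact version tag is a deliberate choice: a plain substring search would also count unrelated TXT records that merely mention v=spf1 somewhere in their text.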

Task 2: Confirm the envelope-from domain in a message (local MTA)

cr0x@server:~$ postcat -q 1A2B3C4D | sed -n '1,40p'
*** ENVELOPE RECORDS ***
message_arrival_time: Fri Jan  2 12:01:09 2026
original_recipient: user@example.net
sender: bounce@mailer.example.com
*** MESSAGE CONTENTS ***
Return-Path: <bounce@mailer.example.com>
From: "Example Billing" <billing@example.com>
To: user@example.net
Subject: Invoice

What it means: SPF will be evaluated against mailer.example.com (Return-Path / sender), not example.com in the visible From.

Decision: Audit SPF for the envelope domain that actually sends, not the one marketing thinks it uses.
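The same comparison can be pulled out of a saved message with Python’s standard email module. This is a sketch under assumptions: auth_domains is a hypothetical helper, and it trusts the Return-Path header to reflect the SMTP MAIL FROM, which is normally true only for a message as stored at final delivery.

```python
from email import message_from_string
from email.utils import parseaddr


def auth_domains(raw_message):
    """Return (envelope_domain, from_domain) taken from a message's headers."""
    msg = message_from_string(raw_message)
    _, return_path = parseaddr(msg.get("Return-Path", ""))
    _, from_addr = parseaddr(msg.get("From", ""))
    # Keep only the part after the last '@'; empty string if the header is absent.
    envelope_domain = return_path.rpartition("@")[2].lower()
    from_domain = from_addr.rpartition("@")[2].lower()
    return envelope_domain, from_domain


raw = (
    "Return-Path: <bounce@mailer.example.com>\n"
    'From: "Example Billing" <billing@example.com>\n'
    "To: user@example.net\n"
    "Subject: Invoice\n"
    "\n"
    "Hello\n"
)
print(auth_domains(raw))  # ('mailer.example.com', 'example.com')
```

When the two values differ, the first one is the domain whose SPF record you should be reading.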

Task 3: Rule out connectivity and TLS problems via OpenSSL (SMTP banner sanity)

cr0x@server:~$ openssl s_client -starttls smtp -crlf -connect mx1.receiver.net:25 </dev/null | sed -n '1,15p'
CONNECTED(00000003)
depth=2 C=US, O=Internet Security Research Group, CN=ISRG Root X1
verify return:1
depth=1 C=US, O=Let's Encrypt, CN=R3
verify return:1
depth=0 CN=mx1.receiver.net
verify return:1
250 mx1.receiver.net ESMTP ready

What it means: Not SPF directly, but you’re confirming you can reach a receiver and TLS negotiation isn’t the real issue during tests.

Decision: If connectivity/TLS fails, fix that before blaming SPF. Deliverability “outages” sometimes start with your own network.

Task 4: Inspect SPF includes recursively (basic)

cr0x@server:~$ dig +short TXT _spf.google.com
"v=spf1 ip4:74.125.0.0/16 ip4:173.194.0.0/16 include:_netblocks.google.com include:_netblocks2.google.com include:_netblocks3.google.com ~all"

What it means: That single include expands into three more includes plus multiple IP ranges.

Decision: Start building an include tree and a lookup count. If you’re already including two big providers, you’re likely close to the limit.
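Walking the tree by hand gets old quickly, so here is a hedged Python sketch of the recursion. It assumes you supply get_spf, a function that maps a domain to its SPF string (or None); in the example that is just a dict’s .get with invented vendor names, not real DNS. It reports a worst case: a real evaluator stops at the first match, so the count for a given sender can be lower.

```python
LOOKUP_MECHANISMS = {"a", "mx", "ptr", "exists"}


def count_lookups(domain, get_spf, path=()):
    """Worst-case count of lookup-triggering terms across an include tree.

    `get_spf` is caller-supplied: domain -> SPF record string, or None.
    `path` holds the ancestors of `domain`, used only to break include loops.
    """
    if domain in path:
        return 0
    record = get_spf(domain)
    if record is None:
        return 0
    total = 0
    for term in record.split()[1:]:
        bare = term.lstrip("+-~?").lower()
        if bare.startswith("include:"):
            total += 1 + count_lookups(bare[8:], get_spf, path + (domain,))
        elif bare.startswith("redirect="):
            total += 1 + count_lookups(bare[9:], get_spf, path + (domain,))
        elif bare.replace("/", ":").split(":", 1)[0] in LOOKUP_MECHANISMS:
            total += 1
    return total


# Invented records for illustration; none of these are real vendor policies.
records = {
    "example.com": "v=spf1 include:_spf.vendor-a.example include:spf.vendor-b.example -all",
    "_spf.vendor-a.example": "v=spf1 include:_nb1.vendor-a.example include:_nb2.vendor-a.example ~all",
    "_nb1.vendor-a.example": "v=spf1 ip4:192.0.2.0/24 ~all",
    "_nb2.vendor-a.example": "v=spf1 ip4:198.51.100.0/24 ~all",
    "spf.vendor-b.example": "v=spf1 mx ip4:203.0.113.0/24 -all",
}
print(count_lookups("example.com", records.get))  # 5
```

Two includes at the top, five lookups in practice. Swapping records.get for a function backed by a real resolver is left to whatever DNS library you already trust.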

Task 5: Count how many SPF strings exist (guard against accidental duplication)

cr0x@server:~$ dig +short TXT example.com | grep -c "v=spf1"
1

What it means: One SPF record. Great.

Decision: If output is 2 or more, consolidate into one. Delete the extras. Don’t try to “split” SPF across records; SPF doesn’t work that way.

Task 6: Check if a subdomain already has its own SPF record

cr0x@server:~$ dig +short TXT mailer.example.com
"v=spf1 include:spf.vendor-mail.com -all"

What it means: The vendor already supports sending from a delegated subdomain with separate SPF.

Decision: Prefer moving vendor senders to their own subdomain SPF rather than bloating the apex domain.

Task 7: Find MX records (and realize `mx` can cost you lookups)

cr0x@server:~$ dig +short MX example.com
10 mx01.mailhost.example.net.
20 mx02.mailhost.example.net.

What it means: If your SPF record uses mx, evaluation may resolve these names and their A/AAAA records. That’s multiple lookups.

Decision: Avoid mx in SPF unless you truly send outbound from your inbound MX hosts (rare in modern setups).

Task 8: Resolve A and AAAA for MX targets (lookup budgeting)

cr0x@server:~$ dig +short A mx01.mailhost.example.net
203.0.113.10
203.0.113.11

cr0x@server:~$ dig +short AAAA mx01.mailhost.example.net
2001:db8:10::10

What it means: Multiple addresses means additional resolution work for evaluators. Some will query both A and AAAA.

Decision: If you were planning to use mx or a in SPF, consider explicit ip4/ip6 instead for predictability (with a plan to maintain it).

Task 9: Measure DNS response time and detect flaky resolution

cr0x@server:~$ dig example.com TXT +stats | tail -n 5
;; Query time: 142 msec
;; SERVER: 10.0.0.53#53(10.0.0.53) (UDP)
;; WHEN: Fri Jan  2 12:20:11 UTC 2026
;; MSG SIZE  rcvd: 232

What it means: 142 ms is not terrible, but if your includes require many lookups, latency compounds. If you see seconds, you have a DNS problem.

Decision: If DNS is slow or failing, SPF evaluation becomes unreliable. Fix DNS health (authoritative servers, resolvers, timeouts) before “simplifying” SPF.

Task 10: Check authoritative vs recursive answers (split-horizon traps)

cr0x@server:~$ dig @ns1.example-dns.net example.com TXT +norecurse +short
"v=spf1 include:_spf.google.com include:spf.protection.outlook.com -all"

What it means: You’re seeing what the authoritative server publishes. If this differs from your internal resolver answer, you have split-horizon DNS or caching issues.

Decision: Ensure the public authoritative record is correct; receivers use public DNS, not your internal view.

Task 11: Spot a broken include target (NXDOMAIN/SERVFAIL)

cr0x@server:~$ dig +short TXT spf.dead-vendor.example

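What it means: Empty output means no TXT data came back, but +short hides why. Rerun without +short and read the status field to tell NXDOMAIN from SERVFAIL from an empty NOERROR answer. Under RFC 7208, an include whose target has no SPF record makes the whole evaluation return permerror, while a SERVFAIL or timeout yields temperror; receivers then differ in how harshly they treat either result.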

Decision: Remove dead includes immediately. If you still need that sender, replace the include with their current supported domain or explicit IPs.

Task 12: Detect oversize SPF TXT strings (DNS limits and quoting)

cr0x@server:~$ dig +short TXT example.com | awk 'length($0) {print length($0), $0}'
72 "v=spf1 include:_spf.google.com include:spf.protection.outlook.com -all"
33 "google-site-verification=abc123"

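What it means: Both records fit comfortably in one string here (the counts include the two quote characters). A single TXT character-string is limited to 255 bytes, so longer records get split into multiple quoted strings, which SPF evaluators concatenate with no separator; some tooling and humans mis-handle this. Large SPF records can also exceed practical limits and get mangled.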

Decision: Keep SPF compact. If you’re nearing size limits or seeing split strings, that’s a smell: simplify by architecture, not by cramming more text.
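The splitting rule is easy to get subtly wrong by hand, so here is a small Python sketch of it. The helper names split_txt and join_txt are invented for illustration; the 255-byte limit per character-string and the concatenate-without-separator behavior are the parts that come from the DNS and SPF specs.

```python
def split_txt(record, limit=255):
    """Split an SPF record into TXT character-strings of at most `limit` bytes."""
    data = record.encode("ascii")
    return [data[i:i + limit].decode("ascii") for i in range(0, len(data), limit)]


def join_txt(strings):
    """Rebuild the record the way SPF evaluators do: concatenate, no separator."""
    return "".join(strings)


# A deliberately oversized record built from documentation addresses.
record = "v=spf1 " + " ".join("ip4:192.0.2.%d" % i for i in range(40)) + " -all"
chunks = split_txt(record)
print(len(chunks), join_txt(chunks) == record)  # 3 True
```

The classic human error is splitting at a term boundary and dropping the space, which glues two terms together after concatenation. Slicing the raw bytes keeps every space where it was.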

Task 13: Validate that a given IP would pass SPF (local check with spfquery)

cr0x@server:~$ spfquery -ip 198.51.100.77 -sender bounce@mailer.example.com -helo mailer.example.com
pass

What it means: For this envelope domain and IP, SPF should pass.

Decision: If it fails, fix SPF or fix the sender configuration. If it passes locally but fails at receivers, suspect DNS propagation, lookup limits, or resolver differences.

Task 14: Check DMARC alignment clues (to avoid “fixing” the wrong layer)

cr0x@server:~$ dig +short TXT _dmarc.example.com
"v=DMARC1; p=quarantine; rua=mailto:dmarc@reports.example.com; adkim=s; aspf=s"

What it means: Strict alignment is required for both DKIM and SPF. If your envelope domain differs from From domain, SPF may pass but still not align.

Decision: When simplifying SPF, keep identity architecture in mind: align senders to the correct subdomain or ensure DKIM aligns.
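To see why SPF can pass and still not help DMARC, here is a rough Python sketch of the alignment check. Loud assumption: naive_org_domain treats the last two labels as the organizational domain, which is wrong for suffixes like co.uk; real evaluators use the Public Suffix List. All function names are hypothetical.

```python
def parse_dmarc(record):
    """Parse a DMARC TXT value into a dict of tag -> value."""
    tags = {}
    for part in record.split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            tags[key.strip().lower()] = value.strip()
    return tags


def naive_org_domain(domain):
    # ASSUMPTION: organizational domain = last two labels. Real evaluators
    # consult the Public Suffix List, so this breaks for e.g. example.co.uk.
    return ".".join(domain.lower().split(".")[-2:])


def spf_aligned(envelope_domain, from_domain, dmarc_record):
    """True if the SPF identity aligns with the From domain under aspf."""
    mode = parse_dmarc(dmarc_record).get("aspf", "r")   # relaxed is the default
    if mode == "s":
        return envelope_domain.lower() == from_domain.lower()
    return naive_org_domain(envelope_domain) == naive_org_domain(from_domain)


strict = "v=DMARC1; p=quarantine; rua=mailto:dmarc@reports.example.com; adkim=s; aspf=s"
print(spf_aligned("mailer.example.com", "example.com", strict))       # False
print(spf_aligned("mailer.example.com", "example.com", "v=DMARC1; p=none"))  # True
```

With the strict policy above, the mailer.example.com envelope from Task 2 does not align with a From of example.com, so that message depends entirely on aligned DKIM.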

Task 15: Check recent mail logs for “SPF permerror” patterns (Postfix example)

cr0x@server:~$ sudo grep -i "spf" /var/log/mail.log | tail -n 8
Jan  2 12:12:41 mx postfix/smtpd[22191]: warning: SPF: permerror (too many DNS lookups) identity=mailfrom; client-ip=198.51.100.77; helo=mailer.example.com; envelope-from=bounce@mailer.example.com; receiver=mx
Jan  2 12:13:02 mx postfix/smtpd[22191]: warning: SPF: fail identity=mailfrom; client-ip=203.0.113.55; envelope-from=noreply@example.com

What it means: You have both a lookup-limit breach and a straightforward fail for another identity. Two different problems, same week. Classic.

Decision: Prioritize permerror first; it can tank deliverability even for legitimate traffic. Then handle individual missing senders.

Task 16: Verify TXT changes propagate (watch TTL and caching)

cr0x@server:~$ dig example.com TXT +noall +answer
example.com.  300 IN TXT "v=spf1 include:_spf.google.com include:spf.protection.outlook.com -all"

What it means: TTL is 300 seconds (5 minutes). That’s incident-friendly.

Decision: For planned migrations, lower TTL a day ahead. For stable operations, raise it to reduce resolver load and improve cache hit rates.

Simplification strategies that don’t blow up production

The goal is not “short SPF.” The goal is “predictable SPF that stays under limits, survives vendor changes, and matches how your mail actually flows.” Short is a side effect.

1) Stop treating the apex domain SPF as a junk drawer

Put third-party senders on subdomains whenever possible. This is the single highest-leverage move. Examples:

mailer.example.com for marketing automation
support.example.com for ticketing
notify.example.com for product notifications

Then you publish separate SPF records per subdomain. Your apex SPF stays lean: only the infrastructure that truly sends mail as @example.com.

Tradeoff: You must ensure visible From domains and DKIM align with your DMARC policy, or you’ll fix SPF and still fail DMARC.

2) Prefer `redirect=` for consistency across a fleet of subdomains

If you own many subdomains that should share the same SPF, centralize it:

v=spf1 redirect=_spf.example.com
And then define the real policy at _spf.example.com

This doesn’t reduce lookups by magic (the redirect= itself costs one), but it reduces administrative mistakes. It’s a control-plane simplification, not a data-plane one.

3) Be ruthless about removing stale senders

Most orgs keep includes for vendors they stopped using years ago. Includes cost you lookup budget and increase the blast radius of vendor changes.

Operational move: require an “owner” for each include with a quarterly re-attestation. No owner, no include.

4) Avoid `mx` and `a` mechanisms unless you really mean them

mx in SPF is a classic legacy trick: “if it can receive mail for us, it can send mail for us.” In modern environments, inbound and outbound are separate. Using mx in SPF often authorizes hosts that should never send mail as you.

Recommendation: If you have a small, stable outbound pool, use explicit ip4/ip6. If you don’t, use subdomains and vendor includes. Don’t use mx as a lazy shortcut.

5) Flattening: useful, risky, sometimes necessary

“Flattening” means replacing includes with the resulting IP ranges (ip4/ip6) so SPF evaluation doesn’t need as many DNS lookups. This can be effective for staying under the 10-lookup limit.

But flattening has sharp edges:

Vendor IP ranges can change without notice. Your flattened record goes stale, and you fail legitimate mail.
Large IP lists make TXT records long and more error-prone to edit.
You may accidentally authorize more than intended if you flatten too broadly.

When flattening is acceptable: for senders with published, stable IP ranges and a change control process on your side to refresh them regularly. Treat it like updating firewall rules: scheduled, reviewed, and monitored.
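The “refresh regularly” part is where flattening lives or dies. A minimal Python sketch of the comparison step follows, assuming you already have the vendor’s current ranges as a list of CIDR strings from somewhere you trust; flatten_drift and flattened_networks are invented names. It compares networks exactly, so a vendor /24 that happens to sit inside a /16 you publish will still show up as “missing.”

```python
import ipaddress


def flattened_networks(record):
    """Collect the ip4/ip6 networks listed in a flattened SPF record."""
    nets = set()
    for term in record.split()[1:]:
        bare = term.lstrip("+-~?").lower()
        if bare.startswith(("ip4:", "ip6:")):
            nets.add(ipaddress.ip_network(bare[4:], strict=False))
    return nets


def flatten_drift(published_record, current_vendor_ranges):
    """Compare a flattened record with the vendor's current ranges."""
    published = flattened_networks(published_record)
    current = {ipaddress.ip_network(r, strict=False) for r in current_vendor_ranges}
    return {
        "missing": sorted(str(n) for n in current - published),  # legitimate mail fails
        "stale": sorted(str(n) for n in published - current),    # you over-authorize
    }


published = "v=spf1 ip4:192.0.2.0/24 ip4:198.51.100.0/24 -all"
print(flatten_drift(published, ["192.0.2.0/24", "203.0.113.0/24"]))
# {'missing': ['203.0.113.0/24'], 'stale': ['198.51.100.0/24']}
```

A non-empty “missing” list is the one to page on; “stale” is cleanup you can schedule.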

Second short joke: Flattening SPF is like meal-prepping: it saves time during the week, until you forget it in the fridge and it becomes a science project.

6) Make SPF a product: version it, test it, and stage it

SPF changes should go through the same discipline as production config:

Keep the record in git (as text, plus an explanation comment in the repo).
Have a test harness that counts lookups and validates syntax.
Use a staged rollout via lower TTL, then deploy, then monitor DMARC reports and bounce patterns.
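A test harness does not need to be elaborate to catch the embarrassing mistakes. The Python sketch below is one possible starting point, not a full validator: lint_spf is a hypothetical function, it checks a handful of structural rules only, and it does not resolve anything in DNS.

```python
def lint_spf(record):
    """Return a list of human-readable problems found in one SPF record."""
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return ["record does not start with v=spf1"]

    problems = []
    # Reduce each term to its mechanism name; modifiers keep their name=value form.
    names = [t.lstrip("+-~?").replace("/", ":").split(":", 1)[0].lower()
             for t in terms[1:]]

    if "ptr" in names:
        problems.append("ptr is deprecated")
    if any(t.lower() in ("all", "+all") for t in terms[1:]):
        problems.append("+all authorizes every sender")
    if "all" in names:
        after_all = names[names.index("all") + 1:]
        if any("=" not in n for n in after_all):
            problems.append("mechanisms after all are never evaluated")
        if any(n.startswith("redirect=") for n in names):
            problems.append("redirect= is ignored when all is present")
    return problems


print(lint_spf("v=spf1 include:_spf.example.net -all"))  # []
print(lint_spf("v=spf1 -all ip4:192.0.2.1"))
# ['mechanisms after all are never evaluated']
```

Run something like this in CI against the record stored in git, and fail the build on a non-empty list.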

7) Align identity architecture with DMARC, not against it

Teams often try to fix deliverability by stuffing more into SPF, when the real problem is identity mismatch. If your DMARC policy is strict, you must ensure either:

SPF passes and aligns (envelope domain aligns with From domain), or
DKIM passes and aligns (d= domain aligns with From domain)

That frequently pushes you toward “each sender uses its own subdomain, with aligned DKIM,” which also happens to reduce SPF complexity at the apex. Funny how good architecture solves multiple problems.

Three corporate mini-stories from the SPF trenches

Mini-story 1: The incident caused by a wrong assumption

A mid-size SaaS company ran outbound mail from three places: Google Workspace for humans, a transactional provider for product email, and a marketing platform for campaigns. Their SPF at the apex was already heavy, but it still passed most of the time, which is email’s way of encouraging bad behavior.

A new integration shipped: “Send invitations from your company domain.” Engineering assumed that meant adding a DKIM key and signing as example.com. Reasonable assumption. The integration team configured the vendor to use bounce@vendor.example.net as the envelope sender because it was the default, and nobody asked what “SPF identity” would be evaluated against.

Two weeks later, customer invites started bouncing at a large receiver. The bounce reason referenced SPF failures for vendor.example.net, not example.com. Support escalated to SRE. SRE stared at the apex SPF record for ten minutes before realizing it was irrelevant to this failure. Wrong domain. Wrong identity.

The fix wasn’t “add another include.” The fix was to move the vendor envelope sender to mailer.example.com and publish a dedicated SPF for that subdomain. DKIM alignment was updated to match. The incident ended quickly, but the postmortem was blunt: you can’t reason about email auth by looking at the From header and vibes.

Mini-story 2: The optimization that backfired

An enterprise IT team decided to “clean up DNS.” They noticed their SPF record used include: for a provider that, on inspection, expanded to lots of ip4 ranges. They’d heard of SPF flattening, and it sounded like a performance win. So they flattened it manually.

At first, it looked great: fewer DNS lookups, shorter include chain, fewer intermittent temperror tickets from receivers with strict timeouts. They declared victory and moved on. A month later, a slow trickle of “password reset never arrived” reports started showing up, from a subset of consumer mail providers.

The vendor had added new sending IPs. The include record updated. The flattened record didn’t. The mail didn’t fail everywhere; it failed just enough to be expensive and hard to prove. Nobody wanted to roll back because the flattened SPF had been presented as a security improvement. It wasn’t a lie, but it wasn’t the whole truth either.

They eventually rebuilt the approach: flattening was automated with scheduled refresh, changes were diffed and reviewed, and the list was constrained to only the vendor services actually in use. The backfire wasn’t flattening itself. It was flattening without treating it like a living dependency.

Mini-story 3: The boring but correct practice that saved the day

A global company had a reputation for being “slow” about email changes. Every SPF update required an owner, a ticket, a rollback plan, and a 24-hour observation period. People complained. Marketing complained louder. This is the natural order of things.

One morning, a well-known email vendor had a DNS outage. Their SPF include domain intermittently returned SERVFAIL. Many customers saw sporadic temperror during SPF evaluation. Some receivers deferred mail. Some treated it as suspicious. Chaos, but polite chaos.

This company had two advantages: (1) their apex SPF had only one third-party include, because almost all vendor mail used delegated subdomains, and (2) they had monitoring that sampled SPF resolution from multiple public resolvers and alerted on SERVFAIL and lookup count anomalies.

The response was boring: temporarily route critical transactional mail through a secondary provider already authorized on a subdomain, and wait out the vendor’s DNS recovery. No panicked edits to the apex SPF. No “just add this include” at 9 a.m. The post-incident review was equally boring: they documented the vendor dependency and kept the architecture. Boring, in production systems, is a compliment.

Common mistakes: symptom → root cause → fix

1) Symptom: intermittent failures, sometimes “permerror”

Root cause: too many DNS lookups due to nested includes, plus mx/a mechanisms inside include chains.

Fix: reduce dependencies: move vendors to subdomains, remove stale includes, avoid mx/a in SPF, consider managed flattening with refresh.

2) Symptom: “SPF fail” but you swear the From domain is correct

Root cause: you’re looking at the From header; SPF is checking the envelope-from domain (Return-Path).

Fix: inspect headers/logs to find MAIL FROM. Fix the envelope domain configuration or publish SPF for the actual envelope domain.

3) Symptom: SPF suddenly breaks after “a small DNS cleanup”

Root cause: multiple SPF records (multiple v=spf1 TXT strings) exist, or someone edited quoting/spacing and broke parsing.

Fix: publish exactly one SPF policy per domain; validate with tooling; keep SPF in source control.

4) Symptom: mail forwarding causes SPF failures

Root cause: forwarders send from their IPs; original domain’s SPF doesn’t authorize them.

Fix: don’t authorize the world. Use DKIM signing and DMARC alignment; for receivers/forwarders that support it, ARC can preserve auth results.

5) Symptom: SPF passes but DMARC fails

Root cause: SPF passes for an envelope domain that doesn’t align with the From domain under DMARC (especially with strict alignment).

Fix: align identities: use a matching envelope domain (often a subdomain of the From domain) and/or ensure DKIM d= aligns with From.

6) Symptom: SPF works for IPv4 senders, fails mysteriously for some receivers

Root cause: you’re sending over IPv6 from some hosts, but SPF only authorizes IPv4, or the include chain behaves differently with AAAA queries and timeouts.

Fix: inventory actual sending IPs including IPv6; authorize ip6: as needed; ensure outbound MTAs aren’t leaking unexpected IPv6 paths.
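A quick way to run that inventory against a record is sketched below in Python. It assumes you have collected the sending IPs from your own logs, and it only checks literal ip4/ip6 terms; anything authorized through include, a, or mx needs DNS and is ignored here. The name uncovered_ips is made up for the example.

```python
import ipaddress


def uncovered_ips(sending_ips, record):
    """Return the sending IPs not matched by any literal ip4/ip6 term.

    Only literal ranges are checked; include/a/mx terms need DNS and are skipped.
    """
    nets = []
    for term in record.split()[1:]:
        bare = term.lstrip("+-~?").lower()
        if bare.startswith(("ip4:", "ip6:")):
            nets.append(ipaddress.ip_network(bare[4:], strict=False))
    missing = []
    for ip in sending_ips:
        addr = ipaddress.ip_address(ip)
        if not any(addr.version == n.version and addr in n for n in nets):
            missing.append(ip)
    return missing


seen_in_logs = ["203.0.113.10", "2001:db8:10::10"]
print(uncovered_ips(seen_in_logs, "v=spf1 ip4:203.0.113.0/24 -all"))
# ['2001:db8:10::10']
```

An IPv6 address in that output is the usual smoking gun: the host has been quietly sending over v6 the whole time.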

7) Symptom: vendor mail fails only for some of their regions

Root cause: you flattened SPF and missed new vendor IP pools, or the vendor has multiple SPF include domains for different products and you included the wrong one.

Fix: revert to vendor include if you can afford the lookup budget, or implement automated flatten refresh with review; verify you’re using the correct SPF include for the product tier.

8) Symptom: receivers report “temperror” in SPF

Root cause: DNS timeouts, SERVFAIL, or flaky authoritative servers for either your domain or an include target.

Fix: harden DNS (authoritative redundancy, resolver health), reduce lookup count to reduce time spent in DNS, monitor from multiple public resolvers.

Checklists / step-by-step plan

Step-by-step: simplify SPF includes safely

Inventory senders. List every system that sends mail using your brand: human mailboxes, transactional, marketing, support, finance, monitoring, CI, and “product invites.” Tie each to an owner.
For each sender, capture identities. From domain, envelope-from domain, DKIM d= domain, sending IP ranges (v4/v6), and whether it supports custom return-path.
Draw the current SPF dependency graph. Apex SPF plus all includes, and the includes inside those includes. Count estimated DNS lookups.
Decide domain strategy. Which senders truly need to send as @example.com? Everything else goes to delegated subdomains.
Move one sender at a time to a subdomain. Configure vendor to use bounce@mailer.example.com (or similar), publish SPF for that subdomain, ensure DKIM aligns.
Keep apex SPF minimal. Usually: corporate mail provider + primary transactional provider, and only if they must use the apex domain.
Remove stale includes. If a tool is “not active,” remove it. If someone complains later, they can re-authorize intentionally with proper design.
Lower TTL ahead of big changes. Do this at least a day before if you can. Raise TTL back after stabilization.
Validate syntax and lookup budget. Run SPF evaluation tests for representative sender IPs and identities.
Roll out and observe. Watch DMARC aggregate reports and inbound bounces. Expect lag due to caches and reporting cadence.
Document the contract. For each include/subdomain, write down: owner, vendor, why it exists, what it authorizes, and how to rotate it.
Monitor for drift. Includes can change under you. Track lookup count and DNS health as a continuous signal.

Operational checklist: before you add any new `include:`

Do we actually need apex-domain sending, or can this be a subdomain?
What is the vendor’s envelope-from and does it support a custom return-path?
What’s our current lookup budget and what will this include cost when expanded?
Who owns this sender and will re-attest quarterly?
Do we have DKIM alignment that will satisfy DMARC?
What’s the rollback if this include causes permerror?

Emergency checklist: “We hit permerror”

Confirm permerror (too many DNS lookups) in logs or receiver feedback.
Identify the last SPF change (diff your DNS records).
Temporarily remove the least critical include(s) to get under the limit.
If removal breaks a business-critical sender, move it to a subdomain with its own SPF as the real fix.
After stabilization, perform a full sender inventory and simplify properly.

FAQ

1) Why is there a “10 DNS lookup” limit in SPF?

Because receivers must evaluate SPF for huge volumes of mail. Without a limit, SPF could be abused to force excessive DNS work per message, turning mail reception into a DNS stress test.

2) Does adding `ip4:` entries cost DNS lookups?

No. ip4/ip6 mechanisms don’t require DNS queries during SPF evaluation. They’re cheap for receivers, but you pay the maintenance cost.

3) Are `a` and `mx` “bad” in SPF?

Not inherently, but they are often used lazily and can balloon lookups. They also tend to authorize more hosts than you intended. In 2026, explicit IPs or vendor subdomains are usually safer.

4) What’s the difference between `include:` and `redirect=`?

include: says “if the included policy passes, treat it as a match here” and evaluation continues if it doesn’t. redirect= says “use that other domain’s policy as the policy for this domain” (typically as the final modifier).

5) Can I have multiple SPF TXT records to avoid length or complexity?

No. Publishing multiple v=spf1 records is a common mistake and often yields permerror. You need exactly one SPF policy per domain.

6) If we use DMARC, do we still need SPF?

Yes. DMARC relies on SPF and/or DKIM passing and aligning. SPF is still a primary input, and many receivers use SPF results beyond DMARC evaluation.

7) Why does forwarding break SPF, and what do we do about it?

Forwarders re-send from their own IPs, which aren’t in your SPF. The durable fix is DKIM signing with alignment, plus DMARC. In some ecosystems, ARC can help preserve authentication across forwarding.

8) Should we use `~all` or `-all`?

Use -all when you’re confident your inventory is correct and you’re enforcing. Use ~all during migrations if you need telemetry and want to reduce hard failures. But don’t leave ~all forever; it becomes permanent indecision in DNS.

9) What’s the best way to simplify SPF without flattening?

Delegate. Move third-party mail to subdomains with dedicated SPF and aligned DKIM. Keep the apex SPF for only the few systems that must send as the apex domain.

10) How do we monitor SPF includes drifting over time?

Periodically resolve your SPF record, walk includes, and track lookup count and DNS failures from multiple resolvers. Alert on changes, especially new nested includes or new mechanisms like exists.
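One way to turn “alert on changes” into something runnable is to snapshot the whole tree and diff it against the previous run. The Python sketch below assumes a caller-supplied get_spf function (domain to SPF string, or None); the example uses dicts with invented vendor names in place of real DNS, and snapshot and drift are hypothetical names.

```python
def snapshot(domain, get_spf, path=()):
    """Collect every (domain, term) pair reachable through includes/redirects."""
    if domain in path:                       # include loop; stop descending
        return set()
    record = get_spf(domain)
    if record is None:
        return {(domain, "<no spf record>")}
    found = set()
    for term in record.split()[1:]:
        found.add((domain, term))
        bare = term.lstrip("+-~?").lower()
        if bare.startswith("include:"):
            found |= snapshot(bare[8:], get_spf, path + (domain,))
        elif bare.startswith("redirect="):
            found |= snapshot(bare[9:], get_spf, path + (domain,))
    return found


def drift(old, new):
    """Return (added, removed) terms between two snapshots, each sorted."""
    return sorted(new - old), sorted(old - new)


# Invented data: yesterday's view and today's view of the same tree.
before = {
    "example.com": "v=spf1 include:spf.vendor.example -all",
    "spf.vendor.example": "v=spf1 ip4:192.0.2.0/24 -all",
}
after = dict(before)
after["spf.vendor.example"] = "v=spf1 ip4:192.0.2.0/24 include:more.vendor.example -all"
after["more.vendor.example"] = "v=spf1 exists:%{i}.bl.vendor.example -all"

added, removed = drift(snapshot("example.com", before.get),
                       snapshot("example.com", after.get))
for owner, term in added:
    print(owner, term)
```

Your own record did not change in this example, yet three new terms appeared underneath it, including an exists. That is exactly the kind of change you want a human to look at.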

Conclusion: practical next steps

If your SPF record is a pile of includes, you don’t have an SPF record. You have a dependency graph you’re not managing. The way out is not heroic DNS kung fu; it’s architecture and ownership.

Today: pull your current SPF, confirm you have exactly one v=spf1, and identify whether you’re flirting with the 10-lookup limit.
This week: inventory senders and move at least one major third-party sender to a delegated subdomain with its own SPF and aligned DKIM.
This month: establish ownership per sender/include, implement a lightweight validation pipeline (syntax + lookup budget), and add monitoring for DNS failures and unexpected include drift.

SPF doesn’t need to be perfect. It needs to be operable. Keep it under the lookup limit, keep it aligned with how mail actually leaves your systems, and stop treating DNS as a place to hide complexity. Your inboxes—and your sleep—will improve.