You changed “one little DNS thing” and now half your outbound mail lands in spam, inbound bounces for some senders, and your CEO’s “quick test” from a personal Gmail account works—so everyone assumes it’s fine. It isn’t.
Mixed DNS records are the classic slow-burn outage: some resolvers see old data, others see new; some providers tolerate a bad SPF, others hard-fail; your monitoring checks the happy path while your customers live on the unhappy path. The best part: the root cause is often a typo with the emotional impact of a disk failure.
What “mixed DNS records” really means (and why email suffers first)
“Mixed DNS records” isn’t a formal RFC term. It’s the SRE way of saying: the DNS data your domain is publishing is internally inconsistent, partially migrated, or variably visible across resolvers. Sometimes it’s because you’re mid-cutover. Sometimes it’s because someone pasted two vendors’ setup guides into the same zone. Sometimes it’s because your internal DNS has “helpful” overrides that make tests pass from inside the office and fail everywhere else.
Email is where this breaks first because email routing and trust are a stack of DNS lookups—not one. Web traffic usually cares about A/AAAA and maybe CAA. Mail cares about MX, then A/AAAA for the MX targets, plus SPF TXT records, plus DKIM TXT records, plus DMARC TXT records, plus reverse DNS for the connecting IP, and increasingly MTA-STS or TLS reporting. Mix any of these incorrectly and the system still “sort of works” until you meet the one big provider that enforces the part you botched.
The three most common “mixed record” patterns
- Split routing: multiple MX hosts from different providers present at once (or old MX plus new MX), so inbound mail goes to the wrong place some of the time.
- Split authorization: SPF says Vendor A is allowed, but you actually send from Vendor B (or your own MTA). Or SPF is syntactically valid but logically broken (too many DNS lookups, wrong qualifiers).
- Split identity: DKIM keys for one platform, DMARC policy expecting alignment that your actual From/Return-Path doesn’t satisfy. Everything “sends,” but deliverability tanks.
One more pattern deserves special mention: split horizon DNS—where internal resolvers return different answers than public DNS. This is excellent for internal services. It is a dumpster fire for email unless you’re disciplined, because mail flows cross boundaries by definition.
Joke #1: DNS propagation is the only place where “eventual consistency” means “eventually your pager will go off.”
Fast diagnosis playbook (check first / second / third)
This is the high-signal workflow when delivery is failing and you need to find the bottleneck fast. Don’t open vendor dashboards first. Start with the public truth.
First: Is inbound routing deterministic?
- Check MX records from multiple resolvers (your resolver, a public one, and the authoritative nameservers).
- Resolve the A/AAAA for each MX target—make sure they exist, don’t point to decommissioned IPs, and aren’t CNAME chains to nowhere.
- Look for “mixed providers”: old MX targets still present, wrong priorities, or apex MX pointing to a hostname that no longer exists.
Second: Are you failing authentication in a way that causes rejections?
- Validate SPF syntax and DNS lookup count. If you see permerror, treat it as broken.
- Confirm DKIM selectors are published and match what your sender is using.
- Check DMARC exists, has a sane policy, and isn’t pointing to nonsense reporting addresses.
Third: Does the connecting IP look legitimate?
- Verify PTR (reverse DNS) for your outbound IPs.
- Verify forward-confirmed reverse DNS (FCrDNS): PTR hostname resolves back to the same IP.
- Check whether your outbound IP is unexpectedly coming from a new pool (a “helpful” network change, NAT, or a new SaaS relay).
Decision rule
If inbound is misrouted: fix MX first. If inbound is fine but you’re landing in spam or bouncing at big providers: fix SPF/DKIM/DMARC alignment and the IP identity (PTR/HELO). Don’t polish DMARC while MX is still sending half your customer mail to an abandoned mailbox at your previous vendor.
Facts & historical context worth knowing
- MX records replaced “mail routing by A record” early in DNS history, because “just connect to the domain” didn’t scale with multiple mail hosts and failover patterns.
- TXT records became the junk drawer of DNS partly because they were widely supported and flexible; SPF originally had its own RR type but TXT won in practice.
- SPF’s 10-DNS-lookup limit exists because receivers must evaluate it during SMTP time, and unbounded recursion would be a DoS gift.
- DKIM was designed to survive forwarding better than SPF, because it signs the message content rather than the connecting IP—but it can still break on modifications like footers and some list software.
- DMARC introduced alignment: it’s not enough to “pass SPF.” SPF must pass for a domain aligned with the visible From, or DKIM must pass and align.
- DNS TTL values are advisory; resolvers can cap, extend, or behave oddly under load. Low TTL is not a universal “propagate faster” button.
- Some receivers cache negative answers (NXDOMAIN) based on SOA minimum / negative TTL rules, which means a briefly missing DKIM record can haunt you for hours.
- Provider enforcement tightened after large-scale spoofing; many big inbox providers now treat weak auth as a deliverability penalty, not just a “nice-to-have.”
How email actually uses DNS: MX, SPF, DKIM, DMARC, PTR, TLSA, and friends
MX: where inbound mail goes (and how mixed records split your mailbox)
MX records are the traffic directors for inbound mail. Each MX record points to a hostname, plus a preference value (priority). Lowest number wins. If multiple MX records share the same preference, the sender will typically randomize among them.
Where it goes wrong:
- You leave old MX records in place “for safety,” creating a random mail split between two providers.
- You point an MX record at a CNAME that works in some resolvers but fails in others (MX targets must be hostnames with A/AAAA records; RFC 2181 says an MX target must not be an alias, and strict senders can reject a CNAME there).
- You add IPv6 AAAA for an MX host that doesn’t actually accept mail over IPv6, producing timeouts for senders that prefer IPv6.
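The preference ordering and the "two providers at once" failure can be sketched in a few lines of Python. This is a minimal illustration, not a real tool: `provider_key`, `summarize_mx`, and `is_mixed` are hypothetical names, and grouping by the last two labels is an assumption standing in for a proper registered-domain lookup.

```python
from collections import defaultdict


def provider_key(hostname: str) -> str:
    """Naive grouping key: the last two labels of the MX target.

    Assumption: this stands in for a real registered-domain lookup
    (Public Suffix List); it is wrong for suffixes such as co.uk.
    """
    labels = hostname.rstrip(".").lower().split(".")
    return ".".join(labels[-2:])


def summarize_mx(records):
    """records: iterable of (preference, target) tuples as published in DNS.

    Returns (ordered, providers):
      ordered   - records sorted by preference, lowest number first
      providers - mapping of provider key -> list of targets
    """
    ordered = sorted(records, key=lambda r: (r[0], r[1]))
    providers = defaultdict(list)
    for _pref, target in ordered:
        providers[provider_key(target)].append(target)
    return ordered, dict(providers)


def is_mixed(records) -> bool:
    """True when the MX set spans more than one provider key."""
    _ordered, providers = summarize_mx(records)
    return len(providers) > 1


if __name__ == "__main__":
    mx = [
        (20, "mx2.newmail.example."),
        (10, "mx1.newmail.example."),
        (10, "mx1.oldmail.example."),  # leftover from the old vendor
    ]
    print(summarize_mx(mx))
    print("mixed providers:", is_mixed(mx))
```

Feed it the answer section from each resolver you query; a mixed result is a prompt to go look, not proof of a fault.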
SPF: who is allowed to send (and why “it passes” isn’t the same as “it works”)
SPF is a TXT record (commonly at the domain root) that tells the receiver which IPs/hosts are authorized to send for that domain. It’s evaluated against the SMTP envelope sender (Return-Path / MAIL FROM) or, if that’s empty, the HELO/EHLO identity.
SPF failure modes that look like “mixed DNS”:
- Multiple SPF TXT records at the same name (two v=spf1 strings). Many receivers treat this as a permanent error.
- Overlapping migrations: include:_spf.oldvendor.example and include:_spf.newvendor.example both present, pushing you over the DNS lookup limit.
- Testing hacks: someone adds +all or ~all “temporarily,” and it becomes permanent, undermining enforcement and deliverability.
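A rough lint for these SPF failure modes might look like the sketch below. The function names are hypothetical, and the lookup counter only inspects the terms of one record: it does not follow include: or redirect= targets, so the total a receiver actually sees is usually higher.

```python
# Mechanisms that cost a DNS lookup during SPF evaluation.
LOOKUP_MECHANISMS = ("include", "a", "mx", "ptr", "exists")


def spf_records(txt_values):
    """Return the TXT strings that are SPF policies (start with v=spf1)."""
    out = []
    for value in txt_values:
        v = value.strip().strip('"')
        if v.lower() == "v=spf1" or v.lower().startswith("v=spf1 "):
            out.append(v)
    return out


def top_level_lookups(spf: str) -> int:
    """Count terms in ONE record that trigger a DNS lookup.

    Nested includes are not resolved here, so treat the result as a
    lower bound on what a receiver will count.
    """
    count = 0
    for term in spf.split()[1:]:
        term = term.lstrip("+-~?").lower()
        if term.startswith("redirect="):
            count += 1
            continue
        name = term.split(":", 1)[0].split("/", 1)[0]
        if name in LOOKUP_MECHANISMS:
            count += 1
    return count


def spf_problems(txt_values):
    """Return a list of human-readable findings for the TXT set at one name."""
    records = spf_records(txt_values)
    problems = []
    if not records:
        problems.append("no SPF record")
    if len(records) > 1:
        problems.append("multiple SPF records (permerror)")
    for rec in records:
        if top_level_lookups(rec) > 10:
            problems.append("more than 10 lookups at top level")
        if any(t.lower() in ("+all", "all") for t in rec.split()[1:]):
            problems.append("+all authorizes every sender")
    return problems


if __name__ == "__main__":
    txt = [
        '"v=spf1 include:_spf.newvendor.example -all"',
        '"v=spf1 include:_spf.oldvendor.example -all"',
    ]
    print(spf_problems(txt))
```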
DKIM: cryptographic signatures (and the selector typo that ruins your week)
DKIM publishes a public key in DNS at selector._domainkey.example.com. Your outbound system signs messages using the private key and includes the selector in the DKIM-Signature header. Receivers fetch the public key and verify the signature.
Mixed record patterns:
- You rotate selectors but don’t publish the new selector everywhere (multi-provider DNS, or only internal DNS updated).
- You publish the selector at the wrong subdomain (common when sending as mail.example.com vs example.com).
- You break the TXT record formatting (quotes, line breaks) so some DNS tools show it but some receivers fail to parse it.
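To make the selector and formatting pitfalls concrete, here is a small sketch that builds the DKIM query name and parses a key record the way a receiver roughly would: concatenate the quoted TXT chunks, split on semicolons, then look at p=. The helper names are hypothetical and the key material in the demo is a truncated placeholder, not a usable key.

```python
def dkim_query_name(selector: str, domain: str) -> str:
    """Build the DNS name a receiver queries for a DKIM public key."""
    return f"{selector}._domainkey.{domain}".lower()


def join_txt_chunks(chunks):
    """A long TXT record is split into several quoted strings; they are
    concatenated with no separator before parsing."""
    return "".join(c.strip().strip('"') for c in chunks)


def parse_tags(record: str) -> dict:
    """Parse a 'tag=value; tag=value' list into a dict."""
    tags = {}
    for part in record.split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            tags[key.strip().lower()] = value.strip()
    return tags


def dkim_key_problems(chunks):
    """Return findings for one selector's TXT answer (list of chunks)."""
    tags = parse_tags(join_txt_chunks(chunks))
    problems = []
    # v= is optional in a key record; when present it should be DKIM1.
    if tags.get("v", "DKIM1") != "DKIM1":
        problems.append("unexpected version tag")
    if "p" not in tags:
        problems.append("no p= tag (no public key)")
    elif "".join(tags["p"].split()) == "":
        problems.append("empty p= (key revoked)")
    return problems


if __name__ == "__main__":
    print(dkim_query_name("s1", "example.com"))
    # Placeholder key split across two chunks, as long keys often are.
    print(dkim_key_problems(['"v=DKIM1; k=rsa; p=MIIB"', '"IjAN"']))
```

If the name this builds is not the name you published, you have found the "wrong subdomain" case before a receiver did.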
DMARC: policy, alignment, reporting (and how “none” can still help)
DMARC is published at _dmarc.example.com and tells receivers what to do when SPF/DKIM don’t align with the visible From domain. It also provides reporting addresses to receive aggregate and forensic feedback (where supported).
Mixed record patterns:
- DMARC exists on example.com but you’re actually sending from sub.example.com without its own DMARC, or you rely on organizational domain behavior without understanding it.
- DMARC policy is set to p=reject before DKIM is consistently signing across all outbound sources.
- Multiple DMARC records exist (it happens), and receivers treat it as invalid.
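A quick sanity check for the TXT set at _dmarc can be sketched as follows. The function names and messages are illustrative assumptions; the check covers only duplicate records, the p= tag, and the presence of rua=, not the full DMARC grammar.

```python
VALID_POLICIES = {"none", "quarantine", "reject"}


def dmarc_records(txt_values):
    """Keep only the TXT strings that start with v=DMARC1."""
    cleaned = [v.strip().strip('"') for v in txt_values]
    return [v for v in cleaned if v.replace(" ", "").startswith("v=DMARC1")]


def parse_dmarc(record: str) -> dict:
    """Parse 'tag=value; tag=value' into a dict with lowercase tag names."""
    tags = {}
    for part in record.split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            tags[key.strip().lower()] = value.strip()
    return tags


def dmarc_problems(txt_values):
    """Return findings for the TXT answers at _dmarc.<domain>."""
    records = dmarc_records(txt_values)
    if not records:
        return ["no DMARC record"]
    if len(records) > 1:
        return ["multiple DMARC records (receivers may ignore DMARC)"]
    tags = parse_dmarc(records[0])
    problems = []
    if tags.get("p", "").lower() not in VALID_POLICIES:
        problems.append("missing or invalid p= policy")
    if "rua" not in tags:
        problems.append("no rua= address, so no aggregate reports")
    return problems


if __name__ == "__main__":
    answers = [
        '"v=DMARC1; p=none; rua=mailto:dmarc-reports@example.com"',
        '"v=DMARC1; p=reject; rua=mailto:dmarc@example.com"',
    ]
    print(dmarc_problems(answers))
```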
PTR and FCrDNS: the “do you look like a real mail server” check
Reverse DNS (PTR) maps an IP back to a hostname. Many receivers heavily weight PTR correctness, especially for self-hosted MTAs. Even for SaaS relays, a sudden change in outbound IP identity can tank deliverability. FCrDNS adds a second check: the PTR name should resolve back to the same IP.
MTA-STS and TLSA: not required everywhere, but increasingly relevant
MTA-STS uses a DNS TXT record (_mta-sts.example.com) and an HTTPS policy host (mta-sts.example.com) to signal that inbound SMTP should use TLS and to pin expected MX hosts. TLSA (DANE) uses DNSSEC to publish TLS cert constraints. These aren’t “day one” requirements for many orgs, but if you deploy them, they add more ways mixed DNS can break mail.
Joke #2: The good news about email is it’s a mature protocol. The bad news is it’s a mature protocol.
One reliability reminder, paraphrased from a notable operations voice (John Allspaw): optimize for fast recovery and learning, because failures are inevitable in complex systems.
Practical tasks: commands, outputs, and the decision you make
These tasks are meant to be runnable from any Linux box with common DNS tools installed. Run them from at least two networks if you can (corporate LAN + a cloud VM), because “works from here” is the trap.
Task 1: Query MX from your default resolver
cr0x@server:~$ dig +nocmd example.com MX +noall +answer
example.com. 300 IN MX 10 mx1.newmail.example.
example.com. 300 IN MX 20 mx2.newmail.example.
What it means: Public DNS (as seen by your resolver) says mail goes to two hosts, preference 10 then 20.
Decision: If you expected one provider and see two unrelated sets (old and new), stop and clean up. Mixed MX is not “extra redundancy” unless you control both systems and know exactly how they interact.
Task 2: Query MX from a specific public resolver
cr0x@server:~$ dig @1.1.1.1 example.com MX +noall +answer
example.com. 300 IN MX 10 mx1.oldmail.example.
example.com. 300 IN MX 20 mx2.oldmail.example.
What it means: Different resolver, different truth. You are mid-propagation or have inconsistent authoritative answers (or a DNSSEC issue causing fallback behavior).
Decision: Before changing more, query the authoritative nameservers. If they’re consistent, you wait; if they’re inconsistent, you fix the zone publishing path.
Task 3: Query the authoritative nameservers directly
cr0x@server:~$ dig +short NS example.com
ns1.dns-host.example.
ns2.dns-host.example.
cr0x@server:~$ dig @ns1.dns-host.example example.com MX +noall +answer
example.com. 300 IN MX 10 mx1.newmail.example.
example.com. 300 IN MX 20 mx2.newmail.example.
cr0x@server:~$ dig @ns2.dns-host.example example.com MX +noall +answer
example.com. 300 IN MX 10 mx1.newmail.example.
example.com. 300 IN MX 20 mx2.newmail.example.
What it means: Authoritative servers agree on the new MX.
Decision: If public resolvers disagree, you’re waiting on cache expiry. Communicate that clearly: “Some senders will still hit old MX until TTL expires.” If authoritative servers disagree, you have a publishing/zone sync problem and propagation won’t save you.
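The decision in Tasks 1 to 3 reduces to a small comparison once you have collected the answers. The sketch below assumes you have already gathered each server's answer set (for example by running dig per server); `classify` and its labels are hypothetical, and "cache-lag" is only the most likely explanation when authoritative servers agree, since split horizon or resolver overrides produce the same picture.

```python
def classify(authoritative: dict, recursive: dict) -> str:
    """Each argument maps a server name to the set of records it returned.

    Returns one of:
      'authoritative-mismatch' - fix the zone publishing path first
      'cache-lag'              - authoritative servers agree, caches do not yet
      'consistent'             - every server returned the same set
    """
    if not authoritative:
        raise ValueError("need at least one authoritative answer")
    auth_answers = {frozenset(v) for v in authoritative.values()}
    if len(auth_answers) > 1:
        return "authoritative-mismatch"
    truth = next(iter(auth_answers))
    if any(frozenset(v) != truth for v in recursive.values()):
        return "cache-lag"
    return "consistent"


if __name__ == "__main__":
    new = {"10 mx1.newmail.example.", "20 mx2.newmail.example."}
    old = {"10 mx1.oldmail.example.", "20 mx2.oldmail.example."}
    auth = {"ns1.dns-host.example": new, "ns2.dns-host.example": new}
    print(classify(auth, {"default": new, "1.1.1.1": old}))
```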
Task 4: Confirm MX targets resolve to A/AAAA
cr0x@server:~$ dig mx1.newmail.example A +noall +answer
mx1.newmail.example. 300 IN A 203.0.113.10
cr0x@server:~$ dig mx1.newmail.example AAAA +noall +answer
What it means: IPv4 exists, IPv6 does not (empty AAAA answer).
Decision: That’s fine if you’re not serving IPv6. What you don’t want is a broken AAAA that points to an IP where port 25 is closed, because some senders will prefer IPv6 and time out.
Task 5: Detect “CNAME at MX target” weirdness
cr0x@server:~$ dig mx1.newmail.example CNAME +noall +answer
mx1.newmail.example. 300 IN CNAME vendor-lb.mailhost.example.
What it means: The MX target is a CNAME.
Decision: If you control it and it’s known-good with your receivers, you may survive. If you’re troubleshooting delivery, remove this variable: use the provider’s recommended MX hostnames that resolve directly to A/AAAA.
Task 6: Check SPF record presence and count
cr0x@server:~$ dig example.com TXT +noall +answer
example.com. 300 IN TXT "v=spf1 include:_spf.newvendor.example -all"
example.com. 300 IN TXT "v=spf1 include:_spf.oldvendor.example -all"
What it means: Two SPF records. That’s not “two sources of truth,” it’s “permerror roulette.”
Decision: Merge into a single v=spf1 record. If you truly need both vendors during migration, combine includes in one string and keep it under lookup limits.
Task 7: Evaluate SPF against a known sending IP
cr0x@server:~$ spfquery -ip 203.0.113.55 -sender bounce@example.com -helo mail.example.com
result=permerror
smtp.comment=Two or more type TXT records found
What it means: SPF evaluation fails permanently because of multiple records.
Decision: Fix SPF now. Many receivers treat permerror as fail or as a strong negative signal.
Task 8: Check SPF DNS lookup explosion
cr0x@server:~$ spfquery -ip 203.0.113.55 -sender bounce@example.com -helo mail.example.com
result=permerror
smtp.comment=SPF Permanent Error: too many DNS lookups
What it means: Your includes/redirects exceed SPF’s limit.
Decision: Flatten SPF (carefully), remove unused includes, or consolidate senders behind a single relay with a stable SPF footprint.
Task 9: Fetch the DKIM public key for a selector you believe is in use
cr0x@server:~$ dig s1._domainkey.example.com TXT +noall +answer
s1._domainkey.example.com. 300 IN TXT "v=DKIM1; k=rsa; p=MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8A..."
What it means: Selector exists and returns a public key.
Decision: If messages are failing DKIM, the issue is probably signing config (wrong selector, wrong domain, body/header canonicalization issues) or intermediate modification of messages.
Task 10: Catch the DKIM selector typo / missing record
cr0x@server:~$ dig s2._domainkey.example.com TXT +noall +answer
What it means: No TXT record for selector s2.
Decision: Either publish s2 or reconfigure the sender to use the published selector. If you’re mid-rotation, publish both old and new selectors until old mail is drained and all senders are updated.
Task 11: Check DMARC record correctness
cr0x@server:~$ dig _dmarc.example.com TXT +noall +answer
_dmarc.example.com. 300 IN TXT "v=DMARC1; p=quarantine; adkim=s; aspf=s; rua=mailto:dmarc-reports@example.com"
What it means: DMARC exists, policy is quarantine, strict alignment enabled.
Decision: Strict alignment is fine only if you’re disciplined about From domains and signing. If you have many systems sending mail, strict alignment can backfire. Consider relaxed alignment during cleanup, then tighten later.
Task 12: Detect multiple DMARC records
cr0x@server:~$ dig _dmarc.example.com TXT +noall +answer
_dmarc.example.com. 300 IN TXT "v=DMARC1; p=none; rua=mailto:dmarc-reports@example.com"
_dmarc.example.com. 300 IN TXT "v=DMARC1; p=reject; rua=mailto:dmarc@example.com"
What it means: Two DMARC records. Receivers can treat this as invalid and ignore DMARC entirely.
Decision: Publish exactly one DMARC record at _dmarc. Merge tags into a single policy.
Task 13: Verify reverse DNS (PTR) for outbound IP
cr0x@server:~$ dig -x 203.0.113.55 +noall +answer
55.113.0.203.in-addr.arpa. 3600 IN PTR mailout.example.com.
What it means: The IP has PTR pointing to mailout.example.com.
Decision: Ensure that hostname also resolves forward to the same IP (FCrDNS). If PTR is missing or generic, fix it with your ISP/cloud provider; it’s often non-negotiable for good delivery.
Task 14: Verify forward-confirmed reverse DNS (FCrDNS)
cr0x@server:~$ dig mailout.example.com A +noall +answer
mailout.example.com. 300 IN A 203.0.113.55
What it means: Forward matches reverse. This is the boring baseline many receivers expect.
Decision: If forward does not match, fix either the PTR or the A record. Don’t leave it “close enough.” Mail systems are not known for their chill.
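Tasks 13 and 14 together are the FCrDNS check, which can be sketched with the Python standard library. The lookups go through the system resolver by default, so results reflect whatever resolver the host uses; the function names are hypothetical, and the lookups are injectable so the logic can be exercised without network access.

```python
import ipaddress
import socket


def reverse_name(ip: str) -> str:
    """The name `dig -x` queries, e.g. 55.113.0.203.in-addr.arpa."""
    return ipaddress.ip_address(ip).reverse_pointer


def system_ptr(ip: str):
    """PTR lookup through the system resolver (needs network access)."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def system_forward(hostname: str):
    """A/AAAA lookup through the system resolver (needs network access)."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except OSError:
        return set()
    # Drop any IPv6 scope suffix such as %eth0 before comparing.
    return {info[4][0].split("%")[0] for info in infos}


def fcrdns_ok(ip, ptr_lookup=system_ptr, forward_lookup=system_forward) -> bool:
    """True when the PTR name for `ip` resolves forward to the same IP."""
    name = ptr_lookup(ip)
    if not name:
        return False
    wanted = ipaddress.ip_address(ip)
    return any(ipaddress.ip_address(a) == wanted for a in forward_lookup(name))


if __name__ == "__main__":
    # Offline demo with stubbed lookups matching the example above.
    ptr = {"203.0.113.55": "mailout.example.com"}.get
    print(reverse_name("203.0.113.55"))
    print(fcrdns_ok("203.0.113.55", ptr, lambda name: {"203.0.113.55"}))
```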
Task 15: Check for split horizon DNS (public vs internal)
cr0x@server:~$ dig @10.0.0.53 example.com MX +noall +answer
example.com. 300 IN MX 10 mx1.internal.example.
cr0x@server:~$ dig @1.1.1.1 example.com MX +noall +answer
example.com. 300 IN MX 10 mx1.newmail.example.
What it means: Internal DNS returns a different MX than public DNS.
Decision: Unless you have a very specific, well-tested architecture, unify them. At minimum, make sure internal clients that send mail externally aren’t using internal-only identities that break SPF/DKIM/DMARC and confuse mail routing.
Task 16: Confirm DNS TTL and negative caching hints
cr0x@server:~$ dig example.com SOA +noall +answer
example.com. 300 IN SOA ns1.dns-host.example. hostmaster.example.com. 2026010401 3600 900 1209600 300
What it means: The SOA shows the zone serial and the last field (minimum) often influences negative caching behavior.
Decision: If you briefly published a missing DKIM selector (NXDOMAIN), some resolvers may remember that failure. Plan key rollouts with overlap and avoid “record disappears for a minute” deployments.
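For a rough idea of how long a "record not found" answer can linger, the negative-caching TTL defined in RFC 2308 is the smaller of the SOA record's own TTL and its MINIMUM field. The sketch below parses one line of dig answer output; `negative_cache_ttl` is a hypothetical helper, and individual resolvers may cap or otherwise adjust the value.

```python
def negative_cache_ttl(soa_line: str) -> int:
    """Parse one `dig +noall +answer` SOA line and return the negative TTL.

    Per RFC 2308 the negative-caching TTL is the smaller of the SOA
    record's own TTL and its MINIMUM field (the last number).
    """
    fields = soa_line.split()
    # name ttl class type mname rname serial refresh retry expire minimum
    if len(fields) != 11 or fields[3].upper() != "SOA":
        raise ValueError("not a single-line SOA answer")
    record_ttl = int(fields[1])
    minimum = int(fields[10])
    return min(record_ttl, minimum)


if __name__ == "__main__":
    line = ("example.com. 300 IN SOA ns1.dns-host.example. "
            "hostmaster.example.com. 2026010401 3600 900 1209600 300")
    print(negative_cache_ttl(line), "seconds")
```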
Task 17: Inspect a real message’s authentication results (from headers)
cr0x@server:~$ grep -E "Authentication-Results|Received-SPF|DKIM-Signature|DMARC" -n message.eml | head
12:Authentication-Results: mx.google.com; spf=fail (google.com: domain of bounce@example.com does not designate 203.0.113.55 as permitted sender) smtp.mailfrom=bounce@example.com; dkim=pass header.i=@example.com; dmarc=fail (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=example.com
33:DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=example.com; s=s1; ...
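What it means: DKIM passes, SPF fails, DMARC fails. An aligned DKIM pass is normally enough for DMARC to pass, so a result like this means the signature that passed is not the one aligned with the visible From (check the d= of every DKIM-Signature header), or the receiver’s evaluation differs from your expectation.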
Decision: Focus on alignment: ensure the DKIM d= domain and/or SPF mailfrom domain align with the visible From. If DMARC is strict and your mailfrom is a third-party bounce domain, you may need custom return-path or a different sending setup.
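When you have more than a handful of messages to review, pulling the verdicts out of Authentication-Results with a script beats reading headers by eye. This is a minimal sketch with hypothetical helper names; it matches simple method=result pairs with a regular expression and is not a full parser for the header's grammar.

```python
import re

# Matches spf=..., dkim=..., dmarc=... verdicts; "sp=" and "p=" do not match.
RESULT_RE = re.compile(r"\b(spf|dkim|dmarc)=([a-z]+)", re.IGNORECASE)
FROM_RE = re.compile(r"header\.from=([^\s;]+)", re.IGNORECASE)


def auth_results(header_value: str) -> dict:
    """Collect verdicts per method; a message can carry several DKIM results."""
    results = {"spf": [], "dkim": [], "dmarc": []}
    for method, result in RESULT_RE.findall(header_value):
        results[method.lower()].append(result.lower())
    return results


def header_from(header_value: str):
    """Return the header.from domain the receiver evaluated, if present."""
    match = FROM_RE.search(header_value)
    return match.group(1).lower() if match else None


if __name__ == "__main__":
    value = (
        "mx.google.com; spf=fail (google.com: domain of bounce@example.com "
        "does not designate 203.0.113.55 as permitted sender) "
        "smtp.mailfrom=bounce@example.com; dkim=pass header.i=@example.com; "
        "dmarc=fail (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=example.com"
    )
    print(auth_results(value), header_from(value))
```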
Task 18: Confirm that your MX actually accepts SMTP connections (network reality)
cr0x@server:~$ nc -vz mx1.newmail.example 25
Connection to mx1.newmail.example 25 port [tcp/smtp] succeeded!
What it means: Port 25 is reachable.
Decision: If this fails from the public Internet but works internally, you have firewalling or routing issues—not a DNS issue. Don’t keep editing TXT records to fix a closed port.
Three corporate mini-stories from the trenches
Mini-story #1: The incident caused by a wrong assumption
The company had two brands under one parent. The marketing team decided to unify outbound mail so campaigns came from a single platform. The email vendor’s onboarding doc said, “Add these SPF includes and DKIM keys.” So far, normal.
The wrong assumption arrived quietly: someone believed SPF was evaluated against the visible From domain. They updated brandA.com SPF to include the vendor and called it done. In reality, transactional mail used bounces.brandA-mail.com as the envelope sender, because a different system handled return paths. That domain’s SPF still pointed to the old relay.
Delivery didn’t fail universally. It failed for certain high-signal receivers that weighed SPF heavily for that stream. The helpdesk got “some customers aren’t receiving password resets” tickets—always the worst category, because users don’t distinguish between “marketing email” and “my account is broken.”
Debugging took longer than it should have because tests were done from one mailbox provider and one network. The team finally looked at Authentication-Results headers, saw SPF failing on the envelope sender, and realized they had authenticated the wrong domain.
The fix was boring: publish SPF and DKIM for the actual bounce domain, align DMARC for the visible From, and standardize return-path handling across systems. They also wrote down, in big letters, which domains were used for what. Production loves big letters.
Mini-story #2: The optimization that backfired
A fast-growing org moved DNS to a provider that supported “smart routing” and automated health checks. They reduced TTLs across the zone because they wanted “instant failover” during incidents. It sounded reasonable in a meeting.
Then they migrated inbound mail. During the cutover, their automation kept flipping MX records because one of the new MX hosts failed a TCP check intermittently. Some resolvers cached the “old” MX, some cached the “new,” and some cached the version from five minutes ago. Inbound mail became probabilistic.
It got worse: low TTL increased query volume and exposed a second issue—rate limiting at the authoritative provider during a peak. Receivers started seeing SERVFAIL occasionally. Those senders queued and retried (email is patient), but it created a wave pattern of delayed deliveries that looked like an application bug.
The post-incident lesson wasn’t “never lower TTL.” It was: don’t use DNS as a health-check-driven load balancer for mail unless you fully understand the receiving ecosystem. SMTP senders already have retry logic. Your job is stable routing and stable identity, not fancy flips.
Mini-story #3: The boring but correct practice that saved the day
A different team had a migration plan that felt almost too conservative: they documented every sending source (ticketing system, CRM, monitoring alerts, product mail, marketing). For each, they recorded the envelope sender domain, the visible From domain, DKIM selector, and outbound IP or relay.
They also ran a weekly “DNS drift” check: pull the zone records from the authoritative API, compare against a known-good template, and alert on differences. It wasn’t glamorous. No one got promoted because a cronjob didn’t fire. But it prevented surprises.
When they migrated to a new outbound relay, the drift check flagged that a teammate had added a second SPF record instead of editing the existing one. It never made it to production because the change failed a preflight gate.
The migration still had normal hiccups—some caches held old MX longer than expected—but mail didn’t split between providers and DMARC didn’t suddenly start rejecting legitimate sends. The “boring” practice wasn’t overhead; it was a seatbelt.
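A drift check of the kind described in this story can be very small. The sketch below assumes you have already exported both the known-good template and the live zone into dictionaries keyed by (name, record type); how you fetch the live data depends on your DNS provider and is not shown, and the `drift` function is a hypothetical illustration rather than that team's actual tooling.

```python
def drift(expected: dict, actual: dict) -> list:
    """Compare {(name, rtype): set_of_values} mappings.

    Returns a sorted list of (kind, key, value) findings, where kind is
    'unexpected' (live but not in the template) or 'missing'.
    """
    findings = []
    for key in sorted(set(expected) | set(actual)):
        want = set(expected.get(key, ()))
        have = set(actual.get(key, ()))
        for value in sorted(have - want):
            findings.append(("unexpected", key, value))
        for value in sorted(want - have):
            findings.append(("missing", key, value))
    return findings


if __name__ == "__main__":
    template = {
        ("example.com.", "MX"): {"10 mx1.newmail.example."},
        ("example.com.", "TXT"): {"v=spf1 include:_spf.newvendor.example -all"},
    }
    live = {
        ("example.com.", "MX"): {"10 mx1.newmail.example."},
        ("example.com.", "TXT"): {
            "v=spf1 include:_spf.newvendor.example -all",
            "v=spf1 include:_spf.oldvendor.example -all",  # the second SPF record
        },
    }
    for finding in drift(template, live):
        print(finding)
```

Wire the non-empty case to an alert or a failed preflight gate and you have the boring seatbelt.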
Common mistakes (symptoms → root cause → fix)
1) Symptom: Some senders bounce with “host not found,” others deliver fine
Root cause: Inconsistent MX visibility across resolvers (propagation window, stale caches, or inconsistent authoritative answers). Sometimes mixed NS sets during registrar changes.
Fix: Query authoritative NS directly; ensure both NS serve identical zone serial and records. Avoid changing NS and MX simultaneously unless you like stress. Lower TTL ahead of planned migrations, not during them, so long-lived cached answers have already expired by cutover.
2) Symptom: Inbound mail randomly lands at the old provider
Root cause: Old MX records still published, often with an equal preference value or a lower one (which means higher priority) than the new records.
Fix: Publish only the intended provider’s MX records. If you need dual delivery, do it intentionally with a controlled routing layer, not by leaving garbage in DNS.
3) Symptom: SPF shows “permerror” in headers
Root cause: Multiple SPF TXT records, or SPF lookup limit exceeded due to too many includes/redirects.
Fix: Consolidate to a single SPF record. Reduce includes, remove unused vendors, and keep the lookup count at or below the limit of 10.
4) Symptom: DKIM “selector not found” appears intermittently
Root cause: Selector published in one DNS view but not another (split horizon), or record updated on one authoritative NS but not the other, or a key rotation removed the old selector too soon.
Fix: Publish selectors consistently on all authoritative servers. During rotation, overlap old and new selectors. Don’t “clean up” until you’ve verified all senders moved.
5) Symptom: DMARC rejects legitimate mail after a vendor change
Root cause: Alignment broken—new vendor uses a different bounce domain, or DKIM signs with a domain that doesn’t align with the From domain, especially under strict alignment.
Fix: Configure custom return-path/envelope sender where possible. Ensure DKIM d= aligns with From. Consider relaxed alignment while migrating, then re-tighten.
6) Symptom: Mail to one major provider is delayed for hours
Root cause: Receiver can’t reach your MX consistently (IPv6 preference to a broken AAAA, firewall blocks, greylisting + auth failures), or DNS intermittently SERVFAIL due to authoritative issues.
Fix: Validate IPv6 path if you publish AAAA. Check port 25 reachability from outside. Stabilize authoritative DNS and avoid flapping records.
7) Symptom: You pass SPF but still land in spam
Root cause: SPF passes on a non-aligned domain (or only HELO identity passes). DMARC fails or is missing. Or IP reputation and PTR/HELO identity are weak.
Fix: Implement DKIM and DMARC with alignment. Fix PTR/FCrDNS. Ensure outbound HELO is stable and meaningful.
8) Symptom: Everything looks correct in your office, fails for customers
Root cause: Split horizon DNS returning internal-only MX/SPF/DKIM answers, or internal egress uses a different NAT IP than external tests.
Fix: Test from outside networks. Remove internal overrides for public email identities. Document and monitor your egress IPs.
Checklists / step-by-step plan
Step-by-step: Fix a mixed DNS email outage safely
- Freeze changes. Stop editing random records while you’re still guessing. Capture current zone state (MX, TXT, A/AAAA, NS, SOA).
- Establish authoritative truth. Query each authoritative NS directly for MX, SPF TXT, DKIM selectors, DMARC. If they disagree, fix that first.
- Make inbound deterministic. Publish only the intended MX set. Verify MX targets resolve and accept SMTP.
- Fix SPF to one record. Merge includes; verify lookup count; ensure you authorize all legitimate outbound sources (or route them through one relay).
- Fix DKIM selectors. Make sure every sender signs and the selector exists in public DNS. Overlap old/new selectors during rotation.
- Set DMARC to match reality. Start with p=none if needed, then move to quarantine and reject once alignment is proven across all sources.
- Verify IP identity. PTR + FCrDNS; ensure HELO/EHLO is stable and matches PTR domain where appropriate.
- Test from multiple vantage points. Use at least one external VM and one consumer ISP if possible.
- Wait for caches intelligently. Track TTLs; communicate expected convergence windows to stakeholders. Don’t promise instant recovery if TTL is 1 hour and major resolvers are sticky.
- Close the loop with real headers. Validate Authentication-Results from representative receivers. If you can’t see headers, you’re debugging blind.
Prevention checklist: Keep DNS from becoming an email outage generator
- One owner for the email DNS surface. Not “everyone can edit TXT because it’s easy.” Easy is how you get multiple SPF records.
- Inventory all sending systems. For each: From domain, envelope sender domain, DKIM selector, outbound IP/relay, and whether it supports custom return-path.
- Change management for DNS. Staging review, peer approval, and a rollback plan that doesn’t involve frantic clicking in a web console.
- Monitor DNS answers from multiple resolvers. Alert if MX/SPF/DMARC/DKIM changes unexpectedly, or if authoritative NS disagree.
- Key rotation playbook. Publish new selector first, then flip signing, then retire old selector later.
- Don’t over-optimize TTL. Use TTL as a planning tool. Lower it ahead of planned changes, then raise it back for stability.
- Document provider cutovers. Especially where two vendors are involved (old + new). “Temporary” DNS entries should have an expiration date and an owner.
FAQ
1) What exactly counts as “mixed DNS records” for email?
Any combination where mail routing or authentication data is inconsistent: two MX sets from different providers, multiple SPF records, DMARC duplicated, DKIM selectors missing, or internal/public DNS returning different answers.
2) Can I keep old and new MX records during a migration “just in case”?
You can, but you usually shouldn’t. It creates nondeterministic routing unless you control both ends and have a plan for deduplication, user visibility, and support. Most orgs want deterministic delivery to one mailbox system.
3) Why does mail still go to the old provider even after I updated MX?
Because sending systems cache DNS. Some cache longer than your TTL. Some also retry against previously resolved MX targets for queued messages. That’s normal. Your job is to ensure the authoritative answers are correct and stable.
4) Is having two SPF records always wrong?
Yes, for practical purposes. The SPF spec expects a single policy record. Many receivers treat multiple v=spf1 strings as a permanent error.
5) If DKIM passes, do I still need SPF?
For modern deliverability, you want both. DKIM is often the more robust signal under forwarding, but SPF still matters, and DMARC can pass via either one—assuming alignment.
6) What is DMARC alignment, in plain terms?
It means the authenticated domain (from SPF mailfrom or DKIM d=) matches the visible From domain, either exactly (strict) or within the same organizational domain (relaxed).
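In code, the alignment rule looks roughly like the sketch below. One loud assumption: `org_domain` takes the last two labels as the organizational domain, which is only a stand-in for the Public Suffix List lookup real receivers use and is wrong for suffixes such as co.uk. The function names are hypothetical.

```python
def org_domain(domain: str) -> str:
    """Naive organizational domain: the last two labels.

    Assumption: real DMARC evaluation consults the Public Suffix List.
    """
    labels = domain.rstrip(".").lower().split(".")
    return ".".join(labels[-2:])


def aligned(auth_domain: str, from_domain: str, strict: bool = False) -> bool:
    """Strict: exact match. Relaxed: same organizational domain."""
    a = auth_domain.rstrip(".").lower()
    f = from_domain.rstrip(".").lower()
    if strict:
        return a == f
    return org_domain(a) == org_domain(f)


def dmarc_pass(from_domain, spf_pass, spf_domain, dkim_pass, dkim_domain,
               aspf_strict=False, adkim_strict=False) -> bool:
    """DMARC passes if SPF or DKIM both passes AND aligns with From."""
    spf_ok = bool(spf_pass) and aligned(spf_domain, from_domain, aspf_strict)
    dkim_ok = bool(dkim_pass) and aligned(dkim_domain, from_domain, adkim_strict)
    return spf_ok or dkim_ok


if __name__ == "__main__":
    # SPF passes, but only for a vendor bounce domain: not aligned.
    print(dmarc_pass("example.com", True, "vendor-bounce.example.net",
                     False, "example.com"))
    # DKIM passes with d=example.com: aligned, so DMARC passes.
    print(dmarc_pass("example.com", False, "example.com",
                     True, "example.com"))
```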
7) How do split horizon DNS setups break email?
If internal resolvers return different MX/SPF/DKIM/DMARC than the public Internet, internal tests may pass while external senders fail. Also, internal apps may send with identities that don’t validate externally.
8) Should I publish AAAA records for my MX hosts?
Only if you actually accept SMTP over IPv6 reliably (connectivity, firewalling, TLS config, monitoring). Publishing AAAA without operational readiness creates timeouts and delays for senders that prefer IPv6.
9) How strict should DMARC be?
Start with p=none to observe, then move to quarantine, then reject once you’ve verified all legitimate sources align and sign. Enforcement without inventory is how you block your own mail.
10) What’s the single fastest indicator that DNS is the culprit?
Different resolvers returning different MX or TXT answers for the same name. If authoritative nameservers disagree, it’s almost certainly a DNS publishing problem rather than an email server problem.
Conclusion: next steps you can do today
If you remember one thing: email breaks when DNS tells two stories at once. Your job is to make it tell one boring, correct story—consistently, from every resolver, every time.
- Run the MX/SPF/DKIM/DMARC queries against authoritative nameservers and at least one public resolver. Save the outputs.
- Make MX deterministic: remove legacy provider entries and verify MX targets resolve and accept SMTP.
- Reduce authentication entropy: one SPF record, published DKIM selectors that match actual signing, one DMARC record with a policy that matches your current maturity.
- Verify outbound identity: PTR + forward match, stable HELO, and no surprise egress IP changes.
- Write down your sending inventory and put a change gate in front of DNS edits. Future-you will not feel nostalgic about “quick fixes.”