You shipped the email. Customers didn’t get it. Or worse: it landed in spam with a polite-but-deadly “DKIM=fail” stamped on the forehead. The marketing team thinks “DNS propagation” is a weather pattern and asks if you can “just resend it.”
This is the part where you stop guessing. DKIM isn’t flaky. It’s strict. The signature breaks because something changed between signing and verification, or because verifiers can’t fetch the key, or because you signed the wrong thing in the first place. The good news: it’s usually fixable in under an hour if you follow a disciplined triage.
DKIM in plain terms (what actually gets signed)
DKIM (DomainKeys Identified Mail) is not “encryption,” and it’s not “proof the sender is nice.” It’s a digital signature over selected headers and a hash of the message body, made with a private key held by the sending domain. The receiver fetches the public key from DNS and verifies the signature. If the signed bytes don’t match what arrives, verification fails. If the public key can’t be retrieved, verification fails. If the receiver can’t parse the signature, verification fails.
In a DKIM signature header (DKIM-Signature:), you’ll see fields like:
- d= the signing domain (the “responsible” domain for DKIM)
- s= the selector (used to find the key in DNS)
- h= which headers are signed
- bh= body hash (base64 of the canonicalized body hash)
- b= the actual signature over headers + body hash
- c= canonicalization (how whitespace/line folding is normalized)
- t= / x= timestamp / expiry
Here’s the key operational point: DKIM validates a specific representation of the message. If any relay, gateway, footer-injector, list manager, “security appliance,” or helpful cloud service tweaks the signed headers or the body in a way not tolerated by canonicalization, the signature breaks. That’s not a DKIM bug. That’s DKIM doing its job.
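If you want to poke at those fields programmatically, here is a minimal sketch of splitting a DKIM-Signature value into its tags. The function name parse_dkim_tags is made up for illustration, and the sample value reuses the truncated bh=/b= placeholders from the examples in this article, so it is not a verifiable signature.

```python
import re

def parse_dkim_tags(header_value: str) -> dict:
    """Split a DKIM-Signature value into a {tag: value} dict (hypothetical helper)."""
    tags = {}
    for part in header_value.split(";"):
        name, sep, value = part.partition("=")
        if not sep:
            continue  # empty segment, e.g. after a trailing ';'
        # Folding whitespace inside values (common in h= and b=) is not significant.
        tags[name.strip()] = re.sub(r"\s+", "", value)
    return tags

# Sample value with folded lines; bh= and b= are truncated placeholders.
sig = ("v=1; a=rsa-sha256; c=relaxed/relaxed; d=example.com; s=s2025;\r\n"
       "\th=from:to:subject:date:mime-version:content-type;\r\n"
       "\tbh=3l8...; b=Qn...")

tags = parse_dkim_tags(sig)
signed_headers = tags["h"].split(":")
# c= is "header/body"; if the body part is omitted it defaults to simple.
header_canon, _, body_canon = tags["c"].partition("/")
body_canon = body_canon or "simple"
dns_name = f'{tags["s"]}._domainkey.{tags["d"]}'  # where the verifier looks for the key
```

This only splits the tag list. It does not verify anything, which is the point: you want to read d=, s=, c=, and h= before you form a theory.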
Two canonicalization modes, infinite misery
Canonicalization controls how the signer and verifier normalize content before hashing and verifying. You’ll see:
- simple: little to no normalization. Fragile. Great for people who enjoy on-call pages.
- relaxed: tolerates certain whitespace and header formatting changes. Usually the right choice.
Common, sane defaults are c=relaxed/relaxed (headers/body both relaxed) or relaxed/simple in some systems. If you’re using simple/simple in a complex email path, you’re basically signing a sandcastle and mailing it through a car wash.
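To make the difference concrete, here is a rough sketch of the two body canonicalizations and the resulting bh= value. It follows the rules as I understand them from the DKIM spec (relaxed collapses runs of spaces/tabs, strips trailing whitespace and trailing empty lines; simple only strips trailing empty lines). It assumes SHA-256 and CRLF line endings, ignores the l= length tag, and the function names are illustrative.

```python
import base64
import hashlib
import re

def canon_body_simple(body: bytes) -> bytes:
    """'simple': drop empty lines at the end, then end with exactly one CRLF."""
    while body.endswith(b"\r\n"):
        body = body[:-2]
    return body + b"\r\n"

def canon_body_relaxed(body: bytes) -> bytes:
    """'relaxed': collapse whitespace runs, strip line-end whitespace and trailing empty lines."""
    lines = [re.sub(rb"[ \t]+", b" ", ln).rstrip(b" ") for ln in body.split(b"\r\n")]
    while lines and lines[-1] == b"":
        lines.pop()
    return b"".join(ln + b"\r\n" for ln in lines)

def body_hash(body: bytes, mode: str = "relaxed") -> str:
    """Base64 SHA-256 of the canonicalized body, i.e. what goes into bh=."""
    canon = canon_body_relaxed if mode == "relaxed" else canon_body_simple
    return base64.b64encode(hashlib.sha256(canon(body)).digest()).decode("ascii")

# What the signer saw vs. what arrived after a relay "tidied" whitespace.
signed = b"Hello  world \r\n\r\n"
relayed = b"Hello world\r\n"

survives_relaxed = body_hash(signed, "relaxed") == body_hash(relayed, "relaxed")
survives_simple = body_hash(signed, "simple") == body_hash(relayed, "simple")

# A footer is new content, and no canonicalization mode forgives that.
with_footer = relayed + b"-- \r\nConfidential\r\n"
survives_footer = body_hash(signed, "relaxed") == body_hash(with_footer, "relaxed")
```

Whitespace cleanup survives relaxed and breaks simple. The footer breaks both, which is why relaxed is a mitigation and not a license to modify mail after signing.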
DKIM’s actual promise
DKIM answers: “Did a domain with control over this DKIM private key sign this message, and did the signed parts arrive unchanged?” It does not answer: “Is the From address authentic?” That’s DMARC’s job, using SPF and DKIM plus alignment rules. DKIM is one building block in a larger bouncer squad at the door.
Interesting facts and short history (because email is older than your CI pipeline)
- Fact 1: DKIM was standardized in 2007 (RFC 4871), merging ideas from Yahoo’s DomainKeys and Cisco’s Identified Internet Mail.
- Fact 2: DKIM signatures are evaluated after transport changes; even “harmless” footer injection can invalidate a body hash.
- Fact 3: Early deployments used 1024-bit RSA keys widely; many receivers now prefer or require 2048-bit keys for stronger security.
- Fact 4: DNS TXT record size limits and fragmentation have historically caused intermittent key fetch failures, especially with oversized keys or sloppy DNS setups.
- Fact 5: DKIM can sign only a subset of headers; what you choose to sign is a tradeoff between integrity and survivability across relays.
- Fact 6: Mailing lists are famous DKIM-breakers: they modify Subject, add list headers, and append footers—exactly the stuff DKIM tends to cover.
- Fact 7: ARC (Authenticated Received Chain) was introduced later to preserve authentication results across forwarders and lists, because DKIM alone often breaks downstream.
- Fact 8: Some MTAs historically rewrote line endings or dot-stuffed content differently; canonicalization exists partly to survive this mess.
- Fact 9: Multiple DKIM signatures can coexist; receiving systems typically accept the message if any valid signature meets policy needs (especially for DMARC alignment).
How DKIM breaks: the failure modes that matter
1) DNS can’t serve the key (or serves the wrong one)
Receivers verify DKIM by querying a DNS record at:
<selector>._domainkey.<domain>
If that record is missing, malformed, split incorrectly, too big for your DNS path, or cached in a stale state during rotation, you get DKIM failures that look random—because they are random across resolvers.
2) The message changed after signing
This is the classic: the message is signed, then a relay modifies it. Common culprits:
- Adding a disclaimer footer (legal, HR, or “confidentiality” banners)
- Rewriting Subject lines (tagging “EXTERNAL”)
- Modifying MIME boundaries or re-encoding content
- Normalizing whitespace in ways not covered by canonicalization
- Virus scanners or DLP gateways reassembling attachments
3) You signed the wrong headers (or too many)
Signing unstable headers is a self-inflicted wound. Headers like Received will change at each hop, so signing them is pointless unless you like failure. Some systems also add or rewrite Message-ID, Date, or even From in ways you didn’t expect. Pick headers that represent identity and content but remain stable after your signer runs.
4) Your signing domain doesn’t align with DMARC
You can have a passing DKIM signature and still fail DMARC if the d= domain doesn’t align with the visible From: domain per DMARC policy (strict or relaxed alignment). That often shows up as “DKIM=pass” but “DMARC=fail,” which business stakeholders interpret as “Email is broken.” Fair.
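A small sketch of the alignment check helps show why “pass” and “aligned” are different questions. Big assumption, stated loudly: real DMARC implementations find the organizational domain using the Public Suffix List. The org_domain_naive helper below just takes the last two labels, which is wrong for suffixes like co.uk. Treat it as an illustration of the logic only.

```python
def org_domain_naive(domain: str) -> str:
    # ASSUMPTION: "last two labels" stands in for a Public Suffix List lookup.
    labels = domain.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])

def dkim_aligned(d_tag: str, from_domain: str, strict: bool = False) -> bool:
    """True if the DKIM d= domain aligns with the visible From domain."""
    d = d_tag.lower().rstrip(".")
    f = from_domain.lower().rstrip(".")
    if strict:
        return d == f  # strict alignment: exact match
    return org_domain_naive(d) == org_domain_naive(f)  # relaxed: same organizational domain

# Subdomain signer, parent From domain: fine under relaxed, not under strict.
relaxed_ok = dkim_aligned("mail.example.com", "example.com")
strict_ok = dkim_aligned("mail.example.com", "example.com", strict=True)
# Vendor signing with its own domain: DKIM can pass, but it never aligns.
vendor_ok = dkim_aligned("vendor-mail.net", "example.com")
```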
5) Time, expiry, and replay defenses
Some DKIM signatures include x= (expiry). If your clocks drift or your messages queue too long (hi, backpressure), signatures can expire before delivery. This is less common but extremely annoying because everything “looks right” except time.
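If you suspect expiry, the check itself is trivial once the tags are parsed. The sketch below assumes you already have the signature tags in a dict (t= and x= are seconds since the epoch); the function names are hypothetical, and whether a given receiver actually enforces x= is up to that receiver.

```python
import time
from typing import Optional

def signature_expired(tags: dict, now: Optional[float] = None) -> bool:
    """True if the signature carries x= and that time has passed."""
    expiry = tags.get("x")
    if expiry is None:
        return False  # no x= tag means the signature does not expire
    now = time.time() if now is None else now
    return now > int(expiry)

def validity_window_seconds(tags: dict) -> Optional[int]:
    """Seconds between signing (t=) and expiry (x=), or None if either tag is absent."""
    if "t" not in tags or "x" not in tags:
        return None
    return int(tags["x"]) - int(tags["t"])

# Illustrative values: signed at t=1000, valid for one hour.
example = {"t": "1000", "x": "4600"}
late = signature_expired(example, now=5000)     # delivered after expiry
on_time = signature_expired(example, now=2000)  # delivered inside the window
```

Compare the validity window against your worst-case queue time. If the queue can hold mail longer than the window, expiry failures are a matter of when.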
6) Selector/key rotation performed like a weekend hobby
Rotating DKIM keys is normal. Rotating them without overlap is how you create a two-day incident. DNS caches, multi-region MTAs, and long-lived queues mean old selectors may still be used for a while. Pull the record too early and verification fails for delayed mail.
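One way to make “don’t pull the record too early” less of a gut call is to compute a date. This is a back-of-the-envelope sketch, not a standard formula: it assumes the old selector must stay published for as long as mail signed with it can still be sitting in a queue, plus a margin you pick. The five-day figure is Postfix’s default maximal_queue_lifetime; use whatever your MTAs actually run with.

```python
from datetime import datetime, timedelta

def earliest_safe_removal(last_signed_with_old: datetime,
                          max_queue_lifetime: timedelta,
                          safety_margin: timedelta) -> datetime:
    """Earliest time to unpublish the old selector (rule of thumb, see assumptions above)."""
    return last_signed_with_old + max_queue_lifetime + safety_margin

# Hypothetical rollout: the last straggler MTA stopped using the old key on Jan 3.
removal = earliest_safe_removal(
    last_signed_with_old=datetime(2025, 1, 3, 10, 0),
    max_queue_lifetime=timedelta(days=5),
    safety_margin=timedelta(days=2),
)
```

The hard part is the first argument: you need signing logs that tell you when the old selector was last used across the whole fleet, not just on the host you happened to check.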
Short joke #1: DKIM is like a tamper-evident seal. If you keep opening the jar to “improve it,” don’t act surprised when the seal screams.
Fast diagnosis playbook (first/second/third/fourth)
This is the sequence that finds the bottleneck fastest in real environments. Don’t start by editing configs. Start by looking at what receivers saw.
First: confirm what failed (DNS vs body hash vs header parse)
- Get one real failed message (original headers as received). Not a screenshot. Not a forward. The raw source.
- Read the Authentication-Results header from the receiver. It often tells you exactly what broke: “no key,” “body hash mismatch,” “bad signature,” “invalid header format.”
- Check the DKIM-Signature fields: d=, s=, c=, h=, bh=, x=.
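The first step can be sketched as a tiny classifier over the Authentication-Results line. The comment text in parentheses is receiver-specific, so the HINTS list below is an assumption built from the phrases quoted in this article. Extend it with what your receivers actually emit.

```python
import re

# Substrings from receiver comments, mapped to where to look first.
# ASSUMPTION: wording varies by receiver; this list is illustrative, not exhaustive.
HINTS = [
    ("no key", "dns"),
    ("key not found", "dns"),
    ("body hash", "body-modified"),
    ("expired", "expired"),
    ("signature did not verify", "key-or-headers"),
    ("bad signature", "key-or-headers"),
]

def classify_dkim(auth_results: str) -> str:
    """Map the dkim= result in an Authentication-Results value to a failure class."""
    match = re.search(r"\bdkim=(\w+)(?:\s*\(([^)]*)\))?", auth_results, re.IGNORECASE)
    if match is None:
        return "no-dkim-result"
    result = match.group(1).lower()
    comment = (match.group(2) or "").lower()
    if result == "pass":
        return "pass"
    for needle, label in HINTS:
        if needle in comment:
            return label
    return "unclassified"

line = ("Authentication-Results: mx.example.net; dkim=fail (body hash did not verify) "
        "header.d=example.com header.s=s2025 header.b=Qn...")
failure_class = classify_dkim(line)
```

A result of "dns" sends you to the second step, "body-modified" to the third, and "pass" means your problem is probably alignment, not DKIM itself.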
Second: verify DNS from multiple vantage points
- Query the selector record using a known resolver.
- Query using your system resolver (to catch split-horizon or corporate DNS weirdness).
- Verify the record is syntactically valid and complete (no broken quoting, missing fragments).
Third: isolate where the message is being modified
- Compare the message at signing time vs at delivery (if you can capture both).
- Look for intermediate systems that rewrite content: outbound gateways, mail security, CRM platforms, list managers.
- Send a controlled test: same sender, same recipient, but bypass optional components (if possible) to find the modifying hop.
Fourth: fix the smallest thing that restores pass
Resist the urge to “re-architect email.” First restore deliverability, then harden the pipeline. In practice that often means:
- Fixing the DNS key record or selector usage
- Switching canonicalization to relaxed
- Stopping body modification after signing (move signing to the last hop)
- Adjusting which headers are signed
- Aligning d= with From-domain for DMARC
Practical tasks: commands, outputs, and decisions (12+)
These are real tasks you can run on a mail host or an investigation box. Each includes the command, what the output means, and the decision you make. I’m assuming a Linux environment with typical tooling. Adjust paths for your distro.
Task 1: Extract Authentication-Results and DKIM-Signature from a saved message
cr0x@server:~$ awk 'BEGIN{RS="";FS="\n"} {for(i=1;i<=NF;i++) if($i ~ /^Authentication-Results:/ || $i ~ /^DKIM-Signature:/) print $i}' failed.eml
Authentication-Results: mx.example.net; dkim=fail (body hash did not verify) header.d=example.com header.s=s2025 header.b=Qn...
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=example.com; s=s2025; h=from:to:subject:date:mime-version:content-type; bh=3l8...; b=Qn...
Meaning: You already have the class of failure: body hash mismatch. This is not “DNS missing,” not “bad key,” not “signature expired.” Something changed in the body after signing.
Decision: Stop staring at DNS. Go find what modifies the body (footers, re-encoding, gateway rewriting), or move signing later in the pipeline.
Task 2: Check the DKIM public key record exists (dig)
cr0x@server:~$ dig +short TXT s2025._domainkey.example.com
"v=DKIM1; k=rsa; p=MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAtY..."
Meaning: A TXT record exists and returns a DKIM-looking payload.
Decision: If DKIM still fails with “no key,” suspect DNS visibility (split-horizon), DNSSEC failures, or record fragmentation/quoting issues on some resolvers.
Task 3: Check the same DKIM record via a specific resolver (catch split-horizon)
cr0x@server:~$ dig @1.1.1.1 +short TXT s2025._domainkey.example.com
"v=DKIM1; k=rsa; p=MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAtY..."
Meaning: Public resolvers agree. If your internal resolver returns something else, you have split DNS or stale caches.
Decision: If results differ, fix authoritative DNS first. Email receivers are not using your internal view of reality.
Task 4: Detect malformed multi-string DKIM TXT records
cr0x@server:~$ dig TXT s2025._domainkey.example.com +noall +answer
s2025._domainkey.example.com. 300 IN TXT "v=DKIM1; k=rsa; p=MIIBIjANBgkq..." "G9w0BAQEFAAOCAQ8AMIIBCgKCAQEAtY..."
Meaning: Some DNS providers split long TXT records into multiple strings. That’s fine if done correctly; resolvers concatenate them. But quoting mistakes can silently break keys.
Decision: If you see odd punctuation, missing quotes, or truncated segments, regenerate and re-publish the record. Don’t hand-edit base64 in a browser at 2 a.m.
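Eyeballing base64 is unreliable, so here is a minimal sketch of a sanity check for the strings dig returned. It joins the TXT strings the way a resolver does (no separator), splits the tags, and confirms p= decodes as base64. It does not parse the key or check its size, and a truncation that happens to land on a 4-character boundary will slip through. The sample key is placeholder bytes, not a real key.

```python
import base64
import binascii

def parse_dkim_txt(strings: list) -> dict:
    """Join TXT character-strings and split the DKIM key record into tags."""
    record = "".join(strings)  # multiple strings are concatenated with nothing in between
    tags = {}
    for part in record.split(";"):
        name, sep, value = part.partition("=")
        if sep:
            tags[name.strip()] = "".join(value.split())
    return tags

def check_dkim_txt(strings: list) -> list:
    """Return a list of problems found in the record (empty list means it looks sane)."""
    tags = parse_dkim_txt(strings)
    problems = []
    if tags.get("v", "DKIM1") != "DKIM1":
        problems.append("unexpected v= value")
    key = tags.get("p")
    if key is None:
        problems.append("missing p= tag")
    elif key == "":
        problems.append("empty p= (key revoked)")
    else:
        try:
            base64.b64decode(key, validate=True)
        except (binascii.Error, ValueError):
            problems.append("p= is not valid base64")
    return problems

# Placeholder bytes standing in for a public key, split across two TXT strings.
sample_key = base64.b64encode(bytes(range(64))).decode("ascii")
good = ["v=DKIM1; k=rsa; p=" + sample_key[:40], sample_key[40:]]
truncated = [good[0], good[1][:-3]]                        # lost its tail in a copy/paste
misquoted = [good[0] + '" "' + sample_key[40:]]            # literal quotes ended up in the data
```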
Task 5: Verify OpenDKIM sees the key and selector mapping
cr0x@server:~$ sudo opendkim-testkey -d example.com -s s2025 -vvv
opendkim-testkey: using default configfile /etc/opendkim.conf
opendkim-testkey: checking key 's2025._domainkey.example.com'
opendkim-testkey: key OK
Meaning: The DNS key record is reachable and matches what OpenDKIM expects.
Decision: If this fails, fix DNS/selector first. If it passes but receivers fail, suspect mail flow or message modification.
Task 6: Confirm the signer is actually signing (mail logs)
cr0x@server:~$ sudo grep -E "opendkim|DKIM-Signature" /var/log/mail.log | tail -n 5
Jan 03 10:14:22 mailgw opendkim[1423]: 6F2C12A0B1: DKIM-Signature field added (s=s2025, d=example.com)
Jan 03 10:14:23 mailgw postfix/qmgr[1111]: 6F2C12A0B1: from=<noreply@example.com>, size=48213, nrcpt=1 (queue active)
Meaning: OpenDKIM added a signature to the message with this queue ID (6F2C12A0B1).
Decision: If you don’t see signing, check milter wiring, socket permissions, or a bypass path that skips the milter.
Task 7: Confirm Postfix is wired to the DKIM milter
cr0x@server:~$ sudo postconf | egrep 'smtpd_milters|non_smtpd_milters|milter_default_action|milter_protocol'
smtpd_milters = inet:127.0.0.1:8891
non_smtpd_milters = inet:127.0.0.1:8891
milter_default_action = accept
milter_protocol = 6
Meaning: Both SMTP-received and locally generated mail pass through the DKIM milter. If only smtpd_milters is set, mail injected by internal processes may not get signed.
Decision: If non_smtpd_milters is empty and you generate mail locally (cron, apps), add it. Unsigned mail is frequently treated as suspicious once DMARC is strict.
Task 8: Validate the message body wasn’t rewritten by a gateway (compare hashes via OpenDKIM tools)
cr0x@server:~$ sudo opendkim-testmsg -v < failed.eml
opendkim-testmsg: dkim_eom(): verification failed: body hash did not verify
Meaning: The stored message (as received) does not match the signed body hash.
Decision: Identify the hop that modified the body. If you control a downstream gateway doing footer injection, move DKIM signing after it or stop it from touching signed mail.
Task 9: Find which headers you signed (and whether you signed unstable ones)
cr0x@server:~$ grep -i '^DKIM-Signature:' -m 1 failed.eml | sed 's/; /\n/g' | egrep '^(h=|c=|d=|s=)'
c=relaxed/relaxed
d=example.com
s=s2025
h=from:to:subject:date:mime-version:content-type
Meaning: You signed a pretty standard header set. Good. If you see h=...:received:... or other hop-mutating headers, you’ve found a self-own.
Decision: If unstable headers are signed, adjust signer configuration to sign a stable set.
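The same check can be written as a small lint over the h= value. The UNSTABLE set is an assumption: Received comes from this article, and the others are headers the DKIM spec advises against signing as far as I recall. Add whatever your own relays are known to rewrite.

```python
# ASSUMPTION: illustrative list of headers that tend to change in transit.
UNSTABLE = {"received", "return-path", "comments", "keywords"}

def lint_signed_headers(h_tag: str) -> list:
    """Return warnings for a DKIM h= value (colon-separated header names)."""
    names = [n.strip().lower() for n in h_tag.split(":") if n.strip()]
    warnings = []
    if "from" not in names:
        warnings.append("From is not signed")
    for name in names:
        if name in UNSTABLE:
            warnings.append(f"{name} is signed but likely to change in transit")
    return warnings

clean = lint_signed_headers("from:to:subject:date:mime-version:content-type")
risky = lint_signed_headers("from:received:subject")
```

Note that this lint cannot know about your environment. If a relay tags Subject after signing, Subject is unstable for you even though it is a perfectly normal header to sign.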
Task 10: Detect line-ending or MIME transformations (spot the classic “gateway rewrap”)
cr0x@server:~$ python3 - <<'PY'
import sys, email
from email import policy
msg = email.message_from_binary_file(open("failed.eml","rb"), policy=policy.default)
print("Content-Type:", msg.get_content_type())
print("Has multipart:", msg.is_multipart())
print("Transfer-Encoding:", msg.get("Content-Transfer-Encoding"))
PY
Content-Type: multipart/alternative
Has multipart: True
Transfer-Encoding: None
Meaning: Multipart email is more likely to be “helpfully” rewritten by security products. Not always, but it’s a frequent scene of the crime.
Decision: If you have a DLP/AV gateway, check whether it re-encodes parts or normalizes MIME boundaries. If yes, sign after that step.
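The top-level Transfer-Encoding is usually None for multipart mail, so the interesting detail lives in the parts. Here is a sketch that walks the MIME tree and lists each part’s content type and transfer encoding. The describe_parts name and the inline sample message are made up for illustration; in practice you would feed it the bytes of the copy captured at signing time and the copy from failed.eml, then compare the two lists.

```python
from email import message_from_bytes, policy

def describe_parts(raw: bytes) -> list:
    """Return (content_type, transfer_encoding) for every MIME part, root first."""
    msg = message_from_bytes(raw, policy=policy.default)
    rows = []
    for part in msg.walk():
        cte = part.get("Content-Transfer-Encoding")
        rows.append((part.get_content_type(), str(cte).lower() if cte is not None else None))
    return rows

# Tiny hand-built multipart message, for demonstration only.
sample = (
    b"From: noreply@example.com\r\n"
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/alternative; boundary="b1"\r\n'
    b"\r\n"
    b"--b1\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"Content-Transfer-Encoding: 7bit\r\n"
    b"\r\n"
    b"hi\r\n"
    b"--b1\r\n"
    b"Content-Type: text/html; charset=utf-8\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"<p>hi</p>\r\n"
    b"--b1--\r\n"
)
rows = describe_parts(sample)
```

If a part went out as 7bit and arrived as quoted-printable or base64, something re-encoded it, and the body hash never stood a chance.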
Task 11: Verify DMARC alignment clues (Authentication-Results parsing)
cr0x@server:~$ grep -i '^Authentication-Results:' -m 1 failed.eml
Authentication-Results: mx.example.net; spf=pass smtp.mailfrom=bounces.vendor-mail.net; dkim=pass header.d=vendor-mail.net; dmarc=fail header.from=example.com
Meaning: SPF passed for the vendor domain, DKIM passed for the vendor domain, but DMARC failed because visible From is example.com and neither SPF nor DKIM aligns with that domain.
Decision: Force the vendor to sign with d=example.com (your domain) and align SPF via a custom bounce domain, or accept that DMARC will fail and deliverability will suffer under strict policies.
Task 12: Check for multiple DKIM signatures and decide which one matters
cr0x@server:~$ grep -i '^DKIM-Signature:' failed.eml | wc -l
2
Meaning: Two signatures exist. Often one is added by your system, another by an upstream service. Receivers may validate one and ignore the other depending on alignment and policy.
Decision: Ensure at least one passing signature aligns with the From-domain for DMARC. If the aligned one fails, you still have a problem even if another passes.
Task 13: Confirm the selector in outbound mail matches your published record
cr0x@server:~$ grep -i '^DKIM-Signature:' -m 1 failed.eml | tr ';' '\n' | grep -E '^\s*s='
s=s2025
Meaning: The message used selector s2025. If DNS only has s2024, you’ve found the mismatch.
Decision: Either publish the s2025 record immediately (and keep it), or roll back the signer config to the published selector. Don’t “wait for propagation” if the record simply doesn’t exist.
Task 14: Check OpenDKIM key table/signing table mappings (common misroute)
cr0x@server:~$ sudo egrep -v '^\s*(#|$)' /etc/opendkim/SigningTable /etc/opendkim/KeyTable
/etc/opendkim/SigningTable:*@example.com s2025._domainkey.example.com
/etc/opendkim/KeyTable:s2025._domainkey.example.com example.com:s2025:/etc/opendkim/keys/example.com/s2025.private
Meaning: SigningTable maps senders to a key name; KeyTable maps that name to a domain/selector/private key path.
Decision: If mail from noreply@sub.example.com is not covered by your wildcard, you may be unintentionally not signing. Fix the table patterns to match actual From/sender addresses.
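To see why the subdomain case falls through, here is a rough approximation of wildcard matching against a SigningTable. It assumes "*" matches any run of characters and that the first matching line wins; it is not OpenDKIM’s actual matcher, so confirm real behavior against your logs. The helper names are hypothetical.

```python
import re
from typing import Optional

def pattern_matches(pattern: str, address: str) -> bool:
    """Case-insensitive match where '*' stands for any run of characters."""
    regex = ".*".join(re.escape(piece) for piece in pattern.lower().split("*"))
    return re.fullmatch(regex, address.lower()) is not None

def key_for(address: str, signing_table: list) -> Optional[str]:
    """Return the key name of the first matching (pattern, key_name) entry, else None."""
    for pattern, key_name in signing_table:
        if pattern_matches(pattern, address):
            return key_name
    return None

# Same mapping as the SigningTable shown above.
table = [("*@example.com", "s2025._domainkey.example.com")]

covered = key_for("noreply@example.com", table)
uncovered = key_for("noreply@sub.example.com", table)  # None: this mail goes out unsigned
```

Adding a line with a pattern such as *@*.example.com would cover the subdomain senders under these assumptions.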
Task 15: Validate that your private key file is readable by the signing service
cr0x@server:~$ sudo -u opendkim test -r /etc/opendkim/keys/example.com/s2025.private && echo "readable"
readable
Meaning: The OpenDKIM user can read the key. If not, OpenDKIM may silently fail to sign or log confusing errors.
Decision: Fix ownership/permissions. If you “solved” it by making the key world-readable, undo that and do it properly.
Task 16: Check for queue delays that can trigger signature expiry
cr0x@server:~$ mailq | head -n 20
-Queue ID- --Size-- ----Arrival Time---- -Sender/Recipient-------
6F2C12A0B1    48213 Fri Jan  3 10:14:23  noreply@example.com
                                         user@recipient.net
-- 48 Kbytes in 1 Request.
Meaning: If you see hours/days of backlog and your signatures expire (x=), verification can fail despite correct config.
Decision: Reduce queue time, avoid DKIM expiry unless you have a strong reason, and fix the delivery bottleneck (rate limits, DNS issues, blocked IPs).
Common mistakes: symptom → root cause → fix
This section is intentionally blunt. Most DKIM problems are boring. The trick is to recognize the shape quickly.
1) “dkim=fail (no key for signature)”
- Symptom: Authentication-Results says no key, or “key not found.”
- Root cause: Missing selector TXT record, wrong selector (s=) in outbound mail, split DNS, or broken TXT formatting.
- Fix: Publish <s>._domainkey.<d> TXT correctly; confirm with dig against public resolvers; ensure signer uses that selector.
2) “dkim=fail (body hash did not verify)”
- Symptom: DKIM fails; often intermittent by recipient or route.
- Root cause: Message body modified after signing: disclaimer, footer, content filter rewrite, re-encoding, line wrap changes not tolerated.
- Fix: Move signing to the last system before the Internet; disable post-sign modifications; use relaxed canonicalization; avoid adding content after signing.
3) “dkim=fail (signature did not verify)” with key present
- Symptom: Key exists, but signature fails even when body seems untouched.
- Root cause: Wrong key published for selector (rotation mismatch), corrupted private key, signing with different domain/selector than expected, or canonicalization mismatch in some implementations.
- Fix: Validate signer is using the same selector/key as DNS; overlap old/new selectors during rotation; regenerate keys if corruption suspected.
4) DKIM passes, DMARC fails
- Symptom: dkim=pass but dmarc=fail.
- Root cause: DKIM signed with d= that doesn’t align with visible From-domain, or SPF passes for a different domain.
- Fix: Ensure at least one DKIM signature aligns with From-domain; configure vendors with custom domain signing; validate DMARC alignment mode (relaxed vs strict).
5) DKIM fails only for mailing lists / forwards
- Symptom: Direct mail passes; forwarded/list mail fails.
- Root cause: Lists add footers/subject tags; forwarders wrap content; From rewriting (or not) interacts with DMARC. DKIM gets broken in transit.
- Fix: Prefer ARC on forwarders/lists; minimize modifications; consider DMARC-friendly list behavior; sign at the last hop you control.
6) DKIM fails only for large emails or certain recipients
- Symptom: Big newsletters fail, small alerts pass.
- Root cause: MIME transformations, gateway re-encoding, or DNS issues on resolvers that choke on large TXT/fragmentation; sometimes an MTU/EDNS path issue.
- Fix: Keep DKIM records clean; test DNS via multiple resolvers; avoid mail path components that rewrap HTML; validate that your ESP isn’t “optimizing” markup post-sign.
7) DKIM fails after a “harmless” infrastructure change
- Symptom: Suddenly DKIM fails after migrating to a new gateway, adding DLP, enabling “external” tagging, or turning on a cloud connector.
- Root cause: You changed the bytes. DKIM noticed.
- Fix: Move DKIM signing downstream of the modifying component, or configure the component not to modify outbound mail after signing.
Three corporate mini-stories from the trenches
Mini-story 1: The incident caused by a wrong assumption
Company A ran a tidy outbound stack: application servers submitted mail to Postfix, Postfix handed it to a cloud relay, and the relay delivered to the world. They had DKIM “enabled” and a DMARC policy that wasn’t aggressive. Life was fine, which is often how the danger starts.
A security team rolled out a new outbound banner: prepend “[EXTERNAL]” to the subject for any mail leaving the company. They did this on the cloud relay, because that’s where the feature checkbox lived. The change was approved quickly; it was “only a subject prefix.”
Three days later, customer receipts started landing in spam and some partners began rejecting mail outright. The SRE on call saw dkim=fail across multiple receivers, and someone confidently stated: “DKIM shouldn’t care about Subject, it’s just a header.” That sentence cost them an afternoon.
Of course DKIM cared. Their signer was on the Postfix box, upstream of the cloud relay, and they signed the Subject header. The relay modified Subject after signing. Every message was now cryptographically “tampered with.”
The fix was not heroic. They moved DKIM signing to the relay (the last hop before the Internet) and stopped signing headers that downstream systems rewrite. They also added a change-control item: “Does this modify headers/body post-sign?” It sounds bureaucratic. It’s actually cheaper than surprise spam-folder tourism.
Mini-story 2: The optimization that backfired
Company B had a mature outbound setup and rotated DKIM keys quarterly. Someone noticed their DNS provider charged extra for “advanced records” and proposed an optimization: consolidate selectors and reuse the same DKIM key across multiple subdomains and business units. Fewer records, less hassle, fewer moving parts. What could go wrong.
They did the consolidation, then also shortened TTLs to speed future changes. The next rotation, they updated the DNS record for the selector and rolled the new private key to half of their sending fleet first. The other half lagged for a day because a maintenance window slipped.
Half the fleet signed with the new key, half signed with the old key, and DNS served only the new public key. Every message from the lagging half failed DKIM verification. Receivers didn’t care about the internal scheduling explanation. They just saw broken signatures and adjusted reputation accordingly.
The “optimization” removed the safety valve: having multiple selectors in parallel. If they had used overlapping selectors—old and new—each fleet segment could have signed with the selector matching its deployed key, and DNS could have served both keys until the rollout finished.
They recovered by re-publishing the old selector record, then planning rotations with overlap windows and an explicit “do not delete old selectors until queues are drained” rule. Fewer records was not the win they thought it was.
Mini-story 3: The boring but correct practice that saved the day
Company C had a compliance-heavy environment where every outbound message passed through an archival gateway. The gateway was notorious for occasionally rewriting MIME structure when it “normalized” attachments. Everyone hated it, but it was non-negotiable.
Years earlier, an engineer had insisted on a very unsexy design: DKIM signing happens after the archival gateway, on the final outbound MTA. That meant the archival box never saw DKIM-signed messages and couldn’t break signatures. People grumbled because it made key management “one more thing” on the final hop.
One Friday, the archival team pushed an update that changed how multipart boundaries were generated. It altered nothing semantically, but it changed bytes in transit for some messages. If DKIM signing had been upstream, it would have caused widespread DKIM failures and likely a DMARC blowup by Monday.
Instead, nothing happened. DKIM signatures were added after the gateway did its weirdness. Receivers validated clean mail. The incident was downgraded to “internal change created different MIME” and never touched deliverability.
Boring architectures don’t win design awards. They do win weekends.
Short joke #2: Email deliverability is the only sport where you can lose because someone added two spaces and a footer.
Checklists / step-by-step plan
Step-by-step: fix a DKIM failure in under an hour
- Capture one failed message source from the receiving mailbox (raw headers + body). Save as failed.eml.
- Read Authentication-Results: decide if it’s “no key,” “body hash mismatch,” “signature invalid,” or “pass but DMARC fail.”
- Extract selector and domain from DKIM-Signature (s=, d=).
- Query DNS for s._domainkey.d using at least one public resolver and your local resolver.
- Validate signing service wiring (milter config, logs show “DKIM-Signature field added”).
- If body hash mismatch: map the outbound path and identify the hop that modifies content. Temporarily bypass that hop if possible to confirm.
- Apply the minimal corrective change:
  - Fix DNS record/selector mismatch
  - Move signing downstream of modifiers
  - Stop modifiers from touching outbound mail
  - Switch to relaxed/relaxed canonicalization
- Send a controlled test to at least two external receivers and confirm dkim=pass.
- Only then talk about key rotation hardening, ARC, and DMARC policy adjustments.
Operational hygiene checklist (the stuff you do before it breaks)
- Sign at the last hop you control before the open Internet.
- Use 2048-bit keys unless you have a strong constraint; keep TXT records clean and correctly split.
- Rotate selectors with overlap: publish new selector first, deploy signing, keep old selector record until you’re confident queues are drained and stragglers delivered.
- Pick stable headers to sign; don’t sign hop-mutating headers.
- Log DKIM signing events and alert on sudden drops in signed volume.
- Track every system that can rewrite mail (DLP, AV, banner tagging, “smart” SMTP relays, CRMs, ticketing systems).
- Validate DMARC alignment for vendor-sent mail using your domain.
A reliability idea worth keeping (paraphrased)
Paraphrased idea (John Allspaw): Reliability comes from making system behavior observable and learning from failure, not from pretending failures won’t happen.
FAQ
1) Why does DKIM pass sometimes and fail other times for the same sender?
Usually because the mail took different routes. One path modifies the message (footer injection, subject tagging, MIME rewrite), another doesn’t. Less commonly it’s DNS inconsistency: some resolvers see the key record, others don’t due to caching, split DNS, or a broken authoritative setup.
2) What’s the fastest way to tell if it’s DNS or message modification?
Read the receiver’s Authentication-Results. “No key” screams DNS/selector. “Body hash did not verify” screams modification after signing. “Signature did not verify” can be either wrong key/rotation mismatch or header/body changes.
3) Should I always use relaxed/relaxed canonicalization?
In production email paths with real-world intermediaries, yes more often than not. simple is brittle. If you control the entire path and want strictness, fine, but most orgs don’t actually control the entire path—despite what the architecture diagram claims.
4) Can I sign fewer headers to make DKIM survive better?
Yes, but don’t get cute. Sign identity-meaningful headers like From, plus core routing/content metadata that won’t change after signing. Avoid signing headers known to change downstream. The goal is survivability without making the signature meaningless.
5) Does forwarding always break DKIM?
Forwarding often breaks SPF (because the forwarder becomes the sending IP) and can break DKIM if the forwarder modifies content. Pure SMTP forwarding without modification can preserve DKIM. Many forwarders do modify (rewrap, add headers, sometimes alter content), which is where ARC becomes valuable.
6) What about mailing lists?
Mailing lists are DKIM’s natural predator. They commonly add footers, modify the subject, and change headers. Expect DKIM failures unless the list is configured to be DKIM/DMARC-friendly (or uses ARC properly). If list participation matters, plan for it instead of blaming “email being email.”
7) Is a 1024-bit DKIM key still acceptable?
Some receivers accept it, some penalize it, and the security margin is weaker. The practical answer: use 2048-bit RSA unless your DNS provider or platform can’t handle it cleanly. If you can’t, fix the DNS/provider constraint rather than clinging to 1024-bit keys.
8) We use a vendor to send mail. Why do we get DMARC failures even though the vendor says “DKIM is enabled”?
Because the vendor is probably signing with their own domain (d=vendor.com) and your visible From-domain is yours. DKIM may pass, but it doesn’t align with your domain, so DMARC fails. The fix is custom domain DKIM signing and an aligned envelope sender (custom bounce domain) if needed.
9) Should I have multiple DKIM signatures?
It can be useful: one signature from your infrastructure and one from a vendor, or one per domain identity. But make sure at least one signature aligns with your From-domain. Multiple signatures also complicate debugging; keep it intentional, not accidental.
10) How do I rotate DKIM keys without breaking mail?
Use a new selector, publish its DNS record first, deploy signers to use it, and keep the old selector record published for an overlap window. Only remove the old record when you’re confident delayed mail and straggler MTAs are no longer using it.
Conclusion: next steps you can actually do today
DKIM failures feel mystical until you treat them like any other production integrity problem: identify what changed, identify where it changed, and stop it from changing in the wrong place.
Do these next:
- Collect one failed raw message and classify the failure via Authentication-Results.
- Verify selector DNS from a public resolver and your local environment. Fix “no key” before touching anything else.
- If it’s a body hash mismatch, stop post-sign modifications: move DKIM signing to the final outbound hop or disable rewriting features downstream.
- Ensure DMARC alignment for your real business From-domains, especially for vendor-sent mail.
- Institutionalize the boring parts: overlap key rotations, log signing events, and treat “adds a footer” as a deliverability change, not a cosmetic tweak.
Email is a hostile distributed system disguised as a commodity service. DKIM is one of the few pieces that behaves predictably. If it fails, something changed. Find it. Fix it. Then write it down so Future You doesn’t have to rediscover it at 3 a.m.