Email: Multiple MX records — how priority really works (and common mistakes)

You changed MX records to “add redundancy,” went home, and woke up to a flood of tickets:
some senders can’t deliver, others deliver slowly, and a few messages are bouncing with errors that sound like a Kafka novel.
The mail system is “up,” but delivery is not a binary state. It’s a negotiated process between DNS, SMTP clients, policy, and time.

Multiple MX records are one of those simple-looking knobs that quietly controls who talks to you, when, and how hard they’ll try.
“Priority” is real, but it’s not the kind of priority many people assume. If you treat it like a clean active/standby switch, it will punish you.

MX “priority”: what it really means (and what it doesn’t)

An MX record says: “For this domain, here are the hostnames that accept SMTP mail, and here’s an ordering preference.”
The numeric value is called preference (often casually “priority”). Lower numbers are preferred. That’s the part everyone remembers.
The part everyone forgets: SMTP senders are not required to do what you wish; they follow standards and their own operational policies.

MX preference is not a health check.
It does not prove the server is reachable, configured, authorized, or ready.
It doesn’t even guarantee a sender will try the lowest one first if local caching or policy says otherwise.
It’s a hint. A good hint. But still a hint.

The practical meaning of multiple MX records is this:

  • Lower preference is tried first in the normal case.
  • Higher preference values are tried if lower ones fail (connection refused, timeout, 4xx, etc.); if every target fails, senders typically queue and retry later.
  • Equal preference is load sharing (not guaranteed evenly, but that’s the intent).
  • Failover is probabilistic and time-based: it depends on how quickly failures occur, how long queues persist, and how retry schedules behave.

Think of MX preference as a routing policy for other people’s software, executed on their timetable.
It’s less like a traffic light and more like a set of road signs in a different city.

Two rules that save careers

  1. If a hostname is in MX, it must be able to accept mail for the domain (and handle it correctly), or you are signing up for nondeterministic loss, delays, and bounces.
  2. Never publish an MX target you can’t monitor and operate.
    “It’s just a backup” is how you get surprise spam relaying, surprise outages, and surprise legal conversations.

One quote, one reality check

Paraphrased idea (often attributed to systems reliability thinking): Hope is not a strategy; design and verification are.
That’s the mindset you want when you publish DNS that influences mail delivery globally.

The selection algorithm: how senders choose among MX targets

Here’s the simplified algorithm most SMTP clients follow, based on the standard behavior:

  1. Look up MX records for the recipient domain.
  2. If none exist, fall back to an A/AAAA lookup on the domain itself (yes, still common).
  3. Sort MX targets by increasing preference (lowest number first).
  4. For targets with the same preference, randomize or rotate the order.
  5. For each target in order:
    • Resolve A/AAAA for the MX hostname(s).
    • Try delivery to one of the resolved IPs (policy varies: some try all, some pick one, some prefer IPv6).
    • If delivery fails with a transient error, continue to next target or queue and retry later.
    • If delivery fails permanently (5xx), bounce.
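The ordering steps above can be sketched in a few lines of Python. This illustrates the standard behavior and is not any particular MTA’s code: order_mx_targets and its (preference, hostname) input format are assumptions, and real senders add caching, per-address attempts, and local policy on top.

```python
import random
from collections import defaultdict

def order_mx_targets(domain, mx_records, rng=random):
    """Return hostnames in the order a standards-following sender would try them.

    mx_records: list of (preference, hostname) tuples (a format assumed for this sketch).
    """
    if not mx_records:
        # No MX records at all: fall back to the domain's own A/AAAA ("implicit MX").
        return [domain]
    by_pref = defaultdict(list)
    for pref, host in mx_records:
        by_pref[pref].append(host)
    ordered = []
    for pref in sorted(by_pref):        # lowest preference value first
        group = list(by_pref[pref])
        rng.shuffle(group)              # randomize among equal preferences
        ordered.extend(group)
    return ordered

print(order_mx_targets("example.com", [(20, "mx2.example.net."), (10, "mx1.example.net.")]))
# ['mx1.example.net.', 'mx2.example.net.']
```

Each equal-preference group is shuffled independently, so two records at 10 split traffic only statistically, and a 20 is reached only after every 10 has been tried.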

That’s the happy path. Production is where it gets interesting:

  • DNS caching means a sender can keep using old MX answers longer than your change window, depending on TTL and resolver behavior.
  • Local policy can override your design: some large providers treat certain IP ranges differently, or do “sticky” routing to reduce connection churn.
  • Connection-level failures are not equal: a fast “connection refused” triggers quick fallback; a timeout can burn tens of seconds per attempt and stall queue throughput.
  • Greylisting and rate limiting can turn your “backup” into a magnet for slow retries.
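A little arithmetic shows why a timeout is so much worse than a refusal. The sketch assumes a 30-second connect timeout per address (Postfix’s smtp_connect_timeout defaults to 30s; other senders differ) and a sender that tries every published address of a host before moving on. Both are assumptions, not universal behavior, and the function name is hypothetical.

```python
def seconds_before_next_mx(failures, connect_timeout=30.0, refusal_cost=0.05):
    """Rough time a sender spends on one MX host before moving to the next one.

    failures: how each address of the host fails, in the order tried
    ("timeout" = packets silently dropped, "refused" = a TCP reset came back).
    connect_timeout and refusal_cost (about one round trip) are assumed values.
    """
    cost = {"timeout": connect_timeout, "refused": refusal_cost}
    return sum(cost[kind] for kind in failures)

# Primary publishes an A and an AAAA record; a firewall silently drops both.
print(seconds_before_next_mx(["timeout", "timeout"]))   # 60.0
# Same host, but nothing is listening, so both attempts are refused at once.
print(seconds_before_next_mx(["refused", "refused"]))   # 0.1
```

Some MTAs soften this by remembering unreachable hosts for a while; the sketch does not model that. Until they do, every delivery attempt pays the full timeout before the secondary gets a chance.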

If you want crisp failover, MX alone won’t give it to you. It gives you eventual delivery under reasonable conditions.
If you need deterministic routing, you’re in the realm of traffic management in front of SMTP (or using a mail service that provides it).

Joke #1: MX priority is like an org chart. It looks authoritative until you watch how decisions actually get made.

Interesting facts and historical context

Mail is older than most of your monitoring stack, and it carries some legacy behaviors that still matter.
Here are concrete facts that explain why “just add another MX” is rarely “just.”

  1. MX records were introduced to decouple mail routing from hostnames. Early mail routing relied heavily on host tables and direct host addressing; MX made it more flexible.
  2. MX preference is intentionally simple. It’s an ordering mechanism, not a weights-and-health framework like modern load balancers.
  3. Fallback to A/AAAA when MX is missing is still part of common behavior. That’s why naked domains sometimes “receive mail” accidentally.
  4. Some senders treat equal-preference MX as a rotation set. The goal is distribution, but the actual distribution depends on implementation and caching.
  5. “Backup MX” used to be a common pattern. Store-and-forward made sense when links were flaky; today it’s often a liability unless carefully controlled.
  6. SMTP is built around retries. Temporary failures are expected; queues and backoff are features, not bugs.
  7. Opportunistic TLS changed what “reachable” means. A server can accept TCP/25 but still fail delivery policy if TLS or certificate checks are enforced by the sender.
  8. IPv6 adoption introduced new asymmetries. A sender may prefer IPv6, and your v6 path might be “up” in routing but “down” in reality (firewalls, reverse DNS, reputation).
  9. Large providers apply behavior beyond the base standards. Throttling, reputation, and heuristics can dominate what you experience as “delivery.”

Where multi-MX designs go wrong in production

The seductive wrong assumption: “Higher number means standby”

People publish MX 10 for primary, MX 20 for standby, and assume the standby will only be used when primary is down.
That’s not guaranteed. “Down” is ambiguous: unreachable? rejecting? slow? temporary 4xx? DNS stale? IPv6 broken?
Many senders will try the secondary if the primary is merely slow or intermittently failing.

Then your standby becomes a production ingress path without the production hardening.
It might not have correct TLS configuration, or it might be in a different filtering path, or it might not be authorized to send bounces properly.

Equal priority as “load balancing” without symmetry

If you publish two MX with the same preference, you are inviting load distribution.
That’s fine only if both endpoints are operationally identical from the perspective of SMTP:
same anti-spam posture, same rate limits, same TLS policy, same banner identity, same routing internally, same ability to accept for all recipients.

In the real world, one node is a little older, a little misconfigured, a little less monitored.
Guess where most of your weird bounces come from.
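A cheap way to catch that asymmetry is to diff the settings that matter across nodes. A sketch: policy_drift and the setting names in the example are hypothetical; in practice you would fill the dicts from each node’s exported config (for Postfix, postconf -n output).

```python
def policy_drift(nodes):
    """Return {setting: {host: value}} for every setting not identical on all nodes.

    nodes maps an MX hostname to a dict of settings; a setting missing on one
    node counts as drift (its value shows up as None).
    """
    all_keys = set().union(*(settings.keys() for settings in nodes.values()))
    drift = {}
    for key in sorted(all_keys):
        values = {host: settings.get(key) for host, settings in nodes.items()}
        if len({repr(v) for v in values.values()}) > 1:   # repr() tolerates unhashable values
            drift[key] = values
    return drift

# Hypothetical settings; these are labels for the example, not MTA parameter names.
nodes = {
    "mx1.example.net": {"tls_min_version": "TLSv1.2", "rcpt_validation": True},
    "mx2.example.net": {"tls_min_version": "TLSv1.2", "rcpt_validation": False},
}
print(policy_drift(nodes))
# {'rcpt_validation': {'mx1.example.net': True, 'mx2.example.net': False}}
```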

“Backup MX” that accepts for everyone becomes a spam sink

A classic failure mode: the backup MX accepts mail it can’t ultimately deliver, because it behaves like an open relay for inbound mail.
Not an open relay in the obvious “send anywhere” sense, but in the “accept any recipient at SMTP time, then fail later” sense.
Spammers love that because it hides recipient validity and shifts pain to your queue.

If your backup MX does not have recipient validation (or cannot perform it because it lacks directory access), it must be configured extremely carefully.
Otherwise you’re building a liability with excellent uptime.

DNS mistakes: MX pointing at IPs, CNAMEs, or broken hostnames

MX targets must be hostnames that resolve to A and/or AAAA records. They cannot be raw IP addresses.
Many DNS providers will let you type nonsense. SMTP senders will not.

CNAME-at-MX violates the DNS specification (RFC 2181 says an MX target must not be an alias) and is often rejected by validators. Some resolvers follow it anyway; some systems treat it as misconfiguration.
You don’t want “some” when the global mail ecosystem is the client.
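DNS providers won’t always stop you, so a syntax check before publishing is cheap insurance. A minimal sketch: check_mx_target is a hypothetical helper that applies the usual hostname rules (letters, digits, hyphens; labels up to 63 characters). It cannot tell whether the name is a CNAME or has A/AAAA records; that still needs a live lookup.

```python
import ipaddress
import re

# One DNS label: 1-63 letters, digits, or hyphens, not starting or ending with a hyphen.
LABEL = re.compile(r"^(?!-)[A-Za-z0-9-]{1,63}(?<!-)$")

def check_mx_target(target):
    """Return a list of syntax problems with an MX target (empty list = looks sane)."""
    name = target.rstrip(".")
    try:
        ipaddress.ip_address(name)
        return ["MX target is an IP address; it must be a hostname"]
    except ValueError:
        pass  # not an IP literal, keep checking
    problems = []
    labels = name.split(".")
    if len(name) > 253:
        problems.append("name longer than 253 characters")
    if len(labels) < 2:
        problems.append("single-label name; use a fully qualified hostname")
    bad = [label for label in labels if not LABEL.match(label)]
    if bad:
        problems.append(f"invalid labels: {bad}")
    return problems
```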

Negative caching and propagation myths

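When you “fix DNS,” you don’t instantly fix mail. Resolvers cache.
Some cache too long. Some ignore your TTL. Some have stale answers due to intermediary resolvers.
Negative answers are cached too: a resolver that looked up your new MX hostname before it existed can keep answering “no such name” until its negative-cache lifetime runs out.
Your monitoring might show the right answer while a major sender still sees the old one.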
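Negative answers have a lifetime of their own. RFC 2308 says a resolver may cache an NXDOMAIN or “no such record” answer for the smaller of the SOA record’s TTL and the SOA MINIMUM field (individual resolvers may cap it further). A worked example with made-up SOA values:

```python
def negative_cache_ttl(soa_ttl, soa_minimum):
    """Seconds a resolver may cache a negative answer (RFC 2308, section 5)."""
    return min(soa_ttl, soa_minimum)

# Hypothetical zone: the SOA record has TTL 3600 and MINIMUM 86400.
print(negative_cache_ttl(3600, 86400))   # 3600
```

So if someone queries a new MX hostname before you create it, their resolver can keep answering “does not exist” for up to an hour in this example, even after the record is live.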

Fast diagnosis playbook

When mail delivery is weird, do not start by tweaking MX numbers.
Start by finding where delivery is failing: DNS answer, TCP reachability, SMTP dialogue, policy rejection, or queue behavior.

First: confirm what the world sees (DNS, from multiple vantage points)

  • Query MX and A/AAAA for the domain and for each MX target.
  • Check TTLs and whether answers differ between resolvers.
  • Confirm that no MX target is a CNAME.

Second: test TCP/25 and SMTP banner on each MX target (IPv4 and IPv6)

  • Fast failures (refused) are good signals; timeouts are throughput killers.
  • Check firewall, routing, and provider blocks.
  • Confirm correct hostname and TLS capabilities if required.
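The refused/timeout distinction is easy to measure yourself. A sketch using Python’s standard socket module; probe_tcp is a hypothetical name, and the 3-second default mirrors the nc -w 3 checks in the task list. socket.create_connection tries each resolved address in turn, applying the timeout per attempt, much like a sender working through A and AAAA records.

```python
import socket

def probe_tcp(host, port=25, timeout=3.0):
    """Classify a TCP connect to host:port as 'open', 'refused', 'timeout', or 'error: ...'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"   # fast failure: a sender can move to the next MX at once
    except socket.timeout:
        return "timeout"   # slow failure: the whole timeout was spent
    except OSError as exc:
        return f"error: {exc.strerror or exc}"   # e.g. no route to host, DNS failure
```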

Third: look at server-side acceptance and queue behavior

  • Are you rejecting at connect, at HELO/EHLO, at MAIL FROM, or at RCPT TO?
  • Are you deferring (4xx) due to greylisting, rate limiting, DNSBL, or policy?
  • Is your “backup” queueing indefinitely because it can’t deliver internally?

Fourth: correlate with logs and sender patterns

  • Which remote IPs are connecting to which MX target?
  • Are large providers consistently using the secondary?
  • Do failures cluster on IPv6, one AZ, or one specific hostname?

Practical tasks: commands, outputs, and decisions

These are the tasks I actually run when someone says “MX failover isn’t working” or “mail is delayed.”
Each task includes: the command, what the output means, and the decision you make.

Task 1: See the MX set as your resolver sees it

cr0x@server:~$ dig +nocmd example.com MX +noall +answer
example.com.        300     IN      MX      10 mx1.example.net.
example.com.        300     IN      MX      20 mx2.example.net.

Meaning: Preference 10 is tried before 20. TTL is 300 seconds.

Decision: If the set is wrong, fix DNS first. If it’s right, move on; MX numbers alone don’t explain “delays.”

Task 2: Check whether MX targets actually resolve (A and AAAA)

cr0x@server:~$ dig +nocmd +noall +answer mx1.example.net A mx1.example.net AAAA
mx1.example.net.    300     IN      A       203.0.113.10
mx1.example.net.    300     IN      AAAA    2001:db8:10::10

Meaning: Both IPv4 and IPv6 are published.

Decision: If you publish AAAA, you must operate IPv6. If IPv6 is broken, either fix it or stop advertising it.

Task 3: Detect a CNAME hiding behind an MX hostname

cr0x@server:~$ dig +nocmd mx2.example.net CNAME +noall +answer
mx2.example.net.    300     IN      CNAME   mail-gw.provider.example.

Meaning: Your MX hostname is a CNAME.

Decision: Point the MX at a hostname that has its own A/AAAA records, not an alias. Some senders will fail or treat the CNAME as misconfiguration.

Task 4: Compare answers from different resolvers (spot propagation/caching issues)

cr0x@server:~$ dig @1.1.1.1 example.com MX +noall +answer
example.com.        300     IN      MX      10 mx1.example.net.
example.com.        300     IN      MX      20 mx2.example.net.
cr0x@server:~$ dig @8.8.8.8 example.com MX +noall +answer
example.com.        3600    IN      MX      5 oldmx.example.org.

Meaning: Different resolvers disagree. One still serves the old MX with a longer TTL.

Decision: Expect delayed cutover. Keep old infrastructure alive until caches age out, or reduce TTL well before migrations.

Task 5: Verify TCP/25 reachability quickly (IPv4)

cr0x@server:~$ nc -vz -w 3 203.0.113.10 25
Connection to 203.0.113.10 25 port [tcp/smtp] succeeded!

Meaning: Port 25 is reachable and the host is listening.

Decision: If this fails, fix firewall/routing/provider blocks before touching DNS. MX “failover” won’t help if both are blocked.

Task 6: Verify TCP/25 reachability (IPv6)

cr0x@server:~$ nc -6 -vz -w 3 2001:db8:10::10 25
nc: connect to 2001:db8:10::10 port 25 (tcp) failed: No route to host

Meaning: IPv6 path is broken.

Decision: If you can’t fix IPv6 quickly, remove AAAA for the MX hostnames (or fix routing/firewall). Otherwise some senders will stall on timeouts.

Task 7: Check the SMTP banner and basic dialogue

cr0x@server:~$ nc -C mx1.example.net 25
220 mx1.example.net ESMTP Postfix
EHLO test.example
250-mx1.example.net
250-STARTTLS
250 PIPELINING

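Meaning: Server answers and offers STARTTLS. Banner identity looks sane. This is a plaintext session; to test the TLS handshake itself, run openssl s_client -starttls smtp -connect mx1.example.net:25 -servername mx1.example.net -crlf, which sends EHLO and STARTTLS for you and then prints the certificate chain.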

Decision: If STARTTLS fails or the banner is wrong (e.g., generic cloud host), expect reputation and policy issues. Fix TLS and identity before adding more MX.

Task 8: Check how your MTA routes outbound mail (Postfix, from a Linux host)

cr0x@server:~$ postconf -n | grep -E 'myhostname|mydestination|relayhost'
myhostname = app1.internal.example
mydestination = $myhostname, localhost.localdomain, localhost
relayhost =

Meaning: This host will deliver directly (no relayhost configured).

Decision: If you intended to relay through a smart host, set relayhost. Otherwise this box is now part of your mail delivery surface area.

Task 9: Trace an outbound delivery decision (Postfix)

cr0x@server:~$ postmap -q example.com /etc/postfix/transport
smtp:

Meaning: No special transport; will use normal SMTP MX lookups.

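Decision: If you need to force a specific next hop (e.g., to avoid a broken MX during incident response), use a temporary transport map entry such as example.com smtp:[mx2.example.net]; the square brackets tell Postfix to skip the MX lookup. Rebuild the table with postmap, and remove the entry once the incident is over.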

Task 10: Look for deferrals and why they happen (queue inspection)

cr0x@server:~$ mailq
-Queue ID-  --Size-- ----Arrival Time---- -Sender/Recipient-------
7F2A51C02F     4210 Mon Jan  4 10:12:19  alerts@example.com
                                         (connect to mx1.example.net[203.0.113.10]:25: Connection timed out)
                                         user@example.com

-- 4 Kbytes in 1 Request.

Meaning: The sender is timing out to the primary. That’s slow-fail, the worst kind.

Decision: Fix connectivity or the firewall. If you can’t, carefully and temporarily lower the connect timeout on your own sending MTAs (in Postfix, smtp_connect_timeout) and make sure the secondary is healthy.
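When the queue is long, counting deferral reasons beats scrolling. A sketch that parses mailq output like the sample above; summarize_deferrals is hypothetical, and it only understands the common “connect to host[ip]:25” wording, counting everything else under host “?”.

```python
import re
from collections import Counter

REASON = re.compile(r"^\s*\((.+)\)\s*$")               # "(reason text)" lines under a queue entry
CONNECT_HOST = re.compile(r"connect to ([^\[\s]+)\[")   # "connect to mx1.example.net[203.0.113.10]"

def summarize_deferrals(mailq_text):
    """Count deferral reasons in Postfix mailq output, keyed by (destination host, kind)."""
    counts = Counter()
    for line in mailq_text.splitlines():
        match = REASON.match(line)
        if not match:
            continue
        reason = match.group(1)
        host_match = CONNECT_HOST.search(reason)
        host = host_match.group(1) if host_match else "?"
        lowered = reason.lower()
        if "timed out" in lowered:
            kind = "timeout"
        elif "refused" in lowered:
            kind = "refused"
        else:
            kind = reason[:60]   # keep a short excerpt of anything else
        counts[(host, kind)] += 1
    return counts
```

If one host dominates the timeout count, that is your slow-fail path.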

Task 11: Confirm which remote IPs are hitting which MX (log sampling)

cr0x@server:~$ sudo grep -E 'smtpd\[[0-9]+\]: connect from' /var/log/mail.log | tail -n 5
Jan  4 10:15:01 mx2 postfix/smtpd[18422]: connect from mail-oi1-f170.google.com[209.85.167.170]
Jan  4 10:15:09 mx2 postfix/smtpd[18422]: connect from mta7.am0.yahoodns.net[67.195.228.94]
Jan  4 10:15:14 mx2 postfix/smtpd[18422]: connect from outbound.protection.outlook.com[40.92.72.65]
Jan  4 10:15:18 mx2 postfix/smtpd[18422]: connect from mail.example-sender.net[198.51.100.44]
Jan  4 10:15:22 mx2 postfix/smtpd[18422]: connect from unknown[192.0.2.55]

Meaning: Big providers are connecting to mx2 (the “secondary”).

Decision: Treat mx2 as production. If it’s not equivalent to mx1, you’ve created a split-brain of acceptance policies.
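To see which providers land on which MX, count the smtpd connect lines per client domain on each host. A sketch; connections_by_sender is hypothetical, and grouping by the last two hostname labels is a rough heuristic that mislabels suffixes such as co.uk.

```python
import re
from collections import Counter

# Matches "postfix/smtpd[PID]: connect from name[ip]" but not "disconnect from".
CONNECT = re.compile(r"postfix/smtpd\[\d+\]: connect from ([^\[\s]+)\[([^\]]+)\]")

def connections_by_sender(log_lines, labels=2):
    """Count smtpd connections by the last `labels` labels of the client hostname."""
    counts = Counter()
    for line in log_lines:
        match = CONNECT.search(line)
        if not match:
            continue
        name = match.group(1)
        # Clients without reverse DNS are logged as "unknown"; keep them as one bucket.
        key = name if name == "unknown" else ".".join(name.split(".")[-labels:])
        counts[key] += 1
    return counts
```

Run it against mx1’s and mx2’s logs separately. If the big providers appear on both, mx2 is a production ingress, whatever its preference number says.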

Task 12: Test recipient acceptance at SMTP time (RCPT TO behavior)

cr0x@server:~$ swaks --to nonexistent-user@example.com --server mx2.example.net --timeout 10
=== Trying mx2.example.net:25...
=== Connected to mx2.example.net.
<-  220 mx2.example.net ESMTP Postfix
 -> EHLO client.example
<-  250-mx2.example.net
<-  250 PIPELINING
 -> MAIL FROM:<test@client.example>
<-  250 2.1.0 Ok
 -> RCPT TO:<nonexistent-user@example.com>
<-  250 2.1.5 Ok
 -> DATA
<-  554 5.7.1 Service unavailable; Client host blocked

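Meaning: mx2 accepts an invalid recipient at RCPT time (250). DATA is rejected here only because this test client is blocked; a sender that passes that check would get the message accepted, and mx2 would have to bounce it later. That’s a recipe for backscatter and queue growth.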

Decision: Configure recipient validation (reject unknown users at RCPT TO) or ensure mx2 is not listed as MX for domains it can’t validate.
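The RCPT-time decision that mx2 is missing looks roughly like this. A sketch, not Postfix’s implementation: LOCAL_DOMAINS, VALID_RECIPIENTS, and rcpt_reply are hypothetical, the recipient list would really come from your directory or a table such as Postfix’s relay_recipient_maps, and the reply texts are simplified.

```python
# Hypothetical recipient data for this sketch only.
LOCAL_DOMAINS = {"example.com"}
VALID_RECIPIENTS = {"alerts@example.com", "user@example.com"}

def rcpt_reply(address, local_domains=LOCAL_DOMAINS, valid=VALID_RECIPIENTS):
    """Return the (simplified) SMTP reply an MX should give to RCPT TO:<address>."""
    # Lower-casing the whole address is a simplification: domains are case-insensitive,
    # local parts technically are not, though most systems treat them that way.
    addr = address.strip().strip("<>").lower()
    local, sep, domain = addr.rpartition("@")
    if not sep or not local:
        return "501 5.1.3 Bad recipient address syntax"
    if domain not in local_domains:
        return "554 5.7.1 Relay access denied"   # never accept for other domains
    if addr not in valid:
        # Rejecting now, while the sender is connected, means no bounce later
        # and nothing queued for a recipient that doesn't exist.
        return "550 5.1.1 User unknown"
    return "250 2.1.5 Ok"
```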

Task 13: Verify reverse DNS for the MX IPs (reputation basics)

cr0x@server:~$ dig +short -x 203.0.113.10
mx1.example.net.

Meaning: PTR matches the hostname used in SMTP identity.

Decision: If PTR is missing or generic, fix it. Many receivers treat missing or mismatched rDNS as a negative trust signal and will throttle or reject.
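Receivers often go one step further than the PTR lookup and check that the PTR name resolves back to the same IP (forward-confirmed reverse DNS). A sketch using the system resolver; forward_confirmed_rdns is hypothetical, and the resolver functions are parameters so it can be exercised offline.

```python
import ipaddress
import socket

def forward_confirmed_rdns(ip, reverse=socket.gethostbyaddr, forward=socket.getaddrinfo):
    """Return the PTR name for `ip` if that name resolves back to `ip`, else None."""
    try:
        ptr_name = reverse(ip)[0]
    except OSError:          # socket.herror: no PTR record
        return None
    try:
        infos = forward(ptr_name, None)
    except OSError:          # socket.gaierror: the PTR name does not resolve
        return None
    target = ipaddress.ip_address(ip)
    # info[4] is the sockaddr tuple; its first element is the address string.
    forward_ips = {ipaddress.ip_address(info[4][0]) for info in infos}
    return ptr_name if target in forward_ips else None
```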

Task 14: Check SPF alignment for inbound gateways that might generate bounces

cr0x@server:~$ dig +short example.com TXT
"v=spf1 ip4:203.0.113.10 ip4:203.0.113.20 -all"

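Meaning: Only those two IPv4 addresses are authorized to send mail using example.com as the envelope sender; everything else gets a hard fail (-all).

Decision: If your “secondary” sends any mail that claims to be from your domain, ensure SPF (and DKIM/DMARC where appropriate) won’t cause rejections. Bounces (DSNs) have an empty envelope sender, so receivers evaluate SPF against the HELO name instead; publish SPF for that name too.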
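To check whether a gateway IP is listed literally in an SPF record, a partial evaluator is enough. This sketch only understands ip4: and ip6: mechanisms and ignores include:, a, mx, and redirect=, so “not listed” does not mean SPF will fail; spf_ip_listed is a hypothetical name. For real verdicts, use a full SPF implementation.

```python
import ipaddress

def spf_ip_listed(spf_record, ip):
    """Partial SPF check: does an ip4:/ip6: mechanism give `ip` a pass?

    Terms are checked left to right and the first ip4/ip6 match decides;
    every other mechanism is skipped, so False only means "not listed literally".
    """
    addr = ipaddress.ip_address(ip)
    for term in spf_record.split()[1:]:        # first term is "v=spf1"
        qualifier = "+"
        if term[0] in "+-~?":
            qualifier, term = term[0], term[1:]
        mechanism, _, value = term.partition(":")
        if mechanism.lower() not in ("ip4", "ip6") or not value:
            continue
        network = ipaddress.ip_network(value, strict=False)
        if addr.version == network.version and addr in network:
            return qualifier == "+"
    return False
```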

Task 15: Confirm DMARC policy (to predict how strict receivers will be)

cr0x@server:~$ dig +short _dmarc.example.com TXT
"v=DMARC1; p=reject; rua=mailto:dmarc-rua@example.com"

Meaning: Strict policy: receivers that enforce DMARC are asked to reject mail that passes neither aligned SPF nor aligned DKIM.

Decision: If you route mail through different gateways, keep signing/alignment consistent. Otherwise some paths will fail and look “random.”

Task 16: Spot SMTP timeouts vs refusals (socket states can hint)

cr0x@server:~$ sudo ss -tn state syn-sent '( dport = :25 )' | head
Recv-Q   Send-Q     Local Address:Port      Peer Address:Port
0        1          10.0.0.15:45822         203.0.113.10:25
0        1          10.0.0.15:45824         203.0.113.10:25

Meaning: Many SYN-SENT connections indicate timeouts or filtered packets, not application-level rejection.

Decision: Look at firewall rules and upstream filtering. If you see SYN-SENT pileups, changing MX preference won’t help.
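On a busy relay, grouping the SYN-SENT sockets by destination shows whether one MX is the problem. A sketch; syn_sent_by_peer is hypothetical and assumes the peer address is the last column (true when ss is run without -p).

```python
from collections import Counter

def syn_sent_by_peer(ss_output):
    """Count sockets per remote host in `ss -tn state syn-sent` output."""
    counts = Counter()
    for line in ss_output.splitlines():
        fields = line.split()
        # Skip blank lines and the header row (with or without a State column).
        if not fields or fields[0] in ("State", "Recv-Q"):
            continue
        host, _, _port = fields[-1].rpartition(":")
        counts[host] += 1
    return counts
```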

Three corporate mini-stories (anonymized but painfully real)

Mini-story 1: The incident caused by a wrong assumption (“Standby means unused”)

A mid-sized company ran their inbound mail on a pair of MTAs in different data centers. MX 10 pointed at the primary.
MX 20 pointed at a “backup” server that hadn’t been touched in months because it was “only for emergencies.”
The team felt they’d done the responsible thing. Two MX records! Resilience achieved! They even told leadership it was “active-passive.”

Then a firewall change introduced intermittent packet loss to the primary DC. Not an outage—just enough dropped SYN/ACKs to cause timeouts.
Some senders retried and eventually hit the secondary. Others queued for hours before trying the secondary.
A few gave up after their own retry policy expired. The symptom looked like random missing mail. That’s the worst kind of incident.

The backup MTA was running an older configuration that didn’t enforce recipient validation at RCPT time.
It accepted mail for typo’d recipients, then later failed delivery internally. Queue disk filled. Then it started failing legitimate mail too.
Meanwhile, the primary was “mostly up,” so monitoring didn’t scream until humans did.

The fix was brutally boring: make both MTAs equivalent, add proper recipient checks, align policy and TLS, and monitor deliverability, not just process uptime.
They also learned to treat timeouts as severity-one events. Refusals fail fast; timeouts fail slowly and steal your throughput.

Mini-story 2: The optimization that backfired (“Equal preference will load-balance nicely”)

Another org wanted to reduce inbound load on a single gateway. They published two MX records at preference 10.
They expected a clean 50/50 split and patted themselves on the back for avoiding a load balancer.
DNS-based load distribution! Classic.

It mostly worked—until it didn’t. One gateway had a slightly stricter TLS configuration: newer ciphers, different certificate chain, slightly different TLS handshake behavior.
A subset of older sender stacks failed STARTTLS negotiation and fell back to plaintext—or failed entirely based on their policy.
Suddenly, “some partners can’t email us” became a recurring ticket type.

Worse, spam filtering was not identical. One gateway learned faster (more CPU, newer rules), so spammers gravitated toward the weaker one.
That gateway’s IP reputation degraded, which made legitimate mail from certain providers arrive more slowly due to throttling.
Everyone blamed DNS because that’s what had changed. DNS was innocent; asymmetry was guilty.

They eventually moved to a model where both MX targets were truly identical in policy and patching cadence, and they validated TLS interoperability in staging.
Equal-preference MX isn’t “wrong.” Unequal endpoints behind equal-preference MX is wrong.

Mini-story 3: The boring but correct practice that saved the day (“Keep old MX alive, lower TTL early”)

A large enterprise migrated from an on-prem gateway to a hosted mail security service.
The migration plan wasn’t glamorous. It was a checklist: lower MX TTL days in advance, publish the new MX with a higher preference value (less preferred) first, verify acceptance,
then swap preferences during a quiet window. Keep the old gateway online for a full cache-aging period.

During cutover, an unexpected regional DNS resolver issue caused some clients to receive stale MX answers.
Predictably, those clients kept sending mail to the old gateway. But the old gateway was still running, still accepting, and still forwarding to the new service.
No one noticed in the business. That’s the best outcome: invisibly correct.

The team’s monitoring was also boring and effective: they had synthetic SMTP tests to each MX target, tracking connection latency and STARTTLS success.
When the hosted service had a brief rate-limiting blip, the synthetic checks saw rising 4xx deferrals immediately.
The team responded by coordinating with the vendor and temporarily adjusting their own inbound throttles.

Nobody got a trophy. But nobody got paged at 3 a.m. either, which is the closest thing operations has to a trophy.

Joke #2: “It’s just DNS” is the email equivalent of “it’s just a small change.” Both phrases age badly.

Common mistakes: symptom → root cause → fix

1) Symptom: Some senders deliver to secondary even when primary is “up”

Root cause: Primary is slow or intermittently timing out; senders fall back based on their own retry logic. Could also be IPv6 broken on primary.

Fix: Eliminate timeouts. Check TCP reachability, SYN-SENT, firewall drops, and IPv6. Ensure primary fails fast when unhealthy (or remove broken paths).

2) Symptom: Mail is delayed for hours, then arrives in bursts

Root cause: Senders are queueing due to transient errors (4xx) or timeouts. Your MX “failover” is happening, but on a long retry schedule.

Fix: Find the exact SMTP stage returning 4xx. Fix rate limits, greylisting, DNSBL triggers, or upstream filtering. Reduce timeout-induced stalls.

3) Symptom: Random bounces referencing one MX hostname

Root cause: Unequal configuration between MX targets (TLS, anti-spam, auth requirements, recipient validation, or routing downstream).

Fix: Make MX targets symmetrical or remove the weaker one. If you can’t keep them equivalent, don’t advertise them equally (or at all).

4) Symptom: Mail loops or “relaying denied” from the secondary

Root cause: Secondary is not configured to accept mail for the domain, or it can’t route to the internal destination. Often happens with “backup MX” that doesn’t know your domains.

Fix: Ensure secondary is authoritative for the recipient domain and has correct transport routes. Validate with SMTP RCPT tests and end-to-end delivery.

5) Symptom: “Host not found” bounces after an MX change

Root cause: MX target hostname has no A/AAAA, or was mistyped, or relies on a CNAME chain that breaks in some resolvers.

Fix: Publish direct A/AAAA for MX targets; remove CNAME-at-MX; verify with dig and from multiple resolvers.

6) Symptom: Delivery failures only from IPv6-capable senders

Root cause: AAAA exists but IPv6 path is broken (routing, firewall, NAT64 weirdness, provider blocks) or reverse DNS missing for IPv6.

Fix: Either operate IPv6 fully (connectivity, rDNS, reputation), or do not publish AAAA for MX hosts.

7) Symptom: Increased spam after adding a backup MX

Root cause: Backup MX accepts for invalid recipients (no recipient validation), becoming an attractive target and queue burden.

Fix: Reject invalid recipients at RCPT time, or ensure backup MX has directory access / recipient maps. Add strict relay controls and monitoring.

8) Symptom: “Works in our tests” but partners report failures

Root cause: You tested from one network/resolver. Partners hit different resolvers, cached answers, different TLS stacks, or policy constraints.

Fix: Test from multiple networks and resolvers. Validate TLS interoperability and SMTP dialogue. Monitor from external probes, not just internally.

Checklists / step-by-step plan

Checklist: designing multiple MX records (what to do, not what to hope)

  1. Decide the intent: distribution (equal preference) or preference ordering (primary/secondary). Don’t mix narratives.
  2. Make every advertised MX production-grade: patching, monitoring, capacity, logs, backups, and on-call ownership.
  3. Keep policy consistent: TLS, cipher policy, anti-spam, rate limiting, recipient validation, and banner identity.
  4. Operate IPv6 deliberately: publish AAAA only if you’ve tested reachability, rDNS, and firewall paths.
  5. Set sensible TTLs: lower TTL well ahead of planned migrations; avoid frantic same-day TTL changes.
  6. Avoid CNAME-at-MX: publish A/AAAA directly for the MX hostname.
  7. Plan for queueing: understand that failover may mean delayed delivery. Communicate that to stakeholders.
  8. Validate end-to-end: test SMTP acceptance and internal routing, not just “port is open.”

Step-by-step: safe MX cutover (minimal drama edition)

  1. T-7 days: Lower MX TTL (e.g., 3600 → 300). Caches only pick up the lower TTL once their copy with the old TTL expires, so do this well before anything else changes.
  2. T-3 days: Bring new MX online. Ensure it can accept mail for the domain and deliver internally.
  3. T-2 days: Publish the new MX with a higher preference value (less preferred) as a canary. Example: existing MX 10, new MX 20.
  4. T-2 to T-1 days: Observe logs. Confirm who connects to the new MX and that messages deliver correctly.
  5. T-0: Swap preferences (new becomes the lower number). Keep the old MX online with the higher preference value.
  6. T+1 day: Continue monitoring from multiple resolvers and synthetic SMTP checks.
  7. T+cache-aging window: Remove old MX only after you’re confident stale resolvers are no longer feeding it traffic.
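The timing in the plan is simple arithmetic on TTLs. A sketch; the function names and the one-day safety margin are assumptions, and the margin exists because some resolvers hold answers longer than the TTL allows.

```python
from datetime import datetime, timedelta

def lowered_ttl_effective(lowered_at, previous_ttl):
    """When every cache that honors TTLs has picked up a TTL reduction."""
    return lowered_at + timedelta(seconds=previous_ttl)

def earliest_safe_removal(final_change_at, ttl, margin=timedelta(days=1)):
    """Earliest time to retire the old MX after the last MX change.

    ttl is the TTL in force when the change was made; margin (an assumed
    default) covers resolvers that keep answers longer than they should.
    """
    return final_change_at + timedelta(seconds=ttl) + margin

print(lowered_ttl_effective(datetime(2021, 1, 4, 9, 0), 3600))    # 2021-01-04 10:00:00
print(earliest_safe_removal(datetime(2021, 1, 11, 22, 0), 300))   # 2021-01-12 22:05:00
```

The first number is why the TTL drop goes in days ahead: until it takes effect, the swap is still governed by the old 3600 seconds.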

Checklist: “backup MX” done correctly (rare, but possible)

  1. Recipient validation is non-negotiable: reject unknown recipients at RCPT time.
  2. Store-and-forward limits: cap queue growth; alert early; ensure disk and I/O headroom.
  3. Strict relay controls: never allow it to become a general-purpose relay.
  4. Consistent policy: same TLS and filtering posture as primary to avoid reputation drift.
  5. Operational ownership: monitored, patched, tested, and included in incident drills.

FAQ

1) Does a lower MX number guarantee senders will always use that server?

No. It’s a preference order, not an enforcement mechanism. Senders can cache old answers, apply their own policies,
and fall back when the preferred host is slow, timing out, or transiently rejecting.

2) If my primary MX is down, will mail immediately go to the secondary?

Sometimes. If “down” results in fast failures (connection refused), many senders will quickly try the next MX.
If “down” looks like timeouts, they may burn time per attempt and then queue and retry later, leading to delays.

3) Should I set equal MX preference values for load balancing?

Only if both endpoints are equivalent operationally and policy-wise. Equal preference means “either is fine.”
If one is weaker, you’ll get weird partial failures that look random.

4) Can an MX record point directly to an IP address?

No. MX targets must be hostnames that resolve to IPs via A/AAAA records.
If you need direct IP routing control, you’re looking for a different tool than MX.

5) Is it okay for an MX target to be a CNAME?

It’s widely discouraged and causes interoperability issues. Some systems follow the CNAME; others treat it as invalid or behave inconsistently.
Publish A/AAAA directly for the MX hostname.

6) How many MX records should I publish?

Two is common. More than three rarely helps and often increases the surface area for misconfiguration.
Every MX you publish must be maintained like it’s receiving real traffic—because it will.

7) Can I use a “backup MX” that only stores mail and forwards later?

You can, but it must validate recipients (or you’ll accept junk you can’t deliver), and it must be secured and monitored.
Many modern orgs prefer using a reputable inbound mail security service rather than self-running store-and-forward.

8) Why do I see mail hitting my secondary MX even when everything is healthy?

Because some senders distribute across equal preference, because of cached DNS answers, because the sender’s own heuristics prefer a certain path,
or because your primary occasionally fails slow (timeouts), pushing retries to the secondary.

9) Do TTL changes take effect immediately?

TTL changes only affect caches that query after the change. Resolvers that cached the old record will keep it until it expires,
and some intermediate resolvers behave poorly. Plan TTL reductions ahead of time.

10) Does MX priority influence outbound mail from my domain?

Not directly. MX is for inbound routing to your domain. Outbound is driven by your MTA configuration, relayhost settings, and provider routing.
People confuse these constantly, especially during migrations.

Conclusion: next steps you can do today

Multiple MX records are not a magic redundancy checkbox. They’re a published contract with the world:
“Here are the doors you may use to deliver mail to us.” Every door must open reliably.

Practical next steps:

  1. Inventory your MX set and verify every target resolves (A/AAAA) and answers on TCP/25 from outside your network.
  2. If you publish IPv6, test IPv6 SMTP reachability and rDNS. If you can’t operate it, stop advertising it.
  3. Make your MX endpoints policy-identical: TLS behavior, recipient validation, anti-spam, and routing downstream.
  4. Build a small synthetic monitor: connect, read banner, issue EHLO, attempt STARTTLS, and record latency. Alert on timeouts and rising 4xx deferrals.
  5. For future changes, lower TTL early and keep the old MX alive long enough for caches to age out.
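A sketch of the synthetic monitor from step 4, using Python’s standard smtplib. smtp_probe and the probe.example EHLO name are placeholders; certificate verification uses the default trust store, which may be stricter or looser than what your real senders enforce.

```python
import smtplib
import ssl
import time

def smtp_probe(host, port=25, timeout=10.0, helo="probe.example"):
    """Connect, read the banner, EHLO, and try STARTTLS; record the outcome and elapsed time."""
    result = {"host": host, "banner": None, "starttls": False, "error": None}
    start = time.monotonic()
    smtp = smtplib.SMTP(local_hostname=helo, timeout=timeout)   # no host given: connect below
    try:
        code, banner = smtp.connect(host, port)
        result["banner"] = (code, banner.decode(errors="replace"))
        if code != 220:
            raise smtplib.SMTPConnectError(code, banner)
        smtp.ehlo()
        if smtp.has_extn("starttls"):
            smtp.starttls(context=ssl.create_default_context())
            smtp.ehlo()   # RFC 3207: the session restarts after TLS, so EHLO again
            result["starttls"] = True
        smtp.quit()
    except (OSError, smtplib.SMTPException) as exc:
        # Covers refusals, timeouts, TLS failures (ssl.SSLError is an OSError) and SMTP errors.
        result["error"] = f"{type(exc).__name__}: {exc}"
    finally:
        smtp.close()
    result["seconds"] = round(time.monotonic() - start, 3)
    return result
```

Run it against each MX hostname on a schedule from outside your network, and alert when error is set, starttls is unexpectedly False, or seconds creeps toward the timeout.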

Do those, and “MX priority” becomes a useful tool instead of a recurring mystery.
