Email bounces are one of those problems that look like “marketing’s thing” until production goes sideways and your incident channel fills with screenshots of “550 something something.” Meanwhile, customers aren’t getting password resets, invoices, alerts, or that one message that legally must arrive.
If you can’t quickly answer “is this permanent, temporary, our fault, or the recipient’s?”, you’ll either overreact (rollbacks, panicked IP changes) or underreact (keep retrying a doomed delivery for 3 days). Let’s do this the way operators do: read the codes, confirm the hypothesis with logs and live checks, then fix the root cause—not the symptom.
A working mental model: what a bounce really is
A “bounce” is not a vibe. It’s a remote SMTP server (or a gateway in front of it) telling you that it did not accept your message for delivery. Sometimes it refuses immediately during the SMTP conversation. Sometimes it accepts the message and later generates a Delivery Status Notification (DSN) when it can’t deliver it locally.
For operations, the most important split is:
- Hard bounce (typically 5xx): a permanent failure. Retrying is usually a waste and can harm your reputation.
- Soft bounce (typically 4xx): a temporary failure. Retrying is expected, but repeated deferrals can become a reputation issue too.
Then you ask the next question: whose problem is it?
- Recipient problem: mailbox doesn’t exist, mailbox full, policy blocks on their side.
- Your problem: DNS/authentication issues, IP reputation, broken TLS, malformed messages, rate spikes, misrouted MX, misconfigured MTA.
- Network problem: timeouts, handshake failures, path MTU weirdness, transient routing.
Most teams fail here because they treat all bounces as equal. They are not. “550 5.1.1 user unknown” is a data hygiene problem. “421 4.7.0 try again later” is often a volume/reputation/throttling problem. “554 5.7.1” is the broad daylight mugging of deliverability: you got blocked.
Facts and history that actually help you debug
- SMTP is older than most of your infrastructure. It dates back to the early 1980s (RFC 821). Many behaviors are “because that’s how it worked on the ARPANET,” not because it’s elegant.
- Enhanced status codes (like 5.7.1) were added later to make error classes machine-readable (RFC 1893, since updated by RFC 3463). Not all servers use them correctly.
- The “bounce message” you receive is often generated by your own MTA. If the remote rejects during SMTP, your server creates the DSN to tell your app/user what happened.
- Greylisting became popular in the mid‑2000s as an anti-spam tactic: temporarily reject first delivery attempts (often with a 4xx) to filter naive spam bots.
- RBLs (real-time blackhole lists) predate “deliverability engineering” as a job title. They started as community-maintained lists and evolved into a messy ecosystem of reputation signals.
- SPF existed before DKIM (SPF in the early 2000s; DKIM standardized later). SPF checks whether the sending IP is authorized to send for the envelope (MAIL FROM) domain; DKIM verifies a cryptographic signature over selected headers and the body.
- DMARC came later to tie SPF and DKIM together with a policy and reporting mechanism. A strict DMARC policy can turn “sometimes delivered” into “consistently rejected.”
- Big mailbox providers don’t just block on content. They use behavioral signals: complaint rates, bounce rates, engagement, and sending patterns. That’s why “it worked yesterday” is not evidence.
- SMTP reply texts are not standardized. The numeric codes are the contract; the free-form text is a hint—often a cryptic hint.
The anatomy of a bounce message (DSN fields you should read)
When you get a bounce email (DSN), it usually contains a structured part and a human-readable part. If you only read the subject line, you’re debugging blind.
Key DSN fields
- Final-Recipient: who it was trying to deliver to.
- Action: failed, delayed, delivered, relayed, or expanded (most bounces are failed or delayed).
- Status: enhanced status code (like 5.1.1, 4.7.0).
- Remote-MTA: the remote server name (sometimes a gateway, not the final mailbox).
- Diagnostic-Code: often includes the raw SMTP reply.
- Reporting-MTA: your system that generated the DSN.
Operators read these fields like a crash dump: status code first, diagnostic text second, then correlate with logs for the SMTP transaction ID. If you can’t correlate bounces to queue IDs and timestamps, you’ll waste hours arguing with screenshots.
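To pull those fields out of a saved bounce without eyeballing it, a small awk filter is enough. This is a minimal sketch, assuming the message/delivery-status part is plain text (not base64-encoded) and that each field fits on one line, so folded Diagnostic-Code continuations are dropped. The function name dsn_fields is hypothetical. It only accepts a Status line that looks like an enhanced code, so an mbox-style "Status: RO" header on the outer message is skipped.

```bash
#!/usr/bin/env bash
# dsn_fields: print the key DSN fields (RFC 3464) from a bounce message on stdin.
# Assumes a plain-text delivery-status part with one field per line.
dsn_fields() {
  awk '{
    line = $0
    sub(/\r$/, "", line)              # tolerate CRLF line endings
    key = tolower(line)               # field names are case-insensitive
    if (key ~ /^(reporting-mta|final-recipient|action|remote-mta|diagnostic-code):/ ||
        key ~ /^status:[ \t]*[245]\.[0-9]/)
      print line
  }'
}
```

Usage: dsn_fields < bounce.eml, then take the timestamp and recipient into your MTA logs to find the matching queue ID.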
Bounce code families: 2xx, 4xx, 5xx and what they imply
2xx: success (not a bounce)
“250 2.0.0 Ok” means accepted. Not necessarily read. Not necessarily delivered to inbox. But accepted for delivery, which is all SMTP promises.
4xx: temporary failure (soft bounce / deferral)
4xx means “try again later.” Your MTA will queue and retry. But you still need to decide whether the root cause is transient (remote downtime) or chronic (your reputation causing throttling).
- 421: service not available / closing transmission channel. Often throttling or temporary refusal.
- 450/451/452: mailbox unavailable / local error / insufficient system storage. Could be greylisting, policy, or their capacity issue.
5xx: permanent failure (hard bounce)
5xx means “don’t retry this exact delivery as-is.” Your MTA may still retry depending on configuration, but you generally shouldn’t keep hammering a 550.
- 550: mailbox unavailable, user unknown, policy rejection.
- 552: exceeded storage allocation (mailbox full).
- 553: mailbox name not allowed (bad address syntax or recipient policy).
- 554: transaction failed, often policy-related (spam, blocklists, content).
Operator rule: treat 5xx as a data or policy problem that needs a change. Treat 4xx as a capacity or reputation signal that needs rate control and patience—unless it persists.
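The family rule is simple enough to encode directly. A minimal bash sketch; the function names reply_code and reply_family are hypothetical, and the mapping deliberately ignores the mailbox-full nuance covered later, so treat it as a first cut rather than a full policy.

```bash
#!/usr/bin/env bash
# reply_code LINE: extract the three-digit code from "550 5.1.1 ..." or from a
# multiline continuation such as "550-5.7.1 ...".
reply_code() {
  local line=$1
  printf '%s\n' "${line%%[ -]*}"
}

# reply_family CODE: map a basic SMTP reply code to an operational bucket.
reply_family() {
  case $1 in
    2[0-9][0-9]) echo "accepted" ;;  # accepted for delivery; not a bounce
    4[0-9][0-9]) echo "soft" ;;      # defer: queue, back off, retry
    5[0-9][0-9]) echo "hard" ;;      # do not retry this delivery as-is
    *)           echo "unknown" ;;
  esac
}
```

For example, reply_family "$(reply_code '421 4.7.0 Try again later')" prints soft.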
Enhanced status codes (x.y.z): the good, the bad, the misleading
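Enhanced status codes give you a second dimension. They have the form class.subject.detail: the class is still 2/4/5, the subject says which area failed (addressing, mailbox, mail system, network/routing, protocol, content, security/policy), and the detail names the specific condition. Subject and detail can each run to three digits, so 5.7.26 is as valid as 5.7.1.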
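Decoding the structure mechanically keeps people from over-reading the free-form text. A minimal bash sketch with a hypothetical function name, esc_subject; the subject names follow RFC 3463, and it accepts one to three digits for subject and detail because that RFC's syntax allows it.

```bash
#!/usr/bin/env bash
# esc_subject CODE: describe an enhanced status code as "<class>/<subject>",
# e.g. 5.1.1 -> permanent/addressing. Returns 1 for malformed input.
esc_subject() {
  local code=$1 re='^([245])\.([0-9]{1,3})\.([0-9]{1,3})$'
  if [[ $code =~ $re ]]; then
    case ${BASH_REMATCH[1]} in
      2) printf 'success' ;;
      4) printf 'transient' ;;
      5) printf 'permanent' ;;
    esac
    case ${BASH_REMATCH[2]} in
      0) echo "/other" ;;
      1) echo "/addressing" ;;
      2) echo "/mailbox" ;;
      3) echo "/mail-system" ;;
      4) echo "/network-routing" ;;
      5) echo "/protocol" ;;
      6) echo "/content" ;;
      7) echo "/security-policy" ;;
      *) echo "/unknown-subject" ;;
    esac
  else
    echo "invalid"
    return 1
  fi
}
```

It tells you where to look, not why; a 5.7.x still needs the evidence gathering described below.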
High-signal enhanced codes you’ll meet daily
- 5.1.1 — bad destination mailbox address (user unknown). Usually a dead address.
- 5.2.2 — mailbox full. Arguably temporary, but many systems send it as a 5xx anyway.
- 5.7.1 — delivery not authorized / message refused (policy). This is where blocks live.
- 4.7.0 — temporary policy failure (throttled, greylisted, rate-limited).
- 4.4.1 — connection timed out (network or remote issues).
- 5.3.0 — other or undefined mail system status (a catch-all; oversized messages should properly get 5.2.3 or 5.3.4, but some servers report size problems here too).
Here’s the catch: some providers slap “5.7.1” on anything they don’t like. It can mean “spam,” “missing SPF,” “bad DKIM,” “IP reputation,” “your HELO looks suspicious,” or “you sent too fast.” You still need evidence.
Paraphrased idea from Werner Vogels (reliability/ops): “everything fails, so design for failure.” Bounces are email’s way of being honest about that.
Fast diagnosis playbook (first/second/third/fourth checks)
This is the “you’re on call and someone just posted a bounce screenshot” drill. The goal is to find the bottleneck in under 10 minutes: address, auth, reputation, content, or remote capacity.
First: classify the failure in 30 seconds
- Is it 4xx or 5xx? If 5xx: stop retries for that recipient class; it’s probably permanent or policy. If 4xx: look for throttling vs network.
- Is the enhanced code present? 5.1.1 (address), 5.2.2 (quota), 5.7.1 (policy), 4.7.0 (temporary policy) are the big ones.
- Is it all recipients or a subset? Failures at one domain suggest remote policy or reputation with that provider; failures across all domains suggest your auth/DNS or your outbound MTA.
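To answer "one domain or all of them?" from logs rather than screenshots, count failures per recipient domain and DSN. A minimal sketch, assuming Postfix-style log lines that carry to=<...>, dsn= and status= fields like the ones in Task 1; failures_by_domain is a hypothetical name.

```bash
#!/usr/bin/env bash
# failures_by_domain: count bounced/deferred deliveries per (recipient domain, DSN)
# from Postfix-style log lines on stdin, most frequent first.
failures_by_domain() {
  awk '
    /status=(bounced|deferred)/ {
      domain = ""; dsn = ""
      if (match($0, /to=<[^>]*>/)) {
        rcpt = substr($0, RSTART + 4, RLENGTH - 5)   # strip "to=<" and ">"
        sub(/.*@/, "", rcpt)
        domain = tolower(rcpt)
      }
      if (match($0, /dsn=[0-9.]+/)) dsn = substr($0, RSTART + 4, RLENGTH - 4)
      if (domain != "") count[domain " " dsn]++
    }
    END { for (k in count) print count[k], k }
  ' | sort -nr
}
```

Usage: sudo cat /var/log/mail.log | failures_by_domain | head. If one domain dominates, suspect that provider's policy; if the same DSN shows up everywhere, look at your own identity and MTA first.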
Second: validate your identity (DNS/auth) quickly
- Check SPF for the MAIL FROM / Return-Path domain.
- Check DKIM signing is happening and the selector record exists.
- Check DMARC policy and alignment (especially after domain changes).
- Confirm reverse DNS (PTR) matches your HELO/EHLO expectations.
Third: check reputation and rate behavior
- Look for “blocked,” “blacklist,” “spam,” “policy,” “too many connections” in diagnostic text.
- Check your outbound queue growth and retry patterns. A growing deferred queue is a rate/reputation smoke alarm.
- Identify whether this is a single IP, a whole pool, or a specific traffic type (password resets vs newsletters).
Fourth: isolate content and transport issues
- Message size, attachments, and line endings.
- TLS handshake failures and cipher mismatches.
- Malformed headers (From, Message-ID, Date) or weird MIME boundaries.
Practical tasks: commands, expected output, and what decision you make
These are real operator moves. Each task includes: a command, what typical output means, and the decision you make.
Task 1: Pull recent bounce and deferral reasons from Postfix logs
cr0x@server:~$ sudo grep -E "status=(bounced|deferred)" /var/log/mail.log | tail -n 3 | sed -E 's/^.*\/smtp\[[0-9]+\]: //'
A1B2C3D4E5: to=<user@example.com>, relay=mx.example.com[203.0.113.10]:25, delay=0.2, delays=0.1/0/0/0.1, dsn=5.1.1, status=bounced (host mx.example.com[203.0.113.10] said: 550 5.1.1 <user@example.com> User unknown (in reply to RCPT TO command))
F6E7D8C9B0: to=<billing@corp.tld>, relay=mx.corp.tld[198.51.100.25]:25, delay=12, delays=0.2/0.1/10/1.7, dsn=4.7.0, status=deferred (host mx.corp.tld[198.51.100.25] said: 421 4.7.0 Try again later, rate limited (in reply to end of DATA command))
0AA11BB22C: to=<alerts@other.tld>, relay=mx.other.tld[192.0.2.55]:25, delay=0.1, delays=0.05/0/0/0.05, dsn=5.7.1, status=bounced (host mx.other.tld[192.0.2.55] said: 550 5.7.1 Message rejected as spam (in reply to end of DATA command))
What it means: 5.1.1 is a dead address; 4.7.0 is a retry/throttle; 5.7.1 is policy/content/reputation.
Decision: suppress/clean invalid recipients for 5.1.1; apply rate limiting and backoff for 4.7.0; start auth/reputation/content investigation for 5.7.1.
Task 2: Inspect the current mail queue size and whether it’s deferred-heavy
cr0x@server:~$ mailq | tail -n 1
-- 1230 Kbytes in 184 Requests.
What it means: this is the total across all queues, so on its own it can’t tell you how much is deferred; a sudden spike still suggests remote deferrals or local issues.
Decision: if queue growth correlates with 4xx deferrals, throttle outbound and investigate reputation; if you see local errors, check disk and MTA health.
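To see whether the queue is actually deferred-heavy, group queued messages by queue name. A minimal sketch, assuming Postfix 3.1 or later, where postqueue -j prints one JSON object per queued message with a queue_name field. It uses plain text matching instead of a JSON parser, and queue_breakdown is a hypothetical name.

```bash
#!/usr/bin/env bash
# queue_breakdown: count messages per Postfix queue (active, deferred, hold, ...)
# from "postqueue -j" JSON lines on stdin.
queue_breakdown() {
  grep -oE '"queue_name": *"[a-z]+"' |
    sed -E 's/.*"([a-z]+)"$/\1/' |   # keep only the queue name
    sort | uniq -c | sort -nr
}
```

Usage: postqueue -j | queue_breakdown. A deferred count that keeps climbing while active stays flat is the rate/reputation smoke alarm from the playbook.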
Task 3: Summarize DSN codes in logs (what’s dominant right now)
cr0x@server:~$ sudo grep -oE "dsn=[45]\.[0-9]{1,3}\.[0-9]{1,3}" /var/log/mail.log | sort | uniq -c | sort -nr | head
412 dsn=4.7.0
119 dsn=5.7.1
88 dsn=5.1.1
41 dsn=4.4.1
What it means: 4.7.0 dominating points to throttling/greylisting/policy deferrals. 5.7.1 is blocks. 4.4.1 suggests network timeouts.
Decision: prioritize rate control and reputation if 4.7.0; prioritize authentication/content/policy if 5.7.1; check connectivity if 4.4.1 rises.
Task 4: Check a domain’s MX records (are you even talking to the right servers?)
cr0x@server:~$ dig +short MX example.com
10 mx1.example.com.
20 mx2.example.com.
What it means: if MX is wrong or points to dead hosts, you’ll see timeouts (4.4.1) or connection failures.
Decision: if MX changed recently, confirm propagation; if the recipient domain is misconfigured, you can’t fix it—stop retry storms and notify.
Task 5: Verify your sending IP has reverse DNS (PTR) set
cr0x@server:~$ dig +short -x 198.51.100.10
mailout1.yourdomain.tld.
What it means: missing or generic PTR is a common cause of 5.7.1 policy rejects, especially at stricter domains.
Decision: if PTR is missing/mismatched, fix rDNS with your provider and align HELO/EHLO to a name that resolves and matches policy expectations.
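Receivers usually want forward-confirmed reverse DNS: the PTR name must resolve back to the same IP. A minimal sketch, assuming IPv4 and dig on the PATH; resolve_ptr, resolve_a and fcrdns_check are hypothetical names, and the two resolver wrappers exist so you can swap in another resolver (or stubs in a test).

```bash
#!/usr/bin/env bash
# Thin wrappers around dig; override them to use a different resolver.
resolve_ptr() { dig +short -x "$1"; }
resolve_a()   { dig +short A "$1"; }

# fcrdns_check IP [HELO_NAME]: succeed if some PTR name for IP resolves back to IP.
# If HELO_NAME is given, only a PTR name equal to it counts.
fcrdns_check() {
  local ip=$1 helo=${2:-} name addr
  for name in $(resolve_ptr "$ip"); do
    name=${name%.}                                   # dig prints a trailing dot
    if [ -n "$helo" ] && [ "$name" != "$helo" ]; then
      continue
    fi
    for addr in $(resolve_a "$name"); do
      if [ "$addr" = "$ip" ]; then
        echo "OK: $ip -> $name -> $ip"
        return 0
      fi
    done
  done
  echo "FAIL: no matching PTR for $ip resolves back to it"
  return 1
}
```

Usage: fcrdns_check 198.51.100.10 mailout1.yourdomain.tld, passing the HELO/EHLO name your MTA actually announces.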
Task 6: Check SPF record for the envelope-from domain
cr0x@server:~$ dig +short TXT yourdomain.tld
"v=spf1 ip4:198.51.100.0/24 include:_spf.your-esp.tld -all"
What it means: SPF is present. The qualifier matters: -all is strict fail; ~all is softfail.
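Decision: if your sending IPs aren’t included, add them. Watch the lookup budget: RFC 7208 caps an SPF evaluation at 10 DNS-querying terms (include, a, mx, ptr, exists, redirect), counted across nested includes, and going over returns permerror, which receivers may treat as an SPF failure.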
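The lookup budget is easy to blow through quietly. A minimal sketch that counts the lookup-causing terms in one SPF record as printed by dig +short; it does not follow includes, so the true total including nested records can only be higher. spf_lookup_terms is a hypothetical name, and it needs bash 4+ for the lowercase expansion.

```bash
#!/usr/bin/env bash
# spf_lookup_terms RECORD: count top-level terms that cost a DNS lookup under
# RFC 7208 (include, a, mx, ptr, exists mechanisms and the redirect modifier).
spf_lookup_terms() {
  local record term mech n=0
  local -a terms
  # dig prints long records as several quoted strings that concatenate with no space.
  record=$(printf '%s' "$1" | sed -e 's/" "//g' -e 's/"//g')
  read -ra terms <<< "$record"          # split on whitespace without globbing
  for term in "${terms[@]}"; do
    mech=${term#[+?~-]}                 # drop an optional qualifier
    mech=${mech,,}                      # mechanism names are case-insensitive
    case $mech in
      include:*|a|a:*|a/*|mx|mx:*|mx/*|ptr|ptr:*|exists:*|redirect=*) n=$((n + 1)) ;;
    esac
  done
  echo "$n"
}
```

Usage: spf_lookup_terms "$(dig +short TXT yourdomain.tld | grep 'v=spf1')". For the record in Task 6 it prints 1; repeat it for each included domain to approximate the full count.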
Task 7: Validate DKIM selector record exists in DNS
cr0x@server:~$ dig +short TXT s1._domainkey.yourdomain.tld
"v=DKIM1; k=rsa; p=MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAt..."
What it means: selector is published. If this query returns nothing, receivers can’t verify your signature.
Decision: publish the selector or fix the domain/selector mismatch in your MTA/ESP configuration.
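An empty p= tag is a revoked key, which fails just like a missing record but is easier to miss by eye. A minimal sketch for checking a selector record as dig +short prints it; dkim_key_status is a hypothetical name, and it only checks that a key is present, not that it matches the private key your MTA signs with.

```bash
#!/usr/bin/env bash
# dkim_key_status RECORD: print "ok", "revoked" (empty p=) or "missing"
# (no record, or no p= tag). Returns non-zero unless a key is present.
dkim_key_status() {
  local record
  # Rejoin dig's split strings, drop quotes and whitespace.
  record=$(printf '%s' "$1" | sed -e 's/" "//g' -e 's/"//g' | tr -d ' \t')
  if [ -z "$record" ]; then
    echo "missing"; return 1
  fi
  case ";$record" in
    *";p=;"* | *";p=") echo "revoked"; return 1 ;;
    *";p="?*)         echo "ok"; return 0 ;;
    *)                echo "missing"; return 1 ;;
  esac
}
```

Usage: dkim_key_status "$(dig +short TXT s1._domainkey.yourdomain.tld)".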
Task 8: Check DMARC policy (and whether you accidentally set it to reject)
cr0x@server:~$ dig +short TXT _dmarc.yourdomain.tld
"v=DMARC1; p=quarantine; rua=mailto:dmarc@yourdomain.tld; adkim=s; aspf=s"
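What it means: strict alignment (adkim=s, aspf=s) requires the DKIM d= domain or the MAIL FROM domain to match the From domain exactly, so a signature from a subdomain or an ESP’s domain no longer counts as aligned. If neither SPF nor DKIM aligns, DMARC fails: with p=quarantine that usually means the spam folder; with p=reject it becomes a 5.7.x rejection.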
Decision: if you see bounces after a DMARC change, confirm alignment; relax alignment temporarily only if you understand the spoofing trade-off.
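Alignment is just a domain comparison, which makes it easy to test before you tighten a policy. A minimal sketch; dmarc_tag, org_domain and dmarc_aligned are hypothetical names, and org_domain naively takes the last two labels, whereas real DMARC uses the Public Suffix List, so it is wrong for names like example.co.uk.

```bash
#!/usr/bin/env bash
# dmarc_tag RECORD TAG: print a tag value from a DMARC TXT record (empty if absent).
dmarc_tag() {
  printf '%s' "$1" | tr -d '" \t' | tr ';' '\n' |
    awk -F= -v t="$2" '$1 == t { print $2; exit }'
}

# org_domain DOMAIN: naive organizational domain (last two labels). Simplification!
org_domain() {
  printf '%s\n' "${1%.}" | tr 'A-Z' 'a-z' |
    awk -F. '{ if (NF >= 2) print $(NF-1) "." $NF; else print $0 }'
}

# dmarc_aligned FROM_DOMAIN AUTH_DOMAIN [MODE]: MODE is the adkim/aspf value,
# "s" (strict) or "r" (relaxed, the DMARC default). AUTH_DOMAIN is the DKIM d=
# domain or the SPF MAIL FROM domain.
dmarc_aligned() {
  local from auth mode=${3:-r}
  from=$(printf '%s' "${1%.}" | tr 'A-Z' 'a-z')
  auth=$(printf '%s' "${2%.}" | tr 'A-Z' 'a-z')
  if [ "$mode" = s ]; then
    [ "$from" = "$auth" ]
  else
    [ "$(org_domain "$from")" = "$(org_domain "$auth")" ]
  fi
}
```

With the record from Task 8, dmarc_aligned yourdomain.tld bounces.yourdomain.tld "$(dmarc_tag "$rec" aspf)" fails, because aspf=s, even though the same pair passes under relaxed alignment. That is exactly the surprise strict alignment tends to deliver.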
Task 9: Test an SMTP session manually to see the raw rejection point
cr0x@server:~$ openssl s_client -starttls smtp -crlf -connect mx.other.tld:25
CONNECTED(00000003)
(certificate chain and TLS session details trimmed)
ehlo mailout1.yourdomain.tld
250-mx.other.tld
250 SIZE 52428800
mail from:<sender@yourdomain.tld>
250 2.1.0 Ok
rcpt to:<alerts@other.tld>
550 5.7.1 Blocked due to policy
What it means: rejection at RCPT TO is often recipient/policy-level; rejection after DATA is more content/reputation sensitive.
Decision: if blocked before DATA, focus on identity/reputation; if blocked after DATA, inspect content, headers, and sending pattern.
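Postfix logs which command the remote server was answering ("in reply to RCPT TO command", "in reply to end of DATA command"), so you can tally rejection stages across many deliveries instead of relying on one manual session. A minimal sketch with a hypothetical function name, rejection_stage, assuming that log wording as shown in Task 1.

```bash
#!/usr/bin/env bash
# rejection_stage: count bounced/deferred deliveries by the SMTP command the
# remote server was answering, from Postfix log lines on stdin.
rejection_stage() {
  sed -n -E 's/.*status=(bounced|deferred).*\(in reply to (.*) command\).*/\2/p' |
    sort | uniq -c | sort -nr
}
```

Usage: sudo cat /var/log/mail.log | rejection_stage. A pile-up at RCPT TO points at identity and reputation; a pile-up at end of DATA points at content and sending pattern.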
Task 10: Confirm your server presents the correct certificate chain for SMTP TLS
cr0x@server:~$ sudo postconf -n | grep -E "smtpd_tls_cert_file|smtpd_tls_key_file|smtpd_tls_CAfile"
smtpd_tls_cert_file = /etc/ssl/certs/mail.pem
smtpd_tls_key_file = /etc/ssl/private/mail.key
smtpd_tls_CAfile = /etc/ssl/certs/ca-certificates.crt
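What it means: you know which files are in play. These smtpd_ settings are the server side, the certificate you present to hosts that connect to you; for outbound delivery failures, also check the client-side smtp_tls_* settings (smtp_tls_security_level in particular). TLS failures can show up as 4.4.1 timeouts or handshake errors in logs.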
Decision: if you recently rotated certs and bounces started, verify permissions, chain completeness, and cipher settings.
Task 11: Identify top recipient domains in your deferred queue (who’s throttling you?)
cr0x@server:~$ mailq | awk '/^[ \t]+[^ \t(]+@[^ \t]+$/ {sub(/.*@/, "", $1); print tolower($1)}' | sort | uniq -c | sort -nr | head
98 gmail.com
71 corp.tld
40 outlook.com
19 yahoo.com
What it means: which domains are currently deferring/queuing you the most.
Decision: implement per-domain throttles and connection limits; consider splitting traffic types across IPs if you’re mixing transactional with bulk.
Task 12: Check for “too many connections” and tune concurrency (without going feral)
cr0x@server:~$ sudo grep -iE "too many connections|rate limited|throttl" /var/log/mail.log | tail -n 5
Jan 03 10:11:22 mail postfix/smtp[23111]: F6E7D8C9B0: to=<billing@corp.tld>, relay=mx.corp.tld[198.51.100.25]:25, dsn=4.7.0, status=deferred (host mx.corp.tld[198.51.100.25] said: 421 4.7.0 Try again later, rate limited)
Jan 03 10:11:40 mail postfix/smtp[23112]: 9D8C7B6A5F: to=<ops@corp.tld>, relay=mx.corp.tld[198.51.100.25]:25, dsn=4.7.0, status=deferred (host mx.corp.tld[198.51.100.25] said: 421 4.7.0 Too many connections)
What it means: the remote side is telling you to slow down.
Decision: reduce concurrency to that domain and increase backoff; don’t “optimize” by opening more connections. That’s how you turn a temporary throttle into a block.
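If your application retries on its own (for example a sender that calls an API in front of the MTA), the retry delay should grow, not shrink, when the remote says slow down. A minimal sketch of capped exponential backoff; backoff_delay and its 300-second base and 4000-second cap are illustrative assumptions. Postfix's own queue manager is tuned through parameters such as minimal_backoff_time and maximal_backoff_time, not through anything like this.

```bash
#!/usr/bin/env bash
# backoff_delay ATTEMPT [BASE] [CAP]: seconds to wait before retry ATTEMPT (1-based),
# doubling from BASE each time and never exceeding CAP.
backoff_delay() {
  local attempt=$1 base=${2:-300} cap=${3:-4000} delay i
  delay=$base
  # Stop doubling once the cap is reached, so large attempt numbers can't overflow.
  for (( i = 1; i < attempt && delay < cap; i++ )); do
    delay=$(( delay * 2 ))
  done
  if (( delay > cap )); then delay=$cap; fi
  echo "$delay"
}
```

For attempts 1 through 5 this yields 300, 600, 1200, 2400 and 4000 seconds. In production you would usually add random jitter so many deferred messages don't retry in lockstep; it is left out here to keep the output deterministic.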
Task 13: Confirm local disk isn’t the real reason you’re failing
cr0x@server:~$ df -h /var/spool/postfix
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 40G 37G 1.2G 97% /
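What it means: your spool is nearly full. Postfix may start refusing new mail (for example with “452 4.3.1 Insufficient system storage”), and the resulting failures can surface far from the disk that caused them.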
Decision: free space immediately and add monitoring; if you run close to full, you’ll get “mail system full” behavior that looks like remote problems.
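A threshold check turns "the disk was full" from a post-mortem finding into an alert. A minimal sketch that reads POSIX df -P output; spool_usage_check and its default 90% threshold are hypothetical, and it assumes mount points without spaces.

```bash
#!/usr/bin/env bash
# spool_usage_check [THRESHOLD]: read "df -P" output on stdin; print a warning and
# exit non-zero for any filesystem at or above THRESHOLD percent used (default 90).
spool_usage_check() {
  local threshold=${1:-90}
  awk -v t="$threshold" 'NR > 1 {
    use = $5; sub(/%$/, "", use)          # column 5 is Capacity, e.g. "97%"
    if (use + 0 >= t + 0) { print "WARN " $6 " at " use "%"; bad = 1 }
  } END { exit bad }'
}
```

Usage: df -P /var/spool/postfix | spool_usage_check 90 || echo "spool filesystem nearly full". For a filesystem mounted on / at 97%, it prints WARN / at 97%.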
Task 14: Inspect a specific queued message (check headers, size, and envelope)
cr0x@server:~$ sudo postcat -q A1B2C3D4E5 | sed -n '1,40p'
*** ENVELOPE RECORDS ***
message_size: 48213 48159 54 54
sender: sender@yourdomain.tld
*** MESSAGE CONTENTS ***
Received: by mailout1.yourdomain.tld (Postfix, from userid 1001)
Date: Sat, 03 Jan 2026 10:10:05 +0000
From: "Your App" <noreply@yourdomain.tld>
To: user@example.com
Subject: Reset your password
Message-ID: <20260103101005.12345@mailout1.yourdomain.tld>
MIME-Version: 1.0
Content-Type: text/html; charset=UTF-8
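What it means: you can confirm whether headers are sane and whether the sender domain aligns with your auth setup. postcat -q only works while the message is still queued (deferred, for example); once it has bounced, inspect the original returned in the DSN instead.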
Decision: if content is malformed (missing Date/Message-ID, broken MIME), fix the generator; some providers reject malformed messages as policy failures.
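Missing Date or Message-ID headers are cheap to catch before a receiver does. A minimal sketch that scans only the header block of a raw message; check_headers is a hypothetical name, and it checks presence only, not whether the values are well-formed.

```bash
#!/usr/bin/env bash
# check_headers: read a raw RFC 5322 message on stdin and report which of
# Date, From and Message-ID are missing from the header block. Exits 1 if any are.
check_headers() {
  awk '
    /^\r?$/ { exit }                          # blank line ends the header block
    { name = tolower($0); sub(/:.*/, "", name); seen[name] = 1 }
    END {
      n = split("date from message-id", want, " ")
      for (i = 1; i <= n; i++)
        if (!(want[i] in seen)) { print "missing: " want[i]; bad = 1 }
      exit bad
    }
  '
}
```

Usage: check_headers < message.eml prints lines such as missing: message-id and exits non-zero, which makes it easy to wire into a test for the code that generates your mail.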
Joke 1: Email is the only system where you can do everything right and still get told “try again later” by a machine that won’t explain itself.
Three corporate mini-stories from the bounce trenches
Incident 1: The wrong assumption (5.1.1 that wasn’t “the recipient’s fault”)
One company migrated from a legacy CRM to a new customer platform. During cutover, they ran both systems in parallel. The email team noticed a sudden rise in 550 5.1.1 User unknown bounces for a particular domain. The immediate assumption was the usual: “our list is stale.”
They did what teams do under pressure: they suppressed those addresses. Thousands of them. Support tickets showed up within hours: customers were not receiving login links, and the customers were very real.
The actual root cause lived in a boring corner: the new system was lowercasing and trimming emails correctly, but a mid-layer service was also appending a hidden Unicode character copied from an internal admin tool. Visually, the address looked identical. SMTP servers don’t care about your feelings; they care about bytes. The recipient system treated it as a different local-part and replied “user unknown.”
The fix wasn’t “clean your list.” It was to normalize and validate addresses at ingestion, log the raw bytes, and add tests that compared UTF-8 normalized forms. They also updated suppression logic to require repeated failures and to treat “newly failing addresses” as suspicious during migrations.
Afterward, they kept a rule: if bounce rates jump right after a data pipeline change, assume you broke the addresses until proven otherwise.
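A cheap guard against the failure in that story is to flag addresses containing anything outside printable ASCII before they reach a suppression list. A minimal sketch; find_suspect_addresses is a hypothetical name. Internationalized (SMTPUTF8) addresses legitimately contain non-ASCII, so treat a hit as something to inspect, not as proof the address is bad.

```bash
#!/usr/bin/env bash
# find_suspect_addresses: read one address per line on stdin and print
# "LINE: address" for any containing bytes outside printable ASCII,
# with those bytes made visible by cat -v.
find_suspect_addresses() {
  local n=0 line
  while IFS= read -r line || [ -n "$line" ]; do
    n=$((n + 1))
    if printf '%s' "$line" | LC_ALL=C grep -q '[^ -~]'; then
      printf '%d: %s\n' "$n" "$(printf '%s' "$line" | cat -v)"
    fi
  done
}
```

An address with an invisible zero-width space (bytes E2 80 8B) shows up as userM-bM-^@M-^K@example.com, which is much easier to spot in a ticket than two identical-looking strings.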
Incident 2: The optimization that backfired (more throughput, more blocks)
A different org ran their own Postfix fleet and shipped a lot of legitimate email: invoices, receipts, and event notifications. Their queue started backing up during peak hours. Someone had an idea that sounded reasonable: “increase concurrency and shorten retry intervals so mail clears faster.”
They bumped per-destination concurrency and reduced backoff. The queue cleared. For about a day. Then bounces spiked: 421 4.7.0 deferrals became 550 5.7.1 blocks. The provider-specific diagnostic text included “rate limited” and then “blocked due to policy.”
What happened is classic: they trained the receiver to distrust them. Too many parallel connections, too many messages in a short time, and too little patience. Even compliant email can look like abuse when you send it like a denial-of-service test.
The rollback was painful because “undo the optimization” didn’t instantly restore reputation. They had to implement adaptive throttling per domain, segment bulk-like traffic away from transactional, and stop retrying aggressively on 4xx. Over time, the blocks eased.
They also learned a lesson that applies to storage and networking too: the fastest way to break a shared system is to optimize for throughput without respecting the other side’s limits.
Incident 3: The boring practice that saved the day (separation of traffic and calm retries)
An enterprise SaaS had a dull-sounding rule: transactional mail and bulk mail do not share IPs, do not share domains, and do not share retry behavior. Marketing hated it because it meant more coordination. Finance hated it because it meant more IP space and more vendor line items.
Then one afternoon, marketing ran a campaign that had unexpectedly high complaint rates. Predictably, the marketing IP pool started getting throttled and blocked. But password resets and security alerts kept flowing. Why? Separate IP reputation and strict rate limiting on the transactional side.
The on-call engineer still got paged, but the page was a heads-up, not a fire. They paused the marketing sends, cleaned the list, adjusted targeting, and waited for reputation to recover—without the business-wide outage of “nobody can log in.”
It wasn’t clever. It was boring. And it worked because it reduced blast radius. That’s an SRE-friendly design goal, even when the system is “just email.”
Joke 2: The quickest way to discover you have no deliverability strategy is to send a “please confirm your email” message that never arrives.
Common mistakes: symptom → root cause → fix
1) Symptom: Lots of 550 5.1.1 “User unknown” right after a migration
Root cause: address transformation bug (whitespace, Unicode, plus-addressing stripped, domain typo) or wrong recipient domain mapping.
Fix: validate at ingestion, normalize carefully, log raw address bytes, and delay suppression until repeated failures across multiple deliveries.
2) Symptom: 421 4.7.0 “Try again later” across one big provider
Root cause: throttling due to rate spikes, poor reputation, or new IP warming too fast.
Fix: per-domain rate limiting, gradual ramp-up, longer backoff on deferrals, and separate traffic types (transactional vs bulk).
3) Symptom: 550 5.7.1 “Blocked” or “Message rejected” suddenly for many domains
Root cause: authentication failure (SPF/DKIM/DMARC), rDNS missing, or IP listed due to complaints or compromised sender.
Fix: verify SPF/DKIM/DMARC alignment; confirm PTR; review sending logs for unusual spikes; rotate credentials; isolate traffic; remediate reputation (and stop blasting).
4) Symptom: 552 5.2.2 mailbox full bounces
Root cause: recipient quota exceeded. Not your infrastructure, but your retries can annoy them.
Fix: treat as “temporary-but-slow”: suppress repeated attempts for a while, notify user through alternative channels, and avoid immediate retries.
5) Symptom: 4.4.1 timeouts and connection errors to specific domains
Root cause: network path issues, broken IPv6 preference, wrong MX, firewall blocks, or remote outages.
Fix: test connectivity from the sending MTA, check DNS resolution (A/AAAA), verify firewall egress, and prefer IPv4 if your IPv6 path is unreliable.
6) Symptom: Bounces mention “TLS” or handshake failures
Root cause: incompatible TLS settings, missing intermediate certificates, clock skew, or broken SNI expectations in some gateways.
Fix: present a complete cert chain, keep OpenSSL current, monitor for handshake errors, and avoid extreme cipher hardening without testing real receivers.
7) Symptom: Only HTML-heavy messages bounce as spam; plaintext is fine
Root cause: content triggers (URL patterns, mismatched From domains, tracking domains, broken MIME, or “image-only” messages).
Fix: ensure valid MIME structure, include a reasonable text part, align visible link domains with authenticated domains, and reduce risky content patterns.
8) Symptom: “Message size exceeds fixed maximum message size”
Root cause: attachment or encoding bloat (base64 overhead) exceeding remote limits.
Fix: reduce attachment size, use download links, and enforce size limits in your app before queueing mail.
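Base64 turns every 3 bytes into 4 characters before line breaks are even counted, which is why a file that looks small enough still trips the limit. A minimal sketch of the arithmetic, assuming 76-character lines with CRLF endings and ignoring MIME part headers and boundaries; mime_base64_size and fits_remote_size are hypothetical names.

```bash
#!/usr/bin/env bash
# mime_base64_size BYTES: approximate encoded size of an attachment in a MIME message.
mime_base64_size() {
  local bytes=$1 encoded lines
  encoded=$(( (bytes + 2) / 3 * 4 ))   # every 3 input bytes become 4 characters
  lines=$(( (encoded + 75) / 76 ))     # 76 characters per encoded line
  echo $(( encoded + 2 * lines ))      # plus CRLF on each line
}

# fits_remote_size BYTES LIMIT: succeed if the encoded size is within the SIZE
# value a receiver advertised in its EHLO response.
fits_remote_size() {
  [ "$(mime_base64_size "$1")" -le "$2" ]
}
```

For example, a 40 MiB file (41943040 bytes) encodes to 57395742 bytes, so fits_remote_size 41943040 52428800 fails against a receiver advertising SIZE 52428800, as the server in Task 9 does.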
Checklists / step-by-step plan
Checklist A: When bounces spike (incident response)
- Stop the bleeding: pause bulk sends; keep transactional flowing if possible.
- Classify: top DSNs (4.7.0 vs 5.7.1 vs 5.1.1) from logs.
- Scope: which domains, which IPs, which traffic class, which release/change window.
- Identity: verify SPF/DKIM/DMARC and rDNS. Fix obvious breakages first.
- Rate behavior: reduce concurrency; increase retry backoff; implement per-domain throttles.
- Content sanity: confirm headers, MIME, message size. Compare a “good” and “bad” sample.
- Queue health: monitor queue size and deferred counts; ensure disk space and MTA processes are stable.
- Remediate: clean invalid recipients; disable compromised senders; segment traffic.
- Verify recovery: watch DSN distribution trend down; confirm acceptance (250) rates increase.
- Post-incident hardening: add dashboards for DSN codes, deferral rates, and per-domain throttling effectiveness.
Checklist B: Building a bounce-handling policy (so you don’t re-learn pain)
- Define hard vs soft bounce logic: do not treat all 5xx as permanent without nuance (quota is special), but don’t retry 5.1.1 forever.
- Set suppression thresholds: immediate suppression for 5.1.1; time-based suppression for 4.7.0; careful handling for 5.2.2.
- Separate traffic classes: transactional vs bulk, ideally separate IPs and domains.
- Per-domain throttling: different providers, different limits. Build a configuration surface you can change without deploys.
- Feedback loop: bounces should update your customer data, but with guardrails during migrations and incidents.
- Audit your From and Return-Path strategy: ensure alignment supports DMARC.
- Runbook everything: a bounce screenshot should translate into a predictable series of checks.
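Writing the policy down as code makes the thresholds reviewable. A minimal sketch of a hypothetical decision table; suppression_action, its action names, and the threshold of three failures during migrations are illustrative assumptions, not values from any standard.

```bash
#!/usr/bin/env bash
# suppression_action CODE [FAILURES] [MIGRATION]: decide what a bounce processor
# does with an address. FAILURES counts consecutive failures for that address;
# MIGRATION=1 raises the bar for 5.1.1 while a data pipeline change is in flight.
suppression_action() {
  local code=$1 failures=${2:-1} migration=${3:-0} needed=1
  if [ "$migration" = 1 ]; then needed=3; fi
  case $code in
    5.1.1)
      if [ "$failures" -ge "$needed" ]; then echo "suppress"; else echo "hold"; fi ;;
    5.2.2|4.2.2) echo "pause-days" ;;    # mailbox full: back off for days, keep the user
    5.7.*)       echo "investigate" ;;   # policy block: usually about you, not the address
    4.*)         echo "backoff" ;;       # temporary: time-based suppression with backoff
    5.*)         echo "review" ;;        # other permanent failures: look before suppressing
    *)           echo "unknown" ;;
  esac
}
```

The order of the case arms matters: the specific codes must come before the 4.* and 5.* catch-alls, or mailbox-full and policy blocks would be swallowed by the generic rules.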
Checklist C: Pre-flight for new sending infrastructure (before you go live)
- PTR/rDNS set and verified.
- HELO/EHLO hostname resolves to the sending IP.
- SPF includes all outbound IPs; does not exceed DNS lookup limits.
- DKIM signing enabled; selectors published; rotation plan exists.
- DMARC policy chosen intentionally; alignment tested.
- Outbound TLS configured; certificates valid; time sync working.
- Queue and log monitoring in place; alerting on deferred spikes.
- Rate limits and concurrency defaults are conservative.
- Traffic separation decided, not “later.”
FAQ
1) What’s the difference between a bounce and a deferral?
A bounce is a definitive non-delivery (typically 5xx). A deferral is a temporary refusal (typically 4xx) that your MTA will retry later.
2) Is 5xx always a hard bounce I should suppress immediately?
Usually, but not blindly. 5.1.1 is a strong suppress-now signal. 5.2.2 (mailbox full) is often better treated as “back off for days” rather than “delete the user.”
3) Why do I see 4.7.0 for hours?
Because the receiver is throttling you or greylisting you. If it persists, treat it as a reputation/rate problem, not a random transient.
4) The diagnostic text says “spam,” but the email is legitimate. Now what?
“Spam” is often shorthand for policy/reputation failures. Check authentication (SPF/DKIM/DMARC), rDNS, sending patterns, and content structure. Then reduce volume and retries while reputation recovers.
5) Can a bad SPF record cause bounces?
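Yes. Under DMARC enforcement (p=reject), a message is rejected, often with 5.7.1 or a provider-specific 5.7.x, when neither SPF nor DKIM passes with alignment, so a broken SPF record hurts most when DKIM is also missing or unaligned. Even without DMARC, some receivers reject on an SPF -all hard fail.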
6) Why does it work for some recipients at a provider but not others?
Providers run multiple clusters and policy layers. One MX may be stricter, or you may be hitting per-recipient or per-domain throttles. That’s why you need per-domain and per-IP visibility.
7) Should I “fix” bounces by switching IPs?
Only if you know why you’re switching. IP hopping to dodge reputation is a fast way to burn multiple IPs. Fix the cause (auth, list quality, volume) and then consider a controlled warm-up.
8) What’s the simplest metric that predicts deliverability trouble?
Rising 4.7.0 deferrals and growing deferred queues for major providers. It’s the early warning system before 5.7.1 blocks arrive.
9) What if the bounce message is missing the enhanced status code?
Fall back to the basic SMTP code (4xx/5xx) and the diagnostic text, then confirm in MTA logs. Some systems strip details; your logs usually have more truth than the DSN email.
10) Do I need separate domains for transactional and marketing email?
You don’t strictly need it, but it makes DMARC alignment and reputation isolation easier. If you keep one domain, you’re accepting a larger blast radius.
Next steps (what to change on Monday)
If you want fewer bounce incidents and faster recovery when they happen, do the unglamorous work:
- Build a DSN dashboard that trends 4.7.0, 5.7.1, 5.1.1, and 4.4.1 over time, per recipient domain and per sending IP.
- Implement per-domain throttling and stop treating remote servers like infinite sinks. Respect “try again later.”
- Harden identity: SPF includes all senders, DKIM signs everything important, DMARC policy matches your reality, and PTR/rDNS isn’t an afterthought.
- Segment traffic so marketing mistakes don’t take down password resets.
- Make suppression rules explicit: immediate for 5.1.1, time-based for 4xx, careful for mailbox-full cases, and guarded during migrations.
Email is not a reliable transport. It’s a negotiated truce between your MTA and everyone else’s policies. Read the codes, confirm with evidence, and fix the system—not the screenshot.