Email “Recipient address rejected”: why valid users still bounce

Nothing ruins a calm on-call shift like a VP forwarding a bounce message that says “Recipient address rejected” for an address you know exists. The user is in the directory. They can log into the portal. They received email yesterday. Yet today, someone else’s mail server insists the recipient is invalid.

This is the particular kind of failure where everyone blames “DNS” like it’s a weather event. It’s usually not DNS alone. It’s the chain of decisions between the sending MTA, your edge, your policy engine, your mailbox store, and your directory. Somewhere in that chain, a component decided to say “no” at RCPT time—and it did so in a way that looks like the user doesn’t exist.

What “Recipient address rejected” really means

When you see “Recipient address rejected,” you’re not seeing a single standardized error. You’re seeing the sender’s interpretation of a refusal during the SMTP conversation—most often in response to the RCPT TO: command.

In plain terms: the receiving side (your side, or a third party’s) rejected the recipient during the envelope stage, before the message body was accepted. That’s why receivers love it: rejecting early saves bandwidth and storage. That’s also why you get the angry ticket: the sender thinks you said the address is bad.

Typical SMTP status codes you’ll see

  • 550 5.1.1 — user unknown / mailbox unavailable (or pretending to be).
  • 550 5.1.0 / 5.7.1 — policy rejection that may get mislabeled as “address rejected.”
  • 551 / 553 — often “not local” or address syntax/policy issues.
  • 450 / 451 / 4.7.1 — temporary rejection (greylisting, rate limiting, backend timeout). These sometimes appear as “recipient rejected” in UIs because UIs are not your friend.

Key point: “Recipient address rejected” doesn’t necessarily mean the user doesn’t exist. It means the receiving system declined to accept mail for that recipient at that moment, for some reason. Many systems default to using recipient-stage rejections as a convenient place to enforce policy.
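
To make that distinction mechanical, here is a minimal Python sketch that splits a reply line into its basic code and enhanced status code and says whether the refusal is permanent or temporary. The function name and the hint strings are illustrative assumptions, not a standard; real gateways reuse these codes loosely, so treat the hint as a starting point rather than a verdict.

```python
import re

# Basic reply code, then an optional enhanced status code such as 5.1.1.
REPLY_RE = re.compile(
    r"^(?P<basic>[245]\d\d)[ -]"
    r"(?:(?P<enhanced>[245]\.\d{1,3}\.\d{1,3})(?=\s|$))?"
)

def classify_reply(line):
    """Return (severity, hint) for one SMTP reply line."""
    match = REPLY_RE.match(line.strip())
    if match is None:
        return ("unparsed", "no SMTP code found; ask for the raw transcript")
    basic, enhanced = match.group("basic"), match.group("enhanced")
    severity = {"2": "accepted", "4": "temporary", "5": "permanent"}[basic[0]]
    if severity == "accepted":
        return (severity, "not a rejection")
    # Hints are illustrative: x.1.1 usually points at the recipient lookup,
    # x.7.z at policy or security checks.
    if enhanced in ("4.1.1", "5.1.1"):
        hint = "recipient lookup"
    elif enhanced is not None and enhanced.split(".")[1] == "7":
        hint = "policy or security"
    else:
        hint = "read the diagnostic text"
    return (severity, hint)
```

Feed it the diagnostic line from the bounce, not the sender UI's paraphrase. A line with no code at all comes back as "unparsed", which is your cue to ask for the transcript.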

One idea that belongs on every mail engineer’s desk: “Hope is not a strategy.” The line is attributed to many reliability folks, and whoever said it first, it holds up as an operations principle. In mail terms: stop hoping it’s “just a transient.” Prove where the rejection occurs.

Where the rejection happens in SMTP (and why it matters)

SMTP is conversational. That matters because each stage has different knobs, different logs, and different consequences.

Relevant stages in the conversation

  1. Connect / banner: your edge says hello. TLS negotiation may happen. This is where IP reputation gates can kill you before you even get to recipients.
  2. EHLO/HELO: capabilities. Policy engines sometimes apply HELO checks or require TLS before proceeding.
  3. MAIL FROM: envelope sender. SPF can be evaluated here; DMARC needs the header From and can only be evaluated after DATA. Sender-based refusals are often deferred and reported at RCPT time (Postfix does this by default via smtpd_delay_reject), which makes them look like recipient problems.
  4. RCPT TO: the recipient address. This is the usual site of “recipient rejected.” Recipient existence checks, directory lookups, routing to downstream mailbox servers, and recipient filtering rules all trigger here.
  5. DATA: message body. Content scanning, size limits, malware checks. If you reject here, the bounce semantics can look very different.

If the rejection happens at RCPT time, you should expect no copy of the message on your side (unless you deliberately journal or log it). That means you debug using logs and traces, not mailbox searches.
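
If you have a transcript, the stage can be read off mechanically. The sketch below is one way to do it in Python, assuming you have already split the transcript into (command, reply) pairs; the function name and stage labels are made up for illustration.

```python
# Command prefixes mapped to the stages in the list above.
STAGES = (
    ("EHLO", "helo"),
    ("HELO", "helo"),
    ("MAIL FROM", "mail-from"),
    ("RCPT TO", "rcpt-to"),
    ("DATA", "data"),
    (".", "data"),  # the end-of-message dot is answered as part of DATA
)

def rejection_stage(exchanges):
    """Find the first refused step in a list of (command, reply) pairs.

    Use an empty command for the server banner. Returns (stage, reply),
    or None if every reply was positive.
    """
    for command, reply in exchanges:
        if not reply.startswith(("4", "5")):
            continue
        verb = command.strip().upper()
        if verb == "":
            return ("connect", reply)
        for prefix, stage in STAGES:
            if verb.startswith(prefix):
                return (stage, reply)
        return ("other", reply)
    return None
```

A result of "rcpt-to" tells you to look at recipient maps and policy services; "connect" or "helo" means the recipient was never even considered, whatever the bounce text says.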

And yes, it’s possible to reject at RCPT time because your backend mailbox store is slow or unavailable. Systems do that to avoid accepting mail they can’t deliver. Mail is a store-and-forward protocol, but plenty of real deployments behave like a synchronous RPC service.

Joke #1: SMTP is the only protocol where “550” can mean “no such user” and also “I’m having a bad day, don’t ask me why.”

Interesting facts and history (short, useful)

  • Fact 1: SMTP predates modern identity systems; it was designed for cooperative networks, not adversarial spam ecosystems. Many “recipient rejected” issues are policy retrofits on an old protocol.
  • Fact 2: The idea of rejecting at RCPT time became popular because accepting DATA for spam is expensive—bandwidth, CPU for scanning, and disk I/O.
  • Fact 3: “Directory harvest attacks” (probing for valid recipients) pushed admins to disable recipient verification or to lie with generic failures, which then confuse legitimate senders.
  • Fact 4: Greylisting (temporary 4xx rejections) was an early anti-spam technique that exploited the fact that compliant MTAs retry. It still causes “valid user bounced” complaints when senders don’t retry properly.
  • Fact 5: Catch-all mailboxes were once a common safety net. Many orgs removed them to reduce spam intake, increasing the frequency of hard 5xx bounces.
  • Fact 6: DNSBLs (blocklists) are operationally effective but can create false positives that manifest as recipient rejections depending on where checks are applied.
  • Fact 7: Some MTAs implement “recipient callouts” (verify recipients by querying downstream systems). Misconfigured callouts can turn transient backend slowness into “user unknown.”
  • Fact 8: Internationalized email addresses exist (SMTPUTF8), but many systems still mishandle them. Users with non-ASCII local parts can be “valid” internally and rejected externally.
  • Fact 9: Large providers increasingly use rate limits and anomaly detection that can reject at RCPT time as an abuse signal, not a recipient existence claim.

A taxonomy of “valid user still bounced” failure modes

1) The recipient exists, but not on that system

Mail routing is brutally literal. If MX points to the wrong place, or a gateway thinks it is authoritative for a domain it isn’t, the recipient can be “unknown” because the wrong server is being asked.

Common causes:

  • MX records changed but not propagated, or cached incorrectly.
  • Split-brain: different resolvers see different MX answers.
  • Multiple inbound paths (cloud filter + on-prem) with inconsistent recipient validation rules.
  • A stale “relay domains” / “local domains” list on the edge MTA.

2) Recipient validation depends on a directory lookup that is failing

At RCPT time, your edge can consult LDAP/AD, SQL, an API, or a flat map. If that lookup times out or errors, some configurations fail closed (reject) and some fail open (accept and queue).

Fail-closed is safer for spam load. It’s also how you create a high-priority incident for a directory outage that would otherwise be “just login is slow.”
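
The difference between the two behaviours fits in a few lines. This is a sketch of the decision, not any MTA's actual implementation: the Lookup enum, the function name, and the reply texts are assumptions chosen for illustration.

```python
from enum import Enum

class Lookup(Enum):
    FOUND = "found"
    NOT_FOUND = "not found"
    ERROR = "error"  # timeout, refused connection, bad credentials

def rcpt_decision(result, fail_open):
    """Map a directory lookup outcome to an SMTP reply for RCPT TO."""
    if result is Lookup.FOUND:
        return "250 2.1.5 Ok"
    if result is Lookup.NOT_FOUND:
        # The directory answered and said no: a permanent failure is honest.
        return "550 5.1.1 User unknown"
    if fail_open:
        # The directory did not answer: accept and let the queue absorb it.
        return "250 2.1.5 Ok"
    # Fail closed, but as a temporary failure so compliant senders retry.
    return "450 4.1.1 Temporary lookup failure"
```

The important property is that "the directory said no" and "the directory did not answer" never collapse into the same reply. When they do, a directory outage shows up as "user unknown".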

3) A valid mailbox exists, but the address alias does not

Users love aliases. Sales especially. If you remove an alias, the user still “exists” but mail to the alias is legitimately invalid. The sender will insist it’s valid because it worked once, and they are not lying. It worked once.

This also happens with plus-addressing, subaddressing, and “dot variations.” Some systems accept them; some normalize; some reject.
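
When two hops disagree about an address, it helps to write down what "the same address" means on each. Here is a small Python sketch of one possible normalization. The delimiter, the dot stripping, and the case folding of the local part are all assumptions that vary by system, so check what each hop really does before relying on it.

```python
def normalize_recipient(address, delimiter="+", strip_dots=False):
    """Reduce an address to the form a given hop might look up.

    delimiter: subaddress separator to strip ("" disables stripping).
    strip_dots: drop dots in the local part (some providers do, most do not).
    """
    local, sep, domain = address.strip().rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not a full address: {address!r}")
    if delimiter and delimiter in local:
        base = local.split(delimiter, 1)[0]
        if base:  # keep odd addresses like "+tag@..." untouched
            local = base
    if strip_dots:
        local = local.replace(".", "")
    # Domains are case-insensitive; folding the local part is an assumption.
    return f"{local.lower()}@{domain.lower()}"
```

Run the bouncing address through the rules of each hop. If the edge looks up "jane.doe+invoices@example.com" literally while the mailbox tier strips the tag, you have found your "valid user".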

4) Back-end mailbox store is down or overloaded; the edge rejects early

If your edge validates recipients by talking to the mailbox server (LMTP, proxy, callout), then storage latency can surface as SMTP recipient rejection. This is where being a storage engineer gets fun in the “why am I here at 3 a.m.” way.

If mailbox metadata lives on networked storage and that storage is unhappy—latency spikes, pool near full, metadata corruption, or an NFS hiccup—your recipient checks can fail and you’ll deny perfectly valid users.

5) Over-eager anti-spam/anti-abuse rules misfire

Policy engines sometimes reject at RCPT time because it’s cheap. But when you wire content or reputation logic into RCPT, the error texts can be misleading. “Recipient rejected” becomes the generic wrapper for “your IP looks sketchy” or “you’re sending too fast.”

6) Sender side is broken (and you still get blamed)

Some senders don’t retry on 4xx. Some rewrite the envelope. Some send malformed commands. Some do recipient batching that triggers your per-connection limits. And some systems show users a generic “address rejected” regardless of what happened.

Joke #2: If you ever feel useless, remember there are email clients that hide the SMTP status code “to keep things simple.”

Fast diagnosis playbook

When you’re on the clock, you don’t need a lecture. You need to find the bottleneck fast and move.

First: classify the bounce with the actual SMTP code

  • If it’s 5xx, it’s a hard reject. Assume a config/policy problem until proven otherwise.
  • If it’s 4xx, it’s a temporary reject. Assume throttling, greylisting, or backend outage/latency.
  • If the bounce UI doesn’t show codes, demand the raw bounce headers or SMTP transcript. No transcript, no certainty.

Second: confirm mail routing (DNS and edge ownership)

  • Check MX from multiple resolvers.
  • Confirm the rejecting host is actually yours (or your vendor’s edge) and not an obsolete gateway.
  • Verify the domain is configured consistently across all inbound hops.

Third: reproduce with an SMTP session against the same edge

  • Use openssl s_client for STARTTLS and send a manual RCPT TO.
  • Compare behavior for a known-good user vs the “bouncing” user.
  • Look for differences: alias mapping, case sensitivity, subaddressing, domain variants.

Fourth: read the receiving logs at the rejection point

  • Find the exact reject reason in the MTA logs (not the bounce paraphrase).
  • Identify whether the rejection came from: recipient maps, LDAP, policy service, spam filter, or a downstream callout.

Fifth: decide whether to fail open temporarily

If directory/backends are flaky and you have queue capacity, consider temporarily accepting and queueing mail rather than rejecting at RCPT time. It buys time and protects business traffic. It also increases spam intake. This is a grown-up tradeoff—make it consciously.

Hands-on tasks: commands, outputs, decisions

Below are practical tasks you can run during an incident. Each includes a realistic command, sample output, what it means, and the decision to make. Use them like tools, not rituals.

Task 1: Inspect the bounce for the real SMTP status

cr0x@server:~$ grep -E "Status:|Diagnostic-Code:|Final-Recipient:" -n bounce.eml
12:Final-Recipient: rfc822; user@example.com
13:Action: failed
14:Status: 5.1.1
15:Diagnostic-Code: smtp; 550 5.1.1 <user@example.com>: Recipient address rejected: User unknown in virtual mailbox table

Meaning: This is a hard reject at recipient stage, and the text indicates a local recipient mapping problem (“virtual mailbox table”).

Decision: Skip spam/reputation theories. Go straight to recipient lookup configuration and maps.
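
If you handle bounces in bulk, the same extraction can be scripted. The sketch below uses Python's standard email package and assumes the bounce is a standard multipart/report DSN with a message/delivery-status part; the function name and the dictionary keys are illustrative.

```python
import email

def extract_dsn_fields(raw_message):
    """Return one dict per recipient block found in a DSN."""
    message = email.message_from_string(raw_message)
    results = []
    # The parser exposes each header block of a message/delivery-status
    # part as a nested message, so walk() visits the recipient blocks too.
    for part in message.walk():
        final_recipient = part.get("Final-Recipient")
        if not final_recipient:
            continue
        results.append({
            # "rfc822; user@example.com" -> "user@example.com"
            "recipient": final_recipient.split(";", 1)[-1].strip(),
            "action": part.get("Action", ""),
            "status": part.get("Status", ""),
            "diagnostic": part.get("Diagnostic-Code", ""),
        })
    return results
```

Bounces that are not real DSNs (free-text "your mail could not be delivered" messages) will return an empty list; for those you are back to grep and a polite request for the transcript.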

Task 2: Confirm MX records for the domain

cr0x@server:~$ dig +short MX example.com
10 mx1.mailgw.example.net.
20 mx2.mailgw.example.net.

Meaning: Inbound mail should flow to these gateways.

Decision: If the rejecting host is not one of these, you may be debugging a decommissioned or shadow gateway, or the sender is using cached/incorrect DNS.

Task 3: Check MX resolution from a different resolver (split-brain hunt)

cr0x@server:~$ dig @1.1.1.1 +short MX example.com
10 mx1.mailgw.example.net.
20 mx2.mailgw.example.net.

Meaning: Public resolver agrees. Good sign.

Decision: If internal resolvers disagree, fix internal DNS caching/forwarding before touching mail configs.
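
Comparing resolver answers by eye gets old quickly. A minimal Python sketch, assuming you have captured the text output of dig +short MX from each resolver yourself; the function names and resolver labels are made up for illustration.

```python
def parse_mx(dig_short_output):
    """Parse `dig +short MX` text into a sorted list of (preference, host)."""
    records = set()
    for line in dig_short_output.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0].isdigit():
            # Normalize case and the trailing dot so equal answers compare equal.
            records.add((int(parts[0]), parts[1].rstrip(".").lower()))
    return sorted(records)

def mx_disagreements(answers):
    """answers maps a resolver label to its dig output.

    Returns the parsed answers when resolvers disagree, otherwise {}.
    """
    parsed = {name: parse_mx(text) for name, text in answers.items()}
    distinct = {tuple(records) for records in parsed.values()}
    return parsed if len(distinct) > 1 else {}
```

An empty result means every resolver you sampled agrees. It does not prove the sender's resolver agrees, so keep the rejecting hostname from the bounce as the tiebreaker.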

Task 4: Verify the rejecting server identity and TLS name

cr0x@server:~$ openssl s_client -starttls smtp -connect mx1.mailgw.example.net:25 -servername mx1.mailgw.example.net </dev/null 2>/dev/null | openssl x509 -noout -subject -issuer
subject=CN = mx1.mailgw.example.net
issuer=CN = Example Internal Issuing CA

Meaning: You’re talking to the expected gateway, and the certificate matches.

Decision: Proceed with live SMTP testing against the same endpoint.

Task 5: Manually reproduce the RCPT rejection over SMTP

cr0x@server:~$ printf "EHLO test.example.net\r\nMAIL FROM:<probe@test.example.net>\r\nRCPT TO:<user@example.com>\r\nQUIT\r\n" | nc -w 5 mx1.mailgw.example.net 25
220 mx1.mailgw.example.net ESMTP Postfix
250-mx1.mailgw.example.net
250-PIPELINING
250-SIZE 52428800
250-STARTTLS
250-ENHANCEDSTATUSCODES
250 8BITMIME
250 2.1.0 Ok
550 5.1.1 <user@example.com>: Recipient address rejected: User unknown in virtual mailbox table
221 2.0.0 Bye

Meaning: The gateway itself is rejecting the recipient before DATA. This isn’t a downstream content scan.

Decision: Look at Postfix virtual mailbox maps, alias maps, LDAP maps, or policy delegation.
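
To compare a control recipient against the bouncing one in a single session, the same probe can be written with Python's smtplib. This is a sketch under assumptions: the function names are invented, the sender address is a placeholder, and you should only point it at gateways you operate, since probing other people's servers looks exactly like a directory harvest attack.

```python
import smtplib

def probe_recipients(conn, sender, recipients):
    """Ask an already-greeted SMTP connection about each recipient.

    Returns {recipient: (step, code, text)} where step is the command that
    produced the reply. No DATA is sent, so no mail is delivered.
    """
    results = {}
    for recipient in recipients:
        conn.rset()  # start a clean transaction for each recipient
        code, message = conn.mail(sender)
        step = "MAIL FROM"
        if code == 250:
            code, message = conn.rcpt(recipient)
            step = "RCPT TO"
        results[recipient] = (step, code, message.decode("utf-8", errors="replace"))
    return results

def probe_host(host, sender, recipients, port=25, timeout=10):
    """Connect, greet, upgrade to TLS if offered, then probe."""
    with smtplib.SMTP(host, port, timeout=timeout) as conn:
        conn.ehlo()
        if conn.has_extn("starttls"):
            conn.starttls()
            conn.ehlo()
        return probe_recipients(conn, sender, recipients)
```

Something like probe_host("mx1.mailgw.example.net", "probe@test.example.net", ["user@example.com", "control@example.com"]) gives you both answers from the same edge at the same moment, which removes "maybe it was a different server" from the argument.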

Task 6: Check whether the address is present in Postfix maps

cr0x@server:~$ postmap -q "user@example.com" /etc/postfix/virtual_mailbox_maps
mailboxes/example.com/user/

Meaning: The map contains an entry and resolves to a mailbox path.

Decision: If the SMTP test still rejects, the gateway may not be using this map (wrong config), or you’re querying the wrong file/type.

Task 7: Confirm what maps Postfix is actually using

cr0x@server:~$ postconf -n | grep -E "virtual_mailbox_maps|virtual_alias_maps|relay_domains|mydestination|smtpd_recipient_restrictions"
mydestination = localhost
relay_domains = example.com
smtpd_recipient_restrictions = permit_mynetworks, reject_unauth_destination, reject_unknown_recipient_domain, reject_unlisted_recipient
virtual_alias_maps = hash:/etc/postfix/virtual_alias_maps
virtual_mailbox_maps = ldap:/etc/postfix/ldap/virtual_mailbox_maps.cf

Meaning: Recipient validation is enforced (reject_unlisted_recipient) and the mailbox map is LDAP-based, not hash-based.

Decision: Stop editing /etc/postfix/virtual_mailbox_maps. Fix LDAP lookup, credentials, base DN, filter, or connectivity.

Task 8: Test LDAP lookup directly (the “is the directory lying?” check)

cr0x@server:~$ ldapsearch -x -H ldap://ldap1.example.net -b "ou=People,dc=example,dc=net" "(mail=user@example.com)" dn mail
dn: uid=user,ou=People,dc=example,dc=net
mail: user@example.com

Meaning: LDAP contains the address.

Decision: If Postfix still rejects, the issue is in the LDAP map configuration (filter mismatch, attribute name, bind DN permissions) or in intermittent LDAP failures.

Task 9: Check Postfix logs for the specific reject reason

cr0x@server:~$ sudo grep -R "reject: RCPT from" /var/log/mail.log | tail -n 5
Jan 04 09:42:11 mx1 postfix/smtpd[28411]: NOQUEUE: reject: RCPT from sender-gw.remote.net[203.0.113.77]: 550 5.1.1 <user@example.com>: Recipient address rejected: User unknown in virtual mailbox table; from=<sender@remote.net> to=<user@example.com> proto=ESMTP helo=<sender-gw.remote.net>

Meaning: The reject is happening in smtpd with “NOQUEUE” (never accepted). This is a front-door policy decision.

Decision: Focus on recipient listing/lookup and restrictions; do not waste time searching the queue for the message.
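
Once you are reading more than a handful of these lines, parse them. The regular expression below is a sketch written against the log format shown in this article; other Postfix versions or syslog wrappers may need adjustments, and the field names are my own.

```python
import re

REJECT_RE = re.compile(
    r"NOQUEUE: reject: (?P<stage>\w+) from (?P<client>\S+)\[(?P<ip>[^\]]+)\]: "
    r"(?P<code>\d{3}) (?P<enhanced>\d\.\d+\.\d+) (?P<reason>.*?); "
    r"from=<(?P<sender>[^>]*)> to=<(?P<recipient>[^>]*)>"
)

def parse_reject(line):
    """Return the fields of a Postfix NOQUEUE reject line, or None."""
    match = REJECT_RE.search(line)
    return match.groupdict() if match else None

def rejects_for(lines, recipient):
    """Collect parsed rejects for one recipient, compared case-insensitively."""
    wanted = recipient.lower()
    parsed = (parse_reject(line) for line in lines)
    return [p for p in parsed if p and p["recipient"].lower() == wanted]
```

Grouping the results by code and reason usually answers the question faster than scrolling: one recipient with both 450 and 550 replies in the same hour points at a flaky lookup, not a missing mailbox.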

Task 10: Validate the LDAP map file Postfix uses

cr0x@server:~$ sudo sed -n '1,120p' /etc/postfix/ldap/virtual_mailbox_maps.cf
server_host = ldap1.example.net
search_base = ou=People,dc=example,dc=net
query_filter = (mail=%s)
result_attribute = mail
bind = yes
bind_dn = cn=postfix,dc=example,dc=net

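Meaning: This map returns the mail attribute. For the RCPT-time existence check, Postfix only needs the lookup to return some result, so this is enough to mark the recipient as listed. The returned value matters later, when the virtual delivery agent uses it as the mailbox path.

Decision: Run the same lookup Postfix runs (postmap -q "user@example.com" ldap:/etc/postfix/ldap/virtual_mailbox_maps.cf). If it returns nothing while ldapsearch finds the entry, suspect the bind credentials, search base, or filter. If you also deliver from this map, point result_attribute at an attribute that holds the mailbox path (e.g., a maildir path attribute), or use separate maps for existence vs destination.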

Task 11: Check for backend timeouts masquerading as “unknown user”

cr0x@server:~$ sudo journalctl -u postfix -S "10 min ago" | grep -Ei "timeout|ldap|defer|reject" | tail -n 20
Jan 04 09:40:02 mx1 postfix/ldap[28122]: warning: ldap_search: Timeout
Jan 04 09:40:02 mx1 postfix/smtpd[28115]: NOQUEUE: reject: RCPT from sender-gw.remote.net[203.0.113.77]: 450 4.1.1 <user@example.com>: Recipient address rejected: temporary lookup failure; from=<sender@remote.net> to=<user@example.com> proto=ESMTP helo=<sender-gw.remote.net>

Meaning: LDAP lookup timed out; the rejection is temporary (4xx). Some senders will still present this as a “bounce.”

Decision: Treat as a dependency SLO breach: fix LDAP latency/connectivity. Consider fail-open or caching for recipient existence checks.

Task 12: Verify whether you’re greylisting or rate limiting at RCPT

cr0x@server:~$ sudo grep -E "greylist|rate|policyd|postscreen|SASL|4\.7\.1" /var/log/mail.log | tail -n 10
Jan 04 09:41:09 mx1 postfix/smtpd[28307]: warning: unknown[203.0.113.77]: SASL LOGIN authentication failed: authentication failure
Jan 04 09:41:12 mx1 postfix/smtpd[28411]: NOQUEUE: reject: RCPT from sender-gw.remote.net[203.0.113.77]: 451 4.7.1 Service unavailable - try again later; from=<sender@remote.net> to=<user@example.com> proto=ESMTP helo=<sender-gw.remote.net>

Meaning: The rejection is 451 4.7.1: temporary, likely a policy gate (greylisting, abuse, or auth issues).

Decision: If the sender is legitimate and time-sensitive, whitelist their sending IPs temporarily while you tune the policy and confirm their retry behavior.

Task 13: Check whether the domain is treated as local/relay correctly

cr0x@server:~$ postconf -n | grep -E "mydestination|virtual_mailbox_domains|relay_domains"
mydestination = localhost
relay_domains = example.com, example.org
virtual_mailbox_domains = example.com

Meaning: The domain appears both as relay and virtual mailbox domain. That’s suspicious; it can change which lookups are used.

Decision: Clarify architecture: either you host mailboxes (virtual mailbox domains) or you relay to downstream (relay_domains). Mixed configuration often results in “valid user” rejections on one path.
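
This overlap is easy to check automatically across a fleet of gateways. A Python sketch that reads postconf -n output as text and reports domains listed in more than one address class; it only understands inline domain lists, and it deliberately skips lookup tables (anything with a type prefix or a path), so treat a clean result as "nothing obvious", not "nothing wrong".

```python
import re

# Postfix parameters that each define one address class.
CLASS_PARAMS = (
    "mydestination",
    "virtual_alias_domains",
    "virtual_mailbox_domains",
    "relay_domains",
)

def overlapping_domains(postconf_output):
    """Return {domain: [parameters]} for domains in more than one class."""
    seen = {}
    for line in postconf_output.splitlines():
        name, sep, value = line.partition("=")
        name = name.strip()
        if not sep or name not in CLASS_PARAMS:
            continue
        for item in re.split(r"[,\s]+", value.strip()):
            # Skip empty items, lookup tables, paths, and $variables.
            if not item or ":" in item or "/" in item or item.startswith("$"):
                continue
            seen.setdefault(item.lower(), []).append(name)
    return {domain: params for domain, params in seen.items() if len(params) > 1}
```

Run it against every inbound hop, not only the one that rejected. The interesting finding is often that two gateways classify the same domain differently.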

Task 14: Confirm the mailbox exists on the storage-backed mail store

cr0x@server:~$ sudo ls -ld /var/vmail/example.com/user
drwx------ 12 vmail vmail 4096 Jan  4 09:10 /var/vmail/example.com/user

Meaning: The maildir exists on disk. Good.

Decision: If recipient checks still fail, the issue is not “mailbox folder missing” but likely mapping/lookup or permission/SELinux context mismatches.

Task 15: Check storage health and “near-full” conditions that cause mail systems to refuse recipients

cr0x@server:~$ df -h /var/vmail
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb1       2.0T  1.9T   80G  96% /var/vmail

Meaning: 96% full on the mail volume. Many MTAs and delivery agents start failing in odd ways under high utilization, and some policy layers reject early to avoid queue explosions.

Decision: Initiate capacity relief immediately: expand, clean up, or move mailboxes. Don’t tune SMTP errors while storage is screaming.
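
If you want this check in monitoring rather than in your shell history, here is a small Python sketch. The 90 and 97 percent thresholds and the state names are placeholders I picked for illustration; choose values that match your volume size and growth rate.

```python
import shutil

def volume_state(used, free, warn_pct=90.0, stop_pct=97.0):
    """Classify a volume by percent used, roughly as df computes Use%."""
    pct = 100.0 * used / (used + free)
    if pct >= stop_pct:
        return ("tempfail", pct)  # too full to accept mail safely
    if pct >= warn_pct:
        return ("warn", pct)
    return ("ok", pct)

def check_path(path="/var/vmail"):
    """Read usage for a mounted path and classify it."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    return volume_state(usage.used, usage.free)
```

The point of a "tempfail" state is that a full volume should produce 4xx replies and a page, not a wave of 5xx bounces for users who exist.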

Task 16: Verify queue pressure (if you accept mail then defer delivery)

cr0x@server:~$ mailq | tail -n 5
-- 2435 Kbytes in 512 Requests.

Meaning: Moderate queue. If this number spikes, you may be accepting mail but failing downstream, which can prompt admins to “temporarily” reject recipients and accidentally create bounces.

Decision: If queue growth is unbounded, fix downstream delivery first; consider temporarily increasing queue resources or throttling inbound rather than hard-bouncing valid users.
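
"Unbounded" is easier to argue about with numbers. The sketch below parses the mailq summary line shown above and flags a queue that has grown across consecutive samples; the three-sample window is an arbitrary assumption, and how often you sample is up to you.

```python
import re

SUMMARY_RE = re.compile(r"^-- (\d+) Kbytes in (\d+) Requests?\.$")

def parse_mailq_summary(output):
    """Return (kbytes, requests) from mailq output, or None if not found."""
    for line in reversed(output.strip().splitlines()):
        line = line.strip()
        if line == "Mail queue is empty":
            return (0, 0)
        match = SUMMARY_RE.match(line)
        if match:
            return (int(match.group(1)), int(match.group(2)))
    return None

def is_growing(request_counts, samples=3):
    """True if the last `samples` counts are strictly increasing."""
    recent = request_counts[-samples:]
    if len(recent) < samples:
        return False
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))
```

A queue that grows for three samples in a row during a downstream outage is expected. A queue that keeps growing after the downstream recovers is the one to worry about.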

Three corporate mini-stories from the trenches

Mini-story 1: The incident caused by a wrong assumption

They migrated inbound filtering to a cloud email security gateway. Clean change window, rollback plan, tests passed. The team assumed recipient validation would be identical to the old on-prem edge because “it’s just forwarding mail.”

On Monday morning, a subset of users started bouncing for external senders with “550 5.1.1 user unknown.” Internally, everything looked fine: users could email each other, and they could receive from a few partners. It felt random, which is how you know you’re in for a long day.

The wrong assumption was subtle: the old edge accepted mail for all recipients and let the mailbox tier decide what was valid. The new gateway did recipient validation at RCPT time by syncing a directory feed. The feed update lagged. Newly created aliases and distribution lists weren’t present yet, so the gateway rejected them as unknown.

The immediate fix was operational, not philosophical: increase directory sync frequency, and temporarily disable strict recipient rejection for certain address patterns while the sync stabilized. The longer-term fix was governance: any identity change that affects email addresses must be treated like production config, with propagation guarantees and observability.

Everyone learned the same lesson they always learn, just again: “forwarding mail” is a myth. Every hop is a policy enforcement point. If you don’t know which hop is authoritative for recipient existence, you don’t have an email system—you have a bounce generator.

Mini-story 2: The optimization that backfired

A large org was getting hammered by spam and directory harvest attempts. Someone proposed an optimization: reject unknown recipients immediately at the edge, before content scanning. That would save CPU and disk. It was correct in principle and disastrous in detail.

They implemented recipient verification by doing live callouts from the edge to the mailbox servers. The mailbox servers were backed by a busy storage cluster that occasionally had latency spikes during snapshot operations. Most users never noticed because mail delivery could lag a bit and catch up. But with callouts at RCPT time, a 300ms spike turned into an SMTP timeout and then into rejections.

Suddenly, during snapshot windows, the edge started rejecting valid recipients as “unknown.” External senders saw hard failures. Internally, monitoring showed “mailbox servers fine” because the servers were up, just slow. Storage graphs showed the spike, but nobody connected it to SMTP behavior until an SRE sat down and replayed the SMTP transcript.

The fix wasn’t to disable spam protection. The fix was to stop using synchronous backend dependency checks for recipient existence. They moved to a cached recipient directory on the edge with bounded staleness, and configured timeouts to fail open for lookup errors while rate-limiting suspicious senders. They also moved storage snapshots away from mail peak hours. Boring alignment work. Big payoff.

Optimization is great, but only when you understand the new failure mode you’re buying. Here, they traded “spam load” for “production dependency coupling.” In mail, coupling is where reliability goes to die.

Mini-story 3: The boring but correct practice that saved the day

A finance team complained that invoices to a particular employee bounced with “recipient address rejected.” The employee was definitely real. Everyone braced for another day of mail archaeology.

The mail team had one boring practice: every inbound SMTP reject was logged with a structured reason code from the policy engine, and those logs were searchable by recipient and sending IP. Nothing fancy. Just disciplined logging and retention.

They queried the logs and found that the same sender IP had triggered a rate-limit rule after a burst of messages to multiple recipients. The policy engine rejected at RCPT time with a 451 4.7.1, but the sender’s system translated it into a hard bounce for the user. The “recipient rejected” phrasing was a UI lie.

Because they had the exact timing, rule name, and SMTP code, they could do a surgical fix: whitelist the sender IP temporarily and ask the partner to adjust retry behavior. No config thrash. No speculative DNS changes. The incident closed quickly, and the team looked magically competent—despite mostly being competent in a profoundly unmagical way.

Observability is not glamorous, but it’s the difference between an incident and a mystery novel. In production, you want incidents, not novels.

Common mistakes: symptom → root cause → fix

1) Symptom: “550 5.1.1 user unknown” for a user who exists

Root cause: Edge is doing strict recipient validation against a stale or failing directory/map.

Fix: Verify which map is authoritative (postconf -n), test lookup directly, fix sync/LDAP timeouts. Consider caching and fail-open for lookup failures.

2) Symptom: Bounces only from some external senders

Root cause: Split routing (different MX targets), sender caching old MX, or partner using a specific gateway path that applies different recipient policies.

Fix: Compare rejecting hostnames/IPs across bounces. Align recipient policy across all inbound gateways. Ensure old gateways return clear “domain not handled” instead of “user unknown.”

3) Symptom: Only aliases bounce; primary address works

Root cause: Alias map missing, alias removed, or alias not included in directory feed used by the edge.

Fix: Re-add alias or correct alias sync. If you intentionally removed it, communicate and set up a controlled redirect or auto-reply period instead of a hard bounce.

4) Symptom: Random recipients bounce during peak hours

Root cause: Recipient verification depends on backend latency (LDAP, mailbox server callouts, storage). Timeouts cause rejects.

Fix: Increase resilience: local cache, longer/appropriate timeouts, fail-open for transient errors, and separate spam throttles from recipient existence checks.

5) Symptom: “Recipient address rejected” but SMTP code is 5.7.1

Root cause: Policy rejection (reputation, authentication, TLS required, sender blocked) presented with misleading text by the sender UI or a gateway.

Fix: Use logs/transcript. If it’s policy, tune policy. Don’t waste time validating the mailbox.

6) Symptom: New hires bounce for a few hours after creation

Root cause: Identity-to-mail propagation delay (directory sync, address book policy, cloud gateway sync). Edge rejects until data arrives.

Fix: Shorten sync interval, publish expected propagation SLOs, and consider accepting and queueing mail for unknown recipients during that window if business impact is high.

7) Symptom: Only internationalized addresses bounce externally

Root cause: SMTPUTF8/IDN handling mismatch; gateway or sender doesn’t support UTF-8 local parts or punycode domain expectations.

Fix: Ensure SMTPUTF8 support end-to-end or avoid non-ASCII local parts for externally facing addresses. For domains, confirm consistent IDN configuration.

8) Symptom: User exists, but mail bounces after mailbox move/migration

Root cause: Mail routing attribute (targetAddress, mail routing table, or internal transport map) not updated everywhere; some gateways still point to old mailbox store.

Fix: Audit routing attributes, ensure old store returns temporary defers (4xx) rather than hard 5xx during migration, and keep forwarding/relays until cutover is fully consistent.

Checklists / step-by-step plan

Step-by-step: handle a “valid user bounced” ticket like an adult

  1. Get the raw bounce. Ask for the full headers and the SMTP diagnostic line. Screenshots are decoration.
  2. Extract: SMTP code, rejecting host, timestamp, recipient, envelope sender, sending IP if present.
  3. Confirm routing: MX records, and whether the rejecting host is on the inbound path you expect.
  4. Reproduce: manual SMTP session to the rejecting host; test both the reported recipient and a control recipient.
  5. Read the logs: find the NOQUEUE: reject entry and the module (smtpd, policy, ldap, postscreen).
  6. Identify the gate: recipient maps, directory, policy engine, rate limit, callout, or downstream routing.
  7. Fix the gate: map entry, alias sync, timeout tuning, whitelist, backend health, or DNS/routing correction.
  8. Decide on temporary mitigation: fail-open for lookups, temporary accept-and-queue, or partner whitelist.
  9. Verify with the same reproduction test. Don’t wait for a partner to “try again later” without proof.
  10. Write down the class of incident. If you can’t categorize it next time, you didn’t really fix it.

Checklist: design choices that reduce these incidents

  • Choose where recipient existence is enforced: edge vs mailbox tier. Be explicit.
  • If you enforce at the edge, cache with a defined staleness window and monitor sync health.
  • Separate “unknown user” from “policy reject” in logs and in SMTP responses (clear enhanced status codes help).
  • Prefer 4xx on dependency failures (LDAP timeout, mailbox callout timeout). 5xx should mean “this will never work.”
  • Keep inbound paths consistent (multiple MX, cloud gateways, regional edges). Same recipient logic everywhere.
  • Instrument and retain reject logs long enough to match user-reported timelines.
  • Storage capacity monitoring is email monitoring if your mailbox metadata and delivery depend on it.
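
The "cache with a defined staleness window" item can be sketched in a few lines of Python. This is an illustration of the idea, not a drop-in component: the class name, the lookup callable, and the 300-second default are assumptions, and a real edge would also need eviction and a way to refresh in bulk.

```python
import time

class RecipientCache:
    """Cache recipient existence and survive directory failures."""

    def __init__(self, lookup, max_age=300.0, clock=time.monotonic):
        self._lookup = lookup    # callable(address) -> bool, may raise
        self._max_age = max_age  # seconds an answer counts as fresh
        self._clock = clock
        self._entries = {}       # address -> (exists, fetched_at)

    def check(self, address):
        """Return (exists, source); source is 'live', 'cache', or 'stale'.

        Re-raises the lookup error only when nothing is cached, so the
        caller can decide between fail-open and a 4xx reply.
        """
        now = self._clock()
        cached = self._entries.get(address)
        if cached is not None and now - cached[1] <= self._max_age:
            return (cached[0], "cache")
        try:
            exists = self._lookup(address)
        except Exception:
            if cached is not None:
                return (cached[0], "stale")  # old answer beats no answer
            raise
        self._entries[address] = (exists, now)
        return (exists, "live")
```

Count how often you serve "stale" answers. That number is your directory's availability as seen from the mail edge, and it is usually worse than the directory team's dashboard suggests.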

Checklist: what not to do (because it feels productive)

  • Don’t “flush DNS” as a first move. Verify DNS answers and caching behavior first.
  • Don’t disable recipient validation globally as a panic response without considering spam flood impact.
  • Don’t edit map files that Postfix isn’t using (check postconf -n before you touch anything).
  • Don’t treat 4xx as a non-issue; some senders won’t retry, and you will still own the outage.
  • Don’t accept mail blindly if you can’t queue safely (disk almost full, queue already exploding).

FAQ

1) If the user can log in, why would email say “user unknown”?

Because login identity and mail routing identity are different systems. The edge may validate against a different directory view, a stale sync, or a different attribute (primary vs alias).

2) Is “Recipient address rejected” always our fault?

No. It can be a sender that doesn’t retry on 4xx, a sender using stale MX, or a UI that paraphrases every rejection as an address problem. Your job is to prove where the reject happened and why.

3) Should we reject unknown recipients at the edge?

Usually yes for spam control, but only if you can do it reliably: fast directory lookups, caching, and sane behavior on dependency failures. Otherwise you trade spam for self-inflicted bounces.

4) What’s the difference between rejecting at RCPT time vs after DATA?

RCPT-time rejection happens before accepting the message body; it’s cheaper and avoids queueing. DATA-time rejection gives you more context (content scanning), but you’ve already received the body. Either way the sending MTA owns the bounce; if you accept the message and decide later that it’s undeliverable, you are the one generating the DSN.

5) Why do some bounces show 550 but the real issue is rate limiting?

Some gateways use generic 5xx codes for multiple policy rejections, and some sender software collapses multiple errors into “address rejected.” Always use the enhanced status code and the exact diagnostic text from logs.

6) Can storage issues really cause “recipient rejected”?

Yes. If recipient verification depends on mailbox metadata on disk or a downstream callout to a mailbox server that’s slow due to storage latency, the edge can time out and reject the recipient.

7) What’s the safest temporary mitigation during a directory outage?

If you have queue capacity: fail open on directory lookup errors (accept mail, defer delivery) while rate-limiting by IP and enforcing basic anti-abuse controls. If you don’t have queue capacity, you may have to 4xx tempfail.

8) Why does it only happen for new accounts?

Propagation lag: the mailbox exists in one system, but the edge’s recipient list (or cloud gateway sync) hasn’t caught up. Fix the sync pipeline and publish expectations so “two hours later it works” isn’t your operational plan.

9) Our partner insists they got a hard bounce, but we see 451 tempfails. Who’s right?

Both. You returned a temporary failure; their system chose to treat it as permanent (or their UI did). If it’s a high-value partner, ask them to confirm retry behavior and consider a scoped whitelist.

10) How do we prevent misleading error messages?

You can’t control sender UIs, but you can control your SMTP responses. Use accurate enhanced status codes and include a short reason that maps to an internal runbook key (“policy:rate-limit”, “lookup:ldap-timeout”).
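
One way to keep those responses consistent is to build them in a single place. A minimal Python sketch, assuming a "category:detail" key format like the examples above; the function name and the exact reply layout are my own choices, not something an MTA requires.

```python
import re

ENHANCED_RE = re.compile(r"^[245]\.\d{1,3}\.\d{1,3}$")
KEY_RE = re.compile(r"^[a-z0-9-]+:[a-z0-9-]+$")

def build_reject(basic, enhanced, recipient, text, runbook_key):
    """Format a rejection reply that carries a runbook key."""
    if not (len(basic) == 3 and basic.isdigit() and basic[0] in "45"):
        raise ValueError(f"not a rejection code: {basic!r}")
    # The class digit of the enhanced code has to agree with the basic code.
    if not ENHANCED_RE.match(enhanced) or enhanced[0] != basic[0]:
        raise ValueError(f"enhanced code {enhanced!r} does not fit {basic!r}")
    if not KEY_RE.match(runbook_key):
        raise ValueError(f"runbook key must look like 'policy:rate-limit': {runbook_key!r}")
    return f"{basic} {enhanced} <{recipient}>: {text} ({runbook_key})"
```

The validation is the useful part: it stops someone from shipping a "451 5.7.1" or a free-text reason that nobody can search for six months later.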

Conclusion: next steps you can actually do

“Recipient address rejected” is not a diagnosis. It’s a symptom of a decision made at SMTP recipient time—often by a policy layer that’s trying to be efficient. Your job is to find the exact gate and decide whether it should be strict, cached, or temporarily permissive.

Do these next:

  1. Standardize how you capture bounces: require SMTP code, rejecting host, and timestamp.
  2. Make recipient validation explicit: which hop is authoritative, and what happens when lookups fail.
  3. Instrument reject reasons with structured logs and keep them long enough to match business timelines.
  4. Audit dependencies: LDAP latency, backend mailbox callouts, and storage capacity. If they can reject recipients, they are part of your inbound SLO.
  5. Practice the fast diagnosis playbook once when you’re not on fire. Future-you will be less grumpy.