SMTP TLS handshake fails: fix certs, ciphers, and chains the correct way

It worked yesterday. Today your outbound queue looks like a landfill, and your inbound logs are full of “handshake failure” and “unknown CA”. Meanwhile, a VP is forwarding you screenshots of a bounced invoice like you personally melted the internet.

SMTP over TLS is not “set-and-forget”. It’s a negotiation between two systems that change independently: certificates expire, root stores evolve, cipher preferences shift, and one side eventually decides your server is living in 2014. This is the field guide for fixing it without cargo-culting configs or downgrading security until it “works”.

Fast diagnosis playbook

If you’re on-call, you don’t need a lecture. You need a tight loop: reproduce, classify, fix the correct layer, move on.

1) Confirm where it fails: before STARTTLS, during handshake, or after

  • Before STARTTLS: STARTTLS not offered, port wrong, firewall, or policy refusing encryption (yes, that’s a thing).
  • During handshake: protocol/cipher mismatch, bad certificate chain, wrong cert via SNI, expired cert, or client trust store issue.
  • After handshake: SMTP auth fails, policy rejects, or application-level issues misreported as TLS.

2) Run one canonical probe from the failing side

Do this from the same network segment and host that experiences the error. TLS is sensitive to “works from my laptop” lies.

Use openssl s_client with STARTTLS, and capture the certificate chain plus the negotiated protocol and cipher.

3) Decide which bucket you’re in

  • “No peer certificate” / “handshake failure” immediately: protocol/cipher mismatch, STARTTLS not actually happening, or the server aborts due to policy/SNI.
  • “verify error:num=20:unable to get local issuer certificate” or “num=21:unable to verify the first certificate”: broken chain presentation or missing intermediate.
  • “certificate has expired” / “not yet valid”: certificate lifecycle or clock skew.
  • “hostname mismatch”: wrong certificate, SNI not used or misconfigured, or MX points somewhere you didn’t expect.
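
The buckets above can be turned into a rough first-pass classifier. This is a minimal sketch: the function name and bucket labels are made up for illustration, and the patterns cover only a handful of common OpenSSL messages, so treat "unclassified" as "read the output yourself".

```sh
# Map captured `openssl s_client` output (on stdin) to a triage bucket.
# Hypothetical helper; the patterns are a small subset of OpenSSL messages.
classify_tls_failure() {
  probe_out=$(cat)
  case $probe_out in
    *"certificate has expired"*|*"certificate is not yet valid"*)
      echo "lifecycle-or-clock" ;;
    *"ostname mismatch"*)            # matches both "Hostname" and "hostname"
      echo "identity" ;;
    *"unable to get local issuer certificate"*|*"unable to verify the first certificate"*)
      echo "chain-or-trust" ;;
    *"handshake failure"*|*"no peer certificate available"*|*"unsupported protocol"*)
      echo "negotiation" ;;
    *"Verify return code: 0 (ok)"*)
      echo "ok" ;;
    *)
      echo "unclassified" ;;
  esac
}

# Usage (network access required, so shown as a comment):
#   openssl s_client -starttls smtp -connect mail.example.net:25 \
#     -servername mail.example.net </dev/null 2>&1 | classify_tls_failure
```

The order of the patterns matters: expiry and name problems are checked before the generic handshake messages, because a failed verification usually also produces a generic "handshake failure" line.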

4) Fix the server first unless you control the client

In SMTP, you often don’t control the other end. That means: fix what you present and what you negotiate. If your server is the one initiating outbound mail (client role), fix your trust store and policies too, but don’t paper over broken server config with “accept_invalid_certs = yes”. That’s not a fix; it’s a breach plan.

How SMTP TLS handshakes actually break (and why the logs lie)

SMTP TLS has two common modes:

  • Implicit TLS (SMTPS on 465): TLS starts immediately after TCP connect.
  • STARTTLS (usually on 25 or 587): client connects in plaintext, issues EHLO, sees STARTTLS, then upgrades the connection.
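
Mixing up the two modes is the fastest way to get a misleading error, so it can help to let a small helper pick the s_client arguments by port. A sketch, assuming the conventional port-to-mode mapping described above; the function name is hypothetical, and a non-standard port needs a human decision.

```sh
# Print the s_client arguments that match the TLS mode conventionally used
# on a port. Hypothetical helper: 465 = implicit TLS, 25/587 = STARTTLS.
s_client_args() {
  case $2 in
    465)
      echo "-connect $1:$2 -servername $1" ;;
    25|587)
      echo "-starttls smtp -connect $1:$2 -servername $1" ;;
    *)
      echo "unknown port $2: check which mode the listener uses" >&2
      return 1 ;;
  esac
}

# Usage (word splitting of the result is intentional):
#   openssl s_client $(s_client_args mail.example.net 465) </dev/null
```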

Handshake failures are often reported as a single depressing line, but the actual failure is typically one of these:

  1. Negotiation mismatch: No shared protocol (TLS 1.0 vs TLS 1.2+) or no shared cipher suites.
  2. Certificate chain issues: Server presents a leaf cert but forgets intermediates. Some clients can fetch missing intermediates; many SMTP stacks won’t.
  3. Trust issues: Client doesn’t trust the chain (old root store, private CA, missing root, or a recently distrusted CA).
  4. Identity issues: The certificate doesn’t match the name the client expects (MX hostname vs banner name vs SNI).
  5. Policy issues: Server requires client certs, requires SNI, refuses weak signatures, or enforces MTA-STS/DANE expectations on the other side.
  6. Middleboxes: Firewalls “help” by intercepting or mangling STARTTLS, or outbound TLS inspection goes sideways.

One operational truth: SMTP is an ecosystem. You can run a perfect setup and still fail with a peer that is misconfigured. Your job is to keep your side correct, observable, and reasonably compatible—without turning into the last TLS 1.0 server on Earth.

A line quoted so often in reliability circles that its attribution is disputed: “Hope is not a strategy.”

Interesting facts and history you can weaponize

  • STARTTLS was an upgrade path, not a clean-sheet design. It exists because email was already everywhere in plaintext, and nobody was going to re-platform the planet.
  • Port 465 had a weird life. It started as “SMTPS”, got discouraged in favor of STARTTLS, then came back when RFC 8314 registered it for message submission over implicit TLS.
  • TLS 1.3 changed the handshake shape. Fewer round trips, different cipher suite naming, and fewer legacy knobs. Great for security; confusing for old monitoring scripts.
  • Some SMTP clients do not fetch missing intermediates. Browsers are forgiving; MTAs are often not. Assume nothing gets auto-fixed.
  • SNI is younger than most mail servers. If you host multiple domains on one IP, modern clients send SNI, but not all SMTP stacks did historically.
  • SHA-1 deprecation still haunts long-lived appliances. Old gear may reject newer signatures, and newer gear rejects old signatures. Everyone is disappointed.
  • CAA records can break certificate renewals silently. If your automation renews via ACME and CAA blocks the issuer, you learn about it when the cert expires—on a weekend.
  • DNS-based security (DANE, MTA-STS) changed expectations. Some senders now insist on valid, verifiable TLS for delivery, and will defer rather than downgrade.
  • Trust stores are political documents. A CA can be trusted today and distrusted tomorrow. Your “it’s a valid cert” argument won’t help if the root is gone.

Practical tasks: commands, outputs, decisions

These are the tasks I actually run during incidents. Each includes the command, what the output means, and what decision you make from it.

Task 1: Check that the server offers STARTTLS on port 25

cr0x@server:~$ nc -nv 203.0.113.10 25
(UNKNOWN) [203.0.113.10] 25 (smtp) open
220 mx1.example.net ESMTP Postfix

Meaning: TCP works and you reached an SMTP banner. If this hangs or times out, you’re not debugging TLS yet.

Decision: If TCP fails, fix routing/firewall/DNS first. If banner is wrong host, chase MX and load balancers.

Task 2: Confirm STARTTLS is advertised

cr0x@server:~$ printf "EHLO probe.example\r\nQUIT\r\n" | nc -nv 203.0.113.10 25
(UNKNOWN) [203.0.113.10] 25 (smtp) open
220 mx1.example.net ESMTP Postfix
250-mx1.example.net
250-PIPELINING
250-SIZE 52428800
250-ETRN
250-STARTTLS
250-ENHANCEDSTATUSCODES
250-8BITMIME
250 DSN
221 2.0.0 Bye

Meaning: STARTTLS exists. If it’s missing, either TLS is disabled or policy hides it (rare but real).

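Decision: If STARTTLS is missing on an inbound MX, check the MTA config and make sure TLS is enabled on the port 25 listener itself (in Postfix, smtpd_tls_security_level plus a certificate and key). STARTTLS has no separate “TLS listener”.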
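
If you want this check in a script rather than in your eyeballs, the EHLO response can be tested mechanically. A minimal sketch; the function name is hypothetical and it only looks at the capability lines it is given on stdin.

```sh
# Succeeds if an EHLO response (on stdin) advertises STARTTLS.
# Strips carriage returns first, because SMTP lines end in CRLF.
offers_starttls() {
  tr -d '\r' | grep -Eq '^250[- ]STARTTLS$'
}

# Usage (network access required, so shown as a comment):
#   printf 'EHLO probe.example\r\nQUIT\r\n' | nc 203.0.113.10 25 | offers_starttls \
#     && echo "STARTTLS offered" || echo "STARTTLS missing"
```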

Task 3: Perform a STARTTLS handshake probe with SNI

cr0x@server:~$ openssl s_client -starttls smtp -connect mail.example.net:25 -servername mail.example.net -showcerts -verify_return_error
CONNECTED(00000003)
depth=2 C = US, O = Example Root CA, CN = Example Root CA R1
verify return:1
depth=1 C = US, O = Example Issuing CA, CN = Example Issuing CA G2
verify return:1
depth=0 CN = mail.example.net
verify return:1
---
Certificate chain
 0 s:CN = mail.example.net
   i:C = US, O = Example Issuing CA, CN = Example Issuing CA G2
-----BEGIN CERTIFICATE-----
...snip...
-----END CERTIFICATE-----
 1 s:C = US, O = Example Issuing CA, CN = Example Issuing CA G2
   i:C = US, O = Example Root CA, CN = Example Root CA R1
-----BEGIN CERTIFICATE-----
...snip...
-----END CERTIFICATE-----
---
New, TLSv1.3, Cipher is TLS_AES_256_GCM_SHA384
Server public key is 2048 bit
Verify return code: 0 (ok)

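Meaning: You negotiated TLS 1.3 and chain validation succeeded. The chain includes the intermediate. Note that s_client only checks the certificate against a hostname if you add -verify_hostname, so “ok” here says nothing about name matching.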

Decision: If your production clients still fail, compare their trust store, protocol support, SNI behavior, and policy requirements.

Task 4: Probe without SNI to catch the “default cert” problem

cr0x@server:~$ openssl s_client -starttls smtp -connect mail.example.net:25 -noservername -verify_hostname mail.example.net -showcerts -verify_return_error
CONNECTED(00000003)
depth=0 CN = default.invalid
verify error:num=62:Hostname mismatch
80DB5E2E677F0000:error:0A000086:SSL routines:tls_post_process_server_certificate:certificate verify failed:../ssl/statem/statem_clnt.c:1889:
---
Certificate chain
 0 s:CN = default.invalid
   i:C = US, O = Example Issuing CA, CN = Example Issuing CA G2
---
Verify return code: 62 (Hostname mismatch)

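Meaning: Without SNI, the server hands out a default certificate for another name. Keep in mind that OpenSSL 1.1.1 and later send SNI by default unless you pass -noservername, and only perform the name check when given -verify_hostname.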

Decision: If you host multiple names, ensure your SMTP daemon supports SNI and has correct per-name cert mapping; otherwise dedicate an IP or present a cert that covers all relevant names via SANs.
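
To make the with/without comparison repeatable, capture both probes to files and compare the leaf subject. A sketch with hypothetical function names; it parses the "Certificate chain" listing that s_client prints and assumes you saved one capture taken with -servername and one with -noservername.

```sh
# Print the subject of certificate 0 from saved s_client output (stdin).
leaf_subject() {
  sed -n 's/^ *0 s://p' | head -n 1
}

# Succeeds (exit 0) if the leaf subject differs between two captures:
#   $1 = capture taken with -servername, $2 = capture taken with -noservername
sni_changes_cert() {
  with_sni=$(leaf_subject < "$1")
  without_sni=$(leaf_subject < "$2")
  echo "with SNI:    ${with_sni:-<none>}"
  echo "without SNI: ${without_sni:-<none>}"
  [ "$with_sni" != "$without_sni" ]
}

# Usage (network access required, so shown as a comment):
#   openssl s_client -starttls smtp -connect mail.example.net:25 \
#     -servername mail.example.net </dev/null >with.txt 2>&1
#   openssl s_client -starttls smtp -connect mail.example.net:25 \
#     -noservername </dev/null >without.txt 2>&1
#   sni_changes_cert with.txt without.txt && echo "certificate depends on SNI"
```

Comparing subjects is the cheap version; two different certificates can share a subject, so compare fingerprints if you need certainty.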

Task 5: Identify protocol version mismatch (client too old or server too strict)

cr0x@server:~$ openssl s_client -starttls smtp -connect mail.example.net:25 -tls1
CONNECTED(00000003)
140735215769152:error:0A000102:SSL routines:ssl_choose_client_version:unsupported protocol:../ssl/statem/statem_lib.c:1950:
no peer certificate available

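Meaning: Server refuses TLS 1.0. That’s normal in 2026. Caveat: many current OpenSSL builds refuse TLS 1.0 on the client side too, so confirm the failure is the server’s answer and not your local policy before drawing conclusions.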

Decision: If a legacy client can’t do TLS 1.2+, upgrade it or isolate it behind a controlled relay. Do not re-enable TLS 1.0 on your internet MX unless you enjoy incident retrospectives.

Task 6: Identify cipher suite mismatch

cr0x@server:~$ openssl s_client -starttls smtp -connect mail.example.net:25 -tls1_2 -cipher 'RC4-SHA'
CONNECTED(00000003)
140735215769152:error:0A000410:SSL routines:ssl3_read_bytes:sslv3 alert handshake failure:../ssl/record/rec_layer_s3.c:1605:SSL alert number 40
no peer certificate available

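Meaning: You offered an ancient cipher; server refused. Good. (Many OpenSSL builds ship without RC4, in which case this command fails locally with a “no cipher match” error before it ever reaches the server.)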

Decision: If the reverse happens (client only offers junk), the fix is on the client. If your server only offers junk, fix your TLS library and config.

Task 7: Inspect the leaf certificate file on disk

cr0x@server:~$ openssl x509 -in /etc/ssl/mail/mail.example.net.crt -noout -subject -issuer -dates -ext subjectAltName
subject=CN = mail.example.net
issuer=C = US, O = Example Issuing CA, CN = Example Issuing CA G2
notBefore=Dec  1 00:00:00 2025 GMT
notAfter=Mar  1 23:59:59 2026 GMT
X509v3 Subject Alternative Name:
    DNS:mail.example.net, DNS:mx1.example.net

Meaning: Dates and SANs look sane. Expiry is soon: that’s a future incident already scheduled.

Decision: If SAN doesn’t match the hostname your peers connect to (often the MX name), fix issuance and DNS naming alignment.
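
The "expiry is soon" judgment is easy to automate from the notAfter value shown above. A minimal sketch, assuming GNU date (the -d parsing used here is not portable to BSD or BusyBox date); the function name is hypothetical.

```sh
# Print whole days between "now" and a notAfter value as printed by openssl.
#   $1 = notAfter string, e.g. "Mar  1 23:59:59 2026 GMT"
#   $2 = optional "now" in epoch seconds (defaults to the current time)
# Assumes GNU date for parsing the date string.
days_left() {
  end_epoch=$(date -u -d "$1" +%s) || return 2
  now_epoch=${2:-$(date -u +%s)}
  echo $(( (end_epoch - now_epoch) / 86400 ))
}

# Usage:
#   not_after=$(openssl x509 -in /etc/ssl/mail/mail.example.net.crt -noout -enddate | cut -d= -f2)
#   [ "$(days_left "$not_after")" -lt 21 ] && echo "renew now"
```

The 21-day threshold in the usage line is an arbitrary example; pick one that leaves room for a failed renewal and a weekend.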

Task 8: Verify that the leaf chains to a trusted root through your intermediate

cr0x@server:~$ openssl verify -CAfile /etc/ssl/certs/ca-certificates.crt -untrusted /etc/ssl/mail/intermediate.pem /etc/ssl/mail/mail.example.net.crt
/etc/ssl/mail/mail.example.net.crt: OK

Meaning: Given the intermediate, the leaf verifies against system roots.

Decision: If this fails, you likely have the wrong intermediate, an incomplete chain, or a cert from a private CA not trusted by peers.

Task 9: Inspect what the SMTP service actually presents (not what you think it presents)

cr0x@server:~$ openssl s_client -starttls smtp -connect 203.0.113.10:25 -servername mail.example.net </dev/null 2>/dev/null | openssl x509 -noout -subject -issuer -ext subjectAltName
subject=CN = mail.example.net
issuer=C = US, O = Example Issuing CA, CN = Example Issuing CA G2
X509v3 Subject Alternative Name:
    DNS:mail.example.net, DNS:mx1.example.net

Meaning: You’re checking the live service, not the file in /etc/ssl that nobody actually loaded.

Decision: If live output differs from disk, you have reload problems, wrong path in config, or multiple MTAs/listeners.
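
Comparing subjects by eye works until two certificates share a subject, which is exactly the case after a renewal. Fingerprints are a safer comparison. A sketch with hypothetical helper names; it normalizes the fingerprint line because the label before the equals sign is not guaranteed to look the same across OpenSSL builds.

```sh
# Reduce an "openssl x509 -fingerprint" line (stdin) to bare lowercase hex.
normalize_fp() {
  sed -e 's/^.*=//' | tr -d ':\r\n' | tr 'A-F' 'a-f'
}

# Succeeds only if both fingerprints are non-empty and identical.
same_fp() {
  [ -n "$1" ] && [ "$1" = "$2" ]
}

# Usage (network access required, so shown as a comment):
#   live=$(openssl s_client -starttls smtp -connect 203.0.113.10:25 \
#            -servername mail.example.net </dev/null 2>/dev/null \
#          | openssl x509 -noout -fingerprint -sha256 | normalize_fp)
#   disk=$(openssl x509 -in /etc/ssl/mail/mail.example.net.crt \
#            -noout -fingerprint -sha256 | normalize_fp)
#   same_fp "$live" "$disk" || echo "live certificate differs from the file on disk"
```

Treating an empty fingerprint as a mismatch is deliberate: a probe that failed to connect should never count as "matches disk".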

Task 10: Confirm your MTA is listening where you think

cr0x@server:~$ ss -ltnp | grep -E ':(25|465|587)[[:space:]]'
LISTEN 0      100          0.0.0.0:25        0.0.0.0:*    users:(("master",pid=1243,fd=13))
LISTEN 0      100          0.0.0.0:587       0.0.0.0:*    users:(("master",pid=1243,fd=14))
LISTEN 0      100          0.0.0.0:465       0.0.0.0:*    users:(("master",pid=1243,fd=15))

Meaning: Postfix master is bound to all three ports. If 465 isn’t present, SMTPS isn’t enabled.

Decision: Match listener reality to your published settings and client expectations.

Task 11: Pull the exact TLS errors from Postfix logs

cr0x@server:~$ sudo grep -E "warning: TLS|SSL_accept|SSL_connect|handshake" /var/log/mail.log | tail -n 8
Jan 03 11:04:18 mx1 postfix/smtpd[22418]: warning: TLS library problem: error:0A000076:SSL routines::no suitable signature algorithm:../ssl/t1_lib.c:3364:
Jan 03 11:04:18 mx1 postfix/smtpd[22418]: lost connection after STARTTLS from unknown[198.51.100.23]
Jan 03 11:06:44 mx1 postfix/smtpd[22502]: warning: TLS library problem: error:0A000410:SSL routines::sslv3 alert handshake failure:../ssl/record/rec_layer_s3.c:1605:SSL alert number 40
Jan 03 11:06:44 mx1 postfix/smtpd[22502]: lost connection after STARTTLS from mail.partner.example[203.0.113.77]

Meaning: “No suitable signature algorithm” often points to RSA/EC algorithm mismatch or very old peer limitations.

Decision: Decide whether you will support that partner’s TLS stack. If it’s a major business dependency, route via a dedicated relay with tailored compatibility; don’t weaken the main MX.

Task 12: Check the outbound side: what your server negotiates when acting as a client

cr0x@server:~$ openssl s_client -starttls smtp -connect mx.partner.example:25 -servername mx.partner.example -brief
CONNECTION ESTABLISHED
Protocol version: TLSv1.2
Ciphersuite: ECDHE-RSA-AES256-GCM-SHA384
Peer certificate: CN = mx.partner.example
Hash used: SHA256
Signature type: RSA-PSS
Verification: OK

Meaning: Outbound client negotiation succeeded; protocol/cipher look modern enough.

Decision: If outbound fails only from your MTA but succeeds with OpenSSL, check MTA TLS policy, SNI support, and CA file paths.

Task 13: Validate system clock and time sync (yes, it matters)

cr0x@server:~$ timedatectl
Local time: Sat 2026-01-03 11:12:29 UTC
Universal time: Sat 2026-01-03 11:12:29 UTC
RTC time: Sat 2026-01-03 11:12:29
Time zone: Etc/UTC (UTC, +0000)
System clock synchronized: yes
NTP service: active
RTC in local TZ: no

Meaning: Time is synced. If it isn’t, expect spurious “not yet valid” and “has expired” errors on certificates that are perfectly fine.

Decision: Fix NTP before touching TLS configs. A wrong clock makes every certificate look guilty.

Task 14: Confirm the CA store your MTA uses is populated

cr0x@server:~$ ls -l /etc/ssl/certs/ca-certificates.crt
-rw-r--r-- 1 root root 214392 Jan  2 02:14 /etc/ssl/certs/ca-certificates.crt

Meaning: You have a consolidated CA bundle. Some MTAs are configured to use custom bundles; make sure they exist and are updated.

Decision: If you’re using a custom CA file, document it and automate updates. Otherwise, keep it on the OS standard path.

Joke 1: A TLS handshake failure is like two executives meeting: both insist they’re compatible, and neither will change their defaults.

Certificate chains: how to build them correctly

The number one SMTP TLS failure I see in the wild: servers presenting an incomplete chain. Browsers often repair this by fetching intermediates using Authority Information Access (AIA). SMTP clients generally won’t. They try what you gave them, then they leave.

What “correct chain” means in SMTP terms

  • Leaf certificate: for the SMTP hostname peers connect to (typically the MX hostname).
  • Intermediate certificate(s): needed to link the leaf to a root in the client trust store.
  • Root certificate: usually not sent by the server; clients already have it. Sending it is usually harmless but can confuse some broken stacks and makes troubleshooting noisier.

Order matters

When you concatenate certificates, do it in the order: leaf first, then each intermediate up the chain. Don’t include the root unless you have a specific reason. If your ACME client wrote a “fullchain.pem” (or your CA shipped an equivalent bundle), it’s usually leaf + intermediate(s), which is what you want to present.
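
The ordering rule can be checked against what the server actually presents. This is a sketch, not a validator: it only reads the subject and issuer lines in the "Certificate chain" section of s_client output and confirms each certificate names the next one as its issuer. The function name is hypothetical, and a text match on names is weaker than real signature verification.

```sh
# Read s_client output on stdin; verify that every certificate's issuer
# equals the subject of the certificate that follows it.
# Exit codes: 0 = ordered, 1 = break in the chain, 2 = no certificates seen.
check_chain_order() {
  awk '
    BEGIN { n = 0 }
    /^ *[0-9]+ s:/ { sub(/^ *[0-9]+ s:/, ""); subj[n++] = $0; next }
    /^ +i:/        { sub(/^ +i:/, ""); if (n > 0) iss[n - 1] = $0 }
    END {
      if (n == 0) { print "no certificates found"; exit 2 }
      for (k = 0; k < n - 1; k++)
        if (iss[k] != subj[k + 1]) {
          printf "break at cert %d: issuer does not match next subject\n", k
          exit 1
        }
      printf "ok: %d certificate(s), each issued by the next\n", n
    }'
}

# Usage (network access required, so shown as a comment):
#   openssl s_client -starttls smtp -connect mail.example.net:25 \
#     -servername mail.example.net </dev/null 2>/dev/null | check_chain_order
```

A single leaf passes the ordering check trivially, so pair this with a chain-length or verification check if you want to catch missing intermediates as well.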

How chain problems show up in real logs

  • Client-side error: “unable to get local issuer certificate” or “unknown ca”.
  • Server-side symptom: connection drops right after STARTTLS, often logged as “lost connection after STARTTLS”.
  • Operational smell: some receivers accept mail, others defer for hours, depending on their TLS stack and trust store freshness.

Don’t mix chain files per service casually

One host can serve HTTPS, IMAPS, POP3S, and SMTP. Each service might be configured to use a different certificate file path. If you “renewed the cert” but only updated the web server, congrats: you fixed the wrong service. This happens more than anyone admits.

Ciphers and protocol versions: stop negotiating like it’s 1999

When the handshake fails with “no shared cipher” or “protocol version”, you’re looking at one of two situations:

  1. Your server is too strict for a legitimate peer you must support.
  2. The peer is obsolete, and you should not degrade your baseline security.

Here’s the opinionated part: for an internet-facing MX, default to TLS 1.2 and TLS 1.3, modern AEAD ciphers, and sane curves. If someone can’t talk to that, they can deliver without TLS only if your policy allows it—or they can upgrade. Your job is mail delivery and risk management, not running a compatibility museum.

SMTP reality: you don’t control most peers

But you can control how you fail:

  • For inbound: you can offer TLS but still accept plaintext if your policy permits. That keeps mail flowing but doesn’t enforce security.
  • For outbound: you can require TLS to specific domains (policy maps), or enforce MTA-STS/DANE where available.

Optimization trap: “We disabled TLS 1.2 because 1.3 is faster”

TLS 1.3 is great. You still need TLS 1.2 for compatibility with a chunk of SMTP infrastructure that hasn’t upgraded, including some managed appliances. Keep TLS 1.2 enabled unless you have a controlled environment and written exceptions.

SNI, name matching, and the “wrong certificate” trap

SNI (Server Name Indication) is how a client tells the server which hostname it wants during the TLS handshake. Without SNI, the server picks a default certificate. That’s fine for single-domain servers. It’s a disaster for multi-tenant mail gateways.

Where the expected name comes from

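SMTP clients typically connect to the MX hostname they got from DNS. Senders that verify (for example under MTA-STS or a per-domain policy) check the certificate against that name; purely opportunistic senders often skip verification, which is why a bad certificate can go unnoticed until a strict peer shows up. If your MX record points to mx1.example.net but your certificate only contains mail.example.net, you created your own outage. Fix naming first; don’t try to “configure around” DNS truth.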

Three name planes you must align

  • DNS: MX points to a hostname; A/AAAA maps it to IPs.
  • SMTP banner: what your server says in the 220 greeting. Not directly used for TLS validation, but it affects debugging and reputation signals.
  • Certificate SANs: what names the certificate is valid for. This is what matters for TLS name checks.

If you’re hosting multiple domains on one MX, you can still use a single certificate covering multiple hostnames via SANs, or use SNI-based selection if your MTA supports it cleanly. The “right” choice depends on operational complexity: SAN sprawl is ugly, but SNI misconfig is uglier.

MTA configuration patterns (Postfix, Exim) that don’t age badly

I’ll keep this practical: you want configs that are explicit, testable, and reloadable without surprises.

Postfix: inbound TLS essentials

  • Use a full chain file for smtpd_tls_cert_file (leaf + intermediates).
  • Point smtpd_tls_key_file to the matching private key.
  • Set a minimum protocol level that matches your risk posture; typically TLS 1.2.
  • Log enough to debug (smtpd_tls_loglevel), but not so much you DDoS your own disks.
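
The four points above map onto a handful of main.cf parameters. The sketch below applies them with postconf -e; the wrapper functions and file paths are placeholders, and the values are one reasonable baseline rather than the only correct one. It supports a dry run so you can read the commands before anything touches the config.

```sh
# Print commands instead of running them when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Hypothetical wrapper: $1 = full chain file (leaf + intermediates),
# $2 = matching private key. Paths are whatever your deployment uses.
apply_inbound_tls() {
  run postconf -e "smtpd_tls_cert_file = $1"
  run postconf -e "smtpd_tls_key_file = $2"
  # "may" = offer STARTTLS but still accept plaintext (opportunistic).
  run postconf -e "smtpd_tls_security_level = may"
  # Exclusion syntax; it leaves TLS 1.2 and 1.3 enabled.
  run postconf -e 'smtpd_tls_protocols = !SSLv2, !SSLv3, !TLSv1, !TLSv1.1'
  # Level 1 logs a summary line per TLS session.
  run postconf -e "smtpd_tls_loglevel = 1"
}

# Usage:
#   DRY_RUN=1 apply_inbound_tls /etc/ssl/mail/fullchain.pem /etc/ssl/mail/mail.key
#   apply_inbound_tls /etc/ssl/mail/fullchain.pem /etc/ssl/mail/mail.key && postfix reload
```

Newer Postfix releases also accept a ">=TLSv1.2" style for the protocol setting; the exclusion list is used here because it works on older versions too. After the reload, re-run the live probe rather than trusting the config file.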

Postfix: outbound TLS essentials

Outbound is where policy becomes real. You can do opportunistic TLS by default, and enforce TLS per-domain when you must.

  • Opportunistic: attempt STARTTLS when offered, but deliver without it if the peer doesn’t support it.
  • Enforced per-domain: require TLS to specific partners, and fail closed when it’s not available.
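
In Postfix, the per-domain half of this is a TLS policy table. The helper below is a sketch that emits one table line and refuses levels Postfix does not define; the function name, the table path, and the map type in the usage comments are assumptions about your setup.

```sh
# Emit one line for a Postfix TLS policy table.
#   $1 = destination domain, $2 = TLS security level for that destination
tls_policy_line() {
  case $2 in
    none|may|encrypt|dane|dane-only|fingerprint|verify|secure)
      printf '%s %s\n' "$1" "$2" ;;
    *)
      echo "unknown TLS level: $2" >&2
      return 1 ;;
  esac
}

# Usage (requires Postfix, so shown as comments; hash: is one common map
# type, use whichever your build supports):
#   tls_policy_line partner.example encrypt >> /etc/postfix/tls_policy
#   postmap /etc/postfix/tls_policy
#   postconf -e "smtp_tls_security_level = may"
#   postconf -e "smtp_tls_policy_maps = hash:/etc/postfix/tls_policy"
#   postfix reload
```

With this layout the global default stays opportunistic ("may") and only the listed partners fail closed. "encrypt" demands TLS without verifying the certificate; "verify" and "secure" add certificate checks, which is where a partner's broken chain will start to bite.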

Exim: same physics, different knobs

Exim will happily let you shoot yourself with TLS configuration too. The guiding principle is the same: present a complete chain, ensure name matching, keep protocol versions modern, and test both with and without SNI where relevant.

Three corporate mini-stories from the trenches

1) The outage caused by a wrong assumption

The company had a single mail gateway that handled inbound for multiple brands. DNS was clean, MX records pointed to mx.brand-a.example and mx.brand-b.example, both resolving to the same IP. The team renewed certificates through an ACME client, and everything “looked fine” from a browser check.

Monday morning, a subset of partners started deferring mail to brand B. Logs showed “TLS handshake failed” on the sender side, while the receiver saw “lost connection after STARTTLS” with no obvious certificate errors. The on-call engineer did what many do under pressure: toggled more logging and stared at it harder. Nothing.

The wrong assumption was subtle: they believed all SMTP clients send SNI the way modern web clients do. Many did. Some didn’t. Those clients were getting the default certificate—brand A’s cert—because the mail daemon selected the first configured cert in the absence of SNI.

The fix was not to beg partners to upgrade. The fix was to stop needing SNI for correctness. They issued a certificate with SANs covering both MX hostnames and configured the SMTP service to present that single cert. They also updated monitoring to test with and without SNI, because reality has corners.

Afterward, the team wrote down the rule that should have existed already: for public SMTP, SNI is a performance/organizational feature, not a correctness dependency—unless you’ve explicitly validated your peer ecosystem.

2) The optimization that backfired

A different org was proud of tightening security. They disabled TLS 1.2 on their submission service because “TLS 1.3 is more secure and faster.” This was deployed alongside a new load balancer and a shiny compliance report. Everyone slept well.

Two weeks later, they got a wave of user complaints: mobile mail clients failing to send email intermittently. Not all clients. Not all networks. Of course. The helpdesk blamed passwords and forced resets, which made the problem worse and the users angrier.

The root cause was compatibility: several enterprise-managed mobile clients still negotiated TLS 1.2 only. Those clients weren’t “wrong” so much as “slow to update,” which is basically the default state of corporate endpoint management. The load balancer logs showed handshake failures; the application logs were mostly silent.

The fix was to re-enable TLS 1.2 on submission and keep TLS 1.3 preferred. They also moved the “security posture change” process to include a compatibility canary: before disabling a protocol version, they’d measure who still uses it on the submission edge.

The lesson: security improvements that break email are not security improvements; they’re productivity attacks you launched on your own staff.

3) The boring but correct practice that saved the day

A mid-sized SaaS provider ran their mail relays in two data centers. Nothing exotic. They did two things religiously: keep certificate inventory in config management, and run daily synthetic probes that validated TLS handshakes the way real SMTP peers do.

One day a CA rotated an intermediate in a way that was perfectly legitimate. Their ACME client renewed the leaf, but the deployment pipeline accidentally shipped only the leaf certificate to one data center’s relays, with no intermediate. Mail arriving through that data center started getting deferred by stricter senders that verified the chain. The rest kept flowing, because the other data center was configured correctly.

Here’s why it didn’t become a “major incident”: their synthetic probe caught it within minutes because it ran openssl s_client -starttls smtp -showcerts and asserted the chain length and verification status. The alert wasn’t “mail queue high.” It was “TLS chain incomplete on mx2.” Specific, actionable, boring.

The rollback was immediate. No heroics. No Slack archaeology. Just a clean redeploy with the correct full chain file and a postmortem that focused on why the pipeline allowed a partial chain artifact.

Boring correctness is underrated until it’s the only thing between you and a week of “email is unreliable” reputation damage.

Common mistakes: symptom → root cause → fix

1) Symptom: “STARTTLS not offered”

  • Root cause: TLS disabled in MTA, wrong service (submission vs MX), or a proxy/load balancer stripping extensions.
  • Fix: Confirm with a raw EHLO probe. Enable STARTTLS on the correct listener. If a proxy is in front, ensure it passes SMTP transparently or terminates TLS correctly.

2) Symptom: “lost connection after STARTTLS” on your server logs

  • Root cause: peer aborted handshake due to untrusted chain, hostname mismatch, or protocol/cipher mismatch.
  • Fix: Run openssl s_client -starttls smtp -servername ... -showcerts from the peer side if possible; otherwise reproduce from an external host. Validate chain completeness and negotiated protocol.

3) Symptom: “verify error:num=20 unable to get local issuer certificate”

  • Root cause: missing intermediate(s) in what the server presents.
  • Fix: configure the service to present full chain (leaf + intermediates). Don’t assume clients will fetch AIA.

4) Symptom: “certificate has expired” but you renewed it

  • Root cause: service still using old file, reload not performed, wrong listener updated, or a load balancer terminates TLS with its own cert.
  • Fix: query the live service certificate and compare its serial/dates to disk. Reload the right daemon. Update the load balancer’s cert store if it terminates TLS.

5) Symptom: “wrong version number” in OpenSSL

  • Root cause: you connected with implicit TLS to a STARTTLS port (or vice versa), or a non-TLS service is on that port.
  • Fix: use -starttls smtp for ports 25/587 and plain openssl s_client -connect host:465 for 465.

6) Symptom: works from some senders, fails from others

  • Root cause: SNI dependency, chain presentation differences across nodes, or peers with different trust stores/protocol support.
  • Fix: test with and without SNI; test each MX node/IP; ensure consistent chain deployment across fleet.

7) Symptom: handshake failure only after a crypto library upgrade

  • Root cause: stricter defaults (minimum key size, signature algorithms, disabled legacy ciphers).
  • Fix: inspect negotiated signature/cipher, adjust policy intentionally, and document exceptions. Don’t blindly re-enable deprecated algorithms globally.

Joke 2: If you’re still enabling TLS 1.0 “for that one partner,” you’re basically adopting a pet tiger because it looked lonely.

Checklists / step-by-step plan

Incident checklist: get mail flowing without creating a security incident

  1. Reproduce from the failing vantage point. Same host/network if possible.
  2. Identify mode: STARTTLS (25/587) vs implicit TLS (465).
  3. Capture handshake evidence: protocol, cipher, presented chain, verify code.
  4. Validate naming: what hostname are peers connecting to (MX), and is it in SAN?
  5. Validate chain completeness: leaf + intermediates presented, ordered correctly.
  6. Check expiry and time sync: certificate dates, system clock, NTP status.
  7. Check SNI behavior: compare a probe with -servername against one with -noservername (recent OpenSSL sends SNI by default).
  8. Check fleet consistency: test each MX IP directly, not just the hostname.
  9. Only then adjust TLS policy: minimum protocols/ciphers, per-peer exceptions if business-critical.
  10. Roll out safely: reload services, confirm live cert, and re-run probes.
  11. Close the loop: drain queues, watch deferrals, confirm partner acceptance.
  12. Write the post-incident note: what failed, what detection missed, what you’ll automate.

Hardening checklist: keep compatibility without being reckless

  • Enable TLS 1.2 and TLS 1.3; disable TLS 1.0/1.1 on internet-facing services unless you have a narrowly scoped relay exception.
  • Prefer AEAD ciphers; avoid static RSA key exchange.
  • Use a certificate with SANs matching the MX hostname(s). Don’t rely on CN-only behavior.
  • Present a complete chain (leaf + intermediates). Test with a strict verifier.
  • Automate renewals and reloads, and verify live presentation after deploy.
  • Monitor from outside your network with STARTTLS probes, not just HTTPS checks.
  • Document where TLS is terminated (MTA vs load balancer). One termination point per flow is a sanity feature.
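
The monitoring item in the checklist above boils down to two assertions on a STARTTLS probe: verification returned 0, and the server presented at least a leaf plus one intermediate. A minimal sketch with a hypothetical function name; it reads s_client output on stdin so the network call stays outside the logic you want to test.

```sh
# Assert on s_client output (stdin).
#   $1 = minimum number of certificates the server must present (default 2)
assert_probe() {
  min_certs=${1:-2}
  probe_out=$(cat)
  # Count entries in the "Certificate chain" listing (" 0 s:...", " 1 s:...").
  cert_count=$(printf '%s\n' "$probe_out" | grep -Ec '^ *[0-9]+ s:' || true)
  if ! printf '%s\n' "$probe_out" | grep -Eq '^ *Verify return code: 0 \(ok\)'; then
    echo "FAIL: verification did not return 0 (ok)"
    return 1
  fi
  if [ "$cert_count" -lt "$min_certs" ]; then
    echo "FAIL: server presented $cert_count certificate(s), expected at least $min_certs"
    return 1
  fi
  echo "OK: $cert_count certificate(s) presented, verification ok"
}

# Usage (network access required, so shown as a comment):
#   openssl s_client -starttls smtp -connect 203.0.113.10:25 \
#     -servername mail.example.net </dev/null 2>&1 | assert_probe 2
```

Checking the certificate count as well as the verify code is the point: a probe host with the intermediate cached or installed locally can verify a leaf-only chain that a stricter peer would reject.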

Change-management plan: making TLS changes without surprise outages

  1. Collect telemetry: current negotiated protocol versions and ciphers (on submission, and if possible on inbound).
  2. Stage changes on one node first (canary MX), measure deferrals and handshake errors.
  3. Communicate partner-impacting changes ahead of time when enforcing TLS requirements.
  4. Roll fleet-wide with automated validation: probe each node/IP and verify chain and name match.
  5. Keep an emergency rollback path that does not involve enabling deprecated protocols globally.

FAQ

1) Why does OpenSSL verify OK but Postfix (or a partner) still fails?

Different trust stores, different defaults, and sometimes different SNI behavior. OpenSSL might use the OS CA bundle; your MTA might use a custom CA file or stricter policy maps.

2) Should I include the root CA in the chain I present?

Usually no. Present leaf + intermediates. Roots belong in client trust stores. Including the root rarely helps and sometimes confuses broken clients.

3) What’s the single most common chain mistake?

Deploying only the leaf certificate. SMTP clients often won’t fetch intermediates, so verification fails even though “the cert is valid” in a browser.

4) Can I fix handshake failures by allowing weaker ciphers?

Sometimes, but it’s often the wrong move. If a peer only supports weak ciphers, route that traffic through a controlled relay with scoped policy. Don’t weaken your main MX for the internet.

5) Why do some senders defer instead of delivering without TLS?

Some senders enforce TLS policies (per-domain requirements, MTA-STS, DANE, or internal compliance). They’d rather delay than risk plaintext delivery.

6) What does “no suitable signature algorithm” usually mean?

Signature algorithm negotiation failed—often because the peer is very old or your cert/key type (RSA-PSS/ECDSA) doesn’t match what it can validate. The fix is compatibility policy or upgrading the peer, not random cipher changes.

7) Is port 465 “wrong” for SMTP?

No. Port 465 is commonly used for implicit TLS submission. For server-to-server transport, port 25 with STARTTLS is still the norm. Use the right mode for the right audience.

8) How do I know if SNI is the problem?

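Probe once with -servername and once with -noservername (recent OpenSSL versions send SNI by default, so merely omitting -servername is not enough). If the certificate changes, you have SNI-based selection. Decide if you can depend on it, or move to a cert that covers all required names.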

9) What if my certificate covers the wrong hostname—can I just change the MX record?

You can, but do it deliberately. MX names are part of your mail identity and reputation footprint. Often the safer fix is to reissue the cert to match the existing MX and align DNS, banner, and SANs.

Conclusion: next steps that survive audits

SMTP TLS handshake failures are rarely “random”. They’re usually deterministic: wrong name, incomplete chain, incompatible protocol/cipher, or inconsistent deployment across nodes. The way out is to stop guessing and start interrogating the live service with strict probes.

Do these next:

  1. Run a STARTTLS probe with -showcerts and capture negotiated protocol/cipher and verify code.
  2. Repeat without SNI (-noservername) and directly against each MX IP to expose default-cert and drift issues.
  3. Fix the chain presentation (leaf + intermediates) and ensure SANs match the MX hostname(s).
  4. Keep TLS 1.2 + 1.3 enabled; avoid global downgrades for a single legacy peer—use scoped relays or per-destination policy.
  5. Automate validation: after every renewal and every TLS-related change, verify what the service presents on the wire.

If you do only one thing differently after reading this: stop trusting the certificate file on disk. Trust what the network sees.
