Ubuntu 24.04: Certificates renew but Nginx still serves the old one — why and how to fix

You ran certbot renew. It said “success.” Your monitoring still screams “certificate expires in 2 days.”
You check the filesystem and the new cert is right there. Yet browsers keep seeing the old one—sometimes from the same box.

This isn’t a mystery. It’s a set of very predictable failure modes: wrong path, wrong server block, no reload,
a load balancer doing TLS in front of you, or a stale process that never reopened files. On Ubuntu 24.04 the defaults
are sane, but the ecosystem around them (Snap, systemd, multiple Nginx instances, containers) makes it easy to be wrong with confidence.

What’s actually happening when “renewed” ≠ “served”

There are two separate systems at play:

  • Certificate issuance/renewal: an ACME client (usually Certbot, sometimes lego/acme.sh) writes new files to disk.
  • TLS serving: Nginx reads certificate files into memory when it loads configuration (start or reload). It does not magically “see” new bytes on disk.
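A minimal sketch of the hand-off, assuming a stock Certbot and a systemd-managed Nginx (adjust the unit name if yours differs):

cr0x@server:~$ sudo certbot renew
cr0x@server:~$ sudo nginx -t && sudo systemctl reload nginx

The first command changes bytes on disk; the second is what makes Nginx reopen them. Everything in this article is some variation of the second step not happening, or happening to the wrong process.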

So the failure mode is straightforward: renewal succeeded, but the serving stack didn’t switch over.
The tricky part is that “the serving stack” might not be the process you think it is. It might be:
Nginx on the host, Nginx in a container, Nginx in a chroot, a second Nginx instance bound to 443,
a reverse proxy upstream, or a cloud load balancer terminating TLS before traffic even reaches Nginx.

One more nuance: sometimes Nginx is reloaded correctly, but it still serves the “wrong” certificate because SNI selection picks a different server block than you expect.
This is common when you have a default server, wildcard certificates, or a catch-all block left behind by an automation tool.

Your goal is not “renew the cert.” That’s the easy part. Your goal is: prove which process answers TLS for a given hostname, and prove which files it loaded.

Joke #1: Certificates are like milk—freshness matters, but nobody wants to be the person sniffing them in production at 2 a.m.

Fast diagnosis playbook (first/second/third)

First: confirm what clients actually see

  1. From a machine outside your network (or using a public probe), check the presented cert for the exact hostname.
  2. Record issuer, serial, and notBefore/notAfter dates.
  3. If it’s old, don’t touch Certbot yet. The server is serving old bytes; renewal might already be fine.

Second: identify who terminates TLS

  1. Is port 443 on the host even Nginx?
  2. Is there a load balancer, CDN, WAF, or ingress controller doing TLS before Nginx?
  3. Is there more than one Nginx (host + container)?

Third: correlate Nginx config → certificate path → on-disk file → loaded process

  1. Find the server block that matches the SNI hostname.
  2. Verify the ssl_certificate and ssl_certificate_key paths.
  3. Check if they are Let’s Encrypt live/ symlinks or pinned to archive/.
  4. Reload Nginx and re-check from the outside.
  5. If reload doesn’t change what’s served, you’re reloading the wrong Nginx or the wrong config.

That’s the bottleneck-hunting order. Start from what users see, then trace inward. Don’t start by tailing Certbot logs.
That’s how you spend an hour proving the wrong thing.

Interesting facts and context (why this problem keeps happening)

  • Fact 1: Nginx does not automatically re-read certificate files on disk. It loads them at startup and on reload (graceful config reload).
  • Fact 2: Let’s Encrypt certificates are short-lived (typically 90 days), by design, to reduce the blast radius of key compromise.
  • Fact 3: Certbot maintains two directories: /etc/letsencrypt/archive (versioned files) and /etc/letsencrypt/live (symlinks to “current”).
  • Fact 4: A surprisingly common outage is “renew succeeded, but reload failed.” Renewal log says fine; Nginx kept serving the old cert for days.
  • Fact 5: In the early TLS era, some servers required full restarts for cert changes; graceful reload became a practical reliability feature, not just a convenience.
  • Fact 6: SNI (Server Name Indication) is how one IP serves multiple certificates. If SNI selection lands in the wrong server block, you get the wrong cert even though the right cert exists.
  • Fact 7: Ubuntu’s adoption of Snap-packaged Certbot changed file locations for logs and sometimes changed how renewal hooks run, especially when mixing apt and snap installs.
  • Fact 8: OCSP stapling can make debugging confusing: clients might cache or display stapling status separately from the certificate chain, so “it looks wrong” isn’t always about the cert file.

The core causes: where the old cert really comes from

1) Nginx never reloaded (or reload failed)

Certbot can renew without restarting anything. Nginx keeps running, happily serving whatever it loaded last week.
If you have a deploy hook that should reload Nginx, it might not be configured, might not run, or might be failing.
Even worse: nginx -s reload might succeed but you’re reloading a different binary or different instance than the one bound to 443.
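One low-drama way to wire the reload in, assuming the standard Certbot renewal-hooks directory (present on both Snap and apt installs; verify the path on custom setups):

cr0x@server:~$ sudo tee /etc/letsencrypt/renewal-hooks/deploy/reload-nginx.sh >/dev/null <<'EOF'
#!/bin/sh
# Runs after every successful renewal. Validate before reloading:
# a failed validation means Nginx would keep the old config anyway.
nginx -t && systemctl reload nginx
EOF
cr0x@server:~$ sudo chmod +x /etc/letsencrypt/renewal-hooks/deploy/reload-nginx.sh

Certbot executes everything runnable in that directory after each successful renewal, so new certificates inherit the hook without per-domain configuration.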

2) Nginx is pointing at the wrong file path

The correct pattern is to reference /etc/letsencrypt/live/yourname/fullchain.pem and privkey.pem.
If someone hardcoded /etc/letsencrypt/archive/yourname/fullchain2.pem, it will never advance automatically.
It will work right up until it doesn’t. This is the “works until next rotation” classic.
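For reference, the shape you want in the server block; the certificate name here (example.com) is whatever certbot certificates reports for your lineage:

    server {
        listen 443 ssl;
        server_name example.com www.example.com;

        # live/ symlinks are repointed by Certbot on every renewal;
        # archive/ files are frozen versions and will silently go stale.
        ssl_certificate     /etc/letsencrypt/live/example.com/fullchain.pem;
        ssl_certificate_key /etc/letsencrypt/live/example.com/privkey.pem;
    }

Once this is in place, renewals only ever need a reload, never a config edit.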

3) Wrong server block selected (SNI mismatch)

You might have the right certificate in one server block, and a default server block with a stale certificate.
Clients hitting example.com get the right cert; clients hitting www.example.com or an API hostname get the default.
Or your health checker uses an IP address and no SNI, so it always hits default. You then “fix” the wrong thing.
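One hedged way to make the catch-all deliberate instead of accidental, assuming Nginx 1.19.4 or newer (Ubuntu 24.04 ships far newer):

    # Explicit default: refuse TLS for hostnames you don't serve,
    # instead of presenting whatever stale certificate was left behind.
    server {
        listen 443 ssl default_server;
        server_name _;
        ssl_reject_handshake on;
    }

With ssl_reject_handshake, the default server needs no certificate at all, which removes the "stale default cert" failure mode entirely. Just make sure health checks that don't send SNI aren't pointed at this block.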

4) TLS is terminated elsewhere

If you use a cloud load balancer, CDN, reverse proxy, or Kubernetes ingress, Nginx might not serve TLS at all.
Renewal on the backend is irrelevant; the front door still has the old certificate.
The most expensive outages are the ones where everyone is debugging the wrong layer with great enthusiasm.

5) Multiple Nginx instances, containers, or ports

Ubuntu 24.04 is perfectly capable of running:
Nginx on the host (systemd), Nginx in Docker (compose), and an ingress controller (Kubernetes).
If two processes are listening on 443 on different network interfaces (or one on host network, one behind NAT), you can “reload” the wrong one all day.

6) Permissions or key format changes

Certbot might renew but write keys with permissions that Nginx cannot read (less common with standard Let’s Encrypt paths, more common with custom hooks or copied files).
Or you rotated from RSA to ECDSA, changed filenames, and forgot to update the Nginx config. Nginx reload then fails and keeps the old config in memory.
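A quick way to walk the whole path's permissions, assuming default Let's Encrypt locations (namei ships with util-linux on Ubuntu):

cr0x@server:~$ sudo namei -l /etc/letsencrypt/live/example.com/privkey.pem

Each path component is printed with its owner and mode, so a root-only directory or a key copied with the wrong owner stands out immediately. Note that Nginx's master process reads the key as root before dropping privileges, so root-only permissions on the standard paths are fine; problems usually come from hooks that copy keys elsewhere.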

Practical tasks: commands, outputs, and decisions (12+)

These are the checks I actually run when production says “old certificate.” Each task includes:
the command, what the output means, and the decision you make from it.

Task 1: Check the served certificate from the outside (SNI on)

cr0x@server:~$ openssl s_client -connect example.com:443 -servername example.com -showcerts </dev/null 2>/dev/null | openssl x509 -noout -subject -issuer -dates -serial
subject=CN = example.com
issuer=C = US, O = Let's Encrypt, CN = R11
notBefore=Oct  1 18:22:10 2025 GMT
notAfter=Dec 30 18:22:09 2025 GMT
serial=03A1B2C3D4E5F6

Meaning: This is what clients see. The notAfter date tells you if it’s old.

Decision: If this is still the old expiration, proceed. If it’s new, your monitoring might be checking a different hostname or endpoint.

Task 2: Check served certificate without SNI (default server behavior)

cr0x@server:~$ openssl s_client -connect example.com:443 -showcerts </dev/null 2>/dev/null | openssl x509 -noout -subject -dates
subject=CN = default.invalid
notBefore=Sep  2 10:00:00 2025 GMT
notAfter=Dec  1 10:00:00 2025 GMT

Meaning: No SNI means Nginx picks the default server for that IP:port.

Decision: If the default cert is stale, fix the default server block too, or fix clients/health checks to send SNI.

Task 3: Confirm which process is listening on 443

cr0x@server:~$ sudo ss -ltnp | grep ':443 '
LISTEN 0      511          0.0.0.0:443       0.0.0.0:*    users:(("nginx",pid=2147,fd=6),("nginx",pid=2146,fd=6))

Meaning: You see which binary owns the socket.

Decision: If it’s not Nginx (or not the one you think), stop and chase that process. Don’t reload the wrong thing.

Task 4: Ensure you’re reloading the same Nginx that’s running

cr0x@server:~$ ps -fp 2147
UID          PID    PPID  C STIME TTY          TIME CMD
root        2147       1  0 Dec28 ?        00:00:02 nginx: master process /usr/sbin/nginx -g daemon on; master_process on;

Meaning: Confirms the path (/usr/sbin/nginx) and that systemd likely owns it.

Decision: Use systemctl reload nginx for this instance; don’t call some other nginx binary from a container image.

Task 5: Validate Nginx configuration before reloading

cr0x@server:~$ sudo nginx -t
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful

Meaning: Reload will likely succeed. If this fails, Nginx will refuse to load new config/cert paths.

Decision: Fix config errors first. If nginx -t fails, “reload” won’t update certificates.

Task 6: Reload Nginx and confirm systemd accepted it

cr0x@server:~$ sudo systemctl reload nginx
cr0x@server:~$ systemctl status nginx --no-pager -l
● nginx.service - A high performance web server and a reverse proxy server
     Loaded: loaded (/usr/lib/systemd/system/nginx.service; enabled; preset: enabled)
     Active: active (running) since Sun 2025-12-28 09:10:12 UTC; 1 day 3h ago
       Docs: man:nginx(8)
    Process: 33910 ExecReload=/usr/sbin/nginx -g 'daemon on; master_process on;' -s reload (code=exited, status=0/SUCCESS)

Meaning: Reload ran and exited successfully.

Decision: Re-check the served certificate externally. If unchanged, you’re looking at the wrong TLS terminator or wrong server block.

Task 7: Find which server block matches your hostname

cr0x@server:~$ sudo nginx -T 2>/dev/null | grep -nE 'server_name|listen 443|ssl_certificate'
415:    listen 443 ssl http2;
416:    server_name example.com www.example.com;
432:    ssl_certificate /etc/letsencrypt/live/example.com/fullchain.pem;
433:    ssl_certificate_key /etc/letsencrypt/live/example.com/privkey.pem;
601:    listen 443 ssl;
602:    server_name _;
612:    ssl_certificate /etc/ssl/certs/old-default.pem;
613:    ssl_certificate_key /etc/ssl/private/old-default.key;

Meaning: You’ve got at least two TLS server blocks: one correct, one default with an old certificate.

Decision: Update the default block (or remove it) if clients might hit it. Also ensure each hostname is covered with explicit server_name.

Task 8: Verify the live/ symlinks actually point to the newest files

cr0x@server:~$ sudo ls -l /etc/letsencrypt/live/example.com/
total 4
-rw-r--r-- 1 root root 692 Sep  2 10:00 README
lrwxrwxrwx 1 root root  42 Dec 28 09:05 cert.pem -> ../../archive/example.com/cert12.pem
lrwxrwxrwx 1 root root  43 Dec 28 09:05 chain.pem -> ../../archive/example.com/chain12.pem
lrwxrwxrwx 1 root root  47 Dec 28 09:05 fullchain.pem -> ../../archive/example.com/fullchain12.pem
lrwxrwxrwx 1 root root  45 Dec 28 09:05 privkey.pem -> ../../archive/example.com/privkey12.pem

Meaning: Nginx should reference these symlinks. Certbot rotates the targets.

Decision: If Nginx points to archive/ directly, change it to live/ so future renewals don’t require manual edits.

Task 9: Inspect on-disk cert dates (what renewal actually produced)

cr0x@server:~$ sudo openssl x509 -in /etc/letsencrypt/live/example.com/fullchain.pem -noout -dates -issuer -subject
notBefore=Dec 28 09:05:11 2025 GMT
notAfter=Mar 28 09:05:10 2026 GMT
issuer=C = US, O = Let's Encrypt, CN = R11
subject=CN = example.com

Meaning: The file on disk is new.

Decision: If disk is new but served is old, the issue is reload/SNI/TLS terminator, not renewal.

Task 10: Check Certbot renewal history and whether a deploy hook ran

cr0x@server:~$ sudo grep -iE 'deploying|deploy-hook' /var/log/letsencrypt/letsencrypt.log | tail -n 3
2025-12-28 09:05:12,345:INFO:Deploying certificate to /etc/letsencrypt/live/example.com/fullchain.pem
2025-12-28 09:05:12,346:INFO:Deploying key to /etc/letsencrypt/live/example.com/privkey.pem
2025-12-28 09:05:12,900:INFO:Running deploy-hook command: systemctl reload nginx

Meaning: Certbot did renewal and attempted reload.

Decision: If there’s no deploy-hook line, add one. If it exists, verify it succeeded in the system journal.
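To check the hook's fate in the journal, assuming Snap's unit name from Task 11 (apt installs typically use certbot.service):

cr0x@server:~$ sudo journalctl -u snap.certbot.renew.service --since "3 days ago" --no-pager | tail -n 20

A deploy hook that exited non-zero shows up here even though the renewal itself logged success, which is exactly the "renewed but not served" trap.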

Task 11: Verify the Certbot timer (Ubuntu 24.04 systemd scheduling)

cr0x@server:~$ systemctl list-timers --all | grep -E 'certbot|letsencrypt'
Sun 2025-12-29 06:18:00 UTC  10h left  Sat 2025-12-28 06:18:01 UTC  13h ago  snap.certbot.renew.timer  snap.certbot.renew.service

Meaning: Renewal is on a schedule (here via Snap’s unit).

Decision: If no timer exists, renewal won’t happen automatically. If timer exists but you used apt certbot, you may have two competing setups—pick one.

Task 12: Confirm which Certbot you’re using (Snap vs apt) to avoid split-brain

cr0x@server:~$ which certbot
/snap/bin/certbot
cr0x@server:~$ snap list certbot
Name    Version   Rev  Tracking       Publisher  Notes
certbot 2.11.0    3741 latest/stable  certbot-eff✓  classic

Meaning: You’re using Snap Certbot.

Decision: Make sure any hooks, logs, and expectations match the Snap packaging. Avoid mixing with an apt-installed Certbot unless you enjoy forensic archaeology.

Task 13: Find out if you’re behind a load balancer or proxy doing TLS

cr0x@server:~$ curl -sI https://example.com | grep -iE 'server:|via:|x-forwarded|cf-|x-amz'
server: cloud-proxy
via: 1.1 edge-gw
x-forwarded-proto: https

Meaning: Headers suggest a proxy layer. Not proof, but a strong smell.

Decision: Check your DNS and network path. If TLS is terminated upstream, update the certificate there, not on Nginx.
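A follow-up that often settles the question, assuming standard DNS tooling: a CNAME pointing at a CDN or load-balancer hostname means TLS almost certainly terminates there, not on your box.

cr0x@server:~$ dig +short CNAME www.example.com
cr0x@server:~$ dig +short A example.com

If the CNAME target is a provider domain you didn't configure Nginx on, rotate the certificate in that provider's dashboard; your Nginx renewals are irrelevant to what users see.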

Task 14: Check Nginx error logs for reload/cert read failures

cr0x@server:~$ sudo tail -n 30 /var/log/nginx/error.log
2025/12/28 09:05:13 [notice] 2147#2147: signal process started
2025/12/28 09:05:13 [emerg] 2147#2147: cannot load certificate "/etc/letsencrypt/live/example.com/fullchain.pem": BIO_new_file() failed (SSL: error:80000002:system library::No such file or directory)

Meaning: Reload attempted but failed to load the file. Nginx will keep the old config active.

Decision: Fix the file path, permissions, or SELinux/AppArmor constraints (AppArmor is more common here on Ubuntu). Then reload again.

Task 15: Prove which certificate file the running Nginx worker has open

cr0x@server:~$ sudo lsof -p 2147 | grep -E 'fullchain|cert\.pem|privkey' | head
nginx 2147 root  mem REG  252,0   4231  131072 /etc/letsencrypt/live/example.com/fullchain.pem
nginx 2147 root  mem REG  252,0   1704  131073 /etc/letsencrypt/live/example.com/privkey.pem

Meaning: This is the closest you’ll get to “what Nginx loaded” from the OS view. One caveat: Nginx typically closes certificate files after parsing them at load time, so an empty result here is common and proves nothing; treat a hit as corroborating evidence, not a requirement.

Decision: If it’s still pointing to an old archive file or a different domain, fix config and reload. If it’s correct, but clients see old, TLS termination is elsewhere.

Task 16: Confirm your hostname resolves to the box you’re debugging

cr0x@server:~$ dig +short A example.com
203.0.113.10
cr0x@server:~$ ip -brief address show | awk '{print $1,$3}'
lo 127.0.0.1/8
eth0 203.0.113.10/24

Meaning: DNS A record matches the host IP.

Decision: If DNS points elsewhere (or you’re behind anycast/LB), you’re debugging the wrong machine. This happens more than people admit.

Three corporate-world mini-stories from the trenches

Mini-story 1: The incident caused by a wrong assumption

A mid-size company had a “simple” setup: one VM, one Nginx, one domain. Or so everyone believed.
They renewed a Let’s Encrypt certificate on Ubuntu, reloaded Nginx, and still got alerts that the cert was expiring.
The on-call engineer assumed the monitoring was wrong because certbot certificates showed a fresh expiration.

The browser on their laptop showed the old certificate, too. Then it got weirder: a coworker on a different ISP saw the new one.
That difference sparked the first correct question: “Are we behind an edge network?”
It turned out marketing had recently enabled a managed CDN/WAF feature in the DNS provider’s dashboard.
TLS was being terminated at the edge with an uploaded certificate that nobody had rotated.

The “wrong assumption” wasn’t technical incompetence; it was organizational. Everyone assumed the architecture diagram from last year was still accurate.
In practice, the system evolved without a single pull request touching Nginx.
Renewing the backend certificate did nothing because the backend was no longer the TLS endpoint.

The fix was boring: rotate the certificate where TLS terminated, document it, and add a probe that checks the certificate from multiple networks.
The prevention was even more boring: a weekly “what changed in our edge settings” review.
Nobody loves it, but it beats learning about architecture changes from a pager.

Mini-story 2: The optimization that backfired

Another team wanted to “reduce reloads” on a busy Nginx fleet. They had a shared belief that reloading Nginx too often would cause latency spikes.
They disabled automatic reload hooks after renewal and planned to reload during a weekly maintenance window.
The idea sounded reasonable: fewer config reloads, fewer moving parts.

Two months later, they hit the predictable edge: renewal succeeded several times, but the running processes never reloaded.
Some servers had been restarted for unrelated reasons and served fresh certs; others hadn’t and served stale certs.
The fleet became a patchwork of expiration dates. Monitoring went from a clean signal to noise.

The operational pain came from the ambiguity. When one hostname started failing TLS handshakes for a subset of users, engineers couldn’t tell if it was a network issue,
a specific node, or a client cache problem. They spent hours collecting evidence that all pointed to one sentence: “our optimization created state drift.”

They reversed the change: restore deploy hooks, keep reloads automatic, and measure impact rather than assuming it.
Properly executed, Nginx reloads are graceful. They don’t drop connections; they spin up new workers and drain old ones.
The backfire wasn’t that reloads are inherently dangerous. The backfire was treating operational consistency as optional.

Mini-story 3: The boring but correct practice that saved the day

A regulated enterprise ran Nginx on Ubuntu with a strict change process. Everyone complained about the checklists.
But they had one practice that looked like bureaucracy and acted like insurance: a post-renewal verification job.
After every renewal, a job ran from outside the network and recorded the served certificate serial number and expiry in a central log.

One morning the job flagged a single VIP hostname still serving an old certificate. Everything else looked healthy.
The team didn’t panic; they had a playbook. First check: SNI. Second check: default server. Third check: load balancer listeners.
They found a legacy listener on the load balancer that was still using an uploaded certificate instead of the managed one.

The fix took minutes because the evidence was immediate and specific: “this listener presents serial X; expected serial Y.”
No debate, no screenshot archaeology, no “works on my laptop.”
A change request got approved quickly because it was a straightforward substitution, not exploratory surgery.

The moral is aggressively unromantic: verify what users see, automatically, after every rotation.
It’s the kind of practice that makes engineers feel underappreciated—right up until it prevents an outage.

Common mistakes: symptom → root cause → fix

1) “Certbot says renewed, but browsers still show the old expiry”

Root cause: Nginx didn’t reload, or reload failed.

Fix: Run sudo nginx -t; then sudo systemctl reload nginx. Check systemctl status nginx and /var/log/nginx/error.log. Add a deploy hook so this happens automatically.

2) “The cert on disk is new, but served is old—only for some hostnames”

Root cause: SNI/server block mismatch. A default server block or another server_name catches traffic.

Fix: Use openssl s_client -servername for each hostname. Inspect nginx -T for listen 443 and server_name. Ensure each hostname maps to the correct cert.
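A sweep over every hostname at once makes the mismatch obvious; the host list is yours to substitute:

cr0x@server:~$ for h in example.com www.example.com api.example.com; do printf '%s: ' "$h"; openssl s_client -connect "$h:443" -servername "$h" </dev/null 2>/dev/null | openssl x509 -noout -enddate; done
example.com: notAfter=Mar 28 09:05:10 2026 GMT
www.example.com: notAfter=Dec 30 18:22:09 2025 GMT
api.example.com: notAfter=Mar 28 09:05:10 2026 GMT

The one stale line points at the exact server block (or missing server_name) to fix.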

3) “Health checks say the cert is old, but normal users are fine”

Root cause: Health checker doesn’t send SNI and hits the default server/certificate.

Fix: Configure the checker to use the hostname (SNI) or fix the default server’s certificate to a valid one for that IP endpoint.

4) “Reload succeeds but nothing changes”

Root cause: You reloaded a different Nginx than the one serving 443, or TLS terminates upstream.

Fix: Confirm socket ownership with ss -ltnp. Confirm DNS points to this host. If a load balancer/CDN exists, rotate certificate there.

5) “After renewal, Nginx won’t reload; it keeps serving the old cert”

Root cause: The new files are missing/unreadable due to permissions, path typos, or incorrect file references (e.g., referencing chain.pem instead of fullchain.pem in some setups).

Fix: Check /var/log/nginx/error.log for cannot load certificate. Fix paths, permissions, and reference fullchain.pem. Re-run nginx -t.

6) “Only some clients see the old cert; others see the new one”

Root cause: Multiple edge nodes / multiple A records / anycast / load balancer pool mismatch.

Fix: Resolve DNS from multiple places; check each backend node individually; ensure renewal and reload happen everywhere; fix drift.
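For the multi-resolver check, assuming the public resolvers below are reachable from where you run it:

cr0x@server:~$ for r in 1.1.1.1 8.8.8.8 9.9.9.9; do echo "@$r: $(dig @$r +short A example.com | tr '\n' ' ')"; done

If resolvers disagree, or you get several A records back, verify the served certificate against every address individually; one drifted backend is enough to page you intermittently.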

7) “We updated Nginx config to new cert path, but it reverted later”

Root cause: Configuration management or Certbot’s Nginx installer overwrote files.

Fix: Manage TLS directives in your source-of-truth (Ansible, Puppet, etc.) and avoid manual edits to generated snippets without ownership.

Joke #2: Nothing builds team alignment like a certificate expiring—suddenly everyone remembers who owns DNS.

Checklists / step-by-step plan

Step-by-step: fix the “renewed but still old” situation safely

  1. Prove the symptom externally. Use openssl s_client -servername and record expiry + serial.
  2. Prove who owns :443. Use ss -ltnp. If it’s not Nginx, stop and route the ticket correctly.
  3. Prove DNS points here. Use dig +short and compare to local IPs.
  4. Inspect Nginx config selection. Use nginx -T, find the correct server_name, and identify ssl_certificate paths.
  5. Ensure you reference Let’s Encrypt live/ symlinks. Fix any archive/ hardcoding.
  6. Validate config. Run nginx -t. No validation, no reload.
  7. Reload Nginx. Use systemctl reload nginx (or your service manager) and confirm success in systemctl status.
  8. Re-check externally. Same command, same hostname, confirm serial/expiry changed.
  9. If unchanged, hunt upstream termination. Check LB/CDN listeners and where certificates are managed.
  10. Automate the next one. Add a deploy hook and a post-renew external verification check.
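For step 10, a minimal verification sketch, assuming bash, openssl, and a host list you maintain; alert wiring is left to you:

#!/bin/bash
# post-renew-probe.sh -- illustrative sketch, not a drop-in tool.
# Records serial + expiry as served to a real client (with SNI) and
# flags anything expiring within 14 days.
set -u
for h in example.com www.example.com; do
    cert=$(openssl s_client -connect "$h:443" -servername "$h" </dev/null 2>/dev/null)
    echo "$cert" | openssl x509 -noout -serial -enddate | sed "s/^/$h /"
    if ! echo "$cert" | openssl x509 -noout -checkend $((14 * 86400)) >/dev/null; then
        echo "ALERT: $h serves a certificate expiring within 14 days"
    fi
done

Run it from outside your network where possible, and run it after every renewal; the entire point is to measure what real clients see rather than what the disk contains.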

Checklist: what “good” looks like on Ubuntu 24.04

  • Nginx TLS config references /etc/letsencrypt/live/<name>/fullchain.pem and privkey.pem.
  • Exactly one renewal pipeline exists (Snap Certbot or apt-based tooling), not both.
  • A systemd timer runs renewal regularly, and logs show deploy hooks executing.
  • Nginx reload is tested (nginx -t) and monitored for failures.
  • An external probe verifies the served certificate after renewal and alerts on mismatch.

Checklist: avoid these “clever” moves

  • Don’t point Nginx at /etc/letsencrypt/archive files unless you enjoy manual rotations.
  • Don’t “optimize” by removing reload hooks. Drift will find you.
  • Don’t assume the VM serves TLS just because it runs Nginx.
  • Don’t debug using only a browser screenshot. Use openssl and capture serial numbers.

FAQ

1) Why doesn’t Nginx automatically pick up renewed certificates?

Because Nginx reads certificate and key files when it loads configuration. Renewal changes the files on disk, not Nginx’s in-memory state.
You need a reload (or restart) to reopen and parse the new files.

2) Is systemctl reload nginx safe in production?

Usually, yes. A reload is designed to be graceful: new workers start with new config, old workers drain.
Still run nginx -t first; a broken config turns “safe” into “self-inflicted incident.”

3) What’s the difference between fullchain.pem and cert.pem?

cert.pem is just the leaf certificate. fullchain.pem includes the leaf plus intermediates.
Many clients require the intermediates to build trust reliably, so fullchain.pem is the standard Nginx choice.
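You can see the difference on disk with one command, assuming the standard Certbot layout:

cr0x@server:~$ sudo grep -c 'BEGIN CERTIFICATE' /etc/letsencrypt/live/example.com/cert.pem /etc/letsencrypt/live/example.com/fullchain.pem
/etc/letsencrypt/live/example.com/cert.pem:1
/etc/letsencrypt/live/example.com/fullchain.pem:2

One PEM block is the leaf; the extras in fullchain.pem are the intermediates clients need to build the chain.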

4) Why do I see the new cert on disk but an old cert over the network?

Either Nginx didn’t reload, Nginx is pointing at different files than you inspected, or the TLS endpoint isn’t Nginx at all (load balancer/CDN).
Treat it as a routing/ownership problem until proven otherwise.

5) My monitoring says “certificate expires soon,” but openssl s_client shows a fresh cert. Who’s lying?

Possibly neither. Monitoring might be checking a different hostname, checking without SNI, checking an old IP, or checking a different region/edge.
Update the check to send SNI and validate the exact hostname users hit.

6) Can I just restart Nginx instead of reloading?

You can, but reload is usually the better habit: less disruption and fewer surprise side effects.
Restart is a blunt tool; use it when reload is blocked (for example, wedged workers) or when your process manager requires it.

7) What if I’m using a load balancer that terminates TLS?

Then renewing the backend certificate won’t affect what users see. Rotate the certificate on the load balancer (or enable managed certificates there).
You can still keep backend TLS for defense-in-depth, but don’t confuse it with the public certificate.

8) How do I prevent this from happening again?

Two things: (1) a deploy hook that reloads Nginx after successful renewal, and (2) an external verification check that confirms the served serial/expiry.
Automation without verification is just fast failure.

9) What’s the one signal you trust when debugging?

The certificate actually served to a real client, measured with SNI using openssl s_client (or an equivalent probe) and recorded with serial + expiry.
Everything else is supporting evidence.

Conclusion: next steps that prevent the rerun

This problem is rarely “Let’s Encrypt failed.” It’s almost always “the serving layer didn’t switch.” Your debugging should reflect that.
Start from the outside, prove what’s served, prove who serves it, then align config paths and reload behavior.

The operational principle is well captured by a paraphrased idea often attributed to Ronald Coase: if you torture the data long enough, it will confess.
In ops terms: don’t torture logs and assumptions. Measure the live endpoint, then work backward.

Practical next steps:

  • Standardize Nginx to reference /etc/letsencrypt/live/.../fullchain.pem and privkey.pem, not archive/.
  • Make nginx -t + systemctl reload nginx the post-renew action via a deploy hook.
  • Add an external probe that records the served certificate serial and expiry per hostname after renewal.
  • Audit your TLS termination points: VM, container ingress, load balancer, CDN. Put an owner next to each one.