Docker + TLS: Let’s Encrypt inside/outside containers — pick the safe pattern

At 02:13, your on-call phone vibrates with the specific dread of “customers can’t log in.” You open the dashboard and see it: TLS handshake failures, a cascade of 525/526 errors, and a certificate that expired… yesterday. The service is fine. The certificate pipeline is not. That’s how “simple HTTPS” becomes a production incident.

Docker didn’t cause this. But Docker made it easier to hide the sharp edges: private keys in ephemeral filesystems, renewals running in the wrong namespace, ACME challenges stuck behind a proxy, and a “just mount /etc/letsencrypt somewhere” approach that turns into a permissions and rotation circus.

The decision: where Let’s Encrypt belongs

You have two broad options:

  1. ACME client runs on the host (or on a dedicated “certificate” VM/host namespace) and writes certificates into a controlled location; containers consume them read-only.
  2. ACME client runs inside a container (Certbot, lego, Traefik, Caddy, nginx-proxy companion, etc.) and writes certs to a volume that other containers read.

If you run production systems and you care about blast radius, the safe default is: terminate TLS at a dedicated reverse proxy container, and run certificate issuance/renewal in that same boundary (proxy-native ACME like Traefik/Caddy) or on the host with tight file permissions. Everything else gets plain HTTP on an internal network.

What you should avoid is the “every app container manages its own Let’s Encrypt” pattern. It looks modular. It’s actually a denial-of-service generator (rate limits), an observability nightmare, and a gift to anyone who wants your private keys scattered across writable volumes.

Here’s the opinionated rule: centralize certificates per ingress point. If the public Internet hits one place, that place owns TLS. Your app containers should not need to know what ACME is.

Facts and history that change how you operate

  • Let’s Encrypt launched in 2015 and made automation the expectation, not a luxury. The operational bar moved: “we renew by calendar reminder” stopped being acceptable.
  • ACME became an IETF standard (RFC 8555) in 2019. That matters because clients are replaceable; the workflow is not a vendor trick.
  • Certificates are short-lived by design (90 days for Let’s Encrypt). This is not stinginess; it reduces the damage window if a private key leaks.
  • Rate limits are part of the security model. They also punish “retry loops” when your deployment keeps failing challenges every minute.
  • HTTP-01 validation requires port 80 reachability for the domain. If you force HTTPS-only without a well-planned exception, you will break issuance at the worst time.
  • DNS-01 validation does not require inbound ports. It’s the go-to in locked-down environments, but it shifts your risk to DNS API credentials.
  • Wildcard certificates require DNS-01 in Let’s Encrypt. If your plan depends on wildcards, you’ve already chosen your challenge type.
  • TLS termination is also a trust boundary. It’s where you decide cipher suites, HSTS, client cert auth, and where private keys live.
  • Reloading a server is not the same as restarting a container. Some proxies can hot-reload certs; some need a restart; some need a signal; some need an API call.

One paraphrased idea from Google’s SRE book team (Beyer, Jones, Petoff, Murphy): Hope is not a strategy; reliability comes from engineered systems and feedback loops. That applies painfully well to certificate renewals.

Three patterns, ranked by safety

Pattern A (best default): reverse proxy owns TLS + ACME, apps stay HTTP-only

This is the “ingress is a product” approach. You run Traefik or Caddy (or nginx with a companion) as the only public-facing container. It requests certificates, stores them, renews them, and serves them. App containers never touch private keys.

Why it’s safe:

  • One place to harden and observe.
  • One place to reload certificates correctly.
  • Apps can be scaled/redeployed without touching TLS state.
  • Rate limits are easier to stay under.

Where it bites you: you must protect the proxy’s ACME storage (private keys and account keys). If your proxy container is compromised, that’s your certificate authority identity for that domain set.
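
A minimal sketch of this shape, assuming Traefik v3 with HTTP-01, a resolver named le, and placeholder domains, emails, and paths; adjust names and ports to your environment:

services:
  edge-proxy:
    image: traefik:v3.1
    command:
      - --entrypoints.web.address=:80
      - --entrypoints.websecure.address=:443
      - --providers.docker=true
      - --providers.docker.exposedbydefault=false
      # ACME: HTTP-01 on port 80, state persisted outside the container
      - --certificatesresolvers.le.acme.email=ops@example.com
      - --certificatesresolvers.le.acme.storage=/acme/acme.json
      - --certificatesresolvers.le.acme.httpchallenge.entrypoint=web
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./acme:/acme                                  # ACME account key + certs: stateful, back it up
      - /var/run/docker.sock:/var/run/docker.sock:ro  # read-only Docker provider
  api:
    image: myco/api:1.9.2
    labels:
      - traefik.enable=true
      - traefik.http.routers.api.rule=Host(`example.com`)
      - traefik.http.routers.api.entrypoints=websecure
      - traefik.http.routers.api.tls.certresolver=le
      - traefik.http.services.api.loadbalancer.server.port=9000
    # no published ports: the app speaks plain HTTP on the internal network only

The app never touches a private key; only ./acme on the host holds certificate state, which is exactly the blast radius you want.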

Pattern B (very good): ACME client on host, certs bind-mounted read-only into proxy container

This is the boring Unix pattern. Certbot (or lego) runs on the host via systemd timers, writes to /etc/letsencrypt, and your proxy container reads certs from a read-only bind mount. Reload happens through a controlled hook.

Why it’s safe:

  • Host-level scheduling and logging are predictable.
  • You can use OS security controls (permissions, SELinux/AppArmor) more naturally.
  • Your proxy container doesn’t need DNS API credentials or ACME account keys.

Where it bites you: HTTP-01 challenges can be awkward if your proxy is also containerized. You need a clean path from the Internet to the challenge responder.
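
The whole pattern fits in two commands; a sketch assuming Certbot's webroot mode, placeholder domains, and an nginx-based proxy image (the tag is illustrative):

cr0x@server:~$ sudo certbot certonly --webroot -w /var/www/certbot -d example.com -d www.example.com
cr0x@server:~$ docker run -d --name edge-proxy -p 80:80 -p 443:443 \
    -v /etc/letsencrypt:/etc/letsencrypt:ro \
    -v /var/www/certbot:/var/www/certbot:ro \
    nginx:1.27

The container only ever reads; issuance, scheduling, and logs stay on the host, which is the whole point of the pattern.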

Pattern C (acceptable only when constrained): ACME client in a container writing into a shared volume

This is the “Certbot container + nginx container” Compose pattern. It can work. It also tends to age badly: permissions drift, volumes get copied between hosts, and renewals become invisible until they fail.

When it’s justified:

  • You can’t install anything on the host (managed environments, hardened images).
  • You’re in Kubernetes-like constraints but still on Docker.
  • You have a single-purpose host and strong container isolation policies.

What to do if you pick it: treat the certificate volume like a secret store. Read-only mounts everywhere except the ACME writer. Strict file ownership. No “777 because it works.”
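
If you are forced into this pattern, a sketch of the least-bad Compose shape looks like this (volume names and the renew-loop interval are illustrative, not endorsements):

services:
  edge-proxy:
    image: nginx:1.27
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - letsencrypt:/etc/letsencrypt:ro   # the proxy only reads
      - webroot:/var/www/certbot:ro
  acme:
    image: certbot/certbot
    volumes:
      - letsencrypt:/etc/letsencrypt      # the single writer
      - webroot:/var/www/certbot
    # illustrative renew loop; a host scheduler or one-shot runs are cleaner
    entrypoint: sh -c 'trap exit TERM; while true; do certbot renew --webroot -w /var/www/certbot; sleep 12h & wait; done'
volumes:
  letsencrypt:
  webroot:

Note what this still doesn't solve: nginx has to be reloaded after a renewal, and nothing in this file does that. That missing step is exactly where this pattern usually rots.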

Joke #1: Certificates are like milk. They’re fine until you forget the expiry date, and then the smell reaches management.

The pattern you should not ship: every service runs its own Certbot

Multiple services competing for port 80, each writing to its own volume, each renewing on its own schedule, each possibly using staging vs production endpoints differently. It’s a nice way to learn Let’s Encrypt rate limits in real time.

Certificate storage: volumes, permissions, and the private key problem

Most postmortems about TLS aren’t really about TLS. They’re about state: where the keys live, who can read them, and whether that state survives redeploys.

What must be persistent

  • Private keys (privkey.pem): if lost, you can reissue, but you'll cause downtime and break anything that pins that specific key.
  • Certificate chain (fullchain.pem): needed by the server to present a valid chain.
  • ACME account key: used by the client to authenticate to Let’s Encrypt. Lose it and you can still re-register, but you lose continuity and sometimes get operational surprises.

Bind mount vs named volume

Bind mount is simple and auditable: you can inspect files on the host, back them up, and apply host permissions. For sensitive material, bind mounts tend to be easier to reason about.

Named volumes are portable inside Docker tooling, but they can become a black box. They’re fine if you treat them like a managed datastore and you know how they’re backed up and restored.

Permissions: least privilege, not least effort

Your proxy needs read access to the key. Your ACME client needs write access. Nobody else does. Do not mount /etc/letsencrypt read-write into a half-dozen app containers because it’s convenient.
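
A sketch of the single-host version, assuming your proxy process runs as a non-root user inside the container (the group name, GID, and proxy image are placeholders):

cr0x@server:~$ sudo groupadd --system --gid 990 certread
cr0x@server:~$ sudo chgrp -R certread /etc/letsencrypt/live /etc/letsencrypt/archive
cr0x@server:~$ sudo chmod -R g+rX,o-rwx /etc/letsencrypt/live /etc/letsencrypt/archive
cr0x@server:~$ docker run -d --name edge-proxy --group-add 990 \
    -v /etc/letsencrypt:/etc/letsencrypt:ro \
    -p 80:80 -p 443:443 your-proxy-image

The --group-add only matters if the proxy actually runs as non-root; if it runs as root inside the container, your real controls are the read-only mount and who can reach the Docker socket.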

Decide the trust model:

  • Single host: store certs on the host filesystem, root-owned, group-readable by a specific group that the proxy container runs as.
  • Multiple hosts: avoid NFS for private keys unless you’re very sure about file locking and security. Prefer per-host issuance (DNS-01) or a secret distribution mechanism with rotation semantics.

Keys inside images: just don’t

Baking keys into images is a career-limiting move. Images get pushed to registries, cached on laptops, scanned by CI, and occasionally leaked. Keep keys out of the build context, out of layers, out of history.
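
A .dockerignore is cheap insurance against keys slipping into the build context; the patterns are illustrative, so match them to where your secrets actually live:

# .dockerignore — keep key material and ACME state out of image layers
*.pem
*.key
*.pfx
acme.json
letsencrypt/
secrets/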

Renewal and reload: what “automation” actually means

Renewal has three jobs:

  1. Get a new certificate before expiry.
  2. Put it where the server expects it.
  3. Make the server use it without dropping traffic.

Hot reload vs restart

Some proxies can reload certs without dropping connections. Others can’t. You need to know which you have, and you need to test it. “Restart the container weekly” is not a strategy; it’s a roulette wheel with better branding.

Hooks are your friend

If you use Certbot on the host, use deploy hooks to reload nginx/Traefik gracefully. If you use a proxy-native ACME implementation, confirm how it persists ACME state and how it handles reload.
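
A hedged example of wiring this with host Certbot, assuming an nginx-based edge-proxy container; the reload command is the part you must verify against your actual proxy:

cr0x@server:~$ sudo certbot certonly --webroot -w /var/www/certbot -d example.com \
    --deploy-hook 'docker exec edge-proxy nginx -s reload'
cr0x@server:~$ docker exec edge-proxy nginx -t   # prove the reload path works before renewal night
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful

Certbot stores hooks supplied at issuance time in the renewal configuration and re-runs them on every successful renewal; Task 12 below shows how to confirm that.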

Joke #2: Nothing builds confidence like a TLS renewal job that only runs when someone remembers it exists.

Practical tasks: commands, outputs, and decisions

These are not “copy-paste and pray” snippets. Each task includes what the output means and what decision you make next. Run them on the host unless stated otherwise.

Task 1: Confirm what is actually listening on ports 80/443

cr0x@server:~$ sudo ss -lntp | egrep ':80|:443'
LISTEN 0      4096         0.0.0.0:80        0.0.0.0:*    users:(("docker-proxy",pid=1123,fd=4))
LISTEN 0      4096         0.0.0.0:443       0.0.0.0:*    users:(("docker-proxy",pid=1144,fd=4))

Meaning: Docker is publishing both ports. That implies a container is your ingress. If you expected host nginx to own 80/443, you already found the conflict.

Decision: Identify which container maps those ports and confirm it’s the single TLS termination point.

Task 2: Identify the container publishing the ports

cr0x@server:~$ docker ps --format 'table {{.Names}}\t{{.Image}}\t{{.Ports}}'
NAMES        IMAGE             PORTS
edge-proxy   traefik:v3.1      0.0.0.0:80->80/tcp, 0.0.0.0:443->443/tcp
api          myco/api:1.9.2    127.0.0.1:9000->9000/tcp

Meaning: edge-proxy is the public entry point. Good. The API is only local.

Decision: Ensure all public TLS happens in edge-proxy and remove any direct 443 exposures elsewhere.

Task 3: Check certificate currently served to the Internet

cr0x@server:~$ echo | openssl s_client -connect example.com:443 -servername example.com 2>/dev/null | openssl x509 -noout -subject -issuer -dates
subject=CN = example.com
issuer=C = US, O = Let's Encrypt, CN = R11
notBefore=Dec  1 03:12:10 2025 GMT
notAfter=Mar  1 03:12:09 2026 GMT

Meaning: The live cert expires on Mar 1. That's your hard deadline. The output also confirms SNI routing serves the certificate you expect.

Decision: If expiry is within 14 days and you don’t have a verified renewal pipeline, stop what you’re doing and fix that first.

Task 4: Validate the full chain and handshake quality

cr0x@server:~$ openssl s_client -connect example.com:443 -servername example.com -showcerts </dev/null 2>/dev/null | egrep 'Verify return code|subject=|issuer='
subject=CN = example.com
issuer=C = US, O = Let's Encrypt, CN = R11
Verify return code: 0 (ok)

Meaning: Chain is good and clients should validate it.

Decision: If verify code is not 0, check whether you’re serving fullchain.pem vs cert.pem and whether your proxy is configured for the correct bundle.

Task 5: If using Certbot on host, list certs and expirations

cr0x@server:~$ sudo certbot certificates
Saving debug log to /var/log/letsencrypt/letsencrypt.log

Found the following certs:
  Certificate Name: example.com
    Domains: example.com www.example.com
    Expiry Date: 2026-03-01 03:12:09+00:00 (VALID: 57 days)
    Certificate Path: /etc/letsencrypt/live/example.com/fullchain.pem
    Private Key Path: /etc/letsencrypt/live/example.com/privkey.pem

Meaning: Certbot’s view of state. If this differs from what openssl s_client shows, your proxy is not reading the expected files.

Decision: Align the proxy config with the live paths under /etc/letsencrypt/live and ensure those symlinks are reachable inside the container.

Task 6: Dry-run renewals (staging) to verify the pipeline

cr0x@server:~$ sudo certbot renew --dry-run
Saving debug log to /var/log/letsencrypt/letsencrypt.log

Processing /etc/letsencrypt/renewal/example.com.conf
Simulating renewal of an existing certificate for example.com and www.example.com

Congratulations, all simulated renewals succeeded:
  /etc/letsencrypt/live/example.com/fullchain.pem (success)

Meaning: Your challenge path, credentials, and hooks work in staging. This is the closest thing to a unit test you get.

Decision: If this fails, do not wait for production renewal. Fix the failure now.

Task 7: Inspect challenge reachability for HTTP-01

cr0x@server:~$ curl -i http://example.com/.well-known/acme-challenge/ping
HTTP/1.1 404 Not Found
Server: traefik
Date: Sat, 03 Jan 2026 10:21:42 GMT
Content-Type: text/plain; charset=utf-8

Meaning: You can reach the host and the proxy is responding on port 80. A 404 is fine for this synthetic URL; what matters is that it doesn’t redirect to HTTPS in a way your ACME client can’t handle.

Decision: If you get a connection timeout, your firewall/NAT/port publish is wrong. If you get a 301 to HTTPS, confirm your ACME client/proxy supports it safely, or carve out an exception for the challenge path.
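
If nginx owns port 80, the carve-out is one specific location block that answers before any redirect; a minimal sketch assuming Certbot's webroot is shared at /var/www/certbot:

server {
    listen 80;
    server_name example.com www.example.com;

    # serve ACME HTTP-01 tokens directly; never redirect this path
    location /.well-known/acme-challenge/ {
        root /var/www/certbot;
    }

    # everything else goes to HTTPS
    location / {
        return 301 https://$host$request_uri;
    }
}

Traefik and Caddy handle this routing internally when they run the ACME client themselves; the carve-out only matters when something else answers on port 80.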

Task 8: Verify Docker volume mounts for certificate material

cr0x@server:~$ docker inspect edge-proxy --format '{{json .Mounts}}'
[{"Type":"bind","Source":"/etc/letsencrypt","Destination":"/etc/letsencrypt","Mode":"ro","RW":false,"Propagation":"rprivate"}]

Meaning: The proxy reads /etc/letsencrypt from the host, read-only. That’s the shape you want.

Decision: If it’s RW when it shouldn’t be, lock it down. If the source is a named volume, confirm you can back it up and restore it intentionally.

Task 9: Confirm file permissions and ownership on private keys

cr0x@server:~$ sudo ls -l /etc/letsencrypt/live/example.com/privkey.pem
-rw------- 1 root root 1704 Dec  1 03:12 /etc/letsencrypt/live/example.com/privkey.pem

Meaning: Only root can read it. If your proxy runs as non-root inside the container, it may fail to load the key.

Decision: Either run the proxy with a user that can read the key via group permissions, or use a controlled mechanism (like a dedicated group and chmod 640) rather than opening it to the world.

Task 10: Check the proxy logs for ACME and certificate reload events

cr0x@server:~$ docker logs --since 2h edge-proxy | egrep -i 'acme|certificate|renew|challenge' | tail -n 20
time="2026-01-03T08:01:12Z" level=info msg="Renewing certificate from LE : {Main:example.com SANs:[www.example.com]}"
time="2026-01-03T08:01:15Z" level=info msg="Server responded with a certificate."
time="2026-01-03T08:01:15Z" level=info msg="Adding certificate for domain(s) example.com, www.example.com"

Meaning: Renewal happened and the proxy believes it loaded the new cert.

Decision: If logs show renewal succeeded but clients still see the old cert, you likely have multiple ingress instances or a caching/load balancer layer serving a different cert.

Task 11: If using systemd timers for Certbot, verify scheduling and last run

cr0x@server:~$ systemctl list-timers | grep -i certbot
Sun 2026-01-04 03:17:00 UTC  15h left  Sat 2026-01-03 03:17:02 UTC  5h ago  certbot.timer  certbot.service

Meaning: Timer exists and ran recently.

Decision: If it doesn’t exist, you don’t have automation. If it exists but hasn’t run, check whether the host was down or the timer is misconfigured.

Task 12: Validate deploy hooks actually reloaded the proxy

cr0x@server:~$ sudo grep -R "deploy-hook" -n /etc/letsencrypt/renewal | head
/etc/letsencrypt/renewal/example.com.conf:12:deploy_hook = docker kill -s HUP edge-proxy

Meaning: After renewal, Certbot sends HUP to the proxy container. That’s a controlled reload pattern.

Decision: Confirm the proxy supports SIGHUP reload semantics. If it doesn’t, replace the hook with the correct reload command (or an API call), and test it during business hours.

Task 13: Confirm what certificate file the proxy is configured to use

cr0x@server:~$ docker exec -it edge-proxy sh -c 'grep -R "fullchain.pem\|privkey.pem" -n /etc/traefik /etc/nginx 2>/dev/null | head'
/etc/nginx/conf.d/https.conf:8:ssl_certificate     /etc/letsencrypt/live/example.com/fullchain.pem;
/etc/nginx/conf.d/https.conf:9:ssl_certificate_key /etc/letsencrypt/live/example.com/privkey.pem;

Meaning: You’re serving the full chain and the key from the standard live paths.

Decision: If you see paths under /tmp or app-specific directories, expect surprises during redeploys.

Task 14: Check rate limit risk by counting recent failed attempts

cr0x@server:~$ sudo awk '/urn:ietf:params:acme:error/ {count++} END {print count+0}' /var/log/letsencrypt/letsencrypt.log
0

Meaning: No ACME error entries in the log. Good.

Decision: If this number is climbing, stop automated retries and fix the underlying validation issue before you hit rate limits.

Task 15: Confirm container time is sane (yes, it matters)

cr0x@server:~$ docker exec -it edge-proxy date -u
Sat Jan  3 10:23:01 UTC 2026

Meaning: Time is correct. Bad time can make certificate validation fail in ways that look like “random TLS errors.”

Decision: If time is off, fix host NTP first. Containers inherit host time; if it’s wrong, everything is wrong.

Fast diagnosis playbook

When TLS is on fire, you don’t “investigate.” You triage. Here’s the order that finds the bottleneck quickly.

First: is the wrong certificate being served, or no certificate?

  • Run openssl s_client against the public endpoint and check notAfter, subject, and issuer.
  • If it’s expired: you’re dealing with renewal failure or reload failure.
  • If it’s the wrong CN/SAN: you’re hitting the wrong ingress instance, wrong SNI routing, or a default certificate.

Second: does the ACME client believe it renewed?

  • Check Certbot/ACME logs for success entries and timestamps.
  • Check filesystem timestamps on fullchain.pem and privkey.pem.
  • If files are updated but the served cert is old: this is a reload/distribution problem.

Third: can the challenge be satisfied right now?

  • For HTTP-01: confirm port 80 reachability and that /.well-known/acme-challenge/ is routed to the right responder.
  • For DNS-01: confirm the DNS API credentials are present and valid, and check propagation delays.

Fourth: confirm there is only one source of truth

  • Look for multiple ingress containers or multiple hosts behind a load balancer that aren’t sharing certificate state intentionally.
  • Make sure staging vs production endpoints are not mixed.

Fifth: rate limits and retry storms

  • If you see repeated failures, stop the renewal job temporarily. Rate limits are unforgiving and will extend the outage.
  • Fix the underlying routing/DNS issue, then do a controlled retry.

Common mistakes: symptom → root cause → fix

1) Symptom: renewal “succeeded,” but browsers still show the old certificate

Root cause: The proxy never reloaded the new files, or you updated certs on one host while traffic hits another.

Fix: Add a deploy hook to reload the proxy (signal/API), and verify which instance serves traffic with openssl s_client from multiple vantage points.
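
A quick way to prove which case you're in, assuming host-Certbot paths (the fingerprints below are truncated placeholders): if the two values differ, the files on disk are fine and the reload/distribution is not.

cr0x@server:~$ echo | openssl s_client -connect example.com:443 -servername example.com 2>/dev/null | openssl x509 -noout -fingerprint -sha256
SHA256 Fingerprint=4F:2A:...:9C
cr0x@server:~$ sudo openssl x509 -in /etc/letsencrypt/live/example.com/fullchain.pem -noout -fingerprint -sha256
SHA256 Fingerprint=7B:E1:...:30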

2) Symptom: Certbot fails HTTP-01 with “connection refused” or “timeout”

Root cause: Port 80 isn’t reachable (firewall, NAT, wrong Docker publish) or another service is binding it.

Fix: Ensure port 80 is published by the ingress container and allowed through security groups/firewalls. Run ss -lntp to confirm.

3) Symptom: HTTP-01 fails with “unauthorized” and the token content is wrong

Root cause: The challenge path is being redirected/routed to the app, not the ACME responder. Often caused by “force HTTPS” rules applied too early or an overly greedy reverse proxy rule.

Fix: Add a specific route for /.well-known/acme-challenge/ that bypasses redirects and points to the ACME responder.

4) Symptom: You hit Let’s Encrypt rate limits during an incident

Root cause: Automated retries hammer production issuance after repeated challenge failures.

Fix: Use --dry-run in staging for testing; implement backoff; alert on failures. During an incident, stop the job and fix reachability first.

5) Symptom: Proxy can’t read privkey.pem inside container

Root cause: File permissions are root-only and the container runs as non-root, or SELinux labeling blocks access.

Fix: Use group-readable permissions for a dedicated group, run the proxy with that group, and if SELinux is enabled use proper labels on the bind mount.

6) Symptom: After redeploy, certificates disappear and proxy serves a default/self-signed cert

Root cause: ACME storage was in the container filesystem (ephemeral) or in an un-backed-up volume that got recreated.

Fix: Persist ACME storage to a named volume or bind mount with backups. Treat it as stateful data.

7) Symptom: Wildcard certificate requests fail even though HTTP-01 works

Root cause: Wildcards require DNS-01, not HTTP-01.

Fix: Implement DNS-01 via DNS provider API, lock down credentials, and test propagation behavior.
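
A hedged example using Certbot's Cloudflare DNS plugin (one provider among many; swap in the plugin and credentials format for your DNS provider):

cr0x@server:~$ sudo certbot certonly --dns-cloudflare \
    --dns-cloudflare-credentials /root/.secrets/cloudflare.ini \
    -d 'example.com' -d '*.example.com'

Keep the credentials file root-owned and mode 600, and scope the API token to DNS edits for the zones you actually need; that token is now as sensitive as a private key.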

8) Symptom: “It works on one host but not another”

Root cause: Split-brain: multiple ingress nodes each issuing independently, or inconsistent time, or inconsistent configuration.

Fix: Pick a single ownership model (per-host issuance with DNS-01, or centralized ingress with distributed secrets) and enforce it with configuration management.

Three corporate mini-stories from the trenches

Mini-story 1: The incident caused by a wrong assumption

They had a tidy Docker Compose stack: a reverse proxy, a couple of APIs, a frontend, and a “certbot” service. The assumption was simple: the certbot container renews certificates and the proxy magically starts using them. Nobody wrote down the “magically” part.

Renewal day arrived. Certbot did renew. The files on disk updated. But the proxy process had loaded the certificate at startup and never looked again. It was happily serving the old cert from memory while the disk contained the new one, like a librarian who refuses to accept new editions.

The team chased red herrings: DNS, firewall rules, Let’s Encrypt outages. Meanwhile browsers screamed “expired certificate” and customers assumed compromise. Security got involved. Leadership got involved. Sleep left the building.

The fix took minutes once seen: a deploy hook that sent the correct reload signal to the proxy container, plus a validation step that compared the live served cert to the filesystem after renewal. The larger fix took a week: they added an alert for “certificate expires within 14 days” and wrote a runbook that started with openssl s_client.

Mini-story 2: The optimization that backfired

A different company wanted faster deploys and fewer moving parts. Someone proposed: “Let each service container request its own certificate. Then scaling is easy, and teams are autonomous.” This sounded like modern architecture and also like something you could say in a meeting without being challenged.

It worked for a while, in the same way a kitchen works when nobody cooks. Then a migration happened: dozens of services redeployed over a few hours. Each one tried to issue a certificate. Some used staging, some used production, and a few had misconfigured domain names that failed validation and retried aggressively.

Rate limits hit. Some services couldn’t obtain certs, so they served self-signed fallbacks. Clients pinning trust failed hard. Support tickets spiked. The incident was technically “just certificates,” but operationally it was a distributed system self-own.

They rolled back to a centralized ingress that issued a small set of certificates and routed internally. Autonomy returned in a better form: teams owned routes and headers, not private key lifecycles. The optimization had been “remove the bottleneck.” What they removed was the only place anyone was looking.

Mini-story 3: The boring but correct practice that saved the day

A fairly regulated enterprise had an unglamorous rule: all Internet-facing TLS terminates at a hardened edge proxy cluster, and certificate state is backed up as part of infrastructure state. Engineers grumbled. It felt slow. It felt like paperwork.

Then an unexpected certificate authority chain change landed in the ecosystem and a subset of older clients behaved badly. The team didn’t have to scramble across 40 application repos hunting TLS settings. They adjusted the edge configuration, verified chain presentation, and rolled out a controlled change with a canary. The apps didn’t move.

Later that year, a host died. The replacement host came up, configuration applied, certificates restored, and traffic resumed. No last-minute issuance, no rate limit drama, no “why is the volume empty?” mystery.

The saving move wasn’t heroics. It was ownership boundaries, backups, and reload hooks tested quarterly. It was boring in the same way seatbelts are boring.

Checklists / step-by-step plan

Pick a pattern (do this before writing Compose files)

  1. Single host, simple ingress: Pattern A (proxy-native ACME) or Pattern B (host Certbot + proxy reads read-only).
  2. Multiple hosts behind LB: Prefer DNS-01 and per-host issuance, or a centralized secret distribution approach. Avoid “shared NFS of /etc/letsencrypt” unless you really understand the failure modes.
  3. Highly locked-down hosts: Pattern A with a well-understood ACME storage mechanism, plus backups and access controls.

Hardening checklist (the stuff you regret skipping)

  • Only one public ingress publishes ports 80/443.
  • Certificates and keys are persisted and backed up.
  • Private keys are readable only by the ingress (and the renewer if separate).
  • Staging dry-run renewal is tested and scheduled.
  • Reload mechanism is implemented and verified (signal/API/graceful reload).
  • Monitoring: alert on certificate expiry, renewal failures, and ACME errors (a minimal expiry check is sketched after this list).
  • Runbook: first command is openssl s_client, not “check Grafana.”
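
For the expiry alert above, a minimal check that any cron job or monitoring wrapper can run; 1209600 seconds is 14 days, and the exit code is what your alerting should key on:

cr0x@server:~$ echo | openssl s_client -connect example.com:443 -servername example.com 2>/dev/null | openssl x509 -noout -checkend 1209600
Certificate will not expire

A non-zero exit (and the message "Certificate will expire") means you are inside the 14-day window and someone should look before the phone does the asking.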

Step-by-step: host Certbot + containerized nginx/Traefik reading read-only

  1. Install Certbot on the host and obtain the initial certificate using a method compatible with your routing (standalone/webroot/DNS).
  2. Store certs in /etc/letsencrypt on the host.
  3. Bind mount /etc/letsencrypt into the proxy container as read-only.
  4. Configure the proxy to use fullchain.pem and privkey.pem.
  5. Add a Certbot deploy hook to reload the proxy gracefully.
  6. Enable and verify a systemd timer for renewals.
  7. Run certbot renew --dry-run and verify live served cert matches filesystem after reload.

Step-by-step: proxy-native ACME (Traefik/Caddy style)

  1. Persist ACME state to a volume/bind mount (this is not optional).
  2. Lock down permissions on ACME storage (account keys live there).
  3. Use HTTP-01 only if port 80 is reliably reachable; otherwise use DNS-01 with scoped DNS API credentials.
  4. Test renewal behavior and observe logs for renewal events.
  5. Back up ACME storage and test restore on a non-production instance.

FAQ

Should I run Certbot inside a container?

You can, but you shouldn’t by default. If the host can run Certbot, host-based renewals plus read-only mounts into the proxy are easier to audit and recover.

Is Traefik/Caddy ACME “safe”?

Yes, if you persist and protect the ACME storage. The unsafe version is leaving ACME data in an ephemeral container filesystem or mounting it RW everywhere.

Why not terminate TLS in each application container?

Because private keys spread, renewals multiply, and debugging becomes a scavenger hunt. Centralize TLS at the edge unless you have a specific compliance or architecture need.

What’s the safest challenge type with Docker?

DNS-01 is the most infrastructure-friendly when port 80 is messy (load balancers, locked firewalls, multiple ingresses). But it shifts risk to DNS API credentials and propagation timing.

Do I need port 80 open if I use HTTPS everywhere?

If you use HTTP-01, yes. The ACME server must fetch the token over HTTP. The usual solution is: allow HTTP only for /.well-known/acme-challenge/ and redirect everything else to HTTPS.

How do I avoid downtime when certificates renew?

Use a proxy that supports graceful reload, and trigger it via a deploy hook (or rely on the proxy’s built-in reload). Verify with openssl s_client after renewal.

Where should I store certificates on disk?

On the host in a protected directory (commonly /etc/letsencrypt) if the host runs ACME. Or in a dedicated, backed-up volume if the proxy runs ACME. Keep the private key readable only by what must serve it.

What if I have multiple Docker hosts behind a load balancer?

Pick one: per-host issuance (DNS-01 is common) or a centralized certificate distribution approach. Avoid ad-hoc shared filesystems unless you’ve tested locking, backups, and failover behavior.

How do I know if I’m close to Let’s Encrypt rate limits?

Look for repeated ACME errors in your logs and stop retry storms. A handful of controlled retries is fine; tight loops during an outage are how you end up waiting out cooldown periods.

Are Docker secrets a good place for TLS private keys?

In Swarm, Docker secrets can be a good primitive, but you still need rotation and reload semantics. In plain Docker Compose, “secrets” often devolve into mounted files without lifecycle management.
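
A minimal Swarm-mode sketch, assuming the certificate files already exist on a manager node; names are placeholders, and rotating a secret still means updating the service so the new file is mounted:

cr0x@server:~$ sudo docker secret create example.com.key /etc/letsencrypt/live/example.com/privkey.pem
cr0x@server:~$ sudo docker secret create example.com.crt /etc/letsencrypt/live/example.com/fullchain.pem
cr0x@server:~$ docker service create --name edge-proxy -p 80:80 -p 443:443 \
    --secret example.com.key --secret example.com.crt \
    your-proxy-image

Inside the service's containers, the secrets appear as files under /run/secrets/; the proxy config points there instead of at a bind mount.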

Conclusion: the next practical steps

If you only take one decision from this: centralize TLS ownership at the ingress. Then pick how certificates are issued:

  • Use proxy-native ACME when you can persist and protect its state cleanly.
  • Use host-based Certbot when you want predictable scheduling, logging, and filesystem control.
  • Use Certbot-in-container only when host installs are off-limits, and treat the certificate volume like a secret store.

Next steps you can do this week:

  1. Run the “served certificate” check with openssl s_client and record the expiry date somewhere visible.
  2. Run a dry-run renewal test and fix any failures while you’re not under pressure.
  3. Confirm reload semantics and implement a deploy hook that’s proven to work.
  4. Add one alert: “certificate expires in 14 days.” Boring alert. Life-saving alert.

After that, you can argue about cipher suites and HTTP/3 like civilized people. First, stop the certificates from expiring.
