Docker Private Registry TLS Errors: Fix Certificate Chains Properly


You deploy a private Docker registry. It works for you. It works for one CI runner. Then a new node joins the cluster and suddenly: x509: certificate signed by unknown authority. People start “fixing” it by toggling insecure-registries because it’s 2 a.m. and the pager is loud.

Don’t. TLS errors around registries are one of those problems where the right fix is boring, repeatable, and far cheaper than the heroic hack. The wrong fix is fast, fragile, and comes back like a bad sequel—usually during a deploy freeze.

What Docker registry TLS errors actually mean

Most “Docker registry TLS errors” are not Docker problems. They’re standard TLS validation failures surfacing through Docker’s client stack (Go’s TLS library), the daemon, and sometimes containerd. Your registry is just an HTTPS server with opinions.

When you run:

  • docker login registry.example.com
  • docker pull registry.example.com/team/app:tag
  • kubectl apply and nodes start pulling images

…the client expects the server to present a certificate that:

  1. Matches the hostname you used (Subject Alternative Name, not just CN).
  2. Is currently valid (the client’s clock falls within the certificate’s NotBefore/NotAfter window).
  3. Chains up to a trusted root CA on the machine doing the pulling.
  4. Includes required intermediates (or can fetch them via AIA, which is unreliable in locked-down networks).
  5. Uses acceptable key sizes and signature algorithms.

Docker’s error messages are often short and unromantic, like:

  • x509: certificate signed by unknown authority
  • x509: certificate is valid for ..., not registry.example.com
  • remote error: tls: bad certificate
  • tls: handshake failure

Each of those maps to a specific failure mode. You can diagnose it with deterministic steps. You do not need to “try things” in production.

Joke #1: TLS is like a bouncer—if your chain of trust looks fake, you’re not getting into the club, no matter how loudly you insist you’re “on the list.”

Facts and context you can use in a postmortem

Some background helps, because TLS failures often come from history—old defaults, old habits, old infrastructure.

  1. Docker moved trust decisions to the daemon early on. The daemon does pulls, so its trust store matters more than your shell’s.
  2. Subject Alternative Name replaced CN matching years ago. Modern clients ignore the old Common Name for hostname validation; SAN is mandatory.
  3. Intermediate CAs became common because roots are tightly controlled. Many public and private PKIs issue leaf certs from intermediates, not directly from roots.
  4. AIA fetching is real but not guaranteed. Some TLS stacks can download missing intermediates via Authority Information Access, but Docker’s behavior depends on environment and network access.
  5. Let’s Encrypt changed chain behavior over time. Chain selection and intermediate rotations have caused “it worked yesterday” surprises when servers served the wrong chain.
  6. Older clients fail on newer algorithms and vice versa. A registry with only ECDSA certificates can trip older OpenSSL builds; RSA-only can be slower but widely compatible.
  7. Corporate TLS inspection breaks assumptions. Transparent proxies that re-sign certificates mean your internal trust store is now a dependency, whether you like it or not.
  8. Container runtimes evolved. On many systems Docker uses containerd under the hood; Kubernetes might use containerd directly. Trust store paths and reload semantics differ.
  9. “Insecure registries” was meant for dev, not prod. It disables critical verification. It’s not a “temporary fix”; it’s a policy decision with security debt.

One operational quote to keep you honest: “Hope is not a strategy.” — General Gordon R. Sullivan. It applies painfully well to certificate chains.

Fast diagnosis playbook

If you’re on-call, you don’t want a lecture. You want a fast path from symptom to root cause.

First: confirm what endpoint the client is actually hitting

  • Is it registry.example.com or a load balancer name?
  • Are you using a port like :5000?
  • Is there a proxy or MITM TLS inspection?

If the hostname differs from what the certificate covers, you’re done: fix DNS or SANs.

Second: inspect what the server presents (chain + SAN)

Use openssl s_client against the registry endpoint from the failing machine. Do not inspect it from your laptop and assume it’s the same path.

If the server presents only the leaf cert (missing intermediates), fix the server config. Don’t “teach” every node the missing intermediates if the registry is the one misbehaving.
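
A minimal version of that check, run from the failing node; the host and port are this article’s running example, and the same commands appear with sample output in the task list further down:

# What the server presents: subject, issuer, validity, SANs
echo | openssl s_client -connect registry.example.com:5000 \
    -servername registry.example.com -showcerts 2>/dev/null \
  | openssl x509 -noout -subject -issuer -dates -ext subjectAltName

# How many certificates the server sends (1 usually means leaf-only)
echo | openssl s_client -connect registry.example.com:5000 \
    -servername registry.example.com -showcerts 2>/dev/null \
  | grep -c 'BEGIN CERTIFICATE'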

Third: determine where trust is missing

  • If curl fails and Docker fails: machine trust store is missing the CA or chain is wrong.
  • If curl succeeds but Docker fails: Docker daemon trust config differs, or you’re using containerd with its own trust configuration.
  • If only Kubernetes pulls fail: the node runtime’s trust configuration differs from that of the machine where your interactive shell works.

Fourth: check time, because time ruins everything

Clock skew makes valid certs look invalid. NTP outages cause “random TLS errors” that look like PKI problems.

Fifth: resist “insecure-registries” unless you’re explicitly accepting the risk

It will get you green builds. It will also normalize bypassing verification, which is how you end up shipping credentials to the wrong endpoint later.

Certificate chain mental model (so you stop guessing)

A TLS certificate chain is a story the server tells the client: “I am registry.example.com, and here’s how you can believe me.” The story has characters:

  • Leaf certificate: issued to your registry hostname(s). This is what your server “is.”
  • Intermediate certificate(s): the CA that issued the leaf. These bridge the leaf to a root.
  • Root certificate: the trust anchor. This is what the client already trusts.

Clients usually trust roots, not intermediates. Intermediates are typically not in the OS trust store unless they’re bundled by the CA vendor. That’s why servers must often present the intermediate chain.
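
To make that story concrete, print each certificate’s subject and issuer and confirm they link up. The file names below are hypothetical local copies of the three certificates:

# Each certificate's issuer should equal the next certificate's subject
openssl x509 -in leaf.pem         -noout -subject -issuer
openssl x509 -in intermediate.pem -noout -subject -issuer
openssl x509 -in root.pem         -noout -subject -issuer
# Expect: leaf issuer == intermediate subject, intermediate issuer == root subject,
# and the root is self-signed (subject == issuer)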

What “fix the chain” actually means

It means: your registry endpoint must present the leaf certificate plus all required intermediates in the correct order, and the client must have a trust anchor (root) that validates that chain.

On typical web servers, that’s the difference between serving:

  • Wrong: cert.pem (leaf only)
  • Right: fullchain.pem (leaf + intermediate(s))

Also: your private CA root must be installed on every node that will pull images, including ephemeral CI runners and autoscaled Kubernetes nodes. If you’re issuing from a private root that isn’t in system trust, nothing else matters.
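
A minimal sketch of building and sanity-checking such a bundle, assuming hypothetical file names under /etc/nginx/certs (adjust for your actual termination point):

# Order matters: leaf first, then intermediate(s); root usually omitted
sudo sh -c 'cat /etc/nginx/certs/registry-leaf.pem /etc/nginx/certs/corp-issuing-ca.pem \
  > /etc/nginx/certs/registry-fullchain.pem'

# Sanity checks: bundle size, then chain verification against the corporate root
grep -c 'BEGIN CERTIFICATE' /etc/nginx/certs/registry-fullchain.pem
openssl verify -CAfile /etc/ssl/certs/Corp_Root_CA.pem \
  -untrusted /etc/nginx/certs/corp-issuing-ca.pem \
  /etc/nginx/certs/registry-leaf.pem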

Docker-specific wrinkle: trust location and reload behavior

Docker does not automatically consume every certificate you throw at the OS. The daemon uses system trust on many distros, but Docker also supports per-registry trust bundles at:

  • /etc/docker/certs.d/registry.example.com:5000/ca.crt

And containerd has its own configuration knobs depending on distro and Kubernetes distribution. Translation: you need to know which runtime is doing the pull and where it reads CAs from.
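
For the per-registry route, a short sketch using this article’s example host and a hypothetical corp-root-ca.crt; the directory name must match the host:port clients use, byte for byte:

# Per-registry trust for the Docker daemon only
sudo mkdir -p /etc/docker/certs.d/registry.example.com:5000
sudo cp corp-root-ca.crt /etc/docker/certs.d/registry.example.com:5000/ca.crt

# The daemon reads trust at startup, so restart it afterwards
sudo systemctl restart docker

The OS-level alternative (usually the better long-term home) is shown in the common-mistakes section further down.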

Practical tasks: commands, outputs, decisions (12+)

These are the tasks I actually run when registry pulls fail. Each includes the command, what typical output means, and the decision you make. Run them from the machine that is failing (node, runner, build agent), not from the machine you wish were failing.

Task 1: Confirm the exact registry host:port Docker is using

cr0x@server:~$ docker image pull registry.example.com:5000/team/app:1.2.3
Error response from daemon: Get "https://registry.example.com:5000/v2/": x509: certificate signed by unknown authority

What it means: Docker is using https://registry.example.com:5000 and failing trust verification.

Decision: All subsequent checks must target registry.example.com:5000, including certificate SANs and the trust bundle path under certs.d.

Task 2: Check basic DNS and routing (yes, really)

cr0x@server:~$ getent hosts registry.example.com
10.20.30.40   registry.example.com

What it means: Name resolves to 10.20.30.40.

Decision: If this differs across nodes (split-horizon DNS), you may be hitting different load balancers with different certificates.

Task 3: Inspect the presented certificate chain and SANs

cr0x@server:~$ echo | openssl s_client -connect registry.example.com:5000 -servername registry.example.com -showcerts 2>/dev/null | openssl x509 -noout -subject -issuer -dates -ext subjectAltName
subject=CN = registry.example.com
issuer=CN = Corp Issuing CA 01
notBefore=Dec  1 00:00:00 2025 GMT
notAfter=Dec  1 23:59:59 2026 GMT
X509v3 Subject Alternative Name:
    DNS:registry.example.com, DNS:registry

What it means: The leaf cert covers registry.example.com; good. Issuer is an intermediate (“Corp Issuing CA 01”).

Decision: If SANs don’t include the exact hostname clients use, reissue the cert. Do not rely on CN. If dates are off, fix rotation or clock.

Task 4: See whether the server is sending intermediates

cr0x@server:~$ echo | openssl s_client -connect registry.example.com:5000 -servername registry.example.com -showcerts 2>/dev/null | awk '/BEGIN CERTIFICATE/{i++} END{print i}'
1

What it means: Only one certificate (the leaf) is being sent. That’s a classic misconfiguration if clients don’t already have the intermediate.

Decision: Fix the registry’s TLS configuration to serve the full chain (leaf + intermediate(s)). This is usually the correct place to solve it.

Task 5: Verify the chain locally with a specific CA bundle

cr0x@server:~$ openssl verify -CAfile /etc/ssl/certs/ca-certificates.crt /tmp/registry-leaf.pem
CN = registry.example.com
error 20 at 0 depth lookup: unable to get local issuer certificate
error /tmp/registry-leaf.pem: verification failed

What it means: Your system trust store can’t build the chain from the leaf to a trusted root (missing intermediate and/or root).

Decision: Either install the correct corporate root CA into the system trust store, or fix the server to present intermediates (or both, depending on PKI design).

Task 6: Check OS trust store knows your corporate root

cr0x@server:~$ ls /etc/ssl/certs/ | grep -i "corp"
Corp_Root_CA.pem

What it means: The corporate root is present in the system trust directory (at least on this host).

Decision: If present but verification still fails, you’re missing intermediates on the server side or you’re not actually chaining to that root.

Task 7: Test with curl to isolate “Docker vs system” trust

cr0x@server:~$ curl -vI https://registry.example.com:5000/v2/
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* SSL certificate problem: unable to get local issuer certificate
curl: (60) SSL certificate problem: unable to get local issuer certificate

What it means: This isn’t Docker being special; the host itself can’t validate the server’s chain.

Decision: Fix chain/trust first. Don’t touch Docker configuration yet.

Task 8: Confirm Docker daemon sees any per-registry CA bundle

cr0x@server:~$ ls -la /etc/docker/certs.d/registry.example.com:5000/
total 12
drwxr-xr-x 2 root root 4096 Jan  3 09:12 .
drwxr-xr-x 3 root root 4096 Jan  3 09:12 ..
-rw-r--r-- 1 root root 1984 Jan  3 09:12 ca.crt

What it means: Docker has a per-registry CA file available.

Decision: If this file is missing on failing nodes, install it (or better, fix server chain and distribute only your root CA via standard OS trust management).

Task 9: Restart Docker after changing trust (because it won’t read your mind)

cr0x@server:~$ sudo systemctl restart docker
cr0x@server:~$ sudo systemctl is-active docker
active

What it means: Docker restarted cleanly.

Decision: If restart fails, resolve daemon config issues before re-testing pulls. Trust changes that aren’t loaded don’t count.

Task 10: Inspect Docker daemon configuration for insecure shortcuts

cr0x@server:~$ cat /etc/docker/daemon.json
{
  "log-driver": "json-file",
  "insecure-registries": ["registry.example.com:5000"]
}

What it means: Someone bypassed TLS verification for this registry.

Decision: Remove it once proper TLS is fixed. If you keep it, you’re accepting MITM risk for all pulls from that endpoint.

Task 11: Verify what certificate file your registry endpoint is actually serving (Nginx example)

cr0x@server:~$ sudo nginx -T 2>/dev/null | grep -n "ssl_certificate"
57:    ssl_certificate     /etc/nginx/certs/registry.crt;
58:    ssl_certificate_key /etc/nginx/certs/registry.key;

What it means: Nginx is serving registry.crt. That might be leaf-only.

Decision: Ensure ssl_certificate points to a full chain PEM (leaf + intermediates). Then reload Nginx.
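
For reference, a minimal Nginx TLS block with a full-chain file; the paths, port, and upstream address are hypothetical, and a real registry frontend will also carry auth, client_max_body_size, and proxy headers:

server {
    listen 5000 ssl;
    server_name registry.example.com;

    # Must contain leaf + intermediate(s), in that order
    ssl_certificate     /etc/nginx/certs/registry-fullchain.pem;
    ssl_certificate_key /etc/nginx/certs/registry.key;

    location /v2/ {
        proxy_pass http://127.0.0.1:5001;   # example registry backend
    }
}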

Task 12: Confirm the PEM file actually contains a full chain

cr0x@server:~$ awk '/BEGIN CERTIFICATE/{i++} END{print i}' /etc/nginx/certs/registry.crt
2

What it means: Two certs in the PEM (likely leaf + intermediate). That’s usually what you want.

Decision: If it prints 1, you’re probably serving leaf-only. Fix the file and reload the server.

Task 13: Reload the registry frontend safely (Nginx)

cr0x@server:~$ sudo nginx -t
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful
cr0x@server:~$ sudo systemctl reload nginx
cr0x@server:~$ sudo systemctl is-active nginx
active

What it means: Config is valid and reload succeeded. Existing connections should survive; new handshakes use new certs.

Decision: Prefer reload over restart during business hours.

Task 14: Re-test chain presentation after server change

cr0x@server:~$ echo | openssl s_client -connect registry.example.com:5000 -servername registry.example.com -showcerts 2>/dev/null | awk '/BEGIN CERTIFICATE/{i++} END{print i}'
2

What it means: The server now sends more than the leaf certificate.

Decision: Re-test with curl and then docker pull. If it still fails, trust anchor may be missing.

Task 15: Validate with curl again (now it should succeed)

cr0x@server:~$ curl -sSI https://registry.example.com:5000/v2/ | head -n 5
HTTP/1.1 401 Unauthorized
Server: nginx
Docker-Distribution-Api-Version: registry/2.0
Www-Authenticate: Bearer realm="https://registry.example.com:5000/token",service="registry.example.com:5000"
Date: Fri, 03 Jan 2026 09:27:31 GMT

What it means: TLS succeeded; HTTP auth is now the “problem,” which is normal. The registry expects credentials.

Decision: Move on to Docker login/pull tests. Your TLS layer is now sane.

Task 16: Confirm Docker can now talk TLS to /v2/

cr0x@server:~$ docker login registry.example.com:5000
Username: cr0x
Password:
Login Succeeded

What it means: TLS validation and authentication both worked.

Decision: Roll the same fix across all nodes/runners. Then remove any insecure registry settings.

Task 17: Check node time if cert validity looks wrong

cr0x@server:~$ timedatectl status | sed -n '1,8p'
Local time: Fri 2026-01-03 09:28:02 UTC
Universal time: Fri 2026-01-03 09:28:02 UTC
RTC time: Fri 2026-01-03 09:28:03
Time zone: Etc/UTC (UTC, +0000)
System clock synchronized: yes
NTP service: active

What it means: Clock is synced.

Decision: If not synchronized, fix NTP before you blame PKI. Expired/not-yet-valid errors often come from time drift.

Three corporate mini-stories from the trenches

1) The incident caused by a wrong assumption: “The load balancer handles the chain”

A mid-sized company ran a private registry behind a layer-7 load balancer. The platform team rotated certificates quarterly. They assumed the load balancer was serving the full chain because browsers were happy. Browsers, of course, are forgiving: they cache intermediates, they fetch via AIA, they “just work” until they don’t.

Then they rolled out new Linux worker nodes into an autoscaling pool. Fresh machines, minimal base image, locked-down egress. Those nodes couldn’t fetch intermediates from the public internet, and they didn’t have the intermediate cached from previous sessions because they were newborn.

The failure mode was beautifully repetitive: every new node failed image pulls with x509: certificate signed by unknown authority. Old nodes kept working because their trust caches and historical behavior masked the defect. The incident was misread as “autoscaling is broken” and “containerd is flaky.” It wasn’t. The chain was incomplete.

The fix was embarrassingly simple: configure the load balancer to present the full chain (leaf + intermediate), and validate using openssl s_client from a new node with no cached state. The real lesson: browsers are not a compliance test for your infrastructure clients.

2) The optimization that backfired: “We’ll use ECDSA only”

Another org decided to modernize TLS. They generated ECDSA certificates because they’re faster and smaller, and they hardened the registry ingress to prefer modern ciphers. It was a clean change on paper and a great slide in a security review.

Then a legacy build fleet started failing. Some runners had an older OpenSSL stack compiled without the right curve support; a couple of vendor appliances did TLS termination and re-encryption with limited algorithm support. The registry wasn’t “down,” but it was selectively unreachable.

The team’s first reaction was to roll back. The second reaction was better: dual-stack the certificate strategy—support RSA and ECDSA where feasible, or at minimum confirm that all clients in scope can negotiate the chosen algorithms. They also learned to test with the actual fleet, not a single modern laptop.

The backfire wasn’t that ECDSA is bad. The backfire was believing “our infrastructure” is homogeneous. It rarely is.

3) The boring but correct practice that saved the day: “Treat CA distribution like a release artifact”

A fintech team ran multiple internal services using a private PKI, including their container registry. They had a rule: the corporate root CA and required intermediates were packaged and shipped like any other dependency—versioned, checksummed, and deployed via their configuration management tooling.

When they rotated their intermediate CA (planned, announced, rehearsed), they updated the “trust bundle” package first. Nodes received it gradually. Only after the fleet reported compliance did they start issuing new leaf certificates for endpoints like the registry.

The rotation day was anticlimactic. A few stragglers failed pulls—predictably the unmanaged snowflake servers—and they were corrected with standard tooling. No midnight incident. No emergency insecure registry settings. No “why is only one cluster broken?” mystery.

It was boring. Boring is what you want. Joke #2: If your certificate rotation is exciting, you’re doing live theater, not operations.

Common mistakes: symptoms → root cause → fix

1) Symptom: x509: certificate signed by unknown authority

Root cause: Client can’t build a chain to a trusted root. Either the root CA isn’t installed, or the server isn’t sending intermediates, or both.

Fix: Prefer fixing server chain presentation (serve fullchain). Also ensure the correct root CA is installed in OS trust and/or /etc/docker/certs.d/ for that registry.
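
For the trust-anchor half, a sketch for the two common distro families; the certificate file name is hypothetical:

# Debian/Ubuntu: the file must end in .crt to be picked up
sudo cp corp-root-ca.crt /usr/local/share/ca-certificates/corp-root-ca.crt
sudo update-ca-certificates

# RHEL/Fedora family
sudo cp corp-root-ca.crt /etc/pki/ca-trust/source/anchors/corp-root-ca.crt
sudo update-ca-trust extract

# Restart whichever runtime does the pulls so it re-reads trust
sudo systemctl restart docker      # or containerd, depending on the node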

2) Symptom: x509: certificate is valid for foo, not registry.example.com

Root cause: Wrong SANs. Often happens when people issue certs for an internal name but clients use an external name, or when a load balancer hostname differs from registry hostname.

Fix: Reissue the leaf certificate with SANs for every name clients use. Don’t paper over it with DNS hacks unless you control all callers.
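
If you run the private CA yourself, a minimal OpenSSL reissuance sketch; the CA files here are hypothetical, and real PKI tooling (ADCS, step-ca, Vault, and the like) will look different:

# CSR with every hostname clients actually use, in the SAN list (OpenSSL 1.1.1+)
openssl req -new -newkey rsa:2048 -nodes \
  -keyout registry.key -out registry.csr \
  -subj "/CN=registry.example.com" \
  -addext "subjectAltName=DNS:registry.example.com,DNS:registry"

# Sign with the issuing CA; -copy_extensions needs OpenSSL 3.0+
# (on older builds, pass the SANs again via -extfile instead)
openssl x509 -req -in registry.csr \
  -CA corp-issuing-ca.pem -CAkey corp-issuing-ca.key -CAcreateserial \
  -days 365 -copy_extensions copy -out registry.crt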

3) Symptom: Works in browser, fails in Docker

Root cause: Browser fetched intermediates via AIA or had cached intermediates; Docker host did not. Or Docker daemon trust differs from user trust store.

Fix: Validate with openssl s_client -showcerts from the failing node. Serve full chain on the server. Confirm Docker trust bundle.

4) Symptom: Some nodes can pull, new nodes cannot

Root cause: Cached intermediates, inconsistent base images, or inconsistent CA distribution. Autoscaling exposes drift.

Fix: Standardize CA installation in the image or bootstrap. Test on a fresh node with no prior TLS history.

5) Symptom: tls: handshake failure with no other clues

Root cause: Protocol/cipher mismatch, SNI issues, or a proxy in the middle. Sometimes the registry speaks TLS only on one port and you’re hitting another.

Fix: Use openssl s_client with -servername. Confirm TLS versions and ciphers on server and clients. Identify any intercepting proxy.
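
A quick way to narrow a bare handshake failure from the failing host; each probe either completes a handshake or fails immediately with a protocol-level reason:

# Force specific protocol versions to expose a version mismatch
openssl s_client -connect registry.example.com:5000 -servername registry.example.com -tls1_2 </dev/null
openssl s_client -connect registry.example.com:5000 -servername registry.example.com -tls1_3 </dev/null

# Deliberately omit SNI to see whether a default/wrong vhost answers instead
openssl s_client -connect registry.example.com:5000 -noservername </dev/null \
  | openssl x509 -noout -subject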

6) Symptom: remote error: tls: bad certificate on client

Root cause: Server rejected the client side of the handshake. This can happen with mTLS misconfiguration (client certs required), or with a broken server chain that confuses the server-side stack.

Fix: Check server logs (nginx, envoy, registry). Confirm whether client certificates are required. If yes, configure Docker/client for mTLS or disable requirement for registry endpoints.

7) Symptom: Pull fails only in Kubernetes, not on a bastion

Root cause: Node runtime trust store is missing CA, or nodes use containerd directly and do not read Docker’s certs.d. Or the image pull goes through an internal proxy on nodes.

Fix: Debug on the node. Install CA where the runtime expects it. For containerd, use its cert configuration mechanisms (varies by distro). Avoid assuming “Docker settings apply.”
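
For containerd-managed nodes, one common layout (containerd 1.x with the CRI plugin) is the hosts directory; treat this as a sketch to adapt per distro, not a universal recipe:

# /etc/containerd/config.toml — point the CRI plugin at a hosts directory:
#   [plugins."io.containerd.grpc.v1.cri".registry]
#     config_path = "/etc/containerd/certs.d"

# /etc/containerd/certs.d/registry.example.com:5000/hosts.toml
server = "https://registry.example.com:5000"

[host."https://registry.example.com:5000"]
  ca = "/etc/containerd/certs.d/registry.example.com:5000/ca.crt"

# Then: sudo systemctl restart containerd, and re-test a pull with crictl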

8) Symptom: Suddenly fails after cert rotation

Root cause: New intermediate introduced, chain order wrong, expired intermediate still served, or leaf issued from a new CA not yet trusted by clients.

Fix: Serve the correct full chain. Pre-distribute new roots/intermediates before rotating leaf certificates. Validate from a clean client.

Checklists / step-by-step plan

Step-by-step: fix a private registry certificate chain (the sane way)

  1. Inventory callers. List every system that pulls: developer laptops, CI runners, Kubernetes nodes, air-gapped builders, edge boxes.
  2. Identify the TLS termination point. Is it registry itself, Nginx, Envoy/Ingress, or a load balancer?
  3. Confirm the hostname(s) clients use. Include port variants. Include internal DNS names.
  4. Issue a leaf cert with correct SANs. Don’t negotiate with reality here; if clients use three hostnames, SAN needs three hostnames.
  5. Build a fullchain PEM. Concatenate leaf then intermediates (no root unless your platform specifically expects it; many don’t).
  6. Configure the TLS endpoint to serve fullchain. Nginx/Envoy/Ingress must present intermediates.
  7. Validate chain from a clean host. Use openssl s_client -showcerts and curl.
  8. Distribute trust anchors properly. Install corporate root CA into OS trust of every pull-capable node; optionally add per-registry CA bundles.
  9. Reload services. Reload Nginx/Ingress. Restart Docker/containerd if required for trust updates.
  10. Remove insecure bypasses. Delete insecure-registries entries and any HTTP registry exceptions.
  11. Automate renewal and reload. Certificate rotation without automation is just future you’s outage plan.
  12. Put a canary in place. A periodic pull from a fresh node pool catches chain regressions early; a minimal sketch follows this list.

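For step 12, a minimal canary sketch to run from a fresh runner or a scheduled job; the registry and image names are this article’s examples, and alerting is left to whatever you already use:

#!/usr/bin/env bash
# Registry TLS canary: fail loudly if chain presentation or trust regresses
set -euo pipefail

REGISTRY="registry.example.com:5000"
IMAGE="${REGISTRY}/team/canary:latest"

# 1. The endpoint must present more than the leaf certificate
certs=$(echo | openssl s_client -connect "${REGISTRY}" \
          -servername "${REGISTRY%%:*}" -showcerts 2>/dev/null \
        | grep -c 'BEGIN CERTIFICATE')
[ "${certs}" -ge 2 ] || { echo "chain looks leaf-only (${certs} cert)"; exit 1; }

# 2. TLS must validate against system trust (a 401 response is fine; exit 60 means TLS failed)
curl -sS -o /dev/null "https://${REGISTRY}/v2/"

# 3. A real pull through the runtime, exactly as CI would do it
docker pull "${IMAGE}" >/dev/null
echo "registry TLS canary OK"
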
Operational checklist: before you blame PKI

  • Does DNS resolve to the expected IP from the failing node?
  • Is the failing node’s clock synced?
  • Are you behind a corporate TLS inspection proxy?
  • Are you using SNI correctly (hostname matches -servername)?
  • Did a load balancer get reconfigured recently?

Operational checklist: before you accept “insecure registries”

  • Have you proven the server is missing intermediates?
  • Have you proven the client is missing the root CA?
  • Have you checked whether the cert SANs match the used hostname?
  • Have you documented the risk and a rollback plan?

If you can’t answer those, you’re not making a decision; you’re making a mess.

FAQ

1) Should I put the root CA in the server’s fullchain file?

Usually no. Servers typically send leaf + intermediates; clients already have (or should have) the root. Sending the root can confuse some clients and wastes bytes. If your environment mandates it, document the exception and test widely.

2) Why does my browser trust the registry but Docker doesn’t?

Browsers cache intermediates and may fetch missing ones automatically. Docker on a minimal host often won’t have those cached and may not fetch them. Treat browser success as “nice,” not as proof.

3) Where do I install a custom CA for Docker?

Best practice is to install your corporate root CA into the OS trust store so everything benefits. Docker also supports per-registry CAs under /etc/docker/certs.d/<host:port>/ca.crt. After changes, restart Docker.

4) I fixed the chain on the server, but Kubernetes nodes still fail pulls. Why?

Kubernetes might use containerd directly, and the nodes might not have your CA installed in the OS store or in the runtime’s expected location. Debug on the node, not on your workstation.

5) What’s the difference between “unknown authority” and “valid for X, not Y”?

“Unknown authority” is a trust-chain problem (CA/root/intermediate). “Valid for X, not Y” is a hostname validation problem (SAN mismatch). They have different fixes; don’t conflate them.

6) Can I temporarily use insecure-registries?

You can, in the same sense you can temporarily disable your smoke detectors while cooking. It might reduce noise, but it also removes a safety control. Use it only with explicit approval and a planned removal date.

7) Do I need to restart Nginx/Ingress after updating cert files?

Yes, you need at least a reload so the process picks up the new certificate and chain. Validate with openssl s_client after reload, not before.

8) How do I avoid this outage during certificate rotation?

Pre-distribute new trust anchors (roots/intermediates) before switching leaf certificates, serve full chains, and validate from clean hosts. Treat CA distribution as a managed artifact, not tribal knowledge.

9) Why does it fail only on brand-new nodes?

Fresh nodes expose missing intermediates and missing root CAs because they don’t have cached certs, and they often have stricter, minimal trust stores. This is why autoscaling is a great auditor.

10) Is a registry just HTTPS, or does Docker require something special?

Transport-wise it’s HTTPS with client expectations. Application-wise it’s the Docker Registry API. If TLS is correct, you’ll typically get 401 Unauthorized from /v2/ when unauthenticated, which is a healthy sign.

Conclusion: next steps that won’t haunt you

Fixing Docker registry TLS errors “properly” is less about Docker and more about being disciplined with PKI. Serve the full chain. Use correct SANs. Install the right trust anchors on every machine that pulls images. Then delete the insecure hacks you added during the panic.

Practical next steps:

  1. Run the fast diagnosis playbook from the failing node and capture outputs for your incident notes.
  2. Update the registry TLS termination to serve leaf + intermediates (verify with openssl s_client -showcerts).
  3. Standardize CA distribution across nodes/runners (OS trust first; /etc/docker/certs.d only as needed).
  4. Add a canary pull from a clean environment to catch chain regressions before your developers do.
  5. Remove insecure-registries and treat any reintroduction as a security exception with an expiration date.

If you do those five things, registry TLS stops being a recurring drama and becomes what it should have been all along: background noise you never hear.
