VPN certificates: do it properly without permanent self-signed pain

The outage doesn’t start with a bang. It starts with a “can’t connect” from one remote engineer, then five, then a sales team stranded at an airport lounge,
and finally your exec who hasn’t used the VPN since the last quarterly incident review. Somewhere in the background, a certificate expired, a chain broke,
or a “temporary” self-signed shortcut metastasized into policy.

Certificates are supposed to remove doubt. In VPN land they often add it: odd OpenSSL errors, mismatched EKUs, phantom revocations, and that one laptop
that insists today is 1970. Let’s build this properly: a sane private PKI, boring rotation, fast diagnosis, and no permanent self-signed pain.

What you’re actually building (and what “self-signed” really costs)

When people say “VPN certificates,” they’re usually mixing three separate problems:
identity, transport security, and lifecycle. The VPN tunnel is the transport; the certificate is the identity document; the PKI is the lifecycle engine.
If you only solve transport (encrypting packets) and ignore lifecycle (issuance, rotation, revocation, auditing), you get the classic “it worked for years”
until a Monday morning expiry takes the whole remote workforce hostage.

Self-signed certificates are not inherently evil. They’re just a promise that you are now the CA operator, incident responder, and compliance program.
If you do it once and document it, it can be fine. If you do it casually, you’ll end up in a place where half the fleet trusts “vpn-ca-old.crt,”
a quarter trusts “vpn-ca-final2.crt,” and the last quarter is running a pinned fingerprint from a Slack message.

In production, the goal is not “use a public CA” or “use self-signed.” The goal is: a stable trust anchor, minimal blast radius, automated renewal,
and a revocation story that matches reality. Most VPN deployments should use a private CA with an offline root and one or more online intermediates.
It’s the same pattern the web PKI uses; we’re just doing it privately and with fewer lawyers.

One sharp rule: never treat certificates like static config. They are credentials with expiration dates, cryptographic constraints, and operational consequences.
If your VPN onboarding is “copy this .ovpn file from a wiki,” you don’t have a VPN program. You have a folklore program.

Interesting facts and historical context

  • X.509 dates back to 1988, designed for directory services, not modern DevOps workflows. We inherited it anyway, like a legacy ERP.
  • TLS replaced SSL both in name and as a protocol; “SSL VPN” still shows up in marketing long after SSL itself became a bad idea.
  • RSA was dominant for decades; today, ECDSA is often preferred for performance and smaller certs, but compatibility still matters in enterprise clients.
  • SHA-1 deprecation wasn’t theoretical—certificate chains broke across ecosystems as trust stores and policies tightened.
  • Let’s Encrypt (2015) normalized automation via ACME; even if you can’t use public certs for VPN, the automation mindset is the real gift.
  • OCSP stapling became common on the web to reduce revocation latency and load; VPNs often skip revocation entirely, then act surprised later.
  • Chrome’s “certificate transparency” era forced public CAs into auditable logs; private PKI doesn’t get that for free, so you must log issuance yourself.
  • Heartbleed (2014) taught ops teams that “rotate everything” is easy to say and brutal to do without automation and inventory.
  • VPNs used to be mostly site-to-site; remote access at scale made per-user/device credentials and short lifetimes far more important.

PKI models that work for VPNs

Model A: private PKI with offline root + online intermediate (recommended)

This is the grown-up model. You keep a root CA offline (ideally literally powered off, stored securely, and used rarely).
You issue an intermediate CA cert from the root. The intermediate signs server and client certificates.
If the intermediate key is compromised, you revoke and replace it without having to re-trust a new root across the world.

Operationally, this model supports:
short-lived end-entity certs, automated renewal, staged rollout of new intermediates, and manageable compromise response.
It also makes auditors less cranky, which is a legitimate SRE objective.
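
To make Model A concrete, here is a minimal sketch that builds a throwaway root, intermediate, and server leaf with plain OpenSSL. Everything specific in it is an assumption for illustration: the build_demo_pki function name, the file names, the 2048-bit RSA keys, the lifetimes, and above all the unencrypted keys (-nodes), which exist only so the demo runs unattended. A real root key would be generated on the offline machine and protected with a passphrase or hardware.

```shell
#!/bin/sh
# Demo only: build root -> intermediate -> server leaf in a scratch directory.
# All names, key sizes, and lifetimes below are illustrative assumptions.
build_demo_pki() {
  dir=$1
  mkdir -p "$dir" || return 1

  # One config file holding an extension profile for each tier.
  cat > "$dir/pki.cnf" <<'EOF' || return 1
[req]
distinguished_name = dn
prompt = no
[dn]
CN = placeholder
[v3_root]
basicConstraints = critical, CA:TRUE
keyUsage = critical, keyCertSign, cRLSign
subjectKeyIdentifier = hash
[v3_intermediate]
basicConstraints = critical, CA:TRUE, pathlen:0
keyUsage = critical, keyCertSign, cRLSign
subjectKeyIdentifier = hash
authorityKeyIdentifier = keyid
[v3_server]
basicConstraints = critical, CA:FALSE
keyUsage = critical, digitalSignature, keyEncipherment
extendedKeyUsage = serverAuth
subjectAltName = DNS:vpn.example.internal
authorityKeyIdentifier = keyid
EOF

  # 1. Root: self-signed and long-lived. In real life this key stays offline.
  openssl req -x509 -new -config "$dir/pki.cnf" -extensions v3_root \
    -newkey rsa:2048 -nodes -keyout "$dir/root.key" -out "$dir/root.crt" \
    -days 3650 -subj "/CN=vpn-root" || return 1

  # 2. Intermediate: CSR signed by the root; pathlen:0 limits it to leaves.
  openssl req -new -config "$dir/pki.cnf" -newkey rsa:2048 -nodes \
    -keyout "$dir/int.key" -out "$dir/int.csr" \
    -subj "/CN=vpn-intermediate-01" || return 1
  openssl x509 -req -in "$dir/int.csr" \
    -CA "$dir/root.crt" -CAkey "$dir/root.key" -CAcreateserial \
    -extfile "$dir/pki.cnf" -extensions v3_intermediate \
    -days 730 -out "$dir/int.crt" || return 1

  # 3. Server leaf: CSR signed by the intermediate, short-lived.
  openssl req -new -config "$dir/pki.cnf" -newkey rsa:2048 -nodes \
    -keyout "$dir/server.key" -out "$dir/server.csr" \
    -subj "/CN=vpn.example.internal" || return 1
  openssl x509 -req -in "$dir/server.csr" \
    -CA "$dir/int.crt" -CAkey "$dir/int.key" -CAcreateserial \
    -extfile "$dir/pki.cnf" -extensions v3_server \
    -days 90 -out "$dir/server.crt" || return 1

  # 4. Verify the leaf: root is the trust anchor, intermediate is untrusted input.
  openssl verify -CAfile "$dir/root.crt" -untrusted "$dir/int.crt" "$dir/server.crt"
}
```

Calling build_demo_pki /tmp/vpn-pki-demo should end with the server certificate reported as OK. The verify step passes the intermediate as -untrusted on purpose: only the root acts as the trust anchor, which is what lets you replace an intermediate later without touching the root in client trust stores.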

Model B: single self-signed CA used for everything (acceptable only at small scale)

One CA key signs all server and client certs. Easy. Also: a single key compromise means “burn it all down and re-enroll the planet.”
Some teams live here for years because it “works,” until a laptop with the CA key in a Downloads folder gets stolen.

Model C: public CA for server identity + private CA for client identity (sometimes useful)

If your VPN endpoint is internet-facing and you want to avoid client trust-store distribution for the server cert, using a public CA for the server certificate
can make sense. But client certificates are still a private identity system; public CAs won’t issue “employee-laptop-3421” certs for your mTLS.
You’ll still run a private CA for client auth.

Be careful: this split model creates two parallel lifecycles. If you can’t operate one lifecycle well, you can’t operate two.

Model D: no certificates at all (WireGuard-style keys)

WireGuard doesn’t use X.509. It uses static public/private keys with a different trust model: you provision peer keys and allowed IPs.
That can be simpler and more reliable—until you need enterprise lifecycle features like centralized issuance, attestation, short lifetimes,
or consistent identity metadata across systems.

If you’re already all-in on WireGuard, you’re still doing “credential lifecycle.” You just moved it out of PKI and into key management and inventory.
The operational discipline is still required.

Design decisions that change outcomes

Pick your trust anchor strategy: one root, multiple intermediates

Trust anchors are expensive to change. Treat the root CA like a platform dependency: stable, boring, and rarely touched.
Rotate intermediates more often. Rotate leaf certificates often. That’s the pyramid.

Use short-lived leaf certificates (and automate renewal)

Long-lived client certificates feel convenient. They are not. They turn offboarding into revocation theater and make key compromise more valuable.
A good target for client leaf certs is days to a few months, depending on device management maturity and how quickly you can re-enroll.

Decide what “identity” means: user, device, or both

If the certificate identity maps to a user only, any device holding the key can impersonate them. If it maps to a device only, user-level access control is harder.
Many organizations do both: device cert for enrollment and posture checks, plus user auth (SSO/OIDC) for session-level identity.
Don’t let the certificate CN become your HR database. Put stable IDs in SAN/URI or custom extensions, and keep authorization elsewhere.
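
As a hedged illustration of "stable IDs in SAN/URI", the sketch below generates a device key and a CSR that carries the device ID as a URI SAN. The make_device_csr name, the file layout, and the urn:example:vpn:device:<id> scheme are all made up for this example; pick whatever identifier format your inventory already uses.

```shell
#!/bin/sh
# Sketch: request a client certificate whose stable ID lives in a URI SAN.
# $1: output directory, $2: device ID (hypothetical, e.g. an inventory number)
make_device_csr() {
  out_dir=$1
  device_id=$2
  cnf="$out_dir/device-$device_id.cnf"

  # Per-device request config; the URN scheme is a placeholder, not a standard.
  cat > "$cnf" <<EOF || return 1
[req]
distinguished_name = dn
req_extensions = v3_req
prompt = no
[dn]
CN = device-$device_id
[v3_req]
subjectAltName = URI:urn:example:vpn:device:$device_id
extendedKeyUsage = clientAuth
EOF

  # Unencrypted key (-nodes) only so the sketch runs unattended.
  openssl req -new -config "$cnf" -newkey rsa:2048 -nodes \
    -keyout "$out_dir/device-$device_id.key" \
    -out "$out_dir/device-$device_id.csr"
}
```

A CSR is only a request: whether the URI SAN and EKU end up in the issued certificate depends on how the CA is configured to copy or override requested extensions, so check the signed result, not just the CSR.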

Revocation: either do it right or don’t pretend

CRLs and OCSP exist for a reason. But revocation is operationally tricky in VPNs because clients may be offline, and VPN servers must fetch and cache status.
If you can’t guarantee fresh revocation information, your real security control becomes certificate lifetime. Short lifetimes reduce the need for revocation.

Algorithm choices: ECDSA vs RSA, and what your clients actually support

ECDSA is fast and compact. RSA is ubiquitous and boring. The right answer is often “RSA for maximum compatibility” unless you’ve proven your client estate
supports ECDSA cleanly. Mixing algorithms across chain elements can also trip up older stacks. Test with the worst devices you still support.

One quote you should tattoo onto your runbooks

Hope is not a strategy. (paraphrased idea often used in engineering and operations circles)

If your certificate plan involves hoping people renew before expiry, hoping revocations propagate, or hoping no one copies the CA key around,
you’re not doing PKI. You’re doing vibes.

Practical tasks: commands, outputs, decisions

These are real tasks you can run today. Each one includes: the command, what the output means, and what decision you make based on it.
Use them during incidents, audits, and “why is this client failing while others work?” conversations.

Task 1: Check when a VPN server certificate expires (leaf cert)

cr0x@server:~$ openssl x509 -in /etc/openvpn/pki/issued/vpn-server.crt -noout -subject -issuer -dates
subject=CN = vpn.example.internal
issuer=CN = vpn-intermediate-01
notBefore=Sep 10 00:00:00 2025 GMT
notAfter=Dec  9 00:00:00 2025 GMT

Meaning: This is the server leaf cert, issued by your intermediate. It expires on Dec 9.
Decision: If “notAfter” is within your rotation window (say 14–30 days), renew now and plan rollout.
If it’s already expired, your incident is likely “clients cannot connect” with TLS errors.
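
If you would rather script that decision than read dates by eye, OpenSSL's -checkend flag does the arithmetic for you. A minimal sketch, assuming a helper named needs_renewal (the name and the window you pass in are yours to choose):

```shell
#!/bin/sh
# Return 0 (true) when the certificate expires within the given number of days.
# $1: path to a PEM certificate, $2: rotation window in days
needs_renewal() {
  # -checkend exits 0 if the cert is still valid N seconds from now, 1 if not.
  # An unreadable file also exits non-zero, so it is reported as "needs renewal".
  ! openssl x509 -in "$1" -noout -checkend "$(( $2 * 86400 ))" >/dev/null 2>&1
}
```

For example, needs_renewal /etc/openvpn/pki/issued/vpn-server.crt 30 returns true once the cert is inside a 30-day window, which makes it easy to wire into cron or a CI job.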

Task 2: Verify a certificate chain against the CA bundle the client uses

cr0x@server:~$ openssl verify -CAfile /etc/openvpn/pki/ca-bundle.pem /etc/openvpn/pki/issued/vpn-server.crt
/etc/openvpn/pki/issued/vpn-server.crt: OK

Meaning: The server cert chains cleanly to the CA bundle you provided (root + intermediate(s)).
Decision: If you get “unable to get local issuer certificate,” your bundle is wrong or incomplete; fix packaging before touching crypto.

Task 3: Inspect SAN and Extended Key Usage (EKU) for server auth

cr0x@server:~$ openssl x509 -in /etc/openvpn/pki/issued/vpn-server.crt -noout -text | sed -n '/Subject Alternative Name/,+1p;/Extended Key Usage/,+1p'
            X509v3 Subject Alternative Name:
                DNS:vpn.example.internal, DNS:vpn
            X509v3 Extended Key Usage:
                TLS Web Server Authentication

Meaning: Clients validating hostname will use SAN, not CN. EKU includes server authentication.
Decision: If SAN is missing the name clients connect to, fix issuance templates. If EKU is wrong, some clients will refuse the cert entirely.
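
The same check can be turned into a yes/no test for a release gate. This is a sketch, assuming OpenSSL 1.1.1 or newer (for the -ext option) and a hypothetical helper name cert_has_san_dns; it does exact, case-insensitive matching only and does not expand wildcard SANs.

```shell
#!/bin/sh
# Return 0 if the certificate lists the given DNS name in its SAN extension.
# $1: path to a PEM certificate, $2: DNS name clients connect to
cert_has_san_dns() {
  openssl x509 -in "$1" -noout -ext subjectAltName 2>/dev/null |
    tr ',' '\n' |                                   # one SAN entry per line
    sed 's/^[[:space:]]*//; s/[[:space:]]*$//' |    # trim padding
    grep -Fxiq "DNS:$2"                             # whole-line, literal match
}
```

Whole-line matching matters here: a substring match would happily accept "vpn" because "vpn-old.example.internal" contains it.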

Task 4: Confirm the private key matches the certificate

cr0x@server:~$ openssl x509 -in /etc/openvpn/pki/issued/vpn-server.crt -noout -modulus | openssl md5
MD5(stdin)= 4c3c9c0d27a8e8b6d1a6c5f4e0a1b2c3
cr0x@server:~$ openssl rsa -in /etc/openvpn/pki/private/vpn-server.key -noout -modulus | openssl md5
MD5(stdin)= 4c3c9c0d27a8e8b6d1a6c5f4e0a1b2c3

Meaning: Matching hashes means the key and cert correspond. The modulus comparison applies to RSA keys only; for ECDSA keys, compare the public keys instead.
Decision: If they differ, you deployed a mismatched key/cert pair. Fix that before debugging anything else.
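
The modulus trick above only applies to RSA. A sketch of an algorithm-agnostic variant compares the public key extracted from the certificate with the public key derived from the private key. The helper name key_matches_cert is an assumption, and the sketch expects an unencrypted key (an encrypted one would prompt for a passphrase).

```shell
#!/bin/sh
# Return 0 if the private key belongs to the certificate (RSA or EC).
# $1: path to a PEM certificate, $2: path to a PEM private key
key_matches_cert() {
  # Both commands emit the public key in the same PEM form.
  cert_pub=$(openssl x509 -in "$1" -noout -pubkey 2>/dev/null)
  key_pub=$(openssl pkey -in "$2" -pubout 2>/dev/null)
  # Refuse to call two empty results a match (missing or unreadable files).
  [ -n "$cert_pub" ] && [ "$cert_pub" = "$key_pub" ]
}
```

This is the kind of check worth running in CI/CD before a deploy, so a mismatched pair never reaches the VPN host.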

Task 5: Test the TLS handshake from a client network vantage point

cr0x@server:~$ openssl s_client -connect vpn.example.internal:443 -servername vpn.example.internal -showcerts -verify 5
depth=2 CN = vpn-root
verify return:1
depth=1 CN = vpn-intermediate-01
verify return:1
depth=0 CN = vpn.example.internal
verify return:1
Verify return code: 0 (ok)

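Meaning: The handshake succeeds, the chain verifies, and SNI is correct. This check only works where the endpoint speaks plain TLS (an SSL VPN gateway or a TLS-terminating load balancer); a native OpenVPN port wraps TLS in its own protocol and will not answer s_client.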
Decision: If verification fails here but works on the server, you likely have a middlebox, wrong endpoint, or mismatched certificate served by the load balancer.

Task 6: See what certificate your VPN endpoint is actually serving (load balancer check)

cr0x@server:~$ echo | openssl s_client -connect 203.0.113.10:443 -servername vpn.example.internal 2>/dev/null | openssl x509 -noout -subject -issuer -dates
subject=CN = vpn-old.example.internal
issuer=CN = vpn-intermediate-00
notBefore=Jun  1 00:00:00 2025 GMT
notAfter=Sep  1 00:00:00 2025 GMT

Meaning: Your public IP serves an old cert from an old intermediate and it’s expired.
Decision: Stop rotating certs on the backend host and forgetting the front door. Update the load balancer/ingress cert immediately.

Task 7: Check OpenVPN server logs for TLS-auth and cert verification errors

cr0x@server:~$ sudo tail -n 30 /var/log/openvpn/server.log
VERIFY ERROR: depth=0, error=certificate has expired: CN=alice-laptop-17
TLS_ERROR: BIO read tls_read_plaintext error
TLS Error: TLS object -> incoming plaintext read error

Meaning: Client certificate expired; server refuses it.
Decision: Re-issue the client certificate (or re-enroll via your device management). If many users hit this, your rotation program is broken.

Task 8: Inspect a client certificate for expiration and EKU (client auth)

cr0x@server:~$ openssl x509 -in /tmp/alice.crt -noout -subject -dates -ext extendedKeyUsage
subject=CN = alice-laptop-17
notBefore=Oct  1 00:00:00 2025 GMT
notAfter=Oct 31 00:00:00 2025 GMT
X509v3 Extended Key Usage:
    TLS Web Client Authentication

Meaning: Short-lived client cert with correct EKU.
Decision: If EKU lacks “client authentication,” some servers reject it. If expiry is too short for your enrollment pipeline, increase lifetime or fix automation.

Task 9: Validate the CRL you’re distributing (and its freshness)

cr0x@server:~$ openssl crl -in /etc/openvpn/pki/crl.pem -noout -lastupdate -nextupdate -issuer
lastUpdate=Dec 27 00:00:00 2025 GMT
nextUpdate=Jan  3 00:00:00 2026 GMT
issuer=CN = vpn-intermediate-01

Meaning: Your CRL is current and has a next update a week out.
Decision: If nextUpdate is in the past, many servers will treat the CRL as stale and either fail closed (good security, bad Monday) or fail open (bad security, fewer tickets).
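
To alert before the CRL goes stale rather than after, you can turn nextUpdate into "hours remaining". A minimal sketch, assuming GNU date (for -d parsing) and two hypothetical helper names, hours_until and crl_hours_left:

```shell
#!/bin/sh
# Whole hours from a reference time until a date string printed by OpenSSL.
# $1: date string such as "Jan  3 00:00:00 2026 GMT", $2: reference epoch seconds
hours_until() {
  target=$(date -u -d "$1" +%s) || return 2   # GNU date parses OpenSSL's format
  echo $(( (target - $2) / 3600 ))            # negative means already past
}

# Hours left before the CRL's nextUpdate, measured from now.
# $1: path to a PEM CRL
crl_hours_left() {
  next=$(openssl crl -in "$1" -noout -nextupdate 2>/dev/null | cut -d= -f2)
  [ -n "$next" ] || return 2                  # unreadable CRL or no nextUpdate
  hours_until "$next" "$(date -u +%s)"
}
```

A monitoring job could call crl_hours_left /etc/openvpn/pki/crl.pem and page when the number drops below whatever margin your CRL publishing schedule needs; that margin is a local decision, not something this sketch can pick for you.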

Task 10: Confirm your VPN server is actually using the CRL

cr0x@server:~$ sudo grep -R "crl-verify" -n /etc/openvpn/server.conf
42:crl-verify /etc/openvpn/pki/crl.pem

Meaning: OpenVPN is configured to check revocations.
Decision: If it’s missing, your offboarding relies entirely on expiration. That can be acceptable with very short-lived certs; otherwise it’s a security gap you should admit out loud.

Task 11: Check the system trust store for the right root/intermediate (Linux client/server)

cr0x@server:~$ sudo openssl x509 -in /usr/local/share/ca-certificates/vpn-root.crt -noout -subject -fingerprint -sha256
subject=CN = vpn-root
sha256 Fingerprint=7A:3B:1E:2C:9F:11:5D:AA:0B:1C:44:3D:9E:8F:2A:67:0E:91:2B:7C:7B:77:64:93:2F:10:9C:54:48:99:AA:01

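Meaning: This is the SHA-256 fingerprint of the root certificate staged for the trust store on this host. Compare it against the fingerprint you recorded when the root was created.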
Decision: If the fingerprint doesn’t match what you intend, stop. You might be trusting the wrong CA. Fix distribution, then re-validate chains.

Task 12: Enumerate soon-to-expire certs on the VPN server

cr0x@server:~$ find /etc/openvpn/pki/issued -name "*.crt" -exec sh -c 'for c; do printf "%s %s\n" "$(date -u -d "$(openssl x509 -in "$c" -noout -enddate | cut -d= -f2)" +%F)" "$c"; done' sh {} + | sort
2025-12-09 /etc/openvpn/pki/issued/vpn-server.crt
2026-01-15 /etc/openvpn/pki/issued/metrics-exporter.crt

Meaning: A quick inventory of leaf cert expiry dates, soonest first. The dates are converted to ISO format (GNU date assumed) because sorting raw month names would order them alphabetically, not chronologically.
Decision: If you can’t produce this output on demand, you don’t have certificate management—just certificate surprises.

Task 13: Detect “wrong time” failures on clients (NTP sanity)

cr0x@server:~$ timedatectl
               Local time: Sun 2025-12-28 10:42:11 UTC
           Universal time: Sun 2025-12-28 10:42:11 UTC
                 RTC time: Sun 2025-12-28 10:42:10
                Time zone: Etc/UTC (UTC, +0000)
System clock synchronized: yes
              NTP service: active
          RTC in local TZ: no

Meaning: Time is synchronized. Certificates are time-sensitive; “not yet valid” is often just a broken clock.
Decision: If synchronized is “no,” fix NTP before blaming PKI. A laptop with a drifting clock can’t be reasoned with.

Task 14: Confirm intermediate CA constraints (basicConstraints, keyCertSign)

cr0x@server:~$ openssl x509 -in /etc/openvpn/pki/ca/vpn-intermediate-01.crt -noout -text | sed -n '/Basic Constraints/,+1p;/X509v3 Key Usage/,+1p'
            X509v3 Basic Constraints: critical
                CA:TRUE, pathlen:0
            X509v3 Key Usage: critical
                Certificate Sign, CRL Sign

Meaning: This certificate is actually allowed to be a CA and sign leaf certs and CRLs. Path length 0 means it cannot create another intermediate beneath it.
Decision: If CA:FALSE or missing key usage, you’ll see baffling chain failures. Re-issue the intermediate properly; don’t hack around it.

Joke #1: Certificates are like milk—fine until they aren’t, and then your morning gets weird fast.

Fast diagnosis playbook

When VPN connections fail, people reach for random OpenSSL incantations and start rotating keys like it’s a ritual. Don’t. Diagnose like an SRE:
confirm the failure domain, isolate the layer, then change the smallest thing that can fix it.

First: is it reachability or TLS?

  • Check port reachability (security group, firewall, route, DNS). If clients can’t reach the socket, certificates are irrelevant.
  • Then check handshake with openssl s_client or the VPN client’s verbose logs.

Second: identify which certificate is actually presented

  • Hit the public endpoint/IP directly with SNI and dump the served leaf cert.
  • If you’re behind a load balancer or ingress controller, assume the LB is serving the cert until proven otherwise.

Third: validate the chain, names, and key usages

  • Does the client trust the correct root and intermediate?
  • Does the server cert have the SAN matching the name clients use?
  • Are EKU and key usages correct for server auth?

Fourth: check time and revocation

  • Client clocks: “not yet valid” almost always means bad time.
  • CRL/OCSP: stale CRL or unreachable OCSP can cause intermittent failures depending on client behavior.

Fifth: decide whether you’re failing open or closed

Some VPN stacks treat revocation failures as hard failures; others let connections through. Neither is universally correct.
What’s unacceptable is not knowing which behavior you’re running in production.

Common mistakes: symptom → root cause → fix

1) Symptom: “certificate verify failed” on some clients only

Root cause: Missing intermediate in the served chain, or a client trust store missing the intermediate/root.

Fix: Serve the full chain from the VPN endpoint (leaf + intermediate). Distribute a stable CA bundle to clients and version it.

2) Symptom: everyone breaks at once, right after a quiet weekend

Root cause: Expired server certificate or expired intermediate CA.

Fix: Monitor expiry dates; rotate before the window. Keep overlap: deploy new certs while the old ones are still valid, and test the actual served endpoint.

3) Symptom: “not yet valid” or weird intermittent failures on a single laptop

Root cause: Client clock skew or broken NTP.

Fix: Enforce NTP via device management. For unmanaged devices, put a “check your time” step in support scripts and don’t be shy about it.

4) Symptom: handshake fails after you “upgraded security” to ECDSA

Root cause: Legacy clients or TLS libraries don’t support your chosen curves/ciphers consistently.

Fix: Validate against your worst supported client. If you must use ECDSA, keep RSA as a compatibility option or upgrade the clients.

5) Symptom: revoked users can still connect for days

Root cause: No revocation checking, stale CRL, or VPN server not reloading CRL.

Fix: Enable CRL verification, refresh CRL on a schedule, and reload the VPN service (or confirm it auto-reloads). Or switch to short-lived certs and accept expiration as your control.

6) Symptom: after renewal, clients get “hostname mismatch”

Root cause: SAN doesn’t include the DNS name clients use; CN is ignored by modern validation.

Fix: Issue server certs with correct SANs. Standardize the connect name (one canonical DNS entry) and stop letting people connect to random aliases.

7) Symptom: “private key does not match certificate” during deploy

Root cause: Automation pulled key from one run and cert from another; or someone copied files manually.

Fix: Deploy key+cert as an atomic pair, with checks in CI/CD (modulus or public key hash match). Stop SCP-based PKI.

8) Symptom: sudden trust failures after “cleaning up old CAs”

Root cause: You removed a still-needed intermediate/root from client trust stores while leaf certs still chain to it.

Fix: Plan CA transitions like migrations: overlap intermediates, re-issue leaves, validate fleet adoption, then remove old trust anchors.
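
The symptom list above can double as a first-pass triage helper for log lines. This is only a sketch: the function name classify_vpn_cert_error and the category labels are invented here, and the patterns cover just the error strings mentioned in this article plus OpenSSL's usual wording for revoked certificates and expired CRLs. Anything it does not recognise is reported as unknown rather than guessed.

```shell
#!/bin/sh
# Map a log line or error message to a likely failure category.
# $1: a single error line from the VPN server or client log
classify_vpn_cert_error() {
  case "$1" in
    *"CRL has expired"*)                        echo stale-crl ;;
    *"certificate revoked"*)                    echo revoked ;;
    *"certificate has expired"*)                echo expired ;;
    *"not yet valid"*)                          echo clock-skew ;;
    *"unable to get local issuer certificate"*) echo chain ;;
    *"hostname mismatch"*)                      echo hostname ;;
    *"private key does not match"*)             echo key-mismatch ;;
    # Generic failure with no more specific reason: suspect the chain first.
    *"certificate verify failed"*)              echo chain ;;
    *)                                          echo unknown ;;
  esac
}
```

Specific patterns come before the generic "certificate verify failed" on purpose, because that phrase often wraps a more precise reason on the same line.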

Three corporate mini-stories from the trenches

Mini-story 1: The incident caused by a wrong assumption

A mid-sized company ran OpenVPN behind a load balancer. They renewed the VPN certificate on the OpenVPN hosts, tested from inside the VPC,
saw green handshakes, and closed the ticket. Then Monday arrived and remote users couldn’t connect.

The wrong assumption was subtle: “the server presents the certificate.” In reality the load balancer terminated TLS and presented its own certificate.
The backend OpenVPN servers never mattered for certificate presentation, so their renewal had zero effect on the user experience.

Support spent hours collecting client logs. Engineers rotated client configs, restarted OpenVPN, and even rolled back a seemingly unrelated kernel update.
None of it changed the fact that the public IP still served an expired certificate from last quarter.

The fix took minutes: upload the renewed certificate chain to the load balancer, confirm with openssl s_client from a laptop off-network,
then schedule a postmortem where “test the real endpoint” became a rule, not advice.

Mini-story 2: The optimization that backfired

Another organization decided certificate validation was “slow.” They had high VPN connection churn due to mobile clients, and someone noticed CPU spikes
during peak reconnect windows. The proposed optimization: disable revocation checks and increase client certificate lifetime from 30 days to two years.

Performance improved. Tickets dropped. The team congratulated itself on “removing complexity.” Then a contractor’s laptop went missing.
Offboarding was done in HR, accounts were disabled in SSO, and everyone assumed the VPN access was effectively gone too.

It wasn’t. The VPN relied on the client certificate, not SSO. Without revocation checks and with a two-year lifetime, the stolen laptop’s VPN key remained
a valid bearer token. The organization got lucky: they detected unusual traffic patterns and contained it before it became a headline.

The eventual compromise response was painful: emergency CA re-issuance, forced re-enrollment, and a whole lot of “why did we do this” meetings.
The lesson was boring: if you “optimize” by removing security lifecycle controls, you didn’t optimize—you just moved the cost into incident response.

Mini-story 3: The boring but correct practice that saved the day

A global company ran a private PKI with an offline root and two intermediates: “current” and “next.” Every quarter they issued new server certificates
and rolled out the “next” intermediate to clients before it was used for anything.

This looked like bureaucracy. It required maintaining a CA bundle, writing distribution code for endpoints, and keeping inventory.
Nobody loved it, but it was predictable.

One day an intermediate key was suspected of exposure due to a misconfigured backup process. No confirmed compromise, but enough smoke to act.
They revoked the intermediate, promoted the “next” intermediate to “current,” and re-issued server and client certificates on a tight schedule.

The business barely noticed. VPN access continued because clients already trusted the new intermediate, and leaf certificates were short-lived anyway.
The “boring” quarterly practice turned a scary situation into a controlled maintenance event. That’s what good operations feels like: quiet.

Joke #2: PKI is the only place where “just trust me” is implemented as a file you must distribute to every device on Earth.

Checklists / step-by-step plan

Step-by-step plan: build a VPN PKI that won’t hate you back

  1. Define the scope.
    Decide whether certs identify users, devices, or both. Write it down. If you can’t explain it in two sentences, you will encode confusion into certificates.
  2. Create an offline root CA.
    Generate a root key on a secure machine. Keep it offline. Use it only to issue intermediates. Store backups securely with access controls.
  3. Create an online intermediate CA for VPN issuance.
    Use it to sign server and client leaf certificates. Set pathlen:0 to prevent accidental intermediate sprawl.
  4. Standardize your naming.
    Pick one canonical DNS name for the VPN endpoint. Put it in SAN. Stop letting people connect to random hostnames and IPs.
  5. Set lifetimes deliberately.
    Root: long (years). Intermediate: medium (1–3 years). Leaf server: shorter (months). Leaf client: short (days to a few months).
  6. Automate issuance and renewal.
    Manual issuance is fine for a lab. In production it becomes a queue. Integrate with your device management or enrollment portal.
  7. Decide revocation strategy.
    If you can keep CRLs fresh and enforced, do it. If not, rely on short-lived certs and accept that revocation is limited.
  8. Instrument expiry monitoring.
    Alerts for: server leaf expiry, intermediate expiry, CRL nextUpdate. Don’t alert on 180 days remaining; alert on actionable windows.
  9. Test the real edge.
    Validate what the public endpoint actually serves (LB, ingress). Make this a release gate.
  10. Practice rotation.
    Schedule routine certificate rotations. If rotation only happens during emergencies, it will always be an emergency.
  11. Log issuance and enrollment.
    Keep an audit trail: who requested a cert, what identity it represents, when it was issued, when it was revoked or expired.
  12. Document the “break glass” path.
    Who can sign a new intermediate, where the root lives, how to distribute new trust bundles, and what clients need to re-enroll.
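
For step 8, "actionable windows" can be as simple as two thresholds per asset class. The sketch below is one way to express that, assuming GNU date and two hypothetical helpers, expiry_severity and cert_days_left. The 30-day and 14-day values mentioned afterwards echo the 14–30 day rotation window from Task 1 and are examples, not recommendations.

```shell
#!/bin/sh
# Map "days remaining" to an alert level.
# $1: days left (may be negative), $2: warn threshold, $3: critical threshold
expiry_severity() {
  if [ "$1" -le "$3" ]; then
    echo CRIT
  elif [ "$1" -le "$2" ]; then
    echo WARN
  else
    echo OK
  fi
}

# Whole days between now and a certificate's notAfter.
# $1: path to a PEM certificate
cert_days_left() {
  end=$(openssl x509 -in "$1" -noout -enddate 2>/dev/null | cut -d= -f2)
  [ -n "$end" ] || return 2                       # unreadable certificate
  end_epoch=$(date -u -d "$end" +%s) || return 2  # GNU date assumed
  echo $(( (end_epoch - $(date -u +%s)) / 86400 ))
}
```

A monitoring loop would call expiry_severity "$(cert_days_left "$cert")" 30 14 for server leaves and use wider thresholds for intermediates, since replacing an intermediate takes far longer than renewing a leaf.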

Deployment checklist: renewing the VPN server certificate without downtime

  • Confirm current served certificate from outside the network (LB vs backend).
  • Issue new server leaf certificate with correct SANs and EKU.
  • Deploy full chain (leaf + intermediate) where TLS terminates.
  • Reload/restart service safely; confirm new cert is served.
  • Run handshake checks from at least two networks (corp + external).
  • Monitor connection success rate and error logs for 30–60 minutes.

Offboarding checklist: removing VPN access when someone leaves

  • Disable user auth (SSO) if used.
  • Revoke client cert (if revocation enforced) and publish updated CRL.
  • If revocation isn’t reliable, ensure client cert lifetime is short and force re-enrollment on next use.
  • Invalidate device posture or MDM enrollment if applicable.
  • Audit: confirm the identity can’t establish a VPN session after offboarding actions.

FAQ

1) Should I use a public CA certificate for my VPN server?

Sometimes. If you have many unmanaged clients and want them to trust the server without distributing a private root, public CA can help for the server side.
But client certificates (mTLS) still require private issuance and lifecycle, so don’t confuse “public server cert” with “PKI solved.”

2) Is “self-signed” the same as “private CA”?

Not exactly. A self-signed leaf is a dead-end you pin manually. A private CA is a managed trust anchor that signs other certs and can support rotation and policy.
People use “self-signed” to mean “not a public CA,” but operationally the difference is huge.

3) Do I really need an offline root CA?

If you care about blast radius, yes. An offline root is cheap insurance: you rarely touch it, and it turns “intermediate compromise” from apocalypse into migration.
If you’re tiny and fully managed, you can start without it—just don’t pretend that’s the final architecture.

4) How short should client certificate lifetimes be?

Short enough that stolen keys age out quickly, long enough that your enrollment pipeline doesn’t melt. Many teams land between 7 and 90 days.
If you can auto-renew silently via device management, you can go shorter.

5) If I use short-lived certs, can I skip revocation?

You can, but own the trade-off. Short lifetimes reduce the value of revocation but don’t eliminate it—especially for high-risk roles.
If you skip revocation, tighten lifetimes and have a path to force re-enrollment when needed.

6) Why do some clients fail after a CA rotation while others work?

Because trust distribution is uneven. Some devices got the new CA bundle; others didn’t. Or they cached an old chain.
This is why you version CA bundles, track enrollment state, and roll out trust anchors before switching issuance.

7) What’s the single most common cause of “hostname mismatch”?

Missing SAN entries. Modern TLS validation uses SAN; CN is not reliably used.
Fix your issuance templates and standardize the VPN endpoint DNS name.

8) Can I store VPN client keys in the user profile on laptops?

You can, but it’s weaker than hardware-backed storage. If the private key can be copied, it can be reused elsewhere.
Prefer OS keychains, TPM-backed keys, or managed device certificates where feasible.

9) Why did enabling CRL checking cause an outage?

Usually because the CRL expired or became unreachable and your server/client fails closed. That’s not a reason to disable revocation;
it’s a reason to automate CRL publishing, monitor nextUpdate, and test reload behavior.

10) Do I need OCSP for VPNs?

Not always. CRLs are simpler operationally in many private environments. OCSP adds moving parts and availability dependencies.
If you can’t keep OCSP highly available, it can become a self-inflicted denial of service.

Conclusion: next steps you can do this week

Doing VPN certificates properly isn’t about buying a product or memorizing OpenSSL flags. It’s about choosing a trust model you can operate under stress,
then building a lifecycle around it: issuance, rotation, revocation (or short lifetimes), and verification at the real edge.

Practical next steps:

  1. Run the expiry inventory command on your VPN servers and write down what expires in the next 60 days.
  2. From outside your network, confirm what certificate your public endpoint actually serves.
  3. Decide on offline root + online intermediate, and schedule the migration if you’re not there yet.
  4. Set client certificate lifetimes to something you can rotate without heroics, then automate renewal.
  5. Pick a revocation approach and test it during business hours, not during an incident.
  6. Turn the “Fast diagnosis playbook” into a runbook page and make on-call follow it before changing anything.

If you do these things, “VPN cert problem” stops being a recurring genre of outage and becomes routine maintenance. That’s the whole point.
