Proxmox LDAP/AD Login Fails: Where the Auth Chain Breaks and How to Fix

When Proxmox LDAP/AD login fails, you don’t have “an LDAP problem.” You have a broken chain: name resolution, network, TLS, bind, search, group resolution, user mapping, permissions, and finally Proxmox’s own auth stack. One weak link and the UI shrugs with “authentication failure” like it’s doing you a favor.

This guide walks the chain in the order it actually fails in production, with commands you can run right now, what the output means, and what decision to make next. No mystery meat. No “try restarting” as a worldview.

The auth chain: what Proxmox really does when you log in

Proxmox VE (PVE) authentication is not a single subsystem. It’s a set of realms plus a permission model plus some glue. If you don’t understand the glue, you’ll fix the wrong layer and “it works on my laptop” your way into an outage.

Step 0: Which realm are you actually using?

Proxmox supports realms like:

  • pam: local Linux accounts authenticated via PAM (which might be backed by LDAP/SSSD, but that’s a different chain).
  • pve: local Proxmox users stored in /etc/pve/user.cfg.
  • ldap: external directory via LDAP (OpenLDAP, AD via LDAP).
  • ad: Active Directory realm type (still LDAP under the hood, but with AD-ish defaults like group handling).

Users log in as user@realm. If your team keeps typing user@pam out of muscle memory, you can tune LDAP until the heat death of the universe and it won’t help.

Step 1: Can PVE reach the directory service?

Before authentication, Proxmox must resolve the DC/LDAP hostnames, route to them, and open a TCP connection. For AD, that often means picking the right DC (or multiple) and not being tricked by DNS.

Step 2: TLS (LDAPS or StartTLS) succeeds or it doesn’t

LDAP over TLS is the first place where “works in testing” goes to die in prod. Certificates, SNI, SANs, intermediate chains, and clocks all matter. If LDAPS fails, Proxmox often reports it as an authentication error because the bind never happened.

Step 3: Bind works

Proxmox can bind as:

  • Anonymous (rarely allowed in AD; often disabled in hardened LDAP).
  • Service account (the common case; used to search for users and groups).
  • Direct user bind (user DN derived from template, then bind as the user).

A successful bind proves the credentials and (often) that TLS negotiation was fine.

Step 4: Search finds the user entry

This is where base DN and filters matter. If the search returns zero entries, Proxmox can’t map the username you typed to a DN, and you’ll get “login failed” even if the password is correct.

Step 5: Group/role mapping decides what you can do

Even if auth succeeds, authorization can fail in ways that feel like auth failure (e.g., you “log in” but see nothing; or API calls 401/403). Proxmox permissions are RBAC-ish and separate from authentication. AD group membership must map to Proxmox groups/roles, or you’ve created a beautiful identity with zero rights—like issuing a badge that opens exactly none of the doors.

Step 6: Proxmox cluster config consistency

Proxmox stores auth config in the cluster filesystem (/etc/pve). If the cluster is partitioned or a node is in a weird state, one node may have realm config that others don’t. Then only “some logins” fail, which is everyone’s favorite category of incident.
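
A quick sanity check before you trust what any one node tells you (a minimal sketch; the mount point and unit name assume a standard PVE install):

cr0x@server:~$ findmnt /etc/pve
cr0x@server:~$ systemctl status pve-cluster --no-pager

If /etc/pve isn’t a mounted fuse filesystem or pve-cluster isn’t active, fix the cluster before touching realm settings.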

One paraphrased idea from Werner Vogels (Amazon CTO): “You build it, you run it.” If you run it, you need to debug it without guesswork.

Fast diagnosis playbook (check first/second/third)

This is the order that reduces time-to-truth. It’s intentionally boring. Boring is fast.

First: identify the realm and reproduce from CLI

  • Confirm the username format (alice@ad vs alice@ldap vs alice@pam).
  • Use Proxmox CLI to validate the realm config exists on the node handling the login.
  • Test the LDAP bind/search from the node using ldapsearch (or openssl s_client for TLS).

Second: prove network + DNS + time

  • DNS resolves DCs correctly from the Proxmox node.
  • TCP connects to 389/636 and no firewall/NAT weirdness exists.
  • Clock skew is sane (TLS and Kerberos don’t forgive time travel).

Third: validate bind DN, base DN, user filter, and group attributes

  • Bind credentials are correct and not locked/expired.
  • Base DN is correct and includes the user objects you expect.
  • User filter matches the directory schema (AD vs OpenLDAP differences).
  • Group membership attributes match what Proxmox expects for your realm type.

Fourth: confirm authorization mapping in Proxmox

  • User is present in Proxmox user/group mapping (or auto-provisioned if configured).
  • Roles and ACLs grant at least PVEAuditor or similar where needed.

Fifth: escalate to logs and packet capture

  • Read pveproxy and pvedaemon logs for the exact error string.
  • Use tcpdump to confirm TLS handshake and whether binds happen at all.

One-sentence discipline: don’t touch Proxmox until you can reproduce the failure with ldapsearch from the failing node.

Interesting facts and context (why this is messier than it should be)

  • Fact 1: LDAP dates back to the early 1990s as a lightweight alternative to X.500. “Lightweight” has since become a personality trait, not a guarantee.
  • Fact 2: Active Directory speaks LDAP, but it’s not “just LDAP.” AD mixes LDAP with Kerberos, DNS SRV records, and policy behaviors that surprise people who grew up on OpenLDAP.
  • Fact 3: LDAPS (LDAP over 636) historically competed with StartTLS (LDAP + upgrade). Many enterprises still use both, depending on which client was written in which decade.
  • Fact 4: AD group membership is commonly retrieved from the user’s memberOf attribute, but nested groups require extra logic (or matching rules) that not every client implements the same way.
  • Fact 5: Microsoft hardened AD LDAP behaviors over time (signing, channel binding, weaker cipher deprecations). A Proxmox upgrade can “break LDAP” when it’s actually the DC enforcing modern security.
  • Fact 6: Certificates fail more often due to name mismatch than due to cryptography. Humans are excellent at naming things wrong, especially under pressure.
  • Fact 7: Proxmox’s cluster filesystem (pmxcfs) makes auth config consistent—until the cluster is unhealthy, at which point consistency becomes a polite suggestion.
  • Fact 8: Many LDAP “invalid credentials” errors are actually “bind DN not found” or “account locked,” because servers intentionally return vague errors to avoid leaking info.

Practical tasks: commands, outputs, decisions (12+)

Run these on the Proxmox node where the login fails. If you have a cluster behind a load balancer, that detail matters more than anyone wants to admit.

Task 1: List configured realms and verify the right one exists

cr0x@server:~$ pveum realm list
Realm    Type  Comment
pam      pam
pve      pve
ad       ad    Corporate AD

What it means: The ad realm exists on this node. If it doesn’t, you’re debugging the wrong node or the cluster config isn’t consistent.

Decision: If the realm is missing on one node, fix cluster health or re-create the realm from a healthy node. Don’t “just add it locally”; Proxmox wants this in /etc/pve.

Task 2: Dump realm config (sanity-check bind DN, base DN, server)

cr0x@server:~$ cat /etc/pve/domains.cfg
ad: ad
    base_dn dc=corp,dc=example,dc=com
    bind_dn svc_pve@corp.example.com
    capath /etc/ssl/certs
    comment Corporate AD
    default 0
    domain corp.example.com
    group_classes group
    port 636
    secure 1
    server1 dc01.corp.example.com
    user_attr sAMAccountName

What it means: This is exactly what Proxmox will use, and because it lives in the cluster filesystem it should be identical on every node. Pay attention to server1, port, secure, base_dn, and the user attribute.

Decision: If server1 is a short name without a matching cert SAN, expect TLS failures. Use an FQDN that matches the cert.
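
If server1 needs to change, that’s a one-line realm edit, and adding a fallback DC at the same time avoids the single-DC trap described later (a sketch; the realm name and dc01 come from the examples above, dc02 is a hypothetical second DC, and server1/server2 are the two server slots a realm supports):

cr0x@server:~$ pveum realm modify ad --server1 dc01.corp.example.com --server2 dc02.corp.example.com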

Task 3: Verify DNS resolution (and catch “helpful” search domains)

cr0x@server:~$ getent hosts dc01.corp.example.com
10.10.10.11   dc01.corp.example.com

What it means: The node resolves the DC. If it resolves to a different IP than expected, you may be hitting the wrong DC or a stale record.

Decision: If resolution is wrong, fix /etc/resolv.conf (or systemd-resolved), not Proxmox. Authentication depends on correct DNS.

Task 4: Confirm TCP connectivity to LDAP/LDAPS

cr0x@server:~$ nc -vz dc01.corp.example.com 636
Connection to dc01.corp.example.com 636 port [tcp/ldaps] succeeded!

What it means: Network path is open. If it fails, stop and fix routing/firewall first.

Decision: If 636 is blocked but 389 works, decide whether you’re allowed to use StartTLS on 389 or must open 636.
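
If you end up on StartTLS over 389, prove it from the node before configuring Proxmox (a sketch reusing the service account from earlier; -ZZ makes ldapsearch abort unless the TLS upgrade succeeds):

cr0x@server:~$ ldapsearch -x -ZZ -H ldap://dc01.corp.example.com:389 -D 'svc_pve@corp.example.com' -W -b '' -s base '(objectClass=*)' defaultNamingContext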

Task 5: Inspect TLS handshake and certificate names

cr0x@server:~$ openssl s_client -connect dc01.corp.example.com:636 -servername dc01.corp.example.com -showcerts </dev/null 2>/dev/null | openssl x509 -noout -subject -issuer -dates -ext subjectAltName
subject=CN = dc01.corp.example.com
issuer=CN = Corp Issuing CA 01, O = Corp
notBefore=Oct  1 00:00:00 2025 GMT
notAfter=Oct  1 00:00:00 2027 GMT
X509v3 Subject Alternative Name:
    DNS:dc01.corp.example.com, DNS:dc01

What it means: The cert matches the name you used. If the SAN doesn’t include the FQDN, Proxmox may refuse TLS (depending on settings).

Decision: If names don’t match, use the name that matches the cert or fix the cert. Don’t disable verification unless you enjoy preventable incidents.

Task 6: Validate the CA chain that Proxmox trusts

cr0x@server:~$ openssl s_client -connect dc01.corp.example.com:636 -servername dc01.corp.example.com </dev/null 2>/dev/null | openssl x509 | openssl verify -CApath /etc/ssl/certs
stdin: OK

What it means: The node trusts the issuing CA chain.

Decision: If this fails, install your enterprise CA into the OS trust store and update certs. Fix trust once, not per-app hacks.
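
On Debian-based Proxmox nodes, installing the chain into the OS trust store looks roughly like this (a sketch; the .crt filenames are placeholders for your PEM-encoded root and issuing CA certificates):

cr0x@server:~$ cp corp-root-ca.crt corp-issuing-ca-01.crt /usr/local/share/ca-certificates/
cr0x@server:~$ update-ca-certificates

Then re-run the verify from above; it should report OK without any per-application tweaks.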

Task 7: Prove the bind works with ldapsearch (service account)

cr0x@server:~$ ldapsearch -x -H ldaps://dc01.corp.example.com:636 -D 'svc_pve@corp.example.com' -W -b '' -s base '(objectClass=*)' defaultNamingContext
Enter LDAP Password:
dn:
defaultNamingContext: DC=corp,DC=example,DC=com

What it means: TLS works, the bind works, and the server reports its default naming context, which should match the base_dn in your realm config.

Decision: If you get Invalid credentials (49), verify the bind identity format. AD often accepts UPN (user@domain) or full DN; pick one and be consistent.

Task 8: Search for the user entry the same way Proxmox will

cr0x@server:~$ ldapsearch -x -H ldaps://dc01.corp.example.com:636 -D 'svc_pve@corp.example.com' -W -b 'dc=corp,dc=example,dc=com' '(sAMAccountName=alice)' dn sAMAccountName userPrincipalName memberOf
Enter LDAP Password:
dn: CN=Alice Nguyen,OU=Users,DC=corp,DC=example,DC=com
sAMAccountName: alice
userPrincipalName: alice@corp.example.com
memberOf: CN=PVE-Admins,OU=Groups,DC=corp,DC=example,DC=com
memberOf: CN=VM-Operators,OU=Groups,DC=corp,DC=example,DC=com

What it means: The user exists and can be found via your chosen attribute.

Decision: If search returns nothing, fix base_dn and the user filter/attribute. Don’t touch passwords; they’re innocent until proven guilty.

Task 9: Check for nested groups if your RBAC depends on them

cr0x@server:~$ ldapsearch -x -H ldaps://dc01.corp.example.com:636 -D 'svc_pve@corp.example.com' -W -b 'dc=corp,dc=example,dc=com' '(&(objectClass=user)(sAMAccountName=alice)(memberOf:1.2.840.113556.1.4.1941:=CN=PVE-Admins,OU=Groups,DC=corp,DC=example,DC=com))' dn
Enter LDAP Password:
dn: CN=Alice Nguyen,OU=Users,DC=corp,DC=example,DC=com

What it means: The AD “matching rule in chain” confirms nested membership. If you need nested groups and your setup doesn’t account for them, authorization will be inconsistent.

Decision: Decide whether you will support nested groups (then test it deliberately) or enforce “no nesting for PVE groups” to keep behavior predictable.

Task 10: Inspect Proxmox logs during a login attempt

cr0x@server:~$ journalctl -u pveproxy -u pvedaemon --since "10 minutes ago" | tail -n 30
Dec 26 09:40:12 pve01 pveproxy[2211]: authentication failure; rhost=10.10.20.55 user=alice@ad msg=ldap_bind: Invalid credentials (49)
Dec 26 09:40:12 pve01 pvedaemon[2198]: user authentication failed: alice@ad

What it means: Proxmox got as far as binding (or tried to) and received an LDAP error. The error string is your compass.

Decision: If you see TLS errors instead (handshake/verify), stop looking at user configuration and fix trust/certs/time.

Task 11: Force a realm sync and watch what happens

cr0x@server:~$ pveum realm sync ad
syncing users
syncing groups
OK

What it means: Proxmox can query directory objects using the realm configuration. If it fails, it usually prints a meaningful LDAP error.

Decision: If sync works but login fails, suspect user attribute mismatch, password issues, or realm selection at login.

Task 12: Verify that the user exists inside Proxmox’s view and see permissions

cr0x@server:~$ pveum user list | grep -E 'alice@ad|userid'
alice@ad

What it means: Proxmox knows about the user. That does not guarantee permissions, only identity mapping.

Decision: If the user isn’t listed and you rely on sync, fix sync filters. If you don’t rely on sync, decide whether to manage PVE users locally or via realm sync—pick one and document it.

Task 13: Check ACLs (because “login works” but UI is empty is still a failure)

cr0x@server:~$ pveum acl list | grep -i alice
/ - user alice@ad - role PVEAuditor

What it means: The user has at least one role assigned at root. If this is missing, the user may authenticate but see almost nothing.

Decision: If you want AD group-driven access, assign roles to groups, not individual users, and keep the ACL list maintainable.

Task 14: Verify group mapping from AD into Proxmox

cr0x@server:~$ pveum group list
Administrators
VMOperators

What it means: These are Proxmox groups (not necessarily AD groups). You must map AD groups to Proxmox groups or use ACLs directly against users.

Decision: If your organization expects “AD group grants Proxmox rights,” implement that mapping explicitly. Expectations are not configurations.
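
One explicit way to wire it up, using the VMOperators group listed above (a sketch, not the only pattern; PVEVMAdmin is a built-in role, /vms is the VM subtree, and a missing group can first be created with pveum group add):

cr0x@server:~$ pveum acl modify /vms --groups VMOperators --roles PVEVMAdmin

How users land in that Proxmox group, whether via realm sync or manual assignment, is a separate decision; make it deliberately and write it down.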

Task 15: Check cluster filesystem health (auth config depends on it)

cr0x@server:~$ pvecm status
Cluster information
-------------------
Name:             prod-cluster
Config Version:   42
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             2025-12-26 09:42:33
Quorum provider:  corosync_votequorum
Nodes:            3
Node ID:          0x00000001
Ring ID:          1.12
Quorate:          Yes

What it means: Cluster is quorate and configuration should be consistent. If not quorate, expect weirdness including stale auth config.

Decision: If quorum is lost, stop doing auth surgery and fix the cluster first.

Task 16: Packet-level proof when everything “looks fine”

cr0x@server:~$ tcpdump -i vmbr0 -nn host 10.10.10.11 and port 636 -c 10
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on vmbr0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
09:43:10.120345 IP 10.10.10.21.49122 > 10.10.10.11.636: Flags [S], seq 123456789, win 64240, options [mss 1460,sackOK,TS val 100 ecr 0,nop,wscale 7], length 0
09:43:10.120611 IP 10.10.10.11.636 > 10.10.10.21.49122: Flags [S.], seq 987654321, ack 123456790, win 65160, options [mss 1460,sackOK,TS val 200 ecr 100,nop,wscale 7], length 0
09:43:10.120744 IP 10.10.10.21.49122 > 10.10.10.11.636: Flags [.], ack 1, win 502, options [nop,nop,TS val 101 ecr 200], length 0

What it means: SYN/SYN-ACK/ACK proves connectivity. If you don’t see traffic during login attempts, your Proxmox node isn’t even trying the directory—wrong realm, wrong node, or config not applied.

Decision: If packets flow but authentication still fails, focus on TLS/bind/search rather than routing.

Where it breaks: failure modes by layer

Layer 1: UI and realm selection (the “wrong door” problem)

The Proxmox login screen has a realm dropdown. Users pick the wrong realm. They always pick the wrong realm. If you have both pam and ad enabled, “pam” will win the popularity contest because it sounds familiar.

Fix: Set a clear default realm, communicate it, and consider restricting local logins to break-glass only.
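
Setting the default realm is a single pveum call (a sketch; ad is the realm from the earlier examples), and it pre-selects that realm in the login dialog:

cr0x@server:~$ pveum realm modify ad --default 1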

Joke #1: LDAP is the only protocol where “Invalid credentials” can mean “your password is wrong,” “your account is locked,” or “the moon is in retrograde.”

Layer 2: DNS and DC selection (quietly catastrophic)

AD depends on DNS. If your Proxmox nodes don’t use AD-integrated DNS (or at least can resolve it correctly), you can get:

  • Lookups returning a decommissioned DC.
  • Split-horizon confusion where internal names resolve externally.
  • Search domain appending turning dc01 into dc01.lab.example.com.

Fix: Use FQDNs in realm config. Ensure Proxmox nodes use the right resolvers. Don’t rely on search domains to “help.” They help like a cat helps with a puzzle.
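
A quick way to catch search-domain surprises from the node itself (a sketch; compare what the short name and the FQDN resolve to, and check which resolvers and search domains are actually configured):

cr0x@server:~$ cat /etc/resolv.conf
cr0x@server:~$ getent hosts dc01
cr0x@server:~$ getent hosts dc01.corp.example.com

If the two lookups disagree, or resolv.conf points somewhere unexpected, fix that before blaming the realm.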

Layer 3: Time (TLS and Kerberos both care)

Even if you’re using LDAP binds (not Kerberos), TLS certificate validation depends on time. Clock skew yields certificate “not yet valid” or “expired” errors that look like TLS failures.

Fix: Make NTP boring and redundant. If your nodes can’t keep time, your auth stack is a decorative feature.
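
Two quick checks (a sketch; chronyc assumes chrony is the NTP client on your nodes, which newer Proxmox releases ship by default, so verify what you actually run):

cr0x@server:~$ timedatectl
cr0x@server:~$ chronyc tracking

Look for “System clock synchronized: yes” and a small offset; anything measured in minutes is an auth incident waiting to happen.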

Layer 4: TLS trust and certificate identity

Most enterprise AD deployments use internal PKI. If Proxmox doesn’t trust your CA, LDAPS fails. If you connect to dc01 but the cert says dc01.corp.example.com, LDAPS fails. If the cert chain is missing an intermediate, LDAPS fails. If TLS policy changes on the DC, older clients fail.

Fix: Put your CA in the OS trust store. Use names that match SANs. Keep the chain intact. Prefer modern TLS policies, but test them with your actual clients, not hopes.

Layer 5: Bind identity formats (UPN vs DN vs DOMAIN\user)

AD accepts multiple forms, but not always in the same contexts. Proxmox realm config may use UPN style bind (svc_pve@corp.example.com) or full DN (CN=svc_pve,OU=Service Accounts,...). If your service account moved OUs and you hardcoded DN, you’ll get bind failures. If the account password changed and nobody updated Proxmox, you’ll get bind failures.

Fix: Use UPN for service accounts when possible (less OU-coupled). Document ownership of that credential. Rotate it with a plan, not vibes.
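
If you’re unsure which bind form your DCs accept, test both explicitly (a sketch; the DN shown is hypothetical, so substitute the real DN from a user or service-account search):

cr0x@server:~$ ldapsearch -x -H ldaps://dc01.corp.example.com:636 -D 'svc_pve@corp.example.com' -W -b '' -s base '(objectClass=*)' defaultNamingContext
cr0x@server:~$ ldapsearch -x -H ldaps://dc01.corp.example.com:636 -D 'CN=svc_pve,OU=Service Accounts,DC=corp,DC=example,DC=com' -W -b '' -s base '(objectClass=*)' defaultNamingContext

Whichever form works reliably is the one that belongs in the realm config.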

Layer 6: Base DN and filters (the “user exists, I swear” argument)

People set base_dn too narrow (one OU) and later add users elsewhere. Or they copy a filter from an OpenLDAP example into AD and wonder why uid doesn’t match anything.

Fix: Set the base DN to the domain root unless you have a strong reason. If you must scope, include all relevant OUs and keep it maintained.
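
The same lookup differs by schema: AD matches on sAMAccountName, a typical OpenLDAP tree matches on uid (a sketch; the OpenLDAP host ldap01 and its bind DN are hypothetical):

cr0x@server:~$ ldapsearch -x -H ldaps://dc01.corp.example.com:636 -D 'svc_pve@corp.example.com' -W -b 'dc=corp,dc=example,dc=com' '(sAMAccountName=alice)' dn
cr0x@server:~$ ldapsearch -x -H ldaps://ldap01.corp.example.com:636 -D 'cn=svc_pve,ou=services,dc=corp,dc=example,dc=com' -W -b 'dc=corp,dc=example,dc=com' '(uid=alice)' dn

Copying a filter across schemas is how “the user exists, I swear” arguments start.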

Layer 7: Group membership resolution (nested groups, attribute choices)

You can authenticate without correct group handling, but you can’t authorize correctly. AD nested groups are a classic pitfall: user shows memberOf only for direct memberships. Your policy says “PVE-Admins” but users are members through nesting. Half the team gets access, half doesn’t, and everyone blames Proxmox.

Fix: Decide whether you allow nested groups. If yes, test nested resolution. If no, enforce it with directory governance.

Layer 8: Proxmox permissions and tokens (auth vs authz)

Proxmox login is separate from permissions. A user can authenticate and still see nothing. Also: API tokens and user sessions behave differently than interactive logins. Don’t debug an API token failure by changing LDAP filters.

Fix: Give users a minimal role at the correct path. Use groups for ACLs. Keep break-glass local accounts for emergencies and audit them aggressively.

Layer 9: Cluster quirks (it worked on node1)

In a cluster, it’s common to test on one node, declare victory, then get paged because logins fail on another node. Usually that’s:

  • Node not in quorum, stale config.
  • Different DNS or firewall policy per node.
  • Different CA trust store state (someone “fixed it” manually once).

Fix: Treat auth dependencies (DNS, CA, time, firewall) as cluster-wide configuration management, not artisanal node-tweaking.

Three corporate mini-stories (how this goes wrong at work)

Mini-story 1: The incident caused by a wrong assumption

They were migrating from a legacy vSphere environment to Proxmox, mid-quarter, with the usual background radiation of deadlines. The AD team handed over a service account and said, “It can read users and groups.” The Proxmox team assumed that meant it could read all users and groups under the domain. It couldn’t.

In AD, they had delegated read permissions only to a specific OU where “server admins” lived. The Proxmox realm base DN was the domain root, so searches crossed into areas where that service account had no rights. In test, everyone who tried logging in happened to be in the delegated OU. In production, half the NOC was in a different OU. Login failed for them, but not for the on-call engineer, which is a special kind of misleading.

The first reaction was predictable: someone tried changing user filters, then someone else disabled TLS verification “temporarily.” That didn’t help, because the problem wasn’t TLS or filters. It was authorization inside AD for the bind account.

They fixed it by aligning scope with permissions: either broaden the service account’s read rights (preferred, with minimal scope) or narrow the base DN to only the intended OU and document the coupling. They did both: narrow scope for now, create a change request to extend read permissions properly, and add a login test for a user outside the original OU to the deployment checklist.

Mini-story 2: The optimization that backfired

A different shop had a very “efficient” idea: point Proxmox at a single DC because it “reduces latency” and makes troubleshooting easier. They picked the DC in the same rack row. Everything looked great—until patch night.

The Windows team patched that DC and it rebooted. For about 12 minutes, LDAP/LDAPS was unavailable. Proxmox nodes tried to authenticate; logins failed; API calls started returning errors; automation that used the API began to retry aggressively. Load increased everywhere, including on the DC once it came back, and their monitoring went from “red” to “performance art.”

After the dust settled, they added a second DC as server2 in the realm config and re-tested. That solved availability, but they discovered a second-order effect: the backup DC had a slightly different certificate chain (still valid, but issued from a different intermediate). Some Proxmox nodes trusted one intermediate but not the other, because someone had installed certificates inconsistently over time.

The final fix was unsexy: standardize CA trust across nodes, configure multiple LDAP servers, and stop optimizing for a few milliseconds in an authentication path that runs orders of magnitude less often than storage IO. They also added a maintenance window runbook: patch DCs one at a time and validate LDAPS from Linux clients before declaring success.

Mini-story 3: The boring but correct practice that saved the day

One organization had a habit: every auth change came with a tiny, repeatable test script run from each Proxmox node. Nothing fancy—just DNS resolution, TCP connect, TLS verify, ldapsearch bind, and a user lookup. They version-controlled the script and kept expected outputs documented.

During a routine CA rotation, the AD CS team rolled out a new issuing CA and began serving new chains on the domain controllers. Lots of systems had problems. Proxmox didn’t—because the Proxmox team had already deployed the new CA into the OS trust store on every node as part of the change plan, a week earlier, with verification.

What looked like luck was actually a cheap discipline: “trust store is managed config,” not “something you fix when it breaks.” Their auth didn’t hiccup, which meant their virtualization platform stayed boring while other teams were triaging certificate fires.

It didn’t win awards. It did save a weekend. In enterprise operations, that’s the best award anyway.

Common mistakes: symptom → root cause → fix

1) Symptom: “Login failed” for everyone; logs mention TLS/handshake

Root cause: Proxmox doesn’t trust the AD/LDAP certificate chain, or the hostname doesn’t match the cert SAN.

Fix: Install enterprise CA into OS trust store, verify with openssl verify, use FQDN matching SAN. Avoid disabling verification unless you want attackers to log in as your directory server.

2) Symptom: “Invalid credentials (49)” even with correct password

Root cause: Wrong bind DN format, expired/locked service account, or AD policy blocking simple binds without TLS.

Fix: Confirm bind DN/UPN, check account status in AD, ensure LDAPS/StartTLS is used. Validate with ldapsearch -D ... -W from the node.

3) Symptom: Only some users can log in

Root cause: Base DN too narrow, OU scoping mismatch, or service account lacks read rights in parts of the directory.

Fix: Expand base DN appropriately or adjust AD delegation. Test with users across OUs.

4) Symptom: User can authenticate but sees an empty UI or no nodes

Root cause: Missing Proxmox ACLs/roles; authentication succeeded but authorization didn’t.

Fix: Assign a role at the correct path (often / or /vms), ideally via group-based ACLs.

5) Symptom: Works on node1, fails on node2

Root cause: Cluster not healthy, config not replicated, or node-specific DNS/firewall/trust store differences.

Fix: Restore quorum/cluster health, standardize node configuration management, verify realm config on each node.

6) Symptom: Realm sync works, but interactive login fails

Root cause: User attribute mismatch (e.g., expecting sAMAccountName but users log in with UPN), or user typed the wrong realm.

Fix: Align user login format with user_attr and UI instructions. Consider setting a default realm to reduce human error.

7) Symptom: After security hardening, LDAP breaks “randomly”

Root cause: LDAP signing/channel binding requirements changed on AD, or TLS policy tightened, exposing older client assumptions.

Fix: Move to LDAPS/StartTLS with valid trust, ensure Proxmox and underlying libraries support required policies, test in staging with actual DC policy settings.

8) Symptom: Group-based permissions don’t apply as expected

Root cause: Nested groups not resolved, wrong group attribute/class configured, or the group DNs don’t match what Proxmox is mapping.

Fix: Decide nested group strategy, validate membership queries, and keep AD group naming/DN stable (or map using unambiguous attributes where supported).

Joke #2: The fastest way to find your single point of failure is to “optimize” it. Authentication is a great teacher with terrible timing.

Checklists / step-by-step plan (make it boring)

Checklist A: When you’re building a new AD/LDAP realm

  1. Pick the realm type (ad for AD, ldap for generic LDAP). Don’t be clever unless you have to.
  2. Use FQDNs for DC/LDAP servers that match certificate SANs.
  3. Configure multiple servers if available (avoid single DC dependency).
  4. Install CA trust into the OS trust store on every node, consistently.
  5. Use a dedicated service account with least privilege read access to the required OUs.
  6. Set base DN intentionally. Domain root is simplest; scoping requires governance.
  7. Decide user login format (sAMAccountName vs UPN) and align user_attr accordingly.
  8. Decide nested group policy and test it with real user examples.
  9. Define group-to-role mapping in Proxmox and keep it in change control.
  10. Test from every node (DNS, TCP, TLS, ldapsearch, Proxmox login).

Checklist B: When logins fail right now and you’re on call

  1. Confirm the realm used at login and the node receiving the request.
  2. Run pveum realm list and review the realm’s entry in /etc/pve/domains.cfg.
  3. Check DNS: getent hosts for the configured LDAP server.
  4. Check TCP: nc -vz to 636/389.
  5. Check TLS: openssl s_client and openssl verify.
  6. Run ldapsearch bind and user search with the same base DN and attribute.
  7. Look at journalctl -u pveproxy -u pvedaemon around the failure time.
  8. Check Proxmox permissions: pveum acl list and group mapping.
  9. If multi-node: verify cluster quorum and that config is consistent.
  10. Only then escalate to packet capture.

Checklist C: Hardening that won’t bite you later

  1. Keep at least one break-glass local Proxmox user with MFA or strong controls (and audit its use).
  2. Standardize CA trust store management across nodes (config management).
  3. Monitor LDAPS reachability and certificate expiration from Proxmox nodes.
  4. Rotate service account credentials with a runbook and a test plan.
  5. Restrict firewall rules to DCs and required ports, but document it.

FAQ

1) Why does Proxmox show “authentication failure” when the real issue is TLS?

Because from Proxmox’s point of view, it couldn’t authenticate you. The bind never happened. Always check logs and test TLS separately with openssl s_client.

2) Should I use the ad realm type or ldap for Active Directory?

Use ad unless you have a specific reason not to. It aligns defaults (attributes/classes) better with AD behaviors and reduces schema footguns.

3) LDAPS on 636 or StartTLS on 389?

Either can be secure. Pick what your directory team supports and what your monitoring can validate reliably. In many enterprises, 636 is operationally simpler (clear separation) but depends heavily on certificates.

4) My bind DN is a full DN and it broke after an OU move. How do I prevent that?

Use a UPN-style bind identity (e.g., svc_pve@corp.example.com) if allowed. That decouples your bind from OU structure. Also: treat OU moves as changes with impact, not “just housekeeping.”

5) Users can log in, but permissions don’t reflect AD groups. What’s missing?

Authentication isn’t authorization. You still need Proxmox ACLs/roles bound to users or groups. If you want AD-driven access, decide how AD groups map to Proxmox roles and implement it explicitly.

6) Do I need to sync users and groups with pveum realm sync?

It depends on your operational model. Sync can make group-based management easier, but it’s another moving part. If you use it, monitor it and run it as part of controlled changes.

7) Why does login work for direct group members but not for nested group members?

Because memberOf often reflects only direct memberships. Nested resolution requires special matching rules or explicit expansion. Either support nested groups and test it, or ban nesting for Proxmox-related groups.

8) Can I “just disable certificate verification” to get unstuck?

You can, and you can also remove your seatbelt to make it easier to reach the radio. If you disable verification, you’re vulnerable to MITM and rogue directory endpoints. Fix trust properly.

9) Why does it fail only on one Proxmox node?

Usually because that node has different DNS, firewall rules, or CA trust state. Sometimes it’s cluster config inconsistency due to quorum issues. Verify both: node dependencies and pvecm status.

10) What’s the cleanest way to separate break-glass from normal access?

Keep a local pve user (or tightly controlled pam account) for emergencies, but require a change ticket or audited process to use it. Use AD for day-to-day access.

Next steps you should do after it works

Getting LDAP/AD login to work once is not the goal. Keeping it working during routine changes is the job.

  1. Write a one-page runbook with your realm settings, supported login formats, and the exact ldapsearch command used to validate binds and searches.
  2. Standardize trust stores across all Proxmox nodes and track CA rotations. If your CA changes every few years, it’s still often enough to ruin your week.
  3. Add monitoring for LDAPS reachability and certificate expiration from Proxmox nodes (not from some random monitoring subnet); a minimal expiry probe is sketched after this list.
  4. Make authorization explicit: group-to-role mapping in Proxmox, documented and reviewed like firewall rules.
  5. Test failover by temporarily removing one DC from service and verifying logins still work via the alternate server.
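
For step 3, a minimal expiry probe you can run from the nodes themselves (a sketch; 2592000 seconds is 30 days, so tune the threshold to your CA rotation cadence):

cr0x@server:~$ openssl s_client -connect dc01.corp.example.com:636 -servername dc01.corp.example.com </dev/null 2>/dev/null | openssl x509 -noout -checkend 2592000 && echo "LDAPS cert valid for 30+ days" || echo "LDAPS cert expires within 30 days"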

If you do those, LDAP stops being a haunted house and becomes what it should be: a dependency you understand, measure, and can fix under pressure.
