You can ping the host. The Web UI loads. The certificate warning is the usual “we’ll deal with it later.” Then you type the password you’ve typed a thousand times and get the deadpan response: Login failed.
This is the kind of failure that wastes time because it looks like a simple password problem. Sometimes it is. Often it’s not. The UI is just the messenger, and it’s not great at explaining what actually broke. Let’s make it explain itself.
Fast diagnosis playbook (do this first)
If you want the fastest path to the real cause, you’re hunting for one of three things: wrong realm, time/ticket problems, or backend auth failing. Don’t start by randomly restarting services. Read the errors first.
Step 1 — Confirm you’re using the right realm
- If you’re logging in as root, in most deployments you want Realm = PAM, so the username should look like root@pam.
- If you have LDAP/AD configured, you might need user@yourrealm. The UI remembers your last realm selection, which is great until it isn’t.
Step 2 — Look at the auth errors where they actually are
- Open a shell (console, IPMI, or SSH if still possible).
- Tail the logs while attempting a login. If you see “authentication failure” with details, you’re already ahead of the UI.
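For example, to follow both auth-relevant services at once (assuming systemd journald, which is standard on Proxmox VE):
cr0x@server:~$ journalctl -u pveproxy -u pvedaemon -f --since "5 minutes ago"
Leave that running in one terminal, attempt the login in the browser, and read the line that appears. It is almost always more specific than “login failed.”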
Step 3 — Check time drift (the silent killer)
If the node clock is wrong, tickets can be minted and then immediately rejected as expired or not-yet-valid. This presents as “login failed” even with perfect credentials.
Step 4 — Verify pveproxy and pvedaemon aren’t stuck or misconfigured
The UI is just JavaScript talking to pveproxy (API) and friends. If the services are unhappy, you’ll get generic failures.
Step 5 — If it’s a cluster, test from another node
In a cluster, auth configuration and user database are shared via /etc/pve (pmxcfs). If pmxcfs is unhealthy on a node, it may show the UI but behave strangely for auth and permissions.
One operational truth: if you can’t log in, don’t argue with the GUI. Ask the daemon. It will tell you, eventually.
How Proxmox login actually works (so you know where to look)
Proxmox VE’s Web UI is a client for the REST API. The login flow is roughly:
- Browser loads the UI from pveproxy (port 8006).
- You submit username@realm and password (and possibly 2FA).
- pveproxy passes auth to the selected realm backend (PAM, PVE, LDAP/AD, OpenID, etc.).
- On success, Proxmox issues a ticket (cookie) and a CSRF prevention token for API calls.
- The UI uses that ticket and CSRF token for subsequent operations.
That matters because “login failed” isn’t one thing. It could mean:
- The realm backend rejected credentials (real password/2FA problem).
- The backend is unreachable (LDAP down, AD TLS issue, DNS failure).
- The ticket was created but the browser can’t store it (clock skew, cookie oddities, reverse proxy header problems).
- The node can’t read its own auth configuration (pmxcfs issues, permissions in /etc/pve).
Here’s a useful mental model: UI loads means TCP + TLS + basic service availability are fine. Login works means auth backend, time, and stateful ticketing are all fine too. Different layers. Different failure modes.
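You can walk the same flow by hand with curl, which is a useful sanity check when the browser is giving you nothing. This is a sketch with placeholder credentials, run against the local node; -k skips certificate verification, which is acceptable only while debugging:
cr0x@server:~$ # 1. Ask for a ticket, exactly what the login form does behind the scenes
cr0x@server:~$ curl -k -s -d "username=root@pam" --data-urlencode "password=YOUR_PASSWORD" https://127.0.0.1:8006/api2/json/access/ticket
cr0x@server:~$ # 2. Reuse the returned ticket as the PVEAuthCookie for a read-only API call
cr0x@server:~$ curl -k -s -b "PVEAuthCookie=<ticket value from step 1>" https://127.0.0.1:8006/api2/json/nodes
cr0x@server:~$ # 3. Write operations (POST/PUT/DELETE) additionally need -H "CSRFPreventionToken: <token from step 1>"
If step 1 works but the browser doesn’t, you’re debugging cookies, proxies, or client state, not credentials.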
Interesting facts and context (why these failures happen)
- Fact 1: Proxmox VE’s cluster filesystem (pmxcfs) is implemented as a FUSE filesystem and is backed by corosync for distribution. When it’s unhealthy, “simple files” like user config aren’t simple anymore.
- Fact 2: The default Proxmox UI/API port is 8006, historically chosen to avoid common collisions with web servers and appliances already using 443/8443.
- Fact 3: Proxmox distinguishes between PAM users (system accounts), PVE users (stored in Proxmox user database), and external realms like LDAP/AD/OIDC. Picking the wrong one is the #1 self-inflicted wound.
- Fact 4: Ticket-based auth has an implicit dependency on time correctness. Modern auth systems are allergic to clock drift because it breaks replay protection.
- Fact 5: Many Proxmox components are Perl-based (notably pveproxy and pvedaemon), which means failure logs can be very specific if you bother to read them.
- Fact 6: In clustered setups, /etc/pve is not “just local config.” It’s a distributed state store. If corosync has quorum issues, config reads and writes can behave differently than you expect.
- Fact 7: Browser-side issues are real: stale cookies, cached JS, and aggressive privacy extensions can break the CSRF/token dance and look like “login failed.” Rare, but it happens.
- Fact 8: The Proxmox UI historically relied on ExtJS. A lot of the front-end behaviors (including some error handling) were shaped by that ecosystem’s assumptions about sessions and tokens.
Top causes when the UI loads but you can’t log in
1) Wrong realm (PAM vs PVE vs LDAP/AD vs OIDC)
The UI remembers your last realm. That’s convenient until you switch from an LDAP user to root and forget to flip it back. Result: you try root against LDAP, it fails, and the UI shrugs.
Diagnosis hint: If root “suddenly” stopped working after you added an LDAP realm, this is the first thing I check. Most “sudden” auth breakages are actually UI state.
2) Clock drift or NTP broken (tickets become invalid)
Time drift doesn’t always show as “ticket expired.” Sometimes it’s just “login failed.” If your RTC battery is dying or your NTP is blocked, you can get drift big enough to break sessions, TLS validation, and cluster coordination.
3) PAM is failing (account locked, expired, or NSS issues)
PAM auth can fail for reasons beyond “wrong password”:
- Account locked (pam_tally2/faillock scenarios); see the quick checks after this list.
- Password expired (especially if you enforced policies on system accounts).
- NSS resolution problems (odd if you wired in LDAP for system accounts).
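Quick checks for the first two, assuming standard shadow utilities and, where pam_faillock is configured, its companion CLI:
cr0x@server:~$ chage -l root          # password and account expiry details
cr0x@server:~$ faillock --user root   # lockout state; only meaningful on pam_faillock setups
If chage shows an expired password or faillock shows a lockout, you’ve found your “login failed” without touching Proxmox at all.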
4) Two-factor auth mismatch or broken time base
TOTP is time-based. If the server clock is wrong, your 2FA is wrong. If you’re using WebAuthn, browser and reverse-proxy behavior matters. And if the UI prompts for “OTP” but you’re entering the password+OTP in the wrong format for that realm, it’ll fail without much guidance.
5) pveproxy / pvedaemon issues (service unhealthy or stuck)
Yes, the UI can load even if the backend has partial failures. You can get static content and still fail API auth calls. Service restarts can help, but treat restarts like painkillers: useful, not curative.
6) Certificate/key mismatch after restore or hostname/IP changes
If you restored a node from backup, cloned a VM into a new identity, or changed the hostname without cleaning up, you can end up with mismatched certs or stale node identity. This can snowball in clusters.
7) Cluster filesystem (/etc/pve) not mounted/healthy
If /etc/pve is unhappy, user config and realms may not be readable as expected. The node might still serve the UI because the service starts, but auth logic may not find what it needs.
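The mount is provided by the pve-cluster service (that’s the unit that runs pmxcfs), so if /etc/pve looks wrong, check that service before anything else:
cr0x@server:~$ systemctl status pve-cluster --no-pager
cr0x@server:~$ journalctl -u pve-cluster -n 50 --no-pager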
8) External auth realm failure (LDAP/AD, OIDC)
LDAP/AD failures are classics: DNS misbehavior, TLS trust chain issues, bind user password expired, firewall, or a domain controller maintenance window that nobody told you about.
9) Browser/cookie/CSRF weirdness (less common, still real)
When the UI loads but immediately “login failed” after entering correct credentials, try an incognito window or another browser. If that fixes it, you didn’t fix Proxmox; you fixed state in the client.
Joke #1: If your incident response plan is “clear browser cache,” you don’t have a plan—you have a ritual.
Hands-on tasks: commands, outputs, and decisions (12+)
These are the moves I actually make on a node when the UI loads but auth fails. Each task includes a command, what typical output means, and what decision you make next. Run these on the Proxmox host unless stated otherwise.
Task 1 — Confirm services are running: pveproxy and pvedaemon
cr0x@server:~$ systemctl status pveproxy pvedaemon --no-pager
● pveproxy.service - PVE API Proxy Server
Loaded: loaded (/lib/systemd/system/pveproxy.service; enabled)
Active: active (running) since Tue 2025-12-23 09:11:04 UTC; 2h 18min ago
● pvedaemon.service - PVE API Daemon
Loaded: loaded (/lib/systemd/system/pvedaemon.service; enabled)
Active: active (running) since Tue 2025-12-23 09:11:02 UTC; 2h 18min ago
Meaning: If either is not active, auth will be unreliable or dead.
Decision: If not running, check logs (Task 2) before restarting. If they’re running, proceed to realm/time checks.
Task 2 — Tail the proxy log while attempting login
cr0x@server:~$ journalctl -u pveproxy -f
Dec 23 11:29:31 pve pveproxy[2154]: authentication failure; rhost=192.0.2.55 user=root@pam msg=failed to authenticate user
Dec 23 11:29:31 pve pveproxy[2154]: client denied: invalid credentials
Meaning: “invalid credentials” points to realm/password/2FA or PAM lockouts.
Decision: If you see realm mismatch or LDAP errors, jump to the relevant section. If you see ticket/CSRF errors, go to time and cookie checks.
Task 3 — Tail pvedaemon for deeper auth/permission errors
cr0x@server:~$ journalctl -u pvedaemon -n 200 --no-pager
Dec 23 11:29:31 pve pvedaemon[2031]: authentication failure; user=root@pam msg=pam_authenticate failed: Authentication failure
Meaning: This is backend-level rejection (PAM in this example).
Decision: Investigate PAM/account state (Tasks 9–11). If pvedaemon shows “permission denied reading /etc/pve,” investigate pmxcfs (Tasks 6–7).
Task 4 — Verify node time and NTP sync
cr0x@server:~$ timedatectl
Local time: Tue 2025-12-23 11:31:08 UTC
Universal time: Tue 2025-12-23 11:31:08 UTC
RTC time: Tue 2025-12-23 11:31:07
Time zone: Etc/UTC (UTC, +0000)
System clock synchronized: no
NTP service: active
RTC in local TZ: no
Meaning: “System clock synchronized: no” is suspicious, especially in clusters or with 2FA.
Decision: Fix time sync before doing anything else. If time is off by minutes, expect ticket failures and corosync problems too.
Task 5 — Check chrony status (common on Debian-based systems)
cr0x@server:~$ chronyc tracking
Reference ID : 203.0.113.10 (ntp1.example)
Stratum : 3
Ref time (UTC) : Tue Dec 23 11:30:58 2025
System time : 0.842123 seconds slow of NTP time
Last offset : -0.001993812 seconds
RMS offset : 0.012340123 seconds
Frequency : 12.345 ppm fast
Leap status : Normal
Meaning: Small offsets are fine. Big offsets and “Leap status: Not synchronised” are not.
Decision: If not syncing, check firewall/DNS and your NTP servers. If you’re in a cluster, fix time on all nodes.
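If chrony can see its sources but the clock is already far off, you can step it instead of waiting for a slow slew. A sketch, assuming chrony is your NTP client (the Debian service name is chrony):
cr0x@server:~$ chronyc sources -v      # confirm at least one reachable, selected source
cr0x@server:~$ chronyc makestep        # step the clock now; do this deliberately, not as a reflex
cr0x@server:~$ systemctl restart chrony   # only if chronyc can't talk to the daemon at all
Stepping time on a cluster node is a real intervention: corosync and running guests notice. Do it knowingly, and do it on every drifted node.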
Task 6 — Confirm pmxcfs is mounted and healthy
cr0x@server:~$ mount | grep /etc/pve
pmxcfs on /etc/pve type fuse.pmxcfs (rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other)
Meaning: If this is missing, Proxmox cluster config isn’t mounted; auth realms and users can behave oddly.
Decision: Check pve-cluster and corosync (Task 7). In a single-node setup, pmxcfs should still be mounted.
Task 7 — Check cluster/quorum status (clustered nodes)
cr0x@server:~$ pvecm status
Cluster information
-------------------
Name: prod-cluster
Config Version: 23
Transport: knet
Secure auth: on
Quorum information
------------------
Date: Tue Dec 23 11:33:12 2025
Quorum provider: corosync_votequorum
Nodes: 3
Node ID: 0x00000001
Ring ID: 1.3a
Quorate: Yes
Meaning: If Quorate: No, configuration reads/writes may be restricted and symptoms can include auth weirdness depending on what’s failing.
Decision: Fix corosync network, node connectivity, and quorum. Don’t “just reboot until it works.” That’s how you turn a hiccup into downtime.
Task 8 — List realms and verify which ones exist
cr0x@server:~$ pveum realm list
Realm Type Comment
pam pam
pve pve
corp-ldap ldap Corporate LDAP
Meaning: If the realm you’re selecting in the UI isn’t here (or has been renamed), your login attempt is doomed.
Decision: Use a known-good realm (often pam) for break-glass access, then fix external realms.
Task 9 — Validate that the root account is not locked at the OS level
cr0x@server:~$ passwd -S root
root P 2025-11-04 0 99999 7 -1
Meaning: Status P usually indicates a usable password is set. L indicates locked.
Decision: If locked, unlock intentionally (Task 10). If password expired, set a new one in a controlled way.
Task 10 — Unlock root (if you mean it) and set a password
cr0x@server:~$ passwd -u root
passwd: password expiry information changed.
Meaning: Root is unlocked. That’s a security decision, not a technical one.
Decision: If your policy forbids root logins, don’t do this casually. Instead, use an admin user in the PVE realm or fix your identity provider.
Task 11 — Test PAM auth directly (without the UI)
cr0x@server:~$ pamtester login root authenticate
Password:
pamtester: successfully authenticated
Meaning: If PAM auth succeeds here but fails in the UI, the problem may be realm selection, 2FA expectations, or ticket/cookie/time issues.
Decision: If PAM fails here too, you have an OS auth problem (password, lockout, PAM stack changes).
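Note that pamtester is not part of the default install; it’s a small Debian package worth adding before you need it:
cr0x@server:~$ apt install pamtester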
Task 12 — Check the Proxmox user database (PVE realm)
cr0x@server:~$ pveum user list
UserID Enable Expire
root@pam 1
admin@pve 1
ops@corp-ldap 1
Meaning: If your admin user is disabled or expired, UI login fails even if LDAP is fine.
Decision: Enable/extend the right user. If the user is external, make sure the realm is reachable.
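If the user is merely disabled or expired, pveum can fix that from the shell. A sketch for a hypothetical admin@pve account (expire 0 means “never expires”):
cr0x@server:~$ pveum user modify admin@pve --enable 1 --expire 0
cr0x@server:~$ pveum user list   # confirm the change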
Task 13 — Check if 2FA is enforced for the user
cr0x@server:~$ pveum user get admin@pve
┌─────────────┬──────────┐
│ key │ value │
╞═════════════╪══════════╡
│ enable │ 1 │
│ expire │ 0 │
│ firstname │ │
│ lastname │ │
│ email │ │
│ keys │ │
│ groups │ │
│ tokens │ │
│ totp │ 1 │
└─────────────┴──────────┘
Meaning: If TOTP is enabled (or required by policy), password-only logins fail. Some users forget they enabled it months ago.
Decision: Use the right OTP flow, or temporarily disable 2FA only if you have a formal break-glass procedure.
Task 14 — Verify the API responds locally (narrow network vs auth)
cr0x@server:~$ curl -k -s https://127.0.0.1:8006/api2/json/version | sed 's/[{}]//g'
"data":"release":"8.2","repoid":"a1b2c3d4"
Meaning: If this fails locally, you don’t have a “browser problem.” You have a service problem (pveproxy down, TLS issues, or local firewall).
Decision: Fix pveproxy/listeners before chasing realms.
Task 15 — Try a password-based API ticket request (isolates GUI)
cr0x@server:~$ curl -k -s -d "username=root@pam&password=CorrectHorseBatteryStaple" https://127.0.0.1:8006/api2/json/access/ticket | head
{"data":{"username":"root@pam","ticket":"PVE:root@pam:...","CSRFPreventionToken":"b0c1..."}}
Meaning: If this works, the backend auth path is fine; the GUI/cookies/reverse proxy might be the issue.
Decision: Check browser cookies, reverse proxy headers, and time validity. If it fails, read the logs again with Task 2.
Task 16 — Check for reverse proxy header problems (if you front Proxmox)
cr0x@server:~$ grep -R "proxy_set_header" -n /etc/nginx/sites-enabled 2>/dev/null | head
/etc/nginx/sites-enabled/pve.conf:12: proxy_set_header Host $host;
/etc/nginx/sites-enabled/pve.conf:13: proxy_set_header X-Forwarded-Proto https;
/etc/nginx/sites-enabled/pve.conf:14: proxy_set_header X-Forwarded-For $remote_addr;
Meaning: Missing/incorrect Host or proto headers can break cookie scope, redirects, and sometimes auth flow.
Decision: Ensure your reverse proxy is configured specifically for Proxmox and supports WebSockets. If you don’t need a reverse proxy, don’t add one “for neatness.”
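A fast way to separate “proxy problem” from “Proxmox problem” is to hit the ticket endpoint both ways and compare. Hostnames here are placeholders for your proxy and your node:
cr0x@server:~$ # Direct to the node, bypassing the proxy
cr0x@server:~$ curl -k -s -o /dev/null -w "%{http_code}\n" -d "username=root@pam" --data-urlencode "password=YOUR_PASSWORD" https://pve-node.example.internal:8006/api2/json/access/ticket
cr0x@server:~$ # Through the reverse proxy
cr0x@server:~$ curl -k -s -o /dev/null -w "%{http_code}\n" -d "username=root@pam" --data-urlencode "password=YOUR_PASSWORD" https://pve.example.internal/api2/json/access/ticket
If the direct request returns 200 and the proxied one doesn’t (or the Set-Cookie header vanishes in transit), stop debugging Proxmox and go read the proxy config.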
Task 17 — Inspect local firewall rules quickly
cr0x@server:~$ pve-firewall status
Status: enabled/running
Meaning: A misconfigured firewall can block LDAP/AD or NTP while still allowing 8006, producing “login failed” for external realms.
Decision: If external realms are failing, confirm outbound access to LDAP/DC/NTP.
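Quick outbound probes from the node, using only tools that are already there (hostnames and ports are examples; point them at your actual DC, LDAP server, and NTP source):
cr0x@server:~$ getent hosts dc1.example.internal                                  # DNS/NSS resolution
cr0x@server:~$ timeout 3 bash -c '</dev/tcp/dc1.example.internal/636' && echo "636 open"   # crude LDAPS port check
cr0x@server:~$ chronyc sources                                                    # NTP sources reachable and selected?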
Task 18 — Validate LDAP realm connectivity (when LDAP/AD is involved)
cr0x@server:~$ pveum realm sync corp-ldap
syncing users and groups...
ERROR: LDAP connection failed: Can't contact LDAP server
Meaning: External identity is down or unreachable from this host (DNS, routing, firewall, TLS, server down).
Decision: Fix network/DNS/TLS first. Do not “fix” this by creating local users unless you’re executing a documented break-glass plan.
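To confirm whether the problem is Proxmox’s realm configuration or the directory itself, test the bind outside Proxmox. A sketch using ldapsearch (package ldap-utils); the server, bind DN, and base DN are placeholders for a hypothetical AD domain:
cr0x@server:~$ apt install ldap-utils
cr0x@server:~$ ldapsearch -H ldaps://dc1.example.internal -D "CN=pve-bind,OU=Service Accounts,DC=example,DC=internal" -W -b "DC=example,DC=internal" -s base "(objectClass=*)"
A clean result points the finger at the realm definition in Proxmox (bind DN, password, TLS options). A connection or certificate error here means the problem lives in the network, DNS, or trust chain, and no amount of Proxmox tweaking will fix it.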
Three corporate mini-stories (how teams really get burned)
Mini-story 1: The incident caused by a wrong assumption
The company had a tidy Proxmox cluster: three nodes, shared storage, and “enterprise-ish” LDAP integrated. The team onboarded a new on-call engineer and did the usual: “Just log in with your LDAP account.”
A month later, during a storage incident, the LDAP team performed a planned domain controller maintenance. It was supposed to be seamless. It wasn’t. One DC came back with an old certificate chain, the other had replication lag, and the load balancer made creative choices. Proxmox Web UI loaded everywhere, but LDAP logins failed.
The on-call engineer tried root next. Still failed. They assumed the root password had been rotated or lost and escalated to security. Meanwhile, the storage issue worsened because nobody could change VM IO limits or migrate workloads.
The real root cause was painfully small: the login form was set to the LDAP realm, and the engineer typed root without @pam. Proxmox did exactly what it was told: authenticate root against LDAP. No such user, no login.
They fixed the realm selection, logged in as root@pam, and stabilized the cluster. The postmortem action item wasn’t “train better.” It was to create a documented break-glass local admin user in the PVE realm, with tested 2FA recovery, and to include “check realm dropdown” in the runbook. Boring. Effective.
Mini-story 2: The optimization that backfired
A different org put Proxmox behind a corporate reverse proxy. The goal was single entry point, WAF logging, and “consistent TLS.” All reasonable. Then someone optimized the proxy config to enforce aggressive cookie policies and tightened header handling because compliance tooling flagged “insecure defaults.”
On Monday, users reported intermittent “login failed.” The UI loaded fine. Sometimes it worked after a refresh; sometimes not. Engineers restarted pveproxy on two nodes and temporarily “fixed” it, which bought them confidence and lost them time.
The failure was in the reverse proxy’s handling of headers and cookies. Under certain conditions, the proxy changed the perceived scheme/host, Proxmox issued cookies with attributes that the browser refused, and the subsequent CSRF/token exchange failed. The UI’s only opinion: “login failed.”
The fix was to stop being clever. The proxy was reconfigured to pass the correct Host and X-Forwarded-Proto, preserve WebSocket upgrade headers, and not mangle cookie attributes. After that, logins became boring again—exactly what you want from authentication.
Mini-story 3: The boring but correct practice that saved the day
A team running Proxmox for internal CI had a habit that looked paranoid: every quarter they validated a break-glass login on each cluster. Not “we think it works,” but an actual login test from the console and the UI, with a recorded checklist.
One day, their NTP upstream changed and a firewall rule update blocked outbound NTP on one VLAN. A single Proxmox node drifted. Not enough to panic people—just enough to make auth tickets flaky. The UI loaded, logins failed sometimes, and sessions randomly dropped.
Their monitoring caught clock offset, but the real win was procedural: the runbook’s first steps included checking timedatectl and chronyc tracking. They corrected the firewall rule, forced a time step, and the issue vanished.
No heroics. No emergency password resets. No “maybe Proxmox is broken.” Just a system behaving predictably and a team refusing to improvise authentication fixes in production.
Paraphrased idea (Gene Kranz): “Failure isn’t an option” is really about discipline—good systems and calm procedures reduce the space where failure can happen.
Common mistakes: symptom → root cause → fix
This section is intentionally specific. If you can map your symptom to a row, you can stop guessing.
1) “Root password definitely works” but login fails immediately
- Symptom: UI loads, password rejected instantly, no OTP prompt.
- Root cause: Wrong realm selected (often the LDAP realm), so root is being authenticated against the wrong backend.
- Fix: Log in as root@pam with Realm “Linux PAM standard authentication.” Confirm realms with pveum realm list.
2) Login fails only for LDAP/AD users; local users still work
- Symptom: root@pam works; user@corp-ldap doesn’t.
- Root cause: LDAP connectivity/TLS/DNS/firewall, expired bind user credentials, or DC maintenance.
- Fix: Run pveum realm sync corp-ldap and inspect errors. Verify DNS and firewall egress. Fix the trust chain if using LDAPS.
3) Login worked yesterday; today it fails across multiple nodes
- Symptom: Multiple users report “login failed” with correct passwords.
- Root cause: Time drift (NTP broken), or a realm backend outage (LDAP/OIDC down).
- Fix: Check timedatectl and NTP status. If an external realm is involved, test connectivity from the nodes. Fix time first, then identity.
4) Login succeeds, then UI acts like you’re logged out
- Symptom: You “log in,” then get bounced back or actions fail with permission errors.
- Root cause: Cookie not stored/returned due to reverse proxy or browser privacy settings; sometimes CSRF token mismatch due to scheme/host confusion.
- Fix: Try direct access to https://node:8006. Try a clean browser profile. Fix reverse proxy headers and WebSocket settings.
5) Cluster node UI loads, but login only works on some nodes
- Symptom: Node A accepts login, Node B rejects, same user.
- Root cause: Node-level time drift, pmxcfs mount issues, or partial corosync partitioning causing stale/partial config.
- Fix: Compare time and pvecm status. Verify the /etc/pve mount. Fix cluster transport issues and re-establish quorum.
6) “Login failed” after enabling 2FA
- Symptom: Password accepted previously; now always fails.
- Root cause: OTP required but not provided/entered correctly; server time drift makes OTP invalid.
- Fix: Fix time sync. Confirm 2FA status with pveum user get USER. Use recovery methods per policy; don’t invent new ones under pressure.
7) After restore/clone, nobody can log in (or UI behaves inconsistently)
- Symptom: Web UI accessible; auth failing or weird session behavior after node identity change.
- Root cause: Hostname mismatch, certificate/key mismatch, or duplicated node identity in a cluster.
- Fix: Verify hostname resolution and node identity. In clusters, ensure node IDs are unique and corosync config is consistent. Re-issue certificates if needed (with caution).
Joke #2: Authentication is like a corporate badge reader—when it fails, it’s never because you’re standing there angrily enough.
Checklists / step-by-step plan
Checklist A: Single-node Proxmox, UI loads, login fails
- Try root@pam explicitly. Don’t trust the realm dropdown memory.
- Run journalctl -u pveproxy -n 100 and look for the exact failure reason.
- Confirm time sync: timedatectl and (if present) chronyc tracking.
- Test PAM directly: pamtester login root authenticate.
- Check if root is locked: passwd -S root.
- Test API ticket creation locally with curl (Task 15). If curl works but the GUI doesn’t, suspect the browser or proxy.
- If you front Proxmox with a reverse proxy, bypass it and test direct access to :8006.
- Only then consider restarting pveproxy/pvedaemon, and document why.
Checklist B: Clustered Proxmox, UI loads, login fails on one node
- Check time offset on the bad node vs a good node (timedatectl on both).
- Check quorum and corosync health: pvecm status.
- Verify /etc/pve is mounted: mount | grep /etc/pve.
- Tail pveproxy logs while trying to log in.
- If LDAP/AD: test realm sync from the bad node (pveum realm sync). Network segmentation issues often hit one node.
- Confirm the node is not isolated by firewall changes (Proxmox firewall and upstream ACLs).
- If the node is non-quorate: don’t make config changes. Fix quorum first.
Checklist C: External realms (LDAP/AD/OIDC) suddenly fail
- Confirm local PAM login works (root@pam) to regain control.
- Check DNS resolution for identity endpoints from the node.
- Check outbound firewall rules from Proxmox to LDAP/DC/OIDC.
- Look for TLS/trust chain changes if using LDAPS.
- Validate bind credentials haven’t expired or been rotated without updating Proxmox.
- Do not “fix” by creating random local admins. Create exactly one break-glass account, pre-approved, and monitor its use.
FAQ
1) Why does the Proxmox Web UI load if authentication is broken?
Because loading the UI is mostly static content from pveproxy. Authentication is a separate API call that hits realm backends and ticket logic.
2) What’s the difference between root, root@pam, and root@pve?
Plain root is ambiguous: it becomes root@whatever-realm-is-selected in the dropdown. root@pam is the system root account authenticated via PAM. root@pve would be a Proxmox-managed user (if created), separate from OS accounts.
3) I can SSH in as root, but the Web UI login fails. How?
SSH and Web UI might both use PAM, but Web UI can also involve 2FA, realm selection, and ticket/cookie handling. Also, the UI might be authenticating against a different realm than you think.
4) Can time drift really cause “login failed” even with correct passwords?
Yes. Tickets and TOTP are time-sensitive, and even modest drift can cause immediate rejection or weird session behavior.
5) How do I tell if it’s an LDAP/AD problem versus a Proxmox problem?
If root@pam works but user@corp-ldap fails, it’s usually LDAP/AD connectivity, TLS, DNS, or bind credentials. Confirm with pveum realm sync corp-ldap and logs.
6) Does a cluster being non-quorate cause login failures?
It can contribute, especially if /etc/pve state isn’t consistent or pmxcfs/corosync are unhealthy. It’s more common to see config operations blocked, but auth and permissions can become confusing during partitions.
7) Should I restart pveproxy when I see “login failed”?
Only after you’ve checked logs and time. Restarting can mask the underlying issue (like NTP drift or LDAP outage) and you’ll be back here later, only angrier.
8) What if the UI works when accessing https://node:8006 directly but fails behind a reverse proxy?
Then the proxy is the problem. Fix forwarded headers (Host, X-Forwarded-Proto), WebSocket upgrade handling, and avoid cookie mangling.
9) I enabled 2FA and now nobody can log in. What’s the safest recovery?
Use your documented recovery method (recovery codes, break-glass account, console access). If you don’t have one, create it after you recover—because this will happen again.
10) What’s the single most reliable “break-glass” access path?
Console access (physical/IPMI) plus a local admin path (root@pam or a local PVE admin user) that does not depend on external identity providers.
Conclusion: next steps that prevent a repeat
When Proxmox says “login failed” but the UI loads, you’re not dealing with a mystery. You’re dealing with a small set of predictable failure modes: realm mismatch, time drift, backend auth failures, cluster filesystem problems, and proxy/browser token weirdness.
Do this next:
- Add a runbook entry that starts with: realm check, journalctl -u pveproxy, time sync verification.
- Implement and test break-glass access: a local admin path that doesn’t depend on LDAP/OIDC being healthy.
- Monitor time drift on every node. Alert on NTP desync before tickets start failing (a minimal check is sketched after this list).
- If you use a reverse proxy, treat it as production code. Version it, review it, and test login flows after changes.
- In clusters, monitor quorum and corosync health. Authentication and config live in the same ecosystem of assumptions.
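A minimal drift check you can wire into cron or your monitoring agent, assuming chrony; the 0.5-second threshold is an arbitrary starting point, not a recommendation:
cr0x@server:~$ chronyc tracking | awk '/^System time/ { if ($4+0 > 0.5) { print "WARN: clock offset", $4, "seconds"; exit 1 } }'
It prints nothing and exits 0 while the offset is acceptable, which is exactly what you want from a check that runs every minute.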
The UI is polite. The logs are honest. Trust the honest one.