“Authentication failed” is the kind of message that wastes half a day because it refuses to say what failed:
your credentials, your token, your ACL, your realm, your TLS fingerprint, or your assumptions about how Proxmox talks to PBS.
In production, this error doesn’t arrive politely. It shows up at 02:13, right after a node reboot, right before a compliance audit,
and exactly when you thought backups were “set and forget.” They aren’t. Let’s make them boring again.
Fast diagnosis playbook
When PBS auth breaks, you want the shortest path to the truth. Don’t start by rotating everything. Start by deciding
whether this is (a) the wrong secret, (b) the wrong permissions, or (c) the wrong server identity (fingerprint/cert).
Everything else is garnish.
First: identify where the failure occurs (PVE side vs PBS side)
- If PVE can’t reach PBS at all: it’s networking, DNS, firewall, or the PBS API service.
- If PVE reaches PBS but gets 401/403: it’s token/user/realm/ACL.
- If PVE complains about fingerprint/certificate: it’s TLS identity mismatch.
Second: classify the HTTP-ish behavior
- 401 Unauthorized: credentials/token not accepted (wrong token, wrong user, wrong realm, token expired/disabled).
- 403 Forbidden: identity accepted, permissions denied (ACL missing, wrong path, wrong privilege).
- TLS handshake/fingerprint mismatch: you’re talking to the wrong server or the server changed identity.
Third: pick one safe action
- For 401: validate the token is being sent, belongs to the user you think, and is not restricted in a way you forgot.
- For 403: map required privileges to the exact PBS path (datastore, namespace, backup group).
- For fingerprint errors: verify the server certificate on PBS, then update the fingerprint on PVE. Do not “just click accept” on a production LAN unless you’ve verified you’re not being redirected.
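If you want that first decision in script form, here is a minimal read-only triage sketch. Assumptions (not from any official tooling): hostname pbs01, port 8007, and a PBS_TOKEN environment variable holding a header value like PBSAPIToken=backup@pbs!pve01:SECRET. Adjust to your environment; it classifies, it changes nothing.
#!/usr/bin/env bash
# Read-only triage: reachability first, then the HTTP status of an authenticated call.
set -u
HOST="pbs01"; PORT=8007                          # placeholders
TOKEN="${PBS_TOKEN:?export PBS_TOKEN first}"     # e.g. PBSAPIToken=backup@pbs!pve01:SECRET

# 1) Network path: DNS, routing, firewall, and whether the PBS proxy listens at all.
if ! nc -z -w 3 "$HOST" "$PORT"; then
  echo "no TCP path to $HOST:$PORT -> fix network/DNS/firewall or the PBS service"
  exit 1
fi

# 2) HTTP status. Note: -k skips certificate verification here;
#    PVE's fingerprint pinning is a separate, stricter check.
code=$(curl -sk -o /dev/null -w '%{http_code}' \
  -H "Authorization: $TOKEN" "https://$HOST:$PORT/api2/json/version")
case "$code" in
  200) echo "200: token and TLS path work; look at ACLs and job definitions instead" ;;
  401) echo "401: identity rejected -> token secret, token ID, user, or realm" ;;
  403) echo "403: identity accepted, permission denied -> ACL path/role/propagation" ;;
  000) echo "no HTTP response -> connection or TLS handshake problem" ;;
  *)   echo "HTTP $code: go read the PBS proxy logs" ;;
esac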
One quote worth keeping on your wall:
Paraphrased idea (Werner Vogels): you build systems assuming failures happen, then you make them recover fast and safely.
The mental model: who authenticates to what
Proxmox VE (PVE) doesn’t “log in” to PBS like a human with a web browser. It uses the PBS HTTP API.
That API call includes an authentication identity (user or token), and PBS evaluates that identity against its ACLs.
Separately, PVE pins a TLS fingerprint so that “pbs01” actually is your PBS and not whatever answered that IP today.
So “authentication failed” can mean any of these layers broke:
- Name and reachability: DNS and routing to the PBS host and port.
- TLS identity: certificate fingerprint mismatch or trust failure.
- Authentication: user/password, or API token header format, realm, token status.
- Authorization: PBS ACL path and privileges for datastore, namespace, maintenance operations.
- Capability mismatch: PVE expects a feature (namespace, owner checking, encryption) that PBS config forbids or lacks.
The good news: PBS and PVE are noisy in logs once you look in the right place. The bad news: the GUI error bubble is useless.
Treat it as an invitation to stop clicking and start reading logs like an adult.
Interesting facts and context (why this is tricky)
- Fact 1: PBS uses a path-based ACL model. Permissions can be granted at “/” or at a datastore-specific subtree, and inheritance matters.
- Fact 2: PVE pins the PBS server certificate fingerprint for the storage definition. This is a deliberate anti-MITM choice; it’s not just being picky.
- Fact 3: API tokens in Proxmox ecosystems are separate identities with their own privileges and optional restrictions. They are not just “passwords without MFA.”
- Fact 4: A 401 vs 403 distinction is operational gold. 401 means “who are you?” and 403 means “I know who you are and no.”
- Fact 5: When you restore backups, PBS can enforce ownership and permission checks differently than for backups. You might be authorized to write backups but not to read/restore them.
- Fact 6: The PBS API runs on port 8007 by default. People still accidentally point PVE at 8006 out of muscle memory, because Proxmox trains you that 8006 is “the Proxmox port.”
- Fact 7: Fingerprint changes often correlate with reinstallation, host replacement, certificate regeneration, or “someone cleaned up /etc/proxmox-backup” without understanding consequences.
- Fact 8: Namespace support in PBS introduced more granular control, and also more ways to deny access accidentally (a gift to compliance, a prank on ops).
- Fact 9: Backup verification, pruning, and garbage collection are privileged operations. A token that can run backups may still fail on maintenance tasks later.
What “authentication failed” usually really means
1) Wrong realm or wrong identity format
Proxmox identities are user@realm. If you created backup@pbs and then configured backup@pam,
you will get a failure that looks like a password typo. It isn’t.
2) Token exists, but you used the wrong token ID or secret
Tokens have two parts: the token identifier (which reads like a suffix on the username) and the token secret (shown exactly once, at creation).
It’s easy to paste the wrong one into a password field and then swear PBS is broken.
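To make the two parts concrete: the identity and the secret are concatenated into one header, and only the part after the colon is the secret. Everything below is a placeholder following this article's naming.
# identity (not secret):  backup@pbs!pve01   (user @ realm ! token-name)
# secret (shown once):    xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
# the header PBS expects for token auth glues them together with a colon:
cr0x@pve01:~$ curl -sk -H "Authorization: PBSAPIToken=backup@pbs!pve01:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" https://pbs01:8007/api2/json/version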
3) Token authenticates, ACL denies (403), GUI hides the nuance
The most common “it works in one place but not another” problem: the token can list datastores, but cannot write snapshots,
or can write but cannot prune, or can back up VMs but cannot back up containers (different content type on PVE side, different API calls).
4) Fingerprint mismatch after rebuild or IP reassignment
If the PBS server cert changed, PVE will refuse the connection unless you update the pinned fingerprint.
This is correct behavior. Treat fingerprint mismatches like you treat SSH host key changes: verify first, then update.
5) Time skew causes ticket/token weirdness
PBS and PVE are sensitive to time for sessions and validations. If one host drifted because NTP died quietly,
you can get failures that vanish after rebooting (which is not a fix; it’s a coin flip).
Joke #1: Backups are like parachutes—if you need one and it doesn’t open, you’re not “learning,” you’re “falling.”
Practical tasks (commands, output meanings, decisions)
These are the tasks I actually run when PBS auth breaks. Each includes what the output implies and what decision you make next.
Run commands on the appropriate host (PVE or PBS). Don’t improvise; follow the breadcrumbs.
Task 1: Confirm you’re even hitting the PBS API port (PVE)
cr0x@pve01:~$ nc -vz pbs01 8007
Connection to pbs01 8007 port [tcp/*] succeeded!
Meaning: Network path and port are reachable.
Decision: Move up the stack to TLS and auth. If it fails, fix DNS/routing/firewall/service before touching tokens.
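If nc isn't installed on the node, a bash built-in or curl gives you the same answer (pbs01 and 8007 as in the rest of this article):
# Pure-bash TCP probe; prints "reachable" only if the port answers within 3 seconds.
cr0x@pve01:~$ timeout 3 bash -c 'cat < /dev/null > /dev/tcp/pbs01/8007' && echo reachable
reachable
# Or let curl spell out whether it was the connection, the TLS handshake, or HTTP.
cr0x@pve01:~$ curl -vk --connect-timeout 3 -o /dev/null https://pbs01:8007/ 2>&1 | grep -E "Connected to|SSL connection|HTTP/"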
Task 2: Check PBS API service health (PBS)
cr0x@pbs01:~$ systemctl status proxmox-backup-proxy
● proxmox-backup-proxy.service - Proxmox Backup Server Proxy
Loaded: loaded (/lib/systemd/system/proxmox-backup-proxy.service; enabled)
Active: active (running) since Fri 2025-12-26 00:11:04 UTC; 2h 9min ago
Meaning: The proxy is running; PBS should accept API connections.
Decision: If it’s down, restart and check logs before blaming authentication.
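If it is down, the restart plus a quick journal read is the whole move. PBS also runs a privileged backend (proxmox-backup.service) alongside the proxy, and a broken certificate or missing datastore mount usually announces itself in the first lines after startup:
cr0x@pbs01:~$ systemctl restart proxmox-backup proxmox-backup-proxy
cr0x@pbs01:~$ journalctl -u proxmox-backup -u proxmox-backup-proxy -n 50 --no-pager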
Task 3: Tail PBS proxy logs during a failing attempt (PBS)
cr0x@pbs01:~$ journalctl -u proxmox-backup-proxy -f
Dec 26 02:13:21 pbs01 proxmox-backup-proxy[1872]: authentication failure; rhost=10.10.20.11 user=backup@pbs msg=invalid credentials
Dec 26 02:13:21 pbs01 proxmox-backup-proxy[1872]: failed login attempt: backup@pbs from 10.10.20.11
Meaning: PBS received the request and rejected credentials (likely 401).
Decision: Focus on identity format, realm, token secret, and whether PVE is using a token or password.
Task 4: Check for “permission denied” vs “invalid credentials” (PBS)
cr0x@pbs01:~$ journalctl -u proxmox-backup-proxy --since "30 minutes ago" | tail -n 20
Dec 26 02:18:09 pbs01 proxmox-backup-proxy[1872]: user 'backup@pbs' failed to access /datastore/vmstore: permission check failed
Dec 26 02:18:09 pbs01 proxmox-backup-proxy[1872]: api request failed: 403 Forbidden
Meaning: Identity is valid; authorization failed (403).
Decision: Fix ACLs on the datastore/namespace path; do not rotate tokens.
Task 5: Verify NTP/time sync on both sides (PVE and PBS)
cr0x@pve01:~$ timedatectl
Local time: Fri 2025-12-26 02:20:31 UTC
Universal time: Fri 2025-12-26 02:20:31 UTC
RTC time: Fri 2025-12-26 02:20:31
Time zone: Etc/UTC (UTC, +0000)
System clock synchronized: yes
NTP service: active
Meaning: Time is sane here.
Decision: If System clock synchronized: no on either host, fix NTP first. Time skew creates fake “auth” problems.
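If either side reports System clock synchronized: no, turn sync back on and confirm which daemon owns it. Assumption: the host runs either chrony or systemd-timesyncd (recent installs typically ship chrony); check both and adapt.
cr0x@pve01:~$ timedatectl set-ntp true
cr0x@pve01:~$ systemctl is-active chrony systemd-timesyncd 2>/dev/null
active
inactive
# If chrony is the active one, this shows offset and stratum at a glance:
cr0x@pve01:~$ chronyc tracking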
Task 6: Confirm the fingerprint PVE has pinned (PVE)
cr0x@pve01:~$ grep -B4 fingerprint /etc/pve/storage.cfg
pbs: pbs-vmstore
server pbs01
datastore vmstore
username backup@pbs
fingerprint 8A:12:6F:9C:AA:6E:9D:5A:5B:22:1C:77:41:9A:0E:1F:74:54:2B:11:3C:8D:0E:4F:7A:62:B5:90:AB:38:E6:2D
Meaning: PVE will only trust a PBS presenting that fingerprint.
Decision: If PBS cert changed, update this fingerprint after verifying the new one on PBS.
Task 7: Obtain the actual current PBS certificate fingerprint (PBS)
cr0x@pbs01:~$ openssl x509 -in /etc/proxmox-backup/proxy.pem -noout -fingerprint -sha256
SHA256 Fingerprint=8A:12:6F:9C:AA:6E:9D:5A:5B:22:1C:77:41:9A:0E:1F:74:54:2B:11:3C:8D:0E:4F:7A:62:B5:90:AB:38:E6:2D
Meaning: This is what PVE should pin.
Decision: If it differs from storage.cfg, update PVE’s storage definition fingerprint (and ask why the cert changed).
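If your PBS version ships the cert subcommand, it prints the same information without reaching for openssl; treat the exact output layout as version-dependent.
cr0x@pbs01:~$ proxmox-backup-manager cert info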
Task 8: Check you’re not hitting the wrong host (PVE)
cr0x@pve01:~$ getent hosts pbs01
10.10.20.50 pbs01
Meaning: DNS resolves pbs01 to an IP.
Decision: If this IP recently changed, confirm the new host is actually PBS and not a recycled address or load balancer.
Task 9: Verify TLS identity over the wire (PVE)
cr0x@pve01:~$ echo | openssl s_client -connect pbs01:8007 -servername pbs01 2>/dev/null | openssl x509 -noout -subject -fingerprint -sha256
subject=CN = pbs01
SHA256 Fingerprint=8A:12:6F:9C:AA:6E:9D:5A:5B:22:1C:77:41:9A:0E:1F:74:54:2B:11:3C:8D:0E:4F:7A:62:B5:90:AB:38:E6:2D
Meaning: The server you reached presents a cert with CN and fingerprint shown.
Decision: If fingerprint differs from what PBS reports locally, you’re being redirected or there’s a proxy in between.
Task 10: Confirm the PBS user exists and is enabled (PBS)
cr0x@pbs01:~$ proxmox-backup-manager user list
┌────────────┬────────┬────────┬──────────┐
│ userid     │ enable │ expire │ comment  │
╞════════════╪════════╪════════╪══════════╡
│ root@pam   │ true   │        │          │
│ backup@pbs │ true   │        │ PVE jobs │
└────────────┴────────┴────────┴──────────┘
Meaning: The user exists and is enabled.
Decision: If disabled/expired, fix that. If the user doesn’t exist, stop: you’re authenticating to a fantasy identity.
Task 11: Confirm the token exists and is enabled (PBS)
cr0x@pbs01:~$ proxmox-backup-manager user token list backup@pbs
┌─────────┬────────┬────────┬────────────────┐
│ tokenid │ enable │ expire │ comment        │
╞═════════╪════════╪════════╪════════════════╡
│ pve01   │ true   │        │ PVE node token │
└─────────┴────────┴────────┴────────────────┘
Meaning: The token backup@pbs!pve01 exists and is enabled.
Decision: If the token is missing/disabled/expired, re-issue a token and update PVE. Don’t keep guessing secrets.
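Re-issuing is one command on PBS. The secret is printed exactly once, so capture it into your secret manager immediately; the output shape shown here is approximate, and the token name pve01 follows this article's convention.
# Remove the stale token first if it still exists, then generate a fresh one.
cr0x@pbs01:~$ proxmox-backup-manager user delete-token backup@pbs pve01
cr0x@pbs01:~$ proxmox-backup-manager user generate-token backup@pbs pve01
Result: {
"tokenid": "backup@pbs!pve01",
"value": "REDACTED-SHOWN-ONLY-ONCE"
}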
Task 12: Inspect ACLs relevant to the datastore (PBS)
cr0x@pbs01:~$ proxmox-backup-manager acl list
┌────────────────────┬──────────────────┬─────────────────┬───────────┐
│ path               │ ugid             │ role            │ propagate │
╞════════════════════╪══════════════════╪═════════════════╪═══════════╡
│ /datastore/vmstore │ backup@pbs       │ DatastoreBackup │ true      │
│ /datastore/vmstore │ backup@pbs!pve01 │ DatastoreBackup │ true      │
└────────────────────┴──────────────────┴─────────────────┴───────────┘
Meaning: The identity is assigned a role on the datastore path.
Decision: If ACL is absent or points to the wrong path, add it. If you only granted at “/” but disabled propagation, you may have cut yourself off.
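Adding the missing grant is a one-liner; path and role below match this article's example (the node token gets DatastoreBackup on the datastore, nothing broader):
cr0x@pbs01:~$ proxmox-backup-manager acl update /datastore/vmstore DatastoreBackup --auth-id 'backup@pbs!pve01'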
Task 13: Confirm datastore exists and is healthy (PBS)
cr0x@pbs01:~$ proxmox-backup-manager datastore list
┌─────────┬──────────────────┬─────────┬───────┐
│ name    │ path             │ comment │ state │
╞═════════╪══════════════════╪═════════╪═══════╡
│ vmstore │ /mnt/pbs/vmstore │         │ ok    │
└─────────┴──────────────────┴─────────┴───────┘
Meaning: The datastore exists and PBS thinks it’s OK.
Decision: If missing or in error, fix the datastore mount/permissions first; auth errors can be secondary noise.
Task 14: Test login to PBS API using a token from a node (PVE)
cr0x@pve01:~$ curl -sk -H "Authorization: PBSAPIToken=backup@pbs!pve01:REDACTED" https://pbs01:8007/api2/json/version
{"data":{"version":"3.2-1","release":"bookworm"}}
Meaning: The token is valid for API access and TLS is working (because you got a response).
Decision: If this fails with 401, your token/secret/user realm is wrong. If it fails with TLS errors, fix fingerprint/cert trust.
Task 15: Spot 403 permission failures with a targeted API call (PVE)
cr0x@pve01:~$ curl -sk -H "Authorization: PBSAPIToken=backup@pbs!pve01:REDACTED" https://pbs01:8007/api2/json/admin/datastore/vmstore/status
{"errors":"permission check failed"}
Meaning: Token authenticates but lacks privilege for that endpoint/path.
Decision: Adjust roles/ACLs for the token (preferred) or user (less preferred) on the correct datastore path.
Task 16: Check PVE storage config correctness (PVE)
cr0x@pve01:~$ pvesm status
Name Type Status Total Used Available %
local dir active 19684264 6021132 13012000 30.59%
pbs-vmstore pbs active 0 0 0 0.00%
Meaning: PVE sees PBS storage as active. If it’s inactive, check auth/fingerprint.
Decision: If it’s active but jobs fail, focus on backup job permissions (snapshot/upload/prune) rather than basic connectivity.
Task 17: Validate which identity PVE is using for the PBS storage (PVE)
cr0x@pve01:~$ awk '/^pbs: /{f=1} f{print} /^$/{f=0}' /etc/pve/storage.cfg
pbs: pbs-vmstore
datastore vmstore
server pbs01
username backup@pbs
fingerprint 8A:12:6F:9C:AA:6E:9D:5A:5B:22:1C:77:41:9A:0E:1F:74:54:2B:11:3C:8D:0E:4F:7A:62:B5:90:AB:38:E6:2D
Meaning: You can sanity-check the user realm here.
Decision: If username is wrong realm (pam vs pbs) or a typo, fix it and re-test before touching ACLs.
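You can correct the identity without hand-editing storage.cfg; pvesm set updates the entry in place (storage name pbs-vmstore as in this example; verify the option is accepted on your PVE version before scripting it):
cr0x@pve01:~$ pvesm set pbs-vmstore --username 'backup@pbs!pve01'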
Task 18: Check for login failures at the auth backend level (PBS)
cr0x@pbs01:~$ journalctl --since "1 hour ago" | grep -E "failed login|authentication failure" | tail -n 10
Dec 26 02:13:21 pbs01 proxmox-backup-proxy[1872]: authentication failure; rhost=10.10.20.11 user=backup@pbs msg=invalid credentials
Meaning: Confirms repeated attempts and the user value PBS sees.
Decision: If PBS logs show a different user than you expect, you’ve configured PVE wrong (or you’re testing from the wrong host).
Tokens and ACLs: the permission stack that bites
User vs token: pick tokens for machines
Use API tokens for PVE nodes and automation. Use passwords for humans. Tokens reduce blast radius and eliminate the
“someone rotated root@pam password to satisfy a policy” surprise.
A typical clean model looks like this:
- Create a dedicated PBS user: backup@pbs (realm pbs, not pam).
- Create one token per PVE node: backup@pbs!pve01, backup@pbs!pve02, etc.
- Grant ACL roles on the specific datastore (and namespace if used), not at “/”.
- Grant only what the node needs: backup, maybe verify, maybe prune; decide consciously. A CLI sketch follows this list.
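Here is that model as PBS CLI calls, run on the PBS host. Datastore vmstore and the node names are this article's examples, and the printed token secrets must go straight into your secret manager.
# Dedicated machine user in the pbs realm (not pam).
proxmox-backup-manager user create backup@pbs

# One token per node, each granted only DatastoreBackup on the one datastore.
for node in pve01 pve02; do
  proxmox-backup-manager user generate-token backup@pbs "$node"
  proxmox-backup-manager acl update /datastore/vmstore DatastoreBackup \
    --auth-id "backup@pbs!$node"
done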
Roles aren’t decoration; they’re the contract
If you want backups to run but not allow a compromised node to delete history, don’t hand it broad admin roles.
The correct approach is to explicitly grant a backup role and keep prune/GC restricted to a separate maintenance identity.
That split is dull. Dull is good.
Where people get hurt: they test with root, it works, then “tighten permissions later,” and later arrives as a 403
in the middle of the night.
Namespace and ownership gotchas
PBS can segment data with namespaces. Great for multi-tenant setups, or for separating environments (prod vs dev) without extra datastores.
Also great for creating a token that can authenticate but sees nothing.
If your job targets a namespace, your ACL must match that namespace path. Granting permissions to the datastore root might not be enough
depending on how you’ve structured it and whether propagate is set.
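If the job writes into a namespace, grant on the namespaced path rather than only the datastore root; on current PBS releases the namespace is appended to the ACL path. The namespace name prod here is hypothetical, and the exact path syntax is worth confirming against your PBS version's documentation.
cr0x@pbs01:~$ proxmox-backup-manager acl update /datastore/vmstore/prod DatastoreBackup --auth-id 'backup@pbs!pve01'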
Fingerprints and TLS: when security does its job
Fingerprint errors feel like bureaucracy until you remember the alternative: silently sending your backups to an attacker’s host
because someone ARP-spoofed your VLAN or because an IP got reused. PVE pins the PBS cert fingerprint so it can detect “this is not the same server.”
That’s not paranoia. That’s Tuesday.
Legit reasons fingerprints change
- PBS was reinstalled or replaced.
- The certificate was regenerated intentionally.
- The proxy.pem file was replaced during restore/migration.
- Someone restored an old filesystem snapshot of /etc/proxmox-backup.
- You moved PBS behind a TLS-terminating reverse proxy (usually a bad idea for this use case).
Bad reasons fingerprints change
- “We cleaned up certificates because security.”
- “We redeployed the VM from template and assumed it would be the same.”
- “We changed the hostname and didn’t think it mattered.”
The correct workflow is boring:
verify the fingerprint on PBS locally, compare to what PVE pins, then update the storage definition
if the change is legitimate. If you can’t explain the change, stop and investigate. Backups are a security boundary.
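A small read-only comparison keeps that workflow honest: it prints what the wire presents next to what PVE has pinned and leaves the decision to you. Hostname, port, and storage name follow this article's examples; run it from the PVE node.
#!/usr/bin/env bash
# Compare the certificate fingerprint on the wire with the one pinned in storage.cfg.
set -u
HOST="pbs01"; STORAGE="pbs-vmstore"

wire=$(echo | openssl s_client -connect "$HOST:8007" -servername "$HOST" 2>/dev/null \
  | openssl x509 -noout -fingerprint -sha256 | cut -d= -f2 | tr 'A-F' 'a-f')
pinned=$(awk -v s="pbs: $STORAGE" '$0==s{f=1} f && /fingerprint/{print $2; exit}' /etc/pve/storage.cfg \
  | tr 'A-F' 'a-f')

echo "wire:   $wire"
echo "pinned: $pinned"
if [ "$wire" = "$pinned" ]; then echo "MATCH"; else echo "MISMATCH: verify on the PBS console before re-pinning"; fi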
Joke #2: TLS fingerprints are like airport security—annoying until the day you’re glad someone asked questions.
Three corporate mini-stories from the backup trenches
Incident #1 (wrong assumption): “It’s on the same VLAN, so it’s fine”
A mid-sized company ran PVE clusters and a single PBS VM. It sat on a “trusted” infrastructure VLAN. The team assumed that meant
no one would ever impersonate PBS, so the fingerprint warning in PVE was treated as a nuisance.
During a network cleanup, an IP address used by PBS was briefly reassigned to a different host. That host wasn’t malicious; it was just there first
when DNS caught up. PVE started throwing authentication and fingerprint errors. Someone, under pressure, updated the fingerprint to “make backups green”
without verifying the cert on the PBS side.
Backups started “working” again—except they were sent to the wrong machine that had no datastore mounted, so uploads failed mid-stream and jobs retried forever.
The monitoring was keyed on “job started,” not “backup finished,” so the dashboard stayed pleasantly optimistic.
The fix was simple but humbling: revert the fingerprint, correct DNS/IP, and treat fingerprint prompts like SSH host key changes:
verify out-of-band, then accept. They also changed monitoring to alert on successful completion and on datastore growth patterns.
Incident #2 (optimization that backfired): “One token for the whole cluster”
Another team wanted to reduce configuration sprawl. So they created a single PBS token and pasted it into every PVE node’s storage config.
It worked, and it was fast. It also meant every node had the same effective identity.
Later, they tried to tighten permissions by restricting the token. But they needed different permissions for different nodes:
one node handled compliance workloads and required verify permissions; another node was a disposable dev host where prune wasn’t allowed.
With one shared token, permission changes became political and slow.
Then a node was decommissioned. Nobody remembered to rotate the shared token, because “it’s just backups.” That node’s disks were sold off,
and the token secret lived on in a forgotten config backup. Months later, during an audit, they realized they had no clean way to prove the secret was gone.
The backfiring optimization was the shared identity. The fix: one token per node, limited ACLs per token, and a rotation practice tied to node lifecycle.
Slightly more typing; dramatically less existential dread.
Incident #3 (boring but correct practice): the checklist that saved the day
A regulated environment had a PBS hardware failure. They restored PBS from a known-good recovery procedure: reinstall OS,
reattach storage, restore PBS config, validate certificate fingerprint, validate user/token inventory, then re-enable PVE jobs.
It was a dull runbook, full of command outputs and “expected values.”
During restore, the PBS certificate regenerated. That’s normal. The runbook explicitly said:
“Before updating fingerprints on PVE, obtain fingerprint from /etc/proxmox-backup/proxy.pem and validate console access.”
So they did. No guessing. No clicking through warnings.
They updated fingerprints on PVE nodes, then tested with a single VM backup, then expanded to the whole fleet.
The entire outage was contained to a predictable window. The best part: the postmortem had almost no drama, because the process already answered the questions.
This is why you write runbooks. Not because you enjoy paperwork, but because your future self is busy, tired, and one bad decision away from turning a recovery
into an incident.
Common mistakes: symptom → root cause → fix
1) Symptom: “authentication failed” immediately when adding PBS storage
Root cause: Wrong realm (@pam vs @pbs) or wrong username format.
Fix: Verify user exists on PBS (proxmox-backup-manager user list) and set username backup@pbs accordingly.
2) Symptom: 401 Unauthorized in PBS proxy logs
Root cause: Token secret wrong, token disabled/expired, or token ID mismatch.
Fix: List tokens (proxmox-backup-manager user token list backup@pbs), re-issue if needed, update PVE, then test via curl.
3) Symptom: 403 Forbidden / permission check failed
Root cause: Missing ACL on /datastore/<name> or wrong namespace/propagation.
Fix: Add ACL on the exact path with appropriate role and propagate; confirm via proxmox-backup-manager acl list.
4) Symptom: Fingerprint mismatch prompt after PBS reboot/reinstall
Root cause: PBS certificate changed or PVE is reaching a different host.
Fix: Get fingerprint from PBS locally with openssl x509 -in /etc/proxmox-backup/proxy.pem ..., validate DNS/IP, update PVE fingerprint only after verifying.
5) Symptom: Storage shows active, but backup jobs fail during prune or verify
Root cause: Token can write backups but lacks maintenance privileges.
Fix: Separate identities: one token for backup writes, another for prune/verify, or explicitly grant required roles for maintenance endpoints.
6) Symptom: Works from one PVE node, fails from another
Root cause: Per-node token differences, or different storage.cfg entries, or firewall rules by source IP.
Fix: Compare /etc/pve/storage.cfg across nodes; test curl from each node; check PBS logs for rhost and user/token.
7) Symptom: After enabling 2FA for a user, backups start failing
Root cause: You used a human account for automation. 2FA breaks non-interactive flows unless token-based auth is used.
Fix: Switch PVE to PBS API tokens for machine access. Keep 2FA for humans.
8) Symptom: “permission denied” when listing snapshots/contents
Root cause: ACL allows write but disallows read/listing or browsing on the relevant path/namespace.
Fix: Ensure roles cover the operations you need (backup, read/list, restore). Test via API calls that list groups/snapshots.
9) Symptom: Everything broke after “security hardening”
Root cause: Cert regeneration or reverse proxy insertion without updating fingerprints, or ACL tightened at “/” without propagation.
Fix: Undo the proxy, re-pin fingerprints, and reapply ACLs at datastore scope with propagate set intentionally.
Checklists / step-by-step plan
Plan A: fix it fast without making it worse (recommended)
- Confirm reachability: nc -vz pbs01 8007 from the failing PVE node.
- Read PBS logs while reproducing: journalctl -u proxmox-backup-proxy -f.
- Classify the failure:
  - invalid credentials / 401 → token/user/realm.
  - permission check failed / 403 → ACL path/role/propagation.
  - fingerprint mismatch → verify cert and update PVE pin.
- Verify identity inventory on PBS: proxmox-backup-manager user list and proxmox-backup-manager user token list backup@pbs.
- Verify ACL on PBS: proxmox-backup-manager acl list and confirm the correct datastore path.
- Verify fingerprint:
  - PBS local: openssl x509 -in /etc/proxmox-backup/proxy.pem -noout -fingerprint -sha256
  - PVE pinned: grep fingerprint /etc/pve/storage.cfg
- Test via curl from PVE using the token header to isolate GUI issues.
- Fix exactly one thing, retest, then proceed. No “rotate everything and hope.”
Plan B: rebuild credentials cleanly (when you’ve lost the thread)
- Create (or re-create) a dedicated PBS user for PVE jobs (backup@pbs), enabled, no expiry.
- Create a new token per node; record the secret once; store it in your secret manager.
- Assign an ACL on /datastore/<name> to the token identity directly.
- Update the PVE storage config to use the token and the correct fingerprint (see the pvesm sketch after this plan).
- Run a single VM backup test; confirm PBS logs show the correct token/user and no 403s.
- Only then re-enable the full backup schedule.
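The PVE-side half of Plan B can be done with pvesm instead of editing storage.cfg by hand. All values are placeholders; on some PVE versions --password prompts instead of accepting the secret inline, and the secret itself lands under /etc/pve/priv/, not in storage.cfg.
cr0x@pve01:~$ pvesm set pbs-vmstore \
--username 'backup@pbs!pve01' \
--password 'REDACTED-TOKEN-SECRET' \
--fingerprint '8A:12:6F:9C:AA:6E:9D:5A:5B:22:1C:77:41:9A:0E:1F:74:54:2B:11:3C:8D:0E:4F:7A:62:B5:90:AB:38:E6:2D'
cr0x@pve01:~$ pvesm status | grep pbs-vmstore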
Plan C: fingerprint change response (treat it like an incident)
- Stop and verify the DNS/IP mapping (getent hosts pbs01).
- Verify the fingerprint locally on PBS via console access.
- Compare to what PVE pins; if different, document why.
- Update PVE fingerprint and retest connectivity.
- If you can’t explain the change, assume compromise or misrouting until proven otherwise.
FAQ
1) Is “authentication failed” always a bad password or token?
No. It can be TLS fingerprint mismatch, wrong realm, or even a permission denial that got flattened into a generic GUI message.
Always check PBS proxy logs to see whether it’s 401 or 403.
2) What’s the difference between user/password and API token auth for PBS?
User/password is interactive-friendly and human-oriented. API tokens are meant for automation, can be revoked independently,
and are safer to scope with ACLs. For PVE-to-PBS, use tokens.
3) Why does PVE care about PBS fingerprint? We’re on an internal network.
Internal networks are where most “trusted” mistakes happen: IP reuse, DNS drift, someone plugging in a test VM, or a compromised host.
Fingerprint pinning prevents silently backing up to the wrong endpoint.
4) I updated the fingerprint and now it works. Are we done?
Only if you can explain why it changed. If it changed because PBS was rebuilt, fine. If it changed “mysteriously,” investigate DNS, ARP, proxies,
and whether the PBS host was replaced or restored from snapshot.
5) Why do I get 403 when backups run but prune fails?
Backup upload and prune/GC/verify are different operations. Your token may have write privileges but not maintenance privileges.
Split duties: backup token for writes, maintenance token for pruning/verifying.
6) Can I use root@pam everywhere to avoid permission issues?
You can, and you’ll regret it. Root is a great troubleshooting identity and a terrible long-term automation identity.
Use least privilege tokens and datastore-scoped ACLs.
7) What’s the quickest way to confirm the token works outside the GUI?
Use curl against /api2/json/version with the PBSAPIToken=... header. If that returns JSON, auth and TLS are good.
If it fails, the error message becomes more specific than the GUI.
8) My PBS user exists, my token exists, ACL looks right, but I still get 403. What now?
Confirm the ACL path matches the actual target: correct datastore name, namespace, and whether propagation is enabled.
Also confirm you’re granting to the right identity: backup@pbs vs backup@pbs!pve01.
9) Does time skew really cause auth errors here?
Yes, especially around ticket/session validation and anything with expiry semantics. If you see intermittent failures after reboots,
check timedatectl on both ends before blaming tokens.
10) Should I put PBS behind a reverse proxy with my own cert?
Generally no. It introduces extra moving parts and breaks the mental model of “PVE pins the PBS cert identity.”
If you must, treat it as an architecture change and plan fingerprint management deliberately.
Conclusion: next steps that keep you out of pager jail
Fixing PBS “authentication failed” isn’t about heroics. It’s about refusing to guess. Start with logs, classify 401 vs 403 vs TLS,
and only then touch tokens, ACLs, or fingerprints.
Practical next steps:
- Create a dedicated PBS user and one token per PVE node. Kill shared tokens unless you enjoy rotating secrets in bulk.
- Grant datastore-scoped ACLs with deliberate propagation. Avoid “/” grants that turn into accidental superpowers.
- Document and monitor fingerprint changes like SSH host key changes: verify, then update, and always record why.
- Split duties: backup-write token vs prune/verify token. Least privilege isn’t a religion; it’s a containment strategy.
- Add monitoring on successful backup completion and datastore growth, not just “job ran.”
When backups are healthy, they are boring. Aim for boring. Production likes boring.