Proxmox “backup storage not available on node”: why “shared” isn’t shared

You schedule backups, go to bed, and wake up to the kind of alert that makes coffee taste like regret: “backup storage not available on node”. You check the Proxmox GUI. Storage says “Shared”. Your brain says “So it’s available everywhere.” Reality says “Cute.”

This error is Proxmox being blunt: the node running the backup can’t use that storage right now. The “Shared” checkbox is not a magic distributed filesystem fairy. It’s metadata and assumptions. The fix is to prove those assumptions on every node, in the order that actually matters: mount, reachability, permissions, identity mapping, and config consistency.

What “shared” actually means in Proxmox (and what it doesn’t)

In Proxmox VE, a “storage” is a configuration object defined in /etc/pve/storage.cfg. That config can be clustered (replicated across nodes via pmxcfs) or local to one node. The “shared” flag is Proxmox’s way of saying: “I expect this storage to be accessible from multiple nodes, and I’ll treat it accordingly when placing VM disks, doing migrations, and running backups.”

It does not mean Proxmox will:

  • Mount NFS/CIFS for you on every node.
  • Create identical paths across nodes.
  • Fix permissions, UID/GID mapping, or root squashing.
  • Ensure the network route exists from every node.
  • Keep your out-of-band systemd mount units in sync.

The Proxmox GUI checkbox is a contract. You signed it. Now you need to fulfill it.

There are two common interpretations of “shared” in real life:

  1. Storage-level shared: NFS/CIFS, iSCSI+LVM, FC SAN, Ceph, Gluster, PBS (in a sense). Multiple nodes can access the same backend.
  2. Config-level shared: the storage definition exists cluster-wide so any node can try to use it.

The error you’re seeing happens when the second is true (or at least configured), but the first is false at runtime.

Opinionated rule: if it’s “shared”, every node must be able to run touch inside it without guessing. That means: mount is present, path exists, permissions allow write, and the backend is reachable. Anything less is a future incident.

One dry little truth: storage systems don’t care what your GUI checkbox says. They care what your kernel can mount and your process can write.

Fast diagnosis playbook

When you need answers fast, don’t wander. Run this like a checklist. The goal is to find where “shared” breaks: config, mount, network, permissions, or identity.

First: confirm what node is failing and what storage Proxmox thinks it’s using

  1. Look at the backup job history: which node executed it? If you have a cluster, jobs can run on multiple nodes depending on where the VM is.
  2. Identify the storage ID (e.g., backup-nfs, pbs01).
  3. Check that Proxmox reports it as “active” on that node.

Second: prove the mount/path exists on the failing node

  1. pvesm status and pvesm path for that storage.
  2. findmnt for the mountpoint.
  3. Create a test file as root in the target directory.

Third: isolate network vs permissions vs identity mapping

  1. Network: ping, nc (for PBS), showmount (for NFS), SMB probe for CIFS.
  2. Permissions: try touch, check ownership and mode, inspect NFS export options (root_squash and friends).
  3. Identity: confirm the backup process runs as root (vzdump does), and check whether the remote side accepts root or squashes it to another user.

Fourth: check cluster config consistency and split-brain symptoms

  1. pvecm status for quorum.
  2. Verify /etc/pve/storage.cfg content is identical across nodes (it usually is, unless pmxcfs is unhappy or someone edited local files incorrectly).
  3. Check time sync; Kerberos-based SMB and some TLS setups get spicy when clocks drift.

Stop early when you find the first broken layer. Fixing permissions won’t help if the mount never existed. Rebooting won’t help if you’re pointing to the wrong DNS name from one node.
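
If you want that playbook as one lazy-but-honest script, here is a minimal sketch under this article's assumptions (storage ID backup-nfs, mountpoint /mnt/pve/backup-nfs); it stops at the first broken layer, which is the whole point:

#!/bin/bash
# triage.sh: run on the failing node; exit at the first broken layer
set -u
STORAGE="backup-nfs"        # assumption: this article's storage ID
MP="/mnt/pve/backup-nfs"    # assumption: this article's mountpoint

# Layer 1: what does Proxmox think the status is?
pvesm status | awk -v s="$STORAGE" '$1 == s { print "pvesm status:", $3 }'

# Layer 2: is it a real mount, not just a directory?
findmnt -n "$MP" >/dev/null || { echo "FAIL: $MP is not a mountpoint"; exit 1; }

# Layer 3: can root actually write there?
touch "$MP/.triage-test" 2>/dev/null || { echo "FAIL: cannot write to $MP"; exit 1; }
rm -f "$MP/.triage-test"
echo "OK: $STORAGE mounted and writable at $MP"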

Facts and history that explain the trap

  • Fact 1: Proxmox’s cluster filesystem (pmxcfs) is a user-space, replicated config store. It distributes configuration, not mounts or kernel state.
  • Fact 2: The “shared” attribute predates many people’s current expectation of “cloud semantics”. It’s from the era where admins knew an NFS mount was their job, not the hypervisor’s.
  • Fact 3: In Linux, a mount is per-node kernel state. No cluster config file can “share” a mount unless you deploy it on each node.
  • Fact 4: NFS’s root_squash behavior is a security default that frequently collides with backup software running as root. It’s not a Proxmox bug; it’s your security policy meeting your assumptions.
  • Fact 5: CIFS/SMB “permissions” are a multi-layer cake: server ACLs, share permissions, client mount options, and sometimes ID mapping. It’s impressive when it works.
  • Fact 6: Proxmox Backup Server (PBS) is not a filesystem mount; it’s an API-backed datastore with chunking and dedup. Availability errors there are often network/TLS/auth, not “mount missing”.
  • Fact 7: Proxmox’s storage plugins often run checks by attempting to access paths or perform operations; “not available” can mean “can’t stat directory”, “wrong content type”, or “backend unreachable”.
  • Fact 8: Systemd changed the game for mounts: x-systemd.automount can make mounts “lazy”, which is great for boot speed and terrible for time-bound backup jobs if misconfigured.

One quote that belongs in every on-call handbook:

“Hope is not a strategy.” — paraphrased idea attributed to many operations leaders

Shared storage is one of those places where hope shows up wearing a “works on my node” T-shirt.

Joke #1: “Shared” storage is like “shared” responsibility in a postmortem: everyone agrees it exists, and nobody is sure who owns it.

Why the storage is “not available”: the real failure modes

1) The storage definition is cluster-wide, but the backend is not mounted on every node

This is the classic. The storage is defined as dir or NFS/CIFS, but only one node has the actual mount or directory. Another node tries to run a backup job and finds an empty directory, a missing mountpoint, or a local path that is not what you think it is.
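
One command catches this from the failing node: ask findmnt which filesystem actually backs the path (mountpoint from this article's examples). If it prints ext4 instead of nfs4, your “NFS storage” is a local directory wearing a costume:

cr0x@pve2:~$ findmnt -no FSTYPE -T /mnt/pve/backup-nfs
ext4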

2) The mount exists, but it’s mounted differently (options, versions, paths)

NFSv3 on one node and NFSv4 on another. Different rsize/wsize. Different vers=. Different credential cache for SMB. Or worse: one node mounted a different export with the same mountpoint name. The GUI doesn’t scream; it just fails later.
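
Proxmox clusters normally distribute root SSH keys between nodes, so a loop like this (node names are this article's examples) makes option drift visible in seconds:

cr0x@pve1:~$ for n in pve1 pve2 pve3; do echo "== $n"; ssh "$n" findmnt -no SOURCE,FSTYPE,OPTIONS /mnt/pve/backup-nfs; done
== pve1
nas01:/exports/pve-backups nfs4 rw,relatime,vers=4.2,hard,timeo=600,retrans=2
== pve2
nas01:/exports/pve-backups nfs  rw,relatime,vers=3,soft,timeo=30
== pve3

An empty section (pve3 above) means not mounted at all. A vers= mismatch means “works differently”, which eventually means “fails differently”.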

3) Permissions: root squashed, ACL mismatch, or wrong owner

VZDump and many Proxmox operations run as root. If your NFS server maps root to nobody and your directory isn’t writable for that user, Proxmox gets an I/O or permission error and reports “not available” or a backup failure. You might see the storage as “active”, but writes fail.

4) DNS/routing asymmetry between nodes

Node A can resolve nas01 to the right IP. Node B resolves it to an old IP, a different VLAN, or a dead interface. Or one node routes through a firewall that blocks NFS ports. Storage “works” until a job lands on the wrong node.
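
Same trick, one layer down: compare name resolution from every node (hypothetical node names again). The odd answer out is your culprit:

cr0x@pve1:~$ for n in pve1 pve2 pve3; do printf "%s: " "$n"; ssh "$n" getent hosts nas01; done
pve1: 10.20.30.50   nas01
pve2: 10.20.30.50   nas01
pve3: 10.20.31.7    nas01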

5) Cluster state issues: quorum loss or pmxcfs weirdness

If a node loses quorum, some cluster operations are restricted. Storage config might be readable but changes may not propagate the way you expect. Also, a node with a sick cluster filesystem can present stale config. It’s rarer, but it’s real.

6) Backup target isn’t “storage” in the filesystem sense (PBS)

PBS failures look like storage failures in the GUI because Proxmox treats it as a storage backend. But the root causes are different: expired fingerprints, TLS trust issues, auth tokens, datastore removed, or network port blocked.

7) Boot ordering and “nofail” mounts create a time bomb

nofail in /etc/fstab lets the node boot even if the NAS is down. Great. But now the node is “up” and can schedule backups, while the backup directory is actually just an empty local folder. That’s not a backup; that’s performance art.

Joke #2: A backup that writes to an unmounted directory is Schrödinger’s disaster: it both succeeded and failed until you need to restore.

Practical tasks (commands, outputs, decisions)

These are the tasks I run when diagnosing “backup storage not available on node”. Each one includes a realistic command, a plausible snippet of output, and what decision you make from it. Run them on the failing node first, then compare to a known-good node.

Task 1: Identify storage status from Proxmox’s point of view

cr0x@pve2:~$ pvesm status
Name             Type     Status           Total            Used       Available        %
local             dir     active       98.00GiB        21.13GiB        71.87GiB   21.56%
backup-nfs        nfs     inactive            0B             0B              0B    0.00%
pbs01             pbs     active             n/a            n/a             n/a      n/a

What it means: Proxmox thinks backup-nfs is inactive on this node. That’s not a backup problem; it’s a mount/reachability problem.

Decision: Skip VM-level troubleshooting. Fix the storage backend first.

Task 2: Confirm the storage configuration exists and is marked shared

cr0x@pve2:~$ grep -A6 -n "backup-nfs" /etc/pve/storage.cfg
18:nfs: backup-nfs
19:        export /exports/pve-backups
20:        path /mnt/pve/backup-nfs
21:        server nas01
22:        content backup
23:        prune-backups keep-last=7
24:        shared 1

What it means: The storage is configured cluster-wide as NFS, content type “backup”, and “shared 1”.

Decision: Now prove the path is actually mounted and writable.

Task 3: Ask Proxmox where it expects the path to be

cr0x@pve2:~$ pvesm path backup-nfs
/mnt/pve/backup-nfs

What it means: That’s the mountpoint Proxmox will use.

Decision: Validate kernel mount state at that path.

Task 4: Verify the mount is real (not just an empty directory)

cr0x@pve2:~$ findmnt /mnt/pve/backup-nfs

What it means: No output usually means “not mounted”. If it returns an NFS line, you’re mounted.

Decision: If not mounted, check /etc/fstab and systemd units; if mounted, move to permissions and connectivity.

Task 5: Compare with a known-good node to catch “same name, different mount”

cr0x@pve1:~$ findmnt /mnt/pve/backup-nfs
TARGET               SOURCE                      FSTYPE OPTIONS
/mnt/pve/backup-nfs  nas01:/exports/pve-backups  nfs4   rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2

What it means: This is what “good” looks like: correct server, correct export, correct mountpoint.

Decision: Make the failing node match this, byte-for-byte where possible.

Task 6: Attempt to mount (and capture the actual error)

cr0x@pve2:~$ mount -av
/                        : ignored
/mnt/pve/backup-nfs       : mount.nfs4: Connection timed out

What it means: Network reachability problem to nas01 (routing, firewall, DNS, or NAS down).

Decision: Stop and troubleshoot network before touching permissions or Proxmox settings.

Task 7: Test DNS and routing symmetry

cr0x@pve2:~$ getent hosts nas01
10.20.30.50   nas01

What it means: Node resolves nas01 to an IP. That doesn’t prove it’s the right IP.

Decision: Compare output across nodes. If different, fix DNS/hosts. If same, check connectivity to that IP.

Task 8: Prove network reachability (ICMP is not enough, but it’s a start)

cr0x@pve2:~$ ping -c 2 10.20.30.50
PING 10.20.30.50 (10.20.30.50) 56(84) bytes of data.
64 bytes from 10.20.30.50: icmp_seq=1 ttl=63 time=0.462 ms
64 bytes from 10.20.30.50: icmp_seq=2 ttl=63 time=0.497 ms

--- 10.20.30.50 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1004ms
rtt min/avg/max/mdev = 0.462/0.479/0.497/0.017 ms

What it means: The host is reachable, but NFS still might be blocked.

Decision: Test NFS specifically (portmapper for v3, 2049 for v4).

Task 9: Test NFS service reachability

cr0x@pve2:~$ nc -vz 10.20.30.50 2049
Connection to 10.20.30.50 2049 port [tcp/nfs] succeeded!

What it means: NFSv4 TCP port reachable. If this fails, it’s firewall/routing or NAS service down.

Decision: If reachable, check exports and permissions; if not, fix network/security first.

Task 10: Verify the export exists (server-side view via showmount)

cr0x@pve2:~$ showmount -e nas01
Export list for nas01:
/exports/pve-backups 10.20.30.0/24

What it means: The NAS claims it exports the path to your subnet.

Decision: If your node isn’t in that subnet (different VLAN), you’ve found the mismatch.

Task 11: If mounted, validate you can write as root (the backup reality)

cr0x@pve2:~$ sudo sh -c 'touch /mnt/pve/backup-nfs/.pve-write-test && ls -l /mnt/pve/backup-nfs/.pve-write-test'
-rw-r--r-- 1 root root 0 Dec 26 03:12 /mnt/pve/backup-nfs/.pve-write-test

What it means: Root can create files. That’s the minimum for backup storage.

Decision: If this fails with “Permission denied”, inspect NFS export options and directory permissions.

Task 12: Detect root squashing (common on NFS) from the client side

cr0x@pve2:~$ stat -c "%U %G %a %n" /mnt/pve/backup-nfs
nobody nogroup 755 /mnt/pve/backup-nfs

What it means: The server may be mapping root to nobody, and the directory mode is not writable.

Decision: Either adjust export settings (carefully) or create a dedicated backup user and align UID/GID across nodes and NAS.

Task 13: Check for the “mounted but stale” condition (NFS hiccups)

cr0x@pve2:~$ timeout 5 ls -la /mnt/pve/backup-nfs | head
total 16
drwxr-xr-x  2 root root 4096 Dec 26 02:10 .
drwxr-xr-x 10 root root 4096 Dec 26 01:55 ..
-rw-r--r--  1 root root    0 Dec 26 03:12 .pve-write-test

What it means: Directory listing returns quickly. If it hangs until timeout, you likely have a stale mount or network flaps.

Decision: Investigate network stability, NFS server load, and consider hard mounts with sane timeouts; avoid “soft” for backup integrity.

Task 14: For CIFS/SMB-based backup storage, verify mount options and credential use

cr0x@pve3:~$ findmnt /mnt/pve/backup-smb
TARGET               SOURCE                       FSTYPE OPTIONS
/mnt/pve/backup-smb  //files01/backups            cifs   rw,relatime,vers=3.1.1,cache=strict,username=svc_pve,uid=0,gid=0,file_mode=0640,dir_mode=0750

What it means: SMB mount exists and maps ownership to root. That’s common for Proxmox backup targets.

Decision: If a node lacks this mount or uses different credentials, standardize via /etc/fstab or a systemd mount unit deployed uniformly.

Task 15: For Proxmox Backup Server storage, validate API reachability and auth

cr0x@pve2:~$ nc -vz pbs01 8007
Connection to pbs01 8007 port [tcp/*] succeeded!

What it means: Network path to PBS API is open. That’s step one.

Decision: If blocked, fix firewall/VLAN. If open, check Proxmox storage config and certificates/fingerprint.

Task 16: Inspect backup job logs for the exact storage-related failure

cr0x@pve2:~$ journalctl -u pvedaemon -u pvescheduler --since "today" | grep -i backup | tail -n 12
Dec 26 02:01:01 pve2 pvescheduler[1890]: starting new backup job: vzdump 101 --storage backup-nfs --mode snapshot
Dec 26 02:01:02 pve2 vzdump[2033]: ERROR: storage 'backup-nfs' is not available on node 'pve2'
Dec 26 02:01:02 pve2 vzdump[2033]: INFO: Failed at 2025-12-26 02:01:02

What it means: The error is coming from the backup tool invoking storage checks. It’s not a VM snapshot failure; it’s storage availability.

Decision: Focus on why pvesm status shows inactive or why the path check fails.

Task 17: Confirm the node is healthy in the cluster (quorum and membership)

cr0x@pve2:~$ pvecm status
Cluster information
-------------------
Name:             prod-pve
Config Version:   42
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Fri Dec 26 03:18:10 2025
Quorum provider:  corosync_votequorum
Nodes:            3
Node ID:          0x00000002
Ring ID:          1.10
Quorate:          Yes

What it means: Cluster is quorate. You’re not debugging storage under a cluster partition (good).

Decision: If Quorate: No, stabilize the cluster first; storage config and job scheduling can get unpredictable.

Task 18: Validate time sync (quiet cause of TLS and SMB weirdness)

cr0x@pve2:~$ timedatectl
               Local time: Fri 2025-12-26 03:19:44 UTC
           Universal time: Fri 2025-12-26 03:19:44 UTC
                 RTC time: Fri 2025-12-26 03:19:44
                Time zone: Etc/UTC (UTC, +0000)
System clock synchronized: yes
              NTP service: active
          RTC in local TZ: no

What it means: Clock is synchronized. This reduces the chance of TLS handshake failures with PBS or SMB auth edge cases.

Decision: If out of sync, fix time first. It’s boring and it works.

Task 19: Check that the storage mount survives reboot the way you think it does

cr0x@pve2:~$ grep -n "backup-nfs" /etc/fstab
12:nas01:/exports/pve-backups /mnt/pve/backup-nfs nfs4 rw,hard,timeo=600,retrans=2,_netdev 0 0

What it means: There is an fstab entry. Good. Options matter.

Decision: If missing on one node, that’s why “shared” isn’t shared. Standardize mounts across nodes.

Task 20: Spot the dangerous “nofail” + local directory trap

cr0x@pve2:~$ grep -n "backup-nfs" /etc/fstab
12:nas01:/exports/pve-backups /mnt/pve/backup-nfs nfs4 rw,nofail,_netdev 0 0

What it means: If NFS is down at boot, the mount might not occur, but the directory still exists, and jobs may write locally.

Decision: Replace with systemd automount or explicit dependency ordering so services don’t proceed without storage (details in the checklist section).

Three corporate mini-stories from the trenches

Mini-story 1: The incident caused by a wrong assumption

They had a three-node Proxmox cluster and one shiny “backup” storage pointing at an NFS share. The admin who set it up did the right thing in the GUI: added NFS storage, ticked “Shared”, set content to “VZDump backup file”. Everyone nodded.

Backups succeeded for weeks. That’s the dangerous part. The only reason they succeeded is that most of the “important” VMs lived on node 1 for historical reasons, and the backup schedule ran when node 1 was healthy.

Then a host maintenance window moved a batch of VMs to node 2. The next nightly backups ran from node 2. Storage reported “not available on node.” Someone reran jobs manually on node 1 and called it “temporary.” Two days later, node 1 had an unplanned reboot and the business discovered that “temporary” is another word for “permanent.”

The root cause was not exotic: only node 1 had an /etc/fstab entry for the NFS share. Nodes 2 and 3 had the directory /mnt/pve/backup-nfs (created by Proxmox), but it wasn’t mounted. The team assumed the cluster config made it “shared.” It didn’t.

The fix was also not exotic: identical mount configuration deployed via automation, plus a canary file check in monitoring: “is this path mounted and writable?” Incidents rarely need cleverness. They need discipline.

Mini-story 2: The optimization that backfired

A different company wanted faster boot times and fewer “node stuck at boot waiting for NAS” incidents. They changed NFS mounts to include nofail and added some systemd tweaks so the hypervisor would come up even if storage was down.

Boot times improved. On paper. In reality, they swapped one failure mode for a sneakier one: nodes came up, the Proxmox GUI looked healthy, and the backup jobs ran. But on the mornings when the NAS was slow to respond, mounts didn’t happen in time. The backup path existed as a normal directory, so the backup job wrote to local disk.

Local disk filled. VMs started pausing. Logs exploded. The storage team got paged for “NAS performance” even though the NAS was fine—Proxmox had quietly diverted workload to local storage because the mount wasn’t there.

They fixed it by using x-systemd.automount (so access triggers the mount), removing “write-to-local” risk by making the mountpoint owned by root but not writable unless mounted, and adding a pre-backup hook to validate mount status. The moral: optimizing boot behavior without thinking about failure semantics is how you create haunted systems.
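
A minimal sketch of such a pre-backup hook, assuming this article's mountpoint: vzdump runs the hook with the phase as its first argument and, for directory storages, exports variables like DUMPDIR (the sample hook script shipped in /usr/share/doc/pve-manager/examples documents them); a non-zero exit at job-start aborts the job before any bytes move. Wire it up with a script: line in /etc/vzdump.conf.

#!/bin/bash
# /usr/local/bin/vzdump-guard.sh: refuse to back up onto local disk
PHASE="${1:-}"
TARGET_DIR="${DUMPDIR:-/mnt/pve/backup-nfs}"  # DUMPDIR set by vzdump; fallback is this article's path
if [ "$PHASE" = "job-start" ]; then
    # Which filesystem actually backs the dump directory?
    FSTYPE="$(findmnt -no FSTYPE -T "$TARGET_DIR")"
    case "$FSTYPE" in
        nfs*|cifs) : ;;  # network filesystem present: proceed
        *) echo "ABORT: $TARGET_DIR sits on '$FSTYPE', not network storage" >&2; exit 1 ;;
    esac
fi
exit 0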

Mini-story 3: The boring but correct practice that saved the day

A financial services shop ran Proxmox with PBS as the primary backup target and NFS as a secondary “export” location. They were not exciting people. They wrote everything down, and they tested restores quarterly. This is why they slept.

One weekend, a core switch firmware upgrade introduced an ACL change that blocked TCP/8007 from one rack. Two of four Proxmox nodes could reach PBS; two could not. Backups started failing only for VMs currently running on those isolated nodes.

They caught it within an hour because their monitoring didn’t just watch “backup job success.” It watched storage reachability from every node. The alert said, essentially, “pve3 cannot reach pbs01:8007.” No guessing, no archaeology.

They failed over by pinning backup jobs to healthy nodes temporarily, then rolled back the ACL change. After that, they added a change-management checklist item: “validate PBS connectivity from each hypervisor.” Boring. Correct. Saved the day.

Common mistakes: symptom → root cause → fix

1) Symptom: “storage is not available on node” only on one node

Root cause: Mount exists only on some nodes, or DNS resolves differently per node.

Fix: Standardize mounts via /etc/fstab or systemd mount units across all nodes; verify with findmnt and getent hosts on every node.

2) Symptom: storage shows “active”, but backups fail with permission errors or “cannot create file”

Root cause: NFS root_squash, SMB ACL mismatch, or wrong ownership/mode on the target directory.

Fix: Decide your security model: either allow root writes on the export (carefully) or map to a dedicated service user with consistent UID/GID; then test with touch as root.

3) Symptom: backups “succeed” but space usage is on local disk, not NAS

Root cause: Mount not present; writes went to the mountpoint directory on the local filesystem (often due to nofail at boot).

Fix: Remove the trap. Use systemd automount or enforce mount availability before scheduler runs; make the mountpoint non-writable when not mounted; monitor for “is mounted” not “directory exists”.

4) Symptom: NFS mount works manually but not during boot

Root cause: Network not ready when mount is attempted; missing _netdev or systemd ordering problems.

Fix: Add _netdev, consider x-systemd.automount, and ensure network-online target if needed. Validate via reboot test.
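
For reference, one fstab line that combines those pieces (server, export, and mountpoint are this article's running examples; prove it with a reboot test, not optimism):

nas01:/exports/pve-backups /mnt/pve/backup-nfs nfs4 rw,hard,timeo=600,retrans=2,_netdev,x-systemd.automount,x-systemd.idle-timeout=600 0 0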

5) Symptom: only PBS-backed storage fails, filesystem storages fine

Root cause: Port blocked, TLS trust/fingerprint mismatch, auth token revoked, or datastore renamed/removed on PBS.

Fix: Verify connectivity to TCP/8007, validate PBS storage config, re-approve fingerprint if it changed intentionally, and confirm datastore exists.

6) Symptom: storage intermittently “inactive” with NFS under load

Root cause: Network flaps, NFS server saturation, stale handles, or too-aggressive timeouts.

Fix: Stabilize network, tune NFS server, use hard mounts with sane timeouts, and consider separating backup traffic onto a dedicated VLAN/interface.

7) Symptom: after adding a new node, backups fail on that node only

Root cause: New node didn’t get the OS-level mount setup, firewall rules, DNS search domains, or CA trust store entries.

Fix: Treat node provisioning as code. Apply the same storage mount and network policy as existing nodes before putting it into rotation.

8) Symptom: storage config looks right, but node behaves like it’s not in the cluster

Root cause: Quorum loss, corosync issues, or pmxcfs problems.

Fix: Restore cluster health first. Validate pvecm status, network between nodes, and corosync ring stability.

Checklists / step-by-step plan

Step-by-step: make “shared” actually shared (NFS/CIFS directory style)

  1. Pick one canonical storage ID and mountpoint. Example: backup-nfs mounted at /mnt/pve/backup-nfs.

    Do not create per-node variations like /mnt/pve/backup-nfs2 “just for now”. “Just for now” is how you get archaeology jobs.

  2. Standardize name resolution. Use getent hosts nas01 on all nodes and ensure it resolves identically. If you must pin, use /etc/hosts consistently.

  3. Deploy identical mount configuration to every node. Use /etc/fstab or systemd mount units. The key is identical behavior on reboot.

    Minimal NFSv4 example:

    cr0x@pve1:~$ sudo sh -c 'printf "%s\n" "nas01:/exports/pve-backups /mnt/pve/backup-nfs nfs4 rw,hard,timeo=600,retrans=2,_netdev 0 0" >> /etc/fstab'
    

    Decision: If you require the node to boot even when NAS is down, don’t blindly add nofail. Use automount plus guardrails.

  4. If you use systemd automount, do it deliberately. It avoids boot hangs and reduces “mount wasn’t ready” races.

    Example mount options in fstab:

    cr0x@pve1:~$ sudo sed -i 's#nfs4 rw,hard,timeo=600,retrans=2,_netdev#nfs4 rw,hard,timeo=600,retrans=2,_netdev,x-systemd.automount,x-systemd.idle-timeout=600#' /etc/fstab
    

    Decision: If your workload includes tight backup windows, validate automount latency under load.

  5. Enforce “not mounted means not writable”. One practical tactic: make the mountpoint owned by root and mode 000 when unmounted, then let the mount overlay provide permissions. This reduces “writes went local” surprises. Test it carefully so Proxmox can still mount.
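
    One plausible setup, applied while the share is unmounted (chattr +i additionally forbids creating files in the directory on ext4; mounting over an immutable directory is normally allowed, but verify mount -a still succeeds on your setup before relying on it):

    cr0x@pve1:~$ sudo chmod 000 /mnt/pve/backup-nfs && sudo chattr +i /mnt/pve/backup-nfs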

  6. Validate on each node: mount exists, write works, latency acceptable.

    cr0x@pve2:~$ sudo mount -a && findmnt /mnt/pve/backup-nfs && sudo sh -c 'dd if=/dev/zero of=/mnt/pve/backup-nfs/.speedtest bs=1M count=64 conv=fdatasync' 
    64+0 records in
    64+0 records out
    67108864 bytes (67 MB, 64 MiB) copied, 0.88 s, 76.6 MB/s
    

    Decision: If throughput is wildly different per node, you likely have routing differences, NIC issues, or a switch path problem.

  7. Confirm Proxmox sees it active everywhere.

    cr0x@pve3:~$ pvesm status | grep backup-nfs
    backup-nfs        nfs     active        9.09TiB         3.21TiB         5.88TiB   35.31%
    

    Decision: Only after this do you consider backup job scheduling tweaks.

  8. Add monitoring that checks mount and writability from every node. The check should be dumb on purpose: “is mounted” and “can create a file” and “is the filesystem type expected”.
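
    A dumb-on-purpose sketch with Nagios-style exit codes (path and expected fstype are this article's examples; deploy one per storage):

    #!/bin/bash
    # check_backup_mount.sh: 0 = OK, 2 = CRITICAL
    MP="/mnt/pve/backup-nfs"
    WANT="nfs4"
    FSTYPE="$(findmnt -no FSTYPE "$MP")" || { echo "CRITICAL: $MP not mounted"; exit 2; }
    [ "$FSTYPE" = "$WANT" ] || { echo "CRITICAL: $MP is $FSTYPE, expected $WANT"; exit 2; }
    F="$MP/.monitor-test.$$"
    touch "$F" 2>/dev/null || { echo "CRITICAL: $MP mounted but not writable"; exit 2; }
    rm -f "$F"
    echo "OK: $MP mounted ($FSTYPE) and writable"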

Step-by-step: PBS-backed “storage not available”

  1. Confirm TCP reachability to PBS on port 8007 from every node. Use nc -vz.
  2. Confirm the storage is active in pvesm status. If inactive, it’s usually connectivity or auth.
  3. Validate time sync. TLS hates time travel.
  4. Confirm PBS datastore exists and hasn’t been renamed. Storage config can outlive the thing it points to.
  5. Be strict about certificates and fingerprints. If a fingerprint changed unexpectedly, treat it as a security event until proven otherwise.
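
To see the certificate a specific node actually receives (PBS hostname from this article's examples), one option is openssl; compare the SHA-256 fingerprint against what your Proxmox storage config has pinned:

cr0x@pve2:~$ openssl s_client -connect pbs01:8007 </dev/null 2>/dev/null | openssl x509 -noout -fingerprint -sha256
sha256 Fingerprint=6F:AA:93:...:C2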

Step-by-step: make backups resilient without lying to yourself

  1. Prefer PBS for dedup + integrity. Use filesystem shares as secondary export/replication, not your only lifeline.
  2. Keep a local fallback only if you alert on it. A local backup directory can save you during a NAS outage, but only if you monitor local disk pressure and rotate aggressively.
  3. Test restores. Not once. Regularly. A backup you haven’t restored is a rumor.

FAQ

1) What does “storage is not available on node” mean in Proxmox?

It means the node executing the operation (often vzdump) can’t access that storage backend at that moment. Typically: not mounted, unreachable, wrong credentials, or permission denied.

2) If storage is marked “Shared”, why doesn’t Proxmox mount it everywhere?

Because Proxmox manages storage definitions, not kernel mounts. Mounting is OS-level state. You must configure mounts on each node (or use a backend that is inherently shared, like Ceph).

3) Can I fix this by unchecking “Shared”?

You can silence some scheduling and migration expectations, but you won’t fix the underlying problem. If multiple nodes need to back up to it, it must be accessible from multiple nodes. Make reality match the checkbox, not the other way around.

4) Why does it fail only sometimes?

Because only some backups run on the node that can’t access storage, or because mounts are flaky (automount delays, network blips, NAS load). Intermittent failures are still failures; they just wait for your worst day.

5) What’s the difference between NFS backup storage and PBS storage in Proxmox?

NFS is a mounted filesystem path. PBS is an API-based backup datastore with deduplication, compression, verification, and pruning semantics. Troubleshooting PBS availability is closer to debugging an application dependency than a mount.

6) My NFS share mounts, but backups fail with “Permission denied”. What now?

Check for root_squash and directory permissions. Proxmox backup jobs typically need root to write. Either allow root writes on that export (risk trade-off) or map to a dedicated service identity with consistent UID/GID and appropriate permissions.
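
If you choose to allow root writes, the server-side change on a plain Linux NFS server is one export option (path and subnet are this article's examples; NAS appliances expose the same knob in their UIs). Understand what no_root_squash implies for your threat model before shipping it:

cr0x@nas01:~$ grep pve-backups /etc/exports
/exports/pve-backups 10.20.30.0/24(rw,sync,no_subtree_check,no_root_squash)
cr0x@nas01:~$ sudo exportfs -ra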

7) How do I prevent backups from writing to local disk when the NFS mount is missing?

Don’t rely on directory existence. Enforce mount checks: use systemd automount and/or make the mountpoint non-writable when unmounted, and monitor “is mounted” plus “write test”. Avoid casual nofail without guardrails.

8) Does cluster quorum affect storage availability?

Not directly for mounts, but quorum loss can cause cluster services and config distribution to behave differently. If a node isn’t quorate, fix cluster health first so you aren’t debugging two problems at once.

9) Is it okay to have different mount options on different nodes if it still “works”?

It’s okay right up until it isn’t. Different NFS versions and mount options can change locking behavior, performance, and failure semantics. Standardize. Your future self will send a thank-you note.

10) Should I use CIFS/SMB for Proxmox backups?

You can, but it’s usually more fragile than NFS in Linux hypervisor environments due to auth and ACL complexity. If you must use it, standardize mount options and credentials across nodes and test failure behavior.

Conclusion: next steps that prevent repeats

“Backup storage not available on node” is Proxmox telling you the truth. The unpleasant part is that the truth lives below the GUI: mounts, networks, permissions, identity mapping, and boot ordering.

Next steps that actually change outcomes:

  1. Pick the failing node and run the fast diagnosis playbook. Confirm whether the storage is inactive, unmounted, unreachable, or unwritable.
  2. Standardize mounts across every node. Same server name, same export/share, same mountpoint, same options.
  3. Remove the “writes went local” trap. If you use nofail, counterbalance it with automount and monitoring.
  4. Add monitoring per node. Check mount + writability + expected filesystem type, not just “backup job succeeded”.
  5. Test a restore. Not because it’s fun. Because it’s cheaper than learning during an outage.

If you want one mental model to keep: the “Shared” checkbox is a promise you make to Proxmox. Keep it, and backups become boring. Break it, and backups become a weekly surprise.
