Debian 13: /etc/pve looks empty after reboot — why mounts fail and how to recover

You reboot a Debian 13 host running Proxmox VE and suddenly /etc/pve looks like a freshly installed system: no storage.cfg, no VM configs, nothing.
Your first thought is “we lost the cluster config.” Your second thought is unprintable.

Most of the time, nothing is “gone.” You’re looking at the wrong filesystem at the wrong moment because a mount, a service dependency, quorum, or a storage import failed.
The fix is rarely heroic; it’s usually disciplined diagnosis and refusing to guess.

What /etc/pve actually is (and why it “vanishes”)

On Proxmox VE, /etc/pve is not “a directory with some files.” It’s a mountpoint for the Proxmox Cluster File System, pmxcfs.
It’s a userspace filesystem (FUSE) that presents cluster configuration as a coherent tree, backed by an internal database and synchronized via cluster messaging.

When pmxcfs isn’t mounted, your system still has a literal directory at /etc/pve—because Linux needs somewhere to mount things.
That directory is typically empty or contains only whatever was left behind from early boot. So you aren’t seeing “lost configs.” You’re seeing “not mounted.”

The trick is that mount failures and “empty” configs often show up together with storage problems:
ZFS pools not imported, LVM volumes not activated, iSCSI sessions not logged in, NFS not mounted, Ceph not ready, or systemd starting services out of order.
One boot race later, your management stack is running with half the floor missing.

One quote that operations people earn the hard way: “Hope is not a strategy.”
It’s not storage-specific, but it’s painfully relevant when someone says, “Maybe it’ll be fine after another reboot.”

Joke #1: If your recovery plan is “reboot until it works,” congratulations—you’ve invented a slot machine with worse odds.

Fast diagnosis playbook

This is the shortest path from “/etc/pve is empty” to “I know exactly what’s broken.” Do it in order. Don’t freestyle. A compact command sketch of the whole sequence follows the list.

1) Confirm whether /etc/pve is mounted (not “empty”)

  • If it’s not a FUSE mount, you’re looking at the underlying directory.
  • If it is mounted, your issue is probably quorum, corosync, or permissions/lock contention—not a mount.

2) Check pmxcfs status and logs

  • If pmxcfs is down, fix that first. Anything else is noise.
  • If pmxcfs is up, check if it complains about database corruption, lock files, or cluster state.

3) Check corosync + quorum (on clusters)

  • No quorum often means the UI is weird, config looks partial, and cluster operations are blocked.
  • Single-node setups can still suffer if they were once clustered or misconfigured.

4) Check storage readiness and systemd ordering

  • If ZFS didn’t import or LVM didn’t activate, mounts/services depending on them will fail.
  • Fix storage readiness, then restart dependent services cleanly.

5) Decide: recover in-place vs. restore from backups

  • If pmxcfs DB is damaged, you may need to rebuild the node’s state from other nodes or backups.
  • If it’s just “not mounted,” fix the service and move on.
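
If you want the same sequence as a copy-paste starting point, here is a minimal sketch (run as root; pvecm only applies on clustered nodes, and the ZFS/LVM lines only matter if you use those layers). None of these commands change anything; they only narrow down which layer to work on next.

cr0x@server:~$ findmnt -no TARGET,FSTYPE /etc/pve || echo "pmxcfs NOT mounted"
cr0x@server:~$ systemctl --failed --no-pager
cr0x@server:~$ journalctl -u pve-cluster -b --no-pager -n 50
cr0x@server:~$ pvecm status | grep -E 'Quorate|Nodes'
cr0x@server:~$ zpool status -x; vgs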

The real failure modes (and what they look like)

Failure mode A: pmxcfs never mounted

This is the classic “/etc/pve is empty” case. The directory exists but the FUSE mount didn’t happen.
Causes include pmxcfs service not starting, failing immediately, or being blocked waiting on something else.

What you’ll see:

  • mount doesn’t show pmxcfs.
  • systemctl status pve-cluster shows failed.
  • journalctl shows startup errors (often lock/db issues).

Failure mode B: corosync is down or quorum is lost

In a cluster, pmxcfs depends on cluster communications. If corosync can’t form membership or quorum is lost,
pmxcfs may go read-only, partially functional, or block certain operations. Sometimes /etc/pve still mounts,
but writes fail and management tools behave like they’re walking through wet cement.

Failure mode C: storage didn’t come back, and services started anyway

Debian 13 brings newer systemd behaviors and timing changes, plus whatever your hardware/firmware decides to do this week.
A simple boot order issue can make storage “late”:

  • ZFS pools not auto-imported because device naming changed or a pool was last imported elsewhere.
  • LVM VG not activated because multipath devices appear after the activation attempt.
  • NFS mounts blocked by network-online not actually meaning “network usable.”
  • iSCSI logins not occurring early enough.

The result is secondary chaos: storages missing in the UI, VMs won’t start, backups fail, and admins misdiagnose it as “Proxmox ate my config.”
The config is still there; the things it references are not.

Failure mode D: pmxcfs database or state corruption

Less common, but real. Sudden power loss, disk-full conditions, broken RAM, or aggressive filesystem tuning can damage state files.
You’ll see explicit errors about the pmxcfs database, inability to load state, or repeated crashes.

Failure mode E: You’re on the wrong node or wrong root

I’ve seen people “fix” empty /etc/pve while being chrooted into a rescue environment, or booted into the wrong root volume.
The filesystem is fine; the operator is simply not on the system they think they are.

Joke #2: Nothing makes a sysadmin humble like realizing they’ve been fixing the wrong server for 20 minutes.

Practical tasks: commands, meaning, and decisions

These are real tasks you can do on a production box at 3 a.m. Each includes:
the command, what the output means, and the decision you make.
Run them as root or with sudo where appropriate.

Task 1: Verify /etc/pve is a mount (and what type)

cr0x@server:~$ findmnt -no TARGET,SOURCE,FSTYPE,OPTIONS /etc/pve
/etc/pve /dev/fuse fuse rw,nosuid,nodev,relatime,user_id=0,group_id=0

Meaning: If the FSTYPE column shows fuse (the source appears as /dev/fuse), pmxcfs is mounted and alive. If you see nothing,
or it reports ext4/xfs, you’re on the underlying directory and pmxcfs isn’t mounted.

Decision: If not mounted, stop looking for “missing files” and start fixing pve-cluster/pmxcfs.

Task 2: Check whether pmxcfs is running and whether it crashed

cr0x@server:~$ systemctl status pve-cluster --no-pager
● pve-cluster.service - The Proxmox VE cluster filesystem
     Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled)
     Active: failed (Result: exit-code) since Sun 2025-12-28 02:41:18 UTC; 2min 9s ago
    Process: 813 ExecStart=/usr/bin/pmxcfs (code=exited, status=255/EXCEPTION)

Meaning: It’s not running; it exited with an error.

Decision: Pull logs next. Don’t restart randomly—understand the failure (DB, lock, permissions, quorum).

Task 3: Read the pve-cluster logs for a direct reason

cr0x@server:~$ journalctl -u pve-cluster -b --no-pager -n 80
Dec 28 02:41:18 server pmxcfs[813]: [main] notice: starting pmxcfs
Dec 28 02:41:18 server pmxcfs[813]: [main] error: unable to open database '/var/lib/pve-cluster/config.db': Input/output error
Dec 28 02:41:18 server systemd[1]: pve-cluster.service: Main process exited, code=exited, status=255/EXCEPTION
Dec 28 02:41:18 server systemd[1]: pve-cluster.service: Failed with result 'exit-code'.

Meaning: This is not a mount issue; it’s a state/IO issue. That “Input/output error” could be disk, filesystem, or underlying storage.

Decision: Immediately check disk health and filesystem errors; avoid repeated restarts that could worsen corruption.

Task 4: Confirm you’re on the expected root filesystem (not rescue, not the wrong LV)

cr0x@server:~$ findmnt -no TARGET,SOURCE,FSTYPE /
/ /dev/mapper/pve-root ext4

Meaning: You’re on pve-root. If you see something like /dev/sda1 unexpectedly, or a live/rescue overlay,
you’re not where you think you are.

Decision: If wrong root, stop and boot the correct volume; do not “repair” configs in the wrong environment.

Task 5: Check corosync status and membership (clustered nodes)

cr0x@server:~$ systemctl status corosync --no-pager
● corosync.service - Corosync Cluster Engine
     Loaded: loaded (/lib/systemd/system/corosync.service; enabled)
     Active: active (running) since Sun 2025-12-28 02:39:02 UTC; 4min 25s ago

Meaning: Corosync is running. That doesn’t guarantee quorum, but it’s a start.

Decision: If corosync is dead, fix network/interfaces/firewall and corosync config before blaming pmxcfs.

Task 6: Check quorum explicitly

cr0x@server:~$ pvecm status
Cluster information
-------------------
Name:             prod-cluster
Config Version:   14
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Sun Dec 28 02:43:12 2025
Quorum provider:  corosync_votequorum
Nodes:            3
Node ID:          0x00000001
Ring ID:          1.23
Quorate:          No

Meaning: Not quorate. In this state, many cluster operations are blocked to prevent split-brain.

Decision: Restore quorum by bringing up enough nodes or fixing connectivity. Avoid “forcing” quorum unless you accept the risk.
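
If you have verified that the missing nodes are genuinely gone (not just unreachable from where you sit) and you accept the split-brain risk, the usual escape hatch is to lower the expected vote count on the surviving node. This is a sketch of the command, not a recommendation:

cr0x@server:~$ pvecm expected 1

Once the other nodes return, verify quorum settings again rather than leaving the cluster running permanently at one expected vote.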

Task 7: See whether pmxcfs is mounted read-only (a subtle gotcha)

cr0x@server:~$ findmnt -no TARGET,FSTYPE,OPTIONS /etc/pve
/etc/pve fuse rw,nosuid,nodev,relatime

Meaning: This shows rw, so at least the mount flags aren’t forcing read-only.
If you see ro or writes fail, you likely have quorum or underlying IO problems.

Decision: If read-only because of quorum, fix quorum. If read-only due to IO errors, fix storage/filesystem.

Task 8: Confirm storage services didn’t fail during boot (systemd-wide triage)

cr0x@server:~$ systemctl --failed --no-pager
  UNIT                         LOAD   ACTIVE SUB    DESCRIPTION
● zfs-import-cache.service     loaded failed failed  Import ZFS pools by cache file
● pve-storage.service          loaded failed failed  Proxmox VE storage daemon

Meaning: You have concrete failures: ZFS import and pve-storage.

Decision: Fix ZFS import first; then restart pve-storage and re-check storages.

Task 9: Check whether ZFS pools imported

cr0x@server:~$ zpool status
no pools available

Meaning: No pools imported. That will break any storage defined on those pools.

Decision: Attempt safe import. If disks are missing, stop and investigate hardware/device paths.

Task 10: Attempt ZFS pool import safely (read-only first if suspicious)

cr0x@server:~$ zpool import
   pool: rpool
     id: 1234567890123456789
  state: ONLINE
 action: The pool can be imported using its name or numeric identifier.
 config:

        rpool  ONLINE
          sda3 ONLINE

Meaning: The pool is importable.

Decision: If this is the boot pool, import normally; if you suspect damage, import read-only to inspect.

cr0x@server:~$ zpool import -f rpool
cr0x@server:~$ zpool status rpool
  pool: rpool
 state: ONLINE
config:

        NAME  STATE     READ WRITE CKSUM
        rpool ONLINE       0     0     0
          sda3 ONLINE       0     0     0

Meaning: Pool is back. Now dependent services should have a fighting chance.

Decision: Restart the services that failed due to missing ZFS (typically pve-storage, sometimes pvedaemon/pveproxy if they depend on configs/storages).
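
If you had suspected damage and wanted to inspect before committing, a read-only import is one option. A sketch, using the pool name from the example above:

cr0x@server:~$ zpool import -o readonly=on rpool
cr0x@server:~$ zpool status rpool
cr0x@server:~$ zpool export rpool

Export again before doing a normal read-write import; a pool can’t be flipped from read-only to read-write without a re-import.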

Task 11: LVM cases — check if the VG is active

cr0x@server:~$ vgs
  VG  #PV #LV #SN Attr   VSize   VFree
  pve   1   3   0 wz--n- 930.00g 120.00g

Meaning: The VG exists. If the VG is missing entirely, device discovery failed; if it’s present but lvs shows its LVs without the ‘a’ (active) flag in the attribute field, boot-time activation failed.

Decision: If missing, check device discovery (multipath/raid). If present but inactive, activate it.

Task 12: Activate LVM volumes (only if you know it’s safe)

cr0x@server:~$ vgchange -ay pve
  3 logical volume(s) in volume group "pve" now active

Meaning: LVs are active; mounts and services depending on them can proceed.

Decision: If activation fails, don’t brute-force. Investigate missing PVs, multipath, and dmesg.
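
A quick, read-only look at what LVM can actually see helps here. A sketch; none of these commands modify anything:

cr0x@server:~$ pvs -o pv_name,vg_name,pv_size
cr0x@server:~$ lvs -o lv_name,vg_name,lv_attr,devices
cr0x@server:~$ dmesg -T | grep -iE 'multipath|device-mapper' | tail -n 20

If pvs shows fewer physical volumes than you expect, the problem is below LVM: missing disks, multipath, or iSCSI.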

Task 13: Networked storage cases — confirm network-online is real

cr0x@server:~$ systemctl status systemd-networkd-wait-online --no-pager
● systemd-networkd-wait-online.service - Wait for Network to be Configured
     Loaded: loaded (/lib/systemd/system/systemd-networkd-wait-online.service; enabled)
     Active: active (exited) since Sun 2025-12-28 02:38:40 UTC; 6min ago

Meaning: It says “online,” but that only means interfaces are configured, not that your gateway/DNS/storage VLAN is reachable.

Decision: Test the storage endpoint reachability explicitly.
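
A minimal reachability test against the storage endpoint; the hostname, address, and port below are the placeholders used elsewhere in this article, so substitute your own (nc comes from the netcat package):

cr0x@server:~$ ping -c 3 nas01
cr0x@server:~$ nc -zv -w 3 10.10.20.50 3260

If name resolution fails but the IP answers, fix DNS or /etc/hosts before touching storage services.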

Task 14: Verify NFS mounts and mount failures

cr0x@server:~$ findmnt -t nfs,nfs4
TARGET          SOURCE                 FSTYPE OPTIONS
/mnt/pve/backup nas01:/export/backup   nfs4   rw,relatime,vers=4.2,proto=tcp

Meaning: If the mount is absent, your storage isn’t available. If present, look elsewhere.

Decision: If absent, check journalctl -b for mount errors; fix DNS, routing, firewall, or auth.
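
If the mount is absent, it can help to confirm the server is actually exporting what you expect. A sketch, assuming the nfs-common client tools are installed and nas01 is the example server from above:

cr0x@server:~$ showmount -e nas01

If the export list is empty or the call times out, the problem is on the server or network side, not on the Proxmox host.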

Task 15: iSCSI cases — confirm sessions exist

cr0x@server:~$ iscsiadm -m session
tcp: [1] 10.10.20.50:3260,1 iqn.2003-01.org.linux-iscsi.nas01:storage.lun1 (non-flash)

Meaning: Session exists. If this is empty, LUNs won’t appear, and LVM/ZFS-on-iSCSI will fail.

Decision: If missing, re-login and ensure the service ordering is correct (iscsid before LVM activation).
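
A manual re-login, using the target and portal from the example session above as placeholders, might look like this; the second command makes the login automatic on future boots:

cr0x@server:~$ iscsiadm -m node -T iqn.2003-01.org.linux-iscsi.nas01:storage.lun1 -p 10.10.20.50:3260 --login
cr0x@server:~$ iscsiadm -m node -T iqn.2003-01.org.linux-iscsi.nas01:storage.lun1 -p 10.10.20.50:3260 --op update -n node.startup -v automatic

After the login, check that the expected block devices appeared before re-activating LVM.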

Task 16: Check for obvious disk or filesystem errors behind pmxcfs DB issues

cr0x@server:~$ dmesg -T | tail -n 20
[Sun Dec 28 02:41:10 2025] blk_update_request: I/O error, dev sda, sector 194567890 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[Sun Dec 28 02:41:10 2025] EXT4-fs error (device dm-0): ext4_find_entry:1535: inode #262401: comm pmxcfs: reading directory lblock 0

Meaning: That’s not “Proxmox being weird.” That’s the kernel telling you storage is failing.

Decision: Treat this as an incident: stop risky writes, validate the disk/RAID/ZFS layer, and plan repair.
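
If the kernel names a specific disk, a first-pass health check is cheap. A sketch, assuming smartmontools is installed and /dev/sda is the device from the dmesg output above:

cr0x@server:~$ smartctl -H /dev/sda
cr0x@server:~$ smartctl -A /dev/sda | grep -iE 'reallocated|pending|uncorrect'

A failing health status or growing reallocated/pending sectors turns this from a service problem into a disk replacement.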

Task 17: Confirm /etc/pve content from the management layer

cr0x@server:~$ pvesh get /cluster/resources --type vm | head
vmid  status  name       node    type
100   running web-01     node1    qemu
101   stopped db-01      node1    qemu

Meaning: If pvesh can read cluster resources, pmxcfs is likely functional.

Decision: If this fails, focus on pmxcfs/corosync. If it works but storages are missing, focus on storage stack.

Task 18: Force a clean remount attempt (only after understanding why it failed)

cr0x@server:~$ systemctl restart pve-cluster
cr0x@server:~$ findmnt -no TARGET,SOURCE,FSTYPE /etc/pve
/etc/pve /dev/fuse fuse

Meaning: pmxcfs is mounted again.

Decision: Re-check quorum and storage. Then verify that expected configs are present before restarting VMs.

Task 19: Validate that node configs exist (not just the mount)

cr0x@server:~$ ls -la /etc/pve/nodes
total 0
drwxr-xr-x 1 root www-data  0 Dec 28 02:47 .
drwxr-xr-x 1 root www-data  0 Dec 28 02:47 ..
drwxr-xr-x 1 root www-data  0 Dec 28 02:47 node1

Meaning: The cluster filesystem is presenting nodes. If it’s empty or missing your node, that’s a cluster membership/state issue.

Decision: If node is missing, check corosync membership and hostname consistency; don’t start changing configs blindly.
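
A quick consistency check between what the node calls itself and what the cluster thinks of it, a sketch:

cr0x@server:~$ hostname
cr0x@server:~$ getent hosts "$(hostname)"
cr0x@server:~$ pvecm nodes

The hostname, its resolved address, and the name shown in the membership list should all agree; if they don’t, fix naming before anything else.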

Task 20: Identify boot ordering problems for mounts/services

cr0x@server:~$ systemd-analyze critical-chain pve-storage.service
pve-storage.service +3.201s
└─network-online.target +2.998s
  └─systemd-networkd-wait-online.service +2.950s
    └─systemd-networkd.service +1.102s
      └─systemd-udevd.service +0.421s

Meaning: pve-storage waited on “network-online,” which may or may not include your storage path readiness.

Decision: If you depend on iSCSI/NFS, ensure those units have explicit dependencies and timeouts suited to reality.
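
As an illustration only (the mount unit name below is made up, not something Proxmox ships), a systemd drop-in that makes a locally managed mount wait for iSCSI login could look like this:

# /etc/systemd/system/mnt-vmstore.mount.d/wait-for-iscsi.conf
[Unit]
Requires=iscsid.service
After=iscsid.service

cr0x@server:~$ systemctl daemon-reload

Storages defined inside Proxmox are activated by its own daemons; drop-ins like this are for mounts and services you manage outside of it.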

Three corporate-world mini-stories (because this keeps happening)

Mini-story 1: The incident caused by a wrong assumption

A mid-sized company migrated two hypervisor nodes to new hardware and reused the old management IPs. The change window was tight.
Someone rebooted one node to “verify BIOS settings” and came back to an empty /etc/pve. Panic spread quickly because the UI showed missing VM configs.

The wrong assumption was simple: “If /etc/pve is empty, the configuration is gone from disk.”
They started restoring random backups into /etc and editing /etc/fstab to “fix mounts,” neither of which had anything to do with pmxcfs.
On a Proxmox host, that’s like fixing a toaster by repainting the kitchen.

The actual problem was corosync. During the hardware migration, the interface names had changed (predictable naming),
but the network configuration still referenced the old NIC, so the address corosync binds to never came up. Corosync never formed membership, quorum was lost, and pmxcfs didn’t come up properly.
The “empty” directory was just an unmounted mountpoint.

The recovery was clean once they stopped guessing: fix interface naming, restart corosync, confirm quorum, restart pve-cluster.
Everything “reappeared” instantly. The only lasting damage was a set of manual edits that had to be reverted.

The takeaway: treat /etc/pve as a service endpoint. If it’s empty, that’s a service failure until proven otherwise.

Mini-story 2: The optimization that backfired

Another org had a habit: shave seconds off boot time wherever possible. It started innocently—disable unused services,
shorten timeouts, remove “unnecessary waits.” They reduced the wait-online timeout and parallelized more of the boot.
Their graphs looked nicer. Their confidence expanded to fill the available space.

After an unplanned power event, several nodes rebooted simultaneously. Some came up clean; others showed missing storages,
and a couple had the infamous empty /etc/pve look during early troubleshooting. The underlying cause wasn’t mystical.

iSCSI sessions were not established yet when LVM activation ran. LVM saw no PVs, so it didn’t activate the VG.
Services depending on those volumes started anyway and cached “missing” state. Later, when iSCSI finally logged in,
nobody retriggered activation cleanly. The system was “up,” but in a half-real universe.

Fixing it required reversing the “optimization”: enforce ordering between iscsid login and volume activation, and stop pretending
that network-ready means storage-ready. They also added targeted retries for storage, rather than random restarts of the entire stack.

The takeaway: boot-time parallelism is great until it intersects with distributed storage that doesn’t care about your charts.

Mini-story 3: The boring but correct practice that saved the day

A regulated environment (think: paperwork with teeth) ran a Proxmox cluster with strict change control.
They did two unglamorous things consistently: (1) nightly config backups of /etc/pve and critical host configs,
and (2) quarterly disaster recovery drills where someone other than the usual operator restored a node.

A node rebooted after a kernel update and failed to mount pmxcfs due to an underlying disk error that corrupted the pmxcfs database file.
This was not a “restart the service” situation. The node’s local state was untrustworthy.

Their response was boring and fast. They put the node in maintenance, confirmed the cluster remained quorate without it,
replaced the failed disk, reinstalled the node OS, rejoined it to the cluster, and restored only the minimal required local config.
VM definitions were still safely in the cluster; storage definitions were revalidated rather than blindly reimported.

The end result: one node outage, no VM data loss, no config archaeology at 3 a.m.
The most “exciting” part was watching someone follow a runbook line-by-line and being done early.

The takeaway: routine backups + practiced restore beats “tribal memory” every single time.

Common mistakes: symptom → root cause → fix

1) Symptom: /etc/pve is empty after reboot

Root cause: pmxcfs not mounted (pve-cluster failed) or you’re in the underlying directory.

Fix: findmnt /etc/pve and systemctl status pve-cluster. Fix the cause in logs, then restart pve-cluster.

2) Symptom: /etc/pve is present but writes fail (permission denied / read-only)

Root cause: Cluster not quorate; pmxcfs may refuse writes to prevent split-brain.

Fix: Restore quorum (bring nodes back, fix corosync network). Don’t “force quorum” unless you understand the blast radius.

3) Symptom: UI shows missing storages, VMs won’t start, but /etc/pve is fine

Root cause: Storage not ready: ZFS not imported, LVM not active, NFS/iSCSI not mounted/logged in.

Fix: Diagnose at the storage layer first (zpool status, vgs, findmnt, iscsiadm).
Then restart pve-storage and validate storage definitions.

4) Symptom: pve-cluster fails with DB errors

Root cause: Disk IO errors, filesystem corruption, or pmxcfs database corruption.

Fix: Stop thrashing the service. Check dmesg for IO errors and repair the underlying filesystem/hardware.
Recover pmxcfs state from a healthy node or backups as needed.

5) Symptom: Everything works until you reboot; then it breaks again

Root cause: Boot ordering/race conditions in systemd; missing dependencies for networked storage or slow devices.

Fix: Use systemd-analyze critical-chain. Add proper unit dependencies and timeouts.
Avoid “sleep 30” hacks unless you like living in the past.

6) Symptom: Empty /etc/pve when booted into rescue ISO

Root cause: You’re not booted into the installed system. pmxcfs isn’t running. Of course it’s empty.

Fix: Mount the real root filesystem and chroot only if you must; otherwise boot normally and troubleshoot there.

7) Symptom: Node name changed; cluster looks “new”

Root cause: Hostname mismatch with cluster config; pmxcfs paths are node-name based.

Fix: Restore correct hostname and hosts resolution; ensure corosync and Proxmox see the expected node name.

Checklists / step-by-step plan

Checklist A: “/etc/pve is empty” recovery steps (safe and fast)

  1. Confirm mount: findmnt /etc/pve.
    If not pmxcfs, treat it as a service failure.
  2. Check pve-cluster: systemctl status pve-cluster and journalctl -u pve-cluster -b.
    The logs usually tell you what’s wrong. Believe them.
  3. Cluster check: systemctl status corosync and pvecm status.
    If not quorate, fix quorum before trying to write configs.
  4. Storage check: systemctl --failed, then validate ZFS/LVM/NFS/iSCSI as applicable.
    Fix storage readiness issues (imports, activations, mounts).
  5. Restart in the right order:
    storage services first (zfs-import, iscsid, remote-fs), then pve-storage, then pve-cluster if needed.
  6. Validate: ls /etc/pve, pvesh get /cluster/resources, and confirm storages show up.
  7. Only then: start VMs or re-enable HA resources.

Checklist B: When to stop and declare a hardware/storage incident

  1. dmesg shows I/O errors, resets, or filesystem errors.
  2. SMART/RAID reports degraded arrays or media errors (vendor tooling applies).
  3. pmxcfs DB errors coincide with kernel disk errors.
  4. The problem returns after “fixing” services without any config changes.

If any of these are true, your priority is data integrity and stability, not “getting green lights in the UI.”

Checklist C: Preventing the reboot surprise (the boring controls that work)

  1. Add explicit systemd dependencies for your storage stack (iSCSI before LVM, network reachability before NFS); see the fstab sketch after this list.
  2. Avoid fragile device paths; prefer stable identifiers (WWN, by-id). If you must use multipath, make it deterministic.
  3. Ensure the cluster has an odd number of voters or a proper quorum device.
  4. Keep /etc/pve backups and test restores on a schedule, not during an incident.
  5. Reboot drills: reboot one node during business hours occasionally. If that scares you, that’s the point.
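
For mounts you manage yourself in /etc/fstab (storages defined inside Proxmox are mounted by Proxmox, so leave those alone), the dependency can be spelled out with standard mount options. A sketch with made-up server and path names:

nas01:/export/archive  /mnt/archive  nfs4  _netdev,x-systemd.requires=network-online.target,x-systemd.mount-timeout=90s  0  0

The _netdev flag keeps the mount out of the local-filesystem phase, and the x-systemd options make the ordering and the timeout explicit instead of implied.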

Interesting facts and historical context

  • pmxcfs is a FUSE filesystem. That’s why /etc/pve can “disappear” without deleting anything—it’s a mountpoint for a userspace process.
  • Split-brain prevention drives design. Cluster stacks often block writes without quorum because the alternative is quietly corrupting state across nodes.
  • Corosync has been a Linux HA staple for years. It’s used as a cluster messaging layer in multiple HA ecosystems, not just Proxmox deployments.
  • systemd changed how people think about boot order. The shift from sequential init scripts to parallel unit activation made race conditions more common—and more subtle.
  • Network-online is not the same as network-reachable. systemd can declare victory while your storage VLAN is still negotiating, routing is missing, or DNS is wrong.
  • ZFS imports are conservative by design. ZFS will refuse to auto-import pools in some circumstances to avoid importing the same pool on multiple systems at once.
  • Mountpoints are just directories until mounted. Linux happily shows you an empty mountpoint directory; it won’t warn you that “this is normally a different filesystem.”
  • Cluster config location is a tradeoff. Centralizing config improves consistency but raises the bar for quorum and cluster health; that’s why local fallback behaviors can be limited.
  • Boot-time storage readiness is a multi-layer problem. Firmware, HBA initialization, multipath, udev, network, authentication, and filesystem import all need to line up.

FAQ

1) Is my Proxmox configuration actually deleted if /etc/pve is empty?

Usually no. The most common situation is that pmxcfs didn’t mount, so you’re viewing the underlying empty directory.
Confirm with findmnt /etc/pve.

2) Why does this show up after a reboot and not during normal runtime?

Reboots reshuffle timing: device discovery, network readiness, cluster membership, and service ordering.
Boot races are polite during uptime and rude during boot.

3) Can I just recreate /etc/pve and copy files back?

Don’t. /etc/pve is a mountpoint for pmxcfs. Copying files into the underlying directory won’t fix the real issue,
and can confuse future troubleshooting.

4) What if pmxcfs is mounted but the GUI still looks wrong?

Then focus on quorum and corosync health, or on storage readiness. A mounted pmxcfs means the filesystem endpoint exists;
it doesn’t guarantee cluster operations are permitted or that storages are available.

5) Is it safe to restart pve-cluster and corosync on a production node?

It can be, but only after you understand current cluster state. Restarting corosync on the wrong node at the wrong time can worsen an outage.
If the cluster is quorate without this node, isolate it and fix it without destabilizing the rest.

6) My cluster is “not quorate.” Can I force it?

You can, and you might regret it. Forcing quorum can allow writes that lead to split-brain when the other partition comes back.
Do it only if you’ve confirmed the other side is truly gone and you accept the reconciliation work later.

7) Why do storage failures make /etc/pve look empty?

They don’t directly—pmxcfs is separate. But storage failures often co-occur with boot timing issues and service failures,
and they produce similar symptoms: missing VM disks, missing storages, failed services, and a general feeling that reality is optional.

8) What’s the single most useful first command?

findmnt /etc/pve. It tells you whether you’re dealing with a mount/service issue or chasing ghosts in an empty directory.

9) How do I prevent this from recurring on Debian 13?

Make boot ordering explicit for storage dependencies, ensure stable device naming, validate corosync interfaces after upgrades,
and test reboots like you mean it.

10) If pmxcfs DB is corrupted, do I need to reinstall?

Not always, but sometimes reinstalling and rejoining the cluster is the cleanest path—especially if you also have disk errors.
Treat corruption plus IO errors as a hardware/storage problem first.

Conclusion: next steps that actually prevent repeats

When /etc/pve looks empty after reboot on Debian 13, assume a mount/service problem first, not data loss.
Confirm whether pmxcfs is mounted. Read the logs. Check quorum. Then check storage readiness and systemd ordering.
That sequence saves hours because it narrows the failure domain quickly.

Practical next steps:

  1. Run the fast diagnosis playbook and write down what failed (pmxcfs, corosync/quorum, ZFS/LVM, remote mounts).
  2. If the issue is boot ordering, fix it with proper systemd dependencies—not sleep hacks.
  3. If you see IO errors, treat it as a storage incident and stop “restarting until it works.”
  4. Schedule a reboot drill and a config restore drill. The day you practice is the day you’re calm.