That red banner in the Proxmox UI—“cluster filesystem not ready”—has a special talent: it arrives right when you’re trying to do something urgent. Migrate a VM. Attach storage. Stop a runaway container. Suddenly the UI can’t read or write cluster config, and half the buttons feel like they’re on strike.
This is one of those errors that looks like “the cluster is broken,” when the real story is usually narrower: a service (pmxcfs) can’t mount or serve /etc/pve, often because corosync quorum isn’t healthy, the node is under pressure, or time/network assumptions stopped being true.
What “cluster filesystem not ready” actually means
In Proxmox VE, the “cluster filesystem” is not your CephFS, NFS, ZFS, or anything you mounted under /mnt. It’s pmxcfs, a user-space filesystem (FUSE) that lives at /etc/pve. It’s where Proxmox stores and distributes the cluster’s configuration: node definitions, VM configs, storage config, corosync config, firewall rules, and a bunch of state metadata.
When Proxmox says the cluster filesystem isn’t ready, it’s telling you:
- /etc/pve is not mounted correctly (pmxcfs not running or stuck), or
- pmxcfs is running but refuses to serve writable config because the cluster isn’t in a consistent state (often a quorum or corosync problem), or
- pmxcfs can’t operate due to local node issues (disk full, RAM pressure, time jump, FUSE hang).
The UI and most CLI actions talk to /etc/pve. If that mount is missing or read-only, Proxmox can’t safely change cluster state. That’s why you see unrelated-looking errors: “unable to parse storage config,” “cannot write file,” “failed to lock file,” and so on. They’re all downstream of the same bottleneck: pmxcfs is not providing the config filesystem.
How pmxcfs and /etc/pve work (and why it’s touchy)
pmxcfs is the little engine behind cluster config. It runs as a daemon, mounts a FUSE filesystem at /etc/pve, and replicates changes across nodes using the cluster communications layer (corosync). It also keeps a local copy so a node can still boot and run VMs even if others are down—within reason.
Key operational truths
- /etc/pve is not a normal directory. It’s a live mount. “Fixing” it by editing files while pmxcfs is down can be either smart or disastrous, depending on what you touch.
- Quorum matters. In a multi-node cluster, Proxmox wants majority agreement before allowing config writes. This isn’t pedantry; it prevents split-brain config divergence.
- Corosync is the transport. If corosync can’t form a stable membership, pmxcfs typically won’t be “ready,” or it’ll be read-only.
- Latency, packet loss, and time drift are not “network issues,” they are “cluster integrity issues.” The cluster stack treats them like threats to consistency because that’s what they are.
Here’s the operational model I want you to keep in your head:
1) System boots.
2) corosync tries to form cluster membership.
3) pve-cluster (pmxcfs) mounts /etc/pve.
4) pvedaemon/pveproxy (API/UI) read and write config under /etc/pve.
If you break #2, #3 will wobble. If you break #3, #4 becomes a crime scene.
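If you want to confirm that ordering on your own node rather than take my word for it, systemd can show which units pull in pve-cluster. A quick, read-only check; the exact dependency list varies by Proxmox version, so treat the output as descriptive, not canonical:
cr0x@server:~$ systemctl list-dependencies --reverse pve-cluster.service --no-pager
cr0x@server:~$ systemctl cat pve-cluster.service --no-pager | grep -E '^(After|Wants|Requires|Before)='
The reverse listing is the interesting one: on a stock install it typically includes the API daemons, which is exactly why the UI is the first thing to suffer when pmxcfs goes away.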
A paraphrased idea attributed to Werner Vogels (on reliability in distributed systems): everything fails, so design assuming it will.
Joke #1: A Proxmox cluster without quorum is like a meeting without minutes—everyone remembers it differently, and nobody agrees what happened.
Fast diagnosis playbook
If you have five minutes and a pager vibrating your desk into sawdust, don’t wander. Run this in order. Each step narrows the bottleneck.
First: is /etc/pve mounted and is pmxcfs alive?
- Check mount and pve-cluster status. If pmxcfs is dead or the mount is missing, you’re in “local node” territory: service crash, FUSE hang, disk full, memory pressure.
Second: is corosync membership and quorum healthy?
- Check pvecm status and corosync-cfgtool. If quorum is lost, decide whether you can restore networking/peer nodes, or whether you must temporarily force quorum (rarely the right long-term move).
Third: are time and network stable enough for consensus?
- Check time sync (timedatectl) and corosync logs for retransmits, token timeouts, link flaps. Fix time drift and packet loss before restarting services; otherwise you’re just making fresh logs.
Fourth: is the node itself sick (disk/RAM/IO)?
- Check disk usage (df), inode usage, memory pressure, and IO stalls. pmxcfs is lightweight but not magic; it can still fall over under extreme pressure.
Fifth: is this a split brain or a single-node isolate?
- If multiple nodes claim to be primary/active with inconsistent membership, stop and plan. Split brain is where “quick fixes” become career events.
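If you want the whole playbook in one pass, here is a minimal, read-only triage sequence that mirrors the five questions above; the detailed tasks below explain how to interpret each result:
cr0x@server:~$ mount | grep -q fuse.pmxcfs && echo "pmxcfs mounted" || echo "pmxcfs NOT mounted"
cr0x@server:~$ systemctl is-active pve-cluster corosync
cr0x@server:~$ pvecm status | grep -E 'Quorate|Nodes'
cr0x@server:~$ timedatectl | grep -i synchronized
cr0x@server:~$ df -h / && df -ih /
cr0x@server:~$ journalctl -u pve-cluster -u corosync -n 30 --no-pager
Nothing here changes state; it only tells you which of the five doors to walk through first.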
Practical tasks: commands, outputs, decisions (12+)
These are the checks I actually run. Each one includes what the output implies and what decision you make next. Run them on the affected node first, then on a known-good node for comparison.
Task 1: Confirm /etc/pve is mounted as pmxcfs
cr0x@server:~$ mount | grep -E '(/etc/pve|pmxcfs)'
pmxcfs on /etc/pve type fuse.pmxcfs (rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other)
Meaning: If you see fuse.pmxcfs, the mount exists. If nothing returns, the UI error is explained: pmxcfs isn’t mounted.
Decision: If missing, jump to pve-cluster service checks (Task 2) and logs (Task 3). If present but still “not ready,” look at quorum and corosync (Tasks 5–8).
Task 2: Check pve-cluster (pmxcfs) service health
cr0x@server:~$ systemctl status pve-cluster --no-pager
● pve-cluster.service - The Proxmox VE cluster filesystem
Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled)
Active: active (running) since Fri 2025-12-26 09:41:18 UTC; 3min ago
Main PID: 1187 (pmxcfs)
Tasks: 3 (limit: 154214)
Memory: 38.2M
CPU: 1.012s
CGroup: /system.slice/pve-cluster.service
└─1187 /usr/bin/pmxcfs
Meaning: If it’s not active/running, pmxcfs isn’t serving /etc/pve. If it’s flapping, you likely have an underlying corosync/time/disk issue.
Decision: If failed, inspect logs (Task 3), then restart carefully (Task 4) after you understand why it failed.
Task 3: Read pve-cluster logs for the real complaint
cr0x@server:~$ journalctl -u pve-cluster -n 200 --no-pager
Dec 26 09:41:18 server pmxcfs[1187]: [main] notice: starting pmxcfs
Dec 26 09:41:18 server pmxcfs[1187]: [main] notice: resolved node name 'server' to '10.10.0.11'
Dec 26 09:41:19 server pmxcfs[1187]: [dcdb] notice: data verification successful
Dec 26 09:41:20 server pmxcfs[1187]: [status] notice: quorum not present - operations restricted
Dec 26 09:41:20 server pmxcfs[1187]: [status] notice: continuing in local mode
Meaning: “quorum not present” is your headline. pmxcfs may mount, but it may be read-only or restrict operations. If you see database corruption messages, that’s a different play.
Decision: If quorum is missing, move to corosync/quorum checks (Tasks 5–8). If corruption is mentioned, plan recovery and backups before changing anything.
Task 4: Restart pve-cluster (only after reading logs)
cr0x@server:~$ systemctl restart pve-cluster
cr0x@server:~$ systemctl is-active pve-cluster
active
Meaning: If restart succeeds and mount appears, you may be back. If it immediately fails again, you didn’t fix the cause—go back to logs and corosync.
Decision: If it fails repeatedly, stop restarting. Fix time/network/quorum or resource exhaustion first.
Task 5: Check cluster quorum and membership
cr0x@server:~$ pvecm status
Cluster information
-------------------
Name: prod-cluster
Config Version: 42
Transport: knet
Secure auth: on
Quorum information
------------------
Date: Fri Dec 26 09:44:02 2025
Quorum provider: corosync_votequorum
Nodes: 3
Node ID: 0x00000001
Ring ID: 1.1
Quorate: Yes
Meaning: Quorate: Yes is what you want. If it says No, pmxcfs is likely restricting writes, and the UI yells.
Decision: If not quorate, check which nodes are visible (Task 6) and why links are down (Task 7–8).
Task 6: List nodes and see who’s missing
cr0x@server:~$ pvecm nodes
Membership information
----------------------
Nodeid Votes Name
1 1 pve-a (local)
2 1 pve-b
3 1 pve-c
Meaning: Missing nodes here explain quorum loss. If only one node shows up in a 2- or 3-node cluster, you’re isolated.
Decision: If nodes are missing, check corosync links (Task 7) and network reachability (Task 9).
Task 7: Check corosync ring link status
cr0x@server:~$ corosync-cfgtool -s
Printing link status.
Local node ID 1
LINK ID 0
addr = 10.10.0.11
status = connected
LINK ID 1
addr = 172.16.0.11
status = disconnected
Meaning: This shows which corosync network(s) are up. A disconnected ring can be fine if you planned redundancy, or fatal if it’s your only working path.
Decision: If the only ring is down, fix network or routing/VLAN/MTU before touching corosync config.
Task 8: Look for corosync token timeouts and retransmits
cr0x@server:~$ journalctl -u corosync -n 200 --no-pager
Dec 26 09:42:10 pve-a corosync[1050]: [TOTEM ] Token has not been received in 15000 ms
Dec 26 09:42:10 pve-a corosync[1050]: [TOTEM ] Retransmit List: 12
Dec 26 09:42:11 pve-a corosync[1050]: [KNET ] link: host: 2 link: 0 is down
Dec 26 09:42:12 pve-a corosync[1050]: [QUORUM] Members[1]: 1
Meaning: Token not received + link down means corosync can’t maintain membership. This is usually network loss, MTU mismatch, or a switch doing “helpful” things to multicast/unicast.
Decision: Treat this as a network incident first. Don’t “force” quorum to work around flaky transport unless you enjoy chaos.
Task 9: Verify L2/L3 reachability between nodes (and catch MTU pain)
cr0x@server:~$ ping -c 3 10.10.0.12
PING 10.10.0.12 (10.10.0.12) 56(84) bytes of data.
64 bytes from 10.10.0.12: icmp_seq=1 ttl=64 time=0.435 ms
64 bytes from 10.10.0.12: icmp_seq=2 ttl=64 time=0.401 ms
64 bytes from 10.10.0.12: icmp_seq=3 ttl=64 time=0.398 ms
--- 10.10.0.12 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2022ms
rtt min/avg/max/mdev = 0.398/0.411/0.435/0.016 ms
Meaning: Basic reachability is necessary but not sufficient. Corosync can fail on jitter/loss that ping won’t expose.
Decision: If ping fails, fix addressing/VLAN/routing. If ping succeeds but corosync is unstable, test MTU and packet loss under load.
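One cheap way to catch MTU trouble is a don’t-fragment ping at the size you think the path supports. A sketch, assuming IPv4 and the usual 28 bytes of ICMP/IP header overhead; swap in your own corosync peer address:
cr0x@server:~$ ping -M do -s 1472 -c 3 10.10.0.12    # should pass on a standard 1500-byte path
cr0x@server:~$ ping -M do -s 8972 -c 3 10.10.0.12    # only relevant if you intend jumbo (9000) end to end
cr0x@server:~$ ip -o link show | awk '{print $2, $4, $5}'   # quick per-interface MTU overview
If the large ping fails while the small one passes, something in the path is not honoring the MTU you think you configured.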
Task 10: Check time sync (corosync hates time travel)
cr0x@server:~$ timedatectl
Local time: Fri 2025-12-26 09:46:33 UTC
Universal time: Fri 2025-12-26 09:46:33 UTC
RTC time: Fri 2025-12-26 09:46:33
Time zone: Etc/UTC (UTC, +0000)
System clock synchronized: yes
NTP service: active
RTC in local TZ: no
Meaning: You want System clock synchronized: yes. Significant drift can contribute to membership instability and confusing logs.
Decision: If not synchronized, fix NTP/chrony/systemd-timesyncd. Then restart corosync/pve-cluster if needed.
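Which commands apply depends on which time daemon the node actually runs; recent Proxmox releases ship chrony by default, but systemd-timesyncd is also common. Hedged examples for both, run whichever matches your setup:
cr0x@server:~$ chronyc tracking              # offset and stratum, if chrony is in use
cr0x@server:~$ chronyc sources -v            # which upstream servers are actually reachable
cr0x@server:~$ timedatectl timesync-status   # equivalent view, if systemd-timesyncd is in use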
Task 11: Check disk space and inode exhaustion (silent killers)
cr0x@server:~$ df -h /
Filesystem Size Used Avail Use% Mounted on
/dev/pve/root 96G 92G 0 100% /
Meaning: A full root filesystem makes services fail in weird ways, including pmxcfs and logging. Even if pmxcfs is mostly memory-backed, the system around it needs disk to breathe.
Decision: Free space immediately (logs, old kernels, ISO cache). Then re-check service health.
cr0x@server:~$ df -ih /
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/pve/root 6.1M 6.1M 0 100% /
Meaning: Inode exhaustion looks like “disk full” but it’s worse because small files stop being created, including lock and state files.
Decision: Find and remove directories with millions of files (common offenders: runaway logs, misconfigured backups, container overlays).
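For hunting down both space hogs and inode hogs without leaving the root filesystem, something like this works; the journal vacuum actually deletes old log entries, so only run it if the journal really is the offender, and the 200M cap is illustrative:
cr0x@server:~$ du -xh --max-depth=1 / 2>/dev/null | sort -h | tail -n 15          # biggest top-level space consumers
cr0x@server:~$ journalctl --disk-usage
cr0x@server:~$ journalctl --vacuum-size=200M                                      # trims archived journals to roughly 200M
cr0x@server:~$ du --inodes -x --max-depth=1 / 2>/dev/null | sort -n | tail -n 15  # directories hoarding inodes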
Task 12: Check memory pressure and OOM kills
cr0x@server:~$ journalctl -k -n 100 --no-pager | grep -E 'Out of memory|Killed process' || true
Dec 26 09:35:07 pve-a kernel: Out of memory: Killed process 28814 (pveproxy) total-vm:620004kB, anon-rss:112344kB
Meaning: If the kernel is killing pveproxy/pvedaemon/corosync, your “cluster filesystem not ready” is collateral damage from resource starvation.
Decision: Fix the pressure: stop memory hogs, add swap (carefully), reduce overcommit, and investigate the workload that triggered OOM.
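A quick read on memory pressure, assuming a reasonably recent kernel with PSI enabled (current Proxmox kernels generally expose it):
cr0x@server:~$ free -h
cr0x@server:~$ cat /proc/pressure/memory                      # "some"/"full" stall percentages; sustained non-zero "full" is bad news
cr0x@server:~$ ps -eo pid,rss,comm --sort=-rss | head -n 10   # top resident-memory consumers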
Task 13: Confirm /etc/pve is writable (and not lying)
cr0x@server:~$ touch /etc/pve/.pmxcfs-write-test && echo "ok"
ok
cr0x@server:~$ rm -f /etc/pve/.pmxcfs-write-test
Meaning: If touch fails with “Read-only file system” or “Input/output error,” pmxcfs is mounted but not healthy for writes.
Decision: Read-only often means no quorum. IO errors suggest a stuck FUSE mount or a pmxcfs internal failure—go back to logs and consider restarting pve-cluster after fixing underlying issues.
Task 14: Inspect corosync config consistency (do not edit yet)
cr0x@server:~$ sed -n '1,120p' /etc/pve/corosync.conf
totem {
  version: 2
  cluster_name: prod-cluster
  transport: knet
  interface {
    linknumber: 0
    bindnetaddr: 10.10.0.0
  }
}

nodelist {
  node {
    name: pve-a
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.10.0.11
  }
  node {
    name: pve-b
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.10.0.12
  }
  node {
    name: pve-c
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 10.10.0.13
  }
}
Meaning: You’re checking for obvious wrongness: wrong IPs, wrong bindnetaddr, missing node, duplicate nodeid. Also confirm you can read the file; if not, pmxcfs isn’t serving it.
Decision: If config is wrong due to a known change, plan a controlled correction. Random edits during quorum loss are how you manufacture split brain.
Task 15: See if pveproxy/pvedaemon are failing because /etc/pve is down
cr0x@server:~$ systemctl status pveproxy pvedaemon --no-pager
● pveproxy.service - PVE API Proxy Server
Active: active (running) since Fri 2025-12-26 09:41:22 UTC; 6min ago
● pvedaemon.service - PVE API Daemon
Active: active (running) since Fri 2025-12-26 09:41:21 UTC; 6min ago
Meaning: These can be “running” but still erroring because they can’t read cluster config reliably.
Decision: If UI is broken but services are running, check their logs for “/etc/pve” read/write failures and then focus back on pmxcfs/corosync.
Root causes by subsystem
1) Quorum loss (most common, and usually correct behavior)
Quorum isn’t a punishment; it’s a safety belt. Without quorum, a cluster can’t be confident that the config you’re about to write won’t conflict with another node doing the same thing. pmxcfs responds by restricting operations. The UI interprets that as “cluster filesystem not ready,” because from its perspective, it can’t do its job.
Typical triggers:
- One node down in a 2-node cluster (there is no majority).
- Network partition (nodes can’t see each other; both sides may think the other is dead).
- Corosync interface down, VLAN change, switch issue, MTU mismatch.
- Time drift plus unstable network causes membership churn.
Fix pattern: restore membership (bring nodes back, fix network), or add a quorum device for small clusters, or redesign away from 2-node clusters without qdevice.
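For reference, adding a QDevice is not much work. A hedged sketch, assuming Debian-based hosts, a spare machine or VM outside the cluster at 10.10.0.50 (illustrative address), and root SSH access from the nodes to that host:
cr0x@server:~$ apt install corosync-qnetd      # on the external tie-breaker host, not on the cluster nodes
cr0x@server:~$ apt install corosync-qdevice    # on every cluster node
cr0x@server:~$ pvecm qdevice setup 10.10.0.50  # run once from any cluster node
Afterwards, pvecm status should show an extra vote contributed by the qdevice, which is what turns a two-node tie into a decision.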
2) Corosync transport instability (knet, link flaps, and “it pings fine”)
Corosync doesn’t just need connectivity; it needs predictable delivery. Packet loss, microbursts, bufferbloat, or a “smart” firewall in the path can cause token timeouts. When that happens, membership changes constantly, and pmxcfs gets jittery or refuses readiness.
Fix pattern: put corosync on a dedicated, boring network. No NAT. No stateful firewall in-between. Match MTU. Avoid asymmetric routing. If you need redundancy, use multiple links intentionally and test failover.
3) Time drift (the stealth ingredient)
Distributed systems don’t require perfect time, but they do require time that doesn’t jump around. If one node is minutes off, or if NTP steps the clock aggressively, you can get bizarre log sequences, auth weirdness, and unstable cluster behavior.
Fix pattern: configure reliable time sync on every node; prefer gradual slewing; don’t run mixed time daemons fighting each other.
4) Local node resource exhaustion (disk full, inode full, OOM, IO stalls)
pmxcfs is small, but it lives in a real OS. If root is full, journald can’t write, services fail to start, and FUSE mounts can behave badly. If memory is tight, OOM kills the wrong thing at the wrong time. If storage is stalling, processes block, including corosync.
Fix pattern: treat it like a node outage root cause. Free space, fix logging, stop the runaway, then restart services.
5) FUSE mount or pmxcfs stuck state
Sometimes pmxcfs is “running” but the /etc/pve mount is wedged. You see IO errors, long hangs on ls /etc/pve, or processes stuck in D state. This can be caused by kernel/FUSE issues, extreme pressure, or pmxcfs internal deadlock.
Fix pattern: stabilize the node first (CPU/IO/memory), then restart pve-cluster. If the mount won’t unmount cleanly, you may need a reboot of the node (yes, really).
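A cautious sketch for the wedged-mount case, to be used only after CPU/IO/memory are back under control; the D-state check is read-only, and the lazy unmount is a last resort before a reboot:
cr0x@server:~$ ps -eo pid,stat,wchan:32,comm | awk '$2 ~ /^D/'   # anything stuck in uninterruptible sleep?
cr0x@server:~$ systemctl stop pve-cluster
cr0x@server:~$ mount | grep /etc/pve                             # should return nothing once pmxcfs exits cleanly
cr0x@server:~$ fusermount -uz /etc/pve                           # lazy unmount, only if the mount is still listed
cr0x@server:~$ systemctl start pve-cluster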
6) Configuration divergence and split brain
Split brain is when parts of the cluster disagree about membership and proceed independently. In Proxmox terms, it’s when nodes write competing versions of the “truth” under /etc/pve. The platform works hard to prevent this; admins can still defeat those safeguards by forcing quorum, copying configs around, or bringing nodes back in the wrong order.
Fix pattern: stop writing, pick a source of truth, and reconcile carefully. Often that means shutting down the minority partition, restoring connectivity, and ensuring the cluster forms with the correct expected votes.
Joke #2: “Just force quorum” is the distributed-systems version of “hold my beer.”
Three corporate mini-stories (anonymized, plausible, and educational)
Incident #1: a wrong assumption (the two-node trap)
They had a neat little Proxmox cluster: two nodes, shared storage, and the comforting belief that “if either node survives, we’re fine.” It ran for months. Then a top-of-rack switch rebooted during a firmware rollout, and one node lost cluster connectivity for a couple minutes.
When the network came back, the team found “cluster filesystem not ready” on the isolated node. No migrations. No storage edits. Some VM operations still worked locally, but anything involving cluster config was blocked. The on-call assumed the cluster filesystem was “a storage mount” and started checking NFS. The NFS was fine, the problem wasn’t.
The wrong assumption was subtle: they thought “two nodes” naturally implies “redundant.” In quorum logic, two nodes implies “tie.” A tie is not a decision, it’s a deadlock with better manners.
They “solved” it by forcing expected votes to 1 on the isolated node, editing config, and then reconnecting the second node later. Now both nodes had made changes independently. It didn’t explode instantly. It just became a slow-burn config divergence: storage entries didn’t match, HA resources looked inconsistent, and a later maintenance window turned into a puzzle game.
The eventual fix was boring: add a quorum device and stop treating two nodes as a real cluster without a tiebreaker. They also rewired corosync onto a dedicated VLAN. The big win wasn’t uptime; it was that the cluster stopped arguing about reality.
Incident #2: an optimization that backfired (jumbo frames and the vanishing token)
A different org wanted lower CPU overhead and better throughput on their “cluster network,” so they enabled jumbo frames end-to-end. Except it wasn’t end-to-end. One intermediate switch port stayed at 1500 MTU because it was part of a legacy trunk and nobody wanted to touch it during business hours.
VM traffic barely noticed. TCP can negotiate and recover. Corosync noticed immediately, because it depends on timely message delivery. It didn’t fail cleanly. It flapped: token timeouts, retransmits, members dropping and rejoining. Every few minutes someone would see “cluster filesystem not ready” in the UI, then it would clear, then return.
The team optimized for performance and accidentally optimized away reliability. Worse, the symptom didn’t point at MTU. It pointed at Proxmox. They restarted services for hours, which mostly generated new log lines and fresh confidence that “it’s random.”
The fix was simply to make MTU consistent. Not “bigger.” Consistent. After that, corosync stopped timing out, quorum stabilized, and pmxcfs became boring again—which is exactly what you want for config distribution.
Incident #3: boring but correct practice that saved the day (dedicated corosync NICs and disciplined changes)
A finance company had a Proxmox cluster in a shared datacenter rack where networking changes happened often. They’d been burned before, so they kept corosync traffic on a dedicated pair of NICs, pinned to a separate switch pair, and they documented the exact IPs and MTUs like it was a legal contract.
One day a facility incident took out one switch. Half the servers in the rack saw link loss on one side. Their Proxmox nodes logged a link down, but corosync stayed connected on the second link. Quorum never dropped. pmxcfs never went read-only. The UI stayed functional.
What made it work wasn’t heroics. It was that they had designed for the boring failure: losing a switch. They also had an operational rule: no corosync config edits during an active incident. When things wobble, humans make “creative” choices. The rule prevented that.
After the incident, they changed nothing except to replace the dead switch. Then they ran a controlled failover test the following week to confirm behavior. It was not exciting. And that’s the point.
Common mistakes: symptoms → root cause → fix
1) Symptom: UI banner “cluster filesystem not ready,” but VMs still run
Root cause: pmxcfs is restricting writes due to quorum loss; workloads can run locally, but cluster config changes are blocked.
Fix: restore corosync membership and quorum. Bring missing nodes online, fix the corosync network, or deploy a quorum device for small clusters.
2) Symptom: ls /etc/pve hangs or returns “Input/output error”
Root cause: FUSE mount is wedged, pmxcfs is stuck, or the node is under severe IO pressure.
Fix: check IO stalls and kernel logs, then restart pve-cluster. If the mount won’t recover, reboot the node after ensuring VMs are handled (migrated or safely stopped if local storage).
3) Symptom: “quorum not present – operations restricted” in pve-cluster logs
Root cause: the cluster doesn’t have majority membership; often a 2-node design issue or a node/network outage.
Fix: restore the missing node connectivity, or add qdevice. Avoid forcing expected votes unless you are deliberately running single-node mode and understand the split-brain risk.
4) Symptom: corosync logs show token timeouts, retransmit list grows
Root cause: packet loss, MTU mismatch, VLAN mis-tag, or a firewall/ACL interfering with corosync traffic.
Fix: put corosync on a dedicated, stable network; validate MTU consistency; remove stateful devices in path. Then restart corosync and confirm stable membership.
5) Symptom: services keep restarting; logs are sparse; “random” behavior
Root cause: root filesystem full or inode exhaustion; journald can’t write, services fail in confusing ways.
Fix: free space/inodes immediately. Then restart services and re-check. Don’t chase ghosts while the OS can’t create files.
6) Symptom: after “fixing,” cluster configs differ between nodes
Root cause: someone edited configs while the node was isolated or forced quorum, causing divergence.
Fix: stop config writes, establish a single source-of-truth node, and reconcile carefully. If needed, rejoin nodes in a controlled sequence and verify cluster config version progression.
7) Symptom: only one node consistently shows “not ready” after reboots
Root cause: hostname/IP mismatch, wrong ring0_addr, broken DNS resolution, or NIC renaming changed the interface used for corosync.
Fix: confirm node name resolution, corosync bindnetaddr, and the correct interface. Fix OS networking first, then corosync, then pmxcfs.
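For that last symptom, a few read-only checks narrow down whether the node even agrees with itself about its name and address; the address corosync binds to should match what getent and ip report:
cr0x@server:~$ hostname
cr0x@server:~$ getent hosts $(hostname)     # should resolve to the address corosync expects to use
cr0x@server:~$ grep -v '^#' /etc/hosts
cr0x@server:~$ ip -br addr                  # confirm that address is actually configured on an interface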
Checklists / step-by-step plan
Checklist A: Restore “ready” state safely (single affected node)
- Stop making changes. No storage edits, no node joins, no random file copies.
- Verify mount: mount | grep /etc/pve. If missing, focus on pve-cluster.
- Check service: systemctl status pve-cluster and journalctl -u pve-cluster.
- Check quorum: pvecm status. If not quorate, do not expect writable config.
- Check corosync: journalctl -u corosync, corosync-cfgtool -s.
- Check time: timedatectl.
- Check disk/inodes: df -h /, df -ih /.
- Only then restart: restart corosync if transport is fixed; restart pve-cluster after corosync is stable.
- Validate: pvecm status reports quorate, touch /etc/pve/test works, and the UI banner clears.
Checklist B: Two-node cluster “not ready” after a node outage
- Assume quorum loss is expected behavior.
- Bring the second node back (or fix the interconnect) rather than forcing writes.
- If this is a repeating operational problem: implement a quorum device.
- Document a hard rule: “no forced quorum during incidents unless approved and recorded.”
Checklist C: Suspected split brain
- Freeze config writes. Stop humans from “fixing” by editing cluster files.
- Identify partitions: which nodes see which members (run pvecm status on each; see the loop sketched after this checklist).
- Pick a source of truth based on the majority partition and the most recent consistent config version.
- Restore networking and ensure the cluster forms as a single membership.
- Only then reconcile any divergent configs. Verify storage definitions and VM configs match expected state before doing HA actions.
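To compare partitions quickly, a small loop over your node names works; pve-a/pve-b/pve-c are the illustrative names from earlier, and during a real partition some of the SSH hops will simply fail, which is itself useful information:
cr0x@server:~$ for n in pve-a pve-b pve-c; do echo "== $n =="; ssh root@$n 'pvecm status | grep -E "Quorate|Nodes|Ring ID"'; done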
Checklist D: When a reboot is the right answer
Reboot is justified when:
- /etc/pve is hung and you can’t unstick FUSE cleanly.
- The node is in severe IO wait or kernel-level weirdness (D-state processes) and service restarts don’t help.
- OOM conditions are ongoing and you need to reset to a known state (after reducing workload).
Reboot is not justified when:
- You haven’t checked disk full/inodes/time/network and you’re just hoping.
- The cluster is partitioned and rebooting will shuffle the deck without restoring connectivity.
Interesting facts and historical context (for people who like to know why)
- pmxcfs is a FUSE filesystem, which is why /etc/pve behaves unlike normal directories and can “hang” when the userspace daemon is sick.
- Proxmox’s cluster config is distributed, not shared-storage-based. You don’t need a SAN for cluster config replication; you need corosync working.
- Corosync historically leaned on multicast in many setups; modern Proxmox commonly uses the knet transport, which can use unicast and handles multiple links more gracefully.
- Quorum logic exists to prevent split brain, a failure mode well-known in early HA clusters where both halves believed they were primary and wrote conflicting state.
- Two-node clusters are inherently ambiguous without a tie-breaker. This isn’t a Proxmox quirk; it’s a distributed-systems math problem wearing an ops hat.
- /etc/pve contains more than VM configs: storage definitions, firewall rules, and cluster-wide settings. When it’s down, the blast radius looks bigger than “just clustering.”
- Config Version in pvecm status is a useful breadcrumb during incidents: it tells you whether nodes are progressing together or diverging.
- Time drift causes “impossible” debugging because logs don’t line up and membership events appear out of order; cluster stacks tend to amplify that confusion.
FAQ
1) Does “cluster filesystem not ready” mean my VM disks are unavailable?
No. It usually means /etc/pve (pmxcfs) isn’t available for config operations. Your VM disks may be fine. But management actions that rely on cluster config can fail.
2) Can I keep running VMs when the cluster filesystem isn’t ready?
Often yes. Running workloads don’t necessarily stop. The risk is operational: you may not be able to migrate, change config, or manage HA cleanly until pmxcfs and quorum are healthy.
3) Why does Proxmox block writes when quorum is lost?
To prevent split brain config. Without quorum, two partitions could both accept changes and later conflict. Blocking writes is the safe choice.
4) Is it safe to force quorum / expected votes to 1?
It can be temporarily useful in a controlled single-node emergency, but it’s risky. If another node is alive and also writing, you can create divergence. Treat it like a last resort and document it.
5) What’s the difference between corosync being “up” and quorum being “quorate”?
corosync “up” can mean the daemon is running locally. “Quorate” means it has formed a valid membership with enough votes to make decisions. pmxcfs cares about the second one.
6) Why does this happen after a network change that “shouldn’t affect” Proxmox?
Because corosync is extremely sensitive to packet loss, MTU inconsistency, and link flaps. Your VM traffic can survive sloppy networks; consensus traffic is less forgiving.
7) How do I know if /etc/pve is actually mounted or just a directory?
Run mount | grep /etc/pve. It should show fuse.pmxcfs. If not, you’re not looking at the cluster filesystem.
8) Should I reinstall the node if pmxcfs is broken?
Almost never as a first move. Most cases are quorum/network/time/disk pressure. Reinstalling can destroy evidence and make recovery harder. Diagnose first.
9) Can Ceph problems cause “cluster filesystem not ready”?
Indirectly. If Ceph issues cause massive IO wait or node pressure, they can destabilize corosync and pmxcfs. But pmxcfs itself is not stored on Ceph.
10) Why does the UI show the error even when pve-cluster is active?
Because “active” doesn’t mean “ready for safe writes.” pmxcfs can be mounted but operating in restricted mode due to lost quorum, or it can be wedged while still running.
Conclusion: practical next steps
If you take only one operational lesson from this error, take this: don’t treat it like a storage problem; treat it like a cluster-consensus problem. Check whether /etc/pve is mounted, then check quorum and corosync health, then fix time/network/resource pressure in that order.
Next steps that pay off:
- If you run two nodes, add a quorum device. Stop gambling on ties.
- Give corosync a dedicated, boring network. Stable MTU, no “security appliances” in the path, redundant links only if you can test them.
- Monitor the boring stuff: disk/inodes on root, NTP sync status, and corosync link stability. The cluster filesystem error is usually the messenger, not the villain.
- Write an incident rule: no forced quorum and no manual config edits under partition unless a single owner is accountable and the blast radius is understood.