Proxmox filesystem becomes read-only: why it happens and how to recover

You notice it when backups fail, migrations stall, or the GUI starts throwing “permission denied” like it’s having a bad day. Then you try to edit a config and your shell replies: Read-only file system. That’s not a “reboot later” problem. That’s your storage stack telling you it’s protecting itself from making things worse.

On Proxmox, a read-only filesystem can be a boring ext4 safety feature, a loud ZFS integrity reaction, a dying SSD, or a cluster filesystem that’s lost quorum and decided it’s done with your nonsense. The good news: most cases are recoverable. The bad news: the recovery steps are different depending on which filesystem went read-only and why.

What “read-only” actually means on Proxmox

“Read-only” is not one single thing. On a Proxmox host you can have:

  • The root filesystem (usually ext4 or ZFS root, sometimes XFS) mounted read-only by the kernel after errors.
  • A data filesystem (ZFS dataset, ext4/XFS on LVM, NFS/Ceph mount) going read-only while the host stays writable.
  • pmxcfs (the Proxmox cluster filesystem mounted at /etc/pve) refusing writes when there’s no quorum or local database trouble.
  • App-level read-only behavior that looks like a filesystem problem (e.g., ZFS dataset readonly=on, or a storage backend returning EROFS).
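
That last bullet is the cheapest to rule out, so do it first. A minimal sketch, assuming a pool named rpool and a hypothetical NFS storage mounted at /mnt/pve/backup-nfs (adjust names to your setup):

cr0x@server:~$ zfs get -r -s local readonly rpool
cr0x@server:~$ findmnt -no OPTIONS /mnt/pve/backup-nfs

If the first command lists a dataset with readonly=on, or the second shows ro in the mount options, you're looking at configuration or a backend decision, not a kernel protection event.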

The kernel flips a mount into read-only mode when it thinks continuing writes may corrupt metadata. That’s not Linux being dramatic; it’s Linux refusing to be your accomplice.

Rule #1: identify which mount is read-only and which layer is generating the error (VFS, filesystem driver, block layer, or cluster service). If you skip this, you’ll spend an hour remounting the wrong thing and feel productive while nothing improves.

Fast diagnosis playbook (first/second/third)

First: confirm what is read-only and where writes fail

  • Is it /? /var? /etc/pve? A ZFS dataset like rpool/data? A mounted NFS share?
  • Try a tiny write in the failing path (don’t “fix” anything yet). It’s a probe.
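
A minimal sketch of that probe, assuming the usual suspects (adjust the paths to whatever is actually failing):

cr0x@server:~$ findmnt -no OPTIONS /          # "ro" at the start means the kernel already flipped it
cr0x@server:~$ findmnt -no OPTIONS /etc/pve
cr0x@server:~$ touch /var/tmp/.rw-probe && rm /var/tmp/.rw-probe && echo writable

One filesystem failing while another accepts writes is already half the diagnosis.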

Second: grab the kernel story before it scrolls away

  • dmesg -T and journalctl -k -b tell you whether the kernel saw I/O errors, filesystem corruption, ZFS pool suspension, or a forced shutdown.
  • Look for: Buffer I/O error, EXT4-fs error, XFS (…): Corruption detected, blk_update_request, ata/nvme resets, ZFS: pool I/O suspended.

Third: decide if this is “storage is sick” or “filesystem is sick”

  • Storage sick: I/O errors, timeouts, resets, SMART warnings, NVMe media errors. Fix hardware/cabling/controller first. Running fsck on a drive that’s actively failing is how you turn a scrape into an amputation.
  • Filesystem sick: clean device health, but metadata errors. Then repair (fsck/xfs_repair/zpool clear depending on stack).
  • Cluster/quorum sick: only /etc/pve is read-only and logs show quorum loss. That’s political, not physical.

Make a call quickly: stabilize, preserve evidence, then repair. Your goal is to prevent cascading damage (VM disks, logs, databases) while recovering service.

Why filesystems go read-only (real failure modes)

1) Actual disk I/O errors (the most common “real” cause)

When the block layer can’t complete writes, filesystems often respond by remounting read-only to avoid half-written metadata. This can be from:

  • Bad sectors on HDDs or NAND wear on SSDs
  • NVMe controller resets or firmware bugs
  • SATA cable issues (yes, still)
  • RAID controller hiccups or cache/battery faults
  • Power loss causing internal device panic

On Proxmox, the first clue is usually in dmesg long before you see “read-only” at the mount layer.

2) Filesystem metadata corruption (ext4/XFS)

ext4 has a configurable errors= policy (continue, remount-ro, or panic); Debian-based systems, Proxmox included, typically mount the root filesystem with errors=remount-ro. XFS takes a harder line and forces a shutdown of the filesystem when it detects corruption. Both are doing you a favor, just not a convenient one.

3) ZFS pool trouble: suspended I/O, faulted vdevs, or device disappearances

ZFS doesn’t “remount read-only” in the same way ext4 does. Instead, it may suspend I/O to protect consistency when it can’t satisfy writes safely. From the VM’s perspective it’s similar: operations hang or error, and Proxmox starts logging storage failures.

4) Full filesystem or inode exhaustion (looks like read-only if you don’t read the error)

A full filesystem usually throws No space left on device, not read-only. But plenty of tooling and human brains interpret “can’t write” as “read-only”. On a Proxmox host, /var fills from logs, backups, crash dumps, or runaway container writes.
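
Both variants take seconds to rule out; a minimal sketch:

cr0x@server:~$ df -hT / /var                  # space
cr0x@server:~$ df -i / /var                   # inodes: IUse% at 100% fails writes with plenty of "space" left
cr0x@server:~$ du -xh --max-depth=1 /var 2>/dev/null | sort -h | tail -n 10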

5) pmxcfs (/etc/pve) read-only due to quorum loss

/etc/pve is a FUSE-based cluster filesystem (pmxcfs). When the cluster loses quorum, Proxmox protects you from split-brain by making cluster config read-only. People misdiagnose this as a disk failure because the error string is identical.

6) “Optimizations” that create fragility

Write caching without power-loss protection, aggressive discard settings, disabling barriers, exotic RAID modes, cheap USB boot devices… all of these can produce “surprising” read-only remounts during stress or power events. The surprise is only for the person who configured it.
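
If you inherited the host and don’t know what was “optimized,” it’s worth seeing what the write path actually does today. A minimal sketch for a SATA disk (device name is an example; the sysfs path assumes a reasonably recent kernel):

cr0x@server:~$ hdparm -W /dev/sda                      # volatile write cache on or off
cr0x@server:~$ cat /sys/block/sda/queue/write_cache    # "write back" vs "write through"

Neither output tells you whether that cache survives power loss; that answer lives in the controller/SSD documentation and in whatever monitors its battery or capacitors.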

Joke #1: Storage is like parachuting—most of the time it’s boring, and the exciting parts are rarely repeatable in a good way.

Interesting facts and history you can weaponize

  1. ext4’s “errors=remount-ro” lineage goes back to ext2/ext3 era thinking: if metadata is suspect, stop writing fast to preserve recoverability.
  2. XFS was born at SGI for big-iron workloads; its “force shutdown” behavior is a deliberate “better dead than corrupt” stance.
  3. Linux uses the errno EROFS (“Read-only file system”) as a generic signal, so totally different causes can look identical from user space.
  4. Proxmox’s /etc/pve isn’t “a directory” in the normal sense. It’s a distributed config store exposed as a filesystem via FUSE (pmxcfs).
  5. Quorum rules are older than Proxmox: distributed systems have used majority voting for decades to avoid split-brain writes. Proxmox just makes it painfully visible.
  6. ZFS treats silent corruption as a first-class enemy; the end-to-end checksum model is why it’s loved in ops circles—and why it can get strict when devices misbehave.
  7. SMART wasn’t designed for perfect prediction; it’s more like “weather forecasting for disks.” Still useful, but not magic.
  8. Write caches changed the failure game: modern SSDs and controllers can acknowledge writes before they’re durable. Without power-loss protection, that’s a data integrity tax you pay later.
  9. FUSE filesystems can go “read-only” for reasons unrelated to disks, because the daemon can decide to reject writes when its consistency model is threatened.

Practical tasks: commands, outputs, decisions (12+)

These are the tasks I actually run when a Proxmox node starts refusing writes. Each task includes: command, what the output means, and what you decide next.

Task 1: Prove which mount is read-only

cr0x@server:~$ mount | egrep ' on / | on /var | on /etc/pve |type zfs|type ext4|type xfs'
/dev/mapper/pve-root on / type ext4 (ro,relatime,errors=remount-ro)
/dev/fuse on /etc/pve type fuse (rw,nosuid,nodev,relatime,user_id=0,group_id=0)

Meaning: root is mounted (ro,...). /etc/pve is still rw.

Decision: treat this as a host filesystem problem, not quorum. Move to kernel logs and storage health immediately.

Task 2: Attempt a controlled write to the failing path

cr0x@server:~$ touch /root/.rw-test
touch: cannot touch '/root/.rw-test': Read-only file system

Meaning: the error is real and coming from VFS/FS, not permissions.

Decision: stop non-essential writes (backups, log storms), and gather evidence before rebooting.

Task 3: Read kernel messages for the trigger

cr0x@server:~$ dmesg -T | tail -n 40
[Thu Dec 26 10:41:09 2025] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[Thu Dec 26 10:41:09 2025] blk_update_request: I/O error, dev sda, sector 312492800 op 0x1:(WRITE) flags 0x0 phys_seg 1 prio class 0
[Thu Dec 26 10:41:09 2025] EXT4-fs error (device dm-0): ext4_journal_check_start:83: Detected aborted journal
[Thu Dec 26 10:41:09 2025] EXT4-fs (dm-0): Remounting filesystem read-only

Meaning: this started as a block-layer write error (dev sda) which then caused ext4 journal abort and remount ro.

Decision: assume hardware path issue until proven otherwise. Do SMART, cabling/controller checks; avoid fsck until the device is stable.

Task 4: Confirm current boot and get persistent logs

cr0x@server:~$ journalctl -k -b | egrep -i 'error|ext4|xfs|nvme|ata|timeout|reset|zfs|I/O' | tail -n 60
Dec 26 10:41:09 server kernel: blk_update_request: I/O error, dev sda, sector 312492800 op 0x1:(WRITE)
Dec 26 10:41:09 server kernel: EXT4-fs error (device dm-0): ext4_journal_check_start: Detected aborted journal
Dec 26 10:41:09 server kernel: EXT4-fs (dm-0): Remounting filesystem read-only

Meaning: same story as dmesg, but now in the journal (useful after reboot).

Decision: proceed with device health checks; plan for maintenance window if the underlying disk is failing.

Task 5: Identify the block device stack (dm-0 → LVM → physical disk)

cr0x@server:~$ lsblk -o NAME,TYPE,SIZE,FSTYPE,MOUNTPOINTS,UUID
NAME           TYPE   SIZE FSTYPE      MOUNTPOINTS                UUID
sda            disk  1.8T
├─sda1         part  1007K
├─sda2         part     1G vfat        /boot/efi                 2B1A-9C3D
└─sda3         part   1.8T LVM2_member                          1dGk2A-...
  ├─pve-root   lvm     96G ext4        /                         4d9a7a0e-...
  ├─pve-swap   lvm      8G swap        [SWAP]                    0f19...
  └─pve-data   lvm    1.7T LVM2_member
    └─pve-vm--data-lvpool lvm 1.7T

Meaning: root filesystem sits on LVM logical volume pve-root backed by sda3.

Decision: check the physical disk (/dev/sda) first. A pretty LVM stack won’t save a dying drive.

Task 6: Check SMART status (SATA/SAS) for “is this disk lying to me?”

cr0x@server:~$ smartctl -a /dev/sda
SMART overall-health self-assessment test result: FAILED!
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   001   001   010    Pre-fail  Always   FAILING_NOW 2048
197 Current_Pending_Sector  0x0012   001   001   000    Old_age   Always       -       128
198 Offline_Uncorrectable   0x0010   001   001   000    Old_age   Offline      -       128

Meaning: this disk is not “degraded,” it’s actively failing. Pending and uncorrectable sectors correlate strongly with I/O errors.

Decision: stop trusting the device. Prioritize data evacuation and replacement. Don’t run write-heavy repairs unless you must.

Task 7: NVMe health (if you boot from NVMe)

cr0x@server:~$ nvme smart-log /dev/nvme0
critical_warning                    : 0x00
media_errors                        : 12
num_err_log_entries                 : 58
percentage_used                     : 87%

Meaning: high wear (percentage_used) and recorded media errors. Not automatically fatal, but not a “ship it” signal.

Decision: schedule replacement and check firmware/thermal environment. If the filesystem already remounted ro, treat it as urgent.

Task 8: Check if /etc/pve is the thing that’s read-only (quorum failure detection)

cr0x@server:~$ touch /etc/pve/.pmxcfs-test
touch: cannot touch '/etc/pve/.pmxcfs-test': Read-only file system

Meaning: pmxcfs is refusing writes (often quorum, sometimes pmxcfs trouble).

Decision: do cluster checks, not disk checks, unless other mounts also failed.

Task 9: Cluster health and quorum

cr0x@server:~$ pvecm status
Quorum information
------------------
Date:             Thu Dec 26 11:02:18 2025
Quorum provider:  corosync_votequorum
Nodes:            3
Node ID:          0x00000001
Ring ID:          1.2a
Quorate:          No

Meaning: not quorate. Proxmox will protect cluster config writes.

Decision: fix corosync network / bring nodes back / adjust expected votes for emergency operation (carefully). Don’t try to “chmod your way out.”

Task 10: Check pmxcfs / corosync services

cr0x@server:~$ systemctl status pve-cluster corosync --no-pager
● pve-cluster.service - The Proxmox VE cluster filesystem
     Active: active (running)
● corosync.service - Corosync Cluster Engine
     Active: active (running)

Meaning: services run, but quorum can still be lost. Don’t confuse “active” with “healthy.”

Decision: troubleshoot connectivity and votes; validate pvecm status again after changes.
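
For the connectivity half, corosync will tell you what it thinks of its own links; a minimal sketch:

cr0x@server:~$ corosync-cfgtool -s                                 # per-link status as corosync sees it
cr0x@server:~$ journalctl -u corosync -b --no-pager | tail -n 30   # look for link down/up flaps, retransmits, token timeouts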

Task 11: ZFS quick health check (if you use ZFS)

cr0x@server:~$ zpool status -x
pool 'rpool' is DEGRADED
status: One or more devices could not be used because the label is missing or invalid.
action: Replace the device using 'zpool replace'.

Meaning: a vdev member went missing or is unreadable. On mirrored/raidz pools, you might still be running, but you’re on borrowed time.

Decision: identify the missing device, check cabling/backplane, and replace. If I/O is suspended, prioritize clearing that condition safely.

Task 12: ZFS detailed status with errors and device mapping

cr0x@server:~$ zpool status -v rpool
  pool: rpool
 state: DEGRADED
status: One or more devices could not be used because the label is missing or invalid.
config:

        NAME                        STATE     READ WRITE CKSUM
        rpool                       DEGRADED     0     0     0
          mirror-0                  DEGRADED     0     0     0
            ata-SAMSUNG_SSD_1       ONLINE       0     0     0
            ata-SAMSUNG_SSD_2       UNAVAIL      0     0     0  cannot open

errors: No known data errors

Meaning: one mirror member vanished. No checksum errors yet, but resilience is reduced to zero redundancy.

Decision: treat as an incident. Find why the device is UNAVAIL (HBA reset? backplane? dead SSD). Replace then scrub.

Task 13: Check for a ZFS “suspended I/O” event in logs

cr0x@server:~$ journalctl -k -b | egrep -i 'zfs|suspend|zio|I/O' | tail -n 40
Dec 26 10:55:01 server kernel: ZFS: vdev IO failure, zio pipeline stalled
Dec 26 10:55:01 server kernel: ZFS: pool rpool has encountered an uncorrectable I/O failure and has been suspended.

Meaning: ZFS suspended the pool to prevent further damage. Writes may fail/hang.

Decision: stop VM/CT writes, ensure hardware path is stable, then clear/replace as appropriate. Reboots without fixing hardware can produce loops.

Task 14: Detect “it’s just full” quickly

cr0x@server:~$ df -hT /
Filesystem          Type  Size  Used Avail Use% Mounted on
/dev/mapper/pve-root ext4   94G   94G     0 100% /

Meaning: root is full. This alone shouldn’t remount ro, but it will cause widespread write failures.

Decision: free space safely (log cleanup, remove old kernels, prune caches), then reassess. If you also see ext4 errors, the full disk may be a symptom (e.g., log spam from failing hardware).
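
A few levers that are usually safe, assuming the classic offenders (journal logs, apt cache, old local backups); look before you delete:

cr0x@server:~$ journalctl --disk-usage
cr0x@server:~$ journalctl --vacuum-size=200M          # keep roughly 200M of journal, drop the rest
cr0x@server:~$ apt-get clean                          # drop cached .deb files
cr0x@server:~$ ls -lh /var/lib/vz/dump | tail -n 5    # stale local vzdump backups live here by default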

Task 15: Confirm ext4 error policy and mount options

cr0x@server:~$ tune2fs -l /dev/mapper/pve-root | egrep -i 'Filesystem state|Errors behavior|Last error'
Filesystem state:         clean with errors
Errors behavior:          Remount read-only
Last error time:          Thu Dec 26 10:41:09 2025

Meaning: ext4 is configured to remount ro on errors; last error time matches the incident.

Decision: plan offline fsck after stabilizing hardware. Don’t “fix” by changing error behavior; that’s like removing the smoke alarm batteries because the beeping is annoying.

Task 16: Confirm XFS forced shutdown details (if XFS)

cr0x@server:~$ journalctl -k -b | egrep -i 'xfs|shutdown|metadata|corrupt' | tail -n 40
Dec 26 10:39:12 server kernel: XFS (dm-0): Corruption detected. Unmount and run xfs_repair
Dec 26 10:39:12 server kernel: XFS (dm-0): xfs_do_force_shutdown(0x2) called from xfs_reclaim_inodes+0x2b0/0x2f0

Meaning: XFS detected corruption and forced shutdown to prevent further damage.

Decision: schedule downtime to unmount and run xfs_repair (often from rescue mode). Again: verify hardware first.

That’s the diagnostic core. Now let’s talk recovery without making it worse.

Recovery by storage stack: ext4/XFS, ZFS, LVM, pmxcfs

Case A: ext4 root or data filesystem remounted read-only

ext4 goes read-only when it detects internal inconsistency (often triggered by I/O errors). Your priorities:

  1. Stop write-heavy services to reduce churn.
  2. Capture logs (kernel and syslog).
  3. Confirm underlying storage health (SMART, dmesg resets).
  4. Repair offline (fsck) after you can trust the block layer.

Try remounting read-write (only as a temporary move)

If the kernel remounted ro due to ext4 errors, remounting rw can fail—or succeed briefly then fail again. Treat this as “get logs and copy critical configs,” not “resume business as usual.”

cr0x@server:~$ mount -o remount,rw /
mount: /: cannot remount /dev/mapper/pve-root read-write, is write-protected.

Meaning: the kernel is refusing rw because the filesystem is in an error state.

Decision: you need offline repair. Don’t fight the kernel. It will win, and you’ll lose data.

Prepare for offline fsck (safe-ish approach)

You usually can’t fsck the mounted root filesystem. Plan to boot into a rescue environment or single-user mode with root unmounted.

In Proxmox terms, that’s often: boot from ISO/IPMI virtual media, or use grub recovery if you know what you’re doing.
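
Once you have a rescue shell, the stock Proxmox LVM layout usually needs the volume group activated before /dev/mapper/pve-root even exists. A minimal sketch, assuming the default pve VG name:

cr0x@server:~$ vgscan
cr0x@server:~$ vgchange -ay pve
cr0x@server:~$ lvs -o lv_name,vg_name,lv_size,lv_attr pve   # confirm root/swap/data are visible and active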

With the root LV visible and still unmounted, run:

cr0x@server:~$ fsck.ext4 -f -y /dev/mapper/pve-root
e2fsck 1.47.0 (5-Feb-2023)
/dev/mapper/pve-root: recovering journal
/dev/mapper/pve-root: Clearing orphaned inode 131082
/dev/mapper/pve-root: FIXED.

Meaning: journal recovered; orphaned inodes fixed; fsck reports repairs.

Decision: reboot and watch logs. If errors recur, the disk path is still failing or you have RAM/controller issues.

When fsck is the wrong first step

If SMART is failing or dmesg shows resets/timeouts, fsck can accelerate failure due to heavy reads/writes. In that case:

  • Image/copy what you can (VM disks, configs) to a healthy device; see the ddrescue sketch after this list.
  • Replace hardware.
  • Then repair if needed.
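
For the imaging step, plain dd gives up at the first unreadable sector; GNU ddrescue is built for exactly this situation. A minimal sketch, assuming the sick disk is /dev/sda and /mnt/rescue is healthy storage with enough free space (both names are examples):

cr0x@server:~$ apt-get install gddrescue          # Debian package name; the binary is "ddrescue"
cr0x@server:~$ ddrescue -d -r3 /dev/sda /mnt/rescue/sda.img /mnt/rescue/sda.map

The map file makes the copy resumable, so you can grab the easy regions first and let the retries grind later.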

Case B: XFS filesystem forced shutdown

XFS repair is xfs_repair. It must run with the filesystem unmounted. If it’s the root filesystem, you’re in rescue mode again.
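
Before letting it write anything, a no-modify pass shows what it would change; a minimal sketch:

cr0x@server:~$ xfs_repair -n /dev/mapper/pve-root    # report-only, touches nothing

If xfs_repair complains about a dirty log, the clean path is to mount and unmount once so the log replays; forcing it with -L zeroes the log and can discard recent transactions, so treat that as a last resort. When the dry run looks sane, run the real repair: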

cr0x@server:~$ xfs_repair /dev/mapper/pve-root
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
Phase 7 - verify and correct link counts...
done

Meaning: the log was reset and metadata corrected.

Decision: reboot, then run a workload check. If corruption repeats, it’s frequently hardware (RAM, controller, drive) rather than XFS “being fragile.”

Case C: ZFS pool trouble (degraded, faulted, suspended I/O)

ZFS is not shy. If it thinks writes can’t be safely committed, it can suspend I/O. That’s when Proxmox starts acting haunted: VM disks hang, tasks don’t complete, and you get timeouts everywhere.

Step 1: status, don’t guess

cr0x@server:~$ zpool status -v
  pool: rpool
 state: DEGRADED
status: One or more devices has experienced an error resulting in data corruption.
action: Restore the file in question if possible. Otherwise restore the entire pool from backup.
errors: Permanent errors have been detected in the following files:
        rpool/data/vm-101-disk-0

Meaning: you have actual permanent errors affecting a VM disk. ZFS is telling you it could not repair from redundancy.

Decision: stop that VM, restore from backup/replica. Don’t “scrub harder.” Scrub won’t invent missing bits.

Step 2: replace missing/faulted device (example)

cr0x@server:~$ zpool replace rpool ata-SAMSUNG_SSD_2 /dev/disk/by-id/ata-SAMSUNG_SSD_NEW

Meaning: ZFS begins resilvering to the new device.

Decision: monitor resilver progress and system load. If this is production, you may want to pause migrations and heavy backups.

Step 3: clear errors after hardware fix

cr0x@server:~$ zpool clear rpool

Meaning: clears error counts; does not “repair” data by itself.

Decision: run a scrub to validate and repair from redundancy.

Step 4: scrub and interpret results

cr0x@server:~$ zpool scrub rpool
cr0x@server:~$ zpool status rpool
  pool: rpool
 state: ONLINE
scan: scrub repaired 0B in 00:12:31 with 0 errors on Thu Dec 26 12:03:44 2025

Meaning: scrub found no errors (good). If it repaired data, you’d see non-zero repaired bytes; if it couldn’t repair, you’d see errors.

Decision: if errors persist: check cabling/HBA, and consider memory testing (ZFS is good at exposing bad RAM).

Case D: LVM or device-mapper issues causing read-only behavior

Sometimes the filesystem is innocent; the underlying block device flips to an error state. Check dm and LVM messages:

cr0x@server:~$ dmsetup info -C
Name             Maj Min Stat Open Targ Event  UUID
pve-root         252   0 L--w   1    1      0  LVM-...

Meaning: Stat flags can hint at device state (this varies). More importantly, combine with kernel logs for “device mapper: thin: …” or “I/O error on dm-0”.

Decision: if thin-pool metadata is full or corrupted, you need LVM-thin specific fixes and likely downtime. Don’t improvise.
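
The fastest way to see whether the thin pool itself is the problem; a minimal sketch, assuming the default pve/data thin pool:

cr0x@server:~$ lvs -a -o lv_name,vg_name,lv_size,data_percent,metadata_percent pve

Data% or Meta% sitting at or near 100 means stop provisioning and stop writes; recovering a full or damaged thin pool involves extending the pool or its metadata LV and possibly thin_check/thin_repair, which is exactly the “needs downtime, don’t improvise” territory above.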

Case E: /etc/pve is read-only (pmxcfs / quorum)

This is the classic Proxmox trap: everything else is writable, but you can’t edit a VM config, storage config, or add a node. Error says read-only. People blame disks. Meanwhile the cluster is just not quorate.

Confirm pmxcfs mount and quorum

cr0x@server:~$ mount | grep /etc/pve
/dev/fuse on /etc/pve type fuse (rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other)

cr0x@server:~$ pvecm status | egrep 'Quorate|Nodes|Expected votes|Total votes'
Nodes:            2
Expected votes:   3
Total votes:      2
Quorate:          No

Meaning: pmxcfs mount says rw, but the cluster state is not quorate; Proxmox will still refuse config writes.

Decision: restore quorum by bringing nodes back or fixing corosync networking. Only do vote changes if you understand split-brain implications.

Emergency: set expected votes (use with respect)

cr0x@server:~$ pvecm expected 2

Meaning: you tell corosync to expect fewer votes so the remaining nodes can become quorate.

Decision: do this only when you’re sure the missing nodes won’t come back and write divergent config. This is a “get production back” lever, not a daily convenience.

Joke #2: Quorum is the corporate meeting where nothing gets approved unless enough people show up, and somehow that’s still better than chaos.

Three mini-stories from corporate life

Mini-story 1: The incident caused by a wrong assumption

The team saw “Read-only file system” while editing a VM config. Naturally, they blamed the boot SSD. Someone opened a change ticket to replace the drive. Meanwhile, the node was still serving VMs fine, and SMART was clean.

A senior admin tried to create a file in /tmp and it worked. They tried the same in /etc/pve and it failed. That one detail should have ended the debate, but the assumption was already sticky: “read-only means disk.”

They eventually ran pvecm status and found the cluster was not quorate after a network change: corosync traffic was riding the same bonded interface as a noisy backup VLAN, and multicast/UDP loss was making membership flap. pmxcfs protected the cluster state by refusing writes. Exactly what it’s designed to do.

The fix was boring: move corosync to its own VLAN and verify end-to-end MTU and packet loss. No drive swap required. The postmortem’s best line was the simplest: “We treated a distributed system issue like a disk issue.” That’s the wrong assumption in one sentence.

Mini-story 2: The optimization that backfired

A virtualization platform team wanted faster VM storage. They enabled aggressive write caching on a RAID controller, and for good measure disabled barriers because “we have a battery.” It benchmarked beautifully. Everyone clapped. Someone put the graphs in a slide deck, which is how you know it’s serious.

Months later, a maintenance event caused a brief power disturbance. The RAID cache battery was present but not actually healthy; it had been reporting warnings that nobody monitored. The controller acknowledged writes that weren’t durable, and after the reboot, the host filesystem came up with journal problems. ext4 detected it and remounted root read-only mid-boot.

Recovery was messy: a rescue boot, fsck, and then days of chasing subtle VM-level corruption symptoms. Not everything was lost, but trust was. The “optimization” saved milliseconds and cost weekends.

The lasting change wasn’t performance tuning; it was governance: any change touching write ordering needed an explicit power-loss model and monitoring for the controller battery/flash module status. Speed is great. Speed with lies is how you get career development you didn’t ask for.

Mini-story 3: The boring but correct practice that saved the day

A different shop ran Proxmox on ZFS mirrors, kept weekly scrub schedules, and did something deeply unglamorous: they tested restores. Not once. Regularly. The backups were not “a checkbox,” they were part of the operational rhythm.

One morning, a node started logging NVMe media errors. ZFS reported a device going flaky, but the pool stayed online thanks to mirroring. The team didn’t wait for heroic failure. They migrated the busiest VMs off the node, replaced the NVMe, and resilvered.

During the resilver, one VM disk showed a permanent error on a specific block. That’s the nightmare headline, except it wasn’t. They stopped the VM, restored from last night’s backup, and moved on. Downtime was limited to one workload, and the blast radius stayed small.

The saving move wasn’t a clever command. It was the practiced muscle memory: scrub, monitor, migrate, replace, restore. No drama. No midnight archaeology in dmesg. Boring won because boring was prepared.

Common mistakes: symptoms → root cause → fix

1) “Can’t edit VM config” → pmxcfs/quorum issue → fix cluster quorum

  • Symptoms: writes fail only under /etc/pve; VMs still run; disk checks look fine.
  • Root cause: cluster not quorate, or corosync instability.
  • Fix: pvecm status, restore node connectivity; in emergencies, adjust expected votes deliberately. Then validate writes to /etc/pve.

2) Root filesystem went ro after I/O errors → dying disk/cable/HBA → fix hardware path first

  • Symptoms: blk_update_request, timeouts, ATA resets, NVMe resets; ext4 journal abort.
  • Root cause: unstable block device path.
  • Fix: SMART/NVMe logs; reseat/replace cable; update firmware; replace device. Only then fsck/repair.

3) ZFS pool “suspended” → repeated device errors → stop writes, replace, clear, scrub

  • Symptoms: tasks hang; ZFS logs show suspended pool; zpool status shows UNAVAIL/FAULTED.
  • Root cause: ZFS encountered uncorrectable I/O and suspended to protect consistency.
  • Fix: stabilize hardware; replace failed vdev; zpool clear; scrub. Restore affected VM disks if permanent errors exist.

4) “Read-only” but actually full disk → log/backup growth → free space, fix growth source

  • Symptoms: writes fail; df -h shows 100%; logs huge.
  • Root cause: full filesystem or inode exhaustion.
  • Fix: remove large offenders (/var/log, old ISOs, old kernels), rotate logs, move backups, and add monitoring thresholds.

5) Repair loop: fsck “fixes” but ro returns → underlying errors continue → stop and address root

  • Symptoms: after reboot/repair, within hours it remounts ro again.
  • Root cause: failing hardware, bad RAM, flaky controller, or power issues.
  • Fix: hardware diagnostics, memtest in maintenance window, firmware updates, and power path review (PSU/UPS).

6) Trying to force rw mounts in production → more corruption → accept downtime and repair properly

  • Symptoms: intermittent recovery with mount -o remount,rw, then worsening errors.
  • Root cause: treating the symptom; ignoring the integrity protection.
  • Fix: take outage, repair offline, restore from backup if needed. You don’t negotiate with physics.

Checklists / step-by-step plan

Checklist A: When the Proxmox host root filesystem is read-only

  1. Stabilize: pause heavy tasks. If VMs are running, stop backups/migrations. Avoid log storms.
  2. Identify mounts: confirm which mount is ro (mount, findmnt).
  3. Collect evidence: dmesg -T, journalctl -k -b, store copies off-host if possible.
  4. Check storage health: smartctl or nvme smart-log; note any resets/timeouts.
  5. Decide:
    • If hardware sick: evacuate data and replace hardware.
    • If hardware clean: plan offline fsck/xfs_repair.
  6. Repair offline: boot rescue, run filesystem repair tool, reboot.
  7. Validate: watch logs for reoccurrence; run a controlled workload; schedule follow-up.

Checklist B: When only /etc/pve acts read-only

  1. Verify scope: write test in /tmp and /etc/pve.
  2. Check quorum: pvecm status.
  3. Check corosync health: packet loss, VLAN, MTU consistency, bonding mode, switch settings.
  4. Recover quorum: bring nodes back, fix network. Use expected votes only as an emergency lever.
  5. Validate: create/edit a harmless config change (or touch file) under /etc/pve.

Checklist C: When ZFS is unhappy (degraded/suspended/errors)

  1. Get status: zpool status -v and capture it.
  2. Stop damage: pause VM writes if pool is suspended or throwing errors.
  3. Confirm device mapping: map by-id to physical bay; check HBA/backplane logs.
  4. Replace/fix device: reseat, replace, zpool replace as appropriate.
  5. Clear and scrub: zpool clear, then zpool scrub.
  6. Handle permanent errors: restore affected VM disks from backup/replica; don’t pretend it didn’t happen.

A single operational principle (worth memorizing)

“Repair” is not a goal; “recover service without corrupting data” is. Many bad incidents happen because someone optimized for the former.

One quote, because it’s the closest thing our field has to scripture:

“Hope is not a strategy.” — Gen. Gordon R. Sullivan

FAQ

1) Why does Linux remount ext4 read-only instead of crashing?

Because continuing writes after metadata corruption can make recovery impossible. Remounting ro is a containment move: preserve what’s still consistent.

2) Can I just run mount -o remount,rw / and keep going?

Sometimes it works briefly. If the kernel flagged ext4 errors, it often won’t. Even if it does, you’re writing onto a filesystem that already admitted it can’t guarantee consistency. Use remount rw only to extract data or logs, not as a “fix.”

3) If /etc/pve is read-only, does that mean my disk is broken?

Not necessarily. pmxcfs can block writes due to quorum loss or cluster state issues. Test writes elsewhere (like /tmp), then check pvecm status.

4) What’s the fastest way to tell “quorum problem” vs “disk problem”?

If only /etc/pve fails writes and the rest of the system is writable, it’s usually quorum/pmxcfs. If root or /var is ro, it’s usually filesystem/hardware. Confirm with mount and pvecm status.

5) ZFS says “permanent errors.” Can scrub fix that?

No. “Permanent errors” means ZFS couldn’t reconstruct the data from redundancy. You need to restore the affected file/volume from backup or a replica.

6) The disk SMART looks fine, but ext4 still remounted ro. What else could it be?

SMART can miss transient or path-level failures. Look for SATA/NVMe resets, controller errors, bad cables/backplanes, power issues, and (occasionally) bad RAM.

7) Should I reboot immediately when I see read-only?

Not blindly. First collect logs (dmesg, journalctl) and check the failure scope. Rebooting can erase the best evidence, and if hardware is failing, it can turn a limping system into a dead one.

8) How do I prevent this from happening again?

You don’t prevent all failures. You reduce surprise and blast radius: monitor SMART/NVMe errors, watch dmesg for resets, keep free space, scrub ZFS, test backups, and isolate corosync networking.

9) Is ext4 or ZFS “better” for avoiding read-only incidents?

They fail differently. ext4 often remounts ro on detected errors; ZFS often keeps going until it can’t, then gets strict (suspends I/O) and gives you excellent diagnostics. Choose based on your operational maturity: ZFS rewards discipline; ext4 is simpler but less self-describing.

Next steps you should actually take

If you’re in the middle of the incident right now:

  1. Run the fast diagnosis playbook: identify the mount, read kernel logs, decide “storage vs filesystem vs quorum.”
  2. If you see I/O errors: treat hardware as guilty until proven innocent. Grab SMART/NVMe logs and plan replacement.
  3. If it’s ext4/XFS metadata: schedule an offline repair window. Don’t “remount and pray.”
  4. If it’s /etc/pve: restore quorum. Fix the corosync network. Don’t replace disks to solve a voting problem.
  5. If it’s ZFS: interpret zpool status -v literally. Replace bad devices, scrub, and restore anything with permanent errors.

Then, after service is back, do the unsexy work: add monitoring for kernel I/O errors and device resets, alert on disk wear/media errors, enforce free-space thresholds on / and /var, and treat corosync like the control plane it is. Production doesn’t reward heroics. It rewards systems that fail predictably and recover quickly.
