Your Proxmox node is “fine” until it isn’t. Then you notice a yellow banner, a pool that says DEGRADED,
and VMs that suddenly feel like they’re running through syrup. You don’t need a pep talk. You need a disk replaced
correctly—without yanking the wrong drive, without triggering a second failure, and without turning a recoverable event
into a resume update.
ZFS is forgiving, but it’s not magic. A degraded pool is ZFS telling you: “I am operating without a safety net.
Please stop improvising.” This guide is the production way to do it: identify the right device, preserve your topology,
replace with confidence, and verify resilvering and integrity afterwards.
What “DEGRADED” actually means in Proxmox + ZFS
In ZFS, DEGRADED means the pool is still usable, but at least one vdev (virtual device) has lost redundancy
or has a component that’s unavailable or misbehaving. The important part is not the word; it’s the implications:
- You are one failure away from data loss if you're on mirrors with one side gone, or on RAIDZ with no remaining parity tolerance.
- Performance is often worse because ZFS may be reconstructing reads, retrying I/O, or working around a flaky device.
- Scrubs and resilvers become risk events: they stress the remaining disks, which is exactly what you don't want when they're aging.
Proxmox is mostly a messenger here. It runs ZFS as the storage backend (often for rpool and sometimes for VM storage pools),
and surfaces zpool status in the UI. The real control surface is the shell.
“Replace a disk” sounds like a hardware task. In ZFS, it’s a data migration operation with a hardware dependency. Treat it that way.
Fast diagnosis playbook (check first/second/third)
When a pool goes degraded, your job is to answer three questions quickly: Which disk? Is it actually failing or just missing?
Is the pool stable enough to resilver safely?
First: confirm the pool state and the exact vdev member that’s sick
Don’t guess from the Proxmox UI. Get the authoritative status from ZFS.
Second: determine if this is a “dead disk” or a “pathing problem”
A disk can look failed because a controller reset, a bad SATA cable, a flaky backplane slot, or a device renumbering made it vanish.
The fix is different—and replacing hardware blindly can make things worse.
Third: assess risk before you start stressing the pool
If you have remaining disks showing reallocated sectors, read errors, or timeouts, you may want to slow down I/O,
schedule a window, or take a backup snapshot/replication pass before resilvering.
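If you decide to take that safety pass first, a minimal sketch looks like this (rpool/data is typically the default Proxmox VM dataset; the backup host and target dataset are placeholders, adjust to your environment):
cr0x@server:~$ # Recursive point-in-time snapshot of the VM dataset before stressing the pool
cr0x@server:~$ sudo zfs snapshot -r rpool/data@pre-resilver
cr0x@server:~$ # Optional: replicate it off the node; backup-host and backup/rpool-data are placeholders
cr0x@server:~$ sudo zfs send -R rpool/data@pre-resilver | ssh backup-host zfs receive -u backup/rpool-data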
The fastest way to find the culprit: ZFS status → kernel logs → SMART. If those three line up, you act.
If they disagree, you pause and figure out why.
Interesting facts & historical context (why ZFS behaves this way)
- ZFS came out of Sun Microsystems in the mid-2000s with end-to-end checksumming as a first-class feature, not an add-on.
- “Copy-on-write” is why ZFS hates partial truths: it writes new blocks and then updates pointers, which makes silent corruption easier to detect.
- ZFS doesn’t “rebuild”; it “resilvers”, meaning it only reconstructs the blocks that are actually in use—not the entire raw device.
- RAIDZ was designed to fix RAID-5/6 write hole problems by integrating parity management with the filesystem transaction model.
- Device names like /dev/sda are not stable; persistent naming via /dev/disk/by-id became best practice because Linux enumeration changes.
- Scrubs exist because checksums need exercising: ZFS can detect corruption, but a scrub forces reading and verifying data proactively.
- Advanced Format (4K sector) drives created a whole era of pain; ashift is ZFS's way of aligning allocations to physical sector size.
- SMART isn't a verdict, it's a weather report: many disks die "healthy," while others limp along "failing" for months.
One reliability idea that remains painfully true is usually attributed to Gene Kranz (NASA flight director), paraphrased: be tough and competent.
In storage terms: don’t improvise, and don’t touch two things at once.
Before you touch hardware: safety rails that prevent collateral damage
Use stable disk identities (serial-based), not whatever Linux called it today
If you do disk work by /dev/sdX alone, you are playing roulette with a loaded wheel. Proxmox upgrades,
kernel updates, controller resets, or simply rebooting can reshuffle enumeration. ZFS can also store paths in multiple forms.
You want to anchor your decisions to immutable facts: WWN, serial, and bay location.
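One low-effort way to build that anchor is to dump the model/serial/WWN map once and keep it with your bay labels; a small sketch, assuming nothing beyond standard lsblk and smartmontools:
cr0x@server:~$ # One row per physical disk: kernel name, model, serial, WWN
cr0x@server:~$ lsblk -d -o NAME,MODEL,SERIAL,WWN
cr0x@server:~$ # Cross-check a single disk's serial straight from the device before you trust any label
cr0x@server:~$ sudo smartctl -i /dev/sda | grep -i 'serial'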
Decide your blast radius
If the pool is hosting VM disks (zvols), a resilver is heavy I/O. If you can evacuate critical VMs,
do it. If you can’t, you still proceed—but you do it intentionally: limit workload, avoid concurrent maintenance,
and watch latency like a hawk.
Don’t “fix” two problems at the same time
Replacing a disk is already a controlled emergency. Do not pair it with a kernel update, HBA firmware update,
ZFS feature flag changes, or a “quick cable tidy.” You want causality. You want reversibility.
Joke #1: Storage arrays don’t have moods. They have consequences, and they always remember what you did.
Practical tasks with commands, outputs, and decisions (12+)
Everything below is written for a typical Proxmox node using ZFS on Linux. Commands assume root privileges or sudo.
Replace pool/device names with your reality. If your pool is rpool, great. If it’s tank, use that.
Task 1: Confirm pool health and identify the failing member
cr0x@server:~$ sudo zpool status -v
pool: rpool
state: DEGRADED
status: One or more devices has experienced an unrecoverable error.
action: Replace the device using 'zpool replace'.
scan: scrub repaired 0B in 00:14:12 with 0 errors on Thu Dec 19 02:10:03 2025
config:
NAME STATE READ WRITE CKSUM
rpool DEGRADED 0 0 0
mirror-0 DEGRADED 0 0 0
ata-SAMSUNG_MZ7LN512HMJP_S2Y3NX0K123456 ONLINE 0 0 0
ata-SAMSUNG_MZ7LN512HMJP_S2Y3NX0K654321 FAULTED 12 0 0 too many errors
errors: No known data errors
What it means: You have a mirror with one side faulted. Reads are being served from the remaining disk.
The pool is still running, but redundancy is gone for that vdev.
Decision: Replace ...654321. Do not detach the healthy disk. Do not “clear” errors and hope.
Task 2: Map ZFS device IDs to physical disks (WWN/serial)
cr0x@server:~$ ls -l /dev/disk/by-id | egrep 'S2Y3NX0K123456|S2Y3NX0K654321'
lrwxrwxrwx 1 root root 9 Dec 26 09:10 ata-SAMSUNG_MZ7LN512HMJP_S2Y3NX0K123456 -> ../../sda
lrwxrwxrwx 1 root root 9 Dec 26 09:10 ata-SAMSUNG_MZ7LN512HMJP_S2Y3NX0K654321 -> ../../sdb
What it means: Right now the healthy disk is /dev/sda and the bad disk is /dev/sdb.
That mapping may change after a reboot.
Decision: Work in terms of /dev/disk/by-id/ata-..., and use /dev/sdb only for immediate SMART checks.
Task 3: Check kernel logs for timeouts/resets (distinguish disk vs path)
cr0x@server:~$ sudo journalctl -k -b | egrep -i 'sdb|ata|I/O error|timeout|reset' | tail -n 25
Dec 26 08:41:12 server kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Dec 26 08:41:12 server kernel: ata2.00: failed command: READ DMA EXT
Dec 26 08:41:12 server kernel: blk_update_request: I/O error, dev sdb, sector 118782976 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Dec 26 08:41:13 server kernel: ata2: hard resetting link
Dec 26 08:41:18 server kernel: ata2: link is slow to respond, please be patient (ready=0)
Dec 26 08:41:23 server kernel: ata2.00: configured for UDMA/133
What it means: This looks like real I/O errors and link resets. Could be the disk, could be the cable/backplane.
Decision: If this is a hot-swap bay, reseat the drive once (carefully) before replacing. If it persists, replace disk and consider swapping bay/cable later.
Task 4: Get SMART health summary (fast)
cr0x@server:~$ sudo smartctl -H /dev/sdb
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.8.12-5-pve] (local build)
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
What it means: The drive is admitting it’s toast.
Decision: Replace the drive. No more diagnostics theater.
Task 5: Pull the SMART attributes that matter (timeouts vs wear vs media)
cr0x@server:~$ sudo smartctl -a /dev/sdb | egrep -i 'Serial Number|Reallocated|Pending|Offline_Uncorrectable|CRC|Power_On_Hours|Media_Wearout|Percentage_Used'
Serial Number: S2Y3NX0K654321
Power_On_Hours: 43120
Reallocated_Sector_Ct: 48
Current_Pending_Sector: 9
Offline_Uncorrectable: 9
UDMA_CRC_Error_Count: 0
What it means: Pending and uncorrectable sectors are classic media failure. CRC count is 0, so it’s less likely a cable problem.
Decision: Replace the drive. If CRC errors were high, you’d suspect cabling/backplane and still likely replace—just with extra scrutiny.
Task 6: Check the pool's ashift (sector alignment) before running replacement commands
cr0x@server:~$ sudo zpool get -H -o value ashift rpool
12
What it means: ashift=12 (4K sectors). This is what you want in modern systems.
Decision: Ensure the replacement disk is not smaller and is suitable for 4K alignment. You cannot change ashift on an existing vdev member in-place.
Task 7: Check that the replacement disk is visible and not in use
cr0x@server:~$ lsblk -o NAME,SIZE,MODEL,SERIAL,TYPE,MOUNTPOINT
NAME SIZE MODEL SERIAL TYPE MOUNTPOINT
sda 476.9G SAMSUNG MZ7LN512 S2Y3NX0K123456 disk
├─sda1 1007K 0 part
├─sda2 1G 0 part
└─sda3 475.9G 0 part
sdb 476.9G SAMSUNG MZ7LN512 S2Y3NX0K654321 disk
├─sdb1 1007K 0 part
├─sdb2 1G 0 part
└─sdb3 475.9G 0 part
sdc 476.9G SAMSUNG MZ7LN512 S2Y3NX0K777777 disk
What it means: The new disk is sdc and appears blank (no partitions listed). Good.
Decision: Use /dev/disk/by-id for sdc too. Confirm serial matches what’s on the box/bay label.
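Task 6 warned that the replacement must not be smaller, and "not smaller" is measured in bytes, not marketing gigabytes; a quick byte-level comparison is a cheap sanity check (a sketch using the by-id names from this example):
cr0x@server:~$ # Exact sizes in bytes; the replacement (...777777) must be >= the failed member (...654321)
cr0x@server:~$ sudo blockdev --getsize64 /dev/disk/by-id/ata-SAMSUNG_MZ7LN512HMJP_S2Y3NX0K654321
cr0x@server:~$ sudo blockdev --getsize64 /dev/disk/by-id/ata-SAMSUNG_MZ7LN512HMJP_S2Y3NX0K777777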
Task 8: Confirm persistent ID for the new disk
cr0x@server:~$ ls -l /dev/disk/by-id | grep S2Y3NX0K777777
lrwxrwxrwx 1 root root 9 Dec 26 09:16 ata-SAMSUNG_MZ7LN512HMJP_S2Y3NX0K777777 -> ../../sdc
What it means: You have a stable identifier for the replacement disk.
Decision: Proceed with zpool replace using by-id paths.
Task 9: Replace the member correctly (let ZFS do the right thing)
cr0x@server:~$ sudo zpool replace rpool \
/dev/disk/by-id/ata-SAMSUNG_MZ7LN512HMJP_S2Y3NX0K654321 \
/dev/disk/by-id/ata-SAMSUNG_MZ7LN512HMJP_S2Y3NX0K777777
What it means: ZFS will attach the new disk and start resilvering onto it.
Decision: Do not reboot. Do not remove the remaining good disk. Monitor resilver progress.
Task 10: Watch resilver progress and verify it’s actually moving
cr0x@server:~$ watch -n 5 sudo zpool status rpool
pool: rpool
state: DEGRADED
status: One or more devices is being resilvered.
action: Wait for the resilver to complete.
scan: resilver in progress since Fri Dec 26 09:18:41 2025
112G scanned at 1.21G/s, 38.2G issued at 423M/s, 475G total
38.2G resilvered, 8.04% done, 00:17:42 to go
config:
NAME STATE READ WRITE CKSUM
rpool DEGRADED 0 0 0
mirror-0 DEGRADED 0 0 0
ata-SAMSUNG_MZ7LN512HMJP_S2Y3NX0K123456 ONLINE 0 0 0
ata-SAMSUNG_MZ7LN512HMJP_S2Y3NX0K777777 ONLINE 0 0 0 (resilvering)
What it means: Resilver is active, shows throughput, ETA, and scanned vs issued. That’s a good sign.
Decision: If “issued” stays stuck at 0 for minutes while “scanned” increases slowly, you may be bottlenecked or blocked by errors. Check logs and SMART on the surviving disk.
Task 11: Check load/latency impact on the node (avoid accidental outage)
cr0x@server:~$ sudo iostat -x 2 5
Linux 6.8.12-5-pve (server) 12/26/2025 _x86_64_ (32 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
8.12 0.00 3.44 24.66 0.00 63.78
Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s w_await wareq-sz aqu-sz %util
sda 98.0 91200.0 0.0 0.00 3.10 930.6 21.0 980.0 8.60 46.7 0.41 38.0
sdc 44.0 40120.0 0.0 0.00 4.90 912.7 62.0 15200.0 2.40 245.2 0.44 29.5
What it means: %iowait is elevated. That’s normal during resilver. %util not pegged suggests you still have headroom.
Decision: If iowait is crushing VM latency, throttle workload or schedule a quiet window. Avoid launching backups or heavy migrations mid-resilver.
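To separate pool-level behavior from raw block-device numbers, zpool iostat gives the same picture from ZFS's point of view; a minimal sketch:
cr0x@server:~$ # Per-vdev bandwidth and IOPS every 5 seconds during the resilver; Ctrl-C to stop
cr0x@server:~$ sudo zpool iostat -v rpool 5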
Task 12: Confirm pool returns to ONLINE and the old device is gone
cr0x@server:~$ sudo zpool status -v rpool
pool: rpool
state: ONLINE
scan: resilvered 475G in 00:22:19 with 0 errors on Fri Dec 26 09:41:00 2025
config:
NAME STATE READ WRITE CKSUM
rpool ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
ata-SAMSUNG_MZ7LN512HMJP_S2Y3NX0K123456 ONLINE 0 0 0
ata-SAMSUNG_MZ7LN512HMJP_S2Y3NX0K777777 ONLINE 0 0 0
errors: No known data errors
What it means: You’re back to redundancy. No errors. That’s the goal state.
Decision: Now you do cleanup and verification: scrub scheduling, alerting, labeling, and postmortem notes.
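The scrub itself, when the quiet window arrives, is a single command; a minimal sketch:
cr0x@server:~$ # Start a full-pool scrub in a quiet window
cr0x@server:~$ sudo zpool scrub rpool
cr0x@server:~$ # Progress, and eventually the 'scrub repaired ... with 0 errors' line, show up here
cr0x@server:~$ sudo zpool status rpool
cr0x@server:~$ # Started it at the wrong time? -s cancels a running scrub
cr0x@server:~$ sudo zpool scrub -s rpool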
Task 13: If the disk is “UNAVAIL” not “FAULTED,” check if it’s just missing
cr0x@server:~$ sudo zpool status rpool
pool: rpool
state: DEGRADED
config:
NAME STATE READ WRITE CKSUM
rpool DEGRADED 0 0 0
mirror-0 DEGRADED 0 0 0
ata-SAMSUNG_MZ7LN512HMJP_S2Y3NX0K123456 ONLINE 0 0 0
ata-SAMSUNG_MZ7LN512HMJP_S2Y3NX0K654321 UNAVAIL 0 0 0 cannot open
What it means: ZFS cannot open the device; that might be a dead disk, or it might be a path issue.
Decision: Check whether the device exists in /dev/disk/by-id. If it disappeared, look for controller/backplane issues before declaring it “failed.”
Task 14: Validate the missing device path exists (or not)
cr0x@server:~$ test -e /dev/disk/by-id/ata-SAMSUNG_MZ7LN512HMJP_S2Y3NX0K654321; echo $?
1
What it means: Exit code 1 means the path doesn’t exist. The OS doesn’t see it.
Decision: Check physical seating, backplane slot, HBA logs. If it comes back, you might do zpool online instead of replace.
Task 15: If a disk came back, try onlining it (only if you trust it)
cr0x@server:~$ sudo zpool online rpool /dev/disk/by-id/ata-SAMSUNG_MZ7LN512HMJP_S2Y3NX0K654321
What it means: ZFS will attempt to bring the device back. If it was transient, the pool might return to ONLINE without replacement.
Decision: If SMART still looks ugly or logs show repeated resets, don’t get sentimental. Replace it anyway.
Task 16: Clear old error counts only after fixing the underlying problem
cr0x@server:~$ sudo zpool clear rpool
What it means: This clears error counters and some fault states.
Decision: Use it to confirm the fix held (errors stay at 0). Don’t use it as a way to “green up” a pool without replacement.
Joke #2: If you “clear” the pool errors without fixing anything, congratulations—you’ve successfully silenced the smoke alarm while the toast is still on fire.
Replacement workflows: mirror vs RAIDZ, hot-swap vs cold-swap
Mirror vdevs: the straightforward case (still easy to mess up)
Mirrors are operationally friendly. Replace one disk, resilver, you’re done. The failure mode is human:
someone pulls the wrong disk, or replaces a disk with one that’s a hair smaller, or uses unstable device names,
or detaches the wrong member.
Recommended approach:
- Identify the failed disk by serial and bay, not by sdb.
- Use zpool replace with /dev/disk/by-id paths.
- Monitor resilver and system I/O. Resilver is not a background whisper; it's a forklift.
- Verify zpool status returns to ONLINE, then do a scrub within a maintenance window if you can tolerate the load.
RAIDZ vdevs: replacement is still simple; the consequences aren’t
With RAIDZ, the pool can remain online with one or more missing disks depending on parity level (RAIDZ1/2/3).
But during degradation, every read of data that touched the missing disk may require reconstruction, which stresses remaining drives.
Then resilvering adds more I/O. This is how “one failed disk” turns into “why are three disks timing out.”
The method is the same: zpool replace the member. The operational posture changes:
- Don’t run a scrub and a resilver concurrently unless you enjoy long nights.
- Consider reducing workload during resilver.
- If another disk is throwing errors, pause and decide whether to back up/replicate before proceeding.
Hot-swap bays: trust but verify
Hot swap is not “plug-and-pray.” It’s “hot swap if your backplane, HBA, and OS agree on reality.”
On Proxmox, you can typically replace a failed disk live, but you must:
- Confirm the right bay LED (if available) or use a mapping process (serial ↔ bay); see the locate-LED sketch after this list.
- Insert the new disk and ensure it appears under /dev/disk/by-id.
- Then run zpool replace. Not before.
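If the backplane and HBA expose LED control, the ledmon package (an assumption: it is not installed by default on Proxmox) can blink the bay so you're not pulling a drive on faith; a sketch:
cr0x@server:~$ # Requires SES/SGPIO-capable hardware; verify the serial-to-device mapping before trusting the LED
cr0x@server:~$ sudo apt install ledmon
cr0x@server:~$ # Blink the locate LED for the suspect disk (here the current kernel name, sdb), then switch it off
cr0x@server:~$ sudo ledctl locate=/dev/sdb
cr0x@server:~$ sudo ledctl locate_off=/dev/sdb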
Cold swap (shutdown): sometimes boring is correct
If you’re on questionable hardware (consumer SATA controllers, flaky backplanes, old BIOS, or a history of link resets),
a cold swap can reduce risk. It’s also operationally cleaner if the pool is already unstable.
You still do the same identity checks after boot, because enumeration can change.
Checklists / step-by-step plan (production-ready)
Checklist A: “I saw DEGRADED” response plan
- Capture current state: zpool status -v, and save the output to your ticket/notes.
- Check if errors are rising: run zpool status again after 2–5 minutes. Are READ/WRITE/CKSUM counts increasing?
- Check kernel logs for resets/timeouts: journalctl -k -b.
- SMART check on the suspect disk and on the surviving members in the same vdev.
- Decide whether you can proceed live or need a maintenance window.
Checklist B: Safe replacement steps (mirror or RAIDZ member)
- Identify the failing member by its zpool status name (prefer by-id).
- Map it to a serial and bay label using ls -l /dev/disk/by-id and your chassis inventory.
- Insert the replacement disk. Confirm it appears in /dev/disk/by-id.
- Confirm the replacement disk size is not smaller than the old one: lsblk.
- Run zpool replace <pool> <old> <new> (a condensed command sketch follows this checklist).
- Monitor resilver: zpool status until done. Watch system latency.
- When ONLINE, record the resilver duration and any errors observed.
- Optionally run a scrub in the next quiet window (not immediately if the system is under heavy load).
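Condensed into commands, the heart of Checklist B is short; a sketch with placeholder names (pool, OLD/NEW serials and by-id paths are whatever your own zpool status and chassis inventory say):
cr0x@server:~$ # 1. Authoritative state, saved for the ticket
cr0x@server:~$ sudo zpool status -v tank | tee /root/tank-degraded-$(date +%F).txt
cr0x@server:~$ # 2. Confirm old and new members by stable ID (serials below are placeholders)
cr0x@server:~$ ls -l /dev/disk/by-id/ | grep -E 'OLD_SERIAL|NEW_SERIAL'
cr0x@server:~$ # 3. Replace by stable ID, then watch the resilver until the pool is ONLINE again
cr0x@server:~$ sudo zpool replace tank /dev/disk/by-id/ata-OLD_DISK /dev/disk/by-id/ata-NEW_DISK
cr0x@server:~$ watch -n 5 sudo zpool status tank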
Checklist C: Post-replacement validation
- zpool status -v shows ONLINE and 0 known data errors.
- SMART for the new disk shows a clean baseline (save the SMART report).
- Confirm alerts are clear in Proxmox and your monitoring system.
- Update inventory: bay → serial mapping, warranty tracking, and replacement date.
Checklist D: If resilver is slow or stuck
- Check if another disk is now erroring (SMART + kernel logs).
- Check for saturation: iostat -x and VM workload.
- Consider pausing noncritical jobs (backups, replication, bulk storage moves).
- If errors are climbing, stop making changes and plan a controlled outage with backups ready.
Common mistakes: symptoms → root cause → fix
1) Pool still DEGRADED after replacement
Symptom: You replaced a disk, but zpool status still shows DEGRADED and the old device name lingers.
Root cause: You used the wrong identifier (e.g., replaced /dev/sdb but ZFS tracked by-id), or you attached without replacing.
Fix: Use zpool status to find the exact device string, then run zpool replace <pool> <that-exact-old> <new-by-id>. Avoid /dev/sdX.
2) Replacement disk is “too small” even though it’s the same model
Symptom: zpool replace errors with “device is too small.”
Root cause: Manufacturers quietly vary usable capacity across firmware revisions, or the old disk had slightly larger reported size.
Fix: Use an equal-or-larger disk. For mirrored boot pools, buy the next capacity up rather than playing model-number bingo.
3) Resilver is crawling and the node is unusable
Symptom: VM latency spikes, I/O wait is high, users complain, and resilver ETA keeps growing.
Root cause: Workload contention (VM writes + resilver reads/writes), plus potentially a marginal surviving disk.
Fix: Reduce workload (pause heavy jobs, migrate noncritical VMs), check SMART on surviving disks, and consider a maintenance window. If surviving disk is erroring, your real emergency is “second disk is dying.”
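On Proxmox, "reduce workload" usually means pausing backup jobs and moving the noisiest guests; a sketch, with the VM ID and target node as hypothetical examples:
cr0x@server:~$ # Check whether a vzdump backup is hammering the pool right now
cr0x@server:~$ ps aux | grep '[v]zdump'
cr0x@server:~$ # Live-migrate a noisy, noncritical VM elsewhere (VM 101 and node pve2 are examples; add --with-local-disks if its disks are on this node's local ZFS storage)
cr0x@server:~$ qm migrate 101 pve2 --online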
4) You pulled the wrong disk and now the pool is OFFLINE
Symptom: Pool drops, VMs pause/crash, and zpool status shows missing devices.
Root cause: Human identification failure: bay mapping not confirmed, device naming instability, or no LED locate procedure.
Fix: Reinsert the correct disk immediately. If you removed a healthy mirror member, put it back first. Then reassess. This is why you label bays and record serials.
5) Proxmox boot pool replaced, but node won’t boot
Symptom: After replacing a disk in rpool, system fails to boot from the new disk if the old one is removed.
Root cause: The bootloader/EFI entry wasn’t installed or mirrored to the new device. ZFS redundancy does not automatically mirror your bootloader state.
Fix: Ensure Proxmox boot tooling/EFI setup is replicated across boot devices. Validate by temporarily setting BIOS boot order or performing a controlled boot test.
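On current Proxmox VE the boot side is handled by proxmox-boot-tool; a sketch of what replicating it to a new boot disk typically involves, assuming the stock layout where the ESP is partition 2 and the ZFS member is partition 3 (device names below are placeholders):
cr0x@server:~$ # Copy the partition table from the surviving boot disk to the new one, then randomize GUIDs
cr0x@server:~$ sudo sgdisk /dev/disk/by-id/ata-HEALTHY_BOOT_DISK -R /dev/disk/by-id/ata-NEW_BOOT_DISK
cr0x@server:~$ sudo sgdisk -G /dev/disk/by-id/ata-NEW_BOOT_DISK
cr0x@server:~$ # Format and register the new ESP so the node can boot from either disk, then verify
cr0x@server:~$ sudo proxmox-boot-tool format /dev/disk/by-id/ata-NEW_BOOT_DISK-part2
cr0x@server:~$ sudo proxmox-boot-tool init /dev/disk/by-id/ata-NEW_BOOT_DISK-part2
cr0x@server:~$ sudo proxmox-boot-tool status
On that layout, the zpool replace from Task 9 would then target the new disk's -part3 path rather than the whole disk.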
6) CKSUM errors with “healthy” SMART
Symptom: zpool status shows checksum errors on a device, but SMART looks fine.
Root cause: Often cabling, backplane, HBA issues, or power problems causing data corruption in transit.
Fix: Reseat/replace cables, move the disk to another bay/controller port, check HBA firmware stability. Clear errors after fixing and watch if they return.
Three corporate mini-stories from real life
Mini-story 1: The incident caused by a wrong assumption
A mid-sized company ran Proxmox on two nodes, each with a ZFS mirror boot pool and a separate RAIDZ for VM storage.
One morning, a node flagged DEGRADED. The on-call engineer saw /dev/sdb in a quick glance at zpool status,
assumed it mapped cleanly to “Bay 2,” and asked facilities to pull that drive.
Facilities did exactly what they were told, except the bays weren’t labeled and the chassis had been re-cabled during a “cleanup.”
The drive pulled was the healthy mirror member, not the faulted one. The pool didn’t die instantly because ZFS is polite—until it isn’t.
The node took a performance hit, then a second disk threw errors under the extra read load.
The recovery was ugly but educational: reinsert the removed disk, let the pool settle, then redo the identification properly using
/dev/disk/by-id serials cross-checked against the physical labels they created during the incident.
The long-term fix wasn’t fancy: they documented bay-to-serial mapping and stopped speaking in sdX.
The wrong assumption wasn’t “people make mistakes.” It was “device names are stable.” They aren’t. Not on a good day.
Mini-story 2: The optimization that backfired
Another org wanted faster resilvers and scrubs. Someone read that “more parallelism is better” and tuned ZFS aggressively:
higher scan rates, more concurrent operations, the whole “make the graph go up” approach.
It looked great in a quiet lab. In production, it collided with real workloads: database VMs, backups, and replication traffic.
During a degraded event, resilver began at impressive throughput, then the node started timing out guest I/O.
The VM cluster didn’t crash; it just slowly turned into molasses. Operators tried to fix it by restarting services and migrating VMs,
which added more I/O. The resilver slowed further. More retries. More pain.
They eventually stabilized by backing off the “optimization” and letting the resilver run at a sustainable pace,
prioritizing service latency over benchmark numbers. After the incident, they kept conservative defaults and built a runbook:
during resilver, pause nonessential jobs and treat storage latency as a primary SLO.
The lesson: resilver speed is not a vanity metric. The only number that matters is “finished without a second failure while users kept working.”
Mini-story 3: The boring but correct practice that saved the day
A regulated business ran Proxmox nodes with ZFS mirrors and a strict habit: every disk bay had a label,
every label corresponded to a recorded serial, and every replacement was done by by-id with screenshots of zpool status
pasted into the ticket.
When a pool degraded during a holiday week, the on-call engineer was a generalist, not a storage person.
They followed the runbook: check ZFS status, map serial, validate SMART, confirm replacement serial,
and only then replace. No cleverness. No shortcuts.
The resilver finished cleanly. A scrub later showed no errors. The post-incident review was short because there wasn’t much to review.
Their “boring practice” prevented the most common disaster: removing the wrong disk or replacing the wrong vdev member.
Boring is underrated. Boring is how you sleep.
FAQ
1) Should I use the Proxmox GUI to replace a disk, or the CLI?
Use the CLI for the actual ZFS operations. The GUI is fine for visibility, but zpool status is the source of truth,
and you want copy-pastable commands and outputs for your incident notes.
2) Can I reboot during a resilver?
Avoid it. ZFS can usually resume resilvering, but rebooting introduces risk: device renumbering, HBA quirks, and the chance that a marginal disk doesn’t come back.
If you must reboot for stability, do it intentionally and document the state before and after.
3) What’s the difference between zpool replace and zpool attach?
replace substitutes one device for another and triggers resilver. attach adds a new device to a mirror (turning a single-disk vdev into a mirror, or widening an existing mirror).
For a failed mirror member, you almost always want replace.
4) Should I run a scrub immediately after replacement?
Not immediately, unless you’re in a quiet window and can tolerate the load. A resilver already reads a lot of data.
Schedule a scrub after the system cools down, especially on RAIDZ pools.
5) The disk shows as UNAVAIL. Is it dead?
Not necessarily. UNAVAIL can mean the OS can’t see it (path/cable/controller) or the disk is dead.
Check /dev/disk/by-id, kernel logs, and SMART if the device reappears.
6) Can I replace a disk with a larger one and get more space?
You can replace with larger disks, but you only gain usable space after all members of a vdev are upgraded and ZFS is allowed to expand.
Mirrors expand after both sides are larger; RAIDZ expands after all disks in that RAIDZ vdev are larger.
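The expansion step is controlled by a pool property; a minimal sketch, assuming every member of the vdev has already been replaced with a larger disk:
cr0x@server:~$ # Let the pool grow automatically once all members of a vdev are larger
cr0x@server:~$ sudo zpool set autoexpand=on rpool
cr0x@server:~$ # If autoexpand was off during the replacements, expand a member explicitly
cr0x@server:~$ sudo zpool online -e rpool /dev/disk/by-id/ata-SAMSUNG_MZ7LN512HMJP_S2Y3NX0K777777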
7) Why does ZFS show errors but applications seemed fine?
Because ZFS can correct many errors transparently using redundancy and checksums. That’s not “fine,” it’s “caught it in the act.”
Treat corrected errors as a warning: something is degrading.
8) What if the surviving disk in a mirror starts showing SMART errors during resilver?
That’s the uncomfortable moment: you may be in a two-disk failure scenario waiting to happen.
If possible, reduce load, prioritize backing up/replicating critical data, and consider whether you should stop and do a controlled recovery plan.
Continuing might work—but you’re gambling with your last good copy inside that vdev.
9) Does ZFS automatically mirror the Proxmox bootloader on rpool?
Not reliably by itself. ZFS mirrors data blocks; bootability depends on EFI/bootloader installation and firmware boot entries.
After replacing boot disks, validate that each disk is independently bootable if your design requires it.
10) Is it safe to “offline” a disk before pulling it?
Yes, when you’re deliberately removing a member (especially in hot-swap). Offlining reduces surprise by telling ZFS the device is going away.
But never offline the last good member of a vdev. Confirm topology first.
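A minimal sketch of that deliberate removal, reusing the failed-disk ID from the tasks above:
cr0x@server:~$ # Tell ZFS this member is going away on purpose before pulling it from the bay
cr0x@server:~$ sudo zpool offline rpool /dev/disk/by-id/ata-SAMSUNG_MZ7LN512HMJP_S2Y3NX0K654321
cr0x@server:~$ # Confirm the vdev still has a healthy member before touching hardware
cr0x@server:~$ sudo zpool status rpool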
Conclusion: next steps after you’re back to ONLINE
Getting from DEGRADED to ONLINE is the tactical win. The strategic win is making sure the next disk failure
is boring, fast, and doesn’t require heroics.
- Record the incident: paste zpool status -v before/after, SMART output, and which serial was replaced.
- Fix identification debt: label bays, maintain a serial-to-slot map, and standardize on /dev/disk/by-id.
- Verify monitoring: alerts for ZFS pool state, SMART critical attributes, and kernel link resets.
- Schedule scrubs intentionally: regular enough to catch rot, not so aggressive that you're always stressing disks.
- Practice the runbook: the best time to learn zpool replace is not during the first time your pool goes degraded.
ZFS gives you a fighting chance. Don’t squander it with guesswork. Replace the right disk, in the right way, and keep the collateral damage where it belongs: nowhere.