At some point, every storage team meets the same villain: the “temporary” vdev. It arrived during a crisis, it solved a capacity issue, and now it’s living in your pool like a bad roommate who doesn’t pay rent. You want it gone. ZFS, famously conservative, says: maybe.
Vdev removal is real, useful, and still full of sharp edges. In production, the difference between “works fine” and “ate my weekend” is planning around ZFS’s rules, not your hopes.
The mental model: what “removal” actually means
ZFS pools are built from vdevs (virtual devices). A vdev is the atomic unit of redundancy and allocation. ZFS stripes data across vdevs, not across individual disks. That sentence should change how you think about removal.
When you run zpool remove, you’re not “unplugging a disk.” You’re initiating an evacuation: ZFS rewrites every allocated block that lives on the target vdev to other vdevs in the pool. Only after that succeeds can the vdev be detached from the pool configuration. During evacuation, the vdev must be readable; the rest of the pool must have enough free space to absorb the data.
That also means removal is not instant. It’s a pool-wide data movement job with real side effects: more I/O, more fragmentation pressure, more latency spikes, and a lot of “why is my pool busy at 2 a.m.?” messages.
One more crucial nuance: vdev removal is not a general “shrink the pool” feature. It’s a very specific capability with constraints, and ZFS will refuse to do it in many layouts. If you take nothing else: design your pool like you might want to change it later, because ZFS won’t magically save you from architectural regret.
Terms you need, in operational language
- Top-level vdev: A vdev directly under the pool (e.g., a mirror vdev, a raidz vdev, or a single-disk vdev). Most removal discussions are about top-level vdevs.
- Leaf device: A disk or partition inside a vdev. In a mirror, you can remove/replace leaves; that’s not “vdev removal,” that’s “device replacement/detach.”
- Special vdev: A metadata (and optionally small blocks) allocation class device. Powerful. Also easy to turn into a single point of failure if you get clever.
- SLOG (log vdev): Separate intent log for synchronous writes. Removable, but the consequences depend on your workload and sync settings.
- L2ARC (cache vdev): Read cache. Always removable; it’s a performance hint, not primary storage.
- Evacuation: ZFS’s data move off the removed vdev. That’s the work.
What you can remove (and what you can’t)
You can usually remove: cache (L2ARC) devices
L2ARC devices can be added and removed without affecting data correctness. Your worst case is performance regression while the cache warms again. If your pool is healthy, removing an L2ARC is basically boring.
You can usually remove: log (SLOG) devices
Removing a log device is allowed. ZFS will fall back to using the main pool for the ZIL. The pool remains consistent. The risk is latency and throughput for synchronous writes, not data loss. If you remove SLOG under high sync load, your apps will tell you immediately, usually via timeouts.
You can remove: top-level vdevs in some configurations (the big one)
Top-level vdev removal is the capability people care about when they say “shrink a pool.” It exists, but it’s gated by implementation and version support. In practice:
- Removing an entire top-level mirror vdev is commonly supported (when the feature exists on your platform and pool).
- Removing a top-level single-disk vdev is often supported (again: feature/platform dependent).
- Removing a top-level raidz vdev is typically not supported; on current OpenZFS, the presence of any raidz top-level vdev in the pool generally blocks data-vdev removal altogether. Even as platforms evolve, treat it as “assume no” unless you have explicitly verified support on your exact ZFS implementation and feature flags.
You can remove: devices from a mirror vdev (but that’s not “vdev removal”)
If you have a mirror vdev with two disks, you can detach one disk and keep operating on the remaining disk (now a single-disk vdev). That is a leaf operation. It reduces redundancy. It doesn’t evacuate blocks to other vdevs. It’s often what people actually need when they say “remove a disk.”
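That operation is a one-liner. A minimal sketch, using device names from the example pool later in this article (substitute your own identifiers, and remember you are deliberately giving up redundancy):
cr0x@server:~$ sudo zpool detach tank ata-WDC_WD80EFAX-4
After the detach, the vdev keeps serving data from its remaining disk with no mirror protection; nothing is evacuated and pool capacity doesn’t change.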
Special vdevs: removable in theory, scary in practice
Special vdevs (allocation class) are where production meets hubris. If a special vdev holds metadata (it does), losing it can make the pool unusable. Removing a special vdev might be supported in your stack, but you should treat it as a migration project, not a casual command. If your special vdev is mirrored and healthy, you can plan its removal; if it’s degraded, you’re in emergency territory.
Joke #1: A special vdev is like a “temporary” Kubernetes cluster—everyone swears it’s just for testing until it becomes mission-critical on a Friday night.
Hard limits you must respect
1) You can’t remove what ZFS can’t evacuate
Evacuation requires free space on the remaining vdevs. Not “a little” free. Enough to rewrite allocated blocks plus overhead, while also keeping the pool’s allocator functioning. If your pool is 85–90% full, removal is a bad plan. At 95% full, it’s self-harm.
2) Removal is feature-gated and implementation-specific
There isn’t one ZFS. There’s OpenZFS, and then there are operating system integrations and vendor distributions with different versions, feature flags, and bugfix backports. The pool itself also has feature flags: if the pool doesn’t have the right feature enabled, you’re not removing anything.
3) You can’t “partially remove” a top-level raidz to shrink it
People ask: “Can I remove one disk from my raidz vdev and keep raidz?” No. You can replace disks with larger ones and expand (with the right feature support), but shrinking a raidz width by removing a leaf device is not how ZFS is designed. If you want flexible shrinking, you build with mirrors, accept the capacity overhead, and sleep at night.
4) You can’t remove the last top-level vdev
A pool needs at least one top-level vdev. If you want to “remove everything,” you’re not shrinking. You’re migrating.
5) Removal stresses the pool and competes with production workloads
Evacuation is a rewrite job. It will hit disks, ARC, metadata, and possibly your fragmentation. If you do it during peak load, expect latency to climb and “mysterious” application slowdowns. If you must do it online, throttle it and monitor relentlessly.
6) Removal is not the same as resilvering, and the knobs differ
Resilvering is reconstruction into a replacement device. Evacuation is redistribution across the pool. You’ll watch different symptoms: not just “scan speed,” but allocator behavior, metaslab contention, and overall I/O queuing.
7) You might remove the vdev and still not get the space back you expect
ZFS accounting can surprise you. Snapshots, reservations, refreservation, special small block settings, and copies all affect “why is it still full?” After removal, your pool’s total size decreases—sometimes dramatically—so any headroom you thought you had can evaporate. Plan post-removal capacity, not just “can I remove it.”
One operational quote worth keeping in your pocket
John Allspaw (paraphrased idea): “Reliability comes from how systems behave under stress, not from how they behave when everything goes right.”
Interesting facts and short history (because this feature didn’t fall from the sky)
- ZFS was built around “vdevs are forever.” Early ZFS designs assumed you would only grow pools, not shrink them.
- Top-level vdev removal arrived late compared to mirrors/raidz basics; it’s an admission that real operators do messy things during incidents.
- “Allocation classes” (special vdevs) changed operational risk: moving metadata onto fast devices can be brilliant, but it makes those devices existential.
- The ZIL is always there even without a SLOG device; SLOG just relocates the log to faster, lower-latency storage.
- L2ARC contents are disposable by design; it was engineered as a cache you can lose without tears.
- Pool feature flags are a one-way door: enabling newer features can block imports on older systems. Removal capability often depends on these flags.
- ZFS scrubs predate most people’s “cloud native” era as a routine way to detect silent corruption with checksums end-to-end.
- Metaslabs and space maps matter more during evacuation: the allocator becomes the bottleneck long before you peg CPU.
- Fragmentation isn’t just a performance issue: it can turn evacuation into a long, stop-and-go rewrite job with nasty latency spikes.
Fast diagnosis playbook (find the bottleneck before you guess)
This is the “I have 15 minutes before the next meeting and the pool is melting” flow. Don’t be creative. Be systematic.
First: Is the pool even eligible for vdev removal?
- Check feature flags and ZFS version/implementation.
- Confirm the target is a removable class (not a raidz top-level vdev on a stack that doesn’t support it).
Second: Do you have enough space to evacuate?
- Look at pool free space and fragmentation.
- Look at snapshots and reservations that will prevent freeing space.
- Decide whether you need to delete/move data or add temporary capacity first.
Third: Is the pool healthy enough to survive the operation?
- Any degraded vdevs? Fix those first.
- Any ongoing resilver/scrub? Decide whether to wait.
- Check SMART for devices that might die mid-evacuation.
Fourth: If removal is slow, what is it waiting on?
- Disk IOPS saturation? Watch iostat and per-vdev latency (a per-vdev sketch follows this playbook).
- Single slow disk? Look for one device with huge await times.
- Allocator contention / fragmentation? Expect uneven progress and high metadata I/O.
- Recordsize/small blocks? Small block workloads can make evacuation look like glue drying.
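For the per-vdev view, zpool iostat saves you from mapping /dev names back to vdevs by hand. A minimal sketch; the -l latency columns exist on reasonably current OpenZFS, so confirm your build supports them before relying on the output:
cr0x@server:~$ zpool iostat -v tank 5
cr0x@server:~$ zpool iostat -vl tank 5
Watch for one leaf device whose latency dwarfs its siblings (bad disk or path) versus every destination vdev running hot (you’re simply asking for too much I/O at once).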
Practical tasks: commands, outputs, and the decision you make
Below are real operational moves. Each one is “do a thing, interpret output, decide next step.” That’s how you keep storage from becoming performance art.
Task 1: Identify what you’re dealing with (pool topology)
cr0x@server:~$ zpool status -v tank
pool: tank
state: ONLINE
scan: scrub repaired 0B in 02:11:43 with 0 errors on Wed Dec 18 03:11:12 2025
config:
NAME STATE READ WRITE CKSUM
tank ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
ata-WDC_WD80EFAX-1 ONLINE 0 0 0
ata-WDC_WD80EFAX-2 ONLINE 0 0 0
mirror-1 ONLINE 0 0 0
ata-WDC_WD80EFAX-3 ONLINE 0 0 0
ata-WDC_WD80EFAX-4 ONLINE 0 0 0
logs
nvme-SAMSUNG_MZVLB256 ONLINE 0 0 0
cache
nvme-INTEL_SSDPEKNW512 ONLINE 0 0 0
errors: No known data errors
What it means: Two top-level mirror vdevs, plus a log device and a cache device. Mirrors are the most removal-friendly top-level layout.
Decision: If the goal is shrinking capacity, candidate is mirror-1 (top-level). If the goal is simplifying, you can remove SLOG or L2ARC with minimal risk.
Task 2: Verify pool feature flags (eligibility gate)
cr0x@server:~$ zpool get -H feature@device_removal tank
tank feature@device_removal active -
What it means: The pool has the device removal feature flag and it’s active.
Decision: Proceed with planning. If it shows disabled, or the feature is missing from the pool entirely, you’re likely not removing top-level vdevs on this pool without upgrading the pool to enable the feature (which is its own change window).
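It’s also worth pinning down exactly which OpenZFS you’re running, since removal behavior differs across versions. On OpenZFS 0.8 and newer the CLI reports its own version (older builds may lack this subcommand):
cr0x@server:~$ zfs version
Compare the reported userland and kernel-module versions against what your platform documents as supporting device removal before you schedule anything.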
Task 3: Check how full the pool is (can you evacuate?)
cr0x@server:~$ zfs list -o name,used,avail,refer,mountpoint -r tank
NAME USED AVAIL REFER MOUNTPOINT
tank 41.2T 6.3T 96K /tank
tank/home 11.1T 6.3T 10.9T /tank/home
tank/vm 28.7T 6.3T 28.7T /tank/vm
tank/backups 1.4T 6.3T 1.4T /tank/backups
What it means: Only 6.3T available. If the vdev you want to remove has more than that allocated, evacuation cannot finish.
Decision: Before removing a top-level vdev, estimate how much data is on it. If uncertain, assume “about proportional” and require comfortable free space. If you can’t create headroom, don’t start removal.
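You can do better than “about proportional”: zpool list -v breaks SIZE, ALLOC, and FREE down per top-level vdev, so you can read the target’s allocation directly (exact columns vary slightly between OpenZFS versions):
cr0x@server:~$ zpool list -v tank
If the ALLOC shown for the vdev you want to remove is anywhere near the pool’s remaining FREE, stop and create headroom first.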
Task 4: Check pool-wide capacity and fragmentation (evacuation pain predictor)
cr0x@server:~$ zpool list -o name,size,alloc,free,frag,health tank
NAME SIZE ALLOC FREE FRAG HEALTH
tank 47.5T 41.2T 6.3T 58% ONLINE
What it means: 58% fragmentation is not “fine.” It’s “this will take longer than you want and interfere with performance.”
Decision: If you must remove now, plan throttling and a longer window. If you can wait, consider moving cold data off and back (or other defrag strategies) before attempting removal.
Task 5: Identify whether snapshots are trapping space
cr0x@server:~$ zfs list -t snapshot -o name,used,refer -S used | head
NAME USED REFER
tank/vm@hourly-2025-12-25-0900 1.2T 28.7T
tank/vm@hourly-2025-12-25-0800 1.1T 28.7T
tank/home@daily-2025-12-24 640G 10.9T
tank/vm@hourly-2025-12-25-0700 610G 28.7T
What it means: Snapshots are consuming significant space. Deleting datasets may not free space as expected if snapshots reference old blocks.
Decision: If you need headroom to evacuate, reduce snapshot retention or replicate snapshots off-pool, then destroy locally—carefully, with stakeholder alignment.
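Before anything is destroyed, a dry run shows what a range deletion would remove and roughly how much it would reclaim. A sketch only, using the hypothetical snapshot names from the listing above:
cr0x@server:~$ zfs destroy -nv tank/vm@hourly-2025-12-25-0700%hourly-2025-12-25-0900
Only drop the -n once the dry-run output matches what the stakeholders actually agreed to delete.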
Task 6: Check for dataset reservations that block evacuation
cr0x@server:~$ zfs get -r -H -o name,property,value reservation,refreservation tank | awk '$3 != "none" && $3 != "-"' | head
tank/vm reservation 5T
tank/home refreservation 1T
What it means: Reservations carve out space that other operations can’t use, including evacuation headroom.
Decision: Temporarily lower or remove reservations (with app owner buy-in) to create real free space. Put them back after.
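A minimal sketch of the temporary change, using the dataset names from the example output; record the current values first so you can restore them afterwards:
cr0x@server:~$ zfs get -H -o value reservation tank/vm
cr0x@server:~$ sudo zfs set reservation=none tank/vm
cr0x@server:~$ sudo zfs set refreservation=none tank/home
When the removal finishes, set the original values back with zfs set.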
Task 7: Confirm health before you start moving the world
cr0x@server:~$ zpool status tank
pool: tank
state: ONLINE
scan: scrub repaired 0B in 02:11:43 with 0 errors on Wed Dec 18 03:11:12 2025
config:
NAME STATE READ WRITE CKSUM
tank ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
ata-WDC_WD80EFAX-1 ONLINE 0 0 0
ata-WDC_WD80EFAX-2 ONLINE 0 0 0
mirror-1 ONLINE 0 0 0
ata-WDC_WD80EFAX-3 ONLINE 0 0 0
ata-WDC_WD80EFAX-4 ONLINE 0 0 0
logs
nvme-SAMSUNG_MZVLB256 ONLINE 0 0 0
cache
nvme-INTEL_SSDPEKNW512 ONLINE 0 0 0
What it means: Online, no errors, recent scrub succeeded. This is what “eligible for risky work” looks like.
Decision: If you see DEGRADED, FAULTED, or rising checksum errors, fix that first. Removal under degradation is gambling with higher stakes.
Task 8: Remove an L2ARC device (safe, reversible in spirit)
cr0x@server:~$ sudo zpool remove tank nvme-INTEL_SSDPEKNW512
What it means: The cache device is detached from the pool. There’s usually no dramatic output.
Decision: Monitor read latency and cache hit ratio (if you track it). If performance drops, add it back or replace with a better cache device. Data integrity is not at risk.
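Putting it back is a single command, assuming the device is still present under the same by-id path (the path shown is illustrative):
cr0x@server:~$ sudo zpool add tank cache /dev/disk/by-id/nvme-INTEL_SSDPEKNW512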
Task 9: Remove a SLOG device (expect sync write latency changes)
cr0x@server:~$ sudo zpool remove tank nvme-SAMSUNG_MZVLB256
What it means: ZFS will revert to using the main pool for intent logging. Your applications may notice.
Decision: If your environment depends on sync writes (databases, NFS with sync), schedule this change or temporarily set expectations. If you see app timeouts, re-add SLOG quickly.
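Re-adding the log device is just as quick; the path is illustrative, and ideally the device has power-loss protection:
cr0x@server:~$ sudo zpool add tank log /dev/disk/by-id/nvme-SAMSUNG_MZVLB256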
Task 10: Start a top-level vdev removal (the evacuation begins)
cr0x@server:~$ sudo zpool remove tank mirror-1
What it means: The pool begins evacuating data from mirror-1 onto remaining top-level vdevs.
Decision: Immediately start monitoring progress and performance. Also: notify humans. This is not a silent operation.
Task 11: Monitor removal progress via zpool status
cr0x@server:~$ zpool status tank
pool: tank
state: ONLINE
remove: Removal of vdev 1 copied 8.31T of 17.2T at 612M/s, 03:48:29 remaining
config:
NAME STATE READ WRITE CKSUM
tank ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
ata-WDC_WD80EFAX-1 ONLINE 0 0 0
ata-WDC_WD80EFAX-2 ONLINE 0 0 0
mirror-1 ONLINE 0 0 0 (removing)
ata-WDC_WD80EFAX-3 ONLINE 0 0 0
ata-WDC_WD80EFAX-4 ONLINE 0 0 0
errors: No known data errors
What it means: ZFS reports copied bytes, rate, and estimated time remaining. Rates will vary wildly; don’t worship the ETA.
Decision: If the rate collapses and latency spikes, throttle or pause workload. If errors appear, stop and reassess hardware health.
Task 12: Watch per-device latency while removal runs (find the slow wheel)
cr0x@server:~$ iostat -x 5
Linux 6.6.0 (server) 12/25/2025 _x86_64_ (32 CPU)
Device r/s w/s rkB/s wkB/s await svctm %util
sda 92.1 48.3 9840 5120 6.4 1.1 15.2
sdb 89.7 47.9 9600 4980 6.7 1.0 14.8
sdc 31.2 88.4 3200 10120 38.9 1.2 98.5
sdd 29.8 90.1 3100 10320 41.3 1.2 99.1
What it means: sdc and sdd are saturated with high await; they are likely the vdev being removed or the destination vdev under pressure.
Decision: If only one device is slow, suspect a failing disk or a path issue. If both destination devices are pegged, you’re simply doing too much I/O—throttle removal by reducing competing workload.
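There’s no official “pause evacuation” button, but Linux OpenZFS builds expose I/O scheduler tunables that cap how many removal I/Os run concurrently per vdev. Parameter names vary by version, so treat the one below as an assumption to verify, and list what actually exists on your system first:
cr0x@server:~$ ls /sys/module/zfs/parameters/ | grep -i removal
cr0x@server:~$ echo 1 | sudo tee /sys/module/zfs/parameters/zfs_vdev_removal_max_active
Lowering the max-active value trades evacuation speed for application latency; put it back when your window closes.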
Task 13: Check for TRIM/discard behavior on SSD-based pools (hidden performance cliff)
cr0x@server:~$ zpool get -H autotrim tank
tank autotrim on -
What it means: Autotrim is enabled. During heavy rewrites, trim can add overhead depending on device firmware.
Decision: If you see weird write amplification or latency spikes on SSDs during evacuation, test with autotrim=off in a maintenance window. Don’t toggle this casually without understanding SSD behavior in your fleet.
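The toggle itself is trivial; the judgment call is the window. A sketch, assuming you turn it off for the evacuation, re-enable it afterwards, and catch up with a manual trim on versions that support it (OpenZFS 0.8+):
cr0x@server:~$ sudo zpool set autotrim=off tank
cr0x@server:~$ sudo zpool set autotrim=on tank
cr0x@server:~$ sudo zpool trim tank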
Task 14: Find “why is space not freeing?” with logical vs physical usage
cr0x@server:~$ zfs get -H -o name,property,value used,usedbysnapshots,usedbydataset,usedbyrefreservation tank/vm
tank/vm used 28.7T
tank/vm usedbysnapshots 6.4T
tank/vm usedbydataset 22.3T
tank/vm usedbyrefreservation 0B
What it means: A large chunk is snapshots. Deleting files inside tank/vm won’t reclaim that snapshot-referenced space.
Decision: Adjust snapshot policy or replicate and prune. If you’re trying to make room for removal, snapshots are often the real blocker.
Task 15: Confirm the removed vdev is gone (end-state verification)
cr0x@server:~$ zpool status -v tank
pool: tank
state: ONLINE
config:
NAME STATE READ WRITE CKSUM
tank ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
ata-WDC_WD80EFAX-1 ONLINE 0 0 0
ata-WDC_WD80EFAX-2 ONLINE 0 0 0
errors: No known data errors
What it means: mirror-1 is gone. Pool size will reflect the shrink.
Decision: Now you can repurpose the disks, replace them, or rebuild a new vdev elsewhere. Also: re-check capacity alarms. Your thresholds should change because the pool is smaller.
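If the freed disks are leaving this pool’s life entirely, clear their old ZFS labels so a later import scan doesn’t resurrect ghosts. Double-check the device paths before running this, because labelclear is destructive (paths here are illustrative):
cr0x@server:~$ sudo zpool labelclear -f /dev/disk/by-id/ata-WDC_WD80EFAX-3
cr0x@server:~$ sudo zpool labelclear -f /dev/disk/by-id/ata-WDC_WD80EFAX-4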
Task 16: Validate post-change pool size and headroom (don’t assume)
cr0x@server:~$ zpool list -o name,size,alloc,free,frag,health tank
NAME SIZE ALLOC FREE FRAG HEALTH
tank 23.7T 21.1T 2.6T 64% ONLINE
What it means: The pool is now much smaller; free space shrank too. Fragmentation increased (common after big moves).
Decision: If free space is now tight, move workloads off, add new vdev capacity, or adjust retention. Do not celebrate the removal and then forget you’re running close to full again.
Three corporate mini-stories from the trenches
Incident: the wrong assumption (“we can remove a raidz vdev, right?”)
A mid-size SaaS company had a ZFS pool backing a fleet of VM hosts. It started life as raidz2 because the team wanted capacity efficiency and fault tolerance. Over time, they added another raidz2 vdev during a growth spurt. Then growth slowed, hardware budgets tightened, and someone proposed “removing the newer vdev” to repurpose those disks for a different cluster.
The plan looked harmless: start removal on Friday evening, let it run all weekend, show up Monday to a smaller pool. The operator had used zpool remove before—on cache devices. They assumed the same flexibility applied to any vdev. ZFS disagreed.
The command failed with a blunt error indicating the vdev type wasn’t removable in their environment. The bigger problem wasn’t the failure; it was what happened next. To “force it,” the team tried a sequence of desperate workarounds: detaching devices, offlining disks, and attempting partial replacements. That’s how they converted a healthy raidz vdev into a degraded one, mid-change, while running production.
They recovered, but it took a long night: re-onlining devices, resilvering, and canceling the “shrink” plan entirely. The postmortem was simple: the original mistake was treating vdev removal as a general-purpose pool shrink feature. It isn’t. If you want the option to shrink, you pick a topology that supports it and accept the cost.
Optimization that backfired: adding a “temporary” special vdev to speed up metadata
A financial services engineering team had a ZFS pool serving a large file tree with millions of small files. Metadata operations were slow, directory traversals were painful, and every incident review had the same line: “we should add SSDs.” They did. They added a special vdev on a pair of NVMe devices to hold metadata and small blocks. Performance improved immediately.
Then they got ambitious. To maximize capacity, they configured the special vdev in a way that was technically redundant but operationally fragile: firmware updates and maintenance were handled by a separate platform team, and the storage team didn’t get notified about NVMe quirks. Predictably, one NVMe started throwing errors under sustained load and would occasionally disappear and reappear on the PCIe bus.
Nothing corrupted instantly. That’s what makes it dangerous. The pool began logging intermittent checksum errors tied to metadata reads. Users saw “random” I/O errors. The team’s first instinct was to remove the special vdev and go back to the old world. But removal requires a healthy path to read blocks and rewrite them elsewhere. Their “optimization” had become a dependency.
The eventual fix was unsexy: replace the unstable NVMe, resilver the special mirror, scrub until clean, then plan a controlled removal/migration with adequate headroom. The lesson wasn’t “never use special vdevs.” It was: if you add a special vdev, treat it like a first-class storage tier with lifecycle management, monitoring, and change control. Otherwise it’s a performance upgrade that invoices you later.
Boring but correct practice that saved the day: capacity headroom and staged migration
An enterprise internal platform team ran ZFS for NFS home directories and build artifacts. They had a rule that annoyed everyone: the pool must stay under 75% allocated, and snapshot retention had to be justified with actual recovery needs. Engineers complained because it felt like wasted capacity.
One quarter, they needed to decommission a chassis of older disks. The pool had been expanded over the years with multiple top-level mirror vdevs (not raidz), specifically because they valued operational flexibility. They planned vdev removal for the mirrors that sat on the old chassis.
Because the pool had headroom, evacuation could run without the allocator panicking. Because snapshots were controlled, they could actually free space if they needed to. Because they practiced scrubs and replaced questionable drives early, they didn’t discover hardware failures halfway through a multi-day rewrite job.
The removals still took time, and the pool ran hotter than usual, but it stayed predictable. The decommission happened on schedule, and no one outside the storage on-call even noticed. That’s the point of boring practices: they don’t create exciting stories, and that’s why you keep doing them.
Joke #2: “Just remove the vdev” is the storage version of “just restart it”—occasionally correct, usually missing the part where you explain what breaks next.
Common mistakes: symptoms → root cause → fix
1) Symptom: zpool remove refuses with “operation not supported”
Root cause: Pool feature flags don’t include device removal, or your ZFS implementation/version doesn’t support removal for that vdev type.
Fix: Verify feature flags, ZFS version, and vdev type. If it’s raidz, assume you need migration rather than removal. Don’t try “creative” detach/offline hacks.
2) Symptom: removal starts, then progress crawls to nearly zero
Root cause: Pool is too full or too fragmented; allocator and metaslab selection become expensive, and writes compete with production I/O.
Fix: Create headroom (delete/replicate data, reduce snapshots), schedule work in a quiet window, and reduce competing workload. If you can’t, don’t run removal online at peak.
3) Symptom: pool is ONLINE but apps see latency spikes and timeouts during removal
Root cause: Evacuation saturates disks and blows out latency for synchronous workloads or metadata-heavy operations.
Fix: Add temporary capacity to reduce movement pressure, move hot workloads away, or throttle at the workload layer (rate-limit backups, pause rebuilds, stagger batch jobs). Monitor at the application SLO level, not just storage metrics.
4) Symptom: “We freed 5TB but zpool list free didn’t change much”
Root cause: Snapshots still reference blocks; space is logically deleted but physically retained.
Fix: Identify large snapshot consumers with zfs list -t snapshot and usedbysnapshots. Adjust retention, replicate, then destroy snapshots intentionally.
5) Symptom: after removing SLOG, database performance tanks
Root cause: Sync writes now land on main pool; latency increased drastically.
Fix: Re-add SLOG on low-latency, power-loss-protected devices. If you can’t, evaluate whether the app truly needs sync or if you were accidentally forcing sync via mount/export settings.
6) Symptom: removing special vdev seems impossible without risk
Root cause: Special vdev contains critical metadata/small blocks, and you don’t have a safe target layout or headroom for migration.
Fix: Treat it as a migration: add a new special vdev (mirrored), migrate, scrub, then remove the old. If your platform doesn’t support removal, plan a pool migration instead.
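On OpenZFS builds with allocation classes, adding the new (mirrored) special vdev looks like this; device paths are hypothetical, and the mirror keyword is not optional if you value your metadata:
cr0x@server:~$ sudo zpool add tank special mirror /dev/disk/by-id/nvme-NEW0 /dev/disk/by-id/nvme-NEW1
Note that existing metadata generally doesn’t migrate on its own; the “migrate” step means actually rewriting data (for example via send/receive), not just waiting.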
7) Symptom: pool shrank but free space is now dangerously low
Root cause: You planned evacuation feasibility, not post-removal operating headroom.
Fix: Recompute capacity targets after removal. Update alert thresholds. Add capacity before you’re forced to add it during an outage.
8) Symptom: one disk shows crazy await during removal
Root cause: Failing disk, bad cable/HBA path, or a device with shingled behavior/firmware quirks under sustained writes.
Fix: Validate hardware. Replace suspect device before continuing a long evacuation. If it dies mid-operation, you may lose your ability to read blocks for evacuation.
Checklists / step-by-step plan (how to do this without improvising at 3 a.m.)
Pre-flight checklist: decide if removal is the right tool
- State the goal: shrink capacity, decommission hardware, remove a performance tier, or reverse a temporary expansion.
- Identify the target: cache/log/special/top-level data vdev.
- Verify support: feature flags and ZFS implementation/version for that target.
- Confirm pool health: no degraded vdevs, no unresolved checksum errors.
- Confirm headroom: free space plus operational margin; adjust snapshots/reservations.
- Plan workload impact: schedule, throttle, and communicate.
Execution plan: top-level vdev removal (data vdev)
- Take a configuration snapshot for your own sanity: capture zpool status, zpool list, zfs list, and reservation settings.
- Run a scrub before removal if you haven’t recently. You want to discover latent read errors now, not mid-evacuation.
- Reduce or pause non-critical batch jobs: backups, large restores, and compaction-like workloads.
- Start removal: zpool remove pool target.
- Monitor progress and latency: zpool status and iostat -x (a periodic logging sketch follows this list).
- If performance impact is unacceptable: reduce competing workload first. Don’t panic-stop unless you have an explicit rollback plan.
- When complete, verify topology and capacity. Update monitoring thresholds and capacity forecasts.
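For long evacuations, a timestamped progress log you can line up with application metrics later is worth the one-liner; a minimal sketch with standard tools:
cr0x@server:~$ while true; do date; zpool status tank | grep -A1 "remove:"; sleep 300; done >> /var/tmp/tank-removal.log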
Execution plan: removing SLOG or L2ARC (supporting vdevs)
- Measure baseline app latency and storage latency.
- Remove the device with zpool remove.
- Watch synchronous write workloads (databases, NFS) for SLOG removal; expect little for L2ARC removal besides cache warm-up behavior.
- Decide whether to replace with a better device or stay without it.
Rollback thinking (because you don’t always get what you want)
- If removal is not supported: your rollback is “stop trying” and plan a migration to a new pool with the right topology.
- If removal is supported but too disruptive: rollback is operational—pause competing workloads, add temporary capacity, or reschedule. You can’t “unwrite” the evacuation already completed.
- If hardware starts failing during removal: treat it as an incident. The removed vdev must remain readable to finish evacuation safely.
FAQ
Can I shrink a ZFS pool the way I shrink an LVM volume?
Usually no. ZFS was designed to grow pools. Vdev removal is a specific evacuation feature, not a general-purpose shrink tool, and it’s limited by vdev type and platform support.
Is removing a disk from a mirror the same as removing a vdev?
No. Detaching a leaf disk from a mirror changes redundancy inside a vdev. Removing a top-level vdev triggers evacuation of allocated blocks to other top-level vdevs.
Can I remove a raidz vdev from a pool?
On many real-world deployments: effectively no. Treat raidz top-level vdevs as permanent. If you need shrink flexibility, design with mirrors or plan for pool migration.
How long does vdev removal take?
It depends on allocated data on the target vdev, pool fragmentation, device speeds, and competing workload. Expect “hours to days,” not “minutes,” for multi-terabyte vdevs.
What happens if a disk in the removing vdev fails mid-evacuation?
If the vdev becomes unreadable, evacuation may fail and you can lose the ability to complete removal safely. Mirrors give you more resilience; single-disk vdevs are unforgiving. Don’t start removal on questionable hardware.
Does removing SLOG risk data loss?
Not in the sense of corrupting committed data. ZFS remains consistent. The risk is performance: synchronous write latency can jump sharply, causing application timeouts.
Does removing L2ARC risk data loss?
No. L2ARC is a cache. Removing it drops cached data and your hit rate will rebuild over time.
Can I remove a special vdev safely?
Sometimes, but it’s not “safe” in the casual sense. Special vdevs hold metadata; you must ensure redundancy, health, headroom, and support on your platform. Treat it like a migration with scrubs and verification.
Why is free space so critical for removal?
Because evacuation rewrites allocated blocks elsewhere. If the remaining vdevs can’t accept the data, ZFS can’t complete the move. Also, very full pools behave badly: allocation slows and performance collapses.
Should I prefer mirrors over raidz if I want to remove vdevs later?
Yes, if operational flexibility matters more than raw capacity efficiency. Mirrors cost more in usable space, but they buy you simpler expansions, replacements, and (where supported) removals.
Conclusion: practical next steps
If you want to remove something from a ZFS pool, start by classifying it. Cache and log devices are generally easy wins. Top-level data vdevs are possible only when your pool and platform explicitly support it, and even then it’s a controlled data migration under load.
Next steps that pay off immediately:
- Run the eligibility checks now (feature flags, topology) before you need the option during an incident.
- Enforce headroom. If your pool lives above ~80% allocated all the time, you’re choosing crisis mode as a lifestyle.
- Build for change: if you anticipate shrinking, prefer mirror-based top-level vdevs and avoid “temporary” layouts that you can’t undo.
- Practice the boring disciplines: scrubs, SMART monitoring, snapshot hygiene, and reservations you can explain to a skeptical coworker.
ZFS will do the right thing when you respect its rules. When you don’t, it will still do the right thing—by refusing. That’s a feature. Treat it like one.