It’s 02:17. Your deploy is mid-flight. A write fails with ENOSPC. Someone runs df -h and announces the filesystem is only 72% full. Congratulations: you’ve met one of ZFS’s favorite party tricks—being technically correct and operationally infuriating.
This isn’t ZFS being “buggy.” It’s ZFS being honest about different constraints than the ones your muscle memory checks first. Space in ZFS is not a single number. It’s a pile of accounting rules, metadata realities, and safety margins that become visible only when you’re already bleeding.
What “no space left” really means in ZFS
When an application gets ENOSPC on ZFS, it’s not always about raw free bytes. ZFS is a copy-on-write filesystem. Every “overwrite” is a new write somewhere else, followed by metadata updates, followed by the old blocks becoming free only after the transaction group (TXG) commits—and only if nothing else references those blocks.
So “no space” can mean any of these:
- The pool is actually near-full and ZFS is protecting itself from collapse (slop space, allocator constraints).
- Snapshots are pinning blocks you thought you deleted.
- Reservations are consuming space that df doesn't describe well.
- Quotas are blocking a dataset even though the pool has room.
- Metadata or small-block overhead is eating the remaining allocatable space.
- Fragmentation and metaslab fullness make “free space” unusable for the write pattern.
- Zvol thin-provisioning looks spacious until it isn’t; or it looks full because of volsize/volblocksize constraints.
Operationally, you should treat “ZFS no space left” as: “the allocator cannot satisfy the request given current policies and references.” That’s different from “disk is full,” and that difference matters because the fix is rarely “rm -rf something until it works.”
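Before anything else, learn to ask ZFS for that breakdown directly. A minimal first look, assuming a pool named tank (swap in your own):

# Per-dataset breakdown: AVAIL plus space used by snapshots, live data, refreservation, children
zfs list -o space -r tank

# Pool-level view: capacity, fragmentation, checkpoint, health
zpool list tank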
One quote worth keeping in your head when you’re tempted to “just delete stuff” in production: “Hope is not a strategy.”
— Rick Page
Fast diagnosis playbook (check this before you delete anything)
If you do nothing else, do this in order. It’s optimized for speed, signal, and not making the incident worse.
1) Is it pool-wide, dataset-specific, or a zvol?
- Pool: zpool list and zpool status
- Dataset: zfs list, zfs get quota,reservation,refreservation
- Zvol: zfs list -t volume, zfs get volsize,volblocksize,refreservation
Decision: if it’s dataset quota/reservation, fixes are local and fast. If it’s pool-level near-full, you need space management, not knob-twiddling.
2) Are snapshots pinning space?
- zfs list -t snapshot -o name,used,refer,creation -s used
- zfs get usedbydataset,usedbysnapshots,usedbyrefreservation
Decision: if snapshots dominate, deleting files won’t help; deleting snapshots (carefully) will.
3) Are reservations or refreservations stealing allocatable space?
- zfs get -r reservation,refreservation
- Check for unexpectedly large refreservation on zvols and datasets.
Decision: remove or reduce reservations only if you understand why they existed. “Because someone once read a tuning blog” is not a valid reason.
4) Is the pool hitting slop space / allocator pain?
- zpool list for high capacity percentage.
- zpool get autotrim and zpool status for device errors and degraded layouts.
Decision: if the pool is > ~80–90% and under write pressure, plan for immediate relief (free real space, add vdevs, move workloads). Do not “defrag ZFS”; that’s not how this works.
5) Check for “hidden” consumers: refreservation, special devices, small blocks, and sync writes
- zfs get -r special_small_blocks and confirm you're not dumping metadata onto an undersized special vdev.
- zfs get recordsize,compression for datasets with pathological small writes.
Decision: if metadata/special is full, you can see “no space” even when the main pool looks okay. That requires a targeted fix.
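If you want the playbook as one paste-able block, here's a minimal read-only triage sketch; the pool name tank is a placeholder and nothing here modifies state:

#!/bin/sh
# Read-only ENOSPC triage; "tank" is a placeholder pool name.
POOL=tank
zpool list "$POOL"                                    # capacity, fragmentation, checkpoint
zpool status -x "$POOL"                               # prints details only if something is unhealthy
zfs list -o space -r "$POOL"                          # snapshots vs live data vs refreservation, per dataset
zfs get -r -s local quota,refquota,reservation,refreservation "$POOL"   # locally set limits only
zfs list -r -t snapshot -o name,used -s used "$POOL" | tail -10         # heaviest snapshots last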
Interesting facts and historical context
- ZFS was born at Sun Microsystems in the mid-2000s, designed to end the “filesystem + volume manager” split by making storage a single coherent stack.
- Copy-on-write was a deliberate reliability choice: ZFS avoids in-place overwrites so it can always maintain on-disk consistency, even on crashes.
- The “pool” concept flipped the old model: instead of filesystems owning partitions, filesystems are cheap views over a shared storage pool, which makes accounting more nuanced.
- Snapshots are not copies; they’re bookmarks. They cost almost nothing to create, but can hold huge amounts of “deleted” data indefinitely.
- ZFS “slop space” exists because near-full pools behave badly: fragmentation grows, allocation becomes expensive, and worst case you can deadlock yourself out of space needed to free space.
- Metadata is first-class and abundant in ZFS: checksums, block pointers, space maps, and intent logs mean the filesystem knows a lot—and stores a lot.
- Ashift became famous because of 4K drives: choosing the wrong sector alignment permanently taxes your usable capacity and can amplify “no space” pain through wasted slack space.
- OpenZFS split and evolved across platforms (Illumos, FreeBSD, Linux), and some space-reporting and feature behaviors differ subtly by version.
- Zvols made ZFS popular for virtualization, but they brought block-device semantics (volsize, volblocksize, discard/TRIM expectations) into a filesystem world with snapshots and COW.
Space accounting that makes ZFS look like it’s lying
df is not your authority here
df asks the mounted filesystem what it thinks is available to that mount. On ZFS, the mount is a dataset with properties: quotas, reservations, refreservations, and a view into a pool that might have its own constraints. When ZFS says “no space,” it might be:
- Out of space in the pool allocator’s view.
- Blocked by a dataset quota even though the pool has room.
- Unable to allocate a suitably sized contiguous region in a mostly-full metaslab (especially for certain block sizes and patterns).
If you’re debugging ZFS space, zfs and zpool are the source of truth. df is the weather report taped to your window.
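If you want to watch the disagreement directly, compare the two views of the same dataset; tank/app and its mountpoint here are placeholders:

# What the application (and df) sees for the mount
df -h /tank/app

# What ZFS knows about the same dataset
zfs list -o space tank/app
zfs get quota,refquota,reservation,refreservation tank/app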
Snapshots: the #1 reason deletes don’t “free space”
In ZFS, a block is freed only when nothing references it. A snapshot is a reference. So if you delete a file that existed when a snapshot was taken, the blocks are still referenced by that snapshot. Result: your delete “works,” your application is happy, and your pool is still full.
Space math becomes especially confusing because ZFS reports a snapshot's USED as only the space unique to that snapshot: what would be freed if it alone were destroyed. Blocks shared by several snapshots aren't charged to any one of them, so snapshots can collectively pin a lot of space even when each one looks small, depending on churn.
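Before deleting anything, you can also ask ZFS what a deletion would actually buy you. A dry-run sketch using the range syntax (oldest%newest); the snapshot names are examples:

# -n: dry run, -v: report what would be destroyed and how much space would be reclaimed
zfs destroy -nv tank/app@auto-2025-11-01_0100%auto-2025-11-05_0100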
Reservations and refreservations: space you can’t use even though it’s “free”
reservation guarantees space for a dataset. That means ZFS treats that space as unavailable to others.
refreservation guarantees space for the dataset’s referenced data (excluding snapshots). It’s commonly used for zvols to ensure they can always be written to, because running out of space under a block device can be catastrophic for the guest filesystem.
Both are legitimate tools. Both are also a great way to starve your pool if you set them like you’re provisioning an enterprise SAN in 2009.
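Checking and adjusting them is quick; the dataset names below are placeholders, and clearing a guarantee someone set on purpose deserves a ticket, not a hunch:

# Show only reservations that were explicitly set (not inherited or default)
zfs get -r -s local reservation,refreservation tank

# Shrink or drop a refreservation once you understand why it was there
zfs set refreservation=100G tank/vm/win01
zfs set refreservation=none tank/vm/win01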
Slop space: ZFS’s “keep some cash in your wallet” policy
ZFS reserves a chunk of pool space so the system can keep functioning near full: allocate metadata, commit TXGs, and recover. This is sometimes called “slop space.” The exact behavior depends on version and tuning, but the principle is stable: ZFS will start denying allocations before you hit 100% raw usage.
This is not ZFS being dramatic. A 99% full COW filesystem is a performance and reliability horror show. If you run pools routinely above ~80–85% and then act surprised when things get weird, that’s not a ZFS problem. That’s a grown-up choices problem.
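On OpenZFS on Linux, the reserve is sized by a module parameter: roughly the pool size divided by 2^spa_slop_shift, subject to internal minimum and maximum caps. Other platforms expose it differently, so treat this as a Linux-flavored sketch:

# Default is 5, i.e. roughly 1/32 of the pool held back (subject to internal caps)
cat /sys/module/zfs/parameters/spa_slop_shift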
Metadata and small blocks: you can fill the pool without “big files”
ZFS stores checksums, block pointers, indirect blocks, spacemaps, directory structures, extended attributes, and more. If you create millions of tiny files, or you store highly fragmented small blocks, metadata overhead rises. Also, small blocks have proportionally more pointer overhead, and can waste space due to minimum allocation sizes.
If you’re using a special vdev (for metadata and optionally small blocks), you can hit “no space” when the special vdev fills—even if the main data vdevs have plenty of room. That’s a particular flavor of excitement.
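You can see the split directly: the per-vdev listing reports the special class separately from the data vdevs.

# Per-vdev ALLOC/FREE: a nearly full special vdev can starve the pool while data vdevs look fine
zpool list -v tank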
Fragmentation and metaslabs: “free” space that isn’t allocatable
ZFS allocates from metaslabs. As a pool fills and churns, free space can become scattered into chunks too small for the requested block sizes or too costly to allocate efficiently. The allocator can fail a request even when total free space looks non-zero.
This shows up most painfully with:
- Large sequential writes late in a pool’s life.
- VM images with random writes, snapshots, and clones.
- Overly small recordsize with high churn.
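Two read-only checks worth running when this is the suspect; tank is a placeholder:

# FRAG measures how scattered the free space is, not file fragmentation
zpool get fragmentation,capacity tank

# Per-vdev view: one vdev running hot can hurt allocation even if the pool total looks fine
zpool list -v tank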
Zvols: block devices with ZFS consequences
Zvols look like disks. People treat them like disks. Then they learn that zvols are backed by ZFS datasets, which means snapshots, refreservations, compression, and COW behavior can all influence “no space left.”
Thin provisioning makes this spicier. You can have a zvol with a large volsize, plenty of apparent guest free space, and still wedge the host pool until writes fail. The guest sees a disk that “should work.” The host sees a pool that can’t allocate. Both are right. Everyone is angry.
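The difference between a “safe” zvol and a thin one is mostly one flag at creation time. A sketch with made-up names and sizes:

# Fully reserved zvol: refreservation covers volsize, so the pool sets that space aside up front
zfs create -V 100G tank/vm/thick01

# Sparse ("thin") zvol: -s skips the refreservation; space is taken only as the guest writes
zfs create -s -V 100G tank/vm/thin01

# Confirm what you actually got
zfs get volsize,refreservation,used,logicalused tank/vm/thick01 tank/vm/thin01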
Joke #1: The fastest way to increase free space in a ZFS pool is to take a snapshot of your career before touching production. You’ll want something to roll back to.
Practical tasks (commands, outputs, decisions)
These are the tasks I actually run when ZFS reports ENOSPC or the pool feels “full” in ways that don’t match df. Each task includes a realistic output sample and what decision to make from it.
Task 1: Confirm pool capacity and basic health
cr0x@server:~$ zpool list
NAME SIZE ALLOC FREE CKPOINT EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
tank 21.8T 19.9T 1.93T - - 62% 91% 1.00x ONLINE -
What it means: 91% capacity and 62% fragmentation is a red flag. Even with 1.93T free, allocation can fail under pressure, and performance will be ugly.
Decision: treat this as a pool-level incident. Your primary fix is to free meaningful space or add vdev capacity, not “delete a few logs.”
Task 2: Check for a dataset quota causing local ENOSPC
cr0x@server:~$ zfs get -o name,property,value,source quota,refquota tank/app
NAME PROPERTY VALUE SOURCE
tank/app quota 2T local
tank/app refquota none default
What it means: dataset tank/app is capped at 2T regardless of pool free space.
Decision: if the app legitimately needs more, raise the quota; if not, the app must clean up within its budget.
Task 3: Check reservations and refreservations recursively
cr0x@server:~$ zfs get -r -o name,property,value,source reservation,refreservation tank | head
NAME PROPERTY VALUE SOURCE
tank reservation none default
tank refreservation none default
tank/vm reservation none default
tank/vm refreservation none default
tank/vm/win01 reservation none default
tank/vm/win01 refreservation 500G local
tank/vm/win02 reservation none default
tank/vm/win02 refreservation 500G local
What it means: two VMs have 500G each reserved. That’s 1T of pool space effectively taken off the table for everyone else.
Decision: keep refreservations only where you need hard guarantees. If the pool is stressed, reduce them carefully and document why.
Task 4: See what is consuming space by category
cr0x@server:~$ zfs get -o name,property,value used,available,usedbysnapshots,usedbydataset,usedbyrefreservation tank/app
NAME PROPERTY VALUE
tank/app used 1.89T
tank/app available 110G
tank/app usedbydataset 940G
tank/app usedbysnapshots 850G
tank/app usedbyrefreservation 0B
What it means: snapshots are holding almost as much as the live dataset.
Decision: deleting files in tank/app will disappoint you. Address snapshots first: retention, replication, or targeted snapshot deletion.
Task 5: Identify the biggest snapshots (by space “used”)
cr0x@server:~$ zfs list -t snapshot -o name,used,refer,creation -s used | tail -5
tank/app@auto-2025-12-20_0100 120G 820G Sat Dec 20 01:00 2025
tank/app@auto-2025-12-21_0100 128G 812G Sun Dec 21 01:00 2025
tank/app@auto-2025-12-22_0100 140G 800G Mon Dec 22 01:00 2025
tank/app@auto-2025-12-23_0100 155G 785G Tue Dec 23 01:00 2025
tank/app@auto-2025-12-24_0100 210G 740G Wed Dec 24 01:00 2025
What it means: recent snapshots are growing fast; churn is high.
Decision: review what changed (new workload, compaction job, log rotation failure) and adjust snapshot frequency/retention or dataset layout.
Task 6: Dry-run a safe snapshot deletion plan (human review first)
cr0x@server:~$ zfs list -H -t snapshot -o name -s creation | head -5
tank/app@auto-2025-11-01_0100
tank/app@auto-2025-11-02_0100
tank/app@auto-2025-11-03_0100
tank/app@auto-2025-11-04_0100
tank/app@auto-2025-11-05_0100
What it means: you have an ordered list to propose deletions (e.g., oldest first) aligned with policy.
Decision: if you must delete snapshots, delete oldest first unless you have a known “bad snapshot” that pins a specific dataset state you no longer need.
Task 7: Actually delete snapshots (carefully, and preferably in batches)
cr0x@server:~$ sudo zfs destroy tank/app@auto-2025-11-01_0100
cr0x@server:~$ sudo zfs destroy tank/app@auto-2025-11-02_0100
What it means: snapshot references are removed; space will be freed once no other references exist and TXGs commit.
Decision: after a few deletes, re-check zpool list and dataset available. If nothing improves, another constraint is in play (clones, holds, reservations, special vdev).
Task 8: Check for snapshot holds preventing deletion
cr0x@server:~$ zfs holds tank/app@auto-2025-11-03_0100
NAME TAG TIMESTAMP
tank/app@auto-2025-11-03_0100 keep Thu Dec 12 09:14 2025
What it means: a hold named keep is pinning the snapshot.
Decision: coordinate with whoever set the hold (backup/replication tooling). Remove the hold only when you’re sure it’s no longer required.
Task 9: Release a hold (explicitly, with intent)
cr0x@server:~$ sudo zfs release keep tank/app@auto-2025-11-03_0100
cr0x@server:~$ sudo zfs destroy tank/app@auto-2025-11-03_0100
What it means: the snapshot can now be destroyed.
Decision: if holds appear unexpectedly, audit your backup/replication pipeline. Holds are good; surprise holds are bad.
Task 10: Check for clones that keep snapshots alive
cr0x@server:~$ zfs get -o name,property,value clones tank/app@auto-2025-12-01_0100
NAME PROPERTY VALUE
tank/app@auto-2025-12-01_0100 clones tank/app-test
What it means: dataset tank/app-test is a clone depending on that snapshot. You can’t destroy the snapshot without dealing with the clone.
Decision: either destroy the clone, promote it, or accept the snapshot must remain. Do not “force” your way through unless you like explaining data loss.
Task 11: Identify whether a zvol is over-reserved or mis-sized
cr0x@server:~$ zfs list -t volume -o name,volsize,used,available tank/vm/win01
NAME VOLSIZE USED AVAIL
tank/vm/win01 800G 610G 0B
What it means: AVAIL is 0B for that volume’s dataset view. That can be quota/refreservation or pool-level constraint. It can also mean your zvol is at its limit relative to pool conditions.
Decision: check refreservation and pool capacity. If the pool is near-full, growing volsize may be risky or impossible.
Task 12: Inspect zvol properties that affect space behavior
cr0x@server:~$ zfs get -o name,property,value volsize,volblocksize,compression,refreservation tank/vm/win01
NAME PROPERTY VALUE
tank/vm/win01 volsize 800G
tank/vm/win01 volblocksize 8K
tank/vm/win01 compression lz4
tank/vm/win01 refreservation 800G
What it means: an 800G refreservation guarantees the full volsize. Great for safety, brutal for shared pools.
Decision: if you need guarantees, keep it. If you’re running a dense virtualization host and you’re short on space, consider reducing refreservation and accepting the operational risk (with monitoring and headroom).
Task 13: Check if the pool has a checkpoint consuming “phantom” space
cr0x@server:~$ zpool list
NAME SIZE ALLOC FREE CKPOINT EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
tank 21.8T 19.9T 1.93T 1.2T - 62% 91% 1.00x ONLINE -
What it means: a checkpoint exists and is holding 1.2T worth of old pool state. That’s real space you can’t reclaim until the checkpoint is discarded.
Decision: if you don’t need the checkpoint for rollback, free it. If you do need it, you’ve chosen to run with less capacity—own that choice.
Task 14: Discard a checkpoint (irreversible, so think first)
cr0x@server:~$ sudo zpool checkpoint -d tank
cr0x@server:~$ zpool list tank
NAME SIZE ALLOC FREE CKPOINT EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
tank 21.8T 19.9T 1.93T - - 62% 91% 1.00x ONLINE -
What it means: checkpoint is removed; that pinned space can now be reclaimed as blocks free up.
Decision: do this only when you’re sure you don’t need the rollback safety net.
Task 15: Verify special vdev health and capacity (if you have one)
cr0x@server:~$ zpool status -v tank
pool: tank
state: ONLINE
config:
NAME STATE READ WRITE CKSUM
tank ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
sda ONLINE 0 0 0
sdb ONLINE 0 0 0
special
mirror-1 ONLINE 0 0 0
nvme0n1 ONLINE 0 0 0
nvme1n1 ONLINE 0 0 0
errors: No known data errors
What it means: special vdev exists. If it fills, the pool can become effectively “out of space” for metadata/small blocks.
Decision: monitor special vdev usage aggressively. If it’s near full, you may need to add special capacity or adjust special_small_blocks strategy.
Task 16: Check small-block policy that can overload special vdev
cr0x@server:~$ zfs get -r -o name,property,value special_small_blocks tank | head
NAME PROPERTY VALUE
tank special_small_blocks 16K
tank/app special_small_blocks 16K
tank/vm special_small_blocks 0
What it means: any blocks ≤16K for tank and tank/app go to special. That can be great for performance and catastrophic for special capacity if the workload has lots of small blocks.
Decision: if special is tight, reduce this for churny datasets (new writes will follow the new rule; old blocks won’t magically migrate).
Joke #2: ZFS doesn’t “eat” your free space. It just keeps meticulous receipts, and your deletes are not tax-deductible until the snapshots agree.
Three corporate mini-stories from the space wars
Mini-story 1: The incident caused by a wrong assumption
The company was migrating a legacy service to containers. Storage was “simple”: a ZFS dataset mounted into the nodes, with hourly snapshots for safety. The on-call playbook was inherited from ext4 days: if the disk is full, delete old logs and restart.
During a peak traffic day, writes started failing. The application threw ENOSPC and fell into a crash loop. The on-call checked df -h: 68% used. They suspected a bug. They restarted pods. They rotated logs more aggressively. Nothing changed.
Then someone ran zfs get usedbysnapshots and found snapshots were holding more space than the live dataset. The team had recently enabled verbose request logging for debugging and deployed a change that rewrote a large JSON file repeatedly. Every hour, a new snapshot pinned another slice of that churn. Deleting current logs didn’t touch the pinned blocks.
The fix was boring: delete a tranche of older snapshots and reduce snapshot frequency for that dataset. The postmortem lesson was sharper: “df is not an authority on ZFS” became a literal runbook line, and the team added a dashboard tile for snapshot-used and pool capacity.
Mini-story 2: The optimization that backfired
A virtualization platform team wanted faster VM performance. They added a special vdev on mirrored NVMe to accelerate metadata and small blocks. They also set special_small_blocks=32K pool-wide because it looked good in a lab. Everyone celebrated; graphs got better.
Months later, a new internal build system landed on the same pool. It created oceans of small files, churned them constantly, and loved to rewrite tiny blobs. The special vdev started filling far faster than the main data vdevs. Nobody noticed because the pool still had “terabytes free” and the only alert was a generic “pool capacity.”
Then came the fun part: seemingly random “no space left” errors on unrelated services. Metadata allocations were failing because the special vdev was tight. Some datasets weren’t even using small blocks, but they still needed metadata. The storage team saw free space on the main vdevs and got blamed for “ZFS lying again.”
The eventual fix required adding more special capacity and narrowing special_small_blocks to only the datasets that actually benefited. The lesson was painful but clear: “fast” knobs have blast radius. You don’t roll them out pool-wide just because the benchmark chart looks pretty.
Mini-story 3: The boring but correct practice that saved the day
A finance-adjacent system stored daily reports and also served them to customers. The team ran ZFS with strict quotas per dataset and a policy that the pool should never exceed an operational ceiling. They treated 80–85% as “full” for the pool, because they’d seen what happens after that.
One Friday, an upstream vendor changed a feed format and the parser started duplicating data. Storage consumption spiked. The service was still healthy, but the pool was climbing toward the ceiling fast. Alerts fired early because they were keyed to pool capacity and snapshot growth—not to customer-visible failure.
The team’s response was unglamorous: they halted the ingest, kept the serving side running, and used snapshots to preserve evidence for debugging. Because the pool still had headroom, they could do that without triggering allocator panic or emergency deletions. They fixed the parser, backfilled the correct data, and resumed.
The postmortem was short and slightly smug. The “boring practice” was simply maintaining headroom, enforcing quotas, and monitoring snapshot used space. Nobody had to delete anything randomly. Nobody had to explain why backups were gone. Sometimes the best engineering is just refusing to operate on the edge.
Common mistakes: symptoms → root cause → fix
1) “df shows plenty of space, but writes fail”
Symptom: application gets ENOSPC; df -h shows comfortable free space.
Root cause: dataset quota, reservation/refreservation, or pool slop space / allocator constraints.
Fix: check zfs get quota,refquota,reservation,refreservation and zpool list. Adjust the relevant property or free pool space.
2) “I deleted a lot, but nothing freed”
Symptom: you remove files; pool allocation doesn’t drop.
Root cause: snapshots (or clones) pinning the blocks.
Fix: quantify usedbysnapshots. Identify heavy snapshots, holds, and clones; delete snapshots consistent with retention, or remove/promote clones.
3) “Pool is 90% full and suddenly everything is slow and failing”
Symptom: latency spikes, allocations fail intermittently, scrubs slow down, random IO becomes miserable.
Root cause: near-full allocator behavior + fragmentation. COW needs room to breathe.
Fix: free significant space (not gigabytes, but meaningful percentage points), add vdev capacity, or migrate workloads. Then enforce a ceiling.
4) “We added a special vdev and now ‘no space’ happens early”
Symptom: pool has plenty of raw free space; metadata-heavy workloads fail; errors look like space exhaustion.
Root cause: special vdev is full or nearly full due to metadata/small blocks policy.
Fix: monitor and expand special vdev; restrict special_small_blocks to targeted datasets; stop sending churny small blocks there unless you sized it for that.
5) “Zvol guest filesystem says it has space, host says ENOSPC”
Symptom: VM writes fail; guest free space exists; host pool is near-full or thin-provisioned.
Root cause: thin provisioning + COW + snapshots + lack of headroom; or a zvol reservation scheme that starves the pool.
Fix: add headroom, reduce snapshot churn, ensure discard/TRIM is configured end-to-end where appropriate, and decide whether refreservation is required.
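One quick sanity check for the thin case is whether guest discards are actually landing: compare logical and allocated space on the zvol (name is a placeholder):

# If the guest deleted a lot but "used" stays high, discard/TRIM probably isn't reaching the zvol
zfs get used,logicalused,referenced,compressratio tank/vm/win01

# Pool-level TRIM of the physical devices is a separate concern from guest-level discard
zpool get autotrim tank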
6) “We turned on dedup to save space; now we have no space”
Symptom: capacity math got worse; memory pressure increased; space behaves unpredictably.
Root cause: dedup adds metadata and operational cost; on many workloads it’s a trap unless designed for.
Fix: don’t enable dedup casually. If you already did, measure and consider migrating data to a non-dedup dataset/pool rather than trying to “toggle it off” as a magic undo.
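Before blaming dedup (or exonerating it), measure what it's actually doing; tank is a placeholder:

# Ratio near 1.00x means you're paying dedup's metadata and RAM costs for nothing
zpool get dedupratio tank

# Dedup table histogram: how big the DDT is and how entries are distributed
zpool status -D tank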
Checklists / step-by-step plan
Checklist A: “Stop the bleeding” during an ENOSPC incident
- Confirm scope: pool vs dataset vs zvol. Run zpool list and zfs list.
- Freeze churn: pause the job generating writes (ingest, compaction, backups, CI artifacts). Space incidents worsen with churn.
- Find the constraint: quotas/reservations/snapshots/special vdev. Do not guess.
- Recover space safely: delete snapshots per policy (oldest first), or reduce reservations, or move a dataset to another pool.
- Verify recovery: re-check pool CAP, dataset available, and application write success.
- Document what you changed: snapshot deletions, property changes, anything. Future-you is a stakeholder.
Checklist B: “Make it not happen again”
- Set an operational ceiling for pool usage (commonly 80–85% depending on workload) and alert before it.
- Monitor snapshot growth per dataset, not just pool used (a minimal alerting sketch follows this checklist).
- Use quotas intentionally for noisy neighbors and runaway jobs.
- Use reservations sparingly and only where guarantees are worth the shared-pool cost.
- For virtualization: decide your policy on thin provisioning; if you allow it, enforce headroom and monitor aggressively.
- Review special vdev strategy as a capacity plan, not a tweak.
- Run regular scrubs and treat any device errors as urgent. Space incidents and reliability incidents often arrive as a bundle deal.
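A minimal alerting sketch to go with Checklist B. It assumes a pool named tank, a POSIX shell with awk, and echo standing in for whatever notification hook you actually use:

#!/bin/sh
# Minimal capacity/snapshot alerting sketch. Assumptions: pool "tank",
# POSIX shell + awk, and "echo" standing in for your real alerting path.
POOL=tank
CAP_LIMIT=80   # percent; set your own operational ceiling

CAP=$(zpool list -H -o capacity "$POOL" | tr -d '%')
if [ "$CAP" -ge "$CAP_LIMIT" ]; then
    echo "ALERT: pool $POOL at ${CAP}% (ceiling ${CAP_LIMIT}%)"
fi

# Flag datasets where snapshots pin more space than the live data:
# classic "deleting files won't help" candidates.
zfs list -H -p -o name,usedbysnapshots,usedbydataset -r "$POOL" |
awk '$2 > $3 && $2 > 0 { printf "WARN: %s snapshots pin %.1f GiB\n", $1, $2/1073741824 }'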
FAQ
1) Why does ZFS say “no space left” before the pool hits 100%?
Because ZFS keeps reserve space (slop space) and needs allocatable room for metadata and TXG commits. Near 100%, a COW filesystem can trap itself.
2) Why didn’t deleting files free space?
Snapshots (or clones) likely still reference those blocks. Check zfs get usedbysnapshots and list snapshots by used.
3) Is it safe to delete snapshots to free space?
Usually, yes—if you understand why they exist (backup, replication, rollback points). The unsafe part is deleting the wrong ones without coordinating retention requirements.
4) What’s the difference between reservation and refreservation?
reservation reserves space for the dataset including snapshots. refreservation reserves space for the dataset’s referenced data only, often used for zvol safety.
5) Can fragmentation alone cause ENOSPC?
It can contribute. When metaslabs are crowded and fragmented, allocation for certain block sizes can fail even with some free space remaining, especially under heavy churn.
6) Does compression help with “no space left”?
Compression can reduce allocated space and delay pain, but it’s not a bailout when you’re already near-full and snapshot churn is pinning old blocks.
7) Why does snapshot USED look small even when snapshots are holding lots of space?
Snapshot USED counts only the space unique to that snapshot, i.e., what destroying just that one would free. Blocks shared between snapshots aren't charged to any of them individually, so many snapshots can each look modest while collectively pinning large historical churn.
8) Should I just add a bigger disk to the pool?
Add capacity properly: ZFS pools grow by adding vdevs, not by swapping a single disk (unless you replace every disk in a vdev and expand). Capacity fixes are great when they’re planned, not panic-driven.
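If adding a vdev is the plan, rehearse it first; -n shows the resulting layout without touching the pool (device names below are placeholders):

# Dry run: print the configuration that would result, change nothing
zpool add -n tank mirror sdc sdd

# Run it for real only once the layout and redundancy match your existing vdevs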
9) Can a special vdev filling up cause pool-wide write failures?
Yes. If metadata (or small blocks routed to special) can’t be allocated, normal writes can fail. Special capacity is not optional once you depend on it.
10) How much free space should I keep in a ZFS pool?
Enough that you’re not playing allocator roulette. For many production workloads, treat 80–85% as “full.” For VM-heavy random-write pools, be even more conservative.
Conclusion: next steps you can do today
If you’re currently in an incident: stop churn, identify whether snapshots or reservations are the real space hog, and free space in a way you can explain in a postmortem. Random deletion is how you trade a space outage for a data-loss outage.
If you’re not in an incident (the rare luxury): set a pool capacity ceiling, monitor usedbysnapshots and reservations, and rehearse the “fast diagnosis” steps so you don’t learn them at 2 a.m. The goal isn’t to make ZFS stop “lying.” The goal is to speak ZFS’s language before it starts shouting.