ZFS zfs list -o space: The View That Explains ‘Where Did It Go?’

Every ZFS operator eventually meets the same ghost: the pool that’s “suddenly” full. You check the obvious dataset, you check the obvious directory, you even run a quick du and feel briefly reassured—until ZFS calmly insists the pool is running out of space anyway. That’s not ZFS being cute. That’s ZFS telling you you’re looking at the wrong layer.

zfs list -o space is the layer. It’s the “space accounting ledger” view that reconciles datasets, snapshots, reservations, refreservations, and the kinds of invisible consumption that make on-call shifts spicy. In production, it’s the difference between guessing and knowing—and between deleting the wrong thing and fixing the actual cause.

What zfs list -o space actually shows

Most ZFS “where did my space go?” arguments happen because we mix two valid but different perspectives:

  • Filesystem perspective (what du reports): counts live file data reachable from a mountpoint, usually ignoring snapshots, often missing metadata, and frequently lying by omission when compression, sparse files, or clones are involved.
  • Pool allocator perspective (what ZFS must manage): counts real allocated blocks, plus space effectively held by snapshots, plus space reserved for promises you made to other datasets (reservations/refreservations), plus overhead.

zfs list -o space is a curated map of that second perspective. It breaks “USED” into meaningful buckets: live data, snapshot-held space, child datasets, and refreservation. If you’ve only ever looked at zfs list with the default columns, you’re missing the receipts.

One operational note: the space shorthand is not magic; it’s a named column set. On many systems, it expands into something like:

  • name, avail, used, usedsnap (usedbysnapshots), usedds (usedbydataset), usedrefreserv (usedbyrefreservation), usedchild (usedbychildren)

Depending on ZFS implementation and version, the exact abbreviations and column order may differ slightly, and you can always request the long-form properties explicitly. The principle holds: stop treating USED as a single number.
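
If you want the expansion spelled out, you can request the long-form set yourself; the OpenZFS man page documents -o space as equivalent to this column list (plus an implied -t filesystem,volume, so snapshots don’t clutter the listing). Pool name tank is an assumption:

cr0x@server:~$ zfs list -o name,avail,used,usedsnap,usedds,usedrefreserv,usedchild -r tank

The output matches zfs list -o space, which is the point: the shorthand is just a named column set.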

Interesting facts & historical context

Because it helps to know what kind of beast you’re operating:

  1. ZFS was born at Sun Microsystems in the early 2000s as a combined filesystem and volume manager, which is why it can do things like snapshots at the block level without a separate LVM layer.
  2. The original ZFS design pushed end-to-end checksumming and copy-on-write as first-class features, which is why “deleting a file” doesn’t necessarily mean “freeing space now” if snapshots still reference the blocks.
  3. ZFS datasets are cheap and meant to be numerous; the design assumes you’ll slice storage by application/tenant and manage behavior per dataset using properties.
  4. Space accounting in ZFS intentionally distinguishes “referenced” space from “used” space; clones and snapshots make “ownership” non-obvious, so ZFS tracks it.
  5. Reservations and refreservations exist because “best effort” storage is not acceptable in many enterprise workloads; they’re promises backed by allocator behavior, not vibes.
  6. Zvols (block devices backed by ZFS) were introduced to serve VM images and iSCSI/FC-style consumers; they complicate space intuition because filesystem tools don’t see inside them.
  7. Compression was added early and became widely used because it’s one of the few performance optimizations that can also reduce write amplification and improve cache efficiency—when the data compresses.
  8. Deduplication is famous for being both brilliant and financially ruinous when enabled casually; many operators have a “we tried dedup once” story and now treat it like a loaded nail gun.

The space columns, demystified

Start with the mental model: “who is holding blocks?”

In ZFS, blocks are shared. A snapshot is not a copy; it’s a bookmark. When you modify a block after a snapshot exists, ZFS writes the new block elsewhere (copy-on-write), and the snapshot keeps pointing at the old one. So the snapshot “holds” the old blocks alive. That’s why deleting files doesn’t always return space: you deleted the live reference, but the snapshot still references the data.

zfs list -o space answers: how much of this dataset’s USED is held for each reason?
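
If you want to watch copy-on-write retention happen, a toy sequence makes it concrete. The scratch dataset tank/demo, its mountpoint, and the large file are assumptions for illustration:

cr0x@server:~$ zfs snapshot tank/demo@before
cr0x@server:~$ rm /tank/demo/bigfile.bin
cr0x@server:~$ zfs list -o name,used,usedsnap,usedds tank/demo
NAME        USED  USEDSNAP  USEDDS
tank/demo  10.1G     10.0G     96K

The file is gone from the live filesystem, but USED barely moved: the blocks migrated from USEDDS to USEDSNAP because @before still references them.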

Column-by-column interpretation (the usual suspects)

USED

The total space consumed by the dataset and everything “below” it: its own live data and metadata (USEDDS), space held by its snapshots (USEDSNAP), its descendants (USEDCHILD), and any refreservation overhead (USEDREFRESERV). USED is the sum of those buckets. This is allocator-facing. This is what makes pools fill.

AVAIL

Space available to that dataset, factoring in pool free space and quotas/reservations (and sometimes special rules like “slop space”). AVAIL is the number your application experiences when writing.

USEDBYDATASET

Space used by the dataset’s live filesystem (and metadata) excluding snapshots and excluding children. If this is large, your live data is large. If this is small but USED is huge, your ghost is elsewhere.

USEDBYSNAPSHOTS

Space held due to snapshots. It is not “the size of snapshots” in a human sense; it’s the unique space that can’t be freed because snapshots reference it. High USEDBYSNAPSHOTS often means “we churned a dataset with snapshots enabled.” Think databases, VM images, CI artifact caches, or anything that rewrites large files in place.

USEDBYCHILDREN

Space used by descendant datasets and volumes. A parent can look “full” when it’s really just acting as a container. This is the number that saves you from blaming the wrong dataset.

RESERVATION (and why there’s no USEDBYRESERVATION column)

A reservation is a guarantee: this dataset will be able to write that amount even if the pool gets tight. Unlike refreservation, a plain reservation gets no usedby* bucket of its own in OpenZFS; its cost surfaces indirectly, as reduced AVAIL on the parent and sibling datasets. When AVAIL looks mysteriously small everywhere, audit reservations explicitly with zfs get.

USEDBYREFRESERVATION

Space consumed because of refreservation. Unlike a plain reservation, a refreservation guarantees space for the dataset’s own referenced data only, excluding snapshots and children; it’s common with zvols backing VMs. It can be the “invisible tax” that makes a pool look full while actual data is not.
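
To inventory every guarantee in a pool at once, a recursive property query works; pool name tank is an assumption:

cr0x@server:~$ zfs get -r -t filesystem,volume -o name,property,value reservation,refreservation tank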

Joke #1 (because we’ve earned it): Reservations are like meeting invites—once you accept them, they block your calendar even if nobody shows up.

Related properties that often complete the picture

zfs list -o space is the fastest lens, but you’ll often want adjacent properties to answer “why”:

  • referenced: how much space this dataset’s live data references (not counting snapshots). Great for “how big is it, really?”
  • logicalused / logicalreferenced: how much data would exist without compression (and sometimes before copies/dedup). Great for spotting compression wins/losses.
  • compressratio: quick hint whether compression is helping.
  • written: how much data has been written since the last snapshot (or dataset creation). Great for churn analysis.
  • volsize / volblocksize: for zvol behavior.
  • copies, dedup: the “are we multiplying blocks?” switches.
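
A single zfs get can pull the whole supporting cast for a suspect dataset; tank/home is an assumption:

cr0x@server:~$ zfs get -o name,property,value referenced,logicalreferenced,compressratio,written tank/home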

Fast diagnosis playbook

This is the “it’s 03:17, the pool is at 92%, and the pager is singing” version. The goal is not to be elegant; it’s to avoid making it worse.

1) Confirm the pool-level truth

First question: is it truly a space problem or a fragmentation/allocation/metadata problem presenting as space?

cr0x@server:~$ zpool list
NAME   SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
tank  10.9T  9.80T  1.10T        -         -    43%    89%  1.00x  ONLINE  -

Interpretation: CAP near 90% is danger territory for many pools, especially if FRAG is high and the workload is random-write heavy. If CAP is low but AVAIL in datasets is low, you’re likely dealing with quotas/reservations.

2) Find the top offenders by actual allocator accounting

cr0x@server:~$ zfs list -o space -S used -r tank
NAME         AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
tank         1.10T  9.80T        0B     96K             0B      9.80T
tank/vm      1.10T  6.10T     2.60T    400G          80.0G      3.04T
tank/home     100G  2.90T      400G   2.30T             0B       200G
tank/backup  1.10T   800G      200G    580G             0B        20G

Interpretation: You now have a map: the pool is full mostly because tank/vm is full; within it, a large share is snapshot-held (USEDSNAP), the rest sits in children, and there’s also a refreservation footprint.

3) Decide which lever is safe: snapshots, reservations, or real data

In order of least risky to most risky:

  1. Prune snapshots with known retention policy (especially automated ones). This typically has the highest return with the lowest blast radius.
  2. Reduce/remove refreservations only if you understand the consumer (VM/zvol) and can accept overcommit risk.
  3. Delete/move live data only if you’re sure snapshots are not the true holder and you’re not breaking a workload expectation.

4) Validate with one more accounting view before acting

cr0x@server:~$ zfs get -o name,property,value -H used,referenced,logicalused,logicalreferenced,compressratio tank/vm
tank/vm	used	6.10T
tank/vm	referenced	410G
tank/vm	logicalused	7.90T
tank/vm	logicalreferenced	520G
tank/vm	compressratio	1.92x

Interpretation: Live referenced data is only ~410G; the rest is snapshots/children/reservations. If you start deleting “live” VM files, you won’t get much back. Snapshots are the lever.

Practical tasks (commands + interpretation)

These are the day-to-day moves that turn -o space from trivia into money saved and outages avoided. All examples assume pool tank; adjust accordingly.

Task 1: Show the space breakdown for a dataset tree

cr0x@server:~$ zfs list -o space -r tank
NAME       AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
tank       1.10T  9.80T        0B     96K             0B      9.80T
tank/vm    1.10T  6.10T     2.60T    400G          80.0G      3.04T
tank/home   100G  2.90T      400G   2.30T             0B       200G

Interpretation: Use this to answer “is it snapshots, children, or the dataset itself?” without guessing.

Task 2: Sort by snapshot-held space to find snapshot churn hotspots

cr0x@server:~$ zfs list -o name,usedbysnapshots,usedbydataset,usedbychildren -S usedbysnapshots -r tank
NAME         USEDSNAP  USEDDS  USEDCHILD
tank/vm         2.60T    400G      3.04T
tank/home        400G   2.30T       200G
tank/backup      200G    580G        20G
tank               0B     96K      9.80T

Interpretation: Snapshot pruning should start with tank/vm. This is also a hint to revisit snapshot frequency on churn-heavy datasets.

Task 3: List snapshots and their “used” (incremental) footprint

cr0x@server:~$ zfs list -t snapshot -o name,used,referenced -S used -r tank/vm | head
NAME                              USED  REFERENCED
tank/vm@auto-2025-12-24-2300       48G        410G
tank/vm@auto-2025-12-24-2200       41G        409G
tank/vm@auto-2025-12-24-2100       39G        409G
tank/vm@weekly-2025-w51            22G        405G

Interpretation: Snapshot USED is the space unique to that snapshot: what would be freed if it alone were destroyed. Big numbers mean heavy rewrite churn between snapshots.

Task 4: Quickly see snapshot churn since last snapshot using written

cr0x@server:~$ zfs get -o name,property,value -H written tank/vm
tank/vm	written	312G

Interpretation: If written is huge and snapshots are frequent, snapshot space growth will be huge. Databases and VM images are classic offenders.
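
OpenZFS also accepts written@<snapshot>, which measures churn since a specific snapshot rather than the latest one. The snapshot name is reused from Task 3, and the value is illustrative:

cr0x@server:~$ zfs get -o name,property,value -H written@auto-2025-12-24-2200 tank/vm
tank/vm	written@auto-2025-12-24-2200	353G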

Task 5: Identify refreservation landmines (common with zvol-backed VM storage)

cr0x@server:~$ zfs list -o name,type,usedbyrefreservation,refreservation,volsize -r tank/vm
NAME           TYPE        USEDREFRESERV  REFRESERV  VOLSIZE
tank/vm        filesystem          80.0G       480G        -
tank/vm/zvol0  volume               200G       200G     500G
tank/vm/zvol1  volume               300G       300G     300G

Interpretation: USEDREFRESERV tells you how much space is being held aside. If the pool is tight, these “held” blocks can be the difference between stable and outage.

Task 6: Find datasets where quotas make AVAIL look smaller than pool free

cr0x@server:~$ zfs get -o name,property,value -H quota,refquota,reservation,refreservation tank/home tank/vm
tank/home	quota	3T
tank/home	refquota	none
tank/home	reservation	none
tank/home	refreservation	none
tank/vm	quota	none
tank/vm	refquota	none
tank/vm	reservation	none
tank/vm	refreservation	480G

Interpretation: A quota can make a dataset report low AVAIL even when the pool has free space; here tank/home’s 3T quota minus its 2.90T USED is why its AVAIL shows only 100G while the pool still has 1.10T free. This is a policy issue, not a pool issue.

Task 7: Compare live filesystem view vs ZFS allocator view (and catch the mismatch)

cr0x@server:~$ df -h /tank/vm
Filesystem      Size  Used Avail Use% Mounted on
tank/vm         1.5T  410G  1.1T  27% /tank/vm

cr0x@server:~$ zfs list -o name,used,referenced,usedbysnapshots tank/vm
NAME     USED  REFERENCED  USEDSNAP
tank/vm  6.10T       410G    2.60T

Interpretation: df reports what’s reachable in the live dataset. ZFS reports what the pool allocator must keep. When they disagree massively, snapshots/clones/reservations are usually involved.

Task 8: Catch clones holding space (the “I deleted the base image” surprise)

cr0x@server:~$ zfs list -t all -o name,origin,used,referenced -r tank/vm
NAME                 ORIGIN               USED  REFERENCED
tank/vm              -                   6.10T        410G
tank/vm/base         -                     60G         60G
tank/vm/base@golden  -                      0B         60G
tank/vm/clone01      tank/vm/base@golden   80G         75G
tank/vm/clone02      tank/vm/base@golden   95G         88G

Interpretation: Clones depend on their origin snapshot. Delete the origin snapshot blindly and you’ll get blocked—or worse, you’ll plan capacity incorrectly because the origin can’t go away yet.
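
If a clone must outlive its base, zfs promote reverses the dependency: the origin snapshot moves to the promoted clone, and the former base becomes the dependent dataset, which you can then destroy on its own schedule. A sketch using the names above:

cr0x@server:~$ zfs promote tank/vm/clone01
cr0x@server:~$ zfs list -t all -o name,origin -r tank/vm

After promotion, tank/vm/base shows tank/vm/clone01@golden as its origin, and the accounting for the shared blocks moves to clone01.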

Task 9: Check compression effectiveness when logical and physical diverge

cr0x@server:~$ zfs get -o name,property,value -H logicalused,used,compressratio,compression tank/home
tank/home	logicalused	3.40T
tank/home	used	2.90T
tank/home	compressratio	1.17x
tank/home	compression	lz4

Interpretation: Compression is helping a bit, not a lot. If logicalused is far larger than used, you have real wins; if it’s close, don’t bank on compression for capacity planning.

Task 10: Find datasets with unexpectedly high metadata (indirectly)

cr0x@server:~$ zfs list -o name,used,referenced,recordsize,special_small_blocks -r tank | head -n 10
NAME       USED  REFERENCED  RECSIZE  SPECIAL_SMALL_BLOCKS
tank      9.80T         96K     128K                     0
tank/home 2.90T       2.30T     128K                     0
tank/vm   6.10T        410G     128K                     0

Interpretation: ZFS doesn’t expose “metadata used” as a simple dataset property in zfs list output, but mismatches between referenced and used, plus workload type (millions of small files) and special vdev tuning can hint strongly. If metadata is suspected, confirm at pool level with other tooling and consider special vdev strategies carefully.

Task 11: Use zfs list -p for scriptable, non-human units

cr0x@server:~$ zfs list -o space -p -r tank/vm | head -n 3
NAME           AVAIL          USED           USEDSNAP       USEDDS        USEDREFRESERV  USEDCHILD
tank/vm        1209462790553  6713214521344  2858730235904  429496729600  85899345920    3339088209920
tank/vm/zvol0  1209462790553  864748364800   0              650000000000  214748364800   0

Interpretation: Use this when you’re building alerts or reports. Human-friendly units are for humans; automation wants integers.
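
As a sketch of an alert primitive on top of that: -H drops the header for clean parsing, and the 1 TiB threshold (1099511627776 bytes) is an arbitrary assumption:

cr0x@server:~$ zfs list -H -p -o name,usedbysnapshots -r tank | awk '$2 > 1099511627776 {print $1, $2}'
tank/vm 2858730235904

Anything this prints is a dataset where snapshot-held space alone exceeds a terabyte, which is usually the shortlist you want.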

Task 12: Verify what would be freed by deleting a snapshot (dry-run thinking)

cr0x@server:~$ zfs list -t snapshot -o name,used -S used -r tank/vm | head -n 5
NAME                              USED
tank/vm@auto-2025-12-24-2300       48G
tank/vm@auto-2025-12-24-2200       41G
tank/vm@auto-2025-12-24-2100       39G
tank/vm@weekly-2025-w51            22G

Interpretation: A snapshot’s USED is the space unique to it, so destroying tank/vm@auto-2025-12-24-2300 alone frees roughly 48G. Blocks shared with other snapshots or clones are counted in no single snapshot’s USED and only become freeable when the last referencer goes, which is why destroying several adjacent snapshots can reclaim more than the sum of their USED values. Either way, this ranking tells you what to prune first.
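
OpenZFS also has a real dry run: zfs destroy -nv prints what would go and an estimate of the reclaim without touching anything. The exact wording varies by version; the shape here is illustrative:

cr0x@server:~$ zfs destroy -nv tank/vm@auto-2025-12-24-2300
would destroy tank/vm@auto-2025-12-24-2300
would reclaim 48.0G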

Task 13: Show what’s consuming a parent: dataset vs children

cr0x@server:~$ zfs list -o name,usedbydataset,usedbychildren,usedbysnapshots tank tank/vm tank/home
NAME       USEDDS  USEDCHILD  USEDSNAP
tank          96K      9.80T        0B
tank/vm      400G      3.04T     2.60T
tank/home   2.30T       200G      400G

Interpretation: Great for “which subtree is the problem?” and for avoiding the classic mistake of deleting from the parent when the child is the hog.

Task 14: Confirm reservations are the reason AVAIL is low

cr0x@server:~$ zfs get -o name,property,value -H reservation,refreservation tank/vm/zvol1
tank/vm/zvol1	reservation	none
tank/vm/zvol1	refreservation	300G

cr0x@server:~$ zfs list -p -o name,avail,usedbyrefreservation -r tank/vm | grep zvol1
tank/vm/zvol1  1209462790553  322122547200

Interpretation: If the pool is tight, refreservation is “spent” space from the allocator’s point of view. If you remove it, you’re changing the contract with that VM or block consumer.

Joke #2: Space accounting is like budgeting—everything is fine until you categorize your spending.

Three corporate-world mini-stories

1) Incident caused by a wrong assumption: “We deleted the data, so the space must be free”

The ticket came in as a routine capacity alert: pool at 88%, climbing. The service owner swore nothing big had changed. The sysadmin did the usual first pass: du -sh on the mountpoint, saw a couple hundred gigabytes, shrugged, and assumed the alert was noisy. It wasn’t.

Two hours later, writes started failing in a non-obvious way. Applications weren’t out of disk inside the filesystem (according to df), but ZFS began returning ENOSPC during bursts. That’s the kind of failure that makes people distrust storage—and they’re not wrong to.

The root issue: heavy churn on a VM image dataset plus aggressive snapshotting. A CI system was generating large artifacts, copying them into VM disks, then discarding and repeating. The live dataset stayed small because the pipeline cleaned up after itself. But snapshots held onto the previous versions of those blocks like a museum that refuses to deaccession anything.

zfs list -o space made it embarrassing in seconds: USEDBYSNAPSHOTS dwarfed everything else. The fix wasn’t “delete more files,” it was “prune snapshots to match reality” and “stop snapshotting churn-heavy data at a cadence meant for databases.” They adjusted retention, moved ephemeral workloads to a dataset with different snapshot policy, and the alert went away permanently.

The lesson stuck: filesystem tools tell you what’s visible; ZFS tells you what’s allocated. If you only listen to one, you’ll make the wrong call under pressure.

2) Optimization that backfired: “Let’s crank snapshot frequency for better RPO”

A team wanted tighter recovery points for a fleet of VM-backed services. They increased snapshot frequency from hourly to every five minutes and kept the same retention window. On paper, it sounded like a free win: snapshots are “cheap,” right?

In practice, the workload rewrote big files constantly. VM images do that: logs inside guests, database pages, package updates, temp files. Copy-on-write meant each snapshot froze a set of block pointers, so the next rewrite had to allocate new blocks. Multiply that by twelve snapshots per hour and suddenly the pool allocator was doing cardio all day.

Then came the second-order effect: replication. Incrementals got larger and more frequent. Network and receiving side I/O went up. Snapshot destroy jobs got slower because there were more snapshots and more block accounting to unwind. People noticed “ZFS is slow” and started proposing drastic changes, including disabling checksums in other systems and buying emergency disks. Classic misdiagnosis spiral.

They pulled back, but not by going back to hourly. They split datasets: critical low-churn data got frequent snapshots; churn-heavy VM scratch and caches got fewer. They also rethought retention: lots of five-minute snapshots for a few hours, fewer hourlies for a day, then dailies. The pool stabilized without losing the real RPO requirement.

The operational lesson: snapshot frequency is not a moral virtue. It’s a cost function. zfs list -o space is how you see the bill.

3) A boring but correct practice that saved the day: “Always reserve breathing room and measure by ZFS, not by vibes”

In one environment—large, multi-tenant, lots of internal teams—storage incidents were rare. Not because the hardware was magical, but because the storage team was stubborn about boring rules.

First: they refused to run pools “hot.” Capacity alerts triggered at thresholds that felt conservative to application teams. The pushback was predictable: “We paid for the disks; why not use them?” The storage team answered with operational reality: allocator behavior degrades near full, resilver stress rises, and emergency deletions are where mistakes breed. They wanted margin, not drama.

Second: they built dashboards from zfs list -o space -p, not from du or application-reported numbers. The dashboards broke down USEDBYSNAPSHOTS and USEDBYREFRESERVATION explicitly, so “mystery growth” became “snapshot retention mismatch” or “refreservation creep” instead of a blame game.

Third: they made snapshot policy explicit per dataset class and reviewed it quarterly. Nothing fancy—just enough to catch “CI cache dataset inherited database snapshot policy” kinds of accidents.

When an app team accidentally deployed a log storm that rewrote a huge VM disk repeatedly, the pool started rising. The dashboards showed snapshots as the holder within minutes. They pruned safely, adjusted policy for that dataset, and moved on. No all-hands call, no emergency storage purchase, no “ZFS is haunted” folklore. Boring, correct, effective.

Common mistakes, symptoms, fixes

Mistake 1: Trusting du to explain pool allocation

Symptom: du shows small usage; pool is near full; deleting files doesn’t help.

Why it happens: Snapshots, clones, zvols, compression, and metadata aren’t represented the way you think in a directory walk.

Fix: Use ZFS accounting first.

cr0x@server:~$ zfs list -o space -S used -r tank

Mistake 2: Deleting the wrong dataset because you looked at the parent’s USED

Symptom: You delete from tank mountpoint, nothing changes, or you break something unrelated.

Why it happens: Parent USED includes children via USEDBYCHILDREN.

Fix: Compare USEDDS vs USEDCHILD before acting.

cr0x@server:~$ zfs list -o name,usedbydataset,usedbychildren,usedbysnapshots -r tank

Mistake 3: Assuming snapshot “size” equals snapshot “used”

Symptom: Snapshots look small individually, but USEDBYSNAPSHOTS is huge.

Why it happens: Many small incrementals add up; also, long retention + churn accumulates blocks.

Fix: Sort snapshots by USED and prune by retention policy, not by name aesthetics.

cr0x@server:~$ zfs list -t snapshot -o name,used -S used -r tank/dataset

Mistake 4: Forgetting refreservations on zvols

Symptom: Pool looks inexplicably tight; removing data doesn’t free as much as expected; VM storage datasets show high USEDREFRESERV.

Why it happens: refreservation pre-allocates “guaranteed” space and shows up as consumed in allocator accounting.

Fix: Audit refreservations and decide whether you truly need guarantees or can tolerate overcommit.

cr0x@server:~$ zfs list -o name,usedbyrefreservation,refreservation -r tank/vm

Mistake 5: Confusing quota with pool free space

Symptom: Application says “no space,” but zpool list shows plenty free; dataset AVAIL is small.

Why it happens: Quotas cap dataset growth regardless of pool free.

Fix: Check quota/refquota and adjust intentionally.

cr0x@server:~$ zfs get -o name,property,value -H quota,refquota tank/app

Mistake 6: Treating “delete snapshots” as always safe

Symptom: Snapshot destroy fails or is blocked; space doesn’t reclaim as expected.

Why it happens: Clones depend on origin snapshots; shared blocks across snapshots can reduce reclaim.

Fix: Check for clones via origin and plan clone promotion or lifecycle changes.

cr0x@server:~$ zfs list -o name,origin -r tank | awk '$2 != "-"'

Checklists / step-by-step plan

Step-by-step plan: “Pool is filling faster than expected”

  1. Confirm pool status and capacity trend.
    cr0x@server:~$ zpool list
    cr0x@server:~$ zpool status

    Interpretation: Ensure you’re not mixing a capacity issue with a degraded vdev/resilver performance issue.

  2. Find top datasets by USED and then by snapshot-held space.
    cr0x@server:~$ zfs list -o space -S used -r tank | head -n 30
    cr0x@server:~$ zfs list -o name,usedbysnapshots,usedbydataset,usedbychildren -S usedbysnapshots -r tank | head -n 30

    Interpretation: Identify whether growth is live data, snapshots, children, or reservations.

  3. If snapshots dominate, locate the snapshot policy and churn.
    cr0x@server:~$ zfs list -t snapshot -o name,used -S used -r tank/offender | head
    cr0x@server:~$ zfs get -o name,property,value -H written tank/offender

    Interpretation: Big written + many snapshots = predictable space pressure.

  4. If refreservation dominates, inventory zvols and contracts.
    cr0x@server:~$ zfs list -o name,type,usedbyrefreservation,refreservation,volsize -r tank/offender

    Interpretation: Decide whether guarantees are required or if you can reduce held space safely.

  5. If children dominate, drill down recursively and avoid deleting at the wrong level.
    cr0x@server:~$ zfs list -o space -S used -r tank/offender

    Interpretation: Find the real subtree; don’t “clean up” random paths.

  6. After any change, re-check space breakdown and pool CAP.
    cr0x@server:~$ zfs list -o space tank/offender
    cr0x@server:~$ zpool list

    Interpretation: Ensure the lever you pulled actually moved the numbers you expected.

Checklist: before deleting snapshots in production

  1. Confirm the dataset and snapshot naming/retention policy (avoid deleting the last good restore point).
  2. Check for clones that depend on those snapshots (origin relationships).
  3. Prefer deleting oldest snapshots first to improve reclaim predictability (unless policy dictates otherwise).
  4. Measure expected reclaim using snapshot USED ordering (a good estimate for a single snapshot; ranges can free more than the sum), as shown below.
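
For step 4, the dry-run flag pairs well with the % range syntax, which covers a span of snapshots in one command. Names reuse the automated scheme above; treat the output shape as illustrative:

cr0x@server:~$ zfs destroy -nv tank/vm@auto-2025-12-24-2100%auto-2025-12-24-2300
would destroy tank/vm@auto-2025-12-24-2100
would destroy tank/vm@auto-2025-12-24-2200
would destroy tank/vm@auto-2025-12-24-2300
would reclaim 128G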

Checklist: before changing reservation/refreservation

  1. Identify consumer (VM host, iSCSI target, app requirement).
  2. Confirm whether the guarantee is required for correctness or just comfort.
  3. Adjust in small steps during stable periods; watch pool CAP and workload behavior.
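
A minimal sketch of step 3, shrinking a guarantee by one increment and verifying the effect; the dataset and size are assumptions:

cr0x@server:~$ zfs set refreservation=250G tank/vm/zvol1
cr0x@server:~$ zfs list -o space tank/vm/zvol1
cr0x@server:~$ zpool list tank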

FAQ

1) Why does df show plenty of space but ZFS says the pool is almost full?

df reports filesystem-visible space inside a dataset, generally ignoring snapshot-held blocks and sometimes other allocator realities. ZFS pool fullness is about allocated blocks. If snapshots are large or reservations are in play, these numbers diverge sharply.

2) What’s the difference between USED and REFERENCED?

REFERENCED is how much space the dataset’s live state points to. USED is allocator-facing and can include snapshot-held space, children, and refreservation overhead, depending on context. When snapshots are heavy, USED can dwarf REFERENCED.

3) Does USEDBYSNAPSHOTS equal the sum of snapshot USED values?

Not necessarily; it’s usually more. Blocks referenced by two or more snapshots (but not by the live dataset) count toward USEDBYSNAPSHOTS yet toward no individual snapshot’s USED, since each snapshot’s USED covers only its unique blocks. Treat snapshot USED as “what destroying this one snapshot frees,” and USEDBYSNAPSHOTS as “what destroying all snapshots would free.”
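
You can check the relationship empirically; the numbers here are illustrative and reuse the tank/vm example:

cr0x@server:~$ zfs list -H -p -t snapshot -o used -r tank/vm | awk '{s+=$1} END {print s}'
2460000000000
cr0x@server:~$ zfs get -H -p -o value usedbysnapshots tank/vm
2858730235904

The per-snapshot sum (~2.24 TiB) is smaller than USEDBYSNAPSHOTS (~2.6 TiB); the difference is blocks shared by multiple snapshots, owned by none of them individually.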

4) I deleted a bunch of snapshots, but space didn’t come back immediately. Why?

Common reasons: blocks are still referenced by other snapshots, clones, or dataset states; asynchronous freeing may take time; or the pressure is actually reservations/refreservations. Re-check zfs list -o space and look for clones via origin.

5) Are reservations and refreservations “real usage”?

They’re real from the allocator’s point of view because they reduce what other datasets can safely use. They may not correspond to live data, but they absolutely affect whether writes succeed under pressure.

6) Why do VM datasets often have huge snapshot usage?

VM disks churn: small writes across large virtual disks rewrite blocks constantly. With snapshots, old blocks remain referenced, so churn translates into space growth. Snapshot frequency and retention matter more for VM datasets than for mostly-append workloads.

7) How do I find the single biggest “space hog” quickly?

Start with:

cr0x@server:~$ zfs list -o space -S used -r tank | head -n 30

Then decide whether the hog is snapshots, children, dataset, or reservations by reading the breakdown columns.

8) When should I use -p output?

Whenever you plan to parse results in scripts, dashboards, or alerts. Human-readable units are ambiguous and locale-sensitive; integers are stable.

9) Can I “fix” space issues by turning off snapshots?

Disabling future snapshots won’t free existing snapshot-held blocks. You must delete snapshots (safely) to reclaim space. Also, turning off snapshots can be a governance decision—make sure you’re not trading capacity for unacceptable recovery risk.

10) What if USEDBYDATASET is huge but the application insists it cleaned up?

Then you may be dealing with data not visible at the mountpoint (zvols), hidden mount overlays, or cleanup that happened inside a guest (VM) while the host sees a big virtual disk file that didn’t shrink. Validate what the dataset actually contains and whether it’s a volume or filesystem.

Conclusion

zfs list -o space is the most useful “capacity truth” view in day-to-day ZFS operations because it answers the only question that matters during an incident: what is holding the blocks? Once you can separate snapshot-held space from live data and from reservation promises, the mystery evaporates—and your remediation options become clear, safe, and fast.

If you operate ZFS in production, make this view a habit: baseline it, alert on it, and teach it. The pool isn’t haunted. It’s accounted for.
