ZFS is a filesystem that tells the truth—except when it doesn’t. Not because it’s buggy, but because ZFS is doing accounting across a family tree: datasets, snapshots, clones, holds, reservations, special vdevs, and metadata you didn’t realize was billable. If you’ve ever stared at USED numbers that don’t match reality, you’ve met the “used space lies.” They’re not malicious. They’re just ZFS being technically correct in a way that will still page you at 02:00.
refquota is the quota that pins the bill to the dataset that created it, instead of letting it be socialized across descendants and shared snapshot history. If you run multi-tenant ZFS (VM storage, home directories, CI caches, containers, backup targets), refquota is the only quota that consistently stops “I didn’t use that space” arguments from turning into a storage incident.
The problem: “used space lies” in ZFS
Let’s name the enemy precisely. When people complain that ZFS lies about used space, they usually mean one of these:
- A dataset shows low usage, but the pool is full.
- A tenant claims they only used 200G, but the pool lost 2T.
- Deleting files doesn’t free space (because snapshots keep blocks alive).
- Clones appear to use “nothing” until they diverge, then the bill arrives somewhere unexpected.
- A parent dataset quota looks fine, yet children can still cause pool exhaustion via shared snapshot history.
Here’s the key: ZFS does block-level accounting. Blocks can be referenced by multiple things at once: a live filesystem, a snapshot, a clone, a snapshot of a clone, and so on. “Used” is not just “files currently visible.” It’s “blocks currently referenced by something.” And that “something” can be historical.
Classic quotas (quota) are about total space used by a dataset and its descendants. They’re great for “this subtree should not exceed X.” But they are not designed to fairly bill just this dataset’s own changes when snapshots/clones/descendants share blocks. That’s what refquota is for: it caps referenced space by that dataset alone.
One sentence summary you can take into a meeting: quota is a family budget; refquota is a personal credit limit.
First joke (storage engineers deserve at least one): ZFS accounting is like a corporate expense report—everything is itemized, nothing is simple, and somehow “shared costs” always show up on the wrong person’s card.
Why operations teams get burned
Operations failures around ZFS space are rarely about not knowing commands. They’re about assuming a model that isn’t true:
- Assuming deleting a file frees space immediately.
- Assuming a dataset’s USED is what it “costs” the pool right now.
- Assuming quotas behave like ext4 project quotas or XFS quotas.
- Assuming snapshots are “free until they’re big.” They’re free until you change blocks.
refquota doesn’t magically make snapshots free. What it does is stop a dataset from growing its referenced footprint beyond a cap—even when the usage is hard to reason about because of shared history. It forces an answer to a simple question: “How big is this dataset allowed to be, no matter what?”
Interesting facts and historical context
Some context points that matter when you’re designing policy and predicting behavior:
- ZFS’s core unit of accounting is the block pointer, not the filename. That’s why snapshots can keep space alive without “having files” you can see.
- Snapshots are not copies. They’re immutable references to existing blocks, and they only “cost” space as the live dataset diverges.
- Clones are writable snapshots. They share blocks with their origin, and your space bill depends on how much they diverge from shared history.
- “USED” is a layered metric. It includes space referenced by the dataset plus space referenced on behalf of children and snapshots, depending on which property you’re reading (used, refer, usedby*).
- ZFS has separate notions of “logical” and “physical” space. Compression and copies mean the logical dataset size can differ wildly from the pool allocation.
- Quotas in ZFS evolved to match the dataset model, not POSIX user quotas. They’re per-dataset limits first; user/group quotas are an extra layer.
- “Referenced” is the closest thing to a billable footprint. It’s the data the dataset can reach right now; the space actually freed by destroying it can be smaller if some of those blocks are shared with snapshots or clones.
- Reservations can cause “mysterious” ENOSPC. A pool with plenty of raw free space can still deny writes if the free space isn’t available to that dataset due to reservations and quotas.
- Historically, many ZFS incidents come from snapshot retention policies, not write rates. People size for daily churn and forget that “weekly kept for a year” compounds.
Quota family: quota, refquota, reservation, refreservation
ZFS gives you four knobs that look similar until they ruin your weekend. Let’s define them with operational intent.
quota
quota limits the amount of space a dataset and all its descendants can use. If you set quota=1T on tank/prod, then tank/prod plus tank/prod/db plus tank/prod/vm collectively can’t exceed 1T.
Use it when you want a subtree to have a hard ceiling.
refquota
refquota limits the amount of space a dataset itself can reference (its “personal footprint”), excluding descendants. This is the one that behaves like “cap this dataset, regardless of what its kids do.”
Use it for tenant datasets, VM disks, container roots, home dirs—anything where you need per-entity limits that don’t turn into arguments about shared snapshots.
reservation
reservation guarantees space to a dataset and all its descendants. It’s the “this subtree always has at least X available” promise.
Use it carefully. Reservations are a blunt instrument. They’re how you build “this workload must never run out of space because another tenant spiked.” They’re also how you create a pool that looks half-empty and still returns ENOSPC to the workload that didn’t get a reservation.
refreservation
refreservation guarantees space to a dataset itself (excluding descendants). It’s the personal version of reservation, much like refquota is the personal version of quota.
Use it when a dataset needs guaranteed headroom for bursts, metadata growth, or transactional spikes (databases, WAL, some VM patterns). But treat it like a budget allocation: it reduces the pool’s “free to everyone” space.
A quick mental model
If you only remember one thing:
- quota: cap the family.
- refquota: cap the person.
- reservation: guarantee the family.
- refreservation: guarantee the person.
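If you want to see all four knobs together, here’s a minimal sketch; the pool layout and sizes are illustrative, not a recommendation:

cr0x@server:~$ sudo zfs set quota=1T tank/prod
cr0x@server:~$ sudo zfs set refquota=200G tank/prod/db
cr0x@server:~$ sudo zfs set reservation=500G tank/prod
cr0x@server:~$ sudo zfs set refreservation=50G tank/prod/db
cr0x@server:~$ zfs get -o name,property,value,source quota,refquota,reservation,refreservation -r tank/prod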
How refquota actually works (and what it doesn’t do)
Referenced vs used: the bill you can defend
The property that matters here is referenced (shown as REFER in zfs list). It is the amount of data accessible through this dataset right now. Some of those blocks may also be shared with snapshots or clones, so destroying the dataset can free less than REFER suggests, but it is the closest thing ZFS gives you to the dataset’s own live footprint.
refquota is enforced against the dataset’s referenced space. When you set refquota, ZFS stops writes when that dataset’s referenced space would exceed the cap.
This has two operational consequences:
- It provides a per-dataset “stop sign” that doesn’t expand just because a child dataset exists.
- It does not prevent the pool from filling up due to snapshots held elsewhere or other datasets. It’s not a pool-wide safeguard; it’s a fairness tool.
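To see how close a dataset is to that stop sign in exact bytes, zfs get -Hp emits parseable numbers you can subtract directly. A rough sketch, assuming a refquota is actually set on the dataset (an unset refquota reports 0 in parseable mode, which makes the subtraction meaningless):

cr0x@server:~$ zfs get -Hp -o value refquota tank/tenants/acme
cr0x@server:~$ zfs get -Hp -o value referenced tank/tenants/acme
cr0x@server:~$ echo $(( $(zfs get -Hp -o value refquota tank/tenants/acme) - $(zfs get -Hp -o value referenced tank/tenants/acme) ))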
What refquota does not solve
refquota won’t save you from:
- Pool fragmentation and slop space. ZFS needs working room. If you run pools hot, you will experience performance collapse and weird ENOSPC patterns regardless of quotas.
- Snapshot retention explosions. A dataset can stay under its refquota while old snapshots keep massive amounts of space referenced at the pool level. The dataset can’t “see” that bill as its own referenced usage if the blocks are attributed elsewhere.
- Replication targets without consistent policy. A send/receive stream can recreate snapshots and holds that make usage look different than the source if you’re not careful with retention and holds.
Second joke, because you’ll need it when the graph goes vertical: A refquota is like a speed limiter on a truck—it won’t stop you from driving into a river, but it will stop you from blaming the engine afterward.
Why quota alone leads to “used space lies”
With snapshots and clones, “who used the space” becomes a question of block ownership. Standard quota is subtree-based and interacts with descendants. If your tenant structure is “one parent dataset with many children,” a quota on the parent can be correct while still being operationally useless for billing and blast-radius control.
Here’s the pattern that bites:
- You set quota on tank/tenants.
- Each tenant is a child dataset: tank/tenants/acme, tank/tenants/zephyr, etc.
- Snapshots are taken at the parent or replicated in a way that cross-cuts expectations.
- One tenant churns data; snapshots retain old blocks.
- Pool fills, but tenant’s dataset appears “small” depending on what metric you look at.
refquota on each tenant dataset changes the conversation from “space is complicated” to “you hit your limit; clean up or buy more.”
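A minimal sketch of rolling that out, assuming one child dataset per tenant directly under tank/tenants and a uniform 500G limit (both assumptions are illustrative; the tail -n +2 drops the parent dataset itself):

cr0x@server:~$ for ds in $(zfs list -H -o name -r -d 1 tank/tenants | tail -n +2); do sudo zfs set refquota=500G "$ds"; done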
Three corporate-world mini-stories
1) Incident caused by a wrong assumption: “Deleting files frees space”
The setup was familiar: a shared ZFS pool backing a fleet of CI runners and some ephemeral build caches. Each team got a dataset, and the platform team set quota on a parent dataset. Snapshots were taken every hour because someone once lost a cache and declared war on data loss.
The incident started as a small alert: pool free space trending down faster than usual. Then it became a real alert: write latency spiking, queues building, random jobs failing with ENOSPC. When the incident channel lit up, the first response was the classic: “Just delete old files from the caches.” Teams complied. They deleted tens or hundreds of gigabytes. The pool didn’t move.
Why? Snapshots. The snapshots held the old blocks, so deleting the live files didn’t free them. But the bigger operational mistake was the assumption that the dataset quotas would prevent any single team from causing broad impact. They didn’t, because the quota was on the parent subtree and the accounting they were watching was USED from zfs list, not a snapshot-aware view.
The fix wasn’t heroic. It was boring and correct: per-team refquota, plus a retention policy that matched the actual use case (hourly snapshots for 24 hours, daily for a week, not “hourly forever”). After that, when a team hit the wall, it was their wall. The pool stopped being a shared tragedy.
2) Optimization that backfired: “Clones to save space”
A virtualization team wanted faster provisioning for dev VMs. They had golden images. They discovered ZFS cloning and decided to use it everywhere: create a snapshot of the template, clone it for each new VM, and enjoy instant creation and shared blocks.
For a while it was beautiful. Storage growth slowed, deploy times improved, and the dashboard looked healthy. Then patch day hit. Everyone applied OS updates across dozens of VMs, and suddenly writes exploded. Divergence from the base image meant each VM started allocating its own blocks. That’s normal, but the surprise came from the human side: teams assumed “the template is shared so it’s basically free,” and no one set per-VM refquotas.
The pool hit a threshold where performance degraded. Not just “slower,” but “metadata operations feel like they’re done by carrier pigeon.” The team tried to mitigate by deleting VMs, but some of the heavy blocks were still referenced due to clones and snapshot relationships. The space didn’t come back where they expected. They were looking at the wrong metrics.
The recovery involved two policy changes: (1) put each VM dataset under a refquota aligned to its intended size, and (2) treat clones as a provisioning tool, not a lifecycle strategy—promote clones when needed, and don’t keep a complex dependency chain across months of snapshots. The optimization wasn’t wrong; it was just missing guardrails.
3) Boring but correct practice that saved the day: “Space budgets + weekly audits”
A different shop ran a multi-tenant ZFS appliance for internal teams. It wasn’t flashy: predictable snapshot schedules, consistent naming, and a small weekly ritual. Every Monday morning, an engineer spent 15 minutes reviewing a short report: top datasets by usedbysnapshots, top by referenced, datasets near refquota, and pools under 20% free.
This practice looked unnecessary—until a vendor integration started dumping large, compressible logs into the wrong dataset. Compression made the logical size look terrifying, but the physical allocation stayed moderate… until it didn’t. Log rotation churned blocks daily, and snapshots retained old versions. The pool trend line began to curve up.
The Monday audit caught it before it became an incident. The team saw a dataset with rapidly increasing usedbysnapshots and a refquota that was approaching. They fixed the ingestion path, expired snapshots for that dataset only, and increased the refquota slightly to avoid breaking a critical pipeline while changes rolled out.
No late-night pager. No emergency expansion. Just a small, boring habit and quotas that meant what they thought they meant. In production, boring is a feature.
Practical tasks: commands and interpretations
These are tasks you can run today. Each includes what to look for and how to interpret the results. Examples assume a pool named tank.
Task 1: See the truth table: used vs referenced
cr0x@server:~$ zfs list -o name,used,refer,usedbysnapshots,usedbychildren,usedbydataset -r tank
NAME USED REFER USEDBYSNAPSHOTS USEDBYCHILDREN USEDBYDATASET
tank 3.12T 192K 0B 3.12T 192K
tank/tenants 3.12T 128K 540G 2.58T 128K
tank/tenants/acme 900G 600G 220G 80G 600G
tank/tenants/zephyr 780G 760G 10G 10G 760G
Interpretation: USED includes snapshots and children. REFER is the dataset’s “personal footprint.” If a tenant says “I’m only 600G,” check whether snapshots are holding 220G and whether children are using more.
Task 2: Check quotas and refquotas in one shot
cr0x@server:~$ zfs get -o name,property,value,source quota,refquota,reservation,refreservation tank/tenants/acme
NAME PROPERTY VALUE SOURCE
tank/tenants/acme quota none default
tank/tenants/acme refquota 700G local
tank/tenants/acme reservation none default
tank/tenants/acme refreservation none default
Interpretation: If refquota is set and quota is not, you’re enforcing per-dataset limits without restricting descendants. That’s often correct for tenant datasets that shouldn’t be punished for child datasets (or don’t use them).
Task 3: Set a refquota safely (and confirm it)
cr0x@server:~$ sudo zfs set refquota=500G tank/tenants/acme
cr0x@server:~$ zfs get -o name,property,value,source refquota tank/tenants/acme
NAME PROPERTY VALUE SOURCE
tank/tenants/acme refquota 500G local
Interpretation: Writes to tank/tenants/acme will fail once REFER reaches ~500G (subject to recordsize, metadata, slop space). This is the “stop sign.”
Task 4: Simulate “why did my delete not free space?”
cr0x@server:~$ zfs list -t snapshot -o name,used,refer,creation -s creation tank/tenants/acme
NAME USED REFER CREATION
tank/tenants/acme@hourly-2025... 24G 600G Mon Dec 23 01:00 2025
tank/tenants/acme@hourly-2025... 30G 620G Mon Dec 23 02:00 2025
Interpretation: Snapshot USED is how much unique space that snapshot is keeping alive compared to the current dataset state. If snapshot USED is large, deletes won’t free much until snapshots expire or are destroyed.
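Before destroying anything, you can ask ZFS to estimate what a snapshot (or a range of snapshots) would actually give back. zfs destroy -nv is a dry run that prints what it would destroy and roughly how much it would reclaim; the snapshot names here are illustrative:

cr0x@server:~$ zfs destroy -nv tank/tenants/acme@hourly-2025-12-23-0100
cr0x@server:~$ zfs destroy -nv tank/tenants/acme@hourly-2025-12-23-0100%hourly-2025-12-23-0600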
Task 5: Find which datasets are near their refquota
cr0x@server:~$ zfs list -o name,refer,refquota -r tank/tenants
NAME REFER REFQUOTA
tank/tenants 128K none
tank/tenants/acme 498G 500G
tank/tenants/zephyr 310G 800G
Interpretation: acme is about to hit a hard stop. You should expect application errors soon (write failures), not a gentle warning.
Task 6: Diagnose ENOSPC: check pool health and free space
cr0x@server:~$ zpool list -o name,size,alloc,free,cap,health
NAME SIZE ALLOC FREE CAP HEALTH
tank 7.25T 6.60T 660G 90% ONLINE
Interpretation: 90% capacity is danger territory for many pools, especially with HDD vdevs. Even if quotas are set, the pool can still fill due to snapshots, reservations, or other datasets.
Task 7: Show space held by snapshots (the silent tax)
cr0x@server:~$ zfs list -o name,usedbysnapshots -r tank/tenants | sort -h -k2
tank/tenants/zephyr 10G
tank/tenants/acme 220G
tank/tenants 540G
Interpretation: If usedbysnapshots dominates, your “space problem” is really a retention problem. Refquota won’t undo that; it just prevents a tenant from growing their referenced footprint indefinitely.
Task 8: Identify holds that prevent snapshot deletion
cr0x@server:~$ zfs holds -r tank/tenants/acme@hourly-2025-12-23-0200
NAME TAG TIMESTAMP
tank/tenants/acme@hourly-2025-12-23-0200 keep Tue Dec 23 02:10 2025
Interpretation: A hold tag (here keep) blocks deletion. If space isn’t freeing after “we removed old snapshots,” check holds before you assume ZFS is haunted.
Task 9: Release a hold and delete the snapshot
cr0x@server:~$ sudo zfs release keep tank/tenants/acme@hourly-2025-12-23-0200
cr0x@server:~$ sudo zfs destroy tank/tenants/acme@hourly-2025-12-23-0200
Interpretation: Space won’t necessarily return instantly if other snapshots/clones still reference the same blocks, but you’ve removed one anchor.
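Freeing is also asynchronous. If you want to watch the pool catch up after a large destroy, the pool-level freeing property shows how much space is still queued for release:

cr0x@server:~$ zpool get freeing tank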
Task 10: Audit clone relationships (space shared across lineage)
cr0x@server:~$ zfs get -o name,property,value origin tank/vm/dev-42
NAME PROPERTY VALUE
tank/vm/dev-42 origin tank/vm/template@golden-2025-12-01
Interpretation: If a dataset is a clone, destroying the origin snapshot may be blocked, and accounting may surprise you. Space can be “owned” by a chain of dependencies.
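To audit clone relationships across the whole pool rather than one dataset at a time, origin is also a valid zfs list column; anything whose origin is not “-” is a clone. A quick sketch:

cr0x@server:~$ zfs list -o name,origin -r tank | awk 'NR > 1 && $2 != "-"'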
Task 11: Promote a clone to break dependency on the origin
cr0x@server:~$ sudo zfs promote tank/vm/dev-42
cr0x@server:~$ zfs get -o name,property,value origin tank/vm/dev-42
NAME PROPERTY VALUE
tank/vm/dev-42 origin -
Interpretation: Promotion flips the dependency so the clone becomes independent in the lineage. This is a lifecycle decision; it can simplify cleanup and make future space behavior easier to predict.
Task 12: Confirm what a dataset would free (the defensive report)
cr0x@server:~$ zfs get -o name,property,value referenced,logicalreferenced,used,logicalused tank/tenants/acme
NAME PROPERTY VALUE
tank/tenants/acme referenced 498G
tank/tenants/acme logicalreferenced 1.02T
tank/tenants/acme used 900G
tank/tenants/acme logicalused 1.60T
Interpretation: Compression (and sometimes copies) makes logical sizes larger than physical. If you’re charging back, decide whether you bill on physical (referenced) or logical (logicalreferenced). For capacity planning, physical matters; for “how much data do you have,” logical may match user expectation.
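The gap between logical and physical usually comes down to compression. The ratio properties make it explicit for the same dataset:

cr0x@server:~$ zfs get -o name,property,value compression,compressratio,refcompressratio tank/tenants/acme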
Task 13: Watch enforcement in action (spotting refquota hits)
cr0x@server:~$ zfs get -o name,property,value refquota,refer tank/tenants/acme
NAME PROPERTY VALUE
tank/tenants/acme refquota 500G
tank/tenants/acme refer 498G
cr0x@server:~$ sudo -u acmeuser dd if=/dev/zero of=/tank/tenants/acme/bigfile bs=1M count=4096
dd: failed to open '/tank/tenants/acme/bigfile': Disc quota exceeded
Interpretation: The error will often be “Disc quota exceeded” (EDQUOT), not “No space left on device.” That’s good: it points to dataset policy, not pool exhaustion.
Task 14: Find datasets with reservations that can cause surprise ENOSPC
cr0x@server:~$ zfs get -r -o name,property,value reservation,refreservation tank | egrep -v ' none$'
tank/prod reservation 2T
tank/prod/db refreservation 300G
Interpretation: Reservations carve out space. If a pool looks like it has free space but workloads still fail, reservations are one of the first suspects.
Fast diagnosis playbook
This is the order that finds the bottleneck quickly in real incidents. Don’t start by debating compression ratios. Start by locating the enforcement boundary and the actual consumer.
1) Determine the failure mode: EDQUOT vs ENOSPC
Check application logs and kernel messages. If you see “Disc quota exceeded,” you’re hitting quota or refquota (or user/group quotas). If you see “No space left on device,” the pool is likely out of allocatable space, or constrained by reservations/slop.
cr0x@server:~$ dmesg | tail -n 20
...
2) Check pool capacity and health first (always)
cr0x@server:~$ zpool list -o name,size,alloc,free,cap,health
NAME SIZE ALLOC FREE CAP HEALTH
tank 7.25T 6.60T 660G 90% ONLINE
cr0x@server:~$ zpool status -x
all pools are healthy
Decision: If CAP is high (commonly >80–85% on HDD pools, sometimes lower depending on workload), treat it as a capacity incident even if “free” exists. ZFS needs headroom for allocation and performance.
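Fragmentation compounds the problem at high capacity, so it’s worth checking alongside CAP; the pool-level properties give you both in one call (the exact thresholds are workload-dependent):

cr0x@server:~$ zpool get capacity,fragmentation,freeing tank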
3) Identify whether snapshots are the real consumer
cr0x@server:~$ zfs list -o name,usedbysnapshots -r tank | sort -h -k2 | tail -n 10
tank/tenants 540G
tank/backup 1.40T
Decision: If snapshots dominate, go straight to retention policy, holds, and replication behavior. Don’t waste time deleting live files.
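To go from “snapshots dominate” to “these specific snapshots,” rank individual snapshots by the unique space they hold:

cr0x@server:~$ zfs list -t snapshot -o name,used -s used -r tank | tail -n 15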
4) If the error is EDQUOT, check refquota/refer first
cr0x@server:~$ zfs get -o name,property,value refquota,quota,refer,used tank/tenants/acme
NAME PROPERTY VALUE
tank/tenants/acme refquota 500G
tank/tenants/acme quota none
tank/tenants/acme refer 498G
tank/tenants/acme used 900G
Decision: If refer is near refquota, the fix is to delete/compact data in that dataset, expire snapshots that are inflating referenced blocks (rare but possible depending on clone relationships), or increase the refquota intentionally.
5) If the error is ENOSPC, check reservations next
cr0x@server:~$ zfs get -r -o name,property,value reservation,refreservation tank | egrep -v ' none$'
tank/prod reservation 2T
tank/prod/db refreservation 300G
Decision: Reservations might be consuming the allocatable free space. Either reduce them, add capacity, or move workloads—preferably not during the incident unless you’re already in the “choose pain” phase.
6) Find the top space consumers by referenced space
cr0x@server:~$ zfs list -o name,refer -r tank | sort -h -k2 | tail -n 15
tank/prod/db 420G
tank/tenants/acme 498G
tank/backup/job-17 1.10T
Decision: refer ranks datasets by their live footprint, which is a fast way to shortlist candidates for cleanup. Just remember that actual reclaim can be less than refer if blocks are shared with snapshots or clones.
Common mistakes, symptoms, and fixes
Mistake 1: Setting quota when you meant refquota
Symptom: A tenant dataset appears limited, but child datasets can still grow in ways you didn’t anticipate, or the limit applies to the whole subtree and causes internal fights.
Fix: Apply refquota to each tenant dataset that represents a billable unit. Use quota only when you explicitly want a subtree cap.
cr0x@server:~$ sudo zfs set refquota=200G tank/tenants/zephyr
Mistake 2: Reading USED as “what this dataset costs”
Symptom: You delete data from a dataset, but its USED barely changes. Or a dataset looks huge because it has children.
Fix: Use refer for the dataset’s own footprint; use usedbysnapshots to see snapshot tax; use usedbychildren to understand subtree growth.
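One command shows the whole breakdown side by side, so you can see where USED actually goes (dataset name as in the earlier examples):

cr0x@server:~$ zfs list -o name,used,usedbydataset,usedbysnapshots,usedbychildren,usedbyrefreservation,refer tank/tenants/acme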
Mistake 3: Not accounting for snapshot retention
Symptom: Pool fills slowly but relentlessly, even though active datasets are stable and “cleanup” doesn’t help.
Fix: Inspect snapshot usage and retention. Remove holds. Adjust schedules. Consider per-dataset policies rather than one global snapshot plan.
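A quick way to spot retention drift is to count snapshots per dataset; the datasets with the most snapshots are usually where the policy quietly diverged from intent:

cr0x@server:~$ zfs list -H -t snapshot -o name -r tank | awk -F@ '{print $1}' | sort | uniq -c | sort -n | tail -n 10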
Mistake 4: Clone dependency chains that prevent cleanup
Symptom: You can’t destroy an old snapshot because “snapshot has dependent clones,” or space doesn’t return after deleting the “big thing.”
Fix: Audit origin relationships. Promote clones when appropriate. Don’t keep clones tied to a months-old template snapshot unless you enjoy archaeology.
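Snapshots expose a clones property listing their dependent clones, which tells you exactly what is blocking a destroy; using the template snapshot from the earlier clone example:

cr0x@server:~$ zfs get -o name,property,value clones tank/vm/template@golden-2025-12-01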
Mistake 5: Confusing physical and logical sizes
Symptom: Users report “I only have 300G of data,” but you see 800G logical. Or finance wants chargeback and numbers don’t match expectations.
Fix: Decide which number is policy: physical (referenced) for capacity, logical (logicalreferenced) for perceived data volume. Communicate it. Put it in runbooks.
Mistake 6: Overusing reservations
Symptom: Pool has free space but writes fail for some datasets; or one workload seems “immune” while others starve.
Fix: Audit reservation/refreservation. Remove or right-size. Use reservations sparingly and intentionally.
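Right-sizing is just a property change; a sketch with illustrative values (clearing a reservation is setting it to none):

cr0x@server:~$ sudo zfs set reservation=none tank/prod
cr0x@server:~$ sudo zfs set refreservation=100G tank/prod/db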
Mistake 7: Expecting refquota to protect the pool
Symptom: Every tenant has a refquota, but the pool still fills and the platform team is surprised.
Fix: Refquota is per-dataset enforcement, not a pool capacity plan. You still need pool headroom targets, snapshot retention controls, and monitoring for pool-wide consumers like backups.
Checklists / step-by-step plan
Checklist A: Implement refquota for multi-tenant datasets
- Define billable units. One dataset per tenant/VM/container root is ideal.
- Choose a quota basis. Physical (refquota on referenced) is easiest to enforce; document it.
- Set refquota on each tenant dataset.
- Optionally set a parent quota as a circuit breaker. Use quota on the parent to cap the whole program.
- Build a snapshot policy per class of data. CI caches are not databases; don’t treat them the same.
- Monitor near-limit datasets. Alert on refer/refquota > 85–90% and on usedbysnapshots growth rate (a one-liner sketch follows the commands below).
cr0x@server:~$ sudo zfs set refquota=300G tank/tenants/acme
cr0x@server:~$ sudo zfs set refquota=500G tank/tenants/zephyr
cr0x@server:~$ sudo zfs set quota=5T tank/tenants
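For the near-limit alert, a rough one-liner sketch using parseable output; it assumes every tenant dataset has a refquota (unset refquotas report 0 and are skipped), and the thresholds belong in your monitoring system, not in a shell loop:

cr0x@server:~$ for ds in $(zfs list -H -o name -r -d 1 tank/tenants | tail -n +2); do q=$(zfs get -Hp -o value refquota "$ds"); r=$(zfs get -Hp -o value referenced "$ds"); [ "$q" -gt 0 ] && echo "$ds $(( 100 * r / q ))%"; done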
Checklist B: Incident response for “pool is filling”
- Check pool CAP and health (zpool list, zpool status).
- Rank datasets by usedbysnapshots and refer.
- Check holds on large snapshots that should have expired.
- Check clone dependencies preventing snapshot deletion.
- Adjust retention and delete snapshots surgically (not globally, unless you’re already in emergency mode).
- If the pool is hot (>85–90%), plan immediate headroom: delete, move, or expand. Quotas won’t create space.
Checklist C: Ongoing hygiene that keeps refquota honest
- Weekly report: top usedbysnapshots, top refer, datasets near refquota (a sketch follows this list).
- Monthly review of reservations.
- Snapshot lifecycle review after major product changes (new logging, new build artifacts, new replication).
- Template/clone lifecycle policy: when to promote, when to rebuild from scratch.
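A minimal version of that weekly report, assuming the tenant layout used throughout this article; adjust the dataset root and the cut-offs to your environment:

cr0x@server:~$ zfs list -o name,refer,refquota,usedbysnapshots -r tank/tenants
cr0x@server:~$ zfs list -t snapshot -o name,used -s used -r tank | tail -n 10
cr0x@server:~$ zpool list -o name,size,alloc,free,cap,health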
FAQ
1) Is refquota enforced on logical or physical space?
refquota is enforced on referenced space as ZFS accounts it (effectively physical allocation after compression). That’s why a compressed dataset can store more logical data than the refquota might suggest.
2) If I set refquota, can snapshots still fill my pool?
Yes. Snapshots can retain old blocks and consume pool space. Refquota limits how much the live dataset references, not how much historical data your snapshot policy retains across the pool.
3) Why does a dataset show low REFER but high USED?
Because USED can include snapshot space and child datasets. REFER is “what this dataset itself references.” Always break down with usedbydataset, usedbysnapshots, and usedbychildren.
4) What error will applications see when refquota is hit?
Commonly Disc quota exceeded (EDQUOT). That’s a clue you hit a dataset limit, not pool exhaustion.
5) Should I use quota or refquota for home directories?
If each user has their own dataset, use refquota. If users are directories within one dataset, dataset quotas won’t help; you’d need user/group quotas (a different mechanism) or restructure into per-user datasets.
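If users do share one dataset, the user-quota layer looks like this; the dataset, user name, and size are illustrative:

cr0x@server:~$ sudo zfs set userquota@alice=50G tank/home
cr0x@server:~$ zfs userspace -o type,name,used,quota tank/home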
6) Does refquota include descendants?
No. That’s the point. refquota limits the dataset itself, excluding children. If you want a subtree cap, that’s quota.
7) How does refquota interact with reservations?
A reservation guarantees space availability; a refquota caps growth. You can have both: guarantee 50G with refreservation and cap at 500G with refquota. Misuse can create confusing free-space behavior, so monitor both.
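The combination described above, as commands (values illustrative):

cr0x@server:~$ sudo zfs set refreservation=50G tank/tenants/acme
cr0x@server:~$ sudo zfs set refquota=500G tank/tenants/acme
cr0x@server:~$ zfs get -o name,property,value refreservation,refquota tank/tenants/acme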
8) Why did increasing refquota not fix ENOSPC?
Because ENOSPC is often pool-level exhaustion (or reservation constraints), not a dataset limit. If the pool is at high capacity, raising a refquota just changes who gets to fail next.
9) Can I rely on refquota for chargeback?
You can rely on it for enforcement and blast-radius control. For chargeback, decide whether you bill on referenced physical space, logical space, or provisioned limits. The most defensible operationally is billing on enforced limits (refquota) plus overage exceptions.
10) What’s the simplest policy that works?
One dataset per tenant/workload, a refquota on each, an optional parent quota as a circuit breaker, and a snapshot retention policy that matches the data class. Everything else is refinement.
Conclusion
ZFS doesn’t lie about space. It reports space the way a copy-on-write, snapshotting, cloning filesystem must report it: blocks and references, not just files and folders. The problem is that humans keep asking ZFS questions in the wrong dialect, then act surprised by the answers.
refquota is the quota that translates ZFS’s block-level reality into a policy boundary you can enforce per dataset. It won’t stop your pool from filling if your snapshot policy is reckless, and it won’t make clones “free forever.” But it will stop the most common operational argument—“that space isn’t mine”—by making “mine” a measurable, enforceable footprint. In multi-tenant storage, that’s not just convenient. It’s the difference between a stable platform and a weekly ritual of storage blame.