Multi-tenant storage fails in the least poetic way possible: the pool hits 100%, metadata updates stall, and suddenly your “small issue”
becomes an outage with a meeting invite. One noisy tenant doesn’t need malice; a runaway build cache or a log loop will do.
ZFS gives you the tools to keep tenants in their lane. The trick is choosing the right kind of quota, placing it at the right boundary,
and understanding how snapshots and reservations bend your mental model. If you get any of those wrong, you haven’t enforced fairness—
you’ve just created a new and exciting failure mode.
Design goals: what “safe multi-tenant” actually means
Multi-tenant ZFS safety is not “everyone gets a quota.” It’s a set of explicit outcomes:
- One tenant cannot fill the pool (or if they can, you notice early and the blast radius is bounded).
- Pool free space stays above a safety floor so ZFS can allocate, flush, and keep latency sane.
- Tenants get predictable errors: ideally EDQUOT (quota exceeded), not ENOSPC (pool full), and not “everything is slow.”
- Operations can explain space usage without interpretive dance: “this dataset is big because snapshots” is a real answer.
- Deletion works when you need it. “Disk full and you can’t delete” is a classic storage horror story.
You’re designing boundaries. Datasets are those boundaries. Quotas enforce them. Reservations guarantee them. And snapshots are the
boundary-crossing gremlins you must account for or you’re just doing performance art.
Opinionated guidance: use datasets as tenant containers, not just directories. If you can’t put a ZFS property on it,
you can’t reliably govern it.
Interesting facts and historical context
- ZFS was built at Sun in the mid-2000s with end-to-end data integrity and pooled storage as first-class goals, not bolt-ons.
- Quotas arrived early because ZFS expected consolidation: multiple consumers sharing a pool, each needing predictable limits.
- Snapshots are cheap to create because they’re metadata-only at birth; the cost shows up later via referenced blocks.
- “Referenced” vs “used” in ZFS reporting exists specifically because snapshots complicate “how much space is mine?”
- Reservations were designed for fairness and availability: they keep critical datasets alive even when the pool is pressured.
- Zvols and filesystems are governed differently: quotas on filesystems don’t directly map to zvol consumers; provisioning strategies matter.
- Historically, ZFS wanted free space headroom (often 10–20%) to keep allocation efficient and avoid pathological fragmentation and latency spikes.
- OpenZFS evolved the tooling (like expanded quota reporting) as operators deployed it in larger, noisier multi-tenant environments.
Quota primitives: quota, refquota, reservations, and why names matter
Dataset boundaries are the policy boundary
ZFS doesn’t do “quotas on a directory tree” in the same native way traditional filesystems do. It does properties on datasets.
That’s a feature. It forces you to define real tenants. A tenant is a dataset. Everything else is an implementation detail.
quota: limits the dataset and its descendants
quota caps the total space a dataset can consume, including space used by descendants (child datasets).
This is the right tool when the tenant owns a subtree of datasets.
But it’s also the tool that surprises people because it interacts with snapshots. If your tenant’s dataset has snapshots,
the blocks held by snapshots count toward usage in a way that can be unintuitive. If you want “the tenant’s live data” capped,
you probably want refquota.
refquota: limits referenced space (live data), not snapshots
refquota caps the dataset’s referenced space: the blocks currently reachable from the dataset’s head.
Snapshots are not part of “referenced,” so tenants can’t get stuck because old snapshots are holding space hostage.
That sounds like magic. It’s not. The pool can still fill because snapshots still consume pool space. You’ve just moved the blast radius:
you prevented the tenant from getting random EDQUOT because of retention, but you did not prevent pool-wide ENOSPC.
reservation and refreservation: guaranteed space, but not free lunch
Reservations carve out space that cannot be used by others. They’re your “keep this service alive” lever.
reservation guarantees space for the dataset and all of its descendants. refreservation guarantees space for the dataset's referenced data alone, excluding snapshots and children.
Reservations can save you in a pool pressure event. They can also turn “we are low” into “we are dead” if overused, because they make
free space look available to the pool but unavailable to most datasets.
Why “one user killing the pool” still happens with quotas
Quotas stop a tenant from writing beyond a limit. They do not automatically enforce a pool-wide safety floor.
If you set quotas that sum to 200% of the pool, you’ve created oversubscription. That might be fine for many workloads.
It might also be how you end up learning what “space accounting under snapshots” means at high speed.
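The oversubscription check itself is one line of arithmetic, so there is no excuse not to run it. A minimal sketch with made-up quota values; in practice you would feed it numbers parsed from `zfs get -Hp quota`:

```python
# Sketch: audit quota oversubscription against pool size.
# The sample values are illustrative, not from a real pool.

def oversubscription_ratio(quotas_bytes, pool_size_bytes):
    """Return sum(quotas) / pool size. Above 1.0 means oversubscribed."""
    return sum(quotas_bytes) / pool_size_bytes

TIB = 1024 ** 4
# Hypothetical tenant quotas: 5T + 3T + 7T on a 21.8T pool.
quotas = [5 * TIB, 3 * TIB, 7 * TIB]
pool = 21.8 * TIB

ratio = oversubscription_ratio(quotas, pool)
print(f"oversubscription: {ratio:.2f}x")  # above 1.0x needs a written justification
```

Oversubscription is not automatically wrong; undocumented oversubscription is.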
Paraphrased idea: when you build systems, you trade easy problems for hard ones; reliability work is choosing the hard problems you can monitor.
— Charity Majors (paraphrased)
Also: quotas don’t reduce write amplification. A tenant can stay under quota and still destroy latency by forcing fragmentation,
sync-heavy workloads, or small-block churn. Quotas are about capacity governance, not performance governance. You need both.
Joke #1: A quota is like a diet—effective until you discover snapshots are the midnight snacks you didn’t log.
Dataset layout models that don’t hate you back
Model A: one dataset per tenant (the default winner)
Create pool/tenants/$tenant as a filesystem dataset. Put everything for that tenant there.
Apply quotas, compression, recordsize choices, snapshot policies, and mountpoints per tenant.
Pros: clean governance, easy reporting, low cognitive load. Cons: more datasets (which is fine until you get silly), and you need automation.
Model B: parent dataset with child datasets per service
Example: pool/tenants/acme/home, pool/tenants/acme/db, pool/tenants/acme/cache.
Put a quota on the parent to bound the total tenant footprint, and refquota on specific children to keep live data sane.
This model lets you tune properties per workload (database recordsize, logbias, compression) while still enforcing a tenant-level cap.
It’s a grown-up design when you operate platform services.
Model C: directory-per-tenant inside one dataset (avoid)
Traditional UNIX admins love this because it’s simple: /srv/tenants/acme, /srv/tenants/zenith.
On ZFS, it’s the wrong abstraction. You lose native governance and end up bolting on user/group quotas, project quotas, or external tooling.
There are valid reasons—like millions of tenants where dataset count becomes a management issue—but make that choice with eyes open.
For most corporate multi-tenant systems (dozens to thousands), dataset-per-tenant is both safer and simpler.
Model D: zvol-per-tenant (only when you must)
If tenants need block devices (VM disks, iSCSI LUNs), you’ll use zvols. A zvol’s capacity cap is its volsize property; filesystem quotas don’t apply.
Thin provisioning can oversubscribe a pool hard if you’re not careful. For multi-tenant, you must pair this with strict monitoring
and a pool safety floor.
Snapshots: the silent quota bypass
The two most common “quota surprises” are:
- The tenant hits their quota even after deleting a bunch of files.
- The tenant stays under quota but the pool still fills and everyone suffers.
How snapshots mess with deletion
If a snapshot references blocks that a file used, deleting the file from the live dataset doesn’t free those blocks. The snapshot still owns them.
This is why operators say “space is stuck in snapshots.” It’s not stuck; it’s correctly accounted to history.
If you used quota (not refquota), snapshot-held blocks contribute to “used” and can keep a tenant pinned at quota.
The tenant will swear they deleted things. They did. Your retention policy disagrees.
Why refquota helps users but can hurt pools
refquota is a user-experience improvement: it makes quota enforcement track the live dataset head.
But it shifts the risk: snapshots can grow until the pool is pressured. If you choose refquota, you must also choose:
snapshot limits, retention discipline, and pool-wide alerting.
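One way to operationalize that: rank datasets by USED minus REFER, the space owned by history and descendants rather than the live head. A sketch assuming input shaped like `zfs list -Hp -o name,used,refer` (tab-separated, bytes); the sample numbers are invented:

```python
# Sketch: rank datasets by USED - REFER, the space held outside the
# live head (snapshots plus descendants). Input mimics
# `zfs list -Hp -o name,used,refer`; the sample is made up.

SAMPLE = """\
tank/tenants/zenith\t6827933661184\t1121501859840
tank/tenants/acme\t4266146529280\t3980190015488
tank/tenants/blue\t2693841043456\t2638827906662
"""

def snapshot_overhead(listing):
    rows = []
    for line in listing.strip().splitlines():
        name, used, refer = line.split("\t")
        rows.append((name, int(used) - int(refer)))
    # Largest gap first: these are the datasets where history, not
    # live data, owns the space.
    return sorted(rows, key=lambda r: r[1], reverse=True)

for name, gap in snapshot_overhead(SAMPLE):
    print(f"{name}: {gap / 1024**4:.2f} TiB held outside the live head")
```

Run it from cron, alert on growth, and the refquota blind spot stops being blind.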
Snapshot retention is policy, not a backup strategy
Snapshots are great for short-term rollback, replication streams, and forensic recovery. They are not a license to keep everything forever
on your hottest pool. Treat retention like a budget: define it, enforce it, and review it when tenants change behavior.
Joke #2: Snapshots are like office junk drawers—nobody wants them, but everyone panics when you try to empty them.
Practical tasks (commands, output, decisions)
The fastest way to get quotas right is to run the same small set of commands every time, and interpret them consistently.
Below are real tasks you can execute on a ZFS host. Each includes: command, what the output means, and what decision to make.
Task 1: Confirm pool health and whether you’re already in trouble
cr0x@server:~$ zpool status -x
all pools are healthy
Meaning: no known pool errors. This does not mean you have free space, nor does it mean performance is fine.
Decision: if this is not “healthy,” fix hardware/pool errors first. Quotas won’t save a degraded pool from bad latency.
Task 2: Check pool capacity, fragmentation, and headroom
cr0x@server:~$ zpool list -o name,size,alloc,free,cap,frag,health
NAME SIZE ALLOC FREE CAP FRAG HEALTH
tank 21.8T 18.9T 2.9T 86% 42% ONLINE
Meaning: 86% used, fragmentation rising. Many ZFS pools get unpleasant above ~85–90%, depending on workload.
Decision: if cap > 85%, treat quotas as secondary; you need a capacity plan (delete snapshots, add vdevs, move tenants).
Task 3: Identify the biggest datasets first (the usual suspects)
cr0x@server:~$ zfs list -o name,used,refer,avail,mountpoint -S used | head -n 10
NAME USED REFER AVAIL MOUNTPOINT
tank/tenants 12.5T 192K 2.90T /srv/tenants
tank/tenants/zenith 6.21T 1.02T 790G /srv/tenants/zenith
tank/tenants/acme 3.88T 3.62T 1.12T /srv/tenants/acme
tank/tenants/blue 2.45T 2.40T 550G /srv/tenants/blue
tank/backups 1.91T 1.88T 2.90T /tank/backups
Meaning: notice USED vs REFER. zenith has huge USED but small REFER: snapshots or descendants own the difference.
Decision: if USED ≫ REFER, investigate snapshots/children before yelling at the tenant.
Task 4: See quotas and reservations applied across tenants
cr0x@server:~$ zfs get -r -o name,property,value,source quota,refquota,reservation,refreservation tank/tenants | head -n 25
NAME PROPERTY VALUE SOURCE
tank/tenants quota none default
tank/tenants refquota none default
tank/tenants reservation none default
tank/tenants refreservation none default
tank/tenants/acme quota 5T local
tank/tenants/acme refquota none default
tank/tenants/acme reservation none default
tank/tenants/acme refreservation none default
tank/tenants/blue quota 3T local
tank/tenants/blue refquota 2500G local
tank/tenants/blue reservation none default
tank/tenants/blue refreservation none default
tank/tenants/zenith quota 7T local
tank/tenants/zenith refquota 1500G local
tank/tenants/zenith reservation 500G local
tank/tenants/zenith refreservation none default
Meaning: you can audit governance quickly. Mixed strategy is fine, but it must be intentional.
Decision: if tenants rely on “deletes free space,” favor refquota plus snapshot controls. If you want “all in,” use quota.
Task 5: Set a tenant quota (hard cap) and immediately verify
cr0x@server:~$ sudo zfs set quota=2T tank/tenants/acme
cr0x@server:~$ zfs get -o name,property,value quota tank/tenants/acme
NAME PROPERTY VALUE
tank/tenants/acme quota 2T
Meaning: writes that would exceed 2T for that dataset subtree will fail with quota errors.
Decision: if acme has child datasets, remember quota includes them. If you want only the head dataset capped, use refquota.
Task 6: Set refquota for “live data” and confirm refer behavior
cr0x@server:~$ sudo zfs set refquota=1500G tank/tenants/acme
cr0x@server:~$ zfs get -o name,property,value refquota tank/tenants/acme
NAME PROPERTY VALUE
tank/tenants/acme refquota 1500G
Meaning: the dataset head can’t exceed 1.5T referenced. Snapshots can still grow.
Decision: pair this with snapshot retention/limits or you’re just postponing the argument until the pool is full.
Task 7: Guarantee headroom for a critical service using reservation
cr0x@server:~$ sudo zfs set reservation=200G tank/tenants/platform
cr0x@server:~$ zfs get -o name,property,value reservation tank/tenants/platform
NAME PROPERTY VALUE
tank/tenants/platform reservation 200G
Meaning: 200G is carved out for that dataset tree. Other tenants can’t consume it.
Decision: use reservations sparingly. They are for “must keep running” datasets, not for political comfort.
Task 8: Spot snapshot-driven usage growth on a dataset
cr0x@server:~$ zfs list -d 1 -t snapshot -o name,used,refer,creation -S used tank/tenants/zenith | head -n 8
NAME USED REFER CREATION
tank/tenants/zenith@daily-2025-12-25 210G 1.02T Thu Dec 25 01:00 2025
tank/tenants/zenith@daily-2025-12-24 198G 1.01T Wed Dec 24 01:00 2025
tank/tenants/zenith@daily-2025-12-23 176G 1.00T Tue Dec 23 01:00 2025
tank/tenants/zenith@daily-2025-12-22 165G 1008G Mon Dec 22 01:00 2025
tank/tenants/zenith@daily-2025-12-21 152G 1004G Sun Dec 21 01:00 2025
tank/tenants/zenith@daily-2025-12-20 141G 1001G Sat Dec 20 01:00 2025
tank/tenants/zenith@daily-2025-12-19 135G 999G Fri Dec 19 01:00 2025
Meaning: each snapshot’s USED is the unique blocks held by that snapshot. Growth here often means churn (rewrites) in the live dataset.
Decision: if snapshot USED is ballooning, shorten retention, move churny workloads, or tune workload (e.g., stop rewriting giant files).
Task 9: Confirm what space is actually available to a tenant under quota
cr0x@server:~$ zfs list -o name,avail,used,quota,refquota tank/tenants/acme
NAME AVAIL USED QUOTA REFQUOTA
tank/tenants/acme 320G 1.68T 2T 1500G
Meaning: AVAIL reflects the tighter of pool free space and quota/refquota headroom. Here the 2T quota is the limiter: 2T minus 1.68T USED leaves 320G.
Decision: if AVAIL is unexpectedly tiny, check whether refquota is lower than intended, or whether snapshots/descendants are counted via quota.
Task 10: Find which children are consuming a parent tenant quota
cr0x@server:~$ zfs list -r -o name,used,refer,quota,refquota -S used tank/tenants/acme
NAME USED REFER QUOTA REFQUOTA
tank/tenants/acme 1.68T 1.45T 2T 1500G
tank/tenants/acme/cache 220G 210G none 250G
tank/tenants/acme/db 110G 108G none none
tank/tenants/acme/home 35G 34G none none
Meaning: the cache is large and close to its refquota. That’s often correct: caches should be bounded.
Decision: if the cache is unbounded, set a refquota. If db is spiky, consider separate quotas and reservation to keep it alive.
Task 11: Identify whether “space not freeing” is snapshots vs open files
cr0x@server:~$ sudo zfs destroy -nv tank/tenants/zenith@daily-2025-12-19
would destroy tank/tenants/zenith@daily-2025-12-19
would reclaim 135G
Meaning: a dry-run destroy tells you reclaimable space if you remove a snapshot. This is gold for decision-making.
Decision: if reclaimable space is large and you’re in trouble, delete snapshots (starting with oldest) per policy.
If reclaim is tiny, you’re not chasing snapshots—look for open-but-deleted files or other datasets.
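When the answer is “delete snapshots,” you can plan the deletions as a budget. A sketch with hypothetical sizes; note that per-snapshot USED undercounts blocks shared between adjacent snapshots, so deleting a run often reclaims more than the sum. Treat the estimate as a lower bound and confirm with the dry run before acting:

```python
# Sketch: pick the shortest oldest-first run of snapshot deletions
# whose summed USED meets a reclaim target. Per-snapshot USED is a
# lower bound (shared blocks are charged to neither neighbor), so
# always confirm the real number with a dry-run destroy.
# Snapshot names and sizes below are hypothetical.

GIB = 1024 ** 3
# (snapshot, USED bytes), oldest first.
snaps = [
    ("daily-2025-12-19", 135 * GIB),
    ("daily-2025-12-20", 141 * GIB),
    ("daily-2025-12-21", 152 * GIB),
]

def plan_deletions(snaps_oldest_first, target_bytes):
    """Return (names to delete, conservative reclaim estimate)."""
    total, plan = 0, []
    for name, used in snaps_oldest_first:
        if total >= target_bytes:
            break
        plan.append(name)
        total += used
    return plan, total

plan, est = plan_deletions(snaps, 250 * GIB)
print(plan, f"~{est / GIB:.0f}G lower-bound reclaim")
```

Oldest-first matters: deleting the newest snapshot first tends to reclaim less and breaks incremental replication chains.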
Task 12: Check for open-but-deleted files pinning space (classic ENOSPC bait)
cr0x@server:~$ sudo lsof +L1 /srv/tenants/acme | head
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NLINK NODE NAME
java 2714 acme 12w REG 0,118 2147483648 0 553211 /srv/tenants/acme/logs/app.log (deleted)
Meaning: the process still holds a file descriptor to a deleted file. Space won’t be freed until the process closes it.
Decision: restart or signal the process to reopen logs. Don’t delete more files; you’ll just create more “deleted but open.”
Task 13: Confirm mountpoints and avoid “writing outside the dataset you think”
cr0x@server:~$ zfs list -o name,mountpoint,canmount tank/tenants/acme
NAME MOUNTPOINT CANMOUNT
tank/tenants/acme /srv/tenants/acme on
Meaning: if mountpoints are wrong, tenants can write to the parent dataset (which has no quota) and bypass limits.
Decision: verify every tenant’s mountpoint and ensure parent datasets are not writable by tenants.
Task 14: Apply “boring guardrails” on the parent dataset
cr0x@server:~$ sudo zfs set readonly=on tank/tenants
cr0x@server:~$ zfs get -o name,property,value readonly tank/tenants
NAME PROPERTY VALUE
tank/tenants readonly on
Meaning: tenants can’t accidentally write to the parent mount (if it’s even mounted). Beware that readonly is inherited: give each tenant dataset an explicit readonly=off, or use canmount=off on the parent so it is never mounted at all.
Decision: for multi-tenant, make parents non-writable (or unmounted) and use explicit child mountpoints. It prevents accidental bypass.
Task 15: Monitor per-dataset logical space pressure (quota nearing)
cr0x@server:~$ zfs list -o name,used,quota,refquota,available -r tank/tenants | awk 'NR==1 || $3!="none" || $4!="none"{print}'
NAME USED QUOTA REFQUOTA AVAIL
tank/tenants/acme 1.68T 2T 1500G 320G
tank/tenants/blue 2.45T 3T 2500G 550G
tank/tenants/zenith 6.21T 7T 1500G 790G
Meaning: a quick view of governed datasets. AVAIL gives you a near-term “will writes fail soon?” indicator.
Decision: alert on %used of quota and also on pool cap. A tenant can be fine while the pool is not.
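The alerting logic is simple enough to sketch outright. The thresholds and sample numbers below are assumptions to adapt, not a recommendation for your pool:

```python
# Sketch: two-level alerting on per-tenant quota usage plus pool
# capacity. Thresholds and sample numbers are assumptions.

def alerts(tenants, pool_cap_pct,
           tenant_warn=0.80, tenant_crit=0.90,
           pool_levels=(80, 85, 90)):
    out = []
    # Per-tenant: fraction of quota consumed.
    for name, used, quota in tenants:
        frac = used / quota
        if frac >= tenant_crit:
            out.append(f"CRIT tenant {name} at {frac:.0%} of quota")
        elif frac >= tenant_warn:
            out.append(f"WARN tenant {name} at {frac:.0%} of quota")
    # Pool-wide: fire every threshold already crossed, so nobody can
    # miss the trend line.
    for level in pool_levels:
        if pool_cap_pct >= level:
            out.append(f"POOL at {pool_cap_pct}% (threshold {level}%)")
    return out

TIB = 1024 ** 4
sample = [("acme", 1.68 * TIB, 2 * TIB), ("blue", 2.45 * TIB, 3 * TIB)]
for line in alerts(sample, pool_cap_pct=86):
    print(line)
```

The point of the two loops: a tenant alert and a pool alert are different pages with different runbooks, and you need both.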
Task 16: For zvol tenants, verify thin provisioning risk
cr0x@server:~$ zfs list -t volume -o name,volsize,used,refer,logicalused,logicalrefer -S logicalused
NAME VOLSIZE USED REFER LOGICALUSED LOGICALREFER
tank/vm/tenant01 800G 120G 120G 640G 640G
tank/vm/tenant02 800G 160G 160G 790G 790G
Meaning: logicalused shows what the guest thinks it used; USED is what the pool actually allocated.
Thin provisioning hides risk until it doesn’t.
Decision: if logicalused approaches volsize across many tenants, treat it as real capacity pressure and budget space accordingly.
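That check is worth automating. A sketch flagging zvols whose guest-visible usage approaches volsize; the 90% threshold and the sample numbers (mirroring the illustrative listing above) are assumptions:

```python
# Sketch: flag thin-provisioned zvols whose guest-visible usage
# (logicalused) is closing in on volsize, i.e. space the pool may
# still owe. Threshold and sample numbers are assumptions.

GIB = 1024 ** 3

def thin_risk(zvols, threshold=0.90):
    """Return (name, worst-case bytes still owed) for risky zvols."""
    risky = []
    for name, volsize, used, logicalused in zvols:
        if logicalused / volsize >= threshold:
            # Worst case the pool still owes roughly volsize - used
            # (compression may soften this, but don't bank on it).
            risky.append((name, volsize - used))
    return risky

zvols = [
    ("tank/vm/tenant01", 800 * GIB, 120 * GIB, 640 * GIB),
    ("tank/vm/tenant02", 800 * GIB, 160 * GIB, 790 * GIB),
]
for name, owed in thin_risk(zvols):
    print(f"{name}: pool may still owe ~{owed / GIB:.0f}G")
```

Sum the “owed” column across all tenants and compare it to pool free space; that is your real thin-provisioning exposure.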
Fast diagnosis playbook
When a multi-tenant pool is in trouble, you don’t have time for philosophical purity. You need a fast path to: “what is filling what?”
and “is this capacity or performance?”
First: confirm whether you have a pool-wide emergency
- Pool capacity: zpool list -o name,alloc,free,cap,frag. If cap is > 90%, assume everything will get weird.
- Pool health: zpool status. If degraded, expect worse latency and slower deletes.
- Immediate reclaim candidates: zfs list -t snapshot -o name,used -S used.
Second: identify whether the pain is “quota hit” or “pool full”
- If tenants see errors like “Disk quota exceeded,” you’re dealing with dataset-level governance.
- If everyone sees “No space left on device,” you’re dealing with pool-level exhaustion or reservation starvation.
- Check zfs get available,quota,refquota on the impacted dataset and compare to pool free.
Third: decide snapshots vs open files vs a different dataset
- Snapshots: if USED ≫ REFER on the dataset, list snapshots and do a dry-run destroy to estimate reclaim.
- Open-but-deleted files: run lsof +L1 on the mount. If present, restart the offender.
- Wrong mountpoint / bypass: verify mountpoints and check whether writes landed in a parent dataset with no quota.
Fourth: if performance is the symptom, don’t confuse it with capacity
- High fragmentation + high cap can look like “quota issues” because writes time out or stall.
- Measure IO pressure with zpool iostat -v 1 and look for saturated vdevs.
- If you’re near full, your best “performance tuning” is freeing space.
Three corporate mini-stories from the quota trenches
Mini-story 1: the outage caused by a wrong assumption
A mid-sized company ran a shared ZFS pool for internal teams: analytics, build systems, a few web properties. They did the sensible thing:
dataset per team, quotas on each dataset. They were proud. The pool was stable. Then one Monday, half the CI jobs failed with ENOSPC.
The on-call assumed a team had exceeded its quota. But quotas were fine. Each team dataset still had headroom.
The pool, however, was at 98%, and ZFS was behaving like a storage system at 98%: allocation got expensive, and metadata updates slowed down.
The wrong assumption was subtle: “If every team has a quota, the pool can’t fill.” Quotas don’t sum themselves into safety.
They had oversubscribed—quietly—because quotas were set based on business expectations, not on actual pool capacity, and retention wasn’t bounded.
The real culprit: automated snapshots kept for “a while,” which slowly became “forever” because nobody wanted to delete history.
A single team with a high-churn workload (large artifacts rewritten daily) caused snapshot growth. Their live data stayed under refquota,
but snapshots steadily ate the pool.
The fix wasn’t heroic. They defined snapshot retention per tenant class, added snapshot count limits, and set a pool safety alert at 80/85/90%.
They also started a monthly review of datasets where USED-REFER exceeded a threshold. Boring, consistent, effective.
Mini-story 2: the optimization that backfired
Another company offered “developer sandboxes” on ZFS. They wanted a great developer experience, so they switched many tenant datasets
from quota to refquota. The goal: stop devs from complaining that deleting files didn’t restore their ability to write
because snapshots were holding space.
It worked. Complaints dropped. The platform team celebrated with the kind of quiet satisfaction you only get from removing a whole class
of tickets. And then the pool started filling faster than expected, but nobody noticed immediately because tenant dashboards looked fine.
The backfire came from visibility. With refquota, tenants never hit their “limit” because their live data stayed bounded,
while snapshots were allowed to grow under the radar. The system had shifted the failure from “tenant can’t write” to “pool is full,”
which is a much worse failure in multi-tenant land.
The incident ended the usual way: they deleted snapshots under pressure, replication lag spiked, and a few restores became impossible.
Not catastrophic, but painful and avoidable.
The fix was to treat snapshot retention as part of quota governance. They implemented:
per-dataset snapshot caps, per-tenant snapshot schedules, and a report that ranked tenants by “snapshot-only space.”
Refquota stayed—but only with guardrails and a pool-wide free-space floor.
Mini-story 3: the boring but correct practice that saved the day
A regulated org ran a multi-tenant ZFS cluster for application teams. The storage engineers were allergic to surprises,
so they did two unsexy things: they kept 20% free space as policy, and they reserved a small slice for platform datasets
(logging, auth, monitoring spools).
One quarter-end, an app team’s batch job started producing far more output than normal. The tenant dataset hit its quota.
The app failed loudly—exactly what you want. The pool stayed healthy, monitoring stayed online, and other teams didn’t notice.
The on-call got a clean alert: “tenant quota exceeded.” Not “pool full.” Not “IO latency 10x.” Not “everything is on fire.”
They increased the tenant quota temporarily, but only after moving older snapshots to a colder pool and trimming retention.
The key wasn’t the quota by itself. It was the combination: a pool safety floor, reservations for essential services, and consistent reporting.
The incident stayed tenant-scoped. That’s the whole point of multi-tenant engineering.
Common mistakes: symptoms → root cause → fix
1) Symptom: “I deleted 500GB but I’m still at quota”
Root cause: snapshots still reference the deleted blocks; quota enforcement counts them.
Fix: either delete/expire snapshots, or switch to refquota for that dataset and control snapshots separately.
2) Symptom: tenant is under quota, but pool hits 100% anyway
Root cause: refquota limits only live data; snapshots, other datasets, and zvol thin provisioning still consume pool space.
Fix: enforce snapshot retention/limits, monitor “snapshot-only” growth (USED-REFER), and keep a pool-wide free-space floor.
3) Symptom: random ENOSPC even though zpool list shows free space
Root cause: reservations or special allocation constraints mean the free space isn’t usable for that dataset.
Fix: audit reservation/refreservation; reduce or remove non-critical reservations; ensure critical datasets have the reservations, not everything.
4) Symptom: tenant can write outside quota somehow
Root cause: writes are landing in a parent dataset (wrong mountpoint, bind-mount confusion, or permissions on parent mount).
Fix: lock parent datasets (readonly=on, canmount=off where appropriate), verify mountpoints, and restrict permissions.
5) Symptom: pool is not full, but latency is awful and writes crawl
Root cause: high fragmentation, small-block churn, sync-heavy workload, or a degraded vdev; capacity governance doesn’t solve IO saturation.
Fix: keep headroom, separate churny workloads into their own vdevs/pools, and measure with zpool iostat. Consider SLOG/special vdevs where appropriate.
6) Symptom: “space not freeing” after deleting big files, no snapshots found
Root cause: open-but-deleted files held by processes.
Fix: lsof +L1 to find offenders; restart or signal log rotation properly.
7) Symptom: tenant replication grows without obvious live growth
Root cause: frequent rewrites create lots of snapshot deltas; send streams grow even if live data stays stable.
Fix: reduce churn (app changes), adjust snapshot frequency, or move that tenant to a pool designed for churn.
Checklists / step-by-step plan
Step-by-step: set up a new tenant safely
- Create a dataset per tenant (or per tenant/service if you need different properties).
- Set mountpoint explicitly and ensure parent datasets are not writable by tenants.
- Choose quota model:
  - Use quota if snapshots count as “their problem” and you want a strict total cap.
  - Use refquota if you want “live data” capped and you manage snapshots centrally.
- Decide snapshot policy: frequency and retention. Put it in code, not tribal memory.
- Add alerting: quota %used, pool cap thresholds, and snapshot-only growth.
- Document the failure mode the tenant will see: EDQUOT vs ENOSPC and what they should do.
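The checklist above belongs in code, and even the command generation can be boring. A sketch that emits (not executes) the zfs commands for review; the pool layout, mount root, and refquota-based policy are assumptions to adapt:

```python
# Sketch: generate, but do not execute, the provisioning commands for
# a new tenant per the checklist above. Pool/path names and the
# refquota-based policy are assumptions; review before running.

def provision_commands(tenant, refquota="500G",
                       parent="tank/tenants", mount_root="/srv/tenants"):
    ds = f"{parent}/{tenant}"
    return [
        # Explicit mountpoint so writes land in the governed dataset.
        f"zfs create -o mountpoint={mount_root}/{tenant} {ds}",
        # Live-data cap; snapshot retention is governed centrally.
        f"zfs set refquota={refquota} {ds}",
        # Keep the parent non-writable so nothing bypasses the quota.
        f"zfs set readonly=on {parent}",
    ]

for cmd in provision_commands("newco", refquota="1T"):
    print(cmd)
```

Generating commands instead of running them keeps a human (or a review step) between policy and the pool, which is exactly where you want one.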
Step-by-step: enforce pool safety floor (the “don’t page me” plan)
- Pick a target free-space floor (commonly 10–20% depending on workload and vdev layout).
- Alert early at multiple thresholds (e.g., 80/85/90%), not just at 95% when it’s already miserable.
- Audit oversubscription: sum of quotas vs pool size; accept oversubscription only if you can explain why it’s safe.
- Limit snapshot growth: retention limits and (where supported by your tooling) snapshot count/space caps per tenant.
- Keep platform datasets reserved: monitoring, logging spools, and auth metadata should not be competing with tenants during an incident.
Step-by-step: respond when the pool is near full
- Stop the bleeding: identify the fastest reclaim (usually snapshots) and confirm reclaim with zfs destroy -nv.
- If tenants are writing outside quotas, fix mountpoints and permissions immediately.
- Check for open-but-deleted files and restart offenders.
- Trim snapshot retention temporarily, then restore a sane policy with approvals.
- Schedule capacity expansion or data movement; “we’ll be careful” is not a capacity plan.
FAQ
1) Should I use quota or refquota for tenants?
If tenants manage their own snapshots or you want “total footprint including history” capped, use quota.
If you centrally manage snapshots and want user experience to reflect live data, use refquota, but then you must govern snapshot growth separately.
2) Can quotas prevent a pool from hitting 100%?
Not by themselves. Quotas limit datasets. Pool-level exhaustion still happens via snapshots, other datasets, zvol thin provisioning,
reservations, and oversubscription. You still need a pool headroom policy and alerting.
3) Why does USED differ so much from REFER?
REFER is the space referenced by the dataset head (live view). USED includes snapshot-held blocks and descendants.
A big gap usually means snapshots or child datasets.
4) What error will applications see when a quota is hit?
Typically “Disk quota exceeded” (EDQUOT). If the pool itself is out of space, they’ll see “No space left on device” (ENOSPC),
which affects everyone and is far worse operationally.
5) If I delete snapshots, will I always get space back immediately?
Usually yes, but the amount reclaimed depends on block sharing. Use zfs destroy -nv on the snapshot to estimate reclaim.
If reclaim is small, the snapshot isn’t your main issue.
6) Are reservations a good way to “protect” each tenant?
No. Reservations are for protecting critical services, not for making everyone feel safe. Overusing reservations can starve the pool
and cause confusing ENOSPC behavior even when the pool reports free space.
7) How do I stop tenants from bypassing quotas by writing elsewhere?
Use dataset-per-tenant mountpoints, make parent datasets non-writable, verify mountpoint and canmount,
and ensure permissions don’t allow writes to shared parents.
8) Do snapshots count against refquota?
No, that’s the point. Snapshots still count against the pool, though. Refquota is a per-dataset live-data cap, not a pool safety mechanism.
9) What’s the simplest multi-tenant pattern that works in production?
One dataset per tenant, a clear quota model (quota or refquota), automated snapshots with strict retention,
and alerts on both tenant limits and pool headroom. Keep it boring.
Conclusion: next steps that prevent the 2 a.m. page
ZFS quotas are not a nice-to-have; they’re how you prevent one tenant from turning shared storage into a shared incident.
But quotas only work when your dataset layout matches your tenancy model, and when snapshots and reservations are treated as first-class policy.
Practical next steps:
- Audit your tenant boundaries: if tenants are directories, plan a migration to dataset-per-tenant.
- Pick quota semantics intentionally: quota for total footprint, refquota for live-data UX, then implement the missing guardrails.
- Implement snapshot retention limits and a report for “snapshot-only space” growth.
- Set a pool free-space floor and alert before 85% usage; don’t wait for 95% to discover physics.
- Reserve space only for platform-critical datasets so you can still operate when tenants misbehave.
Do this well and “one user killed the pool” becomes a story you tell new hires as a warning, not a quarterly tradition.