In ZFS, “undeletable snapshot” usually means “snapshot with a dependency you didn’t notice.” ZFS is ruthlessly consistent: it won’t delete history that something else still needs—whether that “something” is a clone, a hold tag from a backup pipeline, a replication policy, or your own earlier decision to treat snapshots like a trash can.
This is a practical field guide to making those snapshots go away safely. We’ll cover what actually blocks deletion, how to prove it with commands, and how to clean up without turning a healthy pool into a career-limiting event. I’ll lean on the two ZFS truths you learn in production: the data always has an owner, and the bill always comes due.
What “undeletable” really means in ZFS
When people say they “can’t delete a ZFS snapshot,” the system is usually telling them one of these things:
- A hold exists on the snapshot, so ZFS refuses to destroy it until the hold is released.
- A clone depends on it (a writable dataset created from a snapshot). ZFS won’t delete the snapshot because it’s literally the origin of a live filesystem.
- The snapshot is “busy” due to ongoing operations (send/receive, snapshotting, or something holding references).
- Replication, backup tooling, or policies keep recreating it, so it looks like deletion “doesn’t work.”
- You’re deleting the wrong thing (e.g., you removed a snapshot on the source, but the target still has it; or you destroyed a bookmark instead of a snapshot; or the snapshot name has special characters and your shell lied).
And the other common complaint—“I deleted snapshots, but space didn’t free”—isn’t undeletable. It’s accounting. Space in ZFS doesn’t free until the last reference to the blocks is gone, and those blocks may still be referenced by newer snapshots, clones, or just plain old data still present in the live filesystem. ZFS is many things, but it’s not a magic shredder.
First joke (as promised, only two total): ZFS snapshots are like receipts—easy to collect, hard to get rid of, and you only need them when you’ve already made a terrible decision.
Interesting facts & short history: why ZFS behaves this way
Some context makes the behavior feel less like sabotage and more like engineering:
- ZFS snapshots are copy-on-write views. No in-place overwrite; snapshots keep references to old block versions. That’s why they’re instant and why they can keep space “pinned.”
- “Undeletable” is often a feature. Holds were designed specifically to prevent automation from deleting snapshots needed for compliance, backup windows, or replication integrity.
- Clones are real filesystems. A clone is not a “copy,” it’s a dataset that shares blocks with its origin snapshot. Deleting the origin would orphan the clone’s lineage.
- ZFS was built to be its own volume manager. That’s why it can track block ownership at a level that makes snapshot dependencies enforceable—and non-negotiable.
- Snapshots can be “kept” by remote systems. With replication, the receiver may hold snapshots or require them as incremental bases. Your “delete” on the source doesn’t make the receiver forget.
- The dataset property system is part of the control plane. Properties like readonly, canmount, and mountpoint change behavior without changing data; holds and clone origins are similar "metadata power."
- Solaris heritage shows. Many ZFS behaviors (including strictness around invariants) come from enterprise storage expectations: correctness first, convenience later.
- Destruction is transactional. ZFS changes are atomic; when it refuses to destroy a snapshot, it’s protecting a consistent on-disk graph of references.
- Bookmarks exist for a reason. Bookmarks are lightweight pointers to a snapshot’s transaction group, used to preserve an incremental base without keeping all snapshot metadata. People confuse them with snapshots and wonder why space doesn’t change.
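To make the confusion concrete, here is a minimal sketch of the bookmark workflow (dataset, snapshot, and host names are illustrative, not from this environment): create a bookmark from a snapshot, destroy the snapshot to reclaim its unique space, and keep sending incrementals from the bookmark.
cr0x@server:~$ sudo zfs bookmark tank/data@autosnap_2025-12-20 tank/data#base-2025-12-20
cr0x@server:~$ sudo zfs destroy tank/data@autosnap_2025-12-20    # unique blocks can now be freed
cr0x@server:~$ sudo zfs send -i tank/data#base-2025-12-20 tank/data@autosnap_2025-12-21 | ssh backup-target zfs receive tank/replica/data
The bookmark works as an incremental base only because the receiving side still has the snapshot it was created from; it saves space on the source, not on the target.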
Fast diagnosis playbook (first, second, third)
If you’re on call and storage is paging you, don’t wander. Do this, in this order:
1) Confirm what’s “undeletable” and why ZFS is objecting
Try the destroy and read the exact error. ZFS is usually explicit. “snapshot has dependent clones” and “snapshot is held” are not subtle.
2) Look for holds and clone dependencies immediately
Holds and clones account for the majority of "why won't it delete." Check holds with zfs holds. Check clones with a reverse origin lookup: zfs get -r origin piped through grep for the snapshot name (Task 6 below shows the exact command).
3) If space is the problem, find what is actually consuming it
Use usedbysnapshots, usedbydataset, and usedbychildren. If snapshots are consuming space, determine which snapshots are heavy and whether they are pinned by clones/holds.
4) If deletion “works” but snapshots come back, chase the creator
That’s usually a cron job, a systemd timer, a backup appliance, or a replication tool that believes it’s responsible for retention. Deleting symptoms won’t cure the disease.
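The usual suspects are enumerable in a couple of minutes. A starting point (unit names, cron paths, and the tool names you grep for are assumptions — adjust for your distro and snapshot tooling):
cr0x@server:~$ systemctl list-timers --all | grep -iE 'snap|zfs|sanoid'
cr0x@server:~$ sudo crontab -l; ls /etc/cron.d /etc/cron.hourly 2>/dev/null
cr0x@server:~$ sudo grep -riE 'zfs (snapshot|destroy)' /etc/cron* /etc/systemd/system 2>/dev/null | head
Whatever you find, change retention in that one place; deleting by hand while a timer recreates snapshots every hour is pure theater.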
Core mechanics: holds, clones, busy datasets, and replication
Holds: the polite “no”
A hold is a tag on a snapshot that prevents destruction. A snapshot can have multiple holds, typically one per workflow: “backup,” “replication,” “legal,” “before-upgrade,” and the infamous “temp” that becomes permanent.
Key operational detail: you don’t “remove the hold,” you release it by name. If there are multiple hold tags, you must release them all (or use -r patterns carefully). This is where zfs release is your friend, and also where you can shoot yourself in the foot by releasing a hold that a backup system expects.
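For completeness, here is how a hold gets there in the first place — a minimal sketch with an example tag name:
cr0x@server:~$ sudo zfs hold before-upgrade tank/data@autosnap_2025-12-21      # place a named hold
cr0x@server:~$ zfs holds tank/data@autosnap_2025-12-21                         # list all tags on the snapshot
cr0x@server:~$ sudo zfs release before-upgrade tank/data@autosnap_2025-12-21   # remove exactly that tag
Both hold and release accept -r to apply the tag to same-named snapshots on all descendants; treat that flag with the same respect you give a recursive destroy.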
Clones: the quiet dependency
Clones are datasets created from a snapshot. They keep the snapshot alive because the snapshot is the clone’s origin. ZFS won’t destroy the origin snapshot until you either destroy the clone or “promote” it (making the clone independent and flipping the dependency direction).
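If you have never watched the dependency appear, it is a two-command affair (names here are hypothetical):
cr0x@server:~$ sudo zfs clone tank/data@autosnap_2025-12-20 tank/dev-jenkins-workspace   # writable dataset sharing blocks with the snapshot
cr0x@server:~$ zfs get origin tank/dev-jenkins-workspace                                 # origin now points at the snapshot
cr0x@server:~$ sudo zfs destroy tank/data@autosnap_2025-12-20                            # refused: the snapshot is now a clone origin
From this point on, the snapshot is structurally load-bearing, no matter what its name or age suggests.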
Real-world gotcha: clones show up in places you don’t expect: CI pipelines, developer “temporary” environments, virtualization templates, database test refreshes. If you’re lucky, the clone is named honestly. If you’re not, it’s called prod2 and everybody pretends it’s not important.
Busy snapshots and operational locks
ZFS can report snapshots as busy if they’re involved in ongoing operations. Common causes:
- zfs send / zfs receive streams in progress
- long-running snapshot destroys (especially with huge snapshot counts)
- tools that hold dataset references open for longer than expected
“Busy” is usually temporary, but in messy environments it can be a permanent state created by automation loops.
Replication and incremental bases
Replication pipelines often require a common snapshot between source and target to send incrementals. Some tools enforce that by placing holds on snapshots until the target confirms receipt, or by retaining certain snapshots as “anchors.”
Operationally, you need to know whether the snapshot is being kept because ZFS says so (holds/clones) or because your tooling says so (recreated snapshots, retention policies, or remote constraints).
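As an illustration of the "tooling says so" case, here is roughly what a careful replication script does around its incremental base (host and dataset names are made up, and real tools wrap this in far more error handling):
cr0x@server:~$ sudo zfs hold replicate tank/data@autosnap_2025-12-25_0300      # protect the new base before sending
cr0x@server:~$ sudo zfs send -i tank/data@autosnap_2025-12-25_0200 tank/data@autosnap_2025-12-25_0300 | ssh backup-target zfs receive tank/replica/data
cr0x@server:~$ sudo zfs release replicate tank/data@autosnap_2025-12-25_0200   # only after the target confirms, drop the hold on the old base
If you release holds or destroy snapshots in the middle of that cycle, the next incremental has no common base and the tool either fails or falls back to a full send.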
Practical tasks: commands that actually solve it (with interpretation)
The following tasks are written the way you’d use them at a console, with enough interpretation to keep you out of trouble. Adjust dataset names to your environment.
Task 1: List snapshots and sort by space impact
cr0x@server:~$ zfs list -t snapshot -o name,used,refer,creation -s used tank/data
NAME USED REFER CREATION
tank/data@autosnap_2025-12-22 12G 1.2T Mon Dec 22 03:00 2025
tank/data@autosnap_2025-12-20 84G 1.2T Sat Dec 20 03:00 2025
tank/data@autosnap_2025-12-21 91G 1.2T Sun Dec 21 03:00 2025
Interpretation: USED here is “unique to this snapshot given everything else that exists.” If you’re hunting space, sort by used and start with the biggest offenders. If your snapshot list is huge, scope to a dataset subtree.
Task 2: Attempt destroy and capture the real error
cr0x@server:~$ sudo zfs destroy tank/data@autosnap_2025-12-21
cannot destroy snapshot tank/data@autosnap_2025-12-21: snapshot is held
Interpretation: Don’t guess. The error string tells you which branch of the decision tree you’re in: held, has dependent clones, busy, etc. One caveat: on some OpenZFS versions a held snapshot surfaces as the generic “dataset is busy,” so run zfs holds whenever you see “busy” before assuming an in-flight operation.
Task 3: Show holds on a snapshot
cr0x@server:~$ sudo zfs holds tank/data@autosnap_2025-12-21
NAME TAG TIMESTAMP
tank/data@autosnap_2025-12-21 backup Tue Dec 23 01:12 2025
tank/data@autosnap_2025-12-21 replicate Tue Dec 23 01:13 2025
Interpretation: Two different systems (or two phases of one system) placed holds. You must release both tags before destroy will work. Also: these tags are clues—find the owners before you rip them out.
Task 4: Release a single hold tag with zfs release
cr0x@server:~$ sudo zfs release backup tank/data@autosnap_2025-12-21
Interpretation: This removes the backup hold only. If deletion still fails, there’s at least one more tag, or another blocker entirely.
Task 5: Release multiple holds, then destroy
cr0x@server:~$ sudo zfs release backup tank/data@autosnap_2025-12-21
cr0x@server:~$ sudo zfs release replicate tank/data@autosnap_2025-12-21
cr0x@server:~$ sudo zfs destroy tank/data@autosnap_2025-12-21
Interpretation: Clean and explicit beats clever. In production, I prefer two commands I can audit over one command that “should work.”
Task 6: Find clones that depend on a snapshot (origin search)
This is the most reliable way: search for datasets whose origin matches your snapshot.
cr0x@server:~$ zfs get -H -o name,value origin -r tank | grep 'tank/data@autosnap_2025-12-20'
tank/dev-jenkins-workspace tank/data@autosnap_2025-12-20
tank/vm-templates/ubuntu tank/data@autosnap_2025-12-20
Interpretation: Those datasets are clones (or descendants of clones) tied to that snapshot. The snapshot won’t die until these are handled.
Task 7: Confirm a dataset is a clone and see its origin
cr0x@server:~$ zfs get origin tank/dev-jenkins-workspace
NAME PROPERTY VALUE SOURCE
tank/dev-jenkins-workspace origin tank/data@autosnap_2025-12-20 -
Interpretation: This dataset depends on that snapshot. Your options are: destroy the clone, or promote it (if it must live).
Task 8: Destroy a clone (and its children) to unblock snapshot destruction
cr0x@server:~$ sudo zfs destroy -r tank/dev-jenkins-workspace
Interpretation: -r destroys the dataset and all descendants. That’s correct for CI workspaces and ephemeral dev datasets. It is not correct for something someone quietly started treating as production.
Task 9: Promote a clone to remove dependency on the origin snapshot
cr0x@server:~$ sudo zfs promote tank/vm-templates/ubuntu
Interpretation: Promotion flips the dependency so the clone becomes the “parent” lineage. After promotion, the original snapshot may become destroyable (subject to other clones/holds). Promotion changes snapshot relationships; treat it like a change request, not a casual fix.
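To see the flip, compare origin on both datasets after promoting. A hedged sketch continuing the example above — if the original parent was not itself a clone, the promoted dataset ends up with no origin and the former parent becomes a clone of the migrated snapshot:
cr0x@server:~$ zfs get -H -o name,value origin tank/vm-templates/ubuntu tank/data
tank/vm-templates/ubuntu -
tank/data tank/vm-templates/ubuntu@autosnap_2025-12-20
Snapshots older than the clone point migrate to the promoted dataset as well, which is exactly why retention scripts that assume snapshots stay put can break after a promotion.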
Task 10: Identify snapshot space vs dataset space (why space didn’t free)
cr0x@server:~$ zfs list -o name,used,usedbysnapshots,usedbydataset,usedbychildren -r tank/data
NAME USED USEDBYSNAPSHOTS USEDBYDATASET USEDBYCHILDREN
tank/data 3.1T 1.4T 1.5T 0.2T
Interpretation: Snapshots account for 1.4T here. If you delete snapshots and this number doesn’t drop, you still have snapshots (maybe on children), holds/clones preventing deletion, or you’re deleting ones that don’t actually own the space you care about.
Task 11: See which snapshots exist on descendants (the “I deleted it” trap)
cr0x@server:~$ zfs list -t snapshot -r -o name,used -s used tank/data | tail -n 10
tank/data/home@autosnap_2025-12-22 68G
tank/data/home@autosnap_2025-12-21 70G
tank/data/projects@autosnap_2025-12-22 95G
tank/data/projects@autosnap_2025-12-21 110G
Interpretation: You may have destroyed the snapshot on tank/data but the heavy usage is on tank/data/projects and tank/data/home. Snapshots are per-dataset; recursion matters.
Task 12: Destroy a snapshot recursively (carefully)
cr0x@server:~$ sudo zfs destroy -r tank/data@autosnap_2025-12-21
Interpretation: This destroys the snapshot with that name on the dataset and all descendants. This is powerful and easy to misuse. Ensure snapshot naming conventions are consistent and you actually mean recursion.
Task 13: Find snapshots that are held across a whole tree
ZFS doesn’t provide a single “list all held snapshots” command with full detail in one shot, so we usually script it with safe plumbing.
cr0x@server:~$ for s in $(zfs list -H -t snapshot -o name -r tank/data); do
> zfs holds "$s" 2>/dev/null | awk 'NR==1{next} {print $1" "$2}'
> done | head
tank/data@autosnap_2025-12-20 replicate
tank/data/projects@autosnap_2025-12-20 replicate
tank/data/projects@autosnap_2025-12-20 backup
Interpretation: This tells you which snapshots are protected and by which tags. In a messy environment, the tag names are the Rosetta stone to figure out which system is “owning” retention.
Task 14: Verify whether replication/backup is recreating snapshots
If snapshots “come back,” check creation times and patterns. This is the least glamorous but most effective approach: prove the source of truth.
cr0x@server:~$ zfs list -t snapshot -o name,creation -s creation tank/data | tail -n 5
tank/data@autosnap_2025-12-25_0000 Thu Dec 25 00:00 2025
tank/data@autosnap_2025-12-25_0100 Thu Dec 25 01:00 2025
tank/data@autosnap_2025-12-25_0200 Thu Dec 25 02:00 2025
tank/data@autosnap_2025-12-25_0300 Thu Dec 25 03:00 2025
tank/data@autosnap_2025-12-25_0400 Thu Dec 25 04:00 2025
Interpretation: Hourly cadence screams automation. Deleting without fixing the scheduler is like bailing a boat while drilling more holes.
Task 15: Check for an in-progress send/receive that may hold things “busy”
This is OS- and tooling-dependent, but you can usually spot active ZFS streams via process lists.
cr0x@server:~$ ps aux | egrep 'zfs (send|receive)|mbuffer|ssh .*zfs receive' | grep -v egrep
root 18244 2.1 0.0 17768 4100 ? Ss 03:02 0:01 zfs send -I tank/data@autosnap_2025-12-24 tank/data@autosnap_2025-12-25_0300
root 18245 0.8 0.0 10432 2820 ? S 03:02 0:00 ssh backup-target zfs receive -uF tank/replica/data
Interpretation: If you destroy a snapshot that’s currently the basis of an incremental send, the job may fail or restart and re-hold. Coordinate with replication windows.
Task 16: Use dry-run thinking before destructive commands
On modern OpenZFS, zfs destroy does accept -n (no-op) and -v, which report what would be destroyed and roughly how much space would come back, but that alone won’t surface every dependency. Your real “dry run” is inspection: list holds, list clones, list dependents, confirm naming, and confirm scope (-r vs not).
cr0x@server:~$ zfs holds tank/data@autosnap_2025-12-20
cr0x@server:~$ zfs get -H -o name,value origin -r tank | grep 'tank/data@autosnap_2025-12-20' || true
cr0x@server:~$ zfs list -t snapshot -r -o name tank/data | grep '@autosnap_2025-12-20' | head
Interpretation: This sequence answers: “Is it held?”, “Are there clones?”, and “What exactly will recursion touch?” It’s boring. It works.
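To complement the inspection, you can ask destroy for its plan directly; -n makes it a no-op and -v prints what would happen (the figures and exact wording below are illustrative):
cr0x@server:~$ sudo zfs destroy -nv tank/data@autosnap_2025-12-20
would destroy tank/data@autosnap_2025-12-20
would reclaim 84G
This also accepts snapshot ranges (dataset@first%last), which makes it a decent sanity check before a bulk prune — but it does not replace the hold and clone inspection above.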
Three corporate-world mini-stories from the trenches
Mini-story 1: An incident caused by a wrong assumption
The assumption: “If the snapshot is old, it’s safe to delete.” A storage admin (competent, careful, just rushed) saw a pool hitting 90% and began pruning snapshots older than 30 days. The destroy command failed on a subset with “dependent clones,” so they moved on and deleted what they could. Space didn’t recover enough. Panic rose. More deletion followed.
What they didn’t realize: a dev team had a “temporary” analytics sandbox cloned from a month-old snapshot of a production dataset. It was used for ad-hoc queries, and someone had pointed an internal dashboard at it because it was “fast.” No ticket. No documentation. Just a quiet dependency sitting on a snapshot like a barnacle.
The on-call path got ugly because the error didn’t read “this is powering your dashboards.” It just said “dependent clones.” The team tried to promote the clone without understanding the consequences, which rearranged the lineage and broke a retention script that assumed origins lived in one place. Suddenly replication incrementals couldn’t find their base snapshots. Jobs failed. Alerts multiplied.
What fixed it was not heroics; it was the adult move: inventory the clones, identify owners, schedule downtime for the sandbox, and either destroy it or rehome it as a proper dataset with explicit retention. The takeaway that stuck: “old” isn’t a safety classification. Dependency is.
Mini-story 2: An optimization that backfired
The goal was noble: reduce snapshot counts. A platform team changed their policy from “hourly snapshots for 7 days” to “hourly for 24 hours, then daily for 30 days.” They also added a convenience feature: create a clone from last night’s snapshot for each developer who wanted a fresh environment. The clone creation was automated and fast. Developers were delighted.
Two months later, the pool started growing in a way nobody could explain. They were deleting snapshots on schedule; graphs looked compliant. But usedbysnapshots stayed high. The storage engineer dug in and found dozens of long-lived clones created from “last night,” some of which had become semi-permanent because they were used to reproduce bugs, run benchmarks, or hold “just in case” data.
The optimization backfired because the retention policy assumed snapshots were the only pinning mechanism. Clones silently turned “short-lived snapshot history” into “long-lived block retention,” and the system was doing exactly what it was told: protect the clone’s origin blocks. The fix was to put a lifecycle on clones too: auto-expire, auto-promote (where appropriate), and enforce naming and ownership.
It’s a classic enterprise story: you optimize one metric (snapshot count) and accidentally monetize a different one (block retention). Storage doesn’t care which spreadsheet you’re using; it bills you in terabytes.
Mini-story 3: A boring but correct practice that saved the day
In another environment, snapshot cleanup was never exciting—and that’s why it worked. They had a written convention: every hold tag must include the system name and purpose (e.g., backup:daily, replicate:dr, legal:case123). Every clone must include an owner prefix and an expiry date in the dataset name. Every retention change required a simple peer review.
Then a ransomware scare hit. The security team wanted older snapshots preserved while they investigated. Backup tooling started placing holds everywhere. Storage usage climbed. The pool approached uncomfortable thresholds. This is where “boring practice” shines: because tags were meaningful, the storage team could see which holds were security-driven versus replication-driven. They selectively released holds after the investigation window closed, without breaking DR replication.
They also had a weekly report (yes, a boring CSV) listing clones older than their expiry. When the incident hit, they already knew what was safe to remove. No archaeology, no guessing, no midnight “who owns this dataset” emails.
The day was saved by something nobody brags about: consistent naming, visible ownership, and an audit trail. The most underrated performance feature in ZFS is an organization that knows what it asked ZFS to do.
Checklists / step-by-step plan
Checklist A: Remove a snapshot blocked by holds (safely)
- Attempt a destroy to capture the exact error message.
- List holds with zfs holds.
- Identify the owner of each hold tag (backup, replication, compliance).
- Pause/coordinate any workflow that will re-add the hold.
- Release holds with zfs release <tag> <snapshot> for each tag.
- Destroy the snapshot.
- Verify snapshot list and space accounting.
Checklist B: Remove a snapshot blocked by clones
- Confirm the error “has dependent clones” or find origins referencing the snapshot.
- Enumerate clones with zfs get origin -r and match the snapshot.
- For each clone: decide destroy vs promote.
- If destroying: use zfs destroy -r on the clone dataset (not the snapshot).
- If promoting: run zfs promote and then re-check dependencies.
- Destroy the original snapshot when dependencies are gone.
Checklist C: Space isn’t freeing after snapshot deletion
- Check usedbysnapshots on the dataset subtree.
- Identify heavy snapshots via zfs list -t snapshot -o used.
- Check for remaining holds and clones.
- Confirm you deleted snapshots on the right datasets (recursion matters).
- Look for other references: newer snapshots, clones, or the live dataset itself.
- Re-check pool-level free space and reservations/refreservations if applicable.
Second joke (and last): The fastest way to reduce snapshot count is rm -rf /, but it’s also a great way to reduce employment count.
Common mistakes (symptoms + fixes)
Mistake 1: Releasing holds without coordinating with replication/backup
Symptom: Snapshots delete fine, then replication jobs fail or backups complain about missing incremental bases.
Fix: Identify why the hold exists. If it’s a replication anchor, either complete the replication cycle first or adjust the replication strategy to use a different base snapshot. In some environments, the correct fix is to stop the replication service briefly, clean up, then restart with a new baseline.
Mistake 2: Destroying recursively when you meant a single dataset snapshot
Symptom: A bunch of datasets lose snapshots at once. Developers ask why their “restore point” vanished.
Fix: Before using -r, list snapshots in the subtree and confirm naming is consistent. If the environment uses mixed snapshot naming, avoid recursive destroys and target datasets explicitly.
Mistake 3: Confusing snapshot USED with “how big the snapshot is”
Symptom: You delete a snapshot that shows USED=0 and expect space to return; nothing changes, or you delete the wrong ones first.
Fix: Understand that USED is unique space attributable to that snapshot. A snapshot with low USED may still be operationally critical as an incremental base; a snapshot with high USED is where you get space back.
Mistake 4: Forgetting clones exist (or not recognizing them)
Symptom: “dependent clones” errors, or space doesn’t free despite snapshot pruning.
Fix: Search for datasets with an origin pointing to the snapshot. Decide destroy vs promote. Add lifecycle rules for clones so this doesn’t recur.
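Putting a lifecycle on clones starts with being able to list them with their age. A minimal sketch — it only prints candidates, deletes nothing, and the 30-day cutoff plus GNU date are assumptions to adapt:
cr0x@server:~$ zfs list -H -p -t filesystem -r -o name,origin,creation tank | awk -v cutoff=$(date -d '30 days ago' +%s) '$2 != "-" && $3 < cutoff {print $1, "(origin: "$2")"}'
tank/dev-jenkins-workspace (origin: tank/data@autosnap_2025-11-02)
Feed that list into a weekly report with owners attached, and “forgotten clone” stops being a storage incident and becomes a line item.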
Mistake 5: Deleting snapshots that automation recreates
Symptom: You destroy snapshots and they reappear with new creation timestamps.
Fix: Find and modify the scheduler/policy that creates them. Snapshot deletion is not a retention policy; it’s a cleanup action. Retention lives in the policy engine.
Mistake 6: Ignoring dataset reservations and refreservations
Symptom: Pool still looks full even after meaningful snapshot deletion, or datasets can’t allocate space despite “free” showing somewhere else.
Fix: Check reservation and refreservation properties. Reservations can make space appear “unavailable.” Adjust carefully—reservations are often there to protect critical workloads.
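Checking takes seconds (pool and dataset names below are illustrative):
cr0x@server:~$ zfs get -rH -t filesystem,volume -o name,property,value reservation,refreservation tank | grep -v none
tank/db reservation 200G
tank/vm/images refreservation 500G
Also glance at zfs list -o space, where the USEDREFRESERV column shows how much a refreservation is pinning — zvols with full refreservations are a classic “where did my free space go.”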
Mistake 7: Fighting “busy” snapshots with force instead of patience and coordination
Symptom: Destroy fails with “dataset is busy” during replication windows or backups.
Fix: Identify active zfs send/receive pipelines. Either wait for completion or coordinate a stop. Repeatedly retrying destroy in a loop just adds chaos.
FAQ
1) What does zfs release actually do?
It removes a named hold tag from a snapshot. Holds prevent snapshot destruction. If a snapshot has multiple holds, you must release them all (or the remaining holds still block destroy).
2) Why does ZFS allow holds at all? It feels like it’s fighting me.
Because in real operations, automation and humans both make mistakes. Holds are guardrails: “this snapshot is required for backup/replication/compliance.” Without them, a cleanup script can delete your recovery points faster than your incident response can type.
3) I released holds and destroyed the snapshot. Why didn’t I get space back?
Because those blocks may still be referenced by newer snapshots, clones, or the live filesystem. Use zfs list with usedbysnapshots and sort snapshots by used to find where space is actually pinned.
4) What’s the difference between a snapshot and a bookmark?
A snapshot includes full metadata for the point-in-time view. A bookmark is a lightweight reference used primarily as an incremental replication base. Destroying a bookmark doesn’t free snapshot space because it isn’t a snapshot; destroying a snapshot may free space if blocks are no longer referenced elsewhere.
5) How do I know whether a snapshot is kept because of clones?
Search for datasets whose origin equals that snapshot. If any exist, the snapshot is an origin and cannot be destroyed until those clones are destroyed or promoted.
6) Is it safe to use zfs destroy -r on a snapshot name?
It can be safe if you are intentionally destroying that snapshot across a well-understood dataset tree with consistent naming. It is dangerous if child datasets have different meaning/owners. “Recursive” is not a convenience flag; it’s a scope multiplier.
7) What’s the safest way to clean up snapshots in a replicated environment?
Make retention decisions in one place (source-of-truth), coordinate replication windows, and avoid deleting the incremental base snapshots that replication needs. In practice, that means: understand your replication tool’s expectations, respect hold tags it places, and clean up in a controlled change window when you need to break glass.
8) Can I force-destroy a held snapshot?
Not in the sense of “override holds.” The correct method is to release the holds. If you don’t control the holds (e.g., compliance), your real job is governance: figure out who owns the hold policy and what conditions allow release.
9) Why do I get “snapshot has dependent clones” when I don’t remember creating clones?
Because something else did. Virtualization tooling, CI pipelines, developer self-service, and some backup workflows can create clones. The dataset names and origin property will tell you what’s connected—even if nobody remembers.
10) How do I prevent “undeletable snapshots” from happening again?
Standardize hold tags, implement clone lifecycle (expiry/ownership), and make retention policy changes explicit and reviewed. Technically, ZFS is doing the right thing; operationally, you need to ensure the right people and systems are the ones setting those constraints.
Conclusion
In ZFS, snapshots aren’t undeletable. They’re accountable. If a snapshot won’t die, it’s because something still depends on it—by design. The practical route is always the same: read the exact error, check holds, check clones, then check whether automation is recreating what you’re deleting. Once you treat snapshots as part of a dependency graph instead of “old files,” zfs release stops being a mysterious incantation and becomes what it is: a controlled way to remove a safety lock.
Do the boring inspections first, coordinate with replication/backup owners, and be explicit with scope. ZFS will meet you halfway—right after it confirms you’re not about to delete the only lifeboat.