ZFS clones: Instant Copies With Hidden Dependencies (Know This First)

ZFS clones are the kind of feature that makes storage engineers look like magicians: create a “copy” of a filesystem or volume instantly, without moving data, and start writing to it like it’s always existed. The first time you use it for a VM template, a CI environment, or a forensic copy, it feels like you’ve cheated physics.

Then you try to clean up snapshots, reclaim space, or replicate the dataset… and you learn the other half of the trick: clones are instant because they’re not independent. A clone is a writable child of a snapshot, and that hidden dependency will eventually show up on a pager somewhere. This article is about making sure it shows up in a ticket instead.

What a ZFS clone really is

A ZFS snapshot is a read-only point-in-time view of a dataset (filesystem) or volume (zvol). It costs almost nothing upfront because it only records metadata; blocks are shared until they diverge.

A ZFS clone is a writable dataset or zvol created from a snapshot. That “from a snapshot” part is not a detail; it’s the whole contract. A clone starts life sharing blocks with its origin snapshot. As you write to the clone, ZFS uses copy-on-write (CoW): it writes new blocks elsewhere, updates metadata, and the clone diverges from the snapshot.

Clones have two properties operators should tattoo on their mental runbook:

  1. A clone depends on its origin snapshot. You can’t destroy the snapshot while the clone exists (unless you do a promotion dance, which changes the family tree).
  2. Space accounting becomes “shared.” You can fill a pool “without filling anything” until suddenly you did. The numbers don’t lie, but they do require interpretation.

One operational joke that keeps landing because it’s too accurate: ZFS clones are like adopting a cat; it’s instant, but now you’ve inherited a long-term dependency you didn’t budget for.

Clone anatomy: dataset vs zvol

ZFS can clone both:

  • Filesystems (datasets): mounted at a mountpoint, contain files and directories.
  • Volumes (zvols): block devices, often used for VM disks, iSCSI LUNs, database raw devices.

The clone mechanism is the same. The operational impact differs because zvol clones are frequently used in virtualization stacks where “template” and “linked clone” patterns are common, and where write amplification can appear quickly under heavy random-write workloads.

What “instant” actually means

The clone creation is instant because ZFS doesn’t copy blocks. It creates a new dataset and points its metadata at the same blocks referenced by the snapshot. If you expect that to behave like an independent copy, you’re about to learn about reference counts, usedbysnapshots, and why a snapshot you “don’t need” is still preventing space reclamation.

The hidden dependency: “you can’t delete that snapshot”

The classic surprise looks like this:

  • You create snapshot pool/prod@app-2025-12-25.
  • You create clone pool/dev/app-test from that snapshot.
  • Weeks later, you try to delete old snapshots to reclaim space.
  • ZFS refuses to destroy the snapshot because it has dependent clones.

This is not ZFS being stubborn; it’s ZFS preventing corruption. The origin snapshot contains block references that the clone may still rely on. If ZFS let you delete the snapshot, it could free blocks that the clone still needs. ZFS solves this by enforcing the dependency.

Why this matters beyond “cleanup fails”

In production, clone dependencies tend to surface as:

  • Snapshot retention policy breaks. Your “keep 14 days” rule silently becomes “keep 14 days unless there’s a clone, then keep forever.”
  • Space reclamation stalls. Teams keep deleting files, but pool usage doesn’t drop because blocks are still referenced by snapshots that can’t be removed.
  • Replication complexity. Clones can complicate zfs send workflows if you don’t plan the dataset/snapshot topology.
  • Promotion chaos. Someone “promotes” a clone to break dependencies and changes which dataset is considered the origin, and suddenly your mental model of “what is prod” is wrong.

Promotion: the escape hatch (and the footgun)

zfs promote makes a clone become the “mainline” dataset, and its origin becomes a clone child (conceptually). It’s how you break the dependency on an origin snapshot so you can delete it. It’s also how you accidentally invert a lineage and confuse replication, monitoring, and humans.

Promotion is legitimate and often necessary. But it should be a planned action with explicit outcomes, not a desperate command copied from a forum while the pool is 99% full.

Interesting facts and historical context

A handful of short points that matter more than trivia, because they explain why clones behave the way they do:

  1. ZFS snapshots are cheap because of CoW. ZFS never overwrites in-place; it writes new blocks and updates pointers, which makes consistent snapshots a natural consequence of the design.
  2. Clones were built for fast provisioning. Early ZFS adopters used clones for dev/test and VM templating long before “infrastructure as code” became mainstream.
  3. Space is shared by reference counts. A block may be referenced by the live filesystem, one or more snapshots, and one or more clones; it’s only freed when the last reference is gone.
  4. “Used” in ZFS is contextual. There’s “used by dataset,” “used by snapshots,” “used by children,” and “logical used”; your pool fills according to physical reality, not whichever number you stared at in a dashboard.
  5. Clones predate many modern “copy data management” products. The idea—copy instantly, diverge on write—is the same principle behind many commercial snapshot/clone systems.
  6. Clone lineage is explicit in properties. ZFS stores the origin property on clones; you can query and script around it (and you should).
  7. Promote changes ancestry, not data validity. It rearranges which snapshot is considered “origin” for dependency purposes; it doesn’t rewrite the entire dataset.
  8. Clones and snapshots are not backups. They’re excellent local recovery tools, but they share the same pool, failure domain, and often the same “oops I destroyed the pool” risk.
  9. ZVOL clones changed VM economics. Linked clones let teams deploy hundreds of VMs from one golden image with minimal storage—until write patterns turned “minimal” into “surprise.”

When clones are the right tool (and when they’re not)

Good uses

  • VM templates / golden images: provision new VM disks instantly; pay only for divergence.
  • CI environments: spin up test datasets quickly, throw them away after the run.
  • Forensics: make a writable copy of a snapshot to reproduce issues without touching prod.
  • Data science sandboxes: give analysts a writable dataset from a stable snapshot without duplicating terabytes.
  • Risky migrations: clone a dataset and rehearse changes; if it goes wrong, destroy the clone.

Bad uses (or at least: needs guardrails)

  • Long-lived “temporary” environments that outlive snapshot retention windows. This is how a weekly test clone becomes a three-year-old space anchor.
  • “Clone as backup.” If the pool dies, both original and clone die together. Clones reduce the blast radius of mistakes, not of hardware failure.
  • Unbounded developer self-service. Developers will create clones with creative names, forget them, and your snapshot cleanup will become a therapy session.

Second joke, because storage needs levity: Clones are “free” the way free puppies are free—initial cost is zero, but the cleanup schedule is now your life.

Three corporate-world mini-stories

1) Incident caused by a wrong assumption: “Deleting the snapshot will free space”

The storage team at a mid-sized company ran a ZFS-backed virtualization cluster. They had a tidy policy: hourly snapshots for 48 hours, daily for two weeks. It worked—until it didn’t. The pool crept from 70% to 88% over a month, then hit 94% in a weekend, and Monday morning arrived with slow VMs and angry dashboards.

The on-call did the obvious: delete old snapshots. The destroy commands started failing with “snapshot has dependent clones.” That message looks harmless until you realize it means “your retention policy is now fiction.” The assumption was that snapshots were the only retention object. In reality, the VM provisioning pipeline had quietly shifted to “linked clones from last night’s snapshot” to speed up environment creation. Nobody told storage, because from the pipeline’s perspective, it was a performance improvement.

They tried deleting the clones, but now those clones were running: someone had repurposed “test” VMs into “temporary production” during a previous incident. The clones were no longer disposable. The origin snapshots were pinned, which pinned blocks, which pinned space.

The incident ended with a forced triage: identify which clones mattered, migrate the important ones to independent datasets, promote where appropriate, then re-establish a sane snapshot schedule. The postmortem’s key fix wasn’t “delete more snapshots.” It was governance: every clone got a TTL, and the provisioning system tagged clones with an owner and expiration. Storage finally got a daily report of dependent clones blocking snapshot deletion.

2) An optimization that backfired: “Let’s dedupe it too”

A different org loved efficiency. They discovered clones and promptly used them everywhere: developer databases, QA environments, staging refreshes. The pool usage dropped, provisioning got faster, and everyone congratulated everyone.

Then someone said the sentence that has ended many good weeks: “If clones save space, dedup will save even more.” They enabled dedup on a dataset that held cloned VM images and database zvols. In the first days, the numbers looked great. In the next weeks, memory pressure got worse. ARC misses rose. Latency became spiky during busy hours. Eventually, the box developed a habit of pausing under load like it was thinking deeply about life choices.

Dedup tables are hungry. Clones already share blocks efficiently when derived from the same snapshot lineage; dedup adds global block accounting overhead. For their workload—many similar but diverging VMs with lots of random writes—dedup increased metadata churn and made every write more expensive. The “saved space” was real, but they traded away predictable performance and operational simplicity.

The fix was painful: migrate off deduped datasets, re-provision with clones only, and accept that “efficient” has multiple dimensions. Clones were the right optimization; dedup was the right optimization for a different workload and a different hardware budget.

3) A boring but correct practice that saved the day: “Clones have owners and expiry”

One enterprise platform team treated clones as production resources, not magic tricks. Their rule was boring: every clone must have an owner, a ticket reference, and an expiration date. Not in a spreadsheet—stored as ZFS user properties right on the dataset.

When a pool started trending upward, they didn’t guess. They ran a script: list all clones, their origin snapshots, and their expiry dates. Most were legitimate. A handful were expired and unused. Those were destroyed first, which unpinned snapshots, which freed space. No firefight required.

Later, a deployment pipeline failed because it couldn’t destroy old snapshots. The on-call followed the same script output and found a single long-lived clone created for an incident investigation months earlier. The owner had moved teams. The dataset still existed because nobody wanted to delete “just in case.” The team contacted the new service owner, agreed on a replication of needed artifacts, then destroyed the clone and resumed normal operations.

This is the unsexy truth: boring inventory beats clever storage features. Clones are powerful, but the only sustainable way to use them at scale is to make them visible, owned, and lifecycle-managed.

Practical tasks: commands, outputs, interpretation

Below are hands-on tasks you can run on a typical OpenZFS system (Linux or illumos). Commands assume you have privileges. Outputs are representative; yours will differ.

Task 1: List snapshots and clones together

cr0x@server:~$ zfs list -t filesystem,snapshot,volume -o name,type,used,refer,origin -r tank/app
NAME                          TYPE      USED  REFER  ORIGIN
tank/app                      filesystem 12.3G  9.8G  -
tank/app@daily-2025-12-20      snapshot   0B    9.8G  -
tank/app@daily-2025-12-21      snapshot   0B    10.1G -
tank/app@daily-2025-12-22      snapshot   0B    10.1G -
tank/app-clone-qa             filesystem 3.1G  11.0G tank/app@daily-2025-12-21

Interpretation: The clone has an origin property pointing to a snapshot. That origin snapshot is now protected from destruction unless you remove or promote the clone.

Task 2: Create a snapshot (safe and fast)

cr0x@server:~$ zfs snapshot tank/app@pre-change-001
cr0x@server:~$ zfs list -t snapshot -o name,creation -r tank/app | tail -n 3
tank/app@daily-2025-12-21      Sun Dec 21 01:00 2025
tank/app@daily-2025-12-22      Mon Dec 22 01:00 2025
tank/app@pre-change-001        Thu Dec 25 09:14 2025

Interpretation: Snapshot is instantaneous and consistent at the ZFS level.

Task 3: Create a clone from a snapshot

cr0x@server:~$ zfs clone tank/app@pre-change-001 tank/app-clone-dev
cr0x@server:~$ zfs get origin,mountpoint tank/app-clone-dev
NAME               PROPERTY    VALUE                    SOURCE
tank/app-clone-dev  origin      tank/app@pre-change-001  -
tank/app-clone-dev  mountpoint  /tank/app-clone-dev      default

Interpretation: The clone is a real dataset with its own properties, mountpoint, quotas, and a hard dependency on the origin snapshot.

Task 4: Prove the dependency: try to destroy the origin snapshot

cr0x@server:~$ zfs destroy tank/app@pre-change-001
cannot destroy snapshot tank/app@pre-change-001: snapshot has dependent clones

Interpretation: That snapshot is pinned. Your retention policy just met reality.

Task 5: Find dependent clones for a snapshot

cr0x@server:~$ zfs list -t filesystem,volume -o name,origin -r tank | grep 'tank/app@pre-change-001'
tank/app-clone-dev  tank/app@pre-change-001

Interpretation: Simple grep works, but for large estates you’ll want scripted queries using zfs get -H.
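A minimal sketch of what that scripted query can look like. The here-doc-style sample function stands in for the real output of `zfs list -H -t filesystem,volume -o name,origin -r tank` (tab-separated, no header); matching on the origin column exactly avoids the false positives grep can produce when one snapshot name is a prefix of another.

```shell
# Find clones whose origin is exactly the given snapshot.
snap='tank/app@pre-change-001'

# Sample data standing in for:
#   zfs list -H -t filesystem,volume -o name,origin -r tank
zfs_list_sample() {
  printf '%s\t%s\n' \
    'tank/app'           '-' \
    'tank/app-clone-dev' 'tank/app@pre-change-001' \
    'tank/app-clone-qa'  'tank/app@daily-2025-12-21'
}

# Exact match on column 2 (origin), print column 1 (clone name).
zfs_list_sample | awk -F'\t' -v snap="$snap" '$2 == snap { print $1 }'
```

Swap `zfs_list_sample` for the real `zfs list -H` invocation to run this against a live pool.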

Task 6: Quantify “what’s actually using space” with used breakdown

cr0x@server:~$ zfs list -o name,used,usedbysnapshots,usedbychildren,usedbydataset -r tank/app | head
NAME          USED  USEDBYSNAPSHOTS  USEDBYCHILDREN  USEDBYDATASET
tank/app      12.3G 2.4G             3.1G            6.8G

Interpretation: If usedbysnapshots is large, deleting snapshots would help—unless clones prevent it. If usedbychildren is large, clones (or other child datasets) are consuming space.
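To reason about the breakdown with arithmetic rather than eyeballs, a sketch like the following helps: `used` should equal the sum of `usedbydataset`, `usedbysnapshots`, `usedbychildren`, and `usedbyrefreservation`. The byte counts below are invented sample values standing in for parsed `zfs list -Hp` output (`-p` prints exact bytes, which is what you want for arithmetic).

```shell
# Sample values standing in for:
#   zfs list -Hp -o used,usedbydataset,usedbysnapshots,usedbychildren,usedbyrefreservation tank/app
used=13207000000
usedbydataset=7301000000
usedbysnapshots=2576000000
usedbychildren=3330000000
usedbyrefreserv=0

# The four usedby* components should account for all of "used".
sum=$((usedbydataset + usedbysnapshots + usedbychildren + usedbyrefreserv))
if [ "$sum" -eq "$used" ]; then
  echo "breakdown accounts for all of used"
else
  echo "mismatch: used=$used accounted=$sum"
fi
```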

Task 7: Check pool health and free space headroom (because CoW needs it)

cr0x@server:~$ zpool status tank
  pool: tank
 state: ONLINE
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            sda     ONLINE       0     0     0
            sdb     ONLINE       0     0     0

errors: No known data errors
cr0x@server:~$ zpool list tank
NAME  SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
tank  7.25T  6.68T   576G        -         -    41%    92%  1.00x  ONLINE  -

Interpretation: 92% capacity is danger territory for CoW filesystems. Clone-heavy environments can drift here because “we didn’t copy anything” becomes “we wrote a lot of diverged data.”

Task 8: Identify clone lineage and prepare for promotion

cr0x@server:~$ zfs get -r -o name,property,value origin tank/app tank/app-clone-dev
NAME              PROPERTY  VALUE
tank/app           origin    -
tank/app@pre-change-001  origin    -
tank/app-clone-dev  origin    tank/app@pre-change-001

Interpretation: Only clones have an origin. Snapshots do not.

Task 9: Promote a clone to break dependency (carefully)

cr0x@server:~$ zfs promote tank/app-clone-dev
cr0x@server:~$ zfs get origin tank/app-clone-dev tank/app
NAME              PROPERTY  VALUE                   SOURCE
tank/app-clone-dev  origin    -                       -
tank/app            origin    tank/app-clone-dev@pre-change-001  -

Interpretation: After promotion, the clone is now the “primary” lineage; the original dataset may become a clone depending on the snapshot graph. This is correct behavior, and also why promotions should be tracked in change management.

Task 10: Destroy the old snapshot after promotion (if appropriate)

Promotion moves the origin snapshot to the promoted dataset, and the original dataset now depends on it as a clone. The snapshot can only be destroyed once nothing references it; in this example, that means retiring the old dataset first.

cr0x@server:~$ zfs destroy tank/app-clone-dev@pre-change-001
cannot destroy snapshot tank/app-clone-dev@pre-change-001: snapshot has dependent clones
cr0x@server:~$ zfs destroy tank/app
cr0x@server:~$ zfs destroy tank/app-clone-dev@pre-change-001
cr0x@server:~$ zfs list -t snapshot -r tank/app-clone-dev | grep pre-change-001
cr0x@server:~$ echo $?
1

Interpretation: The snapshot is gone. Notice that promotion renamed it under the promoted clone, and it stayed pinned until the old dataset (now a clone of it) was destroyed. Space is only reclaimed if no other references exist.

Task 11: Create a zvol clone for a VM disk workflow

cr0x@server:~$ zfs create -V 80G -o volblocksize=16K tank/vm/golden
cr0x@server:~$ zfs snapshot tank/vm/golden@v1
cr0x@server:~$ zfs clone tank/vm/golden@v1 tank/vm/vm-123-disk0
cr0x@server:~$ ls -l /dev/zvol/tank/vm/vm-123-disk0
brw-rw---- 1 root disk 230, 128 Dec 25 09:33 /dev/zvol/tank/vm/vm-123-disk0

Interpretation: You now have a block device backed by a clone. Perfect for fast provisioning—until you forget that snapshots can’t be rotated while any VM disk clones depend on them.

Task 12: Check compression and logical space vs physical

cr0x@server:~$ zfs get compressratio,logicalused,used tank/app-clone-dev
NAME              PROPERTY       VALUE  SOURCE
tank/app-clone-dev  compressratio  1.45x  -
tank/app-clone-dev  logicalused    14.9G  -
tank/app-clone-dev  used           3.1G   -

Interpretation: logicalused reflects uncompressed logical size; used is physical space consumed by this dataset and its descendants. With clones, “used” is not “unique.”

Task 13: Show unique vs shared usage for a dataset (quick estimate)

cr0x@server:~$ zfs list -o name,used,refer,logicalrefer tank/app tank/app-clone-dev
NAME              USED  REFER  LOGICALREFER
tank/app          12.3G  9.8G   13.9G
tank/app-clone-dev 3.1G  11.0G  15.1G

Interpretation: refer is the amount accessible from that dataset; clones can show large refer because they reference shared blocks. It doesn’t mean they uniquely consume that much space.

Task 14: Use holds to intentionally pin a snapshot (controlled dependency)

cr0x@server:~$ zfs snapshot tank/app@forensics-keep
cr0x@server:~$ zfs hold case-INC123 tank/app@forensics-keep
cr0x@server:~$ zfs destroy tank/app@forensics-keep
cannot destroy snapshot 'tank/app@forensics-keep': dataset is busy
cr0x@server:~$ zfs holds tank/app@forensics-keep
NAME                  TAG           TIMESTAMP
tank/app@forensics-keep  case-INC123  Thu Dec 25 09:41 2025

Interpretation: Holds are a deliberate way to prevent deletion. Compared to clone dependencies, holds are explicit and easy to audit.

Task 15: Release the hold and destroy the snapshot

cr0x@server:~$ zfs release case-INC123 tank/app@forensics-keep
cr0x@server:~$ zfs destroy tank/app@forensics-keep

Interpretation: This is the cleanup path you want: explicit lifecycle control.

Task 16: Replication sanity check: list snapshots to send

cr0x@server:~$ zfs list -t snapshot -o name -s creation -r tank/app | tail -n 5
tank/app@daily-2025-12-21
tank/app@daily-2025-12-22
tank/app@daily-2025-12-23
tank/app@daily-2025-12-24
tank/app@pre-change-001

Interpretation: If clones are involved, ensure your send/receive plan matches the snapshot graph you intend to preserve. The “right” approach varies by whether you want to replicate clones, promote on target, or keep only primaries.
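One way to make that plan explicit is a dry run that turns an ordered snapshot list into the send commands you intend to execute. This is a sketch under assumptions: the snapshot names and the `backup/app` target dataset are invented, and the real list would come from `zfs list -H -t snapshot -o name -s creation -r tank/app`.

```shell
# Print (not execute) a full send for the first snapshot, then incremental
# sends (-i) for each subsequent one. "backup/app" is a hypothetical target.
plan_sends() {
  prev=''
  while IFS= read -r snap; do
    [ -z "$snap" ] && continue
    if [ -z "$prev" ]; then
      # first snapshot: full send
      echo "zfs send $snap | zfs receive -u backup/app"
    else
      # later snapshots: incremental from the previous one
      echo "zfs send -i $prev $snap | zfs receive -u backup/app"
    fi
    prev=$snap
  done
}

plan_sends <<'EOF'
tank/app@daily-2025-12-22
tank/app@daily-2025-12-23
tank/app@daily-2025-12-24
EOF
```

Reviewing the printed plan before piping anything anywhere is exactly the kind of boring step that prevents replicating a lineage you didn't mean to preserve.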

Fast diagnosis playbook

This is the “it’s slow and the pool is filling” playbook I actually want on-call engineers to follow. Not because it’s perfect, but because it forces the right questions in the right order.

1) First: confirm it’s not a pool health or capacity cliff

cr0x@server:~$ zpool status -x
all pools are healthy
cr0x@server:~$ zpool list -o name,size,alloc,free,cap,frag,health
NAME  SIZE  ALLOC  FREE  CAP  FRAG  HEALTH
tank  7.25T 6.68T 576G   92%  41%   ONLINE

Interpretation: If CAP is above ~85–90%, expect fragmentation and CoW overhead to amplify latency. Clones increase the odds you drift here because they encourage “fast provisioning” without equal focus on “fast reclamation.”
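That capacity check is easy to automate. A minimal sketch, assuming machine-readable input in the shape of `zpool list -H -o name,cap` (the sample `printf` line stands in for the real command):

```shell
# Warn when pool capacity crosses a threshold percentage.
check_cap() {  # stdin: name<TAB>cap% ; $1: threshold percent
  awk -F'\t' -v t="$1" '
    { cap = $2; sub(/%/, "", cap) }              # strip the % sign
    cap + 0 >= t { printf "WARNING: %s at %s%% (threshold %s%%)\n", $1, cap, t }
  '
}

# Sample line standing in for: zpool list -H -o name,cap tank
printf 'tank\t92%%\n' | check_cap 85
```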

2) Second: find what’s pinning snapshots and blocking deletion

cr0x@server:~$ zfs list -t snapshot -o name,used,refer -r tank/app | head
NAME                     USED  REFER
tank/app@daily-2025-12-10 0B   8.9G
tank/app@daily-2025-12-11 0B   9.0G
cr0x@server:~$ zfs destroy tank/app@daily-2025-12-10
cannot destroy snapshot tank/app@daily-2025-12-10: snapshot has dependent clones

Interpretation: Now you know why retention isn’t working. Next step is to identify the clones (or holds) responsible.

3) Third: measure where space is going (dataset vs snapshots vs children)

cr0x@server:~$ zfs list -o name,used,usedbydataset,usedbysnapshots,usedbychildren -r tank/app
NAME      USED  USEDBYDATASET  USEDBYSNAPSHOTS  USEDBYCHILDREN
tank/app  12.3G 6.8G           2.4G             3.1G

Interpretation: If usedbychildren is high, you likely have clones (or other child datasets) consuming divergent data. If usedbysnapshots is high, snapshots are holding blocks—possibly because clones force their existence.

4) Fourth: identify which clones are “active” vs abandoned

cr0x@server:~$ zfs list -H -t filesystem,volume -o name,origin,used,creation -r tank | awk -F'\t' '$2 != "-"' | head
tank/app-clone-qa     tank/app@daily-2025-12-21  3.1G  Mon Dec 22 02:10 2025
tank/vm/vm-123-disk0  tank/vm/golden@v1          9.4G  Thu Dec 25 09:33 2025

Interpretation: Create a “who owns this?” list quickly. If you don’t have ownership metadata, you’re doing archeology under pressure.
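If you do store ownership as user properties, flagging orphans is a one-liner away. A sketch under assumptions: each sample row stands in for a clone name, its origin, and the value of a user property such as `org:owner`, where `-` is what `zfs get -H -o value` prints when the property is unset.

```shell
# Flag clones that have no recorded owner.
flag_unowned() {  # stdin: name<TAB>origin<TAB>owner
  awk -F'\t' '$3 == "-" { print "UNOWNED: " $1 " (origin " $2 ")" }'
}

# Sample rows standing in for a joined zfs list / zfs get report.
printf '%s\t%s\t%s\n' \
  'tank/app-clone-qa'    'tank/app@daily-2025-12-21' 'qa-team' \
  'tank/vm/vm-123-disk0' 'tank/vm/golden@v1'         '-' \
  | flag_unowned
```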

5) Fifth: if performance is the symptom, check I/O and TXG pressure

cr0x@server:~$ zpool iostat -v tank 1 5
                              capacity     operations     bandwidth
pool                        alloc   free   read  write   read  write
--------------------------  -----  -----  -----  -----  -----  -----
tank                        6.68T   576G    820   2400   65.2M  210M
  mirror-0                  6.68T   576G    820   2400   65.2M  210M
    sda                         -      -    410   1200   32.6M  105M
    sdb                         -      -    410   1200   32.6M  105M
--------------------------  -----  -----  -----  -----  -----  -----

Interpretation: Clones don’t inherently make I/O slow, but clone-heavy environments often mean more snapshots, more metadata, and more random writes (especially with VM zvols). If the pool is near full, the same workload becomes slower.

Common mistakes: symptoms and fixes

Mistake 1: Treating clones as independent copies

Symptom: You delete a snapshot and ZFS refuses: “snapshot has dependent clones.” Or worse: you’re afraid to delete anything because “it might break the clone.”

Fix: Inventory clones and their origins. Decide whether the clone is disposable. If disposable, destroy it. If not, consider zfs promote (planned) or migrate it to an independent dataset (send/receive, or full copy) and then remove the dependency.

Mistake 2: Clone sprawl with no TTL

Symptom: Snapshot retention never shrinks; pool usage trends upward even after deleting data. “Temporary” datasets are older than some employees.

Fix: Enforce expiration with user properties (owner, ticket, expires). Report on clones weekly. Destroy expired clones automatically after notification.
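The expiry report can be a few lines of shell. A sketch, assuming ISO dates (YYYY-MM-DD, which compare correctly as plain strings) stored in a user property like `org:expires`; the sample rows stand in for `zfs get -H -o name,value org:expires` run per clone, and `today` is pinned so the example is deterministic (normally you'd use `today=$(date +%F)`).

```shell
today='2025-12-25'

# Report clones whose expiry date has passed ("-" means no expiry set).
expired_clones() {  # stdin: name<TAB>expires
  awk -F'\t' -v today="$today" \
    '$2 != "-" && $2 < today { print "EXPIRED: " $1 " (" $2 ")" }'
}

# Sample rows standing in for per-clone zfs get output.
printf '%s\t%s\n' \
  'tank/app-clone-ci-4512' '2026-01-05' \
  'tank/app-clone-sandbox' '2025-12-01' \
  | expired_clones
```

Pipe the report into notification first, automated destroy second; destroying on first sight of an expiry date is how you learn who never updated their properties.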

Mistake 3: Promoting a clone in the middle of an incident without understanding lineage

Symptom: Replication scripts fail, monitoring labels flip, or someone discovers that the “original” dataset now has an origin property and is no longer the primary.

Fix: Before promoting: capture zfs get origin across involved datasets, document intended outcomes, and coordinate with replication owners. After promoting: update operational docs and any automation that assumes dataset roles.

Mistake 4: Expecting snapshot deletion to reclaim space immediately

Symptom: You destroy snapshots and the pool stays full. Or you delete files and nothing changes.

Fix: Check for dependent clones, snapshot holds, and other datasets referencing the blocks. Use usedbysnapshots and usedbychildren to understand who is actually pinning space.

Mistake 5: Using clones for long-lived databases without watching write amplification

Symptom: The clone’s used grows rapidly, performance becomes erratic, and you see increased fragmentation over time.

Fix: For write-heavy workloads, clones are still fine, but treat them as real datasets with real growth. Apply quotas, monitor used and pool headroom, and consider refreshing clones from new snapshots periodically rather than keeping them forever.

Mistake 6: Confusing “refer” with “unique space consumed”

Symptom: A clone shows refer=10T and someone panics, assuming it consumes 10T.

Fix: Teach the team that refer is accessible data, not unique physical allocation. Use used breakdown and pool allocation to assess actual capacity risk.

Checklists / step-by-step plan

Checklist A: Safe workflow for creating clones in production

  1. Create a named snapshot with a timestamp and purpose (e.g., @template-v3 or @pre-change-INC123).
  2. Create the clone with a name that encodes owner/service.
  3. Set user properties on the clone: owner, ticket, expiry.
  4. Apply quotas/reservations where appropriate so clones can’t silently eat the pool.
  5. Document origin snapshot in the ticket; treat it as a dependency like a database schema version.
cr0x@server:~$ zfs snapshot tank/app@template-v3
cr0x@server:~$ zfs clone tank/app@template-v3 tank/app-clone-ci-4512
cr0x@server:~$ zfs set org:owner=ci-team tank/app-clone-ci-4512
cr0x@server:~$ zfs set org:ticket=CI-4512 tank/app-clone-ci-4512
cr0x@server:~$ zfs set org:expires=2026-01-05 tank/app-clone-ci-4512
cr0x@server:~$ zfs set quota=200G tank/app-clone-ci-4512
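Steps 1-4 above can be wrapped so nobody can create an untagged clone by accident. A minimal sketch: the `zfs` shell function below is an echo stub (so the example runs anywhere); remove it to execute against a real pool, and note that `make_clone` and the `org:*` property names are local conventions, not ZFS built-ins.

```shell
# Stub so the sketch is runnable without a pool; delete to run for real.
zfs() { echo "zfs $*"; }

# Refuse to create a clone unless owner, ticket, and expiry are supplied.
make_clone() {  # usage: make_clone <snapshot> <clone> <owner> <ticket> <expires>
  if [ -z "$3" ] || [ -z "$4" ] || [ -z "$5" ]; then
    echo "refusing: clone needs owner, ticket, and expiry" >&2
    return 1
  fi
  zfs clone "$1" "$2" &&
  zfs set "org:owner=$3"   "$2" &&
  zfs set "org:ticket=$4"  "$2" &&
  zfs set "org:expires=$5" "$2"
}

make_clone tank/app@template-v3 tank/app-clone-ci-4512 ci-team CI-4512 2026-01-05
```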

Checklist B: Safe cleanup when snapshots won’t delete

  1. Attempt to destroy snapshot and confirm the error is dependent clones (not holds).
  2. List clones with matching origin.
  3. Classify clones: disposable, must-keep, unknown owner.
  4. For disposable clones: destroy them first.
  5. For must-keep clones: decide between promotion or migration to new lineage.
  6. After dependency removed: destroy snapshot, confirm space reclaimed trends.
cr0x@server:~$ zfs destroy tank/app@daily-2025-12-10
cannot destroy snapshot tank/app@daily-2025-12-10: snapshot has dependent clones
cr0x@server:~$ zfs list -t filesystem,volume -o name,origin -r tank | grep 'tank/app@daily-2025-12-10'
tank/app-clone-sandbox  tank/app@daily-2025-12-10
cr0x@server:~$ zfs get org:owner,org:expires tank/app-clone-sandbox
NAME                    PROPERTY     VALUE       SOURCE
tank/app-clone-sandbox  org:owner    alice       local
tank/app-clone-sandbox  org:expires  2025-12-01  local
cr0x@server:~$ zfs destroy tank/app-clone-sandbox
cr0x@server:~$ zfs destroy tank/app@daily-2025-12-10

Checklist C: Promotion decision tree (make it explicit)

  1. Do you want the clone to become the primary lineage? If yes, promotion is reasonable.
  2. Will replication or automation assume the original dataset is primary? If yes, update automation first or choose migration instead.
  3. Do you need to preserve snapshot history on the original? If yes, carefully review snapshot sets and test send/receive behavior in staging.
  4. Do you have enough free space? Promotions are metadata operations, but the cleanup you’re trying to do may require headroom to complete safely.

FAQ

1) What’s the difference between a snapshot and a clone in ZFS?

A snapshot is read-only and represents a point-in-time view. A clone is a writable dataset or zvol created from a snapshot. The clone depends on the snapshot until you destroy the clone or change lineage via promotion.

2) Why can’t I delete a snapshot even though I don’t need it?

Because at least one clone depends on it. That snapshot contains references to blocks the clone may still be using. ZFS prevents you from deleting it to avoid freeing in-use blocks.

3) How do I find which clone is blocking snapshot deletion?

List datasets and volumes with their origin property and match the snapshot name.

cr0x@server:~$ zfs list -t filesystem,volume -o name,origin -r tank | grep 'tank/app@daily-2025-12-10'

4) Do clones consume space?

They consume space as they diverge from the origin snapshot. Initially they share blocks, so creation is cheap. Over time, writes allocate new blocks, and those blocks are unique to the clone (unless additionally referenced elsewhere).

5) Can I replicate clones with zfs send?

Yes, but you need a clear plan. Replicating a dataset that has clones can require replicating the relevant snapshots and preserving the lineage expected on the receiving side. If you only need an independent copy, consider sending a snapshot stream into a new dataset and not recreating the clone topology.

6) Should I use clones for database environments?

Often yes for fast refreshes, but set quotas and monitor growth. Databases write a lot of new blocks; a clone can quickly become “nearly full size” compared to its origin depending on churn. The win is provisioning speed and rollback convenience, not guaranteed long-term space savings.

7) What does zfs promote actually do?

It changes which dataset is considered the origin in the clone relationship. Practically, it makes the selected clone independent of its origin snapshot, and it rearranges ancestry so you can remove dependencies. It doesn’t “copy all data,” but it absolutely changes how future snapshot deletion and replication behave.

8) Are ZFS clones safe to use in production?

Yes—if you manage lifecycle and dependencies. The risk isn’t data integrity; ZFS is doing exactly what it promises. The risk is operational: hidden retention, space pinning, and surprise promotions. Treat clones as first-class assets with ownership and expiry.

9) Why did freeing files not free pool space?

Because the deleted blocks are still referenced by snapshots and/or clones. Check usedbysnapshots, dependent clones, and snapshot holds. Space is freed only when the last reference disappears.

10) Is there a safer alternative to clones for “copying” data?

If you need an independent copy without dependencies, do a full copy or use zfs send | zfs receive into a separate dataset/pool. It’s not instant, but it removes the shared-block dependency chain that clones introduce.

Conclusion

ZFS clones are one of the best “move fast without breaking storage” tools we have—until you treat them like normal copies. They’re instant because they’re linked. That link is powerful when you control it, and expensive when you forget it exists.

If you remember only three things, make them these: clones pin snapshots, pinned snapshots pin space, and unmanaged space eventually pins your weekend. Put owners and expirations on clones, understand promotion before you need it, and keep enough free headroom that CoW can do its job. That’s how clones stay a superpower instead of a crime scene.
