ZFS Incremental Send: Backups That Don’t Recopy Everything

The first time you watch a ZFS incremental send finish in seconds—after last night’s full backup took hours—you get that rare feeling in infrastructure:
something is both elegant and practical. ZFS snapshots give you immutable points-in-time, and zfs send/zfs receive can replicate
only what changed between those snapshots. The payoff is obvious: faster backups, less network traffic, and a recovery story that doesn’t start with “sorry, we’re still copying.”

But incremental replication is not magic. It’s a contract: your snapshot chain, dataset properties, and receiver state must line up. When they don’t,
ZFS will refuse to guess. That’s good engineering, and occasionally a bad day for whoever assumed “incremental” means “it’ll just work.”

What “incremental send” actually means

A ZFS “send stream” is a serialized representation of a dataset (or snapshot) that can be reconstructed on another pool with zfs receive.
A full send streams the entire snapshot. An incremental send streams only the blocks that changed between two snapshots—“from A to B.”

Incremental sends aren’t file-by-file diffs. ZFS works at the block level. If a file changes in a way that rewrites blocks, those blocks are sent.
If nothing changed, nothing is sent. If a workload rewrites blocks wholesale (say, a VM disk image where a small guest-side change touches large swaths of blocks),
you can still send a lot of data incrementally. “Incremental” is about changed blocks, not changed intentions.

There are two main incremental modes you’ll use:

  • Snapshot-to-snapshot incremental: zfs send -i pool/fs@old pool/fs@new sends changes from @old to @new.
  • Incremental with intermediates: zfs send -I pool/fs@base pool/fs@new includes every intermediate snapshot between @base and @new in the stream.

That -I (capital i) matters when you want the receiver to end up with the same snapshot set—not just the latest one. In most backup
operations, snapshot history is part of the product.
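
A quick sketch of the difference (names are illustrative; assume snapshots @A through @D exist on the sender and @A already exists on the receiver):

# -i: the receiver ends up with @D only (plus whatever it already had)
zfs send -i pool/fs@A pool/fs@D | zfs receive -u backuppool/fs

# -I: the receiver ends up with @B, @C, and @D as well
zfs send -I pool/fs@A pool/fs@D | zfs receive -u backuppool/fs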

Joke #1: Incremental backups are like dieting—if you “cheat” by deleting the base snapshot, the math stops working, and you’ll feel it immediately.

Interesting facts and context (the stuff that changes your decisions)

  1. ZFS snapshots are cheap because they’re metadata, not copies. The cost shows up later as blocks diverge and must be retained for older snapshots.
  2. Send streams are deterministic for a given dataset state, but not necessarily stable across features. Enabling new pool/dataset features can affect compatibility with older receivers.
  3. OpenZFS unifies multiple platform lineages. The “ZFS” you run on Linux today is a descendant of Solaris ZFS, with years of feature development and operational hardening.
  4. Large record sizes can make incrementals surprisingly big. If your dataset uses recordsize=1M and a workload rewrites small regions randomly, you may churn large blocks.
  5. Resumable send exists because networks are unreliable. Resume tokens allow you to continue a receive after interruption, rather than restarting a multi-terabyte stream.
  6. ZFS replication can preserve properties, ACLs, and xattrs—but only if you ask. zfs send -p includes properties in the stream, while zfs receive -o and -x can override or exclude them on the target.
  7. Bookmarks were introduced to keep incremental bases without keeping full snapshots. They’re lightweight references to a snapshot state, useful in replication pipelines.
  8. Encrypted replication has two modes: raw and non-raw. “Raw” replication keeps the encryption intact and avoids decrypting/re-encrypting on the sender.
  9. zfs receive is intentionally strict about lineage. If the receiver’s last snapshot doesn’t match the sender’s base snapshot, ZFS won’t “best effort” your data.

The operational mental model: snapshots, streams, and lineage

If you remember one concept, make it lineage. Incremental send works only when the receiver has a snapshot that is byte-for-byte the same as the sender’s base snapshot.
That’s why snapshot naming conventions and retention policies are not “nice to have”—they’re structural.

1) The chain: base → changes → new

When you run:

cr0x@server:~$ zfs send -i tank/app@2025-12-24_0000 tank/app@2025-12-25_0000 | zfs receive -u backup/app

ZFS computes which blocks are different between those snapshots and streams them. On the receiver, those blocks are applied to reconstruct @2025-12-25_0000.
If backup/app does not already have @2025-12-24_0000 with matching GUID lineage, the receive will fail.
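
A quick way to verify that lineage before sending is to compare the base snapshot’s GUID on both ends; the two values must be identical (a sketch using the names above):

zfs get -H -o value guid tank/app@2025-12-24_0000
ssh backup zfs get -H -o value guid backup/app@2025-12-24_0000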

2) Snapshot sets vs “just the latest”

A common backup goal is: “keep a week of hourlies and a month of dailies.” With ZFS, that often means you want the receiver to also have those snapshots,
not just the head state. That’s where -I helps: it can send a range that includes intermediate snapshots, preserving the set.

3) Properties, mount behavior, and why your backup server suddenly mounted production datasets

Receives can create datasets and snapshots. By default, a received filesystem may mount according to inherited properties. On a backup host, you usually
want replicated filesystems to stay unmounted to avoid accidental access patterns, indexing, or “helpful” agents scanning them.

That’s why you’ll see zfs receive -u everywhere in sane playbooks: it receives the dataset but does not mount it.
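
One way to make that invariant stick on the receiver side is to mark the replication root (a sketch; backup/app follows the examples later in this article):

zfs set canmount=off backup/app
zfs set readonly=on backup/app

readonly is inherited by received children; canmount is not, so pair it with zfs receive -u (or a neutral mountpoint) so descendants stay quiet too.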

Designing a replication workflow that survives real life

A workable replication design answers four questions:

  • What is the unit of replication? A single dataset, or a subtree with children (e.g., tank/services plus all descendants).
  • How do we name snapshots? So automation can compute the “last common snapshot” and humans can triage failures quickly.
  • How do we handle interruptions? Resumable receives, timeouts, and monitoring.
  • What are our invariants? “Backups are unmounted,” “we replicate properties,” “we don’t replicate temporary datasets,” and so on.

Snapshot naming that operators can parse at 03:00

Avoid clever. Use sortable timestamps and a stable prefix. For example:
auto-2025-12-25_0100, auto-2025-12-25_0200.
If you maintain multiple policies, include the policy name:
hourly-2025-12-25_0200, daily-2025-12-25.
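
A sketch of generating that kind of name in automation (UTC avoids daylight-saving surprises):

SNAP="hourly-$(date -u +%Y-%m-%d_%H%M)"
zfs snapshot -r tank/app@"${SNAP}"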

Raw encrypted replication: when backups shouldn’t see plaintext

If your datasets are encrypted and you want your backup host to store ciphertext (and not require keys), you’re looking for raw sends:
zfs send -w (or --raw, depending on your platform). This keeps the on-disk encryption representation intact.
It’s a huge operational win: your backup system can be treated as “storage,” not “trusted compute.”

Replication of children: one stream, many datasets

Use -R (replication stream) when you want to replicate a dataset and its descendants, plus properties and snapshots. It’s the “bring the whole tree”
option. It’s also the option that can surprise you if you didn’t mean to replicate that one nested dataset that someone created for “temporary testing.”
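
Before the first -R send, it’s worth auditing exactly what would ride along (a sketch using the tank/services example):

zfs list -r -o name,used,mountpoint tank/services

If a “temporary testing” dataset shows up in that list, move or destroy it before it becomes part of your backup product.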

Bandwidth and CPU: your pipeline is a system

zfs send produces a stream; how you transport it matters. SSH encryption can become your bottleneck before disks do.
Compression can help if your data compresses and your CPU is cheaper than your WAN. Buffering can smooth bursts and keep both ends busy.
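
A buffering sketch, assuming mbuffer is installed on both ends and using the prod/backup hosts and snapshot names from the tasks below (buffer sizes are illustrative, not tuned values):

zfs send -R -i tank/app@auto-2025-12-25_0100 tank/app@auto-2025-12-25_0200 \
  | mbuffer -q -s 128k -m 1G \
  | ssh backup 'mbuffer -q -s 128k -m 1G | zfs receive -u backup/app'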

Joke #2: SSH is great until you realize your “backup appliance” is actually a space heater with a network port.

Practical tasks: commands you will actually run

The commands below assume two hosts: a source (prod) and a backup receiver (backup).
Adjust pool and dataset names to match your environment.

Task 1: Confirm dataset layout and basic properties

cr0x@prod:~$ zfs list -o name,used,avail,refer,mountpoint,compression,recordsize -r tank/app
NAME              USED  AVAIL  REFER  MOUNTPOINT        COMPRESS  RECSIZE
tank/app          220G   1.4T   180G  /srv/app          lz4       128K
tank/app/db        40G   1.4T    40G  /srv/app/db       lz4       16K
tank/app/uploads   60G   1.4T    60G  /srv/app/uploads  lz4       128K

Interpretation: replication decisions are per dataset. Here, db is tuned with smaller recordsize.
If you replicate the parent with -R, you get children too—good when intentional.

Task 2: Create a snapshot (single dataset)

cr0x@prod:~$ SNAP=auto-2025-12-25_0000
cr0x@prod:~$ zfs snapshot tank/app@${SNAP}

Interpretation: snapshots are instantaneous. The dataset keeps changing after the snapshot; the snapshot is a consistent point-in-time view.

Task 3: Create recursive snapshots (dataset + children)

cr0x@prod:~$ SNAP=auto-2025-12-25_0000
cr0x@prod:~$ zfs snapshot -r tank/app@${SNAP}

Interpretation: every descendant gets a snapshot with the same name. That symmetry makes replication scripting much easier.

Task 4: Do the first full replication to an empty receiver

cr0x@prod:~$ SNAP=auto-2025-12-25_0000
cr0x@prod:~$ zfs send -R tank/app@${SNAP} | ssh backup 'zfs receive -u -F backup/app'

Interpretation: -R replicates the dataset tree. receive -u keeps it unmounted. -F forces a rollback on the receiver if needed.
Use -F only when you understand the blast radius; it can discard receiver-side changes.

Task 5: Perform an incremental replication to the next snapshot

cr0x@prod:~$ OLD=auto-2025-12-25_0000
cr0x@prod:~$ NEW=auto-2025-12-25_0100
cr0x@prod:~$ zfs snapshot -r tank/app@${NEW}
cr0x@prod:~$ zfs send -R -i tank/app@${OLD} tank/app@${NEW} | ssh backup 'zfs receive -u backup/app'

Interpretation: this sends only changes since @OLD while maintaining the replicated dataset structure.
If the receiver is missing @OLD (or has a different lineage), this fails.

Task 6: Use -I to include intermediate snapshots

cr0x@prod:~$ BASE=auto-2025-12-25_0000
cr0x@prod:~$ HEAD=auto-2025-12-25_0600
cr0x@prod:~$ zfs send -R -I tank/app@${BASE} tank/app@${HEAD} | ssh backup 'zfs receive -u backup/app'

Interpretation: the receiver ends up with every snapshot between @BASE and @HEAD that exists on the sender, not just @HEAD.
This is how you keep snapshot history aligned.

Task 7: Confirm snapshots exist and line up on both ends

cr0x@prod:~$ zfs list -t snapshot -o name,creation -s creation -r tank/app | tail -n 5
tank/app@auto-2025-12-25_0200  Wed Dec 25 02:00 2025
tank/app@auto-2025-12-25_0300  Wed Dec 25 03:00 2025
tank/app@auto-2025-12-25_0400  Wed Dec 25 04:00 2025
tank/app@auto-2025-12-25_0500  Wed Dec 25 05:00 2025
tank/app@auto-2025-12-25_0600  Wed Dec 25 06:00 2025
cr0x@backup:~$ zfs list -t snapshot -o name,creation -s creation -r backup/app | tail -n 5
backup/app@auto-2025-12-25_0200  Wed Dec 25 02:01 2025
backup/app@auto-2025-12-25_0300  Wed Dec 25 03:01 2025
backup/app@auto-2025-12-25_0400  Wed Dec 25 04:01 2025
backup/app@auto-2025-12-25_0500  Wed Dec 25 05:01 2025
backup/app@auto-2025-12-25_0600  Wed Dec 25 06:01 2025

Interpretation: timestamps won’t match exactly, but snapshot names should. When they drift, your next incremental send is where you find out.
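
A sketch for spotting drift before it bites, assuming bash on the sender (the sed rewrites tank/app names to backup/app so the two lists compare cleanly):

diff \
  <(zfs list -H -t snapshot -o name -s creation -r tank/app | sed 's|^tank/app|backup/app|') \
  <(ssh backup zfs list -H -t snapshot -o name -s creation -r backup/app)

An empty diff means the snapshot sets line up; extra lines on either side tell you exactly which end drifted.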

Task 8: Estimate send size before committing (useful for WAN links)

cr0x@prod:~$ zfs send -n -v -i tank/app@auto-2025-12-25_0500 tank/app@auto-2025-12-25_0600
send from @auto-2025-12-25_0500 to tank/app@auto-2025-12-25_0600 estimated size is 3.14G
total estimated size is 3.14G

Interpretation: -n is a dry run; -v prints estimates. Treat as an estimate—compression, recordsize, and stream content can skew reality.

Task 9: Add compression and buffering in the transport pipeline

cr0x@prod:~$ OLD=auto-2025-12-25_0500
cr0x@prod:~$ NEW=auto-2025-12-25_0600
cr0x@prod:~$ zfs send -R -i tank/app@${OLD} tank/app@${NEW} \
  | lz4 -z \
  | ssh backup 'lz4 -d | zfs receive -u backup/app'

Interpretation: if your data is already compressed (media, backups of backups), this may waste CPU and reduce throughput.
If your bottleneck is WAN bandwidth, it can be a big win. Measure, don’t vibe.

Task 10: Encrypted dataset replication with raw send

cr0x@prod:~$ zfs get -H -o property,value encryption tank/secret
encryption	aes-256-gcm
cr0x@prod:~$ SNAP=auto-2025-12-25_0000
cr0x@prod:~$ zfs snapshot tank/secret@${SNAP}
cr0x@prod:~$ zfs send -w tank/secret@${SNAP} | ssh backup 'zfs receive -u backup/secret'

Interpretation: the receiver stores encrypted data and does not need the encryption key to hold it.
Your restore workflow needs to account for where keys live and how you’ll load them.
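
A sketch of the key-loading step on a trusted restore host (dataset name follows the example above; the key path is hypothetical, and depending on how the dataset was created you may need to set keylocation before loading):

zfs set keylocation=file:///etc/zfs/keys/secret.key backup/secret   # hypothetical key path
zfs load-key backup/secret
zfs mount backup/secret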

Task 11: Use bookmarks to preserve an incremental base without keeping old snapshots

cr0x@prod:~$ zfs snapshot tank/app@auto-2025-12-25_0700
cr0x@prod:~$ zfs bookmark tank/app@auto-2025-12-25_0700 tank/app#base-0700
cr0x@prod:~$ zfs list -t bookmark -o name,creation -r tank/app
NAME                   CREATION
tank/app#base-0700      Wed Dec 25 07:00 2025

Interpretation: bookmarks are lightweight “anchors” to a snapshot state. They can be used as incremental bases in many workflows,
allowing you to delete older snapshots while keeping replication continuity—when your platform supports it end-to-end.
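
A sketch of using that bookmark later, assuming @auto-2025-12-25_0700 was already replicated to the receiver and then pruned on the sender:

zfs snapshot tank/app@auto-2025-12-25_0800
zfs send -i tank/app#base-0700 tank/app@auto-2025-12-25_0800 | ssh backup 'zfs receive -u backup/app'

Note that a bookmark can only serve as the source of -i; you can’t send a bookmark itself, and -I needs real snapshots.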

Task 12: Handle an interrupted receive with resume tokens

cr0x@backup:~$ zfs get -H -o value receive_resume_token backup/app
1-7d3f2b8c2e-120-789c...
cr0x@prod:~$ ssh backup 'zfs get -H -o value receive_resume_token backup/app'
1-7d3f2b8c2e-120-789c...
cr0x@prod:~$ TOKEN=$(ssh backup 'zfs get -H -o value receive_resume_token backup/app')
cr0x@prod:~$ zfs send -t ${TOKEN} | ssh backup 'zfs receive -s -u backup/app'

Interpretation: you resume the stream from where it stopped, instead of restarting. A resume token only exists if the interrupted receive was started with
zfs receive -s, so include -s in the receives you care about. This is one of those features you don’t appreciate until a flaky link drops at 97%.

Task 13: Safely prune snapshots on the sender after successful replication

cr0x@prod:~$ KEEP=auto-2025-12-25_0600
cr0x@prod:~$ zfs list -H -t snapshot -o name -s creation -r tank/app | head
tank/app@auto-2025-12-24_0000
tank/app@auto-2025-12-24_0100
tank/app@auto-2025-12-24_0200
cr0x@prod:~$ zfs destroy tank/app@auto-2025-12-24_0000
cr0x@prod:~$ zfs destroy -r tank/app@auto-2025-12-24_0100

Interpretation: snapshot deletion is permanent. In production, snapshot pruning should be automated but conservative, ideally gated by:
“receiver has snapshot X” and “last replication succeeded.” If you delete the only common base, your next incremental becomes a full.
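
A minimal guard, as a sketch: confirm the receiver still has the base you intend to keep before destroying anything on the sender.

BASE=auto-2025-12-25_0600
if ssh backup zfs list -H -t snapshot -o name "backup/app@${BASE}" >/dev/null 2>&1; then
  zfs destroy -r "tank/app@auto-2025-12-24_0000"
else
  echo "receiver missing @${BASE}; refusing to prune" >&2
fi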

Task 14: Verify integrity with a scrub and check for errors

cr0x@backup:~$ zpool scrub backup
cr0x@backup:~$ zpool status -v backup
  pool: backup
 state: ONLINE
  scan: scrub in progress since Wed Dec 25 09:12:48 2025
  1.23T scanned at 2.11G/s, 410G issued at 701M/s, 3.02T total
config:

        NAME        STATE     READ WRITE CKSUM
        backup      ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            sda     ONLINE       0     0     0
            sdb     ONLINE       0     0     0
            sdc     ONLINE       0     0     0
errors: No known data errors

Interpretation: replication moves data; scrubs validate it at rest. Backups without periodic integrity checks are just expensive optimism.

Fast diagnosis playbook (find the bottleneck in minutes)

When replication is slow or failing, your job is to avoid the 90-minute detour where everyone argues about “the network” while the real culprit is a single CPU core pegged by SSH.
This is the sequence I use because it converges quickly.

First: is it failing fast because of lineage?

Check the error message and confirm the receiver has the base snapshot (or correct snapshot chain).

cr0x@backup:~$ zfs list -t snapshot -o name -r backup/app | tail -n 10
backup/app@auto-2025-12-25_0300
backup/app@auto-2025-12-25_0400
backup/app@auto-2025-12-25_0500
backup/app@auto-2025-12-25_0600

If the receiver is missing the base snapshot, the fix is not “retry harder.” It’s “send a full” or “send an incremental from the last common snapshot.”
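
A sketch for computing the last common snapshot, assuming bash and the sortable naming convention from earlier (so the lexically greatest common name is also the newest):

comm -12 \
  <(zfs list -H -t snapshot -o name -d 1 tank/app | awk -F@ '{print $2}' | sort) \
  <(ssh backup zfs list -H -t snapshot -o name -d 1 backup/app | awk -F@ '{print $2}' | sort) \
  | tail -n 1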

Second: is the receiver blocked on disk I/O?

Replication is write-heavy on the receiver. If the backup pool is slow, every tuning attempt upstream is lipstick on a forklift.

cr0x@backup:~$ zpool iostat -v 1 5
                              capacity     operations     bandwidth
pool                        alloc   free   read  write   read  write
--------------------------  -----  -----  -----  -----  -----  -----
backup                       3.2T  5.1T      0    820      0  210M
  raidz1-0                   3.2T  5.1T      0    820      0  210M
    sda                          -      -      0    275      0   70M
    sdb                          -      -      0    280      0   71M
    sdc                          -      -      0    265      0   69M
--------------------------  -----  -----  -----  -----  -----  -----

If write bandwidth is low and latency is high (zpool iostat -l or iostat -x will show it), the receiver is your limiter.
Common reasons: RAIDZ parity overhead on small writes, SMR drives, a busy pool, or a badly undersized SLOG for sync-heavy workloads.

Third: is the sender blocked on reading?

cr0x@prod:~$ zpool iostat -v 1 5
                              capacity     operations     bandwidth
pool                        alloc   free   read  write   read  write
--------------------------  -----  -----  -----  -----  -----  -----
tank                         5.8T  2.1T    620      5  640M   2.1M
  mirror-0                   1.9T   700G   210      2  220M   900K
    nvme0n1                      -      -   210      2  220M   900K
    nvme1n1                      -      -   210      2  220M   900K
--------------------------  -----  -----  -----  -----  -----  -----

If reads are slow, check if the sender pool is under load (apps, compactions, scrubs) or if you’re thrashing ARC.

Fourth: is the transport pipeline the bottleneck (SSH, compression, buffering)?

Watch CPU and throughput during a send. A classic symptom is one CPU core pinned and the stream capped at a suspiciously consistent rate.

cr0x@prod:~$ ps -eo pid,comm,%cpu,args | egrep 'zfs|ssh|lz4' | head
21433 zfs      35.2 zfs send -R -i tank/app@auto-2025-12-25_0500 tank/app@auto-2025-12-25_0600
21434 lz4      98.7 lz4 -z
21435 ssh      64.1 ssh backup lz4 -d | zfs receive -u backup/app

If compression is pegging CPU and not reducing bytes meaningfully, remove it. If SSH is pegging CPU, consider faster ciphers (where policy allows)
or moving replication to a dedicated network with IPsec offload—anything that relocates crypto cost.
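
To separate “sender read speed” from “transport speed,” a sketch assuming pv is installed (the second command discards the stream on the far end, so receiver disks are out of the picture):

zfs send -i tank/app@auto-2025-12-25_0500 tank/app@auto-2025-12-25_0600 | pv > /dev/null
zfs send -i tank/app@auto-2025-12-25_0500 tank/app@auto-2025-12-25_0600 | pv | ssh backup 'cat > /dev/null'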

Fifth: is it stuck due to a partial receive?

cr0x@backup:~$ zfs get -H -o property,value receive_resume_token backup/app
receive_resume_token  1-7d3f2b8c2e-120-789c...

If a resume token exists, you likely have an interrupted receive that must be resumed or aborted before new replication proceeds cleanly.
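
If the policy decision is “abandon the partial state and start a fresh incremental,” the saved state can be discarded on the receiver (this deletes the partially received data):

zfs receive -A backup/app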

Three corporate-world mini-stories

Mini-story 1: The incident caused by a wrong assumption

A team rolled out ZFS replication for a fleet of build servers. The idea was solid: snapshot the workspace dataset hourly, replicate to a backup host,
and keep a week of history. They tested a full send, tested an incremental, and declared victory.

Two months later, a build server failed. Restore day arrived with the confidence of a well-rehearsed runbook—until zfs receive refused
the incremental chain. The receiver had the snapshots, yes. But not the right snapshots: someone had “cleaned up old backups” on the backup host
to reclaim space and had deleted a snapshot that was still the base for the next incremental series.

The wrong assumption was subtle: “If I delete old snapshots on the receiver, the sender can still send incrementals from whatever it has.”
That’s not how ZFS works. Incrementals require a common ancestor snapshot. Delete the ancestor, and the sender can’t compute a delta the receiver can apply.
ZFS doesn’t invent a new base.

The recovery was a full send over a link that was never designed for it. The service came back, but the lesson stuck: snapshot retention on the receiver
is not independent. If you want independent retention, you need independent bases—bookmarks can help, and so can a policy that prunes only after confirming
the next incremental base exists on both ends.

After the incident, they implemented a “last common snapshot” check in automation and blocked deletions on the receiver unless the corresponding snapshot
was pruned on the sender first (or unless a bookmark base existed). It wasn’t glamorous, but it turned replication from “works until it doesn’t” into a system.

Mini-story 2: The optimization that backfired

Another organization had a remote site connected over a modest WAN. Someone had the bright idea to “compress everything” in the replication pipeline
because, on paper, compression reduces bandwidth. They inserted a compression stage and celebrated when early tests with text-heavy logs showed big gains.

Then they pointed the same pipeline at VM datasets. The VMs hosted databases, caches, and plenty of already-compressed blobs. The result was a throughput drop
and increased replication lag. The backup window wasn’t technically “a window” anymore—it was just “the time between now and forever.”

The backfire was classic: CPU became the bottleneck. Compression was chewing cycles and generating little reduction. SSH was also encrypting the now-compressed stream,
burning more CPU on both ends. The storage arrays sat bored while the CPUs sweated.

Fixing it was not heroic. They measured. They removed compression for datasets that didn’t benefit, kept it for those that did, and pinned replication to off-peak
hours where it wouldn’t compete with application CPU. For a few critical datasets, they did raw encrypted sends, reducing overhead from redundant transforms.

The “optimization” wasn’t wrong in principle. It was wrong as a blanket policy. In backup engineering, universal rules are how you get universal pain.

Mini-story 3: The boring but correct practice that saved the day

A security-conscious company used encrypted datasets for customer data and replicated them to a separate backup environment. The backup environment was treated
as semi-trusted: good physical controls, but not trusted enough to hold decryption keys. So they used raw sends and kept keys in a controlled location.

On a Wednesday that started like any other, a storage controller on the primary system began throwing intermittent errors. Nothing catastrophic—just enough
to seed doubt. They initiated a controlled failover: restore the latest snapshots to a standby cluster and cut traffic over. This was not the first time
they had run the procedure; they practiced quarterly, like adults.

The restore went smoothly because the backups were not just “present,” they were predictable: received datasets were unmounted, properties were consistent,
and the snapshot set was complete. Most importantly, they had a routine scrub schedule on the backup pool and monitored checksum errors like a pager-worthy metric.
When the time came to trust the backups, they weren’t trusting hope.

The team later joked that the most exciting part of the incident was how unexciting the recovery was. That’s the correct outcome. If your DR story is thrilling,
it’s probably also expensive.

Common mistakes, symptoms, and fixes

Mistake 1: Deleting the base snapshot (or diverging the receiver)

Symptoms: incremental receive fails with messages about a missing snapshot, a mismatched incremental source, or a destination that has been modified since its most recent snapshot.

Fix: identify the last common snapshot and replicate from there. If none exists, do a full send. Prevent recurrence by coordinating retention and/or using bookmarks.

Mistake 2: Using zfs receive -F as a reflex

Symptoms: “It worked” but a receiver-side dataset lost newer snapshots or was rolled back unexpectedly.

Fix: reserve -F for cases where the receiver is strictly a replication target and you accept rollback semantics. Prefer a clean target namespace and controlled promotions.

Mistake 3: Replicating mounts into the backup host’s namespace

Symptoms: datasets mount on the backup host; indexing/AV agents chew I/O; admins accidentally browse and modify backup copies (or try to).

Fix: always receive with -u, and consider setting canmount=off on replication roots on the receiver.

Mistake 4: Confusing -i and -I

Symptoms: receiver has only the latest snapshot, not the intermediate history; retention policies don’t match; restores lack the point-in-time you expected.

Fix: use -I when you want intermediate snapshots included. Use -i when you truly want only the delta between two snapshots.

Mistake 5: Assuming “incremental” means “small” (especially for VM images)

Symptoms: incrementals as large as full sends; replication falls behind; networks saturate unexpectedly.

Fix: measure send estimates (zfs send -n -v), tune dataset recordsize appropriately, and consider workload-aware layouts (separate datasets for churn-heavy data).

Mistake 6: Not planning for interruptions

Symptoms: partial receives block future replication; repeated restarts waste hours; backup lag grows.

Fix: use resumable receive and monitor receive_resume_token. Add automation to resume or cleanly abort based on policy.

Mistake 7: Cross-version / feature mismatch surprises

Symptoms: receive fails with messages about unsupported features or stream versions.

Fix: standardize OpenZFS versions across sender/receiver where possible, or constrain sends to compatible features via platform-specific options and disciplined upgrade sequencing.
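
A quick sanity check before upgrades or when debugging a mismatch (output format varies by platform):

zfs version
ssh backup zfs version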

Checklists / step-by-step plan

Checklist A: Build a sane replication baseline (first time setup)

  1. Pick a dataset root for replication (e.g., tank/app) and decide whether children are included.
  2. Define snapshot naming: prefix + sortable time.
  3. On receiver, create a dedicated pool/dataset namespace (e.g., backup/app) and ensure it has enough space for retention.
  4. Set receiver defaults: canmount=off on the root, and plan to always use zfs receive -u.
  5. Create the first recursive snapshot: zfs snapshot -r tank/app@auto-....
  6. Do the first full send with -R and receive unmounted.
  7. Validate snapshot presence on both ends.
  8. Decide on retention and implement it carefully (prune after confirming receiver has a safe base).

Checklist B: Daily operations (the “don’t get paged” routine)

  1. Create snapshots on a schedule (hourly/daily).
  2. Replicate incrementals from the last replicated snapshot to the new snapshot.
  3. After replication, validate: last snapshot exists on receiver and receive_resume_token is empty.
  4. Prune old snapshots on sender (and optionally receiver) according to policy without breaking the chain.
  5. Run periodic scrubs on receiver and alert on checksum errors.
  6. Do restore drills: mount a received snapshot clone and verify application-level integrity.

Checklist C: Restore workflow (filesystem-level)

  1. Identify the snapshot you want to restore.
  2. Clone it to a new dataset on the restore target (avoid overwriting live data during investigation).
  3. Mount the clone, validate contents, then promote or copy into place according to your app’s needs.
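
A sketch of steps 2 and 3 using names from the earlier tasks (clone name and mountpoint are illustrative):

zfs clone -o mountpoint=/mnt/restore-test -o readonly=on \
  backup/app@auto-2025-12-25_0600 backup/restore-test
# validate contents under /mnt/restore-test, copy out what you need, then:
zfs destroy backup/restore-test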

FAQ

1) What’s the difference between zfs send -i and -I?

-i sends the delta between exactly two snapshots. -I sends a replication range and can include intermediate snapshots, preserving history.
If you care about retaining many restore points on the receiver, -I is usually the right hammer.

2) Can I do incremental sends without keeping old snapshots forever?

You need some common base. In many environments, bookmarks can act as lightweight bases, letting you delete snapshots while preserving replication continuity.
Whether that fits depends on your OpenZFS version and workflow. Without a base (snapshot or bookmark), you’re doing a full send.

3) Why does zfs receive fail even though the snapshot names match?

Because lineage matters, not just names. The receiver must have the same base snapshot content and GUID ancestry. If the receiver snapshot was created independently
or the dataset was rolled back/modified in a way that diverges, ZFS will refuse the incremental stream.

4) Should I replicate properties?

Often yes, because properties like recordsize, compression, and acltype can affect how restores behave. But be intentional:
on backup targets you may want different mount behavior (canmount=off, readonly=on) than production.

5) Is it safe to run replication while the application is writing?

Yes, because you replicate a snapshot, which is consistent. The live dataset continues to change, but the snapshot is a stable view.
The key is to snapshot at the right layer: for databases, application-consistent snapshots may require pre/post hooks (flush, freeze, or checkpoint).

6) How do I know if SSH encryption is my bottleneck?

If throughput is capped and CPU usage is high on one core during sends, suspect SSH. Validate by observing CPU consumption of ssh processes
during replication and comparing performance on a trusted local link or with different transport settings approved by your security policy.

7) What does zfs send -n -v really tell me?

It estimates stream size without sending it. It’s good for planning and detecting “why is this incremental huge,” but it’s not a billing meter.
Always treat it as directional, not absolute.

8) Should backups be mounted on the backup server?

In most production setups, no. Keep them unmounted to reduce accidental reads/writes, indexing, and human curiosity. Mount only when restoring or testing.
Use zfs receive -u and consider canmount=off on replicated roots.

9) Can I replicate an encrypted dataset to an untrusted backup host?

Yes, with raw sends. The backup host stores ciphertext and does not need keys to receive it. Your restore process needs access to keys on a trusted system.

10) How often should I scrub the backup pool?

Often enough to catch silent corruption before it becomes your restore story. Many operators do monthly on large pools and more frequently on smaller ones,
with alerts on any checksum errors. The “right” cadence depends on drive type, pool size, and how much you value sleep.

Conclusion

ZFS incremental send is one of the rare backup mechanisms that scales from “one server in a closet” to “a fleet with real compliance pressure”
without changing its fundamentals. Snapshots create stable points-in-time. Incremental streams move only changed blocks. Receivers reconstruct history
with integrity checks built into the filesystem itself.

The operational catch is lineage: incrementals require a shared base, and your retention policies, naming conventions, and automation must respect that.
Do that work, and you get backups that are fast, verifiable, and restore-friendly—without recopying everything every time.
