ZFS Bookmarks: Saving Incrementals When Something Goes Wrong

Snapshots are the currency of ZFS replication. You mint them, trade them across the network with zfs send/zfs receive, and occasionally drop one on the floor at 2 a.m. because a retention job got “creative.” When that happens, your incremental chain breaks, your next replication turns into a full send, and your WAN link starts smoking like a toaster left in the rain.

ZFS bookmarks are the quiet feature that makes these disasters boring again. They’re not snapshots. They don’t hold data. They’re tiny pointers to “the state of the filesystem at this snapshot,” and that pointer can keep an incremental stream alive even if the snapshot itself is gone. Bookmarks are what you wish you had configured the last time someone said, “We can prune aggressively; replication will be fine.”

What ZFS bookmarks actually are (and aren’t)

A ZFS bookmark is a named reference to a snapshot’s GUID and transaction group (TXG) position. Think of it as a “save point” that says: this dataset looked like this at the time of snapshot X—without keeping snapshot X’s user-visible namespace around. It’s metadata-only: no file tree, no mountable view, no directory browsing.

Bookmarks exist for one main reason: incremental sends need a common ancestor on both sides. Normally, that common ancestor is a snapshot on both sender and receiver. With bookmarks, the sender can keep the ancestor as a bookmark even after the snapshot itself is destroyed, and can still generate incrementals from that point in history—provided the receiver retains a snapshot with the matching GUID.

Bookmarks are not magic:

  • They don’t let you restore files; they’re not mountable.
  • They don’t preserve data blocks by themselves; they don’t pin blocks the way snapshots do.
  • They can’t resurrect an incremental stream if the sender lost the ancestor and can’t generate the right stream.

But for replication pipelines—where the sender needs to continue from the “last good point” without hoarding every old snapshot forever—bookmarks are the difference between “incremental continues” and “we’re shipping 40 TB again.”

Joke #1 (short and practical): A bookmark is like writing down your hotel room number. If you lose the keycard (snapshot), you can still tell the front desk where you were supposed to be.

Why incremental replication breaks

Incremental ZFS send works by shipping only the changed blocks between two snapshots (or a snapshot and a bookmark, depending on side). The important part is that both sides must agree on the “from” point. If they don’t, ZFS refuses to apply the stream because it can’t prove it will result in the same filesystem state.

Most broken chains are self-inflicted. Common causes:

  • Snapshot pruning on the receiver: someone deletes old snapshots to save space, not realizing the receiver must keep at least one snapshot in common with the sender to anchor incrementals.
  • Snapshot pruning on the sender: the sender loses the “from” snapshot and—unless it kept a bookmark—can’t generate the incremental that the receiver expects.
  • Replication to the wrong dataset: a target got recreated, rolled back, or replaced. Now the receiver’s “latest snapshot” isn’t actually related to the sender.
  • Inconsistent naming assumptions: scripts assume the lexicographically newest snapshot is the most recently replicated one, but timezones, clock skew, or naming changes break that assumption.

When it breaks, operators often choose the path of least resistance: do a full send. That works—eventually. It also eats bandwidth, IOPS, and patience. Bookmarks let you keep sender-side ancestry even when you delete snapshots, and that changes the decision from “full send now” to “incremental continues as planned.”
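When a chain does break, the first question is whether any common base survives. The cheapest reliable way to answer it is to intersect GUIDs rather than eyeball names. The sketch below is pure shell; the hostname backup1, dataset pool/fs, and the helper name common_bases are illustrative, and the live zfs calls are commented out because they need a real pool.

```shell
#!/usr/bin/env bash
# Find candidate incremental bases by GUID intersection:
# the sender side may offer snapshots OR bookmarks; the receiver side
# must hold real snapshots.

# Pure helper: takes two "name<TAB>guid" lists and prints the sender-side
# names whose GUID also exists on the receiver.
common_bases() {
  local sender_list=$1 receiver_list=$2
  awk -F'\t' 'NR==FNR { seen[$2] = 1; next } $2 in seen { print $1 }' \
    <(printf '%s\n' "$receiver_list") <(printf '%s\n' "$sender_list")
}

# Live usage (commented out; needs real pools):
# sender_list=$(zfs list -H -t snapshot,bookmark -o name,guid pool/fs)
# receiver_list=$(ssh backup1 zfs list -H -t snapshot -o name,guid pool/fs)
# common_bases "$sender_list" "$receiver_list"
```

Any name this prints is a usable -i argument on the sender; an empty result means you are in full-send territory.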

Interesting facts and context

Some context helps because bookmarks are one of those features people assume are new, exotic, or “only for big shops.” They’re not.

  1. Bookmarks were designed specifically for replication hygiene: keeping incremental lineage without keeping full snapshots forever.
  2. They’re metadata-only: creating a bookmark is fast and space-cheap, typically measured in “rounding error” compared to snapshot space.
  3. Snapshots pin blocks; bookmarks don’t: a bookmark doesn’t prevent space from being freed as blocks age out. This is intentional: replication ancestry without retention bloat.
  4. Incremental sends can reference bookmarks (e.g., send from a bookmark to a new snapshot), which is operationally useful when you want to prune snapshots on the receiver.
  5. The send side can keep a bookmark for the last replicated snapshot, then delete that snapshot, and still generate future incrementals from that point—assuming the receiver still holds a snapshot with the matching GUID.
  6. Bookmarks have their own namespace: you list them with zfs list -t bookmark and manage them separately from snapshots.
  7. Many replication tools implemented bookmarks later than snapshots, so older scripts and wrappers may ignore them unless explicitly configured.
  8. As of modern OpenZFS, bookmark support is mainstream on common platforms, but mixed-version fleets still trip over feature flags and send stream options.

Core mechanics: how bookmarks save incrementals

Here’s the mental model that holds up under incident pressure:

  1. A snapshot is a full, browsable checkpoint of a dataset. ZFS uses it to compute differences and to provide a stable view.
  2. A bookmark is a non-browsable checkpoint reference to a snapshot’s exact point in history (its GUID and related metadata).
  3. An incremental stream is a transformation from state A to state B. For safety, the receiver must confirm it is currently at state A before applying the transformation.

So how do bookmarks help when something goes wrong?

Suppose the sender had snapshot pool/fs@replica_2025-12-01, replicated it, and then took newer snapshots and sent incrementals from it. If someone deletes @replica_2025-12-01 on the sender, you’ve removed the “state A” anchor and the next incremental can’t be generated. With a bookmark, you instead do this:

  • Create bookmark #replica_2025-12-01 that points at that snapshot’s GUID.
  • Optionally delete the snapshot @replica_2025-12-01 to cut namespace clutter (and reclaim whatever space it was pinning).
  • Continue sending incrementals from that state with zfs send -i pool/fs#replica_2025-12-01, because ZFS can compute the delta of everything born after the bookmark’s position in history.

Key constraint: the receiver must still hold a snapshot whose GUID matches the “from” point, because an incremental is applied on top of real data. Bookmarks can’t conjure blocks out of thin air on the receive side. They solve the sender-side ancestry problem, not the receiver-side one.
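Before a replication run, it is cheap to confirm the sender still has something to send from. A minimal sketch, assuming the hypothetical helper name sender_has_base and the dataset pool/fs; the zfs call is commented out because it needs a live pool:

```shell
# Return success if the desired base (snapshot @name or bookmark #name)
# appears in a newline-separated list of the sender's snapshots and bookmarks.
sender_has_base() {
  local base_list=$1 base=$2
  grep -qxF "$base" <<<"$base_list"
}

# Live usage (commented out):
# bases=$(zfs list -H -t snapshot,bookmark -o name pool/fs)
# if ! sender_has_base "$bases" "pool/fs#replica_2025-12-20"; then
#   echo "base missing on sender: pick another base or plan a full send" >&2
# fi
```

Checking this up front turns a mid-transfer failure into a one-line pre-flight message.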

Joke #2 (short and relevant): ZFS replication is like couples therapy: it only works if both sides agree on what happened last time.

Practical tasks with commands (and what the output means)

Below are real tasks you can do today. These are written in the “runbook voice” I wish more teams used: command, sample output, and what it implies. Adjust dataset names to your environment.

Task 1: Confirm bookmark feature support and dataset health

cr0x@server:~$ zpool status -x
all pools are healthy

cr0x@server:~$ zpool get -H -o value feature@bookmarks pool
active

Interpretation: Start with “is the pool sick?” before you blame replication, then confirm the bookmarks feature flag is enabled or active on the pool. If the pool is degraded, you may be chasing performance symptoms caused by resilvering or checksum errors, not bookmarks.

Task 2: List snapshots and bookmarks separately (don’t assume you’re seeing both)

cr0x@server:~$ zfs list -t snapshot -o name,used,refer,mountpoint -s creation pool/fs | tail -n 3
pool/fs@replica_2025-12-20   0B  1.20T  -
pool/fs@replica_2025-12-21   0B  1.21T  -
pool/fs@replica_2025-12-22   0B  1.22T  -

cr0x@server:~$ zfs list -t bookmark -o name,createtxg,guid -s createtxg pool/fs | tail -n 3
pool/fs#replica_2025-12-19  19548312  17084256651437522314
pool/fs#replica_2025-12-20  19591288  5721345558355301722
pool/fs#replica_2025-12-21  19634801  15884259650518200641

Interpretation: Snapshots and bookmarks live in different lists. If your tooling only looks at snapshots, it may tell you “no common snapshot,” while ZFS could still have a bookmark anchor.

Task 3: Create a bookmark from an existing snapshot

cr0x@server:~$ zfs bookmark pool/fs@replica_2025-12-22 pool/fs#replica_2025-12-22

Interpretation: This is the core operation. You’re naming a replication anchor. The bookmark name can follow your snapshot naming convention; just remember it uses #, not @.

Task 4: Safely delete an old snapshot after bookmarking it (sender-side pruning)

cr0x@server:~$ zfs destroy pool/fs@replica_2025-12-20

cr0x@server:~$ zfs list -t bookmark -o name,createtxg,guid pool/fs#replica_2025-12-20
NAME                        CREATETXG  GUID
pool/fs#replica_2025-12-20  19591288   5721345558355301722

Interpretation: You can delete the snapshot while keeping the ancestry pointer. This is the “save incrementals when retention runs hot” move.

Task 5: Verify you still have a common base on the sender after pruning

cr0x@server:~$ zfs get -H -o name,value type pool/fs
pool/fs	filesystem

cr0x@server:~$ zfs list -t snapshot,bookmark -o name -s name pool/fs | grep replica_2025-12-20
pool/fs#replica_2025-12-20

Interpretation: Your chain anchor is present even though the snapshot is not. This is what keeps future incrementals viable.

Task 6: Send an incremental stream using a bookmark as the “from” point

cr0x@sender:~$ zfs send -nv -i pool/fs#replica_2025-12-20 pool/fs@replica_2025-12-22
send from pool/fs#replica_2025-12-20 to pool/fs@replica_2025-12-22 estimated size is 18.4G

cr0x@sender:~$ zfs send -w -i pool/fs#replica_2025-12-20 pool/fs@replica_2025-12-22 | ssh backup1 zfs receive -u -F pool/fs
receiving incremental stream of pool/fs@replica_2025-12-22 into pool/fs@replica_2025-12-22
received 18.4G stream in 00:06:12 (50.7M/sec)

Interpretation: The dry-run (-n) tells you whether ZFS recognizes the bookmark as a valid base and estimates the transfer size. The real send uses -w (raw) when appropriate in your environment; if you’re not using encryption/raw streams, drop it.

Task 7: Diagnose “not an earlier snapshot from the same fs” (a common mismatch)

cr0x@sender:~$ zfs send -i pool/fs@replica_2025-12-22 pool/fs@replica_2025-12-21
cannot send 'pool/fs@replica_2025-12-21': not an earlier snapshot from the same fs

Interpretation: You’re trying to send “backwards” in time—the -i source must be older than the target snapshot. Usually it’s a scripting bug that picked the wrong “latest.” Fix your snapshot selection logic, don’t brute-force with a full send.

Task 8: Show what each side thinks it has (snapshots vs bookmarks)

cr0x@backup1:~$ zfs list -t snapshot -o name -s creation pool/fs | tail -n 5
pool/fs@replica_2025-12-18
pool/fs@replica_2025-12-19
pool/fs@replica_2025-12-22

cr0x@sender:~$ zfs list -t bookmark -o name -s createtxg pool/fs | tail -n 5
pool/fs#replica_2025-12-19
pool/fs#replica_2025-12-20
pool/fs#replica_2025-12-21

Interpretation: The receiver has pruned snapshots 20–21 but still holds its newest replicated one; the sender keeps bookmarks as anchors for the older points. That’s a normal pattern for “keep only a few recovery points, keep many anchors”—viable as long as the two sides still share at least one GUID.

Task 9: Confirm GUID lineage (when you suspect you’re replicating to the wrong target)

cr0x@sender:~$ zfs list -t bookmark -o name,guid pool/fs | grep replica_2025-12-20
pool/fs#replica_2025-12-20  5721345558355301722

cr0x@backup1:~$ zfs get -H -o value guid pool/fs@replica_2025-12-20
5721345558355301722

Interpretation: Matching GUIDs are strong evidence you’re aligned on ancestry. If these don’t match, you’re not talking about the same history, no matter how similar the names look.

Task 10: Convert “latest replicated snapshot” into a bookmark automatically

cr0x@backup1:~$ last=$(zfs list -H -t snapshot -o name -s creation pool/fs | tail -n 1)
cr0x@backup1:~$ echo "$last"
pool/fs@replica_2025-12-22

cr0x@backup1:~$ zfs bookmark "$last" "${last/@/#}"

Interpretation: This is a common pattern: after each successful receive, bookmark the received snapshot, then your retention policy can delete snapshots without severing the chain.
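The ${last/@/#} substitution above is easy to get wrong when the input is not actually a snapshot name. A small guard, sketched here with a hypothetical helper name snap_to_bookmark:

```shell
# Derive a bookmark name from a snapshot name; refuse anything without '@'
# so a typo can't silently create a mangled bookmark.
snap_to_bookmark() {
  local snap=$1
  case "$snap" in
    *@*) printf '%s\n' "${snap/@/#}" ;;
    *)   echo "not a snapshot name: $snap" >&2; return 1 ;;
  esac
}

# Live usage (commented out; needs a pool):
# last=$(zfs list -H -t snapshot -o name -s creation pool/fs | tail -n 1)
# zfs bookmark "$last" "$(snap_to_bookmark "$last")"
```

Failing loudly here is the point: an empty $last variable would otherwise turn into a nonsense zfs bookmark invocation.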

Task 11: Destroy bookmarks safely (cleaning up anchors you no longer need)

cr0x@sender:~$ zfs destroy pool/fs#replica_2025-12-19

cr0x@sender:~$ zfs list -t bookmark -o name pool/fs | grep replica_2025-12-19 || echo "bookmark removed"
bookmark removed

Interpretation: Bookmarks can accumulate. If you keep one per snapshot forever, your namespace will eventually look like a calendar exploded. It’s fine to prune bookmarks too—just do it with intent.

Task 12: Estimate send size before you pull the trigger (avoid surprise full sends)

cr0x@sender:~$ zfs send -nv -i pool/fs@replica_2025-12-21 pool/fs@replica_2025-12-22
send from pool/fs@replica_2025-12-21 to pool/fs@replica_2025-12-22 estimated size is 1.7G

Interpretation: The -n dry-run is a cheap sanity check. If you expect 1–3 GB and it says 1.2 TB, stop and figure out why before saturating your replication link.

Task 13: Check receive-side space pressure and whether deletes are actually freeing space

cr0x@backup1:~$ zfs list -o name,used,avail,refer,mountpoint pool
NAME   USED  AVAIL  REFER  MOUNTPOINT
pool   61.2T  3.8T   192K  /pool

cr0x@backup1:~$ zfs get -o name,property,value usedbysnapshots,usedbydataset pool/fs
NAME    PROPERTY         VALUE
pool/fs usedbysnapshots  9.4T
pool/fs usedbydataset    51.6T

Interpretation: If you’re deleting snapshots because you’re out of space, measure whether snapshots are actually the culprit. Bookmarks won’t free space (they don’t hold blocks), but snapshot pruning might—unless clones, holds, or active references keep blocks pinned.

Task 14: Receive with caution: understand what -F does before using it in anger

cr0x@backup1:~$ ssh sender zfs send -i pool/fs#replica_2025-12-20 pool/fs@replica_2025-12-22 | zfs receive -u -F pool/fs

Interpretation: zfs receive -F can roll back the target dataset to match the incoming stream, destroying newer snapshots on the receiver. This is sometimes exactly what you want for a strict replica—and sometimes a career-limiting move if the receiver also hosts local snapshots for recovery. Decide which you’re running: a replica, or a backup with local history.

Three corporate-world mini-stories

Mini-story 1: The incident caused by a wrong assumption

The assumption was simple and wrong: “If the receiver has the latest snapshot name, incrementals will always work.” The replication script was written by a capable engineer who had never lived through snapshot retention policies colliding with replication schedules. They used snapshot names with dates, sorted them lexicographically, took the last one, and called it the base. It worked for months.

Then the storage team changed naming to include a timezone suffix because an audit asked for “explicit UTC.” The sender started creating @replica_2025-12-01Z while the receiver still had older snapshots without the suffix. The script—still sorting by name—picked the wrong base. ZFS did the right thing and refused the incremental receive with a message that sounded like a philosophical argument: the stream didn’t match the dataset’s current state.

Ops did what ops does under pressure: they tried again with -F, which rolled back the receiver and deleted a week of local “just in case” snapshots that weren’t part of replication. Now they had two problems: replication was still failing intermittently, and the helpdesk was fielding restore requests that couldn’t be satisfied.

The fix wasn’t heroic. They stopped using naming as truth and started using GUID lineage. The sender bookmarks each successfully replicated snapshot (by name, yes, but verified by GUID against the receiver), and uses that bookmark as the incremental base. When the naming convention changed again (because naming conventions always change again), replication didn’t care. It just kept shipping deltas from the correct history.

Mini-story 2: The optimization that backfired

This one starts with a good idea: “We can save space by pruning snapshots aggressively on the backup target. It’s a replica, not a museum.” They reduced receiver snapshot retention from 30 days to 3 days. The space graph looked great. The storage alerts stopped paging. Everyone congratulated everyone else in a meeting that should have been an email.

Two weeks later, a network maintenance window ran long, and replication was paused for 36 hours. That’s not dramatic in itself. The drama came from the retention job: it deleted the last common snapshot on the receiver during the pause. When replication resumed, the sender tried to send incrementals from a snapshot the receiver no longer had. ZFS refused the receive, correctly, because it couldn’t validate the base.

The team’s next “optimization” was to force full sends for that dataset “until things stabilize.” Full sends did stabilize the process: they stabilized it into a nightly transfer that collided with the business day and saturated the array’s write path. Latency complaints started coming from the database tier. A backup system had quietly become a production performance problem.

The postmortem conclusion was almost boring: retention on a replica must never delete the last common snapshot, because the receiver needs a real snapshot with the base GUID—no bookmark can stand in for it on the receive side. They reworked the pipeline so the receiver always protects its most recently replicated snapshot, and the sender bookmarks each replicated snapshot so its own pruning can stay aggressive without severing the chain. The next time a pause happened, replication resumed incrementally like nothing had happened. The “optimization” became an actual optimization instead of a slow-moving incident.

Mini-story 3: The boring but correct practice that saved the day

A different company, different vibe. Their storage team was allergic to cleverness. They wrote runbooks. They practiced restores quarterly. They had an unglamorous rule: “Every replicated dataset must have a machine-readable ‘last replicated’ anchor on the sender, independent of snapshot retention.” That anchor was a bookmark with a fixed name, like pool/fs#last-received, updated after each successful replication.

One day, an engineer troubleshooting space usage recalled a cleanup script from shell history in the wrong terminal tab. It deleted a block of older snapshots on the sender—exactly the kind of human mistake nobody admits to until you show them the audit logs. The team noticed because monitoring flagged “replication lag increasing” and “incremental send base missing.”

They didn’t panic. The runbook said: check for the fixed-name bookmark; verify its GUID matches a snapshot the receiver still holds; resume replication from the bookmark. The bookmark was there, still pointing to the last replicated state. They didn’t need the deleted snapshots to continue incrementals; they only needed the ancestry pointer. Replication resumed.

There were consequences, but they were contained: fewer local recovery points for that week, and a minor compliance headache because retention wasn’t met. The core system stayed healthy. No WAN meltdown, no full resend, no “why is the database slow” mystery. This is what good operational hygiene looks like: not preventing every mistake, but making mistakes cheap.

Fast diagnosis playbook

This is the “you have 10 minutes before the next meeting and 30 minutes before the link saturates” checklist. It’s ordered to find the bottleneck quickly and avoid destructive actions.

1) First: is it actually a replication-chain problem, or a system problem?

cr0x@backup1:~$ zpool status -x
all pools are healthy

cr0x@backup1:~$ zfs list -o name,used,avail pool
NAME   USED  AVAIL
pool   61.2T  3.8T

Interpretation: If the pool is degraded, near-full, or actively resilvering, replication will be slow or fail in ways that look like send/receive issues. Fix the underlying pool health first.

2) Second: what exact error are you getting from send/receive?

cr0x@sender:~$ zfs send -nv -i pool/fs@replica_2025-12-20 pool/fs@replica_2025-12-22
cannot send 'pool/fs@replica_2025-12-22': incremental source (pool/fs@replica_2025-12-20) does not exist

Interpretation: That’s a sender-side missing base. This is where bookmarks usually fix it—if you created them before the snapshot was destroyed.

3) Third: does the sender have a bookmark whose GUID matches a snapshot the receiver still holds?

cr0x@sender:~$ zfs list -t bookmark -o name,guid pool/fs | grep replica_2025-12-20
pool/fs#replica_2025-12-20  5721345558355301722

cr0x@backup1:~$ zfs get -H -o value guid pool/fs@replica_2025-12-20
5721345558355301722

Interpretation: If the GUIDs match, you can likely resume incrementals from the sender’s bookmark without recreating snapshots or falling back to a full send.

4) Fourth: if incrementals are valid, why is it slow?

cr0x@sender:~$ zpool iostat -v 1 3
                              capacity     operations     bandwidth
pool                          alloc   free   read  write   read  write
----------------------------  -----  -----  -----  -----  -----  -----
pool                           48.1T  12.3T    210    980  92.1M  311M
  raidz2-0                     48.1T  12.3T    210    980  92.1M  311M
    sda                            -      -     30    140  12.9M  42.7M
    sdb                            -      -     28    139  12.3M  42.0M
    ...

Interpretation: If writes are pegged on the receiver or reads are pegged on the sender, your bottleneck is storage, not ZFS logic. If neither is pegged, it’s likely network, CPU (compression/encryption), or a serialized replication job.

5) Fifth: validate the “base” and “target” selection logic in your tooling

cr0x@sender:~$ zfs list -H -t snapshot -o name -s creation pool/fs | tail -n 5
pool/fs@replica_2025-12-18
pool/fs@replica_2025-12-19
pool/fs@replica_2025-12-20
pool/fs@replica_2025-12-21
pool/fs@replica_2025-12-22

Interpretation: The right base is “the last successfully replicated snapshot,” not “the newest snapshot by name” and not “yesterday.” Bookmarks make it easier to persist that state explicitly.
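One way to make “last successfully replicated” explicit is to record the base GUID after each run and resolve it back to a name before the next send. A sketch with the hypothetical helper name base_by_guid; the input format matches zfs list -H -o name,guid:

```shell
# Resolve a recorded GUID back to a snapshot/bookmark name from a
# "name<TAB>guid" list. Returns nonzero if the GUID is not present.
base_by_guid() {
  local want_guid=$1 name_guid_list=$2
  awk -F'\t' -v g="$want_guid" \
    '$2 == g { print $1; found=1 } END { exit (found ? 0 : 1) }' \
    <<<"$name_guid_list"
}

# Live usage (commented out):
# list=$(zfs list -H -t snapshot,bookmark -o name,guid pool/fs)
# base=$(base_by_guid "5721345558355301722" "$list") || echo "no base for GUID" >&2
```

Selecting by GUID instead of by name is what makes the logic immune to renamed snapshots and changed naming conventions.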

Common mistakes, symptoms, and fixes

Mistake 1: Treating bookmarks like snapshots

Symptom: Someone tries to mount or browse a bookmark, or expects file restore from it.

Fix: Bookmarks are ancestry markers only. If you need restore points, you need snapshots (or clones) on the receiver. Use bookmarks to keep incrementals intact while pruning snapshots, not to replace snapshots entirely.

Mistake 2: Bookmarking on the wrong side

Symptom: The receiver has dutiful bookmarks, yet replication still breaks: the sender errors with “incremental source does not exist,” or the receiver rejects streams because its matching snapshot is gone.

Fix: Bookmarks are a sender-side tool. The sender keeps a bookmark so it can generate incrementals after pruning its own snapshots; the receiver must keep an actual snapshot with the matching GUID, because zfs receive applies deltas on top of real data. Align retention: the sender keeps at least the “last replicated” anchor (snapshot or bookmark), the receiver keeps at least the last replicated snapshot.

Mistake 3: Using zfs receive -F as a reflex

Symptom: Replication “works” again, but receiver lost newer snapshots or local-only recovery points. Later, restore requests fail.

Fix: Decide whether the target is a strict replica (rollback acceptable) or a backup store (local history preserved). If it’s the latter, avoid -F or use separate datasets for replica vs backups.

Mistake 4: Assuming snapshot names define lineage

Symptom: Snapshots with matching names exist on both ends, but incrementals still fail with “does not match incremental source.”

Fix: Validate GUIDs. Names can collide, be recreated, or be applied to unrelated datasets after rollbacks. Use zfs get guid to confirm ancestry.

Mistake 5: Retention jobs that delete the last common base first

Symptom: Incrementals fail right after pruning, typically after a replication pause or backlog.

Fix: Retention should protect the base anchor on both sides. Common approach: maintain a fixed-name bookmark on the sender for “last replicated,” and never delete the receiver’s newest replicated snapshot until a newer one has been received.
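A retention job can enforce this mechanically: given snapshots in creation order and the anchor’s name, print only what is strictly older than the anchor—and print nothing at all if the anchor is missing, so a renamed anchor fails safe. A sketch with the hypothetical helper name prunable_before_anchor:

```shell
# From an oldest-first snapshot list, print only entries strictly older than
# the anchor. If the anchor is absent, print nothing (fail safe: prune nothing).
prunable_before_anchor() {
  local snapshot_list=$1 anchor=$2
  awk -v a="$anchor" \
    '$0 == a { ok=1; exit } { buf[n++] = $0 } END { if (ok) for (i = 0; i < n; i++) print buf[i] }' \
    <<<"$snapshot_list"
}

# Live usage (commented out; drop the echo once the plan looks right):
# snaps=$(zfs list -H -t snapshot -o name -s creation pool/fs)
# prunable_before_anchor "$snaps" "pool/fs@replica_2025-12-20" |
#   while read -r s; do echo zfs destroy "$s"; done
```

The fail-safe default matters more than the happy path: a retention job that prunes nothing is an alert, while one that prunes the anchor is an incident.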

Mistake 6: Over-creating bookmarks without cleanup strategy

Symptom: Thousands of bookmarks clutter operations; listing takes longer; tooling becomes slow or confusing.

Fix: Keep bookmarks for the needed window (e.g., “last 60 anchors”) or keep a fixed-name “last-received” plus periodic “weekly anchor” bookmarks. Prune the rest deliberately.
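“Keep the newest N anchors” is easy to compute from a createtxg-sorted listing. A sketch, assuming the hypothetical helper name bookmarks_to_prune:

```shell
# From an oldest-first bookmark list, print the entries to delete so that
# only the newest N remain.
bookmarks_to_prune() {
  local bookmark_list=$1 keep=$2
  local total
  total=$(( $(wc -l <<<"$bookmark_list") ))
  if [ "$total" -gt "$keep" ]; then
    head -n "$((total - keep))" <<<"$bookmark_list"
  fi
}

# Live usage (commented out; drop the echo once verified):
# bms=$(zfs list -H -t bookmark -o name -s createtxg pool/fs)
# bookmarks_to_prune "$bms" 60 | while read -r b; do echo zfs destroy "$b"; done
```

Sorting by createtxg rather than by name keeps the cutoff honest even when naming conventions change mid-history.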

Mistake 7: Confusing “space usage improved” with “replication safety improved”

Symptom: You prune snapshots to free space, replication later breaks, and someone suggests “just prune more.”

Fix: Separate concerns: use bookmarks for replication ancestry, snapshots for recovery points, and monitor usedbysnapshots to understand what actually pins space.

Checklists / step-by-step plan

This is a pragmatic plan you can adopt without rewriting your entire replication system. It assumes you already do snapshot-based replication and want bookmarks to harden it.

Checklist A: Introduce sender-side “last replicated” bookmarks

  1. After each receive is confirmed, create/update a fixed-name bookmark on the sender pointing to the replicated snapshot.
  2. Keep your existing snapshot retention for restores, but allow sender-side retention to be more aggressive because the bookmark protects incremental lineage.
  3. Ensure your replication script/tool can use bookmark bases when generating incrementals.
cr0x@sender:~$ snap="pool/fs@replica_2025-12-22"
cr0x@sender:~$ zfs destroy pool/fs#last-received 2>/dev/null || true
cr0x@sender:~$ zfs bookmark "$snap" pool/fs#last-received
cr0x@sender:~$ zfs list -t bookmark -o name,guid pool/fs#last-received
NAME                   GUID
pool/fs#last-received  15884259650518200641

Interpretation: A stable bookmark name gives you a stable “base” regardless of snapshot pruning or naming changes.

Checklist B: Make retention replication-aware (avoid cutting the branch you’re sitting on)

  1. On the sender, identify the snapshot corresponding to #last-received (or keep it as a snapshot if you prefer).
  2. Delete older snapshots first; never delete the anchor snapshot until you have moved the anchor forward. On the receiver, never delete the most recently replicated snapshot.
  3. Prune bookmarks too, but keep at least the fixed-name bookmark.
cr0x@sender:~$ zfs list -t bookmark -o name,createtxg -s createtxg pool/fs | tail -n 5
pool/fs#replica_2025-12-20  19591288
pool/fs#replica_2025-12-21  19634801
pool/fs#replica_2025-12-22  19678110
pool/fs#last-received       19678110

Interpretation: If #last-received tracks the latest, your retention should avoid deleting anything “newer than” its TXG-equivalent state. In practice, you protect the most recent replicated snapshot(s) and the anchor bookmark(s).

Checklist C: Recovery procedure when a sender snapshot was deleted

  1. Stop automated replication retries (they may trigger fallbacks like full sends).
  2. Check whether the sender has a bookmark with the base GUID.
  3. If it does, resume incrementals using that bookmark as the base—after confirming the receiver still holds a snapshot with the same GUID.
  4. If it doesn’t, decide between: recreate a common base (rarely possible), rollback target (dangerous), or full resend (expensive but reliable).
cr0x@sender:~$ base="pool/fs#replica_2025-12-20"
cr0x@sender:~$ next="pool/fs@replica_2025-12-22"
cr0x@sender:~$ zfs send -nv -i "$base" "$next"
send from pool/fs#replica_2025-12-20 to pool/fs@replica_2025-12-22 estimated size is 18.4G

cr0x@backup1:~$ zfs get -H -o value guid pool/fs@replica_2025-12-20
5721345558355301722

Interpretation: This is the “prove it before you do it” approach. If the receiver’s snapshot GUID matches the sender’s bookmark, you can usually restore incremental flow without destructive receive options.

Checklist D: Performance sanity for incremental streams

  1. Dry-run estimate size with zfs send -nv.
  2. Check sender read throughput and receiver write throughput (zpool iostat).
  3. Confirm CPU overhead if using compression or encryption streams.
  4. Confirm the dataset isn’t being thrashed by other workloads during replication.

FAQ

1) Can bookmarks replace snapshots for backups?

No. Bookmarks do not contain a browsable point-in-time filesystem view and don’t let you restore files directly. They’re replication ancestry markers, not recovery points.

2) Do bookmarks consume space?

They consume a small amount of metadata space. They do not keep data blocks alive the way snapshots do, so they won’t cause the same space retention behavior.

3) If I delete a snapshot but keep a bookmark, can I still do an incremental send from that point?

On the sender, yes: zfs send -i accepts the bookmark as the incremental source. On the receiver, no bookmark can stand in for the data: the receiver must still hold a snapshot whose GUID matches that base in order to apply the stream.

4) What’s the operational best practice: one bookmark per snapshot or a fixed-name bookmark?

In production, a fixed-name bookmark like #last-received is the boring, correct baseline. Some teams add periodic “anchor” bookmarks (weekly/monthly) for safety, but avoid unbounded growth.

5) Why do I still get “does not exist” when the receiver has a bookmark?

Usually because the receiver no longer has a snapshot matching the stream’s base GUID. Bookmarks carry no data, so zfs receive cannot apply an incremental on top of one. Verify that the sender’s base (whether #snapname or @snapname) and a snapshot on the receiver share the same GUID; if the receiver’s matching snapshot is gone, you are looking at a full send.

6) Are bookmarks safe across rollbacks and dataset recreation?

They’re safe in the sense that they remain accurate references to the dataset history they were created from. They’re not safe in the sense of “will still apply to a dataset that was destroyed and recreated with the same name.” Use GUID checks to avoid replicating into an unrelated dataset.

7) Do bookmarks help if the sender lost snapshots due to pruning?

Yes—that is their main job. If the sender bookmarked the base before pruning it, it can still generate the incremental with zfs send -i from the bookmark. If neither the snapshot nor a bookmark survived on the sender, you may be forced into a full send or a different recovery approach. The mirror image is not true: bookmarks do not help a receiver that pruned its matching snapshot.

8) Should I use zfs receive -F to “fix” incremental failures?

Only if the target is a strict replica and you accept losing newer snapshots on the receiver. If the receiver also serves as a backup repository with local retention, -F can silently delete the very recovery points you care about.

9) How do bookmarks interact with encryption and raw sends?

Bookmarks track ancestry regardless of whether you send raw streams, but encryption adds constraints: keys and properties must be compatible with how you receive. Use dry-runs and test a restore path, not just “replication succeeded.”

10) What’s the simplest “I can implement this today” version?

On the sender: after each confirmed receive, create/update #last-received pointing to the replicated snapshot; sender retention can then prune snapshots freely. On the receiver: make sure retention never deletes the most recently replicated snapshot until a newer one has arrived.
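A sketch of that hook, with every zfs call routed through a run() wrapper so DRY_RUN=1 prints the plan instead of executing it. The function and bookmark names are examples, not an established interface:

```shell
# Move the fixed-name anchor forward after a confirmed replication of $1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

update_anchor() {
  local snap=$1
  local ds=${snap%@*}                              # dataset part, e.g. pool/fs
  run zfs destroy "${ds}#last-received" || true    # fine if it doesn't exist yet
  run zfs bookmark "$snap" "${ds}#last-received"
}

# Rehearsal before wiring it into the pipeline:
# DRY_RUN=1 update_anchor pool/fs@replica_2025-12-22
```

Rehearsing destructive storage commands as printed plans is cheap insurance: you see the exact destroy/bookmark sequence before anything touches the pool.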

Conclusion

ZFS bookmarks are not glamorous, and that’s the point. They’re a small metadata feature that turns a category of replication failures into routine maintenance. When snapshots disappear—by accident, by retention, or by someone chasing free space—bookmarks can keep your incremental lineage intact on the sending side, which often means you avoid expensive full resends and the cascading performance fallout that comes with them.

If you run ZFS replication in production, treat bookmarks like seatbelts: you don’t install them because you plan to crash. You install them because you’ve met humans, cron, and quarter-end change freezes.
