ZFS snapshots are the closest thing storage has to time travel that still passes an audit. But time travel is only useful if you can answer the question your boss, your security team, or your own 3 a.m. brain will ask: what changed?
zfs diff is the blunt, honest tool for that job. It’s not fancy, it’s not a GUI, and it doesn’t care about your feelings. It tells you which paths were added, removed, modified, or renamed between two snapshots (or between a snapshot and “now”). Used well, it turns a scary “something happened” into a list you can act on.
What zfs diff is (and what it is not)
zfs diff compares two points in time within a single filesystem dataset (zvols have no file paths, so in practice this is a filesystem tool). Those points are usually snapshots like pool/ds@snapA and pool/ds@snapB, or a snapshot and the current live filesystem. The result is a stream of path-level changes: created files, removed files, modified files, renamed paths.
It is not a content-aware diff like diff -u. It will not tell you “line 83 changed.” It tells you “this file changed,” which is usually exactly what operations and incident response need. If you want content diffs, you can mount snapshots and run your own tools (and yes, that’s slower, noisier, and easier to mess up).
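If you do need a content-level diff, the snapshot directory makes it a one-liner. A minimal sketch, assuming the dataset is mounted at /app and the snapshots are named pre and post:
cr0x@server:~$ sudo diff -u /app/.zfs/snapshot/pre/etc/app/app.conf /app/.zfs/snapshot/post/etc/app/app.conf
Run it on the handful of files zfs diff flagged, not on the whole tree, unless you enjoy waiting.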
It is also not an authoritative record of why something changed, or which process did it. It’s a filesystem-level accounting tool: here’s the set of paths whose metadata/content state differs between two transaction groups.
Joke #1: Snapshots don’t lie, but administrators do—usually to themselves, five minutes before the incident call.
Facts and historical context you can use in meetings
These are short, concrete points that help you explain why ZFS snapshot diffs behave the way they do—and why that’s a feature, not an inconvenience.
- ZFS snapshots are copy-on-write, not “backup copies.” A snapshot is a consistent reference to existing blocks; new writes go elsewhere. That’s why snapshots are cheap and instant.
- “Rename” is not “copy then delete” at the filesystem semantics level. ZFS can detect renames as a first-class operation, and zfs diff can report them when it can correlate the object identity.
- ZFS was designed with end-to-end checksums as a baseline. That means “file changed” can be tied to real block changes, not just timestamps lying because an app touched metadata.
- Snapshot-based replication (send/receive) predates today’s “immutable backup” marketing. ZFS has been doing incremental snapshot streams for years; the “immutability” comes from policy and permissions, not a new file format.
- FreeBSD and illumos have long treated ZFS as a first-class citizen. Many production behaviors (including tooling maturity around snapshots) were hammered out there before Linux ZFS reached the same operational comfort.
- zfs diff output is a view of metadata and object state, not application intent. For example, a database checkpoint may change lots of files even if the logical “data” barely moved.
- ZFS datasets can have different mountpoints and can be unmounted. zfs diff doesn’t need the dataset mounted to compare snapshots; it compares snapshot trees internally.
- Clones are writable snapshots. The diff between a snapshot and a clone’s snapshot can be a clean “what diverged” story—useful in dev/test, or when prod accidentally ran on a clone (yes, it happens).
How zfs diff actually works under the hood
At a high level, ZFS is tracking object sets. A dataset is an object set; a snapshot is a read-only, pinned version of that object set. zfs diff walks two versions and compares directory entries and object metadata, then emits a set of path changes.
The important operational consequence: the diff is based on filesystem objects, not on your application’s worldview. If an application rewrites a file in place, you’ll see a modification. If it writes a new temp file and renames it over the original, you may see a modification plus rename behavior depending on how the operation landed in the txg and whether object identity correlation is possible.
Another consequence: path-based reporting is reconstructed. ZFS has object numbers and directory structures; to print “/var/log/messages,” the tool must walk directories and reconstruct names. On huge directory trees, especially with churn, this can be expensive. “Expensive” here doesn’t necessarily mean “slow disk”; it can mean “a lot of metadata walking,” which turns into ARC pressure or lots of random reads if it misses cache.
One subtlety worth remembering: the live filesystem (“now”) is not a snapshot. If you run zfs diff pool/ds@snap against the live dataset while it’s actively changing, you’re asking for a moving target. ZFS will do its best, but your operational conclusion should be “this is indicative,” not “this is a court transcript.” For a stable comparison, diff snapshot-to-snapshot.
Reading the output without lying to yourself
The canonical output format is a single-character change code followed by a path. Common codes you’ll see:
+  path added
-  path removed
M  path modified (content or metadata change)
R  path renamed (often shown as “from” and “to” paths depending on implementation)
Interpretation advice from someone who’s been burned:
“Modified” is bigger than “contents changed”
A file can be “modified” because of permissions, ownership, ACLs, xattrs, timestamps, link count changes, or content. If you’re doing incident response, you often care about content changes; if you’re doing compliance, you might care about metadata changes even more.
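A quick way to separate the two, assuming the dataset is mounted at /app, the snapshots are named pre and post, and GNU stat is available: compare size, mtime, mode, and ownership of the same path in both snapshots.
cr0x@server:~$ sudo stat -c '%s %y %a %U:%G %n' /app/.zfs/snapshot/pre/etc/app/app.conf /app/.zfs/snapshot/post/etc/app/app.conf
Same size and mtime with different mode or ownership points at a permissions/ACL event, not a content rewrite.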
Renames are your friend—until they aren’t
When a rename is detected, it’s a gift: it means you can track movement without panicking about a delete+create. But rename detection can fail when the correlation is ambiguous (massive churn, temp files, or patterns that break object identity mapping). Don’t treat a missing R as proof that a rename didn’t happen; treat it as “the tool couldn’t reliably prove it.”
Deletions are often the most valuable signal
If you’re debugging a broken deployment, a single removed file in /etc can explain more than 5,000 “modified” files in /var. When reading diffs, filter by directories that map to your failure domain.
Joke #2: zfs diff is like a code review: it doesn’t fix the problem, but it does remove your ability to pretend nothing happened.
Practical tasks: 14 real commands with interpretation
All examples assume a dataset like tank/app with snapshots @pre and @post. Adjust names to your environment.
Task 1: Compare two snapshots (baseline)
cr0x@server:~$ sudo zfs diff tank/app@pre tank/app@post
M /etc/app/app.conf
+ /var/lib/app/new-index.db
- /var/lib/app/old-index.db
R /var/log/app/old.log -> /var/log/app/old.log.1
Interpretation: Config changed, index rotated, one file replaced, and a log rotated via rename. This is typical of a package upgrade or application maintenance job.
Task 2: Compare a snapshot to the current live filesystem
cr0x@server:~$ sudo zfs diff tank/app@pre
M /etc/app/app.conf
+ /var/tmp/app-build-9281.tmp
Interpretation: “Since @pre, what differs now?” Good for quick “what did we change since last known-good” checks. Beware high churn: results can change while you read them.
Task 3: Limit noise by grepping to a subtree
cr0x@server:~$ sudo zfs diff tank/app@pre tank/app@post | grep '^M[[:space:]]\+/etc/'
M /etc/app/app.conf
M /etc/app/limits.conf
Interpretation: Reduce “changed logs” noise. In outages, you usually want a short list of config and binary changes first.
Task 4: Count changes by type (cheap triage)
cr0x@server:~$ sudo zfs diff tank/app@pre tank/app@post | awk '{print $1}' | sort | uniq -c
82 +
10 -
911 M
4 R
Interpretation: 911 modified paths is a lot. If you expected a config-only change, this is a red flag. If you expected a package upgrade or a DB migration, it might be normal.
Task 5: Find the “top directories” affected
cr0x@server:~$ sudo zfs diff tank/app@pre tank/app@post \
| awk '{print $2}' \
| awk -F/ 'NF>2 {print "/"$2"/"$3} NF==2 {print "/"$2} NF==1 {print "/"}' \
| sort | uniq -c | sort -nr | head
640 /var/lib
210 /var/log
95 /usr/local
12 /etc/app
Interpretation: Most churn is under /var/lib and /var/log. That suggests application data movement and logging—not random config drift.
Task 6: Verify snapshots exist and belong to the same dataset lineage
cr0x@server:~$ zfs list -t snapshot -o name,creation -s creation | grep '^tank/app@'
tank/app@pre Tue Dec 23 11:58 2025
tank/app@post Tue Dec 23 12:12 2025
Interpretation: If snapshots were taken from different datasets (or after a rollback/clone divergence), diffs may be meaningless or fail.
Task 7: Confirm the dataset mountpoint and whether it’s mounted
cr0x@server:~$ zfs get -H -o property,value mountpoint,mounted tank/app
mountpoint /app
mounted yes
Interpretation: Not required for zfs diff, but required if you plan to inspect files directly (e.g., mount snapshots via .zfs/snapshot).
Task 8: Inspect a changed file across snapshots (content-level confirmation)
cr0x@server:~$ sudo sed -n '1,120p' /app/.zfs/snapshot/pre/etc/app/app.conf
# app.conf - baseline
max_workers=32
log_level=info
cr0x@server:~$ sudo sed -n '1,120p' /app/.zfs/snapshot/post/etc/app/app.conf
# app.conf - after deploy
max_workers=64
log_level=debug
Interpretation: zfs diff told you “modified”; this tells you “how.” In incidents, do this for the handful of files that matter.
Task 9: Detect suspicious mass renames (common in ransomware and “log cleanup” scripts)
cr0x@server:~$ sudo zfs diff tank/app@pre tank/app@post | grep '^R' | head
R /var/lib/app/data/part-0001 -> /var/lib/app/data/part-0001.locked
R /var/lib/app/data/part-0002 -> /var/lib/app/data/part-0002.locked
R /var/lib/app/data/part-0003 -> /var/lib/app/data/part-0003.locked
Interpretation: A burst of uniform renames with new extensions is suspicious. It might be a legitimate archival process. It might also be a “fast encryption then rename” pattern. Escalate quickly.
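Before escalating, quantify the burst instead of eyeballing it. A rough sketch against the same snapshots (the second command groups renames by their new extension; paths containing spaces will confuse it, which is acceptable for triage):
cr0x@server:~$ sudo zfs diff tank/app@pre tank/app@post | grep -c '^R'
cr0x@server:~$ sudo zfs diff tank/app@pre tank/app@post | awk '$1 == "R" {print $NF}' | grep -o '\.[^./]*$' | sort | uniq -c | sort -nr | head
Hundreds of renames converging on one new extension is the kind of number you put in the incident channel, not in a weekly report.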
Task 10: Validate whether incremental send should include the expected changes
cr0x@server:~$ sudo zfs diff tank/app@replica-base tank/app@replica-next | head -20
M /etc/app/app.conf
+ /var/lib/app/new-index.db
Interpretation: If your replication target “didn’t get the config change,” compare the exact snapshots you’re sending. Many replication “mysteries” are just “we sent the wrong base snapshot.”
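You can also ask the replication machinery what an incremental stream between those two snapshots would carry, without sending a byte. A dry-run sketch:
cr0x@server:~$ sudo zfs send -nv -i tank/app@replica-base tank/app@replica-next
If the estimated stream size is implausibly small while the diff shows real changes, you are probably not sending from the base snapshot you think you are.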
Task 11: Correlate snapshot diffs with space usage (is this change expensive?)
cr0x@server:~$ zfs list -o name,used,refer,avail,mountpoint tank/app
NAME USED REFER AVAIL MOUNTPOINT
tank/app 412G 380G 2.1T /app
cr0x@server:~$ zfs get -H -o name,property,value usedbysnapshots,usedbydataset,usedbychildren tank/app
tank/app usedbysnapshots 31.4G
tank/app usedbydataset 380G
tank/app usedbychildren 0B
Interpretation: Lots of “modified” files doesn’t always mean lots of new blocks, but it often correlates. If usedbysnapshots is rising quickly, churn is pinning old blocks.
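To see which snapshots are pinning those old blocks, list them with their used column; keep in mind that a snapshot’s used counts only the blocks unique to that snapshot, so the sum can understate the total:
cr0x@server:~$ zfs list -t snapshot -o name,used,refer,creation -s creation tank/app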
Task 12: Spot “metadata churn” that looks like content churn
cr0x@server:~$ sudo zfs diff tank/app@pre tank/app@post | grep '^M[[:space:]]\+/var/log/' | head
M /var/log/app/access.log
M /var/log/app/error.log
M /var/log/app/metrics.log
Interpretation: Logs changing is normal; don’t let it drown the signal. If you need a “what changed in code/config,” filter out known churn directories.
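A quick way to get the “code and config only” view is to invert the filter and drop the known churn directories. A sketch, assuming logs, temp files, and caches live in their usual places:
cr0x@server:~$ sudo zfs diff tank/app@pre tank/app@post | grep -vE '[[:space:]]/(var/log|var/tmp|var/cache)/' | head -50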
Task 13: Confirm snapshot naming and ordering to avoid “backwards diff” confusion
cr0x@server:~$ zfs list -t snapshot -o name,creation -s creation tank/app | tail -5
tank/app@daily-2025-12-21 Sun Dec 21 00:00 2025
tank/app@daily-2025-12-22 Mon Dec 22 00:00 2025
tank/app@daily-2025-12-23 Tue Dec 23 00:00 2025
tank/app@pre Tue Dec 23 11:58 2025
tank/app@post Tue Dec 23 12:12 2025
Interpretation: Passing the snapshots in the wrong order (newer first) will either fail outright or leave you second-guessing plus and minus. Always verify creation times.
Task 14: Use zfs diff output as an allowlist for surgical restores
cr0x@server:~$ sudo zfs diff tank/app@good tank/app@bad | awk '$1 ~ /^(M|-)$/ {print $2}' | head
/etc/app/app.conf
/usr/local/bin/app
/usr/local/lib/libapp.so
cr0x@server:~$ sudo cp -a /app/.zfs/snapshot/good/usr/local/bin/app /app/usr/local/bin/app
Interpretation: Sometimes you don’t want a full rollback; you want to restore a small set of known-good files. Snapshot browsing plus a carefully filtered diff gives you a plan that won’t nuke unrelated good data.
Three corporate-world stories (pain included)
Mini-story 1: The incident caused by a wrong assumption (snapshots as “backups”)
The setup was familiar: a business-critical application dataset on ZFS, hourly snapshots, and a replication job to a secondary box. Everyone slept well because “we have snapshots.” The team even had a wiki page that said “snapshots = backups,” which is the sort of sentence that should come with a warning label and a fire extinguisher.
Then a developer ran a cleanup script against production. It wasn’t malicious; it was just pointed at the wrong environment variable. A couple of directories under /var/lib/app were recursively removed. Within minutes, monitoring turned into screaming. The on-call person said the sentence you never want to hear: “It’s okay, we have snapshots, I’ll just roll back.”
The rollback restored the deleted data. It also reverted a handful of legitimate writes made after the snapshot—writes that included customer-facing state. The resulting inconsistency wasn’t catastrophic, but it was ugly: retries, duplicate events, and a few hours of manual reconciliation with support and operations. The postmortem wasn’t about ZFS being risky; it was about the wrong assumption that the “safest” action is the fastest one.
What fixed it next time was boring discipline: before any rollback, they ran zfs diff between the last known-good snapshot and the current state. The diff made the blast radius visible. Instead of rolling back the entire dataset, they restored only the deleted subtrees from the snapshot. They kept the legitimate writes. The next incident wasn’t fun, but it was contained—and containment is what “resilience” looks like when the pager is loud.
Mini-story 2: The optimization that backfired (aggressive snapshot cadence + diff in cron)
A different organization wanted “near real-time auditing” for a compliance initiative. The plan: take snapshots every five minutes, run zfs diff between the last two, archive the output, and ship it to a central system. On paper, it looked elegant. In practice, it was a small denial-of-service against their own storage.
First problem: the dataset was a build cache with millions of small files. zfs diff had to walk huge directory trees and reconstruct paths. Every five minutes. That translated into relentless metadata pressure. ARC hit ratios dipped. Latency spiked. Developers complained that “ZFS is slow,” which is the storage equivalent of blaming the road for your bad driving.
Second problem: the output was gigantic and mostly noise. Build systems churn. They create temp files, rename things, delete directories, repeat. The compliance team wanted “what changed,” but what they got was “everything always changes.” The signal-to-noise ratio was so low that nobody looked at the reports, which is compliance’s secret failure mode: you can generate an infinite amount of evidence that nobody reads.
The recovery plan was pragmatic. They reduced snapshot frequency to match business needs (hourly for most, more frequent only for a few small datasets). They stopped running diffs on noisy caches and instead focused on config and data-of-record datasets. And when they did run diffs, they filtered to meaningful subtrees and summarized counts rather than archiving every path. The optimization didn’t fail because ZFS can’t do it; it failed because the workload didn’t deserve that level of scrutiny.
Mini-story 3: The boring but correct practice that saved the day (diff before replication cutover)
This one is the sort of story nobody tells at conferences because it’s not glamorous. A team planned a storage migration: new servers, new pool layout, ZFS send/receive replication, then a cutover window. The usual anxiety: “Will the target be identical?” and “What if we miss something?”
They did something unsexy: they wrote a runbook that required a pre-cutover zfs diff on a frozen pair of snapshots—one on the source, one on the destination after receive. Not “trust the replication job,” not “compare zfs list outputs,” but an explicit change list check on representative datasets.
During dress rehearsal, the diff showed a handful of unexpected changes under a directory that “should never change.” It turned out one service on the destination host had auto-started and began writing cache files into a mountpoint that was supposed to stay idle until cutover. The replication itself was fine; the environment wasn’t.
They fixed the service ordering, repeated the rehearsal, and the diffs went quiet. On cutover night, it was anticlimactic in the best way. The boring practice—diffing snapshots that represent “source-of-truth” and “received-as-truth”—caught a real divergence before it became a late-night data integrity argument.
Fast diagnosis playbook
When you need answers quickly, don’t start by staring at thousands of diff lines. Start with a small number of checks that narrow the problem into “expected change,” “unexpected change,” “tooling problem,” or “performance bottleneck.”
Step 1: Confirm you’re comparing the right snapshots
cr0x@server:~$ zfs list -t snapshot -o name,creation -s creation tank/app | tail -10
Look for: Are the snapshot names correct? Are they in the order you think? Did someone roll back and re-create snapshots with confusing names?
Step 2: Determine whether the change is “data churn” or “control-plane churn”
cr0x@server:~$ sudo zfs diff tank/app@pre tank/app@post | awk '{print $2}' | head
Then immediately: summarize where it’s happening.
cr0x@server:~$ sudo zfs diff tank/app@pre tank/app@post \
| awk '{print $2}' \
| awk -F/ 'NF>2 {print "/"$2"/"$3} NF==2 {print "/"$2} NF==1 {print "/"}' \
| sort | uniq -c | sort -nr | head -15
Look for: Mostly /var/log and /var/tmp (usually noise) vs /etc, /usr, and the app’s data directories (usually meaningful).
Step 3: If performance is the bottleneck, check whether you’re metadata-bound
cr0x@server:~$ arcstat 1 5
time read miss miss% dmis dm% pmis pm% mmis mm% arcsz c
12:00:01 210 88 41 60 28 22 10 6 2 128G 192G
Interpretation: High miss rates during a diff run suggest metadata isn’t in ARC and the system is doing real work to walk directories. If you don’t have arcstat, use OS-native IO and memory telemetry; the pattern is the same: CPU + random reads + cache misses.
Step 4: Validate dataset properties that change the meaning of “modified”
cr0x@server:~$ zfs get -o name,property,value atime,recordsize,compression,acltype,xattr,overlay,encryption tank/app
Look for: atime=on can create metadata churn. ACL and xattr behavior changes can create waves of “modified” results across many files.
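If atime turns out to be the culprit and nothing on the box depends on access times, turning it off removes a whole class of “modified” noise going forward. A sketch:
cr0x@server:~$ zfs get -H -o value atime tank/app
cr0x@server:~$ sudo zfs set atime=off tank/app
The change affects future diffs only; existing snapshots already recorded the churn.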
Step 5: If the diff is huge, decide whether you need a summary or a surgical answer
If you’re on an incident call, you usually don’t need all paths. You need: “Did anything change under /etc?” “Did binaries under /usr/local/bin change?” “Were files deleted under the data directory?” That’s a filtering job, not a complete inventory.
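Those three questions translate directly into three targeted filters. A sketch, assuming the application’s data directory is /var/lib/app:
cr0x@server:~$ sudo zfs diff tank/app@pre tank/app@post | grep -E '^[M+-][[:space:]]+/etc/'
cr0x@server:~$ sudo zfs diff tank/app@pre tank/app@post | grep -E '^[M+-][[:space:]]+/usr/local/bin/'
cr0x@server:~$ sudo zfs diff tank/app@pre tank/app@post | grep -E '^-[[:space:]]+/var/lib/app/'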
Common mistakes (symptoms and fixes)
Mistake 1: Diffing the wrong dataset because mountpoints look similar
Symptom: zfs diff reports changes that don’t match what you saw on disk, or you can’t find the paths in the live filesystem.
Cause: Multiple datasets mounted under similar paths, or a legacy mountpoint change. You diffed tank/app but you were looking at tank/app/data (or vice versa).
Fix: Confirm mountpoints and dataset boundaries.
cr0x@server:~$ zfs list -o name,mountpoint -r tank/app
NAME MOUNTPOINT
tank/app /app
tank/app/data /app/data
Mistake 2: Treating snapshot-to-live diffs as stable truth during active writes
Symptom: You rerun the same diff and results change; you see temp files appear/disappear; you can’t reconcile counts.
Cause: The live filesystem is changing while you compare it to a fixed snapshot.
Fix: Take a second snapshot and compare snapshot-to-snapshot.
cr0x@server:~$ sudo zfs snapshot tank/app@now
cr0x@server:~$ sudo zfs diff tank/app@pre tank/app@now | head
Mistake 3: Assuming “M” means “file contents changed”
Symptom: Security flags “thousands of modified files” after a scan, but application behavior is normal.
Cause: Metadata changes (ACLs, xattrs, timestamps) are modifications too.
Fix: Validate with a content hash on a small sample, or examine which directories are changing. Also check if something changed ACL policies or extended attributes behavior.
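A minimal spot check, assuming the dataset is mounted at /app and the snapshots are named pre and post: hash the same file in both snapshots. Identical hashes mean the “modification” was metadata-only.
cr0x@server:~$ sudo sha256sum /app/.zfs/snapshot/pre/etc/app/app.conf /app/.zfs/snapshot/post/etc/app/app.conf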
Mistake 4: Forgetting that deletes might be masked by later recreation
Symptom: An incident suggests a file was deleted, but the current filesystem has a file at that path.
Cause: The file was deleted and later recreated (possibly different content/ownership). Depending on timing and application behavior, the diff might show a modification or a delete+add rather than a clean story.
Fix: Use snapshot browsing to inspect the inode history by content and metadata at both points in time. Treat the path as “changed,” not “same name implies same file.”
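Comparing inode numbers between the two snapshot copies usually settles it. A sketch with a hypothetical path; within a single ZFS filesystem a file keeps its inode (object number) across snapshots, so a different inode strongly suggests delete-and-recreate rather than in-place modification:
cr0x@server:~$ sudo ls -li /app/.zfs/snapshot/pre/var/lib/app/state.db /app/.zfs/snapshot/post/var/lib/app/state.db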
Mistake 5: Running huge diffs on production at the worst possible time
Symptom: Latency spikes during zfs diff, users complain, graphs go red.
Cause: Metadata-heavy traversal can compete with the workload for ARC and IO. On some datasets, this is effectively a read storm.
Fix: Run diffs off-peak, diff smaller subtrees (filter output), or perform diffs on a replica host where possible. If you must diff in prod, keep it short and targeted.
Mistake 6: Confusing “no output” with “no change” when permissions block visibility
Symptom: You get errors, truncated output, or unexpectedly empty results when you know changes occurred.
Cause: Insufficient privileges to traverse snapshots or dataset internals, especially on hardened systems.
Fix: Run as root (or with the appropriate delegated ZFS permissions). Confirm the dataset is accessible and that snapshot directories are not blocked by mount options or security frameworks.
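If handing out root is off the table, ZFS delegation can grant just enough. A sketch, assuming an unprivileged operator account named cr0x; the exact permission set varies by platform and workflow, so verify with zfs allow afterward:
cr0x@server:~$ sudo zfs allow cr0x diff,snapshot tank/app
cr0x@server:~$ zfs allow tank/app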
Checklists / step-by-step plan
Checklist: “What changed since the last known-good deploy?”
- Identify the last known-good snapshot (usually taken pre-deploy).
- Take a “now” snapshot if the system is still changing.
- Run zfs diff snapshot-to-snapshot.
- Summarize changes by type and by directory.
- Inspect a short list of high-value paths: config, binaries, service unit files, and secrets.
- Decide between rollback, selective restore, or forward-fix.
cr0x@server:~$ sudo zfs snapshot tank/app@now
cr0x@server:~$ sudo zfs diff tank/app@predeploy tank/app@now | awk '{print $1}' | sort | uniq -c
cr0x@server:~$ sudo zfs diff tank/app@predeploy tank/app@now | grep -E '^[M+-][[:space:]]+/(etc|usr|opt)/' | head -200
Decision tip: If changes are concentrated in app data and logs, a rollback might destroy legitimate state. If changes are in binaries/config only, rollback is safer.
Checklist: “Selective restore without a full rollback”
- Diff known-good vs bad snapshot and extract deleted/modified files in critical directories.
- Verify existence and sanity of the snapshot copies.
- Restore with cp -a (or rsync -aHAX if available) from .zfs/snapshot.
- Restart the service and re-check.
- Document exactly what you restored (because the next person will ask).
cr0x@server:~$ sudo zfs diff tank/app@good tank/app@bad | grep -E '^[M-][[:space:]]+/(etc/app|usr/local/bin)/' | head -50
cr0x@server:~$ sudo cp -a /app/.zfs/snapshot/good/etc/app/app.conf /app/etc/app/app.conf
cr0x@server:~$ sudo cp -a /app/.zfs/snapshot/good/usr/local/bin/app /app/usr/local/bin/app
Checklist: “Replication sanity check before cutover”
- Pick a specific snapshot name that exists on both source and destination after receive.
- Compare representative datasets’ snapshot diffs: source baseline vs source cutover snapshot; then verify the destination has the same snapshot and stays quiescent.
- Ensure nothing is writing to destination mountpoints prior to cutover.
cr0x@server:~$ zfs list -t snapshot -o name | grep '^tank/app@cutover$'
cr0x@server:~$ sudo zfs diff tank/app@baseline tank/app@cutover | head
FAQ
1) Does zfs diff show file contents?
No. It reports changed paths and the change type (added/removed/modified/renamed). To see contents, browse snapshot directories and use text tools, or compute hashes on both versions.
2) Can I diff snapshots across different datasets?
Not in the sense of “compare arbitrary trees.” zfs diff is meant for snapshots within the same dataset lineage. If you need cross-dataset comparisons, mount both snapshots and use external tools (with the usual performance and correctness caveats).
3) Why do I see thousands of M entries after a security scan?
Because “modified” includes metadata changes, and scans can touch atime or xattrs depending on configuration. Check atime and whether the scanner writes extended attributes or quarantine tags.
4) Why is zfs diff slow on this one dataset?
Large directory trees with lots of small files and churn are worst-case. The tool has to walk and reconstruct paths, which is metadata-heavy. If ARC is cold, it will do real IO. Summarize first, filter to subtrees, and consider running diffs on a replica host.
5) Is a rename always shown as R?
No. Rename detection depends on the tool’s ability to correlate object identity and directory structure changes. If it can’t prove a rename, you may see a delete plus an add. Treat that as “something moved or was replaced,” then confirm by inspecting snapshot contents.
6) Can I use zfs diff to detect ransomware?
It’s a good early warning signal: mass modifications, uniform renames, sudden deletion bursts. But it won’t tell you which process did it. Pair it with immutable snapshot policy, access controls, and host-level telemetry.
7) What’s safer in an incident: rollback or selective restore?
Rollback is fast and clean when the dataset is mostly code/config and the system can tolerate reverting state. Selective restore is safer when the dataset includes live data-of-record. Use zfs diff to decide instead of guessing.
8) Does encryption change how diffs work?
Not conceptually. You still diff snapshot trees within the dataset. Operationally, ensure you have keys loaded and permissions to traverse the dataset and snapshots; otherwise you’ll get errors or incomplete visibility.
9) How do I avoid noise when I only care about config drift?
Filter the diff output to the directories that matter (/etc, service definitions, app config). Also consider separating datasets: put logs and caches in their own dataset so diffs on “config dataset” remain readable.
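Splitting churn into its own dataset is a one-time change that pays off every time you read a diff. A sketch, assuming the parent is mounted at /app and you want logs in a child dataset (migrating the existing log files is a separate, careful step; the new mount shadows whatever was already in that directory):
cr0x@server:~$ sudo zfs create -o mountpoint=/app/var/log tank/app/logs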
10) Can I automate zfs diff for audit trails?
Yes, but be careful: high-frequency diffs on high-churn datasets can become self-inflicted load and generate useless noise. Summarize and scope the output to meaningful subtrees.
Conclusion
zfs diff is the tool you reach for when the timeline matters and you’re done guessing. It turns snapshots from “we can roll back” into “we can explain exactly what happened,” which is a different and more powerful capability.
Use it like an operator: verify you’re diffing the right snapshots, summarize before you spelunk, filter aggressively, and confirm high-impact paths by inspecting snapshot contents. And when it’s slow or noisy, take that as information about your dataset design and churn—not as a personal insult from the filesystem.