Everything is fine until the pool isn't. One day you're doing a routine zfs send; the next day someone slacks you a screenshot of a checksum error like it's a horoscope. You run zpool status, it points vaguely at "corrupted data," and your confidence takes a small but measurable IOPS hit.
This is where zdb shows up. It’s the ZFS debugger: a flashlight that also doubles as a laser. It can tell you what ZFS really thinks is on disk, not what the nice high-level tools politely report. It’s also the tool people avoid until they need it—because it’s sharp, under-documented in places, and not shy about showing you uncomfortable truths.
What zdb is (and what it is not)
zdb is ZFS’s introspection tool. If zpool is your fleet dashboard and zfs is your product UI, zdb is the engineer’s console with the back cover removed. It reads the on-disk structures and can print internal metadata: block pointers, object sets, datasets, metaslabs, the MOS (Meta Object Set), feature flags, DDT (dedup tables), spacemap accounting, and more. It’s not “a repair tool” in the way people hope. It’s a truth tool. Repairs are typically scrubs, resilvers, replacing hardware, or restoring from known-good copies.
The fear around zdb isn’t irrational. It’s a “debugger” in the literal sense: it shows you the plumbing, including the places where plumbing can explode. And it has options that can be expensive or disruptive if you run them casually on a production box during peak traffic. Most of the time, you’ll use it in read-only, non-destructive ways. You’ll ask: “What is ZFS seeing?” not “Can I poke this until it changes?”
What zdb is great at
- Confirming space accounting: why df disagrees with zfs list, why "used" doesn't add up, and where the bytes went.
- Explaining performance: metaslab fragmentation, recordsize behavior, compression ratios, indirect blocks, and why a vdev is screaming.
- Investigating corruption: mapping reported errors to datasets and sometimes to objects/blocks, especially when combined with zpool status -v and scrub output.
- Understanding features: verifying what's enabled, active, or required on a pool.
- Forensics: sometimes identifying objects associated with paths, and sometimes understanding what changed between snapshots.
What zdb is not
- Not a routine monitoring tool: it’s not for cronjobs that run hourly across all pools. Don’t normalize running deep debug scans as “observability.”
- Not a magic “undelete” button: ZFS is copy-on-write; snapshots are your undelete. If you don’t have them, zdb can help with forensics but rarely with miracles.
- Not a substitute for scrubs: zpool scrub is how you validate and repair checksums. zdb can point and explain; scrub can correct (if redundancy exists).
One useful mindset: zdb is a microscope, not a defibrillator. If you’re trying to “bring the pool back,” your real tools are hardware replacement, import/recovery options, backups, and calm.
Interesting facts and history that actually matters
Some context makes zdb less mystical and more… mechanical. Here are a few concrete points that influence how you should think about it.
- ZFS was born at Sun Microsystems as an end-to-end data integrity filesystem + volume manager, and zdb is part of that “trust, but verify” culture baked into the tooling.
- zdb exists because ZFS metadata is rich: object sets, dnodes, block pointers, checksums, and space maps are first-class citizens. The tool is basically a metadata printer with opinions.
- Modern ZFS implementations diverged (Illumos, OpenZFS, vendor ports). zdb output and flags can differ between platforms and versions—treat online examples as sketches, not gospel.
- Feature flags replaced version numbers in pool on-disk format evolution. zdb is one of the clearest ways to see what features are enabled and active on a pool.
- Copy-on-write changes “corruption” narratives: old blocks don’t get overwritten in place. That’s great for consistency and snapshots, but it changes how you reason about “the file that got corrupted yesterday.”
- Scrub/resilver semantics matter: ZFS can repair from redundancy if it knows which copy is correct (via checksums). zdb helps you understand where and why scrub complained.
- Dedup has always been a sharp knife: the DDT is metadata-heavy and can become the performance bottleneck. zdb exposes DDT stats that explain “why everything is slow” better than most dashboards.
- Metaslab fragmentation is real and shows up as allocator pain. zdb gives you allocator-level visibility when “the disks are not full but writes are slow.”
- zdb has historically been labeled “for developers” in tone, not capability. In production, it’s for SREs too—when used intentionally and sparingly.
And yes: zdb can feel like reading a hex dump with aspirations. But it’s structured, consistent, and once you learn a handful of outputs, it becomes a practical tool—not a haunted one.
Safety rules: how not to turn debugging into an incident
Rule zero: don’t run heavyweight zdb commands on a pool that’s already limping unless you understand the cost. When latency is on fire, your “quick check” can be the last straw.
Operational guardrails
- Prefer read-only operations and avoid options that scan every block unless you’re off-peak or on a replica.
- Capture context first: zpool status, zpool get, zfs get, arcstat (if available), and OS-level I/O stats. zdb is not the first tool; it's the "show me the internals" tool (see the baseline capture sketch after this list).
- Pin your version expectations: zdb flags differ. Run zdb -? and trust local help more than muscle memory.
- Don't "optimize" based solely on zdb: it shows internals, not user experience. Always correlate with application latency, queue depths, and error logs.
- Never test recovery flags on the only copy: if you’re trying pool import recovery options, practice on a clone or a snapshot-backed replica first.
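A minimal baseline-capture sketch, assuming a pool named tank, a Linux host with sysstat installed, and /var/tmp as a scratch location; adjust names and paths to your environment:
cr0x@server:~$ sudo zpool status -v tank > /var/tmp/ctx-zpool-status.txt
cr0x@server:~$ sudo zpool get all tank > /var/tmp/ctx-zpool-get.txt
cr0x@server:~$ sudo zfs get -r used,available,referenced,compressratio tank > /var/tmp/ctx-zfs-get.txt
cr0x@server:~$ iostat -x 5 3 > /var/tmp/ctx-iostat.txt 2>&1
Attach these files to the ticket before you touch zdb; "before" data is the cheapest evidence you will ever collect.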
Joke #1: zdb is like opening the hood while driving—technically possible, socially frowned upon.
A single reliability quote you should keep nearby
Hope is not a strategy.
— General Gordon R. Sullivan
zdb is what you use when you stop hoping and start proving.
A mental model: what zdb can see that zfs/zpool won’t show you
ZFS is layered. When you type zfs list, you’re asking a friendly library: “What datasets do you know about?” When you type zdb, you’re asking: “What is written on disk, in the MOS, in the object sets, in the block trees?” That difference matters when metadata is inconsistent, when the pool is partially imported, or when you’re doing forensics.
The handful of internal nouns you need
- Pool / SPA: the storage pool allocator—top-level pool logic.
- MOS (Meta Object Set): the pool’s metadata “filesystem.” If the MOS is unhappy, everything is unhappy.
- Dataset: the ZFS notion of a filesystem/volume/snapshot, with properties and references to blocks.
- Object set: the collection of objects (dnodes) for a dataset.
- Dnode: metadata describing an object (file, directory, ZVOL block device, etc.).
- Block pointer (blkptr): a reference to a block, including size, checksum, birth txg, and physical DVAs (where it lives).
- TXG: transaction group. A time-ish sequence of changes committed together.
- Metaslab: per-vdev space allocator unit. Fragmentation and free space live here.
Most production use of zdb falls into two categories: (1) “tell me why space/perf looks wrong,” and (2) “tell me what block/object is broken and how bad it is.” If you keep that split in your head, you’ll choose better commands and stop spelunking for sport.
Fast diagnosis playbook (check first/second/third)
This is the on-call version. You’re allowed to be tired, but not allowed to be random.
First: establish whether you have a reliability problem or a performance problem
- Reliability signals: checksum errors, I/O errors, degraded/faulted vdevs, scrub errors, unexpected read/write errors.
- Performance signals: high latency with “healthy” pool, slow writes, slow reads, space nearly full, weird free space behavior, ARC misses, allocator thrash.
Second: decide if zdb is needed
- If zpool status clearly points to a disk and redundancy can heal, do that first (replace, resilver, scrub). zdb is helpful but not mandatory.
- If the symptom is "space doesn't add up," "metaslabs are fragmented," "what dataset is consuming space," or "what features are active," zdb is often the fastest path.
- If the pool won’t import, zdb can tell you whether labels/MOS look sane before you start recovery options.
Third: run the cheapest zdb queries that answer the question
- Pool config and feature flags: confirms what you’re dealing with and whether an import mismatch is likely.
- Space accounting: checks “where did my space go” without walking every block.
- Metaslab stats: checks fragmentation and allocation behavior.
- Targeted object inspection: only when you already have a suspect dataset/object.
If you’re looking for a bottleneck quickly: determine whether you’re constrained by (a) a single vdev’s latency, (b) allocation/fragmentation, (c) dedup/DDT, (d) small-record writes, (e) sync writes/log device behavior, or (f) plain old hardware errors. zdb helps most with (b) and (c), and with explaining (a) after the fact.
Practical tasks: commands, output meaning, and the decision you make
These are real things you can do in production without turning your storage box into a science project. Each task includes a command, an example of the kind of output you’ll see, what it means, and the operational decision you make from it.
Task 1: Confirm the pool’s on-disk view and top-level health clues
cr0x@server:~$ sudo zdb -C tank
MOS Configuration:
        vdev_tree:
            type: 'root'
            id: 0
            guid: 12345678901234567890
            children[0]:
                type: 'raidz'
                id: 0
                guid: 9876543210987654321
                ashift: 12
                nparity: 2
                children[0]:
                    type: 'disk'
                    path: '/dev/disk/by-id/ata-SAMSUNG_SSD_1'
                    guid: 1111
                children[1]:
                    type: 'disk'
                    path: '/dev/disk/by-id/ata-SAMSUNG_SSD_2'
                    guid: 2222
        features_for_read:
            com.delphix:hole_birth
            com.delphix:embedded_data
            org.zfsonlinux:project_quota
What it means: This prints the MOS config: vdev topology, ashift, GUIDs, and the features required for read. It’s the “what does disk say the pool is” truth source.
Decision: If your OS device paths changed but the GUIDs still match, you're fine. If features_for_read includes something your target host doesn't support, importing elsewhere will fail. Plan upgrades accordingly.
Task 2: List datasets and space at the pool’s internal accounting level
cr0x@server:~$ sudo zdb -d tank
Dataset tank [ZPL], ID 50, cr_txg 4, 1.23G used, 7.88T available
Dataset tank/home [ZPL], ID 54, cr_txg 120, 310G used, 7.55T available
Dataset tank/vm [ZVOL], ID 61, cr_txg 2201, 2.10T used, 5.75T available
What it means: zdb is summarizing dataset usage from ZFS’s internal structures, not from df. It can highlight datasets you forgot existed (especially zvols).
Decision: If a dataset surprises you, stop guessing. Confirm with zfs list -o space and decide whether you need quotas, reservations, or snapshot cleanup.
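A first-pass space check, assuming the pool is named tank; zfs list -o space breaks "used" into snapshots, children, reservations, and the dataset itself:
cr0x@server:~$ zfs list -o space -r tank | head -n 15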
Task 3: Inspect feature flags and whether they’re active
cr0x@server:~$ zpool get all tank | grep feature@
tank  feature@async_destroy       enabled   local
tank  feature@embedded_data       active    local
tank  feature@spacemap_histogram  active    local
What it means: Feature state lives in pool properties: "enabled" means the pool supports the feature but hasn't used it yet; "active" means it's in use on disk and (for read-incompatible features) required by any host that imports the pool. zdb -C shows the read-critical subset under features_for_read; zpool get shows the full picture.
Decision: If you’re planning a pool migration to an older appliance/host, “active” features are your compatibility blockers. Don’t learn this mid-migration.
Task 4: Validate vdev labels and GUIDs (disk identity sanity)
cr0x@server:~$ sudo zdb -l /dev/disk/by-id/ata-SAMSUNG_SSD_2
------------------------------------
LABEL 0
------------------------------------
version: 5000
name: 'tank'
state: 0
txg: 902341
pool_guid: 12345678901234567890
vdev_guid: 2222
top_guid: 9876543210987654321
What it means: This reads the on-disk ZFS label. It’s how you prove “this disk belongs to this pool,” independent of Linux naming.
Decision: When someone swapped cables or a controller renumbered drives, this prevents replacing the wrong disk. If a disk label points to a different pool, stop and investigate before doing anything destructive.
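A label-survey sketch, assuming Linux by-id paths and an ata-* naming pattern (adjust the glob to your hardware); it prints which pool and which GUIDs each disk claims, while the redirect hides label errors from non-member disks:
cr0x@server:~$ for d in /dev/disk/by-id/ata-*; do echo "== $d"; sudo zdb -l "$d" 2>/dev/null | grep -E "name:|_guid"; done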
Task 5: Check metaslab fragmentation and allocation health
cr0x@server:~$ sudo zdb -mm tank
Metaslab statistics for pool 'tank':
vdev 0: raidz
metaslabs: 512
free space: 5.75T
fragmentation: 62%
largest free segment: 128M
What it means: Fragmentation here is allocator fragmentation, not “file fragmentation.” High fragmentation with small largest-free-segment means allocations get expensive, especially for large blocks.
Decision: If fragmentation is high and writes are slow, you either (a) free space (delete + zpool trim if SSDs, depending on setup), (b) add vdevs to increase free space and reduce allocator pressure, or (c) plan a rewrite/migration. Don’t try to “defrag” ZFS; it doesn’t work like that.
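For the routine version of the same question, zpool itself exposes capacity and fragmentation as pool properties, assuming the pool is named tank:
cr0x@server:~$ zpool list -o name,size,allocated,free,capacity,fragmentation tank
If capacity is high and fragmentation keeps climbing, zdb -mm tells you how bad the largest free segments really are.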
Task 6: Inspect the dedup table (DDT) and its memory footprint (is dedup hurting?)
cr0x@server:~$ sudo zdb -DD tank
DDT-sha256-zap-duplicate: 1248 entries, size 1.10M on disk, 2.40M in core
DDT-sha256-zap-unique: 98122 entries, size 86.0M on disk, 210M in core
What it means: This is dedup table information. “In core” hints at memory pressure when dedup is enabled and actively used.
Decision: If dedup is enabled and your ARC/memory is tight, expect performance to degrade under load. If dedup isn’t mission-critical, plan to migrate away from it rather than hoping it behaves.
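A cheaper cross-check, assuming the pool is named tank: zpool status -D prints the DDT histogram without a separate zdb run.
cr0x@server:~$ zpool status -D tank | tail -n 25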
Task 7: Identify what object number corresponds to a file path (targeted forensics)
cr0x@server:~$ sudo zdb -dd tank/home 2>/dev/null | head -n 20
Dataset tank/home [ZPL], ID 54, cr_txg 120, 310G used, 7.55T available
Object  lvl  type
     1    1  ZFS plain file
     2    2  ZFS directory
     3    3  ZFS plain file
What it means: Dumping full object listings is expensive; even sampling shows the object model. On many systems, you’ll use more targeted approaches (like finding an inode first) rather than dumping everything.
Decision: If you’re investigating a known file, don’t brute-force. Use inode/object mapping (next task) so you can inspect only what matters.
Task 8: Map a Linux inode to a ZFS object and inspect its blocks
cr0x@server:~$ stat -c 'inode=%i path=%n' /tank/home/app/logs/badfile.log
inode=914227 path=/tank/home/app/logs/badfile.log
cr0x@server:~$ sudo zdb -dddd tank/home 914227 | sed -n '1,25p'
Object lvl iblk dblk dsize dnsize bonus type
914227 1 128K 16K 512K 512 320 ZFS plain file
path /app/logs/badfile.log
gen 7241
size 501760
parent 912000
Indirect blocks:
0 L1 0:10000:20000 20000L/10000P F=1 B=81234/81234 cksum=on
What it means: For many platforms, ZFS object numbers correspond to inode numbers. zdb -dddd prints the dnode and block tree. The DVA-ish fields show where blocks live; checksum settings and levels show the structure.
Decision: If zpool status -v references a file and you can map to an object, you can determine whether it’s a small isolated object or part of broader metadata damage. If it’s isolated and you have redundancy, scrub/resilver is likely to fix; if not, restore that file from snapshot/backup.
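A restore sketch for the isolated-damage case, assuming the dataset has snapshots and using a hypothetical snapshot named daily-2025-12-25; the .zfs/snapshot directory gives read-only access to old file versions:
cr0x@server:~$ ls /tank/home/.zfs/snapshot/
cr0x@server:~$ cp -a /tank/home/.zfs/snapshot/daily-2025-12-25/app/logs/badfile.log /tank/home/app/logs/badfile.log.restored
Verify the restored copy before swapping it into place.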
Task 9: Check pool-wide space accounting with a full block traversal (why "used" doesn't add up)
cr0x@server:~$ sudo zdb -b tank | sed -n '1,40p'
Traversing all blocks...
blocks = 14723918
leaked = 0
compressed = 1.82T, uncompressed = 2.61T, ratio = 1.43x
bp logical = 2.61T, bp physical = 1.82T
What it means: This is a full block traversal. It can be expensive. “leaked=0” is good: no unreachable-but-allocated blocks detected by the traversal.
Decision: Use this when you suspect leaks or severe accounting inconsistencies and you can afford the scan. If it’s a busy pool, schedule this off-peak or run on a replica.
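If you must run the traversal on a live Linux box, one hedged sketch is to deprioritize it and log the output, assuming nice and ionice are available and the pool is named tank:
cr0x@server:~$ sudo nice -n 19 ionice -c 3 zdb -b tank > /var/tmp/zdb-b-tank.log 2>&1 &
ionice is advisory and depends on the I/O scheduler; the traversal still competes for disk time, so off-peak remains the real mitigation.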
Task 10: Understand compression effectiveness per dataset
cr0x@server:~$ zfs get compression,compressratio,logicalused,used tank/vm
NAME     PROPERTY       VALUE  SOURCE
tank/vm  compression    lz4    local
tank/vm  compressratio  1.03x  -
tank/vm  logicalused    2.40T  -
tank/vm  used           2.33T  -
What it means: Compression is enabled but barely doing anything: 2.40T of logical data stores as 2.33T. For VM images, that's common depending on guest filesystem and data entropy. (The pool-wide equivalent is the bp logical/physical line from the block traversal in Task 9.)
Decision: Don’t disable compression reflexively; lz4 is cheap and can still help metadata and zero blocks. But stop expecting compression to “save” space on already-compressed data. Plan capacity like an adult.
Task 11: Inspect a specific block pointer (deep corruption work)
cr0x@server:~$ sudo zdb -R tank 0:10000:20000
Block 0:10000:20000
size=131072L/65536P compression=lz4
birth=81234 fill=1
cksum=sha256 2f3a...9c1d
DVA[0]=<0:10000:20000>
What it means: You’re looking at a block pointer by DVA. This is “surgery,” usually used when correlating reported checksum errors to physical locations.
Decision: If block pointer inspection indicates a single DVA on a single device repeatedly failing, that supports the case for device replacement or controller path investigation. If multiple DVAs are impacted, you might be looking at broader corruption or systemic I/O issues.
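To correlate, pull the pool's error summary and the device's own counters, assuming smartmontools is installed and the suspect disk is the one from Task 4:
cr0x@server:~$ zpool status -v tank | sed -n '/errors:/,$p'
cr0x@server:~$ sudo smartctl -a /dev/disk/by-id/ata-SAMSUNG_SSD_2 | grep -iE 'reallocat|pending|uncorrect|crc'
CRC errors point at cables and controllers; reallocated or pending sectors point at the disk itself.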
Task 12: Print ZIL and intent log hints (sync write pain)
cr0x@server:~$ sudo zdb -iv tank | sed -n '1,60p'
ZFS_DBGMSG(zil): zil_claim: txg 902340 replayed 0 blocks
ZFS_DBGMSG(zil): zil_itxg_clean: cleaned up log blocks
What it means: On some systems, zdb can emit ZIL-related information. This helps confirm whether log replay happened and whether the pool believes it’s clean.
Decision: If an app complains about missing recent sync writes after a crash, confirm whether ZIL replay occurred. If it didn’t, you’re investigating import semantics and whether the pool was imported read-only or with recovery flags.
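Two cheap checks before you go deeper, assuming the pool is named tank: confirm how the pool is currently imported and what the most recent pool-level commands were.
cr0x@server:~$ zpool get readonly tank
cr0x@server:~$ zpool history tank | tail -n 5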
Task 13: Spot a too-small ashift after the fact (performance and wear)
cr0x@server:~$ sudo zdb -C tank | grep -n 'ashift'
18: ashift: 9
What it means: ashift is the power-of-two exponent of the sector size ZFS aligns to: ashift 9 means 512B sectors, ashift 12 means 4K. On modern SSDs and many HDDs, 4K alignment (ashift 12) is the sane default.
Decision: If ashift is too small, you can’t change it in place. Plan a migration or rebuild. If performance is currently acceptable, you still plan it—because wear amplification is a slow-burn incident.
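To see what the drives actually report, assuming Linux and the by-id path from the earlier tasks (lsblk follows the symlink):
cr0x@server:~$ lsblk -o NAME,PHY-SEC,LOG-SEC,MODEL /dev/disk/by-id/ata-SAMSUNG_SSD_1
Beware drives that advertise 512-byte logical sectors over 4K physical geometry; that is exactly the case where ashift 9 hurts.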
Task 14: Check whether you’re paying the “special vdev” tax or enjoying the benefit
cr0x@server:~$ sudo zdb -C tank | grep -n -B1 -A2 "special"
74-            children[1]:
75:                type: 'special'
76-                id: 1
77-                guid: 3333
What it means: A special vdev can store metadata (and optionally small blocks). It can be a performance win, or a disaster if undersized or not redundant.
Decision: If you have a special vdev, you treat it like tier-0 storage with redundancy and monitoring. If it dies and it’s not redundant, your pool can become unusable. This is not a “nice-to-have” device; it is structural.
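To see how the special vdev is actually doing, assuming the pool is named tank: per-vdev capacity from zpool list, and the redundancy shape from zpool status.
cr0x@server:~$ zpool list -v tank
cr0x@server:~$ zpool status tank | grep -A 4 special
A nearly full special vdev quietly spills new metadata and small blocks back to the main vdevs, so watch its free space the way you watch the pool's.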
Task 15: Confirm what ZFS thinks the pool’s uberblock history looks like (import paranoia)
cr0x@server:~$ sudo zdb -u tank | head -n 20
Uberblock[0]
magic = 0000000000bab10c
version = 5000
txg = 902341
guid_sum = 2222222222222222
timestamp = 2025-12-26 12:41:03
Uberblock[1]
txg = 902340
timestamp = 2025-12-26 12:40:52
What it means: Uberblocks are the “checkpoints” used for pool import. Seeing multiple recent uberblocks with monotonic txg progression is reassuring.
Decision: If the latest uberblock is far behind expected or timestamps look wrong, you suspect incomplete writes, controller caching issues, or a disk not actually persisting data. That changes your recovery plan: you stop trusting “it should be there.”
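A hedged hardware cross-check on Linux for SATA drives (hdparm must be installed; SAS and NVMe need different tools): ask the drive whether its volatile write cache is enabled.
cr0x@server:~$ sudo hdparm -W /dev/sda
If write caches are on without power-loss protection, acknowledged-but-lost writes after a power event stop being mysterious.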
Joke #2: zdb output is the closest thing storage engineers have to poetry—mostly because nobody else can read it.
Three corporate mini-stories from the land of “it seemed reasonable”
1) The incident caused by a wrong assumption: “If it imports, it’s fine”
A mid-sized company ran a ZFS-backed VM platform on a pair of storage servers. After a power event, one node came back, imported the pool, and looked “healthy enough.” The admin saw no degraded vdevs. The hypervisor cluster started launching VMs again. Everybody exhaled.
Two days later, a handful of VMs started logging filesystem errors. Then a database instance crashed with checksum complaints inside the guest. ZFS reported a growing list of checksum errors, but only during certain read patterns. The team assumed “bit rot,” kicked off a scrub, and went back to meetings.
The scrub didn’t really move. It crawled and occasionally stalled. Meanwhile, latency spikes hit the VMs during business hours. They replaced a drive that seemed “slow,” but errors persisted. That’s when someone finally ran zdb -u and zdb -l against the suspect devices.
The labels showed something ugly: one controller path intermittently presented stale data. The pool imported because the metadata was consistent enough, but the “latest” uberblocks weren’t as recent as the team assumed. zdb -u revealed a suspicious gap in txg progression after the power event. Combined with kernel logs, it pointed to a write cache/flush problem—storage acknowledged writes that never hit stable media.
The fix wasn’t a filesystem incantation. It was hardware and policy: replace the controller, validate cache settings, and run scrubs with I/O isolation. They restored a small set of affected VM disks from snapshots replicated to the other node. The lesson that stuck: import success is not integrity success. zdb didn’t “fix” anything; it proved the timeline was lying.
2) The optimization that backfired: dedup as a cost-saving miracle
A large internal platform team got a directive: “Reduce storage footprint.” They noticed many VM templates and container layers were duplicates. Someone proposed enabling ZFS dedup on the main pool. A short test looked promising: space usage dropped on a small dataset, and everyone high-fived quietly.
They enabled dedup more broadly. At first, it looked fine. Then Monday arrived. Latency increased, then increased again. The storage nodes began swapping under load. Writes slowed. Reads started queueing. It wasn’t catastrophic; it was worse. It was the slow-motion kind of outage where everyone keeps adding dashboards until the dashboards start timing out too.
They ran zdb -DD and saw the DDT sizes and in-core estimates. It was not subtle. The pool was paying a metadata tax that didn’t fit in RAM, so every I/O became a scavenger hunt through cold metadata. ARC hit ratios dropped; the CPU was busy doing legitimate work that nobody wanted.
Disabling dedup doesn’t remove existing deduped blocks. They were now committed to the choice at the on-disk level. The exit plan became a migration plan: replicate datasets to a new pool with dedup off, validate, cut over, and eventually destroy the old pool. It took time, change management, and patience.
The backfired optimization wasn’t “dedup is bad.” The real failure mode was enabling it without modeling memory, workload, and operational exit cost. zdb made the cost visible. It didn’t make the decision reversible.
3) The boring but correct practice that saved the day: labels, spares, and rehearsed imports
An enterprise team ran several ZFS pools across multiple racks. Nothing fancy. They did three things consistently: (1) every disk had stable by-id paths tracked in inventory, (2) every pool had hot spares and clear replacement procedures, and (3) quarterly, they rehearsed pool import and recovery steps on a staging host using replicated snapshots.
One night a storage node failed hard. The hardware team replaced a backplane, and suddenly the OS enumerated disks differently. The junior admin on call did the correct boring thing: before replacing anything, they ran zdb -C on the pool and zdb -l on a few devices to verify membership and GUIDs.
They discovered two drives were present but behind a different path, and one “missing” drive was actually there—just renamed. Without that check, they could have accidentally offlined the wrong disk and pushed the pool into a degraded state during the import.
The pool imported cleanly. A scrub was scheduled. Performance stayed stable. Nobody wrote a postmortem because nothing broke. The real win was procedure: inventory + label verification + rehearsal. zdb was the quiet supporting actor that made “calmly correct” possible.
Common mistakes (symptom → root cause → fix)
1) Symptom: “zfs list says one thing, df says another”
Root cause: You’re mixing viewpoints: dataset logical space vs. pool physical space, snapshots, refreservation, and the fact that df reports filesystem view after properties like refquota.
Fix: Use zfs list -o space for first-pass, then confirm with zdb -Lbbbs pool for internal view. If a dataset has a refreservation or large snapshot usage, adjust quotas/reservations and implement snapshot retention policies.
2) Symptom: Pool is “ONLINE” but scrub finds checksum errors repeatedly
Root cause: A device/controller path returns wrong data intermittently, or you have latent sector errors that redundancy can sometimes but not always correct.
Fix: Replace the suspect disk or controller, not just “scrub harder.” Use zdb -l to confirm disk identity, and correlate error locations. Run a scrub after hardware remediation.
3) Symptom: Writes slow down dramatically when pool is ~80–90% full
Root cause: Allocator pressure and metaslab fragmentation; free space exists but not in useful segments.
Fix: Check metaslab stats via zdb -mm. Free space (delete data and snapshots) to bring utilization down, or add vdev capacity. Long term: don’t run pools that full if you care about latency.
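To see which snapshots are actually holding space before you delete anything, assuming the pool is named tank:
cr0x@server:~$ zfs list -t snapshot -o name,used,referenced -s used -r tank | tail -n 10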
4) Symptom: After enabling dedup, everything feels like it’s running through molasses
Root cause: DDT metadata doesn’t fit in RAM, causing constant cache misses and heavy random I/O.
Fix: Use zdb -DD to quantify. If already active, plan migration off dedup. If not yet enabled, don’t enable dedup without a memory model and an exit plan.
5) Symptom: Pool import fails on a different host “even though it’s the same disks”
Root cause: Feature flags required for read are not supported on the target host’s ZFS version.
Fix: Use zdb -C pool to see features_for_read, and zpool get all pool to see which features are enabled or active. Upgrade the target host or choose a compatibility strategy before you move disks.
6) Symptom: Small random writes are terrible, sync-heavy workload stalls
Root cause: SLOG absent/misconfigured, or device latency issues; also recordsize/volblocksize mismatch for workload.
Fix: Validate your intent log behavior and device latency. zdb can provide hints, but you’ll usually decide based on workload. For zvols, set volblocksize correctly at creation time; for filesystems, tune recordsize and app I/O patterns.
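A quick property check for this class of problem, assuming a zvol at tank/vm and a filesystem at tank/home as in the earlier tasks:
cr0x@server:~$ zfs get volblocksize tank/vm
cr0x@server:~$ zfs get recordsize,sync,logbias tank/home
volblocksize is fixed at creation; recordsize, sync, and logbias can be changed, but only newly written blocks pick up a new recordsize.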
7) Symptom: Replacing a disk doesn’t reduce errors; the “wrong disk” got pulled
Root cause: Reliance on OS device names like /dev/sdX rather than stable IDs and on-disk labels.
Fix: Always identify disks via /dev/disk/by-id and verify with zdb -l before offlining/replacing.
Checklists / step-by-step plan
Checklist A: Before you run zdb on production
- Write down the question you’re trying to answer (space? performance? corruption? import?).
- Capture baseline: zpool status, zpool get all pool, zfs get all dataset (or at least the relevant properties).
- Check load: if latency is already high, avoid block-traversal commands.
- Confirm platform/version: run zdb -? and verify flags exist as expected.
- Prefer targeted commands (-C, -d, -mm, -l) before full traversals.
- Log outputs to a ticket or incident doc. zdb output is evidence.
Checklist B: Space mystery (“pool is full but I don’t see the data”)
- Check snapshot usage and clones at the dataset level (high-level first).
- Use zdb -d pool to confirm internal dataset usage and find surprises (zvols, hidden datasets).
- If you suspect leaks/accounting mismatch and can afford it, run zdb -b pool off-peak.
- Decision point: if space is held by snapshots, fix retention and automate pruning; if it's fragmentation pressure, add capacity or migrate.
Checklist C: Corruption investigation (“checksum errors”)
- From zpool status -v, identify affected files/datasets if listed.
- Confirm redundancy state: degraded? faulted? how many errors?
- Verify the suspect disk identity with zdb -l before replacing anything.
- If you can map to a file, map inode to object and inspect via zdb -dddd dataset object.
- Decision point: if redundancy exists, scrub after hardware remediation; if not, restore affected objects from snapshot/backup.
- After: schedule a scrub and review cabling/controller logs. ZFS often tells you something is wrong; it doesn’t replace root-cause discipline.
Checklist D: Performance regression (“it was fast last week”)
- Confirm pool fullness and fragmentation via zdb -mm and normal pool stats.
- Check whether dedup is in play and what the DDT looks like (zdb -DD).
- Confirm ashift and vdev layout via zdb -C.
- Decision point: if fragmentation is high, lower utilization/add vdev; if dedup is pressuring memory, migrate; if vdev layout is wrong, plan rebuild.
FAQ
Is zdb safe to run on a live production pool?
Mostly, if you stick to configuration/label/metaslab summaries. Avoid full traversals (like block walking) during peak load. Treat it like a diagnostic that consumes I/O and CPU.
Will zdb repair corruption?
No. It diagnoses. Repairs come from redundancy (scrub/resilver) or restores from snapshots/backups. zdb helps you understand the scope and likely root cause.
Why does zdb output differ between servers?
ZFS implementations and versions differ. Flags and output formats can change. Always check zdb -? on the host you’re using and don’t assume a blog snippet matches your platform.
When should I use zdb instead of zpool/zfs?
When you need internal truth: feature flags, labels, metaslab fragmentation, DDT stats, uberblock history, or object/block details that high-level tools abstract away.
Can zdb help me find what’s holding space after I delete files?
Indirectly. zdb can validate dataset usage and sometimes reveal that snapshots/clones are retaining blocks. The “fix” is usually snapshot retention policy, not more zdb.
Does zdb help with “deleted file recovery”?
Sometimes for forensics, but operationally the correct answer is snapshots. If you don’t have snapshots, zdb may identify objects, but recovery is unreliable and time-consuming.
What’s the single most useful zdb command to memorize?
zdb -C pool. It tells you what the pool is: vdev layout, ashift, and feature requirements. It prevents dumb mistakes during replacements and migrations.
How do I know if dedup is hurting me?
If dedup is active and the DDT in-core needs exceed available memory, you’ll see cache misses and I/O amplification. zdb -DD pool provides DDT sizing hints that correlate strongly with pain.
Can zdb explain why my pool is slow even though it’s not full?
Yes, via metaslab fragmentation and allocation stats (zdb -mm). “Not full” isn’t the same as “easy to allocate.” Fragmentation can make a half-empty pool behave like a crowded parking lot.
Conclusion: what to do next time before you panic
zdb isn’t scary because it’s dangerous. It’s scary because it’s honest. It will happily show you that your pool is compatible with fewer hosts than you assumed, that your ashift was a bad life choice, or that your “space savings” plan is actually a metadata furnace.
Practical next steps that pay off:
- Practice on a non-critical pool: run zdb -C, zdb -d, zdb -mm, and zdb -l until the outputs feel familiar.
- Standardize disk identification using by-id paths and label checks before replacements. Make it policy, not heroism.
- Write down your “fast diagnosis” sequence and keep it in the on-call runbook. zdb is most effective when it’s answering a specific question.
- Budget time for the irreversible decisions: dedup, special vdevs, ashift, and vdev layout are architecture. zdb will show you the consequences; it can’t unmake them.
When the pool misbehaves, you don’t need more mystery. You need evidence. zdb is evidence—delivered in a dialect that rewards patience and punishes improvisation.