The worst storage outages aren’t loud. They’re quiet, polite, and full of empty listings:
zpool import shows nothing, management wants “just reboot it again,” and your
on-call brain starts bargaining with physics.
When ZFS won’t tell you the truth through the usual tools, zdb -C will. It reads pool
configuration straight from the disk labels—no cachefile, no wishful thinking, no “maybe it’s in
/etc/zfs.” This is the tool you use when you need facts, not vibes.
What zdb -C really does (and why you should care)
zdb is the ZFS debugger. Treat it like you treat fsck on filesystems that need it: powerful,
not routine, and not something you run casually on production during peak traffic because you got curious.
The -C flag tells zdb to dump the pool configuration it finds on disk.
Crucially, this configuration is not the same thing as “what this host currently thinks the pool looks like.”
It’s the config stored in ZFS labels on each vdev (disk, partition, or file-backed device), written by ZFS
itself. That makes zdb -C a reality check when:
- The pool won’t import or imports degraded and you don’t trust the GUI.
- You moved disks between machines and the cachefile is stale or missing.
- Device names changed (/dev/sda roulette), but GUIDs did not.
- You suspect a split-brain / multi-host situation and need on-disk evidence.
- You need to map “mystery disks” to vdevs without risking an import.
zpool import is higher-level and opinionated. It tries to assemble a pool and report a state.
zdb -C is lower-level and blunt. It prints what ZFS labels say, even if the system is confused.
Safety posture: how to not make a bad day worse
Use zdb -C for read-only inspection. It generally won’t write, but the act of importing a pool can.
If you’re in a forensic or recovery situation, prefer:
- zpool import -N to import without mounting datasets, reducing blast radius (a combined low-risk sketch follows this list).
- zpool import -o readonly=on if your platform supports it for your workflow.
- Disconnect any other host that might import the same pool (yes, even “it shouldn’t”).
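If you do end up importing mid-investigation, the least-risky combination on OpenZFS looks roughly like this sketch. The pool name tank comes from the examples below; readonly behavior varies slightly by platform and version.

# Import without mounting datasets, read-only, and without writing a cachefile
sudo zpool import -d /dev/disk/by-id -o readonly=on -o cachefile=none -N tank

# See what actually assembled before touching any dataset
sudo zpool status -v tank

# When finished, export cleanly so another host can import later
sudo zpool export tank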
Joke #1: ZFS doesn’t lose your pool. Humans just hide it in the device namespace and then act surprised.
Where the truth lives: ZFS labels, uberblocks, and config trees
ZFS stores pool metadata on every top-level vdev. Each vdev has multiple labels at fixed offsets (historically
four labels), containing a serialized config and other metadata. When you run zdb -C, it reads those
labels and prints the configuration tree ZFS will use to assemble the pool.
The config is a nested structure: pool → vdev tree → children → leaves. Leaves represent actual devices
(disks, partitions, files), each with a GUID. Top-level vdevs (mirror, raidz, draid) have their own GUIDs too.
That’s why GUIDs are your anchor when device paths change.
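If you want to see the raw labels rather than the assembled tree, zdb -l is the companion tool. A sketch using the device path from the examples below; on recent OpenZFS releases repeating -l increases verbosity and the exit status is scriptable, but the details vary by version:

# Print the config from the first valid label on one device
sudo zdb -l /dev/disk/by-id/wwn-0x5002538f4123abcd

# -lll prints every label rather than deduplicating identical ones
sudo zdb -lll /dev/disk/by-id/wwn-0x5002538f4123abcd

# Exit status on recent releases: 0 = valid label found, 2 = no valid labels (handy in loops)
sudo zdb -l /dev/sdb; echo "exit: $?"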
Key pieces you’ll see in zdb -C
- pool_guid: The identity of the pool. If this changes, you’re not looking at the same pool.
- vdev_guid: Identity of a vdev node in the tree.
- path and devid: OS-dependent, often stale across machines.
- txg: Transaction group. Higher means “newer” state.
- state: Whether labels think the device was healthy, offline, removed, etc.
- ashift: Sector alignment as a power of two. A wrong ashift isn’t fatal, but it’s forever.
- features: Feature flags that affect compatibility.
- hostid and hostname: Breadcrumbs for multi-host imports (a quick extraction one-liner follows this list).
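A quick way to boil the full dump down to just these anchors is a throwaway filter like the one below. It is not an official interface, and field names can shift between releases, so treat it as a sketch:

cr0x@server:~$ sudo zdb -C tank | egrep 'pool_guid|guid:|ashift|txg|hostid|hostname'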
Why zdb -C is different from cachefile and zpool.cache
Many administrators over-trust /etc/zfs/zpool.cache or whatever their distro uses. That cachefile is a
convenience. It can be missing, stale, or wrong after re-cabling, HBA swaps, VM migrations, or “someone cleaned
up /etc because it looked messy.”
The on-disk labels are not optional. If they’re intact, ZFS can rebuild the pool config. If they’re damaged,
ZFS gets opinionated in unhelpful ways—and you’re in recovery land.
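Treat the cachefile as disposable and the labels as the source of truth. Checking the cachefile, and regenerating it once the pool is healthy again, is cheap. A sketch, assuming the common Linux default path:

# Does the cachefile exist, and is it newer than your last topology change?
ls -l /etc/zfs/zpool.cache

# After a successful import, rewrite it from the live pool configuration
sudo zpool set cachefile=/etc/zfs/zpool.cache tank

# Or stop using one entirely during recovery work
sudo zpool set cachefile=none tank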
Interesting facts and historical context
- ZFS originated at Sun Microsystems and shipped in Solaris in the mid-2000s, built around end-to-end checksums and copy-on-write metadata.
- On-disk pool labels are redundant by design: multiple labels per vdev, plus multiple vdevs, because single points of metadata failure are for other filesystems.
- GUID-based identity was a deliberate choice to avoid device-name dependence. Linux renaming sda is not a surprise; it’s a Tuesday.
- The feature-flag era replaced monolithic “ZFS versions” so pools could evolve incrementally, but it also made cross-platform imports a negotiation.
- Hostid recording exists because multi-host imports can silently corrupt pools; ZFS tries to detect “this pool was last imported elsewhere.”
- zdb is intentionally sharp: it exposes internal structures (MOS, uberblocks, block pointers) and assumes you know what you’re doing.
- zpool.cache came later as a boot convenience, especially important for root-on-ZFS boot flows that can’t scan everything slowly every time.
- 4K sector alignment (ashift) became a mainstream pain point when “512e” and “4Kn” disks arrived; ZFS made it explicit rather than guessing.
- Modern OpenZFS spreads across platforms (Linux, FreeBSD, illumos) with the same basic on-disk format, but feature flags still gate compatibility.
Reading zdb -C output like you mean it
The output of zdb -C is dense because the truth is dense. Your job is to reduce it to answers:
“Which disks belong to which vdev?”, “Is this the newest txg?”, “Did this pool last import on another host?”,
and “What will ZFS try to assemble if I import?”
Patterns to look for
- Multiple configs disagreeing: If different disks show different txg values and slightly different trees, you may have missing devices, a partial write, or a split event.
- Paths that don’t exist: /dev/disk/by-id/... values might be old. That’s fine. GUIDs matter more.
- State and aux_state: A device can be OFFLINE by intent, or UNAVAIL because it’s gone.
- Feature flags: A pool can be perfectly healthy and still unimportable on an older system (see the sketch after this list).
- hostid/hostname: If it says the pool was last imported on a different host, believe it and investigate.
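To check the feature-flag angle concretely, compare what the labels require against what the running OpenZFS stack supports. A sketch; the exact feature lists differ per release:

# Features the pool says are required just to read it
sudo zdb -C tank | sed -n '/features_for_read:/,/hostid:/p'

# Features and pool versions this host's OpenZFS build understands
zpool upgrade -v

# Userland and kernel module versions (OpenZFS 0.8+)
zfs version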
A single quote you should tattoo on your runbooks
“Hope is not a strategy.” — a maxim long repeated in operations and reliability culture.
Practical tasks: commands, outputs, and decisions (12+)
These are the tasks I actually run when a pool is missing, degraded, or suspicious. Each task includes:
a command, what the output means, and the decision you make next.
Task 1: Confirm what ZFS thinks is importable
cr0x@server:~$ sudo zpool import
pool: tank
id: 5486843001296116957
state: ONLINE
action: The pool can be imported using its name or numeric identifier.
config:
tank ONLINE
mirror-0 ONLINE
sdb ONLINE
sdc ONLINE
Meaning: ZFS scanning found a pool called tank and can assemble it.
Decision: If the pool is listed, you’re likely dealing with mount issues, stale cache, or features;
proceed to zdb -C tank to confirm config and hostid, then consider zpool import -N for safer import.
Task 2: Dump on-disk config for a pool by name
cr0x@server:~$ sudo zdb -C tank
MOS Configuration:
pool_guid: 5486843001296116957
vdev_tree:
type: 'root'
id: 0
guid: 5486843001296116957
children[0]:
type: 'mirror'
id: 0
guid: 15320114977311041256
ashift: 12
children[0]:
type: 'disk'
id: 0
guid: 1183942225006373321
path: '/dev/disk/by-id/ata-SAMSUNG_SSD_870_QVO_1TB_S5R8...'
children[1]:
type: 'disk'
id: 1
guid: 10622090183881642744
path: '/dev/disk/by-id/ata-SAMSUNG_SSD_870_QVO_1TB_S5R8...'
features_for_read:
com.delphix:hole_birth
org.zfsonlinux:project_quota
hostid: 0x8d3f2a11
hostname: 'db-node-02'
Meaning: This is the authoritative tree ZFS recorded. Note ashift, feature flags, and last importer.
Decision: If hostname isn’t your current box, stop and verify you’re not about to dual-import the pool.
Task 3: Dump configs by scanning devices (no pool name required)
cr0x@server:~$ sudo zdb -C
zdb: can't open 'tank': no such pool
zdb: examining /dev/sdb ...
MOS Configuration:
pool_guid: 5486843001296116957
vdev_tree:
type: 'root'
children[0]:
type: 'mirror'
children[0]:
path: '/dev/sdb'
children[1]:
path: '/dev/sdc'
Meaning: Even if pool-name resolution fails, zdb can still find labels on block devices. On most OpenZFS builds you’ll want -e (examine a pool that isn’t imported), optionally with -p <dir> to point it at the right device directory.
Decision: Use this when zpool import shows nothing. It tells you whether labels exist at all.
Task 4: Verify disk identity mapping using by-id
cr0x@server:~$ ls -l /dev/disk/by-id/ | egrep 'SAMSUNG_SSD_870_QVO|wwn|scsi'
lrwxrwxrwx 1 root root 9 Dec 26 10:11 ata-SAMSUNG_SSD_870_QVO_1TB_S5R8... -> ../../sdb
lrwxrwxrwx 1 root root 9 Dec 26 10:11 ata-SAMSUNG_SSD_870_QVO_1TB_S5R8... -> ../../sdc
lrwxrwxrwx 1 root root 9 Dec 26 10:11 wwn-0x5002538f4123abcd -> ../../sdb
lrwxrwxrwx 1 root root 9 Dec 26 10:11 wwn-0x5002538f4fedcba -> ../../sdc
Meaning: Paths in zdb -C might be by-id; ensure they resolve to actual devices.
Decision: If the by-id paths don’t exist anymore, plan to import using -d /dev/disk/by-id and rely on GUIDs, not /dev/sdX.
Task 5: Check the cachefile that might be lying to you
cr0x@server:~$ sudo zpool get cachefile tank
NAME PROPERTY VALUE SOURCE
tank cachefile /etc/zfs/zpool.cache local
Meaning: The pool is configured to use a cachefile path.
Decision: If you moved disks to a new host or changed HBAs, consider temporarily importing with zpool import -o cachefile=none to avoid stale cache poisoning.
Task 6: Import without mounting to reduce risk
cr0x@server:~$ sudo zpool import -N tank
Meaning: Pool imported; datasets not mounted.
Decision: Run zpool status and zfs mount decisions intentionally. This is the safe stance when you suspect partial failure or feature mismatch.
Task 7: Inspect health and vdev error counters after import
cr0x@server:~$ sudo zpool status -v tank
pool: tank
state: DEGRADED
status: One or more devices could not be opened. Sufficient replicas exist for
the pool to continue functioning in a degraded state.
config:
NAME STATE READ WRITE CKSUM
tank DEGRADED 0 0 0
mirror-0 DEGRADED 0 0 0
wwn-0x5002538f4123abcd ONLINE 0 0 0
10622090183881642744 UNAVAIL 0 0 0 was /dev/sdc
errors: No known data errors
Meaning: One member is missing; ZFS remembers it by GUID and previous path.
Decision: If this is a mirror and you have the disk, find it by hardware inventory and reattach; if you don’t, plan a replace after verifying you’re not missing the wrong disk.
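If the missing member turns out to be physically present and healthy, reattaching is gentler than replacing. A sketch using the GUID from this example; the replacement device path is a placeholder, and most zpool subcommands accept the GUID shown in zpool status:

# Try to bring the known member back online, addressing it by GUID
sudo zpool online tank 10622090183881642744

# If the disk is genuinely gone, replace the GUID with a new device (placeholder path)
sudo zpool replace tank 10622090183881642744 /dev/disk/by-id/wwn-0xNEWDISK

# Watch the resilver; done means zero errors, not "status came back"
sudo zpool status -v tank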
Task 8: Match missing GUID from zpool status to zdb -C leaf GUID
cr0x@server:~$ sudo zdb -C tank | egrep "guid:|path:"
guid: 1183942225006373321
path: '/dev/disk/by-id/wwn-0x5002538f4123abcd'
guid: 10622090183881642744
path: '/dev/disk/by-id/wwn-0x5002538f4fedcba'
Meaning: You can map the missing GUID to its expected identity.
Decision: Use this to avoid the classic mistake: replacing the wrong disk because /dev/sdX changed.
Task 9: Check feature flags that may block import on this host
cr0x@server:~$ sudo zdb -C tank | sed -n '/features_for_read:/,/hostid:/p'
features_for_read:
com.delphix:hole_birth
com.delphix:embedded_data
org.zfsonlinux:project_quota
hostid: 0x8d3f2a11
Meaning: Features under features_for_read must be understood even to read the pool; write-side features only block read-write import, which is why a read-only import sometimes succeeds where a normal one fails.
Decision: If your system’s OpenZFS is older, upgrade ZFS packages/modules before attempting import. Don’t “force” imports across feature gaps unless you like betting your job.
Task 10: Check hostid mismatch (multi-host risk)
cr0x@server:~$ hostid
7f3a19c2
Meaning: Current hostid differs from the one recorded in labels.
Decision: Investigate whether another host still has access. If this pool was shared via SAS shelf, iSCSI LUN cloning, or VM snapshotting, stop and ensure single-writer semantics.
Task 11: Import from a specific device directory to avoid wrong matches
cr0x@server:~$ sudo zpool import -d /dev/disk/by-id -N tank
Meaning: Import only considers devices found under the given directory.
Decision: Use this when the system has lots of disks and you want to avoid accidentally importing the wrong pool (yes, that happens in labs and in “temporary” recovery VMs).
Task 12: Inspect labels directly (when you suspect label damage)
cr0x@server:~$ sudo zdb -l /dev/sdb
------------------------------------
LABEL 0
------------------------------------
version: 5000
name: 'tank'
state: 0
txg: 1483921
pool_guid: 5486843001296116957
guid: 1183942225006373321
Meaning: You can see whether the label exists, what txg it advertises, and which pool/vdev it belongs to.
Decision: If labels are missing on some vdevs (or show wildly different txg), treat the situation as partial metadata loss and proceed cautiously: attempt import with the most complete set of devices, and avoid writing until you understand the divergence.
Task 13: Compare txg across devices to find the newest consistent config
cr0x@server:~$ for d in /dev/sdb /dev/sdc; do echo "== $d =="; sudo zdb -l $d | egrep "LABEL|txg:|pool_guid:"; done
== /dev/sdb ==
LABEL 0
txg: 1483921
pool_guid: 5486843001296116957
LABEL 1
txg: 1483921
pool_guid: 5486843001296116957
== /dev/sdc ==
LABEL 0
txg: 1481200
pool_guid: 5486843001296116957
LABEL 1
txg: 1481200
pool_guid: 5486843001296116957
Meaning: /dev/sdb has newer txg than /dev/sdc.
Decision: This can indicate sdc was offline or lagging. Don’t detach/replace blindly; try to bring the old member online and resilver, but validate hardware health first.
Task 14: Find which physical slot a disk is in (so you pull the right one)
cr0x@server:~$ sudo udevadm info --query=all --name=/dev/sdb | egrep "ID_SERIAL=|ID_WWN=|ID_PATH="
E: ID_SERIAL=SAMSUNG_SSD_870_QVO_1TB_S5R8...
E: ID_WWN=0x5002538f4123abcd
E: ID_PATH=pci-0000:3b:00.0-scsi-0:0:10:0
Meaning: This correlates the Linux device to a stable WWN and a bus path.
Decision: Use this to match what zdb -C says to what your DC hands can touch. In a dense chassis, “pull disk sdb” is not a plan.
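If your platform exposes it, /dev/disk/by-path ties the same device to its controller and slot, which is often what the remote-hands ticket actually needs. A sketch; the naming depends on the HBA and driver, and smartctl requires the smartmontools package:

# Bus/slot view of the same device
ls -l /dev/disk/by-path/ | grep -w sdb

# Cross-check the serial number printed on the drive's physical label
sudo smartctl -i /dev/sdb | grep -i serial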
Task 15: Validate sector size and alignment expectations (ashift sanity)
cr0x@server:~$ sudo blockdev --getss /dev/sdb
512
Meaning: This is the logical sector size; physical may differ.
Decision: If zdb -C shows ashift: 12 (4K) but the device reports 512, that’s fine (512e). If you see ashift: 9 on modern disks, expect performance pain—don’t rebuild the pool just for that, but stop repeating the mistake on new pools.
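To see both sides of the 512e story, compare the logical and physical sector sizes against what the pool was built with. A sketch; not every device reports its physical size honestly:

# Logical vs physical sector size as the kernel sees them
sudo blockdev --getss /dev/sdb     # logical, often 512
sudo blockdev --getpbsz /dev/sdb   # physical, often 4096 on modern media

# What the pool was actually created with
sudo zdb -C tank | grep ashift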
Fast diagnosis playbook
When you have five minutes to stop the bleeding, you need an order of operations that avoids rabbit holes.
Here’s the sequence that finds the bottleneck quickly and keeps you from “fixing” the wrong thing.
1) Confirm visibility: do the disks exist and are they the right ones?
- Run lsblk and ls -l /dev/disk/by-id to confirm devices are present.
- If SAN/LUN: confirm multipath is stable before touching ZFS.
cr0x@server:~$ lsblk -o NAME,SIZE,MODEL,SERIAL,WWN,TYPE
NAME SIZE MODEL SERIAL WWN TYPE
sdb 931.5G Samsung SSD 870 S5R8... 0x5002538f4123abcd disk
sdc 931.5G Samsung SSD 870 S5R8... 0x5002538f4fedcba disk
Decision: If disks aren’t visible, stop. Fix cabling/HBA/multipath first. ZFS can’t import disks it can’t see.
2) Ask ZFS politely: what does zpool import see?
cr0x@server:~$ sudo zpool import -d /dev/disk/by-id
pool: tank
id: 5486843001296116957
state: ONLINE
action: The pool can be imported using its name or numeric identifier.
Decision: If it shows up, your problem is likely not “pool vanished.” It’s often mountpoints, keys, or hostid warnings.
3) If import scan is empty or suspicious, go to disk truth with zdb -C and zdb -l
cr0x@server:~$ sudo zdb -C | head -n 30
zdb: examining /dev/sdb ...
MOS Configuration:
pool_guid: 5486843001296116957
vdev_tree:
type: 'root'
children[0]:
type: 'mirror'
Decision: If zdb sees labels but zpool import doesn’t, suspect cachefile confusion, device access permissions, or feature mismatch.
4) Identify the bottleneck type (a consolidated triage sketch follows this list)
- Not visible: hardware/OS discovery.
- Visible but not importable: feature flags, hostid, missing vdevs, or corrupted labels.
- Importable but degraded: missing disks, bad paths, or intermittent I/O errors.
- Imported but slow: separate track—scrubs, resilvers, recordsize mismatch, or ashift regrets.
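The same order of operations, collapsed into a report-only script you can keep next to the runbook. A sketch under stated assumptions: the device directory is a placeholder, it expects to run as root, and it only inspects — it never imports anything:

#!/bin/sh
# triage-zpool.sh — report-only triage for a missing pool (sketch; run as root)
DEVDIR="/dev/disk/by-id"    # placeholder: where your stable device links live

echo "== 1. Block devices visible to the OS =="
lsblk -o NAME,SIZE,MODEL,SERIAL,WWN,TYPE

echo "== 2. What zpool import can assemble from ${DEVDIR} =="
zpool import -d "${DEVDIR}" || echo "no importable pools reported"

echo "== 3. Raw label check on whole disks (sda..sdz only; adjust for your naming) =="
for dev in /dev/sd?; do
    echo "-- ${dev} --"
    zdb -l "${dev}" | egrep "name:|pool_guid:|txg:|hostname:" || echo "no valid ZFS label"
done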
Common mistakes: symptoms → root cause → fix
1) “zpool import shows nothing” but disks are present
Symptoms: zpool import returns no pools; lsblk shows disks.
Root cause: ZFS modules not loaded, wrong device directory scanned, or permissions/udev timing after boot in initramfs context.
Fix: Load ZFS, scan the right place, and verify labels.
cr0x@server:~$ sudo modprobe zfs
cr0x@server:~$ sudo zpool import -d /dev/disk/by-id
cr0x@server:~$ sudo zdb -l /dev/sdb | head
2) “zpool import warns: last accessed by another system”
Symptoms: Import warning about another host; zdb -C shows different hostid/hostname.
Root cause: Pool was imported elsewhere recently (or a clone has identical labels). Dual access risk.
Fix: Confirm single-writer. Power off the other node or remove access paths. If you’re importing a clone intentionally, use proper procedures to avoid GUID collisions (clone handling differs by platform).
3) “It imports but shows devices as UNAVAIL with old /dev paths”
Symptoms: vdevs show was /dev/sdX; new host uses different naming.
Root cause: Device paths are not stable; ZFS stored a path that no longer resolves.
Fix: Use /dev/disk/by-id and let ZFS match by GUID; then clear/replace as needed.
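The mechanical version of that fix, assuming the pool is currently imported with stale paths (a sketch; the export briefly interrupts access to the datasets):

# Re-import so ZFS records stable by-id paths instead of /dev/sdX
sudo zpool export tank
sudo zpool import -d /dev/disk/by-id tank

# If a previously missing member is back and healthy, clear the old error state
sudo zpool clear tank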
4) “Feature flags prevent import after OS rollback”
Symptoms: Import fails with unsupported features; zdb -C shows features your stack doesn’t support.
Root cause: You upgraded ZFS, enabled new features, then booted older ZFS (common after rescue media boot).
Fix: Boot a system with equal/newer OpenZFS support. Don’t toggle features casually; treat them as schema migrations.
5) “One disk shows older txg and keeps flapping”
Symptoms: Mirror member frequently offlines; label txg lags; zpool status shows intermittent errors.
Root cause: Real I/O issues: bad cable, flaky HBA port, power issues, or a dying SSD.
Fix: Replace suspect parts. Don’t “clear” errors repeatedly and pretend it’s healed. Scrub after stabilizing hardware.
6) “Pool imported on wrong host because cachefile pointed to the wrong thing”
Symptoms: Unexpected pool name imports; pool doesn’t match expected vdevs.
Root cause: Stale cachefile, cloned VM images with identical zpool.cache, or multiple pools with similar names.
Fix: Import using -d /dev/disk/by-id, verify with zdb -C before mounting, and consider -o cachefile=none during recovery.
Three corporate-world mini-stories
Mini-story 1: An incident caused by a wrong assumption
A mid-sized SaaS shop migrated a storage shelf from one database node to another during a maintenance window.
The plan was simple: shut down node A, move the SAS cables, boot node B, import pool, done. The operator did
the physical work cleanly. The boot went fine. zpool import showed nothing.
The on-call assumed, loudly, that “ZFS must have lost the pool because the cachefile didn’t come over.”
They copied /etc/zfs/zpool.cache from node A’s last backup onto node B. Still nothing. Then they rebooted,
because of course they did.
The actual issue was more banal: node B’s HBA enumerated the shelf as a different set of device nodes,
but udev rules that created /dev/disk/by-id links were missing in the initramfs environment used during early boot.
ZFS wasn’t “losing” anything; it simply couldn’t see stable identifiers at the moment it tried to import.
Someone finally ran zdb -C against the raw block devices after boot, saw intact labels and the correct
pool_guid. They imported with zpool import -d /dev once the OS was fully up, then fixed the initramfs
so the by-id paths existed early. The wrong assumption wasn’t that cachefiles exist; it was believing they’re
authoritative. They’re not.
Mini-story 2: An optimization that backfired
A company with a large analytics cluster decided boot time was too slow. Their nodes had a zoo of disks:
OS SSDs, ephemeral NVMe, and a set of shared JBOD shelves. Someone “optimized” by pinning ZFS imports to a
narrow device directory and disabling broad scans. It shaved seconds off boot. Everybody clapped.
Months later, a shelf was replaced under warranty. Same model, same capacity, different WWNs. The udev links
under /dev/disk/by-id changed shape. The boot-time import script still pointed at a stale directory and,
worse, filtered by an outdated naming pattern.
The pool didn’t import automatically, but the system came up “healthy” otherwise. Monitoring was tied to
mounted datasets, so alerts came late. When the on-call tried manual import, it partially assembled the pool
using the disks it could find, leaving some vdevs missing. ZFS correctly refused a clean import.
zdb -C saved the day by making the mismatch obvious: the on-disk config listed vdev GUIDs that didn’t
exist in the filtered device set. The fix was to stop being clever: scan /dev/disk/by-id broadly, and alert
on “pool not imported” explicitly, not just “filesystem not mounted.” The optimization wasn’t evil; it just
assumed hardware identity would never change. Hardware loves proving you wrong.
Mini-story 3: A boring but correct practice that saved the day
A financial services team ran mirrored boot pools and a separate raidz pool for data. They also had a strict,
boring habit: every disk bay had a physical label matching the WWN, and every change ticket included a snippet
of zdb -C output archived with the pool_guid and vdev GUIDs.
One quarter, after a rushed data center move, a data pool came up degraded. Two disks showed as present, one
was missing. The junior tech on site insisted they had installed all drives. The OS showed a disk in the bay,
but ZFS didn’t see it as part of the pool.
The on-call used the archived zdb -C snapshot to identify the missing leaf GUID and expected WWN. Then they
compared it to udevadm info for the physically installed disk. It was the right size, wrong WWN: someone had
inserted a spare from a different shelf, same vendor, same label color. Close enough for humans. Not close
enough for ZFS.
They swapped in the correct drive, resilvered, and moved on. No heroics. No data loss. Just the kind of
procedural dullness that makes storage reliable. Joke #2: The most powerful storage feature is still “label your disks,” annoyingly absent from most marketing brochures.
Checklists / step-by-step plan
Checklist A: You can’t import the pool and you need answers first
- Confirm devices: lsblk, dmesg for link resets, and verify /dev/disk/by-id exists.
- Run sudo zpool import -d /dev/disk/by-id. If empty, proceed.
- Run sudo zdb -C (scan mode; add -e if the pool isn’t imported). Confirm you see the expected pool_guid.
- Run sudo zdb -l for each candidate disk and compare pool_guid and txg.
- Check features_for_read and confirm your platform supports them.
- Check hostid/hostname in labels and ensure no other host can import.
- Decide import mode: zpool import -N first, optionally with -o cachefile=none.
Checklist B: You found the pool, but devices are UNAVAIL
- Run zpool status -v and record missing GUIDs.
- Map GUIDs to paths using zdb -C poolname and locate the corresponding WWNs.
- Verify physical presence and bus path with udevadm info.
- Fix hardware path issues first (cables/HBA/backplane). Then attempt online/replace actions.
- After changes, scrub or at least monitor resilver completion and error counters.
Checklist C: You’re doing a planned migration (preventive medicine)
- Before shutdown, capture: zpool status -v, zdb -C pool, and device WWNs (a capture sketch follows this checklist).
- Ensure the target host has compatible OpenZFS feature support.
- Use stable device names (/dev/disk/by-id) on the target; avoid /dev/sdX in scripts.
- Import with -N first, validate, then mount intentionally.
- Only after a successful import, update the cachefile if you use one.
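That capture step is easy to script and attach to the change ticket. A sketch; the output directory and pool name are placeholders, and it should run as root:

# Snapshot the pool's identity before you power anything down
OUT="/root/migration-$(date +%Y%m%d)"
mkdir -p "$OUT"
zpool status -v tank                 > "$OUT/zpool-status.txt"
zdb -C tank                          > "$OUT/zdb-C.txt"
ls -l /dev/disk/by-id/               > "$OUT/by-id.txt"
lsblk -o NAME,SIZE,MODEL,SERIAL,WWN  > "$OUT/lsblk.txt"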
FAQ
1) Is zdb -C safe to run on production?
It’s read-oriented and typically safe, but “safe” depends on your operational context. If your storage is
already failing and every I/O triggers timeouts, even reading labels can add load. Use it deliberately.
2) Why does zdb -C show paths that don’t exist?
Because paths are hints, not identity. ZFS stores whatever path it last knew. The identity is the GUID, often
correlated with WWN/devid. Use by-id links and GUID mapping to reconcile.
3) What’s the difference between zdb -C and zpool import?
zpool import tries to assemble and present importable pools using OS discovery and heuristics.
zdb -C dumps the stored config from labels, even when assembly fails.
4) Can zdb -C help if the pool name is unknown?
Yes. Run zdb -C without a pool name to scan devices and print configs it finds. Pair with zdb -l
on specific devices for clarity.
5) What does txg tell me during recovery?
Higher txg generally means newer metadata. If some vdev labels advertise a much older txg, that device
likely missed writes (offline, failing, or disconnected). It’s a clue, not a verdict.
6) If I see a hostid mismatch, what should I do?
Assume the risk is real until proven otherwise. Verify the other host is powered off or has no access paths.
Then import cautiously (often with -N) and validate before mounting or writing.
7) How do feature flags in zdb -C relate to import failures?
Pools can require certain features to be understood. If your system doesn’t support a required feature, import
may fail outright. The fix is usually upgrading OpenZFS, not “forcing” anything.
8) Why does ZFS remember a missing disk by a number (GUID) instead of a device name?
Because device names are not stable across boots and hosts. GUIDs are stable identities written into labels.
This is one of ZFS’s best design choices—especially when you’re tired and it’s 03:00.
9) Can I use zdb -C to plan a safe disk replacement?
Yes. Use it to map the leaf GUID and expected by-id path/WWN, then match that to physical inventory via
udevadm info. It reduces the chance you replace the wrong disk in a mirror/raidz set.
10) What if labels are corrupted on one disk?
If redundancy exists (mirror/raidz) and other labels are intact, ZFS can usually import and reconstruct.
If multiple devices have damaged labels, stop improvising and treat it as data recovery: stabilize hardware,
clone disks if needed, and work from the most complete label set you can read.
Conclusion: practical next steps
zdb -C is how you stop guessing. It reads pool configuration straight from disk labels and gives you the
facts you need when imports fail, devices rename, or someone swears “nothing changed.”
Next steps you can do today, before the next outage:
- Add a runbook section: zpool import → zdb -C → zdb -l, with a decision tree.
- Standardize on /dev/disk/by-id for imports and monitoring, not /dev/sdX.
- Archive zdb -C output after major storage changes. It’s cheap insurance and excellent blame-free evidence.
- Train your team to map GUIDs to WWNs to physical slots. This is how you avoid “we replaced the wrong disk” incidents.