ZFS Corrupt Labels: Fixing Import Failures the Right Way

There are few storage failures as rude as a ZFS pool that used to work and now refuses to import. The server boots, the disks spin, and ZFS looks you in the eye and says: “no.” The error varies—corrupt labels, missing vdevs, bad GUIDs—but the result is the same: your applications are down and your stomach is performing its own resilver.

This piece is about getting that pool back the right way: with evidence, minimal extra damage, and a plan you can explain to your future self. We’ll treat “corrupt labels” as a symptom, not a diagnosis. Labels can be genuinely corrupted, or they can be “corrupt” because your device paths changed, your HBA lied, or you imported the wrong pool on the wrong host. ZFS is usually telling the truth. Humans are the unreliable component.

What ZFS labels are (and what “corrupt” really means)

ZFS doesn’t keep a single “superblock” in one precious location. It writes critical pool configuration and device identity data into four labels per vdev: two at the start of the disk (or partition) and two at the end. In those labels you’ll find things like:

  • Pool GUID and name
  • Vdev GUID and topology
  • Transaction group (TXG) pointers to “where the truth is now”
  • Uberblocks: a set of checkpoints that allow ZFS to find the most recent consistent state
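
If you want to see those fields yourself before anything is on fire, surveying label contents across candidate devices is cheap and strictly read-only. A minimal sketch, assuming Linux with OpenZFS, whole-disk vdevs, and SATA disks named ata-* under /dev/disk/by-id (adjust the glob to your hardware):

# Read-only survey: print pool name, pool GUID, and label TXG per device.
# zdb -l never writes; the ata-* glob and by-id layout are assumptions.
for dev in /dev/disk/by-id/ata-*; do
    case "$dev" in *-part*) continue ;; esac   # skip partition symlinks
    echo "== $dev"
    sudo zdb -l "$dev" 2>/dev/null | grep -E '^ +(name|pool_guid|txg):' | sort -u
done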

When ZFS says a label is corrupt, it can mean several different things:

  • The label checksum doesn’t verify (actual corruption).
  • The device doesn’t contain the expected pool/vdev GUIDs (wrong disk, stale disk, or device mapping changed).
  • Only some labels are readable (partial damage, end-of-disk issues, enclosure quirks).
  • The label is fine, but the rest of the metadata tree points to blocks that can’t be read (a deeper I/O problem that bubbles up as label/uberblock trouble).

One more nuance that people miss under stress: ZFS import is a read-heavy operation. Import can fail because reads are timing out, not because metadata is logically wrong. A path that “kind of works” under light load may collapse when ZFS scans labels, reads uberblocks, and tries to open every vdev. So when you see “corrupt label,” don’t immediately reach for destructive tools. First, prove what kind of “corrupt” you’re dealing with.

Interesting facts and historical context

  1. ZFS labels are redundant by design: four per vdev. Losing one is common; losing all four usually indicates a serious I/O or overwrite event.
  2. Early ZFS designs avoided single points of failure: the multi-label approach is part of the “no superblock” philosophy that shaped ZFS from the start.
  3. Pool identity is GUID-based, not name-based: the pool name is human-friendly; the GUID is what ZFS trusts.
  4. Device paths are not stable identifiers: “/dev/sda” has always been a bad idea for production pools. Persistent paths like by-id exist because the world is chaotic.
  5. Modern ZFS keeps multiple uberblocks: it can roll back to an earlier consistent state when the latest TXG is unreadable.
  6. Import behavior evolved: flags like -F (rewind) and features like checkpoints exist because real systems crash, and metadata needs safe fallback mechanisms.
  7. 4K sector reality hit hard: ashift mistakes (e.g., 512e vs 4Kn) have caused pools that “work” until you replace a disk and suddenly nothing lines up the way you assumed.
  8. HBAs and expanders can be liars: transient link resets and enclosure firmware bugs have historically masqueraded as corruption.
  9. ZFS is conservative on import: it would rather refuse to import than import a pool in a way that risks silent damage.

Import failures: the failure modes that matter

1) Real label corruption (overwrite or media damage)

This happens when something writes over the start/end of the device (wrong partitioning tool, misguided “wipefs”, misdirected dd, a RAID controller initializing metadata), or when the disk can’t reliably read those regions. In the overwrite case, it’s often sudden and total: ZFS can’t identify the vdev at all. In the media case, it can be intermittent: labels readable one boot, not the next.

2) Wrong device mapping (the classic)

Device names change. Controller order changes. Multipath is half-configured. Someone moved disks between shelves. The OS now presents your disks differently and ZFS can’t find what it expects. This looks like corruption because ZFS sees disks, but the labels it reads aren’t the ones belonging to the pool you’re trying to import.

3) Missing vdevs (single disk missing in a RAIDZ is not “optional”)

Mirrors can often limp with a missing member. RAIDZ vdevs are pickier: a missing disk can prevent import, depending on redundancy and which disk is gone. Also: if you built the pool with partitions and now a disk shows up as a whole device, ZFS may not match it.

4) I/O path failure masquerading as metadata failure

Import requires reads across the topology. A flaky SAS cable, marginal power, expander issues, or a dying HBA can turn into “corrupt label” errors because reads time out or return garbage. This is where you stop believing in software fixes and start looking at hardware like it owes you money.

5) Feature flag / version mismatch (less common, still real)

Importing a pool created on a newer OpenZFS into an older implementation can fail. That typically doesn’t present as “corrupt label,” but in messy environments the messages can be confusing. Keep it on the list, just not at the top.

6) You imported the wrong pool (yes, this happens)

Shared labs, reused disks, staging environments. The box sees multiple pools. You import the wrong one, then wonder why your expected datasets aren’t there. Or you export the wrong one and take production down in a way that makes everyone suddenly “available for a quick call.”

Joke #1: The fastest way to discover your asset inventory is incomplete is to try importing a ZFS pool during an outage.

Fast diagnosis playbook

This is the “don’t get lost” sequence. The goal is to find the bottleneck quickly: wrong devices, missing devices, or failing I/O.

First: confirm what ZFS sees without changing anything

  • List importable pools (zpool import).
  • Try import read-only if it’s visible (zpool import -o readonly=on).
  • Capture errors verbatim. Don’t summarize them from memory. Your memory is not a log.

Second: confirm device identity stability

  • Use persistent paths: /dev/disk/by-id (Linux) or stable device nodes on your OS.
  • Map serials to bays/enclosures (via lsblk, udevadm, smartctl).
  • Verify that each expected disk is present and the OS can read it without errors.

Third: check for I/O errors and timeouts

  • Look at kernel logs for resets and timeouts (dmesg, journalctl).
  • Run SMART quick checks and read error counters.
  • If reads are failing, stop “ZFS surgery” and fix the hardware path first.

Fourth: inspect labels directly

  • Use zdb -l against candidate devices to see pool/vdev GUIDs.
  • Compare what you find to what import expects.

Fifth: only then attempt rewind/recovery imports

  • Try read-only import first.
  • Use -F (rewind) cautiously and understand the rollback.
  • Use extreme flags only when you can explain the blast radius.

Practical tasks: commands, outputs, decisions

These are the field moves. Each task includes a command, a sample output, what it means, and the decision you make from it. Assume Linux with OpenZFS unless otherwise stated.

Task 1: List pools ZFS thinks exist

cr0x@server:~$ sudo zpool import
   pool: tank
     id: 10384722971711646021
  state: ONLINE
 action: The pool can be imported using its name or numeric identifier.
 config:

        tank                          ONLINE
          mirror-0                    ONLINE
            ata-ST8000VN004-2M2101_ZA1A2B3C  ONLINE
            ata-ST8000VN004-2M2101_ZD4E5F6G  ONLINE

   pool: prod
     id: 1692242210959834013
  state: FAULTED
 status: One or more devices contains corrupted data.
 action: The pool cannot be imported due to damaged devices or data.
   see: zpool(8)
 config:

        prod                          FAULTED  corrupted data
          raidz2-0                    FAULTED
            ata-WDC_WD120EMFZ-11A6JA0_9HGK1AAA  UNAVAIL  invalid label
            ata-WDC_WD120EMFZ-11A6JA0_9HGK2BBB  ONLINE
            ata-WDC_WD120EMFZ-11A6JA0_9HGK3CCC  ONLINE
            ata-WDC_WD120EMFZ-11A6JA0_9HGK4DDD  ONLINE
            ata-WDC_WD120EMFZ-11A6JA0_9HGK5EEE  ONLINE
            ata-WDC_WD120EMFZ-11A6JA0_9HGK6FFF  ONLINE

What it means: ZFS can see a pool named prod but one device is UNAVAIL invalid label. That’s either a missing disk, wrong device, or a label that can’t be read.

Decision: Do not run zpool import -f yet. First, verify the disk exists and is readable at the OS level, then inspect labels.

Task 2: Try a read-only import (safe first touch)

cr0x@server:~$ sudo zpool import -o readonly=on prod
cannot import 'prod': one or more devices is currently unavailable

What it means: Read-only doesn’t override missing vdev requirements. The pool still can’t assemble.

Decision: Identify the missing/unavailable device and whether it’s truly gone or just misidentified.

Task 3: Verify the OS sees the expected block devices

cr0x@server:~$ lsblk -o NAME,SIZE,MODEL,SERIAL,WWN,TYPE | sed -n '1,12p'
NAME   SIZE MODEL              SERIAL      WWN                TYPE
sda  10.9T WDC WD120EMFZ-11A6  9HGK2BBB    0x50014ee2b1a2b222  disk
sdb  10.9T WDC WD120EMFZ-11A6  9HGK3CCC    0x50014ee2b1a2b333  disk
sdc  10.9T WDC WD120EMFZ-11A6  9HGK4DDD    0x50014ee2b1a2b444  disk
sdd  10.9T WDC WD120EMFZ-11A6  9HGK5EEE    0x50014ee2b1a2b555  disk
sde  10.9T WDC WD120EMFZ-11A6  9HGK6FFF    0x50014ee2b1a2b666  disk

What it means: We expected six disks; only five are present. The “invalid label” device might actually be missing, not corrupted.

Decision: Check physical connectivity, enclosure slot, HBA visibility, and logs. Don’t rewrite labels for a disk the OS can’t even see.

Task 4: Check kernel logs for link resets and missing device events

cr0x@server:~$ sudo journalctl -k -b | egrep -i 'sd[a-z]|sas|ata|reset|timeout|I/O error' | tail -n 20
kernel: ata12: hard resetting link
kernel: ata12: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
kernel: sd 12:0:0:0: [sdf] tag#18 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
kernel: sd 12:0:0:0: [sdf] tag#18 CDB: Read(16) 88 00 00 00 00 00 00 00 00 10 00 00 00 08 00 00
kernel: blk_update_request: I/O error, dev sdf, sector 16 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
kernel: sd 12:0:0:0: [sdf] Synchronizing SCSI cache
kernel: sd 12:0:0:0: [sdf] Stopping disk

What it means: There was an sdf, but it timed out on reads near the start of the disk (sector 16 is label territory). That’s classic “label corruption” caused by I/O failure.

Decision: Treat this as hardware/path first. Reseat, swap cable, try another HBA port, check power. If it’s a single disk failure in RAIDZ2, plan to replace the disk and let ZFS heal—once you can import.

Task 5: Enumerate persistent device IDs to avoid /dev/sdX roulette

cr0x@server:~$ ls -l /dev/disk/by-id/ | egrep 'WDC_WD120EMFZ|ST8000' | head
lrwxrwxrwx 1 root root  9 Dec 26 10:12 ata-WDC_WD120EMFZ-11A6JA0_9HGK2BBB -> ../../sda
lrwxrwxrwx 1 root root  9 Dec 26 10:12 ata-WDC_WD120EMFZ-11A6JA0_9HGK3CCC -> ../../sdb
lrwxrwxrwx 1 root root  9 Dec 26 10:12 ata-WDC_WD120EMFZ-11A6JA0_9HGK4DDD -> ../../sdc
lrwxrwxrwx 1 root root  9 Dec 26 10:12 ata-WDC_WD120EMFZ-11A6JA0_9HGK5EEE -> ../../sdd
lrwxrwxrwx 1 root root  9 Dec 26 10:12 ata-WDC_WD120EMFZ-11A6JA0_9HGK6FFF -> ../../sde

What it means: These IDs survive reboots and device renumbering better than sdX names.

Decision: When importing or replacing devices, use these stable identifiers (or equivalents on your OS). If your pool was built with unstable paths, consider migrating configuration once you’re stable again.
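
If the pool was originally imported via sdX names, the boring migration is an export followed by a re-import that scans only stable paths, so the recorded vdev paths update. A minimal sketch, assuming the healthy pool tank from Task 1 and a maintenance window in which it can briefly go offline:

# Re-import using persistent by-id paths so the cached vdev paths stop
# depending on sdX enumeration order. The pool is unavailable during this.
sudo zpool export tank
sudo zpool import -d /dev/disk/by-id tank
sudo zpool status -P tank | grep /dev/   # confirm the recorded full paths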

Task 6: Inspect labels on a known-good disk

cr0x@server:~$ sudo zdb -l /dev/disk/by-id/ata-WDC_WD120EMFZ-11A6JA0_9HGK2BBB | sed -n '1,25p'
------------------------------------
LABEL 0
------------------------------------
    version: 5000
    name: 'prod'
    state: 0
    txg: 1984567
    pool_guid: 1692242210959834013
    vdev_guid: 15812007741234567890
    hostid: 1029384756
    hostname: 'fileserver-a'
    top_guid: 7321098765432109876
    guid_sum: 12345678901234567890

What it means: This disk clearly belongs to pool prod (pool GUID matches the import listing). Labels are readable.

Decision: Use this as a reference. Now check the “bad” disk (if visible) for label readability and whether it belongs to the same pool.

Task 7: Inspect labels on the suspect disk (if it appears at all)

cr0x@server:~$ sudo zdb -l /dev/sdf
failed to unpack label 0
failed to unpack label 1
failed to unpack label 2
failed to unpack label 3

What it means: All four labels are unreadable. That’s not “one flipped bit.” It’s usually unreadable regions, a controller translation issue, or the disk is returning nonsense.

Decision: If this is a redundant vdev (mirror/RAIDZ with enough parity), plan to replace sdf. If redundancy is insufficient, you’re now in data recovery territory and should avoid any writes to the device.

Task 8: Validate whether the disk is actually failing (SMART)

cr0x@server:~$ sudo smartctl -a /dev/sdf | egrep -i 'Reallocated|Pending|Uncorrect|Offline|CRC|SMART overall|Power_On_Hours'
SMART overall-health self-assessment test result: FAILED!
  5 Reallocated_Sector_Ct   0x0033   001   001   010    Pre-fail  Always       -       3120
197 Current_Pending_Sector  0x0012   001   001   000    Old_age   Always       -       128
198 Offline_Uncorrectable   0x0010   001   001   000    Old_age   Offline      -       128
199 UDMA_CRC_Error_Count    0x003e   200   199   000    Old_age   Always       -       12
Power_On_Hours             0x0032   062   062   000    Old_age   Always       -       31240

What it means: The drive is failing and the link has CRC errors too (could be cable/backplane). Labels being unreadable is consistent with this.

Decision: Replace the drive. Also consider swapping the cable/slot, because CRC errors are often path-related. Fix both or you’ll be back here soon.

Task 9: Attempt import with device scanning limited (reduce confusion)

cr0x@server:~$ sudo zpool import -d /dev/disk/by-id prod
cannot import 'prod': one or more devices is currently unavailable

What it means: Same result, but you’ve reduced the chance ZFS is matching against stale or unexpected device nodes elsewhere.

Decision: If the missing device is real and redundancy permits, attempt a degraded import, but only after confirming which vdev is missing and what redundancy you truly have. Note that -m specifically allows import with a missing log device; it does not bring back a missing data disk.

Task 10: Import the pool (degraded) when redundancy allows

cr0x@server:~$ sudo zpool import -d /dev/disk/by-id -o readonly=on prod
cannot import 'prod': I/O error
        Destroy and re-create the pool from
        a backup source.

What it means: Even degraded import isn’t possible because the pool needs data that can’t be read from remaining devices, or the I/O path is unhealthy beyond one disk.

Decision: Stop and reassess: you may have more than one failing disk, or your HBA/backplane is dropping devices. Go back to hardware and verify every remaining disk is stable and readable.

Task 11: Identify exactly which vdev GUID is missing using zdb (when import listing is vague)

cr0x@server:~$ sudo zdb -C -e prod | sed -n '1,80p'
MOS Configuration:
        pool_guid: 1692242210959834013
        pool_name: prod
        vdev_children: 1
        vdev_tree:
            type: 'root'
            id: 0
            guid: 7321098765432109876
            children[0]:
                type: 'raidz'
                id: 0
                guid: 882233445566778899
                nparity: 2
                children[0]:
                    type: 'disk'
                    id: 0
                    guid: 15812007741234567890
                    path: '/dev/disk/by-id/ata-WDC_WD120EMFZ-11A6JA0_9HGK2BBB'
                children[1]:
                    type: 'disk'
                    id: 1
                    guid: 15812007749876543210
                    path: '/dev/disk/by-id/ata-WDC_WD120EMFZ-11A6JA0_9HGK3CCC'
                children[2]:
                    type: 'disk'
                    id: 2
                    guid: 15812007740000111111
                    path: '/dev/disk/by-id/ata-WDC_WD120EMFZ-11A6JA0_9HGK1AAA'

What it means: ZFS expects a disk with serial 9HGK1AAA, which is currently absent or unreadable. You now have a concrete target for physical work.

Decision: Find that exact disk in the chassis/enclosure. If it’s present but not detected, troubleshoot path. If it’s gone or dead, replace it—then import and resilver.

Task 12: If the pool imported, verify status before doing anything else

cr0x@server:~$ sudo zpool status -v prod
  pool: prod
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Replace the device using 'zpool replace'.
  scan: scrub repaired 0B in 0 days 00:42:17 with 0 errors on Sun Dec 22 03:12:19 2025
config:

        NAME                                                   STATE     READ WRITE CKSUM
        prod                                                   DEGRADED     0     0     0
          raidz2-0                                             DEGRADED     0     0     0
            ata-WDC_WD120EMFZ-11A6JA0_9HGK1AAA                 UNAVAIL      0     0     0  cannot open
            ata-WDC_WD120EMFZ-11A6JA0_9HGK2BBB                 ONLINE       0     0     0
            ata-WDC_WD120EMFZ-11A6JA0_9HGK3CCC                 ONLINE       0     0     0
            ata-WDC_WD120EMFZ-11A6JA0_9HGK4DDD                 ONLINE       0     0     0
            ata-WDC_WD120EMFZ-11A6JA0_9HGK5EEE                 ONLINE       0     0     0
            ata-WDC_WD120EMFZ-11A6JA0_9HGK6FFF                 ONLINE       0     0     0

errors: No known data errors

What it means: The pool is up and degraded; it’s telling you exactly what to do next.

Decision: Replace the missing device and resilver. Do not scrub first “to be safe.” Replace first, then scrub after resilver if you need extra confidence.

Task 13: Replace the failed disk with a new one (same slot, new serial)

cr0x@server:~$ sudo zpool replace prod \
  /dev/disk/by-id/ata-WDC_WD120EMFZ-11A6JA0_9HGK1AAA \
  /dev/disk/by-id/ata-WDC_WD120EMFZ-11A6JA0_NEW9SER1AL
cr0x@server:~$ sudo zpool status prod
  pool: prod
 state: DEGRADED
  scan: resilver in progress since Fri Dec 26 10:44:01 2025
        1.23T scanned at 1.12G/s, 220G issued at 201M/s, 10.9T total
        220G resilvered, 1.94% done, 0 days 15:10:22 to go
config:

        NAME                                                   STATE     READ WRITE CKSUM
        prod                                                   DEGRADED     0     0     0
          raidz2-0                                             DEGRADED     0     0     0
            replacing-0                                        DEGRADED     0     0     0
              ata-WDC_WD120EMFZ-11A6JA0_9HGK1AAA               UNAVAIL      0     0     0  cannot open
              ata-WDC_WD120EMFZ-11A6JA0_NEW9SER1AL             ONLINE       0     0     0  (resilvering)
            ata-WDC_WD120EMFZ-11A6JA0_9HGK2BBB                 ONLINE       0     0     0
            ata-WDC_WD120EMFZ-11A6JA0_9HGK3CCC                 ONLINE       0     0     0
            ata-WDC_WD120EMFZ-11A6JA0_9HGK4DDD                 ONLINE       0     0     0
            ata-WDC_WD120EMFZ-11A6JA0_9HGK5EEE                 ONLINE       0     0     0
            ata-WDC_WD120EMFZ-11A6JA0_9HGK6FFF                 ONLINE       0     0     0

errors: No known data errors

What it means: Resilver is progressing. If it stalls, you likely have more path problems or another disk marginal under load.

Decision: Monitor resilver and system logs. If you see timeouts, pause the heroics and fix hardware stability.

Task 14: If import complains about “active pool” or host ownership, verify and decide

cr0x@server:~$ sudo zpool import prod
cannot import 'prod': pool may be in use from other system
use '-f' to import anyway

What it means: ZFS thinks the pool was last imported elsewhere (or didn’t export cleanly). This can be legit (shared storage) or stale (crash).

Decision: Before -f, confirm the other host is not actually using the pool. If you dual-import the same pool without clustering support, you’ll manufacture corruption in record time.

Task 15: Use rewind (-F) only after measuring the rollback

cr0x@server:~$ sudo zpool import -F -n prod
Would be able to return pool to state as of TXG 1984501
Would discard approximately 12 seconds of transactions

What it means: -n is your seatbelt. It shows what would be lost if you rewind. Here it’s trivial (seconds), which usually means a crash mid-TXG or unreadable newest uberblock.

Decision: If the rollback window is acceptable, proceed with zpool import -F prod. If it’s hours/days, stop and reassess; you may be importing the wrong devices or facing deeper damage.

Task 16: After import, verify mount behavior and avoid surprise writes

cr0x@server:~$ sudo zfs get -r -o name,property,value mountpoint,mounted prod | head -n 12
NAME          PROPERTY    VALUE
prod          mountpoint  /prod
prod          mounted     yes
prod/home     mountpoint  /home
prod/home     mounted     yes
prod/backups  mountpoint  /backups
prod/backups  mounted     no

What it means: Not everything mounts automatically, and that can be intentional. In recovery mode, it can also be a sign you imported readonly or with alternate mountpoint settings.

Decision: Confirm properties before turning services back on. Don’t let an application write into the wrong mountpoint because you panicked and did a temporary altroot yesterday.

Joke #2: “It worked in staging” is comforting until you remember staging is where data goes to die quietly.

Three corporate mini-stories from the trenches

Mini-story 1: The incident caused by a wrong assumption

The company had a storage cluster for internal artifacts—build caches, container layers, CI outputs. Not glamorous, but everything depended on it. One morning, after routine maintenance, the primary node rebooted and the pool wouldn’t import. The alert said “invalid label” on two disks. The on-call engineer assumed the disks were dead because the hardware dashboard showed a couple of amber lights. Easy story: replace disks, resilver, move on.

They swapped the first disk. The pool still wouldn’t import. They swapped the second. Now the pool not only wouldn’t import; it also wouldn’t assemble enough metadata to try a rewind. The graph of “confidence” looked like a ski slope.

The wrong assumption: the amber lights were not disk failure; they were link negotiation problems in a particular backplane. The disks were fine. The backplane was intermittently presenting the wrong SAS addresses, and the OS was renaming devices across boots. ZFS saw “different disks” with “wrong labels,” which looked like corruption but was really identity churn.

When they finally sat down and compared zdb -l outputs against serial numbers, they realized the “bad” disks had been replaced with good disks that had never been in the pool. ZFS wasn’t being stubborn; it was being correct. The recovery path was ugly but educational: reintroduce the original disks (now on a stable HBA path), import read-only, and replace one by one properly.

Afterward they changed two things. First, a rule: no replacing disks until you’ve proven the device identity and read path stability. Second, they stopped relying on sdX names in any runbook. “/dev/sdb” is a vibe, not an identifier.

Mini-story 2: The optimization that backfired

A different org decided that import time was too slow after outages. They had a large pool with many vdevs and a lot of devices behind expanders. Someone read about speeding up discovery by limiting device scanning paths and caching vdev mappings aggressively. They tuned startup scripts to import using a narrow set of device nodes and trimmed udev rules to “reduce noise.” It did make imports faster—until it didn’t.

Months later, a minor power event caused a subset of disks to come up slower than usual. The import script ran early, only scanned the “fast” paths, and concluded the pool had missing vdevs. Then it tried a forced import with recovery flags because “that’s what the script does when import fails.” The pool imported in a degraded state with a rewind. Services resumed. Everyone congratulated the automation.

The cost showed up later as weird application errors: missing recent data in a few datasets, some files reverting to older versions. The rewind had discarded a small but meaningful window of transactions. It wasn’t a huge loss, but it was the kind that makes auditors stare at you like you just admitted you enjoy surprise downtime.

The optimization backfired because it traded correctness for speed without guardrails. Import is not the time to be clever. ZFS is assembling a consistent state; your scripts shouldn’t be trying to “help” by skipping disks that are merely slow to appear.

They fixed it by making startup logic boring again: wait for udev settle, scan stable by-id paths, and refuse to use rewind flags automatically. Recovery flags became a human decision with a human reading the rollback estimate first.
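
In shape, the boring version looks something like this (a sketch, not their actual script; assumes a root boot-time context and current OpenZFS flags, where -a imports all discovered pools, -N skips mounting, and -d restricts scanning to the given directory):

# Boot-time import: wait for device discovery, then import conservatively.
# No -f, no -F: if import fails, stop and page a human.
udevadm settle --timeout=30
zpool import -d /dev/disk/by-id -aN || {
    echo "zpool import failed; refusing to escalate automatically" >&2
    exit 1
}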

Mini-story 3: The boring but correct practice that saved the day

A finance-adjacent company ran a ZFS pool backing a set of compliance archives. The workload was write-once, read-sometimes, with periodic verification scrubs. It was the kind of system that people forget exists until they really need it—which is exactly when you don’t want surprises.

They had one unsexy rule: every disk replacement required recording the disk’s serial, bay location, and the ZFS vdev GUID mapping in a ticket. No exceptions. They also had a standing quarterly exercise: simulate an import on a staging host with exported pools (read-only) to verify that their recovery procedure still matched reality.

One day a controller failed and was replaced under warranty. After replacement, device enumeration order changed and a couple of disks came up with different OS names. The pool import initially failed with “cannot open” and “invalid label” on what appeared to be random devices.

The on-call engineer didn’t guess. They pulled the last mapping from the ticket system, compared it with zdb -l outputs on the current host, and rebuilt the correct device list using by-id paths. The pool imported cleanly, no rewind required, no replacement disks ordered in a panic.

It wasn’t heroic. It was correct. Boring practices don’t get you applause, but they do get you sleep.

Common mistakes (symptoms → root cause → fix)

1) Symptom: “invalid label” on a disk after a reboot

Root cause: The disk is present but unreadable at the beginning/end (I/O errors), or the OS mapped a different disk to the expected path.

Fix: Check logs for I/O timeouts, confirm serials via lsblk/udevadm, inspect labels with zdb -l. Replace only after you’ve proven it’s the correct disk and it’s actually failing.

2) Symptom: Import shows “pool may be in use from other system”

Root cause: Pool not cleanly exported, or it really is imported elsewhere (shared JBOD, reused disks, split-brain risk).

Fix: Confirm the other host is down or has exported the pool. Only then use zpool import -f. If the pool might be active elsewhere, stop. Dual import is self-inflicted pain.

3) Symptom: Import succeeds only with -F rewind

Root cause: Newest uberblock/TXG is unreadable (often due to a failing disk or controller reset during write), or missing recent devices during import attempt.

Fix: Run zpool import -F -n first and read the rollback window. If it’s small, proceed. If it’s large, suspect wrong devices or broader corruption. Verify hardware stability before repeating imports.

4) Symptom: Pool imports but datasets mount incorrectly or not at all

Root cause: Imported with altroot, readonly, or a changed cachefile; or mountpoint properties were changed in the chaos.

Fix: Inspect zfs get mountpoint,mounted, confirm zpool get cachefile, and avoid “temporary fixes” that linger. Make mount behavior explicit before restarting services.
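
A quick way to make that explicit, assuming the pool from the examples is named prod (all properties below are standard zpool/zfs properties):

# Pool-level settings that commonly linger after a recovery import.
sudo zpool get readonly,altroot,cachefile prod
# Dataset mount behavior, checked before services are restarted.
sudo zfs get -r -o name,property,value mountpoint,canmount,mounted prod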

5) Symptom: Import hangs for a long time

Root cause: A disk is timing out reads; ZFS is waiting on slow I/O during label/metadata discovery.

Fix: Check kernel logs, run SMART, and look for a single device dragging the system. Fix the path or remove the failing device if redundancy allows.

6) Symptom: “corrupted data” immediately after someone “cleaned partitions”

Root cause: A tool overwrote the label areas (beginning/end) or changed partitioning offsets.

Fix: Stop writing to disks. Identify which disks still have intact labels via zdb -l. If enough labels survive and redundancy permits, import and replace. If not, you’re restoring from backup or doing specialized recovery.

7) Symptom: After HBA swap, half the disks show as different sizes or 512/4K mismatch

Root cause: Controller presents different logical sector size or translation (512e vs 4Kn), confusing assumptions around ashift and alignment.

Fix: Confirm lsblk -t and disk logical/physical sector sizes. Keep consistent HBAs/firmware where possible. Don’t rebuild topology blindly; validate first, then replace carefully.
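
To validate instead of assuming, compare what the kernel reports against what the pool was built with. A sketch, assuming Linux lsblk and the pool name prod:

# Logical vs physical sector size per disk (512/512, 512/4096, 4096/4096).
lsblk -o NAME,MODEL,LOG-SEC,PHY-SEC
# ashift recorded per top-level vdev in the pool configuration.
sudo zdb -C prod | grep -w ashift    # add -e if the pool is not imported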

Checklists / step-by-step plan

Checklist A: Before you touch anything (evidence collection)

  1. Capture zpool import output and save it somewhere safe.
  2. Capture lsblk -o NAME,SIZE,MODEL,SERIAL,WWN.
  3. Capture kernel log snippets around disk discovery and errors.
  4. If the pool is visible, try zpool import -o readonly=on (and record the result).
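
If you want that evidence captured in one motion before your hands start improvising, a minimal collection script looks roughly like this (run as root; the output directory is just an example):

# Collect read-only evidence into a timestamped directory before any changes.
ts=$(date +%Y%m%d-%H%M%S)
dir="/root/zfs-evidence-$ts"
mkdir -p "$dir"
zpool import                        > "$dir/zpool-import.txt" 2>&1
lsblk -o NAME,SIZE,MODEL,SERIAL,WWN > "$dir/lsblk.txt"        2>&1
journalctl -k -b                    > "$dir/kernel-log.txt"   2>&1
echo "evidence saved in $dir"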

Checklist B: Decide if this is hardware/path or metadata

  1. If logs show timeouts/resets/I/O errors: treat as hardware/path first.
  2. If disks are stable and readable but ZFS claims invalid labels: suspect wrong device mapping or overwritten labels.
  3. Use zdb -l on multiple disks to confirm pool GUID consistency.

Checklist C: Safe import workflow

  1. Prefer importing by stable paths (-d /dev/disk/by-id).
  2. Attempt read-only import.
  3. If “pool in use,” confirm the other host status before -f.
  4. If necessary, evaluate rewind with -F -n, then decide.
  5. Once imported, check zpool status and dataset mounts before starting applications.
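
The same workflow as commands, in order (a sketch; the pool name and the rewind decision are yours):

# 1. Scan only stable paths and try read-only first.
sudo zpool import -d /dev/disk/by-id -o readonly=on prod
# 2. If the error is "pool may be in use from other system", verify the
#    other host first, then: sudo zpool import -d /dev/disk/by-id -f prod
# 3. If the newest TXG is unreadable, measure the rollback before using it:
#    sudo zpool import -F -n prod
# 4. After import, verify before starting applications.
sudo zpool status -v prod
sudo zfs list -r -o name,mountpoint,mounted prod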

Checklist D: Recovery after import (make it healthy)

  1. Replace failed/unavailable devices using stable identifiers.
  2. Monitor resilver and logs; if errors appear, stop and fix the I/O path.
  3. After resilver completes, run a scrub during a controlled window.
  4. Document what happened: which disk, which slot, which GUID, what logs showed.

One quote worth keeping in your head

Paraphrased idea (attributed to W. Edwards Deming): Without data, you’re just another person with an opinion.

This is why you collect outputs first, then act. ZFS recovery is not a vibes-based sport.

FAQ

1) What exactly is a “ZFS label”?

A small region of metadata stored redundantly (four copies per vdev) at the start and end of each device. It describes pool identity, topology, and pointers needed to find the active state.

2) Does “corrupt label” always mean the disk is bad?

No. It can mean the disk is missing, the device path changed, the HBA is returning garbage, or the label areas were overwritten. Prove disk health and identity before replacing.

3) Can I repair labels manually?

In normal operations, you don’t “repair labels” by writing magic bytes. The correct pattern is: import if possible, then zpool replace the failing/missing device and let ZFS rebuild redundancy. Manual label rewriting is for specialists and usually follows “we already lost.”

4) Is zpool import -f dangerous?

It can be. If the pool is genuinely active on another host, forcing import risks simultaneous writes and real corruption. If the other host is dead and the pool wasn’t exported, -f is often the right move—after verification.

5) What does zpool import -F actually do?

It rewinds the pool to an earlier transaction group (TXG) that appears consistent and readable. You will lose the most recent transactions after that point. Always run with -n first to see the rollback estimate.

6) My pool imports, but it’s degraded. Should I scrub immediately?

No. Replace missing/failing devices first. Scrubbing a degraded pool increases read load and can push marginal disks over the edge. Resilver to restore redundancy, then scrub for confidence.

7) Why do imports sometimes hang?

Because ZFS is waiting for I/O. A single disk timing out reads can stall discovery. Check kernel logs and SMART; don’t keep retrying imports while the hardware is melting.

8) How do I avoid this next time?

Use stable device identifiers, document disk-to-slot mappings, scrub on schedule, monitor link resets/CRC errors, and test your recovery procedure when you’re not panicking.

9) If one disk’s labels are unreadable, can ZFS still import?

Depends on topology and redundancy. Mirrors are forgiving. RAIDZ depends on parity and what’s missing. If ZFS can’t assemble the vdev tree or meet redundancy requirements, it will refuse import.

10) Should I ever use zpool import -D (destroyed pools) in label-corruption cases?

Only when you have strong evidence the pool was accidentally destroyed/cleared and you’re doing recovery intentionally. It’s not a first-line tool for “invalid label” and it can complicate the situation if used casually.

Conclusion: what to do next time

When ZFS says “corrupt labels,” it’s not inviting you to start random repair attempts. It’s asking you to do what operations people do best: reduce uncertainty. Check what ZFS sees, verify device identity, look for I/O path failures, and inspect labels with zdb before you change anything.

Practical next steps:

  • Standardize on persistent device naming for pools (by-id/WWN or your platform’s equivalent).
  • Add a runbook step that collects zpool import, lsblk, and kernel logs before any replacements.
  • Monitor for CRC/link resets and treat them as precursors, not trivia.
  • Require human approval for rewind imports (-F) and always run -n first.
  • Keep a simple mapping of disk serial → bay → vdev GUID. It’s dull, which is why it works.