ZFS Boot Pool Recovery: Getting a System Back After a Bad Update

You reboot after an update and the machine answers with a blinking cursor, a GRUB prompt, or the classic “cannot import pool.”
Production is down. Your pager is loud. Your coffee is useless.

This is where ZFS is either your best friend or the colleague who says “works on my laptop” and disappears. The good news:
ZFS boot pool recovery is usually deterministic if you stop guessing and start collecting evidence in the right order.

A working mental model: what “boot pool” really means

ZFS systems that “boot from ZFS” often have two distinct storage personalities:
a small boot pool (commonly bpool) holding kernel, initramfs, and bootloader-friendly bits; and a root pool
(commonly rpool) holding everything else. Some platforms skip the split and boot directly from the root pool.
Some platforms use UEFI and store EFI artifacts in a FAT ESP, not ZFS at all. You need to identify which universe you’re in.

A bad update can break boot in several different ways:

  • Bootloader can’t find its config or can’t read ZFS.
  • Kernel boots but initramfs can’t import the root pool.
  • ZFS modules don’t match the kernel.
  • Pool features were enabled on one box and now an older environment can’t import.
  • A mirror member died and the firmware/boot order picks the wrong disk.

Your job is not to “fix ZFS.” Your job is to answer one question quickly:
where in the boot chain does it fail? Bootloader stage, kernel stage, initramfs stage, or userspace stage.
Different tools; different fixes.

Here’s the chain in plain terms:
firmware → bootloader (GRUB/systemd-boot) → kernel + initramfs → ZFS import → mount root → switch_root → services.
When you treat it as a chain, you stop randomly reinstalling GRUB like it’s holy water.

Joke #1: GRUB isn’t malicious; it’s just an enthusiast for interpretive error messages.

Fast diagnosis playbook (check these first)

First: identify the failure stage

  • Stuck before GRUB menu: firmware/ESP/boot order, or bootloader missing.
  • GRUB menu appears, kernel selection works, then drops to initramfs shell: ZFS import or initramfs/modules issue.
  • Kernel boots, then panic: often missing root dataset, wrong root=, or incompatible ZFS module.
  • Boots to single-user/emergency: pool imported but mountpoints/datasets/services or encryption keys failing.

Second: don’t mutate anything until you can read the pools

  • Boot a live environment that has ZFS tooling matching (or close to) your on-disk version.
  • Import pools read-only first. If it imports, you have options. If it doesn’t, you need to understand why.

Third: decide your recovery strategy

  • Roll back to a known-good boot environment/snapshot if you have one. This is fastest.
  • Repair bootloader if the OS is intact but the boot entry is broken.
  • Rebuild initramfs if ZFS modules aren’t available early in boot.
  • Fix pool health (degraded mirror, wrong disk boot) if the machine is booting from the “wrong” member.

Fourth: preserve evidence

Grab the last boot logs from the failed system (if accessible) and the exact versions of kernel + ZFS packages.
On systems with multiple boot environments, record which one was selected. Recovery is easier when you can prove what changed.
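
A minimal evidence-capture sketch, assuming the root and var/log datasets end up mounted under /mnt (see the tasks below) and the journal is persistent; swap dpkg for your distro's package tool:

cr0x@server:~$ sudo journalctl -D /mnt/var/log/journal --list-boots | tail -n 3
cr0x@server:~$ sudo journalctl -D /mnt/var/log/journal -b -1 -p err --no-pager | tail -n 20
cr0x@server:~$ sudo chroot /mnt dpkg -l 'zfs*' 'linux-image*' | tee /tmp/evidence-packages.txt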

Interesting facts and context (because history bites)

  1. ZFS was built with end-to-end integrity as a first-class feature, which is why “silent corruption” is taken personally by the stack.
  2. The “boot pool” split is largely a bootloader constraint: historically GRUB had limited ZFS feature support, so keeping /boot simple reduced drama.
  3. Feature flags were introduced to avoid version-number lock-in. Great idea—until you enable a feature and try to import on an older rescue image.
  4. Illumos and BSDs have long treated ZFS as a native citizen, while Linux often treats it as an external module; that difference matters during initramfs and kernel updates.
  5. ZFS “zpool.cache” became a quiet dependency for many boot flows; if it’s stale or missing, imports can become scavenger hunts.
  6. UEFI changed the bootloader failure modes: you can have a perfect ZFS pool and still fail because the EFI System Partition entry went missing.
  7. Boot environments popularized “upgrade without fear” workflows in Solaris/illumos land; Linux folks are rediscovering the joy via snapshots and dataset cloning.
  8. The ZFS on Linux project merged into OpenZFS, which reduced fragmentation but didn’t eliminate the “kernel vs module” coordination problem.

One quote that belongs on every on-call wall:
“Hope is not a strategy.” — General Gordon R. Sullivan

Practical recovery tasks (commands, outputs, decisions)

Assumptions: you booted a live ISO or rescue environment that includes ZFS tools (zpool, zfs) and basic disk utilities.
Replace pool/dataset names as needed. Commands below are designed to be runnable and to produce meaningful evidence.

Task 1: Identify disks, partitions, and what changed

cr0x@server:~$ lsblk -o NAME,SIZE,TYPE,FSTYPE,PARTTYPE,PARTLABEL,MOUNTPOINTS
NAME        SIZE TYPE FSTYPE PARTTYPE                             PARTLABEL MOUNTPOINTS
nvme0n1   476.9G disk
├─nvme0n1p1   1G part vfat   c12a7328-f81f-11d2-ba4b-00a0c93ec93b EFI
├─nvme0n1p2   4G part zfs_member
└─nvme0n1p3 471G part zfs_member
nvme1n1   476.9G disk
├─nvme1n1p1   1G part vfat   c12a7328-f81f-11d2-ba4b-00a0c93ec93b EFI
├─nvme1n1p2   4G part zfs_member
└─nvme1n1p3 471G part zfs_member

What it means: you see whether you’re dealing with an ESP (vfat) plus ZFS members, and whether mirrors exist.
Decision: if ESP exists, you must consider UEFI boot entries. If no ESP and you’re BIOS, GRUB likely lives in the MBR area.
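
A quick check of how the rescue environment itself booted (a strong hint, though the installed system's boot mode is ultimately decided by its own firmware entries and on-disk layout):

cr0x@server:~$ [ -d /sys/firmware/efi ] && echo "UEFI" || echo "legacy BIOS"
UEFI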

Task 2: Check if the pools are visible and why they may not import

cr0x@server:~$ sudo zpool import
   pool: bpool
     id: 1234567890123456789
  state: ONLINE
 action: The pool can be imported using its name or numeric identifier.
 config:

        bpool        ONLINE
          mirror-0   ONLINE
            nvme0n1p2  ONLINE
            nvme1n1p2  ONLINE

   pool: rpool
     id: 9876543210987654321
  state: ONLINE
 action: The pool can be imported using its name or numeric identifier.
 config:

        rpool        ONLINE
          mirror-0   ONLINE
            nvme0n1p3  ONLINE
            nvme1n1p3  ONLINE

What it means: the live environment can see both pools; good sign. If you instead see “cannot import” with feature flags,
your rescue image’s OpenZFS is likely too old.
Decision: if pools appear, import read-only first. If not, upgrade the rescue environment or use a matching distro ISO.

Task 3: Import pools read-only and without mounting (safe inspection)

cr0x@server:~$ sudo zpool import -N -o readonly=on rpool
cr0x@server:~$ sudo zpool import -N -o readonly=on bpool

What it means: pools are imported but datasets aren’t mounted; you reduce accidental writes.
Decision: if import fails with “pool is busy,” something auto-imported; check zpool status and export/re-import deliberately.
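
If something did auto-import the pool read-write, a deliberate export and read-only re-import looks like this:

cr0x@server:~$ sudo zpool export rpool
cr0x@server:~$ sudo zpool import -N -o readonly=on rpool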

Task 4: Confirm pool health and spot “wrong disk boot” scenarios

cr0x@server:~$ sudo zpool status -v rpool
  pool: rpool
 state: ONLINE
  scan: scrub repaired 0B in 00:10:12 with 0 errors on Tue Dec 24 03:10:12 2025
config:

        NAME           STATE     READ WRITE CKSUM
        rpool          ONLINE       0     0     0
          mirror-0     ONLINE       0     0     0
            nvme0n1p3  ONLINE       0     0     0
            nvme1n1p3  ONLINE       0     0     0

errors: No known data errors

What it means: the root pool is healthy. If you see DEGRADED or one device UNAVAIL,
boot may be attempting the dead path.
Decision: if mirrored, ensure bootloader/EFI artifacts exist on both disks, not just the “primary.”

Task 5: List datasets and find the boot-critical ones

cr0x@server:~$ sudo zfs list -o name,mountpoint,canmount,readonly -r rpool | head -n 12
NAME                         MOUNTPOINT  CANMOUNT  READONLY
rpool                        none        off       on
rpool/ROOT                    none        off       on
rpool/ROOT/ubuntu_1           /           noauto    on
rpool/home                    /home       on        on
rpool/var                     /var        on        on
rpool/var/log                 /var/log    on        on

What it means: you can see the root dataset (often under rpool/ROOT/*) and whether it is set to mount.
Decision: identify the intended boot environment dataset. If multiple exist, you may roll back by selecting a different one.

Task 6: Identify the active boot environment (or what should be active)

cr0x@server:~$ sudo zpool get -H bootfs rpool
rpool  bootfs  rpool/ROOT/ubuntu_1  local

What it means: bootfs points to the dataset expected to be mounted as /.
Decision: if bootfs is wrong or points to a missing dataset, set it to a valid one and rebuild boot artifacts.
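
If bootfs needs correcting, a minimal sketch once the pool is imported read-write (Task 7); the dataset name is this article's example:

cr0x@server:~$ sudo zpool set bootfs=rpool/ROOT/ubuntu_1 rpool
cr0x@server:~$ sudo zpool get -H bootfs rpool
rpool  bootfs  rpool/ROOT/ubuntu_1  local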

Task 7: Mount the system in a controlled way (for repair)

cr0x@server:~$ sudo zpool export rpool
cr0x@server:~$ sudo zpool import -N -R /mnt rpool
cr0x@server:~$ sudo zfs mount rpool/ROOT/ubuntu_1
cr0x@server:~$ sudo mount --rbind /dev /mnt/dev
cr0x@server:~$ sudo mount --rbind /proc /mnt/proc
cr0x@server:~$ sudo mount --rbind /sys /mnt/sys
cr0x@server:~$ mount | grep /mnt | head -n 4
rpool/ROOT/ubuntu_1 on /mnt type zfs (rw,relatime,xattr,noacl)
udev on /mnt/dev type devtmpfs (rw,nosuid,relatime,size=16343656k,nr_inodes=4085914,mode=755)
proc on /mnt/proc type proc (rw,nosuid,nodev,noexec,relatime)
sysfs on /mnt/sys type sysfs (rw,nosuid,nodev,noexec,relatime)

What it means: a read-only import can't simply be flipped to read-write in place, so you export and re-import with an altroot (-R /mnt);
the root dataset then mounts under /mnt, and the rbind mounts give chroot repairs working /dev, /proc, and /sys.
Decision: if the root dataset mounts but key paths are empty, you may have mounted the wrong dataset (common with multiple boot environments).

Task 8: Check kernel and ZFS module alignment (classic bad update failure)

cr0x@server:~$ sudo chroot /mnt /bin/bash -lc 'ls /lib/modules; dpkg -l | grep -E "zfs|linux-image" | head'
5.15.0-92-generic
ii  linux-image-5.15.0-92-generic   5.15.0-92.102  amd64  Signed kernel image generic
ii  zfsutils-linux                  2.1.5-1ubuntu6 amd64  command-line tools to manage ZFS filesystems
ii  zfs-dkms                        2.1.5-1ubuntu6 amd64  OpenZFS kernel modules for Linux

What it means: /lib/modules shows which kernels the installed system has module trees for (uname -r inside a chroot
reports the rescue kernel, not the target's), and dpkg shows whether DKMS ZFS is installed.
Decision: if the kernel updated but ZFS modules didn’t build (or were purged), initramfs won’t import the pool. Rebuild modules/initramfs.

Task 9: Rebuild initramfs and ensure ZFS hooks are present

cr0x@server:~$ sudo chroot /mnt /bin/bash -lc 'update-initramfs -u -k all'
update-initramfs: Generating /boot/initrd.img-5.15.0-92-generic
W: Possible missing firmware /lib/firmware/i915/tgl_dmc_ver2_12.bin for module i915

What it means: initramfs generation succeeded; a firmware warning is usually unrelated to ZFS boot.
Decision: if you see errors about zfs hooks or missing zpool, fix package state (install zfs-initramfs where applicable).
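
If the hooks are missing, a Debian/Ubuntu-flavored sketch from inside the chroot (package names, and the need for working networking in the chroot, are assumptions; other distros package this differently):

cr0x@server:~$ sudo chroot /mnt /bin/bash -lc 'apt-get install --reinstall zfs-initramfs zfs-dkms'
cr0x@server:~$ sudo chroot /mnt /bin/bash -lc 'update-initramfs -u -k all && lsinitramfs /boot/initrd.img-5.15.0-92-generic | grep -c zfs'

A non-zero count from lsinitramfs means ZFS binaries and modules actually made it into the image.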

Task 10: Verify that the kernel command line points to the right root dataset

cr0x@server:~$ sudo chroot /mnt /bin/bash -lc 'grep -R "root=ZFS=" -n /boot/grub/grub.cfg | head -n 3'
/boot/grub/grub.cfg:126:        linux   /BOOT/ubuntu_1@/vmlinuz-5.15.0-92-generic root=ZFS=rpool/ROOT/ubuntu_1 ro quiet

What it means: GRUB is configured to mount the expected ZFS dataset as root.
Decision: if it points to a dataset that no longer exists (renamed, rolled back, cloned), regenerate GRUB config and set bootfs appropriately.

Task 11: Regenerate GRUB config and reinstall the bootloader (BIOS case)

cr0x@server:~$ sudo chroot /mnt /bin/bash -lc 'grub-mkconfig -o /boot/grub/grub.cfg'
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-5.15.0-92-generic
Found initrd image: /boot/initrd.img-5.15.0-92-generic
done
cr0x@server:~$ sudo chroot /mnt /bin/bash -lc 'grub-install /dev/nvme0n1'
Installing for i386-pc platform.
Installation finished. No error reported.

What it means: config includes kernels; GRUB installed to disk. On mirrored boots, repeat for the other disk.
Decision: if you’re actually UEFI, don’t do BIOS installs. Instead mount the ESP and use the UEFI pathway below.

Task 12: Repair UEFI boot artifacts and entries (UEFI case)

cr0x@server:~$ sudo mount /dev/nvme0n1p1 /mnt/boot/efi
cr0x@server:~$ sudo chroot /mnt /bin/bash -lc 'grub-install --target=x86_64-efi --efi-directory=/boot/efi --bootloader-id=ubuntu --recheck'
Installing for x86_64-efi platform.
Installation finished. No error reported.
cr0x@server:~$ sudo chroot /mnt /bin/bash -lc 'efibootmgr -v | head -n 5'
BootCurrent: 0003
Timeout: 1 seconds
BootOrder: 0003,0001,0002
Boot0003* ubuntu  HD(1,GPT,2b3c...,0x800,0x32000)/File(\EFI\ubuntu\grubx64.efi)

What it means: GRUB is installed in the ESP and a boot entry exists.
Decision: on mirrored systems, also mount and populate the second disk’s ESP. Firmware boot order can flip unexpectedly after failures.
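
One way to populate the second disk's ESP, assuming nvme1n1p1 is the mirror partner's ESP and rsync exists in the chroot (some distros ship their own ESP-sync tooling; the "ubuntu (disk2)" label is illustrative):

cr0x@server:~$ sudo mkdir -p /mnt/boot/efi2
cr0x@server:~$ sudo mount /dev/nvme1n1p1 /mnt/boot/efi2
cr0x@server:~$ sudo chroot /mnt /bin/bash -lc 'rsync -a --delete /boot/efi/EFI/ /boot/efi2/EFI/'
cr0x@server:~$ sudo chroot /mnt /bin/bash -lc 'efibootmgr -c -d /dev/nvme1n1 -p 1 -L "ubuntu (disk2)" -l "\EFI\ubuntu\grubx64.efi"'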

Task 13: Verify /boot contents live on the expected pool/dataset

cr0x@server:~$ sudo zfs list -o name,mountpoint -r bpool
NAME                 MOUNTPOINT
bpool                /boot
bpool/BOOT           /boot
bpool/BOOT/ubuntu_1  /boot
cr0x@server:~$ sudo ls -lh /mnt/boot | head
total 192M
-rw-r--r-- 1 root root  26M Dec 24 03:02 initrd.img-5.15.0-92-generic
-rw------- 1 root root  12M Dec 24 03:02 System.map-5.15.0-92-generic
-rw------- 1 root root 8.3M Dec 24 03:02 vmlinuz-5.15.0-92-generic

What it means: confirms whether /boot is ZFS-mounted and has the kernel/initrd you expect.
Decision: if /boot is empty or not mounted, you’re repairing the wrong filesystem. Mount bpool datasets correctly and rerun initramfs/GRUB.
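
If /mnt/boot is empty, a sketch consistent with the altroot approach from Task 7 (dataset names follow this article's example layout):

cr0x@server:~$ sudo zpool export bpool
cr0x@server:~$ sudo zpool import -N -R /mnt bpool
cr0x@server:~$ sudo zfs mount bpool/BOOT/ubuntu_1
cr0x@server:~$ ls /mnt/boot | head -n 3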

Task 14: Confirm ZFS import behavior in early boot (cachefile and hostid)

cr0x@server:~$ sudo chroot /mnt /bin/bash -lc 'ls -l /etc/hostid; zpool get -H cachefile rpool'
-rw-r--r-- 1 root root 4 Dec 22 10:10 /etc/hostid
rpool  cachefile  /etc/zfs/zpool.cache  local

What it means: hostid exists (helps prevent accidental imports on the wrong host) and cachefile is set.
Decision: if cachefile is “-” or points nowhere, set it and regenerate it from a known-good import to make initramfs life easier.

Task 15: Fix a broken zpool.cache (common after chroot work or image cloning)

cr0x@server:~$ sudo chroot /mnt /bin/bash -lc 'zpool set cachefile=/etc/zfs/zpool.cache rpool; zpool set cachefile=/etc/zfs/zpool.cache bpool; ls -l /etc/zfs/zpool.cache'
-rw-r--r-- 1 root root 2430 Dec 26 18:02 /etc/zfs/zpool.cache

What it means: cachefile exists and contains device paths/guids for import.
Decision: rebuild initramfs after changing this so early boot sees the updated cache.

Task 16: Check for feature flag incompatibility (why a rescue ISO can’t import)

cr0x@server:~$ sudo zpool import -o readonly=on rpool
cannot import 'rpool': unsupported version or feature(s)
  Pool 'rpool' uses the following feature(s) not supported by this system:
    com.delphix:spacemap_histogram
    org.zfsonlinux:project_quota

What it means: your rescue environment’s ZFS is too old to understand the pool’s enabled features.
Decision: stop. Don’t “force” anything. Boot a newer rescue image, or install newer OpenZFS in the live environment.
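
Two read-only checks that tell you how far behind the rescue tooling is (no pool changes involved; the grep pattern matches the features from the error above):

cr0x@server:~$ zfs version
cr0x@server:~$ zpool upgrade -v | grep -E 'spacemap_histogram|project_quota'

zfs version prints the userland and kernel-module versions; zpool upgrade -v lists every feature this build understands, so an empty grep result means the live environment's tooling predates those features.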

Task 17: If GRUB drops to rescue mode, find /boot and load normal mode

grub rescue> ls
(hd0) (hd0,gpt1) (hd0,gpt2) (hd0,gpt3) (hd1) (hd1,gpt1) (hd1,gpt2) (hd1,gpt3)
grub rescue> ls (hd0,gpt2)/
@/ BOOT/ grub/
grub rescue> set prefix=(hd0,gpt2)/grub
grub rescue> insmod normal
grub rescue> normal

What it means: you manually pointed GRUB to the right filesystem and loaded the normal module to reach the menu.
Decision: once booted (or chrooted), reinstall GRUB properly; manual rescue is a one-time parachute, not a lifestyle.

Task 18: Roll back a broken update using snapshots (when you have them)

cr0x@server:~$ sudo zfs list -t snapshot -o name,creation -r rpool/ROOT/ubuntu_1 | tail -n 2
rpool/ROOT/ubuntu_1@pre-upgrade-2025-12-24  Tue Dec 24 02:55 2025
rpool/ROOT/ubuntu_1@post-upgrade-2025-12-24 Tue Dec 24 03:05 2025
cr0x@server:~$ sudo zfs rollback -r rpool/ROOT/ubuntu_1@pre-upgrade-2025-12-24

What it means: you reverted filesystem state to a known point. This is often the fastest way out of a bad package set.
Decision: after rollback, regenerate initramfs/GRUB to align boot artifacts with the restored state, then reboot.

Joke #2: If your recovery plan is “reboot again,” congratulations—you’ve implemented chaos engineering without the paperwork.

Three corporate mini-stories from the trenches

Incident #1: The outage caused by a wrong assumption

A mid-size company ran a fleet of Linux hosts with mirrored NVMe and ZFS root. The team had recently standardized on UEFI.
Their mental model: “Mirrored disks means mirrored boot.” They were half right, which is the most dangerous kind of right.

A routine update included a GRUB reinstall and a kernel bump. The change worked on staging and on most production nodes.
One node rebooted and fell straight into firmware. No GRUB menu. No disk found. The on-call assumed ZFS import issues.
They spent an hour in initramfs fantasies while the machine was never reaching initramfs at all.

The real issue: only one disk’s ESP had been mounted during the update cycle, so only one disk got refreshed EFI artifacts.
The node had a prior, quiet NVMe hiccup and the firmware flipped boot order to the other drive—whose ESP was stale.
Mirrored ZFS data didn’t matter; firmware can’t boot what isn’t there.

The fix was embarrassingly simple: mount both ESPs, install GRUB to both, and ensure a recurring job kept them in sync.
The lasting improvement was cultural: during incidents, they started calling the failure stage out loud.
“We are failing before GRUB” became a sentence people were allowed to say without being judged.

Incident #2: The optimization that backfired

Another org wanted faster boot and less complexity. Someone proposed: “Why keep a separate boot pool?
Let’s put /boot on the root pool and enable newer ZFS features everywhere. One pool to rule them all.”
It was pitched as simplification. It was actually coupling.

For months it worked. Then a kernel update coincided with a rescue scenario: a node failed to boot due to unrelated hardware.
The team grabbed a generic rescue ISO—older, but “good enough.” The ISO couldn’t import the pool due to unsupported features.
They were locked out of their own data by a tool-version mismatch they’d optimized into existence.

They eventually recovered by finding a newer live environment and importing read-only. But the time-to-recovery ballooned.
The postmortem wasn’t about blame; it was about assumptions.
“We’ll always have a new enough rescue image” turned out to be a fairy tale that doesn’t survive long weekends.

Afterward, they restored the split boot pool pattern and treated feature flag enablement like a production change
requiring an explicit compatibility plan. Sometimes “old constraints” exist because the world is messy, not because people were lazy.

Incident #3: The boring practice that saved the day

A financial services team ran ZFS root with strict change controls. They did one unfashionable thing consistently:
before any upgrade window, they created snapshots of the root dataset and recorded the intended bootfs in the ticket.
No heroics. Just discipline.

An update introduced a mismatch between kernel and ZFS modules. The node rebooted to initramfs and refused to import root.
The on-call did not start reinstalling random packages. They imported pools read-only from a rescue environment,
verified the snapshots existed, and rolled back to the pre-upgrade snapshot.

Then they regenerated initramfs and GRUB, rebooted, and the system came back on the older, known-good kernel.
The incident was short and deeply uninteresting—exactly what you want when money is involved.

The follow-up work was equally boring: they adjusted the upgrade pipeline to validate DKMS build success before reboot.
It’s not glamorous, but it turns “bad update” into “minor delay.”

Common mistakes: symptom → root cause → fix

1) Symptom: GRUB prompt or “grub rescue>” after update

Root cause: GRUB can’t find its prefix/modules or the disk/partition numbering changed.

Fix: From rescue, locate the partition containing /boot/grub, set prefix, load normal, boot once, then reinstall GRUB properly to all boot disks.

2) Symptom: Kernel loads, then drops to initramfs with “cannot import rpool”

Root cause: initramfs missing ZFS modules/tools, broken zpool.cache, or wrong root dataset specified.

Fix: Chroot, install/verify ZFS initramfs integration, rebuild initramfs, ensure root=ZFS=... points to a real dataset, regenerate GRUB.

3) Symptom: Live ISO can’t import pool, complains about unsupported features

Root cause: Rescue environment’s OpenZFS is older than pool feature set.

Fix: Use a newer rescue image or install newer ZFS packages in the live environment. Do not force import in a way that risks corruption.

4) Symptom: Boots only when one specific disk is present

Root cause: ESP/GRUB installed on one disk only; firmware boot order flips to the other.

Fix: Install bootloader to both disks and keep ESPs synchronized; confirm UEFI entries and fallback paths.

5) Symptom: ZFS imports in rescue, but system fails later with mount errors

Root cause: wrong dataset properties (mountpoint, canmount), or bootfs points to the wrong dataset/clone.

Fix: Check zfs get mountpoint,canmount, correct dataset selection, set bootfs, then update boot config.

6) Symptom: Panic mentioning ZFS symbols or module loading errors

Root cause: kernel update without matching ZFS module build/installation (DKMS failure, package partial upgrade).

Fix: Boot an older kernel from GRUB if available; otherwise chroot, reinstall zfs-dkms/zfsutils, rebuild modules and initramfs.

7) Symptom: Pool refuses to import (or demands -f) with “pool was previously in use from another system” (hostid mismatch)

Root cause: cloned image reused hostid, or /etc/hostid changed unexpectedly.

Fix: Confirm /etc/hostid in the installed system, correct it if needed, and regenerate initramfs so early boot uses the right identity.
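
A sketch for the Debian/Ubuntu-style chroot from Task 7; zgenhostid ships with OpenZFS and writes /etc/hostid:

cr0x@server:~$ sudo chroot /mnt /bin/bash -lc 'rm -f /etc/hostid && zgenhostid && hostid'
cr0x@server:~$ sudo chroot /mnt /bin/bash -lc 'update-initramfs -u -k all'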

8) Symptom: “No such device” for vmlinuz/initrd in GRUB

Root cause: /boot not mounted at update time; initramfs and kernels written somewhere else, or bpool dataset mismatch.

Fix: Mount bpool properly, rerun update-initramfs and grub-mkconfig; verify file presence in /boot.

Checklists / step-by-step plan

Recovery checklist (do this in order, stop when the system boots)

  1. Identify boot mode: UEFI with ESP vs BIOS. Use lsblk to spot vfat ESP partitions.
  2. Confirm failure stage: before GRUB, in GRUB, in initramfs, or later.
  3. Boot a compatible rescue environment with ZFS tools close to your pool’s feature set.
  4. Run zpool import and read the error. Don’t freestyle.
  5. Import pools read-only with -N to inspect safely.
  6. Check zpool status for degraded mirrors and missing members.
  7. Find root dataset and bootfs (zfs list, zpool get bootfs).
  8. Mount root and chroot (bind /dev, /proc, /sys).
  9. Rebuild initramfs and validate ZFS components are included.
  10. Regenerate GRUB config so root=ZFS=... matches reality.
  11. Reinstall bootloader to the correct target:
    • BIOS: grub-install /dev/disk for each boot disk.
    • UEFI: mount ESP, grub-install --target=x86_64-efi ..., verify efibootmgr.
  12. Reboot once and observe. If it fails, capture the exact message and return to the chain-of-failure model.

Prevention checklist (future you deserves nice things)

  • Create a pre-upgrade snapshot of the root dataset and keep a naming convention that makes sense at 3 AM (see the snapshot sketch after this checklist).
  • Validate DKMS/module builds complete successfully before scheduling a reboot.
  • Keep at least one known-good older kernel in GRUB.
  • On mirrored boots, install/update EFI artifacts or GRUB on both disks, not “whichever is mounted today.”
  • Store a rescue ISO (or netboot option) that can import your pool features. Update it when you enable new features.
  • Document your pool layout: which partitions belong to bpool/rpool, where ESP lives, and expected bootfs.
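
A minimal pre-upgrade snapshot sketch (the dataset name and naming convention follow this article's examples; wire it into whatever schedules your upgrades):

cr0x@server:~$ sudo zfs snapshot -r rpool/ROOT/ubuntu_1@pre-upgrade-$(date +%F)
cr0x@server:~$ sudo zfs list -t snapshot -o name,creation -r rpool/ROOT/ubuntu_1 | tail -n 1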

FAQ

1) Do I always need a separate boot pool (bpool)?

No, but it’s a pragmatic hedge. A separate boot pool can keep /boot on a ZFS feature set that bootloaders handle well.
If your platform boots happily from your root pool and you control feature enablement tightly, one pool can work.
If you want reliable rescue paths, the split is usually worth the minor complexity.

2) Why import pools read-only first?

Because panic makes people type destructive commands. Read-only import lets you inspect datasets, snapshots, and health without committing changes.
Once you understand the failure, remount read-write for targeted repairs.

3) My rescue ISO can’t import due to unsupported features. Can I “force” it?

Don’t. Unsupported features mean the software literally doesn’t understand parts of the on-disk format.
Your fix is to use newer OpenZFS tooling, not to gamble with your root pool.

4) If GRUB shows the menu, does that mean my bootloader is fine?

It means GRUB is at least executing. You can still have broken kernel/initrd paths, wrong root=ZFS=,
or a missing ZFS module story that appears later in initramfs.

5) How do I know which dataset is supposed to boot?

Check zpool get bootfs rpool. Then confirm the dataset exists via zfs list.
If you use multiple boot environments, the intended one is usually reflected in GRUB entries too.

6) What’s the most common “bad update” breakage on ZFS root?

Kernel and ZFS module mismatch. The kernel updates, DKMS fails quietly, and initramfs boots without the ability to import ZFS.
The fix is typically: chroot, repair packages, rebuild initramfs, regenerate GRUB.

7) Do I need to export pools before rebooting from rescue?

Ideally, yes: zpool export rpool and zpool export bpool after you’re done, so the next boot imports cleanly.
If you used read-only import and are rebooting immediately, it’s still good hygiene, especially in complicated rescue sessions.
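
A minimal cleanup sketch, assuming the chroot layout from Task 7 (the recursive umount releases the bind mounts that would otherwise make export fail):

cr0x@server:~$ sudo umount -R /mnt
cr0x@server:~$ sudo zpool export bpool
cr0x@server:~$ sudo zpool export rpool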

8) Can I recover by rolling back snapshots even if boot is broken?

Often yes, because rollback can be done from rescue after importing the pool. The key is to roll back the correct root dataset
and then ensure boot artifacts (initramfs/GRUB/EFI) match the rolled-back state.

9) Why does mirrored ZFS not guarantee mirrored boot?

ZFS mirrors your ZFS data. Firmware boots from ESP/MBR/bootloader locations that may not be mirrored unless you make them so.
Treat “boot redundancy” as a separate engineering requirement.

10) Should I enable new ZFS feature flags on my boot pool?

Be conservative. Boot pools are there to boot, not to be cutting-edge. Enable features on the root pool when you need them,
and only after confirming your bootloaders and rescue tooling can still function.

Conclusion: what to do next time (before it’s 2 AM)

ZFS boot pool recovery is not black magic. It’s a sequence: identify the failure stage, import safely, verify the intended boot dataset,
then repair exactly what’s broken—bootloader, initramfs, or package alignment.

Practical next steps you should do on a healthy system:

  • Write down your pool layout (bpool/rpool, ESPs, disk mapping) and keep it with your runbooks.
  • Automate pre-upgrade snapshots and make rollback a first-class option.
  • Ensure boot artifacts are installed to every bootable disk, not just the one the OS happened to mount.
  • Keep a known-good rescue environment that can import your pool features—and refresh it when you change features.
  • Test the “bad update” path on a non-critical host: break it intentionally, recover it deliberately, and time yourself.

When the next update goes sideways, you want fewer surprises and more receipts. Systems don’t reward optimism. They reward preparation.
