ZFS Module Versions: Keeping Kernel and ZFS From Fighting

The most annoying ZFS failure mode isn’t “disk died.” It’s “everything is fine until you reboot,” and then the pool won’t import
because the kernel module and the userland tools disagree about what reality looks like.

When the Linux kernel moves and ZFS doesn’t move with it (or moves differently), you get a quiet mismatch that only becomes loud at
02:00. This is about preventing that, diagnosing it fast, and upgrading like an adult.

What “kernel vs ZFS fighting” actually means

On Linux, ZFS is a kernel module plus userland tools. The kernel module does the real work: ARC, transaction groups, IO scheduling,
checksums, compression, vdev state, pool import/export. The userland tools (zpool, zfs, zdb,
zed) are the control plane.

“Fighting” happens when those pieces don’t agree on interfaces, expectations, or enabled features. That can show up as:

  • Module won’t load after a kernel update (classic DKMS failure, missing headers, incompatible symbols).
  • Module loads but pool won’t import due to feature flag expectations or unsupported on-disk formats.
  • Tools report one version, kernel runs another, leading to weird errors like “unsupported pool version” or missing properties.
  • Boot-time failures because initramfs doesn’t include the right module, so the root pool never appears.

Your job is to keep these aligned:
kernel ABI expectations, ZFS module build, and on-disk pool features. In other words: don’t let your storage stack become a group project.
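The kernel/module/userland part of that alignment can be checked mechanically. Here is a minimal sketch of the comparison logic, run against embedded sample `zfs version` output (the two-line userland/kmod format shown in Task 9 below) so it executes anywhere; on a live host you would feed it the real command output:

```shell
#!/bin/sh
# check_pair: compare userland and kmod versions from `zfs version`-style text.
# The sample output is embedded so the sketch runs anywhere; on a real host,
# replace it with: out="$(zfs version)".
check_pair() {
  userland=$(printf '%s\n' "$1" | sed -n 's/^zfs-\([0-9].*\)/\1/p')
  kmod=$(printf '%s\n' "$1" | sed -n 's/^zfs-kmod-\(.*\)/\1/p')
  if [ -n "$userland" ] && [ "$userland" = "$kmod" ]; then
    echo "aligned: $userland"
  else
    echo "MISMATCH: userland=$userland kmod=$kmod"
    return 1
  fi
}

out="zfs-2.2.2-0ubuntu9
zfs-kmod-2.2.2-0ubuntu9"
check_pair "$out"
# → aligned: 2.2.2-0ubuntu9
```

Wire this into a pre-reboot hook and a mismatch becomes a failed check instead of a 02:00 page.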

Joke #1: A kernel update is like a surprise fire drill—except the fire is your storage, and the exit signs are maintained by DKMS.

Interesting facts and historical context (the bits people forget)

  1. ZFS started at Sun Microsystems (mid-2000s) and baked in end-to-end checksumming, copy-on-write, and pooled storage when most filesystems were still vibing on “fsck and pray.”
  2. Linux doesn’t ship ZFS in-tree largely due to licensing incompatibilities (CDDL vs GPL). That’s why you live in module-land.
  3. “Pool versions” were replaced by “feature flags” in modern OpenZFS. Feature flags allow incremental capability negotiation instead of monolithic version numbers.
  4. FreeBSD and illumos integrate ZFS differently. On Linux you’re juggling out-of-tree kernel modules; on FreeBSD it’s closer to a first-class citizen.
  5. DKMS popularized the “rebuild on kernel update” workflow so third-party modules (ZFS, NVIDIA, etc.) don’t require you to manually recompile every time.
  6. Initramfs inclusion is a storage availability issue, not a “boot optimization.” If ZFS isn’t in the initramfs and your root lives on ZFS, you’re about to have a character-building experience.
  7. OpenZFS is a cross-platform project. That means feature decisions are often constrained by “works across Linux/FreeBSD/macOS forks,” not just your distribution.
  8. The kernel module’s version is not the pool format version. People conflate these. Then they “upgrade” the pool and discover downgrade is not a thing.

The three compatibility layers you must keep straight

1) Kernel ↔ ZFS kernel module (build/ABI compatibility)

Your running kernel exports symbols. The ZFS module expects certain symbols and structures. If the kernel changes and the module
doesn’t match, modprobe zfs fails or loads with missing pieces.

On many distros, you’ll see ZFS packaged in one of two broad shapes:

  • DKMS builds: ZFS source installed; module built locally for each kernel. Flexible, but you’re now a build pipeline.
  • Prebuilt kmods: module packages built by the distro for specific kernels. Less flexible, usually more predictable.

2) ZFS userland tools ↔ ZFS kernel module (ioctl/API compatibility)

The zfs/zpool tools talk to the kernel via ioctls and sysfs/proc interfaces. If userland is newer than the
module (or vice versa), you can get odd behavior: missing properties, “unknown command,” or misleading “pool is outdated” prompts.

Most distros try hard to ship matched pairs. Problems arise when you mix repos, pin only half the stack, or roll your own builds.

3) Pool on-disk features ↔ ZFS implementation (feature flags compatibility)

This is the part that bites you across machines. Pools carry feature flags: encryption, large_dnode, spacemap_histogram, etc.
A system can only import a pool if it supports the enabled features. Import can be read-only in some scenarios, but “I just need it
to mount so I can copy data” is not a strategy.

The dangerous command isn’t zpool import. It’s zpool upgrade. The former negotiates. The latter makes irreversible changes.

One reliability maxim worth keeping on a sticky note, paraphrasing Werner Vogels (Amazon CTO): build systems that assume things will fail, and make recovery routine.

Fast diagnosis playbook

When ZFS and the kernel are “fighting,” don’t wander. Follow a strict order to locate the failing layer: boot, module, or pool features.

First: is the module loaded and actually the one you think it is?

  • Check uname -r, then confirm ZFS modules are loaded for that kernel.
  • Check kernel logs for “Unknown symbol” or vermagic mismatch.
  • Confirm userland tools and kernel module versions are aligned.

Second: can the pool be seen and imported?

  • zpool import without importing: does it list the pool?
  • Try zpool import -N (don’t mount) to isolate import from mount/property issues.
  • Look for feature flags that are unsupported on this host.

Third: is this a boot/initramfs problem?

  • If root is on ZFS and you dropped to initramfs shell: verify /lib/modules/$(uname -r) includes ZFS, rebuild initramfs.
  • Confirm the right kernel is booted (not a fallback kernel missing matching ZFS kmod packages).

Fourth: performance incidents that look like “version problems”

Sometimes the system boots fine and you only see trouble under load. If ZFS silently fell back to a different module build, or you’re running
a kernel/ZFS combo with a known regression, the symptom is “ZFS is slow” or “txg takes forever,” not “module mismatch.”

  • Check ARC stats and memory pressure.
  • Check txg sync times and I/O saturation.
  • Check if you’re on a newly-upgraded kernel without a matching prebuilt ZFS kmod (DKMS built something, but did it build correctly?).
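On Linux, ARC counters live in `/proc/spl/kstat/zfs/arcstats` as “name type value” lines. The hit-ratio arithmetic is a one-liner; this sketch runs it against made-up sample counters so it works without ZFS loaded — on a real host, point the same awk at the actual file:

```shell
#!/bin/sh
# Compute an ARC hit ratio from arcstats-style "name type value" lines.
# The numbers below are invented sample data; on a live host use:
#   awk '...' /proc/spl/kstat/zfs/arcstats
sample='hits                            4    950000
misses                          4    50000'

printf '%s\n' "$sample" | awk '
  $1 == "hits"   { hits = $3 }
  $1 == "misses" { misses = $3 }
  END {
    total = hits + misses
    if (total > 0) printf "arc hit ratio: %.1f%%\n", 100 * hits / total
  }'
# → arc hit ratio: 95.0%
```

A ratio that cratered after an upgrade is a strong hint you’re chasing a kernel/ZFS combination problem, not a disk problem.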

Practical tasks: commands, outputs, and the decision you make

Below are real tasks you can run on a typical Linux host using OpenZFS packages. Commands are shown with representative output; your
environment will differ. The point is what the output means and what you do next.

Task 1: Confirm the running kernel (the target you must match)

cr0x@server:~$ uname -r
6.8.0-52-generic

Meaning: ZFS module must be built for 6.8.0-52-generic specifically.
Decision: Any ZFS module build/install steps must reference this kernel version. If you’re troubleshooting, keep this value in your head.

Task 2: Check whether the ZFS module is loaded

cr0x@server:~$ lsmod | grep -E '^zfs|^spl'
zfs                  6340608  6
spl                   131072  1 zfs

Meaning: Kernel has ZFS and SPL loaded (SPL is the Solaris Porting Layer).
Decision: If nothing shows up, try loading it and look for errors in the next tasks.

Task 3: Try loading ZFS and read the error (the kernel will tell you)

cr0x@server:~$ sudo modprobe zfs
modprobe: ERROR: could not insert 'zfs': Unknown symbol in module, or unknown parameter (see dmesg)

Meaning: The module binary doesn’t match the kernel’s exported symbols (classic mismatch after kernel update).
Decision: Immediately check dmesg, then rebuild/reinstall ZFS modules for this kernel, or boot the previous kernel that matches.

Task 4: Read kernel logs for vermagic/symbol mismatch

cr0x@server:~$ dmesg | tail -n 30
[  112.941230] zfs: disagrees about version of symbol module_layout
[  112.941251] zfs: Unknown symbol spl_kmem_cache_alloc (err -22)
[  112.941260] zfs: Unknown symbol spl_kmem_cache_free (err -22)

Meaning: Module built against different kernel headers or different SPL/ZFS pair.
Decision: Treat this as “module build problem,” not “pool problem.” Fix the module chain before touching ZFS on-disk features.

Task 5: Confirm the installed ZFS package versions (userland and kmods)

cr0x@server:~$ apt-cache policy zfsutils-linux zfs-dkms libzfs4linux
zfsutils-linux:
  Installed: 2.2.2-0ubuntu9
  Candidate: 2.2.2-0ubuntu9
zfs-dkms:
  Installed: 2.2.2-0ubuntu9
  Candidate: 2.2.2-0ubuntu9
libzfs4linux:
  Installed: 2.2.2-0ubuntu9
  Candidate: 2.2.2-0ubuntu9

Meaning: Userland and DKMS source package are aligned in version.
Decision: If these versions differ significantly (or come from mixed repos), stop and align them before debugging anything else.

Task 6: Check DKMS status (did it actually build for the current kernel?)

cr0x@server:~$ dkms status
zfs/2.2.2, 6.8.0-52-generic, x86_64: installed

Meaning: DKMS claims a ZFS module build exists for the running kernel.
Decision: If it says “added” or “built” but not “installed,” you likely have a half-finished build. Rebuild and install.

Task 7: Force a DKMS rebuild for the current kernel (when you suspect a broken build)

cr0x@server:~$ sudo dkms remove -m zfs -v 2.2.2 --all
Deleting module zfs-2.2.2 completely from the DKMS tree.

cr0x@server:~$ sudo dkms install -m zfs -v 2.2.2 -k 6.8.0-52-generic
Building module:
Cleaning build area...
make -j8 KERNELRELEASE=6.8.0-52-generic...
Installing module...
DKMS: install completed.

Meaning: You removed stale builds and rebuilt for the specific kernel.
Decision: If this fails, the compiler toolchain, headers, or kernel changes are your problem—solve that before rebooting into production.

Task 8: Verify that the module on disk matches the running kernel

cr0x@server:~$ modinfo -F vermagic zfs
6.8.0-52-generic SMP preempt mod_unload modversions

Meaning: The module’s vermagic matches the running kernel.
Decision: If vermagic doesn’t match, you’re loading the wrong module or built for the wrong kernel. Fix packaging or boot the correct kernel.

Task 9: Verify userland and kernel module report consistent ZFS versions

cr0x@server:~$ zfs version
zfs-2.2.2-0ubuntu9
zfs-kmod-2.2.2-0ubuntu9

Meaning: Tools and kernel module are in sync.
Decision: If they differ (e.g., userland newer than kmod), expect missing properties and weird errors. Align packages; don’t “work around” it.

Task 10: List pools available to import (without doing it)

cr0x@server:~$ sudo zpool import
   pool: tank
     id: 1283947562198834211
  state: ONLINE
 action: The pool can be imported using its name or numeric identifier.
 config:

        tank        ONLINE
          mirror-0  ONLINE
            sda3    ONLINE
            sdb3    ONLINE

Meaning: The kernel can read labels and see the pool topology.
Decision: If the pool isn’t listed, this is not a “mountpoint” problem. It’s device discovery, encryption key availability, or kernel/module failure.

Task 11: Import without mounting datasets (separate import from mount issues)

cr0x@server:~$ sudo zpool import -N tank
cr0x@server:~$ zpool status -x
all pools are healthy

Meaning: Pool import works; any remaining problems are dataset mount properties, keys, or services.
Decision: If your original problem was “systemd mount units hang,” this narrows it: the pool is fine; the mount workflow is the fight.

Task 12: Check feature flags and compatibility before you “upgrade” anything

cr0x@server:~$ zpool get all tank | grep feature@
tank  feature@async_destroy          enabled   local
tank  feature@encryption             enabled   local
tank  feature@project_quota          active    local

Meaning: The pool has enabled/active features; importing elsewhere requires those features supported.
Decision: If you need cross-host portability (DR, forensic recovery), keep a matrix of which hosts support which features. Don’t enable features casually.
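The portability check itself is just a set difference: features the pool has enabled, minus features the target host supports. Both lists below are hypothetical samples; in practice you would build them from `zpool get all` on the pool and the target host’s supported-feature list (e.g. from `zpool upgrade -v` output):

```shell
#!/bin/sh
# Which enabled pool features does the target host NOT support?
# Both lists are hypothetical stand-ins for real zpool output.
pool_features='async_destroy
encryption
project_quota'
host_features='async_destroy
project_quota
spacemap_histogram'

# grep -Fxv with a newline-separated pattern list drops exact matches,
# leaving only the features the host lacks.
unsupported=$(printf '%s\n' "$pool_features" |
  grep -Fxv "$(printf '%s\n' "$host_features")")

if [ -n "$unsupported" ]; then
  echo "pool needs features the host lacks:"
  printf '  %s\n' $unsupported
else
  echo "host supports all enabled pool features"
fi
```

Run this for every host in your DR/recovery matrix before anyone types `zpool upgrade`.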

Task 13: See what upgrades are available (read-only decision support)

cr0x@server:~$ zpool upgrade
This system supports ZFS pool feature flags.

The following pools are formatted with legacy version numbers and can be upgraded to use feature flags:
  none

The following pools can be upgraded:
  pool 'tank' has the following supported feature flags:
    spacemap_histogram
    extensible_dataset
    bookmarks
    filesystem_limits

Meaning: “Can be upgraded” means “you can make on-disk changes this system understands.”
Decision: Only run zpool upgrade -a if you are comfortable making the pool potentially unimportable on older systems. Most environments should not auto-upgrade pools.

Task 14: Identify whether you’re using prebuilt kmods or DKMS (packaging shape matters)

cr0x@server:~$ dpkg -l | grep -E 'zfs-dkms|zfs-kmod|spl-dkms' | awk '{print $1, $2, $3}'
ii zfs-dkms 2.2.2-0ubuntu9

Meaning: This host relies on DKMS to compile the module for each kernel.
Decision: Treat kernel upgrades as “build events.” Ensure headers, compiler, and DKMS hooks are working before rebooting.

Task 15: Check if initramfs contains ZFS modules (boot reliability check)

cr0x@server:~$ lsinitramfs /boot/initrd.img-6.8.0-52-generic | grep -E '/zfs\.ko|/spl\.ko' | head
usr/lib/modules/6.8.0-52-generic/updates/dkms/spl.ko
usr/lib/modules/6.8.0-52-generic/updates/dkms/zfs.ko

Meaning: ZFS is present in initramfs for that kernel.
Decision: If not present and your root pool depends on it, rebuild initramfs before rebooting. This is one of those “cheap now, expensive later” checks.

Task 16: Rebuild initramfs after ZFS module changes (make boot use the new reality)

cr0x@server:~$ sudo update-initramfs -u -k 6.8.0-52-generic
update-initramfs: Generating /boot/initrd.img-6.8.0-52-generic

Meaning: The initramfs image is regenerated with current modules and hooks.
Decision: After DKMS rebuilds, always refresh initramfs on hosts that import pools during boot (especially root-on-ZFS).

Task 17: Confirm pool health and rule out “real storage” issues masquerading as version trouble

cr0x@server:~$ zpool status -v tank
  pool: tank
 state: ONLINE
  scan: scrub repaired 0B in 00:21:43 with 0 errors on Thu Dec 19 03:11:52 2025
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            sda3    ONLINE       0     0     0
            sdb3    ONLINE       0     0     0

errors: No known data errors

Meaning: Storage is healthy at the ZFS layer.
Decision: If you see checksum errors or degraded vdevs, don’t blame versions. Fix hardware/paths first; version churn during a real fault is how outages get longer.

Task 18: Check ZED events and service state (sometimes it’s not loading policy)

cr0x@server:~$ systemctl status zfs-zed.service
● zfs-zed.service - ZFS Event Daemon (zed)
     Loaded: loaded (/lib/systemd/system/zfs-zed.service; enabled; vendor preset: enabled)
     Active: active (running) since Thu 2025-12-26 08:14:12 UTC; 10min ago

Meaning: ZED is alive; it can handle hotplug and fault notifications.
Decision: If ZED is dead, you’ll miss automated replacements and alerts. It won’t cause module mismatch, but it will hide the story until it’s ugly.

Three corporate mini-stories (anonymized, plausible, and painfully familiar)

Incident caused by a wrong assumption: “The kernel update is just security patches”

A mid-sized SaaS shop ran ZFS on Linux for database replicas. They had a weekly patch window, and kernel updates were treated as
“routine hygiene.” Storage wasn’t on the change ticket because, quote, “ZFS is installed already.”

The patch run updated the kernel and rebooted the node fleet. A chunk of nodes came back fine. A few didn’t. The broken ones dropped
into emergency mode with missing filesystems, and the on-call saw the usual red herring: “cannot import pool.”

The actual failure was boring: those nodes had a slightly different kernel flavor installed earlier (same major series, different build),
and DKMS didn’t compile ZFS for that specific kernel because headers weren’t present. No headers, no module. No module, no pool. ZFS did
exactly what it should: it refused to pretend.

The fix wasn’t heroic. They installed the correct headers, rebuilt DKMS, regenerated initramfs, and rebooted. Then they updated their
patch workflow: kernel upgrades are storage changes if you run out-of-tree storage modules. The “wrong assumption” was thinking OS and
storage are separate layers. On Linux, they’re roommates.

Optimization that backfired: “Let’s enable every shiny feature flag everywhere”

A company standardized on ZFS for build artifact storage. Someone noticed “zpool upgrade” listed a buffet of features and concluded:
more features equals more better. They upgraded pools across the primary cluster during a quiet quarter. Nothing broke immediately.

Months later, disaster recovery testing moved a replicated pool to a smaller standby environment that lagged a couple OpenZFS releases
behind. Import failed with unsupported features. The DR host wasn’t “wrong,” it was just older.

The team tried to fix it under pressure. Spoiler: you can’t “downgrade feature flags.” They ended up standing up a newer standby host
to import the pool, then moving data across at the filesystem level. That cost time, introduced risk, and made the DR test look like
improv theater.

The lesson: pool features are a compatibility contract. If you have to move pools between hosts, treat feature flags like API versions.
Only enable what you need, and only after confirming your recovery environments can import it. “Because it’s available” is not a requirement.

Boring but correct practice that saved the day: kernel pinning plus staged ZFS upgrades

A large enterprise ran mixed workloads: VM storage, analytics, and some legacy NFS. They had the budget for good hardware and the
patience for process. Their change management was not glamorous, but it worked.

They pinned kernels on storage nodes to a known-good series and only advanced after a staging bake. ZFS packages were updated in a
controlled cadence, and they never ran zpool upgrade automatically. Pools stayed on a conservative feature set until
the whole fleet—including break-glass recovery hosts—was confirmed compatible.

Then a kernel security advisory hit, and everyone panicked. They didn’t. They rolled the new kernel into staging, validated DKMS builds,
verified initramfs content, reboot-tested, and ran targeted I/O tests. Only then did production move.

A month later, another team had a rushed kernel upgrade on general compute nodes and hit module issues with an unrelated driver.
Storage nodes kept serving. The enterprise’s “boring” discipline meant their storage platform stayed predictable while other parts of the
company practiced surprise engineering.

Joke #2: The only thing worse than a ZFS upgrade surprise is discovering your DR plan was “import the pool and hope.”

How to upgrade without turning your pool into a science project

Rule 1: Separate “software upgrade” from “on-disk format upgrade”

Upgrading packages (ZFS userland + kmod) is reversible: you can roll back packages, boot an older kernel, or reinstall known-good versions.
Upgrading the pool’s feature flags is not reversible in any practical sense.

Do software upgrades first. Run for a while. Confirm stability. Only then consider feature flags, and only if you have a reason you can explain
without using the word “modern.”

Rule 2: Keep kernels and ZFS modules paired intentionally

On DKMS-based systems, kernel upgrades are build events. Your CI/CD may be flawless and your DKMS hooks may still fail because one host is missing
headers, or because a new kernel release changed an internal interface enough to break compilation.

If you’re running a fleet, you want:

  • A single kernel series per node class (storage vs compute)
  • Automated verification that DKMS built for the installed kernels
  • A reboot test before declaring the upgrade “done”
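The “automated verification” bullet can be a small parse of `dkms status`. This sketch checks sample status lines against a kernel list (the exact output format varies a little across dkms versions, so treat the pattern as an assumption to adapt):

```shell
#!/bin/sh
# Flag kernels that lack an "installed" ZFS DKMS build.
# Sample data stands in for `dkms status` output and for the kernel list
# you would normally derive from ls /lib/modules.
dkms_out='zfs/2.2.2, 6.8.0-52-generic, x86_64: installed
zfs/2.2.2, 6.8.0-60-generic, x86_64: built'
kernels='6.8.0-52-generic
6.8.0-60-generic'

for k in $kernels; do
  if printf '%s\n' "$dkms_out" | grep -q "zfs.*, $k,.*: installed"; then
    echo "ok:   $k"
  else
    echo "FAIL: $k has no installed ZFS module build"
  fi
done
```

Note that “built” is not “installed”: the second kernel above would boot without a usable ZFS module, which is exactly the half-finished state Task 6 warns about.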

Rule 3: Treat pool portability as a first-class requirement (or explicitly drop it)

Some environments truly don’t need to import pools elsewhere. Fine. Write it down and accept the risk. But many orgs assume portability
for DR, forensic recovery, or vendor support. If you assume you can move a pool, you must preserve compatibility.

Practically: maintain a compatibility matrix of OpenZFS versions across all hosts that might import a pool. If one environment lags,
don’t enable features it can’t handle.

Rule 4: Don’t mix distribution repos and “random builds” on production storage

Mixing vendor kernels, HWE kernels, backports, and third-party ZFS builds can work—until it doesn’t. Then you’re debugging an ecosystem,
not a system. You want the smallest set of moving parts that meets requirements.

If you must deviate (custom kernel, custom ZFS), do it everywhere consistently and own the test burden. “One special host” always
becomes the host that fails in the most creative way.

Rule 5: Prefer predictable upgrade paths over maximum novelty

The storage stack is not the place to chase every new kernel release. Your users won’t thank you for preemptively discovering a regression.
Pick stable kernel lines, keep ZFS matched, and spend your innovation budget on things that improve SLOs.

Common mistakes: symptom → root cause → fix

1) Symptom: modprobe zfs fails after kernel upgrade

Root cause: ZFS module not built for the new kernel; missing headers; DKMS build failed; vermagic mismatch.

Fix: Install matching kernel headers, rebuild DKMS for that kernel, confirm modinfo -F vermagic zfs matches, rebuild initramfs, reboot.

2) Symptom: zfs version shows userland newer than kmod (or vice versa)

Root cause: Mixed repos, partial upgrades, pinned packages, or manual installs.

Fix: Align packages to the same OpenZFS release from the same repo channel. Don’t “just upgrade zfsutils-linux” without the kernel side.

3) Symptom: Pool won’t import on a different host; error mentions unsupported features

Root cause: Feature flags enabled on pool are not supported by that host’s ZFS implementation.

Fix: Import on a host that supports the features, then migrate data (send/receive, rsync, application-level replication). Prevent recurrence: stop enabling features without fleet-wide compatibility.
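The send/receive side of that migration looks roughly like this. Pool, snapshot, and host names are hypothetical placeholders, and the sketch only prints the commands rather than running anything against real storage:

```shell
#!/bin/sh
# Dry-run printer for a feature-flag migration: snapshot everything, then
# replicate to a host whose OpenZFS release supports the pool's features.
# POOL, DEST, and NEWPOOL are placeholders, not real infrastructure.
POOL=tank
DEST=newer-host
NEWPOOL=tank2
SNAP="migrate-$(date +%Y%m%d)"

printf 'zfs snapshot -r %s@%s\n' "$POOL" "$SNAP"
# -R replicates the dataset tree; -u avoids mounting on the receiver;
# -d maps dataset names under the destination pool.
printf 'zfs send -R %s@%s | ssh %s zfs receive -u -d %s\n' \
  "$POOL" "$SNAP" "$DEST" "$NEWPOOL"
```

Printing first, running second is deliberate: during a failed DR test, you want the exact commands reviewed before any bytes move.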

4) Symptom: System drops to initramfs shell; root pool not found

Root cause: ZFS modules missing from initramfs or not loaded early enough; wrong kernel booted; initramfs not regenerated after DKMS.

Fix: Boot an older working kernel, rebuild initramfs for the intended kernel, ensure ZFS hooks included, then test reboot.

5) Symptom: After upgrade, mounts hang or import is slow

Root cause: Not always versions. Could be device path changes, multipath issues, missing by-id naming, or a txg stall due to I/O timeouts.

Fix: Check dmesg for I/O errors, confirm device names stable, verify zpool status and scrub results, then investigate performance stats (ARC, txg, vdev queueing).

6) Symptom: You “fixed it” by rebooting, then it breaks again next reboot

Root cause: You loaded a module manually but didn’t rebuild the initramfs, or DKMS rebuilt for one kernel while you boot another.

Fix: Make the fix persistent: ensure the correct kernel is installed and default, DKMS built for it, initramfs updated, and packages pinned/managed properly.

7) Symptom: zpool import sees pool but import fails with “missing log” or “cannot open” devices

Root cause: Device discovery naming changed across boots (e.g., /dev/sdX churn), missing partition, failed HBA, or udev race; not primarily a module version issue.

Fix: Use persistent device paths (by-id), check HBAs, validate partitions, and only then blame software versions.

Checklists / step-by-step plan

Checklist A: Before a kernel upgrade on ZFS hosts

  1. Confirm whether this host uses DKMS or prebuilt kmods (Task 14).
  2. Ensure kernel headers for the new kernel will be installed alongside the kernel.
  3. Confirm build tooling is present (compiler, make) if DKMS is used.
  4. Run a dry check: after installing the new kernel (but before reboot), verify DKMS status shows the module built for the new kernel.
  5. Regenerate initramfs for the new kernel (Task 16) and verify it contains zfs.ko (Task 15).
  6. Have a rollback plan: ensure old kernel remains installed and bootable.
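Steps 4 and 5 of Checklist A can be collapsed into one pre-reboot gate. This is a hedged sketch: `TARGET_KERNEL` is an assumption you set to the kernel you are about to boot, and each check degrades to a skip message where `dkms` or `lsinitramfs` isn’t available, so the script is safe to drop onto mixed fleets:

```shell
#!/bin/sh
# Pre-reboot gate sketch: fail fast if the target kernel lacks an installed
# ZFS DKMS build or an initramfs containing zfs.ko.
# TARGET_KERNEL is an assumption; default to the running kernel.
TARGET_KERNEL="${TARGET_KERNEL:-$(uname -r)}"
gate_result=ok

if command -v dkms >/dev/null 2>&1; then
  dkms status 2>/dev/null | grep -q "zfs.*${TARGET_KERNEL}.*installed" \
    || { echo "GATE: no installed ZFS DKMS build for ${TARGET_KERNEL}"; gate_result=fail; }
else
  echo "GATE: dkms not present, skipping module check"
fi

if command -v lsinitramfs >/dev/null 2>&1 && [ -r "/boot/initrd.img-${TARGET_KERNEL}" ]; then
  lsinitramfs "/boot/initrd.img-${TARGET_KERNEL}" | grep -q 'zfs\.ko' \
    || { echo "GATE: initramfs for ${TARGET_KERNEL} lacks zfs.ko"; gate_result=fail; }
else
  echo "GATE: initramfs not inspectable here, skipping"
fi

echo "gate: ${gate_result}"
```

Make “gate: ok” a hard requirement in your patch automation; a human reading scrollback is not a gate.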

Checklist B: Safe OpenZFS package upgrade (userland + module)

  1. Upgrade packages in lockstep (don’t upgrade zfsutils alone).
  2. Confirm zfs version shows matching userland and kmod (Task 9).
  3. Confirm modinfo -F vermagic zfs matches uname -r (Tasks 1 and 8).
  4. Verify pool health (zpool status -v) before and after (Task 17).
  5. Reboot in a controlled window and verify import/mount behavior.

Checklist C: Deciding whether to run zpool upgrade

  1. Write down the reason (specific feature needed, performance bug fixed, operational necessity).
  2. Confirm every system that might need to import the pool supports the features you’ll enable.
  3. Confirm backups/replication are current.
  4. Prefer upgrading a test pool first, then validating send/receive, scrub, and failover/import workflows.
  5. Upgrade one pool at a time. Watch it. Then proceed.

Step-by-step: When a host won’t import pools after reboot

  1. Confirm running kernel (uname -r).
  2. Check module load (lsmod, modprobe zfs).
  3. If modprobe fails, read dmesg for symbol/vermagic errors.
  4. Confirm DKMS status for that kernel; rebuild if needed.
  5. Verify initramfs contains ZFS modules; regenerate if missing.
  6. Only after module issues are resolved: run zpool import and attempt zpool import -N.
  7. Check feature flags and portability constraints if importing on a different host.

FAQ

1) What’s the difference between “ZFS version” and “pool version”?

“ZFS version” usually means the software release (userland and kernel module). “Pool version” used to be a monolithic number; modern OpenZFS uses feature flags instead.
Software can be upgraded without changing the pool. Enabling new pool features changes on-disk compatibility.

2) If zfs version shows mismatched userland and kmod, is it always fatal?

Not always immediately fatal. It is always a bad smell. You might get missing properties, odd errors, or subtle behavior differences.
In production, align them. “Works for me” is how outages incubate.

3) Why does ZFS break after a kernel update when ext4 doesn’t?

ext4 is in the kernel tree. ZFS is not. Kernel updates can change internal interfaces, and out-of-tree modules must be rebuilt (DKMS) or replaced with matching prebuilt modules.

4) Should I prefer DKMS or prebuilt kmods?

Prebuilt kmods are usually more predictable if your distro supports them well. DKMS is flexible and common, but shifts reliability onto your build environment.
If you run DKMS, invest in pre-reboot validation and staging.

5) Can I downgrade a pool after running zpool upgrade?

Practically, no. Once new feature flags are enabled, older implementations may refuse import. Your “downgrade” becomes “migrate data to a new pool.”

6) Is it safe to run zpool upgrade -a during routine upgrades?

No. It’s the storage equivalent of “format C: for cleanliness.” Upgrade pools deliberately and sparingly, after verifying your recovery/import environments.

7) My pool imports manually, but not on boot. What’s going on?

Usually initramfs or ordering. The ZFS module might not be in initramfs, or the service ordering is wrong (keys not available, devices not ready).
Verify initramfs content and systemd units, and test with zpool import -N during boot troubleshooting.

8) Why does dkms status say “installed” but modprobe still fails?

DKMS can be “installed” for a different kernel than you booted, or you may have multiple kernels installed and you’re loading modules from the wrong tree.
Check uname -r, modinfo -F vermagic, and verify the module path under /lib/modules/$(uname -r).

9) Is a kernel/ZFS mismatch only a boot-time problem?

Mostly, but not exclusively. A “works but degraded” scenario can happen if you’re on a new combo with regressions or changed defaults.
Treat storage performance incidents after upgrades as suspect until proven otherwise.

10) What’s the safest portability strategy for DR?

Keep DR hosts on equal or newer OpenZFS versions than production, and avoid enabling pool features until DR is verified.
If portability is critical, consider standardizing feature flags and documenting them like an API contract.

Conclusion: next steps that actually reduce pager noise

ZFS doesn’t “randomly break” after kernel updates. Humans break it by letting kernel, module, tools, and pool features drift into incompatible combinations.
The fix is boring: control versions, verify builds, test reboots, and treat feature flags as a compatibility promise.

Do this next:

  1. Pick your upgrade model (DKMS vs prebuilt) and standardize it per node class.
  2. Add a pre-reboot gate: confirm DKMS built for the target kernel and initramfs includes ZFS modules.
  3. Stop automatic pool upgrades. Require an explicit decision and a portability check.
  4. Document a one-page “Fast diagnosis” flow and make on-call follow it. Consistency beats creativity at 02:00.