You reboot after “just a routine security update,” and your server returns the favor by dropping into an initramfs shell,
refusing to mount root, or spamming modprobe failures like it’s getting paid per error.
Production is down, someone is asking “is it hardware,” and you’re staring at a kernel that no longer recognizes its own modules.
In Ubuntu 24.04, the most common reason is boring and fixable: your initramfs is stale, incomplete, or built against the wrong kernel,
so the early boot environment can’t load the storage/network modules it needs. The cure isn’t mystical.
It’s a disciplined rebuild, with verification, and with the right mental model of what initramfs is actually doing.
How this failure shows up (and why it’s confusing)
“Modules broke” is a lazy diagnosis, but it’s usually what your console tells you.
On Ubuntu 24.04, the failure modes cluster around early boot: the point where the kernel is running,
but your real root filesystem isn’t mounted yet.
Common symptom patterns
- Drop to initramfs shell (BusyBox) with “ALERT! /dev/… does not exist”. The kernel can’t load the storage driver (NVMe, virtio, RAID HBA, dm-crypt, LVM, ZFS) from the initramfs image.
- Boots, but network is dead until later, or forever. Missing NIC module or firmware in initramfs can block network-root or iSCSI, or break early-stage network requirements.
- NVIDIA / ZFS / WireGuard / vendor modules stopped loading. Usually DKMS didn’t build for the new kernel, or Secure Boot refuses unsigned modules.
- “Unknown symbol” / “invalid module format” in dmesg. That’s a mismatch: a module built for another kernel build, or vermagic disagrees.
- Systemd services fail post-boot because modules that “used to be there” aren’t. If the system boots from a fallback kernel but your module packages track “latest,” the mismatch persists and shows up as runtime errors.
The confusion comes from timing. Once you’re in the initramfs environment, you have a tiny filesystem image
with a minimal set of binaries and kernel modules. If that image is wrong, the kernel can be fine and still fail to reach userspace.
The fix is typically to regenerate the image so the right modules and configuration are embedded for the kernel you intend to boot.
Joke #1: An initramfs is like a parachute—most days you forget it exists, and on the day it matters you really want it packed correctly.
Interesting facts and context (so the behavior makes sense)
A few concrete facts help you stop guessing and start diagnosing. These aren’t trivia; they explain why “it worked yesterday” is plausible.
- initramfs replaced initrd as the mainstream early-boot mechanism in Linux. The switch (years ago) mattered because initramfs is a cpio archive unpacked into a RAM-backed filesystem; it’s flexible and scriptable.
- Ubuntu’s default tooling is still initramfs-tools. Some ecosystems prefer dracut, but on Ubuntu 24.04 you usually troubleshoot update-initramfs and hooks in /etc/initramfs-tools.
- Kernel module compatibility is tied to “vermagic”. If the module’s vermagic doesn’t match the running kernel, you get “invalid module format” and it won’t load.
- DKMS is a build system, not magic glue. It compiles out-of-tree modules against the installed kernel headers. If headers aren’t installed, or the build fails, you get no module for the new kernel.
- Secure Boot can silently turn module loading into a policy decision. If Secure Boot is enabled, unsigned modules may be refused. You can rebuild initramfs perfectly and still fail if signatures don’t match policy.
- Some storage stacks are needed before root is mounted. LVM, dm-crypt, MD RAID, NVMe, virtio-blk/scsi, and ZFS can be required inside initramfs. If they’re missing, you don’t reach the real filesystem.
- Microcode updates can change early boot behavior. CPU microcode may be included in initramfs; updating it can change timings and reveal races or firmware dependencies.
- Ubuntu keeps multiple kernels installed by default. That safety net is why you can often boot an older kernel and repair the initramfs for the newer one from a stable environment.
- Compressed initramfs formats vary (gzip, zstd). If you copy images across systems or mess with compression, you can create “unpacking initramfs failed” problems that look like module issues.
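A rough way to separate a corrupted image from a genuine module problem (kernel version as used in the examples later in this article): if lsinitramfs can’t walk the archive at all, suspect the image itself rather than its contents. This is a quick check, not a guarantee.
cr0x@server:~$ lsinitramfs /boot/initrd.img-6.8.0-50-generic > /dev/null && echo "image unpacks cleanly"
image unpacks cleanly
If that prints decompression or cpio errors instead, regenerate the image before chasing individual modules.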
One paraphrased idea from John Allspaw (operations/reliability): Reliability comes from learning how systems actually fail, not pretending they won’t.
That’s the vibe here: stop treating initramfs as a black box.
Fast diagnosis playbook (first/second/third/fourth)
When a boot breaks, your goal is not “fix everything.” Your goal is to find the bottleneck in the boot chain.
Think in layers: firmware → bootloader → kernel → initramfs → real root → services.
Module breakage usually lives in layers 3–5.
First: determine what kernel you’re actually booting
- If you can reach a shell, check uname -r.
- If you can’t, check GRUB and boot an older kernel to get a working environment.
- Decision: if an older kernel boots, you have a repair path without rescue media.
Second: decide if it’s “initramfs can’t mount root” or “modules missing post-boot”
- If you land in initramfs BusyBox with root device missing, it’s early-boot: initramfs content/config or storage module/firmware.
- If the system boots but specific modules fail, it’s usually DKMS, module signature policy, or wrong kernel headers.
- Decision: early boot failures require rebuilding initramfs for the target kernel; runtime failures may require rebuilding DKMS modules and then regenerating initramfs if needed.
Third: check for module mismatch vs module absence
- “Invalid module format” → mismatch (wrong build, wrong kernel, wrong headers).
- “Module not found” → absence (not installed, not built, not included in initramfs).
- Decision: mismatch pushes you toward DKMS rebuild/headers; absence pushes you toward package install and initramfs hooks.
Fourth: verify Secure Boot status before you waste time
- Secure Boot enabled + third-party modules = signatures matter.
- Decision: either enroll a key/sign modules, or disable Secure Boot for that host’s operational policy.
The fastest route is almost always: boot a known-good kernel, fix packages/headers/DKMS, regenerate initramfs for the kernel you actually want, update GRUB, reboot once.
Don’t do repeated “reboot and hope.” That’s not engineering; that’s superstition with extra steps.
A practical mental model: kernel, modules, initramfs, DKMS
Here’s the model that keeps you from making expensive guesses.
Kernel and modules
The kernel image (vmlinuz) is the core executable. Most hardware support and features live in loadable modules under
/lib/modules/<kernel-version>/. Those modules are tightly coupled to the kernel build.
That coupling is enforced by vermagic and symbol versions.
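A minimal check of that coupling, assuming the target kernel used in the examples below and the in-tree nvme module (substitute whatever module you care about): the leading field of vermagic should equal the kernel release you intend to boot.
cr0x@server:~$ modinfo -k 6.8.0-50-generic -F vermagic nvme | awk '{print $1}'
6.8.0-50-generic
If that value differs from the kernel you plan to boot, the module was built for something else and will be refused.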
initramfs is not your root filesystem
initramfs is an early boot filesystem image. The kernel unpacks it to RAM, runs /init,
and that script mounts the real root filesystem (your disk, RAID, LUKS, ZFS dataset, whatever).
If you can’t load the module that talks to your storage, you can’t mount the real root, and you stop there.
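If you want to see that with your own eyes, initramfs-tools ships unmkinitramfs, which unpacks an image into a directory you can browse; a short sketch (the /tmp path is arbitrary):
cr0x@server:~$ mkdir /tmp/initrd-inspect
cr0x@server:~$ unmkinitramfs /boot/initrd.img-6.8.0-50-generic /tmp/initrd-inspect
cr0x@server:~$ ls /tmp/initrd-inspect
Depending on whether microcode is prepended, the tree lands at the top level or under a main/ subdirectory; either way, look for /init and usr/lib/modules/<kernel>/ to see exactly what early boot has to work with.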
Why updates trigger this
A kernel update typically installs:
linux-image-…, linux-modules-…, and maybe linux-modules-extra-….
It also triggers initramfs regeneration hooks.
If any of these steps fail (disk full, hook error, DKMS failure, missing headers, interrupted upgrade),
you can end up with a kernel that exists on disk but an initramfs that does not contain what boot needs.
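A cheap pre-reboot sanity check for that state, as a minimal sketch: walk the installed kernels and make sure each has a matching initramfs and module tree.
cr0x@server:~$ for k in /boot/vmlinuz-*; do
>   v="${k#/boot/vmlinuz-}"
>   [ -e "/boot/initrd.img-$v" ] || echo "missing initrd for $v"
>   [ -d "/lib/modules/$v" ] || echo "missing modules for $v"
> done
No output means every installed kernel has both pieces; any line it prints names a kernel you should not reboot into yet.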
DKMS is a usual suspect
DKMS builds modules like ZFS, NVIDIA, VirtualBox, some vendor NIC/RAID drivers, and various specialty modules.
After a kernel upgrade, DKMS should compile modules for the new kernel automatically.
But “should” is the most dangerous word in ops. If the build fails, you might still reboot into the new kernel and discover nothing loads.
Why rebuilding initramfs works (when done correctly)
A correct rebuild does three things:
- Ensures module dependency metadata is correct (depmod output matches the kernel’s modules).
- Ensures the required modules and firmware are included in the initramfs image for that kernel.
- Ensures bootloader entries point to a matching kernel+initramfs pair.
If you rebuild initramfs for the wrong kernel, or rebuild while DKMS is still broken, you’ll produce a beautiful, perfectly wrong artifact.
That’s how people end up in reboot loops.
Practical tasks: commands, output meanings, decisions
This section is the meat. Real commands, what you should see, and what decision you make from each output.
Use it as a diagnostic runbook. Run commands as root where needed.
Task 1: Identify the running kernel (and whether you’re on a fallback)
cr0x@server:~$ uname -r
6.8.0-49-generic
Meaning: This is the kernel currently running. If you’re troubleshooting a boot failure of a newer kernel,
you may have booted an older one via GRUB.
Decision: Keep this value. You’ll rebuild initramfs for the kernel you intend to boot, not necessarily the one you’re running.
Task 2: List installed kernels (so you know your targets)
cr0x@server:~$ dpkg -l | awk '$1 == "ii" && $2 ~ /^linux-image-[0-9]/ {print $2}' | sort -V
linux-image-6.8.0-49-generic
linux-image-6.8.0-50-generic
Meaning: These kernel images are installed. Your broken boot likely involves the newest one.
Decision: Pick the kernel version you want to fix (usually the latest installed).
Task 3: Confirm the initramfs images exist for the target kernel
cr0x@server:~$ ls -lh /boot/initrd.img-6.8.0-50-generic /boot/vmlinuz-6.8.0-50-generic
-rw-r--r-- 1 root root 98M Dec 30 10:12 /boot/initrd.img-6.8.0-50-generic
-rw------- 1 root root 14M Dec 30 10:11 /boot/vmlinuz-6.8.0-50-generic
Meaning: The kernel and initramfs artifacts exist. If initrd is missing or tiny, you have a packaging/hook failure.
Decision: If missing or suspiciously small, rebuild initramfs and check hook logs.
Task 4: Check if /boot is full (the classic silent killer)
cr0x@server:~$ df -h /boot
Filesystem Size Used Avail Use% Mounted on
/dev/sda2 974M 942M 0 100% /boot
Meaning: No free space. Kernel upgrades can “install” but fail to write initramfs, or write truncated images.
Decision: Free space before rebuilding: remove older kernels or enlarge /boot. Rebuild after space is available.
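The usual way to free that space is to purge kernels you no longer need, keeping at least one known-good fallback; a hedged example (the 6.8.0-31 version is hypothetical, list your own installed images first):
cr0x@server:~$ sudo apt-get autoremove --purge
cr0x@server:~$ sudo apt-get purge linux-image-6.8.0-31-generic linux-modules-6.8.0-31-generic
autoremove usually clears automatically installed old kernels on Ubuntu; the explicit purge is for stragglers that were marked manual.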
Task 5: Find the last initramfs build errors in logs
cr0x@server:~$ journalctl -b -1 -u systemd-update-done.service --no-pager
Dec 30 09:58:12 server systemd[1]: Finished Wait for System Update to Complete.
cr0x@server:~$ grep -R "update-initramfs" -n /var/log/apt/term.log | tail -n 20
Log started: 2025-12-30 09:54:18
Setting up linux-image-6.8.0-50-generic (6.8.0-50.51) ...
update-initramfs: Generating /boot/initrd.img-6.8.0-50-generic
W: Possible missing firmware /lib/firmware/i915/skl_dmc_ver1_27.bin for module i915
Meaning: APT logs show if initramfs was generated and whether firmware warnings occurred.
Decision: Warnings aren’t always fatal, but if you see errors (I/O errors, no space, hook failures), fix those first.
Task 6: Verify module directory exists for the kernel
cr0x@server:~$ ls -ld /lib/modules/6.8.0-50-generic
drwxr-xr-x 6 root root 4096 Dec 30 10:10 /lib/modules/6.8.0-50-generic
Meaning: Modules for that kernel are installed.
Decision: If missing, reinstall linux-modules-* packages for that kernel.
Task 7: Check depmod for errors (dependency metadata)
cr0x@server:~$ sudo depmod -a 6.8.0-50-generic
Meaning: No output is good. Errors here mean your module tree is inconsistent.
Decision: If depmod complains about missing files, reinstall kernel modules packages before rebuilding initramfs.
Task 8: Confirm the module you need exists (storage driver example)
cr0x@server:~$ modinfo -k 6.8.0-50-generic nvme | sed -n '1,6p'
filename: /lib/modules/6.8.0-50-generic/kernel/drivers/nvme/host/nvme.ko.zst
license: GPL
description: NVM Express block device driver
author: Matthew Wilcox <willy@linux.intel.com>
alias: pci:v0000106Bd00002001sv*sd*bc*sc*i*
Meaning: The module exists on disk for that kernel.
Decision: If modinfo fails, install the missing modules package (often linux-modules-extra) or correct kernel selection.
Task 9: Check DKMS status (out-of-tree modules)
cr0x@server:~$ dkms status
zfs/2.2.2, 6.8.0-49-generic, x86_64: installed
zfs/2.2.2, 6.8.0-50-generic, x86_64: built
nvidia/550.90.07, 6.8.0-50-generic, x86_64: install failed
Meaning: DKMS modules may not be installed for the new kernel.
Decision: Fix DKMS failures before you rebuild initramfs if early boot depends on those modules (e.g., ZFS root).
Task 10: Confirm Secure Boot state (module signature policy)
cr0x@server:~$ mokutil --sb-state
SecureBoot enabled
Meaning: Secure Boot is enabled. Unsigned third-party modules may not load.
Decision: If your broken module is third-party (NVIDIA, vendor, ZFS DKMS in some setups), plan for signing/enrolling keys or disabling Secure Boot (policy decision).
Task 11: Inspect initramfs contents for a specific module
cr0x@server:~$ lsinitramfs /boot/initrd.img-6.8.0-50-generic | grep -E '/nvme\.ko|/zfs\.ko|/virtio_blk\.ko' | head
usr/lib/modules/6.8.0-50-generic/kernel/drivers/nvme/host/nvme.ko.zst
Meaning: The initramfs includes the module. If it doesn’t, early boot may not see the device.
Decision: If required modules are absent, force inclusion (initramfs-tools config) and regenerate.
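Forcing inclusion is a one-line change: initramfs-tools reads module names, one per line, from /etc/initramfs-tools/modules. A minimal sketch using nvme as the example (substitute your storage driver):
cr0x@server:~$ echo nvme | sudo tee -a /etc/initramfs-tools/modules
nvme
cr0x@server:~$ sudo update-initramfs -u -k 6.8.0-50-generic
update-initramfs: Generating /boot/initrd.img-6.8.0-50-generic
Also look at MODULES= in /etc/initramfs-tools/initramfs.conf: most (the Ubuntu default) pulls in a broad set of storage drivers, while dep trims the image to what the current hardware appears to need, which is exactly the optimization that backfires in one of the stories later in this article.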
Task 12: Validate root device mapping (UUID vs device names)
cr0x@server:~$ findmnt -no SOURCE /
/dev/mapper/vg0-root
cr0x@server:~$ cat /etc/fstab | sed -n '1,8p'
# /etc/fstab: static file system information.
UUID=8b7f5f4a-2f15-4e5a-a6b3-7c9e0c03c7b0 / ext4 defaults 0 1
Meaning: You’re using UUIDs (good). If /etc/fstab uses raw device names (like /dev/sda3),
the boot can break when enumeration changes.
Decision: Use UUIDs/LABELs and ensure initramfs has the modules needed to find those devices.
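If you do find raw device names, blkid gives you the stable identifier to substitute; a short sketch (the /dev/sda3 device is illustrative):
cr0x@server:~$ sudo blkid -s UUID -o value /dev/sda3
The printed value is what belongs in /etc/fstab as UUID=…, and if the kernel’s root= parameter also uses a device name, fix /etc/default/grub and rerun update-grub so the bootloader and fstab agree.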
Task 13: Rebuild initramfs for a specific kernel (the core fix)
cr0x@server:~$ sudo update-initramfs -u -k 6.8.0-50-generic
update-initramfs: Generating /boot/initrd.img-6.8.0-50-generic
W: Possible missing firmware /lib/firmware/amdgpu/gc_11_0_0_mes.bin for module amdgpu
Meaning: The image was regenerated. Warnings about firmware might be acceptable, depending on hardware and whether that driver is needed at boot.
Decision: If you see errors (not warnings), stop and fix them. If you rely on that firmware (GPU console on some systems), install firmware packages.
Task 14: Rebuild all initramfs images (useful after systemic fixes)
cr0x@server:~$ sudo update-initramfs -u -k all
update-initramfs: Generating /boot/initrd.img-6.8.0-49-generic
update-initramfs: Generating /boot/initrd.img-6.8.0-50-generic
Meaning: All installed kernels got refreshed initramfs images.
Decision: Use this after you fix DKMS or firmware so both the “current” and “fallback” kernels are consistent.
Task 15: Update GRUB entries (avoid booting mismatched artifacts)
cr0x@server:~$ sudo update-grub
Sourcing file `/etc/default/grub'
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-6.8.0-50-generic
Found initrd image: /boot/initrd.img-6.8.0-50-generic
Found linux image: /boot/vmlinuz-6.8.0-49-generic
Found initrd image: /boot/initrd.img-6.8.0-49-generic
done
Meaning: GRUB sees matching kernel and initrd pairs.
Decision: If GRUB doesn’t list your initrd image, you’re booting without the thing you’re trying to fix. Go back to /boot integrity and naming.
Task 16: Verify module loading errors in the kernel log
cr0x@server:~$ dmesg -T | egrep -i 'invalid module format|Unknown symbol|module verification failed|failed to load' | tail -n 20
[Mon Dec 30 10:22:41 2025] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[Mon Dec 30 10:22:41 2025] nvidia: loading out-of-tree module taints kernel.
Meaning: This indicates signature policy and tainting. If Secure Boot is enforcing, it might refuse instead of tainting.
Decision: Decide if you will sign modules (preferred for fleets with Secure Boot) or change Secure Boot policy for this host class.
Rebuild initramfs correctly (by scenario)
Scenario A: The system boots an older kernel, but the newest kernel fails
This is the best-case failure. You have a stable userland to repair from.
Do not “fix” by removing the new kernel unless you’re truly boxed in; you’ll just defer the problem to the next upgrade window.
- Free space in /boot if needed. That’s step zero, not step seven.
- Ensure headers for the target kernel are installed (especially if DKMS is involved).
- Repair DKMS modules for the target kernel.
- Regenerate initramfs for that kernel.
- Update GRUB and reboot once.
cr0x@server:~$ sudo apt-get update
Hit:1 Ubuntu noble InRelease
Reading package lists... Done
cr0x@server:~$ sudo apt-get install -y linux-headers-6.8.0-50-generic
Reading package lists... Done
Building dependency tree... Done
The following NEW packages will be installed:
linux-headers-6.8.0-50-generic
Setting up linux-headers-6.8.0-50-generic (6.8.0-50.51) ...
Meaning: Headers are present. DKMS builds have a fighting chance.
Decision: If headers were missing, expect DKMS failures before. Re-run DKMS install/build now.
cr0x@server:~$ sudo dkms autoinstall -k 6.8.0-50-generic
Sign command: /usr/bin/kmodsign
Signing key: /var/lib/shim-signed/mok/MOK.priv
Public certificate (MOK): /var/lib/shim-signed/mok/MOK.der
Installing module zfs/2.2.2 for kernel 6.8.0-50-generic
Meaning: DKMS built/installed modules for the target kernel. If signing is configured, it signs them.
Decision: If DKMS fails, read the build log and fix that first. Don’t rebuild initramfs while DKMS is broken if boot depends on those modules.
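DKMS keeps per-build logs under /var/lib/dkms; the make.log for the failing module and kernel combination is where the real compiler error lives. A rough example, using the nvidia version from the dkms status output above (paths vary by module and version):
cr0x@server:~$ sudo tail -n 30 /var/lib/dkms/nvidia/550.90.07/build/make.log
Fix whatever that shows (missing headers, compiler mismatch, needed patch), rerun dkms autoinstall, and only then regenerate the initramfs.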
cr0x@server:~$ sudo update-initramfs -u -k 6.8.0-50-generic
update-initramfs: Generating /boot/initrd.img-6.8.0-50-generic
cr0x@server:~$ sudo update-grub
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-6.8.0-50-generic
Found initrd image: /boot/initrd.img-6.8.0-50-generic
done
Scenario B: Dropped into initramfs BusyBox, cannot mount root
If you’re already in the initramfs shell, you’re in a restricted environment. The goal is to gather enough signal to decide:
“missing module/firmware” vs “wrong root= parameter” vs “filesystem corruption.”
In the initramfs prompt, check what the kernel can see and what it can load.
(initramfs) cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-6.8.0-50-generic root=UUID=8b7f5f4a-2f15-4e5a-a6b3-7c9e0c03c7b0 ro quiet splash
Meaning: Root is specified by UUID. Good. Now verify the UUID appears in block devices.
Decision: If root is specified as /dev/sdX and the letter changed, fix GRUB/cmdline and fstab to use UUID.
(initramfs) ls /dev/disk/by-uuid | head
2c1f9b55-0a69-4b59-9c4d-2b0a7dc3c77a
Meaning: If your root UUID is missing here, the kernel can’t see the disk or the partition table.
Decision: That points to missing storage driver, missing firmware, or hardware issue. Rebuild initramfs from a working kernel or rescue environment.
(initramfs) modprobe nvme
modprobe: module nvme not found in modules.dep
Meaning: The initramfs does not contain the nvme module or dependency metadata.
Decision: You need to boot another kernel (or rescue media), ensure linux-modules is installed, and regenerate initramfs with nvme included.
In this scenario, don’t attempt clever repairs inside the broken initramfs unless you know exactly what you’re doing.
It’s typically faster to reboot into an older kernel, or use recovery mode, fix, rebuild, and reboot.
Scenario C: ZFS root or “root on exotic storage” fails after update
ZFS is a frequent case because it often relies on out-of-tree modules (via DKMS) and is required before root mounts.
If ZFS modules aren’t present in initramfs, boot stops early.
Your priority order:
DKMS status → module presence → initramfs regeneration → verify initramfs contains ZFS → reboot.
cr0x@server:~$ dkms status | grep -i zfs
zfs/2.2.2, 6.8.0-50-generic, x86_64: installed
cr0x@server:~$ lsinitramfs /boot/initrd.img-6.8.0-50-generic | grep -E '/zfs\.ko|/zcommon\.ko' | head
usr/lib/modules/6.8.0-50-generic/updates/dkms/zfs.ko
usr/lib/modules/6.8.0-50-generic/updates/dkms/zcommon.ko
Meaning: ZFS modules are embedded. If they’re not, boot will fail before importing pools.
Decision: If missing, ensure initramfs-tools hooks for ZFS are present (package state) and rerun update-initramfs.
Scenario D: NVIDIA or other third-party runtime modules fail post-boot
For runtime modules, initramfs can still matter (if you want early KMS, encrypted root prompts over GPU console, etc.),
but most of the time it’s a DKMS/signing issue.
cr0x@server:~$ dkms status | grep -i nvidia
nvidia/550.90.07, 6.8.0-50-generic, x86_64: installed
cr0x@server:~$ modprobe nvidia
modprobe: ERROR: could not insert 'nvidia': Key was rejected by service
Meaning: Signature policy rejection (common with Secure Boot). The module exists but is not trusted.
Decision: Either sign the module with an enrolled key, enroll MOK, or disable Secure Boot for that machine class.
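A sketch of the MOK route, assuming the shim-signed key paths that appeared in the DKMS output earlier (your fleet may manage its own key material): enroll the public certificate once, then make sure new modules are signed with the matching private key. The module path below is illustrative.
cr0x@server:~$ sudo mokutil --import /var/lib/shim-signed/mok/MOK.der
input password:
input password again:
cr0x@server:~$ sudo kmodsign sha512 /var/lib/shim-signed/mok/MOK.priv /var/lib/shim-signed/mok/MOK.der /lib/modules/6.8.0-50-generic/updates/dkms/nvidia.ko
On the next reboot, the MOK manager asks for that password to complete enrollment; after that, modules signed with the enrolled key load under Secure Boot. DKMS can also do the signing automatically when its signing configuration points at the same key pair, as the autoinstall output earlier shows.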
Joke #2: Secure Boot is great until it decides your carefully built module is “unauthorized,” like a bouncer at a kernel nightclub.
Three corporate mini-stories from the trenches
1) The incident caused by a wrong assumption: “The initramfs always rebuilds on upgrade”
A mid-sized company ran a fleet of Ubuntu servers hosting internal CI runners and artifact caches.
Nothing exotic—until you notice that their “standard image” had a tiny /boot partition inherited from a 2018-era layout.
It worked for years because kernels were infrequent and no one watched /boot usage.
One Tuesday, an unattended upgrade pulled a new kernel. The package installation succeeded, but /boot hit 100% while generating the initramfs.
The install process logged errors, but the upgrade pipeline didn’t treat it as fatal. The servers kept running on the old kernel.
Everything looked green.
The next scheduled reboot window arrived. Machines rebooted into a kernel whose initramfs was missing or truncated.
Half the fleet dropped into initramfs BusyBox with missing root devices. The other half booted only because they accidentally selected the older kernel from GRUB defaults.
The incident commander assumed “it can’t be initramfs; the upgrade would have rebuilt it.”
The fix was unglamorous: expand /boot on the base image, enforce a check that initramfs generation succeeded, and keep at least one old kernel as a fallback.
They also changed their reboot automation: it now verifies that /boot/vmlinuz and /boot/initrd pairs exist for the default entry before restarting.
The wrong assumption wasn’t technical ignorance. It was operational laziness: confusing “package scripts ran” with “the produced artifact is valid.”
In ops, you don’t get credit for intending to generate an initramfs.
2) The optimization that backfired: “Trim initramfs to speed boot”
Another organization ran latency-sensitive workloads and had a habit of shaving milliseconds off anything that moved.
Someone noticed initramfs size and decided it was “bloat.” They edited initramfs-tools configs to be aggressive:
fewer modules, fewer hooks, and a belief that udev would discover everything later.
It worked on their primary hardware for a while. Boot time improved slightly. The change was rolled across the fleet with little fanfare.
The trick, of course, is that fleets are never uniform, and reality loves edge cases.
A new batch of servers arrived with a different storage controller and required a module that wasn’t built-in to the kernel.
The trimmed initramfs didn’t include it. The machines failed to mount root and never reached configuration management.
That meant they couldn’t self-heal, couldn’t pull fixes, and couldn’t even phone home properly.
The postmortem was painful because the “optimization” was correct on the old hardware and wrong on the new.
The lesson was not “never optimize.” It was: if you’re going to optimize early boot artifacts, you must have hardware-aware profiles or include broad storage drivers by default.
Boot correctness beats boot speed. Every time.
3) The boring but correct practice that saved the day: “Always keep a known-good boot path”
A finance org ran Ubuntu hosts with encrypted root and remote hands that cost real money.
Their rule was old-school: never remove the previous kernel until the new one has booted successfully at least once,
and always have console access tested quarterly. Boring. Predictable. Very unsexy.
One upgrade cycle, a DKMS module failed to build for the newest kernel due to a toolchain mismatch.
The initramfs for the new kernel was generated, but it didn’t contain the required module for their root stack.
The next reboot would have been a brick.
But they didn’t reboot blindly. Their change process included a pre-reboot check:
verify DKMS status for the target kernel, verify initramfs contains required modules, verify /boot has space,
and verify GRUB points at a matching kernel+initrd pair. The pre-check failed, so the reboot was postponed.
They patched the DKMS build issue, regenerated initramfs, then rebooted in a controlled window.
No outage. No heroics. No war room. The best incident is the one that never gets invited.
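If you want to copy the idea, here is a minimal sketch of such a pre-reboot gate; the kernel version, required modules, and threshold are example values you’d set per host class, not a standard.
cr0x@server:~$ cat pre-reboot-check.sh
#!/bin/bash
# Pre-reboot gate: refuse to proceed if the target kernel's boot chain looks incomplete.
set -u
KERNEL="6.8.0-50-generic"            # kernel you intend to boot (example value)
REQUIRED_MODULES="nvme virtio_blk"   # modules your root storage needs (example values)
fail() { echo "PRE-REBOOT CHECK FAILED: $*" >&2; exit 1; }

# 1. /boot needs headroom before you touch initramfs or reboot.
usage=$(df -P /boot | awk 'NR==2 {gsub("%","",$5); print $5}')
[ "$usage" -lt 90 ] || fail "/boot is ${usage}% full"

# 2. Kernel, initramfs, and module tree must exist as a matching set.
[ -e "/boot/vmlinuz-$KERNEL" ]    || fail "missing vmlinuz-$KERNEL"
[ -e "/boot/initrd.img-$KERNEL" ] || fail "missing initrd.img-$KERNEL"
[ -d "/lib/modules/$KERNEL" ]     || fail "missing module tree for $KERNEL"

# 3. Every DKMS module should report "installed" for the target kernel.
if dkms status -k "$KERNEL" 2>/dev/null | grep -qv ': installed'; then
  fail "DKMS modules not installed for $KERNEL"
fi

# 4. The initramfs must contain the modules the root device depends on.
for m in $REQUIRED_MODULES; do
  lsinitramfs "/boot/initrd.img-$KERNEL" | grep -q "/${m}\.ko" \
    || fail "module $m not found in initrd.img-$KERNEL"
done
echo "pre-reboot checks passed for $KERNEL"
Run it from your reboot automation and treat any non-zero exit as “postpone the window,” which is exactly what saved this team.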
Common mistakes: symptom → root cause → fix
1) Drops to initramfs shell: “ALERT! UUID=… does not exist”
Root cause: Storage driver missing from initramfs, or initramfs built for a different kernel than the one booted.
Sometimes it’s a missing firmware blob for the storage/NIC device.
Fix: Boot a known-good kernel, ensure correct linux-modules packages installed, then update-initramfs -u -k <target> and update-grub.
Verify with lsinitramfs that the needed module is present.
2) “Invalid module format” when loading a module
Root cause: Module was built for another kernel version/build flags (vermagic mismatch) or stale DKMS artifacts.
Fix: Install matching headers, rebuild DKMS module for the target kernel, run depmod -a <kernel>, then rebuild initramfs.
3) “Key was rejected by service” for NVIDIA/ZFS/vendor modules
Root cause: Secure Boot enforcing module signature policy; module is unsigned or signed with an untrusted key.
Fix: Choose: enroll a Machine Owner Key and sign modules (fleet-friendly), or disable Secure Boot (simpler but policy-dependent).
Rebuild initramfs if early boot needs that module.
4) Kernel installs but initrd is missing or tiny
Root cause: /boot full, interrupted upgrade, filesystem errors, or hook scripts failing.
Fix: Free space, run filesystem checks if needed, rerun update-initramfs. Confirm artifact size and timestamps.
5) Boot works only with older kernel after update
Root cause: DKMS not built for the new kernel, or missing linux-modules-extra package for the new kernel.
Fix: Repair DKMS status, install missing packages, regenerate initramfs, and update GRUB.
6) “unpacking initramfs failed” or early kernel panic
Root cause: Corrupted initramfs image (disk full at write time, bad storage, or manual edits), or incompatible compression tooling.
Fix: Recreate initramfs from scratch with update-initramfs -c -k <kernel>, and verify /boot filesystem health and free space.
7) Root mounts, but devices missing later
Root cause: Module package mismatch or missing firmware package; initramfs wasn’t rebuilt after firmware/module changes.
Fix: Install firmware packages as needed, run update-initramfs -u -k all, reboot.
Checklists / step-by-step plan
Checklist 1: The “don’t make it worse” rules
- Do not remove the only working kernel. Keep at least one known-good entry until the new one boots.
- Do not rebuild initramfs while /boot is full.
- Do not assume DKMS succeeded because APT exited 0. Verify DKMS status.
- Do not change Secure Boot policy mid-incident without recording the decision and the rollback path.
- Do not debug module issues without confirming which kernel you’re booting.
Checklist 2: Step-by-step recovery when you can boot an older kernel
- Confirm current kernel:
cr0x@server:~$ uname -r
6.8.0-49-generic
Decision: If you’re already on the newest kernel, your issue is likely runtime/DKMS/signing, not initramfs for a different kernel.
- Identify the target kernel you want to boot:
cr0x@server:~$ dpkg -l | awk '$1 == "ii" && $2 ~ /^linux-image-[0-9]/ {print $2}' | sort -V | tail -n 2
linux-image-6.8.0-49-generic
linux-image-6.8.0-50-generic
Decision: Target is usually the latest installed.
- Check /boot space:
cr0x@server:~$ df -h /boot
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda2       974M  620M  305M  68% /boot
Decision: If >90% used, clean up old kernels before you regenerate.
- Ensure the modules tree exists:
cr0x@server:~$ test -d /lib/modules/6.8.0-50-generic && echo ok
ok
Decision: If not ok, reinstall the kernel modules packages.
- Ensure headers exist if DKMS modules are in play:
cr0x@server:~$ dpkg -l | awk '$1 == "ii" && $2 == "linux-headers-6.8.0-50-generic" {print $2, $3}'
linux-headers-6.8.0-50-generic 6.8.0-50.51
Decision: If missing, install headers and rerun DKMS.
- Repair DKMS and check status:
cr0x@server:~$ sudo dkms autoinstall -k 6.8.0-50-generic
Installing module zfs/2.2.2 for kernel 6.8.0-50-generic
Decision: If failures appear, open the DKMS build log and fix compilation/signing before proceeding.
- Regenerate initramfs for the target kernel:
cr0x@server:~$ sudo update-initramfs -u -k 6.8.0-50-generic
update-initramfs: Generating /boot/initrd.img-6.8.0-50-generic
Decision: If you see errors, do not reboot. Fix the error. Warnings require judgment.
- Verify required modules are inside initramfs:
cr0x@server:~$ lsinitramfs /boot/initrd.img-6.8.0-50-generic | grep -E '/(nvme|virtio_blk|dm-crypt|zfs)\.ko' | head
usr/lib/modules/6.8.0-50-generic/kernel/drivers/nvme/host/nvme.ko.zst
Decision: If the module you require for root isn’t there, you need initramfs-tools configuration changes or missing packages.
- Update bootloader config:
cr0x@server:~$ sudo update-grub
Found linux image: /boot/vmlinuz-6.8.0-50-generic
Found initrd image: /boot/initrd.img-6.8.0-50-generic
done
Decision: If GRUB doesn’t see the initrd, fix /boot naming and regenerate again.
- Reboot once, then validate:
cr0x@server:~$ sudo reboot
...connection closed...
Decision: After boot, verify uname -r and check logs for module load errors.
Checklist 3: If you must rebuild initramfs from a rescue environment
Use this when you can’t boot any installed kernel. The high-level steps are consistent:
mount root, mount /boot, bind mount /dev /proc /sys, chroot, then rebuild.
The details vary by your storage stack, but the discipline is the same.
cr0x@server:~$ sudo mount /dev/mapper/vg0-root /mnt
cr0x@server:~$ sudo mount /dev/sda2 /mnt/boot
cr0x@server:~$ for i in dev proc sys run; do sudo mount --bind /$i /mnt/$i; done
cr0x@server:~$ sudo chroot /mnt /bin/bash
cr0x@server:/# update-initramfs -u -k 6.8.0-50-generic
update-initramfs: Generating /boot/initrd.img-6.8.0-50-generic
cr0x@server:/# update-grub
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-6.8.0-50-generic
Found initrd image: /boot/initrd.img-6.8.0-50-generic
done
Decision: If update-initramfs fails inside chroot, fix package state (headers, modules, DKMS) within the chroot, not in the rescue OS.
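One hedged addition for UEFI systems: mount the EFI system partition inside the chroot before touching the bootloader itself (the /dev/sda1 device here is illustrative).
cr0x@server:/# mount /dev/sda1 /boot/efi
update-grub only rewrites /boot/grub/grub.cfg and works without the ESP mounted, but grub-install and grub/shim package upgrades write into /boot/efi and need it mounted (and, for NVRAM updates, efivarfs available inside the chroot).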
FAQ
1) Should I use update-initramfs -u -k all or target one kernel?
Target one kernel when you’re in incident mode and want to reduce variables.
Use -k all after you’ve fixed systemic issues (DKMS, firmware) and want consistency across fallback kernels.
2) What’s the difference between -u and -c for update-initramfs?
-u updates an existing initramfs image; -c creates a new one from scratch.
If you suspect corruption or a bad incremental result, use -c for the target kernel.
3) I rebuilt initramfs, but it still can’t find root. Now what?
Verify the module is present inside initramfs (lsinitramfs | grep), verify the root UUID exists in /dev/disk/by-uuid in the initramfs shell,
and verify your kernel command line (/proc/cmdline). If the UUID doesn’t exist, you’re missing a storage driver/firmware or have a real hardware problem.
4) DKMS shows “built” but not “installed.” Does that matter?
Yes. “Built” means it compiled; “installed” means it placed the module where the kernel will load it from and ran depmod integration.
For boot-critical modules (ZFS root), you want “installed” for the target kernel.
5) How do I tell if I need linux-modules-extra?
If a module you expect (filesystem, storage driver, uncommon NIC) is missing from /lib/modules/<kernel>,
it may live in the “extra” package depending on Ubuntu packaging.
Check with modinfo -k <kernel> <module>; if it fails, install the extra modules package for that kernel flavor.
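As a concrete example, hedged to the kernel version used throughout this article, the package name follows the kernel release and flavor:
cr0x@server:~$ sudo apt-get install linux-modules-extra-6.8.0-50-generic
After installing, verify with modinfo -k again and regenerate the initramfs if the module is needed at boot.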
6) Are firmware warnings during initramfs generation fatal?
Not automatically. They become fatal if the firmware is needed to initialize hardware required for boot (some storage/NIC devices).
If the system fails early and the warning mentions your storage or boot-critical NIC, treat it as likely relevant.
7) I use LUKS + LVM. What should I verify in initramfs?
Make sure dm-crypt and LVM tools are present and the initramfs includes the relevant hooks.
Practically: check that the modules and binaries exist in initramfs and that your /etc/crypttab is correct.
Then regenerate initramfs for the target kernel.
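A rough way to verify both pieces inside the image (paths reflect Ubuntu’s initramfs-tools layout; adjust the kernel version):
cr0x@server:~$ lsinitramfs /boot/initrd.img-6.8.0-50-generic | grep -E 'dm-crypt\.ko|cryptsetup|lvm' | head
Expect the dm-crypt module plus cryptsetup and lvm binaries; if any are missing, check that the cryptsetup-initramfs and lvm2 packages are installed, then regenerate.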
8) Can I fix this by pinning the old kernel and ignoring the new one?
Temporarily, yes, as a containment move. Operationally, it’s a debt bomb.
You still need to repair the build chain (DKMS, headers, /boot space, signing) or you’ll repeat the outage later under worse conditions.
9) What if GRUB is booting the wrong kernel even after update?
Confirm what GRUB thinks is default, regenerate config with update-grub, and verify that the referenced initrd exists.
Also check for manual edits in /etc/default/grub that pin an older kernel.
10) Why does this happen “randomly” after updates?
It’s not random. It’s usually one of: /boot capacity, interrupted upgrades, DKMS build failures, Secure Boot policy, missing headers,
or an initramfs hook failing. Those are deterministic problems with noisy symptoms.
Next steps you should actually do
If you’re here because production just ate dirt, do the disciplined fix:
boot a known-good kernel, confirm /boot has space, verify headers and DKMS status for the target kernel,
rebuild initramfs for that kernel, verify required modules are embedded, update GRUB, reboot once.
No more. No less.
Then prevent the sequel. Add checks to your upgrade and reboot workflows:
alert on /boot usage, treat initramfs generation errors as failures, verify DKMS status per kernel,
and keep at least one fallback kernel until the new one is proven. The boring practices aren’t glamorous,
but they’re cheaper than downtime and less exciting than a 2 a.m. console session.