Ubuntu 24.04: System Updates Broke Modules — Rebuild initramfs Correctly (Case #88)

You patch Ubuntu, reboot like a responsible adult, and the machine rewards you with an initramfs prompt, missing NICs, or a root filesystem that suddenly “doesn’t exist.”
Welcome to case #88: the update didn’t “break Linux.” It broke the contract between your kernel, your modules, and the initramfs that’s supposed to glue them together at boot.

The fix is usually boring: rebuild initramfs correctly, for the kernel you’re actually booting, with the modules you actually need, and without lying to yourself about Secure Boot, DKMS, or ZFS.
The trick is doing it with enough discipline that it stays fixed after the next update.

What actually broke: the boot chain and the module contract

On Ubuntu 24.04, the classic initramfs story still applies: GRUB loads a kernel (vmlinuz) and an initramfs image (initrd.img).
The initramfs is a tiny root filesystem in RAM that contains early userspace, scripts, and a curated set of kernel modules required to find and mount the real root filesystem.

When an update “breaks modules,” it’s rarely the kernel forgetting how to do its job. It’s usually one of these:

  • You’re booting a different kernel than you think (GRUB default changed, old kernel removed, or fallback entry chosen).
  • Your initramfs doesn’t match your kernel (wrong version, stale image, failed regeneration).
  • Modules exist on disk but weren’t included in initramfs (bad hooks, wrong config, missing dependency metadata).
  • DKMS didn’t build (headers missing, compiler mismatch, Secure Boot signing failure).
  • Secure Boot is blocking unsigned modules (NVIDIA, ZFS DKMS, third-party drivers).
  • Storage stack changed (new initramfs lacks LUKS, LVM, RAID, NVMe, HBA modules, or ZFS bits).

The operational mindset is simple: the boot path is a pipeline. Find which stage has stale artifacts and rebuild that stage.
Don’t flail. Don’t “just reinstall the OS.” That’s not engineering; that’s capitulation.

Interesting facts and context (because history repeats at 3 a.m.)

  1. initramfs replaced initrd largely because a compressed cpio archive is more flexible than a fixed-size ramdisk image.
  2. Ubuntu’s default tooling is initramfs-tools with update-initramfs, while other distros lean on dracut; muscle memory can sabotage you.
  3. “Early userspace” became mainstream when storage got complicated: LVM-on-LUKS-on-MDRAID isn’t something you want hardcoded into the kernel.
  4. DKMS exists because out-of-tree modules are a fact of life: vendor NICs, NVIDIA, and specialty filesystems keep showing up in real fleets.
  5. Secure Boot made module loading political: the kernel can be happy, and still refuse your module because the signature story isn’t.
  6. ZFS-on-root is operationally delightful until it isn’t: if your initramfs doesn’t contain ZFS bits, you’re not mounting root no matter how correct your pool is.
  7. GRUB can boot “fine” while the OS is broken: the handoff succeeds, and then initramfs can’t find root; people blame GRUB anyway.
  8. Kernel ABI bumps are intentional: Ubuntu kernel packages change uname -r, and any third-party modules must match exactly.

One paraphrased idea from W. Edwards Deming: Quality comes from improving the process, not from inspecting failures after they happen.
Reliability work is mostly building processes that don’t create broken initramfs images in the first place.

Fast diagnosis playbook

When a reboot goes sideways after updates, time matters. This is the shortest path to truth.
Check these in order; stop as soon as you find the mismatch.

First: Are you booting the kernel you think you are?

  • If you can log in: compare uname -r to what you believe you installed.
  • If you can’t: use GRUB’s “Advanced options” to boot the previous kernel and get a shell.

Second: Does initramfs exist and match that kernel?

  • Check /boot/initrd.img-$(uname -r) exists and is recent.
  • Inspect whether required modules are present inside the image (storage + crypto + platform drivers).
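
A minimal sketch of the first two checks as one function, assuming the standard Ubuntu paths (/boot/vmlinuz-KVER, /boot/initrd.img-KVER, /lib/modules/KVER). The name check_kernel_artifacts and the "initramfs older than the kernel image" staleness heuristic are illustrative assumptions, not part of any Ubuntu tool.

```shell
# check_kernel_artifacts KVER [ROOT]
# ROOT defaults to "/"; pass a mount point such as /mnt when working from a live ISO.
check_kernel_artifacts() (
    kver=$1
    root=${2:-/}
    root=${root%/}                      # avoid a double slash when ROOT is "/"
    kernel="$root/boot/vmlinuz-$kver"
    initrd="$root/boot/initrd.img-$kver"
    moddep="$root/lib/modules/$kver/modules.dep"
    rc=0

    [ -s "$kernel" ] || { echo "MISSING kernel image: $kernel"; rc=1; }
    [ -s "$moddep" ] || { echo "MISSING or empty: $moddep"; rc=1; }

    if [ ! -s "$initrd" ]; then
        echo "MISSING or empty initramfs: $initrd"; rc=1
    elif [ -e "$kernel" ] && [ "$initrd" -ot "$kernel" ]; then
        # Heuristic only: an image older than its kernel was not regenerated.
        echo "STALE initramfs (older than kernel image): $initrd"; rc=1
    fi

    [ "$rc" -eq 0 ] && echo "OK: $kver artifacts look aligned"
    return "$rc"
)
```

Typical use would be check_kernel_artifacts "$(uname -r)" on a running system, or check_kernel_artifacts 6.8.0-41-generic /mnt against a mounted root. A pass here only says the files exist and are ordered sensibly; it says nothing about what is inside the image.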

Third: Did module builds fail (DKMS / headers / Secure Boot)?

  • Look at dkms status and package states.
  • Check journalctl -b -0 for module signature and load failures.

Fourth: Is the root device discovery failing?

  • “Gave up waiting for root file system device” means missing drivers or wrong UUIDs.
  • Verify UUIDs in /etc/fstab, crypttab, and GRUB cmdline.
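
To compare what the kernel was told to mount with what actually exists, it helps to pull root= out of the command line. A small sketch; cmdline_root is a hypothetical helper name, and it only extracts the value without resolving it.

```shell
# cmdline_root "CMDLINE": print the value of every root= parameter found.
cmdline_root() (
    set -f                              # no glob expansion while word-splitting
    for word in $1; do
        case "$word" in
            root=*) printf '%s\n' "${word#root=}" ;;
        esac
    done
)

# Example (not run here):
#   cmdline_root "$(cat /proc/cmdline)"
# A result like UUID=... can then be looked up with `blkid -U`;
# a result like /dev/mapper/cryptroot should match the name in /etc/crypttab.
```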

Joke #1 (short, relevant): The initramfs is like an emergency kit—you only notice what’s missing when you’re already cold and annoyed.

Hands-on tasks (commands, outputs, decisions)

These tasks are designed for Ubuntu 24.04 systems using initramfs-tools (the default).
Each task includes: a command, realistic output, what it means, and the decision you make.
Run them from a working boot (maybe an older kernel), a rescue shell, or a live ISO with the root filesystem mounted.

Task 1 — Confirm the running kernel and whether you’re in a fallback boot

cr0x@server:~$ uname -r
6.8.0-41-generic

Meaning: Your active kernel is 6.8.0-41-generic. Everything you debug must match this version.
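Decision: Use this exact string when checking module directories and initramfs images. No guessing, no “close enough.” If you booted a fallback entry to get here, this is the fallback’s version; the kernel you need to repair is the newer one that failed.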

Task 2 — List installed kernels (and spot half-removed packages)

cr0x@server:~$ dpkg -l 'linux-image-*' | awk '/^ii|^rc/ {print $1, $2, $3}' | head -n 12
ii linux-image-6.8.0-40-generic 6.8.0-40.40
ii linux-image-6.8.0-41-generic 6.8.0-41.41
ii linux-image-generic-hwe-24.04 6.8.0-41.41
rc linux-image-6.8.0-39-generic 6.8.0-39.39

Meaning: You have multiple kernels, and one is in “rc” (removed, config remains).
Decision: If you’re missing a known-good prior kernel, reinstall it. Keep at least one fallback kernel until the system proves stable.
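
If you want the fallback check to be mechanical rather than eyeballed, the dpkg listing can be filtered. A sketch, assuming the usual `dpkg -l` column layout; installed_kernels is an illustrative name, and it deliberately ignores metapackages and "rc" leftovers.

```shell
# installed_kernels: reads `dpkg -l 'linux-image-*'` output on stdin and prints
# the version suffix of every fully installed ("ii") versioned kernel image.
installed_kernels() {
    awk '$1 == "ii" && $2 ~ /^linux-image-[0-9]/ {
        sub(/^linux-image-/, "", $2)
        print $2
    }'
}

# Example (not run here):
#   n=$(dpkg -l 'linux-image-*' | installed_kernels | wc -l)
#   [ "$n" -ge 2 ] || echo "WARNING: no fallback kernel installed"
```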

Task 3 — Verify initramfs image exists for the running kernel

cr0x@server:~$ ls -lh /boot/initrd.img-$(uname -r)
-rw-r--r-- 1 root root 114M Dec 31 09:12 /boot/initrd.img-6.8.0-41-generic

Meaning: The initramfs file exists and is plausible in size.
Decision: If it’s missing, tiny, or timestamped before the update, you rebuild it. If it looks right, you still may need to inspect its contents.

Task 4 — Check whether the module directory exists for that kernel

cr0x@server:~$ ls -ld /lib/modules/$(uname -r)
drwxr-xr-x 7 root root 4096 Dec 31 09:10 /lib/modules/6.8.0-41-generic

Meaning: Modules for the running kernel exist on disk.
Decision: If this directory is missing, your kernel install is incomplete (or you’re booted into a kernel without modules). Fix packages first.

Task 5 — Confirm module dependency metadata is present (depmod state)

cr0x@server:~$ ls -1 /lib/modules/$(uname -r)/modules.dep /lib/modules/$(uname -r)/modules.alias
/lib/modules/6.8.0-41-generic/modules.alias
/lib/modules/6.8.0-41-generic/modules.dep

Meaning: depmod has produced dependency maps.
Decision: If these are missing or empty, run depmod -a and rebuild initramfs. initramfs-tools depends on this metadata to pull in drivers.

Task 6 — Identify the root device type (NVMe/SATA/LVM/LUKS/ZFS)

cr0x@server:~$ findmnt -no SOURCE,TARGET,FSTYPE /
/dev/mapper/cryptroot / ext4

Meaning: Root is on a dm-crypt mapping named cryptroot.
Decision: initramfs must include dm-crypt and whatever storage driver sits below it (NVMe, AHCI, RAID, etc.). If boot fails, suspect missing crypto/storage modules or stale crypttab.

Task 7 — Check crypttab and fstab UUIDs are coherent

cr0x@server:~$ sudo grep -v '^\s*#' /etc/crypttab
cryptroot UUID=3b5a2d2c-9f1a-4c3b-bfb6-9f8c7e4a1f0e none luks,discard
cr0x@server:~$ sudo blkid | grep 3b5a2d2c-9f1a-4c3b-bfb6-9f8c7e4a1f0e
/dev/nvme0n1p3: UUID="3b5a2d2c-9f1a-4c3b-bfb6-9f8c7e4a1f0e" TYPE="crypto_LUKS" PARTUUID="b6a0..."

Meaning: crypttab points to a real device UUID.
Decision: If the UUIDs don’t match, initramfs waits for a device that will never appear, then gives up and drops you to a shell. Fix crypttab/fstab, then rebuild initramfs.
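
The same check can be looped over every crypttab entry. A minimal sketch, assuming entries use the UUID= form in the second field as in the example above; crypttab_uuids and check_crypttab are hypothetical helper names, and `blkid -U` may need root to probe devices.

```shell
# crypttab_uuids FILE: print each UUID referenced in field 2 of a crypttab.
crypttab_uuids() {
    awk '!/^[[:space:]]*#/ && $2 ~ /^UUID=/ {
        sub(/^UUID=/, "", $2)
        print $2
    }' "$1"
}

# check_crypttab FILE: report UUIDs that blkid cannot resolve to a device.
check_crypttab() (
    rc=0
    for uuid in $(crypttab_uuids "$1"); do
        if dev=$(blkid -U "$uuid"); then
            echo "OK      $uuid -> $dev"
        else
            echo "MISSING $uuid"; rc=1
        fi
    done
    return "$rc"
)

# Example (not run here):  sudo sh -c '. ./this-file; check_crypttab /etc/crypttab'
```

Entries that name a device path or PARTUUID instead of UUID= are skipped by this sketch and still need a manual look.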

Task 8 — Inspect required modules inside the initramfs image

cr0x@server:~$ lsinitramfs /boot/initrd.img-$(uname -r) | grep -E 'cryptsetup|dm-crypt|nvme|ahci|zfs' | head
usr/sbin/cryptsetup
lib/modules/6.8.0-41-generic/kernel/drivers/md/dm-crypt.ko.zst
lib/modules/6.8.0-41-generic/kernel/drivers/nvme/host/nvme.ko.zst

Meaning: initramfs contains cryptsetup and the dm-crypt and NVMe modules.
Decision: If your critical module is absent, you fix initramfs-tools configuration or hooks, then rebuild.

Task 9 — Verify whether initramfs rebuilds are failing (look at logs)

cr0x@server:~$ sudo grep "update-initramfs" /var/log/apt/term.log | tail -n 6
2025-12-31 09:11:57 update-initramfs: Generating /boot/initrd.img-6.8.0-41-generic
2025-12-31 09:12:03 cryptsetup: WARNING: Resume target cryptswap uses a key file

Meaning: initramfs was generated during the upgrade; warnings exist but no hard failure shown here.
Decision: If you see “failed” or “cannot find module,” treat it as a hard stop: fix the underlying issue and regenerate again.

Task 10 — Check DKMS status for out-of-tree modules

cr0x@server:~$ dkms status
nvidia/550.90.07, 6.8.0-41-generic, x86_64: installed
zfs/2.2.2, 6.8.0-41-generic, x86_64: built

Meaning: NVIDIA is installed for this kernel; ZFS is built (not necessarily installed).
Decision: Anything not “installed” for your kernel is a likely reason modules don’t load. Rebuild DKMS modules and re-run initramfs.
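
To turn that decision into a gate, the status output can be filtered for the target kernel. A sketch, assuming the comma-and-colon line format shown above; dkms_not_installed is an illustrative name. Modules that dkms lists without any kernel version (for example, only "added") will not show up here and need a separate look.

```shell
# dkms_not_installed KVER: reads `dkms status` output on stdin and prints every
# line for KVER whose state is not "installed". Exit status 1 if any were found.
dkms_not_installed() {
    awk -F'[,:] *' -v kver="$1" '
        {
            hit = 0
            for (i = 1; i < NF; i++) if ($i == kver) hit = 1
        }
        hit && $NF !~ /^installed/ { print; bad = 1 }
        END { exit bad + 0 }
    '
}

# Example (not run here):
#   dkms status | dkms_not_installed "$(uname -r)" || echo "fix DKMS before rebuilding initramfs"
```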

Task 11 — Confirm Secure Boot state (and anticipate signature failures)

cr0x@server:~$ mokutil --sb-state
SecureBoot enabled

Meaning: Secure Boot is enabled; unsigned third-party modules may fail to load.
Decision: If module loading fails with signature errors, either enroll a MOK and sign modules, or disable Secure Boot. Pick one; don’t pretend you can ignore it.

Task 12 — Detect module load failures in the current boot

cr0x@server:~$ sudo journalctl -k -b -0 | grep -E "module verification failed|Unknown symbol|taint|Lockdown" | tail -n 8
Dec 31 09:13:01 server kernel: Lockdown: modprobe: unsigned module loading is restricted; see man kernel_lockdown.7
Dec 31 09:13:01 server kernel: nvidia: module verification failed: signature and/or required key missing - tainting kernel

Meaning: The kernel is in lockdown mode; unsigned modules may be blocked or taint the kernel depending on policy.
Decision: If the module is blocked (not merely tainting), your fix is signing/enrolling keys or disabling Secure Boot. Rebuilding initramfs alone won’t help.
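
Before rebooting, you can combine the Secure Boot state with the module's signer field to guess how a load will go. This is a rough sketch, not a policy engine: signing_verdict is a hypothetical helper, it only looks at whether a signer string exists, and it cannot tell whether that signer's key is actually enrolled.

```shell
# signing_verdict "SB_STATE" "SIGNER"
#   SB_STATE: output of `mokutil --sb-state`
#   SIGNER:   output of `modinfo -F signer MODULE` (empty for an unsigned module)
signing_verdict() {
    case "$1" in
        *"SecureBoot enabled"*)
            if [ -n "$2" ]; then
                echo "signed by: $2 (check that this key is enrolled)"
            else
                echo "unsigned: expect the load to be refused under Secure Boot"
            fi
            ;;
        *)
            echo "Secure Boot not enabled: signature unlikely to block loading"
            ;;
    esac
}

# Example (not run here):
#   signing_verdict "$(mokutil --sb-state)" "$(modinfo -F signer nvidia)"
```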

Task 13 — Verify you didn’t accidentally switch initramfs generators

cr0x@server:~$ dpkg -l | awk '/initramfs-tools|dracut/ {print $1, $2, $3}'
ii initramfs-tools 0.142ubuntu25.2

Meaning: You’re using initramfs-tools. Good: stick with one system unless you have a policy to migrate.
Decision: If dracut is installed and taking over, confirm which tool updates your initrd and standardize. Mixed tooling is a slow-motion outage.

Task 14 — Validate boot artifacts and symlinks in /boot

cr0x@server:~$ ls -l /boot | grep -E 'vmlinuz|initrd|System.map' | head -n 10
lrwxrwxrwx 1 root root      27 Dec 31 09:12 initrd.img -> initrd.img-6.8.0-41-generic
-rw-r--r-- 1 root root 119458690 Dec 31 09:12 initrd.img-6.8.0-41-generic
lrwxrwxrwx 1 root root      24 Dec 31 09:12 vmlinuz -> vmlinuz-6.8.0-41-generic
-rw-r--r-- 1 root root  14901248 Dec 31 09:12 vmlinuz-6.8.0-41-generic

Meaning: Symlinks point to the newest kernel. That’s normal on Ubuntu.
Decision: If these point to a kernel that lacks modules or has a broken initramfs, pin GRUB to a known-good kernel while you repair the new one.
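
A quick way to catch a half-updated /boot is to confirm both symlinks resolve and agree on the version. A minimal sketch, assuming the Ubuntu naming scheme vmlinuz-KVER and initrd.img-KVER; boot_link_versions is an illustrative name.

```shell
# boot_link_versions [BOOTDIR]: print the kernel version each symlink targets
# and fail if a link is missing, dangling, or the two versions disagree.
boot_link_versions() (
    boot=${1:-/boot}
    k=$(readlink "$boot/vmlinuz")    || { echo "no vmlinuz symlink in $boot"; return 1; }
    i=$(readlink "$boot/initrd.img") || { echo "no initrd.img symlink in $boot"; return 1; }
    kv=${k##*vmlinuz-}
    iv=${i##*initrd.img-}
    echo "vmlinuz    -> $kv"
    echo "initrd.img -> $iv"
    [ -e "$boot/vmlinuz" ] && [ -e "$boot/initrd.img" ] || { echo "dangling symlink"; return 1; }
    [ "$kv" = "$iv" ] || { echo "MISMATCH: kernel $kv vs initramfs $iv"; return 1; }
)
```

Keep in mind that GRUB boots whatever its generated menu says, not these symlinks, so a clean result here does not replace checking the GRUB default.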

Rebuild initramfs correctly (and prove it worked)

The goal isn’t “run update-initramfs and hope.” The goal is to create a deterministic initramfs for a specific kernel version,
containing the right modules, with module dependency metadata current, and with bootloader configs updated accordingly.

Step 0 — Pick the kernel version you are fixing

Decide the target kernel version explicitly. Usually it’s the newest installed kernel you want to boot.
For the rest of this section, assume KVER=6.8.0-41-generic.

Task 15 — Ensure kernel image and headers are installed

cr0x@server:~$ KVER=6.8.0-41-generic
cr0x@server:~$ sudo apt-get update
Hit:1 http://archive.ubuntu.com/ubuntu noble InRelease
Reading package lists... Done
cr0x@server:~$ sudo apt-get install -y linux-image-$KVER linux-headers-$KVER
Reading package lists... Done
Building dependency tree... Done
linux-image-6.8.0-41-generic is already the newest version (6.8.0-41.41).
linux-headers-6.8.0-41-generic is already the newest version (6.8.0-41.41).
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.

Meaning: Kernel and headers are present. DKMS has a fighting chance.
Decision: If headers were missing, install them before touching DKMS or initramfs. Otherwise you’ll regenerate an initramfs that can’t ever contain the modules you need.

Task 16 — Rebuild DKMS modules (if you use any)

cr0x@server:~$ sudo dkms autoinstall -k $KVER
Sign command: /usr/bin/kmodsign
Signing key: /var/lib/shim-signed/mok/MOK.priv
Public certificate (MOK): /var/lib/shim-signed/mok/MOK.der
Building module:
Cleaning build area...
Building module(s)... done.
Installing /lib/modules/6.8.0-41-generic/updates/dkms/nvidia.ko.zst
depmod...

Meaning: DKMS rebuilt and installed a module for that kernel, then ran depmod.
Decision: If DKMS errors out, do not rebuild initramfs yet. Fix DKMS first (headers, compiler, Secure Boot signing), then continue.

Task 17 — Refresh module dependency metadata explicitly

cr0x@server:~$ sudo depmod -a $KVER

Meaning: Module dependency maps are regenerated for the target kernel.
Decision: If depmod prints errors about missing files, your module tree is inconsistent. Resolve that before generating an initramfs that will inherit the inconsistency.

Task 18 — Rebuild initramfs for exactly one kernel (the sane default)

cr0x@server:~$ sudo update-initramfs -u -k $KVER
update-initramfs: Generating /boot/initrd.img-6.8.0-41-generic
W: Possible missing firmware /lib/firmware/i915/rlc.bin for module i915

Meaning: initramfs was rebuilt for the target kernel. A firmware warning may or may not matter for your boot path.
Decision: If it says “failed” or exits non-zero, stop and fix. If it completes with warnings, decide whether the warning affects boot-critical hardware.

Task 19 — Inspect initramfs contents to confirm the fix

cr0x@server:~$ lsinitramfs /boot/initrd.img-$KVER | grep -E 'dm-crypt.ko|nvme.ko|mlx|bnx2|zfs.ko' | head -n 20
lib/modules/6.8.0-41-generic/kernel/drivers/md/dm-crypt.ko.zst
lib/modules/6.8.0-41-generic/kernel/drivers/nvme/host/nvme.ko.zst

Meaning: Critical storage/crypto modules are present.
Decision: If the module you need isn’t in the initramfs, add it (see next steps) rather than regenerating repeatedly like a slot machine.
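
Grepping by eye works for two modules; for a fixed list it is easier to ask "what is missing?". A sketch that reads an lsinitramfs listing and matches on module name rather than full path or compression suffix; missing_modules is a hypothetical helper, and it treats "-" and "_" as equivalent, as modprobe does.

```shell
# missing_modules NAME...: reads an `lsinitramfs` listing on stdin and prints
# each NAME (normalised to dashes) that has no matching .ko file in the image.
# Exit status is 1 if anything is missing.
missing_modules() {
    awk -v names="$*" '
        BEGIN {
            n = split(names, want, " ")
            for (i = 1; i <= n; i++) gsub(/_/, "-", want[i])
        }
        {
            base = $0
            sub(/.*\//, "", base)                   # strip directory part
            if (base !~ /\.ko(\.[a-z0-9]+)?$/) next # not a kernel module
            sub(/\.ko(\.[a-z0-9]+)?$/, "", base)    # strip .ko, .ko.zst, .ko.xz
            gsub(/_/, "-", base)
            seen[base] = 1
        }
        END {
            for (i = 1; i <= n; i++)
                if (!(want[i] in seen)) { print want[i]; bad = 1 }
            exit bad + 0
        }
    '
}

# Example (not run here):
#   lsinitramfs /boot/initrd.img-$KVER | missing_modules nvme dm_crypt
```

This only proves a file is packed into the image. Built-in drivers have no .ko at all, so a "missing" result for something compiled into the kernel is a false alarm.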

Task 20 — Force inclusion of a module (when autodetection fails)

initramfs-tools usually does the right thing. Sometimes it doesn’t—especially on unusual storage controllers, layered storage, or when hooks break.
You can force modules via /etc/initramfs-tools/modules.

cr0x@server:~$ printf '%s\n' nvme dm-crypt | sudo tee -a /etc/initramfs-tools/modules
nvme
dm-crypt
cr0x@server:~$ sudo update-initramfs -u -k $KVER
update-initramfs: Generating /boot/initrd.img-6.8.0-41-generic

Meaning: You’ve pinned required modules into initramfs generation.
Decision: Use this for boot-critical items only. Don’t bloat initramfs with every driver “just in case.” That’s how you turn boot into a mystery novel.
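
Appending with tee -a adds a duplicate line every time the fix is re-run. A small idempotent variant, as a sketch: pin_module is a hypothetical helper, it must run as root to write the real file, and it only recognises bare module names (a line with module options would not count as already present).

```shell
# pin_module MODULE [FILE]: append MODULE to the initramfs-tools modules list
# unless an identical line is already there.
pin_module() (
    file=${2:-/etc/initramfs-tools/modules}
    if grep -qxF -- "$1" "$file" 2>/dev/null; then
        echo "already pinned: $1"
    else
        printf '%s\n' "$1" >> "$file" && echo "pinned: $1"
    fi
)

# Example (not run here, as root):
#   pin_module nvme && pin_module dm-crypt && update-initramfs -u -k "$KVER"
```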

Task 21 — Ensure the bootloader config knows about the kernel

cr0x@server:~$ sudo update-grub
Sourcing file `/etc/default/grub'
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-6.8.0-41-generic
Found initrd image: /boot/initrd.img-6.8.0-41-generic
done

Meaning: GRUB detected the kernel and initrd pair.
Decision: If GRUB doesn’t list the kernel/initrd you expect, fix /boot mounting, package installs, or filesystem space issues before rebooting.

Task 22 — Confirm /boot isn’t full (silent initramfs failures love this)

cr0x@server:~$ df -h /boot
Filesystem      Size  Used Avail Use% Mounted on
/dev/nvme0n1p2  975M  912M   11M  99% /boot

Meaning: /boot is at 99%. That’s the danger zone; regeneration can fail or truncate.
Decision: Remove old kernels safely (apt-get autoremove --purge) and keep at least one fallback kernel. Then rebuild initramfs again.
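
The same check is easy to script so it runs before every rebuild instead of after a failed one. A minimal sketch using POSIX df output; fs_use_pct and boot_space_ok are illustrative names, and the 90% default mirrors the checklist threshold rather than any hard limit.

```shell
# fs_use_pct PATH: print the integer Use% of the filesystem holding PATH.
fs_use_pct() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# boot_space_ok [PATH] [MAX_PCT]: succeed only while usage is below MAX_PCT.
boot_space_ok() (
    path=${1:-/boot}
    max=${2:-90}
    pct=$(fs_use_pct "$path") || return 1
    if [ "$pct" -ge "$max" ]; then
        echo "WARNING: $path is ${pct}% full (limit ${max}%)"
        return 1
    fi
    echo "OK: $path is ${pct}% full"
)

# Example (not run here):
#   boot_space_ok /boot 90 || echo "clean up old kernels before update-initramfs"
```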

Task 23 — Remove old kernels safely (space recovery without self-sabotage)

cr0x@server:~$ sudo apt-get autoremove --purge
Reading package lists... Done
Building dependency tree... Done
The following packages will be REMOVED:
  linux-image-6.8.0-40-generic* linux-modules-6.8.0-40-generic*
After this operation, 412 MB disk space will be freed.
Do you want to continue? [Y/n] y
(Reading database ... 214233 files and directories currently installed.)
Removing linux-image-6.8.0-40-generic (6.8.0-40.40) ...
Processing triggers for initramfs-tools (0.142ubuntu25.2) ...
update-initramfs: Generating /boot/initrd.img-6.8.0-41-generic
Processing triggers for grub-pc (2.12-1ubuntu7) ...

Meaning: Old kernel removed; initramfs regenerated; GRUB updated.
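Decision: Read the REMOVED list before answering yes. Here 6.8.0-40-generic was the only other installed kernel in Task 2, so confirm an older kernel remains (or reinstall one once space allows) before you proceed to reboot testing.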

Task 24 — Reboot to test (and keep a console open)

cr0x@server:~$ sudo reboot

Meaning: You’re testing the repaired boot path.
Decision: If this is remote, use your out-of-band console (IPMI/iDRAC/virt console). Reboots without a console are how you end up practicing apology emails.

Joke #2 (short, relevant): Secure Boot is great until you’re the one trying to load a module at 2 a.m. and it starts asking for “papers.”

Three corporate mini-stories from the trenches

Mini-story 1: The incident caused by a wrong assumption

A mid-sized company ran Ubuntu fleets on bare metal with encrypted root disks. They had a clean patching cadence: stage, test, roll out.
One Friday, the rollout went out anyway—because the pre-prod cluster “looked fine.”

Monday morning, a slice of production wouldn’t come back after a routine reboot. Systems landed in initramfs with “cannot find /dev/mapper/cryptroot.”
The on-call suspected the disk array. Storage got paged. Everyone stared at iostat numbers like they were tea leaves.

The actual failure was mundane: the initramfs generated during the update didn’t include a particular storage controller module needed to see the NVMe devices.
The wrong assumption was that “cryptsetup failing implies crypto.” But the crypto layer was innocent; it was waiting on a disk that wasn’t visible yet.

The fix was to force-include the missing controller module in /etc/initramfs-tools/modules, regenerate for the target kernel, and roll that as a configuration baseline.
Afterward, they added a boot-path validation test: “can initramfs see the root block device?” That single check would have caught this before Monday.

Mini-story 2: The optimization that backfired

Another team wanted faster reboots for a low-latency trading app. They shaved milliseconds anywhere they could.
Someone proposed a “lean initramfs”: remove “unneeded” modules, disable most autodetection fallbacks, and keep only exactly what the current server used.

It worked—until the next kernel update. One kernel revision changed which module provided a particular storage feature.
The initramfs generation still “succeeded,” but it was now missing the correct module name for the new kernel packaging.

The symptom was brutal: machines booted into initramfs on reboot, and the rollback was delayed because their process removed older kernels to keep /boot tidy.
They optimized the one thing you shouldn’t optimize aggressively: the ability to recover.

They reverted the minimal initramfs policy, kept one extra fallback kernel, and introduced a change rule:
if you tune initramfs content, you also ship a validation that inspects the actual initrd for required modules by pattern, not by fragile module filenames.

Mini-story 3: The boring but correct practice that saved the day

A large enterprise had a dull policy: every kernel upgrade ticket required capturing three artifacts before reboot:
uname -r, the list of installed kernels, and the output of lsinitramfs checks for storage and crypto modules.
Nobody loved it. It felt like paperwork.

Then an update introduced a DKMS rebuild failure for a third-party network driver on a subset of hosts.
The upgrade completed, but the module wasn’t available for the new kernel. Without it, the NIC didn’t come up after reboot.

Because the policy forced collection of DKMS status and initramfs contents, they caught the failure in staging before production.
They pinned the affected package version, rebuilt DKMS with correct headers, and scheduled a maintenance window with a verified rollback kernel still installed.

The event didn’t become an incident. It became a line item in the weekly report.
Boring practices don’t get applause, but they do keep your weekends intact.

Common mistakes: symptom → root cause → fix

1) Drops to initramfs with “Gave up waiting for root file system device”

  • Symptom: Boot lands in initramfs prompt; root UUID not found.
  • Root cause: Missing storage driver module in initramfs, or wrong UUID in fstab/crypttab.
  • Fix: Verify UUIDs with blkid. Inspect initrd contents with lsinitramfs. Force module inclusion in /etc/initramfs-tools/modules. Rebuild with update-initramfs -u -k KVER.

2) Network disappears after reboot (NIC driver missing)

  • Symptom: Host boots but no interfaces beyond loopback; ip link shows nothing useful.
  • Root cause: Out-of-tree NIC driver (DKMS) failed to build for the new kernel; initramfs may also be missing firmware.
  • Fix: Check dkms status. Install headers. Run dkms autoinstall -k KVER. Confirm module loads with modprobe and check kernel logs, then rebuild initramfs.

3) NVIDIA driver “worked yesterday, black screen today”

  • Symptom: GUI fails, display manager loops, kernel logs show nvidia module errors.
  • Root cause: DKMS module mismatch or Secure Boot signature enforcement.
  • Fix: Verify Secure Boot via mokutil --sb-state. Rebuild DKMS. If Secure Boot is enabled, enroll/sign properly or disable it. Then rebuild initramfs and reboot.

4) “Unknown symbol” when loading a module

  • Symptom: Module inserts fail; dmesg shows unknown symbols.
  • Root cause: Module built against different kernel headers than the running kernel (ABI mismatch).
  • Fix: Ensure linux-headers-KVER matches. Rebuild DKMS for that kernel only, run depmod -a KVER, regenerate initramfs.

5) initramfs rebuild “succeeds,” but boot still fails

  • Symptom: You ran update-initramfs; nothing changed; still fails at boot.
  • Root cause: You rebuilt the wrong kernel version, or GRUB boots a different entry.
  • Fix: Confirm uname -r for the kernel you booted successfully, inspect /boot symlinks, run update-grub, and set GRUB default to the known-good kernel while you fix the new one.

6) “Not enough space on /boot” leads to partially written initrd

  • Symptom: initrd exists but is unusually small; regeneration logs show space issues.
  • Root cause: /boot partition nearly full; old kernels not cleaned up.
  • Fix: Remove old kernels via apt-get autoremove --purge (keep a fallback). Rebuild initramfs and update GRUB.

7) ZFS root won’t import at boot

  • Symptom: Boot drops into initramfs; pool not found/import fails.
  • Root cause: ZFS module not present in initramfs, DKMS build not installed, or mismatch between kernel and ZFS module.
  • Fix: Confirm ZFS DKMS state, ensure ZFS module appears in initrd (lsinitramfs), rebuild DKMS and initramfs for the target kernel.

Checklists / step-by-step plan

Checklist A — If the system still boots (best case)

  1. Confirm running kernel: uname -r.
  2. Confirm installed kernels and spot broken package states: dpkg -l 'linux-image-*'.
  3. Check /boot space: df -h /boot. If >90%, clean up first.
  4. Verify target kernel has modules: ls /lib/modules/KVER.
  5. Check DKMS: dkms status. Fix anything not “installed.”
  6. Run depmod -a KVER.
  7. Rebuild initramfs: update-initramfs -u -k KVER.
  8. Verify module presence inside initrd: lsinitramfs /boot/initrd.img-KVER, then grep for your drivers.
  9. Update GRUB: update-grub.
  10. Reboot with console access and verify post-boot logs.
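
The rebuild steps of this checklist (5 through 9) can be sketched as one function that stops at the first failure. This is an illustration under assumptions, not a drop-in tool: rebuild_boot_chain and the DRY_RUN variable are hypothetical names, the function expects to run as root, and it does not replace the manual inspection in step 8.

```shell
# rebuild_boot_chain KVER: run the rebuild steps in order, stopping at the
# first failure. Set DRY_RUN=1 to print the commands instead of running them.
rebuild_boot_chain() (
    kver=$1
    [ -n "$kver" ] || { echo "usage: rebuild_boot_chain KVER" >&2; return 2; }

    step() {
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "+ $*"
        else
            echo "+ $*" >&2
            "$@"
        fi
    }

    # DKMS is optional: only rebuild out-of-tree modules if dkms is installed.
    if command -v dkms >/dev/null 2>&1; then
        step dkms autoinstall -k "$kver" || return 1
    fi
    step depmod -a "$kver"               || return 1
    step update-initramfs -u -k "$kver"  || return 1
    step update-grub                     || return 1
)

# Example (not run here):
#   DRY_RUN=1 rebuild_boot_chain 6.8.0-41-generic   # show the plan
#   rebuild_boot_chain 6.8.0-41-generic             # run it, as root
```

The ordering is the point: DKMS before depmod, depmod before initramfs, initramfs before GRUB. Running them out of order is how you package yesterday's module tree into today's image.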

Checklist B — If the system does not boot (real incident mode)

  1. Use GRUB “Advanced options” to boot an older kernel (if present).
  2. If none boot: use a live ISO, mount root and boot partitions, then chroot.
  3. Inside chroot: ensure /boot is mounted and writable.
  4. Reinstall kernel image + headers for the target version.
  5. Fix DKMS failures, especially storage/network modules and ZFS.
  6. Rebuild initramfs for the target kernel.
  7. Run update-grub and confirm it sees the kernel/initrd.
  8. Reboot and validate.

Task 25 — Chroot recovery workflow (when you need to repair from a live ISO)

cr0x@server:~$ sudo cryptsetup open /dev/nvme0n1p3 cryptroot
cr0x@server:~$ sudo mount /dev/mapper/cryptroot /mnt
cr0x@server:~$ sudo mount /dev/nvme0n1p2 /mnt/boot
cr0x@server:~$ for d in /dev /proc /sys /run; do sudo mount --bind "$d" "/mnt$d"; done
cr0x@server:~$ sudo chroot /mnt /bin/bash
cr0x@server:/# update-initramfs -u -k 6.8.0-41-generic
update-initramfs: Generating /boot/initrd.img-6.8.0-41-generic
cr0x@server:/# update-grub
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-6.8.0-41-generic
Found initrd image: /boot/initrd.img-6.8.0-41-generic
done

Meaning: You rebuilt boot artifacts in the context of the installed OS, not the live environment.
Decision: If update-grub doesn’t find your kernel, /boot may not be mounted correctly or packages are missing. Fix that before reboot.

FAQ

1) Should I use update-initramfs or mkinitramfs?

Use update-initramfs for routine operations. It manages the standard paths and works with packaging triggers.
Use mkinitramfs when you want to generate a specific file manually for testing, but don’t make that your default habit.

2) I rebuilt initramfs but it still boots the old kernel. Why?

Because GRUB is booting what it’s configured to boot. Run update-grub, check /boot symlinks, and confirm the default menu entry.
Also confirm you didn’t rebuild initramfs for a kernel you’re not selecting.

3) How do I know which modules must be in initramfs?

You need everything required to discover and mount root: storage controller (NVMe/AHCI/HBA), RAID/LVM/dm-crypt, filesystem driver, and anything needed for the root device path.
You can prove it by inspecting the initrd using lsinitramfs and grepping for those modules.

4) Is it safe to force modules in /etc/initramfs-tools/modules?

Yes, when you do it deliberately for boot-critical drivers. Keep the list tight.
If you turn it into a dumping ground, you’ll bloat initramfs and make boot debugging harder, not easier.

5) Why does Secure Boot matter if the module is in initramfs?

Being present in initramfs doesn’t guarantee it can be loaded. Secure Boot policies can refuse unsigned modules during boot.
If you see signature/lockdown messages in kernel logs, handle signing/enrollment or disable Secure Boot.

6) DKMS says “built” but not “installed.” What’s the difference?

“Built” means compilation succeeded. “Installed” means the module was copied into the kernel’s module tree (typically under /lib/modules/KVER/updates) and depmod was run.
You need “installed” for it to load reliably and be included in initramfs when relevant.

7) My /boot is small. Should I just enlarge it?

If you can, yes—modern kernels and initramfs images are not tiny. But in many environments resizing partitions is change-management pain.
Operationally, keep one fallback kernel, prune old ones regularly, and alert on /boot usage.

8) Can I rebuild initramfs for all kernels at once?

You can: update-initramfs -u -k all. It’s useful after broad changes (like adding a required module).
But in incident response, target one kernel first so you can reason about what changed.

9) What if the initramfs prompt appears and I want to debug live?

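At the initramfs shell, check what block devices exist (ls /dev/disk/by-uuid, cat /proc/partitions), confirm whether the root UUID is visible, and try loading the suspect driver with modprobe.
If modprobe can’t find the module, it was never packed into the initramfs. If it loads and the root device then appears, the module is present but wasn’t loaded automatically; list it in /etc/initramfs-tools/modules so it is loaded at boot.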

10) Does reinstalling the kernel package fix most of this?

Often, yes—because it restores the module tree and triggers initramfs regeneration.
But if the real issue is DKMS + Secure Boot, reinstalling the kernel just gives you a fresh stage to fail on.

Next steps you should actually do

If you’re in the middle of an outage: boot a known-good kernel, verify the kernel/initramfs/module version alignment, fix DKMS and Secure Boot realities, then rebuild initramfs for the kernel you want.
Prove the fix by inspecting initramfs contents and confirming GRUB sees the correct pair.

Then do the unsexy follow-up work:

  • Keep at least one fallback kernel installed until the new kernel survives real reboots.
  • Monitor /boot usage and clean old kernels on schedule, not when it’s at 99%.
  • In staging, validate the boot path by checking initramfs contains your root-critical modules (storage/crypto/ZFS/NIC as applicable).
  • If you use DKMS modules, treat “DKMS installed for KVER” as a release gate, not a nice-to-have.
  • Pick a Secure Boot strategy and document it. “We’ll deal with it later” is not a strategy; it’s a future incident.

Most “system updates broke modules” cases are just mismatched artifacts. Once you start treating initramfs as a build product you can inspect and validate, the drama drops sharply.
Linux isn’t fragile. Our assumptions are.
