You patch a fleet. The kernel bumps. A few minutes later your monitoring lights up: GPUs missing, ZFS pools complaining, NIC offloads gone, maybe an InfiniBand fabric flapping. The services are still up… for now. Then you see it: DKMS didn’t build modules for the new kernel, so the next reboot is a trap.
This is the reality of running Ubuntu 24.04 in production. Kernel updates are routine; DKMS failures are the price of doing business with out-of-tree drivers. The goal here is not “fix it after it breaks.” The goal is: fix it while the system stays online, and make the next kernel update boring.
How DKMS actually fails after a kernel update
DKMS (Dynamic Kernel Module Support) exists because vendors keep shipping kernel modules that aren’t in the mainline kernel. NVIDIA, ZFS-on-Linux, some NIC and RAID drivers, VirtualBox, some security agents—anything that compiles against kernel headers is a candidate.
When Ubuntu installs a new kernel, the package scripts attempt to rebuild DKMS modules for that kernel. If that rebuild fails, you might not notice immediately because the currently running kernel still has working modules loaded. The breakage appears when:
- You reboot and the new kernel boots without the module.
- initramfs was generated without the module, causing early-boot failures (storage, root-on-ZFS, encryption, etc.).
- Secure Boot blocks the unsigned module, and you get the “built but not loadable” special.
- Headers for the new kernel weren’t installed, so DKMS had nothing to compile against.
Most “DKMS broke” incidents are one of these four. The fix is rarely mysterious; it’s usually just time-consuming and operationally scary. The trick is to de-risk it: diagnose precisely, build for the kernel you are going to boot into, validate loadability, and only then allow reboots.
Dry truth: DKMS is not “dynamic” in the way management imagines. It’s “dynamic” like a paper form is dynamic: you can fill it out again every time the kernel changes.
Fast diagnosis playbook
When you’re trying to avoid downtime, speed matters. The fastest path is: identify the target kernel, confirm whether the module exists for it, confirm whether it can load, then validate boot artifacts (initramfs). Everything else is garnish.
First: what kernel are you running, and what kernels are installed?
- If you are still running the old kernel, you can rebuild calmly before reboot.
- If you’re already on the new kernel and modules are missing, you need to restore functionality on the live kernel (sometimes possible, sometimes not).
Second: does DKMS show “built” for the target kernel?
- If not built: you’re in “rebuild and fix build deps” mode.
- If built: check whether it installed into /lib/modules/<kernel> and whether modprobe succeeds.
Third: is Secure Boot blocking the module?
- Secure Boot on + unsigned module = it will build fine and then fail at load time with signature errors.
- This is the number one “we rebuilt it three times and nothing changed” loop.
Fourth: does initramfs include what you need?
- If the module is needed for early boot (storage/network root, ZFS root, crypto), “built” isn’t enough.
- Regenerate initramfs for the target kernel and verify it contains the module.
Fifth: block risky change while you fix it
- Hold kernel packages if unattended-upgrades keeps pulling new kernels while you’re mid-recovery.
- Pin a known-good kernel as a rollback option.
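If you want this playbook as a single pass, here is a minimal read-only triage sketch in bash. It assumes Ubuntu’s packaged kernels (one directory per kernel under /lib/modules) and uses zfs as the example critical module; both are assumptions to adjust for your fleet.
#!/usr/bin/env bash
# dkms-triage.sh -- read-only DKMS health check. A sketch: assumes the critical
# module is zfs and that every installed kernel has a dir under /lib/modules.
set -eu

running="$(uname -r)"
target="$(ls /lib/modules | sort -V | tail -n1)"
echo "running kernel: ${running}"
echo "target kernel:  ${target}"

echo "--- DKMS state for the target kernel ---"
dkms status | grep "${target}" || echo "WARNING: no DKMS modules for ${target}"

echo "--- Secure Boot ---"
mokutil --sb-state || echo "mokutil unavailable or not an EFI system"

echo "--- initramfs contents (zfs as the example module) ---"
if lsinitramfs "/boot/initrd.img-${target}" | grep -q 'zfs\.ko'; then
  echo "zfs present in initramfs for ${target}"
else
  echo "WARNING: zfs missing from initramfs for ${target}"
fi
Nothing in it changes state, so it is safe to run on a live host.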
Interesting facts and context (why this keeps happening)
- DKMS originated in the Dell ecosystem in the mid-2000s to keep vendor drivers buildable across kernel upgrades, especially on enterprise fleets.
- Ubuntu has shipped DKMS integration for years, but it still hinges on packaging scripts and the presence of headers—no headers, no module.
- Secure Boot enforcement turned “build failures” into “load failures”. The module can compile perfectly and still be rejected by the kernel.
- ZFS on Linux lived out-of-tree for a long time due to licensing friction; that history is why many Ubuntu installs still rely on DKMS for ZFS modules.
- Kernel ABI stability is not a promise for out-of-tree modules. Minor kernel bumps can break builds if the module uses internal APIs.
- Ubuntu’s HWE and SRU cadence can surprise you: a kernel update may arrive via unattended upgrades even if you “didn’t change anything.”
- initramfs is often the real failure domain. The system boots a kernel; then early userspace can’t find the storage module it needs.
- DKMS builds can be affected by toolchain changes (gcc, make, binutils). “Kernel updated” is sometimes shorthand for “your compiler also moved.”
- Some vendors ship prebuilt modules for specific kernel versions, but Ubuntu’s kernel versions drift; DKMS becomes the fallback—until it isn’t.
One quote has survived more postmortems than any one person deserves: “Hope is not a strategy.” Whoever said it first, it applies to DKMS rebuilds too.
Practical tasks: commands, outputs, and decisions (12+)
These are not “run everything.” They’re a toolkit. Each task includes: the command, example output, what it means, and the decision you make from it.
Task 1: Confirm the running kernel
cr0x@server:~$ uname -r
6.8.0-51-generic
What it means: This is the kernel currently running. If DKMS broke during install of a newer kernel, you can usually fix it without any immediate outage, because you’re not using that new kernel yet.
Decision: If the running kernel is still the known-good kernel, do your DKMS rebuild for the new kernel now, then schedule a controlled reboot later.
Task 2: List installed kernels and see what the next reboot will likely use
cr0x@server:~$ dpkg -l 'linux-image-*generic' | awk '/^ii/{print $2,$3}'
linux-image-6.8.0-51-generic 6.8.0-51.52
linux-image-6.8.0-52-generic 6.8.0-52.53
What it means: You have at least two kernels installed; the highest version is typically selected at boot.
Decision: Identify the “target” kernel you must have DKMS modules for (here: 6.8.0-52-generic).
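If you want the target kernel as a shell variable instead of by eyeball, a version sort does it. This sketch assumes every installed kernel keeps a directory under /lib/modules, which holds for Ubuntu’s packaged kernels:
cr0x@server:~$ TARGET="$(ls /lib/modules | sort -V | tail -n1)"; echo "${TARGET}"
6.8.0-52-generic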
Task 3: Check DKMS status across kernels
cr0x@server:~$ dkms status
zfs/2.2.2, 6.8.0-51-generic, x86_64: installed
zfs/2.2.2, 6.8.0-52-generic, x86_64: built
nvidia/550.90.07, 6.8.0-51-generic, x86_64: installed
nvidia/550.90.07, 6.8.0-52-generic, x86_64: added
What it means: “installed” means the module is built and copied into /lib/modules/<kernel>. “built” is compiled but may not be installed. “added” means DKMS knows about it but hasn’t built it for that kernel.
Decision: For the target kernel, anything not “installed” is a risk. Build+install now.
Task 4: Verify kernel headers exist for the target kernel
cr0x@server:~$ dpkg -l | awk '/linux-headers-6.8.0-52-generic/{print $1,$2,$3}'
ii linux-headers-6.8.0-52-generic 6.8.0-52.53
What it means: DKMS needs headers. If this is missing, DKMS will fail with errors like “Kernel headers for target not found.”
Decision: If headers are missing, install them before rebuilding DKMS modules.
Task 5: Install the missing headers (if needed)
cr0x@server:~$ sudo apt-get update
Hit:1 http://archive.ubuntu.com/ubuntu noble InRelease
Reading package lists... Done
cr0x@server:~$ sudo apt-get install -y linux-headers-6.8.0-52-generic
Reading package lists... Done
Building dependency tree... Done
The following NEW packages will be installed:
linux-headers-6.8.0-52-generic
Setting up linux-headers-6.8.0-52-generic (6.8.0-52.53) ...
What it means: Headers are now present; DKMS has a fair chance.
Decision: Rebuild DKMS modules for the target kernel.
Task 6: Trigger DKMS autoinstall for the target kernel
cr0x@server:~$ sudo dkms autoinstall -k 6.8.0-52-generic
Sign command: /lib/modules/6.8.0-52-generic/build/scripts/sign-file
Signing key: /var/lib/shim-signed/mok/MOK.priv
Public certificate (MOK): /var/lib/shim-signed/mok/MOK.der
Building module:
Cleaning build area... done.
Building module(s).... done.
Installing /lib/modules/6.8.0-52-generic/updates/dkms/zfs.ko
Installing /lib/modules/6.8.0-52-generic/updates/dkms/nvidia.ko
depmod... done.
What it means: DKMS built and installed modules for that specific kernel, and depmod updated module dependency maps.
Decision: If this succeeded, move to validation: can the module load (or at least is it present and signed)?
Task 7: If it fails, read the DKMS build log like you mean it
cr0x@server:~$ sudo tail -n 40 /var/lib/dkms/nvidia/550.90.07/build/make.log
CONFTEST: drm_prime_pages_to_sg_has_drm_device_arg
CONFTEST: drm_gem_object_put_unlocked
error: implicit declaration of function ‘drm_gem_object_put_unlocked’
make[2]: *** [scripts/Makefile.build:243: /var/lib/dkms/nvidia/550.90.07/build/nvidia-drm/nvidia-drm-gem.o] Error 1
make[1]: *** [Makefile:1926: /var/lib/dkms/nvidia/550.90.07/build] Error 2
make: *** [Makefile:234: __sub-make] Error 2
What it means: This is a compile-time API mismatch. It’s not a missing package; it’s the module source not supporting this kernel API.
Decision: Stop trying random rebuilds. You need a driver/module version compatible with that kernel (e.g., update NVIDIA driver package), or you need to boot the older kernel until you can.
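For NVIDIA on Ubuntu, one pragmatic option is to let the distribution tooling pick a driver release its archive knows about. This assumes the ubuntu-drivers-common package is installed, and the version it selects depends on your hardware and enabled repositories:
cr0x@server:~$ sudo ubuntu-drivers autoinstall
If the selected driver still fails to build against the new kernel, boot the previous kernel and wait for updated packages rather than patching driver source in place.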
Task 8: Validate module presence for the target kernel without rebooting
cr0x@server:~$ ls -l /lib/modules/6.8.0-52-generic/updates/dkms/ | egrep 'zfs|nvidia' | head
-rw-r--r-- 1 root root 8532480 Dec 29 10:12 nvidia.ko
-rw-r--r-- 1 root root 17362944 Dec 29 10:12 zfs.ko
What it means: The files exist where DKMS places them for that kernel.
Decision: Next validate loadability and signature state (especially under Secure Boot).
Task 9: Check Secure Boot state (the “built but blocked” detector)
cr0x@server:~$ mokutil --sb-state
SecureBoot enabled
What it means: The kernel will enforce module signature verification. Unsigned DKMS modules will fail to load.
Decision: If Secure Boot is enabled, ensure DKMS modules are signed with an enrolled key, or plan a controlled MOK enroll flow.
Task 10: Attempt to load the module on the running kernel (only when safe)
cr0x@server:~$ sudo modprobe -v zfs
insmod /lib/modules/6.8.0-51-generic/updates/dkms/spl.ko
insmod /lib/modules/6.8.0-51-generic/updates/dkms/zfs.ko
What it means: On the running kernel, module load succeeds. This is a sanity check that your DKMS install isn’t globally broken.
Decision: If modprobe fails with “Required key not available,” you’re in Secure Boot signing trouble. If it fails with “Unknown symbol,” you have a kernel/module mismatch.
Task 11: Inspect kernel logs for signature or symbol errors
cr0x@server:~$ sudo dmesg -T | tail -n 20
[Mon Dec 29 10:19:02 2025] Lockdown: modprobe: unsigned module loading is restricted; see man kernel_lockdown.7
[Mon Dec 29 10:19:02 2025] nvidia: module verification failed: signature and/or required key missing - tainting kernel
What it means: Secure Boot or lockdown policy is blocking or tainting. Some environments tolerate taint; some treat it as noncompliance.
Decision: If your policy requires signed modules, fix signing and enrollment now, before you reboot into a kernel that will refuse the module entirely.
Task 12: Verify initramfs was rebuilt for the target kernel
cr0x@server:~$ ls -lh /boot/initrd.img-6.8.0-52-generic
-rw-r--r-- 1 root root 98M Dec 29 10:14 /boot/initrd.img-6.8.0-52-generic
What it means: The initramfs exists and was updated recently, but that doesn’t guarantee it contains your module.
Decision: If the module is needed at boot (ZFS root, storage HBA, special NIC), you must verify its presence inside initramfs.
Task 13: Confirm the module is inside initramfs (the “trust but verify” step)
cr0x@server:~$ lsinitramfs /boot/initrd.img-6.8.0-52-generic | egrep '/zfs\.ko|/nvidia\.ko' | head
usr/lib/modules/6.8.0-52-generic/updates/dkms/zfs.ko
What it means: ZFS is included in early userspace for that kernel. GPU modules usually don’t need to be in initramfs; storage and network-boot modules often do.
Decision: If missing, regenerate initramfs after fixing DKMS install.
Task 14: Rebuild initramfs for a specific kernel (targeted, not shotgun)
cr0x@server:~$ sudo update-initramfs -u -k 6.8.0-52-generic
update-initramfs: Generating /boot/initrd.img-6.8.0-52-generic
What it means: You forced initramfs regeneration for the kernel you care about.
Decision: Re-run lsinitramfs checks; only then consider rebooting.
Task 15: Ensure the module dependency map is correct for the target kernel
cr0x@server:~$ sudo depmod -a 6.8.0-52-generic
What it means: modules.dep and friends are updated. Some postinst scripts do this; some failures skip it. Running it manually is cheap.
Decision: If modprobe later complains it can’t find dependencies, you likely missed depmod or modules ended up in a nonstandard path.
Task 16: Hold kernel updates while you stabilize (optional but often wise)
cr0x@server:~$ sudo apt-mark hold linux-image-generic linux-headers-generic
linux-image-generic set on hold.
linux-headers-generic set on hold.
What it means: You’re stopping meta-packages from pulling new kernels automatically.
Decision: Use this during incident response. Remove holds once you have a repeatable DKMS pipeline and validation gate.
Task 17: Confirm what will be the default boot entry (so you don’t reboot into the trap)
cr0x@server:~$ grep -E 'GRUB_DEFAULT|GRUB_TIMEOUT|GRUB_SAVEDEFAULT' /etc/default/grub
GRUB_DEFAULT=0
GRUB_TIMEOUT=5
What it means: Default is the first menu entry, typically the newest kernel.
Decision: If the newest kernel doesn’t have working modules, either fix DKMS for it or temporarily set GRUB to boot the known-good kernel.
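If you need one boot on the known-good kernel without permanently rewriting config, grub-reboot sets the default for the next boot only. It requires GRUB_DEFAULT=saved in /etc/default/grub (plus an update-grub run), and the menu entry string below is an example that must match your generated grub.cfg:
cr0x@server:~$ sudo grub-reboot "Advanced options for Ubuntu>Ubuntu, with Linux 6.8.0-51-generic"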
Task 18: Spot “half-configured” packages after a messy update
cr0x@server:~$ sudo dpkg --audit
The following packages are in a mess due to serious problems during installation. They must be reinstalled for them to work properly:
linux-image-6.8.0-52-generic
What it means: Your kernel package install didn’t finish cleanly, which can skip DKMS triggers and initramfs generation.
Decision: Fix packaging state before debugging DKMS endlessly.
Task 19: Repair package state and re-run postinst triggers
cr0x@server:~$ sudo apt-get -f install
Reading package lists... Done
Building dependency tree... Done
Correcting dependencies... Done
Setting up linux-image-6.8.0-52-generic (6.8.0-52.53) ...
update-initramfs: Generating /boot/initrd.img-6.8.0-52-generic
What it means: The kernel’s post-install hooks ran. That often includes DKMS rebuild triggers.
Decision: Re-check dkms status for the target kernel; validate module presence and initramfs contents again.
Joke 1: DKMS is like a gym membership: you only notice it’s not working when you actually try to use it.
Recover drivers without downtime: strategy that works
“Without downtime” doesn’t mean magic. It means you avoid rebooting into a kernel that can’t load critical modules, and you avoid resetting hardware mid-traffic. For most DKMS incidents, the system is still running on the previous kernel and everything is fine—until you reboot. That’s your window.
Step 1: Decide what “critical driver” means on this host
Don’t treat all DKMS modules equally. A missing VirtualBox module on a server is annoying; a missing storage module on a root-on-ZFS node is catastrophic. Classify the host:
- Storage critical: ZFS root, ZFS data pools, HBA drivers, dm-crypt dependencies.
- Network critical: out-of-tree NIC drivers (rare on Ubuntu, but happens), DPDK modules, SR-IOV stacks, vendor offloads.
- Compute critical: NVIDIA GPU nodes, ML clusters, video transcoders.
- “Nice to have”: developer workstation drivers and nonessential modules.
Critical means: no reboot until the target kernel has a validated, loadable module and a sane initramfs.
Step 2: Build for the kernel you will boot, not the one you’re running
DKMS defaults can mislead you. If you just run dkms autoinstall without -k, it often targets the running kernel. That is not what you need during recovery. You need the next boot kernel.
Build explicitly for the target kernel version. Always.
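If autoinstall is too coarse, you can drive a single module through build and install explicitly. The module name and version here are taken from the dkms status example in Task 3; substitute your own:
cr0x@server:~$ sudo dkms build -m zfs -v 2.2.2 -k 6.8.0-52-generic
cr0x@server:~$ sudo dkms install -m zfs -v 2.2.2 -k 6.8.0-52-generic
The output mirrors the autoinstall run in Task 6, but scoped to one module.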
Step 3: Prefer vendor packaging that tracks your kernel line
When a DKMS module fails to compile due to API mismatches, you have two realistic choices:
- Upgrade the driver/module package to a version compatible with the new kernel.
- Delay the kernel reboot and pin the kernel version until a compatible driver exists.
Trying to patch the module source on a production box at 2am is a hobby, not an SRE practice.
Step 4: Validate with “can it load” and “is it in initramfs”
Presence on disk is not enough. You need at least one of:
- Load test on the target kernel (hard without reboot).
- Signature validation (if Secure Boot is enabled).
- initramfs inclusion verification for early-boot-critical modules.
A practical compromise: you validate the DKMS artifact path, run modinfo checks, verify module signing state, and validate initramfs. Then reboot in a controlled maintenance window with a known-good rollback kernel ready.
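One useful artifact check: ask the module which kernel it was built for. The vermagic string must start with the exact target kernel version; the trailing flags vary by kernel config, and this output is illustrative:
cr0x@server:~$ modinfo -F vermagic /lib/modules/6.8.0-52-generic/updates/dkms/zfs.ko
6.8.0-52-generic SMP preempt mod_unload modversions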
Step 5: Don’t break your own network while fixing a driver
Most DKMS recovery is CPU-and-disk heavy but doesn’t disturb traffic. The danger zone is when you unload/reload modules on a live system. Unless you have redundancy (bonding, multipath, clustering), avoid reloading network/storage modules on a single-node critical host during business hours.
Rebuild and install is safe. Unload/reload is a change.
Step 6: Create a “reboot gate”
In production, the simplest no-downtime control is policy: don’t allow reboot if DKMS modules aren’t installed for the newest installed kernel. You can enforce this with a local script that checks:
- dkms status for the target kernel shows “installed” for critical modules
- lsinitramfs contains the early-boot modules you depend on
- mokutil --sb-state and module signature status are aligned
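A minimal sketch of such a gate, assuming zfs is the critical module, Ubuntu’s kernel layout, and uncompressed .ko files (adjust the greps if your modules are compressed):
#!/usr/bin/env bash
# reboot-gate.sh -- exit nonzero unless the newest installed kernel looks bootable.
# Sketch only: module list, paths, and naming are assumptions to adapt per host.
set -eu

critical_modules="zfs"                              # space-separated, per host role
target="$(ls /lib/modules | sort -V | tail -n1)"    # newest installed kernel

for mod in ${critical_modules}; do
  # DKMS must report "installed" for this module on the target kernel.
  if ! dkms status | grep "^${mod}/" | grep "${target}" | grep installed >/dev/null; then
    echo "GATE FAIL: ${mod} not installed for ${target}"; exit 1
  fi
  # Early-boot-critical modules must be inside the target initramfs.
  if ! lsinitramfs "/boot/initrd.img-${target}" | grep "/${mod}\.ko" >/dev/null; then
    echo "GATE FAIL: ${mod} missing from initramfs for ${target}"; exit 1
  fi
  # Under Secure Boot, an unsigned module builds fine and then refuses to load.
  if mokutil --sb-state 2>/dev/null | grep -q enabled; then
    signer="$(modinfo -F signer "/lib/modules/${target}/updates/dkms/${mod}.ko" 2>/dev/null || true)"
    if [ -z "${signer}" ]; then
      echo "GATE FAIL: ${mod} unsigned while Secure Boot is enabled"; exit 1
    fi
  fi
done
echo "GATE PASS: ${target} has validated critical modules"
Wire the exit code into your reboot automation: nonzero blocks the reboot and pages a human.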
Then you wire it into your change process. Boring. Works.
Secure Boot and module signing (MOK): the silent breaker
If you run Ubuntu 24.04 on hardware with Secure Boot enabled—and many orgs do, because compliance loves checkboxes—DKMS can “succeed” and you still lose. Here’s why:
- DKMS compiles a module.
- The kernel refuses to load it if it’s unsigned (or signed by an unenrolled key).
- You discover this only when the driver is first needed, often after reboot.
How to recognize Secure Boot signature failures quickly
Typical symptoms:
- modprobe: ERROR: could not insert '...': Required key not available
- dmesg shows “module verification failed” or lockdown restrictions
- dkms status claims “installed” but functionality is absent
What to do about it (pragmatic options)
- Sign DKMS modules and enroll the key (MOK). This is the clean option when Secure Boot must remain enabled.
- Disable Secure Boot in firmware. This is operationally simplest but may violate policy.
- Use signed, in-tree drivers where possible. Long-term best, not always available.
Check if DKMS is signing modules
cr0x@server:~$ sudo grep -R "sign-file" -n /etc/dkms /etc/modprobe.d 2>/dev/null | head
What it means: There may be no explicit config. On Ubuntu, module signing for DKMS often ties into the shim/MOK tooling and the packaging scripts.
Decision: If Secure Boot is enabled and you see signature failures, don’t guess. Verify module signing with modinfo.
Inspect signature metadata on a module
cr0x@server:~$ modinfo -F signer /lib/modules/6.8.0-52-generic/updates/dkms/zfs.ko
Canonical Ltd. Secure Boot Signing
What it means: The module carries a signer string. If empty, it may be unsigned (or stripped of metadata).
Decision: If signer is missing and Secure Boot is on, you need to sign and enroll, or accept the module won’t load.
Verify enrolled MOK keys
cr0x@server:~$ sudo mokutil --list-enrolled | head
[key 1]
SHA1 Fingerprint: 12:34:56:78:90:...
Subject: CN=Canonical Ltd. Secure Boot Signing
What it means: The system trusts a set of keys. If your DKMS signing uses a different key, the kernel will reject it.
Decision: Align your signing key with the enrolled keys, or enroll the correct key via MOK (which typically requires a reboot into the MOK manager).
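If your signing key is not in that list, enrollment is a two-step dance: stage the key with mokutil, then confirm it in the MOK manager console at the next boot. The certificate path below matches the shim-signed default shown earlier; substitute your own key material:
cr0x@server:~$ sudo mokutil --import /var/lib/shim-signed/mok/MOK.der
input password:
input password again:
The password is used once, by MokManager at the next boot, so plan console access before you start.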
Joke 2: Secure Boot is the bouncer at the kernel nightclub: your module can be perfectly dressed and still not be on the list.
initramfs, early boot, and why “it built” isn’t enough
initramfs is the compressed early userspace image the kernel loads to get from “kernel started” to “real root filesystem mounted.” If your critical module isn’t in initramfs, the module being present on disk is irrelevant because the disk might not be reachable yet.
This matters for:
- Root-on-ZFS systems
- Encrypted root that needs specific modules early
- Some exotic storage or network boot flows
Failure mode: DKMS installed modules, but initramfs was generated before the install
This happens during interrupted upgrades, parallel package operations, or when DKMS runs late and initramfs ran early. You boot and discover early userspace can’t find ZFS/SPL, or your storage driver isn’t present.
Fix: rebuild initramfs after DKMS installation for the target kernel, and verify contents with lsinitramfs.
Failure mode: multiple kernels, stale initramfs
You might have a correct initramfs for the running kernel but not for the newest installed kernel. That’s how the reboot trap is set. Always validate the initramfs that matches the kernel you’ll reboot into.
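A quick audit across every installed kernel catches stale images; this sketch assumes Ubuntu’s /boot/initrd.img-<version> naming and again uses zfs as the example module. A count of 0 on the newest image is exactly the trap described above:
cr0x@server:~$ for i in /boot/initrd.img-*; do printf '%s: ' "$i"; lsinitramfs "$i" | grep -c 'zfs\.ko'; done
/boot/initrd.img-6.8.0-51-generic: 1
/boot/initrd.img-6.8.0-52-generic: 0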
Three corporate mini-stories (realistic, anonymized)
Mini-story 1: The incident caused by a wrong assumption
They ran a small GPU cluster for batch inference. Nothing exotic: Ubuntu hosts, NVIDIA DKMS driver, a job scheduler, and a change window every Tuesday. The update cadence was “kernel updates automatically, driver updates when someone complains.” That worked until it didn’t.
A kernel update landed on Friday night via unattended upgrades. No one noticed because the nodes were still running the old kernel, and the GPUs were still available. Monday morning they drained one node for unrelated maintenance and rebooted it. It came back without the NVIDIA modules loading.
The wrong assumption was subtle: “If the driver is installed, it’s installed.” They never checked whether the driver built for the newly installed kernel. The node rebooted into the newest kernel (as it should), and DKMS had quietly failed days earlier.
They tried the classic fix: reinstall the driver package. It still didn’t load. Eventually someone checked dmesg and found a Secure Boot signature enforcement message. Secure Boot had been enabled in firmware on a recent hardware refresh, but nobody updated the runbook.
The fix was straightforward—signing and enrolling the key properly—but it required reboots into the MOK manager. They burned a day coordinating reboots across nodes, which was avoidable if they had a pre-reboot gate and a “DKMS installed for newest kernel” check.
Mini-story 2: The optimization that backfired
A financial services shop got tired of slow patch rollouts. They decided to “optimize” by removing build tooling from production servers: no gcc, no make, no headers, minimal packages only. Security liked it. The image was smaller, scans were cleaner, and the servers felt more appliance-like.
Then a kernel update rolled out. DKMS tried to rebuild the out-of-tree NIC module they relied on for a particular card’s features. No compiler, no headers, no build. DKMS failed, but the current kernel kept running. The failure stayed invisible.
The next reboot wave hit during a datacenter power maintenance event. Reboots were mandatory. Several hosts came up on the new kernel without the NIC module. The built-in driver worked enough to boot, but it lacked the offload features they had tuned their latency around. The symptom wasn’t “no network.” It was worse: intermittent performance collapse and timeouts under load.
They reverted the “minimal image” decision for that fleet and moved DKMS builds into a controlled pipeline: prebuild modules for the target kernel in a build environment, ship the artifacts, and verify before reboot. The optimization wasn’t wrong in principle. It was wrong without replacing DKMS’s implicit build requirement with an explicit supply chain.
The lesson: if you remove compilers from hosts, you own the module build process end-to-end. Otherwise, you’re just postponing the failure to reboot time.
Mini-story 3: The boring but correct practice that saved the day
A media company ran a bunch of storage-heavy Ubuntu boxes, some with ZFS pools. Their practice was painfully dull: every kernel update was followed by an automated “reboot readiness” check. It verified DKMS status for ZFS against the newest installed kernel, verified initramfs contains ZFS, and confirmed a known-good kernel remained installed as rollback.
One morning, the check flagged a failure on a subset of hosts. DKMS showed ZFS “built” but not “installed” for the newest kernel. The hosts were still running fine, so they didn’t panic. They blocked reboots via their orchestrator and opened a ticket.
The root cause was a packaging race during an earlier unattended upgrade: initramfs generation happened, then DKMS install failed and was retried, leaving inconsistent state. The boring check caught it before any reboot. They ran dkms autoinstall -k, rebuilt initramfs for the target kernel, and cleared the gate.
Zero downtime, no drama, no weekend. This is what “operational excellence” looks like when you strip away the PowerPoint.
Common mistakes: symptom → root cause → fix
1) Symptom: dkms status shows “added” for the new kernel
Root cause: DKMS module registered but not built for that kernel; headers missing or build failed earlier.
Fix: Install headers for the target kernel, then run sudo dkms autoinstall -k <kernel>. Validate files exist under /lib/modules/<kernel>/updates/dkms.
2) Symptom: DKMS build fails with “Kernel headers not found”
Root cause: Missing linux-headers-<kernel> package, or /lib/modules/<kernel>/build symlink broken.
Fix: Install matching headers; verify ls -l /lib/modules/<kernel>/build points to the headers (a healthy example appears after item 7 below).
3) Symptom: Module builds, but modprobe fails with “Required key not available”
Root cause: Secure Boot enabled; module unsigned or signed with non-enrolled key.
Fix: Ensure DKMS modules are signed with a trusted key and enroll via MOK, or disable Secure Boot if policy allows.
4) Symptom: Boot into new kernel loses ZFS/root storage
Root cause: initramfs for the new kernel missing required module(s), often due to DKMS timing or failed postinst triggers.
Fix: After DKMS install, run update-initramfs -u -k <kernel>, then confirm with lsinitramfs.
5) Symptom: DKMS compile errors about missing symbols / implicit declarations
Root cause: Kernel API change; driver version incompatible with new kernel.
Fix: Upgrade the driver/module source package (e.g., newer NVIDIA/ZFS release), or hold kernel and reboot into the older kernel until compatible packages exist.
6) Symptom: Everything looks installed, but hardware still doesn’t work after reboot
Root cause: You built for the wrong kernel version (running kernel, not the installed newest one), or booted a different kernel than expected.
Fix: Confirm installed kernels, confirm default boot selection, rebuild explicitly for the boot kernel with dkms autoinstall -k.
7) Symptom: Package upgrades hang or leave “half-configured” state
Root cause: Interrupted upgrade, dpkg lock contention, full filesystem, or postinst script failures (often DKMS).
Fix: Repair dpkg state: apt-get -f install, check disk space, and re-run DKMS builds after the packaging layer is healthy.
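For mistake 2 above, a healthy build symlink looks like this (example kernel version; the headers package owns the link target):
cr0x@server:~$ ls -l /lib/modules/6.8.0-52-generic/build
lrwxrwxrwx 1 root root 39 Dec 29 10:05 /lib/modules/6.8.0-52-generic/build -> /usr/src/linux-headers-6.8.0-52-generic
If the link is missing or dangling, reinstalling linux-headers-<kernel> restores it.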
Checklists / step-by-step plan
Checklist A: No-downtime recovery on a host still running the old kernel
- Identify target kernel (newest installed): use dpkg -l for installed images.
- Check DKMS status for critical modules against that kernel: dkms status.
- Install headers for target kernel if missing: apt-get install linux-headers-<kernel>.
- Rebuild modules for target kernel: dkms autoinstall -k <kernel>.
- Validate artifacts exist in /lib/modules/<kernel>/updates/dkms.
- Secure Boot check: mokutil --sb-state and confirm module signer via modinfo.
- Rebuild initramfs for target kernel (storage-critical hosts): update-initramfs -u -k <kernel>.
- Verify initramfs contents: lsinitramfs includes the required module(s).
- Keep rollback available: confirm an older known-good kernel remains installed.
- Schedule reboot with a rollback plan (console access, GRUB selection, remote hands if needed).
Checklist B: If you already rebooted into the broken kernel
- Confirm what’s missing: lsmod, modprobe, and dmesg.
- Check Secure Boot immediately; do not waste time rebuilding unsigned modules if Secure Boot will block them.
- Install build prerequisites (temporarily): headers, compiler toolchain if DKMS needs it.
- Rebuild DKMS for the running kernel: dkms autoinstall -k $(uname -r).
- If build fails due to API mismatch: stop and pick a compatible driver version or roll back to the previous kernel via GRUB.
- Fix initramfs if early-boot modules are involved, then test reboot.
Checklist C: Prevent it next time (production hygiene)
- Create a “reboot gate” that verifies DKMS installed for newest kernel and validates initramfs where needed.
- Stage kernel updates on canary hosts with representative hardware.
- Track Secure Boot policy as a first-class constraint, not a BIOS footnote.
- Keep at least one rollback kernel installed and bootable at all times.
- Control unattended upgrades so kernels don’t change without validation.
FAQ
1) Why did DKMS “break” only after the kernel update?
Because DKMS modules are compiled against a specific kernel’s headers. When the kernel changes, the module must be rebuilt. If that rebuild fails, you won’t notice until you boot the new kernel or attempt to load the module for it.
2) Can I fix DKMS without rebooting?
You can rebuild and install modules for the next kernel without rebooting, yes. You typically cannot test loading them into that next kernel without actually booting it. That’s why you validate artifacts, signatures, and initramfs contents before reboot.
3) What does “added” vs “built” vs “installed” mean in dkms status?
added: DKMS knows the module source but hasn’t built it for that kernel. built: compiled but not necessarily installed into the kernel’s module tree. installed: placed into /lib/modules/<kernel> and depmod has been run (or should be).
4) Do I really need matching kernel headers?
Yes. DKMS builds against the headers for the kernel version you’re targeting. “Close enough” doesn’t exist here; install linux-headers-<exact-version>.
5) Why does Secure Boot make this so much worse?
Because it turns a compile-time problem into a runtime enforcement problem. You can build the module successfully and still be unable to load it. The kernel will reject modules not signed by a trusted key when Secure Boot and lockdown policies require it.
6) If Secure Boot is enabled, should I disable it?
Only if your policy allows it. Disabling Secure Boot can be operationally simplest, but the correct fix in regulated environments is to sign DKMS modules with a key you control and enroll it via MOK.
7) Why did my system boot but then storage or networking was broken?
Often because the driver is loaded later than you think, or a fallback in-tree driver exists but lacks features. Another common cause: initramfs missing a module needed early, so boot succeeds partially, then devices appear late or incorrectly.
8) What’s the safest rollback if I can’t get DKMS to build for the new kernel?
Boot the previous known-good kernel and hold kernel meta-packages temporarily. Then upgrade the driver/module package to a version that supports the new kernel before attempting the reboot again.
9) Should I keep compilers off production servers?
It depends. If you rely on DKMS builds on-host, you need the build toolchain and headers. If you remove them, you must replace DKMS’s on-host build with a pipeline that produces and ships compatible modules for every kernel you deploy.
10) How do I prevent “reboot trap” kernels from accumulating?
Have a validation step after kernel install that checks DKMS status for critical modules on the newest kernel. If it fails, block reboot automation and alert. This is cheaper than incident response.
Next steps you can do today
If you run Ubuntu 24.04 with DKMS-managed drivers, stop treating kernel updates as “just security patches.” They are also driver rebuild events. The practical path to no downtime is short:
- Pick your critical DKMS modules per host role (storage, network, GPU).
- After every kernel install, rebuild modules for the newest installed kernel (dkms autoinstall -k <kernel>).
- Validate signatures if Secure Boot is enabled; don’t assume build success equals load success.
- Regenerate and verify initramfs for early-boot-critical modules.
- Only then reboot. Keep a rollback kernel installed and bootable.
Do that, and “DKMS broke after kernel update” stops being an incident. It becomes a checklist item that finishes before anyone notices.