Debian 13: Your server won’t boot after updates — the clean GRUB rollback that actually works

Your maintenance window ended 20 minutes ago. The server is still “down”. The hypervisor console shows a polite little grub rescue> prompt that does not care about your SLA. And someone in Slack is typing “did we brick it?” with the confidence of a person who has never had to unbrick anything.

This is the recovery path that works in the real world: fast diagnosis first, then a clean rollback/reinstall of GRUB without turning your disk into a crime scene, then kernel/initramfs and /boot hygiene so it doesn’t happen again.

Fast diagnosis playbook (what to check 1st/2nd/3rd)

Boot failures after updates are rarely “mystical.” They’re usually one of four categories: wrong device/EFI entry, broken GRUB core/config, missing kernel/initramfs, or storage unlock/assembly failures (LUKS/RAID/ZFS). Your job is to identify which category in minutes, not hours.

First: identify the failure stage by the exact screen

  • UEFI firmware screen loops back to BIOS/UEFI menu: firmware can’t find a boot entry or the EFI binary is missing/unreadable.
  • grub rescue>: GRUB loaded but can’t find its modules or configuration; often the prefix points wrong or the filesystem moved.
  • grub> (full prompt): GRUB is functional; it can likely load a kernel if you point it at the right partition.
  • Kernel loads then drops to initramfs: initramfs can’t find root (UUID mismatch, missing driver, LUKS prompt not appearing, mdraid not assembling).
  • Kernel panic immediately after update: wrong kernel, broken initramfs, or a driver/regression; rollback kernel is usually fastest.

Second: collect the hard facts: which disk and which boot mode

Before you “fix,” decide: UEFI or legacy BIOS? NVMe or SATA? mdraid/LUKS/ZFS? If you guess, you will guess wrong under pressure.
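
A quick, read-only way to collect those facts from whatever console you have. Device names below are illustrative and none of these commands write anything:

cr0x@server:~$ [ -d /sys/firmware/efi ] && echo UEFI || echo "legacy BIOS"
UEFI
cr0x@server:~$ lsblk -d -o NAME,TRAN,SIZE
NAME    TRAN   SIZE
nvme0n1 nvme 953.9G
cr0x@server:~$ cat /proc/mdstat
Personalities :
unused devices: <none>
cr0x@server:~$ lsblk -o NAME,FSTYPE | grep -Ei 'crypto_LUKS|raid|zfs'
└─nvme0n1p3 crypto_LUKS

Thirty seconds of that beats an hour of guessing which grub-install variant to run.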

Third: decide the lowest-risk recovery route

  • If you can reach GRUB menu: boot an older kernel first. That’s the least invasive rollback.
  • If you can reach a rescue environment: mount, chroot, reinstall GRUB cleanly, regenerate initramfs, verify /boot space.
  • If disks aren’t assembling/unlocking: fix mdadm/LUKS first; reinstalling GRUB onto a disk the system can’t read is performance art.

Operational rule: don’t write to disks until you can explain what you’re writing and where the bytes are going. The fastest way to turn a boot incident into a data recovery incident is “just run grub-install everywhere.”

Why Debian “won’t boot after updates” happens (and what’s actually broken)

Debian upgrades touch boot-critical pieces in a way that’s both robust and unforgiving. The packages do the right thing most of the time—until your setup is slightly unusual, your /boot is tight on space, your firmware is quirky, or you run storage stacks that require early userspace cooperation.

Common triggers:

  • GRUB package update wrote new core files but your EFI System Partition (ESP) wasn’t mounted. The update “succeeds” yet the firmware still boots an old binary.
  • /boot filled up so kernel or initramfs generation was partial. You now have a GRUB menu item pointing at a kernel that exists, but initramfs is missing—or vice versa.
  • Disk IDs/UUIDs changed (cloning, disk replacement, RAID reshaping). GRUB config references old UUIDs, so it can’t locate /boot/grub or the kernel.
  • UEFI NVRAM boot entries got reset (firmware update, CMOS reset, vendor firmware being itself). The disk is fine; the firmware forgot how to find it.
  • Initramfs lost a needed driver or hook (especially with mdraid, LUKS, exotic HBAs, or ZFS-on-root). The kernel boots, then early userspace fails to locate root.
  • Secure Boot interactions: shim, signed GRUB, or kernel signatures not matching what firmware expects. Symptoms vary from silent refusal to load to a brief flash and reboot.

If you’re looking for a single villain, it’s usually not “GRUB is bad.” It’s “the chain of custody for boot artifacts is messy.” Your rollback should restore the chain, not add more random copies of grubx64.efi in surprising places.

Interesting facts and a little bootloader history

  • GRUB’s name is literal: it started as the “GRand Unified Bootloader” under the GNU project, meant to unify the chaos of early PC boot managers.
  • BIOS-era GRUB used multi-stage loading (stage1 in the MBR, stage1.5 in the “MBR gap,” stage2 in /boot). GPT disks and modern tooling made the “gap” unreliable, pushing people toward BIOS Boot Partitions.
  • UEFI changed the game: bootloaders became normal files on a FAT-formatted ESP, which is both simpler and more fragile (easy to overwrite, easy to forget to mount).
  • The “fallback” UEFI path exists for a reason: \EFI\BOOT\BOOTX64.EFI is the default many firmwares try when NVRAM entries are missing or corrupted.
  • Debian’s kernel packaging is conservative compared to some distros: old kernels stick around by design, which is why you often can roll back by selecting an older entry—unless /boot ran out of space.
  • Initramfs is not optional theater: for encrypted root, RAID, or many storage drivers, it’s the early userspace that assembles the world before /sbin/init gets a chance.
  • “update-grub” is a friendly wrapper around grub-mkconfig. The important part is the generated config and the modules GRUB can actually load.
  • UEFI NVRAM is finite and vendor firmware implementations vary wildly. Systems can and do “forget” boot entries during firmware updates or when the NVRAM fills.

One quote to keep you honest, attributed to Werner Vogels: “Everything fails, all the time.” That’s not nihilism; it’s a design requirement.

On-console triage: GRUB screen types and what they mean

Case A: “No bootable device” or firmware drops into setup

This is usually firmware not finding the EFI loader (UEFI) or missing boot code (legacy BIOS). The OS may be intact. Your work is to restore a valid boot path.

Case B: grub rescue>

GRUB is running in a reduced mode. It can’t find its normal modules/config. Typical causes: wrong prefix, moved partitions, missing /boot/grub, or filesystem GRUB can’t read (less common on Debian defaults, more common with exotic filesystems).
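
If you can spot the partition that holds GRUB's directory, you can often limp back to a menu from the rescue prompt itself. A minimal sketch; (hd0,gpt2) is an assumption, so use ls to find the real one (prefix is (hdX,gptY)/grub when /boot is its own partition, (hdX,gptY)/boot/grub when it is not), and remember this only rescues the current boot, not the install:

grub rescue> ls
(hd0) (hd0,gpt1) (hd0,gpt2) (hd0,gpt3)
grub rescue> set prefix=(hd0,gpt2)/grub
grub rescue> set root=(hd0,gpt2)
grub rescue> insmod normal
grub rescue> normal

If insmod normal fails, the modules directory is missing or unreadable, and you are into the rescue/chroot path below anyway.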

Case C: GRUB menu appears, kernel loads, then you land in initramfs

GRUB did its job. Now the initramfs can’t mount the root filesystem. Common reasons: wrong root UUID in kernel command line, missing mdraid assembly, LUKS device not unlocked, missing driver module, or a regression. This often looks like “Debian won’t boot” but GRUB is innocent.
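
A few read-only checks from the (initramfs) prompt usually tell you which of those it is. Whether mdadm and cryptsetup are even present depends on which hooks made it into the image; device names here are illustrative:

(initramfs) blkid
/dev/nvme0n1p2: UUID="1b2c3d4e-..." TYPE="ext4"
/dev/nvme0n1p3: UUID="9a8b7c6d-..." TYPE="crypto_LUKS"
(initramfs) cat /proc/mdstat
(initramfs) cryptsetup luksOpen /dev/nvme0n1p3 cryptroot
(initramfs) exit

If unlocking or assembling by hand lets exit continue the boot, the disks are fine; the initramfs is missing config or modules, and the fix belongs in /etc/crypttab, /etc/mdadm/mdadm.conf, or the hook list, followed by update-initramfs.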

Joke #1: GRUB is like a bouncer: it looks scary, but most of the time it’s just enforcing a list you gave it.

The clean GRUB rollback that actually works

“Rollback” in bootloader land isn’t a single button. What you want is a known-good set of boot artifacts: a GRUB binary your firmware can load, GRUB modules where GRUB expects them, and a config that points at real kernels and initramfs images. You can get there in two clean ways:

  • Soft rollback (preferred when possible): boot an older kernel from GRUB’s “Advanced options” menu. Then rebuild initramfs and GRUB config in the running system, and only then reinstall the bootloader if needed.
  • Hard rollback (when it won’t boot at all): boot a rescue environment, mount filesystems, chroot, reinstall GRUB cleanly to the correct target (UEFI or BIOS), regenerate initramfs and GRUB config, verify EFI entries.

What you should avoid: copying random EFI binaries around without understanding which one firmware uses, reinstalling GRUB to every disk “just in case,” or editing grub.cfg by hand like it’s 2004. Debian will overwrite it later, and you’ll forget you did it, and future-you will deserve better.

Recovery environment choice

Use what’s closest to the system: a Debian installer in rescue mode, a live image, or your data center’s remote rescue system. The key is that you can mount the installed root filesystem and run Debian tools against it.

Decide boot mode: UEFI vs BIOS

Don’t assume. Many servers support both, and a firmware reset can flip the preference. Debian can be installed either way. Your reinstall must match the mode.

What “clean” means for GRUB

  • UEFI: the correct ESP is mounted at /boot/efi in the chroot, you install grub-efi-amd64 (or arm64 equivalent), you run grub-install once to the right --efi-directory, and you confirm the NVRAM entry (or fallback file) exists.
  • BIOS: you install grub-pc, you run grub-install /dev/sdX to the correct disk(s), not partitions, and you verify the BIOS Boot Partition exists on GPT if needed.

Practical tasks: commands, expected output, and the decision you make

These are the tasks you actually run under pressure. Each one includes: command, what the output means, and what decision you make next.

Task 1: Confirm whether you’re booted in UEFI mode (in rescue/live)

cr0x@server:~$ ls -ld /sys/firmware/efi
drwxr-xr-x 5 root root 0 Dec 28 10:11 /sys/firmware/efi

Meaning: directory exists → you are currently booted in UEFI mode. If it’s missing, you’re in legacy BIOS/CSM mode.

Decision: match the installed system’s boot mode. If the server was installed UEFI but your rescue boot is BIOS, reinstalling UEFI GRUB may fail or install the wrong thing.

Task 2: Inventory disks/partitions and spot the ESP and /boot

cr0x@server:~$ lsblk -o NAME,SIZE,FSTYPE,FSVER,LABEL,PARTLABEL,PARTUUID,MOUNTPOINTS
NAME          SIZE FSTYPE      FSVER LABEL PARTLABEL        PARTUUID                             MOUNTPOINTS
nvme0n1     953.9G
├─nvme0n1p1   512M vfat        FAT32       EFI System       7c2f2d6a-7f0a-4d6b-8a8d-1a2d2b8c0c1e
├─nvme0n1p2     2G ext4              boot  Linux filesystem 1b2c3d4e-...
└─nvme0n1p3 951.4G crypto_LUKS       luks  Linux filesystem 9a8b7c6d-...

Meaning: look for vfat with “EFI System” → that’s your ESP. Look for ext4 labeled boot → likely /boot.

Decision: you now know what to mount where. If you don’t see an ESP on a UEFI system, you may be looking at the wrong disk or the ESP is missing/corrupt.

Task 3: If using LUKS, unlock root (or the container that contains root)

cr0x@server:~$ cryptsetup luksOpen /dev/nvme0n1p3 cryptroot
Enter passphrase for /dev/nvme0n1p3:
cr0x@server:~$ ls /dev/mapper/cryptroot
/dev/mapper/cryptroot

Meaning: the mapping exists → decrypted block device is available.

Decision: proceed to mount LVM or filesystems inside it. If unlock fails, stop and fix keys/TPM/prompting first—bootloader work won’t help.

Task 4: If using LVM inside LUKS, activate volume groups

cr0x@server:~$ vgscan
  Found volume group "vg0" using metadata type lvm2
cr0x@server:~$ vgchange -ay
  2 logical volume(s) in volume group "vg0" now active

Meaning: LVs are active and will show up under /dev/vg0/....

Decision: mount the root LV. If VGs don’t show, you might be missing the right device mapper nodes, or you’re unlocking the wrong container.

Task 5: Mount the installed system (root, /boot, ESP) correctly

cr0x@server:~$ mount /dev/vg0/root /mnt
cr0x@server:~$ mount /dev/nvme0n1p2 /mnt/boot
cr0x@server:~$ mount /dev/nvme0n1p1 /mnt/boot/efi
cr0x@server:~$ findmnt -R /mnt
TARGET        SOURCE           FSTYPE OPTIONS
/mnt          /dev/vg0/root    ext4   rw,relatime
/mnt/boot     /dev/nvme0n1p2   ext4   rw,relatime
/mnt/boot/efi /dev/nvme0n1p1   vfat   rw,relatime,fmask=0022,dmask=0022

Meaning: all expected mountpoints are present. This is where many “reinstall GRUB” attempts go wrong: the ESP wasn’t mounted, so the new EFI binary went into a directory on the root filesystem, not into firmware-readable FAT.

Decision: if /mnt/boot/efi isn’t vfat, fix your mounts before touching GRUB.

Task 6: Bind-mount runtime filesystems and chroot

cr0x@server:~$ mount --bind /dev /mnt/dev
cr0x@server:~$ mount --bind /proc /mnt/proc
cr0x@server:~$ mount --bind /sys /mnt/sys
cr0x@server:~$ chroot /mnt /bin/bash
root@server:/# 

Meaning: you’re now operating as if booted into the installed OS, using its packages and configs.

Decision: all remediation should now use Debian’s tooling from inside this chroot.
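
One extra mount is worth doing now if the system is UEFI and you expect to run efibootmgr from inside the chroot: a plain --bind of /sys does not carry the efivarfs submount. A minimal sketch:

root@server:/# mount -t efivarfs efivarfs /sys/firmware/efi/efivars
root@server:/# ls /sys/firmware/efi/efivars | wc -l
87

If that directory stays empty, efibootmgr later in this procedure will likely complain that EFI variables are not supported.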

Task 7: Check for a half-configured dpkg state (common after interrupted updates)

root@server:/# dpkg --audit
The following packages are only half configured, probably due to problems configuring them the first time.
 grub-efi-amd64
 linux-image-6.12.0-amd64

Meaning: packaging state is inconsistent. This can leave GRUB files missing or kernel artifacts incomplete.

Decision: run recovery steps for dpkg before reinstalling bootloader.

Task 8: Finish interrupted package configuration cleanly

root@server:/# apt-get -f install
Reading package lists... Done
Building dependency tree... Done
Correcting dependencies... Done
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
2 not fully installed or removed.
After this operation, 0 B of additional disk space will be used.
Setting up linux-image-6.12.0-amd64 (6.12.18-1) ...
update-initramfs: Generating /boot/initrd.img-6.12.0-amd64
Setting up grub-efi-amd64 (2.12-5) ...

Meaning: initramfs generation and GRUB package configuration ran. Watch for errors about /boot being full or ESP not mounted.

Decision: if errors appear, address them immediately (space, mounts) before continuing.

Task 9: Check /boot space (this one bites constantly)

root@server:/# df -h /boot
Filesystem      Size  Used Avail Use% Mounted on
/dev/nvme0n1p2  2.0G  1.9G   42M  98% /boot

Meaning: /boot is effectively full. Kernel updates may have left partial images or refused to install new ones cleanly.

Decision: remove old kernels properly (not by deleting random files) before regenerating initramfs/GRUB config.

Task 10: List installed kernels and remove stale ones safely

root@server:/# dpkg -l 'linux-image-*' | awk '/^ii/ {print $2 "\t" $3}'
linux-image-6.10.0-amd64	6.10.12-1
linux-image-6.12.0-amd64	6.12.18-1
linux-image-amd64	6.12.18-1
root@server:/# apt-get remove --purge linux-image-6.10.0-amd64
Reading package lists... Done
Building dependency tree... Done
The following packages will be REMOVED:
  linux-image-6.10.0-amd64*
After this operation, 412 MB disk space will be freed.
Removing linux-image-6.10.0-amd64 (6.10.12-1) ...
update-initramfs: Deleting /boot/initrd.img-6.10.0-amd64
run-parts: executing /etc/kernel/postrm.d/zz-update-grub 6.10.0-amd64 /boot/vmlinuz-6.10.0-amd64
Generating grub configuration file ...
done

Meaning: space reclaimed and GRUB config updated as part of kernel removal hooks.

Decision: re-check /boot space; you want headroom for at least one new kernel + initramfs.

Task 11: Regenerate initramfs for the kernel you intend to boot

root@server:/# update-initramfs -u -k 6.12.0-amd64
update-initramfs: Generating /boot/initrd.img-6.12.0-amd64

Meaning: initramfs exists and was rebuilt with current hooks/modules.

Decision: if this errors, fix it before touching GRUB. A perfect bootloader pointing to a broken initramfs still won’t boot.

Task 12: Verify the root filesystem UUIDs match what the system expects

root@server:/# blkid /dev/vg0/root
/dev/vg0/root: UUID="f3a1f9e4-7d72-4d88-8c7f-2b1b9d1a0d4c" BLOCK_SIZE="4096" TYPE="ext4"
root@server:/# grep -E '^\s*UUID=|^\s*/dev/' /etc/fstab
UUID=f3a1f9e4-7d72-4d88-8c7f-2b1b9d1a0d4c / ext4 defaults 0 1
UUID=2D3A-11F0 /boot/efi vfat umask=0077 0 1

Meaning: fstab references current UUIDs. If they mismatch, boot can fail in initramfs or later.

Decision: if UUIDs are wrong, fix /etc/fstab and any crypttab/mdadm references before rebuilding initramfs again.

Task 13: Generate a fresh GRUB config (don’t hand-edit grub.cfg)

root@server:/# update-grub
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-6.12.0-amd64
Found initrd image: /boot/initrd.img-6.12.0-amd64
done

Meaning: GRUB sees the kernel and initramfs in /boot. If it says it found no kernels, your /boot isn’t mounted, or kernels are missing, or the scripts aren’t seeing them.

Decision: only proceed to reinstall GRUB when update-grub produces sane output.

Task 14: Reinstall GRUB for UEFI cleanly (the common Debian 13 server case)

root@server:/# grub-install --target=x86_64-efi --efi-directory=/boot/efi --bootloader-id=debian --recheck
Installing for x86_64-efi platform.
Installation finished. No error reported.

Meaning: the EFI binary was written to the mounted ESP in the expected directory tree. No error is good; but you still need to verify that firmware entries exist and the files are actually on the ESP.

Decision: check NVRAM boot entries and ESP contents next.

Task 15: Check UEFI NVRAM boot entries and confirm Debian is present

root@server:/# efibootmgr -v
BootCurrent: 0003
Timeout: 1 seconds
BootOrder: 0003,0001,0000
Boot0003* debian	HD(1,GPT,7c2f2d6a-7f0a-4d6b-8a8d-1a2d2b8c0c1e,0x800,0x100000)/File(\EFI\debian\grubx64.efi)
Boot0001* UEFI: Built-in EFI Shell
Boot0000* UEFI PXEv4

Meaning: firmware knows about \EFI\debian\grubx64.efi and it’s in BootOrder.

Decision: if the entry is missing or points to the wrong disk GUID, create/fix it, or put a fallback binary in the default path as a pragmatic backup.
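
If the entry is simply gone, rerunning grub-install normally recreates it, but you can also add one by hand. A sketch using the disk, partition number, and label from the layout above; adjust all three to yours:

root@server:/# efibootmgr -c -d /dev/nvme0n1 -p 1 -L debian -l '\EFI\debian\grubx64.efi'
BootCurrent: 0003
Timeout: 1 seconds
BootOrder: 0004,0003,0001,0000
Boot0000* UEFI PXEv4
Boot0001* UEFI: Built-in EFI Shell
Boot0003* debian
Boot0004* debian

If you end up with duplicate debian entries, delete the stale one with efibootmgr -b XXXX -B rather than leaving the firmware to pick at random.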

Task 16: Verify ESP file layout (because “mounted” isn’t “correct”)

root@server:/# ls -R /boot/efi/EFI
/boot/efi/EFI:
BOOT  debian

/boot/efi/EFI/BOOT:
BOOTX64.EFI

/boot/efi/EFI/debian:
grubx64.efi  grub.cfg

Meaning: both vendor-neutral fallback and Debian entry exist. On servers with flaky NVRAM, that fallback file is a quiet hero.

Decision: if /boot/efi/EFI is empty, you installed to the wrong place or the ESP isn’t mounted. Stop and correct it.

Task 17: Reinstall GRUB for legacy BIOS (only if that’s your system)

root@server:/# grub-install --target=i386-pc --recheck /dev/sda
Installing for i386-pc platform.
Installation finished. No error reported.

Meaning: GRUB core written to the disk’s boot area. On GPT/BIOS, you also need a BIOS Boot Partition; without it, installs can “succeed” but not boot reliably.

Decision: if you’re on GPT and don’t have a BIOS boot partition, reconsider: switching to UEFI is often cleaner on modern servers.

Task 18: Last-mile sanity check: what kernel cmdline will be used

root@server:/# sed -n 's/^GRUB_CMDLINE_LINUX_DEFAULT=//p' /etc/default/grub
"quiet"
root@server:/# sed -n 's/^GRUB_CMDLINE_LINUX=//p' /etc/default/grub
""

Meaning: if you require special parameters (mdraid, cryptdevice, rootdelay, console=ttyS0), this is where persistent config lives. Don’t jam it into grub.cfg.

Decision: if you’re remote-only, set serial console params now, regenerate GRUB config, and spare yourself the next blind incident.
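
A minimal sketch of persistent serial console settings, assuming ttyS0 at 115200 (match the unit and speed to your BMC or hypervisor console). Set these in /etc/default/grub, then regenerate:

GRUB_CMDLINE_LINUX="console=tty0 console=ttyS0,115200n8"
GRUB_TERMINAL="console serial"
GRUB_SERIAL_COMMAND="serial --unit=0 --speed=115200"

root@server:/# update-grub
Generating grub configuration file ...
done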

Task 19: Exit chroot, unmount cleanly, reboot

root@server:/# exit
cr0x@server:~$ umount -R /mnt
cr0x@server:~$ reboot

Meaning: clean unmount reduces filesystem surprises. Reboot tests the full chain: firmware → GRUB → kernel → initramfs → root.

Decision: if it still fails, capture the exact new failure stage; don’t repeat the same fix hoping for different physics.

Joke #2: Rebooting is not a fix. It’s a vote. Sometimes it votes for “still broken.”

Three corporate mini-stories (how this fails in production)

Mini-story 1: The incident caused by a wrong assumption

They ran a fleet of Debian servers in a private cloud. Most were installed UEFI. A few older nodes were legacy BIOS because “it didn’t matter” at the time, and the installer defaults were whatever the tech happened to click on that day. Nobody documented which was which. Of course not.

An update cycle rolled through: kernel update plus GRUB updates. One node didn’t come back. Console showed a firmware boot menu. The on-call engineer booted the rescue ISO, chrooted, and ran grub-install /dev/sda out of habit. It “worked.” The system still didn’t boot.

They repeated with more intensity: installed GRUB to both disks (it was mirrored), rebuilt configs, rebooted again. Still firmware menu. Hours disappeared in the special way only a boot loop can steal time.

The problem was simple and humiliating: the node was installed in UEFI mode, and the firmware was looking for an EFI entry and an ESP file. Installing BIOS GRUB to the MBR did nothing except add noise. The rescue ISO had booted in BIOS mode, which reinforced the wrong assumption.

Once they rebooted the rescue ISO in UEFI mode, mounted the ESP, and ran the UEFI-targeted grub-install, it came back immediately. The postmortem action item was equally simple: record boot mode in inventory. Not in someone’s head. Not in a wiki page that nobody opens during an incident. In the system facts that automation can query.

Mini-story 2: The optimization that backfired

A different company had a “lean boot” initiative. Someone noticed /boot was only used during boot and updates, so they shrank it aggressively. Smaller partitions meant faster imaging and less wasted space on thousands of nodes. That was the pitch.

It worked for months. Then a routine update included a kernel and microcode packages. The initramfs got bigger, as they tend to do when hardware support expands and hooks accumulate. One node’s /boot filled up mid-upgrade. The kernel package installed its files, but initramfs generation failed. The package manager output was there—somewhere—but the automation didn’t treat it as a hard failure.

After reboot, GRUB showed the new kernel entry (because the vmlinuz existed) and tried to boot it. The initramfs referenced in the menu did not. The node dropped into initramfs shell, then timed out, then rebooted, then did it again. A perfect little self-sustaining outage.

The fix wasn’t exotic: boot the previous kernel, purge old kernels properly, rebuild initramfs, regenerate GRUB config. The lesson was that “optimization” that removes slack from boot-critical storage is a loan. Eventually you pay it back with interest, usually during a maintenance window that you promised would be boring.

Mini-story 3: The boring but correct practice that saved the day

A finance-ish org ran Debian servers with encrypted roots (LUKS) and strict change control. Their updates were automated but intentionally paced: update a canary, wait, then roll. They also kept two known-good kernels installed at all times and monitored /boot usage.

One evening, a kernel update introduced a regression for a specific storage controller firmware version. The canary rebooted and landed in initramfs because root wasn’t found. The service was down on that node, but the incident didn’t spread because the rollout paused automatically after a failed health check.

Ops used the console to select the previous kernel from GRUB’s advanced options. The node came back. They pinned that kernel version temporarily, rebuilt initramfs with a specific module inclusion, and scheduled the controller firmware update separately.

Nothing heroic happened. Nobody typed magic incantations. The system survived because they did the boring parts: staged rollout, preserved rollback kernels, and treated /boot space as a monitored resource. That’s what reliability looks like most days: dull, repeatable competence.

Common mistakes: symptom → root cause → fix

  • Symptom: grub-install “succeeds” but after reboot you still get firmware boot menu.
    Root cause: ESP wasn’t mounted; you installed into /boot/efi on the root filesystem, not the ESP.
    Fix: mount the real ESP (vfat), verify with findmnt, rerun grub-install --efi-directory=/boot/efi, check efibootmgr -v.
  • Symptom: GRUB menu shows new kernel, but boot drops to initramfs with “cannot find UUID”.
    Root cause: root UUID changed (disk clone/replacement) or crypt/mdraid mapping name changed; initramfs still has old references.
    Fix: correct /etc/fstab, /etc/crypttab, and mdadm config if applicable; run update-initramfs -u -k all.
  • Symptom: update-grub finds no kernels.
    Root cause: /boot not mounted (separate partition), or kernels were removed/never installed due to dpkg errors.
    Fix: mount /boot; verify /boot/vmlinuz-*; repair packages with dpkg --configure -a and apt-get -f install.
  • Symptom: Boot loop after GRUB selection; kernel panic early.
    Root cause: broken initramfs, missing storage driver, or regression in new kernel.
    Fix: boot older kernel; rebuild initramfs; consider pinning the kernel package until regression is addressed.
  • Symptom: grub rescue> prompt; ls shows partitions but insmod normal fails.
    Root cause: GRUB prefix points to the wrong partition or /boot/grub missing/corrupted.
    Fix: use rescue/live, mount properly, chroot, reinstall GRUB and regenerate config; also check disk/FS integrity.
  • Symptom: Secure Boot enabled systems refuse to boot after GRUB update.
    Root cause: unsigned or mismatched EFI binaries/shim chain; or wrong package set installed.
    Fix: ensure the correct signed packages are installed for your policy; temporarily disable Secure Boot only as a diagnostic step, then restore a compliant boot chain.
  • Symptom: mdraid root systems drop into initramfs with no arrays assembled.
    Root cause: initramfs missing mdadm config or modules; or metadata version/UUID mismatch after disk replacement.
    Fix: confirm arrays with mdadm --examine in rescue; fix /etc/mdadm/mdadm.conf; rebuild initramfs.
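
For that last mdraid case, a minimal sketch of the order of operations from the rescue shell, before chrooting (array names and UUIDs are illustrative):

cr0x@server:~$ mdadm --examine --scan
ARRAY /dev/md/0 metadata=1.2 UUID=3f2a1b0c:... name=server:0
cr0x@server:~$ mdadm --assemble --scan
mdadm: /dev/md0 has been started with 2 drives.

Then, inside the chroot, make sure the ARRAY line in /etc/mdadm/mdadm.conf matches what --examine reported (replace the stale line rather than appending a duplicate), and rebuild the initramfs with update-initramfs -u -k all.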

Checklists / step-by-step plan

Checklist A: If you can see a GRUB menu

  1. Select Advanced options and boot the previous kernel.
  2. Once booted, check /boot space and package state.
  3. Rebuild initramfs for the latest kernel once space is adequate.
  4. Run update-grub, then reboot and test latest kernel.
  5. If firmware still doesn’t consistently find GRUB, reinstall GRUB (UEFI/BIOS correctly) and verify with efibootmgr.

Checklist B: If you’re stuck in firmware menu or grub rescue

  1. Boot a rescue environment in the correct mode (UEFI vs BIOS).
  2. Identify disks/partitions with lsblk. Locate root, /boot, and ESP.
  3. Unlock LUKS / assemble RAID / import pools as needed before mounting.
  4. Mount root at /mnt, then mount /mnt/boot and /mnt/boot/efi if present.
  5. Bind-mount /dev, /proc, /sys, then chroot.
  6. Repair dpkg state: dpkg --audit, apt-get -f install.
  7. Fix /boot space if needed; remove old kernels properly.
  8. Rebuild initramfs for the target kernel.
  9. Run update-grub and confirm it finds kernels/initrd.
  10. Reinstall GRUB to the correct target (UEFI or BIOS).
  11. Verify with efibootmgr -v and listing ESP files.
  12. Reboot and watch the console through the first successful boot.

Checklist C: Guardrails to prevent the next one

  1. Monitor /boot usage and alert at 70–80% (see the check sketch after this list).
  2. Keep at least one known-good older kernel installed.
  3. Stage updates with canaries; pause rollout on boot failures.
  4. Ensure ESP is mounted and checked during updates (and in config management).
  5. Standardize boot mode per fleet; record it in inventory.
  6. For remote-only systems, configure serial console kernel parameters persistently.
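
For item 1, the check itself is trivial; the value is in wiring it to alerting you already watch. A minimal sketch you could drop into cron or a node-exporter textfile collector (threshold and paths are assumptions):

cr0x@server:~$ df --output=pcent /boot | tail -1 | tr -dc '0-9'
73
cr0x@server:~$ [ "$(df --output=pcent /boot | tail -1 | tr -dc '0-9')" -lt 80 ] && echo OK || echo "ALERT: /boot above 80%"
OK

Page on it before a kernel update turns it into a packaging failure.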

FAQ

1) Is this really a “GRUB problem” if I drop into initramfs?

Usually no. If the kernel started and you’re in initramfs, GRUB did its part. Focus on root device discovery: UUIDs, LUKS unlock, mdraid assembly, missing drivers, or a kernel regression. Rebuild initramfs and validate /etc/fstab and /etc/crypttab.

2) Why does grub-install succeed but nothing changes?

Most common: the ESP wasn’t mounted. You wrote EFI files into a directory on your root filesystem, not to the firmware-readable FAT partition. Always verify with findmnt /boot/efi and check the ESP contents afterward.

3) Should I copy grubx64.efi to the fallback path?

On servers with unreliable NVRAM entries, having \EFI\BOOT\BOOTX64.EFI can save you. But do it deliberately: ensure it’s on the ESP, and keep it consistent with your intended loader chain. Don’t leave three conflicting loaders and call it resilience.
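
If you do adopt the fallback, keep it boring and scripted. A minimal sketch assuming the Debian layout shown earlier; on Secure Boot systems the fallback chain starts with shim rather than GRUB directly, so there prefer the packaging mechanism (Debian’s grub-efi packages expose a “force extra removable” debconf option) over hand copies:

root@server:/# mkdir -p /boot/efi/EFI/BOOT
root@server:/# cp /boot/efi/EFI/debian/grubx64.efi /boot/efi/EFI/BOOT/BOOTX64.EFI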

4) Can I just delete old files from /boot to free space?

You can, and it sometimes works, and it also leaves dpkg thinking packages still exist. Remove kernels using apt-get remove --purge linux-image-X so hooks update initramfs and GRUB config correctly.

5) My system uses RAID1 for the ESP. Is that okay?

UEFI expects FAT on an ESP; mirroring strategies vary. Some shops keep identical ESPs on both disks and update both. That can work, but you must operationalize it (update process, verification). If you only update one ESP, you’ve created a failover that fails over into an older reality.
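
One workable pattern, sketched with illustrative device names: keep a mountpoint for the second ESP and sync it after every kernel or bootloader update (rsync flags kept to what FAT can actually store):

root@server:/# mkdir -p /boot/efi2
root@server:/# mount /dev/sdb1 /boot/efi2
root@server:/# rsync -rt --delete /boot/efi/ /boot/efi2/
root@server:/# umount /boot/efi2

Remember the firmware also needs a boot entry (or fallback file) pointing at the second disk, or the mirror is decorative.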

6) What if efibootmgr isn’t available in the chroot?

Install it inside the chroot (apt-get install efibootmgr) if network/package sources are available. If not, you can still ensure the ESP has the correct files; many firmwares will boot the fallback path even without a NVRAM entry.

7) Does Secure Boot change the rollback steps?

The mechanics are the same—mount ESP, reinstall the right packages, regenerate configs—but the allowed binaries differ. If Secure Boot is enforced, unsigned EFI binaries may be refused. Treat “disable Secure Boot” as a diagnostic step, not a permanent fix, unless policy allows it.

8) How do I know which disk to run grub-install on in BIOS mode?

Pick the disk the BIOS actually boots first (often the first in boot order), and on mirrored systems consider installing to both boot disks intentionally. But don’t spray and pray; confirm with firmware boot order and your RAID topology. In BIOS mode you install to the whole disk device, not a partition.

9) Why did this happen right after a firmware update?

Firmware updates can reset NVRAM boot entries or change boot ordering. Your OS and ESP may be fine; the firmware simply forgot about the Debian entry. Verify with efibootmgr -v and restore the boot entry or fallback loader path.

10) What’s the single best prevention for “won’t boot after updates”?

Keep rollback options: at least one older kernel installed, sufficient /boot space, staged rollouts, and automated checks that the ESP is mounted during bootloader updates. Reliability isn’t a hero move; it’s compound interest.

Conclusion: next steps that reduce repeat incidents

When Debian 13 won’t boot after updates, treat it as a chain problem. Firmware must find a loader, the loader must find modules/config, GRUB must point at a kernel and initramfs that exist, and initramfs must be able to unlock/assemble/mount root. Fix the broken link, not the whole chain with a hammer.

Do this next, while the incident is fresh:

  • Standardize boot mode across your fleet (UEFI strongly preferred on modern hardware) and record it in inventory.
  • Add monitoring for /boot utilization and alert before it becomes a packaging failure.
  • Keep at least one known-good kernel installed; don’t “clean up” your rollback path.
  • Automate a post-update check: ESP mounted, update-initramfs succeeded, update-grub found kernels, and efibootmgr shows a sane entry (a minimal sketch follows below).
  • If you run encrypted root, RAID, or ZFS: validate that your initramfs includes the right hooks/modules after each major change. Early userspace is part of your storage stack.
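
Here is one minimal sketch of that post-update check, assuming a UEFI system and the mountpoints used throughout this article; adapt paths and thresholds to your fleet:

cr0x@server:~$ cat /usr/local/sbin/boot-chain-check
#!/bin/sh
# Post-update sanity check: fail loudly if the boot chain looks unhealthy.
set -e
findmnt -n -o FSTYPE /boot/efi | grep -q vfat || { echo "FAIL: ESP not mounted at /boot/efi"; exit 1; }
usage=$(df --output=pcent /boot | tail -1 | tr -dc '0-9')
[ "$usage" -lt 80 ] || { echo "FAIL: /boot at ${usage}%"; exit 1; }
ls /boot/vmlinuz-* /boot/initrd.img-* >/dev/null 2>&1 || { echo "FAIL: kernel or initrd missing"; exit 1; }
efibootmgr | grep -qi debian || { echo "FAIL: no debian entry in NVRAM"; exit 1; }
echo "OK: boot chain looks sane"

Run it as the last step of every update job and treat a non-zero exit as a rollout stopper, the same way the canary story above did.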