Proxmox “Not a Bootable Disk”: BIOS/UEFI Boot Order, Disk Flags, and a Fast Recovery Path

You reboot a Proxmox node for a kernel update, a drive swap, or because a remote hands tech “just reseated the cables.” The screen comes back with the kind of message that ruins your coffee: “not a bootable disk” (or “no boot device,” “insert boot media,” “grub rescue,” pick your poison).

This failure is rarely mysterious. It’s usually one of three things: the firmware is looking in the wrong place, the disk no longer looks bootable to that firmware, or the bootloader/EFI entry got lost in a way only firmware can make feel personal.

Fast diagnosis playbook

This is the “stop bleeding” order. Don’t freestyle. Don’t start reinstalling Proxmox like it’s a laptop from 2009.

1) Confirm firmware mode and boot order (BIOS/UEFI mismatch is the #1 time-waster)

  • If the machine is in UEFI mode, you need an EFI System Partition (ESP), an EFI boot entry, and working EFI binaries.
  • If the machine is in legacy BIOS mode, you need GRUB in the MBR (or BIOS boot partition for GPT), and the disk must be marked bootable in the way that firmware expects.

Decision: pick one mode and stick to it. Mixed-mode boot setups “worked before” until they didn’t—usually right after a firmware reset.

2) Confirm the disk is detected and the right one is first

Firmware boot order is not stable. Swap a SATA cable or add an NVMe and suddenly your boot disk becomes “the other one.”

Decision: if the disk is missing at the hardware level, stop. This is not a GRUB problem.

3) From rescue media, validate partitions: ESP/BIOS boot, root, and what Proxmox actually installed

Don’t guess. Inspect partition tables, flags, and filesystems. If you’re on ZFS, confirm the pool imports.

Decision: if ESP is missing or not mounted, repair/restore it. If ZFS pool won’t import, bootloader work is irrelevant until storage is healthy.

4) Repair bootloader and EFI entries (the minimal change that gets you back)

  • UEFI: mount ESP, reinstall EFI bits, recreate NVRAM entries with efibootmgr if needed.
  • BIOS: reinstall GRUB to the correct disk and regenerate config.

Decision: don’t reinstall Proxmox unless your OS partition is actually gone or irreparably corrupted.

5) Make boot redundant (so this isn’t a recurring meeting)

UEFI boot entries and ESPs can be single points of failure even on ZFS mirrors. Fix that after the node boots, not before.

One idea often repeated in SRE circles, usually paraphrased: "hope is not a strategy." Use checklists, not vibes.

What “not a bootable disk” actually means

The message is a firmware complaint, not a Proxmox complaint. It means the system firmware tried its configured boot targets and couldn’t find something it considers bootable.

Where it fails in the chain

  1. Hardware detection: does the firmware see the disk at all?
  2. Firmware boot selection: is it trying the correct disk and the correct mode (UEFI vs legacy)?
  3. Boot code location:
    • Legacy BIOS: MBR/boot sector → GRUB stage1 → GRUB core image → /boot/grub
    • UEFI: NVRAM boot entry → ESP FAT32 → EFI binary (shim/grub/systemd-boot) → kernel/initramfs
  4. OS handoff: kernel mounts root filesystem; on ZFS it may import the pool early.

Most Proxmox “not bootable” incidents are one of these:

  • Firmware silently reset to default (often toggling UEFI/legacy behavior).
  • Boot order changed after adding/removing disks.
  • ESP got overwritten, not mounted, or created on only one disk in a mirror.
  • GRUB installed to the wrong device (common after disk replacement).
  • UEFI NVRAM entries wiped (firmware update, CMOS reset, some remote-management shenanigans).

Joke #1: Firmware boot order is like office seating: you can be “the important one” until Facilities moves a plant.

Interesting facts and historical context (because the past keeps breaking your present)

  • BIOS predates your career: IBM PC BIOS conventions date back to the early 1980s, and a lot of “boot magic” is still backwards-compatible duct tape.
  • UEFI was meant to replace BIOS: it formalized boot as loading EFI binaries from a FAT filesystem, not from opaque boot sectors.
  • GPT vs MBR isn’t just partition count: GPT pairs naturally with UEFI, while legacy BIOS booting from GPT needs a dedicated “BIOS boot partition” for GRUB.
  • The ESP is just FAT32: your “high availability virtualization host” often boots from a filesystem designed for cameras and USB sticks.
  • NVRAM boot entries are fragile: firmware updates, CMOS resets, and sometimes power events can wipe them, even while disks remain perfect.
  • Secure Boot changed expectations: machines increasingly default to Secure Boot enabled; Proxmox deployments often disable it unless you manage signed boot chains.
  • Linux bootloaders evolved: LILO gave way to GRUB, then GRUB2; each improved flexibility while adding new ways to misconfigure a boot environment.
  • ZFS boot is “supported,” not “simplified”: ZFS adds reliability for data, but boot still depends on tiny non-ZFS pieces: ESPs, GRUB, initramfs modules.
  • Device naming is not stable: /dev/sda today can be /dev/sdb tomorrow after HBA changes; modern practice leans on by-id paths for a reason.
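
A quick way to see that mapping for yourself (the model and serial strings below are illustrative):

cr0x@server:~$ ls -l /dev/disk/by-id/ata-* | grep -v part
lrwxrwxrwx 1 root root 9 Dec 26 10:58 /dev/disk/by-id/ata-INTEL_SSDSC2KB480G8_BTWL1234567 -> ../../sda
lrwxrwxrwx 1 root root 9 Dec 26 10:58 /dev/disk/by-id/ata-INTEL_SSDSC2KB480G8_BTWL7654321 -> ../../sdb

The by-id names survive cabling and HBA changes; the /dev/sdX targets they point at do not.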

Practical tasks: commands, what the output means, and the decision you make

These are field tasks you can run from a Proxmox shell, a Debian live ISO, or the Proxmox installer in rescue mode (where available). The goal is to move from “won’t boot” to “I know exactly which layer failed.”

Task 1: Confirm whether you booted the rescue environment in UEFI or BIOS mode

cr0x@server:~$ [ -d /sys/firmware/efi ] && echo UEFI || echo BIOS
UEFI

Meaning: If this prints UEFI, your rescue session is UEFI. That should match how you intend the node to boot.

Decision: If the machine used to boot in legacy BIOS and now you’re in UEFI (or vice versa), fix firmware settings before you touch disks.

Task 2: See what disks the OS can see (hardware sanity check)

cr0x@server:~$ lsblk -e7 -o NAME,SIZE,MODEL,SERIAL,TYPE,FSTYPE,MOUNTPOINTS
NAME      SIZE MODEL            SERIAL           TYPE FSTYPE MOUNTPOINTS
sda     447.1G INTEL SSDSC2KB   BTWL1234567      disk
├─sda1    512M                  -                part vfat
├─sda2   1007K                  -                part
└─sda3  446.6G                  -                part zfs_member
sdb     447.1G INTEL SSDSC2KB   BTWL7654321      disk
├─sdb1    512M                  -                part vfat
├─sdb2   1007K                  -                part
└─sdb3  446.6G                  -                part zfs_member

Meaning: Disks are visible, partitions exist, and you can already spot an ESP (vfat) and ZFS members.

Decision: If your boot disk isn’t listed at all, stop and fix cabling/HBA/BIOS detection. Bootloader work won’t summon missing hardware.

Task 3: Identify how the disk is partitioned (GPT vs MBR) and whether an ESP exists

cr0x@server:~$ parted -l
Model: ATA INTEL SSDSC2KB (scsi)
Disk /dev/sda: 480GB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Disk Flags:

Number  Start   End    Size   File system  Name  Flags
 1      1049kB  538MB  537MB  fat32              boot, esp
 2      538MB   539MB  1049kB                   bios_grub
 3      539MB   480GB  479GB

Model: ATA INTEL SSDSC2KB (scsi)
Disk /dev/sdb: 480GB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Disk Flags:

Number  Start   End    Size   File system  Name  Flags
 1      1049kB  538MB  537MB  fat32              boot, esp
 2      538MB   539MB  1049kB                   bios_grub
 3      539MB   480GB  479GB

Meaning: GPT disks with both esp and bios_grub are common in Proxmox installs that aim to be flexible. The ESP flag is the important part for UEFI boot.

Decision: If there is no ESP (boot, esp) but the server is in UEFI mode, you’ve found the mismatch.

Task 4: Check the filesystem integrity of the ESP (it’s FAT; treat it like it’s 1998)

cr0x@server:~$ fsck.vfat -n /dev/sda1
fsck.fat 4.2 (2021-01-31)
/dev/sda1: 12 files, 1646/130560 clusters

Meaning: Read-only check passed; no obvious FAT corruption.

Decision: If you see FAT errors, plan to repair (-a) or recreate the ESP and reinstall EFI files.
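
If a repair is needed, a minimal sketch, assuming /dev/sda1 is the damaged ESP:

cr0x@server:~$ fsck.vfat -a /dev/sda1
fsck.fat 4.2 (2021-01-31)
/dev/sda1: 12 files, 1646/130560 clusters

If the FAT is beyond repair, mkfs.vfat -F 32 /dev/sda1 recreates it empty; that erases the ESP contents, so plan to reinstall the EFI files afterwards (Tasks 10 and 15).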

Task 5: Mount the ESP and verify EFI binaries exist

cr0x@server:~$ mkdir -p /mnt/esp && mount /dev/sda1 /mnt/esp && find /mnt/esp/EFI -maxdepth 2 -type f | sed -n '1,10p'
/mnt/esp/EFI/BOOT/BOOTX64.EFI
/mnt/esp/EFI/proxmox/grubx64.efi
/mnt/esp/EFI/debian/grubx64.efi

Meaning: Files exist; there’s a fallback EFI/BOOT/BOOTX64.EFI and vendor paths.

Decision: If /mnt/esp/EFI is empty or missing, reinstall bootloader and consider creating the fallback path.

Task 6: Inspect UEFI NVRAM boot entries (the firmware “bookmark list”)

cr0x@server:~$ efibootmgr -v
BootCurrent: 0002
Timeout: 2 seconds
BootOrder: 0002,0001,0000
Boot0000* UEFI: PXE IPv4 Intel(R) I350
Boot0001* UEFI: Built-in EFI Shell
Boot0002* proxmox	HD(1,GPT,2b0c3f1a-9b2f-4a7e-9d64-3f3c3a0e2f21,0x800,0x100000)/File(\EFI\proxmox\grubx64.efi)

Meaning: There is a Proxmox entry and it’s first. Great—unless it points to the wrong disk/partition UUID.

Decision: If there is no Proxmox entry, or it points to a dead disk, recreate it with efibootmgr after verifying the correct ESP.

Task 7: Import ZFS pool from rescue and confirm it’s healthy enough to boot

cr0x@server:~$ zpool import
   pool: rpool
     id: 1234567890123456789
  state: ONLINE
 action: The pool can be imported using its name or numeric identifier.
 config:

	rpool        ONLINE
	  mirror-0    ONLINE
	    sda3      ONLINE
	    sdb3      ONLINE

Meaning: The root pool exists and is importable.

Decision: If the pool is DEGRADED, you can often still boot, but you should treat boot repair as a chance to fix redundancy too.

Task 8: Mount the installed system and prepare for chroot repair

cr0x@server:~$ zpool import -R /mnt rpool
cr0x@server:~$ zfs mount -a
cr0x@server:~$ mount | grep -E '(/mnt|rpool)' | sed -n '1,8p'
rpool/ROOT/pve-1 on /mnt type zfs (rw,relatime,xattr,posixacl)
rpool/data on /mnt/rpool/data type zfs (rw,relatime,xattr,posixacl)

Meaning: Your root dataset is mounted at /mnt.

Decision: If mounts fail, don’t proceed to reinstall GRUB. Fix ZFS import/mount first.

Task 9: Bind-mount runtime filesystems and chroot

cr0x@server:~$ for i in /dev /dev/pts /proc /sys /run; do mount --bind $i /mnt$i; done
cr0x@server:~$ chroot /mnt /bin/bash
root@server:/# cat /etc/os-release | head -2
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"

Meaning: You’re inside the installed OS environment; bootloader tools will behave correctly.

Decision: If chroot fails due to missing binaries, your root filesystem is not what you think it is.

Task 10 (UEFI): Ensure ESP is mounted at /boot/efi and reinstall GRUB EFI

root@server:/# mkdir -p /boot/efi
root@server:/# mount /dev/sda1 /boot/efi
root@server:/# grub-install --target=x86_64-efi --efi-directory=/boot/efi --bootloader-id=proxmox
Installing for x86_64-efi platform.
Installation finished. No error reported.

Meaning: GRUB EFI binaries were installed into the ESP under the proxmox label.

Decision: If this errors about “EFI variables not supported,” you likely booted the rescue environment in BIOS mode. Reboot rescue in UEFI.

Task 11 (UEFI): Regenerate GRUB menu and initramfs

root@server:/# update-grub
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-6.8.12-2-pve
Found initrd image: /boot/initrd.img-6.8.12-2-pve
done
root@server:/# update-initramfs -u -k all
update-initramfs: Generating /boot/initrd.img-6.8.12-2-pve

Meaning: Boot menu points at real kernels; initramfs is rebuilt.

Decision: If kernels aren’t found, you may be chrooted into the wrong root dataset or /boot isn’t mounted correctly.

Task 12 (UEFI): Recreate a missing NVRAM entry

root@server:/# efibootmgr -c -d /dev/sda -p 1 -L "proxmox" -l '\EFI\proxmox\grubx64.efi'
BootCurrent: 0002
BootOrder: 0002,0001,0000,0003
Boot0003* proxmox	HD(1,GPT,2b0c3f1a-9b2f-4a7e-9d64-3f3c3a0e2f21,0x800,0x100000)/File(\EFI\proxmox\grubx64.efi)

Meaning: You created a new entry. It might not be first yet.

Decision: Set BootOrder so proxmox is first, or rely on the fallback loader as belt-and-suspenders.
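
If you go the BootOrder route, a minimal sketch using the entry numbers from the output above (yours will differ):

root@server:/# efibootmgr -o 0003,0002,0001,0000
BootCurrent: 0002
Timeout: 2 seconds
BootOrder: 0003,0002,0001,0000
Boot0000* UEFI: PXE IPv4 Intel(R) I350
Boot0001* UEFI: Built-in EFI Shell
Boot0002* proxmox
Boot0003* proxmox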

Task 13 (BIOS): Reinstall GRUB to the disk MBR (or BIOS boot area)

root@server:/# grub-install /dev/sda
Installing for i386-pc platform.
Installation finished. No error reported.
root@server:/# update-grub
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-6.8.12-2-pve
Found initrd image: /boot/initrd.img-6.8.12-2-pve
done

Meaning: Legacy GRUB installed to /dev/sda.

Decision: If you’re on a mirror, install GRUB to each member disk you might boot from.
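
On a two-disk mirror that is just a second invocation, assuming /dev/sdb is the other member:

root@server:/# grub-install /dev/sdb
Installing for i386-pc platform.
Installation finished. No error reported.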

Task 14: Verify GRUB sees the right root and doesn’t reference /dev/sdX names

root@server:/# grep -R "root=" /boot/grub/grub.cfg | sed -n '1,5p'
linux	/boot/vmlinuz-6.8.12-2-pve root=ZFS=rpool/ROOT/pve-1 ro quiet

Meaning: Root is referenced via ZFS dataset, not unstable disk paths.

Decision: If you see root=/dev/sda3 on systems where naming can change, fix /etc/default/grub and regenerate.
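
A minimal sketch of that fix on a non-ZFS root, assuming the stale root=/dev/sda3 came from GRUB_DISABLE_LINUX_UUID being set (one common cause; yours may differ):

root@server:/# grep GRUB_DISABLE_LINUX_UUID /etc/default/grub
GRUB_DISABLE_LINUX_UUID=true
root@server:/# sed -i 's/^GRUB_DISABLE_LINUX_UUID=true/GRUB_DISABLE_LINUX_UUID=false/' /etc/default/grub
root@server:/# update-grub

With that variable false (or absent), update-grub writes root=UUID=... instead of a device name that can move.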

Task 15: Build a fallback EFI path (because NVRAM isn’t to be trusted)

root@server:/# mkdir -p /boot/efi/EFI/BOOT
root@server:/# cp -f /boot/efi/EFI/proxmox/grubx64.efi /boot/efi/EFI/BOOT/BOOTX64.EFI
root@server:/# ls -l /boot/efi/EFI/BOOT/BOOTX64.EFI
-rwx------ 1 root root 1536000 Dec 26 11:11 /boot/efi/EFI/BOOT/BOOTX64.EFI

Meaning: Even if the firmware forgets NVRAM entries, it can often boot the default path.

Decision: Do this on each ESP you expect to boot from.
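
A sketch for the second ESP, assuming /dev/sdb1 is the mirror partner's ESP and you are still in the chroot; --no-nvram installs the files without touching firmware entries:

root@server:/# mkdir -p /mnt/esp2 && mount /dev/sdb1 /mnt/esp2
root@server:/# grub-install --target=x86_64-efi --efi-directory=/mnt/esp2 --bootloader-id=proxmox --no-nvram
Installing for x86_64-efi platform.
Installation finished. No error reported.
root@server:/# mkdir -p /mnt/esp2/EFI/BOOT
root@server:/# cp -f /mnt/esp2/EFI/proxmox/grubx64.efi /mnt/esp2/EFI/BOOT/BOOTX64.EFI
root@server:/# umount /mnt/esp2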

Task 16: After reboot, confirm the system actually booted via the intended mode

cr0x@server:~$ dmesg | grep -i "efi\|UEFI" | sed -n '1,6p'
[    0.000000] efi: EFI v2.70 by American Megatrends
[    0.000000] efi: ACPI=0x9bfe6000 ACPI 2.0=0x9bfe6014 SMBIOS=0x9c01c000

Meaning: Kernel sees EFI; you’re in UEFI boot.

Decision: If you intended legacy boot and see EFI, you’re back to “mixed-mode”; fix it before the next maintenance window fixes you.

UEFI vs BIOS: choosing the right recovery path

The simplest rule

If the machine is capable of UEFI (most are), prefer UEFI for new installs. But don’t “convert” on a live incident unless you enjoy surprise downtime. Recover in the mode it was installed, then plan a controlled migration if you must.

How Proxmox typically lays things out

Proxmox VE rides on Debian. The installer will create partitions for:

  • ESP (usually 512MB FAT32) when using UEFI.
  • bios_grub tiny partition (around 1MB) when booting BIOS from GPT, so GRUB has a place to embed.
  • Root filesystem on ext4, LVM, or ZFS (common in modern Proxmox setups).

UEFI failure modes that look like “disk is not bootable”

  • ESP exists but isn’t used: boot order points to PXE or some “UEFI: USB” placeholder above your disk.
  • NVRAM entry missing: the firmware forgot your Proxmox boot entry. Disk is fine; firmware memory isn’t.
  • Wrong disk path in NVRAM: you replaced a mirrored disk and the boot entry still points to the old drive’s GPT UUID.
  • ESP only on one disk: ZFS mirror keeps your data safe, but you still have one ESP. Lose that disk, lose boot.

BIOS failure modes that look like “disk is not bootable”

  • GRUB installed to the wrong disk: the system boots the first disk in BIOS order, but GRUB is on the second.
  • MBR overwritten: a disk cloning operation, an installer, or a “helpful” vendor tool stomped the boot sector.
  • GPT without bios_grub: legacy GRUB can’t embed its core image properly.

Joke #2: The BIOS doesn’t hate you. It just expresses love by ignoring your last three configuration changes.

Proxmox + ZFS: where boot redundancy helps, and where it absolutely doesn’t

ZFS makes root storage robust. It does not automatically make boot robust. Your ZFS mirror can happily keep serving perfect blocks while your firmware stares at an empty NVRAM list like it’s never met your server before.

What ZFS protects

  • Root datasets, VM disks stored on the pool, metadata consistency, and detection of silent corruption.
  • Operational recovery: replace a failed disk, resilver, keep running.

What ZFS does not protect

  • The ESP, unless you created ESPs on multiple disks and kept them in sync.
  • UEFI NVRAM entries, because they live in firmware, not on disk.
  • Firmware boot order, because it’s a firmware setting and it will reset at the worst moment.

What you should do on mirrored boot disks

If you have two disks and you expect to survive losing either one, you need two bootable paths:

  • Two ESPs (one per disk) and a plan to keep them consistent.
  • UEFI entries for both, or a robust fallback loader in EFI/BOOT/BOOTX64.EFI on both.
  • If booting legacy BIOS: GRUB installed to both disks.

In Proxmox environments, the boring pragmatic approach is: ESP per disk + fallback loader + periodic drift check. Fancy is optional.
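
A drift check can be as dull as mounting both ESPs read-only and diffing them (device names as in the tasks above; adjust to your layout):

cr0x@server:~$ mkdir -p /tmp/esp-a /tmp/esp-b
cr0x@server:~$ mount -o ro /dev/sda1 /tmp/esp-a && mount -o ro /dev/sdb1 /tmp/esp-b
cr0x@server:~$ diff -rq /tmp/esp-a/EFI /tmp/esp-b/EFI && echo "ESPs match"
ESPs match
cr0x@server:~$ umount /tmp/esp-a /tmp/esp-b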

Common mistakes (symptom → root cause → fix)

1) Symptom: “No bootable device” after a firmware update

Root cause: UEFI NVRAM boot entries got wiped or reordered; PXE or “UEFI Shell” moved ahead.

Fix: Use firmware setup to select the correct UEFI entry. If missing, recreate with efibootmgr from rescue and copy fallback BOOTX64.EFI.

2) Symptom: Boots only when a specific disk is present (mirror exists, but one disk is “special”)

Root cause: Only one disk has an ESP or only one disk has GRUB installed.

Fix: Create ESP on the other disk, mount it, install GRUB EFI to it (or GRUB BIOS to the disk), then add fallback path.
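
A sketch of creating the missing ESP with sgdisk, assuming /dev/sdb is the second disk and partition 1 is free (match your first disk's layout); the GRUB EFI install and fallback copy from the tasks above then apply:

root@server:/# sgdisk -n 1:2048:+512M -t 1:EF00 -c 1:"EFI system partition" /dev/sdb
The operation has completed successfully.
root@server:/# mkfs.vfat -F 32 /dev/sdb1
mkfs.fat 4.2 (2021-01-31)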

3) Symptom: “EFI variables are not supported” during repair

Root cause: You booted the rescue environment in BIOS mode, so efivars aren’t available.

Fix: Reboot the installer/live ISO in UEFI mode and repeat the GRUB EFI install.

4) Symptom: GRUB loads, then drops to “grub rescue>”

Root cause: GRUB can’t find its modules or config due to moved partitions, changed UUIDs, or /boot not where it expects.

Fix: From rescue, chroot and rerun grub-install + update-grub. Verify /etc/fstab and that ESP mounts correctly.

5) Symptom: “Not a bootable disk” right after adding a new disk

Root cause: Boot order changed; firmware now tries the new disk first. Some HBAs also reorder disks.

Fix: Reorder boot targets; disable booting from non-OS disks if firmware supports it.

6) Symptom: Works after manual boot selection, fails on the next reboot

Root cause: Firmware stores a one-time override but not permanent order; or your permanent setting isn’t saving due to CMOS battery issues.

Fix: Set persistent boot order; check BIOS event logs; replace CMOS battery if settings drift.

7) Symptom: After disk replacement, system won’t boot but ZFS pool looks fine

Root cause: New disk lacks ESP/bootloader, or NVRAM entry points to old disk’s partition GUID.

Fix: Partition the new disk appropriately, create ESP, install GRUB, update efiboot entries, and add fallback loader.

8) Symptom: Secure Boot complaints or “invalid signature”

Root cause: Secure Boot enabled but your boot chain isn’t signed in a way that firmware accepts.

Fix: Disable Secure Boot for typical Proxmox deployments unless you have a managed signed boot workflow.

Checklists / step-by-step plan

Checklist A: What to collect before you start changing things

  1. Was the node installed in UEFI or BIOS mode?
  2. Did anything change? Firmware update, disk replacement, new HBA, cable move, power event, remote hands?
  3. Is there console access (iKVM/iDRAC/iLO)? SSH won't help on a host that can't boot.
  4. What storage layout: ZFS mirror, ZFS RAIDZ, ext4, LVM-thin?
  5. Which disk is supposed to be bootable? Identify by serial, not by /dev/sdX.

Checklist B: Minimal recovery (UEFI)

  1. Boot rescue media in UEFI mode.
  2. Confirm disks visible with lsblk.
  3. Verify ESP exists and is FAT32 with parted -l.
  4. Import ZFS pool (if applicable) and mount root at /mnt.
  5. Bind mount /dev, /proc, /sys, /run and chroot.
  6. Mount ESP at /boot/efi.
  7. Run grub-install --target=x86_64-efi and update-grub.
  8. Create/verify NVRAM entry with efibootmgr.
  9. Copy fallback loader to EFI/BOOT/BOOTX64.EFI.
  10. Reboot and verify boot mode via dmesg.

Checklist C: Minimal recovery (legacy BIOS)

  1. Boot rescue media (mode doesn’t matter as much, but be consistent).
  2. Confirm disk presence and partitioning; if GPT, ensure bios_grub exists.
  3. Mount root filesystem (or import ZFS) and chroot.
  4. Run grub-install /dev/sdX to the correct disk.
  5. Run update-grub.
  6. If mirrored, repeat grub-install on the other disk(s).
  7. Set BIOS boot order to a disk that actually has GRUB.

Checklist D: Post-recovery hardening (do this while you still remember the pain)

  1. Make ESPs redundant across mirrored boot disks.
  2. Write down firmware boot mode and required BIOS settings in runbooks.
  3. Disable boot from data disks and random removable media in firmware.
  4. After any disk replacement, reinstall bootloader onto the new disk as part of the procedure.
  5. Add a periodic “boot chain audit” task (ESP contents + efibootmgr output + grub-install status).
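
A minimal sketch of such an audit script (the ESP device list is an assumption; adjust it per host):

cr0x@server:~$ cat /usr/local/sbin/boot-chain-audit
#!/bin/sh
# Boot chain audit sketch: record firmware entries and hash the EFI binaries
# on each ESP so drift shows up in a diff, not in an outage.
set -eu
echo "== efibootmgr =="
efibootmgr -v
for esp in /dev/sda1 /dev/sdb1; do
    echo "== $esp =="
    mnt=$(mktemp -d)
    mount -o ro "$esp" "$mnt"
    find "$mnt/EFI" -type f -exec sha256sum {} +
    umount "$mnt"
    rmdir "$mnt"
done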

Three corporate mini-stories from the boot-failure trenches

Mini-story 1: The incident caused by a wrong assumption

They had a Proxmox cluster where every node was “identical.” Identical is a dangerous word in production because it encourages people to stop looking. One node refused to boot after a planned outage: “not a bootable disk.” The on-call assumed disk failure and opened a ticket for a replacement. Reasonable, except the disks were fine.

The actual root cause was a firmware reset during maintenance. The server flipped from legacy BIOS to UEFI-first, and the installed system was legacy. The disks were GPT with a BIOS boot partition, and GRUB was correctly installed for i386-pc. But the firmware only cared about UEFI binaries on an ESP that didn’t exist.

Recovery was almost insulting in its simplicity: set firmware back to legacy, reboot, node returns. The “incident” wasn’t storage corruption or an OS update. It was a settings mismatch created by the assumption that “UEFI is on everywhere.”

What changed afterward was process, not technology. They added a one-page hardware profile per node: boot mode, secure boot state, RAID/HBA mode, and the expected boot disk serial. It wasn’t glamorous. It also prevented a repeat.

Mini-story 2: The optimization that backfired

A different team wanted faster boot times after kernel updates. They trimmed what they considered “extra” boot artifacts. The ESP was resized down to something tiny, and they removed fallback EFI files because “we already have a proper NVRAM entry.” They also standardized on a single ESP on the first disk of a ZFS mirror, because maintaining two felt like “waste.”

Then a disk failed. ZFS did its job; the pool stayed online. They swapped the disk, resilvered, and scheduled a reboot to validate everything. On reboot, the firmware tried to boot from the remaining disk (now first in the firmware’s eyes) and couldn’t find a valid EFI path. The only ESP was on the removed disk. The fallback path had been deleted during the optimization.

The downtime wasn’t catastrophic, but it was unnecessary and annoying. Most of the time was spent getting remote hands to mount media and provide console access, not actually fixing software. The fix was a proper mirrored ESP strategy and reinstating the default EFI fallback binary.

The lesson wasn’t “never optimize.” It was that boot resilience is a system property, not a disk property. ZFS redundancy doesn’t cover firmware amnesia or the lack of boot artifacts on the surviving device.

Mini-story 3: The boring but correct practice that saved the day

An enterprise had a habit that looked excessive: after any disk replacement on a Proxmox host, the procedure included reinstalling bootloaders and verifying both boot paths. Every time. No exceptions. It was the kind of step people roll their eyes at—until it pays rent.

One weekend, a node lost power, then refused to boot. The console showed the firmware scanning drives and giving up. The on-call followed the playbook: verify UEFI mode, inspect ESPs on both disks, and check efibootmgr. The NVRAM entries were gone, likely wiped by a firmware quirk during power recovery.

Normally that would have meant reconstructing entries manually. But their boring practice included copying a fallback loader to EFI/BOOT/BOOTX64.EFI on both ESPs. The firmware, having forgotten everything else, still knew the default path. It booted without anyone touching NVRAM.

They restored proper boot entries afterward, but the key point is that recovery happened quickly, without special access, and without guesswork. It wasn’t heroics. It was routine discipline that made the system less dramatic.

FAQ

1) Is “not a bootable disk” usually a dead drive?

No. Dead drives happen, but the common reality is firmware boot order or mode mismatch. Confirm disk detection first, then boot mode.

2) Can I fix this by reinstalling Proxmox?

You can, but you shouldn’t. Reinstalling is a blunt instrument that risks wiping storage or breaking cluster expectations. Repair boot first; it’s faster and safer when done carefully.

3) How do I know if Proxmox was installed in UEFI mode?

If the installed system has /sys/firmware/efi when running, it booted via UEFI. On disk, look for an ESP (FAT32, flagged esp).

4) Why does replacing a disk in a ZFS mirror break boot?

ZFS mirrors your data partitions, not automatically your ESP or firmware boot entries. If the ESP or bootloader was only on the removed disk, the remaining disk won’t boot by itself.

5) Do I need both an ESP and a bios_grub partition?

Not always. UEFI needs an ESP. Legacy BIOS booting from GPT benefits from a bios_grub partition. Some installers create both for flexibility. The key is: the active boot mode must have its required pieces.

6) What’s the quickest “just boot something” move if NVRAM entries are broken?

Create the fallback path EFI/BOOT/BOOTX64.EFI on the ESP. Many firmware implementations will boot it even with empty NVRAM entries.

7) Should I enable Secure Boot on Proxmox?

If you don’t have a planned, tested signed boot chain, Secure Boot tends to create boot failures during updates or repairs. Many Proxmox deployments disable it deliberately.

8) Why does grub-install say “EFI variables are not supported”?

Because you’re not booted in UEFI mode (or efivars aren’t available). Boot the rescue media in UEFI mode so /sys/firmware/efi exists and try again.

9) Do I need to reinstall GRUB after every kernel update?

Usually no. Kernel updates regenerate GRUB config and initramfs automatically. Reinstall GRUB when disks change, ESPs are rebuilt, or bootloader files are missing.

10) Can I keep boot entries stable across hardware changes?

Some stability comes from using correct disk partition GUIDs and keeping ESPs consistent, but firmware behavior varies. That’s why fallback loaders and redundant ESPs are practical.

Conclusion: practical next steps

If you’re staring at “not a bootable disk,” treat it like an incident at the firmware/boot boundary, not an OS reinstall invitation. Start with mode and order. Then inspect partitions and ESP contents. Only then reinstall GRUB or recreate EFI entries. That sequence avoids the two classic time sinks: repairing the wrong layer, and “fixing” a system into a different boot mode without intending to.

After you’re back up, do the unexciting hardening: make ESPs redundant on mirrored disks, copy a fallback EFI loader, and document the expected firmware settings per node. The next reboot should be boring. Production loves boring.
