Partition resizes are supposed to be boring. You add a few gigs, grow a filesystem, and move on with your day. Then you reboot and get a blinking cursor, a GRUB rescue prompt, or an initramfs shell asking philosophical questions like “where is /?”.
This is the field guide for that moment: the one where production is down, you’re holding a Live USB like a defibrillator, and someone in chat is typing “can we just roll back the partition?” as if partitions have undo buttons.
Rules of engagement (so you don’t make it worse)
Partition recovery isn’t about heroics. It’s about controlling writes. Most “resize disasters” become unrecoverable because someone keeps “trying stuff” that modifies metadata on the wrong device.
Do this first
- Stop writing. If this is a VM, take a snapshot at the hypervisor level now. If it’s physical, clone the disk or image it if you can.
- Work from a Live environment. Don’t boot the broken OS and “hope it heals.” Boot a rescue ISO so you can mount things read-only and think clearly.
- Identify the exact target disk. “/dev/sda” is not a personality trait. Use serials, WWNs, and stable identifiers.
- Assume layer interactions. Partition tables, LVM, mdraid, LUKS, filesystems, bootloaders, fstab/initramfs—resizes can disturb any of them.
One short joke, as a palate cleanser: Partitions are like tattoos—easy to get, hard to remove, and they never look better after an improvised session.
Fast diagnosis playbook
This is the “get your bearings in under 10 minutes” sequence. The order matters because it narrows failure domains fast.
First: do we even see the disk and partitions?
- From the rescue environment: enumerate block devices and partition tables.
- Decision: if the partition table is unreadable, you’re doing GPT/MBR repair before anything else.
Second: can we find the root filesystem and does it mount read-only?
- Try mounting likely root partitions read-only.
- Decision: if mounting fails, pivot to filesystem repair and check for shifted starts/sizes.
Third: if root mounts, why doesn’t it boot?
- Check /etc/fstab, UUIDs, LVM/mdraid assembly, and bootloader configuration.
- Decision: if UUIDs changed, you fix references and rebuild initramfs/GRUB, not the filesystem.
Fourth: UEFI vs BIOS mismatch?
- Verify if the system boots in UEFI mode, and whether the EFI System Partition exists and is mounted.
- Decision: if ESP is missing/corrupt/unmounted, fix it and reinstall GRUB to EFI.
Fifth: confirm the “resize” wasn’t actually a move
- Compare partition start sectors to what they used to be (if you have any historical data, backups, or CMDB notes).
- Decision: if the start sector moved unexpectedly, treat it like data recovery: minimize writes and restore correct geometry.
Interesting facts and a little history
- MBR’s 2 TiB limit isn’t a “Linux problem.” Classic MBR uses 32-bit sector counts, which caps addressable space on 512-byte sectors.
- GPT keeps two copies of its header and partition table: a primary at the beginning and a backup at the end of the disk. This saves you when “the beginning got weird.”
- UEFI made the EFI System Partition (ESP) a first-class citizen. Many boot failures after resizes are actually “ESP not mounted” failures.
- GRUB2 got more modular than legacy GRUB. That flexibility is also why “grub rescue>” exists: it can’t find its modules or config, so it panics politely.
- ext4 can grow online in many cases, but shrinking is still a careful offline dance. People remember “grow is safe” and forget “shrink is a knife fight.”
- XFS famously can’t shrink (by design). If you “shrunk the partition” under XFS, you didn’t shrink XFS; you just cut the floor out from under it.
- LVM was built for change (PV/VG/LV resizing), but it doesn’t exempt you from partition-table correctness. It just adds another layer to be correct about.
- Linux started embracing UUID-based mounts to avoid device-name churn. That’s great—until a resize/regenerate changes identifiers and your fstab becomes historical fiction.
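The first fact above is easy to sanity-check with shell arithmetic; the numbers below are just the 32-bit sector math, nothing system-specific:

```shell
# MBR stores partition start/size as 32-bit sector counts (max 2^32 sectors).
# With 512-byte sectors, the addressable ceiling works out to:
echo $(( 4294967296 * 512 ))   # 2199023255552 bytes, i.e. exactly 2 TiB
```

Swap in 4096-byte sectors and the same table addresses 16 TiB, which is why some 4Kn disks dodge the limit without GPT.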
One quote, because reliability engineering has been learning the same lesson for decades. John Allspaw’s well-known framing is often paraphrased as: “Incidents come from normal work in complex systems; blame doesn’t fix the system.”
What actually breaks during a resize
“Resize” sounds singular. In practice it’s a multi-step pipeline that touches independent metadata domains. A failure in any step can brick boot.
1) Partition table geometry changes
If the partition start moves, everything above it is shifted. Filesystems and LVM store their own expectations about where they live. Shift the start and you’re pointing the OS at the wrong offset on disk. Sometimes the data is still there; you’re just looking at the wrong address.
2) Filesystem metadata no longer matches the block device
For ext4, you can end up with a filesystem that believes it has N blocks, but the partition now offers N-Δ. Mounts fail, fsck complains, or worse: mounts succeed and writes land in the void beyond the new end.
For XFS, “shrink under it” equals corruption risk. XFS expects the block device not to get smaller. When it does, you get I/O errors and log replay failures.
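A quick way to reason about this mismatch is plain arithmetic: filesystem blocks times block size versus partition sectors times sector size. The values below are made-up stand-ins; in real life they come from dumpe2fs and sfdisk:

```shell
# Sanity check with hypothetical numbers: does the filesystem still fit
# inside the (possibly shrunken) partition?
fs_blocks=20971520      # e.g. from: dumpe2fs -h <dev>  ("Block count")
fs_block_size=4096      # e.g. from: dumpe2fs -h <dev>  ("Block size")
part_sectors=167770112  # e.g. from: sfdisk -d <disk>   (hypothetical value)
sector_size=512

fs_bytes=$(( fs_blocks * fs_block_size ))
part_bytes=$(( part_sectors * sector_size ))
if [ "$fs_bytes" -le "$part_bytes" ]; then
  echo "filesystem fits"
else
  echo "TRUNCATED: filesystem extends $(( fs_bytes - part_bytes )) bytes past the partition end"
fi
```

With these example numbers the filesystem believes it owns 1 MiB that no longer exists, which is exactly the "writes land in the void" scenario.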
3) Bootloader paths and IDs drift
GRUB often references filesystem UUIDs or searches for specific partitions. UEFI boot entries point to an EFI binary on the ESP. If the ESP moved, was reformatted, or isn’t mounted during updates, your system can boot into a void.
4) initramfs can’t assemble your storage stack
If your root lives on LVM, mdraid, or LUKS, the early boot environment must assemble it. Resizing can change device discovery order, confuse mdadm, or invalidate crypttab expectations. The kernel boots, then drops you into initramfs because it can’t find /dev/mapper/root.
Runbook: 14 concrete tasks with commands, outputs, and decisions
These tasks assume a Linux rescue environment. Commands are written like you’re on the console with root privileges available (via sudo or direct root). Replace device names carefully.
Task 1: List disks and partitions (trust but verify)
cr0x@server:~$ lsblk -o NAME,SIZE,TYPE,FSTYPE,FSVER,FSUUID,MOUNTPOINTS,MODEL,SERIAL
NAME SIZE TYPE FSTYPE FSVER FSUUID MOUNTPOINTS MODEL SERIAL
sda 477G disk Samsung_SSD S4X...
├─sda1 512M part vfat FAT32 1A2B-3C4D /boot/efi
├─sda2 1G part ext4 1.0 2c1d... /boot
└─sda3 475.5G part LVM2_member hP0x...
├─vg0-root 80G lvm ext4 1.0 9f7a... /
└─vg0-var 50G lvm xfs 4e2b... /var
What it means: You can see the disk, partitions, their filesystems, and whether LVM is involved.
Decision: If the expected partitions are missing or sizes look wrong, focus on partition table integrity before filesystem repairs.
Task 2: Confirm partition table type and geometry
cr0x@server:~$ sudo parted -s /dev/sda print
Model: Samsung SSD (scsi)
Disk /dev/sda: 512GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:
Number Start End Size File system Name Flags
1 1049kB 538MB 537MB fat32 boot, esp
2 538MB 1612MB 1074MB ext4
3 1612MB 512GB 510GB
What it means: Start/end boundaries are in human units; the table is GPT.
Decision: If you expected MBR but see GPT (or vice versa), confirm firmware boot mode and how GRUB was installed. Mismatches cause ghost boots.
Task 3: Get exact start sectors (the only numbers that matter)
cr0x@server:~$ sudo sfdisk -d /dev/sda
label: gpt
device: /dev/sda
unit: sectors
first-lba: 34
last-lba: 1000215214
/dev/sda1 : start= 2048, size= 1048576, type=C12A7328-F81F-11D2-BA4B-00A0C93EC93B
/dev/sda2 : start= 1050624, size= 2097152, type=0FC63DAF-8483-4772-8E79-3D69D8477DE4
/dev/sda3 : start= 3147776, size= 996?..., type=E6D6D379-F507-44C2-A23C-238F2A3DF928
What it means: These are the true offsets. If a resize accidentally moved a start sector, this output is your evidence.
Decision: If start sectors changed from known-good values (from notes, backups, screenshots, automation state), stop and plan a geometry correction. Don’t run fsck yet; you might be fsck’ing the wrong offset.
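If you do have a saved dump, the comparison can be scripted instead of eyeballed. This is a sketch: the file paths and the drifted values in the "current" dump are fabricated to show the mechanism.

```shell
#!/usr/bin/env bash
# Sketch: detect start-sector drift between a saved `sfdisk -d` dump and the
# current one. Both dumps below are made-up stand-ins; in real use the
# known-good file comes from your pre-change record and the current one
# from `sfdisk -d /dev/sdX`.
set -eu

cat > /tmp/sda.known-good <<'EOF'
/dev/sda1 : start=        2048, size=     1048576, type=C12A7328-F81F-11D2-BA4B-00A0C93EC93B
/dev/sda3 : start=     3147776, size=   997067439, type=E6D6D379-F507-44C2-A23C-238F2A3DF928
EOF

cat > /tmp/sda.current <<'EOF'
/dev/sda1 : start=        2048, size=     1048576, type=C12A7328-F81F-11D2-BA4B-00A0C93EC93B
/dev/sda3 : start=     3147778, size=   997067437, type=E6D6D379-F507-44C2-A23C-238F2A3DF928
EOF

# Reduce each dump to "device start-sector" pairs.
starts() {
  awk -F'[ :,]+' '/start=/ { for (i = 1; i <= NF; i++) if ($i == "start=") print $1, $(i+1) }' "$1"
}

if diff <(starts /tmp/sda.known-good) <(starts /tmp/sda.current); then
  echo "start sectors unchanged"
else
  echo "START SECTOR DRIFT: fix the partition table before any fsck"
fi
```

Here /dev/sda3 drifted by two sectors, which is enough to make LVM and fsck look at garbage.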
Task 4: Check GPT consistency and recover backup header if needed
cr0x@server:~$ sudo sgdisk -v /dev/sda
Verifying disk /dev/sda
Problem: The secondary GPT header is not at the end of the disk.
Problem: The secondary partition table is not at the end of the disk.
Identified 2 problems!
What it means: Common after disk size changes or botched operations: the backup GPT isn’t where it should be.
Decision: If sgdisk -v reports header/table placement issues, plan to relocate with sgdisk -e (carefully). If it reports CRC errors, prioritize restoring GPT from backup header.
Task 5: Relocate the backup GPT to disk end (safe-ish fix for certain GPT warnings)
cr0x@server:~$ sudo sgdisk -e /dev/sda
Relocating backup data structures to the end of the disk
The operation has completed successfully.
What it means: GPT backup structures are moved to the correct end-of-disk location.
Decision: Do this only when you’re sure the disk size is correct and you’re not trying to preserve an older “end” for data recovery. When in doubt: image the disk first.
Task 6: Probe filesystem signatures (are we looking at the right thing?)
cr0x@server:~$ sudo blkid /dev/sda1 /dev/sda2 /dev/sda3
/dev/sda1: UUID="1A2B-3C4D" BLOCK_SIZE="512" TYPE="vfat" PARTLABEL="EFI System Partition" PARTUUID="..."
/dev/sda2: UUID="2c1d..." BLOCK_SIZE="4096" TYPE="ext4" PARTUUID="..."
/dev/sda3: TYPE="LVM2_member" PARTUUID="..."
What it means: blkid reads on-disk superblocks. If it can’t identify something you expect, you might have wrong offsets or corruption.
Decision: If root’s supposed to be ext4 but blkid says “unknown,” don’t format. You’re here to recover, not to create fresh disasters.
Task 7: Assemble mdraid if present (common in servers)
cr0x@server:~$ sudo mdadm --examine --scan
ARRAY /dev/md0 metadata=1.2 name=server:0 UUID=8b3c...
What it means: Arrays are discoverable. If this returns nothing, md metadata may be missing or devices renamed.
Decision: If arrays exist, assemble them before LVM/filesystem work.
cr0x@server:~$ sudo mdadm --assemble --scan
mdadm: /dev/md0 has been started with 2 drives.
Decision: If assembly fails with “device busy” or “wrong uuid,” stop and inspect members individually; don’t force assemble unless you’re sure which members are current.
Task 8: Activate LVM (if your root is on LVs)
cr0x@server:~$ sudo pvscan
PV /dev/sda3 VG vg0 lvm2 [475.50 GiB / 0 free]
Total: 1 [475.50 GiB] / in use: 1 [475.50 GiB] / in no VG: 0 [0 ]
cr0x@server:~$ sudo vgscan
Found volume group "vg0" using metadata type lvm2
cr0x@server:~$ sudo vgchange -ay
2 logical volume(s) in volume group "vg0" now active
What it means: LVM metadata is readable and LVs are activated.
Decision: If pvscan doesn’t see the PV, the partition start may have moved or the PV label got overwritten. Treat it as offset/geometry first, not “LVM is broken.”
Task 9: Try mounting root read-only (don’t be brave yet)
cr0x@server:~$ sudo mount -o ro /dev/vg0/root /mnt
cr0x@server:~$ ls /mnt
bin boot dev etc home lib lib64 opt root sbin usr var
What it means: The filesystem is mountable and likely intact.
Decision: If it mounts read-only, you can often repair boot without running fsck. If it doesn’t mount, capture the error message and pivot to filesystem checks.
Task 10: Validate fstab vs real UUIDs (classic post-resize failure)
cr0x@server:~$ sudo cat /mnt/etc/fstab
UUID=9f7a... / ext4 defaults 0 1
UUID=2c1d... /boot ext4 defaults 0 2
UUID=1A2B-3C4D /boot/efi vfat umask=0077 0 1
What it means: fstab is using UUIDs. Good. Now ensure they match what blkid reports.
Decision: If fstab references UUIDs that don’t exist anymore, you fix fstab (and rebuild initramfs if needed). Don’t “fsck harder.”
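The cross-check against blkid can be mechanical. This sketch fakes both inputs so it runs anywhere: the fstab content and the "present" UUID list are invented; in real use you would feed it /mnt/etc/fstab and the output of blkid -s UUID -o value.

```shell
#!/usr/bin/env bash
# Sketch: flag fstab entries whose UUID no filesystem on disk reports anymore.
# Both inputs below are made-up examples.
set -eu

cat > /tmp/fstab.example <<'EOF'
UUID=9f7a1111-2222-3333-4444-555566667777 /         ext4 defaults   0 1
UUID=2c1daaaa-bbbb-cccc-dddd-eeeeffff0000 /boot     ext4 defaults   0 2
UUID=DEAD-BEEF                            /boot/efi vfat umask=0077 0 1
EOF

# Real command would be: present=$(blkid -s UUID -o value)
present="9f7a1111-2222-3333-4444-555566667777
1A2B-3C4D"

while read -r spec mnt _rest; do
  case "$spec" in UUID=*) uuid=${spec#UUID=} ;; *) continue ;; esac
  if ! grep -qx "$uuid" <<<"$present"; then
    echo "STALE: $mnt references UUID $uuid, which no device reports"
  fi
done < /tmp/fstab.example
```

With these inputs, / is fine while /boot and /boot/efi get flagged as stale, which is the classic "fstab became historical fiction" failure.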
Task 11: Check what the initramfs expects (root device, crypt, md, lvm)
cr0x@server:~$ sudo grep -R "UUID=" -n /mnt/etc/default/grub /mnt/etc/crypttab /mnt/etc/fstab 2>/dev/null
/mnt/etc/fstab:1:UUID=9f7a... / ext4 defaults 0 1
/mnt/etc/fstab:2:UUID=2c1d... /boot ext4 defaults 0 2
What it means: You’re searching the config surface area that commonly breaks after resizes.
Decision: If crypttab references an old UUID for the LUKS container, initramfs won’t unlock root. Update it and rebuild initramfs.
Task 12: Chroot properly (so your repairs hit the installed system)
cr0x@server:~$ sudo mount --bind /dev /mnt/dev
cr0x@server:~$ sudo mount --bind /proc /mnt/proc
cr0x@server:~$ sudo mount --bind /sys /mnt/sys
cr0x@server:~$ sudo chroot /mnt /bin/bash
root@server:/# mount | head
/dev/mapper/vg0-root on / type ext4 (ro,relatime)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
What it means: You’re now operating inside the installed OS context.
Decision: If root is mounted read-only in chroot and you need to edit files, remount it read-write only after you trust filesystem integrity.
Task 13: Rebuild initramfs (because it caches storage reality)
root@server:/# update-initramfs -u -k all
update-initramfs: Generating /boot/initrd.img-6.8.0-31-generic
What it means: Early boot image is regenerated with current UUIDs/modules.
Decision: If you changed crypttab, mdadm config, or LVM settings, you rebuild initramfs. Skipping this is how you end up back at the initramfs prompt wondering why the universe hates you.
Task 14: Reinstall GRUB and regenerate config (BIOS and UEFI paths)
root@server:/# grub-install /dev/sda
Installing for i386-pc platform.
Installation finished. No error reported.
root@server:/# update-grub
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-6.8.0-31-generic
Found initrd image: /boot/initrd.img-6.8.0-31-generic
done
What it means: For BIOS systems, GRUB is installed to the disk MBR/embedding area. Then the config is regenerated.
Decision: If you’re actually UEFI booting, grub-install /dev/sda isn’t the right fix. You install to the ESP and ensure it’s mounted at /boot/efi. Verify boot mode before acting.
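The boot-mode check can gate which invocation you reach for. This sketch only prints the command instead of running it; /dev/sda and the bootloader id are example values:

```shell
# Sketch: print (don't run) the grub-install invocation that matches the
# current boot mode. Device name and --bootloader-id are examples.
if [ -d /sys/firmware/efi ]; then
  echo "grub-install --target=x86_64-efi --efi-directory=/boot/efi --bootloader-id=GRUB"
else
  echo "grub-install --target=i386-pc /dev/sda"
fi
```

Printing first forces you to read the command you are about to run, which is cheap insurance during an incident.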
“Undo it” strategies that work in real life
You can’t truly “undo” a partition resize like you undo a bad git commit. You can, however, restore geometry and metadata references to what the system expects. That’s usually enough to boot.
Strategy A: Restore correct partition start sector (the big one)
If the partition start moved, your filesystem/LVM is likely intact but displaced. The fix is to put the start sector back exactly. “Close enough” doesn’t exist here. Off by one sector is still wrong, just more confident.
How you do it depends on what tool changed it. In practice, you use parted or sfdisk to recreate the partition entry with the original start and a size that at least covers the old filesystem end. You do not format. You do not create a new filesystem. You only correct the table entry.
If you don’t know the old start sector, check:
- Old tickets/change requests (sometimes someone pasted sfdisk -d output).
- Monitoring inventories or CMDB snapshots.
- Backups of /etc that might include installer logs or partitioning scripts.
- GPT backup header (if the primary got wrecked) and recovery tools.
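When you do have the known-good numbers, one safe pattern is to stage the correction as an sfdisk script and review it before anything touches the disk. The starts and sizes below mirror this article's example layout and are hypothetical; yours must come from a known-good record, exact to the sector.

```shell
#!/usr/bin/env bash
# Sketch: stage a geometry correction as an sfdisk script WITHOUT applying it.
# All values are hypothetical examples.
set -eu

cat > /tmp/sda.fix <<'EOF'
label: gpt
/dev/sda1 : start=2048,    size=1048576,   type=C12A7328-F81F-11D2-BA4B-00A0C93EC93B
/dev/sda2 : start=1050624, size=2097152,   type=0FC63DAF-8483-4772-8E79-3D69D8477DE4
/dev/sda3 : start=3147776, size=997067439, type=E6D6D379-F507-44C2-A23C-238F2A3DF928
EOF

echo "staged $(grep -c 'start=' /tmp/sda.fix) partition entries for review"
# Apply only after review, and only the table-level change (no mkfs, ever):
#   sudo sfdisk --no-reread /dev/sda < /tmp/sda.fix
```

The apply step stays commented out on purpose: the whole point of Strategy A is that you correct the table entry and nothing else.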
Strategy B: Fix UUID drift (fstab/crypttab/grub)
Sometimes nothing moved. You just changed something that caused identifiers to change, or you cloned a disk and now the system has duplicate UUIDs and picks the wrong one. The fix is boring: make identifiers consistent and references correct.
Common moves:
- Update /etc/fstab to match blkid.
- For encrypted roots: update /etc/crypttab, then rebuild initramfs.
- For mdraid: ensure /etc/mdadm/mdadm.conf is correct, then rebuild initramfs.
- Regenerate GRUB configuration so it discovers the right root.
Strategy C: Repair GPT metadata (when the table is “valid-ish” but inconsistent)
GPT’s backup header saves many people. If sgdisk -v reports CRC mismatch or header corruption, you can often restore from the backup header, or relocate it to the new disk end after resizing a virtual disk.
This works well when the data partitions are intact and only the GPT bookkeeping is wrong.
Strategy D: Filesystem repair (only after geometry is correct)
If you resized a filesystem and power died mid-flight, or you resized the partition boundary incorrectly, you may need fsck (ext4) or xfs_repair (XFS). Do it after you confirm you’re pointing at the correct on-disk offset and the block device is the expected size.
Second short joke, because you’ve earned it: Nothing accelerates a “quick resize” like a manager asking for an ETA.
Bootloader repair: BIOS/MBR, UEFI, and the usual traps
Boot failures after partitioning changes are often bootloader failures disguised as storage failures. The trick is to separate “kernel can’t find root” from “firmware can’t find bootloader.”
Know your boot mode
From a rescue environment, check whether you booted the ISO in UEFI mode. If the live environment is in BIOS mode but your installed OS expects UEFI, you can still mount and fix things—but be deliberate about where you install GRUB.
cr0x@server:~$ [ -d /sys/firmware/efi ] && echo UEFI || echo BIOS
UEFI
What it means: If this prints UEFI, the running environment supports EFI variables and you can manage NVRAM entries.
Decision: If you need to repair a UEFI boot, boot the rescue ISO in UEFI mode. Otherwise you’ll reinstall GRUB “successfully” into a place the firmware never uses.
Verify the EFI System Partition (ESP)
The ESP should be FAT32, typically 100–512MB (sometimes larger), with the esp flag. If it’s missing, moved, or not mounted at /boot/efi during GRUB updates, your UEFI boot entry may point to a file that no longer exists.
cr0x@server:~$ sudo lsblk -f | grep -E "vfat|/boot/efi"
sda1 vfat FAT32 1A2B-3C4D /mnt/boot/efi
Decision: If ESP isn’t mounted, mount it and rerun GRUB install/config generation inside chroot.
UEFI GRUB reinstall pattern (inside chroot)
cr0x@server:~$ sudo chroot /mnt /bin/bash
root@server:/# grub-install --target=x86_64-efi --efi-directory=/boot/efi --bootloader-id=GRUB
Installing for x86_64-efi platform.
Installation finished. No error reported.
root@server:/# update-grub
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-6.8.0-31-generic
done
What it means: The EFI binary is written to the ESP and GRUB config is regenerated.
Decision: If the firmware still doesn’t boot, inspect NVRAM entries and consider fallback path \EFI\BOOT\BOOTX64.EFI depending on platform policies.
BIOS GRUB reinstall pattern (inside chroot)
For BIOS installs, GRUB needs embedding space (often in the post-MBR gap) or a BIOS boot partition on GPT disks. Resizing operations can accidentally remove that gap or mislabel the BIOS boot partition.
cr0x@server:~$ sudo parted -s /dev/sda print | grep -Ei "partition table|bios_grub"
Partition Table: gpt
Decision: If this is GPT+BIOS, ensure you have a tiny BIOS boot partition (usually 1–2MB) with the bios_grub flag. Without it, GRUB install may fail or “succeed” but not boot.
Filesystem repair and verification (ext4, XFS)
ext4: check, then repair
ext4 is forgiving, but don’t confuse “forgiving” with “invincible.” If the partition end was set smaller than the filesystem, ext4 might mount read-only, refuse to mount, or show journal replay errors.
cr0x@server:~$ sudo e2fsck -f -n /dev/vg0/root
e2fsck 1.47.0 (5-Feb-2023)
/dev/vg0/root: 412345/5242880 files (0.4% non-contiguous), 8123456/20971520 blocks
What it means: -n is a read-only dry run. A summary with no error lines is what you want to see.
Decision: If dry-run shows only minor issues, plan a real repair with -y in a maintenance window. If it shows “bad magic number,” you’re likely pointing at the wrong device/offset.
cr0x@server:~$ sudo e2fsck -f -y /dev/vg0/root
e2fsck 1.47.0 (5-Feb-2023)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/vg0/root: ***** FILE SYSTEM WAS MODIFIED *****
/dev/vg0/root: 412346/5242880 files, 8123460/20971520 blocks
Decision: If fsck modifies the filesystem, rerun it until it reports clean. Then mount read-only and validate key paths (/etc, /boot, application data).
XFS: repair realities
XFS can grow; it won’t shrink. If you reduced the underlying block device, you may have truncated live metadata. Sometimes you can repair; sometimes you’re in restore-from-backup territory.
cr0x@server:~$ sudo xfs_repair -n /dev/vg0/var
Phase 1 - find and verify superblock...
Phase 2 - using internal log
Phase 3 - for each AG...
No modify flag set, skipping filesystem flush and exiting.
What it means: Dry-run mode; it tells you whether a repair would attempt modifications.
Decision: If it reports severe geometry mismatch, stop and check whether the LV/partition size matches what XFS expects. Fix the underlying size first (expand back), then repair.
LVM, RAID, and encryption: the layered cake of pain
Modern systems stack abstractions because it makes life better 99% of the time. During recovery, you pay the 1% tax with interest. The key is to rebuild the stack from the bottom up: disk → partition → mdraid → LUKS → LVM → filesystem.
If you resized a partition that holds an LVM PV
Best case: partition grew, PV didn’t, so you just need pvresize. Worst case: partition shrank or moved and PV label isn’t where LVM expects.
cr0x@server:~$ sudo pvs -o+pv_used
  PV         VG  Fmt  Attr PSize    PFree PUsed
  /dev/sda3  vg0 lvm2 a--  <475.50g    0  130.00g
What it means: LVM sees the PV and its size. If PV size is smaller than the partition and you intended to grow it, that’s a normal post-resize step.
Decision: If PV is visible and you only grew the partition, run pvresize /dev/sda3. If PV is not visible, do not create a new PV. That overwrites metadata.
cr0x@server:~$ sudo pvresize /dev/sda3
Physical volume "/dev/sda3" changed
1 physical volume(s) resized or updated / 0 physical volume(s) not resized
If mdraid won’t assemble after a resize
Resizing partitions that back mdraid members can change their sizes. mdraid is picky: member sizes must be compatible. If one member is smaller, the array may refuse to assemble or mark it faulty.
cr0x@server:~$ sudo cat /proc/mdstat
Personalities : [raid1]
md0 : inactive sdb1[1] sda1[0]
976630336 blocks super 1.2
unused devices: <none>
What it means: The array is inactive. That’s not “fine.” That’s “root is about to vanish.”
Decision: Inspect each member with mdadm --examine, compare event counters, and ensure partition sizes are consistent before forcing assembly.
If LUKS is involved
If root is encrypted, initramfs must unlock it. A resize might not affect LUKS itself, but it can affect the UUID or device path referenced in crypttab, or the underlying partition mapping.
cr0x@server:~$ sudo cryptsetup luksDump /dev/sda3 | head
LUKS header information for /dev/sda3
Version: 2
UUID: 7c0b...
Decision: If the LUKS header reads fine, don’t “repair” it. Fix the references and boot chain. If the header is unreadable, that’s a different incident class—restore from backup or header backup if available.
Common mistakes: symptom → root cause → fix
1) Symptom: “grub rescue>” prompt after reboot
Root cause: GRUB can’t find its modules/config because the partition IDs changed, /boot moved, or the filesystem UUID changed.
Fix: Boot rescue ISO, mount root and boot/ESP, chroot, run grub-install (correct for BIOS/UEFI), update-grub, rebuild initramfs if storage stack changed.
2) Symptom: drops to initramfs with “cannot find UUID=…”
Root cause: fstab/crypttab/initramfs expects an old UUID, or the device isn’t assembled (mdraid/LVM not activated) in early boot.
Fix: In rescue, compare blkid output to /etc/fstab and /etc/crypttab. Update, then update-initramfs -u -k all.
3) Symptom: filesystem mounts, but /var or /home missing
Root cause: Logical volumes exist but weren’t activated, or fstab entries changed and now mountpoints fail silently.
Fix: vgchange -ay, verify LVs with lvs, check journalctl after boot or mount manually to see the real error.
4) Symptom: ext4 “bad magic number” on fsck
Root cause: You are fsck’ing the wrong device (wrong partition, wrong offset due to moved start), or the superblock is damaged.
Fix: Confirm with blkid and partition start sectors. If geometry is correct, try alternate superblocks (mke2fs -n to list). If geometry is wrong, fix the partition table first.
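You can rehearse the alternate-superblock lookup safely on a throwaway image instead of a real device. This sketch assumes e2fsprogs is installed; /tmp/demo.img is a made-up path.

```shell
#!/usr/bin/env bash
# Sketch: list ext backup-superblock locations on a scratch image so no real
# disk is touched. Requires e2fsprogs.
set -eu

truncate -s 64M /tmp/demo.img
mke2fs -q -F /tmp/demo.img               # scratch filesystem on the image

# -n simulates mkfs and prints where backups would live, changing nothing:
mke2fs -n -F /tmp/demo.img | grep -A1 -i "superblock backups"

# Real-world follow-up against a damaged device (example name, review first):
#   sudo e2fsck -b 32768 /dev/vg0/root
```

The block numbers mke2fs prints are what you feed to e2fsck -b when the primary superblock is toast.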
5) Symptom: XFS log replay errors after resize
Root cause: Underlying block device shrank or got truncated; XFS sees inconsistent geometry.
Fix: Expand the partition/LV back to at least the previous size if possible, then run xfs_repair. If it was truly truncated, plan restore.
6) Symptom: system boots, but kernel panics on root mount
Root cause: initramfs missing drivers/modules for storage stack, or wrong root= parameter in GRUB.
Fix: Chroot and rebuild initramfs; regenerate GRUB config; verify storage modules are included.
7) Symptom: “no bootable device” at firmware screen
Root cause: UEFI NVRAM entry points to a missing EFI binary; ESP corrupted; or BIOS boot sector overwritten.
Fix: Mount ESP, reinstall GRUB EFI target, ensure /boot/efi is correct; optionally create fallback EFI path. For BIOS, reinstall GRUB to the correct disk.
Checklists / step-by-step plan
Checklist A: Contain the blast radius
- Snapshot at hypervisor/storage layer if virtualized.
- If physical, consider imaging the disk before writing repairs.
- Boot a rescue ISO in the same boot mode (UEFI vs BIOS) as the installed system.
- Identify the correct disk using model/serial, not just /dev/sdX.
Checklist B: Triage to the correct failure domain
- Run lsblk and parted print to see if partitions exist and look plausible.
- Run sfdisk -d to capture start sectors; compare against known-good if you have it.
- Run blkid to confirm filesystem signatures exist on expected partitions.
- Assemble mdraid (mdadm --assemble --scan) and activate LVM (vgchange -ay) if applicable.
- Try mounting root read-only.
Checklist C: Repair boot chain (the “it mounts, but it won’t boot” path)
- Mount root at /mnt.
- Mount /boot and /boot/efi if separate partitions exist.
- Verify UUIDs in /mnt/etc/fstab and /mnt/etc/crypttab match blkid.
- Bind-mount /dev, /proc, /sys, then chroot.
- Rebuild initramfs.
- Reinstall GRUB appropriate to boot mode; regenerate GRUB config.
- Exit chroot, unmount, reboot.
Checklist D: Repair filesystem (the “it doesn’t mount” path)
- Confirm you have correct partition start sectors before fsck/xfs_repair.
- For ext4: e2fsck -f -n first; then a real repair if needed.
- For XFS: xfs_repair -n first; never try to “shrink it back.” Expand the underlying device first if geometry mismatches.
- After repair: mount read-only, validate critical data, then proceed to boot repairs.
Three corporate mini-stories (anonymized, plausible, instructive)
Incident caused by a wrong assumption: “It’s just /dev/sda”
A mid-sized SaaS company had a standard playbook: when a node’s root disk got tight, expand the virtual disk, run a partition grow, then grow the filesystem. Routine. Until it wasn’t.
The engineer on call followed muscle memory and ran the resize against /dev/sda. In that hypervisor cluster, udev ordering was not stable across that template: the OS disk was /dev/sdb, and /dev/sda was a data disk attached for logs. The resize operation “worked.” It even printed reassuring messages. Nobody likes those messages more than the person who just resized the wrong disk.
The reboot that followed was catastrophic in a subtle way. The node booted, but the log volume was now misaligned; the filesystem mounted dirty; application started; then crashed under write load. The alerts were confusing: it looked like an application regression. It was actually a storage geometry problem.
Recovery took longer than it should have because the team chased symptoms at the wrong layer. Once someone compared lsblk -o MODEL,SERIAL against the VM’s disk mapping, it snapped into focus: they resized the wrong device, and the device names weren’t trustworthy identifiers.
They fixed it the boring way: detach the mis-resized disk, snapshot it, repair partition geometry using known-good start sectors from the previous day’s inventory, then run filesystem checks. The OS disk was then resized correctly using stable disk-by-id paths.
An optimization that backfired: shrinking “unused space” to save money
An enterprise team wanted to cut cloud costs. Someone noticed that a fleet of build servers had 200GB disks but used only 40GB. Easy win: shrink volumes, save money. The plan was approved because it looked like housekeeping, not surgery.
The gotcha was XFS on /var, holding build caches and container layers. The script reduced partition size first, planning to shrink the filesystem after. That last step doesn’t exist for XFS. The script didn’t know that; it just knew how to call parted and then “do filesystem stuff.”
Most servers didn’t fail immediately. They failed later, under load, when XFS attempted allocations near the former end of the device. That produced I/O errors, journal issues, and eventually nodes that booted into emergency mode because systemd couldn’t mount /var. The incidents trickled in like a slow leak, which is the worst kind of leak for organizational attention.
The recovery was blunt: expand the disks back to their original sizes, repair XFS where possible, and restore from backups where truncation had bitten metadata. The “optimization” ended up costing more in engineering time and operational risk than it saved.
Lesson learned: shrinking is not the inverse of growing. Not operationally, not mathematically, not emotionally.
A boring but correct practice that saved the day: capturing partition maps before touching them
A financial services team had a tedious pre-change requirement: before any disk/partition work, capture sfdisk -d output and store it with the change record. Engineers complained. It felt like paperwork pretending to be engineering.
Then a production database VM got a disk expansion and partition grow. A junior engineer used a GUI tool that accidentally moved the start sector of the LVM partition. It was a single mistake, but it shifted the PV offset. On reboot, the system dropped into initramfs. LVM couldn’t find the VG. The database was down.
Instead of guessing, the on-call pulled the pre-change sfdisk -d from the ticket, compared it to current geometry, and recreated the partition entry with the original start sector. LVM immediately recognized the PV, the VG activated, and the root mounted like nothing happened.
The rest of the repair was routine: rebuild initramfs to ensure early boot knew how to assemble the stack, reinstall GRUB to be safe, reboot. Total downtime was measured in “people were annoyed,” not “executives were summoned.”
The boring practice worked because it turned a mystery into a geometry correction. That’s the difference between incident response and archaeology.
FAQ
1) Can I “undo” a partition resize?
Not in one button. You can often restore the previous partition geometry (especially the start sector) and fix UUID/config drift. That’s the practical equivalent of undo for many incidents.
2) Should I run fsck immediately?
No. First confirm you’re targeting the right device at the right offset. If the partition start moved, fsck will at best fail and at worst make things worse.
3) Why did it boot before the reboot and then fail?
Because the running kernel had the old mappings in memory and was happily using already-open block devices. Reboot forces the system to rediscover the world from disk metadata, which is where you broke reality.
4) If blkid shows the right UUIDs, why won’t it mount?
UUID presence doesn’t guarantee filesystem consistency. The superblock might be readable but the journal or allocation metadata may be inconsistent. Mount read-only; check dmesg for I/O errors; run the appropriate repair tool.
5) What’s the quickest way to tell UEFI vs BIOS?
In Linux: check if /sys/firmware/efi exists. Also look for an EFI System Partition (vfat, esp flag). Then align your GRUB repair to that mode.
6) My root is on LVM. Do I repair partitions or LVM first?
Partitions first. LVM metadata lives at fixed offsets within the PV. If the partition entry points to the wrong start sector, LVM will look in the wrong place and swear it’s missing.
7) Is sgdisk -e safe?
It’s usually safe when the only problem is that GPT backup structures aren’t at the disk end after a disk expansion. It’s not a universal “repair GPT” wand. Image first if you suspect deeper corruption.
8) Why does resizing the ESP matter?
UEFI firmware boots an EFI binary from the ESP. If the ESP moved, got reformatted, or stopped being mounted during updates, your boot entry can point to a file that doesn’t exist.
9) What if I resized the partition smaller than the data?
If you truncated live filesystem data, you may not be able to fully recover. Your best move is to expand the partition/LV back to the previous size immediately (if possible) and then repair. If data beyond the new end is gone, restore from backup.
10) How do I prevent this next time?
Record partition maps before changes, use stable identifiers (by-id), rehearse on clones, and separate “expand disk” from “change partition start” with explicit guardrails in tooling.
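Recording the partition map can be one small script run before every change. This is a sketch under our own conventions: the output directory layout, the default disk argument, and the file names are assumptions, and `|| true` keeps the capture going on hosts missing a tool or the device.

```shell
#!/usr/bin/env bash
# Sketch: pre-change capture of storage truth into a change-record directory.
# Directory layout and file names are our own invented conventions.
set -eu

disk="${1:-/dev/sda}"
dir="/tmp/change-record.$$"
mkdir -p "$dir"

lsblk -o NAME,SIZE,TYPE,FSTYPE,UUID,MODEL,SERIAL > "$dir/lsblk.txt"   2>&1 || true
sfdisk -d "$disk"                                 > "$dir/sfdisk.dump" 2>&1 || true
blkid                                             > "$dir/blkid.txt"   2>&1 || true
if [ -d /sys/firmware/efi ]; then echo UEFI > "$dir/bootmode"; else echo BIOS > "$dir/bootmode"; fi

echo "captured to $dir"
```

Attach the directory to the change ticket; if geometry ever drifts, the sfdisk dump turns a mystery into a diff.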
Conclusion: next steps that prevent repeat offenses
If you’re here because a resize nuked boot, your job is to stop the bleeding, restore geometry or references, and get back to a mountable root. The fastest path is rarely “repair everything.” It’s “repair the one layer that’s lying.”
Do these next, even if you’re back online:
- Capture the post-recovery truth: save lsblk, sfdisk -d, blkid, and boot mode details with the incident record.
- Automate pre-change snapshots: VM snapshot or storage snapshot before partition operations. Make it policy, not heroism.
- Standardize device identification: prefer /dev/disk/by-id in scripts and runbooks.
- Ban casual shrinking: treat shrink operations as migration projects, not as “cleanup.” Especially with XFS.
- Test the reboot: after any storage/boot change, schedule a controlled reboot while you still have context and time.
Partitions aren’t scary. Unplanned writes are. Keep your hands steady, your offsets exact, and your repairs boring.