You’re trying to do the one thing Proxmox is famously good at: PCIe passthrough. GPU, HBA, NIC, whatever. Then the host calmly informs you: No IOMMU detected. Translation: your hypervisor is blindfolded and you’re asking it to juggle chainsaws.
This is usually not “a Proxmox bug.” It’s a settings mismatch between firmware mode (UEFI/Legacy), CPU virtualization features (VT-d / AMD-Vi), and the kernel flags that actually turn the IOMMU on. The fix is quick when you know where to look—and maddening when you don’t.
What “No IOMMU detected” actually means
On x86 virtualization, an IOMMU is the piece of hardware that lets the host control DMA (direct memory access) from devices. Without it, a PCIe device can potentially read/write memory it shouldn’t. That’s a non-starter for safe passthrough.
Proxmox (KVM/QEMU under the hood) needs the kernel to expose an active IOMMU. If you see “No IOMMU detected,” one of these is true:
- The CPU/chipset doesn’t support it (rare on modern servers; less rare on budget desktop parts).
- Firmware disabled the feature (VT-d on Intel, AMD-Vi/IOMMU on AMD).
- You booted in a mode/configuration where the kernel didn’t enable it (missing kernel flags, wrong bootloader config, Secure Boot quirks).
- It’s enabled, but you’re checking the wrong evidence (common—people grep the wrong log, then chase ghosts).
And here’s the important operational truth: “IOMMU enabled” is not binary in practice. You can have a detected IOMMU with terrible grouping, broken interrupt remapping, or a device that won’t reset. You still won’t get stable passthrough.
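One cheap sanity check before going further: make sure the evidence you’re reading comes from the current boot, not an older one. A minimal sketch, assuming journald keeps persistent logs (the counts below are illustrative):
cr0x@server:~$ journalctl -k -b 0 --no-pager | grep -ciE 'dmar|amd-vi'
14
cr0x@server:~$ journalctl -k -b -1 --no-pager | grep -ciE 'dmar|amd-vi'
0
If the current boot shows hits and the previous one doesn’t (or vice versa), you now know which boot you were actually grepping.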
Fast diagnosis playbook (first/second/third)
If you’re in a hurry (you are), do this in order. It’s not philosophical. It’s fast.
First: prove whether the kernel sees an IOMMU
Check the live kernel messages for Intel DMAR or AMD-Vi. This is the quickest source of truth.
cr0x@server:~$ dmesg | egrep -i 'dmar|iommu|amd-vi|ivrs' | head -n 50
[ 0.000000] DMAR: IOMMU enabled
[ 0.000000] DMAR: Host address width 39
[ 0.000000] DMAR: DRHD base: 0x000000fed90000 flags: 0x0
What it means: If you see “DMAR: IOMMU enabled” (Intel) or “AMD-Vi: IOMMU enabled” (AMD), the “No IOMMU detected” message is coming from a different layer or an older boot.
Decision: If there’s no relevant line at all, go to firmware settings and kernel flags immediately. If there are lines but grouping is bad, jump to the IOMMU groups section.
Second: confirm you’re booted the way you think you’re booted
cr0x@server:~$ cat /sys/firmware/efi/fw_platform_size 2>/dev/null || echo "Not booted via UEFI"
64
What it means: “64” means UEFI boot. If you get “Not booted via UEFI,” you’re in legacy mode. That can still work, but UEFI tends to be less weird on modern platforms.
Decision: If the platform supports UEFI, use it. Mixing legacy boot with modern virtualization features is how you earn weekend work.
Third: verify kernel flags are actually applied
cr0x@server:~$ cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-6.8.12-3-pve root=ZFS=rpool/ROOT/pve-1 ro quiet intel_iommu=on iommu=pt
What it means: You should see intel_iommu=on for Intel or amd_iommu=on for AMD. Often you also want iommu=pt to reduce overhead for host-owned devices.
Decision: If the flags aren’t present, fix GRUB or systemd-boot. If they are present and dmesg still shows no IOMMU, firmware is the culprit (or the hardware is).
Interesting facts (and a little history) about IOMMU
- DMA pre-dates modern virtualization by decades. Early systems let devices write directly to memory because CPUs were slow and copying buffers was expensive.
- Intel’s “VT-d” and AMD’s “AMD-Vi” arrived because hypervisors needed safe DMA. Without an IOMMU, passthrough is basically “trust the device and pray.”
- The “DMAR” ACPI table is a key artifact. On Intel systems, the kernel often discovers IOMMU capability via the DMAR table provided by firmware.
- Interrupt remapping matters. A working IOMMU is not just address translation; proper interrupt remapping reduces attack surface and improves isolation.
- IOMMU grouping is largely a motherboard/topology story. Groups depend on PCIe root ports, switches, and ACS (Access Control Services) behavior.
- ACS wasn’t invented for home labs. It’s a PCIe feature designed to enforce isolation in multi-function/multi-tenant systems—then hobbyists discovered it enables nicer VFIO layouts.
- “iommu=pt” is a performance compromise. It uses pass-through mappings for host devices to reduce translation overhead, while still allowing isolation where needed.
- UEFI didn’t make this simpler; it made it different. Many platforms ship better ACPI tables in UEFI mode, but also introduce Secure Boot and NVRAM complexities.
Pre-checks: confirm CPU, firmware mode, and Proxmox bootloader
Before you touch BIOS settings, confirm what you’re dealing with. This is production systems 101: measure first, change second.
Task 1: Identify CPU vendor and virtualization capabilities
cr0x@server:~$ lscpu | egrep 'Vendor ID|Model name|Virtualization|Flags'
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) CPU E-2278G @ 3.40GHz
Virtualization: VT-x
Flags: ... vmx ept vpid ...
What it means: On Intel, you want VT-x (CPU virtualization) and VT-d (IOMMU). VT-d is a platform feature, not a CPU flag, so it won’t appear in this output; you infer it from the chipset/board documentation and the DMAR messages in dmesg. On AMD you’ll see “svm” for CPU virtualization and check AMD-Vi in dmesg later.
Decision: If the CPU is a low-end part or an odd embedded SKU, confirm it supports IOMMU at all before blaming Proxmox.
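If you want a quick vendor-agnostic confirmation of the CPU virtualization flag (this proves VT-x/SVM, not the IOMMU itself), a hedged one-liner:
cr0x@server:~$ grep -m1 -oE 'vmx|svm' /proc/cpuinfo || echo "No CPU virtualization flag found"
vmx
“vmx” means Intel VT-x, “svm” means AMD-V. Neither proves VT-d/AMD-Vi; that still comes from firmware settings plus dmesg.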
Task 2: Confirm IOMMU-related kernel modules are present
cr0x@server:~$ lsmod | egrep 'vfio|iommu|kvm' | head
vfio_pci 16384 0
vfio_pci_core 69632 1 vfio_pci
vfio_iommu_type1 45056 0
vfio 40960 2 vfio_pci_core,vfio_iommu_type1
kvm_intel 372736 0
kvm 1036288 1 kvm_intel
What it means: VFIO modules being loaded doesn’t prove IOMMU is enabled, but it tells you you’re not missing basic plumbing.
Decision: If VFIO is absent and you’re attempting passthrough, you’ll need to load/configure it later. For “No IOMMU detected,” focus on firmware + cmdline first.
Task 3: Determine whether Proxmox uses GRUB or systemd-boot
cr0x@server:~$ proxmox-boot-tool status 2>/dev/null || echo "proxmox-boot-tool not in use"
Re-executing '/usr/sbin/proxmox-boot-tool' in new private mount namespace
System currently booted with: uefi
Configured ESPs: /boot/efi
What it means: If proxmox-boot-tool is in use, you’re usually on systemd-boot with an EFI System Partition (ESP). If it’s not in use, you might be on GRUB.
Decision: Don’t guess which bootloader you have. Apply the kernel flags in the correct place for your boot path.
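If you want one more data point, efibootmgr shows which boot entry the firmware actually used. A minimal sketch; entry labels vary by install (systemd-boot commonly registers itself as “Linux Boot Manager”):
cr0x@server:~$ efibootmgr | head -n 4
BootCurrent: 0004
Timeout: 1 seconds
BootOrder: 0004,0001
Boot0004* Linux Boot Manager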
Task 4: Confirm current bootloader entries (systemd-boot path)
cr0x@server:~$ bootctl status 2>/dev/null | sed -n '1,120p'
System:
Firmware: UEFI 2.70
Secure Boot: disabled
Setup Mode: user
Boot Loader:
Product: systemd-boot 252
Features: ✓ Boot counting
Default Boot Entry: proxmox
What it means: You’re using systemd-boot. Proxmox commonly manages kernel command line via /etc/kernel/cmdline.
Decision: If you’re on systemd-boot, editing /etc/default/grub may do nothing. That’s a classic time-waster.
Task 5: Confirm GRUB is actually installed/active (GRUB path)
cr0x@server:~$ grub-probe /boot 2>/dev/null && echo "GRUB tooling present"
ext2
GRUB tooling present
What it means: GRUB tooling exists, but doesn’t prove it’s the active bootloader. Combine with the UEFI checks above.
Decision: If you’re booted in UEFI and systemd-boot is active, treat GRUB as a distraction unless you explicitly installed it.
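A quick tiebreaker, assuming systemd 243 or newer: ask systemd-boot directly whether it is installed on the ESP.
cr0x@server:~$ bootctl is-installed 2>/dev/null || echo "systemd-boot not installed on the ESP"
yes
“yes” plus the bootctl status output above means systemd-boot is your boot path; treat GRUB edits as noise.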
UEFI/BIOS settings that matter (and the ones that don’t)
Firmware menus vary wildly, but the concepts are stable. You need to enable the IOMMU feature and sometimes a few side-options that make it usable in practice.
Intel platforms (common labels)
- Intel Virtualization Technology (VT-x): Enables CPU virtualization. Needed for KVM, but not sufficient for passthrough.
- Intel VT-d: This is the IOMMU. This is the one you’re here for.
- Above 4G decoding: Helpful/required when you pass through GPUs or multiple devices that map large BARs. On some boards, this also changes PCIe resource allocation in a way that improves grouping.
- Resizable BAR: Can complicate passthrough in some setups. If you’re debugging, consider disabling it temporarily.
- SR-IOV: Not required for IOMMU, but often in the same neighborhood. Enable only if you need VFs.
AMD platforms (common labels)
- SVM: CPU virtualization (KVM). Enable it.
- IOMMU / AMD-Vi: The IOMMU. Enable it.
- ACS / ACS Enable: Sometimes exposed, sometimes not. Do not rely on it being present.
- Above 4G decoding: Same story as Intel; often required for GPU passthrough stability.
Firmware settings that people obsess over (usually pointless)
- “UEFI vs Legacy” as a magic switch: UEFI can help, but it doesn’t replace enabling VT-d/AMD-Vi.
- C-states: Power management can cause latency issues, not “No IOMMU detected.” Don’t change ten knobs at once.
- XMP/EXPO: Memory profiles can destabilize hosts, but they don’t disable IOMMU. Keep them conservative if this is production-ish.
Joke #1: If your BIOS UI looks like a 2009 DVD player menu, don’t worry—your IOMMU settings are still hiding in there somewhere.
Kernel flags: GRUB vs systemd-boot (the 10-minute fix)
Once firmware is correct, the kernel still needs to enable IOMMU. On Proxmox, this is typically a one-line change plus a bootloader update.
Pick your vendor flags
- Intel: intel_iommu=on iommu=pt
- AMD: amd_iommu=on iommu=pt
Why iommu=pt? It’s a pragmatic default for many virtualization hosts: host devices use pass-through mappings (less overhead), while VFIO devices still get isolation. If you’re doing maximum-security multi-tenant work, you may choose a different posture. Most Proxmox users aren’t.
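To confirm what the kernel actually chose after boot, look for the default domain type message. A minimal sketch; exact wording and timestamp vary by kernel version:
cr0x@server:~$ dmesg | grep -i 'default domain type'
[    0.521033] iommu: Default domain type: Passthrough (set via kernel command line)
“Passthrough (set via kernel command line)” means iommu=pt took effect; “Translated” means full translation for host-owned devices.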
Task 6 (systemd-boot): Set flags in /etc/kernel/cmdline
cr0x@server:~$ sudo cat /etc/kernel/cmdline
root=ZFS=rpool/ROOT/pve-1 ro quiet
What it means: This is your kernel command line template.
Decision: Append the correct flags for your CPU vendor.
cr0x@server:~$ sudo nano /etc/kernel/cmdline
Edit it to something like:
cr0x@server:~$ sudo cat /etc/kernel/cmdline
root=ZFS=rpool/ROOT/pve-1 ro quiet intel_iommu=on iommu=pt
Now apply it:
cr0x@server:~$ sudo proxmox-boot-tool refresh
Running hook script 'proxmox-auto-removal'..
Running hook script 'zz-proxmox-boot'..
Refreshing ESP /boot/efi
Copying kernel and creating EFI boot entry
What it means: The ESP entries are refreshed with the new cmdline.
Decision: If you don’t see it refreshing the ESP, you may not be using systemd-boot—or your ESP isn’t mounted correctly.
Task 7 (GRUB): Set flags in /etc/default/grub and update-grub
cr0x@server:~$ sudo grep -n '^GRUB_CMDLINE_LINUX_DEFAULT' /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet"
What it means: This is where you add the flags for typical GRUB setups.
Decision: Add the vendor flag and usually iommu=pt.
cr0x@server:~$ sudo sed -i 's/GRUB_CMDLINE_LINUX_DEFAULT="quiet"/GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"/' /etc/default/grub
cr0x@server:~$ sudo update-grub
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-6.8.12-3-pve
Found initrd image: /boot/initrd.img-6.8.12-3-pve
done
What it means: Your GRUB config now includes the flags.
Decision: If update-grub doesn’t find your Proxmox kernel images, you’re either not using GRUB or your /boot layout is unusual.
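Before rebooting, it’s worth confirming the generated config actually carries the flags. A hedged check against the standard Debian path:
cr0x@server:~$ grep -m1 -o 'intel_iommu=on iommu=pt' /boot/grub/grub.cfg || echo "Flags missing from generated config"
intel_iommu=on iommu=pt
If the grep comes back empty, update-grub wrote its config somewhere else or your edit didn’t land.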
Task 8: Reboot, then confirm cmdline contains the flags
cr0x@server:~$ sudo reboot
Connection to server closed by remote host.
After reboot:
cr0x@server:~$ cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-6.8.12-3-pve root=ZFS=rpool/ROOT/pve-1 ro quiet intel_iommu=on iommu=pt
What it means: The running kernel has the flags.
Decision: If the flags aren’t there, you edited the wrong place for your bootloader. Go back to the bootloader detection tasks.
Validate IOMMU is working (and interpret the output)
You don’t “feel” IOMMU. You prove it. Here are the checks I trust.
Task 9: Check dmesg for IOMMU enablement (again, now after changes)
cr0x@server:~$ dmesg | egrep -i 'DMAR: IOMMU enabled|AMD-Vi: IOMMU enabled|IOMMU enabled' | head
[ 0.000000] DMAR: IOMMU enabled
What it means: Kernel is using the IOMMU.
Decision: If you still see nothing, firmware is still not enabling VT-d/AMD-Vi or the platform doesn’t provide the tables properly.
Task 10: Check interrupt remapping / DMAR warnings
cr0x@server:~$ dmesg | egrep -i 'remapping|x2apic|DMAR:.*fault|AMD-Vi:.*Event' | head -n 50
[ 0.000000] DMAR: Interrupt remapping enabled
What it means: Interrupt remapping is on (good). If you see DMAR faults, that can indicate buggy firmware or a device doing illegal DMA.
Decision: If you see repeated faults, don’t ignore them. They correlate strongly with “VM randomly freezes under IO.”
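If the faults only show up under load, watch them live while you stress the VM. A minimal sketch (no output is the good outcome):
cr0x@server:~$ dmesg --follow | grep -iE 'DMAR.*fault|AMD-Vi.*(fault|event)'
Leave it running in a second session while the guest does real IO; stop it with Ctrl-C.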
Task 11: Check for IOMMU sysfs presence
cr0x@server:~$ ls -1 /sys/kernel/iommu_groups 2>/dev/null | head
0
1
2
3
What it means: If /sys/kernel/iommu_groups exists and has numbered directories, the kernel created IOMMU groups.
Decision: If the directory doesn’t exist, you don’t have an active IOMMU.
Task 12: Enumerate groups and devices (the truth serum)
cr0x@server:~$ for g in /sys/kernel/iommu_groups/*; do echo "Group $(basename "$g")"; for d in "$g"/devices/*; do echo -n " "; lspci -nns "$(basename "$d")"; done; done | sed -n '1,60p'
Group 0
00:00.0 Host bridge [0600]: Intel Corporation Device [8086:3e0f]
Group 1
00:01.0 PCI bridge [0604]: Intel Corporation Device [8086:1901]
Group 7
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:1b80] (rev a1)
01:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:10f0] (rev a1)
What it means: Devices in the same group cannot be safely separated for passthrough. A GPU typically shows up with its audio function in the same group—normal and fine. But if your GPU shares a group with, say, the SATA controller you boot from, you’re in for a bad day.
Decision: If the target device’s group includes unrelated critical devices, you either move the card to another slot, change firmware settings (Above 4G decoding sometimes helps), or consider ACS override (with eyes open).
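When you only care about one device, you don’t need the full dump. A hedged one-liner that resolves a single device’s group (using the GPU address from above):
cr0x@server:~$ g=$(readlink -f /sys/bus/pci/devices/0000:01:00.0/iommu_group); echo "Group $(basename "$g")"; ls "$g/devices"
Group 7
0000:01:00.0  0000:01:00.1
Only the GPU and its own audio function in the group is the outcome you want.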
IOMMU groups: what “good” looks like
IOMMU groups are the difference between “passthrough is a checkbox” and “passthrough is a quarterly project.” You want clean separation. You rarely get perfection. You settle for safe.
What you want
- Your passthrough device (GPU/HBA/NIC) is isolated in its own group, or shares only with its own functions (multi-function devices).
- Motherboard controllers you rely on for host boot (SATA/NVMe controller holding Proxmox root) are not glued to the same group as your passthrough target.
- PCIe root ports and bridges form logical separation between slots.
What you can tolerate
- GPU + GPU audio function in the same group.
- NIC with multiple functions grouped together (depends on hardware).
What you should not tolerate
- HBA grouped with the chipset SATA controller hosting Proxmox boot disks.
- GPU grouped with the USB controller you need for keyboard/console during outages (ask me how I know; don’t, actually).
Joke #2: IOMMU groups are like office seating—if Finance and Production share a desk, something is about to get audited.
VFIO basics: binding devices cleanly
Once IOMMU is on and groups are acceptable, you still need to keep the host from grabbing the device with a native driver. VFIO wants the device first.
Task 13: Identify the PCI IDs of the device you’ll pass through
cr0x@server:~$ lspci -nn | egrep -i 'vga|nvidia|amd|ethernet|sas|sata' | head -n 20
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:1b80] (rev a1)
01:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:10f0] (rev a1)
What it means: The vendor:device IDs are in brackets. Here: 10de:1b80 and 10de:10f0.
Decision: You generally bind all functions you intend to pass through (GPU + audio). Don’t bind half a multi-function device unless you enjoy debugging.
Task 14: Check which driver currently owns the device
cr0x@server:~$ lspci -k -s 01:00.0
01:00.0 VGA compatible controller: NVIDIA Corporation Device 1b80 (rev a1)
Subsystem: Micro-Star International Co., Ltd. [MSI] Device 3301
Kernel driver in use: nouveau
Kernel modules: nouveau
What it means: The host is using nouveau. For passthrough, you typically want vfio-pci instead.
Decision: Plan to bind to vfio-pci and blacklist host drivers that auto-attach.
Task 15: Configure VFIO IDs
cr0x@server:~$ sudo tee /etc/modprobe.d/vfio.conf >/dev/null <<'EOF'
options vfio-pci ids=10de:1b80,10de:10f0 disable_vga=1
EOF
What it means: This tells vfio-pci to claim those devices early.
Decision: If this is a production host that also uses the GPU for console, reconsider. Passthrough means the host should not rely on that device.
Task 16: Blacklist conflicting GPU drivers (example for nouveau)
cr0x@server:~$ sudo tee /etc/modprobe.d/blacklist-gpu.conf >/dev/null <<'EOF'
blacklist nouveau
options nouveau modeset=0
EOF
What it means: Prevents the host from binding the open-source NVIDIA driver at boot.
Decision: Only blacklist what conflicts. Don’t go on a blacklisting spree; that’s how you remove your own NIC driver and “learn” remote hands policy.
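An alternative (or complement) to blacklisting is a modprobe softdep, which makes the kernel load vfio-pci before the GPU drivers it should preempt. A minimal sketch; the filename is arbitrary and whether you need both lines depends on which drivers your host ships:
cr0x@server:~$ sudo tee /etc/modprobe.d/vfio-softdep.conf >/dev/null <<'EOF'
softdep nouveau pre: vfio-pci
softdep nvidia pre: vfio-pci
EOF
This avoids removing the driver entirely, which matters if another GPU on the host still needs it.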
Task 17: Ensure VFIO modules are loaded early
cr0x@server:~$ sudo tee -a /etc/modules >/dev/null <<'EOF'
vfio
vfio_pci
vfio_iommu_type1
EOF
What it means: Ensures the modules load at boot. Older guides also list vfio_virqfd, but on recent Proxmox kernels (6.x) its functionality is built into vfio and the separate module no longer exists. Some setups work without this; many behave better with it.
Decision: If you’re chasing intermittent binding, loading VFIO early is a cheap stabilizer.
Task 18: Refresh initramfs and reboot
cr0x@server:~$ sudo update-initramfs -u -k all
update-initramfs: Generating /boot/initrd.img-6.8.12-3-pve
What it means: Your initramfs now includes updated module configs for early boot.
Decision: Reboot to test clean binding; don’t try to hot-unbind a GPU on a system you need to stay reachable.
Task 19: Confirm the device is now bound to vfio-pci
cr0x@server:~$ lspci -k -s 01:00.0
01:00.0 VGA compatible controller: NVIDIA Corporation Device 1b80 (rev a1)
Subsystem: Micro-Star International Co., Ltd. [MSI] Device 3301
Kernel driver in use: vfio-pci
Kernel modules: nouveau
What it means: “Kernel driver in use: vfio-pci” is the win condition. “Kernel modules” may still list potential modules; that’s fine.
Decision: If it’s still owned by nouveau/nvidia/amdgpu, your blacklist or initramfs update didn’t take effect, or the device is used as the primary console.
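Checking the whole slot in one shot catches the half-bound case (GPU on vfio-pci, audio still on the host sound driver). Output below is illustrative:
cr0x@server:~$ lspci -nnk -s 01:00 | grep -E '^01|driver in use'
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:1b80] (rev a1)
Kernel driver in use: vfio-pci
01:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:10f0] (rev a1)
Kernel driver in use: vfio-pci
Both functions on vfio-pci is the state you want before touching the VM config.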
Checklists / step-by-step plan
Step-by-step: fix “No IOMMU detected” in the real world
- Confirm boot mode: UEFI or legacy via /sys/firmware/efi. Decide whether to standardize on UEFI.
- Identify bootloader: systemd-boot vs GRUB. Edit the right configuration file.
- Enable firmware options: VT-d (Intel) or IOMMU/AMD-Vi (AMD). Also enable SVM/VT-x for CPU virtualization.
- Enable Above 4G decoding: Especially if doing GPU/HBA passthrough or multiple PCIe devices.
- Add kernel flags: intel_iommu=on iommu=pt or amd_iommu=on iommu=pt.
- Refresh boot entries: proxmox-boot-tool refresh for systemd-boot, or update-grub for GRUB.
- Reboot: Don’t trust partial states.
- Validate in dmesg: “IOMMU enabled,” and ideally interrupt remapping.
- Check groups: Ensure your target device is in a safe group.
- Bind VFIO: Set vfio-pci IDs, blacklist conflicting drivers, update initramfs, reboot.
- Test VM stability: Boot VM, run load, confirm resets and reboots work repeatedly.
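That last item deserves more than one manual try. A hedged loop, assuming a hypothetical VMID 100 and that the guest tolerates being cycled on a timer (adjust the sleeps to your guest’s boot time):
cr0x@server:~$ for i in 1 2 3; do qm shutdown 100 && sleep 60 && qm start 100 && sleep 120; qm status 100; done
status: running
status: running
status: running
If the device only comes back on the first cycle, you have a reset problem, not a configuration problem.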
Safety checklist before passing through storage controllers
- Confirm the HBA is not in the same IOMMU group as the boot controller.
- Ensure Proxmox root is not on disks attached to the controller you will pass through.
- Have out-of-band access (IPMI/iKVM) or at least a plan if networking drops.
- Snapshot VM config and record PCI addresses before changes.
Common mistakes: symptoms → root cause → fix
This is where most time is lost: not in the fix, but in confidently fixing the wrong thing.
1) Symptom: “No IOMMU detected” even after setting intel_iommu=on
- Root cause: You edited GRUB, but the host boots via systemd-boot (or vice versa).
- Fix: Confirm with bootctl status and proxmox-boot-tool status. Apply flags in /etc/kernel/cmdline for systemd-boot and run proxmox-boot-tool refresh.
2) Symptom: dmesg shows IOMMU enabled, but Proxmox GUI still complains
- Root cause: You’re looking at stale information (old boot), or the warning is from a VM config check that expects VFIO modules and groups, not just IOMMU.
- Fix: Reboot, then validate with cat /proc/cmdline and ls /sys/kernel/iommu_groups. Ensure VFIO modules are loaded and group isolation is acceptable.
3) Symptom: VM starts, but device doesn’t appear inside guest
- Root cause: Host driver still owns the device, or you only bound part of a multi-function device.
- Fix: Use lspci -k to confirm vfio-pci. Bind all relevant functions (GPU + audio). Update initramfs and reboot.
4) Symptom: “IOMMU groups are not viable” / huge groups containing half the system
- Root cause: Motherboard topology lacks ACS isolation; common on consumer boards.
- Fix: Try different PCIe slots, enable Above 4G decoding, update BIOS, and only then consider ACS override (knowing it can weaken isolation).
5) Symptom: Passthrough works once, then fails after VM reboot
- Root cause: Device reset issues (especially GPUs), firmware bugs, or driver quirks in the guest.
- Fix: Test multiple reboot cycles. Consider different GPU, apply vendor-specific reset workarounds only if you can validate stability. Don’t ship “works on first boot” to production.
6) Symptom: Host becomes unreachable after enabling VFIO/binding a NIC
- Root cause: You bound the management NIC (or its entire group) to vfio-pci, removing host networking.
- Fix: Unbind by removing vfio IDs/blacklists, rebuild initramfs, reboot. Use out-of-band console. And next time, check groups before binding.
7) Symptom: “DMAR: DRHD: handling fault status” spam in logs
- Root cause: Firmware/IOMMU bugs, or a device doing illegal DMA.
- Fix: Update BIOS, confirm correct kernel flags, try disabling aggressive PCIe power saving in firmware. If it persists, consider that the platform may be unreliable for passthrough.
Three corporate mini-stories from the trenches
Mini-story 1: The outage caused by a wrong assumption
A team I worked with inherited a Proxmox cluster that “definitely supported passthrough.” Someone had done it once, on a different node, in a different rack, during a migration window. That became tribal truth. The next step was to pass through an HBA to a storage VM—because that was the fastest path to importing a pile of disks.
They enabled VT-d in BIOS, added intel_iommu=on to GRUB, rebooted, and called it done. Then they bound the HBA to vfio-pci. The host went down hard: boot loop, missing root device, the usual sickening quiet after the SSH session drops. On-site hands had to plug in a crash cart because out-of-band wasn’t wired. It was a long afternoon.
The root cause wasn’t exotic. The HBA they “passed through” wasn’t a separate HBA at all. It was the onboard storage controller. It lived in the same IOMMU group as the chipset functions that also hosted the boot volume. They assumed “controller name looks like HBA” meant “safe for passthrough.” The kernel disagreed.
The fix was boring: stop, map IOMMU groups, and physically install a real dedicated HBA in a slot that grouped cleanly. Then bind only that device. The lesson wasn’t “be careful.” The lesson was never assume identity from marketing names. PCI addresses and IOMMU groups are the only reliable naming scheme in the building.
Mini-story 2: The optimization that backfired
Another environment wanted maximum performance for a GPU-heavy workload. Someone read that IOMMU translation adds overhead. They decided to “optimize” by experimenting with different kernel parameters and firmware toggles until benchmarking looked good. The configuration drifted: one node used iommu=pt, another didn’t, a third had different PCIe settings because that node “needed it to boot.”
Everything looked fine for a while. Then they started seeing rare, ugly failures: GPU passthrough VMs would hang under high IO while the host stayed mostly alive. It wasn’t frequent enough to reproduce on demand. It was frequent enough to burn engineers. They chased guest drivers, then QEMU versions, then power supply theories. I wish I were joking. I’m not.
The eventual culprit was configuration inconsistency around IOMMU and interrupt remapping combined with slightly different BIOS versions. A single node had interrupt remapping disabled by firmware after an update. Under a specific workload, it was the one that folded. The “optimization” wasn’t the direct cause; the lack of standardization was.
The recovery was operational: standardize firmware, standardize kernel command line, enforce it with configuration management, and add a boot-time validation check that alerts when IOMMU isn’t enabled. Performance didn’t drop meaningfully. Stability improved a lot. The backfire wasn’t from tuning—it was from tuning without discipline.
Mini-story 3: The boring practice that saved the day
A smaller shop ran a couple of Proxmox nodes supporting internal services. They weren’t fancy. But they had one habit I respected: for every node, they kept a simple “hardware mapping” file in their ops repo—PCI addresses, what each device was, and what it was allowed to be used for.
When they decided to pass through a NIC to a firewall VM, they didn’t start by editing modprobe files. They started by checking the mapping, then verifying IOMMU groups live, then planning the change during a window with out-of-band access tested. They also staged it: bind VFIO, reboot, confirm host is still reachable, then add the device to the VM.
The day of the change, the NIC they wanted to pass through was grouped with the onboard USB controller. Passing it through would have made remote console recovery harder if anything went wrong, because their USB-based IP KVM dongle was on that controller. Nothing broke, because they didn’t proceed. They moved the NIC to a different slot and the groups improved.
Nothing exciting happened. That’s the point. The “boring” practice—keeping a device map and treating changes like changes—prevented a self-inflicted incident. Reliability is mostly the art of not being surprised by your own infrastructure.
One operations quote (paraphrased idea)
Paraphrased idea: “Hope is not a strategy.” — widely repeated in operations circles and commonly associated with pragmatic engineering management.
FAQ
1) Do I need UEFI to use IOMMU on Proxmox?
No, but UEFI tends to provide better firmware tables and fewer surprises on modern hardware. If you can choose, choose UEFI and be consistent across nodes.
2) What’s the difference between VT-x and VT-d?
VT-x is CPU virtualization (running VMs efficiently). VT-d is IOMMU (safe device DMA and PCI passthrough). You need VT-x/SVM for VMs; you need VT-d/AMD-Vi for passthrough.
3) Should I always use iommu=pt?
For most Proxmox hosts doing passthrough, yes: it’s a good default. If you’re aiming for maximum isolation across all host devices (more strict mapping), you might not use it—but expect some overhead.
4) I enabled IOMMU but my groups are huge. Is ACS override safe?
ACS override can improve grouping by forcing the kernel to treat devices as more isolated than the topology suggests. It can also reduce real isolation. Use it only if you understand the risk and you’re not pretending this host is a hardened multi-tenant platform.
5) Why does Proxmox still complain after I set BIOS options?
Because BIOS options don’t matter if the kernel flags weren’t applied, or you changed the wrong bootloader configuration. Validate with /proc/cmdline and dmesg, not with hope.
6) Can I pass through my boot NVMe controller?
You can, but you shouldn’t if Proxmox boots from it. If you hand the controller to a VM, the host loses access. If you want VM-direct NVMe, install a second controller or use a dedicated disk.
7) Does ZFS change anything about IOMMU?
Not directly. ZFS affects boot cmdline patterns (e.g., root=ZFS=...) and can influence how you manage bootloaders (systemd-boot is common). IOMMU itself is still firmware + kernel behavior.
8) How do I know if a device is safe to pass through?
Check its IOMMU group. If the group contains only that device (and its functions), you’re in good shape. If it shares with critical host devices, move slots or rethink the plan.
9) My GPU is bound to vfio-pci but the VM won’t start. Now what?
Check for reset quirks and ensure you passed all functions (GPU + audio). Verify the VM machine type and firmware (OVMF/UEFI is common for modern GPUs). Then check QEMU logs for the actual error; it’s usually explicit.
10) Can Secure Boot cause “No IOMMU detected”?
Secure Boot usually affects module loading and signed kernels more than IOMMU detection itself. But it can complicate the boot path and make you think you applied flags when you didn’t. Validate with /proc/cmdline.
Next steps (what to do after it works)
Once IOMMU is enabled and your groups look sane, don’t stop at “the VM booted.” Do the operational checks that prevent future incidents:
- Reboot tests: Reboot the VM repeatedly. Reboot the host once. If a device only works after a cold boot, that’s a reliability problem waiting for your next patch window.
- Log hygiene: Watch dmesg for DMAR/AMD-Vi faults during load. Fault spam is not “normal.”
- Change control: Record BIOS version, boot mode, kernel cmdline, and vfio IDs. Standardize it across nodes if you have more than one.
- Keep passthrough scoped: Pass through the minimum set of devices. Don’t hand the VM the world because it was easier than thinking.
If you did everything above and you still see “No IOMMU detected,” you’re in one of the small, annoying edge cases: firmware that lies, a platform without real VT-d/AMD-Vi, or a bootloader path you didn’t realize existed. That’s when you stop making changes and start proving each layer again: firmware → cmdline → dmesg → groups. Systems don’t respond to vibes. They respond to configuration.