Fix Proxmox “IOMMU not enabled” for PCI Passthrough (VT-d/AMD-Vi) Safely

You’re staring at Proxmox, trying to pass a GPU, HBA, or NIC into a VM, and it throws the classic cold shower: “IOMMU not enabled”. You’ve enabled “something” in BIOS, you’ve rebooted twice, and now you’re wondering if the machine is gaslighting you.

This problem is almost never mystical. It’s usually one of five things: firmware toggles, the wrong bootloader file, missing kernel parameters, missing VFIO modules, or a board that groups devices like it’s trying to prevent you from having hobbies. We’ll fix it safely, with evidence at every step, and with rollback options so you don’t brick a remote host at 2 a.m.

What IOMMU actually is (and why Proxmox cares)

An IOMMU (Intel VT-d, AMD-Vi) is a memory management unit for devices. CPUs have an MMU that translates virtual memory addresses to physical addresses for processes. The IOMMU does the same kind of translation for DMA (Direct Memory Access) from PCIe devices: NICs, HBAs, GPUs, USB controllers, and friends.

PCI passthrough depends on this because when you hand a real device to a VM, you need the host to enforce “this device may only DMA into this VM’s memory.” Without that, a device could scribble over host memory or another VM’s memory. Security aside, it’s also stability: random DMA into the wrong place is the kind of chaos that produces creative kernel panics.

So Proxmox checks for IOMMU. If it can’t see it, it refuses to pretend everything is fine. That’s not Proxmox being picky. That’s Proxmox preventing your storage controller from accidentally becoming an abstract art generator.

One reliability mantra worth keeping in your pocket comes from John Allspaw (paraphrasing): reliability comes from designing for failure and learning from it, not from pretending failures won’t happen. That’s the posture here: make changes, verify them, keep rollback.

Also: “IOMMU not enabled” is a message about the host. It isn’t about your VM configuration yet. Don’t waste time editing VM config files until the host proves it has IOMMU enabled and working.

Interesting facts and history (why this stuff is messy)

  • Fact 1: Intel’s IOMMU branding is VT-d; VT-x is CPU virtualization. People mix them up constantly, including in vendor BIOS menus.
  • Fact 2: AMD’s IOMMU implementation is commonly called AMD-Vi, and the BIOS toggle is often labeled simply “IOMMU”.
  • Fact 3: Early PCI passthrough on Linux leaned on the old “pciback” approach before VFIO became the standard. VFIO won because it’s saner and safer.
  • Fact 4: IOMMU group boundaries come from ACPI tables and PCIe topology. Two identical CPUs with two different motherboards can behave wildly differently.
  • Fact 5: “ACS” (Access Control Services) is a PCIe feature that helps isolate devices behind switches/bridges. Some consumer boards omit it or implement it partially.
  • Fact 6: The Linux kernel has supported DMA remapping for a long time, but defaults and heuristics changed across versions—especially around performance vs safety trade-offs.
  • Fact 7: NVIDIA consumer GPUs historically fought virtualization in various ways; modern drivers are better, but the folklore persists because people have scars.
  • Fact 8: “Interrupt remapping” is part of the story: it helps keep device interrupts correctly isolated. Missing it can block some advanced passthrough setups.
  • Fact 9: Proxmox makes this approachable, but under the hood it’s still Linux: boot parameters, initramfs, modules, sysfs, and the occasional firmware gotcha.

That’s the context. Now the practical part: make the host prove it can do IOMMU, then make it prove it can isolate the device you care about, then pass it through.

Fast diagnosis playbook

If you’re on call, you don’t want philosophy. You want the fastest route to “is this BIOS, bootloader, kernel, or hardware topology?” Here’s the order that wins most often.

First: confirm the CPU and firmware toggle reality

  • Does the CPU/platform support VT-d/AMD-Vi?
  • Is it enabled in BIOS/UEFI (not just “virtualization” but IOMMU/VT-d specifically)?
  • Did you reboot after changing it? (Warm reboot usually counts, but some boards require a full power cycle.)

Second: confirm the kernel got the right parameters

  • Check your actual boot command line in /proc/cmdline.
  • Don’t trust the file you edited until you see it reflected in the running kernel.

Third: confirm the kernel actually initialized IOMMU

  • Look for DMAR/IOMMU lines in dmesg.
  • Check that /sys/kernel/iommu_groups exists and is populated.

Fourth: check grouping before blaming VFIO

  • Bad IOMMU grouping is a topology/firmware issue, not a VFIO issue.
  • Know what you’re willing to do: move the card to another slot, or accept ACS override risks.

Fifth: only then bind the device to vfio-pci

  • Identify the device by vendor:device ID.
  • Bind it in initramfs so the host driver doesn’t grab it first.

This order avoids the classic mistake: spending an hour on VFIO binding while IOMMU is still disabled at the firmware layer.
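
If you want the whole playbook as a 30-second triage, here’s a minimal sketch (the patterns cover both Intel and AMD; empty output at any step tells you which layer to go fix first):

cr0x@server:~$ lscpu | grep -Eo 'vmx|svm' | sort -u                          # CPU virtualization flag present?
cr0x@server:~$ grep -Eo '(intel|amd)_iommu=[^ ]+|iommu=[^ ]+' /proc/cmdline  # params actually applied?
cr0x@server:~$ dmesg | grep -Ei 'DMAR: IOMMU enabled|AMD-Vi' | head -n 3     # kernel initialized it?
cr0x@server:~$ ls /sys/kernel/iommu_groups/ 2>/dev/null | wc -l              # groups populated?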

Preflight safety: don’t strand yourself

Enabling IOMMU is usually safe. The unsafe part is how people do it: editing the wrong boot config, adding aggressive parameters, rebooting a remote host without out-of-band access, then discovering it won’t come back.

Before you touch anything:

  • Confirm you have console access (IPMI/iDRAC/iLO, KVM-over-IP, or at least someone who can plug in a monitor).
  • Schedule a reboot window. This is kernel/firmware territory; you are rebooting.
  • Take a snapshot of config files you change, and keep one known-good boot entry if possible.
  • Know your boot mode: GRUB vs systemd-boot. Proxmox can be either depending on install method and storage layout.

Short joke #1: If you’re enabling IOMMU on a remote box with no console, you’re not doing SRE—you’re doing performance art.

Hands-on tasks: commands, outputs, decisions (12+)

These tasks are ordered the way I’d do them on a production Proxmox host. Each one includes what it means and what decision you make next.

Task 1: confirm CPU virtualization extensions (sanity check)

cr0x@server:~$ lscpu | egrep -i 'Vendor ID|Model name|Virtualization|Flags'
Vendor ID:                       GenuineIntel
Model name:                      Intel(R) Xeon(R) CPU E-2278G @ 3.40GHz
Virtualization:                  VT-x
Flags:                           ... vmx ...

What it means: VT-x (Intel) or SVM (AMD) is CPU virtualization. This output does not confirm VT-d/AMD-Vi, but it tells you you’re not on ancient hardware.

Decision: Proceed. If you don’t see vmx (Intel) or svm (AMD), you may be in the wrong BIOS mode, or virtualization is disabled entirely.

Task 2: identify whether you’re on Intel or AMD (to pick kernel params)

cr0x@server:~$ lscpu | awk -F: '/Vendor ID/ {print $2}'
 GenuineIntel

What it means: Intel uses intel_iommu=on; AMD uses amd_iommu=on. Yes, there are other modes, but start simple.

Decision: Pick the correct parameter for your platform. Wrong parameter usually just does nothing, which is how you end up “sure you enabled it.”
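
If you build hosts from a script, a one-liner like this keeps you from pasting the wrong vendor’s parameter (a rough sketch: it assumes anything non-Intel is AMD, which is good enough for this decision):

cr0x@server:~$ grep -q GenuineIntel /proc/cpuinfo && echo "intel_iommu=on iommu=pt" || echo "amd_iommu=on iommu=pt"
intel_iommu=on iommu=pt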

Task 3: confirm boot mode and bootloader (GRUB vs systemd-boot)

cr0x@server:~$ proxmox-boot-tool status
Re-executing '/usr/sbin/proxmox-boot-tool' in new private mount namespace..
System currently booted with uefi
Found 1 ESP(s) configured:
  /dev/disk/by-uuid/9A1B-2C3D mounted on /boot/efi
Configured with: systemd-boot

What it means: This host uses systemd-boot. Editing /etc/default/grub would be a nice waste of time.

Decision: If you see systemd-boot, you’ll typically edit /etc/kernel/cmdline and run proxmox-boot-tool refresh. If you see GRUB, you edit /etc/default/grub and run update-grub.

Task 4: check current kernel command line (the truth source)

cr0x@server:~$ cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-6.8.12-4-pve root=/dev/mapper/pve-root ro quiet

What it means: No IOMMU parameters are currently in effect.

Decision: Add the correct parameters and reboot. Do not proceed to VFIO until this changes.

Task 5A (systemd-boot): set kernel parameters safely

cr0x@server:~$ sudo cp -a /etc/kernel/cmdline /etc/kernel/cmdline.bak
cr0x@server:~$ echo "root=/dev/mapper/pve-root ro quiet intel_iommu=on iommu=pt" | sudo tee /etc/kernel/cmdline
root=/dev/mapper/pve-root ro quiet intel_iommu=on iommu=pt

What it means: You’ve set the parameters for Intel. iommu=pt uses pass-through mode for host devices (often better performance, still keeps isolation for VFIO).

Decision: Refresh boot entries and initramfs integration using Proxmox tooling.

Task 6A (systemd-boot): apply and sync boot config

cr0x@server:~$ sudo proxmox-boot-tool refresh
Re-executing '/usr/sbin/proxmox-boot-tool' in new private mount namespace..
Running hook script 'proxmox-auto-removal'..
Copying kernel and creating EFI boot entry
Refreshing /dev/disk/by-uuid/9A1B-2C3D
Success.

What it means: Your new kernel cmdline is now in the UEFI boot environment Proxmox uses.

Decision: Reboot when you have console access.

Task 5B (GRUB alternative): set kernel parameters

cr0x@server:~$ sudo cp -a /etc/default/grub /etc/default/grub.bak
cr0x@server:~$ sudo sed -i 's/GRUB_CMDLINE_LINUX_DEFAULT="\(.*\)"/GRUB_CMDLINE_LINUX_DEFAULT="\1 intel_iommu=on iommu=pt"/' /etc/default/grub
cr0x@server:~$ grep GRUB_CMDLINE_LINUX_DEFAULT /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"

What it means: Parameters will be embedded into GRUB-generated config next.

Decision: Run update-grub and reboot.

Task 6B (GRUB alternative): regenerate GRUB config

cr0x@server:~$ sudo update-grub
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-6.8.12-4-pve
Found initrd image: /boot/initrd.img-6.8.12-4-pve
done

What it means: GRUB config is updated. Still not active until reboot.

Decision: Reboot with console access ready.

Task 7: reboot and verify cmdline actually changed

cr0x@server:~$ sudo reboot
cr0x@server:~$ cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-6.8.12-4-pve root=/dev/mapper/pve-root ro quiet intel_iommu=on iommu=pt

What it means: You’re no longer guessing. The running kernel has IOMMU parameters.

Decision: Now check kernel logs to confirm initialization.

Task 8: confirm IOMMU initialized in dmesg (Intel DMAR or AMD IOMMU)

cr0x@server:~$ dmesg | egrep -i 'DMAR|IOMMU|AMD-Vi|Interrupt Remapping' | head -n 20
[    0.000000] DMAR: IOMMU enabled
[    0.000000] DMAR: Host address width 39
[    0.123456] DMAR: DRHD base: 0x000000fed90000 flags: 0x0
[    0.234567] DMAR: Interrupt remapping enabled

What it means: This is the money shot. If you don’t see “IOMMU enabled” (or AMD equivalents), you are still not done.

Decision: If absent, go back: firmware settings, wrong boot parameter, or a platform limitation.

Task 9: check for IOMMU groups existing in sysfs

cr0x@server:~$ ls -1 /sys/kernel/iommu_groups | head
0
1
10
11
12
13
14
15
16
17

What it means: Groups exist. That strongly suggests IOMMU is operational.

Decision: Move on to identifying your device and verifying isolation.

Task 10: list PCI devices and find the one you want to pass through

cr0x@server:~$ lspci -nn
00:00.0 Host bridge [0600]: Intel Corporation Device [8086:3ec2]
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU104GL [Quadro RTX 4000] [10de:1eb1]
01:00.1 Audio device [0403]: NVIDIA Corporation TU104 HD Audio Controller [10de:10f8]
03:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller [144d:a808]

What it means: The GPU is at 01:00.0 and its audio function at 01:00.1. For GPUs, you usually pass through both functions together.

Decision: Record BDF addresses and vendor:device IDs. You’ll need them for VFIO binding and for the VM config.

Task 11: see which kernel driver currently owns the device

cr0x@server:~$ lspci -nnk -s 01:00.0
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU104GL [Quadro RTX 4000] [10de:1eb1]
        Subsystem: Hewlett-Packard Company Device [103c:12a3]
        Kernel driver in use: nouveau
        Kernel modules: nouveau

What it means: The host driver nouveau grabbed the GPU. That will fight VFIO.

Decision: Plan to bind this device to vfio-pci early (initramfs) and blacklist conflicting drivers if needed.

Task 12: load VFIO modules (and make them persistent)

cr0x@server:~$ printf "vfio\nvfio_iommu_type1\nvfio_pci\nvfio_virqfd\n" | sudo tee /etc/modules-load.d/vfio.conf
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd
cr0x@server:~$ sudo modprobe vfio_pci
cr0x@server:~$ lsmod | egrep '^vfio'
vfio_pci               16384  0
vfio_iommu_type1       45056  0
vfio                   65536  2 vfio_pci,vfio_iommu_type1

What it means: VFIO is available now and will be loaded at boot in the future. One note: on kernels 6.2 and newer (including the 6.8 PVE kernel here), vfio_virqfd has been folded into the core vfio module, so listing it is harmless but no longer required.

Decision: Next bind the specific device IDs to vfio-pci, then rebuild initramfs so the binding happens before host GPU drivers attach.

Task 13: bind specific vendor:device IDs to vfio-pci

cr0x@server:~$ echo "options vfio-pci ids=10de:1eb1,10de:10f8 disable_vga=1" | sudo tee /etc/modprobe.d/vfio.conf
options vfio-pci ids=10de:1eb1,10de:10f8 disable_vga=1

What it means: On boot, vfio-pci should claim those IDs.

Decision: If this is your only GPU and you rely on local console, pause and think. Passing through the only display adapter can make the host “headless” in surprising ways.
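
Before rebooting, it’s worth a ten-second sanity check that the IDs you wrote match real hardware on this host, and don’t match more devices than you intended:

cr0x@server:~$ lspci -nn | grep -E '10de:1eb1|10de:10f8'
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU104GL [Quadro RTX 4000] [10de:1eb1]
01:00.1 Audio device [0403]: NVIDIA Corporation TU104 HD Audio Controller [10de:10f8]

Two matches for two functions is what you expect here. More matches means identical cards in the box, and ID-based binding will claim all of them.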

Task 14: blacklist conflicting GPU drivers (only if needed)

cr0x@server:~$ printf "blacklist nouveau\noptions nouveau modeset=0\n" | sudo tee /etc/modprobe.d/blacklist-nouveau.conf
blacklist nouveau
options nouveau modeset=0

What it means: Prevents the open-source NVIDIA driver from binding first. Similar logic applies to other devices (e.g., ixgbe for certain NICs), but blacklisting NIC drivers can cut your own network cable. Don’t do that casually.

Decision: Rebuild initramfs and reboot to ensure early binding.

Task 15: rebuild initramfs (so VFIO binding happens early)

cr0x@server:~$ sudo update-initramfs -u -k all
update-initramfs: Generating /boot/initrd.img-6.8.12-4-pve

What it means: Boot-time driver selection will now honor your VFIO config earlier.

Decision: Reboot and verify the device is claimed by vfio-pci.

Task 16: verify the device is now bound to vfio-pci

cr0x@server:~$ sudo reboot
cr0x@server:~$ lspci -nnk -s 01:00.0
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU104GL [Quadro RTX 4000] [10de:1eb1]
        Subsystem: Hewlett-Packard Company Device [103c:12a3]
        Kernel driver in use: vfio-pci
        Kernel modules: nouveau

What it means: Perfect. The host sees the card, but vfio-pci owns it. The “Kernel modules” line can still list possible modules; what matters is “driver in use”.

Decision: Now check IOMMU group membership to ensure you can pass it through safely.

Task 17: list IOMMU groups with devices (the topology truth)

cr0x@server:~$ for g in /sys/kernel/iommu_groups/*; do \
  echo "IOMMU Group ${g##*/}"; \
  for d in "$g"/devices/*; do echo "  $(lspci -nn -s ${d##*/})"; done; \
done | sed -n '1,40p'
IOMMU Group 1
  00:01.0 PCI bridge [0604]: Intel Corporation Device [8086:1901]
IOMMU Group 2
  01:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU104GL [Quadro RTX 4000] [10de:1eb1]
  01:00.1 Audio device [0403]: NVIDIA Corporation TU104 HD Audio Controller [10de:10f8]
IOMMU Group 3
  03:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller [144d:a808]

What it means: The GPU functions are isolated in their own group. That’s what you want. If your GPU shares a group with, say, a SATA controller or USB controller you need for the host, you have a problem.

Decision: If isolation is good, proceed to VM config. If not, you’re in “slot shuffling / BIOS updates / ACS override decision” territory.

Task 18: check KVM is actually usable (don’t skip this on odd platforms)

cr0x@server:~$ lsmod | egrep 'kvm|vfio'
kvm_intel             380928  0
kvm                  1032192  1 kvm_intel
vfio_pci               16384  0
vfio_iommu_type1       45056  0
vfio                   65536  2 vfio_pci,vfio_iommu_type1

What it means: Virtualization is present (kvm_intel or kvm_amd) and VFIO stack is loaded.

Decision: If KVM isn’t loaded, you’re solving a different problem (BIOS virtualization disabled or nested virtualization complexity). Don’t confuse it with IOMMU.

Task 19: verify Proxmox sees IOMMU in a user-facing way

cr0x@server:~$ pvesh get /nodes/$(hostname)/hardware/pci --noborder | head -n 20
class  device                                       id           iommugroup subsystem_id subsystem_name
0x0300 NVIDIA Corporation TU104GL [Quadro RTX 4000] 0000:01:00.0 2          103c:12a3    Hewlett-Packard

What it means: Proxmox can map devices to IOMMU groups. That’s a strong indicator your earlier work is correct.

Decision: Configure passthrough on the VM. If Proxmox still claims “IOMMU not enabled” at this point, you’re likely looking at a stale UI state, wrong node, or you enabled it on the wrong host.
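
For reference, the CLI equivalent of attaching the GPU in the VM’s hardware tab looks roughly like this. The VM ID 100 is a placeholder, pcie=1 assumes a q35 machine type, and x-vga=1 is only for primary-GPU setups; check the qm documentation on your Proxmox version before leaning on specific flags:

cr0x@server:~$ sudo qm set 100 -hostpci0 01:00,pcie=1,x-vga=1   # omitting the function passes 01:00.0 and 01:00.1 together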

GRUB vs systemd-boot on Proxmox: pick the right lever

This is the single most common time sink I see in real environments: someone edits GRUB, but the system boots via systemd-boot; or they edit /etc/kernel/cmdline, but the system is GRUB-based. Both approaches are “valid Linux work” that changes absolutely nothing on the running host.

How to tell quickly

  • systemd-boot (common with ZFS root, UEFI): proxmox-boot-tool status shows “Configured with: systemd-boot”. You edit /etc/kernel/cmdline and run proxmox-boot-tool refresh.
  • GRUB: efibootmgr -v often shows GRUB, and proxmox-boot-tool may not be in use. You edit /etc/default/grub then run update-grub.

Kernel parameters that are usually sane defaults

  • Intel: intel_iommu=on iommu=pt
  • AMD: amd_iommu=on iommu=pt

Parameters people add because they saw them online (and why to be cautious)

  • pcie_acs_override=downstream,multifunction: can split groups artificially; also reduces isolation guarantees. Use only when you understand the risk and accept it.
  • intel_iommu=on,igfx_off: can help with iGPU quirks on some systems, but don’t cargo-cult it.
  • iommu=soft: not what you want for passthrough; it’s a fallback mode and can defeat your goal.

The right mindset: add the minimum parameters to get IOMMU enabled. Verify. Then solve grouping. Then solve binding. Anything else is seasoning, not the meal.

Binding devices to VFIO (without stealing your boot disk)

Once IOMMU works, VFIO is the mechanism that makes passthrough manageable. The core idea is simple: tell the host to attach a device to vfio-pci instead of its usual driver, so QEMU can take it for the VM.

The safe way: bind by vendor:device ID, not by “whatever is at 01:00.0”

Binding by PCI address can work, but binding by ID is typically more robust across reboots and topology changes. It cuts both ways, though: an ID matches every device of that model, so if your boot-time storage controller shares an ID with the card you meant to pass through, you will have a very educational reboot.
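
If you do need address-based binding (say, two identical cards and only one should go to a VM), the kernel’s per-device driver_override knob in sysfs is the usual mechanism. A minimal runtime sketch using the GPU address from earlier; this does not persist across reboots on its own, so treat it as a test or as something you wire into your own boot-time script:

cr0x@server:~$ echo vfio-pci | sudo tee /sys/bus/pci/devices/0000:01:00.0/driver_override
vfio-pci
cr0x@server:~$ echo 0000:01:00.0 | sudo tee /sys/bus/pci/devices/0000:01:00.0/driver/unbind   # only if a driver currently owns it
0000:01:00.0
cr0x@server:~$ echo 0000:01:00.0 | sudo tee /sys/bus/pci/drivers_probe                        # reprobe; vfio-pci claims the device
0000:01:00.0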

What to pass through together

  • GPUs: usually VGA function + HDMI/DP audio function.
  • HBAs: the entire controller. Don’t pass through individual disks from a controller you also use on the host unless you like edge cases.
  • USB controllers: whole controller is often better than per-device USB passthrough, especially for dongles and low-latency devices.
  • NICs: can be great, but think through management access. If you pass through the only NIC, you might cut your own SSH session mid-flight.

When to blacklist drivers

If the host driver grabs the device before VFIO can, binding can fail. Blacklisting helps, but it’s a blunt tool. Blacklisting a GPU driver on a headless server is fine. Blacklisting a NIC driver on a remote host is how you learn whether you really have IPMI.

Short joke #2: Blacklisting your only NIC driver is a fast way to achieve “air-gapped” compliance.
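
A gentler alternative to a hard blacklist is a modprobe softdep, which just makes sure vfio-pci loads before the host driver so your ID binding wins the race. A sketch for the nouveau case above (the file name is arbitrary):

cr0x@server:~$ echo "softdep nouveau pre: vfio-pci" | sudo tee /etc/modprobe.d/nouveau-vfio.conf
softdep nouveau pre: vfio-pci
cr0x@server:~$ sudo update-initramfs -u -k all   # modprobe.d changes need to land in the initramfs too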

IOMMU groups: how to read them and what to do about bad grouping

IOMMU groups are the security boundary. Devices in the same group cannot be safely isolated from each other for passthrough purposes. Practically, that means: if you pass one device from a group to a VM, you usually need to pass all devices in that group, or accept increased risk.
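
To check a single device’s group without dumping the whole topology, the sysfs symlink is enough (GPU address from the earlier lspci output):

cr0x@server:~$ basename "$(readlink -f /sys/bus/pci/devices/0000:01:00.0/iommu_group)"
2
cr0x@server:~$ ls /sys/kernel/iommu_groups/2/devices/
0000:01:00.0  0000:01:00.1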

What “bad grouping” looks like

A common nightmare: the GPU shares an IOMMU group with a SATA controller, a USB controller, or a PCI bridge that also hosts devices you need. Proxmox will often refuse passthrough or warn loudly. That’s not Proxmox being dramatic; it’s reflecting the platform’s ability to isolate DMA.

Three levers that actually work (ranked by correctness)

  1. Move the card to a different PCIe slot. Different slot, different downstream ports, different grouping. This is the most effective “free” fix.
  2. Update BIOS/UEFI firmware. Vendors sometimes fix ACS/DMAR table issues. Sometimes they don’t, but it’s worth trying before hacks.
  3. ACS override kernel parameter. This can split groups by force. It can also give you a false sense of isolation. Use it only if you accept the risk model (typically a homelab; rarely a regulated environment).

What I do in production

In production environments where the VM boundary matters, I avoid ACS override. If the board doesn’t group devices properly, I pick different hardware, or I redesign: SR-IOV for NICs where possible, or a host-level service instead of passthrough.
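
For NICs specifically, SR-IOV is the designed-for-purpose alternative: the card exposes virtual functions that each get their own IOMMU group on sane hardware. A minimal sketch, assuming a supported NIC whose physical function shows up as enp1s0f0 (the interface name and VF count are illustrative):

cr0x@server:~$ cat /sys/class/net/enp1s0f0/device/sriov_totalvfs   # how many VFs the card supports
cr0x@server:~$ echo 4 | sudo tee /sys/class/net/enp1s0f0/device/sriov_numvfs
4
cr0x@server:~$ lspci -nn | grep -i 'virtual function'              # the VFs appear as new PCI devices

The sriov_numvfs setting doesn’t survive a reboot on its own; people usually wrap it in a small systemd unit or udev rule.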

If you are doing a workstation-in-a-VM on a single machine and you’re willing to trade strict isolation for functionality, ACS override can be acceptable. Just don’t pretend it’s “the same security.” It isn’t.

Three corporate mini-stories from the trenches

1) Incident caused by a wrong assumption: “virtualization enabled” meant VT-d was enabled

A team rolled out a new Proxmox cluster for a lab that hosted a pile of appliance VMs. One of those appliances needed a dual-port NIC passed through for packet timing and some vendor support requirement. The engineer doing the build enabled “Intel Virtualization Technology” in BIOS, verified KVM loaded, and moved on.

During the cutover window, passthrough failed with “IOMMU not enabled.” The team assumed it was a Proxmox regression because the kernel had recently updated. They chased VFIO configs for an hour, rebuilt initramfs twice, and blacklisted a driver that didn’t matter. Meanwhile, the old environment was already being drained and powered down because the plan was “clean switch.”

The fix was embarrassingly simple: the BIOS had two separate toggles—VT-x and VT-d. VT-x was enabled; VT-d was disabled by default. The machine rebooted, DMAR appeared in dmesg, and everything worked.

The lesson wasn’t “read the manual.” The lesson was: always verify the running kernel state (/proc/cmdline, dmesg, /sys/kernel/iommu_groups) before touching VFIO. Assumptions are comfortable. Production isn’t.

2) Optimization that backfired: ACS override as a shortcut

Another environment wanted to maximize density. They had a set of hosts with consumer-ish motherboards (the kind that look great on a spec sheet). They needed to pass through multiple devices per host: a GPU for VDI-ish workloads and a USB controller for licensing dongles. Grouping was bad: several root ports and endpoints landed in one giant IOMMU group.

Someone found the ACS override parameter and pitched it as a “clean fix.” It did split the groups. Proxmox stopped complaining. The rollout resumed, and the dashboards were green.

Weeks later, they began seeing rare but nasty issues: VMs would occasionally lock under heavy I/O. The host didn’t crash, but the affected VM became unresponsive, and resets sometimes failed. There was no single smoking gun. Just scattered VFIO timeouts and occasional PCIe AER noise.

They eventually rolled back the ACS override and reworked the design: moved critical passthrough workloads onto servers with better PCIe isolation and used network-based USB redirection where it was tolerable. The “optimization” wasn’t performance-related; it was schedule-related. It bought time and cost stability.

The moral: ACS override is not a free lunch. It’s a compromise you should write down as a risk, not a “final solution.”

3) Boring but correct practice that saved the day: staged reboot and rollback boot entries

A storage-heavy shop ran Proxmox hosts with HBAs for ZFS and a dedicated NIC passthrough for a firewall VM. They needed to enable IOMMU on a subset of hosts to support a new PCIe card. They did it the slow, correct way: one host at a time, after-hours, with out-of-band console verified.

They also kept a rollback plan that was almost offensively unglamorous. Before changing boot parameters, they copied the relevant config file, noted the current /proc/cmdline, and ensured there was a known-good boot entry available. On systemd-boot hosts, they verified proxmox-boot-tool status and ran proxmox-boot-tool refresh explicitly. No “I think it syncs automatically.”

On the third host, enabling IOMMU exposed a firmware bug: the machine booted, but the PCIe bus enumeration changed and a NIC name shifted. Their firewall VM didn’t start because its configuration referenced a now-missing interface mapping. Because the change was staged, only one site segment was affected, and rollback was straightforward.

They fixed it by pinning interface naming more robustly and then proceeded host-by-host. The practice that saved them wasn’t genius—it was making one change at a time and having a way back. The reliability win was boredom.

Common mistakes: symptom → root cause → fix

1) Symptom: Proxmox UI says “IOMMU not enabled” after you “enabled virtualization” in BIOS

Root cause: VT-x/SVM enabled, but VT-d/AMD-Vi (IOMMU) is still disabled, or requires a power cycle.

Fix: Enable VT-d (Intel) or IOMMU/AMD-Vi (AMD) explicitly in firmware. Then cold boot if necessary. Confirm with dmesg | egrep -i 'DMAR|AMD-Vi'.

2) Symptom: You edited GRUB, but /proc/cmdline never changes

Root cause: Host uses systemd-boot, not GRUB (common on UEFI + ZFS setups).

Fix: Edit /etc/kernel/cmdline, run proxmox-boot-tool refresh, reboot, re-check /proc/cmdline.

3) Symptom: /proc/cmdline includes intel_iommu=on, but dmesg shows no DMAR lines

Root cause: Firmware still disables VT-d, or platform lacks it, or DMAR tables are broken/hidden due to BIOS settings (sometimes “Above 4G decoding” and related PCIe settings interact).

Fix: Re-check firmware options, update BIOS, try enabling “Above 4G decoding” on some platforms (especially for multiple GPUs), and verify on boot logs.

4) Symptom: IOMMU enabled, but your GPU shares a group with half the machine

Root cause: Poor ACS support / PCIe topology on the motherboard. Common on consumer boards.

Fix: Move card to another slot, update BIOS, or accept ACS override risk. If this is production with strict isolation needs: change hardware.

5) Symptom: VM won’t start; “device is in use” or “cannot bind to vfio”

Root cause: Host driver still owns the device (nouveau/nvidia/amdgpu, or a storage/NIC driver), or binding wasn’t in initramfs.

Fix: Bind with /etc/modprobe.d/vfio.conf, rebuild initramfs, reboot, confirm Kernel driver in use: vfio-pci.

6) Symptom: Host boots, but local console is dead after VFIO binding

Root cause: You bound the only GPU to VFIO. The host has nothing to display with.

Fix: Use an iGPU for host console, add a cheap second GPU for host, or be comfortable with headless + remote management.

7) Symptom: Passthrough works, but performance is weird (latency spikes)

Root cause: Interrupt remapping disabled, MSI/MSI-X quirks, power management, or CPU pinning/NUMA mismatch. Not strictly “IOMMU not enabled,” but often discovered right after.

Fix: Confirm interrupt remapping in dmesg, check NUMA locality, consider pinning vCPUs, and avoid passing devices across NUMA nodes if you can.
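
A quick locality check before pinning anything (GPU address from earlier; numa_node prints -1 when the platform doesn’t report a node, which is common on single-socket boards):

cr0x@server:~$ cat /sys/bus/pci/devices/0000:01:00.0/numa_node
0
cr0x@server:~$ lscpu | grep -i 'numa node'
NUMA node(s):                    1
NUMA node0 CPU(s):               0-15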

8) Symptom: After enabling IOMMU, network interface names changed and VMs lost connectivity

Root cause: PCI enumeration changed; predictable interface naming shifted; bridge config references old names.

Fix: Use stable naming (by MAC in Proxmox network config), verify /etc/network/interfaces, and treat IOMMU enablement as a “reboot with potential enumeration changes.”
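
One way to pin a name is a systemd .link file keyed on the MAC address, so /etc/network/interfaces keeps referencing the same physical port no matter how enumeration shifts (the MAC and the chosen name here are placeholders):

cr0x@server:~$ cat /etc/systemd/network/10-lan0.link
[Match]
MACAddress=aa:bb:cc:dd:ee:ff
Type=ether

[Link]
Name=lan0

After creating it, rebuilding the initramfs and rebooting is cheap insurance so early-boot udev sees the file too; then update any bridge-ports lines that referenced the old name.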

Checklists / step-by-step plan

Step-by-step: enabling IOMMU safely on a Proxmox host

  1. Get console access (IPMI/iDRAC/iLO). If you can’t, stop.
  2. Identify platform: Intel vs AMD (lscpu).
  3. Enable firmware setting: VT-d (Intel) or IOMMU/AMD-Vi (AMD). Save, reboot. If it still doesn’t show up later, do a cold boot.
  4. Identify bootloader: proxmox-boot-tool status.
  5. Add kernel params: Intel intel_iommu=on iommu=pt or AMD amd_iommu=on iommu=pt.
  6. Apply bootloader changes: systemd-boot proxmox-boot-tool refresh; GRUB update-grub.
  7. Reboot.
  8. Verify:
    • /proc/cmdline contains your params
    • dmesg shows DMAR/IOMMU enabled
    • /sys/kernel/iommu_groups populated

Step-by-step: preparing a device for passthrough

  1. Identify device: lspci -nn and capture vendor:device IDs.
  2. Check group: enumerate IOMMU groups; ensure the device isn’t glued to critical host devices.
  3. Load VFIO modules persistently (/etc/modules-load.d/vfio.conf).
  4. Bind IDs to vfio-pci (/etc/modprobe.d/vfio.conf).
  5. Blacklist host driver only if needed (avoid blacklisting NIC/storage drivers unless you really mean it).
  6. Rebuild initramfs and reboot.
  7. Verify binding: lspci -nnk shows Kernel driver in use: vfio-pci.
  8. Only then attach the device to the VM in Proxmox.

Rollback checklist (because grown-ups plan for rollback)

  • Remove IOMMU kernel params from GRUB or /etc/kernel/cmdline.
  • Remove or comment out VFIO ID binding in /etc/modprobe.d/vfio.conf.
  • Remove blacklists you added.
  • Rebuild initramfs.
  • Refresh bootloader config (GRUB/systemd-boot).
  • Reboot and confirm the original driver owns the device again.

FAQ

1) Is VT-x the same as VT-d?

No. VT-x is CPU virtualization. VT-d is IOMMU for devices. You can have VT-x working (KVM loads) while VT-d is off and passthrough fails.

2) On AMD, what am I enabling in BIOS?

Look for “IOMMU” or “AMD-Vi.” “SVM” is CPU virtualization, not device DMA remapping. You typically want both enabled for Proxmox virtualization + passthrough.

3) Do I need iommu=pt?

Usually it’s a good default on hosts doing passthrough. It can reduce overhead for host devices while keeping VFIO isolation. If you’re troubleshooting, you can remove it to simplify, but most setups run fine with it.

4) Why does Proxmox still say “IOMMU not enabled” after I set kernel parameters?

Because the kernel parameters might not be applied (edited wrong bootloader config), or firmware VT-d/AMD-Vi is off, or you didn’t reboot. Verify /proc/cmdline first. If it doesn’t show the params, nothing else matters.

5) What if IOMMU groups are terrible?

First try a different PCIe slot and a BIOS update. If that fails, decide whether you accept ACS override risk. If you need strong isolation, don’t “fix” bad hardware with a kernel hack—use better hardware.

6) Can I pass through a USB device without passing through a whole controller?

Yes, via USB passthrough at the QEMU layer. But for flaky devices (dongles, VR gear, low-latency input), passing through a whole USB controller is often more reliable.

7) Do I need to disable the host’s GPU driver?

Only if the host driver binds to the GPU before VFIO does. The cleanest approach is VFIO ID binding in initramfs. Blacklisting helps when the driver is aggressive, but it’s a blunt instrument.

8) Will enabling IOMMU break anything?

Usually no. Occasionally it changes PCI enumeration or reveals firmware bugs. That’s why you stage reboots, verify network naming, and keep rollback options.

9) Can I enable IOMMU without rebooting?

No. This is a boot-time hardware/firmware and kernel initialization feature. If someone claims otherwise, they’re selling you vibes.

10) What’s the difference between passing through by PCI address and by vendor:device ID?

Address binding targets a specific slot/path; ID binding targets a device model. ID binding is common and stable but can catch more devices than intended if you have duplicates. In production, be explicit and verify with lspci -nnk after reboot.

Conclusion: next steps that actually move the needle

If you’re seeing “IOMMU not enabled,” don’t thrash around in VM configs. Make the host prove IOMMU is enabled:

  • Verify firmware toggle: VT-d/AMD-Vi is on.
  • Verify kernel params are applied via /proc/cmdline.
  • Verify kernel initialized IOMMU via dmesg and /sys/kernel/iommu_groups.
  • Verify device isolation via IOMMU groups before binding and passthrough.

Then do VFIO the disciplined way: bind the right IDs, rebuild initramfs, reboot, confirm vfio-pci owns the device, and only then attach it to the VM.

Finally, decide like an adult about grouping: move slots and update firmware first; consider ACS override only if you accept the trade. The safest passthrough is the one your hardware was designed to support.
