PCI passthrough is the kind of feature you only notice when it breaks: your VM won’t boot, your GPU goes black,
your NIC vanishes, or Proxmox politely informs you that the device is “in use” by something that definitely isn’t you.
And then you discover IOMMU groups: the platform's way of telling you which devices can only be isolated as a set.
This piece is for the operators who need passthrough to work reliably, not “works after three reboots and a sacrifice.”
We’ll diagnose real failure modes, use commands you can run right now, and make the sharp calls: when to fix BIOS,
when to rebind drivers, and when to stop fighting the motherboard and buy a better one.
Fast diagnosis playbook
If you’re on-call and a VM won’t start after enabling passthrough, you don’t need philosophy. You need to find the bottleneck fast.
Here’s the short playbook I use in production, in the order that reveals the most with the least effort.
1) Confirm IOMMU is actually enabled (BIOS + kernel)
Most “VFIO is broken” tickets are “IOMMU never turned on.” Or it’s on in BIOS but off in the kernel, or vice versa.
Check kernel logs and active command line first; don’t guess.
2) Check the IOMMU group of the target device
If the GPU shares a group with a SATA controller, that’s not a “tweak it later” situation. That’s the problem.
Decide early whether you can live with passing the whole group, whether ACS override is acceptable, or whether you need different hardware.
3) Check who owns the device right now
If the host is using the GPU for console output, or the NIC is claimed by a kernel module, the VM can’t take it.
Bind to vfio-pci and blacklist the competing driver. Verify with lspci -k.
4) Validate VM configuration choices that silently sabotage you
Machine type, ROM BAR sizing, primary GPU selection, UEFI vs SeaBIOS, and reset quirks cause 80% of “black screen” issues.
This is where you stop “trying random forum stuff” and start making intentional choices.
5) Only then: chase vendor-specific weirdness
NVIDIA Code 43, AMD reset bugs, Intel iGPU GVT alternatives, and multifunction devices are real—but don’t go there until
the fundamentals are clean.
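If you want the first three checks as one copy-paste triage, something like the following is enough to classify most tickets. The device address 0000:01:00.0 is an example; substitute your own.
dmesg | grep -iE 'DMAR|IOMMU|AMD-Vi' | grep -iE 'enabled|init'    # is the IOMMU actually up?
cat /proc/cmdline                                                 # are the kernel flags present?
readlink /sys/bus/pci/devices/0000:01:00.0/iommu_group            # which group is the target in?
lspci -k -s 01:00.0                                               # which driver owns it right now?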
One paraphrased idea from Leslie Lamport applies here: a distributed system fails because of a component you didn't even know existed. Passthrough failures have exactly that shape.
IOMMU groups: what they are and why you should care
An IOMMU is the hardware boundary that lets the host control how PCIe devices access memory. Without it, passthrough is basically
“please don’t DMA into my hypervisor,” which is not a security model—more of a hope-and-prayer model.
With IOMMU, you can hand a device to a VM and still enforce memory isolation.
IOMMU groups are the practical consequence of PCIe topology. Devices behind the same upstream bridge without proper isolation
(ACS: Access Control Services) can’t be cleanly separated. Linux then groups them, meaning: if you pass one device through,
you must treat the entire group as potentially able to snoop or interfere with the others.
Operators often misunderstand this in a very specific way: they think groups are a suggestion. They’re not. They’re the kernel
describing what hardware can be safely isolated. Sometimes you can cheat with ACS override; sometimes you shouldn’t.
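If you want to see whether a given root port or bridge actually advertises ACS, a quick hedged check (the address 00:01.0 is just an example; pick a bridge from lspci -t):
sudo lspci -vvv -s 00:01.0 | grep -iE 'access control|acsc'
# Lines like "Access Control Services" / "ACSCap" / "ACSCtl" mean ACS exists on that port;
# no output usually means no ACS, and devices below it will land in the same group.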
The three buckets of passthrough failure
- Isolation failure: the device is in a group with things you can’t sacrifice (e.g., storage controller, host USB, primary NIC).
- Ownership failure: the host kernel driver owns the device (or Proxmox services do), so VFIO can’t bind it.
- Runtime failure: the VM starts but device doesn’t initialize (reset issues, ROM quirks, firmware mismatch, BAR sizing, GPU driver detection).
Treat those buckets like a branching decision tree. Don’t “keep tweaking” without knowing which category you’re in.
Joke #1: IOMMU groups are like org charts—everything looks separable until you try to move one person and discover three teams collapse.
Facts and history that explain today’s pain
You can run passthrough without knowing the history, but knowing it helps you predict which platforms will behave and which will gaslight you.
Here are concrete points that matter operationally.
- VT-d (Intel) and AMD-Vi (AMD) arrived as the “DMA version” of virtualization, not the compute version. CPU virtualization (VT-x/AMD-V) isn’t enough for PCI passthrough.
- Early consumer chipsets often implemented IOMMU poorly or with limited isolation; server platforms tended to do it correctly first.
- PCIe ACS became the linchpin for group separation; many consumer boards omit ACS features on downstream ports, creating giant groups.
- VFIO replaced older device assignment methods (historically KVM used a mix of mechanisms). VFIO standardized secure device access and is the modern baseline.
- GPU passthrough grew up in the shadow of vendor policies; drivers sometimes behaved differently in VMs, and the community built workarounds.
- UEFI adoption changed how option ROMs are loaded and how GPUs initialize. Many “black screen” incidents are really firmware/boot pipeline mismatches.
- Resizable BAR is great for performance on bare metal, but it can complicate passthrough and BAR mapping in VMs depending on platform and firmware.
- Modern kernels keep tightening security; what “worked” with permissive defaults years ago may now require explicit flags (and that’s good).
- ACS override patches became popular precisely because the hardware ecosystem didn’t prioritize clean isolation for enthusiasts—useful, but a trade-off.
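The Resizable BAR point above is cheap to verify on your own hardware (01:00.0 is the example GPU address, and the capability label in lspci output varies slightly by pciutils version):
sudo lspci -vv -s 01:00.0 | grep -iA3 'resizable bar'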
Practical tasks (commands, outputs, decisions)
The goal here is not to dump commands. It’s to make each command earn its keep: run it, interpret the output, then make a decision.
The host examples assume a Debian-based Proxmox node.
Task 1: Confirm CPU virtualization and IOMMU capability
cr0x@server:~$ lscpu | egrep -i 'Virtualization|Model name|Vendor ID'
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) CPU E-2246G @ 3.60GHz
Virtualization: VT-x
What it means: VT-x/AMD-V is present, but this does not confirm VT-d/AMD-Vi is enabled.
Decision: proceed to kernel/IOMMU checks; don’t assume VT-d exists just because VT-x does.
Task 2: Check kernel boot parameters for IOMMU
cr0x@server:~$ cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-6.8.12-4-pve root=ZFS=rpool/ROOT/pve-1 ro quiet intel_iommu=on iommu=pt
What it means: intel_iommu=on enables Intel IOMMU; iommu=pt uses pass-through mapping for host devices (often reduces overhead).
Decision: if these flags aren’t present, add them and reboot before doing anything else.
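How you add the flags depends on the bootloader. On Proxmox, GRUB and systemd-boot (common on ZFS/UEFI installs) use different files; a minimal sketch for both, assuming default paths:
# GRUB-booted nodes: add the flags to GRUB_CMDLINE_LINUX_DEFAULT, then regenerate the config
sudo nano /etc/default/grub          # append: intel_iommu=on iommu=pt (or amd_iommu=on iommu=pt)
sudo update-grub
# ZFS-root/UEFI nodes on systemd-boot keep the command line in /etc/kernel/cmdline instead
sudo nano /etc/kernel/cmdline        # append the same flags to the single line
sudo proxmox-boot-tool refresh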
Task 3: Verify IOMMU actually initialized
cr0x@server:~$ dmesg | egrep -i 'DMAR|IOMMU|AMD-Vi' | head -n 15
[ 0.000000] DMAR: IOMMU enabled
[ 0.000000] DMAR: Host address width 39
[ 0.112345] DMAR: DRHD base: 0x000000fed90000 flags: 0x0
[ 0.112678] DMAR: Initialized
What it means: the kernel sees and enabled the IOMMU.
Decision: if you don’t see “IOMMU enabled/Initialized,” go to BIOS (VT-d/AMD-Vi) and fix the platform first.
Task 4: List PCI devices with IDs (you will need these)
cr0x@server:~$ lspci -nn
00:00.0 Host bridge [0600]: Intel Corporation Device [8086:3e0f]
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU104 [GeForce RTX 2080] [10de:1e87]
01:00.1 Audio device [0403]: NVIDIA Corporation TU104 HD Audio Controller [10de:10f8]
02:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller [144d:a808]
What it means: you have device IDs like 10de:1e87 (GPU) and 10de:10f8 (its audio function).
Decision: plan to pass through the whole multifunction set (GPU + audio) unless you have a good reason not to.
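To confirm you've caught every function at that slot (01:00 is the example address), filter on the slot without the function suffix:
lspci -nn -s 01:00    # lists 01:00.0, 01:00.1, and any other functions of the same device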
Task 5: Print IOMMU groups (the “can I isolate it?” test)
cr0x@server:~$ for g in /sys/kernel/iommu_groups/*; do echo "Group ${g##*/}"; for d in "$g"/devices/*; do lspci -nns "${d##*/}"; done; echo; done
Group 0
00:00.0 Host bridge [0600]: Intel Corporation Device [8086:3e0f]
Group 7
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU104 [GeForce RTX 2080] [10de:1e87]
01:00.1 Audio device [0403]: NVIDIA Corporation TU104 HD Audio Controller [10de:10f8]
Group 8
02:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller [144d:a808]
What it means: the GPU is in a nice small group with its audio function—good.
Decision: if your target device shares a group with critical host devices (SATA controller, USB controller you need for keyboard, the only NIC),
you either pass the whole group, change slots/BIOS settings, use ACS override with eyes open, or change hardware.
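For a single device you don't need the full loop; sysfs answers directly (0000:01:00.0 is the example address):
readlink /sys/bus/pci/devices/0000:01:00.0/iommu_group            # the trailing number is the group
ls /sys/bus/pci/devices/0000:01:00.0/iommu_group/devices/         # everything that shares it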
Task 6: Identify which driver currently owns the device
cr0x@server:~$ lspci -k -s 01:00.0
01:00.0 VGA compatible controller: NVIDIA Corporation TU104 [GeForce RTX 2080]
Subsystem: Micro-Star International Co., Ltd. [MSI] Device 3751
Kernel driver in use: nvidia
Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia
What it means: the host NVIDIA driver is loaded; VFIO can’t take the card without unbinding/rebinding.
Decision: remove host usage of the GPU (no console, no Xorg), blacklist the driver if needed, and bind to vfio-pci.
Task 7: Make sure VFIO modules are available and loaded
cr0x@server:~$ lsmod | egrep 'vfio|kvm'
vfio_pci 16384 0
vfio_pci_core 94208 1 vfio_pci
vfio_iommu_type1 45056 0
vfio 40960 2 vfio_pci_core,vfio_iommu_type1
kvm_intel 372736 0
kvm 1036288 1 kvm_intel
What it means: VFIO is present.
Decision: if VFIO modules are missing, install the right kernel packages and ensure the Proxmox kernel matches expectations.
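If the modules exist but aren't loaded at boot, list them for early loading; a minimal sketch (the file name is arbitrary, any .conf under modules-load.d works):
printf '%s\n' vfio vfio_iommu_type1 vfio_pci | sudo tee /etc/modules-load.d/vfio.conf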
Task 8: Bind a device to vfio-pci using device IDs
cr0x@server:~$ sudo sh -c 'echo "options vfio-pci ids=10de:1e87,10de:10f8 disable_vga=1" > /etc/modprobe.d/vfio.conf'
cr0x@server:~$ cat /etc/modprobe.d/vfio.conf
options vfio-pci ids=10de:1e87,10de:10f8 disable_vga=1
What it means: on next boot, VFIO will claim the GPU functions early.
Decision: do this when you want passthrough to be stable across reboots. Temporary unbind tricks are fine for experiments, not for production.
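For completeness, the "temporary unbind trick" looks like this. It does not survive a reboot, and the address is the example GPU, so treat it as a lab-only sketch:
echo 0000:01:00.0 | sudo tee /sys/bus/pci/devices/0000:01:00.0/driver/unbind    # detach the current driver
echo vfio-pci     | sudo tee /sys/bus/pci/devices/0000:01:00.0/driver_override  # prefer vfio-pci for this device
echo 0000:01:00.0 | sudo tee /sys/bus/pci/drivers_probe                         # re-probe so vfio-pci claims it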
Task 9: Blacklist conflicting GPU drivers (nouveau is a frequent squatter)
cr0x@server:~$ sudo tee /etc/modprobe.d/blacklist-gpu.conf >/dev/null <<'EOF'
blacklist nouveau
blacklist nvidia
blacklist nvidiafb
blacklist nvidia_drm
blacklist nvidia_modeset
EOF
cr0x@server:~$ cat /etc/modprobe.d/blacklist-gpu.conf
blacklist nouveau
blacklist nvidia
blacklist nvidiafb
blacklist nvidia_drm
blacklist nvidia_modeset
What it means: you’re preventing the host from “helpfully” loading a driver that steals your passthrough device.
Decision: blacklist only what you must. If the host needs a GPU for console, use the iGPU or a cheap secondary card and pass the other one through.
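If you can't blacklist because the host keeps a second card on the same driver, a softer option is a modprobe softdep so vfio-pci loads first and its ids= claim wins; a hedged sketch appended to the same vfio.conf:
echo 'softdep nouveau pre: vfio-pci' | sudo tee -a /etc/modprobe.d/vfio.conf
echo 'softdep nvidia pre: vfio-pci'  | sudo tee -a /etc/modprobe.d/vfio.conf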
Task 10: Update initramfs so early binding/blacklists actually apply
cr0x@server:~$ sudo update-initramfs -u -k all
update-initramfs: Generating /boot/initrd.img-6.8.12-4-pve
What it means: your early boot environment now contains the new module settings.
Decision: if you skip this, you’ll spend a long afternoon staring at the wrong driver loaded “mysteriously” before rootfs mounts.
Task 11: After reboot, confirm vfio owns the device
cr0x@server:~$ lspci -k -s 01:00.0
01:00.0 VGA compatible controller: NVIDIA Corporation TU104 [GeForce RTX 2080]
Subsystem: Micro-Star International Co., Ltd. [MSI] Device 3751
Kernel driver in use: vfio-pci
Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia
What it means: correct ownership. Note that “Kernel modules” can list candidates; “driver in use” is what matters.
Decision: if it’s still not vfio-pci, revisit blacklists, initramfs, and whether the device is used as boot VGA.
Task 12: Confirm Proxmox/QEMU sees the IOMMU device nodes
cr0x@server:~$ ls -l /dev/vfio/
total 0
crw------- 1 root root 240, 0 Dec 26 09:12 0
crw------- 1 root root 240, 7 Dec 26 09:12 7
crw------- 1 root root 240, 255 Dec 26 09:12 vfio
What it means: group devices exist (here group 7 is present).
Decision: if the group node isn’t present, your IOMMU grouping isn’t active or the kernel isn’t exposing VFIO correctly.
Task 13: Check if the device supports reset cleanly (common GPU trap)
cr0x@server:~$ sudo dmesg | egrep -i 'vfio|reset|FLR|D3' | tail -n 20
[ 112.334455] vfio-pci 0000:01:00.0: enabling device (0000 -> 0003)
[ 112.334890] vfio-pci 0000:01:00.0: not capable of FLR, device reset may be unreliable
What it means: no Function Level Reset (FLR). This often shows up as “works once, then fails on VM reboot” because the device never resets.
Decision: plan around it: power-cycle host between VM runs, use vendor reset modules if appropriate, or choose hardware with reliable resets.
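On recent kernels you can also ask which reset methods the kernel is willing to try for the device (the attribute may not exist on older kernels; the address is the example GPU):
cat /sys/bus/pci/devices/0000:01:00.0/reset_method
# Typical answers include "flr", "pm", or "bus"; an empty answer means you're in workaround territory.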
Task 14: Validate VM config for PCIe and OVMF (Proxmox qm config)
cr0x@server:~$ qm config 110
agent: 1
bios: ovmf
boot: order=scsi0;ide2;net0
machine: q35
memory: 16384
name: win11-gpu
ostype: win11
scsi0: local-lvm:vm-110-disk-0,size=200G
hostpci0: 01:00.0,pcie=1,x-vga=1
hostpci1: 01:00.1,pcie=1
efidisk0: local-lvm:vm-110-disk-1,size=1M,efitype=4m,pre-enrolled-keys=1
What it means: this is the sane baseline for many GPUs: q35 + OVMF + pcie=1 and x-vga=1 for primary GPU.
Decision: if you’re using i440fx for a modern GPU and getting weirdness, switch to q35 unless you have a specific reason not to.
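If you prefer to set that baseline from the CLI rather than the web UI, qm set can do it; a sketch assuming VM 110, using the all-functions shorthand instead of listing 01:00.0 and 01:00.1 separately:
qm set 110 --machine q35 --bios ovmf
qm set 110 --hostpci0 01:00,pcie=1,x-vga=1    # omitting the function passes every function at that slot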
Task 15: If VM won’t start, read the real error message in QEMU logs
cr0x@server:~$ tail -n 60 /var/log/pve/tasks/active
UPID:server:00004A2C:0001D2B1:676D5B2A:qmstart:110:root@pam:
cr0x@server:~$ journalctl -u pvedaemon -n 80 --no-pager
Dec 26 09:20:11 server pvedaemon[1324]: start VM 110: UPID:server:00004A2C:0001D2B1:676D5B2A:qmstart:110:root@pam:
Dec 26 09:20:11 server pvedaemon[1324]: VM 110 qmp command failed - unable to open /dev/vfio/7: Device or resource busy
What it means: “busy” is usually ownership or a dangling process.
Decision: find who holds the VFIO group and kill it cleanly (or stop the VM that’s already using it).
Task 16: Find which process is holding the device/group
cr0x@server:~$ sudo lsof /dev/vfio/7
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
qemu-syst 991 root 25u CHR 240,7 0t0 165 /dev/vfio/7
What it means: a QEMU process already holds the group.
Decision: stop the owning VM. If it’s a crashed QEMU, terminate it and investigate why it didn’t release VFIO.
Task 17: Inspect PCIe topology to understand grouping problems
cr0x@server:~$ lspci -t
-[0000:00]-+-00.0
           +-01.0-[01]--+-00.0
           |            \-00.1
           \-02.0-[02]----00.0
What it means: the GPU sits behind bus 01; NVMe behind bus 02. Clean separation often means clean groups.
Decision: if everything sits under one downstream port, expect messy groups. Consider moving cards to different slots (wired to different root ports).
Task 18: If you must, test ACS override and see what groups become (risk-aware)
cr0x@server:~$ sudo sed -i 's/quiet/quiet pcie_acs_override=downstream,multifunction/' /etc/default/grub
cr0x@server:~$ grep -n '^GRUB_CMDLINE_LINUX_DEFAULT' /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet pcie_acs_override=downstream,multifunction intel_iommu=on iommu=pt"
cr0x@server:~$ sudo update-grub
Generating grub configuration file ...
done
What it means: you’re asking the kernel to pretend ACS isolation exists where it may not.
Decision: use this only if you accept the security/isolation compromise. In multi-tenant or compliance environments, avoid it.
Joke #2: ACS override is like removing the “do not enter” tape because it slows you down—until you remember why the tape existed.
Three corporate mini-stories from the trenches
Mini-story 1: The incident caused by a wrong assumption
A team rolled out Proxmox nodes to host GPU-accelerated Windows VMs for a business unit that needed a legacy app with GPU-based rendering.
They tested on one “golden” workstation board and it worked. Procurement then sourced a different motherboard revision because it was “equivalent.”
Equivalent is a dangerous word in infrastructure.
On the new boards, the GPU IOMMU group also contained a USB controller and the onboard audio device. The team assumed:
“We’re only passing through the GPU, so the USB controller shouldn’t matter.” They enabled ACS override to split groups,
patted themselves on the back, and moved to production.
Weeks later, random VMs started hard-freezing during peak usage. Not crashing. Freezing. The host stayed up, but QEMU processes went into states
that required force-kill. The freeze rate rose under higher USB activity: smartcard readers, USB cameras, odd peripherals.
You can probably see where this is going.
The “split” IOMMU groups weren’t real isolation; they were software-imposed boundaries. Under load, devices that shared the same physical path
still interacted in ways VFIO could not fully police. The outcome wasn’t a clean security breach; it was an operational nightmare:
unreliable resets, flaky interrupts, and VM lockups that looked like driver problems.
The fix was unglamorous: they moved to boards with proper ACS isolation on the PCIe root ports and stopped overriding reality.
The postmortem’s key line: the assumption that IOMMU groups are “just Linux grouping” rather than “hardware containment boundaries.”
That sentence probably saved them a year of recurring incidents.
Mini-story 2: The optimization that backfired
Another org wanted to squeeze performance. They read that iommu=pt reduces overhead for host devices, so they added it everywhere.
Then they went further: they tuned kernel parameters, rearranged interrupts, and set aggressive power management defaults in BIOS because
energy efficiency looked good on a slide.
Passthrough worked—mostly. But after maintenance windows, some VMs would boot with the GPU missing, or the NIC passed through to a firewall VM
would fail to link until the VM was restarted. The team treated it as “transient hardware weirdness.”
Those words are how outages get promoted to recurring.
The culprit was a combination of power management (devices dropping into deeper D-states), firmware behavior, and reset capability.
Some devices didn’t tolerate the new defaults; others needed explicit reset handling. The “optimization” wasn’t wrong in general.
It was wrong for their specific mix of devices and uptime patterns.
They rolled back power-saving features for the PCIe fabric, stopped chasing micro-optimizations without measurement, and documented a hard rule:
production passthrough nodes get conservative PCIe power settings and only proven kernel parameters.
Performance improved slightly less than hoped, but reliability improved dramatically—which is what pays your salary in the long run.
Mini-story 3: The boring but correct practice that saved the day
A small platform team ran a fleet of Proxmox nodes with mixed workloads. Some VMs used passthrough NICs; others used GPUs.
Their practice was aggressively boring: every node had a “hardware topology” note updated after any physical change—slot maps, IOMMU groups,
and which VM consumed which group.
One weekend, a node failed and a technician replaced it with a spare chassis, moving cards quickly to restore service.
The VMs came up, but a storage VM that used a passed-through HBA started throwing errors, while another VM that used a NIC passthrough
wouldn’t start at all.
Instead of experimenting in production, they pulled the topology note. It showed that on the spare chassis, the HBA and NIC landed in the same group.
That wasn’t a Linux configuration issue; it was a slot/topology mismatch. They shut down the node, moved one card to the correct slot,
and brought services back without a prolonged outage.
No heroics. No kernel patching. No blind ACS override. Just documentation and disciplined mapping of “this PCIe slot leads to this root port
and this IOMMU group.” The practice was boring enough that nobody wanted to do it—until it saved a weekend and a reputational dent.
Common mistakes: symptom → root cause → fix
1) VM fails to start: “unable to open /dev/vfio/X: Device or resource busy”
Root cause: a process already holds the VFIO group (another VM, stuck QEMU, or leftover binding state).
Fix: identify the holder with lsof /dev/vfio/X, stop the owning VM, and clean up stale QEMU processes. Confirm group ownership is unique.
2) VM starts, but GPU shows black screen
Root cause: wrong VM firmware/machine type (SeaBIOS vs OVMF, i440fx vs q35), missing GPU ROM, or the GPU isn’t set as primary.
Fix: use bios: ovmf, machine: q35, hostpci0: ...,pcie=1,x-vga=1. Ensure you pass through all functions (GPU + audio).
3) GPU works once, then fails after VM reboot
Root cause: reset bug or no FLR support; the device doesn’t reinitialize cleanly.
Fix: prefer hardware with FLR; otherwise plan operationally (host reboot for clean reset) or use a reset workaround module where appropriate.
4) Device won’t bind to vfio-pci; host driver keeps grabbing it
Root cause: driver loads in initramfs before vfio-pci claims the device; blacklists not applied early.
Fix: configure /etc/modprobe.d/vfio.conf + blacklist file, then update-initramfs -u. Reboot and verify lspci -k.
5) IOMMU groups are huge; GPU shares group with SATA/USB/NIC
Root cause: motherboard PCIe topology and missing ACS isolation; sometimes BIOS settings worsen it.
Fix: try different slots, toggle "Above 4G Decoding" only if it demonstrably changes grouping (it often needs to stay enabled), and consider a different board.
Use ACS override only when you accept reduced isolation.
6) Windows VM shows GPU error (e.g., driver won’t load, device disabled)
Root cause: VM configuration mismatch (primary GPU), missing UEFI, or vendor/driver virtualization detection.
Fix: correct q35/OVMF config first; then handle vendor-specific issues only if they persist.
7) Network passthrough NIC drops link randomly
Root cause: power management states, interrupt remapping quirks, or flaky reset behavior on that NIC/slot.
Fix: set conservative PCIe power management in BIOS, update firmware, and validate interrupt remapping logs. Consider moving slots/root ports.
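A quick, hedged probe for the interrupt-remapping part (the exact wording varies by platform and kernel):
dmesg | grep -iE 'remapping'
# Look for lines saying IRQ/interrupt remapping was enabled; silence or "disabled" is worth investigating.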
8) Proxmox host becomes unstable when ACS override is enabled
Root cause: you split groups in software that aren’t isolated in hardware; DMA/interrupt paths still collide.
Fix: remove ACS override, redesign hardware topology, or pass entire groups (and accept losing those devices to the host).
Checklists / step-by-step plan
Step-by-step: bring up passthrough without drama
- BIOS first: enable VT-d (Intel) or AMD-Vi (AMD). Disable CSM if you can, and keep PCIe power management conservative.
- Kernel flags: set intel_iommu=on iommu=pt (or amd_iommu=on iommu=pt) in your bootloader.
- Verify IOMMU init: check dmesg for "IOMMU enabled/Initialized."
- Map your target device: record its PCI address and device IDs via lspci -nn.
- Inspect IOMMU groups: if the group contains anything you can't lose, stop and redesign (slot move or hardware change).
- Bind to vfio-pci: add options vfio-pci ids=... and blacklist competing drivers.
- Update initramfs and reboot: make the early binding real.
- Confirm binding: lspci -k must show Kernel driver in use: vfio-pci.
- Configure the VM intentionally: q35 + OVMF for modern GPUs, include all functions, set pcie=1.
- Validate with logs: if it fails, read the QEMU/daemon logs; don't guess.
- Test reboot cycles: not just “it booted once,” but “it survives guest reboot, shutdown/start, and migration policies.”
- Document topology: record slot → PCI address → IOMMU group → consumer VM. Future-you will thank present-you.
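A minimal sketch of that topology snapshot, assuming nothing beyond standard sysfs paths (the output file name is arbitrary):
for d in /sys/bus/pci/devices/*; do
  addr=${d##*/}
  grp=$(basename "$(readlink "$d/iommu_group" 2>/dev/null)")
  printf '%s  group=%-4s  %s\n' "$addr" "${grp:-none}" "$(lspci -s "$addr" | cut -d' ' -f2-)"
done > /root/pci-topology-$(date +%F).txt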
Operational checklist: before you call it “production-ready”
- Host boots without needing the passthrough GPU for console output (or has a separate console GPU/iGPU).
- All devices in the IOMMU group are either passed through together or safely unused by the host.
- The VM survives 10+ start/stop cycles and at least 3 guest reboots without device loss.
- Host kernel upgrades have a rollback plan (and you’ve tested VFIO after at least one upgrade).
- No ACS override in environments that require strong isolation (multi-tenant, compliance, adversarial threat model).
- You can explain the failure domain in one sentence: “If group 7 breaks, it affects VM 110 and nothing else.”
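For the start/stop cycle item, a crude but honest soak test; VM ID 110 and the sleep values are assumptions, so tune them to how long your guest needs to fully initialize the device:
for i in $(seq 1 10); do
  qm start 110 && sleep 180                     # give the guest time to bring the device up
  qm shutdown 110 --timeout 120 || qm stop 110  # clean shutdown, hard stop as a last resort
  sleep 15
done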
FAQ
1) Why does my GPU have to be passed through with its audio device?
Because it’s typically a multifunction PCI device: GPU function at 01:00.0 and HDMI/DP audio at 01:00.1.
They often share initialization and reset behavior. Pass both unless you have a tested reason not to.
2) Is ACS override “safe”?
“Safe” depends on your threat model. ACS override can make Linux report smaller groups, but it can’t add missing hardware isolation.
For a home lab, it’s often acceptable. For environments needing strict DMA isolation, avoid it and fix the topology with hardware.
3) My IOMMU group includes the SATA controller. What now?
Do not pass that group through unless you're prepared to lose the host's storage access. Try another PCIe slot, check BIOS options,
or move to a motherboard/CPU platform with better root port isolation.
4) Why does Proxmox say the VFIO device is busy when no VM is running?
Often there is a stuck QEMU process, a management task still holding the device, or a partially started VM.
Use lsof /dev/vfio/X to find the process. Then stop the VM cleanly or kill the stuck process and review logs.
5) Should I use q35 or i440fx for passthrough?
For modern GPUs and PCIe devices, use q35. It models PCIe more naturally and avoids some legacy quirks.
Use i440fx only if you have a legacy guest dependency or a specific compatibility reason.
6) Do I need OVMF (UEFI) for GPU passthrough?
Not always, but often. Many modern GPUs and Windows 11 setups behave better with OVMF.
If you see black screens or initialization weirdness, switching to OVMF is a high-value change.
7) What does “not capable of FLR” imply?
FLR is a standardized reset mechanism. Without it, a device might not reset cleanly between VM runs.
Symptoms include “works once” and “fails after reboot.” The operational fix can be as blunt as host reboot between runs, or as real as new hardware.
8) Can I pass devices through to containers (LXC) instead of VMs?
Some device access can be granted to containers, but true PCI passthrough is a VM story because you need hardware isolation and VFIO behavior.
If you need strong separation and full device control, use a VM.
9) Why do IOMMU groups change after a BIOS update?
BIOS updates can change PCIe routing, ACS exposure, and how bridges are enumerated. That can reshape groups.
Treat BIOS updates as a change that must be validated for passthrough nodes, not “routine patching.”
10) Can I pass through a USB controller instead of individual USB devices?
Yes, and it’s often more reliable for USB-heavy workloads (dongles, cameras, smartcard readers).
But check its IOMMU group carefully; USB controllers often share groups with other chipset devices.
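A quick way to see which USB controllers you have and which group each sits in (addresses will differ per board):
for d in $(lspci -Dnn | awk '/USB controller/ {print $1}'); do
  echo "$d -> group $(basename "$(readlink /sys/bus/pci/devices/$d/iommu_group)")"
done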
Conclusion: next steps that actually move the needle
When PCI passthrough fails on Proxmox, the fastest path out is to stop treating it as magic and start treating it as topology plus ownership plus firmware.
Check IOMMU initialization, read the groups, bind devices to VFIO early, and configure VMs intentionally (q35 + OVMF is the default stance for modern GPUs).
Practical next steps:
- Run the IOMMU group dump and save it alongside your node documentation. Treat it like a wiring diagram.
- Verify binding with lspci -k after every kernel update and after any BIOS change.
- If you're relying on ACS override in anything resembling production, plan a hardware/topology migration. It's technical debt with a PCIe edge.
- Test reboot cycles, not just “first boot.” Most failures are reset- and lifecycle-related.
Passthrough isn’t hard because Linux is complicated. It’s hard because hardware is complicated—and it’s very confident about it.
Align with the hardware, and Proxmox will look boring. Boring is the goal.