Proxmox VFIO “device is in use”: detaching PCI devices from the host the right way

If you’ve ever tried PCI passthrough on Proxmox and hit “device is in use”, you’ve met the Linux kernel doing its job: protecting a device that’s still owned by something else. The problem is that “something else” is often invisible—an early boot framebuffer, a polite little audio function on the GPU, a storage driver, a udev rule, or Proxmox itself trying to help.

Production reality: you don’t want to “just reboot and hope.” You want deterministic control over who owns the PCI function, when it’s claimed, and how to prove it’s detached before you hand it to a VM. This is how to do that without superstition.

A working mental model: what “device is in use” really means

When Proxmox (QEMU) tells you a PCI device is “in use,” it’s not being poetic. It’s reporting that something on the host currently holds the device open, owns the driver binding, or otherwise prevents QEMU from safely taking control through VFIO.

On Linux, a PCI function is “owned” primarily through driver binding. If nouveau, amdgpu, nvidia, xhci_hcd, ixgbe, megaraid_sas, etc. are bound, the kernel thinks, correctly, that the device is part of the host. VFIO needs that device bound to vfio-pci (or sometimes vfio-pci plus device-specific reset quirks) so QEMU can map it into the guest.

But binding is only the start. You can unbind a device and still have a process poking at it via sysfs, a framebuffer console attached to it, or an audio function claimed by PulseAudio that you forgot existed. And Proxmox adds one more twist: it’s a hypervisor that tries to be helpful. If the host thinks a GPU is available, something might grab it during boot (framebuffer, DRM), and then you’re negotiating with a moving target.

So the goal is simple and strict:

  • The device, along with any functions that must travel with it, sits in an isolated IOMMU group, or you knowingly accept the security trade-off of ACS override.
  • The host never binds a non-VFIO driver to it during boot.
  • The device is reset-capable enough to survive VM reboots and stops.
  • QEMU gets exclusive access through VFIO, every time, without manual “unbind roulette.”

Dry truth: VFIO passthrough is less “turnkey feature” and more “contract you enforce.”

Fast diagnosis playbook (first/second/third)

This is the order that finds the bottleneck quickly in production, where you don’t have time to admire kernel logs.

First: prove who owns the device right now

  • Find the PCI address (bus:slot.func) and see which driver is bound.
  • Confirm whether VFIO modules are loaded.
  • Check if any other function in the same multi-function device is still bound to a host driver.
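
A minimal sketch of that first check, assuming the device is the GPU at 0000:01:00.x (substitute your own address):

# Which driver currently owns each function of the device?
for f in /sys/bus/pci/devices/0000:01:00.*; do
  drv=none
  [ -e "$f/driver" ] && drv=$(basename "$(readlink "$f/driver")")
  echo "$(basename "$f") -> $drv"
done

# Are the VFIO modules even loaded?
lsmod | grep -E '^vfio'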

Second: check IOMMU grouping and whether you’re fighting the platform

  • If the device shares an IOMMU group with something the host needs (like the SATA controller running your root filesystem), stop and redesign.
  • If you used ACS override, treat it like a security exception, not a victory lap.

Third: look for early boot claimers and “invisible” users

  • Framebuffer/DRM (common with GPUs).
  • udev autoloading drivers based on modalias.
  • systemd services (display manager, persistence daemons).
  • Leftover kernel modules in initramfs.
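
For the udev angle specifically, you can ask modprobe which modules match the device’s modalias; a quick sketch, assuming the same 0000:01:00.0 address:

# Which modules would the kernel auto-load for this device, based on its modalias?
modprobe --resolve-alias "$(cat /sys/bus/pci/devices/0000:01:00.0/modalias)"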

If you do those three in order, “device is in use” becomes a solvable ticket instead of a vibe.

Interesting facts and historical context (why VFIO is weird)

  1. VFIO wasn’t the first attempt. Before VFIO became the standard, pci-stub was commonly used to “park” devices away from host drivers.
  2. IOMMU isn’t just for virtualization. It exists for DMA isolation, protecting memory from devices that can bus-master—useful for security and reliability even without VMs.
  3. PCIe devices can DMA without asking. That’s why a device in the wrong IOMMU group is not merely inconvenient; it’s a boundary problem.
  4. ACS (Access Control Services) is a hardware feature, not a kernel mood. When platforms don’t expose ACS properly, Linux can’t always split groups cleanly.
  5. Multi-function devices are a classic trap. GPUs often show up as VGA + HDMI audio (and sometimes USB-C controllers). Passing only one function is asking for conflicts.
  6. Reset support is uneven. Some consumer GPUs are notorious for not resetting cleanly between guest boots without additional tricks.
  7. Early boot graphics is a thing. The Linux framebuffer and DRM KMS start early. If they grab the GPU, you’re already late unless you pre-bind to VFIO in initramfs.
  8. “Device is in use” is often a kernel-level lock, not a file lock. Tools like lsof can be useless because the “user” is the kernel driver itself.
  9. Virt stacks evolved around QEMU’s needs. VFIO’s design aligns with QEMU/KVM’s model: user-space gets safe, mediated access while the kernel enforces DMA isolation.

One maxim that has stuck around SRE circles applies here: hope is not a strategy. Treat it as an operating principle, not decoration.

Practical tasks: commands, outputs, and decisions (12+)

These are the moves that actually answer “who is using it?” and “what do I do next?” Each task includes: command, what the output means, and the decision you make.

Task 1: Identify the device and the current driver binding

cr0x@server:~$ lspci -nnk -d 10de:2684
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:2684] (rev a1)
	Subsystem: ASUSTeK Computer Inc. Device [1043:88aa]
	Kernel driver in use: nouveau
	Kernel modules: nouveau, nvidia_drm, nvidia

Meaning: The GPU is bound to nouveau. VFIO can’t take it while a graphics driver owns it.

Decision: You must prevent the host from binding nouveau/nvidia and bind vfio-pci instead (ideally in initramfs).

Task 2: Confirm VFIO modules are loaded

cr0x@server:~$ lsmod | egrep 'vfio|kvm'
vfio_pci               16384  0
vfio_pci_core          73728  1 vfio_pci
vfio_iommu_type1       40960  0
vfio                   45056  2 vfio_pci_core,vfio_iommu_type1
kvm_intel             397312  0
kvm                  1036288  1 kvm_intel

Meaning: VFIO and KVM are present. This is necessary but not sufficient.

Decision: If VFIO modules are missing, fix that first. Otherwise proceed to binding/IOMMU issues.

Task 3: Verify IOMMU is enabled at the kernel level

cr0x@server:~$ dmesg | egrep -i 'iommu|dmar|remapping' | head -n 8
[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-6.8.12-4-pve root=ZFS=rpool/ROOT/pve-1 ro quiet intel_iommu=on iommu=pt
[    0.412345] DMAR: IOMMU enabled
[    0.412901] DMAR: Intel(R) Virtualization Technology for Directed I/O
[    0.413112] DMAR: Interrupt remapping enabled
[    0.419876] pci 0000:00:00.0: DMAR: Skip IOMMU disabling for graphics

Meaning: IOMMU is active and interrupt remapping is enabled (good for stability and security).

Decision: If you don’t see IOMMU enabled, stop and fix BIOS and kernel parameters before touching anything else.

Task 4: Check the IOMMU group membership

cr0x@server:~$ for g in /sys/kernel/iommu_groups/*; do echo "Group $(basename "$g")"; ls -1 "$g/devices"; done | sed -n '1,24p'
Group 0
0000:00:00.0
0000:00:01.0
Group 10
0000:01:00.0
0000:01:00.1
Group 11
0000:02:00.0

Meaning: The GPU (01:00.0) and its audio function (01:00.1) share a group. That’s fine; you typically pass both.

Decision: If your target device shares a group with something critical (like the boot NVMe controller), do not proceed with passthrough on that platform layout.

Task 5: See what’s bound via sysfs (authoritative)

cr0x@server:~$ readlink -f /sys/bus/pci/devices/0000:01:00.0/driver
/sys/bus/pci/drivers/nouveau

Meaning: Kernel binding is nouveau, regardless of what you think your config says.

Decision: You need to stop the host from binding it on boot (blacklist + initramfs + vfio-pci ids).

Task 6: Find whether a device is “busy” because it’s part of the console/DRM stack (GPUs)

cr0x@server:~$ dmesg | egrep -i 'drm|framebuffer|fb0|efi fb|simpledrm' | head -n 12
[    0.623456] simpledrm: initialized
[    1.912345] nouveau 0000:01:00.0: DRM: VRAM: 8192 MiB
[    1.934567] fb0: switching to nouveaufb from simple

Meaning: The GPU became a framebuffer device (fb0). That’s a classic “in use” anchor.

Decision: Ensure the host never uses it for console graphics: bind to vfio-pci in initramfs; often disable framebuffer handoff where appropriate.
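
One commonly used way to keep the boot framebuffer off a passthrough GPU is via kernel parameters. Treat the exact options below as assumptions to validate against your kernel version; kernels that use simpledrm behave differently from older efifb-based setups, and early vfio-pci binding alone is often enough:

# /etc/default/grub (example only; verify on your kernel, then run update-grub and reboot)
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt video=efifb:off video=vesafb:off"
# On simpledrm-era kernels some setups additionally use: initcall_blacklist=sysfb_init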

Task 7: Check if a host service is keeping GPU drivers loaded

cr0x@server:~$ systemctl status nvidia-persistenced --no-pager
● nvidia-persistenced.service - NVIDIA Persistence Daemon
     Loaded: loaded (/lib/systemd/system/nvidia-persistenced.service; enabled)
     Active: active (running) since Fri 2025-12-26 08:11:12 UTC; 3h 14min ago
   Main PID: 2211 (nvidia-persiste)
      Tasks: 1
     Memory: 1.8M
        CPU: 2.013s

Meaning: The host is actively managing the NVIDIA stack, which will fight your passthrough.

Decision: Disable it (and unload drivers) if that GPU is meant for a VM.

Task 8: Spot kernel modules that will re-claim the device after you unbind

cr0x@server:~$ modprobe --showconfig | egrep -n 'blacklist (nouveau|amdgpu|nvidia|radeon)' | head
412:blacklist nouveau

Meaning: Blacklisting exists for nouveau, but that doesn’t guarantee it’s in your initramfs or that alternatives won’t bind.

Decision: Ensure the blacklist is present and then rebuild initramfs so early boot respects it.

Task 9: Check initramfs contents for the wrong drivers (the “it still loads!” mystery)

cr0x@server:~$ lsinitramfs /boot/initrd.img-6.8.12-4-pve | egrep 'nouveau|amdgpu|nvidia' | head
usr/lib/modules/6.8.12-4-pve/kernel/drivers/gpu/drm/nouveau/nouveau.ko
usr/lib/modules/6.8.12-4-pve/kernel/drivers/gpu/drm/drm.ko

Meaning: The module is inside initramfs, so it can load before your root filesystem configs are applied.

Decision: Remove it from initramfs via proper configuration (blacklist in initramfs context) and regenerate initramfs.

Task 10: Bind a device to vfio-pci using driver_override (surgical, runtime)

cr0x@server:~$ echo vfio-pci | sudo tee /sys/bus/pci/devices/0000:01:00.0/driver_override
vfio-pci
cr0x@server:~$ echo 0000:01:00.0 | sudo tee /sys/bus/pci/drivers/nouveau/unbind
0000:01:00.0
cr0x@server:~$ echo 0000:01:00.0 | sudo tee /sys/bus/pci/drivers/vfio-pci/bind
0000:01:00.0

Meaning: You forcibly re-bound the device without rebooting. This is useful for testing and emergency recovery.

Decision: If this “works,” don’t stop here—make it persistent at boot. Runtime binds are a great way to forget what you changed.

Task 11: Validate the binding is now VFIO

cr0x@server:~$ lspci -nnk -s 01:00.0
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:2684] (rev a1)
	Subsystem: ASUSTeK Computer Inc. Device [1043:88aa]
	Kernel driver in use: vfio-pci
	Kernel modules: nouveau, nvidia_drm, nvidia

Meaning: vfio-pci owns it. Kernel modules listed are “available,” not necessarily loaded.

Decision: Proceed to ensure the rest of the functions (01:00.1, etc.) are also bound to VFIO, then start the VM.

Task 12: Check the audio function (because GPUs are sneaky)

cr0x@server:~$ lspci -nnk -s 01:00.1
01:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:22ba] (rev a1)
	Subsystem: ASUSTeK Computer Inc. Device [1043:88aa]
	Kernel driver in use: snd_hda_intel
	Kernel modules: snd_hda_intel

Meaning: The GPU’s audio function is still bound to the host. QEMU can choke on this, and you’ll get “in use” or guest instability.

Decision: Bind 01:00.1 to VFIO too, or explicitly pass it as well if your VM config expects it. Treat GPU functions as a set.

Task 13: Identify “holders” via the driver’s perspective (who depends on it)

cr0x@server:~$ sudo lsof /dev/nvidia0 2>/dev/null | head
COMMAND   PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
Xorg     1880 root   15u   CHR 195,0      0t0  331 /dev/nvidia0

Meaning: If you’re using the proprietary NVIDIA stack, processes can hold device nodes open.

Decision: Stop the service (display manager, Xorg, persistence daemon) before unbinding. If you’re headless, don’t install desktop stacks on a passthrough host unless you have a reason.

Task 14: Watch what QEMU/Proxmox complains about when start fails

cr0x@server:~$ journalctl -u pvedaemon -u pveproxy -u pvestatd -u pve-firewall -u pve-ha-lrm -u pve-ha-crm --since "10 minutes ago" --no-pager | tail -n 12
Dec 26 11:20:41 server pvedaemon[1552]: start VM 120: UPID:server:00003A1B:0001D2A1:676D...:qmstart:120:root@pam:
Dec 26 11:20:42 server pvedaemon[1552]: VM 120 qmp command failed - unable to open /dev/vfio/10: Device or resource busy
Dec 26 11:20:42 server pvedaemon[1552]: start failed: QEMU exited with code 1

Meaning: This points at an IOMMU group device node (/dev/vfio/10) being busy. Something else has that group open.

Decision: Look for another VM or process using the same IOMMU group; confirm no stale QEMU process exists; verify all functions in the group are passed consistently.
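
A minimal sketch for finding who actually holds the group node (the group number 10 comes from the log above; adjust it to match your error):

# Which processes have the VFIO group node open right now?
fuser -v /dev/vfio/10

# Any QEMU/KVM processes still alive on this host? (Proxmox launches VMs as /usr/bin/kvm)
ps -eo pid,etime,cmd | grep -E '[q]emu-system|/usr/bin/[k]vm'

# Brute-force: which PIDs hold a file descriptor on that group node?
for p in /proc/[0-9]*; do
  ls -l "$p/fd" 2>/dev/null | grep -q '/dev/vfio/10' && echo "held by PID ${p#/proc/}"
done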

Joke #1: VFIO is like office hot-desking: if someone left their mug on the desk, you technically can’t sit there.

Detaching devices from the host: the right way, not the lucky way

Step 0: Decide whether the host should ever use the device

There are two legitimate operating modes:

  • Dedicated passthrough device: The host must never use it. Bind to VFIO at boot. This is the sane mode for GPUs, USB controllers, and NICs intended for guests.
  • Occasional passthrough device: The host sometimes uses it, sometimes passes it. This is fragile and invites “device is in use” because you’re doing runtime unbind/rebind. Use it only when hardware is scarce and you enjoy living dangerously.

Step 1: Confirm your platform supports clean isolation

IOMMU group isolation isn’t a “nice to have.” It’s the boundary that prevents a VM from DMA-writing into memory owned by other devices. In a strict environment, you pass through whole groups. In a practical environment, you at least understand what you’re trading away.

If your device shares an IOMMU group with a host-critical device, your options are:

  • Move the card to another slot (different root port, different grouping).
  • Change platform settings (sometimes toggling “Above 4G decoding” or “Resizable BAR” changes groupings indirectly; sometimes it doesn’t).
  • Use a different motherboard/CPU generation that exposes ACS properly.
  • Use ACS override patch/module parameter (last resort; security and stability caveats).

Step 2: Make VFIO binding persistent (the part people half-do)

On Proxmox, the clean approach is: set kernel parameters for IOMMU, load VFIO modules, and configure vfio-pci to claim the device IDs early.

2a) Ensure IOMMU kernel parameters exist (Intel example):

cr0x@server:~$ cat /etc/default/grub | egrep 'GRUB_CMDLINE_LINUX_DEFAULT'
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"

Meaning: The kernel command line includes IOMMU enablement and pass-through mode for performance (iommu=pt) while still allowing VFIO isolation.

Decision: If missing, add the right parameters, then run update-grub and reboot.

cr0x@server:~$ sudo update-grub
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-6.8.12-4-pve
Found initrd image: /boot/initrd.img-6.8.12-4-pve
done

2b) Load VFIO modules at boot:

cr0x@server:~$ cat /etc/modules | egrep '^vfio'
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd

Meaning: The core VFIO modules will be available early. (On recent Proxmox kernels, vfio_virqfd has been folded into the core vfio module, so a boot-time warning that it can’t be found is harmless.)

Decision: If not present, add them and rebuild initramfs so early boot has them.

2c) Bind by vendor:device ID with vfio-pci

Create a modprobe config that tells vfio-pci to claim the device IDs. Example for a GPU and its audio function:

cr0x@server:~$ cat /etc/modprobe.d/vfio.conf
options vfio-pci ids=10de:2684,10de:22ba disable_vga=1

Meaning: On load, vfio-pci will bind to those IDs. disable_vga=1 helps avoid VGA arbitration conflicts on some setups.

Decision: If you have multiple identical GPUs, consider using driver_override per PCI address instead, to avoid capturing the wrong card.
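
A sketch of the address-based approach, loosely following the common driver_override helper pattern; the script path, file names, and PCI addresses are assumptions, and on Debian-based Proxmox the script also has to be copied into the initramfs (for example via an initramfs-tools hook) so it runs before the GPU driver can bind:

# /usr/local/bin/vfio-pci-override.sh  (make it executable)
#!/bin/sh
# Bind only the card at this address to vfio-pci, leaving its identical twin for the host.
for dev in 0000:01:00.0 0000:01:00.1; do
    echo vfio-pci > "/sys/bus/pci/devices/$dev/driver_override"
done
exec modprobe -i vfio-pci "$@"

# /etc/modprobe.d/vfio.conf  (replaces the ids= line in this scenario)
install vfio-pci /usr/local/bin/vfio-pci-override.sh

After wiring this up, rebuild the initramfs and re-verify bindings on the next boot, exactly as in step 2e.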

2d) Blacklist conflicting host drivers (and do it for initramfs)

cr0x@server:~$ cat /etc/modprobe.d/blacklist-gpu.conf
blacklist nouveau
blacklist nvidia
blacklist nvidiafb
blacklist rivafb

Meaning: These modules should not auto-load.

Decision: If you still see them in lsmod after reboot, they are being pulled in via initramfs or dependencies; fix initramfs next.

2e) Rebuild initramfs (where most “I blacklisted it” stories go to die)

cr0x@server:~$ sudo update-initramfs -u -k all
update-initramfs: Generating /boot/initrd.img-6.8.12-4-pve

Meaning: The new initramfs includes your VFIO and blacklist configuration.

Decision: Reboot and re-check driver bindings. If it’s still wrong, check for other modules like simpledrm interactions and console settings.
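
A quick verification sketch for after the reboot (the address is the example GPU; adjust as needed):

# Is vfio-pci the active driver, and is the old module really out of the new initramfs?
lspci -nnk -s 01:00.0 | grep 'in use'
lsinitramfs /boot/initrd.img-$(uname -r) | grep -E 'nouveau|nvidia' || echo "not present in initramfs"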

Step 3: Detach cleanly at runtime (when you must)

Sometimes you’re already in the middle of an incident and a reboot is expensive. Runtime detach is possible, but it’s a controlled procedure, not random echoing into sysfs until it stops yelling.

Order of operations for runtime detach:

  1. Stop processes/services using the device (display manager, persistence daemons, storage services).
  2. Unbind all PCI functions that will be passed through.
  3. Set driver_override to vfio-pci to prevent immediate rebind to the old driver.
  4. Bind to vfio-pci.
  5. Start the VM.

Example for GPU VGA function + audio function:

cr0x@server:~$ sudo systemctl stop nvidia-persistenced
cr0x@server:~$ echo vfio-pci | sudo tee /sys/bus/pci/devices/0000:01:00.0/driver_override
vfio-pci
cr0x@server:~$ echo vfio-pci | sudo tee /sys/bus/pci/devices/0000:01:00.1/driver_override
vfio-pci
cr0x@server:~$ echo 0000:01:00.0 | sudo tee /sys/bus/pci/drivers/nouveau/unbind
0000:01:00.0
cr0x@server:~$ echo 0000:01:00.1 | sudo tee /sys/bus/pci/drivers/snd_hda_intel/unbind
0000:01:00.1
cr0x@server:~$ echo 0000:01:00.0 | sudo tee /sys/bus/pci/drivers/vfio-pci/bind
0000:01:00.0
cr0x@server:~$ echo 0000:01:00.1 | sudo tee /sys/bus/pci/drivers/vfio-pci/bind
0000:01:00.1

What can go wrong: the old driver rebinds immediately; a framebuffer holds on; the device doesn’t reset; or you detach a NIC that carries your SSH session (yes, people do this).

Joke #2: Unbinding the management NIC over SSH is a great way to discover how much you trust your out-of-band console.

Step 4: Confirm Proxmox is passing the whole story, not half of it

In Proxmox VM configs, you typically use hostpciX entries. The number of problems caused by passing only the VGA function and forgetting audio, USB-C, or a bridge function is impressive in a depressing way.

Check the VM config:

cr0x@server:~$ sudo cat /etc/pve/qemu-server/120.conf
agent: 1
bios: ovmf
machine: q35
memory: 16384
name: win11-gpu
ostype: win11
scsihw: virtio-scsi-single
hostpci0: 0000:01:00.0,pcie=1,x-vga=1
hostpci1: 0000:01:00.1,pcie=1

Meaning: Both functions are passed; OVMF + Q35 is typically the right baseline for modern GPU passthrough.

Decision: If you only pass 01:00.0, fix it. If the IOMMU group includes more functions (like 01:00.2), pass those too or reconsider the device.

GPU passthrough failure modes (the usual suspects)

1) The host console is using the GPU (DRM/KMS)

This is the most common root cause behind “in use.” If the GPU is providing console output, the kernel’s DRM driver is not letting go politely. You can sometimes unbind, but you’ll also sometimes get a black console, hung driver, or half-reset card.

Production advice: use a cheap secondary GPU (or onboard graphics) for the host, and dedicate the passthrough GPU to VFIO from the first millisecond of boot.

2) You passed the VGA function but not the audio function

Many GPUs are two devices glued together: VGA and audio. They live in the same IOMMU group because they share resources. If you pass one and not the other, the host keeps one piece and your VM gets the other. That’s not sharing; it’s a custody battle.

3) Reset problems: the GPU doesn’t come back after VM stop/reboot

Some devices don’t implement a proper Function Level Reset (FLR), or the platform doesn’t route resets cleanly. Symptoms:

  • First VM start works, second start fails with “device is in use” or the guest sees Code 43 / device error.
  • The host logs show the device stuck, or VFIO can’t reinitialize it.

What you do:

  • Ensure you’re passing all functions.
  • Check whether the device supports reset; some devices require vendor-specific quirks.
  • In worst cases, only a host reboot truly resets it. Plan capacity and change windows accordingly.
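
Two quick ways to see what reset support the kernel believes the device has; the reset_method attribute only exists on reasonably recent kernels, and the address is the example GPU:

# Does the function advertise Function Level Reset in its PCIe capabilities?
sudo lspci -vv -s 01:00.0 | grep -i flreset

# Which reset methods the kernel is willing to use for this device (if the attribute exists)
cat /sys/bus/pci/devices/0000:01:00.0/reset_method 2>/dev/null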

4) The wrong GPU got captured by vfio-pci

Binding by device ID (ids=) is easy. It’s also blunt. If you have two identical GPUs, the host might bind both to VFIO and you lose console output—or you bind the wrong one and wonder why the VM can’t see the card you installed “yesterday.”

Preferred approach in multi-GPU systems: bind by PCI address using driver_override logic in early boot scripts, or keep the device IDs unique by choosing a different class of adapter for the host.

HBAs, USB controllers, NICs: different device classes, different traps

Passing through an HBA (SAS/SATA controller)

HBAs are usually excellent passthrough candidates because they behave like honest PCI devices and don’t need graphics console nonsense. The trap is simpler: you accidentally try to pass through the controller that provides the host’s storage. Linux will object, and if you force it, the host will stop being a host.

Task: confirm which block devices are on the controller you’re about to yank.

cr0x@server:~$ lsblk -o NAME,MODEL,SERIAL,HCTL,TYPE,SIZE | head -n 15
NAME   MODEL           SERIAL     HCTL        TYPE  SIZE
sda    ST12000NM0007   ZHZ0AAAA   1:0:0:0     disk 10.9T
sdb    ST12000NM0007   ZHZ0BBBB   1:0:1:0     disk 10.9T
nvme0n1 Samsung SSD    S6E...     -           disk  1.8T
zd0    -               -          -           disk   50G

Meaning: The HCTL values suggest disks on a SCSI host (likely an HBA). If that HBA is your ZFS pool, it’s not a passthrough candidate unless the host will no longer manage that pool.

Decision: Only pass through an HBA that is not hosting Proxmox’s root or critical storage, or accept you’re building a storage VM architecture and design accordingly.
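
A quick way to trace a block device back to the PCI function it hangs off (sda is just an example name):

# The sysfs path of a disk includes the PCI address of its controller
readlink -f /sys/block/sda
# Look for the last 0000:xx:yy.z segment in the output; that is the controller you would
# be passing through, so it had better not be the one holding rpool.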

Passing through a USB controller

USB controllers are popular because they provide “real” USB behavior (dongles, VR headsets, UPS units). The failure mode is IOMMU grouping: your USB controller shares a group with other chipset devices you need. Or you pass through the controller and lose the keyboard that you needed to fix it. Comedy is optional.

Task: identify the controller and group.

cr0x@server:~$ lspci -nn | grep -i usb
00:14.0 USB controller [0c03]: Intel Corporation Device [8086:7ae0] (rev 11)
cr0x@server:~$ readlink /sys/bus/pci/devices/0000:00:14.0/iommu_group
../../../../kernel/iommu_groups/2

Meaning: Group 2 contains the USB controller. Now you must verify what else is in group 2.

Decision: If group 2 includes other chipset essentials, don’t pass it. Add a dedicated USB controller card instead.
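
The verification is a one-liner; group 2 comes from the readlink output above:

# Show everything that shares IOMMU group 2 with the USB controller
for d in /sys/kernel/iommu_groups/2/devices/*; do lspci -nns "$(basename "$d")"; done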

Passing through a NIC

NIC passthrough is great for specialized workloads (firewalls, DPDK, low-latency). It’s also how you saw off the branch you’re sitting on if you pass the management interface. Always have out-of-band access (IPMI/iKVM) before doing NIC passthrough experiments.

Task: map NIC PCI address to Linux interface name, and confirm it’s not your management path.

cr0x@server:~$ sudo lshw -class network -businfo | head -n 12
Bus info          Device      Class          Description
pci@0000:03:00.0  eno1        network        Ethernet controller
pci@0000:04:00.0  enp4s0      network        Ethernet controller

Meaning: You now know which PCI function corresponds to which interface.

Decision: Do not pass through the interface providing your current SSH session unless you have a tested OOB path and a rollback plan.
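
A short pre-flight sketch before any NIC passthrough; vmbr0 and eno1 are assumptions, so substitute your own bridge and interface names:

# Which path does the default route (and therefore probably your SSH session) use?
ip route show default

# Which physical ports are enslaved to the management bridge?
ip -br link show master vmbr0

# Which PCI function backs a given interface name?
readlink -f /sys/class/net/eno1/device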

Three corporate mini-stories (how this breaks in real life)

Incident: the wrong assumption about “blacklisting”

A mid-sized company migrated a few compute-heavy workloads into a Proxmox cluster. One node had a spare GPU intended for a Windows VM that ran a CAD license dongle and some GPU-accelerated rendering. The engineer did what everyone does at first: blacklisted nouveau, set the vfio-pci IDs, rebooted, and celebrated when lspci showed VFIO.

Two weeks later, after a kernel update, the VM stopped starting. Proxmox reported the GPU was “in use.” The engineer assumed the config “must have reverted” and re-applied it. Still broken. They rolled back the kernel. It worked. They blamed the new kernel and opened a ticket.

The real issue was boring: the updated initramfs contained the GPU driver again and loaded it early. The blacklist file existed, but it wasn’t applied in the initramfs context due to how it was generated. The GPU got claimed before Proxmox ever had a chance. The reboot had been hiding the problem until the update changed the boot timing.

The fix wasn’t heroic. They standardized on: confirm module presence in initramfs, regenerate initramfs after changes, and verify bindings after every kernel update. No more “it worked last month” arguments. The VM became boring again, which is the correct state for infrastructure.

Optimization that backfired: runtime rebind to avoid reboots

A different org had a single GPU in a node that sometimes served a host-side monitoring dashboard (local console graphics) and sometimes got passed into a VM for short-lived ML experiments. They didn’t want reboots because the node also hosted other VMs. So they built a script: stop display manager, unbind GPU, bind to VFIO, start VM; later reverse it.

It worked in testing. Then it hit real usage patterns: the VM would crash, restart quickly, and the GPU wouldn’t reset cleanly. The script dutifully “rebound” the device, but the hardware was in a bad state. Sometimes the host would hang when reloading the DRM driver. Sometimes the VM would start but see a broken GPU. Sometimes Proxmox would refuse with “device is in use” because a stale QEMU process held the group node open for a few seconds longer than expected.

The postmortem lesson was painful and predictable: runtime rebinding is a reliability tax. They bought a cheap low-power GPU for the host console and dedicated the better GPU to passthrough only. Suddenly there was no script, no dance, and no mystery. The “optimization” had been avoiding hardware spend by spending engineering time and incident budget instead.

Boring but correct practice that saved the day: group-level validation before changes

A financial services team ran Proxmox hosts with strict change control. They had a checklist item that annoyed everyone: before adding or modifying any hostpci assignment, they recorded the full IOMMU group membership and confirmed no group contents had changed after firmware updates.

One quarter, a routine BIOS update changed PCIe bifurcation behavior on a subset of nodes. The GPU’s IOMMU group started including an upstream bridge and a USB controller that used to be separate. On the nodes that got updated first, passthrough started failing with “device is in use” and, worse, intermittent USB issues on the host.

Because the team had before/after group snapshots as part of the change ticket, diagnosis took minutes. They didn’t waste hours unbinding drivers and blaming Proxmox. They rolled the BIOS back on affected nodes, scheduled a hardware layout change, and prevented a wider outage. Everyone hated the checklist until it paid rent, which is how reliable operations usually works.

Common mistakes: symptom → root cause → fix

1) Symptom: “device is in use” right at VM start, every time

Root cause: Host driver bound (GPU driver, USB driver, NIC driver) instead of vfio-pci.

Fix: Verify with lspci -nnk and readlink /sys/bus/pci/devices/.../driver. Configure vfio-pci ids=, blacklist conflicting drivers, rebuild initramfs, reboot.

2) Symptom: it works after a manual unbind, but fails after reboot

Root cause: Your changes aren’t applied early enough; initramfs loads the driver before rootfs config is active.

Fix: Check lsinitramfs for the offending module, update initramfs, confirm VFIO modules are included early.

3) Symptom: GPU passed through, but guest has no output or driver errors

Root cause: Missing companion functions (audio, USB-C), wrong firmware (SeaBIOS vs OVMF), or x-vga mismatch.

Fix: Pass all functions in the IOMMU group; use OVMF + Q35 for modern GPUs; add x-vga=1 when appropriate.

4) Symptom: first VM start works, second start fails without host reboot

Root cause: Device reset issues (no FLR, buggy platform reset routing), or stale group ownership by a lingering QEMU process.

Fix: Confirm no leftover QEMU process; check logs; consider a full host reboot as the only reliable reset. Prefer server-grade GPUs or known-reset-friendly devices when uptime matters.

5) Symptom: “/dev/vfio/X: Device or resource busy”

Root cause: Another process/VM holds the IOMMU group node open, often because you passed a different function from the same group elsewhere.

Fix: Ensure you are not splitting a group across VMs; stop the other VM; re-check group membership.

6) Symptom: network drop right after binding NIC to VFIO

Root cause: You passed through the management NIC, or systemd-networkd/ifupdown lost the interface.

Fix: Use OOB access, revert binding, redesign with a dedicated passthrough NIC and a separate management NIC or bond.

7) Symptom: host freezes when you unbind GPU driver

Root cause: The GPU is still the active console framebuffer or used by a compositor/display manager.

Fix: Don’t do runtime GPU detach on a host that uses that GPU for console. Use a different GPU for host graphics or go headless.

Checklists / step-by-step plan

Checklist A: Dedicated passthrough device (recommended)

  1. Pick the device and list all functions (lspci -nn for 01:00.0, 01:00.1, etc.).
  2. Check IOMMU groups and confirm group isolation is acceptable.
  3. Enable IOMMU in BIOS and kernel parameters (intel_iommu=on or amd_iommu=on).
  4. Load VFIO modules early via /etc/modules.
  5. Bind device IDs to vfio-pci using /etc/modprobe.d/vfio.conf.
  6. Blacklist conflicting host drivers that would claim the device.
  7. Rebuild initramfs and reboot.
  8. Verify binding after boot using lspci -nnk and sysfs driver path.
  9. Configure VM to pass all required functions; use OVMF/Q35 for GPUs.
  10. Test stop/start loops (at least 5 cycles) to catch reset issues before users do.
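
A minimal loop for the stop/start soak test in step 10, using the Proxmox qm CLI; VM ID 120 is the example from earlier, and the sleep values are arbitrary:

# Cycle the VM to surface device reset problems before users do
for i in 1 2 3 4 5; do
  echo "cycle $i"
  qm start 120 && sleep 60   # give the guest time to initialize the device
  qm stop 120  && sleep 15   # let QEMU and VFIO release the IOMMU group cleanly
done
dmesg | grep -iE 'vfio|reset' | tail -n 20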

Checklist B: Runtime detach (use sparingly)

  1. Confirm you have console access if you lose the device (especially NICs).
  2. Stop services that might touch the device (display manager, persistence daemons, storage services).
  3. Unbind all functions that will be passed through.
  4. Set driver_override to vfio-pci to avoid immediate rebind.
  5. Bind to vfio-pci.
  6. Start VM and confirm /dev/vfio/* ownership is correct.
  7. When done, reverse carefully: stop VM, unbind from VFIO, clear driver_override, bind back to host driver, restart services.
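
A sketch of the reversal in step 7 for the GPU example; run as root, and adapt addresses and driver names to your hardware:

qm stop 120                                                # make sure the VM has released the device
echo 0000:01:00.0 > /sys/bus/pci/drivers/vfio-pci/unbind
echo 0000:01:00.1 > /sys/bus/pci/drivers/vfio-pci/unbind
echo > /sys/bus/pci/devices/0000:01:00.0/driver_override   # clear the override
echo > /sys/bus/pci/devices/0000:01:00.1/driver_override
echo 0000:01:00.0 > /sys/bus/pci/drivers_probe             # ask the kernel to rebind the normal driver
echo 0000:01:00.1 > /sys/bus/pci/drivers_probe
# (the original driver module must already be loaded for the reprobe to bind it)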

Rollback plan (write it down before you start)

  • If the host loses network: use out-of-band console, revert VFIO binding, reboot if needed.
  • If the GPU wedge persists: host reboot is the hard reset; schedule accordingly.
  • If IOMMU groups changed after update: roll back firmware/kernel or relocate cards; don’t fight physics.

FAQ

1) Why does Proxmox say “device is in use” even when no VM is running?

Because the host kernel driver is using it. “In use” usually means a driver is bound or the IOMMU group node is held open. Check lspci -nnk and /sys/bus/pci/devices/.../driver.

2) Is blacklisting modules enough?

Not reliably. If the driver is present in initramfs, it can load before your blacklist is applied from the root filesystem. Rebuild initramfs and verify module presence with lsinitramfs.

3) Do I need to pass through the GPU audio function?

Usually yes. It’s often in the same IOMMU group and shares the physical device. Leaving it bound to snd_hda_intel on the host is a common cause of “in use” and guest driver weirdness.

4) What does “/dev/vfio/10 busy” specifically indicate?

That IOMMU group 10 is already open by some process—often another QEMU instance, sometimes a stuck one. It can also happen if you split group functions across VMs. Fix by ensuring one group goes to one VM (or none).

5) Can I hot-unplug a PCI device from the host safely?

You can unbind and rebind drivers, but “safe” depends on the device class and whether the platform supports clean reset. GPUs are the least cooperative; HBAs and NICs are usually better behaved.

6) Should I use ACS override?

Only if you understand the trade-off: you may be weakening isolation guarantees. In homelabs it’s common; in regulated environments it’s typically a security exception requiring explicit approval.

7) Why does it break after kernel or BIOS updates?

Updates change driver behavior, initramfs contents, PCIe enumeration order, and IOMMU grouping. Treat passthrough hosts as systems where updates must include a post-change validation: binding, groups, and VM start/stop loops.

8) How do I avoid binding the wrong GPU when I have two identical cards?

Avoid relying solely on vfio-pci ids=. Use PCI addresses with driver_override logic at boot, or ensure the host uses a different GPU class (onboard or a cheap adapter) so you can bind by ID without collateral damage.

9) What’s the difference between “Kernel modules” and “Kernel driver in use” in lspci output?

“Kernel driver in use” is the active binding. “Kernel modules” are drivers that could bind based on modalias. You care about “in use.”

10) Do I need OVMF for GPU passthrough?

For most modern GPUs and modern guests, yes. OVMF (UEFI) with Q35 is the most predictable baseline. SeaBIOS can work in specific setups but adds avoidable complexity.

Conclusion: next steps that prevent 2 a.m. surprises

The “device is in use” error is not random. It’s your system telling you the host still owns the hardware. Your job is to make ownership unambiguous: correct IOMMU isolation, persistent VFIO binding in initramfs, and VM configs that pass all required functions together.

Practical next steps:

  1. Pick one passthrough device and fully inventory its functions and IOMMU group membership.
  2. Make VFIO binding persistent (IDs or address-based), rebuild initramfs, and verify after reboot.
  3. Run repeated VM stop/start cycles to flush out reset issues before users do.
  4. Write a rollback plan that assumes you will eventually detach the wrong thing once in your career—because you will.

If you make passthrough boring, you’ve done it right. Boring is scalable.
