“It works on my desktop” is cute until you’re trying to run Windows in a VM with a real GPU, a real workload, and real users who file tickets when the mouse lags.
GPU passthrough is one of those projects that looks like a checkbox (“enable IOMMU”) and turns into a week of chasing a single missing BIOS toggle, a broken PCIe reset, or a Windows driver that refuses to cooperate. This guide is for getting it done without turning your host into a science fair.
What you’re building (and what you’re not)
You’re building a Windows VM on a Linux host (KVM/QEMU stack) that uses a physical GPU directly via VFIO/IOMMU. The VM owns that GPU. Not “kind of.” Not “shared politely.” The host should treat that GPU like it’s unplugged.
This is not the same thing as:
- Remote desktop acceleration (RDP, VNC, SPICE). Useful, but usually not “near-native” for 3D or low-latency interactive work.
- vGPU / mediated devices (SR-IOV-style GPU sharing). Great when you have it, but hardware and licensing can be restrictive.
- WSL2 GUI acceleration (nice, different problem).
Near-native speed means you’ve removed the typical virtualization bottlenecks: emulated devices, bad interrupt routing, storage sync penalties, and CPU scheduling chaos. The GPU is the headline, but the supporting cast (firmware, chipset isolation, storage, timers) wins or loses your day.
Facts and historical context (so the weirdness makes sense)
- IOMMU wasn’t created for gamers. AMD-Vi and Intel VT-d were pushed hard for server consolidation and DMA isolation—stopping devices from scribbling over memory they don’t own.
- VFIO is relatively “modern Linux.” Before VFIO, device assignment was possible but more fragile and less secure; VFIO standardized a safer user-space interface for passthrough.
- PCIe ACS is the unsung villain/hero. Access Control Services determine whether a switch can isolate peer-to-peer traffic. Many consumer boards cheap out, and your IOMMU groups show it.
- UEFI (OVMF) changed the game. Legacy BIOS ROM behaviors and GPU option ROM handling were a mess. OVMF made modern Windows installs and GPU init much more predictable.
- NVIDIA’s “Code 43” era was real. Consumer driver behaviors historically punished virtualization signals. Many of those pains are reduced today, but the folklore remains for a reason.
- Reset behavior is hardware, not vibes. Some GPUs cannot cleanly reset without a full bus reset; that’s why “works once after boot” keeps showing up.
- VirtIO wasn’t a performance hack; it’s a philosophy. Paravirtual devices reduce emulation overhead by designing an interface explicitly for VMs.
- MSI/MSI-X interrupts matter more than you want to admit. They reduce interrupt sharing and can improve latency. But they can also reveal broken firmware tables and driver assumptions.
Hardware decisions that make or break passthrough
Pick a CPU/platform that does IOMMU properly
Most modern AMD Ryzen and Intel Core platforms support IOMMU/VT-d, but the motherboard implementation is what you’re really buying. If you have a choice, prioritize boards known to have sane IOMMU grouping and stable firmware updates. A board with three BIOS updates in six months is either a great team… or a warning label.
Two GPUs make life easier
Yes, you can pass through your only GPU and run the host headless. But it’s harder to diagnose and more annoying to recover when you break the display. Use an iGPU (Intel) or a cheap second GPU for the host console. It turns “I bricked my boot” into “I can still SSH in and fix it.”
PCIe slot wiring: read the fine print
The top x16 slot isn’t always “CPU lanes direct.” Some boards route through chipset or share bandwidth with M.2 slots. For passthrough, you want:
- GPU in a slot with stable link training (Gen4/Gen5 can be spicy).
- Minimal sharing with other high-bandwidth devices.
- IOMMU grouping that doesn’t lump your GPU with half the chipset.
USB and audio strategy
Decide early: do you pass through a whole USB controller, or forward individual devices? Passing through a full controller is cleaner and more stable for keyboards, mice, VR headsets, and audio interfaces. But it needs a controller in its own IOMMU group.
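A quick way to shortlist candidates is to walk sysfs for PCI class 0c03 (USB controllers) and print each one’s IOMMU group. A minimal sketch, run here against a tiny fake sysfs tree so the logic is visible without hardware; on a real host, call usb_groups with no argument:

```shell
# Sketch: map USB controllers (PCI class 0c03) to their IOMMU groups.
# usb_groups() defaults to the real /sys; the demo tree below is fake.
usb_groups() {
    sys=${1:-/sys}
    for dev in "$sys"/bus/pci/devices/*; do
        grep -q '^0x0c03' "$dev/class" 2>/dev/null || continue
        group=$(basename "$(readlink -f "$dev/iommu_group")")
        echo "$(basename "$dev") -> group $group"
    done
}
# Demo tree: one xHCI controller (class 0x0c0330) alone in group 19.
demo=$(mktemp -d)
mkdir -p "$demo/bus/pci/devices/0000:05:00.0" "$demo/groups/19"
echo 0x0c0330 > "$demo/bus/pci/devices/0000:05:00.0/class"
ln -s "$demo/groups/19" "$demo/bus/pci/devices/0000:05:00.0/iommu_group"
usb_groups "$demo"   # prints: 0000:05:00.0 -> group 19
```

A controller that shares its group with nothing else is a safe passthrough candidate; one that shares with host-critical devices is not.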
Joke #1: USB passthrough is like office politics: everything is fine until a headset shows up and suddenly nothing works.
Networking: keep it boring
A virtio-net NIC on a Linux bridge will saturate 10GbE easily in many cases. Don’t pass through the NIC unless you have a hard reason (special drivers, strict isolation). Every passthrough device adds failure modes and reboot dependencies.
BIOS/UEFI settings: the boring switches that control your fate
Most “GPU passthrough doesn’t work” threads are just “IOMMU wasn’t actually enabled.” The BIOS labels vary, the defaults vary, and firmware updates can quietly reset them. Write down your settings. Seriously.
Settings to enable
- Intel: VT-d (IOMMU), VT-x (virtualization). Sometimes “DMA remapping.”
- AMD: SVM (virtualization), IOMMU (AMD-Vi). Sometimes “IOMMU Controller.”
- UEFI boot mode (prefer UEFI, not CSM/legacy).
- Above 4G decoding (often required for modern GPUs, BAR sizing, and clean mapping).
- Resizable BAR (optional; test carefully—can help or complicate).
Settings to consider disabling (if you’re chasing gremlins)
- CSM (compatibility support module). It can create weird ROM/init paths.
- Fast boot (hides POST info and sometimes skips device init steps you need for reset behavior).
- PCIe ASPM (power saving that can induce latency or instability on some platforms).
Firmware updates: treat them like production changes
Update BIOS if you need improved ACS grouping or stability. But do it like you would patch a core router: maintenance window, rollback plan, and a way to get back in if the host won’t boot. You are changing the platform that your security boundary (IOMMU) depends on.
Host Linux configuration: IOMMU, VFIO, and isolation
Kernel boot params: IOMMU on, and actually used
On Intel, you typically want intel_iommu=on. On AMD, amd_iommu=on. Many setups also add iommu=pt to reduce overhead for host devices by using pass-through mappings where safe.
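If you manage the cmdline through GRUB, the edit is a one-liner. A minimal sketch, demonstrated on a throwaway copy so nothing real gets touched (the file path and update-grub step assume a Debian/Ubuntu-style layout):

```shell
# Sketch: append IOMMU flags to the kernel cmdline, GRUB-style.
# Demoed on a temp copy; on a real host, apply the same sed to
# /etc/default/grub, then run `sudo update-grub` and reboot.
cfg=$(mktemp)
echo 'GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"' > "$cfg"
sed -i 's/^\(GRUB_CMDLINE_LINUX_DEFAULT="[^"]*\)"/\1 amd_iommu=on iommu=pt"/' "$cfg"
cat "$cfg"
```

Swap in intel_iommu=on on Intel hosts, and verify after reboot with the dmesg check from the diagnosis tasks later in this guide.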
If you’re chasing performance: pin CPUs, isolate host noise, and avoid emulated devices. If you’re chasing stability: keep the configuration simple and reproducible before you tune.
Bind the GPU to VFIO early
You want the host to never load the normal GPU driver for the passed-through device. That means binding it to vfio-pci at boot, using device IDs (vendor:device). Also bind the GPU’s HDMI audio function—most discrete GPUs expose at least two PCI functions.
Don’t forget the initramfs
Many distributions need VFIO modules available in initramfs so the kernel can bind devices before the “real” GPU drivers grab them. Miss this, and you’ll spend a day wondering why your GPU is “in use” even when you blacklisted drivers.
Security and blast radius
IOMMU is isolation, not magic. DMA attacks are why this exists, and misconfiguration can compromise the host. If this system matters, keep the host lean, minimize third-party kernel modules, and treat passthrough like giving a VM a direct lane into the hardware.
Here’s one quote worth remembering. It’s short, and it punches above its weight:
“Hope is not a strategy.” — Gen. Gordon R. Sullivan
IOMMU groups: interpret them like a crime scene
IOMMU groups determine what can be safely isolated. If your GPU is in a group with your SATA controller, you don’t have a “cool passthrough setup.” You have a future incident report.
What “good” looks like
- The GPU and its audio function are in the same group (fine), but not with unrelated devices.
- Your USB controller intended for passthrough is in its own group.
- NVMe drives you want to pass through are isolated cleanly (or you don’t pass them through at all and keep storage virtualized).
What “bad” looks like
- GPU shares a group with chipset bridge devices and multiple controllers.
- Everything behind a PCIe switch collapses into a single group (common on consumer gear without ACS).
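Before judging your groups, dump them in full; filtered one-liners are convenient, but the complete listing is the ground truth. A readable sketch, pointed at a fake tree here so it runs anywhere; on a real host, call it with no argument:

```shell
# Sketch: print every IOMMU group and its member devices.
list_groups() {
    root=${1:-/sys/kernel/iommu_groups}
    for g in "$root"/*/; do
        printf 'Group %s:\n' "$(basename "$g")"
        ls "$g/devices"
    done
}
# Demo tree standing in for sysfs: group 12 holds the GPU + its audio.
demo=$(mktemp -d)
mkdir -p "$demo/12/devices"
touch "$demo/12/devices/0000:01:00.0" "$demo/12/devices/0000:01:00.1"
list_groups "$demo"
```

If any group mixes your GPU with chipset bridges or host controllers, fix the topology before writing a single VFIO config.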
ACS override: a tool, not a solution
The kernel has an ACS override patch/parameter in some environments. It can split IOMMU groups artificially. It can also create a false sense of safety by telling the software stack “it’s isolated” when the hardware can still do peer-to-peer DMA. Use it only when you understand the risk and accept it. In production: avoid relying on it.
Building the Windows VM: OVMF, Q35, VirtIO, and sanity
Machine type and firmware
Use Q35 (modern chipset emulation) and OVMF (UEFI). It aligns with modern Windows expectations, PCIe topology, and GPU initialization.
Disk and network devices
Use VirtIO for disk and network. Install VirtIO drivers during Windows setup (or inject them). Emulated SATA/IDE works for bootstrapping but costs performance and sometimes stability under load.
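In libvirt terms, the disk and NIC stanzas look roughly like this. A sketch only: the image path, cache/io settings, and the bridge name br0 are placeholders to adapt, not requirements:

```xml
<!-- VirtIO disk: raw image, host page cache bypassed, discard passed through -->
<disk type="file" device="disk">
  <driver name="qemu" type="raw" cache="none" io="native" discard="unmap"/>
  <source file="/var/lib/libvirt/images/win11.raw"/>
  <target dev="vda" bus="virtio"/>
</disk>
<!-- VirtIO NIC on a host bridge -->
<interface type="bridge">
  <source bridge="br0"/>
  <model type="virtio"/>
</interface>
```

Windows won’t see the VirtIO disk until the storage driver from the virtio-win ISO is loaded during setup, which is why you attach that ISO alongside the installer.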
CPU model and features
Expose a sensible CPU model. “host-passthrough” is common because it hands Windows the host CPU’s full feature set instead of a generic baseline model. But it can reduce VM portability. For a single-host workstation/server, that’s usually fine.
TPM and Secure Boot
Windows 11 wants TPM 2.0 and Secure Boot. Use a virtual TPM (swtpm) and OVMF Secure Boot if your platform supports it. If you’re doing this for a controlled environment, meet the requirements instead of hacking around them. Hacks become policy. Then audits happen.
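With swtpm installed, libvirt can emulate TPM 2.0 with a short stanza. A sketch of the common shape:

```xml
<!-- Emulated TPM 2.0 backed by swtpm on the host -->
<tpm model="tpm-crb">
  <backend type="emulator" version="2.0"/>
</tpm>
```

Pair this with an OVMF Secure Boot firmware build and the Windows 11 installer stops complaining for legitimate reasons, not bypassed ones.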
Passing through the GPU (and its audio function)
Pass both functions, or enjoy half-working HDMI audio
Discrete GPUs commonly present:
- VGA/3D controller function
- HDMI/DP audio function
- Sometimes USB-C controller (on some cards)
Pass the whole device set that belongs together. Mixing host/guest ownership creates driver confusion and device resets that fail in creative ways.
Reset bugs and “works once” syndrome
If the VM can start once after host boot, but fails on second start, you’re likely hitting a reset issue. Some AMD GPUs historically needed vendor-reset; some NVIDIA cards can be stubborn too. The fix might be a kernel module, a different PCIe slot, disabling ASPM, or changing how QEMU resets the device. Sometimes the fix is “use a different GPU,” which is not emotionally satisfying but is operationally correct.
Joke #2: GPU reset bugs are nature’s way of reminding us that “turn it off and on again” is a hardware feature, not a life philosophy.
Windows driver install order
Install VirtIO first (storage/network), then GPU drivers once the VM is stable. If you install GPU drivers early and the VM is unstable, you’ll spend time debugging Windows when the root cause is actually VFIO binding or PCIe grouping.
Storage and performance: don’t sabotage yourself
GPU passthrough setups often fail the “near-native” promise because storage is configured like a demo VM from 2009.
Decide: virtual disk vs device passthrough
- Virtual disk (qcow2/raw on NVMe, VirtIO-blk/scsi): easiest to snapshot, migrate, back up, and monitor. Usually fast enough.
- NVMe passthrough: high performance, but reduces manageability and increases recovery complexity. Also requires clean IOMMU isolation.
ZFS on the host: powerful, but respect sync
If your VM disk is on ZFS, you’ll run into sync semantics. A Windows guest with write caching and flushes can trigger sync writes. If your ZFS pool has no SLOG or has a weak one, latency spikes show up as “VM stutter.”
Rule: don’t disable sync unless you can afford corruption. If you’re building a gaming-only VM and can tolerate losing a session, maybe you’ll take that risk. For anything that smells like work: keep sync and invest in a proper SLOG or accept the latency.
Trim/Discard
Enable discard where appropriate so SSD space is reclaimed. But validate it. Misconfigured discard can cause periodic latency spikes depending on storage stack and firmware.
Latency and “near-native”: where the last 10% hides
CPU pinning and scheduler noise
Pin vCPUs to physical cores. Keep emulator threads off your pinned cores. If you run the host’s background jobs on the same cores as the VM’s render thread, you’ll get micro-stutter that’s nearly impossible to explain to anyone who wants a simple answer.
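In libvirt, pinning lives in the cputune block. A sketch, assuming an 8-core, no-SMT host where cores 4-7 are reserved for the VM; map this to your actual core/thread layout (lscpu -e) rather than copying it:

```xml
<!-- Sketch: pin 4 vCPUs to physical cores 4-7 and keep QEMU's emulator
     threads on cores 0-1, away from the guest's render threads. -->
<vcpu placement="static">4</vcpu>
<cputune>
  <vcpupin vcpu="0" cpuset="4"/>
  <vcpupin vcpu="1" cpuset="5"/>
  <vcpupin vcpu="2" cpuset="6"/>
  <vcpupin vcpu="3" cpuset="7"/>
  <emulatorpin cpuset="0-1"/>
</cputune>
```

On SMT hosts, pin vCPU pairs to sibling threads of the same physical core, not to siblings of different cores, or you trade cache locality for nothing.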
Hugepages: sometimes a win, sometimes a trap
Hugepages can reduce TLB misses and improve memory performance. They can also make memory allocation brittle. If you’re running multiple services, reserve hugepages carefully and monitor fragmentation. If this is a single-purpose box, hugepages are more likely to be worth it.
Interrupts: MSI/MSI-X and isolation
When performance is “close but not smooth,” check interrupt routing and whether devices are using MSI. Shared line-based interrupts can produce latency under load. The fix isn’t always to force MSI; sometimes it’s to fix the topology, update firmware, or move the device to a different slot.
Frame capture vs direct display
If you want to keep the monitor attached to the host but see the VM’s output, tools like Looking Glass can help using shared memory (ivshmem). It’s slick. It’s also another moving part. Get basic passthrough stable first; then add convenience.
Three corporate mini-stories from the trenches
Incident: a wrong assumption about “identical servers”
A company decided to standardize on a small fleet of “identical” workstations for CAD contractors: same CPU, same RAM, same GPU model, same Linux image. The plan was to host Windows VMs with GPU passthrough so contractors could remote into their assigned VM from anywhere.
Deployment week went smoothly. Then half the machines started failing VM boots after a routine reboot. The symptoms were inconsistent: some hosts would start the VM once, then fail on subsequent starts; others would show the GPU missing entirely. The team assumed it was a kernel update regression and rolled back. No change.
The real issue was embarrassingly physical. The “identical” machines were bought in two batches. The second batch had a different motherboard revision with a different PCIe switch. IOMMU groups changed, and the GPU ended up sharing a group with a USB controller that the host needed for its boot media and remote KVM dongle. The passthrough scripts blindly bound the whole group to VFIO.
Fix: inventory hardware revisions, pin critical devices by PCI address carefully, and refuse to auto-bind an entire group when it includes host-critical devices. They also started validating IOMMU group layout as part of acceptance testing, like you’d validate NIC firmware.
Optimization that backfired: chasing benchmark numbers with storage settings
Another shop ran Windows VMs for GPU-accelerated video processing. The VMs lived on a ZFS pool. Someone noticed the ingest pipeline sometimes stalled, and they wanted faster writes. They toggled a dataset property to reduce sync overhead (and patted themselves on the back because the benchmark graph went up and to the right).
For a few weeks, things seemed fine. Then there was a power event. Not a dramatic one—just a brief brownout that tripped a UPS in a way that forced a hard shutdown. When the hosts came back, several VMs booted with corrupted NTFS volumes. Some projects had to be re-imported from source media; some work-in-progress was gone.
The postmortem was uncomfortable because the “optimization” wasn’t malicious; it was a common internet tip. The problem is that write ordering and durability semantics matter more under virtualization, because guests often assume the hypervisor honors flushes.
Fix: restore safe sync behavior, add a real SLOG device for latency, and document a rule: performance changes to storage durability require the same approval path as security changes. Benchmarks don’t sign incident reports; people do.
Boring but correct practice that saved the day: deterministic pinning and version control
A team running a small internal “GPU farm” for Windows-based rendering had a habit that looked almost comically bureaucratic: every host had its full passthrough config in version control, including PCI IDs, QEMU args, and kernel parameters. Changes went through review. They also had a one-page runbook for how to identify the GPU and its audio function and confirm VFIO binding.
Then a routine maintenance window arrived: kernel updates, microcode updates, a BIOS update on two machines, and a GPU swap because a fan was failing. Exactly the kind of day where “I’ll remember what I did last time” turns into improvisational theater.
One of the updated machines came back with the GPU in a different PCI address (slot change). The VM failed to start because the config referenced the old address. No panic. The on-call engineer followed the runbook: list devices, map IOMMU groups, update the config, rebind VFIO, regenerate initramfs, reboot. Thirty minutes later the VM was running. Nobody had to rediscover how the setup worked under pressure.
The boring practice wasn’t heroism. It was just treating passthrough configs like production infrastructure: deterministic, reviewable, and testable. That’s the whole trick.
Fast diagnosis playbook
When performance isn’t near-native or the VM won’t start, don’t wander. Triage like an SRE: confirm the boundary conditions first, then narrow.
First: confirm the fundamentals (10 minutes)
- IOMMU enabled and active (kernel logs show DMAR/AMD-Vi, and IOMMU groups exist).
- GPU bound to vfio-pci on the host before VM start.
- VM uses OVMF + Q35 (modern firmware and chipset model).
- Windows sees the GPU without driver errors (Device Manager, no Code 12/43).
Second: identify the bottleneck class
- Stutter under load → CPU scheduling / interrupts / storage latency.
- VM won’t start → IOMMU groups, VFIO binding, ROM/reset issues.
- Starts once, fails later → GPU reset behavior, power management, driver state.
- Low FPS but stable → wrong display path (using a virtual GPU), wrong driver, PCIe link width/speed.
Third: measure the obvious signals
- PCIe link state (is the GPU at x16 Gen4, or did it train to x4 Gen1?).
- Storage latency (are sync writes throttling?).
- CPU steal time and host load (are you fighting the scheduler?).
- Interrupt distribution (is one core taking all interrupts?).
If you can’t say what category you’re in, you’re not diagnosing yet—you’re sightseeing.
Common mistakes: symptoms → root cause → fix
VM won’t start; QEMU says “device is in use”
Symptoms: VM fails immediately; logs mention the GPU device is busy.
Root cause: Host GPU driver (nouveau/amdgpu/nvidia) claimed the device before vfio-pci.
Fix: Bind by vendor/device ID with vfio-pci at boot; ensure VFIO modules are in initramfs; blacklist conflicting drivers only as a secondary measure.
Windows shows Code 12 (“not enough resources”) for the GPU
Symptoms: GPU appears but won’t start; resource allocation error.
Root cause: BAR mapping/resource constraints; often missing “Above 4G decoding” or bad PCIe topology in VM.
Fix: Enable Above 4G decoding in BIOS; use Q35; consider Resizable BAR off initially; ensure you pass through all GPU functions.
Works once after host boot, then GPU disappears on VM restart
Symptoms: First VM boot OK; subsequent starts fail until host reboot.
Root cause: GPU doesn’t support clean function-level reset; driver leaves it in a bad state.
Fix: Try different kernel/QEMU versions; disable PCIe power management; use vendor-reset where applicable; if it’s mission-critical, choose a GPU known for reset stability.
Low FPS and high CPU usage in the guest
Symptoms: Windows feels like software rendering; CPU spikes.
Root cause: You’re not actually using the passed-through GPU (Windows is on Microsoft Basic Display Adapter or a virtual GPU), or the display is routed through a slow remoting path.
Fix: Confirm Device Manager shows the correct GPU; install proper GPU drivers; attach a monitor to the passed-through GPU or use a low-latency capture path such as an ivshmem-based solution (e.g., Looking Glass).
Audio crackles under load
Symptoms: Pops/crackles, especially when compiling or copying files.
Root cause: DPC latency from CPU scheduling, interrupts, or power management; sometimes caused by passing through only part of the audio chain.
Fix: Pin vCPUs; reduce host background load; ensure MSI/MSI-X where appropriate; pass through a dedicated USB controller for audio devices if needed.
VM disk performance is terrible on a “fast” pool
Symptoms: FPS drops when the game loads assets; video renders stall; IO wait spikes.
Root cause: Sync write penalties, thin-provisioning fragmentation, qcow2 overhead, or lack of cache/SLOG tuning.
Fix: Prefer raw on fast SSD/NVMe for performance; use VirtIO with proper queueing; on ZFS, keep sync=standard and provide a proper SLOG if needed.
USB devices randomly disconnect
Symptoms: Keyboard/mouse drop; VR devices vanish; Windows “device removed” sounds.
Root cause: USB device passthrough flapping; host power management; forwarding individual devices instead of a controller; poor IOMMU grouping forcing compromises.
Fix: Pass through a full USB controller in an isolated group; disable USB autosuspend on the host if needed; avoid hubs where possible.
Practical tasks with commands, outputs, and decisions
These are the checks I run when I want answers fast. Each task includes: a command, what the output means, and what decision you make next.
Task 1: Confirm virtualization extensions are present
cr0x@server:~$ lscpu | egrep -i 'Virtualization|Flags'
Virtualization: AMD-V
Flags: fpu vme de pse tsc ... svm ...
Meaning: You have CPU virtualization (SVM on AMD, VT-x on Intel). This is necessary but not sufficient for IOMMU.
Decision: If virtualization is missing, fix BIOS settings first (SVM/VT-x). Don’t touch VFIO yet.
Task 2: Confirm IOMMU/DMAR is actually enabled in the kernel
cr0x@server:~$ dmesg | egrep -i 'DMAR|IOMMU|AMD-Vi' | head -n 12
[ 0.812345] AMD-Vi: IOMMU performance counters supported
[ 0.812678] AMD-Vi: Found IOMMU at 0000:00:00.2 cap 0x40
[ 0.813210] AMD-Vi: Interrupt remapping enabled
Meaning: The IOMMU is initialized and interrupt remapping is on (good sign for stability).
Decision: If you see nothing relevant, add amd_iommu=on or intel_iommu=on to GRUB/systemd-boot and reboot.
Task 3: Confirm VFIO modules are loaded
cr0x@server:~$ lsmod | egrep 'vfio|kvm' | head
vfio_pci 65536 0
vfio_pci_core 90112 1 vfio_pci
vfio_iommu_type1 40960 0
vfio 45056 2 vfio_pci_core,vfio_iommu_type1
kvm_amd 139264 0
kvm 1105920 1 kvm_amd
Meaning: VFIO core is present. If this is missing, passthrough won’t happen.
Decision: If missing, install appropriate packages and ensure initramfs includes VFIO modules.
Task 4: Identify the GPU and its functions (VGA + audio)
cr0x@server:~$ lspci -nn | egrep -i 'vga|3d|audio'
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GA104 [GeForce RTX 3070] [10de:2484] (rev a1)
01:00.1 Audio device [0403]: NVIDIA Corporation GA104 High Definition Audio Controller [10de:228b] (rev a1)
Meaning: Two functions: GPU (01:00.0) and GPU audio (01:00.1). You almost always pass both.
Decision: Record the PCI addresses and device IDs. These drive vfio-pci binding.
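Those bracketed vendor:device pairs are exactly what vfio-pci’s ids= option wants, so it’s worth scripting the extraction instead of retyping them. A sketch (ids_for_slot is a hypothetical helper; the 01:00. prefix is the example card’s address), demoed on the captured output so it runs anywhere:

```shell
# Sketch: turn `lspci -nn` lines into the comma-separated list that the
# vfio-pci ids= option expects.
ids_for_slot() {
    # stdin: lspci -nn output; $1: bus:device prefix, e.g. "01:00."
    grep "^$1" | grep -o '\[[0-9a-f]\{4\}:[0-9a-f]\{4\}\]' \
        | tr -d '[]' | paste -sd, -
}
# On a real host: lspci -nn | ids_for_slot '01:00.'
ids_for_slot '01:00.' <<'EOF'
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GA104 [GeForce RTX 3070] [10de:2484] (rev a1)
01:00.1 Audio device [0403]: NVIDIA Corporation GA104 High Definition Audio Controller [10de:228b] (rev a1)
EOF
# prints: 10de:2484,10de:228b
```

The resulting string drops straight into the options vfio-pci ids=… line in Task 7.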
Task 5: Check IOMMU group membership for the GPU
cr0x@server:~$ for d in /sys/kernel/iommu_groups/*/devices/*; do echo "$d" | sed 's/.*iommu_groups\/\([0-9]*\).*/Group \1/'; lspci -nns "${d##*/}"; done | egrep -B1 '01:00\.'
Group 12
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GA104 [GeForce RTX 3070] [10de:2484] (rev a1)
Group 12
01:00.1 Audio device [0403]: NVIDIA Corporation GA104 High Definition Audio Controller [10de:228b] (rev a1)
Meaning: The GPU and its audio are in Group 12 and nothing else is shown in this filtered output. You still need to confirm the full group contents.
Decision: If Group 12 contains unrelated devices, you cannot safely passthrough just the GPU without addressing isolation (different slot/board, or accept risk).
Task 6: Print the full IOMMU group (no cheating)
cr0x@server:~$ find /sys/kernel/iommu_groups/12/devices -maxdepth 1 -type l -printf '%f\n' | sort
0000:01:00.0
0000:01:00.1
Meaning: Clean, minimal group. This is what you want.
Decision: Proceed with VFIO binding. If the group is crowded, stop and reconsider hardware/topology.
Task 7: Bind GPU to vfio-pci using vendor:device IDs
cr0x@server:~$ sudo tee /etc/modprobe.d/vfio.conf >/dev/null <<'EOF'
options vfio-pci ids=10de:2484,10de:228b disable_vga=1
EOF
Meaning: This tells vfio-pci which devices to claim. disable_vga=1 can help with legacy VGA decoding quirks.
Decision: After setting this, you must rebuild initramfs (next task) and reboot to ensure early binding.
Task 8: Ensure VFIO modules load early (initramfs) and regenerate
cr0x@server:~$ printf '%s\n' vfio vfio_pci vfio_iommu_type1 vfio_pci_core | sudo tee -a /etc/initramfs-tools/modules
vfio
vfio_pci
vfio_iommu_type1
vfio_pci_core
cr0x@server:~$ sudo update-initramfs -u
update-initramfs: Generating /boot/initrd.img-6.8.0-41-generic
Meaning: The initramfs now includes VFIO modules so they can bind devices before GPU drivers load.
Decision: Reboot. If you skip reboot here, you’re mostly testing your luck.
Task 9: After reboot, verify the GPU is using vfio-pci
cr0x@server:~$ lspci -k -s 01:00.0
01:00.0 VGA compatible controller: NVIDIA Corporation GA104 [GeForce RTX 3070] (rev a1)
Subsystem: Micro-Star International Co., Ltd. [MSI] Device 3901
Kernel driver in use: vfio-pci
Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia
Meaning: “Kernel driver in use: vfio-pci” is the money line. The presence of other modules under “Kernel modules” is fine; it lists possible modules, not active binding.
Decision: If the driver in use is nouveau/amdgpu/nvidia, go back: initramfs order, vfio IDs, and blacklists.
Task 10: Confirm the host didn’t grab the GPU for display
cr0x@server:~$ loginctl seat-status seat0 | head -n 20
seat0
Devices:
/sys/devices/pci0000:00/0000:00:02.0/drm/card0
cr0x@server:~$ ls -l /dev/dri/by-path | head
total 0
lrwxrwxrwx 1 root root 8 Feb 4 10:12 pci-0000:00:02.0-card -> ../card0
Meaning: The active DRM device is the iGPU at 00:02.0, not the passthrough GPU at 01:00.0.
Decision: If the host is using the passthrough GPU for DRM, you need a secondary GPU/iGPU for the host or a headless setup with careful binding.
Task 11: Validate PCIe link width/speed (performance sanity)
cr0x@server:~$ sudo lspci -vv -s 01:00.0 | egrep -i 'LnkCap|LnkSta'
LnkCap: Port #0, Speed 16GT/s, Width x16, ASPM L1, Exit Latency L1 <64us
LnkSta: Speed 16GT/s (ok), Width x16 (ok)
Meaning: The GPU trained correctly (Gen4 x16 here). If you see x4 or Gen1, expect bandwidth bottlenecks and weird stutter.
Decision: If link is degraded, reseat the card, change slot, update BIOS, or force PCIe generation in BIOS for stability testing.
Task 12: Check KVM acceleration is available
cr0x@server:~$ sudo kvm-ok
INFO: /dev/kvm exists
KVM acceleration can be used
Meaning: You have hardware virtualization acceleration exposed to QEMU.
Decision: If this fails, fix BIOS virtualization or kernel module loading before you troubleshoot passthrough.
Task 13: Confirm QEMU sees the VFIO device node
cr0x@server:~$ ls -l /dev/vfio
total 0
crw-rw---- 1 root root 10, 196 Feb 4 10:15 vfio
crw-rw---- 1 root root 10, 140 Feb 4 10:15 12
Meaning: Group 12 has a VFIO device node. That’s what QEMU opens to access the group.
Decision: If the group node is missing, IOMMU grouping isn’t active or permissions/udev rules are off.
Task 14: Watch for IOMMU faults during VM boot
cr0x@server:~$ sudo dmesg -w | egrep -i 'vfio|iommu|fault|DMAR|AMD-Vi'
[ 312.123456] vfio-pci 0000:01:00.0: enabling device (0000 -> 0003)
[ 312.456789] vfio-pci 0000:01:00.1: enabling device (0000 -> 0002)
Meaning: Normal device enable lines are fine. Actual IOMMU faults would show “fault” and an address.
Decision: If you see faults, suspect buggy firmware, unsafe ACS overrides, or a device doing DMA outside its assigned ranges.
Task 15: Check host CPU scheduling pressure and steal-like symptoms
cr0x@server:~$ top -b -n 1 | head -n 12
top - 10:22:01 up 12 days, 3:11, 2 users, load average: 7.82, 6.11, 5.44
Tasks: 382 total, 2 running, 380 sleeping, 0 stopped, 0 zombie
%Cpu(s): 62.1 us, 3.4 sy, 0.0 ni, 33.9 id, 0.4 wa, 0.0 hi, 0.2 si, 0.0 st
MiB Mem : 64239.0 total, 812.5 free, 28410.2 used, 35016.3 buff/cache
Meaning: Load is high but CPU idle is still ~34%. That suggests runnable pressure but not total saturation. If wa spikes, suspect storage latency.
Decision: If you see sustained high wa, go straight to storage checks. If sy is high, suspect interrupt storms or heavy networking.
Task 16: Measure storage latency signals on ZFS (if applicable)
cr0x@server:~$ sudo zpool iostat -v 1 3
              capacity     operations     bandwidth
pool        alloc   free   read  write   read  write
rpool       1.20T  2.40T    220    980  28.0M  96.5M
  nvme0n1   1.20T  2.40T    220    980  28.0M  96.5M
Meaning: High write ops could include sync-heavy guest workloads. This output doesn’t show latency directly, but it shows pressure patterns.
Decision: If write ops jump when stutter happens, investigate ZFS sync behavior, SLOG, and guest flush patterns.
Checklists / step-by-step plan
Phase 1: Hardware and firmware (do this before touching Linux configs)
- Install a second GPU or enable iGPU for the host display.
- Update BIOS/UEFI to a stable release (not necessarily the newest beta).
- Enable VT-d/AMD-Vi (IOMMU), VT-x/SVM, Above 4G decoding.
- Disable CSM; boot in pure UEFI mode.
- Place GPU in a slot likely to be CPU-lane direct; avoid sharing with critical controllers.
Phase 2: Host OS baseline
- Install a recent LTS kernel known to be stable with VFIO for your distro.
- Enable IOMMU kernel params (intel_iommu=on or amd_iommu=on; optionally iommu=pt).
- Install KVM/QEMU, libvirt (or your hypervisor stack), and ensure /dev/kvm exists.
- Identify GPU functions with lspci -nn.
- Check IOMMU groups. If groups are bad, fix topology now (slot/board), not later.
Phase 3: VFIO binding
- Create vfio-pci IDs config (/etc/modprobe.d/vfio.conf).
- Add VFIO modules to initramfs, regenerate initramfs, reboot.
- Verify “Kernel driver in use: vfio-pci” for GPU and audio function.
- Confirm host display is not using the passthrough GPU.
Phase 4: VM creation
- Create VM with Q35 + OVMF.
- Use VirtIO disk + VirtIO network; attach VirtIO driver ISO.
- Install Windows; install VirtIO drivers; then install GPU drivers.
- Pass through GPU + audio, and optionally a USB controller.
- Reboot VM several times. If it only works once, you’re not done.
Phase 5: Performance tuning (only after stability)
- Pin vCPUs; isolate host CPUs if needed.
- Evaluate hugepages if workload benefits.
- Verify PCIe link speed/width.
- Measure storage behavior under load; fix sync latency correctly.
FAQ
1) Do I need two GPUs?
Need? No. Want? Yes. Two GPUs (or an iGPU for the host) makes recovery and debugging dramatically easier. Headless-only is viable but less forgiving.
2) Can I do near-native gaming performance?
Often, yes. The GPU part can be near-native; the remaining gap is usually CPU scheduling, storage latency, and the display/USB path. If you tune those, you can get very close.
3) Should I pass through the NVMe drive for maximum speed?
Only if you have a reason that beats the operational cost. A raw virtual disk on a fast NVMe with VirtIO is already excellent. NVMe passthrough complicates backups, snapshots, and recovery.
4) Why does Windows show Code 12 or Code 43?
Code 12 is typically resource/BAR allocation issues—often fixed by Above 4G decoding and correct VM chipset/firmware. Code 43 historically involved driver virtualization detection, but modern causes include misconfiguration and unstable device state.
5) Does Resizable BAR help?
It can, depending on game/workload and platform. It can also introduce mapping complexity. Get a stable baseline with it off, then test it intentionally.
6) Is ACS override safe?
It can make the kernel pretend you have better isolation than the hardware provides. For a personal workstation, you might accept the risk. For anything that matters, treat it as a last resort and prefer real hardware isolation.
7) What’s the cleanest way to handle keyboard/mouse and USB devices?
Pass through a whole USB controller in its own IOMMU group. Individual USB device forwarding works, but it’s more prone to resets and reconnect weirdness with high-churn devices.
8) How do I know if the bottleneck is storage vs CPU vs GPU?
Check PCIe link state and confirm the GPU is actually used. Then look at host CPU wa and storage stats under stutter. If IO wait spikes coincide with stutter, it’s storage. If CPU is saturated or interrupts pile on one core, it’s scheduling/interrupts.
9) Can I run multiple Windows VMs with multiple GPUs?
Yes, if you have the PCIe lanes, power, cooling, and clean IOMMU groups. Operationally, it’s closer to running a small cluster: deterministic configs and change control pay off.
10) Should I use libvirt, Proxmox, or raw QEMU?
Use what you can operate. Libvirt gives structure and easier management; raw QEMU gives full control. Proxmox is convenient but still relies on the same kernel/VFIO reality underneath.
Next steps you should actually take
- Inventory your hardware: motherboard revision, BIOS version, GPU model, and PCIe layout. Don’t assume “same model” means “same behavior.”
- Validate IOMMU groups before you buy extra parts or start tweaking kernel flags. If groups are ugly, change slots or boards now.
- Make VFIO binding deterministic: device IDs, initramfs, and post-reboot verification that the host is not driving the GPU.
- Build the VM with modern defaults: Q35 + OVMF + VirtIO. Avoid the nostalgia path.
- Stabilize first, tune second: reboot cycles, VM start/stop cycles, and suspend/resume (if you care) before you chase microseconds.
- Write a runbook: the 10 commands you used today will be the same ones you’ll need at 2 a.m. later.
If you do those six things, you’ll spend your time on real performance tuning instead of arguing with a BIOS checkbox that quietly turned itself off.