Nothing ages you faster than a “minor” kernel update that turns PCI passthrough into interpretive dance. Yesterday your VM had a GPU, an HBA, and a smug sense of stability. Today it boots like it’s trying to remember its own name, and your IOMMU groups look like they were shuffled by a cat.
This is a practical, production-first guide for diagnosing IOMMU and VFIO regressions fast—without ritual sacrifices, vague forum advice, or toggling random BIOS options until something “works.” We’ll verify the hardware path, the kernel path, the driver binding path, and the virtualization path—in that order—so you stop guessing and start deciding.
Fast diagnosis playbook (do this first)
If you’re on-call, you don’t need a philosophy of IOMMU. You need a fast fork in the road: is this BIOS, kernel parameters, driver binding, grouping, or device reset behavior? Work top-down and stop when you find a hard contradiction.
First: confirm IOMMU is actually enabled and active
- Check kernel boot parameters for IOMMU flags.
- Check dmesg for IOMMU initialization.
- Confirm you have IOMMU groups under /sys/kernel/iommu_groups.
Decision: If IOMMU isn’t active, don’t touch VFIO, libvirt, QEMU, or Proxmox settings yet. Fix firmware + kernel args first.
Second: confirm the device is bound to vfio-pci (and not the host driver)
- Run lspci -nnk and look at “Kernel driver in use”.
- Check that vfio-pci is loaded and the device IDs matched.
- Verify the initramfs includes VFIO modules if you need early binding.
Decision: If your GPU/HBA is still claimed by amdgpu/nvidia/nouveau/ahci/megaraid_sas, the VM will lose. Fix binding before you touch groups.
Third: check IOMMU groups and isolation regressions
- List groups and see what moved.
- Look for “group grew” symptoms: your GPU now shares with a USB controller or audio device you can’t pass through cleanly.
- Check whether ACS/ARI behavior changed with the kernel update.
Decision: If grouping is the issue, you either (a) move slots, (b) adjust BIOS PCIe settings, (c) accept ACS override risk, or (d) change motherboard. Pick one; don’t “just override” in production without a threat model.
Fourth: inspect DMA/IOMMU faults and reset issues
- Search dmesg for IOMMU faults and DMAR/AMD-Vi errors.
- Check for FLR/reset quirks (especially GPUs).
- Validate your VM config doesn’t request impossible things (like multifunction mismatch).
Decision: If it’s a reset quirk, you may need kernel parameters, a different GPU, or to stop trying to hot-restart the VM like it’s a web server.
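The four-way fork above is mechanical enough to encode. Here's a minimal sketch of that triage order as a Python function — names and return strings are illustrative, not a real tool; the inputs are the facts you collect in the tasks later in this guide:

```python
# Encodes the playbook order: IOMMU active -> binding -> grouping -> faults.
# Stop at the first hard contradiction; don't fix layers below it.

def triage(iommu_active: bool, driver_in_use: str,
           group_devices: list, target_functions: list,
           dmesg_has_faults: bool) -> str:
    """Return the first layer that contradicts a working passthrough setup."""
    if not iommu_active:
        return "fix firmware + kernel args (IOMMU not active)"
    if driver_in_use != "vfio-pci":
        return f"fix driver binding (host driver '{driver_in_use}' owns the device)"
    # Anything in the group that isn't one of your target functions is a stranger.
    strangers = [d for d in group_devices if d not in target_functions]
    if strangers:
        return f"fix grouping (group also contains {strangers})"
    if dmesg_has_faults:
        return "investigate DMA faults / reset quirks"
    return "passthrough chain looks healthy"

print(triage(True, "nvidia", ["0000:01:00.0"], ["0000:01:00.0"], False))
# -> fix driver binding (host driver 'nvidia' owns the device)
```

The point isn't the code; it's that the order is fixed. You never reason about grouping while the host driver still owns the device.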
Short joke #1: An IOMMU group is like an open-plan office: one noisy neighbor and suddenly nobody gets work done.
What likely changed after a kernel update
Kernel updates don’t “randomly” break passthrough. They change how the kernel enumerates devices, how it applies quirks, how it handles PCIe features, and sometimes how it defaults security vs convenience. The failure feels random because your mental model is stale.
Common kernel-update triggers for passthrough regressions
- Different PCIe topology enumeration: a bridge driver or quirk changes, and your device effectively sits behind a different root port.
- ACS/ARI handling changes: groups can merge or split when the kernel decides a downstream port can’t be trusted to isolate.
- VFIO behavior changes: stricter checks around unsafe interrupts, BAR sizing, or device reset.
- Initramfs differences: after a kernel install, your initramfs might not include vfio-pci early binding; the host grabs the device first.
- Module signing / secure boot interactions: an out-of-tree module fails to load, leaving fallback drivers in control.
- Firmware/BIOS updates piggybacked: “while you’re at it” updates sometimes toggle virtualization defaults or reset PCIe settings.
Two rules that keep you sane
Rule 1: Treat passthrough as a chain of custody. Firmware exposes, kernel enumerates, IOMMU isolates, VFIO binds, hypervisor assigns. Breaks happen at a link, not in the abstract.
Rule 2: Don’t try to fix grouping until you’ve proven binding and IOMMU activation. Otherwise you’ll “fix” the wrong layer and learn nothing.
Interesting facts and short history that actually help
Knowing a little background helps you predict which knobs matter, and which ones are placebo.
- Intel’s “VT-d” and AMD’s “AMD-Vi” are different branding for the same core idea: DMA remapping so devices can’t scribble on arbitrary RAM.
- IOMMU wasn’t originally about virtualization convenience; it was about security and correctness when devices DMA into memory.
- Early VFIO replaced older approaches (like legacy KVM assignment paths) by leaning on the IOMMU for isolation and the kernel for mediation.
- ACS (Access Control Services) is a PCIe feature that can help enforce isolation boundaries; lack of ACS is why consumer platforms often have “sticky” groups.
- ARI (Alternative Routing-ID Interpretation) affects how functions are addressed behind a bridge; changes in ARI handling can alter groupings and multifunction behavior.
- MSI/MSI-X interrupt remapping matters: passthrough is safer when the platform supports interrupt remapping; some kernels are more strict when it’s missing.
- FLR (Function Level Reset) is optional in PCIe devices; plenty of devices “support reset” in marketing terms but behave badly under repeated VM reboots.
- GPU passthrough got easier and harder over time: better VFIO and QEMU features, but also more complex GPUs, power states, and firmware interactions.
- ACS override exists because reality is messy, not because it’s a best practice; it forces the kernel to pretend isolation exists where hardware doesn’t guarantee it.
Practical diagnosis tasks (commands, outputs, decisions)
These are the tasks I run in production when passthrough breaks after a kernel update. Each task includes: command, what the output means, and the decision you make next. Run them in order if you want speed; cherry-pick if you already know the layer that failed.
Task 1: Confirm your running kernel and last boot context
cr0x@server:~$ uname -r
6.8.0-48-generic
What it means: You’re debugging the running kernel. If you just updated but didn’t reboot, you may be chasing ghosts.
Decision: If the kernel version doesn’t match the “broken” change window, reboot into the suspected kernel and reproduce. Don’t do archaeology in the wrong runtime.
Task 2: Check the kernel command line for IOMMU parameters
cr0x@server:~$ cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-6.8.0-48-generic root=UUID=2a3d... ro quiet splash intel_iommu=on iommu=pt
What it means: You’ve asked for Intel IOMMU and “passthrough mode” for host devices (iommu=pt) which can reduce overhead for non-assigned devices.
Decision: If you’re on AMD, intel_iommu=on is decorative. Use amd_iommu=on. If neither is present, add the right one and regenerate bootloader config.
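That Intel-vs-AMD mismatch is a classic copy-paste failure, and it's cheap to automate. A small sketch (hypothetical helper names; vendor strings match the vendor_id field in /proc/cpuinfo, and the argument names follow this guide's convention):

```python
# Pick the IOMMU kernel argument for this CPU vendor, then verify
# it actually appears in the boot command line.

def required_iommu_arg(vendor_id: str) -> str:
    # "GenuineIntel" / "AuthenticAMD" as reported by /proc/cpuinfo.
    return "intel_iommu=on" if vendor_id == "GenuineIntel" else "amd_iommu=on"

def cmdline_ok(cmdline: str, vendor_id: str) -> bool:
    # Exact-token match: "intel_iommu=on" buried in another arg doesn't count.
    return required_iommu_arg(vendor_id) in cmdline.split()

line = "BOOT_IMAGE=/boot/vmlinuz root=UUID=2a3d ro quiet intel_iommu=on iommu=pt"
print(cmdline_ok(line, "GenuineIntel"))   # True
print(cmdline_ok(line, "AuthenticAMD"))   # False: the Intel flag is decorative here
```

In a real check you'd feed it the contents of /proc/cmdline and /proc/cpuinfo; wiring that up is left deliberately out so the logic stays testable anywhere.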
Task 3: Prove IOMMU actually initialized (Intel DMAR / AMD-Vi)
cr0x@server:~$ dmesg -T | egrep -i 'DMAR|IOMMU|AMD-Vi' | head -n 20
[Tue Feb 4 10:18:11 2026] DMAR: IOMMU enabled
[Tue Feb 4 10:18:11 2026] DMAR: Host address width 46
[Tue Feb 4 10:18:11 2026] DMAR: DRHD base: 0x000000fed90000 flags: 0x0
[Tue Feb 4 10:18:11 2026] DMAR: Intel(R) Virtualization Technology for Directed I/O
What it means: Kernel says IOMMU is enabled and found DMAR units.
Decision: If you see “IOMMU disabled” or nothing at all, stop. Go to BIOS/UEFI: enable VT-d/AMD-Vi, and disable “Above 4G decoding” only if you have a specific broken platform (rare; usually you want it enabled).
Task 4: Confirm IOMMU groups exist
cr0x@server:~$ ls -1 /sys/kernel/iommu_groups | head
0
1
10
11
12
What it means: IOMMU grouping is active. No groups usually means IOMMU isn’t enabled or you’re in a mode that didn’t create group sysfs entries.
Decision: If this path is empty or missing, revisit Task 2–3 and firmware settings. Don’t waste time on VFIO binding yet.
Task 5: Inventory devices and current driver binding
cr0x@server:~$ lspci -nnk | sed -n '1,120p'
00:00.0 Host bridge [0600]: Intel Corporation Device [8086:3e0f] (rev 0a)
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU104 [GeForce RTX 2080] [10de:1e87] (rev a1)
Subsystem: Device [1462:3771]
Kernel driver in use: nvidia
Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia
01:00.1 Audio device [0403]: NVIDIA Corporation TU104 HD Audio Controller [10de:10f8] (rev a1)
Kernel driver in use: snd_hda_intel
Kernel modules: snd_hda_intel
What it means: The GPU is claimed by the host NVIDIA driver, and its audio function is claimed by snd_hda_intel. Passthrough will fail or be unstable if the host owns the device.
Decision: If your goal is to pass the GPU through, bind both functions (VGA + audio) to vfio-pci. If you only bind one, you’ll often end up with odd errors or a VM that hangs at boot.
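If you script this check, don't grep blindly; parse the structure. A minimal sketch that turns lspci -nnk text into a BDF-to-driver map (assuming the output format shown above, with continuation lines indented):

```python
import re

# Parse `lspci -nnk` into {BDF: kernel driver in use}, so a pre-start
# check can assert every function you plan to assign is on vfio-pci.

def driver_map(lspci_output: str) -> dict:
    drivers, current = {}, None
    for line in lspci_output.splitlines():
        m = re.match(r"^([0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) ", line)
        if m:
            current = m.group(1)      # new device header line
            drivers[current] = None   # no driver seen yet
        elif current and "Kernel driver in use:" in line:
            drivers[current] = line.split(":", 1)[1].strip()
    return drivers

sample = """01:00.0 VGA compatible controller [0300]: NVIDIA TU104 [10de:1e87]
\tKernel driver in use: nvidia
01:00.1 Audio device [0403]: NVIDIA TU104 HD Audio [10de:10f8]
\tKernel driver in use: snd_hda_intel
"""
print(driver_map(sample))
# -> {'01:00.0': 'nvidia', '01:00.1': 'snd_hda_intel'}
```

With that map, "is everything I'm assigning bound to vfio-pci" becomes a one-line assertion instead of eyeballing terminal output at 2 a.m.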
Task 6: Identify the IOMMU group membership for the target device
cr0x@server:~$ for g in /sys/kernel/iommu_groups/*; do echo "Group $(basename "$g")"; ls -1 "$g/devices" | sed 's/^/ /'; done | sed -n '1,60p'
Group 1
0000:00:01.0
Group 2
0000:01:00.0
0000:01:00.1
Group 3
0000:00:14.0
0000:00:14.2
What it means: GPU functions share Group 2 together (good), but you need to check whether anything else is in that group (bad). Here it’s only the two GPU functions.
Decision: If your GPU shares a group with other devices you can’t pass through (like a SATA controller hosting your root disk), you have an isolation problem. Jump to grouping remediation, not VFIO config.
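The "is my group clean" question is also scriptable once you've read /sys/kernel/iommu_groups into a dict. A sketch, with hypothetical names (in practice you may also decide that PCIe bridges in the group are acceptable company; this version treats everything that isn't a target function as a stranger):

```python
# Given IOMMU group -> device list (as read from
# /sys/kernel/iommu_groups/*/devices), report whether the target
# functions sit alone in their group.

def isolation_report(groups: dict, targets: set) -> dict:
    for gid, devs in groups.items():
        if targets & set(devs):  # found the group holding our device
            strangers = [d for d in devs if d not in targets]
            return {"group": gid, "strangers": strangers, "clean": not strangers}
    return {"group": None, "strangers": [], "clean": False}  # device not found

groups = {"2": ["0000:01:00.0", "0000:01:00.1"],
          "3": ["0000:00:14.0", "0000:00:14.2"]}
print(isolation_report(groups, {"0000:01:00.0", "0000:01:00.1"}))
# -> {'group': '2', 'strangers': [], 'clean': True}
```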
Task 7: Spot what changed by comparing groups before/after (if you have logs)
cr0x@server:~$ journalctl -b -1 -k | egrep -i 'DMAR|IOMMU|ACS|vfio' | tail -n 20
Jan 31 09:12:02 server kernel: DMAR: IOMMU enabled
Jan 31 09:12:03 server kernel: vfio-pci 0000:01:00.0: enabling device (0000 -> 0003)
What it means: Prior boot shows VFIO binding happened (or at least attempted). If the current boot doesn’t show similar lines, something changed in initramfs/module load order or device IDs.
Decision: If previous boot had VFIO and current doesn’t, focus on initramfs and modprobe config rather than BIOS or ACS.
Task 8: Verify VFIO modules are loaded
cr0x@server:~$ lsmod | egrep 'vfio|kvm' | head
vfio_pci 163840 0
vfio_pci_core 73728 1 vfio_pci
vfio_iommu_type1 45056 0
vfio 65536 2 vfio_pci_core,vfio_iommu_type1
kvm_intel 430080 0
kvm 1335296 1 kvm_intel
What it means: Core VFIO pieces are present. If vfio_iommu_type1 is missing, VFIO can’t map DMA safely and device assignment will fail.
Decision: If modules are missing, load them and fix persistence. For Debian/Ubuntu, put them in /etc/modules or ensure your initramfs includes them.
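A module presence check is trivial to fold into a health script. Sketch only — the required set below matches the lsmod listing in this task; adjust it if your kernel splits VFIO differently (e.g., vfio_pci_core on newer kernels):

```python
# Check `lsmod` output for the VFIO pieces device assignment needs.

REQUIRED = {"vfio", "vfio_pci", "vfio_iommu_type1"}

def missing_modules(lsmod_output: str) -> set:
    # First whitespace-separated field of each lsmod line is the module name.
    loaded = {line.split()[0] for line in lsmod_output.splitlines() if line.strip()}
    return REQUIRED - loaded

sample = "vfio_pci 163840 0\nvfio 65536 2\nkvm 1335296 1\n"
print(missing_modules(sample))  # -> {'vfio_iommu_type1'}
```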
Task 9: Bind a device to vfio-pci (temporary, runtime)
cr0x@server:~$ sudo modprobe vfio-pci
cr0x@server:~$ echo 10de 1e87 | sudo tee /sys/bus/pci/drivers/vfio-pci/new_id
10de 1e87
cr0x@server:~$ sudo lspci -nnk -s 01:00.0
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU104 [GeForce RTX 2080] [10de:1e87] (rev a1)
Subsystem: Device [1462:3771]
Kernel driver in use: vfio-pci
Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia
What it means: The device is now bound to vfio-pci (runtime). This proves binding works without reboot.
Decision: If this fails because the device is “in use,” you must stop the host service owning it (display manager, persistence daemon, audio). For GPUs, headless hosts are your friend.
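One caveat on new_id: it binds every device with that vendor:device ID, which bites on dual-GPU hosts. The per-device alternative is sysfs driver_override (available on reasonably modern kernels). This sketch only plans the writes instead of performing them, so you can log or review the sequence; the unbind step applies only if a driver currently owns the device:

```python
# Plan the sysfs writes for binding one specific function to vfio-pci
# via driver_override, instead of new_id (which matches all devices
# with the same vendor:device ID). Paths are the standard sysfs layout.

def bind_plan(bdf: str, driver: str = "vfio-pci") -> list:
    dev = f"/sys/bus/pci/devices/{bdf}"
    return [
        (f"{dev}/driver_override", driver),   # pin the driver choice for this BDF
        (f"{dev}/driver/unbind", bdf),        # detach whatever owns it now
        ("/sys/bus/pci/drivers_probe", bdf),  # reprobe -> vfio-pci claims it
    ]

for path, value in bind_plan("0000:01:00.0"):
    print(f"echo {value} > {path}")
```

Rendering the plan as echo commands makes it easy to paste into a root shell or into a change ticket — which is exactly where a production binding change belongs.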
Task 10: Make VFIO binding persistent (modprobe config + initramfs)
cr0x@server:~$ sudo tee /etc/modprobe.d/vfio.conf >/dev/null <<'EOF'
options vfio-pci ids=10de:1e87,10de:10f8 disable_vga=1
EOF
cr0x@server:~$ sudo tee -a /etc/modules >/dev/null <<'EOF'
vfio
vfio_pci
vfio_iommu_type1
vfio_pci_core
EOF
cr0x@server:~$ sudo update-initramfs -u
update-initramfs: Generating /boot/initrd.img-6.8.0-48-generic
What it means: You’ve asked vfio-pci to claim specific device IDs, and you’ve ensured modules are present early. disable_vga=1 can help with primary VGA arbitration issues on some platforms.
Decision: If after reboot the host still grabs the GPU, you likely need to blacklist the host GPU drivers (nouveau/nvidia/amdgpu) or remove framebuffer consoles. Be deliberate; don’t blacklist storage/network drivers casually unless you enjoy remote hands.
Task 11: Validate blacklist status for conflicting GPU drivers
cr0x@server:~$ grep -R "blacklist nouveau\|blacklist nvidia\|blacklist amdgpu" /etc/modprobe.d || true
/etc/modprobe.d/blacklist-nouveau.conf:blacklist nouveau
/etc/modprobe.d/blacklist-nouveau.conf:options nouveau modeset=0
What it means: Nouveau is blacklisted. That reduces one common race where the open driver binds first.
Decision: If the proprietary driver still binds early, ensure its packages aren’t installed or its services aren’t starting. For a dedicated passthrough GPU, the host should act like it’s not there.
Task 12: Check whether interrupt remapping is available (safety and stability)
cr0x@server:~$ dmesg -T | egrep -i 'interrupt remap|IR:' | head
[Tue Feb 4 10:18:11 2026] DMAR-IR: Enabled IRQ remapping in x2apic mode
What it means: IRQ remapping is enabled. This matters for safe device assignment and can affect whether VFIO will warn about “unsafe interrupts.”
Decision: If you don’t have interrupt remapping, you may still run passthrough, but you’re in “it works until it doesn’t” territory. For corporate environments, treat missing IRQ remap as a platform risk, not a tweakable nuisance.
Task 13: Detect ACS capability and whether the kernel trusts the topology
cr0x@server:~$ sudo lspci -s 00:01.0 -vv | egrep -i 'ACSCap|ACSCtl|ARI|SR-IOV' -n
82: ACSCap: SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl+ DirectTrans+
84: ACSCtl: SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl+ DirectTrans+
What it means: The upstream port advertises ACS features. That increases the odds you’ll get clean groups without overrides.
Decision: If ACS is absent on key bridges, accept that “perfect groups” may be impossible on that board. Move devices to different slots/root ports before you reach for ACS override.
Task 14: Check for IOMMU faults during VM start
cr0x@server:~$ dmesg -T | egrep -i 'fault|DMAR:.*fault|AMD-Vi:.*Event|vfio.*error' | tail -n 20
[Tue Feb 4 10:26:44 2026] DMAR: DRHD: handling fault status reg 2
[Tue Feb 4 10:26:44 2026] DMAR: [DMA Read NO_PASID] Request device [01:00.0] fault addr 0x7f3c0000 [fault reason 0x06] PTE Read access is not set
What it means: The device attempted DMA to an address not mapped in the IOMMU page tables. That can be a misconfigured guest, buggy device behavior during reset, or a host mapping issue.
Decision: If faults appear right at VM start, suspect device reset/firmware state or an incomplete passthrough (missing companion function). If faults appear under load, suspect driver/firmware bugs or out-of-tree kernel modules interacting badly.
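When you alert on these, extract the requesting device instead of paging someone with "IOMMU errors in dmesg". A parsing sketch, assuming the Intel DMAR fault format shown above (AMD-Vi event lines need a different pattern):

```python
import re

# Pull (device BDF, fault address) pairs out of Intel DMAR fault lines,
# so alerting can name the device that faulted.

FAULT_RE = re.compile(r"Request device \[([0-9a-f:.]+)\] fault addr (0x[0-9a-f]+)")

def parse_faults(dmesg_text: str) -> list:
    return FAULT_RE.findall(dmesg_text)

line = ("DMAR: [DMA Read NO_PASID] Request device [01:00.0] "
        "fault addr 0x7f3c0000 [fault reason 0x06] PTE Read access is not set")
print(parse_faults(line))  # -> [('01:00.0', '0x7f3c0000')]
```

Correlate the BDF against your assigned-device list: a fault from your passthrough GPU at VM start means something very different from a fault by a host NIC under load.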
Task 15: Verify the device supports reset behavior you can live with
cr0x@server:~$ sudo lspci -s 01:00.0 -vv | egrep -n 'Express \(|DevCap|DevCtl|FLReset' | head -n 30
10: Capabilities: [60] Express (v2) Legacy Endpoint, MSI 00
12: DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 <64us
13: ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 75.000W
45: DevCtl: CorrErr- NonFatalErr+ FatalErr+ UnsupReq+
What it means: FLReset- says this GPU does not advertise Function Level Reset. The kernel then falls back to other reset methods (bus reset, PM reset), and some GPUs behave badly under those even when they look reset-capable on paper.
Decision: If your problem is “first VM boot works, second boot fails,” you’re likely facing reset quirks. The fix is often: avoid frequent VM reboots, cold reboot host between assignments, or select hardware with better reset semantics.
Task 16: Validate QEMU/KVM sees the IOMMU and VFIO devices
cr0x@server:~$ sudo virsh nodedev-list | egrep 'pci_0000_01_00_[01]' || true
pci_0000_01_00_0
pci_0000_01_00_1
What it means: Libvirt sees the PCI devices as node devices. That’s the hypervisor-level inventory check.
Decision: If libvirt can’t see them, your udev/libvirt environment is broken or permissions are wrong. Don’t keep changing VFIO IDs if the management layer can’t enumerate.
Task 17: Confirm vfio device node permissions and group ownership
cr0x@server:~$ ls -l /dev/vfio
total 0
crw-rw---- 1 root kvm 241, 0 Feb 4 10:20 0
crw-rw---- 1 root kvm 241, 1 Feb 4 10:20 2
crw-rw---- 1 root kvm 10, 196 Feb 4 10:20 vfio
What it means: VFIO group character devices exist (the container device /dev/vfio/vfio plus one node per group), typically owned by root:kvm. If your libvirt/QEMU runs unprivileged, group membership matters.
Decision: If permissions are wrong, fix your hypervisor service account groups or udev rules. Don’t chmod these to 666 and call it “solved.” That’s not engineering; it’s surrender.
Task 18: Check Secure Boot/module signing trouble (common after updates)
cr0x@server:~$ dmesg -T | egrep -i 'Lockdown|Secure boot|module verification|taint' | tail -n 10
[Tue Feb 4 10:18:12 2026] Lockdown: Loading of unsigned modules is restricted; see man kernel_lockdown.7
What it means: The kernel is in a mode that restricts unsigned modules. If your setup depends on out-of-tree modules (common for vendor GPU drivers), they may not load after an update.
Decision: Either enroll keys/sign modules properly, or avoid out-of-tree dependency on the host for a passthrough GPU. If the host needs vendor drivers, treat Secure Boot as a first-class requirement, not a surprise.
Three corporate mini-stories (pain, hubris, boring victory)
Mini-story 1: The incident caused by a wrong assumption
They had a neat little virtualization cluster used for build acceleration: a handful of VMs with GPU passthrough for rendering tests. It wasn’t “mission critical,” which in corporate language means it becomes mission critical at 2 a.m. when executives want a demo.
A kernel update rolled out as part of routine patching. After reboot, half the VMs wouldn’t start. The operator on-call saw the GPU still listed in the host with the vendor driver loaded and assumed, “That’s fine; the VM can still attach it.” This assumption is how you turn a solvable bug into a four-hour incident.
The actual issue was simpler: early binding changed. The updated initramfs stopped including vfio-pci modules because of a config drift in their image pipeline. The host grabbed the GPUs before VFIO ever got a chance. They wasted time changing BIOS settings and swapping PCIe slots because the groups “looked different,” when in reality the groups were fine—the binding was wrong.
Once they re-added VFIO modules to initramfs and pinned vfio-pci IDs, everything came back. The postmortem wasn’t about VFIO. It was about “verify the layer you’re changing.” The fix went into the image build as an explicit check: fail the build if vfio modules aren’t present in the initramfs for GPU hosts.
Mini-story 2: The optimization that backfired
A platform team wanted to reduce virtualization overhead. They pushed iommu=pt everywhere because they’d seen it recommended for performance. It’s a valid knob in the right context. They deployed it broadly without understanding what it trades off.
After a kernel update, one hardware SKU began throwing intermittent DMA faults under a very specific workload: high I/O plus frequent VM creation/destruction. The fault logs pointed to device DMA accesses that weren’t mapped the way they expected. Their gut reaction was to blame the kernel.
The real failure mode was uglier: the combination of “passthrough mode for host devices” and a platform quirk around ATS/PRI (address translation services / page request interface) created timing-sensitive behavior during rapid attach/detach. On older kernels, a quirk masked it. On the new kernel, that quirk changed. The “optimization” surfaced a correctness issue.
They rolled back the performance flag for that SKU, then reintroduced it only after validating the PCIe feature set and device behavior across their fleet. The lesson: performance flags are not decorations. Treat them like code changes. Gate them behind testing, not vibes.
Mini-story 3: The boring but correct practice that saved the day
A storage team ran a small fleet of hypervisors doing HBA passthrough to VMs running storage services. They were conservative bordering on annoying: every kernel update required a dry-run reboot on a canary host, and they always captured a “passthrough health snapshot” before touching anything.
The snapshot was boring: kernel version, /proc/cmdline, IOMMU status from dmesg, a full IOMMU group listing, and lspci -nnk output for the HBAs. Then they’d store it alongside the change request. Nobody loved this process. It wasn’t glamorous. It worked.
One update changed IOMMU grouping on a specific motherboard revision. The canary reboot immediately showed the HBA group now contained a root port shared with a NIC. That would have blocked safe assignment and forced a risky ACS override. They stopped the rollout, rerouted workloads, and adjusted slot placement on that model.
There was no incident. Just a dull ticket with “rollout paused” and a short plan. That’s the kind of boring that keeps your weekends intact.
Common mistakes: symptom → root cause → fix
This section is intentionally blunt. These are the top failure modes after kernel updates, mapped to the fixes that actually close the loop.
1) VM won’t start: “vfio: failed to setup container” or “No such device”
Symptom: VM start fails immediately; logs mention VFIO container or IOMMU type1 mapping errors.
Root cause: IOMMU not enabled in firmware/kernel, or vfio_iommu_type1 missing.
Fix: Enable VT-d/AMD-Vi in BIOS; add correct kernel args; load VFIO modules; rebuild initramfs.
2) Device shows up, but host driver still binds after reboot
Symptom: lspci -nnk shows nvidia/amdgpu/snd_hda_intel still “in use.”
Root cause: VFIO binding not happening early enough; initramfs lacks vfio-pci; conflicting drivers load first.
Fix: Use options vfio-pci ids=... and ensure VFIO modules are in initramfs; blacklist conflicting drivers where appropriate; keep host off the GPU console.
3) IOMMU groups got worse after update (devices merged)
Symptom: GPU/HBA shares a group with a USB controller, SATA controller, or other critical device.
Root cause: Kernel changed ACS trust/quirks; BIOS toggles changed PCIe routing; board lacks ACS on bridges.
Fix: Move device to a different slot/root port; enable ACS/ARI settings in BIOS if available; avoid ACS override unless you accept the isolation risk.
4) First VM boot works; second boot fails or device disappears
Symptom: VM starts once after host reboot; subsequent starts hang or device won’t reattach.
Root cause: Device reset quirks (GPU FLR issues, power state D3cold behavior), or guest driver leaves device in a bad state.
Fix: Prefer hardware known to reset cleanly; use full host reboot between assignments if needed; tune guest shutdown; avoid “rapid restart loops.”
5) Random IOMMU faults under load
Symptom: dmesg shows DMA faults when the guest is busy; performance tanks or VM crashes.
Root cause: Unstable platform firmware, buggy device firmware/driver, or unsafe interrupt remapping situation.
Fix: Validate interrupt remapping; update motherboard firmware; test with a different kernel; remove performance flags until stability is proven.
6) “ACS override fixed it” and then weird things happen later
Symptom: You forced groups to split; VM starts; later you see odd bus errors, DMA faults, or security concerns.
Root cause: ACS override fakes isolation boundaries; devices can still DMA in ways the hardware doesn’t properly isolate.
Fix: Treat ACS override as a last resort; for multi-tenant or sensitive environments, don’t use it. Fix hardware topology instead.
Short joke #2: ACS override is like using duct tape on a circuit breaker—things get quiet right up until they don’t.
Checklists / step-by-step plan
Checklist A: The “I need it back up in 30 minutes” plan
- Capture evidence: save uname -r, /proc/cmdline, dmesg IOMMU lines, lspci -nnk for target devices, and the group listing.
- Prove IOMMU is active: if there are no DMAR/AMD-Vi enable lines, fix BIOS/kernel args first.
- Prove binding: the device must show vfio-pci in lspci -nnk.
- Prove groups: the target device’s group must not include non-passable critical devices.
- Attempt VM start once: gather errors. Don’t loop start attempts mindlessly; that hides reset problems.
- If reset quirks suspected: do one clean host reboot, start VM once, then avoid frequent restarts until you have a permanent fix.
Checklist B: The “make it stay fixed” plan
- Pin kernel args in bootloader config and document them (Intel vs AMD matters).
- Make VFIO binding persistent via /etc/modprobe.d/vfio.conf and an initramfs rebuild.
- Blacklist conflicting drivers only when you’re sure the host doesn’t need them.
- Record baseline IOMMU groups and compare after updates on a canary host.
- Validate IRQ remapping and treat lack of it as platform risk.
- Decide on ACS override policy (allowed where? never on shared hosts? only on lab?). Write it down.
- Test cold/warm reboot behavior for each passthrough class device (GPU, HBA, NIC). Some devices lie about reset quality.
- Automate checks so the next update doesn’t become a treasure hunt.
Checklist C: Minimal “passthrough health snapshot” you should store
- Kernel version: uname -r
- Boot args: cat /proc/cmdline
- IOMMU enable lines: dmesg -T | egrep -i 'DMAR|AMD-Vi|IOMMU' | head
- Group listing: enumerate /sys/kernel/iommu_groups
- Driver binding: lspci -nnk filtered to assigned devices
- Hypervisor view: virsh nodedev-list or your platform’s equivalent
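The snapshot only pays off when you diff it. A sketch of the compare step — store each checklist item as a text blob keyed by name (keys here mirror Checklist C; the collection commands are the ones listed above):

```python
# Diff two passthrough health snapshots taken before/after a kernel
# update. Returns {key: (before_value, after_value)} for changed items.

def diff_snapshots(before: dict, after: dict) -> dict:
    changed = {}
    for key in sorted(set(before) | set(after)):
        b = before.get(key, "<absent>")
        a = after.get(key, "<absent>")
        if b != a:
            changed[key] = (b, a)
    return changed

before = {"kernel": "6.8.0-47-generic", "groups": "2: 01:00.0 01:00.1"}
after  = {"kernel": "6.8.0-48-generic", "groups": "2: 01:00.0 01:00.1 00:14.0"}
print(diff_snapshots(before, after))
```

An expected diff ("kernel" changed, nothing else) is a green light. A group listing that changed alongside the kernel is exactly the canary signal from mini-story 3 — pause the rollout and investigate.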
A reliability idea worth stealing
Paraphrased idea (from Werner Vogels, Amazon CTO): “You build it, you run it.” The team that ships the change should own the operational outcome.
Passthrough is not a “hardware problem” or “kernel problem” you throw over a wall. If you operate it, you need a regression checklist and a rollback path.
FAQ
1) My IOMMU groups changed after a kernel update. Is that normal?
It’s not rare. Grouping is derived from PCIe topology plus what the kernel believes about isolation features (ACS/ARI/quirks). If a kernel changes how it trusts a bridge, groups can merge or split. Treat it as a regression to triage, not as “the system is haunted.”
2) Should I use pcie_acs_override=downstream,multifunction to fix grouping?
Only if you understand the security and correctness tradeoff. ACS override forces the kernel to pretend isolation exists. In a lab box, maybe fine. In shared or sensitive environments, avoid it and fix topology (slots, platform, different motherboard).
3) What’s the fastest way to tell if passthrough is failing because of binding?
lspci -nnk. If “Kernel driver in use” isn’t vfio-pci for the device you want to pass through (and its companion functions), you don’t have passthrough yet. Everything else is secondary.
4) Why do I have to pass through the GPU audio function too?
Because it’s a separate PCI function on the same device. Leaving it on the host can prevent proper reset, confuse the guest, or trigger group/ownership conflicts. Treat GPU + GPU-audio as a pair unless you have a specific reason and proof it’s safe to split.
5) My VM works only after a full host reboot. What’s going on?
Reset behavior. Some devices don’t reset cleanly when detached from a guest, especially consumer GPUs. The device ends up in a state the next VM start can’t recover from. Workarounds include avoiding frequent VM restarts, using different hardware, or accepting host reboots between runs.
6) Does iommu=pt help or hurt?
It can help performance for host-owned devices by reducing translation overhead, but it’s not a free lunch. If you hit platform quirks or complex device behavior, remove it during troubleshooting. Prove correctness first, then optimize.
7) How do Secure Boot and kernel updates affect passthrough?
If you rely on out-of-tree modules (common with vendor GPU drivers), Secure Boot can block them after an update unless they’re signed/enrolled properly. That changes driver binding and can break your setup indirectly. Check dmesg for lockdown and module verification messages.
8) I see IOMMU faults in dmesg. Is that always fatal?
Not always, but it’s never “fine.” Faults mean a device attempted DMA outside permitted mappings. If it’s associated with your passthrough device, treat it as a serious signal: incomplete passthrough, reset quirks, buggy drivers, or platform instability.
9) Can I hot-plug passthrough devices safely?
Sometimes, but don’t assume it. Hot-plug adds complexity: power states, reset semantics, and enumeration races. For production reliability, prefer static assignment with well-tested hardware.
10) What should I snapshot before kernel updates on passthrough hosts?
Kernel version, boot args, IOMMU enable lines, IOMMU group inventory, and driver binding for assigned devices. If you can’t compare “before” and “after,” you’ll guess, and guessing is expensive.
Conclusion: next steps that stick
When a kernel update breaks passthrough, the winning move is refusing to thrash. Verify IOMMU activation, then VFIO binding, then groups, then faults/reset quirks. That order prevents you from “fixing” three unrelated things and learning nothing.
Do these next:
- Build a one-command “passthrough health snapshot” script and store its output before and after updates.
- Canary reboot one host per hardware SKU before rolling patches fleet-wide.
- Write down your policy on ACS override and stick to it.
- For devices with bad reset behavior, change the operational pattern (fewer restarts, cold reboots) or change the hardware. Heroics are not a strategy.
If you do nothing else: the next time this breaks, don’t start in the VM config. Start with dmesg, /proc/cmdline, and lspci -nnk. The machine is already telling you the truth. Your job is to listen in the right order.