You did the thing. You enabled PCIe passthrough on Proxmox, stared at IOMMU groups that looked like a junk drawer,
and someone on a forum said the magic words: “ACS override.” One reboot later your GPU finally shows up inside the VM.
You feel clever. You go to bed.
And then the host starts acting haunted: intermittent VM freezes, NVMe timeouts, spontaneous reboots under load,
or that slow drip of “corrected” PCIe errors that turns into an outage on a Friday. ACS override didn’t “fix” your platform.
It told the kernel to pretend your hardware is more isolated than it really is. Sometimes that’s fine. Sometimes it’s a trap.
What ACS override actually does (and what it does not)
In PCIe land, you don’t “pass through a device” so much as you convince the host to stop touching it and then give a VM
direct ownership via VFIO. The security and stability of that arrangement depend on isolation boundaries that the platform
provides: IOMMU grouping, ACS capabilities, interrupt remapping, reset behavior, and plain old PCIe topology.
ACS (Access Control Services) is a set of PCIe features that can enforce how transactions are routed and can prevent peer-to-peer
DMA where it shouldn’t happen. In practice, ACS is a big reason why devices can end up in separate IOMMU groups. If ACS isn’t present
(or isn’t enabled) on a given downstream path, the kernel may have to assume that devices behind the same PCIe bridge can talk to each other.
That assumption forces them into the same IOMMU group.
The Linux “ACS override” option is not your motherboard suddenly learning new tricks. It’s a kernel parameter that can
artificially split IOMMU groups by pretending ACS isolation exists where the kernel can’t prove it.
That’s useful when you want to pass through one device from a group that contains other stuff you want to keep on the host.
Here’s the uncomfortable part: with ACS override, you can create a configuration that looks isolated to VFIO and the VM manager,
but is not isolated on the bus. If two devices can still do peer-to-peer DMA or share an upstream path without proper isolation,
you’ve built a system that is “working” until it doesn’t. You can see crashes, data corruption risks, or weird performance pathologies
that do not reproduce cleanly.
If you’re doing this on a homelab, your risk tolerance may be “acceptable.” In production—or anything you’ll be blamed for—treat ACS override
like a temporary diagnostic tool, not a permanent architecture.
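For reference, the override is nothing more than a kernel command-line flag, which is part of why it’s tempting. A minimal sketch of where it lives on a Proxmox host, assuming the usual GRUB and systemd-boot layouts (the flag value here matches the one you’ll see audited later):

```shell
# On GRUB-booted hosts, the flag goes in /etc/default/grub, e.g.:
#   GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on iommu=pt pcie_acs_override=downstream,multifunction"
# Then regenerate the boot config and reboot:
update-grub
# On systemd-boot installs (e.g. ZFS root), edit /etc/kernel/cmdline instead, then:
proxmox-boot-tool refresh
```

Removing it is the same edit in reverse, which is exactly what makes it a usable temporary diagnostic: one flag, one reboot, out.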
What ACS override changes
- It changes group formation. Devices may appear in separate IOMMU groups even if the platform doesn’t enforce that separation.
- It changes what Proxmox lets you do. You can bind a single device to VFIO without dragging the whole group along.
What ACS override does not change
- It does not add hardware isolation. If the PCIe path lacks ACS, it still lacks ACS.
- It does not fix broken resets. GPU “reset bugs” and FLR issues remain.
- It does not guarantee DMA safety. In the worst case, a passed-through device can still influence host memory or other devices.
- It does not fix interrupt remapping problems. If your platform can’t remap interrupts properly, you can still get hard-to-debug lockups.
One quote that should live in your head when you’re tempted by magical flags is from John Ousterhout:
“Complexity is incremental.” You rarely notice the first compromise. You absolutely notice the tenth.
Why people reach for ACS override on Proxmox
Proxmox makes VFIO accessible. That’s good. But hardware vendors don’t design consumer platforms to make clean IOMMU groups;
they design them to hit price points and marketing bullet lists. So you get the classic annoyances:
- Your GPU is grouped with the chipset USB controller and the SATA controller.
- Your NVMe slots share a root port with something you need on the host.
- Your HBA sits behind a bridge with other devices you can’t pass through.
- Your platform has IOMMU enabled but interrupt remapping is partial or flaky.
ACS override looks like the easiest exit. And for a lot of people, it “works.” The VM boots, the device appears, benchmarks look great,
and you move on with life.
Joke #1: ACS override is like using duct tape as a structural beam. It holds right up until gravity remembers you exist.
The stability costs: failure modes you can actually hit
Let’s talk about concrete ways this goes sideways. Not theoretical “security researchers might…” stuff. Real operational pain.
1) DMA and peer-to-peer surprises
If two devices are not actually isolated, a passed-through device may be able to reach memory or other devices in ways you didn’t intend.
Even if you’re not worried about malicious behavior, “unexpected peer-to-peer” can manifest as:
- Host instability when the VM driver enables advanced features.
- Device timeouts under load that look like cabling problems (even though it’s PCIe).
- Non-deterministic behavior when multiple VMs contend on shared upstream paths.
2) Interrupt remapping and “random” freezes
Some platforms support IOMMU translation but have weak or buggy interrupt remapping. When you pass through devices that generate a lot of interrupts
(GPUs, NVMe, HBAs, NICs), you can get VM hangs or full host lockups. People blame drivers. Sometimes the driver is fine.
The platform’s interrupt plumbing is the villain.
3) PCIe AER spam that you ignore until it bites
Advanced Error Reporting (AER) logs corrected and uncorrected PCIe errors. Corrected errors can look harmless—until the rate increases and performance tanks,
or until an uncorrected error escalates into a device reset mid-IO. ACS override doesn’t cause AER errors by itself, but it can make you put
high-load devices behind questionable topology, which makes marginal links show their true personality.
4) The “it worked for months” failure
The nastiest outages are the ones that wait. A kernel update changes timing. A VM workload changes interrupt rates. A new driver enables ASPM differently.
Suddenly your previously “stable” passthrough becomes a roulette wheel.
5) Storage corruption risk when you pass through the wrong thing
Passing through an HBA can be a great pattern for ZFS inside a VM—when it’s done on real isolation boundaries. With ACS override on shaky topology,
the risk profile changes. “But ZFS has checksums” is not a force field. Checksums tell you something went wrong; they don’t guarantee you can recover
without downtime, data loss, or both.
6) Performance cliff: shared root complexes and hidden contention
Group splitting can trick you into believing devices have independent paths. They might still share a root port bandwidth budget,
or share a switch uplink, or share chipset lanes with DMI bottlenecks. You can get:
- NVMe latency spikes when the GPU is under load.
- Packet loss or jitter on passthrough NICs during storage bursts.
- CPU soft lockups caused by interrupt storms when the system can’t remap efficiently.
Interesting facts and history (short, useful, and slightly nerdy)
- IOMMU group semantics in Linux are conservative by design. The kernel groups devices together if it can’t prove isolation. That’s intentional paranoia, not laziness.
- ACS came from the PCIe spec’s need to manage multi-function and switched fabrics. Servers with PCIe switches and lots of endpoints needed enforcement features; desktops often don’t bother.
- VT-d (Intel) and AMD-Vi (AMD IOMMU) matured through the virtualization boom. Early implementations existed, but robust passthrough became mainstream only when virtualization became a default, not a niche.
- Proxmox didn’t invent VFIO; it operationalized it. VFIO is a Linux kernel framework; Proxmox is the friendly face that makes people brave enough to click “Add PCI Device.”
- Interrupt remapping was the “oh right” moment. Address translation alone wasn’t enough; safe device assignment needs interrupts handled with similar rigor.
- Consumer chipsets often hang devices off the same upstream ports. That’s why your USB controller and SATA controller can end up married in the same IOMMU group.
- GPU reset behavior has been a recurring pain point for years. FLR support varies, vendor drivers vary, and the result is the famous “works until VM reboot” pattern.
- AER logging is older than many people think. PCIe has long had a way to tell you the link is sick; we just got very good at ignoring it.
- ACS override exists because users asked for it. It’s a pragmatic tool for development and homelabs, not a stamp of architectural approval.
Fast diagnosis playbook (find the bottleneck quickly)
When passthrough is flaky, you can waste days in driver folklore. Don’t. Start with topology and evidence.
Here’s the order that has saved me the most time.
First: confirm you actually have IOMMU + interrupt remapping
- Check kernel command line for AMD/Intel IOMMU settings.
- Check dmesg for “DMAR” (Intel) or “AMD-Vi” and interrupt remapping enabled.
- If interrupt remapping is missing/disabled, expect weird hangs under load.
Second: inspect IOMMU groups and real PCIe topology
- List groups from /sys/kernel/iommu_groups.
- Compare with lspci -t and lspci -vv to see which devices share bridges/root ports.
- If ACS override is enabled, treat group splits as “logical,” not necessarily “physical.”
Third: look for AER, reset, and VFIO errors
- Search logs for AER, DPC, “BAR”, “vfio”, “DMAR fault”, “IOMMU fault”.
- Correlate with workload times. If errors show up when the VM starts/stops, suspect reset/FLR behavior.
Fourth: isolate variables with one change at a time
- Disable ACS override temporarily and see if the issue disappears (even if it breaks your layout).
- Move the device to a different slot/root port if possible.
- Update firmware/BIOS only after you’ve captured baseline logs.
Fifth: decide if this is an architecture problem
If the platform can’t provide stable isolation, you don’t “tune” your way out. You replace hardware, or you change the design:
different NIC, different GPU, different slot, different board, or no passthrough.
Practical tasks: commands, outputs, and decisions (12+)
These are the checks I actually run on Proxmox (Debian-based) when someone says “ACS override fixed it” and I want to know what we just signed up for.
Each task includes: command, sample output, what it means, and the decision you make.
Task 1: Confirm IOMMU is enabled in the running kernel
cr0x@server:~$ cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-6.8.12-4-pve root=/dev/mapper/pve-root ro quiet amd_iommu=on iommu=pt
What it means: You’re booted with AMD IOMMU enabled; iommu=pt uses passthrough mode for host devices (often fine for performance).
Decision: If you don’t see intel_iommu=on or amd_iommu=on, fix that before touching ACS override or VFIO.
Task 2: Verify IOMMU and interrupt remapping in dmesg (Intel)
cr0x@server:~$ dmesg -T | egrep -i "DMAR|IOMMU|remapping" | head -n 20
[Tue Feb 4 10:12:11 2026] DMAR: IOMMU enabled
[Tue Feb 4 10:12:11 2026] DMAR-IR: Enabled IRQ remapping in x2apic mode
What it means: Intel VT-d is active and IRQ remapping is enabled. That’s the good path.
Decision: If you see “IRQ remapping disabled” or nothing about IR, assume higher risk for passthrough stability.
Task 3: Verify IOMMU and interrupt remapping in dmesg (AMD)
cr0x@server:~$ dmesg -T | egrep -i "AMD-Vi|IOMMU|remap|IVRS" | head -n 30
[Tue Feb 4 10:12:10 2026] AMD-Vi: IOMMU performance counters supported
[Tue Feb 4 10:12:10 2026] AMD-Vi: Interrupt remapping enabled
What it means: AMD IOMMU is working and interrupt remapping is enabled.
Decision: No interrupt remapping? Don’t “paper over” with ACS override. Expect trouble with high-interrupt devices.
Task 4: Check whether ACS override is enabled
cr0x@server:~$ grep -R "pcie_acs_override" -n /etc/default/grub /etc/kernel/cmdline 2>/dev/null
/etc/default/grub:6:GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on iommu=pt pcie_acs_override=downstream,multifunction"
What it means: ACS override is explicitly enabled.
Decision: Treat every “clean” IOMMU group as suspect until proven by topology and ACS capability bits.
Task 5: Enumerate IOMMU groups the straightforward way
cr0x@server:~$ for g in /sys/kernel/iommu_groups/*; do echo "Group ${g##*/}:"; for d in "$g"/devices/*; do lspci -nns "${d##*/}"; done; echo; done | head -n 40
Group 12:
03:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:2684] (rev a1)
03:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:22ba] (rev a1)
Group 13:
04:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller [144d:a808]
What it means: GPU and its audio function are grouped together (expected). NVMe is in a different group (nice, maybe).
Decision: If critical host controllers share a group with the device you want, do not jump to ACS override yet. Map the topology first.
Task 6: View PCIe topology as a tree (who shares which bridge)
cr0x@server:~$ lspci -t
-+-[0000:00]-+-00.0
| +-01.0-[01]----00.0
| +-03.0-[02-05]--+-00.0
| | +-00.1
| | \-01.0
| \-08.1-[06]----00.0
What it means: Devices behind the same downstream port may share isolation constraints. Group splitting that contradicts this tree is a red flag.
Decision: If your target device sits under a switch/bridge with other devices, consider moving slots or using a board with better root port separation.
Task 7: Check whether the relevant bridges advertise ACS capability
cr0x@server:~$ lspci -s 00:03.0
00:03.0 PCI bridge: Intel Corporation Device 7a44 (rev 11)
cr0x@server:~$ lspci -vv -s 00:03.0 | grep -iE "access control|acscap|acsctl"
Capabilities: [148 v1] Access Control Services
ACSCap: SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans+
ACSCtl: SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd+
What it means: This bridge actually has ACS. That’s good; group separation here is more likely to be “real.”
Decision: If the bridge lacks ACS capability but you’re relying on ACS override to split devices behind it, you’re betting on a lie.
Task 8: Confirm VFIO bindings for the passthrough device
cr0x@server:~$ lspci -nnk -s 03:00.0
03:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:2684] (rev a1)
Subsystem: Micro-Star International Co., Ltd. [MSI] Device [1462:5162]
Kernel driver in use: vfio-pci
Kernel modules: nvidiafb, nouveau
What it means: The GPU is bound to vfio-pci, not the host graphics driver.
Decision: If you see the vendor driver in use, fix binding before debugging stability. Half-bound devices create phantom issues.
Task 9: Look for IOMMU faults and DMAR/AMD-Vi errors during VM use
cr0x@server:~$ journalctl -k --since "2 hours ago" | egrep -i "DMAR|AMD-Vi|IOMMU fault|vfio|AER" | tail -n 30
Feb 04 09:41:22 server kernel: vfio-pci 0000:03:00.0: vfio_ecap_init: hiding ecap 0x1e@0x258
Feb 04 09:58:03 server kernel: pcieport 0000:00:03.0: AER: Corrected error received: 0000:04:00.0
Feb 04 09:58:03 server kernel: nvme 0000:04:00.0: AER: [0] RxErr (First)
What it means: Corrected PCIe errors on the NVMe link. Not instantly fatal, but it’s telling you the physical link is unhappy.
Decision: If corrected errors occur under load or increase over time, treat it like a failing cable—except the “cable” is your slot, riser, power, or marginal PCIe signaling.
Task 10: Check whether your NVMe is dropping queues or timing out
cr0x@server:~$ journalctl -k --since "24 hours ago" | egrep -i "nvme.*timeout|I/O.*timeout|resetting controller|frozen" | tail -n 20
Feb 03 22:14:19 server kernel: nvme nvme0: I/O 124 QID 7 timeout, aborting
Feb 03 22:14:19 server kernel: nvme nvme0: Abort status: 0x371
Feb 03 22:14:20 server kernel: nvme nvme0: resetting controller
What it means: The NVMe controller is timing out and resetting. If this coincides with GPU/VM load, suspect shared PCIe path contention or signaling issues.
Decision: Don’t blame ZFS first. Fix the transport. Consider moving the NVMe to a CPU-connected slot or disabling aggressive power management.
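If you suspect power management, ASPM is the usual knob to test first. A diagnostic sketch (the kernel parameter is standard; the sysfs path is how current kernels expose the active policy):

```shell
# Check the current ASPM policy (prints something like "default [performance] powersave").
cat /sys/module/pcie_aspm/parameters/policy 2>/dev/null \
    || echo "pcie_aspm module info not exposed on this kernel"
# As a diagnostic only, boot once with ASPM fully disabled via kernel parameter:
#   pcie_aspm=off
# If the NVMe timeouts vanish, you've found the trigger -- not yet the root cause.
```

Treat a positive result as evidence about the link, not as a permanent tuning decision.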
Task 11: Confirm virtualization features and IOMMU visibility
cr0x@server:~$ pveversion -v | head -n 5
proxmox-ve: 8.2.2 (running kernel: 6.8.12-4-pve)
pve-manager: 8.2.2 (running version: 8.2.2/1a3f7d3e)
pve-kernel-6.8: 6.8.12-4
What it means: You can correlate behavior with kernel versions. VFIO regressions and PCIe quirks can be kernel-specific.
Decision: If instability started right after a kernel update, test the previous Proxmox kernel before changing hardware or “tuning” blindly.
Task 12: Inspect QEMU VM config for passthrough options that affect reset and ROM
cr0x@server:~$ cat /etc/pve/qemu-server/101.conf
agent: 1
bios: ovmf
boot: order=scsi0;net0
cpu: host
hostpci0: 0000:03:00,pcie=1,x-vga=1
machine: q35
memory: 16384
name: gpu-vm
net0: virtio=DE:AD:BE:EF:00:01,bridge=vmbr0
ostype: win11
scsi0: local-lvm:vm-101-disk-0,size=200G
What it means: OVMF + q35 + cpu: host is a typical GPU passthrough setup.
Decision: If you see legacy BIOS, i440fx, or strange combinations, standardize first. Debugging bespoke VM configs is a tax you don’t need.
Task 13: Check whether the device supports FLR (Function Level Reset)
cr0x@server:~$ lspci -vv -s 03:00.0 | grep -iE "FLReset|DevCap:"
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 <64us
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 75W
What it means: FLReset- means the function does not advertise Function Level Reset. Plenty of GPUs don’t, and reset behavior still matters operationally.
Decision: If VM reboots fail unless you reboot the host, suspect reset behavior. Consider a different GPU/model or a topology where the device can be power-cycled (e.g., slot power control on some server boards).
Task 14: Validate that the host isn’t accidentally using the passed-through GPU for console
cr0x@server:~$ ls -l /dev/dri
total 0
drwxr-xr-x 2 root root 80 Feb 4 10:12 by-path
crw-rw---- 1 root video 226, 0 Feb 4 10:12 card0
crw-rw---- 1 root render 226, 128 Feb 4 10:12 renderD128
What it means: DRM devices exist. If the passthrough GPU is bound to VFIO, it typically should not appear as an active DRM device for the host stack.
Decision: If the host is using the GPU (framebuffer/DRM), fix that: blacklist drivers, set primary GPU in BIOS, and ensure VFIO binds early.
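“Ensure VFIO binds early” in practice means claiming the device in the initramfs, before the vendor driver can load. A sketch, reusing the vendor:device IDs from this article’s Task 8 output (your IDs will differ; the softdep lines assume the nouveau/nvidiafb modules named there):

```shell
# Hypothetical /etc/modprobe.d/vfio.conf -- IDs are this article's example GPU:
#   options vfio-pci ids=10de:2684,10de:22ba disable_vga=1
#   softdep nouveau pre: vfio-pci
#   softdep nvidiafb pre: vfio-pci
# Rebuild the initramfs so the binding happens before the host driver loads:
update-initramfs -u -k all
```

After the next boot, re-run the Task 8 check and confirm “Kernel driver in use: vfio-pci.”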
Task 15: Confirm that your intended HBA is in a clean group before passthrough
cr0x@server:~$ lspci -nn | egrep -i "SAS|HBA|LSI|Broadcom|MegaRAID"
81:00.0 Serial Attached SCSI controller [0107]: Broadcom / LSI SAS3008 PCI-Express Fusion-MPT SAS-3 [1000:0097] (rev 02)
cr0x@server:~$ readlink -f /sys/bus/pci/devices/0000:81:00.0/iommu_group
/sys/kernel/iommu_groups/26
What it means: The HBA is in group 26. Now list what else is in that group before passing it through.
Decision: If the group contains anything besides the HBA (or its expected functions), don’t rely on ACS override to “make it fine.” Move slots or change board.
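The “list what else is in that group” step above is a one-liner worth keeping around. A small helper, assuming the sysfs layout shown earlier:

```shell
# Print every PCI function that shares an IOMMU group with your device.
list_group() {
    local dir="/sys/kernel/iommu_groups/$1/devices"
    [ -d "$dir" ] || { echo "no such group: $1"; return 1; }
    for d in "$dir"/*; do
        lspci -nns "${d##*/}"   # sysfs entries are full addresses like 0000:81:00.0
    done
}
# Usage: list_group 26
```

If the output is exactly one controller (plus its expected functions), proceed; anything else is the board telling you no.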
Task 16: Check CPU/chipset lane attachment hints (quick-and-dirty)
cr0x@server:~$ lspci -vv -s 04:00.0 | egrep -i "LnkCap|LnkSta"
LnkCap: Port #0, Speed 16GT/s, Width x4
LnkSta: Speed 16GT/s, Width x4
What it means: Link is at expected speed/width. If you see x1 width or lower speed than expected, you’ve found a performance and stability clue.
Decision: Fix physical seating, BIOS PCIe settings, risers, or slot choice before blaming Proxmox or VFIO.
Three corporate mini-stories from the real world
Mini-story 1: The incident caused by a wrong assumption
A mid-sized company built a Proxmox cluster to host build agents and some internal services. They had one GPU box for CI workloads:
browser tests, video encoding, and a few CUDA jobs. The GPU lived in a workstation-class motherboard that looked “server-ish” enough.
The engineer setting it up found ugly IOMMU groups and enabled ACS override. The GPU VM booted. Benchmarks looked great. Everyone moved on.
Their assumption was simple: “If it’s in its own IOMMU group, it’s isolated.” The kernel said it was. So it must be true.
Weeks later, they started seeing sporadic failures in completely unrelated VMs on that same host. Not every day. Not every workload.
The failures were maddening: a VM would freeze for 30–90 seconds, recover, then have corrupted application state. Sometimes the host logged corrected PCIe errors.
Sometimes nothing at all.
The eventual pattern was that the freezes correlated with high GPU DMA activity and high NVMe IO at the same time.
The PCIe topology showed both devices behind a shared upstream port without proper ACS capability. ACS override had split the groups logically,
but the bus could still behave like a shared neighborhood with thin walls.
The fix was boring and expensive: they moved the GPU to a different slot, which moved it to a different root port, and replaced a riser that was marginal at Gen4 speeds.
They disabled ACS override after confirming the groups were clean without it. Stability returned immediately. Nobody missed the cleverness.
Mini-story 2: The optimization that backfired
Another org tried to squeeze maximum performance out of a storage-heavy Proxmox host. They were passing through an HBA to a TrueNAS VM
and also passing through a high-speed NIC to a router VM. The motherboard didn’t give them clean groups. ACS override looked like a way to avoid buying a new platform.
They also enabled iommu=pt and a handful of “performance” BIOS toggles: ASPM tweaks, aggressive power savings disabled, and some PCIe speed forcing.
The goal was low latency and high throughput. The result was… initially impressive benchmarks.
Then came the backfire: under sustained traffic, the NIC VM would occasionally drop link for a second. Not long enough to trigger obvious alarms,
but long enough to cause TCP retransmits and application timeouts upstream. Meanwhile the storage VM logged occasional controller resets.
The host itself didn’t crash, so everyone treated it as a guest problem.
It wasn’t. It was contention and error recovery on a shared PCIe path that ACS override had made easy to ignore.
The “optimization” increased load and pushed the link into an error-recovery regime. Corrected AER errors climbed. Latency got spiky.
They rolled back the BIOS forcing, then moved one device to a CPU-connected slot. Performance dropped slightly on paper, but the network stopped hiccuping.
The service graphs looked boring again. Boring is the target.
Mini-story 3: The boring but correct practice that saved the day
A financial services team ran Proxmox for internal workloads with a strict change management process (yes, they exist).
They wanted GPU passthrough for a small analytics workload. The hardware was a proper server motherboard with clear root ports and validated VT-d.
The lead SRE insisted on a checklist: capture PCIe topology, record IOMMU groups before changes, validate interrupt remapping, and run a burn-in test.
No ACS override unless they could prove ACS capability on the upstream path. This was unpopular. People wanted the VM today.
During burn-in, they caught a steady stream of corrected AER errors on the GPU’s upstream port at Gen4. The system was “working,” but the logs were not happy.
They swapped the GPU to another slot and the errors vanished. It was likely seating or signal integrity on that slot, but they didn’t need a philosophical answer.
They needed clean operation.
Months later, a firmware update changed PCIe equalization behavior and a different team started seeing AER noise on similar hosts.
This team already had baseline logs and topology snapshots, so they could immediately compare “before” and “after.”
They avoided days of guesswork and scoped the issue to a specific BIOS revision.
Their practice wasn’t glamorous. It was also the reason nobody got paged at 2 a.m.
Do this instead: safer ways to get passthrough
If you’re tempted by ACS override, it usually means the platform isn’t giving you clean boundaries. The right response is to fix boundaries,
not to pretend they exist.
Option A: Use the right slot (topology is a feature)
Many boards have one or two slots wired directly to CPU root complexes and others wired through the chipset.
CPU-connected slots tend to have cleaner isolation and fewer “everything shares everything” surprises.
Move the device. Yes, physically. It’s amazing how many “VFIO problems” are actually “you chose the slot the board designer used for convenience.”
Option B: Pick hardware with real ACS and sane IOMMU grouping
Server boards, workstation boards, and some higher-end consumer boards provide better ACS behavior and more root ports.
This is not guaranteed by brand; it’s model-specific. You want:
- Clear separation of CPU lanes vs chipset lanes
- Proper ACS capability on bridges/root ports
- Validated VT-d/AMD-Vi with interrupt remapping
Option C: Pass through the whole group (when it’s reasonable)
The kernel groups devices together because it can’t prove isolation. Sometimes the simplest stable move is to accept that:
pass through the entire IOMMU group to a single VM and keep the host away from those devices entirely.
This works well when the group is “GPU + GPU audio” or “USB controller you can spare.” It works poorly when the group includes storage or network
you need for the host. That’s your signal that the platform is the problem, not Proxmox.
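In Proxmox terms, “pass through the whole group” just means every device in the group gets a hostpci entry in the same VM. A hypothetical config fragment (addresses follow this article’s examples; an entry without a function suffix already covers all functions of one device):

```shell
# Hypothetical fragment of /etc/pve/qemu-server/<vmid>.conf:
#   hostpci0: 0000:03:00,pcie=1,x-vga=1   # GPU + its audio function (all functions of 03:00)
#   hostpci1: 0000:04:00.0,pcie=1         # a second device from the same group, if any
```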
Option D: Stop passing through the wrong class of device
If you want storage performance, you don’t always need HBA passthrough. Sometimes a virtio-scsi setup on the host with ZFS is simpler and safer.
If you want GPU acceleration for a specific application, sometimes a container with proper device access is enough (and avoids bus-level risk).
Option E: If you must use ACS override, scope it and test like you mean it
Sometimes you’re stuck. Budget, hardware constraints, time. If you decide to use ACS override anyway, treat it like a controlled burn:
- Enable it only to validate feasibility, then try to remove it by changing slots/hardware.
- Burn-in test under realistic IO and interrupt load.
- Monitor AER, IOMMU faults, resets, and latencies continuously.
- Have a rollback path and document why this risk was accepted.
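“Monitor continuously” can start as simple as a periodic log sample; a sketch using the same patterns as the tasks above (wire it to cron or a systemd timer during burn-in):

```shell
# One-shot health sample over the last 10 minutes of kernel logs.
since="10 min ago"
aer=$(journalctl -k --since "$since" 2>/dev/null | grep -ci 'AER' || true)
flt=$(journalctl -k --since "$since" 2>/dev/null | grep -ciE 'DMAR|AMD-Vi|IOMMU fault' || true)
rst=$(journalctl -k --since "$since" 2>/dev/null | grep -ciE 'resetting controller|vfio' || true)
printf '%s aer=%s iommu=%s resets=%s\n' "$(date -Is)" "$aer" "$flt" "$rst"
```

Graph the counters. A flat line of zeros is the goal; a slow upward trend is your early warning.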
Joke #2: If your availability plan is “I enabled ACS override and prayed,” congratulations—you’ve implemented faith-based infrastructure.
Common mistakes: symptom → root cause → fix
Here are the repeat offenders. Not moral failings. Just patterns.
1) Symptom: VM runs fine until reboot; then GPU passthrough fails
Root cause: GPU reset behavior (no FLR or buggy reset path). Device doesn’t return to a clean state after guest shutdown/reboot.
Fix: Try a different GPU model, ensure the device isn’t shared, consider host reboot as last resort, and avoid ACS override masking topology issues. If stability matters, choose hardware known for sane reset behavior.
2) Symptom: Host hard-freezes under IO or GPU load
Root cause: Missing/disabled interrupt remapping, or platform IOMMU bugs triggered by high interrupt rates.
Fix: Confirm dmesg shows interrupt remapping enabled. Update BIOS/firmware. If IR cannot be enabled reliably, don’t do passthrough of high-interrupt devices on that platform.
3) Symptom: Storage latency spikes when GPU VM is busy
Root cause: Shared root port/switch uplink bandwidth contention, often made invisible by ACS override group splitting.
Fix: Move devices to different CPU root ports; avoid chipset lanes for both heavy devices. Verify with lspci -t and link width/speed.
4) Symptom: AER corrected errors slowly climb in logs
Root cause: Marginal PCIe signaling (slot seating, riser, power delivery, forced Gen speed, or board layout).
Fix: Reseat device, remove risers, revert forced PCIe settings, try a different slot, and consider limiting link speed (as a diagnostic) to see if errors disappear.
5) Symptom: “Device is in its own IOMMU group” but weird cross-device effects happen
Root cause: ACS override created logical groups without real ACS enforcement.
Fix: Disable ACS override and re-check groups. If separation disappears, don’t rely on the fake split—change hardware/topology.
6) Symptom: VFIO binds, but device still appears used by host drivers
Root cause: Driver binding order issues; missing initramfs VFIO configuration; host console using GPU.
Fix: Ensure VFIO modules load early, blacklist conflicting modules, set primary display in BIOS to iGPU/other GPU, and re-check lspci -nnk.
7) Symptom: Passing through an HBA causes sporadic ZFS errors inside the storage VM
Root cause: HBA shares a group/topology with other devices; PCIe errors/reset events; or power management quirks.
Fix: Put HBA on a clean root port; avoid ACS override; use enterprise HBAs with stable firmware; watch for AER and controller resets.
Checklists / step-by-step plan
Checklist A: Before you enable ACS override (the “don’t regret this” pass)
- Confirm IOMMU is enabled in /proc/cmdline.
- Confirm interrupt remapping is enabled in dmesg (DMAR-IR or AMD-Vi interrupt remapping).
- Capture lspci -t output for topology.
- Capture the IOMMU group listing from /sys/kernel/iommu_groups.
- Check the relevant bridges for ACS capability using lspci -vv.
- Decide whether you can pass the whole group instead.
- Decide whether moving slots can fix grouping without override.
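The “capture” steps in this checklist collapse into one small script; a sketch (the output directory is an assumption, put snapshots wherever you keep host records):

```shell
# Snapshot boot parameters, PCIe topology, and IOMMU groups for later diffing.
out="/tmp/passthrough-baseline-$(date +%Y%m%d)"   # assumed path; pick your own
mkdir -p "$out"
cat /proc/cmdline > "$out/cmdline.txt" 2>/dev/null || true
lspci -tvnn > "$out/lspci-tree.txt" 2>/dev/null || true
for g in /sys/kernel/iommu_groups/*; do
    echo "Group ${g##*/}:"
    ls "$g/devices" 2>/dev/null
done > "$out/iommu-groups.txt"
echo "baseline saved to $out"
```

Re-run it after every kernel or BIOS update and diff the files. Grouping changes should be a decision, not a surprise.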
Checklist B: If you already enabled ACS override (stabilize or back out)
- Document the exact kernel parameters and Proxmox version.
- Run a 2–4 hour burn-in that matches your real workload (GPU + storage + network).
- Monitor journalctl -k for AER, IOMMU faults, and VFIO reset messages.
- If you see AER corrected errors rising, treat it as a hardware/topology problem first.
- Try moving the device to a different slot and repeat burn-in.
- Try disabling ACS override and re-check if grouping is now acceptable.
- If you can’t remove ACS override, set expectations: higher operational risk, more monitoring, and a tested rollback plan.
Checklist C: A production-grade PCI passthrough acceptance test
- Cold boot host, start VM, run load for 60 minutes.
- Stop VM, start VM again (reset path test), run load again.
- Migrate unrelated VMs off and on the host if your environment does that (stress scheduling).
- Capture AER counts and compare start vs end.
- Validate no NVMe resets, no NIC link flaps, no host soft lockups.
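The “compare start vs end” step is literally two greps around the load run; a sketch using the same AER pattern as the earlier tasks:

```shell
# Count AER messages for this boot before the test...
before=$(journalctl -k -b 2>/dev/null | grep -c 'AER' || true)
# ...run the 60-minute load, then the VM stop/start cycle...
after=$(journalctl -k -b 2>/dev/null | grep -c 'AER' || true)
echo "AER messages logged during the test: $((after - before))"
# Anything above zero deserves an explanation before you call the host accepted.
```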
FAQ
1) Is ACS override “unsafe” or just “unsupported”?
It’s a deliberate trade. It can reduce isolation guarantees by creating IOMMU group splits the hardware may not enforce.
That’s a safety and security concern, not just a support concern.
2) If my VM works, why should I care about “real” isolation?
Because your failure mode may be rare and workload-dependent. Isolation problems tend to show up as intermittent hangs, resets, and performance cliffs.
Those are the worst kinds of incidents to debug.
3) Does ACS override always cause instability?
No. On some topologies the practical risk is low, and the override mainly helps the kernel be less conservative. But you usually don’t know which case you’re in
unless you inspect ACS capabilities and topology.
4) Can I use ACS override only for one device?
The kernel parameter affects grouping behavior broadly (depending on mode). You can’t neatly constrain it to a single endpoint the way you can with VFIO binding.
Treat it as a host-wide behavioral change.
5) What’s the best alternative when my GPU shares a group with SATA or USB?
Move the GPU to a different slot/root port, or change the board. If that’s impossible, consider passing through a different GPU or using a platform built for virtualization.
If you must, pass through the whole group—only if you can spare everything in it.
6) I’m doing ZFS in a VM with an HBA passthrough. Is ACS override ever acceptable?
It’s a risk multiplier. If the HBA isn’t truly isolated, you’re creating a path where bus-level weirdness can impact storage reliability.
For storage, I’m stricter: avoid ACS override unless you’ve validated ACS on the upstream path and you’ve burn-in tested under real IO.
7) What should I watch in logs to catch trouble early?
AER messages, “resetting controller” for NVMe, IOMMU faults, DMAR/AMD-Vi errors, and VFIO messages around device reset or BAR mapping.
If corrected errors trend upward, don’t ignore them.
8) Does iommu=pt make passthrough less safe?
It identity-maps DMA for devices the host keeps for itself, skipping translation overhead for them; devices assigned to VMs are still fully translated. For most Proxmox setups it’s a common, reasonable default.
Your biggest safety lever is still real IOMMU grouping and interrupt remapping.
9) Can a kernel update change IOMMU groupings?
Yes. Kernel PCI quirks and ACS handling can change. That’s why you should capture “known good” groupings and topology, and re-validate after major updates.
10) When is ACS override a reasonable temporary tool?
When you’re validating feasibility, doing a lab proof, or unblocking a migration while you wait for correct hardware. Temporary means you’ve already planned the exit:
different slot, different board, or different architecture.
Next steps you can take this week
If you’re currently running ACS override on a Proxmox host that matters, do not “set and forget” it. Do these next:
- Capture evidence: save /proc/cmdline, lspci -t, and a full IOMMU group listing.
- Verify remapping: confirm interrupt remapping in dmesg. If it’s not there, stop calling it stable.
- Audit bridges: check ACS capability on the upstream path of the devices you pass through.
- Burn-in test: run the real workload and watch for AER/IOMMU/NVMe reset messages. If logs are noisy, fix physical/topology issues first.
- Try to remove the override: move slots, pass through whole groups, or change hardware until groups are clean without the kernel pretending.
The goal isn’t purity. The goal is predictable failure domains. If ACS override is the only thing between you and a functioning system,
that’s a signal to redesign—not a reason to celebrate.