You did the “right” things: enabled IOMMU, added the PCI device, picked OVMF, installed drivers. The VM boots.
And then… nothing. No signal. No cursor. Just a black screen, the universal symbol for “your weekend plans are canceled.”
A Proxmox GPU passthrough black screen is rarely mysterious. It’s usually one of a small set of failure modes:
the GPU is bound to the wrong driver, the VM firmware isn’t compatible, the GPU never resets, the display is on the wrong output,
or Windows decided to be Windows. This is a production-grade checklist for getting to root cause quickly, with commands,
expected outputs, and the decision you make next.
Fast diagnosis playbook (check first/second/third)
When the screen is black, don’t thrash. A GPU passthrough pipeline is long. Your goal is to find which segment is broken:
host device isolation, VM firmware initialization, guest driver, or display routing.
First: confirm the host actually handed the GPU to VFIO
- Check binding: the GPU must be bound to vfio-pci, not nvidia, amdgpu, or nouveau.
- Check IOMMU is on: if DMA remapping is off, passthrough may half-work and then fail weirdly.
- Check IOMMU groups: if your GPU is in a group with other devices you didn’t pass through, you’ll get inconsistent failures.
Second: confirm the VM can initialize the device (firmware + machine type)
- OVMF vs SeaBIOS: modern GPUs often need UEFI (OVMF). Some older cards behave better with SeaBIOS. Pick deliberately.
- Q35 vs i440fx: Q35 is the default for a reason (PCIe). Many GPU passthrough issues evaporate on Q35.
- Primary GPU setting: if the VM expects a display device but you removed it, you might be staring at nothing because you asked for nothing.
Third: confirm the guest driver is healthy and output is routed correctly
- Windows: check Device Manager errors (Code 43, Code 31) and the event log.
- Linux guests: check dmesg for amdgpu/nvidia initialization, and confirm which output is hot.
- Display routing: test different ports (DP vs HDMI) and disable high refresh + HDR while debugging.
One quote worth keeping in your head while you debug, paraphrasing Richard Feynman:
“Reality must take precedence over public relations.” In ops terms: trust what the kernel says, not what your UI implies.
Interesting facts and short history (why this is so fussy)
- Intel VT-d (DMA remapping) shipped widely years after CPU virtualization; early hypervisors could run VMs but not safely pass devices.
- VFIO replaced older PCI passthrough frameworks in Linux because it’s safer and integrates with IOMMU properly.
- OVMF is essentially UEFI firmware for VMs; as GPUs became more UEFI-centric, legacy BIOS paths started failing more often.
- GPUs are not “just PCIe devices”: many are also tiny computers with their own firmware, memory training, and boot-time state.
- NVIDIA’s “Code 43” drama became a rite of passage for passthrough users; it’s less common now but still appears in edge cases.
- Reset bugs are real: some GPUs don’t fully reset after VM shutdown, so the next boot gets a device stuck in a bad state.
- ACS (Access Control Services) affects PCIe isolation; consumer boards may group devices together even when you’d like them separate.
- Resizable BAR is a performance feature that can change how memory is mapped; it’s great until it’s not, especially across firmware boundaries.
- DisplayPort link training can fail after warm resets; a “black screen” can be a monitor negotiation issue, not a PCI issue.
The mental model: where the black screen happens
A “black screen” isn’t one problem; it’s a symptom. The GPU passthrough stack has layers, and each layer can fail in a way that looks identical
from your chair.
- Hardware isolation: IOMMU must translate DMA, and the PCIe topology must let you isolate the GPU cleanly.
- Host driver binding: the host must not claim the GPU with a native driver. VFIO must own it before the VM starts.
- VM firmware + machine model: OVMF/Q35 must enumerate PCIe, assign BARs, and initialize option ROM paths correctly.
- Guest driver: Windows/Linux must load a driver that can handle the environment, power management, and display initialization.
- Display output: the GPU may be fine, but output is on a different port, the monitor negotiation failed, or the GPU is waiting for a mode set.
Dry-funny truth: the GPU is innocent until proven guilty, but it will still ruin your day like a cat with access to a keyboard.
Practical tasks: commands, outputs, decisions (14 of them)
These are the tasks I run in roughly this order. Each one has: command, what output means, and what decision you make next.
They’re written for a Proxmox VE host (Debian-based).
Task 1: Confirm IOMMU is actually enabled
cr0x@server:~$ dmesg | egrep -i 'DMAR|IOMMU|AMD-Vi' | head -n 30
[ 0.000000] DMAR: IOMMU enabled
[ 0.012345] DMAR: Intel(R) Virtualization Technology for Directed I/O
[ 0.045678] DMAR: DRHD base: 0x000000fed90000 flags: 0x1
Meaning: You want to see “IOMMU enabled” (Intel DMAR) or “AMD-Vi: IOMMU enabled”.
Decision: If you don’t see it, stop. Fix BIOS settings and kernel parameters before touching VM config.
Task 2: Verify kernel cmdline parameters
cr0x@server:~$ cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-6.8.12-4-pve root=/dev/mapper/pve-root ro quiet intel_iommu=on iommu=pt
Meaning: intel_iommu=on or amd_iommu=on should be present. iommu=pt is commonly used for performance.
Decision: If missing, edit GRUB, update-grub, reboot. Don’t “hope it’s fine.”
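If a flag is missing, the edit itself is small. A minimal sketch for a GRUB-booted Intel host; AMD hosts use amd_iommu=on, and hosts that boot via systemd-boot (for example ZFS root installs) edit /etc/kernel/cmdline and run proxmox-boot-tool refresh instead of update-grub:
cr0x@server:~$ grep CMDLINE_LINUX_DEFAULT /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"
cr0x@server:~$ update-grub
cr0x@server:~$ reboot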
Task 3: Identify GPU PCI IDs (and the audio function)
cr0x@server:~$ lspci -nn | egrep -i 'vga|3d|display|audio'
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU104 [GeForce RTX 2080] [10de:1e87] (rev a1)
01:00.1 Audio device [0403]: NVIDIA Corporation TU104 HD Audio Controller [10de:10f8] (rev a1)
Meaning: GPUs are often multi-function devices. The audio function matters for resets and for Windows driver sanity.
Decision: Plan to pass through both functions unless you have a reason not to.
Task 4: Check what driver currently owns the GPU
cr0x@server:~$ lspci -nnk -s 01:00.0
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU104 [GeForce RTX 2080] [10de:1e87] (rev a1)
Subsystem: Micro-Star International Co., Ltd. [MSI] TU104 [1462:3751]
Kernel driver in use: vfio-pci
Kernel modules: nvidiafb, nouveau
Meaning: “Kernel driver in use: vfio-pci” is what you want.
Decision: If it says nouveau, nvidia, or amdgpu, fix binding before running the VM.
Task 5: Confirm VFIO modules are loaded
cr0x@server:~$ lsmod | egrep 'vfio|kvm'
vfio_pci 16384 0
vfio_pci_core 69632 1 vfio_pci
vfio_iommu_type1 45056 0
vfio 57344 2 vfio_pci_core,vfio_iommu_type1
kvm_intel 425984 0
kvm 1376256 1 kvm_intel
Meaning: If VFIO isn’t present, passthrough will be flaky or fail outright.
Decision: Load modules, fix initramfs configuration, reboot if needed.
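On Proxmox the usual way to make VFIO stick is listing the modules in /etc/modules and regenerating the initramfs. A minimal sketch; on kernels before 6.2 you may also need vfio_virqfd, which newer kernels fold into vfio:
cr0x@server:~$ cat /etc/modules
vfio
vfio_iommu_type1
vfio_pci
cr0x@server:~$ update-initramfs -u -k all
cr0x@server:~$ reboot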
Task 6: Inspect IOMMU group isolation for the GPU
cr0x@server:~$ for d in /sys/kernel/iommu_groups/*/devices/*; do echo "$(basename "$(dirname "$(dirname "$d")")") $(basename "$d")"; done | sort -n | egrep '01:00'
12 0000:01:00.0
12 0000:01:00.1
Meaning: Ideally, the GPU and its audio function are in a group by themselves (or with only what you also pass through).
Decision: If the group includes chipset devices or an NVMe you need on the host, you likely need different slot placement or ACS override.
Task 7: Verify Proxmox VM config for correct machine/firmware
cr0x@server:~$ qm config 120
agent: 1
bios: ovmf
boot: order=scsi0;net0
cores: 8
cpu: host,hidden=1,flags=+pcid
machine: q35
memory: 16384
name: win11-gpu
ostype: win11
scsi0: local-lvm:vm-120-disk-0,discard=on,iothread=1
hostpci0: 0000:01:00.0,pcie=1,x-vga=1
hostpci1: 0000:01:00.1,pcie=1
Meaning: bios: ovmf and machine: q35 are usually correct. x-vga=1 can help when you want GPU as primary.
Decision: If you’re on i440fx/SeaBIOS with a modern GPU, switch to Q35/OVMF unless you have evidence not to.
Task 8: Read QEMU log for device assignment errors
cr0x@server:~$ journalctl -u pve-qm@120 --no-pager -n 120
... qemu-system-x86_64: vfio-pci 0000:01:00.0: BAR 1: can't reserve [mem 0x...] ...
... qemu-system-x86_64: vfio-pci 0000:01:00.0: failed to setup container for group 12: Failed to set iommu for container: Operation not permitted
... VM 120 start failed: QEMU exited with code 1
Meaning: BAR reservation issues, group errors, and permission problems show up here first.
Decision: If QEMU can’t start, don’t chase guest drivers. Fix the host/firmware layer.
Task 9: Check for host console takeover (framebuffer conflicts)
cr0x@server:~$ dmesg | egrep -i 'fbcon|efifb|simplefb|vesafb|nouveau|amdgpu' | head -n 60
[ 1.234567] efifb: framebuffer at 0xe1000000, using 3072k, total 3072k
[ 1.234890] fbcon: Deferring console take-over
Meaning: If the host binds a framebuffer to the GPU, VFIO binding may race or fail.
Decision: If you see the host grabbing the target GPU, blacklist the relevant modules and consider disabling EFI framebuffer for that device.
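What “disabling the EFI framebuffer” looks like depends on kernel version, so treat this as a hedged sketch rather than a universal recipe: older kernels honor video=efifb:off, while newer ones use the simplefb/sysfb path, where initcall_blacklist=sysfb_init is the commonly used switch. Either way it goes on the kernel command line next to the IOMMU flags from Task 2:
cr0x@server:~$ grep CMDLINE_LINUX_DEFAULT /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt initcall_blacklist=sysfb_init"
cr0x@server:~$ update-grub && reboot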
Task 10: Confirm the VM sees the GPU in-guest (Windows via agent-less hint)
cr0x@server:~$ qm monitor 120
(qemu) info pci
Bus 0, device 1, function 0:
VGA controller: PCI device 10de:1e87
BAR0: 64 bit memory at 0x00000000f6000000 [0x01000000]
BAR1: 64 bit memory at 0x00000000e0000000 [0x10000000]
Meaning: QEMU enumerates the device and maps BARs. This doesn’t prove drivers work, but it proves you’re past basic enumeration.
Decision: If it’s missing, go back to VM config (hostpci lines, pcie=1) and IOMMU group checks.
Task 11: Detect reset-related issues (GPU stuck after shutdown)
cr0x@server:~$ dmesg | egrep -i 'vfio-pci|reset|FLR|D3' | tail -n 60
[ 912.345678] vfio-pci 0000:01:00.0: not ready 1023ms after bus reset; waiting
[ 913.369012] vfio-pci 0000:01:00.0: not ready 2047ms after bus reset; giving up
Meaning: The GPU didn’t reset cleanly. That often yields a black screen on the next VM boot.
Decision: Apply a reset workaround (vendor-reset for some AMD GPUs, full host reboot, or choose a GPU known to reset cleanly).
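Before reaching for workarounds, check which reset mechanisms the kernel thinks the device supports. On reasonably recent kernels the sysfs reset_method attribute lists them; seeing flr there is the good case, and the value below is illustrative. The vendor-reset module is a third-party DKMS module for specific AMD generations, not something every card needs:
cr0x@server:~$ cat /sys/bus/pci/devices/0000:01:00.0/reset_method
flr bus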
Task 12: Check interrupt remapping and kernel warnings
cr0x@server:~$ dmesg | egrep -i 'interrupt remapping|IR:|x2apic|ATS|posted interrupt' | head -n 80
[ 0.123456] DMAR-IR: Enabled IRQ remapping in x2apic mode
[ 0.123789] DMAR-IR: Queued invalidation will be enabled to support x2apic and Intr-remapping.
Meaning: IRQ remapping reduces “mysterious device lockups under load” and improves isolation.
Decision: If you see warnings about IRQ remapping being disabled, expect instability. Fix BIOS/firmware or consider kernel flags appropriate to your platform.
Task 13: Confirm the GPU ROM situation (only if needed)
cr0x@server:~$ echo 1 | sudo tee /sys/bus/pci/devices/0000:01:00.0/rom >/dev/null
cr0x@server:~$ sudo cat /sys/bus/pci/devices/0000:01:00.0/rom | head -c 16 | hexdump -C
00000000  55 aa 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |U...............|
00000010
Meaning: A ROM header starts with 55 aa. Some GPUs need a dumped ROM to initialize in a VM, especially if not the primary GPU.
Decision: If ROM reads are blocked or garbage, consider providing a known-good ROM file in the VM config (and only then).
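If you do go the ROM route, the dump-and-attach flow is short. A hedged sketch: the filename is arbitrary, Proxmox resolves romfile relative to /usr/share/kvm/, and flip the rom attribute back to 0 when you’re done:
cr0x@server:~$ echo 1 | sudo tee /sys/bus/pci/devices/0000:01:00.0/rom >/dev/null
cr0x@server:~$ sudo cat /sys/bus/pci/devices/0000:01:00.0/rom | sudo tee /usr/share/kvm/rtx2080.rom >/dev/null
cr0x@server:~$ echo 0 | sudo tee /sys/bus/pci/devices/0000:01:00.0/rom >/dev/null
cr0x@server:~$ qm set 120 --hostpci0 0000:01:00.0,pcie=1,x-vga=1,romfile=rtx2080.rom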
Task 14: Verify hugepages / memory backing didn’t break DMA
cr0x@server:~$ grep -H . /proc/meminfo | egrep 'HugePages|Hugepagesize'
/proc/meminfo:HugePages_Total: 8192
/proc/meminfo:HugePages_Free: 8000
/proc/meminfo:Hugepagesize: 2048 kB
Meaning: Hugepages are fine, but mis-sizing or starving the host can cause odd failures that look like GPU issues.
Decision: If the host is under memory pressure or hugepages are fragmented, roll back “performance tuning” until the VM works reliably.
10 causes and fixes (black screen edition)
1) The host is still using the GPU (wrong driver binding)
This is the classic. You passed through the GPU, but the host loaded nvidia, amdgpu, or nouveau first.
The VM starts, but the GPU is half-owned and behaves like it’s being pulled in two directions. Black screen is a common outcome.
Fix: bind the device to vfio-pci early (initramfs), and blacklist conflicting modules.
cr0x@server:~$ cat /etc/modprobe.d/vfio.conf
options vfio-pci ids=10de:1e87,10de:10f8 disable_vga=1
cr0x@server:~$ cat /etc/modprobe.d/blacklist-gpu.conf
blacklist nouveau
blacklist nvidia
blacklist nvidiafb
cr0x@server:~$ update-initramfs -u -k all
update-initramfs: Generating /boot/initrd.img-6.8.12-4-pve
Decision: After reboot, re-check with lspci -nnk. If it’s not vfio-pci, you’re not done.
2) IOMMU groups are dirty (you can’t isolate the GPU)
If the GPU sits in an IOMMU group with other devices (USB controllers, SATA, sometimes an entire PCIe slot’s neighbors),
VFIO can’t safely assign just the GPU. You might get “it boots once, then black screen,” or “works until driver loads.”
Fix: move the GPU to a different slot, disable “Above 4G decoding” only if it truly breaks grouping (rare), or use ACS override cautiously.
cr0x@server:~$ pvesh get /nodes/$(hostname)/hardware/pci --pci-class-blacklist ""
...
Decision: If moving slots gives you a clean group, do that. If you must use ACS override, treat it as a workaround, not a victory.
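For reference, the ACS override ships as a patch in the Proxmox kernel and is driven by a kernel parameter; the form below is the most common one, and it deliberately weakens isolation between devices behind the same downstream port, which is exactly why it stays a workaround:
cr0x@server:~$ cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-6.8.12-4-pve root=/dev/mapper/pve-root ro quiet intel_iommu=on iommu=pt pcie_acs_override=downstream,multifunction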
3) Wrong firmware/machine type (OVMF/Q35 mismatch)
SeaBIOS plus i440fx can still work, but it’s the “it ran on my laptop in 2017” choice.
Modern GPUs and modern OS installers are happiest with OVMF + Q35.
Fix: set bios: ovmf and machine: q35. Also add an EFI disk if needed (Proxmox does this in the UI).
Decision: If you’re stuck on SeaBIOS for a specific legacy reason, you may need to supply a ROM file and accept that some GPUs won’t cooperate.
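On the CLI the switch is a couple of qm set calls. A minimal sketch for the VM from Task 7; the efidisk storage name and options are examples, adjust for your storage:
cr0x@server:~$ qm set 120 --machine q35 --bios ovmf
cr0x@server:~$ qm set 120 --efidisk0 local-lvm:1,efitype=4m,pre-enrolled-keys=1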
4) You’re staring at the wrong display output (port and monitor negotiation)
This one is humiliating because it’s real. Different ports can behave differently under passthrough: DP link training,
HDMI EDID quirks, and monitor power-save states can all look like “GPU is dead.”
Fix: while debugging, use a single monitor, set it to a basic refresh rate (60 Hz), disable HDR, and test a different port.
Decision: If the VM is reachable via RDP/SSH but local output is black, the GPU is probably fine; the display path is the problem.
5) Missing the GPU’s audio function (multi-function passthrough)
Many GPUs expose an HDMI/DP audio function. Some drivers assume it exists. Some resets behave better when all functions are passed through.
Pass only the VGA function and you might get a driver that loads… and then nothing on-screen.
Fix: pass through both 01:00.0 and 01:00.1 as separate hostpci entries.
Decision: If you can’t pass the audio function because it’s grouped with something else, go back to IOMMU grouping and slot choice.
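Applying that from the CLI, using the IDs from Task 3 (in the UI, selecting the .0 device with “All Functions” is the equivalent shortcut):
cr0x@server:~$ qm set 120 --hostpci0 0000:01:00.0,pcie=1,x-vga=1 --hostpci1 0000:01:00.1,pcie=1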
6) NVIDIA Code 43 / hypervisor detection (less common, still real)
In some environments, Windows + NVIDIA drivers decide they don’t like virtualized guests and refuse to initialize the GPU correctly.
Symptoms vary: black screen, device error, drivers that install but never produce output.
Fix: use cpu: host,hidden=1 in the VM config and avoid exotic CPU flag masking unless you need it for licensing or compatibility.
Decision: If you’re doing VDI-like workloads, test driver versions. This isn’t superstition; regressions happen.
7) The reset bug (GPU won’t reinitialize after VM reboot/shutdown)
Some GPUs don’t fully reset without a full power cycle. You stop the VM, start it again, and you get a black screen.
The first boot works, which makes you blame the guest. The GPU is the one holding a grudge.
Fix options:
- Try a full host reboot as a diagnostic. If that “fixes” it, you likely have a reset issue.
- For some AMD cards, a reset helper module can restore proper function (platform-dependent).
- Prefer GPUs known to reset cleanly for production. “It works if I never reboot” is not a strategy.
If your GPU only works after a host reboot, congratulations: you have built a very expensive light switch.
8) BAR sizing / Above 4G decoding / Resizable BAR quirks
Modern GPUs map large memory regions (BARs). If firmware can’t allocate space cleanly, you’ll see BAR reservation errors,
random QEMU failures, or a VM that boots but never shows output when the driver tries to map VRAM.
Fix:
- Enable “Above 4G decoding” in BIOS for large BAR allocations (commonly required).
- Temporarily disable Resizable BAR while debugging (especially if you’re mixing older guest OS or firmware).
- Stay on Q35 and OVMF for sane PCIe behavior.
Decision: If your logs show BAR allocation failures, don’t waste time in guest driver land.
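A quick way to see how large the BARs actually are, and therefore whether Above 4G decoding is in play, is lspci’s region list; the addresses and sizes below are illustrative:
cr0x@server:~$ lspci -vv -s 01:00.0 | grep -i region
        Region 0: Memory at f6000000 (32-bit, non-prefetchable) [size=16M]
        Region 1: Memory at e0000000 (64-bit, prefetchable) [size=256M]
        Region 3: Memory at f0000000 (64-bit, prefetchable) [size=32M]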
9) Primary GPU confusion: x-vga, vBIOS, and “no emulated display”
If you remove the emulated VGA device and expect the passed GPU to be primary, the guest firmware must use it as a boot display.
Sometimes it won’t, especially if the GPU doesn’t expose a compatible option ROM in the way the VM expects.
Fix:
- Try x-vga=1 on the GPU passthrough device.
- Keep a basic emulated display temporarily (like std VGA) for install/debug, then remove it once stable.
- If required, supply a ROM file that matches your exact GPU model.
Decision: If you can see boot screens via emulated VGA but lose output when switching to the GPU, you’re in firmware/option-ROM territory.
10) Power management and sleep states (D3cold, ASPM, and friends)
Some platforms aggressively put devices into low-power states. On bare metal, drivers handle it.
Under passthrough, power-state transitions can be less forgiving. The guest driver loads, flips the GPU power state,
and your display goes black.
Fix: disable PCIe ASPM in BIOS for testing, and avoid suspend/hibernate inside the guest until you’ve proven stability.
Decision: If the GPU works until the guest idles or the screen blanks, suspect power management rather than VFIO basics.
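Two quick checks help confirm or rule this out: look at the ASPM state of the link, and try one test boot with ASPM forced off. pcie_aspm=off is a standard kernel parameter; the lspci lines below are illustrative:
cr0x@server:~$ lspci -vv -s 01:00.0 | grep -i aspm
                LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <512ns, L1 <16us
                LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
cr0x@server:~$ # test boot only: add pcie_aspm=off to the kernel command line, then re-test under load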
Common mistakes: symptom → root cause → fix
- Symptom: VM starts, Proxmox console is black, monitor is black, but VM is pingable.
  Root cause: Output is on a different GPU port or the guest is using a different display adapter.
  Fix: RDP/SSH in, set the passed GPU as primary display, test another physical port, reduce refresh/HDR.
- Symptom: VM won’t start; QEMU exits with group/container errors.
  Root cause: IOMMU group not isolated or IOMMU disabled/misconfigured.
  Fix: enable VT-d/AMD-Vi, fix kernel flags, move GPU slots, only then consider ACS override.
- Symptom: Works once after host reboot, then black screen after VM restart.
  Root cause: GPU reset bug / FLR not supported.
  Fix: avoid frequent stop/start, use vendor reset workarounds where applicable, or choose a GPU with clean resets.
- Symptom: Windows shows GPU but driver fails; black screen when driver loads.
  Root cause: driver/hypervisor mismatch, missing audio function, or firmware mode mismatch.
  Fix: pass through all functions, use OVMF+Q35, set cpu: host,hidden=1, test a stable driver branch.
- Symptom: Host loses video after binding GPU to VFIO; you can’t see the console anymore.
  Root cause: You passed through the only GPU the host uses for display.
  Fix: use an iGPU or a cheap secondary GPU for host console, or run headless with IPMI/serial and accept the tradeoff.
- Symptom: Random freezes under load; sometimes ends as black screen.
  Root cause: interrupt remapping disabled, buggy BIOS, or power management transitions.
  Fix: update BIOS, enable IRQ remapping, disable ASPM for testing, keep kernel current.
- Symptom: QEMU log mentions BAR reservation failures.
  Root cause: Above 4G decoding/Resizable BAR interaction or address space exhaustion.
  Fix: enable Above 4G decoding, disable Resizable BAR temporarily, stick to Q35/OVMF.
Three corporate mini-stories from the trenches
Incident #1: The outage caused by a wrong assumption
A team wanted “cheap GPU acceleration” for a Windows CAD workload. They built a Proxmox node with one big GPU and assumed:
“The VM uses the GPU, the host doesn’t need it.” They even disabled the emulated VGA because it felt cleaner.
The VM started fine—once. After a maintenance reboot, it came back with a black screen. The on-call tried the usual:
reinstall drivers, change display cables, toggle OVMF settings. No luck. The VM was alive on the network, but local output never returned.
The hidden assumption was that the GPU would always present a usable boot display to OVMF. In reality, the card’s option ROM path
wasn’t consistent in that setup. With no emulated display, there was nothing to see during early boot and driver initialization was failing silently.
The fix was boring: re-add an emulated display for installation and troubleshooting, keep Q35+OVMF, pass through both GPU functions,
and only remove the emulated adapter after confirming the GPU is stable through multiple reboots. They also added a tiny secondary GPU for host console.
The “cheap acceleration” became cheaper in human hours immediately.
Incident #2: The optimization that backfired
Another org was chasing latency and decided to “tune everything”: hugepages, CPU pinning, aggressive power saving, ASPM enabled,
and a custom kernel cmdline they copied from a forum thread that looked authoritative.
The GPU passthrough VM would boot, show output, then go black under load—usually when a rendering job started.
Their first instinct was drivers. They pinned more cores. They changed Windows power profiles. They even swapped monitors.
The black screen kept coming back like a sequel nobody asked for.
Eventually they looked at dmesg and found interrupt remapping was effectively not doing what they thought due to BIOS settings interacting
with x2APIC mode. The system was stable enough for light usage, unstable under heavy DMA/interrupt churn.
The “optimization” (aggressive power management + questionable kernel flags) turned a mostly-OK setup into a flaky one.
The fix was to revert to a known-good baseline: default Proxmox kernel parameters for IOMMU, update BIOS,
disable ASPM during validation, and only then reintroduce tuning one change at a time with a rollback plan.
Performance improved because the system stopped falling over. That’s an underrated optimization.
Incident #3: The boring but correct practice that saved the day
A third team ran GPU passthrough for a small internal CI pipeline that needed CUDA builds. Not flashy, but production-adjacent.
They had a rule: every infrastructure change had a pre-change snapshot of VM config and a host-side “known state” file dump.
They also kept a text log of GPU PCI IDs, IOMMU groups, and the exact Proxmox kernel version in use.
One morning after a routine kernel upgrade, VMs started booting to black screens. Panic started to form—this pipeline fed releases.
But the team had receipts: they compared qm config before and after, confirmed the PCI IDs didn’t change,
and immediately saw that the GPU had rebound to a host driver after the initramfs update.
Because they had a checklist, they didn’t argue about theories. They regenerated initramfs, confirmed VFIO binding, rebooted once,
and recovered quickly. No heroics. No “let’s reinstall the guest.” Just disciplined verification.
The practice wasn’t sexy, but it shortened the incident. In ops, the fastest fix is often “prove what changed” and then undo it cleanly.
Checklists / step-by-step plan
Checklist A: Host readiness (do this before touching the VM)
- Enable VT-d/AMD-Vi and (if available) interrupt remapping in BIOS.
- Enable Above 4G decoding for modern GPUs; disable Resizable BAR temporarily if debugging.
- Boot host and verify IOMMU enabled in dmesg.
- Confirm GPU PCI IDs and IOMMU group isolation.
- Bind GPU + audio function to vfio-pci in initramfs; blacklist native GPU drivers.
- Reboot and re-check lspci -nnk for vfio-pci binding.
Checklist B: Proxmox VM configuration (stable defaults)
- Use Q35 machine type and OVMF BIOS for most modern GPUs.
- Add EFI disk (UEFI vars) for OVMF.
- Pass through the GPU and its audio function with pcie=1.
- Start with an emulated display present for install/debug; remove it later if needed.
- Set CPU type to host; for NVIDIA/Windows, consider hidden=1.
- Boot once, install drivers, then reboot the VM multiple times to validate reset behavior (see the stop/start loop sketch after this checklist).
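A minimal stop/start loop for that last step, assuming the VMID from earlier examples; if any iteration ends in a black screen or a VFIO reset timeout in dmesg, you have found the reset bug before production does:
cr0x@server:~$ for i in 1 2 3; do qm stop 120 && sleep 15 && qm start 120 && sleep 120; done
cr0x@server:~$ dmesg | egrep -i 'vfio-pci|not ready|bus reset' | tail -n 20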
Checklist C: Black screen response plan (when it breaks at 2 a.m.)
- Confirm VM is alive on the network (ping, RDP/SSH). If alive, suspect display/driver output rather than VM boot failure.
- Check journalctl -u pve-qm@VMID for VFIO/BAR/group errors.
- Check host GPU binding and dmesg reset messages.
- If it’s the reset bug, do a controlled host reboot to restore the GPU, then plan a real fix (not “never reboot again”).
- Document what changed since last known-good: kernel update, BIOS setting, driver update, VM hardware config.
FAQ
1) Why does my VM boot fine, but the screen goes black when GPU drivers install?
That’s often the transition from basic VGA/UEFI framebuffer to full driver mode-setting. Suspect firmware/machine type mismatch,
missing audio function, power management, or a driver/hypervisor quirk (including Code 43 behavior on some setups).
2) Should I use OVMF or SeaBIOS for GPU passthrough?
Use OVMF for most modern GPUs and modern guest OSes. SeaBIOS is for specific legacy cases or troubleshooting,
and it narrows your options with newer hardware.
3) Q35 or i440fx?
Q35 unless you have a reason not to. GPUs are PCIe devices; Q35 models PCIe more naturally and avoids a lot of weirdness.
4) Do I really need to pass through the GPU’s audio function?
In practice: yes, most of the time. It improves driver sanity and can reduce edge cases with resets and initialization.
If you skip it, you’re choosing a more fragile setup.
5) My host has only one GPU. Can I pass it through and still manage Proxmox?
Yes, but expect to manage headless (SSH, web UI) and accept that local console output may vanish.
If you value your blood pressure, use an iGPU, IPMI, or a cheap secondary GPU for the host.
6) What’s the quickest way to tell if this is a host problem or a guest problem?
If QEMU logs show VFIO/group/BAR errors, it’s host/firmware. If the VM is reachable over the network but has no display, it’s guest driver/output.
If it only breaks after VM restart, it’s likely a reset issue.
7) Does “iommu=pt” cause black screens?
Usually no; it’s a common performance setting. But if your platform already has shaky IOMMU behavior, any tuning can expose it.
If debugging, simplify: keep required IOMMU flags, remove nonessential tweaks.
8) How do I know if I have a reset bug?
The pattern is: first VM boot works, subsequent boots after VM shutdown produce a black screen, and dmesg shows VFIO reset timeouts.
A full host reboot “fixes” it temporarily. That’s your tell.
9) Is ACS override safe?
It’s a workaround that trades isolation guarantees for convenience. For lab use, it’s often acceptable.
For production, prefer hardware that gives clean IOMMU groups without tricks.
10) Why does Proxmox console show nothing even though the GPU is passed through?
The Proxmox console is for the emulated display device. If you rely on the passed-through GPU output, the console can be blank by design.
Keep an emulated adapter during debugging so you have eyes inside the VM.
Conclusion: practical next steps
Treat a black screen like a routing problem, not a mood. Start on the host: IOMMU enabled, clean IOMMU group, vfio-pci binding.
Then validate the VM’s firmware and machine model (OVMF + Q35 is the sane baseline). Only then chase guest drivers and monitor quirks.
Next steps that pay off immediately:
- Capture a “known-good” snapshot of /proc/cmdline, lspci -nnk, IOMMU groups, and qm config (a minimal capture sketch follows at the end of this list).
- Prove VFIO binding after every kernel update (initramfs changes can revert you to host drivers).
- Validate resets by doing multiple VM shutdown/start cycles before you call it done.
- If you hit a reset bug, stop negotiating with that GPU and change the plan: workaround module, different model, or accept host reboots.
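A minimal capture sketch for that known-good snapshot; the directory layout and filenames are arbitrary, and the VMID and GPU address match this article’s examples:
cr0x@server:~$ d=/root/passthrough-baseline/$(date +%F); mkdir -p "$d"
cr0x@server:~$ cat /proc/cmdline > "$d/cmdline.txt"
cr0x@server:~$ lspci -nnk > "$d/lspci.txt"
cr0x@server:~$ ls -d /sys/kernel/iommu_groups/*/devices/* > "$d/iommu-groups.txt"
cr0x@server:~$ qm config 120 > "$d/qm-config-120.txt"
cr0x@server:~$ uname -r > "$d/kernel.txt"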