Nothing humbles a confident sysadmin like a Windows VM that’s “fine” on paper and miserable in real life. You click Start, it thinks about it. You open an app, it negotiates. Disk queue length looks like a commuter line for coffee.
Proxmox can run Windows fast—boringly fast. But you have to pick the right few knobs. Not every knob. Eight of them. And you need to validate with measurements instead of vibes, because Windows can hide storage pain behind “System” CPU and Proxmox can hide it behind “io=…”.
Fast diagnosis playbook (find the bottleneck in 10 minutes)
When a Windows VM is slow, your brain wants to blame Windows. Your job is to identify which layer is actually waiting: guest CPU scheduling, guest storage queueing, QEMU device emulation, host IO, or the storage backend. The fastest path is to triangulate with latency, not throughput.
First: decide if it’s CPU scheduling or storage latency
- Check host CPU steal/ready behavior: if vCPUs aren’t getting scheduled, everything feels like molasses.
- Check host IO latency: if IO is slow, Windows will “freeze” in ways that look like CPU issues.
Second: confirm the VM is using the right devices and drivers
- Windows on SATA/IDE or using Microsoft’s generic storage driver is basically a tax you pay forever.
- Wrong controller + no VirtIO = high CPU per IO and shallow queues.
Third: isolate where the queue is building
- Queue in the guest (high disk queue length) but host disk idle: often driver/controller configuration or cache mode mismatch.
- Queue in the host (high await in iostat) and guest also slow: storage backend latency.
- Queue in QEMU thread (high single core usage on host): missing IO thread or wrong controller.
Minimal order of operations (do this in order)
- On the host: check IO latency and CPU contention (iostat, pvesh/qm status, top).
- On the VM config: verify controller (VirtIO SCSI), cache mode, iothread, discard, and CPU type.
- Inside Windows: verify VirtIO drivers installed and actually in use; check disk queue length and storport usage.
- On the storage backend: ZFS recordsize/volblocksize alignment, sync behavior, Ceph RBD settings, or LVM cache and scheduler.
If you do those four, you’ll usually find the culprit before anyone suggests “just give it more vCPUs,” which is how performance problems get promoted into production incidents.
Facts & context that explain why this happens
- VirtIO exists because emulating “real” hardware is slow. KVM is fast at CPU virtualization; device emulation is where you pay.
- IDE is historically simple, not historically fast. IDE emulation is a compatibility crutch; it burns CPU per IO and has shallow queues.
- Windows storage stack changed behavior across versions. Storport-based drivers and queueing improvements make modern Windows benefit more from proper virtual controllers.
- Write cache modes are a reliability contract. “Writeback” can be a rocket or a footgun depending on whether the host can guarantee flush semantics.
- ZFS is transactional and honest about sync writes. If your workload issues sync writes (many Windows patterns do), ZFS will make you pay unless you design for it.
- Ceph trades latency for distributed resilience. It can be fast, but small random writes can be brutally sensitive to network jitter and backend tuning.
- NVMe wasn’t designed for one queue. Modern SSDs want many queues and deep parallelism; a single threaded IO path becomes the new bottleneck.
- Ballooning was designed for density, not happiness. It can make a VM “fit,” but Windows reacts poorly to sudden memory pressure.
- IO schedulers and cache layers can fight each other. Two layers “optimizing” the same IO stream often means neither one is.
One paraphrased idea attributed to Werner Vogels (Amazon CTO): Everything fails, so design systems to keep working when parts inevitably break.
Performance tuning is similar: assume something will stall, then reduce the blast radius.
The 8 settings that actually move the needle
1) Disk bus and controller: VirtIO SCSI (and when not to)
If your Windows VM disk is on IDE or SATA in Proxmox, you’ve already found one problem. Those are compatibility choices, not performance choices. A disk on the SCSI bus can be good, but only if the controller behind it (scsihw) is VirtIO SCSI rather than an emulated LSI controller.
Best default for Windows on Proxmox: VirtIO SCSI controller (usually virtio-scsi-single, which gives each disk its own controller and lets it use a dedicated IO thread) plus the VirtIO storage driver in Windows.
Why it matters: VirtIO is a paravirtualized device. It reduces VM exits, reduces CPU per IO, and supports deeper queues. Windows workloads—especially “many small reads” during boot and updates—benefit enormously.
When you might choose something else:
- NVMe virtual disk can be excellent for high IOPS workloads because it maps well to modern queueing. It’s also simpler than SCSI in some cases.
- SATA might be appropriate if you’re stuck with an old Windows installer and you don’t want to inject drivers—only as a temporary bridge.
Opinionated take: don’t ship a “temporary SATA bridge” into production. That’s how temporary becomes permanent, like the cardboard sign on a broken door that’s still there two years later.
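A minimal host-side sketch, assuming VM ID 104 (substitute your own). Set the controller type first; only move the boot disk onto it after the VirtIO storage driver is installed in Windows (the switch procedure is in the FAQ):
cr0x@server:~$ qm set 104 --scsihw virtio-scsi-single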
2) Cache mode: pick one, understand the blast radius
Cache mode decides who is allowed to lie about durability. Windows cares. Databases care more. And your future self cares most.
Common Proxmox cache modes you’ll see for disks:
- cache=none: Generally the safest default. Uses direct IO, avoids double caching, respects flush semantics properly. Often best for ZFS and Ceph.
- cache=writeback: Faster perceived writes because the host page cache absorbs them. Higher risk if power loss or host crash occurs and you don’t have proper power protection and flush handling.
- cache=writethrough: Safer than writeback, often slower.
What “slow” looks like when cache is wrong: random pauses, huge variance in IO latency, or great benchmarks that collapse under real workload because flushes force the truth back into the timeline.
Recommendation: start with cache=none, measure, then consider writeback only when you can defend the data durability story (UPS, enterprise SSD with PLP, and you understand your storage stack).
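Changing the cache mode means re-specifying the disk line; a sketch for VM 104 with its disk on local-zfs (keep any other disk options you rely on in the same line, since options you omit fall back to defaults):
cr0x@server:~$ qm set 104 --scsi0 local-zfs:vm-104-disk-0,cache=none,discard=on,iothread=1,ssd=1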
3) IO thread: separate the slow path from everything else
QEMU has threads. By default, storage device emulation and virtqueue processing run on QEMU’s main event loop, which can become a single hot lane under disk load. That’s what IO threads do: they move disk IO processing into its own dedicated thread, away from the main loop and the vCPU threads.
Symptoms when you need iothread:
- VM feels “laggy” under disk load even when CPU usage seems modest.
- One host CPU core pegs during heavy IO, while others nap.
- Latency spikes correlate with background tasks (Windows Defender, updates, indexing).
Enable an iothread for the disk (or use a controller that supports it cleanly). This is one of those changes that is unglamorous but frequently obvious in the “before/after” graph.
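Checking for it is a one-liner; the flag lives on the disk line and takes effect after a full stop/start of the VM (VM 104 assumed):
cr0x@server:~$ qm config 104 | grep -E '^scsi[0-9]'
scsi0: local-zfs:vm-104-disk-0,cache=none,discard=on,iothread=1,ssd=1,size=200G
If iothread=1 is missing, add it the same way as the cache-mode example above, then stop and start the VM.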
4) Discard/TRIM and SSD emulation: stop lying to the guest
Windows assumes it can tell storage about freed blocks (TRIM). If the guest can’t discard, the backend may slowly degrade, especially on thin-provisioned volumes or SSD-backed pools that depend on knowing what’s free.
Enable discard on the virtual disk if your backend supports it (ZFS zvols can, Ceph RBD can, LVM-thin can). Then enable Windows optimization (it usually does this automatically on “SSD”).
Also consider SSD emulation: letting the guest know it’s on SSD can change Windows behavior (defrag vs optimize/trim schedules). It’s not magic, but it avoids stupid work.
Be careful: discard isn’t free. On some backends it can generate load. The point is correctness first, then measure.
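One way to sanity-check that discard is actually reclaiming space on a thin zvol, assuming the disk from the earlier examples: note the numbers, delete data in Windows, let its optimize pass run, then compare.
cr0x@server:~$ zfs list -o name,used,referenced rpool/data/vm-104-disk-0
On a sparse zvol, used should drop once the guest’s TRIMs reach ZFS; if it never moves, discard isn’t making it through the stack.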
5) VirtIO drivers & guest tools: your VM’s nervous system
Performance problems love missing drivers. Windows will function with generic drivers, but it will function like a rental car with the parking brake slightly on.
Inside Windows you want:
- VirtIO storage driver (viostor or vioscsi depending on the controller)
- VirtIO network driver (NetKVM)
- Balloon driver only if you intentionally use ballooning (many environments should not)
- QEMU guest agent for better lifecycle operations and sometimes better timing behavior
Do not “set and forget” driver ISO mounting. Confirm the driver is active in Device Manager and that the disk uses the expected controller.
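Attaching the driver ISO from the host is one command, assuming you’ve uploaded virtio-win.iso into the local storage’s ISO library (the filename varies by version):
cr0x@server:~$ qm set 104 --ide2 local:iso/virtio-win.iso,media=cdrom
After installing from it, confirm in Device Manager that the disk controller is the Red Hat VirtIO SCSI device, not a generic Microsoft one.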
Joke #1: Windows without VirtIO drivers is like a sports car delivered with square wheels—technically it moves, but nobody’s happy about it.
6) CPU type and NUMA: stop starving Windows
If your Windows VM is slow under load, you might be paying for conservative CPU flags or a poor topology. Proxmox offers multiple CPU types. The safe-but-slow trap is using a generic CPU model when “host” would be fine.
Recommendation: use CPU type: host for performance-sensitive VMs on a stable cluster (or where live migration constraints are acceptable). This exposes more CPU features to the guest, improving performance for crypto, compression, and some scheduler behaviors.
NUMA: If you give a VM lots of vCPUs and RAM, but it spans NUMA nodes without awareness, latency goes up. Windows will schedule threads across nodes; if the VM topology is mismatched, you get random stalls that look like storage but aren’t.
Rule of thumb: don’t allocate a VM bigger than a NUMA node unless you have a reason and you understand the host layout.
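A sketch for VM 104: switch the CPU model to host, then check the host’s NUMA layout before deciding how many vCPUs and how much RAM the VM should get (illustrative two-node output):
cr0x@server:~$ qm set 104 --cpu host
cr0x@server:~$ lscpu | grep -i numa
NUMA node(s):          2
NUMA node0 CPU(s):     0-15
NUMA node1 CPU(s):     16-31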
7) Memory ballooning and swap: death by “helpfulness”
Ballooning is seductive: “We can overcommit and it’ll be fine.” Windows disagrees. When memory is reclaimed, Windows hits its own paging behavior, caches collapse, and the VM starts doing storage IO just to compensate for missing RAM. That’s how a memory problem becomes a disk problem.
Recommendation: for performance-sensitive Windows VMs, set a fixed memory size and disable ballooning unless you have a proven capacity management process.
On the host, ensure you’re not swapping under load. Host swap activity makes guest latency chaotic, and chaotic latency is what users perceive as “slow.”
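Pinning memory is one command per VM; a sketch for VM 104 (balloon 0 disables the balloon device, so the guest keeps its full allocation):
cr0x@server:~$ qm set 104 --balloon 0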
8) Storage backend tuning (ZFS/Ceph/LVM): match block sizes and queues
This is where most “my VM is slow” posts go to die: the backend is doing exactly what you asked, not what you meant. Proxmox abstracts storage, but your disks still obey physics, write amplification, and flush semantics.
ZFS (zvols and datasets)
- volblocksize matters for zvols. If it’s poorly matched to workload, you pay in write amplification and latency.
- sync writes are expensive without a proper SLOG. Windows can generate sync-ish patterns (metadata, journaling, application flushes).
- compression can help or hurt depending on CPU and workload; modern CPUs often make it a win.
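On Proxmox, the volblocksize for new zvols comes from the storage definition’s blocksize option; a sketch assuming a zfspool storage named local-zfs. It only affects disks created afterwards, existing zvols keep the volblocksize they were born with:
cr0x@server:~$ pvesm set local-zfs --blocksize 16k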
Ceph RBD
- Ceph can deliver excellent throughput but may have higher baseline latency than local NVMe.
- Small random writes are sensitive to network and OSD performance; “it’s only 4k” is not comforting.
- RBD cache and client settings can change behavior, but start with measuring latency distribution.
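For a crude read on small-write latency, rados bench is enough to see the distribution; a sketch assuming a pool named rbd (this generates real load, so pick a quiet window):
cr0x@server:~$ rados bench -p rbd 10 write -b 4096 -t 16
Watch average and max latency in the summary, not just the bandwidth line.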
LVM-thin / local SSD
- Thin provisioning needs discard to avoid silent space pressure.
- Check the underlying device scheduler and queue depth; defaults are not always optimal.
- Monitor for write cliff behavior on consumer SSDs (SLC cache exhaustion). You’ll see it as sudden latency spikes.
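A quick way to watch thin pool pressure, assuming the default pve volume group; if data% creeps toward 100 while discard is off, that’s the silent space pressure mentioned above:
cr0x@server:~$ lvs -o lv_name,data_percent,metadata_percent pve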
Joke #2: If your storage backend is a single consumer SSD with no power-loss protection, “writeback cache” is just speedrunning your résumé update.
Practical tasks: commands, outputs, and the decision you make
These are the tasks I actually run when a Windows VM feels slow on Proxmox. Each includes a command, an example output, and what decision it drives. Run them on the Proxmox host unless noted.
Task 1: Identify VM configuration (disk bus, cache, iothread, discard)
cr0x@server:~$ qm config 104
boot: order=scsi0;ide2;net0
cores: 8
cpu: x86-64-v2-AES
memory: 16384
name: win-app-01
net0: virtio=DE:AD:BE:EF:10:04,bridge=vmbr0,firewall=1
ostype: win11
scsi0: local-zfs:vm-104-disk-0,cache=none,discard=on,iothread=1,ssd=1,size=200G
scsihw: virtio-scsi-single
What it means: Disk is on VirtIO SCSI single, cache=none, discard=on, iothread enabled, SSD emulation enabled. This is the “good defaults” shape.
Decision: If you see ide0, sata0, cache=writeback without a durability story, or missing iothread=1, fix config first before chasing ghosts.
Task 2: Confirm the VM is actually running and not memory balloon-thrashing
cr0x@server:~$ qm status 104 --verbose
status: running
cpus: 8
memory: 16384
balloon: 0
uptime: 91234
What it means: balloon: 0 implies ballooning is disabled (fixed memory). Good for predictable performance.
Decision: If ballooning is enabled and the host is tight on RAM, turn it off for latency-sensitive Windows, or fix host capacity.
Task 3: Find which process/thread is burning CPU on the host during “disk slowness”
cr0x@server:~$ top -H -p $(pgrep -f "kvm.*104" | head -n1)
top - 10:03:11 up 12 days, 5:44, 2 users, load average: 6.10, 5.92, 5.80
Threads: 52 total, 1 running, 51 sleeping
%Cpu(s): 22.0 us, 3.0 sy, 0.0 ni, 75.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
31241 root 20 0 6143200 902144 21760 S 98.0 1.4 10:22.11 iothread
31200 root 20 0 6143200 902144 21760 S 12.0 1.4 2:13.45 CPU 0/KVM
31201 root 20 0 6143200 902144 21760 S 11.5 1.4 2:11.88 CPU 1/KVM
What it means: An iothread is present and is busy. That’s fine if it corresponds to real disk load. If the main KVM vCPU thread is pegged on IO, you likely need an IO thread or a better controller.
Decision: If a single QEMU thread pins at 100% during IO, enable iothread and ensure VirtIO SCSI/NVMe is used.
Task 4: Measure host disk latency and queueing
cr0x@server:~$ iostat -x 1 5
Linux 6.8.12 (pve01) 12/26/2025 _x86_64_ (32 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
10.12 0.00 2.31 8.44 0.00 79.13
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %util await r_await w_await
nvme0n1 220.0 180.0 8800.0 7200.0 0.0 5.0 92.0 18.2 9.1 29.0
What it means: await in double digits under load suggests real latency. If this is your VM’s backing device and Windows is “slow,” the backend is likely the limit.
Decision: If %util is high and await is high, fix storage (faster media, better pool layout, SLOG for sync, reduce write amplification). If host disks are fine but VM is slow, look at VM config/drivers.
Task 5: Identify which storage device backs the VM disk
cr0x@server:~$ pvesm path local-zfs:vm-104-disk-0
/dev/zvol/rpool/data/vm-104-disk-0
What it means: The VM disk is a ZFS zvol. ZFS tuning rules apply.
Decision: If it’s on ZFS, check volblocksize, sync behavior, and whether the pool has a fast log device for sync-heavy workloads.
Task 6: Check ZFS pool health and latency signals
cr0x@server:~$ zpool status -v rpool
pool: rpool
state: ONLINE
scan: scrub repaired 0B in 00:18:44 with 0 errors on Sun Dec 22 03:12:18 2025
config:
NAME STATE READ WRITE CKSUM
rpool ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
nvme0n1 ONLINE 0 0 0
nvme1n1 ONLINE 0 0 0
errors: No known data errors
What it means: Healthy pool. Health doesn’t mean fast, but it removes “degraded resilvering” as a performance culprit.
Decision: If the pool is degraded or resilvering, expect latency. Fix hardware/pool state first; tuning won’t outrun a rebuild.
Task 7: Inspect ZFS dataset properties that affect VM disks
cr0x@server:~$ zfs get -o name,property,value -s local recordsize,compression,atime,sync rpool/data
NAME PROPERTY VALUE
rpool/data recordsize 128K
rpool/data compression lz4
rpool/data atime off
rpool/data sync standard
What it means: Recordsize applies to datasets, not zvols, but this hints at general intent. Compression on and atime off are usually good defaults.
Decision: Don’t “optimize” by setting sync=disabled on VM storage unless you are intentionally accepting data loss on crash. Use the right hardware instead.
Task 8: Check zvol volblocksize (critical for Windows VM IO patterns)
cr0x@server:~$ zfs get -o name,property,value volblocksize rpool/data/vm-104-disk-0
NAME PROPERTY VALUE
rpool/data/vm-104-disk-0 volblocksize 8K
What it means: 8K volblocksize is often a reasonable compromise for general VM workloads. Too large can amplify small random writes; too small can increase metadata overhead.
Decision: If you see 128K volblocksize on a Windows system disk with lots of small random IO, consider recreating the disk with a smaller volblocksize and migrating.
Task 9: Observe ZFS IO and latency pressure in real time
cr0x@server:~$ zpool iostat -v rpool 1 5
capacity operations bandwidth
pool alloc free read write read write
---------- ----- ----- ----- ----- ----- -----
rpool 1.20T 600G 800 650 60.0M 48.0M
mirror-0 1.20T 600G 800 650 60.0M 48.0M
nvme0n1 - - 420 330 31.0M 24.0M
nvme1n1 - - 380 320 29.0M 24.0M
What it means: You see actual read/write rates and ops. If Windows “hangs” while these numbers spike, you’re correlated with storage activity.
Decision: If the pool is saturated during user-visible stalls, move noisy neighbors, add faster vdevs, or separate workloads.
Task 10: Check host memory pressure and swap (latency’s silent killer)
cr0x@server:~$ free -h
total used free shared buff/cache available
Mem: 256Gi 210Gi 12Gi 2.0Gi 34Gi 38Gi
Swap: 16Gi 3.2Gi 13Gi
What it means: Swap is in use. It may be historical, or it may be active pain.
Decision: If swap grows during the slowdown window, stop overcommitting memory or move VMs. Host swap makes guest IO latency look like random sabotage.
Task 11: See if the kernel is actively swapping under load
cr0x@server:~$ vmstat 1 5
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
2 0 3355443 1200000 8000 32000000 0 0 120 220 800 1200 12 3 77 8 0
3 1 3355443 1150000 8000 31500000 0 256 1024 4096 1200 1800 10 4 65 21 0
2 0 3355700 1100000 8000 31000000 128 512 900 3800 1100 1600 11 4 62 23 0
What it means: Non-zero si/so indicates active swapping. That’s a performance incident in slow motion.
Decision: Fix memory pressure. Don’t tune cache modes to compensate for swapping; that’s like changing tire pressure to fix an engine knock.
Task 12: Check block device scheduler and queue settings (local disks)
cr0x@server:~$ cat /sys/block/nvme0n1/queue/scheduler
[mq-deadline] none
What it means: The scheduler is mq-deadline. For NVMe, this is commonly fine. For some arrays, “none” can be better; for some mixed workloads, deadline-like behavior helps tail latency.
Decision: If you’re troubleshooting tail latency, test scheduler changes carefully and measure. Do not cargo-cult “none everywhere.”
Task 13: Check VM disk options at the QEMU commandline level (sanity)
cr0x@server:~$ ps -ef | grep -E "kvm.*104" | head -n1
root 31200 1 18 09:40 ? 00:42:10 /usr/bin/kvm -id 104 -name win-app-01 -m 16384 -smp 8 -cpu x86-64-v2-AES ... -device virtio-scsi-pci,id=scsihw0 ... -drive file=/dev/zvol/rpool/data/vm-104-disk-0,if=none,id=drive-scsi0,cache=none,aio=io_uring,discard=unmap ...
What it means: Confirms the effective settings (cache, discard, aio). This catches mismatches between what you think Proxmox set and what QEMU actually runs.
Decision: If you’re expecting discard=unmap and it’s absent, re-check VM hardware settings and storage support.
Task 14: Verify Ceph latency quickly (if using RBD)
cr0x@server:~$ ceph -s
cluster:
id: 9c1f3a1a-aaaa-bbbb-cccc-1234567890ab
health: HEALTH_OK
services:
mon: 3 daemons, quorum mon1,mon2,mon3
mgr: mgr1(active), standbys: mgr2
osd: 12 osds: 12 up, 12 in
data:
pools: 4 pools, 256 pgs
objects: 1.20M objects, 4.6 TiB
usage: 13 TiB used, 22 TiB / 35 TiB avail
pgs: 256 active+clean
What it means: HEALTH_OK doesn’t guarantee low latency, but it removes “cluster on fire” from your list.
Decision: If health is not OK (backfill, degraded, slow ops), stop tuning the VM and fix Ceph first. Windows will not out-stubborn slow ops.
Task 15: Confirm Windows is using VirtIO storage driver (host-side check via agent)
cr0x@server:~$ qm agent 104 get-osinfo
{
"id": "mswindows",
"kernel-release": "10.0.22631",
"name": "Microsoft Windows 11 Pro",
"pretty-name": "Microsoft Windows 11 Pro",
"version": "11"
}
What it means: Guest agent is working. That’s not a performance win by itself, but it enables more reliable operations and visibility.
Decision: If the agent isn’t installed, install it. When you’re diagnosing performance, having fewer blind spots matters.
Common mistakes: symptom → root cause → fix
1) “Windows is slow only during updates and reboots”
Symptom: Reboots take forever, updates crawl, “Working on updates” looks like a life choice.
Root cause: Windows update is metadata-heavy and flush-heavy; wrong disk controller, missing VirtIO driver, or ZFS sync latency gets exposed.
Fix: Move system disk to VirtIO SCSI (or virtual NVMe), install VirtIO storage driver, use cache=none, and ensure backend can handle sync writes (SLOG or better media).
2) “Benchmarks are great, real apps are awful”
Symptom: Sequential throughput tests look fine; user interaction and app launches stutter.
Root cause: You optimized for throughput, but the workload is tail-latency sensitive. Often: writeback cache hiding flushes, or single-threaded IO path without iothread.
Fix: Enable iothread, prefer cache=none, and measure latency distribution (await, not just MB/s).
3) “Random 2–10 second freezes, then it recovers”
Symptom: The VM “hangs” briefly, especially under load; mouse/keyboard lag inside console.
Root cause: Host swapping, ZFS transaction group contention, or storage backend latency spikes (consumer SSD write cliff, Ceph slow ops).
Fix: Stop host swapping (more RAM, less overcommit), ensure storage is not saturated, check for slow ops/rebuild activity, and avoid ballooning.
4) “CPU usage is high but it feels IO-bound”
Symptom: Windows shows high CPU; host shows one busy core; disk metrics look moderate.
Root cause: Emulated device overhead (IDE/SATA) or lack of paravirtual drivers causing high CPU per IO.
Fix: Switch to VirtIO SCSI/NVMe and install correct drivers; enable iothread.
5) “Network is slow, but only for this Windows VM”
Symptom: SMB copies are slow; other VMs are fine.
Root cause: Using e1000 emulation or missing VirtIO net driver; offload settings mismatched.
Fix: Use VirtIO net; install NetKVM driver; confirm MTU and bridge config; measure with iperf-like tools (and don’t confuse disk with network).
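A sketch of the host-side part for VM 104, reusing the MAC already assigned in the config so Windows keeps the same adapter identity; install the NetKVM driver in the guest first:
cr0x@server:~$ qm set 104 --net0 virtio=DE:AD:BE:EF:10:04,bridge=vmbr0,firewall=1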
6) “Everything was fine until we enabled discard”
Symptom: Periodic IO spikes after enabling discard/trim.
Root cause: Backend handles discard as real work (thin provisioning metadata updates, Ceph discards, or ZFS free space maps) and you surfaced it.
Fix: Keep discard (correctness matters), but schedule Windows optimize tasks, consider throttling, and ensure backend is sized properly. If discards are pathological, revisit backend configuration rather than disabling discard blindly.
Three corporate mini-stories from the performance trenches
Mini-story 1: The incident caused by a wrong assumption
The ticket started as “Citrix is slow.” It always does. Users reported that opening Outlook inside a Windows VM took minutes. A few admins insisted it must be the new antivirus policy, because “disk is SSD and network is fine.” That assumption—“SSD implies low latency”—was the seed of the incident.
On the Proxmox host, iostat showed moderate throughput but ugly await spikes. The Windows VMs were on ZFS, and the pool was healthy. The killer detail: the workload was sync-heavy due to the application’s flush behavior and the way Windows updates metadata. There was no dedicated log device, and the pool was built on mirror vdevs of consumer NVMe without power-loss protection.
The team had also enabled cache=writeback “for speed,” believing it was harmless because “it’s on SSD.” Under pressure, the system oscillated between fast bursts and stalls when flushes hit. The users experienced it as random hangs. The host experienced it as a storage device that sometimes had to tell the truth.
The fix was painfully simple and politically annoying: revert to cache=none, deploy proper enterprise NVMe with PLP, and add a fast log device for the ZFS pool where sync-heavy VMs lived. Suddenly the same VMs behaved like they belonged in a data center instead of a science fair.
The postmortem’s lesson wasn’t “ZFS is slow” or “Windows is weird.” It was that the stack has contracts. If you assume SSD equals durability and low latency, you end up debugging human expectations, not machines.
Mini-story 2: The optimization that backfired
A different org ran a mixed Proxmox cluster: some nodes with local NVMe, some backed by Ceph. They wanted “consistent VM performance,” so they standardized on one VM hardware template: VirtIO SCSI, iothread enabled, discard on, and a single large virtual disk for simplicity.
Then they made a clever change: they enabled aggressive memory ballooning across the fleet to improve consolidation. Their graphs looked great. The CFO was delighted. The Windows VMs were not.
Under peak hours, the host reclaimed memory. Windows responded by paging and dropping caches. That created more disk IO. On Ceph-backed nodes, that meant more small random writes and more metadata churn. The increased IO raised latency. The increased latency slowed applications, which increased user retries, which increased load. A performance feedback loop, dressed up as “capacity efficiency.”
They tried to fix it by increasing vCPUs. That just made contention worse: more runnable threads competing for the same IO and memory bandwidth. The VMs looked busier while getting less done—a classic “we scaled the problem” moment.
The eventual fix was boring and effective: disable ballooning for Windows app servers, reserve memory on the host, and separate latency-sensitive workloads from noisy ones. Density dropped a bit. Incidents dropped a lot. That trade is called “being an adult.”
Mini-story 3: The boring but correct practice that saved the day
One team I liked working with had a ritual: every time they deployed a new Proxmox node or changed storage, they ran the same validation suite. Not a huge benchmark circus—just a handful of repeatable checks: host swap behavior, iostat await under synthetic load, and a Windows VM sanity check confirming VirtIO drivers and controller types.
They also kept VM templates strict. No one was allowed to spin up Windows with SATA “just to get it installed.” The template had VirtIO SCSI and the driver ISO attached; the build steps included verifying the driver binding. It was dull. It worked.
One quarter, a storage firmware update introduced a subtle latency regression. Nothing crashed. But the team’s routine checks showed higher tail latency on random writes. They paused the rollout, rolled back firmware on the updated nodes, and avoided a month of “users say it’s sluggish” tickets that never reproduce when you’re watching.
This wasn’t magic. It was discipline: a short checklist, a known-good baseline, and the humility to believe that performance regressions are real even when nobody can point to a smoking crater.
Checklists / step-by-step plan
Checklist A: Make an existing slow Windows VM sane (minimal downtime approach)
- Collect baseline metrics: host iostat -x, host vmstat, VM config via qm config.
- Verify controller and disk options: ensure VirtIO SCSI (or NVMe), cache=none, iothread=1, discard=on where supported.
- Install VirtIO drivers in Windows if not present (storage + net). Confirm they are active.
- Disable ballooning for latency-sensitive workloads; set fixed memory.
- Switch CPU type to host where migration constraints allow; ensure vCPU count is reasonable.
- Re-test under real workload and verify host latency improved or at least became stable.
Checklist B: New Windows VM template that won’t embarrass you later
- Machine type: modern (q35) unless you need legacy compatibility.
- Disk: VirtIO SCSI single (or NVMe), cache=none, iothread=1, discard=on, ssd=1.
- Network: VirtIO net.
- CPU: host (or a consistent model across the cluster if you need seamless migration).
- Memory: fixed; ballooning off by default for Windows app servers.
- Install: VirtIO driver ISO attached; confirm correct drivers before finishing build.
- Post-install: QEMU guest agent installed; Windows power plan set appropriately for servers (avoid aggressive power saving).
Checklist C: Storage backend sanity for Windows VMs
- Confirm storage health (no rebuild/backfill).
- Measure latency (iostat -x) during representative IO load.
- ZFS: validate zvol volblocksize choice before provisioning; consider SLOG for sync-heavy workloads.
- Ceph: watch for slow ops and network jitter; avoid mixing latency-critical and bulk workloads without planning.
- Thin provisioning: ensure discard works and monitor free space pressure.
FAQ
1) Should I use VirtIO SCSI or virtual NVMe for Windows on Proxmox?
If you want a safe default: VirtIO SCSI (often virtio-scsi-single) with iothread enabled. If you’re chasing high IOPS with modern Windows and you prefer NVMe semantics, virtual NVMe can be excellent. Measure either way; don’t assume.
2) Is cache=writeback always faster?
It often looks faster until flushes happen, then reality arrives with paperwork. Use it only when you can defend durability (power protection, correct flush semantics, and a storage backend you trust under crash scenarios).
3) Why does Windows “freeze” when disk is busy?
Because a lot of Windows UI and service behavior is gated on storage latency, not throughput. High tail latency makes the whole system feel hung even if average MB/s looks fine.
4) Does enabling discard/TRIM help performance?
It helps correctness and long-term stability on thin and SSD-backed systems by informing the backend what’s free. Short-term, it may add work. If enabling discard causes spikes, you’ve learned something about your backend.
5) How many vCPUs should I give a Windows VM?
Enough to match workload concurrency, not enough to create scheduling contention. Start smaller and scale up. Oversizing vCPU count can worsen latency, especially on busy hosts.
6) Should I enable ballooning for Windows VMs?
For test/dev or low-stakes desktops, maybe. For production Windows app servers where humans notice latency, usually no. Fixed memory buys predictability.
7) My host IO looks fine, but Windows still has high disk queue length. Why?
Often a driver/controller issue (wrong bus, missing VirtIO), or the VM’s IO path is single-threaded and saturating a QEMU thread. Check controller type, iothread, and Windows device driver binding.
8) Can ZFS be fast for Windows VMs?
Yes. But you need to respect sync writes and block size behavior. ZFS will not pretend sync writes are cheap. If your workload demands low sync latency, design the pool accordingly.
9) What’s the single highest impact change for a slow Windows VM?
Moving from emulated IDE/SATA to VirtIO SCSI (or NVMe) with the proper VirtIO storage driver. It’s not subtle.
10) If I can’t reinstall Windows, can I still switch controllers?
Usually yes, but do it carefully: install VirtIO drivers first while the old controller still boots, then add the new controller/disk mapping, then switch boot. Test with snapshots/backups.
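A hedged outline of that driver-first switch for VM 104, assuming the boot disk currently sits on sata0 and lives on local-zfs; snapshot or back up first:
cr0x@server:~$ qm set 104 --scsihw virtio-scsi-single
cr0x@server:~$ qm set 104 --scsi1 local-zfs:1
Boot Windows so it binds the VirtIO SCSI driver to the temporary disk, then shut down and swap the boot disk over:
cr0x@server:~$ qm set 104 --delete sata0,scsi1
cr0x@server:~$ qm set 104 --scsi0 local-zfs:vm-104-disk-0,cache=none,discard=on,iothread=1,ssd=1
cr0x@server:~$ qm set 104 --boot order=scsi0
The detached volumes show up as unused disks; remove the temporary one once the VM boots cleanly from scsi0.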
Conclusion: next steps that won’t waste your weekend
Make your Windows VM fast by fixing the boring parts: the disk controller, the cache contract, the IO threading, and the drivers. Then validate that the host isn’t swapping and that your storage backend latency is compatible with human expectations.
Concrete next steps:
- Run qm config <vmid> and confirm VirtIO SCSI/NVMe, cache=none, iothread=1, discard=on where supported.
- Measure host latency with iostat -x 1 during the slowdown. If await is ugly, stop tuning the guest and fix storage.
- Disable ballooning for the VM (and fix host RAM pressure) if you see swapping or memory reclaim behavior.
- Standardize a Windows VM template that bakes in the right controller/driver choices so you don’t debug this again next quarter.
Performance work is rarely about genius tweaks. It’s about removing accidental bottlenecks and refusing to ship defaults that were designed for compatibility, not speed.