Proxmox CPU Pinning Myths — The Setting That Makes Latency Worse


You pin vCPUs because you want “determinism.” Then the database VM starts hiccuping every few minutes, your p95 latency doubles,
and someone inevitably says, “But we pinned it, so it can’t be CPU.”

CPU pinning in Proxmox can be a performance tool. It can also be a self-inflicted denial-of-service on your own scheduler.
The trick is knowing which kind of “pinning” you actually did, what it stole from the rest of the system, and which latency
you just made worse.

The myth: pinning equals lower latency

Pinning is sold as “stop the VM from bouncing between cores.” That’s not wrong. It’s also not the point.
On modern Linux with KVM, the scheduler is already pretty good at keeping hot threads warm on the same cores.
What it’s even better at is borrowing spare CPU from anywhere when a burst hits.

Most people pin because they have a problem they can’t see: noisy neighbors, host contention, interrupt storms,
NUMA memory miss penalties, or oversubscription. Pinning feels like control. And sometimes it is.
But if you pin without understanding where the time is going, you turn a flexible, load-adaptive system into a set
of tiny cages. Latency then gets worse for two reasons:

  • You remove the scheduler’s escape routes. When a CPU is busy (or preempted by an interrupt), the VM can’t run elsewhere.
  • You concentrate collisions. The VM’s vCPU threads, QEMU emulation, vhost-net, and host housekeeping may fight over the same cores.

Pinning is not “performance mode.” It’s a contract. You’re promising the VM those cores. In return, you must keep those cores clean:
few interrupts, consistent frequency, predictable memory locality, and enough headroom. Most environments sign the contract and then
forget to pay.

What CPU pinning really is (and what Proxmox actually changes)

Three different things people call “pinning”

In Proxmox land, “pinning” gets used for at least three distinct mechanisms. Mixing them up is where the folklore starts.

  1. QEMU/KVM thread affinity (taskset-style). You bind VM vCPU threads to host CPUs. This is the classic “vCPU pinning.”
  2. cpuset cgroups (hard partitioning). You constrain a VM process tree to a CPU set. Stronger than affinity; you’re fencing it.
  3. CPU isolation for the host (kernel boot params). You reserve CPUs for specific workloads by pushing general host work away
    (RCU callbacks, kworkers, many interrupts). This is the part people skip.
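As a rough sketch, the three mechanisms look like this. VMID 101, PID 21433, and CPU set 2-5 are placeholders borrowed from the examples later in this article; cgroup paths and option names vary by Proxmox and kernel version, so treat every line as something to verify on your host, not copy blindly:

```shell
# 1) Thread affinity: bind the QEMU process (and its threads) to CPUs.
#    Newer Proxmox releases expose an affinity option; otherwise:
taskset -cp 2-5 21433

# 2) cpuset fencing: constrain the VM's cgroup (cgroup v2 layout shown;
#    exact paths and controller availability vary by version):
echo 2-5 > /sys/fs/cgroup/qemu.slice/101.scope/cpuset.cpus

# 3) Host isolation: reserve CPUs at boot, then update-grub and reboot.
#    This is the part that actually keeps host housekeeping away:
#    GRUB_CMDLINE_LINUX="isolcpus=2-5 nohz_full=2-5 rcu_nocbs=2-5"
```

Note that 1 and 2 only constrain the VM; only 3 pushes host work off those CPUs.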

What Proxmox exposes vs what Linux actually schedules

Proxmox gives you knobs like CPU units, CPU limit, NUMA, and sometimes “affinity” via args or hooks. Under the hood,
your VM is a QEMU process with multiple threads: one per vCPU, plus I/O threads, plus emulation, plus vhost threads depending on device model.
Linux schedules threads, not “VMs.”

When you pin “the VM,” you are really pinning some subset of those threads.
If you pin only vCPU threads and forget the I/O thread, you can still bottleneck on the unpinned thread running on a crowded core.
If you pin everything onto the same small set, you can create a latency pressure cooker.
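A minimal sketch of what "pinning the VM" has to mean in practice: iterate over every thread of the process, not just the ones you remembered. A placeholder `sleep` stands in for the QEMU PID (21433 in the examples below), and CPU 0 stands in for a real pin set:

```shell
# Sketch: pin every thread of a process tree to a CPU set, the way a
# complete "pin the VM" step must treat QEMU: vCPU threads, IOThread,
# and helper threads alike.
sleep 30 &
PID=$!
PIN_SET=0          # e.g. "2-5" on a real host

# /proc/<pid>/task/ lists one directory per thread (TID)
for tid in /proc/"$PID"/task/*; do
    taskset -cp "$PIN_SET" "${tid##*/}" >/dev/null
done

# read back the process affinity to confirm the pin took effect
aff=$(taskset -cp "$PID" | sed 's/^.*: //')
echo "affinity: $aff"
kill "$PID"
```

Whether you *want* every thread on the same set is a separate decision (see the I/O thread discussion below); the point is that a loop like this is the honest version of "we pinned it."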

Pinning changes fairness, not physics

Pinning doesn’t make CPUs faster. It changes who is allowed to run where.
That matters because the Linux scheduler’s job is to distribute runnable work across available CPUs while maintaining fairness
and cache locality. If you restrict it, you must ensure your restricted set has:

  • Enough CPU cycles under peak load
  • Stable frequency (no aggressive downclocking mid-burst)
  • Interrupt behavior you can live with
  • NUMA locality you can live with

The setting that makes latency worse: constraining the scheduler without isolating the host

Here’s the setting pattern that quietly ruins latency: pinning vCPUs (or using cpuset constraints) on a Proxmox host
that still schedules host housekeeping and interrupts on those same cores.

You meant “dedicated cores.” You got “shared cores with fewer options.” The VM can’t escape a busy core, but the busy core is busy
because the host is still doing host things there: interrupt handling, kernel workers, ZFS housekeeping, softirqs from networking,
and whatever other VMs weren’t as “special” as the pinned one.

The result is classic tail latency. Median looks fine. p95/p99 gets ugly. And the ugliness often correlates with:

  • Network bursts (softirq time spikes)
  • Storage bursts (kworker, ZFS txg sync, IO completion)
  • Periodic kernel activity (RCU, timer ticks on non-tickless systems)
  • Frequency scaling transitions (boost behavior changes under thermal/power constraints)

Why this pattern is worse than doing nothing

Without pinning, the VM’s vCPU threads can migrate away from a temporarily bad core. The scheduler can spread work.
When you pin, you convert transient host interference into a hard stall: the vCPU is runnable, but cannot be scheduled on an unblocked CPU.
That’s not “determinism.” That’s picking what looked like the shortest line at the grocery store, then being forbidden to switch when the register jams.

Joke #1: CPU pinning is like assigning everyone in the office one elevator. It’s very orderly until someone brings a cart.

What to do instead (most of the time)

If your goal is lower latency, don’t start with pinning. Start with:

  • Capacity and headroom: stop oversubscribing CPUs for latency-sensitive VMs
  • NUMA correctness: keep vCPUs and memory local, or accept the penalty knowingly
  • Interrupt hygiene: make sure the VM’s “dedicated cores” aren’t taking the host’s worst interrupts
  • CPU frequency policy: pick a governor that matches your workload (and verify)

Pinning becomes sane when it’s part of a bundle: CPU isolation + IRQ affinity + NUMA alignment + sensible vCPU sizing.
Pinning alone is like buying a race tire and putting it on a shopping cart.

Interesting facts and short history (why this keeps happening)

  • Fact 1: CPU affinity has existed in Linux for decades, but it became mainstream with SMP systems and then exploded with virtualization.
  • Fact 2: KVM isn’t a separate hypervisor scheduler; it’s a kernel module. Your “hypervisor scheduling” is largely the Linux scheduler.
  • Fact 3: NUMA has been a performance landmine since multi-socket servers became common; remote memory access can look like random latency spikes.
  • Fact 4: “Tickless” kernels (NO_HZ) reduced periodic timer interrupts, which matters when you’re chasing micro-stalls on isolated CPUs.
  • Fact 5: irqbalance was created to distribute interrupts across CPUs for throughput, not to protect your low-latency cores from noisy IRQs.
  • Fact 6: In virtualization, the vCPU is a host thread. If it’s preempted by a long softirq, your guest experiences it as “CPU steal” or just “slow.”
  • Fact 7: The rise of NVMe reduced storage latency enough that CPU scheduling and interrupts became the new bottleneck in many stacks.
  • Fact 8: SMT/Hyper-Threading complicates pinning because two “CPUs” share execution resources; pinning to siblings can increase jitter under contention.
  • Fact 9: cgroups evolved from CPU shares to more precise controllers; cpuset is powerful, and easy to misuse like a sledgehammer in a watch shop.

How pinning breaks: the main latency failure modes

1) You pinned vCPUs onto CPUs that are interrupt hotspots

A latency-sensitive VM pinned to a core that receives NIC interrupts is a special kind of self-harm.
Under load, softirq processing can dominate. Your vCPU thread is runnable, but the CPU is busy doing network work for the host.
The guest sees random pauses and jitter.

2) You pinned across NUMA nodes without controlling memory locality

You can pin vCPUs to CPUs on two sockets while the guest memory ends up allocated mostly on one node. Now half your vCPUs
do remote memory access. Remote memory isn’t always disastrous, but it’s rarely stable. You can get tail spikes when remote bandwidth saturates.

3) You pinned, then oversubscribed anyway

Pinning doesn’t fix oversubscription; it makes it more rigid. If you pinned multiple busy VMs onto overlapping CPU sets,
you’ve created contention you can’t schedule around. This is how you get “everything is fine until 10:02, then everything is on fire.”

4) You forgot the “other threads” (I/O thread, vhost threads, emulator)

A VM doing heavy storage or networking can be gated by a QEMU I/O thread or vhost threads.
If you pin vCPUs but not the I/O path, you can end up with vCPUs waiting on an unpinned, overworked thread.
That’s not a CPU problem in the guest. It’s an architecture problem on the host.

5) SMT sibling collisions

Pinning a VM to “two cores” that are actually two threads on the same physical core can reduce throughput and increase jitter.
For latency work, you typically want full cores, not siblings, unless you have a strong reason and measured evidence.

6) Frequency scaling and power limits

Pinning reduces scheduler mobility, which can interact badly with boost behavior.
If your pinned CPUs are the ones that downclock first under thermal or power constraints,
you’ll see slowdowns that look like “mysterious random latency.” They’re not random; they’re policy.

Quote (paraphrased idea), attributed: Werner Vogels often repeats a reliability principle: “Everything fails; design so it’s safe and recoverable when it does.”
Pinning is a reliability decision too—make it recoverable, not fragile.

Fast diagnosis playbook

When latency spikes on a Proxmox host with “pinned” VMs, you want answers in minutes, not after a week of interpretive performance art.
Here’s the order that usually finds the culprit fastest.

First: prove it’s CPU scheduling/interrupts, not storage

  1. Check host CPU saturation and steal-like symptoms: look for runnable backlog and per-CPU utilization.
  2. Check softirq/irq time: if softirq is high on the pinned CPUs, that’s your villain.
  3. Check storage latency: if disks are fine and CPU is not, stop blaming ZFS out of habit.

Second: verify the pinning is real and complete

  1. Which threads are pinned (vCPU only? all QEMU threads?)
  2. Are pinned CPUs shared with other VMs or host work?
  3. Are you pinning to SMT siblings by accident?

Third: check NUMA locality

  1. Are vCPUs spread across nodes?
  2. Is memory allocated on the same node(s)?
  3. Is the VM large enough that remote memory penalties are inevitable?

Fourth: confirm power/frequency behavior

  1. Governor and current frequencies under load
  2. Power cap/thermal throttling events

Fifth: only then adjust pinning

Pinning is the last step because it’s easy to change and hard to reason about after the fact.
Diagnose first; then tighten constraints in small, reversible moves.

Hands-on tasks: commands, outputs, and decisions (12+)

These are the checks I run on a Proxmox host when someone tells me, “We pinned CPUs and latency got worse.”
Each task includes: a command, what the output means, and the decision you make.

Task 1: Identify the VM’s QEMU process and threads

cr0x@server:~$ pgrep -a qemu-system
21433 /usr/bin/kvm -id 101 -name vm-db01 -m 32768 -smp 8,sockets=1,cores=8,threads=1 ...

Meaning: You have the PID for the VM. Pinning affects this process and its threads.
Decision: Use PID 21433 for subsequent affinity and scheduling checks.

Task 2: See per-thread CPU usage (find vCPU threads, iothread, vhost)

cr0x@server:~$ top -H -p 21433 -b -n 1 | head -n 20
top - 10:12:41 up 32 days,  2:10,  2 users,  load average: 6.12, 5.88, 5.77
Threads:  36 total,   6 running,  30 sleeping,   0 stopped,   0 zombie
%Cpu(s): 62.3 us,  7.2 sy,  0.0 ni, 28.9 id,  1.1 wa,  0.0 hi,  0.5 si,  0.0 st
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
21433 root      20   0 12.3g  1.1g  19772 S  10.0   3.6   2:41.33 kvm
21445 root      20   0 12.3g  1.1g  19772 R  98.7   3.6  22:18.91 CPU 0/KVM
21446 root      20   0 12.3g  1.1g  19772 R  95.4   3.6  21:55.07 CPU 1/KVM
21447 root      20   0 12.3g  1.1g  19772 S  12.3   3.6   1:10.10 IOThread

Meaning: vCPU threads (CPU 0/KVM, CPU 1/KVM, etc.) and an IOThread are visible. IOThread is doing work too.
Decision: If you pin only vCPUs but IOThread is busy, you must consider pinning IOThread (or giving it a clean CPU) as well.

Task 3: Check current CPU affinity for the QEMU process

cr0x@server:~$ taskset -cp 21433
pid 21433's current affinity list: 2-5

Meaning: The QEMU process is constrained to CPUs 2-5. That’s only four CPUs worth of scheduling space.
Decision: Confirm this matches intended vCPU count and that CPUs 2-5 are not interrupt-heavy or shared with other critical work.

Task 4: Check affinity per thread (this is where “we pinned it” often falls apart)

cr0x@server:~$ for t in $(ls /proc/21433/task); do echo -n "$t "; taskset -cp $t | sed 's/^.*: //'; done | head
21433 2-5
21445 2-5
21446 2-5
21447 0-31
21448 0-31

Meaning: vCPU threads are pinned to 2-5, but IOThread (21447) and another helper thread are not pinned.
Decision: Either pin all relevant QEMU threads intentionally, or accept that the unpinned threads can bottleneck and introduce jitter.

Task 5: Find which CPUs are SMT siblings (avoid accidental half-core pinning)

cr0x@server:~$ lscpu -e=CPU,CORE,SOCKET,NODE | head
CPU CORE SOCKET NODE
0   0    0      0
1   0    0      0
2   1    0      0
3   1    0      0
4   2    0      0
5   2    0      0

Meaning: CPUs 2 and 3 share CORE 1, CPUs 4 and 5 share CORE 2. If you pin to 2-5, you’re using two physical cores (with SMT).
Decision: For latency-sensitive VMs, prefer pinning to full cores (e.g., 2,4,6,8…) rather than sibling pairs, unless you measured and liked the result.
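The sibling check above can be turned into a pin-set builder: keep the first logical CPU listed for each physical core. The topology here is embedded from the sample output so the logic is demonstrable; on a real host, pipe `lscpu -e=CPU,CORE,SOCKET,NODE` in directly:

```shell
# Build a sibling-free pin set: keep the first logical CPU per CORE id.
# On a real host, replace the sample with:
#   lscpu -e=CPU,CORE,SOCKET,NODE | awk 'NR>1 && !seen[$2]++ {print $1}'
topology='CPU CORE SOCKET NODE
0   0    0      0
1   0    0      0
2   1    0      0
3   1    0      0
4   2    0      0
5   2    0      0'

# awk keeps the first CPU it sees for each CORE id; paste joins with commas
full_cores=$(printf '%s\n' "$topology" | awk 'NR>1 && !seen[$2]++ {print $1}' | paste -sd, -)
echo "$full_cores"   # one CPU per physical core: 0,2,4
```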

Task 6: Check interrupts per CPU (catch IRQ hotspots)

cr0x@server:~$ egrep -i 'CPU|eth0|nvme|mlx|ixgbe|virtio' /proc/interrupts | head -n 15
           CPU0       CPU1       CPU2       CPU3       CPU4       CPU5
  45:          0          0    9123456          0          0          0  IR-PCI-MSI 524288-edge  eth0-TxRx-0
  46:          0          0          0    8876543          0          0  IR-PCI-MSI 524289-edge  eth0-TxRx-1
  97:      11234      11876      10987      11022      11301      11450  IR-PCI-MSI 0000:01:00.0  nvme0q0

Meaning: CPU2 and CPU3 are taking the bulk of NIC interrupts. If your VM is pinned to 2-5, half its “dedicated” capacity is handling packets.
Decision: Move VM pinning away from interrupt-heavy CPUs, or move IRQ affinity away from the VM CPUs. Don’t share by accident.
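Moving an IRQ is done by writing to `/proc/irq/<n>/smp_affinity_list` (which takes CPU lists like `0-1`) or `smp_affinity` (which takes a hex bitmask). A small helper for the mask form, pure string math so it runs anywhere:

```shell
# Helper for IRQ steering: convert a CPU list ("0-1,4") into the hex
# bitmask that /proc/irq/<n>/smp_affinity expects. (The *_list variant
# takes the list form directly; this covers hosts and scripts that
# still use masks.)
cpulist_to_mask() {
    mask=0
    old_ifs=$IFS; IFS=,
    for part in $1; do
        lo=${part%-*}; hi=${part#*-}    # "4" yields lo=hi=4
        cpu=$lo
        while [ "$cpu" -le "$hi" ]; do
            mask=$(( mask | (1 << cpu) ))
            cpu=$(( cpu + 1 ))
        done
    done
    IFS=$old_ifs
    printf '%x\n' "$mask"
}

cpulist_to_mask 0-1    # housekeeping CPUs from the example: prints 3
cpulist_to_mask 2-5    # the VM's pin set: prints 3c
```

On the sample host above, `echo $(cpulist_to_mask 0-1) > /proc/irq/45/smp_affinity` would steer the eth0 queue off the VM’s CPUs. Two caveats: you need root, and irqbalance may move the IRQ right back unless you stop it or configure its banned-CPU list.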

Task 7: Check softirq pressure per CPU (network bursts show up here)

cr0x@server:~$ awk 'NR==1||/NET_RX|NET_TX|SCHED|RCU/ {print}' /proc/softirqs
                    CPU0       CPU1       CPU2       CPU3       CPU4       CPU5
NET_RX            123456     234567    9988776    8877665     345678     456789
NET_TX             98765      87654    5432109    4321098      76543      65432
SCHED            456789     567890     678901     789012     890123     901234
RCU               34567      45678      56789      67890      78901      89012

Meaning: NET_RX/NET_TX are much higher on CPU2/CPU3. That’s consistent with IRQ mapping.
Decision: If your latency spikes correlate with network activity, treat IRQ/softirq placement as a first-class tuning item before touching more pinning.

Task 8: Check load vs runnable threads (are you CPU-starved on pinned CPUs?)

cr0x@server:~$ vmstat 1 5
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 7  0      0 812344  90212 4231120    0    0     2    31 8123 19022 64  9 26  1  0
 9  0      0 811920  90212 4231400    0    0     0     0 9100 22011 71 11 17  1  0

Meaning: The “r” column (runnable) is high. If the VM is pinned to a small CPU set, runnable backlog may be concentrated there.
Decision: If runnable exceeds available CPUs persistently, unpin or expand the CPU set, or reduce vCPU count to match real capacity.

Task 9: See per-CPU utilization and steal-like stalls

cr0x@server:~$ mpstat -P ALL 1 3
Linux 6.2.16-xx (pve01)  02/04/2026  _x86_64_  (32 CPU)

12:14:01 PM  CPU   %usr %nice %sys %iowait %irq %soft %steal %idle
12:14:02 PM    2   68.0  0.0  9.0    0.0   2.0  21.0    0.0   0.0
12:14:02 PM    3   61.0  0.0  8.0    0.0   1.0  29.0    0.0   1.0
12:14:02 PM    4   55.0  0.0  6.0    0.0   0.0   4.0    0.0  35.0

Meaning: CPU2/CPU3 have huge %soft. Those CPUs are doing network softirq work instead of running your pinned vCPUs.
Decision: Rebalance IRQs or move the VM. Pinning onto softirq-heavy CPUs is how you manufacture jitter.

Task 10: Confirm NUMA topology and whether the VM spans nodes

cr0x@server:~$ numactl -H | head -n 20
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
node 0 size: 128755 MB
node 0 free:  42110 MB
node 1 cpus: 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
node 1 size: 128705 MB
node 1 free:  39876 MB

Meaning: Two NUMA nodes. If you pin a VM to CPUs across both nodes without ensuring memory locality, remote accesses will happen.
Decision: Keep latency-sensitive VMs within one NUMA node when possible, or enable/verify NUMA-aware configuration and measure.

Task 11: Inspect the QEMU process NUMA memory policy (is memory local?)

cr0x@server:~$ grep -E 'Mems_allowed_list|Cpus_allowed_list' /proc/21433/status
Cpus_allowed_list:	2-5
Mems_allowed_list:	0-1

Meaning: CPU set is 2-5 (node 0), but memory is allowed on both nodes. That can cause cross-node memory allocations depending on pressure.
Decision: If you’re pinning tightly, also control memory locality. Otherwise you get the worst combo: fixed CPUs, wandering memory.
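To see where memory actually landed (not just where it’s allowed), you can total resident pages per NUMA node from `/proc/<pid>/numa_maps`. Sample lines stand in for the real file so the parsing is demonstrable; on the host, point it at `/proc/21433/numa_maps`:

```shell
# Sketch: sum a process's resident pages per NUMA node from the
# N<node>=<pages> fields of numa_maps. Sample data is illustrative.
sample='7f1c40000000 default anon=2048 dirty=2048 N0=1536 N1=512
7f1c80000000 default anon=4096 dirty=4096 N0=4096'

printf '%s\n' "$sample" | awk '{
    for (i = 1; i <= NF; i++)
        if ($i ~ /^N[0-9]+=/) { split($i, a, "="); pages[a[1]] += a[2] }
} END { for (k in pages) printf "%s: %d pages\n", k, pages[k] }' | sort
```

If the totals skew heavily toward the node your pinned CPUs are *not* on, you’ve found your wandering memory.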

Task 12: Check CPU governor and current frequency behavior

cr0x@server:~$ cat /sys/devices/system/cpu/cpu2/cpufreq/scaling_governor
powersave

Meaning: The governor is “powersave.” That can be fine on some platforms, but it’s often bad for bursty latency-sensitive workloads.
Decision: Consider switching to “performance” (or a tuned policy) for hosts running latency-critical VMs, then verify frequency under load.
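If you do decide to switch, a guarded sweep over all CPUs is safer than editing one file and assuming the rest followed. This no-ops where cpufreq isn’t exposed (many VMs and containers); on real hardware run it as root and verify frequencies under load afterwards:

```shell
# Sketch: set every CPU's governor to "performance". The -w guard
# skips CPUs (or whole hosts) where cpufreq isn't writable.
changed=0
for g in /sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor; do
    [ -w "$g" ] || continue
    echo performance > "$g" && changed=$((changed + 1))
done
echo "governors set to performance: $changed"
```

Remember this does not survive reboots by itself; persist it via your distribution’s mechanism (systemd unit, tuned profile, or similar) once you’ve confirmed it helps.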

Task 13: Check for throttling / power limit hints (when available)

cr0x@server:~$ dmesg -T | egrep -i 'thrott|powercap|thermal' | tail -n 5
[Mon Feb  3 22:11:05 2026] CPU0: Package temperature above threshold, cpu clock throttled
[Mon Feb  3 22:11:52 2026] CPU0: Core temperature above threshold, cpu clock throttled

Meaning: If you see throttling, your “pinning tuning” is rearranging deck chairs. The CPU is literally slowing down.
Decision: Fix cooling/power limits before micro-optimizing scheduler behavior.

Task 14: Check ZFS latency and txg behavior (so storage doesn’t get blamed incorrectly)

cr0x@server:~$ zpool iostat -v 1 3
              capacity     operations     bandwidth
pool        alloc   free   read  write   read  write
rpool       1.12T  2.38T    120    310   8.2M  21.1M
  nvme0n1   1.12T  2.38T    120    310   8.2M  21.1M

Meaning: No obvious storage saturation in this snapshot. If VM latency spikes while ZFS is calm, look back to CPU/interrupts.
Decision: Don’t “fix ZFS” just because it’s ZFS. Fix the bottleneck you can measure.

Task 15: Confirm Proxmox VM CPU configuration (vCPU count, sockets/cores)

cr0x@server:~$ qm config 101 | egrep 'name|cores|sockets|cpu|numa|balloon'
name: vm-db01
cores: 8
sockets: 1
cpu: x86-64-v2-AES
numa: 1
balloon: 0

Meaning: NUMA is enabled, 8 cores configured. If you pinned the process to 4 host CPUs (Task 3), you’ve created an obvious mismatch.
Decision: Align vCPU sizing and pinning. If you want 8 vCPUs, give it scheduling room for 8, or reduce vCPU count to fit reality.

Three corporate mini-stories from the pinning trenches

Mini-story 1: The incident caused by a wrong assumption (“Pinned means dedicated”)

A mid-sized SaaS shop ran Proxmox for internal services: CI runners, artifact storage, monitoring, a couple of database clusters.
One day the primary database VM started showing periodic query stalls. Not slow queries—stalls. Connections would hang for a second or two,
then recover. The graphs looked like a saw blade: calm, spike, calm, spike.

The team had recently “hardened performance” by pinning the database VM’s vCPUs to four host CPUs. The assumption was straightforward:
pinned equals dedicated, dedicated equals stable. They also assumed that if the VM had four pinned CPUs, it couldn’t be impacted by other VMs.
That assumption is how you get to spend your afternoon reading /proc like a mystery novel.

The root cause wasn’t the database, or even the VM configuration. It was interrupt placement.
The NIC’s busiest RX/TX queues were landing on the same CPUs the VM was pinned to. Under backup traffic and CI artifact uploads,
softirq time spiked on those cores. The vCPU threads were runnable, but “their” CPUs were stuck doing packet work.
The guest experienced it as random pauses.

Fixing it was boring: move IRQ affinity away from the VM CPUs, and stop pretending pinning is isolation.
They also expanded the VM’s allowed CPU set to include a couple of additional clean cores, trading a bit of cache locality for the ability to escape.
The stalls vanished. Postmortem lesson: pinning is a commitment to the host, not a protection from it.

Mini-story 2: The optimization that backfired (“We pinned for cache warmth”)

Another organization ran latency-sensitive message brokers in VMs. They wanted lower tail latency during traffic bursts.
Someone proposed pinning each broker VM to a small set of CPUs “for cache warmth.” The plan looked neat on a whiteboard:
two VMs per socket, each with pinned vCPUs, no wandering.

It worked in a synthetic test. Then production happened. Real traffic isn’t a benchmark; it’s a series of small disasters with timestamps.
During bursts, one broker VM would saturate its pinned CPUs and build a queue. The scheduler couldn’t borrow idle time from other CPUs
because the process was constrained. Latency climbed and then stayed high longer than expected, because the queue needed time to drain.

Meanwhile, other CPUs on the host were partially idle. The host had capacity; the VM wasn’t allowed to use it.
The optimization traded “warm caches” for “hard walls.” Warm caches are nice. Dropped messages and timeouts are less charming.

The rollback was instructive: they removed hard pinning, kept sane CPU shares, and focused on reducing host contention:
fewer noisy neighbors per node, better IRQ distribution, and NUMA-aware placement.
Their median latency barely changed. Their tail latency improved. The result wasn’t heroic; it was predictable.

Mini-story 3: The boring but correct practice that saved the day (“Measure, then isolate only what matters”)

A finance-adjacent company ran a Proxmox cluster for mixed workloads: batch jobs, internal web apps, a few latency-sensitive services.
They had a practice that sounded dull in meetings: every “performance change” required a before/after capture of CPU, interrupts,
and storage latency at the host level, plus a rollback plan that could be executed in minutes.

During a peak period, a critical VM started showing jitter. The knee-jerk reaction from an application team was to demand CPU pinning.
The infrastructure team didn’t refuse—politely—but insisted on running their capture first.
Within 15 minutes they found the issue: the VM was fine; the host had a periodic spike in softirq caused by a change in NIC queue settings.

They corrected the NIC/IRQ distribution, verified softirq time normalized, and the VM jitter disappeared without touching vCPU affinity.
Nobody got to brag about “tuning.” The system just behaved again, which is the entire point.

The boring practice that saved them was procedural: measure host contention first, treat pinning as a surgical tool,
and keep changes reversible. It prevented a week of superstitious pinning experiments and the kind of “works on one host” drift
that makes clusters hard to operate.

Joke #2: Pinning without measuring is like setting your watch by staring at it harder. You feel productive; time remains unimpressed.

Common mistakes: symptoms → root cause → fix

1) Symptom: p99 latency spikes after pinning, while average CPU looks fine

Root cause: You pinned onto IRQ/softirq-heavy CPUs, or you constrained the VM so it can’t escape transient host interference.

Fix: Move IRQ affinity away from those CPUs and/or expand the allowed CPU set. If you truly need “dedicated” cores, implement CPU isolation properly.

2) Symptom: VM feels “stuttery” during network traffic bursts

Root cause: Softirq processing (NET_RX/NET_TX) stealing time from your pinned vCPUs.

Fix: Rebalance NIC queues/IRQs, consider RSS settings, and keep latency VMs off those cores.

3) Symptom: You pinned an 8-vCPU VM but it behaves like a 4-vCPU VM

Root cause: QEMU process affinity limited to fewer host CPUs than vCPU count, or SMT siblings masquerading as “cores.”

Fix: Align vCPU count with physical cores (not threads) and the allowed CPU set. Reduce vCPU count if you can’t provide real capacity.

4) Symptom: Performance is inconsistent across hosts in the same cluster

Root cause: Different BIOS power settings, CPU governors, microcode, or IRQ layouts. Pinning magnifies these differences.

Fix: Standardize firmware settings, kernel parameters, and performance policy across nodes. Validate with the same host-level commands.

5) Symptom: Storage latency blamed, but disks look idle

Root cause: CPU contention in I/O completion path (kworkers, softirq, vhost), not media latency.

Fix: Measure CPU %soft and per-thread utilization; ensure I/O threads aren’t stuck on crowded CPUs; check IRQ placement for NVMe and NIC.

6) Symptom: “Pinning helped once, then got worse after a kernel update”

Root cause: Scheduler, IRQ defaults, or driver behavior changed; your pinning depends on fragile assumptions.

Fix: Re-validate interrupts, softirqs, and CPU topology after updates. Treat pinning configs as per-kernel/per-platform artifacts, not timeless truths.

7) Symptom: VM is fast until another VM on the host gets busy

Root cause: Overlapping pin sets or shared SMT siblings. You built noisy neighbors into the same physical resources.

Fix: Ensure CPU sets don’t overlap for critical VMs, and avoid sibling contention. If you can’t avoid overlap, don’t pin; rely on shares and headroom.

Checklists / step-by-step plan

Checklist A: Decide if pinning is even warranted

  1. Is the host CPU oversubscribed? If yes, fix that first. Pinning won’t invent capacity.
  2. Do you have proven IRQ/softirq interference? If yes, fix IRQ placement before pinning.
  3. Is NUMA locality currently broken? If yes, fix placement and memory locality before pinning.
  4. Is the workload latency-sensitive or throughput-sensitive? If throughput, pinning often reduces peak throughput by limiting scheduling flexibility.
  5. Can you roll back quickly? If no, don’t do it in production. Pinning changes are deceptively high-risk because they “sort of work” while being wrong.

Checklist B: If you pin, do it as a bundle (the “make it true” plan)

  1. Pick full physical cores (avoid SMT siblings for the primary pin set unless you’ve measured it).
  2. Keep it within a NUMA node when possible: vCPUs and memory should live together.
  3. Move interrupts away from those cores (NIC and NVMe are usual suspects).
  4. Account for non-vCPU threads: IOThread, vhost threads, emulator threads. Don’t starve the I/O path.
  5. Set a frequency policy that matches latency needs and validate under real load.
  6. Leave escape hatches if you can: a slightly larger CPU set can reduce tail latency in bursty environments.

Checklist C: Rollout and validation (don’t trust your first result)

  1. Capture baseline: mpstat, /proc/interrupts, /proc/softirqs, VM thread utilization, storage iostat.
  2. Change one thing: pinning or IRQ affinity or NUMA policy—not all at once.
  3. Measure p95/p99 in the guest and %soft/%irq on the host.
  4. Keep the rollback command ready and tested.
  5. Re-check after a reboot: IRQ distribution and CPU topology can shift.

FAQ

1) Should I pin CPUs for every VM in Proxmox?

No. Most VMs benefit from the scheduler’s flexibility. Pin only when you have a measured reason: predictable isolation needs, licensing constraints,
or a latency-sensitive workload where you can also control interrupts and NUMA locality.

2) Why did pinning improve throughput but worsen latency?

Throughput can improve from cache locality and reduced migrations. Tail latency can worsen because the VM can’t escape transient interference
on its pinned CPUs. Burstiness exposes the downside.

3) Is “CPU limit” the same as pinning?

No. CPU limit is throttling (cgroup quota). Pinning is placement (affinity/cpuset). Throttling can add latency by forcing periodic sleeps.
Placement can add latency by preventing migration away from contention. Different knives, different cuts.

4) Does enabling NUMA in the VM fix NUMA problems?

It can help, but it’s not magic. You still need to ensure the host places vCPUs and memory sensibly. Verify with host NUMA tools and observe
whether the VM’s CPU set maps cleanly to one node.

5) What’s the simplest way to tell if interrupts are stealing my pinned CPUs?

Look at /proc/interrupts and /proc/softirqs, then correlate with mpstat %irq/%soft on the pinned CPUs.
If pinned CPUs show high softirq during latency spikes, you have your answer.

6) Should I disable SMT/Hyper-Threading for low latency?

Sometimes. SMT can increase throughput but also adds contention and jitter in some workloads. Before disabling globally, try pinning to one thread per core
(avoid siblings) and measure. Disabling SMT is a bigger hammer with broader consequences.

7) Can pinning cause time drift or clock issues inside the guest?

Pinning itself doesn’t directly cause clock drift, but it can increase scheduling delays which can affect time-sensitive applications.
Ensure guest time sync is correct and avoid CPU throttling policies that introduce periodic stalls.

8) My VM is pinned and still slow. What now?

Validate the pinning is complete (threads), verify no IRQ hotspots on those CPUs, check NUMA locality, check frequency scaling and throttling,
then confirm you’re not simply CPU-starved due to workload demand. Pinning doesn’t fix “needs more CPU.”

9) Is it better to pin fewer vCPUs and scale vertically, or more vCPUs and rely on scheduling?

For many latency-sensitive services, fewer well-utilized vCPUs with headroom beats many vCPUs fighting over constrained cores.
Right-size based on runnable backlog and application concurrency. Measure, don’t guess.

Practical next steps

If you remember one thing: pinning is not isolation. It’s a constraint. Constraints raise the cost of mistakes.
If you’re going to constrain the scheduler, you owe it a clean environment: interrupts where you want them, NUMA locality that makes sense,
and enough CPU headroom that bursts don’t turn into queueing theory homework.

  1. Run the fast diagnosis playbook and identify whether the spikes correlate with %soft/%irq, runnable backlog, or storage latency.
  2. Audit pinning completeness: vCPU threads, IOThread, and any helper threads that matter for your I/O path.
  3. Stop pinning onto IRQ hotspots. If you must pin, pair it with IRQ affinity hygiene.
  4. Align pinning with physical cores and NUMA nodes. Avoid accidental SMT sibling pinning for latency-critical work.
  5. Make changes one at a time and keep rollbacks ready. Your future self will be tired and grateful.

If you do those steps, you’ll pin less often—and when you do pin, it’ll be for a reason you can explain without superstition.
That’s the whole job.
