Proxmox: The ‘Ballooning’ Setting That Creates Fake Memory Pressure

Everything is “fine” until it isn’t: the VM has plenty of RAM configured, CPU is bored, disk is fast, and yet latency crawls. Your database starts acting like it’s running on a USB stick. The guest reports memory pressure, swap grows, caches vanish, and your graphs look like a cardiogram.

Then you notice it: ballooning is enabled. Someone thought it was “free efficiency.” What it actually bought you was synthetic scarcity—memory pressure manufactured by the hypervisor that the guest OS cannot reason about, only react to.

Ballooning in one sentence (and why you should be suspicious)

Memory ballooning lets the host take RAM back from a running VM: a driver inside the guest allocates memory and hands it to the host, forcing the guest to reclaim its own pages to satisfy the allocation.

That sounds civilized—like borrowing a chair at a party. In practice, the host is grabbing cushions off the couch while you’re still sitting on it, and the guest has no idea whether the cushions were “free” cache or critical working set.

Opinionated rule: if you run anything stateful or latency-sensitive (databases, search, brokers, CI runners with hot caches, ZFS inside guests, Windows file servers), ballooning is usually a tax you don’t need. Buy RAM. Or shrink VMs intentionally. Don’t let the hypervisor do surprise “diet plans” mid-workload.

A few facts and historical context (6–10 quick hits)

  • Ballooning predates the cloud hype. VMware popularized ballooning in early ESX days as a practical tool for memory overcommit when RAM was expensive and consolidation was the selling point.
  • KVM ballooning is typically implemented via virtio-balloon. The guest runs a driver that allocates pages and hands them to the host, which can then reclaim the backing memory.
  • Ballooning targets “unused” memory… as defined by the guest. But what the guest calls “unused” includes file cache that may be extremely valuable for performance.
  • Linux got better at reclaim heuristics over time, but it’s still guessing. Reclaim, swapping, and cache drop decisions depend on workload and kernel policy, not your SLOs.
  • Windows supports ballooning too, but visibility is different. Driver behavior and performance counters don’t always align with what the hypervisor thinks it reclaimed.
  • Ballooning is not the same as memory hotplug. Hotplug changes the guest’s view of “installed” memory; ballooning keeps installed memory the same and creates pressure inside it.
  • Transparent Huge Pages can amplify the mess. If the guest uses THP heavily, reclaim can fragment memory, increase compaction work, and create CPU overhead spikes during pressure.
  • Hypervisors also have a more brutal tool: swapping. If the host overcommits beyond what ballooning can reclaim, the host can swap guest pages—often worse than guest swap because the guest can’t optimize it.
  • Proxmox makes ballooning easy to enable, which is both convenient and dangerous. It’s a checkbox that looks like savings until you pay in tail latency.

What ballooning really does inside Proxmox + KVM

Proxmox VE is a management layer over KVM/QEMU. When you set a VM’s memory, there are two numbers that matter in practice:

  • Memory (max): what QEMU can give the VM (and what the guest thinks it has installed).
  • Balloon (min/target): how low Proxmox is allowed to push the VM under pressure or policy.

With ballooning enabled, QEMU uses the virtio-balloon device. The guest driver inflates by allocating memory pages and reporting them to the host as “don’t need these.” The host can then reuse the underlying physical RAM for other VMs or itself.

There are two key practical consequences:

  1. The guest still believes it has the same amount of RAM installed. The OS doesn’t get a polite “you now have 12 GB instead of 16 GB.” It just sees memory pressure and reacts.
  2. The hypervisor is making a performance decision using incomplete information. It can’t see which guest pages are hot in the application sense; it only sees pages. So it creates pressure and lets the guest panic responsibly.

Ballooning can be fine for lightly used general-purpose VMs—think random utility servers that idle most of the day. It’s a gamble for anything with caches or steady-state working sets. Which is most production workloads.

There’s an old operations idea (paraphrased) from Gene Kranz: discipline and responsible behavior beat improvisation when systems get weird. Ballooning is improvisation disguised as policy.

How “fake” memory pressure becomes real pain

Let’s be clear: the memory pressure is “fake” only in origin. The pain is completely real.

1) The guest drops page cache first (and you notice later)

Linux will happily reclaim file cache under pressure. That’s normal and often correct. But if your performance relies on cache warmth—databases reading indexes, search nodes scanning segments, CI pulling dependencies—ballooning causes cache churn that looks like random disk performance regression.

What makes this nasty: you may not see disk saturation. You’ll see latency. More IOPS, smaller reads, more metadata work. ZFS on the host gets busier too because the guest is asking for data it used to keep hot.

2) Then the guest starts swapping (and you blame the wrong layer)

Once caches are gone, reclaim hits anonymous pages: heap, stacks, JVM memory, database buffers. If the guest has swap enabled (it usually does), the kernel swaps. Your application latency spikes and you start blaming the storage array, the network, or “that one noisy neighbor.” Sometimes the noisy neighbor is your own hypervisor policy.

Joke #1: Ballooning is like “hot desking” for RAM—great until you come back from lunch and your chair is running swap.

3) Host pressure leads to host swapping (which is worse)

If the host is overcommitted and ballooning can’t reclaim enough fast enough, the host may swap. Host swap is brutal because the guest doesn’t know which of its pages got evicted. The guest thinks memory is still resident; it keeps running into hidden major faults at the hypervisor layer. Troubleshooting becomes a game of shadows.

4) The hypervisor’s reclaim timing is not your workload timing

Ballooning isn’t synchronized with your load patterns. It can inflate right before your batch job starts, right when your queue spikes, or during compaction-heavy phases. You don’t get a vote. Your application gets to discover the new reality one page fault at a time.

5) It can turn “safe” overprovision into chaotic overload

There’s a legitimate use case for overcommit: on a large fleet, not all VMs peak at once. But ballooning lets you push the system into a region where a correlated peak (deploy day, monthly jobs, incident failover) creates sudden contention. That contention isn’t shared fairly. It’s shared based on balloon targets and who gets squeezed first.

Who should use ballooning (rarely) and who should not (most of you)

Ballooning can be acceptable when:

  • VMs are mostly idle and stateless (jump boxes, low-traffic internal tools).
  • You have strong, boring capacity management and ballooning is just a safety net.
  • The guest OS and applications tolerate reclaim without tail-latency SLOs.
  • You have host RAM headroom and you’re using ballooning primarily to reduce waste, not to enable aggressive consolidation.

Ballooning is usually a bad idea when:

  • You run databases (PostgreSQL, MySQL, MSSQL), search (Elasticsearch/OpenSearch), message brokers (Kafka/RabbitMQ), caches (Redis), or JVM-heavy services.
  • You rely on filesystem cache as a performance feature (most Linux services do, whether you admit it or not).
  • You run ZFS on the host and already depend on RAM for ARC. Now you’re juggling two hungry cache layers.
  • You need predictable performance more than you need “high utilization.”

My default recommendation: disable ballooning for critical VMs, size memory intentionally, keep host memory headroom, and treat overcommit like a controlled substance.

Fast diagnosis playbook

If performance is tanking and you suspect ballooning, don’t boil the ocean. Triage in this order:

First: is the host actually under memory pressure?

  • Check host free memory, swap use, and whether kswapd is active.
  • If the host is swapping, fix that before you blame guests.

Second: are the guests reclaiming/swapping because ballooning shrank them?

  • Check per-VM balloon status and the gap between max memory and current ballooned memory.
  • Inside the guest, check major faults, swap in/out, and reclaim activity.

Third: is storage latency a symptom of cache collapse?

  • Check guest cache hit patterns indirectly: increased read IOPS, metadata IO, random reads, and elevated latency without throughput saturation.
  • On ZFS hosts, check ARC behavior and memory pressure events.

Fourth: decide the least risky immediate action

  • If ballooning is squeezing a critical VM: raise its balloon target (or disable ballooning) and ensure the host has headroom.
  • If the host is overcommitted: migrate load, add RAM, reduce allocations, or shut down noncritical VMs.
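The first two triage steps can be collapsed into a tiny helper. This is a sketch with an embedded /proc/meminfo sample and an illustrative 20% threshold (my assumption, not a Proxmox default); in real use, feed it the live file.

```shell
# Flag low host memory headroom from /proc/meminfo-style input.
# The 20% threshold is an illustrative assumption; tune per environment.
mem_check() {
  awk '
    /^MemTotal:/     { total = $2 }
    /^MemAvailable:/ { avail = $2 }
    END {
      pct = avail * 100 / total
      printf "available: %.0f%%\n", pct
      if (pct < 20) print "WARN: below 20% headroom"
    }'
}

# Embedded sample; in production: mem_check < /proc/meminfo
mem_check <<'EOF'
MemTotal:       65011712 kB
MemAvailable:    9175040 kB
EOF
```

If this warns during an incident, jump straight to the second question: which VMs are ballooned, and by how much.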

Practical tasks: commands, outputs, and decisions (12+)

These are the checks I actually run on Proxmox hosts and inside guests. Each task includes: command, what the output means, and the decision you make.

Task 1: Check host memory + swap at a glance

cr0x@server:~$ free -h
               total        used        free      shared  buff/cache   available
Mem:            62Gi        41Gi       1.8Gi       1.2Gi        20Gi        18Gi
Swap:           16Gi       6.5Gi       9.5Gi

Meaning: “available” is what matters for Linux host comfort. Swap use on the host is already non-trivial.

Decision: If host swap is growing during the incident, treat it as a P0. Reduce load or increase RAM before tuning guests.

Task 2: Identify whether the host is actively reclaiming

cr0x@server:~$ vmstat 1 5
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  0 6815744 1887340  9120 16720256  12  210   320   410  980 2100 12  6 77  5  0
 1  1 6817780 1750020  9152 16611020   0  540   410   920 1020 2400 10  7 73 10  0
 3  0 6820100 1628800  9152 16500100   0  820   390   800 1100 2600 14  9 67 10  0
 2  0 6821100 1522100  9152 16388900   0  600   420   700 1050 2500 13  8 69 10  0
 1  0 6822100 1489000  9152 16299000   0  480   380   650 1000 2300 11  7 72 10  0

Meaning: so (swap out) is active and wa (IO wait) is climbing. That’s the host pushing pages out.

Decision: Stop the bleeding: migrate a VM, stop noncritical workloads, or add memory. Ballooning won’t save you if you’ve already crossed the line.

Task 3: See top memory consumers on the host

cr0x@server:~$ ps -eo pid,comm,rss,vsz --sort=-rss | head
  PID COMMAND           RSS    VSZ
 4121 qemu-system-x86 8234500 17000000
 3988 qemu-system-x86 8012200 16500000
 4550 qemu-system-x86 6123000 13000000
 1872 pveproxy         320000  550000
 1601 pvestatd         210000  420000
 1023 zfs              180000  300000
  950 pvedaemon        120000  300000
  811 systemd-journal   95000  160000
  621 corosync          82000  210000

Meaning: QEMU processes dominate. That’s expected; now you need to know whether they’re ballooned or pinned.

Decision: Move from “host pressure exists” to “which VMs are driving it and are they being squeezed.”

Task 4: List Proxmox VMs and identify candidates

cr0x@server:~$ qm list
      VMID NAME                 STATUS     MEM(MB)    BOOTDISK(GB) PID
       101 db-prod-1            running    16384      200.00       4121
       102 api-prod-1           running     8192       60.00       3988
       103 ci-runner-1          running     4096       80.00       4550
       120 jumpbox-1            running     2048       20.00       4801

Meaning: You now have VMIDs to query. The “MEM(MB)” column is configured max, not necessarily current ballooned usage.

Decision: Inspect the ones with tight performance needs first (db-prod-1).

Task 5: Check whether ballooning is configured for a VM

cr0x@server:~$ qm config 101 | egrep -i 'memory|balloon'
balloon: 8192
memory: 16384

Meaning: This VM can be pushed down to 8 GB from a max of 16 GB. That’s not a safety net; that’s a second personality.

Decision: For critical services, set balloon: 0 (disable) or raise it close to max if you truly need minimal reclaim.

Task 6: Inspect runtime balloon stats from the host

cr0x@server:~$ qm monitor 101
qm> info balloon
balloon: actual=8704
qm> quit

Meaning: The VM is currently ballooned to ~8.5 GB. If the workload expects 16 GB behavior, you’ve found your villain.

Decision: If you need immediate relief: increase balloon target or disable ballooning and confirm host headroom.

Task 7: Confirm the virtio balloon device exists (guest)

cr0x@server:~$ lsmod | grep balloon
virtio_balloon         24576  0

Meaning: The guest is capable of ballooning. If it’s loaded, it will obey the host’s balloon requests.

Decision: For certain appliances or sensitive workloads, you may remove/blacklist it—but prefer fixing at the Proxmox config layer to avoid surprises later.

Task 8: Check guest memory pressure indicators

cr0x@server:~$ cat /proc/pressure/memory
some avg10=42.31 avg60=28.10 avg300=12.90 total=987654321
full avg10=8.22 avg60=4.10 avg300=1.20 total=12345678

Meaning: PSI shows the system is frequently stalled on memory reclaim (some) and sometimes fully stalled (full), which is a latency killer.

Decision: If PSI is elevated during the incident and ballooning is active, treat ballooning as a likely root cause—not just a correlation.
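If you want to alarm on this rather than eyeball it, the avg10 field is easy to extract. A sketch below, using the sample PSI line from above; the 25.0 threshold is my own illustrative choice, not a kernel default. In real use, read the line with head -n1 /proc/pressure/memory.

```shell
# Pull "some" avg10 from a PSI memory line and flag sustained stalls.
# Threshold (25.0) is an illustrative assumption; tune per workload.
psi_line='some avg10=42.31 avg60=28.10 avg300=12.90 total=987654321'

avg10=$(printf '%s\n' "$psi_line" | sed -n 's/.*avg10=\([0-9.]*\).*/\1/p')
awk -v v="$avg10" 'BEGIN {
  if (v + 0 > 25.0) printf "memory-stall WARN: avg10=%s\n", v
  else              printf "memory-stall ok: avg10=%s\n", v
}'
```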

Task 9: Check guest swapping and reclaim behavior quickly

cr0x@server:~$ vmstat 1 5
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  2 1843200  12000  20000  340000   80  220   900   700 1200 2500 18  9 50 23  0
 2  1 1845500   9000  19000  320000   20  480  1200   900 1300 2700 20 10 44 26  0
 1  1 1849000   8000  17000  300000    0  520  1100  1000 1250 2600 17  8 48 27  0
 2  1 1852000   7000  16000  280000    0  600  1000  1100 1400 2900 22 11 40 27  0
 1  1 1856000   6000  15000  260000    0  430   950   980 1320 2800 19  9 45 27  0

Meaning: so is persistent. The guest is actively swapping out. Cache is shrinking too.

Decision: Stop the swap storm: give the VM real RAM (disable ballooning or increase memory) and then consider tuning swappiness only after stability returns.

Task 10: Check major faults (guest) to detect paging pain

cr0x@server:~$ grep -E 'pgmajfault|pgfault' /proc/vmstat
pgfault 348778901
pgmajfault 129988

Meaning: Major faults require IO. A rising pgmajfault rate during the incident is consistent with swapping or cache misses.

Decision: If major faults rise when balloon shrinks, this is not “application slowness.” It’s memory starvation.

Task 11: Check the host’s ZFS ARC pressure (if using ZFS)

cr0x@server:~$ arcstat 1 3
    time  read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%  arcsz     c
12:01:01   512    88     14    40   45    20   23    28   32   22.1G  24.0G
12:01:02   680   210     30   140   67    30   14    40   19   18.4G  24.0G
12:01:03   700   260     37   190   73    35   13    35   13   16.2G  24.0G

Meaning: ARC size (arcsz) is shrinking while miss% rises. Host memory pressure is forcing ARC eviction, increasing backend IO.

Decision: If both host ARC is collapsing and guests are ballooned, you have a two-layer cache disaster. Add RAM or reduce VM pressure; don’t “tune” your way out of physics.

Task 12: Check per-QEMU memory from the host to spot weirdness

cr0x@server:~$ pmap -x 4121 | tail -n 1
total kB         17123456 8901232 8200000

Meaning: RSS is high; ballooning doesn’t always immediately reflect as lower RSS because of how memory is accounted and reused.

Decision: Use balloon stats and host pressure indicators, not just RSS, to reason about reclaim. RSS alone can mislead you.

Task 13: Check kernel logs for OOM or reclaim warnings (host or guest)

cr0x@server:~$ journalctl -k -S -30min | egrep -i 'oom|out of memory|kswapd|memory pressure' | tail
Feb 04 12:00:12 server kernel: Memory cgroup out of memory: Killed process 2881 (java) total-vm:8123456kB, anon-rss:3456789kB, file-rss:12345kB, shmem-rss:0kB

Meaning: The guest or host may be killing processes if pressure is severe. Ballooning can be a contributor by shrinking the effective memory.

Decision: If you see OOM kills, stop optimizing and start allocating. Fix the memory budget first.

Task 14: Disable ballooning for a VM (controlled change)

cr0x@server:~$ qm set 101 --balloon 0
update VM 101: -balloon 0

Meaning: Ballooning disabled. The VM should now keep its configured memory without hypervisor-driven reclaim.

Decision: Only do this if the host has enough RAM. If the host is already tight, you may simply move the pain elsewhere (host swap).

Task 15: Raise minimum balloon target instead of disabling (compromise)

cr0x@server:~$ qm set 102 --balloon 7168
update VM 102: -balloon 7168

Meaning: VM 102 can be squeezed, but not below 7 GB. This is a controlled “don’t starve it” guardrail.

Decision: Use this for medium-criticality services where some reclaim is acceptable, but swap storms are not.

Task 16: Validate host overcommit posture quickly

cr0x@server:~$ for id in $(qm list | awk 'NR>1 {print $1}'); do echo "vmid: $id"; qm config "$id" | egrep '^(name|memory|balloon):'; done
vmid: 101
name: db-prod-1
memory: 16384
balloon: 0
vmid: 102
name: api-prod-1
memory: 8192
balloon: 7168
vmid: 103
name: ci-runner-1
memory: 4096
balloon: 2048
vmid: 120
name: jumpbox-1
memory: 2048
balloon: 1024

Meaning: You can quickly see who is allowed to be squeezed and by how much.

Decision: Align ballooning policy with service tiers. If a critical VM has a huge balloon range, fix that now, not during the next incident.
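To turn that listing into two numbers, sum the worst case (every VM at max) and the guaranteed floor (every VM squeezed to its balloon minimum, where balloon: 0 means "cannot be squeezed at all"). A sketch over sample config lines; in practice pipe in the real output, stripped of any prefixes so lines start with memory: and balloon:.

```shell
# Sum max commit vs. guaranteed floors across VMs.
# balloon: 0 means ballooning is off, so that VM's floor equals its max.
awk -F': ' '
  /^memory:/  { m = $2; max += m }
  /^balloon:/ { low += ($2 == 0 ? m : $2) }
  END { printf "max commit: %d MB, guaranteed floor: %d MB\n", max, low }
' <<'EOF'
memory: 16384
balloon: 0
memory: 8192
balloon: 7168
memory: 4096
balloon: 2048
EOF
```

Compare both numbers against physical host RAM minus your reserve: if even the floor doesn't fit, no amount of ballooning policy will save you.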

Three corporate mini-stories from the trenches

Mini-story 1: The incident caused by a wrong assumption

A mid-sized SaaS company migrated a set of internal services from old hardware to a clean Proxmox cluster. They did the right things: redundant power, decent SSDs, dual NICs, and backups that actually restored. Then they did one wrong thing: they assumed “ballooning means the VM gives back truly unused memory.”

Their PostgreSQL VM was configured with 32 GB max and 16 GB balloon. On paper, plenty. In reality, the workload relied on Linux page cache and PostgreSQL shared buffers tuned for the higher memory. During business hours, the cluster got a little busy, Proxmox inflated the balloon, and the guest started reclaiming hard.

Symptoms were subtle at first: occasional query spikes, increased checkpoint times, more IO jitter. The team chased storage latency and even swapped NVMe firmware versions. Meanwhile, the guest quietly grew swap usage in the background because it had been trained to believe it “owned” 32 GB even while the balloon kept pulling it down.

The outage came on a Tuesday deploy when traffic and background migrations overlapped. Tail latency crossed their alert threshold, then blew past it. Auto-scaling didn’t help because it scaled application replicas, not the database VM. The database wasn’t CPU-bound; it was drowning in major faults.

Fix was boring and immediate: disable ballooning for the database VM, right-size memory to what the host could truly afford, and keep host headroom. Performance stabilized within minutes. The postmortem’s most painful line was also the most honest: “We treated cache as free.” Cache is never free; you pay for it in RAM or you pay for it in latency.

Mini-story 2: The optimization that backfired

A large enterprise IT shop had a mandate: increase virtualization density. The Proxmox cluster was running dozens of Windows and Linux VMs, many of them “low priority.” Someone suggested enabling ballooning everywhere with aggressive minimums. The idea was to reclaim idle memory and squeeze more VMs onto each node.

It worked—at first. Host memory graphs looked great. Utilization climbed. The weekly report had numbers that made management nod approvingly. Then the monthly patch window hit, and “low priority” stopped being low priority. Everyone patched, rebooted, compiled, scanned, and logged at the same time.

Ballooning inflated across the fleet. Windows guests started paging. Linux guests dropped caches and went into reclaim. The domain controller VM—configured as “not critical” because it wasn’t customer-facing—got squeezed too. Authentication latency rose. Suddenly lots of services couldn’t log in, which made them retry, which increased load, which made reclaim worse. Classic feedback loop, now with extra bureaucracy.

Operations did the usual dance: blame the SAN, blame the network, blame the patch. But the real failure was policy: they had built a system that depended on workloads never being correlated. Enterprises are correlation factories. Patch windows, quarter-end batch jobs, and vulnerability scans are synchronized pain by design.

They rolled back ballooning on infrastructure VMs, set sane minimums for the rest, and created a capacity budget that assumed correlated peaks. Density went down a bit. Incidents went down a lot. The only “optimization” left was the correct one: buy enough RAM for the workloads you actually run.

Mini-story 3: The boring but correct practice that saved the day

A smaller fintech team ran Proxmox for a mix of services: API, a message broker, a couple of databases, and observability. They had one unglamorous rule: no ballooning on tier-0 services, and a hard requirement of 25–30% host memory headroom.

It wasn’t popular during procurement. RAM isn’t cheap, and “unused” headroom looks like waste on a spreadsheet. But the team stuck to it and documented it as a reliability requirement, not a preference.

One day, a developer accidentally triggered a runaway batch job in a noncritical analytics VM. CPU climbed, memory usage climbed, and the VM started eating page cache like popcorn. On a tightly packed host, that could have forced ballooning elsewhere or pushed the host into swap.

Instead, nothing else flinched. The noisy VM got slower, as it deserved. The database stayed stable. The message broker kept its tail latency. They throttled the batch job and moved on with their day.

That’s what good infrastructure looks like: the blast radius is predictable. It’s not exciting. It doesn’t create hero stories. It quietly prevents them.

Common mistakes: symptom → root cause → fix

1) Symptom: VM “has RAM” but still swaps

Root cause: Ballooning shrank the VM’s effective memory; the guest still sees installed RAM but faces reclaim pressure internally.

Fix: Disable ballooning (qm set <vmid> --balloon 0) for that VM or raise the minimum balloon target close to max. Verify host has headroom.

2) Symptom: Random latency spikes, especially after periods of low activity

Root cause: Cache collapse. Ballooning reclaimed memory used as filesystem cache; workload becomes IO-latency sensitive until cache warms again.

Fix: Don’t balloon cache-dependent workloads. Right-size memory. If necessary, isolate such VMs on nodes with more RAM and fewer tenants.

3) Symptom: Host IO wait rises, but disks are not “maxed out”

Root cause: Swapping or page-in storms produce lots of small random IO with high latency, not necessarily high throughput.

Fix: Confirm with vmstat and guest major faults. Remove host swap pressure; reduce overcommit; stop ballooning critical VMs.

4) Symptom: ZFS ARC shrinks and hit rate drops during “busy” periods

Root cause: Host memory pressure—often from overcommit plus ballooning dynamics—forces ARC eviction, compounding guest cache misses.

Fix: Increase host RAM headroom. Consider setting zfs_arc_max only after you fix overcommit; capping ARC to “save memory” often just transfers pain to IO.
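If you do decide to cap ARC after fixing overcommit, the knob is the zfs module parameter zfs_arc_max (bytes). A sketch that only computes and prints the config line; the 16 GiB value is an example, not a recommendation:

```shell
# Build a zfs_arc_max module-option line for a 16 GiB cap (example value).
arc_max_bytes=$((16 * 1024 * 1024 * 1024))
echo "options zfs zfs_arc_max=${arc_max_bytes}"
# Append the printed line to /etc/modprobe.d/zfs.conf, then rebuild the
# initramfs (update-initramfs -u on Proxmox/Debian) and reboot to apply.
```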

5) Symptom: You disabled ballooning but things got worse

Root cause: The host was already overcommitted; ballooning was masking a capacity deficit. Disabling it pushed the host into swap or OOM.

Fix: Treat it as capacity: migrate VMs, reduce configured memory, add RAM, or split workloads across nodes. Ballooning is not a substitute for capacity planning.

6) Symptom: “It’s only the CI runners that are slow”

Root cause: CI runners depend on caches (package caches, Docker layer caches). Ballooning disrupts them; build times become unpredictable.

Fix: Give CI runners fixed RAM or move caching to external storage designed for it. If you need to squeeze runners, do it intentionally with lower max memory, not dynamic ballooning.

7) Symptom: Windows guests feel sluggish but metrics don’t scream

Root cause: Ballooning plus Windows memory management can hide the real “why” behind high-level counters; you see user-perceived lag first.

Fix: Correlate with host balloon stats (qm monitor ... info balloon) and host memory pressure. If the host is squeezing the VM, stop doing that.

8) Symptom: Frequent OOM kills inside a containerized workload in a VM

Root cause: Ballooning reduces the VM’s available memory; inside it, cgroups enforce limits; the workload hits its cgroup OOM faster.

Fix: Align budgets: VM memory must cover container headroom. Disable ballooning or set a high minimum. Then review container limits.

Joke #2: Ballooning is the only diet where the scale lies first and your database cries second.

Checklists / step-by-step plan

Checklist A: Decide whether ballooning belongs in your environment

  1. Inventory workload types: databases, search, brokers, caches, CI, file servers, “misc.”
  2. Mark tier-0 and tier-1 services: anything with strict latency SLOs or that other systems depend on (DNS, auth, databases).
  3. For tier-0/tier-1: set ballooning to off (balloon: 0) unless you have a very strong reason.
  4. For tier-2/tier-3: if enabling ballooning, set a conservative minimum (not 50% unless you want 50% performance).
  5. Set a host policy: maintain memory headroom (I like 25–30% for mixed workloads; more if you do heavy ZFS caching).
  6. Document exceptions with owners and rollback triggers.

Checklist B: Production-safe remediation when you suspect ballooning

  1. Confirm host is not swapping heavily (free -h, vmstat).
  2. Confirm VM is ballooned below its max (qm config + qm monitor ... info balloon).
  3. Check guest PSI and swap activity (/proc/pressure/memory, vmstat).
  4. If VM is tier-0: raise balloon target immediately (or disable) and watch host memory.
  5. If host is near the edge: migrate a VM off the node first, then disable ballooning.
  6. After stabilization: adjust max memory to a truthful number and keep ballooning off for that service.

Checklist C: Build a sane memory model for Proxmox nodes

  1. Start with physical RAM.
  2. Subtract a fixed host reserve (OS + Proxmox services + safety margin).
  3. Subtract storage cache reserve (ZFS ARC expectations, metadata, IO buffers).
  4. Budget the remaining RAM to VMs as fixed allocations for tier-0/1.
  5. Only then consider ballooning for low-criticality VMs, and cap how much can be reclaimed.
  6. Plan for correlated events: patch windows, batch jobs, failover, incident mode.
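In numbers, the budget above looks like this. All figures are made-up placeholders for a hypothetical 64 GiB node; the structure is the point, not the values.

```shell
# Memory budget sketch for a hypothetical 64 GiB Proxmox node.
host_ram_gb=64
host_reserve_gb=6    # OS + Proxmox services + safety margin (assumption)
arc_reserve_gb=16    # expected ZFS ARC footprint (assumption)

vm_budget_gb=$((host_ram_gb - host_reserve_gb - arc_reserve_gb))
echo "fixed VM budget: ${vm_budget_gb} GiB"

# Correlated-peak check: tier-0/1 fixed allocations must fit inside it.
tier0_gb=32; tier1_gb=8
echo "tier-0/1 committed: $((tier0_gb + tier1_gb)) GiB of ${vm_budget_gb} GiB"
```

Whatever is left after tier-0/1 is the only memory you should even consider ballooning.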

FAQ

1) Is Proxmox ballooning “bad” or just misused?

It’s a tool that’s easy to misuse. In production, it often becomes a substitute for capacity planning. Used conservatively for low-criticality VMs, it can be fine. Used broadly, it manufactures latency.

2) What’s the difference between ballooning and memory hotplug?

Hotplug changes the guest’s view of installed RAM (it gains or loses memory devices). Ballooning keeps installed RAM constant and creates internal pressure by forcing reclaim.

3) Why does ballooning cause swap inside the guest?

Because the guest is trying to satisfy memory allocations under pressure. It drops caches first, then reclaims anonymous pages; if that isn’t enough, it swaps—especially if the host keeps pushing the balloon lower.

4) If ballooning reclaims “unused” memory, why does performance drop?

Because “unused” includes cache. Cache may not be actively allocated by applications, but it prevents IO and keeps latency predictable. Reclaiming it is like throwing away your tools because you’re not holding them this second.

5) Should I disable swap inside guests to avoid ballooning pain?

Not as a first move. Disabling swap can turn memory pressure into OOM kills, which are often worse. Fix the cause (ballooning/overcommit). Then decide swap policy per workload.

6) Can ballooning help prevent host OOM?

Yes, sometimes. It’s one way to reclaim memory before the host hits the wall. But if you rely on it routinely, you’re operating too close to the wall. The correct solution is headroom.

7) Why do I see host swap even though ballooning is enabled?

Because ballooning has limits (min targets, reclaim speed, guest behavior). If the host is overcommitted or multiple VMs peak together, ballooning can’t reclaim enough quickly, and the host swaps anyway.

8) What should I set the balloon minimum to if I keep ballooning on?

Start high. For services you care about, set the minimum near the max—think “can reclaim a little idle slack,” not “can halve the VM.” For junk-drawer VMs, you can be more aggressive, but test under load.
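"Near the max" is vague, so here is one concrete, hedged starting point: a floor at 85% of max. That's my own rule of thumb, not a Proxmox recommendation; it leaves only idle slack reclaimable.

```shell
# Compute an 85%-of-max balloon floor (rule-of-thumb assumption).
max_mb=16384
floor_mb=$((max_mb * 85 / 100))
echo "suggested: qm set <vmid> --balloon ${floor_mb}"
```

Round the result to something human (here, 14336 MB would do) and load-test before rolling it out.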

9) Does ballooning interact badly with ZFS on the host?

It can. ZFS loves RAM for ARC. If you overcommit VMs and then balloon them while the host is also pressuring ARC, you can end up with double cache collapse: guests drop cache, host drops ARC, disks take the hit.

10) How do I know ballooning is the bottleneck and not CPU or storage?

Correlate: balloon actual shrinks, guest PSI and swap rise, major faults increase, and performance tanks without proportional CPU saturation. That pattern is very characteristic.

Conclusion: practical next steps

Ballooning isn’t evil. It’s just extremely willing to turn your production workload into a memory management experiment without asking permission.

Do these next:

  1. Pick your tier-0 VMs and disable ballooning on them today.
  2. Verify host headroom: if you’re routinely close to the limit, fix capacity before you tweak settings.
  3. Use ballooning only as a controlled policy for low-criticality VMs, with conservative minimums.
  4. Instrument memory pressure: host swap activity, guest PSI, and balloon actual vs max. Make it visible so “mystery slowness” becomes a graph, not a debate.
  5. Write down the rule: predictable performance beats clever consolidation. Your future incident channel will thank you.