You open the Proxmox VM summary to do something routine—grab the guest IP, trigger a clean shutdown, maybe freeze the filesystem before a backup—and Proxmox replies with the operational equivalent of a shrug:
“guest agent not running”.
This is one of those errors that looks like “install a package and move on” until you learn it can also mean “wrong virtual hardware,” “service masked,” “Windows driver missing,” or “you enabled the checkbox but forgot the actual agent.” Let’s fix it properly, verify it, and make sure it stays fixed after reboots, upgrades, and template cloning.
What Proxmox actually means by “guest agent not running”
Proxmox VE talks to a VM through QEMU. To talk into the guest OS, QEMU uses a small helper service running inside the guest: QEMU Guest Agent (qemu-guest-agent on Linux, a service installed via the virtio/guest tools on Windows).
When Proxmox says “guest agent not running,” it’s typically one of these:
- The agent is not installed in the guest OS.
- The agent is installed but the service is stopped/disabled (systemd, Windows service state, policy, broken upgrades).
- The VM is missing the communication channel: the virtio-serial device and QEMU agent socket aren’t available to the guest.
- Proxmox is not configured to use it: the VM setting “QEMU Guest Agent” is disabled in Proxmox.
- It’s running, but not responding: blocked by SELinux/AppArmor, a stale device node, or a stuck agent process.
Think of it like this: Proxmox’s checkbox is the phone line, the package is the phone, and the service is someone actually picking up. You need all three.
Facts and context that make this less mysterious
- QEMU Guest Agent is older than most cloud teams’ patience. It’s been around for years as part of the “make VMs manageable” push in virtualization stacks.
- Proxmox relies on QEMU’s QMP channel under the hood. When you run agent actions (like shutdown), Proxmox uses QMP to call guest-agent commands.
- The agent’s transport is usually virtio-serial. It’s not “network magic,” which is why it works even when the guest network is misconfigured.
- Getting the guest IP via Proxmox often depends on the agent. DHCP and ARP tricks are brittle; the agent is the authoritative source for guest interface info.
- Windows support historically lagged Linux in ease-of-use. Linux can apt install and move on; Windows typically needs virtio drivers and a separate guest-agent installer.
- fsfreeze is a big reason the agent exists in production. Consistent snapshots/backups without application-level hooks are hard; the agent is a pragmatic compromise.
- Cloned templates commonly break it. People build a “golden image,” forget to enable the service, then clone the mistake a hundred times.
- Agent isn’t the same as SPICE or VNC console. Console access is display; guest agent is operations control plane. Confusing them wastes hours.
Fast diagnosis playbook (check 1/2/3)
You want signal fast. Here’s the sequence I use when an on-call page says “shutdown stuck” or “backup fsfreeze failing” and the UI shows “guest agent not running.”
1) Check whether Proxmox enabled the agent for this VM
If Proxmox isn’t exposing the virtio-serial channel, the guest can run the agent all day and it won’t matter.
2) Check the guest OS service state
Installed is not enabled. Enabled is not running. Running is not necessarily healthy.
3) Check the transport: virtio-serial device and agent socket
If the guest doesn’t see /dev/virtio-ports/org.qemu.guest_agent.0 (Linux) or the Windows device is missing, you’ve got a virtual hardware/driver problem.
Only after those three do I go hunting for “exotic” causes (SELinux denial, AppArmor profile, agent crash loops, version mismatches).
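If you want this playbook as code, here is a minimal Python sketch of the decision order. The three callables are placeholders for whatever wraps `qm config`, `systemctl is-active`, and a device-node check on your side; none of the names are a real Proxmox API.

```python
from typing import Callable

def triage(agent_enabled: Callable[[], bool],
           service_running: Callable[[], bool],
           port_present: Callable[[], bool]) -> str:
    """Run the 1/2/3 checks in order and name the first failing layer.

    The callables are injected placeholders: in practice they would wrap
    `qm config <vmid>` on the host, `systemctl is-active qemu-guest-agent`
    in the guest, and a check for /dev/virtio-ports/org.qemu.guest_agent.0.
    """
    if not agent_enabled():
        return "enable agent in Proxmox (qm set <vmid> --agent enabled=1)"
    if not service_running():
        return "install/enable/start qemu-guest-agent in the guest"
    if not port_present():
        return "virtio-serial port missing: reboot VM, check drivers"
    return "all checks passed: hunt exotic causes (SELinux/AppArmor, crash loops)"
```

The value of encoding the order is that on-call engineers stop jumping straight to the exotic causes while the checkbox is still unticked.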
Hands-on tasks: commands, expected output, and decisions (12+)
This section is the meat. Each task includes: a command, what “good” looks like, what “bad” looks like, and what decision you make next.
Task 1: On the Proxmox host, confirm the VM config has the agent enabled
cr0x@server:~$ qm config 101 | egrep -i 'agent|serial|vga|machine'
agent: 1
machine: q35
vga: serial0
What it means: agent: 1 is the Proxmox-side enablement. If it’s missing or 0, Proxmox won’t attempt agent calls.
Decision: If agent: 0 or absent, enable it (Task 2). If it’s enabled, move to guest-side checks (Task 5+).
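At fleet scale you may want to make this check programmatic. A small Python sketch (the function name and parsing logic are mine, not a Proxmox API) that interprets a captured `qm config` dump:

```python
def agent_enabled(qm_config_output: str) -> bool:
    """Return True if a `qm config <vmid>` dump shows the agent enabled.

    Accepts both the short form (`agent: 1`) and the long form
    (`agent: enabled=1,fstrim_cloned_disks=1`). A missing key means
    Proxmox will never attempt agent calls for this VM.
    """
    for line in qm_config_output.splitlines():
        if line.startswith("agent:"):
            value = line.split(":", 1)[1].strip()
            return value == "1" or "enabled=1" in value
    return False
```

Feed it the output of `qm config <vmid>` collected over SSH or the API, and you have the first gate of an audit.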
Task 2: Enable the QEMU Guest Agent option in Proxmox (CLI, reproducible)
cr0x@server:~$ qm set 101 --agent enabled=1,fstrim_cloned_disks=1
update VM 101: -agent enabled=1,fstrim_cloned_disks=1
What it means: This flips the same switch as the UI checkbox. The optional fstrim_cloned_disks=1 is a quality-of-life improvement for thin-provisioned storage (especially after cloning).
Decision: After enabling, reboot the VM if needed to ensure the virtio-serial port is present (Task 4). Then verify agent responsiveness (Task 3).
Task 3: Ask Proxmox to ping the guest agent via QMP
cr0x@server:~$ qm agent 101 ping
{"return":{}}
What it means: If you get a JSON return, QEMU can talk to the agent. If you get an error like “QEMU guest agent is not running,” Proxmox can’t reach it.
Decision: If ping works, the error in the UI is stale or the issue is specific to another command (like fsfreeze). If ping fails, go check guest service and device (Tasks 5–8).
Task 4: Confirm the virtio-serial port is present in the VM hardware (host-side)
cr0x@server:~$ qm monitor 101
Entering QEMU Monitor for VM 101 - type 'help' for help
qm> info chardev
qga0: filename=unix:/var/run/qemu-server/101.qga,server=on
qm> quit
What it means: qm monitor drops you into the QEMU monitor for that VM; info chardev lists its character devices. You want a qga0 chardev backed by the socket /var/run/qemu-server/101.qga, which QEMU wires to the virtserialport named org.qemu.guest_agent.0. If it isn't listed, the guest agent channel isn't configured/available.
Decision: If missing, re-check agent: 1 and ensure you rebooted the VM after changes. If present, focus inside the guest.
Task 5: Inside a Linux guest, confirm the package is installed
cr0x@server:~$ dpkg -l | grep -E '^ii\s+qemu-guest-agent\b'
ii qemu-guest-agent 1:8.2+dfsg-1+deb12u1 amd64 Guest-side QEMU helper daemon
What it means: If you see the ii line, it’s installed. If you see nothing, it isn’t.
Decision: If not installed, install it (Task 6). If installed, check service state (Task 7).
Task 6: Install the agent on Debian/Ubuntu guests
cr0x@server:~$ sudo apt-get update
Hit:1 http://deb.debian.org/debian bookworm InRelease
Reading package lists... Done
cr0x@server:~$ sudo apt-get install -y qemu-guest-agent
Reading package lists... Done
Building dependency tree... Done
The following NEW packages will be installed:
qemu-guest-agent
Setting up qemu-guest-agent (1:8.2+dfsg-1+deb12u1) ...
Created symlink /etc/systemd/system/multi-user.target.wants/qemu-guest-agent.service → /lib/systemd/system/qemu-guest-agent.service.
What it means: On systemd distros, install typically enables the service. Don’t trust that blindly; verify.
Decision: Move to Task 7 to confirm it’s running and healthy.
Task 7: Check the Linux agent service status (systemd)
cr0x@server:~$ systemctl status qemu-guest-agent --no-pager
● qemu-guest-agent.service - QEMU Guest Agent
Loaded: loaded (/lib/systemd/system/qemu-guest-agent.service; enabled; preset: enabled)
Active: active (running) since Thu 2025-12-26 11:14:02 UTC; 2min 10s ago
Main PID: 623 (qemu-ga)
Tasks: 1 (limit: 18936)
Memory: 4.1M
CPU: 82ms
What it means: You want enabled and active (running). If it’s “disabled,” it will regress on reboot. If it’s “failed,” you need logs (Task 8).
Decision: If not enabled, enable it (Task 9). If failed, check logs and device nodes.
Task 8: Read the agent logs to see if it can open the virtio port
cr0x@server:~$ journalctl -u qemu-guest-agent -b --no-pager | tail -n 30
Dec 26 11:13:58 vm101 qemu-ga[603]: info: guest agent started
Dec 26 11:13:58 vm101 qemu-ga[603]: error: failed to open /dev/virtio-ports/org.qemu.guest_agent.0: No such file or directory
Dec 26 11:13:58 vm101 systemd[1]: qemu-guest-agent.service: Main process exited, code=exited, status=1/FAILURE
Dec 26 11:13:58 vm101 systemd[1]: qemu-guest-agent.service: Failed with result 'exit-code'.
What it means: The agent is installed, systemd tries to start it, but the virtio port isn’t present inside the guest. That’s almost always a Proxmox VM hardware setting problem or a missing virtio driver (Windows) / kernel module issue (rare on Linux).
Decision: Go back host-side: confirm agent: 1, confirm QEMU created the socket (Task 4), reboot the VM. If still missing, inspect guest devices (Task 10).
Task 9: Enable and start the agent service (and unmask if needed)
cr0x@server:~$ sudo systemctl unmask qemu-guest-agent
Removed "/etc/systemd/system/qemu-guest-agent.service".
cr0x@server:~$ sudo systemctl enable --now qemu-guest-agent
Created symlink /etc/systemd/system/multi-user.target.wants/qemu-guest-agent.service → /lib/systemd/system/qemu-guest-agent.service.
What it means: unmask fixes the “someone intentionally disabled it” situation. enable --now makes it stick across reboots and starts it immediately.
Decision: Re-run Task 3 from the host. If ping works, you’re done with the basic problem.
Task 10: Verify the virtio-ports device exists inside the Linux guest
cr0x@server:~$ ls -l /dev/virtio-ports/
total 0
lrwxrwxrwx 1 root root 11 Dec 26 11:14 org.qemu.guest_agent.0 -> ../vport2p1
What it means: This is the device node the agent needs. If the directory exists but the file does not, the guest doesn’t see the virtio-serial endpoint.
Decision: If missing: confirm VM agent is enabled and reboot the guest; check VM machine type changes; check whether you migrated from another hypervisor with odd devices. If present: agent should be able to start; check AppArmor/SELinux if it still can’t open it (Task 11).
Task 11: Look for SELinux or AppArmor interference (rare, but real)
cr0x@server:~$ sudo aa-status
apparmor module is loaded.
35 profiles are loaded.
35 profiles are in enforce mode.
0 profiles are in complain mode.
What it means: If you see a profile enforcing against qemu-ga, it can break device access. SELinux denials can do the same.
Decision: If enforcement exists and logs show denials, adjust policy or set the agent profile correctly. Don’t “disable security” as a first reflex.
Task 12: From Proxmox, fetch the guest network interfaces and confirm the agent is delivering data
cr0x@server:~$ qm agent 101 network-get-interfaces
{"return":[{"name":"lo","ip-addresses":[{"ip-address":"127.0.0.1","ip-address-type":"ipv4","prefix":8}],"statistics":{"rx-bytes":1200,"tx-bytes":1200}},{"name":"ens18","ip-addresses":[{"ip-address":"10.10.20.41","ip-address-type":"ipv4","prefix":24},{"ip-address":"fe80::f816:3eff:fe1b:9d2a","ip-address-type":"ipv6","prefix":64}],"statistics":{"rx-bytes":12093284,"tx-bytes":2209381}}]}
What it means: This is the payoff: authoritative interface and IP data without ARP guessing games.
Decision: If this works, “guest agent not running” is resolved. If ping works but this fails, your agent is alive but limited—often a version mismatch or restricted permissions.
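If you feed that JSON into automation (inventory, DNS sanity checks, monitoring), a small Python sketch for extracting the non-loopback IPv4 addresses. The function name is mine; the JSON shape is what the agent returns:

```python
import json

def guest_ipv4_addresses(agent_json: str) -> dict:
    """Map interface name -> list of IPv4 addresses, skipping loopback.

    `agent_json` is the raw output of `qm agent <vmid> network-get-interfaces`.
    """
    data = json.loads(agent_json)
    result = {}
    for iface in data["return"]:
        if iface["name"] == "lo":
            continue
        addrs = [a["ip-address"]
                 for a in iface.get("ip-addresses", [])
                 if a["ip-address-type"] == "ipv4"]
        if addrs:
            result[iface["name"]] = addrs
    return result
```

Note the `.get("ip-addresses", [])`: interfaces without addresses (down, unconfigured) simply omit the key, and a parser that assumes it exists will crash on exactly the broken guests you care about.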
Task 13: Verify clean shutdown through the agent (so you can stop forcing power-offs)
cr0x@server:~$ qm shutdown 101 --timeout 60
VM 101 shutting down
What it means: With a working agent, Proxmox can request an in-guest shutdown. Without it, Proxmox may fall back to ACPI, which is less reliable depending on OS and configuration.
Decision: If the VM doesn’t shut down, check guest OS power handling and agent logs. If it does, stop using “Stop” as your default shutdown button.
Task 14: Test fsfreeze hooks if you rely on consistent backups
cr0x@server:~$ qm agent 101 fsfreeze-status
{"return":"thawed"}
cr0x@server:~$ qm agent 101 fsfreeze-freeze
{"return":2}
cr0x@server:~$ qm agent 101 fsfreeze-status
{"return":"frozen"}
cr0x@server:~$ qm agent 101 fsfreeze-thaw
{"return":2}
What it means: The number returned by freeze/thaw is the count of filesystems frozen or thawed. If freeze/thaw works, you can use agent-assisted backup consistency. If it fails with “command not supported,” the agent may be too old or built without support.
Decision: If you need consistent snapshots, upgrade the guest agent package and confirm the guest filesystem supports freezing (some setups don’t play nicely).
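If you script freeze/thaw around your own snapshot tooling, the one invariant worth encoding is “always thaw.” A Python sketch with injected callables (all names here are illustrative, not a Proxmox API; in practice they would wrap `qm agent <vmid> fsfreeze-freeze`/`fsfreeze-thaw` and your snapshot command):

```python
from typing import Callable

def with_fsfreeze(freeze: Callable[[], None],
                  thaw: Callable[[], None],
                  snapshot: Callable[[], None]) -> None:
    """Freeze, snapshot, and always thaw, even if the snapshot fails.

    The finally block is the whole point: a guest left frozen will stall
    writes until something thaws it, which turns a failed backup into an
    application outage.
    """
    freeze()
    try:
        snapshot()
    finally:
        thaw()
```

Pair this with a timeout on the snapshot step; a guaranteed thaw after a bounded wait is what keeps a stuck backup from becoming a stuck database.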
Task 15: On Windows guests, confirm the service exists and is running
Windows doesn’t come with QEMU Guest Agent by default. You typically install it as part of the virtio guest tools package.
cr0x@server:~$ qm agent 202 ping
QEMU guest agent is not running
That output alone doesn’t tell you what’s wrong; it just tells you Proxmox can’t reach it. On the Windows VM, you’d validate the QEMU GA service and virtio-serial driver presence (Device Manager) and then retry qm agent.
Task 16: Host-side sanity: is the qga socket being created?
cr0x@server:~$ ls -l /var/run/qemu-server/101.qga
srw-rw---- 1 root root 0 Dec 26 11:14 /var/run/qemu-server/101.qga
What it means: This is the UNIX socket QEMU uses for the guest agent channel. If it doesn’t exist, QEMU isn’t configured to provide the channel (or the VM isn’t running).
Decision: If missing: check VM state (qm status 101), re-check agent: 1, and look for startup errors in journalctl for pvedaemon/pveproxy if you suspect config handling issues.
Make it stick: persistent configuration that survives templates and updates
Fixing today’s VM is easy. Fixing next month’s clones is where production systems go to die quietly.
1) Bake the agent into your templates, not your runbooks
If you build templates (and you should), treat QEMU Guest Agent like you treat SSH: installed, enabled, validated. For Linux templates:
- Install qemu-guest-agent
- Enable it (systemctl enable --now qemu-guest-agent)
- Verify the device node exists after first boot under Proxmox
The “first boot under Proxmox” clause matters if the template was built elsewhere. The virtio-serial port is virtual hardware. If you build in one hypervisor and run in another, you can “successfully enable a service” that has nothing to talk to.
2) Don’t treat the Proxmox checkbox as optional metadata
In Proxmox, enabling the agent on the VM is not cosmetic. It affects QEMU device configuration and which commands Proxmox even attempts. Enforce it:
- Set it on templates before converting to template
- Set it automatically during provisioning (CLI/API)
- Audit existing VMs and remediate
3) Make service enablement idempotent
One-off manual fixes don’t scale. Use configuration management (Ansible, Salt, your poison) to enforce:
- Package installed
- Service enabled and running
- Optional: a health check that validates the virtio port exists
You don’t need fancy orchestration. You need boring consistency.
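As a sketch of what “idempotent” means here, a pure Python function that derives actions from observed state. The state keys and action strings are illustrative placeholders; your config management tool has its own equivalents of the facts-gathering step:

```python
def enforce_agent(state: dict) -> list:
    """Return the actions needed to converge one guest to the desired state.

    `state` describes a single guest (hypothetical keys: package_installed,
    service_masked, service_enabled, service_active). Running this twice on
    a converged guest yields no actions -- that is the idempotency.
    """
    actions = []
    if not state.get("package_installed"):
        actions.append("apt-get install -y qemu-guest-agent")
    if state.get("service_masked"):
        actions.append("systemctl unmask qemu-guest-agent")
    if not state.get("service_enabled") or not state.get("service_active"):
        actions.append("systemctl enable --now qemu-guest-agent")
    return actions
```

The point is not this particular function; it is that “enforce” should mean “compute the delta and apply it,” never “re-run the install script and hope.”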
4) Watch for regressions after OS upgrades and “hardening”
The two most common regression sources:
- Golden-image hardening scripts that disable “unknown services.” Congratulations, you just disabled your hypervisor integration.
- OS upgrades that change systemd presets or replace packages; the service flips to disabled, or the agent binary changes behavior.
Joke #1: Security teams love “disable everything you don’t recognize” until the hypervisor can’t shut down the VM and the “incident bridge” becomes a group therapy session.
5) Know what the agent is for—and what it isn’t
The guest agent helps with:
- Clean shutdown/reboot
- Guest IP reporting
- Filesystem freeze/thaw
- Time sync hooks in some setups
It is not:
- A replacement for monitoring
- A remote shell
- A magical fix for broken guest networking
Three corporate mini-stories (how teams break this in real life)
Mini-story 1: The incident caused by a wrong assumption
A mid-sized company migrated from an older virtualization platform to Proxmox. They had a tidy checklist: import VM disks, boot, validate app. The apps came up. Everyone declared victory. Then came the first maintenance window.
The operations engineer tried to shut down a batch of VMs cleanly to patch hosts. Proxmox showed “guest agent not running” on a non-trivial subset. They shrugged and used “Stop” because the change window was ticking. That’s not a shutdown; that’s pulling the plug.
A few databases came back “fine” after power-on. One didn’t. It booted, but the database replay time was painful and the application layer started timing out. The postmortem had the familiar smell of hindsight: “We assumed ACPI shutdown would work everywhere. We assumed imported VMs had the same guest tooling. We assumed ‘Stop’ is safe.”
The fix was unglamorous: enable the agent on every VM, install it inside the guests, and enforce it via automation. Also, teach people that “Stop” is the fire axe behind glass—sometimes needed, never the default.
Mini-story 2: The optimization that backfired
Another shop chased boot-time improvements and “lean images.” They trimmed services and packages from templates, including anything that sounded optional. QEMU Guest Agent looked optional. It got removed.
The templates got faster. The dashboard looked neat. The first few weeks were quiet. Then backups started throwing errors intermittently: filesystem freeze commands failed on some VMs. The backup system fell back to crash-consistent snapshots. Nobody noticed because restores weren’t tested frequently (a classic).
Months later a restore was needed for a corrupted application deployment. The restored VM booted, but the data was inconsistent in exactly the way crash-consistent backups can be. The “optimization” saved seconds per VM boot and cost a day of recovery effort and a lot of uncomfortable questions.
They reintroduced the agent, validated fsfreeze on the handful of workloads that needed it, and documented exceptions. The lesson wasn’t “never optimize.” It was “don’t optimize away operational control.”
Mini-story 3: The boring but correct practice that saved the day
A financial services team ran Proxmox clusters with strict change control. Their practice was dull: weekly audits of VM configs, enforced agent enablement, and a pre-maintenance script that checked agent health before host patching.
During a routine host reboot cycle, one VM stubbornly refused a clean shutdown. The script flagged it early: agent ping failed, but the VM was otherwise healthy. The on-call had time to investigate without holding up the entire maintenance window.
The culprit was a guest OS update that masked the qemu-guest-agent service due to a local policy conflict. Because they caught it before the shutdown wave, they fixed it in-guest, confirmed agent ping, and the VM shut down cleanly.
Nothing dramatic happened. No one got praised in an all-hands. That’s the point. Boring controls prevent exciting outages.
Common mistakes: symptom → root cause → fix
These are the ones I keep seeing in the field. The symptoms look similar; the root causes are not.
1) Symptom: Proxmox UI shows “guest agent not running” after you installed the package
- Root cause: The Proxmox VM setting agent: 1 is not enabled, so the virtio-serial channel isn’t present.
- Fix: qm set <vmid> --agent enabled=1, reboot the VM, then qm agent <vmid> ping.
2) Symptom: systemctl status qemu-guest-agent shows “failed to open /dev/virtio-ports/…”
- Root cause: Guest can’t see the virtio-serial device. Usually agent disabled on Proxmox side, or guest booted without that virtual hardware.
- Fix: Enable the agent in Proxmox, reboot. Confirm /dev/virtio-ports/org.qemu.guest_agent.0 exists.
3) Symptom: Agent ping works, but IP address is missing in Proxmox summary
- Root cause: Agent running but network info not reported (older agent, restricted permissions, network manager oddities, interface naming/containers).
- Fix: Use qm agent <vmid> network-get-interfaces to see what’s reported; upgrade the guest agent; ensure the guest has stable interface configuration.
4) Symptom: Shutdown hangs, Proxmox times out, then you hit “Stop”
- Root cause: Agent not working and ACPI shutdown not handled properly by the guest OS; or guest OS is unhealthy.
- Fix: Fix agent first. If agent is healthy, check guest OS shutdown logs and services. Use “Stop” only after you’ve accepted potential filesystem inconsistency.
5) Symptom: Backups complain about fsfreeze or run crash-consistent unexpectedly
- Root cause: Agent not running, or fsfreeze not supported/working in guest (unsupported filesystem, agent too old, app holds locks).
- Fix: Validate qm agent <vmid> fsfreeze-status and freeze/thaw manually. Upgrade the agent. Exclude workloads where fsfreeze is risky and use application-aware hooks instead.
6) Symptom: Works on one node, fails after live migration to another
- Root cause: Usually not “node-specific,” but you might be seeing timing/race issues or different QEMU versions. Sometimes the agent process in the guest is wedged and recovers on reboot, not migration.
- Fix: Confirm agent ping before and after migration. If it breaks consistently on a node, check node QEMU packages and host logs.
7) Symptom: After “hardening,” the service is “masked”
- Root cause: A baseline script masked the unit to prevent start.
- Fix: systemctl unmask qemu-guest-agent, then systemctl enable --now qemu-guest-agent. Fix the baseline so it stops doing that.
Checklists / step-by-step plan
Checklist A: Fix a single VM (Linux guest) in under 10 minutes
- On the Proxmox host: qm config <vmid> | grep -i agent → if not enabled, run qm set <vmid> --agent enabled=1.
- Reboot the VM (yes, really) if you just enabled the agent device channel.
- In the guest: install qemu-guest-agent via your package manager.
- In the guest: systemctl enable --now qemu-guest-agent.
- In the guest: verify ls -l /dev/virtio-ports/ shows org.qemu.guest_agent.0.
- On the host: qm agent <vmid> ping and qm agent <vmid> network-get-interfaces.
Checklist B: Make it stick across templates and clones
- Update your base templates to include qemu-guest-agent installed and enabled.
- Ensure the template’s VM config has agent: 1.
- Automate a post-provision check: host-side qm agent <vmid> ping.
- Audit existing VMs weekly: flag missing agent: 1, or guests where ping fails.
- Teach your ops team that “Stop” is not shutdown; it’s forced power-off.
Checklist C: When backups depend on guest coordination
- For each workload class, explicitly decide: crash-consistent is acceptable, or you need fsfreeze/app-aware logic.
- Test fsfreeze-freeze and fsfreeze-thaw during a low-traffic window.
- Monitor for freeze timeouts; don’t let a “freeze attempt” hang backups indefinitely.
FAQ
1) Is the Proxmox “QEMU Guest Agent” checkbox enough?
No. It enables the channel on the VM side. You still need the agent installed and running inside the guest OS.
2) Do I need to reboot after enabling the agent in Proxmox?
Often, yes. Adding the virtio-serial endpoint is a virtual hardware change; guests typically detect it cleanly on reboot.
3) Why do I care about the agent if my VM works fine?
Because “works fine” is what you say before you need to shut down 40 VMs cleanly during a host emergency, or before you need consistent backups.
4) Does the agent affect performance?
Negligibly. It’s a small daemon waiting for requests. If it’s consuming real CPU, something else is wrong (like crash loops).
5) Can I get guest IP addresses without the agent?
Sometimes, via ARP tables or DHCP leases. It’s unreliable across VLANs, firewalls, and multi-NIC setups. The agent is the dependable method.
6) My agent is running, but Proxmox still shows “not running.” What gives?
Usually the guest can’t see the virtio-serial device, or Proxmox didn’t enable the agent channel. Confirm agent: 1, confirm the socket exists, and check for /dev/virtio-ports/… in the guest.
7) Is QEMU Guest Agent safe from a security perspective?
It expands what the host can ask the guest to do. That’s the point. If you don’t trust your hypervisor administrators, you have larger problems than this package.
8) Can I use it on Windows guests?
Yes, but it’s not “apt install.” You need the Windows guest agent installed and the virtio-serial driver available. Validate the Windows service and Device Manager entries.
9) Should I enable fsfreeze for every VM?
No. Use it where it provides value (databases, stateful apps) and where you’ve tested it. For some workloads, crash-consistent snapshots are acceptable and simpler.
10) What’s the most reliable health check?
qm agent <vmid> ping from the host, plus a targeted command like network-get-interfaces. Ping alone proves connectivity, not usefulness.
Conclusion: next steps you can actually do today
Fixing “guest agent not running” is not heroic engineering. It’s basic hygiene that prevents avoidable pain: ugly shutdowns, missing IP visibility, and inconsistent backups. Do the boring thing. Your future self will be less busy.
Practical next steps:
- Pick one affected VM and run the host-side triage: qm config → qm agent ping → check the qga socket.
- In the guest, install and enable the agent; verify the virtio port exists.
- Update your templates and provisioning so new VMs don’t repeat the problem.
- Add a weekly audit: VMs with agent: 0 or ping failures get fixed before maintenance windows.
Quote (paraphrased idea) from Gene Kranz: “Failure isn’t an option” is less a slogan and more an operational budget line—paid in checks, tests, and discipline.
Joke #2: If your only shutdown procedure is “Stop,” you don’t have a procedure—you have a coin toss with better UI.