Nothing makes a Proxmox node feel more “production” than the moment you run apt update and it replies: “could not resolve host”. Suddenly your patch window is a hostage situation, your HA cluster is fine-but-not-fine, and you’re staring at a DNS problem that could actually be IPv6, routing, or a proxy that only exists in someone’s PowerPoint.
This is a fast, opinionated workflow for SREs and operators who need the node to update now, but also want to fix the real cause so it doesn’t return during the next maintenance window like a bad sequel.
Fast diagnosis playbook (first/second/third)
When you’re on the clock, do not “try stuff.” Prove where the failure is. This playbook is ordered to maximize signal in minimal time.
First: Is it DNS, or is it basic reachability?
- Check if the node can reach an IP on the internet (no DNS). If it can’t, stop blaming DNS. Fix routing/firewall first.
- Check if the node can resolve a name using its configured resolver. If not, it’s DNS config, DNS server reachability, or a resolver service issue.
- Check if the node can connect to the repository host after resolution. If DNS works but TCP fails, it’s proxy/firewall/MTU/TLS.
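A minimal sketch of those three checks in order, using the repo host from later in this article as the example name (substitute whatever APT is actually failing on):
cr0x@server:~$ curl -I --connect-timeout 3 https://1.1.1.1                        # reachability by IP, no DNS involved
cr0x@server:~$ getent ahosts enterprise.proxmox.com                               # resolution through libc, the path most apps use
cr0x@server:~$ curl -I --connect-timeout 5 https://enterprise.proxmox.com/        # DNS + TCP + TLS end to end
Each of these is expanded into a full task (with expected output and decisions) further down.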
Second: Decide whether IPv6 is in the blast radius
- See if name resolution returns AAAA records and whether the system prefers them.
- Test IPv6 routing. A broken IPv6 default route can produce confusing resolver timeouts and “could not resolve host” symptoms, depending on the client.
- Temporarily force IPv4 for a quick operational workaround (then fix IPv6 properly).
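A small sketch of those checks, again with the example hostname used throughout this article:
cr0x@server:~$ dig AAAA enterprise.proxmox.com +short                             # does the name even have AAAA records?
cr0x@server:~$ getent ahosts enterprise.proxmox.com                               # does libc list IPv6 addresses first?
cr0x@server:~$ ip -6 route show default                                           # is there an IPv6 default route to prefer?
cr0x@server:~$ curl -4 -I --connect-timeout 5 https://enterprise.proxmox.com/     # quick IPv4-only comparison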
Third: Proxies (explicit, auto-config, “transparent”) and APT specifics
- Check environment variables and APT proxy config. APT can be proxied even when your shell isn’t.
- Test direct vs proxy egress. Make an explicit request with and without proxy and compare.
- Confirm proxy DNS behavior. Some proxies resolve on your behalf; others expect your node to resolve.
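A compact way to answer all three, with the proxy hostname as a placeholder:
cr0x@server:~$ apt-config dump | grep -i proxy                                    # what APT itself thinks
cr0x@server:~$ env | grep -i proxy                                                # what curl/wget may inherit
cr0x@server:~$ curl --noproxy '*' -I --connect-timeout 5 https://deb.debian.org/  # deliberately direct
cr0x@server:~$ curl --proxy http://proxy.corp.example:3128 -I --connect-timeout 5 https://deb.debian.org/   # deliberately proxied
If the direct and proxied requests disagree, you’ve already found your failure domain.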
If you follow that order, you’ll usually find the bottleneck in under ten minutes. If you don’t, you’ll spend an hour “fixing DNS” by breaking IPv6 and teaching the proxy to lie.
The mental model: what “could not resolve host” really means on Proxmox
On Proxmox (Debian-based), “could not resolve host” typically comes from one of three layers:
- Resolver layer: the libc resolver stack can’t convert a hostname to an IP. That might be because the configured DNS server is unreachable, the resolver service is miswired, or the node is using an unexpected resolver (systemd-resolved vs plain resolv.conf).
- Transport layer disguised as DNS: some tools report a resolution error when they time out during a resolution attempt that depends on network reachability (especially with IPv6 preference or broken routes).
- Proxy layer: the client thinks it should talk to a proxy, the proxy is down, the proxy requires auth, or the proxy’s DNS behavior doesn’t match your network design.
Proxmox makes this more fun because your node is also a network appliance. It has bridges, VLANs, sometimes bonds, sometimes multiple uplinks, sometimes management networks with special DNS. And it runs critical services (corosync, pvedaemon, pveproxy) that you don’t want to disturb while “trying a fix.”
Rule one: don’t shotgun-edit /etc/resolv.conf and hope. Prove the path. Then change the minimum.
Joke #1: DNS is the only system where “it works on my laptop” is a threat model.
Interesting facts and historical context (yes, it matters)
- resolv.conf predates most of your tooling. The file format comes from early Unix resolver conventions and still anchors many modern setups, even when it’s a symlink managed by something else.
- systemd-resolved changed the default failure modes. Instead of a static /etc/resolv.conf with direct upstream resolvers, many systems now point at a local stub (often 127.0.0.53) and the daemon does the upstream work.
- APT has its own networking personality. It doesn’t behave exactly like curl or wget in proxy handling, IPv6 preference, or error reporting. Testing with apt matters.
- IPv6 “partial deployment” is worse than no IPv6. A network that advertises IPv6 but can’t route it reliably will create intermittent, hard-to-debug resolution and connectivity issues.
- Happy Eyeballs exists because IPv6 broke people. Modern clients try IPv6 and IPv4 in parallel to reduce perceived latency. But not every tool implements it the same way, and some still get stuck.
- Split-horizon DNS is common in enterprises. Internal DNS returns different answers than public DNS. Proxmox nodes often sit in “semi-internal” segments where the wrong resolver gives wrong answers.
- glibc resolver behavior is conservative. It may try nameservers in order, apply timeouts, and retry. A single dead nameserver first in the list can add long delays that look like “DNS is down.”
- Proxmox clusters depend on name resolution more than you think. You can run by IP, but many operational scripts, metrics, backup integrations, and repo configurations assume hostnames work consistently.
- Proxy auto-config (PAC/WPAD) is ancient and still causes outages. A client picking up a proxy setting it shouldn’t is a classic “works yesterday, fails today” trap.
Practical tasks: commands, expected output, and decisions (12+)
These are real operator moves. Each one has: the command, what the output means, and what decision you make next. Run them on the Proxmox node (pve), not on your laptop, because your laptop is always lying.
Task 1: Confirm the error is actually name resolution
cr0x@server:~$ apt-get update
Hit:1 http://deb.debian.org/debian bookworm InRelease
Err:2 https://enterprise.proxmox.com/debian/pve bookworm InRelease
Could not resolve 'enterprise.proxmox.com'
Reading package lists... Done
W: Failed to fetch https://enterprise.proxmox.com/debian/pve/dists/bookworm/InRelease Could not resolve 'enterprise.proxmox.com'
Meaning: APT claims DNS resolution failed for a specific host. This might still be proxy misconfiguration or IPv6 reachability causing the resolver path to time out.
Decision: Move to direct resolution tests (getent, resolvectl, dig) before editing anything.
Task 2: Check what resolvers the system thinks it has (systemd-resolved path)
cr0x@server:~$ resolvectl status
Global
Protocols: -LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
resolv.conf mode: stub
Current DNS Server: 10.10.0.53
DNS Servers: 10.10.0.53 10.10.0.54
DNS Domain: corp.example
Meaning: The host uses systemd-resolved in stub mode, with upstream DNS servers listed. Good: you have a source of truth.
Decision: If DNS servers aren’t what you expect, fix network config (interfaces) or systemd-resolved config; don’t hardcode /etc/resolv.conf unless you’re intentionally bypassing resolved.
Task 3: If resolvectl isn’t available or looks empty, inspect /etc/resolv.conf
cr0x@server:~$ ls -l /etc/resolv.conf
lrwxrwxrwx 1 root root 39 Nov 21 10:12 /etc/resolv.conf -> ../run/systemd/resolve/stub-resolv.conf
Meaning: It’s a symlink to systemd-resolved’s stub file. Editing it directly is wasted motion; it’ll get overwritten.
Decision: Use resolvectl, systemd-resolved config, or adjust interface DNS settings rather than manual edits.
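If the node genuinely runs systemd-resolved and you want to pin upstream resolvers at that layer, a drop-in file is the clean move. A minimal sketch using the resolvers from Task 2 (the file name is arbitrary; create the directory if it doesn’t exist):
cr0x@server:~$ mkdir -p /etc/systemd/resolved.conf.d
cr0x@server:~$ cat /etc/systemd/resolved.conf.d/10-infra-dns.conf
[Resolve]
DNS=10.10.0.53 10.10.0.54
Domains=corp.example
cr0x@server:~$ systemctl restart systemd-resolved && resolvectl status | head
Many Proxmox installs don’t run systemd-resolved at all; in that case this file does nothing and the resolver config lives in /etc/resolv.conf or your interface configuration.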
Task 4: Test resolution through libc (this is what most apps use)
cr0x@server:~$ getent ahosts enterprise.proxmox.com
51.91.38.34 STREAM enterprise.proxmox.com
51.91.38.34 DGRAM
51.91.38.34 RAW
Meaning: libc can resolve the hostname. If APT still fails, suspect proxy, TLS interception, or intermittent DNS timeouts rather than total DNS failure.
Decision: If getent succeeds, move to connectivity tests. If it fails, focus on resolver and DNS server reachability.
Task 5: Test raw DNS to a specific nameserver (bypass local resolver logic)
cr0x@server:~$ dig @10.10.0.53 enterprise.proxmox.com +time=2 +tries=1
; <<>> DiG 9.18.24-1 <<>> @10.10.0.53 enterprise.proxmox.com +time=2 +tries=1
; (1 server found)
;; global options: +cmd
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 19041
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; ANSWER SECTION:
enterprise.proxmox.com. 300 IN A 51.91.38.34
;; Query time: 21 msec
;; SERVER: 10.10.0.53#53(10.10.0.53) (UDP)
;; WHEN: Tue Dec 26 10:12:11 UTC 2025
;; MSG SIZE rcvd: 64
Meaning: DNS server reachable and answering. If other names fail, it’s not “DNS is down,” it’s “DNS is selectively broken.” Different problem.
Decision: If the query times out, test routing to the DNS server and check firewall rules between node and resolver.
Task 6: Verify you can reach the DNS server at all (routing + ACLs)
cr0x@server:~$ ping -c 2 10.10.0.53
PING 10.10.0.53 (10.10.0.53) 56(84) bytes of data.
64 bytes from 10.10.0.53: icmp_seq=1 ttl=63 time=0.612 ms
64 bytes from 10.10.0.53: icmp_seq=2 ttl=63 time=0.590 ms
--- 10.10.0.53 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1002ms
rtt min/avg/max/mdev = 0.590/0.601/0.612/0.011 ms
Meaning: At least ICMP works. Not a proof that UDP/53 works, but it kills the “no route” theory.
Decision: If ping fails, check default route, VLAN tagging, bridge configuration, and firewall policies.
Task 7: Confirm default routes (IPv4 and IPv6) to rule out “DNS by way of nowhere”
cr0x@server:~$ ip route
default via 10.10.0.1 dev vmbr0 proto kernel
10.10.0.0/24 dev vmbr0 proto kernel scope link src 10.10.0.20
cr0x@server:~$ ip -6 route
default via fe80::1 dev vmbr0 proto ra metric 1024 expires 1532sec
2001:db8:10:10::/64 dev vmbr0 proto kernel metric 256
Meaning: You have default routes for both stacks. If IPv6 default route is present but upstream is broken, some clients will try IPv6 first and stall.
Decision: If IPv6 looks suspicious (route exists but upstream doesn’t), test IPv6 reachability explicitly and decide whether to disable IPv6 temporarily or fix RA/route.
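To test IPv6 reachability explicitly before deciding anything, a couple of hedged probes (the target is a well-known public IPv6 resolver address; any known-good IPv6 endpoint works):
cr0x@server:~$ ping -6 -c 2 2001:4860:4860::8888                                  # does IPv6 route upstream at all?
cr0x@server:~$ curl -6 -I --connect-timeout 5 https://deb.debian.org/             # does TCP over IPv6 reach a real service?
If both fail while the IPv4 equivalents succeed, the IPv6 default route is advertising a path that doesn’t exist.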
Task 8: Test connectivity without DNS (prove egress)
cr0x@server:~$ curl -I --connect-timeout 3 https://1.1.1.1
HTTP/1.1 400 Bad Request
Server: cloudflare
Date: Tue, 26 Dec 2025 10:13:01 GMT
Content-Type: text/html
Content-Length: 155
Connection: close
Meaning: You can reach the internet and do TLS to a known IP. The 400 is fine; the network path is working.
Decision: If this fails, stop: you have an egress/routing/firewall issue, not a DNS issue.
Task 9: Test DNS + TCP to the repo host (the “end-to-end” check)
cr0x@server:~$ curl -I --connect-timeout 5 https://enterprise.proxmox.com/debian/pve/dists/bookworm/InRelease
HTTP/2 200
date: Tue, 26 Dec 2025 10:13:33 GMT
content-type: application/octet-stream
content-length: 1880
Meaning: DNS and HTTPS work end-to-end. If APT still fails, it’s likely APT proxy config, APT acquiring IPv6 differently, or a transient resolver timeout.
Decision: If curl works but APT fails, compare proxy settings and force IPv4 in APT as a test.
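Forcing IPv4 in APT for a single run is a standard APT option and needs no config edits (a one-off test, not a fix):
cr0x@server:~$ apt-get -o Acquire::ForceIPv4=true update
If this succeeds while a plain apt-get update fails, you’ve isolated the problem to the IPv6 path, not DNS.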
Task 10: Check proxy configuration affecting APT (common in corporate networks)
cr0x@server:~$ grep -R "Acquire::http::Proxy" -n /etc/apt/apt.conf /etc/apt/apt.conf.d 2>/dev/null
/etc/apt/apt.conf.d/80proxy:1:Acquire::http::Proxy "http://proxy.corp.example:3128";
/etc/apt/apt.conf.d/80proxy:2:Acquire::https::Proxy "http://proxy.corp.example:3128/";
Meaning: APT is forced through a proxy, even if your shell environment isn’t. If that proxy is down or misbehaving, APT may show misleading errors.
Decision: Validate proxy connectivity, authentication, and DNS behavior. If the environment shouldn’t use a proxy, remove this config deliberately (don’t just comment it out and forget).
Task 11: Check shell proxy environment variables (they can sabotage curl/wget too)
cr0x@server:~$ env | grep -i proxy
http_proxy=http://proxy.corp.example:3128
https_proxy=http://proxy.corp.example:3128
no_proxy=localhost,127.0.0.1,10.0.0.0/8
Meaning: Your shell is proxied. Some tools honor these variables, some don’t. Mixed behavior is how you get contradictory tests.
Decision: For clean testing, run commands with proxy disabled/enabled explicitly.
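For the proxy-disabled side, you don’t need to clean up your whole session; strip the variables for a single command (Task 12 below covers the proxy-enabled side):
cr0x@server:~$ env -u http_proxy -u https_proxy -u no_proxy curl -I --connect-timeout 5 https://enterprise.proxmox.com/
cr0x@server:~$ curl --noproxy '*' -I --connect-timeout 5 https://enterprise.proxmox.com/   # same idea, curl-specific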
Task 12: Prove proxy connectivity and whether proxy resolves DNS for you
cr0x@server:~$ curl -I --proxy http://proxy.corp.example:3128 --connect-timeout 5 https://enterprise.proxmox.com/
HTTP/1.1 200 Connection established
HTTP/2 200
date: Tue, 26 Dec 2025 10:14:11 GMT
content-type: text/html; charset=utf-8
Meaning: Proxy is reachable and can establish a tunnel for HTTPS. If the proxy returns “could not resolve host” errors itself, then the proxy’s DNS is broken, not your node’s.
Decision: If proxy works but direct fails, your network requires proxy egress. If direct works but proxy fails, remove proxy dependency for nodes if policy allows.
Task 13: Force IPv4 in a single test to isolate IPv6 preference issues
cr0x@server:~$ curl -4 -I --connect-timeout 5 https://enterprise.proxmox.com/
HTTP/2 200
date: Tue, 26 Dec 2025 10:14:40 GMT
content-type: text/html; charset=utf-8
Meaning: IPv4 path is fine. If the same curl without -4 hangs or fails, IPv6 is likely broken or filtered.
Decision: Fix IPv6 routing or disable IPv6 on the node/network until it’s actually supported end-to-end.
Task 14: Inspect NSS configuration (rare, but lethal when wrong)
cr0x@server:~$ grep -E "^(hosts:)" /etc/nsswitch.conf
hosts: files dns
Meaning: Hostname resolution uses /etc/hosts then DNS. If this line is weird (missing dns, or includes broken modules), you get bizarre failures.
Decision: Keep files dns unless you have a very good reason to be clever.
Task 15: Check /etc/hosts for self-inflicted wounds
cr0x@server:~$ cat /etc/hosts
127.0.0.1 localhost
10.10.0.20 pve01.corp.example pve01
Meaning: This is sane: localhost maps to 127.0.0.1 only; node’s management IP maps to its hostname. If you map random external names here, you deserve the outage you get.
Decision: Only pin what you must (usually the node’s own name). Use DNS for everything else.
Task 16: Capture DNS packets (when you need proof, not opinions)
cr0x@server:~$ tcpdump -ni vmbr0 port 53 -c 5
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on vmbr0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
10:15:11.123456 IP 10.10.0.20.41788 > 10.10.0.53.53: 1234+ A? enterprise.proxmox.com. (39)
10:15:11.144321 IP 10.10.0.53.53 > 10.10.0.20.41788: 1234 1/0/1 A 51.91.38.34 (55)
10:15:11.145001 IP 10.10.0.20.35321 > 10.10.0.53.53: 5678+ AAAA? enterprise.proxmox.com. (39)
10:15:11.165210 IP 10.10.0.53.53 > 10.10.0.20.35321: 5678 0/1/1 (94)
5 packets captured
Meaning: You can see queries and responses. Here the A query gets an answer; the AAAA query gets a response with no answer records (likely NODATA). That’s not inherently wrong.
Decision: If you see queries leaving but no responses, it’s a path/ACL issue. If you see responses but apps still fail, look at resolver selection, caching, or proxy behavior.
Task 17: Check systemd-resolved health and logs
cr0x@server:~$ systemctl status systemd-resolved --no-pager
● systemd-resolved.service - Network Name Resolution
Loaded: loaded (/lib/systemd/system/systemd-resolved.service; enabled; preset: enabled)
Active: active (running) since Tue 2025-12-26 09:55:02 UTC; 20min ago
Docs: man:systemd-resolved.service(8)
Main PID: 612 (systemd-resolve)
Status: "Processing requests..."
Tasks: 1 (limit: 38351)
Memory: 7.2M
CPU: 1.104s
CGroup: /system.slice/systemd-resolved.service
└─612 /lib/systemd/systemd-resolved
cr0x@server:~$ journalctl -u systemd-resolved -n 20 --no-pager
Dec 26 10:02:11 server systemd-resolved[612]: Using DNS server 10.10.0.53 for interface vmbr0.
Dec 26 10:07:44 server systemd-resolved[612]: Switching to DNS server 10.10.0.54 for interface vmbr0.
Dec 26 10:08:10 server systemd-resolved[612]: Using degraded feature set UDP instead of TCP for DNS server 10.10.0.54.
Meaning: Resolved is flipping servers or degrading transport. That’s a strong hint your DNS servers are flaky or blocked for TCP/53/EDNS0 paths.
Decision: Escalate to DNS infrastructure or adjust which resolvers the node uses (prefer stable, local resolvers).
Task 18: Validate Proxmox repo configuration (because “could not resolve host” sometimes means “wrong host”)
cr0x@server:~$ grep -R "proxmox" -n /etc/apt/sources.list /etc/apt/sources.list.d 2>/dev/null
/etc/apt/sources.list.d/pve-enterprise.list:1:deb https://enterprise.proxmox.com/debian/pve bookworm pve-enterprise
Meaning: Repo host is exactly what APT is trying to resolve. If you copied a config from a wiki or an old build, you might be pointing at a hostname that’s blocked, retired, or only reachable via proxy.
Decision: Confirm you’re using the correct repo for your subscription status and network policy. Don’t “fix DNS” to reach a repo you’re not allowed to use.
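If the node has no subscription, the enterprise repo will keep failing no matter how healthy DNS is. The widely used alternative is the no-subscription repository; a sketch for Proxmox VE 8 on bookworm (check current Proxmox documentation before adopting it, and disable the enterprise list at the same time):
cr0x@server:~$ cat /etc/apt/sources.list.d/pve-no-subscription.list
deb http://download.proxmox.com/debian/pve bookworm pve-no-subscription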
DNS stack on Proxmox: resolv.conf, systemd-resolved, and the usual traps
Most Proxmox deployments are basically Debian with opinions. Debian gives you choices for name resolution: classic resolv.conf, or systemd-resolved as a local stub resolver. Proxmox nodes often get configured through /etc/network/interfaces (ifupdown2) and sometimes inherit DNS from DHCP, sometimes from static config.
Know which resolver path you’re on
The fastest tell is: what is /etc/resolv.conf?
- If it points to 127.0.0.53 (stub), you’re on systemd-resolved and upstream DNS is controlled elsewhere.
- If it contains nameserver 10.x.x.x entries directly, you’re likely on classic resolv.conf management (or resolved is in “foreign” mode).
- If it’s being rewritten every reboot, you probably have a DHCP client, network manager, or ifupdown hooks managing it.
Operators get burned because they “fix” DNS by editing resolv.conf, it works for 20 minutes, then the network stack overwrites it. The next patch window becomes a déjà vu festival.
Prefer stable resolver sources for nodes
In an enterprise environment, the best DNS servers for Proxmox nodes are usually:
- Local recursive resolvers designed for infrastructure (often internal), with monitoring and redundancy.
- Resolvers reachable on the management VLAN with explicit firewall rules.
The worst DNS servers for Proxmox nodes are usually:
- Random consumer resolvers reachable “some of the time” through NAT paths.
- Branch-office DNS servers that disappear during WAN events.
- Anything that relies on captive portals, endpoint posture checks, or “helpful” web-based auth.
Timeout behavior: one dead nameserver can ruin your day
The libc resolver typically tries servers in order, with timeouts and retries. If your first nameserver is dead and your second is healthy, resolution will still be slow and can look like it’s failing under load.
If you see long pauses on apt update before it fails, suspect timeouts rather than a clean NXDOMAIN. That’s routing, firewall, or a dead resolver, not a typo.
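If (and only if) the node is on classically managed resolv.conf, the glibc resolver options give you some control over this behavior. A sketch, not an instruction to hand-edit a file that something else owns:
cr0x@server:~$ cat /etc/resolv.conf
nameserver 10.10.0.53
nameserver 10.10.0.54
options timeout:2 attempts:2 rotate
timeout and attempts cap how long a dead first server can stall you; rotate spreads queries across both servers instead of always hammering the first.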
Hope is not a strategy.
— often attributed in operations culture (paraphrased idea)
IPv6 failure modes that masquerade as DNS
IPv6 is not the villain. Half-IPv6 is the villain: when the node gets an IPv6 address and default route (via RA), but upstream routing, firewall, or proxy doesn’t actually support IPv6 for outbound traffic.
Here’s how this becomes a “could not resolve host” problem:
- Your resolver returns both A and AAAA records.
- The client prefers AAAA (IPv6) or tries it first.
- The connection attempt hangs or fails.
- The application reports a generic failure. Some apps blame DNS because name resolution was involved.
Patterns that scream “IPv6 is lying to you”
- getent ahosts returns both IPv6 (AAAA) and IPv4 entries, but only IPv4 connections work.
- curl -4 works; plain curl is slow or fails.
- You have an IPv6 default route but no stable upstream IPv6 connectivity.
What to do about it
In a perfect world, you fix IPv6 properly: correct RA, correct routing, correct firewall, correct DNS. In the real world, you also need a safe operational workaround. Two pragmatic approaches:
- Force IPv4 for APT during an outage window (short-term workaround).
- Disable IPv6 on the management interface if your organization is not actually supporting it yet (medium-term cleanup).
Be disciplined: if you disable IPv6, document it and track the work to re-enable it when the network is ready. “Temporary” is how you end up with 2035-era legacy config and a team that believes IPv6 is haunted.
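Hedged sketches of both workarounds. The APT option is standard; the sysctl assumes vmbr0 is the management bridge, and should go through configuration management rather than a one-off edit you’ll forget:
cr0x@server:~$ cat /etc/apt/apt.conf.d/99force-ipv4
Acquire::ForceIPv4 "true";
cr0x@server:~$ sysctl -w net.ipv6.conf.vmbr0.disable_ipv6=1    # runtime only; persist via a sysctl.d drop-in if policy allows
Both are easy to apply and easy to forget, which is exactly why they belong in the documented, tracked category of fixes.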
Joke #2: IPv6 is great—right up until someone deploys it “as a concept” instead of as a network.
Corporate proxy and “transparent” proxy failure modes
Proxmox nodes are servers. But they often live in networks designed for endpoints. That’s how you get proxies. Sometimes explicit. Sometimes “transparent.” Sometimes decided by a PAC file that no server should ever consume, but surprise: someone exported environment variables into /etc/environment and now APT is negotiating with a squid box that thinks it’s an HR browser.
Proxy failure modes that look like DNS
- Proxy is down: client fails to connect, error bubbles up as “could not resolve host” depending on the tool and configuration.
- Proxy requires authentication: APT might fail early. curl may show a 407. Some tooling reports generic resolution errors after retries.
- Proxy resolves DNS itself: your node never resolves external names; the proxy does. If the proxy’s DNS breaks, every node “has DNS issues” simultaneously.
- Proxy does not resolve DNS: it expects the client to resolve and then CONNECT by IP or by hostname. If the client DNS is wrong, the proxy is irrelevant.
- TLS interception: less about resolution, more about “APT can’t fetch,” but it gets misdiagnosed the same way because operators fixate on the first error line.
Proxy handling rules I enforce on Proxmox nodes
- Be explicit. If a node must use a proxy, configure it in APT config files, not in random shell profiles.
- Limit scope. Use no_proxy for cluster subnets, storage networks, and management ranges so internal traffic doesn’t hairpin through the proxy.
- Test both paths. Have a standard “direct” test and a standard “proxy” test so you can immediately isolate the failure domain.
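A sketch of what explicit and limited can look like in APT itself. Hostnames are placeholders; the per-host DIRECT override is documented APT behavior, but verify it against your APT version before relying on it:
cr0x@server:~$ cat /etc/apt/apt.conf.d/80proxy
// placeholder hostnames: adjust to your proxy and internal mirror
Acquire::http::Proxy "http://proxy.corp.example:3128";
Acquire::https::Proxy "http://proxy.corp.example:3128";
Acquire::http::Proxy::mirror.corp.example "DIRECT";
Acquire::https::Proxy::mirror.corp.example "DIRECT";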
Common mistakes: symptom → root cause → fix
This is the section where you recognize your last outage.
1) Symptom: apt update says “could not resolve host”, but dig works
- Root cause: APT is using a proxy (APT config), and the proxy is failing to resolve or connect. Your direct DNS tests bypassed it.
- Fix: Inspect /etc/apt/apt.conf.d for proxy directives; test the proxy with curl using --proxy. Fix the proxy or remove the proxy config.
2) Symptom: DNS works for some names, fails for others
- Root cause: split-horizon DNS, conditional forwarding misconfig, or a resolver that blocks certain domains/categories.
- Fix: Query the same name against multiple resolvers (dig @server), compare answers, and choose the right resolver for the node’s network role.
3) Symptom: Everything is slow; sometimes it works after 30–60 seconds
- Root cause: first nameserver is dead/filtered, second is fine; resolver timeouts waste time before failover.
- Fix: Remove or deprioritize dead resolvers; ensure firewall allows UDP/53 and TCP/53 to your resolvers.
4) Symptom: curl -4 works; plain curl hangs or fails; APT is flaky
- Root cause: broken IPv6 route or upstream IPv6 filtering. Client tries IPv6 first.
- Fix: Fix IPv6 properly (routing/RA/firewall) or disable IPv6 on the management interface until it’s supported. As a workaround, force IPv4 for APT.
5) Symptom: Editing /etc/resolv.conf “fixes it” until reboot
- Root cause: resolv.conf is managed (systemd-resolved, DHCP client, ifupdown hooks). Your manual edit is overwritten.
- Fix: Configure DNS in the correct layer: interface config, systemd-resolved settings, or DHCP options. Stop fighting the automation.
6) Symptom: Node resolves internal names fine, external names fail
- Root cause: internal resolver lacks recursion to internet, or firewall blocks outbound DNS from that segment, expecting a proxy.
- Fix: Use resolvers that provide recursion for nodes that need updates, or provide a controlled egress path (proxy or DNS forwarder) explicitly.
7) Symptom: Only Proxmox nodes fail, VMs are fine
- Root cause: host network differs from VM network: management VLAN ACLs, vmbr0 mis-tagged VLAN, host firewall rules, or wrong gateway on host.
- Fix: Compare host and VM routing/DNS, check bridge/VLAN config, validate host default route and DNS server reachability on vmbr interfaces.
8) Symptom: Error appears after “hardening” changes
- Root cause: firewall rules blocking outbound UDP/53/TCP/53, or blocking IPv6 ICMP (which breaks PMTUD and neighbor discovery, causing weirdness).
- Fix: Allow necessary DNS traffic to resolvers; allow required ICMP/ICMPv6 types; verify with tcpdump and explicit tests.
Checklists / step-by-step plan
Checklist A: The 10-minute “get updates working” plan
- Prove egress by IP: curl -I https://1.1.1.1. If broken, fix routing/firewall before DNS.
- Prove libc resolution: getent ahosts enterprise.proxmox.com. If broken, focus on resolver config.
- Prove DNS server reachability: dig @<dns> enterprise.proxmox.com. If it times out, fix ACLs/routes to DNS.
- Check resolver source of truth: resolvectl status and ls -l /etc/resolv.conf. Don’t edit the wrong layer.
- Check proxy settings: grep -R Acquire:: /etc/apt/apt.conf.d and env | grep -i proxy.
- Try forced IPv4 as a diagnostic: curl -4 -I https://enterprise.proxmox.com/.
- Re-run apt-get update: confirm the fix is real, not placebo.
Checklist B: The “make it stay fixed” plan (after the outage)
- Pick the correct DNS ownership model: systemd-resolved + interface DNS, or classic resolv.conf management. Document it.
- Make DNS servers explicit for static management networks. Relying on DHCP in a data center is fine until it isn’t.
- Validate IPv6 posture: either support it end-to-end or disable it intentionally on that segment. No half measures.
- Standardize proxy config: APT proxy file managed by configuration management; keep shell environment clean for operators.
- Add a lightweight node health check: periodic DNS + repo reachability checks with clear alerting on which layer failed.
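A minimal sketch of that health check as a shell script. The hostname, resolver IP, and repo URL are placeholders for whatever your environment actually uses; it only tests reachability, not repo authentication, and the exit codes exist so monitoring can label the failing layer:
#!/bin/bash
# connectivity-contract.sh: prove which layer fails instead of guessing
set -u
HOST="enterprise.proxmox.com"      # hostname the node must resolve (placeholder)
DNS_IP="10.10.0.53"                # resolver the node is supposed to use (placeholder)
EGRESS_IP="1.1.1.1"                # known-good external IP for raw egress
REPO_URL="https://enterprise.proxmox.com/debian/pve/dists/bookworm/InRelease"   # placeholder

# 1. Egress by IP: no DNS involved
curl -s -o /dev/null --connect-timeout 3 "https://${EGRESS_IP}" || { echo "FAIL routing/egress"; exit 1; }
# 2. Resolution through libc (the path most applications use)
getent ahosts "${HOST}" > /dev/null || { echo "FAIL dns (libc path)"; exit 2; }
# 3. Resolution against the configured resolver directly
dig +time=2 +tries=1 @"${DNS_IP}" "${HOST}" > /dev/null || { echo "FAIL dns (server ${DNS_IP})"; exit 3; }
# 4. End-to-end fetch from the repo origin (connection only; auth errors still count as reachable)
curl -s -o /dev/null --connect-timeout 5 "${REPO_URL}" || { echo "FAIL repo/proxy"; exit 4; }
echo "OK all layers"
Run it from cron or a systemd timer and alert on the printed label, not on “apt is broken”.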
Checklist C: Evidence collection for escalation (when it’s not your fault)
- Capture resolver config: resolvectl status, /etc/resolv.conf, /etc/nsswitch.conf.
- Capture DNS query proof: dig @dns host outputs and tcpdump snippets.
- Capture route tables: ip route, ip -6 route.
- Capture proxy config: APT proxy file, environment variables, and a curl --proxy test.
- Write down timestamps. DNS teams love timestamps because logs exist. Vibes do not.
Three corporate-world mini-stories from the trenches
Mini-story 1: The incident caused by a wrong assumption
A team inherited a small Proxmox cluster running a mix of management and guest networks across a couple of VLANs. Everything looked tidy: vmbr0 for management, vmbr1 for guests. Updates were “sometimes flaky,” which everyone interpreted as “the internet is the internet.” Nobody liked it, but nobody owned it.
During a planned upgrade, apt update started failing with “could not resolve host” for multiple repositories. An engineer did the standard thing: replaced /etc/resolv.conf with public resolvers and moved on. It worked. For a few hours. Then the maintenance automation rebooted nodes one-by-one and the problem came back on every reboot. Cue panic.
The wrong assumption was simple: they assumed resolv.conf was authoritative. It wasn’t. The system was using systemd-resolved, and DHCP on vmbr0 was injecting DNS servers from a branch-office DHCP relay that intermittently lost its upstream. The “fix” was overwritten precisely when reliability mattered: during reboots.
Once they mapped the resolver chain, the real fix was boring: pin stable internal recursive resolvers for the management interface and stop consuming DHCP-provided DNS on that VLAN. Then they documented the resolver ownership model in the build standard. The next upgrade was quiet, which is the highest compliment you can give an SRE.
Mini-story 2: The optimization that backfired
A security project decided to “optimize DNS” by forcing all server DNS queries through a central filtering resolver to standardize logging and block categories. It was pitched as reducing risk and simplifying audits. It did both, technically.
But Proxmox nodes were now behind a resolver that rate-limited bursts and had aggressive timeouts under load. PVE nodes are chatty in maintenance windows: they hit Debian mirrors, Proxmox repos, sometimes Ceph repos, plus whatever monitoring and backup endpoints you’ve wired in. The new resolver worked fine during normal hours and then faceplanted during patch waves.
The first symptom was “could not resolve host” in APT. The second symptom was worse: slow UI loads on pveproxy when it tried to resolve external endpoints for certain integrations. Engineers “fixed” it by adding another nameserver entry (a public resolver) as a fallback. This created a policy violation and, more importantly, nondeterminism: half the time they used internal DNS, half the time they leaked queries externally.
The eventual fix was not heroic. They created a dedicated resolver tier for infrastructure nodes with higher capacity and tuned retry/timeout behavior, and they pinned Proxmox management networks to it. The filtering resolver still existed for endpoints and general servers. The lesson was painfully corporate: optimization is not free; it just moves the bill to a different cost center.
Mini-story 3: The boring but correct practice that saved the day
A different company had a habit that seemed almost old-fashioned: every Proxmox node build included a small “connectivity contract” test. It ran after provisioning and daily thereafter. It checked three things: resolve a known hostname using the configured resolver, connect to a known IP directly, and fetch a tiny file from the repo origin used by that environment. Results went to monitoring with explicit labels: DNS, routing, or repo/proxy.
One Tuesday morning, multiple teams started complaining that their own systems couldn’t reach external services. The Proxmox cluster looked healthy. VMs were running. Nobody wanted to touch the hypervisors.
Then the alert fired: “DNS failure on management VLAN resolver path.” Not “apt broken.” Not “Proxmox can’t update.” A clean DNS failure, across nodes, with timestamps and last-known-good data. The ops team walked into the network room (figuratively), handed over packet captures showing unanswered UDP/53 queries, and got the DNS service restored quickly.
The boring practice didn’t prevent the outage. It prevented the second outage: the one where five engineers spend half a day changing random configs on production hypervisors because they don’t know which layer is failing.
FAQ
1) Why does Proxmox show “could not resolve host” during apt update?
Because APT can’t translate the repository hostname to an IP address using the node’s resolver path. That can be actual DNS failure, unreachable DNS servers, broken IPv6 preference, or proxy-related confusion.
2) If ping 8.8.8.8 works, does that prove DNS is the problem?
No. It proves basic IPv4 egress and routing for ICMP to that address. DNS can still be broken, blocked, or misconfigured. Also, ICMP working doesn’t prove TCP/443 works.
3) If dig works but APT fails, what should I suspect first?
Proxy configuration and IPv6 preference differences. Check /etc/apt/apt.conf.d for Acquire:: proxy directives, then test curl -4 vs default.
4) Should I just edit /etc/resolv.conf?
Only if you’ve confirmed it’s not managed by systemd-resolved or DHCP hooks. If it’s a symlink to systemd-resolved, editing it is a temporary illusion. Fix DNS in the correct ownership layer.
5) How can IPv6 break DNS resolution?
IPv6 itself doesn’t break DNS. But if your network advertises IPv6 and your clients prefer it, connection attempts can fail or stall, and tools may report confusing errors that look like resolution issues.
6) What’s the quickest way to prove it’s an IPv6 issue?
Run the same test with IPv4 forced: curl -4 -I https://host. If forced IPv4 works reliably but default doesn’t, IPv6 routing or filtering is suspect.
7) Can Proxmox clustering (corosync) be affected by DNS problems?
Corosync itself usually runs by IP configuration, but operational tooling, repo access, metrics, backups, and some integrations depend on stable name resolution. DNS issues won’t always break clustering, but they will break maintenance, which is when you least want surprises.
8) I’m behind a corporate proxy. Should Proxmox nodes use it?
If policy requires it, yes—explicitly and consistently. Configure APT proxy settings in a managed file, keep no_proxy correct for cluster and storage networks, and test proxy reachability as part of node health checks.
9) Why does this happen only after reboot?
Because DNS settings are often injected at boot by DHCP or managed services. Your manual fix gets overwritten. Confirm management via ls -l /etc/resolv.conf and resolvectl status.
10) When should I escalate to the DNS/network team?
When you can demonstrate: queries leave the node, DNS server is unreachable or not responding, or resolved logs show server switching/degraded behavior. Bring packet captures and timestamps, not interpretations.
Conclusion: practical next steps
“Could not resolve host” on a Proxmox node is rarely a mystery. It’s a workflow problem. If you prove the failure domain in order—egress by IP, libc resolution, DNS server reachability, IPv6 preference, then proxy behavior—you’ll stop treating DNS like folklore.
Next steps that actually pay off:
- Standardize resolver ownership (systemd-resolved or not) and document it in your node build.
- Pin stable DNS servers for the management network and ensure UDP/53 and TCP/53 are allowed explicitly.
- Decide on IPv6 intentionally: support it end-to-end or disable it on the segment. Half-IPv6 is operational debt with interest.
- Make proxy usage explicit and testable. APT proxy config should not be a scavenger hunt.
- Add a tiny, scheduled connectivity contract check so you detect DNS/proxy/routing drift before patch night.