Nothing makes a virtualization host feel more “alive” than a pack of VMs that can’t reach the internet. The host can apt update, DNS looks fine, but every guest acts like it’s on a submarine. Nine times out of ten, the culprit is not “Proxmox being weird.” It’s vmbr0 doing exactly what you (accidentally) told it to do.
This is a production-grade troubleshooting guide for when your Proxmox guests have no outbound connectivity. We’ll treat it like an incident: isolate the failure domain, test the path hop-by-hop, and fix the bridge with minimal drama. Expect opinions, commands you can actually run, and failure modes that only show up at 02:00.
Fast diagnosis playbook
If you’re on-call, you don’t want a networking lecture. You want a triage order that finds the bottleneck fast. Here’s the path I use: guest → bridge → uplink → gateway → DNS. Don’t guess. Measure.
First: confirm the failure domain (guest-only, host-only, or upstream)
- From the host: can it reach the gateway and the internet? If yes, upstream is likely fine.
- From a VM: can it reach its default gateway? If no, the issue is inside the guest, the vNIC, the bridge, VLAN tagging, or firewall between guest and gateway.
- From a VM: can it reach an IP on the internet (e.g., a known resolver) but not DNS names? If yes, it’s DNS or DNS interception.
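A minimal host-side sketch of that first step (the guest-side ladder is covered in Task 7 below). The gateway 192.0.2.1 and example.com follow this guide's examples; substitute your own:
cr0x@server:~$ ping -c2 -W1 192.0.2.1        # host to gateway
cr0x@server:~$ ping -c2 -W1 1.1.1.1          # host to internet by IP
cr0x@server:~$ getent hosts example.com      # host DNS resolution
If all three pass from the host but fail from a guest, your failure domain is the guest, its vNIC, the bridge, VLAN tagging, or the firewall in between.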
Second: verify vmbr0 is carrying the right frames
- Does vmbr0 have the correct IP config (if it’s your host’s L3 interface) and the correct ports (physical NIC or bond as a bridge port)?
- Are VLANs tagged where you think they are? Proxmox VLAN-aware bridges don’t magically fix trunk mistakes.
- Is Proxmox Firewall enabled at Datacenter / Node / VM levels with a default DROP you forgot about?
Third: validate gateway behavior and MTU
- Check ARP and neighbor tables: if ARP fails, you’re not even at Layer 2.
- Check MTU mismatches: classic when the host works (smaller packets) and guest traffic stalls (path MTU discovery blocked).
- Check offloads and bridge netfilter settings: they can produce “it pings but TCP is dead” symptoms.
Stop doing this: randomly restarting networking and hoping it will “relearn.” The bridge’s MAC learning isn’t your problem; your configuration is. The failure is deterministic, even if it’s annoying.
How vmbr0 actually works (and why it fails)
vmbr0 on Proxmox is usually a Linux bridge. Think of it as a virtual switch living in the kernel. Your VMs attach to it via tap interfaces (e.g., tap100i0) and the bridge forwards frames between those taps and a physical uplink (e.g., eno1) or a bond (bond0).
The bridge is Layer 2. Routing is Layer 3. Proxmox networking problems happen when people blur that line. A bridge can carry VLAN-tagged traffic, but it does not “route VLANs.” If your VM is on VLAN 30 and your upstream switch port is not trunking VLAN 30, your VM is not “misconfigured.” It is isolated by design.
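If you want to see which VLANs each bridge port will actually carry (a quick check, assuming a VLAN-aware vmbr0), the kernel will tell you directly:
cr0x@server:~$ bridge vlan show
# Each port lists its VLAN IDs. For a VM tagged 30, expect the tap interface to
# show 30 (typically as PVID/untagged toward the guest) and the uplink to carry
# 30 as a member. A VID missing here means the bridge drops those frames before
# the switch ever sees them.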
Typical Proxmox bridge layouts
- Simple bridged uplink: vmbr0 with bridge-ports eno1. The host IP is on vmbr0. VMs share the same L2 domain.
- VLAN-aware bridge: vmbr0 with bridge-vlan-aware yes, and each VM NIC specifies a VLAN tag.
- Routed or NAT network: a separate bridge (e.g., vmbr1) is internal; the host does NAT out via vmbr0. Good for labs, not my first choice for production unless you want intentional isolation.
- Bonded uplink: bond0 is a port of vmbr0. Misconfiguring the bonding mode versus the switch configuration is a classic outage generator.
The failure modes are mostly boring
When VMs have no internet, the issue is usually one of these:
- Wrong default gateway (guest) or wrong default route (host) if you’re doing NAT/routing.
- VLAN mismatch (VM tag vs bridge vs switch trunk).
- Firewall rule dropping forwarding traffic.
- Bridge connected to the wrong physical NIC (yes, really).
- MAC filtering / port security upstream rejecting multiple MACs on one port.
- MTU mismatch or PMTUD blocked.
- nftables/iptables or bridge-nf-call-iptables doing surprise filtering.
Joke #1: A Linux bridge is like a conference room whiteboard—everyone assumes it’s correct until they actually read what’s written on it.
Interesting facts and historical context (useful, not trivia)
- Linux bridging has been in the kernel for decades, and early designs assumed “simple L2 forwarding.” Modern setups pile VLANs, firewall hooks, and offloads on top.
- Proxmox networking inherits Debian conventions: what you configure in /etc/network/interfaces is still the source of truth for many deployments, even with newer tooling available.
- VLAN tags are not “metadata”; they are part of the Ethernet frame. If a switch port is access-only, tagged frames are usually dropped without ceremony.
- Bonding modes are not interchangeable. 802.3ad (LACP) requires a matching LACP config on the switch; active-backup does not. Mixing them can look like “flaky internet” rather than a clean failure.
- Bridge STP defaults matter. Enabling STP can prevent loops, but it can also introduce forwarding delay that looks like random DHCP failures right after link changes.
- “The host has internet” proves less than you think. The host might be routed differently (different VLAN, different interface, different firewall policy) than the guests.
- Port security is common in corporate networks. Switches may limit the number of MAC addresses learned on a port. Virtualization violates that assumption by design.
- MTU mismatches are survivable for ICMP, lethal for TCP when PMTUD is blocked. People interpret this as “DNS is broken” because web requests hang.
- Bridge netfilter hooks were controversial because they blur L2/L3 boundaries. They’re powerful, but they make packet flow non-obvious unless you know where the hooks are.
Practical tasks: commands, what the output means, and what you decide
These are real tasks you can run on a Proxmox node. Each one gives a signal. The trick is to treat outputs as decision points, not as “information.”
Task 1: Check host routes (prove the host has a sane default route)
cr0x@server:~$ ip route
default via 192.0.2.1 dev vmbr0 proto static
192.0.2.0/24 dev vmbr0 proto kernel scope link src 192.0.2.10
What it means: the host’s default route is via vmbr0 to 192.0.2.1. If the default route points somewhere else (wrong interface, missing), your host may still have some connectivity via policy routing, but your “normal” path is broken.
Decision: if default route is missing or wrong, fix host networking first. Don’t touch VMs yet.
Task 2: Inspect bridge membership (is the uplink actually in vmbr0?)
cr0x@server:~$ bridge link
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master vmbr0 state forwarding priority 32 cost 100
5: tap100i0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master vmbr0 state forwarding priority 32 cost 2
What it means: eno1 is enslaved to vmbr0. If you don’t see your physical NIC/bond with master vmbr0, your VMs are bridged to nowhere.
Decision: if the uplink is not a bridge port, fix /etc/network/interfaces (or the node network config) and reload networking carefully.
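A sketch of the “reload carefully” part, assuming ifupdown2 (the default on current Proxmox releases). Back up the file first, wherever you like; the path below is just an example:
cr0x@server:~$ cp /etc/network/interfaces /root/interfaces.backup-$(date +%F)   # arbitrary backup location
cr0x@server:~$ ifreload -a                                                      # apply bridge/port changes without a reboot
cr0x@server:~$ bridge link                                                      # confirm the uplink now shows "master vmbr0"
If you manage the node over the network you are about to change, have out-of-band access ready before you run the reload.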
Task 3: Confirm vmbr0 configuration and VLAN awareness
cr0x@server:~$ cat /etc/network/interfaces
auto lo
iface lo inet loopback
auto eno1
iface eno1 inet manual
auto vmbr0
iface vmbr0 inet static
address 192.0.2.10/24
gateway 192.0.2.1
bridge-ports eno1
bridge-stp off
bridge-fd 0
bridge-vlan-aware yes
bridge-vids 2-4094
What it means: the IP is on vmbr0, not on eno1. VLAN-aware is enabled, allowing tagged traffic to traverse.
Decision: if VLAN-aware is off but you tag VM NICs, either enable it or remove tags and use access VLAN on the switch. Pick one. Mixing intentions creates outages.
Task 4: Verify the VM NIC is attached to vmbr0 and has a VLAN tag (Proxmox view)
cr0x@server:~$ qm config 100 | sed -n 's/^net0:/net0:/p'
net0: virtio=DE:AD:BE:EF:10:00,bridge=vmbr0,tag=30,firewall=1
What it means: VM 100 uses vmbr0, VLAN tag 30, and Proxmox firewall is enabled for that NIC.
Decision: if the bridge is wrong, fix it here. If firewall=1, you must consider firewall policy as a first-class suspect.
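If the NIC really is on the wrong bridge or tag, qm set fixes it in place. A hedged example using the VM and values from this guide; adjust the MAC, bridge, and tag to yours:
cr0x@server:~$ qm set 100 --net0 virtio=DE:AD:BE:EF:10:00,bridge=vmbr0,tag=30,firewall=1
# qm set replaces the whole net0 string, so re-specify everything and keep the
# existing MAC to avoid the guest seeing a "new" NIC. Drop tag= entirely for an
# untagged/access design, and set firewall=0 only if you actually mean it.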
Task 5: Confirm the VM sees a link and has an IP (inside the guest)
cr0x@server:~$ qm guest exec 100 -- ip -br addr
lo               UNKNOWN        127.0.0.1/8 ::1/128
ens18            UP             192.0.2.50/24 fe80::dcad:beff:feef:1000/64
What it means: interface ens18 is UP with IPv4 192.0.2.50/24.
Decision: if there’s no IP, stop blaming vmbr0 and fix DHCP/static config in the guest (or cloud-init).
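For cloud-init guests, the address and gateway can be pushed from the host side. A sketch using this guide’s addressing; it only applies if VM 100 actually has a cloud-init drive:
cr0x@server:~$ qm set 100 --ipconfig0 ip=192.0.2.50/24,gw=192.0.2.1
cr0x@server:~$ qm cloudinit dump 100 network   # sanity-check what the guest will receive
The change takes effect on the next cloud-init run (typically a reboot or re-init), not instantly.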
Task 6: Check the guest’s default route (the most common “no internet” cause)
cr0x@server:~$ qm guest exec 100 -- ip route
default via 192.0.2.1 dev ens18
192.0.2.0/24 dev ens18 proto kernel scope link src 192.0.2.50
What it means: default route points to 192.0.2.1.
Decision: if the default gateway is wrong (or missing), fix the guest network config or DHCP scope. Don’t touch the bridge yet.
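To confirm the diagnosis quickly, you can inject a correct default route through the guest agent before touching persistent config. A hedged sketch; the route disappears on reboot, so fix DHCP, netplan, or cloud-init afterwards:
cr0x@server:~$ qm guest exec 100 -- ip route replace default via 192.0.2.1
cr0x@server:~$ qm guest exec 100 -- ping -c2 -W1 1.1.1.1   # retest immediately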
Task 7: Quick reachability ladder: ARP → gateway ping → external IP ping → DNS
cr0x@server:~$ qm guest exec 100 -- bash -lc 'ip neigh; ping -c2 -W1 192.0.2.1; ping -c2 -W1 1.1.1.1; getent hosts example.com'
192.0.2.1 dev ens18 lladdr aa:bb:cc:dd:ee:ff REACHABLE
PING 192.0.2.1 (192.0.2.1) 56(84) bytes of data.
64 bytes from 192.0.2.1: icmp_seq=1 ttl=64 time=0.421 ms
64 bytes from 192.0.2.1: icmp_seq=2 ttl=64 time=0.389 ms
PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
64 bytes from 1.1.1.1: icmp_seq=1 ttl=57 time=3.12 ms
64 bytes from 1.1.1.1: icmp_seq=2 ttl=57 time=3.08 ms
93.184.216.34 example.com
What it means: the VM has L2 neighbor resolution, L3 to gateway, L3 to internet, and working DNS resolution.
Decision: if any step fails, you now know which layer to investigate. Don’t skip steps, because “DNS is broken” is often just “no route.”
Task 8: Inspect ARP and forwarding database on the host (who learned whose MAC)
cr0x@server:~$ ip neigh show dev vmbr0 | head
192.0.2.1 lladdr aa:bb:cc:dd:ee:ff REACHABLE
192.0.2.50 lladdr de:ad:be:ef:10:00 STALE
cr0x@server:~$ bridge fdb show br vmbr0 | head
aa:bb:cc:dd:ee:ff dev eno1 master vmbr0
de:ad:be:ef:10:00 dev tap100i0 master vmbr0
What it means: the bridge learned the VM MAC on tap100i0 and the gateway MAC on eno1. If you see the gateway MAC learned on a tap (or not at all), you’ve got a loop, wrong wiring, or VLAN mismatch.
Decision: if FDB learning is wrong, focus on VLAN tagging, bridge ports, and physical switch port configuration.
Task 9: Check whether Proxmox firewall is enabled and dropping forwarding
cr0x@server:~$ pve-firewall status
Status: enabled/running
cr0x@server:~$ nft list ruleset | sed -n '1,120p'
table inet filter {
chain forward {
type filter hook forward priority filter; policy drop;
ct state established,related accept
iifname "vmbr0" oifname "vmbr0" accept
}
}
What it means: forward policy is DROP by default (common), with a specific accept for intra-bridge traffic. If you’re doing NAT (vmbr1 → vmbr0), you need explicit forwards.
Decision: if ruleset drops forwarding, either adjust PVE firewall rules or temporarily disable firewall to confirm it’s the cause (then re-enable with correct policy).
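A hedged way to prove or clear the firewall in one move: stop it, retest, start it again. Don’t leave it down; stopping the service should remove the PVE-managed rules while it’s stopped, which is exactly why this is a diagnostic and not a fix:
cr0x@server:~$ pve-firewall stop
cr0x@server:~$ qm guest exec 100 -- ping -c2 -W1 1.1.1.1   # does the symptom disappear?
cr0x@server:~$ pve-firewall start
cr0x@server:~$ pve-firewall status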
Task 10: Validate bridge netfilter sysctls (the “why is my bridge being firewalled?” trap)
cr0x@server:~$ sysctl net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables net.bridge.bridge-nf-call-arptables
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-arptables = 0
What it means: bridged IPv4/IPv6 frames are passed to iptables/nftables hooks. That can be correct, or it can be an accidental packet shredder if you thought “bridge equals switch equals no firewall.”
Decision: if you don’t intentionally filter bridged traffic, consider disabling these hooks. If you do filter, audit rules carefully and document it.
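If you decide the hooks should stay off, make it persistent and deliberate. A minimal sketch using a sysctl drop-in; the file name is arbitrary. Be aware that the classic iptables-based Proxmox firewall filters bridged VM traffic through these hooks, so leave them on if you use pve-firewall on VM NICs:
cr0x@server:~$ printf 'net.bridge.bridge-nf-call-iptables = 0\nnet.bridge.bridge-nf-call-ip6tables = 0\n' > /etc/sysctl.d/99-no-bridge-nf.conf   # file name is an example
cr0x@server:~$ sysctl --system   # reapply all sysctl drop-ins, including this one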
Task 11: Check MTU end-to-end (VM, bridge, uplink)
cr0x@server:~$ ip link show vmbr0; ip link show eno1
3: vmbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 00:11:22:33:44:55 brd ff:ff:ff:ff:ff:ff
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master vmbr0 state UP mode DEFAULT group default qlen 1000
    link/ether 00:11:22:33:44:55 brd ff:ff:ff:ff:ff:ff
cr0x@server:~$ qm guest exec 100 -- ip link show ens18
2: ens18: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
    link/ether de:ad:be:ef:10:00 brd ff:ff:ff:ff:ff:ff
What it means: MTU matches at 1500. If VM MTU is 1500 but uplink is 9000 (or vice versa), you can get weird partial connectivity depending on what traffic is flowing.
Decision: pick one MTU strategy and apply it consistently: 1500 everywhere, or jumbo frames everywhere including the upstream switch ports. Half-jumbo is a great way to waste an afternoon.
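A sketch of the “jumbo everywhere” variant in /etc/network/interfaces, assuming the upstream switch ports are already at 9000; the VM side can inherit the bridge MTU via the NIC’s mtu=1 option (virtio):
auto eno1
iface eno1 inet manual
mtu 9000
auto vmbr0
iface vmbr0 inet static
address 192.0.2.10/24
gateway 192.0.2.1
bridge-ports eno1
bridge-stp off
bridge-fd 0
mtu 9000
cr0x@server:~$ qm set 100 --net0 virtio=DE:AD:BE:EF:10:00,bridge=vmbr0,tag=30,firewall=1,mtu=1   # mtu=1 means "use the bridge MTU"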
Task 12: Detect “it pings but TCP dies” with a DF ping
cr0x@server:~$ qm guest exec 100 -- ping -c2 -M do -s 1472 1.1.1.1
PING 1.1.1.1 (1.1.1.1) 1472(1500) bytes of data.
1480 bytes from 1.1.1.1: icmp_seq=1 ttl=57 time=3.24 ms
1480 bytes from 1.1.1.1: icmp_seq=2 ttl=57 time=3.18 ms
--- 1.1.1.1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1002ms
What it means: a path MTU of at least 1500 bytes works end-to-end. If you get “Frag needed and DF set,” you have an MTU mismatch or a tunnel in the path.
Decision: if DF ping fails, fix MTU or allow PMTUD (ICMP too-big) through firewalls. Don’t “fix” it by reducing MSS blindly unless you own the environment and document the hack.
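If the 1472-byte probe fails, walk the payload size down until it passes; the largest passing size plus 28 bytes of IP/ICMP headers approximates the real path MTU. A sketch, still from the guide’s canary VM:
cr0x@server:~$ qm guest exec 100 -- ping -c1 -W1 -M do -s 1452 1.1.1.1   # 1480 on the wire
cr0x@server:~$ qm guest exec 100 -- ping -c1 -W1 -M do -s 1412 1.1.1.1   # 1440 on the wire
cr0x@server:~$ qm guest exec 100 -- ping -c1 -W1 -M do -s 1372 1.1.1.1   # 1400 on the wire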
Task 13: Check for MAC address limits / port security symptoms (from host perspective)
cr0x@server:~$ journalctl -k --since "30 min ago" | egrep -i 'mac|flood|storm|blocked|bond|link' | tail -n 20
Dec 26 09:21:07 server kernel: vmbr0: port 2(eno1) entered blocking state
Dec 26 09:21:07 server kernel: vmbr0: port 2(eno1) entered forwarding state
What it means: kernel logs are not going to directly tell you “your switch is enforcing port security,” but frequent port state changes or link flaps will show.
Decision: if you suspect port security, talk to the network team. Your Proxmox host is not a single MAC endpoint; it’s a MAC factory. Configure the switch accordingly.
Task 14: Packet capture on vmbr0 and on the tap (prove VLAN tagging and DHCP)
cr0x@server:~$ tcpdump -eni vmbr0 -c 10 '(arp or (udp port 67 or 68) or icmp)'
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on vmbr0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
08:15:11.120001 de:ad:be:ef:10:00 > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 342: 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request
08:15:11.220112 aa:bb:cc:dd:ee:ff > de:ad:be:ef:10:00, ethertype IPv4 (0x0800), length 342: 192.0.2.1.67 > 192.0.2.50.68: BOOTP/DHCP, Reply
What it means: you see DHCP request and reply. If DHCP requests leave the VM but no replies return, the upstream VLAN, relay, or firewall is wrong. If you see tagged frames when you expect untagged (or the reverse), you found your mismatch.
Decision: packet captures end arguments. Use them early when the team is stuck in “it should work.”
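To settle the tagging question specifically, capture on the physical uplink rather than the bridge. A hedged sketch for the VLAN-30 example used above:
cr0x@server:~$ tcpdump -eni eno1 -c 10 'vlan 30 and (arp or icmp or port 67 or port 68)'
# Frames from VM 100 should appear here with an 802.1Q tag of 30. If they show on
# vmbr0 but never on eno1 (or arrive untagged), the mismatch is on the host; if
# they leave eno1 tagged and nothing comes back, the switch trunk is the suspect.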
Task 15: Validate NAT configuration (if you built a private vmbr1)
cr0x@server:~$ ip -br addr show vmbr1
vmbr1 UP 10.10.10.1/24
cr0x@server:~$ sysctl net.ipv4.ip_forward
net.ipv4.ip_forward = 1
cr0x@server:~$ nft list ruleset | sed -n '1,200p' | egrep -n 'nat|masquerade|postrouting|forward' | head -n 20
42: table ip nat {
55: chain postrouting {
58: oifname "vmbr0" ip saddr 10.10.10.0/24 masquerade
What it means: vmbr1 exists, forwarding is on, and there’s a masquerade rule to NAT traffic out of vmbr0.
Decision: if guests on vmbr1 have no internet, NAT is a prime suspect. Either fix it or stop doing NAT and use proper VLANs/routing like an adult network.
Task 16: Check Proxmox-specific network device naming and bonds
cr0x@server:~$ cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
MII Status: up
Active Aggregator Info:
Aggregator ID: 1
Number of ports: 2
Slave Interface: eno1
MII Status: up
Slave Interface: eno2
MII Status: up
What it means: bond is LACP (802.3ad) and both slaves are up. If bond shows “Number of ports: 1” unexpectedly, or slaves are flapping, your switch LACP config likely disagrees with you.
Decision: if bonding mode and switch config aren’t aligned, don’t “tune.” Align. For production, either do proper LACP on both ends or stick to active-backup for simplicity.
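For reference, the “aligned” LACP shape in /etc/network/interfaces looks roughly like this; a sketch that only works if the switch has a matching LACP port-channel on both member ports:
auto eno1
iface eno1 inet manual
auto eno2
iface eno2 inet manual
auto bond0
iface bond0 inet manual
bond-slaves eno1 eno2
bond-miimon 100
bond-mode 802.3ad
bond-xmit-hash-policy layer2+3
auto vmbr0
iface vmbr0 inet static
address 192.0.2.10/24
gateway 192.0.2.1
bridge-ports bond0
bridge-stp off
bridge-fd 0
The hash policy only controls how this host spreads flows across members; it doesn’t need to match the switch, but the LACP mode absolutely does.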
Common mistakes: symptom → root cause → fix
This is the part you forward to your future self.
1) Symptom: Host has internet, VMs can’t reach gateway
Root cause: VM attached to wrong bridge, or bridge has no physical uplink port, or VLAN tag mismatch blocks L2 adjacency.
Fix: verify qm config NIC uses bridge=vmbr0; verify bridge link shows physical NIC as master vmbr0; verify VLAN configuration end-to-end (VM tag, bridge VLAN-aware, switch trunk/access).
2) Symptom: VM can ping gateway, can’t ping external IP
Root cause: missing route beyond gateway (guest gateway wrong), upstream ACL, or NAT/routing misconfigured if VM is on private bridge.
Fix: check guest default route; if NAT design, verify ip_forward and masquerade; confirm upstream gateway routes back to the VM subnet (if routed).
3) Symptom: VM can ping 1.1.1.1 but DNS fails
Root cause: DNS servers unreachable, wrong resolv.conf, systemd-resolved stub mispointed, or corporate DNS interception + firewall rules.
Fix: verify getent hosts, check /etc/resolv.conf in guest, ensure UDP/TCP 53 allowed, and that DHCP is handing correct resolvers for that VLAN.
4) Symptom: DHCP times out in VM, static IP works
Root cause: VLAN trunk missing that VLAN, STP delay at link up, DHCP snooping/relay issue, or firewall dropping DHCP broadcasts.
Fix: tcpdump on vmbr0 for DHCP; set bridge-fd 0 and keep STP off unless you actually need it; coordinate with the network team on snooping/relay; ensure the VM firewall isn’t blocking DHCP.
5) Symptom: Some VMs work, others don’t, on same host
Root cause: per-VM VLAN tags differ; per-VM firewall differs; MAC conflict; duplicate IP; or one VM is on a different bridge.
Fix: compare qm config across VMs; check for duplicate IPs with arping or neighbor table; audit firewall settings at VM level.
6) Symptom: Works for small pings, HTTPS hangs or is slow
Root cause: MTU mismatch or PMTUD blocked; sometimes checksum offload weirdness with certain NIC drivers/firmware.
Fix: DF ping test; align MTU; allow ICMP too-big; if still odd, test disabling GRO/LRO/TSO on the host NIC as a diagnostic.
7) Symptom: Connectivity dies after enabling Proxmox Firewall
Root cause: default policy drop in forward chain, missing allow rules, or bridge netfilter hooks enabled with unexpected rules.
Fix: review nftables rules; adjust PVE firewall rules; keep a minimal “allow established + allow vmbrX forwarding” baseline and build from there.
8) Symptom: VMs lose connectivity when another host connects, or during migration
Root cause: upstream port security limiting MAC count, or switch not expecting multiple MACs behind a single port, or LACP hashing oddities with asymmetric flows.
Fix: configure switchport for virtualization (allow multiple MACs, disable strict port security or raise limits), verify LACP on both ends, consider active-backup if the network team won’t support LACP properly.
Three corporate mini-stories (anonymized, plausible, technically accurate)
Mini-story 1: The outage caused by a wrong assumption
The team inherited a small Proxmox cluster from a project that “never went live.” They moved it into a production rack, plugged two 10GbE ports into a pair of top-of-rack switches, and set up bond0 in 802.3ad because “that’s the professional way.” The switches had ports configured as independent access ports with a standard VLAN. Nobody told the virtualization team. Nobody told the network team either, because everyone assumed it was obvious.
The host itself had intermittent connectivity. VMs were worse: sometimes they could resolve DNS but not fetch packages; sometimes the default gateway ping worked; sometimes ARP looked stale. It smelled like a flaky ISP link, which is always a tempting lie because it lets you stop thinking.
The turning point was a packet capture on the uplink showing bursts of retransmits and then silence, correlated with LACP negotiation messages that never stabilized. On the host, /proc/net/bonding/bond0 showed an “up” interface with one active slave, then flipped. The wrong assumption was that LACP is harmless if the switch doesn’t support it. It’s not harmless; it’s a negotiation protocol that expects a partner.
They fixed it by switching the bond to active-backup temporarily, restoring stability in minutes, then scheduling a change with the network team to deploy a proper LACP port-channel. The incident review was short and slightly painful: the system behaved exactly as configured. The humans were the unpredictable part.
Mini-story 2: The optimization that backfired
A performance-minded engineer noticed higher CPU usage on the Proxmox nodes during peak hours. Network interrupts were up. There were also some latency spikes on a busy API VM. They decided to “optimize networking” by enabling jumbo frames on the hypervisor uplink and vmbr0. The storage network already used jumbo frames, so it felt consistent.
They changed MTU to 9000 on vmbr0 and the physical NICs. The host looked fine. VMs started reporting bizarre behavior: ICMP pings worked, but TLS handshakes stalled. The monitoring dashboards were a modern art piece of partial failures. The engineer assumed the MTU change couldn’t be responsible because “bigger packets reduce overhead.” That’s true only if the whole path agrees.
The upstream switch ports were still at 1500. Worse, the network firewall in the path was configured to drop ICMP fragmentation-needed messages. PMTUD couldn’t do its job, so TCP sessions hung when they tried to send larger segments. DNS looked flaky because client retries were delayed and timeouts were misattributed.
The fix was not heroic: set MTU back to 1500 on vmbr0 and the VM vNICs, then plan jumbo frames properly with end-to-end validation and ICMP policy aligned. The lesson was old-school: optimization without measurement is just change. Sometimes expensive change.
Mini-story 3: The boring but correct practice that saved the day
A different organization ran Proxmox nodes in multiple sites, each with slightly different switch vendors and VLAN layouts. They had a simple rule: every Proxmox node must have a single “known good” management network on untagged vmbr0, and every VM network must be tagged with explicit VLANs on a VLAN-aware bridge. No exceptions, no “temporary” access ports.
It sounds pedantic. It also meant that when a site reported “VMs have no internet,” the team immediately checked VLAN trunks and tags instead of rewriting network configs. The host management IP stayed reachable, because it wasn’t riding on the same tagged mess as the tenant networks.
During a switch replacement, one trunk port was accidentally configured as access VLAN 10 instead of trunk. The symptom appeared instantly: all VMs on VLAN 30 at that node lost connectivity, but the node remained reachable on management. Packet capture showed tagged frames leaving vmbr0 and disappearing into the access port. Diagnosis took minutes, not hours.
The boring practice was separation of concerns: stable management connectivity, explicit VLANs for guests, and a standard test checklist after network changes. It’s not glamorous. It is how you avoid paging your entire team because somebody fat-fingered a switch template.
Joke #2: The only thing more persistent than a mis-tagged VLAN is a meeting invite about the mis-tagged VLAN.
Checklists / step-by-step plan
Step-by-step: Fix “VM has no internet” without making it worse
- Pick one affected VM and use it as your canary. Don’t flip random settings on all guests at once.
- Inside the VM: verify IP, subnet, gateway, DNS. Confirm ip route and getent hosts.
- From the VM: ping the gateway, ping an external IP, resolve a DNS name. Record exactly what fails.
- On the host: confirm vmbr0 has the uplink port and correct VLAN settings.
- Check firewall state: Proxmox firewall status, VM NIC firewall flag, nftables forward policy.
- Confirm L2 adjacency: host neighbor table and bridge FDB entries for VM MAC and gateway MAC.
- Run a targeted capture: tcpdump on vmbr0 to see ARP/DHCP/ICMP flow; confirm tags if relevant.
- Check MTU: verify consistent MTU; run a DF ping test to the internet.
- Only then change configuration, one dimension at a time: VLAN config, firewall rule, route, NAT, MTU.
- Validate and document the final working state: bridge config, switchport expectations, and VM tagging policy.
Minimal “known good” vmbr0 patterns
Pattern A: Simple untagged access network for host and VMs (good for small labs; limited for multi-tenant):
cr0x@server:~$ cat /etc/network/interfaces
auto lo
iface lo inet loopback
auto eno1
iface eno1 inet manual
auto vmbr0
iface vmbr0 inet static
address 192.0.2.10/24
gateway 192.0.2.1
bridge-ports eno1
bridge-stp off
bridge-fd 0
Pattern B: VLAN-aware bridge with tagged VM networks (what I recommend when you have real networking):
cr0x@server:~$ cat /etc/network/interfaces
auto lo
iface lo inet loopback
auto eno1
iface eno1 inet manual
auto vmbr0
iface vmbr0 inet static
address 192.0.2.10/24
gateway 192.0.2.1
bridge-ports eno1
bridge-stp off
bridge-fd 0
bridge-vlan-aware yes
bridge-vids 2-4094
Pattern C: Private VM network with NAT (use sparingly, document heavily):
cr0x@server:~$ cat /etc/network/interfaces
auto lo
iface lo inet loopback
auto eno1
iface eno1 inet manual
auto vmbr0
iface vmbr0 inet static
address 192.0.2.10/24
gateway 192.0.2.1
bridge-ports eno1
bridge-stp off
bridge-fd 0
auto vmbr1
iface vmbr1 inet static
address 10.10.10.1/24
bridge-ports none
bridge-stp off
bridge-fd 0
For Pattern C, you still need forwarding and NAT rules. If you can’t explain them to the next on-call engineer in under a minute, you’re building an outage, not a network.
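For completeness, those forwarding and NAT rules usually live right on the vmbr1 stanza as post-up/post-down hooks. A sketch based on the common Proxmox masquerading pattern (iptables shown; translate to nft if that’s your house standard):
auto vmbr1
iface vmbr1 inet static
address 10.10.10.1/24
bridge-ports none
bridge-stp off
bridge-fd 0
post-up echo 1 > /proc/sys/net/ipv4/ip_forward
post-up iptables -t nat -A POSTROUTING -s '10.10.10.0/24' -o vmbr0 -j MASQUERADE
post-down iptables -t nat -D POSTROUTING -s '10.10.10.0/24' -o vmbr0 -j MASQUERADE
One subnet, one egress interface, one masquerade rule: that’s the whole “explain it in under a minute” test.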
Operational checklist after changes (the one people skip)
- Can the host reach gateway, resolve DNS, and fetch packages?
- Can a VM on each VLAN reach its gateway, external IP, and resolve DNS?
- Are ARP entries stable (not constantly incomplete)?
- Does a DF ping at 1472 bytes work to an external IP?
- Is Proxmox firewall policy documented (what’s allowed, what’s default drop)?
- Is the switchport configuration recorded (access vs trunk, allowed VLANs, MAC limits, LACP)?
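Most of that checklist automates into a short canary script you run inside the canary VM after any change. A minimal sketch, assuming this guide’s example addresses; the script name and targets are arbitrary:
#!/usr/bin/env bash
# canary-net-check.sh - run inside the canary VM after a network change.
# Sketch only: adjust the gateway and test targets to your environment.
set -u
GW=192.0.2.1        # assumed gateway from this guide's examples
EXT=1.1.1.1         # any stable external IP
NAME=example.com    # any name your resolvers should answer

check() {           # run a test command quietly and report pass/fail
    local label="$1"; shift
    if "$@" >/dev/null 2>&1; then
        echo "OK   $label"
    else
        echo "FAIL $label"
    fi
}

check "gateway ping"        ping -c2 -W1 "$GW"
check "external IP ping"    ping -c2 -W1 "$EXT"
check "DNS resolution"      getent hosts "$NAME"
check "DF ping 1472 (MTU)"  ping -c2 -W1 -M do -s 1472 "$EXT"
Four lines of output tell you which layer broke, in the same order as the triage ladder above.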
A reliability quote (paraphrased idea)
Paraphrased idea, attributed to W. Edwards Deming: “You can’t improve what you don’t measure.”
FAQ
1) Why does the host have internet but VMs don’t?
Because the host and VMs may not share the same network path. The host might be untagged on vmbr0 while VMs are tagged, or firewall rules may treat forwarded traffic differently than local host traffic.
2) Should the host IP be on eno1 or on vmbr0?
On vmbr0, in the common bridged design. Put eno1 in manual mode and make it a bridge port. Assigning an IP to eno1 while also bridging it is a classic “it sort of works until it doesn’t” setup.
3) Do I need “bridge-vlan-aware yes” if I’m not using VLAN tags?
No. If everything is untagged and the switchport is access VLAN, keep it simple. Turn on VLAN awareness only when you actually tag guest NICs or need trunks.
4) My VM has an IP, but can’t get out. What’s the fastest single check?
Check the VM’s default route: ip route. If the default gateway is wrong or missing, nothing else matters until that’s fixed.
5) Is Proxmox Firewall safe to enable?
Yes, if you understand its default policies and where it applies (Datacenter, Node, VM/NIC). It becomes unsafe when someone enables it with a default DROP and no forwarding allowances, then leaves for the weekend.
6) When should I use NAT for VM internet access?
When you deliberately want a private, non-routable VM network and you accept that the Proxmox host becomes a router/firewall. For production multi-VM environments, VLANs and proper routing are usually clearer and easier to troubleshoot.
7) Why do I see DHCP requests in tcpdump but no replies?
Usually VLAN trunk issues (wrong VLAN allowed), DHCP snooping/relay misconfiguration, or firewall rules blocking broadcast/UDP. If the request leaves the VM and shows on vmbr0, the guest side is fine.
8) Can MTU mismatch really cause “no internet”?
Yes, in a deceptive way. Small traffic (pings, some DNS) might work while TCP stalls. Use DF pings and align MTU end-to-end, or allow PMTUD ICMP messages.
9) Should I disable offloads (GRO/LRO/TSO) on the NIC?
Not as a default. But as a diagnostic, it can isolate weird driver/firmware interactions. If disabling offloads “fixes” the issue, treat that as a lead: update firmware/drivers and validate switch features rather than living with a permanent workaround.
10) What’s the cleanest way to handle multiple networks on Proxmox?
One stable management network (often untagged), plus VLAN-aware bridges for guest networks with explicit tags per VM NIC. Keep switchport trunk config documented and consistent across nodes.
Conclusion: next steps that prevent repeats
When Proxmox VMs have no internet, it’s almost never “mystery virtualization.” It’s a predictable mismatch between what the VM is sending, what vmbr0 is forwarding, and what the upstream network accepts. The fastest path to resolution is to stop treating it like magic and start treating it like a pipeline.
Do these next:
- Standardize your bridge pattern: pick untagged-only or VLAN-aware-and-tagged. Don’t run both by accident.
- Write down the switch expectations: trunk/access, allowed VLANs, MAC limits, LACP mode. If it’s not written, it’s folklore.
- Keep a canary VM with known-good network config and a simple test script (gateway ping, external IP ping, DNS lookup, DF ping).
- Audit firewall posture: know where default DROP exists, and ensure forwarding rules match your design (bridged, routed, or NAT).
- Make MTU a policy decision, not a knob. 1500 everywhere is fine. Jumbo frames everywhere is fine. “Somewhere in the middle” is where incidents live.
If you want one practical takeaway: capture packets sooner. Nothing ends a debate like watching frames go out and not come back.