When an LXC container “has an IP” but can’t ping the gateway, you don’t have a mystery. You have a missing fact. Proxmox networking failures are rarely subtle; they’re just distributed across namespaces, bridges, firewall layers, and one misunderstood checkbox.
This is a production-grade checklist for the specific pain: veth/tap/bridge traffic not flowing. It’s written to force clarity: where the packet dies, why it dies, and what decision to make next.
Fast diagnosis playbook
If you’re on-call, you don’t need a theory. You need a funnel. This is the order that tends to find the bottleneck fastest.
1) Decide: is it “container only” or “host too”?
- If the host can reach the gateway and internet but the container cannot: suspect bridge, firewall, VLAN, rp_filter, container route, or MTU.
- If the host also can’t: stop touching LXC. Fix upstream physical network, host routes, or host firewall first.
2) Confirm the veth exists and is attached to the expected bridge
- No veth on the host: container didn’t start cleanly, config error, or veth creation failed due to name/limits.
- veth exists but not on vmbrX: wrong bridge name in container config, or a stale bridge was renamed.
3) Check L2 first, then L3, then policy
- L2: link up, bridge membership, VLAN filtering, MAC learning/forwarding.
- L3: container IP/mask, default route, ARP/neighbor, host routes.
- Policy: Proxmox firewall (datacenter/node/CT), nftables/iptables, sysctls (rp_filter, forwarding).
4) Use one “known good” test packet
- From container: ping gateway IP, then ping host bridge IP, then ping external IP.
- If ping gateway fails, look at ARP/neigh and bridge/VLAN before you waste time on NAT/DNS.
5) Capture on both sides of the veth
- Capture on the host veth and on vmbrX. If frames show up on the veth but never on vmbrX (or vice versa), bridge or VLAN rules are dropping them.
- If you see nothing on host veth, the container never emitted the packet (route, interface down, policy inside CT).
One quote you should keep in your pocket when you’re tempted to “just reboot it”: “Hope is not a strategy.”
— General Gordon R. Sullivan (often quoted in operations circles).
The mental model: veth, bridges, and namespaces (without mythology)
In Proxmox LXC, the container does not get a “real NIC.” It gets one end of a veth pair. The other end sits on the host and is typically plugged into a Linux bridge like vmbr0. That’s it. That’s the whole trick.
Where people get lost is that the network spans:
- The container’s network namespace (its own interfaces, routes, ARP table).
- The veth pair boundary (two interfaces that are permanently connected back-to-back).
- The host’s bridge (switch-like behavior: forwarding database, VLAN filtering, STP settings).
- The host’s IP stack (if the host is acting as router/NAT, or if you’re using host firewalling).
- Proxmox firewall (which is not magic; it’s rules that end up in netfilter).
- Upstream switching (VLAN trunks, MAC limits, port security, MTU, LACP quirks).
Tap comes up mostly with QEMU VMs, where tap devices connect the VM to the host bridge. LXC uses veth by default. But the failure patterns overlap: wrong bridge, wrong VLAN, firewall drop, MTU mismatch, or “it’s up but not actually forwarding.”
Here’s the most useful simplification:
- veth = a virtual Ethernet cable with two ends. If one end is up, the other can still be down. Treat it like a real cable.
- bridge = a virtual switch. If VLAN filtering is on, it can behave like a managed switch and drop silently.
- namespace = a room with its own routing table and ARP cache. You can’t debug a room you never enter.
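If the model still feels abstract, you can rebuild it by hand in a handful of commands. This is a minimal sketch for a disposable lab host: the bridge name vmbr0 and the 192.0.2.x addresses are the examples used throughout this article, and the veth/namespace names are made up.
ip netns add demo                                        # the "room": its own routes and ARP cache
ip link add veth-demo type veth peer name veth-demo-ns   # the virtual cable with two ends
ip link set veth-demo-ns netns demo                      # push one end into the room
ip link set veth-demo master vmbr0 up                    # plug the host end into the virtual switch
ip netns exec demo ip link set veth-demo-ns up
ip netns exec demo ip addr add 192.0.2.60/24 dev veth-demo-ns
ip netns exec demo ip route add default via 192.0.2.1
ip netns exec demo ping -c 2 192.0.2.1                   # the same ping ladder you will run against a real CT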
Joke #1 (short, relevant): A veth pair is like a marriage: if one side stops talking, the other side still thinks everything is “up.”
Interesting facts and context (because your future self deserves it)
- veth pairs arrived in Linux long before containers were mainstream. They started as a generic way to connect namespaces and virtual switches; Docker just popularized the pattern.
- Linux bridge predates Open vSwitch in many deployments. It’s old, stable, and extremely capable—especially since VLAN filtering and netfilter hooks matured.
- Proxmox firewall is not a separate firewall appliance. It generates netfilter rules on the node. If you run your own nftables rules, you’re co-authoring the policy.
- ARP/neighbor discovery is the first canary. If a container can’t reach its gateway, 80% of the time ARP fails due to VLAN, bridge, or MAC filtering—not “DNS issues.”
- Reverse path filtering (rp_filter) was designed for anti-spoofing. It’s great until you do asymmetric routing, policy routing, or multi-homing. Then it becomes a silent traffic shredder.
- MTU mismatches rarely break small pings. They break real workloads: TLS handshakes, container image pulls, and anything that triggers fragmentation/PMTUD failures.
- VLAN filtering on a Linux bridge is relatively “new” compared to classic bridging. Many teams enable it for hygiene and then forget that untagged traffic now needs explicit PVID handling.
- Conntrack state tables are a finite resource. When they overflow or timeouts are wrong, “random” container connectivity failures appear—especially under NAT.
Practical tasks: commands, outputs, decisions (the real checklist)
Below are hands-on tasks. Each one is designed to answer a single question and force a next action. Run them in order until you find the first lie.
Task 1 — Confirm the container sees an interface and an IP
cr0x@server:~$ pct exec 101 -- ip -br link
lo UNKNOWN 00:00:00:00:00:00 <LOOPBACK,UP,LOWER_UP>
eth0 UP 2a:7d:1c:4f:9b:12 <BROADCAST,MULTICAST,UP,LOWER_UP>
What it means: If eth0 is missing or DOWN, stop. That’s not a routing issue; that’s interface plumbing or container config.
Decision: If eth0 is down, inspect the container config and host veth creation (Tasks 5–7). If it’s up, check IP/routing next.
cr0x@server:~$ pct exec 101 -- ip -br addr show dev eth0
eth0 UP 192.0.2.50/24 fe80::287d:1cff:fe4f:9b12/64
Decision: No IPv4 address when you expect one? Fix DHCP/static config inside the CT (or the Proxmox config if using ip=dhcp).
Task 2 — Verify default route inside the container
cr0x@server:~$ pct exec 101 -- ip route
default via 192.0.2.1 dev eth0
192.0.2.0/24 dev eth0 proto kernel scope link src 192.0.2.50
What it means: If default route is missing or points to the wrong gateway, the container can’t leave its subnet.
Decision: Fix the gateway. Don’t touch bridges/firewalls until L3 inside the CT is sane.
Task 3 — Quick ping ladder (gateway → host bridge → public IP)
cr0x@server:~$ pct exec 101 -- ping -c 2 192.0.2.1
PING 192.0.2.1 (192.0.2.1) 56(84) bytes of data.
64 bytes from 192.0.2.1: icmp_seq=1 ttl=64 time=0.512 ms
64 bytes from 192.0.2.1: icmp_seq=2 ttl=64 time=0.487 ms
--- 192.0.2.1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1002ms
Decision: If gateway ping fails, go straight to ARP/neigh and VLAN/bridge checks (Tasks 4, 8, 10, 11). If gateway ping works but public IP fails, you likely have routing/NAT/firewall upstream.
Task 4 — Check neighbor (ARP) state inside the container
cr0x@server:~$ pct exec 101 -- ip neigh show dev eth0
192.0.2.1 lladdr 00:11:22:33:44:55 REACHABLE
What it means: REACHABLE/STALE is fine. INCOMPLETE means ARP requests aren’t being answered (VLAN mismatch, bridge drop, upstream switch, or wrong gateway IP).
Decision: If you see INCOMPLETE, don’t tweak DNS. Fix L2/L3 adjacency.
Task 5 — Find the host-side veth for the container
cr0x@server:~$ pct config 101 | grep -E '^net'
net0: name=eth0,bridge=vmbr0,firewall=1,hwaddr=2A:7D:1C:4F:9B:12,ip=192.0.2.50/24,gw=192.0.2.1,type=veth
Decision: Confirm the bridge name. If it references vmbr1 but you expect vmbr0, you found your bug.
cr0x@server:~$ ip -br link | grep -E 'veth|vmbr0'
vmbr0 UP 3a:ad:be:ef:01:23 <BROADCAST,MULTICAST,UP,LOWER_UP>
veth101i0 UP fe:9a:1c:0f:3b:7a <BROADCAST,MULTICAST,UP,LOWER_UP>
What it means: Proxmox typically names host veth like veth<ctid>i0. If it’s missing, the container network didn’t come up.
Decision: If missing, look at container start logs and kernel messages (Task 14).
Task 6 — Confirm the veth is enslaved to the bridge
cr0x@server:~$ ip link show master vmbr0
7: veth101i0@if6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master vmbr0 state UP mode DEFAULT group default qlen 1000
link/ether fe:9a:1c:0f:3b:7a brd ff:ff:ff:ff:ff:ff
Decision: If the veth isn’t listed under vmbr0, the container is connected to nowhere. Fix the bridge assignment or recreate the interface.
Task 7 — Check bridge port state and learning/flooding flags
cr0x@server:~$ bridge link show dev veth101i0
7: veth101i0@if6: <BROADCAST,MULTICAST,UP,LOWER_UP> master vmbr0 state forwarding priority 32 cost 2
What it means: If the port is blocking or not forwarding, STP or bridge misconfig is stopping traffic.
Decision: If blocked, inspect STP and loop prevention on the bridge and upstream (Task 12).
Task 8 — Verify VLAN tagging/filtering on the bridge
cr0x@server:~$ bridge vlan show dev veth101i0
port vlan-id
veth101i0 1 PVID Egress Untagged
What it means: With VLAN-aware bridges, the port needs a correct PVID for untagged traffic, or explicit VLAN membership for tagged traffic.
Decision: If you expect VLAN 20 but see only VLAN 1, either set tag=20 in the container net config or adjust bridge VLAN rules.
Task 9 — Confirm the container net config matches your VLAN intent
cr0x@server:~$ pct config 101 | grep -E '^net0'
net0: name=eth0,bridge=vmbr0,firewall=1,tag=20,hwaddr=2A:7D:1C:4F:9B:12,ip=192.0.2.50/24,gw=192.0.2.1,type=veth
Decision: If your switchport is access VLAN 20 (untagged), do not tag inside Proxmox; use untagged and set the bridge PVID appropriately. If your host uplink is a trunk, tagging can be correct. Match reality, not wishes.
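If the decision is to let Proxmox manage the tag, here is a hedged sketch using the example CT 101 and VLAN 20 from this task. Keep the existing hwaddr= so the MAC doesn't change, and drop tag= entirely if the uplink is an access port.
pct set 101 -net0 name=eth0,bridge=vmbr0,firewall=1,tag=20,hwaddr=2A:7D:1C:4F:9B:12,ip=192.0.2.50/24,gw=192.0.2.1,type=veth
# access-port uplink: the same command without tag=20; the VLAN-aware bridge then needs the right PVID on the uplink instead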
Task 10 — Check the bridge FDB (MAC learning)
cr0x@server:~$ bridge fdb show br vmbr0 | grep -i fe:9a:1c:0f:3b:7a
fe:9a:1c:0f:3b:7a dev veth101i0 master vmbr0 permanent
What it means: If MACs aren’t being learned dynamically (or appear on the wrong port), you can get “works for one container” weirdness.
Decision: No sign of container MACs at all? Traffic may not be entering the bridge (container not emitting, firewall dropping, or veth down).
Task 11 — Inspect Proxmox firewall status at all three levels
cr0x@server:~$ pve-firewall status
Status: enabled/running
cr0x@server:~$ cat /etc/pve/firewall/cluster.fw | sed -n '1,120p'
[OPTIONS]
enable: 1
[RULES]
cr0x@server:~$ cat /etc/pve/nodes/$(hostname)/host.fw | sed -n '1,160p'
[OPTIONS]
enable: 1
policy_in: DROP
policy_out: ACCEPT
What it means: A default DROP on the host firewall can break container forwarding if you didn’t explicitly allow bridge traffic.
Decision: If you don’t fully understand your firewall policies, temporarily set the CT firewall off for a controlled test, not as a “fix.”
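For a controlled test only (and re-enable afterwards), the per-NIC firewall flag can be flipped off on the example CT 101; keep every other key exactly as pct config 101 shows it.
pct set 101 -net0 name=eth0,bridge=vmbr0,firewall=0,hwaddr=2A:7D:1C:4F:9B:12,ip=192.0.2.50/24,gw=192.0.2.1,type=veth
# the CT-level firewall can also be toggled with "enable: 0" under [OPTIONS] in /etc/pve/firewall/101.fw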
Task 12 — Check whether STP/bridge settings are quietly preventing forwarding
cr0x@server:~$ cat /etc/network/interfaces | sed -n '1,220p'
auto lo
iface lo inet loopback
auto eno1
iface eno1 inet manual
auto vmbr0
iface vmbr0 inet static
address 192.0.2.10/24
gateway 192.0.2.1
bridge-ports eno1
bridge-stp off
bridge-fd 0
bridge-vlan-aware yes
bridge-vids 2-4094
Decision: If STP is on and you’re not expecting it, you may be waiting through forwarding delays or hitting a blocked port. If VLAN-aware is on, you must manage VLAN membership intentionally.
Task 13 — Validate host sysctls that affect forwarding and asymmetry
cr0x@server:~$ sysctl net.ipv4.ip_forward net.ipv4.conf.all.rp_filter net.ipv4.conf.vmbr0.rp_filter
net.ipv4.ip_forward = 1
net.ipv4.conf.all.rp_filter = 1
net.ipv4.conf.vmbr0.rp_filter = 1
What it means: If you do routing/NAT on the host and have asymmetric paths (multiple uplinks, policy routing, VRFs), rp_filter=1 can drop valid traffic.
Decision: If you suspect rp_filter, test with rp_filter=2 (loose) on relevant interfaces, then lock it down with proper routing. Don’t leave it at “whatever makes it work” without understanding.
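A hedged test sequence, assuming vmbr0 sits in the asymmetric path. The kernel applies the higher of the "all" and per-interface values, so both need to move.
sysctl -w net.ipv4.conf.all.rp_filter=2      # loose mode, for the duration of the test
sysctl -w net.ipv4.conf.vmbr0.rp_filter=2
# if this "fixes" it, the real fix is correct routing; if you keep loose mode, persist it in /etc/sysctl.d/ with a comment explaining why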
Task 14 — Look for kernel hints: veth, bridge, nf_tables, MTU
cr0x@server:~$ dmesg -T | tail -n 30
[Thu Dec 26 09:14:02 2025] device veth101i0 entered promiscuous mode
[Thu Dec 26 09:14:02 2025] vmbr0: port 7(veth101i0) entered blocking state
[Thu Dec 26 09:14:03 2025] vmbr0: port 7(veth101i0) entered forwarding state
Decision: If you see errors like “failed to create veth,” “RTNETLINK answers: File exists,” or MTU-related warnings, you have a host-level problem: naming collisions, resource exhaustion, or bad config sequencing.
Task 15 — Trace packets with tcpdump on both sides
cr0x@server:~$ tcpdump -ni veth101i0 arp or icmp
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on veth101i0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
09:18:10.102938 ARP, Request who-has 192.0.2.1 tell 192.0.2.50, length 28
Decision: If ARP requests leave the veth but you never see replies, check VLAN and upstream switch. If you see replies on vmbr0 but not on veth101i0, something on the host is filtering between bridge and port (bridge VLAN filtering, ebtables/nft bridge rules, Proxmox firewall).
cr0x@server:~$ tcpdump -ni vmbr0 arp or icmp
listening on vmbr0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
09:18:10.102944 ARP, Request who-has 192.0.2.1 tell 192.0.2.50, length 28
Decision: Seeing the same request on vmbr0 is good: the packet is making it into the bridge domain. Now find where the reply dies.
Task 16 — Verify nftables/iptables policy isn’t killing bridge traffic
cr0x@server:~$ nft list ruleset | sed -n '1,200p'
table inet filter {
chain input {
type filter hook input priority 0; policy drop;
ct state established,related accept
}
chain forward {
type filter hook forward priority 0; policy drop;
}
}
What it means: A default DROP on forward will break routed/NATed container traffic. Bridged traffic can also be impacted depending on sysctls and bridge netfilter settings.
Decision: If you rely on host routing/NAT, add explicit forward accepts for the container subnet and the outgoing interface. If you’re purely bridging, inspect bridge-nf-call-iptables (Task 17).
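A minimal sketch against the inet filter table shown above, assuming 192.0.2.0/24 is the container subnet and eno1 the uplink. These are runtime rules; persist them in /etc/nftables.conf or your config management.
nft insert rule inet filter forward ip saddr 192.0.2.0/24 oifname "eno1" accept
nft insert rule inet filter forward ip daddr 192.0.2.0/24 iifname "eno1" ct state established,related accept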
Task 17 — Check bridge netfilter sysctls (the silent policy multiplier)
cr0x@server:~$ sysctl net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables net.bridge.bridge-nf-call-arptables
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-arptables = 0
What it means: When enabled, bridged packets can traverse iptables/nft hooks. That can be good (security) or disastrous (surprise drops).
Decision: If you didn’t intend firewalling on bridged traffic, set these to 0 and rely on proper L2/L3 boundaries—or write rules consciously.
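If the decision is "no firewalling on bridged traffic," here is a hedged sketch. The sysctl.d file name is arbitrary, the keys only exist while br_netfilter is loaded, and other tooling (including pve-firewall) may manage these values, so re-check after firewall restarts.
sysctl -w net.bridge.bridge-nf-call-iptables=0
sysctl -w net.bridge.bridge-nf-call-ip6tables=0
# persist, so a reboot or module reload doesn't silently flip policy back
cat > /etc/sysctl.d/99-bridge-nf.conf <<'EOF'
net.bridge.bridge-nf-call-iptables = 0
net.bridge.bridge-nf-call-ip6tables = 0
EOF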
Task 18 — Validate MTU across container, veth, bridge, and uplink
cr0x@server:~$ ip link | grep -E 'eno1:|vmbr0:|veth101i0'
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq master vmbr0 state UP mode DEFAULT group default qlen 1000
5: vmbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
7: veth101i0@if6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr0 state UP mode DEFAULT group default qlen 1000
What it means: An uplink at 9000 and a bridge at 1500 is not always wrong, but it’s a breeding ground for PMTUD edge cases if routing/NAT is involved.
Decision: Standardize MTU end-to-end for the segment, or prove with captures that fragmentation/ICMP “frag needed” is working.
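A hedged probe from the example CT 101; 203.0.113.10 is a placeholder external IP. 1472 bytes of ICMP payload plus 28 bytes of headers makes a 1500-byte packet, and -M do sets DF so an undersized hop fails loudly instead of fragmenting.
pct exec 101 -- ping -M do -s 1472 -c 2 192.0.2.1
pct exec 101 -- ping -M do -s 1472 -c 2 203.0.113.10
pct exec 101 -- tracepath -n 203.0.113.10    # reports the discovered path MTU hop by hop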
Task 19 — Confirm container’s DNS isn’t the scapegoat
cr0x@server:~$ pct exec 101 -- getent hosts example.com
2606:2800:220:1:248:1893:25c8:1946 example.com
Decision: If gateway ping works but DNS fails, fix resolvers (container /etc/resolv.conf or systemd-resolved handling). Don’t rewrite bridge configs for a DNS issue.
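A few hedged checks inside the example CT 101; the resolver address is a placeholder, and dig may not exist in a minimal CT.
pct exec 101 -- cat /etc/resolv.conf
pct exec 101 -- resolvectl status                     # only meaningful if the CT runs systemd-resolved
pct exec 101 -- dig +short example.com @192.0.2.53    # query one specific resolver directly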
Task 20 — Test host-to-container connectivity (local sanity)
cr0x@server:~$ ping -c 2 192.0.2.50
PING 192.0.2.50 (192.0.2.50) 56(84) bytes of data.
64 bytes from 192.0.2.50: icmp_seq=1 ttl=64 time=0.221 ms
64 bytes from 192.0.2.50: icmp_seq=2 ttl=64 time=0.213 ms
--- 192.0.2.50 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
Decision: If the host can’t ping the container but the container can ping the host bridge, suspect firewall policy on the host, or ebtables/nft bridge filtering.
Common mistakes: symptom → root cause → fix
1) “Container has IP but can’t ping gateway”
Root cause: VLAN mismatch or VLAN-aware bridge dropping untagged frames; sometimes upstream switchport is access VLAN X but Proxmox tags VLAN X (double tagging or wrong tagging).
Fix: Make VLAN intent explicit. If uplink is trunk and container should be on VLAN 20, set tag=20 and ensure vmbr0 allows VLAN 20. If uplink is access VLAN 20, do not tag in Proxmox; ensure port PVID matches.
2) “Works for minutes, then dies; restart container fixes it”
Root cause: Neighbor/ARP table churn due to duplicate IP, duplicate MAC (cloned CT), or upstream MAC limit/port-security shutting down learning.
Fix: Ensure unique MAC per CT; never clone and forget. Check upstream switch MAC security. Verify no duplicate IPs on the segment.
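Hedged checks from the host, reusing the example MAC and IP from this article; arping ships in the iputils-arping package and may need installing.
arping -D -I vmbr0 -c 3 192.0.2.50                    # with CT 101 stopped: any reply means something else already owns the IP
arping -I vmbr0 -c 3 192.0.2.50                       # with CT 101 running: replies from more than one MAC mean a duplicate
bridge fdb show | grep -i 2a:7d:1c:4f:9b:12           # the CT MAC should sit behind exactly one bridge port
grep -ho 'hwaddr=[^,]*' /etc/pve/lxc/*.conf | sort | uniq -d   # duplicate MACs across CT configs on this node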
3) “Container can ping IPs but not resolve names”
Root cause: DNS misconfiguration, systemd-resolved stub issues, or Proxmox injecting wrong resolvers. Sometimes firewall blocks UDP/53 while ICMP passes.
Fix: Validate /etc/resolv.conf inside CT and run getent hosts. If firewalling is on, allow DNS explicitly.
4) “Only large downloads fail; pings work”
Root cause: MTU mismatch + broken PMTUD (ICMP “frag needed” blocked), or overlay networks interacting with jumbo frames.
Fix: Standardize MTU; allow ICMP type 3 code 4 where relevant; confirm with tcpdump and a large curl or ping -M do -s test.
5) “Traffic works until we enable Proxmox firewall”
Root cause: Default DROP in node/host/CT policies with missing allow rules for bridge forwarding; or bridge netfilter sysctls sending bridged packets through L3 rules unexpectedly.
Fix: Audit policies at datacenter/node/CT. Decide where filtering should happen. Keep bridge-nf sysctls intentional.
6) “Container can reach host but not anything else; NAT is configured (we think)”
Root cause: Host net.ipv4.ip_forward is off, forward chain drops, or NAT rules are in iptables while the system uses nftables (or vice versa).
Fix: Validate forwarding sysctl, ensure forward chain accepts, and make sure you’re writing rules to the active firewall backend.
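A minimal routed/NAT sketch in native nftables, assuming 192.0.2.0/24 should be masqueraded out of eno1. Adapt it if the Proxmox firewall or iptables-nft owns your ruleset; mixing backends is exactly the failure mode described above.
sysctl -w net.ipv4.ip_forward=1
nft add table ip nat
nft 'add chain ip nat postrouting { type nat hook postrouting priority 100 ; }'
nft add rule ip nat postrouting ip saddr 192.0.2.0/24 oifname "eno1" masquerade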
7) “After migrating a CT to another node, network broke”
Root cause: The destination node has different vmbr names, VLAN-aware settings, MTU, or firewall defaults.
Fix: Treat node networking as an API contract. Standardize /etc/network/interfaces across nodes and verify with a pre-migration checklist.
Joke #2 (short, relevant): VLAN bugs are like glitter—once they show up, you’ll find them in places you didn’t touch.
Checklists / step-by-step plan
Checklist A — “Container can’t reach gateway” (most common)
- Inside CT: ip -br addr and ip route. Fix missing IP/route first.
- Inside CT: ip neigh. If the gateway is INCOMPLETE, it's L2/VLAN/bridge.
- On host: confirm the veth exists and is on the expected bridge (ip -br link, ip link show master vmbr0).
- On host: check VLAN membership (bridge vlan show). Compare to the switchport mode (access vs trunk) and the Proxmox tag= setting.
- Capture ARP on the host veth and the vmbr. If the request leaves but the reply never returns, go upstream (switch config, MAC security, cabling, LACP hashing issues).
Checklist B — “Gateway works, internet doesn’t”
- Ping a public IP (not a name). If it fails, it’s routing/NAT/firewall, not DNS.
- Confirm host has connectivity and correct default route.
- If using host routing/NAT: verify net.ipv4.ip_forward=1, the forward chain policy, and that NAT rules are applied in the right firewall backend.
- Check rp_filter if multiple paths exist.
- Only after IP reachability works: validate DNS.
Checklist C — “Intermittent or performance-related”
- Check MTU across uplink, bridge, veth, and container.
- Check conntrack table pressure (symptoms: random drops under load); a quick check is sketched after this list.
- Capture and look for retransmits, ICMP frag-needed, or TCP MSS anomalies.
- Audit firewall rules for stateful tracking timeouts and unexpected drops.
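The conntrack check mentioned above is quick; the sysctl keys exist once nf_conntrack is loaded.
sysctl net.netfilter.nf_conntrack_count net.netfilter.nf_conntrack_max
dmesg -T | grep -i 'nf_conntrack: table full'    # the kernel logs this line when it starts dropping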
Three corporate-world mini-stories (so you don’t repeat them)
Story 1 — The incident caused by a wrong assumption
They had a “simple” plan: move a set of legacy services from VMs into LXC to save memory. Networking was “the same,” because everything was still on vmbr0. That sentence is how incidents start.
The first container came up with an IP and answered pings from the host. From the outside network? Dead. The team assumed the upstream firewall didn’t like the new MAC, so they opened tickets and waited. Meanwhile, services were down and the incident channel was doing what incident channels do: generating heat, not data.
The actual issue was a VLAN mismatch introduced by an “obvious” assumption: the switchport was an access port for VLAN 30, so they set tag=30 on the container. On that port, the frames were already untagged VLAN 30. The container was effectively sending VLAN 30 tagged frames into an access port that discarded them. ARP never completed. No gateway, no joy.
What made it worse: the host itself had an IP on that access VLAN and could reach the gateway, so “the host works” was used as proof that the container should work. But the host was untagged; the container was tagged. Different reality, same cable.
The fix was boring: remove the tag, keep the container untagged, and document the switchport mode in the cluster network standard. After that, they added a pre-flight test: run bridge vlan show and compare it to the switchport config before migration.
Story 2 — The optimization that backfired
A different org was proud of its security posture. They enabled Proxmox firewall everywhere, set default policies to DROP, and rolled it out with a change window that looked generous on the calendar. It wasn’t generous in real life.
Connectivity failures weren’t total. Some containers could talk to their database; others couldn’t. DNS sometimes worked. ICMP was inconsistent. That’s the kind of failure that makes people doubt physics.
The “optimization” was enabling bridge netfilter hooks because they wanted to filter traffic “even if it’s bridged.” They turned on net.bridge.bridge-nf-call-iptables=1 and applied a set of L3 rules that assumed routed traffic. Suddenly, bridged frames were subject to a forward chain policy of DROP. Certain flows survived due to state tracking; new ones didn’t. It looked random because it was stateful policy meeting a topology it wasn’t designed for.
The fastest path out was to decide: filter at L3 routed boundaries, or filter at L2/bridge intentionally, but not by accident. They disabled bridge netfilter on nodes where the design was pure bridging, kept Proxmox firewall for management interfaces, and moved east-west controls to a place that actually understood the traffic model.
After the incident, the best change they made wasn’t technical. It was procedural: any firewall change required a single “known-good” container to run a fixed set of tests (gateway ping, DNS resolve, TCP connect) before rollout expanded.
Story 3 — The boring but correct practice that saved the day
A finance-adjacent company (the kind that treats downtime like a career event) had a cluster standard that annoyed engineers: every node had identical bridge names, VLAN-aware settings, and MTUs. No exceptions. If a node needed special handling, it got its own cluster.
During a hardware refresh, they migrated containers node-to-node all day. One migration failed: a container lost external connectivity immediately after starting on the new node. The on-call engineer didn’t speculate. They ran the standard checklist: confirm veth exists, check bridge membership, check VLAN membership, tcpdump ARP on both sides.
Within minutes, they saw ARP requests leaving the veth but never appearing on the physical uplink. That narrowed it to bridge/uplink plumbing, not the container. Next, ip link show master vmbr0 revealed the veth was on vmbr0, but vmbr0 wasn’t actually attached to the NIC they thought. The interface name had changed due to firmware/BIOS updates and predictable network interface naming doing what it promised: being predictable, just not in the way they expected.
Because they had configuration management, they redeployed the node network config with the corrected NIC name and were done. No snowflake debugging, no “maybe reboot.” The practice that saved them was boring: enforce standardization, and treat node network config like production code.
FAQ
1) Is Proxmox LXC using tap devices or veth devices?
Typically veth for LXC. Tap is common for QEMU VMs. The troubleshooting overlap is bridge membership, VLAN, firewall, and MTU.
2) How do I find which veth belongs to which container?
Start with pct config <ctid> for the MAC and bridge. Then list host-side interfaces with ip -br link | grep veth. Names usually look like veth<ctid>i0.
3) My container can ping the host but not the gateway. What does that imply?
You likely have local bridge connectivity but broken upstream L2/L3 adjacency: VLAN mismatch, upstream switchport security, or incorrect tagging/PVID.
4) Does enabling Proxmox firewall automatically block containers?
It can, depending on default policies and rule sets at datacenter/node/CT levels. Also watch bridge netfilter sysctls; they can make bridged traffic hit L3 rules unexpectedly.
5) What’s the fastest way to prove it’s VLAN-related?
Check bridge vlan show dev vethXYZ and compare to the container’s tag= and the upstream switchport mode. Then capture ARP on vmbr and uplink.
6) Why do pings work but HTTPS fails?
Classic MTU/PMTUD failure or TCP MSS issues. ICMP echo is small; TLS handshakes and data transfer aren’t. Verify MTU and look for “frag needed” ICMP behavior.
7) When should I suspect rp_filter?
When you have multiple uplinks, policy routing, VRFs, or asymmetric return paths. If traffic disappears without a clear firewall drop, rp_filter is a prime suspect.
8) Can I “fix” this by switching from bridge to routed/NAT networking?
You can, but that’s not a fix; it’s a design change. If you route/NAT, you must own forwarding sysctls, forward-chain policy, and NAT rules. Do it intentionally.
9) My containers lose network after live migration or restore. Why?
Node config drift: different bridge names, VLAN-aware settings, firewall defaults, or MTU. Standardize node networking and validate before migrations.
Next steps (what to change after you fix it)
Once you’ve found the cause, don’t stop at “it works.” Production systems regress because the underlying ambiguity stays in place.
- Standardize node networking. Same vmbr names, same VLAN-aware setting, same MTU. Treat it as an API contract for workloads.
- Document VLAN intent in one place. For each bridge/uplink: trunk vs access, allowed VLANs, and whether containers should tag.
- Make firewalling deliberate. Decide whether you filter bridged traffic. If yes, write rules with that topology in mind. If no, disable bridge netfilter hooks.
- Keep a known-good test container. A tiny CT that can ping the gateway, resolve DNS, and hit a TCP endpoint. Run it after any network/firewall change; a minimal test script is sketched after this list.
- Capture-first culture. When in doubt, tcpdump both sides of the veth and the bridge. If you can’t see the packet, you’re arguing with a ghost.
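A minimal sketch of that test container's script, assuming bash is available inside the CT; the gateway, hostname, and TCP endpoint are placeholders for your own known-good targets.
#!/bin/bash
# net-smoke.sh -- run inside the known-good test container after any network or firewall change
set -e
GW=192.0.2.1           # gateway of the test segment
NAME=example.com       # a name your resolvers must answer
TCP_HOST=192.0.2.10    # e.g. the host bridge IP or an internal service
TCP_PORT=22

ping -c 2 -W 2 "$GW"                                        # L3 adjacency to the gateway
getent hosts "$NAME"                                        # resolution via the CT's configured resolvers
timeout 5 bash -c "exec 3<>/dev/tcp/$TCP_HOST/$TCP_PORT"    # plain TCP connect, no extra tooling
echo "all checks passed"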