Everything looks correct. The switch port is a trunk. The VM has a tagged interface. The VLAN exists. You can even see packets on the wire. And yet: no traffic. Not “slow.” Not “flaky.” Just dead, like your change window.
On Ubuntu 24.04, VLANs “not working” on a Linux bridge is often one missing bit: bridge VLAN filtering. If vlan_filtering isn’t enabled on the bridge, your carefully crafted trunk turns into a polite suggestion that the kernel ignores. The fix is simple; the diagnosis is where people burn hours.
The forgotten flag: vlan_filtering and why it matters
Linux bridges can do two very different things with VLANs:
- VLAN-unaware bridging: the bridge forwards frames based on MAC learning and doesn’t maintain per-port VLAN membership. Tags may pass through in some cases, but you are not configuring VLANs on the bridge itself.
- VLAN-aware bridging: the bridge acts like a small managed switch. Ports have VLAN membership, a PVID (Port VLAN ID), optional untagged behavior, and filtering. This mode is controlled by vlan_filtering.
On Ubuntu 24.04, especially with Netplan + systemd-networkd, people often build a bridge expecting “switch-like trunking.” They define VLANs, create a VM interface, tag it, and assume the bridge will forward tagged traffic. But the bridge won’t behave like a VLAN-aware switch unless you enable VLAN filtering. You can have perfectly valid VLAN definitions and still be filtering nothing—because the kernel isn’t asked to.
Here’s the core behavior to remember:
- If vlan_filtering=0, bridge vlan configuration is effectively ignored for forwarding decisions.
- If vlan_filtering=1, the bridge uses its VLAN table to decide which VLANs are permitted on which ports, and how untagged frames are classified (PVID).
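You can confirm which mode you're in straight from the kernel. A quick check, assuming the bridge is named br0 (adjust for your host):
cr0x@server:~$ ip -d link show br0 | grep -o 'vlan_filtering [01]'
cr0x@server:~$ # 0 = VLAN-unaware, 1 = VLAN-aware; to flip it at runtime:
cr0x@server:~$ sudo ip link set dev br0 type bridge vlan_filtering 1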
One small joke, because you’ve earned it: VLAN filtering is like a seatbelt—nobody notices it until the moment they’re launched through the windshield.
What “VLANs don’t work” actually looks like
Common patterns:
- VMs on a tagged VLAN can ARP but not get replies.
- VM-to-VM on the same host works, but VM-to-gateway doesn’t.
- Untagged traffic works on the bridge, but tagged traffic disappears.
- tcpdump shows VLAN tags on a VM interface, but not on the physical NIC (or vice versa).
How Linux bridges really handle VLANs (not how we wish they did)
A Linux bridge is a kernel datapath (not just a userspace concept), with knobs that determine how frames are classified and forwarded. VLAN-aware bridging is implemented in the bridge layer: the bridge maintains VLAN membership per port and per VLAN, plus flags like “untagged” and “PVID.”
The moving parts you must keep straight
- Bridge device (e.g., br0): where VLAN filtering is enabled and where the VLAN table lives.
- Bridge ports (e.g., eno1, vnet12): each port can be a trunk member of many VLANs, and can have a PVID.
- PVID: the VLAN assigned to untagged ingress frames on a port. This is your “native VLAN” concept (but don’t overuse that term around network engineers unless you like sighing).
- Untagged egress flag: whether frames for that VLAN leave the port without a tag (see the sketch after this list).
- VLAN subinterfaces (e.g., br0.20 or eno1.20): a separate way to do VLANs, often used when you want L3 on a VLAN. This can coexist with VLAN-aware bridging, but mixing patterns carelessly is how you create ghost problems.
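Those flags map directly onto bridge(8) commands. A small sketch, assuming eno1 is already a port of br0 and VLANs 20/30 are placeholders:
cr0x@server:~$ # VLAN 20 as a tagged (trunk) member of the port
cr0x@server:~$ sudo bridge vlan add dev eno1 vid 20
cr0x@server:~$ # VLAN 30 as the port's PVID, leaving untagged on egress (the “native VLAN” shape)
cr0x@server:~$ sudo bridge vlan add dev eno1 vid 30 pvid untagged
cr0x@server:~$ bridge vlan show dev eno1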
Why Ubuntu 24.04 trips people up
Ubuntu 24.04 defaults many servers to Netplan with systemd-networkd. That’s good. It’s deterministic. It’s also unforgiving: if you don’t explicitly ask for VLAN-aware behavior, you won’t get it. Older how-tos from the “ifupdown era” often assume different defaults, and many virtualization stacks “hide” complexity until you go off the happy path.
Also: Linux gives you at least three ways to build the same network topology. Pick one and stick to it:
- VLAN-aware Linux bridge (recommended for KVM bridging and multi-VLAN VM hosting).
- Separate VLAN interfaces + separate bridges per VLAN (boring and effective; scales poorly but debugs easily).
- OVS (Open vSwitch) (powerful; more moving parts; worth it if you need OVS features).
Fast diagnosis playbook (check 1, 2, 3)
If VLANs “don’t work,” you want to find the failure point quickly. Don’t jump into Netplan YAML immediately. First, prove what the kernel thinks is true.
1) Is the bridge VLAN-aware?
Check vlan_filtering and the VLAN table. If filtering is off, stop and fix that first.
2) Does the bridge allow the VLAN on the correct ports?
The VLAN must be present on both:
- the physical uplink port (the trunk to your switch), and
- the VM’s vnet port (or container veth), or a downstream bond/bridge.
3) Is untagged vs tagged behavior correct?
Most outages here are “PVID mismatch” or “native VLAN assumption.” Confirm:
- What VLAN does untagged ingress become?
- Are you unintentionally stripping tags on egress?
4) Then check the external switch
Only after you’ve proven Linux is configured right should you go argue with the network team (or with your past self). Validate trunk allowed VLANs and native VLAN expectation.
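If you want checks 1 through 3 in one pass, a small script helps. A minimal sketch, assuming a bridge named br0 (pass a different name as the first argument):
cr0x@server:~$ cat vlan-triage.sh
#!/usr/bin/env bash
# Triage steps 1-3 in one pass: filtering flag, then per-port VLAN state.
set -euo pipefail
BR="${1:-br0}"   # bridge name; br0 is an assumed default

echo "== 1) Is ${BR} VLAN-aware? =="
ip -d link show "${BR}" | grep -o 'vlan_filtering [01]'

echo "== 2 & 3) Per-port membership, PVID, and untagged flags =="
bridge vlan show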
Practical tasks: commands, expected output, and decisions
These are the tasks I actually run in production when someone says “VLANs don’t work.” Each task includes a command, sample output, what it means, and the decision you make.
Task 1: Identify the bridge and its ports
cr0x@server:~$ bridge link
2: eno1 state UP : <BROADCAST,MULTICAST,UP,LOWER_UP> master br0
5: vnet12 state UP : <BROADCAST,MULTICAST,UP,LOWER_UP> master br0
6: vnet13 state UP : <BROADCAST,MULTICAST,UP,LOWER_UP> master br0
What it means: Ports eno1, vnet12, vnet13 are enslaved to br0.
Decision: If the physical NIC you expect isn’t listed, your bridge isn’t connected to the outside world. Fix that before chasing VLANs.
Task 2: Check whether VLAN filtering is enabled (the whole point)
cr0x@server:~$ ip -d link show br0
7: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
link/ether 52:54:00:ab:cd:ef brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
bridge forward_delay 1500 hello_time 200 max_age 2000 stp_state 0 priority 32768 vlan_filtering 0 vlan_protocol 802.1Q
What it means: vlan_filtering 0 means the bridge is VLAN-unaware. Your bridge vlan table won’t be enforced.
Decision: If you’re trying to trunk VLANs through br0, you almost certainly want vlan_filtering 1. Turn it on (temporarily with ip, permanently via Netplan/systemd-networkd).
Task 3: Show the current VLAN membership table
cr0x@server:~$ bridge vlan show
port vlan-id
eno1 1 PVID Egress Untagged
vnet12 1 PVID Egress Untagged
vnet13 1 PVID Egress Untagged
What it means: Only VLAN 1 exists everywhere, and it’s the PVID, and it leaves untagged. No VLAN 20/30 is present.
Decision: Add VLANs to the relevant ports, or re-check that your config applied. If VLAN filtering is off, fix that first—this table may be meaningless for forwarding behavior.
Task 4: Confirm the VM interface sees VLAN tags
cr0x@server:~$ sudo tcpdump -eni vnet12 -c 5 vlan
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on vnet12, link-type EN10MB (Ethernet), snapshot length 262144 bytes
12:00:01.100000 52:54:00:11:22:33 > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), vlan 20, p 0, ethertype ARP (0x0806), Request who-has 10.20.0.1 tell 10.20.0.50, length 46
12:00:01.200000 52:54:00:11:22:33 > 33:33:00:00:00:02, ethertype 802.1Q (0x8100), vlan 20, p 0, ethertype IPv6 (0x86dd), length 86
What it means: The VM is sending tagged frames (VLAN 20) to the host.
Decision: If tags exist on the vnet, the VM is likely configured correctly. Now ensure the bridge forwards VLAN 20 and the uplink allows it.
Task 5: Confirm the physical uplink sees the same VLAN tags
cr0x@server:~$ sudo tcpdump -eni eno1 -c 5 vlan 20
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on eno1, link-type EN10MB (Ethernet), snapshot length 262144 bytes
12:00:01.300000 52:54:00:ab:cd:ef > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), vlan 20, p 0, ethertype ARP (0x0806), Request who-has 10.20.0.1 tell 10.20.0.50, length 46
What it means: Tags are making it out the NIC. That points away from Linux and toward the switch, upstream filtering, or return path issues.
Decision: If you see tags on vnet but not on eno1, Linux is dropping/stripping them (VLAN filtering table/PVID/untagged flags). Fix Linux bridge config first.
Task 6: Temporarily enable VLAN filtering to prove the hypothesis
cr0x@server:~$ sudo ip link set dev br0 type bridge vlan_filtering 1
cr0x@server:~$ ip -d link show br0 | sed -n '1,3p'
7: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
link/ether 52:54:00:ab:cd:ef brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
bridge forward_delay 1500 hello_time 200 max_age 2000 stp_state 0 priority 32768 vlan_filtering 1 vlan_protocol 802.1Q
What it means: The bridge is now VLAN-aware.
Decision: If VLANs suddenly start working, you’ve found the missing flag. Now make it persistent in Netplan/systemd-networkd. Don’t leave it as a snowflake runtime fix.
Task 7: Add VLAN 20 to uplink and VM port (runtime)
cr0x@server:~$ sudo bridge vlan add dev eno1 vid 20
cr0x@server:~$ sudo bridge vlan add dev vnet12 vid 20
cr0x@server:~$ bridge vlan show | egrep 'eno1|vnet12'
eno1 1 PVID Egress Untagged
eno1 20
vnet12 1 PVID Egress Untagged
vnet12 20
What it means: VLAN 20 is now permitted on both ports.
Decision: If traffic works now, translate these settings into your declarative config. Runtime commands are for diagnosis, not for sleeping at night.
Task 8: Verify PVID and untagged behavior on the uplink
cr0x@server:~$ bridge vlan show dev eno1
port vlan-id
eno1 1 PVID Egress Untagged
eno1 20
What it means: Untagged frames entering eno1 will be treated as VLAN 1, and VLAN 1 will egress untagged. VLAN 20 will egress tagged.
Decision: If your switch port expects a different native VLAN, adjust the PVID (and possibly remove VLAN 1). Misaligned PVIDs create “works for some hosts” mysteries.
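For example, if the switch trunk’s native VLAN is 30 rather than 1, realign the uplink (VLAN 30 here is an assumption; match your switch):
cr0x@server:~$ sudo bridge vlan add dev eno1 vid 30 pvid untagged
cr0x@server:~$ sudo bridge vlan del dev eno1 vid 1
cr0x@server:~$ bridge vlan show dev eno1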
Task 9: Check for unexpected VLAN subinterfaces that complicate bridging
cr0x@server:~$ ip -o link show | grep -E '\.[0-9]+@'
10: eno1.20@eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
11: br0.30@br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
What it means: You have VLAN subinterfaces on top of the NIC and/or bridge. This isn’t wrong, but it’s a sign you might be mixing design patterns.
Decision: Decide whether you’re doing VLAN-aware bridging (preferred for VM trunks) or VLAN subinterfaces + per-VLAN bridges. Mixing without a diagram is how debugging turns into folklore.
Task 10: Confirm which renderer and backend are in use
cr0x@server:~$ networkctl status -a | sed -n '1,14p'
● State: routable
Online state: online
Address state: routable
IPv4 address state: routable
IPv6 address state: degraded
Carrier bound to: config
DNS: 10.0.0.53
Network:
Link 2 (eno1)
Link 7 (br0)
What it means: systemd-networkd is managing links (common on Ubuntu servers). If you were expecting NetworkManager, you’re debugging the wrong stack.
Decision: Stick with one control plane. If Netplan renders to networkd, use Netplan (or native networkd files) consistently.
Task 11: Validate netplan rendering and catch YAML mistakes
cr0x@server:~$ sudo netplan generate
cr0x@server:~$ sudo netplan try
Do you want to keep these settings?
Press ENTER before the timeout to accept the new configuration
Changes will revert in 120 seconds
What it means: generate succeeded (no syntax errors). try applies temporarily and gives you an escape hatch.
Decision: Use netplan try in remote sessions. If VLAN mistakes kill your connectivity, you get your server back without a flashlight-and-console pilgrimage.
Task 12: Inspect the kernel bridge state and forwarding database
cr0x@server:~$ bridge fdb show br br0 | head
52:54:00:11:22:33 dev vnet12 master br0 permanent
0a:1b:2c:3d:4e:5f dev eno1 master br0
33:33:00:00:00:16 dev vnet12 master br0 permanent
What it means: The bridge is learning MACs and has some permanent entries. This proves L2 forwarding is alive at least at a basic level.
Decision: If FDB is empty while interfaces are up and traffic exists, you may be dealing with offload quirks, wrong attachment (VM not on that bridge), or traffic never reaching the bridge at all.
Task 13: Check for VLAN-related drops in counters
cr0x@server:~$ ip -s link show eno1 | sed -n '1,12p'
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
link/ether 3c:fd:fe:aa:bb:cc brd ff:ff:ff:ff:ff:ff
RX: bytes packets errors dropped missed mcast
987654321 1234567 0 120 0 45678
TX: bytes packets errors dropped carrier collsns
876543210 1122334 0 0 0 0
What it means: Non-zero dropped on RX could be queueing, driver, offload, or policy. It’s a clue, not a verdict.
Decision: If drops spike during your VLAN tests, consider offload settings or MTU mismatch (especially with VLAN + overlay). Measure before “optimizing.”
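To see whether VLAN tag handling is offloaded to the NIC (which also changes what tcpdump shows you), inspect and, if needed, toggle the features. For diagnosis only, assuming eno1:
cr0x@server:~$ ethtool -k eno1 | grep -i vlan
cr0x@server:~$ sudo ethtool -K eno1 rxvlan off txvlan off
Re-enable afterwards; toggling offloads is a measurement aid here, not a fix.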
Task 14: Verify MTU end-to-end (VLAN adds overhead, your overlay adds regret)
cr0x@server:~$ ip link show br0 | head -n 2
7: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
link/ether 52:54:00:ab:cd:ef brd ff:ff:ff:ff:ff:ff
What it means: Bridge MTU is 1500. If your physical network is 1500 and you also run VXLAN/Geneve on top, you’ll fragment or drop.
Decision: If you use overlays, raise MTU on underlay appropriately (and make it consistent), or accept lower overlay MTU and configure it intentionally.
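A quick way to prove an MTU assumption is a don’t-fragment ping sized to the limit: for a 1500-byte MTU, 1472 bytes of ICMP payload is the ceiling (1472 + 8 ICMP + 20 IPv4 = 1500). The target address is a placeholder:
cr0x@server:~$ ping -M do -s 1472 -c 3 10.20.0.1
If this fails while smaller sizes pass, something in the path has a lower MTU than you think.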
Task 15: Confirm switch-facing VLAN expectations by testing untagged vs tagged
cr0x@server:~$ sudo arping -I br0 -c 3 10.0.0.1
ARPING 10.0.0.1 from 10.0.0.10 br0
Unicast reply from 10.0.0.1 [00:11:22:33:44:55] 1.233ms
Unicast reply from 10.0.0.1 [00:11:22:33:44:55] 1.104ms
Unicast reply from 10.0.0.1 [00:11:22:33:44:55] 1.098ms
Sent 3 probes (1 broadcast(s))
Received 3 response(s)
What it means: Untagged/native VLAN connectivity works on br0.
Decision: If untagged works but VLAN 20 doesn’t, focus on VLAN membership/PVID/tagging across bridge ports and switch trunk allow-list.
Netplan on Ubuntu 24.04: the VLAN-aware bridge pattern that works
There are two sane patterns for multi-VLAN VM hosts. Pick based on your operational priorities.
Pattern A (recommended): VLAN-aware bridge trunk to VMs
You create one bridge (br0) with the physical uplink as a port. You enable VLAN filtering. You define which VLANs are allowed on which ports. VMs can be connected and either tagged inside the VM, or you can configure per-VM VLAN behavior in your hypervisor tooling (libvirt/virt-manager/etc.).
A representative Netplan example (conceptually; your exact keys may vary with your environment):
cr0x@server:~$ sudo cat /etc/netplan/01-br0.yaml
network:
  version: 2
  renderer: networkd
  ethernets:
    eno1:
      dhcp4: false
  bridges:
    br0:
      interfaces: [eno1]
      dhcp4: false
      addresses: [10.0.0.10/24]
      routes:
        - to: default
          via: 10.0.0.1
      nameservers:
        addresses: [10.0.0.53]
      parameters:
        stp: false
      # The critical part: VLAN-aware bridge behavior is not magic.
      # Netplan renders this into the appropriate backend config.
      # If your environment doesn't apply it, validate with `ip -d link show br0`.
Reality check: Netplan’s abstraction is helpful until it isn’t. The only truth is what the kernel reports (ip -d link, bridge vlan show).
If you cannot get Netplan to reliably set VLAN filtering in your environment, it’s acceptable to drop down to native systemd-networkd config. Production systems reward boring correctness.
Pattern B: per-VLAN subinterfaces and per-VLAN bridges
This looks like:
- eno1.20 attached to br20
- eno1.30 attached to br30
It’s verbose, but debug-friendly: if VLAN 20 is broken, you stare at eno1.20 and br20 and stop there. This is what you do when you want the simplest mental model and you don’t mind config sprawl.
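A Netplan sketch of Pattern B, assuming VLANs 20 and 30 (IDs, names, and the file path are placeholders; this is illustrative, not a drop-in file):
cr0x@server:~$ sudo cat /etc/netplan/02-vlan-bridges.yaml
network:
  version: 2
  renderer: networkd
  ethernets:
    eno1:
      dhcp4: false
  vlans:
    eno1.20:
      id: 20
      link: eno1
    eno1.30:
      id: 30
      link: eno1
  bridges:
    br20:
      interfaces: [eno1.20]
      dhcp4: false
    br30:
      interfaces: [eno1.30]
      dhcp4: false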
systemd-networkd knobs you actually need
Ubuntu 24.04 server networking frequently ends up managed by systemd-networkd. Under that, VLAN-aware bridging typically comes down to:
- Bridge created and ports attached
- VLANFiltering=yes (or equivalent) applied to the bridge
- VLAN membership configured per port (sometimes via bridge VLAN entries)
Instead of guessing, validate what’s applied:
cr0x@server:~$ networkctl status br0 | sed -n '1,40p'
● 7: br0
Link File: /usr/lib/systemd/network/99-default.link
Network File: /run/systemd/network/10-netplan-br0.network
Type: bridge
State: routable (configured)
Online state: online
What it means: Netplan generated runtime networkd units. Now you know where to look if you need to inspect the rendered config.
cr0x@server:~$ sudo sed -n '1,120p' /run/systemd/network/10-netplan-br0.netdev
[NetDev]
Name=br0
Kind=bridge
[Bridge]
STP=false
What it means: This rendered file might not include VLAN filtering, depending on how Netplan expressed it (or didn’t).
Decision: If the runtime config doesn’t set VLAN filtering, you can either fix your Netplan YAML or switch to native networkd files where you explicitly control it.
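If you take that route, the knobs live in .netdev and .network files under /etc/systemd/network/. A minimal sketch, assuming br0, eno1, and VLAN 20, and assuming the overlapping Netplan stanzas are removed so these files take effect (file names and VLAN IDs are placeholders):
cr0x@server:~$ cat /etc/systemd/network/10-br0.netdev
[NetDev]
Name=br0
Kind=bridge

[Bridge]
VLANFiltering=yes
STP=false

cr0x@server:~$ cat /etc/systemd/network/10-eno1.network
[Match]
Name=eno1

[Network]
Bridge=br0

[BridgeVLAN]
PVID=1
EgressUntagged=1

[BridgeVLAN]
VLAN=20
Apply with sudo networkctl reload, then re-verify with ip -d link show br0 and bridge vlan show; the kernel state, not the file, is the acceptance test.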
Second small joke (and final one): A bridge without VLAN filtering is like a change request without a rollback plan—technically allowed, emotionally irresponsible.
Common mistakes: symptom → root cause → fix
This is the section you’ll want when it’s 02:00 and you’re squinting at tcpdump.
1) Tagged VLAN traffic never leaves the host
Symptom: VLAN tags appear on vnetX, but not on eno1.
Root cause: Bridge VLAN filtering enabled but VLAN not allowed on uplink port; or VLAN filtering disabled so your VLAN config is not being enforced the way you think; or egress untagged set incorrectly.
Fix: Enable VLAN filtering on the bridge and add VLAN membership on both ports:
cr0x@server:~$ sudo ip link set dev br0 type bridge vlan_filtering 1
cr0x@server:~$ sudo bridge vlan add dev eno1 vid 20
cr0x@server:~$ sudo bridge vlan add dev vnet12 vid 20
2) Untagged traffic works, tagged VLANs fail
Symptom: Host management on native VLAN works. VM on VLAN 20 can’t reach gateway.
Root cause: Switch trunk allow-list missing VLAN; or Linux bridge not permitting VLAN 20 on uplink; or the VM expects tags but your hypervisor is stripping them.
Fix: Validate VLAN membership on Linux ports first with bridge vlan show, then validate switch trunk.
3) Some VMs can talk on VLAN 20, others can’t
Symptom: VM-A works on VLAN 20, VM-B doesn’t, same host.
Root cause: The broken VM’s vnet port lacks VLAN membership; or it has different tagging mode (tagged inside guest vs untagged from guest and tagged by hypervisor tooling).
Fix: Compare bridge vlan show dev vnetX for both, and standardize one approach.
4) ARP works but IP traffic doesn’t
Symptom: You see ARP replies but pings time out.
Root cause: MTU/fragmentation issues (especially with overlays), security policy upstream, or asymmetric routing. VLAN config can be fine.
Fix: Check MTU consistency, then check routing and firewall. VLANs aren’t the only thing that can ruin your evening.
5) Traffic works until you reboot
Symptom: Runtime bridge vlan add fixed it, but reboot breaks it.
Root cause: You didn’t make the VLAN filtering and VLAN table persistent; Netplan/networkd config missing critical lines.
Fix: Encode settings in Netplan or native networkd files. Confirm after reboot with ip -d link show br0 and bridge vlan show.
6) “But the bridge is up” (classic misdirection)
Symptom: All links show UP, but VLAN forwarding doesn’t happen.
Root cause: VLAN filtering is off or VLAN membership missing. Link state is not policy state.
Fix: Stop trusting “UP.” Start trusting the VLAN table and captures.
Three corporate mini-stories from the trenches
Incident caused by a wrong assumption: “A Linux bridge is a switch, right?”
A mid-sized company ran a virtualization cluster on Ubuntu. They migrated from an older host image where networking was configured with handcrafted scripts. The new build standardized on Ubuntu 24.04 and Netplan. The plan was sane: one bridge per host, trunk multiple VLANs, and keep VM networking flexible.
The first maintenance window went smoothly—until new VMs on a tagged “app VLAN” couldn’t reach their gateway. The team did the usual dance: they rechecked switch trunking, reloaded the ToR port, reattached the VM NICs, and tried different virtio models. Someone even toggled STP because it felt like something to do.
The wrong assumption was simple: they believed that defining VLAN membership somewhere in their toolchain would automatically make the Linux bridge behave like a VLAN-aware switch. It didn’t. The bridge was forwarding untagged frames fine, so everyone looked away from the host and toward the network. Meanwhile, the host was calmly ignoring their VLAN intentions because vlan_filtering was off.
The fix took minutes: enable VLAN filtering and add VLAN membership for the uplink and the VM ports. The real work was social: they had to update their build documentation and their acceptance tests so this couldn’t happen again.
Afterward, they wrote a tiny preflight check that ran on every host: dump ip -d link show br0, fail the build if vlan_filtering 0, and require a known VLAN list in bridge vlan show. Nobody loves guardrails until they prevent an outage.
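A minimal sketch of such a preflight (the bridge name, uplink, and expected VLAN list are assumptions baked into variables):
cr0x@server:~$ cat /usr/local/sbin/vlan-preflight.sh
#!/usr/bin/env bash
# Fail fast if the bridge is not VLAN-aware or the uplink misses expected VLANs.
set -euo pipefail
BR="br0"                # assumed bridge name
UPLINK="eno1"           # assumed physical uplink
EXPECTED_VIDS=(20 30)   # assumed per-host VLAN list

ip -d link show "$BR" | grep -q 'vlan_filtering 1' \
  || { echo "FAIL: vlan_filtering is off on $BR"; exit 1; }

for vid in "${EXPECTED_VIDS[@]}"; do
  bridge vlan show dev "$UPLINK" | grep -qw "$vid" \
    || { echo "FAIL: VLAN $vid missing on $UPLINK"; exit 1; }
done

echo "OK: $BR is VLAN-aware; $UPLINK carries: ${EXPECTED_VIDS[*]}"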
Optimization that backfired: “Let’s offload everything”
Another organization wanted better throughput on VM networking. They had multiple VLANs bridged to a 25G NIC, and they were seeing occasional drops on bursts. The fix, they thought, was to enable every offload feature available and to tune queues aggressively.
At first, it looked better: CPU dropped, throughput climbed in synthetic tests, and graphs improved. Then the trouble started: sporadic packet loss only on one VLAN, and only for certain traffic patterns. ARP seemed fine. TCP would stall and recover. UDP-based monitoring missed intervals. It was the worst kind of bug: intermittent and plausibly “upstream.”
The backfire came from a mix of driver behavior and how captures were interpreted. With some offloads enabled, packet visibility changed: tcpdump didn’t always tell the full truth about what was tagged where, and the team chased phantom VLAN issues for days. They also discovered that the bursts were caused by an application deployment pattern, not by the network itself, so they tuned the wrong thing first.
The eventual resolution was not “disable all offloads forever.” It was more boring: create a repeatable traffic test, adjust one setting at a time, validate with hardware counters, and keep VLAN diagnosis separate from performance tuning. When they reintroduced offloads selectively and verified end-to-end MTU and tagging, the stability returned.
Lesson: performance work changes observability. If you can’t measure it, you can’t trust it. Especially around VLAN tagging and bridging.
Boring but correct practice that saved the day: “Make the kernel state your acceptance test”
A finance-adjacent enterprise ran strict change control, and their networking was famously conservative. Their VM hosts were Ubuntu-based, and they moved slowly for good reasons. They also had a habit I wish more teams copied: they validated the live kernel state after applying network config, every time, with a checklist.
During a refresh, a new template introduced a subtle Netplan change. The YAML looked “equivalent” to the old one, but it no longer set the bridge to VLAN-aware behavior. The first host came online and immediately failed the post-config acceptance checks.
Because they had written down what “correct” looks like—vlan_filtering 1, expected VLAN IDs present on expected ports, and a quick tagged ping test—they caught it before any production workload moved. No incidents. No midnight calls. Just a failed deployment step and a ticket to fix the template.
They didn’t win points for creativity. They won by being dull and right.
Facts and context (why this is confusing in 2025)
Some short, concrete facts and history that explain why people keep stepping on this rake:
- 802.1Q VLAN tagging adds a 4-byte header to Ethernet frames, inserting a VLAN ID and priority bits.
- Linux bridging has been in-kernel for decades, and it’s closer to a switch datapath than most people realize—until VLAN policy enters the chat.
- VLAN-aware bridge support matured over time; early setups often used VLAN subinterfaces instead because it was easier to reason about.
- The “native VLAN” concept is primarily a switch-port behavior; in Linux bridges, the analogous mechanism is the PVID and untagged egress flags.
- Netplan is a renderer, not a network stack. It generates config for systemd-networkd or NetworkManager, which means “I changed YAML” is not the same as “the kernel changed behavior.”
- systemd-networkd is strict and deterministic, which is good for servers. It also means missing one boolean can quietly disable an entire feature set.
- tcpdump visibility can be distorted by offloads (TSO/GSO/GRO and VLAN offloads), so captures should be corroborated with bridge state and counters.
- Bridge VLAN tables are per-port; it’s not enough to “enable VLAN 20 on the bridge.” The uplink and the VM port both must allow it.
- Per-VLAN bridges are an old-school pattern that still wins in operational simplicity; it’s verbose, but it’s hard to misunderstand during an incident.
Checklists / step-by-step plan
Step-by-step: fix a broken trunked VLAN bridge on Ubuntu 24.04
- Confirm topology: identify the bridge name (br0) and physical uplink (eno1).
- Check VLAN filtering: ip -d link show br0; if it reports vlan_filtering 0, that’s your first fix.
- Turn it on (temporarily): ip link set dev br0 type bridge vlan_filtering 1.
- Inspect the VLAN table: bridge vlan show. Confirm the expected VLANs exist.
- Allow VLANs on uplink and VM ports: bridge vlan add dev eno1 vid 20 and bridge vlan add dev vnet12 vid 20.
- Verify PVID and untagged settings: ensure your “native” behavior matches the switch configuration.
- Prove it with packet capture: tags should appear on both vnet and physical NIC where appropriate.
- Make it persistent: encode in Netplan or native networkd; avoid runtime-only fixes.
- Reboot test: validate again after reboot; don’t trust a configuration until it survives one.
- Document the expected state: save the outputs of ip -d link show br0 and bridge vlan show as “known-good.”
Operational checklist: what “good” looks like
- ip -d link show br0 shows vlan_filtering 1.
- bridge vlan show lists expected VLAN IDs on eno1 and on each relevant VM port.
- PVID is intentional, not default-by-accident.
- Untagged egress is only set where you mean it.
- tcpdump confirms tags traverse the host correctly.
- A reboot does not change behavior.
FAQ
1) What is the “one bridge flag most people forget”?
vlan_filtering on the bridge device. Without it, VLAN-aware forwarding rules aren’t applied the way people expect when building trunks through a Linux bridge.
2) How do I check if VLAN filtering is enabled?
Run:
cr0x@server:~$ ip -d link show br0 | grep -o 'vlan_filtering [01]'
vlan_filtering 0
If it prints vlan_filtering 0, your bridge is VLAN-unaware.
3) If VLAN filtering is off, are VLAN tags always dropped?
Not always; that’s why this is confusing. Tags can sometimes appear to pass, depending on topology and expectations. But if you’re relying on per-port VLAN membership and trunk-like policy, you want VLAN-aware mode. Otherwise you’re debugging behavior that isn’t policy-driven.
4) Do I need to configure VLANs on both the uplink and the VM port?
Yes, for VLAN-aware bridging. The VLAN must be allowed on the ingress/egress ports that carry it. The physical trunk and the VM’s vnet port both need membership for that VLAN.
5) What’s the difference between PVID and “untagged”?
PVID classifies incoming untagged frames into a VLAN. “Egress untagged” decides whether frames for that VLAN leave the port without an 802.1Q tag. You can have a VLAN present without it being the PVID.
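In bridge vlan terms (a sketch; vnet12 and the VLAN IDs are placeholders):
cr0x@server:~$ sudo bridge vlan add dev vnet12 vid 20                 # member; egresses tagged
cr0x@server:~$ sudo bridge vlan add dev vnet12 vid 30 pvid untagged   # untagged ingress becomes 30; egresses untagged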
6) Should I trunk VLANs into the VM, or tag at the host?
Pick one per environment and standardize:
- Tag inside the VM if the VM is a router/firewall or needs multiple VLANs.
- Tag at the host/hypervisor if the VM should be “single VLAN, simple NIC.”
Mixing both approaches randomly is a great way to create “it depends” outages.
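For “tag inside the VM,” the guest creates an 802.1Q subinterface itself. A sketch from inside the guest, assuming its NIC is eth0 and reusing the VLAN 20 addressing from the earlier capture:
cr0x@vm:~$ sudo ip link add link eth0 name eth0.20 type vlan id 20
cr0x@vm:~$ sudo ip link set eth0.20 up
cr0x@vm:~$ sudo ip addr add 10.20.0.50/24 dev eth0.20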
7) Why does my tcpdump not show VLAN tags I’m sure exist?
Offloads can change what you see in captures. VLAN tag handling may be offloaded to hardware, and GRO/GSO can coalesce packets. Use bridge state (bridge vlan show) and corroborate with multiple capture points.
8) Is Open vSwitch a better answer?
Sometimes. If you need features like more advanced policy, tunneling integration, or richer observability at the virtual switch layer, OVS can be worth it. If your requirement is “trunk VLANs to VMs reliably,” the in-kernel Linux bridge with VLAN filtering is usually simpler and faster to debug.
9) Can I fix this with a reboot or restarting network services?
Reboots don’t fix missing configuration; they just roll the dice on what you forgot. Use reboots to verify persistence, not as a repair tool.
10) What’s a good acceptance test after changes?
At minimum:
- ip -d link show br0 and verify vlan_filtering 1
- bridge vlan show and verify VLAN presence on relevant ports
- Packet capture or a tagged connectivity test from a VM
- Reboot and repeat the checks
Conclusion: practical next steps
When VLANs “don’t work” on Ubuntu 24.04 bridges, the fastest path is to stop hypothesizing and interrogate the kernel. Check vlan_filtering. Check the bridge VLAN table. Confirm the VLAN exists on both the uplink and the VM port. Only then start debating the switch config.
Here’s your next change-window-friendly plan:
- Run ip -d link show br0 and fix vlan_filtering first.
- Run bridge vlan show and ensure VLAN membership is correct per port.
- Prove forwarding with a two-point tcpdump (vnet + physical NIC).
- Make it persistent in your chosen control plane (Netplan or native networkd) and validate after reboot.
One paraphrased idea from Werner Vogels (reliability-focused engineering): everything fails eventually; design so failures are expected and recoverable.