When Proxmox says “bridge port has no carrier”, it’s not being poetic. Your host’s bridge has a member interface that is physically (or logically) not seeing link. That means VMs can be perfect, firewall rules can be pristine, and your routing can be Pulitzer-worthy—none of it matters if the underlying port is effectively unplugged.
The trick is speed. Don’t get hypnotized by Proxmox UI warnings or start randomly restarting networking. You want to identify whether you’re dealing with layer‑1 (cable/SFP/switch port), layer‑2 (VLAN/STP/LACP), or host driver/firmware. Then you fix the one thing that’s actually broken, and you stop.
What “bridge port has no carrier” actually means
In Proxmox, your VM networking usually sits on a Linux bridge like vmbr0. That bridge has one or more ports (usually a physical NIC like enp3s0 or a bond like bond0). The error appears when the port reports no carrier—Linux’s way of saying “link is down.”
Carrier is not “I can ping the gateway.” Carrier is the physical or PHY-level link signal: the interface can’t negotiate link with whatever is on the other end. If it’s a copper port, that’s usually cable or switch. If it’s fiber/DAC, that’s SFP module compatibility, fiber polarity, or the switch port configuration. If it’s a bond, an LACP mismatch can take the whole aggregate down. If the hardware checks out, suspect the driver: firmware issues, power management, or the NIC stuck in a weird state.
A bridge warning is often the messenger that gets shot. The bridge is fine. The port is the problem. Your job: find out why the port lost carrier, not why Proxmox complained about it.
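If you want the kernel’s raw answer before you even touch ethtool, here’s a minimal sketch, assuming the uplink is enp3s0 (adjust the name for your host):

# 1 = carrier present, 0 = no carrier (the read can fail with "Invalid argument" while the interface is admin-down)
cat /sys/class/net/enp3s0/carrier
# "up", "down", or "lowerlayerdown" for ports enslaved to a bridge/bond
cat /sys/class/net/enp3s0/operstate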
Exactly one quote, because it fits this job: “Hope is not a strategy.”
— Vince Lombardi
Fast diagnosis playbook (first/second/third)
First: Prove whether it’s truly link (L1) or something higher
- Check link state from the host (ip link, ethtool). If it says NO-CARRIER or “Link detected: no”, treat it as L1 until proven otherwise.
- Check the switch port lights and status. A dead LED is a gift: it means you can stop arguing with VLANs.
- Swap the known-good thing: cable/DAC/SFP or move to a known-good switch port. Do it early. It’s faster than being clever.
Second: If carrier is up, chase L2/LACP/VLAN issues
- Verify bridge membership (bridge link). Make sure the right interface is enslaved to the bridge.
- Verify VLAN mode matches the switch (native VLAN vs tagged, trunks). A trunk mismatch doesn’t usually show “no carrier,” but it does show “everything is dead,” and people mislabel it.
- If bonding, confirm LACP state (cat /proc/net/bonding/bond0) and switch-side aggregation.
Third: If link flaps or negotiates wrong, suspect driver/firmware/EEE
- Look for link flap events in journalctl and dmesg.
- Identify the driver and firmware (ethtool -i, lspci -nnk).
- Disable known troublemakers for diagnosis (EEE on copper, offloads in rare cases, ASPM), as shown in the sketch below. Test. Revert if it doesn’t help.
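A minimal sketch for the EEE part, assuming the NIC is enp3s0 and the driver exposes EEE via ethtool (not all do); treat it as a diagnostic toggle, not a permanent fix:

# Show whether EEE is supported/enabled/active on this link (may report "Operation not supported")
ethtool --show-eee enp3s0
# Disable EEE for the duration of the test; this does not survive a reboot
ethtool --set-eee enp3s0 eee off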
Joke #1: The fastest way to troubleshoot a link is to assume it’s the cable—because it’s almost always the cable, until it’s the one time it’s the SFP you “borrowed” from a drawer.
Interesting facts and historical context (you’ll diagnose faster)
- “Carrier detect” is older than your hypervisor: the term comes from early network and modem signaling—Linux kept the language because it maps cleanly to PHY link state.
- Linux bridges are kernel-native, not a “Proxmox thing”: Proxmox configures them, but the logic and telemetry are standard Linux tooling (iproute2, bridge, ethtool).
- Auto-negotiation has been a recurring source of pain since Fast Ethernet: mismatched negotiation can cause flaps, wrong speed/duplex, or “link up but useless” behavior.
- Energy Efficient Ethernet (EEE) was designed to save power: on some NIC+switch combinations it also ends up saving you some uptime (by accident, and not in a good way).
- SFP modules have “codes” and compatibility checks: many vendors enforce or “prefer” their optics; some NICs and switches are picky, and the failure mode is often a clean “no link.”
- DAC cables aren’t just copper cables: they embed identification and have length/quality constraints; a cheap DAC can negotiate at 1G but fail at 10G.
- LACP (802.3ad) is not “set and forget”: one side in static LAG and the other in LACP can produce partial connectivity, blackholing, or intermittent carrier depending on implementation.
- Predictable interface names (enpXsY): systemd/udev naming reduced “eth0 roulette,” but it also made humans misread NIC identities and plug the wrong port.
- STP and port security exist to prevent loops: the switch may shut or block a port after topology events; the host sees link, but traffic dies—different from no-carrier, but often confused during incidents.
Hands-on tasks: commands, outputs, and decisions
These are the tasks I run in production when Proxmox reports bridge port has no carrier. Each one includes: the command, an example of real output, what it means, and what decision you make.
Task 1: Confirm which interface is actually “no carrier”
cr0x@server:~$ ip -br link
lo UNKNOWN 00:00:00:00:00:00 <LOOPBACK,UP,LOWER_UP>
enp3s0 DOWN 3c:fd:fe:aa:bb:cc <BROADCAST,MULTICAST>
vmbr0 DOWN 3c:fd:fe:aa:bb:cc <BROADCAST,MULTICAST>
Meaning: enp3s0 is down and vmbr0 inherits that. A bridge without an active uplink is basically a parking lot with no road out.
Decision: Stop looking at VM config. You’re in host NIC / switch / cable territory.
Task 2: Check carrier and negotiated parameters with ethtool
cr0x@server:~$ ethtool enp3s0
Settings for enp3s0:
Supported ports: [ TP ]
Supported link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
Supported pause frame use: Symmetric Receive-only
Supports auto-negotiation: Yes
Advertised link modes: 1000baseT/Full
Advertised auto-negotiation: Yes
Speed: Unknown!
Duplex: Unknown! (255)
Auto-negotiation: on
Link detected: no
Meaning: The NIC can’t see link pulses. This is not a VLAN problem.
Decision: Check physical layer: cable, patch panel, switch port, transceiver.
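If the physical path checks out and you suspect a stuck negotiation, one low-risk poke is to restart auto-negotiation rather than bouncing the interface. A sketch, assuming enp3s0 and an auto-negotiating copper port:

# Restart auto-negotiation; carrier should return within a few seconds if the path is good
ethtool -r enp3s0
# Re-check the result
ethtool enp3s0 | grep -E 'Speed|Duplex|Link detected'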
Task 3: Confirm the bridge is configured how you think it is
cr0x@server:~$ cat /etc/network/interfaces
auto lo
iface lo inet loopback
auto enp3s0
iface enp3s0 inet manual
auto vmbr0
iface vmbr0 inet static
address 10.20.30.11/24
gateway 10.20.30.1
bridge-ports enp3s0
bridge-stp off
bridge-fd 0
Meaning: vmbr0 is backed by enp3s0. If enp3s0 is dead, everything on vmbr0 is dead.
Decision: Verify the physical cable attached to the NIC port corresponding to enp3s0. Don’t assume the label is correct.
Task 4: Identify the NIC and the loaded driver
cr0x@server:~$ lspci -nnk | sed -n '/Ethernet controller/,+4p'
03:00.0 Ethernet controller [0200]: Intel Corporation I210 Gigabit Network Connection [8086:1533] (rev 03)
Subsystem: Intel Corporation Ethernet Server Adapter I210-T1 [8086:0001]
Kernel driver in use: igb
Kernel modules: igb
Meaning: You now know what silicon you’re dealing with. This matters for known driver quirks and firmware tooling.
Decision: If you suspect driver issues, you can search internally by device ID/driver, and you can validate firmware and module parameters.
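If you do end up chasing driver quirks, it helps to know which module parameters exist before you start toggling anything. A sketch, assuming the igb driver from the output above:

# Module version/description and the parameters it accepts
modinfo igb | head -n 12
modinfo -p igb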
Task 5: Check driver and firmware versions from ethtool
cr0x@server:~$ ethtool -i enp3s0
driver: igb
version: 6.1.0-27-pve
firmware-version: 3.25, 0x80000d4b
expansion-rom-version:
bus-info: 0000:03:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes
Meaning: Confirms kernel/driver and NIC firmware. If you’re seeing link flaps after a kernel update, this is where you start correlating.
Decision: If firmware is ancient or mismatched across hosts, schedule a maintenance update; for now keep diagnosing L1.
Task 6: Look for link flap messages in the journal
cr0x@server:~$ journalctl -k -b | grep -E 'enp3s0|igb|link up|link down' | tail -n 20
igb 0000:03:00.0 enp3s0: NIC Link is Down
igb 0000:03:00.0 enp3s0: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
igb 0000:03:00.0 enp3s0: NIC Link is Down
igb 0000:03:00.0 enp3s0: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
Meaning: Link is flapping. That’s usually bad cable, bad port, marginal optics/DAC, or EEE weirdness—not a bridge configuration issue.
Decision: Swap cable/DAC first. If it persists, change switch port. If it persists, disable EEE and check NIC firmware.
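To turn “it flaps sometimes” into a number you can compare before and after a cable swap, a sketch assuming the igb-style log wording shown above (other drivers phrase it differently):

# Count link-down events for this boot; run again after the swap and compare
journalctl -k -b | grep -c 'enp3s0: NIC Link is Down'
# Same count over the last 24 hours, across reboots
journalctl -k --since "24 hours ago" | grep -c 'enp3s0: NIC Link is Down'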
Task 7: Verify bridge port state and whether it’s actually enslaved
cr0x@server:~$ bridge link
2: enp3s0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 master vmbr0 state disabled priority 32 cost 100
Meaning: The port is part of vmbr0, but state disabled because there’s no carrier.
Decision: Stop tweaking bridge STP timers. Fix the port’s carrier.
Task 8: Check switch-side symptoms indirectly with LLDP (if available)
cr0x@server:~$ lldpctl 2>/dev/null | sed -n '1,30p'
-------------------------------------------------------------------------------
LLDP neighbors:
-------------------------------------------------------------------------------
Meaning: No LLDP neighbors. That’s consistent with no carrier, or LLDP not running, or the switch not speaking LLDP.
Decision: If you expect LLDP and get nothing, it nudges you toward L1. If you do get neighbors, you can identify the exact switch/port quickly.
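If lldpctl isn’t even installed, the host side is cheap to add. A sketch, assuming Debian/Proxmox packaging and that your security policy allows LLDP:

# Install and start the LLDP daemon; neighbors usually show up within a minute
apt install lldpd
systemctl enable --now lldpd
lldpctl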
Task 9: Check if NetworkManager/systemd is fighting ifupdown
cr0x@server:~$ systemctl is-active NetworkManager
inactive
Meaning: Good. Proxmox typically uses ifupdown2; you don’t want NetworkManager “helping.”
Decision: If it’s active, consider disabling it in a maintenance window, because dueling network stacks create ghost problems.
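If it does come back active and you’ve confirmed nothing depends on it, a sketch of the cleanup (maintenance window, out-of-band access in hand):

# Stop NetworkManager now and keep it from starting at boot
systemctl disable --now NetworkManager
# Confirm ifupdown2 still owns the config you expect
ifquery -a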
Task 10: Bring the interface down/up cleanly (controlled poke)
cr0x@server:~$ ifdown enp3s0 && ifup enp3s0
ifdown: interface enp3s0 not configured
Meaning: On many Proxmox hosts, the physical interface is “manual” and controlled via the bridge; ifdown may not apply. Don’t panic.
Decision: Bounce the bridge instead (carefully) or use ip link to toggle, but only if you have out-of-band access.
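On ifupdown2 (the Proxmox default), the cleaner equivalent of “bounce the stack” is a reload, which re-applies /etc/network/interfaces without tearing everything down first. A sketch; still assume you have out-of-band access:

# Re-apply the current config in place; less disruptive than a full networking restart
ifreload -a
# Then re-check the port
ip -br link show enp3s0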
Task 11: Toggle link state and check if carrier returns
cr0x@server:~$ ip link set dev enp3s0 down
cr0x@server:~$ ip link set dev enp3s0 up
cr0x@server:~$ ip -br link show enp3s0
enp3s0 DOWN 3c:fd:fe:aa:bb:cc <BROADCAST,MULTICAST>
Meaning: If it remains down with no carrier, this is not a transient software hiccup.
Decision: Go physical. Swap cable/DAC. Check switch config and port status.
Task 12: If it’s fiber/DAC, read the module EEPROM (when supported)
cr0x@server:~$ ethtool -m enp3s0 | sed -n '1,25p'
Identifier : 0x03 (SFP)
Extended identifier : 0x04 (GBIC/SFP defined by 2-wire interface ID)
Connector : 0x21 (Copper pigtail)
Transceiver codes : 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
Vendor name : GENERIC
Vendor PN : DAC-10G-1M
Vendor rev : A
Vendor SN : ABC123456789
Meaning: You can see what module/cable you’re using. “GENERIC” is fine until a switch vendor decides it isn’t.
Decision: If link is down and the module looks suspect (wrong type, weird vendor, wrong speed), swap to a known-good supported optic/DAC.
Task 13: Check bonding state if the bridge port is a bond
cr0x@server:~$ cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
MII Status: down
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0
802.3ad info
LACP rate: fast
Aggregator selection policy (ad_select): stable
Slave Interface: enp3s0
MII Status: down
Slave queue ID: 0
Slave Interface: enp4s0
MII Status: down
Slave queue ID: 0
Meaning: The bond itself is down because all slaves are down. That’s often upstream (switch stack issue, both ports disabled, wrong optics) or a shared physical mistake.
Decision: Verify both switch ports are enabled and in the same LACP group. Also verify cabling: two dead links often means one dead patch panel path used twice.
Task 14: Verify VLAN subinterfaces and bridge VLAN awareness
cr0x@server:~$ ip -d link show vmbr0 | sed -n '1,12p'
7: vmbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
link/ether 3c:fd:fe:aa:bb:cc brd ff:ff:ff:ff:ff:ff
bridge forward_delay 0 hello_time 2 max_age 20 ageing_time 300 stp_state 0 priority 32768 vlan_filtering 0
Meaning: Bridge is up and has LOWER_UP here, so carrier exists. (This task is for the “it’s not carrier, it’s VLAN” cases.)
Decision: If VMs still can’t talk, now you pivot to VLAN tagging, trunking, and firewall rules—because carrier is fine.
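If the bridge is VLAN-aware (vlan_filtering 1), it’s also worth seeing which VLANs are actually allowed on the uplink. A sketch, assuming enp3s0 is the bridge port:

# VLAN membership for the uplink; it must carry the VLANs your VMs tag
bridge vlan show dev enp3s0
# Or the whole bridge at once
bridge vlan show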
Task 15: Validate reachability to the switch/router once carrier is up
cr0x@server:~$ ping -c 3 -I vmbr0 10.20.30.1
PING 10.20.30.1 (10.20.30.1) from 10.20.30.11 vmbr0: 56(84) bytes of data.
64 bytes from 10.20.30.1: icmp_seq=1 ttl=64 time=0.322 ms
64 bytes from 10.20.30.1: icmp_seq=2 ttl=64 time=0.289 ms
64 bytes from 10.20.30.1: icmp_seq=3 ttl=64 time=0.301 ms
--- 10.20.30.1 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2036ms
Meaning: Basic L3 connectivity is working on the host. If VMs still fail, focus on bridge ports, VLAN tags, and VM firewall.
Decision: Decide whether this is a host-only issue (VMs broken) or a host+uplink issue (host broken too). This ping draws that line.
Joke #2: If you’re troubleshooting “no carrier” and you haven’t physically touched a cable yet, you’re basically trying to SSH into an unplugged toaster.
Cable, switch, driver: how each failure mode looks
1) Cable/DAC/fiber problems (the unglamorous majority)
Cable failures are boring, frequent, and wildly under-documented. Copper patch cords fail from stress, bent tabs, or mediocre crimps. DAC cables fail from tight bends and cheap shielding. Fiber fails from contamination—yes, dust you can’t see. In all of these, Linux reports NO-CARRIER and Link detected: no.
Classic patterns:
- Always down: wrong cable type, dead switch port, dead NIC port, wrong optic, or port administratively disabled.
- Flapping under load: marginal cable, EEE weirdness, bad termination, or physical movement (rack door, cable bundle strain).
- Negotiates at wrong speed: auto-neg mismatch, bad pairs, or forced speed on switch.
What to do: swap the physical element first. Use a known-good cable/DAC/optic from a bag labeled “works.” If your bag isn’t labeled, congratulations: you have a bag of mysteries.
2) Switch port problems (config, security, and silent “help”)
Switches can kill connectivity in ways that look like physical failure. Some are honest (admin shutdown). Some are “protective” (port security, err-disable, BPDU guard). And some are just wrong configuration (speed/duplex hard-coded).
Distinguish:
- No carrier usually means the switch port is not transmitting link pulses (disabled, failed, wrong transceiver, incompatible optic).
- Carrier up but traffic dead often means VLAN mismatch, STP blocking, port security, or wrong LACP/static settings.
Don’t guess. Get the switch port status from the network team or your own access. If you can’t, use LLDP and link history on the host to infer what’s happening.
3) Driver/firmware/BIOS/power quirks (the “it was fine yesterday” cases)
When a Proxmox host is stable for months and then starts losing carrier after a kernel update, you’re often looking at:
- Driver regression for a specific NIC model.
- Firmware mismatch (especially with server NICs that depend on NVM/firmware behaviors).
- Power management features: ASPM, deep C-states, EEE.
- PCIe issues: bad riser, marginal slot, BIOS settings.
The key is correlation: does the log show link flaps at the same time every day (power policies)? Did it start after updating pve-kernel? Did you move cables and accidentally “fix it” temporarily? Your evidence should be chronological, not emotional.
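To build that chronological evidence instead of arguing from memory, a sketch that correlates flaps with boots and the running kernel (interface name assumed to be enp3s0):

# Which boots happened when -- useful for "did it start after the kernel update?"
journalctl --list-boots
# Link events from the previous boot, with timestamps
journalctl -k -b -1 | grep -Ei 'enp3s0|link is (up|down)'
# Current kernel, to match against the change log in your tickets
uname -r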
Three corporate mini-stories (how this bites in real life)
Mini-story #1: The outage caused by a wrong assumption
A mid-sized company ran a Proxmox cluster for internal services: CI runners, artifact storage, a few databases, and the usual pile of “temporary” VMs that became permanent. After a routine rack cleanup, one node started screaming: bridge port has no carrier. The on-call engineer assumed it was a VLAN tagging issue because they’d recently introduced VLAN-aware bridges.
They spent an hour staring at /etc/network/interfaces, toggling VLAN filtering, and reloading the network stack. Link stayed down. The assumption “new VLAN feature equals new VLAN problem” was comforting and wrong.
Eventually someone walked to the rack. The uplink cable had been moved from the node’s NIC to the IPMI port by a well-meaning contractor who saw two identical RJ45 jacks and chose chaos. The host had no carrier because it was literally not connected to the network it was supposed to be connected to.
The fix took 30 seconds: move the cable back. The postmortem fix took longer but mattered: they labeled ports, they documented which physical NIC maps to which enp* name, and they added LLDP on switches and hosts to make the “what port am I on?” question answerable without hiking to the rack.
Mini-story #2: The optimization that backfired
Another shop wanted to reduce power and heat in a dense cluster. Someone enabled EEE aggressively on access switches and also tweaked BIOS power states. They didn’t touch Proxmox networking at all, which is why the resulting incident was so confusing: random nodes would lose carrier for a few seconds, then recover.
The pattern was nasty: it happened more often during low traffic periods, which made people blame “monitoring” or “backup windows” rather than the physical layer. Logs showed link down/up cycles. Switch logs were not obviously angry. VMs occasionally got fenced because corosync heartbeats didn’t appreciate surprise naps.
The real culprit was a combination: certain NICs in that generation didn’t play nicely with EEE transitions on that switch model. The “optimization” made the PHY negotiate power states in a way that looked like intermittent unplugging.
They fixed it by disabling EEE on the specific ports used by Proxmox nodes (not everywhere), and by rolling back the most aggressive power state settings. Power consumption went up slightly. Uptime went up dramatically. Everyone pretended this was the plan all along.
Mini-story #3: The boring but correct practice that saved the day
A financial services team had a habit that looked tedious: every time they patched or moved a cable, they performed a quick “link validation” script on the host and captured before/after output in the ticket. It included ip -br link, ethtool, and journalctl greps for link events.
During a maintenance, a top-of-rack switch was replaced. One Proxmox node came up with “no carrier” on one bond member. The network team insisted the port was configured correctly. The server team insisted the NIC was fine. The ticket had the before state showing both bond links stable at 10G for months, plus the exact transceiver EEPROM info.
That evidence narrowed the problem quickly: the new switch had a different optic policy and didn’t like the “GENERIC” DAC on that one port. They swapped the DAC to a supported one. Link came up instantly. No prolonged blame storm, no phantom VLAN changes.
The practice wasn’t glamorous. It didn’t require a new tool. It just forced the team to capture the “known-good” baseline. When the incident happened, they weren’t debugging in the dark—they were comparing reality to a saved snapshot of reality.
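A minimal sketch of that kind of baseline capture, assuming the uplink name is passed as an argument; the script name and output path are illustrative, and the exact commands matter less than running the same ones every time and pasting the result into the ticket:

#!/bin/bash
# capture-link-baseline.sh <iface> -- snapshot link state before/after physical work (illustrative sketch)
set -eu
IFACE="${1:?usage: capture-link-baseline.sh <iface>}"
{
  date -Is
  ip -br link show "$IFACE"
  ethtool "$IFACE"
  ethtool -i "$IFACE"
  ethtool -m "$IFACE" 2>/dev/null || echo "no module EEPROM data"
  bridge link
  journalctl -k -b | grep -E "$IFACE" | tail -n 50
} > "/root/link-baseline-$IFACE-$(date +%Y%m%d-%H%M%S).txt"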
Common mistakes: symptom → root cause → fix
These are the repeat offenders I see in Proxmox environments—especially after rack work, switch changes, kernel updates, or “minor” refactors.
1) Symptom: “bridge port has no carrier” after moving a host
- Root cause: Cable in wrong NIC port (or into IPMI), wrong patch panel mapping, or wrong switch port.
- Fix: Use ethtool -P (permanent MAC) and LLDP if available, and physically verify the cable path (a quick sketch follows below). Label both ends. If you can, standardize: leftmost NIC is always uplink A, etc.
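A sketch of tying the logical name back to the silicon, useful when labels and reality disagree (assuming enp3s0):

# Permanent (burned-in) MAC of the port -- compare with the label/CMDB, not with what the bridge shows
ethtool -P enp3s0
# PCI address and driver, to confirm which physical slot/port this name maps to
ethtool -i enp3s0 | grep bus-info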
2) Symptom: Link flaps every few minutes
- Root cause: Marginal copper patch cable, bad keystone jack, DAC bend radius violation, or EEE interaction.
- Fix: Replace the physical segment first. If persistent, disable EEE on NIC and/or switch port for diagnosis. Check logs for flap frequency and correlation.
3) Symptom: Bond shows down even though one cable is connected
- Root cause: LACP mismatch (switch expects LACP but host configured active-backup, or vice versa), or both slaves miswired to the same switch port via patching error.
- Fix: Check /proc/net/bonding/bond0 on the host and the switch port-channel configuration. Confirm each slave goes to the intended physical switch port.
4) Symptom: Link is up, but Proxmox still shows warning intermittently
- Root cause: The bridge includes an unused interface that is down; Proxmox reports it. Or a second port in a bond is unplugged.
- Fix: Remove unused ports from the bridge/bond, or plug them in properly. Don’t keep “future expansion” members in production config unless you enjoy false alarms.
5) Symptom: After kernel update, NIC stops getting carrier on boot
- Root cause: Driver regression or firmware interaction; occasionally PCIe ASPM/power management behavior changes.
- Fix: Confirm driver/firmware versions, check dmesg for errors, and test booting an older kernel if available. Plan a firmware update and consider pinning a known-good kernel until resolved.
6) Symptom: SFP port shows no carrier only with certain optics
- Root cause: Optic/DAC compatibility, wrong type (SR vs LR), wrong speed (1G module in 10G port), or vendor lock behavior.
- Fix: Read module info with ethtool -m, swap to a known-good supported optic, and verify the switch port configuration supports the intended speed.
7) Symptom: “No carrier” only after reboot, fixed by reseating cable
- Root cause: Physical connector wear, latch not seating, or a transceiver that doesn’t initialize reliably.
- Fix: Replace the patch cable or transceiver. “Reseat to fix” is a symptom, not a solution.
Checklists / step-by-step plan
Checklist A: Get from alert to root cause in 10 minutes
- Confirm the exact interface: ip -br link. Identify which port is DOWN / NO-CARRIER.
- Confirm carrier with ethtool: ethtool enpXsY. If “Link detected: no”, treat as L1.
- Check bridge membership: bridge link. Ensure it’s the expected port/bond.
- Check logs for flap vs hard down: journalctl -k -b, grep for link up/down.
- Swap the simplest physical item: cable/DAC/optic.
- Move to another switch port: eliminates a single bad port or err-disable state quickly.
- If bonded: check /proc/net/bonding/bond0 and match the switch LACP config.
- If still dead: identify driver/firmware (ethtool -i, lspci -nnk) and consider a firmware update or kernel rollback.
Checklist B: Before you touch production networking
- Have out-of-band access (IPMI/iKVM). If you don’t, your “quick bounce” is a career-limiting move.
- Capture state: ip a, ip r, bridge link, ethtool, and the last 200 kernel log lines.
- Know what “good” looks like: speed, duplex, expected LLDP neighbor, and which switch port should light up.
- Make one change at a time. The network is not a slot machine.
Checklist C: Hardening so this doesn’t wake you up again
- Label NIC ports on the chassis and in your CMDB/tickets. Map enp* names to physical ports.
- Enable LLDP in your environment if possible (host + switch). It’s cheap truth.
- Standardize optics/DACs. Mixed bins of transceivers are how you breed intermittent failures.
- Keep firmware reasonably current on NICs, especially Intel/Broadcom server adapters.
- Alert on link flaps, not just link down. Flaps are the smoke alarm; link-down is the fire.
- Document switch port profiles for hypervisor uplinks (LACP, VLAN trunking, MTU). Repeatability beats heroics.
FAQ
1) Does “no carrier” always mean a bad cable?
No, but treat it that way first. “No carrier” means the PHY isn’t seeing link. Cable/DAC/optic and switch port state are the common causes. Drivers and firmware are less common but real.
2) Can VLAN misconfiguration cause “bridge port has no carrier”?
Not typically. VLAN mistakes usually keep the link up but break traffic. If ethtool says Link detected: no, VLANs are not your culprit.
3) Proxmox shows the warning, but the host still has connectivity. Why?
Often because the bridge has multiple ports and one is down (e.g., a bond member unplugged), or there’s an unused interface still attached. Proxmox reports the down port even if another path works.
4) What’s the quickest proof it’s a switch-side issue?
Move the cable to a known-good switch port (with the same config profile). If link comes up immediately, your original switch port is misconfigured, disabled, or faulty.
5) How do I tell if my SFP/DAC is incompatible?
Read it with ethtool -m (if supported) and compare with what works elsewhere. If the same host+port works with a different module but not this one, the module is guilty. If the module works on another switch but not this one, you’ve met vendor compatibility policy.
6) Is it safe to bounce the bridge interface on a Proxmox node?
Only if you have out-of-band access and you understand what traffic you’ll drop. Bouncing vmbr0 drops connectivity for the host and VMs using it. Use it as a controlled diagnostic step, not a reflex.
7) My bond is in LACP mode; can “no carrier” still happen?
Yes. LACP is L2, but carrier is L1. If both physical links are down (or optics aren’t recognized), the bond goes down. If one link is down, the bond may stay up but with reduced capacity and possibly different traffic behavior depending on hashing.
8) After replacing a switch, only some Proxmox hosts lose carrier. Why not all?
Different NICs and optics behave differently. The new switch might be stricter about transceiver coding, might default to different speed/auto-neg settings, or might have different EEE defaults. Heterogeneous hardware turns “simple swap” into “surprise lab.”
9) Can a NIC be “up” in ip link but still have no carrier?
Yes. UP means the interface is administratively enabled. Carrier is the lower-layer signal indicated by LOWER_UP. You can have UP without LOWER_UP.
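A quick way to see the distinction on a live port (a sketch, assuming enp3s0):

# UP in the <...> flags means admin-up; LOWER_UP means carrier. "NO-CARRIER,...,UP" is admin-up with no link.
ip link show enp3s0 | head -n 1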
10) What if the switch shows link up but Linux says no carrier?
That’s uncommon but can happen with faulty transceivers, weird negotiation, or a port mirroring/monitoring setup causing misleading LEDs. Trust ethtool and logs, then validate by swapping optics/ports to break the tie.
Conclusion: next steps that prevent repeats
“Bridge port has no carrier” is a blunt message with a helpful subtext: stop debugging overlays and start debugging the underlay. If the NIC can’t see link, your bridge is innocent and your VMs are collateral damage.
Do this next, in order:
- Run ip -br link and ethtool to confirm true no-carrier.
- Swap the physical element (cable/DAC/optic). Then try another switch port.
- If it flaps, mine journalctl -k for the timeline and consider EEE/power/driver interactions.
- Once you’ve restored carrier, validate L2/L3 (bond state, VLAN tagging, ping gateway) and only then dig into VM-level issues.
- Finally, make it boring: label ports, standardize optics, and capture before/after baselines in change tickets.
The win isn’t just fixing today’s outage. It’s making sure the next “no carrier” takes five minutes and a cable swap, not a midnight interpretive dance with bridge settings.