You upgraded to Ubuntu 24.04, the NIC looks healthy, CPU is bored, and yet the application is wheezing: odd latency spikes, flaky connections, packet captures that lie, or storage replication that suddenly “needs more bandwidth” while using less. Someone says “turn off offloads.” Someone else says “never touch offloads.” Both are half-right, which is the most dangerous kind of right.
This is the field guide for proving when GRO/LRO/TSO (and friends) are the problem, how to disable them without making things worse, and how to do it in a way you can roll back at 3 a.m. with dignity intact.
What GRO/LRO/TSO actually do (and why they break things)
NIC offloads are a bargain: move per-packet work from the CPU to the NIC and you get higher throughput and lower CPU usage. The bill arrives when the “packet” the OS thinks it saw is not what actually crossed the wire, or when some middle layer assumes it’s seeing normal-sized segments but it’s actually seeing a coalesced monster.
GRO: Generic Receive Offload
GRO is a Linux software feature that merges multiple incoming packets into a larger “super-packet” before handing it up the stack. It reduces per-packet processing and interrupts. Great for throughput. Bad when you need precise per-packet timing, when your capture tooling expects the world to resemble reality, or when a bug in the driver/kernel path mishandles segmentation boundaries.
GRO is not a wire feature. It is a local optimization. The wire still carries normal MTU frames. Your host may show fewer, larger packets in captures and counters because the stack merged them.
LRO: Large Receive Offload
LRO is usually NIC/driver-based receive coalescing. It can be even more aggressive than GRO. It’s also more problematic in the presence of routing, tunneling, VLANs, and anything that expects strict packet boundaries. Many environments simply keep LRO off, especially in virtualized or overlay-heavy networks.
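If you want to know up front which of these knobs your driver will even let you touch, ethtool marks immovable features with [fixed]. A quick filter (the interface name is a placeholder):
cr0x@server:~$ sudo ethtool -k enp5s0f0 | egrep 'receive-offload|gro'
Anything tagged [fixed] cannot be toggled on that driver. On newer kernels you may also see rx-gro-hw (hardware GRO) and rx-gro-list here; they are related but distinct knobs, so note their state in your baseline too.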
TSO/GSO: segmentation on transmit (hardware or software)
TSO (TCP Segmentation Offload) lets the kernel hand a big TCP payload to the NIC and the NIC chops it into MSS-sized segments and computes checksums. GSO (Generic Segmentation Offload) is the software equivalent for protocols the NIC doesn’t know. This is usually safe and beneficial—until it isn’t: driver bugs, checksum misfeatures, odd interactions with tunnels, or gear in the path that behaves badly under certain burst/segment patterns.
Checksum offloads: the subtle troublemaker
Rx/Tx checksum offload means checksums may be computed/validated by NIC hardware. Packet captures taken on the host can show “bad checksum” because the checksum wasn’t filled in yet at the point of capture. That’s not necessarily a broken network; it’s a broken assumption.
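If the warnings are just noise in an endpoint capture, you can tell tcpdump not to verify checksums at all instead of touching NIC settings. A minimal example (interface and port are placeholders):
cr0x@server:~$ sudo tcpdump -i enp5s0f0 -nn -v -K -c 20 tcp and port 443
-K (--dont-verify-checksums) only changes what tcpdump complains about; it doesn’t change what is captured or what crosses the wire.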
One paraphrased idea from John Allspaw (operations/reliability): “Incidents happen when reality diverges from our mental model.” Offloads are a machine for manufacturing that divergence.
Joke #1: Offloads are like hiring interns to do your paperwork—fast and cheap until they “optimize” your filing system into modern art.
Facts and history: why this keeps happening
- Fact 1: LRO predates GRO and was widely disabled in routing/tunneling scenarios because it can merge packets in ways that break forwarding assumptions.
- Fact 2: GRO was introduced as a safer, stack-aware alternative to LRO, but it still changes packet visibility for tools like tcpdump and AF_PACKET consumers.
- Fact 3: TSO became mainstream as 1 Gbit/s turned into 10/25/40/100+ Gbit/s; CPUs can’t afford per-packet overhead at modern line rates.
- Fact 4: “Bad checksum” in tcpdump on the sending host is often a capture artifact with checksum offload, not a real wire checksum failure.
- Fact 5: Many CNIs and overlay stacks (VXLAN/Geneve) have had periods where offload combinations were buggy until drivers and kernels matured.
- Fact 6: RSS (Receive Side Scaling) and RPS/XPS were developed because single-core receive processing became a bottleneck long before NIC bandwidth did.
- Fact 7: IRQ moderation/coalescing can cause latency spikes under low traffic even when it improves throughput under heavy traffic.
- Fact 8: Virtualization added another layer: virtio-net and vhost can do their own batching/coalescing, compounding offload effects.
- Fact 9: Packet capture accuracy has always been “best effort” once you add offloads; capturing at SPAN/TAP or on a router often tells a different story than capturing on the endpoint.
Fast diagnosis playbook (first/second/third)
First: decide if you’re debugging correctness or performance
If you have data corruption, broken sessions, weird resets, or “only fails under load” behavior, treat it as correctness until proven otherwise. Performance tuning can wait. Disabling offloads is a valid correctness test even if it costs throughput.
Second: find the choke point in one pass
- Look for retransmits, drops, and resets. If TCP retransmits spike while link utilization is low, you have loss or reordering (real or apparent).
- Check CPU per-softirq saturation. If ksoftirqd or a single CPU core pegs, you’re packet-rate bound or mis-steered (RSS/IRQ affinity issue).
- Compare host-level counters vs switch counters. If the host says “drops” but the switch port doesn’t, your issue is likely in the host stack/driver/offload path.
Third: run two controlled A/B tests
- A/B offloads: toggle GRO/LRO/TSO in a scoped way, measure latency/retransmits/throughput.
- A/B MTU and path: test standard MTU vs jumbo, and direct host-to-host vs through overlay/VPN.
Your goal is not “make numbers bigger.” Your goal is: prove causality, then apply the smallest fix that removes the failure mode.
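If you want that discipline in one place, here is a minimal sketch of an A/B run with a guaranteed revert. Everything in it is an assumption to adapt: the script path, the interface, the peer address, and iperf3 as the load generator (your real workload is better).
cr0x@server:~$ sudo tee /usr/local/sbin/offload-ab.sh >/dev/null <<'EOF'
#!/usr/bin/env bash
# A/B sketch: toggle exactly one offload feature, drive load, record, revert.
# Run as root. IFACE, PEER, and the iperf3 load are placeholders.
set -euo pipefail
IFACE=enp5s0f0
PEER=10.0.0.20
FEATURE=${1:-gro}          # e.g. gro, tso, gso -- one feature per run

# Revert no matter how the run ends (assumes the feature started "on").
trap 'ethtool -K "$IFACE" "$FEATURE" on' EXIT

nstat -n >/dev/null         # reset nstat's delta baseline
ethtool -K "$IFACE" "$FEATURE" off
iperf3 -c "$PEER" -P 4 -t 30 --json > "/tmp/iperf_${FEATURE}_off.json"
nstat TcpRetransSegs        # retransmits accumulated during the run
EOF
cr0x@server:~$ sudo chmod +x /usr/local/sbin/offload-ab.sh
Run it as, for example, sudo /usr/local/sbin/offload-ab.sh tso, one feature at a time. The trap is the point: even if the load generator fails partway, the feature comes back on when the script exits.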
Practical tasks: commands, outputs, decisions (12+)
These are not toy commands. They’re the things you run when you’re on-call and your graph is lying to you. Each task includes what to look at and what decision to make.
Task 1: Identify the real interface and driver
cr0x@server:~$ ip -br link
lo UNKNOWN 00:00:00:00:00:00 <LOOPBACK,UP,LOWER_UP>
enp5s0f0 UP 3c:fd:fe:aa:bb:cc <BROADCAST,MULTICAST,UP,LOWER_UP>
enp5s0f1 DOWN 3c:fd:fe:aa:bb:cd <BROADCAST,MULTICAST>
Meaning: Pick the interface that carries the traffic (here: enp5s0f0). “UP, LOWER_UP” means link is up.
Decision: Only touch offloads on the active interface. Don’t shotgun changes across all NICs unless you enjoy guessing games.
cr0x@server:~$ sudo ethtool -i enp5s0f0
driver: ixgbe
version: 6.8.0-41-generic
firmware-version: 0x800003e7
bus-info: 0000:05:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no
Meaning: Driver and firmware matter. “It worked on 22.04” is not a proof; it’s a memory.
Decision: If issues started after kernel/firmware change, keep that in your hypothesis list.
Task 2: Inventory current offload settings (baseline)
cr0x@server:~$ sudo ethtool -k enp5s0f0
Features for enp5s0f0:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: on
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off
rx-vlan-offload: on
tx-vlan-offload: on
Meaning: This is your starting point. Note what is already off (LRO often is).
Decision: Copy this into your incident notes. When you roll back, you roll back to facts, not vibes.
Task 3: Check if you’re seeing retransmits (often the first “offload broke it” clue)
cr0x@server:~$ nstat -az | egrep 'Tcp(RetransSegs|ExtTCPSynRetrans|OutRsts|InErrs)'
TcpRetransSegs 1842 0.0
TcpExtTCPSynRetrans 17 0.0
TcpOutRsts 92 0.0
TcpInErrs 0 0.0
Meaning: Rising TcpRetransSegs during a workload points to loss, reordering, or path weirdness.
Decision: If retransmits climb while CPU and link utilization look fine, suspect driver/offload/tunnel interactions before blaming “the network.”
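nstat prints deltas since its previous run, so a crude loop during the failing workload shows whether retransmits are actually climbing in real time (the counter names act as filter patterns; Ctrl-C to stop):
cr0x@server:~$ while sleep 5; do nstat TcpRetransSegs TcpExtTCPSynRetrans; done
Each iteration prints only counters that increased during the previous five seconds; if nothing prints during the failure window, loss and retransmission probably aren’t your problem.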
Task 4: Prove whether drops are on the NIC, in the stack, or somewhere else
cr0x@server:~$ ip -s link show enp5s0f0
2: enp5s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
link/ether 3c:fd:fe:aa:bb:cc brd ff:ff:ff:ff:ff:ff
RX: bytes packets errors dropped missed mcast
9876543210 9123456 0 18421 0 1234
TX: bytes packets errors dropped carrier collsns
8765432109 8234567 0 0 0 0
Meaning: RX drops here are host-visible. They could be ring overruns, CPU backlog, or driver behavior.
Decision: If RX drops rise during load, you must look at ring sizes, IRQ distribution, and GRO/LRO before tuning application timeouts.
Task 5: See whether softirq is the real bottleneck
cr0x@server:~$ mpstat -P ALL 1 3
Linux 6.8.0-41-generic (server) 12/30/2025 _x86_64_ (32 CPU)
12:00:01 AM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
12:00:02 AM all 4.2 0.0 6.1 0.1 0.3 7.8 0.0 0.0 0.0 81.5
12:00:02 AM 7 2.1 0.0 4.9 0.0 0.2 48.7 0.0 0.0 0.0 44.1
Meaning: One CPU with very high %soft indicates receive processing concentrated on one queue/IRQ.
Decision: Before disabling offloads “because internet,” check RSS/IRQ affinity. Offloads aren’t your only lever.
Task 6: Check interrupt distribution and whether one queue is hogging
cr0x@server:~$ awk '/enp5s0f0/ {print}' /proc/interrupts | head
74: 1283921 2031 1888 1999 2101 1902 1988 2010 PCI-MSI 524288-edge enp5s0f0-TxRx-0
75: 1932 1290033 1890 2011 1998 1887 2002 2017 PCI-MSI 524289-edge enp5s0f0-TxRx-1
Meaning: This looks balanced-ish. If you see one IRQ count dwarfing others, RSS or affinity is off.
Decision: Fix steering (RSS/affinity) before you turn off GRO and accidentally crank packet rate into a CPU wall.
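Steering starts with how many queues the NIC is using. ethtool -l shows the channel layout and -L changes it; the count of 8 below is an example, not a recommendation, and must not exceed the reported maximum:
cr0x@server:~$ sudo ethtool -l enp5s0f0
cr0x@server:~$ sudo ethtool -L enp5s0f0 combined 8
On some drivers changing channel counts briefly resets the interface, so do this in a maintenance window or on a canary host.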
Task 7: Verify ring sizes (drops can be simple buffer starvation)
cr0x@server:~$ sudo ethtool -g enp5s0f0
Ring parameters for enp5s0f0:
Pre-set maximums:
RX: 4096
RX Mini: 0
RX Jumbo: 0
TX: 4096
Current hardware settings:
RX: 512
RX Mini: 0
RX Jumbo: 0
TX: 512
Meaning: Current rings are smaller than max.
Decision: If you have RX drops under bursty load, increase rings before blaming GRO. Bigger rings increase latency a bit but can prevent loss.
Task 8: Confirm link speed/duplex and detect weird autoneg behavior
cr0x@server:~$ sudo ethtool enp5s0f0 | egrep 'Speed|Duplex|Auto-negotiation|Link detected'
Speed: 10000Mb/s
Duplex: Full
Auto-negotiation: on
Link detected: yes
Meaning: Obvious, but you’d be surprised how often a “network bug” is a 1G fallback.
Decision: If speed is wrong, stop. Fix the link/optic/cable/switch port before chasing offloads.
Task 9: Use tcpdump correctly when offloads are involved
cr0x@server:~$ sudo tcpdump -i enp5s0f0 -nn -s 96 tcp and port 443 -c 5
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on enp5s0f0, link-type EN10MB (Ethernet), snapshot length 96 bytes
12:00:10.123456 IP 10.0.0.10.51532 > 10.0.0.20.443: Flags [S], seq 123456789, win 64240, options [mss 1460,sackOK,TS val 111 ecr 0,nop,wscale 7], length 0
12:00:10.123789 IP 10.0.0.20.443 > 10.0.0.10.51532: Flags [S.], seq 987654321, ack 123456790, win 65160, options [mss 1460,sackOK,TS val 222 ecr 111,nop,wscale 7], length 0
Meaning: tcpdump on the endpoint is still useful for handshake and timing. But payload segmentation and checksums can mislead you.
Decision: If you’re diagnosing packet loss or MTU issues, consider capturing on a switch SPAN/TAP or disable GRO temporarily for the capture window.
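A practical “capture window” is three commands: drop GRO, capture for a bounded time, put GRO back. The filename, filter, and 120-second window are placeholders:
cr0x@server:~$ sudo ethtool -K enp5s0f0 gro off
cr0x@server:~$ sudo timeout 120 tcpdump -i enp5s0f0 -nn -w /tmp/window.pcap tcp and port 443
cr0x@server:~$ sudo ethtool -K enp5s0f0 gro on
If you’re prone to forgetting step three, wrap it in a script with a trap or schedule a timed revert (see the systemd-run example later in this guide).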
Task 10: Check MTU path issues (offloads sometimes mask/trigger them)
cr0x@server:~$ ping -c 3 -M do -s 8972 10.0.0.20
PING 10.0.0.20 (10.0.0.20) 8972(9000) bytes of data.
ping: local error: message too long, mtu=1500
ping: local error: message too long, mtu=1500
ping: local error: message too long, mtu=1500
--- 10.0.0.20 ping statistics ---
3 packets transmitted, 0 received, +3 errors, 100% packet loss, time 2049ms
Meaning: Your interface MTU is 1500, so jumbo ping fails locally. That’s fine if you expect 1500.
Decision: Don’t mix jumbo assumptions with TSO/GSO expectations. If you need jumbo, configure it everywhere, then retest offloads.
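When you suspect the path (tunnel, VPN, misconfigured switch port) rather than the local interface, tracepath reports the PMTU it discovers hop by hop; the address below is the example peer used throughout:
cr0x@server:~$ tracepath -n 10.0.0.20
Look for the pmtu value dropping partway along the path; that is where your jumbo or tunnel-overhead assumptions die.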
Task 11: Measure throughput and CPU with iperf3 (baseline before changing offloads)
cr0x@server:~$ iperf3 -c 10.0.0.20 -P 4 -t 10
Connecting to host 10.0.0.20, port 5201
[SUM] 0.00-10.00 sec 10.9 GBytes 9.35 Gbits/sec 0 sender
[SUM] 0.00-10.00 sec 10.8 GBytes 9.29 Gbits/sec receiver
Meaning: Good baseline. If offloads are broken, you often see throughput collapse with retransmits or insane CPU.
Decision: Record this. Any change you make must beat it or fix correctness without unacceptable regression.
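GRO shapes the receive path and TSO/GSO the transmit path, so baseline both directions before toggling anything. iperf3’s -R flag reverses the flow so the remote end sends to you:
cr0x@server:~$ iperf3 -c 10.0.0.20 -P 4 -t 10 -R
Run mpstat in another terminal during both runs; throughput numbers without CPU numbers are half a baseline.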
Task 12: Toggle GRO off temporarily and see if symptoms move
cr0x@server:~$ sudo ethtool -K enp5s0f0 gro off
cr0x@server:~$ sudo ethtool -k enp5s0f0 | egrep 'generic-receive-offload|large-receive-offload'
generic-receive-offload: off
large-receive-offload: off
Meaning: GRO is now off. LRO remains off.
Decision: Retest the failing workload and watch retransmits/latency. If correctness improves immediately, GRO was at least part of the failure mode.
Task 13: Toggle TSO/GSO off (careful: can increase CPU/packet rate)
cr0x@server:~$ sudo ethtool -K enp5s0f0 tso off gso off
cr0x@server:~$ sudo ethtool -k enp5s0f0 | egrep 'tcp-segmentation-offload|generic-segmentation-offload'
tcp-segmentation-offload: off
generic-segmentation-offload: off
Meaning: Transmit segmentation will happen in smaller chunks or be fully software, depending on stack and NIC.
Decision: If this fixes stalls/retransmits on certain paths (especially tunnels), keep it as a strong signal of a driver/firmware/overlay interaction. Then decide how to scope the change.
Task 14: Check driver-level stats for red flags (drops, missed, errors)
cr0x@server:~$ sudo ethtool -S enp5s0f0 | egrep -i 'drop|dropped|miss|error|timeout' | head -n 20
rx_missed_errors: 0
rx_no_buffer_count: 124
rx_errors: 0
tx_timeout_count: 0
Meaning: rx_no_buffer_count suggests receive buffer starvation; that’s often ring sizing, CPU, or burst behavior.
Decision: Increase rings, tune IRQ moderation, or reduce burstiness before permanently disabling offloads across the fleet.
Task 15: Look at qdisc and queueing (latency spikes can be local)
cr0x@server:~$ tc -s qdisc show dev enp5s0f0
qdisc mq 0: root
qdisc fq_codel 0: parent :1 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
Sent 812345678 bytes 8234567 pkt (dropped 0, overlimits 0 requeues 12)
backlog 0b 0p requeues 12
Meaning: Queue discipline stats show drops/requeues. Not all pain is on the wire.
Decision: If you see local qdisc drops, disabling GRO won’t save you; you have queueing/bufferbloat or shaping issues.
Task 16: Validate conntrack or firewall path (often blamed on offloads)
cr0x@server:~$ sudo conntrack -S | egrep 'insert_failed|drop|invalid'
cpu=0 found=0 invalid=0 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=0
cpu=1 found=0 invalid=0 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=0
Meaning: Conntrack isn’t melting. Good.
Decision: If conntrack drop/insert_failed climbs, you have a state table problem; don’t “fix” it by toggling GRO.
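If those counters ever do climb, check table pressure before touching NIC features. The sysctls below are standard once the conntrack modules are loaded; the values are host-specific:
cr0x@server:~$ sysctl net.netfilter.nf_conntrack_count net.netfilter.nf_conntrack_max
A count sitting near max during incidents is a sizing problem, not an offload problem.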
How to disable offloads safely (temporary, persistent, scoped)
The safe approach is boring: isolate, test, measure, persist only what you must, and document the rollback. Disabling offloads can absolutely fix real problems. It can also convert a throughput problem into a CPU incident and then you get to enjoy two incidents for the price of one.
Temporary changes (runtime only)
Use ethtool -K for an immediate A/B test. This does not survive reboot. That’s a feature when you’re experimenting.
cr0x@server:~$ sudo ethtool -K enp5s0f0 gro off lro off tso off gso off rx off tx off
Cannot change rx-checksumming
Cannot change tx-checksumming
Meaning: Some features can’t be toggled on some drivers/NICs. That’s normal.
Decision: Don’t fight hardware. Toggle what you can, then adjust your test plan. Often GRO/TSO toggles are the most impactful anyway.
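If you’re worried about forgetting to revert at 3 a.m., schedule the rollback before you make the change. A sketch using a transient systemd timer (the 20-minute window and the feature list are assumptions to adapt):
cr0x@server:~$ sudo systemd-run --on-active=20min /usr/sbin/ethtool -K enp5s0f0 gro on tso on gso on
systemd-run prints the transient timer’s unit name; stop that unit if you decide to keep the experimental settings.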
Persistent changes with systemd .link files (the sane Ubuntu 24.04-native way)
On Ubuntu 24.04, netplan typically renders its configuration for systemd-networkd, and the same systemd machinery honors .link files: systemd-udevd applies them when a network device is initialized. The point: you want offloads set at link-up time, not by a hand-run script that “someone” forgot to install on half the fleet.
Create a .link file to match the interface by name or MAC and apply offload settings. Example matching by name:
cr0x@server:~$ sudo tee /etc/systemd/network/10-enp5s0f0.link >/dev/null <<'EOF'
[Match]
OriginalName=enp5s0f0
[Link]
GenericReceiveOffload=false
TCPSegmentationOffload=false
GenericSegmentationOffload=false
LargeReceiveOffload=false
EOF
cr0x@server:~$ sudo udevadm control --reload
cr0x@server:~$ sudo udevadm trigger --action=add /sys/class/net/enp5s0f0
cr0x@server:~$ sudo ethtool -k enp5s0f0 | egrep 'generic-receive-offload|tcp-segmentation-offload|generic-segmentation-offload|large-receive-offload'
tcp-segmentation-offload: off
generic-segmentation-offload: off
generic-receive-offload: off
large-receive-offload: off
Meaning: .link files are applied by systemd-udevd when the device is initialized, not by restarting systemd-networkd. Re-trigger the device as above, bounce the link, or reboot; if the driver still shows the old values, a reboot is the unambiguous test.
Decision: If the persistent config applies cleanly and fixes the issue without unacceptable CPU cost, you can roll it out gradually.
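If interface names aren’t stable across your fleet (renames, different slot layouts), match the .link file on the permanent MAC address instead of the name. A minimal variant (MAC and filename are examples):
cr0x@server:~$ sudo tee /etc/systemd/network/10-offloads-by-mac.link >/dev/null <<'EOF'
[Match]
PermanentMACAddress=3c:fd:fe:aa:bb:cc
[Link]
GenericReceiveOffload=false
TCPSegmentationOffload=false
GenericSegmentationOffload=false
LargeReceiveOffload=false
EOF
Same application rules as above: the file is read at device initialization, so re-trigger the device or reboot.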
Persistent changes with a oneshot systemd unit (works everywhere, less elegant)
If you can’t rely on systemd .link settings (mixed environments, custom NIC naming, or you just want something blunt), use a systemd oneshot that runs after the network is up.
cr0x@server:~$ sudo tee /etc/systemd/system/ethtool-offloads@.service >/dev/null <<'EOF'
[Unit]
Description=Set NIC offloads for %I
After=network-online.target
Wants=network-online.target
[Service]
Type=oneshot
ExecStart=/usr/sbin/ethtool -K %I gro off lro off tso off gso off
RemainAfterExit=yes
[Install]
WantedBy=multi-user.target
EOF
cr0x@server:~$ sudo systemctl enable --now ethtool-offloads@enp5s0f0.service
Created symlink /etc/systemd/system/multi-user.target.wants/ethtool-offloads@enp5s0f0.service → /etc/systemd/system/ethtool-offloads@.service.
Meaning: Offloads set on boot for that interface.
Decision: Use this when you need predictable behavior fast. Then replace with cleaner link-level config later.
Scope it: disable only what you must
In practice:
- If packet captures are misleading or a userspace packet consumer is confused: disable GRO (and sometimes checksum offload) temporarily for diagnostics.
- If tunnels/overlays are behaving badly: try disabling TSO/GSO first, then GRO.
- If you’re routing/bridging: keep LRO off. Most production shops never enable it.
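Scoping can also mean picking the right netdev. Offload flags exist on virtual interfaces too (bonds, bridges, vxlan devices), so if only the overlay path misbehaves you can sometimes toggle there and leave the physical NIC alone. A sketch, assuming a hypothetical overlay device named vxlan0; support varies by device type, and anything marked [fixed] can’t be changed:
cr0x@server:~$ sudo ethtool -k vxlan0 | egrep 'segmentation-offload|receive-offload'
cr0x@server:~$ sudo ethtool -K vxlan0 tso off gso off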
Verify you didn’t “fix” it by starving CPU
After disabling offloads, rerun your workload and watch:
- CPU %soft (softirq load)
- RX drops and rx_no_buffer_count
- p99 latency and retransmits
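A before/after snapshot keeps this honest. A quick sketch that samples all three over a 60-second window (counter names under ethtool -S are driver-specific):
cr0x@server:~$ nstat -n >/dev/null; sleep 60; nstat TcpRetransSegs
cr0x@server:~$ mpstat 1 5 | tail -n 1
cr0x@server:~$ sudo ethtool -S enp5s0f0 | egrep -i 'drop|no_buffer'
Compare these against the numbers you recorded before the toggle; if softirq and drops went up while retransmits stayed flat, you moved the bottleneck instead of removing it.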
Joke #2: The fastest way to prove offloads “fixed it” is to create a brand-new CPU bottleneck—nature abhors a vacuum, and so does Linux networking.
Three corporate-world mini-stories (anonymized, plausible, technically accurate)
1) Incident caused by a wrong assumption: “Bad checksums mean the network is corrupting packets”
A payments team noticed intermittent TLS failures after moving a service onto new Ubuntu 24.04 hosts. A packet capture from the client host showed a parade of “bad checksum” warnings. The immediate conclusion was predictable: network gear was mangling packets. The network team got paged, a war room was opened, and everyone started hunting ghosts.
The first assumption was the mistake: that a checksum warning in a host-side capture is proof of wire-level corruption. On modern Linux, with TX checksum offload enabled, tcpdump can grab packets before the NIC fills in the checksum. The packet on the wire is fine; the packet in your capture is incomplete at that specific tap point.
The second mistake was escalation-by-screenshot. A senior engineer finally asked for two things: a capture from a switch SPAN port, and the output of ethtool -k. The SPAN capture showed valid checksums. The endpoint capture looked “bad” only when offloads were enabled.
The real issue ended up being unrelated to checksums: a mis-sized conntrack table on a shared gateway host dropped connections under bursty load. Offloads were innocent; the capture tooling was misleading.
What changed decision-making: they updated the on-call runbook to treat “bad checksum” as a diagnostic artifact unless confirmed off-host. They also standardized a “capture with offloads disabled for 2 minutes” method when endpoint captures were required.
2) Optimization that backfired: “Enable every offload for maximum performance”
A platform group running container workloads had a mandate: improve node density. Someone noticed that some nodes had LRO disabled and decided to “normalize” the fleet for throughput. They pushed a change to enable everything the NIC advertised. Throughput benchmarks looked great in a synthetic test. The change went to production.
Two days later, the support queue filled with oddities: gRPC streams stalling, sporadic timeouts between pods, and a pattern where failures were more common on nodes with heavy east-west traffic. CPU was fine. The network was fine. Logs were unhelpful. The incident had that special smell: everything is normal except the customer experience.
The culprit was LRO interacting poorly with encapsulated traffic and the way some components observed packet boundaries. The system wasn’t “dropping packets” so much as creating weird aggregation behavior that exposed edge-case bugs in the overlay path. Disabling LRO fixed it immediately.
The backfire lesson was simple: offloads are not “free performance flags.” They are behavioral changes. If your environment uses overlays, middleboxes, or advanced datapaths, treat offloads like kernel upgrades: test them on representative traffic, not just iperf.
They rolled back to a conservative baseline: LRO off everywhere, keep TSO/GSO on unless proven harmful, and allow GRO only where it doesn’t interfere with observability or packet-processing components.
3) Boring but correct practice that saved the day: “A/B test with a single host and a rollback timer”
A storage team running replication traffic over 25G links started seeing replication lag spikes after an OS refresh. No obvious packet loss, but the lag was real and correlated with busy periods. The first impulse was to tune the application and add bandwidth. They resisted it.
Their SRE lead insisted on a one-host experiment: pick one sender and one receiver, isolate the path, and flip offloads in a controlled matrix (GRO only, TSO only, both). Every change had a rollback command pre-written, and they used a timed job to revert settings after 20 minutes unless explicitly cancelled. Boring. Correct.
The experiment showed something actionable: disabling TSO/GSO removed the lag spikes but increased CPU by a noticeable margin. Disabling GRO had minimal effect. That narrowed the culprit to transmit segmentation interactions, not receive coalescing.
With that evidence, they updated NIC firmware on the affected model and re-tested. TSO/GSO could remain enabled after the firmware update. They avoided a permanent CPU tax and didn’t have to re-architect the storage topology.
The practice that saved the day was not a clever sysctl. It was discipline: one change, one host, one measurable outcome, guaranteed rollback.
Common mistakes: symptoms → root cause → fix
1) Symptom: tcpdump shows “bad checksum” on outgoing packets
Root cause: TX checksum offload; tcpdump captured before checksum was computed.
Fix: Capture on SPAN/TAP, or disable TX checksum offload briefly for capture (ethtool -K IFACE tx off) and re-check.
2) Symptom: p99 latency spikes under light traffic, throughput fine under heavy traffic
Root cause: Interrupt moderation/coalescing tuned for throughput; wakes CPU less often, adds latency.
Fix: Adjust NIC interrupt coalescing (ethtool -c/-C), and only then consider GRO changes.
3) Symptom: Kubernetes pod-to-pod networking flaky after upgrade
Root cause: Offload + overlay/tunnel interaction (TSO/GSO/GRO with VXLAN/Geneve), sometimes driver-specific.
Fix: A/B disable TSO/GSO first on affected nodes; keep LRO off. Confirm with retransmits and workload tests.
4) Symptom: RX drops climb on the host during bursts, switch shows clean counters
Root cause: Host receive buffer/ring starvation or softirq backlog; not necessarily external loss.
Fix: Increase RX ring size (ethtool -G), confirm IRQ/RSS distribution, then revisit GRO/TSO if needed.
5) Symptom: After disabling offloads, performance collapses and CPU spikes
Root cause: You removed batching, increased packet rate and per-packet overhead, and hit CPU/softirq limits.
Fix: Re-enable TSO/GSO, keep LRO off, consider leaving GRO on. Improve steering (RSS), ring sizes, and IRQ affinity.
6) Symptom: Packet captures show giant “packets” bigger than MTU
Root cause: GRO/LRO presenting coalesced skbs to the capture point; not a wire MTU violation.
Fix: Disable GRO/LRO for the capture window, or capture off-host.
7) Symptom: Storage replication or iSCSI/NFS behaves worse with “more optimization”
Root cause: Backpressure and burst patterns changed by offloads; driver bugs under sustained large segments.
Fix: A/B test TSO/GSO; watch retransmits and CPU. If disabling helps, check firmware/driver updates and consider scoping offload changes to storage VLANs/interfaces.
Checklists / step-by-step plan
Step-by-step: prove offloads are the problem (not just “different”)
- Baseline: record ethtool -k, ethtool -i, ip -s link, and nstat output.
- Reproduce: run the real workload, not only iperf. Capture p95/p99 latency, retransmits, and error counters.
- Single toggle: disable GRO only. Retest. If change is visible, keep going; if not, revert.
- Second toggle: disable TSO/GSO. Retest. Watch CPU and softirq.
- Decide scope: if only one traffic class breaks (overlay, storage VLAN), prefer scoping the change to those nodes or interfaces.
- Rollback plan: write the exact revert commands and verify they work.
Change control checklist (a.k.a. how to not become the incident)
- Do it on one host first. Then a small canary pool. Then expand.
- Keep LRO off unless you have a measured reason and a simple L2/L3 path.
- Don’t disable everything at once. GRO and TSO affect different sides of the stack.
- Track CPU softirq and RX drops after changes. If they spike, you moved the bottleneck.
- Persist config using systemd link files or systemd units, not rc.local folklore.
- Document: interface, driver, firmware, kernel version, offload matrix, and measured result.
Rollback checklist (write this before you change anything)
cr0x@server:~$ sudo ethtool -K enp5s0f0 gro on lro off tso on gso on
Meaning: Example rollback to a conservative baseline: keep LRO off, re-enable GRO/TSO/GSO.
Decision: If rollback doesn’t restore behavior, you learned offloads were not the root cause. Stop changing offloads and widen the search.
Optional: ring sizing change (only if drops point there)
cr0x@server:~$ sudo ethtool -G enp5s0f0 rx 2048 tx 2048
cr0x@server:~$ sudo ethtool -g enp5s0f0 | egrep -A4 'Current hardware settings'
Current hardware settings:
RX: 2048
RX Mini: 0
RX Jumbo: 0
TX: 2048
Meaning: Bigger rings reduce drops under burst but can increase buffering/latency.
Decision: If your issue is “drops under microbursts,” this is often more correct than disabling GRO.
FAQ
1) Should I disable GRO/LRO/TSO by default on Ubuntu 24.04?
No. Default offloads are generally correct for mainstream workloads. Disable only when you can show a correctness problem or a measurable performance regression tied to offloads.
2) Why does disabling offloads sometimes fix packet loss that “isn’t real”?
Because you’re changing timing, batching, and segmentation patterns. If a driver/firmware path has a bug under certain segment sizes or burst patterns, changing offloads can avoid the trigger.
3) If LRO is so sketchy, why does it exist?
It was built for throughput on simpler paths, often in older or more controlled environments. Modern networks (overlays, virtualization, routing in hosts) made its edge cases more expensive than its wins for many shops.
4) Is “bad checksum” in tcpdump always harmless?
Not always, but it’s often an offload artifact. Validate by capturing off-host or disabling checksum offload briefly. If the problem reproduces off-host with bad checksums, then it’s real.
5) What’s the difference between TSO and GSO in practice?
TSO is hardware TCP segmentation by the NIC. GSO is a generic software framework that lets the kernel handle segmentation similarly for other protocols or when hardware can’t.
6) Will disabling TSO/GSO increase latency?
It can go either way. It often increases CPU work and packet rate, which can increase queueing under load. But it can reduce burstiness and avoid certain driver/tunnel issues, improving tail latency in specific scenarios.
7) I disabled GRO and my captures look “normal” now. Did I fix production?
You fixed observability. That’s valuable, but not the same as fixing production behavior. Re-enable GRO after capturing unless you have evidence GRO was causing user-visible issues.
8) Can I disable offloads only for a VLAN, bridge, or tunnel?
Offloads are typically configured per physical or virtual interface. You can apply settings to the specific interface that terminates your traffic (e.g., the physical NIC, the bond, or the veth/bridge in some cases), but “per VLAN” granularity is limited and driver-dependent.
9) How do I know if I’m CPU-bound after disabling offloads?
Watch mpstat for high %soft, check /proc/interrupts for imbalance, and look for rising RX drops or rx_no_buffer_count. If those climb, you traded a network problem for a host processing problem.
10) Does Ubuntu 24.04 itself “change offloads” compared to 22.04?
The more meaningful change is the kernel, drivers, and how your NIC firmware interacts with them. Ubuntu releases can shift defaults, but most “it changed after upgrade” stories are really driver/firmware behavior shifts under the same nominal feature flags.
Conclusion: next steps you can ship
GRO/LRO/TSO aren’t villains. They’re power tools. If you use them without understanding what they change, you’ll eventually cut through your own mental model and bleed time in a war room.
- Run the fast diagnosis playbook and decide whether you’re chasing correctness or throughput.
- Take a baseline with ethtool -k, nstat, ip -s link, and CPU softirq stats.
- A/B test one toggle at a time: GRO first for observability/receive behavior, TSO/GSO for transmit/overlay path weirdness.
- Persist the smallest change using a systemd .link file or a oneshot unit, and canary it.
- Verify you didn’t create a CPU bottleneck and that retransmits/drops actually improved.
If you walk away with one opinionated rule: keep LRO off, don’t disable TSO/GSO casually, and treat GRO as both a performance knob and an observability hazard. Test like you mean it.