Debian 13: NFS is slower than expected — prove it’s sync/rsize/wsize and fix it (case #23)

Nothing is quite as demoralizing as a brand-new Debian 13 box with a fast NIC, a healthy storage array, and NFS performance that looks like it’s traveling by fax. Reads are “fine-ish.” Writes are a tragedy. And everyone has a theory, none of which include evidence.

This is the case where you stop guessing and start proving. We’ll show how to demonstrate—using stats, traces, and controlled tests—that your bottleneck is usually one of three culprits: synchronous semantics (sync/async plus commit behavior), too-small rsize/wsize, or a mismatch between client expectations and server storage reality. Then we fix it without lying to ourselves about durability.

Fast diagnosis playbook

If you’re on call and someone is breathing into the incident channel, this is the shortest path to the truth. Don’t start by changing mount options. Start by measuring one thing at a time.

First: confirm what you mounted and what you exported

  1. Client: confirm NFS version and mount options (nfs4 vs nfs, rsize/wsize, hard/soft retry semantics, timeo/retrans, and whether sync was forced at mount time).
  2. Server: confirm export options and whether the server is forcing sync behavior (sync default) and whether it can cache safely.

Second: determine if you are latency-bound (sync) or bandwidth-bound (I/O size)

  1. Run a buffered read test and a direct write test from the client. If reads hit line rate but writes crawl, suspect synchronous commits.
  2. Check nfsstat -c for COMMIT call rate. A high COMMIT rate is a neon sign.

Third: isolate network vs server disk

  1. Network: verify you can push packets with a simple TCP test (not “ping,” not hope).
  2. Server disk: check whether NFSd is waiting on storage flushes (common with HDDs, RAID controllers with write cache off, or virtualized storage).

Fourth: only then tune

  • If you prove commits are the limiter: fix durability path (server storage cache, SLOG, controller settings) or accept controlled risk (async only if the workload allows it).
  • If you prove small I/O is the limiter: increase rsize/wsize, verify MTU and offloads aren’t sabotaging you, and confirm the server supports those sizes.

The performance model: where NFS actually spends time

NFS performance arguments tend to be emotional because people mix up “my app is slow” with “the network is slow” with “the storage is slow.” NFS is all three, glued together with a contract about consistency.

At a high level, an NFS write can be:

  • Sent from client to server in chunks sized by wsize.
  • Acknowledged by the server either as “I received it” or “it’s on stable storage,” depending on NFS version, export options, and server implementation.
  • Committed (explicitly via COMMIT operations in NFSv3 and NFSv4 when needed) to stable storage.

If the server must flush to stable storage frequently, you are latency-bound. If each flush takes 2–10 ms and you do it thousands of times, your throughput is mathematically doomed. You can have 100 GbE and still write at “a few tens of MB/s” because physics doesn’t negotiate.
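A quick back-of-the-envelope shows why. Assume, purely for illustration, 64 KiB per synchronous write and 5 ms of flush latency per commit; the ceiling falls out of integer arithmetic and has nothing to do with your link speed:

cr0x@client:~$ echo "$(( 64 * 1024 * (1000 / 5) )) bytes/s"   # 64 KiB per flush x 200 flushes/s at 5 ms each
13107200 bytes/s

Roughly 13 MB/s, on any NIC you can buy.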

If rsize/wsize are tiny (8K, 16K, 32K), you are overhead-bound. The CPU cost of RPC handling, the per-call latency, and the kernel bookkeeping become the bottleneck even if storage is fast.

Debian 13 doesn’t change these laws. It just gives you newer kernels, newer NFS client behavior, and slightly different defaults that can expose what was previously hidden by luck.

One quote worth keeping in your head while tuning anything in production:

“Hope is not a strategy.” — General Gordon R. Sullivan

Interesting facts and history (that actually help)

  • NFS was designed in the 1980s to make remote files feel local on LANs that were slow by today’s standards. A lot of its semantics assume latency is acceptable if consistency is preserved.
  • NFSv3 introduced the COMMIT concept so clients could pipeline writes but still request a durable commit later, which is why “COMMIT storms” are a recognizable failure mode.
  • NFSv4 folded “mount” into the protocol and introduced stateful locking and sessions; performance counters and failure modes differ from v3 in ways that show up in nfsstat.
  • sync is historically the safe default on many NFS servers because it matches user expectations: “when a write returns, it’s probably safe.” That safety has a price.
  • async exports can be very fast because they allow the server to acknowledge writes before they hit stable storage. They can also turn power loss into data loss in a way that’s “perfectly consistent” with the contract you just violated.
  • rsize/wsize used to be limited by older NICs, MTUs, and kernel constraints. Modern Linux can do large sizes, but you still need end-to-end compatibility.
  • “Jumbo frames” are not a magic spell. They help in some CPU-bound cases, but they also create path-MTU black holes and weird drops if one switch port is left at 1500.
  • Linux NFS has had multiple generations of client code; “it worked on Debian 10” doesn’t prove your workload is okay—it proves your old setup tolerated it.
  • Writeback caching is a storage problem disguised as a filesystem problem. A RAID controller with write cache disabled can make “sync” feel like molasses even on SSDs.

Joke #1: NFS tuning is like adjusting a shower mixer in an old hotel—one millimeter too far and you’re either freezing or scalded.

Prove it’s sync/commit behavior

You don’t prove “sync is the problem” by declaring it loudly in Slack. You prove it by correlating write latency, commit frequency, and server storage flush time.

What “sync” really means here

On the server side, sync export behavior generally means the server won’t reply “done” until it can guarantee the data is on stable storage (or at least as stable as the server’s storage stack claims). That often implies a cache flush or equivalent barrier.

On many setups, the killer isn’t the network transfer. It’s the stable-storage guarantee. If your underlying storage can’t complete flushes quickly, every “safe” write becomes a micro-transaction waiting for a disk to finish meditating.

The signature of sync pain

  • Large read throughput looks fine; large write throughput is low and flat.
  • Write throughput doesn’t increase much with bigger files or multiple threads (or it increases slightly and then plateaus).
  • nfsstat shows a non-trivial rate of COMMIT calls.
  • Server storage metrics show high latency on flush or write barriers (even when raw write IOPS look “okay”).

What to do with that proof

You have three honest options:

  1. Make stable storage fast (proper SSDs, controller cache with battery/flash-backed protection, ZFS SLOG, etc.).
  2. Change the workload (batch writes, reduce fsync frequency, write locally then move, use object storage for logs, etc.).
  3. Relax the safety contract (async, or application-level durability) with eyes wide open and in writing.

Prove it’s rsize/wsize

rsize and wsize define the maximum payload size for NFS READ and WRITE operations. Too small, and you drown in RPC overhead. Large enough, and you ride the bandwidth you paid for.

But there’s a catch: big I/O sizes only help if you aren’t already latency-bound on commit and if the path (client, server, network) handles them cleanly.

Signature of rsize/wsize pain

  • CPU usage on client or server grows quickly with throughput.
  • Many small NFS operations per second show up in nfsstat.
  • Throughput improves dramatically when you increase rsize/wsize.
  • Network is underutilized despite low disk latency and sufficient CPU.

What “good” sizes look like on modern Linux

On typical modern networks (1/10/25 GbE) and kernels, values like 262144 (256 KiB) are common starting points for NFSv4. Some environments go higher, but chasing max values is less useful than verifying that the negotiated size matches what you think it is.
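If you just want the negotiated numbers without reading the whole flag soup, a one-liner against nfsstat -m is enough (the output here matches the example mount used throughout this article):

cr0x@client:~$ nfsstat -m | grep -Eo '(rsize|wsize)=[0-9]+'
rsize=262144
wsize=262144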

Practical tasks: commands, outputs, what they mean, and what you decide

These are the tasks I actually run when someone says “NFS is slow” and expects a fix. Each task has a decision point. If you don’t change decisions based on output, you’re not troubleshooting; you’re sightseeing.

Task 1: Confirm what NFS you mounted (version, rsize/wsize, sec, proto)

cr0x@client:~$ nfsstat -m
/mnt/shared from nfs1:/export/shared
 Flags: rw,relatime,vers=4.2,rsize=262144,wsize=262144,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.10.10.21,local_lock=none

Meaning: This is the ground truth for what the kernel negotiated. If you expected NFSv3 or different sizes, reality disagrees.

Decision: If rsize/wsize are small (e.g., 32768), plan to test larger sizes. If version is not what you expect, fix that first.

Task 2: Confirm the server export options (sync/async, fsid, subtree, security)

cr0x@nfs1:~$ sudo exportfs -v
/export/shared
	10.10.10.0/24(sync,wdelay,hide,no_subtree_check,sec=sys,ro,secure,root_squash,no_all_squash)
/export/shared
	10.10.10.0/24(sync,wdelay,hide,no_subtree_check,sec=sys,rw,secure,root_squash,no_all_squash)

Meaning: The export is sync. That’s safe, and it can be slow depending on storage flush performance.

Decision: Don’t flip to async yet. First, prove that sync is the bottleneck with commit/latency evidence.

Task 3: Check server NFS thread and RPC load basics

cr0x@nfs1:~$ sudo nfsstat -s
Server rpc stats:
calls      badcalls   badfmt     badauth    badclnt
10293841   0          0          0          0

Server nfs v4:
null         compound     open         close        read         write
0            854123      1102         1102         302194       181004
commit       getattr      setattr      fsinfo       renew        fsstat
42001        701223      0            21           12944        98

Meaning: If commit is non-trivial relative to write, you may be paying for stable storage frequently.

Decision: If commit calls are high, prioritize sync/flush investigation. If commit is near zero, focus on I/O sizing or network.
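Lifetime totals can hide a recent storm. A crude but honest way to get a per-interval view is to snapshot the counters, wait, and compare (a sketch; subtract the write and commit values by hand or with awk to get ops per minute):

cr0x@nfs1:~$ nfsstat -s > /tmp/nfsstat.before
cr0x@nfs1:~$ sleep 60; nfsstat -s > /tmp/nfsstat.after
cr0x@nfs1:~$ diff /tmp/nfsstat.before /tmp/nfsstat.after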

Task 4: Look for COMMIT storms on the client too

cr0x@client:~$ nfsstat -c
Client rpc stats:
calls      retrans    authrefrsh
221004     12         0

Client nfs v4:
null         compound     read         write        commit       getattr
0            199321      100112       40123        9800         32119

Meaning: Commit calls exist and retrans are low. That’s not a network loss issue; it’s likely a durability/flush cost issue.

Decision: Move toward measuring flush latency on the server storage stack.

Task 5: Check if you’re accidentally using “sync-like” app behavior (fsync-heavy)

cr0x@client:~$ sudo strace -f -tt -T -e trace=fdatasync,fsync,openat,write -p 23817
15:12:09.441201 fsync(7)                 = 0 <0.012341>
15:12:09.453901 write(7, "....", 4096)   = 4096 <0.000221>
15:12:09.454301 fsync(7)                 = 0 <0.010998>

Meaning: The app calls fsync a lot. On NFS, that can translate into commits/flushes. You can tune NFS all day and still lose.

Decision: If the workload is fsync-heavy (databases, journaling loggers), you must make server stable storage fast or change app durability strategy.
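If the full trace is too noisy, a counting summary is kinder: let the command below run for roughly 30 seconds under load, then hit Ctrl-C and strace prints a per-syscall count and cumulative time table (the PID is the same hypothetical writer process as above):

cr0x@client:~$ sudo strace -f -c -e trace=fsync,fdatasync,write -p 23817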

Task 6: Run a controlled write test that bypasses client page cache

cr0x@client:~$ dd if=/dev/zero of=/mnt/shared/ddtest.bin bs=1M count=4096 oflag=direct status=progress
3229614080 bytes (3.2 GB, 3.0 GiB) copied, 78 s, 41.4 MB/s
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 104.2 s, 41.2 MB/s

Meaning: ~41 MB/s on a presumably faster link is suspicious. oflag=direct makes this more sensitive to server commit behavior.

Decision: If direct writes are slow while network tests are fast, focus on server storage flush and sync semantics.
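To check whether parallelism changes the picture (it usually does not when you are commit-bound), a multi-job direct-write test is more controlled than juggling several dd processes. A sketch assuming fio is installed (apt install fio) and that /mnt/shared/fiotest is scratch space you are allowed to fill:

cr0x@client:~$ mkdir -p /mnt/shared/fiotest
cr0x@client:~$ fio --name=nfswrite --directory=/mnt/shared/fiotest --rw=write --bs=1M --size=1G --numjobs=4 --direct=1 --ioengine=libaio --group_reporting

If four jobs barely beat one, the limiter is the commit path, not the client's ability to issue writes.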

Task 7: Run a controlled buffered read test to compare

cr0x@client:~$ dd if=/mnt/shared/ddtest.bin of=/dev/null bs=4M status=progress
4173336576 bytes (4.2 GB, 3.9 GiB) copied, 6 s, 695 MB/s
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 6.2 s, 693 MB/s

Meaning: Reads are near line rate; writes are not. This asymmetry screams “commit/flush/durability path.”

Decision: Stop blaming the network.

Task 8: Verify network capacity independently (simple TCP throughput)

cr0x@nfs1:~$ iperf3 -s
-----------------------------------------------------------
Server listening on 5201 (test #1)
-----------------------------------------------------------
cr0x@client:~$ iperf3 -c 10.10.10.11 -P 4
[SUM]   0.00-10.00  sec  36.8 GBytes  31.6 Gbits/sec  0             sender
[SUM]   0.00-10.00  sec  36.7 GBytes  31.5 Gbits/sec                  receiver

Meaning: Network is fine. Your 40 MB/s NFS writes are not a cable problem.

Decision: Invest time in storage flush latency and NFS write semantics.

Task 9: Check MTU consistency (because someone always “helped”)

cr0x@client:~$ ip -d link show dev enp65s0
2: enp65s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 3c:fd:fe:aa:bb:cc brd ff:ff:ff:ff:ff:ff
cr0x@nfs1:~$ ip -d link show dev enp65s0
2: enp65s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 3c:fd:fe:11:22:33 brd ff:ff:ff:ff:ff:ff

Meaning: MTU mismatch. This can cause fragmentation, drops, and performance weirdness—especially if intermediate gear is inconsistent.

Decision: Either set MTU 1500 everywhere (boring, reliable) or set jumbo frames end-to-end and prove it with packet capture and switch config.
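A cheap end-to-end jumbo check before anyone touches switch configs: a do-not-fragment ping sized for a 9000-byte MTU (8972 bytes of ICMP payload plus 28 bytes of headers). If any hop is still at 1500, it fails on the spot:

cr0x@client:~$ ping -M do -s 8972 -c 3 10.10.10.11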

Task 10: Observe per-operation latency with nfsiostat

cr0x@client:~$ nfsiostat 2 3
10.10.10.11:/export/shared mounted on /mnt/shared:

read:            ops/s    kB/s   kB/op  retrans  avg RTT (ms)  avg exe (ms)
               2720.00  696320   256.00   0.00       1.20         1.35
write:           ops/s    kB/s   kB/op  retrans  avg RTT (ms)  avg exe (ms)
                160.00   40960   256.00   0.00      10.80        24.50

Meaning: Writes have much higher execution time than RTT. That often indicates server-side waiting (disk flush/commit), not network delay.

Decision: Focus server-side: storage latency, write cache, filesystem sync behavior, and NFSd threads.

Task 11: Watch server disk flush latency in real time

cr0x@nfs1:~$ iostat -x 2 3
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          6.23    0.00    4.12   21.44    0.00   68.21

Device            r/s     w/s   rkB/s   wkB/s  aqu-sz  await  svctm  %util
nvme0n1         12.00  420.00   896.0  43008.0   8.42  19.8   1.1   48.0

Meaning: High await with moderate utilization implies latency is coming from flush/barriers or storage stack behavior, not raw bandwidth saturation.

Decision: Investigate storage write cache policies, filesystem flush behavior, and whether some layer is lying about stable storage (fast but unsafe) or telling the truth very slowly (safe but painful).

Task 12: Verify if the server is running out of NFS threads or hitting CPU limits

cr0x@nfs1:~$ ps -eo pid,comm,psr,pcpu,stat | grep -E 'nfsd|rpc'
  812 nfsd              2  6.5 S
  813 nfsd              3  6.2 S
  814 nfsd              5  6.1 S
  815 nfsd              7  6.4 S
  501 rpc.svcgssd       1  0.1 S
  476 rpc.mountd        0  0.0 S
cr0x@nfs1:~$ cat /proc/fs/nfsd/threads
8

Meaning: 8 threads might be fine or insufficient depending on workload and CPU. If nfsd threads are pegged while storage is idle, you’re CPU/thread bound.

Decision: If CPU-bound: increase threads and tune networking; if storage-bound: threads won’t save you.

Task 13: Confirm negotiated rsize/wsize by remounting explicitly (test, don’t assume)

cr0x@client:~$ sudo mount -o remount,vers=4.2,rsize=1048576,wsize=1048576 nfs1:/export/shared /mnt/shared
cr0x@client:~$ nfsstat -m | sed -n '1,3p'
/mnt/shared from nfs1:/export/shared
 Flags: rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.10.10.21,local_lock=none

Meaning: The kernel accepted 1 MiB sizes. If it silently negotiates down, you’ll see it here.

Decision: If throughput improves significantly and commit rates stay sane, keep the larger sizes. If you see errors or retransmits, roll back and examine network/MTU/offloads.

Task 14: Check for retransmits and timeouts (the “slow” that’s actually loss)

cr0x@client:~$ nfsstat -rc
Client rpc stats:
calls      retrans    authrefrsh
221004     932        0

Meaning: Retransmits are high. That’s not “NFS being NFS”; it’s congestion, drops, MTU issues, or a saturated server path.

Decision: Pause tuning. Fix packet loss first: switch counters, NIC stats, driver/firmware, MTU consistency, and QoS.
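Interface error counters on both ends are the fastest loss evidence you can get. A sketch using the interface from the MTU task; the exact counter names printed by ethtool -S depend on the driver:

cr0x@client:~$ ip -s link show dev enp65s0
cr0x@client:~$ sudo ethtool -S enp65s0 | grep -iE 'drop|err|discard|miss'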

Task 15: Observe server-side NFSd in the kernel (quick sanity)

cr0x@nfs1:~$ sudo cat /proc/fs/nfsd/pool_stats
# pool packets-arrived sockets-enqueued threads-woken overloads-avoided threads-timedout
cpu0  120391  0 0 0 0
cpu1  118220  0 0 0 0
cpu2  119004  0 0 0 0
cpu3  121188  0 0 0 0

Meaning: If sockets are being enqueued or threads are timing out instead of packets being handled immediately, you may be thread-limited or facing contention.

Decision: If the pool stats suggest queue buildup, increase threads and confirm CPU scheduling/NUMA placement.

Task 16: Validate the server filesystem and mount options (barriers and journaling matter)

cr0x@nfs1:~$ findmnt -no SOURCE,FSTYPE,OPTIONS /export/shared
/dev/md0 ext4 rw,relatime,errors=remount-ro

Meaning: Filesystem type and options influence flush behavior. Ext4 is fine, but your underlying block device policy matters more than internet folklore.

Decision: If the block device is a RAID controller, verify its cache policy and battery-backed protection. If it’s virtualized, verify the hypervisor storage stack and cache mode.
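For a plain Linux block device you can at least ask the kernel what it believes about the write cache; it prints either "write back" or "write through". RAID controllers and SANs still need vendor tooling, but this catches the simple cases (device name taken from the iostat example above):

cr0x@nfs1:~$ cat /sys/block/nvme0n1/queue/write_cache
cr0x@nfs1:~$ lsblk -d -o NAME,ROTA,MODEL,TRAN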

Fixes that work (client, server, and storage)

Once you’ve proven the bottleneck, fixes become straightforward. Not always easy. Straightforward.

Fix category A: rsize/wsize done properly

If you proved overhead-bound behavior (small I/O sizes), tune the mount. Keep it boring and reproducible.

  • Start with vers=4.2,proto=tcp,hard,timeo=600.
  • Set rsize=262144,wsize=262144 (or 1 MiB if you validated it).
  • Use noatime only if you understand the implications; it’s usually fine for shared data but can surprise some workflows.
cr0x@client:~$ sudo mount -t nfs4 -o vers=4.2,proto=tcp,hard,timeo=600,retrans=2,rsize=262144,wsize=262144 nfs1:/export/shared /mnt/shared
cr0x@client:~$ nfsstat -m | head -n 2
/mnt/shared from nfs1:/export/shared
 Flags: rw,relatime,vers=4.2,rsize=262144,wsize=262144,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.10.10.21,local_lock=none

What to avoid: random mixes of legacy options copied from a 2009 blog post. If you can’t explain why an option exists, don’t ship it.
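And once the options are settled, pin them in /etc/fstab so a reboot cannot quietly renegotiate something else. A sketch matching the mount above:

# /etc/fstab
nfs1:/export/shared  /mnt/shared  nfs4  vers=4.2,proto=tcp,hard,timeo=600,retrans=2,rsize=262144,wsize=262144,_netdev  0  0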

Fix category B: the honest way to make sync fast

When sync is the bottleneck, you need to reduce the cost of durable commits.

1) Fix the storage write cache policy (common in RAID / SAN)

If your RAID controller write cache is disabled, sync writes will crawl. Many controllers disable it when the battery/flash module is missing or degraded. That’s correct behavior, but it’s also a performance event.

Action: verify controller cache health and policy via vendor tooling. Then re-test dd oflag=direct and watch commit frequency.

2) Use a dedicated intent log / fast journal device when appropriate

On filesystems that support it (notably ZFS with SLOG), you can offload synchronous write intent to a low-latency device. This is not “more SSD”; it’s “the right SSD in the right place.”
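On ZFS, the sketch looks like this; the pool name tank and the device path are placeholders, and the device should be a low-latency SSD with power-loss protection, not whatever was left in the parts drawer:

cr0x@nfs1:~$ sudo zpool add tank log /dev/disk/by-id/nvme-SOME_PLP_SSD   # placeholder pool and device
cr0x@nfs1:~$ zpool status tank | grep -A 2 logs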

3) Make sure your virtualization stack isn’t lying

Virtual disks with “writeback cache” at one layer and “sync” expectations at another layer can lead to either fake safety or fake slowness. Decide what’s true, then align settings end-to-end.

Fix category C: exporting async (only if you can tolerate it)

Let’s be clear: async can make NFS writes scream. It can also turn sudden power loss into corrupted application data that looks “fine” until it doesn’t. If you do this, do it because the workload is disposable (build artifacts, caches, scratch space) or because durability is handled elsewhere.

cr0x@nfs1:~$ sudoedit /etc/exports
cr0x@nfs1:~$ sudo exportfs -ra
cr0x@nfs1:~$ sudo exportfs -v | grep -A1 '/export/shared'
/export/shared
	10.10.10.0/24(async,wdelay,hide,no_subtree_check,sec=sys,rw,secure,root_squash,no_all_squash)

Decision: Only ship async if it’s explicitly approved by the data owner and your durability story is written down. If the story is “probably fine,” it’s not a story; it’s a future incident.
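If async does get approved for disposable data, keep it on its own export so the durability contract stays obvious. A sketch of the split in /etc/exports (the /export/scratch path is hypothetical; the network matches the earlier examples):

/export/shared   10.10.10.0/24(rw,sync,no_subtree_check,root_squash)
/export/scratch  10.10.10.0/24(rw,async,no_subtree_check,root_squash)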

Joke #2: async is like removing the seatbelt to get to the meeting faster—you’ll be early right up until you’re not.

Fix category D: tune NFS server concurrency (when you proved CPU/thread limits)

If your tests show storage is fast but NFSd is CPU-bound or thread-limited, increase threads and confirm you’re not creating lock contention.

cr0x@nfs1:~$ sudo systemctl edit nfs-server
cr0x@nfs1:~$ sudo systemctl show -p ExecStart nfs-server
ExecStart={ path=/usr/sbin/rpc.nfsd ; argv[]=/usr/sbin/rpc.nfsd 32 ; ignore_errors=no ; start_time=[n/a] ; stop_time=[n/a] ; pid=0 ; code=(null) ; status=0/0 }

Meaning: Here the server is configured for 32 threads. That may help parallel workloads.

Decision: If increasing threads improves throughput without increasing latency or retransmits, keep it. If it increases contention or CPU steal (in VMs), back off.
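On Debian, the persistent place for the thread count is the [nfsd] section of /etc/nfs.conf; rpc.nfsd can change it live for a quick test. The value 32 below is just the example from above, not a recommendation:

# /etc/nfs.conf
[nfsd]
threads=32

cr0x@nfs1:~$ sudo rpc.nfsd 32
cr0x@nfs1:~$ cat /proc/fs/nfsd/threads
32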

Fix category E: stop sabotaging yourself with inconsistent MTU and offloads

End-to-end MTU consistency matters more than “9000 everywhere” slogans. If you can’t guarantee it, stick to 1500 and move on with your life.

cr0x@client:~$ sudo ip link set dev enp65s0 mtu 1500
cr0x@client:~$ ip link show dev enp65s0 | grep mtu
2: enp65s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000

Decision: If retransmits drop and NFS stabilizes, the “jumbo frames” plan wasn’t ready for production.

Three corporate-world mini-stories

Mini-story 1: The incident caused by a wrong assumption

The company had a build farm writing artifacts to an NFS share. Debian upgrades rolled through quietly, and the team celebrated: newer kernel, newer NFS. Then the builds slowed down. Not failed—just painfully slow. Queue depth grew, engineers started rerunning jobs, which made it worse. A classic self-amplifying slowdown.

The wrong assumption was simple: “NFS performance is mostly network.” They spent a full day swapping cables, moving to a different top-of-rack switch, and arguing about LACP hashing. The network graphs were clean. The switch counters were clean. The only thing that wasn’t clean was the incident channel.

When someone finally ran a direct-write test (dd oflag=direct) and compared it to iperf3, the shape of the problem snapped into focus. Writes were latency-bound. Reads were fine. The team pulled nfsstat and saw COMMIT calls. On the server, storage was a RAID volume whose write cache had been disabled after a battery module went into a “learning cycle” and never returned to normal without manual intervention.

The fix wasn’t exotic: restore write cache protection, verify battery health, and retest. Performance returned. The postmortem lesson was even more boring: always separate network throughput from storage durability cost. They added a standard “iperf + direct-write dd” step to their runbook, and the next incident was shorter and less theatrical.

Mini-story 2: The optimization that backfired

A different org had a data science cluster. Someone read that “async makes NFS fast” and changed exports for the primary shared dataset. It did get fast. Everyone applauded. Jobs finished earlier. Dashboards looked great. They rolled it out wider.

Two weeks later, a power event hit a rack. Not catastrophic—just enough to reboot several nodes including the NFS server. The dataset didn’t look corrupted at first. Then training runs started producing inconsistent results. Some runs failed with weird file format errors. Others produced models that were subtly wrong. That’s the fun kind of failure: the one that passes CI and fails reality.

The investigation was ugly because nothing screamed “corruption.” Files were present. Permissions were normal. Checksums didn’t match historical values, but only for certain shards. The root cause was the async export: the server had acknowledged writes that never made it to stable storage before power loss. The application had assumed “write returned” meant durable. That assumption was reasonable—until someone changed the contract.

They reverted to sync for durable datasets and created a separate async export for scratch space. The performance hit was accepted because the alternative was epistemological chaos. They also introduced periodic data integrity checks for critical datasets. It didn’t make them faster, but it made them right.

Mini-story 3: The boring but correct practice that saved the day

A fintech team ran NFS for shared configuration bundles and deployment artifacts. It was not glamorous storage. The saving grace was that they treated NFS like a production dependency: documented mounts, pinned versions, and had a single “known-good” mount profile for Linux clients.

During a Debian 13 rollout, one application team complained about slow startup, blaming NFS. Instead of an argument, the SRE on duty pulled the standard runbook: confirm mount negotiation, run nfsiostat, compare to baseline. The baseline mattered—because they had one.

The output showed something subtle: retransmits were climbing during peak hours. Not insane, but non-zero. The network team found a switch port with intermittent errors. Because the storage team could prove “this is loss, not sync cost,” the fix was a simple hardware replacement, not a week-long tuning exercise.

The boring practice was “keep a baseline and a known-good mount profile.” It saved days of cross-team speculation and prevented the usual ritual sacrifice of random mount options.

Common mistakes: symptoms → root cause → fix

1) Writes stuck around 20–80 MB/s on fast links

Symptom: Reads saturate the link, writes are flat and low.

Root cause: Sync/commit latency dominated by server storage flush time (write cache off, slow journal, HDD-backed sync path).

Fix: Measure COMMIT rate and server await; fix storage durability path (controller cache protection, faster log device), or isolate scratch workloads to async.

2) Throughput increases with more clients, but each client stays slow

Symptom: Aggregate throughput climbs, single-stream stays disappointing.

Root cause: Small wsize/rsize, single-stream overhead, or application doing tiny sync writes.

Fix: Increase rsize/wsize; test with dd bs=1M and validate negotiated values; fix app write pattern if it’s fsync-heavy.

3) Random stalls, “server not responding” messages, then recovery

Symptom: Periodic hangs, then things continue.

Root cause: Packet loss/retransmits, MTU mismatch, NIC offload bugs, or overloaded server threads leading to timeouts.

Fix: Check nfsstat -rc retrans; verify MTU end-to-end; check NIC error counters; increase server threads only after loss is ruled out.

4) Tuning rsize/wsize makes things worse

Symptom: Bigger sizes reduce throughput or increase latency.

Root cause: Path MTU issues, fragmentation, or a server that can’t efficiently handle big I/O (CPU bound, memory pressure).

Fix: Fix MTU consistency; confirm NIC offloads; pick a moderate size (256 KiB) and validate with nfsstat -m.

5) “Async fixed it” and then data weirdness later

Symptom: Fast writes, later corruption or missing recent updates after crash/power loss.

Root cause: You changed the durability contract. The system behaved accordingly.

Fix: Use sync for durable data; isolate scratch exports; document durability expectations and test crash behavior if you must deviate.

6) CPU spikes on the NFS server at moderate throughput

Symptom: Storage is idle-ish, network is not saturated, but server CPU is hot.

Root cause: Too many small operations (small I/O sizes), insufficient nfsd threads, or Kerberos integrity/privacy (krb5i/krb5p) overhead.

Fix: Increase rsize/wsize, increase threads, and measure again; if using Kerberos, budget CPU accordingly.

Checklists / step-by-step plan

Step-by-step: prove sync is the limiter

  1. On client: capture mount negotiation with nfsstat -m.
  2. On client: run iperf3 to the server to confirm network capacity.
  3. On client: run dd oflag=direct write and dd read; compare throughput.
  4. On client and server: check nfsstat for COMMIT call rate.
  5. On client: use nfsiostat to compare RTT vs execution time for writes.
  6. On server: use iostat -x to find high await/iowait during writes.
  7. Decision: if writes are latency-bound with commits, fix storage durability path or accept controlled async for non-durable workloads.

Step-by-step: prove rsize/wsize is the limiter

  1. Record current rsize/wsize from nfsstat -m.
  2. Run a read/write test with large block sizes (bs=4M read, bs=1M oflag=direct write).
  3. Temporarily remount with larger rsize/wsize and verify negotiation did not downgrade.
  4. Re-run the same tests and compare throughput and CPU usage.
  5. Decision: keep the smallest size that achieves the needed throughput without increasing retransmits/latency.

Production rollout checklist (so you don’t “tune” yourself into an outage)

  • Pick one client and one share as canary.
  • Measure baseline: nfsstat -m, nfsstat -c, nfsiostat, iostat -x on server during load.
  • Change one variable at a time: either rsize/wsize or server export sync behavior, not both.
  • Keep a rollback command ready (old mount options, old export line).
  • Watch retransmits and latency, not just MB/s.
  • Write down the durability contract for each export (durable vs scratch).

FAQ

1) Should I use NFSv3 or NFSv4.2 on Debian 13?

Default to NFSv4.2 unless you have a specific compatibility reason. v4 has better integration (single TCP connection model, statefulness). But don’t expect the version change alone to fix flush latency.

2) If I set rsize/wsize to 1 MiB, will it always be faster?

No. It helps when you’re overhead-bound. If you’re commit/flush latency-bound, bigger wsize won’t fix the fundamental “wait for stable storage” cost. Also, if your network path has MTU problems or drops, larger operations can amplify pain.

3) What’s the difference between “server is slow” and “storage is slow”?

For NFS writes, “server is slow” often means “server is waiting on storage flush.” Use nfsiostat: high execution time with low RTT points to server-side waiting, usually storage.

4) Is async always unsafe?

It’s unsafe for workloads that assume write completion implies durability. For caches, build artifacts, temporary scratch, it can be acceptable. You’re not making it “safe” by hoping; you’re making it “accepted risk” by choosing the right data.

5) Why do I see COMMIT calls even on NFSv4?

NFSv4 can still require commit-like behavior depending on how the client and server manage stable storage guarantees. If the server acknowledges data as unstable, clients will issue commits to make it stable.

6) My app is slow only on NFS, but local disk is fine. What’s the most common reason?

Fsync-heavy patterns. Local disk might have fast write cache and low-latency flush, while the NFS server’s stable storage path is slower. Prove it with strace on the app and iostat on the server.

7) Should I disable atime or use noatime to speed things up?

It can reduce metadata writes, but it’s rarely the main bottleneck in “writes are 40 MB/s” incidents. Fix commit latency and I/O sizing first. Then consider atime if metadata load is a real measured problem.

8) I increased NFS server threads and nothing changed. Why?

Because you were storage-latency bound, not thread bound. More threads just give you more concurrent waiting. Use nfsiostat and server iostat -x to see whether you’re waiting on disk flushes.

9) Can jumbo frames fix slow NFS?

Sometimes, for CPU-bound high-throughput cases. But MTU inconsistency creates loss and retransmits that look like random stalls. If you can’t guarantee end-to-end MTU, stick to 1500 and focus on the actual limiter.

Next steps you can actually take

Slow NFS on Debian 13 is rarely mysterious. It’s usually one of three things: sync/commit cost, undersized rsize/wsize, or network loss hiding behind “it’s probably fine.” The trick is refusing to treat guesses as data.

Do this next, in order:

  1. Capture client mount negotiation (nfsstat -m) and server export options (exportfs -v).
  2. Prove the network with a TCP throughput test, then stop talking about cables.
  3. Run paired direct-write and read tests. If writes are slow and reads are fast, pivot to commit/flush evidence.
  4. Use nfsstat and nfsiostat to decide whether you’re latency-bound or overhead-bound.
  5. Fix the proven limiter: durable storage path for sync workloads, or mount sizing for overhead-bound workloads.
  6. If you choose async, do it only for data that can be lost, and document that choice like an adult.

If you follow this discipline, “NFS is slow” turns from a vague complaint into a measurable system behavior with an actual fix. That’s the job.
