Your Debian 13 hosts are fine until they aren’t: a build job freezes, a web node stops rotating logs, or an
entire Kubernetes worker looks “up” but can’t ls a directory. You check the app. Nothing.
You check CPU. Bored. Then you find it: nfs: server foo not responding, still trying.
NFS timeouts feel like betrayal because they’re rarely one thing. They’re a negotiation between kernel client,
server, network, and whatever storage sits behind the server. Mount options can make that negotiation fair—or
they can just gag the messenger while the fire spreads. Let’s do the former.
What “NFS timeout” actually means on Linux
On Debian 13, “NFS timeout” is usually reported as a client-side symptom: RPCs to the server didn’t complete
in the expected time and the client is retrying (or giving up, if you forced it to). The key detail:
most “timeouts” are not a single timer expiring once. They’re a series of retransmissions, backoffs,
and state recovery attempts across multiple layers.
The client might be waiting on:
- Network delivery: packet loss, MTU mismatch, congested buffers, flapping links, ECMP weirdness.
- Server RPC processing: nfsd threads saturated, export FS blocked on disk I/O, lock manager delays.
- Storage behind the server: RAID rebuild, thin-provisioned pool out of space, overloaded SSD, mis-sized ARC, pathological snapshots.
- State recovery: NFSv4 lease renewal, delegations recalled, client ID changes after reboot, grace periods.
Linux NFS clients can also appear “hung” while behaving as designed. A hard mount will keep retrying I/O
forever (or for a very long time) because correctness matters more than fast failure. That’s not a timeout problem;
that’s you discovering the semantics you bought.
The questions that matter:
- Is the client retrying because the server is slow, or because the network is dropping?
- Is the client blocked in uninterruptible sleep (D) on NFS I/O?
- Is the server alive but its backing storage is wedged?
- Did you choose mount semantics that match the workload’s tolerance for corruption vs downtime?
One paraphrased idea from Werner Vogels (Amazon CTO): Everything fails; design so failure is routine and recoverable.
Interesting facts and historical context
- NFS predates “cloud” by decades: NFS first appeared in the mid-1980s, built for LANs where “fast” meant 10 Mbps and a beige server.
- NFSv3 is “stateless-ish” on purpose: the server doesn’t keep per-client open-file state like SMB; it reduces server complexity but pushes pain into locking and client retries.
- NFSv4 folded in locking: NLM/lockd was split out in v2/v3 days; v4 integrated locking and added a stateful protocol with leases.
- Timeout tuning exists because RPC over UDP existed: early NFS relied heavily on UDP; retrans behavior mattered a lot. TCP improved the transport story but didn’t eliminate server stalls.
- “Stale file handle” is a feature, not a bug: NFS filehandles encode server-side identity; if the inode is gone or the export moved, the client can’t pretend it’s the same file.
- rsize/wsize wars were once real: older networks and NICs meant you could absolutely crater performance with badly chosen sizes; modern defaults are saner but jumbo/overlay networks can still bite.
- nconnect is relatively new: multiple TCP connections per mount improves parallelism; great on busy clients, but it can amplify load on fragile servers.
- NFS timeouts often blame the wrong layer: a storage array doing a controller failover looks identical to “network jitter” from the client’s point of view.
Fast diagnosis playbook (first/second/third)
When people say “NFS is timing out,” the winning move is to stop debating mount flags and start narrowing the bottleneck.
This is the order that finds answers quickly without boiling the ocean.
First: confirm the failure mode on the client (is it blocking? retrying? failing fast?)
- Check kernel logs for “not responding” and “OK” transitions.
- Check whether processes are stuck in D state on NFS syscalls.
- Check NFS client RPC stats: retrans spikes vs normal traffic.
Second: isolate network vs server CPU vs server storage
- Ping is not enough; check packet loss, MTU path issues, and TCP retransmissions.
- On the server, check nfsd thread saturation and system load with I/O wait.
- On the server, check backing filesystem latency (e.g., iostat), and look for stalled devices.
Third: decide if mount options are mitigation or masking
- Use mount options to align semantics (hard vs soft, intr, timeo/retrans) with workload needs.
- Don’t “fix” storage stalls with soft unless you accept partial writes and app-level corruption risk.
- Prefer options that improve recovery and reduce pathological behavior: bg, nofail, a sane timeo, the proper protocol version, and systemd ordering.
Mount options that help stability (and the traps)
Stability in NFS isn’t just “don’t time out.” It’s “when something is slow or broken, fail in a way the system can
survive.” That means the right defaults for general purpose, and targeted exceptions for workloads that can tolerate them.
Hard vs soft: choose semantics, not vibes
hard means the client retries I/O until it succeeds. That’s why you see “still trying”.
It’s the correct choice for data integrity and most POSIX-ish workloads. It can also freeze your application threads
if the server disappears. That’s the price.
soft means the client gives up after retries and returns an error to the application.
That sounds “stable” until you realize: some applications treat I/O errors as transient and keep going, possibly after
partial writes or unexpected short reads. You didn’t make NFS more reliable; you made failure faster and more creative.
Opinionated guidance:
- Use hard for anything that writes important data: databases, message queues, artifact stores, home dirs.
- Consider soft only for read-mostly, cacheable, non-critical content where errors are acceptable and handled (e.g., a render farm reading inputs that can be re-fetched).
- If you choose soft, set expectations with app owners in writing. You’re changing semantics, not tuning.
Joke #1: A soft mount is like a “temporary” firewall rule—everyone forgets it, until the incident report needs a villain.
timeo and retrans: what they really control
On Linux NFS, timeo is not “the time before failure.” It’s the base RPC timeout before a retry, expressed in tenths of a second
(timeo=600 means 60 seconds). The effective wait before anything user-visible happens can be much longer, because the client backs off and retries.
retrans is how many times to retry an RPC before the client escalates behavior (and for soft mounts, before erroring out).
Lowering timeo can make the client detect issues sooner, but can also increase load during congestion due to more frequent retries.
Increasing timeo can reduce retrans storms on flaky networks, but makes real outages feel longer.
Practical guidance:
- For stable LANs, defaults are often fine. Tune only when you have evidence: retrans spikes, known WAN links, or known server stalls.
- For WAN-ish or lossy links, a slightly higher timeo can reduce thrash; don’t make it minutes unless you enjoy watching people reboot nodes in desperation.
- If you’re seeing brief pauses (a few seconds) due to server failovers, don’t overreact with tiny timeo values; let recovery happen.
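If you do change them, make the values explicit where they can be reviewed, not implied by a wiki page. A minimal sketch, reusing the nfs01 export and mount point from the baseline later in this article; the numbers are the conservative pair discussed above, not a universal recommendation:
cr0x@server:~$ sudo mount -t nfs -o vers=4.1,proto=tcp,hard,timeo=600,retrans=2 nfs01:/export/shared /mnt/shared
cr0x@server:~$ # timeo is in tenths of a second: 600 = 60s base RPC timeout before the first retry
cr0x@server:~$ # retrans=2: after two retries the client logs "server not responding" and, because of hard, keeps trying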
proto=tcp and NFS versions: pick the protocol you can support
On modern Debian, you should be using TCP. UDP is mostly a museum exhibit unless you’re working around something truly strange.
TCP gives better behavior under loss and handles larger payloads more sanely.
Version choice:
- NFSv4.x (including 4.1/4.2) is typically the best default: fewer ports to manage, integrated locking, better firewall story.
- NFSv3 can still be the right choice with legacy servers, some NAS appliances, or weird corner cases. But it requires more moving parts (rpcbind, mountd, lockd/statd behavior).
Stability angle: v4 state recovery can be painful during server reboots (grace periods), but it’s more predictable when configured well.
v3 can look “simpler” until you debug lock recovery at 2 a.m.
vers=4.1 or vers=4.2 vs “whatever negotiates”
Letting the client and server negotiate versions can be fine, but it’s also how you end up with surprising downgrades after a change.
If you operate production, pin the version unless you have a reason not to.
- Pin vers=4.1 if your server is known good on 4.1 and you don’t need 4.2 features.
- Use vers=4.2 if both sides support it and you’ve tested. Some 4.2 features (like server-side copy/clone in some stacks) are great; buggy implementations are not.
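Whichever version you pin, verify what the client actually negotiated after the next remount instead of trusting the fstab line. A quick check; both commands only read state:
cr0x@server:~$ nfsstat -m
cr0x@server:~$ findmnt -t nfs,nfs4 -o TARGET,OPTIONS | grep -o 'vers=[0-9.]*'
cr0x@server:~$ # nfsstat -m prints per-mount effective options; the grep pulls out just the negotiated versions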
actimeo and attribute caching: stability vs correctness
actimeo (and the finer-grained acregmin/acregmax/acdirmin/acdirmax) controls attribute caching.
Increasing cache times reduces metadata RPCs, which can help under load and reduce exposure to transient slowness.
It can also make clients see stale metadata longer, which breaks apps expecting close-to-open consistency patterns.
Use cases:
- Build farms and CI reading shared dependencies: bigger attribute caching can be a win.
- Shared home directories with frequent edits across hosts: don’t get fancy; keep defaults or conservative values.
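For the read-mostly case only, a hedged fstab sketch (the deps export and mount point are placeholders); leave home directories at defaults:
cr0x@server:~$ cat /etc/fstab
# actimeo=60 sets all four ac* timers to 60s; clients may see metadata up to a minute stale
nfs01:/export/deps /mnt/deps nfs vers=4.1,proto=tcp,hard,ro,actimeo=60,noatime,_netdev 0 0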
rsize/wsize: don’t chase benchmarks, chase tail latency
Bigger I/O sizes can improve throughput but can worsen tail latency if the server or network struggles with large bursts.
Modern defaults are usually good. If you tune, tune with measurement and rollback plans.
A common stability pattern: reduce wsize slightly when a fragile server falls over under heavy write bursts.
But if you’re doing that, you’re treating symptoms; the real fix is server/storage capacity.
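Measure before and after any change. nfsiostat (shipped in nfs-common) and /proc/self/mountstats report per-operation RTT and execution time, which is where tail latency actually shows up; /mnt/shared is the example mount from this article:
cr0x@server:~$ nfsiostat 5 3 /mnt/shared
cr0x@server:~$ grep -A2 'READ:' /proc/self/mountstats | head
cr0x@server:~$ # compare average RTT/exe for read and write ops during quiet hours vs an incident, not just MB/s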
nconnect: more parallelism, more ways to overload a server
nconnect=N opens multiple TCP connections per mount. It can dramatically help on busy clients with parallel I/O,
especially when a single TCP flow becomes the bottleneck.
Stability trade-off:
- On robust servers, it can reduce timeouts by improving throughput and reducing queueing.
- On marginal servers, it can amplify load and make stalls more frequent.
Recommendation: start with nconnect=4 for high-throughput clients if server capacity is known good; monitor server CPU, nfsd threads, and latency.
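A sketch of that starting point, assuming server headroom has been verified; nconnect is set at mount time, so changing it means a remount:
cr0x@server:~$ sudo mount -t nfs -o vers=4.1,proto=tcp,hard,nconnect=4 nfs01:/export/shared /mnt/shared
cr0x@server:~$ ss -tn dst 10.10.2.20 | grep -c ':2049'
cr0x@server:~$ # expect one established connection per nconnect (here 4); watch server CPU and nfsd stats after rollout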
noatime: small win, sometimes real
Disabling atime updates reduces write traffic for read-heavy workloads. It won’t fix timeouts caused by server stalls,
but it can shave background churn and improve stability under borderline load.
System boot stability options: nofail, bg, and systemd timeouts
Many “NFS timeout incidents” begin as “a host won’t boot because NFS isn’t up yet.” That’s not an NFS performance problem.
That’s boot ordering and failure handling.
- nofail lets boot continue if the mount isn’t available. Great for non-critical mounts; dangerous for required ones.
- bg (mainly for NFSv3-ish mount behavior) backgrounds mount attempts after an initial try, letting boot proceed.
- x-systemd.automount creates an on-demand automount; the system can boot and mount only when accessed. This is a stability weapon when used carefully.
- x-systemd.mount-timeout= controls how long systemd waits for the mount unit. You can avoid minutes of boot hang.
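A sketch of a non-critical mount wired for boot safety (export and mount point are placeholders); compare it with the production baseline in the next section:
cr0x@server:~$ cat /etc/fstab
nfs01:/export/tools /mnt/tools nfs vers=4.1,proto=tcp,hard,nofail,_netdev,x-systemd.automount,x-systemd.mount-timeout=15,x-systemd.idle-timeout=60 0 0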
Locking and “interruptibility”: know what you’re opting into
Older lore says to add intr so signals can interrupt NFS operations. Modern kernels ignore intr/nointr (deprecated since 2.6.25); only a fatal signal such as SIGKILL can interrupt a pending NFS operation, and even that isn’t instant.
The point remains: killing a process stuck in NFS I/O isn’t always possible, because the kernel is waiting for the server to respond.
If you need “kill always works,” you don’t want NFS semantics. You want a different architecture: local buffering, async replication, or an object store.
Joke #2: Tuning NFS timeouts without measuring the server is like fixing a slow elevator by changing the “door close” button sticker.
Debian 13 specifics: systemd, kernel, defaults
Debian 13 ships with a modern kernel and systemd behavior that makes NFS mounts feel different than the “edit fstab, reboot, pray” era.
The relevant operational changes:
- systemd treats mounts as units with dependencies; ordering against network-online.target matters for remote mounts.
- remote-fs.target can block services if mounts are “required.” Decide explicitly what must block and what must not.
- NFS client utilities are split: the kernel provides the client; userland tools help with diagnostics and config. Ensure you have nfs-common on clients.
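Two quick sanity checks on a Debian 13 client, assuming a standard install: the NFS userland is present, and something actually implements network-online.target (which wait-online unit exists depends on your network stack):
cr0x@server:~$ dpkg -s nfs-common | grep ^Status
Status: install ok installed
cr0x@server:~$ systemctl is-enabled systemd-networkd-wait-online.service NetworkManager-wait-online.service
cr0x@server:~$ # at least one wait-online service should be enabled, or ordering against network-online.target is decorative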
My opinionated Debian 13 baseline for a “normal” production client mount (NFSv4.1 on a LAN) looks like:
cr0x@server:~$ cat /etc/fstab
nfs01:/export/shared /mnt/shared nfs vers=4.1,proto=tcp,hard,timeo=600,retrans=2,noatime,_netdev,x-systemd.automount,x-systemd.idle-timeout=60 0 0
Why this mix?
- hard keeps data semantics sane.
- timeo=600,retrans=2 is conservative (not tiny) and avoids hammering during blips; adjust only with evidence.
- x-systemd.automount avoids boot-time hostage situations for many workloads.
- x-systemd.idle-timeout=60 lets the automount go idle; this can reduce stale state on laptops or ephemeral nodes.
- _netdev signals “this is network-dependent” for ordering.
When this baseline is wrong:
- If the mount must be available before critical services start (e.g., app binaries live there), automount can create surprise latency on first access. Better to mount early and fail fast, or redesign.
- If you’re on a high-latency WAN, you may need a higher timeo and careful caching choices.
- If your server does planned failovers, you need to test recovery behavior under load; mount flags won’t save you from a storage controller that pauses for 45 seconds.
12+ practical tasks: commands, outputs, decisions
Below are operational tasks I’d actually run. Each includes what to look for and what decision it drives.
Run them on a client first, then on the server. You’re building a timeline and isolating the layer that lies.
Task 1: Confirm what’s mounted and with which options
cr0x@server:~$ findmnt -t nfs,nfs4 -o TARGET,SOURCE,FSTYPE,OPTIONS
TARGET SOURCE FSTYPE OPTIONS
/mnt/shared nfs01:/export/shared nfs4 rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.10.2.15,local_lock=none
What it means: you’re using NFSv4.1 over TCP with a hard mount; the client will retry. rsize/wsize show negotiated values.
Decision: if you see unexpected vers=3, udp, or missing hard, fix the mount definition before chasing ghosts.
Task 2: Watch kernel logs for “not responding” and “OK” flaps
cr0x@server:~$ sudo journalctl -k -g 'nfs' -S -2h
Dec 29 09:12:41 client01 kernel: nfs: server nfs01 not responding, still trying
Dec 29 09:13:10 client01 kernel: nfs: server nfs01 OK
Dec 29 09:27:02 client01 kernel: nfs: server nfs01 not responding, still trying
What it means: you have intermittent stalls, not a clean outage. The “OK” line suggests recovery happens.
Decision: intermittent stalls push you toward investigating server/storage latency and network loss, not DNS or fstab typos.
Task 3: Check for processes stuck in uninterruptible sleep (D state)
cr0x@server:~$ ps -eo pid,state,comm,wchan:32,args | awk '$2=="D"{print}' | head
21453 D tar nfs_wait_on_request tar -cf /mnt/shared/backup.tar /var/lib/app
30112 D rsync rpc_wait_bit_killable rsync -a /data/ /mnt/shared/data/
What it means: user processes are blocked waiting on NFS/RPC. Killing them may not work until the kernel I/O completes or times out per semantics.
Decision: if many processes are in D, treat this as an infrastructure incident, not an application bug. Proceed to RPC stats and server checks.
Task 4: Inspect NFS client RPC stats for retransmissions
cr0x@server:~$ nfsstat -c
Client rpc stats:
calls retrans authrefrsh
128904 8421 0
Client nfs v4:
null read write commit open open_conf
0 48210 19342 0 188 0
What it means: retrans is high relative to calls (here ~6.5%). That usually means packet loss, congestion, or server stalls long enough to trigger retries.
Decision: if retrans is near zero, the issue may be application-level or server-side blocking without packet loss (still possible). If retrans is high, prioritize network and server responsiveness.
Task 5: Confirm basic path connectivity and detect loss (not just latency)
cr0x@server:~$ ping -c 50 -i 0.2 nfs01
PING nfs01 (10.10.2.20) 56(84) bytes of data.
64 bytes from 10.10.2.20: icmp_seq=1 ttl=64 time=0.424 ms
64 bytes from 10.10.2.20: icmp_seq=2 ttl=64 time=0.391 ms
...
--- nfs01 ping statistics ---
50 packets transmitted, 50 received, 0% packet loss, time 10047ms
rtt min/avg/max/mdev = 0.320/0.410/0.690/0.072 ms
What it means: no obvious ICMP loss; good sign, not proof. ICMP can be prioritized differently than NFS traffic.
Decision: if there is loss here, you have a network incident. If not, continue with TCP and interface counters.
Task 6: Check interface counters for drops and errors on the client
cr0x@server:~$ ip -s link show dev eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
link/ether 52:54:00:12:34:56 brd ff:ff:ff:ff:ff:ff
RX: bytes packets errors dropped missed mcast
912345678 854321 0 412 0 12345
TX: bytes packets errors dropped carrier collsns
812345001 743210 0 0 0 0
What it means: RX drops indicate congestion, ring buffer overflow, or upstream issues. Drops correlate strongly with RPC retrans.
Decision: if drops increment during incidents, investigate NIC offload settings, host load, switch congestion, and MTU. Don’t touch NFS timeouts yet.
Task 7: Verify MTU consistency and look for fragmentation clues
cr0x@server:~$ ip link show dev eth0 | grep -E 'mtu|state'
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
What it means: MTU is 1500. If your network uses jumbo frames inconsistently, you can get blackhole PMTU problems that look like “random NFS stalls.”
Decision: if you suspect MTU mismatch, temporarily standardize to 1500 end-to-end or ensure jumbo is consistent (hosts, switches, bonds, VLANs, overlays).
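A cheap way to test for a PMTU blackhole toward the server without changing anything: send pings with the don’t-fragment bit set at the largest payload that should fit (1472 = 1500 minus 28 bytes of IP+ICMP headers; 8972 for a 9000 MTU). If a size that should fit hangs or errors, you’ve found your “random NFS stall”:
cr0x@server:~$ ping -c 3 -M do -s 1472 nfs01
cr0x@server:~$ ping -c 3 -M do -s 8972 nfs01    # only relevant if you intend jumbo frames end-to-end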
Task 8: Check TCP retransmissions on the client
cr0x@server:~$ nstat -az | egrep 'TcpRetransSegs|TcpTimeouts'
TcpRetransSegs 1932
TcpTimeouts 41
What it means: TCP retrans and timeouts suggest network loss or severe congestion—not purely an NFS server problem.
Decision: if these spike during incidents, focus on network path and host NIC queues. If stable, focus more on server/storage latency.
Task 9: Confirm the server is reachable on the expected NFS ports and protocol
cr0x@server:~$ sudo ss -tn dst 10.10.2.20 | head
State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
ESTAB 0 0 10.10.2.15:51144 10.10.2.20:2049
ESTAB 0 0 10.10.2.15:51146 10.10.2.20:2049
What it means: active TCP connections to port 2049 (NFS). If you expected NFSv4 and see a mess of auxiliary ports, you might be on v3.
Decision: if connections flap or Recv-Q grows, you may have server-side slowness or network queueing. Proceed to server metrics.
Task 10: On the server, check nfsd thread saturation and basic health
cr0x@server:~$ sudo cat /proc/net/rpc/nfsd | head -n 20
rc 0 0 0
fh 0 0 0 0 0
io 123456 654321
th 64 0 0.000 0.000 0.000 0.000 0 0 0 0
ra 32 0 0 0 0 0 0 0 0 0 0 0
net 0 0 0 0
rpc 1234 0 0 0 0 0 0 0 0 0 0
proc2 18 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
proc3 22 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
proc4 2 0 0
What it means: the th line shows number of nfsd threads (here 64). If this number is tiny on a busy server, you can get queueing.
Decision: if nfsd threads are too low, increase them appropriately (and ensure CPU is available). If threads are plenty but stalls persist, look at backing storage latency.
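If the decision is “more nfsd threads,” the live knob and the persistent knob differ; the thread count here is an example, not a sizing recommendation:
cr0x@server:~$ sudo rpc.nfsd 128                     # immediate, non-persistent change for a test window
cr0x@server:~$ grep -A1 '^\[nfsd\]' /etc/nfs.conf    # persist it by setting threads= under the [nfsd] section
[nfsd]
threads=128
cr0x@server:~$ sudo systemctl restart nfs-server     # note: restarting the server starts a new grace period (see Task 13)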
Task 11: On the server, spot I/O wait and blocked storage quickly
cr0x@server:~$ sudo iostat -xz 1 5
Linux 6.10.0 (nfs01) 12/29/2025 _x86_64_ (16 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
12.10 0.00 4.20 38.50 0.00 45.20
Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz aqu-sz %util
nvme0n1 210.0 16800.0 0.0 0.00 8.20 80.0 95.0 9200.0 0.0 0.00 62.30 96.8 7.10 92.0
What it means: high %iowait plus high w_await and high %util suggests storage saturation or latency.
Decision: if backing storage is the bottleneck, mount options won’t cure it. You need storage capacity, cache, queue tuning, or workload shaping.
Task 12: Confirm exports and per-client access on the server
cr0x@server:~$ sudo exportfs -v
/export/shared 10.10.2.0/24(rw,wdelay,root_squash,sec=sys,async,no_subtree_check,fsid=0)
/export/backups 10.10.2.15(rw,root_squash,sec=sys,sync,no_subtree_check)
What it means: shows export options. Note the difference between async and sync, and scope by client/subnet.
Decision: if you see accidental sync on a busy export (or vice versa), you’ve found a stability/performance lever—handle carefully and test. Also verify the correct clients are allowed; access denials can look like timeouts from apps.
Task 13: Check NFS server logs for grace periods and recovery events
cr0x@server:~$ sudo journalctl -u nfs-server -S -2h | tail -n 30
Dec 29 09:11:58 nfs01 systemd[1]: Starting NFS server and services...
Dec 29 09:11:58 nfs01 kernel: NFSD: starting 64-second grace period (net ffffffff9a6a0000)
Dec 29 09:12:02 nfs01 systemd[1]: Started NFS server and services.
What it means: grace periods affect NFSv4 clients reclaiming locks. During grace, some operations can block or fail in specific ways.
Decision: if timeouts cluster around server restarts/failovers, you need to design for grace/recovery: app retries, maintenance windows, or HA that preserves state correctly.
Task 14: Validate name mapping and ID mapping (NFSv4 pain that looks like “hang”)
cr0x@server:~$ nfsidmap -c
cr0x@server:~$ nfsidmap -l | head
0x00000001: user@domain
0x00000002: nobody@domain
What it means: idmap caches can get stale or misconfigured, causing permission anomalies that get misreported as “NFS broken.”
Decision: if users see “permission denied” or odd ownership on mounted paths, verify idmapping and consistent UID/GID strategy before touching timeouts.
Task 15: Test perceived latency with a controlled small I/O loop
cr0x@server:~$ time bash -c 'for i in {1..200}; do echo $i > /mnt/shared/.lat_test.$i; done'
real 0m2.913s
user 0m0.059s
sys 0m0.351s
What it means: quick sanity check for small synchronous writes. Repeat during an incident; if real jumps wildly, you’re seeing storage or server processing latency.
Decision: if small writes become seconds-to-minutes, mount tuning won’t fix it. Investigate server disks, export options (sync/async), and server load.
Three corporate mini-stories from the trenches
Mini-story 1: The incident caused by a wrong assumption (“hard mounts are the problem”)
A mid-size company ran Debian clients mounting a shared NFS export for CI artifacts. Builds started “randomly freezing.”
The platform team noticed processes stuck in D and concluded the mount was the problem. They switched the mount to soft,timeo=50,retrans=2.
The freezes stopped. Tickets quieted down. Everyone congratulated the “stability fix.” Two weeks later, release engineering found corrupted artifacts:
tarballs that unpacked with missing files, checksums that didn’t match, and a few builds that “succeeded” but couldn’t be reproduced.
The logs showed occasional EIO from write syscalls that the build tooling didn’t treat as fatal because it wasn’t built for unreliable storage.
The underlying issue was a storage array behind the NFS server doing periodic latency spikes during snapshot cleanup.
The server wasn’t “down,” it was just sometimes unable to service RPCs in time, so clients retransmitted, then—under soft—gave up.
They rolled back to hard, then fixed the real culprit: snapshot scheduling and array performance headroom.
They also added artifact verification and made “I/O errors are fatal” an explicit policy in the CI pipeline.
The lasting lesson wasn’t “never soft mount.” It was “don’t treat semantics as tuning knobs.”
Mini-story 2: The optimization that backfired (aggressive caching and jumbo frames)
Another organization had a fast NFS server and wanted to reduce metadata load. Someone tuned attribute caching:
actimeo=60 across a fleet, plus larger rsize/wsize, plus enabling jumbo frames on new hosts.
Throughput looked great in synthetic tests. Fewer RPCs, higher MB/s. Slides were made. Promotions almost happened.
Then a subset of clients started seeing mysterious hangs and “server not responding” during peak hours.
Not all clients. Not all the time. The kind of problem that turns confident engineers into hobbyist astrologers.
The issue was MTU inconsistency across a couple of switch ports in a leaf pair.
Some paths handled jumbo frames, some silently dropped oversized packets. TCP PMTU discovery didn’t reliably recover due to
filtered ICMP “fragmentation needed” messages on a firewall segment that “wasn’t supposed to be in the path.”
The caching tweaks masked the problem at first (fewer metadata calls), but the larger I/O sizes plus jumbo made the blackhole more painful.
Fixing MTU consistency and allowing the relevant ICMP restored stability. They kept moderate caching, but stopped treating the network as a magical always-correct backplane.
Mini-story 3: The boring practice that saved the day (automount + explicit ordering)
A finance-adjacent company had a habit: every NFS mount in fstab used _netdev,x-systemd.automount,x-systemd.mount-timeout=15,
and “critical” mounts were mounted by a dedicated unit that required network-online.target plus a health check.
It was not glamorous. It did not improve benchmarks. It was basically adult supervision for boot time.
One day, a core switch rebooted unexpectedly. Half the fleet lost network for a couple minutes.
The dramatic services blipped; the boring ones recovered. Importantly, hosts didn’t wedge at boot and didn’t block unrelated services.
When the NFS server came back reachable, automount re-established mounts on demand.
They still had application errors (because network outage), but they didn’t have a second-order disaster: stuck boots, stuck deployments,
and panicked reboots making recovery slower. The postmortem had a rare line: “Mount configuration behaved as designed.”
The practice wasn’t clever. It was correct. And in production, “correct” ages better than “clever.”
Common mistakes: symptom → root cause → fix
1) “server not responding” spam, but ping is fine
Symptom: kernel logs show NFS not responding; ICMP ping shows no loss.
Root cause: server storage stalls or nfsd saturation; ICMP doesn’t reflect application RPC latency.
Fix: check server iostat for high await/util; verify nfsd threads; fix storage headroom. Don’t tune timeo blindly.
2) Boot hangs forever waiting for NFS
Symptom: rebooted node sits at “A start job is running for /mnt/shared”.
Root cause: required mount without proper systemd ordering/timeouts; network not “online” yet, or NFS server unavailable.
Fix: use _netdev, x-systemd.automount, and explicit x-systemd.mount-timeout=. For truly required mounts, gate dependent services and fail fast.
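For a service that truly must not start without the mount, a systemd drop-in is more explicit than hoping remote-fs.target covers it; myapp.service is a placeholder for your unit:
cr0x@server:~$ sudo systemctl edit myapp.service
cr0x@server:~$ # add the following to the drop-in that opens:
[Unit]
# pulls in and orders after the (auto)mount unit for this path
RequiresMountsFor=/mnt/shared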
3) Random “permission denied” after an otherwise clean mount
Symptom: mount succeeds; users see wrong ownership or denied access; people call it “NFS flaky.”
Root cause: NFSv4 idmapping mismatch (domain), inconsistent UID/GID, stale idmap cache.
Fix: standardize identity (LDAP/SSSD or consistent local IDs), configure idmap domain, clear caches via nfsidmap -c.
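The domain piece of that fix lives in /etc/idmapd.conf on both client and server (corp.example.com is a placeholder); after changing it, clear the client’s idmap cache instead of rebooting:
cr0x@server:~$ grep '^Domain' /etc/idmapd.conf
Domain = corp.example.com
cr0x@server:~$ sudo nfsidmap -c    # flush cached id mappings so the new domain takes effect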
4) Client processes stuck in D-state for hours
Symptom: unkillable processes; system “up” but unusable.
Root cause: hard mount semantics + server unreachable or stuck I/O; sometimes compounded by network blackhole.
Fix: restore server/network. If this is operationally unacceptable, redesign: local buffering, async pipelines, or separate critical paths from NFS.
5) “Stale file handle” after server maintenance or export changes
Symptom: applications fail on existing paths; new mounts work.
Root cause: server-side filesystem objects changed/moved; export remapped; underlying FS recreated; snapshots rolled back.
Fix: remount clients; avoid changing export root identity; use stable exports; document maintenance steps that invalidate handles.
6) Switching to soft “fixes” timeouts but causes silent data issues
Symptom: fewer hangs; later you see corrupted files or incomplete output without clear failures.
Root cause: soft mount returns errors mid-operation; application doesn’t treat it as fatal; partial results persist.
Fix: revert to hard for writes; add end-to-end integrity checks; fix the server/network problem causing retrans.
7) nconnect improves throughput, then timeouts get worse
Symptom: higher throughput in tests; under production load, server becomes less responsive; timeouts appear.
Root cause: increased parallelism amplifies server CPU, nfsd, or storage load; queueing increases tail latency.
Fix: reduce nconnect, increase server capacity, or isolate workloads; measure server-side latency and thread saturation.
Checklists / step-by-step plan
Checklist A: stabilize first, then tune
- Confirm mount semantics: keep hard for anything that writes important data.
- Prevent boot hostage scenarios: add _netdev,x-systemd.automount,x-systemd.mount-timeout=15 for non-critical mounts.
- Pin protocol/version: pick vers=4.1 or vers=4.2 deliberately; enforce proto=tcp.
- Measure retrans: use nfsstat -c and nstat before changing timeo.
- Fix the slow layer: network drops, server CPU saturation, or storage latency. Mount options are not horsepower.
Checklist B: mount option decisions by workload
- Databases on NFS: avoid if you can; if forced, use hard, conservative caching, validate server guarantees, and test failover behavior.
- Home directories: hard, defaults for caching, focus on server HA and network stability.
- Read-mostly shared dependencies: consider attribute caching tweaks; consider x-systemd.automount to avoid boot issues.
- Ephemeral compute nodes: automount with idle timeout reduces long-lived stale state and makes reboots less dramatic.
- WAN links: accept higher latency; tune timeo upward modestly; consider local caching layers rather than pretending WAN is a LAN.
Step-by-step: a sane change plan for mount options
- Pick one client as a canary. Do not “fleet-wide roll out” your way into learning.
- Record baseline: findmnt, nfsstat -c, nstat, and a small I/O loop timing.
- Change one dimension (e.g., pin vers=4.1, or add x-systemd.automount).
- Re-test under load; watch server iostat and nfsd stats.
- Roll out gradually. Keep rollback instructions in the ticket, not in someone’s memory.
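A minimal baseline-capture sketch for the canary step; it just snapshots the numbers you’ll compare after the change (the mount point and output path are assumptions):
cr0x@server:~$ ts=$(date +%Y%m%d-%H%M%S)
cr0x@server:~$ findmnt -t nfs,nfs4 -o TARGET,SOURCE,OPTIONS > /tmp/nfs-baseline-$ts.txt
cr0x@server:~$ nfsstat -c >> /tmp/nfs-baseline-$ts.txt
cr0x@server:~$ nstat -az | grep -E 'TcpRetransSegs|TcpTimeouts' >> /tmp/nfs-baseline-$ts.txt
cr0x@server:~$ { time bash -c 'for i in {1..200}; do echo $i > /mnt/shared/.lat_test.$i; done'; } >> /tmp/nfs-baseline-$ts.txt 2>&1
cr0x@server:~$ # attach the file to the change ticket so "after" has an honest "before" to compare against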
FAQ
1) Should I use hard or soft to prevent “hangs”?
Use hard for correctness, especially for writes. soft prevents indefinite blocking by returning errors,
but it changes semantics and can cause data corruption in poorly behaved applications.
2) What are good default values for timeo and retrans?
Start with defaults unless you have evidence. A common conservative pair is timeo=600,retrans=2 for LAN workloads.
If you’re on a lossy or high-latency link, increase timeo modestly; don’t just crank retries.
3) Why do I see “server not responding” and then “OK”?
Because the client is retrying RPCs and the server eventually responds. That pattern often indicates transient network loss,
server overload, or storage latency spikes behind the server.
4) Does x-systemd.automount make NFS more reliable?
It makes boot and service startup more resilient by avoiding hard dependency on immediate mount success. It doesn’t fix a slow server,
but it prevents a slow server from breaking unrelated parts of the system.
5) Can I “fix” NFS timeouts by changing rsize/wsize?
Sometimes you can reduce symptoms, especially in fragile networks or appliances, but it’s not a root-cause fix.
Tune for tail latency and stability, not peak throughput.
6) Is NFSv4 always better than NFSv3?
Not always, but usually. NFSv4 is easier to firewall and has a cleaner protocol story. NFSv3 can be fine with stable environments,
but it adds more auxiliary services and ports, which increases operational failure modes.
7) What does it mean when processes are stuck in D state?
They’re waiting on kernel I/O that can’t be interrupted easily. If it’s NFS-related, the kernel is waiting for RPC completion.
Solve the underlying connectivity/server stall; don’t expect kill -9 to save you.
8) Can firewall/NAT issues cause NFS timeouts even if connections exist?
Yes. Stateful firewalls timing out idle flows, asymmetric routing, MTU blackholes, and filtered ICMP can all cause stalls
that look like “NFS flakiness.” That’s why you check TCP retrans and interface drops early.
9) When is nofail appropriate?
For mounts that are helpful but not required for boot or core service health—like optional shared tools or caches.
Don’t use it for critical mounts unless you also build explicit health gating into your services.
10) If I need “never block,” what should I do instead of NFS?
Use local storage with replication, or an architecture that tolerates remote storage failure: object storage, content-addressed artifacts,
or write-behind buffering with clear durability guarantees. NFS is great, but it’s not a magic carpet.
Conclusion: what to do next
If you take one operational lesson from NFS timeouts on Debian 13, make it this: mount options are not a cure,
they’re a contract. Set the contract first (hard vs soft, protocol version, boot behavior), then measure where the system is actually slow.
Practical next steps you can do this week:
- Inventory all NFS mounts with findmnt; standardize on TCP and pin versions where appropriate.
- Add x-systemd.automount and mount timeouts for non-critical mounts to prevent boot/service hostage situations.
- Build a minimal “NFS incident pack”: nfsstat -c, nstat, ip -s link, client kernel logs, server iostat, and /proc/net/rpc/nfsd.
- Run a controlled failover/reboot test of the NFS server and watch client behavior, including grace periods, under load.
- When you must tune timeo/retrans, do it as an experiment with a canary, before you do it as a religion.