You deploy a perfectly boring container. It writes a few files. Then, at 02:17, everything freezes like it’s waiting for a permission slip.
df hangs. Your app threads pile up. Docker logs say nothing helpful. The only clue is a trail of “nfs: server not responding”.
NFS inside containerized systems fails in ways that look like application bugs, kernel bugs, or “network vibes.”
It’s usually none of those. It’s mount semantics colliding with transient network faults, DNS surprises, and a volume driver that won’t tell you what it did.
Why Docker NFS volumes time out (and why it looks random)
Docker “NFS volumes” are just Linux NFS mounts created by the host, then bind-mounted into containers.
That sounds simple, and it is—until you remember that NFS is a network filesystem with retry semantics,
RPC timeouts, locking, and state.
Most “timeouts” aren’t a single timeout
When someone reports “NFS timed out,” they usually mean one of these:
- The client is retrying forever (hard mount) and the application thread blocks in uninterruptible sleep (D state). This looks like a hang.
- The client gave up (soft mount) and returned an I/O error. This looks like corruption, failed writes, or "my app randomly errors."
- The server went away and came back, but the client’s view of the export no longer matches (stale file handles). This looks like “it worked yesterday.”
- RPC plumbing problems (especially NFSv3: portmapper/rpcbind, mountd, lockd) cause partial failures where some operations work and others time out.
- Name resolution or routing flaps cause intermittent stalls that self-heal. These are the worst because they breed superstition.
Docker amplifies the failure modes
NFS is sensitive to mount timing. Docker is very good at starting containers quickly, concurrently, and sometimes before the network is ready.
If the NFS mount is triggered on-demand, your “container start” becomes “container start plus network plus DNS plus server responsiveness.”
That’s fine in a lab. In production, it’s an elaborate way to turn minor network jitter into full-service outages.
Hard vs soft is not a performance tuning knob; it’s a risk decision
For most stateful workloads, the safe default is hard mounts: keep retrying, don’t pretend writes succeeded.
But hard mounts can hang processes when the server is unreachable. So your job is to make “server unreachable” rare and short,
and make the mount resilient to the normal chaos of networks.
There’s a paraphrased idea from Werner Vogels (Amazon CTO) that’s worth keeping in your head: “Everything fails, so design for failure.”
NFS mounts are exactly where that philosophy stops being inspirational and starts being a checklist.
Interesting facts and context (short, concrete, useful)
- NFS predates containers by decades. It originated in the 1980s as a way to share files over a network without client-side state complexity.
- NFSv3 is stateless, mostly. That made failover simpler in some ways, but pushed complexity into auxiliary daemons (rpcbind, mountd, lockd).
- NFSv4 collapsed the side-channels. v4 typically uses a single well-known port (2049) and integrates locking and state, which often improves firewall and NAT friendliness.
- “Hard mount” is the historic default for a reason. Losing data silently is worse than waiting; hard mounts bias toward correctness, not liveness.
- The Linux NFS client has multiple layers of timeouts. There's the per-RPC timeout (timeo), the number of retries (retrans), and then higher-level recovery behaviors.
- Stale file handles are a classic NFS tax. They happen when the server's inode/file handle mapping changes under the client—common after server-side failover or export changes.
- NFS over TCP wasn’t always the default. UDP was popular early on; TCP is now the sane default for reliability and congestion control.
- DNS matters more than you think. NFS clients can cache name-to-IP differently than your application; a DNS change mid-flight can produce “half the world works” symptoms.
Joke #1: NFS is like a shared office printer—when it works, nobody notices; when it doesn’t, everyone suddenly has urgent “business-critical” documents.
Fast diagnosis playbook (find the bottleneck fast)
The goal is not “collect every metric.” The goal is to decide, quickly, whether you have a network path problem,
a server problem, a client mount semantics problem, or a Docker orchestration problem.
Here’s the order that saves time.
First: confirm it’s NFS and not the app
- On the Docker host, try a simple stat or ls on the mounted path. If it hangs, it's not your app. It's the mount.
- Check dmesg for "server not responding," "timed out," or "stale file handle." Kernel messages are blunt and usually correct.
Second: decide “network vs server” with one test
- From the client, verify connectivity to port 2049 and (if using NFSv3) rpcbind/portmapper. If you can’t connect, stop blaming mount options.
- From another host in the same network segment, test the same. If the issue is isolated to one client, suspect local firewall, conntrack exhaustion, MTU, or a bad route.
Third: verify protocol version and mount options
- Check whether you’re on NFSv3 or NFSv4. Many “random” timeouts are actually rpcbind/mountd issues from NFSv3 in modern networks.
- Confirm hard, timeo, retrans, and tcp, and whether legacy flags like intr (ignored by modern kernels) are still lingering in your configs.
Fourth: inspect server-side logs and saturation
- Server load average isn’t enough. Look at NFS threads, disk latency, and network drops.
- If the server is a NAS appliance, identify whether it’s CPU-bound (encryption, checksumming) or I/O-bound (spindles, rebuild, snapshot delete).
If you do those four phases, you can usually name the failure class in under ten minutes. The long part is politics.
Mount options that improve stability (do this, not vibes)
“Best” mount options depend on whether you prefer correctness or uptime when the network misbehaves.
For most production systems with stateful writes, I’m biased toward correctness: hard mounts,
conservative timeouts, and protocol choices that reduce moving parts.
Baseline: what I’d deploy for general-purpose Docker NFS volumes
Use NFSv4.1+ when you can. Use TCP. Avoid options that “make errors go away” by returning success early.
Stability is mostly about predictable failure behavior.
- Prefer: vers=4.1 (or 4.2 if supported), proto=tcp, hard, timeo=600, retrans=2, noatime (a concrete mount line follows this list)
- Consider: nconnect=4 (Linux client) for throughput and some resilience; rsize/wsize only if you have evidence
- Avoid by default: soft, nolock (unless you truly understand locking requirements), aggressive timeo tweaks, and UDP
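Here is what that baseline looks like as an actual mount invocation. This is a minimal sketch: the server nas01, the export /exports/appdata, and the mountpoint /mnt/appdata are placeholders reused from this article's examples, not a prescription.
cr0x@server:~$ sudo mount -t nfs4 -o vers=4.1,proto=tcp,hard,timeo=600,retrans=2,noatime nas01:/exports/appdata /mnt/appdata
Whatever you pass, verify the negotiated result afterwards with nfsstat -m or findmnt; the mount helper and the kernel get the final word, not your command line.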
Hard vs soft: make the trade explicit
hard means system calls can block until the server returns, potentially forever. This protects you from silent data loss.
It also means your process can hang when the server is unreachable. That’s a feature and a liability.
soft means the kernel returns an error after retries. This is tempting because it “unsticks” your containers.
It also encourages partial writes, corrupt outputs, and applications that don’t handle EIO well (most don’t).
If you choose soft, do it for read-only or cache-like workloads, and treat errors as expected.
For databases, queues, and anything with durability claims: don’t.
Pick NFSv4 when you can, especially in container platforms
NFSv3 requires rpcbind and mountd for initial mount negotiation, plus lockd/statd for locking. Those are extra dependencies and ports.
In complex networks—firewalls, overlay networks, NAT, security groups—that’s more ways to fail.
NFSv4 consolidates much of that into port 2049 and a stateful protocol. That statefulness can introduce its own issues,
but in real-world container fleets it usually reduces “random mount” failures.
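If you are stuck on v3, at least enumerate the moving parts instead of guessing. A sketch: rpcinfo (from the rpcbind package) asks the server what it actually registered; the address is the NAS from this article's examples.
cr0x@server:~$ rpcinfo -p 10.10.8.10 | egrep 'portmapper|mountd|nfs|nlockmgr|status'
Every service that comes back is a port your firewalls and security groups must allow from every client. With NFSv4.1 the equivalent question collapses to "can I reach TCP 2049."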
timeo and retrans: stop treating them like magic numbers
timeo is the base RPC timeout, expressed in tenths of a second; the client backs off between retries.
retrans is how many times the client retransmits an RPC before declaring a major timeout: a soft mount then returns an I/O error to the application, while a hard mount logs "server not responding" and keeps retrying.
A reasonable stability posture:
- Don’t go too low. Tiny timeouts amplify transient jitter into outages.
- Don’t go too high without thought. Huge timeouts can hide real failure for too long and delay failover behavior above you.
- Lower retrans, moderate timeo often works: fail “fast enough” at the RPC level, but still retry predictably at the mount level.
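To keep the units straight, here is a worked reading of the baseline values used in this article (the exact retry schedule varies by transport and kernel version, so treat the totals as approximate):
timeo=600   -> 600 tenths of a second, i.e. 60 seconds per RPC attempt (not 600 seconds, not 600 ms)
retrans=2   -> up to two retransmissions before the client declares a major timeout
hard        -> a major timeout logs "server not responding, still trying" and the client keeps retrying
soft        -> a major timeout is returned to the application as an I/O error
If you catch yourself computing totals in milliseconds, you are probably about to turn ordinary jitter into an outage.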
nconnect: a modern tool with sharp edges
nconnect creates multiple TCP connections to the NFS server for a single mount. This can improve throughput and reduce head-of-line blocking.
It can also increase load on the server, expose firewall/conntrack limits, and make debugging more “fun” because there’s more than one flow.
Use it when you have evidence of single-connection saturation, and after validating server capacity and network state tables.
If your issue is timeouts due to packet loss or server overload, nconnect can make it worse.
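If you do experiment with it, keep the change observable. A sketch, reusing the placeholder server and paths from above; the second command simply counts the TCP flows the client opened:
cr0x@server:~$ sudo mount -t nfs4 -o vers=4.1,proto=tcp,hard,timeo=600,retrans=2,nconnect=4 nas01:/exports/appdata /mnt/appdata
cr0x@server:~$ ss -tn state established '( dport = :2049 )'
Expect roughly one established flow per unit of nconnect. If conntrack tables or firewall state limits are already tight, those extra flows are exactly where they will bite.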
Locking options: don’t disable locks to stop timeouts
nolock is a common “fix” that trades timeouts for correctness bugs. It can help with some NFSv3 lockd issues,
but it also breaks applications that rely on POSIX locks or cooperative locking patterns.
In container fleets, you rarely have the institutional memory to know who relies on locking. So don’t do this casually.
Attribute caching and consistency: choose your poison consciously
Options like actimeo, acregmin, acregmax, acdirmin, acdirmax, and nocto impact how quickly clients notice changes.
Aggressive caching can reduce metadata traffic and improve performance.
It can also make your app think a file doesn’t exist (yet) or that it’s still the old version.
For shared-write workloads, keep caching conservative.
For read-mostly assets, you can increase caching, but confirm your deployment pattern won’t trip over “delayed visibility.”
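As an illustration of the read-mostly case, a longer attribute cache window can be set with a single option. A sketch only: the 60-second window and the export path are assumptions, not recommendations.
cr0x@server:~$ sudo mount -t nfs4 -o vers=4.1,proto=tcp,hard,ro,noatime,actimeo=60 nas01:/exports/shared /mnt/shared
actimeo sets all four ac* values at once, so changes made by other clients can stay invisible here for up to a minute. Fine for static assets; very much not fine for shared-write data.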
When you should consider bg, automount, and systemd ordering
Many Docker NFS “timeouts” aren’t runtime at all; they’re boot/start ordering issues.
Your host comes up, Docker starts, containers start, and then the network stack finishes negotiating routes.
NFS mounts that happen during this window behave badly.
A practical solution is to use systemd automount units or at least ensure mounts require network-online.
Docker will happily start containers and block them on I/O; you want the mount ready before workloads depend on it.
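A minimal sketch of that pattern with systemd units, assuming a mountpoint of /mnt/appdata and the nas01:/exports/appdata export used elsewhere in this article (systemd derives unit names from the mountpoint path, hence mnt-appdata):

# /etc/systemd/system/mnt-appdata.mount
[Unit]
Description=NFS mount for appdata
After=network-online.target
Wants=network-online.target

[Mount]
What=nas01:/exports/appdata
Where=/mnt/appdata
Type=nfs4
Options=vers=4.1,proto=tcp,hard,timeo=600,retrans=2,noatime

# /etc/systemd/system/mnt-appdata.automount
[Unit]
Description=Automount for /mnt/appdata

[Automount]
Where=/mnt/appdata
TimeoutIdleSec=600

[Install]
WantedBy=multi-user.target

Enable the automount (systemctl enable --now mnt-appdata.automount), point Docker at /mnt/appdata as a bind mount, and the mount lifecycle stops being Docker's problem.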
Joke #2: The easiest way to reduce NFS timeouts is to stop calling them “timeouts” and start calling them “future career opportunities.”
Docker configuration patterns that don’t sabotage you
Pattern 1: Docker local volume driver with NFS options (fine, but verify what it mounted)
Docker’s local volume driver can mount NFS using type=nfs and o=....
This is common in Compose and Swarm.
The trap: people assume Docker “does something smart.” It doesn’t. It passes options to the mount helper.
If the mount helper falls back to another version or ignores an option, you may not notice.
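For reference, this is roughly the volume definition behind the docker volume inspect output in Task 2 below; the address and export are that example's values, and Docker will pass these strings through without judging them:
cr0x@server:~$ docker volume create --driver local \
  --opt type=nfs \
  --opt o=addr=10.10.8.10,vers=4.1,proto=tcp,hard,timeo=600,retrans=2,noatime \
  --opt device=:/exports/appdata \
  appdata
After the first container uses the volume, compare findmnt against these options. "Docker accepted the command" and "the kernel mounted what you asked for" are different claims.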
Pattern 2: Pre-mount on the host and bind-mount into containers (often more predictable)
If you pre-mount via /etc/fstab or systemd mount units, you can control ordering, retries, and observe the mount directly.
Docker then just bind-mounts a local path. This reduces “Docker magic,” which is generally good for sleep.
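A sketch of the pre-mount approach via /etc/fstab, with the same placeholder server and a /mnt/appdata mountpoint; _netdev marks the filesystem as network-dependent for boot ordering, and the image name in the second line is obviously a placeholder:
# /etc/fstab (one line)
nas01:/exports/appdata  /mnt/appdata  nfs4  vers=4.1,proto=tcp,hard,timeo=600,retrans=2,noatime,_netdev  0  0
cr0x@server:~$ docker run -d --name app -v /mnt/appdata:/data registry.example.com/app:1.0
Docker only sees a local directory; everything about the NFS mount stays visible and debuggable at the host level.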
Pattern 3: Separate mounts by workload class
Don’t use one NFS export and one mount option set for everything.
Treat NFS like a service with SLOs: low-latency metadata (CI caches), bulk throughput (media), correctness-first (stateful app data).
Different mounts, different options, different expectations.
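In Compose terms, that separation might look like the sketch below. The addresses and exports reuse this article's examples; the soft, read-only options on the assets volume are only defensible if its consumers treat I/O errors as routine.
volumes:
  appdata:
    driver: local
    driver_opts:
      type: nfs
      o: "addr=10.10.8.10,vers=4.1,proto=tcp,hard,timeo=600,retrans=2,noatime"
      device: ":/exports/appdata"
  assets:
    driver: local
    driver_opts:
      type: nfs
      o: "addr=10.10.8.10,vers=4.1,proto=tcp,ro,soft,timeo=150,retrans=3,actimeo=60,noatime"
      device: ":/exports/shared"
Two exports, two expectation sets, and nobody has to remember which one is allowed to hang.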
Practical tasks: commands, output, and the decision you make
These are the on-call moves that turn “NFS is flaky” into a clear next action. Run them on the Docker host unless noted.
Each task includes (1) command, (2) what the output means, (3) what decision to make.
Task 1: Identify which mounts are NFS and how they’re configured
cr0x@server:~$ findmnt -t nfs,nfs4 -o TARGET,SOURCE,FSTYPE,OPTIONS
TARGET SOURCE FSTYPE OPTIONS
/var/lib/docker-nfs nas01:/exports/appdata nfs4 rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.10.8.21
Meaning: Confirms NFS version, proto, and whether you’re on hard or soft. Also shows if rsize/wsize are huge and potentially mismatched.
Decision: If you see vers=3 unexpectedly, plan to move to v4 or audit rpcbind/mountd ports. If you see soft on write-heavy workloads, change it.
Task 2: Confirm Docker volume configuration (what Docker thinks it asked for)
cr0x@server:~$ docker volume inspect appdata
[
    {
        "CreatedAt": "2026-01-01T10:12:44Z",
        "Driver": "local",
        "Labels": {},
        "Mountpoint": "/var/lib/docker/volumes/appdata/_data",
        "Name": "appdata",
        "Options": {
            "device": ":/exports/appdata",
            "o": "addr=10.10.8.10,vers=4.1,proto=tcp,hard,timeo=600,retrans=2,noatime",
            "type": "nfs"
        },
        "Scope": "local"
    }
]
Meaning: This is configuration, not truth. Docker’s options can be correct while the actual mount differs.
Decision: Compare with findmnt. If they differ, troubleshoot mount helper behavior, defaults, and kernel support.
Task 3: Look for kernel NFS client errors right now
cr0x@server:~$ dmesg -T | egrep -i 'nfs:|rpc:|stale|not responding|timed out' | tail -n 20
[Fri Jan 3 01:58:41 2026] nfs: server nas01 not responding, still trying
[Fri Jan 3 01:59:12 2026] nfs: server nas01 OK
Meaning: “Not responding, still trying” indicates a hard mount retrying through a disruption.
Decision: If these events align with app hangs, investigate network drops or server stalls; don’t “fix” the app.
Task 4: Confirm the process states during a hang (is it stuck in D-state?)
cr0x@server:~$ ps -eo pid,stat,comm,wchan:40 | egrep 'D|nfs' | head
8421 D php-fpm nfs_wait_on_request
9133 D rsync nfs_wait_on_request
Meaning: D state with nfs_wait_on_request points at blocked kernel I/O waiting on NFS.
Decision: Treat as infrastructure incident. Restarting containers won’t help if the mount is hard-stuck.
Task 5: Check basic TCP connectivity to the NFS server
cr0x@server:~$ nc -vz -w 2 10.10.8.10 2049
Connection to 10.10.8.10 2049 port [tcp/nfs] succeeded!
Meaning: Port 2049 reachable right now.
Decision: If this fails during the incident, your mount options aren’t the primary problem; fix routing, ACLs, firewall, or server availability.
Task 6: If using NFSv3, confirm rpcbind is reachable (common hidden dependency)
cr0x@server:~$ nc -vz -w 2 10.10.8.10 111
Connection to 10.10.8.10 111 port [tcp/sunrpc] succeeded!
Meaning: rpcbind/portmapper reachable. Without it, NFSv3 mounts can fail or hang during mount negotiation.
Decision: If 111 is blocked and you’re on v3, move to v4 or open required ports properly (and document them).
Task 7: Identify NFS version negotiated and server address used (catch DNS surprises)
cr0x@server:~$ nfsstat -m
/var/lib/docker-nfs from nas01:/exports/appdata
Flags: rw,hard,noatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.10.8.21,local_lock=none
Meaning: Confirms negotiated settings. Note the server name vs IP, and any local_lock behavior.
Decision: If the mount uses a hostname and your DNS is unstable, switch to IP or pin host entries—then plan a better DNS story.
Task 8: Measure retransmits and RPC-level pain (is it packet loss?)
cr0x@server:~$ nfsstat -rc
Client rpc stats:
calls retrans authrefrsh
148233 912 148245
Meaning: Retransmits indicate RPCs that had to be resent. A rising retrans count correlates with loss, congestion, or server stalls.
Decision: If retrans jumps during incidents, inspect network drops and server load; consider increasing timeo modestly, not decreasing it.
Task 9: Check interface errors and drops (don’t guess)
cr0x@server:~$ ip -s link show dev eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
RX: bytes packets errors dropped missed mcast
128G 98M 0 127 0 1234
TX: bytes packets errors dropped carrier collsns
141G 92M 0 84 0 0
Meaning: Drops on RX/TX can be enough to trigger NFS “not responding” under load.
Decision: If drops grow, investigate NIC rings, MTU mismatch, switch congestion, or host CPU saturation.
Task 10: Spot MTU mismatch quickly (jumbo frames are innocent until proven guilty)
cr0x@server:~$ ping -c 3 -M do -s 8972 10.10.8.10
PING 10.10.8.10 (10.10.8.10) 8972(9000) bytes of data.
From 10.10.8.21 icmp_seq=1 Frag needed and DF set (mtu = 1500)
From 10.10.8.21 icmp_seq=2 Frag needed and DF set (mtu = 1500)
From 10.10.8.21 icmp_seq=3 Frag needed and DF set (mtu = 1500)
--- 10.10.8.10 ping statistics ---
3 packets transmitted, 0 received, +3 errors, 100% packet loss, time 2043ms
Meaning: The path MTU is 1500, but your host thinks it’s 9000. That causes blackholing and “random” stalls.
Decision: Fix MTU end-to-end or drop to 1500. Then re-evaluate NFS stability before touching mount options.
Task 11: Confirm the server export exists and permissions are sane (server-side view)
cr0x@server:~$ showmount -e 10.10.8.10
Export list for 10.10.8.10:
/exports/appdata 10.10.8.0/24
/exports/shared 10.10.0.0/16
Meaning: Shows exports (mostly useful for NFSv3 environments; for v4 it’s still a helpful hint).
Decision: If the export isn’t listed or the client subnet isn’t allowed, stop tuning the client and fix the export policy.
Task 12: Capture a short NFS packet trace during the event (prove loss vs server silence)
cr0x@server:~$ sudo tcpdump -i eth0 -nn host 10.10.8.10 and port 2049 -c 30
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
10:02:11.101223 IP 10.10.8.21.51344 > 10.10.8.10.2049: Flags [P.], seq 219:451, ack 1001, win 501, length 232
10:02:12.102988 IP 10.10.8.21.51344 > 10.10.8.10.2049: Flags [P.], seq 219:451, ack 1001, win 501, length 232
10:02:13.105441 IP 10.10.8.21.51344 > 10.10.8.10.2049: Flags [P.], seq 219:451, ack 1001, win 501, length 232
Meaning: Repeated retransmits without server replies indicates server not responding or replies not returning.
Decision: If you see client retransmits and no server response, go to server health / network return path. If you see server replies but client keeps retransmitting, suspect asymmetric routing or firewall state.
Task 13: Check Docker daemon logs for mount attempts and failures
cr0x@server:~$ journalctl -u docker --since "30 min ago" | egrep -i 'mount|nfs|volume|rpc' | tail -n 30
Jan 03 09:32:14 server dockerd[1321]: time="2026-01-03T09:32:14.112345678Z" level=error msg="error while mounting volume 'appdata': failed to mount local volume: mount :/exports/appdata:/var/lib/docker/volumes/appdata/_data, data: addr=10.10.8.10,vers=4.1,proto=tcp,hard,timeo=600,retrans=2: connection timed out"
Meaning: Confirms Docker couldn’t mount, versus the app failing later.
Decision: If mounts fail at container start, prioritize network readiness and server reachability; don’t chase runtime tuning yet.
Task 14: Inspect systemd ordering (network-online is not the same as network)
cr0x@server:~$ systemctl status network-online.target
● network-online.target - Network is Online
Loaded: loaded (/lib/systemd/system/network-online.target; static)
Active: active since Fri 2026-01-03 09:10:03 UTC; 1h 2min ago
Meaning: If this target isn’t active when mounts occur, your NFS mount may race the network.
Decision: If you see ordering problems, move mounts to systemd units with After=network-online.target and Wants=network-online.target, or use automount.
Task 15: Validate that the mount is responsive (fast sanity check)
cr0x@server:~$ time bash -c 'stat /var/lib/docker-nfs/. && ls -l /var/lib/docker-nfs >/dev/null'
real 0m0.082s
user 0m0.004s
sys 0m0.012s
Meaning: Basic metadata ops are fast. If this sometimes takes seconds or hangs, you have intermittent latency or stalls.
Decision: If metadata is slow, investigate server disk latency and NFS thread saturation; mount options won’t rescue a drowning server.
Three corporate mini-stories (how this actually fails)
1) Incident caused by a wrong assumption: “The volume is local, because Docker said ‘local’”
A mid-sized company ran a Swarm cluster for internal services. A team created a Docker volume with the local driver and NFS options.
Everyone read “local” and assumed the data lived on each node. That assumption shaped everything: failure drills, maintenance windows, even incident ownership.
During a network maintenance, one top-of-rack switch flapped. Only some nodes lost connectivity to the NAS for a few seconds.
The affected nodes had hard-mounted NFS volumes. Their containers didn’t crash; they just stopped making progress. Health checks timed out.
The orchestrator started rescheduling, but new tasks landed on the same impaired nodes because scheduling had no idea NFS was the bottleneck.
The on-call response was classic: restart the service. That just created more blocked processes. Someone tried to delete and recreate the volume.
Docker complied, but the kernel mount was still wedged. The host became a museum of stuck tasks.
The fix wasn’t heroic. They documented that “local driver” can still be remote storage, added a preflight check in deployment pipelines
to verify mount type with findmnt, and pinned NFS-critical services away from nodes that couldn’t reach the storage VLAN.
The biggest change was cultural: storage stopped being “someone else’s problem” the moment containers entered the picture.
2) Optimization that backfired: “We lowered timeouts so failures fail fast”
Another organization had an intermittent issue: applications would hang when NFS hiccuped. Someone proposed a “simple” change:
switch to soft, lower timeo, and raise retrans so the client would give up quickly and the app could handle it.
This looked reasonable in a ticket, because everything looks reasonable in a ticket.
In practice, the applications were not built to handle mid-stream EIO on writes.
A background worker wrote to a temp file and then renamed it into place. Under soft mounts and low timeouts,
the write sometimes failed but the workflow didn’t always propagate the error. The rename happened with partial content.
Downstream tasks processed garbage.
The incident wasn’t a clean outage; it was worse. The system stayed “up” while producing wrong results.
That triggered a slow-motion response: rollback, reprocess, audit outputs. Eventually the mount options were reverted.
Then they fixed the real problem: intermittent packet loss from a misconfigured LACP bond and an MTU mismatch that only appeared under load.
The takeaway they wrote into their internal runbook was painfully accurate: “Fail fast” is great when the failure is surfaced reliably.
Soft mounts made the failure easier to ignore, not easier to handle.
3) Boring but correct practice that saved the day: pre-mount + automount + explicit dependencies
A financial services shop ran stateful batch jobs in containers, writing artifacts to NFS.
They had a dull rule: NFS mounts are managed by systemd, not by Docker volume creation at runtime.
Every mount had an automount unit, a defined timeout, and a dependency on network-online.target.
One morning, a routine reboot cycle hit a node while the NAS was being patched. The NAS was reachable but slow for a few minutes.
Containers started, but their NFS-backed paths were automounted only when needed. The automount attempt waited, then succeeded when the NAS recovered.
The jobs started slightly late, and nobody woke up.
The difference was not better hardware. It was that the mount lifecycle wasn’t coupled to container lifecycle.
Docker didn’t get to decide when the mount happened, and failures were visible at the system level with clear logs.
That’s the kind of practice executives never praise because nothing happened. It’s also the practice that keeps you employed.
Common mistakes: symptom → root cause → fix
1) Symptom: containers “freeze” and won’t stop; docker stop hangs
Root cause: hard-mounted NFS is stalled; processes are in D state waiting for kernel I/O.
Fix: restore connectivity/server health; don’t expect signals to work. If you must recover a node, unmount after the server is back, or reboot the host as a last resort. Prevent recurrence with stable networking and sane timeo/retrans.
2) Symptom: “works on one node, times out on another”
Root cause: per-node routing/firewall/MTU differences, or conntrack exhaustion on a subset of nodes.
Fix: compare ip route, ip -s link, and firewall rules. Validate MTU with DF pings. Ensure identical network configuration across the fleet.
3) Symptom: mounts fail at boot or right after host restart
Root cause: mount attempts race network readiness; Docker starts containers before network-online is true.
Fix: manage mounts via systemd units with explicit ordering, or use automount. Avoid on-demand mounts initiated by container start.
4) Symptom: intermittent “permission denied” or weird identity issues
Root cause: UID/GID mismatch, root-squash behavior, or NFSv4 idmapping problems. Containers make this worse because user namespaces and image users vary.
Fix: standardize UID/GID for writers, validate server export options, and for NFSv4 confirm idmapping configuration. Don’t paper over this with 0777; that’s not stability, that’s surrender.
5) Symptom: frequent “stale file handle” after NAS failover or export maintenance
Root cause: server-side file handle mapping changed; clients hold references that no longer resolve.
Fix: avoid moving/rewriting exports underneath clients; use stable paths. For recovery, remount and restart affected workloads. For architecture, prefer stable HA methods supported by your NAS and NFS version, and test failover with real clients.
6) Symptom: “random” mount failures only in secured networks
Root cause: NFSv3 dynamic ports blocked; rpcbind/mountd/lockd not permitted through firewall/security groups.
Fix: move to NFSv4 where possible. If stuck on v3, pin daemon ports server-side and open them intentionally—then document them so the next person doesn’t “optimize” your firewall.
7) Symptom: high latency spikes, then recovery, repeating under load
Root cause: server disk latency (rebuild/snapshot work), NFS thread saturation, or congested network queues.
Fix: measure server-side I/O latency and NFS service threads; fix the bottleneck. Client options like rsize/wsize won’t save a saturated array.
8) Symptom: switching to soft “fixes” hangs but introduces mysterious data issues
Root cause: soft mounts turn outages into I/O errors; applications mishandle partial failures.
Fix: revert to hard for stateful writes, fix the underlying connectivity, and update apps to handle errors where appropriate.
Checklists / step-by-step plan
Step-by-step: stabilize an existing Docker NFS deployment
- Inventory mounts with findmnt -t nfs,nfs4. Write down vers, proto, hard/soft, timeo, retrans, and whether you used hostnames.
- Confirm reality with nfsstat -m. If Docker says one thing and the kernel did another, trust the kernel.
- Decide protocol: prefer NFSv4.1+. If you're on v3, list firewall dependencies and failure cases you can't tolerate.
- Fix the network before tuning: validate MTU end-to-end; eliminate interface drops; verify routing symmetry; ensure port 2049 stability.
- Pick mount semantics:
  - Stateful writes: hard, moderate timeo, low-ish retrans, TCP.
  - Read-only/cache: consider soft only if your app handles EIO and you're okay with "error instead of hang."
- Make mounts predictable: pre-mount via systemd or use automount. Avoid runtime mounts triggered by container start where possible.
- Test failure: unplug the server network (in a lab), reboot a client, flap a route, and observe (a blackhole sketch follows this list). If your test is "wait and hope," you're not testing.
- Operationalize: add dashboards for retransmits, interface drops, server NFS thread saturation, and disk latency. Add an on-call runbook that starts with the Fast diagnosis playbook above.
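One cheap way to rehearse "server unreachable" on a lab client is to blackhole the NFS port locally and watch what the mount, the D-states, and the health checks do. The address is the NAS from this article's examples; run this on a test host only.
cr0x@server:~$ sudo iptables -I OUTPUT 1 -d 10.10.8.10 -p tcp --dport 2049 -j DROP
cr0x@server:~$ sudo iptables -D OUTPUT -d 10.10.8.10 -p tcp --dport 2049 -j DROP
If recovery after the second command isn't clean and boring, fix that before the real network runs the same experiment for you.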
A short checklist for mount options (stability-first)
- Use TCP: proto=tcp
- Prefer NFSv4.1+: vers=4.1 (or 4.2 if supported)
- Correctness-first: hard
- Don't over-tune: start with timeo=600, retrans=2 and adjust only with evidence
- Reduce metadata churn: noatime for typical workloads
- Be cautious with actimeo and friends; caching is not "free performance"
- Consider nconnect only after measuring server and firewall capacity
FAQ
1) Should I use NFSv3 or NFSv4 for Docker volumes?
Use NFSv4.1+ unless you have a specific compatibility reason not to. In container-heavy networks, fewer auxiliary daemons and ports usually means fewer “random” mount failures.
2) Is soft ever acceptable?
Yes—for read-only or cache-like data where an I/O error is preferable to a hang, and where your application is built to treat EIO as normal. For stateful writes, it’s a footgun.
3) Why does docker stop hang when NFS is down?
Because the processes are blocked in kernel I/O on a hard-mounted filesystem. Signals can’t interrupt a thread stuck in uninterruptible sleep. Fix the mount’s underlying reachability.
4) What do timeo and retrans actually do?
They govern RPC retry behavior. timeo is the base timeout for an RPC; retrans is how many retries occur before the client reports “not responding” events (and for soft mounts, before failing I/O).
5) Should I tune rsize and wsize to huge values?
Not by superstition. Modern defaults are often good. Oversized values can interact badly with MTU, server limits, or network drops. Tune only after measuring throughput and retransmits.
6) Does using an IP instead of a hostname help?
It can. If DNS is flaky, slow, or changes unexpectedly, using an IP avoids name resolution as a failure dependency. The trade is losing easy server migration unless you manage the IP as a stable endpoint.
7) What causes “stale file handle” and how do I prevent it?
It’s usually caused by server-side changes that invalidate file handles: export path moves, failover behavior, or filesystem changes under the export. Prevent it by keeping exports stable and using HA methods your NAS supports with real client testing.
8) Should I mount via Docker volumes or pre-mount on the host?
Pre-mounting (systemd mounts/automount) is often more predictable and easier to debug. Docker-volume mounting is workable, but it couples mount lifecycle to container lifecycle, which is not where you want your reliability story to live.
9) What about nolock to fix hangs?
Avoid it unless you’re absolutely sure your workload doesn’t rely on locks. It can “fix” lockd-related issues in NFSv3 by disabling locking, but that trades outages for correctness bugs.
10) If my NFS server is fine, why do only some clients see timeouts?
Because “server is fine” is often shorthand for “it answered a ping.” Client-local issues like MTU mismatch, asymmetric routing, conntrack limits, and NIC drops can selectively break NFS while leaving other traffic mostly okay.
Conclusion: next steps that reduce pager load
If you’re fighting Docker NFS volume timeouts, don’t start by twiddling timeo like it’s a radio dial.
Start by naming the failure: network path, server saturation, protocol version friction, or orchestration timing.
Then make a deliberate choice about semantics: correctness-first hard mounts for stateful writes, and only carefully scoped soft mounts for disposable data.
Practical next steps you can do this week:
- Audit every Docker host with findmnt and nfsstat -m; record actual options and NFS version.
- Standardize on NFSv4.1+ over TCP unless you have a reason not to.
- Fix MTU and drop counters before changing mount tuning.
- Move critical mounts to systemd-managed mounts (ideally automount) with explicit network-online ordering.
- Write a runbook based on the Fast diagnosis playbook, and practice it once while it’s quiet.
The endgame isn’t “NFS never blips.” The endgame is: when it blips, it behaves predictably, recovers cleanly, and doesn’t turn your containers into modern art.