ZFS SMB: Fixing “Windows Copy Is Slow” for Real

Windows Explorer says “Copying… 112 MB/s” for three seconds, then it drops to 0 B/s and sits there like it’s thinking about its life choices. Users blame “the network.” Network blames “storage.” Storage blames “Windows.” Everyone is wrong in a different way.

If you run ZFS-backed SMB (usually Samba on Linux, sometimes on a storage appliance), you can make Windows copies consistently fast. But you don’t do it by turning random knobs. You do it by proving where the latency comes from, then fixing the specific part that’s lying.

What “slow copy” actually means (and what it isn’t)

“Windows copy is slow” is not a single problem. It’s a user-visible symptom of a pipeline that includes: Windows client behavior, SMB protocol semantics, Samba implementation details, ZFS transaction groups and write paths, and physical media latency. Your job is to find the stage that turns bandwidth into waiting.

The three copy patterns you must separate

  • Large sequential copies (e.g., ISO, VHDX): should run near line rate until the server can’t commit writes fast enough.
  • Many small files (e.g., source trees): dominated by metadata (create, setattr, close, rename), not throughput.
  • Mixed workloads (home shares + VMs + scanners): “slow” is often head-of-line blocking: one bad pattern ruins the queue for everyone.

What it usually is not

It’s rarely “SMB is slow.” SMB3 can be very fast. It’s rarely “ZFS is slow.” ZFS can saturate serious networks. It’s usually latency spikes from sync writes, small random I/O, metadata amplification, or bad caching alignment, made visible by a client that reports speed in optimistic bursts.

One more framing shift: Windows Explorer is not a benchmark tool; it’s an anxiety visualizer. That graph is more mood ring than oscilloscope.
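
Before chasing any of this further, make the workload repeatable instead of trusting Explorer’s graph. A minimal sketch for building both copy patterns on the server (paths and sizes are arbitrary examples, not recommendations):

cr0x@server:~$ # one large sequential file of incompressible data, so compression doesn't flatter the numbers
cr0x@server:~$ dd if=/dev/urandom of=/tank/shares/engineering/testset-big.bin bs=1M count=4096 status=progress
cr0x@server:~$ # many small files (10,000 x 8 KiB) to exercise the metadata path
cr0x@server:~$ mkdir -p /tank/shares/engineering/testset-small
cr0x@server:~$ for i in $(seq 1 10000); do head -c 8192 /dev/urandom > /tank/shares/engineering/testset-small/f$i; done

Copy each set from the same Windows client before and after every change. If only the small-file set stalls, you already know which half of the pipeline to interrogate.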

Interesting facts and historical context (so the behavior makes sense)

  1. SMB1 vs SMB2/3 changed everything. SMB2 (Vista/2008 era) reduced protocol chattiness and added larger reads/writes and request pipelining. Many “SMB is slow” stories are really “you’re stuck on SMB1.”
  2. Samba started as a reverse-engineered project. It grew from “make UNIX talk to Windows” into an enterprise-grade SMB server. Some defaults are conservative because Samba has to survive weird clients.
  3. ZFS writes are grouped. ZFS commits data in transaction groups (TXGs). That makes throughput great, but it also creates visible “pulse” behavior if the commit phase stalls.
  4. Sync writes are a promise, not a feeling. When an SMB client requests durability, ZFS must commit safely. If your pool can’t do low-latency fsync, you get the classic “fast then zero” copy graph.
  5. SMB durable handles and leases changed close/open behavior. Modern Windows caches aggressively. That’s good, until an app forces durability semantics and turns caching into synchronous pain.
  6. Recordsize matters more for file shares than people admit. ZFS recordsize shapes I/O amplification. Wrong recordsize doesn’t just waste space—it forces extra IOPS under small random access.
  7. Compression often helps SMB, even on fast CPUs. Many office files compress well, reducing disk and network load. The win is often latency, not bandwidth.
  8. SMB signing became more common for security. Enabling signing can be a CPU tax. The “secure” setting sometimes becomes “securely slow” when server CPU is weak or single-thread limited.

Fast diagnosis playbook

This is the order that finds the bottleneck quickly, without falling into the “tune everything” trap.

First: classify the workload

  • One big file? Many small files? Application writes with durability requirements?
  • Does speed drop at fixed intervals (every few seconds)? That smells like TXG commit latency.
  • Does it only happen on certain shares? That smells like dataset properties or SMB share options.

Second: decide if it’s network, CPU, or storage latency

  • Network: interface errors, retransmits, wrong MTU, bad LACP hashing, Wi‑Fi clients pretending to be servers.
  • CPU: one core pinned in smbd, signing/encryption overhead, interrupts, softirq saturation.
  • Storage latency: high await on vdevs, ZFS sync path blocked, SLOG missing/slow, pool near-full or fragmented.

Third: validate sync behavior (this is the usual villain)

  • Check dataset sync property and Samba settings that force sync (e.g., strict sync).
  • Measure fsync latency from the SMB host itself, not from your laptop.
  • If you need sync semantics, ensure a proper SLOG device (power-loss protected) or accept the performance limits of your main vdevs.

Fourth: isolate the “many small files” case

  • Metadata is the workload. Check atime, xattr behavior, and small-block performance.
  • Verify that your pool layout matches metadata IOPS expectations (mirrors vs RAIDZ tradeoffs).

Fifth: tune only what the measurements implicate

If you can’t show a before/after in I/O latency, CPU utilization, or retransmits, you aren’t tuning—you’re decorating.

Stop guessing: measure where the time goes

SMB copies are a negotiation between a client that buffers and a server that commits. Explorer reports “speed” based on how fast data is accepted into buffers, not how fast it is durably written. Meanwhile, ZFS can accept data quickly into ARC and dirty buffers, then pause while committing TXGs. That pause is where the graph hits zero.

Your measurement plan should answer three questions:

  1. Is the client waiting on the server (latency), or is it not sending (client throttling)?
  2. Is the server waiting on disk flushes (sync path) or on CPU (signing/encryption) or on the network?
  3. Is ZFS amplifying the workload (recordsize mismatch, fragmentation, metadata pressure)?

Reliability engineering has a simple rule that applies here: measure the system you have, not the system you wish you had.

Paraphrased idea (Gene Kim): “Improving flow means finding and removing the constraint.” That’s the whole game.

ZFS realities that bite SMB

TXGs and the “fast then zero” pattern

ZFS accumulates dirty data in memory and periodically commits it to disk as a transaction group. If the commit phase takes too long, the system throttles writers. From the client’s view: fast burst, then stall. Repeat. That’s not “network jitter.” It’s storage durability catching up.
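
If you want to see that rhythm directly, OpenZFS on Linux exposes per-pool TXG statistics in procfs. A minimal sketch, assuming the pool name used throughout this article:

cr0x@server:~$ # one line per TXG; otime/qtime/wtime/stime are nanoseconds spent in each phase of the commit
cr0x@server:~$ watch -n 1 "tail -n 5 /proc/spl/kstat/zfs/tank/txgs"

If the sync-phase time balloons at the same moment Explorer’s graph hits zero, you have found the stall.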

Sync writes: the durability tax

When the workload issues synchronous writes (or when the server treats them as such), ZFS must ensure data is on stable storage before acknowledging. On pools without a fast intent log device, sync writes hit your main vdevs. If those are RAIDZ with HDDs, you can predict the result: pain with a timestamp.
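
To put a number on that tax, measure per-operation durability latency rather than bandwidth. A minimal sketch, assuming fio is installed and reusing the example share path:

cr0x@server:~$ # 4K random writes with an fsync after every write; read the fsync latency percentiles in the output
cr0x@server:~$ fio --name=synclat --directory=/tank/shares/engineering --rw=randwrite --bs=4k --size=256m --fsync=1 --runtime=30 --time_based --group_reporting

As a rough rule, low single-digit-millisecond fsync latency is workable; tens of milliseconds per fsync is the “fast then zero” graph expressed numerically.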

Recordsize, ashift, and I/O amplification

ZFS recordsize controls the maximum block size for file data. SMB file shares often store mixed file sizes; a recordsize too large won’t always hurt sequential reads, but it can hurt random writes and partial overwrites. Too small can increase metadata overhead and reduce compression efficiency.
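
Recordsize is a per-dataset property and only applies to blocks written after the change, so measure before and after instead of assuming. A minimal sketch against the example dataset:

cr0x@server:~$ zfs get recordsize tank/shares/engineering              # confirm the current value first
cr0x@server:~$ sudo zfs set recordsize=128K tank/shares/engineering    # existing files keep their old block size until rewritten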

Metadata is not “free”

Small file copies stress metadata: directory entries, ACLs, xattrs, timestamps. ZFS can handle this well, but only if the pool layout and caching are sensible. If you built a wide RAIDZ for capacity and then turned it into a metadata-heavy SMB share, you basically bought a bus and entered it in a motorcycle race.

Pool fullness and fragmentation

As pools get full, allocation becomes harder, fragmentation rises, and latency climbs. SMB users experience this as “it was fine last month.” ZFS doesn’t suddenly forget how to write; it runs out of easy places to put blocks.
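
Fullness and fragmentation are visible straight from zpool, no zdb required. A quick sketch:

cr0x@server:~$ # CAP is percent used, FRAG is free-space fragmentation; both climbing together is the "it was fine last month" signature
cr0x@server:~$ zpool list -o name,size,allocated,free,fragmentation,capacity,health tank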

SMB realities that bite ZFS

Windows copy semantics: buffering, close, and durability

Windows can buffer writes and only force durability at file close, depending on application flags and server configuration. Some apps (and some security tools) request write-through semantics. That flips your workload from “mostly async” to “sync-heavy” instantly.
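
If you suspect a client or app has flipped a share into write-through mode, watch the sync calls smbd issues on its behalf. A minimal sketch; the PID is the per-client smbd process from smbstatus, and strace adds overhead, so keep it brief:

cr0x@server:~$ # count fsync/fdatasync calls made by one client's smbd; press Ctrl-C after ~30 seconds to get the summary table
cr0x@server:~$ sudo strace -f -c -e trace=fsync,fdatasync -p 23144

A copy that generates thousands of fsyncs per minute is a sync-heavy workload, whatever the application claims about itself.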

Signing and encryption: security has a CPU bill

SMB signing is often mandated by policy. Encryption might be enabled for certain shares. Both consume CPU. If your SMB server is a modest CPU with a fast NIC, you can hit a ceiling where the network is idle and one core is sweating bullets in crypto.
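
Aggregate CPU numbers hide the single hot core that crypto creates. A minimal sketch using pidstat and mpstat from sysstat (assumed installed):

cr0x@server:~$ # per-thread CPU for all smbd processes; look for one thread pinned near 100%
cr0x@server:~$ pidstat -t -u -p $(pgrep -d, smbd) 1 10
cr0x@server:~$ # per-core view, including softirq (%soft) pressure from the NIC
cr0x@server:~$ mpstat -P ALL 1 5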

SMB3 Multichannel: great when it works, irrelevant when it doesn’t

Multichannel can use multiple NICs and RSS queues. When misconfigured, you get exactly one TCP flow stuck on one queue. Then someone says “but we have dual 10GbE” as if the server is obligated to care.
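
Before arguing about Multichannel, confirm the server actually offers it and the NIC exposes multiple queues. A minimal sketch; run ethtool against the physical NICs under the bond, and the interface name here is a placeholder:

cr0x@server:~$ # Samba side: multichannel support must be enabled (it is on by default in recent releases)
cr0x@server:~$ sudo testparm -sv 2>/dev/null | grep -i "multi channel"
cr0x@server:~$ # NIC side: RSS queues configured vs. what the hardware supports
cr0x@server:~$ ethtool -l enp65s0f0

On the Windows side, Get-SmbMultichannelConnection in PowerShell shows whether the client actually opened multiple connections.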

Opportunistic locks, leases, and antivirus

Client caching (oplocks/leases) reduces chatter. But endpoint security scanners love to open files, force attribute updates, and generally break caching behavior. This can turn a “many small files” copy into a syscall festival.
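
To see whether client caching is actually being granted, list current locks and their oplock/lease types. A quick sketch:

cr0x@server:~$ # locked files with their oplock/lease grants; NONE everywhere means clients are doing every operation over the wire
cr0x@server:~$ sudo smbstatus -L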

Joke #1: SMB troubleshooting is like office politics—everyone insists they’re the bottleneck, and somehow they’re all correct.

Practical tasks: commands, outputs, decisions

Below are real tasks you can run on the SMB/ZFS server. Each includes what the output means and what decision you should make next. These are biased toward Linux + Samba + OpenZFS, because that’s where most “Windows copy is slow” tickets live.

Task 1: Confirm pool health (because performance is often a symptom of a dying disk)

cr0x@server:~$ sudo zpool status -v tank
  pool: tank
 state: ONLINE
  scan: scrub repaired 0B in 04:12:19 with 0 errors on Tue Dec 10 03:20:01 2025
config:

        NAME                        STATE     READ WRITE CKSUM
        tank                        ONLINE       0     0     0
          mirror-0                  ONLINE       0     0     0
            ata-SAMSUNG_SSD_1TB_A   ONLINE       0     0     0
            ata-SAMSUNG_SSD_1TB_B   ONLINE       0     0     0

errors: No known data errors

Meaning: “ONLINE” plus a clean scrub means you’re not fighting silent retries or resilver load. If you see DEGRADED, resilvering, or checksum errors, stop performance tuning and fix hardware first.

Decision: If any vdev shows errors or resilver activity, schedule remediation and retest performance after stabilization.

Task 2: Check pool fullness (near-full pools get slow in boring, predictable ways)

cr0x@server:~$ zfs list -o name,used,avail,refer,mountpoint tank
NAME   USED  AVAIL  REFER  MOUNTPOINT
tank   38.2T 2.1T   96K    /tank

Meaning: ~95% full (38.2T used, 2.1T avail) is danger territory for many workloads. Allocation becomes constrained; fragmentation rises.

Decision: If you’re above ~80–85% used and performance matters, plan space reclamation or expansion. No Samba knob will beat physics.

Task 3: Identify which dataset backs the SMB share and dump its key properties

cr0x@server:~$ sudo zfs get -H -o property,value recordsize,compression,atime,sync,xattr,acltype,primarycache,logbias tank/shares/engineering
recordsize	1M
compression	lz4
atime	on
sync	standard
xattr	sa
acltype	posixacl
primarycache	all
logbias	latency

Meaning: You have 1M recordsize (good for large sequential files, risky for partial overwrites), atime is on (extra metadata writes), sync standard (sync honored), xattr in SA (often good), logbias latency (prefers SLOG if present).

Decision: If the share is “many small files,” consider recordsize=128K and atime=off. If it’s VM images, treat it differently (and probably not via SMB).

Task 4: Measure pool I/O latency during a copy (the truth is in iostat)

cr0x@server:~$ sudo zpool iostat -v tank 1 5
                              capacity     operations     bandwidth
pool                        alloc   free   read  write   read  write
--------------------------  -----  -----  -----  -----  -----  -----
tank                        38.2T  2.1T     12   2400   3.1M   210M
  mirror-0                  38.2T  2.1T     12   2400   3.1M   210M
    ata-SAMSUNG_SSD_1TB_A      -      -      6   2400   1.6M   210M
    ata-SAMSUNG_SSD_1TB_B      -      -      6   2380   1.5M   208M
--------------------------  -----  -----  -----  -----  -----  -----

Meaning: High write ops (2400/s) with only moderate bandwidth suggest small writes or sync-heavy behavior. Each mirror leg receives every write, so per-disk write ops track the vdev total. If bandwidth is low but ops are high, you’re IOPS-bound or flush-bound.

Decision: If writes are small and frequent, investigate sync semantics, metadata load, and recordsize mismatch. If ops are low and bandwidth is low, suspect network or SMB throttling.

Task 5: Observe per-vdev latency with iostat (await is the smoke alarm)

cr0x@server:~$ sudo iostat -x 1 3
Linux 6.6.15 (server) 	12/25/2025 	_x86_64_	(16 CPU)

Device            r/s     w/s   rkB/s   wkB/s  avgrq-sz avgqu-sz   await  r_await  w_await  svctm  %util
nvme0n1           2.0   950.0    64.0 118000.0   248.0     3.20    3.4     1.2      3.4    0.6   58.0
nvme1n1           1.0   910.0    32.0 112000.0   246.0     3.05    3.3     1.1      3.3    0.6   55.0

Meaning: ~3.3ms write await is fine for NVMe. If you see tens/hundreds of ms during copies, the storage is gating your throughput.

Decision: High await + low CPU + clean network = storage path problem (sync writes, full pool, slow vdevs, or SLOG issues).

Task 6: Check whether you even have a SLOG (and whether it’s doing anything)

cr0x@server:~$ sudo zpool status tank | sed -n '1,120p'
  pool: tank
 state: ONLINE
config:

        NAME                         STATE     READ WRITE CKSUM
        tank                         ONLINE       0     0     0
          mirror-0                   ONLINE       0     0     0
            ata-SAMSUNG_SSD_1TB_A    ONLINE       0     0     0
            ata-SAMSUNG_SSD_1TB_B    ONLINE       0     0     0
        logs
          nvme-SLOG_INTEL_OPTANE     ONLINE       0     0     0

Meaning: There is a separate log device. Good. But existence isn’t performance; it must be fast and power-loss protected.

Decision: If sync-heavy workloads exist and there’s no SLOG, decide whether you need sync semantics. If you do, add a proper SLOG. If you don’t, don’t fake it with sync=disabled unless you are comfortable losing acknowledged data on power loss.

Task 7: Check ARC behavior (rule caching in or out)

cr0x@server:~$ sudo arcstat 1 3
    time  read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%  arcsz     c  avail
12:10:01   320    12      4     1    0    11    3     0    0   84.2G  96.0G  21.4G
12:10:02   410    16      3     2    0    14    3     0    0   84.2G  96.0G  21.4G
12:10:03   390    10      2     1    0     9    2     0    0   84.2G  96.0G  21.4G

Meaning: ARC isn’t missing badly; caching is not your current constraint. If ARC is tiny vs memory, or misses spike, you might be memory-bound or misconfigured.

Decision: If ARC misses are high during reads, tune memory/ARC sizing or working set (or accept that the workload doesn’t cache well).

Task 8: Confirm Samba version and whether SMB3 is negotiated

cr0x@server:~$ smbd -V
Version 4.19.5-Debian

cr0x@server:~$ sudo smbstatus -b | sed -n '1,60p'
Samba version 4.19.5-Debian
PID     Username     Group        Machine                                   Protocol Version  Encryption           Signing
-------------------------------------------------------------------------------------------------------------------------
23144   user1        domain users  10.10.20.55 (ipv4:10.10.20.55:53122)     SMB3_11           -                    partial

Meaning: SMB3_11 is negotiated, good. Signing is “partial” (depends on config and client). If you see SMB1, you have bigger problems than tuning.

Decision: If SMB1 appears, disable it and fix client compatibility intentionally. Do not keep SMB1 “for that one legacy scanner.” Replace the scanner or isolate it.

Task 9: Check Samba share configuration for sync killers (strict sync, sync always)

cr0x@server:~$ sudo testparm -sv | sed -n '/^\[engineering\]/,/^\[/{p}'
[engineering]
	path = /tank/shares/engineering
	read only = No
	vfs objects = acl_xattr
	strict sync = Yes
	sync always = No

Meaning: strict sync = Yes makes Samba honor client flush requests instead of ignoring them (this has been the default since Samba 4.9). That’s the safer behavior, but on storage that can’t commit quickly it can crater throughput.

Decision: If you don’t have a hard compliance need for strict semantics, set strict sync = No and validate application correctness. If you do need it, invest in SLOG and low-latency storage.
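
If the decision is to drop strict semantics for a share, the change is one line in that share’s definition; reload Samba and rerun the same test transfer. A minimal sketch based on the share above:

[engineering]
	path = /tank/shares/engineering
	read only = No
	vfs objects = acl_xattr
	strict sync = No

cr0x@server:~$ sudo smbcontrol all reload-config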

Task 10: Check whether SMB signing/encryption is on and whether CPU is the limiter

cr0x@server:~$ sudo smbstatus -b | awk 'NR==1 || NR==2 || $0 ~ /SMB3/'
Samba version 4.19.5-Debian
PID     Username     Group        Machine                                   Protocol Version  Encryption           Signing
23144   user1        domain users  10.10.20.55 (ipv4:10.10.20.55:53122)     SMB3_11           AES-128-GCM          mandatory

cr0x@server:~$ top -b -n 1 | sed -n '1,20p'
top - 12:12:41 up 34 days,  3:01,  2 users,  load average: 9.12, 8.40, 7.95
Tasks: 291 total,   2 running, 289 sleeping,   0 stopped,   0 zombie
%Cpu(s): 12.1 us,  2.0 sy,  0.0 ni, 78.0 id,  0.0 wa,  0.0 hi,  7.9 si,  0.0 st
MiB Mem :  256000.0 total,  21000.0 free,  95000.0 used, 140000.0 buff/cache

Meaning: Encryption is enabled. CPU is mostly idle here, so encryption likely isn’t the bottleneck right now. If you see a core pinned and softirq high, revisit.

Decision: If encryption/signing is mandatory and CPU is hot, upgrade CPU, use AES-NI capable systems, ensure RSS and multiqueue are configured, or narrow encryption to specific sensitive shares.

Task 11: Verify NIC link, duplex, and error counters (cheap checks, expensive consequences)

cr0x@server:~$ ip -s link show dev bond0
2: bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 9c:dc:71:aa:bb:cc brd ff:ff:ff:ff:ff:ff
    RX:  bytes packets errors dropped  missed   mcast
    1223344556677 1023344556      0       0       0  120034
    TX:  bytes packets errors dropped carrier collsns
    1334455667788 1124455667      0       0       0       0

cr0x@server:~$ ethtool bond0 | sed -n '1,25p'
Settings for bond0:
	Supported ports: [ ]
	Supported link modes:   Not reported
	Speed: 20000Mb/s
	Duplex: Full
	Auto-negotiation: off

Meaning: No errors, full duplex, expected speed. If you see errors or drops, fix network before touching ZFS.

Decision: If errors exist: check cabling, switch ports, MTU consistency, and offload settings. Performance tuning on top of packet loss is performance theater.

Task 12: Check TCP retransmits and socket pressure (SMB over a sick network is a lie)

cr0x@server:~$ ss -s
Total: 884
TCP:   211 (estab 104, closed 72, orphaned 0, timewait 72)

Transport Total     IP        IPv6
RAW	  0         0         0
UDP	  11        8         3
TCP	  139       113       26
INET	  150       121       29
FRAG	  0         0         0

cr0x@server:~$ netstat -s | sed -n '1,80p'
Tcp:
    154239 active connection openings
    149802 passive connection openings
    1124 failed connection attempts
    1821 connection resets received
    0 connections established
    224159 segments received
    231008 segments sent out
    214 segments retransmitted

Meaning: Retransmits exist but not crazy. If retransmits jump during copies, you’ll see stalls unrelated to storage. SMB is sensitive to latency spikes.

Decision: High retransmits: inspect switch buffers, MTU mismatch, NIC driver/firmware, or overloaded firewall path.

Task 13: Identify whether the workload is sync-heavy (server-side fsync test)

cr0x@server:~$ sync; sudo bash -c 'time dd if=/dev/zero of=/tank/shares/engineering/.fsync-test bs=1M count=256 conv=fdatasync status=none'
real	0m1.92s
user	0m0.00s
sys	0m0.28s

cr0x@server:~$ sudo rm -f /tank/shares/engineering/.fsync-test

Meaning: This measures “write then force durability.” If this is slow (e.g., 10–60s), your pool can’t commit sync writes quickly enough for SMB workloads that demand them.

Decision: Slow fsync: add/validate SLOG, reduce forced sync in Samba if acceptable, or redesign storage for low-latency writes.

Task 14: Confirm dataset is not accidentally forcing sync off (or on) where you didn’t intend

cr0x@server:~$ sudo zfs get -H -o name,property,value sync tank/shares/engineering tank/shares/finance
tank/shares/engineering	sync	standard
tank/shares/finance	sync	always

Meaning: Finance share is forced sync=always. That might be intentional (apps needing durability) or a misconfiguration that makes it crawl.

Decision: If sync=always exists, confirm with app owners why. If nobody can justify it, return to standard and test.

Task 15: Check ZFS compression and actual ratio (because “we enabled compression” is not the same as “it’s working”)

cr0x@server:~$ zfs get -H -o name,property,value compression,compressratio tank/shares/engineering
tank/shares/engineering	compression	lz4
tank/shares/engineering	compressratio	1.62x

Meaning: 1.62x means you’re saving I/O and space. If ratio is ~1.00x, compression isn’t helping much but usually doesn’t hurt with LZ4.

Decision: Keep LZ4 almost always. Only disable if you have measured CPU saturation and near-incompressible data.

Task 16: Look for pathological fragmentation (especially if pool is old and near full)

cr0x@server:~$ sudo zdb -bbbs tank | sed -n '1,40p'
Block Size Histogram:
 512: 0
 1K : 0
 2K : 1048576
 4K : 2097152
 8K : 1048576
 16K: 524288
 32K: 262144
 64K: 131072
 128K: 65536
 256K: 32768
 512K: 16384
 1M : 8192

Meaning: This is a rough view; in real life you’ll correlate fragmentation with allocation behavior and latency. A high diversity of small blocks on a dataset meant for large sequential writes can be a clue.

Decision: If fragmentation and fullness are high, plan data migration or pool expansion. ZFS is great, but it doesn’t defragment itself by wishing.

Three corporate-world mini-stories

Mini-story 1: The incident caused by a wrong assumption

They had a brand-new ZFS file server and two 10GbE uplinks. The rollout looked fine in the first week, mostly because the test was “copy a 20GB ISO once” and everybody went home happy.

Then quarter-end hit. Finance pushed thousands of small PDFs and spreadsheets into a share from a Windows app that insisted on write-through. Users reported copies “stalling every few seconds.” The network team saw no saturation, so they declared victory and blamed Windows. The storage team saw plenty of free RAM and assumed ARC would smooth it out. It didn’t.

The wrong assumption was subtle: “If the pool can do 1GB/s sequential writes, it can do office file copies.” Those are different sports. Sequential bandwidth is a victory lap; sync-heavy metadata is the obstacle course.

Once someone ran a simple dd ... conv=fdatasync test on the dataset, it was obvious. Sync commit latency was the bottleneck. The pool was RAIDZ on HDDs. Perfect for capacity, terrible for low-latency durability.

The fix was also subtle: they didn’t disable sync. They added a proper, power-loss protected SLOG and removed strict sync from shares that didn’t require it. Finance kept their semantics; engineering got their speed back. The helpdesk tickets stopped, which is the only KPI that matters when you’re on call.

Mini-story 2: The optimization that backfired

A different company had slow home directory copies. Someone read a forum thread and decided the fix was “bigger recordsize equals faster.” So they set recordsize=1M across every SMB dataset, including home directories and shared project trees.

Large file copies improved slightly. Then complaints got weirder: saving small documents felt laggy, Outlook PST access became jittery, and some apps started “not responding” during saves. The SMB server wasn’t down; it was just busy doing extra work.

Why? Partial overwrites on large records can create write amplification. A small change in a file can trigger a read-modify-write of a large block, especially when the workload is random and the app does lots of small updates. ZFS is copy-on-write, so it’s already doing careful bookkeeping; adding amplification is like asking it to juggle on a treadmill.

The backfired “optimization” also increased metadata churn because user profiles generate a pile of tiny files and attribute updates. Bigger recordsize didn’t help the metadata path at all. It just made the data path less friendly.

The rollback was disciplined: they split datasets by workload. Home directories went to 128K recordsize, atime off. Large media/project archives stayed at 1M. Performance stabilized. The lesson stuck: tuning is not a buffet where you pile on whatever looks tasty.

Mini-story 3: The boring but correct practice that saved the day

A team running a ZFS + Samba cluster had one unglamorous habit: weekly scrub reports and monthly baseline performance snapshots. Not dashboards for the executive wall. Just a text file with zpool status, zpool iostat under load, and basic NIC error counters.

One Tuesday, users reported that copies had become “spiky.” The on-call engineer didn’t guess. They pulled the baseline and compared it to current numbers. The big change: write latency on one mirror leg had drifted up, and correctable errors were appearing—just enough to trigger retries, not enough to fail the disk.

Because they had baseline data, they didn’t spend half a day arguing about Samba flags. They replaced the disk during a maintenance window, resilvered, and the copy stalls vanished.

Nothing heroic happened. No magic tunables. Just noticing that “performance regression” is often “hardware aging slowly.” This is what boring competence looks like in production.

Tuning decisions that actually move the needle

1) Decide your sync stance explicitly (don’t let it happen to you)

SMB workloads can be sync-heavy, especially with certain applications and policies. You have three choices (a per-dataset sketch follows the list):

  • Honor sync and pay for it: keep sync=standard, avoid Samba settings that force extra flushing, and deploy a real SLOG if needed.
  • Force sync always: sync=always for compliance-heavy shares. Expect lower throughput; design storage accordingly.
  • Disable sync: sync=disabled is a business decision to risk losing acknowledged writes on power loss or crash. It can be valid in scratch shares, but don’t pretend it’s “free performance.” It’s a different durability contract.
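
Whichever stance you choose, encode it explicitly per dataset so it survives handovers and audits. A minimal sketch using the example datasets from the tasks above:

cr0x@server:~$ sudo zfs set sync=standard tank/shares/engineering   # honor sync requests, nothing extra
cr0x@server:~$ sudo zfs set sync=always tank/shares/finance         # compliance share: every write is durable before it is acknowledged
cr0x@server:~$ zfs get -r -o name,value sync tank/shares            # verify nothing else inherited a surprise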

2) Split datasets by workload (one share, one behavior)

One dataset for everything is the fastest way to ensure nothing is good. Separate:

  • Home directories (metadata-heavy, small files)
  • Engineering project trees (many small files, read-mostly)
  • Media archives (large sequential)
  • Application drop zones (may require strict durability)

Then set properties per dataset: recordsize, atime, sync, compression, ACL behavior.
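
A minimal sketch of what that split can look like; dataset names and property values are illustrative, not prescriptive:

cr0x@server:~$ sudo zfs create -o recordsize=128K -o atime=off tank/shares/home        # metadata-heavy small files
cr0x@server:~$ sudo zfs create -o recordsize=1M -o atime=off tank/shares/media         # large sequential archives
cr0x@server:~$ sudo zfs create -o recordsize=128K -o sync=always tank/shares/dropzone  # application drop zone that needs durability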

3) Get recordsize right enough

  • General SMB shares: start with recordsize=128K.
  • Large file archives: consider recordsize=1M if most files are large and sequential.
  • Databases/VM images over SMB: avoid if you can; if you must, use specialized settings and test thoroughly. SMB file serving and VM datastore semantics are not a casual marriage.

4) Turn off atime for SMB shares (unless you have a real reason)

atime=on adds metadata writes on reads. Most organizations don’t use access time for anything meaningful, and Windows certainly doesn’t need your ZFS server to write extra metadata every time someone opens a file.

5) Keep LZ4 compression on by default

LZ4 is one of the few “defaults” I’ll defend in production. It often improves effective throughput and reduces I/O. Don’t overthink it until you have evidence of CPU bottlenecks.

6) Use a real SLOG when you need it (and don’t cheap out)

A SLOG device is not “any SSD.” It needs low latency under sync write load and power-loss protection. Otherwise you built an expensive latency generator.
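
Mechanically, attaching a log vdev is one command; the engineering is in the device choice. A minimal sketch with placeholder device paths, mirrored so a dead SLOG plus a crash doesn’t cost you acknowledged writes:

cr0x@server:~$ sudo zpool add tank log mirror /dev/disk/by-id/nvme-SLOG_A /dev/disk/by-id/nvme-SLOG_B
cr0x@server:~$ zpool status tank | grep -A 3 logs   # confirm the log vdev shows up and is ONLINE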

7) Samba: avoid “strict sync” unless you can justify it

strict sync can destroy throughput for workloads that generate many fsync points (including some Windows behaviors around file close). If you need strict semantics, make the storage capable. If you don’t, don’t pay for it.

8) SMB signing/encryption: scope it

Security teams like blanket policies. Production systems like budgets. If signing/encryption must be mandatory, ensure the SMB host has CPU headroom and modern crypto acceleration. If only certain shares contain sensitive data, scope policies per share or per traffic segment.
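
Samba lets you scope encryption per share rather than globally. A minimal smb.conf sketch; share names are examples, and “desired” negotiates encryption without demanding it:

[finance]
	path = /tank/shares/finance
	smb encrypt = required

[engineering]
	path = /tank/shares/engineering
	smb encrypt = desired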

Joke #2: Nothing makes a file server faster like a policy meeting that ends with “we didn’t change anything.”

Common mistakes: symptom → root cause → fix

1) Symptom: Copy starts fast, then drops to 0 B/s repeatedly

Root cause: TXG commit stalls due to sync writes or slow flush latency (no SLOG, slow vdevs, pool too full).

Fix: Measure fsync (dd ... conv=fdatasync), verify Samba sync settings, add proper SLOG or redesign pool for latency, reclaim space.

2) Symptom: Large files copy fine; many small files crawl

Root cause: Metadata-bound workload (ACLs, xattrs, timestamps) plus small random I/O limits.

Fix: atime=off, ensure appropriate dataset properties, consider mirrors for metadata-heavy pools, verify Samba VFS modules aren’t adding overhead, accept that this is IOPS not bandwidth.

3) Symptom: Speed caps at ~110 MB/s on “10GbE”

Root cause: Client/server negotiated 1GbE, bad LACP hashing, or single TCP flow constraint without multichannel.

Fix: Check link speed via ethtool, validate switch config, test SMB multichannel, and verify the client isn’t on a 1GbE segment.

4) Symptom: Performance worse after enabling SMB signing or encryption

Root cause: CPU bottleneck in crypto/signing, single-thread hot spots, insufficient RSS queues.

Fix: Measure CPU per core during transfer, enable multiqueue/RSS, upgrade CPU, scope signing/encryption, or use hardware that accelerates it.

5) Symptom: Copies intermittently hang for “exactly a few seconds”

Root cause: Network retransmits, bufferbloat, or switch congestion; sometimes TXG timing aligns with perceived pauses.

Fix: Look at retransmits (netstat -s), interface drops, and switch counters. If clean, return to storage latency and sync.

6) Symptom: One share is slow; another share on same server is fine

Root cause: Dataset property mismatch (sync=always, weird recordsize, atime on), Samba share config differences (strict sync, VFS modules), or quotas/reservations impacting allocation.

Fix: Compare zfs get outputs and testparm -sv blocks for both shares. Normalize intentionally.

7) Symptom: “Windows says it will take 2 hours” but server looks idle

Root cause: Client-side scanning (antivirus, indexing), small-file overhead, or client waiting on per-file metadata operations.

Fix: Reproduce with a clean client, test with robocopy options, and confirm server metrics during the operation. Don’t tune servers to compensate for a misbehaving endpoint fleet.

Checklists / step-by-step plan

Step-by-step: fix “fast then zero” SMB copies on ZFS

  1. Confirm pool health: zpool status -v. If degraded or errors, stop and fix disks.
  2. Check pool fullness: zfs list. If >85% used, plan space recovery/expansion.
  3. Identify dataset and properties: zfs get recordsize,atime,sync,compression.
  4. Inspect Samba share config: testparm -sv for strict sync, aio settings, VFS modules.
  5. Measure sync latency: server-side dd ... conv=fdatasync. If slow, it’s your main suspect.
  6. Check SLOG presence/performance: zpool status for logs and ensure device class is appropriate.
  7. Observe disk latency under load: iostat -x and zpool iostat while reproducing.
  8. Verify network health: ip -s link, retransmits (netstat -s), and link speed (ethtool).
  9. Apply one change at a time: e.g., disable strict sync on a test share or add SLOG; then rerun the same transfer and compare.
  10. Write down the result: capture latency, throughput, and whether stalls disappeared. Memory fades; tickets don’t.

Baseline checklist (the boring stuff you’ll thank yourself for)

  • Weekly scrub scheduled; scrub reports reviewed.
  • Monthly snapshot of: zpool status, zfs get key properties, ip -s link, and a repeatable throughput + fsync test (a capture sketch follows this list).
  • Dataset layout documented by workload category.
  • Explicit policy for sync: which shares require durability guarantees.
  • Change control for Samba config; no “one-liner fixes” in production at 2am.
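
To make the monthly baseline actually happen, script the capture. A minimal sketch; pool, paths, and interface names follow the examples in this article:

cr0x@server:~$ mkdir -p ~/baselines/$(date +%F) && cd ~/baselines/$(date +%F)
cr0x@server:~$ sudo zpool status -v tank > zpool-status.txt
cr0x@server:~$ sudo zfs get -r recordsize,atime,sync,compression tank/shares > zfs-props.txt
cr0x@server:~$ ip -s link show dev bond0 > nic-bond0.txt
cr0x@server:~$ sudo zpool iostat -v tank 1 5 > zpool-iostat.txt   # run this while the standard test transfer is in flight

Next month, diff the directory against the previous one before anyone opens a war room.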

FAQ

1) Why does Windows Explorer show fast speed, then 0 B/s?

Explorer reports based on buffering and short-term acceptance. ZFS and Samba can accept data quickly, then stall while committing sync writes or TXGs. Measure server-side latency.

2) Is robocopy faster than Explorer?

Sometimes. The bigger win is that robocopy is more predictable and scriptable, and it exposes retries and per-file behavior. It won’t fix server-side sync latency.

3) Should I set sync=disabled to make it fast?

Only if you accept losing acknowledged writes on power loss or crash. For scratch shares it can be acceptable. For business data, it’s a durability downgrade, not a tuning trick.

4) Do I need a SLOG for SMB?

If your workload generates lots of sync writes (or Samba settings force strict flushing), a good SLOG can be transformative. If your workload is mostly async, a SLOG won’t help much.

5) What recordsize should I use for SMB shares?

Start at 128K for general-purpose shares. Use 1M for large sequential archives. Avoid global changes; split datasets by workload.

6) Does turning on LZ4 compression slow things down?

Usually no, and often it speeds things up by reducing I/O. If CPU is already saturated (encryption/signing, heavy load), measure before deciding.

7) Is RAIDZ bad for SMB?

Not “bad,” but RAIDZ is less friendly to small random writes and metadata-heavy workloads than mirrors. If your SMB use case is lots of small files and sync behavior, mirrors often win on latency.

8) Why is one SMB share slow but others are fine?

Different dataset properties or Samba share options. Look for sync=always, atime=on, odd recordsize, or strict sync enabled on only one share.

9) Does SMB Multichannel fix everything?

No. It can increase throughput and resiliency, but it won’t fix storage latency or sync stalls. It also requires correct NIC, driver, and client support.

10) How do I know it’s CPU-bound?

During transfer, one or more CPU cores will be consistently high, often in smbd or kernel networking/crypto. Meanwhile disks and NICs won’t be saturated. That’s your sign.

Next steps you can execute this week

Do these in order. Each step makes a decision clearer, and none require faith.

  1. Pick one reproducible test transfer (one large file and one “many small files” folder) and keep it constant.
  2. Run the fast diagnosis playbook and capture outputs: zpool iostat, iostat -x, ip -s link, netstat -s, smbstatus.
  3. Prove or eliminate sync latency with the server-side dd ... conv=fdatasync test on the dataset.
  4. Split datasets by workload if you haven’t. Set atime=off and sane recordsize per category.
  5. Fix the real bottleneck: add proper SLOG for sync-heavy shares, reclaim space if the pool is too full, or address CPU/network issues if that’s where the evidence points.
  6. Write a one-page runbook with your baseline commands and “normal” outputs. Future you will buy past you coffee.

The goal isn’t a perfect graph. The goal is predictable performance under the durability contract you actually want to offer. Once you choose that contract on purpose, ZFS and SMB stop being mysterious and start being… merely demanding.
