ZFS SMB Durable Handles: Why Copy/Move Behavior Feels Weird

You drag a folder in Windows Explorer. The UI says “Moving…”. The progress bar crawls like it’s copying the Library of Congress.
Or it “finishes” instantly, but the folder is still there for a while. Or you lose the connection, reconnect, and somehow the file
is still “open” and the app keeps writing. It’s not magic. It’s SMB semantics colliding with how ZFS and Samba (or an SMB server built
on similar concepts) preserve file identity and safety.

Durable handles are supposed to make your life better during brief network drops. They do. They also make move/copy behavior feel
inconsistent, because the client’s idea of “same file, new name” isn’t always the server’s idea of “safe rename while handles survive.”
Let’s unpack what’s actually happening, how to prove it with commands, and what to change—without cargo-culting registry tweaks.

The mental model: what “durable” really means

Start with a blunt statement: SMB is not “just file I/O.” It’s a stateful protocol with a server-side memory of what a client is doing.
Durable handles are part of that memory.

In SMB2/SMB3, a client opens a file and gets a handle. That handle represents live server-side state: share modes, byte-range locks,
caching rights, and sometimes a lease (a structured promise about caching and coherency). If the network blips, the client would
traditionally lose the handle. Applications then get “file not found” or “network name no longer available” and everyone blames storage.

Durable handles solve that: the server can keep the open state around for a while so the client can reconnect and “reclaim” the open.
The reconnect can even happen after a TCP reset, Wi‑Fi hiccup, or laptop sleep. This is a reliability feature. The tradeoff is that
the server must preserve and validate more state, and it must be careful when files are renamed/moved while opens exist.

Now add ZFS. ZFS is very good at preserving on-disk consistency and providing stable object identity. SMB is very good at preserving
open-file consistency across reconnects. Put them together and you get correctness. You also get visible latency spikes in places
Windows users interpret as “why is move acting like copy?”

Why copy/move feels weird on SMB over ZFS

1) “Move” is only a rename if the server can do an atomic rename

On a local NTFS volume, a move inside the same filesystem is typically a metadata rename: fast, atomic, and it doesn’t touch file data.
On SMB, Explorer asks the server to do a rename (SMB2 SET_INFO / FileRenameInformation). If the rename is within the same share and the
server can map it to a simple rename, it can be fast.

But if the “move” crosses boundaries the server considers non-atomic—different shares, different underlying filesystems, different
datasets mounted separately, or a path that hits a different VFS module policy—the server may respond “can’t do that as a rename.”
The client then falls back to copy+delete. That’s the first source of “weird.”
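
You can watch that fallback happen locally on the server, without Windows in the loop. A minimal sketch, assuming /tank/smb/projects and /tank/archive are different datasets (they are in the task examples below; substitute your own paths):

# Try to move a test file across the dataset boundary and watch the syscalls:
# rename/renameat2 fails with EXDEV, then mv falls back to copy (openat) plus
# unlinkat -- the same fallback an SMB client performs when the server
# refuses the rename.
touch /tank/smb/projects/demo.bin
strace -e trace=rename,renameat,renameat2,openat,unlinkat \
  mv /tank/smb/projects/demo.bin /tank/archive/demo.bin 2>&1 | grep -E 'EXDEV|unlinkat'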

2) Durable handles make the server cautious about renames and deletes

If a file is open with a durable handle and the client might reconnect to continue writing, the server must preserve semantics:
the handle should still refer to the same file object after a rename. That’s doable if the server tracks file IDs reliably.
But it changes the cost profile. Operations that used to be “just rename an entry” now involve:

  • Updating durable handle tables and lease break state
  • Validating that share modes are respected for the new name/location
  • Ensuring the rename is visible and coherent to other clients
  • Potentially breaking leases/oplocks so other clients drop caches

That work isn’t free. If the server is busy, or metadata I/O is slow, a rename can take long enough that Explorer looks like it’s
“copying.” The UI isn’t a truth engine; it’s vibes plus a progress bar.

3) “Instant move” followed by lingering files is usually delayed delete or handle retention

Another weird one: you move a folder, it appears at the destination, but it also stays visible in the source for a while. There are
a few honest reasons:

  • Directory entry caching on the client (Explorer is aggressively optimistic).
  • Open handles preventing deletion; the server marks for delete-on-close.
  • Snapshots or holds making the data persist even if the live namespace changed (the namespace changes; blocks remain).
  • Cross-dataset moves that are implemented as copy+delete; deletion waits on closes.

4) ZFS dataset boundaries turn renames into copies, even when paths look “nearby”

ZFS datasets are not directories with a fancy name. They’re separate filesystems with their own mountpoints, properties, and sometimes
different recordsize, compression, xattr layout, and ACL behavior. A rename across datasets is not atomic. The SMB server can’t do a
single filesystem rename when the source and destination aren’t the same filesystem.

That means Explorer’s “move” becomes “copy then delete.” Users don’t know (or care) what a dataset is. They just see an operation that
used to be instant become minutes-long. You can either educate them, or you can design shares to make dataset boundaries invisible to
their workflows. Choose one. Only one will happen.

5) SMB metadata is chatty; ZFS metadata can be expensive under load

Copying a directory tree isn’t just data. It’s a festival of metadata: create, set security, set timestamps, set attributes, set
alternate data streams, query info, open/close. Durable handles and leases add state transitions to that festival. ZFS under heavy
synchronous load or with constrained metadata IOPS will make that festival feel like a slow parade.
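
If you want to see the festival on the wire, count SMB2 commands during a copy. A sketch, assuming tshark is installed; the interface name is an example:

# Tally SMB2 command codes seen during the operation; for small-file trees,
# create/close/set-info traffic dwarfs the actual reads and writes.
# smb2.cmd is the numeric SMB2 command field in the dissector.
tshark -i eth0 -f 'tcp port 445' -Y smb2 -T fields -e smb2.cmd 2>/dev/null \
  | sort | uniq -c | sort -rn | head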

Short joke #1: Durable handles are like keeping your parking ticket after you leave the garage—useful, unless the attendant also
expects you to keep the car there forever.

Interesting facts and a little history

  1. SMB2 arrived with Windows Vista/Server 2008 and dramatically reduced “SMB1 chattiness” by batching and pipelining operations.
  2. Durable handles arrived with the original SMB2 dialect (the Vista/Server 2008 era) to survive transient disconnects without app-level retries; SMB3 later added a stronger “v2” variant.
  3. SMB3 added “persistent handles” mainly for continuously available shares on clustered servers; these are stronger than “durable.”
  4. Opportunistic locks (oplocks) existed long before SMB2; SMB2+ generalized them into leases with clearer semantics.
  5. Windows Explorer’s progress UI is not a protocol analyzer; it frequently mislabels copy vs move when fallbacks happen.
  6. ZFS rename is atomic within a dataset, but not across datasets; it becomes a higher-level copy+unlink operation.
  7. ZFS snapshots don’t “freeze” the live filesystem; they preserve old blocks. Deletes after a snapshot can become space-reclamation work later.
  8. Samba implements durable handles in userspace; performance and semantics depend on kernel VFS behavior, xattr support, and the chosen vfs modules.
  9. SMB uses file IDs to identify files beyond names. Servers that can provide stable inode-like IDs make durable semantics saner.

Durable handles vs leases/oplocks vs “persistent” handles

Durable handles

A durable handle is about reconnect. The server keeps state after a disconnect so the client can reclaim it. The
server decides how long to keep it and under what conditions it’s valid. Durable handles are common on modern SMB stacks.
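
On Samba, you can check whether durable handles are actually in effect rather than assuming. A sketch; note that Samba documents durable handles as working only when the kernel-integration options below are disabled, so verify against your version's documentation:

# Dump the effective values of the parameters that gate durable handles.
# "durable handles = yes" alone is not enough if kernel oplocks, kernel
# share modes, or POSIX locking are still enabled.
testparm -s -v 2>/dev/null | grep -E 'durable handles|kernel oplocks|kernel share modes|posix locking'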

Leases (and oplocks)

Leases are about caching. They let the client cache reads, writes, and metadata with fewer round trips. When another
client wants conflicting access, the server “breaks” the lease and the client flushes/invalidates caches. Lease breaks can look like
random pauses during copy/move because the server is forcing coherence.

Persistent handles

Persistent handles are the “don’t lose my handle even if the server fails over” variant, intended for continuously available shares
with clustered storage. If you’re not running a cluster designed for this, persistent handle expectations are a way to get hurt.

The practical takeaway: durable handles keep things correct during reconnect; leases make things fast until they must be correct.
Weird copy/move behavior is often the transition between those two modes.

Paraphrased idea — Richard Cook: “Complex systems fail in complex ways; operators must continuously reconcile how work is really done.”

ZFS mechanics that matter: IDs, txgs, xattrs, snapshots

Stable file identity: inode-like behavior and SMB file IDs

SMB clients benefit when the server can provide stable file IDs so that “same object, new name” remains true after rename. ZFS has
strong internal object identity. Samba (or your SMB stack) maps that to SMB file IDs. If that mapping is stable, durable handle
reclamation across rename is safer. If it isn’t, the server gets conservative: more checks, more breaks, more fallbacks.
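
You can see that identity on the server in ten seconds. A minimal sketch; the path is an example:

# The inode number (first column of ls -i) survives a rename inside a
# dataset; that stable identity is what SMB file IDs get mapped from.
touch /tank/smb/projects/identity-demo
ls -i /tank/smb/projects/identity-demo
mv /tank/smb/projects/identity-demo /tank/smb/projects/identity-demo.renamed
ls -i /tank/smb/projects/identity-demo.renamed   # same inode, new name
rm /tank/smb/projects/identity-demo.renamed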

Transaction groups (txgs) and sync behavior

ZFS batches changes into transaction groups. A lot of small metadata updates during a directory copy can accumulate and then flush.
If your workload triggers synchronous semantics (explicit sync, or SMB settings that imply it for durability), you can get periodic
stalls when the system waits for intent log activity and txg commits.

This is one reason “copy is smooth, move is spiky” happens: move operations can create bursts of metadata updates and rename/unlink
sequences that hit sync points differently than sequential writes.
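
On OpenZFS on Linux you can watch txg behavior directly while the operation runs. A sketch, assuming the pool is named tank and your build exposes the per-pool kstat:

# Each line is one transaction group: when it was born, how much it wrote,
# and how long the sync phase took. Sync times that line up with the
# "move" stalls point at metadata/sync pressure rather than SMB itself.
tail -n 5 /proc/spl/kstat/zfs/tank/txgs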

xattrs and alternate data streams

Windows uses alternate data streams (ADS). SMB servers often implement ADS using extended attributes. On ZFS, xattrs can be stored in
different ways (implementation-dependent), and small-file metadata can turn into a lot of random I/O if you’re under memory pressure.
Copying from Windows often includes security descriptors, zone identifiers, and other metadata that “should be cheap,” until it isn’t.
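
To see what that “cheap” metadata looks like on disk, dump the extended attributes behind a file that came from Windows. A sketch assuming the share uses Samba's vfs_streams_xattr module; the path is an example and user.DosStream is that module's default prefix:

# Alternate data streams (Zone.Identifier, etc.) show up as user.DosStream.*
# xattrs; NT security descriptors may appear as security.NTACL depending on
# your ACL module. Run as root to see non-user namespaces.
getfattr -d -m - /tank/smb/users/alice/report.xlsx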

Snapshots: the move happened, but space didn’t come back

Users say “I moved it, why is the pool still full?” Snapshots. A move inside the same dataset may only rename entries; space doesn’t
change. A copy+delete across datasets may free space in the source dataset, but if the source has snapshots, blocks stay referenced.
From a capacity perspective, “delete” becomes “mark free in live tree but still referenced by snapshot.” That’s correct. It’s also
confusing if you’re watching free space like it’s a stock ticker.

Failure modes that look like “SMB is slow” but aren’t

Rename storms triggering lease breaks

Large moves of directory trees are rename storms. If other clients are browsing those directories, the server has to maintain coherent
views. Lease breaks cascade. Your “simple move” becomes a multi-client coordination exercise.
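
A crude way to spot a break storm is to bucket break messages per minute in the Samba log. A sketch; the log path is an example and break messages need log level 3 or higher, as in Task 5:

# Count lease/oplock break log lines per minute; a spike that matches the
# user's "slow move" timestamp is your correlation.
grep 'break' /var/log/samba/log.smbd | grep '^\[' \
  | awk '{print $1, substr($2, 1, 5)}' | sort | uniq -c | tail -n 10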

Antivirus and content indexing turning metadata into latency

On Windows, Defender (or corporate AV) loves new files. Copy creates new files. Move-as-copy creates new files. The client will read
them, hash them, and sometimes reopen them. That increases SMB opens and closes, which interacts with durable handle tables and
share modes. Storage gets blamed for a security team’s enthusiasm.

ZFS ARC pressure and small I/O amplification

Copying many small files is a metadata and small I/O workload. If ARC is pressured, the system starts doing real disk reads for
metadata that would otherwise be cached. Latency rises. SMB latency rises. Explorer declares the move “calculating…” forever.

Cross-share moves: user experience tax for clean admin boundaries

Admins love multiple shares. Users love “it moves instantly.” Those are often mutually exclusive. A cross-share move usually can’t be
a server-side rename. The client copies and deletes, which is correct but not fast.

Practical tasks: commands, outputs, and decisions

These are the checks I run when someone says “SMB moves are weird on ZFS.” Each task has: the command, what the output means, and the
decision you make based on it.

Task 1: Confirm dataset boundaries and mountpoints

cr0x@server:~$ zfs list -o name,mountpoint -r tank/smb
NAME               MOUNTPOINT
tank/smb           /tank/smb
tank/smb/users     /tank/smb/users
tank/smb/projects  /tank/smb/projects
tank/smb/archive   /tank/archive

Meaning: tank/smb/archive is mounted somewhere else. A “move” from /tank/smb/projects to /tank/smb/archive crosses filesystems.

Decision: If users routinely move between these paths, either merge into one dataset or accept copy+delete behavior (and tell them).

Task 2: Verify both source and destination are same filesystem from the kernel’s view

cr0x@server:~$ stat -f -c '%T %n' /tank/smb/projects /tank/archive
zfs /tank/smb/projects
zfs /tank/archive

Meaning: Both are ZFS, but that doesn’t mean same dataset. stat -f won’t differentiate ZFS datasets; it just says “zfs.”

Decision: Use Task 1 plus a share map. Don’t assume “same pool” means “same filesystem.”
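
A follow-up check that does tell datasets apart, assuming GNU coreutils: compare device numbers. Each mounted ZFS dataset gets its own device ID, and rename(2) cannot cross a device boundary.

# %d = device number, %m = mount point, %n = the path you asked about.
# Different device numbers for source and destination mean no atomic rename.
stat -c '%d %m %n' /tank/smb/projects /tank/archive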

Task 3: Map SMB shares to filesystem paths (Samba)

cr0x@server:~$ testparm -s 2>/dev/null | sed -n '/^\[users\]/,/^\[/{/path =/p}'
	path = /tank/smb/users

Meaning: The [users] share points to /tank/smb/users. Do this for both shares involved in the “move.”

Decision: If the move crosses shares, expect client-side copy+delete unless you’re using server-side copy features and the client supports them well.
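
To build the whole share-to-dataset map in one pass instead of share by share, something like this is enough. A sketch; it assumes GNU df and that testparm indents parameters with a tab, as in the output above:

# For every share path in the effective config, print which filesystem backs
# it; for ZFS mounts, df reports the dataset name as the source.
testparm -s 2>/dev/null | awk -F' = ' '/^\tpath =/ {print $2}' | while read -r p; do
  printf '%s -> %s\n' "$p" "$(df --output=source "$p" | tail -n 1)"
done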

Task 4: Watch live SMB sessions and open files

cr0x@server:~$ smbstatus --shares --processes
Service      pid     Machine       Connected at                     Encryption   Signing
users        21408   10.20.30.44    Tue Dec 26 09:10:12 2025 UTC     -            -
projects     22119   10.20.30.44    Tue Dec 26 09:11:05 2025 UTC     -            -

Meaning: Same client is connected to multiple shares. Cross-share move is likely copy+delete.

Decision: If this is a frequent workflow, consider presenting a single share root that keeps common moves within one share and dataset.

Task 5: Identify durable handle and lease behavior in Samba logs

cr0x@server:~$ grep -E 'durable|lease|oplock' /var/log/samba/log.smbd | tail -n 8
[2025/12/26 09:17:41.231112,  3] smbd/smb2_oplock.c:3177(smbd_smb2_request_process_break)
  Sending break for lease key 9c3c2b0d...
[2025/12/26 09:17:41.245891,  3] smbd/smb2_server.c:4041(smbd_smb2_request_error_ex)
  smbd_smb2_request_error_ex: idx[1] status[STATUS_OPLOCK_BREAK_IN_PROGRESS]

Meaning: Lease breaks are happening. That’s the server forcing clients to flush caches, often during rename storms.

Decision: If lease breaks correlate with the “weird move,” reduce concurrent directory browsing during migrations, or schedule moves off-hours.

Task 6: Check ZFS dataset properties that affect SMB metadata behavior

cr0x@server:~$ zfs get -H -o name,property,value atime,xattr,acltype,casesensitivity,recordsize,sync tank/smb/projects
tank/smb/projects	atime	off
tank/smb/projects	xattr	sa
tank/smb/projects	acltype	posixacl
tank/smb/projects	casesensitivity	mixed
tank/smb/projects	recordsize	1M
tank/smb/projects	sync	standard

Meaning: xattr=sa can be good for metadata-heavy workloads; recordsize=1M is fine for big files but not decisive for small-file storms.

Decision: If you’re seeing heavy ADS/xattr traffic and random I/O, validate xattr strategy and consider a dataset tuned for small files (and keep moves within it).

Task 7: Check snapshot pressure when “delete didn’t free space”

cr0x@server:~$ zfs list -t snapshot -o name,used,refer,mountpoint -r tank/smb/projects | head
NAME                                USED  REFER  MOUNTPOINT
tank/smb/projects@daily-2025-12-25   118G  2.4T   -
tank/smb/projects@daily-2025-12-26   6.1G  2.5T   -

Meaning: Snapshots reference a lot. Deleting/moving files won’t release those blocks until snapshots are removed.

Decision: If capacity is the pain, adjust snapshot retention or move data to a dataset without long-lived snapshots (but do it knowingly).

Task 8: Observe txg and I/O latency during the operation

cr0x@server:~$ iostat -x 2 5
Linux 6.6.44 (server) 	12/26/2025 	_x86_64_	(32 CPU)

avg-cpu:  %user %nice %system %iowait  %steal   %idle
          12.1  0.0     6.8     9.7     0.0    71.4

Device            r/s     w/s   rkB/s   wkB/s  await  svctm  %util
nvme0n1         215.0   480.0  8200.0 51200.0   8.40   0.35   24.0
sdg             120.0   900.0  1400.0 18000.0  38.20   1.10   95.0

Meaning: One device is pinned at high %util and high await. That’s where your “rename” is waiting.

Decision: If metadata is landing on slow vdevs, fix the vdev layout, add special allocation classes (where applicable), or reduce metadata-heavy operations during peak.
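
For the ZFS-level view of the same question, per-vdev latency output narrows it further. A sketch; the pool name is an example and the -l/-w options need a reasonably recent OpenZFS:

# -l adds average wait and disk latencies per vdev; swap it for -w if you
# want full latency histograms instead of averages.
zpool iostat -vl tank 5 3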

Task 9: Check ARC efficiency and pressure (Linux)

cr0x@server:~$ arcstat 2 3
    time  read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%  arcsz     c
09:20:11  12K  3.1K     25   510   16   2.6K   84     0    0   92G   110G
09:20:13  15K  5.9K     39   820   14   5.1K   86     0    0   92G   110G
09:20:15  14K  6.2K     44   900   14   5.3K   86     0    0   92G   110G

Meaning: Miss rate is high and mostly prefetch misses. You’re doing random metadata/small file work that isn’t staying hot.

Decision: If this correlates with copy/move storms, add RAM, reduce concurrency, or move metadata to faster media if your platform supports it.
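
If arcstat isn't installed, the raw counters it reads are available directly. A sketch for Linux; field names are the OpenZFS kstat names:

# Hit/miss counters plus current (size) and target (c) ARC size. Sample
# twice a few seconds apart and diff hits/misses to get a rate.
grep -E '^(hits|misses|size|c) ' /proc/spl/kstat/zfs/arcstats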

Task 10: Confirm whether the operation is rename or copy+delete from the client side (Windows via server logs)

cr0x@server:~$ grep -E 'rename|unlink|NT_STATUS|SET_INFO' /var/log/samba/log.smbd | tail -n 10
[2025/12/26 09:21:44.101220, 10] smbd/smb2_trans2.c:3920(smbd_smb2_setinfo_file)
  SMB2_SETINFO_FILE: FileRenameInformation
[2025/12/26 09:21:44.141109, 10] smbd/close.c:871(close_normal_file)
  closed file users/alice/report.xlsx (numopen=0) NT_STATUS_OK

Meaning: You’re seeing FileRenameInformation, so at least part of the move is a true rename.

Decision: If the UI shows “copying” but logs show rename, focus on metadata latency/lease breaks—not on data throughput.

Task 11: Check for share-level options that change sync/durability behavior

cr0x@server:~$ testparm -s 2>/dev/null | grep -E 'strict sync|sync always|durable handles|kernel oplocks|oplocks|leases'
	kernel oplocks = no
	oplocks = yes
	smb2 leases = yes
	strict sync = yes

Meaning: strict sync = yes can force synchronous semantics for certain operations, increasing latency under metadata storms.

Decision: Don’t blindly disable strict sync. Instead, decide per share based on data criticality. For home directories, maybe; for accounting, probably not.

Task 12: Spot “move became copy” via unexpected write volume

cr0x@server:~$ zpool iostat -v tank 2 3
              capacity     operations     bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
tank        88.2T  21.5T    220   1600   18M   220M
  raidz2-0  40.1T  10.2T    110    900   10M   120M
    sdg         -      -     30    260  2.5M    34M
    sdh         -      -     25    230  2.1M    30M
  raidz2-1  48.1T  11.3T    110    700  8.0M   100M
    sdi         -      -     28    210  2.0M    29M
    sdj         -      -     26    190  1.8M    27M

Meaning: A “move” inside a dataset should not generate sustained hundreds of MB/s of writes. If it does, it’s a copy.

Decision: Rework share/dataset boundaries. If users must move across datasets, plan for throughput and AV overhead; don’t promise rename-speed.

Task 13: Check directory enumeration pressure (are we doing millions of LOOKUPs?)

cr0x@server:~$ nfsstat -c 2>/dev/null || true
nfsstat: NFS not running

Meaning: Not NFS—good. For SMB enumeration pressure you’ll rely on SMB logs, perf counters, and system call tracing.

Decision: If you can’t instrument SMB properly, turn up Samba log level briefly on a single client IP and capture rename/copy behavior directly.

Task 14: Trace rename/unlink syscalls during a “move” (Linux server)

cr0x@server:~$ sudo strace -p 22119 -e trace=rename,renameat,renameat2,unlink,unlinkat -s 128 -tt -o /tmp/smbd.rename.trace & sleep 5; tail -n 5 /tmp/smbd.rename.trace
09:23:10.118023 renameat(AT_FDCWD, "/tank/smb/projects/Q4/budget.xlsx", AT_FDCWD, "/tank/smb/projects/Q4-archive/budget.xlsx") = 0
09:23:10.118771 unlinkat(AT_FDCWD, "/tank/smb/projects/Q4/tmp~budget.xlsx", 0) = 0

Meaning: You’re seeing real renameat calls (attach to the per-client smbd PID shown by smbstatus, not the parent daemon). That suggests server-side rename is happening—slowdown is likely lock/lease breaks or metadata I/O.

Decision: If you see open/read/write volume instead of renames, it’s copy+delete. Then fix the boundary problem first.

Task 15: Detect deletes blocked by open handles (Samba)

cr0x@server:~$ smbstatus --locks | head -n 12
Locked files:
Pid    Uid   DenyMode   Access      R/W        Oplock           SharePath   Name   Time
22119  1050  DENY_NONE  0x120089    RDONLY     EXCLUSIVE+BATCH  /tank/smb/projects  Q4/budget.xlsx  Tue Dec 26 09:23:44 2025

Meaning: A client has an oplock/lease (shown as oplock here). Deleting or moving might need a break; if the client is slow to respond, you wait.

Decision: If one client is holding locks and blocking operations, address that client (sleeping laptops, stale apps) before tuning storage.

Fast diagnosis playbook

When a “move” feels like a “copy,” don’t start by changing ZFS tunables. Start by finding out what operation is actually happening.
Here’s the order that saves time.

First: is it rename or copy+delete?

  • Check share paths and dataset boundaries (Task 1, Task 3).
  • Watch for sustained write bandwidth during a “move” (Task 12).
  • Look for SMB rename calls in logs or trace syscalls (Task 10, Task 14).

If it’s copy+delete: stop blaming durable handles. Your architecture forced the fallback.
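
If you want one artifact that settles the rename-or-copy argument, a small wrapper around those checks does it. A sketch; the pool name and log path are examples, and counting FileRenameInformation requires the log level 10 used in Task 10:

#!/bin/sh
# Run this, then ask the user to repeat the "move".
# Sustained write bandwidth in zpool iostat means copy; a jump in the rename
# counter with little write bandwidth means a true rename.
before=$(grep -c 'FileRenameInformation' /var/log/samba/log.smbd)
zpool iostat tank 5 6
after=$(grep -c 'FileRenameInformation' /var/log/samba/log.smbd)
echo "FileRenameInformation calls observed during the window: $((after - before))"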

Second: if it’s rename, what’s the wait?

  • Look for lease/oplock breaks and STATUS_OPLOCK_BREAK_IN_PROGRESS (Task 5).
  • Check for blocked deletes or open handles (Task 15).
  • Correlate with disk latency/IO wait (Task 8).

If it’s lease breaks: reduce concurrency, avoid mass renames during peak, and consider share layout changes.

Third: is metadata I/O the bottleneck?

  • Check ARC miss rates (Task 9).
  • Validate dataset properties for the workload (Task 6).
  • Verify snapshots aren’t masking “delete” expectations (Task 7).

If it’s metadata: the fix is faster metadata, more memory, fewer small-file storms, or better scheduling—rarely “turn off durable handles.”

Three corporate-world mini-stories

Incident: the wrong assumption (“a move is always a rename”)

A mid-sized company had a clean storage taxonomy: one ZFS dataset per department, one SMB share per dataset, neat quotas, neat snapshots.
The file server was healthy and boring—until finance’s quarter-end close. Suddenly, “moving” folders into an archive share took hours.
Helpdesk tickets multiplied, and someone declared the ZFS pool “degraded” because Windows said “Preparing to move…”.

The wrong assumption was simple: they assumed a move within “the same server” is a rename. But the archive was a different share
and a different dataset. Windows did copy+delete. Antivirus scanned the new files. Finance’s spreadsheet tooling reopened them.
The SMB server did exactly what it should: enforce share modes and keep opens durable across transient disconnects.

The SRE on call proved it by watching write bandwidth on the pool during the “move” and by grepping Samba logs for rename calls.
There were almost none. It was a copy storm. Meanwhile, snapshots on the source dataset meant that even after the delete phase, space
didn’t come back. Users saw a slow move and no reclaimed capacity and concluded “storage is lying.”

The fix wasn’t a tuning flag. They created a single share root with subdirectories for departments inside one dataset for workflows
that required fast moves. They kept separate datasets behind the scenes for snapshot/retention needs, but stopped exposing dataset
boundaries as separate shares for everyday users. Quarter-end became quiet again. The file server went back to being boring, which is
what you want from a file server.

Optimization that backfired: “let’s make it extra durable”

Another org had been burned by a laptop fleet that frequently slept mid-copy. They heard about durable handles and decided the right
move was to crank “safety” everywhere. They enabled strict sync and treated the SMB server like a database server. Nobody asked what
workload they actually had: mostly small Office files, some CAD, and huge directory trees.

The result was a strange pattern: copies were fine at first, then periodic stalls. Moves within a dataset were “mostly instant” but
sometimes hung for tens of seconds. Users noticed the hangs most during large folder reorganizations, which are metadata-heavy and
rename-heavy. Each stall corresponded with a burst of synchronous-ish metadata operations and a pile of lease breaks as multiple
clients browsed the same trees.

The storage team chased disks. The network team chased MTU. The Windows team chased Explorer updates. The real issue was policy:
they imposed “database durability” on a general file share. Correctness didn’t improve much (SMB already had durability mechanisms),
but latency got worse. They rolled strict sync back on general shares, kept it only where business requirements justified it, and
moved critical app data to a share with explicit expectations and scheduled maintenance windows.

Durable handles were not the villain. The villain was pretending all data has the same write semantics. It doesn’t.

Boring but correct practice that saved the day: planned boundary design and change windows

A company with a large engineering org had a weekly “filesystem gardening” job: cleanup, archive, reorganize.
They’d learned (the hard way, years earlier) that mass renames during business hours create lease-break storms and angry users.
So they did two boring things.

First, they designed shares so that the most common moves stayed within a dataset. “Active project” and “Project archive” lived in
the same dataset with a directory boundary, not separate datasets. Retention was handled by snapshots and replication policies, not
by forcing users to move across filesystems.

Second, they scheduled any cross-dataset migration for a window, announced it, and rate-limited it. They used a controlled tool on
the server side (not Explorer drag-and-drop) so they could measure behavior and avoid “thundering herd” effects from dozens of clients.

One week, a network change caused brief disconnects across a floor. Laptops reconnected. Durable handles did their job. The migration
job continued without corruption. Users barely noticed. The postmortem was short and dull, which is the highest compliment in ops.

Short joke #2: If you want excitement, run a nightclub. If you want reliability, run a file server.

Common mistakes: symptoms → root cause → fix

1) Symptom: “Move within the same server is as slow as copy”

Root cause: The move crosses SMB shares or ZFS datasets, so it can’t be a rename; Windows falls back to copy+delete.

Fix: Put source and destination within the same share and dataset for common workflows; or accept it’s a copy and size the system for it.

2) Symptom: “Move hangs at 99%”

Root cause: Open handles or delayed delete; Explorer waits for deletes/metadata finalization while another client holds a lease/oplock.

Fix: Use smbstatus --locks to identify blockers; break the dependency (close app, fix sleeping laptops, reduce directory watchers).

3) Symptom: “Folder moved, but it still appears in the old location”

Root cause: Client-side caching, directory enumeration cache, or delayed update due to lease breaks.

Fix: Force a refresh, confirm server namespace with server-side ls. If consistent on server, treat as client cache. If inconsistent, check SMB logs for errors.

4) Symptom: “We deleted terabytes but space didn’t come back”

Root cause: Snapshots retain blocks; deletes only remove live references.

Fix: Inspect snapshot used/refer; adjust retention, or move data to a dataset with appropriate snapshot policy.

5) Symptom: “Random pauses during copy/move of many files”

Root cause: Lease breaks/oplock breaks; metadata I/O stalls; ARC pressure; synchronous metadata policies.

Fix: Correlate Samba logs with iostat and ARC stats. Reduce concurrent tree operations and fix storage latency first.

6) Symptom: “After a Wi‑Fi drop, the app keeps saving, but the file rename fails”

Root cause: Durable handle reconnect succeeded, but subsequent rename conflicts with share modes or a competing handle on the destination path.

Fix: Check for opens on both source and destination; consider workflow changes (save-to-temp then atomic rename) and ensure server supports stable file IDs.

7) Symptom: “Copy is fast, move is slow (within same dataset)”

Root cause: Move triggers rename/unlink patterns that cause more lease breaks and metadata updates; copy may stream data efficiently.

Fix: Measure lease breaks and metadata latency; schedule large reorganizations, and avoid doing them while many clients browse the same tree.

8) Symptom: “Robocopy /Z works, Explorer fails”

Root cause: Different client behavior. Robocopy uses restartable mode and different retry semantics; Explorer is UI-driven and sometimes more sensitive to perceived errors.

Fix: For migrations, use robust tools (Robocopy, server-side tools) with explicit flags; treat Explorer as a convenience tool, not a migration system.

Checklists / step-by-step plan

Design checklist: prevent “move became copy” surprises

  1. List the top 5 user workflows that involve reorganizing data (moves/renames).
  2. Ensure those workflows stay within a single SMB share whenever possible.
  3. Ensure those workflows stay within a single ZFS dataset whenever “instant move” is a requirement.
  4. Use datasets for policy (snapshots/quotas/replication) where it doesn’t break workflows—or hide boundaries behind a single share root.
  5. Decide which shares need strict durability semantics and which need throughput/latency; document that decision.

Operational checklist: when users complain about move/copy weirdness

  1. Ask: source path, destination path, approximate file count, file types, and whether other users were browsing the tree.
  2. Check whether it crossed shares/datasets (Task 1, Task 3).
  3. Check whether pool writes spike during the “move” (Task 12).
  4. If rename: check lease breaks and locks (Task 5, Task 15).
  5. Correlate with disk latency and ARC pressure (Task 8, Task 9).
  6. Check snapshots if “delete didn’t free space” is part of the complaint (Task 7).
  7. Only after you have evidence, change one variable at a time (share option, scheduling, dataset layout).

Migrations checklist: moving data between datasets without drama

  1. Prefer server-side transfer tools during a change window; don’t rely on Explorer drag-and-drop for terabytes (see the rsync sketch after this list).
  2. Throttle concurrency; small-file moves scale poorly with “more threads” once metadata is the bottleneck.
  3. Freeze or coordinate AV/indexing policies if you can (or at least anticipate the overhead).
  4. Measure: IOPS latency, SMB lease breaks, and open handle counts during the move.
  5. Have a rollback plan: if lease storms impact users, stop the job cleanly and resume later.
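
For item 1, a server-side, resumable, throttled transfer looks roughly like this. A sketch; the paths and bandwidth cap are examples, and the ACL/xattr flags assume your rsync build supports them:

# Archive mode plus hard links, ACLs, and xattrs so streams and security
# descriptors survive; --bwlimit is in KiB/s. Re-running the same command
# skips files already copied.
rsync -aHAX --info=progress2 --bwlimit=100000 /tank/smb/projects/Q4/ /tank/archive/Q4/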

FAQ

1) Are durable handles a ZFS feature?

No. Durable handles are an SMB protocol feature implemented by the SMB server (often Samba on Unix-like systems). ZFS influences how
well the server can map stable file identity and how quickly metadata operations complete.

2) Why does moving a folder sometimes take longer than copying it?

Because “move” of a directory tree can be a storm of renames and deletes that triggers lease breaks and metadata updates. A copy can
be a smoother streaming workload if clients aren’t contending on the namespace.

3) If I disable durable handles, will moves become normal?

Usually not. If your “move” is actually copy+delete due to share/dataset boundaries, disabling durable handles won’t change that.
If your issue is lease breaks and handle retention, disabling durability may reduce state but it also makes reconnect behavior worse.
Fix architecture first, then tune.

4) Why does moving across two folders in the same share still behave like copy?

Two folders can still sit on different datasets due to mountpoints inside the share path. The path looks contiguous; the filesystem
isn’t. Confirm with zfs list -o mountpoint and your share’s path =.

5) What’s the relationship between SMB leases and “hangs”?

When the server needs a client to drop a cache or flush pending writes, it breaks the lease. If the client is slow to respond
(sleep, CPU pressure, AV scan), the server waits. Your move waits. That wait is correctness doing its job.

6) Why does Explorer show “calculating time remaining” forever?

Explorer struggles with workloads dominated by metadata and small files, especially over a network. It’s also bad at predicting
operations that change nature mid-flight (rename succeeds for some files, copy fallback for others).

7) Do snapshots slow down SMB moves?

Snapshots don’t typically slow the rename itself. They do affect deletes and space reclamation. If you’re moving across datasets
(copy+delete), snapshots on the source or destination can change write amplification and capacity outcomes.

8) How can I prove to management that “move became copy”?

Show sustained pool write bandwidth during a “move” (Task 12). Then show the dataset boundary or cross-share mapping (Task 1/3).
Renames do not generate large sequential writes; copies do.

9) Is this different on TrueNAS or other ZFS appliances?

The principles are the same. The appliance may expose different knobs or defaults, but SMB semantics don’t change: rename is only
atomic within a filesystem/share context; durable handles and leases still require state tracking and coordination.

10) What should I change first: ZFS tuning or Samba tuning?

Neither. Change layout first: keep common moves inside one dataset/share. Then measure. Only tune after you have a specific bottleneck:
metadata latency, lock contention, or synchronous write policy.

Next steps you should actually do

If you want copy/move to stop feeling weird, stop treating it as a mystery and start treating it as a classification problem:
rename or copy? Once you know that, the fix becomes obvious and usually boring.

  1. Inventory your shares and datasets and draw a boundary map users can’t accidentally cross during normal workflows.
  2. Pick one “fast move” zone (single dataset, single share root) for teams that reorganize constantly.
  3. Instrument lease breaks and locks (briefly increase log levels during incidents) so you can tell “client contention” from “storage latency.”
  4. Schedule large reorganizations and migrations outside peak hours, and use controlled tools rather than Explorer.
  5. Align snapshot policy with reality: if the business expects space to return immediately, don’t keep long-lived snapshots on that dataset.

Durable handles are doing reliability work you don’t want to do manually. Let them. Just don’t build a share/dataset layout that
forces every “move” to become a copy, then act surprised when physics shows up.
