ZFS sync: The Setting That Can Make You Fast… and Unsafe


ZFS has a little knob named sync that looks harmless until it ruins your week—or saves it. Flip it one way and your latency graph calms down, your benchmarks look heroic, and your VM host suddenly “feels fast.” Flip it the other way and you get the kind of guarantees that auditors love and that let operators sleep at night. The catch is that both outcomes can be true at the same time, depending on what you think “written to disk” means.

This is not a moral lecture about “never disable safety.” It’s a field guide for people who run production: what sync actually does, why it can throttle a system into the ground, how SLOG devices help (and how they don’t), and how to choose settings that match your workload’s promises. The goal is simple: be fast without accidentally turning your storage into a polite suggestion.

What ZFS “sync” really means (and what it doesn’t)

In ZFS, the sync dataset property controls how ZFS treats synchronous write requests. That phrasing matters. ZFS doesn’t randomly decide to be synchronous; the caller asks for it.

A synchronous write is one where the application (or protocol) says: “Do not report success until this data is on stable storage.” Stable storage is storage that will survive a crash and sudden power loss. Not “it’s in RAM,” not “the controller says it’s in cache,” not “Linux page cache feels optimistic today.” Stable.

ZFS implements this guarantee by committing the write intent to the ZIL (ZFS Intent Log). If you have a separate log device (a SLOG), ZFS can place that ZIL data on a low-latency device to make sync writes faster. Later, ZFS rolls those changes into the main pool as part of normal transaction group (TXG) commits.
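
To make “the caller asks for it” concrete, here is a minimal sketch using dd against the test path used later in Task 7 (assuming a mountpoint at /tank/test). The first command issues ordinary buffered writes; the second adds oflag=dsync, which opens the file with O_DSYNC so every write must reach stable storage before dd moves on:

cr0x@server:~$ dd if=/dev/zero of=/tank/test/buffered.bin bs=4k count=10000
cr0x@server:~$ dd if=/dev/zero of=/tank/test/dsync.bin bs=4k count=10000 oflag=dsync

Only the second command exercises the ZIL; the first mostly rides the TXG batching described later. Expect the throughput gap between the two to be large on pools without a fast sync path.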

Two immediate implications:

  • sync does not make your system “safe” by itself. It defines what ZFS will do when the caller asks for sync semantics. If the caller never asks (or is misconfigured), you may still lose data on power loss.
  • sync can absolutely make you faster. But the speedup often comes from removing durability guarantees. That’s why this setting is famous: it’s one of the few toggles where performance and safety are directly coupled.

Here’s the first joke, because we’ll need it later: Disabling sync is like removing your smoke detector because it keeps beeping when you burn toast—peaceful right up until it isn’t.

Interesting facts and historical context

Some context makes the behavior feel less like magic and more like engineering:

  1. ZFS was born at Sun Microsystems in the early 2000s, designed for end-to-end integrity and sane admin workflows at scale—back when “just use RAID and hope” was still a popular plan.
  2. The ZIL is not a write cache for everything. It exists specifically to satisfy synchronous semantics; most asynchronous writes never touch it.
  3. A SLOG is not “an SSD cache.” It’s a dedicated device to store the ZIL quickly. It doesn’t speed up reads and doesn’t automatically speed up all writes.
  4. Many enterprise storage arrays sold “NVRAM-backed write cache” as the magic ingredient that made sync writes fast. In ZFS land, a power-loss-protected SLOG plays a similar role, but only for the ZIL.
  5. NFS has a reputation for “mysterious slowness” largely because it tends to request synchronous semantics for many operations. If the storage can’t commit sync quickly, the network protocol looks guilty.
  6. Databases often use fsync() as their line in the sand. PostgreSQL, MySQL/InnoDB, and others rely on explicit flush calls to uphold transaction durability.
  7. Virtualization stacks amplify the pain. A VM doing sync writes becomes a host doing sync writes, multiplied by every guest. One noisy durability-sensitive tenant can dominate latency.
  8. “Write ordering” bugs used to be common in filesystems and controllers. ZFS’ design (copy-on-write + checksums) doesn’t eliminate the need for sync, but it makes corruption easier to detect and harder to create silently.

The three modes: standard, always, disabled

The sync property is per dataset (filesystem or zvol). It has three relevant values:

1) sync=standard (the default)

This means: honor the caller’s request. If the application issues synchronous writes or calls fsync()/fdatasync(), ZFS commits intent to the ZIL and only then reports success.

Operationally: this is the “do what the app asked” mode. If performance is bad here, it’s a real signal: your workload is asking for durability and your storage path can’t deliver it quickly.

2) sync=always

This forces all writes to be treated as synchronous, even if the caller didn’t request it.

When you use it: when you don’t trust the workload or middleware to request sync semantics correctly, but you still need durable behavior. Sometimes used on NFS exports for workloads that are notorious for lying about durability needs, or on certain VM images where you want to guarantee “host-level” safety regardless of guest settings.

Tradeoff: if you don’t have a fast, safe SLOG, you can turn a decent pool into a latency museum.
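
A minimal sketch, assuming a hypothetical export dataset named tank/nfs_exports whose clients you don't fully trust to request durability correctly:

cr0x@server:~$ sudo zfs set sync=always tank/nfs_exports
cr0x@server:~$ zfs get -o name,property,value,source sync tank/nfs_exports
NAME              PROPERTY  VALUE   SOURCE
tank/nfs_exports  sync      always  local

Do this only after confirming the ZIL/SLOG path can absorb the extra load; otherwise you're buying the latency museum a new wing.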

3) sync=disabled

This tells ZFS to treat synchronous requests as asynchronous. It will acknowledge sync writes before they are actually on stable storage.

When you use it: for data you can afford to lose in a crash (scratch space, caches, transient analytics intermediates), or in very specific architectures where durability is handled somewhere else (for example, the app has its own replicated commit log and you explicitly accept local loss).

Risk: any application that believes it got a durable commit may be wrong. That includes databases. That includes VM filesystems. That includes NFS clients doing safe writes.

Second joke, and then we get serious again: Setting sync=disabled on a database volume is like putting “probably” in the middle of your financial ledger—auditors don’t find it charming.

Who asks for sync writes: apps, databases, NFS, hypervisors

Most performance drama around sync comes from misunderstanding who is asking for synchronous semantics and why.

Databases

Databases often convert “transaction committed” into an fsync() on a write-ahead log (WAL) or redo log. The log is sequential-ish, but the requirement is strict: when the database says commit, the log must survive a crash. If ZFS takes 5 ms to satisfy that, the database’s commit latency is at least 5 ms, even if everything else is fast.

Some DBs can relax this (e.g., “async commit” modes), but that’s an application-level business decision. The storage shouldn’t silently change the meaning of “commit” without you knowing.
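
As one hedged, PostgreSQL-specific illustration (cluster and values here are hypothetical), the database exposes that decision as its own setting rather than expecting storage to change the meaning of commit underneath it:

cr0x@server:~$ sudo -u postgres psql -c "SHOW synchronous_commit;"
 synchronous_commit
--------------------
 on
(1 row)

Switching synchronous_commit to off trades a bounded window of acknowledged-but-lost transactions for lower commit latency; that is the DBA's call, made in the open, not a dataset property quietly rewriting it.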

NFS

NFS clients often request stable writes, and servers have to honor them. On ZFS-backed NFS servers, this means the ZIL/SLOG path becomes the hot path. You can have a screaming-fast pool for asynchronous throughput and still get terrible NFS latency if sync commits are slow.
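
One quick server-side check on Linux (the export path and client range are illustrative): make sure nobody "solved" NFS latency earlier by exporting with async, which acknowledges writes before they are stable regardless of what ZFS does:

cr0x@server:~$ sudo exportfs -v
/tank/nfs     10.0.0.0/24(sync,wdelay,hide,no_subtree_check,sec=sys,rw,secure,root_squash,no_all_squash)

Interpretation: sync in the export options means the server honors client commit semantics and hands the problem to the ZIL/SLOG path; async at this layer is sync=disabled wearing an NFS costume.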

Virtualization (zvols, VM images)

Hypervisors and guest filesystems can generate a surprising number of synchronous operations—metadata updates, journal commits, barriers, flushes. A single “safe” setting in a guest can translate into constant flushes on the host, and those flushes map directly onto the ZIL path.

This is why you’ll see the classic complaint: “My SSD pool is fast but my VM host feels sluggish.” Often the pool is fine at bulk writes, but the sync write latency is poor.
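
A quick way to see what a guest disk is promising on a libvirt/QEMU host (the VM name and zvol path are hypothetical): check the cache mode on the virtual disk, because it decides whether guest flushes reach the host at all:

cr0x@server:~$ sudo virsh dumpxml vm-001 | grep -A1 "driver name"
      <driver name='qemu' type='raw' cache='none' discard='unmap'/>
      <source dev='/dev/zvol/tank/vm-001'/>

Interpretation: cache='none' or 'writethrough' passes guest flushes down to the zvol, where the sync property decides what happens next; cache='unsafe' swallows them entirely, which is sync=disabled with extra steps.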

The sync write path: TXGs, ZIL, SLOG, and what actually hits stable storage

ZFS writes data in transaction groups (TXGs). Changes are accumulated in memory and periodically committed to disk in batches. That batching is why ZFS can be efficient: it turns lots of small random writes into more orderly IO.

Synchronous requests complicate this. The application can’t wait for the next TXG commit (which might be seconds away). So ZFS uses the ZIL: a log of intent that records enough information to replay the operation after a crash. The ZIL is written quickly, then the sync write can be acknowledged. Later, when the TXG commits, the actual data lands in its final on-disk structures and the ZIL entries become obsolete.

Key operational truth: the ZIL is only read during crash recovery. Under normal operation, it’s written, not read. A SLOG device isn’t there to improve steady-state reads; it’s there to accept small, latency-sensitive sync writes quickly and safely.

Another truth: if you don’t have a separate SLOG, ZIL blocks live on the main pool. That can be fine on very fast, low-latency pools—especially all-flash with power-loss protection. On slower pools (spinning disks, or SSDs with poor latency under sync writes), it can be brutal.

SLOG realities: what it accelerates, what it cannot

People buy a SLOG for the same reason they buy better coffee: they want mornings to hurt less. But a SLOG is not a miracle device. It accelerates one thing: latency of synchronous write acknowledgments.

What a SLOG helps

  • Sync-heavy NFS workloads that are bottlenecked on commit latency.
  • Databases that are commit-latency bound (and where the DB insists on durable commits).
  • VM clusters where flushes are frequent and blocking.

What a SLOG does not help

  • Read latency or read IOPS (that’s ARC/L2ARC territory, and even that has nuance).
  • Asynchronous bulk write throughput (that’s mostly the main vdevs and TXG behavior).
  • Workloads that barely issue sync writes; you won’t feel it.

What makes a good SLOG

Two properties: low latency under sync writes and power-loss protection. You want the device to acknowledge writes only when data is actually safe. Consumer SSDs can be fast but may lie about flush durability. Enterprise SSDs (or devices designed for write logging) usually do better here because they have capacitors and firmware designed for this job.

Size is not the headline. The SLOG only needs to hold a short window of outstanding sync transactions—typically seconds worth, not the whole dataset. Latency, not capacity, is what you’re buying.
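
A rough sizing sketch, assuming the OpenZFS default TXG timeout of about 5 seconds and a sync ingest ceiling of roughly 1.2 GB/s (a saturated 10 GbE link): the SLOG only has to absorb a couple of TXG intervals of in-flight sync data, so about 2 × 5 s × 1.2 GB/s ≈ 12 GB. Almost any modern device is overprovisioned for that; spend the budget on latency and power-loss protection instead.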

Mirroring the SLOG

If a SLOG device fails, the pool can continue (ZFS can fall back), but you risk losing the last acknowledged synchronous transactions if the log device dies in the wrong way. In practice, many teams mirror the SLOG for paranoia that’s actually justified—especially on shared storage for VMs or databases.

Three corporate-world mini-stories (realistic war stories)

Mini-story #1: An incident caused by a wrong assumption

The storage team inherited a ZFS-backed NFS cluster feeding a build farm and a handful of “small” internal services. A new service showed up: a stateful queueing system that wrote tiny messages constantly and relied on synchronous commits for correctness. Everyone assumed it would be “like the other apps,” because it lived in the same Kubernetes cluster and used the same NFS class.

It wasn’t. The queueing service forced stable writes, and latency went from “fine” to “what is happening” overnight. Build jobs started timing out in places that made no sense. The network team got pulled in because NFS latency graphs looked like packet loss. The SRE on call restarted daemons, then nodes, then questioned reality.

After a few hours of whack-a-mole, someone ran zpool iostat -v 1 and noticed a tiny device (the only SLOG) pinned at high write ops with ugly latency, while the main pool had headroom. The SLOG was an old consumer SATA SSD—fast in benchmarks, terrible at sustained sync writes, and likely not honoring flush semantics reliably. It had become the choke point and the source of tail latency.

The fix was mundane: replace the SLOG with a proper low-latency, power-loss-protected device and mirror it. Performance stabilized immediately. The postmortem wasn’t about blame; it was about assumptions. The environment changed from “mostly async build artifacts” to “sync-heavy correctness workload,” and the storage settings didn’t evolve with it.

Mini-story #2: An optimization that backfired

A virtualization platform team had a ZFS pool hosting VM disks (zvols). A new cluster came online, and under load it underperformed compared to the old one—even though the hardware was newer. Someone found a blog post suggesting sync=disabled for “massive performance gains” on VM storage. They tested it on a staging host with synthetic IO and got gorgeous numbers.

So they rolled it out gradually. The performance win was real: commit latency plummeted, IO wait dropped, tenants stopped complaining. It felt like hero work—until the first real power event. Not a catastrophic datacenter outage, just a breaker trip during maintenance that took out a rack. The hosts rebooted, storage imported cleanly, and most VMs came up fine.

But a few didn’t. Filesystems were corrupted in ways that journaling didn’t fix. One database VM came up but was missing the last few minutes of committed transactions—transactions the app had acknowledged to upstream services. The “fast” setting had silently changed the meaning of “commit” inside those guests. That caused logical corruption: not broken blocks, but broken business truth.

The rollback was easy. The cleanup wasn’t. They had to reconcile missing orders and reprocess events from upstream logs. In the final write-up, the lesson wasn’t “never tune ZFS.” It was: if you disable sync on VM storage, you are making a durability promise on behalf of every guest, and you’d better have an architecture that tolerates it. Otherwise your benchmark is just a confidence trick.

Mini-story #3: A boring but correct practice that saved the day

A financial services team ran a pair of ZFS storage servers exporting NFS to an application cluster. They had a habit that looked tedious: before each quarterly traffic spike, they ran a “durability drill.” It was simple: verify dataset sync properties, verify SLOG health, verify that flushes behaved as expected, and run a controlled power-loss simulation in a lab environment with the same hardware class.

Engineers complained it was cargo cult. Nothing ever failed in the drill. That was the point. It was the storage equivalent of checking your parachute while still on the ground.

Then they had a real incident: a firmware bug caused one of the log devices to start reporting healthy while silently timing out under certain queue depths. ZFS didn’t immediately fault it, but latency spiked whenever the workload entered a sync-heavy phase. Because the team had baseline latency numbers from prior drills, they detected the deviation quickly and didn’t waste a night blaming NFS, the network, or the application.

They swapped the SLOG pair during a maintenance window, and the cluster returned to normal. No data loss, no mystery. It was boring engineering, which is what you want when money is involved.

Practical tasks: commands, what to look for, and how to interpret

The commands below assume a typical OpenZFS environment on Linux. Adjust pool/dataset names to match your system. Each task includes what the output means in practice.

Task 1: Find your dataset sync settings (and inheritance)

cr0x@server:~$ zfs get -r -o name,property,value,source sync tank
NAME               PROPERTY  VALUE     SOURCE
tank               sync      standard  default
tank/vm            sync      disabled  local
tank/vm/critical   sync      disabled  inherited from tank/vm
tank/nfs           sync      standard  local

Interpretation: Don’t just look at the pool root. A single “helpful” dataset with sync=disabled can hide under your VM storage hierarchy, and children inherit it silently: here tank/vm/critical picked up disabled from its parent. The source column is your truth serum: local vs inherited vs default.

Task 2: Check whether you even have a SLOG (and what it is)

cr0x@server:~$ zpool status -v tank
  pool: tank
 state: ONLINE
  scan: scrub repaired 0B in 00:18:42 with 0 errors on Mon Dec 16 03:10:05 2025
config:

        NAME                        STATE     READ WRITE CKSUM
        tank                        ONLINE       0     0     0
          mirror-0                  ONLINE       0     0     0
            nvme0n1                 ONLINE       0     0     0
            nvme1n1                 ONLINE       0     0     0
        logs
          mirror-1                  ONLINE       0     0     0
            nvme2n1                 ONLINE       0     0     0
            nvme3n1                 ONLINE       0     0     0

errors: No known data errors

Interpretation: The logs section is your SLOG. If it’s missing, your ZIL lives on the main vdevs. If it’s a single device (not mirrored), decide if you can tolerate the risk profile.

Task 3: Watch per-vdev latency under load

cr0x@server:~$ zpool iostat -v tank 1
                              capacity     operations     bandwidth
pool                        alloc   free   read  write   read  write
--------------------------  -----  -----  -----  -----  -----  -----
tank                        1.20T  2.40T    110    980  8.5M  72.1M
  mirror-0                  1.20T  2.40T    110    220  8.5M  15.4M
    nvme0n1                     -      -     60    120  4.2M   7.8M
    nvme1n1                     -      -     50    100  4.3M   7.6M
logs                            -      -      -      -      -      -
  mirror-1                       -      -      0    760      0  56.7M
    nvme2n1                      -      -      0    380      0  28.4M
    nvme3n1                      -      -      0    380      0  28.3M
--------------------------  -----  -----  -----  -----  -----  -----

Interpretation: If the log devices are carrying most of the write ops while the main vdevs are calm, your workload is sync-heavy. If performance is bad, the SLOG is the first suspect.

Task 4: Confirm ZFS sees the right ashift (sector size alignment)

cr0x@server:~$ zdb -C tank | grep -E "ashift|vdev_tree" -n | head
52:        vdev_tree:
83:            ashift: 12
121:            ashift: 12

Interpretation: ashift=12 means 4K sectors. Misalignment (too small ashift) can create write amplification and ugly latency, including for sync traffic.

Task 5: Check ZFS pool health and error counters

cr0x@server:~$ zpool status tank
  pool: tank
 state: ONLINE
  scan: scrub repaired 0B in 00:18:42 with 0 errors on Mon Dec 16 03:10:05 2025
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            nvme0n1 ONLINE       0     0     0
            nvme1n1 ONLINE       0     0     0

errors: No known data errors

Interpretation: Sync performance debugging on a pool with latent errors is like tuning an engine with a broken oil pump. Fix integrity first.

Task 6: Identify sync-heavy callers by watching fsync activity (Linux)

cr0x@server:~$ sudo perf trace -e fsync,fdatasync -a -- sleep 10
     0.000 ( 0.008 ms): postgres/1931 fdatasync(fd: 7) = 0
     0.012 ( 0.006 ms): postgres/1931 fdatasync(fd: 7) = 0
     0.030 ( 0.010 ms): qemu-kvm/2210 fsync(fd: 31) = 0
     0.044 ( 0.009 ms): qemu-kvm/2210 fsync(fd: 31) = 0

Interpretation: You’re looking for frequency. If you see thousands of sync calls per second, your latency budget is being spent on durability.

Task 7: Measure sync write latency with fio (direct, sync-like)

cr0x@server:~$ fio --name=syncwrite --filename=/tank/test/syncwrite.bin \
  --rw=write --bs=4k --iodepth=1 --numjobs=1 --direct=1 \
  --fsync=1 --size=1G --runtime=30 --time_based=1
syncwrite: (g=0): rw=write, bs=4K-4K, ioengine=psync, iodepth=1
...
  write: IOPS=6200, BW=24.2MiB/s (25.4MB/s)(726MiB/30001msec)
    clat (usec): min=90, max=9200, avg=160.4, stdev=210.7

Interpretation: --fsync=1 forces a flush each write, approximating worst-case commit behavior. The average latency is nice; the max and tail are what your users feel.

Task 8: Compare behavior with sync=disabled on a disposable dataset

cr0x@server:~$ sudo zfs create tank/test_nodur
cr0x@server:~$ sudo zfs set sync=disabled tank/test_nodur
cr0x@server:~$ fio --name=syncwrite --filename=/tank/test_nodur/syncwrite.bin \
  --rw=write --bs=4k --iodepth=1 --numjobs=1 --direct=1 \
  --fsync=1 --size=1G --runtime=15 --time_based=1
...
  write: IOPS=42000, BW=164MiB/s (172MB/s)(2462MiB/15001msec)
    clat (usec): min=18, max=480, avg=22.7, stdev=7.1

Interpretation: This delta demonstrates exactly what you’re trading away: durable acknowledgment. Treat this result as a diagnostic tool, not a “solution.” If the only way to be fast is to lie, the real fix is improving the sync path (often SLOG).
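
When the comparison is done, destroy the disposable dataset so the disabled setting can't leak into real use:

cr0x@server:~$ sudo zfs destroy tank/test_nodur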

Task 9: Inspect whether datasets are using zvols (common for VM storage)

cr0x@server:~$ zfs list -t volume -o name,volsize,volblocksize,sync,logbias
NAME           VOLSIZE  VOLBLOCKSIZE  SYNC      LOGBIAS
tank/vm-001      120G         16K     standard  latency
tank/vm-002      200G         16K     disabled  latency

Interpretation: zvols often carry VM IO patterns. Notice any sync=disabled surprises. Also note volblocksize—it influences IO amplification and latency.

Task 10: Check logbias (latency vs throughput preference)

cr0x@server:~$ zfs get -r logbias tank/vm
NAME      PROPERTY  VALUE    SOURCE
tank/vm   logbias   latency  default

Interpretation: logbias=latency encourages using the log device for sync writes. throughput may reduce log usage in some cases. This is not a substitute for correct sync, but it can influence behavior for certain workloads.

Task 11: Confirm compression and recordsize (they can change the sync story indirectly)

cr0x@server:~$ zfs get compression,recordsize tank/nfs
NAME      PROPERTY     VALUE     SOURCE
tank/nfs  compression  lz4       local
tank/nfs  recordsize   128K      default

Interpretation: Compression can reduce physical IO (often good), but it increases CPU work. If your “sync is slow” problem is actually CPU saturation during TXG commits, you’ll chase the wrong culprit unless you measure.

Task 12: Check real-time CPU pressure and IO wait during sync storms

cr0x@server:~$ vmstat 1 10
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  0      0 712344  89244 3821000   0    0     8   420  980 1900 12  6 78  4  0
 5  3      0 701112  89244 3818200   0    0     0  8200 1200 2400 10  7 55 28  0
 3  2      0 700980  89244 3818500   0    0     0  7900 1180 2300 11  7 56 26  0

Interpretation: Rising wa (IO wait) during sync-heavy periods suggests the storage path is blocking CPU progress. If wa is low but latency is high, you might be facing queueing, device flush behavior, or a saturated log device.

Task 13: Spot network filesystems forcing sync semantics (NFS server angle)

cr0x@server:~$ sudo nfsstat -s
Server rpc stats:
calls      badcalls   badfmt     badauth    badclnt
18233990   0          0          0          0

Server nfs v4:
null         compound
0            11239821

Server nfs v3:
null        getattr     setattr     lookup      access
0           226001      8402        110240      90412
commit      write       create      mkdir       remove
340882      901220      4800        220         1900

Interpretation: High NFS commit activity is a clue: clients are demanding stable storage semantics. If commits are frequent and slow, your ZIL/SLOG path matters more than your bulk throughput.

Task 14: Verify the log devices are actually doing work (or not)

cr0x@server:~$ iostat -x 1 5 | egrep 'nvme2n1|nvme3n1|nvme0n1'
nvme2n1  0.00  740.00  0.00  56000.00  0.00  74.20  0.02  0.35  3.10  0.20  74.9
nvme3n1  0.00  735.00  0.00  55800.00  0.00  73.80  0.02  0.34  3.00  0.19  75.1
nvme0n1 55.00  120.00  8200.00  15400.00  3.40  15.30  0.10  0.80  2.10  0.40  25.0

Interpretation: If log devices show high write rates and high utilization, they are in the hot path. If they show near-zero activity under a sync-heavy workload, either the workload isn’t actually sync, the dataset isn’t using standard/always, or the system is configured differently than you think.

Fast diagnosis playbook (what to check first, second, third)

When “ZFS is slow” hits your pager, you don’t have time for philosophy. You need a triage sequence that finds the bottleneck fast.

First: Determine if the pain is sync-related

  1. Check dataset sync properties in the hot path (zfs get -r sync on the relevant datasets/zvols).
  2. Watch zpool iostat -v 1 and look for log device write dominance (or main vdevs struggling with small writes).
  3. On Linux, sample fsync/fdatasync frequency with perf trace (or app-level logs/metrics).

Decision: If you see frequent sync calls and high log activity/latency, it’s a sync path bottleneck. If not, don’t waste your time blaming sync.

Second: Locate the choke point (SLOG vs main vdevs vs CPU)

  1. If there is a SLOG: inspect its device class, health, and utilization. Use iostat -x and zpool iostat -v to see if it’s saturated.
  2. If there is no SLOG: check the main vdev latency under sync-heavy load. Spinning disks doing sync writes can look like “random freezes” to applications.
  3. Check CPU saturation and IO wait (vmstat, top). A busy system can make sync commits look like storage when it’s really scheduling and contention.

Third: Validate durability expectations before “fixing” performance

  1. Identify the workload class: database, VM disks, NFS home directories, build cache, etc.
  2. Ask: what is the acceptable data loss window on sudden power loss? “None” is a valid answer.
  3. Only then consider changes like adding a SLOG, mirroring it, adjusting datasets, or (rarely) disabling sync for truly disposable data.

Common mistakes (with specific symptoms and fixes)

Mistake 1: Disabling sync on VM or database storage “because it’s slow”

Symptoms: Benchmarks and latency improve dramatically; later, after a crash or power event, you see filesystem corruption, missing committed DB transactions, or inconsistent application state.

Fix: Restore sync=standard (or always if appropriate). Invest in a proper SLOG and/or reduce sync pressure at the application layer via supported settings (with explicit risk acceptance).

Mistake 2: Buying a SLOG that is fast but not power-loss safe

Symptoms: Great performance; later, unexplained data integrity issues after abrupt power loss; or you discover the device ignores flushes.

Fix: Use power-loss-protected devices designed for sustained low-latency writes. Mirror the SLOG when the workload’s integrity matters.

Mistake 3: Expecting SLOG to speed up everything

Symptoms: You add a SLOG and see no improvement; or only a small improvement; users still complain about read latency.

Fix: Confirm the workload issues sync writes. If it’s read-heavy, focus on ARC sizing, vdev count, recordsize, and the underlying device performance—not SLOG.

Mistake 4: Tuning without measuring tail latency

Symptoms: Average latency looks fine; users complain about stalls; VM pauses; database commit spikes.

Fix: Measure and graph percentiles (p95/p99). Sync bottlenecks often show up as tail latency from queueing and flush storms.
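
A hedged illustration of where to look in fio output (the numbers are made up): the clat percentiles block, not the average line:

    clat percentiles (usec):
     |  1.00th=[  105], 50.00th=[  140], 95.00th=[  310],
     | 99.00th=[ 1450], 99.99th=[ 8900]

If p99 is ten times the median, your users are living through the stalls your dashboard's average hides.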

Mistake 5: Mixing “critical” and “throwaway” data on the same dataset policy

Symptoms: Someone sets sync=disabled for a cache directory and accidentally applies it to a parent dataset used by real data; or inheritance changes after refactors.

Fix: Use separate datasets with explicit properties. Verify with zfs get -r -o name,property,value,source sync and bake a check into provisioning, as sketched below.
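
A minimal sketch of such a check, assuming tank/vm and tank/nfs are the parents that must never carry sync=disabled (adjust names to your hierarchy):

cr0x@server:~$ zfs get -H -r -o name,value sync tank/vm tank/nfs | awk '$2 == "disabled" {print "UNSAFE:", $1; bad=1} END {exit bad}'
cr0x@server:~$ echo $?
0

Exit code 0 means no dataset under those parents is quietly lying about durability; wire it into provisioning or monitoring so an inheritance change can't slip past a refactor.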

Mistake 6: Ignoring that NFS semantics can force sync behavior

Symptoms: NFS clients show high latency; server CPU/network look fine; storage throughput isn’t maxed; yet operations block.

Fix: Inspect NFS commit/write patterns and ensure the server’s sync path is fast and safe (SLOG + proper devices). Don’t “fix” it by disabling sync unless the data is genuinely disposable.

Checklists / step-by-step plan

Checklist A: Decide the correct sync mode per dataset

  1. Classify the data: critical state, user data, database, VM images, cache, scratch, replicated ephemeral.
  2. Define acceptable loss on crash: none, seconds, minutes, “doesn’t matter.” Write it down.
  3. Set sync=standard for most real data.
  4. Consider sync=always only when you need to enforce durability regardless of caller behavior.
  5. Use sync=disabled only for explicitly disposable datasets (and keep them isolated).

Checklist B: Add a SLOG safely (when justified)

  1. Prove you’re sync-limited: measure fsync rate and ZIL/SLOG utilization.
  2. Choose devices with power-loss protection and predictable latency under sustained writes.
  3. Mirror the SLOG for important workloads.
  4. Add it and validate with real workload tests, not just synthetic throughput (a minimal command sketch follows).
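
A minimal sketch of steps 3 and 4, assuming a pool with no log vdev yet and new devices nvme4n1 and nvme5n1 (names are hypothetical; triple-check device paths before touching a production pool):

cr0x@server:~$ sudo zpool add tank log mirror nvme4n1 nvme5n1
cr0x@server:~$ zpool status tank | grep -A3 logs
        logs
          mirror-1                  ONLINE       0     0     0
            nvme4n1                 ONLINE       0     0     0
            nvme5n1                 ONLINE       0     0     0

Then rerun the Task 7 fio test alongside the real workload and compare sync write latency before and after; the synthetic number alone proves nothing about your commit path under contention.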

Step-by-step: Implement a dedicated “safe fast sync” dataset

This is a common pattern for NFS exports or database volumes where you want durability with reasonable performance.

cr0x@server:~$ sudo zfs create tank/prod_safe
cr0x@server:~$ sudo zfs set sync=standard tank/prod_safe
cr0x@server:~$ sudo zfs set logbias=latency tank/prod_safe
cr0x@server:~$ sudo zfs set atime=off tank/prod_safe
cr0x@server:~$ sudo zfs set compression=lz4 tank/prod_safe
cr0x@server:~$ zfs get -o name,property,value,source sync,logbias,atime,compression tank/prod_safe
NAME            PROPERTY     VALUE     SOURCE
tank/prod_safe  sync         standard  local
tank/prod_safe  logbias      latency   local
tank/prod_safe  atime        off       local
tank/prod_safe  compression  lz4       local

Interpretation: This doesn’t “guarantee fast,” but it aligns ZFS behavior with a low-latency sync strategy while keeping durability intact.

Step-by-step: Create an explicitly disposable dataset (and label it like it’s radioactive)

cr0x@server:~$ sudo zfs create tank/scratch_ephemeral
cr0x@server:~$ sudo zfs set sync=disabled tank/scratch_ephemeral
cr0x@server:~$ sudo zfs set compression=off tank/scratch_ephemeral
cr0x@server:~$ sudo zfs set atime=off tank/scratch_ephemeral
cr0x@server:~$ zfs get -o name,property,value,source sync,compression tank/scratch_ephemeral
NAME                    PROPERTY     VALUE     SOURCE
tank/scratch_ephemeral  sync         disabled  local
tank/scratch_ephemeral  compression  off       local

Interpretation: Make the blast radius obvious. Keep it out of inheritance paths used by real data.

FAQ

1) Does sync=disabled mean “I might lose the last few seconds”?

It can be worse. You can lose data that the application believed was safely committed. That can cause logical corruption: a database or service may acknowledge a transaction and then forget it after a crash. “A few seconds” is not a reliable bound unless the whole stack is designed with that failure mode in mind.

2) If I have a UPS, can I safely disable sync?

A UPS reduces risk but doesn’t eliminate it. Kernel panics, controller resets, firmware bugs, accidental power cuts, and dual power supply failures still happen. Also, a UPS doesn’t help if a device lies about flush completion. If you need durability guarantees, treat UPS as defense-in-depth, not a substitute.

3) Do I need a SLOG on an all-NVMe pool?

Not always. Some NVMe pools have low enough latency that on-pool ZIL is fine. The deciding factor is sync write latency under your workload and the devices’ flush behavior. Measure before buying hardware.

4) Should the SLOG be large?

Usually no. The SLOG holds a short-term log of sync operations until TXGs commit. Latency consistency and power-loss safety matter far more than capacity. Oversizing is common and mostly harmless, but it’s rarely the bottleneck or the win.

5) What’s the difference between ZIL and SLOG again?

The ZIL is a mechanism: the intent log used to satisfy sync semantics. The SLOG is a device: a dedicated place to store ZIL records, ideally faster and safer than the main pool for that specific write pattern.

6) Why is NFS slow on my ZFS server even though local writes are fast?

NFS clients often request stable writes. Local testing often uses buffered/asynchronous IO that doesn’t force flushes. If your ZIL/SLOG path is slow, NFS latency will look terrible while bulk throughput looks fine.

7) Is sync=always safer than sync=standard?

It can be, in the sense that it forces durability even if the application fails to request it. But it can also cause large performance penalties. Use it when you have a specific reason to distrust callers—and when your sync path (SLOG or main pool) can handle it.

8) Can I “fix” sync latency by changing recordsize or compression?

Sometimes indirectly. Sync latency is often about flush/commit behavior and device latency. But if TXG commits are causing backpressure, CPU and write amplification can matter. Don’t guess: measure sync call latency, device latency, and TXG behavior together.

9) If I mirror my SLOG, does that double latency?

It can increase it somewhat, because writes must be committed to both devices. But with proper low-latency devices, mirrored SLOG is often still dramatically faster than logging to slower main vdevs—and it improves resilience against log device failure.

10) What’s the most operator-friendly rule of thumb?

If the workload would file a bug report when it loses a “committed” write, don’t use sync=disabled. Instead, make sync fast the honest way: good devices, good SLOG (when needed), and workload-aware dataset design.

Conclusion

ZFS sync is one of those settings that feels like a cheat code because it can be. It can also be a trap. The property isn’t about “making ZFS faster.” It’s about whether ZFS will honor the caller’s durability contract—or quietly rewrite it.

The production mindset is to treat sync as a policy decision, not a tuning hack. Start by identifying who’s asking for synchronous semantics and why. Measure the ZIL/SLOG path, not just bulk throughput. If you need performance and truth-in-acknowledgment, invest in a proper sync write path and keep the settings aligned with the business promise your systems are making.
