There are two kinds of ZFS performance problems: the ones you can fix with architecture, and the ones you try to fix with hardware purchases at 2 a.m. A Separate Log device (SLOG) lives right on that line. For the right workload, it turns “why does every fsync take forever?” into “oh, it’s fine now.” For the wrong workload, it’s a fancy paperweight that adds new failure modes.
This is a field guide written from the perspective of people who actually get paged: what SLOG is, what it isn’t, the “gotchas” that burn teams, and the commands you’ll run when the graph goes vertical. We’ll talk about sync semantics, latency, power-loss protection, and how to tell whether you’re about to make things better—or excitingly worse.
What a SLOG actually is (and why it exists)
Let’s get vocabulary straight, because “SLOG” is one of those terms that gets used like it’s a turbo button.
ZIL is the ZFS Intent Log. It exists in every ZFS pool. It is not a separate device by default. It’s a mechanism: when an application asks for a synchronous write, ZFS must confirm that write is durable (or will survive power loss) before acknowledging it. ZFS accomplishes this by writing a record of the operation to the ZIL, then later folding that change into the main on-disk structures during normal transaction group (TXG) commits.
SLOG is a Separate LOG device: a dedicated vdev where ZFS stores ZIL records instead of using the main pool disks. The point is latency. Rotational disks (and even many SSDs) are terrible at low-latency small sync writes when they’re busy doing other things. A good SLOG device can acknowledge sync writes quickly, then let the pool’s main vdevs absorb bulk work asynchronously.
Here’s the trap: SLOG only helps synchronous writes. If your workload is mostly asynchronous writes—database buffering, streaming ingestion, large sequential copies, backup jobs—SLOG can do nothing, because ZFS wasn’t waiting on the ZIL in the first place.
One more thing: a SLOG device is not a write cache in the sense many people mean. The SLOG does not “hold all writes.” It holds a rolling window of recently acknowledged sync operations, typically seconds of activity, until they’re committed into the pool proper. If the system crashes, ZFS replays those ZIL records on import. If the SLOG lies about durability, you get to experience data archaeology.
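If you want to feel that difference on a real pool, a throwaway dd comparison makes it concrete. A minimal sketch, assuming /tank/scratch is a disposable dataset you can write junk into; the second run adds oflag=dsync, so every 4 KiB write must reach stable storage (and therefore the ZIL) before dd continues.
cr0x@server:~$ dd if=/dev/zero of=/tank/scratch/async.bin bs=4k count=5000
cr0x@server:~$ dd if=/dev/zero of=/tank/scratch/sync.bin bs=4k count=5000 oflag=dsync
The first command is acknowledged from memory and flushed later with the TXG; the second is the small-sync-write pattern a SLOG exists to absorb.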
Joke #1 (mandatory, and earned): Buying a SLOG for an async workload is like installing a fire escape in a swimming pool—technically impressive, operationally irrelevant.
Interesting facts and short history
Short, concrete context points that help you reason about what you’re touching:
- ZFS was born in an era of “write ordering lies.” Storage stacks historically acknowledged writes before they were truly durable; ZIL exists to give correct semantics to applications that demand it.
- The ZIL is not a journaled filesystem journal. It’s closer to an intent log for recent synchronous operations, not a full metadata journal replaying whole filesystem state.
- SLOG became a common topic when NFS exports got popular. NFS clients (especially for VMs) often issue lots of sync writes or force them via mount/export options, turning latency into a billing event.
- Early SSDs were fast but dishonest. Many consumer SSDs acknowledged flushes quickly but didn’t actually persist data safely during power loss; this is why power-loss protection (PLP) became the SLOG litmus test.
- TXG timing shapes everything. ZFS batches changes into transaction groups; the default commit interval (often around 5 seconds) explains why the SLOG doesn’t need to be huge.
- Special vdev types evolved. ZFS grew dedicated vdev roles (log, cache, special, dedup). SLOG is one of the oldest “separate role” vdevs and also one of the most misunderstood.
- Some vendors shipped “SLOG-in-a-box.” Storage appliances marketed dedicated NVRAM or mirrored log devices because low-latency sync acknowledgment sells well to VM and database buyers.
- “sync=disabled” is the scarlet letter of postmortems. Teams keep rediscovering that performance you “gained” by lying to applications can become data loss you can’t explain to auditors.
When a SLOG helps, when it’s useless, when it’s dangerous
When a SLOG helps
A SLOG helps when your bottleneck is sync write latency. Not throughput. Not read IOPS. Not RAIDZ math. Latency for operations that must be committed before the caller continues.
Classic cases:
- NFS for virtualization (VM disks over NFS). Guest filesystems call fsync. Hypervisors can be sync-happy. The result is a steady rain of small sync writes.
- Databases configured for strong durability (or explicitly issuing fsync on commit). If the database is on ZFS and the dataset is sync-heavy, a good SLOG can reduce commit latency spikes.
- Applications using O_DSYNC / O_SYNC or frequent fsync calls (loggers that actually care, message queues, some CI systems doing paranoid file ops).
- Workloads dominated by small random sync writes where your main vdevs are HDDs or busy SSDs and cannot deliver low, consistent latency.
What you’ll notice when SLOG is the right fix: the system is not “slow everywhere.” It’s slow specifically at sync points. CPU is fine. ARC is fine. Reads are fine. But transactions stall. Latency percentiles look ugly. Users describe “hiccups” rather than steady slowness.
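If you have fio on the box, a one-minute probe against the suspect dataset shows those percentiles directly. A hedged sketch, assuming fio is installed and tank/vmstore is the dataset under suspicion; --fdatasync=1 tells fio to call fdatasync after every write, which is exactly the pattern that hurts.
cr0x@server:~$ fio --name=synclat --directory=/tank/vmstore --rw=randwrite --bs=4k --size=256m --ioengine=psync --fdatasync=1 --runtime=60 --time_based
Read the completion latency percentiles (and the fsync/fdatasync latency section, if your fio version prints one). A healthy sync path is boring at p99; a sick one shows exactly the hiccups users are describing.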
When a SLOG is useless
SLOG is useless when your workload is mostly async, or when the real bottleneck is elsewhere.
- Bulk sequential writes (backups, media ingestion, large file copies). ZFS will buffer in memory and flush in TXGs; SLOG isn’t in the critical path.
- Read-heavy workloads. SLOG has nothing to do with reads. That’s ARC/L2ARC and vdev layout.
- CPU-bound compression, checksumming, or encryption. If you’re pegging cores doing zstd or AES, a SLOG won’t unpeg them.
- Network-bound NFS/SMB. If your NICs are saturated, a SLOG is a very expensive way to keep being saturated.
- Already-low-latency pools (all-NVMe mirrors with plenty of headroom). Sometimes the pool is already faster than your app’s sync cadence; adding a log device adds complexity without benefit.
A common failure pattern: someone sees “write latency” and assumes SLOG. But the writes are async, or the latency is from TXG sync pressure due to memory constraints, or from a single slow disk in a RAIDZ vdev. SLOG doesn’t fix physics; it only changes where sync records land.
When a SLOG is dangerous
SLOG becomes dangerous when it becomes a single point of failure for acknowledged sync writes or when it tempts you into lying about durability.
Risk situations:
- Using a non-PLP consumer SSD as SLOG. It may acknowledge a flush, then lose the last seconds of writes on power loss. ZFS will faithfully replay what it has; the missing part is your problem.
- Using a single-disk SLOG in a system where log loss is unacceptable. If the SLOG device dies in a way that loses in-flight ZIL records, you can lose the sync writes that were already acknowledged to clients.
- Misusing sync settings (e.g., sync=disabled) to “get performance back.” This is less “dangerous” and more “eventually career-limiting.”
- Overconfidence in mirrored SLOG. Mirroring protects against device failure, but it does not magically add power-loss protection or correct broken flush behavior.
- Adding SLOG to mask an undersized pool. If your main vdevs can’t keep up with the eventual TXG commits, the SLOG can reduce apparent latency while pushing the pool into a backlog. That backlog ends as a cliff.
Joke #2: A SLOG without power-loss protection is like a parachute made of lace—stylish, lightweight, and the last thing you want to discover is “mostly decorative.”
How ZIL, SLOG, TXGs, and sync writes really work
To operate SLOG confidently, you need the mental model. Here’s the one that survives incidents.
Async writes: the happy path
Most writes are asynchronous. The app writes; the OS hands data to ZFS; ZFS puts the changes in memory (ARC and other in-memory structures) and later flushes them to disk as part of a transaction group commit. ZFS batches work to be efficient: fewer seeks, more streaming writes, better compression, better checksumming, better overall throughput.
For async writes, the acknowledgement doesn’t require durable storage. That’s why SLOG is irrelevant: ZFS isn’t waiting on a ZIL write for the application to proceed.
Sync writes: “I need this to survive a crash”
Synchronous writes (or operations that require a synchronous commit) are different. The app says “don’t tell me it’s written until it’s safe.” ZFS must provide a persistence point. It can’t wait for a full TXG commit for every fsync—performance would be terrible—so it writes a log record describing the changes into the ZIL. Once the ZIL record is safely on stable storage, ZFS can acknowledge the sync operation.
Later, when the TXG commits, the actual blocks are written to their final locations in the pool, and the corresponding ZIL records become obsolete. The ZIL is not a permanent log; it’s a short-term promise ledger.
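If you are not sure whether an application actually issues sync operations, ask the kernel rather than the vendor. A quick sketch using strace on Linux; 12345 is a placeholder PID, and strace adds overhead, so attach to a representative process rather than your busiest one.
cr0x@server:~$ sudo strace -f -c -e trace=fsync,fdatasync,sync_file_range -p 12345
Let it run through a slow period, stop it with Ctrl-C, and read the call counts and cumulative time. Significant fsync/fdatasync activity means the ZIL (and therefore a SLOG) sits in the critical path; near-zero activity means you are shopping for the wrong fix.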
Where the ZIL lives without a SLOG
Without a separate log device, the ZIL lives on the main pool vdevs. That means sync writes compete with everything else: normal writes, reads, scrubs, resilvers, metadata work. On spinning disks, this is often catastrophic for latency: the head seeks to write a small record, then seeks back to continue sequential work, and you’ve just invented random I/O in the middle of a sequential pipeline.
What a SLOG changes
Adding a SLOG moves ZIL writes to a dedicated, typically low-latency device. Now sync writes hit the SLOG, get acknowledged quickly, and the main pool continues its TXG batching mostly undisturbed. The pool still must eventually write the real data during TXG commits. SLOG doesn’t remove the underlying write load; it decouples ack latency from bulk commit throughput.
Why SLOG size is usually small
ZIL records only need to cover the window between a sync acknowledgement and the next successful TXG commit that includes the corresponding data. In steady state, this is seconds, not hours. That’s why most SLOG devices can be small in capacity but must be excellent in latency and power safety.
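A back-of-envelope check, assuming Linux OpenZFS where the TXG commit interval is exposed as a module parameter:
cr0x@server:~$ cat /sys/module/zfs/parameters/zfs_txg_timeout
5
A common rule of thumb (a guideline, not something ZFS enforces) is peak sync write throughput multiplied by a couple of TXG intervals: even 1 GB/s of sustained sync writes over roughly 10 seconds is about 10 GB, which is why a small, excellent device beats a large, mediocre one.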
What happens on crash or power loss
If the system crashes, ZFS imports the pool and replays the ZIL records to bring the filesystem back to a consistent state with respect to acknowledged sync operations. If the SLOG device held those records and it’s present and consistent, you get correctness and a modest import delay depending on how much needs replaying.
If the SLOG device is missing or dead, behavior depends on what exactly happened and how the log vdev failed. Best case: ZFS can fall back to the pool and you lose no acknowledged sync writes because they had already been committed (or the log device failed cleanly). Worst case: your clients were told “your data is safe” and it isn’t. That’s why SLOG devices must be treated as durability components, not performance toys.
Two knobs you will see: sync and logbias
sync is a dataset property that affects how ZFS treats requests for synchronous behavior. It has values like standard (default), always, and disabled. These don’t change what the app asks for; they change how ZFS responds to it. If you set sync=disabled, you are choosing to lie to applications that requested durability.
logbias is a dataset property that can be latency (default) or throughput. With latency, ZFS optimizes sync writes for fast acknowledgement and uses the log device; with throughput, it biases large sync writes toward the main pool vdevs, largely bypassing the SLOG for the data itself. Setting it wrong can negate a SLOG or shift work in ways you don’t expect. It is not a magic “make SLOG faster” switch; it is a hint about what you care about.
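Inspecting and adjusting both properties is ordinary zfs get/set work. A sketch against this article’s example dataset; treat any change here as a production change and record the reasoning:
cr0x@server:~$ sudo zfs get -o name,property,value,source sync,logbias tank/vmstore
cr0x@server:~$ sudo zfs set logbias=latency tank/vmstore
cr0x@server:~$ sudo zfs set sync=standard tank/vmstore
The source column matters as much as the value: local means someone set it on that dataset deliberately (or not so deliberately), while inherited and default tell you where the behavior really comes from.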
What makes a good SLOG device (and what makes a bad one)
The real requirements: latency consistency and honest durability
A good SLOG device has:
- Low latency for small synchronous writes, and latency that stays consistent as queue depth fluctuates.
- Power-loss protection (PLP) so acknowledged writes survive sudden power loss.
- Correct flush/FUA behavior. ZFS relies on the device honoring barriers/flushes properly.
- Enough endurance for sustained sync workloads.
- Predictable performance under stress (garbage collection and thermal throttling are the silent killers of “it benchmarked fine”).
A bad SLOG device is usually “fast on paper” but lacks PLP or has firmware that treats flush as a suggestion. Consumer NVMe can be extremely fast for async writes and still be risky for sync durability if it uses volatile write caches without protection. The SLOG path is allergic to lies.
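When qualifying a candidate log device, test the one thing a SLOG is paid for: queue-depth-1 sync write latency under sustained load. A hedged sketch with fio; /dev/nvme1n1 is a placeholder for an empty candidate device, and this run overwrites its contents, so never point it at anything in use.
cr0x@server:~$ sudo fio --name=slog-qual --filename=/dev/nvme1n1 --direct=1 --sync=1 --rw=write --bs=4k --iodepth=1 --numjobs=1 --runtime=300 --time_based
Run it long enough to get past any cache burst and thermal honeymoon, and judge the device by whether its high-percentile latency stays flat, not by its average. A device without PLP can still post lovely numbers here; this screens for latency cliffs, while power-loss behavior has to come from the spec sheet and, ideally, a pull-the-plug test.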
Mirroring the SLOG: when it matters
If you cannot tolerate losing recent acknowledged sync writes due to a single device failure, mirror the SLOG. This is common in corporate environments where NFS exports back VM datastores and a single lost log record becomes a VM filesystem corruption story.
Mirroring is not optional “belt and suspenders” if the business expectation is “sync writes are safe.” It is part of making the log device as reliable as the rest of your storage design.
But don’t confuse mirrored with safe
Two unsafe devices mirrored together create a redundant way to be wrong. If both devices acknowledge flush without persistence, you can lose data twice as reliably. The non-negotiable property is power-loss protection and correct write ordering.
Don’t overspend on capacity
SLOG capacity is rarely the limiting factor. What matters is performance and safety. If you buy a 2 TB “gaming NVMe” as a log device, you bought the wrong thing expensively. A smaller enterprise SSD with PLP is usually the correct answer.
Placement and topology matter
Putting SLOG behind a flaky HBA, an expander that’s already saturated, or a controller with write-cache shenanigans can erase its benefits. The SLOG is on the sync path; the sync path needs to be boring and deterministic.
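Verifying the physical path takes a minute and can save an afternoon. A sketch for an NVMe log device on Linux; the PCI address is a placeholder you would look up first:
cr0x@server:~$ lspci | grep -i nvme
cr0x@server:~$ sudo lspci -s 03:00.0 -vv | grep -E "LnkCap:|LnkSta:"
Compare LnkSta (what the link actually negotiated) against LnkCap (what the device can do). A log device quietly running at x1 instead of x4, or at a lower PCIe generation, is exactly the kind of degradation that later surfaces as unexplained tail latency.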
Practical tasks: commands, outputs, and how to interpret them
These are the commands you actually run when someone says “ZFS is slow” and you need to prove which part is slow. Each task includes a command and what to look for. Adjust pool/dataset names to your environment.
Task 1: Confirm whether a SLOG exists and how it’s configured
cr0x@server:~$ sudo zpool status -v tank
pool: tank
state: ONLINE
scan: scrub repaired 0B in 00:32:10 with 0 errors on Wed Dec 18 03:12:20 2025
config:
NAME STATE READ WRITE CKSUM
tank ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
ata-HDD_A ONLINE 0 0 0
ata-HDD_B ONLINE 0 0 0
logs
mirror-1 ONLINE 0 0 0
nvme-INTEL_SLOG_A ONLINE 0 0 0
nvme-INTEL_SLOG_B ONLINE 0 0 0
errors: No known data errors
Interpretation: Look for a logs section. If it’s missing, you don’t have a separate log vdev. If it exists, note whether it’s mirrored. Also note if the log vdev shows errors; a sick SLOG is a correctness risk.
Task 2: Verify dataset sync behavior (the single most important property here)
cr0x@server:~$ sudo zfs get -o name,property,value,source sync tank/vmstore
NAME PROPERTY VALUE SOURCE
tank/vmstore sync standard default
Interpretation: standard means ZFS respects the application’s sync requests. always forces sync semantics (more durable, often slower). disabled is the “make it fast, make it fragile” mode.
Task 3: Check logbias because it changes the log behavior
cr0x@server:~$ sudo zfs get -o name,property,value,source logbias tank/vmstore
NAME PROPERTY VALUE SOURCE
tank/vmstore logbias latency default
Interpretation: latency is typically correct for VM and database workloads. throughput can reduce logging in some cases but may increase sync latency or change patterns. Treat changes here like production changes, because they are.
Task 4: Identify whether clients are forcing sync (common with NFS)
cr0x@server:~$ sudo zfs get -o name,property,value,source sync tank/nfs
NAME PROPERTY VALUE SOURCE
tank/nfs sync always local
Interpretation: If you see sync=always on an NFS dataset, expect heavy SLOG dependency. If you see sync=disabled, expect the postmortem to include the word “corruption.”
Task 5: Watch ZFS I/O in real time and see if the log device is active
cr0x@server:~$ sudo zpool iostat -v tank 1
capacity operations bandwidth
pool alloc free read write read write
-------------------------- ----- ----- ----- ----- ----- -----
tank 1.20T 2.40T 50 900 3.2M 56.0M
mirror-0 1.20T 2.40T 50 520 3.2M 42.0M
ata-HDD_A - - 25 260 1.6M 21.0M
ata-HDD_B - - 25 260 1.6M 21.0M
logs - - 0 380 0 14.0M
mirror-1 - - 0 380 0 14.0M
nvme-INTEL_SLOG_A - - 0 190 0 7.0M
nvme-INTEL_SLOG_B - - 0 190 0 7.0M
-------------------------- ----- ----- ----- ----- ----- -----
Interpretation: If your workload is sync-heavy, you should see write ops on the logs vdev. If logs show zero while apps complain about fsync latency, you may not be dealing with sync writes at all—or the dataset isn’t using the SLOG the way you think.
Task 6: Check overall pool latency and per-vdev behavior (spot the slow disk)
cr0x@server:~$ sudo zpool iostat -v -l tank 5
capacity operations bandwidth total_wait disk_wait
pool alloc free read write read write --------- ---------
tank 1.20T 2.40T 80 1100 5.1M 66.0M 18ms 12ms
mirror-0 1.20T 2.40T 80 650 5.1M 48.0M 22ms 16ms
ata-HDD_A - - 40 325 2.6M 24.0M 18ms 12ms
ata-HDD_B - - 40 325 2.6M 24.0M 27ms 20ms
logs - - 0 450 0 18.0M 1ms 1ms
mirror-1 - - 0 450 0 18.0M 1ms 1ms
nvme-INTEL_SLOG_A - - 0 225 0 9.0M 1ms 1ms
nvme-INTEL_SLOG_B - - 0 225 0 9.0M 1ms 1ms
Interpretation: This shows wait time. If one disk has significantly higher disk_wait, it can drag the vdev and pool. A SLOG won’t fix a sick vdev that can’t commit TXGs fast enough.
Task 7: Confirm whether sync writes are being generated at all (use a quick fsync micro-test)
cr0x@server:~$ cd /tank/vmstore
cr0x@server:/tank/vmstore$ sudo bash -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
cr0x@server:/tank/vmstore$ /usr/bin/time -f "elapsed=%e sec" bash -c 'dd if=/dev/zero of=./fsync.test bs=4k count=4096 conv=fdatasync'
4096+0 records in
4096+0 records out
16777216 bytes (17 MB, 16 MiB) copied, 0.42 s, 40.0 MB/s
elapsed=0.42 sec
Interpretation: conv=fdatasync forces a sync at the end. Run this with and without a SLOG (or on a comparable host). If you don’t see improvement, either the pool is already fast at sync, or you’re bottlenecked elsewhere (CPU, network, TXG pressure).
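One caveat: if the dataset has compression enabled, /dev/zero compresses to almost nothing and flatters the result. A variant with incompressible data and per-write sync (oflag=dsync) is closer to the small-sync-write pattern you are usually diagnosing:
cr0x@server:/tank/vmstore$ /usr/bin/time -f "elapsed=%e sec" dd if=/dev/urandom of=./fsync.test2 bs=4k count=4096 oflag=dsync
Expect this to be noticeably slower than the fdatasync-at-the-end version; the interesting comparison is how much a proper SLOG improves it.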
Task 8: Observe TXG behavior and look for commit pressure (Linux OpenZFS)
cr0x@server:~$ grep -E "zil_commit|zil_itx_metaslab" /proc/spl/kstat/zfs/zil
zil_commit_count                  4    182331
zil_commit_writer_count           4    180904
zil_itx_metaslab_normal_count     4    1201
zil_itx_metaslab_normal_bytes     4    9437184
zil_itx_metaslab_slog_count       4    181130
zil_itx_metaslab_slog_bytes       4    1484783616
cr0x@server:~$ tail -n 3 /proc/spl/kstat/zfs/tank/txgs
Interpretation: Counter names and locations vary by OpenZFS version, but the useful signal is the “slog” versus “normal” split: if the slog counters climb under load, ZIL records are landing on the separate log device; if the normal counters climb instead, the ZIL is being written to the main pool vdevs. The per-pool txgs file shows recent transaction groups with their dirty data and sync times; if TXGs are taking longer and longer to sync, your problem may be “pool can’t flush fast enough,” not “SLOG is missing.” On production, use your normal telemetry pipeline; this is the “I’m on the box right now” view.
Task 9: Check if the log device is actually fast at flushes (basic device health)
cr0x@server:~$ sudo smartctl -a /dev/nvme0n1 | sed -n '1,25p'
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.6.0] (local build)
=== START OF INFORMATION SECTION ===
Model Number: INTEL SSDPED1D280GA
Serial Number: BTKA1234567F280G
Firmware Version: EDF1470C
NVMe Version: 1.3
...
Critical Warning: 0x00
Temperature: 36 Celsius
Available Spare: 100%
Percentage Used: 2%
Data Units Written: 134,221 [68.7 GB]
Interpretation: You’re looking for a healthy device with low media wear. The SLOG can take sustained write pressure; consumer devices can wear quickly in sync-heavy environments.
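SMART will not tell you whether the device has power-loss protection; that comes from the vendor datasheet. What you can check on the box is whether the controller reports a volatile write cache, which tells you what you are trusting during a power cut. A sketch assuming nvme-cli is installed:
cr0x@server:~$ sudo nvme id-ctrl /dev/nvme0 -H | grep -i -A1 vwc
A volatile write cache is not automatically disqualifying (PLP exists precisely to make that cache safe), but “volatile cache present, no PLP on the spec sheet” is the combination that ends in data archaeology.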
Task 10: Add a mirrored SLOG safely (example)
cr0x@server:~$ sudo zpool add tank log mirror /dev/disk/by-id/nvme-INTEL_SLOG_A /dev/disk/by-id/nvme-INTEL_SLOG_B
cr0x@server:~$ sudo zpool status tank | sed -n '1,25p'
pool: tank
state: ONLINE
config:
NAME STATE READ WRITE CKSUM
tank ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
ata-HDD_A ONLINE 0 0 0
ata-HDD_B ONLINE 0 0 0
logs
mirror-1 ONLINE 0 0 0
nvme-INTEL_SLOG_A ONLINE 0 0 0
nvme-INTEL_SLOG_B ONLINE 0 0 0
Interpretation: Use stable device paths (/dev/disk/by-id) so a reboot doesn’t reshuffle names. After adding, watch zpool iostat to confirm the workload uses it.
Task 11: Remove a SLOG (and what “safe” looks like)
cr0x@server:~$ sudo zpool remove tank mirror-1
cr0x@server:~$ sudo zpool status tank | sed -n '1,35p'
pool: tank
state: ONLINE
config:
NAME STATE READ WRITE CKSUM
tank ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
ata-HDD_A ONLINE 0 0 0
ata-HDD_B ONLINE 0 0 0
errors: No known data errors
Interpretation: Removing the log vdev is supported on modern OpenZFS. Do it during a calm window; watch client latency. If performance collapses, you’ve confirmed the workload is sync-sensitive.
Task 12: Identify “sync storms” from NFS or VM workloads
cr0x@server:~$ sudo nfsstat -s | sed -n '1,25p'
Server rpc stats:
calls badcalls badfmt badauth badclnt
214748 0 0 0 0
Server nfs v4:
null compound
0 198322
Server nfs v3:
null getattr setattr lookup access readlink
0 4123 1882 22091 7851 0
read write create mkdir symlink mknod
70221 88011 312 12 0 0
commit remove rename link readdir readdirplus
65002 401 33 0 112 909
Interpretation: Lots of NFS commit calls can correlate with sync pressure. Pair this with zpool iostat to see if the SLOG is absorbing log writes or if the main vdevs are taking the hit.
Task 13: Validate whether the dataset is accidentally forcing sync
cr0x@server:~$ sudo zfs get -r -o name,property,value,source sync tank | sed -n '1,30p'
NAME PROPERTY VALUE SOURCE
tank sync standard default
tank/nfs sync always local
tank/vmstore sync standard default
tank/backups sync standard default
Interpretation: One dataset with sync=always can dominate perceived pool performance. Sometimes it’s correct; sometimes it’s cargo-culted from a forum post years ago.
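If the investigation shows the local override is cargo cult rather than requirement, and only then, drop it so the dataset follows its parent again. A sketch:
cr0x@server:~$ sudo zfs inherit sync tank/nfs
cr0x@server:~$ sudo zfs get -o name,property,value,source sync tank/nfs
Treat this like any durability change: confirm with the application and NFS client owners what they actually expect before relaxing anything.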
Task 14: Spot a pool that’s falling behind on TXG commits (classic backpressure)
cr0x@server:~$ sudo zpool iostat tank 1
capacity operations bandwidth
pool alloc free read write read write
---------- ----- ----- ----- ----- ----- -----
tank 1.20T 2.40T 40 2200 2.4M 240M
tank 1.20T 2.40T 35 2400 2.1M 260M
tank 1.20T 2.40T 38 2600 2.3M 280M
Interpretation: High sustained write bandwidth isn’t inherently bad, but if applications are stalling on sync while bandwidth is high, it can indicate that TXGs are taking longer to sync (dirty data pressure), and you’re approaching the point where ZFS has to throttle. A SLOG can hide the start of this problem while the pool quietly accumulates work.
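To see how close the pool is to that cliff on Linux OpenZFS, compare the dirty-data ceiling with what recent TXGs are carrying (the txgs kstat from Task 8); the path below assumes the standard module parameter location:
cr0x@server:~$ cat /sys/module/zfs/parameters/zfs_dirty_data_max
If ndirty in recent TXGs keeps creeping toward this limit while TXG sync times grow, the pool is accumulating exactly the backlog described above, and it will eventually surface as stalls that no log device can hide.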
Fast diagnosis playbook
This is the “I have 10 minutes before the incident call turns into a blame ritual” sequence. You’re trying to answer one question: Is the bottleneck sync latency, or something else?
First: determine whether the workload is sync-heavy
- Check dataset properties: zfs get sync,logbias <dataset>.
- Watch log vdev activity: zpool iostat -v <pool> 1. If the log vdev is idle during “sync pain,” the SLOG isn’t in the path.
- Look for client behavior (NFS commit calls, database fsync rates, VM storage patterns). If you can’t observe the app, at least observe the server side: NFS stats, iostat, latency.
Second: if sync-heavy, decide whether the SLOG is the bottleneck
- Use per-vdev latency: zpool iostat -v -l <pool> 5. If the log vdev has higher wait than expected, it’s not doing its job.
- Check device health and errors: zpool status -v and smartctl.
- Confirm the log device isn’t behind a slower path (shared PCIe lanes, saturated HBA, misconfigured multipath).
Third: if SLOG looks fine, the pool commit path is likely the bottleneck
- Check main vdev latency in zpool iostat -l. A slow HDD, a dying SSD, or a rebuilding vdev will dominate.
- Look for background work: scrub, resilver, heavy snapshots, send/recv, compression/encryption CPU saturation.
- Check memory pressure: if the box is low on RAM or ARC is constrained, ZFS can hit dirty data limits and throttle hard.
Fourth: decide what to change (in priority order)
- If sync-heavy and no SLOG: add a proper mirrored PLP SLOG.
- If sync-heavy and SLOG exists but is consumer-grade: replace it.
- If commit path is slow: fix vdev layout, remove the slow disk, add mirrors, reduce RAIDZ width for IOPS workloads, or scale out.
- If the issue is “someone forced sync everywhere”: adjust dataset/export/app settings with a clear durability stance.
Three corporate-world mini-stories
Mini-story #1: The incident caused by a wrong assumption
The ticket was worded politely: “Intermittent VM filesystem corruption on power events.” The storage team’s first reaction was defensive—no one likes being told their platform is eating data—but the pattern was too clean. It only happened after the rack PDUs were abruptly cycled during maintenance testing. Normal reboots were fine.
The environment was a ZFS-backed NFS datastore for a hypervisor cluster. They had a SLOG device, and they were proud of it. “We bought NVMe, it’s fast,” they said, which is the first sentence of many postmortems. The SLOG was a single consumer NVMe stick because “it’s just a log.” The pool itself was mirrors of HDDs, perfectly respectable for capacity and mostly fine for throughput, but sync latency was their pain point, so SLOG felt like the right lever.
After a few reproductions and some uncomfortable silence, the lesson emerged: the NVMe acknowledged flushes quickly but didn’t reliably persist the last moments of writes during abrupt power loss. The hypervisors had been told “your commits are safe,” because the NFS protocol and guest filesystems rely on that promise. Then the box lost power. On import, ZFS replayed what it had in the log. What it didn’t have was the last bit that the device had pretended was durable.
The fix was not heroic. They replaced the SLOG with an enterprise device with PLP and mirrored it, because they didn’t want a single component to be the “oops” axis. They also stopped saying “it’s just a log.” In ZFS, the log is part of your durability contract.
What changed operationally was more important than the hardware: they documented what “sync safe” meant for their NFS exports, added power-failure simulation to their acceptance testing, and wrote a runbook entry that began with “Confirm SLOG has PLP.” The corruption incidents stopped, and the only thing they lost was the illusion that performance parts can be treated like accessories.
Mini-story #2: The optimization that backfired
A finance-adjacent application had periodic bursts: end-of-month, end-of-quarter, end-of-year—the holy trinity of “why is it slow today?” It ran on VMs on NFS backed by ZFS. During bursts, latency shot up and the app would time out. Someone discovered that setting sync=disabled made the graphs look gorgeous. They rolled it out quickly, because the business was screaming and the change was reversible. In fairness, it worked. It also planted a landmine.
The backfire arrived weeks later in the form of “we rebooted after kernel updates and the app’s last few minutes of transactions are missing.” Not corrupt. Not partially written. Just missing. The database had issued sync writes on commit, the storage lied and said “sure,” and the universe collected its debt at the next crash boundary.
The team tried to argue it was acceptable because the missing window was small. But the business’s definition of “small” is “none,” especially when auditors exist. More subtly, they realized they had trained the application team to assume “storage is fast now,” so they increased concurrency and reduced retries. When they switched back to correct sync semantics, the performance cliff was steeper than before because other knobs had been turned in the meantime.
The long-term fix was boring: a real SLOG, mirrored, with PLP, sized appropriately, plus changes in the app to batch commits more intelligently. They also set explicit SLOs for commit latency and measured fsync rates so “sync” wasn’t a mystical property anymore. The optimization didn’t just backfire—it damaged institutional understanding, and they had to rebuild it.
The operational takeaway: if you “optimize” by changing correctness, you didn’t optimize; you borrowed time at variable interest. ZFS will let you do it. ZFS will also let you explain it later.
Mini-story #3: The boring but correct practice that saved the day
A storage cluster was being expanded. Nothing dramatic: add shelves, add vdevs, rebalance workloads over time. The team had a ritual: after any hardware change, they ran a short verification suite—pool status checks, a sync-write latency probe, and a simulated client workload that produced a steady stream of fsync operations. It took 20 minutes and everyone complained about it until the day it mattered.
During expansion, sync latency quietly doubled. Not enough to trigger alerts immediately, but enough to show up as tail latency spikes in VM operations. The boring verification suite caught it. The team compared results with their baseline and saw that the log vdev was now showing higher wait times than usual.
The culprit was not ZFS at all. A firmware update on the new server’s PCIe backplane had changed link negotiation, and the NVMe slot intended for the SLOG was running at a reduced lane width. The SLOG was still “fast” in throughput tests, because sequential numbers look great, but its latency under flush-heavy workloads degraded, which is the only thing SLOG is paid to be good at.
They moved the SLOG devices to different slots, confirmed link width, reran the latency probe, and the tail spikes vanished. No outage, no customer impact, no incident call. The only visible artifact was a line in the change record: “Corrected NVMe lane width; restored sync latency baseline.”
This is what competence looks like in production: not heroics, but small tests that prevent you from learning about PCIe negotiation via angry Slack messages.
Common mistakes, symptoms, and fixes
This section is written in the language of symptoms, because that’s how you meet SLOG in real life.
Mistake 1: Adding a SLOG and expecting faster bulk writes
Symptoms: No noticeable improvement; backup jobs are still slow; sequential write throughput unchanged; log vdev shows minimal activity.
Why: The workload is async. SLOG only accelerates sync write acknowledgements.
Fix: Remove the SLOG if it adds risk without benefit, or keep it only if you have sync workloads you haven’t measured yet. Optimize vdev layout, recordsize, compression, or network if the workload is bulk I/O.
Mistake 2: Using a consumer SSD/NVMe without PLP as SLOG
Symptoms: Rare but catastrophic corruption or missing transactions after sudden power loss; clean reboots fine; import after crash sometimes slower; post-crash inconsistencies.
Why: The device acknowledges writes and flushes that are not actually persistent.
Fix: Replace with an enterprise device with PLP and proven flush behavior. Mirror it if you need availability and to reduce the chance of log loss on device failure.
Mistake 3: Single-disk SLOG for a workload that treats sync as sacred
Symptoms: After SLOG failure, clients see I/O errors or experience a disruptive performance shift; risk exposure around acknowledged writes; anxiety during device replacements.
Why: The log device is now part of the durability path; a single point of failure is a gamble.
Fix: Mirror the SLOG. Keep spares. Test replacement procedures. If mirroring is not possible, reconsider whether the workload should be on this architecture.
Mistake 4: Setting sync=disabled to make graphs green
Symptoms: Performance improvement followed by missing/corrupt data after crash; auditors become interested in your weekend plans.
Why: You told ZFS to acknowledge sync writes without making them durable.
Fix: Set sync=standard (or always if required). Add a proper SLOG and/or change app behavior (batching, group commit) to reduce fsync frequency.
Mistake 5: Treating SLOG as a “cache” and oversizing it
Symptoms: No benefit from huge SLOG capacity; budget spent with no measurable latency reduction.
Why: SLOG holds a short window of ZIL records, not all writes.
Fix: Size for latency and safety, not capacity. Spend money on quality (PLP, endurance), not terabytes.
Mistake 6: Ignoring the main pool’s ability to commit TXGs
Symptoms: Sync latency improves initially with SLOG, then the system hits periodic stalls; throughput looks fine but apps time out; “everything freezes every few seconds.”
Why: The SLOG decouples acknowledgement but not the eventual need to commit data to the pool. If the pool can’t keep up, it will throttle.
Fix: Improve pool write performance (more vdevs, mirrors, faster disks), reduce write amplification (recordsize, compression), or reduce workload pressure. SLOG is not a substitute for adequate commit throughput.
Mistake 7: Confusing L2ARC with SLOG
Symptoms: “We added a SLOG and reads are still slow.”
Why: SLOG is for sync writes. L2ARC is a read cache extension. Different problems, different tools.
Fix: If reads are slow, investigate ARC hit ratio, dataset properties, vdev read IOPS, and possibly L2ARC. Don’t buy a log device for a read problem.
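A quick sanity check that the problem really is read caching, assuming Linux OpenZFS and the usual kstat path:
cr0x@server:~$ awk '$1 == "hits" || $1 == "misses" {print $1, $3}' /proc/spl/kstat/zfs/arcstats
Hit ratio is hits / (hits + misses). A low ratio under a read-heavy workload points at ARC sizing, working-set size, or vdev read IOPS; none of those improve with a log device. If arc_summary is installed, it gives the same answer with more commentary.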
Checklists / step-by-step plan
Checklist: Decide whether you need a SLOG at all
- Measure sync intensity: Identify fsync-heavy apps (databases, VM/NFS) and whether they are truly bottlenecked on sync latency.
- Observe log activity: If you already have a SLOG, verify it’s used. If you don’t, estimate potential benefit by measuring fsync latency now.
- Rule out obvious bottlenecks: Network saturation, CPU saturation, a scrub/resilver, or a single failing disk can dominate symptoms.
- Decide on durability stance: Confirm you need correct sync semantics. If you don’t, document that explicitly; don’t “accidentally” run unsafe.
Checklist: Choose the right SLOG device
- PLP is mandatory: Choose devices with power-loss protection and enterprise firmware behavior.
- Prefer low latency over big capacity: Your goal is fast, consistent flush latency.
- Check endurance: Sync-heavy workloads can write continuously; ensure TBW/DWPD is appropriate.
- Avoid performance cliffs: Test under sustained load; watch for thermal throttling or garbage collection spikes.
- Mirror if correctness matters: Particularly for NFS VM datastores and critical databases.
Step-by-step plan: Add a SLOG safely in production
- Baseline first: Capture latency percentiles from your monitoring and run a small fsync test on the dataset.
- Confirm device paths: Identify log devices via /dev/disk/by-id to avoid renaming issues.
- Add mirrored log: Use zpool add POOL log mirror DEV1 DEV2.
- Validate usage: During peak sync workload, verify writes on the log vdev with zpool iostat -v.
- Re-baseline: Run the same fsync test; compare latency distributions, not just averages.
- Document the contract: Record that “sync writes are durable and protected by mirrored PLP SLOG,” and note replacement procedures.
Step-by-step plan: If you suspect SLOG is hurting you
- Check for errors: zpool status -v and SMART stats.
- Check log latency: zpool iostat -v -l POOL 5. If log wait is high, you have a problem.
- Consider temporary removal: In a controlled window, zpool remove POOL LOGVDEV and measure behavior. This is diagnostic, not a permanent plan.
- Replace with proper hardware: If you confirm SLOG is the bottleneck, replace it with a device designed for sync workloads.
FAQ
1) Does adding a SLOG make ZFS writes faster?
Only synchronous writes. It improves the latency of operations that must be acknowledged as durable (fsync, O_SYNC, NFS commit-heavy patterns). It does not increase bulk async write throughput by itself.
2) Is SLOG the same thing as ZIL?
No. ZIL is the mechanism and exists in every pool. SLOG is a separate device where ZIL records are stored to improve sync write latency.
3) How big should a SLOG be?
Usually small. You’re storing a short rolling window of sync write intent until TXGs commit. Size is rarely the constraint; latency and power safety are.
4) Should I mirror my SLOG?
If you care about not losing acknowledged sync writes due to a single device failure, yes. For VM datastores over NFS and critical databases, mirrored SLOG is a common “correct by design” choice.
5) Can a SLOG reduce write amplification on the main pool?
Not really. It changes where sync intent is recorded, not the amount of data the pool must eventually write. It can make the main pool’s I/O pattern less chaotic by removing small sync writes from the main vdevs, which can indirectly help, but it doesn’t erase the workload.
6) What’s the difference between SLOG and L2ARC?
SLOG accelerates synchronous write acknowledgements. L2ARC is a secondary read cache. Mixing them up is common, and expensive.
7) Is sync=disabled ever acceptable?
Only if you are willing to lose recent transactions on crash and you’ve explicitly accepted that risk (and documented it). Many internal “scratch” workloads can live with it. Most production databases and VM datastores should not.
8) Why did my performance get worse after adding a SLOG?
Common causes: the log device has poor flush latency, it’s thermally throttling, it’s behind a constrained controller path, or the workload wasn’t sync-heavy and you just added overhead/complexity. Measure log vdev latency with zpool iostat -l and validate that sync writes are actually using it.
9) Can I partition a device and use part of it as SLOG?
Technically possible, but operationally risky unless you’re disciplined. Shared devices can create contention and failure coupling. The sync path wants isolation and predictability.
10) What dataset settings typically pair with a SLOG for VM/NFS?
Usually sync=standard (or always if your environment requires it) and logbias=latency. The rest depends on workload: recordsize, compression, atime, and NFS export settings matter, but they’re not SLOG-specific.
Conclusion
A ZFS SLOG is one of those components that looks like a simple upgrade until you understand what it’s upgrading: a correctness boundary. When your workload is sync-heavy, a proper SLOG can be transformative, especially for NFS-backed virtualization and commit-heavy databases. When your workload is async, it’s dead weight. And when you pick the wrong device—or use the right device in the wrong way—it becomes a neat way to turn “storage performance tuning” into “data recovery conversation.”
The operational rule is simple: measure first, optimize the right thing, and treat the log device like part of your durability story, not a side quest. If you do that, SLOG is boring in the best way: it quietly makes sync writes fast, and nothing exciting happens.