Using a SATA SSD as a ZFS SLOG: The Cheap Upgrade That Often Fails

You bought a “fast” SATA SSD, added it as a ZFS SLOG, and expected miracles. Instead you got slower NFS, jittery VM latency,
or—worse—a quiet increase in risk that only shows up during the first power event that matters.

A SLOG is not a cache. It is not a turbo button. It is part of a promise you make to applications: “this write is safe now.”
If you keep that promise with a consumer SATA SSD, you’re betting production on a device that was never designed for that kind of oath.

What a SLOG actually does (and what it doesn’t)

ZFS has two related concepts that get mashed together in forum lore: the ZIL (ZFS Intent Log) and the SLOG
(Separate LOG device).

ZIL: the in-pool intent log that exists whether you like it or not

The ZIL is not an “extra feature.” It’s part of how ZFS provides POSIX semantics for synchronous writes. When an application
calls fsync(), uses O_SYNC, or when a protocol insists on stable writes (hello, NFS), ZFS must acknowledge only after the
data is safe against a crash.

For sync writes, ZFS first writes transaction records to the ZIL. Later, during the normal transaction group (TXG) commit,
data is written to its final location in the pool. On crash/reboot, ZFS replays the ZIL to recover the last acknowledged sync writes.
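
To see this contract in action, compare a buffered write with a dsync write using dd. This is a rough sketch, assuming a scratch dataset mounted at /tank/test that you don't mind writing junk into; both flags are standard GNU dd options.

cr0x@server:~$ dd if=/dev/zero of=/tank/test/buffered.bin bs=4k count=8192 status=progress
cr0x@server:~$ dd if=/dev/zero of=/tank/test/dsync.bin bs=4k count=8192 oflag=dsync status=progress

The first command returns as soon as the data is buffered in memory and queued for the next TXG. The second forces every 4K block to be stable before dd continues, which is exactly the path the ZIL exists to serve. The gap between the two throughput numbers is roughly the cost of your sync path.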

SLOG: relocating the ZIL to a dedicated device

By default, the ZIL lives on the main pool, spread across top-level vdevs. Adding a SLOG tells ZFS: “put the ZIL records here instead.”
The goal is to lower sync write latency and smooth out the random write penalty of sync workloads.

It only helps if your workload issues sync writes. If your workload is mostly async (typical bulk writes, many databases tuned for async,
media storage), a SLOG is a decorative spoiler on a minivan.
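
A quick way to check whether a dataset will even route sync writes to a log device is to look at its sync and logbias properties. A minimal sketch, assuming the dataset is tank/vmstore:

cr0x@server:~$ sudo zfs get sync,logbias tank/vmstore

sync=disabled makes the SLOG irrelevant for that dataset. logbias=throughput tells ZFS to write sync data directly to the main pool instead of the log device, which can be right for large sequential sync writes and wrong for small fsync-heavy ones.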

What a SLOG is not

  • Not an L2ARC. It does not cache reads.
  • Not a write-back cache. It does not absorb all writes; it only accelerates sync write acknowledgments.
  • Not a performance freebie. A bad SLOG can make latency worse and increase failure blast radius.

The SLOG write pattern is brutal: small, mostly sequential, latency-sensitive writes that must be durable now.
The device must honor flushes. It must have power-loss protection or equivalent durability.
And it must not stall under sustained fsync pressure.
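
One read-only sanity check: ask the kernel whether the device advertises a volatile write cache, because that determines whether flushes have real work to do. A sketch, assuming the candidate device is sdz:

cr0x@server:~$ cat /sys/block/sdz/queue/write_cache

"write back" means a volatile cache is reported and the kernel will send flushes; "write through" means none is reported. Neither answer proves power-loss protection; it only tells you what the flush path looks like from the OS side.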

Joke #1: A consumer SATA SSD as SLOG is like using a travel pillow as a motorcycle helmet—soft, optimistic, and wrong at speed.

Why a SATA SSD SLOG so often disappoints or fails

The cheap upgrade pitch is seductive: “I have a spare SATA SSD. Add it as log. Sync writes go brrr.” Sometimes you get a small win.
Often you get a mess. Here’s why.

1) SATA SSDs lie (or at least, they negotiate)

The SLOG is only as good as the device’s ability to make writes durable on demand. In ZFS terms, this means honoring cache flushes
and not acknowledging writes until data is in non-volatile media.

Many consumer SSDs have volatile DRAM caches and varying levels of firmware discipline around flush commands. Some are great.
Some are “fine until they aren’t.” The worst case is a drive that acknowledges quickly but loses the last seconds of writes on power loss.
For a SLOG, that is catastrophic: ZFS will replay the ZIL after reboot, but if the SLOG lost acknowledged log records, you’ve created
a window for silent corruption or application-level inconsistency.
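
There is no SMART field that says "this drive tells the truth about flushes." A crude bench heuristic is to compare single-depth sync write latency with the volatile cache enabled and disabled. Only do this on a blank lab device that is not part of any pool: fio writes to the raw device and destroys its contents. The device name sdq and the exact numbers you'll see are assumptions; all flags are standard hdparm/fio options.

cr0x@server:~$ sudo hdparm -W1 /dev/sdq   # volatile write cache on
cr0x@server:~$ sudo fio --name=cache-on --filename=/dev/sdq --rw=write --bs=4k --iodepth=1 --numjobs=1 --sync=1 --runtime=30 --time_based
cr0x@server:~$ sudo hdparm -W0 /dev/sdq   # volatile write cache off
cr0x@server:~$ sudo fio --name=cache-off --filename=/dev/sdq --rw=write --bs=4k --iodepth=1 --numjobs=1 --sync=1 --runtime=30 --time_based

A drive with real power-loss protection tends to look similar in both runs. A consumer drive that is dramatically faster with the cache on is acknowledging from DRAM, and whether those acknowledgments survive a power cut is exactly the bet you'd be making. Treat this as a heuristic, not proof; the only real proof is a pull-the-plug test against a client that records which writes were acknowledged.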

2) Latency matters more than bandwidth, and SATA is bad at latency under pressure

Sync write performance is dominated by tail latency. A SLOG that does 50,000 IOPS in a benchmark but occasionally pauses
for garbage collection, firmware housekeeping, or SLC cache exhaustion will turn “fast” into “spiky.”

SATA’s protocol and queues are also limited compared to NVMe. You’re not buying just throughput; you’re buying better behavior
under concurrent flush-heavy workloads.

3) Endurance and write amplification show up earlier than you think

A SLOG sees a constant stream of small writes. ZFS writes log records, then frees them once the TXG commit lands the data in the main pool.
That churn can produce write amplification and steady wear.

Consumer SATA drives often have lower endurance ratings and weaker sustained write performance once their pseudo-SLC cache fills.
The failure mode is not always “drive dies.” Often it’s: latency degrades, then you start seeing application timeouts,
then someone disables sync to “fix it,” and now you’re running without a safety net.

4) The “single cheap SLOG” is a single point of drama

ZFS can operate without a SLOG. If the log device fails, the pool typically continues, but you may lose the last acknowledged
synchronous writes (because they were only on the failed SLOG).

Mirroring the SLOG removes that class of risk, but then your “cheap upgrade” now needs two devices—and you still need them to be
power-loss safe and stable under flush.

5) You might not have a sync problem at all

Plenty of ZFS performance pain is not the ZIL. It’s undersized ARC, recordsize mismatch, fragmented HDD vdevs,
too many small I/O patterns on RAIDZ, CPU saturation from checksumming/compression, or a hypervisor stack misconfigured for sync.

Adding a SLOG to a system that is already IOPS-bound elsewhere is a classic “tool applied to the wrong wound.”
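
Ruling out a starved ARC is cheap before you touch the log. A quick look, assuming the arc_summary and arcstat helpers that ship with OpenZFS are installed:

cr0x@server:~$ arc_summary | head -n 30
cr0x@server:~$ arcstat 1 5

If the ARC is capped far below what the working set needs and hit rates are poor, reads hammer the pool and every latency graph gets worse, sync writes included. Fix memory pressure first; a SLOG will not rescue a read-starved system.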

Facts and historical context worth knowing

A few concrete points—some history, some engineering—that help cut through myths. These aren’t trivia; they change decisions.

  1. ZFS originated at Sun in the mid-2000s with an explicit focus on data integrity: end-to-end checksums and copy-on-write were not optional features.
  2. The ZIL exists even without a SLOG; adding a log device only relocates it. People who “add SLOG for faster writes” often misunderstand this.
  3. NFS traditionally treats many operations as synchronous (or demands stable storage semantics), which is why SLOG talk often starts in NFS-heavy shops.
  4. Early SSD eras made flush behavior notoriously inconsistent; firmware bugs around write barriers and FUA led to years of “it benchmarks fast” surprises.
  5. SATA’s NCQ tops out at 32 outstanding commands on a single queue, and its protocol overhead is higher than NVMe’s many deep queues; this matters most for flush-heavy and parallel I/O patterns.
  6. Power-loss protection (PLP) was historically an enterprise feature because it requires hardware (capacitors) and validation; consumer drives often omit it or implement partial measures.
  7. ZFS TXGs typically commit every few seconds (tunable; see the quick check after this list), which defines how long log records might live before being discarded—short-lived, high-churn writes.
  8. “Disable sync” became a folk remedy in virtualization stacks because it makes benchmarks look great; it also makes crash recovery look like a crime scene.
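
Point 7 is easy to verify on Linux, where the commit interval is exposed as a module parameter. A minimal check, assuming OpenZFS on Linux:

cr0x@server:~$ cat /sys/module/zfs/parameters/zfs_txg_timeout

The default is typically 5 (seconds). ZFS can commit earlier under dirty-data pressure, so treat this as an upper bound on how long a log record normally stays relevant, not a fixed clock.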

Failure modes: performance, correctness, and “it seemed fine”

Performance failure: the SLOG becomes the bottleneck

When you add a SLOG, synchronous operations must hit that log device. If that device has higher latency than your pool’s best-case
sync path (say, a pool of decent SSDs, or even a well-behaved mirror of HDDs with write cache behavior you understand), you can make
sync writes slower.

Classic symptom: your pool’s regular writes are fine, reads are fine, but anything that calls fsync spikes to tens or hundreds of
milliseconds intermittently.
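
OpenZFS can show per-vdev latency directly, which turns “the SLOG is the slow part” into a measurement instead of a theory. A sketch:

cr0x@server:~$ sudo zpool iostat -vl tank 1

The -l columns add average wait times per vdev. If the logs row shows write latency consistently worse than the data vdevs during fsync-heavy periods, the SLOG is hurting rather than helping.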

Correctness failure: acknowledging writes that aren’t truly durable

The nightmare scenario is a drive that returns success before the data is actually safe. With a SLOG, ZFS is using that “success” to
tell applications “your synchronous write is safe.” If power dies and the drive loses those acknowledged records, ZFS cannot replay what
it never actually got. The pool will import. It may even look clean. Your application will be the one discovering missing or corrupted
last-second transactions.

People ask, “Doesn’t ZFS protect against that?” ZFS protects against a lot. It can’t make a lying device honest.

Reliability failure: the SLOG dies and you lose the last safe writes

A non-mirrored SLOG is a single device standing between you and loss of acknowledged sync writes during that device’s failure window.
Even if you accept the risk, the operational reality is uglier: when a SLOG starts failing, it often fails by stalling I/O,
causing system-wide latency storms. Now you’re troubleshooting in production while your hypervisor queue backs up.
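
When you suspect a stalling log device, two read-only checks help, assuming a reasonably recent OpenZFS:

cr0x@server:~$ sudo zpool status -s tank
cr0x@server:~$ sudo zpool events | tail -n 20

The -s flag adds a per-device slow I/O counter; a log device accumulating slow I/Os while the data vdevs stay clean is a smoking gun. zpool events lists delay and error events with timestamps you can line up against the latency storm.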

Operational failure: someone “fixes” it by disabling sync

This is where the cheap upgrade often ends. A team adds a SATA SLOG, sees worse latency, flips sync=disabled on a dataset,
celebrates, and unknowingly changes durability semantics for every VM or NFS client using that dataset.
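
Auditing for this takes one command; the -s local filter shows only properties someone explicitly set, which is where “helpful” overrides hide:

cr0x@server:~$ sudo zfs get -r -s local sync tank

Anything listed with sync=disabled deserves a ticket explaining who set it, why, and what the rollback plan is.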

As Werner Vogels famously put it: “Everything fails, all the time.” Design your storage assuming that reality.

Fast diagnosis playbook

You want to find the bottleneck quickly, without belief-based tuning. Here’s a practical order of operations.

First: prove you have a sync workload

  • Check ZFS dataset sync properties.
  • Check application/protocol behavior (NFS exports, hypervisor settings, database fsync patterns).
  • Run a controlled sync write test and compare with async.

Second: measure SLOG latency and saturation

  • Use iostat to see if the log device is the busiest disk during the issue.
  • Look for spikes in await / service times on the SLOG.
  • Check if the SLOG is a SATA SSD with questionable flush behavior or no PLP.

Third: confirm pool health and TXG behavior

  • Verify no vdev is degraded or resilvering.
  • Check for sync write amplification due to small blocks / recordsize mismatch.
  • Watch TXG commit times and dirty data limits if you suspect stalls.

Fourth: decide whether to remove, mirror, or replace the SLOG

  • If the SLOG is slower than the pool: remove it.
  • If you need SLOG for NFS/VM sync semantics: replace with PLP NVMe or enterprise SATA with capacitors; mirror it if the workload matters.
  • If you don’t need sync acceleration: don’t run a SLOG “just because.”

Hands-on tasks: commands, outputs, and decisions

These are real tasks you can run on a Linux host using OpenZFS. Each one includes what the output means and what decision you make.
Use a test window if you’re changing properties; the commands below are primarily read-only unless stated.

Task 1: Identify whether a SLOG exists and what it is

cr0x@server:~$ sudo zpool status -v tank
  pool: tank
 state: ONLINE
config:

        NAME                        STATE     READ WRITE CKSUM
        tank                        ONLINE       0     0     0
          raidz2-0                  ONLINE       0     0     0
            sda                     ONLINE       0     0     0
            sdb                     ONLINE       0     0     0
            sdc                     ONLINE       0     0     0
            sdd                     ONLINE       0     0     0
            sde                     ONLINE       0     0     0
            sdf                     ONLINE       0     0     0
        logs
          sdz                       ONLINE       0     0     0

errors: No known data errors

Meaning: There is a dedicated log vdev (sdz). That’s your SLOG.
Decision: If sdz is a consumer SATA SSD, treat it as suspicious until proven otherwise.

Task 2: Confirm the dataset sync setting (and find “helpful” overrides)

cr0x@server:~$ sudo zfs get -r sync tank
NAME          PROPERTY  VALUE      SOURCE
tank          sync      standard   default
tank/vmstore  sync      standard   local
tank/nfs      sync      disabled   local

Meaning: tank/nfs has sync=disabled, which changes correctness semantics.
Decision: If this was done to “fix performance,” treat it as a production risk item and plan a rollback with a proper SLOG or workload change.

Task 3: Determine whether the log device is SATA and what model it is

cr0x@server:~$ lsblk -d -o NAME,ROTA,TRAN,MODEL,SIZE,SERIAL | grep -E 'sdz|nvme'
sdz     0 sata  CT500MX500SSD1   465.8G  2219E5A1B2C3

Meaning: The SLOG is a SATA Crucial MX500 (common consumer drive).
Decision: Assume no full PLP. Plan to validate flush behavior and latency under sync load; strongly consider replacing with a PLP-capable device.

Task 4: Check whether the drive claims to have volatile write cache

cr0x@server:~$ sudo hdparm -W /dev/sdz
/dev/sdz:
 write-caching =  1 (on)

Meaning: Write cache is enabled. That’s not automatically bad if the drive has PLP; it’s dangerous if it doesn’t.
Decision: If this is a consumer SSD without PLP, don’t trust it as a durability device for synchronous acknowledgments.

Task 5: Pull SMART details and look for power-loss protection hints

cr0x@server:~$ sudo smartctl -a /dev/sdz | sed -n '1,60p'
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.8.0] (local build)
=== START OF INFORMATION SECTION ===
Model Family:     Crucial/Micron MX500 SSDs
Device Model:     CT500MX500SSD1
Serial Number:    2219E5A1B2C3
Firmware Version: M3CR046
User Capacity:    500,107,862,016 bytes [500 GB]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s
Local Time is:    Thu Dec 26 11:02:41 2025 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

Meaning: SMART identifies a consumer SATA SSD family. SMART rarely “confirms PLP” directly on consumer SATA.
Decision: Treat absence of explicit PLP support as “no PLP.” For SLOG, that pushes you toward replacement or removal.

Task 6: Check wear and media errors on the would-be SLOG

cr0x@server:~$ sudo smartctl -a /dev/sdz | egrep -i 'Reallocated_Sector_Ct|Reported_Uncorrect|UDMA_CRC_Error_Count|Percent_Lifetime_Remain|Power_Loss'
Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0

Meaning: No obvious media errors. That doesn’t mean it’s suitable as SLOG; it just means it’s not already dying loudly.
Decision: If you see CRC errors, suspect cabling/backplane—fix that before blaming ZFS.

Task 7: Watch per-disk latency during the incident window

cr0x@server:~$ sudo iostat -x 1
Linux 6.8.0 (server)  12/26/2025  _x86_64_ (32 CPU)

Device            r/s     w/s   rkB/s   wkB/s  avgrq-sz  avgqu-sz   await  r_await  w_await  svctm  %util
sda              2.0    18.0    64.0   980.0     98.0      1.20   22.3     8.1     24.0    1.8   36.0
sdz              0.0   420.0     0.0  2100.0     10.0     12.50   29.8     0.0     29.8    0.2   98.0

Meaning: The log device sdz is nearly saturated with small writes (avgrq-sz ~10 sectors, roughly 5 KB per request), and its await is high.
Decision: Your SLOG is the bottleneck. Either replace it with a low-latency PLP device, mirror it, or remove it to fall back to in-pool ZIL.

Task 8: Confirm that sync writes are actually happening

cr0x@server:~$ sudo zpool iostat -v tank 1
              capacity     operations     bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
tank        4.22T  6.58T    120    980  9.8M   44.1M
  raidz2-0  4.22T  6.58T    120    910  9.8M   40.7M
    sda         -      -     20    150  1.7M    6.8M
    sdb         -      -     20    150  1.6M    6.8M
    sdc         -      -     20    150  1.6M    6.8M
    sdd         -      -     20    150  1.6M    6.8M
    sde         -      -     20    155  1.7M    6.8M
    sdf         -      -     20    155  1.6M    6.7M
logs            -      -      0     70  0K     3.4M
  sdz           -      -      0     70  0K     3.4M

Meaning: The log vdev is actively receiving writes. That strongly suggests sync activity.
Decision: If you expected async, find out who (NFS, hypervisor, app) is forcing sync. Fix the workload or provision a proper SLOG.

Task 9: Measure sync vs async write latency with fio (carefully)

Run this on a test dataset, not directly on production paths unless you know what you’re doing.

cr0x@server:~$ sudo fio --name=syncwrite --directory=/tank/test --rw=write --bs=4k --iodepth=1 --numjobs=1 --size=512M --fsync=1 --direct=1
syncwrite: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
...
  write: IOPS=620, BW=2480KiB/s (2540kB/s)(512MiB/211498msec)
    lat (usec): min=450, max=85000, avg=1600.12, stdev=4100.55

Meaning: Sync write latency has ugly tail spikes (max 85ms). That’s what users feel.
Decision: If max latency is high and correlated with SLOG saturation, replace/remove SLOG rather than tuning around it.

Task 10: Validate whether the SLOG is mirrored (it should be, if it matters)

cr0x@server:~$ sudo zpool status tank | sed -n '1,40p'
  pool: tank
 state: ONLINE
config:

        NAME                        STATE     READ WRITE CKSUM
        tank                        ONLINE       0     0     0
          raidz2-0                  ONLINE       0     0     0
        logs
          sdz                       ONLINE       0     0     0

Meaning: Single-device SLOG. If it fails, you risk losing the last acknowledged sync writes.
Decision: If your business cares about those writes (VM storage, NFS for home directories, databases), mirror the SLOG or don’t use one.

Task 11: Check if the pool is currently resilvering/scrubbing (performance red flag)

cr0x@server:~$ sudo zpool status tank
  pool: tank
 state: ONLINE
  scan: scrub in progress since Thu Dec 26 10:41:01 2025
        1.92T scanned at 1.34G/s, 512G issued at 356M/s, 4.22T total
        0B repaired, 12.13% done, 02:45:19 to go
config:
...

Meaning: A scrub is in progress. That can amplify latency issues, especially on HDD pools.
Decision: If the performance complaint coincides with scrub/resilver, pause the tuning crusade and retest after the maintenance load is gone.

Task 12: Find out whether the dataset is using small blocks that punish RAIDZ

cr0x@server:~$ sudo zfs get recordsize,volblocksize,compression tank/vmstore
NAME          PROPERTY      VALUE     SOURCE
tank/vmstore  recordsize    128K      local
tank/vmstore  volblocksize  -         -
tank/vmstore  compression   lz4       local

Meaning: If this is a VM image dataset backed by files, 128K may be okay, but often you want smaller blocks for random I/O patterns.
Decision: If your issue is random sync writes on RAIDZ, consider mirrors for VM workloads or tune the VM storage approach—don’t expect a SATA SLOG to save RAIDZ from physics.

Task 13: Verify whether your SLOG is actually being used (not bypassed)

cr0x@server:~$ sudo zdb -C tank | sed -n '1,120p'
MOS Configuration:
        version: 5000
        name: 'tank'
        ...
        vdev_tree:
            type: 'root'
            id: 0
            guid: 12345678901234567890
            children[0]:
                type: 'raidz'
                ...
            children[1]:
                type: 'log'
                id: 1
                guid: 998877665544332211
                children[0]:
                    type: 'disk'
                    path: '/dev/disk/by-id/ata-CT500MX500SSD1_2219E5A1B2C3'

Meaning: The pool config includes a log vdev. ZFS will use it for sync writes unless disabled at dataset level or other constraints apply.
Decision: If you expected a mirrored log but see one child, you’ve found a design flaw, not a tuning knob.

Task 14: Remove a problematic SLOG (if you decide it’s doing harm)

This changes behavior. Schedule it. Communicate it. And remember: removing the SLOG does not “turn off” the ZIL; it moves it back to the pool.

cr0x@server:~$ sudo zpool remove tank sdz
cr0x@server:~$ sudo zpool status tank
  pool: tank
 state: ONLINE
config:

        NAME                        STATE     READ WRITE CKSUM
        tank                        ONLINE       0     0     0
          raidz2-0                  ONLINE       0     0     0

errors: No known data errors

Meaning: The dedicated log is gone. Sync writes now go to in-pool ZIL.
Decision: If latency improves immediately, the SATA SLOG was your bottleneck. Next step is either “no SLOG” or “proper mirrored PLP SLOG.”

Task 15: If you must have a SLOG, add it as a mirror, using stable device IDs

cr0x@server:~$ sudo zpool add tank log mirror /dev/disk/by-id/nvme-INTEL_SSDPE2KX010T8_PHBT1234001A1P0A /dev/disk/by-id/nvme-INTEL_SSDPE2KX010T8_PHBT1234001A1P0B
cr0x@server:~$ sudo zpool status -v tank | sed -n '1,80p'
  pool: tank
 state: ONLINE
config:

        NAME                                              STATE     READ WRITE CKSUM
        tank                                              ONLINE       0     0     0
          raidz2-0                                        ONLINE       0     0     0
            sda                                           ONLINE       0     0     0
            sdb                                           ONLINE       0     0     0
            sdc                                           ONLINE       0     0     0
            sdd                                           ONLINE       0     0     0
            sde                                           ONLINE       0     0     0
            sdf                                           ONLINE       0     0     0
        logs
          mirror-1                                        ONLINE       0     0     0
            nvme-INTEL_SSDPE2KX010T8_PHBT1234001A1P0A     ONLINE       0     0     0
            nvme-INTEL_SSDPE2KX010T8_PHBT1234001A1P0B     ONLINE       0     0     0

errors: No known data errors

Meaning: Mirrored SLOG using NVMe devices (example). That’s the right structural pattern for reliability.
Decision: If you can’t afford two appropriate devices, you can’t afford a SLOG for important sync workloads. Run without it.

Three corporate mini-stories from the trenches

Mini-story #1: An incident caused by a wrong assumption

A mid-sized company consolidated a few aging NAS boxes into a shiny ZFS server. They ran NFS for home directories and build outputs.
Someone read that “SLOG speeds up writes” and added a spare consumer SATA SSD as the log device. They did a quick file copy test, saw
no obvious regression, and moved on. Everyone loves a quick win. Everyone loves a checkbox.

Weeks later, a short power event hit the rack—UPS transfer, not a full blackout. The server stayed up. The SATA SSD did not.
It dropped off the bus for a moment and came back. ZFS didn’t immediately panic; the pool stayed online. The team breathed again.

The next morning, developers complained about “random” build failures and corrupted artifacts. Nothing was consistently broken.
Re-running the same build would succeed. That’s the worst kind of failure: intermittent, confidence-eroding, and hard to reproduce.

The key detail was the assumption: they believed the SLOG was “just performance,” not durability semantics. They also believed an SSD
is inherently safer than spinning disks. But the SLOG was the only place those acknowledged synchronous records lived until TXG commit.
When the device went weird during a power anomaly, some acknowledged log writes never made it.

The fix wasn’t heroic. They removed the SLOG, forced clients to remount, and stopped the corruption pattern. Later they added a mirrored,
power-loss safe log device and documented why it existed. The lesson stuck because the incident was just painful enough, and not so painful
that everyone got fired.

Mini-story #2: An “optimization” that backfired

A virtualization cluster hosted mixed workloads: a few latency-sensitive databases, a lot of general-purpose VMs. The storage team
watched graphs and saw periodic fsync spikes. They added a SATA SLOG to the ZFS backend expecting to flatten those spikes.

Initially, median latency improved a bit. The team celebrated. Then month-end reporting arrived, and the real-world pattern changed:
lots of concurrent sync-heavy transactions. The SLOG drive hit its sustained write limits, the pseudo-SLC cache exhausted, and write
latency went from “mostly fine” to “jittery and occasionally awful.”

The VMs didn’t just slow down; they synchronized their misery. When the SLOG stalled, it stalled the synchronous acks for many VMs,
causing guest OSes to queue, causing applications to time out, causing retries that increased write pressure. Latency storms have a talent
for becoming self-sustaining.

Someone proposed the classic fix: disable sync on the zvol dataset backing the VM storage. It made the graphs look fantastic.
It also turned a crash-consistent workload into a “good luck” workload. A week later, an unrelated host panic forced a reboot.
A database came back with recovery errors that were expensive to triage and impossible to fully “prove” after the fact.

They rolled back the “optimization,” removed the SATA SLOG, and later installed a mirrored PLP NVMe SLOG. The real fix was not
“faster SSD.” It was stable latency and correct semantics, plus a clear policy: no one disables sync without a signed risk acceptance.

Mini-story #3: The boring but correct practice that saved the day

Another org ran ZFS for NFS and iSCSI with a mix of HDD mirrors and SSD mirrors. They had a SLOG—but it was mirrored, and it was made of
enterprise drives with power-loss protection. The choice looked extravagant compared to a cheap SATA SSD. It wasn’t.

What made them different wasn’t just hardware; it was process. They treated the SLOG like a durability component, not a performance toy.
They tracked SMART wear. They ran monthly fault-injection drills during a maintenance window: pull one SLOG device, confirm the pool stays
healthy, confirm performance remains acceptable, replace, resilver, repeat.

One day a firmware bug caused one SLOG device to start throwing errors. The monitoring fired early—before users screamed. They offlined the
device cleanly, replaced it, and kept running on the remaining mirror leg. No data loss. Minimal latency impact. A ticket, a swap, and done.

The incident never became a story inside the company because it never became dramatic. That’s the point. Boring is a feature in storage.

Common mistakes: symptom → root cause → fix

1) Symptom: “Adding SLOG made NFS slower”

Root cause: The SATA SLOG has worse fsync latency than your in-pool ZIL path, especially under load or garbage collection.

Fix: Remove the SLOG and retest; if you need SLOG, replace with low-latency PLP device (preferably mirrored).

2) Symptom: “Latency spikes every few minutes”

Root cause: SSD firmware housekeeping, SLC cache exhaustion, or TXG-related bursts interacting with a saturated SLOG.

Fix: Observe iostat -x for SLOG %util and await. If the SLOG is pegged, replace or remove it.

3) Symptom: “It’s fast in benchmarks, users still complain”

Root cause: You benchmarked throughput, but users feel tail latency. Sync workloads care about the slowest 1%.

Fix: Use fio --fsync=1 with low iodepth and track max latency; solve tail latency, not peak bandwidth.
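
A concrete way to see the tail, assuming a scratch directory on the pool (the path is an example) and using standard fio percentile options:

cr0x@server:~$ sudo fio --name=tail --directory=/tank/test --rw=randwrite --bs=4k --iodepth=1 --numjobs=1 --size=256M --fsync=1 --percentile_list=50:90:99:99.9:99.99

Read the 99.9th and 99.99th percentile completion latencies, not the average. Users feel the slow outliers; a SLOG that keeps p99.99 tight is doing its job, and one that doesn't is a benchmark prop.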

4) Symptom: “We disabled sync and everything got better”

Root cause: You removed the requirement to make writes durable before ack. Performance “improves” because you changed the contract.

Fix: Re-enable sync for datasets that need correctness; deploy proper SLOG or redesign workload (e.g., local caching, application-level journaling).

5) Symptom: “After power event, application data is inconsistent”

Root cause: Non-PLP SLOG lost acknowledged writes, or drive lied about flush. ZFS cannot replay what never persisted.

Fix: Stop using consumer SATA as SLOG. Use PLP devices; mirror the SLOG; review UPS and write-cache policy.

6) Symptom: “Pool imports fine, but last transactions are missing”

Root cause: Sync acknowledgments were made based on SLOG that didn’t persist.

Fix: Treat as a durability incident. Audit dataset sync history and SLOG hardware. Replace with PLP mirror, then validate recovery procedures.

7) Symptom: “SLOG drive keeps dropping from SATA bus”

Root cause: Cabling/backplane issues, power instability, or consumer SSD firmware not happy with sustained flush patterns.

Fix: Fix hardware path (cables, HBA firmware, power), then stop using that model as log device. A flaky SLOG is worse than no SLOG.

8) Symptom: “SLOG wear climbs fast”

Root cause: High sync write rate + write amplification; consumer endurance is insufficient.

Fix: Use enterprise endurance drives for log, size appropriately, and monitor wear; consider workload changes to reduce forced sync (where safe).

Joke #2: Disabling sync to “fix” SLOG latency is like removing the smoke alarm because it’s too loud.

Checklists / step-by-step plan

Step-by-step: decide whether you should have a SLOG at all

  1. List the consumers of the dataset. NFS? VM images? Databases? Identify who issues sync writes.
  2. Check dataset properties. If sync=disabled is present anywhere, flag it as a risk item.
  3. Measure sync write latency without SLOG. If it’s already acceptable, don’t add complexity.
  4. If you need a SLOG, define the contract. Are you accelerating sync writes for correctness, or masking a design issue?

Step-by-step: if you already installed a SATA SSD SLOG

  1. Identify the device and model. If it’s consumer, assume “not PLP.”
  2. Check if it’s mirrored. If not mirrored, document the risk and prioritize remediation.
  3. Observe under load. Watch iostat -x and zpool iostat -v during sync-heavy periods.
  4. Run a controlled fio sync test. Track max latency and jitter, not just IOPS.
  5. Make the call: remove it, or replace with mirrored PLP devices.

Step-by-step: build a correct SLOG setup (the version you won’t regret)

  1. Pick devices designed for durable low-latency writes. PLP is the headline feature; consistent latency is the hidden one.
  2. Use two devices as a mirror. If you can’t, accept that you’re choosing “risk” as a feature.
  3. Use stable device paths. Prefer /dev/disk/by-id/..., not /dev/sdX.
  4. Test failover. Offline one log device in a maintenance window; confirm the pool stays healthy and latency stays sane (a minimal drill is sketched after this list).
  5. Monitor wear and errors. Set alerts for SMART wear, media errors, and bus resets.
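
A minimal version of the step-4 drill, assuming a mirrored log on stable by-id paths (the device name here is hypothetical):

cr0x@server:~$ sudo zpool offline tank /dev/disk/by-id/nvme-EXAMPLE_SLOG_A
cr0x@server:~$ sudo zpool status tank    # expect the pool to keep serving with one log leg
cr0x@server:~$ sudo fio --name=drill --directory=/tank/test --rw=write --bs=4k --iodepth=1 --size=128M --fsync=1
cr0x@server:~$ sudo zpool online tank /dev/disk/by-id/nvme-EXAMPLE_SLOG_A

If sync latency blows up while one leg is offline, your mirror was really a crutch for an undersized device. Run this only in a maintenance window, on an otherwise healthy pool.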

Operational checklist: things to document so future-you doesn’t suffer

  • Which datasets require sync semantics and why (NFS exports, VM stores, database volumes).
  • The expected behavior if the SLOG fails (what risk exists, what alerts fire, what runbook steps are).
  • How to remove/replace the SLOG safely.
  • Who is allowed to change sync properties and under what approval.

FAQ

1) Will a SLOG speed up all writes on ZFS?

No. It only helps synchronous writes. Async writes bypass it and go through normal TXG buffering and commit.

2) How do I know if my workload is sync-heavy?

Look for heavy writes on the log vdev via zpool iostat -v. Confirm dataset sync settings.
For NFS and many VM stacks, assume significant sync unless you’ve verified client/server settings.

3) Is using a consumer SATA SSD as SLOG always wrong?

For non-critical homelab experimentation, you can do it. For production where data correctness matters, it’s usually the wrong bet.
The risk is durability and latency spikes, not just raw speed.

4) What’s the difference between ZIL and SLOG?

ZIL is the mechanism. SLOG is a dedicated device where ZIL records are stored. Without SLOG, the ZIL lives on the pool.

5) Should SLOG be mirrored?

If you care about acknowledged sync writes surviving device failure, yes. A single SLOG device is a single point of “those writes are gone.”

6) If the SLOG dies, do I lose the whole pool?

Usually no; the pool can keep running or import without the log device depending on failure timing and configuration. The real danger is loss of the most recent acknowledged synchronous writes.

7) Why not just set sync=disabled and move on?

Because you’re changing the storage contract. Databases, VM filesystems, and NFS clients may believe data is safe when it isn’t.
That’s how you get “everything looked fine” followed by post-crash inconsistency.

8) How big does a SLOG need to be?

Often smaller than people think. You’re storing short-lived log records until TXG commit. Size helps with overprovisioning and endurance,
but latency consistency and PLP matter more than capacity.
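
A rough sizing sketch, under the assumption that the log only needs to hold a few TXGs' worth of incoming sync data (illustrative numbers, not a recommendation):

  • Max sync ingest: a 10 GbE front end tops out around 1.25 GB/s.
  • Data in flight: a few TXG intervals, call it 10-15 seconds.
  • 1.25 GB/s × 15 s ≈ 19 GB of live log records, worst case.

Even with generous rounding that is a tiny slice of any modern SSD, which is why the advice is to buy for latency consistency, PLP, and endurance; the leftover capacity is free overprovisioning, not something you need to fill.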

9) Is NVMe always better for SLOG?

NVMe tends to have better latency characteristics and queueing, but “NVMe” is not a guarantee of PLP or consistent behavior.
You still choose models known for durable flush behavior and stable tail latency.

10) Could my issue be ARC or RAM, not SLOG?

Yes. If reads are thrashing and the system is memory-starved, everything gets slow and you’ll misattribute it to the log.
That’s why the fast diagnosis starts with proving it’s a sync-write bottleneck.

Conclusion: next steps you can act on today

A SATA SSD SLOG is tempting because it’s cheap and easy. That’s also why it’s such a reliable source of production pain:
it changes durability semantics, concentrates sync latency onto a single device, and exposes the ugly corners of consumer SSD behavior.

Practical next steps:

  1. Run the fast diagnosis playbook. Confirm you have a sync problem before you buy hardware or flip properties.
  2. If you already have a consumer SATA SLOG, measure it. If it’s pegged or spiky, remove it and retest. Don’t guess.
  3. If you need a SLOG for NFS/VM correctness, do it properly. Use power-loss protected devices, and mirror them.
  4. Stop treating sync=disabled like a tuning knob. Treat it like a risk acceptance that requires adult supervision.

The goal isn’t maximal benchmark numbers. The goal is stable latency and reliable semantics—especially on the worst day, when power flickers,
a drive misbehaves, and your job becomes explaining reality to people who were promised safety.
