ZFS SLOG Power-Loss Protection: The Feature Your SSD Must Have

If you run ZFS in production long enough, you’ll eventually meet the villain of the story: synchronous writes. They’re the reason your NFS datastore feels “mysteriously” sluggish, your VM platform stalls during a busy hour, or your database insists on a latency budget you can’t seem to buy with more spindles.

The hero people reach for is a SLOG device—an SSD dedicated to handling ZFS’s intent log (ZIL). But here’s the twist: the single most important feature of a SLOG SSD is not throughput, not even raw latency. It’s power-loss protection (PLP). Without it, you’re installing a performance accessory that can turn into a durability liability. In other words: you can make things faster, but also easier to lose on a bad day.

What a SLOG is (and what it isn’t)

Let’s get terminology straight, because half the SLOG arguments on the internet are really arguments about what problem we’re solving.

ZIL is the ZFS Intent Log. It exists to satisfy synchronous write semantics: when an application asks for a write that must be durable before continuing, ZFS needs a place to commit that intent quickly, even if the main pool is slow (think HDD RAIDZ). If the system crashes before those writes are committed into the main pool, ZFS can replay the intent log on import.

SLOG is a separate log device—a dedicated vdev used to store the ZIL on fast media. In practice, a SLOG is mainly a latency reducer for synchronous writes. It does not make your bulk sequential writes faster. It does not magically improve async workloads. It doesn’t fix a pool that’s already saturated on random reads. It does one thing very well: it gives sync writes a short, fast, reliable landing strip.

The most common surprise: the SLOG is not a “write cache” in the usual sense. ZFS still writes the data to the main pool from memory during the normal transaction-group commit. The SLOG holds the intent-log records and (for sync writes) enough data to satisfy the durability promise until that commit completes; in normal operation the log is only written, and it is read back only during crash recovery. That’s why SLOG capacity requirements are often modest and why “fast + safe” beats “big.”
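
To keep the vocabulary straight: the log role and the read-cache role are added as different vdev types. A minimal sketch of the two commands, with the pool name from this article and placeholder device paths:

cr0x@server:~$ sudo zpool add tank log /dev/disk/by-id/nvme-EXAMPLE_SLOG      # separate intent log (SLOG)
cr0x@server:~$ sudo zpool add tank cache /dev/disk/by-id/nvme-EXAMPLE_L2ARC   # L2ARC read cache, a different job entirely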

Joke #1: A SLOG without power-loss protection is like a parachute packed by someone who “usually gets it right.” Technically a parachute, emotionally a suggestion.

Why power-loss protection is non-negotiable for SLOG

Power-loss protection (PLP) is the ability of an SSD to keep its in-flight writes safe when the lights go out—typically via onboard capacitors that provide enough energy to flush volatile DRAM buffers to NAND. Many consumer SSDs have volatile write caching and are allowed to acknowledge writes before those writes are actually safe on persistent media.

That is acceptable in many desktop scenarios because the filesystem or application may not rely on strict ordering and durability. ZFS SLOG is the opposite. ZFS uses the log to satisfy a promise: “yes, that sync write is durable now.” If your SSD lies—acknowledges completion but loses power before the data is truly persistent—you can end up with missing or corrupt intent log records. Best case, you lose a few seconds of acknowledged sync writes (which is already unacceptable for the applications that requested sync). Worst case, you create a replay mess that increases recovery time or yields application-level inconsistency.

In SRE terms: if your service-level objective includes “committed transactions survive a crash,” the log device must not turn power failure into silent data loss. PLP isn’t about performance first; it’s about making the performance you gain legitimate.

There’s also a less obvious reason: a non-PLP SSD can fail in a way that looks like “random latency spikes.” Under power jitter or controller hiccups, the drive may aggressively flush caches or retry, dragging latency into the milliseconds-to-seconds range. ZFS sync writes are latency-sensitive. Your users don’t care that the average latency is great if the 99.9th percentile is “why did my VM freeze.”

What PLP does not guarantee: it doesn’t make a drive immortal, it doesn’t prevent firmware bugs, and it doesn’t protect you from kernel panics. It specifically addresses the “power went away while data was in volatile buffers” problem. That’s a major class of failure in datacenters and branch offices alike.

How ZIL/SLOG actually works: the engineer’s version

ZFS batches writes into transaction groups (TXGs). For asynchronous writes, ZFS can absorb data into memory and flush it to disk later, when the TXG syncs. For synchronous writes, ZFS must ensure the write is stable before acknowledging it to the caller. That’s where the ZIL comes in.

When an application issues a sync write, ZFS records enough information in the ZIL to replay that write after a crash. This record can include the data itself depending on the write size and configuration. The ZIL is written sequentially in a log-like fashion, which is friendly to low-latency devices. On a pool without a separate log, these writes land on the main vdevs, which might be HDDs with miserable fsync latency.

With a dedicated SLOG, ZFS writes these log records to the SLOG vdev instead of scattering them across the pool. Later, when the next TXG commit happens, the main pool receives the real data and the ZIL entries become obsolete. The SLOG is not a permanent home; it’s a temporary staging area to make sync semantics practical.

Two operational consequences matter:

  • The SLOG is on the write acknowledgement path for sync writes. If it’s slow, your clients are slow.
  • The SLOG must be reliable under power loss. If it lies, it breaks the promise.

Another subtlety: ZFS depends on the device honoring flush and ordering semantics. A SLOG with a volatile cache that ignores flushes or reorders writes can violate those assumptions. Enterprise SSDs with PLP are typically designed to honor write barriers and flushes properly; consumer SSDs sometimes optimize them away for benchmarks.
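
If you want to see the knobs behind this behavior, two quick reads help; the sysfs path assumes OpenZFS on Linux, the dataset name is the one used later in this article, and defaults vary by version:

cr0x@server:~$ cat /sys/module/zfs/parameters/zfs_txg_timeout   # seconds between TXG commits (commonly 5)
cr0x@server:~$ zfs get logbias tank/vmstore                     # latency (default) favors the SLOG; throughput steers data away from it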

Interesting facts and historical context

Storage doesn’t repeat itself, but it does rhyme. A few context points that help you make better SLOG decisions:

  1. ZFS originated at Sun Microsystems in the mid-2000s, when disks were huge and slow and RAM was expensive; its transactional design assumed crashes happen and recovery should be deterministic.
  2. The ZIL exists even without a SLOG. People sometimes think “no SLOG means no ZIL,” but ZFS always has an intent log—by default it lives on the pool.
  3. Power-loss protection in SSDs is old news in enterprise storage—it’s basically the SSD version of “battery-backed write cache” from RAID controllers, just without the battery maintenance headaches.
  4. Consumer SSD benchmarks often hide the dangerous part. Many tests measure throughput under steady power and ignore fsync/flush semantics, which is exactly where SLOG correctness lives.
  5. NFS made sync semantics everybody’s problem. Many NFS setups default to synchronous behavior for safety, so the storage backend’s fsync latency becomes user-visible pain.
  6. Early SSDs sometimes had dramatic performance cliffs under sustained sync workloads because their firmware was tuned for desktop traces, not journaling workloads with strict barriers.
  7. The “write cache lie” is not hypothetical. The industry has a long history of devices acknowledging writes before persistence—sometimes by design, sometimes by bug, sometimes by “optimizing” flushes away.
  8. ZFS’s copy-on-write model means it doesn’t overwrite live blocks; it writes new blocks and updates metadata. That’s great for consistency, but it also means sync write paths have more metadata work than a simplistic filesystem.

Choosing a SLOG SSD: what to demand, what to ignore

Demand: real power-loss protection

PLP is the headline. On an enterprise SSD, you’ll usually see it described as power-loss data protection, power-fail protection, or capacitor-backed cache flush. Physically, it often means visible capacitor arrays on the PCB (especially on U.2 / PCIe cards), but not always. The real test is whether the device is designed to safely complete acknowledged writes after power loss.

Operationally, PLP correlates with:

  • Consistent low latency for sync writes
  • Honoring flushes and barriers
  • Fewer “mystery” stalls under pressure

Demand: low latency, not peak throughput

SLOG is about ack time. The key metric is latency under sync write patterns, particularly small-block writes with flushes. A drive that does 7 GB/s sequential might still deliver disappointing 4K sync write latency if its firmware and cache strategy aren’t designed for it.

Prefer: endurance and steady-state behavior

SLOG writes are small, frequent, and repetitive. Even though ZIL entries are short-lived and are freed as soon as their transaction group commits, you can generate a surprising amount of write amplification. Look for drives with solid endurance ratings and, more importantly, stable behavior under sustained writes.

Prefer: simple topology and strong monitoring

NVMe devices usually give excellent latency and good visibility via SMART/log pages. SATA can work, but it’s easier to find SATA consumer drives without PLP, and SATA’s queueing can become the bottleneck with many clients. Use what fits your chassis and your failure domain planning.

Ignore: large capacity

A SLOG typically only needs to cover a few seconds of sync write traffic. The relevant sizing is based on your TXG timeout and burst behavior, not “how much data I store.” Huge SLOGs are usually a sign of shopping by marketing bullet points.
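
A back-of-the-envelope sizing sketch, assuming a 10 GbE front end (roughly 1.25 GB/s of possible sync ingest), the common 5-second TXG interval, and about two transaction groups in flight; these are assumptions, not measurements from this article:

cr0x@server:~$ echo "$(( 1250 * 5 * 2 )) MB"   # ~1.25 GB/s x 5 s x 2 TXGs; call it ~16 GB with headroom
12500 MB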

Mirror the SLOG if you care about uptime

Important nuance: losing a SLOG device does not typically corrupt the pool, but it can cause loss of the most recent acknowledged sync writes that were only in the log. Also, operationally, a dead SLOG means your sync workload falls back onto the main pool, which can be a performance cliff. Mirroring the SLOG is a common “boring but correct” move for uptime.

Joke #2: Nothing makes you appreciate a mirrored SLOG like explaining to a CFO why “it’s safe” and “it’s slow” can happen at the same time.

Deploying SLOG safely: patterns that survive audits and outages

Understand your sync workload first

If you don’t have synchronous writes, a SLOG won’t help. Many VM and database stacks generate sync writes, but some workloads are mostly async and won’t care. Measure before you buy.
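
One low-effort way to measure it, assuming a reasonably recent OpenZFS where zpool iostat supports queue statistics:

cr0x@server:~$ zpool iostat -q -v tank 1   # watch the syncq_write columns; sustained activity there means real sync traffic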

Use a dedicated device, not a partition shared with “something else”

In production, shared log devices tend to become shared pain. Mixing SLOG with other filesystems, swap, or “temporary scratch” is a great way to introduce unpredictable latency.

Mirror for availability, not for speed

A mirrored SLOG doesn’t double performance; it improves resilience and can smooth out tail latency when a device starts misbehaving. If your platform serves VM storage or databases, you will eventually thank your past self for mirroring.

Keep it close to the CPU, and keep it cool

NVMe behind a flaky PCIe riser or an overheated M.2 slot is a reliability experiment you didn’t mean to run. Thermal throttling shows up as “random” sync latency spikes. Put log devices in bays with proper airflow and monitor temperatures.
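
A quick way to keep an eye on temperature during a load test; the device path is an example:

cr0x@server:~$ sudo watch -n 5 'nvme smart-log /dev/nvme0n1 | grep -i temperature'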

Practical tasks: commands, checks, and what the output means

These are real operational tasks you can run on a Linux system with OpenZFS. Commands assume you have root or sudo access. Adjust pool names and device paths.

Task 1: Confirm whether your workload is actually doing sync writes

cr0x@server:~$ zpool iostat -v 1
              capacity     operations     bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
tank        12.3T  5.1T    120   980     8.1M  34.2M
  raidz2    12.3T  5.1T    120   980     8.1M  34.2M
    sda         -      -     20   160     1.4M   5.7M
    sdb         -      -     20   160     1.3M   5.8M
    ...

Interpretation: This shows pool IO, but not sync vs async. It’s your first “is anything happening” check. If writes are low but clients complain, latency rather than throughput may be the issue.
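
A useful follow-up on recent OpenZFS versions is the latency view, which breaks out average wait times per vdev:

cr0x@server:~$ zpool iostat -l -v tank 1   # look at total_wait and syncq_wait in the write columns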

Task 2: Check ZFS dataset sync settings (the foot-gun)

cr0x@server:~$ zfs get -r sync tank
NAME            PROPERTY  VALUE  SOURCE
tank            sync      standard  default
tank/vmstore    sync      standard  local
tank/backups    sync      disabled  local

Interpretation: standard respects application sync requests. disabled lies to applications (performance boost, durability risk). always forces sync and can crush performance without a SLOG.
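
If a dataset turns out to be set to disabled and nobody explicitly signed off on that risk, reverting it is one command (dataset name taken from the output above):

cr0x@server:~$ sudo zfs set sync=standard tank/backups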

Task 3: See whether you already have a SLOG configured

cr0x@server:~$ zpool status -v tank
  pool: tank
 state: ONLINE
config:

        NAME                        STATE     READ WRITE CKSUM
        tank                        ONLINE       0     0     0
          raidz2-0                  ONLINE       0     0     0
            sda                     ONLINE       0     0     0
            sdb                     ONLINE       0     0     0
            sdc                     ONLINE       0     0     0
            sdd                     ONLINE       0     0     0
        logs
          mirror-1                  ONLINE       0     0     0
            nvme0n1                 ONLINE       0     0     0
            nvme1n1                 ONLINE       0     0     0

Interpretation: Look for a logs section. If it’s there, you have a SLOG. If it’s a single device, decide if uptime warrants mirroring.

Task 4: Identify the actual device model and firmware

cr0x@server:~$ lsblk -d -o NAME,MODEL,SIZE,ROTA,TRAN,SERIAL
NAME    MODEL                    SIZE ROTA TRAN SERIAL
sda     ST12000NM0007           10.9T    1 sata ZS0A...
nvme0n1 INTEL SSDPE2KX040T8     3.7T    0 nvme PHM...

Interpretation: You want to know what’s really installed. “Some SSD” is not an inventory strategy.

Task 5: Check NVMe health, including power cycles and media errors

cr0x@server:~$ sudo nvme smart-log /dev/nvme0n1
Smart Log for NVME device:nvme0n1 namespace-id:ffffffff
critical_warning                    : 0x00
temperature                         : 41 C
available_spare                     : 100%
percentage_used                     : 3%
data_units_written                  : 184,221,991
media_errors                        : 0
num_err_log_entries                 : 0
power_cycles                        : 27
power_on_hours                      : 3912
unsafe_shutdowns                    : 1

Interpretation: unsafe_shutdowns is a clue about power events. One is not panic-worthy; a growing number without explanation is.

Task 6: Check if the drive is doing thermal throttling (latency killer)

cr0x@server:~$ sudo nvme id-ctrl /dev/nvme0n1 | grep -i -E 'tnvmcap|mn|fr'
mn        : INTEL SSDPE2KX040T8
fr        : VDV10131
tnvmcap   : 4000787030016

Interpretation: This is identification. Pair it with temperature from SMART and your chassis airflow reality. If your SLOG runs hot, it will eventually “benchmark” like a much worse device.

Task 7: Watch SLOG IO directly via iostat by vdev

cr0x@server:~$ zpool iostat -v tank 1
                              operations         bandwidth
pool                        read  write        read  write
--------------------------  ----  -----       ----  -----
tank                         180  2200       12.1M  95.4M
  raidz2-0                    180   400       12.1M  18.2M
    sda                         30    70        2.1M   4.6M
    ...
logs                             -  1800          -  77.2M
  mirror-1                        -  1800          -  77.2M
    nvme0n1                       -   900          -  38.6M
    nvme1n1                       -   900          -  38.6M

Interpretation: If you see heavy write ops on logs, you have real sync traffic. If logs are idle, your bottleneck is elsewhere or your clients aren’t issuing sync writes.

Task 8: Check ZFS pool properties that impact sync behavior

cr0x@server:~$ zpool get -o name,property,value,source ashift,autotrim,autoreplace,cachefile tank
NAME  PROPERTY     VALUE     SOURCE
tank  ashift       12        local
tank  autotrim     on        local
tank  autoreplace  off       default
tank  cachefile    /etc/zfs/zpool.cache  local

Interpretation: Not directly SLOG-related, but misaligned ashift and neglected trim can cause write amplification and unpredictable latency.
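
If autotrim has been off for a long time, a manual pass plus a progress check is cheap; both assume OpenZFS 0.8 or newer:

cr0x@server:~$ sudo zpool trim tank
cr0x@server:~$ zpool status -t tank   # shows per-vdev trim state and progress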

Task 9: Add a mirrored SLOG (safe pattern) to an existing pool

cr0x@server:~$ sudo zpool add tank log mirror /dev/nvme0n1 /dev/nvme1n1

Interpretation: This creates a mirrored log vdev. Make sure these devices are not used for anything else and that you reference them by stable identifiers (/dev/disk/by-id paths are better in real operations than /dev/nvme0n1-style names).
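
To find those stable names before running the command above, list the by-id symlinks; the grep pattern is just an example:

cr0x@server:~$ ls -l /dev/disk/by-id/ | grep -i nvme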

Task 10: Remove a SLOG correctly (when decommissioning)

cr0x@server:~$ sudo zpool remove tank nvme0n1

Interpretation: On supported OpenZFS versions, you can remove log devices. Confirm status afterward. Don’t just yank the device and “hope ZFS figures it out.” It will, but you’ll learn about degraded pools at 3 a.m.

Task 11: Generate a sync write test that actually exercises the SLOG

cr0x@server:~$ sudo zfs create -o sync=standard tank/slogtest
cr0x@server:~$ cd /tank/slogtest
cr0x@server:~$ sync; sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
cr0x@server:~$ /usr/bin/time -f "elapsed=%e sec" dd if=/dev/zero of=./syncfile bs=8k count=20000 oflag=dsync status=none
elapsed=9.84 sec

Interpretation: oflag=dsync forces each write to be synchronous. This is closer to what databases and NFS clients do. Compare results with and without a SLOG (and during load) to see if it helps.
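
If fio is available, it gives a more realistic picture than dd, including latency percentiles. A sketch with arbitrary but reasonable parameters; adjust size and runtime to your environment:

cr0x@server:~$ sudo fio --name=slog-sync --directory=/tank/slogtest --rw=randwrite --bs=8k \
      --size=1G --numjobs=1 --iodepth=1 --fsync=1 --runtime=60 --time_based --group_reporting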

Task 12: Verify that your SLOG is not being bypassed due to dataset settings

cr0x@server:~$ zfs get sync tank/vmstore
NAME         PROPERTY  VALUE     SOURCE
tank/vmstore sync      standard  local

Interpretation: If it’s disabled, you may see great performance—until you test crash consistency. If it’s always, expect higher SLOG utilization and ensure the device can handle it.

Task 13: Check for extreme latency at the block layer

cr0x@server:~$ iostat -x 1 /dev/nvme0n1
Linux 6.8.0 (server)  12/25/2025  _x86_64_  (32 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           4.12    0.00    2.01    8.55    0.00   85.32

Device            r/s     w/s   rkB/s   wkB/s  rrqm/s  wrqm/s  %util  await  svctm
nvme0n1          0.00  950.00    0.00 7600.00    0.00    0.00  42.10   0.45   0.08

Interpretation: For a good SLOG device, you want low await under sync load. If await spikes to tens or hundreds of milliseconds, something is wrong: throttling, firmware, PCIe errors, or contention.

Task 14: Inspect kernel logs for NVMe resets or PCIe issues

cr0x@server:~$ sudo dmesg -T | grep -i -E 'nvme|pcie|reset|timeout' | tail -n 30
[Thu Dec 25 01:12:44 2025] nvme nvme0: I/O 123 QID 4 timeout, aborting
[Thu Dec 25 01:12:44 2025] nvme nvme0: Abort status: 0x0
[Thu Dec 25 01:12:45 2025] nvme nvme0: resetting controller

Interpretation: A SLOG device that occasionally resets will feel like “random fsync freezes.” This is not a ZFS tuning problem; it’s hardware/firmware stability.

Task 15: Confirm the pool is healthy and not silently retrying errors

cr0x@server:~$ zpool status -x
all pools are healthy

Interpretation: If you see errors or degraded vdevs, fix those first. SLOG won’t save a pool that’s already on fire.

Fast diagnosis playbook: find the bottleneck in minutes

This is the “someone is yelling in chat, VMs are slow, what do I check first” sequence. It’s optimized for speed and signal, not completeness.

First: confirm you’re dealing with synchronous write latency

  1. Check client symptoms: NFS “server not responding,” database commit stalls, hypervisor IO wait, guest fsync-heavy workloads.
  2. On the ZFS host, watch per-vdev IO:
    cr0x@server:~$ zpool iostat -v tank 1
    

    Signal: If the logs vdev shows lots of writes, you’re in sync-write land. If logs are idle, the complaint is likely read latency, CPU, network, or async write saturation.

Second: check whether the SLOG device is the limiter

  1. Look at block-layer latency:
    cr0x@server:~$ iostat -x 1 /dev/nvme0n1
    

    Signal: High await on the SLOG device correlates directly with stalled sync writes.

  2. Check for resets/timeouts:
    cr0x@server:~$ sudo dmesg -T | grep -i -E 'nvme|timeout|reset' | tail
    

    Signal: Any controller resets during the incident window are a smoking gun.

  3. Check thermals and health:
    cr0x@server:~$ sudo nvme smart-log /dev/nvme0n1 | egrep -i 'temperature|media_errors|unsafe_shutdowns|percentage_used'
    

    Signal: Overheating or error counts trending upward are reliability problems masquerading as performance problems.

Third: if SLOG is fine, check pool and system contention

  1. Pool-wide saturation:
    cr0x@server:~$ zpool iostat -v tank 1
    

    Signal: If main vdevs show huge queues/utilization and logs are moderate, you may be bottlenecked on data flushes or reads.

  2. CPU and IO wait:
    cr0x@server:~$ vmstat 1
    

    Signal: Sustained high wa (IO wait) or system CPU can indicate broader contention.

  3. Network (for NFS/iSCSI):
    cr0x@server:~$ ip -s link show dev eth0
    

    Signal: Drops, errors, or buffer overruns can make storage look slow.

Common mistakes: symptoms and fixes

Mistake 1: “Any SSD will do for SLOG”

Symptom: Great performance in light tests, terrible tail latency under real load; occasional missing acknowledged writes after power events; scary recovery stories.

Fix: Use an enterprise SSD with PLP. If you can’t verify PLP, treat the device as non-PLP and do not put it on the sync write durability path.

Mistake 2: Using sync=disabled as a “performance feature”

Symptom: Everything is fast until an outage; then the database needs repair, VMs have corrupted filesystems, or the application sees “successful” commits that never made it to disk.

Fix: Put datasets back to sync=standard (or keep it standard and add a proper SLOG). If a vendor requires sync=disabled, treat that as a risk acceptance decision, not a tuning knob.

Mistake 3: Oversizing SLOG and undersizing expectations

Symptom: You bought a huge SSD and saw no improvement.

Fix: SLOG success is about latency and correctness, not capacity. Measure sync write latency; size for a few seconds of peak sync write traffic, not for terabytes.

Mistake 4: Single SLOG device on a platform that can’t tolerate surprises

Symptom: One SSD dies, and suddenly NFS/VM storage becomes unusably slow or you lose the newest sync writes from the last moments before failure.

Fix: Mirror the SLOG. It’s cheap insurance compared to incident time and stakeholder confidence.
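
If you already run a single log device, you usually don’t need to rebuild anything; on OpenZFS you can attach a second device to turn it into a mirror. A sketch, with the existing device name taken from zpool status and a placeholder for the new one:

cr0x@server:~$ sudo zpool attach tank nvme0n1 /dev/disk/by-id/nvme-EXAMPLE_SSD_B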

Mistake 5: Putting SLOG on a thermally constrained M.2 slot

Symptom: Performance is great for 5–15 minutes, then fsync latency spikes; the host “feels” haunted.

Fix: Move SLOG to a properly cooled bay or add heatsinks/airflow. Monitor temperatures. Thermal stability is performance.

Mistake 6: Confusing SLOG with L2ARC or “special vdev”

Symptom: You added devices and nothing improved, or the wrong thing improved.

Fix: SLOG helps sync writes. L2ARC helps reads (and usually not latency-critical ones unless working set is huge). Special vdev helps metadata and small blocks (and can be dangerous if not mirrored). Pick based on the bottleneck, not vibes.

Checklists / step-by-step plan

Step-by-step plan: deciding whether you need a SLOG

  1. Identify the workload type. NFS for VMs? Databases? iSCSI? These often care about sync semantics.
  2. Measure current sync write pain. Use zpool iostat -v 1 and a targeted dd ... oflag=dsync test on a representative dataset.
  3. Confirm dataset properties. Ensure sync=standard unless you explicitly accept the risk of disabled.
  4. If sync writes are present and slow, plan a SLOG. Don’t buy hardware until you’ve confirmed the path.

Step-by-step plan: selecting a SLOG SSD

  1. Require PLP. If you can’t validate PLP, treat it as absent.
  2. Prioritize latency consistency. Drives marketed for mixed workloads or datacenter use tend to be better behaved.
  3. Plan for endurance. Especially for busy NFS/VM clusters where sync writes are constant.
  4. Prefer NVMe where possible. SATA can work, but NVMe usually wins for queueing and latency.
  5. Plan mirroring for availability. One device is a performance improvement; two is an operational posture.

Step-by-step plan: deploying a mirrored SLOG safely

  1. Inventory devices by stable paths. Use /dev/disk/by-id rather than raw /dev/nvme0n1 when possible.
  2. Add the log mirror.
    cr0x@server:~$ sudo zpool add tank log mirror /dev/disk/by-id/nvme-SSD_A /dev/disk/by-id/nvme-SSD_B
    
  3. Verify pool status.
    cr0x@server:~$ zpool status -v tank
    
  4. Load test sync writes. Run dd ... oflag=dsync or a workload generator during a maintenance window.
  5. Set up monitoring. Track NVMe media errors, unsafe shutdowns, temperature, and latency metrics; a minimal starting sketch follows this list.
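
A minimal, illustrative starting point for the monitoring step, assuming root cron and nvme-cli; the schedule, device path, and log file are arbitrary placeholders:

cr0x@server:~$ echo '*/15 * * * * root nvme smart-log /dev/nvme0n1 >> /var/log/slog-smart.log 2>&1' | sudo tee /etc/cron.d/slog-smart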

Three corporate-world mini-stories from the trenches

Mini-story 1: The incident caused by a wrong assumption

At a mid-sized company running a virtualization cluster, the storage team had a ZFS-backed NFS appliance that “worked fine” for months. Someone proposed adding a SLOG to improve VM responsiveness during peak hours. Procurement found an inexpensive consumer NVMe drive with impressive benchmark charts. It went into production on a Friday, because of course it did.

Performance looked fantastic. Latency graphs dropped. The success email went out. Then a power event hit the building—nothing dramatic, just a short interruption that the UPS should have masked, except one battery pack was overdue for replacement. The host rebooted, imported the pool, and NFS came back. The hypervisors reconnected. Everyone exhaled.

Over the next few hours, a handful of VMs started showing application-level corruption: a database needed recovery, a message queue had missing acknowledged entries, and a file server had log gaps that didn’t align with any planned maintenance. The team initially suspected the hypervisors or guest OS bugs because the pool imported cleanly and zpool status looked fine.

The postmortem finally narrowed it down to the “upgrade” from last week: the SLOG SSD. The consumer drive had volatile write caching and no PLP. Under power loss, it had acknowledged sync writes that never made it to NAND. ZFS did what it could on replay, but it cannot replay log records that were never truly written. The system had followed the contract; the device had not.

The fix was not a clever sysctl. They replaced the log device with a PLP-capable enterprise SSD, mirrored it, and added a policy: if a device participates in acknowledging durability, it must be explicitly rated for power-loss safety. The embarrassing part wasn’t that the SSD was cheap—it was that the assumption “SSD equals safe” had gone unchallenged.

Mini-story 2: The optimization that backfired

A different organization ran ZFS over a large HDD pool and served iSCSI to a fleet of application servers. They were fighting latency during batch jobs. Someone read that sync=disabled can “make ZFS fast” and pitched it as a reversible optimization. It was implemented on a critical dataset with the logic: “the SAN has a UPS, and the app retries anyway.”

What happened next was subtle. The batch jobs got faster, yes. So fast that upstream systems increased concurrency. Latency looked better at first, but now the pool was absorbing more writes into memory. Under heavy bursts, the system hit memory pressure, and TXG syncs became more brutal. The platform developed a new pattern: periodic stalls where everything paused to catch up.

Then the backfire arrived: a kernel panic triggered by an unrelated driver bug. When the box rebooted, the pool imported, but the application discovered inconsistencies that weren’t supposed to be possible under its transaction model. The panic wasn’t a power failure, but it had the same effect: volatile memory was gone, and with sync=disabled the storage had been lying about durability the whole time.

The real lesson wasn’t “never change settings.” It was that the setting changed the contract with the application. Also, performance wins can create demand that shifts bottlenecks elsewhere. After rolling back to sync=standard and deploying a proper mirrored PLP SLOG, they got most of the performance benefit without invalidating the durability guarantees. The system became predictably fast instead of occasionally fast and sometimes catastrophic.

I’ve seen this pattern repeatedly: a risky optimization “works” until it becomes a dependency. Then it’s not a tweak; it’s a design decision you didn’t document.

Mini-story 3: The boring but correct practice that saved the day

A financial services shop (the kind that has opinions about “durable”) ran ZFS for NFS home directories and a small database cluster. Their SLOG was a mirrored pair of enterprise SATA SSDs with PLP. Not exciting. Not new. But thoroughly chosen: stable firmware, known behavior, decent endurance.

They also did something unfashionable: scheduled quarterly pull-the-plug tests in a lab environment that mirrored production. Not on the actual production array, but on the same model with realistic load. They validated that after abrupt power loss, the pool imported cleanly and the last acknowledged sync writes remained consistent at the application level. They tracked drive health and replaced aging SSDs based on wear indicators rather than hope.

One year, a maintenance contractor accidentally killed power to the wrong rack segment. The UPS carried most of it, but one PDU dropped long enough to reboot the storage head. Clients disconnected. Alerts fired. People ran to the data center with that particular mixture of urgency and resignation.

The system came back exactly like the runbooks said it would. ZFS replayed the log, exports resumed, and applications recovered without data repair. The incident was still expensive in human attention, but it didn’t become a data integrity crisis. Later, during the review, the team realized the reason they were calm was not heroics—it was repetition. They had already seen this failure mode in tests and had already decided what “correct” looked like.

The takeaway is almost annoyingly plain: PLP plus mirroring plus validation turns “we think it’s safe” into “we know how it behaves.” That’s what production engineering is.

FAQ

1) Do I always need a SLOG for ZFS?

No. If your workload is mostly asynchronous writes (or read-heavy), a SLOG may do almost nothing. SLOG matters when sync writes are a meaningful portion of your latency budget: NFS for VMs, databases with fsync-heavy patterns, some iSCSI stacks, and any workload that explicitly requests durability per write.

2) Can I use an old consumer SSD as a SLOG if I mirror it?

Mirroring helps device failure, not correctness under power loss. Two drives that can both lie about durability are not a truth machine. Mirroring non-PLP drives can reduce downtime but doesn’t fix the fundamental “acknowledged but not persistent” risk.

3) How big should my SLOG be?

Usually small. Think “seconds of peak sync write traffic,” not “percent of pool.” Many production systems use SLOGs in the tens of GB to a few hundred GB range, often because the suitable enterprise SSD comes in that size, not because ZFS needs it.

4) Should I put the SLOG on NVMe or SATA?

NVMe is generally better for latency and queue handling, especially with many clients. SATA can be fine if it’s a true enterprise PLP drive and your sync write rate isn’t extreme. The key is consistent low latency under flush-heavy small writes.

5) Does a SLOG help with async write throughput?

Not much, and sometimes not at all. Async writes are primarily governed by TXG syncing to the main pool, memory, and overall pool write bandwidth. SLOG helps the acknowledgement path of synchronous writes.

6) What about using a “special vdev” instead of SLOG?

Different tool. A special vdev accelerates metadata and (optionally) small blocks, helping random reads and some write patterns. It does not replace the sync write log. Also, special vdevs must be treated as critical (mirror them) because losing them can be catastrophic for the pool.

7) If I have PLP, do I still need a UPS?

Yes. PLP covers a narrow problem: the SSD safely persists acknowledged writes when power disappears. A UPS covers the broader problem: keeping the whole system stable, preventing repeated crash cycles, and giving you time for orderly shutdown. They complement each other.

8) How do I know whether my SSD really has PLP?

Trust the vendor’s enterprise documentation and product line positioning, but verify operationally: look for datacenter-class SSDs marketed with explicit power-loss data protection, check for capacitor-backed designs, and avoid drives where PLP is vague or absent. Also watch behavior: devices without PLP often show suspicious flush performance and inconsistent sync latency. Ultimately, you want evidence, not hope.

9) Does ZFS ignore drive caches or force flushes?

ZFS uses flushes and ordering semantics appropriate for synchronous operations, but it cannot force a device to be honest. If firmware acknowledges completion before data is persistent, the OS can’t retroactively make it true. That’s why PLP matters.

10) What happens if my SLOG dies?

If the SLOG device fails, ZFS can typically continue with the pool (often in a degraded state or after removing the log), but you may lose the last acknowledged sync writes that lived only in the log. Performance will likely fall back to main-pool sync latency, which can be brutal on HDD pools. Mirroring reduces the chance that a single SSD failure becomes an outage.

Conclusion

A SLOG is not a vanity accessory for ZFS. It’s a targeted fix for a specific pain: synchronous write latency. But the moment you put a device on the path where applications expect durable acknowledgement, you’ve promoted that device from “fast storage” to “part of the truth.”

That’s why power-loss protection is the feature your SLOG SSD must have. PLP makes performance honest. Mirror it if uptime matters. Keep it cool. Measure sync behavior before and after. And when you’re tempted by a cheap drive with pretty benchmarks, remember that ZFS is transactional—your log device should be, too.
