Your scrub has been “in progress” long enough that people are asking if the storage is haunted. Applications feel sluggish, dashboards show a sea of I/O, and the ETA is either missing or lying. You need to know: is this a normal, boring scrub doing its job, or is it a symptom of something that will bite you later?
This is the production-friendly way to answer that question. We’ll separate expected slowness (the kind you schedule and tolerate) from pathological slowness (the kind you fix before it turns into a support ticket bonfire).
What a scrub actually does (and why “slow” is sometimes correct)
A ZFS scrub is not a benchmark and not a copy operation. It’s a data integrity patrol. ZFS walks through allocated blocks in the pool, reads them, verifies checksums, and—if redundancy allows—repairs silent corruption by rewriting good data over bad. It’s proactive maintenance, the “find it before the user does” kind.
That implies two things that surprise people:
- Scrubs are fundamentally read-heavy (with occasional writes when repairs happen). Your pool can be “slow” because reads are slow, because there’s contention with real workloads, or because ZFS is intentionally being polite.
- Scrubs operate at the block level, not the file level. Fragmentation, recordsize choices, and metadata overhead can matter more than raw disk MB/s.
Scrubs also behave differently depending on vdev layout. Mirrors tend to scrub faster and more predictably than RAIDZ, because mirrors can service reads from either side and have simpler parity math. RAIDZ scrubs are perfectly fine when healthy, but they can turn into a long walk if you have wide vdevs, marginal disks, or heavy random I/O from apps.
Here’s the production rule I use: scrub time is an observable property of your system, not a moral failure. But a scrub rate that collapses, or an ETA that keeps growing, is a smell. Not always a fire, but always worth a look.
Short joke #1: A scrub with no ETA is like a storage outage with no postmortem—technically possible, socially unacceptable.
Interesting facts and a little history
- ZFS popularized end-to-end checksumming in mainstream server storage. Checksums are stored separately from data, which is why ZFS can detect “lying disks” that return corrupted blocks without I/O errors.
- Scrub is ZFS’s answer to “bit rot”—silent, incremental corruption that traditional RAID often can’t detect until the affected block happens to be read or a rebuild touches it.
- The term “scrub” comes from older storage systems that periodically scanned media for errors. ZFS made it routine and user-visible.
- RAIDZ was designed to avoid the write hole seen in classic RAID5/6 implementations, by keeping transactionally consistent metadata and copy-on-write semantics.
- ZFS was born at Sun Microsystems and later spread widely via OpenZFS. Modern ZFS behavior depends on the OpenZFS version, not just “ZFS” as a brand.
- Scrubs used to be more painful on systems without good I/O scheduling or where scrub throttling was primitive. Modern Linux and FreeBSD stacks give you more levers, but also more ways to shoot yourself in the foot.
- Metadata matters. Pools with millions of small files can scrub slower than a pool with fewer large files, even if “used space” looks similar.
- SMR drives made scrubs more unpredictable in the real world. When the drive does background shingled garbage collection, “reads” can become “reads plus internal rewrite drama.”
- Enterprise arrays have done patrol reads for decades, often invisibly. ZFS just gives you the truth in the open—and it turns out the truth can be slow.
Normal scrub slowness vs real trouble: the mental model
“Slow scrub” is ambiguous. You need to pin down which kind of slow you’re seeing. I divide it into four buckets:
1) “Big pool, normal physics” slow
If you have hundreds of TB and spinning disks, a scrub that takes days can be normal. It’s limited by sequential read bandwidth, the vdev layout, and the fact that scrubs don’t always get perfectly sequential access patterns (allocated blocks are not necessarily contiguous).
Signals it’s normal:
- Scrub rate is steady over hours.
- Disk latency isn’t exploding.
- Application impact is predictable and bounded.
- No checksum errors, no read errors.
2) “Throttled on purpose” slow
ZFS will often self-throttle scrubs so production workloads don’t fall over. That means your scrub can look disappointingly slow while the system stays usable. This is good engineering behavior. You can tune it, but do it deliberately.
Signals it’s throttling:
- CPU is mostly fine.
- IOPS aren’t pegged, but scrub progress moves slowly.
- Workload latency stays within SLOs.
3) “Contended by workload” slow
If the pool is serving a busy database, VM farm, or object workload, scrub reads compete with application reads/writes. Now scrub speed becomes a function of business hours. That’s not a ZFS failure; it’s a scheduling failure.
Signals it’s contention:
- Scrub speed varies with traffic patterns.
- Latency spikes correlate with application peaks.
- Turning scrub off makes users happy again.
4) “Something is wrong” slow
This is the category you’re really asking about. Scrub slowness becomes a symptom: a disk is retrying reads, a controller is erroring, a link negotiated down to 1.5Gbps, a vdev has a sick member dragging everyone, or you’ve built a pool layout that’s fine for capacity but bad for scrub behavior.
Signals you likely have a real problem:
- Read errors, checksum errors, or increasing “repaired” bytes across scrubs.
- One disk shows much higher latency or lower throughput than its siblings.
- Scrub rate collapses over time (starts normal, then crawls).
- Kernel logs show resets, timeouts, or link issues.
- SMART attributes show reallocated/pending sectors or UDMA CRC errors.
The key: “slow” isn’t a diagnosis. You’re hunting for a bottleneck and then asking whether that bottleneck is expected, configured, or failing.
Fast diagnosis playbook (first/second/third)
When you’re on-call, you don’t have time for a long philosophy seminar. You need a quick funnel that narrows the problem to one of: expected, contended, throttled, or broken.
First: Is the scrub healthy?
- Check pool status for errors and the scrub’s actual rate.
- Look for any vdev member that’s degraded, faulted, or “too many errors.”
- Decision: if errors exist, treat this as a reliability incident first and a performance question second.
Second: Is one device dragging the whole vdev?
- Check per-disk latency and I/O service times while scrub runs.
- Check SMART quickly for pending sectors, media errors, and link CRC errors.
- Decision: if one disk is slow or retrying, replace it or at least isolate it; scrubs are the canary.
Third: Is it contention or throttling?
- Correlate scrub speed with workload metrics (IOPS, latency, queue depth).
- Check ZFS tunables and whether scrub is intentionally limited.
- Decision: if you’re throttled, adjust carefully; if contended, reschedule or split workloads.
Only after those three do you get to “architecture questions” like vdev width, recordsize, special vdevs, or adding cache devices. If the scrub is slow because a SATA cable is flaky, no amount of “performance tuning” fixes it.
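If you want the funnel as one copy-paste, here’s a minimal triage sketch. It assumes a pool named tank and standard Linux tooling (zpool, iostat from sysstat, dmesg); adjust names for your environment and run it with enough privileges to read kernel logs.
#!/usr/bin/env bash
# Minimal scrub triage sketch (pool name "tank" is an assumption; adjust to taste).
# Run as root or via sudo so dmesg isn't filtered on restricted systems.
set -eu
POOL="tank"

echo "== pool status and scan progress =="
zpool status -v "$POOL"

echo "== per-disk latency/utilization (10s sample) =="
iostat -x 2 5

echo "== per-vdev bandwidth (6s sample) =="
zpool iostat -v "$POOL" 2 3

echo "== recent kernel resets/timeouts =="
dmesg -T | egrep -i 'reset|timeout|I/O error' | tail -n 20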
Practical tasks: commands, what output means, and what decision you make
The following tasks are designed to be run while a scrub is active (or right after). Each includes a realistic command, sample output, what it means, and the next decision. The host prompt and outputs are illustrative, but the commands are standard in real environments.
Task 1: Confirm scrub status, rate, and errors
cr0x@server:~$ zpool status -v tank
pool: tank
state: ONLINE
scan: scrub in progress since Mon Dec 23 01:00:02 2025
12.3T scanned at 612M/s, 8.1T issued at 403M/s, 43.2T total
0B repaired, 18.75% done, 2 days 09:14:33 to go
config:
NAME STATE READ WRITE CKSUM
tank ONLINE 0 0 0
raidz2-0 ONLINE 0 0 0
sda ONLINE 0 0 0
sdb ONLINE 0 0 0
sdc ONLINE 0 0 0
sdd ONLINE 0 0 0
sde ONLINE 0 0 0
sdf ONLINE 0 0 0
errors: No known data errors
What it means: ZFS reports both “scanned” and “issued.” Scanned is how far the scrub has walked the pool’s block pointers; issued is the verification I/O actually dispatched to the devices, which is the number closer to physical progress. If issued lags far behind scanned, the scrub knows what it wants to read but is waiting—on slow devices, on caching and readahead effects, or on deliberate throttling.
Decision: If READ/WRITE/CKSUM counts are non-zero, stop treating this as “just slow.” Investigate the failing device(s) before tuning.
Task 2: Get one-line progress repeatedly (good for incident channels)
cr0x@server:~$ zpool status tank | sed -n '1,12p'
pool: tank
state: ONLINE
scan: scrub in progress since Mon Dec 23 01:00:02 2025
12.3T scanned at 612M/s, 8.1T issued at 403M/s, 43.2T total
0B repaired, 18.75% done, 2 days 09:14:33 to go
config:
NAME STATE READ WRITE CKSUM
tank ONLINE 0 0 0
What it means: This is the minimum viable status snippet. If the ETA keeps increasing hour to hour, you’re likely contended or retrying reads.
Decision: If the issued rate is steady and the ETA shrinks steadily, it’s probably normal or throttled. If it fluctuates wildly, move to per-disk checks.
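If you’re babysitting an incident channel, the same snippet can refresh itself. watch is standard tooling, and the sed range just keeps the scan block of the status output:
cr0x@server:~$ watch -n 60 "zpool status tank | sed -n '/scan:/,/config:/p'"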
Task 3: Find which vdev layout you’re dealing with
cr0x@server:~$ zpool status -P tank
pool: tank
state: ONLINE
config:
NAME STATE READ WRITE CKSUM
tank ONLINE 0 0 0
raidz2-0 ONLINE 0 0 0
/dev/disk/by-id/ata-ST12000...A1 ONLINE 0 0 0
/dev/disk/by-id/ata-ST12000...B2 ONLINE 0 0 0
/dev/disk/by-id/ata-ST12000...C3 ONLINE 0 0 0
/dev/disk/by-id/ata-ST12000...D4 ONLINE 0 0 0
/dev/disk/by-id/ata-ST12000...E5 ONLINE 0 0 0
/dev/disk/by-id/ata-ST12000...F6 ONLINE 0 0 0
What it means: RAIDZ2 in one wide vdev. Scrub speed will be bounded by the slowest disk and the parity overhead. One misbehaving disk can slow the whole vdev.
Decision: If you have one very wide RAIDZ vdev and scrubs are painful, you may need an architectural change later (more vdevs, narrower width). Don’t “tune” your way out of physics.
Task 4: Check per-disk latency and utilization during scrub (Linux)
cr0x@server:~$ iostat -x 2 3
Linux 6.6.12 (server) 12/25/2025 _x86_64_ (32 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
4.21 0.00 2.73 8.14 0.00 84.92
Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s w_await aqu-sz %util
sda 112.0 28800.0 0.0 0.00 18.40 257.1 2.0 512.0 4.30 2.10 98.5
sdb 118.0 30208.0 0.0 0.00 17.92 256.0 2.0 512.0 4.10 2.12 97.9
sdc 110.0 28160.0 0.0 0.00 19.30 256.0 2.0 512.0 4.20 2.05 98.2
sdd 15.0 3840.0 0.0 0.00 220.10 256.0 1.0 256.0 10.00 3.90 99.1
sde 115.0 29440.0 0.0 0.00 18.10 256.0 2.0 512.0 4.00 2.08 98.0
sdf 114.0 29184.0 0.0 0.00 18.70 256.0 2.0 512.0 4.20 2.11 98.4
What it means: sdd has r_await of ~220ms while others are ~18ms. That’s your scrub anchor. The pool will move at the pace of the worst performer in a RAIDZ vdev.
Decision: Immediately inspect sdd for errors/logs/SMART. If it’s a cable/controller issue, fix that before replacing the disk.
Task 5: Check kernel logs for resets/timeouts (Linux)
cr0x@server:~$ sudo dmesg -T | egrep -i 'ata|scsi|reset|timeout|error' | tail -n 12
[Wed Dec 24 13:18:44 2025] ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[Wed Dec 24 13:18:44 2025] ata7.00: failed command: READ FPDMA QUEUED
[Wed Dec 24 13:18:44 2025] ata7: hard resetting link
[Wed Dec 24 13:18:45 2025] ata7: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[Wed Dec 24 13:18:46 2025] sd 6:0:0:0: [sdd] tag#17 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=14s
[Wed Dec 24 13:18:46 2025] blk_update_request: I/O error, dev sdd, sector 123456789 op 0x0:(READ) flags 0x0 phys_seg 8 prio class 0
What it means: Link reset plus renegotiation to 1.5Gbps is classic “bad cable/backplane/port” territory. It can also be a dying disk, but cables are cheaper and embarrassingly common.
Decision: Treat as hardware fault. Reseat/replace cable or move to another port. Then re-check per-disk latency. If errors persist, replace the drive.
Task 6: Quick SMART health check for the slow device
cr0x@server:~$ sudo smartctl -a /dev/sdd | egrep -i 'Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable|UDMA_CRC_Error_Count|SMART overall|Power_On_Hours'
SMART overall-health self-assessment test result: PASSED
9 Power_On_Hours 0x0032 086 086 000 Old_age Always - 31245
5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 12
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 3
199 UDMA_CRC_Error_Count 0x003e 200 199 000 Old_age Always - 27
What it means: Pending sectors and offline uncorrectables mean the disk is struggling to read some areas. UDMA CRC errors often point to cabling/backplane issues. “PASSED” is not absolution; it’s marketing.
Decision: If pending/offline uncorrectables exist, plan replacement. If CRC errors are increasing, fix the path (cable/backplane/HBA) too.
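If the verdict is replacement, the operation itself is a standard zpool replace; the new-device path below is a placeholder, not a real ID:
cr0x@server:~$ sudo zpool replace tank sdd /dev/disk/by-id/<new-disk-id>
cr0x@server:~$ zpool status tank    # resilver progress and ETA appear under "scan:"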
Task 7: Identify if the pool is doing repairs (and how much)
cr0x@server:~$ zpool status -v tank | sed -n '1,25p'
pool: tank
state: ONLINE
scan: scrub in progress since Mon Dec 23 01:00:02 2025
14.8T scanned at 540M/s, 10.2T issued at 372M/s, 43.2T total
256M repaired, 23.61% done, 2 days 05:01:12 to go
config:
NAME STATE READ WRITE CKSUM
tank ONLINE 0 0 0
raidz2-0 ONLINE 0 0 0
What it means: Non-zero “repaired” during scrub means ZFS found checksum mismatches and corrected them. That’s the scrub doing its job, but it’s also evidence of corruption somewhere (disk, cabling, controller, or memory).
Decision: If repairs are recurring across scrubs, investigate root cause. One-time repair after a known event might be fine; repeated repairs are not.
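To make “recurring across scrubs” something you can actually check, append the scan summary to a small history file after each scrub. A minimal sketch; the log path and pool name are assumptions:
# Run after each scrub (e.g., from the same cron job) to build a baseline history.
(date; zpool status tank | sed -n '/scan:/,/config:/p'; echo) >> /var/log/zfs-scrub-history.log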
Task 8: Look for ZFS-level I/O and latency indicators (Linux)
cr0x@server:~$ sudo cat /proc/spl/kstat/zfs/arcstats | egrep '^(hits|misses|size|c_max|demand_data_misses|prefetch_data_misses) ' | head
hits 2876543210
misses 456789012
size 17179869184
c_max 34359738368
demand_data_misses 123456789
prefetch_data_misses 234567890
What it means: ARC stats tell you whether reads are being served from memory or going to disk. During scrub, ARC can be less helpful because scrub reads are not always cache-friendly, but a tiny ARC with heavy workloads can worsen contention.
Decision: If ARC is starved and the box is swapping, fix memory pressure first. Don’t blame scrub for a RAM problem.
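A quick way to separate “ARC is starved” from “disks are slow” is to look at memory pressure and ARC behavior directly. arcstat ships with OpenZFS on most distributions (sometimes packaged as arcstat.py on older installs):
cr0x@server:~$ free -h          # is the box swapping or close to it?
cr0x@server:~$ arcstat 5 3      # ARC size, hit rate, and miss rate, sampled every 5 seconds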
Task 9: Check pool I/O in real time (Linux with zpool iostat)
cr0x@server:~$ zpool iostat -v tank 2 3
capacity operations bandwidth
pool alloc free read write read write
---------- ----- ----- ----- ----- ----- -----
tank 28.1T 15.1T 3.10K 220 380M 12.1M
raidz2-0 28.1T 15.1T 3.10K 220 380M 12.1M
sda - - 520 35 63.5M 2.1M
sdb - - 525 36 64.1M 2.1M
sdc - - 518 35 63.2M 2.0M
sdd - - 110 34 12.9M 2.0M
sde - - 523 35 64.0M 2.1M
sdf - - 524 35 64.3M 2.1M
---------- ----- ----- ----- ----- ----- -----
What it means: Again, sdd is underperforming compared to others. In RAIDZ, that can reduce effective throughput and increase scrub time.
Decision: If one device shows persistently lower bandwidth/ops, focus there. Don’t tweak global tunables until device parity is restored.
Task 10: Confirm ashift and basic pool properties (performance baseline)
cr0x@server:~$ zdb -C tank | egrep 'ashift|vdev_tree' -n | head -n 8
64: ashift: 12
120: vdev_tree:
121: type: 'root'
122: id: 0
What it means: ashift: 12 implies 4K sectors. If you see ashift: 9 on modern 4K disks, you can get write amplification and weird performance behavior. It won’t always show during scrub (mostly reads), but it can worsen general pool performance and resilver/scrub overhead.
Decision: If ashift is wrong, the fix is usually “rebuild the pool correctly,” not “tune harder.” Put it on the roadmap.
Task 11: Check dataset compression and recordsize (workload interaction)
cr0x@server:~$ zfs get -o name,property,value -s local compression,recordsize tank/vmstore
NAME PROPERTY VALUE
tank/vmstore compression lz4
tank/vmstore recordsize 128K
What it means: For VM images, recordsize often gets set smaller (like 16K) depending on I/O patterns. Large recordsize isn’t “wrong,” but if your workload is random 4K, you can end up with more reads per useful byte during scrub and heavy operational overhead in general.
Decision: Don’t change recordsize casually on existing data. But if scrub pain correlates with a dataset known for small random I/O, review dataset design for the next iteration.
Task 12: Check for special vdevs (metadata) and their health
cr0x@server:~$ zpool status tank | egrep -n 'special|log|cache|spares' -A3
15: special
16: nvme0n1p2 ONLINE 0 0 0
What it means: If you have a special vdev (often NVMe) storing metadata/small blocks, its health and latency can dominate scrub behavior for metadata-heavy pools. A dying special vdev can make the entire pool “feel” slow even if HDDs are fine.
Decision: If scrub is slow on a metadata-heavy workload, check special vdev performance and errors early.
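To see whether the special vdev (or any vdev) is the latency outlier, recent OpenZFS releases can show per-device wait times directly:
cr0x@server:~$ zpool iostat -vl tank 2 3    # -l adds wait-time columns, including scrub wait, per vdev and device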
Task 13: Check the actual device path and link speed (common hidden failure)
cr0x@server:~$ sudo hdparm -I /dev/sdd | egrep -i 'Transport|speed|SATA Version' | head -n 5
Transport: Serial, ATA8-AST, SATA 3.1
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 1.5 Gb/s)
What it means: The drive supports 6.0Gb/s but is currently at 1.5Gb/s. That’s a strong indicator of link problems, not “ZFS being slow.”
Decision: Fix the physical path. After repair, confirm it negotiates at 6.0Gb/s and rerun iostat.
Task 14: Check scrub throttling-related module parameters (Linux OpenZFS)
cr0x@server:~$ sudo systool -m zfs -a 2>/dev/null | egrep 'zfs_scrub_delay|zfs_top_maxinflight|zfs_vdev_scrub_max_active' | head -n 20
Parameters:
zfs_scrub_delay = "4"
zfs_top_maxinflight = "32"
zfs_vdev_scrub_max_active = "2"
What it means: These values influence how aggressively scrub issues I/O. Note that the exact parameter names depend on your OpenZFS version: zfs_scrub_delay and zfs_top_maxinflight come from older releases, while newer versions expose knobs like zfs_scan_vdev_limit and zfs_vdev_scrub_max_active instead. More aggressive isn’t always better; you can increase queue depth and latency for applications, and sometimes slow down the scrub due to thrash.
Decision: If the scrub is slow but healthy and you have headroom (low latency impact, low util), you can consider tuning. If the system is already hot, don’t “fix” it by making it fight harder.
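On current OpenZFS-on-Linux builds the knobs live under /sys/module/zfs/parameters, and the names differ between versions, so treat the parameters below as examples to verify on your own system rather than a definitive list:
# Read current values; names vary by OpenZFS version, hence the 2>/dev/null.
grep -H . /sys/module/zfs/parameters/zfs_vdev_scrub_max_active \
          /sys/module/zfs/parameters/zfs_scan_vdev_limit 2>/dev/null
# Temporary, revertible change (applies immediately, lost on reboot).
# The value 3 is illustrative, not a recommendation.
echo 3 | sudo tee /sys/module/zfs/parameters/zfs_vdev_scrub_max_active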
Task 15: Confirm TRIM and autotrim behavior (SSD pools)
cr0x@server:~$ zpool get autotrim tank
NAME PROPERTY VALUE SOURCE
tank autotrim off default
What it means: On SSD pools, autotrim can affect long-term performance. Not directly scrub speed, but it changes how the pool behaves under sustained reads/writes and garbage collection, which can make scrubs “randomly awful.”
Decision: If you’re on SSDs and see periodic performance cliffs, evaluate enabling autotrim in a controlled change window.
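If you do evaluate it, a one-off manual TRIM plus a status check is a reasonable first experiment before flipping autotrim on:
cr0x@server:~$ sudo zpool trim tank      # one-off TRIM of the whole pool
cr0x@server:~$ zpool status -t tank      # -t adds per-device TRIM state and progress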
Task 16: Check if you’re accidentally scrubbing frequently
cr0x@server:~$ sudo grep -R "zpool scrub" -n /etc/cron* /var/spool/cron 2>/dev/null | head
/etc/cron.monthly/zfs-scrub:4: zpool scrub tank
What it means: Monthly scrubs are common. Weekly scrubs can be fine for small pools, but on big pools it can mean you’re effectively always scrubbing, and operators start ignoring the signal.
Decision: Set a cadence appropriate to media and risk. If scrub never finishes before the next one starts, you’ve turned integrity checks into background noise.
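Cron isn’t the only place scrubs hide; many distributions ship systemd timers or package-provided cron jobs, so check those before you trust the cadence you think you have:
cr0x@server:~$ systemctl list-timers 2>/dev/null | egrep -i 'zfs|scrub'
cr0x@server:~$ sudo grep -R "scrub" -n /etc/cron.d /etc/systemd/system 2>/dev/null | head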
Three corporate mini-stories from the scrub trenches
Mini-story 1: The incident caused by a wrong assumption
A mid-sized company ran a ZFS-backed virtualization cluster. Nothing exotic: RAIDZ2, big SATA disks, one pool per node. The scrubs were scheduled monthly, and they always took “a while.” People accepted it as part of life.
Then one month the scrub ETA started increasing. It wasn’t dramatic at first—just an extra day. The on-call assumed it was workload contention: end-of-quarter batch jobs. They let it ride. “Scrubs are slow; it’ll finish.”
Two days later, user-facing VM latency spiked, then settled, then spiked again. Zpool status still showed ONLINE, no obvious errors. The assumption held: “It’s busy.” So nobody looked at the disk-level stats. That was the mistake.
When someone finally ran iostat -x, one drive had 300–800ms read await, while the others sat at 15–25ms. SMART had pending sectors. The drive wasn’t failing fast; it was failing politely, dragging the whole vdev through retries. That’s the worst kind because it looks like “normal slowness” right up until it’s not.
They replaced the drive. Scrub rate immediately returned to normal. The real lesson wasn’t “replace drives faster.” It was: never assume scrub slowness is workload until you’ve proven all devices are healthy. Scrub is the only time some bad sectors get touched. It’s your early warning system. Use it.
Mini-story 2: The optimization that backfired
A different org had strict maintenance windows. They wanted scrubs to complete over a weekend, no exceptions. Someone found scrub tunables and decided to “turn it up.” They increased scrub concurrency and reduced delays. The scrub became aggressive, and the throughput number looked great—for about an hour.
Then the application latency climbed. The hypervisors started logging storage stalls. Users complained Monday morning about “random slowness.” The team blamed the network first (as teams do), then blamed ZFS, then blamed the hypervisor. Classic triangle of denial.
What actually happened was more boring: the scrub’s I/O pattern displaced the workload’s cache and drove disk queues deep. HDDs hit near-100% util with high service times. Some application reads became tail-latency monsters. The scrub itself didn’t even finish faster overall—because as queues grew, effective throughput dropped and retries increased.
They rolled back the tuning and moved scrubs to lower-traffic periods. The “optimization” was real, but the system-level impact was negative. The best performance trick in storage is still scheduling: don’t fight your users.
Short joke #2: Storage tuning is like office politics—if you push too hard, everyone slows down and somehow it’s still your fault.
Mini-story 3: The boring but correct practice that saved the day
A fintech team ran OpenZFS on Linux for a ledger-like workload. Scrubs were treated as a formal maintenance activity: scheduled, monitored, and compared against historical baselines. No heroics. Just graphs and discipline.
They kept a simple runbook: after each scrub, record duration, average issued bandwidth, and any repaired bytes. If “repaired” was non-zero, it triggered a deeper check: kernel logs, SMART long test, and a review of any recent hardware changes.
One month, a scrub completed with a small amount repaired—nothing alarming on its own. But it was the second month in a row. Their baseline logic flagged it. The on-call dug in and found intermittent CRC errors on one disk path. Not enough to fail the disk outright, but enough to flip bits occasionally under load. Exactly the kind of defect that ruins your day six months later.
They swapped the backplane cable and moved that disk to a different HBA port. Repairs stopped. No outage, no data loss, no dramatic incident report. This is the kind of win that never gets celebrated because nothing exploded. It should be celebrated anyway.
Common mistakes: symptom → root cause → fix
This section is intentionally blunt. These are patterns that show up in production, repeatedly, because humans are consistent creatures.
Scrub ETA increases over time
- Symptom: ETA goes from “12 hours” to “2 days” while scrub runs.
- Root cause: A device is retrying reads (media issues) or link is flapping; alternatively, workload contention ramped up.
- Fix: Run iostat -x and zpool iostat -v to identify a slow disk; check dmesg and SMART. If no single disk is slow, correlate with workload and reschedule scrub.
Scrub is “slow” only during business hours
- Symptom: Scrub crawls 9–5 and speeds up at night.
- Root cause: Contention with production workload; ZFS and/or the OS scheduler is doing the right thing and prioritizing foreground I/O.
- Fix: Schedule scrubs for low-traffic windows; consider throttling rather than aggression. Don’t crank scrub concurrency and hope.
One disk shows 10x higher await than others
- Symptom: In iostat -x, one drive has r_await or %util patterns that don’t match its siblings.
- Root cause: Dying disk, SMR behavior under stress, bad cable/backplane, port negotiated down.
- Fix: Check dmesg and SMART, confirm link speed, swap cable/port, replace the drive if pending sectors or uncorrectables appear.
Scrub makes applications time out
- Symptom: Latency spikes, timeouts, queue depth grows; scrub seems to “DoS” the system.
- Root cause: Scrub I/O too aggressive, poor workload isolation, too few vdevs, HDD pool serving random I/O workloads without enough spindles.
- Fix: Reduce scrub aggressiveness; schedule; add vdevs or move workload to SSD/NVMe; consider special vdevs for metadata-heavy cases. Stop expecting one wide RAIDZ vdev to act like an array.
Scrub reports repaired bytes repeatedly
- Symptom: Every scrub repairs some data.
- Root cause: Chronic corruption source: bad disk, bad cable, flaky controller, or memory issues (yes, memory).
- Fix: Investigate hardware path end-to-end; run SMART long tests; check ECC logs if available; consider a controlled memory test window. Repaired data is a gift—don’t ignore it.
Scrub is slow on an SSD pool “for no reason”
- Symptom: NVMe/SSD pool scrubs slower than expected, sometimes with periodic cliffs.
- Root cause: Thermal throttling, SSD garbage collection, poor TRIM behavior, PCIe link issues, or a special vdev bottleneck.
- Fix: Check temperatures and PCIe link speed; review autotrim; confirm firmware; ensure the special vdev isn’t saturated or erroring.
Scrub never finishes before next scheduled scrub
- Symptom: Always scrubbing; operators stop paying attention.
- Root cause: Oversized pool for the given media, too frequent cadence, or scrub is being restarted by automation.
- Fix: Reduce cadence; ensure scrubs aren’t restarted unnecessarily; consider architectural changes (more vdevs, faster media) if integrity checks can’t complete in a reasonable window.
Scrub speed is far below what raw disk math suggests
- Symptom: “We have N disks, each can do X MB/s, so why not N×X?”
- Root cause: Scrub reads allocated blocks, not necessarily sequential; metadata overhead; RAIDZ parity; fragmentation; and the pool might be near full, which makes everything uglier.
- Fix: Compare against your own historical scrub baselines, not vendor datasheets. If the pool is near full, free up space. If fragmentation is severe, consider a planned re-layout via replication to a fresh pool.
Checklists / step-by-step plans
Step-by-step: Decide if a slow scrub is “normal”
- Capture current status. Run zpool status -v. Save it in your ticket/chat.
- Look for any errors. Non-zero READ/WRITE/CKSUM counts or “repaired” bytes changes the urgency.
- Measure the issued rate. If issued is stable and within your historical range, it’s likely normal.
- Check per-disk latency. Use iostat -x (Linux) and identify outliers.
- Check logs. One line in dmesg about resets can explain days of scrub pain.
- Check SMART. Pending sectors, uncorrectables, and CRC errors decide whether you replace hardware.
- Correlate with workload. If scrub is slow only under load, fix scheduling and/or throttling.
- Only then tune. And make one change at a time with a rollback plan.
Step-by-step: If you find a slow disk during scrub
- Confirm it’s consistently slow: iostat -x 2 5 and zpool iostat -v 2 5.
- Check for link negotiation down: hdparm -I on SATA, or controller logs for SAS.
- Check kernel logs for resets/timeouts: dmesg -T, filtered.
- Check SMART: pending/offline uncorrectable sectors mean it’s living on borrowed time.
- Swap the cheap stuff first (cable/port) if evidence points to link issues.
- Replace the disk if media issues are present or errors persist after path fixes.
- After replacement, run another scrub or at least a targeted verification plan based on your operational standards.
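If the evidence is ambiguous and you want the drive to testify against itself, a SMART extended self-test is cheap to start; it runs in the background and can take hours on large disks:
cr0x@server:~$ sudo smartctl -t long /dev/sdd      # start the extended self-test
cr0x@server:~$ sudo smartctl -l selftest /dev/sdd  # review results once it finishes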
Step-by-step: If scrub is healthy but disrupts performance
- Confirm no device is sick (outlier latency, errors).
- Confirm whether scrub is already throttled (check tunables and observed I/O depth).
- Move scrub schedule to low-traffic periods; stagger across pools/nodes.
- If you must scrub during business hours, throttle rather than accelerate.
- Re-evaluate pool layout if you routinely can’t complete scrubs in a maintenance window.
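OpenZFS can also pause and resume a scrub instead of cancelling it, which is often the gentlest scheduling tool: pause for business hours, resume at night (pool name is illustrative):
cr0x@server:~$ sudo zpool scrub -p tank    # pause the in-progress scrub; progress is preserved
cr0x@server:~$ sudo zpool scrub tank       # running scrub again resumes a paused scrub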
FAQ
1) What is a “normal” ZFS scrub speed?
Normal is whatever your pool does when healthy, lightly loaded, and not erroring. Use your own historical scrub duration and issued bandwidth as the baseline. Disk vendor sequential specs are not a scrub promise.
2) Why does scanned differ from issued in zpool status?
“Scanned” reflects how far the scrub has walked the pool’s metadata; “issued” reflects the verification I/O actually sent to the vdevs. Big gaps are normal because modern scrubs scan ahead and then issue reads in sorted order; gaps can also come from caching, readahead, or waiting on slow devices. If issued is low and latency is high, look for a dragging disk.
3) Does a scrub read free space?
Generally, scrub checks allocated blocks (what’s actually in use). It’s not a full surface scan of every sector. That’s why a disk can still have latent bad sectors that only show up when written or read later.
4) Should I stop a scrub if it’s slow?
If the scrub is healthy but impacting production SLOs, pausing/stopping can be reasonable—then reschedule. If you see errors or repairs, stopping it just delays information you probably need. Handle the underlying hardware issue instead.
5) How often should I scrub?
Common cadence is monthly for large HDD pools, sometimes weekly for smaller or higher-risk environments. The right answer depends on media, redundancy, and how quickly you want to discover latent errors. If your scrub cadence exceeds your ability to finish scrubs, adjust—don’t normalize “always scrubbing.”
6) Scrub found and repaired data. Am I safe now?
You’re safer than you would’ve been, but you’re not “done.” Repairs mean something corrupted beneath ZFS. If repairs repeat, you need a root cause analysis of disks, cabling, controllers, and potentially memory.
7) Is RAIDZ inherently slow at scrubs compared to mirrors?
Mirrors are often faster and more predictable for reads because they can load-balance and don’t do parity reconstruction on reads. RAIDZ can be fine when healthy, but wide RAIDZ vdevs are more sensitive to one slow disk and to random I/O patterns.
8) Can tuning make scrubs dramatically faster?
Sometimes modestly, if you have headroom and conservative defaults. But tuning is not a substitute for more spindles, better media, or fixing a flaky disk path. Also: tuning can backfire by increasing latency and reducing effective throughput.
9) Why is scrub slow on a pool that’s mostly empty?
Because “empty” doesn’t mean “simple.” A pool with millions of small files, heavy metadata, snapshots, or fragmentation can scrub slowly even if used space is low. Scrub touches allocated blocks; metadata-heavy allocations are not sequential candy.
10) What’s the difference between scrub and resilver, and why does it matter for slowness?
Scrub verifies existing data and repairs corruption; resilver reconstructs data to a replaced/returned device. Resilver often has different priority and patterns, and may be more write-heavy. If you confuse the two, you’ll misread the performance expectations and urgency.
Conclusion: practical next steps
Slow scrubs are not inherently scary. In fact, a slow scrub on a big, busy pool is often a sign that ZFS is behaving responsibly. What’s scary is unexplained slowness, especially when it comes with per-disk outliers, kernel resets, or recurring repairs.
Use this sequence as your default:
- Run zpool status -v and decide if this is a reliability event (errors/repairs) or a scheduling/perf issue.
- Run iostat -x and zpool iostat -v to find the slow device or confirm contention.
- Check dmesg and SMART for the obvious hardware path failures.
- Only then consider tuning and scheduling changes, and measure impact against your historical baseline.
One paraphrased idea from W. Edwards Deming fits operations work: “Without data, you’re just someone with an opinion.” Scrub slowness is your chance to collect data before you collect outages.