ZFS zpool initialize: Making New Drives Behave Better From Day One


New disks lie. Not maliciously—more like a salesperson who promises “up to” performance and quietly ignores the fine print about steady-state behavior. You build a fresh ZFS pool, run a quick benchmark, and everything looks glorious. Then you put real workload on it and the latency chart starts doing impressionist art.

zpool initialize exists to reduce that gap between “new and shiny” and “production and grumpy.” It deliberately writes across the unallocated space of each top-level vdev so ZFS doesn’t discover slow-path behaviors at the worst possible time, like the first Monday after a migration.

What zpool initialize actually does (and what it doesn’t)

At a high level, zpool initialize writes a pattern across the unallocated space of the top-level vdevs in a pool; allocated blocks are left alone, so existing data is safe. Think of it as preconditioning: it forces writes into the regions the allocator hasn’t reached yet, so you aren’t surprised later when cold, never-written areas behave differently than the hot parts you happened to benchmark.

On many storage devices (especially SSDs, but also SMR drives and some HDD firmware designs), performance changes after the device has been written end-to-end at least once. Fresh out of the box, the drive can appear faster because it hasn’t yet had to do the internal bookkeeping it will eventually be forced to do. Once you write broadly across it, you push it toward “steady state,” which is what you’ll live with for years.

Initialize is not scrub, and it’s not resilver

  • Scrub reads and verifies existing data blocks and repairs them using redundancy. It’s about integrity of data already written.
  • Resilver reconstructs missing data onto a replacement disk, based on allocated blocks. It’s about redundancy restoration.
  • Initialize is about writing across the device’s unallocated space to avoid “first write to this region” surprises under production load.

Initialize also doesn’t magically fix a bad design. If you built a pool with the wrong ashift, a single parity vdev for a write-heavy workload, or an HBA with broken firmware, initialization will not absolve you. It will just help you discover the pain earlier, which is still a win.

One blunt operational truth: initialize is a controlled burn. You’re spending I/O now to avoid chaotic I/O later.

Why you should care: the latency cliff

Most performance incidents in storage aren’t about throughput. They’re about tail latency: the 99th and 99.9th percentile. Databases don’t fall over because average writes are 1 ms; they fall over because a small fraction become 200 ms, stack up, and turn your queue depth into a traffic jam.
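To make that concrete, here’s a toy calculation with invented numbers (not from any real pool): 98 writes at 1 ms plus 2 stragglers at 200 ms barely move the mean, but they own the p99.

```shell
# Toy data: 98 writes at 1 ms, 2 stragglers at 200 ms (all values invented).
{ yes 1 | head -n 98; printf '200\n200\n'; } \
  | sort -n \
  | awk '{a[NR]=$1; s+=$1}
         END {printf "mean: %.2f ms\n", s/NR
              printf "p99:  %s ms\n", a[int(NR*0.99)]}'
```

The mean comes out at 4.98 ms while the p99 is 200 ms: the average looks healthy, and the tail is the incident.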

Here’s the classic pattern on a brand-new pool:

  • You test with fio on empty pool: impressive, stable latency.
  • You go live: still fine for a while, because you’re writing sequentially into fresh space.
  • You cross some threshold: allocation spreads out, metadata churn increases, and the device begins internal garbage collection or shingled remapping.
  • Latency spikes appear at random, and you can’t reproduce them easily in a lab because your lab pool is always “too empty” or “too new.”

Initialize reduces the number of “first time we’ve ever touched this region” events. It’s not a silver bullet, but it is one of the rare tools that helps you make performance problems happen on your schedule.

Joke #1: New SSDs benchmark like interns—fast, eager, and completely untested under real pressure.

The operational version of “steady state”

Engineers often say “steady state” like it’s a neat mathematical concept. In production, it means: the pool has lived through enough writes, TRIMs, overwrites, and metadata churn that it stops changing its personality every week. Initialize helps you get there before your customers do it for you.

If you run large ZFS pools for databases, VM fleets, object stores, or log platforms, you care about predictability. Initialization is about predictability.

Interesting facts and a little history

Storage is a long-running argument between physics and marketing. A few context points help explain why zpool initialize exists and why it matters.

  1. ZFS was designed to treat disks as unreliable. End-to-end checksumming and self-healing are core features, not add-ons.
  2. “Scrub” predates “initialize” as an operational habit. Scrubbing came from the need to proactively catch latent sector errors before a second disk fails.
  3. SSD performance is famously different “fresh” vs “used.” Controllers look fast while free flash is plentiful and the translation layer has little bookkeeping to do; later you pay the internal garbage-collection cost.
  4. TRIM/UNMAP changed the story. Before widespread TRIM support, SSDs could get “dirty” and stay dirty; now the host can tell the drive what blocks are free, but behavior still varies.
  5. RAID rebuild pain shaped modern ops. Long resilvers and rebuild windows became a reliability risk as disk sizes exploded. Initialize doesn’t shorten resilver directly, but it helps you observe vdev behavior early.
  6. Modern HDD firmware has complex caching and zone management. Even conventional drives may have firmware strategies that behave differently on “never written” vs “previously written” areas.
  7. ZFS allocation behavior changes as pools fill. Fragmentation and metaslab selection evolve; early-life performance isn’t representative.
  8. OpenZFS portability changed implementation details. Features like initialization evolved across platforms, and behavior/availability depends on OpenZFS version and OS integration.

One paraphrased idea worth keeping in your head comes from John Ousterhout, the Stanford systems researcher: performance problems come from what you didn’t measure, not what you did. Initialize is a way to measure the future now.

When to run initialize (and when not to)

Run it when

  • New pool, real workload soon. Especially if latency matters and the pool will fill quickly.
  • After replacing disks in a vdev, when you want the replacement media to be “warmed up” across its address space before peak hours.
  • After hardware changes (new HBA, firmware update) when you want to detect weird timeouts and slow-paths early.
  • Before a migration cutover. You want predictable behavior during the first days of production, not surprise GC storms.

Be cautious or skip it when

  • You’re already capacity constrained on IOPS. Initialization is extra write load. On busy pools it can amplify pain.
  • You’re doing it “just because.” If you can’t explain the failure mode you’re preventing, you’ll run it at the wrong time and blame ZFS for doing exactly what you asked.
  • You rely on write endurance margins. Initialization writes a lot. On consumer-grade SSDs with thin endurance, it’s not free. Decide consciously.

Initialize vs alternatives (and why I still like initialize)

People try to replicate initialization with dd, fio, or filling a dataset with zeros. It “works” in the sense that you write the disk, but it’s less controlled and less integrated: the fill data is genuinely allocated (you have to remember to delete it), compression can quietly turn a stream of zeros into almost no physical I/O, and there’s no per-device progress tracking or suspend/resume. Initialize is pool-aware and intended for this job.
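For contrast, here’s roughly what the manual approach looks like as a fio job file. The filename, size, and engine are made-up placeholders, not a recommendation:

```ini
; Hypothetical fio job approximating preconditioning by filling a dataset.
; Unlike zpool initialize, this allocates real blocks you must delete later,
; and it has no per-device progress tracking or suspend/resume.
[precondition-fill]
filename=/tank/scratch/precondition.bin
rw=write
bs=1M
size=512G
ioengine=libaio
iodepth=16
```

Run it with `fio jobfile.fio`, then remember to remove the fill file. zpool initialize needs neither step.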

That said: initialization doesn’t replace planning. Choose sane redundancy, set ashift correctly, don’t oversubscribe HBAs, and don’t attach a pool to a controller that thinks error recovery means “meditate for 120 seconds.”

Practical tasks: commands, output, decisions

Below are field-tested tasks I use when bringing up or stabilizing pools. Each has: the command, what typical output means, and what decision I make from it. If you do these in order, you’ll avoid the “we ran initialize and the pool got slow” panic because you’ll know what “slow” was before you started.

Task 1: Confirm OpenZFS and feature support

cr0x@server:~$ zfs version
zfs-2.2.4-0ubuntu1
zfs-kmod-2.2.4-0ubuntu1

What it means: You’re on OpenZFS 2.2.x. zpool initialize has shipped since OpenZFS 0.8, so modern builds support it (platform specifics vary).

Decision: If you’re on very old ZFS (or a vendor fork), confirm zpool initialize exists and behaves as expected before you bet production on it.

Task 2: Inventory pools and health before touching anything

cr0x@server:~$ zpool list
NAME   SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
tank  10.9T  1.23T  9.67T        -         -     2%    11%  1.00x  ONLINE  -

What it means: Pool is lightly used, low fragmentation, healthy.

Decision: Initialization on a mostly empty pool is the easiest time. On a near-full or heavily fragmented pool, schedule carefully and monitor tail latency.

Task 3: Get vdev topology and record device IDs (not just /dev/sdX)

cr0x@server:~$ zpool status -v tank
  pool: tank
 state: ONLINE
  scan: scrub repaired 0B in 02:11:54 with 0 errors on Sun Dec 15 03:10:41 2025
config:

        NAME                                            STATE     READ WRITE CKSUM
        tank                                            ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            ata-SAMSUNG_MZ7L33T8HBLA-00007_S6Y0NX0W1234  ONLINE       0     0     0
            ata-SAMSUNG_MZ7L33T8HBLA-00007_S6Y0NX0W1235  ONLINE       0     0     0
            ata-SAMSUNG_MZ7L33T8HBLA-00007_S6Y0NX0W1236  ONLINE       0     0     0
            ata-SAMSUNG_MZ7L33T8HBLA-00007_S6Y0NX0W1237  ONLINE       0     0     0
            ata-SAMSUNG_MZ7L33T8HBLA-00007_S6Y0NX0W1238  ONLINE       0     0     0
            ata-SAMSUNG_MZ7L33T8HBLA-00007_S6Y0NX0W1239  ONLINE       0     0     0

errors: No known data errors

What it means: Devices are referenced by stable identifiers. Good.

Decision: If you see /dev/sdX names, fix that before any maintenance. Device enumeration changes are a hobby of Linux at boot time.

Task 4: Check ashift indirectly (sector sizes) before blaming performance

cr0x@server:~$ zdb -C tank | grep -E 'ashift|vdev_tree' -n | head
49:        vdev_tree:
68:                ashift: 12

What it means: ashift=12 implies 4K sectors. That’s usually correct for modern drives.

Decision: If you discover ashift=9 on 4K-native devices, fix the pool design (usually rebuild). Initialize won’t save misalignment.

Task 5: Baseline latency and throughput before initialize

cr0x@server:~$ zpool iostat -v tank 1 5
              capacity     operations     bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
tank        1.23T  9.67T     12     85   2.1M  48.3M
  raidz2-0  1.23T  9.67T     12     85   2.1M  48.3M
    ...         -      -      2     14   350K   8.2M
    ...         -      -      2     14   360K   8.1M
----------  -----  -----  -----  -----  -----  -----

What it means: You have a baseline. You’re not guessing later.

Decision: If baseline writes are already constrained, schedule initialization off-hours. Don’t count on ionice or per-process cgroup limits: the zpool command returns immediately and the writes are issued by ZFS kernel threads, so suspend/resume is your real throttle.

Task 6: Check for existing background work (scrub/resilver) before starting

cr0x@server:~$ zpool status tank | sed -n '1,25p'
  pool: tank
 state: ONLINE
  scan: scrub repaired 0B in 02:11:54 with 0 errors on Sun Dec 15 03:10:41 2025
config:
...

What it means: No scrub/resilver in progress.

Decision: Don’t stack heavy background operations. If a resilver is running, your priority is redundancy restoration, not preconditioning.

Task 7: Start initialization (whole pool)

cr0x@server:~$ sudo zpool initialize tank

What it means: Command returns quickly; initialization runs asynchronously.

Decision: Immediately begin monitoring. If you “fire and forget,” you’ll later discover it collided with a batch job and you’ll blame the wrong thing.

Task 8: Verify initialization is actually running

cr0x@server:~$ zpool status -i tank
  pool: tank
 state: ONLINE
config:

        NAME                                            STATE     READ WRITE CKSUM
        tank                                            ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            ata-...1234                                 ONLINE       0     0     0  (2% initialized, started at Mon Dec 22 10:14:09 2025)
            ata-...1235                                 ONLINE       0     0     0  (2% initialized, started at Mon Dec 22 10:14:09 2025)
...

errors: No known data errors

What it means: With -i, each leaf vdev reports its own initialization progress and start time. Unlike scrub and resilver, there is no pool-wide “scan” line; progress is tracked per device.

Decision: If progress is stuck (percentages not moving between checks), suspect device timeouts, queue starvation, or competing workloads.

Task 9: Pause initialization when production is burning

cr0x@server:~$ sudo zpool initialize -s tank

What it means: Stops (suspends) the initialize scan.

Decision: Use this when latency budgets are being violated. Don’t “power through” on a hot system; you’ll create a bigger incident than the one you were trying to prevent.

Task 10: Resume initialization

cr0x@server:~$ sudo zpool initialize tank

What it means: Resumes from where it left off.

Decision: Resume during quieter windows. If your workload never has quiet windows, that’s a capacity planning problem, not a ZFS problem.

Task 11: Initialize only a specific vdev (surgical preconditioning)

cr0x@server:~$ sudo zpool initialize tank ata-SAMSUNG_MZ7L33T8HBLA-00007_S6Y0NX0W1238

What it means: Targets the named leaf vdev (platform/version dependent; topology addressing must match zpool status).

Decision: Useful after replacing a single disk in a mirror/raidz group when you want the newcomer to stop behaving like a brand-new device while its siblings are “seasoned.”

Task 12: Watch per-disk behavior while initialize runs

cr0x@server:~$ iostat -x 1 5
Linux 6.8.0-48-generic (server)  12/22/2025  _x86_64_  (32 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           3.12    0.00    1.88    6.44    0.00   88.56

Device            r/s   rkB/s  r_await     w/s     wkB/s  w_await  aqu-sz  %util
sda              0.00     0.0     0.00   180.00  184320.0    45.6    8.20   92.0
sdb              0.00     0.0     0.00   177.00  181248.0    46.2    8.10   91.4

What it means: High %util and elevated await is expected during heavy sequential writes, but watch for one disk with dramatically worse await than its peers.

Decision: If a single drive is the outlier, treat it as suspect hardware/firmware or a path problem (SAS lane, expander, cable). Initialization is a great way to flush this out.

Task 13: Confirm TRIM support and whether it’s enabled (SSD pools)

cr0x@server:~$ zpool get autotrim tank
NAME  PROPERTY  VALUE     SOURCE
tank  autotrim  off       default

What it means: Autotrim is off. That’s not automatically wrong, but it’s a deliberate choice.

Decision: For many SSD pools, enabling autotrim helps long-term steady-state performance. If you enable it, monitor for any firmware weirdness and performance regression in your environment.

Task 14: Turn on autotrim (if you decide it’s right)

cr0x@server:~$ sudo zpool set autotrim=on tank

What it means: ZFS will issue TRIMs for freed blocks (implementation details vary by OS and OpenZFS version).

Decision: Do this when you trust your SSD firmware and you want consistent long-term performance. If your SSDs are known to misbehave with TRIM under load, keep it off and rely on periodic manual trims during maintenance windows.

Task 15: Monitor pool throughput and latency via zpool iostat

cr0x@server:~$ zpool iostat -v tank 1 3
               capacity     operations     bandwidth
pool         alloc   free   read  write   read  write
-----------  -----  -----  -----  -----  -----  -----
tank         1.25T  9.65T     10    950   1.2M  1.10G
  raidz2-0   1.25T  9.65T     10    950   1.2M  1.10G
    ata-...1234   -      -      2    160   120K   190M
    ata-...1235   -      -      2    158   110K   188M
-----------  -----  -----  -----  -----  -----  -----

What it means: Heavy write bandwidth consistent with initialize. Add -l if you want per-vdev latency columns (total_wait, disk_wait, queue waits). If reads remain low and application latency suffers, you may be saturating the same queues used for foreground I/O.

Decision: If foreground reads are getting starved, stop or schedule initialize differently. “But initialization is sequential” is not a guarantee it won’t hurt random read latency.

Task 16: Validate error counters during initialize

cr0x@server:~$ zpool status -iv tank | sed -n '1,80p'
  pool: tank
 state: ONLINE
config:

        NAME                                            STATE     READ WRITE CKSUM
        tank                                            ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            ata-...1234                                 ONLINE       0     0     0  (17% initialized, started at Mon Dec 22 10:14:09 2025)
            ata-...1235                                 ONLINE       0     0     0  (17% initialized, started at Mon Dec 22 10:14:09 2025)
            ata-...1236                                 ONLINE       0     0     0  (17% initialized, started at Mon Dec 22 10:14:09 2025)
            ata-...1237                                 ONLINE       0     0     0  (16% initialized, started at Mon Dec 22 10:14:09 2025)
            ata-...1238                                 ONLINE       0     0     0  (17% initialized, started at Mon Dec 22 10:14:09 2025)
            ata-...1239                                 ONLINE       0     0     0  (17% initialized, started at Mon Dec 22 10:14:09 2025)

errors: No known data errors

What it means: No read/write/checksum errors while pounding the disks. That’s exactly what you want to learn early.

Decision: Any incrementing error counters during initialize is a gift. Treat it as an early warning and start isolating hardware paths before you entrust data to them.

Task 17: Correlate kernel logs for resets/timeouts

cr0x@server:~$ sudo dmesg -T | tail -n 12
[Mon Dec 22 10:41:02 2025] sd 2:0:12:0: [sdl] tag#8121 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
[Mon Dec 22 10:41:02 2025] sd 2:0:12:0: [sdl] tag#8121 CDB: Write(16) 8a 00 00 00 00 1a 5f 2b 40 00 00 02 00 00 00
[Mon Dec 22 10:41:03 2025] blk_update_request: I/O error, dev sdl, sector 442446848 op 0x1:(WRITE) flags 0x0 phys_seg 32 prio class 0

What it means: The kernel is seeing timeouts and write errors on a device. ZFS may retry; your application will see latency spikes; your resilver windows will become nightmares.

Decision: Stop initialization, investigate cabling/HBA/expander/drive firmware, and run targeted SMART and link diagnostics. Don’t keep writing and hope the problem “burns in.” That’s how you end up with a Friday-night incident.

Task 18: Check SMART health and error logs (SATA example)

cr0x@server:~$ sudo smartctl -a /dev/sdl | sed -n '1,40p'
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.8.0-48-generic] (local build)
=== START OF INFORMATION SECTION ===
Device Model:     SAMSUNG MZ7L33T8HBLA-00007
Serial Number:    S6Y0NX0W1238
Firmware Version: EDA7202Q
User Capacity:    3,840,755,982,336 bytes [3.84 TB]
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
...

What it means: You can confirm firmware version and basic health. The interesting bits are usually in the error log and media wear indicators further down.

Decision: If firmware is known-problematic in your fleet, standardize. Mixed firmware in a vdev is a subtle way to create subtle pain.

Task 19: Observe pool space and fragmentation as it fills (initialize doesn’t fix this)

cr0x@server:~$ zpool list -o name,size,alloc,free,cap,frag,health tank
NAME  SIZE  ALLOC  FREE  CAP  FRAG  HEALTH
tank  10.9T  1.23T 9.67T 11%  2%    ONLINE

What it means: Low fragmentation and low capacity use. You have headroom.

Decision: If you’re routinely operating above ~80% capacity on busy pools, your real performance problem is that you’re running out of contiguous space and metaslabs become more constrained. Initialize won’t change that.

Fast diagnosis playbook

You ran zpool initialize (or you inherited a system where it’s running), and latency is bad. You need to answer one question quickly: is initialization the bottleneck, or is it merely exposing a weak link?

First: confirm what background work is active

  • Run zpool status -i. If leaf vdevs report initialization in progress, you’ve found a major source of write load.
  • Check if a scrub or resilver is also running. If yes, stop the less critical one.
cr0x@server:~$ zpool status -x
all pools are healthy

Decision: Health isn’t performance. “Healthy” only means no known corruption or device failure right now. Continue.
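The “is anything scanning?” question is easy to script by grepping the status text. The sketch below embeds sample zpool status output in a shell function so the pipeline is self-contained; on a real host you’d pipe the live command instead.

```shell
# Hypothetical helper: report active background scans (scrub/resilver/initialize).
# The sample text below stands in for `zpool status` on a live system.
sample_status() {
cat <<'EOF'
  pool: tank
 state: ONLINE
  scan: scrub in progress since Mon Dec 22 10:14:09 2025
EOF
}
# Print matching scan lines, or a calming message if there are none.
sample_status | grep -E 'scrub|resilver|initializ' || echo "no active scans"
```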

Second: determine whether the bottleneck is a single disk/path or the whole vdev

  • Use iostat -x or zpool iostat -v and look for one device with much worse await or much lower throughput.
  • If one device is slow, it drags the vdev. RAIDZ and mirrors both pay for the slowest member in different ways.
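The “one slow disk” check is also scriptable. This sketch feeds made-up per-device write await values through awk and flags anything more than twice the group mean; in real life you’d cut the device and w_await columns out of iostat -x output instead.

```shell
# Invented sample data: device name and w_await in ms. sdl is the planted outlier.
cat <<'EOF' | awk '{d[NR]=$1; v[NR]=$2; s+=$2}
                   END {m=s/NR
                        for (i=1;i<=NR;i++)
                          if (v[i] > 2*m)
                            printf "%s: %.1f ms vs group mean %.1f ms\n", d[i], v[i], m}'
sda 5.1
sdb 5.3
sdc 4.9
sdl 48.7
EOF
```

With this sample the only line printed is for sdl, which is exactly the shape of output you want: empty means healthy, any line names a suspect.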

Third: check kernel logs for resets/timeouts

  • Time-outs in dmesg are often the real villain. Initialization just increases the chance you see them.
  • Reset storms often correlate with a specific slot, cable, expander port, or power issue.

Fourth: verify you’re not simply saturating queues

  • If all disks show high utilization and no errors, you may just be hitting bandwidth/IOPS limits.
  • Stop or suspend initialize, confirm latency recovers, then schedule initialize for off-peak.

Fifth: sanity-check pool design choices

  • Wrong ashift causes permanent, structural pain.
  • Recordsize/volblocksize mismatches won’t break initialize, but they can create misleading performance conclusions.
  • Operating at high pool fill percentages makes everything worse, always.

This playbook isn’t glamorous, but it’s fast. You’re trying to decide if you should pause initialize, swap hardware, or accept that you’re just watching the system do exactly as much I/O as it possibly can.

Three corporate mini-stories from the trenches

1) The incident caused by a wrong assumption

The company had a clean migration plan: new ZFS pool on SSDs, replicate datasets, cut over during a quiet Sunday. They did a quick benchmark on Saturday and celebrated the results. The numbers looked heroic—especially on an empty pool. They assumed “new SSD + ZFS” meant the first week would be the easiest week.

They skipped initialization because it wasn’t in the runbook, and because “we don’t want to wear the drives.” That line always sounds sensible until you realize the workload will write those bytes anyway—just at random times, during business hours, while customers are watching.

Monday morning: VM latency spikes. Not continuous, not predictable. Just sharp enough that some services fell into retry storms. Everyone chased the network first (because everyone always chases the network first), then hypervisor scheduling, then the database. The storage graphs showed average utilization well below max, which made it more confusing. It wasn’t a throughput problem.

When they finally correlated spikes with write amplification inside the SSDs—visible through device-level latency metrics—the pattern became obvious: early writes were landing in “easy” free blocks. As allocation spread and the SSD’s internal housekeeping kicked in, tail latency jumped. They essentially did “initialize in production,” one chaotic burst at a time.

They paused write-heavy jobs, ran zpool initialize during nights for a week, and the latency spikes calmed down. The lasting lesson wasn’t “always initialize.” It was: don’t benchmark empty. Empty is a demo environment, not a workload.

2) The optimization that backfired

Another org had a habit of “optimizing” everything. They read that initialization is sequential writes and assumed it would be harmless if throttled. So they ran initialize during business hours on a shared storage cluster, expecting background I/O scheduling to keep it polite.

To be extra careful, they also kicked off a scrub “to make sure everything is safe.” On paper it sounded like hygiene: initialize warms up, scrub validates, everyone sleeps well. In reality it stacked two large scans that compete for I/O and cache, while application I/O was already non-trivial.

The result was a slow-motion incident. Nothing hard-failed. No obvious red lights. But p99 latencies doubled, then tripled. The SRE on call saw no single smoking gun. CPU was fine. Network was fine. Pool was “ONLINE.” Users were not fine.

The backfire was subtle: the combined background load caused the system to spend more time in queueing and less time doing useful work. Worse, the apps responded by increasing parallelism and retries, which created more random I/O, which made the scans less efficient. Classic feedback loop.

They fixed it by being boring: never run initialize and scrub together, and never assume “sequential background I/O” is automatically harmless. Also, they added a simple dashboard panel: “Is any pool doing a scan?” It prevented future “helpful optimizations.”

3) The boring but correct practice that saved the day

A fintech team ran ZFS for a log-heavy platform. Their culture was aggressively unromantic: everything had a runbook, and every runbook had “prechecks.” It wasn’t because they loved paperwork. It was because they hated surprise.

When they deployed a new shelf of SSDs, the runbook required: record device IDs, confirm firmware uniformity, baseline zpool iostat, run initialize during a scheduled window, and watch kernel logs for resets. The initialize window was long and dull. The on-call engineer rotated through it like watching paint dry, except paint doesn’t page you.

Halfway through initialization, one drive started throwing intermittent timeouts. Not enough to fail immediately, but enough to show up in dmesg and as occasional latency spikes in iostat. They stopped initialize, replaced the drive, and restarted. The vendor later confirmed a firmware issue affecting a subset of that batch.

If they hadn’t initialized, that drive might have limped along for weeks until a peak traffic day forced it into the failure mode. The “boring” practice didn’t make them faster; it made them less surprised. That’s the better currency.

Joke #2: The best storage incident is the one you only experience as a calendar invite.

Common mistakes: symptoms → root cause → fix

1) Symptom: “Initialize made my pool slow”

Root cause: Initialize is heavy sustained write I/O and competes with foreground traffic; your system had no IOPS headroom.

Fix: Suspend initialize during peak (zpool initialize -s), schedule off-peak, and baseline before/after so you can prove causality.

2) Symptom: Progress stuck at a percentage, ETA keeps growing

Root cause: A device is timing out or the I/O path is unstable; ZFS is retrying or waiting on slow commands.

Fix: Check dmesg for resets/timeouts, run SMART, verify cabling/HBA firmware, and consider isolating the suspect disk.

3) Symptom: One disk is pegged at 100% util; others are calm

Root cause: Bad drive, link negotiation issue, or a single device in a RAIDZ vdev dragging the group due to poor latency.

Fix: Compare per-disk await and throughput. Swap the slot/cable. If the problem follows the disk, replace it. If it stays with the slot, fix the path.

4) Symptom: Initialization finishes, but workload still has huge tail latency

Root cause: Not a “new disk behavior” issue. Likely pool is too full, wrong vdev layout, ashift mismatch, or workload is sync-heavy without proper SLOG.

Fix: Check pool capacity (zpool list), fragmentation, and dataset settings; evaluate redundancy and add vdevs or redesign if necessary.

5) Symptom: Pool shows errors during initialize

Root cause: Initialization is exposing marginal hardware. This is good news delivered rudely.

Fix: Treat it like a pre-failure event: gather logs, replace the component, and rerun initialize on the replacement media.

6) Symptom: You ran initialize expecting it to “verify” the pool

Root cause: Confusion with scrub. Initialize writes; scrub verifies existing checksums.

Fix: Use zpool scrub for integrity validation. Use initialize for preconditioning. Don’t swap them.

7) Symptom: Performance got worse after enabling autotrim alongside initialize

Root cause: Your device firmware/path can’t handle concurrent sustained writes plus TRIM workload, or the implementation causes extra background work at the wrong time.

Fix: Don’t introduce two big variables at once. Stabilize first: run initialize with autotrim unchanged, then evaluate autotrim separately with monitoring.

8) Symptom: You initialized a pool during resilver and now everything is on fire

Root cause: You stacked two I/O-intensive, latency-sensitive operations. Resilver needs to finish; redundancy is at risk.

Fix: Stop initialize. Let resilver complete. Then initialize at a calmer time, optionally just the replaced device.

Checklists / step-by-step plan

Checklist A: New pool bring-up with initialization (production-minded)

  1. Record environment: ZFS version, OS kernel, HBA model/firmware. Consistency beats cleverness.
  2. Build pool with stable device IDs (WWN/ATA IDs). Never rely on /dev/sdX.
  3. Verify topology with zpool status and confirm redundancy matches the workload.
  4. Confirm ashift using zdb -C. If wrong, stop and rebuild now. Future-you will not fix it later.
  5. Baseline performance with zpool iostat and device-level iostat -x under light load.
  6. Check for errors in dmesg before starting. If the kernel already complains, initialization will turn complaints into outages.
  7. Start initialize and immediately monitor progress via zpool status.
  8. Watch per-disk metrics for outliers. One outlier is a hardware problem until proven otherwise.
  9. Stop on errors and investigate. Don’t let “it’s only initialization” normalize hardware faults.
  10. After completion, repeat baselines and record the new steady-state numbers. This becomes your “known good” reference.

Checklist B: After replacing a drive in an existing vdev

  1. Let the resilver finish. Verify zpool status is clean.
  2. Run smartctl -a on the new disk and confirm firmware matches the fleet standard.
  3. Initialize only the new device (if supported in your environment) to precondition it.
  4. Monitor dmesg for resets/timeouts; those are often slot/path issues revealed by sustained writes.
  5. Document what changed (serial, slot, firmware). When the next incident occurs, you’ll want to correlate.

Checklist C: Operating on a busy pool (don’t be a hero)

  1. Decide the goal: reduce future latency spikes, or validate hardware stability? If neither, don’t run it.
  2. Pick a window: low traffic, low batch job activity.
  3. Set expectations: publish that background write load will increase, and that latency may rise.
  4. Start initialize and watch p99 latency dashboards.
  5. Have a stop condition: if latency crosses a threshold, suspend initialize. Be disciplined.
  6. Resume later. Completion is nice; controlled impact is nicer.
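If your quiet window is predictable, the suspend/resume dance can live in cron. A sketch in /etc/cron.d style; the pool name and hours are placeholders for your own windows:

```text
# Hypothetical /etc/cron.d entries: run initialize only between 01:00 and 06:00.
# `zpool initialize tank` resumes a suspended run; `-s` suspends it.
# Remove these entries once initialization completes.
0 1 * * * root /sbin/zpool initialize tank
0 6 * * * root /sbin/zpool initialize -s tank
```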

FAQ

1) Does zpool initialize erase data?

It does not erase data: initialization writes only to unallocated regions, so allocated blocks are untouched. Treat it as a heavy background write operation, not a destructive wipe. Still: don’t run it on a pool you can’t afford to stress without monitoring.

2) Is initialize the same as “burn-in” testing drives?

Related, not identical. Burn-in often includes SMART long tests, read/write patterns, temperature cycling, and error monitoring outside ZFS. Initialize is ZFS-integrated preconditioning. Do both if you care about reliability.

3) Should I initialize HDD pools too?

Sometimes. HDDs don’t have flash translation layers, but modern firmware can still behave differently on untouched regions. More importantly, initialize is a good way to expose marginal disks or flaky links with sustained I/O. If your pool is already busy and stable, the benefit may not justify the load.

4) How long does initialization take?

Roughly: pool size divided by sustained write bandwidth you can spare. RAIDZ parity, controller limits, and competing workloads all matter. Trust zpool status for live estimates, but remember ETA is a guess under contention.
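A back-of-envelope version of that arithmetic, with invented numbers (10.9 TiB of free space to write, 1.5 GiB/s of spare sustained write bandwidth):

```shell
# Rough ETA: free space to write divided by spare sustained write bandwidth.
awk 'BEGIN {free_tib = 10.9; bw_gib_s = 1.5
            hours = (free_tib * 1024 / bw_gib_s) / 3600
            printf "~%.1f hours\n", hours}'
```

That prints ~2.1 hours. Under contention the real number will be worse; treat zpool status ETAs with the same skepticism.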

5) Should I run scrub after initialize?

Not automatically. Scrub verifies existing blocks and repairs via redundancy. After a new pool build, a scrub isn’t a terrible idea as a sanity check, but don’t stack scrub and initialize concurrently. Stagger them.

6) Does initialize help with future resilver times?

Not directly. Resilver time depends on allocated data, vdev performance, and system load. Initialize can reveal weak disks/paths early and reduce performance surprises, which indirectly helps your ability to survive resilvers without drama.

7) What if my workload is mostly reads—do I still need initialize?

If you’re truly read-mostly and won’t fill the pool quickly, initialization is less valuable. But if you care about predictable performance under occasional heavy writes (rebuilds, batch jobs, log bursts), initialize can still be worth it.

8) Is zpool initialize a replacement for TRIM?

No. Initialize writes; TRIM informs the device which blocks are free. They address different mechanisms. On SSD pools, you often want both: initialize to precondition, and TRIM (autotrim or periodic) to keep steady-state behavior sane.

9) Can I throttle initialization?

ZFS doesn’t provide a single universal “initialize speed” knob across all environments. Practically, you manage impact by scheduling and by suspending/resuming; per-process tools like ionice don’t help, because the writes are issued by ZFS kernel threads. The best throttle is “don’t run it at noon.”

10) How do I know initialization succeeded?

zpool status will show initialize completed. More importantly, you should see: no new device errors, stable device-level latency, and fewer “mystery spikes” when the pool starts filling.

Conclusion: next steps that won’t bite you later

If you’re building new ZFS pools on modern media, treat zpool initialize like you treat fire drills: inconvenient, controlled, and vastly preferable to the real thing. Run it when you have headroom, measure before and after, and take any errors as a serious signal—not noise.

Practical next steps:

  • Pick one non-critical pool and add initialize to the build runbook, including explicit stop conditions.
  • Standardize device naming (WWN/ATA IDs) and firmware baselines so diagnostics don’t turn into archaeology.
  • Build a small dashboard panel: current ZFS scans (scrub/resilver/initialize), per-disk latency, and kernel error rates.
  • Make one policy decision: when to run initialize (new pool, after replacements, before migrations), and when not to.

New drives will always try to impress you. Your job is to make them behave when nobody’s watching.
