You turned on encryption because security asked nicely (or because auditors asked rudely). Then you turned on compression
because storage is expensive and the pool is always “mysteriously” fuller than the forecast. And now performance is weird:
CPU spikes, latency jitters, and “why is this dataset slower than the unencrypted one?” becomes a weekly ritual.
ZFS is deterministic. Your workload isn’t. The gap between those two is where most production pain lives. Let’s close it:
what order the pipeline uses, what knobs matter, and how to prove what’s actually limiting you.
The pipeline order: what happens first, and why you should care
ZFS doesn’t “kind of” compress and “sort of” encrypt. It does both in a very specific pipeline, and performance outcomes
follow that order like a shadow.
What ZFS does to your data (the simplified but accurate model)
For native ZFS encryption (the on-dataset feature), the important sequence is:
- Application writes logical blocks (files, DB pages, VM blocks) into the dataset’s recordsize/volblocksize world.
- Compression happens first (if enabled). ZFS tries to compress each record (e.g., 128K) independently. If the savings aren’t big enough, it stores the block uncompressed (the exact threshold is algorithm-dependent, but the outcome is simple: it won’t waste space for no gain).
- Encryption happens after compression. This is the big one. Ciphertext looks random and is basically incompressible, so if you encrypt first, compression becomes a fancy no-op.
- Checksums are computed for integrity, and ZFS writes the blocks into the pool’s vdev layout, with copy-on-write semantics.
The order (compress then encrypt) is why you can have both security and space efficiency—without magic. It’s also why certain
performance problems show up only after enabling encryption: compression might still save I/O, but you’ve moved some work
onto the CPU. If you don’t have CPU headroom, latency becomes your new hobby.
The mental model that avoids dumb mistakes
Think of it like this: compression reduces bytes; encryption destroys patterns. So you want to reduce bytes while patterns
still exist. Then you encrypt.
If you’re using ZFS native encryption, you get the right order by design. If you’re doing “encryption” in the application
or on top of ZFS (encrypted files inside an unencrypted dataset), you might be encrypting before ZFS ever sees the data,
and now ZFS compression is mostly decorative.
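If you suspect that’s your situation, a quick experiment makes it obvious. A minimal sketch, with an illustrative dataset name, file paths, and a throwaway passphrase: write the same compressible data once as plaintext and once pre-encrypted with openssl, then compare what ZFS actually stored.
cr0x@server:~$ zfs create -o compression=lz4 pool/demo
cr0x@server:~$ dd if=/dev/zero of=/pool/demo/plain.dat bs=1M count=256
cr0x@server:~$ openssl enc -aes-256-ctr -pbkdf2 -pass pass:throwaway -in /pool/demo/plain.dat -out /pool/demo/cipher.dat
cr0x@server:~$ sync; du -h /pool/demo/plain.dat /pool/demo/cipher.dat
Expect the plaintext copy to occupy almost nothing on disk and the pre-encrypted copy to sit near its full logical size: same bytes, but ZFS compression only helped the copy it saw before encryption.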
Joke #1: Turning on compression for already-encrypted data is like installing a turbo on a bicycle—technically impressive, functionally confusing.
What “order” means in the real world
This article isn’t arguing about philosophical order; it’s about where the CPU cycles and I/O go. The order you care about is:
- Where is encryption happening? ZFS native? App-level? Filesystem-level above ZFS?
- Where is compression happening? ZFS dataset property? App-level compression? Backup tool?
- When is data becoming incompressible? Most commonly: after encryption, after already-compressed formats (JPEG, MP4),
or after application-level compression (e.g., database page compression).
Facts & history that change decisions
Some context pays rent. Here are concrete facts and short historical points that matter when you’re making changes on a production pool.
- ZFS encryption is native and per-dataset. It’s not a separate block device layer bolted on later; it’s a dataset feature with key inheritance options.
- Native encryption arrived much later than ZFS itself. ZFS was created in the mid-2000s; native encryption landed years later, after long debate about key handling and feature flags.
- ZFS compression has existed “forever” in ZFS terms, and it’s been a default in many shops for years because it often improves performance by reducing I/O.
- AES-GCM is common for ZFS encryption because it provides authenticated encryption (confidentiality + integrity). That integrity is separate from, but complementary to, ZFS checksums.
- ZFS checksums are end-to-end: stored and verified on reads, enabling detection of silent corruption. Encryption does not replace this; it sits alongside it.
- Compression is per-block (record) and local. ZFS doesn’t compress “files”; it compresses blocks, which is why recordsize and workload shape matter so much.
- ARC caches compressed data (and encrypted datasets have their own implications). Cache effectiveness depends on the post-compression size and access patterns.
- ZFS send/receive supports encrypted replication modes. You can send raw encrypted streams that don’t require keys on the receiver—useful for untrusted backup targets.
- Modern CPUs have AES acceleration (AES-NI on x86, similar on other platforms), turning “encryption is slow” into “encryption is usually fine, until it isn’t.”
One quote, because it’s painfully true in ops: “Failure is not an option,” the motto associated with Apollo 13 flight director Gene Kranz. (It’s a cultural motto, not a kernel parameter.)
Performance realities: encryption, compression, and the CPU you actually have
People like simple answers: “Compression is fast,” “Encryption is slow,” “NVMe fixes everything.” Reality is uglier and more interesting.
Performance is a three-way negotiation between CPU cycles, memory bandwidth/cache behavior, and storage latency/IOPS.
Compression can speed things up, even when it costs CPU
If your pool is I/O-bound, compression often wins twice:
- It writes fewer bytes, so your disks do less work.
- It reads fewer bytes, which can turn latency spikes into something you stop noticing.
But compression isn’t free. Algorithms like lz4 are engineered for speed; zstd can trade CPU for better ratios.
You choose based on your bottleneck. If you’re already CPU-bound (high utilization, run queue pressure, frequent context switching),
enabling heavier compression is how you turn a mild performance issue into a ticket queue.
Encryption overhead is often “fine,” until it isn’t
With hardware acceleration, AES-GCM can be very fast. But encryption still adds:
- CPU work per block written and read.
- Some extra metadata handling.
- Potential cache and memory bandwidth pressure under high throughput.
The failure mode isn’t always obvious. You can have plenty of “average CPU” but still be short on per-core headroom.
Storage threads can become latency-sensitive; if they run on cores already busy with app work, you get jitter.
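A minimal per-core check, assuming the sysstat package is installed:
cr0x@server:~$ mpstat -P ALL 1 5
If a couple of cores sit near 100% in %usr/%sys while the all-CPU average looks comfortable, you are short on per-core headroom, and that is where the jitter comes from.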
The order gives you a lever: reduce bytes before you pay encryption cost
Because ZFS compresses before it encrypts, compression reduces the amount of data that must be encrypted and written.
This matters when:
- You have a dataset that compresses well (text logs, JSON, VM images with zeroes, many database pages).
- You’re throughput-limited by storage or network replication.
- You’re replicating encrypted data (send/receive), where bytes on the wire cost you time.
If the data is already compressed (media, many backups, encrypted blobs), compression won’t reduce bytes. You still pay for
the compression attempt, which is cheap with lz4 and non-trivial at heavier zstd levels.
Compression level is a policy decision, not a vibe
The temptation is to pick a cool-sounding algorithm (“zstd-19, because bigger number”) and call it optimization.
That’s not engineering; it’s cosplay.
Here’s the practical stance:
- Default to compression=lz4 almost everywhere. It’s the “safe” choice and often wins on both space and speed.
- Use compression=zstd (moderate level) for datasets where you’ve measured real benefits and you have CPU budget.
- Disable compression for datasets that are provably incompressible and extremely latency-sensitive (rare, but real).
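As dataset properties, that stance looks roughly like this (dataset names are examples; the zstd-N level syntax assumes OpenZFS 2.x):
cr0x@server:~$ zfs set compression=lz4 pool/app
cr0x@server:~$ zfs set compression=zstd-3 pool/app/log
cr0x@server:~$ zfs set compression=off pool/media
cr0x@server:~$ zfs get -r -o name,property,value,source compression pool
The property only affects newly written blocks; existing data keeps whatever it was written with.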
Encryption key management can become your performance problem
Not directly—crypto is fast—but because operational mistakes cause delays and outages. If keys aren’t loaded at boot, your services
don’t mount, your apps don’t start, and you end up debugging “storage performance” that’s actually “storage unavailable.”
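One boring pattern that avoids the “won’t mount at boot” variant, sketched with illustrative paths (whether a key file on the host fits your threat model is a policy decision): keep a root-only raw key file, point keylocation at it, and load keys in your boot automation.
cr0x@server:~$ dd if=/dev/urandom of=/etc/zfs/keys/pool-app.key bs=32 count=1
cr0x@server:~$ chmod 0400 /etc/zfs/keys/pool-app.key
cr0x@server:~$ zfs change-key -o keyformat=raw -o keylocation=file:///etc/zfs/keys/pool-app.key pool/app
cr0x@server:~$ zfs load-key -r pool/app && zfs mount -a
Then prove it with a reboot test, not with optimism.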
Joke #2: The fastest filesystem is the one that mounts; the second fastest is the one that doesn’t page your on-call at 3 a.m.
Dataset design: recordsize, volblocksize, and workload fit
Encryption and compression aren’t standalone toggles. They interact with block sizing and access patterns. Most “ZFS is slow”
incidents are actually “ZFS is doing exactly what you asked, and what you asked was weird.”
recordsize: the hidden multiplier
recordsize affects file datasets (not zvols) and controls the maximum block size ZFS uses for file data.
Bigger records:
- Improve sequential throughput (fewer I/O operations).
- Increase compression effectiveness (more data per block, more patterns).
- Can hurt random-read latency for small reads (read amplification).
Encryption and compression both operate per record. If you choose a recordsize that doesn’t fit your workload, you amplify
CPU work and I/O in the wrong places.
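For example, a dataset carved out for a small-random-read workload might be created like this (names are illustrative); recordsize only shapes blocks written after it is set:
cr0x@server:~$ zfs create -o recordsize=16K -o compression=lz4 pool/app/db-pg
cr0x@server:~$ zfs get -o name,property,value,source recordsize pool/app/db-pg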
volblocksize: for zvols, you only get one shot (mostly)
For zvols, volblocksize is the block size exposed to the consumer (VM, iSCSI, etc.). It’s set at creation and
is hard to change without recreating the volume.
If you run databases or VM images on zvols, get volblocksize right. If the guest writes 8K blocks and your volblocksize is 128K,
you will perform a masterclass in write amplification.
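Because volblocksize is effectively fixed at creation, set it explicitly when carving the zvol; a sketch with illustrative names and sizes:
cr0x@server:~$ zfs create -V 200G -o volblocksize=16K -o compression=lz4 pool/vm-02
cr0x@server:~$ zfs get -o name,property,value volblocksize pool/vm-02
Match it to the dominant guest I/O size (often 8K for databases, 16K for general VM images), and write the choice down.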
Special vdevs and metadata: the performance trapdoor
Encryption and compression are about data blocks, but metadata behavior can dominate latency. If your workload is metadata-heavy
(millions of small files, containers, build artifacts), your “data” dataset might look fine while metadata is thrashing.
Special vdevs can help, but they must be designed carefully and protected like first-class citizens.
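For reference, adding one looks like this (device names are placeholders); mirror it, because losing the special vdev loses the pool, and treat special_small_blocks as an opt-in per dataset:
cr0x@server:~$ zpool add pool special mirror nvme4n1 nvme5n1
cr0x@server:~$ zfs set special_small_blocks=16K pool/app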
Practical tasks: commands, outputs, and the decision you make
Theory is cheap. Here are practical tasks you can run on a ZFS system to understand whether encryption and compression are helping,
hurting, or just sitting there looking busy. Each task includes: command, what output means, and the decision you make.
Task 1: Confirm encryption and compression properties on a dataset
cr0x@server:~$ zfs get -o name,property,value,source encryption,keystatus,keylocation,keyformat,compression,compressratio pool/app
NAME PROPERTY VALUE SOURCE
pool/app encryption aes-256-gcm local
pool/app keystatus available -
pool/app keylocation prompt local
pool/app keyformat passphrase local
pool/app compression zstd local
pool/app compressratio 1.72x -
What it means: encryption is on, key is loaded (keystatus=available), compression is zstd, and real ratio is 1.72x.
Decision: If compressratio is near 1.00x and workload is latency-sensitive, consider compression=lz4 or off.
If keystatus isn’t available, fix key loading before you chase “performance.”
Task 2: Check whether you’re compressing incompressible data
cr0x@server:~$ zfs get -o name,property,value,source compressratio,compression pool/media
NAME PROPERTY VALUE SOURCE
pool/media compressratio 1.01x -
pool/media compression zstd local
What it means: zstd is doing basically nothing. Likely media files or already-compressed objects.
Decision: Move to compression=lz4 or compression=off if CPU is hot and this dataset doesn’t benefit.
Task 3: Measure whether encryption is present where you think it is
cr0x@server:~$ zfs get -r -o name,property,value,source encryption pool
NAME PROPERTY VALUE SOURCE
pool encryption off default
pool/app encryption aes-256-gcm local
pool/app/db encryption aes-256-gcm inherited from pool/app
pool/backups encryption off local
What it means: Not everything is encrypted. Inheritance is working for pool/app/db; backups are not encrypted.
Decision: Decide policy: do backups need encryption? If yes, enable encryption at the dataset layer (or use raw sends).
Task 4: Check CPU features that make encryption cheap (or not)
cr0x@server:~$ grep -m1 -oE 'aes|sha_ni|pclmulqdq' /proc/cpuinfo | sort -u
aes
pclmulqdq
What it means: AES and carry-less multiply features exist; AES-GCM can be hardware-accelerated.
Decision: If these are missing on older hardware, expect noticeably higher encryption CPU cost and plan capacity accordingly.
Task 5: Inspect per-dataset I/O load and latency clues
cr0x@server:~$ zpool iostat -v pool 1 3
capacity operations bandwidth
pool alloc free read write read write
-------------------------- ----- ----- ----- ----- ----- -----
pool 2.10T 5.14T 120 980 18.3M 210M
mirror-0 1.05T 2.57T 60 490 9.2M 105M
nvme0n1 - - 30 250 4.6M 52.0M
nvme1n1 - - 30 240 4.6M 53.0M
mirror-1 1.05T 2.57T 60 490 9.1M 105M
nvme2n1 - - 30 245 4.5M 52.5M
nvme3n1 - - 30 245 4.6M 52.5M
-------------------------- ----- ----- ----- ----- ----- -----
What it means: Heavy writes. This tells you the pool is doing real work; not who is to blame, but where to look next.
Decision: If bandwidth is high but latency is bad, investigate sync writes, SLOG, and CPU saturation rather than “encryption is slow” guesses.
Task 6: Check ARC health and whether caching is helping compressed/encrypted workloads
cr0x@server:~$ arcstat 1 3
time read miss miss% dmis dm% pmis pm% mmis mm% size c
12:10:01 920 180 19 45 25 120 67 15 8 24.1G 31.8G
12:10:02 910 175 19 44 25 116 66 15 9 24.1G 31.8G
12:10:03 950 190 20 50 26 125 66 15 8 24.1G 31.8G
What it means: ~80% hit rate; ARC is useful. If miss% is very high, you’re going to disks and latency will follow.
Decision: If misses dominate and working set is larger than ARC, tune memory, consider special vdev/L2ARC (carefully),
or redesign dataset layout. Don’t blame encryption for cache misses.
Task 7: Verify recordsize and decide whether it matches access patterns
cr0x@server:~$ zfs get -o name,property,value,source recordsize pool/app
NAME PROPERTY VALUE SOURCE
pool/app recordsize 128K local
What it means: 128K is a common default. Great for sequential, not always for small random reads.
Decision: For small-file heavy or random-read heavy workloads, consider smaller recordsize (e.g., 16K or 32K) on a dataset created for that purpose.
Task 8: For zvols, validate volblocksize (and regret nothing later)
cr0x@server:~$ zfs get -o name,property,value,source volblocksize pool/vm-01
NAME PROPERTY VALUE SOURCE
pool/vm-01 volblocksize 16K local
What it means: 16K is often reasonable for VM workloads; align to guest I/O where possible.
Decision: If volblocksize is huge and the workload is small random writes, plan a migration/recreate. There’s no heroic sysctl that fixes wrong block sizing.
Task 9: Check sync behavior and whether you’re paying for durability you don’t need
cr0x@server:~$ zfs get -o name,property,value,source sync pool/app/db
NAME PROPERTY VALUE SOURCE
pool/app/db sync standard default
What it means: ZFS honors application sync requests. Databases may force sync writes; latency will show it.
Decision: Keep sync=standard for correctness unless you truly understand the risk. If you need low-latency sync, consider a SLOG on power-loss-protected NVMe.
Task 10: Confirm encryption key status across boot (operational reliability)
cr0x@server:~$ zfs get -r -o name,property,value keystatus pool/app
NAME PROPERTY VALUE
pool/app keystatus available
pool/app/db keystatus available
What it means: Keys are loaded now. This doesn’t guarantee they’ll load automatically next boot.
Decision: Choose a keyloading strategy (prompt vs file vs external agent) that matches your boot automation and threat model.
Task 11: Benchmark compression effect without lying to yourself
cr0x@server:~$ zfs create -o encryption=aes-256-gcm -o keyformat=passphrase -o keylocation=prompt -o compression=lz4 pool/bench-lz4
Enter passphrase:
Re-enter passphrase:
cr0x@server:~$ dd if=/dev/zero of=/pool/bench-lz4/zeros bs=1M count=2048 status=progress
2147483648 bytes (2.1 GB, 2.0 GiB) copied, 2.31 s, 930 MB/s
cr0x@server:~$ zfs get -o name,property,value,source compressratio,used,logicalused pool/bench-lz4
NAME PROPERTY VALUE SOURCE
pool/bench-lz4 compressratio 200.00x -
pool/bench-lz4 used 12.5M -
pool/bench-lz4 logicalused 2.00G -
What it means: Zeroes compress extremely well; you see the “best case.” Also note: encryption did not prevent compression because compression happened first.
Decision: Use realistic data too (database pages, logs, VM images). Benchmarking only with zeroes is how you create a beautiful lie.
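For a less flattering benchmark, fio can generate partially compressible data under a mixed random workload. A sketch, assuming fio is installed and with illustrative job parameters:
cr0x@server:~$ fio --name=mixed --directory=/pool/bench-lz4 --rw=randrw --rwmixread=70 --bs=8k --size=4G --numjobs=4 --iodepth=16 --ioengine=libaio --refill_buffers --buffer_compress_percentage=50 --runtime=120 --time_based --group_reporting
Re-check compressratio and latency afterward; the gap between this run and the zeroes run is the gap between marketing and production.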
Task 12: Validate send/receive mode for encrypted replication
cr0x@server:~$ zfs send -nvw pool/app@daily
full send of pool/app@daily estimated size is 58.2G
total estimated size is 58.2G
What it means: -w indicates a raw send (encrypted stream preserved). Receiver doesn’t need to decrypt to store it.
Decision: Use raw sends to untrusted backup targets. If you need the receiver to access data, use normal sends and manage keys accordingly.
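An end-to-end raw incremental replication sketch (hostname, dataset, and snapshot names are illustrative); the receiver stores ciphertext and never needs the keys:
cr0x@server:~$ zfs snapshot pool/app@daily-new
cr0x@server:~$ zfs send -w -i pool/app@daily pool/app@daily-new | ssh backup01 zfs receive -u -s backup/app
Here -u keeps the received dataset unmounted on the target, and -s makes an interrupted receive resumable, which you will appreciate exactly once.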
Task 13: Check per-dataset space savings and whether you’re “winning”
cr0x@server:~$ zfs list -o name,used,logicalused,compressratio,encryption -r pool/app | head -n 6
NAME USED LUSED RATIO ENCRYPTION
pool/app 620G 1.02T 1.69x aes-256-gcm
pool/app/db 410G 710G 1.73x aes-256-gcm
pool/app/log 42G 120G 2.85x aes-256-gcm
pool/app/tmp 168G 170G 1.01x aes-256-gcm
What it means: Logs compress great; tmp doesn’t. Encryption is consistent.
Decision: Consider compression=off (or lz4) for tmp if CPU is constrained, and keep stronger compression where it pays back.
Task 14: Inspect pool health and scrub status (because performance issues love sick pools)
cr0x@server:~$ zpool status -v pool
pool: pool
state: ONLINE
scan: scrub repaired 0B in 03:12:44 with 0 errors on Sun Dec 15 03:30:12 2025
config:
NAME STATE READ WRITE CKSUM
pool ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
nvme0n1 ONLINE 0 0 0
nvme1n1 ONLINE 0 0 0
mirror-1 ONLINE 0 0 0
nvme2n1 ONLINE 0 0 0
nvme3n1 ONLINE 0 0 0
errors: No known data errors
What it means: Healthy pool, recent scrub clean. Good baseline.
Decision: If you see errors or resilvering, stop “tuning encryption/compression” and handle hardware/pool health first.
Task 15: Identify whether the workload is random IOPS or sequential bandwidth
cr0x@server:~$ iostat -x 1 3
avg-cpu: %user %nice %system %iowait %steal %idle
35.2 0.0 9.8 1.2 0.0 53.8
Device r/s w/s rkB/s wkB/s await svctm %util
nvme0n1 30.0 250.0 4700.0 52000.0 1.5 0.4 11.2
nvme1n1 30.0 240.0 4600.0 53000.0 1.6 0.4 11.0
What it means: Low iowait and low device utilization suggest storage isn’t saturated. CPU is doing real work.
Decision: If you’re slow but disks aren’t busy, suspect CPU (compression level, encryption overhead, checksumming) or application behavior (sync writes).
Fast diagnosis playbook: find the bottleneck in minutes
When performance tanks after enabling encryption or changing compression, don’t do the “tweak three knobs and hope” dance.
Do this in order. Stop when you find the constraint.
First: Is the pool healthy and not busy doing recovery work?
- Run zpool status. Look for resilvering, scrubs, and checksum errors.
- If a resilver or scrub is running, accept degraded performance or reschedule. Tuning won’t fix physics.
Second: Are you I/O-bound or CPU-bound?
- Run zpool iostat -v 1 and iostat -x 1.
- If devices are at high %util and await climbs: I/O-bound.
- If devices are cool but CPU is hot: CPU-bound (often compression/encryption/checksum, or application-level overhead).
Third: Is sync write latency the villain?
- Check the dataset’s sync property.
- Check whether the workload is a database or VM doing fsync-heavy patterns.
- If sync writes dominate and you don’t have a proper SLOG, performance will be “fine” until it is not.
Fourth: Is your data compressing? Or are you just burning CPU?
- Check compressratio and logicalused vs used.
- If the ratio is ~1.00x on hot datasets and CPU is tight, choose lz4 or disable compression there.
Fifth: Is block sizing wrong for the workload?
- Check recordsize for file datasets.
- Check volblocksize for zvols.
- Mismatch causes amplification, which looks like “encryption overhead” because everything gets slower together.
Three corporate mini-stories from the trenches
1) Incident caused by a wrong assumption: “Encryption broke compression”
A mid-sized SaaS company rolled out ZFS native encryption to satisfy a customer questionnaire. The engineer doing the change
toggled encryption on new datasets and left compression as-is (zstd). A week later, finance noticed storage growth slowing down.
Everyone relaxed. Then latency complaints started: API calls timing out, background jobs slipping, and a “ZFS is slow now” narrative
took over the incident channel.
The wrong assumption was subtle: the team believed encryption would prevent compression and therefore increase I/O. In their head,
storage would get busier, so they focused on the pool: vdev layout, queue depth, “maybe we need more NVMe.” Meanwhile, the pool
graphs looked boring—low utilization, no obvious saturation.
The fix came from someone who asked an unfashionable question: “Is the CPU doing something different?” They checked
compressratio and saw compression was still effective. They also noticed the app nodes had become CPU-tight because a separate
deployment had increased TLS termination overhead on the same hosts running storage services. The encryption change was the straw,
not the truck.
They moved the storage services onto hosts with more per-core headroom and adjusted compression from zstd to lz4 on the busiest
random-write dataset. Latency stabilized. The incident write-up was blunt: encryption didn’t kill compression; capacity planning
killed capacity planning.
2) Optimization that backfired: “zstd-19 everywhere”
An enterprise IT team migrated from a legacy array to ZFS and loved the savings from compression. Someone read that zstd has great
ratios and decided to standardize on a high level across all datasets, including VM storage and a busy build cache.
The immediate results looked good on the dashboard: compression ratio improved, and storage growth charts made everyone look smart.
Then the helpdesk started getting “VM is sluggish” tickets that were hard to reproduce. The build farm started missing SLAs by small
amounts—just enough to annoy teams without triggering a big incident.
The backfire was classic: they improved space efficiency at the cost of per-I/O CPU latency. High compression levels increase CPU time
per block. On sequential workloads, you can hide that with throughput. On random IO, you can’t. Each I/O now carried a small CPU tax,
and the sum of small taxes is how you create systemic jitter.
They rolled VM datasets back to lz4, kept zstd (moderate) for log archives, and learned the hard rule: compression is a workload feature,
not a global religion.
3) The boring but correct practice that saved the day: key loading and replication discipline
A company running multiple sites used encrypted datasets and replicated them nightly. Their “boring” practice was twofold:
(1) keys were managed with a consistent hierarchy, and (2) replication was tested quarterly with a restore drill. No heroics, just repetition.
A storage node failed hard—hardware-level, no romantic story. They promoted the replica, imported the pool, and brought services up.
The restoration was not fast, but it was predictable. Most importantly, it didn’t become a key-management fiasco.
The saving detail: they used raw sends for offsite copies and ensured that the recovery site had the necessary keys (and the process
to load them) documented and rehearsed. The receiver didn’t need to decrypt the raw stream to store it, which reduced the “moving parts”
in the middle of a stressful event.
The postmortem had no fireworks. It was almost disappointing. That’s what success looks like in production: uneventful recovery and
no late-night cryptography improv.
Common mistakes: symptoms → root cause → fix
These are recurring patterns. If you see the symptom, don’t debate it in Slack. Go straight to the root cause and fix.
1) Symptom: compression ratio stuck near 1.00x, CPU elevated
Root cause: Data is already compressed or encrypted before ZFS sees it (media files, encrypted backups, app-level encryption).
Fix: Use compression=lz4 or compression=off on that dataset. Keep compression on where it actually pays.
2) Symptom: performance dropped after enabling encryption, but disks are not busy
Root cause: CPU-bound: encryption + compression + checksum overhead now competes with application workloads.
Fix: Reduce compression level (zstd → lz4), add CPU headroom, isolate storage services, or scale out. Prove with iostat and ARC stats.
3) Symptom: latency spikes on database writes, especially during peak
Root cause: sync writes (fsync) forcing ZIL behavior; no fast SLOG with power-loss protection, or SLOG is mis-sized/misbehaving.
Fix: Keep sync=standard; add a proper SLOG device; validate with workload tests. Don’t set sync=disabled as a “performance fix” unless you like explaining data loss.
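For reference, attaching a mirrored SLOG looks like this (device names are placeholders; the devices need power-loss protection, or you have built a faster way to lose data):
cr0x@server:~$ zpool add pool log mirror nvme6n1 nvme7n1
cr0x@server:~$ zpool status pool
Then re-measure the fsync-heavy workload; if latency doesn’t move, sync writes weren’t the villain.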
4) Symptom: VM storage feels “randomly slow,” small I/O is terrible
Root cause: Wrong volblocksize causing write amplification; sometimes combined with high compression levels.
Fix: Recreate zvol with correct volblocksize (often 8K/16K depending on workload) and migrate data; use lz4 for VM zvols unless you have measured otherwise.
5) Symptom: system boots, but datasets won’t mount; services fail “mysteriously”
Root cause: Encrypted datasets with keys not loaded at boot; keylocation=prompt on headless systems; missing automation.
Fix: Implement a key loading mechanism consistent with your threat model (file-based with restricted permissions, external agent, manual prompt with runbooks) and test reboot recovery.
6) Symptom: replication is slow, CPU on sender or receiver spikes
Root cause: Non-raw sends require decrypt/re-encrypt or recompression; also heavy compression on live datasets can add CPU cost during send.
Fix: Use raw sends for encrypted backups when the receiver doesn’t need plaintext. Consider tuning compression level on datasets with aggressive change rates.
Checklists / step-by-step plan
Checklist A: Setting up an encrypted + compressed dataset (production-safe defaults)
- Create a dedicated dataset per workload class (db, logs, media, vm, backups).
- Enable encryption at dataset creation time (design key inheritance intentionally).
- Start with compression=lz4 unless you have a measured reason for zstd.
- Set recordsize to fit the workload (large for sequential; smaller for random reads).
- For zvols, choose volblocksize carefully (and document it).
- Decide sync policy and whether you need a SLOG.
- Reboot test: can the system import the pool and load keys predictably?
Checklist B: Changing compression on an existing dataset without drama
- Measure baseline: latency, CPU, compressratio, and pool iostat under representative load.
- Change the compression property (it applies to new writes; old blocks remain as-is).
- Observe under real load; don’t trust microbenchmarks alone.
- If you need existing data recompressed, plan a rewrite (send/receive into a new dataset, or copy/rsync); see the sketch after this checklist.
- Re-evaluate after a week of typical workload; short tests miss real-world entropy.
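A minimal rewrite sketch for that recompression step, assuming a tolerable cutover window (dataset and snapshot names are illustrative, and you would normally finish with a final incremental send while the application is quiesced):
cr0x@server:~$ zfs snapshot pool/app/log@migrate
cr0x@server:~$ zfs send pool/app/log@migrate | zfs receive -o compression=lz4 pool/app/log-new
cr0x@server:~$ zfs rename pool/app/log pool/app/log-old && zfs rename pool/app/log-new pool/app/log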
Checklist C: Encryption operational hygiene (the part that prevents “it won’t mount”)
- Standardize keyformat and keylocation across environments where possible.
- Document key loading steps and who has access.
- Test recovery: import pool, load keys, mount datasets, start services.
- Validate replication mode (raw vs non-raw) matches security goals.
- Audit inheritance: make sure sensitive child datasets didn’t accidentally stay unencrypted.
FAQ
1) Does ZFS encrypt before it compresses?
For ZFS native encryption, ZFS compresses first, then encrypts. That’s why compression still works on encrypted datasets.
2) Why is compression ineffective on some encrypted datasets?
Usually because the data was already encrypted or compressed before ZFS saw it (application-level encryption, media formats,
compressed backups). ZFS can’t compress randomness.
3) Should I use zstd or lz4 with encrypted datasets?
Default to lz4 for hot, latency-sensitive datasets. Use zstd when you’ve measured meaningful space savings and have CPU budget.
Encryption doesn’t change that rule; it just adds another CPU consumer.
4) If I change compression, does ZFS recompress existing data?
No. The property affects newly written blocks. To recompress existing data, you need to rewrite it (copy it, or send/receive into a new dataset).
5) Does encryption hurt ARC caching?
Encryption changes what ZFS must do on reads/writes, but ARC effectiveness mostly depends on access patterns and working set size.
If you’re missing ARC a lot, you’ll feel it regardless of encryption.
6) Is raw send the right choice for encrypted backups?
If the backup target doesn’t need to access plaintext, raw send is excellent: it preserves encryption and avoids key exposure on the receiver.
If you need restores that mount and read data at the receiver, plan key management accordingly.
7) Can I enable encryption on an existing unencrypted dataset?
Not in-place in the sense people hope for. Practically, you create a new encrypted dataset and migrate data (send/receive or copy),
then cut over.
8) What’s the biggest performance knob besides compression algorithm?
Block sizing: recordsize for file datasets and volblocksize for zvols. Wrong sizing creates amplification that no algorithm saves you from.
9) Should I set sync=disabled to make encrypted datasets fast?
No, not as a general move. It trades durability for speed and turns certain failures into data loss. If sync latency is the issue,
solve it with proper SLOG or application tuning.
10) How do I know if encryption overhead is my bottleneck?
If disks are not saturated, ARC isn’t the issue, and CPU is pegged during I/O-heavy operations, encryption (plus compression/checksums)
may be contributing. Confirm with system CPU metrics and compare behavior with lz4 vs heavier compression.
Conclusion: practical next steps
The “order that makes performance” is simple: reduce bytes while they’re still compressible, then encrypt. ZFS native encryption
already does this. Your job is not to fight it; your job is to make sure the rest of the system—CPU headroom, block sizing, sync policy,
and key operations—doesn’t sabotage the win.
Next steps you can execute this week
- Inventory datasets: encryption on/off, compression algorithm, compressratio, recordsize/volblocksize.
- Identify hot datasets with compressratio ~1.00x and decide whether compression should be lz4 or off.
- Run the fast diagnosis playbook during peak load once, record results, and keep them as your baseline.
- Pick one workload class (VMs or databases) and validate block sizing; plan migrations for the worst offenders.
- Test a reboot + key load + service start sequence in a controlled window. If it’s not boring, it’s not done.