ZFS ARC: How ZFS Uses RAM (and Why ‘Free RAM’ Is a Myth)

If you run ZFS long enough, someone will eventually panic over RAM “disappearing.”
They’ll show you a dashboard where “free memory” is near zero, and they’ll say the sentence every SRE hears in their sleep:
“ZFS is eating all our RAM.”

Here’s the punchline: ZFS is supposed to use your RAM. ZFS treats memory as an accelerator pedal, not a museum exhibit.
The trick isn’t to keep RAM “free.” The trick is to keep it useful—without starving your applications, without triggering swap storms,
and without confusing healthy caching for a leak.

ARC in one sentence

ARC (Adaptive Replacement Cache) is ZFS’s in-memory cache for recently and frequently used blocks—data and metadata—
designed to turn expensive disk reads into cheap RAM reads, adapting dynamically to your workload.

Joke #1: “Free RAM” is like an empty seat on a plane—comforting to look at, but you’re paying for it either way.

Why “free RAM” is a myth (in production terms)

On a modern OS, unused RAM is wasted opportunity. The kernel aggressively uses memory for caches: filesystem cache,
inode/dentry cache, slab allocations, and ZFS’s ARC. This is not a ZFS quirk; it’s basic economics.
RAM is the fastest storage tier you own.

The confusion comes from dashboards and simplistic metrics. Many graphs still treat “used memory” as suspicious and
“free memory” as safe. That’s backwards. What you want is:

  • Low swap activity (especially sustained swap-in/out).
  • Healthy page reclaim behavior (no thrashing, no OOM kills).
  • Stable application latencies under load.
  • Predictable ARC behavior relative to your workload.

ARC can be large and still harmless—if it can shrink when memory pressure arrives, and if it isn’t pushing your workloads into swap.
Conversely, you can have “free RAM” and still be slow, if your working set doesn’t fit cache and your disks are doing random I/O.

A practical way to think about it: free RAM is not a goal; reclaimable RAM is. ARC should be reclaimable under pressure,
and the system should remain stable when it is reclaimed.
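
A quick sanity check of that distinction on Linux (assuming a kernel recent enough to expose MemAvailable):

cr0x@server:~$ grep -E 'MemTotal|MemFree|MemAvailable' /proc/meminfo

MemFree is the trophy number; MemAvailable is the kernel's estimate of what it could reclaim without swapping. Depending on kernel and OpenZFS version, ARC may or may not be fully reflected in MemAvailable, so on ZFS hosts read it alongside arcstats rather than in isolation.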

What ARC actually is (and what it isn’t)

ARC is not the Linux page cache (but it competes with it)

If you’re on Linux with OpenZFS, you effectively have two caching systems in play:
the Linux page cache and the ZFS ARC. ZFS uses ARC for ZFS-managed storage. Linux uses page cache for file I/O
and anything else backed by the kernel’s caching mechanisms.

Depending on your workload and I/O path, you can end up double-caching or fighting for memory.
For example, a database that already implements its own buffer cache can end up paying for caching three times:
the application cache, ARC, and (sometimes) the Linux page cache, depending on whether its I/O is direct, buffered, or mmap-based.
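
A rough way to see both caches side by side (paths assume Linux with OpenZFS; the page cache figure mostly reflects non-ZFS file I/O and mmap'd files, since ordinary ZFS reads bypass the page cache):

cr0x@server:~$ awk '/^size / {print "ARC bytes:", $3}' /proc/spl/kstat/zfs/arcstats
cr0x@server:~$ grep -E '^Cached:' /proc/meminfo

If both numbers are large on a box that mainly serves ZFS, it's worth asking what the page cache is holding and whether you're caching the same data twice.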

ARC is adaptive (and it’s picky about what it keeps)

ARC isn’t a dumb “keep the last N blocks” cache. It uses an adaptive replacement algorithm designed to balance:

  • Recency: “I used this recently, I might use it again soon.”
  • Frequency: “I use this repeatedly over time.”

That matters in real systems. Consider:
a nightly batch job that scans terabytes sequentially (recency-heavy, low reuse) versus a VM datastore with hot blocks
read repeatedly (frequency-heavy). ARC’s goal is to avoid being “polluted” by one-time scans while still capturing
genuinely hot data.
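
You can watch that recency/frequency balance directly: arcstats exposes MRU/MFU hit counters, plus ghost-list hits for blocks ARC evicted and later wished it hadn't. Field names can vary slightly between OpenZFS versions:

cr0x@server:~$ grep -E '^(mru_hits|mru_ghost_hits|mfu_hits|mfu_ghost_hits)' /proc/spl/kstat/zfs/arcstats

A steady stream of ghost hits is a hint that the working set is larger than the cache: ARC keeps evicting blocks it soon needs again.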

ARC caches metadata, which can matter more than data

In many production ZFS deployments, metadata caching is the difference between “snappy” and “why is ls hanging?”
Metadata includes dnodes, indirect blocks, directory structures, and various lookup structures.

If you’ve ever watched a storage system fall over under “small file workloads,” you’ve met metadata the hard way.
You can have plenty of disk throughput and still stall because the system is doing pointer-chasing on disk
for metadata that should have been in RAM.
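
To see how much of ARC is metadata right now versus data, arcstats breaks the cache down by type (exact field names depend on the OpenZFS version):

cr0x@server:~$ grep -E '^(data_size|metadata_size|dnode_size|dbuf_size|hdr_size)' /proc/spl/kstat/zfs/arcstats

If metadata is a sliver of a large ARC and directory operations are still slow, suspect cache pollution or a working set of dnodes that simply doesn't fit.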

What lives in ARC: data, metadata, and the stuff that surprises people

Data blocks vs metadata blocks

ARC contains cached blocks from your pool. Some are file data. Some are metadata.
The mix changes with workload. A VM farm tends to have a lot of repeated reads and metadata churn; a media archive
might mostly stream big blocks once; a Git server can be metadata-heavy with bursts.

Prefetch can help, and it can also set your RAM on fire

ZFS does read-ahead (prefetch) in various scenarios. When it works, it turns sequential reads into smooth throughput.
When it misfires—like when your “sequential” workload is actually many interleaved streams—it can flood ARC with
data that won’t be reused.

Real consequence: you can evict useful metadata to make room for useless prefetched data.
Then everything else gets slower, and people blame “ZFS overhead” when it’s actually cache pollution.
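
Prefetch behavior is observable, and on Linux it can be disabled via a module parameter for a controlled experiment (treat that as a test, not a default):

cr0x@server:~$ grep -E '^(prefetch_data_hits|prefetch_data_misses|prefetch_metadata_hits|prefetch_metadata_misses)' /proc/spl/kstat/zfs/arcstats
cr0x@server:~$ cat /sys/module/zfs/parameters/zfs_prefetch_disable

If prefetch traffic is heavy while demand metadata misses climb and latency worsens, cache pollution is a reasonable suspect; if disabling prefetch tanks your sequential throughput, turn it back on.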

Compressed blocks and the RAM math people get wrong

ZFS stores compressed blocks on disk, and modern OpenZFS keeps most cached blocks compressed in ARC as well (the compressed ARC feature).
The operational truth is: compression changes your mental model. If you compress well, the effective cache capacity
can increase because more logical data fits per physical byte of RAM. But the CPU cost of decompressing on access and the memory
overhead for bookkeeping are real, and they show up at scale.
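
Compressed ARC is visible in arcstats: compare the logical and physical footprint of what's cached (these fields exist on reasonably recent OpenZFS releases):

cr0x@server:~$ grep -E '^(compressed_size|uncompressed_size|overhead_size)' /proc/spl/kstat/zfs/arcstats

A large gap between uncompressed_size and compressed_size means ARC is effectively holding more logical data than its physical size suggests; overhead_size roughly tracks buffers currently held in uncompressed form.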

Dedup: the “hold my beer” of RAM consumption

Deduplication in ZFS is famous for being both powerful and dangerous. The dedup table (DDT) needs fast access, and
fast access means memory. If you enable dedup without enough RAM for the DDT working set, you can turn a storage system
into a random I/O generator with a side hustle in misery.

Joke #2: Enabling dedup on a RAM-starved box is like adopting a tiger because you got a good deal on cat food.
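
If you're evaluating dedup (or auditing a pool that already has it), measure the DDT instead of guessing. zpool status -D prints dedup table statistics for a pool with dedup in use, and zdb -S can simulate the DDT for a pool that doesn't have it enabled yet; zdb -S reads a lot of metadata and can take a long time on large pools. The pool name below is an example:

cr0x@server:~$ zpool status -D tank
cr0x@server:~$ sudo zdb -S tank

The entry count times the per-entry overhead gives you a rough RAM budget for keeping the DDT hot; if that doesn't comfortably fit alongside your normal ARC working set, dedup is the wrong tool for that pool.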

How ARC grows, shrinks, and sometimes refuses to “let go”

ARC sizing is a negotiation with the OS

ARC has a target size range, governed by tunables such as zfs_arc_min and zfs_arc_max.
The kernel will apply pressure when other subsystems need memory.
In a well-behaved system, ARC grows when RAM is available and shrinks when it’s needed elsewhere.

In a poorly understood system, people see ARC at “max” and assume it’s a leak. Usually it’s not.
ARC is behaving as designed: it found spare memory and used it to make reads faster.
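
On Linux, those knobs are exposed as module parameters; a value of 0 generally means the built-in default applies:

cr0x@server:~$ cat /sys/module/zfs/parameters/zfs_arc_min
cr0x@server:~$ cat /sys/module/zfs/parameters/zfs_arc_max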

Why ARC sometimes doesn’t shrink “fast enough”

There are cases where ARC shrink can lag behind sudden memory demand:
sudden container bursts, JVM heap expansions, or an emergency page cache expansion for a non-ZFS workload.
ARC is reclaimable, but not necessarily instantly reclaimable, and the path from “pressure” to “bytes freed”
can have latency.

When that latency meets aggressive workloads, you see swapping, kswapd CPU burn, and tail latency spikes.
That’s when ARC becomes politically unpopular.
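
When you suspect that lag, watch memory pressure and ARC size side by side during the event; a crude loop like this (Linux paths, OpenZFS) is usually enough to see whether ARC is shrinking while PSI climbs:

cr0x@server:~$ while true; do date '+%T'; grep some /proc/pressure/memory; awk '/^size / {print "arc_size", $3}' /proc/spl/kstat/zfs/arcstats; sleep 5; done

If PSI rises for tens of seconds while ARC size barely moves, you have evidence for the "not instantly reclaimable" story rather than a hunch.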

The special pain of virtualization and “noisy neighbors”

In hypervisor setups (or large container hosts), memory accounting gets messy. Guests have their own caching.
The host has ARC. The host may also have page cache for other files. If you oversubscribe memory or allow
ballooning/overcommit without guardrails, ARC becomes the scapegoat for fundamentally bad capacity planning.

Facts & history: how we got here

  • ZFS was born at Sun Microsystems as a “storage pool + filesystem” design, not a bolt-on volume manager.
  • ARC is based on the Adaptive Replacement Cache algorithm, which improved on simple LRU by balancing recency and frequency.
  • ZFS popularized end-to-end checksumming for data integrity, which increases metadata work—and makes caching metadata more valuable.
  • The “use RAM as cache” philosophy predates ZFS; Unix kernels have long used spare memory for caching, but ZFS made it impossible to ignore.
  • Early ZFS guidance was “RAM is king” partly because disks were slower and random I/O was brutally expensive compared to RAM.
  • L2ARC (the secondary cache) arrived to extend caching onto fast devices, but it’s not free: it needs metadata in ARC to be useful.
  • Dedup became notorious because it moved a traditionally offline storage optimization into a real-time, RAM-hungry code path.
  • OpenZFS brought ZFS to Linux and other platforms, where it had to coexist with different VM and cache subsystems, changing tuning realities.
  • NVMe changed the game: disks got fast enough that bad caching decisions are sometimes less obvious—until you hit tail latency.

Three corporate-world mini-stories

Mini-story #1: The incident caused by a wrong assumption (“Free RAM is healthy”)

A mid-sized company ran a ZFS-backed NFS cluster serving home directories and build artifacts.
A new manager—smart, fast-moving, and freshly trained on a different stack—rolled out a memory “hardening” change:
cap ARC aggressively so that “at least 40% RAM stays free.” It sounded reasonable in a spreadsheet.

The first week looked fine. The dashboards were comforting: lots of green, lots of “free memory.”
Then the quarterly release cycle hit. Build jobs fanned out, small files exploded, and the NFS servers started
stuttering. Not down, just slow enough to make everything else feel broken.

The symptom that sent the incident into full bloom wasn’t “high disk utilization.” It was the ugly kind:
iowait climbing, latency for simple metadata ops spiking, and the NFS threads piling up.
The pool wasn’t saturated on throughput. It was drowning in random reads for metadata that used to sit happily in ARC.

The postmortem wasn’t about blame. It was about a wrong assumption: “free” memory is not a stability indicator.
The fix was equally unsexy: allow ARC to grow, but set a sane upper bound based on actual application headroom,
and add alerting on swap activity and ARC eviction rates—not on “free RAM.”

The lesson: if you treat RAM like a trophy, ZFS will treat your disks like a scratchpad.

Mini-story #2: The optimization that backfired (L2ARC everywhere)

Another shop had a ZFS pool supporting a virtualization cluster. Reads were the pain point, so someone proposed
adding L2ARC devices. They had spare SSDs, and the plan was simple: “Add L2ARC and the cache hit ratio will soar.”
It’s an easy sell because it’s tangible hardware.

They added a big L2ARC, watched the graphs, and… nothing magical happened. In fact, under certain workloads,
latency got worse. It wasn’t catastrophic; it was insidious. The VMs felt “sticky” during morning boot storms,
and random workloads got spikier.

The culprit wasn’t the SSDs. It was memory. L2ARC needs metadata in ARC to be effective. The larger the L2ARC,
the more ARC overhead you spend indexing it. On a host that was already RAM-tight, the extra pressure pushed ARC
into more frequent evictions of exactly the metadata the system needed most.

The rollback wasn’t dramatic. They reduced L2ARC size, added RAM on the next refresh cycle, and adjusted expectations:
L2ARC helps best when the working set is larger than RAM but still “cacheable,” and when you can afford the memory overhead.
Otherwise, you’ve built a very expensive way to make your cache less stable.

The lesson: caching is not additive; it’s a budget. If you spend it twice, you go broke in latency.

Mini-story #3: The boring practice that saved the day (measuring before tuning)

A financial services team ran ZFS for a file ingestion pipeline. They weren’t the loudest team, but they were disciplined.
Their practice was painfully boring: before any “tuning,” they captured a baseline bundle of metrics—ARC stats, IO latency,
swap activity, and per-dataset recordsize/compression settings. Every change came with a before/after comparison.

One afternoon, ingestion latency doubled. The easy blame target was ARC: “Maybe the cache is thrashing.”
But their baseline told a different story. ARC hit ratio was stable. Evictions weren’t unusual. What changed was
memory pressure: a new sidecar process had been deployed with an unbounded heap.

The system wasn’t failing because ARC was greedy; it was failing because the host was overcommitted.
ARC was doing what it could—shrinking under pressure—but the other process kept expanding, pushing the box into swap.
Their graphs showed it clearly: swap-in/out rose first, then latency followed, then CPU time in reclaim.

The fix wasn’t an arcane ZFS tunable. It was a resource limit and a rollback of a bad deployment.
The boring practice—capturing baselines and watching the right indicators—kept them from making it worse by blindly
strangling ARC.

The lesson: most “ZFS memory problems” are actually system memory problems wearing a ZFS hat.

Practical tasks: commands, outputs, and what they mean

The goal here is not to memorize commands. It’s to build muscle memory:
verify the workload, confirm memory pressure, then decide whether ARC is helping or harming.
Commands below assume a Linux system with OpenZFS installed; adjust paths for other platforms.

Task 1: See overall memory reality (not “free RAM” panic)

cr0x@server:~$ free -h
              total        used        free      shared  buff/cache   available
Mem:           251Gi       41Gi       2.1Gi       1.2Gi       208Gi       198Gi
Swap:           16Gi          0B        16Gi

Interpretation: “Free” is low, but “available” is huge. That usually means the system is caching aggressively and can reclaim memory.
If swap is quiet and “available” is healthy, this is probably fine.

Task 2: Check if swapping is actually happening

cr0x@server:~$ vmstat 1 5
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  0      0 2212140  10240 189512000  0    0    12    44  520  880  6  2 91  1  0
 1  0      0 2198840  10240 189625000  0    0     0     0  500  860  5  2 92  1  0
 4  1      0 2101020  10240 189540000  0    0   220   180 1100 1600 10  4 74 12  0
 3  0      0 2099920  10240 189520000  0    0   140   120  920 1300  8  3 81  8  0
 2  0      0 2103200  10240 189610000  0    0    30    60  600  900  5  2 91  2  0

Interpretation: Watch si/so (swap in/out). Non-zero sustained values mean the box is under memory pressure.
A little I/O wait (wa) isn’t automatically ARC’s fault; correlate with ARC misses and disk latency.

Task 3: Read ARC size and limits directly

cr0x@server:~$ grep -E '^(c|size|c_min|c_max|memory_throttle_count) ' /proc/spl/kstat/zfs/arcstats
c                             4    214748364800
c_min                         4    10737418240
c_max                         4    214748364800
size                          4    198742182912
memory_throttle_count         4    0

Interpretation: size is current ARC size. c is the target. c_max is the cap.
If memory_throttle_count climbs, ARC has experienced memory pressure events worth investigating.

Task 4: Check ARC hit/miss behavior (is ARC helping?)

cr0x@server:~$ grep -E '^(hits|misses|demand_data_hits|demand_data_misses|demand_metadata_hits|demand_metadata_misses)' /proc/spl/kstat/zfs/arcstats
hits                           4    18230933444
misses                         4    1209933221
demand_data_hits               4    12055411222
demand_data_misses             4    902331122
demand_metadata_hits           4    5800122201
demand_metadata_misses         4    307602099

Interpretation: High hits relative to misses is good, but don’t worship hit ratio.
What matters is latency and disk load. A “good” ratio can still be too slow if your misses are expensive (random HDD reads),
and a “bad” ratio can be acceptable if your pool is NVMe and your workload is streaming.

Task 5: Watch ARC live with arcstat (when installed)

cr0x@server:~$ arcstat 1 5
    time  read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%  arcsz     c
12:01:01   812    42      5    30   4     8   1     4   0   185G   200G
12:01:02   900    55      6    40   4    10   1     5   1   185G   200G
12:01:03  1100   220     20   190  17    18   2    12   1   184G   200G
12:01:04   980   180     18   150  15    20   2    10   1   184G   200G
12:01:05   860    60      7    45   5    10   1     5   1   184G   200G

Interpretation: A spike in misses during a batch scan is normal. A persistent miss storm during “steady state”
often means your working set doesn’t fit, prefetch pollution, or a workload shift (new dataset, new access pattern).

Task 6: Check for memory reclaim stress (Linux)

cr0x@server:~$ cat /proc/pressure/memory
some avg10=0.00 avg60=0.05 avg300=0.12 total=1843812
full avg10=0.00 avg60=0.00 avg300=0.00 total=0

Interpretation: PSI memory “some” indicates time spent stalled due to memory pressure; “full” is worse (tasks fully blocked).
Rising PSI alongside swap activity and ARC at cap is a signal to revisit memory budgets.

Task 7: Confirm pool health and obvious bottlenecks

cr0x@server:~$ zpool status -xv
all pools are healthy

Interpretation: Don’t tune caches on a sick pool. If you have errors, resilvering, or degraded vdevs, your “ARC problem”
may be a “hardware problem.”

Task 8: Observe I/O latency, not just throughput

cr0x@server:~$ zpool iostat -v 1 3
                              capacity     operations     bandwidth
pool                        alloc   free   read  write   read  write
tank                        48.2T  21.6T   4105    540   342M    41M
  raidz2-0                  48.2T  21.6T   4105    540   342M    41M
    sda                          -      -   1020    130    85M    10M
    sdb                          -      -   1015    135    86M    11M
    sdc                          -      -   1040    140    86M    10M
    sdd                          -      -   1030    135    85M    10M

Interpretation: This shows operations and bandwidth, but not latency. If things “feel slow,” pair this with tools like
iostat -x to see await/util, and correlate with ARC misses.

Task 9: Check device latency with iostat

cr0x@server:~$ iostat -xm 1 3
Device            r/s     w/s   rMB/s   wMB/s  rrqm/s  wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm  %util
sda            1020.0   130.0    85.0    10.0     0.0     8.0   0.00   5.80   9.10   6.20   9.50    85.3    79.1  0.40  46.0
sdb            1015.0   135.0    86.0    11.0     0.0     9.0   0.00   6.20   9.40   6.10   9.60    86.7    83.1  0.41  46.5

Interpretation: Rising r_await and high %util during ARC miss spikes means your disks are paying for cache misses.
If latency is already low (e.g., NVMe), ARC misses may not be the villain.

Task 10: Identify which datasets are configured to behave expensively

cr0x@server:~$ zfs get -o name,property,value -s local,received recordsize,primarycache,secondarycache,compression tank
NAME  PROPERTY        VALUE
tank  compression     lz4
tank  primarycache    all
tank  recordsize      128K

Interpretation: primarycache=all means both data and metadata are cached in ARC.
For some workloads (databases with their own caches, large streaming reads), you might set primarycache=metadata to reduce ARC pressure.
Don’t do it by superstition—measure.

Task 11: Check whether a workload is bypassing cache expectations

cr0x@server:~$ zfs get -o name,property,value atime,sync,logbias,primarycache tank/vmstore
NAME         PROPERTY     VALUE
tank/vmstore atime        off
tank/vmstore sync         standard
tank/vmstore logbias      latency
tank/vmstore primarycache all

Interpretation: Settings like sync and logbias won’t change ARC directly, but they change I/O behavior.
If writes are slow and causing backpressure, reads can suffer and the “ARC debate” becomes a distraction.

Task 12: Set a temporary ARC cap (carefully) for experiments

cr0x@server:~$ sudo sh -c 'echo $((64*1024*1024*1024)) > /sys/module/zfs/parameters/zfs_arc_max'
cr0x@server:~$ cat /sys/module/zfs/parameters/zfs_arc_max
68719476736

Interpretation: This sets ARC max to 64 GiB (value is bytes). Use this to test headroom hypotheses, not as a permanent “fix.”
Permanent tuning should be done via your distro’s module parameter configuration so it persists across reboots.
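
A minimal sketch of the persistent version, assuming a Debian/Ubuntu-style system (the file name and the 64 GiB value are examples; this overwrites any existing zfs.conf, and other distros rebuild the initramfs differently or not at all):

cr0x@server:~$ echo 'options zfs zfs_arc_max=68719476736' | sudo tee /etc/modprobe.d/zfs.conf
cr0x@server:~$ sudo update-initramfs -u

The module parameter in /sys still works for live changes; the modprobe.d entry is what makes the decision survive a reboot.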

Task 13: Force the question—does performance track ARC size or disk latency?

cr0x@server:~$ sudo sh -c 'echo $((128*1024*1024*1024)) > /sys/module/zfs/parameters/zfs_arc_max'
cr0x@server:~$ sleep 10
cr0x@server:~$ grep -E '^(size|c|c_max) ' /proc/spl/kstat/zfs/arcstats
c                             4    137438953472
c_max                         4    137438953472
size                          4    132001234944

Interpretation: Raise ARC cap temporarily and watch whether latency improves and misses drop.
If nothing changes, your bottleneck may be write path, CPU, network, or the application’s access pattern.

Task 14: Spot ARC-related kernel messages

cr0x@server:~$ dmesg -T | grep -i -E 'arc|spl|zfs' | tail -n 10
[Thu Dec 25 09:44:10 2025] ZFS: Loaded module v2.2.4-1
[Thu Dec 25 10:02:31 2025] ZFS: ARC size 197G, target 200G, min 10G, max 200G

Interpretation: You’re looking for warnings about memory throttling, failures to allocate, or repeated reclaim events.
If logs are noisy, you’re beyond “ARC is large” and into “ARC is fighting the kernel.”

Task 15: Determine if your workload is dominated by metadata

cr0x@server:~$ grep -E '^(demand_metadata_hits|demand_metadata_misses|demand_data_hits|demand_data_misses)' /proc/spl/kstat/zfs/arcstats
demand_data_hits               4    12055411222
demand_data_misses             4    902331122
demand_metadata_hits           4    5800122201
demand_metadata_misses         4    307602099

Interpretation: If metadata misses are significant and correlate with slow directory ops, slow file opens, or high IOPS on disks,
prioritize keeping metadata warm: avoid cache pollution, avoid tiny ARC caps, and consider workload-specific dataset caching settings.

Fast diagnosis playbook

When someone pings you with “ZFS is using all the RAM” or “ZFS is slow,” you don’t have time for a philosophy seminar.
You need a short, reliable sequence that finds the bottleneck quickly.

Step 1: Is the host under memory pressure or just caching?

  • Check free -h and focus on available, not free.
  • Check vmstat 1 for sustained si/so > 0.
  • Check PSI memory (/proc/pressure/memory) for rising “some/full.”

If swap is active and PSI is rising, you have a memory pressure problem. ARC may be involved, but it’s rarely the only actor.

Step 2: Are reads slow because ARC is missing, or because disks are slow anyway?

  • Check ARC hits/misses (/proc/spl/kstat/zfs/arcstats or arcstat).
  • Check disk latency (iostat -x 1) and pool behavior (zpool iostat 1).

If ARC misses spike and disk latency spikes, your cache isn’t covering your working set—or it’s being polluted.
If ARC misses spike but disks stay low-latency, your performance complaint might be elsewhere (CPU, network, app).

Step 3: Is the workload changing the cache economics?

  • Look for large scans, backups, reindex jobs, VM boot storms, or replication.
  • Identify whether metadata misses increased (small file workload, millions of inodes).
  • Review dataset properties: primarycache, recordsize, compression, sync/logbias.

Many “ARC incidents” are really “a batch job happened” incidents. Your response should be to isolate or schedule
the batch job, not to permanently cripple caching.

Step 4: Decide: tune ARC limits, tune workload, or add RAM

  • If the host is swapping: reserve headroom (cap ARC) and fix the memory hog.
  • If disks are saturated by misses: increase effective cache (more RAM, better caching policy, reduce pollution).
  • If latency is fine: stop touching it and move to the real bottleneck.

Checklists / step-by-step plan

Checklist A: “Is ARC harming my applications?”

  1. Confirm swap activity:
    cr0x@server:~$ vmstat 1 10

    Look for sustained si/so and rising wa.

  2. Confirm memory availability:
    cr0x@server:~$ free -h

    If available is low and dropping, you’re actually short on memory.

  3. Check ARC cap and size:
    cr0x@server:~$ grep -E '^(size|c_max|c_min|memory_throttle_count)' /proc/spl/kstat/zfs/arcstats

    If ARC is at cap and throttle count is climbing, consider headroom changes.

  4. Correlate with app latency and OOM logs:
    cr0x@server:~$ dmesg -T | tail -n 50

    If you see OOM kills, ARC sizing is not the root cause; overcommit is.

Checklist B: “Is ARC too small for this workload?”

  1. Measure ARC misses during the complaint window:
    cr0x@server:~$ arcstat 1 30

    Persistent misses during normal steady load is a red flag.

  2. Check disk latency at the same time:
    cr0x@server:~$ iostat -x 1 30

    If awaits climb during miss storms, the pool is paying for it.

  3. Test a controlled ARC max increase (if you have headroom):
    cr0x@server:~$ sudo sh -c 'echo $((192*1024*1024*1024)) > /sys/module/zfs/parameters/zfs_arc_max'

    Watch whether latency improves and misses drop. If yes, the fix is usually “more RAM or better isolation.”

Checklist C: “Keep metadata hot, stop cache pollution”

  1. Identify scan-like jobs (backup, scrub, rsync, reindex) and schedule them off-peak.
  2. Consider dataset primarycache=metadata for streaming datasets that don’t benefit from data caching:
    cr0x@server:~$ sudo zfs set primarycache=metadata tank/archive

    This can reduce ARC churn while keeping directory traversal fast.

  3. Validate with ARC stats: metadata misses should drop; disk IOPS should stabilize.

Common mistakes (symptoms and fixes)

Mistake 1: Alerting on “free RAM”

Symptoms: Constant pages to on-call, no actual performance issue, pressure to “fix ZFS memory.”

Fix: Alert on swap activity (vmstat si/so), PSI memory “full,” OOM events, and application latency.
Use “available” memory, not “free,” in dashboards.

Mistake 2: Capping ARC without measuring the workload

Symptoms: Disk IOPS jumps, metadata-heavy operations slow down, “everything feels laggy” during peak.

Fix: Restore a reasonable ARC cap; measure ARC misses and disk latency.
If you need headroom for apps, cap ARC based on a budget (apps + kernel + safety margin), not a percentage of “free.”

Mistake 3: Treating ARC hit ratio as the KPI

Symptoms: People celebrate a high hit rate while tail latency is terrible; or they panic over low hit rate on streaming workloads.

Fix: Prioritize latency and swap health. Hit ratio is context-dependent.
A media streamer can have a low hit ratio and still be fast; a metadata-heavy NFS server cannot.

Mistake 4: Enabling dedup because it “saves space”

Symptoms: Sudden performance collapse, high random reads, memory pressure, slow writes, DDT-related overhead.

Fix: Don’t enable dedup without a real capacity/performance model and memory budget.
If already enabled and suffering, plan a migration strategy; “turning it off” is not instant on existing blocks.

Mistake 5: Throwing L2ARC at the problem on a RAM-tight system

Symptoms: No improvement or worse latency; ARC pressure increases; metadata misses persist.

Fix: Ensure adequate RAM first; keep L2ARC appropriately sized; validate that the workload has reuse.
If your workload is mostly one-time reads, L2ARC is an expensive placebo.

Mistake 6: Ignoring write path issues and blaming ARC

Symptoms: Reads slow “sometimes,” but the real trigger is sync writes, commit latency, or a saturated SLOG/write vdev.

Fix: Measure end-to-end: zpool iostat, device latency, and application write patterns.
Fix write bottlenecks; don’t micromanage ARC to compensate.

Mistake 7: Running mixed workloads without isolation

Symptoms: Backup jobs ruin interactive workloads; VM boot storms crush file services; cache churn.

Fix: Isolate workloads by host, pool, or schedule. Use dataset-specific cache policy where appropriate.
Consider cgroups memory limits for noisy services on Linux.
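
On systemd hosts, a memory cap on the noisy unit can be a single command (the unit name here is an example; pick the limit from your actual memory budget, not from hope):

cr0x@server:~$ sudo systemctl set-property ingest-sidecar.service MemoryMax=8G
cr0x@server:~$ systemctl show ingest-sidecar.service -p MemoryMax

That keeps a runaway process from pushing the whole host (ARC included) off a cliff, and it's far less invasive than strangling the cache.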

FAQ

1) Is it bad if ZFS uses most of my RAM?

Not by itself. It’s bad if the system is swapping, reclaiming aggressively (high PSI “full”), or applications are losing memory and slowing down.
If “available” memory is healthy and swap is quiet, ARC using RAM is usually a feature.

2) Why doesn’t ARC release memory immediately when an app needs it?

ARC is reclaimable, but reclaim has mechanics and timing. Under sudden spikes, ARC may lag behind demand,
and the kernel may swap before ARC has shrunk enough. This is why you budget headroom and avoid operating at the cliff edge.

3) Should I set zfs_arc_max on every system?

If the host runs only ZFS workloads (like a dedicated NAS), defaults often work well.
If it’s a mixed-use host (databases, JVMs, containers), setting a cap can prevent surprise contention.
The right answer is a memory budget: what your apps need under peak, plus safety margin, plus what you can afford for ARC.

4) What’s a “good” ARC hit ratio?

Depends on workload. For streaming reads, a low hit ratio can still deliver high throughput.
For random reads and metadata-heavy workloads, low hit ratio usually means real pain.
Track hit/miss trends and correlate with disk latency and user-visible latency.

5) Is ARC the same as L2ARC?

No. ARC is in RAM. L2ARC is a secondary cache on fast storage (SSD/NVMe). L2ARC can extend caching,
but it needs RAM for metadata and doesn’t help much for one-time reads.

6) If I add more RAM, will ZFS always get faster?

Not always, but often. More RAM helps when your working set is cacheable and misses are expensive.
If you’re bottlenecked on writes, CPU, network, or application design, more ARC won’t save you.

7) Why does my system show low “free” memory even when idle?

Because the OS uses RAM for caches to speed up future work. Idle systems with lots of cache are normal.
Focus on “available” memory and swapping, not “free.”

8) Can I configure ZFS to cache only metadata?

Yes, per dataset with primarycache=metadata. It’s useful for datasets with large streaming reads
that don’t benefit from caching data blocks, while still keeping directory traversal and file lookups fast.
Measure before and after—this can backfire on workloads that actually reuse data.

9) How do I tell if ARC is thrashing?

Look for sustained high ARC misses during steady workload, rising eviction behavior, and disk latency spikes that correlate with misses.
If the system is also swapping, you can get into a vicious cycle: pressure causes ARC churn, which increases I/O, which increases latency.

10) Why did performance drop right after a big backup or scrub?

Large sequential reads can evict useful cached blocks (especially metadata) if the cache is not sized or tuned for mixed workloads.
The fix is usually scheduling, isolation, or preventing cache pollution—not permanently shrinking ARC.

Conclusion

ZFS ARC is not a memory leak wearing a filesystem costume. It’s a deliberate design choice: use RAM to avoid disk I/O,
and adapt to what the workload is doing. The operational mistake is treating “free RAM” as a health metric and
treating ARC size as a moral failure.

When performance is bad, don’t argue about philosophy—measure. Check memory pressure, check swap, check ARC misses,
check disk latency, and identify the workload that changed the game. Then decide: cap ARC for headroom, tune datasets
to avoid pollution, isolate workloads, or buy more RAM. The best ZFS tuning is often the simplest:
let ARC do its job, and make sure the rest of the system isn’t sabotaging it.
