If you’ve ever watched a ZFS host with 256 GB of RAM feel oddly “full” while performance didn’t move an inch, you’ve met the ARC’s bad habit: caching data that doesn’t deserve a second date. The ARC is brilliant at learning what you’ll need again. It’s also obedient. If you keep feeding it a workload that is “read once, never again,” it will happily chew up memory and evict the things that actually matter.
This is where primarycache stops being a footnote and becomes an operational control surface. Done right, it’s a scalpel: per-dataset caching rules that keep ARC hot for metadata and truly reusable blocks, while letting streaming reads and disposable datasets pass through without turning RAM into a museum of yesterday’s backups.
What primarycache really controls (and what it doesn’t)
primarycache is a ZFS dataset property that controls what ZFS is allowed to place into the ARC (the in-memory cache). It has three values:
- all — cache both file data and metadata in ARC (default)
- metadata — cache metadata only (dnode, indirect blocks, etc.)
- none — cache nothing from that dataset in ARC
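A dataset gets the policy with zfs set, or at creation time; a minimal sketch, with pool and dataset names purely illustrative:
cr0x@server:~$ sudo zfs create -o primarycache=metadata tank/backup
cr0x@server:~$ zfs get primarycache tank/backup
Setting the property on a parent means children inherit it unless they explicitly override it, which cuts both ways (more on that below).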
That sounds simple, but there are two subtle operational truths:
- This is about admission, not eviction policy. It decides what kinds of blocks from that dataset are eligible to enter ARC. It doesn’t directly “pin” or “prioritize” anything.
- This is per-dataset, inherited, and easy to abuse. It’s powerful precisely because you can apply it to a backup dataset without punishing your VM dataset. It’s also easy to set it on a parent and forget it, quietly kneecapping performance for everything below.
Also: primarycache is not a magic switch for “use L2ARC instead.” That’s secondarycache. You can set both, and they interact, but they solve different problems: ARC admission vs L2ARC admission.
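For a streaming-heavy dataset, a common combination is to restrict both tiers at once; a sketch, assuming a dataset named tank/media:
cr0x@server:~$ sudo zfs set primarycache=metadata tank/media
cr0x@server:~$ sudo zfs set secondarycache=metadata tank/media
After this, both ARC and L2ARC admission for that dataset are limited to metadata.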
One more gotcha: setting primarycache=none does not mean “ZFS will never use RAM for this dataset.” ZFS still uses memory for bookkeeping, and on Linux the page cache can still hold copies of file data in some paths (mmap-ed files are the classic case). The ARC is only one of several mouths that eat RAM.
Joke #1 (short and relevant): ARC is like an intern with a great memory who never asks why they’re memorizing the entire phone book.
How ARC “wastes” RAM in real life
ARC wasting RAM is usually not a bug. It’s a mismatch between your workload and the cache’s assumptions.
ARC works best when:
- there’s meaningful temporal locality (you read the same blocks again soon), and/or
- metadata reuse is high (directory traversal, small random IO, VM metadata churn), and/or
- your working set can fit or mostly fit in RAM.
ARC works poorly (or at least “expensively”) when:
- you stream huge files once (backups, scrubs, large ETL reads),
- the working set dwarfs memory, causing continual churn,
- the host is also running memory-hungry services (databases, JVMs) that actually need that RAM more.
What does “waste” look like operationally?
- Latency doesn’t improve after the first pass, because there is no second pass.
- ARC grows because ZFS is doing its job: it sees free-ish memory and uses it.
- Other processes get squeezed, leading to swapping, OOM kills, or aggressive reclaim pressure.
- ARC hit ratio looks fine-ish but misses are still expensive because the misses are in the critical path (often metadata or small random reads), and your “hits” are a mountain of irrelevant sequential data.
In a lot of production incidents, the first bad assumption is that “cache hit ratio” is a single number that can validate an entire system. It can’t. You can have a decent hit ratio and still lose because the cache is full of low-value blocks.
Facts & history: why this knob exists
Some context makes this property easier to respect. Here are a few concrete facts and historical notes that influence how primarycache behaves in the wild:
- ZFS was designed with RAM as a first-class performance tier. When ZFS was born at Sun, big-memory servers were part of the target. ARC is not an afterthought; it’s core to the design.
- ARC isn’t “just a cache”; it’s a policy engine. It splits cache into MRU/MFU (recent vs frequent) segments to avoid being fooled by one-time scans.
- Even with scan resistance, sequential workloads still fill ARC. ARC can try not to destroy your hot set during a scan, but it still has to decide what to admit. primarycache is the “don’t admit this stuff” lever.
- The “metadata-only” idea is older than ZFS. Filesystems have long recognized metadata locality as a huge win. ZFS formalized it at the dataset level.
- ZFS datasets are meant to be policy boundaries. Properties like recordsize, compression, sync, and primarycache exist so you can run mixed workloads on one pool without one workload bullying the rest.
- L2ARC historically had a “RAM tax.” Storing L2ARC headers in memory used to be expensive. Newer implementations reduced the overhead, but the lesson remains: caching is never free.
- On Linux, the page cache and ARC can compete. OpenZFS on Linux integrates carefully, but from an operator’s point of view, two caching layers can still be in play (mmap-ed files are the usual way data lands in both).
- Recordsize interacts with caching value. Big records make sequential throughput happy, but they also mean each cache admission can be chunky. You can pollute ARC faster with large record workloads.
- Dedup is a cache consumer even when you’re not “using cache.” If you enable dedup, the DDT wants memory. That’s a separate axis of pain that primarycache won’t solve.
Workload patterns: when to use all, metadata, or none
primarycache=all (default): for workloads that actually reuse data
Keep all when the dataset’s data is reused. Common cases:
- VM images where guests repeatedly touch the same filesystem blocks
- Databases with hot tables and indexes that fit partially in memory
- Home directories and source trees where metadata + small files repeat
- Latency-sensitive services that read the same config/templates repeatedly
Operational signal: ARC hit ratio should correlate with user-facing latency improvements. If you reboot and the system “warms up” over minutes to hours, ARC is paying rent.
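One way to watch that warm-up is arc_summary, which ships with OpenZFS on most distributions (exact output layout varies by version):
cr0x@server:~$ arc_summary | head -n 30
A growing MFU share during the first hours of real traffic is the sign that the dataset’s data genuinely gets reused and is worth keeping in ARC.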
primarycache=metadata: the underrated sweet spot
metadata is my default answer for “this dataset is large, mostly sequential, but still needs to list directories and traverse snapshots without suffering.”
Great fits:
- Backup repositories where restores are occasional but listings and snapshot operations are frequent
- Large media archives: lots of big files, few rereads, but you still browse
- Data lakes where compute nodes stream through data once per job
What you get: ARC stores the filesystem brain, not the bulk body. Directory walks, snapshot diffs, and metadata-heavy operations stay quick, while streaming reads don’t evict your actual hot data from other datasets.
primarycache=none: the “quarantine” setting
none is not evil. It’s a quarantine zone for datasets that you know are cache-toxic.
Good fits:
- Scratch spaces for one-time batch exports
- Staging areas for replication receives (depending on workflow)
- Bulk ingest pipelines where data is written once and immediately shipped elsewhere
When it backfires: small random reads, metadata-heavy workloads, or anything with re-reads. If you set none on a VM dataset because “ARC is big,” expect IOPS misery and confused coworkers.
Joke #2 (short and relevant): Setting primarycache=none on your VM dataset is like removing the chairs to prevent people from sitting—technically effective, socially catastrophic.
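If you do quarantine a dataset, a minimal sketch looks like this (the dataset name is illustrative):
cr0x@server:~$ sudo zfs create -o primarycache=none -o secondarycache=none tank/scratch
cr0x@server:~$ zfs get -o name,property,value primarycache,secondarycache tank/scratch
Keeping scratch as its own dataset is what makes the quarantine safe: nothing else inherits the policy by accident.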
Three corporate-world mini-stories (wins, losses, and one boring hero)
Mini-story #1: The incident caused by a wrong assumption (ARC ≠ “free RAM”)
At a mid-sized company running a mixed virtualization cluster, one host started swapping during business hours. The graphs looked weird: plenty of “available” memory, but the kernel was reclaiming aggressively, and latency on a customer-facing API spiked.
The on-call did what many of us do under pressure: looked at ARC size, saw it was large, and concluded “ZFS is stealing RAM.” They slapped a lower ARC max on the host and called it a night. The next day was quieter—until the weekly backup job ran.
During the backup window, the host had to serve VM reads while the backup job streamed massive sequential reads from a dataset that was also storing some VM disks (because “storage is storage”). ARC, now constrained, had less room for the VM hot set. The backup job still streamed happily, but the VMs got slower, and the API latency complaints returned.
The wrong assumption wasn’t “ARC uses RAM.” The wrong assumption was “ARC size is the problem.” The actual problem was that a backup workload was polluting the ARC and evicting VM working set. The fix wasn’t starving ARC; it was isolating policy: move backups to their own dataset and set primarycache=metadata there. ARC stayed large, but now it stored useful things.
After that change, the API latency became boring again. The weekly backup job kept running. No heroics. Just one dataset property that turned a chaotic cache into a disciplined one.
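The shape of that fix, stripped to commands (dataset names are illustrative, and migrating the existing backup target into the new dataset is a separate step):
cr0x@server:~$ sudo zfs create -o primarycache=metadata tank/backup
cr0x@server:~$ zfs get -o name,value,source primarycache tank/vm tank/backup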
Mini-story #2: The optimization that backfired (overzealous “metadata-only”)
A different org had a storage engineer who loved clean rules. They decided: “Database data should never be cached because the database has its own cache.” They set primarycache=metadata on the dataset holding a PostgreSQL cluster, expecting to free ARC for “more important things.”
On paper, it sounded defensible. In reality, it ignored two details: (1) not every database read is a buffer-cache hit, especially during deployments and migrations, and (2) the database’s cache is constrained by configured memory and workload concurrency, not by wishful thinking.
They didn’t notice immediately. Latency crept up during reporting hours, then spiked during a schema migration that forced large index reads. The system’s IOPS demand rose sharply, but the pool was HDD-heavy with limited random performance. ARC could have absorbed a chunk of that. Instead, every cache miss became a disk event.
The rollback was simple: return the dataset to primarycache=all (and adjust other knobs more thoughtfully). The lesson: “apps cache too” doesn’t mean filesystem cache is redundant. It means you need to understand the cache hierarchy and the failure modes, especially under atypical operations like migrations, VACUUM, or cold restarts.
Mini-story #3: The boring but correct practice that saved the day (policy boundaries + tests)
An enterprise team ran a ZFS-backed object store and a set of NFS exports on the same pool. The object store’s workload was mostly large sequential reads and writes; the NFS exports were lots of small-file metadata churn from CI jobs.
They did something profoundly unsexy: they created separate datasets for each workload class, documented the intended properties, and enforced them with a simple configuration check in their provisioning pipeline. For the object store datasets: primarycache=metadata. For the NFS CI datasets: primarycache=all. For ephemeral staging: primarycache=none.
Months later, an internal team launched a new analytics job that read tens of terabytes sequentially during the day. In many companies, that’s the start of a war. Here, it was a non-event: the job ran on a dataset already quarantined from ARC data caching. The CI farm stayed fast. The NFS exports didn’t go sluggish. The only “incident” was someone asking why nothing caught fire.
That’s the real win: boring correctness. Not a heroic midnight sysctl scramble, but a set of boundaries that make unpredictable workloads less dangerous.
Practical tasks: commands, interpretation, and what to change
Below are practical tasks you can run on a typical OpenZFS system (Linux examples). Commands are realistic; adapt dataset/pool names.
Task 1: List datasets and find primarycache settings
cr0x@server:~$ zfs get -r -o name,property,value,source primarycache tank
NAME PROPERTY VALUE SOURCE
tank primarycache all default
tank/vm primarycache all local
tank/backup primarycache metadata local
tank/backup/staging primarycache none local
Interpretation: Look for unexpected inheritance. If a parent has none, everything under it may be silently suffering. The SOURCE column tells you whether it’s local, inherited, or default.
Task 2: Check secondarycache too (L2ARC admission)
cr0x@server:~$ zfs get -r -o name,property,value,source secondarycache tank
NAME PROPERTY VALUE SOURCE
tank secondarycache all default
tank/backup secondarycache metadata local
Interpretation: A common sane combo for backups is primarycache=metadata and secondarycache=metadata, keeping both ARC and L2ARC focused on metadata.
Task 3: Change primarycache safely on one dataset
cr0x@server:~$ sudo zfs set primarycache=metadata tank/backup
Interpretation: This changes future ARC admissions. It won’t instantly purge existing cached data from that dataset. Expect behavior changes over time as ARC turns over.
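If you later need to undo a local setting, zfs inherit removes the override and lets the parent value (or the default) apply again:
cr0x@server:~$ sudo zfs inherit primarycache tank/backup
Check the SOURCE column afterward; it should read inherited or default rather than local.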
Task 4: Verify your change and see inheritance
cr0x@server:~$ zfs get -o name,value,source primarycache tank/backup tank/backup/staging
NAME VALUE SOURCE
tank/backup metadata local
tank/backup/staging none local
Interpretation: Confirm the dataset you care about isn’t inheriting something you forgot.
Task 5: Identify “cache-toxic” datasets by workload (top talkers)
cr0x@server:~$ zpool iostat -v tank 5 3
capacity operations bandwidth
pool alloc free read write read write
-------------------------- ----- ----- ----- ----- ----- -----
tank 48.2T 21.7T 980 430 1.20G 210M
raidz2-0 48.2T 21.7T 980 430 1.20G 210M
sda - - 120 55 150M 28M
sdb - - 115 54 148M 27M
...
-------------------------- ----- ----- ----- ----- ----- -----
Per-dataset attribution needs a different tool, since there is no per-dataset iostat subcommand. On OpenZFS on Linux, per-dataset read/write counters are exposed as objset kstats; sample them twice and diff the numbers to see which dataset is moving the bytes (file layout and field names can vary by release):
cr0x@server:~$ grep -H . /proc/spl/kstat/zfs/tank/objset-0x* | grep -E "dataset_name|nread|nwritten"
Interpretation: If a dataset is doing huge sequential bandwidth reads/writes, it’s a prime candidate for primarycache=metadata or none depending on reuse.
Task 6: Observe ARC size and basic behavior
cr0x@server:~$ cat /proc/spl/kstat/zfs/arcstats | egrep "^(size|c |c_min|c_max|hits|misses|mru_hits|mfu_hits|demand_data_hits|demand_data_misses|demand_metadata_hits|demand_metadata_misses) " | head -n 20
size 4 68719476736
c 4 82463372032
c_min 4 17179869184
c_max 4 103079215104
hits 4 9812334401
misses 4 554332110
mru_hits 4 431228001
mfu_hits 4 9272116400
demand_data_hits 4 7122331100
demand_data_misses 4 431002200
demand_metadata_hits 4 2690000000
demand_metadata_misses 4 12329910
Interpretation: Don’t worship a single hit ratio. Compare demand data vs demand metadata. If metadata misses are high and latency hurts, you probably need more metadata caching, not less.
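To turn those counters into the two ratios that actually matter, a small awk pass works; a sketch that assumes the counters are non-zero:
cr0x@server:~$ awk '/^demand_data_hits/ {dh=$3} /^demand_data_misses/ {dm=$3} /^demand_metadata_hits/ {mh=$3} /^demand_metadata_misses/ {mm=$3} END {printf "demand data hit%%: %.1f  demand metadata hit%%: %.1f\n", 100*dh/(dh+dm), 100*mh/(mh+mm)}' /proc/spl/kstat/zfs/arcstats
A healthy mixed-workload host usually shows a much higher metadata ratio than data ratio; a collapsing metadata ratio is the one to chase first.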
Task 7: Watch ARC behavior over time during a streaming job
cr0x@server:~$ for i in {1..5}; do
> awk '
> /^(size|hits|misses|demand_data_hits|demand_data_misses|demand_metadata_hits|demand_metadata_misses)/ {print}
> ' /proc/spl/kstat/zfs/arcstats | paste - - - - - - -;
> echo "----";
> sleep 5;
> done
size 4 69123481600 hits 4 9813334401 misses 4 554532110 demand_data_hits 4 7123331100 demand_data_misses 4 431202200 demand_metadata_hits 4 2690100000 demand_metadata_misses 4 12339910
----
size 4 69811097600 hits 4 9815334401 misses 4 555032110 demand_data_hits 4 7127331100 demand_data_misses 4 431802200 demand_metadata_hits 4 2690200000 demand_metadata_misses 4 12349910
----
Interpretation: During a streaming read, you may see demand data misses climb while hits don’t meaningfully help later. That’s the signature for “stop caching this dataset’s data.”
Task 8: Check memory pressure and swapping (is ARC competing with your apps?)
cr0x@server:~$ free -h
total used free shared buff/cache available
Mem: 251Gi 182Gi 4.1Gi 2.0Gi 65Gi 18Gi
Swap: 16Gi 1.2Gi 14Gi
cr0x@server:~$ vmstat 1 5
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
4 0 123456 4200000 90000 61200000 0 0 120 900 3200 8800 28 10 59 3 0
5 0 123456 3800000 90000 60800000 0 0 110 950 3300 9100 27 11 58 4 0
Interpretation: Swap usage plus low “available” memory is a sign you need to revisit cache policy. ARC can be big and still okay, but if your apps are paging, you’re paying rent twice.
Task 9: Determine whether a dataset is primarily sequential (quick sampling)
cr0x@server:~$ iostat -x 1 3
Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s w_await wareq-sz aqu-sz %util
sda 120.0 155000.0 0.2 0.2 7.10 1291.7 55.0 28500.0 9.80 518.2 0.92 88.0
Interpretation: Large rareq-sz and high throughput suggest streaming. That’s where primarycache=metadata often wins, especially if re-reads are rare.
Task 10: Measure a real before/after with a cold-ish cache (careful)
This is dangerous on production if you’re careless. But you can test with a specific file and controlled reads. Use a non-critical dataset or a test host.
cr0x@server:~$ sudo zfs set primarycache=all tank/testseq
cr0x@server:~$ dd if=/tank/testseq/bigfile.bin of=/dev/null bs=16M status=progress
21474836480 bytes (21 GB, 20 GiB) copied, 22 s, 976 MB/s
cr0x@server:~$ sudo zfs set primarycache=metadata tank/testseq
cr0x@server:~$ dd if=/tank/testseq/bigfile.bin of=/dev/null bs=16M status=progress
21474836480 bytes (21 GB, 20 GiB) copied, 22 s, 974 MB/s
Interpretation: If performance doesn’t change but ARC pollution does, you’ve found free money. Streaming reads often don’t benefit from caching, but they can evict valuable data.
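If you need a genuinely cold ARC for that pool rather than “cold-ish,” exporting and re-importing the pool drops its cached blocks. This is disruptive (datasets must be unmounted and idle) and belongs on a test host:
cr0x@server:~$ sudo zpool export tank
cr0x@server:~$ sudo zpool import tank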
Task 11: Inspect dataset properties that amplify cache pollution risk
cr0x@server:~$ zfs get -o name,property,value,source recordsize,compression,atime,logbias,primarycache tank/backup
NAME PROPERTY VALUE SOURCE
tank/backup recordsize 1M local
tank/backup compression zstd local
tank/backup atime off local
tank/backup logbias throughput local
tank/backup primarycache metadata local
Interpretation: A large recordsize is perfect for backups, but it also means each cached record is huge. Pairing big records with primarycache=metadata is often the sanest way to keep RAM from turning into a backup museum.
Task 12: Find datasets with inherited “none” (the silent performance killer)
cr0x@server:~$ zfs get -r -H -o name,value,source primarycache tank | awk '$2=="none" || $3=="inherited" {print}'
tank/staging none local
tank/staging/tmp none inherited from tank/staging
Interpretation: If something important is inheriting none, that’s an accidental performance policy. Fix it at the right level (child override or parent correction).
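Both styles of fix, sketched with illustrative dataset names: correct the parent if the policy was simply wrong, or override the one child that genuinely reuses data.
cr0x@server:~$ sudo zfs set primarycache=metadata tank/staging
cr0x@server:~$ sudo zfs set primarycache=all tank/staging/ci-cache
Re-run the audit afterward; the SOURCE column should match your intent.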
Task 13: Confirm ARC limits (don’t fight the OS blindly)
cr0x@server:~$ cat /sys/module/zfs/parameters/zfs_arc_max
109951162777
cr0x@server:~$ cat /sys/module/zfs/parameters/zfs_arc_min
17179869184
Interpretation: If you’ve constrained ARC too hard, you may be forcing disk reads for hot metadata. Use primarycache to improve ARC quality before you shrink ARC size.
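If you do decide to cap ARC, the runtime and persistent paths differ; a sketch with an illustrative 64 GiB cap:
cr0x@server:~$ echo 68719476736 | sudo tee /sys/module/zfs/parameters/zfs_arc_max
cr0x@server:~$ echo "options zfs zfs_arc_max=68719476736" | sudo tee -a /etc/modprobe.d/zfs.conf
The runtime write takes effect as ARC shrinks toward the new target; the modprobe.d entry (merge it with any existing options line) makes it survive a reboot, and on some distributions you also need to regenerate the initramfs.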
Task 14: Validate an intended policy on a tree (drift detection)
cr0x@server:~$ zfs get -r -o name,value,source primarycache tank | egrep "tank/(vm|backup|staging)"
tank/vm all local
tank/backup metadata local
tank/staging none local
Interpretation: This is the SRE habit: check that reality matches your mental model. Most cache disasters come from “we thought it was set” rather than from bad theory.
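If you want to codify that habit, a small check can run from cron or CI and fail loudly on drift; a sketch, with the policy map and script path purely illustrative:
#!/usr/bin/env bash
# /usr/local/bin/check-zfs-cache-policy (illustrative path)
# Compares actual primarycache values against an intended policy map.
# Assumes every dataset in the map exists; extend the error handling as needed.
set -euo pipefail
declare -A want=( [tank/vm]=all [tank/backup]=metadata [tank/staging]=none )
rc=0
for ds in "${!want[@]}"; do
  have=$(zfs get -H -o value primarycache "$ds")
  if [[ "$have" != "${want[$ds]}" ]]; then
    echo "DRIFT: $ds primarycache=$have (expected ${want[$ds]})"
    rc=1
  fi
done
exit $rc
Wire it into provisioning or monitoring so drift becomes a ticket, not a 2 a.m. surprise.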
Fast diagnosis playbook
This is the “it’s slow and people are staring at you” path. It’s ordered to find the likely bottleneck fast, not to be academically complete.
Step 1: Is the system memory-starved or swapping?
cr0x@server:~$ free -h
cr0x@server:~$ vmstat 1 5
cr0x@server:~$ swapon --show
What you’re looking for: low available memory, swap-in/swap-out activity, and symptoms like random pauses. If swapping is happening, decide whether ARC admission policy (primarycache) is polluting memory or whether you genuinely need more RAM.
Step 2: Is the pool I/O-bound or CPU-bound?
cr0x@server:~$ iostat -x 1 5
cr0x@server:~$ zpool iostat -v 1 5
What you’re looking for: high device %util, high await, and queues. If disks are pegged, ARC tuning may help only if you can increase cache hit rate for relevant data/metadata. If CPU is pegged (compression, checksums), caching changes won’t fix compute saturation.
Step 3: Is ARC helping the workload you care about?
cr0x@server:~$ cat /proc/spl/kstat/zfs/arcstats | egrep "^(hits|misses|demand_data_hits|demand_data_misses|demand_metadata_hits|demand_metadata_misses|size|c )"
What you’re looking for: Are metadata misses high? Are demand data misses climbing during a streaming job? If yes, consider setting primarycache=metadata on the dataset that is doing streaming I/O.
Step 4: Identify the dataset causing churn
cr0x@server:~$ zpool iostat -v tank 2 10
What you’re looking for: Which dataset/pool is pushing the bandwidth/IOPS. Map it to business context (backups, analytics, VM storage) and apply cache policy accordingly.
Step 5: Apply a targeted change (small blast radius)
cr0x@server:~$ sudo zfs set primarycache=metadata tank/backup
What you’re looking for: Reduced ARC churn and improved stability for other workloads. Don’t change ARC max first unless you’re actively dying; fix what gets admitted before shrinking the cache.
Common mistakes (symptoms and fixes)
Mistake 1: Setting primarycache=none on “storage that is slow” (and making it slower)
Symptoms: VM boot storms get worse, small random reads spike in latency, metadata-heavy operations crawl, ARC hit ratio drops for meaningful reads.
Fix: Restore primarycache=all for datasets with reuse (VMs, databases), and quarantine only the streaming datasets. If you need to protect ARC from scans, set primarycache=metadata on backup/archive datasets instead of nuking caching.
Mistake 2: Applying cache policy at the wrong level (inheritance foot-gun)
Symptoms: “We changed it but nothing improved” or “why did the VM datastore change behavior?”
Fix: Use zfs get -r ... source. Correct the parent dataset if the policy is meant to be broad; override on the child if it’s an exception. Document the intended inheritance tree.
Mistake 3: Confusing primarycache with secondarycache
Symptoms: You add an L2ARC device and see little benefit; ARC fills with junk anyway.
Fix: Remember: primarycache controls ARC admission, secondarycache controls L2ARC admission. For streaming datasets, consider primarycache=metadata and secondarycache=metadata, not “L2ARC will save me.”
Mistake 4: Over-optimizing for hit ratio instead of latency
Symptoms: The metrics dashboard looks “better” after a change, but users complain more.
Fix: Treat hit ratio as a clue, not a KPI. Focus on demand metadata misses, tail latency, and disk queue depth. Cache irrelevant data and you can inflate hits while starving the real workload.
Mistake 5: Ignoring recordsize and workload alignment
Symptoms: ARC grows quickly during bulk operations; memory pressure increases; little performance gain.
Fix: Big recordsize plus streaming reads often means “cache pollution at scale.” Use primarycache=metadata on those datasets, and keep big records where they belong (backup/archive), not on random I/O workloads.
Mistake 6: Fighting ARC max/min before fixing admission quality
Symptoms: You lower ARC max to “free memory,” but performance gets unstable, metadata misses rise, and disk I/O increases.
Fix: First, stop admitting junk with primarycache. Then, if you still need to cap ARC for co-located workloads, do it carefully and validate metadata miss behavior.
Checklists / step-by-step plan
Checklist A: Decide the right primarycache value for a dataset
- Classify the dataset: VM/databases, general file shares, backups, archive, scratch.
- Ask one brutal question: “Do we reread the same data blocks within hours?”
- If yes: start with primarycache=all.
- If mostly no, but metadata operations matter: choose primarycache=metadata.
- If it is a disposable/one-time stream: consider primarycache=none.
- Validate with zpool iostat and ARC stats during the actual job window.
Checklist B: Implement safely (minimal blast radius)
- Confirm the dataset path and inheritance tree:
cr0x@server:~$ zfs list -r tank
cr0x@server:~$ zfs get -r -o name,value,source primarycache tank
- Change only one dataset at a time:
cr0x@server:~$ sudo zfs set primarycache=metadata tank/backup
- Watch for 15–60 minutes (or one job run):
cr0x@server:~$ zpool iostat -v tank 5
cr0x@server:~$ cat /proc/spl/kstat/zfs/arcstats | egrep "^(size|hits|misses|demand_metadata_misses|demand_data_misses)"
- Record the outcome: latency, swap, disk await, ARC behavior.
- If it helped, codify it (infra-as-code, provisioning templates, or a runbook).
Checklist C: Separate workloads so caching policy can work
- Create datasets by workload class, not by org chart.
- Keep backups/archives off the VM dataset tree.
- Apply primarycache at the dataset boundary where the workload is homogeneous.
- Audit quarterly for “someone dumped a new workload into the wrong dataset.”
FAQ
1) Does primarycache change affect existing cached data immediately?
No. It affects what gets admitted going forward. Existing ARC contents will age out as the cache turns over. If you need immediate impact, the operational approach is usually “wait for churn” or schedule changes before the heavy job, not forcibly purging caches on production.
2) Should I set primarycache=metadata for all backups?
Often yes, because backups are classic “read once” streams that can pollute ARC, while metadata still matters for browsing, snapshot management, and restore selection. But if you routinely restore the same subset repeatedly (test restores, frequent partial restores), you might benefit from caching some data—measure it.
3) Is primarycache=none ever correct?
Yes, for truly disposable datasets (scratch, staging) or for workflows where caching actively harms more important workloads. The mistake is using none as a blanket “ZFS uses too much RAM” response.
4) How do primarycache and secondarycache differ in plain terms?
primarycache decides what can go into RAM (ARC). secondarycache decides what can go into L2ARC (usually SSD). You can tell ZFS “metadata only” for both to keep both cache tiers focused on filesystem structure rather than bulk data.
5) Will setting primarycache=metadata speed up writes?
Not directly. It’s about read caching. However, indirectly, it can improve overall system stability and reduce memory pressure, which can help write-heavy systems that were suffering from reclaim or swapping. Writes are more influenced by ZIL/SLOG behavior, transaction group tuning, device latency, and sync settings.
6) If my application has its own cache (Redis, PostgreSQL shared_buffers), should I disable ARC data caching?
Not automatically. Many applications cache some things well and others poorly, and they don’t cover cold-start or atypical operations. Filesystem cache often helps with indexes, binaries, shared libraries, and OS activity. Treat primarycache as a workload tool: test changes during realistic operations (including restarts, migrations, and peak read bursts).
7) Why does ARC still grow if I set primarycache=metadata on a big dataset?
Because other datasets may still be caching data, metadata can still be substantial, and ARC also holds different kinds of internal structures. Also, ARC growth is not inherently bad; it’s bad when it crowds out more valuable memory consumers or when it’s filled with low-value blocks.
8) What’s the simplest “good default” policy for a mixed pool?
Keep primarycache=all for VM/databases and interactive file shares. Use primarycache=metadata for backups/archives. Use primarycache=none for scratch/staging that’s truly one-pass. The real trick is not the values—it’s creating datasets that cleanly separate the workloads.
9) Can I fix ARC pollution without changing primarycache?
Sometimes. ARC is designed to resist scan pollution, and you can also manage ARC size caps. But primarycache is the cleanest per-dataset admission policy. If the problem is “this dataset should never consume RAM for data,” don’t rely on global heuristics to guess that.
10) How do I know if metadata caching is the bottleneck?
Look at demand metadata misses in arcstats, and correlate with latency during directory walks, snapshot operations, and small-file workloads. If metadata misses rise and disks are busy, your “cache problem” is likely metadata, not bulk data.
Conclusion
primarycache is one of those ZFS properties that looks like a minor preference until you operate a mixed workload pool for long enough. Then it becomes a boundary: the difference between ARC being a high-IQ performance tier and ARC being a hoarder with unlimited closet space.
The practical play is simple: separate workloads into datasets that make sense, keep primarycache=all where data is actually reused, and use primarycache=metadata (or occasionally none) to stop streaming and disposable workloads from evicting your real hot set. Do it with measurement, not vibes. In production, “boring and correct” beats “clever and global” every time.