You get paged: “Host is out of memory.” You SSH in, run free -h, and it tells you memory is “used.” Helpful. Like a smoke alarm that only says “SMOKE.”
The trick is to stop treating RAM as one bucket. Linux splits it into buckets that matter operationally: anonymous memory (your processes), file-backed cache (fast I/O), kernel slabs (metadata), and limits imposed by cgroups. In ten minutes you can usually identify the real eater—and choose the right fix instead of the classic move: reboot and pray.
Fast diagnosis playbook
This is the order that keeps you from chasing ghosts. You’re hunting the biggest bucket first, then narrowing to the guilty party, then validating with the kernel’s own accounting.
First: Is this real memory pressure or just page cache?
- Check free -h and focus on available, not used.
- Check swap activity (vmstat/sar) and major page faults if you have them.
Second: Is the kernel under memory pressure (OOM risk) and why?
- Check dmesg for OOM killer logs and which cgroup triggered it.
- Check /proc/meminfo for the anon vs file vs slab breakdown.
Third: Identify the top consumers with the right metric
- Start with RSS per process to find obvious hogs (ps).
- Then use PSS when shared memory makes RSS misleading (smem or /proc/*/smaps_rollup).
- If containers are involved, check cgroups first; “top on the host” is not a container accounting tool.
Fourth: If processes don’t explain it, suspect kernel memory
- Slab growth: slabtop, /proc/slabinfo, Slab in meminfo.
- Kernel stacks, page tables, and unreclaimable memory can take a host down quietly.
Fifth: Validate the fix path before doing anything dramatic
- Kill/restart only the culprit, not the box.
- Set sane limits (systemd/Kubernetes) after you know the steady-state usage.
- Fix leaks with evidence: growth curves, not vibes.
The mental model: what “used RAM” really means
Linux uses RAM aggressively because idle memory is wasted opportunity. It will fill memory with file cache, dentries, inode caches, and other performance helpers. That’s not a leak; that’s the kernel doing its job.
The operational question is not “Why is used high?” It’s “Is the system short on reclaimable memory?” That’s why free shows an available estimate. “Available” roughly means: if you start a new workload right now, how much can the kernel free without causing a meltdown?
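That estimate is easy to turn into an alert-friendly number. A minimal sketch, written as a small function that reads meminfo-formatted text on stdin so it is simple to test (in production you would pipe /proc/meminfo into it):

```shell
# avail_pct: print MemAvailable as a percentage of MemTotal.
# Reads meminfo-formatted text on stdin; in production:
#   avail_pct < /proc/meminfo
avail_pct() {
  awk '/^MemTotal:/     {total=$2}
       /^MemAvailable:/ {avail=$2}
       END {if (total) printf "available: %.1f%%\n", avail * 100 / total}'
}
```

A sensible alert threshold is workload-specific; treating anything persistently below 10–15% available on a loaded box as worth a look is an assumption to tune, not a rule.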
Memory pressure becomes real when:
- Swap is actively used and especially if swap-in/out rates are non-trivial.
- Direct reclaim stalls your workloads: latency spikes, CPU in kernel, kswapd busy.
- OOM killer appears, either at host level or inside a cgroup (containers love to die quietly).
- Slab or unreclaimable memory grows, leaving less reclaimable space.
Here’s the breakdown you should keep in your head:
- Anonymous memory (AnonPages): heaps, stacks, JITs, mallocs. This is what “applications” consume.
- File-backed memory (Cached): filesystem cache. Reclaimable when needed, usually.
- Slab (SReclaimable + SUnreclaim): kernel caches/metadata. Partially reclaimable.
- Committed memory: promises. Not all promises become debt, but watch overcommit.
- cgroups: limits. You can have plenty of host RAM and still OOM inside a container.
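The "committed memory" bucket above can be checked the same way as the others. A hedged sketch in the same stdin-reading style (pipe /proc/meminfo in production); the warning threshold is an assumption:

```shell
# commit_ratio: Committed_AS as a percentage of CommitLimit.
# Reads meminfo-formatted text on stdin; in production:
#   commit_ratio < /proc/meminfo
# A ratio persistently near or above 100% means the kernel has promised
# more than it can back without swap (see vm.overcommit_memory).
commit_ratio() {
  awk '/^CommitLimit:/  {limit=$2}
       /^Committed_AS:/ {committed=$2}
       END {if (limit) printf "commit: %.0f%%\n", committed * 100 / limit}'
}
```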
One paraphrased idea from a reliability legend: John Allspaw has argued that production incidents are often “normal work” colliding with assumptions. Memory pages fit that pattern perfectly.
Joke #1: Memory troubleshooting is like dieting: the numbers are real, but the labels are lying to you.
The 10-minute method (tasks with commands)
Below are practical tasks you can run on any Linux box (bare metal, VM, container host). Each task includes: the command, what the output means, and the decision you make.
Task 1: Get the reality check from free
cr0x@server:~$ free -h
total used free shared buff/cache available
Mem: 31Gi 27Gi 1.2Gi 512Mi 2.8Gi 2.9Gi
Swap: 8.0Gi 2.1Gi 5.9Gi
What it means: The key number is available. Here it’s ~2.9Gi, which is not comfortable if the workload spikes. Swap is already in use (2.1Gi), which hints at real pressure.
Decision: If available is low and swap is used or growing, keep going. If available is healthy and swap is quiet, your “used RAM” is probably cache and you may have a different problem (like disk or CPU).
Task 2: Check for active paging and reclaim behavior with vmstat
cr0x@server:~$ vmstat 1 5
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
2 0 2211840 923456 65536 911872 12 40 15 20 420 880 12 6 78 4 0
1 0 2211840 905216 65536 918528 0 16 0 8 410 860 11 6 79 4 0
3 0 2211840 892928 65536 924160 0 0 0 0 450 910 14 7 76 3 0
4 0 2211840 876544 65536 931072 0 24 0 12 470 980 17 8 72 3 0
2 0 2211840 868352 65536 936960 0 8 0 4 440 900 13 6 77 4 0
What it means: si/so are swap-in/out per second. Non-zero values over time mean the kernel is actively juggling pages. Also watch wa (I/O wait) and b (blocked processes).
Decision: If swap-out persists, you’re under memory pressure. Your next move is to identify which bucket is growing: processes, cache, slab, or cgroups.
Task 3: Read the truth serum: /proc/meminfo
cr0x@server:~$ egrep 'MemTotal|MemFree|MemAvailable|Buffers|Cached|SwapTotal|SwapFree|AnonPages|Mapped|Shmem|Slab|SReclaimable|SUnreclaim|KernelStack|PageTables' /proc/meminfo
MemTotal: 32949044 kB
MemFree: 842112 kB
MemAvailable: 3026112 kB
Buffers: 65536 kB
Cached: 956812 kB
SwapTotal: 8388604 kB
SwapFree: 6193152 kB
AnonPages: 25801120 kB
Mapped: 612340 kB
Shmem: 524288 kB
Slab: 1523400 kB
SReclaimable: 812000 kB
SUnreclaim: 711400 kB
KernelStack: 112000 kB
PageTables: 184000 kB
What it means: AnonPages is huge: processes are the main consumer. Cached is relatively small, so it’s not “just page cache.” Slab is non-trivial; note how much is unreclaimable.
Decision: If AnonPages dominates, hunt processes/cgroups. If Cached dominates and MemAvailable is low, you might be thrashing on file cache due to I/O patterns. If Slab/SUnreclaim dominates, suspect kernel memory growth (often filesystem/network related).
Task 4: Check if the OOM killer already fired
cr0x@server:~$ dmesg -T | egrep -i 'oom|out of memory|killed process' | tail -n 20
[Tue Feb 4 10:18:22 2026] Memory cgroup out of memory: Killed process 24198 (java) total-vm:8123456kB, anon-rss:6123456kB, file-rss:12000kB, shmem-rss:0kB, UID:1001 pgtables:14200kB oom_score_adj:0
[Tue Feb 4 10:18:22 2026] oom_reaper: reaped process 24198 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
What it means: This wasn’t a “host OOM.” It was a memory cgroup OOM, killing java inside a limited group (systemd unit or container). This is the most common reason teams swear “but the host had free memory.”
Decision: If cgroup OOM appears, stop looking at host-wide top lists. Switch to cgroup accounting and container/unit limits.
Task 5: Identify top RSS processes quickly
cr0x@server:~$ ps -eo pid,user,comm,rss,pmem --sort=-rss | head -n 15
PID USER COMMAND RSS %MEM
24198 app java 6321452 19.2
19872 app node 1823340 5.5
1321 root dockerd 612448 1.8
2012 mysql mysqld 588120 1.7
1711 root prometheus 312880 0.9
922 root systemd-jou 188244 0.5
2666 root nginx 92240 0.2
What it means: RSS is resident set size: physical RAM mapped into the process. It’s a blunt instrument, but it catches the obvious offenders fast.
Decision: If one process is far above others and correlates with time-of-incident, you have a prime suspect. Next: verify with PSS (shared memory can inflate RSS) and check cgroups/limits.
Task 6: Don’t get fooled by shared memory—use PSS via smaps_rollup
cr0x@server:~$ sudo sh -c 'cat /proc/24198/smaps_rollup | egrep "Pss:|Rss:|Private_Dirty:|Private_Clean:|Shared_Dirty:|Shared_Clean:"'
Rss: 6321452 kB
Pss: 6189021 kB
Shared_Clean: 12400 kB
Shared_Dirty: 1024 kB
Private_Clean: 88000 kB
Private_Dirty: 6219028 kB
What it means: PSS (proportional set size) spreads shared pages across processes. Here PSS is close to RSS, so the process is truly owning that memory (private dirty is massive).
Decision: High Private_Dirty suggests heap growth or memory leak patterns. If PSS is much lower than RSS, you may be blaming the wrong process due to shared libraries or shared mappings.
Task 7: If you have smem, use it; it saves time
cr0x@server:~$ smem -r -k -t | head -n 12
PID User Command Swap USS PSS RSS
24198 app java 1024K 6000M 6044M 6173M
19872 app node 0K 1600M 1652M 1802M
1321 root dockerd 0K 420M 435M 598M
2012 mysql mysqld 0K 510M 522M 575M
-------------------------------------------------------------------------------
1024K 8530M 8653M 9148M
What it means: USS is unique set size (private memory). PSS is the best “who is actually consuming” metric at system level.
Decision: Prioritize the top PSS/USS processes when building a mitigation plan. RSS is fine for a quick glance; PSS is what you cite in a postmortem.
Task 8: Check container or systemd unit limits (cgroups v2)
cr0x@server:~$ mount | grep cgroup2
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime)
cr0x@server:~$ systemctl status myapp.service --no-pager | egrep 'Memory|Tasks'
Memory: 6.3G (limit: 6.5G)
Tasks: 92 (limit: 2048)
What it means: This unit is capped. You can have 128GB free on the host and still get killed at 6.5GB. The limit is part of the system’s contract.
Decision: If the cap is below peak working set, either raise the limit, reduce memory usage, or accept the kill as your “auto-scaler” (not recommended for stateful things).
Task 9: Inspect the cgroup’s memory current/peak and events
cr0x@server:~$ CG=/sys/fs/cgroup/system.slice/myapp.service
cr0x@server:~$ sudo sh -c "cat $CG/memory.current; cat $CG/memory.max; cat $CG/memory.peak; cat $CG/memory.events"
6848124928
6983510016
6950022144
low 0
high 0
max 12
oom 12
oom_kill 12
What it means: memory.max is the hard cap. memory.peak tells you the worst observed usage. oom_kill confirms cgroup kills happened.
Decision: If memory.current is near memory.max, stop treating it as a host problem. It’s a limit problem, or a leak inside the service boundary.
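The "near memory.max" check is easy to automate. A minimal sketch: the comparison is factored into a function so the threshold logic is testable; the 90% threshold and the call-site example are assumptions to adapt:

```shell
# check_cap CUR MAX: print a warning when usage is within 10% of the cap.
# MAX may be the literal string "max", which is how cgroup v2 reports
# an unlimited memory.max.
check_cap() {
  cur=$1; max=$2
  if [ "$max" = "max" ]; then
    return 0                       # no limit, nothing to warn about
  fi
  if [ $((cur * 100 / max)) -ge 90 ]; then
    echo "WARN: ${cur} of ${max} bytes used"
  fi
  return 0
}
# In production, feed it the live cgroup files, e.g.:
#   CG=/sys/fs/cgroup/system.slice/myapp.service
#   check_cap "$(cat $CG/memory.current)" "$(cat $CG/memory.max)"
```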
Task 10: Find per-cgroup memory breakdown (anon/file/slab) on cgroups v2
cr0x@server:~$ sudo sh -c "cat $CG/memory.stat | egrep 'anon |file |slab |sock |shmem |file_mapped|file_dirty|inactive_anon|inactive_file|active_anon|active_file'"
anon 6423011328
file 211345408
shmem 0
slab 142110720
sock 9123840
file_mapped 54476800
file_dirty 122880
inactive_anon 6112147456
active_anon 310863872
inactive_file 188743680
active_file 22601728
What it means: The cgroup is dominated by anon. That’s application memory, not cache. Slab is present but not the main story.
Decision: If file dominates, you might be caching within the cgroup; you can tune read patterns or allow more headroom. If anon dominates, you need heap discipline, fewer in-memory objects, or a higher limit.
Task 11: Investigate slab growth with slabtop
cr0x@server:~$ sudo slabtop -o | head -n 15
Active / Total Objects (% used) : 4821102 / 5012240 (96.2%)
Active / Total Slabs (% used) : 118220 / 118220 (100.0%)
Active / Total Caches (% used) : 94 / 132 (71.2%)
Active / Total Size (% used) : 1289012.40K / 1390024.00K (92.7%)
Minimum / Average / Maximum Object : 0.01K / 0.28K / 8.00K
OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
812320 801200 98% 0.19K 38777 21 155108K dentry
610112 605900 99% 0.62K 23832 16 238320K inode_cache
420000 418000 99% 0.10K 10769 39 43076K kmalloc-96
What it means: Big dentry/inode caches point to filesystem metadata pressure: lots of files, lots of path lookups, or a workload that churns directory entries (build boxes, unpackers, container image extraction).
Decision: If slab is the culprit, don’t “optimize the app.” Look at filesystem behavior, runaway log directory scans, recursive backups, or millions of tiny files. Also check for kernel bugs or driver leaks if slabs are odd/unexpected.
Task 12: Check open files and FD growth when memory “disappears”
cr0x@server:~$ sudo lsof -p 24198 | wc -l
18452
cr0x@server:~$ cat /proc/24198/limits | egrep 'Max open files'
Max open files 1048576 1048576 files
What it means: High FD counts often correlate with memory usage (buffers, socket memory, per-FD structures, user-space caching). Not always, but it’s a strong “something is growing” signal.
Decision: If FDs rise steadily with memory, you likely have a resource leak (connections, files, watchers). Fix the leak; raising limits only delays the outage.
Task 13: See if tmpfs or shared memory is eating RAM
cr0x@server:~$ df -hT | egrep 'tmpfs|shm'
tmpfs tmpfs 3.2G 2.7G 0.5G 85% /run
tmpfs tmpfs 16G 9.0G 7.0G 57% /dev/shm
What it means: tmpfs uses RAM (and swap). If /dev/shm grows, that memory is effectively anonymous and can pressure the system.
Decision: If tmpfs growth is unexpected, find the writer (often browsers, ML workloads, IPC-heavy systems) and cap or relocate the storage.
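Finding the writer usually starts with finding the biggest files. A quick sketch for the /dev/shm case from the task above (the path and the ten-entry cutoff are assumptions):

```shell
# List the ten biggest entries under /dev/shm to identify the writer.
# -x stays on this filesystem; sort -rh orders human-readable sizes.
du -ahx /dev/shm 2>/dev/null | sort -rh | head -n 10
```

Once you have a filename, fuser or lsof on it points at the owning process.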
Task 14: Check kernel same-page merging (KSM) and THP if you’re chasing weirdness
cr0x@server:~$ cat /sys/kernel/mm/transparent_hugepage/enabled
[always] madvise never
cr0x@server:~$ cat /sys/kernel/mm/ksm/run
0
What it means: THP can improve performance or create latency spikes and memory fragmentation under some workloads. KSM is usually off unless you enabled it (common on virtualization hosts historically).
Decision: Don’t toggle these mid-incident unless you know your workload’s behavior. If you suspect THP issues, capture evidence (fault rates, latency, fragmentation) and change in maintenance windows.
Interpreting results: decisions you can defend
When “used” is high but “available” is fine
If MemAvailable is healthy and swap activity is near zero, you probably have a normal Linux cache-heavy system. The right action is usually: do nothing, but verify you’re not hitting cgroup limits for critical services.
Teams sometimes “fix” this by dropping caches. That’s not a fix; it’s deliberately making the next read slower.
When AnonPages dominates
This is where leaks live. But it’s also where legitimate in-memory working sets live (datastores, JVM heaps, caches you put there on purpose).
- If a process’s Private_Dirty rises steadily over hours/days, suspect a leak or unbounded cache.
- If it rises with traffic and falls when traffic falls (and GC runs), it might be normal elasticity.
- If it rises after deploy and never returns, treat it as a regression until proven otherwise.
When slab dominates
Kernel memory is often “the invisible tax.” Inodes, dentries, network buffers, conntrack tables, and filesystem metadata can add up. Slab isn’t bad; unbounded slab is bad.
Typical causes:
- Millions of small files and constant directory traversal.
- Exploding container layers and image extraction behavior.
- Network-heavy systems with large connection tracking tables.
- Kernel/driver bugs (rarer, but you’ll know because nothing in user space explains the loss).
When the box has RAM but the service gets OOM-killed
This is a cgroups problem. The service boundary is limited. Either your limit is wrong, your working set grew, or your service is doing something new (like caching more data).
Operationally, treat it like a production contract issue: align limits with reality, and add alerting on memory.current approaching memory.max.
Joke #2: The OOM killer is the only teammate who always takes decisive action—unfortunately, it never attends the postmortem.
Three mini-stories from corporate life
Mini-story 1: The incident caused by a wrong assumption
A mid-sized company ran a fleet of API servers behind a load balancer. Their dashboards showed the hosts at 40–50% “used” memory, and everyone felt safe. Then, after a traffic event, the API started randomly returning 502s. Not a full outage—worse. A partial one that makes everyone argue.
The on-call looked at the host metrics: plenty of RAM, CPU fine, disk fine. They restarted a few pods (Kubernetes), things improved, then degraded again. The incident got labeled “network flakiness” because that’s what people call issues they can’t see.
Finally someone tailed dmesg on a node and saw repeated Memory cgroup out of memory messages. Each kill was happening inside a pod’s cgroup. The host had memory, but the pods were capped and being killed under burst load. Their assumption—host memory equals service memory—was the lie.
The fix wasn’t heroic: set the pod memory limit based on observed PSS during peak, add a safety margin, and alert on memory.events increments. After that, the same traffic pattern produced higher latency (acceptable) instead of random deaths (unacceptable). The biggest lesson: “free RAM” on the node is irrelevant when you’re living inside cgroups.
Mini-story 2: The optimization that backfired
A data platform team wanted to reduce disk reads on a batch processing host. Someone flipped a tuning knob to keep more data in memory at the application layer: larger in-process caches, bigger buffers, more parallelism. Benchmarks got faster. Everybody high-fived. They rolled it out.
Two weeks later, the fleet started swapping under normal load. Latency and job duration went sideways. The team blamed the storage array. Then they blamed the hypervisor. Then they blamed “Linux being weird.” Classic tour.
When they finally measured AnonPages and per-process PSS, they found the new cache was essentially unbounded under certain job mixes. It wasn’t a “leak” in the strict sense; it was a cache without eviction discipline. The app happily hoarded memory, pushing out filesystem cache and forcing the kernel to swap out other pages.
The backfire was subtle: their “optimization” reduced disk reads at small scale, but at production concurrency it increased total memory pressure and created I/O amplification through swapping. The fix was to set hard caps on application caches, and to load-test using production-like concurrency and dataset churn. Memory wins are real; memory hubris is expensive.
Mini-story 3: The boring but correct practice that saved the day
A payments service ran on systemd units (no containers). They had a habit that was deeply unsexy: every service had a baseline memory profile recorded after each release. Not a fancy APM. Just a scripted capture of /proc/meminfo, ps RSS top lists, and per-process smaps_rollup summaries, stored with the build metadata.
One afternoon, a node started showing rising memory use. No alarms yet, just a slow creep. The on-call compared the current PSS profile to last week’s. The delta was obvious: the main process’s private dirty memory was higher by a meaningful chunk and trending upward.
They rolled back before the OOM killer got involved. No customer-facing incident. Later, in a calmer window, they reproduced the leak in staging: a new feature path allocated objects that were held in a global map longer than intended. The bug was fixed quickly because the team had a baseline and could say “this is new” without debate.
The practice wasn’t glamorous. It just turned memory discussions from religion into arithmetic. In production, boring correctness beats clever guesswork every day of the week.
Common mistakes (symptoms → root cause → fix)
1) Symptom: “RAM is 95% used” alert keeps firing, but performance is fine
Root cause: Alert is based on used rather than available. Linux uses RAM for cache by design.
Fix: Alert on MemAvailable (or a derived “available percent”), plus swap activity. Change the on-call runbook to ignore “used” alone.
2) Symptom: Service gets OOM-killed but the host has plenty of free memory
Root cause: Cgroup memory limit hit (systemd/Kubernetes). Host-level metrics mislead.
Fix: Inspect memory.max, memory.current, and memory.events. Set limits based on observed PSS/peak. Add headroom for spikes and fragmentation.
3) Symptom: Swap is used, but no process looks huge in top
Root cause: Shared memory inflates/deflates RSS perceptions; or kernel slab is consuming; or multiple medium processes sum to pressure.
Fix: Use PSS (smem or smaps_rollup), then check slab (slabtop) and /proc/meminfo.
4) Symptom: Memory keeps climbing after deploy, then eventually OOMs
Root cause: Leak or unbounded cache in user space; sometimes a change in traffic mix triggers a growth path.
Fix: Confirm growth with Private_Dirty trend. Apply a hard cap (config) as a mitigation, then debug with heap tooling appropriate to runtime (JVM, Go, Python). Don’t “just add swap.”
5) Symptom: Slab grows, dentry/inode_cache dominate
Root cause: Filesystem metadata churn: huge directories, recursive scans, build artifacts, log storms, container image unpack churn.
Fix: Reduce file count and churn; fix scripts doing repeated recursive find; rotate logs sanely; avoid exploding tiny files on shared hosts. If it’s a build host, isolate workloads.
6) Symptom: Random latency spikes, kswapd CPU, but RAM graphs look “okay”
Root cause: Memory reclaim and compaction overhead; possible THP side-effects; memory fragmentation.
Fix: Correlate with paging stats and reclaim. Consider madvise THP mode for certain workloads, but only with testing. The fix is often “less memory pressure,” not “different graphs.”
7) Symptom: “Cache is huge, drop_caches fixes it”
Root cause: Dropping cache masks an underlying issue like a leaking process, runaway metadata, or an undersized cgroup; the kernel will refill cache anyway.
Fix: Don’t operationalize drop_caches as a routine action. Measure what’s driving cache growth; fix the workload or set realistic memory budgets.
Checklists / step-by-step plan
10-minute on-call checklist (do this in order)
- Run free -h. If available is low, proceed; if not, validate swap and cgroups anyway.
- Run vmstat 1 5. If si/so are non-zero persistently, treat as real memory pressure.
- Run the meminfo grep. Decide: anon-heavy vs cache-heavy vs slab-heavy.
- Check dmesg for OOM. If it’s cgroup OOM, stop and pivot to cgroups.
- List top RSS processes with ps; identify candidates.
- Validate top candidates with smaps_rollup (PSS/private dirty).
- If containers/systemd units are involved: read memory.current, memory.max, memory.events, and memory.stat.
- If processes don’t add up: investigate slab with slabtop.
- Check tmpfs usage (df -hT for tmpfs/shm).
- Choose action: restart offender, raise limit, fix leak, reduce file churn, or scale out. Avoid reboot unless you must stop the bleeding.
Decision checklist: what you change based on what you find
- Single process owns memory (high PSS/private dirty): Mitigate by restart/rollback; implement caps; debug leak.
- Many processes sum to pressure: Reduce concurrency, scale horizontally, or move heavy jobs off shared hosts.
- Cgroup cap is too low: Raise cap with margin; right-size requests/limits; alert on approaching cap.
- Slab is the main consumer: Fix filesystem/network churn; reduce file counts; investigate kernel subsystems.
- Swap activity is the problem: Reduce memory usage; consider reducing swappiness only after confirming it helps the workload.
Post-incident checklist (so you don’t relive it)
- Capture a memory profile snapshot: meminfo, top PSS list, cgroup stats, slabtop summary.
- Add alerts that reflect reality: MemAvailable, swap I/O, cgroup OOM kills, and slab growth.
- Set budgets: per-service memory budgets and test them under realistic concurrency.
- Document the “known good” baseline per release so regressions are obvious.
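The snapshot and baseline items above can be a ten-line script. A hedged sketch, assuming a /tmp/mem-snapshots directory (adjust the path and add cgroup stats for your own units):

```shell
# Capture a timestamped memory snapshot for later diffing.
# Each capture is guarded so the script degrades gracefully where a
# source is missing or needs privileges.
snap_dir=${1:-/tmp/mem-snapshots}
ts=$(date +%Y%m%d-%H%M%S)
mkdir -p "$snap_dir"
if [ -r /proc/meminfo ]; then
  cat /proc/meminfo > "$snap_dir/$ts.meminfo"
fi
ps -eo pid,user,comm,rss --sort=-rss 2>/dev/null | head -n 20 > "$snap_dir/$ts.ps"
if command -v slabtop >/dev/null 2>&1; then
  slabtop -o 2>/dev/null | head -n 20 > "$snap_dir/$ts.slab"
fi
echo "snapshot $ts written to $snap_dir"
```

Run it from cron or a deploy hook and store the output with build metadata, and the "this is new" argument from mini-story 3 becomes a diff instead of a debate.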
Facts & historical context (the stuff that explains today’s weirdness)
- The MemAvailable field is relatively modern in kernel terms; it was added because “free memory” was a terrible predictor of reclaimable memory.
- Linux intentionally uses spare RAM for page cache to avoid slow disk reads; seeing low “free” is often a sign the kernel is doing its job.
- OOM killer decisions are heuristic: it assigns “badness” scores to processes. It’s not moral judgment; it’s triage under duress.
- cgroups made “one host, many memory worlds” normal. A container can OOM while the node is fine, because the node isn’t the container’s universe.
- RSS can overcount shared pages. That’s why PSS exists: to attribute shared memory more fairly across processes.
- tmpfs is RAM-backed (and swap-backed). Storing “temporary files” in tmpfs can absolutely become “permanent memory pressure.”
- Slab caches are a performance feature: kernels cache objects like dentries/inodes because allocating/freeing them constantly is expensive.
- Transparent Huge Pages became popular for throughput gains, but they also introduced new operational tradeoffs: compaction cost, fragmentation, and latency sensitivity.
FAQ
1) Why does free show almost no “free” memory on a healthy system?
Because Linux uses RAM as cache. Look at available, not free. “Free” is mostly unused pages; “available” estimates reclaimable pages plus unused.
2) Should I drop caches to “free memory”?
Almost never in production. Dropping caches is like dumping your toolbox onto the floor so the bench looks clean. It may relieve pressure briefly, but it usually hurts performance and hides root causes.
3) What’s the difference between RSS, VIRT, USS, and PSS?
VIRT is virtual address space; it can be huge and meaningless. RSS is resident pages in RAM, but it overcounts shared pages. USS is private memory (unique). PSS is the best system-wide attribution metric because it shares shared pages proportionally.
4) The OOM killer killed my biggest process. Does that mean it was the cause?
Not necessarily. It was the most “killable” process by the heuristic at that moment. The cause might be aggregate pressure, a cgroup limit, or unreclaimable kernel memory growth.
5) How do I know if it’s a cgroup OOM?
Look in dmesg for “Memory cgroup out of memory” and check /sys/fs/cgroup/.../memory.events. If oom_kill increments there, it’s a cgroup issue.
6) Why is swap usage non-zero even when I have free RAM?
The kernel may move cold anonymous pages to swap to keep more file cache hot. Non-zero swap isn’t automatically bad. Active swap-in/out rates (and latency impact) are what matter.
7) What usually causes slab memory to explode?
Filesystem metadata churn (dentries/inodes), network tracking structures, and sometimes kernel bugs. slabtop tells you which cache is growing; that points to the subsystem.
8) My container shows low memory usage inside, but the host says it’s huge. Who’s right?
They might both be “right” because they’re reporting different scopes and metrics. For container debugging, trust cgroup files: memory.current and memory.stat. For host debugging, use meminfo plus PSS across processes.
9) How do I detect a memory leak quickly without profiling tools?
Look for monotonic growth in Private_Dirty (via /proc/PID/smaps_rollup) and increasing PSS over time, correlated with uptime or requests. Then mitigate with restart/rollback and implement a cap if possible.
10) Is it ever correct to add more RAM?
Yes—when the working set is legitimately larger than the machine and the cost of optimization exceeds the cost of RAM. But confirm with PSS and pressure signals first; don’t pay hardware bills to cover a leak.
Conclusion: next steps that actually reduce pages
When you’re trying to answer “what’s eating RAM,” don’t stare at a single number. Use a sequence that forces the truth: pressure signals, bucket breakdown, then attribution with the right metric (PSS), then scope (cgroups vs host), then kernel memory if user space doesn’t add up.
Practical next steps:
- Update alerts to prioritize MemAvailable, swap activity, and cgroup OOM kills—not raw “used.”
- Add a lightweight memory snapshot script to your incident toolkit (meminfo + top PSS + cgroup stats + slab summary).
- Right-size memory limits using observed memory.peak and PSS baselines, then add headroom.
- For repeat offenders: implement caps for in-app caches, and treat any monotonic private dirty growth as a regression until proven otherwise.
If you do nothing else: next time you’re paged, read /proc/meminfo before you form an opinion. It’s harder to argue with the kernel when it’s giving you line items.