Ubuntu 24.04: Swap on SSD — do it safely (and when you shouldn’t) (case #50)

Swap is one of those Linux features everyone thinks they understand—right up until a box starts “working” at 0.2 load while every command takes 30 seconds.
On Ubuntu 24.04, the defaults are sensible for laptops and tolerable for most servers, but “swap on SSD” is still a decision with sharp edges.

This is the production version: what swap actually buys you, how SSD changes the tradeoffs, what to check when performance falls off a cliff,
and the exact commands to make changes without turning your filesystem into an incident report.

What swap is (and what it isn’t)

Swap is disk-backed virtual memory. It’s a pressure-release valve: when RAM is tight, the kernel can evict cold pages out of RAM,
freeing space for hot pages. Done well, swap turns “process got killed” into “process got slower.”
Done poorly, swap turns “slow” into “unresponsive.”

On Ubuntu 24.04, you’ll usually have a swapfile at /swap.img created by the installer and referenced in /etc/fstab (systemd generates the swap unit from that entry at boot).
You might also have zram (compressed RAM swap) depending on flavor and configuration. Each has different failure modes.

The mental model you need

  • Swap is not extra RAM. It’s a fallback with very different latency.
  • Swap is not only for emergencies. It can be used proactively to keep cache behavior efficient, especially on mixed workloads.
  • Swap is not the cause of your memory leak. It’s just where the leak goes to die slowly.

SSD changes the game because swap I/O becomes “less awful,” not “fast.”
NVMe can do impressive IOPS, but swapping is often small random reads during page faults—latency matters, queueing matters,
and cgroup limits matter. A fast SSD can still be the slowest thing in the room.
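
If you want a number instead of a feeling, probe small random-read latency on the filesystem that will host the swapfile. A minimal sketch with fio (install it first; the directory, file size, and runtime here are illustrative):

cr0x@server:~$ sudo apt install -y fio
cr0x@server:~$ fio --name=swapprobe --directory=/var/tmp --size=256M --rw=randread --bs=4k --iodepth=1 --direct=1 --time_based --runtime=15 --group_reporting

Look at the clat percentiles in the output: tens to hundreds of microseconds is healthy NVMe; consistent milliseconds means the device is already contended, and swap will feel it.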

One quote that operations people internalize early:
“Hope is not a strategy.” (General Gordon R. Sullivan)

If your plan is “enable swap and hope,” you’ll get the swap you deserve.

Facts and history that matter

  1. Swap predates Linux. Paging and swap strategies were core to UNIX time-sharing systems long before consumer PCs had enough RAM.
  2. Linux uses swap more intelligently than people assume. It can swap out anonymous memory while keeping file cache hot, depending on pressure and tunables.
  3. The “swap hurts SSD” fear is historically rooted in early flash. Early consumer SSDs had weaker controllers and endurance; modern NVMe drives are far more robust.
  4. Ubuntu shifted toward swapfiles years ago. Swap partitions are still valid, but swapfiles are easier to resize and deploy at scale.
  5. Hibernate is swap’s special case. Hibernation wants a swap area large enough to hold RAM contents, and swapfiles require extra boot configuration.
  6. cgroups changed the meaning of “out of memory.” A container can OOM-kill while the host has free RAM, and swap may be disabled at the cgroup layer (a quick check follows this list).
  7. zram became popular because CPU got cheap. Compressing cold pages in RAM can beat going to disk, especially on laptops and small VMs.
  8. NVMe made swap “possible” on high-throughput systems. It won’t make swap fast, but it can stop it from being catastrophically slow like spinning disks.
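
To make item 6 concrete: here is a quick way to see what a single service’s cgroup is allowed to do, assuming cgroup v2 (the unit name my-app.service and the values shown are hypothetical):

cr0x@server:~$ cat /sys/fs/cgroup/system.slice/my-app.service/memory.max
4294967296
cr0x@server:~$ cat /sys/fs/cgroup/system.slice/my-app.service/memory.swap.max
0

A memory.swap.max of 0 means that service cannot touch swap no matter what the host has enabled. systemctl show my-app.service -p MemoryMax -p MemorySwapMax answers the same question through systemd.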

Joke #1: Swap is like an airport moving walkway—helpful if you’re already walking, embarrassing if you’re using it to haul a piano.

Should you put swap on SSD?

My opinionated default

On Ubuntu 24.04 with an SSD, you should generally keep some swap enabled—unless you have a specific reason not to.
“Some” means a few gigabytes for most servers, and enough for hibernate if you actually use hibernate.

Swap on SSD is good for

  • Preventing sudden OOM kills on bursty workloads (package builds, log indexing, JVM warmups).
  • Keeping the system stable under memory pressure so you can log in, inspect, and fix the real issue.
  • Reducing tail-risk when you run many services and one misbehaves.
  • Overcommit sanity on hosts that use memory overcommit responsibly (and have monitoring).

Swap on SSD is bad for

  • Latency-sensitive, real-time-ish services where occasional page faults are unacceptable.
  • Systems already I/O bound (busy database disks, saturated NVMe queues, storage nodes doing real work).
  • “We refuse to size RAM properly” cultures. Swap will become a crutch, then a bill.

The key question

Are you trying to survive a bad day, or trying to save money by underprovisioning memory permanently?
Swap is decent at the first and terrible at the second.

Fast diagnosis playbook

When a host feels “frozen,” you don’t have time for philosophy. You need a short path to the bottleneck.
Here’s the order that tends to work in real production.

1) Confirm you’re swapping (and how badly)

  • If swap usage is near zero, this isn’t a swap incident. Move on to CPU, I/O, or locks.
  • If swap usage is high and growing, you’re in memory pressure and likely approaching thrash.
  • If swap usage is stable but the machine is slow, you might have swap-in storms (page faults) rather than continuous swap-out.

2) Decide whether the bottleneck is disk I/O or CPU

  • High wa (I/O wait) and high swap-in/out rates usually mean the storage path is the limiter.
  • Low I/O wait but heavy CPU with zram compression can mean you’re paying CPU to avoid disk I/O.

3) Identify the process class causing the pressure

  • Look for one runaway (memory leak) versus many normal processes combined (underprovisioned RAM).
  • On servers, check cgroup limits first; the host may be fine.

4) Act: stabilize, then fix

  • If it’s thrashing: reduce load, stop the offender, or temporarily add swap/zram only if it restores control (see the emergency sketch after this list).
  • If it’s chronic: add RAM or reduce working set. Tuning swappiness is not a substitute for capacity planning.
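
If a temporary swapfile really is the least-bad option, a minimal sketch (the path and size are illustrative; put it on the least contended filesystem you have):

cr0x@server:~$ sudo fallocate -l 2G /swap-emergency.img
cr0x@server:~$ sudo chmod 600 /swap-emergency.img
cr0x@server:~$ sudo mkswap /swap-emergency.img
cr0x@server:~$ sudo swapon /swap-emergency.img

Without an explicit priority it lands below your existing swap, which is what you want. Remove it (swapoff, then delete the file) once the incident is over, or it quietly becomes permanent.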

Practical tasks (commands, outputs, decisions)

These are the exact checks and interventions I run on Ubuntu 24.04 hosts. Each task includes a realistic output snippet,
what it means, and the decision you make.

Task 1: See what swap exists and whether it’s active

cr0x@server:~$ swapon --show --bytes
NAME       TYPE SIZE        USED PRIO
/swap.img  file 8589934592  0    -2

Meaning: You have an 8 GiB swapfile and it’s enabled, currently unused.

Decision: If you’re troubleshooting slowness and USED is near SIZE, you’re likely in memory pressure. If USED is 0, swap isn’t the culprit.

Task 2: Quick RAM + swap status snapshot

cr0x@server:~$ free -h
               total        used        free      shared  buff/cache   available
Mem:            31Gi        25Gi       1.2Gi       1.0Gi       5.0Gi       3.0Gi
Swap:          8.0Gi       6.5Gi       1.5Gi

Meaning: Swap is heavily used and available RAM is low. This is a “slow now, worse later” posture.

Decision: Check whether the system is actively swapping (Task 3). If it’s not actively swapping, this might be historical swap usage that hasn’t been reclaimed.

Task 3: Detect active swapping vs old swapped-out pages

cr0x@server:~$ vmstat 1 5
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  0 6815744  93248  90112 2840000 120  240   300   600  800 1200 12  5 70 13  0
 3  1 6897664  84520  89400 2815000 800 1200  2100  3300 1400 2100 10  6 55 29  0
 1  1 7012352  70212  88000 2780000 600  900  1600  2400 1300 1900  9  5 58 28  0
 1  0 7012352  76000  88500 2779000   0    0   200   300  900 1400  8  4 84  4  0
 1  0 7012352  77000  88500 2778000   0    0   150   250  850 1300  7  4 85  4  0

Meaning: si/so (swap-in/swap-out) spikes show active thrash during the first seconds, then calms.

Decision: If spikes correlate with latency, you need to reduce memory pressure or I/O contention. If it’s consistently high, you’re in sustained thrash.

Task 4: Check kernel swappiness and cache pressure knobs

cr0x@server:~$ sysctl vm.swappiness vm.vfs_cache_pressure
vm.swappiness = 60
vm.vfs_cache_pressure = 100

Meaning: Default-ish behavior: swap is allowed as pressure rises.

Decision: For many servers, consider vm.swappiness=10 or 20 if you want to strongly prefer keeping anonymous pages in RAM—but only after confirming you’re not masking an undersized RAM problem.
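
A runtime change is applied instantly and is instantly reversible, which makes it a safe experiment. A minimal sketch (20 is an example value, not a universal recommendation):

cr0x@server:~$ sudo sysctl vm.swappiness=20
vm.swappiness = 20

This does not survive a reboot; Pattern A later in this article covers the persistent version.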

Task 5: See if the system is OOM-killing

cr0x@server:~$ journalctl -k -g "Out of memory" -n 5
Dec 30 09:12:21 server kernel: Out of memory: Killed process 21877 (java) total-vm:18233400kB, anon-rss:12984320kB, file-rss:12288kB, shmem-rss:0kB, UID:1001 pgtables:36244kB oom_score_adj:0

Meaning: You’re not just swapping; you’re still hitting OOM. Swap isn’t large enough or the memory demand is spiking faster than the system can recover.

Decision: Fix the offending service or increase memory. Increasing swap can buy time, but if the working set truly exceeds RAM, performance will suffer.

Task 6: Identify top memory consumers quickly

cr0x@server:~$ ps -eo pid,comm,rss,vsz,pmem --sort=-rss | head
  PID COMMAND           RSS      VSZ %MEM
21877 java          12984320 18233400 41.2
 8421 postgres       2845120  5120000  9.0
 3112 node            980000  1600000  3.1
 1991 snapd            210000   980000  0.6

Meaning: One JVM is dominating resident memory. That’s likely the pressure source.

Decision: Validate whether it’s expected (cache) or leak/growth. For Java, inspect heap settings; for anything, check recent deploys.

Task 7: Check per-process swap usage (who’s actually swapped out)

cr0x@server:~$ sudo smem -rs swap | head
  PID User     Command                         Swap      USS      PSS      RSS
 8421 postgres /usr/lib/postgresql/16/bin/...  2200M    850M    910M   1200M
 3112 www-data node /srv/app/server.js         1100M    420M    480M    600M
 1991 root     /usr/lib/snapd/snapd             210M     80M     90M    140M

Meaning: Postgres and node have large swapped-out footprints. That usually implies they were idle at some point or the kernel chose them as eviction candidates.

Decision: If those processes are latency-critical, consider adjusting swappiness, memory limits, or ensuring enough RAM. If they’re batchy, this may be acceptable.

Task 8: Measure real storage pressure and queueing

cr0x@server:~$ iostat -xz 1 3
Linux 6.8.0-41-generic (server) 	12/30/2025 	_x86_64_	(8 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          10.12    0.00    4.50   28.20    0.00   57.18

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   w_await wareq-sz aqu-sz  %util
nvme0n1         980.0   18000.0     0.0   0.00    7.20    18.37   420.0   12000.0   12.40    28.57   8.10  98.50

Meaning: The NVMe is nearly saturated (%util ~ 98%) and awaits are climbing. If swap is active, it’s competing with everything else.

Decision: If your swap sits on the same device as your database, that’s a knife fight in a phone booth. Consider moving swap to a less contended device, or fix memory sizing.

Task 9: Confirm whether swap is a file and where it lives

cr0x@server:~$ ls -lh /swap.img
-rw------- 1 root root 8.0G Dec 30 08:52 /swap.img

Meaning: It’s a swapfile on the root filesystem.

Decision: Ensure the filesystem supports swapfiles safely (ext4 does; btrfs needs special handling). If root is on encryption/LVM, that’s okay, but understand boot implications for hibernate.

Task 10: Verify filesystem type (swapfile safety check)

cr0x@server:~$ findmnt -no FSTYPE /
ext4

Meaning: ext4 is swapfile-friendly.

Decision: Proceed normally. If you see btrfs, stop and follow btrfs-specific swapfile rules (no CoW, contiguous extents).
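
For btrfs, recent btrfs-progs include a helper that handles the no-CoW and preallocation rules for you. A minimal sketch, assuming btrfs-progs 6.1 or newer:

cr0x@server:~$ sudo btrfs filesystem mkswapfile --size 8g /swap.img
cr0x@server:~$ sudo swapon /swap.img

Keep the file out of snapshotted subvolumes: a subvolume holding an active swapfile cannot be snapshotted.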

Task 11: Check TRIM/discard status (SSD housekeeping)

cr0x@server:~$ systemctl status fstrim.timer --no-pager
● fstrim.timer - Discard unused blocks once a week
     Loaded: loaded (/usr/lib/systemd/system/fstrim.timer; enabled; preset: enabled)
     Active: active (waiting) since Mon 2025-12-29 10:00:00 UTC; 23h ago
    Trigger: Mon 2026-01-05 10:00:00 UTC; 5 days left

Meaning: Weekly TRIM is enabled. Good: it helps SSD performance consistency over time.

Decision: Keep it. Don’t add continuous discard mount options just to feel productive; scheduled TRIM is usually the boring correct choice.

Task 12: See memory pressure signals (PSI) and act like an adult

cr0x@server:~$ cat /proc/pressure/memory
some avg10=0.85 avg60=0.40 avg300=0.15 total=23840219
full avg10=0.20 avg60=0.08 avg300=0.02 total=4022191

Meaning: The system is regularly stalled on memory reclaim (some) and occasionally completely blocked (full).

Decision: This isn’t theoretical. You’re paying real latency. Either reduce working set, add memory, or add fast swap/zram as a mitigation while you fix the root cause.
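
PSI is trivial to alert on precisely because it is just a text file. A minimal sketch of a threshold check you could wrap in any monitoring shim (the 5.0 threshold is an arbitrary example):

cr0x@server:~$ awk -F'[ =]' '/^some/ { if ($3 > 5.0) print "memory pressure high: avg10=" $3 }' /proc/pressure/memory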

Task 13: Check whether zram is enabled (and how big)

cr0x@server:~$ swapon --show
NAME       TYPE      SIZE   USED PRIO
/swap.img  file      8G     2.1G   -2
/dev/zram0 partition 4G     1.3G  100

Meaning: zram is present with higher priority (100). The kernel will use zram before the SSD swapfile.

Decision: This is often good on general-purpose systems. If CPU is tight and compression costs are hurting, consider tuning or disabling zram—but only with evidence.

Task 14: Validate swap priority and adjust intentionally

cr0x@server:~$ cat /proc/swaps
Filename				Type		Size		Used		Priority
/dev/zram0                               partition	4194300	1365000		100
/swap.img                                file		8388604	2200000		-2

Meaning: zram will be favored, swapfile is fallback.

Decision: Good default: keep SSD swap as “second line of defense.” If you rely on SSD swap for hibernate, ensure priorities and sizes don’t block that plan.

Task 15: Temporarily disable swap to prove it’s the bottleneck (carefully)

cr0x@server:~$ sudo swapoff -a
swapoff: /swap.img: swapoff failed: Cannot allocate memory

Meaning: You don’t have enough free RAM to pull swapped pages back in. Swap is currently holding up the building.

Decision: Do not force this. Reduce memory usage first (stop services, lower load) or add RAM. Disabling swap in this state makes the OOM killer your new SRE.

Task 16: Check NVMe health and endurance signals (sanity, not superstition)

cr0x@server:~$ sudo smartctl -a /dev/nvme0n1 | grep -Ei "critical warning|percentage used|data units written"
Critical Warning:                   0x00
Percentage Used:                    3%
Data Units Written:                 1,234,567 [632 GB]

Meaning: Drive health looks fine; endurance usage is low. Swap I/O is not automatically a death sentence.

Decision: Focus on performance and stability. Don’t ban swap based on outdated SSD folklore; validate with health and write metrics.
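
For the write-volume half of that argument, the kernel counts swap traffic directly. A minimal sketch; the counters are pages since boot (4 KiB each on x86-64), and the values shown are illustrative:

cr0x@server:~$ grep -E '^pswp(in|out)' /proc/vmstat
pswpin 184233
pswpout 512876

Roughly 513k pages out is about 2 GiB written to swap since boot. Put that next to the drive’s rated endurance before declaring swap harmful.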

Safe setup patterns on Ubuntu 24.04

Pattern A: Keep the default swapfile, but tune behavior

For most systems, the simplest safe approach is: keep /swap.img, confirm it’s on ext4/xfs,
ensure TRIM is running, and tune swappiness conservatively.

Set swappiness persistently

cr0x@server:~$ echo 'vm.swappiness=20' | sudo tee /etc/sysctl.d/99-swappiness.conf
vm.swappiness=20
cr0x@server:~$ sudo sysctl --system | tail -n 3
* Applying /etc/sysctl.d/99-swappiness.conf ...
vm.swappiness = 20
* Applying /etc/sysctl.conf ...

Meaning: The kernel will try harder to keep anonymous memory in RAM and reclaim cache first.

Decision: Use 10–30 for many server workloads. Avoid setting 0 as a “no swap” flag; it’s not the same and can lead to surprising reclaim behavior.

Pattern B: Use zram as first line, SSD swap as second

If you run mixed workloads and want graceful degradation, zram buys you time without hammering the SSD immediately.
On modern CPUs, this is often a net win. On tiny CPUs or already CPU-bound boxes, it can be counterproductive.

Ubuntu might already provide zram via packages and presets depending on your installation. If you enable it yourself, keep it modest.
Over-allocating zram is how you turn “compression” into “why is the CPU fan screaming in a datacenter rack.”
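
A minimal sketch of a conservative setup using systemd-zram-generator, assuming the package is available on your 24.04 install (the size expression caps zram at half of RAM or 4 GiB, whichever is smaller; treat it as a starting point):

cr0x@server:~$ sudo apt install -y systemd-zram-generator
cr0x@server:~$ printf '[zram0]\nzram-size = min(ram / 2, 4096)\n' | sudo tee /etc/systemd/zram-generator.conf
cr0x@server:~$ sudo systemctl daemon-reload
cr0x@server:~$ sudo systemctl start systemd-zram-setup@zram0.service
cr0x@server:~$ swapon --show

The last command should now list /dev/zram0 alongside any existing swapfile.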

Pattern C: Dedicated swap partition on SSD (rarely necessary)

Swap partitions are still fine. They can be simpler for hibernate and avoid some filesystem corner cases.
But for most production fleets, swapfiles are easier to resize and manage with configuration tooling.

Resizing a swapfile safely

The safe sequence is: disable swapfile, resize, set permissions, format swap signature, re-enable, validate.
The unsafe sequence is “truncate a live swapfile and pray.”

cr0x@server:~$ sudo swapon --show
NAME      TYPE SIZE USED PRIO
/swap.img file 8G   0B   -2
cr0x@server:~$ sudo swapoff /swap.img
cr0x@server:~$ sudo fallocate -l 16G /swap.img
cr0x@server:~$ sudo chmod 600 /swap.img
cr0x@server:~$ sudo mkswap /swap.img
Setting up swapspace version 1, size = 16 GiB (17179865088 bytes)
no label, UUID=2b4b2b0a-3baf-4c5d-9b7e-8e8d7d0f5b43
cr0x@server:~$ sudo swapon /swap.img
cr0x@server:~$ swapon --show
NAME      TYPE SIZE  USED PRIO
/swap.img file 16G   0B   -2

Meaning: Swap is resized and active.

Decision: Pick a size that matches your goal: crash survivability and operational control, not permanent RAM replacement.

SSD-specific hygiene that actually matters

  • Avoid contended devices: don’t place swap on the same heavily used NVMe as your primary database storage if you can avoid it.
  • Keep TRIM working: weekly fstrim.timer is enough for most.
  • Don’t obsess about wear without evidence: check SMART/NVMe stats and write amplification assumptions before you declare swap “unsafe.”
  • Use priorities: if you have multiple swap backends (zram + SSD swapfile), set priorities intentionally (example below).
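
A minimal sketch of setting priorities deliberately, live and persistent (the values mirror the zram-first layout from Task 14):

cr0x@server:~$ sudo swapoff /swap.img
cr0x@server:~$ sudo swapon -p 10 /swap.img
cr0x@server:~$ grep swap /etc/fstab
/swap.img	none	swap	sw,pri=10	0	0

Higher number wins: zram at 100 absorbs pressure first, and the SSD file at 10 catches the overflow. Remember that swapoff needs enough free RAM to succeed (see Task 15).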

When you should not use swap on SSD

There are real cases where swap on SSD is the wrong move. Not because SSDs are delicate flowers,
but because swap I/O in the wrong place becomes a system-wide outage amplifier.

1) Storage nodes and database servers with saturated disks

If your NVMe is already busy serving database reads/writes or storage replication, swap competes in the same queues.
Under memory pressure, latency spikes cause more timeouts, more retries, more load, more swap. You get the spiral.

In these systems, the more correct answer is: size RAM for the working set, set memory limits, and keep swap minimal—sometimes even disabled—if you can guarantee you won’t need it.
That last clause is where most teams lie to themselves.

2) Ultra-low-latency services

If you run services with hard latency SLOs and spiky access patterns, page faults from swap can violate SLOs in ways that are hard to reproduce.
Your metrics will show “one weird spike” that ruins the day.

3) Misconfigured hibernate expectations

If you “need hibernate” but you don’t configure resume for swapfiles, you’ll eventually test hibernate at the worst moment.
And it won’t come back.

4) Encrypted root with complex early-boot constraints (edge cases)

Swap on encrypted root is fine for normal swapping. Hibernate resume is where complexity enters: initramfs needs to find the swap area early.
If you don’t want to own that complexity, don’t claim you need hibernate.
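
If you do choose to own it, the resume offset for an ext4 swapfile comes from its first physical extent. A minimal sketch (the offset shown is illustrative):

cr0x@server:~$ sudo filefrag -v /swap.img | awk '$1=="0:" {print $4}'
34816..

Strip the trailing dots, then add resume=UUID=<UUID of the filesystem holding the swapfile> resume_offset=34816 to the kernel command line and run update-initramfs -u. Then actually test hibernate, twice, before anyone relies on it.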

Joke #2: Nothing says “Friday afternoon” like discovering your “temporary swap tweak” was baked into the golden image six months ago.

Common mistakes: symptom → root cause → fix

1) Symptom: host is “up” but SSH takes 20–60 seconds

Root cause: swap thrash (high page fault latency) and the CPU is stuck in reclaim; interactive shells page-fault constantly.

Fix: Confirm with vmstat (si/so) and PSI. Reduce load immediately (stop the offender), then add RAM or cap memory usage. Consider zram as a mitigation.

2) Symptom: swap is full, but si/so is near zero

Root cause: historical swap use; the swapped pages are cold and not being referenced. Not an active incident by itself.

Fix: If performance is fine, do nothing. If you want to reclaim swap, you can swapoff/swapon during a maintenance window only if there is enough free RAM.

3) Symptom: OOM kills happen even with plenty of swap

Root cause: cgroup memory limits (containers), or rapid allocation bursts where reclaim can’t keep up, or oom_score_adj policies that target the wrong process.

Fix: Check cgroup limits, container runtime swap settings, and kernel logs. Fix the memory consumer or raise the limit; don’t just grow swap.

4) Symptom: NVMe %util pinned at ~100% and everything slows

Root cause: swap I/O competing with primary workload I/O, often on the same device.

Fix: Move swap to a less contended device or reduce swapping by adding RAM / lowering working set. If possible, isolate I/O classes with cgroups and I/O controllers, but don’t expect miracles.

5) Symptom: enabling zram makes the system slower

Root cause: CPU-bound workload; compression overhead steals cycles from the app. Or zram is oversized and causes churn.

Fix: Measure CPU time and load. Reduce zram size or disable zram on that host class; rely on SSD swap as fallback if appropriate.

6) Symptom: swapfile exists but isn’t used at all

Root cause: swap is disabled, wrong /etc/fstab entry, or systemd unit not active.

Fix: Run swapon --show. Ensure /etc/fstab includes the swapfile and that permissions are correct (0600), then enable swap.
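
The stock entry is a single line. A minimal sketch of checking and re-enabling, assuming the installer default:

cr0x@server:~$ grep swap /etc/fstab
/swap.img	none	swap	sw	0	0
cr0x@server:~$ sudo swapon -a
cr0x@server:~$ swapon --show
NAME      TYPE SIZE USED PRIO
/swap.img file 8G   0B   -2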

7) Symptom: hibernate fails or resumes into a reboot

Root cause: resume offset not configured for swapfile, or swap not large enough, or early boot can’t access the swap backing device.

Fix: If you must hibernate, use a swap partition or configure resume properly for swapfile and rebuild initramfs. Otherwise, stop pretending hibernate is a requirement.

Checklists / step-by-step plan

Checklist A: Decide whether swap on SSD is acceptable for this host

  1. Is the workload latency-critical with hard tail SLOs? If yes, keep swap minimal and focus on RAM sizing and limits.
  2. Is the SSD heavily used for primary workload I/O? If yes, swap competes and may amplify incidents.
  3. Do you need hibernate? If yes, plan swap size and resume configuration intentionally.
  4. Do you run containers with strict memory limits? If yes, validate cgroup swap settings; host swap won’t save a container that’s configured to not use it.
  5. Do you have monitoring for memory pressure (PSI), swap activity, and disk latency? If no, you’re flying by instruments you didn’t install.

Checklist B: Deploy a safe swapfile on Ubuntu 24.04 (ext4 root)

  1. Confirm filesystem type: findmnt -no FSTYPE / should be ext4/xfs.
  2. Check current swap: swapon --show.
  3. If resizing, ensure swap usage is low and RAM has headroom; then swapoff the file.
  4. Create/resize with fallocate or dd (fallocate is fine on ext4).
  5. Set permissions: chmod 600.
  6. Format swap: mkswap.
  7. Enable: swapon.
  8. Persist in /etc/fstab (or rely on Ubuntu’s existing configuration if present).
  9. Validate with free -h and swapon --show.

Checklist C: Stabilize a thrashing host (get control back)

  1. Confirm thrash: vmstat 1 and cat /proc/pressure/memory.
  2. Identify offender: ps/smem, then application-level metrics.
  3. Reduce load: stop the non-essential worker, pause batch jobs, scale down concurrency.
  4. Free memory safely: restart the offender if it leaks, or drop caches only if you understand the impact (usually you shouldn’t).
  5. Only then consider changing swap/zram size as a short-term mitigation.
  6. After stability: fix root cause (memory leak, wrong limits, underprovisioning, or noisy neighbor).

Three corporate mini-stories (anonymized, plausible, technically painful)

Incident caused by a wrong assumption: “Swap is disabled in containers, so the host doesn’t matter.”

A team ran a set of Java and Node services on Ubuntu hosts under a container runtime. They believed swap was “a container concern,”
and that disabling swap at the host level was harmless because “containers have memory limits anyway.”
They disabled swap across an environment to “make performance more predictable.”

The first sign of trouble was not performance. It was uptime. Periodic spikes—deploy waves, cache warmups, a few customer batch jobs—
started causing OOM kills. Not graceful restarts, either. The kernel killed the biggest thing it could find in the cgroup at the moment,
which occasionally was a sidecar doing logging and occasionally was the main API process.

The wrong assumption was subtle: the cgroup memory limits were set, but the bursts were legitimate.
Without swap, there was no buffer to absorb transient peaks. The team had tuned JVM heaps close to the limits and assumed the rest of memory
(native allocations, mmap, JIT overhead) would behave.

Once swap was reintroduced—modestly, with conservative swappiness—OOM events dropped sharply. Performance didn’t become “unpredictable.”
It became survivable. The real fix, later, was better memory headroom and more realistic container limits.
Swap was not the solution. It was the seatbelt.

Optimization that backfired: “Put swap on the fastest NVMe and crank swappiness for better cache.”

Another org had shiny NVMe everywhere and a culture that valued clever tuning. Someone proposed: put swap on NVMe,
set swappiness high, and let the kernel push cold pages out so file cache can grow. For a while, it looked good in benchmarks.

Then reality arrived: a mixed workload host running a database, a metrics stack, and some batch jobs. Under a daily batch run,
memory pressure rose and the kernel started swapping. The NVMe, already busy with database writes and compactions, hit saturation.
Latency spiked. Database queries slowed. Retries increased load. The system began swapping more because more processes were waiting longer,
keeping memory tied up. Classic feedback loop.

The worst part: the graphs were confusing. CPU wasn’t pinned. Network was fine. The team stared at application metrics
while the host quietly queued I/O like it was collecting stamps.

The eventual fix was unsexy: lower swappiness, isolate the batch job to a different node class, and stop putting swap in the same I/O blast radius
as the database. The lesson wasn’t “swap is bad.” The lesson was “swap is I/O,” and I/O is a shared resource whether you acknowledge it or not.

Boring but correct practice that saved the day: “We kept small swap, zram, and pressure alerts.”

A platform team operated a fleet of Ubuntu 24.04 VMs with modest RAM. They had a simple standard:
a small SSD-backed swapfile, zram enabled at a conservative size, and alerts on memory PSI and swap-in rates.
No heroics. No “swap is always evil” arguments. Just defaults with guardrails.

One day, a vendor agent update introduced a memory leak. The agent wasn’t mission critical, but it ran everywhere.
Over several hours, memory pressure crept up. On hosts without swap, this kind of leak tends to produce sudden OOM events and cascading restarts.

On this fleet, the hosts degraded slowly: PSI started climbing, swap-in rates rose, and the alert fired while the systems were still reachable.
Engineers logged in, saw the offender, and rolled back the agent package.
The “fix” was straightforward because the incident was observable and the systems stayed interactive long enough to act.

Swap didn’t prevent the leak. Monitoring didn’t prevent the leak. The combination prevented a bad deploy from becoming a fleet-wide outage.
This is what reliability looks like most days: boring, measurable, and quietly effective.

FAQ

1) Is swap on SSD safe for the drive?

Usually yes. Modern SSDs have wear-leveling and endurance far beyond early consumer drives. Check NVMe SMART metrics and your write volumes.
If you are swapping constantly, you have a capacity problem, not a “swap problem.”

2) How much swap should I use on Ubuntu 24.04?

For servers: often 2–8 GiB as a safety buffer, more if you have occasional spikes and can tolerate slowdowns. For desktops/laptops: enough to smooth spikes,
and enough for hibernate if you use it. If you need hibernate, size swap roughly to RAM (exact needs vary with compression and workload).

3) Swapfile or swap partition?

Swapfile is easier to resize and automate. Swap partition can be simpler for hibernate and avoids filesystem-specific constraints.
On ext4, swapfile is a solid default.

4) Should I set swappiness to 1 or 0?

Not as a reflex. Values like 10–30 are reasonable for many servers to prefer RAM. Setting 0 can behave differently than “almost never swap”
and can cause unpleasant reclaim patterns under pressure.

5) Why is swap used even when I have “free” RAM?

Because “free” isn’t the only goal. The kernel balances anonymous memory and file cache, and it may keep cache hot while pushing out cold anonymous pages.
Look at available in free -h, not just free.

6) Does zram replace SSD swap?

It complements it. zram is fast-ish and avoids disk I/O, but it uses CPU and still consumes RAM (compressed). SSD swap is slower but provides larger backing store.
A common pattern is zram high priority, SSD swap lower priority.

7) Can I disable swap entirely?

Yes, but do it only if you’ve tested memory spikes, you have correct memory limits, and you accept that OOM kills will happen sooner and more abruptly.
Disabling swap does not fix memory leaks or undersized nodes. It just changes the failure mode.

8) How do I know if swap is causing my latency?

Look for active swap-in/out in vmstat, elevated memory PSI, and disk latency/queueing in iostat.
High swap usage alone isn’t proof; high swap activity correlated with stalls is.

9) Will moving swap to a different SSD help?

It can, if the current device is contended. Swap performance is mostly about latency under queueing.
Putting swap on the same busy device as your database can magnify the worst moments.

Next steps you can actually do today

If you run Ubuntu 24.04 on SSD, the sane baseline is: keep a modest swapfile, ensure TRIM is enabled,
and tune swappiness only after you’ve measured real memory pressure. Add zram if it helps your host class, not because it’s trendy.

Do these three things next:

  1. Measure: run swapon --show, vmstat 1, iostat -xz 1, and check /proc/pressure/memory during a slow period.
  2. Decide: if you’re thrashing, fix the working set (limits, leaks, capacity). If you’re just carrying cold pages, don’t overreact.
  3. Harden: keep swap modest, set a conservative swappiness (often 10–30 on servers), and alert on memory pressure before users alert on you.

Swap on SSD isn’t a sin. It’s a tool. Use it like one: with measurements, with clear goals, and with a plan for what happens when it’s not enough.
