Somewhere in your org is a spreadsheet cell that quietly assumes “this limit won’t matter.”
It might be a queue depth, a default JVM heap, an inode count, a NAT table size, a writeback cache, or a “temporary” 10 GB volume.
Nobody remembers why it’s that number. Everyone treats it as physics.
That’s how you end up on a 03:00 incident call, arguing with graphs that look like a lie.
The “640 KB is enough” quote survives because it flatters our worst habit: believing today’s constraints are permanent and tomorrow’s demand is negotiable.
The myth, why it’s sticky, and what it teaches
The quote usually goes like this: “640 KB ought to be enough for anybody.” It’s typically attributed to Bill Gates, placed somewhere in the early PC era,
and used as a punchline about arrogance, shortsightedness, or how fast technology changes.
There’s a problem: there’s no solid evidence he said it. The attribution is shaky, the timeframe is fuzzy, and the quote tends to show up in print long after the fact.
The story persists anyway because it’s useful. It compresses a complicated history of hardware, operating systems, and business tradeoffs into one sneer-worthy sentence.
Engineers love a clean moral. Managers love a clean villain. And everyone loves a quote you can deploy in a meeting like a smoke grenade.
But production systems don’t fail because someone said a dumb thing. They fail because a limit existed, was misunderstood, and then got treated as a constant.
Here’s the point you should keep: 640 KB wasn’t a belief about the future; it was a boundary created by design choices and compatibility pressure.
The modern equivalent isn’t “someone thought RAM wouldn’t grow.” It’s “we don’t know which limit is real, which is a default, and which is a landmine.”
First short joke: The “640 KB is enough” quote is like a zombie incident ticket—nobody knows who created it, but it keeps reopening itself.
Facts and context: what 640 KB actually was
To understand why this myth clings to the timeline, you need the boring details. The boring details are where outages come from.
Here are concrete context points that matter, without the cosplay.
8 facts that explain the 640 KB boundary (and why it wasn’t random)
- The original IBM PC used the Intel 8088, whose addressing model and early PC architecture made 1 MB of address space a natural ceiling for that era. The “1 MB limit” wasn’t a vibe; it was structural.
- Conventional memory was the first 640 KB (0x00000–0x9FFFF). Above that lived reserved space for video memory, ROM, and hardware mappings. That reserved region is why “640 KB” appears as a clean number.
- The upper memory area (UMA) existed for a reason: video adapters, BIOS ROMs, and expansion ROMs needed address space. PC compatibility wasn’t optional; it was the product.
- MS-DOS ran in real mode, which meant it lived with that conventional memory world. You can shout at history, but the CPU still does what the CPU does.
- Expanded memory (EMS) and extended memory (XMS) were workarounds: EMS bank-switched memory into a page frame; XMS used memory above 1 MB with a manager. Both were complexity taxes paid for compatibility.
- HIMEM.SYS and EMM386.EXE were common tools to access and manage memory beyond conventional limits. If you ever “optimized” CONFIG.SYS and AUTOEXEC.BAT, you were doing capacity planning with a text editor and prayer.
- Protected mode existed, but software ecosystems lagged. Hardware capability doesn’t instantly rewrite the world; the installed base and compatibility matrix decide what you can ship.
- That era was full of tight constraints, but also rapid change. People weren’t stupid; they were building systems where every kilobyte had a job. The myth survives because we misread constraint as arrogance.
The useful takeaway: the number “640 KB” came from an address space map and pragmatic engineering choices, not a declaration that users would never want more.
It’s the difference between “this is the box we can draw today” and “this box will always be sufficient.”
The real lesson: limits are decisions, not trivia
I don’t care who said what in 1981. I care that in 2026, teams still ship systems with invisible ceilings and then act surprised when they hit them.
The “640 KB” story is a mirror: it shows us what we’re currently hand-waving away.
What “640 KB” looks like in modern production
- Default quotas (Kubernetes ephemeral storage, cloud block volume sizes, per-namespace object limits) treated as if they were policy.
- Kernel defaults (somaxconn, nf_conntrack_max, fs.file-max) left untouched because “Linux knows best.”
- Filesystem limits (inodes, directory scaling behaviors, small file overhead) ignored until “df says there’s space.”
- Cache assumptions (“more cache always faster”) that turn into memory pressure, eviction storms, and tail latency spikes.
- Queueing and backpressure that don’t exist, because someone wanted “simplicity.”
A single quote is comforting; a limit inventory is useful
The myth thrives because it gives you a villain. Villains are easy. Limits are work.
If you run production systems, your job is to know the limits before your users do.
Here’s a paraphrased idea from a notable reliability voice, because it’s the opposite of the 640 KB myth:
paraphrased idea — John Allspaw: Reliability comes from learning and adapting systems, not blaming individuals for outcomes.
Treat “640 KB is enough” as a diagnostic prompt: where are we relying on a historical artifact, a default setting, or a half-remembered constraint?
Then go find it. Write it down. Test it. Put alerts on it. Make it boring.
Three corporate mini-stories from the land of “it’ll be fine”
Mini-story 1: An incident caused by a wrong assumption (“disk full can’t happen; we have monitoring”)
A mid-sized SaaS company ran a multi-tenant Postgres cluster with logical replication into a reporting system.
The primary DB had plenty of free space, and dashboards showed “disk usage stable.” Everyone slept well.
One night, writes slowed, then stalled. Application error rates climbed. The on-call saw the DB was “healthy” by their usual checks:
CPU fine, RAM fine, replication lag rising but not catastrophic. The cluster didn’t crash; it just stopped making forward progress in a way that felt like molasses.
Root cause: the WAL volume filled. Not the main data volume. The WAL mount had a different size, different growth behavior, and a different alert threshold.
The “disk usage stable” dashboard looked at the data filesystem. It never looked at the WAL mount, because someone assumed “it’s on the same disk.”
Worse: the cleanup process that should have removed old WAL segments relied on replication slots. A stuck consumer held slots open.
So WAL grew until it hit the mount limit. The database did exactly what it should do when it can’t safely persist: it stopped accepting work.
The fix was straightforward—resize the mount, add alerts, unstick the consumer, and set sane retention policies. The uncomfortable lesson was not.
The team hadn’t missed a complex failure mode. They’d missed a basic inventory item: what volumes exist, what fills them, and how quickly.
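If you inherit a setup shaped like this one, there is a cheap check worth running before you trust any dashboard: ask Postgres how much WAL each replication slot is holding back. A minimal sketch, assuming a reasonably recent Postgres where the standard pg_replication_slots view and WAL LSN functions are available:
cr0x@server:~$ sudo -u postgres psql -x -c "SELECT slot_name, active, restart_lsn, pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal FROM pg_replication_slots;"
An inactive slot with a large retained_wal is exactly the shape of this incident. Pair that query with an alert on the WAL mount itself, not just the data volume.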
Mini-story 2: An optimization that backfired (“we’ll use huge caches; memory is cheap”)
A payments service had latency issues during peak traffic. The team optimized: more caching in-process, larger connection pools,
and aggressive read-through caches for frequently accessed metadata. Latency improved in staging. The deploy went out with confidence.
In production, tail latency improved for a few hours. Then things got weird. P99 climbed, CPU usage spiked, and error rates became bursty.
The service didn’t look overloaded—until you checked major page faults and reclaim activity. The kernel was fighting for its life.
The optimization created memory pressure and caused the kernel to reclaim file cache aggressively. That meant more disk reads for dependencies.
It also pushed the JVM (yes, this was Java) into a GC posture that looked like a sawtooth of regret.
The service had become “fast on average” and “unpredictable when it mattered,” which is the worst kind of fast.
They rolled back cache sizes, added an explicit memory budget, and moved some cache responsibility to a dedicated tier that could be scaled separately.
The long-term fix included per-endpoint SLOs and load tests that modeled peak cardinality and cache churn—not just steady-state QPS.
The lesson: “memory is cheap” is not an engineering argument. Memory is a shared resource that interacts with IO, GC, and scheduling.
Caches are not free; they are loans you repay with unpredictability unless you budget them.
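If you want to see this failure shape instead of inferring it, two quick reads help. A sketch, assuming the sysstat package is installed for sar, and that the pgrep pattern below matches your actual service (service.jar is illustrative):
cr0x@server:~$ sar -B 1 5    # rising pgscank/s and pgsteal/s means the kernel is reclaiming hard
cr0x@server:~$ ps -o pid,maj_flt,rss,comm -p "$(pgrep -d, -f service.jar)"    # climbing major faults means reads are going back to disk
Major faults plus reclaim activity during a latency spike is the signature of a cache that outgrew its host.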
Mini-story 3: A boring but correct practice that saved the day (capacity headroom + limit drills)
An enterprise internal platform team ran object storage gateways in front of a large storage backend.
The system served logs, artifacts, and backups—everything nobody thinks about until it disappears.
The team had an unsexy practice: every quarter, they ran a “limit drill.”
They would pick a constraint—file descriptors, network connections, cache size, disk throughput, inode usage—and verify alerts, dashboards, and runbooks.
They didn’t do it because it was fun. They did it because unknown limits are where incidents breed.
One week, an application team started uploading millions of tiny objects due to a packaging change.
The backend wasn’t full on bytes, but metadata pressure surged. The gateway nodes began to show elevated IO wait and increased latency.
The platform team caught it early because they had alerts not just on “disk percent used” but also on inode consumption,
request queue depth, and per-device latency. They throttled the noisy workload, coordinated a packaging fix, and added a policy for minimum object size.
Nobody outside the platform team noticed. That’s what “saved the day” looks like: nothing happens, and you get no applause.
Second short joke: Reliability engineering is being proud of an incident that never makes it into a slide deck.
Practical tasks: commands, outputs, decisions
My bias: if you can’t interrogate the system with a command, you don’t understand the system.
Below are practical tasks you can run on a Linux host. Each one includes what the output means and the decision you make from it.
These are not academic; they’re the kinds of checks you do when “something feels slow” and you need to stop guessing.
Task 1: Check memory pressure and swap reality
cr0x@server:~$ free -h
total used free shared buff/cache available
Mem: 31Gi 24Gi 1.2Gi 512Mi 5.8Gi 3.9Gi
Swap: 2.0Gi 1.6Gi 400Mi
Meaning: “available” is the number that matters; it estimates how much memory new work can claim without swapping, including reclaimable cache. Heavy swap usage suggests sustained memory pressure, not a brief spike.
Decision: If swap is actively used and latency is bad, you either reduce memory footprint (cache budgets, JVM heap, worker count)
or add memory. Don’t treat swap as “extra RAM”; treat it as “latency insurance with a very expensive premium.”
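If you want to know who is actually sitting in swap, not just that swap is used, per-process VmSwap is the ground truth. A small sketch using only /proc (no extra tooling assumed):
cr0x@server:~$ awk '/^VmSwap/ && $2 > 0 {print FILENAME, $2, $3}' /proc/[0-9]*/status 2>/dev/null | sort -k2 -rn | head
Each line is /proc/<pid>/status plus the swapped-out size in kB; map the PIDs back with ps to decide whose footprint to cut first.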
Task 2: Identify top memory consumers (and whether it’s anonymous or file cache)
cr0x@server:~$ ps -eo pid,comm,rss,vsz --sort=-rss | head
PID COMMAND RSS VSZ
4121 java 9876540 12582912
2330 postgres 2456780 3145728
1902 prometheus 1024000 2048000
1187 nginx 256000 512000
Meaning: RSS shows resident memory; VSZ can be misleading (reserved address space).
A single process with ballooning RSS is an obvious target.
Decision: If RSS growth correlates with latency spikes, apply a memory budget: cap caches, tune heap, or isolate the workload.
Task 3: See if the kernel is reclaiming aggressively
cr0x@server:~$ vmstat 1 5
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
2 0 1638400 120000 80000 5200000 10 25 120 300 1200 1800 20 8 60 12 0
4 1 1639000 90000 70000 5100000 80 120 400 1500 1600 2200 18 10 45 27 0
3 1 1639500 85000 65000 5000000 60 90 350 1200 1500 2100 15 9 50 26 0
2 0 1640000 110000 70000 5050000 15 30 150 500 1300 1900 19 8 58 15 0
2 0 1640000 115000 72000 5080000 5 10 100 320 1250 1850 21 7 62 10 0
Meaning: Non-zero si/so (swap in/out) and high wa (IO wait) indicate memory pressure turning into IO pain.
Decision: If swapping happens during peak, stop “optimizing” elsewhere. Fix memory pressure first or you’ll chase phantom bottlenecks.
Task 4: Check load average versus CPU saturation
cr0x@server:~$ uptime
14:22:10 up 37 days, 3:11, 2 users, load average: 18.42, 17.90, 16.77
Meaning: Load average includes runnable and uninterruptible tasks (often IO wait). High load doesn’t automatically mean CPU is maxed.
Decision: Pair this with CPU and IO checks before declaring “we need more cores.”
Task 5: Confirm CPU is actually the bottleneck
cr0x@server:~$ mpstat -P ALL 1 3
Linux 6.5.0 (server) 01/21/2026 _x86_64_ (16 CPU)
12:00:01 AM CPU %usr %nice %sys %iowait %irq %soft %steal %idle
12:00:02 AM all 22.11 0.00 7.90 0.80 0.00 0.60 0.00 68.59
12:00:03 AM all 24.01 0.00 8.10 9.50 0.00 0.70 0.00 57.69
12:00:04 AM all 21.20 0.00 7.50 11.40 0.00 0.60 0.00 58.70
Meaning: %iowait is climbing; CPU isn’t saturated. The system is waiting on storage/network.
Decision: Don’t scale CPU. Investigate IO latency, filesystem contention, or network storage behavior.
Task 6: Quick disk space check (bytes) and inode check (metadata)
cr0x@server:~$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/nvme0n1p2 900G 720G 180G 81% /
/dev/nvme1n1p1 200G 198G 2.0G 99% /var/lib/postgresql/wal
cr0x@server:~$ df -i
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/nvme0n1p2 58982400 1048576 57933824 2% /
/dev/nvme1n1p1 13107200 13090000 17200 100% /var/lib/postgresql/wal
Meaning: A filesystem can be “not full” on bytes but full on inodes. WAL mount is full on both.
Decision: If inode usage is the issue, deleting large files won’t help. You need to delete many small files or rebuild with more inodes.
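If rebuilding is the answer, remember that inode density is fixed at mkfs time. A hedged sketch for ext4; the device and the bytes-per-inode ratio are illustrative, and mkfs destroys whatever is on the device:
cr0x@server:~$ sudo tune2fs -l /dev/nvme1n1p1 | grep -i 'inode count'    # what the current filesystem was built with
cr0x@server:~$ sudo mkfs.ext4 -i 4096 /dev/nvme1n1p1                     # roughly one inode per 4 KiB of capacity; destructive
Pick the ratio from your real small-file profile, not from a forum post.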
Task 7: Measure storage latency and utilization live
cr0x@server:~$ iostat -xz 1 3
Linux 6.5.0 (server) 01/21/2026 _x86_64_ (16 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
20.15 0.00 7.30 9.42 0.00 63.13
Device r/s w/s rkB/s wkB/s await %util
nvme0n1 120.0 200.0 4096.0 8192.0 6.20 78.00
nvme1n1 10.0 900.0 512.0 16384.0 45.30 99.20
Meaning: nvme1n1 is pegged (%util ~99) with high await. That’s a bottleneck.
Decision: Move write-heavy workloads (WAL, logs) off that device, increase device capability, or reduce write amplification.
Task 8: Find which processes are doing IO right now
cr0x@server:~$ sudo iotop -o -b -n 3
Total DISK READ: 5.12 M/s | Total DISK WRITE: 42.33 M/s
PID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND
2330 be/4 postgres 0.00 B/s 28.10 M/s 0.00 % 35.20 % postgres: wal writer
4121 be/4 cr0x 1.20 M/s 4.10 M/s 0.00 % 5.10 % java -jar service.jar
3011 be/4 root 0.00 B/s 3.20 M/s 0.00 % 2.00 % journald
Meaning: The WAL writer dominates writes. This isn’t “mysterious IO”; it’s your database doing its job.
Decision: If the IO device can’t keep up, tuning app queries won’t fix it fast. Address the storage path first.
Task 9: Check filesystem mount options that can bite you
cr0x@server:~$ mount | grep -E ' / |wal'
/dev/nvme0n1p2 on / type ext4 (rw,relatime,errors=remount-ro)
/dev/nvme1n1p1 on /var/lib/postgresql/wal type ext4 (rw,relatime,data=ordered)
Meaning: You’re looking for surprises: sync mounts, noatime/relatime, barriers, odd options that change write patterns.
Decision: If you find sync or an unexpected network filesystem under a latency-sensitive path, that’s likely your “640 KB” moment.
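A quick way to answer “what actually backs this path?” is findmnt, which resolves the mount for a directory instead of making you grep mount output. Using the WAL path from the example above:
cr0x@server:~$ findmnt -T /var/lib/postgresql/wal
TARGET                   SOURCE         FSTYPE OPTIONS
/var/lib/postgresql/wal  /dev/nvme1n1p1 ext4   rw,relatime,data=ordered
If the SOURCE or FSTYPE surprises you (an NFS export, an overlay, a loop device), you have found your suspect.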
Task 10: Check file descriptor limits (the modern “conventional memory” of sockets)
cr0x@server:~$ ulimit -n
1024
cr0x@server:~$ cat /proc/sys/fs/file-nr
42112 0 9223372036854775807
Meaning: The soft limit for this shell is 1024, which is tiny for many services; a service started by systemd may carry a different limit, so check its /proc/<pid>/limits. System-wide file handles are fine.
Decision: If you see “too many open files” errors or connection churn, raise per-service limits via systemd and verify with a restart.
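Raising it is usually a systemd drop-in, and the verification step matters because ulimit in your shell says nothing about the running service. A sketch, assuming a unit named myservice.service (the unit name and value are illustrative):
cr0x@server:~$ sudo mkdir -p /etc/systemd/system/myservice.service.d
cr0x@server:~$ printf '[Service]\nLimitNOFILE=65536\n' | sudo tee /etc/systemd/system/myservice.service.d/limits.conf
cr0x@server:~$ sudo systemctl daemon-reload && sudo systemctl restart myservice.service
cr0x@server:~$ grep 'Max open files' /proc/$(systemctl show -p MainPID --value myservice.service)/limits
The last line reads the limit the running process actually has, which is the only number that counts.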
Task 11: Check network backlog and SYN handling (queue limits that look like “random packet loss”)
cr0x@server:~$ sysctl net.core.somaxconn net.ipv4.tcp_max_syn_backlog
net.core.somaxconn = 128
net.ipv4.tcp_max_syn_backlog = 256
Meaning: These defaults can be too low for high-concurrency services, causing connection drops under bursts.
Decision: If you see SYN drops or accept queue overflow in metrics, tune these and load-test. Don’t “just add pods” and hope.
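Two things make that decision concrete: proof that the accept queue is overflowing, and a persistent tune rather than a one-off sysctl. A sketch; the values are starting points, and the application must also pass a large enough backlog to listen() for somaxconn to matter:
cr0x@server:~$ nstat -az TcpExtListenOverflows TcpExtListenDrops    # non-zero and growing = accept queue overflow
cr0x@server:~$ ss -lnt                                              # on LISTEN sockets, Recv-Q is the current queue, Send-Q is its cap
cr0x@server:~$ printf 'net.core.somaxconn=4096\nnet.ipv4.tcp_max_syn_backlog=8192\n' | sudo tee /etc/sysctl.d/90-backlog.conf
cr0x@server:~$ sudo sysctl --system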
Task 12: Check conntrack table usage (NAT and state tracking: the hidden ceiling)
cr0x@server:~$ sysctl net.netfilter.nf_conntrack_max
net.netfilter.nf_conntrack_max = 262144
cr0x@server:~$ cat /proc/sys/net/netfilter/nf_conntrack_count
261900
Meaning: You’re nearly at the maximum. When this fills, new connections fail in ways that look like application bugs.
Decision: Increase the table (with memory awareness), reduce unnecessary connection churn, and set alerts at sensible thresholds.
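A sketch of both halves, persisting the new ceiling and computing the alertable ratio; the value is illustrative, and conntrack entries consume kernel memory, so size it deliberately:
cr0x@server:~$ printf 'net.netfilter.nf_conntrack_max=524288\n' | sudo tee /etc/sysctl.d/90-conntrack.conf
cr0x@server:~$ sudo sysctl --system
cr0x@server:~$ awk -v c=$(cat /proc/sys/net/netfilter/nf_conntrack_count) -v m=$(cat /proc/sys/net/netfilter/nf_conntrack_max) 'BEGIN { printf "conntrack at %.0f%% of max\n", 100*c/m }'
On some kernels the conntrack hash bucket count deserves a matching look; treat that as homework for your specific kernel, not a copy-paste.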
Task 13: Check kernel logs for the truth you didn’t want
cr0x@server:~$ dmesg -T | tail -n 8
[Mon Jan 21 13:58:11 2026] Out of memory: Killed process 4121 (java) total-vm:12582912kB, anon-rss:9876540kB, file-rss:10240kB, shmem-rss:0kB
[Mon Jan 21 13:58:11 2026] oom_reaper: reaped process 4121 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[Mon Jan 21 14:00:02 2026] EXT4-fs warning (device nvme1n1p1): ext4_dx_add_entry: Directory index full, reached max htree level
Meaning: OOM kills and filesystem warnings are not “noise.” They are the system telling you your assumptions are wrong.
Decision: If you see OOM, stop adding features and start sizing memory. If you see filesystem index warnings, examine directory/file layout.
Task 14: Measure directory and small-file explosion
cr0x@server:~$ sudo find /var/lib/postgresql/wal -type f | wc -l
12983456
Meaning: Millions of files implies inode pressure, directory scaling issues, and backup/scan pain.
Decision: Re-architect file layout, rotate aggressively, or move to a design that doesn’t use the filesystem as a database.
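To localize which subtree is responsible rather than just admiring the grand total, GNU du can count inodes per directory:
cr0x@server:~$ sudo du --inodes -d 1 /var/lib/postgresql/wal | sort -n | tail
The biggest numbers at the bottom tell you where rotation or re-architecture has to start.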
Task 15: Confirm whether the app is throttled by cgroups (a very 2026 kind of “640 KB”)
cr0x@server:~$ cat /sys/fs/cgroup/memory.max
2147483648
cr0x@server:~$ cat /sys/fs/cgroup/memory.current
2130014208
Meaning: The workload is basically at its memory limit. You can tune all day; the wall is literal.
Decision: Increase the limit or reduce memory use. Also: set alerts on memory.current approaching memory.max, not after OOM.
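The alertable ratio is one line of awk; this assumes the cgroup v2 paths shown above and handles the case where memory.max reads “max” (no limit set):
cr0x@server:~$ awk -v cur=$(cat /sys/fs/cgroup/memory.current) -v max=$(cat /sys/fs/cgroup/memory.max) 'BEGIN { if (max == "max") print "no memory limit set"; else printf "%.1f%% of memory.max used\n", 100*cur/max }'
Feed that percentage to your monitoring and alert at something like 85%, not at the OOM kill.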
Fast diagnosis playbook: find the bottleneck fast
When everything is slow, you don’t have time to philosophize about the 1980s. You need a disciplined sequence that converges.
This playbook assumes a single host or node is misbehaving; adapt it for distributed systems by sampling multiple nodes.
First: confirm the failure mode (symptoms, not theories)
- Is it latency, throughput, or errors?
- Is it steady degradation or spiky bursts?
- Does it correlate with deploys, traffic, cron jobs, batch windows?
Run: load + CPU + memory + IO quick checks. Don’t guess which subsystem is guilty.
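A minimal first-sixty-seconds sweep, in the order I actually run it (mpstat and iostat come from the sysstat package; everything else is stock):
cr0x@server:~$ uptime; dmesg -T | tail -n 20
cr0x@server:~$ vmstat 1 5
cr0x@server:~$ mpstat -P ALL 1 3
cr0x@server:~$ free -h; df -h; df -i
cr0x@server:~$ iostat -xz 1 3
cr0x@server:~$ ss -s
None of these fix anything. They tell you which of the four suspects below deserves your next ten minutes.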
Second: check the four usual bottlenecks in order
- Memory pressure: free -h, vmstat, cgroup limits, OOM logs. If swapping or OOM is present, treat it as primary until disproven.
- Storage latency: iostat -xz, iotop, filesystem fullness and inode fullness. High await or %util near 100% is a smoking gun.
- CPU saturation: mpstat and per-process CPU. High %usr/%sys with low iowait points to CPU.
- Network and queues: backlog settings, conntrack, retransmits, drops. A full conntrack table can make an otherwise healthy service look haunted.
Third: localize impact before you “fix” it
- Which process is top CPU / top RSS / top IO?
- Which mount is filling?
- Which device has high latency?
- Which limit is near its ceiling (fds, conntrack, cgroups, disk, inodes)?
Fourth: pick the least risky mitigation
- Throttle the offender (rate limits, pause batch jobs).
- Add headroom (increase volume size, raise limits) if it’s safe and reversible.
- Move hot paths off contended resources (separate WAL/logs, isolate caches).
- Roll back recent changes if the timeline fits.
Fifth: make it non-repeatable
- Add an alert on the actual constraint you hit (not a proxy metric).
- Write a runbook that starts with “show me the limit and current usage.”
- Schedule a limit drill. Put it on the calendar like patching. Because it is patching—of your assumptions.
Common mistakes: symptom → root cause → fix
This is where the 640 KB myth earns its keep. The failure isn’t “we didn’t predict the future.”
The failure is “we didn’t identify a limit and treat it like a production dependency.”
1) Symptom: “Disk is 70% free but writes fail”
Root cause: inode exhaustion, filesystem metadata limits, or a different mount (WAL/logs) is full.
Fix: check df -i and mountpoints; move hot paths to dedicated volumes; rebuild filesystems with appropriate inode density if needed.
2) Symptom: “Load average is huge; CPU graphs look fine”
Root cause: IO wait or blocked tasks (storage latency, NFS hiccups).
Fix: run iostat -xz and iotop; investigate device await and %util; fix storage bottleneck before scaling CPU.
3) Symptom: “Random timeouts under bursts; adding pods doesn’t help”
Root cause: accept queue overflow, low somaxconn, SYN backlog exhaustion, or conntrack full.
Fix: tune backlog parameters, increase conntrack max with memory awareness, and reduce connection churn via keep-alives/pooling.
4) Symptom: “Latency improved after caching, then got worse than before”
Root cause: cache-induced memory pressure causing reclaim, swap, or GC thrash.
Fix: enforce cache budgets; monitor page faults and reclaim; move caches to dedicated tiers; test with realistic cardinality and churn.
5) Symptom: “Service restarts fix it for a while”
Root cause: resource leak (fds, memory, conntrack), fragmentation, or unbounded queues.
Fix: track growth over time; set hard limits; add leak detection; implement backpressure; don’t accept “restart is the runbook.”
6) Symptom: “Database is slow but CPU is low”
Root cause: storage latency, fsync contention, WAL on saturated device, or checkpoint bursts.
Fix: separate WAL onto fast storage, tune checkpoint settings carefully, measure fsync latency, and watch write amplification.
7) Symptom: “Plenty of RAM free; still OOM-killed”
Root cause: cgroup memory limits, per-container ceilings, or high anonymous RSS under a hard cap.
Fix: check /sys/fs/cgroup/memory.max; increase limits; reduce memory; ensure alerts are based on cgroup usage, not host free.
Checklists / step-by-step plan
Checklist 1: Build a “limits inventory” for any service that matters
- List all storage mounts used by the service (data, logs, WAL, tmp, cache).
- For each mount: record size, inode count, growth drivers, and cleanup mechanism.
- Record compute ceilings: CPU limits, memory limits, heap size, thread pools.
- Record OS ceilings: ulimit values, systemd limits, conntrack size, backlog settings.
- Record upstream ceilings: DB connection limits, API rate limits, queue quotas.
- For each ceiling: define a warning threshold and an emergency threshold.
- Create one dashboard that shows “current usage vs limit” for all of the above (a host-level sketch follows this checklist).
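A minimal sketch of what that dashboard’s inputs can look like on one host, assuming cgroup v2 and the paths used earlier in this article; in practice you would export these numbers to your metrics pipeline instead of running a script by hand:
#!/usr/bin/env bash
# limits-snapshot.sh: current usage against its ceiling for a few common walls.
set -euo pipefail

echo "== disk: bytes and inodes =="
df --output=target,pcent | tail -n +2
df --output=target,ipcent | tail -n +2

echo "== file handles (system-wide) =="
read -r allocated unused max < /proc/sys/fs/file-nr
echo "allocated=$allocated max=$max"

echo "== conntrack =="
if [ -r /proc/sys/net/netfilter/nf_conntrack_count ]; then
  echo "count=$(cat /proc/sys/net/netfilter/nf_conntrack_count) max=$(cat /proc/sys/net/netfilter/nf_conntrack_max)"
fi

echo "== cgroup memory (this cgroup) =="
if [ -r /sys/fs/cgroup/memory.current ]; then
  echo "current=$(cat /sys/fs/cgroup/memory.current) max=$(cat /sys/fs/cgroup/memory.max)"
fi
Every line pairs a number with its ceiling, which is the entire point of the checklist above.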
Checklist 2: Capacity planning that doesn’t pretend to be prophecy
- Measure current peak (not average) for CPU, memory, IO, network, and storage growth.
- Identify the first resource that hits 80% during peak; that’s your first scaling target.
- Define headroom policy (example: keep >30% free space on hot volumes; keep conntrack <70%).
- Model growth as ranges, not single lines. Include seasonality and batch jobs.
- Test failure modes: simulate full disk, full inode table, conntrack near max, low fd limits (see the drill sketch after this checklist).
- Write down what “degraded but acceptable” looks like and how you’ll enforce it (throttling, shedding).
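A hedged sketch of the cheapest drills. Run them on a scratch mount or a test box, never on a shared production volume; /mnt/drill and ./your-service are placeholders:
cr0x@server:~$ sudo fallocate -l 50G /mnt/drill/bigfile    # fill bytes fast; confirm the "disk full" alert actually fires
cr0x@server:~$ cd /mnt/drill && for i in $(seq 1 500000); do : > "f$i"; done    # burn inodes without using bytes
cr0x@server:~$ (ulimit -n 128; exec ./your-service --smoke-test)    # placeholder binary: a smoke test under a deliberately tiny fd limit
The commands are not the point. The point is confirming that the alert, the runbook, and the mitigation behave the way the documentation claims.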
Checklist 3: Pre-deploy guardrails (the anti-640 KB routine)
- Before shipping a “performance” change, define the resource budget it will consume.
- Load-test with peak cardinality, not synthetic uniform traffic.
- Verify alerts exist for the actual new pressure point (memory.current, iowait, disk await).
- Ensure rollback is viable and quick.
- Run a canary that is big enough to hit real caches and queues.
FAQ
1) Did Bill Gates actually say “640 KB is enough for anybody”?
There’s no reliable primary-source evidence. The quote is widely considered misattributed or at least unverified.
Treat it as folklore, not history.
2) If the quote is dubious, why talk about it at all?
Because it’s a perfect proxy for a real failure mode: teams confusing a design boundary or default setting with a permanent truth.
The myth is annoying; the lesson is valuable.
3) What exactly was the “640 KB” limit?
Conventional memory on the IBM PC architecture: the usable RAM below the upper reserved area in the first 1 MB address space.
Hardware mapping needs (video, ROM) consumed the rest.
4) Why didn’t they just “use more than 1 MB”?
Later systems did, but compatibility mattered. Early software, DOS real mode assumptions, and the ecosystem made workarounds (EMS/XMS) more practical than breaking everything.
5) What is the modern equivalent of the 640 KB barrier?
Any hidden ceiling: container memory limits, conntrack tables, file descriptor caps, queue depths, inode exhaustion, tiny default volumes, or saturated storage devices.
The “barrier” is wherever your system hits a hard limit you didn’t model.
6) Isn’t this just “always plan for growth”?
Not quite. “Plan for growth” becomes a hand-wave. The real work is: identify specific limits, track usage against them, and rehearse what happens at 80/90/100%.
7) Should we always raise limits preemptively?
No. Raising limits blindly can move failure elsewhere or increase blast radius. Raise limits when you understand the resource cost and you have monitoring and backpressure.
8) How do I stop performance optimizations from backfiring?
Budget them. Every optimization consumes something: memory, IO, CPU, complexity, or operational risk.
Require a “resource bill” in reviews, and test under realistic peak patterns.
9) What if we don’t have time for a full capacity planning effort?
Do a limits inventory first. It’s cheap and immediately useful. Most outages aren’t from unknown unknowns; they’re from known limits nobody wrote down.
10) What’s one metric you’d add everywhere tomorrow?
“Usage vs limit” for each critical ceiling: disk bytes, inodes, memory.current vs memory.max, open fds vs ulimit, conntrack_count vs nf_conntrack_max.
Percentages alone lie; you need the ceiling in view.
Next steps you can actually do this week
Stop arguing about whether someone said a line in the 1980s. Your production system is busy creating its own quote.
The fix is not cynicism; it’s instrumentation and discipline.
- Write down your top 10 hard limits per service (memory, disk, inodes, fds, conntrack, queue depths, DB connections).
- Add alerts on the real ceilings, not proxies. Alert when you approach the wall, not when you’re already bleeding.
- Run one limit drill: pick a constraint and verify you can detect it, mitigate it, and prevent recurrence.
- Budget caches explicitly. If a cache doesn’t have a max size, it isn’t a cache; it’s a slow-motion incident.
- Separate hot IO paths (logs/WAL/tmp) so a noisy neighbor doesn’t take down your core storage.
The 640 KB myth won’t die because it’s memorable. Make your limits memorable too—by putting them on dashboards and runbooks, where they belong.