MySQL vs PostgreSQL on a 4GB RAM VPS: What to Set First for Websites

You’ve got a 4GB RAM VPS. A few websites. A database. And now a pager, a ticket, or a client email that says, “The site is slow.” Nothing is more humbling than watching a $10/month box try to be an enterprise platform because someone enabled a plugin that “only runs one query.”

This is a field guide for getting MySQL or PostgreSQL stable and fast enough for website workloads on small VPS hardware. Not a benchmark fantasy. Not a config-dump. The stuff you set first, the stuff you measure first, and the stuff you stop doing before it costs you weekends.

First decision: MySQL or PostgreSQL for websites on 4GB

On a 4GB VPS, the “best database” is the one you can keep predictable under memory pressure and bursty traffic. Your enemy is not theoretical throughput. It’s swap storms, connection floods, and storage latency spikes that turn “fast enough” into “why is checkout timing out?”

Pick MySQL (InnoDB) when:

  • Your stack is already MySQL-native (WordPress, Magento, many PHP apps) and you don’t want to be the person rewriting everything “for fun.”
  • You want a fairly straightforward cache story: the InnoDB buffer pool is the big knob, and it behaves like a big knob.
  • You need replication that’s easy to operate with common tooling, and you’re okay with eventual consistency trade-offs in some modes.

Pick PostgreSQL when:

  • You care about query correctness and rich SQL features (real window functions, CTEs, better constraints and data types) and you’ll actually use them.
  • You want predictable query plans, good observability, and sane defaults for many modern app patterns.
  • You can commit to connection pooling (pgBouncer) because PostgreSQL’s process-per-connection model punishes “just open more connections” on small boxes.

If this is mostly CMS traffic with plugins you don’t control, I’m usually conservative: stay with MySQL unless the app is already Postgres-first. If you’re building something new with a team that writes SQL intentionally, PostgreSQL is often the better long-term deal. But on 4GB, the short-term win is operational simplicity, not philosophical purity.

Rule of thumb: if you can’t describe your top 5 queries and their indexes, you’re not “choosing a database,” you’re choosing which failure modes you’d like to experience first.

Interesting facts & historical context (that actually changes decisions)

  1. MySQL’s early web dominance came from LAMP ubiquity and “good enough” speed for read-heavy sites. That’s why so many website apps still assume MySQL dialect quirks.
  2. InnoDB became the default in MySQL 5.5 (2010 era). If you’re still thinking in MyISAM terms (table locks, no crash recovery), you’re carrying a fossil in your pocket.
  3. PostgreSQL’s MVCC model is one reason it stays consistent under concurrency, but it creates a steady need for vacuuming. Ignore vacuum and the database won’t scream; it’ll just slowly get worse.
  4. PostgreSQL switched to a more parallel-friendly execution model over time (parallel queries, better planner features). On a small VPS this matters less than on big iron, but it’s part of why Postgres “feels modern” for analytics-style queries.
  5. MySQL’s query cache was removed in MySQL 8.0 because it scaled poorly under concurrency. If someone tells you to “enable query_cache_size,” you found a time traveler.
  6. Postgres gets credit for standards and correctness because it historically prioritized features and integrity over early raw speed. Today it’s fast too, but the cultural DNA still shows in defaults and tooling.
  7. Both engines are conservative about durability by default (fsync, WAL/redo). Disabling durability settings makes benchmarks look heroic and postmortems look like crime scenes.
  8. MariaDB diverged from MySQL in significant ways. “MySQL tuning” advice sometimes maps poorly to MariaDB versions and storage engines. Verify what you’re actually running.
  9. RDS and managed services influenced tuning folklore: people copy cloud defaults to VPS, then wonder why a 4GB box behaves like it’s underwater.

Baseline architecture for a 4GB VPS (and why it matters)

On a 4GB VPS, you don’t have “extra memory.” You have a budget. Spend it on caches that reduce I/O, and on headroom that prevents swapping. The OS page cache also matters because both MySQL and PostgreSQL ultimately need filesystem-backed reads, and the kernel is not your enemy; it’s your last line of defense.

Reality-based memory budget

  • OS + SSH + basic daemons: 300–600MB
  • Web server + PHP-FPM: wildly variable. A few hundred MB to multiple GB depending on process counts and app behavior.
  • Database: what’s left, but not all of it. If you give the DB everything, the web tier will OOM or swap when traffic spikes.

For “websites on a single VPS,” the database isn’t isolated. This is one of the few times where “set it and forget it” is not laziness; it’s survival.

Opinion: If you’re running both web and DB on the same 4GB VPS, plan to allocate roughly 1.5–2.5GB to the database cache layer max, unless you’ve measured PHP memory usage under load and it’s truly small. Your goal is stable latency, not a heroic buffer pool.

Joke #1: A 4GB VPS is like a studio apartment—technically you can fit a treadmill in it, but you’ll hate your life and so will your neighbors.

Fast diagnosis playbook: find the bottleneck in 10 minutes

This is the order I check things when “the site is slow” and the database is the prime suspect. Each step tells you whether to look at CPU, memory, connections, locks, or storage.

First: is the box starving (CPU, RAM, swap)?

  • Check load vs CPU count.
  • Check swap activity and major page faults.
  • Check OOM killer history.

Second: is it storage latency (IOPS/fsync/WAL/redo)?

  • High iowait, slow fsync, long commit times, or stalled checkpoints.
  • Look for queue depth and average await times.

Third: is it connection pressure?

  • Too many DB connections or threads.
  • Connection storms from PHP workers.
  • Thread/process counts hitting RAM.

Fourth: is it locks or long transactions?

  • MySQL: metadata locks, InnoDB row locks, long-running transactions.
  • Postgres: blocked queries, idle-in-transaction sessions, vacuum blocked by old snapshots.

Fifth: is it “bad queries + missing indexes”?

  • Slow query logs / pg_stat_statements show the top offenders.
  • Look for full table scans and “filesort”/temp tables or sequential scans with huge row counts.

That’s it. Don’t start by changing random knobs. Don’t copy a “high performance my.cnf” from a 64GB database server. Measure, then choose one change you can explain.

Quote (paraphrased): John Allspaw’s reliability point is that production is where assumptions go to die, so design and operate for learning, not certainty.

Practical tasks: commands, outputs, and what you do next

These are real tasks you can run on a Linux VPS. Each includes: the command, what typical output means, and the decision you make. Run them in order when you’re triaging or setting baselines.

Task 1: Confirm basic system pressure (CPU, RAM, swap)

cr0x@server:~$ uptime
 14:22:19 up 36 days,  3:18,  1 user,  load average: 5.84, 5.12, 3.90

What it means: On a 2 vCPU VPS, load averages above ~2–3 for sustained periods often mean runnable queue backups (CPU) or uninterruptible I/O waits.

Decision: If load is high, immediately check iowait and memory/swap before touching database configs.

cr0x@server:~$ free -h
               total        used        free      shared  buff/cache   available
Mem:           3.8Gi       3.3Gi       120Mi        90Mi       420Mi       220Mi
Swap:          1.0Gi       860Mi       164Mi

What it means: Swap in use with low “available” memory is a red flag on latency-sensitive web workloads.

Decision: If swap is being used during peak, reduce DB memory footprint and/or reduce PHP-FPM workers. Consider enabling a small amount of swap only as a crash cushion, not a performance tier.
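
The playbook above also mentions OOM killer history. A quick way to check it, assuming systemd-journald keeps kernel logs on your box (the exact log wording varies by kernel version):

cr0x@server:~$ sudo journalctl -k --since "2 days ago" | grep -iE "out of memory|oom-killer|killed process"
...output omitted...

If this returns hits around your slow periods, the kernel has already been choosing victims for you; fix the memory budget before touching any database knob.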

Task 2: Check for active swapping and iowait

cr0x@server:~$ vmstat 1 5
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  1 880000 120000  28000 380000   12   45   210   480  520  900 18  8 40 34  0
 3  0 880000 118000  28000 381000    0   60   160   510  600 1100 20  9 35 36  0
 1  2 880000 115000  28000 379000   20    0   400   200  480  850 15  7 38 40  0
 4  0 880000 119000  28000 382000    0    8    90   520  610 1200 22 10 32 36  0
 2  1 880000 117000  28000 381500    0   35   200   600  590 1050 19  9 34 38  0

What it means: Non-zero si/so indicates swapping. High wa indicates I/O wait.

Decision: If swapping is active, stop tuning queries and fix memory/connection counts. If iowait is high, focus on storage latency, checkpoint/flush behavior, and reducing write amplification.

Task 3: Identify disk latency and queueing quickly

cr0x@server:~$ iostat -xz 1 3
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          18.20    0.00    8.10   34.50    0.00   39.20

Device            r/s     w/s   rkB/s   wkB/s  rrqm/s  wrqm/s  %util  await  aqu-sz
vda              12.0    48.0   900.0  6200.0    0.2     3.1   92.0   21.5    1.8

What it means: %util near 100% and await > 10–20ms on a VPS disk usually means your storage is the bottleneck.

Decision: Reduce writes (batching, autovacuum tuning, InnoDB flush tuning), move DB to better storage, or separate DB and web. On some VPS plans, the only “tuning” that matters is paying for higher IOPS.
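
If you want a direct commit-latency number instead of inferring it from iostat, PostgreSQL ships pg_test_fsync. The binary path below is a guess for Ubuntu packaging; adjust for your distro and major version:

cr0x@server:~$ /usr/lib/postgresql/16/bin/pg_test_fsync --secs-per-test=3
...output omitted...

Single-digit milliseconds per fsync is workable for a small site; tens of milliseconds means commits will be visibly slow no matter what you tune above the storage layer.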

Task 4: Confirm which DB you’re running and version details

cr0x@server:~$ mysql --version
mysql  Ver 8.0.36-0ubuntu0.22.04.1 for Linux on x86_64 ((Ubuntu))

What it means: Major versions change defaults and remove knobs (like query cache). Version tells you what advice is even applicable.

Decision: Don’t apply tuning recipes unless they match your major version and flavor (MySQL vs MariaDB).

cr0x@server:~$ psql --version
psql (PostgreSQL) 16.1 (Ubuntu 16.1-1.pgdg22.04+1)

What it means: Newer Postgres versions improve vacuum, WAL, and planner behavior. That changes “what hurts” on small boxes.

Decision: On old Postgres, you may need more manual babysitting. On newer Postgres, focus more on connection pooling and autovacuum thresholds.

Task 5: Count DB connections (MySQL)

cr0x@server:~$ mysql -e "SHOW STATUS LIKE 'Threads_connected';"
+-------------------+-------+
| Variable_name     | Value |
+-------------------+-------+
| Threads_connected | 185   |
+-------------------+-------+

What it means: 185 connections on a 4GB VPS with PHP is often a problem, even before queries get slow.

Decision: Cap application concurrency, enable persistent connections carefully, or move to a pattern that limits DB concurrency (queueing at app, caching, or splitting read traffic). If you can’t control the app, lower max_connections and accept controlled failures over total collapse.
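
To see the worst it has been since the last restart (the high-water mark, not just the current count):

cr0x@server:~$ mysql -e "SHOW STATUS LIKE 'Max_used_connections';"
...output omitted...

Size max_connections and PHP-FPM workers against that peak, not against a quiet afternoon.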

Task 6: Count DB connections (PostgreSQL)

cr0x@server:~$ sudo -u postgres psql -c "SELECT count(*) AS connections FROM pg_stat_activity;"
 connections
-------------
         142
(1 row)

What it means: 142 Postgres sessions equals 142 backend processes. On a 4GB VPS, that’s a memory and context-switch tax.

Decision: Install pgBouncer and drop max_connections. On small boxes, Postgres without pooling is a performance prank you play on yourself.
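
Before installing anything, it helps to see what those sessions are actually doing; a pile of idle connections points at the app holding connections open, which pooling fixes cheaply:

cr0x@server:~$ sudo -u postgres psql -c "SELECT state, count(*) FROM pg_stat_activity GROUP BY state ORDER BY count(*) DESC;"
...output omitted...

Lots of “idle” is a pooling problem; lots of “idle in transaction” is an application bug worth chasing separately.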

Task 7: Find long-running queries and blockers (PostgreSQL)

cr0x@server:~$ sudo -u postgres psql -c "SELECT pid, now()-query_start AS age, state, wait_event_type, wait_event, left(query,80) AS q FROM pg_stat_activity WHERE state <> 'idle' ORDER BY age DESC LIMIT 5;"
 pid  |   age    | state  | wait_event_type | wait_event |                                       q
------+----------+--------+-----------------+------------+--------------------------------------------------------------------------------
 9123 | 00:02:18 | active | Lock            | relation   | UPDATE orders SET status='paid' WHERE id=$1
 9051 | 00:01:44 | active | IO              | DataFileRead | SELECT * FROM products WHERE slug=$1
(2 rows)

What it means: Lock waits point to contention; IO waits point to slow storage or cache misses.

Decision: If Lock waits dominate, fix transaction scope and indexing. If IO waits dominate, increase effective caching (within reason) and reduce random reads via indexes and query shaping.
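
When Lock waits show up, the next question is who holds the lock. A minimal sketch using pg_blocking_pids():

cr0x@server:~$ sudo -u postgres psql -c "SELECT pid, pg_blocking_pids(pid) AS blocked_by, state, left(query,60) AS q FROM pg_stat_activity WHERE cardinality(pg_blocking_pids(pid)) > 0;"
...output omitted...

The blocker is often an idle-in-transaction session from the app; terminate it with pg_terminate_backend() only after you understand why it exists, or it will be back tomorrow.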

Task 8: Find lock waits (MySQL)

cr0x@server:~$ mysql -e "SHOW FULL PROCESSLIST;"
Id	User	Host	db	Command	Time	State	Info
210	app	10.0.0.12:50344	shop	Query	75	Waiting for table metadata lock	ALTER TABLE orders ADD COLUMN foo INT
238	app	10.0.0.15:38822	shop	Query	12	Sending data	SELECT * FROM orders WHERE created_at > NOW() - INTERVAL 1 DAY

What it means: Metadata locks can freeze writes and reads behind schema changes, depending on operation and version.

Decision: Stop doing online schema changes casually on a single small VPS. Schedule maintenance or use online schema migration tools designed to reduce locking.
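
To find the transaction actually blocking the DDL (often an innocuous-looking session that started hours ago), check the InnoDB transaction list:

cr0x@server:~$ mysql -e "SELECT trx_id, trx_started, trx_mysql_thread_id, trx_rows_locked FROM information_schema.innodb_trx ORDER BY trx_started LIMIT 5;"
...output omitted...

An old trx_started is your culprit; match trx_mysql_thread_id against the processlist Id column to see whose connection it is.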

Task 9: Check InnoDB buffer pool hit rate and read pressure

cr0x@server:~$ mysql -e "SHOW STATUS LIKE 'Innodb_buffer_pool_read%';"
+---------------------------------------+---------+
| Variable_name                         | Value   |
+---------------------------------------+---------+
| Innodb_buffer_pool_read_requests      | 9823412 |
| Innodb_buffer_pool_reads              | 412390  |
+---------------------------------------+---------+

What it means: Innodb_buffer_pool_reads counts reads that missed the buffer pool and hit disk; read_requests counts logical reads. If physical reads are high relative to logical requests, you’re missing cache.

Decision: If the working set fits in RAM, increase innodb_buffer_pool_size cautiously. If it doesn’t fit, prioritize indexes and reducing the working set (fewer columns, fewer scans).
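
A rough miss ratio is easier to reason about than two raw counters. A sketch that computes it from performance_schema on MySQL 8.0 (the counters are cumulative since startup, so compare across intervals if you want a current rate):

cr0x@server:~$ mysql -e "SELECT ROUND(100 * (SELECT VARIABLE_VALUE FROM performance_schema.global_status WHERE VARIABLE_NAME='Innodb_buffer_pool_reads') / (SELECT VARIABLE_VALUE FROM performance_schema.global_status WHERE VARIABLE_NAME='Innodb_buffer_pool_read_requests'), 2) AS miss_pct;"
...output omitted...

A few percent is usually fine for web workloads; double digits means the hot set doesn’t fit or the queries scan more than they should.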

Task 10: Check Postgres cache and temp file spills

cr0x@server:~$ sudo -u postgres psql -c "SELECT datname, blks_hit, blks_read, temp_files, temp_bytes FROM pg_stat_database ORDER BY temp_bytes DESC LIMIT 5;"
  datname  | blks_hit | blks_read | temp_files |  temp_bytes
-----------+----------+-----------+------------+--------------
 appdb     |  9201123 |   612332  |      1832  | 2147483648
(1 row)

What it means: Lots of temp_bytes suggests sorts/hashes spilling to disk because work_mem is too small for those operations—or queries are doing too much.

Decision: Don’t crank work_mem globally on a small VPS. Fix queries and indexes first; then raise work_mem per-role or per-session for specific workloads.
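
A sketch of the surgical version, assuming a dedicated reporting role (the role name is illustrative):

cr0x@server:~$ sudo -u postgres psql -c "ALTER ROLE reporting SET work_mem = '64MB';"
...output omitted...

For a one-off heavy statement, SET LOCAL inside a transaction keeps the bump scoped to that transaction; SELECT 1 below stands in for the real query:

cr0x@server:~$ sudo -u postgres psql -c "BEGIN; SET LOCAL work_mem = '64MB'; SELECT 1; COMMIT;"
...output omitted...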

Task 11: See top queries (Postgres, if pg_stat_statements is enabled)

cr0x@server:~$ sudo -u postgres psql -c "SELECT calls, mean_exec_time, rows, left(query,80) AS q FROM pg_stat_statements ORDER BY total_exec_time DESC LIMIT 5;"
 calls | mean_exec_time | rows |                                       q
-------+----------------+------+--------------------------------------------------------------------------------
 82021 |          12.45 |    1 | SELECT id FROM sessions WHERE token=$1
  1220 |         210.12 |  300 | SELECT * FROM orders WHERE user_id=$1 ORDER BY created_at DESC LIMIT 50
(2 rows)

What it means: High total time queries are your budget burners. High call-count queries are your “death by a thousand cuts.”

Decision: Index the hot paths and reduce chatty queries. If a query runs 80k times and takes 12ms, that’s a full core’s worth of regret.
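
For the second query above, a composite index matching the filter and sort is the usual fix. A sketch, assuming the table and columns are named as shown (the index name is made up; CONCURRENTLY avoids blocking writes at the cost of taking longer):

cr0x@server:~$ sudo -u postgres psql -c "CREATE INDEX CONCURRENTLY idx_orders_user_created ON orders (user_id, created_at DESC);"
...output omitted...

Then check with EXPLAIN (ANALYZE, BUFFERS) that the plan actually uses it before declaring victory.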

Task 12: Enable and read MySQL slow query log quickly

cr0x@server:~$ mysql -e "SET GLOBAL slow_query_log=ON; SET GLOBAL long_query_time=0.5; SET GLOBAL log_queries_not_using_indexes=ON;"
...output omitted...

What it means: You’re turning on evidence collection. Keep thresholds reasonable so you don’t DOS yourself with logs.

Decision: Collect for 15–60 minutes during peak, then use the data to fix the worst offenders. Turn off log_queries_not_using_indexes if it’s too noisy for your app.

cr0x@server:~$ sudo tail -n 5 /var/log/mysql/mysql-slow.log
# Query_time: 1.204  Lock_time: 0.000 Rows_sent: 50  Rows_examined: 84512
SELECT * FROM orders WHERE user_id=123 ORDER BY created_at DESC LIMIT 50;

What it means: Rows examined is huge relative to rows sent: classic missing index or wrong index order.

Decision: Add/adjust composite indexes to match filter + sort pattern (e.g., (user_id, created_at)), then verify with EXPLAIN.
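
A sketch of that fix, assuming the same shop.orders table from the processlist example (the index name is made up; adding a secondary index is an online operation on InnoDB in MySQL 8.0, but do it in a quiet window anyway):

cr0x@server:~$ mysql -e "ALTER TABLE shop.orders ADD INDEX idx_user_created (user_id, created_at); EXPLAIN SELECT * FROM shop.orders WHERE user_id=123 ORDER BY created_at DESC LIMIT 50;"
...output omitted...

You want the EXPLAIN to show the new index with a small rows estimate and no “Using filesort”; if it still sorts, the index order doesn’t match the query.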

Task 13: Check filesystem space and inode pressure

cr0x@server:~$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda1        80G   74G  2.1G  98% /

What it means: Near-full disks destroy performance and can break DB writes. Postgres may refuse to checkpoint; MySQL may crash or go read-only depending on circumstances.

Decision: Free space now. Then set up log rotation and a monitoring alert at 80–85%.
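
To find what is actually eating the disk before you start deleting things at random:

cr0x@server:~$ sudo du -xh /var --max-depth=2 2>/dev/null | sort -h | tail -n 10
...output omitted...

Usual suspects on a web+DB box: binary logs, unrotated application logs, and old backups sitting next to the thing they’re supposed to protect.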

Task 14: Check kernel and DB process memory usage

cr0x@server:~$ ps -eo pid,comm,rss,pmem --sort=-rss | head
 2311 mysqld     1854320 46.2
 1822 php-fpm8.1  412000 10.2
 1825 php-fpm8.1  405000 10.1
  911 postgres    220000  5.4

What it means: RSS shows actual resident memory. A few large PHP workers plus a large DB cache can tip the box into swap.

Decision: If DB + PHP already consume most RAM, stop increasing DB buffers. Reduce concurrency and cap memory consumers.
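
To get a rough total for the PHP side (the process name below matches the ps output above; adjust it for your PHP version):

cr0x@server:~$ ps -C php-fpm8.1 -o rss= | awk '{sum+=$1} END {printf "%.0f MB\n", sum/1024}'
...output omitted...

RSS double-counts shared memory a little, so treat it as an upper bound, but it’s good enough for deciding whether the database can afford another 512MB.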

If you pick MySQL: what to set first on a 4GB VPS

MySQL on small VPS hardware is usually fine if you don’t treat it like a bottomless pit for connections and memory. InnoDB is your default engine; tune for InnoDB, not for nostalgia.

1) Set innodb_buffer_pool_size like an adult

Goal: Cache hot data/indexes, reduce random reads, avoid starving everything else.

  • If DB is on same box as web: start around 1.0–1.5GB.
  • If DB is mostly alone: up to 2.0–2.5GB can work.

Failure mode: Oversizing the buffer pool doesn’t “use free memory.” It competes with the OS page cache and the web tier. Then you swap. Then every query becomes a storage benchmark.

2) Set max_connections lower than you think

MySQL threads consume memory. PHP apps love opening connections like it’s free. It’s not free.

  • Start around 100–200 depending on app and query latency.
  • If you’re seeing 300–800 connections, you don’t have a “database performance issue.” You have a concurrency control issue.

3) Keep redo log and flush behavior sane

On a small VPS with uncertain storage latency, overly aggressive flushing can cause spikes. But turning durability into a suggestion is how you earn a resume update.

  • innodb_flush_log_at_trx_commit=1 for real durability (default).
  • If you absolutely must reduce fsync pressure and can accept losing up to 1 second of transactions in a crash: consider =2. Document it. Put it in incident runbooks. Don’t pretend it’s free.

4) Disable what you don’t need, but don’t blind yourself

Performance Schema is useful; it also costs overhead. On a tiny VPS, you can reduce instrumentation rather than nuking it.

  • If you’re constantly CPU-bound with low query latency, consider trimming Performance Schema consumers.
  • But keep enough visibility to catch regressions. Debugging without metrics is just creative writing.

5) Set temporary table limits carefully

Web apps love ORDER BY and GROUP BY, often with too-wide result sets.

  • tmp_table_size and max_heap_table_size can reduce disk temp tables, but set them too high and you’ll blow memory under concurrency.

MySQL starter config sketch (not a copy-paste religion)

This is the spirit of it for a mixed web+DB 4GB VPS. Adjust based on measurements above.

cr0x@server:~$ sudo cat /etc/mysql/mysql.conf.d/99-vps-tuning.cnf
[mysqld]
innodb_buffer_pool_size = 1G
innodb_buffer_pool_instances = 1
max_connections = 150
innodb_flush_log_at_trx_commit = 1
innodb_flush_method = O_DIRECT
slow_query_log = ON
long_query_time = 0.5

What it means: Smaller buffer pool to preserve headroom, capped connections, direct I/O to reduce double-caching (depends on your filesystem and workload), and slow query logging for evidence.

Decision: Apply, restart during a quiet window, then re-check swap/iowait and slow logs. If latency improves and swap disappears, you’re on the right path.
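
After the restart, confirm the running values match the file; it’s cheap insurance against editing the wrong include file:

cr0x@server:~$ mysql -e "SELECT @@innodb_buffer_pool_size/1024/1024 AS buffer_pool_mb, @@max_connections AS max_conn, @@slow_query_log AS slow_log, @@long_query_time AS slow_threshold;"
...output omitted...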

If you pick PostgreSQL: what to set first on a 4GB VPS

Postgres is excellent for websites, but it makes you pay attention to three things early: connection counts, vacuum, and WAL/checkpoints. Ignore any of those and you’ll get “random” slowdowns that aren’t random at all.

1) Install connection pooling (pgBouncer) before you “need” it

On 4GB, Postgres backends are not disposable. A traffic spike that opens hundreds of connections can turn into memory pressure and context-switch overhead.

Do: run pgBouncer in transaction pooling mode for typical web workloads.

Don’t: crank max_connections to 500 and call it scaling.
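
A minimal pgbouncer.ini sketch for this setup. The database name, pool sizes, and auth details are illustrative; the auth_file still has to be populated to match how your app authenticates:

cr0x@server:~$ sudo cat /etc/pgbouncer/pgbouncer.ini
[databases]
appdb = host=127.0.0.1 port=5432 dbname=appdb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
max_client_conn = 300
default_pool_size = 20

Point the app at port 6432 instead of 5432. With transaction pooling, anything that relies on session state (server-side prepared statements on older pgBouncer versions, SET without LOCAL, session advisory locks) needs care.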

2) Set shared_buffers conservatively

The folklore says “25% of RAM.” On a mixed web+DB VPS, I’d start around:

  • 512MB to 1GB for shared_buffers.

Postgres benefits from OS page cache too. Giving everything to shared_buffers can starve the OS and other processes.

3) Set work_mem low globally; raise it surgically

work_mem is per sort/hash operation, per query, per backend. You don’t have enough RAM for bravado here.

  • Start at 4–16MB globally depending on concurrency.
  • Increase for a specific role or session if you have a known heavy report query.

4) Keep autovacuum healthy

Autovacuum isn’t optional housekeeping. It’s how Postgres prevents table bloat and keeps index-only scans possible.

  • Monitor dead tuples and vacuum lag.
  • Tune autovacuum thresholds per hot table if needed (see the sketch after this list).
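
A sketch of per-table tuning for a hot, frequently updated table; the table name and thresholds are illustrative, and the right numbers depend on row counts and update rates:

cr0x@server:~$ sudo -u postgres psql -c "ALTER TABLE orders SET (autovacuum_vacuum_scale_factor = 0.05, autovacuum_analyze_scale_factor = 0.05);"
...output omitted...

Lower scale factors make autovacuum kick in after a smaller fraction of the table has changed, which keeps bloat and planner statistics from drifting on busy tables.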

5) Make checkpoints less spiky

On slow VPS storage, checkpoint spikes show up as random latency cliffs. Smoother checkpoints reduce pain.

  • Increase checkpoint_timeout (within reason).
  • Set checkpoint_completion_target high to spread writes.

Postgres starter config sketch

cr0x@server:~$ sudo cat /etc/postgresql/16/main/conf.d/99-vps-tuning.conf
shared_buffers = 768MB
effective_cache_size = 2304MB
work_mem = 8MB
maintenance_work_mem = 128MB
checkpoint_completion_target = 0.9
checkpoint_timeout = 10min
wal_compression = on
log_min_duration_statement = 500ms

What it means: Conservative shared buffers, realistic cache hinting, modest work memory, smoother checkpoints, and query logging for slow statements.

Decision: Apply and reload/restart, then watch temp file growth and checkpoint timing. If your disk is slow, checkpoint smoothing will show up as fewer latency cliffs.
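
A sketch of applying it: checkpoint and logging settings take effect on reload, but shared_buffers needs a full restart, and Postgres will tell you which values are still pending:

cr0x@server:~$ sudo -u postgres psql -c "SELECT pg_reload_conf();"
cr0x@server:~$ sudo -u postgres psql -c "SELECT name, setting, pending_restart FROM pg_settings WHERE pending_restart;"
...output omitted...

If the second query returns rows, schedule the restart; until then you’re running a mix of old and new values.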

Connections: the silent killer on small boxes

If you run websites, the easiest way to ruin a database is to let the application decide concurrency. PHP-FPM workers + “open a DB connection per request” becomes a thundering herd. On 4GB, you don’t survive by being faster. You survive by being calmer.

What “too many connections” looks like

  • DB CPU high but not doing useful work (context switching, mutex contention).
  • Memory usage climbs with traffic until swap.
  • Latency increases even for simple queries.

What you do instead

  • Cap app concurrency: fewer PHP-FPM children, or set the process manager limits so worker counts can’t explode (see the pool sketch after this list).
  • Use pooling: pgBouncer for Postgres; for MySQL, consider pooling at the application layer or ensure persistent connections are configured sanely.
  • Fail fast: sometimes lower max_connections is the right move because it protects the box from total thrash.
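
A minimal PHP-FPM pool sketch for a shared 4GB box; the path is Ubuntu-flavored and the numbers are illustrative. Derive pm.max_children from measured per-worker RSS (roughly: RAM you can spare for PHP divided by worker RSS), not from a template. The relevant lines in /etc/php/8.1/fpm/pool.d/www.conf:

pm = dynamic
pm.max_children = 8
pm.start_servers = 3
pm.min_spare_servers = 2
pm.max_spare_servers = 4
pm.max_requests = 500

With 300–400MB workers like the ps output earlier, even 8 children can mean 2.5–3GB at worst, which is why the measurement matters more than the template.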

Joke #2: Unlimited connections is like unlimited buffet shrimp—sounds great until you realize you’re the one closing the restaurant.

Storage and filesystem realities: IOPS, fsync, and why “fast SSD” lies

On VPS platforms, “SSD storage” can mean anything from respectable NVMe to a shared network block device having a bad day. Databases care about latency more than throughput. A few milliseconds of extra fsync time per commit becomes visible at the website.

How writes hurt you differently in MySQL vs PostgreSQL

  • MySQL/InnoDB: redo logging + doublewrite buffer (depending on config/version) + flushing dirty pages. Bursty flush can amplify latency.
  • PostgreSQL: WAL writes + checkpoints + background writer. Vacuum also creates I/O, and bloat increases future I/O.

Small VPS best practice: reduce write amplification first

  • Fix chatty apps (too many small transactions).
  • Batch writes where correctness allows.
  • Avoid constantly updating “last_seen” columns on every request if you don’t need it.
  • Keep indexes lean; every index is a write tax.

Filesystem gotchas

  • Don’t put databases on flaky network filesystems unless you know the platform guarantees durability semantics.
  • Watch out for disk-full conditions: Postgres and MySQL behave badly in different ways, but none of those ways are “nice.”

Three corporate mini-stories from the trenches

1) The incident caused by a wrong assumption: “The cache will cover it”

A small team ran a collection of marketing sites and a checkout service on a single 4GB VPS. It had MySQL, Nginx, and PHP-FPM. Traffic was “mostly static,” which was true until a campaign launched and the checkout service started receiving bursts of authenticated requests.

The assumption was that the page cache and application caching would handle reads, so they pushed innodb_buffer_pool_size up near 3GB to “make the database fast.” It looked great in a quiet hour. Then the campaign hit.

PHP-FPM spawned to handle traffic. Each worker used more memory than anyone remembered. The OS started swapping. The database’s buffer pool was huge, so the kernel had less room for everything else. Latency didn’t increase gradually; it fell off a cliff. The checkout endpoint started timing out, retries increased traffic, and the retry storm turned a resource issue into a denial-of-service they hosted themselves.

The fix wasn’t exotic. They reduced the buffer pool to leave headroom, capped PHP-FPM children, lowered MySQL max_connections so the system failed fast instead of thrashing, and put an explicit queue in front of checkout. They also learned the operational difference between “free memory” and “available memory under burst.”

2) The optimization that backfired: “Just raise work_mem, it’s fine”

An internal app ran on PostgreSQL. Users complained about slow reports, so someone increased work_mem significantly because a blog post said it would reduce temp file I/O. It did. For one user. In one session.

Then a Monday morning happened. Several users ran reports concurrently. Those reports each did multiple sorts and hash joins. Postgres correctly allocated work_mem per operation. Memory usage surged. The VPS didn’t crash immediately; it got slower and slower as swap kicked in. The DB looked “alive” but every query waited behind the I/O storm caused by swapping.

The team rolled back work_mem to a conservative value and instead fixed the report query. They added a missing index, reduced selected columns, and introduced a summary table refreshed periodically. For the genuinely heavy query, they used a role with higher work_mem and forced it through a controlled reporting path. The lesson wasn’t “never tune.” It was “don’t tune globally for a local problem on a small machine.”

3) The boring but correct practice that saved the day: “Cap connections and log slow queries”

A different org hosted several small client sites on a shared 4GB VPS. Nothing fancy. They weren’t chasing microseconds. They did three boring things from day one: capped database connections, enabled slow query logging with a sane threshold, and monitored disk usage with an alert well before 90% full.

One afternoon a plugin update introduced a query regression. The site didn’t immediately fall over because connection caps prevented unlimited load from piling into the DB. Instead, some requests failed quickly, which made the issue visible without melting the whole box.

The slow query log had the smoking gun: a query that started scanning a large table without a useful index. They added the index, cleared up the regression, and the incident was contained to a short window. No mystery. No “it went away.” No weekend archaeology.

This is what boring reliability looks like: controlled failure, evidence collection, and enough headroom that one bad deploy doesn’t become a system-wide catastrophe.

Common mistakes: symptom → root cause → fix

1) Symptom: sudden 10–60s stalls across the site

Root cause: storage latency spikes during checkpoints/flushes or swap storms.

Fix: confirm with iostat and vmstat; reduce memory pressure (smaller DB caches, fewer app workers), smooth checkpoints (Postgres), and reduce write amplification (both).

2) Symptom: database CPU high, queries “not that slow” individually

Root cause: too many concurrent connections; contention overhead dominates.

Fix: cap connections; add pooling (pgBouncer); reduce PHP-FPM concurrency; cache at app or reverse proxy; fail fast rather than thrash.

3) Symptom: Postgres grows and grows; performance slowly degrades

Root cause: vacuum lag and table/index bloat due to insufficient autovacuum or long-running transactions.

Fix: identify idle-in-transaction sessions, tune autovacuum per hot table, and stop holding transactions open across requests.

4) Symptom: MySQL “Waiting for table metadata lock” in processlist

Root cause: schema change or DDL blocked by long transactions; queries queue behind metadata locks.

Fix: schedule DDL in maintenance windows; keep transactions short; use online schema change approaches if required.

5) Symptom: lots of temp files or “Using temporary; Using filesort” in MySQL

Root cause: missing indexes for ORDER BY/GROUP BY patterns; queries sorting huge datasets.

Fix: add composite indexes matching filter+sort; reduce selected columns; paginate properly; avoid OFFSET pagination for deep pages.
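
A sketch of the keyset (seek) alternative for deep pages, reusing the orders example; you pass the last row’s created_at plus a tiebreaker instead of an OFFSET. The syntax shown is PostgreSQL-flavored; MySQL may need the row comparison expanded into explicit AND/OR conditions to use the index well:

SELECT * FROM orders
WHERE user_id = 123
  AND (created_at, id) < ('2024-05-01 12:00:00', 98765)
ORDER BY created_at DESC, id DESC
LIMIT 50;

It reads only the rows it returns, so page 200 costs about the same as page 2, and it pairs naturally with a (user_id, created_at) composite index.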

6) Symptom: frequent “too many connections” errors

Root cause: app connection leaks, no pooling, or spikes in web worker counts.

Fix: pool connections; set sane timeouts; cap app concurrency; set DB max_connections to a number you can afford.

7) Symptom: after “tuning,” performance got worse

Root cause: a global setting (like work_mem or too-large buffer pool) increased per-connection memory and triggered swap under concurrency.

Fix: revert; apply tuning per-user/per-query; measure memory and concurrency explicitly.

Checklists / step-by-step plan

Step 0: Decide what “good” means

  • Pick an SLO-like target: e.g., homepage p95 < 500ms, checkout p95 < 800ms.
  • Pick a measurement window and capture baseline (CPU, RAM, swap, iowait, DB connections, slow queries).

Step 1: Stabilize the host

  • Ensure disk has at least 15–20% free space.
  • Ensure you’re not swapping under normal peak traffic.
  • Set conservative service limits (systemd limits if needed) to avoid runaway processes.

Step 2: Cap concurrency deliberately

  • Set PHP-FPM max children to a number you can afford in RAM.
  • Set DB max_connections to protect the machine.
  • On Postgres: deploy pgBouncer and reduce backend connections.

Step 3: Set the first memory knobs

  • MySQL: set innodb_buffer_pool_size to fit the working set without starving the OS.
  • Postgres: set shared_buffers conservatively; keep work_mem low globally.

Step 4: Turn on evidence collection

  • MySQL: slow query log at 0.5–1s during peak, then analyze and fix.
  • Postgres: log_min_duration_statement and ideally pg_stat_statements.

Step 5: Fix the top 3 query patterns

  • Add the missing indexes that reduce row scans.
  • Eliminate N+1 queries in the app.
  • Stop doing expensive queries per request; precompute or cache.

Step 6: Re-test and set guardrails

  • Re-run your triage tasks at peak.
  • Add alerts on swap activity, disk utilization, connection counts, and slow query rate.
  • Document your “safe” settings and the rationale so future-you doesn’t undo them.

FAQ

1) On a 4GB VPS, should I prioritize DB cache or OS page cache?

Prioritize stability. For single-box web+DB, don’t starve the OS. A moderate DB cache plus headroom beats a giant cache that triggers swap under bursts.

2) Is PostgreSQL “slower” than MySQL for websites?

Not as a rule. For many web workloads, either is fast enough when indexed well. The bigger differentiator on 4GB is connection management and write patterns, not raw engine speed.

3) What’s the first MySQL setting I should change?

innodb_buffer_pool_size, sized to your reality. Then cap max_connections. Then enable slow query logging and fix what it shows you.

4) What’s the first PostgreSQL setting I should change?

Connection pooling strategy (pgBouncer) and max_connections. Then conservative shared_buffers and logging/pg_stat_statements to identify top queries.

5) Can I just increase swap to solve memory issues?

You can increase swap to prevent abrupt OOM crashes, but swap is not performance RAM. If your database or PHP workers regularly hit swap, latency will become unpredictable.

6) Should I disable fsync for speed?

No for production websites where you care about data integrity. If you disable durability and the host crashes, you can lose data. Benchmarks love it; customers don’t.

7) How do I know if I’m I/O bound?

High iowait in vmstat, high await and %util in iostat, and DB sessions waiting on IO events (Postgres) are strong signals.

8) When should I split web and DB onto separate servers?

When your tuning changes become trade-offs between web tier and DB tier memory, or when storage latency makes database writes unpredictable. Separation buys you isolation and clearer capacity planning.

9) Are defaults good enough these days?

Defaults are better than they used to be, but they’re not tailored to your 4GB “everything on one box” situation. Connection caps and memory budgeting are still on you.

10) What’s the safest “performance win” I can do without deep DB expertise?

Enable slow query logging (or pg_stat_statements), identify the top 3 time consumers, and add the right indexes. Also cap connections so the server remains stable under load.

Next steps that won’t embarrass you later

On a 4GB VPS, you’re not optimizing a database. You’re managing contention between web, database, and storage while trying to keep latency boring.

  1. Run the fast diagnosis playbook during peak and write down what’s actually happening: swap, iowait, connections, locks, top queries.
  2. Cap concurrency first: PHP-FPM workers and DB connections. Add pgBouncer if you’re on Postgres.
  3. Set the first memory knob (InnoDB buffer pool or Postgres shared buffers) to a conservative value that leaves headroom.
  4. Turn on evidence (slow query logs / pg_stat_statements) and fix the top offenders with indexes and query changes.
  5. Re-check disk and write behavior; smooth checkpoints, reduce temp spills, and stop doing noisy writes you don’t need.
  6. Decide if the real fix is architectural: moving DB to separate VPS, upgrading storage tier, or using a managed DB. Sometimes the most effective tuning parameter is your invoice.

If you do only one thing today: cap connections and stop swapping. Everything else is garnish.
