On a VPS, databases don’t fail loudly. They fail expensively. First it’s “the site feels slow,” then it’s “we need a bigger instance,” and then someone quietly discovers you were paying for RAM your database couldn’t effectively use and for IOPS you never tested.
This is the VPS reality: noisy neighbors, burst credits, inconsistent storage latency, and a hard budget ceiling. In that world, “speed per dollar” isn’t a benchmark screenshot. It’s the ability to keep p95 latency predictable while your CFO keeps you on a leash.
What “speed per dollar” actually means on a VPS
People say “performance” like it’s one number. On a VPS, it’s a pile of tradeoffs and failure modes. If you want speed per dollar, you need to define the “speed” you’re buying:
- Predictable latency (p95/p99), not just average throughput.
- Efficiency per vCPU when the CPU is mediocre and shared.
- Write amplification and fsync behavior when storage is the limiting factor.
- Operational overhead: how often you page someone because maintenance fell behind.
MariaDB and PostgreSQL can both be fast. The difference is how they behave when the VPS isn’t. PostgreSQL tends to be more honest about work it must do (visibility, VACUUM, WAL). MariaDB/InnoDB can be astonishingly forgiving for OLTP read-heavy workloads, but it can also hide problems until the day it can’t.
The opinionated short answer (with caveats)
If you’re on a modest VPS and you want the most speed per dollar for typical SaaS OLTP (lots of small reads/writes, simple joins, many concurrent sessions), PostgreSQL usually wins long-term because it stays consistent under messy workloads and gives you better tools to diagnose and control the system.
If your workload is read-heavy, schema-simple, and you need good performance with minimal tuning, MariaDB can be cheaper to run on small hardware—especially when you can keep most hot data in the InnoDB buffer pool and avoid heavy write contention.
The caveat that matters: on many VPS providers, storage latency variability dominates. In that case the “faster database” is the one you can configure to avoid synchronous stalls without sacrificing correctness. PostgreSQL has strong correctness defaults; MariaDB can be tuned to be terrifyingly fast by relaxing durability, which may be fine—until it’s not.
Dry rule of thumb: if you’re betting the company on the database being boring, pick PostgreSQL. If you’re betting on squeezing cost and your durability requirements are negotiable, MariaDB can be the cheaper race car.
Joke #1: A VPS is like a coworking space: the coffee is fine, the Wi‑Fi is “shared,” and someone is always running a crypto miner two desks over.
Facts & history that still affect your tuning
These aren’t trivia-night facts. They explain why certain knobs exist, why defaults look conservative, and why migrations sometimes feel like moving house during a hurricane.
- PostgreSQL’s MVCC design (multi-version concurrency control) has been central since the 1990s. It’s why reads don’t block writes, and why VACUUM is a real operational concern.
- InnoDB became the default MySQL engine with MySQL 5.5 (around 2010) because it delivered transactions, row-level locking, and crash recovery that people needed in production.
- MariaDB forked from MySQL after Oracle acquired Sun. That corporate event is why “drop-in replacement” became a selling point and why some compatibility corners are still sharp.
- PostgreSQL replication matured from shipping WAL (write-ahead log). The replication story is tightly tied to WAL, fsync, and checkpoint tuning—core to “speed per dollar” on slow disks.
- MySQL’s query cache was removed in 8.0 (MariaDB still ships one, disabled by default in modern versions). This history is why you’ll see old advice that is now actively harmful for concurrency.
- PostgreSQL’s planner has a culture of correctness over surprise “cleverness.” It won’t always be the fastest plan, but it tends to be explainable, and that matters at 3 a.m.
- InnoDB’s doublewrite buffer exists because partial page writes happen. On flaky VPS storage, that protection can save you—at the cost of write overhead.
- PostgreSQL has embraced extensions (like pg_stat_statements) as first-class operational tools. That changes the day-to-day debugging experience dramatically.
Where VPSes hurt databases (and why it changes the choice)
1) Storage is the tax collector
On a VPS, you can buy more vCPU and more RAM, and the database will still stall on fsync. The point isn’t “SSD vs HDD.” The point is tail latency: a few slow syncs per second can ruin your p99. PostgreSQL and MariaDB both rely on write-ahead logs (WAL / redo logs). The moment your log device hiccups, your database becomes a queueing system with feelings.
2) CPU “steal” time is real
Virtualization can steal CPU cycles (literally exposed as steal time). If your instance is cheap, your database threads may run in bursts. That hurts PostgreSQL when you have too many active connections and context switching. It hurts MariaDB when purge/flush threads can’t keep up and dirty pages pile up.
3) Memory is both gold and a trap
RAM is the cheapest performance multiplier—until you overcommit it. PostgreSQL uses per-connection memory for sorts/hashes; MariaDB uses thread buffers too, but InnoDB’s buffer pool is the king lever. On small VPSes, one wrong memory setting can trigger swapping, and then you’re benchmarking the Linux VM subsystem, not your database.
4) Network jitter messes with perceived DB speed
Many “database slow” reports are actually network variance between app and DB. PostgreSQL and MariaDB respond similarly: the query is fast; the request isn’t. If your app and DB aren’t co-located, “speed per dollar” becomes “latency per regret.”
MariaDB on a VPS: what it’s great at
When MariaDB tends to win on cost
- Read-heavy OLTP with many small point lookups (primary key / indexed reads).
- Simple schemas where you’re not leaning on complex SQL features.
- Teams with MySQL muscle memory who can run it safely without “creative” tuning.
- Workloads with predictable access patterns where the buffer pool keeps hot pages resident.
The knobs that matter (and why)
On a VPS, MariaDB performance lives and dies by InnoDB. These are the knobs that pay for themselves; an illustrative config sketch follows the list:
- innodb_buffer_pool_size: make it big enough to hold your hot working set. If it’s too small, you pay random read IOPS. If it’s too big, the kernel starts reclaiming aggressively and you get stalls.
- innodb_flush_log_at_trx_commit: default is safest (1). Lowering it (2 or 0) buys speed by accepting data loss on crash. This is not a “free optimization,” it’s a business decision.
- innodb_log_file_size (or innodb_redo_log_capacity on versions that have it): larger redo logs reduce checkpoint pressure and smooth writes. Too large and recovery time grows; too small and you churn checkpoints and hammer storage.
- innodb_io_capacity: telling InnoDB what your storage can actually do prevents it from being either lazy or frantic.
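To make this concrete, here is a minimal, illustrative my.cnf fragment for a hypothetical 8 GB VPS whose hot set fits in roughly 5 GB. The values are assumptions meant to show proportions, not recommendations, and the config path varies by distro.
# /etc/mysql/mariadb.conf.d/50-server.cnf (illustrative only)
[mysqld]
innodb_buffer_pool_size        = 5G    # hot working set, with headroom left for the OS and connections
innodb_flush_log_at_trx_commit = 1     # full durability; lowering this is a business decision, not a tweak
innodb_log_file_size           = 1G    # bigger redo smooths checkpoints; recovery time grows with it
innodb_io_capacity             = 800   # what the volume actually sustains under load, not the brochure IOPS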
Where MariaDB can quietly lose you money
MariaDB can look fast while it’s building up debt: dirty pages accumulating, purge lag, replication delay that’s “fine” until a failover, or a single hot table that turns into a mutex festival. You don’t always see it until you hit the cliff.
Also, cross-version behavior and fork divergence matter. A VPS setup that was stable for one major version can get weird after an upgrade, especially around optimizer choices and default settings.
PostgreSQL on a VPS: what it’s great at
When PostgreSQL tends to win on cost
- Mixed workloads: reads, writes, analytics-ish queries, and “product decided we need reporting” in the same database.
- Complex queries where planner quality and indexing options matter.
- Operational clarity: built-in statistics and a culture of introspection.
- Long-term maintainability: predictable semantics, less “it depends on engine quirks.”
The knobs that matter (and why)
PostgreSQL tuning on a VPS is mostly about three things: (1) avoiding memory overcommit, (2) smoothing WAL/checkpoints, (3) keeping vacuum healthy. An illustrative config sketch follows the list:
- shared_buffers: not “as big as possible.” On Linux, too large can starve filesystem cache and hurt. Usually 15–25% of RAM is a sane start on a VPS.
- work_mem: per sort/hash, per node, per query. If you set it like a hero, your box will OOM like a villain.
- effective_cache_size: tells the planner what cache is likely available (OS cache + shared_buffers). It doesn’t allocate memory, but it changes plans.
- checkpoint_timeout, max_wal_size, checkpoint_completion_target: these can turn “periodic latency spikes” into “boring.” Boring is good.
- autovacuum settings: autovac is your roommate that cleans. If you disable it, you’ll enjoy the vibe until the trash hits the ceiling.
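The same idea for PostgreSQL: a minimal, illustrative postgresql.conf fragment for a hypothetical 8 GB VPS sitting behind a connection pooler. The values are assumptions that show proportions, not a recommendation.
# postgresql.conf (illustrative only; 8 GB VPS, pooled connections)
shared_buffers = 2GB                  # ~25% of RAM; the OS page cache handles the rest
effective_cache_size = 6GB            # planner hint (shared_buffers plus likely OS cache); allocates nothing
work_mem = 16MB                       # per sort/hash node; worst case is roughly concurrent nodes x 16MB
max_wal_size = 4GB                    # fewer forced checkpoints between the timed ones
checkpoint_timeout = 15min
checkpoint_completion_target = 0.9    # spread checkpoint writes instead of bursting them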
Where PostgreSQL can waste your budget
The biggest cost trap is connection management. Thousands of client connections on a small VPS will make Postgres spend money on context switches and memory overhead, not queries. If you don’t use a pooler, you’ll buy a bigger instance just to host idle connections. That’s not “scaling.” That’s subsidizing bad defaults.
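The usual fix on a small VPS is transaction pooling in front of Postgres. A minimal PgBouncer sketch, with hypothetical names and sizes:
; /etc/pgbouncer/pgbouncer.ini (illustrative only)
[databases]
appdb = host=127.0.0.1 port=5432 dbname=appdb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction      ; server connection is released at commit/rollback
default_pool_size = 20       ; actual Postgres backends per database/user pair
max_client_conn = 500        ; app-side connections become cheap to keep open
Transaction pooling breaks session-level features (session prepared statements, SET, advisory locks held across transactions), so confirm the app tolerates it before celebrating.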
Joke #2: Disabling autovacuum because it “uses CPU” is like removing the smoke alarm because it’s noisy.
One quote that maps to production life
Werner Vogels (Amazon CTO) said: “Everything fails all the time.”
Fast diagnosis playbook
When performance tanks on a VPS, your job is not to guess. Your job is to locate the bottleneck quickly, then decide whether you’re compute-bound, memory-bound, I/O-bound, or suffering from contention and queueing.
First: confirm the symptom shape
- Latency spike pattern: periodic (checkpoints, vacuum, backups) vs sustained (I/O saturation, lock contention).
- Scope: one query/table vs everything.
- Time to recover: seconds (burst) vs hours (maintenance debt).
Second: decide which of these is the primary limiter
- Storage: high iowait, high fsync times, low IOPS, long disk queues.
- CPU: high user/sys, high steal, run queue > cores.
- Memory: swapping, page cache thrash, OOM kills, huge working set misses.
- Locking/contention: waits on locks, buffer contentions, too many connections.
Third: pick the right “fix class”
- If storage-bound: checkpoint/WAL/redo smoothing, avoid fsync storms, reduce random reads via caching/indexing, consider separate volume for logs.
- If CPU-bound: reduce concurrency, fix slow queries, add indexes, avoid per-row functions, tune parallelism carefully.
- If memory-bound: right-size buffers, reduce work_mem / per-thread buffers, implement pooling, stop swapping.
- If contention-bound: transaction length, hot rows, lock ordering, isolation level, better indexes, batching writes.
12+ practical tasks: commands, outputs, decisions
These are the tasks you actually run on a VPS when you’re trying to buy speed with engineering instead of credit card upgrades. Each task includes a realistic command, a sketch of output, what the output means, and the decision you make.
Task 1: Check CPU steal and iowait (is the VPS lying to you?)
cr0x@server:~$ mpstat -P ALL 1 5
Linux 6.1.0 (vps01) 12/29/2025 _x86_64_ (4 CPU)
12:00:01 AM CPU %usr %nice %sys %iowait %irq %soft %steal %idle
12:00:01 AM all 22.10 0.00 6.12 18.33 0.00 0.55 8.40 44.50
...
Meaning: %steal at ~8% means your “4 vCPU” are not yours when you need them. %iowait near 18% says storage stalls are dominating.
Decision: Don’t tune SQL first. Start with I/O smoothing (WAL/checkpoints/redo) and reduce concurrent writers. If steal stays high, consider a different VPS class/provider before you buy more cores.
Task 2: Find disk queueing and latency (the “why is p99 awful” check)
cr0x@server:~$ iostat -x 1 5
Linux 6.1.0 (vps01) 12/29/2025 _x86_64_
Device r/s w/s rkB/s wkB/s await aqu-sz %util
vda 85.0 60.0 3400.0 5200.0 22.5 1.80 98.0
...
Meaning: await of 22.5 ms and %util at 98% indicate saturated storage with a persistent queue (aqu-sz ~1.8).
Decision: You are I/O-bound. Prioritize reducing fsync frequency bursts, increasing cache hit rates, and removing unnecessary writes (indexes, verbose logs). Consider moving WAL/redo to faster storage if possible.
Task 3: Confirm filesystem and mount options (barriers, atime, discard)
cr0x@server:~$ findmnt -T /var/lib -no SOURCE,FSTYPE,OPTIONS
/dev/vda1 ext4 rw,relatime,errors=remount-ro
Meaning: ext4 with relatime is normal. If you see exotic options or network filesystems, stop and reconsider life choices.
Decision: Keep it boring. If atime is enabled (not relatime/noatime), consider noatime for heavy read workloads. Avoid “tuning” mount options you don’t fully understand; the database already does durability logic.
Task 4: Detect swapping and memory pressure
cr0x@server:~$ free -h
total used free shared buff/cache available
Mem: 8.0Gi 6.9Gi 120Mi 180Mi 1.0Gi 420Mi
Swap: 2.0Gi 1.6Gi 420Mi
Meaning: Swap is heavily used and “available” is down to ~420 Mi. Confirm active swapping with vmstat (si/so columns); if it’s ongoing, database latency goes nonlinear.
Decision: Reduce memory settings now (Postgres work_mem/shared_buffers; MariaDB buffer pool + per-thread buffers). Consider adding RAM only after you confirm you’re not leaking memory via connection storms.
Task 5: Check per-process memory and OOM kills
cr0x@server:~$ dmesg -T | tail -n 8
[Mon Dec 29 00:10:12 2025] Out of memory: Killed process 2113 (postgres) total-vm:5242880kB, anon-rss:3100000kB, file-rss:0kB, shmem-rss:0kB
[Mon Dec 29 00:10:12 2025] oom_reaper: reaped process 2113 (postgres), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
Meaning: The kernel killed a PostgreSQL backend. The postmaster responds with a crash-recovery restart that drops every connection; it rarely corrupts data, but it will absolutely corrupt your Monday.
Decision: Fix memory configuration and connection pooling immediately. Don’t blame the query planner; this is basic capacity discipline.
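While you’re there, check the kernel overcommit policy, which decides how eagerly the OOM killer fires:
cr0x@server:~$ sysctl vm.overcommit_memory vm.overcommit_ratio
vm.overcommit_memory = 0
vm.overcommit_ratio = 50
Meaning: 0 is the default heuristic. The PostgreSQL documentation suggests vm.overcommit_memory = 2 (with a suitable vm.overcommit_ratio) on dedicated database hosts so allocations fail instead of summoning the OOM killer; on a shared box it can break other services, so treat it as an option, not a reflex.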
Task 6: PostgreSQL—find top queries by total time (stop guessing)
cr0x@server:~$ sudo -u postgres psql -d appdb -c "SELECT queryid, calls, total_exec_time::bigint AS total_ms, mean_exec_time::numeric(10,2) AS mean_ms, rows, left(query, 80) AS q FROM pg_stat_statements ORDER BY total_exec_time DESC LIMIT 5;"
queryid | calls | total_ms | mean_ms | rows | q
----------+-------+----------+---------+------+-----------------------------------------------
91230123 | 12000 | 980000 | 81.67 | 120 | SELECT * FROM orders WHERE user_id=$1 ORDER BY created_at DESC LIMIT $2
11223344 | 1500 | 410000 | 273.33 | 1 | UPDATE inventory SET qty=qty-$1 WHERE sku=$2
...
Meaning: You have a small number of heavy hitters. The first query likely needs a composite index. The update might be hot-row contention.
Decision: Optimize the top 1–3 queries before touching global knobs. Indexes and query rewrites beat “more shared_buffers” most of the time.
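For the first heavy hitter in the sample output, the usual fix is a composite index. Table and column names below come from that hypothetical query; the index name is made up:
cr0x@server:~$ sudo -u postgres psql -d appdb -c "CREATE INDEX CONCURRENTLY idx_orders_user_created ON orders (user_id, created_at DESC);"
CONCURRENTLY avoids blocking writes while the index builds; it takes longer, and if it fails you must drop the invalid index and retry.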
Task 7: PostgreSQL—check what you’re waiting on (locks vs I/O vs CPU)
cr0x@server:~$ sudo -u postgres psql -d appdb -c "SELECT wait_event_type, wait_event, count(*) FROM pg_stat_activity WHERE state='active' GROUP BY 1,2 ORDER BY 3 DESC;"
wait_event_type | wait_event | count
-----------------+---------------------+-------
IO | DataFileRead | 12
Lock | relation | 4
LWLock | buffer_content | 3
Meaning: Mostly waiting on DataFileRead = you’re missing cache (or doing lots of random reads). Some lock waits indicate contention, but it’s not the dominant issue.
Decision: Increase cache hit rate by indexing, reducing table bloat (vacuum), and right-sizing shared_buffers/effective_cache_size. For lock waits, shorten transactions and remove hotspot patterns.
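If the lock waits grow, a quick way to see who is blocking whom (assumes PostgreSQL 9.6+ for pg_blocking_pids):
cr0x@server:~$ sudo -u postgres psql -d appdb -c "SELECT pid, pg_blocking_pids(pid) AS blocked_by, now() - xact_start AS xact_age, left(query, 60) AS q FROM pg_stat_activity WHERE cardinality(pg_blocking_pids(pid)) > 0;"
Long xact_age on the blockers is usually the real finding: a transaction somebody opened and forgot to finish.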
Task 8: PostgreSQL—measure cache hit ratio, but don’t worship it
cr0x@server:~$ sudo -u postgres psql -d appdb -c "SELECT datname, blks_hit, blks_read, round(100.0*blks_hit/nullif(blks_hit+blks_read,0),2) AS hit_pct FROM pg_stat_database WHERE datname='appdb';"
datname | blks_hit | blks_read | hit_pct
---------+----------+-----------+---------
appdb | 92000000 | 7800000 | 92.18
Meaning: 92% is not terrible, but on a VPS with mediocre storage, that 8% can still be your p99.
Decision: If you can’t improve indexes/queries, add RAM (or reduce dataset size) to improve hit rate. Also check for sequential scans on big tables.
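To find the sequential scans mentioned above, the per-table statistics are enough:
cr0x@server:~$ sudo -u postgres psql -d appdb -c "SELECT relname, seq_scan, seq_tup_read, idx_scan, n_live_tup FROM pg_stat_user_tables ORDER BY seq_tup_read DESC LIMIT 5;"
Tables with huge seq_tup_read and few idx_scan are where one index (or a smaller hot table) buys the most hit rate per dollar.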
Task 9: PostgreSQL—spot autovacuum debt and bloat candidates
cr0x@server:~$ sudo -u postgres psql -d appdb -c "SELECT relname, n_live_tup, n_dead_tup, last_autovacuum FROM pg_stat_user_tables ORDER BY n_dead_tup DESC LIMIT 5;"
relname | n_live_tup | n_dead_tup | last_autovacuum
----------------+------------+------------+--------------------------
events | 8200000 | 1900000 | 2025-12-28 21:14:09+00
sessions | 1500000 | 420000 | 2025-12-28 22:05:51+00
...
Meaning: Dead tuples are high; vacuum is behind. That increases random I/O, hurts indexes, and can trigger transaction ID wraparound risk if neglected long enough.
Decision: Tune autovacuum per table (scale factors, cost limits) and schedule maintenance windows for heavy vacuum/analyze if needed. On a small VPS, “vacuum debt” becomes “budget debt.”
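Per-table autovacuum tuning is just storage parameters. An illustrative example for the high-churn table from the output above; the numbers are assumptions that show the mechanism:
cr0x@server:~$ sudo -u postgres psql -d appdb -c "ALTER TABLE events SET (autovacuum_vacuum_scale_factor = 0.02, autovacuum_vacuum_cost_limit = 1000);"
A lower scale factor makes autovacuum visit the table after roughly 2% dead tuples instead of the default 20%; the higher cost limit lets it actually finish on slow storage.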
Task 10: MariaDB—inspect InnoDB buffer pool behavior
cr0x@server:~$ sudo mariadb -e "SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool%';"
+---------------------------------------+-----------+
| Variable_name | Value |
+---------------------------------------+-----------+
| Innodb_buffer_pool_pages_total | 524288 |
| Innodb_buffer_pool_pages_free | 1200 |
| Innodb_buffer_pool_read_requests | 980000000 |
| Innodb_buffer_pool_reads | 22000000 |
+---------------------------------------+-----------+
Meaning: Free pages are tiny (the pool is “full,” which is normal). Innodb_buffer_pool_reads are requests that missed the pool and went to disk; here that is 22M misses against 980M read requests, a miss rate of roughly 2.2%, which on VPS storage can still be a lot of random IOPS.
Decision: If misses are high relative to request volume, increase buffer pool (if RAM allows) or fix indexes and reduce table scans.
Task 11: MariaDB—check redo log pressure and flushing
cr0x@server:~$ sudo mariadb -e "SHOW GLOBAL STATUS LIKE 'Innodb_log_waits'; SHOW GLOBAL STATUS LIKE 'Innodb_os_log_fsyncs';"
+------------------+-------+
| Variable_name | Value |
+------------------+-------+
| Innodb_log_waits | 842 |
+------------------+-------+
+---------------------+--------+
| Variable_name | Value |
+---------------------+--------+
| Innodb_os_log_fsyncs| 520000 |
+---------------------+--------+
Meaning: Innodb_log_waits > 0 means transactions had to wait because the log buffer filled up before it could be flushed to the redo log: a classic write/flush bottleneck, usually paired with slow log fsyncs.
Decision: Increase innodb_log_buffer_size and redo capacity (variable names depend on version), tune flushing, and reduce transaction burstiness. If storage is slow, consider durability tradeoffs only if the business accepts them.
Task 12: MariaDB—confirm what durability level you’re actually running
cr0x@server:~$ sudo mariadb -e "SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit'; SHOW VARIABLES LIKE 'sync_binlog';"
+-------------------------------+-------+
| Variable_name | Value |
+-------------------------------+-------+
| innodb_flush_log_at_trx_commit| 1 |
+-------------------------------+-------+
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| sync_binlog | 1 |
+---------------+-------+
Meaning: Both at 1 means full durability: the redo log is fsynced at every commit and the binlog is synced at every commit (group commit batches these under concurrency). This costs IOPS but buys integrity.
Decision: If you need speed and can accept some risk, you might relax one of these. If you can’t explain the risk to a non-technical stakeholder, keep them at 1 and optimize elsewhere.
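If the business does accept the risk, the relaxation itself is a one-liner, shown here as an illustration rather than a recommendation:
cr0x@server:~$ sudo mariadb -e "SET GLOBAL innodb_flush_log_at_trx_commit = 2;"
With 2, the log is written to the OS at every commit but fsynced roughly once per second, so an OS crash can lose about a second of commits. Persist the change in the config file too, or the next restart quietly restores full durability along with your old latency.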
Task 13: Detect excessive connections (PostgreSQL)
cr0x@server:~$ sudo -u postgres psql -d appdb -c "SELECT state, count(*) FROM pg_stat_activity GROUP BY state ORDER BY count(*) DESC;"
state | count
--------+-------
idle | 320
active | 25
null | 2
Meaning: Hundreds of idle connections are wasting memory and increasing management overhead.
Decision: Add a pooler (or fix app pooling). Reduce max_connections and force discipline. On VPS, idle connections are a luxury item.
Task 14: Detect replication lag (both worlds) before it ruins failover
cr0x@server:~$ sudo mariadb -e "SHOW SLAVE STATUS\G" | egrep 'Seconds_Behind_Master|Slave_IO_Running|Slave_SQL_Running'
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Seconds_Behind_Master: 37
Meaning: 37 seconds lag. Not catastrophic, but if you fail over during an incident, you’re choosing data loss or downtime.
Decision: Investigate write bursts, slow disk, and long-running transactions. Consider semi-sync or better hardware for replicas if RPO matters.
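Long transactions are a common hidden cause of apply lag; the standard processlist view shows what has been running the longest:
cr0x@server:~$ sudo mariadb -e "SELECT id, user, time, state, left(info, 60) AS q FROM information_schema.processlist WHERE command <> 'Sleep' ORDER BY time DESC LIMIT 5;"
Anything measured in minutes during peak writes deserves a conversation with its author.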
Task 15: Check fsync timing pain indirectly (PostgreSQL checkpoint stats)
cr0x@server:~$ sudo -u postgres psql -d appdb -c "SELECT checkpoints_timed, checkpoints_req, round(checkpoint_write_time/1000.0,1) AS write_s, round(checkpoint_sync_time/1000.0,1) AS sync_s FROM pg_stat_bgwriter;"
checkpoints_timed | checkpoints_req | write_s | sync_s
------------------+-----------------+---------+--------
120 | 85 | 980.4 | 210.2
Meaning: Requested checkpoints (85) running close to timed ones (120) means WAL fills up and forces checkpoints early, so max_wal_size is too small for the write rate. The cumulative sync time (about 1 s per checkpoint here) says storage struggles during checkpoint fsyncs, which maps to user-visible latency spikes. (On PostgreSQL 17+ these counters live in pg_stat_checkpointer.)
Decision: Increase max_wal_size, tune checkpoint_completion_target, and reduce bursty writes. If you can separate WAL onto a better volume, do it.
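Expressed as settings, the smoothing looks like this (illustrative values; validate them against your WAL volume and recovery-time budget):
cr0x@server:~$ sudo -u postgres psql -c "ALTER SYSTEM SET max_wal_size = '4GB';"
cr0x@server:~$ sudo -u postgres psql -c "ALTER SYSTEM SET checkpoint_completion_target = 0.9;"
cr0x@server:~$ sudo -u postgres psql -c "SELECT pg_reload_conf();"
Both parameters are reloadable, so no restart is needed; afterwards, watch the checkpoint stats to confirm requested checkpoints drop relative to timed ones.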
Three corporate-world mini-stories (anonymized, plausible, painful)
1) Incident caused by a wrong assumption: “More RAM means faster”
The company had a small fleet of VPSes hosting a monolith and its database. The database was MariaDB. Traffic grew, and the team did what teams do under pressure: they doubled RAM and declared victory. The graphs looked better for about a week.
Then a marketing campaign hit. Latency spiked into timeouts. The app servers were fine. CPU was fine. The database had plenty of memory. Everyone stared at the dashboard like it owed them an apology.
The wrong assumption was simple: “If there’s free RAM, the database must be able to use it.” They had increased RAM but kept a conservative InnoDB buffer pool size from earlier days, and the workload had changed. Reads were now missing cache and hitting disk. The new RAM mostly became filesystem cache and “free,” while the database was still paying random I/O on a noisy storage volume.
Worse, they had a second-order effect: bigger RAM meant bigger potential dirty page backlog and larger working sets, which made bursts more punishing when flush threads fell behind. When the campaign arrived, InnoDB’s flushing behavior created periodic stalls that lined up perfectly with peak traffic.
The fix wasn’t heroic. They sized buffer pool properly, validated IO capacity settings, and ran a workload replay to confirm p99 stabilized. The expensive lesson: on VPS, RAM is only performance if the database is configured to spend it.
2) Optimization that backfired: “Let’s crank work_mem”
A different org ran PostgreSQL on a 2 vCPU / 8 GB VPS. Reporting queries were slow, and someone suggested increasing work_mem “so sorts happen in memory.” They bumped it aggressively, because the queries were big and the user complaints were bigger.
For a day, the reports screamed. Then the whole system got weird. Random spikes. Autovacuum started lagging. The kernel began swapping. The app saw intermittent 502s. You know the pattern: everything looks like a network problem when the host is on fire.
The backfire was predictable: work_mem is per-operation, per-query, per-backend. Under concurrency, Postgres happily allocated far more than physical RAM, and Linux responded by swapping. Not a little swapping. The kind of swapping that makes a VPS feel like it’s running on a USB drive in a drawer.
The eventual fix involved lowering work_mem, adding a connection pooler to cap concurrency, and rewriting a few reports to pre-aggregate. They also learned to benchmark with realistic concurrency. Single-user benchmarks are how you buy the wrong server.
3) Boring but correct practice that saved the day: “We kept vacuum and backups dull”
A small SaaS ran PostgreSQL on a modest VPS with a read replica. They didn’t have fancy infrastructure, but they had discipline: autovacuum was tuned per table for high-churn data, and they had scheduled maintenance for the worst offenders. Every schema change came with an index plan and a rollback.
One afternoon, storage latency on the primary degraded. Not a full outage—just enough to create misery. Checkpoints took longer, WAL flushed slower, and app latency rose. The team had two advantages: they knew what “normal” looked like, and their system wasn’t carrying maintenance debt.
Because vacuum wasn’t behind, tables weren’t bloated, so the system didn’t amplify reads unnecessarily. Because backups were tested, they could make changes without fear. Because replication was monitored, they knew the replica was healthy and could serve reads temporarily.
The incident still hurt, but it didn’t become a multi-day catastrophe. The “boring practice” wasn’t a secret performance trick. It was just refusing to let the database become a landfill. On VPS budgets, cleanliness is performance.
Common mistakes: symptoms → root cause → fix
1) Symptom: periodic 2–10 second latency spikes
Root cause: checkpoints (PostgreSQL) or aggressive flushing (InnoDB) causing bursty synchronous I/O; sometimes backups competing for disk.
Fix: Smooth writes: increase WAL/redo capacity, tune checkpoint cadence, ensure backups use throttling or run off-peak. Confirm with iostat await/util and Postgres bgwriter stats.
2) Symptom: “CPU is low but everything is slow”
Root cause: storage latency or lock waits. CPU looks idle because threads are waiting.
Fix: Check iowait, disk queue, and database wait events. For locks, identify blockers and shorten transactions.
3) Symptom: Postgres gets slower over days/weeks without code changes
Root cause: vacuum/analyze debt, table/index bloat, stale planner stats.
Fix: Tune autovacuum (especially scale factors) for high-churn tables; monitor dead tuples; schedule vacuum/analyze; consider partitioning for churn-heavy data.
4) Symptom: MariaDB replication lag grows during peak writes
Root cause: replica I/O can’t keep up (redo/binlog fsync cost, slow disk), or long transactions delay apply.
Fix: Reduce fsync pressure, ensure replica storage is adequate, avoid giant transactions, and monitor apply rate. If your failover plan assumes zero lag, stop assuming.
5) Symptom: high load average but modest CPU usage
Root cause: runnable queue includes tasks stuck in uninterruptible I/O sleep (D state). Load average includes them.
Fix: Confirm with iostat and process states. The fix is storage, not more vCPU.
6) Symptom: sudden OOM kills after “tuning for speed”
Root cause: per-connection memory explosion (Postgres work_mem, maintenance_work_mem; MariaDB sort/join buffers) combined with too many connections.
Fix: implement pooling, reduce max connections, and set memory knobs based on worst-case concurrency.
7) Symptom: inserts slow, selects fine (PostgreSQL)
Root cause: WAL fsync latency, too many indexes, or synchronous_commit settings mismatched to hardware.
Fix: reduce index count, batch writes, tune WAL/checkpoints, consider async commit only if acceptable, and ensure WAL isn’t competing with random reads.
8) Symptom: “We upgraded instance size and got nothing”
Root cause: bottleneck is storage or lock contention; more CPU/RAM doesn’t change it.
Fix: measure iowait, await, and lock waits first; then spend money.
Checklists / step-by-step plan
A. Picking the engine for speed per dollar (decision checklist)
- Need complex SQL, clean diagnostics, and predictable behavior? Choose PostgreSQL.
- Mostly simple OLTP, team is MySQL-native, hot dataset fits in buffer pool? MariaDB is a fine cost play.
- Storage quality is unknown or inconsistent? Favor PostgreSQL if you need strong correctness with fewer “fast but risky” temptations; otherwise you’ll tune yourself into a corner.
- Connection count high and app is chatty? PostgreSQL with a pooler; MariaDB is also sensitive, but Postgres will punish you more directly.
B. Baseline a new VPS (do this before blaming the database)
- Verify CPU steal and storage latency at idle and under load (a storage fsync probe sketch follows this list).
- Confirm filesystem, mount options, and free space headroom.
- Set up basic metrics: CPU, iowait, disk await, DB connections, replication lag.
- Run a realistic concurrency benchmark (not one client, not one query).
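A cheap way to measure the storage latency you actually bought is a small fsync-heavy fio run on the volume that will hold WAL/redo (hypothetical job name; needs a few hundred MB free):
cr0x@server:~$ fio --name=walprobe --rw=write --bs=4k --size=256m --fdatasync=1 --runtime=30 --time_based
Read the fsync latency percentiles in the output: single-digit milliseconds at p99 is workable for commit-heavy OLTP; tens of milliseconds means you will be tuning around your storage forever.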
C. PostgreSQL VPS tuning plan (safe and effective)
- Implement connection pooling; cap active sessions.
- Set shared_buffers conservatively; set effective_cache_size realistically.
- Set work_mem modestly; verify peak concurrency before increasing.
- Tune checkpoint/WAL to reduce latency spikes.
- Validate autovacuum is keeping up; tune per-table for high churn.
- Add indexes only when they pay; each index is a write tax.
D. MariaDB VPS tuning plan (InnoDB-first, sanity-first)
- Size innodb_buffer_pool_size to your hot working set, leaving OS headroom.
- Ensure redo capacity is sufficient; avoid log waits.
- Set innodb_io_capacity based on reality, not optimism.
- Be explicit about durability settings; don’t “accidentally” relax them.
- Watch replication lag and purge history length (if applicable to your version/tooling).
- Audit indexes: remove the ones added “just in case.”
E. The “don’t buy bigger yet” checklist
- You have identified top queries and measured their impact.
- You have confirmed whether waits are I/O, locks, or CPU.
- You have eliminated swapping and connection storms.
- You have smoothed checkpoints/flush behavior.
- You have validated replication and backup load aren’t competing with production.
FAQ
1) Which is faster on a cheap VPS: MariaDB or PostgreSQL?
It depends on workload shape, but PostgreSQL often delivers better predictable performance under mixed workloads. MariaDB can be faster for simple read-heavy OLTP when the buffer pool fits the working set.
2) What’s the single biggest VPS bottleneck for both?
Storage latency variance. Not bandwidth—latency. WAL/redo fsync behavior makes small stalls visible to users.
3) Should I put WAL/redo logs on a separate disk?
If your VPS provider offers a separate volume with independent performance characteristics, yes, it can help. If it’s the same underlying pool, you may just add complexity.
4) Is it safe to relax durability to get speed?
Sometimes. But “safe” is a business policy, not a database setting. If you can tolerate losing the last second of transactions after a crash, you can buy speed. If you can’t, optimize elsewhere.
5) Why does PostgreSQL need VACUUM and MariaDB doesn’t (as visibly)?
PostgreSQL’s MVCC leaves dead tuples behind until vacuum reclaims space and keeps visibility metadata healthy. InnoDB manages undo/redo and purge differently. Both need maintenance; PostgreSQL just makes it harder to ignore.
6) How do I know if I need a connection pooler for PostgreSQL?
If you have hundreds of mostly-idle connections or bursts of connection churn, you need pooling. On a small VPS, it’s one of the best “speed per dollar” upgrades you can do without buying hardware.
7) What metrics should I alert on first?
Disk await/%util, iowait, swap usage, active connections, replication lag, and DB wait events (for Postgres). Alert on trends, not just thresholds.
8) Can I tune my way out of bad VPS storage?
You can reduce damage—smooth writes, increase cache hit rate, reduce concurrency—but you can’t beat physics and noisy neighbors forever. Sometimes the correct tuning is switching providers or storage classes.
9) For speed per dollar, should I scale vertically or add a replica?
If reads dominate, a replica can buy more than vertical scaling. If writes dominate and you’re I/O-bound on WAL/redo, vertical scaling (or better storage) usually wins first.
10) Which is easier to diagnose under pressure?
PostgreSQL, typically. The introspection tooling and wait visibility are better out of the box, and the performance story is easier to reason about when you’re tired.
Next steps you can do this week
- Measure your VPS reality: run mpstat and iostat during a slow period and a busy period. If iowait and await are high, stop pretending your bottleneck is SQL syntax.
- Find the top queries: use pg_stat_statements (Postgres) or slow query log/performance schema tooling (MariaDB) and fix the worst offenders first.
- Eliminate swapping: right-size memory settings and implement pooling. Swapping turns “cheap VPS” into “expensive outage.”
- Smooth writes: tune checkpoints (Postgres) or flushing/log settings (MariaDB) to reduce spikes, then verify with stats and latency graphs.
- Make maintenance boring: autovacuum health (Postgres) and InnoDB log/flush health (MariaDB) are not “later” tasks. They are the rent.
If you want the most speed per dollar, treat the database like a production system, not a magic library. Measure. Change one thing. Measure again. That’s not just engineering virtue—it’s the cheapest performance upgrade you’ll ever buy.