It starts innocently: “Let’s just store counters in the database and reject requests when users exceed a limit.” Two weeks later you’re staring at rising p95 latency, hot rows, lock waits, and an incident channel full of messages like “why is the DB at 95% CPU when traffic isn’t even that high?”
Rate limiting is a control-plane concern. Your database is a data-plane workhorse. When you make it police traffic, you ask it to both run the race and referee the race—while also selling snacks in the stands.
The thesis: why “rate limit in SQL” fails in production
Database-side rate limiting sounds attractive because it’s centralized and transactional. You can enforce limits “exactly,” per user, per API key, per tenant, per whatever. And you already have a database. So you add a table like rate_limits, do an UPDATE or INSERT ... ON CONFLICT/ON DUPLICATE KEY, check a timestamp window, and deny if over limit.
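To make that concrete, here is roughly what the naive pattern looks like from application code. This is a sketch of the approach the rest of this article argues against, assuming a PostgreSQL rate_limits table and a DB-API driver such as psycopg2; the table name, window, and limit are illustrative.

# Naive DB-based limiter: one upsert per request against a shared row.
# Assumes: CREATE TABLE rate_limits (key text PRIMARY KEY, count int, reset_at timestamptz);
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(minutes=1)
LIMIT = 100

def allow_request(conn, key: str) -> bool:
    """One DB round trip per request: upsert the counter, read it back, decide."""
    now = datetime.now(timezone.utc)
    with conn.cursor() as cur:
        cur.execute(
            """
            INSERT INTO rate_limits (key, count, reset_at)
            VALUES (%s, 1, %s)
            ON CONFLICT (key) DO UPDATE SET
                count    = CASE WHEN rate_limits.reset_at < %s THEN 1
                                ELSE rate_limits.count + 1 END,
                reset_at = CASE WHEN rate_limits.reset_at < %s THEN %s
                                ELSE rate_limits.reset_at END
            RETURNING count
            """,
            (key, now + WINDOW, now, now, now + WINDOW),
        )
        current = cur.fetchone()[0]
    conn.commit()
    return current <= LIMIT  # every request paid a transactional write to learn this

Note the shape of the problem: the decision requires a write to a shared row, inside a transaction, on every single request.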
In production, that approach tends to fail for a handful of repeatable reasons:
- You create a serialization point. Rate limiting is inherently about shared state. Shared state inside a transactional engine becomes contention. Contention becomes tail latency. Tail latency becomes timeouts. Timeouts become retries. Retries become an unplanned load test.
- You turn “cheap rejects” into “expensive rejects.” A good rate limiter rejects quickly, before your expensive systems do work. A DB-based limiter asks the expensive system (the DB) to do work to decide whether it should do work.
- You amplify traffic bursts. Bursts are exactly when rate limiting matters. Bursts are also when DB contention is worst. Rate limiting inside the DB means you hit the worst-case path when you need the best-case behavior.
- You couple application availability to DB availability. If the rate limiter depends on DB round trips, then a DB brownout becomes an API brownout, even for endpoints that could otherwise degrade gracefully.
- You get “correct” results at the cost of correctness elsewhere. Your core data queries compete with your policing queries. When the limiter gets hot, it steals CPU, buffer cache, and I/O budget from the business.
That’s the operational reality. Here’s the dry-funny summary: using your database for rate limiting is like using your fire alarm to cook dinner—technically possible, but you’re going to have a bad time.
Interesting facts and historical context (so you don’t repeat history)
- PostgreSQL’s MVCC model (from the 1990s Postgres project lineage) avoids read locks for many workloads, but write contention is still write contention; hot updates can still serialize on the same tuple/page.
- InnoDB (the default engine in MariaDB/MySQL land) is built around clustered indexes; a “hot” primary key means you’re hammering the same B-tree pages, which can turn into latch contention and buffer pool churn.
- Token bucket and leaky bucket algorithms show up in networking literature decades before most web APIs existed; they were designed for routers, not OLTP databases.
- pgbouncer’s widespread adoption came from a very practical reality: PostgreSQL connections are not free, and “too many clients” is a classic self-inflicted outage mode.
- MySQL’s early reputation for speed was partly because it shipped with simpler default behaviors and fewer safety features; people then reintroduced complexity at the application layer. The cycle repeats with “let’s rate limit in SQL.”
- Advisory locks in PostgreSQL are powerful and easy to misuse; they’re a concurrency primitive, not a traffic shaping tool.
- Large-scale systems popularized “bulkheads” and “budgets” (separate pools and per-tenant resource isolation) because perfect fairness is expensive; predictable service is the real goal.
- Retry storms are not a modern invention; distributed systems have been re-learning “retries can be harmful” since at least the early RPC era.
MariaDB vs PostgreSQL: what changes, what doesn’t
What’s the same: contention is physics
Whether you run MariaDB or PostgreSQL, a rate limiter implemented as “one row per subject, update it on every request” creates a hotspot. The database must coordinate those updates. Coordination is the whole job of a database, but it’s not free—especially at the QPS levels where rate limiting matters.
What differs: how the pain shows up
PostgreSQL tends to show the pain as:
- Row-level lock waits when multiple transactions update the same row (visible via pg_locks and pg_stat_activity).
- High WAL volume from incessant updates, even if the “business” data isn’t changing much.
- Autovacuum pressure from rapid churn; frequent updates create dead tuples that must be vacuumed, and vacuum needs I/O.
- In extreme cases: CPU spent on LWLocks or contention around shared buffers/checkpoint-related activity.
MariaDB (InnoDB) tends to show the pain as:
- Lock waits and deadlocks around the same primary key or unique index entries (SHOW ENGINE INNODB STATUS is your friend and your enemy).
- Undo log / purge pressure from constant updates, which can snowball into history list length growth and performance decay.
- Buffer pool contention and hot index pages when the same key range is hammered.
- Replication lag if you’re binlogging every limiter update; your replicas become “rate limiter update consumers” instead of serving reads.
Different dashboards. Same root issue: the limiter is now competing with your workload for transactional throughput.
How DB-based rate limiting breaks: locks, hot rows, and invisible work
1) The hot-row problem (a.k.a. “one row to rule them all”)
Most naive schemas use a single row per user/API key. Under bursty traffic, many app workers update the same row. Concurrency collapses into serialization. Your “limit” becomes “the DB’s ability to update that row per second,” which is not the limit you intended. Worse: you can’t raise it without raising DB write throughput.
2) Tail-latency inflation (the quiet killer)
Rate limiting is often executed on every request. So even a small increase in limiter latency multiplies across your traffic. You don’t just add 2 ms; you add 2 ms to everything, then you add retries, then you add queueing. You get a long, slow outage that looks like “everything is a little worse” until it’s suddenly catastrophic.
3) WAL/binlog and vacuum/purge: the “invisible work” tax
Every update has downstream effects:
- PostgreSQL writes WAL; replication consumes WAL; checkpoints flush pages; autovacuum cleans up.
- InnoDB writes redo/undo; purge threads clean up; replication replays binlog events.
Your limiter is now a write-amplification engine. The system spends more time maintaining the ledger than doing the business.
4) Failure coupling: when the DB is sick, your API becomes sick
A rate limiter should protect your database from overload. If it runs in the database, you’ve built a seatbelt that only works when the car is not crashing. Under DB degradation, the limiter gets slower, which increases concurrent requests, which makes the DB more degraded. That’s a feedback loop with a finance-friendly name: “unplanned downtime.”
5) “But we’ll shard the limiter table” (the optimistic phase)
Teams often try to fix hot rows by sharding counters across N rows and summing them. That reduces single-row contention but introduces more reads, more complexity, and still plenty of write load. Also: summing counters on every request is a great way to turn a limiter into a query engine.
Second dry-funny truth: a sharded SQL rate limiter is how you turn one hot row into several slightly less hot rows, like distributing a headache across the whole head.
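If you want to see why, here is what the optimistic phase usually looks like in code: a sketch assuming a hypothetical rate_limit_shards table keyed by (key, shard) and a DB-API connection; window-reset handling is omitted for brevity.

# Sharded counters: spread writes across N rows, then sum them to decide.
import random

N_SHARDS = 16
LIMIT = 100

def allow_request_sharded(conn, key: str) -> bool:
    shard = random.randrange(N_SHARDS)  # spreads the write load across N rows
    with conn.cursor() as cur:
        cur.execute(
            """
            INSERT INTO rate_limit_shards (key, shard, count)
            VALUES (%s, %s, 1)
            ON CONFLICT (key, shard) DO UPDATE
                SET count = rate_limit_shards.count + 1
            """,
            (key, shard),
        )
        # The catch: every decision now also reads and sums all shards.
        cur.execute(
            "SELECT COALESCE(SUM(count), 0) FROM rate_limit_shards WHERE key = %s",
            (key,),
        )
        total = cur.fetchone()[0]
    conn.commit()
    return total <= LIMIT

Less contention per row, more statements per request. The write amplification and the per-request round trips are still there.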
Fast diagnosis playbook: find the bottleneck in minutes
When you suspect rate limiting “in the DB” is melting things, do not start by rewriting algorithms. Start by proving where the time is going.
First: confirm whether the limiter is on the critical path
- Check application traces: is there a DB call for every request even when rejecting?
- Compare latencies for allowed vs rejected requests. Rejection should be cheaper. If it’s more expensive, you’ve already found your problem.
Second: identify the shared resource that’s contended
- DB lock waits (row locks, transaction locks).
- Connection pool saturation (threads waiting for connections).
- Write-ahead logging or binlog pressure (WAL/binary log volume spikes).
- I/O queueing (high await, dirty buffers, slow flush).
Third: determine whether you’re in a feedback loop
- Are there retries on 429/5xx that increase load?
- Are app timeouts shorter than DB recovery from spikes?
- Is autoscaling adding more clients, worsening contention?
Fourth: pick the fastest safe mitigation
- Move rate limiting to the edge (API gateway / ingress) for the hottest endpoints.
- Enforce connection caps and queueing outside the DB (poolers, semaphores, worker queues).
- Disable or bypass DB-based limiter temporarily if it’s causing systemic failure (with a compensating limit elsewhere).
Paraphrased idea from Werner Vogels (Amazon CTO): “Everything fails, all the time—design systems that expect it and keep serving.”
Practical tasks: commands, outputs, and the decisions they drive
These are the checks I run when I walk into a system where someone proudly says “we rate limit in SQL.” Each task includes: command, sample output, what it means, and the decision you make.
Task 1: Check if the database is CPU-bound or I/O-bound (Linux)
cr0x@server:~$ mpstat -P ALL 1 3
Linux 6.5.0 (db01) 12/31/2025 _x86_64_ (16 CPU)
12:01:11 PM CPU %usr %nice %sys %iowait %irq %soft %steal %idle
12:01:12 PM all 72.10 0.00 17.22 2.10 0.00 0.88 0.00 7.70
12:01:13 PM all 76.55 0.00 16.40 1.90 0.00 0.70 0.00 4.45
12:01:14 PM all 74.80 0.00 18.00 2.40 0.00 0.60 0.00 4.20
Meaning: High %usr+%sys, low %iowait: CPU pressure, likely contention or heavy query execution.
Decision: If the limiter is write-heavy, focus on lock contention/WAL/binlog and hot updates, not disks first.
Task 2: Check disk latency and queueing (Linux)
cr0x@server:~$ iostat -xz 1 3
Linux 6.5.0 (db01) 12/31/2025 _x86_64_ (16 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
71.4 0.0 17.9 2.1 0.0 8.6
Device r/s w/s rkB/s wkB/s aqu-sz await svctm %util
nvme0n1 12.0 980.0 210.0 18340.0 5.20 5.10 0.25 26.0
Meaning: Writes are high but await is low; not I/O saturation. The DB is doing lots of small writes (typical of counter updates).
Decision: Don’t buy faster disks to fix a rate limiter design. Fix the design.
Task 3: Check PostgreSQL active queries and wait events
cr0x@server:~$ psql -h db01 -U postgres -d app -c "select pid, usename, state, wait_event_type, wait_event, left(query,80) as q from pg_stat_activity where state<>'idle' order by now()-query_start desc limit 8;"
pid | usename | state | wait_event_type | wait_event | q
------+---------+--------+-----------------+---------------+-----------------------------------------------
8121 | app | active | Lock | tuple | update rate_limits set count=count+1 where key=
8177 | app | active | Lock | tuple | update rate_limits set count=count+1 where key=
8204 | app | active | Lock | tuple | update rate_limits set count=count+1 where key=
8290 | app | active | IO | DataFileRead | select * from orders where tenant_id=$1 order by
Meaning: Multiple sessions waiting on tuple locks for the same pattern of updates. That’s your limiter row(s) acting as a mutex.
Decision: Stop doing per-request updates in SQL. Move limiter state out of the DB or batch it.
Task 4: Find top PostgreSQL queries by total time (requires pg_stat_statements)
cr0x@server:~$ psql -h db01 -U postgres -d app -c "select calls, round(total_exec_time::numeric,1) as total_ms, round(mean_exec_time::numeric,3) as mean_ms, left(query,70) as q from pg_stat_statements order by total_exec_time desc limit 5;"
calls | total_ms | mean_ms | q
-------+-----------+---------+----------------------------------------------------------------------
92000 | 580000.2 | 6.304 | update rate_limits set count=count+1, reset_at=$1 where key=$2
41000 | 210000.7 | 5.122 | insert into rate_limits(key,count,reset_at) values($1,$2,$3) on conflict
8000 | 90000.4 | 11.250 | select * from orders where tenant_id=$1 order by created_at desc limit 50
Meaning: Your limiter queries are dominating total execution time. They’re not “small overhead.” They are the workload.
Decision: Treat this as a functional bug, not “performance tuning.” Remove limiter writes from the DB hot path.
Task 5: Check PostgreSQL WAL generation rate
cr0x@server:~$ psql -h db01 -U postgres -d app -c "select now(), pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), '0/0')) as wal_since_start;"
now | wal_since_start
------------------------------+----------------
2025-12-31 12:03:10.123+00 | 148 GB
Meaning: Huge WAL for the runtime suggests heavy write churn. Rate-limiter counters are prime suspects.
Decision: If replicas lag or archiving backs up, reduce write volume immediately (edge limiting, caching, batch updates).
Task 6: Check PostgreSQL vacuum pressure for the limiter table
cr0x@server:~$ psql -h db01 -U postgres -d app -c "select relname, n_live_tup, n_dead_tup, last_autovacuum from pg_stat_user_tables where relname='rate_limits';"
relname | n_live_tup | n_dead_tup | last_autovacuum
------------+------------+------------+---------------------------
rate_limits| 120000 | 9800000 | 2025-12-31 11:57:42+00
Meaning: Dead tuples dwarf live tuples: constant updates. Vacuum is working overtime just to clean up limiter writes.
Decision: Stop updating those rows per request. Vacuum tuning is a bandage; the wound is design.
Task 7: Check MariaDB/InnoDB deadlocks and lock waits
cr0x@server:~$ mariadb -h db01 -u root -p -e "SHOW ENGINE INNODB STATUS\G" | sed -n '1,120p'
...
LATEST DETECTED DEADLOCK
------------------------
*** (1) TRANSACTION:
TRANSACTION 34588921, ACTIVE 0 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 5 lock struct(s), heap size 1136, 3 row lock(s)
MySQL thread id 7123, OS thread handle 140377, query id 991233 app 10.0.3.41 updating
UPDATE rate_limits SET count=count+1, reset_at=FROM_UNIXTIME(173564...) WHERE k='tenant:913'
*** (2) TRANSACTION:
TRANSACTION 34588922, ACTIVE 0 sec starting index read
...
Meaning: Deadlocks on the limiter table. Your “safety mechanism” is now a source of rollbacks and retries.
Decision: Remove the limiter from InnoDB. If you must keep some state, store it in a low-contention system (Redis/memcache) with expiration.
Task 8: Check MariaDB top statements (performance_schema enabled)
cr0x@server:~$ mariadb -h db01 -u root -p -e "SELECT DIGEST_TEXT, COUNT_STAR, ROUND(SUM_TIMER_WAIT/1e12,1) AS total_s FROM performance_schema.events_statements_summary_by_digest ORDER BY SUM_TIMER_WAIT DESC LIMIT 3;"
DIGEST_TEXT COUNT_STAR total_s
UPDATE `rate_limits` SET `count` = `count` + ? WHERE `k` = ? 210344 980.2
INSERT INTO `rate_limits` (`k`,`count`,`reset_at`) VALUES (...) 50321 220.5
SELECT * FROM `orders` WHERE `tenant_id` = ? ORDER BY ... 21000 190.0
Meaning: The limiter is top-of-the-chart. The DB is spending most of its time policing traffic.
Decision: Move enforcement earlier in the request path. Leave the DB for data.
Task 9: Check connection saturation (PostgreSQL)
cr0x@server:~$ psql -h db01 -U postgres -d app -c "select count(*) as total, sum(case when state='active' then 1 else 0 end) as active from pg_stat_activity;"
total | active
-------+--------
480 | 220
Meaning: Many connections and many actives. If you’re not pooling correctly, you’re paying overhead per connection and increasing contention.
Decision: Add/verify a pooler (pgbouncer) and cap concurrency per endpoint/tenant outside the DB.
Task 10: Check pgbouncer pool health
cr0x@server:~$ psql -h pgb01 -U pgbouncer -d pgbouncer -c "show pools;"
database | user | cl_active | cl_waiting | sv_active | sv_idle | maxwait
---------+------+-----------+------------+----------+---------+---------
app | app | 180 | 45 | 40 | 0 | 12.3
Meaning: Clients are waiting; servers are maxed. The app is generating more concurrent DB work than the DB can safely handle.
Decision: Enforce concurrency limits in the app (semaphores), move rate limiting to gateway, and reduce DB round trips per request.
Task 11: Check NGINX rate limiting effectiveness at the edge
cr0x@server:~$ sudo nginx -T 2>/dev/null | grep -n "limit_req"
134: limit_req_zone $binary_remote_addr zone=perip:10m rate=20r/s;
201: limit_req zone=perip burst=40 nodelay;
Meaning: Edge limiting is configured. If your DB is still overloaded, either limits are too high, keyed wrong, or bypassed for key traffic.
Decision: Validate keys (per API key/tenant, not per IP when behind NAT), and add per-route budgets.
Task 12: Verify the app is not retrying 429/lock errors aggressively
cr0x@server:~$ rg -n "retry|backoff|429|Too Many Requests" /etc/myapp/config.yaml
118:retry:
119: max_attempts: 6
120: backoff: "fixed"
121: backoff_ms: 50
122: retry_on_status: [429, 500, 502, 503, 504]
Meaning: Fixed 50ms backoff with 6 attempts on 429 is a retry storm generator. 429 is a signal to slow down, not speed-run the server.
Decision: Stop retrying 429 automatically, or use exponential backoff with jitter and a strict cap. Prefer client-side smoothing.
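If the retry policy lives in your own middleware, a safer shape looks roughly like this. It is a sketch, not a drop-in: send() stands in for the actual HTTP call, the constants are illustrative, and the key property is that 429 is never retried automatically.

# Exponential backoff with full jitter; 429 is surfaced, not retried.
import random
import time

RETRYABLE = {500, 502, 503, 504}   # note: 429 is deliberately absent
MAX_ATTEMPTS = 4
BASE_DELAY = 0.2                   # seconds
MAX_DELAY = 5.0

def call_with_backoff(send):
    for attempt in range(MAX_ATTEMPTS):
        resp = send()
        if resp.status == 429:
            return resp            # the server asked us to slow down; do that
        if resp.status not in RETRYABLE or attempt == MAX_ATTEMPTS - 1:
            return resp            # success, non-retryable error, or out of attempts
        delay = min(MAX_DELAY, BASE_DELAY * (2 ** attempt))
        time.sleep(random.uniform(0, delay))   # full jitter, capped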
Task 13: Identify hot keys in the limiter table (PostgreSQL example)
cr0x@server:~$ psql -h db01 -U postgres -d app -c "select key, count(*) as updates_last_min from rate_limit_audit where ts > now()-interval '1 minute' group by key order by updates_last_min desc limit 5;"
key | updates_last_min
--------------+------------------
tenant:913 | 18400
tenant:1442 | 10210
tenant:77 | 9800
Meaning: A few tenants dominate update volume. This is where fairness and isolation matter more than perfect global limits.
Decision: Introduce per-tenant budgets and bulkheads outside the DB; consider separate worker pools or queues per tier.
Task 14: Check replication lag (MariaDB replica)
cr0x@server:~$ mariadb -h db-rep01 -u root -p -e "SHOW SLAVE STATUS\G" | egrep "Seconds_Behind_Master|Slave_IO_Running|Slave_SQL_Running"
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Seconds_Behind_Master: 187
Meaning: Replica is behind. If limiter updates are replicated, you’re spending replication throughput on policing rather than business data.
Decision: Keep rate limiting state out of replicated OLTP, or at least don’t replicate it (separate instance/schema with different durability), and fix at the edge.
Three corporate mini-stories from the trenches
Mini-story 1: the incident caused by a wrong assumption
A SaaS team wanted per-tenant rate limits. The product requirement said “hard limit, exact, no more than N requests per minute.” Engineering heard “transactional.” They built a PostgreSQL table keyed by tenant ID, updated on every request, and used UPDATE ... RETURNING to decide allow/deny. It passed load tests, because load tests used evenly distributed tenants and a polite ramp.
Then a real customer ran a migration script that hammered a single tenant with high concurrency. The limiter table became the most-updated table in the database. The symptoms were classic: p95 latency climbed, then p99 fell off a cliff. The DB wasn’t “down,” it was just permanently busy waiting on locks. The application timed out, retried, and doubled the traffic. The incident commander kept asking why the limiter didn’t protect the DB from overload. Silence, then the slow realization: the limiter was the overload.
They fixed it by moving first-line rate limiting into the gateway (per API key) and using a per-tenant work queue for the expensive migration endpoints. The “hard exact per-minute” requirement got rewritten into something adults can operate: “enforce approximate limits with bounded burst, prioritize core endpoints.” Everyone survived. Nobody loved the postmortem, but the graphs were educational.
Mini-story 2: the optimization that backfired
A different company ran MariaDB and had a limiter table with one row per API key. They noticed lock waits and tried to “optimize” by keeping the limiter row small, converting the key to an integer, and packing counters into fewer bytes. Then they added a covering index because “indexes make lookups fast.”
What happened: the update still had to modify the clustered index record (primary key) and the secondary index. They increased write amplification and made the hot page hotter. CPU went up, not down. Deadlocks started appearing because the access pattern changed subtly under concurrency, and their retry logic turned each deadlock into a small burst.
The real fix wasn’t smaller rows. It was changing the architecture: use an in-memory distributed counter with TTL for policing, and let MariaDB handle durable business writes. They kept a coarse “daily quota consumed” in MariaDB for audit and billing, updated asynchronously. The limiter stopped fighting with the checkout flow, which is generally considered good manners.
Mini-story 3: the boring but correct practice that saved the day
Another team had already been burned by “clever in-DB control planes.” They ran PostgreSQL with pgbouncer and had a simple rule: the database does not do request admission control. Admission happens before the DB and uses budgets: per-route concurrency caps, per-tenant fairness, and global overload protection at the gateway.
They still tracked usage for billing, but they did it with asynchronous aggregation. Requests emitted events to a queue; a worker rolled them up every minute and wrote a summary row per tenant. The database saw one write per tenant per minute, not one write per request. When traffic spiked, the queue depth grew. That was acceptable; it did not page anyone at 3 a.m.
One day a marketing campaign created a burst. The gateway started returning 429s for a subset of abusive clients, the app maintained steady latency, and the database barely noticed. Their incident ticket was a single line: “Increased 429 rate due to campaign traffic; no customer impact.” This is the dream: boring, predictable, and slightly smug.
What to do instead: sane patterns that scale
Principle: reject early, cheaply, and independently
Your rate limiter’s job is to protect expensive resources. That means it must live in front of them and remain functional when they’re degraded.
Pattern 1: Edge rate limiting (gateway/ingress)
Use your API gateway or ingress controller to enforce basic limits:
- Per client identifier (API key, JWT subject, tenant ID).
- Per route (login endpoints and search endpoints are not the same).
- With bounded bursts (token bucket) and sensible default responses (429 with retry-after).
This doesn’t need perfect global accuracy. It needs speed and predictability.
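For reference, the core of a token bucket is small. A minimal in-process sketch, assuming a single process and a monotonic clock; the rate and burst values are illustrative (they mirror the nginx limits shown earlier), and a real gateway keeps this state per worker or in a shared store.

# Token bucket: steady refill rate plus a bounded burst, cheap to check per request.
import time
from dataclasses import dataclass

@dataclass
class Bucket:
    tokens: float
    last: float

class TokenBucketLimiter:
    def __init__(self, rate: float, burst: float):
        self.rate = rate        # tokens added per second (steady-state limit)
        self.burst = burst      # bucket capacity (allowed burst)
        self.buckets: dict[str, Bucket] = {}

    def allow(self, key: str) -> bool:
        now = time.monotonic()
        b = self.buckets.setdefault(key, Bucket(self.burst, now))
        b.tokens = min(self.burst, b.tokens + (now - b.last) * self.rate)
        b.last = now
        if b.tokens >= 1.0:
            b.tokens -= 1.0
            return True
        return False            # caller responds 429, ideally with Retry-After

limiter = TokenBucketLimiter(rate=20, burst=40)   # roughly 20 r/s with a burst of 40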
Pattern 2: Connection and concurrency limits (bulkheads)
Most DB “rate limiting” is really “we need fewer concurrent queries.” Solve that directly:
- Use a pooler (pgbouncer for PostgreSQL, ProxySQL or app-side pooling for MariaDB).
- Cap concurrency per endpoint (e.g., expensive report generation) with semaphores.
- Cap concurrency per tenant to avoid noisy neighbors.
This is control-plane engineering that actually controls something: concurrent work.
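A minimal sketch of what those caps look like inside an asyncio service; run_query stands in for your real DB call, and the route and tenant budgets are illustrative.

# Bulkheads: bound concurrent DB work per route and per tenant with semaphores.
import asyncio
from collections import defaultdict

ROUTE_LIMITS = {"/reports": 4, "/search": 16, "default": 32}
PER_TENANT_LIMIT = 8

route_sems = {route: asyncio.Semaphore(n) for route, n in ROUTE_LIMITS.items()}
tenant_sems = defaultdict(lambda: asyncio.Semaphore(PER_TENANT_LIMIT))

async def guarded_query(route: str, tenant: str, run_query):
    """Acquire the per-tenant and per-route budgets before touching the database."""
    route_sem = route_sems.get(route, route_sems["default"])
    async with tenant_sems[tenant], route_sem:
        return await run_query()

The point is not the exact numbers; it is that the number of in-flight DB queries is bounded by something you chose, not by whatever concurrency the internet happens to send you.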
Pattern 3: Queue the expensive work
If the endpoint triggers heavy DB activity, do not let HTTP concurrency dictate DB concurrency. Put heavy work behind a queue. Let the queue depth absorb bursts. Run workers with fixed concurrency and known query patterns. Your database will thank you by staying alive.
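A minimal sketch of that shape, assuming an asyncio app where process(job) is the real heavy DB operation; the queue size and worker count are illustrative.

# Bounded queue plus fixed workers: HTTP bursts grow queue depth, not DB concurrency.
import asyncio

job_queue: asyncio.Queue = asyncio.Queue(maxsize=10_000)
WORKERS = 4

async def enqueue(job) -> bool:
    """Called from the request handler: accept the job or shed load."""
    try:
        job_queue.put_nowait(job)
        return True              # respond 202 Accepted
    except asyncio.QueueFull:
        return False             # respond 429/503: queue depth is the backpressure signal

async def worker(process):
    while True:
        job = await job_queue.get()
        try:
            await process(job)   # only WORKERS of these ever touch the DB at once
        finally:
            job_queue.task_done()

async def start_workers(process):
    return [asyncio.create_task(worker(process)) for _ in range(WORKERS)]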
Pattern 4: Use an in-memory store for short-window counters
If you truly need a shared counter with TTL, use a system designed for it (Redis/memcached-like). You get:
- Fast atomic increments.
- Expiration semantics without vacuum/purge drama.
- Isolation from your OLTP database’s write path.
Still apply hygiene: avoid single hot keys, include jitter in TTLs, and use per-tenant partitioning if needed.
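A minimal fixed-window sketch using redis-py; the hostname, key naming, window, and limit are illustrative, and a production version would add the jitter and hot-key handling mentioned above.

# Fixed-window counter in Redis: atomic INCR plus a TTL, no vacuum or purge debt.
import time
import redis

r = redis.Redis(host="limiter-redis", port=6379)

WINDOW_SECONDS = 60
LIMIT = 100

def allow(key: str) -> bool:
    window = int(time.time() // WINDOW_SECONDS)       # bucket keys by window
    redis_key = f"rl:{key}:{window}"
    count = r.incr(redis_key)
    if count == 1:
        # First hit in this window: let the key expire on its own shortly after
        # the window closes instead of being cleaned up by the database.
        r.expire(redis_key, WINDOW_SECONDS + 5)
    return count <= LIMIT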
Pattern 5: Store durable quotas asynchronously
Billing and compliance often require durable records. Great—store them, but do it out of band:
- Emit usage events.
- Aggregate in time buckets (per minute/hour/day).
- Write summaries to MariaDB/PostgreSQL.
The database stays a database, not a high-frequency counter store.
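A minimal rollup-worker sketch; pull_events and write_rollups are stand-ins for your queue consumer and your batched upsert into the database.

# Roll up usage events in memory and flush once per interval: one write per
# tenant per minute instead of one write per request.
import time
from collections import Counter

FLUSH_INTERVAL = 60   # seconds

def rollup_loop(pull_events, write_rollups):
    usage = Counter()              # (tenant_id, minute_bucket) -> request count
    last_flush = time.time()
    while True:
        for event in pull_events(timeout=1.0):
            minute = int(event["ts"] // 60) * 60
            usage[(event["tenant_id"], minute)] += event.get("count", 1)
        if usage and time.time() - last_flush >= FLUSH_INTERVAL:
            write_rollups(usage)   # a single batched INSERT/UPSERT
            usage.clear()
            last_flush = time.time()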
Pattern 6: “Budget” the database explicitly
Instead of “N requests per minute,” define budgets tied to the actual scarce resource:
- Query concurrency budget: max active DB queries per tenant/service.
- CPU budget: limit expensive endpoints by runtime (where measurable).
- I/O budget: throttle bulk exports and migrations.
You can’t buy perfect fairness cheaply. You can buy predictable latency and survivable overload. Choose survivable overload.
Common mistakes: symptoms → root cause → fix
Mistake 1: “429s are high but DB is also slow”
Symptoms: Rising 429 rate, rising DB CPU, rising lock waits, p95 latency increases across all endpoints.
Root cause: Rejections depend on DB writes/locks; limiter itself is the load.
Fix: Move limiter to gateway/in-memory store. Make rejection path DB-free. Add app-side concurrency caps.
Mistake 2: “Only one tenant is slow, but everyone suffers”
Symptoms: One big customer causes global slowdown. DB shows lock waits on limiter table; connection pools saturate.
Root cause: Shared limiter resources or shared DB pool without per-tenant isolation.
Fix: Implement bulkheads: per-tenant concurrency limits, separate worker pools for heavy endpoints, and queue heavy jobs.
Mistake 3: “We added indexes to the limiter table and it got worse”
Symptoms: Higher CPU, more write I/O, deadlocks increase, overall throughput drops.
Root cause: Write amplification: secondary indexes must be maintained on each update; hot pages become hotter.
Fix: Remove the design, not just the indexes. If you must keep a table, write less often (batch/rollup).
Mistake 4: “Autovacuum can’t keep up / purge lag grows”
Symptoms: PostgreSQL dead tuples pile up; MariaDB history list length grows; performance degrades over time.
Root cause: High-frequency updates to the same rows create churn that maintenance can’t amortize.
Fix: Eliminate per-request updates; use TTL counters in memory; aggregate writes.
Mistake 5: “It works in staging but fails in production”
Symptoms: Load tests pass; real traffic causes lock contention and tail latency.
Root cause: Staging lacks skew (hot tenants), lacks retries, lacks real burstiness, and often lacks real concurrency patterns.
Fix: Test with skewed distributions, bursts, retries, and realistic client concurrency. Also: stop using the DB as the limiter.
Mistake 6: “We used advisory locks to implement a limiter” (PostgreSQL)
Symptoms: Throughput collapses under load; sessions pile up waiting for advisory locks; hard-to-debug stalls.
Root cause: Advisory locks serialize work and are easy to turn into a global mutex. That’s not traffic shaping.
Fix: Replace with gateway limits and per-tenant concurrency caps. Use advisory locks only for rare coordination, not per-request gating.
Checklists / step-by-step plan
Step-by-step plan: migrate off DB-based rate limiting without breaking everything
- Inventory what is being limited. Identify endpoints and keys (per IP, per API key, per tenant). If you can’t explain it, you can’t operate it.
- Measure current limiter cost. Use pg_stat_statements or MariaDB performance_schema to quantify time spent in limiter statements.
- Stop retrying 429 by default. Fix clients/app middleware. 429 should reduce load, not multiply it.
- Add edge rate limiting for the worst offenders. Start with the top 1–3 endpoints by request rate or DB cost.
- Add per-route concurrency caps in the application. Cap expensive DB work directly. This is often the biggest win.
- Introduce a queue for heavy tasks. Migrations, exports, report generation: if it’s heavy, it belongs in a worker.
- Move short-window counters to an in-memory store if needed. Use TTL-based counters and token bucket semantics outside the DB.
- Keep a durable rollup in the DB for billing/audit. Write one row per tenant per minute/hour, not per request.
- Implement overload behavior. Define what happens when the limiter store is unavailable (fail open for low-risk endpoints? fail closed for abusive patterns?). Decide, document, and test it.
- Delete the old limiter path. Feature-flag removal beats “we’ll keep it just in case.” “Just in case” is how dead code pages you later.
Operational checklist: keep the database protected
- Gateway limits in place for per-client and per-route traffic.
- Connection pooling configured and monitored (active, waiting, maxwait).
- Explicit DB concurrency budgets per service and per tenant.
- Timeouts aligned deliberately: DB statement timeout < app timeout < gateway timeout, so inner layers give up first instead of doing work nobody is waiting for.
- Retry policy includes exponential backoff + jitter; no automatic retry on 429.
- Dashboards track: lock waits, replication lag, WAL/binlog rate, queue depth, 429 rate.
FAQ
1) Is rate limiting in the database ever acceptable?
Only for low-QPS, low-concurrency back-office workloads where correctness matters more than latency, and where a limiter write won’t contend with core traffic. For public APIs or high-QPS internal services, it’s a trap.
2) Which is “better” for DB-based rate limiting: MariaDB or PostgreSQL?
Neither is “better” in the way you want. Both are excellent OLTP databases. Both will suffer when you force high-frequency shared-counter updates. The failure modes differ in instrumentation and maintenance behavior, not in outcome.
3) What if I need strict, globally consistent limits?
Ask why. Most “strict” requirements are actually about cost control or abuse prevention. Use a distributed limiter designed for that, or accept approximate limits with bounded burst and strong monitoring. If you truly need strictness, you’re signing up for coordination costs—put that coordination in a system built for it, not your OLTP database.
4) Can I just use a separate database instance for rate limiting?
You can, and it’s better than contaminating your primary OLTP database. But it’s still a database doing per-request write coordination. You’ll likely rediscover the same scaling limits, just on a smaller, cheaper fire.
5) Why do “counter tables with TTL” hurt PostgreSQL specifically?
Frequent updates create dead tuples and WAL. Autovacuum must clean them; if it falls behind, performance degrades. Even if it keeps up, you’re spending I/O and CPU on maintenance that does not deliver business value.
6) Why do they hurt MariaDB/InnoDB specifically?
InnoDB must manage redo/undo, page latches, and potential hot spots in clustered indexes. Under high concurrency, you’ll see lock waits, deadlocks, and purge pressure. The limiter becomes a write hotspot with collateral damage.
7) Isn’t using Redis for rate limiting just moving the problem?
It moves the problem to a system designed for fast atomic operations and TTLs. You still need to architect for hot keys and bursts, but you’re no longer taxing your transactional database and its durability machinery for every request.
8) What’s the single most effective protection for the DB?
Cap concurrency to the database. Rate limits based on requests per second are indirect; concurrency caps are direct. If you prevent 500 concurrent expensive queries, the DB stays upright.
9) How do I handle limiter-store outages (e.g., Redis down)?
Decide per endpoint: fail open for low-risk read-only routes, fail closed for abuse-sensitive routes (login, expensive searches). Implement a circuit breaker and a safe default. Test it during business hours, not during an incident.
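A minimal sketch of that per-route decision; limiter_allows stands in for the call to your limiter store, and the route list is illustrative.

# Fail-open for low-risk routes, fail-closed for abuse-sensitive ones.
FAIL_CLOSED_ROUTES = {"/login", "/search/export"}   # protect the abusable paths

def admit(route: str, key: str, limiter_allows) -> bool:
    try:
        return limiter_allows(key)
    except Exception:
        # Limiter store unavailable: apply the documented per-route default.
        return route not in FAIL_CLOSED_ROUTES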
10) What about “rate limiting inside the DB” for internal services only?
Internal traffic is often the worst: retries, fan-out, batch jobs, and a willingness to hammer. Use budgets and bulkheads internally too. Your database does not care whether the overload is “internal.”
Conclusion: practical next steps
If you remember one thing: a database is not a bouncer. It’s the dance floor. Don’t make it check IDs at the door while the music is playing.
Next steps that actually change outcomes:
- Prove it with data: identify limiter statements in pg_stat_statements or MariaDB digests and quantify their cost.
- Remove the DB from the rejection path: implement edge limits for top endpoints and stop retrying 429.
- Cap DB concurrency: pool connections and enforce per-route and per-tenant bulkheads.
- Queue heavy work: keep HTTP bursts from turning into DB bursts.
- Keep durability where it belongs: aggregate usage into rollups and store those in the database asynchronously.
You’ll ship faster, sleep more, and your database will go back to doing what it’s good at: storing data, not mediating human ambition.