You turned on rate limiting, your graphs calmed down, and then Support lit up: “customers can’t log in.” The bot traffic is gone. So are the humans. Welcome to the classic Nginx trap: rate limiting is easy to enable and surprisingly hard to tune in a way that matches real user behavior.
This is a production-minded guide for Ubuntu 24.04 systems running Nginx. We’ll choose sane keys, size shared memory zones, handle bursts without rewarding abusers, and diagnose 429s without guesswork. You’ll leave with configs you can defend in a change review and a playbook you can run at 02:00 with one eye open.
What Nginx rate limiting actually does (and what it doesn’t)
Nginx has two main knobs people lump together as “rate limiting”:
- limit_req: limits the request rate per key (requests per second/minute). Think “how fast is this client asking?”
- limit_conn: limits the number of simultaneous connections per key. Think “how many sockets is this client holding open?”
limit_req is what you use for brute-force protection, API fairness, and mitigating basic DDoS spray. limit_conn is what you use for slowloris-style behavior, overly-chatty clients, or a CDN/monitoring mistake that opens too many connections.
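Here’s what the two look like side by side; the zone names, sizes, numbers, and app_upstream are placeholders, not recommendations:

    # In the http {} block: define the shared-memory zones.
    limit_req_zone  $binary_remote_addr zone=req_per_ip:10m  rate=10r/s;   # request rate per IP
    limit_conn_zone $binary_remote_addr zone=conn_per_ip:10m;              # concurrent connections per IP

    # In a server or location block: apply the policies.
    location /api/ {
        limit_req  zone=req_per_ip burst=20;    # "how fast is this client asking?"
        limit_conn conn_per_ip 20;              # "how many sockets is this client holding open?"
        proxy_pass http://app_upstream;         # placeholder upstream
    }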
What neither one does: they don’t distinguish “real users” from “fake users.” They enforce a policy. If your policy key is wrong or your thresholds ignore modern web behavior (parallel requests, retries, mobile networks), you’ll block the very people who pay your salary.
Also: rate limiting is not a WAF, not bot detection, and not a replacement for authentication gating. It’s a circuit breaker for request volume patterns.
A few facts and history that make tuning easier
These aren’t trivia for trivia’s sake. They’re the reasons a “5r/s per IP” rule works in one environment and detonates in another.
- HTTP/1.1 keepalive changed the meaning of “connections”. Early HTTP opened many short connections; keepalive made a single connection carry many requests. That’s why limit_conn can be useless for request floods and devastating for WebSockets.
- HTTP/2 multiplexing changed the meaning of “parallel”. Browsers can run many concurrent streams on one TCP connection. A client can generate bursts without opening more sockets, so request-rate controls matter more than connection counts.
- CDNs and carrier-grade NAT collapsed users onto shared IPs. Rate limiting by IP can punish entire offices, schools, hotels, and mobile carrier egress pools. This is not a theoretical edge case; it’s Tuesday.
- Nginx rate limiting uses a shared memory zone. It’s fast because it’s in-memory and shared across workers. It’s also bounded, and when it’s full, it evicts old entries. Eviction changes behavior under load.
- The “leaky bucket” model is old, but still wins. limit_req is a leaky-bucket style limiter. It’s predictable and cheap compared to per-request external checks.
- 429 Too Many Requests is a modern contract. It’s not just “blocked.” Many clients will retry; some will back off; some will hammer harder. Whether you return 429 or 503 changes traffic shape.
- Retries became normal. Mobile clients, service meshes, and libraries retry aggressively. A “minor slowdown” can turn into a burst of retries that trips your limiter and makes a bad day worse.
- Attackers adapted to naive limits years ago. Botnets rotate IPs; credential-stuffers distribute attempts across many sources; scrapers use residential proxies. A single global per-IP limit is mostly a speed bump.
One quote that belongs on every on-call rotation wiki: “Hope is not a strategy.” — James Cameron. It’s short, blunt, and correct.
Principles: don’t rate limit “users”, rate limit behaviors
If you try to protect “the website” with one limiter, you’ll either be too strict for humans or too generous for abuse. Instead:
- Protect expensive actions more than cheap ones. Login attempts, password resets, search endpoints, and dynamic pages get stricter limits. Static assets can be basically unlimited (or offloaded to a CDN).
- Prefer identity keys over network keys when you can. Rate limit by API key, user ID, or session cookie for authenticated traffic. IP is a last resort for “unknown” traffic.
- Let bursts happen, but cap sustained abuse. Humans click, apps prefetch, browsers open multiple connections, and JS does background requests. A small burst allowance prevents false positives.
- Fail “politely” for clients that will retry. If you return 429, include Retry-After where you can. You’re shaping traffic, not just slamming the door (a sketch follows this list).
- Make it observable. If you can’t answer “who got limited, on what endpoint, with what key, and was it expected?” then you’re flying blind.
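Here’s a minimal sketch of the “fail politely” behavior: have limit_req reject with 429 instead of its default 503, and attach a Retry-After header to the rejection. The @limited location name and the 30-second value are illustrative, not a recommendation:

    limit_req_status 429;             # reject with 429 instead of the default 503

    server {
        error_page 429 = @limited;    # route limiter rejections through a named location

        location @limited {
            add_header Retry-After 30 always;   # nudge well-behaved clients to back off
            return 429;
        }
    }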
Joke 1/2: Rate limiting is like a nightclub bouncer—good ones stop fights, bad ones eject the accountant because he “looked suspicious.”
Choosing the right key: IP, user, token, or something smarter
The key is the heart of your limiter. It defines “who” is being counted. Most pain comes from picking a key that doesn’t match the reality of your traffic.
Option A: $binary_remote_addr (IP-based)
This is the default pattern because it’s simple and doesn’t require application cooperation.
- Pros: Works for anonymous traffic; cheap; no app changes.
- Cons: NAT and proxies collapse multiple users into one bucket; rotating IPs evade it; behind LBs you might be limiting the LB itself if real IP isn’t configured.
Use it for: brute-force attempts on public endpoints, baseline protection for unauthenticated areas, crude mitigation during an incident.
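A minimal per-IP sketch for one anonymous, sensitive endpoint; the zone name, size, and rate are placeholders to adapt, not recommendations:

    limit_req_zone $binary_remote_addr zone=anon_per_ip:10m rate=30r/m;

    location = /password/reset {
        limit_req zone=anon_per_ip burst=5;
        proxy_pass http://app_upstream;
    }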
Option B: authenticated identity (cookie, JWT claim, API key)
If you have an API key header like X-API-Key, or a JWT claim you can map into a variable, you can rate limit per customer rather than per NAT gateway.
- Pros: Fair; resistant to NAT issues; aligns with billing/abuse boundaries.
- Cons: Requires reliable parsing; unauthenticated paths still need a fallback; leaked keys can be abused.
Option C: composite keys (IP + endpoint class, or IP + UA)
Composite keys are useful when you can’t trust identity and you still want to separate behaviors.
Example: rate limit login attempts per IP, but also apply a global low ceiling for a single IP hitting many usernames. Nginx won’t do cross-key correlation for you, but you can set different zones per location and key choice.
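A sketch of the composite-key idea: classify paths, then key the zone on “IP + class” so one hot endpoint doesn’t drain the budget for everything else. The class names, rate, and app_upstream are illustrative:

    # Classify request paths (regex matches need the ~ prefix in map).
    map $uri $endpoint_class {
        ~^/search   "search";
        ~^/login    "auth";
        default     "general";
    }

    # One zone, keyed on IP plus class, so behaviors are counted separately.
    limit_req_zone "$binary_remote_addr:$endpoint_class" zone=ip_class:50m rate=10r/s;

    location / {
        limit_req zone=ip_class burst=20;
        proxy_pass http://app_upstream;
    }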
What I actually recommend in production
Use two layers:
- Anonymous layer: IP-based limit for sensitive endpoints (login, password reset, search). Modest burst. Strict sustained rate.
- Authenticated layer: Per-customer token/API key limit for APIs. Higher burst. Higher sustained rate. Separate limits per plan if you can.
And if you’re behind a reverse proxy, fix real IP first. If you don’t, you’re rate limiting a load balancer and calling it “security.”
Sizing shared memory zones and understanding eviction
Nginx stores limiter state in a shared memory zone defined by limit_req_zone. That zone has a fixed size. When it fills up, old entries get evicted. Eviction causes two kinds of weirdness:
- Under-limiting: abusive clients rotate through enough unique keys to push themselves out of the zone, effectively “forgetting” their history.
- Over-limiting of innocents: less common, but when a zone thrashes, you may see jittery behavior where clients sometimes pass, sometimes get 429, depending on whether their entry is currently resident.
Nginx documentation often suggests rough memory per state entry, but in practice you size based on unique keys you expect during peak windows. For IP-based limiting on a public site, unique keys can be shockingly high.
Practical sizing heuristic
Start with:
- 10m for small sites or single-purpose endpoints
- 50m–200m for public, high-traffic edges
Then measure. If you’re running multiple zones (recommended), each zone needs its own memory.
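For rough arithmetic, the nginx docs say one megabyte of zone stores roughly 16,000 states at 64 bytes each (32-bit platforms) or 8,000 states at 128 bytes each (64-bit platforms). Treat these as order-of-magnitude estimates:

    # 64-bit hosts store ~128 bytes per state: 10m ≈ 80,000 unique keys, 100m ≈ 800,000.
    # Size for peak unique keys; when the zone fills, the oldest entries are evicted.
    limit_req_zone $binary_remote_addr zone=edge_per_ip:100m rate=20r/s;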
Burst, nodelay, and why “smooth” is not always kind
Most false positives happen because of bursts. Real browsers and mobile apps don’t make one request per second like polite robots. They do this:
- Load HTML
- Immediately request CSS, JS, images, fonts
- Run API calls (sometimes several) once JS loads
- Retry a failed request quickly
limit_req supports burst and nodelay:
- burst=N allows N excessive requests to queue (or be delayed), smoothing a spike.
- nodelay makes bursts pass immediately up to the burst limit (no smoothing delay); beyond that, requests are rejected.
The tradeoff:
- burst without nodelay: kinder to backends, but users experience latency and may retry, which can amplify load.
- burst with nodelay: better UX for small bursts, but can deliver a sharper punch to your upstream during attacks.
For login endpoints, I often prefer no nodelay (delay abusers) but a small burst for legit double-clicks. For API endpoints where clients time out and retry, I often prefer nodelay with a moderate burst, so “normal” microbursts don’t become long-tail latency.
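A sketch of both preferences, plus the delay= parameter (available since nginx 1.15.7, so present in Ubuntu 24.04’s 1.24.x packages), which passes the first part of a burst immediately and delays the rest. Zone names and numbers are illustrative:

    # Login: small burst, no nodelay; excess requests are delayed, which wastes an abuser's time.
    location = /login {
        limit_req zone=login_per_ip burst=5;
        proxy_pass http://app_upstream;
    }

    # API: moderate burst forwarded immediately; anything beyond the burst is rejected.
    location ^~ /api/ {
        limit_req zone=api_per_key burst=20 nodelay;
        proxy_pass http://app_upstream;
    }

    # Middle ground with delay=: first 8 excess requests pass immediately, 9-20 are delayed, >20 rejected.
    # limit_req zone=api_per_key burst=20 delay=8;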
Different limits for different endpoints (logins are not CSS)
Stop applying a single limiter to / and calling it a day. You want per-class policies:
- Login / auth: strict per IP, very strict per username if the app supports it (Nginx alone can’t, but you can key on a login parameter only with extra modules; often you do it in the app). Add per-session exemptions carefully.
- Password reset / OTP: strict. Abuse here costs money and reputation.
- Search / expensive queries: strict per IP or per token, because bots love search endpoints.
- General API: per API key/customer. Different tiers if you can.
- Static assets: usually no limit_req; use CDN caching or just let them fly. If you must, use a very high limit and focus on connection controls.
- WebSocket / SSE: don’t use request-rate limiting after the upgrade; consider connection limits keyed by IP or token (see the sketch below).
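A connection-limit sketch for WebSocket/SSE endpoints; the path, zone name, and the cap of 20 concurrent connections are illustrative:

    limit_conn_zone $binary_remote_addr zone=ws_per_ip:10m;

    location /ws/ {
        limit_conn ws_per_ip 20;                   # cap concurrent sockets per client IP
        proxy_pass http://app_upstream;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;    # pass the WebSocket upgrade through
        proxy_set_header Connection "upgrade";
    }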
Behind a load balancer: real IP, trust boundaries, and spoofing
On Ubuntu 24.04, Nginx frequently sits behind a cloud load balancer, ingress controller, CDN, or service mesh. If you blindly use $remote_addr, you might be limiting the proxy address. That means:
- One noisy client can throttle everyone.
- Or you set limits so high they’re meaningless, and attackers stroll through.
Fix it with the Real IP module, but be strict about trust. Only accept X-Forwarded-For (or PROXY protocol) from known proxy IP ranges. If you trust the whole internet, any client can forge an X-Forwarded-For and get infinite fresh identities.
Logging and observability: make 429s actionable
The default access log format won’t tell you why a request was limited. You need:
- the key used for limiting (or at least the client IP after real-IP processing)
- request time and upstream time
- status code and bytes sent
- request ID for correlation
- which limiter tripped
Nginx exposes $limit_req_status which is extremely useful. Log it. If it’s “REJECTED”, you know the limiter did it. If it’s “PASSED” but clients still complain, your bottleneck is elsewhere.
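A sketch of a log format that captures those fields; the format name is arbitrary, and $limit_conn_status only matters if you also use limit_conn:

    log_format limited '$remote_addr $request_id [$time_local] "$request" '
                       '$status $body_bytes_sent rt=$request_time urt=$upstream_response_time '
                       'lrs=$limit_req_status lcs=$limit_conn_status';

    access_log /var/log/nginx/access.log limited;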
Practical tasks: commands, outputs, and decisions (12+)
These are the things I actually run on Ubuntu when someone says “rate limiting is blocking real users” or “the site is getting hammered.” Each task includes what the output means and what decision you make from it.
Task 1: Confirm Nginx version and build modules
cr0x@server:~$ nginx -V
nginx version: nginx/1.24.0 (Ubuntu)
built with OpenSSL 3.0.13 30 Jan 2024
configure arguments: ... --with-http_realip_module --with-http_limit_req_module --with-http_limit_conn_module ...
What it means: You have the Real IP, limit_req, and limit_conn modules compiled in (common on Ubuntu packages).
Decision: If --with-http_realip_module is missing and you’re behind a proxy, stop and fix your packaging strategy (use the distro package, or rebuild). Without real IP, your key selection may be pointless.
Task 2: Validate the active config (not the one you think is active)
cr0x@server:~$ sudo nginx -t
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful
What it means: Syntax is valid.
Decision: If this fails, do not reload. Fix config first. Rate limiting changes that don’t load cleanly become “we changed something” folklore with no actual effect.
Task 3: Dump the full loaded configuration to see includes and overrides
cr0x@server:~$ sudo nginx -T | sed -n '1,120p'
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful
# configuration file /etc/nginx/nginx.conf:
user www-data;
worker_processes auto;
...
include /etc/nginx/conf.d/*.conf;
include /etc/nginx/sites-enabled/*;
What it means: You can now locate where limit_req and limit_req_zone are defined.
Decision: If you find multiple conflicting limiters across conf.d and sites-enabled, consolidate. Fragmented rate limits are how you get “works in staging” and “blocks users in prod.”
Task 4: Check whether you are rate limiting the load balancer (real IP sanity)
cr0x@server:~$ sudo tail -n 5 /var/log/nginx/access.log
10.10.5.12 - - [30/Dec/2025:10:10:12 +0000] "GET /login HTTP/2.0" 200 512 "-" "Mozilla/5.0 ..."
10.10.5.12 - - [30/Dec/2025:10:10:12 +0000] "GET /api/me HTTP/2.0" 429 169 "-" "Mozilla/5.0 ..."
10.10.5.12 - - [30/Dec/2025:10:10:13 +0000] "GET /api/me HTTP/2.0" 429 169 "-" "Mozilla/5.0 ..."
What it means: If every client appears as the same RFC1918 address (like 10.10.5.12), that’s probably your proxy/LB.
Decision: Fix Real IP before touching thresholds. Otherwise you’ll tune for the wrong “client” and block everyone at once.
Task 5: Verify Real IP is configured and trust is narrow
cr0x@server:~$ sudo nginx -T | grep -E 'real_ip_header|set_real_ip_from|real_ip_recursive'
real_ip_header X-Forwarded-For;
set_real_ip_from 10.10.0.0/16;
set_real_ip_from 192.0.2.10;
real_ip_recursive on;
What it means: Nginx will replace $remote_addr using XFF, but only when the request comes from trusted proxy ranges.
Decision: If you see set_real_ip_from 0.0.0.0/0;, you’ve effectively invited spoofing. Fix that immediately. Your rate limiting can be bypassed by a header.
Task 6: Inspect error log for limit_req signals
cr0x@server:~$ sudo grep -E 'limiting requests|limit_req' /var/log/nginx/error.log | tail -n 5
2025/12/30 10:10:13 [error] 12410#12410: *991 limiting requests, excess: 5.610 by zone "api_per_ip", client: 203.0.113.55, server: example, request: "GET /api/me HTTP/2.0", host: "example"
What it means: Nginx is rejecting due to zone api_per_ip, and it shows the calculated “excess.” This is gold for tuning.
Decision: If humans are blocked, decide whether the key is wrong (NAT/proxy), the rate is too low, or the burst is too small.
Task 7: Add a log format that records limiter status (and check it works)
cr0x@server:~$ sudo grep -R "limit_req_status" -n /etc/nginx | head
/etc/nginx/nginx.conf:35:log_format main_ext '$remote_addr $request_id $status $request $limit_req_status';
What it means: Your logs now record whether the limiter passed, delayed, or rejected a request.
Decision: If you can’t see $limit_req_status per request, you will misdiagnose “429 because of upstream” vs “429 because of limiter.” Add it before tuning.
Task 8: Confirm what’s returning 429 (Nginx vs upstream)
cr0x@server:~$ curl -s -D - -o /dev/null https://example/api/me | sed -n '1,12p'
HTTP/2 429
server: nginx
date: Tue, 30 Dec 2025 10:12:10 GMT
content-type: text/html
content-length: 169
What it means: The response is from Nginx (see server: nginx). If your upstream also returns 429, you’ll need additional headers/log fields to distinguish.
Decision: If it’s Nginx, tune Nginx. If it’s upstream, don’t waste time adjusting limit_req.
Task 9: Measure current request rates per client IP quickly
cr0x@server:~$ sudo awk '{print $1}' /var/log/nginx/access.log | sort | uniq -c | sort -nr | head
8421 203.0.113.55
2210 198.51.100.27
911 203.0.113.101
455 192.0.2.44
What it means: Which IPs are hottest in your logs for the sampled window (not a true rate, but a quick heat map).
Decision: If a few IPs dominate, per-IP limits are likely effective. If you see many IPs each with low counts, you’re dealing with distributed traffic and per-IP limits won’t help much.
Task 10: Check active connections and who holds them
cr0x@server:~$ sudo ss -Htn state established '( sport = :443 )' | awk '{print $4}' | cut -d: -f1 | sort | uniq -c | sort -nr | head
120 203.0.113.55
48 198.51.100.27
11 192.0.2.44
What it means: Clients with many established TCP connections to 443.
Decision: If one client holds hundreds/thousands of connections, limit_conn is a better tool than limit_req. If connections are low but requests are high, focus on limit_req and caching.
Task 11: Check whether HTTP/2 is enabled (affects burst behavior)
cr0x@server:~$ sudo grep -Rn "listen 443" /etc/nginx/sites-enabled | head
/etc/nginx/sites-enabled/example.conf:12: listen 443 ssl http2;
What it means: HTTP/2 multiplexing is in play.
Decision: Expect “bursty” patterns from normal browsers. Increase burst for front-end endpoints, or scope limiting tighter to expensive paths.
Task 12: Track 429s over time from logs (cheap trend view)
cr0x@server:~$ sudo awk '$9==429 {c++} END{print c+0}' /var/log/nginx/access.log
317
What it means: Count of 429 responses in the current log file. (With the default combined format, field $9 is the status; if you switched to a custom format like main_ext, adjust the field number.)
Decision: If this number is non-trivial, you need to classify: are these expected (bots) or unexpected (customers)? Add endpoint breakdown next.
Task 13: Break down 429s by path to find what you’re actually blocking
cr0x@server:~$ sudo awk '$9==429 {print $7}' /var/log/nginx/access.log | sort | uniq -c | sort -nr | head
210 /api/me
61 /login
24 /search
12 /password/reset
What it means: Which endpoints are being limited.
Decision: If /api/me is getting limited and that’s called by your SPA on every page load, your baseline is too strict or your key is wrong (NAT/proxy). If /login is limited heavily, that might be correct (credential stuffing) but verify customer complaints.
Task 14: Confirm zone configuration and rates (find the policy you’re enforcing)
cr0x@server:~$ sudo nginx -T | grep -E 'limit_req_zone|limit_req[^_]' -n | head -n 30
45: limit_req_zone $binary_remote_addr zone=login_per_ip:20m rate=5r/m;
46: limit_req_zone $binary_remote_addr zone=api_per_ip:50m rate=10r/s;
112: limit_req zone=login_per_ip burst=3;
166: limit_req zone=api_per_ip burst=20 nodelay;
What it means: Your login is limited to 5 requests per minute per IP. Your API is 10 requests per second per IP. Those are very different policies.
Decision: Decide if these numbers reflect reality. 10r/s per IP might be too low if NAT’d corporate customers exist. 5r/m on login might be too low if the login page triggers multiple auth-related calls.
Task 15: Reload safely and confirm workers pick it up
cr0x@server:~$ sudo systemctl reload nginx
cr0x@server:~$ systemctl status nginx --no-pager | sed -n '1,12p'
● nginx.service - A high performance web server and a reverse proxy server
Loaded: loaded (/usr/lib/systemd/system/nginx.service; enabled; preset: enabled)
Active: active (running) since Tue 2025-12-30 09:55:01 UTC; 18min ago
Docs: man:nginx(8)
What it means: Reload succeeded and Nginx stayed up.
Decision: If you see failed reloads, stop iterating. Fix configuration hygiene first. Rate limit tuning in a broken deployment pipeline is just performance art.
Fast diagnosis playbook
When 429s spike or users complain, don’t wander. Do this in order. The goal is to find the bottleneck (policy, key, or capacity) in under 10 minutes.
First: confirm who is generating the 429 and why
- Check access logs for status 429 and top paths (Task 12, Task 13).
- Check error log for “limiting requests” messages (Task 6).
  - If error log shows limiter rejections: it’s Nginx policy.
  - If not: 429 may be from the upstream app or a gateway layer.
- Check headers via curl (Task 8) to see if Nginx is serving 429.
Second: verify the key is sane in your topology
- Look at the logged client IPs (Task 4).
- If you see LB IPs: real IP config is missing/broken.
- If you see a small set of NAT IPs: per-IP limiting might be punishing many users.
- Verify Real IP trust boundaries (Task 5). Ensure you trust only your proxies.
Third: decide if thresholds or burst are wrong (or if you need endpoint-specific limits)
- Inspect current limit zones and rates (Task 14).
- Check HTTP/2 usage (Task 11). If enabled, increase burst for browser-facing endpoints.
- Check whether the pain is connections or requests (Task 10). Use limit_conn for connection hoarders.
Fourth: verify it’s not a capacity problem masquerading as rate limiting
When upstreams slow down, retries increase, which increases request rate, which triggers your limiter. The limiter isn’t “wrong”; it’s exposing a real capacity issue.
Use your existing metrics, but on-box you can at least log upstream timing fields and spot slow upstreams in access logs. If upstream times are high, tune upstream first or add caching, then revisit the limiter.
Three corporate mini-stories from the trenches
Mini-story 1: The incident caused by a wrong assumption (NAT is not “rare”)
A mid-sized B2B SaaS rolled out Nginx limit_req with a tidy rule: 5 requests per second per IP on /api/. In staging it looked perfect. In production, everything was fine until Monday morning in North America.
Support tickets came in waves: “App spins forever,” “Dashboard doesn’t load,” “Random 429s.” Engineering checked the dashboards and saw a mild traffic increase—nothing dramatic. The incident commander initially suspected a bad deploy. Rollback didn’t help.
On the edge box, the access logs told the story: multiple paying customers appeared from the same handful of IPs. Those weren’t “users.” They were corporate egress gateways and carrier NAT pools. A single office with hundreds of employees sharing one egress IP could now collectively “spend” only 5 requests per second across the entire app.
The immediate fix was ugly but effective: raise the per-IP limit substantially and add stricter rules only for sensitive endpoints (login, password reset). The real fix was better keying: per API token for authenticated API calls, and per session cookie for browser traffic where possible.
After the dust settled, they added a pre-flight checklist item: “Show distribution of unique IPs vs authenticated IDs during peak.” It became a routine part of capacity planning, not an afterthought.
Mini-story 2: The optimization that backfired (nodelay everywhere)
A different company had a noisy client problem. Someone suggested: “Use burst with nodelay so real users don’t feel throttled.” They applied it broadly: all API endpoints, all clients, generous burst, nodelay enabled.
User experience improved in the happy path. Then an integration partner misconfigured a retry policy. Their client hit timeouts and retried aggressively with parallelism. With nodelay and a large burst, Nginx happily forwarded large microbursts straight to upstream services, which were already struggling.
The upstream services fell over, which increased timeouts, which increased retries, which increased bursts. A feedback loop with a nice user-friendly veneer. Monitoring showed Nginx was “fine” and CPU was stable, but upstream error rates spiked and latency went vertical.
The eventual fix was a more nuanced policy: keep nodelay for a few read-only endpoints that needed snappy behavior, but remove nodelay for expensive endpoints and reduce burst sizes. They also introduced upstream queue protection and better timeout/retry defaults in the client SDK.
The moral wasn’t “nodelay is bad.” The moral was that smoothing and user experience must be aligned with backend capacity, not wishful thinking.
Mini-story 3: The boring but correct practice that saved the day (log the limiter status)
A fintech team had a habit that looked dull in code review: every time they added a limiter, they also added log fields for $request_id, $limit_req_status, and the key they were using (hashed where necessary). No exceptions.
One evening, they saw a spike in 429s and a drop in conversion. The initial suspicion was a bot attack. It wasn’t. The logs showed most 429s were “PASSED” and the upstream response code was 429—meaning the application itself was rate limiting because a downstream payment provider was returning errors and their circuit breaker was tripping.
Because the edge logs were explicit, they didn’t waste an hour loosening Nginx limits and inviting real abuse. They fixed the payment provider failover logic, adjusted retry windows, and left the edge policy alone.
That incident didn’t become a multi-team blame festival. It became a 40-minute fix with a clear timeline. Boring logging won over dramatic guessing.
Common mistakes: symptoms → root cause → fix
1) “All users get 429 at the same time”
Symptoms: sudden widespread 429s; users across regions affected; logs show same client IP.
Root cause: you’re rate limiting the load balancer/proxy IP because Real IP isn’t configured (or misconfigured).
Fix: configure real_ip_header and narrow set_real_ip_from to trusted proxy IP ranges. Then use $binary_remote_addr (after real IP processing) as the key.
2) “Corporate customers complain, home users are fine”
Symptoms: office networks see errors; mobile and residential users don’t; top IP list shows a few IPs with heavy traffic.
Root cause: per-IP limits punish NAT’d networks and VPN egress points.
Fix: for authenticated endpoints, rate limit by API key or session. For anonymous, raise per-IP limits and narrow strict limits to specific expensive endpoints.
3) “Login is broken, but only sometimes”
Symptoms: intermittent login failures; users succeed after waiting; spikes during marketing campaigns.
Root cause: login flow makes multiple calls (CSRF fetch, MFA preflight, telemetry) and trips a strict limiter with too-small burst.
Fix: apply rate limits to the POST credential endpoint, not every request under /login. Increase burst slightly; consider delaying (no nodelay) instead of rejecting for minor excess.
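One way to scope the limiter to the credential POST without touching other requests under /login: limit_req ignores requests whose key is empty, so derive the key from the request method. Zone name and rate are placeholders:

    map $request_method $login_post_key {
        POST    $binary_remote_addr;
        default "";                      # GETs of the login page are not counted
    }

    limit_req_zone $login_post_key zone=login_post_per_ip:10m rate=5r/m;

    location = /login {
        limit_req zone=login_post_per_ip burst=3;
        proxy_pass http://app_upstream;
    }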
4) “We enabled rate limiting, but attacks still hurt”
Symptoms: backend load still high; limiter logs show low rejection rates; many source IPs.
Root cause: distributed traffic (botnets, residential proxies) makes per-IP limits ineffective; or the zone is too small and thrashes.
Fix: add identity-based limits (API key, token), endpoint-specific policies, caching, and upstream protections. Increase zone sizes and monitor eviction effects.
5) “After tuning, upstream started failing more”
Symptoms: fewer 429s, but more 5xx from upstream; latency increases.
Root cause: you loosened limits without capacity; or you enabled nodelay with large burst and forwarded spikes to upstream.
Fix: reintroduce smoothing (remove nodelay) for expensive endpoints; set reasonable bursts; tune upstream timeouts; add caching and queueing.
6) “Rate limiting is bypassed”
Symptoms: limiter seems ineffective; attackers appear as many IPs; logs show odd XFF values.
Root cause: trusting X-Forwarded-For from untrusted sources (set_real_ip_from 0.0.0.0/0 or equivalent).
Fix: trust only known proxies; consider PROXY protocol if supported; validate that client IP changes only when requests come from trusted proxy networks.
Checklists / step-by-step plan
This is the sequence that tends to work in real environments without turning your edge into a slot machine.
Step-by-step plan: implement rate limiting without blocking humans
- Fix client identity at the edge.
- Behind a proxy? Configure Real IP with narrow trust.
- Decide what “client” means: IP for anonymous, token for authenticated.
- Classify endpoints.
- Auth endpoints (login, reset) = strict
- Expensive endpoints (search, reports) = strict-ish
- Cheap read endpoints = moderate
- Static assets = typically no rate limit
- Create separate zones per class.
- Do not reuse one zone for everything. You’ll hide which behavior is abusive.
- Set initial rates with a bias toward fewer false positives.
- Start higher than you think, then ratchet down based on observed abuse.
- Use burst to protect humans from spiky clients.
- Log limiter status and request IDs.
- If you don’t log it, you can’t tune it.
- Roll out gradually.
- Apply to one endpoint class first (login is a good candidate).
- Watch 429 rate, conversion, and error logs.
- Decide your response behavior.
- Use 429 for fairness throttling.
- Consider delaying (no nodelay) for brute force to waste attacker time.
- Test with realistic traffic shape.
- Browsers (HTTP/2) and mobile apps are bursty.
- Include retries in load tests, because production will.
A solid baseline configuration (opinionated)
This example assumes:
- You’re behind a trusted LB in 10.10.0.0/16
- You want strict login protection per IP
- You want API protection per API key when present, otherwise per IP
- You want logs that tell you what happened
cr0x@server:~$ sudo sed -n '1,220p' /etc/nginx/nginx.conf
user www-data;
worker_processes auto;
pid /run/nginx.pid;
events { worker_connections 1024; }
http {
include /etc/nginx/mime.types;
default_type application/octet-stream;
log_format main_ext '$remote_addr $request_id $time_local '
'"$request" $status $body_bytes_sent '
'rt=$request_time urt=$upstream_response_time '
'lrs=$limit_req_status '
'xff="$http_x_forwarded_for"';
access_log /var/log/nginx/access.log main_ext;
error_log /var/log/nginx/error.log warn;
real_ip_header X-Forwarded-For;
set_real_ip_from 10.10.0.0/16;
real_ip_recursive on;
    # Return 429 (not the default 503) when limit_req rejects a request.
    limit_req_status 429;
    # Whitelist internal monitoring and office VPN (example).
    # limit_req does not count requests whose key is empty, so exempt IPs map to an empty key.
    geo $limit_exempt {
        default        0;
        192.0.2.44     1;
        198.51.100.77  1;
    }
    map $limit_exempt $ip_limit_key {
        default $binary_remote_addr;
        1       "";
    }
    # Key selection for APIs: API key if present, else (non-exempt) client IP.
    map $http_x_api_key $api_limit_key {
        default $http_x_api_key;
        ""      $ip_limit_key;
    }
    # Zones: size according to expected unique keys.
    limit_req_zone $ip_limit_key zone=login_per_ip:20m rate=5r/m;
    limit_req_zone $api_limit_key zone=api_per_key:100m rate=20r/s;
include /etc/nginx/conf.d/*.conf;
include /etc/nginx/sites-enabled/*;
}
cr0x@server:~$ sudo sed -n '1,220p' /etc/nginx/sites-enabled/example.conf
server {
listen 443 ssl http2;
server_name example;
# Default: do not rate limit everything. Target specific locations.
location = /login {
        # Whitelisted IPs are exempt via the empty $ip_limit_key (see the geo/map in nginx.conf).
limit_req zone=login_per_ip burst=6;
proxy_pass http://app_upstream;
}
location = /password/reset {
limit_req zone=login_per_ip burst=3;
proxy_pass http://app_upstream;
}
location ^~ /api/ {
limit_req zone=api_per_key burst=40 nodelay;
proxy_pass http://app_upstream;
}
location / {
proxy_pass http://app_upstream;
}
}
Why this baseline works:
- Login is throttled per IP at a human scale (per minute), allowing minor bursts.
- API is throttled per customer key when available, which avoids NAT pain.
- Rate limiting is targeted; you’re not breaking asset loads or basic navigation.
- Logs capture limiter status so you can tune with evidence.
Joke 2/2: If you rate limit your own health checks, congratulations—you just invented “self-care” for servers, and they will take it literally.
FAQ
1) What’s the difference between limit_req and limit_conn?
limit_req limits request rate (r/s or r/m). limit_conn limits concurrent connections. Use request limits for floods and fairness; use connection limits for connection hoarding and long-lived streams.
2) Should I rate limit globally at location /?
Almost never. You’ll punish normal browser behavior and retries. Rate limit expensive or sensitive endpoints, then add a gentle baseline only if you truly need it.
3) Why do real users get blocked when bots don’t?
Because your key is wrong (NAT/proxy), your thresholds are too low for normal burst patterns, or bots are distributed across many IPs while your users are concentrated behind shared egress.
4) Is $binary_remote_addr better than $remote_addr?
For limiter keys, yes. It’s a compact binary representation and is commonly recommended for shared memory efficiency. But it only helps if Real IP is correctly set up behind proxies.
5) When should I use nodelay?
Use it when you want to allow brief bursts without adding latency and your upstream can take short spikes. Avoid it on endpoints that are expensive or when upstream is already unstable.
6) How do I avoid punishing users behind NAT?
Rate limit authenticated traffic by customer identity (API key, token, session) and keep per-IP limits mainly for anonymous sensitive endpoints. Also avoid tiny per-IP thresholds on general APIs.
7) Can I “whitelist” some clients safely?
Yes, but keep it narrow and auditable (monitoring IPs, office VPN egress, specific partner IPs). Whitelists tend to grow and become a security liability if unmanaged.
8) How do I know if the limiter zone is too small?
You’ll see inconsistent limiting under high unique-key cardinality and patterns that don’t make sense. Practically: if you have a public endpoint and a tiny zone, increase it and observe whether limiter behavior stabilizes.
9) Is returning 429 always the right choice?
For fairness throttling, yes. For brute-force deterrence, delaying (no nodelay) can be more effective. For meltdown protection, 503 with backoff semantics can also make sense, but be intentional.
10) Can Nginx do per-username rate limiting on login attempts?
Not cleanly with stock variables, because the username is typically in the POST body. Do per-IP at Nginx, and do per-username and per-account controls in the application layer.
Conclusion: practical next steps
If you want rate limiting that doesn’t kneecap real users, do three things before you touch a single number:
- Fix identity at the edge. Real IP behind proxies, and identity keys for authenticated APIs.
- Limit by behavior, not by site. Separate zones and policies for login/reset/search/API. Leave the rest alone unless you have evidence.
- Make 429s debuggable. Log
$limit_req_status, add request IDs, and check error logs for limiter messages.
Then tune like an adult: measure, change one thing, reload safely, and watch outcomes. Your goal isn’t “fewer requests.” Your goal is “fewer bad requests, same good users.” That’s the whole job.