Everything works in staging. Then production gets discovered—by customers, by scrapers, by “security researchers,” and by whatever is living inside the public IP address space that day. You add a WAF and a rate limit. The alerts quiet down. And then the support queue lights up: “I can’t log in,” “Checkout fails,” “API says 429,” “My office IP is blocked,” “Your site hates hotels.”
This is the part nobody sells in the product demo: rate limiting and WAFs are not safety rails, they’re steering wheels. If you don’t steer—if you don’t observe, tune, and model real user behavior—you’ll absolutely block the people paying your bills, while the bots calmly rotate IPs and keep going.
A working mental model: WAF vs rate limiting vs “just a reverse proxy”
Put the buzzwords down for a minute and think in terms of failure modes.
Rate limiting
Rate limiting is about capacity protection and abuse cost. It answers: “How many requests do we allow from a single actor in a window?” It does not answer: “Is this request malicious?” It also doesn’t automatically stop distributed abuse (a botnet can be polite per IP while still melting you in aggregate).
WAF (web application firewall)
A WAF is about input risk. It answers: “Does this request look like an exploit attempt or protocol abuse?” It’s pattern matching and anomaly detection—sometimes good, sometimes hilariously wrong. WAFs reduce common attack noise (SQL injection sprays, path traversal probes), but they also punish unusual but legitimate traffic, like APIs with weird query strings or users behind enterprise proxies.
Reverse proxy
The reverse proxy is your traffic cop: TLS termination, routing, header normalization, connection pooling, and the place where you can implement both WAF-ish filtering and rate limiting. If you’re running Docker, a reverse proxy is not optional unless you enjoy exposing every container port to the internet like it’s 2014.
Opinion: Put rate limiting and WAF logic at the edge you can observe and version-control. That means a proxy you own (Nginx/HAProxy/Traefik) or a managed edge (CDN/WAF provider) where you still get logs and can stage changes. “Black box security” is how you end up debugging production with vibes.
One quote to keep taped to your monitor: “Hope is not a strategy.” — attributed to various operators; treat it as a paraphrased idea, not scripture.
Facts and historical context (the stuff that explains today’s pain)
- WAFs became mainstream after early 2000s web exploits: OWASP started in 2001; the first OWASP Top 10 appeared in 2003 and shaped WAF rule sets for two decades.
- ModSecurity started as an Apache module (2002): it popularized the “rules engine” model, which later showed up in Nginx, ingress controllers, and managed WAFs.
- Rate limiting predates the web: token bucket and leaky bucket algorithms come from telecom/networking work in the 1980s; they’re still the core primitives today.
- HTTP/1.1 keep-alive changed the math: once clients could reuse connections, per-connection limits became useless, and per-request/per-identity controls mattered more.
- NAT got everywhere: mobile carriers and corporate networks increasingly put thousands of users behind a handful of egress IPs, so “per-IP” limits started punishing innocent crowds.
- CDNs shifted the “real client IP” problem: once traffic terminates at a CDN/WAF, your origin sees the CDN IP unless you explicitly trust and parse forwarded headers.
- HTTP/2 multiplexing enables high request rates per connection: if you only limit connections, a single client can still send a flood of streams.
- Containers made “edge” a moving target: in VM-era architectures, the load balancer was stable; in container land, teams keep trying to stick security in the app container. That’s adorable. Don’t.
Joke #1: A WAF rule set is like a houseplant—ignore it for a month and it will die, and possibly take your weekend with it.
Recommended architecture patterns (and when to pick each)
Pattern A: CDN/WAF provider → reverse proxy → Docker services
This is the “adult supervision” setup. Your managed edge absorbs volumetric junk and handles basic bot mitigation. Your reverse proxy enforces precise routing, app-aware limits, and emits logs you can correlate.
Use it when: you’re internet-facing, you have login/checkout/API endpoints, and you care about not waking up to a global scan taking you down.
Main risk: trusting forwarded headers incorrectly. If you treat any X-Forwarded-For as gospel, attackers will spoof “real client IP” and bypass per-IP limits.
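A minimal Nginx sketch of the safe pattern at the origin proxy, assuming your managed edge publishes its egress ranges (the 198.51.100.0/24 CIDR below is a placeholder for your provider's real list):

# Trust client-IP claims only when the request arrives from the managed edge.
# 198.51.100.0/24 is a placeholder; substitute your CDN/WAF provider's ranges.
set_real_ip_from 198.51.100.0/24;
real_ip_header X-Forwarded-For;
real_ip_recursive on;

# When proxying to upstream containers, overwrite the header instead of appending,
# so nothing the client put into X-Forwarded-For survives past this hop.
proxy_set_header X-Forwarded-For $remote_addr;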
Pattern B: Reverse proxy with WAF module (e.g., ModSecurity/Coraza) → Docker services
This gives you rule-level control and keeps logs local. It’s also operationally heavy: rules need tuning, and you’re responsible for performance.
Use it when: compliance requires self-managed controls, or you can’t put a managed edge in front for business reasons.
Main risk: CPU burn and false positives if you enable full CRS paranoia levels without understanding your own traffic.
Pattern C: App-level rate limiting inside the container
Good for “one user should not spam one endpoint” logic (e.g., password reset). Bad as your primary protection.
Use it when: you need identity-aware limits (account ID, API key) and you can’t express it cleanly at the proxy.
Main risk: each replica limits independently unless you use a shared store (Redis, etc.). Attackers just round-robin you.
What I recommend for most Docker shops
Do both edge and origin controls:
- At the edge: coarse bot mitigation and volumetric shielding.
- At your reverse proxy: precise per-route rate limits and basic WAF checks tuned to your app.
- In the app: targeted, identity-aware controls for sensitive workflows.
Layered controls. Measured. Logged. Rollbackable.
Identity is hard: IPs lie, users roam, and NAT ruins your day
Most self-inflicted WAF/rate-limit outages come from one assumption: “IP address equals user.” In 2026, that assumption is wrong often enough to hurt you.
Why IP-based limits still matter
They’re easy and cheap. They stop the laziest scanners. They reduce background radiation. But they don’t guarantee fairness. They’re a blunt instrument.
Where IP-based limits fail
- Carrier-grade NAT (mobile networks): thousands of devices behind a handful of IPs.
- Corporate proxies: an entire office appears as one address. Congratulations, you just rate-limited accounting.
- Privacy relays and VPNs: legitimate users can look “bot-like,” and bots can look “legitimate.”
- IPv6: users can have many addresses; naive per-IP logic can be bypassed by simple address rotation.
Better identities than “source IP”
Pick the strongest identifier you can reliably validate:
- API key (best for B2B APIs)
- Account ID (after auth)
- Session cookie (careful with bots that store cookies)
- Device fingerprint (useful, but fraught and privacy-sensitive)
- IP subnet (sometimes fairer than exact IP, sometimes worse)
Practical rule: rate-limit by IP for unauthenticated endpoints, then switch to identity-based limits once a user is authenticated. That’s where fairness lives.
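A hedged Nginx sketch of that switch, assuming API clients send an X-Api-Key header (the header, variable, and zone names here are illustrative):

# Prefer an identity key when one is present; fall back to client IP otherwise.
map $http_x_api_key $rl_actor {
    ""      $binary_remote_addr;   # unauthenticated traffic: key by IP
    default $http_x_api_key;       # keyed API traffic: one bucket per key
}

limit_req_zone $rl_actor zone=peractor:20m rate=20r/s;

A client-supplied header is only as trustworthy as your key validation, so reject unknown keys in the app; otherwise attackers get to choose their own bucket.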
Designing sane rate limits that won’t torch real users
Rate limiting isn’t about picking a number. It’s about modeling behavior: human flows, app retries, mobile flakiness, and third-party integrations that behave like caffeinated squirrels.
Start with endpoint classes, not a global limit
- Login: low rate, high sensitivity. Add progressive delay and strong bot checks.
- Password reset / OTP: very low rate, strict auditing.
- Search: medium-to-high, but protect backend caches and databases.
- Static assets: usually handled by CDN; don’t waste origin rate limit budget here.
- API bulk endpoints: allow bursts, enforce sustained rates, and require keys.
- Webhooks inbound: rate limit by partner identity, not IP, and allow retries.
Choose an algorithm that matches your traffic
Token bucket is usually right: allow short bursts (page loads, app startup), cap sustained abuse. Fixed window limits are easy but cause cliff effects: users get blocked because they happened to click “refresh” at the wrong second.
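A sketch of per-endpoint zones in Nginx (its limit_req is leaky-bucket based, but burst with nodelay tolerates short spikes much like a token bucket); the rates are illustrative starting points, not recommendations:

# Separate zones per endpoint class, sized and tuned independently.
limit_req_zone $binary_remote_addr zone=login:10m rate=10r/m;
limit_req_zone $binary_remote_addr zone=search:20m rate=10r/s;

location /v1/login  { limit_req zone=login burst=20 nodelay; proxy_pass http://app-api:8080; }
location /v1/search { limit_req zone=search burst=30 nodelay; proxy_pass http://app-api:8080; }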
Make 429 responses useful
If you return HTTP 429, include Retry-After (seconds) or a rate limit header scheme you actually honor. Real clients will back off. Bad clients will ignore it, which is fine; you still protect your backend.
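One way to do that in Nginx (the 30-second value is an arbitrary example; pick something your clients can realistically honor):

limit_req_status 429;       # default is 503; 429 is the honest answer
error_page 429 @throttled;

location @throttled {
    add_header Retry-After 30 always;   # seconds; arbitrary example value
    return 429;
}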
Do not rate-limit health checks like they’re attackers
If your orchestrator or external monitor gets 429, you’ve created a self-own: it will retry harder, and now you’re “DDoS’d” by your own tooling.
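A small sketch of the exclusion, assuming a /healthz route (the path is hypothetical; use whatever your orchestrator and monitors actually probe):

# Health probes get their own location with no limit_req attached,
# so monitoring never competes with real users for a bucket.
location = /healthz {
    access_log off;
    proxy_pass http://app-api:8080;
}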
Decide what “blocked” means
Blocking can be:
- Hard reject (403/429): best for known-bad behavior.
- Soft challenge: CAPTCHA or JS challenge at the edge; useful for suspicious but not certain traffic.
- Degrade: serve cached/stale results, reduce expensive features, or queue work.
WAF tuning without superstition
A WAF rule set is a hypothesis. Your traffic is the experiment. Treat tuning like engineering, not like folklore.
Run in detection-only mode first
Enable the WAF to log what it would block. Gather at least a few days of traffic, including peak business hours and batch jobs. Then tune exceptions based on evidence.
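With ModSecurity (or Coraza running the CRS), detection-only is a single engine setting; keeping it in its own small include also gives you the kill switch discussed later:

# Log what would have been blocked, block nothing. Flip to "On" once tuned.
SecRuleEngine DetectionOnly
SecAuditEngine RelevantOnly
SecAuditLog /var/log/modsecurity/audit.log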
Know your high-false-positive zones
- JSON APIs with nested payloads: can trigger “request body anomaly” rules.
- GraphQL: query strings often look like injection patterns.
- Search filters: users paste odd characters; WAFs panic.
- File upload endpoints: multipart parsing and size thresholds.
- Redirect/callback URLs: encoded URLs in parameters look like SSRF probes.
Exception strategy: narrow, logged, reviewed
Don’t disable whole classes of rules globally because one endpoint is noisy. Carve exceptions (see the sketch after this list) by:
- Path (exact route)
- Method (POST vs GET)
- Parameter name (only allow weirdness in q for search, not in redirect)
- Content-type (JSON vs form)
- Authenticated vs unauthenticated
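Carved that way, a ModSecurity/CRS runtime exclusion looks roughly like this (rule 942100 and the search route are illustrative; 10001 is an arbitrary local rule ID):

# Stop CRS rule 942100 from inspecting ARGS:q, but only on the search route.
SecRule REQUEST_URI "@beginsWith /v1/search" \
    "id:10001,phase:1,pass,nolog,ctl:ruleRemoveTargetById=942100;ARGS:q"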
WAF performance matters
WAF inspection can be CPU-heavy. If your proxy is saturated, you’ll see latency increase and then a wave of timeouts. That looks like an app incident, but it’s really your security layer strangling the host.
Joke #2: The fastest way to find an untested WAF rule is to deploy it on Friday afternoon. It’s also the fastest way to meet your on-call rotation.
Practical tasks: commands, outputs, and decisions (12+)
These are the kinds of tasks you run when users report blocks, bots spike, or you’re rolling out controls for the first time. Commands are written for a Docker host with a reverse proxy container (Nginx/Traefik/HAProxy) and typical Linux tooling.
Task 1: Confirm what ports are actually exposed
cr0x@server:~$ docker ps --format 'table {{.Names}}\t{{.Image}}\t{{.Ports}}'
NAMES IMAGE PORTS
edge-nginx nginx:1.25 0.0.0.0:80->80/tcp, 0.0.0.0:443->443/tcp
app-api myco/api:2026.01 8080/tcp
app-web myco/web:2026.01 3000/tcp
redis redis:7 6379/tcp
What it means: only the edge proxy publishes 80/443 to the world. The app containers are internal-only.
Decision: if you see 0.0.0.0:8080->8080 on an app container, fix it. Rate limits and WAFs don’t help if attackers bypass the proxy.
Task 2: Verify the proxy is the only listener on 80/443
cr0x@server:~$ sudo ss -lntp | egrep ':80|:443'
LISTEN 0 4096 0.0.0.0:80 0.0.0.0:* users:(("docker-proxy",pid=2114,fd=4))
LISTEN 0 4096 0.0.0.0:443 0.0.0.0:* users:(("docker-proxy",pid=2121,fd=4))
What it means: Docker is publishing ports, likely mapped to the proxy container.
Decision: if other processes listen on these ports, you may have a bypass path or a port collision. Fix before tuning WAF rules.
Task 3: Check whether you’re trusting forwarded headers correctly
cr0x@server:~$ docker exec -it edge-nginx nginx -T 2>/dev/null | egrep -n 'real_ip|set_real_ip_from|X-Forwarded-For' | head -n 30
45: real_ip_header X-Forwarded-For;
46: set_real_ip_from 172.18.0.0/16;
47: real_ip_recursive on;
What it means: Nginx is configured to treat X-Forwarded-For as client IP, but only when the immediate sender is in 172.18.0.0/16 (your Docker network).
Decision: good. If you see set_real_ip_from 0.0.0.0/0; you’re letting attackers spoof their IP and dodge limits. Fix immediately.
Task 4: Confirm what IP the app sees (the “why are users grouped together?” check)
cr0x@server:~$ docker logs --tail=20 app-api
2026-01-03T08:18:22Z INFO request method=GET path=/v1/me remote=172.18.0.5 xff="198.51.100.23"
2026-01-03T08:18:22Z INFO request method=POST path=/v1/login remote=172.18.0.5 xff="203.0.113.10, 172.18.0.1"
What it means: the app’s TCP peer is the proxy, but it also receives X-Forwarded-For. The chain indicates multiple hops.
Decision: ensure your app framework uses the correct client IP logic and trusts only your proxy. Otherwise, rate limits in-app will be unfair or bypassable.
Task 5: Identify top talkers at the edge (by IP) from access logs
cr0x@server:~$ sudo awk '{print $1}' /var/log/nginx/access.log | sort | uniq -c | sort -nr | head
9123 203.0.113.77
5110 198.51.100.23
4022 203.0.113.10
1889 192.0.2.44
What it means: these IPs are generating a lot of requests. Not necessarily malicious; could be NAT or a partner.
Decision: cross-check with user-agent, paths, and response codes before blocking. High volume alone is not guilt.
Task 6: Find what endpoints are triggering 429s
cr0x@server:~$ sudo awk '$9==429 {print $7}' /var/log/nginx/access.log | sort | uniq -c | sort -nr | head
644 /v1/login
201 /v1/search
88 /v1/password-reset
What it means: your current limit is most frequently hit on login and search.
Decision: login 429s might be good (bot control) or bad (real users behind NAT). Search 429s often mean your limit is too low for UI behavior or the frontend is retrying aggressively.
Task 7: Check if 429s correlate with a specific user-agent (bot signature)
cr0x@server:~$ sudo awk '$9==429 {print $12}' /var/log/nginx/access.log | sort | uniq -c | sort -nr | head
701 "-"
133 "python-requests/2.31.0"
67 "Mozilla/5.0"
What it means: a lot of 429s have missing/blank user-agent, which is common for scripts.
Decision: you can add stricter limits or challenges for empty user-agent traffic, while being cautious not to break legitimate clients (some corporate tooling is… minimalistic).
Task 8: Confirm whether your proxy is CPU-bound during WAF inspection
cr0x@server:~$ docker stats --no-stream edge-nginx
CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
a1b2c3d4e5f6 edge-nginx 186.45% 612MiB / 2GiB 29.88% 1.8GB / 2.1GB 0B / 0B 38
What it means: nearly 2 cores are saturated in the proxy container. That can be WAF parsing, regex-heavy rules, or just too much traffic.
Decision: if CPU spikes align with blocked requests and latency, tune WAF rules (reduce scope), add caching/CDN, or scale proxy replicas behind a load balancer.
Task 9: Verify conntrack and socket backlog symptoms (edge host under stress)
cr0x@server:~$ sudo sysctl net.netfilter.nf_conntrack_count net.netfilter.nf_conntrack_max
net.netfilter.nf_conntrack_count = 249812
net.netfilter.nf_conntrack_max = 262144
What it means: conntrack is close to full. New connections may fail or get weird timeouts.
Decision: raise conntrack max carefully, reduce connection churn (keep-alive), and ensure rate limiting is not pushing clients into reconnect storms.
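If you do need headroom, raise the ceiling explicitly and persist it in /etc/sysctl.d/ (the value below is illustrative; each conntrack entry costs kernel memory, so size it against RAM, not hope):

cr0x@server:~$ sudo sysctl -w net.netfilter.nf_conntrack_max=524288
net.netfilter.nf_conntrack_max = 524288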
Task 10: Inspect WAF logs for the exact rule that blocks real users
cr0x@server:~$ sudo tail -n 20 /var/log/modsecurity/audit.log
--b7c1f3-A--
[03/Jan/2026:08:20:11 +0000] WQx8cQAAAEAAAB9J 198.51.100.23 54022 172.18.0.10 443
--b7c1f3-B--
GET /v1/search?q=%2B%2B%2B HTTP/1.1
Host: api.example.internal
User-Agent: Mozilla/5.0
--b7c1f3-F--
HTTP/1.1 403
--b7c1f3-H--
Message: Access denied with code 403 (phase 2). Matched "Operator `Rx' with parameter `(?:\bunion\b|\bselect\b)'" against variable `ARGS:q' (Value: '+++') [id "942100"]
What it means: rule 942100 fired on the search parameter q. The user searched for “+++” and got treated like an SQL injection attempt.
Decision: add a targeted exception for ARGS:q on /v1/search (or lower paranoia for that endpoint). Do not disable the entire SQLi rule set globally.
Task 11: Validate that your rate limiting key is what you think it is
cr0x@server:~$ docker exec -it edge-nginx nginx -T 2>/dev/null | egrep -n 'limit_req_zone|limit_req' | head -n 40
120: limit_req_zone $binary_remote_addr zone=perip:20m rate=10r/s;
215: location /v1/login {
216: limit_req zone=perip burst=20 nodelay;
217: proxy_pass http://app-api:8080;
218: }
What it means: the key is the client IP ($binary_remote_addr). If your “real IP” is wrong, this collapses many users into one bucket.
Decision: confirm real IP handling first. If lots of legit users share IPs (corporate NAT), move login limits to a combination of IP + cookie or add identity-based limits in the app.
Task 12: Reproduce a 429 from your workstation to confirm behavior and headers
cr0x@server:~$ for i in {1..40}; do curl -sk -o /dev/null -w "%{http_code} %{time_total}\n" https://example.com/v1/login; done | tail
200 0.031122
200 0.030881
429 0.000942
429 0.000913
429 0.000901
What it means: after some requests, you hit the limit and get immediate 429s.
Decision: check whether 429 includes Retry-After. If not, add it. If clients hammer harder after 429, you’ll see a feedback loop.
Task 13: Verify the application isn’t generating internal retries that look like abuse
cr0x@server:~$ docker logs --since=10m app-web | egrep -i 'retry|429|too many' | tail -n 20
2026-01-03T08:14:02Z WARN api_call failed status=429 retry_in_ms=200
2026-01-03T08:14:02Z WARN api_call failed status=429 retry_in_ms=200
2026-01-03T08:14:03Z WARN api_call failed status=429 retry_in_ms=200
What it means: the frontend is retrying fast (200ms) on 429, which is the opposite of what you want.
Decision: fix client backoff logic (exponential, honor Retry-After). Otherwise your rate limit turns into a retry storm.
Task 14: Confirm TLS termination point and protocol (HTTP/2 can change load patterns)
cr0x@server:~$ echo | openssl s_client -alpn h2 -connect example.com:443 2>/dev/null | egrep 'ALPN protocol|Protocol'
Protocol : TLSv1.3
ALPN protocol: h2
What it means: clients negotiate HTTP/2. Multiplexing can increase request concurrency per client.
Decision: ensure your rate limits are per-identity and per-request, not per-connection. If you only cap connections, HTTP/2 will walk right through.
Task 15: Check Docker network paths (ensure no direct route to apps from outside)
cr0x@server:~$ docker network inspect bridge --format '{{json .IPAM.Config}}'
[{"Subnet":"172.17.0.0/16","Gateway":"172.17.0.1"}]
What it means: your containers are on a private Docker bridge. That’s fine, but don’t confuse “private IP” with “not reachable.”
Decision: confirm firewall rules prevent inbound access to container-published ports, and avoid publishing app ports at all.
Fast diagnosis playbook
When a real user is blocked (or your graphs look like a seismograph), don’t start by editing rules. Start by locating the layer doing the damage.
First: determine what kind of block it is
- 403: likely WAF/ACL/blocklist.
- 429: rate limiting.
- 5xx: upstream crash, overload, or proxy timeouts.
- Client timeouts: conntrack/backlog/TLS handshake issues, or proxy CPU pegged.
Second: find where the decision was made
- Edge provider logs/events (if you have a CDN/WAF).
- Reverse proxy access logs: status code, request time, upstream time.
- WAF audit logs: rule ID and match target.
- App logs: did the request reach the app?
Third: validate identity and headers
- Is the client IP correct (X-Forwarded-For chain)?
- Are you trusting only known proxies?
- Is the rate limit key collapsing many users into one bucket?
Fourth: check resource saturation
- Proxy CPU and memory (WAF parsing and regex can spike CPU).
- Conntrack near max (connection churn and SYN floods).
- Upstream latency (database/cache) causing retry storms.
Fifth: mitigate safely
- Switch WAF to detection-only temporarily if false positives are severe.
- Raise burst capacity for user-facing endpoints while you investigate.
- Apply temporary allowlists for known partner IPs with an expiry.
- Throttle or challenge suspicious traffic classes (empty UA, known bad ASNs if you have confidence).
Three corporate mini-stories from the trenches
Incident caused by a wrong assumption: “Per-IP is per-user”
The company was mid-size, B2C, with a healthy mobile user base and a login endpoint that got hammered by credential stuffing. They put an Nginx rate limit on /login: 5 requests per minute per IP. Simple. It worked instantly in their synthetic tests. The bot traffic dropped, and the graph looked civilized.
Two days later, customer support opened an incident because login failures spiked from a specific region. Engineers looked at the WAF dashboard and saw nothing obvious. The origin was returning 429s. “Okay,” someone said, “bots are still trying.” But the user-agent strings were normal browsers, and the requests had valid CSRF tokens.
The missing piece was network reality: a big mobile carrier in that region used aggressive NAT. Large numbers of legitimate users shared a small pool of egress IPs. So when prime time hit, the NAT IP hit the rate limit. Thousands of people were effectively competing for five logins per minute.
The fix wasn’t “remove rate limiting.” It was “stop pretending IP equals human.” They changed unauthenticated limits to be higher with a burst, added device-cookie-based keys at the edge, and enforced stricter limits after repeated failed logins per account. They also added a soft challenge for suspicious login patterns. The credential stuffing stayed down, and legitimate logins recovered.
Optimization that backfired: caching the wrong thing at the wrong layer
Another team wanted to reduce backend load and improve latency. They introduced aggressive caching at the reverse proxy for several API GET endpoints. Good idea in general. Then they got clever: they cached some error responses too, “to protect the backend during spikes.” That included 429 responses from rate limiting.
During a mild traffic bump, a subset of users began receiving 429s. The proxy cached those 429s for 30 seconds. Now every user hitting that endpoint from that cache key (which included a too-broad set of headers) got the cached 429, even if they personally were well under the limit.
The incident looked like “the rate limiter is too strict,” but it wasn’t. The caching layer amplified the blast radius of a localized throttle into a user-facing outage. They had created a shared fate between unrelated clients.
The rollback was immediate: never cache 429 at the proxy unless you have a very deliberate design, a correct cache key, and a strong reason. If you’re trying to protect upstreams, use queues, circuit breakers, or serve stale content—not cached throttles.
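If the real goal is upstream protection, a safer Nginx pattern is to cache only successful responses and fall back to stale content when the backend struggles (these directives assume a proxy_cache zone is already configured):

# Cache lifetimes for success codes only; don't hand 429s or 5xx a TTL here.
proxy_cache_valid 200 301 10m;

# When the upstream errors out or an entry is being refreshed, serve what we have.
proxy_cache_use_stale error timeout updating http_500 http_502 http_503 http_504;
proxy_cache_background_update on;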
Boring but correct practice that saved the day: staged rollout with detection-only and a kill switch
A payments-adjacent service had a security requirement to add WAF controls in front of their Docker-hosted APIs. They’d been burned before by “turn it on and pray,” so they did something deeply unsexy: a two-week detection-only phase with a daily review.
Every morning, an SRE and an app engineer looked at the top triggered rules, then compared them against real request samples. They tagged each hit as “malicious,” “unknown,” or “false positive,” and only wrote exceptions when they could explain the traffic. They also kept a feature flag that could switch enforcement off in one config reload.
On enforcement day, they rolled it out by endpoint class: admin routes first (low user diversity), then partner APIs (keyed identity), then public endpoints. They monitored 403 and 429 rates, and they had a clear rollback threshold.
That afternoon, a legitimate partner integration started failing because it sent unusually long JSON fields that triggered a body size anomaly rule. Because they had detection history and a kill switch, they could confidently scope an exception to that partner route and parameter, reload, and move on. No drama. No war room. Just work.
Common mistakes: symptom → root cause → fix
1) Users behind offices/hotels can’t log in (lots of 429s)
Symptom: complaints from corporate networks; logs show many 429s from a single IP with normal browser UAs.
Root cause: per-IP rate limit on unauthenticated endpoints; NAT groups many users.
Fix: raise burst, rate-limit by cookie/device token, and add per-account failed-login limits in-app. Keep per-IP limits, but make them less punitive.
2) “Everything is blocked” after enabling WAF
Symptom: sudden surge of 403s; app sees fewer requests; WAF logs show common rules firing on normal payloads.
Root cause: enforcement enabled without detection baseline; paranoia level too high; rules applied to all routes equally.
Fix: revert to detection-only, tune by endpoint, add narrow exclusions, then re-enable gradually.
3) Rate limiting seems ineffective against bots
Symptom: backend load still high; per-IP throttles trigger but bot traffic continues.
Root cause: distributed attack with IP rotation; or the bot is using many IPs under the threshold.
Fix: add identity-based limits (API keys, sessions), bot challenges at the edge, and aggregate limits (global concurrency caps) plus caching.
4) Random users get blocked with “SQL injection” errors on search
Symptom: 403s on search endpoints; WAF logs show SQLi rules on query parameter.
Root cause: generic CRS rules interpreting search syntax as SQLi.
Fix: scope down inspection or disable specific rule IDs for that parameter/path; keep SQLi rules for sensitive endpoints.
5) After adding limits, latency increases and timeouts appear
Symptom: p95 latency spikes; proxy CPU high; some 504/499 errors.
Root cause: WAF inspection overhead or regex-heavy rules; also possible connection churn if clients retry.
Fix: reduce inspection scope (only inspect request bodies where needed), optimize rules, scale proxy, and enforce sane client backoff with Retry-After.
6) Users can bypass limits by spoofing X-Forwarded-For
Symptom: logs show wildly varying client IPs from the same TCP peer; rate limiting doesn’t catch abusive clients.
Root cause: proxy trusts forwarded headers from untrusted sources.
Fix: configure trusted proxy IP ranges only; strip incoming forwarded headers at the edge and re-add them yourself.
7) Webhooks fail because partner retries are blocked
Symptom: inbound webhook endpoint returns 429; partner reports missing events.
Root cause: webhook endpoint rate-limited per IP; partners retry bursts after timeouts.
Fix: rate-limit by partner identity (shared secret, token), allow larger bursts, and process asynchronously with idempotency keys.
8) “Our monitoring says we’re down” but users are fine (or vice versa)
Symptom: external checks get blocked, but real users don’t; or real users get blocked and checks don’t.
Root cause: monitors have distinct IPs/headers and are treated differently; WAF exceptions for monitors; limits on health endpoints.
Fix: treat monitoring as a first-class client: stable IDs, explicit allow rules, and separate endpoints for deep checks.
Checklists / step-by-step plan
Step-by-step rollout that avoids self-inflicted outages
- Lock down exposure: only the proxy publishes ports; containers are internal.
- Normalize client identity: configure trusted proxy IPs; set real client IP correctly; strip spoofable headers from the public edge.
- Turn on logging first: access logs with upstream timing; WAF audit logs; structured app logs.
- Enable WAF in detection-only: collect at least several days; include peak hours and batch jobs.
- Classify endpoints: login, search, checkout, admin, webhooks, bulk APIs. Each gets its own policy.
- Implement rate limits per endpoint: start with generous burst; enforce sustained rate; return 429 with Retry-After.
- Introduce identity-based limits: API key/account ID/session after auth; per-IP only for unauthenticated or coarse protection.
- Build exceptions narrowly: per path + parameter + method. Log every exception rationale.
- Add a kill switch: one config flag to disable enforcement or reduce paranoia without redeploying apps.
- Roll out gradually: admin/internal → partner APIs → public endpoints.
- Set SLO-aligned thresholds: define what 403/429 rates are acceptable before rollback.
- Rehearse incident response: can you identify the blocking layer in under 5 minutes?
Operational checklist for ongoing maintenance
- Weekly review of top WAF rules triggered and top 429 endpoints.
- Confirm “real IP” config after any network/CDN change.
- Track false positives as bugs with owners and expiry dates on temporary allowlists.
- Test login/search flows from mobile networks and corporate egresses (or at least proxies that behave similarly).
- Ensure client SDKs honor Retry-After and exponential backoff.
- Capacity plan proxy CPU for WAF inspection; treat it like a real workload.
FAQ
1) Should I put the WAF inside the app container?
No, not as your primary control. Put it at the reverse proxy or managed edge where you can centralize policies, log consistently, and avoid per-replica drift.
2) Is per-IP rate limiting always bad?
It’s useful and cheap. It’s also unfair. Use it as a coarse outer shell, then use better identities (cookie/session/API key/account) where possible.
3) How do I avoid blocking mobile users behind NAT?
Allow bursts, don’t set tiny per-minute caps, and switch to a cookie/session-based key as soon as you can. For login, add per-account failed-attempt limits.
4) What’s the difference between 403 and 429 for these controls?
Use 429 for “slow down, you’re over quota.” Use 403 for “we won’t serve this request” (WAF/ACL). Mixing them confuses clients and operators.
5) Why did enabling HTTP/2 change my rate limiting behavior?
HTTP/2 allows many parallel requests over one connection. If you were relying on connection counts as a proxy for load, HTTP/2 breaks that model. Limit per identity/request instead.
6) Can a WAF stop credential stuffing?
Not reliably by itself. Credential stuffing looks like normal login traffic. You need rate limiting, per-account controls, bot detection, and ideally MFA/risk scoring.
7) How do I know if my WAF is causing latency?
Look at proxy CPU, request processing time, and whether latency increases even when upstream time is flat. WAF audit logs and proxy timing fields help correlate.
8) What’s the safest way to add exceptions for false positives?
Scope them narrowly: by route, method, and parameter name; keep the exception as small as possible; add an owner and review date.
9) Should I block by user-agent?
As a signal, yes; as a sole identity, no. User-agents are trivial to spoof. But empty or obviously scripted UAs can justify tighter limits or challenges.
10) How do I prevent attackers from spoofing client IP via X-Forwarded-For?
Trust forwarded headers only from known proxy IP ranges. Strip inbound forwarded headers from the public edge and set your own canonical headers.
Conclusion: next steps that actually reduce pager noise
If you only remember one thing: rate limiting and WAFs are production features. They need observability, staged rollout, and ongoing tuning, just like databases and deploy pipelines.
Do these next, in order:
- Ensure only the edge proxy is exposed to the internet. No bypass ports.
- Fix real client IP handling and forwarded-header trust boundaries.
- Implement per-endpoint rate limits with bursts and 429 + Retry-After.
- Run WAF in detection-only, tune based on logs, then enforce gradually.
- Add identity-based limits after authentication; stop relying on IP as “user.”
- Keep a kill switch and a rollback threshold. Your future self will send a thank-you note.
The goal isn’t “block attackers.” That’s marketing. The real goal is: keep service healthy while letting legitimate users do normal things. The rest is tuning and humility.