Scalper bots: the year queues stopped meaning anything

You did everything “right.” You added a waiting room. You put customers in line. You showed a progress bar that calms people down.
And then your inventory evaporated in 47 seconds while real humans stared at “You’re almost there.”

That’s the moment queues stop being a product feature and start being a liability. Scalper bots didn’t just get faster; they learned to
speak your queueing system’s language fluently—cookies, tokens, headers, TLS, autofill, and all the quiet backchannels you forgot existed.

What changed: queues as theater

A queue used to mean something. It meant: “Requests arrive faster than we can serve them, so we’ll serialize access in a fair order.”
On the web, we stretched that meaning into “virtual waiting rooms,” which is a polite way of saying:
we moved the bottleneck to a page we can control.

The shift wasn’t subtle. Scalper operations now treat queueing systems as just another API to reverse engineer.
They don’t “skip the line” with magic. They do it with engineering: parallel sessions, headless browsers with stealth plugins,
residential proxies, pre-solved captchas, warmed cookies, and automation that retries exactly the way a human would—except at 500× scale.

Meanwhile, many companies still design queues as if attackers politely join the same line. That’s the wrong mental model.
Your queue is a distributed system boundary. Attackers will:

  • Acquire more “identities” than real users (IPs, devices, sessions, payment instruments).
  • Exploit state mismatches between CDN, queue provider, app, and inventory service.
  • Turn every edge case in your fairness logic into a profit center.

When you hear “the queue was bypassed,” translate it to: “the queue token and the checkout entitlement were not cryptographically and operationally bound.”
If you only remember one sentence, make it that.

Joke #1: A virtual waiting room is like a nightclub bouncer who checks IDs, then hands out wristbands made of wet tissue paper.

Facts and historical context that explain today

These aren’t trivia; they’re the breadcrumbs that led to the modern scalper stack.

  1. Ticketing bot wars aren’t new. Automated ticket buying was a mainstream problem well before “limited sneaker drops” became a cultural event.
  2. Waiting rooms became popular after high-profile flash-sale meltdowns. The original goal was stability (protect origin) more than fairness.
  3. CAPTCHAs evolved into an arms race. What started as “prove you’re human” became “prove you’re not a profitable target,” because solving CAPTCHAs is now outsourced or automated.
  4. Residential proxy networks commoditized “unique users.” Attackers can buy IP diversity the way you buy cloud instances—by the gigabyte.
  5. Headless browsers got stealthy. Detection signals like webdriver flags and weird canvas fingerprints became less reliable as tools matured.
  6. Client-side signals became easy to spoof. If your anti-bot depends on JavaScript-only checks without server-side binding, it will be replayed.
  7. Queue providers standardized patterns—and attackers standardized bypasses. Reused integrations mean reused weaknesses.
  8. Bot operators learned to play the entire funnel. They don’t only hit “add to cart.” They pre-warm sessions, monitor inventory endpoints, and exploit holds and releases.

Threat model: what you’re actually up against

The scalper bot stack, in plain terms

Modern scalper operations look less like a script kiddie and more like a small SRE team with questionable ethics.
Typical components:

  • Identity factory: thousands of emails, phone numbers, addresses; sometimes real, sometimes synthetic.
  • Session factory: browsers launched in parallel; cookie jars persisted; “human-like” timings and scroll behavior.
  • Network diversity: rotating residential proxies; ASN mixing; geofencing to match expected buyer location.
  • Payment diversity: multiple cards, multiple wallets; sometimes mule accounts.
  • Queue tooling: token harvesters, replay logic, multi-tab strategies, and time sync to the queue’s cutover moments.
  • Telemetry: they graph your responses. Status codes, latency, inventory changes, even subtle HTML differences.

What “fairness” means to them

Fairness is not a moral argument. It’s an algorithm. If your system’s fairness depends on “one request = one person,”
you’ve already lost, because they can manufacture requests.

What you must protect

  • Origin stability: your app must not fall over; fairness doesn’t matter if the site is down.
  • Entitlement integrity: queue position must translate to a limited, verifiable right to attempt purchase.
  • Inventory correctness: holds, releases, and final decrements must be coherent under load and retries.
  • User trust: the queue UI cannot promise a fairness you don’t actually enforce.

One quote worth keeping on a sticky note:
Paraphrased idea — Werner Vogels: “Everything fails, all the time; build systems that assume it and recover quickly.”

Where queues break in production

1) Your queue token is not an entitlement

Many implementations treat the queue as a “traffic smoother” and then let anyone who reaches the origin proceed.
That’s backwards. You need a server-verifiable entitlement that survives retries but cannot be minted or replayed broadly.

Failure mode: queue token is just a cookie checked by edge logic; attacker replays it from many sessions.
Or it’s not bound to anything meaningful (device, IP range, session key, short TTL, one-time nonce).

2) You leak state in side channels

Attackers don’t need to guess. They observe.
Distinct HTML, different redirect chains, different cache headers, response time deltas—these all reveal whether a token is “good,”
whether inventory exists, or whether a hold succeeded.

3) Inventory holds become a denial-of-inventory attack

“Reserve in cart for 10 minutes” sounds user-friendly until a bot can reserve 10,000 carts.
If you don’t tie holds to scarce entitlements and aggressively expire/reclaim them, you’ve invented a new kind of outage:
the site is up, but nothing is buyable.

4) Your WAF blocks the wrong things

Static rules (block headless, block datacenter IPs) catch last year’s bots and this year’s false positives.
The best attackers look like your most enthusiastic customers—right up until they don’t.

5) Your “anti-bot” breaks the funnel more than the bots

Overly aggressive fingerprinting and challenges can crater legitimate conversion and push humans into infinite loops.
If your mitigations cause more revenue loss than scalping, you just built a self-inflicted DDoS.

6) A queue is a distributed system: consistency failures are inevitable

CDN sees one world. Queue provider sees another. Origin sees a third. Inventory service sees a fourth.
If you haven’t mapped exactly how state transitions happen, you’re debugging by vibes.

Fast diagnosis playbook

When a drop goes sideways, you don’t have time for philosophy. You need to find the bottleneck fast and decide what to trade:
fairness, availability, or revenue. Here’s the order that tends to work in real incidents.

First: is it stability or integrity?

  • If origins are timing out (5xx, elevated latency): treat as reliability incident. Shed load, protect database, reduce features.
  • If the site is stable but humans can’t buy (conversion collapse, inventory gone): treat as integrity incident. Focus on entitlement leakage, holds, and automation paths.

Second: locate the choke point

  • CDN/WAF challenge rates and blocks
  • Queue acceptance vs pass-through rate
  • Checkout API error codes (429/403/409/422 patterns)
  • Inventory hold creation/expiration rates

Third: prove or disprove token replay

  • Same queue token seen from multiple IPs/ASNs
  • Same session ID used with many different UA/device hints
  • High concurrency per account/payment instrument

Fourth: pick a mitigation mode

  • Availability mode: hard rate limits, static pages, disable nonessential endpoints, narrow the funnel.
  • Integrity mode: tighten entitlements (short TTL, one-time use), bind holds to entitlements, increase friction at purchase, not at browse.
  • Recovery mode: invalidate leaked tokens, purge holds, restart the queue with a transparent message.

Operational tasks: commands, outputs, decisions

These are not “run this and feel productive” commands. Each one answers a question you will be asked during a drop:
what’s broken, who’s doing it, and what we do next.

Task 1: Confirm whether the queue is actually gating origin

cr0x@server:~$ curl -I -s https://shop.example.com/limited-drop | sed -n '1,20p'
HTTP/2 302
date: Tue, 22 Jan 2026 18:03:11 GMT
location: https://queue.example.com/wait?c=shop&t=abc123
cache-control: no-store
server: cdn-edge

What it means: A 302 to the queue suggests edge gating is enabled for that path.
Decision: Test multiple paths (API endpoints, mobile app endpoints, alternate hostnames). Attackers will.

Task 2: Check if “protected” APIs are reachable without queue context

cr0x@server:~$ curl -s -o /dev/null -w "%{http_code}\n" https://shop.example.com/api/checkout/session
200

What it means: If a critical endpoint returns 200 without queue proof, your queue is mostly decorative.
Decision: Add server-side enforcement: require an entitlement token for checkout/session creation.

Task 3: Spot token replay across many IPs in access logs

cr0x@server:~$ zgrep -h 'queue_token=' /var/log/nginx/access.log* | awk -F'queue_token=' '{print $2}' | awk '{print $1}' | sort | uniq -c | sort -nr | head
842 QTK.eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...
611 QTK.eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...
199 QTK.eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...

What it means: The same token showing up hundreds of times is a replay signal.
Decision: Rotate signing keys and enforce one-time or low-reuse semantics (jti + server-side cache of used tokens).
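
If you go the jti route, the used-token cache is small. A minimal sketch in Python with redis-py; the host, key prefix, and TTL are assumptions, not your actual topology:

import redis

r = redis.Redis(host="redis01", port=6379)  # hypothetical host, matching the examples above

def consume_token_once(jti: str, ttl_seconds: int = 300) -> bool:
    """Atomically mark a token's jti as used.

    SET NX succeeds only for the first caller; replays get False.
    The TTL keeps the cache small (match it to the token's exp).
    """
    return bool(r.set(f"used_jti:{jti}", 1, nx=True, ex=ttl_seconds))

# At purchase-finalization:
# if not consume_token_once(claims["jti"]): reject with 403.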

Task 4: Correlate one token to multiple client IPs

cr0x@server:~$ zgrep -h 'QTK.eyJhbGci' /var/log/nginx/access.log* | awk '{print $1}' | sort | uniq -c | sort -nr | head
120 203.0.113.18
118 198.51.100.77
116 192.0.2.44

What it means: One token used from many IPs: either NAT at scale (possible) or bot replay (likely).
Decision: Bind entitlements to a stable session key and enforce drift limits (IP / ASN / JA3 bounds with tolerance).
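
One way to make “drift limits” concrete: count distinct client IPs seen per token over a short window and feed the result into risk scoring rather than hard-blocking. A rough sketch, assuming redis-py; the window and tolerance are illustrative:

import redis

r = redis.Redis(host="redis01", port=6379)  # hypothetical

MAX_DISTINCT_IPS = 3  # tolerance for NAT and mobile network hops; tune to your audience

def within_drift_limits(token_id: str, client_ip: str, window_seconds: int = 900) -> bool:
    """Track distinct client IPs per entitlement token in a short window.

    Exceeding the bound is a replay signal, not proof; combine it with
    token reuse counts and checkout velocity before acting.
    """
    key = f"token_ips:{token_id}"
    pipe = r.pipeline()
    pipe.sadd(key, client_ip)
    pipe.expire(key, window_seconds)
    pipe.scard(key)
    _, _, distinct_ips = pipe.execute()
    return distinct_ips <= MAX_DISTINCT_IPS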

Task 5: Confirm rate limiting effectiveness at the edge

cr0x@server:~$ curl -s -o /dev/null -w "%{http_code} %{time_total}\n" https://shop.example.com/api/inventory/sku/PS5
200 0.042

What it means: A fast 200 for a high-value endpoint suggests it’s cacheable or unprotected.
Decision: If it must exist, make it low-resolution (no exact counts), add jitter, require entitlement, or cache with normalized responses.

Task 6: Detect inventory hold hoarding from Redis

cr0x@server:~$ redis-cli -h redis01 -p 6379 INFO stats | egrep 'expired_keys|evicted_keys'
expired_keys:12890
evicted_keys:0

What it means: High expired_keys during a drop usually means a storm of short-lived holds or sessions.
Decision: If holds are expiring en masse, shorten hold TTL, cap holds per entitlement/account, and release on failed payment quickly.

Task 7: Measure hold cardinality by key pattern (sampled)

cr0x@server:~$ redis-cli -h redis01 --scan --pattern 'hold:*' | head
hold:sku:PS5:session:6f3b1a
hold:sku:PS5:session:9c2dd0
hold:sku:PS5:session:aa91e4

What it means: This confirms holds exist and shows key shape.
Decision: If the pattern is session-based only, consider binding to entitlement + account to prevent anonymous hoarding.

Task 8: Verify database contention (Postgres)

cr0x@server:~$ psql -h pg01 -U ops -d shop -c "select wait_event_type, wait_event, count(*) from pg_stat_activity where state='active' group by 1,2 order by 3 desc limit 10;"
 wait_event_type |  wait_event   | count
-----------------+---------------+-------
 Lock            | transactionid |    34
 Lock            | tuple         |    19
 Client          | ClientRead    |     8

What it means: Lock waits suggest inventory decrements/holds are serialized poorly.
Decision: Move to atomic inventory primitives (single-row counters, SKIP LOCKED queues, or dedicated inventory service) and reduce hot-row contention.
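
The simplest atomic primitive is a single-row conditional decrement. A minimal sketch in Python with psycopg2, assuming a hypothetical sku_inventory(sku, remaining) table; it removes the read-modify-write race, but the row is still hot, so shard counters or queue writes if one SKU dominates:

import psycopg2

def try_decrement(conn, sku: str) -> bool:
    """Decrement stock only if any remains: one row, one statement, no separate SELECT."""
    with conn, conn.cursor() as cur:
        cur.execute(
            """
            UPDATE sku_inventory
               SET remaining = remaining - 1
             WHERE sku = %s AND remaining > 0
            RETURNING remaining
            """,
            (sku,),
        )
        return cur.fetchone() is not None  # None means sold out (or you lost the race)

# conn = psycopg2.connect("dbname=shop host=pg01 user=ops")
# if not try_decrement(conn, "PS5"): show "sold out" instead of retrying in a tight loop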

Task 9: Find the hottest endpoints in NGINX logs

cr0x@server:~$ awk '{print $7}' /var/log/nginx/access.log | sort | uniq -c | sort -nr | head
182344 /api/inventory/sku/PS5
99112 /api/cart/add
60441 /api/checkout/session
22101 /limited-drop

What it means: Inventory and cart endpoints are getting hammered.
Decision: Apply endpoint-specific controls: stricter rate limits, entitlement checks, and caching/normalization for inventory lookups.
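
For the endpoint-specific limits, a coarse fixed-window counter per (client, route) is a reasonable first layer behind the CDN/WAF. A sketch assuming redis-py; the routes and budgets are illustrative, and you would pass a normalized route name, not the raw path:

import time
import redis

r = redis.Redis(host="redis01", port=6379)  # hypothetical

LIMITS = {"inventory_lookup": 10, "cart_add": 5, "checkout_session": 2}  # per minute, illustrative

def allowed(client_key: str, route: str, window_seconds: int = 60) -> bool:
    """Fixed-window counter per (client, route). Coarse but cheap and easy to reason about."""
    limit = LIMITS.get(route)
    if limit is None:
        return True
    bucket = int(time.time() // window_seconds)
    key = f"rl:{client_key}:{route}:{bucket}"
    pipe = r.pipeline()
    pipe.incr(key)
    pipe.expire(key, window_seconds)
    count, _ = pipe.execute()
    return count <= limit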

Task 10: Confirm CDN cache behavior for queue and drop pages

cr0x@server:~$ curl -I -s https://shop.example.com/limited-drop | egrep 'cache-control|age|via|cf-cache-status|x-cache'
cache-control: no-store
via: 1.1 cdn-edge
x-cache: MISS

What it means: no-store and MISS: every request hits deeper infrastructure.
Decision: Cache static shell pages aggressively; keep personalized/entitlement logic on small, fast endpoints.

Task 11: Check WAF challenge/deny rates (example via local logs)

cr0x@server:~$ zgrep -h 'waf_action=' /var/log/nginx/access.log* | awk -F'waf_action=' '{print $2}' | awk '{print $1}' | sort | uniq -c | sort -nr
19422 challenge
8121 allow
905 deny

What it means: High “challenge” volume can mean bots are chewing through challenges—or humans are suffering.
Decision: If conversion is falling, tune challenges to trigger later (at checkout) and use risk scoring rather than blanket friction.

Task 12: Validate idempotency behavior in checkout (prevent bot retry amplification)

cr0x@server:~$ curl -s -D - -o /dev/null -X POST https://shop.example.com/api/checkout/submit \
  -H "Idempotency-Key: 2d2a7a52-1b7b-4a6c-a9c0-9a3b3c0e6b25" \
  -H "Content-Type: application/json" \
  --data '{"cart_id":"c_123","payment_method":"pm_abc"}' | sed -n '1,20p'
HTTP/2 201
date: Tue, 22 Jan 2026 18:06:02 GMT
content-type: application/json
x-idempotency-replayed: false

What it means: The server acknowledges idempotency and indicates this is not a replay.
Decision: Repeat the same request; if it creates multiple orders, you’re funding retry storms and double-spend weirdness.

Task 13: Repeat idempotent request and verify replay handling

cr0x@server:~$ curl -s -D - -o /dev/null -X POST https://shop.example.com/api/checkout/submit \
  -H "Idempotency-Key: 2d2a7a52-1b7b-4a6c-a9c0-9a3b3c0e6b25" \
  -H "Content-Type: application/json" \
  --data '{"cart_id":"c_123","payment_method":"pm_abc"}' | sed -n '1,20p'
HTTP/2 200
date: Tue, 22 Jan 2026 18:06:08 GMT
content-type: application/json
x-idempotency-replayed: true

What it means: Correct replay behavior: second call returns the original result.
Decision: Enforce idempotency keys for purchase-finalizing endpoints; rate limit without breaking legitimate retries.
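
Server-side, the pattern behind those two responses is small: reserve the key, run the order flow once, store the result, and return the stored result on replay. A rough sketch assuming redis-py for the result cache; the field names and status codes are illustrative, not a specific framework’s API:

import json
import redis

r = redis.Redis(host="redis01", port=6379)  # hypothetical

def submit_checkout(idempotency_key: str, cart_id: str, process_order):
    """Run the real order flow at most once per key; replays return the stored result."""
    cache_key = f"idem:{idempotency_key}"
    # Reserve the key first so concurrent retries cannot both create orders.
    if not r.set(cache_key, "pending", nx=True, ex=3600):
        stored = r.get(cache_key)
        if stored == b"pending":
            return 409, {"error": "original request still in flight, retry shortly"}
        return 200, {**json.loads(stored), "replayed": True}
    result = process_order(cart_id)  # creates the order, authorizes payment, etc.
    r.set(cache_key, json.dumps(result), ex=3600)
    return 201, {**result, "replayed": False}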

Task 14: Check for clock skew (queue and token TTL bugs love skew)

cr0x@server:~$ timedatectl status | sed -n '1,12p'
Local time: Tue 2026-01-22 18:06:30 UTC
Universal time: Tue 2026-01-22 18:06:30 UTC
RTC time: Tue 2026-01-22 18:06:29
Time zone: Etc/UTC (UTC, +0000)
System clock synchronized: yes
NTP service: active
RTC in local TZ: no

What it means: If clocks drift, short-lived entitlements fail randomly, which looks like “the queue is broken.”
Decision: Fix NTP before you rotate token TTLs downward; otherwise you’ll punish real users first.

Task 15: Inspect TLS fingerprint distribution (JA3) from load balancer logs (example format)

cr0x@server:~$ awk '{print $NF}' /var/log/haproxy/haproxy.log | sort | uniq -c | sort -nr | head
78211 771,4866-4867-49195-49196-52393,0-11-10-35-16-5-13-18-23-65281,29-23-24,0
11992 771,4865-4866-4867-49195,0-11-10-35-16-5-13-18-23-65281,29-23-24,0

What it means: A small number of TLS fingerprints dominating can indicate automation clients.
Decision: Don’t block purely on JA3; use it as a risk signal combined with behavior (bursting, checkout velocity, token reuse).
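
What “risk signal combined with behavior” can look like in practice is a boring weighted sum; the thresholds and weights below are illustrative, not a recommendation:

def risk_score(signals: dict) -> int:
    """Combine weak signals into one score; step up friction above a threshold
    instead of hard-blocking on any single signal."""
    score = 0
    if signals.get("ja3_traffic_share", 0.0) > 0.30:    # one fingerprint dominating
        score += 2
    if signals.get("token_reuse_count", 0) > 3:         # same entitlement seen repeatedly
        score += 3
    if signals.get("checkouts_per_minute", 0) > 2:      # purchase velocity beyond human pace
        score += 3
    if signals.get("distinct_ips_per_session", 0) > 3:  # session hopping networks
        score += 2
    return score

# Example policy: score >= 5 triggers a step-up challenge at hold/checkout, not at browse.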

Three corporate mini-stories from the trenches

Mini-story 1: The incident caused by a wrong assumption

A consumer brand rolled out a waiting room ahead of a limited drop. Leadership was proud: there was a queue page, a timer,
and a reassuring line about “first come, first served.” The team assumed the queue provider’s token was effectively a ticket.

It wasn’t. The queue token was checked at the CDN layer for HTML pages, but the mobile app endpoints were on a different hostname
and weren’t routed through the same rule set. A bot operator found the API shape in the app, skipped the waiting room entirely,
and hit “create checkout session” directly.

The outage wasn’t a crash; it was a reputational failure. Humans sat in the waiting room while bots purchased through the side door.
Customer support got screenshots. Social media did what it does. Meanwhile, SRE was looking at green dashboards because latency was fine.

The fix was almost boring: unify gating across hostnames and require an entitlement claim at the origin for any state-changing call.
They also wrote one contract test: “Without entitlement, checkout/session must be 403.” It ran in CI and again from a canary in prod.
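
Paraphrased as code, that contract test can be as small as this; it assumes pytest and the requests library, with the endpoint name from the examples above:

import requests

BASE = "https://shop.example.com"  # run the same test against the canary host in prod

def test_checkout_session_requires_entitlement():
    """Contract: without an entitlement token, creating a checkout session must return 403."""
    resp = requests.post(f"{BASE}/api/checkout/session", json={"sku": "PS5"}, timeout=5)
    assert resp.status_code == 403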

Mini-story 2: The optimization that backfired

Another company optimized their inventory endpoint to reduce database load. They cached “remaining stock” at the edge for 5 seconds.
The dashboard looked great: fewer origin calls, lower CPU, happier finance.

The bots loved it. That cache turned the inventory endpoint into a clean timing oracle. Attackers watched cached responses flip,
coordinated purchase bursts, and learned exactly when stock was replenished from holds expiring. Humans, meanwhile, kept seeing
stale “in stock” states and slammed “add to cart” into 409 conflicts.

The team tried to fix it by increasing cache TTL. That reduced origin load further and made the oracle stronger.
The “optimization” created a new control plane for attackers: predictable, cheap, and globally distributed.

They unwound it and replaced the endpoint with a normalized response: “available / maybe / sold out,” plus a backoff hint.
Exact counts were removed from public surfaces. Internally, they maintained accurate numbers. Externally, they stopped feeding the bots.

Mini-story 3: The boring but correct practice that saved the day

A large retailer ran quarterly “drop game days.” Not glamorous. People complained it was repetitive.
They practiced a full run: queue activation, entitlement verification, inventory holds, payment retries, and rollback.

In one rehearsal, they discovered a subtle bug: entitlement tokens were validated by one service using UTC and another using local time.
Only a few seconds off, but enough to create intermittent 403s right at checkout when tokens had short TTLs.

Fixing it was unremarkable: standardize on UTC, tighten NTP monitoring, and add a small leeway window for exp/nbf claims.
They also added synthetic monitors that performed the entire funnel with test inventory.

During the real event, bot traffic spiked, but the system stayed predictable. Humans got through. Support volume stayed normal.
The company didn’t “win” against bots forever; they avoided self-inflicted chaos. That’s usually the achievable goal.

Design patterns that still work (and what to avoid)

Bind queue position to a server-verifiable entitlement

A queue UI is not enforcement. Enforcement lives at the origin, preferably at the first state-changing endpoint.
Design an entitlement token that is:

  • Signed (HMAC or asymmetric), with short TTL.
  • Scoped to action and SKU/category (don’t grant “checkout anything” if the drop is one SKU).
  • Bound to session/account in a way you can validate server-side.
  • Replay-resistant with jti and a small server-side used-token cache for purchase-finalization.

Avoid the trap: “We can’t store state for tokens.” You don’t need state for every request. You need it where money changes hands
and where a single token replay can mint multiple orders.
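
A minimal sketch of such an entitlement, using PyJWT with HMAC signing; the claim names, secret handling, and TTL are assumptions you would adapt:

import time
import uuid
import jwt  # PyJWT

SECRET = "rotate-me"  # hypothetical; load from a secrets manager and rotate between drops

def mint_entitlement(session_id: str, sku: str, ttl_seconds: int = 120) -> str:
    """Issued when a user passes the queue; scoped to one action and one SKU, short-lived."""
    now = int(time.time())
    claims = {
        "sub": session_id,         # bound to the session issued during queue passage
        "act": "checkout",         # scoped action
        "sku": sku,                # scoped SKU: no "checkout anything" grants
        "jti": str(uuid.uuid4()),  # unique ID for one-time use at purchase-finalization
        "iat": now,
        "exp": now + ttl_seconds,
    }
    return jwt.encode(claims, SECRET, algorithm="HS256")

def verify_entitlement(token: str, session_id: str, sku: str) -> dict:
    """Origin-side check before any state-changing call; raises on bad signature or expiry."""
    claims = jwt.decode(token, SECRET, algorithms=["HS256"], leeway=5)
    if claims["sub"] != session_id or claims["sku"] != sku or claims["act"] != "checkout":
        raise PermissionError("entitlement not valid for this session, SKU, or action")
    return claims

At purchase-finalizing endpoints, pair verify_entitlement with a one-time jti cache like the one sketched in Task 3.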

Make “add to cart” cheap, but make “hold inventory” scarce

People spam add-to-cart when they’re anxious. Bots do it because it works. Don’t conflate the two.
Let add-to-cart be a UI affordance, but require entitlement to convert cart to a hold.

  • Cap holds per entitlement/account/payment instrument.
  • Expire holds aggressively and reclaim deterministically.
  • Don’t create holds for anonymous sessions if you can avoid it.
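
A rough sketch of the capping and expiry rules above, assuming redis-py and the hold:* key shape from Task 7; the quota and TTL are illustrative:

import redis

r = redis.Redis(host="redis01", port=6379)  # hypothetical

MAX_HOLDS_PER_ACCOUNT = 1  # one live hold per account during a drop; tune to your policy
HOLD_TTL_SECONDS = 180     # expire aggressively; extend only for actively progressing checkouts

def create_hold(account_id: str, sku: str, session_id: str) -> bool:
    """Refuse the hold if the account is at quota; otherwise create it with a TTL."""
    counter_key = f"hold_count:{account_id}"
    pipe = r.pipeline()
    pipe.incr(counter_key)
    pipe.expire(counter_key, HOLD_TTL_SECONDS)
    count, _ = pipe.execute()
    if count > MAX_HOLDS_PER_ACCOUNT:
        r.decr(counter_key)
        return False
    r.set(f"hold:sku:{sku}:session:{session_id}", account_id, ex=HOLD_TTL_SECONDS)
    return True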

Normalize public signals; keep precision internal

If an endpoint reveals exact stock, queue cutoff timing, or validation errors with high specificity, attackers will instrument it.
They can iterate faster than your on-call.

Public responses should be intentionally boring. Internally, you still need precision for ops and finance.
The mistake is giving attackers the same telemetry you use.
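
A small sketch of that normalization, echoing the “available / maybe / sold out” shape from mini-story 2; the threshold and backoff hint are illustrative:

def public_stock_state(exact_remaining: int, drop_size: int) -> dict:
    """Collapse exact counts into coarse buckets for public surfaces; keep precision internal."""
    if exact_remaining <= 0:
        state = "sold_out"
    elif exact_remaining < max(1, drop_size // 10):
        state = "maybe"      # scarce, but do not reveal how scarce
    else:
        state = "available"
    # The public payload never moves in lockstep with single-unit inventory changes,
    # and the backoff hint discourages tight polling loops.
    return {"state": state, "retry_after_seconds": 30}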

Prefer risk scoring and step-up friction at the right moment

If you challenge everyone at the front door, you slow humans and give bots time to farm solutions.
If you challenge at the moment of scarce resource consumption (hold or checkout submit), you get leverage.

This is also where you can justify asking for stronger proof: account login, verified email/phone, payment verification,
or a device binding. You can’t demand that for browsing without lighting your conversion rate on fire.

Use idempotency keys, everywhere it matters

Bots retry aggressively. Humans retry accidentally. Your system must treat retries as normal, not as a bug.
Idempotency isn’t a “nice to have”; it’s a way to prevent amplification when traffic gets weird.

Operational transparency beats fake promises

Don’t tell users “fair” if you can’t enforce it. Say what you can guarantee:
“We limit purchases per customer,” “we may cancel suspicious orders,” “your place in line grants a limited checkout window.”
Be specific. Vague fairness claims become screenshots later.

Joke #2: The only thing faster than a scalper bot is a post-incident meeting that starts with “But the vendor said…”.

Common mistakes (symptoms → root cause → fix)

1) Symptom: Queue looks healthy, but inventory sells out instantly

  • Root cause: Queue gating applies only to HTML paths; APIs are open or on alternate hostnames.
  • Fix: Enforce entitlement at origin for all state-changing endpoints. Add automated tests that assert 403 without entitlement.

2) Symptom: Humans stuck in endless challenge loops

  • Root cause: Over-tuned WAF/fingerprint rules; unstable client signals; clock skew causing token failures.
  • Fix: Reduce friction at browse; move step-up to hold/checkout; add token leeway and NTP alarms; monitor challenge pass rate by device class.

3) Symptom: Massive cart holds, “sold out” while stock later reappears

  • Root cause: Hold system can be spammed without scarce entitlement; TTL too long; holds not reclaimed quickly on payment failure.
  • Fix: Bind holds to entitlement; cap holds per account/device; shorten TTL; implement immediate release on failed payment authorization.

4) Symptom: Spikes of 409/422 at checkout, but CPU is fine

  • Root cause: Hot-row contention in inventory DB; optimistic concurrency without backoff; retry storms.
  • Fix: Use atomic inventory primitives; add server-guided backoff; enforce idempotency; rate limit checkout submit separately.

5) Symptom: WAF blocks go up, conversion goes down, bots still win

  • Root cause: Rules tuned to old bot signatures; attackers using residential IPs and full browsers.
  • Fix: Shift to behavior-based detection (velocity, token reuse, multi-account patterns). Add purchase limits and post-purchase review/cancellation pipeline.

6) Symptom: “Queue bypass” reports correlate with mobile app users

  • Root cause: App endpoints not integrated with queue/entitlement checks; hardcoded API base URL bypasses CDN rules.
  • Fix: Require the same entitlement claims for app APIs; unify hostnames/routing; rotate secrets embedded in clients and assume they’re public.

7) Symptom: Bots appear to know exactly when you flip the drop live

  • Root cause: Predictable timing signals: cache invalidations, sitemap/product JSON, preloaded SKUs, or staging flags exposed.
  • Fix: Stagger activation internally; avoid public “coming soon” endpoints with precise state; normalize responses; use feature flags that aren’t observable externally.

Checklists / step-by-step plan

Pre-drop checklist (the week before)

  1. Map the funnel. Every hostname and endpoint from landing page to order confirmation. If you can’t draw it, you can’t defend it.
  2. Classify endpoints. Public/anonymous, authenticated, state-changing, inventory-sensitive, payment-finalizing.
  3. Enforce entitlement at origin. Not just at CDN. Start at “create hold” and “create checkout session.”
  4. Verify idempotency. Checkout submit and payment authorize must be idempotent with a server-enforced key.
  5. Set purchase limits. Per account, per payment instrument, per shipping address (with careful false-positive handling).
  6. Decide your cancellation policy. If you’re going to cancel suspicious orders, build the workflow before launch day.
  7. Run a game day. Simulate bot-like traffic, token replay attempts, and hold hoarding. Fix what breaks.
  8. Instrument the right metrics. Queue pass-through rate, entitlement validation failures, hold creation/expiry, checkout success, challenge loops.

Drop-day checklist (hours before)

  1. Confirm NTP and clock health. Short TTL tokens + skew = random pain.
  2. Warm caches deliberately. Cache static content; keep dynamic entitlements uncacheable.
  3. Set safe rate limits. Baseline per-IP and per-session, with emergency tighter limits ready.
  4. Enable “integrity mode” toggles. Step-up challenges at hold/checkout; stricter token reuse checks; reduced inventory signal precision.
  5. Prepare comms. A status page banner and a support macro that doesn’t promise impossible fairness.

Live incident plan (when it goes wrong)

  1. Protect origin first. If 5xx climb, shed load. A fair queue for a down site is performance art.
  2. Validate gating. Test direct API calls; confirm entitlements are required.
  3. Check token replay. Log sampling for repeated tokens across IPs; clamp down with one-time jti at checkout.
  4. Check holds. If holds explode, cap and shorten TTL; purge abusive patterns.
  5. Clamp the funnel. Temporarily disable low-value endpoints (inventory polling) and harden purchase steps.
  6. Decide on reset. If integrity is unrecoverable (tokens leaked widely), reset the queue and be honest about it.

Post-drop checklist (within 48 hours)

  1. Reconcile inventory. Confirm holds/releases/decrements match orders.
  2. Review suspicious orders. Apply cancellation policy consistently; document criteria.
  3. Write the incident report. Include what the bots did, what signals worked, and what failed.
  4. Add regression tests. One test per bypass path discovered. Boring tests are how you keep wins.
  5. Retune controls. Reduce false positives for next time; log the exceptions you had to make.

FAQ

Do queues still help at all?

Yes—for stability. They reduce origin load and smooth spikes. But queues don’t guarantee fairness unless you bind queue passage to
a verifiable entitlement and enforce it at the origin.

Why not just block datacenter IPs?

Because serious bot operators use residential proxies and real browsers. Blocking datacenters catches some noise but won’t stop the buyers
who matter, and it can hurt legitimate corporate networks.

Is CAPTCHA enough?

No. CAPTCHA is friction, not identity. It can be solved by humans-for-hire or bypassed with better automation.
Use it as a step-up control, not as the foundation of your fairness model.

What’s the single most common queue bypass?

Unprotected APIs. Teams gate the landing page but forget the checkout/session endpoint, mobile APIs, or a legacy host that routes around
the queue rules.

How do we prevent cart hoarding without punishing slow users?

Make holds scarce and bounded: require entitlement to create a hold, cap holds per user/payment instrument, and shorten TTL with an
extension only for actively progressing checkouts.

Should we expose exact inventory counts?

Publicly? Usually no. Exact counts create an oracle that bots exploit. Internally, keep precision. Externally, normalize to coarse states
and add backoff hints to reduce polling.

Can device fingerprinting solve this?

It helps, but it’s brittle and privacy-sensitive. Treat fingerprints as risk signals, not as absolute identity.
Expect spoofing and expect false positives across shared devices and privacy tools.

What does “bind entitlement to session” actually mean?

It means the entitlement token should only be valid when presented with a related server-side session key or cookie that was issued during
queue passage, and it should be scoped (SKU/action) and short-lived.

What if we can’t fully stop bots—what’s a realistic goal?

Keep the site up, keep humans able to buy, and make automation expensive. Then add post-purchase enforcement:
purchase limits, suspicious order review, and consistent cancellation/refund policies.

Is canceling suspected scalper orders safe?

It’s operationally messy but often necessary. If you do it, do it with clear criteria, audit trails, and a support workflow.
Random cancellations create customer rage and legal risk.

Conclusion: practical next steps

Scalper bots didn’t “break queues” because they’re clever. They broke queues because many queue implementations were never designed
to be security boundaries. They were designed to stop your database from melting. Different problem.

Next steps that actually move the needle:

  1. Audit your funnel for un-gated hostnames and APIs. Treat it like a pen test, because attackers already did.
  2. Implement origin-enforced entitlements with tight scope and TTL, and replay resistance at purchase-finalization.
  3. Bind inventory holds to scarce entitlements and cap them. A hold is a resource; price it accordingly.
  4. Normalize public signals so bots can’t use your site as their dashboard.
  5. Practice the event with game days and regression tests. It’s boring. It also works.

The uncomfortable truth: you won’t eliminate scalpers. But you can make queues mean something again—by turning them from theater into
enforcement, and by running the system like the adversary is already in the room. Because they are.
