MySQL is sweating, your CPUs are pegged, and the product team just “launched a small feature” that somehow doubled traffic. You add replicas. Then you add more replicas. Then you discover the bottleneck wasn’t the writes—your reads are doing the dumbest work imaginable, millions of times per hour.
Redis won’t save a broken schema or bad queries. But if your load is mostly repeatable reads, computed objects, session lookups, rate checks, or “does this user have permission X?”, Redis can take a huge bite out of MySQL traffic. Done right, 80% isn’t a fairy tale. It’s Tuesday.
The myth: “Redis vs MySQL” is the wrong fight
When people say “MySQL vs Redis,” they usually mean “I’m tired of MySQL being slow, can I swap it with something faster?” That’s like replacing your accounting department with a cash register because the register is faster. The register is fantastic. It is also not an accounting department.
MySQL is a durable relational database with transactions, constraints, secondary indexes, joins, and decades of operational muscle memory. Redis is an in-memory data structure server: extremely fast, predictable at low latency, and great for ephemeral or derived data that you can rebuild.
So the correct question is not which one “wins.” It’s:
- Which data must be correct, durable, and queryable in rich ways? Put that in MySQL.
- Which data is hot, repetitive, derived, temporary, or used for coordination? Put that in Redis.
- Where is the load coming from: the CPU cost of query execution, disk I/O, locks, network round trips, or application behavior?
Redis is not your primary database—yet. In most systems, it shouldn’t be. But Redis can be your best performance multiplier, because it changes the shape of your workload: fewer expensive queries, fewer round trips, less contention, and less pressure on your buffer pool and disks.
What each system is actually good at
MySQL: the source of truth with sharp edges
MySQL is built for correctness and long-term storage. It persists data to disk, logs changes, and provides transactional semantics. It also lets you ask complicated questions, which is exactly why people accidentally ask complicated questions 50,000 times per minute.
Use MySQL for:
- Data you cannot lose: orders, balances, permissions, account state, billing events.
- Multi-row invariants: “only one active subscription per user,” foreign keys, uniqueness.
- Auditable change history (often via append-only tables or binlog-based pipelines).
- Queries that require joins or range scans on secondary indexes.
MySQL fails in familiar ways: lock contention, bad indexes, huge scans, buffer pool misses, and slow storage. It also fails socially: “just one more column,” “just one more join,” and “we’ll cache it later,” which is code for “we’ll page SRE later.”
Redis: the hot-path accelerator and coordination layer
Redis is fast because it is simple where it matters: keep data in memory, make operations cheap, and keep the protocol straightforward. It’s a brilliant place for:
- Caching: computed objects, rendered fragments, authorization decisions, API responses.
- Sessions and tokens: quick lookups with TTLs.
- Rate limiting: atomic counters per key with expiration.
- Distributed coordination: locks (carefully), leaderboards, queues/streams, dedup keys.
- Feature flags and config snapshots: small reads, many times.
Redis fails in different ways: memory pressure, eviction surprises, persistence misconfiguration, failover hiccups, and latency spikes from big keys or slow commands. Also, the biggest Redis outage cause is not Redis itself—it’s applications that assume “cache” means “always there.”
One quote that’s tattoo-worthy for anyone operating both: “Everything fails, all the time.”
—Werner Vogels
One short joke, because we’ve earned it: Redis is like espresso—amazing for productivity, but if you build your whole diet around it, you’ll shake and regret it.
Interesting facts and a little history
- Redis began as a real product need: it was created by Salvatore Sanfilippo to solve performance and scalability issues in a web analytics context, not as an academic exercise.
- MySQL’s InnoDB changed the operational game: once InnoDB became the default, crash recovery and transactional behavior became the norm rather than an optional “serious mode.”
- Redis popularity surged with session storage: early mass adoption often started with “move sessions out of MySQL,” because it’s low risk and immediately reduces write churn.
- Memcached vs Redis split: Memcached pushed a simple cache-only story; Redis brought richer data types and atomic operations, which made it useful beyond caching.
- Redis persistence is secondary by design: Redis is in-memory first; RDB snapshots and AOF logs exist, but durability is a configured tradeoff, not a guarantee by default.
- MySQL replication shaped architectures: the ability to split reads to replicas (and later semi-sync and GTID improvements) influenced the “read scaling” playbook long before Redis sat in front of it.
- Cache invalidation is still unsolved as a human problem: the hardest part is not the algorithm; it’s organizational discipline around where truth lives and who owns invalidation rules.
- Redis introduced Lua scripting for atomicity: it’s powerful, but it’s also an easy way to accumulate hidden application logic that nobody versions, reviews, or deploys as carefully as the rest of the codebase.
How Redis cuts MySQL load without lying to you
Most MySQL load is self-inflicted repetition
In a typical production web system, a big chunk of read traffic repeats. The same user profile. The same product catalog fragment. The same permission set. The same feature flag evaluation. The same “what’s in the header bar” widget that hits five tables because someone wanted it “flexible.”
MySQL can handle plenty of reads, especially from cache (buffer pool) and with proper indexes. But it’s still doing parsing, planning, lock management, and executing logic. Multiply that by many app instances and you get death by a thousand polite SELECTs.
Redis helps when your read workload is:
- Hot: the same keys are requested over and over.
- Derivable: you can reconstruct cache entries from MySQL, or tolerate occasional recomputation.
- Coarse-grained: one cache hit replaces several queries or a join-heavy query.
- Latency-sensitive: shaving 10–30ms matters because it multiplies across downstream calls.
But Redis doesn’t help when you need:
- Ad hoc analytics with flexible query predicates.
- Complex joins that change every sprint.
- Strong consistency across multiple entities without careful design.
- Long retention with cheap storage per GB.
The 80% reduction pattern: cache the expensive boundary
The biggest wins happen when you cache at a boundary that naturally aggregates work. Don’t cache individual rows if your bottleneck is a query that touches 8 tables. Cache the result object your service actually needs: a hydrated user context, a rendered fragment, or an authorization bundle.
That does two things:
- It replaces multiple MySQL calls with one Redis GET.
- It makes the cache key stable and easy to reason about (for invalidation and TTL).
Write patterns: choose one, don’t improvise
- Cache-aside (lazy loading): app reads Redis, on miss reads MySQL, then populates Redis. Most common; easiest to introduce; hardest to keep consistent without discipline.
- Write-through: app writes to cache and DB in the same request flow (often DB first, then cache). Useful for predictable reads, but you must handle partial failures.
- Write-behind: app writes to Redis and asynchronously flushes to DB. Fast, but risky; you’re essentially building your own database, whether you meant to or not.
For most teams: start with cache-aside, add explicit invalidation on writes, and apply TTLs to cap the blast radius of bugs.
Caching and coordination patterns that work in production
1) Cache-aside with explicit invalidation
On reads:
- GET key from Redis
- if hit: return it
- if miss: query MySQL, serialize, SETEX into Redis, return
On writes:
- Commit to MySQL first.
- Then delete or update cache keys that depend on the changed rows.
- Use TTLs anyway. TTLs are not consistency; they are damage control.
If invalidation feels “too hard,” that’s not a reason to avoid caching. It’s a reason to define ownership: which service owns the cache key namespace, and which writes trigger invalidation.
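Here is what that looks like as a minimal cache-aside sketch, assuming a local redis-py client; load_user_context() and the key format are illustrative stand-ins for your real MySQL query (or queries) and naming scheme:

import json
import random
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

BASE_TTL = 300  # seconds; pick this based on how much staleness you can tolerate

def load_user_context(user_id):
    # Placeholder for the real MySQL work that hydrates the boundary object.
    return {"user_id": user_id, "plan": "pro", "flags": ["beta_ui"]}

def cache_key(user_id):
    return f"user_context:v1:{user_id}"

def get_user_context(user_id):
    # Read path: Redis first, MySQL on miss, then populate with a jittered TTL.
    cached = r.get(cache_key(user_id))
    if cached is not None:
        return json.loads(cached)
    ctx = load_user_context(user_id)
    ttl = int(BASE_TTL * random.uniform(0.9, 1.1))  # jitter against synchronized expiry
    r.setex(cache_key(user_id), ttl, json.dumps(ctx))
    return ctx

def on_user_context_write(user_id):
    # Write path: commit to MySQL first (not shown), then invalidate the dependent key.
    r.delete(cache_key(user_id))

One Redis GET per request replaces however many queries load_user_context() would otherwise run, and the TTL caps how long any invalidation bug can hurt you.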
2) Prevent the stampede: single-flight and soft TTL
Cache stampede is when a hot key expires and 5,000 requests rush to MySQL at once. The database doesn’t get “more correct” under stress; it just gets slower, then it dies.
Fix it with:
- Request coalescing (single-flight): one request rebuilds the key while others wait (a sketch follows this list).
- Soft TTL: serve slightly stale data for a short window while a background refresh happens.
- Jitter: randomize TTLs to avoid synchronized expiry.
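A minimal sketch of single-flight plus TTL jitter, assuming redis-py; rebuild_value is an illustrative callable that stands in for the MySQL rebuild, and the lock TTL and wait loop are tunable assumptions, not magic numbers:

import json
import random
import time
import redis

r = redis.Redis(decode_responses=True)

def get_with_single_flight(key, rebuild_value, base_ttl=300):
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)

    lock_key = f"lock:{key}"
    # Only one caller wins the rebuild lock; it auto-expires in case that caller crashes.
    if r.set(lock_key, "1", nx=True, ex=10):
        try:
            value = rebuild_value()  # the one trip to MySQL
            ttl = int(base_ttl * random.uniform(0.9, 1.1))  # jittered TTL
            r.setex(key, ttl, json.dumps(value))
            return value
        finally:
            r.delete(lock_key)

    # Everyone else waits briefly for the winner, then falls back to MySQL anyway.
    for _ in range(20):
        time.sleep(0.05)
        cached = r.get(key)
        if cached is not None:
            return json.loads(cached)
    return rebuild_value()

Soft TTL works the same way, except the stale value is returned immediately and the rebuild happens in the background.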
3) Negative caching (cache the “not found”)
If bots or broken clients keep asking for IDs that don’t exist, MySQL will happily do that lookup forever. Cache 404s with short TTLs (seconds to minutes). It’s cheap insurance.
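A minimal sketch, assuming redis-py; fetch_product is an illustrative stand-in for the MySQL lookup, and the sentinel value and TTLs are arbitrary defaults:

import json
import redis

r = redis.Redis(decode_responses=True)

NOT_FOUND = "__none__"  # sentinel so "not in cache" and "known to be missing" stay distinguishable

def get_product(product_id, fetch_product, ttl=300, negative_ttl=30):
    key = f"product:{product_id}"
    cached = r.get(key)
    if cached == NOT_FOUND:
        return None                            # known-missing: skip MySQL entirely
    if cached is not None:
        return json.loads(cached)
    row = fetch_product(product_id)            # the real MySQL lookup goes here
    if row is None:
        r.setex(key, negative_ttl, NOT_FOUND)  # short TTL so real inserts show up quickly
        return None
    r.setex(key, ttl, json.dumps(row))
    return row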
4) Use Redis for “decisions,” not “truth”
Good Redis keys represent decisions: “user X is rate-limited,” “session Y is valid until T,” “feature flag set F for cohort C.” These are time-bound and derived.
Bad Redis keys represent truth: “account balance,” “inventory count,” “the only copy of a password reset token with no persistence strategy.” You can do it, but you are now operating a database and pretending you aren’t.
5) Counters and rate limiting: atomic and boring
Rate limiting is Redis’s home turf because INCR and EXPIRE are cheap and atomic. Use a fixed window if you must, sliding window if you’re fancy, but keep the implementation simple enough that someone can debug it at 3 a.m.
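A minimal fixed-window sketch, assuming redis-py; the limit, window, and key format are illustrative:

import time
import redis

r = redis.Redis(decode_responses=True)

def allow_request(user_id, limit=100, window=60):
    # Fixed window: the key embeds the current window number, so it rotates on its own.
    window_id = int(time.time() // window)
    key = f"ratelimit:{user_id}:{window_id}"
    pipe = r.pipeline()
    pipe.incr(key)
    pipe.expire(key, window * 2)  # safety net so abandoned windows eventually disappear
    count, _ = pipe.execute()
    return count <= limit

Call it at the top of the handler and return 429 when it comes back False; someone debugging this at 3 a.m. can read it in one pass, which is the point.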
6) Queues and streams: know what you’re buying
Redis lists, pub/sub, and streams can implement work queues. They’re great for low-latency fanout and lightweight pipelines. But if you need exactly-once processing, durable retention, consumer group rebalancing, and multi-DC guarantees, you’re not shopping for Redis anymore—you’re shopping for a log system.
Redis durability: what it does, what it doesn’t
Redis persistence is real, but it is not the same contract as a relational database. You need to decide your failure model first, then configure persistence accordingly.
RDB snapshots
RDB writes point-in-time snapshots to disk. It’s compact and fast for restarts, but you can lose data since the last snapshot. That’s fine for caches; questionable for queues; terrifying for ledgers.
AOF (Append Only File)
AOF logs every write. With fsync policies you can shrink the data-loss window at the cost of write overhead, and AOF rewrite compacts the log periodically. It’s closer to durability, but still not a relational database with constraints and transactional guarantees across rows.
Replication and failover
Redis replication is asynchronous in most common setups. Failover can lose recent writes. If you put “truth” in Redis, you must accept that truth can be rolled back by a failover. If that sentence makes your stomach drop, good—keep truth in MySQL.
Second short joke (and last one): Calling Redis your database because you enabled AOF is like calling a tent a house because you bought a good sleeping bag.
Fast diagnosis playbook
If your system is slow and you suspect “database,” you can waste days guessing. Don’t. Run a fast triage that tells you where to spend your next hour.
First: is MySQL overloaded or just waiting?
- Check MySQL connections and threads running: if Threads_running is high and stable, MySQL is busy; if it’s low but queries are still slow, you’re probably waiting on I/O or locks.
- Check top query fingerprints: one bad query pattern can dominate CPU even if each call is “only 30ms.”
- Check disk latency: if storage is slow, caching won’t fix writes or buffer pool misses.
Second: is Redis actually helping or hiding pain?
- Redis hit rate: low hit rate means you’re paying network and serialization costs without benefit.
- Redis latency spikes: big keys, slow commands, or persistence fsync can cause tail latency that leaks back to MySQL (via retries and timeouts).
- Evictions: evictions are a sign that your cache policy is making decisions for you, usually bad ones.
Third: is the application the real culprit?
- Concurrency: a deployment that doubles worker count can double database load even if traffic is flat.
- N+1 queries: one endpoint can quietly run hundreds of queries per request.
- Retry storms: misconfigured timeouts can multiply load when things are already slow.
Practical tasks with commands: measure, decide, change
Below are hands-on tasks you can run on real systems. Each includes a command, what the output means, and what decision to make next. No magic, just evidence.
Task 1: Identify top MySQL statements by total time
cr0x@server:~$ mysql -e "SELECT DIGEST_TEXT, COUNT_STAR, ROUND(SUM_TIMER_WAIT/1e12,2) AS total_s FROM performance_schema.events_statements_summary_by_digest ORDER BY SUM_TIMER_WAIT DESC LIMIT 5\G"
...output...
What the output means: DIGEST_TEXT shows normalized query shape; COUNT_STAR is executions; total_s is cumulative time spent.
Decision: If one digest dominates total time, target it for caching or indexing before adding hardware. If many digests tie, look for systemic issues (timeouts, N+1).
Task 2: Confirm MySQL is CPU-bound or waiting on I/O
cr0x@server:~$ mysql -e "SHOW GLOBAL STATUS LIKE 'Threads_running'; SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_reads'; SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read_requests';"
...output...
What the output means: Threads_running high implies active workload. Buffer_pool_reads (physical reads) vs read_requests (logical reads) gives cache hit intuition.
Decision: If physical reads are climbing and storage latency is high, fix I/O or increase buffer pool before betting everything on Redis.
Task 3: Find lock contention in MySQL
cr0x@server:~$ mysql -e "SELECT * FROM sys.innodb_lock_waits ORDER BY wait_age_secs DESC LIMIT 10\G"
...output...
What the output means: Shows blocking and waiting transactions, and how long they’ve been stuck.
Decision: If lock waits are common, caching reads may help but won’t solve write contention. Fix transaction scope, indexes, or isolation issues.
Task 4: Inspect the MySQL slow query log quickly
cr0x@server:~$ sudo pt-query-digest /var/log/mysql/mysql-slow.log --limit 10
...output...
What the output means: Top query classes by total time, average time, rows examined, and variance.
Decision: If rows examined is huge for simple lookups, you need indexes or query fixes. If queries are predictable and repetitive, they’re prime cache candidates.
Task 5: Verify MySQL index usage for a hot query
cr0x@server:~$ mysql -e "EXPLAIN SELECT * FROM orders WHERE user_id=123 AND status='open'\G"
...output...
What the output means: Look at type (ALL is bad), key used, rows estimated, and extra (Using filesort / temporary can hurt).
Decision: If the plan is a scan or uses the wrong index, fix schema/query first. Redis should not be a bandage for avoidable scans.
Task 6: Check Redis hit rate and keyspace behavior
cr0x@server:~$ redis-cli INFO stats | egrep 'keyspace_hits|keyspace_misses|instantaneous_ops_per_sec'
...output...
What the output means: Hits and misses let you compute the hit ratio as hits / (hits + misses); ops/sec tells you traffic volume.
Decision: If hit ratio is low, your caching strategy is wrong: keys are too granular, TTL too short, or you’re caching the wrong objects.
Task 7: Detect Redis evictions (the silent correctness tax)
cr0x@server:~$ redis-cli INFO stats | egrep 'evicted_keys|expired_keys'
...output...
What the output means: evicted_keys > 0 means Redis is deleting keys under memory pressure. expired_keys is normal for TTL use.
Decision: If evictions occur, either raise maxmemory, change eviction policy, reduce value sizes, or rethink what you cache. Assume eviction breaks assumptions until proven otherwise.
Task 8: Check Redis memory fragmentation and allocator behavior
cr0x@server:~$ redis-cli INFO memory | egrep 'used_memory_human|used_memory_rss_human|mem_fragmentation_ratio|maxmemory_human'
...output...
What the output means: RSS much larger than used_memory implies fragmentation or allocator overhead; maxmemory tells you the ceiling.
Decision: High fragmentation may require tuning (jemalloc behavior, active defrag) or reshaping workloads (avoid huge churn of similarly-sized keys).
Task 9: Find Redis slow commands
cr0x@server:~$ redis-cli SLOWLOG GET 10
...output...
What the output means: Lists slow command executions with duration and arguments (often truncated).
Decision: If you see KEYS, big HGETALL, large range queries, or Lua scripts taking milliseconds, you’ve built a latency grenade. Replace with SCAN patterns, smaller values, or precomputed keys.
Task 10: Validate Redis persistence settings (AOF/RDB)
cr0x@server:~$ redis-cli CONFIG GET appendonly; redis-cli CONFIG GET save; redis-cli CONFIG GET appendfsync
...output...
What the output means: appendonly on/off, save schedules for RDB, appendfsync policy (always/everysec/no).
Decision: For pure caching, you may disable persistence to reduce overhead. For coordination data (locks, rate limits), persistence can be optional but understand restart behavior. For “truth,” rethink the whole design.
Task 11: Measure Redis latency distribution under load
cr0x@server:~$ redis-cli --latency-history -i 1
...output...
What the output means: Reports min/avg/max latency over time. Spikes correlate with fork for RDB, AOF fsync, or big commands.
Decision: If max latency spikes align with persistence events, tune persistence, move to faster storage, or isolate Redis on dedicated nodes.
Task 12: Check Linux pressure signals that affect both MySQL and Redis
cr0x@server:~$ vmstat 1 5
...output...
What the output means: si/so indicate swapping (bad); wa shows I/O wait; r shows runnable threads; free/buff/cache tells memory posture.
Decision: If swapping is non-zero, stop and fix memory. Redis plus swap is how you turn microseconds into minutes.
Task 13: Confirm disk latency for MySQL data volume
cr0x@server:~$ iostat -x 1 3
...output...
What the output means: r_await/w_await show read/write latency; %util shows saturation.
Decision: If latency is high and util is near 100%, you’re I/O-bound. Caching reads can help, but if writes are the issue, you need storage, batching, or schema changes.
Task 14: Watch MySQL connection churn (often a hidden tax)
cr0x@server:~$ mysql -e "SHOW GLOBAL STATUS LIKE 'Connections'; SHOW GLOBAL STATUS LIKE 'Aborted_connects'; SHOW GLOBAL STATUS LIKE 'Threads_connected';"
...output...
What the output means: Connections climbing fast suggests lack of pooling; Aborted_connects implies auth/network issues.
Decision: Fix pooling and timeouts before you add Redis. Otherwise you’ll just build a faster way to overload MySQL.
Task 15: Validate Redis key distribution and hot keys
cr0x@server:~$ redis-cli --hotkeys
...output...
What the output means: Estimates frequently accessed keys via best-effort sampling; note that --hotkeys only works when maxmemory-policy is set to one of the LFU policies.
Decision: If one key is extremely hot, shard it (per-user keys), add local caching, or redesign to avoid global counters that serialize traffic.
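If the hot key is a global counter, one common redesign is to spread increments across a handful of shard keys and sum them on read. A minimal sketch, assuming redis-py; the key names and shard count are illustrative:

import random
import redis

r = redis.Redis(decode_responses=True)

SHARDS = 16  # more shards means less contention on any single key

def incr_counter(name, amount=1):
    shard = random.randrange(SHARDS)
    r.incrby(f"counter:{name}:{shard}", amount)

def read_counter(name):
    keys = [f"counter:{name}:{i}" for i in range(SHARDS)]
    return sum(int(v) for v in r.mget(keys) if v is not None)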
Three corporate mini-stories (anonymized, real enough)
Mini-story 1: The incident caused by a wrong assumption
The company had a monolithic MySQL backend and introduced Redis “for sessions.” It worked. So they extended Redis to store user entitlements—what features a user had paid for. The thinking was simple: entitlements are read constantly, writes are rare, Redis is fast, and they enabled AOF, so it was “durable enough.”
Then a failover happened during a noisy neighbor event on the virtualization host. Redis promoted a replica that was slightly behind. A small window of entitlement changes—upgrades and downgrades—vanished. Some users lost access; a few got access they shouldn’t have had. Support tickets started. Finance started asking questions with the tone people use when they’re trying to stay polite.
At first, the team hunted for “data corruption.” Redis wasn’t corrupted. It did exactly what it promised: fast, in-memory, asynchronous replication unless configured otherwise, and persistence that still has windows of loss.
The root problem was the assumption that “AOF means database.” They fixed it by moving entitlements back into MySQL as the source of truth, then caching the computed entitlement bundle in Redis with short TTLs and explicit invalidation on entitlement writes. They also added a sanity check path: if Redis says a user is entitled but MySQL disagrees, MySQL wins and Redis is corrected.
After that, failovers were boring again. That’s the goal.
Mini-story 2: The optimization that backfired
A different team was proud of their cache hit rate. They had a popular endpoint: “get user dashboard.” They cached the whole JSON response in Redis for 30 minutes. MySQL load dropped sharply. Graphs looked like a success slide.
Two months later, product shipped a “live notifications” widget inside that dashboard. It needed to update immediately when a notification was read. Instead, users kept seeing old notifications for up to 30 minutes. The team tried to invalidate the cache on notification reads, but the dashboard cache depended on several different underlying tables and event types. Invalidation became a web of “if X changes, delete keys A, B, C, except when…”. Bugs followed. Then incident: a deploy accidentally stopped invalidation for one of the underlying data sources, and stale dashboards spread like a rumor.
The postmortem was uncomfortable because the caching “optimization” had increased the correctness surface area. They had cached too high in the stack without a clean ownership model for invalidation. MySQL load was lower, but user trust was lower too.
The fix was to split the dashboard into smaller cacheable components with different TTLs and invalidation triggers: profile block (rarely changes), recommendation block (TTL-based), notifications (not cached, or cached with very short TTL and strong invalidation). They also introduced versioned keys per user state change, so new writes naturally moved traffic to new keys without needing to delete old ones immediately.
MySQL load went up a bit from the “hero” baseline. Incidents went down a lot. That’s a better trade.
Mini-story 3: The boring but correct practice that saved the day
A mature platform team treated Redis as a dependency with its own SLO, dashboards, and capacity planning. Not exciting. Effective. They did routine load tests that included Redis failure modes: cold cache, partial cache, and Redis restarts during peak.
One afternoon, a kernel update triggered a reboot loop on a subset of nodes in the Redis cluster. Redis availability degraded. The application didn’t go down. Latency increased, but it stayed inside user-facing SLO for most endpoints.
Why? Two boring choices. First, the app had strict timeouts for Redis (single-digit milliseconds) and would fall back to MySQL on miss or error for only a limited set of endpoints. Second, they enforced circuit breakers: if Redis errors exceed a threshold, the app stops trying Redis for a short cooldown window, preventing retry storms.
MySQL did see increased read traffic, but they had headroom because the cache strategy targeted the highest-churn reads and they had already fixed the worst queries. The “cache is down” mode was degraded, not catastrophic.
The team got to fix the cluster without a public incident. No heroics. Just systems that behave like adults.
Common mistakes: symptoms → root cause → fix
1) Symptom: MySQL load didn’t drop after adding Redis
Root cause: Low hit rate, caching at the wrong granularity, or missing cache-aside logic under concurrency.
Fix: Measure keyspace_hits/misses, identify what’s being cached, and cache at the boundary object level. Add request coalescing to stop thundering herds.
2) Symptom: Redis memory grows until evictions start
Root cause: Missing TTLs, too many unique keys (high cardinality), or values larger than expected (JSON blobs, uncompressed arrays).
Fix: Add TTLs by default, cap payload sizes, use hashes for related fields, and define maxmemory + an eviction policy that matches “cache” rather than “store forever.”
3) Symptom: P99 latency got worse after “caching”
Root cause: Redis is introducing tail latency due to persistence fsync, slow commands, big keys, or network hops; plus application retries amplify the problem.
Fix: Check redis-cli --latency-history and SLOWLOG. Reduce big-key operations, tune persistence, and enforce aggressive timeouts with circuit breakers.
4) Symptom: Random stale data, hard to reproduce
Root cause: Invalidation bugs, multi-writer caches, or using TTLs as the only “consistency strategy.”
Fix: Choose a single cache owner per key namespace, implement explicit invalidation on writes, and use versioned keys where feasible.
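A minimal sketch of versioned keys, assuming redis-py; build_dashboard and the key names are illustrative. Writes bump a per-user version, so reads move to a fresh key on the next request and stale entries simply age out via TTL:

import json
import redis

r = redis.Redis(decode_responses=True)

def dashboard_key(user_id):
    version = r.get(f"dashboard_ver:{user_id}") or "0"
    return f"dashboard:v{version}:{user_id}"

def get_dashboard(user_id, build_dashboard, ttl=300):
    key = dashboard_key(user_id)
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)
    data = build_dashboard(user_id)  # stand-in for the MySQL rebuild
    r.setex(key, ttl, json.dumps(data))
    return data

def on_dashboard_write(user_id):
    # Bump the version after committing to MySQL; old keys expire on their own.
    r.incr(f"dashboard_ver:{user_id}")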
5) Symptom: MySQL is fine, but Redis CPU is high
Root cause: Too many operations per request, chatty access patterns, or Lua scripts doing heavy work.
Fix: Batch reads (MGET/pipelining), cache richer objects to reduce call count, and keep Lua scripts tiny and well-tested.
6) Symptom: After Redis restart, app melts down and MySQL follows
Root cause: Cold cache plus stampede, no rate limiting on rebuild, and no circuit breaker behavior.
Fix: Use soft TTL, single-flight locking per key, background warming for critical keys, and cap rebuild concurrency.
7) Symptom: Data loss accusations after Redis failover
Root cause: Treating Redis as authoritative store for business-critical state with async replication.
Fix: Put truth in MySQL (or another durable store), cache derived views in Redis, and document the loss window you can tolerate for coordination keys.
8) Symptom: Redis cluster is stable but clients see timeouts
Root cause: Connection pool exhaustion, DNS/endpoint changes, NAT port pressure, or misconfigured client timeouts.
Fix: Instrument client pools, keep Redis timeouts tight but realistic, reuse connections, and avoid per-request connect/disconnect patterns.
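A minimal sketch of a shared, bounded redis-py client with tight timeouts; the hostname and numbers are illustrative and should match your own latency budget:

import redis

# One shared pool per process: bounded connections, no per-request connect/disconnect.
pool = redis.ConnectionPool(
    host="redis.internal",        # illustrative hostname
    port=6379,
    max_connections=50,           # cap the pool so one stall cannot eat every socket
    socket_connect_timeout=0.05,  # seconds: fail fast instead of hanging on connect
    socket_timeout=0.05,          # seconds: tight read timeout, tuned to your P99 budget
    decode_responses=True,
)
r = redis.Redis(connection_pool=pool)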
Checklists / step-by-step plan
Step-by-step: introducing Redis to reduce MySQL load
- Pick a target: one endpoint or one query digest that dominates total MySQL time, not “the whole database.”
- Define the cached object: what the service actually needs (e.g., “user_context:v3:{user_id}”).
- Decide the consistency model: TTL-only, invalidation-on-write, or versioned keys.
- Set a TTL with jitter: start conservative (minutes), add ±10–20% jitter to avoid synchronized expiry.
- Implement cache-aside: GET, on miss SELECT, SETEX, return.
- Add stampede protection: single-flight per key (lock key with short TTL) or request coalescing in-process.
- Instrument: cache hit ratio, MySQL QPS reduction, P50/P95/P99 latency, error rates, eviction rates.
- Failure behavior: define what happens when Redis is down. Fast fail with fallback for critical reads; circuit breaker to prevent retry storms (a sketch follows this list).
- Capacity plan: estimate key count, average value size, TTL churn, and memory overhead. Set maxmemory explicitly.
- Roll out gradually: feature flag, percentage-based enablement, and easy rollback.
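A minimal sketch of that failure behavior, assuming redis-py; the thresholds are illustrative, and load_from_mysql stands in for the fallback query returning an already-serialized value:

import time
import redis

r = redis.Redis(socket_timeout=0.01, socket_connect_timeout=0.01, decode_responses=True)

# Crude in-process circuit breaker: after a few consecutive Redis errors,
# skip Redis for a short cooldown instead of piling up timeouts and retries.
_failures = 0
_skip_until = 0.0
FAILURE_THRESHOLD = 5
COOLDOWN_SECONDS = 10

def cached_read(key, load_from_mysql, ttl=300):
    global _failures, _skip_until
    if time.time() >= _skip_until:
        try:
            cached = r.get(key)
            if cached is not None:
                _failures = 0
                return cached
        except redis.RedisError:
            _failures += 1
            if _failures >= FAILURE_THRESHOLD:
                _skip_until = time.time() + COOLDOWN_SECONDS
    # Fallback: go to MySQL; the write-back to Redis is best-effort only.
    value = load_from_mysql(key)
    try:
        if time.time() >= _skip_until:
            r.setex(key, ttl, value)
    except redis.RedisError:
        pass
    return value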
Checklist: is this data safe to put in Redis?
- Can you recompute it from MySQL or other durable stores?
- Can you tolerate stale reads for up to TTL?
- Can you tolerate losing the last few seconds of writes on failover?
- Is there a clear owner for invalidation?
- Is the key cardinality bounded, or can it explode with user input?
Checklist: production readiness for Redis in front of MySQL
- Redis timeouts are tight and retries are bounded.
- Circuit breaker exists for Redis errors/timeouts.
- maxmemory and eviction policy are explicitly configured.
- SLOWLOG monitored; big keys avoided; SCAN used instead of KEYS.
- Cold-cache scenario tested at peak-like load.
- MySQL has headroom to survive cache degradation.
FAQ
1) Can Redis replace MySQL?
Not for most applications. Redis can store data, but MySQL provides relational modeling, constraints, and durability semantics Redis doesn’t match by default. Use Redis to accelerate MySQL, not impersonate it.
2) What’s the safest first Redis use case?
Sessions, rate limiting, and caching derived read models. These are naturally time-bound and don’t pretend to be the system of record.
3) How do I estimate whether I can cut MySQL load by 80%?
Look at top query digests by total time and count. If a small set of repetitive read queries dominates and the results are cacheable, big reductions are plausible. If writes dominate or reads are highly unique, you won’t get 80%.
4) Should I cache individual rows or full objects?
Prefer full objects or aggregated views that match service boundaries. Row-level caching often leads to many Redis calls per request and complex invalidation logic.
5) What TTL should I use?
Pick TTL based on acceptable staleness and operational risk. Short TTL reduces staleness but increases churn and stampede risk. Add jitter. Also: TTL is not a consistency model; it’s a backstop.
6) What eviction policy should I use?
For typical caching: allkeys-lru or allkeys-lfu is the usual starting point, with TTLs on the keys you set. The key is to set maxmemory explicitly and plan application behavior around evictions. noeviction can be fine for coordination data if you size it correctly, but it converts memory pressure into write errors, and write errors into outages.
7) How do I keep Redis from becoming a single point of failure?
Run Redis with replication and failover, but more importantly: make the application resilient. Tight timeouts, bounded retries, circuit breakers, and a tested degraded mode that doesn’t stampede MySQL.
8) Is Redis persistence (AOF/RDB) enough for critical data?
Usually no. Persistence reduces loss windows but does not provide the relational and transactional guarantees many critical datasets require. If the data would appear in a legal audit, it belongs in MySQL (or an equivalent durable system), not “mostly durable” memory.
9) Why did Redis increase MySQL load during an outage?
Because your fallback path probably causes a stampede: every cache miss turns into a DB query, and retries multiply traffic. Fix with single-flight, rate-limited rebuild, and circuit breakers.
10) What about using Redis for search or analytics?
Redis can support secondary indexes and fancy structures, but operationally it’s not a general-purpose analytics engine. If you need flexible filtering and aggregation at scale, keep that workload in systems designed for it.
Next steps you can do this week
If you want Redis to cut MySQL load dramatically, stop thinking in slogans and start thinking in contracts. MySQL is truth. Redis is speed. Truth without speed is slow; speed without truth is a future incident report.
- Pull the top 5 MySQL query digests by total time and pick one target endpoint.
- Design a cache key that represents the service-level object, not a table row.
- Implement cache-aside with explicit invalidation on writes and TTL jitter.
- Add stampede protection and circuit breaking before you enable it globally.
- Measure: hit rate, evictions, Redis latency, and MySQL QPS/CPU before and after.
Then do the boring test: restart Redis in staging under load and confirm your application doesn’t panic. If it panics in staging, it will panic in production—just with more witnesses.