MySQL vs MariaDB: temp tables on disk—how to stop them for real

Was this helpful?

You’re staring at a dashboard where CPU is bored, network is fine, but storage latency is climbing like it’s trying to prove a point. Queries that “should be easy” are suddenly slow. And the server’s temp directory is quietly filling with the kind of files nobody invited.

This is the classic pain: internal temp tables spilling to disk. It’s not one setting. It’s not one engine. It’s a three-way argument between your SQL, your server variables, and the specific MySQL/MariaDB version you’re actually running in production—not the one you remember from five years ago.

What “temp tables on disk” really means (and why you should care)

When MySQL or MariaDB can’t execute a query purely in memory, it creates an internal temporary table. This is not the same thing as a user-created CREATE TEMPORARY TABLE. Internal temp tables are engine-created scratchpads used for:

  • GROUP BY and DISTINCT without a usable index
  • ORDER BY that can’t be satisfied by an index (often with LIMIT missing or ineffective)
  • derived tables / subqueries / views that the optimizer decides to materialize
  • some window functions and complex set operations

The part that hurts: internal temp tables can live in memory or on disk. In older mental models, “memory temp table” meant the MEMORY/HEAP engine, and “disk temp table” meant MyISAM. Modern MySQL complicates that: internal temp tables may use InnoDB and its temporary tablespace. MariaDB has its own twists. Your monitoring metric—often Created_tmp_disk_tables—is a lagging indicator that says: “we spilled, we paid I/O, and now we’re pretending it’s normal.”

If you’re running on fast local NVMe with generous cache, you might survive. If you’re on networked storage, or a noisy cloud volume, disk temp tables are basically a performance tax you pay on every traffic spike.

Joke #1: A disk-based temp table is like doing taxes on a treadmill—technically possible, emotionally expensive.

MySQL vs MariaDB: where temp table behavior differs in practice

People like to treat MariaDB as “MySQL with extra features.” Operationally, that’s how you get paged at 2 a.m. Temp table behavior isn’t identical, and the differences matter when you’re trying to stop disk spills “for real.”

1) Internal temp table engine choices are not the same story

MySQL 8.0 heavily uses an internal temp table implementation, and when it spills it often lands in InnoDB’s temp tablespace (still “disk”). You’ll see artifacts in InnoDB temp I/O and sometimes in ibtmp1 growth behavior.

MariaDB (10.3+ especially) has different optimizer and temp table internals, and may lean on Aria/MyISAM behaviors depending on configuration and build. The terminology you see in status counters may look familiar while hiding different mechanics.

2) Status counters look similar, but don’t tell the full truth

Created_tmp_disk_tables exists in both, but what “disk” means can vary by version and internal engine. “Disk” might mean:

  • a file under tmpdir
  • pages in an internal InnoDB temp tablespace
  • a temp table that started in memory but converted after hitting a limit

3) Limits and thresholds differ in edge cases

Both use the dance between tmp_table_size and max_heap_table_size to decide whether an in-memory temp table can exist. But the conversion triggers can be more nuanced (data types, blob/text columns, row format, and specific operations). If you only crank those two variables and call it a day, you’ll reduce some spills and still suffer others.

4) Version-to-version changes are bigger than brand-to-brand stereotypes

MySQL 5.7 vs 8.0 can behave more differently than “MySQL vs MariaDB” at a high level. Same for MariaDB 10.1 vs 10.6. Any serious tuning starts with: “What exact version is running, and what exact workload is spilling?”

Fast diagnosis playbook

If you have 20 minutes before the next incident review, do this in order. It’s biased toward quickly isolating whether you’re dealing with query shape, memory thresholds, or storage throughput/latency.

First: prove it’s temp table I/O (not random disk noise)

  1. Check global temp-table counters and rate of change.
  2. Check if the slow queries correspond to Sort_merge_passes, Created_tmp_disk_tables, and handler reads jumping.
  3. Check filesystem-level I/O latency on the volume hosting tmpdir (or InnoDB temp tablespace).

Second: identify the top 1–3 query patterns creating spills

  1. Pull the top slow queries during the incident window.
  2. Run EXPLAIN and look for “Using temporary” / “Using filesort” / materialization.
  3. Look for GROUP BY and ORDER BY without supporting indexes, plus large intermediate result sets.

Third: decide whether to fix SQL, tune thresholds, or change the spill surface

  1. If one or two queries are responsible: fix SQL/indexing first. It’s the highest ROI and most stable.
  2. If many queries spill due to small limits: raise temp table limits carefully, considering concurrency and memory pressure.
  3. If spills are unavoidable: move tmpdir to fast local SSD/NVMe and make sure the filesystem and mount options aren’t sabotaging you.

Facts and history that explain today’s weirdness

  1. MySQL historically used MEMORY for in-memory temp tables and MyISAM for on-disk ones; that mental model still haunts tuning guides.
  2. InnoDB became the default engine in MySQL 5.5, changing typical production I/O patterns and making temp-table spills more visible on busy systems.
  3. MySQL 8.0 changed a lot of optimizer behavior, including derived table handling and internal temp table implementation details; upgrades can shift spill rates without changing SQL.
  4. MariaDB diverged meaningfully after the fork; it’s not just “drop-in MySQL,” especially around optimizer features and storage engines like Aria.
  5. The “Using temporary; Using filesort” EXPLAIN flags are old and still useful, but they don’t guarantee disk I/O—only that a temp structure exists.
  6. tmpdir has been a performance landmine for decades because it defaults to system temp paths that may sit on slow root volumes.
  7. “Disk temp tables” are often a symptom of missing indexes, not insufficient RAM; adding memory can mask a query bug until the next growth step.
  8. Cloud migration made this worse: ephemeral disks, network volumes, and noisy neighbors turn occasional spills into latency cliffs.

Mechanics: what forces temp tables to disk

The three big triggers

1) Size limits. The server estimates or observes the temp table growing beyond allowed in-memory limits and converts it to on-disk. The classic knobs are:

  • tmp_table_size
  • max_heap_table_size

Effective in-memory temp table capacity is typically the minimum of those two for MEMORY-based temp tables, but modern MySQL internal temp tables aren’t always MEMORY. Still, these variables remain part of the decision tree.

2) Column types and row format. If the temp table needs to store BLOB/TEXT columns, or other structures not supported efficiently in memory, you can spill even with generous limits. This is one reason “just increase tmp_table_size” often disappoints.

3) Query shape that creates huge intermediate results. Even if the final result is small, intermediate results can be massive. Classic offender: GROUP BY on a low-cardinality column while joining a wide table without selective predicates.

Why “stopping disk temp tables” is not a single toggle

Temp tables exist because the optimizer needs a workspace. You can reduce disk spills, but you cannot—and should not—try to eliminate temp tables globally. The goal is:

  • keep temp structures small
  • keep them in memory when it’s safe
  • avoid creating them in the first place by fixing SQL and indexes
  • when they hit disk, make disk fast and predictable

There’s also the concurrency trap: raising temp limits might help a single query, but hurt the system under load when 200 sessions each allocate bigger temp structures. You don’t want to win a benchmark and lose production.

Practical tasks (commands, outputs, decisions)

These are the tasks you actually run during a performance hunt. Each one includes: a command, realistic output, what it means, and the decision you make.

Task 1: Confirm version and flavor (you can’t tune a rumor)

cr0x@server:~$ mysql -e "SELECT VERSION() AS version, @@version_comment AS comment\G"
*************************** 1. row ***************************
version: 8.0.36
comment: MySQL Community Server - GPL

What it means: This is MySQL 8.0.x, so internal temp table behavior and InnoDB temp tablespace are in play.

Decision: Use MySQL 8.0 instrumentation (Performance Schema, sys schema views). Don’t apply MariaDB-only variables or old 5.6 folklore.

Task 2: Check temp table creation counters and their ratio

cr0x@server:~$ mysql -e "SHOW GLOBAL STATUS LIKE 'Created_tmp%';"
+-------------------------+----------+
| Variable_name           | Value    |
+-------------------------+----------+
| Created_tmp_disk_tables | 184229   |
| Created_tmp_files       | 912      |
| Created_tmp_tables      | 2441102  |
+-------------------------+----------+

What it means: Disk temp tables exist and are not rare. The ratio matters: 184k disk out of 2.4M temp tables is ~7.5%.

Decision: If the rate is climbing during incidents, you’re spilling under load. Move to rate-based monitoring (next task) and identify culprits.

Task 3: Measure disk temp table rate (not just totals)

cr0x@server:~$ mysql -e "SHOW GLOBAL STATUS LIKE 'Created_tmp_disk_tables';" && sleep 10 && mysql -e "SHOW GLOBAL STATUS LIKE 'Created_tmp_disk_tables';"
+-------------------------+--------+
| Variable_name           | Value  |
+-------------------------+--------+
| Created_tmp_disk_tables | 184229 |
+-------------------------+--------+
+-------------------------+--------+
| Variable_name           | Value  |
+-------------------------+--------+
| Created_tmp_disk_tables | 184411 |
+-------------------------+--------+

What it means: 182 disk temp tables in 10 seconds: ~18/sec. That’s enough to pressure storage if they’re non-trivial.

Decision: If storage latency correlates, treat this as primary suspect, not background noise.

Task 4: Check the temp table memory thresholds

cr0x@server:~$ mysql -e "SHOW GLOBAL VARIABLES LIKE 'tmp_table_size'; SHOW GLOBAL VARIABLES LIKE 'max_heap_table_size';"
+---------------+----------+
| Variable_name | Value    |
+---------------+----------+
| tmp_table_size| 16777216 |
+---------------+----------+
+--------------------+----------+
| Variable_name      | Value    |
+--------------------+----------+
| max_heap_table_size| 16777216 |
+--------------------+----------+

What it means: Both are 16MB. That’s conservative, and it will force disk spills for moderately sized GROUP BY/ORDER BY operations.

Decision: Consider raising them, but only after identifying whether spills are due to size vs data types vs query shape. Don’t blindly set 1G and congratulate yourself.

Task 5: Check where temp files go (tmpdir)

cr0x@server:~$ mysql -e "SHOW VARIABLES LIKE 'tmpdir';"
+---------------+------------+
| Variable_name | Value      |
+---------------+------------+
| tmpdir        | /tmp       |
+---------------+------------+

What it means: Temp spills land in /tmp. On many servers that’s on the root filesystem, which is often the least-loved disk.

Decision: If you have local NVMe, move tmpdir to a dedicated mount with predictable performance and capacity monitoring.

Task 6: Verify /tmp filesystem and mount options

cr0x@server:~$ df -hT /tmp && mount | grep ' /tmp '
Filesystem     Type  Size  Used Avail Use% Mounted on
/dev/nvme0n1p2 ext4  80G   72G  4.5G  95% /
tmpfs          tmpfs 16G   1.2G  15G   8% /tmp
tmpfs on /tmp type tmpfs (rw,nosuid,nodev,relatime,size=16384m)

What it means: Here /tmp is actually tmpfs (RAM-backed). That’s fast, but it’s also memory pressure waiting to happen.

Decision: If you see OOM risk or swapping, stop using tmpfs for DB temp. Prefer fast local SSD with quotas/monitoring, unless you have firm memory headroom and strict limits.

Task 7: Check InnoDB temp tablespace behavior (MySQL 8.0 typical)

cr0x@server:~$ mysql -e "SHOW GLOBAL VARIABLES LIKE 'innodb_temp_data_file_path'; SHOW GLOBAL STATUS LIKE 'Innodb_temp%';"
+--------------------------+------------------------+
| Variable_name            | Value                  |
+--------------------------+------------------------+
| innodb_temp_data_file_path | ibtmp1:12M:autoextend |
+--------------------------+------------------------+
+----------------------------+---------+
| Variable_name              | Value   |
+----------------------------+---------+
| Innodb_temp_tablespaces    | 0       |
| Innodb_temp_tables         | 19231   |
| Innodb_temp_table_pages    | 8112    |
+----------------------------+---------+

What it means: Temp activity is happening in InnoDB. Spills might not show up as files in tmpdir the way older guides suggest.

Decision: Watch I/O on the datadir volume too, not only tmpdir. If ibtmp1 grows or I/O spikes, you have a second spill surface.

Task 8: Find “Using temporary” and “Using filesort” in real queries (quick sampling)

cr0x@server:~$ mysql -e "SELECT DIGEST_TEXT, COUNT_STAR, SUM_TIMER_WAIT/1000000000000 AS total_sec
FROM performance_schema.events_statements_summary_by_digest
ORDER BY SUM_TIMER_WAIT DESC
LIMIT 5\G"
*************************** 1. row ***************************
DIGEST_TEXT: SELECT customer_id, COUNT(*) FROM orders WHERE created_at BETWEEN ? AND ? GROUP BY customer_id ORDER BY COUNT(*) DESC
COUNT_STAR: 1932
total_sec: 812.4432

What it means: A GROUP BY + ORDER BY aggregate is a prime candidate for temp tables and sort spills.

Decision: Pull this exact query pattern, run EXPLAIN, and check indexing. Don’t tune global memory until you know whether this is simply missing an index.

Task 9: Explain the suspect query and read the “Extra” column like a detective

cr0x@server:~$ mysql -e "EXPLAIN SELECT customer_id, COUNT(*) FROM orders WHERE created_at BETWEEN '2025-12-01' AND '2025-12-31' GROUP BY customer_id ORDER BY COUNT(*) DESC\G"
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: orders
   partitions: NULL
         type: range
possible_keys: idx_created_at
          key: idx_created_at
      key_len: 8
          ref: NULL
         rows: 2100000
     filtered: 100.00
        Extra: Using where; Using temporary; Using filesort

What it means: The engine expects to scan a lot of rows and then build a temp table and sort it. This can spill.

Decision: Consider a composite index to support grouping, or change query to reduce scanned rows (narrow time range, pre-aggregate, summary table). Raising temp size might help but won’t fix the scan.

Task 10: Validate whether the temp spills correlate with storage latency

cr0x@server:~$ iostat -x 1 5
Linux 6.2.0-39-generic (db01) 	12/31/2025 	_x86_64_	(16 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          18.22    0.00    3.91   22.14    0.00   55.73

Device            r/s     w/s   rkB/s   wkB/s  await  svctm  %util
nvme0n1         120.0   980.0  4200.0 49200.0  18.30   0.45  68.0

What it means: %iowait and device await are high during the event. This smells like spill-induced write pressure.

Decision: If the storage layer is the limiter, reducing spills or moving them to faster media is urgent. Otherwise you’ll “optimize queries” and still lose to I/O latency spikes.

Task 11: Check free space and inode pressure where temp files live

cr0x@server:~$ df -h /var/lib/mysql /tmp && df -i /var/lib/mysql /tmp
Filesystem      Size  Used Avail Use% Mounted on
/dev/nvme0n1p3  500G  410G   65G  87% /var/lib/mysql
tmpfs            16G  1.2G   15G   8% /tmp
Filesystem       Inodes   IUsed    IFree IUse% Mounted on
/dev/nvme0n1p3 32768000 812344 31955656    3% /var/lib/mysql
tmpfs           4194304    412  4193892    1% /tmp

What it means: Datadir has headroom but is getting full-ish. Temp is in tmpfs so inodes aren’t the issue here.

Decision: If disk temp tables are landing in datadir (InnoDB temp tablespace), ensure datadir volume has capacity. If /tmp is a real FS and near 100%, fix that before tuning anything else.

Task 12: Inspect current running statements during a spike

cr0x@server:~$ mysql -e "SHOW FULL PROCESSLIST;"
+-----+------+-----------------+------+---------+------+------------------------------+---------------------------+
| Id  | User | Host            | db   | Command | Time | State                        | Info                      |
+-----+------+-----------------+------+---------+------+------------------------------+---------------------------+
| 112 | app  | 10.0.3.21:51422 | prod | Query   |   18 | Creating sort index          | SELECT ... ORDER BY ...   |
| 145 | app  | 10.0.3.18:49811 | prod | Query   |   22 | Copying to tmp table on disk | SELECT ... GROUP BY ...   |
| 201 | app  | 10.0.3.19:53301 | prod | Query   |   11 | Sending data                 | SELECT ...                |
+-----+------+-----------------+------+---------+------+------------------------------+---------------------------+

What it means: You have live evidence: “Copying to tmp table on disk” and “Creating sort index.” That’s spill + sort work happening now.

Decision: Capture these query texts (from app logs or Performance Schema), then reproduce and fix. If this is widespread across many sessions, focus on indexing and temp thresholds.

Task 13: Spot “sort merge passes” indicating sort memory is too small

cr0x@server:~$ mysql -e "SHOW GLOBAL STATUS LIKE 'Sort_merge_passes'; SHOW GLOBAL VARIABLES LIKE 'sort_buffer_size';"
+-------------------+-------+
| Variable_name     | Value |
+-------------------+-------+
| Sort_merge_passes | 18291 |
+-------------------+-------+
+------------------+---------+
| Variable_name    | Value   |
+------------------+---------+
| sort_buffer_size | 262144  |
+------------------+---------+

What it means: Many merge passes: the sort buffer is small for the workload. That can increase disk activity even without temp tables being the primary driver.

Decision: Carefully increase sort_buffer_size only if you understand concurrency (it’s per-session). Prefer query/index fixes first; buffer inflation is a classic way to trade latency spikes for memory incidents.

Task 14: Validate memory headroom before raising per-session buffers

cr0x@server:~$ free -m
               total        used        free      shared  buff/cache   available
Mem:           65536       41220        2180        1320       22135       20240
Swap:           4096        1024        3072

What it means: Available memory is ~20GB, but swap is in use. That’s a yellow flag: the box is already under memory pressure at times.

Decision: Don’t “fix disk temp tables” by adding large per-thread buffers. You’ll just relocate pain from disk latency to swap storms.

Query patterns that manufacture disk temp tables

Pattern 1: GROUP BY without a supporting index

If you group on customer_id but only have an index on created_at, the engine reads rows by time range and then aggregates in a temp structure. If the group cardinality is high, the temp table grows. If it grows past memory thresholds or contains unfriendly types, it spills.

What to do: Build a composite index that matches your filter and group key in a useful order, for example (created_at, customer_id) or (customer_id, created_at) depending on selectivity and query shape. Then verify with EXPLAIN and actual runtime.

Pattern 2: ORDER BY on an expression (or a different column than the filter)

ORDER BY COUNT(*) DESC (aggregate order), ORDER BY LOWER(email), ORDER BY DATE(created_at)—these frequently force filesort and temp structures.

What to do: If you need ordering, consider precomputing sortable values, or rewriting to use indexed columns, or accepting approximate ordering. If the business wants “top N,” make sure the query actually uses LIMIT and can exploit it.

Pattern 3: DISTINCT across wide rows

SELECT DISTINCT * is the SQL equivalent of asking the server to deduplicate a warehouse. You get a temp table, you get memory pressure, and you get disk.

What to do: Select only the columns you need. Use a narrower DISTINCT key. Or better: fix the join logic so duplicates aren’t created in the first place.

Pattern 4: Derived tables that materialize

Subqueries and views that “look clean” can materialize into temp tables, especially when they include aggregation or sorting. MySQL and MariaDB differ in when they choose materialization, and version upgrades can change behavior.

What to do: Test with EXPLAIN and watch for derived table materialization. In many cases, rewriting a derived table into a join with proper indexes reduces temp work.

Pattern 5: Joining large sets before filtering

If the optimizer chooses a join order that creates a big intermediate result and only filters later, you’ll see temp and sort structures explode.

What to do: Ensure selective predicates are sargable (index-usable). Add indexes that support the join condition and the filter. If necessary, use optimizer hints cautiously (and document them like a loaded firearm).

Settings that matter (and settings that waste your time)

The knobs that actually influence temp table spills

tmp_table_size and max_heap_table_size

These are the first levers everyone reaches for because they’re easy. They also work—sometimes. The key points:

  • The effective in-memory temp table size is constrained by the lower of these two (in classic cases).
  • Raising them increases per-session potential memory use under concurrency.
  • They don’t fix spills caused by data types or by query plans that generate huge intermediate results.

Opinionated guidance: Raise them in measured steps (e.g., 16MB → 64MB → 128MB), only after you’ve identified the top spill queries. Track rates and p95 latency. If you jump straight to 512MB, you’re building a memory incident you haven’t met yet.

tmpdir

This controls where some on-disk temp structures land. If tmpdir is on slow storage, every spill becomes an I/O storm.

Opinionated guidance: Put tmpdir on fast local SSD/NVMe with monitoring and enough free space. Avoid root volume. Avoid networked filesystems. And don’t put it on tmpfs unless you have hard proof you won’t OOM under peak concurrency.

internal_tmp_mem_storage_engine (MySQL)

In some MySQL versions, this influences whether internal in-memory temp tables use MEMORY or a TempTable implementation. This can change performance and memory behavior, and interacts with how BLOB/TEXT are handled.

Opinionated guidance: Don’t change it as a first move. It’s a lever for specific workloads after you’ve confirmed the engine behavior in your version.

Per-session buffers: sort_buffer_size, join_buffer_size, read_rnd_buffer_size

These can reduce disk work for sorts and joins, but they are per-session and can balloon memory usage. Production systems don’t run one query at a time. They run 400 slightly-wrong queries concurrently.

Opinionated guidance: Keep these conservative. Fix SQL and indexes first. If you must tune buffers, do it with concurrency math and load testing, not vibes.

Settings people tweak that usually don’t solve disk temp tables

  • innodb_buffer_pool_size alone: helps general I/O, doesn’t prevent temp tables from needing workspace.
  • Thread/concurrency settings: can change pressure patterns but won’t fix why you’re spilling.
  • “Make the disk faster” as the only plan: it hides broken SQL until the next data growth milestone.

tmpdir, filesystems, and storage: make disk spill less painful

Sometimes you can’t avoid spills. Reporting queries exist. Ad-hoc analytics happens. Someone will run the “monthly rollup” at 9:55 a.m. on a weekday and swear it’s urgent.

So treat spill I/O like a workload class:

  • Put temp I/O on predictable media (local SSD/NVMe preferred).
  • Separate failure domains: don’t let temp files fill the same filesystem that holds your datadir.
  • Monitor capacity and latency for the temp mount explicitly.
  • Use filesystem options that match your risk appetite. For temp-only volumes, you may accept less durability. But be explicit, and document it.

A practical tmpdir layout

A common production approach:

  • /var/lib/mysql on redundant storage (or managed volume with durability)
  • /var/lib/mysqltmp on fast local disk, separate filesystem
  • tmpdir=/var/lib/mysqltmp in config

This doesn’t stop spills, but it stops temp spills from destroying the wrong part of your system.

Joke #2

If your temp tables live on the root filesystem, you’re not running a database; you’re running a suspense novel.

Three corporate mini-stories from the trenches

Mini-story #1: The incident caused by a wrong assumption

A mid-sized SaaS company migrated from an older MySQL to a newer build in a hurry—security patch window, exec attention, the usual. The team’s assumption was simple: temp table spills show up as files in /tmp, so they monitored /tmp usage and called it done.

The next Monday, dashboards turned red. Application threads were stuck waiting on database responses, and storage latency was spiking. Curiously, /tmp usage looked calm. The on-call engineer spent the first hour chasing ghosts: “If /tmp isn’t filling, it can’t be temp tables.”

The real issue: internal temp tables had shifted behavior with the newer version and were hammering the InnoDB temp tablespace on the datadir volume. The datadir volume was durable but not fast, and it was shared with other noisy workloads on the same underlying storage class.

Once they started graphing the right counters (InnoDB temp activity and disk await on the datadir device), the story made sense. The fix wasn’t glamorous: move spill surfaces to a fast local volume, then fix the top two GROUP BY queries that were producing massive intermediate results.

They updated the runbook with one sentence that mattered: “Disk temp tables may not touch tmpdir.” Nobody cheered, but the pager got quieter.

Mini-story #2: The optimization that backfired

An e-commerce team had a recurring slow report query. An engineer decided to “solve it” by raising tmp_table_size and max_heap_table_size dramatically. On a staging benchmark, the query improved. Everyone high-fived and moved on.

Production had different ideas. During a peak sale, concurrency was much higher. Multiple sessions ran the heavy report simultaneously—plus the normal application traffic. The larger temp tables stayed in memory longer, which sounded good, until the server started swapping. Latency exploded. The database didn’t crash outright; it did something worse. It became unpredictably slow.

The post-incident analysis showed the “optimization” didn’t reduce total work; it changed where the work happened. Disk spills became memory pressure, which became swap pressure, which became global slowdown. The original query was still poorly indexed and still scanned a large range; now it also competed harder for RAM.

The eventual fix was boring and effective: add a covering index to support the report’s filter and group key, and restrict the report job to a replica with resource limits. They kept temp sizes slightly higher than before, but within a range justified by concurrency math.

Mini-story #3: The boring but correct practice that saved the day

A financial services platform had a strict change process. People complained it was slow. They also had a habit of running realistic load tests for database config changes, including “what happens when three bad queries run at once.” It was tedious, and it worked.

Before a quarterly traffic bump, they reviewed their top temp-table creators from Performance Schema digest summaries. They didn’t chase every query—just the top offenders that correlated with p95 latency spikes. For each, they required either an index change, a query rewrite, or a documented acceptance that it would spill and therefore must run on a reporting replica.

They also enforced a simple storage rule: temp spill paths live on dedicated fast volumes with alerts on capacity and latency. Not because it was clever, but because it removed an entire failure mode: root filesystem saturation and surprise I/O contention.

When the traffic spike arrived, they still spilled to disk sometimes. But it was controlled, measurable, and didn’t interfere with transactional I/O. The incident that didn’t happen never gets budget, but it should.

Common mistakes: symptom → root cause → fix

1) Symptom: Created_tmp_disk_tables climbs fast during business hours

Root cause: A small number of high-frequency queries doing GROUP BY/ORDER BY without index support.

Fix: Find top digest offenders, add composite/covering indexes, reduce intermediate row counts. Then re-check rate, not totals.

2) Symptom: sudden storage latency spikes, but /tmp is quiet

Root cause: Spills happening in InnoDB temp tablespace on the datadir volume (common in MySQL 8.0 behavior).

Fix: Monitor InnoDB temp activity and datadir device latency. Consider moving workload, improving queries, and ensuring the datadir storage class can handle temp I/O.

3) Symptom: server swaps after you “fixed” spills by raising temp sizes

Root cause: Per-session memory increased under concurrency; aggregate memory use exceeded available RAM.

Fix: Roll back extreme buffer increases. Calculate worst-case memory. Prefer query/index improvements. Increase limits gradually with load tests.

4) Symptom: temp tables go to disk even with huge tmp_table_size

Root cause: Temp table contains BLOB/TEXT or other properties that force on-disk representation, or the query triggers materialization that doesn’t benefit from your size settings.

Fix: Narrow selected columns, avoid wide DISTINCT, rewrite derived tables, and confirm engine behavior for your exact version.

5) Symptom: “Copying to tmp table on disk” in processlist, but query is “small”

Root cause: Final output is small; intermediate result is not. Join order and filters create a large workspace.

Fix: Add selective predicates, index join keys, and ensure sargable conditions. Validate with EXPLAIN and actual row estimates.

6) Symptom: tmpdir fills up, MySQL errors, and apps fail in strange ways

Root cause: tmpdir on a small filesystem (often root), heavy spill workload, no alerting.

Fix: Move tmpdir to a dedicated volume. Add alerting on free space. Cap dangerous report workloads or move them off primary.

Checklists / step-by-step plan

Step-by-step: stopping disk temp tables the sane way

  1. Fingerprint the environment: confirm exact MySQL/MariaDB version, storage layout, tmpdir location.
  2. Measure spill rate: compute Created_tmp_disk_tables/sec during normal and incident windows.
  3. Correlate with storage: check device await, %iowait, and volume latency metrics.
  4. Identify top offenders: pull top statement digests by total time and frequency; isolate candidates with GROUP BY/ORDER BY/DISTINCT/derived tables.
  5. Explain and reproduce: run EXPLAIN on representative queries; confirm “Using temporary/filesort” and large row counts.
  6. Fix SQL/indexes first: composite indexes for filter+group, covering indexes for common patterns, remove SELECT * from DISTINCT queries.
  7. Then adjust thresholds: raise tmp_table_size/max_heap_table_size modestly if spills are still high and memory headroom exists.
  8. Harden tmp I/O path: move tmpdir to fast dedicated storage, ensure capacity alerts, and avoid sharing it with the OS root.
  9. Guardrails: restrict heavy reporting queries to replicas, schedule jobs, and cap concurrency at the app layer if needed.
  10. Verify outcome: compare before/after spill rate, p95 latency, and storage await. If those didn’t move, you didn’t fix the real problem.

A minimal “do not regret this later” checklist

  • tmpdir is not on root filesystem (unless root is truly fast and sized for it).
  • You can graph disk temp table rate, not just totals.
  • You know your top 5 temp-table-creating query digests.
  • You have a policy for reporting queries (replica, schedule, limits).
  • Changes to temp-related memory are load-tested under concurrency.

FAQ

1) Can I completely disable disk temp tables?

No, not realistically. You can reduce how often you spill and make spills less harmful. But some operations require workspace, and at scale you will spill sometimes. The real win is stopping pathological spills caused by bad plans and missing indexes.

2) If I set tmp_table_size to 1G, will that keep temp tables in memory?

Sometimes, for some temp tables. It can also trigger memory exhaustion under concurrency, or fail to help when data types/operations force disk usage. Large values are a tool, not a strategy.

3) Why do I see “Using temporary” but Created_tmp_disk_tables doesn’t increase?

Because a temp structure can exist in memory. “Using temporary” indicates a temp table is used, not that it spilled to disk. Also, the counters can be impacted by engine/version internals.

4) MySQL vs MariaDB: which one is better at avoiding disk temp tables?

Neither wins by default. The biggest factors are your version, your optimizer behavior, your SQL patterns, and your indexes. Pick based on ecosystem and operational fit, then tune based on measurements.

5) Should I put tmpdir on tmpfs to make spills “fast”?

Only if you have strong memory headroom and you’re comfortable with the failure mode. tmpfs makes spills fast right up until it makes the kernel angry. For most production systems, a dedicated fast disk is the safer choice.

6) Are disk temp tables always bad?

No. Occasional spills are normal. What’s bad is sustained high spill rate correlated with latency and I/O wait, or spills that fill filesystems and cause errors.

7) How do I know whether a spill is going to tmpdir or InnoDB temp tablespace?

Use a combination of evidence: tmpdir filesystem I/O and capacity, datadir device I/O, InnoDB temp status counters, and processlist states. Don’t rely on a single metric.

8) What’s the most effective fix for “Copying to tmp table on disk”?

Fix the query plan: add the right index, reduce intermediate result size, and avoid wide DISTINCT and expression sorts. Then ensure spills land on fast predictable storage when they still happen.

9) Is this a storage problem or a database problem?

Both, and that’s why it’s annoying. Bad SQL creates spills; slow or contended storage turns spills into outages. Treat it as a full-stack performance issue: query plan + memory thresholds + I/O path.

Conclusion: next steps that actually move the needle

Disk temp tables aren’t a moral failing. They’re a hint. They tell you the optimizer needed a scratchpad and didn’t fit it in memory—or didn’t even try because the operation demanded disk-friendly structures. Your job is to decide whether that scratchpad is legitimate work or a symptom of sloppy SQL and missing indexes.

Do the next three things:

  1. Measure spill rate during pain (Created_tmp_disk_tables/sec) and correlate it with device latency.
  2. Fix the top offenders with indexes and query rewrites. Don’t tune globally until you know what’s spilling.
  3. Make spills land somewhere sane: dedicated fast tmp storage, capacity alerts, and a plan for report workloads.

Reliability rule of thumb, paraphrased idea from Werner Vogels: you don’t build reliable systems by hoping; you build them by designing for failure and measuring reality.

← Previous
Debian 13 “Unable to locate package”: repo, architecture, and sources.list traps (and fixes)
Next →
Ubuntu 24.04: logrotate isn’t rotating — the one config mistake that keeps biting people

Leave a comment