Docker CPU at 100%: Find the Noisy Container and Cap It Properly

The host is pinned. Load average is climbing like it has somewhere important to be. SSH feels sticky. Your dashboards say “CPU 100%,” but they don’t say whose fault it is. Docker is involved, which means the problem is either neatly contained… or beautifully distributed.

This is the playbook I use when a Linux box is burning cycles and containers are the prime suspects. It’s practical, it’s a little opinionated, and it will help you identify the noisy container, prove it’s the real bottleneck, and cap it in a way that doesn’t boomerang into latency, throttling, or weird scheduler behavior.

Fast diagnosis playbook

When CPU is pegged, you don’t need a meditation retreat. You need a fast triage loop that distinguishes:

  • One container burning CPU vs. many containers each “a little” hot
  • CPU saturation vs. CPU throttling vs. run queue contention
  • Real work vs. spin loops vs. kernel overhead

First: confirm it’s CPU and not “CPU-shaped” I/O

  • Run uptime and top to see load average vs. CPU idle.
  • If load is high but CPU idle is also high, you’re likely blocked on I/O or locks.

Second: identify the container(s) responsible

  • Use docker stats --no-stream for a quick ranking.
  • Then map host PIDs back to containers (because “docker stats” can lie by omission when things get weird).

Third: decide whether to cap, scale, or fix the code

  • Cap when one workload is bullying neighbors and you can tolerate more latency for that workload.
  • Scale when the workload is legitimate and throughput matters.
  • Fix when you see spin loops, retries, hot locks, or pathological GC.

Fourth: apply a cap that matches your scheduler reality

  • On Linux, Docker CPU control is cgroups. Your limits are only as sane as your cgroup version and kernel behavior.
  • Pick CPU quota for “this container can use at most N CPUs’ worth of time.” Pick cpuset for “this container can only run on these cores.”

If you’re on-call and need a one-liner: identify the top host PID, map it to a container, then check whether it’s being throttled already. Capping a container that’s already throttled is like telling someone to “calm down” while you’re holding their head underwater.
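
In command form, a minimal sketch of that one-liner chain. It assumes cgroups v2 with container cgroups under /sys/fs/cgroup/docker/<id> (the layout used throughout this article; systemd-driver hosts put them under system.slice instead):

cr0x@server:~$ PID=$(ps -eo pid,pcpu --sort=-pcpu | awk 'NR==2{print $1}')   # hottest host PID
cr0x@server:~$ CID=$(awk -F/ '/docker/{print $NF}' /proc/$PID/cgroup)        # container ID, if the cgroup path ends in /docker/<id>
cr0x@server:~$ grep -E 'nr_throttled|throttled_usec' /sys/fs/cgroup/docker/$CID/cpu.stat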

What “CPU 100%” really means on Docker

“CPU 100%” is one of those metrics that sounds precise and behaves like gossip.

  • On a 4-core host, a single fully busy core is 25% of total CPU if you’re measuring total capacity.
  • In Docker tooling, container CPU percent is usually normalized to a single core, so a container using four cores reads as roughly 400%; other tools normalize to total host capacity instead.
  • In cgroups, CPU usage is measured as time (nanoseconds). Limits are enforced as quotas per period, not “percent” in a human sense.

The key operational insight: you can have a host showing “100% CPU” while the important user-facing service is slow because it’s being throttled, starved by run queue contention, or losing time to steal on a hypervisor.
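
Because the accounting is time-based, you can compute how many CPUs a container is actually consuming without trusting any tool’s percent normalization. A minimal sketch, assuming cgroups v2 and $CID set to the full container ID (as in the tasks below):

cr0x@server:~$ F=/sys/fs/cgroup/docker/$CID/cpu.stat
cr0x@server:~$ A=$(awk '/^usage_usec/{print $2}' $F); sleep 5; B=$(awk '/^usage_usec/{print $2}' $F)
cr0x@server:~$ echo "scale=2; ($B - $A) / 5000000" | bc    # delta CPU time over 5 s, expressed as "CPUs consumed"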

Here’s the mental model that won’t betray you:

  • CPU usage tells you how much time was spent running.
  • Run queue tells you how many threads want to run but can’t.
  • Throttling tells you the kernel actively prevented a cgroup from running because it hit quota.
  • Steal time tells you the VM wanted CPU but the hypervisor said “not now.”

Facts and context: why this is trickier than it looks

Some context points that matter in production because they explain the weirdness you’ll see in output:

  1. Docker CPU limits are cgroup limits. Docker didn’t invent CPU isolation; it’s a wrapper around Linux cgroups and namespaces.
  2. cgroups v1 vs v2 changes the plumbing. Many “why doesn’t this file exist?” debugging sessions are just “you’re on v2 now.” (A quick way to check follows this list.)
  3. CFS bandwidth control (CPU quota/period) was merged into the Linux kernel long before containers were mainstream; containers popularized it, but they didn’t originate it.
  4. CPU shares are not a hard cap. Shares are a weight used only under contention; they won’t stop a container from using idle CPU.
  5. “cpuset” is older and blunt. Pinning to cores is deterministic but can waste CPU if you pin poorly or ignore NUMA realities.
  6. Throttling can look like “low CPU usage.” A container can be slow while reporting modest CPU because it’s spending time blocked by quota enforcement.
  7. Load average includes more than CPU. On Linux, load average counts tasks in uninterruptible sleep too, so storage issues can masquerade as “CPU problems.”
  8. Virtualization adds steal time. On oversubscribed hosts, the VM’s “CPU 100%” might be mostly “I would have run if I could.”
  9. Monitoring has a long tail of lies. CPU metrics are easy to collect and easy to misinterpret; differences in sampling windows and normalization create phantom spikes.
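
A quick way to confirm which cgroup plumbing you’re on before hunting for files that don’t exist in your version:

cr0x@server:~$ stat -fc %T /sys/fs/cgroup/
cgroup2fs

cgroup2fs means the unified v2 hierarchy (cpu.max, cpu.stat, cpu.weight); tmpfs means v1, where the equivalents live under the cpu controller as cpu.cfs_quota_us, cpu.cfs_period_us, and cpu.shares.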

One quote worth keeping on your desk:

Werner Vogels (paraphrased idea): “Everything fails; design so failure is expected and handled, not treated as an exception.”

Hands-on tasks: commands, outputs, decisions

These are real tasks I expect an on-call engineer to run. Each one includes: command, what output means, and what decision you make from it. Don’t run them all blindly; run them as a branching investigation.

Task 1: See if the host is actually CPU-saturated

cr0x@server:~$ uptime
 14:22:19 up 37 days,  6:11,  2 users,  load average: 18.42, 17.96, 16.10

What it means: Load average ~18 on an 8-core box is trouble; on a 32-core box it might be fine. Load alone is not proof of CPU saturation.

Decision: Next, check CPU idle and run queue with top or mpstat. If CPU idle is high, pivot to I/O or locks.

Task 2: Check CPU idle, steal time, and the top offenders

cr0x@server:~$ top -b -n1 | head -25
top - 14:22:27 up 37 days,  6:11,  2 users,  load average: 18.42, 17.96, 16.10
Tasks: 512 total,   9 running, 503 sleeping,   0 stopped,   0 zombie
%Cpu(s): 94.7 us,  2.1 sy,  0.0 ni,  0.6 id,  0.0 wa,  0.0 hi,  0.3 si,  2.3 st
MiB Mem :  32114.2 total,   1221.4 free,  14880.3 used,  16012.5 buff/cache
MiB Swap:   2048.0 total,   2048.0 free,      0.0 used.  14880.9 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
21483 root      20   0 1614920  82364  19624 R 380.0   0.3  76:12.39 python
12914 root      20   0 2344812 109004  40220 R 160.0   0.3  21:44.07 node
 9881 root      20   0  986604  61172  17640 R  95.0   0.2  12:11.88 java

What it means: CPU idle ~0.6%: you’re CPU-saturated. Steal time is 2.3%: not huge, but it’s telling you the hypervisor is taking some cycles.

Decision: Identify these PIDs: are they in containers? If yes, map them to container IDs. If no, you have a host process issue (or a container running with host PID namespace—yes, people do that).

Task 3: Quick container ranking via docker stats

cr0x@server:~$ docker stats --no-stream
CONTAINER ID   NAME                    CPU %     MEM USAGE / LIMIT     MEM %     NET I/O           BLOCK I/O         PIDS
a12b3c4d5e6f   api-prod-1              265.42%   612.4MiB / 2GiB        29.90%    1.2GB / 980MB    11.2MB / 0B      78
b98c7d6e5f4a   worker-prod-queue       410.11%   1.1GiB / 1GiB          110.02%   120MB / 98MB     2.4GB / 1.9GB    213
c11d22e33f44   redis-prod              18.33%    3.2GiB / 4GiB          80.00%    420MB / 390MB    0B / 0B          6

What it means: worker-prod-queue is both CPU-hot and over memory limit (110% implies swap/oom risk depending on settings). The CPU numbers over 100% indicate multi-core usage.

Decision: Don’t cap CPU yet. First confirm whether that worker is doing legitimate work, thrashing due to memory pressure, or spinning. Memory pressure can cause CPU burn via GC, allocation churn, or retry storms.

Task 4: Map a host PID to its container

cr0x@server:~$ ps -o pid,comm,cgroup -p 21483 | sed -n '1,3p'
  PID COMMAND         CGROUP
21483 python          0::/docker/b98c7d6e5f4a9c5b6a3e1c2d9a0b1c2d3e4f5a6b7c8d9e0f1a2b3c4d5e6f7a

What it means: This host PID belongs to a Docker cgroup (cgroups v2 unified path shown as 0::). The long hash matches a container ID prefix.

Decision: Confirm with docker ps and docker inspect, then investigate inside that container.

Task 5: Confirm the container identity and entrypoint

cr0x@server:~$ docker ps --no-trunc | grep b98c7d6e5f4a
b98c7d6e5f4a9c5b6a3e1c2d9a0b1c2d3e4f5a6b7c8d9e0f1a2b3c4d5e6f7a   acme/worker:7f3c1d   "python /app/worker.py"   3 days ago   Up 3 days   worker-prod-queue

What it means: It’s a Python worker, probably queue consumers or batch jobs.

Decision: If it’s a batch worker, CPU caps are often safe. If it’s latency-sensitive (API), caps can become self-inflicted outages.

Task 6: Check whether the container is already capped

cr0x@server:~$ docker inspect -f 'NanoCpus={{.HostConfig.NanoCpus}} CpuQuota={{.HostConfig.CpuQuota}} CpuPeriod={{.HostConfig.CpuPeriod}} CpusetCpus={{.HostConfig.CpusetCpus}} CpuShares={{.HostConfig.CpuShares}}' worker-prod-queue
NanoCpus=0 CpuQuota=0 CpuPeriod=0 CpusetCpus= CpuShares=0

What it means: No explicit CPU caps or weights are configured. It can consume all CPU it can get.

Decision: You have the option to cap. But first, verify what it’s doing, and check throttling metrics after you apply a cap.

Task 7: Check cgroup v2 CPU stats for throttling

cr0x@server:~$ CID=b98c7d6e5f4a9c5b6a3e1c2d9a0b1c2d3e4f5a6b7c8d9e0f1a2b3c4d5e6f7a
cr0x@server:~$ CGP=$(docker inspect -f '{{.HostConfig.CgroupParent}}' $CID); echo "${CGP:-/sys/fs/cgroup}"
/sys/fs/cgroup
cr0x@server:~$ cat /sys/fs/cgroup/docker/$CID/cpu.stat
usage_usec 129884420
user_usec 127110003
system_usec 2774417
nr_periods 0
nr_throttled 0
throttled_usec 0

What it means: On this host, the container’s cgroup exists under /sys/fs/cgroup/docker/<id>. No throttling yet because there’s no quota configured.

Decision: If you later set a quota and nr_throttled climbs rapidly with high throttled_usec, you’ve created a throughput/latency ceiling. That may be correct, but it must be intentional.

Task 8: Validate whether the “CPU issue” is actually kernel time or user time

cr0x@server:~$ pidstat -p 21483 1 3
Linux 6.5.0-18-generic (server) 	01/02/2026 	_x86_64_	(16 CPU)

14:23:31      UID       PID    %usr %system  %guest   %wait    %CPU   CPU  Command
14:23:32        0     21483   92.00    6.00    0.00    0.00   98.00     7  python
14:23:33        0     21483   93.00    5.00    0.00    0.00   98.00     7  python
14:23:34        0     21483   90.00    7.00    0.00    0.00   97.00     7  python

What it means: Mostly user time. That points to application-level compute or a tight loop, not kernel overhead.

Decision: Profile inside the container (or on the host for that PID) and look for hot functions, busy-wait loops, or a queue backlog driving legitimate work.

Task 9: Enter the container and see if it’s a thread party

cr0x@server:~$ docker exec -it worker-prod-queue bash -lc 'ps -eLo pid,tid,pcpu,comm --sort=-pcpu | head'
  PID   TID %CPU COMMAND
    1     1 96.4 python
    1    42 92.1 python
    1    43 91.8 python
    1    44 90.9 python
    1    45 90.2 python
   88    88  1.1 bash

What it means: Multiple threads are hot. For Python this can mean multiple processes/threads, C extensions doing work, or something like gevent/eventlet that still burns CPU.

Decision: If it’s a worker pool, cap it or reduce concurrency. If it’s not supposed to be multi-threaded, check for accidental parallelism (e.g., a library spawning threads, or a configuration change).

Task 10: Use perf to find hot spots (host-side, no container tooling required)

cr0x@server:~$ sudo perf top -p 21483
Samples: 1K of event 'cycles', 4000 Hz, Event count (approx.): 250000000
Overhead  Shared Object          Symbol
  22.11%  python                 [.] _PyEval_EvalFrameDefault
  15.37%  python                 [.] PyObject_RichCompare
  10.02%  libc.so.6              [.] __memcmp_avx2_movbe
   7.44%  python                 [.] list_contains
   6.98%  python                 [.] PyUnicode_CompareWithASCIIString

What it means: CPU is going into interpreter evaluation and comparisons. That’s real compute, not a kernel bug. It also suggests the workload might be data-heavy comparisons (filters, dedupe, scanning).

Decision: For immediate containment: cap CPU or throttle work concurrency. For long-term: profile at application level; maybe you’re doing O(n²) comparisons on a queue batch.

Joke #1: If your worker is doing O(n²) inside a loop, congratulations—it’s reinvented the space heater.
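
If the hot process is Python, as it is here, a sampling profiler turns “the interpreter is busy” into “this function is busy.” A minimal sketch, assuming py-spy is installed on the host (it is not part of Docker or this image; the PID is the one from Task 2):

cr0x@server:~$ sudo py-spy dump --pid 21483    # one-shot stack snapshot of every thread
cr0x@server:~$ sudo py-spy top --pid 21483     # live, per-function CPU attribution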

Task 11: Check run queue pressure per CPU

cr0x@server:~$ mpstat -P ALL 1 2
Linux 6.5.0-18-generic (server) 	01/02/2026 	_x86_64_	(16 CPU)

14:24:31     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %idle
14:24:32     all    92.11    0.00    5.02    0.00    0.00    0.44    2.12    0.31
14:24:32       7    99.00    0.00    0.90    0.00    0.00    0.10    0.00    0.00
14:24:32       8    97.00    0.00    2.70    0.00    0.00    0.30    0.00    0.00

What it means: Several CPUs are basically maxed out. If only a couple cores were hot, you’d consider cpuset pinning or checking single-threaded bottlenecks.

Decision: If the host is globally saturated, capping one container is a fairness move. If only one core is hot, capping by quota won’t fix single-thread limits; you’d fix concurrency or pinning.

Task 12: Inspect container resource constraints in cgroups v2 (quota and effective CPUs)

cr0x@server:~$ cat /sys/fs/cgroup/docker/$CID/cpu.max
max 100000
cr0x@server:~$ cat /sys/fs/cgroup/docker/$CID/cpuset.cpus.effective
0-15

What it means: cpu.max holds two fields, quota and period. A quota of max means unlimited; the period is 100000 microseconds (100 ms). Effective CPUs show the container can run on all 16 CPUs.

Decision: If you want “2 CPUs worth” of time, you’ll set quota to 200000 for period 100000, or use Docker’s --cpus=2 convenience flag.
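
The arithmetic is simply quota divided by period. A sketch of the explicit flag form (the container name is a placeholder; verify with docker inspect and cpu.max afterward, as always):

docker update --cpu-period 100000 --cpu-quota 200000 <container>    # 200000 / 100000 = 2.0 CPUs, same intent as --cpus 2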

Task 13: Apply a CPU cap live (carefully) and verify throttling

cr0x@server:~$ docker update --cpus 4 worker-prod-queue
worker-prod-queue
cr0x@server:~$ docker inspect -f 'CpuQuota={{.HostConfig.CpuQuota}} CpuPeriod={{.HostConfig.CpuPeriod}} NanoCpus={{.HostConfig.NanoCpus}}' worker-prod-queue
CpuQuota=400000 CpuPeriod=100000 NanoCpus=4000000000
cr0x@server:~$ cat /sys/fs/cgroup/docker/$CID/cpu.max
400000 100000

What it means: The container can now consume up to 4 CPUs worth of time per 100ms period. Docker translated --cpus into quota/period.

Decision: Watch host CPU and service latency. If the noisy container is non-critical, this is often the right immediate fix. If it’s critical, you may need to allocate more CPU or scale out instead.

Task 14: Confirm whether the cap is causing throttling (and whether it’s acceptable)

cr0x@server:~$ sleep 2; cat /sys/fs/cgroup/docker/$CID/cpu.stat
usage_usec 131992884
user_usec 129050221
system_usec 2942663
nr_periods 2201
nr_throttled 814
throttled_usec 9811123

What it means: Throttling is happening (nr_throttled increased). The container is hitting its CPU quota. That’s expected when you cap a hot workload.

Decision: Decide if throttling is the goal (protect other services) or a sign you capped too hard (throughput collapse, backlog growth). Check queue depth and latency. If backlog grows, either raise cap or scale out workers.

Task 15: Identify the container’s top threads from the host (no exec needed)

cr0x@server:~$ ps -T -p 21483 -o pid,tid,pcpu,comm --sort=-pcpu | head
  PID   TID %CPU COMMAND
21483 21483 96.2 python
21483 21510 92.0 python
21483 21511 91.7 python
21483 21512 90.5 python
21483 21513 90.1 python

What it means: The hot threads are visible from the host. Useful when containers are minimal images without debugging tools.

Decision: If one thread dominates, you’re in single-thread land. If many threads are hot, quota caps will behave more predictably.

Task 16: Check if you’re fighting CPU steal (VM host oversubscription)

cr0x@server:~$ vmstat 1 5
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
13  0      0 125132 210204 8023412    0    0     0     5 2841 7102 91  5  2  0  2
18  0      0 124980 210204 8023520    0    0     0     0 2911 7299 90  5  2  0  3
16  0      0 124900 210204 8023604    0    0     0     0 2780 7010 89  6  2  0  3
14  0      0 124820 210204 8023710    0    0     0    10 2894 7255 90  5  2  0  3
15  0      0 124700 210204 8023794    0    0     0     0 2810 7098 91  5  1  0  3

What it means: st (steal) is 2–3%. Not catastrophic. If you see 10–30%, you’re not “CPU limited,” you’re “hardware neighbor limited.”

Decision: If steal is high, caps won’t fix the experience. Move workloads, resize instances, or change host placement. Otherwise you’re optimizing the wrong layer.

How to cap CPU properly (without self-sabotage)

“Just cap it” is how you create the next incident. CPU limits are a contract: you’re telling the kernel, “This workload may be slowed down to protect the rest.” Make that contract explicit and testable.

Pick the right control: quota vs shares vs cpuset

1) CPU quota/period (the default sane cap)

Use when: you want a container to get at most N CPUs worth of time, but still allow the scheduler to place it across cores.

  • Docker flag: --cpus 2 (convenience) or --cpu-quota and --cpu-period.
  • Kernel mechanism: CFS bandwidth control.

What can go wrong: aggressive caps cause heavy throttling, which can create bursty latency. Your app becomes a metronome: runs, gets throttled, runs again.
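
If you want to see that metronome in a sandbox before you inflict it on production, a throwaway sketch using the same cgroup layout as above (image, container name, and the 0.25 cap are arbitrary; any CPU-bound loop will do):

cr0x@server:~$ docker run -d --name burn --cpus 0.25 python:3.12-slim python -c 'while True: pass'
cr0x@server:~$ sleep 10; BID=$(docker inspect -f '{{.Id}}' burn)
cr0x@server:~$ grep -E 'nr_periods|nr_throttled|throttled_usec' /sys/fs/cgroup/docker/$BID/cpu.stat
cr0x@server:~$ docker rm -f burn    # clean up

With a cap far below what the loop wants, nr_throttled climbs every period: that’s the metronome.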

2) CPU shares (a fairness weight, not a limit)

Use when: you want relative priority between containers under contention, but you’re OK with any container using spare CPU when the host is idle.

  • Docker flag: --cpu-shares.

What can go wrong: people set shares expecting a cap, then wonder why a runaway container still pegs the host at night.

3) Cpuset (pinning to cores)

Use when: you have licensing constraints, you’re isolating noisy neighbors, or you’re managing NUMA/cache locality intentionally. This is for adults who like graphs.

  • Docker flag: --cpuset-cpus 0-3.

What can go wrong: pinning to the “wrong” cores can collide with IRQ affinity, other pinned workloads, or leave half the machine idle while one core is melting.
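
A sketch of the pinning form, plus the topology check to do first (the core range and container name are placeholders; pick your cores from the topology output, not from a blog post):

cr0x@server:~$ lscpu -e=CPU,NODE,SOCKET,CORE | head -6    # which logical CPUs share cores and NUMA nodes
cr0x@server:~$ docker update --cpuset-cpus 4-7 <container>
cr0x@server:~$ cat /sys/fs/cgroup/docker/<full-id>/cpuset.cpus.effective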

My preferred approach in production

  1. Start with a quota cap using --cpus, set high enough to avoid constant throttling.
  2. Measure throttling via cpu.stat after the change. Throttling isn’t automatically bad; unexpected throttling is.
  3. If you need stronger isolation (e.g., multi-tenant), add cpuset pinning, but only after you’ve audited CPU topology and interrupt distribution.
  4. Use shares to bias critical services higher than best-effort ones, but don’t pretend it’s a seatbelt.

How much CPU should you give?

Don’t guess. Use the workload’s observed CPU demand and business tolerance.

  • For API services, keep enough CPU to protect p99 latency. If CPU is hot, scaling out is often safer than capping.
  • For workers/batch, cap for fairness and set concurrency to match the cap (a sketch of one way to do that follows this list). Otherwise you’ll just throttle a stampede.
  • For databases, be careful: CPU caps can amplify tail latency and create lock contention. Prefer dedicated nodes or cpusets if you must isolate.
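
One way to keep worker concurrency and the CPU contract from drifting apart is to derive one from the other at container start. A minimal sketch, assuming cgroups v2 (where the container sees its own limit at /sys/fs/cgroup/cpu.max) and an application that reads its worker count from an environment variable; WORKER_CONCURRENCY is a hypothetical name, substitute whatever your framework uses:

#!/usr/bin/env bash
# entrypoint snippet: derive worker count from the cgroup CPU quota
read -r quota period < /sys/fs/cgroup/cpu.max
if [ "$quota" = "max" ]; then
  cpus=$(nproc)                                # no quota set: fall back to the CPUs the container can see
else
  cpus=$(( (quota + period - 1) / period ))    # ceil(quota / period)
fi
export WORKER_CONCURRENCY="$cpus"
exec python /app/worker.py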

Throttle-aware validation: what to watch after capping

After you apply a cap, check these signals:

  • Host load and CPU idle: Did other services recover?
  • Container throttling: Does nr_throttled climb constantly?
  • Queue depth/backlog: If backlog grows, you reduced throughput below arrival rate.
  • Latency/error rates: For synchronous services, caps often show up as timeouts, not as clean “slower responses.”
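
A small watch loop for the throttling signal, so you’re reading deltas instead of lifetime counters. Assumes cgroups v2, the /sys/fs/cgroup/docker/<id> layout used earlier, and $CID set to the full container ID:

#!/usr/bin/env bash
# Print throttled periods and throttled time per 10-second window for one container cgroup.
F=/sys/fs/cgroup/docker/$CID/cpu.stat
prev_t=$(awk '/^nr_throttled/{print $2}' "$F")
prev_u=$(awk '/^throttled_usec/{print $2}' "$F")
while sleep 10; do
  t=$(awk '/^nr_throttled/{print $2}' "$F")
  u=$(awk '/^throttled_usec/{print $2}' "$F")
  echo "$(date +%T) throttled_periods=$((t - prev_t)) throttled_ms=$(( (u - prev_u) / 1000 ))"
  prev_t=$t; prev_u=$u
done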

Joke #2: CPU quotas are like corporate budgets—everyone hates them, but the alternative is one team buying six espresso machines and calling it “infrastructure.”

Compose, Swarm, and the “why didn’t my limit work?” trap

There are two classes of “my CPU limit doesn’t work” tickets:

  1. It was never applied. The orchestrator ignored it or you put it in the wrong section.
  2. It was applied, but you measured wrong. You expected “50%” and got “still hot” because the host has many cores, or because the workload bursts and gets throttled later.

Docker Compose: version pitfalls

Compose has historically had two places for limits: the older cpu_shares/cpus style and the Swarm-style deploy.resources. The catch: non-Swarm Compose ignores deploy limits in many setups. People paste configs from blog posts and assume the kernel obeys YAML.

If you want a reliable local/Compose cap, validate it with docker inspect after docker compose up. Don’t trust the file.

cr0x@server:~$ docker compose ps
NAME             IMAGE                COMMAND                   SERVICE   CREATED         STATUS         PORTS
stack_worker_1   acme/worker:7f3c1d   "python /app/worker.py"   worker    2 minutes ago   Up 2 minutes

cr0x@server:~$ docker inspect -f 'CpuQuota={{.HostConfig.CpuQuota}} CpuPeriod={{.HostConfig.CpuPeriod}} NanoCpus={{.HostConfig.NanoCpus}}' stack_worker_1
CpuQuota=200000 CpuPeriod=100000 NanoCpus=2000000000

What it means: The limits are actually applied. If they show zeros, your Compose settings didn’t land.

Decision: Fix the configuration so limits are applied where your runtime honors them, or enforce via docker update and then codify it properly.
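
A hedged example of codifying the cap in Compose. Which key your setup actually honors depends on the Compose version and whether you’re in Swarm mode, which is exactly why the docker inspect check above is non-negotiable (service and image names follow this article; the numbers are examples):

services:
  worker:
    image: acme/worker:7f3c1d
    deploy:
      resources:
        limits:
          cpus: "2.0"      # Swarm-style limit; honored by Swarm, and by many newer docker compose versions
          memory: 1G
    # Older service-level form some non-Swarm setups rely on instead:
    # cpus: 2.0
    # mem_limit: 1g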

Swarm and Kubernetes differences (operational, not philosophical)

  • In Swarm, deploy.resources.limits.cpus is real and enforced because Swarm schedules tasks with those constraints.
  • In Kubernetes, CPU limits and requests interact with QoS classes. Limits can throttle; requests influence scheduling. “I set a limit” is not the same as “I guaranteed CPU.”

If you’re debugging on Docker hosts but your mental model comes from Kubernetes, be careful: you may be missing the “request vs limit” nuance that affects noisy neighbor behavior.

Three corporate mini-stories from the CPU trenches

Mini-story 1: The incident caused by a wrong assumption

The company had a Docker host running “just a few services.” That phrase is a lie people tell themselves to feel in control. One of the services was an API, another was a queue worker, and a third was a metrics sidecar that nobody wanted to touch because it “worked.”

During a busy period, the host started hitting 100% CPU. The on-call engineer opened the dashboard, saw the API’s CPU line spiking, and made the reasonable-but-wrong assumption: “The API is the problem.” They capped the API container to 1 CPU with a live update. The host CPU dropped. Everyone relaxed for about eight minutes.

Then the error rate rose. Timeouts appeared. The API wasn’t “fixed”; it was strangled. The real culprit was the worker container flooding Redis with retries because it had hit its memory limit and was thrashing under memory pressure inside its cgroup. The worker created a retry storm. The API merely suffered while trying to keep up.

What made this incident educational was the postmortem detail: the API container’s CPU spikes were a symptom of downstream contention and retries, not a root cause. When the API was capped, it couldn’t even respond quickly enough to shed load gracefully. The worker was still loud, the API just became weak.

The fix wasn’t dramatic. They removed the API cap, added a quota cap to the worker, and—this part is always awkward—reduced worker concurrency so it matched the new CPU contract. The retry storm stopped. Redis calmed down. CPU normalized. The lesson stuck: cap the bully, not the victim.

Mini-story 2: The optimization that backfired

A different team had a batch ingestion pipeline in containers. They wanted more throughput and noticed that CPU was underutilized during off-peak hours. Someone proposed “more parallelism” by increasing worker threads from 4 to 32. It sounded modern. It looked great in a quick test. In production, it turned into a slow-motion meltdown.

The pipeline was parsing compressed data and doing schema validation. With 32 threads per container and several containers per host, they created a thundering herd on CPU and memory. Context switching climbed. Cache locality got worse. The host scheduler did its best impression of a juggler in a windstorm.

They tried capping CPU to “stabilize” it. It stabilized, yes—in the same way a car “stabilizes” when it hits a wall. Throttling went through the roof and throughput collapsed. Work queues backed up, and the team started scaling out containers. That made contention worse, because the bottleneck was the host’s CPU and memory bandwidth, not the number of containers.

What finally fixed it was boring: they dialed worker threads back down, then increased container count slightly but pinned the batch workload to a cpuset range away from latency-sensitive services. They also learned to measure nr_throttled and not just CPU percent. The “optimization” backfired because it assumed CPU is linear. In real systems, parallelism competes with itself.

Mini-story 3: The boring but correct practice that saved the day

There’s a kind of organization that doesn’t do heroics because it doesn’t need to. One platform team had a hard rule: every container in production must declare CPU and memory constraints, and they must be validated by automation after deployment.

Engineers complained. They said it slowed shipping. They said it was “Kubernetes thinking” even though this was plain Docker. The platform team ignored them politely and kept enforcing. They also required that each service define a “degradation mode” decision: when CPU is constrained, do you drop work, queue it, or fail fast?

One afternoon, a third-party library update introduced a regression: a busy loop triggered under a rare input pattern. A subset of requests caused CPU spikes. Normally this would have taken the host down and created an incident spanning multiple services.

Instead, the affected container hit its CPU quota and got throttled. It became slower, yes, but it didn’t starve the rest of the node. The other services kept serving. The monitoring showed a clean signal: throttling for that one service rose. The on-call quickly rolled back. No cascading failure, no “all services degraded,” and no midnight war room.

The boring practice—always set limits, always validate them, always define behavior under constraint—didn’t just prevent resource contention. It made the failure mode legible.

Common mistakes: symptoms → root cause → fix

This is the section where we save you from repeating the greatest hits.

1) Symptom: Host CPU is 100%, but docker stats shows nothing extreme

Root cause: Host processes are hot (journald, node exporter, kernel threads), or containers run with host PID namespace, or your docker stats sampling misses short spikes.

Fix: Use host PID tools first: top, then map hot PIDs to containers via ps -o cgroup. If no Docker cgroup, it’s not a container problem.

2) Symptom: After setting --cpus, the service gets slower and timeouts increase

Root cause: You capped a latency-sensitive service below its p99 needs, causing throttling bursts and request queueing.

Fix: Remove or raise the cap; scale horizontally; add backpressure and concurrency limits. Measure cpu.stat throttling and request latency together.

3) Symptom: You set CPU shares but the container still pegs the host

Root cause: CPU shares are weights, not a cap. With idle CPU available, the container can take it.

Fix: Use --cpus or --cpu-quota for a hard cap. Keep shares for prioritization under contention.

4) Symptom: Load average is huge, but CPU idle is also high

Root cause: Blocked tasks (I/O wait, uninterruptible sleep), lock contention, or filesystem stalls. Load average counts more than runnable tasks.

Fix: Check top for wa, use iostat (if available), inspect blocked tasks, and look for storage/network bottlenecks. Don’t “cap CPU” for an I/O problem.
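
To see whether blocked tasks, not runnable ones, are inflating the load average, a quick sketch:

cr0x@server:~$ ps -eo state,pid,comm | awk '$1=="D"'    # tasks in uninterruptible sleep, usually stuck on I/O
cr0x@server:~$ iostat -x 1 3                            # per-device utilization and await, if sysstat is installed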

5) Symptom: CPU is hot only on one core, and performance is terrible

Root cause: Single-threaded bottleneck, global lock, or one hot shard/partition.

Fix: Don’t add CPU caps; fix concurrency or partitioning. If you must isolate, cpuset pinning can stop one hot thread from interfering with everything else, but it won’t make it faster.

6) Symptom: CPU usage looks fine, but the service is slow

Root cause: Throttling. The container is capped and spends time unable to run; CPU percent can look moderate because throttled time is not “CPU usage.”

Fix: Read cpu.stat (nr_throttled, throttled_usec). Raise the cap, reduce concurrency, or scale out.

7) Symptom: CPU spikes happen after you “optimized” logging or metrics

Root cause: High-cardinality metrics, expensive log formatting, synchronous logging, or contention in telemetry pipelines.

Fix: Reduce cardinality, sample, batch, or move heavy formatting off hot paths. Cap the telemetry sidecars too; they’re not innocent.

8) Symptom: After pinning cpuset, throughput drops and some CPUs are idle

Root cause: Bad core selection, interference with IRQs, NUMA mismatch, or pinning too few cores for burst patterns.

Fix: Prefer quota first. If using cpuset, audit CPU topology, consider NUMA nodes, and keep headroom for kernel/interrupt work.

Checklists / step-by-step plan

Step-by-step: locate the noisy container

  1. Measure the host: uptime, top. Confirm CPU idle is low and user/system split.
  2. Rank containers quickly: docker stats --no-stream.
  3. Rank host PIDs: in top sort by CPU, copy top PIDs.
  4. Map PID → cgroup: ps -o cgroup -p <pid>. If it’s under /docker/<id>, you have your container.
  5. Confirm container name/image: docker ps --no-trunc | grep <id> and docker inspect.
  6. Validate what it’s doing: pidstat, perf top, or inside-container ps.

Step-by-step: cap it safely

  1. Decide the goal: protect other workloads vs preserve throughput for this one.
  2. Pick the control: quota (--cpus) for most cases; shares for relative weighting; cpuset for hard isolation.
  3. Apply cap live (if needed): docker update --cpus N <container>.
  4. Verify it applied: docker inspect and cat cpu.max (v2) or the equivalent v1 files.
  5. Measure throttling: check cpu.stat after a few seconds.
  6. Watch service SLOs: latency, errors, queue depth, retries. If they degrade, raise cap or scale out.
  7. Make it permanent: update Compose/Swarm config or your deployment pipeline; don’t leave live docker update as tribal magic.

Step-by-step: prevent recurrence

  1. Set defaults: every service declares CPU and memory constraints.
  2. Validate automatically: post-deploy checks compare desired limits vs docker inspect.
  3. Instrument throttling: alert on sustained nr_throttled increases for critical services.
  4. Align concurrency: worker pool sizes should scale with CPU caps; avoid “32 threads because cores exist somewhere.”
  5. Kill switches: feature flags or rate limits to reduce work creation during spikes.
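
A minimal sketch of the post-deploy validation in step 2, flagging running containers that ended up with no CPU or memory limit (what counts as exempt is your policy call):

#!/usr/bin/env bash
# Warn about running containers with no CPU cap and no memory limit configured.
for c in $(docker ps -q); do
  read -r name nano quota mem <<< "$(docker inspect -f \
    '{{.Name}} {{.HostConfig.NanoCpus}} {{.HostConfig.CpuQuota}} {{.HostConfig.Memory}}' "$c")"
  if [ "$nano" = "0" ] && [ "$quota" = "0" ]; then
    echo "WARN: ${name#/} has no CPU cap"
  fi
  if [ "$mem" = "0" ]; then
    echo "WARN: ${name#/} has no memory limit"
  fi
done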

FAQ

1) Why does docker stats show 400% CPU for one container?

Because it’s using ~4 cores worth of CPU time during the sampling window. CPU% is often normalized to one core, so multi-core usage exceeds 100%.

2) Is --cpus the same as --cpuset-cpus?

No. --cpus uses CFS quota/period (time-based cap). --cpuset-cpus restricts which CPUs the container may run on (placement-based isolation). They solve different problems.

3) Will CPU shares prevent a runaway container from pegging the host?

Only when there’s contention. If the host has idle CPU, shares won’t stop a container from consuming it. Use quota for a hard cap.

4) How do I know if throttling is hurting me?

Read cpu.stat. If nr_throttled and throttled_usec climb steadily while latency/backlog worsens, you capped too hard for the current workload.

5) Why is the host load average high but CPU idle is not zero?

Load average includes tasks waiting on I/O (uninterruptible sleep), not just runnable CPU tasks. High load with significant idle can mean storage stalls, locks, or other blocking.

6) Can I cap CPU on a running container without restarting it?

Yes: docker update --cpus N <container> (and related flags) applies live. Treat it as an emergency lever; codify changes in your deployment config after the incident.

7) I set limits in docker-compose.yml under deploy: but nothing changed. Why?

Because deploy limits are primarily for Swarm mode. Many non-Swarm Compose setups ignore them. Always confirm with docker inspect that limits were applied.

8) Should I cap the database container when CPU is hot?

Usually no, not as a first move. Databases under CPU pressure often need more CPU, query optimization, or isolation. Capping can turn brief contention into long tail latency and lock pileups.

9) What about “CPU 100%” inside a container—does it mean the same thing as the host?

It depends on the tool and normalization. Inside the container you may see a view that ignores host capacity. Trust host-level accounting and cgroup stats for limits and throttling.

10) Is cpuset pinning worth it for noisy-neighbor problems?

Sometimes. It’s powerful and deterministic, but easy to do poorly. Start with quota caps and shares; move to cpuset when you need hard isolation and you understand CPU topology.

Conclusion: practical next steps

When Docker hosts hit 100% CPU, the winning move is not “restart things until the graph looks nicer.” The winning move is: identify the exact container (and PID), validate whether it’s doing real work or nonsense, then cap with intent and confirm throttling behavior.

Next time the host is cooking, do this:

  • Use top to get the hottest PIDs, map them to containers via cgroups.
  • Use docker stats as a hint, not a verdict.
  • Before capping, decide whether the workload is latency-sensitive or throughput-oriented.
  • Apply docker update --cpus for quick containment, then confirm via cpu.stat whether throttling is acceptable.
  • After the incident, make limits permanent, validate them automatically, and align worker concurrency to the CPU contract.

CPU emergencies are rarely mysterious. They’re usually just under-instrumented, over-assumed, and under-capped. Fix those three, and your on-call rotations get shorter—and slightly less poetic.
