Docker on cgroups v2: pain, errors, and the fix path


You upgrade a distro. Docker used to run. Now it doesn’t. Or worse: it runs, but resource limits don’t apply, CPUs get weird, and your “memory capped at 2G” container eats the host. You stare at docker run like it betrayed you personally.

cgroups v2 is better engineering. It’s also a compatibility tax. This is the field guide to the exact errors you’ll see, what they actually mean, and a fix path that doesn’t involve chanting at /sys/fs/cgroup.

What changed with cgroups v2 (and why Docker feels it)

Control groups (“cgroups”) are the kernel’s accounting and throttling framework: CPU, memory, IO, pids, and friends. Docker uses them to implement flags like --memory, --cpus, --pids-limit, and to keep container processes in a neat tree.

cgroups v1 and v2 aren’t just different versions. They’re different models.

cgroups v1: many hierarchies, many gotchas

v1 lets each controller (cpu, memory, blkio, etc.) mount separately. That flexibility created a fun hobby for Linux distros: mounting different controllers in different places and then watching tools make assumptions. Docker grew up here. So did a lot of half-working scripts.

cgroups v2: one hierarchy, consistent rules, different semantics

v2 is “unified hierarchy”: one tree, controllers enabled per subtree, and rules about delegation. It’s more coherent. It also means software must be explicit about how it creates and manages cgroups, and whether systemd is in charge.

Where Docker trips

  • Driver mismatch: Docker can manage cgroups itself (cgroupfs driver) or delegate to systemd (systemd driver). With v2, systemd-as-manager is the happy path on most systemd distros.
  • Old runc/containerd: early v2 support was partial; older versions explode with “unsupported” or silently ignore limits.
  • Rootless constraints: delegation in v2 is stricter. Rootless Docker can work, but it relies on systemd user services and proper delegation.
  • Hybrid modes: some systems run a “mixed” v1/v2 setup. That’s a recipe for “it works on this host but not that one.”

Paraphrased idea from John Allspaw (reliability engineering): “Blame rarely fixes outages; understanding systems does.” That’s the posture you need with cgroups v2: stop arguing with the symptom, map the system.

Facts & short history that will save you time

These are the kind of small, concrete truths that prevent you from debugging the wrong layer.

  1. cgroups were merged into Linux in 2007 (under the “process containers” effort). The original model assumed lots of independent hierarchies.
  2. cgroups v2 started landing around 2016 to fix v1’s fragmentation and subtle semantics bugs, especially around memory and delegation.
  3. systemd adopted cgroups deeply and became the default cgroup manager on many distros; Docker initially resisted, then converged on systemd driver as the sane default for modern setups.
  4. v2 changes memory behavior: memory accounting and enforcement are cleaner, but you must understand memory.max, memory.high, and pressure signals. Old “oom-kill as flow control” habits get exposed (a short sketch of these files follows this list).
  5. IO control changed names and meaning: v1’s blkio.* becomes v2’s io.*, and some old tuning advice simply doesn’t apply.
  6. v2 requires explicit controller enablement: you can have a cgroup tree where controllers exist but aren’t enabled for a subtree, producing “file not found” behavior that looks like permission problems.
  7. Delegation is intentionally strict: v2 prevents unprivileged processes from creating arbitrary subtrees unless the parent is configured for delegation. This is why rootless setups fail in novel ways.
  8. Some Docker flags depend on kernel config: even with v2, missing kernel features produce confusing “not supported” errors that look like Docker bugs.
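
To see fact 4 concretely: a minimal sketch, assuming a v2 host and substituting your own container scope path (it has the same shape as the ones used in the tasks below):

cr0x@server:~$ CG=/sys/fs/cgroup/system.slice/docker-<container-id>.scope
cr0x@server:~$ cat $CG/memory.max $CG/memory.high
268435456
max
cr0x@server:~$ cat $CG/memory.events
low 0
high 0
max 0
oom 0
oom_kill 0

memory.max is the hard ceiling (docker run --memory sets this), memory.high is a soft throttle threshold that plain Docker flags typically leave at max, and memory.events tells you whether any limit has actually fired.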

The errors you’ll see in real life

cgroups v2 failures cluster into three buckets: daemon startup, container startup, and “it runs but limits lie.” Here are the classics, with what they usually mean.

Daemon won’t start

  • failed to mount cgroup or cgroup mountpoint does not exist: Docker expects v1 layout but the host is v2 unified, or mounts are missing/locked down.
  • cgroup2: unknown option when mounting: kernel too old or mount options wrong for that kernel.
  • OCI runtime create failed: ... cgroup ... during daemon init: runc/containerd mismatch with host cgroup mode.

Container won’t start (OCI errors)

  • OCI runtime create failed: unable to apply cgroup configuration: ... no such file or directory: controller files not available in the target cgroup, often because controllers aren’t enabled in that subtree.
  • permission denied writing to /sys/fs/cgroup/...: delegation issue (rootless or nested cgroup manager) or systemd owns the subtree and Docker tries cgroupfs.
  • cannot set memory limit: ... invalid argument: using v1-era knobs or a kernel that doesn’t support a specific setting; also happens when swap limits are misconfigured.

Containers run but limits don’t apply

  • docker stats shows unlimited memory: Docker can’t read memory limit from cgroup v2 path it expects; driver mismatch or older Docker.
  • CPU quotas ignored: cpu controller not enabled for that subtree, or you’re using a cgroup driver arrangement that never attaches tasks where you think it does.
  • IO throttling doesn’t work: you’re on v2 and still trying to tune blkio; or the block device driver doesn’t support the requested policy.

Joke #1: cgroups v2 is like adulthood: more structure, fewer loopholes, and everything you used to do “because it worked” is now illegal.

Fast diagnosis playbook (first/second/third)

If you only have five minutes before your change window closes, do this in order.

First: confirm what cgroup mode the host is actually running

Don’t infer from distro version. Check the filesystem and kernel flags.

  • Is /sys/fs/cgroup type cgroup2 (unified) or multiple v1 mounts?
  • Is the system in hybrid mode?

Second: check Docker’s cgroup driver and runtime versions

Most “cgroups v2 pain” is either wrong driver or old runtime.

  • Is Docker using systemd or cgroupfs?
  • Are containerd and runc new enough for your kernel/distro?

Third: verify controller availability and delegation

If a specific limit fails (memory, cpu, pids), confirm the controller is enabled where Docker is placing containers.

  • cgroup.controllers exists and lists the controller.
  • cgroup.subtree_control includes it for the parent cgroup.
  • Permissions and ownership make sense (especially rootless).

Practical tasks: commands, output meaning, and decisions

These are the “stop guessing” tasks. Each one has: command, what the output means, and what decision you make next.

Task 1: Identify cgroup filesystem type (v2 vs v1)

cr0x@server:~$ stat -fc %T /sys/fs/cgroup
cgroup2fs

Meaning: cgroup2fs means unified cgroups v2. If you see tmpfs here and separate mounts for controllers, you’re likely on v1/hybrid.

Decision: If it’s v2, plan on Docker + systemd driver (or at minimum verify compatibility). If it’s v1/hybrid, decide whether to migrate or explicitly force legacy mode for consistency.

Task 2: Confirm what’s mounted under /sys/fs/cgroup

cr0x@server:~$ mount | grep cgroup
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime)

Meaning: Single cgroup2 mount: unified mode. If you see many lines like cgroup on /sys/fs/cgroup/memory, that’s v1.

Decision: If unified, stop trying to debug v1 paths like /sys/fs/cgroup/memory. They won’t exist.

Task 3: Check kernel boot parameters affecting cgroups

cr0x@server:~$ cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-6.5.0 root=/dev/mapper/vg0-root ro quiet systemd.unified_cgroup_hierarchy=1

Meaning: systemd.unified_cgroup_hierarchy=1 forces v2 unified. Some systems use the inverse (or distro defaults) to force v1.

Decision: If your org wants predictable behavior, standardize this flag across your fleet (either v2 everywhere, or v1 everywhere). Mixed fleets are where on-call joy goes to die.

Task 4: Ask systemd what it thinks about cgroups

cr0x@server:~$ systemd-analyze --version
systemd 253 (253.5-1)
+PAM +AUDIT +SELINUX +APPARMOR +IMA +SMACK +SECCOMP +GCRYPT -GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS +FIDO2 +IDN2 -IDN +IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK +PCRE2 -PWQUALITY +P11KIT +QRENCODE +TPM2 +BZIP2 +LZ4 +XZ +ZLIB +ZSTD -BPF_FRAMEWORK -XKBCOMMON +UTMP +SYSVINIT default-hierarchy=unified

Meaning: default-hierarchy=unified means systemd expects cgroups v2.

Decision: If systemd is unified, align Docker with systemd cgroup driver unless you have a compelling reason not to.

Task 5: Inspect Docker’s cgroup driver and cgroup version

cr0x@server:~$ docker info --format '{{json .CgroupVersion}} {{json .CgroupDriver}}'
"2" "systemd"

Meaning: Docker sees cgroups v2 and is using the systemd driver. This is the stable combination on modern systemd distros.

Decision: If you get "2" "cgroupfs" on a systemd host, consider switching to systemd driver to avoid delegation and subtree-control surprises.

Task 6: Check runtime component versions (containerd, runc)

cr0x@server:~$ docker info | egrep -i 'containerd|runc|cgroup'
 Cgroup Driver: systemd
 Cgroup Version: 2
 containerd version: 1.7.2
 runc version: 1.1.7

Meaning: Older containerd/runc combos are where v2 support gets “creative.” Modern versions are less exciting.

Decision: If you’re on an older Docker Engine package pinned for “stability,” unpin it—because now it’s unstable. Upgrade the engine/runc/containerd set as a unit where possible.
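
On Debian/Ubuntu-flavored hosts running Docker’s upstream packages, the unpin-and-upgrade step is usually some variant of this sketch (package names and tooling differ on other distros):

cr0x@server:~$ apt-mark showhold
docker-ce
containerd.io
cr0x@server:~$ sudo apt-mark unhold docker-ce docker-ce-cli containerd.io
cr0x@server:~$ sudo apt-get update && sudo apt-get install --only-upgrade docker-ce docker-ce-cli containerd.io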

Task 7: Validate that controllers exist (host-wide)

cr0x@server:~$ cat /sys/fs/cgroup/cgroup.controllers
cpuset cpu io memory hugetlb pids rdma misc

Meaning: These are the controllers supported by the kernel and exposed in the unified hierarchy.

Decision: If a controller you need (like memory or io) is missing, you’re not in a “Docker config problem.” You’re in a kernel/feature problem.
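
If you suspect the kernel rather than Docker, check whether the controller was even compiled in. A sketch; some distros expose the config via /proc/config.gz instead of /boot:

cr0x@server:~$ grep -E 'CONFIG_MEMCG=|CONFIG_CGROUP_PIDS=|CONFIG_BLK_CGROUP=|CONFIG_CPUSETS=' /boot/config-$(uname -r)
CONFIG_CPUSETS=y
CONFIG_BLK_CGROUP=y
CONFIG_MEMCG=y
CONFIG_CGROUP_PIDS=y

Anything missing or set to n here is a kernel or distro conversation, not a daemon.json one.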

Task 8: Verify controllers are enabled for the subtree Docker will use

cr0x@server:~$ cat /sys/fs/cgroup/cgroup.subtree_control
cpu io memory pids

Meaning: This file lists the controllers enabled for child cgroups at that level (the +/- syntax is only used when writing to it). If memory isn’t here, child cgroups won’t have memory.max and friends.

Decision: If controllers aren’t enabled, you either fix the parent cgroup configuration (often via systemd unit settings) or move Docker’s cgroup placement to a properly delegated subtree (systemd driver helps).
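
If you want to see the enablement mechanism with your own eyes (or poke it on a lab box), note that writes use a +/- prefix even though reads don’t. A sketch, assuming system.slice is the parent in question:

cr0x@server:~$ cat /sys/fs/cgroup/system.slice/cgroup.subtree_control
cpu io pids
cr0x@server:~$ echo "+memory" | sudo tee /sys/fs/cgroup/system.slice/cgroup.subtree_control
+memory
cr0x@server:~$ cat /sys/fs/cgroup/system.slice/cgroup.subtree_control
cpu io memory pids

Treat manual writes like this as a diagnostic, not a fix: systemd owns that part of the tree and can put it back, which is exactly why the durable answer is the systemd driver plus correct delegation.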

Task 9: Verify where Docker is placing a container in the cgroup tree

cr0x@server:~$ docker run -d --name cgtest --memory 256m --cpus 0.5 --pids-limit 1024 busybox:latest sleep 100000
b7d32b3f6b1f3c3b2c0b9c9f8a7a6e5d4c3b2a1f0e9d8c7b6a5f4e3d2c1b0a9

cr0x@server:~$ docker inspect --format '{{.State.Pid}}' cgtest
22145

cr0x@server:~$ cat /proc/22145/cgroup
0::/system.slice/docker-b7d32b3f6b1f3c3b2c0b9c9f8a7a6e5d4c3b2a1f0e9d8c7b6a5f4e3d2c1b0a9.scope

Meaning: On v2 there’s a single unified entry (0::). The container is in a systemd scope unit under system.slice, which is what you want when using the systemd cgroup driver.

Decision: If the container lands somewhere unexpected (or in a Docker-created cgroup outside systemd’s tree), reconcile Docker’s cgroup driver and systemd configuration.

Task 10: Confirm memory limit is actually applied (v2 files)

cr0x@server:~$ CGPATH=/sys/fs/cgroup/system.slice/docker-b7d32b3f6b1f3c3b2c0b9c9f8a7a6e5d4c3b2a1f0e9d8c7b6a5f4e3d2c1b0a9.scope
cr0x@server:~$ cat $CGPATH/memory.max
268435456

cr0x@server:~$ cat $CGPATH/memory.current
1904640

Meaning: memory.max is bytes. 268435456 is 256 MiB. memory.current shows current usage.

Decision: If memory.max is max (unlimited) despite Docker flags, you have a cgroup placement/driver mismatch or a runtime bug. Don’t tune the application; fix the cgroup plumbing.

Task 11: Confirm CPU quota is applied (v2 semantics)

cr0x@server:~$ cat $CGPATH/cpu.max
50000 100000

Meaning: In v2, cpu.max holds two values: quota and period, in microseconds. Here: 50000 (50ms) of quota per 100000 (100ms) period = 0.5 CPU.

Decision: If you see max 100000, quota isn’t applied. Check whether the cpu controller is enabled and whether Docker actually honored --cpus in your version.

Task 12: Check pids limit (a common “works until it doesn’t”)

cr0x@server:~$ cat $CGPATH/pids.max
1024

cr0x@server:~$ cat $CGPATH/pids.current
3

Meaning: pids control works. Without this, fork-bombs become “unexpected load tests.”

Decision: If pids.max is missing, controllers aren’t enabled for the subtree. Fix delegation/enablement, not Docker flags.

Task 13: Look for the smoking gun in journald

cr0x@server:~$ journalctl -u docker -b --no-pager | tail -n 20
Jan 03 08:12:11 server dockerd[1190]: time="2026-01-03T08:12:11.332111223Z" level=error msg="failed to create shim task" error="OCI runtime create failed: unable to apply cgroup configuration: mkdir /sys/fs/cgroup/system.slice/docker.service/docker/xyz: permission denied: unknown"
Jan 03 08:12:11 server dockerd[1190]: time="2026-01-03T08:12:11.332188001Z" level=error msg="failed to start daemon" error="... permission denied ..."

Meaning: Docker is trying to create cgroups in a place systemd doesn’t allow (or the process lacks delegation rights). The path is the clue.

Decision: Move to systemd cgroup driver, or fix delegation rules if rootless/nested, instead of chmod-ing random sysfs files (that’s not a fix, it’s a confession).

Task 14: Confirm systemd is the cgroup manager for Docker

cr0x@server:~$ systemctl show docker --property=Delegate,Slice,ControlGroup
Delegate=yes
Slice=system.slice
ControlGroup=/system.slice/docker.service

Meaning: Delegate=yes is crucial: it tells systemd to allow the service to create/manage sub-cgroups. Without it, cgroups v2 delegation breaks in ways that look like random permission errors.

Decision: If Delegate=no, fix the unit drop-in (or use the packaged unit that already sets it correctly). This is often the whole problem.

Task 15: Detect rootless mode (different rules, different pain)

cr0x@server:~$ docker info --format 'rootless={{.SecurityOptions}}'
rootless=[name=seccomp,profile=default name=rootless]

Meaning: Rootless changes what cgroups you can touch, and how delegation must be configured in user sessions.

Decision: If rootless is enabled and cgroup writes fail, stop trying to “fix” it in /sys/fs/cgroup as root. Configure systemd user services and delegation properly, or accept feature limits.

The fix path (choose your route, don’t guess)

There are only a few stable end-states. Pick one deliberately. “It works on my laptop” is not an architecture.

Route A (recommended on systemd distros): cgroups v2 + Docker systemd driver

This is the cleanest option on modern Ubuntu/Debian/Fedora/RHEL-ish systems where systemd is PID 1 and unified hierarchy is default.

1) Ensure Docker is configured for systemd cgroup driver

Check current state first (Task 5). If it’s not systemd, set it.

cr0x@server:~$ sudo mkdir -p /etc/docker
cr0x@server:~$ sudo tee /etc/docker/daemon.json >/dev/null <<'EOF'
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "journald"
}
EOF

What it means: This tells dockerd to create containers under systemd-managed cgroups rather than doing its own cgroupfs hierarchy management.

Decision: If you run Kubernetes, make sure kubelet uses the same driver. Mismatch is a classic “node looks fine until it isn’t” failure mode.
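
If kubelet is on the node, the matching knob lives in its KubeletConfiguration; with kubeadm-style installs the file is commonly /var/lib/kubelet/config.yaml (path varies by installer):

cr0x@server:~$ grep -i cgroupdriver /var/lib/kubelet/config.yaml
cgroupDriver: systemd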

2) Restart Docker and verify

cr0x@server:~$ sudo systemctl daemon-reload
cr0x@server:~$ sudo systemctl restart docker
cr0x@server:~$ docker info | egrep -i 'Cgroup Driver|Cgroup Version'
 Cgroup Driver: systemd
 Cgroup Version: 2

What it means: You’re aligned: systemd + v2 + Docker driver.

Decision: If Docker fails to start, immediately check journald (Task 13). Don’t “just reboot.” Reboots are how you turn a reproducible problem into folklore.

3) Confirm delegation is correct on the docker.service unit

Most packaged units are correct. Custom ones often aren’t.

cr0x@server:~$ systemctl show docker --property=Delegate
Delegate=yes

Decision: If Delegate=no, add a drop-in:

cr0x@server:~$ sudo mkdir -p /etc/systemd/system/docker.service.d
cr0x@server:~$ sudo tee /etc/systemd/system/docker.service.d/delegate.conf >/dev/null <<'EOF'
[Service]
Delegate=yes
EOF
cr0x@server:~$ sudo systemctl daemon-reload
cr0x@server:~$ sudo systemctl restart docker

(systemctl edit docker does the same thing interactively; the explicit drop-in file is the scriptable version.)

Route B: force cgroups v1 (legacy mode) to buy time

This is a tactical retreat. It can be appropriate when you have third-party agents, old kernels, or vendor appliances that can’t handle v2 yet. But treat it like debt with interest.

1) Switch systemd to legacy hierarchy via kernel command line

Exact mechanism depends on distro/bootloader. The principle: set a kernel parameter to disable unified hierarchy.

cr0x@server:~$ cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-6.5.0 root=/dev/mapper/vg0-root ro quiet systemd.unified_cgroup_hierarchy=0

Meaning: You’re forcing v1/hybrid behavior.

Decision: If you do this, document it and make it a fleet standard. A half-v2 fleet is how you get “only that AZ is broken.”
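
On GRUB-based distros the mechanics are usually a sketch like this: add the flag in /etc/default/grub, regenerate the config (update-grub on Debian-ish systems, grub2-mkconfig -o /boot/grub2/grub.cfg on RHEL-ish ones), and reboot. Also note that newer systemd releases are deprecating cgroups v1 support, so this road gets shorter every year.

cr0x@server:~$ grep GRUB_CMDLINE_LINUX= /etc/default/grub
GRUB_CMDLINE_LINUX="quiet systemd.unified_cgroup_hierarchy=0"
cr0x@server:~$ sudo update-grub
cr0x@server:~$ sudo reboot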

Route C: rootless Docker on cgroups v2 (works, but don’t romanticize it)

Rootless is great for developer boxes and some multi-tenant scenarios. For production, it’s fine if you accept constraints and test them. cgroups v2 actually makes rootless more coherent, but only with correct systemd user delegation (a minimal drop-in sketch follows the rules below).

Typical rules:

  • Use systemd user services where possible.
  • Expect some resource controls to behave differently than rootful mode.
  • Make peace with the fact that security posture and operability trade off.
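
The delegation piece usually takes the shape documented for rootless containers: a drop-in for the systemd user manager that delegates the controllers rootless Docker needs. A sketch; trim the controller list if you don’t need all of them:

cr0x@server:~$ sudo mkdir -p /etc/systemd/system/user@.service.d
cr0x@server:~$ sudo tee /etc/systemd/system/user@.service.d/delegate.conf >/dev/null <<'EOF'
[Service]
Delegate=cpu cpuset io memory pids
EOF
cr0x@server:~$ sudo systemctl daemon-reload

Log the user out and back in (or restart user@<uid>.service) so the new delegation applies to their session slice.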

Joke #2: Rootless Docker with cgroups v2 is perfectly doable, like making espresso while riding a bike—just don’t try it for the first time during an outage.

Route D: containerd directly (when Docker’s ergonomics aren’t worth it)

Some orgs move to containerd + nerdctl (or orchestration) to reduce moving parts. If you’re already deep in Kubernetes, Docker’s value is mostly developer UX. In production, fewer layers can mean fewer cgroup surprises.

But the fix path is the same: make systemd the cgroup manager, keep containerd/runc current, and validate controller enablement.

Three corporate mini-stories (painfully plausible)

Mini-story 1: The incident caused by a wrong assumption

The company migrated a large batch workload from an older LTS to a newer one. The change ticket said “Docker upgrade + security patches.” The assumption was simple: “Same containers, same limits.”

After the upgrade, Docker started. Jobs started. Everything looked green. Then the hosts began swapping, latency spiked, and unrelated services started timing out. The on-call did what on-call does: restarted containers, drained nodes, and blamed “noisy neighbors.” It got worse.

The wrong assumption was that --memory was being enforced. On the new OS, the host was on cgroups v2. Docker was still using the cgroupfs driver because an old golden image had a lingering daemon.json from years ago. Containers were placed in a subtree without the memory controller enabled. The kernel wasn’t ignoring the limits out of spite; it literally had nowhere to apply them.

The fix wasn’t tuning the jobs. The fix was aligning Docker to the systemd driver, verifying Delegate=yes, and then proving enforcement by reading memory.max in the container’s scope cgroup. The postmortem action item wasn’t “monitor memory harder.” It was “standardize cgroup mode and driver at the fleet level.”

Mini-story 2: The optimization that backfired

A platform team tried to reduce CPU throttling noise by “smoothing quotas.” They shifted many services from strict quotas to shares, thinking it would reduce contention and increase overall throughput. They also enabled a new set of kernel defaults as part of the move to unified cgroups v2.

At first it looked good: fewer throttling alerts, better median latency. Then a heavy batch service rolled out a new version and started doing periodic bursts. The bursts were legal under shares-based scheduling, but they stole enough CPU to push a latency-sensitive API into a tail-latency spiral.

The team’s mental model was v1-era: “shares are gentle, quotas are harsh.” Under v2, with unified accounting and different controller behaviors, the absence of a hard cap meant the batch job could dominate during bursts, especially when the CPU controller wasn’t enabled where they thought it was. The “optimization” moved the bottleneck from visible throttling to invisible queueing.

The fix was boring: restore explicit cpu.max limits for the bursty workload, enable cpu controller consistently in the correct subtree, and keep shares for genuinely cooperative services. They also added a validation step in CI that inspects the running cgroup files after deployment. “Trust but verify” is a cliché because it keeps being true.

Mini-story 3: The boring but correct practice that saved the day

A finance-adjacent service ran on a small cluster that always seemed over-provisioned. Someone asked why they were paying for idle. The SRE answer was predictably annoying: “Because we like sleeping.”

They had a practice: for every node image change, they ran a short conformance script. Not a huge test suite—just a dozen checks: cgroup mode, Docker driver, controller availability, and a single container that sets memory and CPU limits and proves them by reading memory.max and cpu.max. It took under two minutes.

One day a new base image slipped in with unified cgroups enabled but an outdated container runtime package set due to a pinned repository. Docker ran, containers ran, but memory limits were flaky and sometimes failed with invalid argument. Their conformance script caught it before production rollout.

The fix was as dull as it was effective: unpin the runtime packages, upgrade as a bundle, rerun the conformance checks, and roll forward. No incident. No executive email thread. Just the quiet satisfaction of being right in advance.

Common mistakes: symptom → root cause → fix

This section is intentionally blunt. These are the patterns I keep seeing in the field.

1) “Docker starts, but memory limits don’t work”

Symptom: docker run --memory succeeds; container uses more than the limit; docker stats shows unlimited.

Root cause: Docker using cgroupfs driver on a systemd+v2 host, placing containers in a subtree without +memory in cgroup.subtree_control, or an older runtime misreporting v2 limits.

Fix: Switch to systemd cgroup driver, confirm Delegate=yes, verify memory.max in the container’s scope path.

2) “OCI runtime create failed: permission denied” under /sys/fs/cgroup

Symptom: Containers fail at create with sysfs permission errors.

Root cause: Missing delegation to the Docker service, rootless user not delegated controllers, or SELinux/AppArmor policy conflicts that surface as write failures.

Fix: Ensure systemd unit has Delegate=yes, use systemd driver, validate rootless prerequisites, check audit logs if MAC is enabled.

3) “No such file or directory” for cgroup files that should exist

Symptom: Error references files like cpu.max or memory.max missing.

Root cause: Controller not enabled for that subtree. In v2, controller presence in cgroup.controllers is not the same as being enabled for children.

Fix: Enable controllers at the correct parent level (systemd slice configuration or proper subtree placement). Re-check cgroup.subtree_control.

4) “CPU quota ignored”

Symptom: --cpus set, but container uses full cores.

Root cause: cpu controller not enabled, or the container ends up outside the managed subtree due to driver mismatch.

Fix: Validate cpu.max in the container’s cgroup. If it’s max, fix controller enablement and driver alignment.

5) “It only breaks on some nodes”

Symptom: Identical deployment behaves differently across nodes.

Root cause: Fleet is mixed v1/v2, or mixed Docker cgroup drivers, or mixed runtime versions.

Fix: Standardize. Pick a cgroup mode and enforce it via image build + boot flags + config management. Then verify with a conformance check.

6) “We forced cgroups v1 to fix it and forgot”

Symptom: New tooling assumes v2; security agent assumes v2; you now have dueling expectations.

Root cause: Tactical rollback became permanent architecture.

Fix: Track it as debt with an owner and a date. Migrate deliberately, node pool by node pool.

Checklists / step-by-step plan

Checklist 1: New node image acceptance (10 minutes, saves hours)

  1. Confirm cgroup mode: stat -fc %T /sys/fs/cgroup should match your fleet standard.
  2. Confirm systemd hierarchy: systemd-analyze --version shows default-hierarchy=unified (if v2 standard).
  3. Confirm Docker sees v2: docker info shows Cgroup Version: 2.
  4. Confirm Docker driver: Cgroup Driver: systemd (recommended for v2+systemd).
  5. Confirm docker.service delegation: systemctl show docker --property=Delegate is yes.
  6. Run one test container with CPU and memory limits and read back memory.max and cpu.max from its scope cgroup (a script sketch follows this checklist).
  7. Check journald for warnings: journalctl -u docker -b.
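
A minimal sketch of that conformance script, assuming the v2 + systemd-driver standard from Route A (names, paths, and the busybox test image are illustrative, not mandatory):

cr0x@server:~$ sudo tee /usr/local/bin/cgroup-conformance.sh >/dev/null <<'EOF'
#!/bin/sh
# Exit non-zero on the first check that deviates from the fleet standard.
set -eu

# Host-level checks: unified v2, systemd driver, delegation on docker.service.
[ "$(stat -fc %T /sys/fs/cgroup)" = "cgroup2fs" ]
docker info --format '{{.CgroupVersion}} {{.CgroupDriver}}' | grep -qx '2 systemd'
systemctl show docker --property=Delegate | grep -qx 'Delegate=yes'

# Proof-by-container: set limits, then read them back from the kernel, not from Docker.
cid=$(docker run -d --memory 256m --cpus 0.5 --pids-limit 1024 busybox:latest sleep 300)
cg="/sys/fs/cgroup/system.slice/docker-${cid}.scope"
grep -qx '268435456' "$cg/memory.max"
grep -qx '50000 100000' "$cg/cpu.max"
grep -qx '1024' "$cg/pids.max"
docker rm -f "$cid" >/dev/null

echo "cgroup conformance: OK"
EOF
cr0x@server:~$ sudo chmod +x /usr/local/bin/cgroup-conformance.sh
cr0x@server:~$ sudo /usr/local/bin/cgroup-conformance.sh
cgroup conformance: OK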

Checklist 2: Fix path when Docker breaks right after a distro upgrade

  1. Confirm actual cgroup mode (Tasks 1–3). Don’t rely on release notes.
  2. Check Docker driver/version (Tasks 5–6). Upgrade if old.
  3. Check delegation and controller enablement (Tasks 7–8, 14).
  4. Reproduce with a minimal container (Task 9). Don’t debug your 4GB JVM yet.
  5. Confirm the limit is written (Tasks 10–12). If it’s not in the file, it’s not real.
  6. Only then touch application tuning.

Checklist 3: Standardization plan for a mixed cgroup fleet

  1. Pick a target: v2 unified + Docker systemd driver (most common), or v1 legacy (temporary).
  2. Define a compliance check (the acceptance checklist above) and run it per node pool.
  3. Upgrade Docker/containerd/runc as a bundle in each pool.
  4. Flip cgroup mode via boot flags only at pool boundaries (avoid mixed within a pool).
  5. Roll with canaries, verify by reading /sys/fs/cgroup and container scope files.
  6. Document the decision and bake it into image pipelines to prevent drift.

FAQ

1) How do I know if I’m on cgroups v2 without reading a blog post?

Run stat -fc %T /sys/fs/cgroup. If it prints cgroup2fs, you’re on v2 unified. If you see multiple v1 controller mounts, you’re on v1/hybrid.

2) Should I use Docker’s cgroupfs driver on a systemd host?

Avoid it unless you have a specific constraint. On systemd + cgroups v2, the systemd driver is the stable choice because it aligns delegation, slices/scopes, and controller enablement.

3) Why does docker stats show wrong limits on v2?

Usually runtime mismatch (older Docker/containerd/runc), or Docker is reading cgroup data from a path layout that doesn’t match where containers actually live. Verify by reading memory.max directly in the container’s scope cgroup.

4) My error says “no such file or directory” for memory.max. But v2 is enabled. Why?

Because v2 requires controllers to be enabled for the subtree. The file won’t exist if the memory controller isn’t enabled at the parent via cgroup.subtree_control.

5) Is forcing cgroups v1 an acceptable fix?

As a short-term escape hatch, yes. As a long-term strategy, it’s a liability. You’ll increasingly run into tooling and distros that assume v2. If you force v1, standardize it and track a migration plan.

6) Do I need to change my container resource flags for v2?

Usually no; Docker flags stay the same. What changes is whether those flags get translated into the correct v2 files and whether the kernel allows them in that subtree.

7) What’s the single most useful file to look at when debugging v2 resource issues?

The container’s cgroup path from /proc/<pid>/cgroup, then read the relevant files: memory.max, memory.current, cpu.max, pids.max. If the value isn’t there, the limit isn’t there.
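
A small sketch that chains those two steps, reusing the PID from Task 9 (substitute your container’s PID) on a v2 unified host:

cr0x@server:~$ CG=/sys/fs/cgroup$(cut -d: -f3 /proc/22145/cgroup)
cr0x@server:~$ cat "$CG/memory.max" "$CG/cpu.max" "$CG/pids.max"
268435456
50000 100000
1024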

8) Rootless Docker: should I run it in production with cgroups v2?

It can be fine for specific cases, but don’t treat it as a free lunch. You need proper systemd user delegation, and some controls differ from rootful behavior. Test the exact limits you rely on.

9) Kubernetes angle: what if kubelet and Docker use different cgroup drivers?

That mismatch is a reliability bug. Processes end up in unexpected subtrees, accounting gets weird, and limits may not apply. Align them (systemd/systemd on v2 is the common pairing).

10) What about IO limits—why did my blkio tuning stop working?

Because blkio is v1 terminology. On v2, look for io.* controls and confirm the kernel and device support the policy. Also verify the io controller is enabled for the subtree.
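
For example, a container started with a read-bandwidth cap surfaces it in io.max rather than any blkio.* file (device numbers here are hypothetical; check yours with lsblk):

cr0x@server:~$ docker run -d --name iotest --device-read-bps /dev/nvme0n1:10mb busybox:latest sleep 100000
cr0x@server:~$ cat /sys/fs/cgroup/system.slice/docker-$(docker inspect --format '{{.Id}}' iotest).scope/io.max
259:0 rbps=10485760 wbps=max riops=max wiops=max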

Conclusion: practical next steps

cgroups v2 isn’t “Docker being broken.” It’s the kernel insisting on a cleaner contract. The pain comes from mismatched expectations: drivers, delegation, and controller enablement.

Do this next:

  1. Standardize your fleet: pick cgroups v2 unified (preferred) or v1 legacy (temporary) and enforce it at boot.
  2. On v2 systemd hosts, set Docker to the systemd cgroup driver and verify Delegate=yes.
  3. Create a tiny conformance script that runs a test container and proves limits by reading cgroup files directly.
  4. Upgrade Docker/containerd/runc as a set. Old runtimes are where “works sometimes” is born.

If you take nothing else: when containers “ignore” limits, don’t argue with docker run flags. Read the cgroup files. The kernel is the source of truth, and it does not accept excuses.
