You deploy a small change. A container that ran yesterday now dies instantly. Docker says something unhelpful like OCI runtime create failed, containerd mentions a “shim”, and runc coughs up permission denied like it’s trying to be mysterious.
Reinstalling Docker feels like action, but it’s usually just panic with extra steps. These failures are almost always diagnosable in-place: a broken mount, a cgroup mismatch, a storage driver issue, a seccomp/AppArmor block, or a disk that ran out of something you weren’t watching.
The mental model: who fails, and where to look
When a container won’t start, you need to know which layer is complaining. Docker is the front desk. containerd is the hotel manager. runc is the locksmith. The kernel is the building.
What Docker does (and doesn’t)
Docker’s daemon (dockerd) handles API requests, image management, networks, volumes, and orchestrates runtime operations. On most modern installs, Docker does not directly spawn your container process; it asks containerd to do that.
What containerd does
containerd manages container lifecycle and content. It creates “tasks” and invokes the OCI runtime (often runc) to set up namespaces, cgroups, mounts, and then exec the container process. If you see errors like failed to create shim task or shim exited, you’re in containerd territory, but the root cause is often still kernel/cgroup/filesystem.
What runc does
runc is the OCI runtime implementation. It reads an OCI config.json and asks the kernel to do the hard stuff: mount a rootfs, set up namespaces, apply seccomp filters, set capabilities, configure cgroups, then exec the target binary. When runc says permission denied it can mean “seccomp”, “LSM (AppArmor/SELinux)”, “filesystem permissions”, or “kernel refused the operation.” The message is rarely specific. That’s our job.
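If you want to see exactly what runc was asked to do, the OCI bundle that containerd prepares is readable. A minimal sketch, assuming the default runc v2 shim and Docker's moby namespace; the bundle only exists while containerd still holds the task, jq may not be installed, and <container-id> is a placeholder:
cr0x@server:~$ sudo ls /run/containerd/io.containerd.runtime.v2.task/moby/
cr0x@server:~$ sudo jq '.process.args, .root.path, .linux.namespaces' /run/containerd/io.containerd.runtime.v2.task/moby/<container-id>/config.json
If the bundle is already gone, docker inspect on the container gives you most of the same information in Docker's own format.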
The kernel is where “impossible” becomes “obvious”
Most “Docker is broken” incidents are actually: your disk is full (or out of inodes), your kernel has cgroup v2 behavior you didn’t account for, your filesystem can’t do overlay mounts, your security policy blocks clone() flags, or your DNS / iptables rules got rearranged by someone who “tidied things up.”
Paraphrased idea (attributed to John Allspaw): “Reliability work is about understanding how systems actually fail, not how we wish they would.”
One opinion that will save you hours: don’t start by reinstalling. Start by proving which layer is failing, and whether the host is healthy enough to run anything.
Fast diagnosis playbook (check first/second/third)
This is the “I’m on-call and the blast radius is growing” order of operations. The goal isn’t to be thorough. The goal is to find the bottleneck quickly and stop guessing.
First: confirm the host isn’t lying to you
- Disk space and inodes on the Docker data path (/var/lib/docker) and the root filesystem.
- Memory pressure and OOM activity.
- Kernel logs for mounts, seccomp, AppArmor/SELinux denials, cgroup failures.
Second: get the error from the right mouth
- docker events around the time of failure.
- dockerd logs (systemd journal) for full stack traces.
- containerd logs for shim/runc invocation errors.
Third: isolate the failing axis
- Storage axis: overlay2 mount failures, corrupted layer metadata, XFS d_type, NFS/shiftfs weirdness.
- Cgroup axis: cgroup v2, systemd driver mismatch, permission issues in rootless mode.
- Security axis: AppArmor/SELinux/seccomp blocks, no-new-privileges, missing capabilities.
- Runtime axis: runc binary mismatch, containerd plugin config, shim stuck processes.
Once you know the axis, you stop spraying changes. You do one deliberate thing, validate, and move forward. That’s how you keep outages short and postmortems boring.
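If you want the "first" checks as one block to paste while the pager is still buzzing, here is a minimal sketch; the data-root path and the time windows are assumptions you should adjust to your hosts:
cr0x@server:~$ df -hT / /var/lib/docker && df -i /var/lib/docker
cr0x@server:~$ free -m
cr0x@server:~$ sudo dmesg -T | grep -iE 'denied|overlayfs|out of memory|blocked for more than' | tail -n 20
cr0x@server:~$ sudo journalctl -u docker -u containerd --since "15 min ago" --no-pager | tail -n 40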
Interesting facts and context (why this stack is weird)
- Docker split out containerd in 2016 to make the runtime components reusable beyond Docker; Kubernetes later standardized on containerd for many distros.
- runc came out of libcontainer, Docker’s early container engine, and became the reference OCI runtime implementation.
- OCI (Open Container Initiative) formed in 2015 to standardize image format and runtime behavior; it’s why “OCI runtime” appears in errors today.
- The “shim” exists so containers survive daemon restarts; containerd can die and come back without killing every container, because the shim holds the child process relationship.
- overlay2 became the de facto Linux storage driver because it’s fast and uses kernel OverlayFS, but it’s picky about filesystem features and mount options.
- cgroup v2 changed semantics (especially around delegation and controllers), and many “worked on v1” assumptions fail silently or loudly on v2.
- Rootless Docker isn’t just “Docker without sudo”; it uses user namespaces and different networking, and it fails differently (often more politely) than rootful mode.
- Seccomp defaults are conservative; they can break “exotic” syscalls used by some workloads (or by newer glibc) on older kernels.
- XFS needs ftype=1 (d_type) for overlay2 correctness; without it, you get spectacularly confusing layer and rename failures.
Joke 1/2: Containers are “lightweight” until you’re debugging them at 3 a.m., when every namespace weighs exactly one ton.
Practical debugging tasks (commands, outputs, decisions)
Below are real, runnable tasks. Each one has three parts: the command, what typical output means, and the decision you make. Do them in order when you’re lost; do them selectively when you’re not.
Task 1: Capture the exact error from Docker (not the summary)
cr0x@server:~$ docker ps -a --no-trunc | head -n 5
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
2b2c0f3b7a0c5f8f1c2c6d5a0a7a5f0a6e9a8b7c6d5e4f3a2b1c0d9e8f7a6b5c4 myapp:latest "/entrypoint.sh" 2 minutes ago Created myapp_1
cr0x@server:~$ docker start myapp_1
Error response from daemon: OCI runtime create failed: runc create failed: unable to start container process: error during container init: open /proc/self/fd: permission denied: unknown
Meaning: This is already deeper than “failed to start.” It points at runc and a specific operation. /proc + permission denied often means LSM (AppArmor/SELinux), seccomp, or a weird mount/namespace restriction.
Decision: Don’t touch Docker packages. Go straight to journald and kernel logs to see who said “no.”
Task 2: Watch Docker events live while reproducing
cr0x@server:~$ docker events --since 10m
2026-01-03T11:12:08.123456789Z container create 2b2c0f3b7a0c (image=myapp:latest, name=myapp_1)
2026-01-03T11:12:08.234567890Z container start 2b2c0f3b7a0c (image=myapp:latest, name=myapp_1)
2026-01-03T11:12:08.345678901Z container die 2b2c0f3b7a0c (exitCode=127, image=myapp:latest, name=myapp_1)
Meaning: The container dies immediately. Exit codes are sometimes application-level, sometimes runtime-level; the timing matters. Instant death often means init never started or an entrypoint binary is missing.
Decision: If the die event is immediate and the exit code looks like 127/126, verify that the rootfs and entrypoint exist and are executable (127 usually means "not found," 126 "not executable"); if it's a runtime error, go straight to the daemon and containerd logs.
Task 3: Inspect container config quickly (entrypoint, mounts, security opts)
cr0x@server:~$ docker inspect myapp_1 --format '{{json .HostConfig.SecurityOpt}} {{json .HostConfig.CgroupnsMode}} {{.Path}} {{json .Args}}'
null "private" /entrypoint.sh ["--serve"]
Meaning: No special security opts, private cgroup namespace, entrypoint is /entrypoint.sh. If it fails with 127, that path might not exist or isn’t executable inside the image.
Decision: If the error smells like "binary missing," run the image with a known shell as the entrypoint and list the path, as sketched below (if the image has a shell at all). If it smells like "permission denied," pivot to LSM/seccomp/cgroups.
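For the "binary missing" smell, a quick sketch that checks the entrypoint without starting the real workload; it assumes the image ships /bin/sh (many minimal images don't, in which case docker create plus docker cp is the fallback):
cr0x@server:~$ docker run --rm --entrypoint /bin/sh myapp:latest -c 'ls -l /entrypoint.sh && head -n 1 /entrypoint.sh'
A missing file explains exit code 127; a file whose shebang points at an interpreter that isn't in the image explains it just as well.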
Task 4: Read dockerd logs for the full stack trace
cr0x@server:~$ sudo journalctl -u docker --since "15 min ago" --no-pager | tail -n 30
Jan 03 11:12:08 server dockerd[1187]: time="2026-01-03T11:12:08.330123456Z" level=error msg="Error response from daemon: OCI runtime create failed: runc create failed: unable to start container process: error during container init: open /proc/self/fd: permission denied: unknown"
Jan 03 11:12:08 server dockerd[1187]: time="2026-01-03T11:12:08.330234567Z" level=info msg="Attempting next endpoint for containerd"
Jan 03 11:12:08 server dockerd[1187]: time="2026-01-03T11:12:08.330345678Z" level=error msg="Handler for POST /v1.45/containers/2b2c0f3b7a0c/start returned error: OCI runtime create failed: runc create failed: unable to start container process: error during container init: open /proc/self/fd: permission denied: unknown"
Meaning: Docker is relaying an OCI runtime error. The stack trace isn’t always shown unless debug logging is enabled, but even this confirms it’s not “Docker can’t talk to containerd.”
Decision: Next stop: containerd logs and kernel/audit logs. Permission denials are policy, not plumbing.
Task 5: Read containerd logs (shim and runc context)
cr0x@server:~$ sudo journalctl -u containerd --since "15 min ago" --no-pager | tail -n 30
Jan 03 11:12:08 server containerd[1044]: time="2026-01-03T11:12:08.320111222Z" level=info msg="starting containerd" revision= version=1.7.12
Jan 03 11:12:08 server containerd[1044]: time="2026-01-03T11:12:08.328222333Z" level=error msg="RunPodSandbox for "2b2c0f3b7a0c" failed" error="failed to create task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: open /proc/self/fd: permission denied: unknown"
Meaning: containerd is confirming runc’s failure. If you see “shim exited” without runc details, you may have a crashed runtime binary or a corrupted state dir.
Decision: If containerd’s error is thin, enable debug logging temporarily (later task). Otherwise, pivot to kernel logs to find who denied access.
Task 6: Check kernel logs for AppArmor/SELinux/seccomp denials
cr0x@server:~$ sudo dmesg -T | tail -n 25
[Fri Jan 3 11:12:08 2026] audit: type=1400 audit(1735902728.332:312): apparmor="DENIED" operation="open" class="file" profile="docker-default" name="/proc/self/fd/" pid=22451 comm="runc:[2:INIT]" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
Meaning: That’s the smoking gun: AppArmor blocked an open on /proc/self/fd/ for the docker-default profile.
Decision: Fix AppArmor policy or run the container with an appropriate profile adjustment (carefully). Don’t disable AppArmor globally unless you enjoy explaining yourself later.
Task 7: Confirm AppArmor status and loaded profiles
cr0x@server:~$ sudo aa-status | head -n 20
apparmor module is loaded.
73 profiles are loaded.
67 profiles are in enforce mode.
docker-default
/usr/sbin/cupsd
/usr/sbin/sshd
6 profiles are in complain mode.
0 processes are unconfined but have a profile defined.
Meaning: AppArmor is active and enforcing. If docker-default is enforcing, your container is probably under that profile unless overridden.
Decision: If this is new behavior, find what changed: kernel update, AppArmor update, or someone toggled profiles. For a quick mitigation, set --security-opt apparmor=unconfined for the specific workload (then write a real policy).
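What a per-workload override looks like in practice, as a sketch; myapp:latest is the example image from the earlier tasks, /etc/apparmor.d/containers/docker-myapp is a hypothetical profile path, and the first line is the quick mitigation while the last two are the proper follow-up (don't run both with the same --name):
cr0x@server:~$ docker run -d --name myapp_1 --security-opt apparmor=unconfined myapp:latest --serve
cr0x@server:~$ sudo apparmor_parser -r /etc/apparmor.d/containers/docker-myapp
cr0x@server:~$ docker run -d --name myapp_1 --security-opt apparmor=docker-myapp myapp:latest --serve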
Task 8: Check disk space and inodes (yes, every time)
cr0x@server:~$ df -hT / /var/lib/docker
Filesystem Type Size Used Avail Use% Mounted on
/dev/nvme0n1p2 ext4 96G 95G 120M 100% /
/dev/nvme0n1p2 ext4 96G 95G 120M 100% /
cr0x@server:~$ df -i /var/lib/docker
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/nvme0n1p2 6553600 6551000 2600 100% /
Meaning: You’re out of space and out of inodes. Either one can break container creation, image pulls, and overlay mounts. When Docker fails with random “no space left on device,” it sometimes means “no inodes left,” which is a special kind of cruel.
Decision: Free space/inodes first. Prune safely, delete old logs, rotate journals, or move Docker data root. Do not waste time on runc flags while the disk is redlined.
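A sketch for finding where the space and inodes actually went before you prune anything; du --inodes needs a reasonably recent GNU coreutils, and the vacuum size and prune window are example values, not recommendations:
cr0x@server:~$ sudo du --inodes -x -d 2 /var 2>/dev/null | sort -n | tail -n 10
cr0x@server:~$ sudo journalctl --vacuum-size=500M
cr0x@server:~$ docker builder prune --filter "until=168h"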
Task 9: Identify storage driver and Docker root dir
cr0x@server:~$ docker info --format 'Driver={{.Driver}} DockerRootDir={{.DockerRootDir}} CgroupDriver={{.CgroupDriver}} CgroupVersion={{.CgroupVersion}}'
Driver=overlay2 DockerRootDir=/var/lib/docker CgroupDriver=systemd CgroupVersion=2
Meaning: overlay2 on cgroup v2 with systemd driver. That combination is fine, but it narrows your failure modes: overlay mount issues, cgroup delegation/permissions, systemd unit limits.
Decision: If the error involves mounts/layers, focus on overlay2 and underlying FS. If it involves “cgroup” or “systemd,” focus there.
Task 10: Check overlay2 backing filesystem features (XFS ftype, mount options)
cr0x@server:~$ findmnt -no FSTYPE,OPTIONS /var/lib/docker
ext4 rw,relatime,errors=remount-ro
cr0x@server:~$ sudo docker info | sed -n '/Backing Filesystem/,+5p'
Backing Filesystem: extfs
Supports d_type: true
Using metacopy: false
Native Overlay Diff: true
Meaning: Backing FS supports d_type; overlay2 should be structurally OK. On XFS, you’d want ftype=1; on ext4, you mostly care about being on a local filesystem with sane options.
Decision: If backing FS is NFS, CIFS, or something “creative,” expect overlay2 pain. Move DockerRootDir to local storage or use a compatible driver.
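Relocating the data root is a daemon.json change plus a copy, not a reinstall. A minimal sketch, assuming /mnt/nvme/docker is local storage on a supported filesystem, you can tolerate stopping Docker for the copy, and you merge this into any daemon.json settings you already have instead of clobbering them:
cr0x@server:~$ sudo systemctl stop docker
cr0x@server:~$ sudo rsync -aHAX /var/lib/docker/ /mnt/nvme/docker/
cr0x@server:~$ sudo tee /etc/docker/daemon.json <<'EOF'
{
  "data-root": "/mnt/nvme/docker"
}
EOF
cr0x@server:~$ sudo systemctl start docker
cr0x@server:~$ docker info --format '{{.DockerRootDir}}'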
Task 11: Look for overlay mount failures and “invalid argument” in kernel logs
cr0x@server:~$ sudo dmesg -T | grep -E 'overlay|OverlayFS' | tail -n 10
[Fri Jan 3 10:58:41 2026] overlayfs: upper fs does not support RENAME_WHITEOUT
[Fri Jan 3 10:58:41 2026] overlayfs: failed to set xattr on upper
Meaning: OverlayFS is unhappy with the upper filesystem capabilities. This can happen with certain mount options, older kernels, or backing filesystems that don’t support required xattrs/features.
Decision: Fix the filesystem (mount options, kernel support) or relocate DockerRootDir. Trying to “prune images” won’t fix a filesystem that can’t do the operations overlay needs.
Task 12: Validate cgroup v2 mount and controllers availability
cr0x@server:~$ mount | grep cgroup2
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime)
cr0x@server:~$ cat /sys/fs/cgroup/cgroup.controllers
cpuset cpu io memory hugetlb pids rdma
Meaning: cgroup v2 is mounted and controllers exist. Problems show up when controllers aren’t delegated or systemd restrictions block them for Docker.
Decision: If container start errors mention cgroups, confirm Docker’s cgroup driver matches systemd and that systemd version supports the features you need.
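To see where systemd actually placed a container's cgroup and which controllers it received, a sketch using a throwaway container (the failing one never gets a PID to inspect):
cr0x@server:~$ cid=$(docker run -d busybox:latest sleep 60)
cr0x@server:~$ pid=$(docker inspect -f '{{.State.Pid}}' "$cid")
cr0x@server:~$ cat /proc/"$pid"/cgroup
cr0x@server:~$ cat /sys/fs/cgroup/system.slice/docker-"$cid".scope/cgroup.controllers
cr0x@server:~$ docker rm -f "$cid"
With the systemd driver on cgroup v2, the first cat should show a path under system.slice/docker-<id>.scope; if it doesn't, your driver and your assumptions disagree.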
Task 13: Find OOM kills that “mysteriously” stop containers
cr0x@server:~$ sudo journalctl -k --since "2 hours ago" --no-pager | grep -i -E 'oom|killed process' | tail -n 10
Jan 03 10:44:19 server kernel: Out of memory: Killed process 21902 (myapp) total-vm:812340kB, anon-rss:512000kB, file-rss:1200kB, shmem-rss:0kB, UID:0 pgtables:1400kB oom_score_adj:0
Meaning: The kernel killed your process. Docker might report the container “exited” with no useful runtime error. This is not a containerd problem. This is the host doing triage.
Decision: Fix memory pressure: raise limits, tune requests/limits, add swap (carefully), or move the workload. Debugging runc won’t resurrect memory.
Task 14: Check for stuck shims and zombie runtime processes
cr0x@server:~$ ps -eo pid,ppid,comm,args | grep -E 'containerd-shim|runc' | grep -v grep | head
22451 1044 containerd-shim-runc-v2 /usr/bin/containerd-shim-runc-v2 -namespace moby -id 2b2c0f3b7a0c -address /run/containerd/containerd.sock
22460 22451 runc:[2:INIT] runc:[2:INIT]
Meaning: Shims exist per container. If shims remain after containers are gone, or runc INIT processes hang, you may have a stuck mount or a kernel bug. Often it’s a storage hang or an uninterruptible sleep.
Decision: Don’t kill random PIDs first. Identify whether they’re stuck in D-state (next task). If they are, you’re dealing with I/O or filesystem trouble.
Task 15: Check process state for D-state (I/O hang) and blocked tasks
cr0x@server:~$ ps -o pid,state,wchan,comm -p 22460
PID S WCHAN COMMAND
22460 D ovl_wa runc:[2:INIT]
cr0x@server:~$ sudo dmesg -T | tail -n 5
[Fri Jan 3 11:05:12 2026] INFO: task runc:[2:INIT]:22460 blocked for more than 120 seconds.
Meaning: D-state means uninterruptible sleep, usually waiting on I/O. Killing it won’t work. This is where “Docker is down” actually means “your storage is on fire.”
Decision: Stop trying to restart Docker. Investigate storage: underlying disk, RAID, network storage, filesystem errors. You might need a host reboot if the kernel is stuck, but you should understand why before you pull that lever.
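A read-only sketch for sizing the blast radius: list every process stuck in uninterruptible sleep and check how long the kernel has been complaining about blocked tasks:
cr0x@server:~$ ps -eo pid,stat,wchan:32,comm | awk 'NR==1 || $2 ~ /^D/'
cr0x@server:~$ sudo journalctl -k --since "1 hour ago" --no-pager | grep -i 'blocked for more than' | tail -n 10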
Task 16: Validate containerd socket health and API responsiveness
cr0x@server:~$ sudo ss -lxnp | grep containerd
u_str LISTEN 0 4096 /run/containerd/containerd.sock 12345 * 0 users:(("containerd",pid=1044,fd=9))
cr0x@server:~$ sudo ctr --address /run/containerd/containerd.sock version
Client:
Version: 1.7.12
Revision: 9a8b7c6d5e4f
Server:
Version: 1.7.12
Revision: 9a8b7c6d5e4f
Meaning: containerd is listening and responding. If Docker claims it can’t connect to containerd, this helps separate “daemon config” from “runtime dead.”
Decision: If ctr works but docker doesn’t, focus on dockerd configuration, sockets, or permissions. If ctr hangs, containerd is sick or blocked (often by storage).
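To test the Docker side of that boundary directly, you can ping the daemon's API over its UNIX socket; a sketch assuming curl is installed and the default socket path:
cr0x@server:~$ ls -l /var/run/docker.sock
cr0x@server:~$ sudo curl --silent --unix-socket /var/run/docker.sock http://localhost/_ping; echo
A healthy daemon answers with a plain OK; a hang or connection refused here points at dockerd, not containerd.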
Task 17: Use ctr to list tasks and spot half-created containers
cr0x@server:~$ sudo ctr -n moby containers list | head
CONTAINER IMAGE RUNTIME
2b2c0f3b7a0c docker.io/library/myapp:latest io.containerd.runc.v2
cr0x@server:~$ sudo ctr -n moby tasks list
TASK PID STATUS
2b2c0f3b7a0c 0 STOPPED
Meaning: The container object exists, but the task isn’t running (PID 0). This aligns with “create failed” or “init failed.”
Decision: If lots of STOPPED tasks accumulate, you likely have a systemic runtime/storage/security issue. Fix the axis, then clean up dangling state.
Task 18: Check Docker’s storage usage and prune with intent
cr0x@server:~$ docker system df
TYPE TOTAL ACTIVE SIZE RECLAIMABLE
Images 42 6 38.2GB 31.7GB (83%)
Containers 19 2 1.1GB 900MB (81%)
Local Volumes 15 8 120GB 0B (0%)
Build Cache 10 0 8.4GB 8.4GB
cr0x@server:~$ docker image prune -a --filter "until=168h"
Deleted Images:
deleted: sha256:...
Total reclaimed space: 28.4GB
Meaning: Images and build cache are reclaimable; volumes aren’t. The output tells you where you can safely claw back space without deleting customer data.
Decision: Prune images/build cache under change control. Avoid pruning volumes unless you are absolutely certain. If you’re out of inodes, pruning can also help, but verify with df -i.
Task 19: Verify iptables/nftables mode when networking errors masquerade as runtime failures
cr0x@server:~$ sudo docker run --rm busybox:latest nslookup example.com
Server: 127.0.0.11
Address 1: 127.0.0.11
Name: example.com
Address 1: 93.184.216.34
cr0x@server:~$ sudo iptables -S | head -n 5
-P INPUT ACCEPT
-P FORWARD DROP
-P OUTPUT ACCEPT
-N DOCKER
-N DOCKER-USER
Meaning: DNS resolution inside container works; iptables chains exist. If containers can’t reach anything and you see FORWARD policy DROP without proper Docker rules, you can get symptoms that look like “container start failed” because apps fail health checks instantly.
Decision: If runtime succeeds but apps die immediately, validate networking and DNS. Not every “container died” is runc’s fault.
Task 20: Temporarily increase dockerd verbosity (safely) to get more context
cr0x@server:~$ sudo mkdir -p /etc/docker
cr0x@server:~$ sudo cat /etc/docker/daemon.json
{
"log-level": "info"
}
cr0x@server:~$ sudo sh -c 'cat > /etc/docker/daemon.json <<EOF
{
  "debug": true,
  "log-level": "debug"
}
EOF'
cr0x@server:~$ sudo kill -HUP "$(pidof dockerd)"
Meaning: Debug logs show where dockerd is spending time and what it asked containerd to do. This is particularly useful for hangs (you’ll see the last successful step).
Decision: Use debug mode briefly during incident response (SIGHUP asks dockerd to reload supported options like debug without restarting containers), then restore the previous config and reload again. Debug logs are verbose and can consume disk, often the same disk you just discovered was full.
Joke 2/2: “Just restart Docker” is the operational equivalent of hitting a vending machine—occasionally effective, always suspicious.
Three corporate mini-stories from the trenches
1) Incident caused by a wrong assumption: “permission denied means file permissions”
They had a stable fleet. Minor OS updates rolled weekly. One Tuesday, a deployment to a subset of hosts started failing with OCI runtime create failed: permission denied. The immediate assumption: some file in the image lost its executable bit. The team rebuilt the image, rolled it out, and… nothing changed. Same error, same hosts, different image digests.
Someone then tried the classic fix: reinstall Docker. It “worked” on one host, which was enough to turn the fix into a superstition. On the next host it didn’t work. Now they had inconsistency plus downtime, which is the worst of both worlds.
The breakthrough came from a tired SRE who stopped looking at Docker and started looking at the kernel audit logs. AppArmor denials. The OS update had introduced a stricter default profile behavior for docker-default on that distro build, and one workload happened to open a /proc path during init that triggered the deny.
The fix was boring and specific: set a per-container AppArmor profile override for the affected workload, and then work with security to craft a minimal profile change. No global disable. No reinstall. The “reinstall worked once” host? It had a stale policy cache that didn’t reload until later, which is how myths are born in production.
Takeaway: “permission denied” is a category, not a diagnosis. If you don’t check audit/LSM logs, you’re debugging with a blindfold and a flashlight with dead batteries.
2) Optimization that backfired: moving DockerRootDir to “fast shared storage”
A platform team wanted faster node replacement. They decided to place /var/lib/docker on a shared storage mount so hosts could be reprovisioned without re-pulling images. On paper: fewer pulls, faster scaling, less bandwidth. In reality: they had just attached a latency-sensitive overlay filesystem to a network filesystem with occasional stalls.
At first, it looked fine. Containers started. Image pulls were quick. Then a storage network event caused a handful of 2–5 second pauses. OverlayFS did not take it gracefully. runc init processes started getting stuck in D-state. Shims stayed around like ghosts. Docker restarts didn’t help because the kernel threads were blocked on I/O.
The incident escalated because the symptoms were misleading. Engineers saw containerd logs about shims exiting and assumed runtime bugs. Others saw random “context deadline exceeded” and assumed network. The truth was that the storage backend occasionally stalled, and the overlay upperdir operations (rename/whiteout/xattr) amplified the pain.
They rolled back to local SSD for DockerRootDir. They kept shared storage, but only for explicit volumes where the I/O pattern and failure domain were understood. Node replacement was a little slower. Incidents became rarer. Everyone slept more.
Takeaway: putting Docker’s layer store on shared storage is an optimization that often turns into a reliability tax. If you must do it, test OverlayFS semantics and failure behavior under stalls, not just throughput.
3) Boring but correct practice that saved the day: keeping a “host health” runbook
A fintech shop ran critical batch jobs in containers on a small cluster. One Friday, containers started failing with failed to mount overlay: no space left on device. People assumed disk was full. It wasn’t—at least not by df -h. Plenty of gigabytes free.
But the on-call had a runbook that began with two commands: df -h and df -i. Inodes were at 100%. The culprit was millions of tiny files generated by a logging sidecar that wrote to the host path (yes, really). Docker couldn’t create new layer metadata because inode allocation failed.
They stopped the offender, cleaned the directory, and containers started immediately. No reinstalls. No moving data roots. No desperate kernel tuning. They then added inode monitoring and log rotation enforcement, and the problem never returned in that form.
Takeaway: basic host checks are not beneath you. They are the difference between “five-minute fix” and “two-hour ghost hunt.”
Common mistakes: symptom → root cause → fix
1) Symptom: OCI runtime create failed: permission denied
Root cause: Often AppArmor/SELinux denial, or seccomp blocked syscall, not filesystem permissions.
Fix: Check dmesg -T and audit logs. Adjust per-container security options or policy. Avoid disabling the LSM globally.
2) Symptom: no space left on device but df -h shows free GB
Root cause: Inodes exhausted, or Docker’s data path is on a different filesystem than you checked.
Fix: df -i /var/lib/docker. Prune images/build cache, clean log directories, and add inode monitoring.
3) Symptom: failed to mount overlay: invalid argument
Root cause: Backing filesystem unsupported (e.g., XFS without ftype=1, NFS/CIFS), kernel/OverlayFS feature mismatch.
Fix: Move DockerRootDir to a supported local filesystem. On XFS, ensure ftype=1. Avoid “clever” mounts for layer stores.
4) Symptom: container start hangs; docker commands time out
Root cause: Stuck I/O (D-state), storage stalls, or filesystem errors causing overlay operations to block.
Fix: Check process state (ps ... state), blocked tasks in dmesg, storage health. Restarting Docker won’t unstick the kernel.
5) Symptom: containerd: failed to create task and shim exited unexpectedly
Root cause: runc crash, incompatible runtime binary, corrupted state dirs, or an underlying denial (LSM/cgroup).
Fix: Read containerd logs and kernel logs. Confirm runtime versions; don’t delete random state unless you know what you’re removing.
6) Symptom: containers exit instantly after start, no runtime error
Root cause: Application fails fast (missing config, DNS failure, unable to connect), or OOM kill.
Fix: Check container logs, healthcheck behavior, and kernel OOM logs. Don’t chase runc when the kernel just killed your app.
7) Symptom: Docker daemon won’t start after an update
Root cause: Broken daemon.json, incompatible storage driver config, leftover flags from old versions.
Fix: Validate JSON, check journalctl -u docker, temporarily revert to minimal config, then reintroduce settings one at a time.
8) Symptom: rootless containers fail with cgroup errors
Root cause: cgroup delegation not configured, systemd user session constraints, missing controllers for the user slice.
Fix: Verify cgroup v2 delegation and rootless prerequisites; a delegation sketch follows this list. Rootless is not a drop-in replacement; treat it as a different runtime.
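A sketch of the usual rootless checks and the delegation drop-in; the controller list follows the commonly documented recipe for rootless containers and should be trimmed to what your workloads actually need, and the change only takes effect after the user session is restarted:
cr0x@server:~$ cat /sys/fs/cgroup/user.slice/user-$(id -u).slice/user@$(id -u).service/cgroup.controllers
cr0x@server:~$ sudo mkdir -p /etc/systemd/system/user@.service.d
cr0x@server:~$ sudo tee /etc/systemd/system/user@.service.d/delegate.conf <<'EOF'
[Service]
Delegate=cpu cpuset io memory pids
EOF
cr0x@server:~$ sudo systemctl daemon-reload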
Checklists / step-by-step plan (don’t thrash)
Step-by-step triage when a container won’t start
- Capture the exact error with docker start and docker ps -a --no-trunc. Paste it somewhere durable.
- Check host health:
  - df -hT / /var/lib/docker and df -i /var/lib/docker
  - free -m and kernel OOM logs
  - dmesg -T | tail for obvious denials or filesystem errors
- Pull the logs from the source:
  - journalctl -u docker --since ...
  - journalctl -u containerd --since ...
  - journalctl -k --since ...
- Pick the axis based on evidence:
- Permission denied + audit logs → security axis
- Overlay mount errors + dmesg overlayfs → storage axis
- Cgroup errors + cgroup v2 hints → cgroup axis
- Hangs + D-state processes → storage/I/O axis
- Make one change, then immediately retest. No bundling “just in case” changes.
- Revert debug verbosity if you enabled it. Debug logs are a temporary flashlight, not your new lighting design.
Checklist for “should we restart docker/containerd?”
- Yes if: daemons are wedged but host is healthy, no D-state, no filesystem errors, and you can tolerate brief disruption.
- No if: kernel shows blocked tasks, overlay operations hang, storage errors are present, or the root filesystem is full. Fix the host first.
- Sometimes if: stuck shims exist for dead containers. Clean shutdown and restart might clear state, but only after you’ve confirmed I/O is healthy.
Checklist for storage hygiene (prevents half the incidents)
- Monitor inodes, not just bytes; a minimal cron sketch follows this checklist.
- Keep DockerRootDir on local storage with known-good filesystem features.
- Rotate journals and application logs; don't let /var/log become a landfill.
- Prune images/build cache on a schedule appropriate for your rollout cadence (with safeguards).
- Keep an eye on kernel messages for overlayfs warnings; they often appear before total failure.
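For the first item on this checklist, even a crude threshold alert beats finding out from runc. A sketch of a hypothetical cron entry; the 90 percent threshold, the path, and using logger as the "alert" are placeholders for whatever your monitoring actually ingests:
*/15 * * * * use=$(df --output=ipcent /var/lib/docker | tail -n 1 | tr -dc '0-9'); [ "$use" -ge 90 ] && logger -p user.warning "inode usage ${use} percent on /var/lib/docker"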
FAQ
1) Do I ever need to reinstall Docker to fix containerd/runc errors?
Rarely. Reinstalling can mask the symptom by resetting config/state, but it doesn’t fix disk exhaustion, LSM denials, cgroup delegation, or a filesystem that can’t do overlay operations. Prove the axis first.
2) How do I tell if it’s Docker, containerd, or runc?
Use logs. Docker will often say “OCI runtime create failed” (that’s runc). containerd logs will mention shim/task creation. Kernel/audit logs will tell you if the kernel refused an operation (mount, open, syscall, cgroup).
3) What’s the fastest way to debug “permission denied”?
Look for LSM denials: dmesg -T and audit logs. If you see AppArmor/SELinux messages, you have your answer. If not, consider filesystem permissions and seccomp.
4) What does “shim exited unexpectedly” actually mean?
It means containerd started a shim process to manage the container task, and it died. Causes include runc errors, corrupted state directories, or underlying kernel failures. It’s a symptom, not a root cause.
5) Why do overlay2 problems look like random runtime failures?
Because overlay2 sits under everything. If overlay mounts fail, runc can’t create a rootfs; if overlay operations block on I/O, processes hang; if xattrs/rename semantics aren’t supported, you get “invalid argument” in places that don’t mention storage.
6) How do I debug without breaking running containers?
Prefer read-only actions first: logs, docker info, ctr version, df, dmesg. Avoid daemon restarts until you’ve assessed whether the issue is localized or systemic. If storage is hanging, restarts can make things worse.
7) If I enable debug logging, what’s the risk?
Disk usage and noise. Debug logs can grow quickly during a failure loop. If you’re already near disk/inode limits, debug logging can be the final insult. Turn it on briefly, capture what you need, turn it back off.
8) Why does cgroup v2 cause container start failures?
Because delegation and controller enablement are stricter. Some setups assume cgroup v1 layout or rely on controllers that aren’t available to Docker/systemd slices. Mismatched cgroup drivers (systemd vs cgroupfs) can also bite.
9) Is rootless Docker harder to debug?
Different, not necessarily harder. Errors often point clearly to user namespace, cgroup delegation, or networking constraints. But you must debug it with rootless assumptions: different paths, different permissions, different limits.
10) What single metric would you add if you keep seeing runtime errors?
Inode usage on the filesystem containing DockerRootDir, plus kernel “blocked task” counts or alerts on repeated overlayfs warnings. Bytes-only monitoring is how you end up surprised.
Conclusion: next steps that actually help
When Docker/containerd/runc starts throwing errors, your job isn’t to find a magic command. Your job is to identify the failing axis—storage, cgroups, security policy, or runtime state—and then apply a targeted fix without detonating the rest of the node.
Practical next steps:
- Adopt the fast diagnosis playbook and keep it in your incident notes.
- Add monitoring for inode usage, not just disk bytes, on DockerRootDir.
- Make kernel/audit logs part of your standard container-start debugging flow.
- Keep DockerRootDir on a supported local filesystem; treat “shared fast storage” for layer stores as guilty until proven reliable.
- When you do change security policy (AppArmor/SELinux/seccomp), do it per-workload first. Global disables are how temporary mitigations become permanent regrets.
If you do all that, you’ll still have incidents—production always finds a way. But you’ll stop reinstalling your runtime like it’s a ritual, and start fixing the actual system.