Docker “Permission Denied” on Sockets and Devices: Capabilities vs Privileged, Done Right

November 15, 2025 • February 3, 2026

Nothing ruins a calm on-call shift like a container that boots fine, logs fine, and then faceplants on a single line: permission denied. It’s always the socket you mounted “like we always do,” or the device node you “definitely gave it.” The app team swears it worked on their laptop. You swear it can’t possibly be a permissions issue because you ran it as root. Both of you are wrong in interesting ways.

This is the practical, production-grade way to debug Docker permission failures on UNIX sockets and /dev devices—without reaching for --privileged like it’s a fire extinguisher and you’re bored.

The mental model: why “root” still gets denied

When a container says “permission denied,” your brain wants it to be classic UNIX permissions: user, group, mode bits. Sometimes it is. Often it’s not. In container land, access is a negotiation between several independent bouncers:

File permissions (UID/GID/mode) on the host filesystem (including bind mounts).
Namespace mapping (user namespaces, rootless Docker, remapped root) that changes what UID/GID inside the container means outside.
Linux capabilities (fine-grained “root powers” like CAP_NET_ADMIN).
Seccomp (syscall filtering) that can block operations with a misleading error.
LSMs (SELinux/AppArmor) that deny actions regardless of mode bits.
cgroups device controller (device allow/deny lists) that can block open() on device nodes.

These layers fail differently, and the fixes are not interchangeable. “Run it as root” addresses only one layer (UID/GID) and sometimes not even that if user namespaces are involved. “Use --privileged” bulldozes multiple layers at once and makes everything work right up until it makes something else catch fire.

Here’s the rule I want you to internalize: privileged is not a permission; it’s an environment change. It disables or relaxes multiple safeguards. That’s why it “works.” That’s also why it’s usually the wrong answer.

One quote that ages well in operations: Hope is not a strategy. — James Cameron

Joke #1: If --privileged is your debugging plan, you’re not debugging—you’re negotiating with the kernel using a foghorn.

Interesting facts and small history (useful, not trivia)

Capabilities aren’t new. Linux capabilities split “root” into discrete powers in the late 1990s; containers just made them visible to normal people.
Docker started as LXC glue. Early Docker relied on LXC; the container ecosystem later standardized around OCI runtime specs.
docker.sock is effectively root. Access to the Docker API typically grants control over the host, because you can mount the host filesystem or start privileged containers.
cgroups device filtering is older than Docker. The devices controller existed long before container hype; Docker simply uses it to keep /dev/mem adventures to a minimum.
Seccomp’s default Docker profile is conservative. Docker ships a default seccomp profile that blocks a bunch of syscalls. Many “permission” reports are actually syscall denials, and these usually surface as EPERM (“Operation not permitted”) rather than EACCES (“Permission denied”).
Rootless Docker changed the threat model. Rootless mode avoids a root daemon, but it also changes how devices and privileged operations behave. Many things simply can’t be done.
SELinux makes “permission denied” literal. On SELinux systems, DAC (mode bits) can say “allow” and SELinux can still say “nope.” Both are “permissions,” but different systems.
Bind-mounting a socket crosses trust boundaries. A container that can connect to a host socket gets whatever the service behind it can do (systemd, containerd, Docker, custom admin daemons).

Fast diagnosis playbook

You’re on the clock. Don’t poke randomly. Do this in order; each step narrows the failure class quickly.

1) Confirm what path is failing and where it lives

Is it a bind mount from the host? A socket created inside the container? A device node passed through?
Does the error happen on open(), connect(), or some higher-level library call?

2) Determine identity: UID/GID inside vs outside

Check the container process UID/GID.
Check the host file’s owner/mode and any group requirements (common with sockets).
If rootless or userns-remap is involved, assume UID mapping is the problem until proven otherwise.
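
You don’t have to infer the mapping from daemon settings. The kernel publishes it in /proc/self/uid_map, one range per line, as “first UID inside, first UID outside, count.” Here is a minimal sketch that turns those lines into a readable summary. It assumes awk is available wherever you run it (BusyBox ships one). describe_uid_map is a hypothetical helper name, not a Docker tool.

```shell
# describe_uid_map (hypothetical helper): summarize /proc/<pid>/uid_map read on stdin.
# Each line is "<first UID inside> <first UID outside> <count>".
# The identity line "0 0 4294967295" means no user namespace remapping.
describe_uid_map() {
  awk '{
    if ($1 == 0 && $2 == 0 && $3 == 4294967295)
      print "identity: container UIDs are host UIDs"
    else
      printf "container UID %d-%d -> host UID %d-%d\n", $1, $1 + $3 - 1, $2, $2 + $3 - 1
  }'
}
```

Run it on the host against the container’s view: docker exec myapp cat /proc/self/uid_map | describe_uid_map. Under userns-remap you would see something like 0 100000 65536, which is exactly the “UID 0 inside is nobody special outside” situation.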

3) Check the “policy layers”: LSMs and seccomp

If SELinux/AppArmor is enabled, look for AVC/AppArmor denials.
If the failure happens on something “privileged” (mount, setns, perf, bpf, raw sockets), suspect seccomp/capabilities.

4) For devices, check cgroup device permissions

If you can see /dev/... but cannot open it, it’s often cgroups device filtering or missing capability.

5) Only then consider adding capabilities or `--device`

Prefer --cap-add, --device, and explicit group fixes.
Reserve --privileged for rare cases, and treat it as a temporary diagnostic step, not a solution.
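
When the error text is all you have, the errno behind it narrows the candidate layers before you run anything. As a rule of thumb (not a guarantee), DAC checks and the LSMs (SELinux, AppArmor) fail with EACCES (“Permission denied”). Missing capabilities, Docker’s default seccomp profile, and the device cgroup usually fail with EPERM (“Operation not permitted”). The sketch below encodes that mapping as a shell function. triage_error is a hypothetical name, and the advice strings simply point back to the steps above.

```shell
# triage_error MESSAGE (hypothetical helper): suggest which layers to check first.
# Heuristic only: some kernel paths return the "other" errno (for example,
# binding a port below 1024 without CAP_NET_BIND_SERVICE returns EACCES).
triage_error() {
  local msg
  msg=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case $msg in
    *"permission denied"*|*eacces*)
      echo "EACCES: check UID/GID and mode bits, user namespace mapping, SELinux AVCs, AppArmor denials" ;;
    *"operation not permitted"*|*eperm*)
      echo "EPERM: check capabilities, seccomp, device cgroup rules" ;;
    *)
      echo "unclassified: capture the failing syscall (strace) before changing anything" ;;
  esac
}
```

For the Task 1 log line, it points at identity and labels first, which is the order the tasks below follow.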

Hands-on tasks: commands, output, decisions (12+)

These are the moves I actually use. Each task includes what the output means and the decision you make from it. Run them on the host unless noted.

Task 1: Reproduce the failure with maximum context

cr0x@server:~$ docker logs --tail=50 myapp
...snip...
Error: connect unix /var/run/docker.sock: permission denied
...snip...

What it means: You have a concrete failing path: /var/run/docker.sock. That’s a UNIX socket, so the failing operation is connect(), not open().

Decision: Focus on socket ownership, group, SELinux/AppArmor labels, and whether the container user is in the right group. Don’t start with capabilities.

Task 2: Identify the container’s effective user

cr0x@server:~$ docker exec myapp id
uid=10001(app) gid=10001(app) groups=10001(app)

What it means: The process is not root. It has no supplemental groups. If the socket requires a group (it usually does), this will fail.

Decision: Either run that specific process with a group that matches the socket, or redesign so the container doesn’t need host Docker access.

Task 3: Inspect the socket permissions on the host

cr0x@server:~$ ls -l /var/run/docker.sock
srw-rw---- 1 root docker 0 Jan  3 10:12 /var/run/docker.sock

What it means: Only root and members of group docker can connect. Mode 660. This is typical.

Decision: If you insist on mounting this socket, your container user needs group docker (GID match matters), or you need a proxy with a narrower API.

Task 4: Confirm the docker group GID (GID mismatch is a classic)

cr0x@server:~$ getent group docker
docker:x:998:cr0x

What it means: The host docker group has GID 998. Inside your container, group IDs may not align unless you make them.

Decision: Pass the group into the container (--group-add 998) or build the image so the container has a group with GID 998 and the process uses it.
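
Because only the number matters, it is safer to read the GID from the socket at deploy time than to hardcode 998, which is only correct on this host. This sketch assumes GNU coreutils or BusyBox stat on the host. socket_group_flag is a hypothetical helper.

```shell
# socket_group_flag PATH (hypothetical helper): print a --group-add flag for
# the numeric GID that owns PATH on this host, instead of a hardcoded value.
socket_group_flag() {
  local gid
  gid=$(stat -c '%g' "$1") || return 1
  printf '%s\n' "--group-add=$gid"
}
```

Usage, with myimage standing in for your image: docker run --rm "$(socket_group_flag /var/run/docker.sock)" -v /var/run/docker.sock:/var/run/docker.sock myimage.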

Task 5: Run a one-off container with explicit group-add to validate

cr0x@server:~$ docker run --rm -it \
  -v /var/run/docker.sock:/var/run/docker.sock \
  --group-add 998 \
  alpine:3.20 sh -lc 'id && apk add --no-cache docker-cli >/dev/null && docker ps >/dev/null && echo OK'
uid=0(root) gid=0(root) groups=0(root),998
OK

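What it means: The container can talk to the Docker API. Caveat: this one-off runs as root (UID 0), which already owns the socket. It proves the mount works, but not that group 998 is what grants access. To validate the group fix itself, repeat the test with --user set to the app’s non-root UID (10001 in Task 2). Use an image that already ships the docker CLI, because apk add needs root.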

Decision: If this “fix” is acceptable (often it isn’t), implement group mapping properly. Otherwise, treat any request to mount docker.sock as a security exception that needs review.

Task 6: Diagnose SELinux denials (if SELinux is enabled)

cr0x@server:~$ getenforce
Enforcing

What it means: SELinux policy is active. A plain bind mount can be blocked by labeling rules.

Decision: Check audit logs for AVC denials before you touch permissions or capabilities.

Task 7: Look for recent AVCs that match your container

cr0x@server:~$ sudo ausearch -m avc -ts recent | tail -n 5
type=AVC msg=audit(1735902901.123:812): avc:  denied  { connectto } for  pid=21456 comm="myapp" path="/var/run/docker.sock" scontext=system_u:system_r:container_t:s0:c123,c456 tcontext=system_u:object_r:docker_var_run_t:s0 tclass=unix_stream_socket permissive=0
...snip...

What it means: SELinux explicitly denied connectto on the socket. Even correct UNIX permissions won’t help.

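Decision: Fix labeling or policy, not mode bits. The :Z/:z mount options relabel content for container use and suit data directories the container owns. Don’t use them on a host daemon’s socket such as docker.sock: relabeling changes the socket’s label on the host, and :Z in particular makes it private to this one container. For a socket, prefer a targeted policy module (udica or audit2allow can generate one from these denials). As a documented exception, you can instead run that single container with --security-opt label=disable. Do not chmod the socket in frustration.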
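
When ausearch returns a wall of AVC records, the useful parts are the denied permission, the object class, and the two SELinux types. This sketch pulls those out with awk. summarize_avc is a hypothetical name. It assumes the extracted field values contain no spaces, which holds for the record above but is not guaranteed for every path.

```shell
# summarize_avc (hypothetical helper): condense AVC denial lines from stdin into
# "<perm> on <path> (<class>): <source type> -> <target type>".
summarize_avc() {
  awk '/avc: +denied/ {
    perm = ""; path = "-"; s = ""; t = ""; cls = ""
    # The denied permission set sits between "{ " and " }".
    if (match($0, /[{] [^}]* [}]/)) perm = substr($0, RSTART + 2, RLENGTH - 4)
    for (i = 1; i <= NF; i++) {
      if ($i ~ /^scontext=/) { split(substr($i, 10), a, ":"); s = a[3] }
      else if ($i ~ /^tcontext=/) { split(substr($i, 10), a, ":"); t = a[3] }
      else if ($i ~ /^tclass=/) cls = substr($i, 8)
      else if ($i ~ /^path=/) { path = substr($i, 6); gsub(/"/, "", path) }
    }
    printf "%s on %s (%s): %s -> %s\n", perm, path, cls, s, t
  }'
}
```

Usage: sudo ausearch -m avc -ts recent | summarize_avc. For the record above, it reports connectto on /var/run/docker.sock from container_t to docker_var_run_t. That type pair is what a policy module would need to cover.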

Task 8: Validate AppArmor profile (common on Ubuntu)

cr0x@server:~$ docker inspect --format '{{.AppArmorProfile}}' myapp
docker-default

What it means: The default AppArmor profile is applied. It can deny certain operations (less often sockets, more often mounts, perf, ptrace).

Decision: If you see AppArmor denials in dmesg, adjust with a custom profile or test with --security-opt apparmor=unconfined as a diagnostic only.

Task 9: Check seccomp mode

cr0x@server:~$ docker inspect --format '{{.HostConfig.SecurityOpt}}' myapp
[]

What it means: No custom security opts are set; Docker default seccomp profile is in play.

Decision: If the failure is a syscall denial (often shows as Operation not permitted), test with --security-opt seccomp=unconfined to confirm, then fix by adding only needed syscalls/capabilities.
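
To see whether a filter is actually attached to the running process (for example, to confirm that a seccomp=unconfined test took effect), check the Seccomp field of /proc/<pid>/status. The kernel reports 0 for disabled, 1 for strict, and 2 for filter. This is a sketch; seccomp_mode is a hypothetical name.

```shell
# seccomp_mode (hypothetical helper): read /proc/<pid>/status on stdin and
# print the seccomp mode of that process.
seccomp_mode() {
  awk '$1 == "Seccomp:" {
         found = 1
         if ($2 == 0) print "disabled"
         else if ($2 == 1) print "strict"
         else if ($2 == 2) print "filter"
         else print "unknown (" $2 ")"
       }
       END { if (!found) print "no Seccomp field (kernel without seccomp?)" }'
}
```

Usage: docker exec myapp cat /proc/1/status | seccomp_mode. Under the default profile, it should print filter. With seccomp=unconfined, it should print disabled.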

Task 10: For device access, check whether the device node exists inside the container

cr0x@server:~$ docker exec myvpn ls -l /dev/net/tun
crw-rw-rw- 1 root root 10, 200 Jan  3 10:12 /dev/net/tun

What it means: The device node is present. If the app still can’t use it, the problem is not “missing file.” It’s either cgroup device permissions or missing capabilities (commonly CAP_NET_ADMIN).

Decision: Verify device allow rules and required capabilities, not chmod.

Task 11: Confirm the container was started with the device allowed

cr0x@server:~$ docker inspect --format '{{json .HostConfig.Devices}}' myvpn
[{"PathOnHost":"/dev/net/tun","PathInContainer":"/dev/net/tun","CgroupPermissions":"rwm"}]

What it means: Docker has configured the device passthrough and device cgroup permissions for read/write/mknod.

Decision: If permissions still fail, you likely need a capability (CAP_NET_ADMIN) or you’re blocked by LSM/seccomp.

Task 12: Inspect container capabilities (effective set)

cr0x@server:~$ docker exec myvpn sh -lc 'apk add --no-cache libcap >/dev/null 2>&1; capsh --print | sed -n "1,8p"'
Current: cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot,cap_mknod,cap_audit_write,cap_setfcap=ep
Bounding set =cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot,cap_mknod,cap_audit_write,cap_setfcap
...snip...

What it means: No cap_net_admin. Many VPN/TUN workflows need it to configure interfaces or routing.

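Decision: Add --cap-add NET_ADMIN rather than --privileged. NET_RAW is already in the default set shown above, so it only needs adding back if something dropped it (for example --cap-drop ALL).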
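
If you’d rather not install libcap into a running container, the effective set is also exposed in /proc/<pid>/status as the hex bitmask CapEff. It holds one bit per capability number from linux/capability.h. This sketch tests a named bit. cap_bit and has_cap are hypothetical helpers, and the table covers only the capabilities this article mentions. For reference, Docker’s default set shown above encodes as 00000000a80425fb.

```shell
# cap_bit NAME (hypothetical helper): capability number from linux/capability.h.
# Partial table: only the capabilities discussed in this article.
cap_bit() {
  case $1 in
    CHOWN) echo 0 ;;
    DAC_OVERRIDE) echo 1 ;;
    NET_BIND_SERVICE) echo 10 ;;
    NET_ADMIN) echo 12 ;;
    NET_RAW) echo 13 ;;
    SYS_ADMIN) echo 21 ;;
    SYS_TIME) echo 25 ;;
    MKNOD) echo 27 ;;
    PERFMON) echo 38 ;;
    *) return 1 ;;
  esac
}

# has_cap HEXMASK NAME (hypothetical helper): exit 0 if NAME's bit is set,
# 1 if it is clear, 2 if NAME is not in the table.
has_cap() {
  local bit
  bit=$(cap_bit "$2") || return 2
  [ $(( (0x$1 >> $bit) & 1 )) -eq 1 ]
}
```

Usage: has_cap "$(docker exec myvpn awk '/^CapEff:/ {print $2}' /proc/1/status)" NET_ADMIN && echo present || echo missing.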

Task 13: Validate by running with minimal added capability

cr0x@server:~$ docker run --rm -it \
  --device /dev/net/tun \
  --cap-add NET_ADMIN \
  alpine:3.20 sh -lc 'apk add --no-cache iproute2 >/dev/null && ip tuntap add dev tun0 mode tun && echo OK'
OK

What it means: With NET_ADMIN plus the device, the operation succeeds.

Decision: Bake those requirements into the run configuration (Compose/Kubernetes securityContext equivalent). Don’t escalate further unless something else is blocked.

Task 14: Check kernel/audit logs for “denied” hints (LSM/seccomp)

cr0x@server:~$ dmesg | tail -n 8
[1735902910.441] audit: type=1400 audit(1735902910.441:813): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="docker-default" name="/sys/fs/cgroup/" pid=21999 comm="runc"
...snip...

What it means: AppArmor denied a mount. The error is -13 (EACCES), which looks like a permissions issue at user level.

Decision: You’re not fixing this with chmod. You need an AppArmor change, different mount strategy, or to stop doing that mount from inside the container.
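
AppArmor records carry the profile, the operation, and the target in key="value" fields. This sketch condenses DENIED lines so you can see at a glance whether docker-default is the profile in the way. summarize_apparmor is hypothetical. Like most quick log parsers, it assumes the values it extracts contain no spaces.

```shell
# summarize_apparmor (hypothetical helper): condense apparmor="DENIED" lines
# from stdin into "<profile> denied <operation> on <name> (comm=<comm>)".
summarize_apparmor() {
  awk '/apparmor="DENIED"/ {
    op = "?"; prof = "?"; name = "?"; comm = "?"
    for (i = 1; i <= NF; i++) {
      if ($i ~ /^operation=/) op = substr($i, 11)
      else if ($i ~ /^profile=/) prof = substr($i, 9)
      else if ($i ~ /^name=/) name = substr($i, 6)
      else if ($i ~ /^comm=/) comm = substr($i, 6)
    }
    gsub(/"/, "", op); gsub(/"/, "", prof); gsub(/"/, "", name); gsub(/"/, "", comm)
    printf "%s denied %s on %s (comm=%s)\n", prof, op, name, comm
  }'
}
```

Usage: dmesg | summarize_apparmor. For the line above, it prints docker-default denied mount on /sys/fs/cgroup/ (comm=runc).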

Task 15: Confirm whether you’re in rootless mode (and stop expecting magic)

cr0x@server:~$ docker info --format '{{.SecurityOptions}}'
[name=seccomp,profile=default name=rootless]

What it means: Rootless is enabled. Many device and low-level operations won’t work the same way, or at all.

Decision: If you need raw device access or kernel networking features, rootless might be the wrong platform for this workload. Pick your battles.

Capabilities vs privileged: what you actually need

--privileged is the sledgehammer: all capabilities, all devices, and a bunch of security restrictions relaxed. It’s great for proving a point and terrible for running production. Capabilities are the scalpel: you add exactly what the process needs and keep the rest of the sandbox intact.

What `--privileged` really changes

Exact behavior varies by Docker version and kernel, but in practice privileged mode commonly does all of the following:

Adds all capabilities to the container (or close to it), expanding the effective and bounding sets.
Disables the device cgroup filter: the container can access host devices broadly.
Turns off the default seccomp profile and the default AppArmor profile. On SELinux hosts, it also drops the usual container label confinement. This is not a universal LSM bypass, but it changes the game.
Makes it easier for processes to do mounts, manipulate network stack, load kernel modules (still often requires host cooperation), and poke at namespaces.

That’s why it “fixes” permission denied. It’s also why it can turn a compromised app into a host compromise with very few extra steps.

Common capability needs by symptom

Need to create TUN/TAP, set routes, configure iptables: CAP_NET_ADMIN (and sometimes CAP_NET_RAW).
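Need to bind to ports <1024: CAP_NET_BIND_SERVICE (in Docker’s default set). Docker 20.10 and later also set net.ipv4.ip_unprivileged_port_start=0 in the container’s network namespace, so non-root processes can usually bind low ports without it.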
Need to adjust time or clock settings: CAP_SYS_TIME (try hard to avoid).
Need to mount filesystems: CAP_SYS_ADMIN (this is the “everything bagel” capability; avoid if possible).
Need to use perf events: controlled by the kernel.perf_event_paranoid sysctl, plus CAP_PERFMON (kernel 5.8+) or CAP_SYS_ADMIN on older kernels; also often blocked by seccomp.
Need to change ownership on bind mounts: might need CAP_CHOWN but usually you should fix ownership on the host or use the right UID mapping instead.

The boring guidance that works

If you’re tempted by --privileged, first ask:

Can I solve this with UID/GID alignment or group membership?
Can I solve it with one capability and maybe one device mapping?
Can I solve it by moving the privileged action to a sidecar/agent with a tight API?
Is the real issue that we’re trying to do host administration from inside an app container?

UNIX sockets: docker.sock, container runtimes, and your own sockets

UNIX domain sockets are files, but they aren’t regular files. “Permissions” apply at connect time, and the service behind the socket decides what to do once you connect. That means two different control planes:

Filesystem permissions control who can connect.
Service authorization (if any) controls what you can do after connecting.

docker.sock: the foot-gun with a great marketing campaign

Mounting /var/run/docker.sock into a container is common. It’s also a security boundary collapse. If an attacker gets code execution in that container, they can often create a new container with the host filesystem mounted and call it “debugging.”

So when you fix permission denied on docker.sock, you’re not just solving a technical problem. You’re granting host control. Treat it like production SSH keys: tightly scoped, audited, and rarely needed.

Group and GID: the “works on my laptop” generator

The docker socket is usually owned by group docker. Inside a container, the name “docker” is irrelevant; the numeric GID is what matters. If the socket is root:docker with GID 998, your container process must be in GID 998 to connect.

There are three sane options:

Use --group-add with the host GID. Good for quick fixes, but you must document why.
Create a group in the image with the same numeric GID and run the process with that group. Better for reproducibility.
Don’t mount the socket. Use a narrow proxy or redesign. Best for security.

Your own service sockets (Postgres, Redis, system daemons)

Same pattern: the socket file on the host has owner/group/mode. The container user must match those permissions as seen by the host. If you bind mount /run/postgresql/.s.PGSQL.5432 into a container, the client process inside must have rights to connect based on that socket’s mode bits.

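Also: many services put sockets under /run, which is tmpfs and recreated at boot. Many daemons, Postgres included, also remove and recreate their socket file on every restart. If you “fixed” the mode bits once, you didn’t fix anything. You left a note for future you to be disappointed. The same churn is why you should bind-mount the socket’s directory (/run/postgresql) rather than the socket file. A file bind mount pins the inode that existed when the container started, so after a daemon restart the container is left holding a dead socket.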

Devices: /dev, cgroups, and why mknod isn’t your friend

Device nodes under /dev are special files that represent kernel interfaces. Access is controlled by:

UNIX permissions on the device node (owner/group/mode).
The device cgroup allow list (major/minor + r/w/m).
Capabilities needed to perform the operation (network admin, raw io, etc.).
LSMs and seccomp for certain sensitive operations.
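
The device cgroup does not know paths. Its allow list is keyed on device type and major:minor numbers, which is why passing the node and allowing the device are separate things. This sketch derives that pair from a node, assuming stat -c with %t/%T (GNU coreutils and BusyBox both support them). device_rule is a hypothetical name.

```shell
# device_rule PATH (hypothetical helper): print "<type> <major>:<minor>" for a
# device node, the form device cgroup rules use. stat's %t/%T are hex.
device_rule() {
  local type maj min
  if [ -c "$1" ]; then type=c
  elif [ -b "$1" ]; then type=b
  else echo "not a device node: $1" >&2; return 1
  fi
  maj=$(stat -c '%t' "$1") && min=$(stat -c '%T' "$1") || return 1
  echo "$type $((0x$maj)):$((0x$min))"
}
```

On the host, device_rule /dev/net/tun should print c 10:200, matching the 10, 200 in Task 10’s ls -l output. Docker’s --device-cgroup-rule flag takes the same pair plus permissions (for example 'c 10:200 rwm'). That rule only grants access; it does not create the node inside the container, while --device does both.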

The most common device permission failures

/dev/net/tun: You passed the device but forgot CAP_NET_ADMIN, or you’re running rootless and expected miracles.
/dev/fuse: You passed the device, but the fuse kernel module isn’t loaded on the host, or the mount itself is blocked. In a rootful container, mounting a FUSE filesystem needs CAP_SYS_ADMIN and is denied by Docker’s default AppArmor profile.
GPU devices (/dev/nvidia*, /dev/dri): Group membership (often video or render) and runtime hooks matter as much as the device nodes.
Block devices: If you’re trying to mount or format disks from inside a container, stop and think about blast radius. Then think again.

`--device` is precise; use it

For device access, prefer:

--device /dev/net/tun (or the specific device)
--cap-add for the capability that authorizes the associated kernel operation
Explicit group ownership (e.g., add to render for DRM devices) when applicable

Do not throw --privileged at a single missing device. If your app needs one device, give it one device. If it needs twenty devices, your app might not be an app. It might be an agent, and agents deserve different review.

Joke #2: The kernel doesn’t care that your container is “just trying to help.” It’s like HR: policy first, feelings later.

SELinux, AppArmor, seccomp: the “looks like permissions” layer cake

Mode bits are just the outer crust. The filling is policy.

SELinux: when chmod is performance art

On SELinux systems, a denial is often logged as an AVC. The process has a security context (like container_t) and the target has a type (like docker_var_run_t). If policy says “no,” it’s no.

Operationally, the common fix for bind mounts is correct labeling:

:Z to relabel content for exclusive container use
:z to relabel for shared container use

If you don’t relabel and SELinux is enforcing, you can see a socket and still be blocked from connecting to it. That’s not Docker being weird. That’s SELinux doing its job.

AppArmor: quieter, but still sharp

AppArmor denials show up in dmesg and can block mounts, ptrace, and other sensitive operations. It can manifest as permission denied or operation not permitted, depending on the call and how it fails.

The “easy” test is --security-opt apparmor=unconfined. The “correct” fix is to write an AppArmor profile that allows exactly what you need. If your container needs to mount arbitrary things, that’s a smell, not a profile requirement.

Seccomp: the invisible hand that returns “EPERM”

Seccomp filters syscalls. When a call is blocked, you usually get EPERM (Operation not permitted) and a bug report that says “permissions.” Sometimes that label is fair, because a missing capability is a permission problem. Sometimes it’s a blocked syscall that has nothing to do with file permissions.

When troubleshooting, temporarily run with unconfined seccomp to confirm the diagnosis, then either adjust the profile or change the approach. A classic example is workloads trying to use newer syscalls that the default profile doesn’t allow.

Rootless Docker: different rules, same pain

Rootless Docker is great for reducing host risk. It’s also a reality check: if your app needs to perform actions that normally require root on the host (devices, low-level network config, mounts), rootless mode will make that hard or impossible.

How rootless changes the permission story

User namespaces are not optional. UID/GID inside the container are mapped to non-root IDs on the host.
Device access is limited. Even if you can see device nodes, operations may be blocked because the underlying privileges aren’t available.
Networking can be different. Depending on setup, you might be using slirp4netns or similar, changing what “network admin” even means.

Rootless is not “Docker but safer.” It’s “Docker with different constraints.” If you’re running storage engines, VPN endpoints, or anything that wants to be part of the kernel, rootless may be the wrong tool.

Three corporate mini-stories from the trenches

Incident: the wrong assumption (root inside container == root on host)

A team rolled out a containerized log shipper that needed to read rotated logs from a host path. In staging, they ran the container as root and bind-mounted /var/log. It worked. The change sailed through code review because “it’s read-only.”

Production ran with user namespace remapping enabled on the Docker daemon. Inside the container, the process was UID 0. On the host, it mapped to an unprivileged UID range. Suddenly the shipper started failing with permission denied on some rotated files and not others. The errors were intermittent because ownership of rotated logs varied by service and rotation job.

They tried the usual incantations: chmod on the host, restart the container, run as root (it already was), add --privileged (which did not help consistently because the userns mapping still applied). Meanwhile, alerting started going blind because the shipper was selectively dropping logs from the exact services you want during an incident.

The fix was boring: stop assuming UID 0 means anything across a namespace boundary. They aligned ownership by using ACLs on the host log directories for the remapped UID range, and they changed the shipper to run as a specific UID that matched the host policy. They also added a canary check that reads a known log path at startup and fails fast, instead of “best-effort” silently losing logs.

Postmortem takeaway: containers didn’t “break permissions.” The team’s assumption did. Namespaces are not a vibe; they’re math.

Optimization that backfired: “just mount docker.sock to avoid deploying agents”

A platform team wanted faster CI jobs. Their idea: run build containers that talk to the host Docker daemon via /var/run/docker.sock. No nested virtualization, no separate builders, less overhead. Everyone cheered, because it was fast and cheap and “industry standard.”

Then a developer added a dependency that executed a post-install script from a third-party package. The script wasn’t malicious on purpose; it was just sloppy and assumed it could introspect the environment. It queried the Docker API, noticed it had control, and started a helper container. That helper container mounted host paths to “cache” things. The cache included credentials and config files that were never supposed to be visible to builds.

Security caught it during a routine review because the build logs started containing odd environment details. Nothing was exploited beyond that, but it was a near miss. The underlying issue was straightforward: mounting docker.sock is equivalent to giving the container an administrative API for the host.

The remediation was a controlled builder service with a narrow API: “build this repo at this commit with this config.” No general Docker API exposure. Builds became slightly slower. The organization slept better. And the platform team stopped treating docker.sock as a convenience feature.

Lesson: if your “optimization” shortcuts a trust boundary, it’s not an optimization. It’s a debt instrument with a variable interest rate.

Boring but correct practice that saved the day: explicit capabilities and preflight checks

A networking product group ran containers that set up TUN interfaces and applied routing rules. Early prototypes were all --privileged because the goal was shipping a demo, not impressing the kernel. When the product moved toward production, an SRE insisted on writing down the minimum set: --device /dev/net/tun, --cap-add NET_ADMIN, plus a couple of sysctls managed outside the container.

It felt pedantic. It also forced the team to document exactly which operations the container performed and where. They added a startup preflight: verify /dev/net/tun is present, verify CAP_NET_ADMIN exists (via a small self-check), and log a crisp error if not.

Months later, a routine host hardening change adjusted the default seccomp profile for a subset of nodes. A few containers started failing at boot with permission-like errors. Because the preflight was explicit, the incident triage was fast: the logs said “NET_ADMIN missing” on the affected nodes. It wasn’t missing; it was being clipped by a misapplied policy. They rolled back the policy and then fixed the rollout process.

Nothing heroic happened. No one SSH’d into boxes at 3 a.m. The system told the truth early, and the fix was obvious. That’s the kind of boring you want in production.

Common mistakes: symptom → root cause → fix

1) Symptom: “permission denied” connecting to /var/run/docker.sock

Root cause: container user not in the socket’s group (or GID mismatch), or SELinux denies connect.

Fix: align GID with --group-add or image group creation; on SELinux use proper labeling or policy. And reconsider whether you should mount docker.sock at all.

2) Symptom: device node exists, but open() fails with “permission denied” or, typical of device cgroup denials, “Operation not permitted”

Root cause: device cgroup denies it, or missing capability for the associated operation.

Fix: pass device via --device with rwm; add the minimal capability (often NET_ADMIN for TUN). Validate with docker inspect and capsh.

3) Symptom: chmod 666 still doesn’t work (socket or file)

Root cause: SELinux/AppArmor denial, or the object isn’t the one you think (e.g., new socket recreated under /run).

Fix: check AVC/AppArmor logs, label mounts correctly, and fix the service that creates the socket (systemd unit permissions) instead of chasing transient files.

4) Symptom: works with --privileged, fails without it

Root cause: you need one of: a device mapping, a capability, a seccomp allowance, or a mount permission.

Fix: bisect: add --device first (if relevant), then add a single capability, then test seccomp unconfined to confirm. Replace privileged with explicit requirements.
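
The bisect can be scripted so it stops at the first configuration that works. This sketch assumes the TUN example from Tasks 10 to 13; the step list is an assumption, so swap in the device and capability your workload actually uses. It reads a DOCKER variable so the runner can be stubbed out. bisect_privileged is hypothetical. Steps are cumulative. The seccomp=unconfined step is diagnostic only: if it is the one that works, the next move is a tailored profile, not shipping unconfined.

```shell
# bisect_privileged IMAGE [CMD...] (hypothetical helper): retry a command with
# cumulative flags and report the first set that makes it succeed.
bisect_privileged() {
  [ $# -ge 1 ] || { echo "usage: bisect_privileged IMAGE [CMD...]" >&2; return 2; }
  local image=$1 docker_cmd=${DOCKER:-docker} extra="" step
  shift
  # Baseline, then the device, then one capability, then seccomp (diagnostic).
  for step in "" "--device /dev/net/tun" "--cap-add NET_ADMIN" \
              "--security-opt seccomp=unconfined"; do
    extra="${extra:+$extra }$step"
    # $extra is deliberately unquoted so each flag becomes a separate word.
    if $docker_cmd run --rm $extra "$image" "$@" >/dev/null 2>&1; then
      echo "works with: ${extra:-no extra flags}"
      return 0
    fi
  done
  echo "still failing; check SELinux/AppArmor logs before reaching for --privileged"
  return 1
}
```

Usage: bisect_privileged alpine:3.20 sh -c 'apk add --no-cache iproute2 >/dev/null && ip tuntap add dev tun0 mode tun' replays Task 13 one step at a time.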

5) Symptom: rootless containers can’t access /dev/kmsg, /dev/net/tun, or mount filesystems

Root cause: rootless mode lacks host-root privileges by design.

Fix: don’t run these workloads rootless. Use a rootful node pool for privileged workloads, or move the operation to the host via an agent.

6) Symptom: “Operation not permitted” when doing mount or setns

Root cause: blocked by seccomp/AppArmor or missing CAP_SYS_ADMIN.

Fix: avoid doing it inside the container. If unavoidable, use a custom seccomp/AppArmor profile and explicitly justify the capability. Treat CAP_SYS_ADMIN as “privileged-lite.”

7) Symptom: container can see socket but connect fails only on some hosts

Root cause: host-level policy differences (SELinux enforcing on some, different GIDs, different systemd socket unit permissions).

Fix: standardize host configuration; add preflight checks that log host socket ownership/GID at container start; treat drift as an incident cause.
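
A socket preflight can be a few lines of shell in the entrypoint. It logs what the container actually sees, then refuses to start if file permissions would deny connect(). Linux requires write permission on a socket file to connect to it, so test -w is the check. This sketch assumes stat -c and id in the image; preflight_socket is a hypothetical name. It tests file permissions only. An SELinux connectto denial can still fail the real connect(), so the AVC check stays in the runbook.

```shell
# preflight_socket PATH (hypothetical helper): log the socket's ownership and
# the process identity, then fail fast if the process can't write to it.
preflight_socket() {
  local path=$1
  if [ ! -S "$path" ]; then
    echo "preflight: $path is missing or not a socket" >&2
    return 1
  fi
  echo "preflight: $path $(stat -c 'owner=%u group=%g mode=%a' "$path");" \
       "process uid=$(id -u) groups=$(id -G)" >&2
  if [ ! -w "$path" ]; then
    echo "preflight: no write permission on $path; refusing to start" >&2
    return 1
  fi
}
```

In an entrypoint, run preflight_socket /var/run/docker.sock || exit 1 before exec’ing the app, so host drift shows up as one precise log line.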

8) Symptom: can access file paths but not when they’re bind-mounted

Root cause: mount options/labels (SELinux), or userns mapping changes ownership semantics.

Fix: label with :Z/:z on SELinux; ensure UID/GID mapping matches; consider using named volumes with managed ownership if appropriate.

Checklists / step-by-step plan

Checklist A: Fix a UNIX socket “permission denied” without going privileged

Identify the socket path from logs or strace-equivalent debug in the app.
On the host: ls -l the socket and capture owner/group/mode.
In the container: check id for UID/GIDs.
Align group access by numeric GID:
- Prefer --group-add <gid> as a fast test.
- For production, create the group in the image with the same GID and run the process in it.
If SELinux enabled: check AVC logs; label mount correctly.
If AppArmor enabled: check dmesg for denials; adjust profile if needed.
Document the risk if the socket is administrative (docker.sock, containerd, systemd).

Checklist B: Fix /dev device “permission denied” the right way

Confirm device node exists inside container (ls -l).
Confirm device is passed through (docker inspect .HostConfig.Devices).
Determine required capability for the operation (TUN and routes: NET_ADMIN, raw sockets: NET_RAW).
Add one capability at a time and retest.
Check LSM/seccomp logs if the failure persists.
Refuse CAP_SYS_ADMIN by default. If someone asks for it, make them explain the syscall and the alternative.
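
The preflight from the TUN story earlier fits in a short entrypoint function. It checks the node, checks CAP_NET_ADMIN (capability number 12) in CapEff, and fails with a message that names the missing piece. This is a sketch; preflight_tun is a hypothetical name, and its optional arguments exist only so the check can be pointed at fixture files. By default it reads the shell’s own /proc/$$/status. That reflects the app’s capabilities only if the app runs as the same user the entrypoint does.

```shell
# preflight_tun [STATUS_FILE] [DEVICE] (hypothetical helper): verify the TUN
# device node and CAP_NET_ADMIN before the app starts.
preflight_tun() {
  local status=${1:-/proc/$$/status} dev=${2:-/dev/net/tun} capeff
  if [ ! -c "$dev" ]; then
    echo "preflight: $dev missing or not a character device (start with --device $dev)" >&2
    return 1
  fi
  capeff=$(awk '$1 == "CapEff:" { print $2 }' "$status")
  # CAP_NET_ADMIN is capability number 12 in linux/capability.h.
  if [ -z "$capeff" ] || [ $(( (0x$capeff >> 12) & 1 )) -ne 1 ]; then
    echo "preflight: CAP_NET_ADMIN missing from effective set (CapEff=${capeff:-?})" >&2
    return 1
  fi
  echo "preflight: $dev present, CAP_NET_ADMIN effective"
}
```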

Checklist C: Safe use of docker.sock (if you absolutely must)

Threat model it: assume container compromise equals host compromise.
Limit who can deploy it and where it can run (dedicated nodes, tight network policies).
Run as non-root and add only the docker socket group by GID.
Prefer a proxy that exposes only necessary endpoints, not full Docker API.
Add monitoring for unexpected container creation and host mounts initiated via that socket.

FAQ

1) Why does a container running as root still get “permission denied”?

Because “root” inside a container may not be root on the host (user namespaces), and because LSMs, seccomp, and cgroup device rules can deny access regardless of UID.

2) Should I ever mount /var/run/docker.sock into a container?

Rarely. Treat it as granting administrative control of the host. If you do it, use explicit GID mapping, strict deployment controls, and document the exception.

3) What’s the difference between `--cap-add` and `--privileged`?

--cap-add grants one specific kernel capability. --privileged grants a broad set of capabilities, relaxes device restrictions, and weakens isolation in multiple ways. One is a scalpel; the other is a forklift.

4) My app needs `CAP_SYS_ADMIN`. Is that acceptable?

Assume “no” until proven otherwise. CAP_SYS_ADMIN covers a huge range of operations (including mounts and namespace interactions). Often there’s an alternative: do the mount on the host, use a volume plugin, or redesign the workflow.

5) Why does it work on my laptop but not in prod?

Different host policies: SELinux enforcing in prod, AppArmor profile differences, different docker group GID, userns-remap enabled, or different kernel/seccomp versions. Containers are portable; host policies are not.

6) How do I tell if SELinux is the problem?

If getenforce is Enforcing and you see AVC denials in audit logs referencing your container context and the target object, SELinux is the cause. Fix labeling or policy; don’t chmod your way out.

7) What’s the safest way to give a container access to a host UNIX socket?

Ensure the socket serves a non-admin API; set the socket mode to require a dedicated group; add only that numeric GID to the container; avoid running the whole container as root. If it’s an admin socket, strongly reconsider.

8) How do device permissions differ from normal file permissions?

Device nodes also go through the device cgroup allow list and often require capabilities for the associated privileged kernel operations. Seeing /dev/net/tun doesn’t mean you can use it.

9) Is rootless Docker a solution to these permission problems?

It reduces risk, but it also removes the ability to do many privileged things. If your workload needs devices or kernel network configuration, rootless likely makes it harder, not easier.

Next steps you can actually do

If you’re staring at “permission denied” today, stop guessing and start classifying:

Is it a socket or a device? That choice determines the fastest path.
Check identity (UID/GID and mapping) before you touch capabilities.
Check SELinux/AppArmor/seccomp logs when chmod doesn’t change outcomes.
Replace privileged with explicit requirements: one --device, one --cap-add, one --group-add, plus labeling if needed.
Write preflight checks into containers that depend on host integrations, so failures are loud and precise.

And if someone asks you to “just mount docker.sock,” ask what problem they’re actually solving. Most of the time, the right fix isn’t permissions. It’s architecture.