“Permission denied” on a Docker volume is never just a permissions problem. It’s a mismatch-of-identities problem, dressed up as a filesystem error, and it usually shows up at the worst possible time: during a deploy, a migration, or a “small change” someone swore was safe.
If you run containers in production, you don’t get to treat UID/GID as trivia. Containers write files. Hosts enforce ownership. Network storage adds opinions. Security hardening adds more opinions. Your job is to make those opinions line up.
Why this keeps happening (it’s identity, not magic)
On Linux, file permissions are enforced by numeric IDs: UID for the owner, GID for the group. Names like www-data are just a lookup table. When a process in a container writes to a mounted directory, the host filesystem sees a UID and GID, not “the container user.”
Here’s the core problem: the container’s idea of UID 1000 might be “appuser,” but the host’s idea of UID 1000 might be “alice,” and your NFS server might decide it’s “nobody.” Same number, different meaning, or same meaning, different number. Either way, the host says: no.
Docker volumes make this visible because they are literally filesystems (bind mounts) or host-managed directories (named volumes). Your container isn’t a tiny VM with its own kernel and its own permission universe. It’s just processes with a constrained view of the host. The kernel is the bouncer, and it checks IDs, not excuses.
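You can watch the kernel’s number-based view without Docker at all. A minimal sketch using coreutils (`stat -c '%u'` prints the raw UID stored on disk):

```shell
# Ownership is stored as numbers; names are resolved separately via /etc/passwd (NSS).
f=$(mktemp)                     # a file created by the current user
file_uid=$(stat -c '%u' "$f")   # numeric owner, exactly as the kernel sees it
my_uid=$(id -u)                 # this shell's numeric UID
echo "file uid=$file_uid, my uid=$my_uid"
rm -f "$f"
```

Rename the user in /etc/passwd and the number on disk does not move. That is the whole volume-permissions story in miniature.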
One more sharp edge: many official container images are built to run as root by default. That “works” until you mount a host path owned by a normal user, or until you harden the container to run as non-root, or until your storage backend refuses root-squashed writes. Then you discover that the “easy” path was just technical debt with a countdown timer.
Short joke #1: Containers don’t “have permissions.” They borrow yours, ruin the carpet, and leave you holding the security report.
Facts and history that explain today’s pain
- Unix permissions are older than most of your infrastructure: UID/GID enforcement goes back to early Unix; the model is simple, durable, and indifferent to containers.
- Docker’s early defaults favored convenience: early Docker workflows pushed “run as root” because it avoided friction; production security later made that choice expensive.
- Bind mounts predate containers by decades: the kernel doesn’t care that a mount came from Docker; it enforces the same ownership rules.
- User namespaces existed long before they were popular: Linux user namespaces landed years ago, but adoption lagged because they’re powerful and easy to misconfigure.
- NFS root squash is an intentional foot-gun remover: many NFS exports map remote root to nobody by design, specifically to prevent container-root from becoming storage-root.
- SELinux/AppArmor changed the game: modern distros can deny access even when UID/GID look correct, because MAC (mandatory access control) sits above classic DAC permissions.
- Overlay filesystems add confusing symptoms: overlay2 can make it look like permissions “randomly changed,” when you’re actually seeing merged layers and opaque directories.
- Kubernetes normalized non-root workloads: once clusters started enforcing “runAsNonRoot,” the industry stopped getting away with sloppy UID/GID assumptions.
There’s a reason this hasn’t been “fixed” by a better container runtime: the kernel is doing exactly what it’s supposed to do.
One operations quote worth keeping on your desk: “Hope is not a strategy.”
— General Gordon R. Sullivan
Fast diagnosis playbook (first/second/third)
First: identify who the process thinks it is
If you don’t know the effective UID/GID inside the container, you’re guessing. Get the numbers. Confirm which user the app process actually runs as, not what the Dockerfile implies.
Second: identify what the host path is and who owns it
“Volume” might mean bind mount, named volume, NFS, CIFS, or a plugin-managed mount. Each has different ownership and ACL behavior. Determine the real backing path and its ownership, mode bits, ACLs, and any MAC labels.
Third: check the policy layers above POSIX permissions
When mode bits look fine but it still fails, it’s usually SELinux, AppArmor, or root squash on network storage. Don’t spend an hour chmod’ing your way into the wrong ditch.
Quick decision tree
- Works without volume, fails with volume: identity mismatch or storage policy.
- Works as root, fails as non-root: directory ownership/mode bits wrong for the intended UID/GID.
- Works on local disk, fails on NFS/CIFS: root squash, idmapping, or server-side permissions/ACLs.
- Only fails on Fedora/RHEL: suspect SELinux labeling first.
Practical tasks: commands, outputs, and decisions (12+)
These are the commands I actually run during incidents. Each task includes: command, what the output means, and the decision you make.
Task 1: confirm the container user (effective UID/GID)
cr0x@server:~$ docker exec -it app sh -lc 'id; umask'
uid=10001(app) gid=10001(app) groups=10001(app),10002(shared)
0022
Meaning: The app writes as UID 10001/GID 10001, default umask 0022 (files 644, dirs 755 unless overridden).
Decision: The mounted directory must be writable by UID 10001 or a group the process is in (like GID 10002), or via ACL.
Task 2: confirm what’s mounted where (inside container)
cr0x@server:~$ docker exec -it app sh -lc 'mount | sed -n "1,5p"; mount | grep -E "/data|/var/lib"'
overlay on / type overlay (rw,relatime,lowerdir=...,upperdir=...,workdir=...)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev type tmpfs (rw,nosuid,size=65536k,mode=755,inode64)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=666)
sysfs on /sys type sysfs (ro,nosuid,nodev,noexec,relatime)
/dev/sdb1 on /data type ext4 (rw,relatime)
Meaning: /data is a real mount, ext4, not an overlay path inside the container.
Decision: Troubleshooting must involve host filesystem ownership/modes for the ext4 mount point.
Task 3: inspect the Docker mount configuration (host view)
cr0x@server:~$ docker inspect app --format '{{json .Mounts}}'
[{"Type":"bind","Source":"/srv/app/data","Destination":"/data","Mode":"rw","RW":true,"Propagation":"rprivate"}]
Meaning: It’s a bind mount from /srv/app/data. That directory’s permissions are the issue, not Docker’s volume driver.
Decision: Fix ownership/mode/ACL on /srv/app/data, or change the container’s runtime UID/GID.
Task 4: inspect host directory ownership, perms, and ACLs
cr0x@server:~$ sudo ls -ldn /srv/app/data
drwxr-x--- 2 0 0 4096 Jan 2 10:11 /srv/app/data
cr0x@server:~$ sudo getfacl -p /srv/app/data
# file: /srv/app/data
# owner: root
# group: root
user::rwx
group::r-x
other::---
Meaning: Root-owned, group root, and “other” has no access. UID 10001 cannot write, or even read.
Decision: Either chown/chgrp to match the container identity, or grant access via group membership/ACL. Don’t chmod 777 unless you enjoy incident reviews.
Task 5: reproduce the failure with a write test (inside container)
cr0x@server:~$ docker exec -it app sh -lc 'touch /data/.permtest && echo OK'
touch: cannot touch '/data/.permtest': Permission denied
Meaning: It’s not your application. It’s basic write access failing.
Decision: Fix filesystem access first; don’t reconfigure the app until a simple touch works.
Task 6: check if SELinux is denying (host)
cr0x@server:~$ getenforce
Enforcing
cr0x@server:~$ sudo ls -ldZ /srv/app/data
drwxr-x---. 2 root root unconfined_u:object_r:default_t:s0 4096 Jan 2 10:11 /srv/app/data
Meaning: SELinux is enforcing, and the directory has a generic label (default_t) that containers may not be allowed to access.
Decision: Add appropriate SELinux mount options (:Z or :z) for bind mounts, or relabel the path to a container-friendly type.
Task 7: check AppArmor profile usage (host)
cr0x@server:~$ docker inspect app --format '{{.AppArmorProfile}}'
docker-default
Meaning: Default AppArmor profile is applied. Usually fine, but custom profiles can block mounts or paths.
Decision: If permissions look correct and SELinux is off, audit AppArmor logs and profile rules.
Task 8: identify named volume backing path (if not bind mount)
cr0x@server:~$ docker volume inspect appdata --format '{{.Mountpoint}}'
/var/lib/docker/volumes/appdata/_data
cr0x@server:~$ sudo ls -ldn /var/lib/docker/volumes/appdata/_data
drwxr-xr-x 2 0 0 4096 Jan 2 09:50 /var/lib/docker/volumes/appdata/_data
Meaning: Named volumes default to root ownership on the host unless the image or an init step changes it.
Decision: Create a controlled initialization step to set ownership once, or run the service with matching UID/GID.
Task 9: check if you’re on NFS and whether root squash is biting you
cr0x@server:~$ mount | grep -E ' nfs| nfs4'
10.0.2.10:/exports/app on /srv/app/data type nfs4 (rw,relatime,vers=4.1,proto=tcp,clientaddr=10.0.2.21,local_lock=none,sec=sys)
cr0x@server:~$ sudo touch /srv/app/data/.hosttest
touch: cannot touch '/srv/app/data/.hosttest': Permission denied
Meaning: Even host root cannot write. That’s classic root squash or server-side perms/ACLs.
Decision: Stop treating this like a Docker problem. Fix NFS export permissions and UID mapping on the server, or use a dedicated service UID that exists consistently across clients.
Task 10: verify ownership numeric mapping across host and container
cr0x@server:~$ getent passwd 10001
appuser:x:10001:10001::/nonexistent:/usr/sbin/nologin
cr0x@server:~$ docker exec -it app sh -lc 'getent passwd 10001 || true; tail -n 2 /etc/passwd'
app:x:10001:10001:app:/home/app:/bin/sh
messagebus:x:100:102::/nonexistent:/usr/sbin/nologin
nobody:x:65534:65534:nobody:/nonexistent:/usr/sbin/nologin
Meaning: UID 10001 exists on host and container; good sign. If host shows a different user for 10001, your “matching UID” story is fiction.
Decision: If you can, standardize UIDs across systems for stateful services. If you can’t, use ACLs or idmapped mounts (where available) instead of wishful thinking.
Task 11: test group-based access (the sane middle ground)
cr0x@server:~$ sudo groupadd -g 10002 shared 2>/dev/null || true
cr0x@server:~$ sudo chgrp -R 10002 /srv/app/data
cr0x@server:~$ sudo chmod -R g+rwX /srv/app/data
cr0x@server:~$ sudo chmod g+s /srv/app/data
cr0x@server:~$ sudo ls -ldn /srv/app/data
drwxrws--- 2 0 10002 4096 Jan 2 10:15 /srv/app/data
Meaning: Directory is group-owned by 10002 and setgid is set so new files inherit group 10002.
Decision: Make the container process a member of GID 10002 (via image or runtime). This avoids chowning everything to a single UID and plays nicer with shared access.
Task 12: add an ACL for a specific UID (when groups aren’t enough)
cr0x@server:~$ sudo setfacl -m u:10001:rwx /srv/app/data
cr0x@server:~$ sudo setfacl -m d:u:10001:rwx /srv/app/data
cr0x@server:~$ sudo getfacl -p /srv/app/data | sed -n '1,12p'
# file: /srv/app/data
# owner: root
# group: shared
user::rwx
user:10001:rwx
group::rwx
mask::rwx
other::---
default:user::rwx
default:user:10001:rwx
default:group::rwx
Meaning: UID 10001 has explicit access, and new files inherit it via default ACL.
Decision: Use ACLs when you need multiple writers with different UIDs, especially across hosts, without resorting to 777.
Task 13: check for immutable attributes (the “why won’t chmod work” moment)
cr0x@server:~$ sudo lsattr -d /srv/app/data
-------------e-- /srv/app/data
Meaning: No immutable flag. If you see i, changes will fail silently or with EPERM.
Decision: If immutable is set, clear it intentionally (chattr -i) and document why it was there.
Task 14: verify actual write path and errors via strace (surgical, not daily)
cr0x@server:~$ docker exec -it app sh -lc 'strace -f -e trace=file -o /tmp/trace.log sh -lc "touch /data/x" || true; tail -n 5 /tmp/trace.log'
openat(AT_FDCWD, "/data/x", O_WRONLY|O_CREAT|O_NOCTTY|O_NONBLOCK, 0666) = -1 EACCES (Permission denied)
+++ exited with 1 +++
Meaning: Kernel returned EACCES at create. This is classic DAC/MAC denial, not “file already exists,” not “disk full.”
Decision: Go back to ownership/ACL/SELinux. Don’t waste time on application-level configuration.
UID/GID strategies that actually work
Strategy 1: Match the host’s ownership (run the container with the same UID/GID)
This is the cleanest approach for bind mounts on a single host: pick a service UID/GID on the host, and run the container process as that same numeric identity.
In Docker Compose, that’s usually:
cr0x@server:~$ cat docker-compose.yml
services:
  app:
    image: yourorg/app:1.2.3
    user: "10001:10001"
    volumes:
      - /srv/app/data:/data:rw
When to do it: single-node stateful services, simple deployments, local disks, predictable UIDs.
Avoid when: you share the same volume across many hosts with inconsistent UID allocation, or you depend on supplementary groups that differ across environments.
Strategy 2: Group-based access + setgid directories (best “shared volume” default)
If more than one service needs access, or if you have operators touching files, group-based access is your friend. Make a shared GID, set group ownership on the directory, setgid the directory, and ensure container users join the group.
On the container side, you can add supplementary groups:
cr0x@server:~$ cat docker-compose.yml
services:
  app:
    image: yourorg/app:1.2.3
    user: "10001:10001"
    group_add:
      - "10002"
    volumes:
      - /srv/app/data:/data:rw
Why it works: it scales better than per-UID ownership, and the setgid bit prevents “some files are owned by random groups” drift.
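The inheritance is easy to demonstrate locally; a small sketch (the effect is most visible when the directory’s group differs from your primary group, e.g. a shared GID like 10002):

```shell
# With setgid on a directory, new files inherit the directory's group
# rather than the creating process's primary group.
d=$(mktemp -d)
chmod g+s "$d"                          # set the setgid bit
touch "$d/report.log"
stat -c '%g %n' "$d" "$d/report.log"    # both lines show the same GID
rm -rf "$d"
```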
Strategy 3: ACLs for mixed writers (precision tool, not a hammer)
POSIX ACLs let you grant access to multiple UIDs/GIDs without changing the primary owner. They are ideal when the directory is administered by one team but written by multiple services with different numeric identities.
Rule: If you use ACLs, use default ACLs too, or you’ll fix today’s outage and break tomorrow’s file creation.
Strategy 4: One-time initialization (chown once, not on every start)
Many images solve this by doing a chown -R at startup. That is acceptable only when the data directory is small and local. In production with real data, recursive chown on every boot is a self-inflicted denial-of-service.
Better: an explicit init job that runs once per new volume, sets ownership, then never runs again unless you intentionally rotate storage.
Example approach: run a temporary container to initialize a named volume:
cr0x@server:~$ docker run --rm -u 0:0 -v appdata:/data busybox sh -lc 'mkdir -p /data && chown -R 10001:10001 /data && ls -ldn /data'
drwxr-xr-x 2 10001 10001 4096 Jan 2 10:20 /data
Decision: Use this for named volumes and for first-run provisioning. Do not bake recursive chown into your main service startup unless you like slow restarts and long outages.
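If the chown has to live in an entrypoint anyway, gate it behind a marker file so it runs at most once per volume. A sketch with illustrative paths and IDs:

```shell
# One-time init gated by a marker file: the expensive recursive chown
# runs only when the volume is fresh, never on routine restarts.
init_volume() {
    data="$1"; uid="$2"; gid="$3"
    marker="$data/.volume-initialized"
    if [ ! -e "$marker" ]; then
        chown -R "$uid:$gid" "$data"    # once per new volume
        touch "$marker"
    fi
}

# Example against a scratch directory standing in for a real volume
d=$(mktemp -d)
init_volume "$d" "$(id -u)" "$(id -g)"
ls -a "$d"
rm -rf "$d"
```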
Strategy 5: Don’t fight the image—choose images that support non-root properly
Some images have a well-defined non-root user and honor PUID/PGID environment variables, or accept a runtime --user cleanly. Others assume root and then fail in creative ways when you deny them root.
Your decision: if a stateful service image can’t run as non-root without hacks, treat it as a liability. Either fix the Dockerfile internally or change the image. “But it works as root” is not a security posture.
Strategy 6: User namespace remapping (strong isolation, extra complexity)
User namespaces let you map container root (UID 0) to an unprivileged UID range on the host. That reduces blast radius when a container escapes its sandbox. It also makes volume permissions confusing if you don’t plan for it.
When enabled, a file created by “root in the container” might show up as UID 165536 on the host, because that’s the mapped host UID. That’s correct behavior. It’s just surprising if you didn’t sign up for it.
Use when: you need stronger host protection and can standardize mappings across nodes.
Avoid when: you rely heavily on bind mounts to human-managed directories and you can’t tolerate mapping complexity.
Strategy 7: Rootless Docker (good security baseline, still needs planning)
Rootless Docker runs the daemon and containers without root privileges. It reduces “container root becomes host root” incidents dramatically. But volumes still need correct ownership under the rootless user’s home and subordinate UID/GID ranges.
Key point: rootless changes where storage lives and how IDs map. It’s not a drop-in for “everything under /srv.”
Strategy 8: Kubernetes: runAsUser + fsGroup (if you’re in that world)
In Kubernetes, the equivalent of “UID/GID strategy” is usually a securityContext with runAsUser, runAsGroup, and fsGroup. fsGroup helps because the kubelet can adjust group ownership/permissions on mounted volumes so group write works.
But don’t treat fsGroup as magic. On some volume types it requires a recursive permission change, which can be painfully slow on big datasets. Plan for it.
Short joke #2: “Just chmod 777” is the storage equivalent of “just reboot it”—sometimes effective, always suspicious.
Storage-specific failure modes (ext4, NFS, CIFS, ZFS, overlay)
Local filesystems (ext4/xfs): predictable, but still easy to sabotage
On local disk, UID/GID and mode bits usually tell the whole story. If it’s wrong, it’s because humans (or init scripts) made it wrong. The common sabotage patterns:
- Host path created as root during provisioning, never chowned.
- Directory is group-writable, but files aren’t inheriting group because setgid wasn’t set.
- Umask is too strict (e.g., 0077) and files become private by default.
- ACL mask is restricting access, even though ACL entries exist.
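The umask item is worth seeing once: the mask is subtracted from the default creation mode (0666 for files), so 0022 yields 644 and 0077 yields owner-only 600.

```shell
# Demonstrate how umask shapes the mode of newly created files.
d=$(mktemp -d)
( umask 0022; touch "$d/open.txt" )      # 0666 & ~0022 = 644
( umask 0077; touch "$d/private.txt" )   # 0666 & ~0077 = 600
stat -c '%a %n' "$d/open.txt" "$d/private.txt"
rm -rf "$d"
```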
NFS: identity is political
NFS permission issues are often identity-mapping issues. If you’re using AUTH_SYS (classic “sec=sys”), the server trusts the client to send UIDs. That means consistency of numeric IDs across machines is not optional; it’s the entire security model.
If root squash is enabled (often the default), UID 0 on the client is mapped to an unprivileged user on the server. In container land, that means “running as root” is not the easy button you thought it was.
Practical advice: pick service accounts with fixed UIDs across all nodes, manage them centrally, and don’t pretend name-based mapping will save you. NFS does not care what you call the user.
CIFS/SMB: permissions can be faked client-side
SMB mounts on Linux can present a permission view that is partly synthetic. Mount options like uid=, gid=, file_mode=, and dir_mode= can make everything look writable, until the server says otherwise via its own ACLs.
Failure mode: You think you’ve fixed it by changing mount options, but the server ACL still denies writes. Or you “fix” it by forcing ownership, and then break auditability because every file is owned by the same local UID.
ZFS: great at data, strict about ownership
ZFS is not special with respect to POSIX permissions; it’s just consistent. That consistency can be brutal when you expect “Docker will handle it.” If you snapshot and roll back, ownership changes roll back too. That’s a feature, and also a trap during incident response.
overlay2: the illusion of writable layers
Your container’s root filesystem is typically overlay2: a union of read-only image layers and a writable upper layer. Volume mounts bypass that. So if your app writes fine to /tmp but fails on /data, that’s expected: /tmp is inside the container’s writable layer, while /data is enforced by the host filesystem you mounted.
SELinux: permissions can be correct and still wrong
On SELinux systems, the label matters. A bind-mounted directory labeled default_t may be unreadable to containers even if UID/GID are perfect. Docker supports relabeling with mount flags:
- :Z for a private label (exclusive to one container)
- :z for a shared label (multiple containers)
Pick deliberately. If you share the same host path between containers and use :Z, you’ll relabel it into a corner.
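In Compose, the relabel flag rides along with the other mount options. A sketch mirroring the earlier bind-mount example (image name and paths as before; assumes an SELinux-enforcing host):

```yaml
services:
  app:
    image: yourorg/app:1.2.3
    user: "10001:10001"
    volumes:
      # ":z" = shared relabel; use ":Z" only if this container owns the path
      - /srv/app/data:/data:rw,z
```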
Three corporate mini-stories from the trenches
Mini-story 1: The incident caused by a wrong assumption (“root can write anywhere”)
A mid-sized company ran a containerized ETL job that wrote parquet files to a shared NFS export. It had been “stable” for months, which mostly meant nobody touched it. Then security asked them to stop running containers as root, and the team complied by setting user: "10001:10001" in Compose.
The next night, the ETL job failed immediately: Permission denied on the output directory. The on-call engineer tried the classic fix: run it as root again. Same error. That’s when the assumption finally died: “root in the container equals root on storage.” It did not. NFS root squash mapped client root to an unprivileged server-side identity.
They then tried the next classic fix: chmod the directory to 777 on the client mount point. It changed nothing. Because the server-side export had ACLs that still denied writes, and chmod on the client wasn’t changing server ACL policy the way they imagined.
The real fix was boring: they created a dedicated service account with a fixed UID/GID, ensured that account existed with the same numbers on all ETL nodes, and set server-side ownership accordingly. They also added a simple preflight check: create a temporary file in the target directory before running the expensive job.
The postmortem was even more boring: they updated their runbooks to treat NFS as a separate security domain with its own rules. That one paragraph saved future on-calls from repeating the same dance.
Mini-story 2: The optimization that backfired (recursive chown on startup)
A different team ran a stateful service in containers with a named volume. An engineer noticed occasional permission errors after migrations and decided to “harden” startup: every container boot ran chown -R app:app /var/lib/app before starting the daemon.
In dev, it was great. Fresh volume, small data, quick chown, no more permission issues. In production, the dataset was large and lived on slower disks. During a routine deploy, the service started slowly, then slower, then it looked dead. The health check failed. Orchestrator restarted it. Which triggered another recursive chown. Now they had a restart loop doing heavy metadata operations on the same directory tree.
The storage graphs showed the truth: high IOPS, mostly metadata writes, and latency spikes that affected other services. The outage wasn’t caused by the application. It was caused by an “optimization” that turned every restart into a storage stress test.
They fixed it by removing the startup chown, replacing it with a one-time init job that only ran when a new volume was created. They also changed their deployment strategy: don’t roll all instances at once, and don’t restart-loop on slow startups without backoff.
The lesson: permission fixes that scale linearly with data size will eventually scale into a page.
Mini-story 3: The boring but correct practice that saved the day (standard UIDs and a permissions contract)
A large enterprise platform team had a rule: every stateful container runs as a dedicated service UID, with a centrally managed UID/GID allocation. Developers complained it was bureaucracy. They wanted to “just use 1000.” The platform team refused, politely and repeatedly.
Months later, an incident hit: a node had to be rebuilt quickly, and a service was rescheduled onto a fresh machine. The service came up, attached its persistent volume, and started writing immediately. No permission errors, no manual chown, no panic.
Why? The UID/GID mapping was consistent across the fleet. The volume’s ownership matched the service account everywhere. The container’s runtime user was pinned. The directory structure had setgid and default ACLs where sharing was required. Nothing clever, nothing exciting. Just a contract.
In the post-incident review, somebody asked if this was luck. The SRE lead’s answer was essentially: luck is for lottery tickets; we have runbooks.
That team still had outages. Everyone does. But they did not have this class of outage, which is a pretty good deal for a handful of policy decisions made early.
Common mistakes: symptom → root cause → fix
1) Symptom: works in container without volume, fails with bind mount
Root cause: host path ownership/mode bits don’t allow the container UID to write.
Fix: chown/chgrp the host directory to match the container UID/GID, or use a shared group + setgid + group_add, or use ACLs.
2) Symptom: works as root, fails as non-root
Root cause: image was designed around root, or the host directory is root-owned and not writable by the intended UID.
Fix: pick a known UID/GID and set ownership accordingly; prefer images that support non-root cleanly; avoid “sudo inside container” hacks.
3) Symptom: permissions look correct, still “Permission denied” on Fedora/RHEL
Root cause: SELinux label mismatch for bind mount.
Fix: mount with :Z/:z or relabel the host path appropriately; confirm with ls -Z and audit logs.
4) Symptom: cannot write to NFS even as root on host
Root cause: NFS root squash or server-side ACL denial.
Fix: fix server export permissions/ACLs; use a dedicated service UID consistent across clients; don’t attempt to chmod your way through NFS policy.
5) Symptom: intermittent failures after deploy; sometimes fixed by restart
Root cause: race between init scripts and app startup, or multiple containers initializing the same volume differently.
Fix: isolate initialization (one-time job), enforce deterministic ownership, and prevent concurrent “init” behavior in multiple replicas.
6) Symptom: volume directory is writable, but new files have wrong group
Root cause: missing setgid on directories or missing default ACL; umask too restrictive.
Fix: set setgid bit on shared directories; apply default ACLs; review umask and app behavior.
7) Symptom: changing perms on host doesn’t change what container sees
Root cause: you’re not editing the real backing path (named volume vs bind mount), or you’re on SMB/NFS with server-side policy.
Fix: docker inspect the mount source; for named volumes use docker volume inspect; for network storage, change server-side rules.
8) Symptom: after enabling userns-remap, everything “became” UID 165536
Root cause: user namespace mapping is working; your tooling and expectations aren’t.
Fix: plan UID mapping ranges; ensure host paths and automation understand mapped ownership; avoid bind mounting human-owned directories into remapped containers.
Checklists / step-by-step plan
Checklist A: Stop the bleeding during an incident (10–15 minutes)
- Prove it’s permissions: touch in the mounted directory from inside the container. If it fails, proceed; if it succeeds, it’s your app.
- Get the effective UID/GID: id inside the container, not from the Dockerfile.
- Confirm mount type: docker inspect → is it a bind mount or a named volume?
- Check host ownership/mode/ACL: ls -ldn and getfacl on the source path.
- Check SELinux: getenforce and ls -Z on the host path.
- Check network storage: mount on the host; if NFS/CIFS, suspect server-side rules.
- Pick the least-dangerous fix: prefer group-based write or a targeted ACL; avoid 777; avoid recursive chown on large trees.
Checklist B: Make it not happen again (the production contract)
- Standardize service UIDs/GIDs: allocate fixed numeric IDs for each stateful service across environments.
- Document the volume contract: “This path must be writable by UID X and/or GID Y, with setgid and default ACL.” Put it in the repo.
- Use group + setgid for shared paths: it’s the simplest scalable model for multiple writers.
- Handle initialization explicitly: one-time job to set ownership/permissions on new volumes.
- Decide on SELinux policy: enforce correct labeling in Compose/Kubernetes manifests.
- Test with a preflight: a CI or entrypoint check that verifies test -w on required dirs and fails fast with a good message.
- Keep humans out of the data path: if operators must touch files, give them group access; don’t “sudo edit” files in a way that changes ownership unpredictably.
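A preflight like the checklist calls for fits in a few lines of entrypoint shell; the directory argument here stands in for your real data path:

```shell
# Fail fast with an actionable message when the data dir isn't writable.
preflight() {
    dir="$1"
    if [ ! -w "$dir" ]; then
        echo "preflight: $dir not writable by uid=$(id -u) gid=$(id -g)" >&2
        return 1
    fi
    echo "preflight: $dir OK for uid=$(id -u)"
}

# Example against a scratch directory standing in for /data
d=$(mktemp -d)
preflight "$d"
rm -rf "$d"
```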
Checklist C: If you must use network storage
- NFS: ensure UID/GID consistency across nodes; decide on root squash intentionally; manage server-side permissions.
- CIFS: be explicit with mount options; understand whether permissions are enforced server-side; avoid pretending mode bits are real if they aren’t.
- Latency and metadata costs: avoid recursive permission changes on huge trees; plan for startup behavior accordingly.
FAQ
1) Why does the container user name not matter for volume permissions?
Because the kernel enforces permissions using numeric UID/GID. Names are just entries in /etc/passwd (or NSS). The host sees numbers.
2) Should I run containers as root to avoid permission issues?
No. It shifts the risk from “permission denied” to “host compromise” and still fails on NFS root squash and SELinux policy. Fix identity mapping instead.
3) What’s the best default for shared writable directories?
Create a shared GID, chgrp the directory, set g+rwX, set the setgid bit, and ensure containers join that GID. Add default ACLs if needed.
4) Is chmod 777 ever acceptable?
As a short-lived diagnostic to prove it’s a permission issue, maybe. As a fix, it’s sloppy and often unnecessary. Use group write or ACLs.
5) Why does it fail only on one host?
Usually UID/GID mismatch (different user allocation), different SELinux mode/labels, or the “same path” actually points to different storage (local vs NFS).
6) What’s the difference between named volumes and bind mounts for permissions?
Bind mounts use an existing host path (your responsibility). Named volumes are managed under Docker’s data directory and often start root-owned unless initialized.
7) How do I fix permissions for a named volume safely?
Run a one-time init container (or a temporary docker run) as root to set ownership on the volume, then run the main service as non-root.
8) Why do I see UID 165536 (or similar) on host files?
You likely enabled user namespace remapping or rootless mode. Container UIDs are mapped into a subordinate UID range on the host. That’s expected.
9) Why do permissions look fine but writes still fail?
SELinux/AppArmor can deny access even if POSIX permissions allow it. On SELinux systems, labeling is a common culprit for bind mounts.
10) Can I “fix” this by adding users to /etc/passwd inside the container?
Adding a name helps logs and tooling, but it doesn’t change the numeric UID. The fix is still: match UIDs, use group/ACLs, or adjust mappings.
Conclusion: next steps you can do today
Docker volume permission problems are predictable. That’s the good news. They happen when numeric identities and policy layers don’t agree. Your goal is not to “try chmod until it works.” Your goal is to define a permissions contract and enforce it consistently.
Do these next:
- Pick a UID/GID strategy per service: matching UID for simple cases, shared group + setgid for shared access, ACLs for mixed writers.
- Remove recursive chown from normal startup: replace it with a one-time init step tied to volume creation.
- Make diagnosis fast: bake in preflight checks (id, test -w) and document the expected UID/GID in the repo.
- Handle SELinux and network storage intentionally: labels and server-side policy are not “edge cases.” They’re production.
Once you treat UID/GID as part of the deployment spec, “Permission denied” stops being a mystery and starts being a unit test you forgot to write.