Docker Read-only Containers: Harden Without Breaking Your App

Read-only containers sound like a clean security win until your app tries to write a PID file, cache a DNS result, rotate a log, or unpack a certificate bundle into /tmp like it’s 2012. Then you’re staring at EROFS errors in production and wondering why a “simple hardening tweak” became a service incident.

This is the practical way to do it: lock down the container filesystem, keep the app happy, and keep your operators sane. We’ll focus on Docker, but the patterns map directly to Kubernetes and any runtime that can mount tmpfs and volumes.

What “read-only container” actually means (and what it doesn’t)

In Docker terms, “read-only container” usually means running a container with --read-only. That mounts the container’s root filesystem read-only. The image layers were already read-only; the change is that the thin copy-on-write layer that normally absorbs writes no longer accepts them.

Important nuance: this does not mean the container can’t write anywhere. It means it can’t write to the root filesystem unless you provide writable mounts. Volumes, bind mounts, and tmpfs mounts are separate mounts and can remain writable. If you do it right, the root filesystem is effectively a sealed appliance, and the only write surfaces are explicitly declared.
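
You can see the difference with any small image you have handy (alpine:3.20 is just an example tag):

docker run --rm alpine:3.20 sh -c 'touch /probe && echo wrote'
docker run --rm --read-only alpine:3.20 sh -c 'touch /probe && echo wrote'

The first run prints wrote; the second should fail with a read-only file system error, because the only thing that changed is how the root filesystem is mounted.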

The filesystem model you’re actually dealing with

  • Image layers: immutable. Always read-only.
  • Container writable layer: normally writable copy-on-write (OverlayFS/overlay2 in most cases). With --read-only, it becomes read-only.
  • Mounts: volumes, bind mounts, tmpfs. Each can be read-write or read-only regardless of the rootfs.
  • Kernel-provided virtual filesystems: /proc, /sys, /dev. These are special and controlled by runtime flags and Linux capabilities, not “read-only rootfs” alone.

A read-only root filesystem is not a complete sandbox. It’s one control: it reduces persistence for an attacker and shrinks the blast radius of “accidental writes” by your own code. It also forces you to explicitly model state: logs, caches, PID files, uploads, and runtime-generated config.

One idea that holds up in operations (paraphrasing John Allspaw): reliability comes from designing systems that make failure modes visible and manageable, not from assuming nothing will fail. A read-only rootfs is exactly that kind of design constraint.

Why bother: threat model and operational value

Security: make persistence expensive

If an attacker gains code execution inside a container with a writable rootfs, they can drop tooling into /usr/local/bin, modify app code, or plant cron-like persistence (yes, even in containers, people try). A read-only rootfs doesn’t stop runtime execution, but it blocks a common step: writing new binaries and editing existing ones.

Does it prevent data exfiltration? No. Does it prevent memory-resident malware? Also no. But it raises the bar and reduces the “I’ll just modify a config file and wait” style of persistence.

Operations: stop configuration drift inside the container

Some teams still “hotfix” by docker exec editing a file in a running container. That’s a seductive habit: it makes problems disappear until the next deploy, then reappear at 2 a.m. Read-only rootfs makes that anti-pattern fail loudly. Good. Fix it in the image or in configuration management.

Performance and predictability: fewer writes to the COW layer

Overlay filesystems can behave badly when an app writes a lot of small files. You get copy-up overhead, inode churn, and “why is my container disk usage exploding” moments. Putting known-write paths on tmpfs or a dedicated volume makes performance more predictable and makes your storage team less grumpy.

Joke #1: A writable root filesystem is like a shared office whiteboard: useful, but sooner or later someone draws a database schema in permanent marker.

Interesting facts and short history

  • Union filesystems predate Docker by years. AUFS and OverlayFS were already used for live systems and embedded appliances; containers popularized them at scale.
  • Docker’s early storage drivers were messy. AUFS, Device Mapper, btrfs, and OverlayFS all had different semantics and edge cases; read-only rootfs was a way to reduce “writes in weird places.”
  • Read-only rootfs is older than containers. It’s a classic hardening trick for chroot jails and appliance-like Linux systems where only /var is writable.
  • Kubernetes made it mainstream. securityContext.readOnlyRootFilesystem turned it from “cool Docker flag” into a policy check in many orgs.
  • OverlayFS copy-up is the hidden cost. Writing to a file that exists in a lower (image) layer forces a copy of the whole file into the upper layer before the write happens.
  • Many base images assume writable /tmp. Package managers, language runtimes, and TLS tooling often spill temporary files there.
  • Some libraries still default to “write caches next to the code.” Python bytecode caches (__pycache__) and Java class data sharing can surprise you.
  • Logging used to be file-first. Syslog and logrotate-era habits still show up in container images that insist on writing to /var/log even when stdout is the correct answer.
  • Distroless images help, but don’t solve writes. They reduce tooling and surface area, but your app still needs a place for runtime state.

Design patterns that don’t break apps

Pattern 1: Treat the image as firmware

Build your image so it contains everything needed to run: binaries, configs (or templates), CA bundles, timezone data, and any static assets. Then assume it can never change at runtime. This forces a clean separation: code/config in the image (or injected config), state outside.
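
A minimal Dockerfile sketch of the idea (the base image, paths, and the myapp binary are illustrative, not a drop-in recipe):

FROM debian:bookworm-slim
# Install runtime dependencies at build time: CA certificates, timezone data, nothing else.
RUN apt-get update \
 && apt-get install -y --no-install-recommends ca-certificates tzdata \
 && rm -rf /var/lib/apt/lists/*
# Hypothetical app binary and static config, baked into the image.
COPY myapp /usr/local/bin/myapp
COPY config/ /etc/myapp/
USER 10001:10001
ENTRYPOINT ["/usr/local/bin/myapp"]

Nothing in this image expects to change at runtime; state lives only in mounts you declare explicitly.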

If you’re currently “fixing” containers by editing files inside them, you don’t have read-only problems. You have release engineering problems wearing a Docker costume.

Pattern 2: Explicit writable paths (the “/var is a contract” approach)

Most apps need a small set of writable locations. Common ones:

  • /tmp for temp files and sockets
  • /var/run (or /run) for PID files and Unix domain sockets
  • /var/cache for caches
  • /var/log only if you absolutely must log to files (try not to)
  • app-specific state directories: /data, /uploads, /var/lib/app

Make those paths writable using tmpfs (for ephemeral state) or volumes (for persistent state). Everything else stays read-only. This is the core move.
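
If you deploy with Docker Compose, the contract fits in one file. A sketch, assuming a service named app and a named volume myapp-cache (both illustrative; tmpfs size limits can be added with the long volume syntax if you want them bounded):

services:
  app:
    image: myapp:latest
    read_only: true
    tmpfs:
      - /tmp
      - /var/run
    volumes:
      - myapp-cache:/var/cache/myapp
volumes:
  myapp-cache:

Everything not listed stays read-only, which is exactly the point.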

Pattern 3: Use tmpfs for “should not persist” state

tmpfs is RAM-backed (and swap-backed if enabled). It’s fast, it disappears on restart, and it keeps junk out of your container layer. Ideal for:

  • PID files
  • runtime sockets
  • temporary decompression
  • language runtime caches that you don’t need to persist

Be disciplined with tmpfs sizing. A “just give it RAM” attitude is how you end up debugging OOM kills that look like random crashes.

Pattern 4: Logs go to stdout/stderr, not files

Containers are not pets. Log files inside a container are a trap: they fill disks, they need rotation, and they get lost if the container dies. Prefer stdout/stderr and let the platform handle aggregation. If you must write logs to disk (compliance, legacy agents), mount a volume at /var/log and accept the operational cost.
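
For daemons that insist on file paths, the usual workaround is to point those paths at stdout/stderr at build time instead of mounting /var/log. The official nginx image uses this symlink trick; a Dockerfile sketch:

RUN ln -sf /dev/stdout /var/log/nginx/access.log \
 && ln -sf /dev/stderr /var/log/nginx/error.log

Because the symlinks are baked into the image, nothing has to write under /var/log at runtime and the rootfs can stay read-only.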

Pattern 5: Don’t forget libraries that write “helpfully”

Common offenders (environment-variable workarounds for several of these are sketched after the list):

  • Nginx: wants to write to /var/cache/nginx and sometimes /var/run
  • OpenSSL tooling: may write temp files under /tmp
  • Java: writes to /tmp and sometimes wants a writable $HOME
  • Python: can write bytecode caches and may expect writable $HOME for some packages
  • Node: tooling can write caches under /home/node/.npm if you run build steps at runtime (don’t)
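
Many of these can be redirected with environment variables instead of extra mounts. A sketch of common knobs (verify each against the runtime versions you actually ship; the target paths are examples):

# Point home, caches, and temp dirs at the tmpfs you already mount.
docker run --rm --read-only \
  --tmpfs /tmp:rw,nosuid,nodev,noexec,size=64m \
  -e HOME=/tmp \
  -e XDG_CACHE_HOME=/tmp/cache \
  -e PYTHONPYCACHEPREFIX=/tmp/pycache \
  -e npm_config_cache=/tmp/npm-cache \
  -e JAVA_TOOL_OPTIONS="-Djava.io.tmpdir=/tmp" \
  myapp:latest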

Pattern 6: Prefer non-root + read-only rootfs, but keep it debuggable

Read-only rootfs pairs well with running as a non-root user. You reduce both “can write” and “can chmod/chown” power. But don’t overdo it by removing every tool and then acting surprised when on-call can’t diagnose anything.

If you go distroless, plan a separate debug image or use ephemeral debug containers. “No shell” is fine. “No plan” is not.

Pattern 7: Make state explicit in the app

The best hardening is when the app itself is honest about where it writes. Provide flags/env vars for cache dirs, temp dirs, and runtime state. If your app writes to random defaults, you’ll be playing whack-a-mole with mounts.

Joke #2: The only thing more permanent than a temporary file is the temporary file your app writes on every request.

Hands-on tasks: commands, outputs, decisions

These tasks are ordered the way you’d actually work a rollout: inspect, test, constrain, then verify. Each includes what the output means and the decision you make from it.

Task 1: Confirm your storage driver (because semantics matter)

cr0x@server:~$ docker info --format 'Storage Driver: {{.Driver}}'
Storage Driver: overlay2

What it means: You’re almost certainly on OverlayFS semantics (copy-up, upperdir). Behavior around file writes and inode exhaustion will match overlay2 expectations.

Decision: If you’re not on overlay2 (for example devicemapper), validate read-only behavior and performance in staging. Old drivers have surprising edge cases.

Task 2: Baseline container writes before hardening

cr0x@server:~$ docker run --rm -d --name app-baseline myapp:latest
8b3c1d9d51a5a3a33bb3b4a2e7d0a9e5f3a7c1b0d8c9e2f1a6b7c8d9e0f1a2b3
cr0x@server:~$ docker exec app-baseline sh -c 'find / -xdev -type f -mmin -2 2>/dev/null | head'
/var/run/myapp.pid
/tmp/myapp.sock
/var/cache/myapp/index.bin

What it means: In the last two minutes the app wrote to /var/run, /tmp, and /var/cache. That’s your first draft of “writable paths.”

Decision: Plan tmpfs mounts for /tmp and /var/run; decide if /var/cache should be tmpfs (ephemeral) or a volume (persistent across restarts).

Task 3: Check disk usage growth in the writable layer

cr0x@server:~$ docker ps --format '{{.Names}} {{.ID}}'
app-baseline 8b3c1d9d51a5
cr0x@server:~$ docker inspect --format '{{.GraphDriver.Data.UpperDir}}' app-baseline
/var/lib/docker/overlay2/7c6f3e.../diff
cr0x@server:~$ sudo du -sh /var/lib/docker/overlay2/7c6f3e.../diff
84M	/var/lib/docker/overlay2/7c6f3e.../diff

What it means: The container has already written 84 MB into its upper layer. That’s “invisible state” living on the host, not in any declared volume.

Decision: If this number grows with traffic, you need to move those writes to tmpfs/volumes or you’ll eventually page someone for “disk full on worker node.”

Task 4: Run the container with a read-only rootfs and observe failure mode

cr0x@server:~$ docker rm -f app-baseline
app-baseline
cr0x@server:~$ docker run --rm --name app-ro --read-only myapp:latest
myapp: error: open /var/run/myapp.pid: read-only file system

What it means: The app’s first write is a PID file. Classic. You now have a precise path to make writable.

Decision: Add tmpfs for /var/run (or change the app to not write PID files if it runs as PID 1 and you don’t need them).

Task 5: Add tmpfs mounts for common runtime paths

cr0x@server:~$ docker run --rm --name app-ro \
  --read-only \
  --tmpfs /tmp:rw,nosuid,nodev,noexec,size=64m \
  --tmpfs /var/run:rw,nosuid,nodev,noexec,size=16m \
  myapp:latest
myapp: error: open /var/cache/myapp/index.bin: read-only file system

What it means: You fixed the first failure; the next write surfaced. This is normal. Read-only hardening is an iterative reveal of assumptions.

Decision: Decide whether /var/cache/myapp can be ephemeral. If yes, tmpfs it. If it’s expensive to rebuild, use a volume.

Task 6: Mount a dedicated writable cache directory

cr0x@server:~$ docker volume create myapp-cache
myapp-cache
cr0x@server:~$ docker run --rm --name app-ro \
  --read-only \
  --tmpfs /tmp:rw,nosuid,nodev,noexec,size=64m \
  --tmpfs /var/run:rw,nosuid,nodev,noexec,size=16m \
  -v myapp-cache:/var/cache/myapp:rw \
  myapp:latest
myapp: started on :8080

What it means: The container is now operational with explicit writable areas.

Decision: Document the contract: /tmp and /var/run are ephemeral; /var/cache/myapp persists across container restarts on the same host (or across nodes if you use networked storage).

Task 7: Verify the root filesystem is actually mounted read-only

cr0x@server:~$ docker exec app-ro sh -c 'mount | head -n 6'
overlay on / type overlay (ro,relatime,lowerdir=/var/lib/docker/overlay2/l/...,
upperdir=/var/lib/docker/overlay2/u/...,
workdir=/var/lib/docker/overlay2/w/...)
tmpfs on /tmp type tmpfs (rw,nosuid,nodev,noexec,relatime,size=65536k)
tmpfs on /var/run type tmpfs (rw,nosuid,nodev,noexec,relatime,size=16384k)

What it means: The overlay mount for / is ro. Your tmpfs mounts are rw. This is the intended shape.

Decision: If you don’t see ro on the root mount, your runtime config isn’t applying. Stop and fix the deployment, don’t “assume it’s fine.”

Task 8: Prove writes fail where they should

cr0x@server:~$ docker exec app-ro sh -c 'echo test > /etc/should-fail && echo wrote'
sh: can't create /etc/should-fail: Read-only file system

What it means: The container cannot modify system paths. Good.

Decision: If this succeeds, you’ve accidentally left the root writable and your threat model just evaporated.

Task 9: Confirm writable mounts behave as expected

cr0x@server:~$ docker exec app-ro sh -c 'echo ok > /tmp/ok && cat /tmp/ok'
ok
cr0x@server:~$ docker exec app-ro sh -c 'echo ok > /var/cache/myapp/ok && cat /var/cache/myapp/ok'
ok

What it means: Temporary and cache storage are available. You have controlled write surfaces.

Decision: Add noexec on tmpfs mounts unless you have a reason not to. It blocks a class of “download and execute from /tmp” behavior.

Task 10: Check for hidden writes via environment variables (HOME, XDG)

cr0x@server:~$ docker exec app-ro sh -c 'echo $HOME; ls -ld $HOME 2>/dev/null || true'
/home/app
drwxr-xr-x 2 app app 4096 Jan  1 00:00 /home/app
cr0x@server:~$ docker exec app-ro sh -c 'touch $HOME/.probe'
touch: cannot touch '/home/app/.probe': Read-only file system

What it means: Your runtime user’s home exists but is not writable (because it’s in the read-only rootfs). Some libraries try to write config or caches under $HOME.

Decision: Either mount a tmpfs at /home/app (if acceptable) or set environment variables to redirect caches to a writable mount (preferred when you can control it).

Task 11: Validate that logging is not writing to disk

cr0x@server:~$ docker exec app-ro sh -c 'ls -l /var/log 2>/dev/null || true'
total 0
cr0x@server:~$ docker logs --tail=5 app-ro
2026-01-03T00:00:01Z INFO listening on :8080
2026-01-03T00:00:02Z INFO warmup complete

What it means: Logs are going to stdout/stderr (good). /var/log isn’t accumulating files.

Decision: If you see log files appear, either mount /var/log as a volume and manage rotation externally, or change the app/logger configuration to stdout.

Task 12: Catch permission issues early with a non-root user

cr0x@server:~$ docker run --rm --name app-ro-nonroot \
  --user 10001:10001 \
  --read-only \
  --tmpfs /tmp:rw,nosuid,nodev,noexec,size=64m \
  --tmpfs /var/run:rw,nosuid,nodev,noexec,size=16m \
  -v myapp-cache:/var/cache/myapp:rw \
  myapp:latest
myapp: error: open /var/cache/myapp/index.bin: permission denied

What it means: The volume directory permissions don’t allow UID 10001 to write. This is not a read-only failure; it’s a UID/GID ownership mismatch.

Decision: Fix ownership on the volume (one-time init) or use a runtime that supports setting volume permissions. Avoid running as root just to avoid thinking about file ownership.

Task 13: Repair volume ownership safely (one approach)

cr0x@server:~$ docker run --rm -v myapp-cache:/var/cache/myapp alpine:3.20 \
  sh -c 'adduser -D -u 10001 app >/dev/null 2>&1; chown -R 10001:10001 /var/cache/myapp; ls -ld /var/cache/myapp'
drwxr-xr-x    2 app      app           4096 Jan  3 00:10 /var/cache/myapp

What it means: The cache directory is now owned by the non-root UID/GID.

Decision: Re-run the hardened container as non-root. If your platform supports init containers (Kubernetes), this is a cleaner long-term pattern than doing it manually.
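
In Kubernetes, the usual shape of that init step is an init container that fixes ownership before the app starts. A sketch, assuming a volume named myapp-cache and the UID used above (names are illustrative):

  initContainers:
    - name: fix-cache-ownership
      image: busybox:1.36
      command: ["sh", "-c", "chown -R 10001:10001 /var/cache/myapp"]
      volumeMounts:
        - name: myapp-cache
          mountPath: /var/cache/myapp

For many volume types, setting securityContext.fsGroup gets the same result without an explicit chown pass.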

Task 14: Re-test the non-root hardened run

cr0x@server:~$ docker run --rm -d --name app-ro-nonroot \
  --user 10001:10001 \
  --read-only \
  --tmpfs /tmp:rw,nosuid,nodev,noexec,size=64m \
  --tmpfs /var/run:rw,nosuid,nodev,noexec,size=16m \
  -v myapp-cache:/var/cache/myapp:rw \
  myapp:latest
f2aa0e8d1bf2f7ad7f0c2b8b2b2a9a3d9a9e1e3c4b5d6e7f8a9b0c1d2e3f4a5b
cr0x@server:~$ docker exec app-ro-nonroot sh -c 'id'
uid=10001 gid=10001

What it means: You’re running hardened and non-root. That’s a meaningful step up in containment.

Decision: Now enforce it in CI/CD with a policy: images must run as non-root and rootfs must be read-only unless a documented exception exists.

Task 15: Validate that your app isn’t silently failing to persist data

cr0x@server:~$ docker exec app-ro-nonroot sh -c 'test -f /var/cache/myapp/index.bin && echo "cache exists" || echo "cache missing"'
cache exists

What it means: Your cache is being created and stored where you expect.

Decision: If it’s missing, the app might be swallowing write errors and running degraded. Add explicit health checks and metrics for cache warmup, uploads, or whatever state matters.

Task 16: Check kernel/LSM denials that look like filesystem errors

cr0x@server:~$ dmesg --ctime | tail -n 5
[Fri Jan  3 00:12:10 2026] audit: type=1400 audit(1735863130.123:120): apparmor="DENIED" operation="open" profile="docker-default" name="/proc/kcore" pid=31245 comm="myapp"

What it means: Not all “permission denied” is filesystem mode bits. AppArmor (or SELinux) can block access too.

Decision: If you see LSM denials, don’t randomly disable security profiles. Adjust the profile or fix the app behavior triggering it.

Fast diagnosis playbook

When a read-only rollout breaks something, you don’t need a philosophical debate. You need a fast triage loop that tells you what is trying to write where, and whether the fix is a mount, a config change, or a code change.

First: identify the exact failing path

  • Check the container logs for EROFS, Read-only file system, permission denied.
  • Find the first failing path; that’s usually the earliest write and often the simplest to address.
cr0x@server:~$ docker logs --tail=50 app-ro
myapp: error: open /var/run/myapp.pid: read-only file system

Second: determine if it’s rootfs read-only or mount permissions/ownership

  • If the error is on a path you intended to be writable, it’s probably ownership/permissions.
  • If it’s on a path you didn’t mount, it’s expected; you need to add a writable mount or change the app to not write there.
cr0x@server:~$ docker exec app-ro sh -c 'mount | grep -E " /var/run | /var/cache | /tmp "'
tmpfs on /tmp type tmpfs (rw,nosuid,nodev,noexec,relatime,size=65536k)
tmpfs on /var/run type tmpfs (rw,nosuid,nodev,noexec,relatime,size=16384k)

Third: inventory writes quickly

  • Use find on recently modified files if the container can start.
  • If it can’t start, run the entrypoint in a debug mode (or override command) to get a shell and reproduce.
cr0x@server:~$ docker exec app-ro sh -c 'find / -xdev -type f -mmin -5 2>/dev/null | head -n 20'
/var/cache/myapp/index.bin
/tmp/myapp.sock
/var/run/myapp.pid

Fourth: check resource bottlenecks introduced by tmpfs

  • Tmpfs consumes memory. Under load, memory pressure looks like “random restarts.”
  • Watch memory usage and tmpfs consumption.
cr0x@server:~$ docker exec app-ro sh -c 'df -h /tmp /var/run'
Filesystem      Size  Used Avail Use% Mounted on
tmpfs            64M  2.1M   62M   4% /tmp
tmpfs            16M   44K   16M   1% /var/run

Fifth: confirm you didn’t break upgrades or certificate refresh flows

  • If your image previously ran apt-get or downloaded assets on start, read-only will block it. Good—move it to build time.
  • If you rely on runtime CA bundle updates inside the container, stop. Update by rebuilding and redeploying images.

Three corporate mini-stories from the trenches

Mini-story 1: The incident caused by a wrong assumption

At a mid-sized SaaS company, a platform team rolled out read-only root filesystems for “all stateless services.” They did the responsible thing: canary, monitor, rollback plan. The canary still fell over within minutes.

The service was a Go API. It looked stateless. It talked to a database and a queue. The team assumed it wrote nothing locally. In reality, it had a TLS client that cached OCSP responses and certificate intermediates to a directory under $HOME, inherited from an old library default. Under normal conditions it was “fine,” because the directory existed and was writable in the container’s upper layer.

With read-only rootfs, the cache writes failed. The library didn’t fail closed; it retried network fetches aggressively. Latency spiked, then the downstream dependency started rate-limiting. The API stayed up, but slow enough to cause cascading timeouts across a few services that called it.

The fix was boring: redirect the cache directory to a tmpfs mount and cap retry behavior. The lesson was sharper: “stateless” is not a belief system. It’s a contract you enforce by design and verify by measurement.

Mini-story 2: The optimization that backfired

Another org wanted read-only rootfs and also wanted to be clever about performance. They moved /tmp to tmpfs and set the size to something large “so it never fills.” In staging, everything looked great. Faster builds, faster request handling for an image-processing service. Everyone high-fived.

Then production traffic arrived with real user behavior: a small percentage of requests uploaded enormous images, and the service wrote multiple intermediate files to /tmp per request. tmpfs memory usage climbed rapidly. Linux did what Linux does under pressure: it started reclaiming, then invoked the OOM killer.

The on-call saw containers restarting and assumed a code regression. They rolled back the read-only change and the problem “went away,” because the service went back to writing intermediates to disk. The next day, they tried again and got the same “mysterious” restarts.

The correct fix was to keep tmpfs for small temp files but move large intermediates to a dedicated volume (or redesign to stream). Also: set tmpfs size limits that reflect reality. “Unlimited tmpfs” is just a creative way to turn memory into a disk without telling anyone.

Mini-story 3: The boring practice that saved the day

A financial services team had a policy: every container declares its writable paths in a single place, reviewed like any other interface. They kept a small internal “container contract” document next to the Dockerfile: which paths must be writable, which are tmpfs, which are volumes, which are read-only binds, and why.

During a security push, they enabled read-only rootfs across dozens of services. Most changes were routine because the writable paths were already explicit. A handful of legacy apps failed, but the failures were localized: the contract told them what should have been writable, and their tests asserted the mounts existed.

One service still failed in production due to a new library version that started writing a cache under /var/lib. Their canary caught it. The rollback was clean, and the post-incident action was straightforward: update the contract, add the mount, and add a test that greps logs for Read-only file system during startup.

Nothing heroic happened. That’s the point. The boring practice prevented a cross-service outage and turned a risky hardening project into a controlled migration.

Common mistakes: symptoms → root cause → fix

1) App fails instantly with “Read-only file system” on /var/run

Symptom: PID file or socket creation fails at startup.

Root cause: Runtime state expected under /var/run or /run but rootfs is read-only.

Fix: Mount tmpfs at /var/run (and possibly /run) or configure the app to write PID/sockets to /tmp.

2) App runs but becomes slow; downstream sees spikes

Symptom: No crash, but latency increases after enabling read-only.

Root cause: A cache write fails and the library silently switches to recomputing/refetching on every request.

Fix: Identify cache directory, mount a writable cache path, and add observability for cache hit rate or warmup success.

3) “Permission denied” on a mounted volume

Symptom: Writes fail even though you mounted a volume read-write.

Root cause: UID/GID mismatch. Non-root process can’t write to the volume’s existing ownership.

Fix: Set correct ownership via init step (init container, one-time chown job) or use a storage class that supports fsGroup in Kubernetes.

4) Random restarts after switching /tmp to tmpfs

Symptom: Container gets OOM-killed or evicted under load.

Root cause: tmpfs consumes memory; large temp files amplify memory pressure.

Fix: Size tmpfs conservatively, move large temp workloads to a volume, or redesign to stream rather than spill to disk.

5) Nginx fails with cache or client body temp write errors

Symptom: Nginx logs “open() … failed (30: Read-only file system)”, often on client_body_temp or cache paths.

Root cause: Nginx defaults write temp files and cache under /var/cache/nginx.

Fix: Mount /var/cache/nginx writable (tmpfs for temp, volume for cache) and configure client_body_temp_path to a writable directory.
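
A hedged starting point for a stock nginx image (the tag and sizes are examples; swap the cache tmpfs for a volume if the cache is worth persisting):

docker run --rm --read-only \
  --tmpfs /var/cache/nginx:rw,nosuid,nodev,noexec,size=128m \
  --tmpfs /var/run:rw,nosuid,nodev,noexec,size=16m \
  nginx:1.27

If errors remain, point the pid, client_body_temp_path, and proxy_temp_path directives in nginx.conf at one of the writable mounts and re-test.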

6) Java or Python tooling breaks because it wants a writable home directory

Symptom: Errors writing under /root or /home/app or XDG paths.

Root cause: Libraries default to $HOME for caches/config even in server processes.

Fix: Set HOME to a writable tmpfs (careful) or set language-specific cache dirs to mounted writable paths.

7) “Works on Docker, fails on Kubernetes” (or the reverse)

Symptom: Behavior differs between environments.

Root cause: Different default mounts, security contexts, or read-only flags. Kubernetes also injects service account mounts and may set filesystem groups.

Fix: Make mounts explicit in both. Don’t rely on “platform defaults” for writable paths.

Checklists / step-by-step plan

Step-by-step rollout plan (what I’d do on a production team)

  1. Inventory writes in the baseline container. Run under normal load and list recently modified files in / with -xdev. Capture the paths.
  2. Classify each path: ephemeral (tmpfs), persistent (volume), or “should not exist” (fix the app/image).
  3. Stop runtime installers. If entrypoint runs package installs, downloads, or compiles, move that to image build time. Read-only will force the issue anyway.
  4. Add --read-only and the minimal tmpfs mounts. Start with /tmp and /var/run.
  5. Iterate failures. Add the next writable path only after confirming it’s necessary and correctly scoped (a directory, not “just mount / as rw”).
  6. Run as non-root. Fix volume ownership issues properly. This is where teams try to cheat. Don’t.
  7. Lock it down further: use noexec on tmpfs, drop Linux capabilities, and use a restrictive seccomp profile where feasible (a combined example follows this list).
  8. Add tests: integration test that the container starts with read-only rootfs and that it can write only to declared paths.
  9. Canary in production. Watch latency, error rates, restarts, and memory (tmpfs). Roll forward or back quickly.
  10. Document the contract. Writable paths are part of the service interface now.
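
Pulling steps 4 through 7 together, a single hardened invocation might look like this (a sketch; the image, UID, and sizes are placeholders, and Docker’s default seccomp profile still applies unless you override it):

docker run -d --name app-hardened \
  --read-only \
  --user 10001:10001 \
  --cap-drop ALL \
  --security-opt no-new-privileges \
  --tmpfs /tmp:rw,nosuid,nodev,noexec,size=64m \
  --tmpfs /var/run:rw,nosuid,nodev,noexec,size=16m \
  -v myapp-cache:/var/cache/myapp:rw \
  myapp:latest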

Hardening checklist (quick but strict)

  • Root filesystem mounted read-only (--read-only / readOnlyRootFilesystem: true).
  • tmpfs for /tmp and /var/run with nosuid,nodev,noexec where possible.
  • Dedicated volume mounts for truly persistent state.
  • Logs to stdout/stderr; avoid writing to /var/log.
  • Non-root user, with volume ownership handled explicitly.
  • No runtime package installs, no self-updating binaries.
  • Health checks detect “degraded but running” mode (cache failures, upload failures).
  • Monitoring for tmpfs usage and container restarts.

Mount selection guide (decide like an adult)

  • tmpfs: secrets decrypted at runtime, PID files, small temp files, sockets. Great for speed; dangerous if unbounded.
  • named volume: caches you want to persist on a host, small state, queues used for buffering (ideally not inside the app container, but reality happens).
  • bind mount: host-provided config or certs (often read-only). Be careful: you’re coupling to host paths.
  • don’t mount: anything you can eliminate by changing the app to stream, to log to stdout, or to treat the image as immutable.

FAQ

1) Does --read-only make my container “secure”?

No. It’s one control that reduces filesystem persistence and accidental mutation. You still need non-root, capability dropping, network controls, and patching through rebuilds.

2) If I mount a writable volume, doesn’t that defeat the point?

Not if you’re intentional. The point is to reduce the writable surface area and make it explicit. A small writable volume for /var/cache/myapp is much better than “anything can write anywhere.”

3) Why not just run everything stateless and avoid volumes?

Because plenty of “stateless” apps still need temp space, sockets, and caches. The goal is to keep state minimal and controlled, not pretend it doesn’t exist.

4) Can I just mount / as read-only and then remount parts writable inside the container?

Inside a container, remounting typically requires elevated privileges and capabilities you shouldn’t grant. Do the mounts from the runtime (Docker/Kubernetes) so the container process can’t expand its write permissions.

5) What are the minimum writable mounts most apps need?

Usually /tmp and /var/run. After that it’s app-specific: caches, uploads, SQLite files, etc. The right answer is “whatever your app proves it needs under load.”

6) How does this work in Kubernetes?

You set securityContext.readOnlyRootFilesystem: true and then define emptyDir volumes (optionally medium: Memory) for tmpfs-like behavior, plus persistent volumes where required. The same write-path contract applies.
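
A sketch of the relevant pod spec fragment, with illustrative names and sizes (the cache PVC is an assumption; use whatever storage matches your contract):

    spec:
      containers:
        - name: myapp
          image: myapp:latest
          securityContext:
            readOnlyRootFilesystem: true
            runAsUser: 10001
            runAsGroup: 10001
            allowPrivilegeEscalation: false
          volumeMounts:
            - { name: tmp, mountPath: /tmp }
            - { name: run, mountPath: /var/run }
            - { name: cache, mountPath: /var/cache/myapp }
      volumes:
        - name: tmp
          emptyDir: { medium: Memory, sizeLimit: 64Mi }
        - name: run
          emptyDir: { medium: Memory, sizeLimit: 16Mi }
        - name: cache
          persistentVolumeClaim:
            claimName: myapp-cache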

7) Why do I get “permission denied” when the mount is rw?

Because file ownership and permissions still apply. A read-write mount doesn’t magically make UID 10001 allowed to write into a directory owned by root. Fix ownership or use fsGroup/init steps.

8) What about applications that generate config files at startup?

Prefer generating config into a writable mount (like /tmp or /var/run) and point the app to it. Better: generate configs at build time or via injected config (env vars, mounted config files).

9) Will noexec on /tmp break anything?

Sometimes. Tools that extract and execute binaries from /tmp will fail. That’s often desirable in production. If your app genuinely needs it (rare for well-built services), document the exception and constrain it.

10) How do I test this in CI?

Run the container with --read-only and required tmpfs/volumes, execute a smoke test, and fail the build on any Read-only file system log lines or non-zero exit.
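
A minimal sketch of that check (the image name, sizes, and startup wait are placeholders; add your real health probe where indicated):

#!/bin/sh
set -eu
docker run -d --name smoke --read-only \
  --tmpfs /tmp:rw,nosuid,nodev,noexec,size=64m \
  --tmpfs /var/run:rw,nosuid,nodev,noexec,size=16m \
  myapp:latest
sleep 5
# Fail the build if startup logged any read-only filesystem errors.
if docker logs smoke 2>&1 | grep -qi 'read-only file system'; then
  echo 'read-only rootfs violation detected' >&2
  docker rm -f smoke
  exit 1
fi
# Run the real smoke test here (HTTP health check, CLI probe, etc.), then clean up.
docker rm -f smoke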

Conclusion: next steps that actually stick

Read-only containers are one of those rare hardening measures that also improves operational clarity. They force you to declare where state lives, which makes debugging easier and compromise harder to turn into persistence. The cost is that your app’s lazy filesystem assumptions become your problem. That’s not a downside; that’s reality with the mask removed.

Do this next

  1. Pick one service that you own end-to-end and run it with --read-only in staging.
  2. Iterate writable mounts until it runs, then shrink them to the minimum directories.
  3. Move logs to stdout, stop runtime installers, and make caches explicit.
  4. Run as non-root and fix volume ownership properly.
  5. Turn the final mount list into a contract: checked in, reviewed, and tested.

If you do only one thing: stop letting containers write wherever they feel like. Your future incidents will be more boring, and that’s the highest compliment production can offer.
