One day your containers are fine. The next day the host is “mysteriously” out of disk, SSH is laggy, and every deploy turns into a roulette spin.
You check /var/log like a good citizen. It’s not the culprit. Then you remember the dark place: /var/lib/docker.
This is the failure mode where Docker logs grow until the filesystem taps out. It’s boring, common, and completely preventable.
Fix it properly and your future self won’t have to do emergency surgery on a live node at 2 a.m.
Fast diagnosis playbook
When disk is filling, you don’t want a philosophy lecture. You want answers in minutes: what’s big, what’s growing, and what to change so it
doesn’t happen again.
First: is it really Docker logs?
Check filesystem usage and confirm which mount is on fire. If / is full but Docker lives on a separate partition, don't waste time in
the wrong place. If Docker lives on root (common), you're about to learn why that's spicy.
Second: find the biggest offenders fast
Identify which container log files are huge. If a single container is generating gigabytes per hour, you’re not “fixing rotation,” you’re
treating a logging DDoS.
Third: confirm the logging driver and its limits
If you’re using json-file without max-size and max-file, logs will grow until the disk says “no.”
If you’re using journald, the log volume moves to systemd-journald, and you need to cap it there.
Fourth: decide what kind of fix you need
- Immediate containment: truncate huge logs, free disk, stop the bleeding.
- Configuration fix: set daemon defaults, enforce limits, restart Docker safely.
- Structural fix: route logs to a centralized system and stop treating local disk like a bottomless pit.
What’s actually growing (and why)
Docker’s default behavior is deceptively simple: whatever your container writes to stdout and stderr gets captured by the Docker engine and
written somewhere. The default “somewhere” on most Linux systems is the json-file logging driver.
With json-file, each container gets a log file (plus rotated ones if you configured rotation) under
/var/lib/docker/containers/<container-id>/<container-id>-json.log. That file grows. And grows. And keeps growing until
the filesystem is full—unless you tell Docker to rotate it.
Here’s the uncomfortable truth: “we have monitoring” doesn’t help if you don’t have guardrails. Disk alerts fire when it’s already bad.
Rotation is a seatbelt. Centralized logging is airbags. You want both.
One quote worth keeping on a sticky note:
"Hope is not a strategy."
— a paraphrase of uncertain attribution; the idea matters more than the byline.
Also: containers don’t magically make logging easier. They make it easier to create a lot of logs, fast, from many tiny places.
If your service is chatty, your host becomes the diary it never asked to be.
Joke #1: Docker logs are like free stickers at a conference—take enough and suddenly your laptop won’t close.
Interesting facts and historical context
- Docker originally pushed “logs to stdout” as a clean separation: apps emit logs; the platform decides where they go.
- The json-file logging driver became the default because it's simple, self-contained, and doesn't require external dependencies.
- Docker's log rotation options are per driver: the flags that work for json-file don't necessarily apply to journald.
- Kubernetes standardized container logs as files on the node (commonly under /var/log/containers), which made node-level log rotation a first-class operational problem.
- Systemd-journald has its own retention controls and can be configured to keep logs in memory, on disk, or both—great until nobody caps it.
- Overlay filesystems changed how “disk usage” feels: you can have plenty of space in one layer and still run out of space on the host filesystem that backs it.
- Early container platforms frequently shipped without opinionated log retention, because retention policy is inherently business-specific (compliance, forensics, cost).
- Many production outages blamed on “Docker storage” are actually plain disk-full events triggered by unbounded logs or runaway debug mode.
Practical tasks: commands, outputs, decisions
Below are real tasks you can run on a Linux Docker host. Each includes: the command, what typical output means, and what decision to make next.
The goal is to turn “disk is full” into a controlled sequence of checks.
Task 1: Confirm which filesystem is full
cr0x@server:~$ df -hT
Filesystem Type Size Used Avail Use% Mounted on
/dev/nvme0n1p2 ext4 120G 116G 2.5G 98% /
/dev/nvme0n1p1 vfat 512M 8.0M 504M 2% /boot/efi
tmpfs tmpfs 32G 0 32G 0% /dev/shm
What it means: Root is at 98%. If Docker data lives on /, you’re one deploy away from a bad time.
Decision: Proceed to locate Docker’s disk consumers immediately; don’t start “cleanup” blindly.
Task 2: Measure /var/lib/docker size quickly
cr0x@server:~$ sudo du -sh /var/lib/docker
87G /var/lib/docker
What it means: Docker is consuming most of root. That’s not automatically wrong, but it’s a strong signal.
Decision: Break down Docker storage into containers, images, volumes, and logs.
Task 3: See Docker’s own storage accounting
cr0x@server:~$ docker system df
TYPE TOTAL ACTIVE SIZE RECLAIMABLE
Images 42 18 19.6GB 6.2GB (31%)
Containers 65 24 3.8GB 1.1GB (28%)
Local Volumes 14 12 9.4GB 0B (0%)
Build Cache 0 0 0B 0B
What it means: Images/volumes aren’t massive enough to explain 87G. Logs or container writable layers are likely.
Decision: Go hunting for log files under container directories.
Task 4: Find the largest container log files
cr0x@server:~$ sudo find /var/lib/docker/containers -name "*-json.log" -printf "%s %p\n" | sort -nr | head -n 5
32213455120 /var/lib/docker/containers/9b2c.../9b2c...-json.log
11422577664 /var/lib/docker/containers/12ad.../12ad...-json.log
2213478400 /var/lib/docker/containers/7a11.../7a11...-json.log
845312000 /var/lib/docker/containers/fe88.../fe88...-json.log
331776000 /var/lib/docker/containers/0c19.../0c19...-json.log
What it means: You have at least one 32GB log file. That’s your disk leak.
Decision: Map the container ID to a name/service, then decide: truncate now, and fix rotation permanently.
Task 5: Map container ID to container name and image
cr0x@server:~$ docker ps -a --no-trunc --format 'table {{.ID}}\t{{.Names}}\t{{.Image}}\t{{.Status}}' | head
CONTAINER ID NAMES IMAGE STATUS
9b2c0f0e0d6c2f7d4c3d1b1e8c... api-prod registry/app:2.8.1 Up 3 days
12ad77c2b12d8b1f21b9e8f2aa... worker-prod registry/worker:5.1.0 Up 3 days
7a11aa90b9d9d7c0c1e0fda2bb... nginx-edge nginx:1.25 Up 10 days
What it means: The huge log belongs to api-prod. Now you can talk to the right team, or at least know who to glare at.
Decision: Inspect logging settings and rate of log growth; consider immediate truncation if disk is critically low.
Task 6: Check Docker daemon logging driver defaults
cr0x@server:~$ docker info --format '{{.LoggingDriver}}'
json-file
What it means: You’re on json-file. Rotation is not automatic unless configured.
Decision: Verify daemon has log-opts; if not, implement them.
Task 7: Inspect current daemon configuration
cr0x@server:~$ sudo cat /etc/docker/daemon.json
{
"storage-driver": "overlay2"
}
What it means: No log limits are configured globally.
Decision: Add log-driver and log-opts defaults, then restart Docker during a controlled window.
Task 8: Check a container’s effective log path (json-file)
cr0x@server:~$ docker inspect --format '{{.LogPath}}' api-prod
/var/lib/docker/containers/9b2c0f0e0d6c2f7d4c3d1b1e8c.../9b2c0f0e0d6c2f7d4c3d1b1e8c...-json.log
What it means: This is the exact file that’s growing. No guesswork.
Decision: If disk pressure is high, truncate that file (safely) as containment.
Task 9: Measure log growth rate (is it actively exploding?)
cr0x@server:~$ sudo ls -lh /var/lib/docker/containers/9b2c.../*-json.log
-rw-r----- 1 root root 30G Jan 2 11:41 /var/lib/docker/containers/9b2c.../9b2c...-json.log
cr0x@server:~$ sleep 10; sudo ls -lh /var/lib/docker/containers/9b2c.../*-json.log
-rw-r----- 1 root root 30G Jan 2 11:41 /var/lib/docker/containers/9b2c.../9b2c...-json.log
What it means: If size jumps noticeably in seconds, the service is currently log-spamming. If it’s stable, the “explosion” may have been earlier.
Decision: If it’s actively growing fast, treat it like an incident: reduce log level, rate-limit, or fix the crashing loop.
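If you want a number instead of eyeballing two ls outputs, sample the size twice and divide. This is a side sketch, not part of the task output above; it assumes GNU stat, where -c %s prints the file size in bytes.
cr0x@server:~$ f=/var/lib/docker/containers/9b2c.../9b2c...-json.log
cr0x@server:~$ s1=$(sudo stat -c %s "$f"); sleep 60; s2=$(sudo stat -c %s "$f"); echo "$(( (s2 - s1) / 60 )) bytes/sec"
Anything in the megabytes-per-second range is an active incident, not a cleanup ticket.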
Task 10: Truncate a huge log file without restarting the container
cr0x@server:~$ sudo truncate -s 0 /var/lib/docker/containers/9b2c.../9b2c...-json.log
cr0x@server:~$ sudo ls -lh /var/lib/docker/containers/9b2c.../9b2c...-json.log
-rw-r----- 1 root root 0 Jan 2 11:44 /var/lib/docker/containers/9b2c.../9b2c...-json.log
What it means: You just freed space immediately. The container keeps running; Docker keeps writing to the same inode.
Decision: This is containment, not a fix. Implement rotation next or you’ll be back here soon.
Task 11: Confirm disk space is back and the host can breathe
cr0x@server:~$ df -h /
Filesystem Size Used Avail Use% Mounted on
/dev/nvme0n1p2 120G 86G 29G 75% /
What it means: You bought time. Now spend it wisely.
Decision: Apply daemon-level rotation, then validate with a new container.
Task 12: Configure daemon-level json-file rotation (the real fix)
cr0x@server:~$ sudo tee /etc/docker/daemon.json >/dev/null <<'EOF'
{
"log-driver": "json-file",
"log-opts": {
"max-size": "10m",
"max-file": "5"
},
"storage-driver": "overlay2"
}
EOF
What it means: New containers will rotate at ~10MB and keep 5 files (roughly 50MB per container, plus overhead).
Decision: Restart Docker in a controlled way; note that existing containers may need recreation to adopt defaults.
Task 13: Validate configuration is syntactically OK before restart
cr0x@server:~$ sudo jq . /etc/docker/daemon.json
{
"log-driver": "json-file",
"log-opts": {
"max-size": "10m",
"max-file": "5"
},
"storage-driver": "overlay2"
}
What it means: Valid JSON. Docker won’t refuse to start because of a missing comma.
Decision: Restart Docker, but understand the blast radius: containers may restart depending on your init system and orchestration.
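One way to shrink that blast radius, if your platform allows it, is Docker's live-restore option, which keeps containers running while the daemon restarts. A quick check before you touch anything (template field as documented for current Docker releases; verify on your version):
cr0x@server:~$ docker info --format '{{.LiveRestoreEnabled}}'
false
If it's false and you want it, add "live-restore": true to daemon.json alongside the logging options; note that it doesn't apply to Swarm-managed services.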
Task 14: Restart Docker and confirm it came back cleanly
cr0x@server:~$ sudo systemctl restart docker
cr0x@server:~$ systemctl is-active docker
active
What it means: Docker is up. Now verify the logging settings are actually applied for new containers.
Decision: Run a test container and inspect its log config.
Task 15: Verify a new container inherits the log opts
cr0x@server:~$ docker run --name logtest alpine:3.20 sh -c 'i=0; while [ $i -lt 20000 ]; do echo "line $i"; i=$((i+1)); done'
line 0
line 1
line 2
...
cr0x@server:~$ docker inspect --format '{{json .HostConfig.LogConfig}}' logtest
{"Type":"json-file","Config":{"max-file":"5","max-size":"10m"}}
What it means: Rotation limits are present on the container config.
Decision: Remove the test container when you're done (docker rm logtest), then plan how to roll these settings onto existing long-lived containers (usually by recreating them).
Task 16: Identify which containers are missing rotation (mixed fleet reality)
cr0x@server:~$ for c in $(docker ps -q); do
name=$(docker inspect --format '{{.Name}}' "$c" | sed 's#^/##')
cfg=$(docker inspect --format '{{.HostConfig.LogConfig.Config}}' "$c")
echo "$name $cfg"
done | head
api-prod map[]
worker-prod map[max-file:3 max-size:50m]
nginx-edge map[]
What it means: map[] typically means no per-container overrides. Depending on Docker version, it may still inherit daemon defaults, or it may reflect old settings. Treat it as suspicious.
Decision: For any critical long-lived container with huge logs, explicitly set logging options or recreate it after daemon defaults are set.
Fix log rotation the right way
The correct fix depends on how you run containers. Standalone Docker hosts are one world. Docker Compose is another. Swarm exists (yes, still).
Kubernetes is its own universe. The principle is the same: you need bounded logs locally, and you need a place for logs to go when you actually care about them.
Daemon defaults: the baseline every host should have
If you do nothing else, set global limits in /etc/docker/daemon.json. This prevents “new container, new unbounded file.”
It also makes hosts predictable across environments.
A sane starting point for many services is:
10–50MB max-size and 3–10 max-file.
Smaller for high-churn services, larger if you need more local history for triage.
The trade-off is blunt: smaller rotation means you might lose older logs locally. That’s fine if you have centralized logging.
It’s reckless if local logs are your only logs.
Per-container overrides: use them sparingly, but use them
For that one container that’s always chatty (ingress, auth proxy, web server with access logs), you may want explicit overrides so a future daemon config change
doesn’t surprise you.
With Docker CLI you can set:
--log-opt max-size=... and --log-opt max-file=... on docker run. In Compose, you can specify logging options per service.
The point isn’t the syntax; the point is intent. Make “bounded local logs” a property of the workload, not a hope pinned to the host.
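As a sketch of that intent (the service name and image here are placeholders, not from the tasks above), the same limits expressed at the workload level:
cr0x@server:~$ docker run -d --name chatty-proxy \
    --log-driver json-file \
    --log-opt max-size=50m \
    --log-opt max-file=3 \
    nginx:1.25
And the per-service equivalent in a Compose file:
services:
  chatty-proxy:
    image: nginx:1.25
    logging:
      driver: json-file
      options:
        max-size: "50m"
        max-file: "3"
Either way, the limit travels with the workload definition, so a host with sloppy daemon defaults can't quietly undo it.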
Do not use OS-level logrotate on Docker json logs as your primary fix
People try this because they know logrotate. It feels familiar. It is also the wrong layer for Docker’s container logs.
Docker expects to own those files; if you rotate them externally by renaming, you can end up with Docker still writing to the old file handle.
Truncation can work in a pinch. Renaming is where the weirdness starts. If you insist on OS-level rotation for some reason, you must understand file descriptors
and copytruncate behavior. Most teams don’t want that relationship drama at scale.
Joke #2: Log rotation is like flossing—skipping it feels fine until it gets expensive and personal.
Make rotation enforceable, not a suggestion
The most common organizational failure: one host is fixed, the next host is “temporary,” and the next host is a snowflake built from someone’s memory.
Your fix must be codified:
- Config management (Ansible, Puppet, Chef) or immutable images that ship with daemon.json.
- CI checks that reject Compose files or run specs without logging limits.
- A baseline SRE runbook: “Any container must have bounded local logs.”
Pick a logging driver like you mean it
Docker’s logging driver decides where stdout/stderr goes. It’s not a cosmetic setting. It’s an operational contract:
performance, reliability, retention, and who gets paged when the disk fills.
json-file: simple, fast enough, dangerous when unbounded
Pros: local files are easy to inspect; no dependency on systemd; works everywhere; tooling expects it.
Cons: unbounded by default; duplicates effort if you also ship logs; can create huge write amplification on busy services.
In practice: json-file is fine if you cap it and you have log shipping. If you cap it and you don’t ship, you are choosing data loss in exchange for host safety.
That can still be the right choice, but be honest about it.
journald: centralized on the node, but still finite
Pros: consistent with system logging; rich metadata; queryable with journalctl; supports rate limiting and size caps via journald config.
Cons: if you don’t cap journald, you just moved the “disk full” problem; debugging across boots and persistence settings can surprise people.
If you use journald, you must configure journald retention. Otherwise you’ve built a better log hoarder, not a safer system.
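A minimal sketch of capping journald, assuming a systemd distribution that reads drop-ins from /etc/systemd/journald.conf.d/ (the values are examples, not recommendations):
cr0x@server:~$ sudo mkdir -p /etc/systemd/journald.conf.d
cr0x@server:~$ sudo tee /etc/systemd/journald.conf.d/size.conf >/dev/null <<'EOF'
[Journal]
SystemMaxUse=1G
SystemKeepFree=2G
MaxRetentionSec=7day
EOF
cr0x@server:~$ sudo systemctl restart systemd-journald
cr0x@server:~$ journalctl --disk-usage
The last command reports how much space the journals currently use; if you need space back immediately, sudo journalctl --vacuum-size=500M trims them on the spot.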
Remote drivers (syslog, fluentd, gelf, awslogs, splunk, etc.): fewer local files, more network dependencies
Pros: logs leave the node, which is the entire point; centralized retention; you can do analysis without SSH.
Cons: backpressure can hurt; network outages can drop logs or stall containers (depending on driver/settings); operational complexity moves to the logging pipeline.
A production-friendly approach is often: local bounded buffer + reliable shipping. Don’t bet your incident response on a single network hop.
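One relevant lever is the log delivery mode. The default is blocking; setting the mode to non-blocking with a ring buffer trades possible log loss for not stalling the application when the driver is slow. A hedged sketch of daemon.json defaults (the buffer size is an example, tune it to your traffic):
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "5",
    "mode": "non-blocking",
    "max-buffer-size": "4m"
  }
}
The mode and max-buffer-size options apply to whichever driver you pick; the common production pattern remains bounded json-file locally plus an agent that tails the files and ships them.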
Emergency response: the host is out of disk, now what
When the host is at 100%, the failure modes compound: Docker can’t write logs, containers crash, the kernel can’t allocate space for basics, and suddenly your
“simple logging issue” becomes an availability incident.
Containment goals (in order)
- Free space quickly to restore system stability (truncate biggest logs).
- Stop the high-volume logging source if it’s an abnormal behavior (debug mode, crash loop, exception storm).
- Apply rotation limits so the same pattern can’t refill the disk immediately.
- Preserve enough evidence to understand why it happened (sample logs before truncation if feasible).
What not to do in a disk-full panic
- Don't start deleting random directories under /var/lib/docker. That's how you "fix logs" by deleting your runtime.
- Don't run docker system prune -a as a reflex on a production node. You'll free space and also delete images your recovery needs.
- Don't restart Docker repeatedly hoping it will "clean something up." Restarts can trigger cascading container restarts.
If you need to keep evidence
If compliance or debugging requires preserving a slice of logs, capture a tail before truncation. This gives you the last few thousand lines without carrying
the whole 30GB log file.
cr0x@server:~$ sudo tail -n 5000 /var/lib/docker/containers/9b2c.../9b2c...-json.log > /root/api-prod-last-5000.jsonl
What it means: You preserved recent events. It’s not perfect, but it’s usually enough to see the error storm pattern.
Decision: Truncate the huge file after capturing what you need, then fix rotation.
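Each line in that file is a JSON object with log, stream, and time fields, so getting a readable view of what the service was screaming about is one jq call (assuming jq is installed):
cr0x@server:~$ jq -r .log /root/api-prod-last-5000.jsonl | tail -n 50
That strips the JSON wrapper and leaves the raw application output, which is usually all you need for the postmortem timeline.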
Three mini-stories from corporate reality
Mini-story 1: The incident caused by a wrong assumption
A mid-sized company ran a customer-facing API on a few beefy Docker hosts. The service team was disciplined about metrics and traces, and they assumed logs
were “someone else’s problem” because a logging agent was installed on the hosts. Everyone believed logs were shipped off-box and therefore harmless.
The assumption was wrong in a very specific way: the agent tailed files under /var/log and some app-specific paths, but it did not ingest Docker’s
container json logs under /var/lib/docker/containers. The containers dutifully wrote to stdout. Docker dutifully wrote json logs. Nobody capped them.
The shipping agent never noticed. The logs stayed local. Forever.
The outage didn’t start with the API failing. It started with the host’s root filesystem hitting 100%. At that point, everything became a liar: services failed
for reasons that looked unrelated—TLS handshakes timing out, health checks flapping, and containers restarting because their own log writes started erroring.
Ops saw a spread of symptoms and chased ghosts.
The fix was unglamorous: cap json-file logs globally, explicitly ship container logs, and add monitoring that tracks
/var/lib/docker/containers growth. The important cultural change was even more boring: “logging exists in two places—collection and retention.”
They had collection for some logs, and retention for none.
Mini-story 2: The optimization that backfired
Another organization got tired of disk usage spikes. They decided to set max-size extremely low across the fleet. Think single-digit megabytes.
The reasoning was clean: keep nodes safe, ship everything centrally, and let local disk be an emergency buffer only.
Two weeks later, incident response started failing in a new way. When the central logging pipeline had hiccups (maintenance, network saturation, or just an
overloaded indexer), the local logs were too small to bridge the gap. Engineers would SSH to nodes during an outage and find that the last local logs only covered
a minute or two. The pipeline was down, and the local buffer was effectively empty. Triage turned into guesswork.
Then came the second-order effect: frequent rotations created lots of small files and more metadata churn. On some nodes, the combination of high log volume and
very small rotation thresholds increased overhead. The “optimization” wasn’t catastrophic, but it made the system noisier and harder to reason about.
The eventual compromise was sane: increase max-size to keep a meaningful window locally, but still bounded; make shipping resilient; and add alerting on
log pipeline health. Rotation is not a substitute for reliable collection. It’s a safety net, not the trapeze.
Mini-story 3: The boring but correct practice that saved the day
A finance-adjacent company ran container workloads on hosts they treated as cattle, not pets. The SRE team had a habit that looked tedious in ticket reviews:
every base image baked in Docker daemon defaults for logging, and every host provision included a small validation step that started a test container and inspected
its effective LogConfig. Nobody celebrated this. It was just another checkbox.
One afternoon, a release introduced an exception storm. The service started dumping stack traces at high volume. It would have been the classic “logs fill disk
and everything dies” incident. But on these nodes, each container had hard log caps. Disk usage rose, then leveled off.
The incident still hurt: the service was unhealthy and needed rollback. But the host fleet stayed stable. No cascading failures. No panicked cleanup that risks
deleting images and breaking recovery. The team could focus on the actual bug instead of playing janitor for their filesystems.
The postmortem was almost dull. And that’s the compliment. The boring practice—enforced defaults plus a tiny validation test—turned a potential platform incident
into a straightforward application incident. Your best operational wins look like “nothing special happened.”
Common mistakes (symptom → root cause → fix)
1) Symptom: Root filesystem fills every few days
Root cause: json-file driver with no max-size/max-file, plus chatty workloads.
Fix: Set daemon defaults in /etc/docker/daemon.json, restart Docker safely, and recreate long-lived containers if needed.
2) Symptom: You configured log rotation but old containers still have huge logs
Root cause: Existing containers may not adopt new daemon defaults; they keep their previous LogConfig.
Fix: For critical services, explicitly set per-service logging options (Compose/Swarm) or recreate containers after applying defaults.
3) Symptom: Disk is full but docker system df looks normal
Root cause: Docker’s accounting doesn’t always reflect raw log file bloat or filesystem-level realities.
Fix: Measure directly with find and du under /var/lib/docker/containers; treat logs as first-class disk consumers.
4) Symptom: You switched to journald and disk still fills
Root cause: journald is retaining too much; no caps or persistent storage is growing unbounded.
Fix: Configure journald retention (e.g., SystemMaxUse in journald.conf) and monitor journal disk usage. Don't "set journald" and walk away.
5) Symptom: After external logrotate, Docker keeps writing to a “deleted” file and space isn’t freed
Root cause: File descriptor still points to the old inode; renaming/deleting doesn’t reclaim space until the writer closes.
Fix: Prefer Docker’s built-in rotation. In emergencies, use truncate on the active log path.
6) Symptom: Containers stall when logging backend is slow
Root cause: Some logging drivers can apply backpressure; stdout writes can block if the driver can’t keep up.
Fix: Test driver behavior under load, ensure adequate buffering, and avoid synchronous remote logging without resilience.
7) Symptom: You set max-size tiny and now you can’t debug incidents
Root cause: Local log retention window is too small; centralized logging isn’t reliable enough.
Fix: Increase local caps to preserve a meaningful window, and harden log shipping pipeline health and alerting.
8) Symptom: Log files rotate but disk usage still creeps up
Root cause: The real culprit is elsewhere: overlay2 writable layers, dangling volumes, build cache, or non-Docker logs.
Fix: Break down disk usage with du and docker system df, then prune in a targeted way (with change control).
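For that breakdown, a du pass scoped to the Docker data root usually answers which bucket is growing (-x stays on one filesystem; expect it to take a while on busy hosts):
cr0x@server:~$ sudo du -xh --max-depth=1 /var/lib/docker | sort -h | tail -n 6
Roughly: containers/ holds per-container metadata plus json logs, overlay2/ holds image layers and container writable layers, volumes/ holds named volumes.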
Checklists / step-by-step plan
Step-by-step: stabilize a host that’s filling up right now
- Confirm which filesystem is full (df -hT).
- Find the biggest log files under /var/lib/docker/containers.
- Map the container ID to a service and check if it's in a crash loop or debug mode.
- Capture a small tail if you need evidence (tail -n).
- Truncate the top offenders (truncate -s 0).
- Apply daemon logging defaults and validate the JSON.
- Restart Docker in a controlled way (or schedule it). Confirm it’s active.
- Recreate or redeploy the worst offenders so they pick up the new logging policy.
- Set alerting on disk usage and on log growth rate (directory size deltas), not just “disk is 90%.”
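The last item doesn't need fancy tooling. A minimal sketch of a growth check you could run from cron as root (the state path, threshold, and alert mechanism are placeholders for whatever your monitoring actually uses):
#!/bin/sh
# Hypothetical growth check: complain if /var/lib/docker/containers
# grew by more than ~1 GiB (1048576 KiB) since the previous run.
STATE=/var/tmp/docker-log-growth.last
CUR=$(du -sxk /var/lib/docker/containers | awk '{print $1}')   # size in KiB
PREV=$(cat "$STATE" 2>/dev/null || echo "$CUR")
echo "$CUR" > "$STATE"
DELTA=$(( CUR - PREV ))
if [ "$DELTA" -gt 1048576 ]; then
    echo "docker container logs grew ${DELTA} KiB since last check" | logger -t log-growth
fi
Wire the echo into your real alerting instead of syslog if you have something better; the point is watching the rate, not just the level.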
Baseline checklist: every Docker host should meet this
- The daemon has an explicit log-driver and bounded log-opts (or journald retention is configured).
- /var/lib/docker capacity is sized for images + volumes + bounded logs, not "whatever is left on root."
- Monitoring includes:
  - Filesystem usage alerts with sensible thresholds
  - Inode usage alerts (yes, log rotation can shift the problem)
  - Growth alerts on /var/lib/docker/containers
- Log pipeline health is monitored if logs are shipped off-host.
- Runbook includes safe commands: inspect log paths, truncate, verify daemon config.
Change management checklist: rolling out new log settings safely
- Decide defaults (max-size, max-file) based on an acceptable local retention window.
- Update /etc/docker/daemon.json with valid JSON; validate with jq.
- Pick a rollout method:
  - Recreate containers gradually (preferred)
  - Host-by-host maintenance restart
- Verify that new containers have the expected HostConfig.LogConfig.
- Confirm disk usage stabilizes over several days and that incident response still has enough local logs.
FAQ
1) Why are Docker logs so big when my app “doesn’t log that much”?
Your app may not “log” intentionally, but it might be writing noisy stderr output, repeating stack traces, printing access logs at high QPS, or running in debug mode.
Also, some libraries log more than you expect under error conditions (retry storms are a classic).
2) Does setting max-size and max-file delete logs?
It rotates and removes older chunks once the retention limit is reached. That is deletion, by design. If you need longer retention, ship logs centrally.
Local disk is not an archive.
3) Will daemon.json logging defaults apply to running containers?
Not reliably. Running containers keep their configured log settings. Treat daemon defaults as “for new containers,” and plan to recreate long-lived workloads.
4) Is truncating the json log safe?
Yes for containment. truncate -s 0 keeps the file and inode; Docker continues writing. You lose historical logs in that file, so capture a tail first if needed.
5) Should I switch from json-file to journald?
If your fleet is systemd-based and your team is comfortable with journalctl, journald can be a good choice. But cap journald retention. Otherwise you just moved the overflow point.
6) What about Kubernetes—does this still matter?
Yes. Kubernetes still relies on node-level log handling. Container runtime logs, kubelet behavior, and node log rotation settings all influence disk pressure.
If you ignore it, nodes get evicted and workloads flap. Different tooling, same physics.
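On Kubernetes the equivalent knobs live in the kubelet configuration rather than Docker's daemon.json. A sketch, assuming a CRI runtime where the kubelet manages container log rotation (field names from the KubeletConfiguration v1beta1 API; values are examples):
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
containerLogMaxSize: 10Mi
containerLogMaxFiles: 5
Nodes still need disk and inode alerts either way; rotation only bounds one class of consumer.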
7) Why not just prune Docker regularly?
Pruning images and caches can help, but it doesn’t address the core problem: unbounded log writes. Also, aggressive pruning can slow deploys and break rollback paths.
Fix the log policy first, then prune intentionally.
8) How do I choose good rotation values?
Pick a local retention window you can live with during a logging pipeline outage. For many teams, 30 minutes to a few hours of logs is a useful buffer.
Convert that into size based on your typical log rate, then cap it. Start conservative, observe, adjust.
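A worked example with made-up numbers: if a service writes roughly 100 KB of logs per second under load, that's about 6 MB per minute, so a two-hour local window is roughly 720 MB. You could approximate that with max-size=100m and max-file=8 (about 800 MB per container). Measure your own rate before copying those values.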
9) My disk is full but I can’t find huge json logs—what else should I check?
Look at volumes and writable layers (application caches, uploads, temp files), build caches, and any other directories on the same filesystem.
Also check inode exhaustion: lots of tiny rotated files can hurt even when space looks available.
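Inode pressure is one command to check (look at the IUse% column):
cr0x@server:~$ df -i /
If IUse% is near 100 while plain df still shows free space, something is churning out huge numbers of small files, and rotation settings or an application cache are the usual suspects.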
10) Can log rotation hurt performance?
Excessively frequent rotation can add overhead. But the performance cost of unbounded logs is worse: constant disk writes, high I/O wait, and eventual outage.
The right goal is “bounded and boring.”
Conclusion: next steps that actually stick
If Docker logs are exploding, it’s not a quirky edge case. It’s a missing safety feature. Your host’s disk is not a charity.
The fix is straightforward: identify the log files, cap them properly, and stop treating “stdout” as an infinite sink.
Do these next, in this order:
- Today: find the biggest *-json.log files; truncate the worst offenders if disk pressure is high.
- This week: set daemon-level log rotation defaults; validate with a test container; recreate critical long-lived services to adopt the policy.
- This month: make log policy enforceable in provisioning and CI; ensure centralized logging is healthy and monitored; alert on growth, not just fullness.
Once this is in place, disk-full incidents from logging drop out of your rotation schedule. Your future pages should be about real failures, not your platform
drowning in its own narration.