Everything looks fine until you reboot. Then the “simple” Docker Compose stack becomes a crime scene: containers start in the wrong order, volumes aren’t mounted yet, networks are missing, and your database is up but your app is convinced the universe is still down.
Compose is great at describing an application. systemd is great at making sure your machine behaves like a machine. Put them together properly and you stop writing boot-time shell spaghetti that only works when nobody is watching.
What we are solving (and what we are not)
This is about running Compose-defined stacks reliably after reboot using systemd. Reliably means:
- The stack starts at boot without manual intervention.
- It does not start too early (before disks, networking, or Docker are ready).
- It shuts down cleanly so you don’t corrupt stateful services.
- You can diagnose failures fast with the tools already on the box.
This is not about turning Compose into Kubernetes. Compose will not become a scheduler, a self-healing cluster manager, or a multi-node orchestrator. If you need those, you already know it, and you already have scars.
Also: we’re not doing “@reboot sleep 30 && docker compose up -d”. That’s not engineering. That’s a ritual.
Facts and history that actually matter
A little context helps because half of “Compose + systemd problems” are really “I assumed old behavior still applies.” Here are concrete facts with operational consequences:
- Compose started life as Fig (2013), a Python tool. That heritage is why some people still think “Compose is a Python thing” and treat it like a script rather than a lifecycle tool.
- Docker introduced restart policies early (`restart: always`, `unless-stopped`). Those policies are enforced by the Docker daemon, not systemd, which means they behave differently during shutdown ordering.
- systemd became mainstream on major distros in the mid-2010s. Before that, init scripts were “best effort.” If you’re copying guides from that era, you’re inheriting their randomness.
- Compose V2 is a Docker CLI plugin (`docker compose`), not the old `docker-compose` Python binary. Units that hardcode the old path break after upgrades.
- Docker’s startup wiring differs by distro (the `docker.service` vs `docker.socket` interplay). Correct ordering requires you to be explicit about what you depend on.
- `depends_on` never meant “wait until ready.” It’s start ordering, not readiness. Healthchecks plus wait logic (or app retry logic) still matter.
- journald logging is not a Docker feature; it’s a host logging decision. If you don’t integrate logs into the host’s logging path, you will debug boot issues blind.
- System shutdown is a different universe than system boot. If you don’t handle shutdown timeouts, your database may receive SIGKILL like it stole something.
- Rootless Docker is real ops now, and it shifts where sockets live, how units are installed, and who owns the lifecycle. Units written for rootful Docker quietly fail under rootless.
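The `depends_on` caveat is easy to demonstrate. Here is a hedged Compose sketch (service names, image tags, and the healthcheck command are illustrative) that gates app startup on database health rather than mere start order:

```yaml
services:
  api:
    image: registry.local/myapp-api:1.9.2   # placeholder image
    depends_on:
      db:
        condition: service_healthy   # wait for health, not just "started"
  db:
    image: postgres:16
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 3s
      retries: 10
      start_period: 30s   # headroom for crash recovery after an unclean shutdown
```

Note that `condition: service_healthy` only gates the initial `up`; the app still needs its own retry logic for mid-life database restarts.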
One quote worth keeping on the wall, because it describes 90% of boot-time container failures:
Werner Vogels (paraphrased): “Everything fails; design so failure is expected and recovery is automatic.”
Principles for reboot-proof stacks
1) Pick one supervisor: systemd or Docker restart policies
Don’t make them fight. You can use both, but you must understand the consequence: systemd supervises the Compose command, while Docker supervises the containers. If systemd thinks the service is “done” and Docker restarts containers on its own, you can end up with misleading health signals and confusing restarts.
My preference for a single host running a small number of stacks:
- Use systemd to start the stack at boot and stop it at shutdown.
- Use Docker restart policies inside Compose for container restarts after the stack is running (crashes, transient failures).
That combination keeps the boot/shutdown lifecycle explicit while still giving you runtime resilience.
2) Make ordering real: disks, network, Docker, then Compose
“After=docker.service” is necessary but often insufficient. If your stack depends on a mounted filesystem (NFS, iSCSI, encrypted disk, ZFS dataset import), you must express that ordering too. Otherwise your containers start with empty directories and create fresh state in the wrong place, which is how you get “why is it using SQLite in /var/lib?” at 2 a.m.
3) Don’t confuse “running” with “ready”
systemd can tell you a service started. Docker can tell you a container is running. Neither can tell you that Postgres has finished crash recovery, or that your app has run migrations, unless you wire it in.
This is where healthchecks, retries, and timeouts stop being academic and start preventing pager noise.
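As a minimal sketch of app-side retry with backoff, a POSIX shell function you might call from an entrypoint before running migrations (the attempt counts, delays, and the `pg_isready` example are assumptions, not a prescription):

```shell
#!/bin/sh
# Sketch: bounded retries with a fixed delay between attempts.
set -eu

# retry <max_tries> <delay_seconds> <command...>
retry() {
  max_tries=$1; delay=$2; shift 2
  i=1
  until "$@"; do
    if [ "$i" -ge "$max_tries" ]; then
      echo "giving up after $i attempts" >&2
      return 1
    fi
    echo "attempt $i failed; retrying in ${delay}s" >&2
    sleep "$delay"
    i=$((i + 1))
  done
  echo "succeeded on attempt $i"
}

# Demo probe: 'true' succeeds immediately. In a real entrypoint this would be
# something like: retry 30 2 pg_isready -h "$DB_HOST" -p 5432
retry 3 1 true
```

In practice the wrapper ends with `exec` of the actual service, so the app only starts once its dependency answers.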
4) Keep stateful services boring
Boring means: stable paths, explicit mounts, explicit stop timeouts, and no surprise upgrades at reboot. The “cool” approach is how you learn the hard way that databases dislike abrupt SIGKILL.
Short joke #1: If your stack only boots when you whisper “just this once,” your server has learned emotional dependency, not automation.
Designing a correct systemd unit for Compose
What “correct” looks like
A good unit file does four things:
- Orders itself after prerequisites (Docker, mounts, network-online if needed).
- Starts the stack idempotently.
- Stops the stack cleanly within a realistic timeout.
- Surfaces logs and failure states where your normal tooling will see them.
Unit semantics that matter in production
These are the knobs that decide whether you spend your morning drinking coffee or reading postmortems:
- Type=oneshot + RemainAfterExit=yes: systemd runs a command to bring the stack up, then considers the service “active” without keeping a process attached. This mirrors reality: Docker holds the containers, not the Compose process.
- ExecStart/ExecStop: Use `docker compose up -d` for start, and `docker compose stop` or `down` for stop. Pick based on whether you want networks/volumes removed.
- TimeoutStartSec/TimeoutStopSec: Give enough time for image pulls (start) and for databases to flush (stop). Default timeouts are not a moral statement; they’re just defaults.
- WorkingDirectory: Set it. Compose resolves relative paths, env files, and project names based on it. Leaving it implicit is how you accidentally start an empty project from `/`.
- EnvironmentFile: Good for per-host config that isn’t in Git. Also a neat way to keep secrets out of unit files (though not a full secret system).
- RequiresMountsFor=: Underused and excellent. It makes systemd wait for a path’s filesystem to be mounted before starting.
- After=network-online.target: Only if you truly need it. It can slow boot if your network setup is flaky. Use it when you depend on remote resources.
Which Compose command to use: up, start, down, stop
Here’s the opinionated mapping:
- Start: `docker compose up -d --remove-orphans` (removes forgotten containers from older configs; avoids ghost services).
- Stop (stateful-friendly): `docker compose stop` (keeps containers and networks defined; quicker restart).
- Stop (clean slate): `docker compose down` (removes containers and networks; use when you want to recreate on boot).
For most production stacks: start with `up -d`, stop with `stop`. Use `down` when you have a compelling reason (like immutable infrastructure patterns) and you’re sure your volumes are external/persistent.
Logging: choose journald or stick to Docker logs, but be deliberate
During boot failures, you want a single pane of glass. If your organization is already using systemd journal, then make the systemd unit output useful: add --log-level where supported, and ensure failures return non-zero.
Separately, decide where container stdout/stderr go. Docker’s default JSON log driver is fine until it isn’t, then you discover disk usage the fun way. If you use journald as Docker’s log driver, you get centralized host-level log querying with journalctl. If you keep json-file, use rotation settings.
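If you stay on `json-file`, a rotation sketch for `/etc/docker/daemon.json` (the sizes are illustrative; the settings apply to newly created containers after a daemon restart):

```json
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "50m",
    "max-file": "3"
  }
}
```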
Concrete unit file patterns (rootful and rootless)
Pattern A: rootful Docker, oneshot unit, clean ordering
This is the pattern I deploy most often for a single host. It’s simple, predictable, and doesn’t pretend Compose is a daemon.
cr0x@server:~$ sudo tee /etc/systemd/system/compose@.service > /dev/null <<'EOF'
[Unit]
Description=Docker Compose stack (%i)
Requires=docker.service
After=docker.service
Wants=network-online.target
After=network-online.target
# If your stack uses persistent data on a specific mount, uncomment and set:
# RequiresMountsFor=/srv/%i
[Service]
Type=oneshot
RemainAfterExit=yes
WorkingDirectory=/srv/%i
EnvironmentFile=-/srv/%i/.env
# Pull is optional; use it when you can tolerate boot-time pulls.
ExecStart=/usr/bin/docker compose up -d --remove-orphans
ExecStop=/usr/bin/docker compose stop
ExecStopPost=/usr/bin/docker compose rm -f
TimeoutStartSec=300
TimeoutStopSec=180
[Install]
WantedBy=multi-user.target
EOF
Notes you should care about:
- `compose@.service` is a template unit. You can run `compose@myapp` and it will use `/srv/myapp`.
- `EnvironmentFile=-` makes the file optional. If it’s missing, the unit still runs.
- `ExecStopPost=... rm -f` removes stopped containers so a future `up` doesn’t inherit weird state. If you prefer to keep containers, delete that line.
Pattern B: rootful Docker, “down on stop” for immutable stacks
If you treat the host as cattle (or at least as a bored pet), you may want down so every boot recreates containers. Make sure volumes are real volumes, not anonymous defaults.
cr0x@server:~$ sudo tee /etc/systemd/system/compose-immutable@.service > /dev/null <<'EOF'
[Unit]
Description=Immutable Docker Compose stack (%i)
Requires=docker.service
After=docker.service
[Service]
Type=oneshot
RemainAfterExit=yes
WorkingDirectory=/srv/%i
ExecStart=/usr/bin/docker compose up -d --remove-orphans
ExecStop=/usr/bin/docker compose down --remove-orphans
TimeoutStartSec=300
TimeoutStopSec=240
[Install]
WantedBy=multi-user.target
EOF
Be honest: if your database uses a bind mount into /srv/%i/data and that path isn’t mounted yet, down won’t save you. It’ll just recreate wrong faster.
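“Real volumes” here means declared, named volumes. A sketch (names illustrative): a top-level named volume survives `docker compose down`, while an anonymous volume on the data path does not reliably follow the recreated container:

```yaml
services:
  db:
    image: postgres:16
    volumes:
      - dbdata:/var/lib/postgresql/data   # named volume: survives `down`
volumes:
  dbdata: {}
```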
Pattern C: rootless Docker + user systemd units
Rootless Docker is attractive for security boundaries. It’s also different enough to punish copy-paste. The socket and service live under the user session, and units should be installed as user services.
cr0x@server:~$ mkdir -p ~/.config/systemd/user
cr0x@server:~$ tee ~/.config/systemd/user/compose@.service > /dev/null <<'EOF'
[Unit]
Description=User Docker Compose stack (%i)
# Rootless Docker installs a user-level docker.service; order after it.
Requires=docker.service
After=docker.service
[Service]
Type=oneshot
RemainAfterExit=yes
WorkingDirectory=%h/stacks/%i
ExecStart=/usr/bin/docker compose up -d --remove-orphans
ExecStop=/usr/bin/docker compose stop
TimeoutStartSec=300
TimeoutStopSec=180
[Install]
WantedBy=default.target
EOF
And you must enable lingering if you expect it to start at boot without interactive login:
cr0x@server:~$ sudo loginctl enable-linger cr0x
If you forget lingering, everything works in your terminal and fails after reboot. That’s not a mystery; it’s a lifecycle mismatch.
Practical tasks: commands, outputs, and decisions
These are not “nice to know.” These are the things you actually run when someone says, “It didn’t come back after reboot.” Each task includes the command, representative output, what it means, and the decision you make from it.
Task 1: Confirm Compose V2 vs legacy Compose binary
cr0x@server:~$ docker compose version
Docker Compose version v2.24.6
Meaning: Compose is available as the Docker CLI plugin.
Decision: Write unit files calling /usr/bin/docker compose. Do not hardcode docker-compose unless you’ve verified it exists and is managed.
Task 2: Verify Docker daemon is up and not degraded
cr0x@server:~$ systemctl status docker --no-pager
● docker.service - Docker Application Container Engine
Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
Active: active (running) since Tue 2026-01-03 09:12:41 UTC; 2min 10s ago
TriggeredBy: ● docker.socket
Docs: man:docker(1)
Meaning: Docker is running and socket-activated.
Decision: If Docker is inactive or failed, fix Docker first. Compose units depending on Docker will cascade-fail.
Task 3: Check whether your Compose unit is enabled and which target wants it
cr0x@server:~$ systemctl is-enabled compose@myapp.service
enabled
Meaning: It should start at boot when its target is reached.
Decision: If it’s disabled, enable it. If it’s static, you wrote a unit without an [Install] section.
Task 4: See if systemd thinks the Compose service is active
cr0x@server:~$ systemctl status compose@myapp.service --no-pager
● compose@myapp.service - Docker Compose stack (myapp)
Loaded: loaded (/etc/systemd/system/compose@.service; enabled; preset: enabled)
Active: active (exited) since Tue 2026-01-03 09:13:04 UTC; 1min 40s ago
Process: 2214 ExecStart=/usr/bin/docker compose up -d --remove-orphans (code=exited, status=0/SUCCESS)
Meaning: The Compose “start action” succeeded; containers should be managed by Docker.
Decision: If it’s failed, go to journal logs for the unit and fix the immediate error (missing env, missing compose file, permissions).
Task 5: Read unit logs for the last boot only
cr0x@server:~$ journalctl -u compose@myapp.service -b --no-pager
Jan 03 09:13:03 server systemd[1]: Starting Docker Compose stack (myapp)...
Jan 03 09:13:04 server docker[2214]: [+] Running 3/3
Jan 03 09:13:04 server docker[2214]: ✔ Network myapp_default Created
Jan 03 09:13:04 server docker[2214]: ✔ Container myapp-db-1 Started
Jan 03 09:13:04 server docker[2214]: ✔ Container myapp-api-1 Started
Jan 03 09:13:04 server systemd[1]: Started Docker Compose stack (myapp).
Meaning: This is the truth source for whether boot-time start happened.
Decision: If logs show missing files or mount errors, fix dependencies/order. If logs show success but app is down, the issue is inside the containers or their dependencies (readiness, network, DB recovery).
Task 6: Confirm containers exist and match the Compose project
cr0x@server:~$ docker ps --format 'table {{.Names}}\t{{.Status}}\t{{.Ports}}'
NAMES STATUS PORTS
myapp-api-1 Up 1 minute (healthy) 0.0.0.0:8080->8080/tcp
myapp-db-1 Up 1 minute 5432/tcp
Meaning: Containers are present; health status is visible for services with healthchecks.
Decision: If containers are missing, Compose didn’t run or ran in the wrong directory. If containers are restarting, inspect logs and resource pressure.
Task 7: Inspect why a container is restarting (last exit code, OOM, health)
cr0x@server:~$ docker inspect myapp-api-1 --format '{{.State.Status}} {{.State.ExitCode}} OOM={{.State.OOMKilled}} Health={{if .State.Health}}{{.State.Health.Status}}{{end}}'
running 0 OOM=false Health=unhealthy
Meaning: It’s running but unhealthy; likely dependency not ready or app misconfigured.
Decision: Don’t restart blindly. Check logs and dependency reachability. If OOM=true, adjust memory limits or host capacity.
Task 8: Validate compose file resolution and environment at the exact WorkingDirectory
cr0x@server:~$ cd /srv/myapp
cr0x@server:~$ /usr/bin/docker compose config
name: myapp
services:
api:
image: registry.local/myapp-api:1.9.2
environment:
DB_HOST: db
db:
image: postgres:16
Meaning: Compose can parse and render the final config.
Decision: If this fails, systemd will fail too. Fix syntax, missing env vars, missing files, or wrong WorkingDirectory.
Task 9: Confirm your persistent data path is mounted before Compose runs
cr0x@server:~$ findmnt /srv/myapp
TARGET SOURCE FSTYPE OPTIONS
/srv/myapp tank/appdata/myapp zfs rw,xattr,noacl
Meaning: Your stack’s data directory is a real mount (here, ZFS dataset).
Decision: If findmnt shows nothing, you’re writing into the root filesystem. Add RequiresMountsFor=/srv/myapp to the unit and fix your mount/import ordering.
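With the template unit from Pattern A, that decision maps to a per-instance drop-in, e.g. via `sudo systemctl edit compose@myapp` (path shown for a hand-created file; run `systemctl daemon-reload` afterward in that case):

```ini
# /etc/systemd/system/compose@myapp.service.d/override.conf
[Unit]
RequiresMountsFor=/srv/myapp
```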
Task 10: Check boot ordering and dependency graph
cr0x@server:~$ systemctl list-dependencies compose@myapp.service --no-pager
compose@myapp.service
● ├─docker.service
● ├─network-online.target
● └─multi-user.target
Meaning: systemd will not start the Compose service until those units are satisfied.
Decision: If your mount isn’t listed, you’re relying on timing. Add RequiresMountsFor or explicit mount units.
Task 11: Time where boot is slow: critical chain
cr0x@server:~$ systemd-analyze critical-chain compose@myapp.service
compose@myapp.service +1.820s
└─docker.service +1.301s
└─network-online.target +1.005s
└─NetworkManager-wait-online.service +1.002s
Meaning: Your Compose stack isn’t the slow part; the system is waiting on network-online.
Decision: If the stack doesn’t truly need network-online, drop it. Otherwise fix the network-online provider (e.g., wait-online service configuration).
Task 12: Confirm shutdown behavior and stop timeout
cr0x@server:~$ systemctl show compose@myapp.service -p TimeoutStopUSec -p ExecStop
TimeoutStopUSec=3min
ExecStop={ path=/usr/bin/docker ; argv[]=/usr/bin/docker compose stop ; ignore_errors=no ; start_time=[n/a] ; stop_time=[n/a] ; pid=0 ; code=(null) ; status=0/0 }
Meaning: systemd will allow 3 minutes for clean stop.
Decision: If you run databases, 10 seconds is comedy. Increase TimeoutStopSec, and tune Compose service stop_grace_period if needed.
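On the Compose side, the per-service grace period looks like this (the 2-minute value is an assumption; keep it comfortably below the unit’s `TimeoutStopSec` so systemd doesn’t cut the stop short):

```yaml
services:
  db:
    image: postgres:16
    stop_grace_period: 2m   # time between SIGTERM and SIGKILL for this container
```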
Task 13: Validate Docker logging driver and rotation (to prevent boot-time disk full surprises)
cr0x@server:~$ docker info --format '{{.LoggingDriver}}'
json-file
Meaning: Containers log to JSON files by default.
Decision: Ensure /etc/docker/daemon.json configures rotation, or consider journald if that fits your ops model. Disk-full during boot can prevent Docker from starting at all.
Task 14: Detect whether you’re dealing with rootless Docker when you thought you weren’t
cr0x@server:~$ docker context show
default
cr0x@server:~$ systemctl --user status docker --no-pager
Unit docker.service could not be found.
Meaning: Likely rootful Docker (or user service not installed). Rootless setups usually have a user-level docker service and a different socket path.
Decision: Align the unit location (system vs user) with how Docker runs. Mismatched assumptions cause “works in shell, fails on boot.”
Fast diagnosis playbook
When a stack doesn’t come back after reboot, you don’t have time for interpretive dance. This is the fastest path to the bottleneck.
First: prove whether systemd ran the start command
- Check unit state: `systemctl status compose@X.service`
- Check last-boot logs: `journalctl -u compose@X.service -b`
If the unit never ran, you’re in enablement/Install/target land. If it ran and failed, fix the reported error before touching containers.
Second: prove whether Docker was ready and stayed alive
- `systemctl status docker` for daemon state.
- `journalctl -u docker -b` for storage driver errors, disk full, permission issues, daemon crash loops.
If Docker is unhealthy, Compose is irrelevant. Fix Docker’s storage, disk, or configuration first.
Third: prove whether prerequisites were available (mounts, network, secrets)
- `findmnt /srv/X` or the relevant paths.
- `systemctl list-dependencies compose@X.service` to see if mounts are required.
- Check presence and permissions of env files, compose files, and bind mount directories.
Fourth: if everything “started,” chase readiness and application-level failures
- `docker ps` and health status.
- `docker logs --tail=200` for the failing container(s).
- `docker inspect` for exit code, OOMKilled, health failures.
Fifth: isolate resource bottlenecks
- CPU/memory pressure: `docker stats --no-stream`, `free -h`.
- Disk pressure: `df -h`, `docker system df`.
- Slow mounts: `systemd-analyze critical-chain` and mount unit logs.
Short joke #2: “It worked after I rebooted again” is not a fix; it’s a slot machine with better branding.
Common mistakes: symptom → root cause → fix
This is the section you’ll wish you’d read before the outage call.
1) Symptom: Unit says “active (exited)” but containers are missing
- Root cause: Wrong `WorkingDirectory`, so Compose ran against an empty directory and created a new project elsewhere (or did nothing).
- Fix: Set `WorkingDirectory=/srv/myapp` (or equivalent). Run `docker compose config` from that directory to validate. Consider `--project-name` if you must, but usually directory-based naming is fine.
2) Symptom: Containers start, then app fails to connect to database right after boot
- Root cause: You relied on `depends_on` for readiness. The DB container is running but not accepting connections yet (crash recovery, fsck, WAL replay, encryption unlock).
- Fix: Add a healthcheck to the database container and implement retry/backoff in the application. If you must gate startup, use a small init/wait step in the app container entrypoint, not in systemd.
3) Symptom: After reboot, data directory is empty or “reset”
- Root cause: Mount wasn’t ready; container created a new directory on the root filesystem and initialized fresh data. Later the mount appears, hiding the wrong data.
- Fix: Add `RequiresMountsFor=/srv/myapp` (or the exact data path) to the unit. For ZFS, ensure import happens early. For encrypted disks, ensure unlock units precede Docker/Compose.
4) Symptom: Compose unit fails with “Cannot connect to the Docker daemon” at boot
- Root cause: Your unit runs before Docker socket/daemon is ready, or Docker is slow due to storage checks.
- Fix: Ensure `Requires=docker.service` and `After=docker.service`. If Docker is socket-activated, still depend on the service. Consider increasing `TimeoutStartSec` for your Compose unit if Docker startup is slow.
5) Symptom: Shutdown hangs for a long time, then containers get killed
- Root cause: Too short stop timeouts, or your unit doesn’t stop the stack at all, leaving Docker to handle it late in shutdown.
- Fix: Add `ExecStop=docker compose stop` and a realistic `TimeoutStopSec`. For stateful services, set Compose `stop_grace_period` and avoid `down` if you want faster restarts without recreation.
6) Symptom: Unit works manually, fails at boot with missing env vars
- Root cause: You used shell profile variables or relied on an interactive environment. systemd services do not load your shell RC files.
- Fix: Use `EnvironmentFile=` in the unit, or bake env into Compose `env_file`. Validate with `systemctl show` and `docker compose config`.
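The wiring is small. A hedged sketch (paths and permissions are illustrative):

```ini
# Drop-in for compose@myapp.service, e.g. via `systemctl edit`
[Service]
# /srv/myapp/.env: mode 0600, root-owned, kept out of Git
EnvironmentFile=/srv/myapp/.env
```

Confirm the file is registered with `systemctl show compose@myapp.service -p EnvironmentFiles`, then check the rendered result with `docker compose config`.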
7) Symptom: Boot is slow because network-online waits forever
- Root cause: You added `network-online.target` by habit, but your stack doesn’t need it, or your network wait service is misconfigured.
- Fix: Remove the network-online dependency unless you need remote network resources at start. If you do need it, fix the wait-online service or your network manager configuration.
8) Symptom: Rootless Compose stack doesn’t start after reboot
- Root cause: User service isn’t started at boot because lingering isn’t enabled, or the unit was installed as a system unit when Docker is rootless.
- Fix: Install it as a user unit and run `loginctl enable-linger USER`. Confirm with `systemctl --user is-enabled` and check the user journal logs.
Three corporate-world mini-stories
Mini-story 1: The incident caused by a wrong assumption
A mid-sized company ran a customer-facing API on a single beefy VM. Nothing fancy: Compose stack with an API container, Redis, and Postgres. They used restart: always on everything and called it “high availability.” It worked for months, which is how you get confident for no reason.
During a routine kernel update, the VM rebooted. Docker came back. Containers came back. The API came back, technically. But it returned 500s for about ten minutes, and then it recovered by itself. The on-call saw it, shrugged, and moved on. “Transient.”
A week later, another reboot happened—this time during a busier period. The API’s migration logic ran at startup, assumed the database was reachable immediately, and failed hard. The container restart policy dutifully restarted it, quickly, over and over, hammering the logs and keeping the service down. Postgres was fine; it was just replaying WAL on a slower-than-usual disk after an unclean shutdown.
The wrong assumption was subtle: they believed “container running” implied “dependency ready.” They also believed Docker restart policy was enough to manage startup sequencing. Both beliefs are common. Both are wrong in a way that only shows up during boot or recovery.
The fix was boring and effective: systemd started the stack after mounts and Docker, healthchecks were added to Postgres and Redis, and the API was changed to retry DB connection with backoff before running migrations. The next reboot was uneventful, which is the best kind of story.
Mini-story 2: The optimization that backfired
A different shop wanted faster boot times. They had a dozen Compose projects on one host (internal tooling, dashboards, small services). Someone noticed that waiting on network-online.target added seconds. So they removed it from all units and declared victory.
Boot got faster. Then the weird failures started: a metrics container couldn’t resolve DNS during its initial startup and cached the failure. A license-checking service tried to reach an external endpoint once, failed, and disabled itself until manual restart. A reverse proxy started without being able to resolve upstream names and served default error pages.
In the postmortem, the team discovered something uncomfortable: those services were never robust to transient network absence. The network-online wait had been masking application fragility. Removing it made the fragility visible, and the “optimization” turned boot speed into service flakiness.
The eventual solution was nuanced. They reintroduced network-online only for stacks that truly needed it, and they fixed the worst offenders to retry network operations properly. Boot time improved a bit, reliability improved a lot, and the team learned that shaving seconds off boot is cheap until it isn’t.
Mini-story 3: The boring but correct practice that saved the day
A finance-adjacent company ran a small Compose stack for a reporting pipeline: scheduler, worker, and a database. The host used encrypted storage and a dedicated dataset for the DB data. Their systemd unit had one extra line compared to everyone else’s: RequiresMountsFor=/srv/reporting.
One morning, after an unattended reboot, the encryption unlock step took longer than usual because a dependency service retried key retrieval. The system was “up” but the dataset wasn’t mounted when Docker started. Without ordering, the DB container would have initialized a fresh database in an unmounted directory on the root filesystem.
But the unit didn’t start. systemd waited. Docker was ready, network was fine, but the mount wasn’t there, so the Compose service stayed queued. Once the dataset mounted, the stack started normally. No split-brain directories. No phantom fresh database. No restoration drama.
When they later audited the boot logs, the only artifact was a delayed start. That delay was a feature: it prevented silent data divergence. The best reliability work often looks like “nothing happened,” which is the only acceptable aesthetic in production.
Checklists / step-by-step plan
Step-by-step plan: migrate a Compose stack to systemd cleanly
- Normalize the stack location: put each project under a stable directory (example: `/srv/myapp`). Decide whether you want template units (`compose@.service`) or per-stack units.
- Make state explicit: ensure databases use named volumes or bind mounts to a path you control. Avoid anonymous volumes for anything you care about.
- Validate Compose config deterministically: run `docker compose config` from the intended directory. Fix warnings and missing variables.
- Write the unit:
  - `WorkingDirectory=` set to the stack directory.
  - `Requires=docker.service` and `After=docker.service`.
  - `RequiresMountsFor=` for any persistent data paths on dedicated mounts.
  - `ExecStart=docker compose up -d --remove-orphans`.
  - `ExecStop=docker compose stop` (or `down` if you really want that).
  - Realistic timeouts.
- Reload systemd: `systemctl daemon-reload`.
- Enable the unit: `systemctl enable --now compose@myapp.service`.
- Test reboot behavior without rebooting: stop Docker, start Docker, and confirm the unit behaves. Then do a real reboot in a maintenance window and watch journal logs.
- Instrument readiness: add healthchecks for critical dependencies; ensure apps retry on dependency failure. Don’t make systemd do application-level retries.
- Set logging policy: choose log driver and rotation. Confirm disk usage won’t explode.
- Document the operations contract: what “start,” “stop,” and “upgrade” mean for this stack, including data safety expectations.
Operational checklist: before you call it “reliable”
- Unit starts successfully on cold boot (not just warm reboot).
- Unit waits for required mounts; no data directories created on the wrong filesystem.
- Database containers have realistic stop grace periods; shutdown doesn’t SIGKILL them routinely.
- Logs for start failures are visible in `journalctl -u`, and container logs are retained/rotated.
- Removing a service from Compose doesn’t leave it running forever (use `--remove-orphans`).
- Disaster scenarios tested: Docker fails to start, disk is full, mount missing, network missing. The system fails loudly, not silently.
FAQ
1) Should I use Docker restart policies if I have systemd units?
Yes, but for different layers. systemd should manage “start the stack at boot” and “stop it at shutdown.” Docker restart policies handle container crashes during runtime. Keep them aligned and avoid double-supervision illusions.
2) Should the systemd service be Type=simple and run “docker compose up” without -d?
Usually no. Running without -d binds your service health to a long-running client process. It can work, but it’s fragile: logs, TTY behavior, and client crashes can confuse systemd. Type=oneshot + up -d is cleaner for most hosts.
3) Is “depends_on” enough to control startup order?
It controls start order, not readiness. If you need readiness, use healthchecks and retry logic. Treat readiness as an application responsibility, not an orchestration magic trick.
4) Should ExecStop use “docker compose down” or “stop”?
stop is safer for stateful stacks and faster for restart. down is fine when you want containers recreated and you’re confident persistent data is on named volumes/bind mounts that won’t disappear.
5) How do I ensure my stack doesn’t start before my ZFS datasets or encrypted volumes?
Use RequiresMountsFor= pointing at the path your stack needs (for example, /srv/myapp or /var/lib/myapp). That forces systemd to wait for the mount. Also make sure the mount itself is configured to appear before multi-user.
6) Why does the unit say “active (exited)”—isn’t that wrong?
It’s correct for a oneshot unit with RemainAfterExit=yes. The unit’s job is to run the start command. Docker then keeps containers running. If you want systemd to track a process, you need a different pattern.
7) How do I run multiple stacks cleanly?
Use a template unit (compose@.service) and a directory convention like /srv/<stack>. Each stack gets enabled independently: systemctl enable --now compose@stackname. This avoids one monolithic unit that tries to do everything and fails ambiguously.
8) What’s the clean way to handle secrets?
At minimum, keep secrets out of unit files and out of Git. Use EnvironmentFile= with appropriate permissions, or Compose secrets backed by files. If you already run a secret manager, integrate at container runtime. Don’t pretend systemd is a secret vault.
9) What about Podman Compose or quadlets?
Podman has first-class systemd integration via quadlets, and it’s a valid choice. But this piece is specifically about Docker Compose. If you’re on Podman, lean into its native patterns instead of emulating Docker.
10) How do I keep boot from pulling images and stalling forever?
Don’t pull at boot unless you mean to. Keep images pre-pulled via a separate update job or maintenance workflow. If you must pull at boot, increase TimeoutStartSec and accept the tradeoff.
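A pre-pull job can reuse the template convention. A sketch (unit names and the daily schedule are assumptions):

```ini
# /etc/systemd/system/compose-pull@.service
[Unit]
Description=Pre-pull images for Compose stack (%i)
Requires=docker.service
After=docker.service

[Service]
Type=oneshot
WorkingDirectory=/srv/%i
ExecStart=/usr/bin/docker compose pull --quiet
```

```ini
# /etc/systemd/system/compose-pull@.timer
[Unit]
Description=Nightly image pre-pull for Compose stack (%i)

[Timer]
OnCalendar=daily
RandomizedDelaySec=1h
Persistent=true

[Install]
WantedBy=timers.target
```

Enable per stack with `systemctl enable --now compose-pull@myapp.timer`; boot-time `up -d` then finds images already local.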
Conclusion: the next right steps
If you want Compose stacks to behave after reboot, stop treating boot as a superstition and start treating it as dependency management. systemd is excellent at ordering and lifecycle. Compose is excellent at declaring the stack. Together, they’re boring in the best way.
Next steps that pay off immediately:
- Write a template unit with `WorkingDirectory`, `After`/`Requires`, and `RequiresMountsFor`.
- Enable it per stack and verify with `journalctl -b` after a controlled reboot.
- Add healthchecks and retries where “running” is not the same as “ready.”
- Set stop timeouts that respect your stateful services.
Your future self will still get paged sometimes. But it’ll be for real problems, not because your containers beat your disks to the starting line.