You deploy a service. It fails. You fix the obvious thing. It fails again—faster. Then systemd throws the dreaded line: “Start request repeated too quickly” and stops even trying. Now you’re stuck: not only is the service down, your init system is politely refusing to let you keep face-planting into the same wall.
This isn’t systemd being “mysterious.” It’s doing exactly what it was designed to do: rate-limit flapping units so one bad service doesn’t turn your host into a log-spewing space heater. The trick is diagnosing the real failure, then applying fixes that survive upgrades, reboots, and future you at 2 a.m.
What “Start request repeated too quickly” actually means
Systemd is telling you: “I tried to start this unit several times in a short window. It kept failing. I’m done until you intervene.” The intervention might be a config fix, a dependency fix, or a deliberate change in the unit’s restart policy. Sometimes you also need to clear the failure state.
Under the hood, systemd enforces a start rate limit per unit. If the unit transitions into failure too many times within the configured interval, systemd hits the start limit and stops automatically starting it. The default behavior varies by version and distro policy, but the core mechanism is consistent: StartLimitIntervalSec + StartLimitBurst. Hit that combo and you get the message.
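As a sketch, both knobs live in the unit's [Unit] section. A drop-in like this (illustrative path and values, not universal defaults) caps the loop explicitly:

```ini
# /etc/systemd/system/myapp.service.d/limits.conf (illustrative path)
[Unit]
# Count failed starts over a 60-second window; after 3 of them,
# refuse further automatic starts until an operator intervenes.
StartLimitIntervalSec=60s
StartLimitBurst=3
```

Note that systemd unit files only treat `#` as a comment at the start of a line, so keep comments on their own lines.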
Important nuance: this is not only about Restart=. Even units that don’t restart themselves can hit the limit if something (a dependency, a timer, an admin repeatedly running systemctl start) triggers frequent attempts.
What you should do: treat the message as a symptom of a loop. The loop is the problem. The limiter is the seatbelt.
One quote, because it still holds: “Hope is not a strategy.” — traditional SRE saying
Fast diagnosis playbook (first/second/third)
If you want to find the bottleneck quickly, don’t start by editing unit files. Start by answering three questions: What failed? Why did it fail? What kept retrying?
First: confirm the limiter is what stopped the retries
- Look for start-limit-hit results and the actual exit code of the service process.
- Confirm whether the service is exiting immediately (bad binary, bad config) or timing out (hang, dependency unavailable).
Second: find the first real error in the logs
- Use journalctl -u with a tight time window.
- Look for the earliest failure in the burst, not the last one. The last one is often just systemd giving up.
Third: map the loop trigger
- Is the unit set to Restart=always?
- Is it a dependency chain where a parent keeps trying?
- Is a timer or path unit kicking it?
- Is something external repeatedly running systemctl start (or a config management agent)?
The fastest wins usually come from: fixing the real underlying error, then adjusting restart semantics so a failure doesn’t become a denial-of-service against your own host.
Practical tasks: commands, outputs, decisions (12+)
Below are field-tested tasks. Each one includes: a command, what realistic output looks like, what it means, and the decision you should make next.
Task 1: Check unit state and confirm start-limit-hit
cr0x@server:~$ systemctl status myapp.service
× myapp.service - MyApp API
Loaded: loaded (/lib/systemd/system/myapp.service; enabled; preset: enabled)
Drop-In: /etc/systemd/system/myapp.service.d
└─override.conf
Active: failed (Result: start-limit-hit) since Mon 2025-12-29 09:12:11 UTC; 2min 3s ago
Duration: 220ms
Process: 18934 ExecStart=/usr/local/bin/myapp --config /etc/myapp/config.yml (code=exited, status=1/FAILURE)
Main PID: 18934 (code=exited, status=1/FAILURE)
Dec 29 09:12:11 server systemd[1]: myapp.service: Scheduled restart job, restart counter is at 5.
Dec 29 09:12:11 server systemd[1]: myapp.service: Start request repeated too quickly.
Dec 29 09:12:11 server systemd[1]: myapp.service: Failed with result 'start-limit-hit'.
Dec 29 09:12:11 server systemd[1]: Failed to start myapp.service - MyApp API.
Meaning: systemd rate-limited the unit. The service itself is exiting with status 1. The restart loop hit 5 attempts (typical burst threshold).
Decision: stop tweaking restart limits first. You have a real app failure. Go to logs for the first error.
Task 2: Pull logs around the failure window (and stop reading the last line only)
cr0x@server:~$ journalctl -u myapp.service -b --since "10 minutes ago" --no-pager
Dec 29 09:12:10 server myapp[18934]: ERROR: cannot read config file: /etc/myapp/config.yml: permission denied
Dec 29 09:12:10 server systemd[1]: myapp.service: Main process exited, code=exited, status=1/FAILURE
Dec 29 09:12:10 server systemd[1]: myapp.service: Failed with result 'exit-code'.
Dec 29 09:12:11 server systemd[1]: myapp.service: Scheduled restart job, restart counter is at 1.
Dec 29 09:12:11 server systemd[1]: Started myapp.service - MyApp API.
Dec 29 09:12:11 server myapp[18938]: ERROR: cannot read config file: /etc/myapp/config.yml: permission denied
Dec 29 09:12:11 server systemd[1]: myapp.service: Main process exited, code=exited, status=1/FAILURE
Dec 29 09:12:11 server systemd[1]: myapp.service: Start request repeated too quickly.
Meaning: The real failure is a permissions problem on the config file. The limiter is downstream.
Decision: fix file ownership/permissions or the unit’s User/Group. Don’t raise the burst limit to “make it work.”
Task 3: Inspect the effective unit configuration (what systemd really runs)
cr0x@server:~$ systemctl cat myapp.service
# /lib/systemd/system/myapp.service
[Unit]
Description=MyApp API
After=network-online.target
Wants=network-online.target
[Service]
User=myapp
Group=myapp
ExecStart=/usr/local/bin/myapp --config /etc/myapp/config.yml
Restart=on-failure
RestartSec=1
[Install]
WantedBy=multi-user.target
# /etc/systemd/system/myapp.service.d/override.conf
[Service]
RestartSec=250ms
Meaning: There’s a drop-in override setting RestartSec=250ms. That makes flapping services hit the limiter fast.
Decision: keep restart delay sane (2–10 seconds) unless you have a very good reason. Your logs and disks will thank you.
Task 4: Identify start limit settings (defaults vs overridden)
cr0x@server:~$ systemctl show myapp.service -p StartLimitIntervalSec -p StartLimitBurst -p Restart -p RestartUSec
StartLimitIntervalSec=10s
StartLimitBurst=5
Restart=on-failure
RestartUSec=250ms
Meaning: With an interval of 10 seconds and burst of 5, a 250ms restart delay can trip the limit almost immediately.
Decision: fix underlying error; then set RestartSec=2s or higher unless the service is safe to restart aggressively.
Task 5: Reset the failed state (only after you made a change)
cr0x@server:~$ systemctl reset-failed myapp.service
cr0x@server:~$ systemctl start myapp.service
cr0x@server:~$ systemctl status myapp.service
● myapp.service - MyApp API
Loaded: loaded (/lib/systemd/system/myapp.service; enabled; preset: enabled)
Drop-In: /etc/systemd/system/myapp.service.d
└─override.conf
Active: active (running) since Mon 2025-12-29 09:16:42 UTC; 3s ago
Main PID: 19501 (myapp)
Tasks: 8
Memory: 34.2M
CPU: 210ms
CGroup: /system.slice/myapp.service
└─19501 /usr/local/bin/myapp --config /etc/myapp/config.yml
Meaning: The rate limit had blocked starts; reset-failed cleared it. The service runs now.
Decision: if it still fails after reset, you have not fixed the root cause. Go back to logs.
Task 6: Confirm the service user can read what it needs
cr0x@server:~$ sudo -u myapp test -r /etc/myapp/config.yml && echo OK || echo NO
NO
cr0x@server:~$ ls -l /etc/myapp/config.yml
-rw------- 1 root root 912 Dec 29 08:58 /etc/myapp/config.yml
Meaning: Config is root-only. If the service runs as User=myapp, it can’t read it.
Decision: change ownership/group, adjust permissions, or run the service under a different user (last resort).
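One way the ownership fix can look, sketched safely: the commands below rehearse the mode change on a throwaway file, while the comments show the equivalent production commands against the config path from the examples above.

```shell
# Rehearse the fix on a throwaway file; in production the same two
# commands would target /etc/myapp/config.yml with the service's group:
#   chgrp myapp /etc/myapp/config.yml
#   chmod 640 /etc/myapp/config.yml
CONF="$(mktemp)"
chmod 600 "$CONF"                 # simulate the root-only state
chmod 640 "$CONF"                 # owner rw, group r, world nothing
MODE="$(stat -c '%a' "$CONF")"
echo "mode is now $MODE"
rm -f "$CONF"
```

Mode 640 with the service's group is usually the right balance for a secret-bearing config: root still owns it, the service can read it, and nobody else can.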
Task 7: Spot dependency chain failures (the “it’s not my unit” problem)
cr0x@server:~$ systemctl list-dependencies --reverse myapp.service
myapp.service
● myapp.target
● multi-user.target
Meaning: Not much is pulling it in; it’s likely enabled directly.
Decision: if the reverse deps list shows a bigger chain (like a target, or another service), you may need to fix the parent unit too.
Task 8: Identify whether a timer or path unit is repeatedly triggering the service
cr0x@server:~$ systemctl list-timers --all | grep myapp
Mon 2025-12-29 09:20:00 UTC 2min 10s left Mon 2025-12-29 09:10:00 UTC 7min ago myapp-refresh.timer myapp-refresh.service
cr0x@server:~$ systemctl status myapp-refresh.timer
● myapp-refresh.timer - MyApp refresh timer
Loaded: loaded (/lib/systemd/system/myapp-refresh.timer; enabled; preset: enabled)
Active: active (waiting) since Mon 2025-12-29 08:10:00 UTC; 1h 10min ago
Trigger: Mon 2025-12-29 09:20:00 UTC; 2min 10s left
Meaning: Another unit can be calling into your service (or a related one) regularly. A failing refresh job can look like “myapp is flapping” if it shares resources.
Decision: if the timer is too aggressive or broken, fix the timer unit or disable it until stabilized.
Task 9: Check exit codes and signal reasons across restarts
cr0x@server:~$ systemctl show myapp.service -p ExecMainStatus -p ExecMainCode -p ExecMainStartTimestamp -p ExecMainExitTimestamp
ExecMainStatus=1
ExecMainCode=exited
ExecMainStartTimestamp=Mon 2025-12-29 09:12:10 UTC
ExecMainExitTimestamp=Mon 2025-12-29 09:12:10 UTC
Meaning: It exits immediately. That’s usually config, permissions, missing files, missing libs, or “can’t bind port.” It’s not a slow dependency.
Decision: prioritize configuration validation and environment issues; don’t waste time increasing TimeoutStartSec.
Task 10: Verify ports and socket activation assumptions
cr0x@server:~$ ss -lntp | grep ':8080'
LISTEN 0 4096 0.0.0.0:8080 0.0.0.0:* users:(("nginx",pid=1201,fd=6))
Meaning: Something else owns port 8080 (nginx here). If myapp expects to bind it, it will crash-loop.
Decision: change myapp’s listen port, adjust nginx proxying, or use systemd socket activation properly (don’t half-do it).
Task 11: Validate unit file syntax and hidden typos
cr0x@server:~$ systemd-analyze verify /etc/systemd/system/myapp.service
/etc/systemd/system/myapp.service:12: Unknown lvalue 'RestartSecs' in section 'Service'
Meaning: Systemd ignored a misspelled directive. You thought you had a restart delay; you don’t.
Decision: fix the typo, run daemon-reload, and re-check systemctl show for effective values.
Task 12: Inspect environment seen by the service (PATH, vars, working dir)
cr0x@server:~$ systemctl show myapp.service -p Environment -p EnvironmentFile -p WorkingDirectory -p User -p Group
Environment=
EnvironmentFile=
WorkingDirectory=/
User=myapp
Group=myapp
Meaning: No environment is set and the working directory is root. If your app expects relative paths, it may fail instantly.
Decision: set WorkingDirectory= and explicit Environment= or EnvironmentFile=. Avoid relying on interactive shell defaults.
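A minimal drop-in that pins those values down might look like this (the paths, variable name, and env file are illustrative, not part of the original unit):

```ini
# /etc/systemd/system/myapp.service.d/env.conf (illustrative)
[Service]
WorkingDirectory=/var/lib/myapp
Environment=MYAPP_LOG_LEVEL=info
# The leading "-" makes the file optional: no error if it is absent.
EnvironmentFile=-/etc/default/myapp
```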
Task 13: Find who changed what (drop-ins, vendor units, overrides)
cr0x@server:~$ systemctl show myapp.service -p FragmentPath -p DropInPaths
FragmentPath=/lib/systemd/system/myapp.service
DropInPaths=/etc/systemd/system/myapp.service.d/override.conf
Meaning: The vendor unit is in /lib, and your local override is in /etc. That’s the correct pattern on Debian.
Decision: never edit /lib/systemd/system/*.service directly. Put changes in drop-ins so they persist across package upgrades.
Task 14: Track down “it works manually but not as a service”
cr0x@server:~$ sudo -u myapp /usr/local/bin/myapp --config /etc/myapp/config.yml
ERROR: cannot open database socket /run/postgresql/.s.PGSQL.5432: no such file or directory
Meaning: When run as the service user, it can’t reach a dependency (PostgreSQL socket). Maybe postgres isn’t running, or it listens elsewhere, or permissions block it.
Decision: fix dependency readiness and configuration; then adjust unit ordering (After=) only if necessary.
Task 15: Visualize boot ordering and critical chain (slow starts and timeouts)
cr0x@server:~$ systemd-analyze critical-chain myapp.service
myapp.service +4.212s
└─network-online.target +4.198s
└─systemd-networkd-wait-online.service +4.150s
└─systemd-networkd.service +1.021s
└─systemd-udevd.service +452ms
Meaning: Your unit is gated by network-online.target, which waits for network. That can be fine—or a boot delay trap.
Decision: if your service doesn’t truly require “online,” switch to After=network.target and remove the wait-online dependency. Fewer boot-time surprises.
Systemd fixes that actually stick (and why)
“Fixes that stick” have two traits: they survive upgrades, and they reflect how the service behaves in real life. Most flapping services are either mis-specified (wrong unit semantics) or broken at runtime (config, permissions, dependencies). The correct response is usually a small override drop-in plus a real fix in the application environment.
Use drop-ins, not edits in /lib
On Debian, package-managed unit files live under /lib/systemd/system. Local changes belong under /etc/systemd/system. If you edit files under /lib, a package upgrade will eventually “fix” your fix.
cr0x@server:~$ sudo systemctl edit myapp.service
# (editor opens)
Add a drop-in like this:
cr0x@server:~$ cat /etc/systemd/system/myapp.service.d/override.conf
[Service]
Restart=on-failure
RestartSec=5s
TimeoutStartSec=30s
[Unit]
StartLimitIntervalSec=60s
StartLimitBurst=3
Why this sticks: it’s in /etc, so upgrades won’t overwrite it. It also makes your service less likely to thrash the host: 5 seconds between retries, and only 3 retries per minute before systemd stops and forces human attention.
What to avoid: setting StartLimitBurst=1000 or RestartSec=100ms because you “want it to recover fast.” That’s not recovery; that’s a fork bomb with better branding.
Make restart logic match the failure modes
Most services don’t need Restart=always. Use it only for daemons that must be present and are safe to restart regardless of exit reason. Prefer:
- Restart=on-failure for typical servers that can exit intentionally during upgrades or config reloads.
- Restart=no for one-shot tasks that should fail loudly and stop.
- Restart=on-abnormal if you specifically want restarts on signals or dumps, not on clean non-zero exits (useful in some patterns).
Fix Type= and readiness, or systemd will “help” you into a loop
Another classic: the service is actually fine, but systemd thinks it never started. That happens when Type= is wrong.
- Type=simple (default): process starts and stays in foreground. Most apps fit here.
- Type=forking: legacy daemons that fork to background. If you use this for a foreground app, systemd can misread it and kill/restart it.
- Type=notify: app must call sd_notify. If it doesn’t, systemd may time out and restart it.
- Type=oneshot: tasks that run and exit; combine with RemainAfterExit=yes when appropriate.
Set the type correctly. If you don’t control the app and it doesn’t support sd_notify, don’t pretend it does.
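For the oneshot case, a minimal sketch looks like this (the migration binary is hypothetical):

```ini
[Service]
Type=oneshot
# Keep the unit "active" after the process exits successfully,
# so units ordered After= it know the work is done.
RemainAfterExit=yes
# Hypothetical one-shot task:
ExecStart=/usr/local/bin/myapp-migrate
```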
Use ExecStartPre for validation, but don’t weaponize it
ExecStartPre= is a good place to validate config before burning a start attempt. Done right, it prevents flapping. Done wrong, it creates flapping.
Example: validate a config file exists and is readable by the service user:
cr0x@server:~$ cat /etc/systemd/system/myapp.service.d/validate.conf
[Service]
ExecStartPre=/usr/bin/test -r /etc/myapp/config.yml
If that test fails, the unit fails fast with a clear reason. Your restart policy should be conservative here. You don’t want the host hammering the same missing file 20 times a second.
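When a bare test isn't enough, a tiny validator script works well: install it somewhere like /usr/local/bin/myapp-validate (hypothetical) and call it from ExecStartPre=. The sketch below defines the check as a function and demos it against a temp file so it is self-contained.

```shell
# Sketch of a pre-start validator: fail once, with a clear message,
# instead of letting the main process crash-loop.
validate_config() {
    conf="$1"
    [ -r "$conf" ] || { echo "validate: cannot read $conf" >&2; return 1; }
    [ -s "$conf" ] || { echo "validate: $conf is empty" >&2; return 1; }
    echo "validate: $conf looks sane"
}

# Self-contained demo against a temp file:
TMP="$(mktemp)"
echo "listen_port: 8080" > "$TMP"
validate_config "$TMP" && OK=yes || OK=no
rm -f "$TMP"
```

Keep the validator cheap and side-effect free; its only job is to turn "mystery restart loop" into one clear journal line.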
Make dependencies explicit, but don’t over-serialize boot
Adding After=postgresql.service can help if your app truly cannot start until the DB is up. But too many teams cargo-cult “wait for network online” and “wait for everything.” Then a slow DHCP server takes down half of boot.
A practical stance:
- Use Wants= for soft dependencies you’d like but can degrade without.
- Use Requires= for hard dependencies. But know that if the dependency fails, your unit fails too.
- Prefer your app doing its own retries for upstream services (DB, APIs) while systemd keeps it running, rather than systemd restarting it constantly.
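That last preference, app-level retries while the process stays up, can be sketched as a shell wrapper. Here flaky_probe is a stand-in for a real connectivity check (DB ping, HTTP health endpoint); the function and its parameters are illustrative, not from any real unit.

```shell
# Retry with exponential backoff: the process stays alive and re-probes
# its upstream instead of exiting and letting systemd's restart loop
# do the "retrying".
retry_with_backoff() {
    max="$1"; delay="$2"; shift 2
    n=1
    while ! "$@"; do
        [ "$n" -ge "$max" ] && return 1
        sleep "$delay"
        delay=$((delay * 2))    # double the wait between probes
        n=$((n + 1))
    done
}

# Demo: a probe that succeeds on its third call (zero delay for speed).
COUNT=0
flaky_probe() { COUNT=$((COUNT + 1)); [ "$COUNT" -ge 3 ]; }
retry_with_backoff 5 0 flaky_probe && RESULT=ok || RESULT=fail
```

The point is architectural: retries inside the process are invisible to the start limiter, so a slow upstream never turns into start-limit-hit.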
Joke #1: Restart loops are like corporate all-hands meetings—lots of activity, no progress, and everyone leaves more tired.
Know when to adjust StartLimit* (and when not to)
There are legitimate cases to change start limits:
- Services that may fail briefly during upstream maintenance and can safely retry over minutes.
- Agents that connect to remote endpoints and occasionally race at boot.
But changing start limits is not a fix for “permission denied” or “config invalid.” In those cases, increasing retries just turns an error into churn.
Use systemd-run for safe reproduction
If you need to reproduce the service environment without editing the actual unit, systemd-run can create transient units with similar constraints. It’s a good way to test assumptions about user, working directory, and environment.
cr0x@server:~$ sudo systemd-run --unit=myapp-test --property=User=myapp --property=WorkingDirectory=/var/lib/myapp /usr/local/bin/myapp --config /etc/myapp/config.yml
Running as unit: myapp-test.service
cr0x@server:~$ systemctl status myapp-test.service
● myapp-test.service - /usr/local/bin/myapp --config /etc/myapp/config.yml
Loaded: loaded (/run/systemd/transient/myapp-test.service; transient)
Active: failed (Result: exit-code) since Mon 2025-12-29 09:22:06 UTC; 1s ago
Process: 20411 ExecStart=/usr/local/bin/myapp --config /etc/myapp/config.yml (code=exited, status=1/FAILURE)
Decision: if it fails the same way, it’s not “systemd magic.” It’s the runtime environment.
Typical failure modes behind the message
Here’s what I see most often in Debian fleets. The limiter message is just the bouncer. These are the drunk guests.
1) Permission and ownership mismatches
Service runs as a non-root user. Config files, sockets, or state directories are root-owned. The binary exits immediately with a useful error, but the unit restarts too fast, hits start limits, and now you’re staring at the wrong line.
2) Wrong service Type
Foreground program labeled as forking, or notify without notification support. Systemd interprets it as not started, times out, kills it, restarts, repeats.
3) Fast crash on missing dependency
Example: database socket missing at boot. The app exits quickly; systemd restarts; repeat. Fix either ordering or make the app tolerate dependency unavailability.
4) Bind failures
Port already in use, or privileged port without capability, or wrong listen address. These fail instantly and loop aggressively if restart delay is small.
5) ExecStart points to a wrapper script with a bad shebang
/bin/bash might not exist in minimal containers. Or the script uses set -e and bails on a missing env var. Systemd will happily restart it until it refuses.
6) Timeouts and slow startups
The process starts but doesn’t become “ready” before TimeoutStartSec. Often paired with Type=notify misconfiguration.
7) Config management fighting you
You “fix” the override manually, and 5 minutes later a management agent reverts it. The unit flaps again and hits rate limits. Congratulations, you’ve discovered the invisible hand of compliance.
Common mistakes: symptom → root cause → fix
This section is meant to change decisions. If you recognize a symptom, skip the self-blame and go straight to the actual fix.
Symptom: “Start request repeated too quickly” after you set Restart=always
Root cause: You masked a real failure with aggressive restart. The app is exiting for a reason—config, permission, dependency, crash.
Fix: switch to Restart=on-failure, set RestartSec=5s, then locate the first error in journalctl -u. Only revisit rate limits after the service is stable.
Symptom: Service starts manually but fails under systemd
Root cause: different environment: working directory, PATH, environment variables, user permissions, or missing interactive shell setup.
Fix: inspect systemctl show -p WorkingDirectory -p Environment, set them explicitly, and test with sudo -u.
Symptom: Unit “times out” then hits start limit
Root cause: wrong Type= (notify without notify) or slow dependency readiness; systemd kills it after TimeoutStartSec.
Fix: set Type=simple unless you know better, or implement proper notify; tune TimeoutStartSec only after confirming it’s truly slow and not stuck.
Symptom: “Failed to start” with exit-code, but no useful app logs
Root cause: stdout/stderr not reaching journal (custom logging), or the process fails before its logger initializes; sometimes ExecStart points to the wrong file.
Fix: confirm ExecStart exists and is executable; add temporary StandardOutput=journal and StandardError=journal in a drop-in; verify with systemctl cat.
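As a sketch, the temporary journal drop-in can be as small as this; remove it once the mystery is solved:

```ini
# Temporary debugging drop-in (illustrative):
[Service]
StandardOutput=journal
StandardError=journal
```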
Symptom: Fix works until reboot, then flaps again
Root cause: state directories in /run not created, or tmpfiles missing; or ordering depends on boot timing.
Fix: use RuntimeDirectory= or StateDirectory= in the unit; ensure permissions via systemd rather than ad-hoc scripts.
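A sketch of what that looks like in a drop-in (directory names are illustrative; systemd creates them owned by the unit's User/Group):

```ini
[Service]
# RuntimeDirectory -> /run/myapp, recreated on every start
# StateDirectory   -> /var/lib/myapp, persists across reboots
RuntimeDirectory=myapp
StateDirectory=myapp
RuntimeDirectoryMode=0750
```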
Symptom: Start limit hit only during deployments
Root cause: deploy scripts repeatedly restarting while replacing binaries/config, causing transient failures in tight loops.
Fix: coordinate deploy steps: stop unit, replace artifacts, validate config, then start once. Consider ExecReload where applicable.
Symptom: Unit shows “start-limit-hit” even after you fixed the app
Root cause: systemd remembers the failure state.
Fix: run systemctl reset-failed myapp.service, then start again. If it hits again, you’re not fixed.
Three corporate mini-stories from the trenches
Mini-story 1: The incident caused by a wrong assumption
They had a Debian-based API service behind a load balancer. After a routine maintenance reboot, half the pool came back “unhealthy.” The dashboards were loud, but not helpful: the hosts were up, the network looked fine, and systemd was spitting “Start request repeated too quickly” like it was bored of the whole ordeal.
The on-call assumption was classic: “network-online.target is flaky again.” So they did what people do under stress—they made the waiting longer. Someone bumped TimeoutStartSec to a few minutes and added more After= lines, serializing boot behind extra targets.
The real cause was dull: the service was running as User=api, and a package post-install step had rewritten the config file with root-only permissions. The app exited instantly with “permission denied,” then systemd did its restart loop until start limiting kicked in. Nothing to do with the network. Everything to do with file modes.
Once they checked journalctl -u for the first error and ran sudo -u api test -r, it was obvious. They fixed ownership and added a hard validation step in CI to assert file permissions. The “systemd problem” vanished, because it was never a systemd problem.
They also removed the extra boot dependencies they’d added in panic. That cleanup mattered: it reduced boot time and prevented unrelated outages later. The lesson wasn’t “don’t assume.” That’s too generic. The lesson was: assume boring things first—permissions, paths, ports—before you build a mythology around targets and ordering.
Mini-story 2: The optimization that backfired
A platform team wanted faster recovery from transient failures. Their reasoning sounded tidy: if a service dies, restart it immediately. So they set Restart=always and tuned RestartSec=200ms across a fleet of internal microservices. They also increased StartLimitBurst because “we don’t want systemd to give up.”
For a couple of weeks it looked fine. Then a dependency service started returning malformed responses due to a bad rollout. One client service segfaulted when parsing them. With the new policy, it didn’t just crash—it machine-gunned restarts. Every restart reloaded configuration, opened connections, wrote logs, and churned CPU caches. The hosts didn’t die instantly, but latency shot up. Not because the service was down, but because it was flapping hard enough to starve its neighbors.
They had effectively replaced “a crash” with “a sustained denial-of-service against their own nodes.” Worse, the increased StartLimitBurst meant systemd kept participating in the chaos for longer. The blast radius expanded from one broken client to a noisy cluster.
The fix was not heroic. They rolled back the aggressive restart settings, restored conservative start limits, and added application-level backoff when upstream data looked wrong. Then they used Restart=on-failure plus RestartSec=5s for most services, reserving fast restarts for a few truly stateless and well-behaved daemons.
They kept one “optimization,” though: a short ExecStartPre config validation so a bad config fails once and stays failed. That’s the kind of fast failure you want—fast to detect, not fast to repeat.
Mini-story 3: The boring but correct practice that saved the day
A storage-adjacent daemon was responsible for mounting and checking encrypted volumes before dependent services started. It wasn’t glamorous. But it was foundational. The team had implemented it with three unsexy habits: unit drop-ins in /etc, explicit StateDirectory= and RuntimeDirectory=, and a short health-check that logged clearly to the journal.
One morning, after an OS upgrade, a subset of hosts began failing the volume prep step. The dependent services all showed “Start request repeated too quickly,” because their required mount never appeared. The difference was that the prep unit’s logs were clean and specific: a missing kernel module on that subset of hosts.
Because dependencies were modeled with Requires= and clear ordering, the failure mode was controlled. Instead of every dependent service flapping, they failed quickly and stayed down. That sounds bad until you’ve seen the alternative: a thundering herd of restarts that buries the root cause under noise.
The fix was straightforward: install the missing module package and rebuild initramfs on the affected nodes. The recovery was equally straightforward: systemctl reset-failed on the targets and start them. No weirdness, no mystery edits in vendor units, no “temporary” symlinks living in /lib.
Joke #2: The best SRE trick is making outages boring—because excitement is just downtime with better marketing.
Checklists / step-by-step plan
Step-by-step: stabilize a flapping unit without hiding the problem
- Freeze the loop: if it’s hammering the node, stop it.
cr0x@server:~$ sudo systemctl stop myapp.service
Decision: if stopping it improves system load immediately, you were in a restart storm. Keep it stopped while debugging.
- Get the real error: read logs for the first failure.
cr0x@server:~$ journalctl -u myapp.service -b --since "30 minutes ago" --no-pager | head -n 60
Dec 29 09:12:10 server myapp[18934]: ERROR: cannot read config file: /etc/myapp/config.yml: permission denied
Dec 29 09:12:10 server systemd[1]: myapp.service: Main process exited, code=exited, status=1/FAILURE
...
Decision: if you see a clear app error, fix that first. If logs are empty, validate ExecStart and enable journal output temporarily.
- Confirm effective unit config:
cr0x@server:~$ systemctl show myapp.service -p FragmentPath -p DropInPaths -p User -p Group -p ExecStart -p Type -p Restart -p RestartUSec
FragmentPath=/lib/systemd/system/myapp.service
DropInPaths=/etc/systemd/system/myapp.service.d/override.conf
User=myapp
Group=myapp
ExecStart={ path=/usr/local/bin/myapp ; argv[]=/usr/local/bin/myapp --config /etc/myapp/config.yml ; ... }
Type=simple
Restart=on-failure
RestartUSec=250ms
Decision: if RestartSec is too small, fix it in an override. If Type looks wrong, fix it. If User/Group doesn’t match file ownership, fix permissions or the unit.
- Apply a durable override via drop-in:
cr0x@server:~$ sudo systemctl edit myapp.service
Use settings like RestartSec=5s and conservative start limits while you stabilize.
- Reload systemd and reset failure:
cr0x@server:~$ sudo systemctl daemon-reload
cr0x@server:~$ sudo systemctl reset-failed myapp.service
Decision: if you forget daemon-reload, your edits may not apply. If you forget reset-failed, systemd may still refuse to start.
- Start once, observe:
cr0x@server:~$ sudo systemctl start myapp.service
cr0x@server:~$ systemctl status myapp.service --no-pager
● myapp.service - MyApp API
Active: active (running) since Mon 2025-12-29 09:30:01 UTC; 2s ago
Decision: if it fails again, don’t keep restarting manually. Go back to logs; iterate deliberately.
Checklist: “Does this unit deserve restart at all?”
- If it’s a batch job: use Type=oneshot, consider Restart=no, and fail loudly.
- If it’s a daemon that should be present: Restart=on-failure is usually correct.
- If it exits cleanly as part of normal operation: avoid Restart=always or you’ll create a loop by design.
- If it relies on network or upstream services: favor application-level retry/backoff while the process stays running.
Checklist: safe defaults for most internal services
- Restart=on-failure
- RestartSec=5s (2s for very lightweight stateless daemons; 10s for heavyweight ones)
- StartLimitIntervalSec=60s
- StartLimitBurst=3 (5 is fine too; pick something that forces human attention)
- Explicit User= and Group=
- WorkingDirectory= if the app uses relative paths
- RuntimeDirectory= / StateDirectory= for directories the service needs
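Rolled into one drop-in, those defaults look like this (names and paths are illustrative):

```ini
# /etc/systemd/system/myapp.service.d/override.conf (illustrative)
[Service]
Restart=on-failure
RestartSec=5s
User=myapp
Group=myapp
WorkingDirectory=/var/lib/myapp
StateDirectory=myapp

[Unit]
StartLimitIntervalSec=60s
StartLimitBurst=3
```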
Interesting facts and history (systemd rate limiting edition)
- Systemd didn’t invent restart loops; it just made them easier to express with Restart= and more visible with structured unit state.
- The “start limit” is per unit, not global. One flapping service can be contained without punishing others—unless it starves the box.
- Drop-in directories (/etc/systemd/system/UNIT.d/*.conf) exist specifically so package upgrades don’t overwrite local operational intent.
- Debian’s split between /lib and /etc is a deliberate policy choice: vendor files in /lib, admin changes in /etc.
- systemd’s journal was designed to capture structured metadata (unit name, PID, cgroup) so you can answer “what happened?” without grep archaeology.
- Start limiting is a safety feature that prevents log floods and CPU churn; it’s basically a circuit breaker for process supervision.
- Type=notify came from a desire to replace “sleep 5 and hope” readiness with explicit signaling—great when used correctly, punishing when faked.
- Network-online.target is frequently misunderstood: it’s not “network exists,” it’s “a component declares the network is configured,” which can be slow or incorrect depending on the stack.
- Resetting failed units is an explicit operator action because systemd treats repeated failure as a meaningful state, not a transient glitch to ignore.
FAQ
1) Does “Start request repeated too quickly” mean systemd is broken?
No. It means your service failed repeatedly and systemd applied its configured rate limit. The service is broken (or mis-specified), and systemd is preventing infinite churn.
2) How do I clear the start limit so I can try again?
After you’ve made a real fix, run:
cr0x@server:~$ sudo systemctl reset-failed myapp.service
cr0x@server:~$ sudo systemctl start myapp.service
If it immediately hits the limit again, you didn’t fix the root cause.
3) Should I increase StartLimitBurst to make it more “resilient”?
Usually no. Burst increases hide real failures and increase host churn. Prefer fixing the real error and using a sensible RestartSec. If you increase burst, do it modestly and intentionally.
4) What’s the difference between RestartSec and StartLimitIntervalSec?
RestartSec is the delay between restart attempts. StartLimitIntervalSec is the window over which systemd counts failed starts, and StartLimitBurst is the max attempts allowed in that window.
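A back-of-envelope check with the example numbers from Task 4 shows how the settings interact (the figures are from this article's examples, not universal defaults):

```shell
# Assumed numbers: the app crashes after ~220ms, RestartSec is 250ms,
# burst is 5, window is 10 seconds.
RUN_MS=220; RESTART_MS=250; BURST=5; WINDOW_MS=10000

# Five failed runs plus four restart delays between them:
TOTAL_MS=$(( BURST * RUN_MS + (BURST - 1) * RESTART_MS ))
echo "burst of $BURST exhausted in ${TOTAL_MS}ms (window: ${WINDOW_MS}ms)"

if [ "$TOTAL_MS" -lt "$WINDOW_MS" ]; then
    echo "well inside the window: the limiter trips almost immediately"
fi
```

With those numbers the whole burst is burned in about two seconds, so the 10-second window barely matters: the unit hits the limit on the first storm.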
5) Why does it fail only on boot but works when I start it later?
Boot timing exposes dependency problems: missing mounts, unavailable network, DB not ready, runtime directories not created. Model the dependency (or make the app tolerate it) and avoid overusing network-online.target.
6) I edited the unit but nothing changed. Why?
Common causes: you edited the vendor unit under /lib and it got replaced, or you forgot systemctl daemon-reload, or your setting was misspelled and ignored. Run systemd-analyze verify and systemctl show to confirm effective values.
7) Is it okay to set Restart=always for critical services?
Sometimes, yes. But only if you understand the service’s exit behavior and it’s safe to restart unconditionally. Many services exit intentionally during upgrades or config changes; Restart=always can fight those workflows.
8) How can I tell if a timer or something else keeps triggering starts?
Check timers and reverse dependencies:
cr0x@server:~$ systemctl list-timers --all
cr0x@server:~$ systemctl list-dependencies --reverse myapp.service
If a timer triggers a job every minute, you may be seeing repeated start attempts that aren’t coming from Restart=.
9) When should I change TimeoutStartSec?
Only when you’ve confirmed the service is legitimately slow to become ready. If the app exits immediately, increasing the timeout does nothing. If Type=notify is wrong, you’re fixing the wrong layer.
10) Can I make systemd log more about why it stopped trying?
Systemd already logs the start-limit event. The missing piece is usually the app’s own error output. Ensure it writes to stderr/stdout or configure logging so journalctl -u captures the first failure reason.
Conclusion: next steps that keep you out of the ditch
“Start request repeated too quickly” is a kindness. It’s systemd telling you the service is failing in a tight loop and it’s not going to help you burn the host down. Treat it as a checkpoint, not the root cause.
Next steps that actually work in Debian 13 environments:
- Find the first real error with journalctl -u UNIT and stop reading only the last line.
- Inspect effective config with systemctl show and systemctl cat. Trust what systemd runs, not what you think you wrote.
- Fix the runtime problem (permissions, ports, dependencies, working directory) before touching limits.
- Apply durable changes with drop-ins under /etc/systemd/system/UNIT.d/, then daemon-reload.
- Set sane restart semantics: Restart=on-failure, reasonable RestartSec, conservative StartLimit*.
- Reset failure state only after changes: systemctl reset-failed.
If you do just one thing: stop treating systemd like a slot machine. Pull the lever less. Read the logs more. The service will either start—or it will fail for a reason you can fix.