At 03:12, production fell over. You did what every sane person does: you reached for logs. And the logs did what logs love to do under stress: they got quiet, rotated away, or never made it off the box.
Ubuntu 24.04 gives you two logging realities living side-by-side: systemd-journald (the journal) and rsyslog (classic syslog). The choice isn’t “modern vs legacy.” It’s “what failure modes can I tolerate,” “how do I prove I didn’t lose events,” and “how fast can I answer an incident commander without guessing.”
The decision: what you should run and why
If you run Ubuntu 24.04 in production and you care about not losing important events, do this:
- Keep journald. It’s not optional on systemd systems, and it’s your best first responder view.
- Make the journal persistent on anything you’ll ever debug after a reboot.
- Use rsyslog for durable, controllable forwarding to a central log platform (SIEM, ELK/OpenSearch, Splunk, whatever your org calls “the truth”).
- Don’t use “forward everything twice” as a strategy. Duplicates are not redundancy; they’re noise that makes you miss the one line you needed.
In other words: journald for local capture, indexing, and structured metadata; rsyslog for syslog ecosystem compatibility, queueing, and deliberate forwarding rules. You can forward from journald to rsyslog, or have services log to syslog directly. The right answer depends on what you need to prove during an incident or audit.
Dry truth: you don’t choose logging by vibes. You choose it by failure mode. Ask, “What happens when disk is full? When network drops? When the box reboots? When time jumps? When the process floods the logger?” and pick the stack that fails the way you can live with.
A mental model that doesn’t lie under pressure
What journald really is
systemd-journald is a collector and store for log events with attached metadata: cgroup, unit name, PID, UID, capabilities, SELinux/AppArmor context (where available), boot ID, and monotonic timestamps. It stores entries in binary journal files. “Binary” isn’t a moral failing; it’s a performance and integrity choice. It allows indexing and relatively fast queries like “show me everything from sshd.service in the last boot.”
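A minimal version of that query, assuming the unit is named sshd.service as in the sentence above (on stock Ubuntu the unit may be called ssh.service; adjust to whatever systemctl shows on your host):
cr0x@server:~$ journalctl -b -u sshd.service --no-pager
cr0x@server:~$ journalctl -b -1 -u sshd.service -p warning --no-pager
The first command shows everything the unit logged during the current boot; the second shows warnings and worse from the previous boot.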
By default on many systems, journald uses volatile storage (memory-backed under /run/log/journal) unless persistent storage is configured. That default is friendly to small disks and ephemeral machines, and brutal when you need to debug something that happened before a reboot.
What rsyslog really is
rsyslog is a syslog daemon that ingests messages (from local sockets, from the network, from journald via an input module) and then routes them based on rules. It’s very good at queues, rate-limits, disk-assisted buffering, and shipping logs reliably when the network behaves like a network (which is to say: badly, sometimes).
rsyslog outputs are usually text files in /var/log or remote syslog destinations. Text logs remain the lingua franca of a depressing amount of tooling. That’s not nostalgia; that’s compatibility with things that still parse syslog like it’s 2009.
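To make "routes based on rules" concrete, here is a minimal, illustrative rule file in classic selector syntax; the filename, file paths, and relay name are examples for this article, not defaults you should assume:
# /etc/rsyslog.d/30-routing-example.conf (illustrative)
# Auth events get their own file, then stop further processing of those messages.
auth,authpriv.*    /var/log/auth.log
& stop
# Everything at warning or above is forwarded to a relay over TCP.
*.warn    @@logrelay.internal:514
Real deployments add queue settings to the forwarding action, as shown in Plan A later in this article.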
The pipeline on Ubuntu 24.04 (typical)
- Kernel messages go to the kernel ring buffer, then journald collects them; rsyslog can also read kernel messages depending on config.
- systemd services log to stdout/stderr; journald captures that automatically.
- Many traditional apps still log via /dev/log (the syslog socket). That socket can be provided by rsyslog or by systemd-journald's syslog compatibility socket.
- rsyslog can ingest from journald (via imjournal) or from the syslog socket, then write files and/or forward (a quick check of which path owns the socket follows this list).
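If you're not sure which capture path owns the syslog socket on a given host, one quick check; treat the typical result described below as the common systemd arrangement, not a guarantee:
cr0x@server:~$ ls -l /dev/log
On a systemd host this is usually a symlink to /run/systemd/journal/dev-log: journald owns the socket, and rsyslog only sees those messages downstream (via imuxsock on the forwarded socket, or via imjournal).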
If you’ve ever wondered why your /var/log/syslog is missing a line you saw in journalctl, the answer is usually “those are two different capture paths.” Logging is a supply chain. You don’t notice the supply chain until a container ship gets stuck.
One quote to staple to your monitor (paraphrased idea): Gene Kim’s operations theme is that improvement comes from shortening feedback loops. Logging is one of your shortest loops; treat it like production code.
Joke #1: Logging is like teeth—ignore it until it hurts, then suddenly you’re willing to pay any price for the pain to stop.
Interesting facts and historical context
- syslog predates Linux. The original syslog came out of BSD Unix in the 1980s, designed for simple networked log transport when “security model” was mostly “don’t let Dave in accounting touch the server.”
- rsyslog is newer than people think. rsyslog was created in the early 2000s as a drop-in replacement for sysklogd with better performance and features like TCP, RELP, and queueing.
- journald stores logs in a binary format by design. It’s optimized for indexed queries and metadata-rich events; the “binary logs are bad” argument is mostly about tooling expectations, not the underlying reliability.
- systemd made stdout/stderr first-class logging. That changed application logging culture: services no longer had to manage log files if they didn’t want to. The platform captures it.
- Traditional log rotation was invented to control disk usage for text logs. With journald, retention is often managed by size/time caps rather than filename-based rotation, which changes how “did we keep last week?” is answered.
- RELP exists because TCP wasn’t enough. TCP can still lose data when a sender crashes or a connection resets at the wrong time; RELP (Reliable Event Logging Protocol) adds application-level acknowledgements.
- Journald tags logs with a boot ID. That sounds small until you’re debugging an intermittent crash and need to separate “this boot” from “the last boot.” It’s a gift.
- The Linux kernel ring buffer is finite. If you don’t drain it under flood, old kernel messages are overwritten. That’s not journald’s fault, but journald is your normal drain path.
Trade-offs that actually matter in 2025
Durability: what survives reboot and what doesn’t
journald can be volatile or persistent. Volatile journald is fine for cattle nodes where you centralize everything instantly, and terrible for “why did it reboot?” moments when your forwarder didn’t ship the last 30 seconds.
rsyslog writing to disk is persistent by default (assuming it writes to /var/log and that filesystem survives). But persistence on the same disk as your workload isn’t a win if the disk fills and your app dies. Durability is a system property, not a daemon property.
Backpressure and burst handling
Under log storms, the logging system becomes part of your performance profile. journald has rate limiting and can drop messages. rsyslog can queue in memory or spill to disk. If you care about “never drop auth logs” or “capture the last 60 seconds before a crash,” you need explicit settings and, usually, disk-assisted queues.
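If "never drop auth logs" extends to one specific chatty-but-critical service, journald's limits can also be relaxed per unit with a drop-in; a minimal sketch, reusing the hypothetical myapp.service from the tasks below, with illustrative numbers:
cr0x@server:~$ sudo mkdir -p /etc/systemd/system/myapp.service.d
cr0x@server:~$ sudo tee /etc/systemd/system/myapp.service.d/90-lograte.conf >/dev/null <<'EOF'
[Service]
# Per-unit overrides for journald rate limiting; tune the numbers to your burst profile.
LogRateLimitIntervalSec=30s
LogRateLimitBurst=50000
EOF
cr0x@server:~$ sudo systemctl daemon-reload
cr0x@server:~$ sudo systemctl restart myapp.service
Raising limits treats the symptom; reducing spam at the source is still the better long-term fix.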
Metadata and query ergonomics
journald wins locally for fast slicing: by unit, by PID, by cgroup, by boot, by priority, by time. If you’re doing incident response on a single box, journalctl is often faster than grepping files—especially when services spam structured data or when PIDs churn.
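A few illustrative slices of that kind, reusing the myapp.service unit and PID 2211 that appear in the tasks below:
cr0x@server:~$ journalctl -u myapp.service -p err --since "09:00" --until "09:15" --no-pager
cr0x@server:~$ journalctl _PID=2211 -b --no-pager
cr0x@server:~$ journalctl -u myapp.service -g "timeout|refused" --since "1 hour ago" --no-pager
Unit plus priority plus time window, a raw field match on PID, and a regex search with -g (available when journalctl is built with pattern matching, as it is on stock Ubuntu).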
rsyslog wins when you need to integrate with everything that expects syslog, from network gear to old compliance pipelines. It’s the “universal adapter.”
Security and tamper resistance
Neither daemon magically makes logs tamper-proof. Local root can always do violence. Your real control is: ship logs off-host quickly, keep them immutable in the aggregator, and control access. journald does support sealing features, but don’t confuse “harder to casually edit” with “forensic-grade.”
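If you do enable sealing, the local workflow looks roughly like this; it needs a persistent journal, the verification key belongs off-host, and the result is tamper-evidence, not forensic proof:
cr0x@server:~$ sudo journalctl --setup-keys
cr0x@server:~$ sudo journalctl --verify
--setup-keys is a one-time step that prints the verification key; --verify checks internal consistency of the journal files, and verifying the seals themselves requires passing that key via --verify-key=.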
Complexity and operational cost
Running only journald is simple until you need reliable forwarding with buffering, filtering, and protocol choices. Running journald + rsyslog is a little more moving parts, but gives you explicit control of the pipeline. In production, explicit beats implicit.
Joke #2: “We don’t need centralized logging” is a bold strategy; it’s like opting out of seatbelts because you plan to drive carefully.
Practical tasks (commands, output meaning, decisions)
These are the checks I run on Ubuntu 24.04 when someone says “logs are missing,” “disk is filling,” or “forwarding is flaky.” Each task includes: command, what the output means, and what decision you make.
Task 1: Confirm what’s running (journald, rsyslog)
cr0x@server:~$ systemctl status systemd-journald rsyslog --no-pager
● systemd-journald.service - Journal Service
Loaded: loaded (/usr/lib/systemd/system/systemd-journald.service; static)
Active: active (running) since Mon 2025-12-30 09:10:11 UTC; 2h 1min ago
...
● rsyslog.service - System Logging Service
Loaded: loaded (/usr/lib/systemd/system/rsyslog.service; enabled; preset: enabled)
Active: active (running) since Mon 2025-12-30 09:10:13 UTC; 2h 1min ago
...
Meaning: Both services are active; you likely have dual-path logging. If rsyslog is inactive, you’re probably relying on journald only.
Decision: If you need remote forwarding with buffering, enable rsyslog (or a dedicated forwarder) and define the path intentionally.
Task 2: See journald storage mode (volatile vs persistent)
cr0x@server:~$ journalctl --disk-usage
Archived and active journals take up 96.0M in the file system.
Meaning: Journal files exist and consume about 96M, but this number includes volatile storage under /run, so it doesn't by itself prove persistence. If usage is tiny even though you expected history, you may be volatile-only.
Decision: If you care about logs across reboots, ensure persistent storage is enabled and you have retention settings.
Task 3: Verify whether journald is using persistent storage
cr0x@server:~$ ls -ld /var/log/journal /run/log/journal
drwxr-sr-x 3 root systemd-journal 4096 Dec 30 09:10 /var/log/journal
drwxr-sr-x 2 root systemd-journal 120 Dec 30 09:10 /run/log/journal
Meaning: /var/log/journal exists, so persistence is enabled (or at least available). If it doesn’t exist, journald may be volatile.
Decision: If /var/log/journal is missing, create it and set Storage=persistent (details in the plan section).
Task 4: Inspect journald retention and rate limits
cr0x@server:~$ systemd-analyze cat-config systemd/journald.conf
# /etc/systemd/journald.conf
[Journal]
Storage=persistent
SystemMaxUse=2G
SystemKeepFree=1G
RateLimitIntervalSec=30s
RateLimitBurst=10000
Meaning: These are the effective settings after drop-ins. Small SystemMaxUse means faster eviction. Aggressive rate limiting can drop bursts.
Decision: Tune for your disk budget and incident needs. If you see drops during spikes, adjust rate limits and ship off-host.
Task 5: Detect dropped messages in journald
cr0x@server:~$ journalctl -u systemd-journald --since "1 hour ago" | tail -n 8
Dec 30 10:44:02 server systemd-journald[412]: Suppressed 12845 messages from /system.slice/myapp.service
Dec 30 10:44:02 server systemd-journald[412]: Forwarding to syslog missed 0 messages
Meaning: “Suppressed” indicates rate-limited drops. That’s not theoretical. It’s happening.
Decision: If the suppressed unit is important (auth, kernel, your core service), raise limits and reduce spam at source. Consider rsyslog queues for forwarding reliability.
Task 6: Check whether rsyslog is ingesting from journald
cr0x@server:~$ grep -R "imjournal" /etc/rsyslog.d /etc/rsyslog.conf
/etc/rsyslog.conf:module(load="imjournal" StateFile="imjournal.state")
Meaning: rsyslog is reading from the systemd journal via imjournal. If absent, rsyslog may be reading from /dev/log instead.
Decision: Pick one ingestion strategy to avoid duplicates: either imjournal (journal as source of truth) or socket (syslog as source). Don’t accidentally do both.
Task 7: Spot duplicate events (a classic dual-ingest symptom)
cr0x@server:~$ sudo awk 'NR<=20{print}' /var/log/syslog
Dec 30 11:01:10 server myapp[2211]: started worker=7
Dec 30 11:01:10 server myapp[2211]: started worker=7
Meaning: Same message twice at same timestamp strongly suggests double ingestion (e.g., app logs to syslog and journald forwards into rsyslog too).
Decision: Disable one path: either stop forwarding from journald to rsyslog, or stop rsyslog from also reading /dev/log, depending on architecture.
Task 8: Verify rsyslog queues and whether forwarding is blocked
cr0x@server:~$ systemctl status rsyslog --no-pager | sed -n '1,14p'
● rsyslog.service - System Logging Service
Active: active (running) since Mon 2025-12-30 09:10:13 UTC; 2h 9min ago
Main PID: 621 (rsyslogd)
Tasks: 4
Memory: 8.5M
CPU: 1.901s
CGroup: /system.slice/rsyslog.service
└─621 /usr/sbin/rsyslogd -n -iNONE
Meaning: Status alone won’t tell you queue depth, but it confirms daemon health and flags obvious crash loops.
Decision: If remote forwarding is delayed, check network reachability and rsyslog action queues (see config validation tasks below).
Task 9: Validate rsyslog config (syntax, modules, includes)
cr0x@server:~$ rsyslogd -N1
rsyslogd: version 8.2312.0, config validation run (level 1), master config /etc/rsyslog.conf
rsyslogd: End of config validation run. Bye.
Meaning: Validation passed. If it outputs errors, rsyslog may be running with partial config or failing to start after changes.
Decision: Never reload rsyslog blindly in production. Validate first, then reload, then confirm message flow.
Task 10: Determine whether forwarding is UDP (lossy) or TCP/RELP (better)
cr0x@server:~$ grep -R "@" /etc/rsyslog.d /etc/rsyslog.conf
/etc/rsyslog.d/60-forward.conf:*.* @@logrelay.internal:514
Meaning: A single @ means UDP, @@ means TCP, so this example forwards over TCP. TCP can still lose messages when a sender crashes or a connection resets; RELP is stronger.
Decision: If “don’t lose auth logs” is a requirement, don’t use UDP. Use TCP with disk queues or RELP if your relay supports it.
Task 11: Check if journald is forwarding to syslog (and whether you even need it)
cr0x@server:~$ grep -R "^ForwardToSyslog" /etc/systemd/journald.conf /etc/systemd/journald.conf.d 2>/dev/null
/etc/systemd/journald.conf:ForwardToSyslog=yes
Meaning: journald is forwarding entries to the syslog socket. If rsyslog also reads from the journal, that can duplicate.
Decision: Choose a single handoff point: either ForwardToSyslog (journald → syslog socket) or rsyslog imjournal (journald → rsyslog directly).
Task 12: Identify “why did it reboot?” using boot-separated journal views
cr0x@server:~$ journalctl --list-boots | tail -n 3
-2 2f1c1b2dd0e84fbb9a1f66b2ff0f8d1e Sun 2025-12-29 22:10:17 UTC—Sun 2025-12-29 23:52:01 UTC
-1 7d8c0e3fa0f44a3b8c0de74b8b9f41a2 Mon 2025-12-30 00:10:06 UTC—Mon 2025-12-30 09:09:55 UTC
0 94f2b5d9f61e4f57b5f3c3c7a9c2a1d1 Mon 2025-12-30 09:10:06 UTC—Mon 2025-12-30 11:19:44 UTC
Meaning: Multiple boots are visible, so persistence is working. If you only ever see “0”, you’re likely volatile or history was vacuumed.
Decision: If reboots are mysterious, lock in persistent journald and increase retention so “previous boot” exists when you need it.
Task 13: Pull the shutdown/crash narrative quickly
cr0x@server:~$ journalctl -b -1 -p warning..emerg --no-pager | tail -n 20
Dec 30 09:09:51 server kernel: Out of memory: Killed process 2211 (myapp) total-vm:...
Dec 30 09:09:52 server systemd[1]: myapp.service: Main process exited, code=killed, status=9/KILL
Dec 30 09:09:55 server systemd[1]: Reached target Reboot.
Meaning: Last boot shows OOM kill and service death leading to reboot. This is the kind of “one screen” view journald is excellent at.
Decision: If kernel/OOM events are critical, ensure they are forwarded off-host and not rate-limited away under memory pressure.
Task 14: Confirm disk pressure on logging filesystem
cr0x@server:~$ df -h /var /run
Filesystem Size Used Avail Use% Mounted on
/dev/sda2 40G 34G 4.2G 90% /
tmpfs 3.1G 180M 2.9G 6% /run
Meaning: /var is tight. If logs share the root filesystem, a log burst can become an outage.
Decision: Cap journald usage (SystemMaxUse), rotate text logs properly, and ship off-host. If needed, separate /var onto its own filesystem in serious environments.
Task 15: Quantify which journal consumers are heavy
cr0x@server:~$ journalctl --since "1 hour ago" -o json-pretty | head -n 20
{
"_SYSTEMD_UNIT" : "myapp.service",
"PRIORITY" : "6",
"MESSAGE" : "processed batch id=9f1c...",
"_PID" : "2211",
"__REALTIME_TIMESTAMP" : "1735557650000000"
}
Meaning: JSON output shows fields you can filter on. If your app is spamming “processed batch …” at info level, that’s your disk and your future self’s problem.
Decision: Reduce log volume at the source. Logging systems are not a substitute for metrics.
Task 16: Check who owns access to the journal (debugging permissions)
cr0x@server:~$ id
uid=1000(cr0x) gid=1000(cr0x) groups=1000(cr0x),4(adm)
Meaning: Users in adm often can read many logs; journald access is commonly granted via systemd-journal group or via sudo.
Decision: Give on-call engineers the minimum groups needed to read logs without granting full root. Then audit that decision quarterly, because org charts drift.
Fast diagnosis playbook
You’re on call. The alert says “service down.” Someone says “logs are missing.” Don’t spelunk. Do this in order.
First: find out if the events exist locally at all
- Check the journal for the service and timeframe. Filter by unit and priority. If the journal has it, you have a ground truth starting point.
- Check previous boot. If the host rebooted, your “missing logs” might just be “you’re looking at the wrong boot.”
cr0x@server:~$ journalctl -u myapp.service --since "30 min ago" -p info..emerg --no-pager | tail -n 30
Dec 30 11:03:01 server myapp[2211]: healthcheck failed: upstream timeout
Dec 30 11:03:02 server systemd[1]: myapp.service: Main process exited, code=exited, status=1/FAILURE
Interpretation: If the journal has the event, the service is logging and journald is collecting. Your problem is likely forwarding, duplication filters, or file-based syslog expectations.
Second: determine whether data is being dropped
- Look for journald suppression messages.
- Check disk pressure. Full disks cause weird behavior and missing writes.
- Check rsyslog health and config validation. (One combined pass for all three checks is sketched below.)
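One combined pass that covers all three checks, using only commands that appear elsewhere in this article:
cr0x@server:~$ journalctl -u systemd-journald --since "1 hour ago" --no-pager | grep -Ei "suppressed|missed" | tail -n 5
cr0x@server:~$ df -h /var /run && journalctl --disk-usage
cr0x@server:~$ systemctl is-active rsyslog && sudo rsyslogd -N1
Suppressed or missed messages, a filesystem near 100%, or a failed validation run each narrow the search immediately.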
Third: isolate the bottleneck: capture, store, or ship
- Capture bottleneck: application not logging, stdout not connected, syslog socket mismatch, permissions.
- Store bottleneck: journald volatile, retention too small, disk full, vacuuming, rotation too aggressive.
- Ship bottleneck: rsyslog forwarding over UDP, no queues, network drops, DNS issues for log host, TLS misconfig.
Fourth: prove it with one controlled test message
cr0x@server:~$ logger -p authpriv.notice "LOGTEST authpriv notice from $(hostname) at $(date -Is)"
cr0x@server:~$ journalctl --since "1 min ago" | grep LOGTEST | tail -n 1
Dec 30 11:18:22 server cr0x: LOGTEST authpriv notice from server at 2025-12-30T11:18:22+00:00
Interpretation: If it’s in the journal but not in /var/log/syslog (or not at your aggregator), you’ve narrowed the failure to the handoff/ship path.
Common mistakes: symptoms → root cause → fix
1) “The logs disappear after reboot”
Symptoms: journalctl --list-boots shows only boot 0; investigation after a crash has no history.
Root cause: journald is using volatile storage (/run) because persistent storage wasn’t enabled or /var/log/journal doesn’t exist.
Fix: Create /var/log/journal, set Storage=persistent, restart journald, and confirm multiple boots appear. Also set retention caps so persistence doesn’t become disk exhaustion.
2) “We have duplicates everywhere”
Symptoms: Same line appears twice in /var/log/syslog or twice in the aggregator, often with identical timestamps.
Root cause: Dual ingestion: app logs to syslog socket while journald forwards to syslog and rsyslog also reads from the journal (or vice versa).
Fix: Choose one: rsyslog reads from imjournal or journald forwards to syslog socket. Don’t combine without deliberate dedup logic.
3) “Auth logs are missing from the central system, but local syslog has them”
Symptoms: /var/log/auth.log is populated locally; SIEM is missing entries during network hiccups.
Root cause: UDP forwarding, or TCP forwarding without disk queues, or a relay outage with no buffering.
Fix: Use TCP with disk-assisted queues or RELP to a relay designed for ingestion. Verify queue settings and test by blocking the network temporarily.
4) “During an incident, journalctl is slow or times out”
Symptoms: journalctl queries take a long time, CPU spikes, I/O waits.
Root cause: Huge journals on slow disks, aggressive logging volume, or contention on the underlying filesystem. Sometimes it’s simply trying to render too much output.
Fix: Filter aggressively (unit, priority, time), cap disk usage, vacuum old entries, and keep logs off your slowest storage when possible.
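For the vacuuming step specifically, journalctl can trim archived journal files by size or age; the thresholds here are illustrative:
cr0x@server:~$ sudo journalctl --vacuum-size=1G
cr0x@server:~$ sudo journalctl --vacuum-time=7d
Vacuuming only touches archived files; rotate the active journal first (sudo journalctl --rotate) if you need to reclaim that too.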
5) “/var is full, and now everything is on fire”
Symptoms: Services fail to start, package updates fail, logs stop updating, random daemons crash.
Root cause: Unbounded file-based logs, misconfigured journald retention, or a runaway app writing at high rate.
Fix: Set journald caps (SystemMaxUse, SystemKeepFree), ensure logrotate is working, and fix the noisy app. If the environment is important, isolate /var onto its own filesystem.
6) “I can see the logs with sudo, but not as my on-call user”
Symptoms: journalctl shows “No journal files were found” or permission denied without sudo.
Root cause: On-call user isn’t in the right group (systemd-journal or adm depending on policy), or hardened permissions were applied.
Fix: Grant controlled read access via group membership, not shared root credentials, and document it.
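A minimal sketch of that grant, assuming your policy uses the systemd-journal group; the account name oncall-user is hypothetical:
cr0x@server:~$ sudo usermod -aG systemd-journal oncall-user
cr0x@server:~$ id oncall-user
Membership takes effect at the next login, so have the engineer log out and back in before declaring access fixed.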
Three corporate mini-stories from the logging trenches
Mini-story 1: An incident caused by a wrong assumption
A mid-sized company migrated a fleet from Ubuntu 20.04 to 24.04. They had a well-worn runbook: check /var/log/syslog, check /var/log/auth.log, ship to central syslog. The migration “worked,” the services came up, and the team moved on.
Two weeks later, a batch of nodes rebooted under a kernel panic triggered by a dodgy NIC firmware. The on-call pulled /var/log/syslog and saw… not much. It looked like the machine had simply restarted politely. The incident commander asked for “the last 60 seconds.” The on-call had 3 seconds and a growing sense of doom.
The wrong assumption was subtle: they assumed rsyslog was still the primary collector for everything important. But several services were systemd-native and logged to stdout; journald captured them, and only a subset were being forwarded into rsyslog. The missing events weren’t “missing.” They were sitting in volatile journal storage that disappeared across reboot on some node profiles.
The fix was boring and effective: they made journald persistent on all non-ephemeral nodes, set sane size caps, and routed journald into rsyslog in a single, explicit path. The next reboot incident was still unpleasant, but at least the logs told the story instead of gaslighting everyone.
Mini-story 2: An optimization that backfired
A large internal platform team decided they were paying too much in storage for logs. They noticed the journal could grow quickly on chatty nodes. So they turned down journald retention aggressively and tightened rate limits. Their goal was reasonable: keep disks healthy and reduce noise.
For a month, it looked like a win. Disk usage fell. Dashboards looked cleaner. Then a dependency started misbehaving: intermittent TLS handshake failures between services. The failures lasted seconds, only a few times per hour. Metrics showed error spikes, but the logs that would have explained why were often absent. The spikes were exactly the kind that get suppressed by rate limits and short retention when multiple components get noisy at the same time.
They eventually found a pattern by correlating a handful of surviving logs with packet captures: MTU mismatch after a network change. The real lesson wasn’t about MTU. It was that they “optimized” logging by removing the exact data needed to debug rare events, the kind you can’t reproduce on demand.
The corrected approach was to reduce volume at the source (log levels, sampling, structured event design), keep journald retention adequate for local triage, and rely on a central store for longer-term forensics. Cutting retention is a scalpel; they used it like a lawnmower.
Mini-story 3: A boring but correct practice that saved the day
A payments-adjacent team ran Ubuntu nodes that handled authentication. Nothing glamorous: systemd services, rsyslog forwarding, and a central log relay. The team had one habit that felt excessive: every quarter, they ran a controlled “log shipping failure” test during business hours.
The test was simple. They’d block egress to the log relay for a few minutes on a canary host, generate a handful of logger test messages at different facilities and priorities, then re-enable egress. The expectation: messages queue locally and later appear in the aggregator in order, without loss.
One quarter, the test failed. Messages never appeared upstream. Local logs existed, but forwarding didn’t catch up. Because it was a test, not an outage, they had time to investigate without adrenaline. It turned out a config change had switched forwarding to UDP “temporarily” and nobody switched it back. Temporary is the most permanent word in corporate IT.
They reverted to TCP with disk queues and wrote a tiny CI check that flagged UDP forwarding in production configs. A month later, a real network incident hit their datacenter segment. The queue absorbed the outage, the SIEM caught up afterward, and the incident review contained an unfamiliar phrase: “No data loss observed.” Boring won. Again.
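If you want to copy that habit, here is a minimal sketch of the canary test, assuming TCP forwarding on port 514 to the logrelay.internal relay used earlier and a host where iptables commands are available; adapt the firewall step to whatever tooling you actually run:
cr0x@server:~$ sudo iptables -I OUTPUT -p tcp --dport 514 -j DROP
cr0x@server:~$ logger -p auth.notice "SHIPTEST auth $(hostname) $(date -Is)"
cr0x@server:~$ logger -p daemon.warning "SHIPTEST daemon $(hostname) $(date -Is)"
cr0x@server:~$ sleep 300
cr0x@server:~$ sudo iptables -D OUTPUT -p tcp --dport 514 -j DROP
Then confirm the SHIPTEST messages reach the aggregator after the block is lifted, in order and without gaps. If they never arrive, you have found your quarterly lesson before it became an incident.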
Checklists / step-by-step plan
Plan A (recommended): journald persistent + rsyslog forwarding with one ingestion path
Step 1: Make journald persistent.
cr0x@server:~$ sudo mkdir -p /var/log/journal
cr0x@server:~$ sudo systemd-tmpfiles --create --prefix /var/log/journal
cr0x@server:~$ sudo sed -i 's/^#Storage=.*/Storage=persistent/' /etc/systemd/journald.conf
cr0x@server:~$ sudo systemctl restart systemd-journald
What to verify: journalctl --list-boots should show more than boot 0 after the next reboot, and /var/log/journal should populate.
Step 2: Set retention caps that won't fill disks.
cr0x@server:~$ sudo tee /etc/systemd/journald.conf.d/99-retention.conf >/dev/null <<'EOF'
[Journal]
SystemMaxUse=2G
SystemKeepFree=1G
MaxRetentionSec=14day
EOF
cr0x@server:~$ sudo systemctl restart systemd-journald
Decision: Pick caps based on disk size and incident needs. On small root filesystems, be conservative and ship off-host.
Step 3: Choose the handoff to rsyslog: use imjournal OR ForwardToSyslog, not both.
Option 1 (common): rsyslog reads the journal with imjournal.
cr0x@server:~$ sudo grep -R "module(load=\"imjournal" /etc/rsyslog.conf
module(load="imjournal" StateFile="imjournal.state")
Then disable journald forwarding to syslog to avoid duplication if you're not using it:
cr0x@server:~$ sudo tee /etc/systemd/journald.conf.d/10-forwarding.conf >/dev/null <<'EOF'
[Journal]
ForwardToSyslog=no
EOF
cr0x@server:~$ sudo systemctl restart systemd-journald
Step 4: Use reliable forwarding (TCP + queues; RELP if available).
cr0x@server:~$ sudo tee /etc/rsyslog.d/60-forward.conf >/dev/null <<'EOF'
# Forward everything to a relay over TCP with a disk-assisted queue.
# Adjust rules so you don't forward noisy debug logs if you don't need them.
action(
  type="omfwd"
  target="logrelay.internal"
  port="514"
  protocol="tcp"
  action.resumeRetryCount="-1"
  queue.type="LinkedList"
  queue.filename="fwdAll"
  queue.maxdiskspace="2g"
  queue.saveonshutdown="on"
  queue.dequeuebatchsize="500"
)
EOF
cr0x@server:~$ sudo rsyslogd -N1
cr0x@server:~$ sudo systemctl restart rsyslog
Decision: If you have compliance-grade requirements, pair this with an internal relay and consider RELP/TLS. TCP alone is a good baseline, not a guarantee.
Step 5: Prove end-to-end flow with controlled messages.
cr0x@server:~$ logger -p user.notice "LOGPIPE e2e test id=$(uuidgen)"
cr0x@server:~$ journalctl --since "2 min ago" -o short-iso | grep LOGPIPE | tail -n 1
2025-12-30T11:20:41+00:00 server cr0x: LOGPIPE e2e test id=3e0c2aef-7e0f-4a43-a3c2-9c3e5c4f2f8b
Decision: If it shows locally but not centrally, fix shipping. If it doesn't show locally, fix capture.
Plan B: journald only (acceptable for ephemeral fleets with strong centralization)
- Use persistent journald only if disks and retention policies allow; otherwise rely on immediate shipping via a journald-aware collector.
- Set strict rate limits carefully: you might protect the node at the cost of losing the one event you needed.
- Make sure you still have an off-host copy. “Local-only” is a prelude to “we can’t prove what happened.”
Plan C: rsyslog as primary (only if you have legacy constraints)
- Possible, but you’ll still have journald capturing stdout/stderr for systemd services.
- If you insist on file-based workflows, ensure services log to syslog or files intentionally. Otherwise you’ll chase missing events in two worlds.
- Be explicit about kernel logging sources to avoid gaps.
FAQ
1) On Ubuntu 24.04, do I need rsyslog at all?
If you need classic syslog file layouts, fine-grained routing rules, disk-assisted queues, or broad syslog ecosystem compatibility, yes. If you have a journald-native collector shipping off-host reliably, you can skip rsyslog.
2) Will journald lose logs?
It can. If configured as volatile, logs won’t survive reboot. If rate limits kick in, it can suppress messages during bursts. If disk is full or retention caps are small, older logs are vacuumed. None of that is evil; it’s just physics.
3) Are binary logs a problem for compliance?
Usually the compliance requirement is “retention, integrity, access control, auditability,” not “must be plain text.” The real compliance move is shipping off-host to immutable storage and controlling access. Binary vs text is a tooling preference, not a guarantee.
4) Why do I see logs in journalctl but not in /var/log/syslog?
Because journald captures stdout/stderr for systemd services by default. Unless you forward those entries to syslog, they won’t appear in syslog files. Also, filters or facility mappings can route messages differently.
5) Should I forward from journald to rsyslog or have rsyslog read the journal?
Pick one, based on clarity and duplication avoidance. I prefer rsyslog reading the journal via imjournal for a single ingestion point with explicit queues and forwarding actions.
6) Is UDP syslog forwarding ever acceptable?
For low-stakes telemetry and noisy debug streams where loss is acceptable, sure. For auth, security, or incident-critical logs: no. Use TCP with buffering, or RELP if you can.
7) How much journal retention should I keep?
Keep enough to cover your human response window: at least “previous boot + a few days” on important hosts. Then rely on central retention for weeks/months. Cap local usage so it can’t eat the box.
8) Can I make journald write traditional text logs directly?
Not as its primary format. journald can forward to syslog, and syslog daemons can write text files. That’s the supported bridge: journald captures, rsyslog writes/forwards.
9) What about container logs?
If containers log to stdout/stderr and the runtime integrates with systemd, journald can capture with rich metadata. If you’re using a different runtime path, ensure your collector grabs container logs explicitly. Don’t assume.
10) How do I prevent logs from taking down the node?
Cap journald disk usage, ensure logrotate works for text logs, and reduce log volume at the source. Also avoid putting heavy logging on the same constrained filesystem as your database.
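For the logrotate part, a minimal illustrative drop-in for a hypothetical text log; the path and retention counts are examples:
# /etc/logrotate.d/myapp (illustrative)
/var/log/myapp/*.log {
    daily
    rotate 7
    compress
    delaycompress
    missingok
    notifempty
    copytruncate
}
copytruncate avoids coordinating with the app; if the app can reopen its log file on a signal, a postrotate reload is the cleaner choice.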
Conclusion: next steps that won’t betray you
Ubuntu 24.04 doesn’t force a religious war between journald and rsyslog. It gives you two tools with different failure modes. In production, the right pattern is usually: persistent journald for local truth, plus rsyslog for deliberate, buffered, compatible forwarding.
Next steps:
- Make journald persistent on any host you might debug after a reboot, and cap it so it can’t fill disks.
- Decide your single ingestion path into rsyslog to avoid duplicates.
- Switch forwarding to TCP (or RELP) with disk-assisted queues for anything you can’t afford to lose.
- Run a quarterly “log shipping failure” test on a canary. If that sounds excessive, wait until your first audit or security incident.
Logging isn’t just observability. It’s evidence. Build it like you’ll need it in court—because someday, internally, you will.