Post‑Reinstall Hardening Checklist: What to Lock Down First

You reinstalled a server. It boots. Services start. Everyone relaxes—right up until you remember that a fresh system is basically a brand-new house with the doors propped open for the movers.

This is the window where small choices become expensive stories. Below is the hardening order that actually holds up in production: stop the easy intrusions first, then make the system survivable, then make it diagnosable.

The order matters: hardening principles that don’t lie

Hardening after a reinstall is not a vibes-based activity. It’s a sequence problem. You’re trying to reduce risk quickly while keeping the system operable enough to finish the job.

Principle 1: Close remote doors before you decorate the interior

Start with anything reachable from outside: SSH, remote management, exposed web admin panels, default credentials, open ports. If an attacker can get an interactive shell, your careful file permissions are just a speed bump.

Principle 2: Identity and access are your first permanent state

Users, groups, sudo rules, service accounts, and API keys are how the system will be used for the next year. Your reinstall wiped the messy history. Don’t reintroduce it out of convenience.

Principle 3: Observability is part of security

If you can’t answer “what changed?” and “who did it?” you don’t have a hardened system. You have a system you’re hoping is hardened. Central logs, audit trails, and time sync aren’t garnish; they’re incident response tools.

Principle 4: Storage is where accidents become disasters

Most production outages aren’t Hollywood hacks; they’re permission mistakes, runaway logs, a filled root filesystem, or a “temporary” mount that got forgotten. Storage hardening means making the write paths boring and predictable.

Principle 5: Baselines beat heroics

After reinstall, you can build a known-good baseline. That baseline is your future diff. If you don’t set it now, you’ll spend months arguing whether “it used to work.”

One quote worth keeping in your head when you feel tempted to wing it: “Hope is not a strategy.” Its attribution is disputed — it gets credited to coaches, generals, and sales authors alike — but the operations lesson stands on its own.

Joke #1: A freshly reinstalled server is like a newborn—cute, loud, and absolutely not ready to be left alone on the internet.

Fast diagnosis playbook (first/second/third checks)

This is the “it’s acting weird after reinstall” playbook. Use it to find the bottleneck fast and avoid digging into the wrong layer.

First: Is the system healthy at the OS level?

  • CPU pressure? Check load, run queue, and whether it’s CPU or iowait.
  • Memory pressure? Check available memory and swap activity.
  • Disk full? Root and /var filling are classic post-reinstall failures.
  • Time correct? If clocks drift, TLS fails, logs lie, and authentication gets “mysterious.”
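
The four checks above can be run as read-only one-liners. This is a sketch: `timedatectl` assumes a systemd host, so it is guarded here.

```shell
# Read-only OS-level triage: load, memory, disk, clock.
uptime                                   # load averages; compare to CPU count
free -h                                  # available memory and swap activity
df -h / /var                             # the classic post-reinstall fill-ups
timedatectl 2>/dev/null | grep -i synchronized || echo "no systemd time status"
```

None of these mutate state, so they're safe to run on a box you don't fully trust yet.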

Second: Is the network path sane?

  • Default route and DNS? A reinstall often resets resolvers, routes, and interface names.
  • Firewall rules? Your services may be running but unreachable.
  • SSH hardening gone too far? “Locked down” is not the same as “locked out.”

Third: Is the storage stack doing what you think it’s doing?

  • Mounts and fstab? Missing mounts cause apps to write into root.
  • RAID/ZFS state? Degraded arrays perform badly and fail worse.
  • Permissions? A reinstall can quietly change UID/GID mappings.

When you’re stuck, pick one question per layer: “Is it compute, network, or storage?” Most wasted hours are people guessing the wrong layer and doubling down.

Interesting facts and historical context (for the skeptical)

  1. Early Unix assumed trusted networks. Many defaults historically favored convenience on campus networks, not hostile internet exposure.
  2. SSH replaced telnet for a reason. In the 1990s, plaintext remote logins were normal; SSH made encrypted remote admin mainstream.
  3. Default-open services have a long tail. “It’s only listening on the LAN” has been a recurring pre-incident phrase since the first flat networks.
  4. Time sync became an availability feature. TLS, Kerberos, and many distributed systems break in bizarre ways when clocks drift.
  5. Log retention used to be a disk problem. Smaller disks forced aggressive rotation; now the failure mode is “keep everything until /var is full.”
  6. Least privilege wasn’t always culturally accepted. For decades, ops teams solved speed problems with shared root, then spent years unlearning it.
  7. Firewalls moved from perimeter to host. The rise of cloud and zero-trust made host-level filtering and segmentation normal rather than paranoid.
  8. Immutable infrastructure popularized reinstall-as-a-fix. Rebuild instead of repair improves consistency—but only if your baseline hardening is real.

Checklists / step-by-step plan (lock down first)

This is the order I’d use on a production Linux server after reinstall. Adjust for distro and environment, but don’t freestyle the sequence.

Phase 0: Don’t brick the box

  • Get out-of-band access (IPMI/iDRAC/console) or cloud serial console working before you change SSH or firewall.
  • Record the current IPs, routes, DNS, and SSH access path.
  • Set a rollback window: if you lock yourself out, you need a known recovery plan.

Phase 1: Stop easy remote compromise

  • Patch base OS packages.
  • Harden SSH: keys, no root login, no passwords (when safe), minimal ciphers/MACs if you have a policy.
  • Enable and validate a host firewall: default deny inbound, allow only what’s required.
  • Remove/disable unused remote services (old web consoles, RPC endpoints, dev ports).
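
A minimal sshd drop-in capturing the points above, as a sketch — the filename is a convention, not a requirement, and `KbdInteractiveAuthentication` is the modern name for challenge-response auth. Validate with `sudo sshd -t` before restarting; a syntax error here can keep sshd from coming back.

```
# /etc/ssh/sshd_config.d/50-hardening.conf  (illustrative path; adjust per distro)
PermitRootLogin no
PasswordAuthentication no
PubkeyAuthentication yes
KbdInteractiveAuthentication no
X11Forwarding no
MaxAuthTries 3
LoginGraceTime 30
```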

Phase 2: Make identity boring

  • Create named admin accounts; avoid shared root.
  • Lock down sudoers: explicit groups, no NOPASSWD unless you have a reason you can defend.
  • Set sane password policy where applicable (or central auth policies).
  • Inventory SSH authorized_keys and kill the zombie keys.
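
When you do need NOPASSWD, scope it. A sketch of a narrow sudoers grant (the service name mirrors examples later in this article) — always edit with `visudo -f` so a syntax error can't take out sudo entirely:

```
# /etc/sudoers.d/deploy — narrow grant instead of blanket sudo
deploy ALL=(root) NOPASSWD: /usr/bin/systemctl restart myapp.service
```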

Phase 3: Storage and data safety (where pain lives)

  • Confirm mounts and filesystem options (nodev/nosuid/noexec where appropriate).
  • Verify RAID/ZFS health; schedule scrubs and SMART checks.
  • Set permissions and ownership on data paths; verify UID/GID alignment for service accounts.
  • Configure log rotation and disk usage guardrails.
  • Backups: configure, run, and test a restore. If you don’t test, it’s not a backup; it’s a wish.
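
A logrotate fragment as a starting sketch; the path and retention are assumptions to adapt to your app's write rate:

```
# /etc/logrotate.d/myapp  (illustrative path and retention)
/srv/myapp/log/*.log {
    weekly
    rotate 8
    compress
    delaycompress
    missingok
    notifempty
}
```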

Phase 4: Observability and response

  • Time sync (chrony/systemd-timesyncd) and timezone sanity.
  • Centralize logs or at least ship critical logs off-host.
  • Enable auditing for privileged actions if your environment requires it.
  • Baseline monitoring: CPU, memory, disk, service checks, and alert routing.
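
If journald is your primary log store, cap it explicitly. A drop-in sketch (the sizes are assumptions; apply with `sudo systemctl restart systemd-journald`):

```
# /etc/systemd/journald.conf.d/size.conf  (illustrative drop-in)
[Journal]
Storage=persistent
SystemMaxUse=500M
SystemMaxFileSize=50M
```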

Phase 5: Service hardening (defense in depth)

  • Systemd sandboxing options where safe.
  • Disable unused kernel modules and tighten sysctl for network exposure.
  • Secrets management approach (file permissions, env vars, vault integration, rotation).
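
A systemd sandboxing drop-in as a starting sketch — `myapp.service` and the data path are illustrative, and every directive here can break an app that legitimately needs the feature, so enable them one at a time and watch the logs:

```
# /etc/systemd/system/myapp.service.d/hardening.conf  (illustrative)
[Service]
NoNewPrivileges=yes
PrivateTmp=yes
ProtectSystem=strict
ProtectHome=yes
ReadWritePaths=/srv/myapp/data
```

Apply with `sudo systemctl daemon-reload && sudo systemctl restart myapp`, then check for permission denials before calling it done.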

Hands-on tasks (commands, outputs, decisions)

Below are practical tasks you can run right after reinstall. Each includes: a command, what “normal” output looks like, what it means, and what decision you make next.

Task 1: Confirm what’s actually listening (and on which interfaces)

cr0x@server:~$ sudo ss -lntup
Netid State  Recv-Q Send-Q Local Address:Port  Peer Address:Port Process
tcp   LISTEN 0      4096   0.0.0.0:22         0.0.0.0:*     users:(("sshd",pid=812,fd=3))
tcp   LISTEN 0      4096   127.0.0.1:5432     0.0.0.0:*     users:(("postgres",pid=1021,fd=7))
tcp   LISTEN 0      4096   0.0.0.0:9100       0.0.0.0:*     users:(("node_exporter",pid=1102,fd=3))

What it means: SSH and node_exporter are exposed on all interfaces; Postgres is loopback-only (good default).

Decision: If 9100 shouldn’t be public, firewall it to your monitoring network or bind it to a private interface. If anything unexpected is on 0.0.0.0, treat it as a bug until proven otherwise.

Task 2: Verify the firewall is enabled and the policy is not “shrug”

cr0x@server:~$ sudo ufw status verbose
Status: active
Logging: on (low)
Default: deny (incoming), allow (outgoing), disabled (routed)
New profiles: skip
To                         Action      From
22/tcp                     ALLOW IN    203.0.113.10
9100/tcp                   ALLOW IN    10.10.0.0/16

What it means: Default deny inbound is set. SSH is restricted to a known admin IP; metrics are allowed from an internal range.

Decision: If you see “Default: allow (incoming)”, fix it immediately. If SSH is “Anywhere”, narrow it unless you truly need global admin access (you don’t).
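
If the defaults are wrong, the fix is a few commands. The admin IP and metrics range here mirror the sample output above; run these from a console session, not over the SSH connection you're about to restrict:

```
sudo ufw default deny incoming
sudo ufw default allow outgoing
sudo ufw allow from 203.0.113.10 to any port 22 proto tcp
sudo ufw allow from 10.10.0.0/16 to any port 9100 proto tcp
sudo ufw enable
```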

Task 3: Confirm SSH server settings won’t embarrass you later

cr0x@server:~$ sudo sshd -T | egrep '^(permitrootlogin|passwordauthentication|pubkeyauthentication|challengeresponseauthentication|x11forwarding|allowusers|allowgroups)'
permitrootlogin no
passwordauthentication no
pubkeyauthentication yes
challengeresponseauthentication no
x11forwarding no

What it means: Root login and password auth are disabled; key auth is enabled; X11 forwarding is off.

Decision: If this server is a rescue endpoint or needs break-glass, decide that explicitly (restricted source IP + separate key + alerting). Otherwise, keep passwords off.

Task 4: Validate you didn’t lock yourself out (from another terminal)

cr0x@server:~$ ssh -o PreferredAuthentications=publickey -o PasswordAuthentication=no admin@server.example
Linux server 6.5.0-21-generic #21-Ubuntu SMP ...
admin@server:~$ echo OK
OK

What it means: Key-based login works as intended; no password fallback.

Decision: Only after this succeeds should you close your out-of-band console session. If it fails, don’t “fix it live” blindly—review sshd_config and authorized_keys.

Task 5: Inventory privileged access (sudo) and remove surprises

cr0x@server:~$ sudo getent group sudo
sudo:x:27:admin,deploy

What it means: The sudo group contains admin and deploy accounts.

Decision: Decide whether deploy truly needs full sudo. Often it needs specific commands, not god-mode.

Task 6: Check for passwordless sudo (it spreads)

cr0x@server:~$ sudo grep -R --line-number -E 'NOPASSWD|ALL=\(ALL(:ALL)?\) ALL' /etc/sudoers /etc/sudoers.d 2>/dev/null
/etc/sudoers:27:%sudo   ALL=(ALL:ALL) ALL
/etc/sudoers.d/deploy:1:deploy ALL=(root) NOPASSWD:/usr/bin/systemctl restart myapp.service

What it means: The deploy user can restart one service without a password. That’s narrowly scoped and defensible.

Decision: Keep it if it’s required for automation and tightly bounded. If you find NOPASSWD:ALL, remove it and expect to discover “automation” that was actually “someone was in a hurry.”

Task 7: Confirm automatic security updates (or your patch pipeline)

cr0x@server:~$ systemctl status unattended-upgrades --no-pager
● unattended-upgrades.service - Unattended Upgrades Shutdown
     Loaded: loaded (/lib/systemd/system/unattended-upgrades.service; enabled)
     Active: active (running) since Tue 2026-02-03 11:14:26 UTC; 1 day ago

What it means: Automatic updates are enabled and running.

Decision: In tightly controlled environments you might disable this and patch via your pipeline instead. Either way, choose one. “We patch manually sometimes” is how you get owned by a vulnerability with a cute logo.

Task 8: Confirm kernel and OS version match your baseline

cr0x@server:~$ uname -a
Linux server 6.5.0-21-generic #21-Ubuntu SMP PREEMPT_DYNAMIC x86_64 GNU/Linux

What it means: This is the running kernel. It must match what your drivers, security policy, and tooling expect.

Decision: If you need a specific kernel line (e.g., for storage HBA drivers), fix it now—not after you’ve loaded data and regret.

Task 9: Check time sync (security and sanity)

cr0x@server:~$ timedatectl
               Local time: Wed 2026-02-04 09:22:11 UTC
           Universal time: Wed 2026-02-04 09:22:11 UTC
                 RTC time: Wed 2026-02-04 09:22:10
                Time zone: Etc/UTC (UTC, +0000)
System clock synchronized: yes
              NTP service: active
          RTC in local TZ: no

What it means: Clock is synchronized; NTP is active; UTC is in use (blessedly boring).

Decision: If System clock synchronized: no, fix it before debugging TLS, cluster membership, or “random” auth failures.

Task 10: Check disks, filesystems, and what’s mounted where

cr0x@server:~$ lsblk -f
NAME   FSTYPE FSVER LABEL UUID                                 FSAVAIL FSUSE% MOUNTPOINTS
sda
├─sda1 vfat   FAT32       2F1A-9C2B                              512M     2% /boot/efi
├─sda2 ext4   1.0         6f8d1c6a-1f6a-4b6c-9f0a-1f7c6e2a1b4a   18G     21% /
└─sda3 ext4   1.0         41b2b9b0-9a1f-4b57-9b0d-0d8d1d7b6e2e  820G     9% /srv

What it means: Root and /srv are separate. That’s good: app data won’t instantly take down the OS.

Decision: If /var is not separate and you expect heavy logs, consider splitting it or at least enforce aggressive log rotation and monitoring.

Task 11: Verify fstab won’t surprise-reboot you

cr0x@server:~$ sudo cat /etc/fstab
UUID=6f8d1c6a-1f6a-4b6c-9f0a-1f7c6e2a1b4a /     ext4 defaults,errors=remount-ro 0 1
UUID=41b2b9b0-9a1f-4b57-9b0d-0d8d1d7b6e2e /srv  ext4 defaults,noatime         0 2

What it means: Mounts use UUIDs (stable). errors=remount-ro protects root from silent corruption by going read-only on errors.

Decision: If you have “temporary” mounts, decide whether they should be nofail. For critical data, do not use nofail; fail fast instead of booting into a half-working state.

Task 12: Check filesystem usage and find the first “why is this full?”

cr0x@server:~$ df -hT
Filesystem     Type  Size  Used Avail Use% Mounted on
/dev/sda2      ext4   20G  4.2G   15G  23% /
/dev/sda3      ext4  900G   80G  774G  10% /srv
tmpfs          tmpfs 3.1G     0  3.1G   0% /run/user/1000

What it means: Plenty of space now. The point is establishing the baseline before the system starts to rot.

Decision: Set alerts at 70/80/90% depending on growth rate. If / is small, be more aggressive.
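
A tiny sketch of that guardrail: flag any mount at or above a threshold. The `check_usage` helper and the sample `df -P` output are illustrative; in production, pipe real `df -P` into it and wire the output to your alerting.

```shell
#!/bin/sh
# Flag mounts at or above a usage threshold from `df -P`-style input.
check_usage() {
    # $1 = percent threshold; stdin = df -P output
    awk -v t="$1" 'NR > 1 { use = $5; sub("%", "", use)
                            if (use + 0 >= t) print $6 " at " use "%" }'
}

# Hypothetical sample; in real use: df -P | check_usage 80
check_usage 80 <<'EOF'
Filesystem     1024-blocks      Used Available Capacity Mounted on
/dev/sda2         20971520   4404019  15728640      23% /
/dev/sda3        943718400 849346560  90000000      90% /srv
EOF
```

With the sample above, only `/srv` crosses the 80% line, so that's the only mount printed.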

Task 13: Confirm log rotation is enabled and not fantasy

cr0x@server:~$ sudo logrotate -d /etc/logrotate.conf 2>&1 | tail -n 8
reading config file /etc/logrotate.conf
including /etc/logrotate.d
reading config file apt
reading config file rsyslog
Handling 2 logs
rotating pattern: /var/log/syslog  weekly (4 rotations)
renaming /var/log/syslog to /var/log/syslog.1

What it means: Logrotate can parse config and intends to rotate syslog weekly with 4 rotations.

Decision: If you rely on journald only, set SystemMaxUse and SystemMaxFileSize in /etc/systemd/journald.conf. Otherwise /var will eventually stage a coup.

Task 14: Check journald persistence and size limits

cr0x@server:~$ sudo journalctl --disk-usage
Archived and active journals take up 384.0M in the file system.

What it means: Journals are consuming disk. That’s fine—until it isn’t.

Decision: If this climbs unpredictably, cap it and ship logs off-host. If you keep it volatile-only, accept that reboot erases your crime scene.

Task 15: Find services that start automatically (and shouldn’t)

cr0x@server:~$ systemctl list-unit-files --type=service --state=enabled
UNIT FILE                         STATE   PRESET
cron.service                      enabled enabled
ssh.service                       enabled enabled
unattended-upgrades.service       enabled enabled
postgresql.service                enabled enabled
apache2.service                   enabled enabled

What it means: Apache is enabled. Maybe that’s intended. Maybe it’s leftover from “quick test” packages.

Decision: If a service isn’t part of the system’s purpose, disable and remove it. Every enabled daemon is a potential CVE subscription.

Task 16: Verify TLS private keys and secrets aren’t world-readable

cr0x@server:~$ sudo find /etc/ssl -type f -name '*.key' -printf '%m %u %g %p\n'
640 root ssl-cert /etc/ssl/private/server.key

What it means: Mode 640, owned by root, group ssl-cert. That’s typical on Debian/Ubuntu.

Decision: Ensure only the service account group can read it. If you see 644 or random ownership, fix immediately and audit who accessed it.
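
The remediation, using the path from the output above (the `ssl-cert` group is the Debian/Ubuntu convention; other distros use a different group or root-only):

```
sudo chown root:ssl-cert /etc/ssl/private/server.key
sudo chmod 640 /etc/ssl/private/server.key
sudo stat -c '%U %G %a %n' /etc/ssl/private/server.key
```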

Task 17: Verify storage health (SMART) before trusting the machine

cr0x@server:~$ sudo smartctl -H /dev/sda
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.5.0-21-generic] (local build)
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

What it means: Disk reports healthy (not a guarantee, but a baseline).

Decision: If it fails or shows reallocated sectors climbing, replace the disk before you start believing your reinstall “fixed” things.

Task 18: If you use ZFS, confirm pool status and schedule scrubs

cr0x@server:~$ sudo zpool status
  pool: tank
 state: ONLINE
  scan: scrub repaired 0B in 00:12:44 with 0 errors on Sun Feb  1 03:10:02 2026
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          sdb       ONLINE       0     0     0
          sdc       ONLINE       0     0     0

errors: No known data errors

What it means: Pool is online, last scrub had zero errors.

Decision: If state is DEGRADED, treat it as urgent. Also confirm scrubs are scheduled (cron/systemd timer) so bit rot doesn’t become a surprise hobby.

Task 19: Confirm permissions on critical data paths (and UID/GID consistency)

cr0x@server:~$ sudo stat -c '%U %G %a %n' /srv /srv/myapp /srv/myapp/data
root root 755 /srv
myapp myapp 750 /srv/myapp
myapp myapp 700 /srv/myapp/data

What it means: Data is restricted. Service account owns its data. Good.

Decision: If you see root-owned data directories for app services, you’re likely to “fix” it later with chmod -R and cause collateral damage. Set ownership correctly now.

Task 20: Verify backups exist and are restorable (the only test that counts)

cr0x@server:~$ sudo restic -r /backups/restic snapshots
repository 2f1a... opened (version 2, compression level auto)
ID        Time                 Host        Tags        Paths
c31e9f5a  2026-02-04 02:00:12  server                  /etc /srv/myapp

What it means: A snapshot exists. That’s not yet confidence.

Decision: Perform a targeted restore test into a temporary directory and verify checksums/service start. Backups without restores are just performance art.
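
A restore drill sketch against the repository from the output above — the target directory and included path are illustrative:

```
sudo restic -r /backups/restic restore latest --target /tmp/restore-test --include /etc
sudo diff -r /etc/ssh /tmp/restore-test/etc/ssh && echo "restore matches live config"
```

A clean diff isn't a full drill, but it proves the repository decrypts, the paths are right, and the data round-trips.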

Joke #2: If you’ve never tested a restore, your backup strategy is basically “thoughts and prayers with extra disk.”

Three corporate mini-stories (how this fails in real companies)

1) Incident caused by a wrong assumption: “It’s only accessible internally”

A mid-size company rebuilt a metrics server after a disk failure. The engineer did what many of us do when tired: installed the exporter, confirmed dashboards were green, moved on. The assumption was that the server lived “on the private network.”

It did, sort of. The new cloud image came with a public IP assigned by default, and security groups were copied from a template used for a different role. The exporter was listening on 0.0.0.0:9100. No auth. No firewall rule on the host. The dashboard worked, therefore everything was “fine.”

Within days, the server showed strange spikes and outbound traffic. Nothing dramatic, just enough to trigger a billing alert. Incident response found opportunistic scanners pulling metadata and trying known weak endpoints. The exporter itself wasn’t the crown jewels, but it was an inventory leak: kernel versions, mounted filesystems, network interfaces, and sometimes environment details via textfile collectors.

The root cause wasn’t the exporter. It was the assumption that network placement equals security. The fix was simple: default-deny inbound at the host firewall, bind internal-only services to internal interfaces, and explicitly manage cloud firewall rules per role—no “close enough” templates.

The lesson stuck: “Internal” is not a property of your feelings. It’s a property of routing tables, firewall rules, and what’s actually listening.

2) Optimization that backfired: disabling updates to “avoid reboots”

A fintech team had a strict uptime culture and a fear of surprise kernel updates. After a reinstall, they disabled unattended upgrades and told themselves they’d patch during a monthly window. Then monthly became quarterly, because there was always a campaign and always a reason.

They optimized for fewer reboots and got exactly that. They also got an increasingly stale userspace, including an OpenSSL build with known vulnerabilities. Nobody noticed because everything worked, and monitoring was focused on latency, not CVEs.

Eventually, a routine security scan lit the place up. The reaction was predictable: emergency patching, emergency reboots, emergency change approvals. The “optimization” produced the very thing it was trying to avoid, except now it arrived at 2 a.m. with management on the bridge call.

The fix wasn’t “turn updates back on and pray.” They implemented a patch pipeline: staging first, production second, and a predictable cadence with a rollback plan. The culture changed too: scheduled reboots became normal, and “surprise reboots” became an outage symptom, not an accepted risk.

If you’re going to disable automatic updates, you’re signing up to be better than automation. Most orgs aren’t.

3) Boring but correct practice that saved the day: separate mounts and strict ownership

A logistics company had a fleet of application servers that got rebuilt often—hardware swaps, OS upgrades, you name it. Their baseline included separate filesystems: / small and stable, /srv for application data, and /var sized for logs. It was the kind of setup nobody brags about at conferences.

One weekend, a new release shipped with debug logging accidentally left on. Log volume exploded. On a single-filesystem layout, that’s the classic chain reaction: / fills, services fail, SSH gets flaky, and recovery becomes a scramble because you can’t even write a temporary file.

Here, /var filled, alerts fired, the app degraded, and the OS stayed healthy. SSH remained stable. They rotated logs, flipped the log level back, and the system recovered without filesystem corruption or emergency rebuild.

The postmortem was short. Not because nothing happened—because the blast radius was constrained by boring decisions made during reinstall months earlier. Separate mounts, sane rotation, and correct ownership aren’t glamorous. They’re how you survive your own software.

Common mistakes: symptoms → root cause → fix

1) Symptom: “SSH worked yesterday, now I’m locked out”

Root cause: Firewall tightened without validating access from the real admin IP; or PasswordAuthentication disabled before keys were deployed; or AllowUsers configured incorrectly.

Fix: Use console access to revert. Validate sshd -T and attempt key-only login from a second session before closing the console. Restrict SSH by source IP only after you’ve confirmed stable access paths.

2) Symptom: “Service is running but unreachable”

Root cause: Process bound to 127.0.0.1 or wrong interface after reinstall; or host firewall default deny without explicit allow; or cloud firewall mismatch.

Fix: Check ss -lntup to confirm bind addresses. Confirm firewall rules on host and in cloud. Don’t assume “running” implies “reachable.”

3) Symptom: “Disk space keeps vanishing”

Root cause: Journald or application logs unbounded; core dumps enabled with large dumps; tmp files accumulating; deleted-but-open files.

Fix: Cap journald usage, configure logrotate, and use lsof +L1 to find deleted open files. Put heavy-write paths on separate mounts and alert early.

4) Symptom: “Permissions are correct but app can’t read/write”

Root cause: UID/GID mismatch after reinstall; container volume mounts owned by host root; ACLs missing; SELinux/AppArmor denial.

Fix: Verify id myapp and ownership on disk. If using SELinux/AppArmor, check denials and label profiles correctly rather than chmod-ing blindly.

5) Symptom: “TLS suddenly fails: certificate not yet valid / expired”

Root cause: Time not synchronized; timezone confusion; RTC misconfigured; NTP blocked.

Fix: Fix time sync first (timedatectl, chrony). Don’t rotate certificates until the clock is sane or you’ll chase ghosts.

6) Symptom: “Performance is terrible after reinstall”

Root cause: Wrong storage driver, missing firmware, degraded RAID/ZFS, or IO scheduler/queue settings reset; or swap thrashing due to changed memory settings.

Fix: Start with iostat, vmstat, and storage health checks. Verify RAID/ZFS status. Confirm you’re on the intended kernel/driver stack.

7) Symptom: “Backups succeed but restores fail”

Root cause: Backups capturing wrong paths, missing permissions/ACLs, or encryption keys not stored safely; restore procedure never validated.

Fix: Do a restore drill: restore to a temp path, validate integrity, and start the service using restored config. Treat restore steps as code and automate them.

FAQ

1) What should I lock down first after reinstall?

SSH exposure and inbound network policy. Patch the base OS, harden SSH (keys, no root, no passwords when possible), and set a default-deny firewall with explicit allows.

2) Should I disable password authentication on SSH immediately?

Only after you’ve verified key-based access from a separate session and you have console access. Disable passwords early, but don’t do it as your first change on a remote-only host.

3) Is it okay to allow SSH from anywhere if I use strong keys?

It’s survivable, not optimal. Strong keys reduce credential attacks, but you still get brute-force noise, exploit attempts, and increased exposure. Restrict by source IP or VPN where you can.

4) How do I decide between UFW, nftables, and firewalld?

Pick the one your team can operate consistently. UFW is simple and common on Ubuntu. nftables is powerful and direct. firewalld is fine in environments already standardized on it. Consistency beats preference.

5) After reinstall, why do my app permissions break even when files look correct?

UID/GID drift is a classic. The username may match, but the numeric IDs may not. Also check ACLs and SELinux/AppArmor denials. Fix identity mapping first; avoid chmod -R as a “solution.”

6) Should I encrypt disks on servers?

If you have physical access risk, multi-tenant concerns, compliance requirements, or you ship drives for RMA, yes. LUKS adds operational complexity (boot-time unlock, key management). Decide deliberately, then test boot and recovery.

7) How much logging is “enough” after reinstall?

Enough to answer: who logged in, what changed, what failed, and what the system did before it failed. At minimum: auth logs, system logs, service logs, and time sync. Ship off-host if compromise risk is real.

8) What’s the fastest way to catch an accidentally exposed service?

ss -lntup on the host, plus an external port scan from a known vantage point. Host view tells you what’s listening; external scan tells you what’s reachable through firewalls.

9) Do I really need a restore test right after reinstall?

Yes. Reinstall is when assumptions change: paths, permissions, encryption keys, service names. A restore test now is cheap. In an incident, it’s the difference between recovery and regret.

10) What’s one hardening change people overdo?

Security headers and crypto knob-twiddling before they’ve nailed identity, patching, firewalling, and backups. You don’t win by polishing the lock while leaving the garage door open.

Practical next steps

  1. Run the listening-port inventory and make every open port justify itself.
  2. Set default-deny inbound on the host firewall and explicitly allow SSH from known sources.
  3. Harden SSH with keys, no root login, and no passwords (after validation).
  4. Establish identity baseline: admin accounts, sudo rules, and service users with least privilege.
  5. Make storage boring: verify mounts, permissions, rotation, and disk health checks.
  6. Prove backups by restoring something real and starting a service from it.
  7. Turn on the boring signals: time sync, log retention limits, and off-host logging where appropriate.
  8. Write down the baseline: outputs of key commands, versions, firewall rules, and ownership expectations—so future you can diff reality against intent.

If you do nothing else: lock down remote access, set a default-deny firewall, and test a restore. Everything else is easier when you’re not actively on fire.
