Everyone loves “hardening” until the first time you push a change, reload sshd, and realize you just bricked your only remote access path. The machine is still running. You can feel it. But you can’t get in. That’s not security; that’s self-inflicted downtime.
This is a production-minded SSH hardening guide for Ubuntu 24.04 that assumes you want stronger auth, less attack surface, and better auditability—without gambling your access on a single edit. We’ll take the boring route on purpose: staged changes, verification commands, and rollback plans.
Hardening principles that prevent lockouts
SSH hardening is a reliability exercise disguised as a security task. The goal isn’t “maximum paranoia.” The goal is “minimum regret.” Here are the principles that keep you employed:
- Never change SSH in a single step. Stage changes. Validate. Only then enforce.
- Keep a second path in. A second SSH session, a different user, a different auth method, or console access (IPMI/KVM/cloud serial console). Preferably two.
- Test config before reload. Use sshd -t and print the effective config to confirm what will happen.
- Use “Match” blocks to scope risk. Enforce strict settings for most users, keep a break-glass user constrained but functional (a short sketch follows this list).
- Lock down the network first, then tighten auth. Reducing who can reach SSH is safer than guessing which clients can still negotiate your new crypto settings.
- Measure before you tune. If you don’t know which auth methods are used today, you’re about to break someone important at 2 a.m.
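As a sketch of that Match-block idea, assuming a hypothetical deployers group and an example admin network (adapt both to your org):

# Strict defaults for everyone
PasswordAuthentication no
PermitRootLogin no
AllowTcpForwarding no
# Scope the exception instead of loosening the global policy
Match Group deployers Address 10.20.0.0/16
    AllowTcpForwarding yes

The point isn’t the specific directives; it’s that exceptions live in a narrow, named block you can audit, while the defaults stay strict.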
One quote worth carrying around, often attributed to Gene Kranz: “Hope is not a strategy.” It’s short, a little harsh, and accurate in operations.
Joke #1: Changing sshd_config on a Friday is a great way to learn how good your on-call rotation really is.
Interesting facts and historical context (because it explains today’s defaults)
- SSH replaced Telnet largely because Telnet sent credentials in cleartext. That wasn’t “a little insecure”; it was “Wireshark is your password manager.”
- OpenSSH originated as a free fork of SSH in 1999. That fork became the default remote access tool on most Unix-like systems because it was open, audited, and boring—in the best way.
- Port 22 is not magical. It’s just the IANA-assigned default. Moving it reduces commodity scanning noise, not targeted attacks, and can complicate firewalls and tooling.
- SSH protocol versions used to be a real problem. SSH-1 is obsolete; SSH-2 is the standard. Modern Ubuntu/OpenSSH already disables the bad old days.
- “Root login” was historically convenient for admins. It also made auditing awful and turned every leaked credential into a full compromise. That’s why “no root over SSH” became default advice.
- Weak key algorithms got deprecated for good reasons. DSA is basically a museum piece. RSA is still around but increasingly policy-constrained; Ed25519 became popular because it’s fast, strong, and hard to misuse.
- Strict crypto settings can break older automation. Some embedded clients and legacy libraries don’t support modern KEX/MAC combos. That’s not your fault, but it is your outage if you flip the switch without inventory.
- Brute-force attacks didn’t get worse; they got cheaper. Cloud-scale scanning made “password auth on an internet-exposed SSH” an eventually-losing bet.
Baseline: know what you’re running and who can reach it
Hardening starts with a baseline. You need to know:
- Which OpenSSH server version you’re running (feature set, defaults, available directives).
- Which port(s) you’re listening on and which interfaces.
- Which authentication methods are in use right now.
- Which networks can reach SSH.
- How to regain access if you mess it up.
On Ubuntu 24.04, openssh-server is typically managed by systemd (ssh.service) and configured by /etc/ssh/sshd_config plus drop-in fragments under /etc/ssh/sshd_config.d/ (if you choose to use them). The “drop-in” approach is cleaner for change control: you can keep a minimal, vendor-friendly base and manage your own fragment.
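A quick way to confirm the drop-in directory is actually included, and to see what’s already living in it (output from a typical cloud image; yours will differ):

cr0x@server:~$ grep -i '^include' /etc/ssh/sshd_config
Include /etc/ssh/sshd_config.d/*.conf
cr0x@server:~$ ls /etc/ssh/sshd_config.d/
50-cloud-init.conf

Two details worth remembering: for most directives sshd keeps the first value it reads, and the Include sits at the top of the vendor file, so your fragment wins over later lines in sshd_config; fragments themselves are read in lexical filename order, which is why the numeric prefixes matter.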
Practical tasks (commands + output meaning + decisions)
These are real tasks you can run on Ubuntu 24.04. Each one includes: the command, what you should see, and what decision you make from it. This is the heart of “hardening without lockout”: you never change what you haven’t observed.
Task 1: Confirm OpenSSH server is installed and versioned
cr0x@server:~$ dpkg -l | grep -E '^ii\s+openssh-server'
ii openssh-server 1:9.6p1-3ubuntu13 amd64 secure shell (SSH) server, for secure access from remote machines
What it means: You’re running OpenSSH 9.6p1 packaged for Ubuntu. Version matters for supported directives and crypto defaults.
Decision: If openssh-server isn’t installed or is unexpectedly old, stop and fix that before “hardening.” Hardening a stale daemon is polishing a rusted lock.
Task 2: Check whether SSH is running and how it started
cr0x@server:~$ systemctl status ssh --no-pager
● ssh.service - OpenBSD Secure Shell server
Loaded: loaded (/usr/lib/systemd/system/ssh.service; enabled; preset: enabled)
Active: active (running) since Tue 2025-12-30 09:14:22 UTC; 2h 18min ago
Docs: man:sshd(8)
man:sshd_config(5)
Main PID: 1247 (sshd)
Tasks: 1 (limit: 18962)
Memory: 6.3M
CPU: 1.221s
CGroup: /system.slice/ssh.service
└─1247 "sshd: /usr/sbin/sshd -D [listener] 0 of 10-100 startups"
What it means: The service is active and managed by systemd. The PID and invocation are visible.
Decision: If it’s not running, don’t “harden” it—restore service health first. If you’re on a console-less host, you want SSH stable before edits.
Task 3: Identify listening port(s) and interfaces
cr0x@server:~$ ss -tulpn | grep -E 'sshd|:22'
tcp LISTEN 0 128 0.0.0.0:22 0.0.0.0:* users:(("sshd",pid=1247,fd=3))
tcp LISTEN 0 128 [::]:22 [::]:* users:(("sshd",pid=1247,fd=4))
What it means: SSH is listening on all IPv4 and IPv6 interfaces on port 22.
Decision: If this is an internet-facing host, you’re likely overexposed. Plan network restrictions (firewall + allowlists) and possibly binding to a management interface.
Task 4: See effective sshd configuration (not just the file)
cr0x@server:~$ sudo sshd -T | egrep -i 'port |listenaddress|passwordauthentication|pubkeyauthentication|permitrootlogin|authenticationmethods'
port 22
listenaddress 0.0.0.0
listenaddress ::
permitrootlogin prohibit-password
passwordauthentication yes
pubkeyauthentication yes
authenticationmethods any
What it means: This is the config sshd will actually use after reading all includes and defaults.
Decision: You harden from the effective configuration, not your assumptions. If PasswordAuthentication yes, you need a plan to disable it safely (after confirming keys and automation).
Task 5: Validate your current config syntax before touching anything
cr0x@server:~$ sudo sshd -t
What it means: No output is good. Output indicates syntax errors and sshd would refuse to start/reload.
Decision: If you ever see errors here, do not reload sshd. Fix syntax first. Syntax failures are the purest lockout generator.
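If you want to see what failure looks like before it ever matters, point -f at a deliberately broken scratch file instead of the live config (path and exact output are illustrative):

cr0x@server:~$ printf 'PasswordAuthentcation no\n' > /tmp/sshd-typo.conf
cr0x@server:~$ sudo sshd -t -f /tmp/sshd-typo.conf
/tmp/sshd-typo.conf line 1: Bad configuration option: PasswordAuthentcation
/tmp/sshd-typo.conf: terminating, 1 bad configuration options
cr0x@server:~$ rm /tmp/sshd-typo.conf

The exit code is non-zero, which is what makes sshd -t usable as a gate in scripts and pipelines.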
Task 6: Confirm you have a working key-based login (from another terminal)
cr0x@server:~$ ssh -o PreferredAuthentications=publickey -o PasswordAuthentication=no -v ops@127.0.0.1 'echo OK'
OpenSSH_9.6p1 Ubuntu-3ubuntu13, OpenSSL 3.0.13 30 Jan 2024
debug1: Authenticating to 127.0.0.1:22 as 'ops'
debug1: Authentication succeeded (publickey).
OK
What it means: You can authenticate with a key and you’re explicitly refusing password fallback.
Decision: If this fails, do not disable password auth yet. Fix keys, agent, or authorized_keys first.
Task 7: Inspect authorized_keys permissions (silent killers)
cr0x@server:~$ sudo ls -ld /home/ops /home/ops/.ssh /home/ops/.ssh/authorized_keys
drwxr-x--- 6 ops ops 4096 Dec 30 09:01 /home/ops
drwx------ 2 ops ops 4096 Dec 30 09:03 /home/ops/.ssh
-rw------- 1 ops ops 742 Dec 30 09:03 /home/ops/.ssh/authorized_keys
What it means: Permissions look sane: home not world-writable, .ssh is 700, authorized_keys is not writable by others.
Decision: If you see group/world writable bits or wrong ownership, fix them. sshd will ignore keys it considers unsafe, and it won’t throw a party to tell you.
Task 8: Confirm which users are allowed and which groups matter
cr0x@server:~$ getent group sudo
sudo:x:27:ops,deploy
What it means: ops and deploy are in sudo; these are likely interactive admin users.
Decision: If you plan to use AllowGroups, make sure you know which groups to permit. Mis-scoping this is a classic “why can’t I log in?” moment.
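If you do move to AllowGroups, prove your own account passes the check first; a quick pre-flight, assuming a hypothetical ssh-users group (gid and membership illustrative):

cr0x@server:~$ getent group ssh-users
ssh-users:x:1050:ops,deploy,breakglass
cr0x@server:~$ id -nG ops | tr ' ' '\n' | grep -x ssh-users
ssh-users

If the second command prints nothing, the AllowGroups change would lock that user out, quietly, on the next login.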
Task 9: Check firewall state and existing SSH allowances
cr0x@server:~$ sudo ufw status verbose
Status: active
Logging: on (low)
Default: deny (incoming), allow (outgoing), disabled (routed)
New profiles: skip
To Action From
-- ------ ----
22/tcp ALLOW IN 10.20.0.0/16
22/tcp ALLOW IN 192.168.50.0/24
What it means: UFW is active, default-deny incoming, and SSH is allowed only from two private networks.
Decision: This is good. If you see 22/tcp ALLOW IN Anywhere on an internet host, fix firewalling before you tweak crypto.
Task 10: Observe auth events and brute-force noise
cr0x@server:~$ sudo journalctl -u ssh --since "2 hours ago" | tail -n 12
Dec 30 10:44:01 server sshd[8621]: Failed password for invalid user admin from 203.0.113.77 port 51244 ssh2
Dec 30 10:44:06 server sshd[8621]: Failed password for invalid user admin from 203.0.113.77 port 51244 ssh2
Dec 30 10:44:10 server sshd[8621]: Connection closed by invalid user admin 203.0.113.77 port 51244 [preauth]
Dec 30 11:02:33 server sshd[9012]: Accepted publickey for ops from 10.20.8.41 port 49912 ssh2: ED25519 SHA256:2d3v...
What it means: You have password brute-force attempts (likely internet exposure or VPN exposure) and legitimate key logins.
Decision: If you see repeated password attempts, disabling password auth becomes higher priority. If you only see key logins, you can enforce stricter auth with less risk.
Task 11: Check current SSH client compatibility (from your workstation)
cr0x@server:~$ ssh -G ops@server | egrep -i 'kexalgorithms|ciphers|macs|hostkeyalgorithms' | head
ciphers chacha20-poly1305@openssh.com,aes128-gcm@openssh.com,aes256-gcm@openssh.com,aes128-ctr,aes192-ctr,aes256-ctr
macs umac-64-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com
kexalgorithms sntrup761x25519-sha512@openssh.com,curve25519-sha256,curve25519-sha256@libssh.org,diffie-hellman-group-exchange-sha256
hostkeyalgorithms ssh-ed25519,ecdsa-sha2-nistp256,rsa-sha2-512,rsa-sha2-256
What it means: Your client supports modern algorithms, including hybrid post-quantum KEX options (OpenSSH’s approach) and Ed25519.
Decision: Before tightening server algorithms, inventory client support. Your workstation being modern doesn’t mean your fleet automation is.
Task 12: Create a safe “sshd on a different port” canary (temporary)
cr0x@server:~$ sudo /usr/sbin/sshd -D -p 2222 -o PermitRootLogin=no -o PasswordAuthentication=no -o PubkeyAuthentication=yes
What it means: This starts a foreground sshd listener on port 2222 with strict auth settings, without touching your primary daemon.
Decision: Use this to test new policies while keeping port 22 as your safety net. Once validated, translate settings into config and reload the real service.
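From a second terminal, exercise the canary exactly the way you intend to log in after the change (port and options mirror the command above; expect a first-time host key prompt for the new port):

cr0x@server:~$ ssh -p 2222 -o PreferredAuthentications=publickey -o PasswordAuthentication=no ops@127.0.0.1 'echo CANARY-OK'
CANARY-OK

If you want to test from your real admin network instead of loopback, allow port 2222 in UFW first and remove that rule when the canary comes down.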
Task 13: Test reload safely (after changes) and confirm it took effect
cr0x@server:~$ sudo sshd -t && sudo systemctl reload ssh && sudo sshd -T | egrep -i 'passwordauthentication|permitrootlogin'
passwordauthentication no
permitrootlogin no
What it means: Syntax is valid, reload succeeded, and effective config now shows stricter auth.
Decision: If the reload fails, don’t panic—your existing sessions should stay up. Fix, then reload again. If you did a restart and it didn’t come back, you’re in “console time.” Avoid restarts unless necessary.
Task 14: Confirm you didn’t accidentally block yourself at the network layer
cr0x@server:~$ sudo ufw status numbered
Status: active
To Action From
[ 1] 22/tcp ALLOW IN 10.20.0.0/16
[ 2] 22/tcp ALLOW IN 192.168.50.0/24
What it means: You can clearly see which rules apply and in what order.
Decision: If you’re about to tighten rules, add new allow rules first, test connectivity, then remove the old broad ones. Subtractive firewall edits are how you discover your VPN doesn’t come from the subnet you assumed.
sshd_config: settings worth changing (and the ones that cause pain)
Ubuntu 24.04 + OpenSSH are already decent. The trick is choosing changes that materially reduce risk without detonating access. Here’s what I recommend and why, plus the failure modes.
Use a drop-in file, not a messy monolith
Create a managed fragment like /etc/ssh/sshd_config.d/50-hardening.conf. It plays well with packaging, makes diffs readable, and keeps your changes explicit.
cr0x@server:~$ sudo install -m 0644 -o root -g root /dev/null /etc/ssh/sshd_config.d/50-hardening.conf
High-value settings (safe when staged)
- Disable password auth after verifying keys: PasswordAuthentication no.
- Disable root login for interactive SSH: PermitRootLogin no. If you need privileged access, use sudo with audited users.
- Limit who can log in: AllowUsers or AllowGroups. This reduces blast radius if a credential leaks.
- Reduce attack surface: X11Forwarding no unless you truly need it; many don’t. Also consider AllowTcpForwarding restrictions per-user.
- Set sane timeouts: ClientAliveInterval and ClientAliveCountMax to clear dead sessions.
- Log with intent: LogLevel VERBOSE can help auditing (but can add noise; don’t do it blindly on high-volume bastions).
Settings that are “secure” but frequently cause lockouts
- Over-tightening algorithms without client inventory. If you remove RSA SHA-2 support carelessly or disallow a KEX your older clients require, you’ll get handshake failures.
- Setting AuthenticationMethods incorrectly. It’s powerful and sharp. A typo can require “two factors” that you never actually configured (see the sketch after this list).
- Using AllowUsers and forgetting service accounts (automation, backups, config management). Your system won’t crash; your pipeline will.
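For reference, a minimal AuthenticationMethods sketch that avoids the self-inflicted two-factor trap; the mfa-users group is hypothetical, and the Match block only makes sense if keyboard-interactive MFA via PAM is actually configured:

# Everyone must present a valid public key
AuthenticationMethods publickey
# Key + TOTP only for a group that really has TOTP enrolled
Match Group mfa-users
    # also needs KbdInteractiveAuthentication yes and a working PAM setup
    AuthenticationMethods publickey,keyboard-interactive

If you require a method nobody can complete, sshd will happily enforce it, and every login will fail exactly as configured.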
A solid baseline drop-in (adapt it)
This is conservative enough for most fleets and strict enough to matter. Apply it only after you’ve confirmed key auth works for all required accounts.
cr0x@server:~$ sudo tee /etc/ssh/sshd_config.d/50-hardening.conf >/dev/null <<'EOF'
# Managed hardening settings (Ubuntu 24.04)
# Safer defaults: no root, no passwords
PermitRootLogin no
PasswordAuthentication no
KbdInteractiveAuthentication no
ChallengeResponseAuthentication no
PubkeyAuthentication yes
# Reduce opportunistic abuse
MaxAuthTries 4
LoginGraceTime 20
MaxStartups 10:30:60
# Hygiene
X11Forwarding no
PermitTunnel no
PermitUserEnvironment no
AllowAgentForwarding no
# Keepalive to clear dead sessions
ClientAliveInterval 300
ClientAliveCountMax 2
# Prefer explicit allowlists (set these to match your org)
# AllowGroups ssh-users
LogLevel VERBOSE
EOF
How to apply without drama: Keep an active session open, validate config (sshd -t), reload (not restart), and test new logins in a second terminal.
Authentication strategy: keys, passwords, MFA, and break-glass
SSH auth isn’t a single toggle. It’s a strategy. In production, you want:
- Primary auth that is strong and automated (public keys, ideally backed by a sane provisioning process).
- Defense against credential reuse and brute-force (no passwords, or at least no passwords from the public internet).
- Break-glass access that is controlled, audited, and tested (because it will be needed at the worst time).
Keys: do them properly
Ed25519 keys are the modern default for interactive users. They’re small, fast, and resistant to a lot of historical foot-guns. For automation, you may still have RSA keys in the wild; that’s okay if they’re using SHA-2 signatures and not ancient sizes.
cr0x@server:~$ ssh-keygen -t ed25519 -a 64 -C "ops@laptop"
Generating public/private ed25519 key pair.
Enter file in which to save the key (/home/cr0x/.ssh/id_ed25519):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/cr0x/.ssh/id_ed25519
Your public key has been saved in /home/cr0x/.ssh/id_ed25519.pub
What it means: -a 64 increases KDF rounds for passphrase protection. That matters if the private key ever leaks.
Decision: For humans: use a passphrase. For automation: consider agent-based approaches or scoped keys with forced commands and source restrictions.
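For automation, a restricted authorized_keys entry is the pattern worth copying; a sketch, where the key, the source network, and the command path are all placeholders:

# /home/svc-backup/.ssh/authorized_keys (hypothetical service account)
restrict,from="10.20.8.0/24",command="/usr/local/bin/backup-pull" ssh-ed25519 AAAAC3...example backup@ci

restrict disables pty allocation and all forwarding in one word, command= pins the key to a single job, and from= limits which source addresses may use it. A leaked key like this is an inconvenience, not a shell.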
Deploy keys with a controlled method
If you use ssh-copy-id, do it while password auth is still enabled (temporarily), then disable passwords once verified.
cr0x@server:~$ ssh-copy-id -i ~/.ssh/id_ed25519.pub ops@server
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/cr0x/.ssh/id_ed25519.pub"
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'ops@server'"
Decision: Immediately test with PasswordAuthentication=no as shown earlier. Don’t trust “added: 1” as proof of access.
MFA: useful, but don’t improvise in production
MFA for SSH can be done via PAM (e.g., TOTP) or via hardware-backed keys (FIDO2/U2F with OpenSSH). The reliability trap: PAM changes are global login plumbing. A small mistake can break not only SSH but also local console logins depending on how you configure it.
Pragmatic approach:
- Start with key-only SSH + network allowlists + monitoring. That’s already a big improvement.
- Add MFA on bastions first, not on every node, unless you have a mature rollout and recovery plan.
- If using FIDO keys, ensure every admin has two devices and you have a process for lost keys.
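If you go the hardware-backed route, generating a FIDO-backed key is one command against a plugged-in authenticator (requires OpenSSH 8.2+ on both client and server; the comment string is a placeholder):

cr0x@server:~$ ssh-keygen -t ed25519-sk -C "ops@laptop-fido"

ssh-keygen will ask you to touch the authenticator, then prompt for a file and passphrase as usual. The file it writes is only a handle; the private key material stays on the token, which is exactly why every admin needs a registered backup device.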
Break-glass: the boring account that saves you
Create a dedicated breakglass user with:
- Key-only auth
- Restricted source addresses (VPN or office IP ranges)
- Sudo rights gated by your policy (often full sudo, but tightly controlled and audited)
- No day-to-day use
cr0x@server:~$ sudo adduser --disabled-password --gecos "" breakglass
Adding user `breakglass' ...
Adding new group `breakglass' (1002) ...
Adding new user `breakglass' (1002) with group `breakglass (1002)' ...
Creating home directory `/home/breakglass' ...
Copying files from `/etc/skel' ...
Adding new user `breakglass' to supplemental / extra groups `users' ...
Adding user `breakglass' to group `users' ...
Decision: Keep this account out of normal workflows. If it shows up in regular login logs, treat it like a smoke alarm.
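A sketch for seeding the break-glass account with a source-restricted key (the public key string is a placeholder; adjust the network to your management ranges):

cr0x@server:~$ sudo install -d -m 0700 -o breakglass -g breakglass /home/breakglass/.ssh
cr0x@server:~$ echo 'from="10.20.0.0/16" ssh-ed25519 AAAAC3...example breakglass-vault' | sudo tee /home/breakglass/.ssh/authorized_keys >/dev/null
cr0x@server:~$ sudo chown breakglass:breakglass /home/breakglass/.ssh/authorized_keys
cr0x@server:~$ sudo chmod 0600 /home/breakglass/.ssh/authorized_keys

Keep the private key wherever your break-glass procedure keeps secrets, not on individual laptops, and test the login on a schedule, not just after incidents.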
Network controls: UFW, allowlists, and “don’t expose port 22 to the planet”
Network restrictions are the least glamorous hardening change and the most effective. If the attacker can’t reach the port, your auth policy becomes a second line of defense instead of the first.
Bind SSH to a management IP (when you have one)
If the host has a dedicated management interface/VLAN, bind SSH there. This reduces exposure even if firewall rules drift.
cr0x@server:~$ sudo tee /etc/ssh/sshd_config.d/10-listen.conf >/dev/null <<'EOF'
ListenAddress 10.20.8.10
EOF
Decision: Only do this if you are sure that IP is stable and reachable from your admin networks. Otherwise you’ll invent new swear words.
UFW allowlist pattern (add before remove)
If SSH is currently open, don’t slam it shut. Add a narrow allow rule first, test, then remove broad access.
cr0x@server:~$ sudo ufw allow from 10.20.0.0/16 to any port 22 proto tcp
Rule added
What it means: Admin network is allowed.
Decision: Confirm you can connect from that network. Only then remove Anywhere rules.
Consider a bastion instead of direct node access
In many corporate environments, the best SSH hardening is architectural: make most servers non-routable from user networks, and require a bastion with stricter monitoring. SSH becomes a controlled choke point instead of a thousand scattered entry points.
Joke #2: A bastion host is like the office receptionist—everyone complains about it until the day you remove it.
Abuse resistance: rate limits, Fail2ban, and log-driven responses
If your SSH is reachable, it will be scanned. Even with key-only auth, attackers will hammer usernames and attempt weird edge cases. Your job is to make that noise cheap to ignore and expensive to sustain.
Use systemd journal + rate limiting as your first line
OpenSSH has built-in knobs (MaxStartups, LoginGraceTime, MaxAuthTries). These reduce resource abuse and slow brute-force attempts.
Don’t set them to absurdly low values on bastions; you’ll punish your own users during incident response when many people connect at once.
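For reference, MaxStartups takes a start:rate:full triple; the value used in the baseline drop-in earlier reads like this:

# MaxStartups 10:30:60
# Up to 10 pending unauthenticated connections: accept everything.
# From 10 to 60: refuse new unauthenticated connections with a probability
# that starts at 30% and rises linearly toward 100%.
# At 60 pending unauthenticated connections: refuse everything new.
MaxStartups 10:30:60

Authenticated sessions don’t count against this; it only throttles connections that haven’t finished logging in yet.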
Fail2ban: still useful, but not a substitute for allowlists
Fail2ban works by parsing logs and adding firewall rules dynamically. It’s helpful for internet-facing endpoints, but it’s reactive and it can be gamed. Treat it as a mop, not a dam.
cr0x@server:~$ sudo apt-get update -y && sudo apt-get install -y fail2ban
Reading package lists... Done
Building dependency tree... Done
The following NEW packages will be installed:
fail2ban
cr0x@server:~$ sudo systemctl enable --now fail2ban
Created symlink /etc/systemd/system/multi-user.target.wants/fail2ban.service → /usr/lib/systemd/system/fail2ban.service.
cr0x@server:~$ sudo fail2ban-client status sshd
Status for the jail: sshd
|- Filter
| |- Currently failed: 0
| |- Total failed: 24
| `- File list: /var/log/auth.log
`- Actions
|- Currently banned: 1
|- Total banned: 3
`- Banned IP list: 203.0.113.77
What it means: The jail is working, tracking failures, and banning abusive IPs.
Decision: If you see internal corporate IPs getting banned, your allowlists/VPN/NAT behavior may be causing shared egress IPs. Tune the jail or, better, fix the network path.
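One way to stop self-inflicted bans is a jail override that ignores your admin networks; a sketch that mirrors the UFW ranges above (retry and ban times are examples, not recommendations):

cr0x@server:~$ sudo tee /etc/fail2ban/jail.d/sshd-local.conf >/dev/null <<'EOF'
[sshd]
enabled  = true
ignoreip = 127.0.0.1/8 10.20.0.0/16 192.168.50.0/24
maxretry = 5
findtime = 10m
bantime  = 1h
EOF
cr0x@server:~$ sudo systemctl restart fail2ban

ignoreip is a blunt tool: it also hides genuinely compromised internal hosts from Fail2ban, which is another reason to treat it as a mop, not a dam.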
Crypto policy: ciphers, MACs, KEX, and host keys (without breaking clients)
Crypto hardening is where well-meaning people cause real outages. The right approach is incremental: inventory client capabilities, pick a policy, deploy in stages, and watch for handshake failures.
First rule: don’t outsmart your fleet
Ubuntu 24.04’s OpenSSH is already modern. Unless you have a compliance requirement, you often don’t need to explicitly set Ciphers/MACs/KexAlgorithms at all. Defaults evolve with patches; pinning can freeze you in time.
When you do need to set them (compliance, auditors, consistent cross-distro policy), do it with an inventory step and a rollback plan.
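For the inventory step, ssh -Q shows what the installed OpenSSH build can do, as opposed to what the server currently enables (which is what sshd -T reports); output from an OpenSSH 9.6 client, illustrative:

cr0x@server:~$ ssh -Q cipher
3des-cbc
aes128-cbc
aes192-cbc
aes256-cbc
aes128-ctr
aes192-ctr
aes256-ctr
aes128-gcm@openssh.com
aes256-gcm@openssh.com
chacha20-poly1305@openssh.com

The same flag works for kex, mac, and key (ssh -Q kex, ssh -Q mac, ssh -Q key). Compare those lists against your oldest automation clients before pinning anything server-side.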
Inventory failures: watch for “no matching…” messages
cr0x@server:~$ sudo journalctl -u ssh --since "1 hour ago" | grep -i 'unable to negotiate'
Dec 30 11:20:37 server sshd[9418]: Unable to negotiate with 10.20.9.33 port 40112: no matching key exchange method found. Their offer: diffie-hellman-group1-sha1,diffie-hellman-group14-sha1 [preauth]
What it means: The client is ancient and only offers weak KEX. Your hardened server refuses it. That’s probably correct.
Decision: Upgrade the client or isolate it behind a controlled bastion that can speak to it. Don’t weaken the fleet for a single fossil.
Host keys: keep them sane, keep them stable
Host keys are the server’s identity. Rotating them without a plan triggers “REMOTE HOST IDENTIFICATION HAS CHANGED!” warnings, which users will learn to ignore. That’s the opposite of what you want.
cr0x@server:~$ sudo ssh-keygen -l -f /etc/ssh/ssh_host_ed25519_key.pub
256 SHA256:8t7mZlQZ9l8m0n3H9Jx7qWzYkqvHqQx0mJv4pYkq8mE root@server (ED25519)
Decision: Ensure you have modern host keys (Ed25519, ECDSA, RSA SHA-2). If you plan rotation, coordinate known_hosts updates via config management, not email.
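To record every current host key fingerprint in one pass before any planned rotation (fingerprints illustrative):

cr0x@server:~$ for k in /etc/ssh/ssh_host_*_key.pub; do ssh-keygen -l -f "$k"; done
256 SHA256:QfWtMkVnYw1q7h3o5Yp9rTzXcLm2nB4vK6jHs8dF0aU root@server (ECDSA)
256 SHA256:8t7mZlQZ9l8m0n3H9Jx7qWzYkqvHqQx0mJv4pYkq8mE root@server (ED25519)
3072 SHA256:V2m9xQp4Lr7TzKc1nHj5Yw8eRb3oNs6aU0dFgIqZt2Y root@server (RSA)

Stash that output with your change record; it’s what you compare against when someone reports a host key warning later.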
Operational guardrails: change control, testing, and rollback
“Hardening” is not a hero move. It’s a controlled rollout.
Keep an active session and open a second one before reload
When you reload sshd, existing sessions typically remain. That’s your lifeline. Open a second session before changes so you can test new logins while keeping a known-good channel open.
Prefer reload over restart
systemctl reload ssh asks sshd to re-read configuration without dropping existing connections. Restart is more disruptive and more likely to strand you if a startup failure occurs.
Know your out-of-band access before you need it
Cloud console, IPMI, hypervisor console, or physical access. Test it quarterly. Yes, quarterly. The first time you use it should not be during an outage window while Slack fills with opinions.
Back up config files with a timestamp
cr0x@server:~$ sudo cp -a /etc/ssh/sshd_config /etc/ssh/sshd_config.bak.$(date +%F_%H%M%S)
Decision: If something goes sideways, you want a known-good file, not a memory of what you changed.
Fast diagnosis playbook
When SSH “breaks,” it’s usually one of four categories: daemon down, network blocked, auth rejected, or crypto negotiation failure. Diagnose in that order. Fast. No wandering.
1) Is sshd up and listening?
- On the server console (or via existing session): check service status and listening sockets.
cr0x@server:~$ systemctl is-active ssh; ss -tulpn | grep sshd
active
tcp LISTEN 0 128 10.20.8.10:22 0.0.0.0:* users:(("sshd",pid=1247,fd=3))
If it’s not active/listening: config syntax, permissions, or package issues. Jump to sshd -t and journal logs.
2) Is the network path open?
- From client side: do you get a TCP connection or a timeout?
- From server side: do firewall rules permit your source IP?
cr0x@server:~$ sudo ufw status verbose
Status: active
Default: deny (incoming), allow (outgoing), disabled (routed)
Timeouts usually mean network/firewall. Immediate “connection refused” usually means sshd not listening (or port mismatch).
3) Is it auth policy?
- From client: use verbose SSH.
- From server: check ssh logs for “Authentication refused” or “user not allowed.”
cr0x@server:~$ ssh -vvv ops@server
debug1: Authentications that can continue: publickey
debug1: Offering public key: /home/cr0x/.ssh/id_ed25519 ED25519 SHA256:2d3v...
debug1: Server accepts key: /home/cr0x/.ssh/id_ed25519
debug1: Authentication succeeded (publickey).
If you see “Permission denied (publickey)”: check key deployment, file permissions, AllowUsers/AllowGroups, and any Match blocks.
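The server-side view of the same check, which is where AllowUsers/AllowGroups rejections show up in plain language (output illustrative):

cr0x@server:~$ sudo journalctl -u ssh --since "30 minutes ago" | grep -Ei 'not allowed|denied|accepted'
Dec 30 12:05:41 server sshd[10233]: User deploy from 10.20.9.14 not allowed because none of user's groups are listed in AllowGroups
Dec 30 12:06:02 server sshd[10241]: Accepted publickey for ops from 10.20.8.41 port 50112 ssh2: ED25519 SHA256:2d3v...

One rejected, one accepted: that contrast tells you the policy is doing exactly what you wrote, whether or not that’s what you meant.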
4) Is it crypto negotiation?
Look for “no matching” messages: KEX, host key algorithm, cipher, or MAC. This is often a client age problem or an overly pinned server policy.
Common mistakes: symptom → root cause → fix
1) Symptom: Existing sessions stay up, but new logins fail immediately
Root cause: You reloaded sshd with a stricter policy (PasswordAuthentication no, AllowGroups, or AuthenticationMethods) and your test user doesn’t match it.
Fix: In your existing session, run sshd -T, confirm effective settings, and adjust allowlists or restore password auth temporarily while you fix key access.
2) Symptom: “Connection refused”
Root cause: sshd is not listening on that interface/port (bad ListenAddress, wrong port, service down).
Fix: ss -tulpn to confirm listeners; journalctl -u ssh for startup errors; correct config, then reload or restart if needed.
3) Symptom: “Connection timed out”
Root cause: Firewall/security group/network ACL blocking, or routing/VPN issue.
Fix: Verify UFW rules and upstream rules. Add a temporary narrow allow from your current IP, test, then clean up.
4) Symptom: “Permission denied (publickey)” but your key is present
Root cause: sshd is ignoring authorized_keys due to unsafe permissions or wrong ownership; or the user is not allowed by AllowUsers/AllowGroups.
Fix: Fix permissions to 700 on .ssh and 600 on authorized_keys. Verify ownership. Confirm allow rules.
5) Symptom: “REMOTE HOST IDENTIFICATION HAS CHANGED!” after a maintenance event
Root cause: Host keys rotated unexpectedly (rebuild, restore, or image change) without propagating known_hosts updates.
Fix: Validate the new host key fingerprint out-of-band, then update known_hosts centrally. Don’t train users to ignore warnings.
6) Symptom: Some automation breaks, humans are fine
Root cause: You disabled password auth and the automation was still using it; or the automation client library can’t negotiate the new algorithms.
Fix: Identify the failing accounts in logs, migrate automation to keys or SSH certificates, and test with the exact client version used in CI/CD.
Checklists / step-by-step plan
Phase 0: Pre-flight (don’t skip this)
- Confirm console/out-of-band access exists and works (cloud serial, IPMI, hypervisor console).
- Open two SSH sessions from different terminals.
- Back up SSH config and note current effective settings: sudo cp -a /etc/ssh/sshd_config ... then sudo sshd -T | sudo tee /root/sshd-effective.before >/dev/null.
- Inventory who logs in and how (journal entries, bastion logs, automation accounts).
Phase 1: Network-first tightening
- Implement allowlists in UFW (or cloud security groups) to limit SSH sources to admin/VPN networks.
- Test connectivity from each allowed network.
- Only then remove broad rules.
Phase 2: Key auth everywhere
- Ensure every required human and automation account has a working key-based login.
- Verify with PasswordAuthentication=no on the client side.
- Fix permissions and ownership issues proactively.
Phase 3: Enforce no-password, no-root
- Add a managed drop-in with PasswordAuthentication no and PermitRootLogin no.
- Run sudo sshd -t.
- Reload sshd, not restart.
- Test a fresh login as a normal admin user and as the break-glass user.
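A scripted version of that login test works as a gate before (and after) enforcement; a sketch, where the host and user lists are placeholders and your keys are assumed to be loaded in an agent:

cr0x@server:~$ tee ~/check-key-logins.sh >/dev/null <<'EOF'
#!/usr/bin/env bash
# Fail loudly if any required account cannot log in with a public key alone.
set -u
hosts="server"                  # space-separated hosts to check
users="ops deploy breakglass"   # accounts that must keep working
rc=0
for h in $hosts; do
  for u in $users; do
    if ssh -o BatchMode=yes -o PreferredAuthentications=publickey \
           -o PasswordAuthentication=no -o ConnectTimeout=5 "$u@$h" true; then
      echo "OK   $u@$h"
    else
      echo "FAIL $u@$h"
      rc=1
    fi
  done
done
exit $rc
EOF
cr0x@server:~$ bash ~/check-key-logins.sh
OK   ops@server
OK   deploy@server
OK   breakglass@server

BatchMode=yes is the important option: it refuses any interactive prompt, so a “pass” really means key auth worked end to end.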
Phase 4: Reduce SSH features you don’t need
- Disable X11 forwarding unless required.
- Disable agent forwarding by default. If your workflow needs it, enable it per-user using Match blocks.
- Restrict port forwarding if SSH is used only for shell access.
Phase 5: Abuse resistance and monitoring
- Set MaxAuthTries, LoginGraceTime, MaxStartups sensibly.
- Deploy Fail2ban where appropriate (public-facing endpoints), but don’t rely on it as your primary control.
- Review logs weekly until stable, then monthly.
Three corporate mini-stories from the land of regrettable SSH changes
Mini-story 1: The incident caused by a wrong assumption
The company had a mixed estate: some Ubuntu, some RHEL, a handful of appliances nobody wanted to admit were still in service. An engineer decided to standardize SSH policies during a “security sprint.” Reasonable goal. They added AllowGroups ssh-admins across the fleet, confident that “all admins are in that group.”
They were right on paper. In practice, there were three different identity sources: local users on older hosts, LDAP on newer ones, and an IAM integration on bastions. On several servers, the group existed—but the admin accounts weren’t members because those accounts were local breakouts created during past incidents. Nobody had cleaned them up. The assumption was that identity was centralized everywhere. It wasn’t.
The change rolled out. Existing SSH sessions stayed connected, which gave everyone false confidence. New sessions failed. The on-call team started rotating terminals like air traffic controllers because they didn’t want to lose their last remaining sessions. A routine kernel patching run was halfway through and couldn’t re-establish connections to finish.
The fix was boring and slightly humiliating: revert the AllowGroups change, audit group membership properly, then reintroduce allowlists using a staged Match Address block for the break-glass path first. The lasting lesson wasn’t “never use AllowGroups.” It was “never assume identity is uniform unless you’ve proven it.”
Mini-story 2: The optimization that backfired
A different org ran a heavily used bastion. They had thousands of short-lived SSH connections: humans, automation, and scripts doing “one command then exit.” Someone noticed CPU spikes under load and decided to “optimize” by tightening timeouts and reducing concurrent unauthenticated connections.
They set LoginGraceTime aggressively low and MaxStartups too strict, thinking they were only hurting bots. During a routine incident, a VPN flap caused many users’ SSH clients to reconnect simultaneously. The bastion started dropping new connections right when it was needed most. The security team loved the new settings; the incident commander did not.
Worse, the failure mode looked like network instability. Users saw intermittent disconnects and timeouts. People blamed the VPN, then DNS, then the cloud network. The root cause was self-inflicted admission control from sshd under bursty reconnect conditions.
They rolled back to a more forgiving MaxStartups curve and increased grace time slightly. CPU spikes remained manageable, and the bastion became predictable again. The moral: optimizing SSH like it’s a benchmark is how you discover your workload includes humans behaving like a thundering herd.
Mini-story 3: The boring but correct practice that saved the day
A finance company had a strict rule: every SSH policy change required (1) a canary host, (2) an out-of-band access test, and (3) a scripted verification of key-only login before disabling passwords. It sounded bureaucratic. It also worked.
They planned to disable password authentication fleet-wide. Before enforcement, their pipeline ran a test that attempted login with PasswordAuthentication=no for every privileged account that mattered. One canary failed: a legacy automation job still used password auth to reach a reporting box. Nobody knew because it was “set and forget” from years ago.
Because the test ran before enforcement, nothing broke. They fixed the job to use a dedicated key with restricted command permissions, reran the test, and only then pushed the config everywhere.
There was no drama, no emergency console access, and no late-night change revert. The security improvement landed quietly. That’s what “correct” looks like in production: not heroic, just repeatable.
FAQ
1) Should I change the SSH port from 22?
If you’re internet-exposed, changing ports reduces background scan noise but doesn’t stop targeted attackers. Do it only if it doesn’t complicate your tooling. Prefer network allowlists and key-only auth.
2) Is disabling password authentication always safe?
It’s safe when you’ve verified key-based access for every required user and automation path, and you have a break-glass plan. It’s risky when you “think” keys are deployed but haven’t tested with password fallback disabled.
3) What’s the safest way to apply changes: reload or restart?
Reload first. systemctl reload ssh keeps existing sessions. Restart is fine when necessary, but it increases the chance of complete lockout if config is broken.
4) Should I set explicit Ciphers/MACs/KexAlgorithms?
Only if you must for policy/compliance or cross-distro consistency. Defaults are generally good and improve with updates. Pinning algorithms can create a future outage when clients differ.
5) How do I prevent root access without losing admin capability?
Disable root login over SSH (PermitRootLogin no) and use named accounts with sudo. You get audit trails and you can revoke access cleanly.
6) What about SSH agent forwarding?
Disable it by default (AllowAgentForwarding no). If certain workflows require it, enable it in a targeted Match User block and keep it off bastions unless you truly trust the hop.
7) Do I need Fail2ban if I already have UFW allowlists?
If SSH is strictly allowlisted to private networks, Fail2ban’s value drops. If SSH is reachable from the internet (even temporarily), Fail2ban is helpful as an extra layer, not a primary defense.
8) What’s the single best “won’t lock you out” trick?
Run a temporary canary sshd on another port with your intended strict settings, test it end-to-end, then apply settings to the real service.
9) How do I know if I broke crypto compatibility?
Client-side ssh -vvv will show negotiation failures like “no matching key exchange method.” Server logs can also show rejected algorithms. The fix is usually upgrading the client, not weakening the server.
10) Can I enforce different SSH policies for different users?
Yes. Use Match User, Match Group, or Match Address blocks to scope stricter or looser settings. This is how you keep break-glass access controlled while enforcing strong defaults for everyone else.
Conclusion: what to do next
SSH hardening on Ubuntu 24.04 doesn’t need to be theatrical. Do it like an operator: reduce exposure first, prove key access, then enforce stricter auth. Avoid algorithm pinning unless you have a reason. Prefer reloads over restarts. Keep a break-glass path that you actually test, not just talk about.
Practical next steps:
- Run the baseline tasks: confirm effective config, listeners, firewall scope, and real login patterns.
- Add/verify a break-glass user with key-only auth and restricted source networks.
- Roll out a drop-in hardening file in canary → small batch → full fleet stages, with sshd -t and a scripted login test gate.
- After enforcement, spend one week watching auth logs daily. The system will tell you what you broke—if you’re listening.