It’s 09:12. You’ve got a deploy in flight, a teammate waiting, and your terminal just started replying with the two least helpful words in operations: Permission denied. Yesterday you could SSH into the box. Today you can’t. Nothing “changed,” except everything always changes.
This is a field guide for Ubuntu 24.04 (Noble) when SSH stops letting you in. Not theory—production reality: keys that used to work, accounts that used to exist, configs that “should be fine,” and policy layers that silently block you while insisting you’re wrong.
Fast diagnosis playbook (first/second/third)
When SSH flips from “works” to “permission denied,” don’t start by editing random files. Start by proving what’s happening, where it’s failing, and which side is lying. Here’s the fastest path I know that doesn’t create new problems while chasing the old one.
First: confirm you’re hitting the right machine and right user
- Resolve target and capture the path. Confirm hostname → IP and whether you’re going via bastion/ProxyJump.
- Confirm the username. “Permission denied” often means “wrong user” plus “no fallback auth allowed.”
- Check host key continuity. Wrong host or rebuilt host shows up as a host key change, but people often “fix” it by deleting known_hosts and then connect to the wrong thing.
Second: run one verbose client attempt and read it like a log
- Use ssh -vvv once. Don’t guess; the client tells you which identities it offered and what was accepted or rejected.
- Decide: is it failing at key selection, server key acceptance, or account authorization?
Third: inspect server logs (or use console access if you’re locked out)
- If you have any privileged path (cloud console, IPMI/iDRAC, serial console, KVM), use it. It’s not “cheating,” it’s why you pay for it.
- Read journalctl for sshd and PAM/account messages.
- Fix the narrowest thing that explains the logs. Restarting sshd is rarely the fix; it’s usually how you turn a small mistake into an outage.
What “Permission denied” actually means in SSH
The message is famously unhelpful because it’s the client summarizing a multi-step negotiation. On Ubuntu 24.04 with OpenSSH, “Permission denied” generally means:
- The server accepted the connection, but no authentication method succeeded.
- The server rejected your key, password, or keyboard-interactive attempt.
- Or, more subtly: authentication succeeded but the account phase failed (PAM, access rules, locked user, invalid shell), and the user experience still collapses into “Permission denied.”
There are three broad buckets:
- Client-side identity problems: wrong key, wrong agent, wrong username, wrong config stanza.
- Server-side file/config problems: broken permissions, Match blocks, authorized_keys not readable, wrong AuthorizedKeysFile path, etc.
- Policy/account gating: user locked, expired, PAM restrictions, AllowUsers/DenyUsers, MFA requirements not satisfied.
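The three buckets can be rough-sorted mechanically from server log lines. A minimal sketch; the patterns are illustrative samples modeled on stock OpenSSH/PAM messages, not an exhaustive taxonomy:

```shell
#!/bin/sh
# Rough-sort an sshd log line into one of the three failure buckets.
# Patterns are illustrative, not exhaustive.
classify_sshd_line() {
    case $1 in
        *"bad ownership or modes"*|*"Authentication refused"*)
            echo "server-side files/config" ;;
        *"pam_access"*|*"account expired"*|*"not allowed because"*)
            echo "policy/account gating" ;;
        *"Failed publickey"*|*"Connection closed by authenticating user"*)
            echo "client-side identity" ;;
        *)
            echo "unclassified" ;;
    esac
}

classify_sshd_line "Authentication refused: bad ownership or modes for directory /home/user"
classify_sshd_line "pam_access(sshd:account): access denied for user user"
classify_sshd_line "Failed publickey for user from 198.51.100.22 port 51234 ssh2"
```

It won’t replace reading the logs, but it’s a fair map of where to look first.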
One quote to keep you honest when you’re tempted to “just restart things until it works”:
“Hope is not a strategy.” — old SRE proverb
Interesting facts and quick history (because it matters)
A little context makes you faster at diagnosis because you stop expecting SSH to behave like a single monolithic thing. It’s a stack.
- SSH replaced rsh/telnet primarily because those older tools sent credentials in cleartext. SSH’s security model is why it’s picky—and why you should be picky too.
- OpenSSH’s strict permissions checks are deliberate. If your ~/.ssh is group-writable, it assumes someone else could inject keys. It would rather lock you out than be “helpful.”
- “Permission denied (publickey)” is a client-side summary. The server can reject keys for many reasons: wrong key type, wrong principal, bad permissions, forced-command restrictions, or account policy.
- Host keys are the server’s identity. They prevent man-in-the-middle attacks. When people blindly delete known_hosts, they’re choosing convenience over detection.
- PAM can deny you after authentication succeeds. That’s why logs matter: you can “authenticate” and still get kicked out at the account/session stage.
- OpenSSH defaults evolve. Algorithms and key sizes that were acceptable a decade ago may be disabled now. Updates can “break” old keys, especially weak RSA settings.
- Ubuntu’s OpenSSH integrates tightly with systemd. Logs are usually in journald, and service behavior (like socket activation in some setups) can change expectations.
- Cloud images often use opinionated SSH settings: passwords disabled, specific default users, and cloud-init managing authorized_keys.
Cause #1: You’re not using the key you think you’re using (agent, identity, file)
This is the #1 “worked yesterday” failure in real life because it’s usually not the server. It’s you, your laptop, your agent, your config, your VPN split-tunnel, your new terminal, or a corporate security tool that rotated something while you slept.
Task 1: Run one verbose SSH attempt and identify what key is offered
cr0x@server:~$ ssh -vvv user@host.example.com
OpenSSH_9.6p1 Ubuntu-3ubuntu13, OpenSSL 3.0.13 30 Jan 2024
debug1: Reading configuration data /home/cr0x/.ssh/config
debug1: Connecting to host.example.com [203.0.113.10] port 22.
debug1: identity file /home/cr0x/.ssh/id_ed25519 type 3
debug1: identity file /home/cr0x/.ssh/id_rsa type 0
debug1: Offering public key: /home/cr0x/.ssh/id_ed25519 ED25519 SHA256:abc... agent
debug1: Authentications that can continue: publickey
debug1: Offering public key: /home/cr0x/.ssh/id_rsa RSA SHA256:def... agent
debug1: Authentications that can continue: publickey
user@host.example.com: Permission denied (publickey).
What it means: You offered two keys. Both were rejected. That narrows it: either the server doesn’t have either public key in authorized_keys, or it can’t read them, or server policy blocks you.
Decision: If you expected a different key, fix the client first (explicit -i, clean up ~/.ssh/config, or fix the agent). If you expected one of those keys to work, move to server-side checks (Cause #2 and #3).
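If you saved the verbose output, you can pull out exactly which fingerprints were offered instead of eyeballing the transcript. A sketch; the awk field positions assume the stock “Offering public key: PATH TYPE SHA256:…” debug format:

```shell
#!/bin/sh
# Extract the fingerprint of every key the client offered from saved
# ssh -vvv output. Assumes the stock debug line format:
#   debug1: Offering public key: PATH TYPE SHA256:... [agent]
offered_fingerprints() {
    awk '/Offering public key:/ { for (i = 1; i <= NF; i++) if ($i ~ /^SHA256:/) print $i }'
}

# Fed the transcript from Task 1:
offered_fingerprints <<'EOF'
debug1: Offering public key: /home/cr0x/.ssh/id_ed25519 ED25519 SHA256:abc... agent
debug1: Authentications that can continue: publickey
debug1: Offering public key: /home/cr0x/.ssh/id_rsa RSA SHA256:def... agent
EOF
```

Compare those fingerprints against what the server actually authorizes before touching anything.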
Task 2: Force the key and disable agent confusion
cr0x@server:~$ ssh -i ~/.ssh/prod_ed25519 -o IdentitiesOnly=yes -vv user@host.example.com
debug1: identity file /home/cr0x/.ssh/prod_ed25519 type 3
debug1: Offering public key: /home/cr0x/.ssh/prod_ed25519 ED25519 SHA256:xyz...
debug1: Authentications that can continue: publickey
user@host.example.com: Permission denied (publickey).
What it means: You definitively offered the intended key. It still failed.
Decision: Stop fiddling with your laptop. Go verify the server has the matching public key and can read it.
Task 3: Verify what keys your agent is actually holding
cr0x@server:~$ ssh-add -l
256 SHA256:abc... /home/cr0x/.ssh/id_ed25519 (ED25519)
2048 SHA256:def... /home/cr0x/.ssh/id_rsa (RSA)
What it means: Those are the identities your agent will offer unless you constrain it.
Decision: If the right key is missing, add it (ssh-add ~/.ssh/prod_ed25519) or point SSH at it with -i and IdentitiesOnly=yes. If the right key is present but rejected, proceed server-side.
Task 4: Inspect your SSH config precedence (the silent foot-gun)
cr0x@server:~$ ssh -G host.example.com | sed -n '1,80p'
user user
hostname host.example.com
port 22
identityfile ~/.ssh/id_ed25519
identitiesonly no
proxyjump bastion.example.com
pubkeyauthentication yes
passwordauthentication no
What it means: ssh -G shows the final computed config. If identityfile isn’t what you expect—or user isn’t—the client is doing exactly what you told it to do, just not what you meant.
Decision: Fix ~/.ssh/config so production hosts use explicit IdentityFile and IdentitiesOnly yes. Ambiguity is cute in novels, not in access control.
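A pinned stanza can look like this — a sketch, where the host alias, key path, and bastion name are placeholders for your environment:

```
# ~/.ssh/config -- deterministic access for one production host
Host prod-app
    HostName host.example.com
    User user
    ProxyJump bastion.example.com
    IdentityFile ~/.ssh/prod_ed25519
    IdentitiesOnly yes
```

With IdentitiesOnly yes, the agent can hold whatever it likes; only the pinned key is ever offered to this host.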
Dry-funny joke #1: SSH agents are like “helpful” coworkers: they show up uninvited and volunteer the wrong credentials with total confidence.
Cause #2: Server-side permissions/ownership broke (home, .ssh, authorized_keys)
If SSH key auth used to work and suddenly doesn’t, the most common server-side culprit is a permissions change. This happens after:
- a home directory migration,
- an overzealous chmod -R,
- restoring from backup with wrong ownership,
- switching users/groups (including centralized identity),
- or a “security hardening” script that didn’t understand OpenSSH’s rules.
Task 5: Read the server logs for key rejection reasons
You’ll need console access or another privileged route if SSH is fully broken.
cr0x@server:~$ sudo journalctl -u ssh -n 80 --no-pager
Aug 07 10:12:01 host sshd[12345]: Authentication refused: bad ownership or modes for directory /home/user
Aug 07 10:12:01 host sshd[12345]: Connection closed by authenticating user user 198.51.100.22 port 51234 [preauth]
What it means: This is the classic: permissions on the user’s home (or .ssh) are too open, or owned by the wrong user.
Decision: Fix ownership and modes. Don’t “relax” sshd settings to tolerate bad permissions; fix the underlying filesystem state.
Task 6: Check ownership and modes for home and SSH files
cr0x@server:~$ sudo namei -l /home/user/.ssh/authorized_keys
f: /home/user/.ssh/authorized_keys
drwxr-xr-x root root /
drwxr-xr-x root root home
drwxrwxr-x user user user
drwxr-xr-x user user .ssh
-rw-r--r-- user user authorized_keys
What it means: /home/user is group-writable (drwxrwxr-x). That’s enough for OpenSSH to refuse keys, depending on configuration. Even if .ssh looks fine, the parent directory can fail the check.
Decision: Remove group-write on the home directory and ensure ownership is correct.
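If you want the same check scripted, this sketch walks from the keys file up to a stop directory and flags anything group- or other-writable — the condition that trips sshd’s strict modes. It assumes GNU coreutils stat, and the demo uses a throwaway tree that reproduces the broken state above:

```shell
#!/bin/sh
# Walk from FILE's parent up to STOP, flagging directories writable by
# group or others -- the condition that trips sshd's strict modes check.
flag_unsafe_dirs() {
    file=$1 stop=$2
    p=$(dirname "$file")
    while :; do
        mode=$(stat -c '%a' "$p")
        case $mode in
            *[2367]?) echo "group-writable: $p ($mode)" ;;   # group digit has the w bit
        esac
        case $mode in
            *[2367]) echo "other-writable: $p ($mode)" ;;    # others digit has the w bit
        esac
        [ "$p" = "$stop" ] && break
        [ "$p" = "/" ] && break
        p=$(dirname "$p")
    done
}

# Demo on a throwaway tree that reproduces the group-writable home problem.
tmp=$(mktemp -d)
mkdir -p "$tmp/home/user/.ssh"
chmod 0755 "$tmp/home"
chmod 0775 "$tmp/home/user"        # the classic mistake
chmod 0700 "$tmp/home/user/.ssh"
: > "$tmp/home/user/.ssh/authorized_keys"
flag_unsafe_dirs "$tmp/home/user/.ssh/authorized_keys" "$tmp"
rm -rf "$tmp"
```

On a real host you’d point it at /home/user/.ssh/authorized_keys with / as the stop.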
Task 7: Apply sane permissions (surgical, not recursive chaos)
cr0x@server:~$ sudo chown -R user:user /home/user/.ssh
cr0x@server:~$ sudo chmod 0755 /home/user
cr0x@server:~$ sudo chmod 0700 /home/user/.ssh
cr0x@server:~$ sudo chmod 0600 /home/user/.ssh/authorized_keys
cr0x@server:~$ sudo ls -ld /home/user /home/user/.ssh /home/user/.ssh/authorized_keys
drwxr-xr-x 9 user user 4096 Aug 7 10:00 /home/user
drwx------ 2 user user 4096 Aug 7 10:01 /home/user/.ssh
-rw------- 1 user user 402 Aug 7 10:01 /home/user/.ssh/authorized_keys
What it means: These are the conservative defaults that keep sshd happy: home not writable by group/others; .ssh private; keys file private.
Decision: Re-test login. If it still fails and logs don’t complain about ownership/modes anymore, move to config/policy causes.
Task 8: Confirm the public key on the server matches the private key you’re using
This avoids the “wrong key pair” rabbit hole.
cr0x@server:~$ ssh-keygen -lf /home/user/.ssh/authorized_keys | head -n 3
256 SHA256:xyz... comment@laptop (ED25519)
2048 SHA256:old... legacy@laptop (RSA)
What it means: These are the fingerprints of authorized public keys. Compare to your client key fingerprint.
Decision: If the server lacks the right key, add it (preferably through your standard provisioning path). If it’s present, proceed to sshd config and policy.
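To script that comparison, match on the base64 blob (field 2 of a public key line) rather than eyeballing fingerprints; differing comments and authorized_keys option prefixes then can’t fool you. A sketch with fabricated key material:

```shell
#!/bin/sh
# Is the key from PUBFILE present in AUTHFILE? Match on the base64 blob
# (field 2), so comments and authorized_keys options don't matter.
key_is_authorized() {
    pub=$1 auth=$2
    blob=$(awk 'NF >= 2 {print $2; exit}' "$pub")
    [ -n "$blob" ] || return 1
    grep -qF "$blob" "$auth"
}

# Demo with a fabricated blob (not a real key).
tmp=$(mktemp -d)
echo "ssh-ed25519 AAAAFAKEBLOBFORDEMOONLY0000 cr0x@laptop" > "$tmp/id.pub"
echo "ssh-ed25519 AAAAFAKEBLOBFORDEMOONLY0000 user@laptop" > "$tmp/authorized_keys"
if key_is_authorized "$tmp/id.pub" "$tmp/authorized_keys"; then
    echo "key is authorized"
else
    echo "key is NOT in authorized_keys"
fi
rm -rf "$tmp"
```
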
Cause #3: sshd config changed (AllowUsers, Match blocks, auth methods)
On Ubuntu 24.04, you’re typically running OpenSSH with a config split between /etc/ssh/sshd_config and includes in /etc/ssh/sshd_config.d/. That’s good. It’s also how a “small” change becomes a surprise.
Task 9: Validate the sshd configuration (syntax and effective settings)
cr0x@server:~$ sudo sshd -t
cr0x@server:~$ echo $?
0
What it means: Exit code 0 means syntax is valid. Non-zero means you likely broke sshd config; you might still have an existing running daemon, but reload/restart will fail.
Decision: If syntax fails, fix that first. If syntax is OK, inspect effective settings.
Task 10: Print effective sshd settings (the truth, not what you remember)
cr0x@server:~$ sudo sshd -T | egrep -i 'passwordauthentication|pubkeyauthentication|authorizedkeysfile|allowusers|denyusers|permitrootlogin|match'
pubkeyauthentication yes
passwordauthentication no
authorizedkeysfile .ssh/authorized_keys .ssh/authorized_keys2
permitrootlogin prohibit-password
What it means: This is the evaluated config (minus Match context unless you test it). It shows whether password auth is disabled and where sshd expects keys to live.
Decision: If AuthorizedKeysFile points somewhere unexpected, your keys might be in the wrong place. If PubkeyAuthentication is off, you’ve found the problem. If Allow/Deny rules exist, inspect them next.
Task 11: Check for Match blocks that only apply to your user or source IP
cr0x@server:~$ sudo awk 'BEGIN{p=0} /^Match /{p=1} {if(p)print}' /etc/ssh/sshd_config /etc/ssh/sshd_config.d/*.conf 2>/dev/null
Match User user
AuthenticationMethods publickey
X11Forwarding no
What it means: You have a user-specific policy. Maybe someone added AuthenticationMethods publickey (fine) but removed the correct key, or combined it with another requirement, or set a forced command that now fails.
Decision: Ensure the Match block reflects reality. If you require publickey only, the key must exist and be readable. If you require multiple methods, confirm the client can satisfy them.
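For reference, a drop-in that expresses “key-only for this user” correctly might read as follows — a sketch; the file name and user are placeholders, and you should validate with sshd -t before reloading:

```
# /etc/ssh/sshd_config.d/60-user-policy.conf
# Key-only auth for this user. Beware the comma form, which requires
# BOTH methods in sequence and is a common way to lock yourself out:
#   AuthenticationMethods publickey,keyboard-interactive
Match User user
    AuthenticationMethods publickey
    X11Forwarding no
```

Comma-separated methods are all required; space-separated lists are alternatives. That one character is the difference between “key only” and “key plus MFA you haven’t set up.”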
Task 12: Confirm sshd is running the config you think it is
cr0x@server:~$ systemctl status ssh --no-pager
● ssh.service - OpenBSD Secure Shell server
Loaded: loaded (/usr/lib/systemd/system/ssh.service; enabled; preset: enabled)
Active: active (running) since Wed 2025-08-07 09:59:10 UTC; 15min ago
Docs: man:sshd(8)
man:sshd_config(5)
Main PID: 1122 (sshd)
Tasks: 1 (limit: 2321)
Memory: 2.3M
CPU: 79ms
CGroup: /system.slice/ssh.service
└─1122 "sshd: /usr/sbin/sshd -D [listener] 0 of 10-100 startups"
What it means: sshd is running. Great. That doesn’t mean it’s allowing you. But it does mean you can safely reload after config changes (still test with sshd -t first).
Decision: If you changed config and didn’t reload, do a safe reload after validation.
cr0x@server:~$ sudo sshd -t && sudo systemctl reload ssh
What it means: Reload applies config without dropping existing sessions. Restart drops sessions and is how you turn one lockout into many lockouts.
Decision: Prefer reload. If reload fails, fix config and try again—don’t restart out of frustration.
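The validate-then-reload habit is worth wrapping in a tiny guard. In this sketch the validate and reload commands are injected as arguments so the flow can be exercised without touching a real daemon; in production you’d pass "sudo sshd -t" and "sudo systemctl reload ssh":

```shell
#!/bin/sh
# Reload only after validation passes. VALIDATE and RELOAD are whole
# commands (word-split on purpose), e.g.:
#   safe_reload "sudo sshd -t" "sudo systemctl reload ssh"
safe_reload() {
    validate=$1 reload=$2
    if $validate; then
        $reload && echo "reloaded"
    else
        echo "config invalid -- NOT reloading" >&2
        return 1
    fi
}

safe_reload "true"  "echo would-reload-sshd"
safe_reload "false" "echo would-reload-sshd" || echo "kept the old, working config"
```
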
Cause #4: Account/policy blocks you (locked user, expired key, PAM, access.conf)
This is the “everything looks correct, keys match, permissions are fine, sshd is healthy, but I still can’t get in” zone. It’s also where corporate environments hide the sharp objects: PAM policies, centralized identity, account expiration, and “temporary” deny lists that become permanent.
Task 13: Check whether the user account is locked or expired
cr0x@server:~$ sudo passwd -S user
user L 2025-07-10 0 99999 7 -1
What it means: The L indicates the account is locked. SSH key auth can still be blocked at the account stage depending on PAM and configuration; in many setups, a locked password effectively blocks logins.
Decision: If this account should be usable, unlock it (with your change control process). If it should be locked, stop trying to bypass policy and use the correct break-glass account or workflow.
Task 14: Check if the shell is valid (yes, this still happens)
cr0x@server:~$ getent passwd user
user:x:1001:1001:User,,,:/home/user:/usr/sbin/nologin
What it means: The account’s login shell is /usr/sbin/nologin. SSH can authenticate but then refuse the session (often looks like permission denied or immediate disconnect).
Decision: If the user should have shell access, set a valid shell like /bin/bash or whatever your standard is. If it’s intentionally nologin, don’t fight it.
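This check is easy to automate across a passwd dump. A sketch that reads passwd-format lines (what getent passwd prints) and flags shells that refuse interactive logins; the sample entries are invented:

```shell
#!/bin/sh
# Flag accounts whose login shell will refuse an interactive SSH session.
# Reads passwd-format lines on stdin; the shell is field 7.
flag_nologin_shells() {
    awk -F: '$7 == "/usr/sbin/nologin" || $7 == "/sbin/nologin" || $7 == "/bin/false" {
        print $1 ": no-login shell (" $7 ")"
    }'
}

flag_nologin_shells <<'EOF'
user:x:1001:1001:User,,,:/home/user:/usr/sbin/nologin
deploy:x:1002:1002::/home/deploy:/bin/bash
EOF
```

On a live system: getent passwd | flag_nologin_shells.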
Task 15: Look for PAM access restrictions in the logs
cr0x@server:~$ sudo journalctl -u ssh -n 120 --no-pager | egrep -i 'pam|account|access|denied|failure'
Aug 07 10:14:09 host sshd[12555]: pam_access(sshd:account): access denied for user `user' from `198.51.100.22'
Aug 07 10:14:09 host sshd[12555]: Failed password for user from 198.51.100.22 port 51288 ssh2
What it means: PAM is denying account access based on source IP or rules. The authentication method line after it can be misleading; the denial is explicit.
Decision: Inspect /etc/security/access.conf and PAM sshd stack. Fix the rule or source IP assumption. Don’t disable PAM modules casually; you’ll win access and lose audit/compliance.
Task 16: Check PAM configuration for sshd (what modules run)
cr0x@server:~$ sudo grep -nE 'pam_access|pam_sss|pam_faillock|pam_tally2' /etc/pam.d/sshd
15:account required pam_access.so
22:account [default=bad success=ok user_unknown=ignore] pam_sss.so
What it means: pam_access.so is active. That explains source-based blocks. pam_sss.so suggests centralized identity (SSSD) is involved.
Decision: If access rules are too strict, adjust them. If SSSD is down or misconfigured, you might need to restore identity service health (not just SSH).
Task 17: Detect SSSD / identity outages that masquerade as SSH failures
cr0x@server:~$ systemctl status sssd --no-pager
● sssd.service - System Security Services Daemon
Loaded: loaded (/usr/lib/systemd/system/sssd.service; enabled; preset: enabled)
Active: active (running) since Wed 2025-08-07 08:01:44 UTC; 2h 13min ago
What it means: SSSD is running, but “running” isn’t “working.” Still, a dead SSSD is an easy win.
Decision: If SSSD is down, bring it back. If it’s up, check logs for auth/account lookup failures (outside this article’s scope, but you know where this goes).
Cause #5: The box isn’t the box (host keys, DNS/IP, bastion, ProxyJump)
“SSH worked yesterday” is often true. It’s just that you were SSHing into a different machine yesterday and didn’t know it. DNS changes, floating IP reassignment, load balancers in front of SSH (yes, people do this), and bastions can all create that illusion.
Task 18: Confirm DNS and route to the target
cr0x@server:~$ getent hosts host.example.com
203.0.113.10 host.example.com
What it means: This is the IP your system will use. If it changed since yesterday, you’re probably knocking on a different door.
Decision: If the IP is unexpected, stop and validate the correct target (inventory, cloud console). Fix DNS, or update your SSH config to use the stable address.
Task 19: Check host key mismatch (the honest warning)
cr0x@server:~$ ssh user@host.example.com
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Host key verification failed.
What it means: Either the host was rebuilt, or DNS/IP changed, or someone is intercepting traffic. In enterprise networks, rebuilds and IP reuse are common; attacks are less common, but you don’t get to assume.
Decision: Verify the new host key via an out-of-band trusted channel (console, CMDB, provisioning logs). Only then update known_hosts.
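Normally you compare ssh-keygen -lf output from both sides. If you’re curious what that SHA256 fingerprint actually is: base64(SHA-256(decoded key blob)) with the trailing "=" padding stripped. A sketch that assumes coreutils base64 and openssl are available; the blob here is a tiny fake, not a real key:

```shell
#!/bin/sh
# Compute an OpenSSH-style SHA256 fingerprint from a public key blob:
# base64-decode the blob, SHA-256 it, re-encode as base64, strip '=' padding.
fingerprint_blob() {
    printf '%s' "$1" | base64 -d | openssl dgst -sha256 -binary | base64 | tr -d '='
}

# Fake blob (valid base64, not a real key), just to show the shape:
fp=$(fingerprint_blob "AAAAC3NzaC1lZDI1NTE5AAAAIA==")
echo "SHA256:$fp"
```

Useful when the console only gives you the raw host key line and you want the fingerprint to compare against your CMDB record.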
Task 20: Remove a single offending known_hosts entry safely
cr0x@server:~$ ssh-keygen -R host.example.com
# Host host.example.com found: line 42
/home/cr0x/.ssh/known_hosts updated.
Original contents retained as /home/cr0x/.ssh/known_hosts.old
What it means: You removed one host entry, not nuked the entire file.
Decision: Reconnect and validate the new fingerprint. If you can’t validate it, don’t proceed. “It’s probably fine” is how incident reports get written.
Task 21: Confirm ProxyJump/bastion path (you might be blocked upstream)
cr0x@server:~$ ssh -G host.example.com | egrep -i 'proxyjump|proxycommand|hostname|user'
user user
hostname host.example.com
proxyjump bastion.example.com
What it means: Your connection goes through a bastion. Permission denied might be happening at the bastion hop, not the final host.
Decision: Try connecting to the bastion directly with verbosity and confirm auth there first.
cr0x@server:~$ ssh -vvv user@bastion.example.com
debug1: Authentications that can continue: publickey
user@bastion.example.com: Permission denied (publickey).
What it means: The bastion is rejecting you. The target might be fine.
Decision: Fix bastion access (keys, allowlists, policy) before touching the target host.
Dry-funny joke #2: Nothing builds character like debugging “Permission denied” for an hour—only to learn you’ve been SSHing into the staging box with production confidence.
Three corporate mini-stories (realistic, anonymized, and painfully educational)
Mini-story 1: The incident caused by a wrong assumption
They had a fleet of Ubuntu servers behind a bastion. Access was key-only, no passwords, and the company had just rolled out a new laptop management tool. The tool quietly replaced the user’s SSH agent integration—still an agent, still “working,” but no longer loading the same keys by default.
At 08:30, an engineer tried to patch a high-risk CVE. SSH said Permission denied (publickey). He assumed the server was broken, because “I didn’t change anything.” A second engineer got the same error and assumed the bastion was down. They opened an incident, paged the platform team, and started planning emergency console access.
The logs on the bastion were clear: dozens of rejected keys, none matching the expected fingerprint. On the clients, ssh -vvv showed the agent offering an older RSA key and a GitHub key—neither authorized for production. The correct key existed on disk, but was never offered because the new tool had changed agent behavior and the ~/.ssh/config had no explicit IdentityFile for the bastion.
The fix was boring: update SSH config to pin the right identity for the bastion and set IdentitiesOnly yes. The postmortem was also boring, which is how you know it was useful: the wrong assumption was “the server changed.” The server didn’t. The client did.
They also added a pre-flight check to their runbook: if SSH fails, collect ssh -G and ssh -vvv output before escalating. That one step prevented a whole category of future false alarms.
Mini-story 2: The optimization that backfired
A different company had a “security hardening sprint.” Someone decided to standardize home directory permissions across all Linux boxes. The script set /home/* to 0775 so a shared group could “help with collaboration.” It looked harmless. It was even approved in a ticket because the reviewer thought “group write” only affects local edits.
Within minutes, SSH key auth started failing for a subset of users. Not all of them—only those whose home directories became group-writable and whose sshd configuration still had strict mode checks (the default). The on-call saw Permission denied (publickey) and assumed keys had been removed or cloud-init had overwritten authorized_keys.
The sshd logs told the truth: Authentication refused: bad ownership or modes for directory /home/user. The “optimization” (collaboration via group write) violated OpenSSH’s security checks. The hardening sprint had ironically reduced security by encouraging people to propose disabling strict mode checks just to get access back.
The fix was to revert home directory permissions to 0755 (or more restrictive), and implement collaboration via a shared project directory with correct ACLs, not by weakening user home boundaries. They also changed their scripts to apply permissions only to known safe paths instead of blasting the entire tree.
The lesson wasn’t “don’t automate.” It was “don’t automate without understanding the invariants the system depends on.” SSH depends on file permission invariants. Break them and it assumes compromise—which is a reasonable personality trait for a security daemon.
Mini-story 3: The boring but correct practice that saved the day
One team ran Ubuntu 24.04 on a mix of physical hosts and cloud instances. They had a simple practice: every SSH-affecting change required (1) sshd -t validation, (2) a systemctl reload ssh, not a restart, and (3) a second open session before closing the first.
It wasn’t glamorous. It didn’t make slides. It did prevent outages.
During a maintenance window, an engineer added a Match Address block to enforce stricter auth from the public internet while keeping internal access smooth. The config was syntactically valid, but logically wrong: it matched more addresses than intended, effectively applying “public internet rules” to internal jump hosts. Result: people started getting denied.
Because they used reload and kept a second session open, they never locked themselves out. They immediately tail-read logs, saw the match behavior, corrected the match condition, reloaded again, and verified from both internal and external paths. No drama, no console recovery, no midnight heroics.
The practice that saved them wasn’t genius. It was discipline: validate, reload, keep a backdoor session until you’ve proven the door works.
Common mistakes: symptoms → root cause → fix
This section is opinionated because it’s what people actually do when stressed. If you recognize yourself, congratulations: you’re human. Now stop it.
1) Symptom: “Permission denied (publickey)” after a routine file copy to the server
- Root cause: ~/.ssh or home directory permissions changed (group-writable, wrong owner).
- Fix: Use namei -l to find the first bad directory; set home to 0755, .ssh to 0700, authorized_keys to 0600.
2) Symptom: Works from one laptop, fails from another
- Root cause: Different key offered due to agent/config differences; or source IP is blocked by PAM/access rules.
- Fix: Compare ssh -G and ssh -vvv output from both machines; check server logs for pam_access denials.
3) Symptom: “Permission denied” right after you “cleaned up” known_hosts
- Root cause: You connected to the wrong host (DNS/IP changed) and now blindly trust a new key, or you’re on a rebuilt instance without the correct authorized_keys.
- Fix: Verify the target IP with getent hosts. Verify the host key out-of-band. Then verify the authorized keys are present and correct.
4) Symptom: You can authenticate but get immediately disconnected
- Root cause: Invalid shell (/usr/sbin/nologin), a failing forced command, or a PAM session module error.
- Fix: Check the shell with getent passwd, inspect sshd logs, and verify any forced command or restricted environment.
5) Symptom: Only one user is blocked; others can SSH fine
- Root cause: A Match block for that user, the wrong authorized_keys, a locked account, or per-user policy.
- Fix: Search sshd_config and sshd_config.d for Match User. Validate account status with passwd -S. Fix ownership/modes in that user’s home.
6) Symptom: It broke right after an OpenSSH/Ubuntu update
- Root cause: Deprecated algorithm or key type disabled; or sshd config behavior changed by included snippets.
- Fix: Confirm the key type (ssh-keygen -lf); prefer ED25519. Inspect the effective config with sshd -T and check the includes in /etc/ssh/sshd_config.d/.
Checklists / step-by-step plan
If you want a plan you can run under pressure, here it is. It’s designed to minimize collateral damage and maximize signal.
Checklist A: Client-side triage (2 minutes)
- Confirm the user/host you’re targeting (don’t laugh; do it).
- Run ssh -vvv once and save the output.
- Check the computed config: ssh -G host | head.
- List agent keys: ssh-add -l.
- Retry with a pinned key: ssh -i ~/.ssh/prod_ed25519 -o IdentitiesOnly=yes user@host.
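Checklist A’s “check the computed config” step can be scripted. This sketch only parses — it reads ssh -G output on stdin so it can be exercised offline; on a real client you’d pipe: ssh -G host.example.com | summarize_client_config.

```shell
#!/bin/sh
# Summarize the ssh -G fields that cause most "wrong identity" surprises.
# Reads `ssh -G host` output on stdin.
summarize_client_config() {
    awk '$1=="user" || $1=="hostname" || $1=="identityfile" || $1=="identitiesonly" || $1=="proxyjump" { print $1 "=" $2 }'
}

# Example, using the computed config shown earlier in the article:
summarize_client_config <<'EOF'
user user
hostname host.example.com
port 22
identityfile ~/.ssh/id_ed25519
identitiesonly no
proxyjump bastion.example.com
pubkeyauthentication yes
EOF
```

If identitiesonly is “no” on a production host, that’s your first fix.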
Checklist B: Server-side “why did sshd reject me?” (5–10 minutes)
- Read logs: journalctl -u ssh -n 120.
- If logs mention ownership/modes, run namei -l /home/user/.ssh/authorized_keys.
- Fix permissions (home, .ssh, authorized_keys), then re-test.
- If logs mention PAM access, inspect /etc/pam.d/sshd and /etc/security/access.conf.
- Validate sshd config with sshd -t; check effective settings with sshd -T.
- Reload sshd safely: systemctl reload ssh.
Checklist C: “Are we even talking to the right host?” (3 minutes)
- Resolve the host: getent hosts host.example.com.
- Check whether the host key changed (do not ignore warnings).
- If you use a bastion, test bastion auth separately.
- Confirm that the machine’s SSH host key fingerprint matches what you expect via console.
Operational advice you can steal
- Always keep one working session open while changing SSH-related config.
- Use reload, not restart, unless you enjoy spontaneous lockouts.
- Don’t weaken SSH’s permission checks to “get back in.” Fix the permissions. The daemon is right to be strict.
- Make identity explicit in ~/.ssh/config for critical hosts: key path, user, and IdentitiesOnly yes.
FAQ
1) Why does SSH say “Permission denied” when my key is definitely on the server?
Because “on the server” isn’t enough. sshd must be able to read it, and it must trust the path to it. Bad permissions on /home/user or ~/.ssh commonly cause silent key rejection with a log entry like “bad ownership or modes.”
2) How do I know which key SSH is trying?
Use ssh -vvv. Look for lines like “Offering public key:” and “identity file … type …”. If it’s offering a key you didn’t expect, force the correct one with -i and IdentitiesOnly=yes.
3) Is it safe to delete ~/.ssh/known_hosts?
It’s safe in the same way it’s safe to disable smoke detectors because they keep beeping. Remove only the specific host entry with ssh-keygen -R, and only after you’ve validated the new host key fingerprint.
4) SSH worked, then after a reboot it broke. What changed?
Often: identity services (SSSD) didn’t come up cleanly, home directories mounted differently, permissions shifted via automation, or the instance IP changed and you’re hitting a different host. Start with logs (journalctl -u ssh) and verify you’re connecting to the expected IP.
5) Can a locked password prevent SSH key login?
Yes, depending on PAM and account policies. Even if public key auth succeeds, PAM’s account phase can reject a locked/expired account. Check with passwd -S user and logs for PAM messages.
6) I only get “Permission denied (publickey)” and no other methods. Why?
Because the server likely has PasswordAuthentication no and/or KbdInteractiveAuthentication no. That’s common and generally good. It also means your key path must be correct, and the key must be authorized and readable.
7) What’s the quickest server-side command to see if sshd config is the issue?
sudo sshd -T to print effective settings and sudo sshd -t to validate syntax. Pair that with journalctl -u ssh to see why the daemon rejected you.
8) Why does SSH fail only from the office network but works from my home network (or vice versa)?
Source-based policy: pam_access rules, firewall rules, AllowUsers with user@host patterns, or a Match block keyed on address. Logs usually show the decision point.
9) I see “Authentication refused: bad ownership or modes” but everything looks fine in ~/.ssh. What now?
Check the entire path with namei -l. The bad permission is often on /home/user or a parent directory, not the .ssh directory itself.
10) Should I restart sshd when troubleshooting?
Prefer systemctl reload ssh after sshd -t. Restarting can drop your last working session and turn a fix into a lockout. Reload is the grown-up choice.
Practical next steps
If SSH worked yesterday and today it’s “Permission denied,” your job is to turn that vague complaint into one of five concrete causes: wrong key, bad permissions, sshd policy, account policy, or wrong target host. Don’t improvise—measure.
- Run ssh -vvv and identify which key is offered and rejected.
- Confirm the computed client config with ssh -G and pin the correct identity.
- On the server, read journalctl -u ssh. It usually tells you the real reason.
- Fix permissions using namei -l to find the first unsafe directory in the path.
- Validate sshd config with sshd -t, confirm effective settings with sshd -T, and reload safely.
- When logs implicate PAM or account state, treat it as an identity/policy incident, not an SSH incident.
And once you’re back in: make SSH deterministic. Explicit keys per host, tight permissions, and a runbook that starts with logs. Future-you will be tired and will deserve the kindness.