Ransomware doesn’t show up with a villain cape. It arrives as a normal Tuesday: a helpdesk ticket about “files not opening,” a CEO Slack ping, a storage graph that suddenly looks like a ski slope. Then the ransom note. Then the adrenaline, the meetings, and the long night where everyone suddenly remembers the word “backup” and argues about what it means.
If your plan is “we bought the best antivirus,” you don’t have a plan. You have a purchase order. Real ransomware defense is mostly unglamorous operations: identity hygiene, segmentation, logging that’s actually watched, and backups that are provably restorable when the attackers are already inside.
The myth: “Best antivirus” as a security strategy
Antivirus (AV) and even modern endpoint tooling are not useless. They’re just not a primary control against ransomware. Treating them as “the thing that stops ransomware” is like treating seatbelts as a plan for avoiding car crashes. Good to have, not where accidents are prevented.
Ransomware operators don’t bet the company on commodity malware anymore. They run playbooks. They buy access. They exploit identity. They live off the land. They use signed tools, remote management, legit admin utilities, and stolen credentials. They take their time. They disable what you bought to protect yourself, or they route around it.
The core failure mode behind the “best AV” myth is a category error: AV is a detection/control on endpoints. Ransomware is an operational failure that spans identity, network, backup, storage, and response. Endpoint controls are one layer, but you need defense that survives an attacker with admin privileges.
Dry reality: if an adversary becomes domain admin and can reach your backup system, your “best AV” score is not going to matter much.
One quote you should staple to the wall: “Hope is not a strategy.” — a line often attributed to Vince Lombardi
Joke #1 (short, relevant): Antivirus is like deodorant: it helps, but it doesn’t replace bathing.
Facts & history: how we got here (and why it matters)
Ransomware feels modern, but it’s been iterating for decades. A few concrete points that explain why today’s defenses have to be operational, not just software:
- 1989: The AIDS Trojan is often cited as an early ransomware precursor—distributed on floppy disks, using crude encryption and payment by mail. The model worked even when the crypto didn’t.
- Mid-2000s: “Scareware” and fake AV monetized fear. It taught criminals that user behavior and payment flows matter more than technical elegance.
- 2013–2016: CryptoLocker and its successors industrialized strong encryption at scale. The “encrypt everything fast” era begins.
- 2017: WannaCry and NotPetya spread using wormable vulnerabilities and weak patch hygiene. NotPetya was destructive masquerading as ransomware—insurance and response playbooks had to adapt.
- 2019–2021: “Big game hunting” shifted focus to enterprises: manual intrusion, privilege escalation, and tailored impact. Encryption was only the closing act.
- Double extortion: Attackers started stealing data before encrypting it, turning “restore from backup” into only half a solution.
- Ransomware-as-a-service: Affiliates commoditized intrusions. Skill is distributed; volume increases; tactics standardize.
- EDR wars: Operators explicitly test and tune payloads against popular EDR stacks. The endpoint becomes a contested space, not a safe one.
- Cloud era twist: Identity and API abuse can wipe object storage or encrypt SaaS data. Your “endpoint AV” is irrelevant to a stolen OAuth token.
These aren’t trivia. They explain the modern rule: you cannot rely on a single preventative product, because the attacker’s core capability is adapting to products.
What ransomware really does in your environment
Most ransomware incidents in corporate environments follow a fairly boring sequence. “Boring” is not comforting. It means it happens a lot.
Phase 1: Initial access
Common entry points: compromised VPN credentials without MFA, exposed RDP, phishing that captures tokens, unpatched edge appliances, or third-party remote tools. The attacker wants a foothold that looks like normal remote work.
Phase 2: Credential theft and privilege escalation
Credential dumping, token theft, Kerberos abuse, password spraying, abuse of local admin reuse. If your admins log into workstations with domain admin accounts, the attacker’s job gets dramatically easier.
Phase 3: Discovery and lateral movement
They enumerate file shares, backup servers, hypervisors, domain controllers, and monitoring systems. They look for your crown jewels and the systems that could help you recover. If they can break recovery, the ransom gets higher.
Phase 4: Data theft (optional, increasingly common)
They stage and exfiltrate sensitive data. This is where bandwidth graphs and unexpected outbound connections matter. If you don’t have egress monitoring, you find out during the ransom note.
Phase 5: Impact
Encryption of endpoints and servers, deletion of shadow copies/snapshots, disabling services, mass GPO changes, and—if you’re unlucky—destruction of backups. This is the “everything is on fire” moment, but it started hours or days earlier.
Controls that actually prevent ransomware (and why)
Notice the word “prevent.” Not “detect after it’s everywhere.” Real prevention means stopping the attacker before they can encrypt widely or before they can destroy your ability to restore.
1) Identity hardening beats endpoint heroics
If you can’t protect identities, you can’t protect anything. Ransomware operators love environments where credentials are long-lived, overprivileged, and casually reused.
- MFA everywhere (VPN, email, admin portals, cloud consoles). Prefer phishing-resistant MFA for admin roles.
- Separate admin accounts (daily driver vs privileged). No domain admin browsing the web.
- Privileged access workstations (PAWs) or hardened jump hosts for administrative actions.
- Tiering model (Domain Controllers are Tier 0; keep Tier 0 creds off lower tiers).
- Reduce standing privilege with just-in-time elevation where possible.
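None of this sticks without visibility into who is privileged right now. As a starting point, a minimal sketch (group names `sudo`/`wheel`/`adm` are assumptions; adjust to your estate) that lists privileged-group members so stale admin and contractor accounts stand out:

```shell
#!/bin/sh
# Enumerate members of common privileged groups. Stale or surprising names
# here are exactly what ransomware operators hunt for.
# Group names are assumptions; adjust for your environment.
for grp in sudo wheel adm; do
    # getent group output format: name:passwd:gid:member1,member2
    getent group "$grp" | awk -F: -v g="$grp" '$4 != "" { print g ": " $4 }'
done
```

Cross-check every name against off-boarding records; “temporary” contractor accounts that outlive their purpose are a recurring initial-access story.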
2) Backups: the only “cure,” but only if they can’t be deleted
Backups are not a checkbox. They’re a system with adversaries. If the attacker can use your own admin tools to delete backups, you don’t have backups. You have expensive false hope.
What “good” looks like:
- 3-2-1: three copies, two media types, one offsite/isolated. Still a solid baseline.
- Immutability: snapshots/object lock or WORM-like policies that your normal admins cannot override quickly.
- Offline/air-gapped copy (logical or physical). Not everything needs to be offline, but something does.
- Restore testing: scheduled, measured, and boring. The goal is not a successful backup; it’s a successful restore.
- RPO/RTO engineered: what you can lose (RPO) and how fast you can be back (RTO). If you’ve never timed a restore, your RTO is a wish.
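Immutability can start small. A sketch using ZFS snapshot holds, assuming the `tank/backups` dataset used in the task examples below; a held snapshot cannot be destroyed until the hold is released, which is a speed bump rather than true WORM storage, but it defeats casual scripted deletion:

```shell
#!/bin/sh
# Pin the newest backup snapshot with a hold; "zfs destroy" on a held
# snapshot fails until someone explicitly runs "zfs release".
# Dataset name and hold tag are assumptions for illustration.
command -v zfs >/dev/null 2>&1 || { echo "zfs not installed; dry run only"; exit 0; }
zfs list tank/backups >/dev/null 2>&1 || { echo "dataset not found; dry run only"; exit 0; }
snap=$(zfs list -t snapshot -H -o name -s creation tank/backups | tail -1)
zfs hold keep-pinned "$snap"   # destroy now fails with "snapshot is busy"
zfs holds "$snap"              # verify the hold landed
```

An attacker with root can still release holds, so pair this with the delegation limits shown in Task 6 and with off-host copies.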
3) Network segmentation: make lateral movement expensive
Flat networks are ransomware accelerants. Segmentation is how you turn “one compromised laptop” into “one compromised laptop,” instead of “the entire company.”
- Isolate backups from general server networks. Limit management ports to jump hosts.
- Separate user subnets from server subnets; separate server subnets by function.
- Restrict east-west traffic. Make SMB and WinRM boringly hard to reach.
- Control egress. Most environments monitor inbound; attackers love outbound.
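What “restrict east-west” looks like concretely varies by network, but as a sketch: nftables rules on a routing/firewall host, assuming the user subnet 10.20.5.0/24, server subnet 10.20.9.0/24, and jump host 10.20.1.10 that appear in the examples in this article:

```shell
# Workstation-to-workstation SMB is almost never legitimate traffic.
sudo nft add table inet lateral
sudo nft add chain inet lateral fwd '{ type filter hook forward priority 0; }'
sudo nft add rule inet lateral fwd ip saddr 10.20.5.0/24 ip daddr 10.20.5.0/24 tcp dport 445 drop
# Management protocols (WinRM) reach servers only from the jump host.
sudo nft add rule inet lateral fwd ip daddr 10.20.9.0/24 tcp dport '{ 5985, 5986 }' ip saddr != 10.20.1.10 drop
```

Run the same rules with `counter log` instead of `drop` first; the legacy-app exceptions you discover that way are the real segmentation work.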
4) Logging and monitoring that can survive the attack
Attackers disable what they can see. If your logs live on the same domain and same storage that gets encrypted, you’ll be doing forensics with vibes.
- Centralize logs to a system with hardened access.
- Alert on identity events: new admin accounts, new service principals, group membership changes.
- Alert on backup tampering: job deletions, retention changes, snapshot deletions.
- Alert on mass file rename/write patterns and high SMB write rates.
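For the file-level alerts, an auditd watch is often enough to start. A sketch that creates the `finance-share` key used by the ausearch check in Task 8, assuming the share path `/srv/samba/finance` from the examples:

```shell
# Record writes and attribute changes on the finance share, tagged with a
# key you can filter on later with: ausearch -k finance-share
sudo auditctl -w /srv/samba/finance -p wa -k finance-share
# Persist the rule across reboots:
echo '-w /srv/samba/finance -p wa -k finance-share' | sudo tee /etc/audit/rules.d/finance.rules
sudo augenrules --load
```

Ship those audit events off-host; a watch that only writes to local disk is evidence the attacker gets to encrypt.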
5) Patch management: not perfect, still essential
Patch hygiene won’t stop credential theft, but it will stop a chunk of initial access and privilege escalation. Prioritize edge devices, VPN appliances, RMM tools, hypervisors, and domain controllers. If your patch cadence is “when we have time,” ransomware will schedule time for you.
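Cadence improves when patch debt is a number someone owns. A minimal sketch for apt-based hosts (the check is an assumption; adapt for yum/zypper estates):

```shell
#!/bin/sh
# Count pending security updates without installing anything (-s simulates).
command -v apt-get >/dev/null 2>&1 || { echo "not an apt-based system"; exit 0; }
pending=$(apt-get -s upgrade 2>/dev/null | grep -c '^Inst.*-security')
echo "pending security updates: ${pending:-0}"
```

Graph that number per host class; edge appliances and hypervisors deserve their own, stricter threshold.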
6) EDR and AV: useful, but not the boss of the plan
Use modern endpoint detection/response for visibility, isolation, and containment. But assume that at least one endpoint will be missed or will be able to run something anyway. Your other layers must hold.
7) Storage engineering: snapshots, immutability, and blast-radius control
As a storage person, here’s the uncomfortable truth: ransomware is a storage workload. It’s a write-amplifying, metadata-churning, small-write storm with an attitude. You’ll see IOPS spikes, latency climbs, and pools that suddenly look “busy” even though business traffic is normal.
Storage controls that matter:
- Immutable snapshots (where supported) or snapshot policies protected from routine admin credentials.
- Separate backup storage from primary. Different credentials, different network, different failure domain.
- Monitor delete rates and rename storms on file systems.
- Know your restore paths: VM-level restores, file-level restores, bare metal. Time them.
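“Time them” can be as literal as wrapping a restore in a stopwatch. A sketch assuming the `/tank/restore-test` clone path from Task 7 as the restore source (both paths are illustrative):

```shell
#!/bin/sh
# Time a file-level restore so RTO becomes a measurement, not a wish.
# SRC should point at a mounted snapshot/clone; paths are assumptions.
SRC=${1:-/tank/restore-test/finance}
DST=$(mktemp -d /tmp/restore-drill.XXXXXX)
[ -d "$SRC" ] || { echo "restore source missing; nothing to time"; exit 0; }
start=$(date +%s)
cp -a "$SRC"/. "$DST"/
end=$(date +%s)
echo "restored $(du -sh "$DST" | cut -f1) in $((end - start))s to $DST"
```

Record the result next to the RTO for that system; the gap between the two numbers is your real exposure.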
Hands-on tasks: 12+ concrete checks with commands and decisions
These are practical tasks you can run today. Each includes a command, sample output, what it means, and the decision you make. Assume Linux servers for the command host unless noted; adapt to your environment.
Task 1: Check for suspicious SMB share activity (Linux Samba server)
cr0x@server:~$ sudo smbstatus -b
Samba version 4.15.13-Ubuntu
PID Username Group Machine Protocol Version Encryption Signing
-----------------------------------------------------------------------------------------------------------------------------
2314 jdoe domain users 10.20.5.44 (ipv4:10.20.5.44:49832) SMB3_11 - partial
2488 svc_backup domain users 10.20.9.12 (ipv4:10.20.9.12:52110) SMB3_11 - partial
Service pid Machine Connected at Encryption Signing
------------------------------------------------------------------------------------------
finance 2314 10.20.5.44 Tue Feb 5 10:12:44 2026 UTC - partial
profiles 2488 10.20.9.12 Tue Feb 5 09:55:03 2026 UTC - partial
What it means: Active SMB sessions and which shares they’re touching. A single workstation hammering a sensitive share outside business hours is suspicious.
Decision: If an unexpected host is connected to many shares or many sessions appear rapidly, isolate that endpoint and start file audit checks.
Task 2: Detect a rename/write storm on a filesystem (Linux)
cr0x@server:~$ sudo iostat -xz 1 3
Linux 6.5.0-15-generic (fs01) 02/05/2026 _x86_64_ (16 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
9.12 0.00 6.44 31.88 0.00 52.56
Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz aqu-sz %util
nvme0n1 4.12 211.4 0.00 0.00 2.41 51.30 812.3 18244.0 120.1 12.88 38.22 22.45 31.2 99.1
What it means: High write ops, high iowait, high utilization. Ransomware often shows up as sustained write pressure with lots of small writes.
Decision: If this is abnormal for the host, treat it as an incident. Correlate with process-level IO and recent auth events.
Task 3: Identify top IO processes (Linux)
cr0x@server:~$ sudo iotop -o -b -n 3
Total DISK READ: 0.00 B/s | Total DISK WRITE: 65.12 M/s
TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND
8123 be/4 root 0.00 B/s 18.22 M/s 0.00 % 98.00 % python3 /opt/agent/tmp/enc.py
3911 be/4 nobody 0.00 B/s 12.05 M/s 0.00 % 87.00 % smbd: notifyd
What it means: A single process doing heavy sustained writes is a red flag—especially from odd paths like /opt/agent/tmp.
Decision: Kill/isolate the host, preserve evidence, and check whether this process exists on other systems.
Task 4: Check for recently modified executable files in suspicious directories (Linux)
cr0x@server:~$ sudo find /tmp /var/tmp /dev/shm -maxdepth 2 -type f -mtime -1 -ls | head
131099 120 -rwxr-xr-x 1 root root 122880 Feb 5 09:58 /tmp/.cache/agent
131104 44 -rwxr-xr-x 1 root root 45056 Feb 5 10:01 /dev/shm/.x
What it means: Fresh executables in temp locations are common in intrusion chains.
Decision: If you see new executable artifacts, quarantine the system and hash/collect them for analysis.
Task 5: Verify backup repository immutability controls (example: ZFS snapshots)
cr0x@server:~$ sudo zfs list -t snapshot -o name,creation -s creation tank/backups | tail -5
tank/backups@auto-2026-02-05_0100 Tue Feb 5 01:00 2026
tank/backups@auto-2026-02-05_0200 Tue Feb 5 02:00 2026
tank/backups@auto-2026-02-05_0300 Tue Feb 5 03:00 2026
tank/backups@auto-2026-02-05_0400 Tue Feb 5 04:00 2026
tank/backups@auto-2026-02-05_0500 Tue Feb 5 05:00 2026
What it means: Snapshots exist and are being created on schedule. That’s necessary, not sufficient.
Decision: Confirm who can destroy these snapshots. If “domain admins can SSH in and run zfs destroy,” you don’t have immutability; you have a to-do list.
Task 6: Check whether snapshots are protected by delegation (ZFS example)
cr0x@server:~$ sudo zfs allow tank/backups
---- Permissions on tank/backups --------------------------------------------
Local+Descendent permissions:
user backupsvc create,mount,receive,rollback,snapshot
What it means: The backup service account can create, receive, and snapshot, but it cannot destroy or rename; destruction stays with root, which always holds full rights.
Decision: This is a good pattern. If your backup service can destroy or rename, fix it. If too many humans can become root easily, fix that too.
Task 7: Validate you can restore (not just back up) a file quickly (ZFS clone example)
cr0x@server:~$ sudo zfs clone tank/backups@auto-2026-02-05_0500 tank/restore-test
cr0x@server:~$ ls -lah /tank/restore-test/finance/ | head
total 64K
drwxr-xr-x 2 root root 4.0K Feb 5 05:00 .
drwxr-xr-x 8 root root 4.0K Feb 5 05:00 ..
-rw-r--r-- 1 root root 18K Feb 5 04:59 invoice-2026-02.csv
What it means: You can mount a point-in-time view without touching production data. This is what “restore readiness” looks like.
Decision: Time this operation and record it. If it takes too long, your RTO assumptions are wrong.
Task 8: Check for mass file changes using audit logs (Linux auditd example)
cr0x@server:~$ sudo ausearch -ts today -k finance-share | tail -5
time->Tue Feb 5 10:03:41 2026
type=SYSCALL msg=audit(1738759421.211:1293): arch=c000003e syscall=82 success=yes exit=0 a0=ffffff9c a1=7f2a2c001b40 a2=241 a3=1b6 items=2 ppid=8123 pid=8123 auid=1002 uid=0 gid=0 exe="/usr/bin/python3" comm="python3"
type=PATH msg=audit(1738759421.211:1293): item=1 name="/srv/samba/finance/Q1/budget.xlsx.locked" inode=901232 dev=fd:00 mode=0100644 ouid=1002 ogid=1002 rdev=00:00 nametype=CREATE
What it means: A process is creating files with a new extension in the finance share. That’s classic encryption behavior.
Decision: Contain immediately: isolate host/user, disable credentials, block SMB from that source, preserve the snapshot.
Task 9: Check outbound connections for large unexpected transfers (Linux)
cr0x@server:~$ sudo ss -tpn | head -8
State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
ESTAB 0 131072 10.20.9.22:52344 185.199.110.153:443 users:(("rclone",pid=9011,fd=7))
ESTAB 0 0 10.20.9.22:22 10.20.1.10:49822 users:(("sshd",pid=1881,fd=3))
What it means: Tools like rclone are frequently abused for exfiltration because they look like normal cloud sync.
Decision: If you don’t run rclone in production, treat this as an incident. Kill the process, capture command line/history, and check egress logs for what left.
Task 10: Verify domain controller replication health (often done from a Windows management host over WinRM; here’s a Linux-based check via samba-tool)
cr0x@server:~$ samba-tool drs showrepl dc01.corp.example
DSA Options: 0x00000001
DSA object GUID: 2b6b0f4b-3a2b-4d2a-9f2a-3b5a9c9aa2ad
==== INBOUND NEIGHBORS ====
DC=corp,DC=example
dc02.corp.example via RPC
DSA object GUID: 9d7b1c2a-1c11-4a72-ae9e-2d0b2a6a0c17
Last attempt @ Tue Feb 5 09:55:12 2026 UTC was successful
What it means: If replication is broken, you can’t trust group membership changes, password resets, or GPO state during incident response.
Decision: If replication is failing, stabilize AD first. Many recoveries die because identity is inconsistent.
Task 11: Check for new local admin additions (Linux example via sudoers and group files)
cr0x@server:~$ sudo grep -RnE "ALL=\(ALL(:ALL)?\)" /etc/sudoers /etc/sudoers.d | head
/etc/sudoers:27:%sudo ALL=(ALL:ALL) ALL
/etc/sudoers.d/90-cloud-init-users:1:deploy ALL=(ALL) NOPASSWD:ALL
What it means: Passwordless sudo for a general account is a gift to attackers who land on that host.
Decision: Remove NOPASSWD where it isn’t strictly required, and constrain commands. If it is required, move it behind strong auth and logging on a jump host.
Task 12: Confirm backups are completing and not silently failing (generic: check systemd timers + logs)
cr0x@server:~$ systemctl list-timers --all | grep -E "backup|restic|borg"
Tue 2026-02-05 01:00:00 UTC 4h 59min left Tue 2026-02-04 01:00:03 UTC 19h ago restic-backup.timer restic-backup.service
cr0x@server:~$ sudo journalctl -u restic-backup.service -n 8 --no-pager
Feb 04 01:00:03 backup01 restic[2201]: repository 1a2b3c4d opened successfully
Feb 04 01:12:44 backup01 restic[2201]: processed 412563 files, 1.8 TiB in 12m41s
Feb 04 01:12:44 backup01 restic[2201]: snapshot 9f8e7d6c saved
Feb 04 01:12:44 backup01 systemd[1]: restic-backup.service: Succeeded.
What it means: Backups ran, completed, and saved a snapshot. You also learn throughput and scale, which matters for restores.
Decision: If backups are failing, fix it before you do anything “advanced.” If they’re slow, plan for restore capacity upgrades.
Task 13: Test restore integrity with a checksum comparison (example)
cr0x@server:~$ sha256sum /srv/data/finance/invoice-2026-02.csv
c8b1d7a2b2a5ef3fdddbf0d66a52f3f3d8b2d1e2f30d7c2f2e0b7b2b3a1f0a9e /srv/data/finance/invoice-2026-02.csv
cr0x@server:~$ sha256sum /tank/restore-test/finance/invoice-2026-02.csv
c8b1d7a2b2a5ef3fdddbf0d66a52f3f3d8b2d1e2f30d7c2f2e0b7b2b3a1f0a9e /tank/restore-test/finance/invoice-2026-02.csv
What it means: Your restore copy matches production. This is how you prove backups are not just present, but correct.
Decision: If checksums don’t match, treat it as data corruption or incomplete backup and investigate before an incident forces your hand.
Task 14: Quickly find processes with suspicious persistence (Linux)
cr0x@server:~$ systemctl list-unit-files --type=service | grep -E "agent|update|backup" | head
agent-updater.service enabled
backup-sync.service enabled
system-update.service enabled
cr0x@server:~$ systemctl status agent-updater.service --no-pager | sed -n '1,12p'
● agent-updater.service - Agent Updater
Loaded: loaded (/etc/systemd/system/agent-updater.service; enabled; preset: enabled)
Active: active (running) since Tue 2026-02-05 09:58:11 UTC; 7min ago
Main PID: 8123 (python3)
CGroup: /system.slice/agent-updater.service
└─8123 /usr/bin/python3 /opt/agent/tmp/enc.py
What it means: Persistence via systemd is common. A service running a script from a temp-ish path is extremely suspect.
Decision: Stop the service, isolate host, capture the unit file and script for analysis, and search fleet-wide for the same unit.
Fast diagnosis playbook
This is the “walk into the war room and be useful in five minutes” list. The goal is to find the bottleneck: is this an endpoint outbreak, an identity compromise, a storage saturation event, or a backup destruction attempt?
First: confirm scope and stop the bleeding
- Identify patient zero: which host/user first reported encrypted files? Correlate timestamps with authentication logs.
- Isolate likely sources: quarantine endpoints at the network level when possible; don’t rely on the endpoint to cooperate.
- Freeze backups/snapshots: preserve restore points immediately. If you have immutable snapshots, verify they still exist and are accessible.
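Containment can start on the file server itself while the network team chases switch ports. A sketch assuming Samba plus nftables, with the suspect workstation IP (10.20.5.44, from the smbstatus example earlier) purely as an illustration:

```shell
# Drop all new traffic from the suspect host at the server's own firewall.
sudo nft add table inet ir
sudo nft add chain inet ir input '{ type filter hook input priority -10; }'
sudo nft add rule inet ir input ip saddr 10.20.5.44 drop
# Tear down its live SMB sessions too; don't wait for TCP timeouts.
sudo smbcontrol smbd kill-client-ip 10.20.5.44
```

This buys minutes, not absolution: the credential behind that session still needs to be disabled.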
Second: figure out whether identity is compromised
- Check privileged account changes: new admins, group membership changes, suspicious logins from unusual hosts.
- Check for mass password resets or disabled MFA. Attackers sometimes “prepare” by cutting off defenders.
- Confirm DC health: if AD is unstable, your remediation actions may not propagate cleanly.
Third: determine if this is encryption, exfiltration, or both
- Storage telemetry: iowait spikes, SMB write bursts, elevated metadata ops. Encryption is loud at the storage layer.
- Egress telemetry: unusual outbound connections, tools like rclone, big transfers to unknown IPs.
- File indicators: new extensions, ransom notes, consistent rename patterns.
Fourth: choose a recovery lane
- If backups are intact: prioritize containment and restore planning. Don’t rush restores into a still-compromised domain.
- If backups are threatened: lock them down immediately (credential changes, network isolation, revoke access). Time matters more than elegance.
- If data exfil is confirmed: involve legal/compliance early; restoration doesn’t erase disclosure obligations.
Three corporate mini-stories from the trenches
Mini-story 1: The incident caused by a wrong assumption
They were proud of their endpoint suite. Big-name vendor, aggressive marketing, a dashboard full of reassuring green circles. The security lead could point at monthly reports showing blocked malware. Leadership loved it because it looked like progress.
The wrong assumption was simple: “If we have EDR, ransomware can’t spread.” What they actually had was EDR deployed to most endpoints, with exclusions for performance on a set of file servers that “couldn’t afford overhead.” Those servers were the place employees stored everything that mattered. The servers also allowed older SMB settings for a legacy app that nobody wanted to touch.
An attacker got in through a contractor VPN account without MFA. The account had been “temporary” for six months. They moved laterally, found the unprotected file servers, and ran encryption directly on them. No endpoint agent meant no isolation action. By the time the helpdesk saw the first ticket, the storage was already saturated with writes and the share was a graveyard of renamed files.
During the postmortem, the uncomfortable realization landed: their “best AV” didn’t fail. It did what it could on the endpoints it was on. Their strategy failed because they had built a system where the most important assets were the least protected, and where a single missing control turned into a company-wide crisis.
The fix wasn’t “buy a better product.” It was a segmentation project, MFA enforcement, removal of standing access, and a backup repository redesign so a contractor account couldn’t even reach it.
Mini-story 2: The optimization that backfired
A different company had a backup system that worked. Restores were slow, but they worked. Someone noticed the monthly storage bill and decided to “optimize.” They reduced snapshot frequency, shortened retention, and consolidated backup credentials so the ops team could “move faster.” They also opened firewall rules so the backup server could reach everything without tickets.
The change looked efficient. Fewer snapshots. Less storage. Less friction. The metrics improved. The risk didn’t show up on dashboards, because risk rarely does.
When ransomware hit, the attacker found the backup server quickly. It had broad network reach and a credential with far too much power because “backups need access.” They used the same admin path ops used: delete old restore points, then delete recent ones, then delete the job definitions so it would take longer to notice. The environment didn’t just lose production data; it lost time, because nobody could tell which backups were trustworthy.
Recovery became archaeology. They had to hunt for an older offsite copy that wasn’t fully synchronized. It restored, but it was old enough to cause serious business disruption. The “optimization” had effectively traded a predictable storage cost for an unpredictable existential cost.
The lesson they adopted was strict: treat backup systems as hostile territory. Reduce convenience. Add friction where it protects deletion and tampering. And never optimize retention without proving restore requirements first.
Mini-story 3: The boring but correct practice that saved the day
This one isn’t cinematic. It’s why it worked.
A mid-sized firm had an annoying quarterly ritual: a restore test. Not a tabletop exercise. A real restore. They would pick a random application VM, restore it into an isolated network, verify the service comes up, and measure the time. It annoyed everyone because it consumed storage and labor and produced a report nobody read until something went wrong.
They also maintained a separate backup admin account, stored offline, with MFA. Backup storage lived on a different network segment. Snapshot deletion required a separate approval path and credentials that were not the same as domain admin.
When ransomware landed through a compromised user mailbox and spread to a handful of servers, it did real damage. But it couldn’t reach the backup repository. It couldn’t delete snapshots. The team isolated affected hosts, verified the last known-good snapshots, and restored systems in a controlled order. The restore test runbooks meant there was no debate about “how” to restore—only “what first.”
They still had a bad week, because incidents are like that. But they didn’t have a catastrophic quarter. The boring practice—the repeated proof that restores work—turned panic into a sequence of steps.
Common mistakes: symptoms → root cause → fix
This is the part where I stop being diplomatic. These are patterns that keep showing up in real environments.
1) “Our backups were fine” (until they weren’t)
- Symptoms: Backup jobs show “successful,” but restores fail or take days; missing app consistency; corrupted archives; nobody knows encryption keys/passwords.
- Root cause: Backup success measured by job completion, not restore validation; no restore drills; credentials stored in compromised systems.
- Fix: Schedule restore tests with measured RTO; store backup admin creds offline; implement immutability; document restore order by dependency.
2) Encryption spreads “instantly” across the org
- Symptoms: Many shares and servers hit within minutes; storage latency spikes; SMB write rates go vertical.
- Root cause: Flat network + broad share permissions + reused admin creds + no east-west controls.
- Fix: Segment; restrict SMB/WinRM; remove local admin reuse; implement tiered admin model; limit share write permissions.
3) The security team can’t see anything during the incident
- Symptoms: SIEM silent; agents offline; log servers encrypted; no timeline.
- Root cause: Monitoring depends on the same identity/storage plane that got compromised; no hardened log pipeline.
- Fix: Centralize logs to hardened infrastructure; separate authentication; use write-once or restricted retention controls; test “logging during outage.”
4) Restores re-infect the environment
- Symptoms: Systems get re-encrypted after restore; attacker still present; new admin accounts appear again.
- Root cause: Restoring into a still-compromised domain; persistence not removed; credentials not rotated.
- Fix: Contain first; rotate privileged credentials; rebuild domain trust carefully; restore to isolated network and validate before rejoining.
5) “We had MFA” but attackers still got in
- Symptoms: Compromise through SSO; suspicious OAuth app grants; session tokens abused; helpdesk tricked.
- Root cause: MFA not enforced for all paths; phishing-susceptible methods; token theft; weak helpdesk verification.
- Fix: Enforce MFA everywhere; use phishing-resistant MFA for admins; alert on new app consents; harden helpdesk procedures.
6) Backups exist but get deleted anyway
- Symptoms: Snapshots missing; retention changed; backup jobs removed; object storage buckets emptied.
- Root cause: Backup admins share identity plane with attackers; no immutability; no separation of duties; no alerting on destructive actions.
- Fix: Implement immutable retention; separate credentials; require break-glass accounts for deletions; alert on retention changes and deletions immediately.
Checklists / step-by-step plan
Checkpoint 0: Decide what “good” means for your business
- Define RPO per system (how much data you can lose).
- Define RTO per system (how long you can be down).
- Classify data: regulated, confidential, internal, public.
- Identify top 5 services that must return first (identity, DNS, core apps, file shares, ERP, etc.).
Week 1: Stop the easy wins for attackers
- Enforce MFA on VPN, email, and privileged access.
- Inventory privileged accounts and split admin vs daily accounts.
- Remove obvious privilege bombs: NOPASSWD sudo, shared admin creds, stale contractor accounts.
- Baseline egress: know what outbound traffic “normal” looks like from servers.
Week 2: Make backups resilient to admin compromise
- Implement immutability for at least one backup copy (snapshots/object lock/WORM-like retention).
- Segment backup networks: restrict access to backup infrastructure from a jump host only.
- Separate credentials: backup admin accounts should not be domain admins; store break-glass credentials offline.
- Alert on destructive actions: snapshot deletions, retention changes, backup job deletions.
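The destructive-action alerting can begin as a crude tripwire long before a SIEM project lands. A sketch (dataset name, state path, and syslog priority are assumptions) that compares the snapshot list between runs and raises a critical message when entries vanish:

```shell
#!/bin/sh
# Compare the current snapshot list against the previous run; any snapshot
# present before and missing now triggers a critical syslog message.
# Dataset name and state path are assumptions.
STATE=${SNAPWATCH_STATE:-/var/lib/snapwatch/last.list}
command -v zfs >/dev/null 2>&1 || { echo "zfs not installed; dry run"; exit 0; }
zfs list tank/backups >/dev/null 2>&1 || { echo "dataset not found; dry run"; exit 0; }
CUR=$(mktemp)
zfs list -t snapshot -H -o name tank/backups | sort > "$CUR"
mkdir -p "$(dirname "$STATE")"
if [ -f "$STATE" ]; then
    missing=$(comm -23 "$STATE" "$CUR")   # in last run's list, gone now
    [ -n "$missing" ] && logger -p auth.crit "snapwatch: snapshots deleted: $missing"
fi
mv "$CUR" "$STATE"
```

Run it from a monitoring host the backup admins do not share credentials with; a tripwire the attacker can edit is decoration.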
Week 3: Reduce blast radius with segmentation and permissions
- Segment user-to-server access: only necessary ports, only necessary subnets.
- Tighten share permissions: remove “Everyone write”; implement least privilege by group.
- Block lateral movement protocols where possible: SMB between workstations, WinRM, WMI across subnets.
- Harden admin workflows: PAW or jump host, no direct admin from workstations.
Week 4: Prove you can recover
- Run a restore test into an isolated network; validate app function, not just boot.
- Time it and compare to RTO; update plans or buy capacity.
- Write runbooks for restore order and decision gates.
- Rehearse the first hour: containment, credential rotation, snapshot preservation, comms plan.
Joke #2 (short, relevant): If your disaster recovery plan is a PDF nobody can find, congratulations—you’ve invented ransomware-compatible documentation storage.
FAQ
Does antivirus matter at all?
Yes. It reduces commodity infections and can help with containment. But it’s not a primary control because ransomware operators often use legitimate tools and stolen credentials.
What’s the single most important control against ransomware?
Backups you can restore, protected by immutability and separation of privileges. If attackers can delete your backups, the incident becomes existential.
Are immutable backups enough?
No. They address encryption impact, not data theft, identity compromise, or reinfection. You still need identity hardening, segmentation, and monitoring.
How often should we test restores?
At minimum quarterly for critical systems, and after major changes (new backup tooling, storage migration, identity changes). For top-tier systems, monthly isn’t excessive.
What should we monitor to detect ransomware early?
Identity changes (new admins, new tokens, MFA changes), unusual SMB write patterns, spikes in file renames, backup retention/job changes, and abnormal outbound transfers.
Should we pay the ransom?
That’s a business/legal decision, not a technical one, and it depends on jurisdiction, insurance, and impact. Operationally: build so you don’t have to. Also assume decryption tools may be slow, unreliable, or incomplete.
We’re mostly SaaS. Are we safe?
No. Identity compromise can encrypt or delete cloud data, and attackers can abuse OAuth grants and admin consoles. You still need backups/exports, logging, and strong identity controls.
What’s the difference between EDR and AV in this context?
AV is typically signature/behavior-based prevention on endpoints. EDR adds telemetry, hunting, and response actions (like isolating a host). Both help, neither replaces backup resilience and identity controls.
How do we avoid restoring malware along with data?
Restore into an isolated network, scan and validate, rotate credentials, and ensure the original intrusion path is closed before reconnecting. Treat restores as a controlled rebuild, not a rewind button.
Do snapshots on primary storage count as backups?
They’re a recovery tool, not a full backup strategy. If the attacker compromises the storage admin plane, snapshots can be destroyed. Use snapshots as one layer, plus separate backup copies.
Next steps you can execute this week
If you want ransomware prevention, stop shopping for “best antivirus” and start running a reliability program for security outcomes.
- Pick one critical service and measure restore time end-to-end. Put the number in writing.
- Protect one backup copy with immutability and prove that routine admins cannot delete it quickly.
- Enforce MFA for all remote access, then audit exceptions until there are none worth keeping.
- Segment backups and management networks so a compromised workstation can’t even see them.
- Set three alerts you will actually respond to: privileged group changes, backup retention changes, and abnormal outbound transfers.
- Write a one-page first-hour runbook and run it once. The goal is to remove debate, not to impress anyone.
Ransomware is a systems problem. Treat it like production engineering: reduce blast radius, enforce least privilege, design for recovery, and rehearse. Antivirus can come along for the ride.