At 2 a.m., a “ransomware” screen is almost comforting. It suggests a transaction. NotPetya wasn’t a transaction.
It was an operational demolition: Windows fleets rebooting into a fake “repair” routine while the business discovered—live—
how dependent it was on domain trust, shared credentials, and backups that looked great in slide decks.
This is a production-systems view of NotPetya: how it moved, how it killed, and what you should do differently tomorrow morning.
If your plan relies on “we’ll just isolate the one infected PC,” you’ve already lost.
NotPetya in one page: what it did and why it mattered
NotPetya was widely reported as ransomware. Operationally, treat it as a wiper with excellent distribution.
It used multiple propagation methods, grabbed credentials, executed via standard Windows admin tooling, and
then sabotaged booting by overwriting the disk’s boot record and encrypting the on-disk structures required to mount the filesystem.
The practical implication: paying wouldn’t reliably recover anything because the “ransom” workflow was functionally broken.
The damage wasn’t “files locked”; it was “machines don’t boot, and recovery is rebuild + restore.”
If your recovery model assumed “we’ll disinfect endpoints,” NotPetya turned that into “we need to reimage fleets and reconstitute identity.”
NotPetya also exposed a quiet truth: many enterprises run like a shared apartment with one master key. Lateral movement
becomes trivial when local admin passwords are reused, domain admins log into random desktops, and remote execution is left wide open.
One engineering maxim still applies, and it’s the one quote worth keeping on your wall:
Hope is not a strategy.
—paraphrased idea attributed to General Gordon R. Sullivan
Joke #1: NotPetya was “ransomware” the way a bulldozer is “landscaping.” You can pay it; the yard is still gone.
Facts and context you should actually remember
These aren’t trivia; they’re the levers that explain why the incident cascaded the way it did.
- It hit in June 2017, spreading fast across corporate networks and causing global outages.
- It commonly entered via a compromised software update mechanism (classic supply-chain compromise), not just random phishing.
- It abused Windows administrative tools like PSExec and WMIC—tools your admins already used every day.
- It used credential theft (memory scraping / credential dumping) to move laterally where patching wasn’t enough.
- It also spread via SMB using multiple techniques, including exploiting unpatched hosts and leveraging valid credentials.
- It damaged bootability by tampering with boot records and encrypting key disk structures; recovery often meant reimaging.
- The ransom payment channel was effectively nonfunctional, meaning even “successful payment” didn’t map to organized decryption.
- It was mislabeled early as “Petya ransomware”, which muddied incident response playbooks built for profit-motivated ransomware.
- It highlighted the risks of flat networks: once inside, it found shared admin paths like water finds cracks.
If you remember only one: the initial compromise vector matters less than the internal controls you lacked.
NotPetya wasn’t a magic trick; it was an audit of everyday shortcuts.
How NotPetya worked: the mechanics of a “wiper in ransomware clothing”
1) Entry: supply chain beats perimeter theater
Organizations that believed they were “not a target” still got hit, because in many cases the entry point wasn’t personalized spearphishing.
A compromised update mechanism gives the attacker a signed-looking binary with a distribution channel you already allow.
This is why “we block executables from email” is necessary but not sufficient; it’s also why egress filtering and application allowlisting
become real security controls, not compliance theater.
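If you want the egress half of that to be more than a slideware bullet, the rule is small. A minimal sketch for a Linux perimeter gateway running iptables; 10.10.0.0/16 as the internal address space is an assumption, adjust to your topology:
cr0x@server:~$ sudo iptables -I FORWARD -p tcp -m multiport --dports 135,139,445 ! -d 10.10.0.0/16 -j DROP   # assumption: 10.10.0.0/16 is all internal space
cr0x@server:~$ sudo iptables -I FORWARD -p udp -m multiport --dports 137,138 ! -d 10.10.0.0/16 -j DROP
Outbound SMB and NetBIOS to the internet have essentially no legitimate business use; dropping them at the edge costs nothing and removes a whole class of exposure.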
2) Propagation: exploit + credentials + admin tooling
NotPetya didn’t bet on one technique. It combined multiple ways to move:
- Exploit-style spread against vulnerable SMB implementations on unpatched Windows systems.
- Credential-based spread once it obtained passwords/hashes, turning your own admin rights into its transport.
- Remote execution via PSExec/WMIC, which often sailed through because “it’s internal” and “admins need it.”
That multi-pronged approach matters operationally. Patch management alone doesn’t save you if your credential hygiene is garbage.
Conversely, perfect passwords don’t save you if you’ve left unpatched legacy boxes in the same broadcast domain as everything else.
Defense has to be layered because the attacker already is.
3) Payload: break the boot chain, not just the files
The “Petya-ish” family is notorious for messing with the boot process. NotPetya went after low-level disk structures.
This is the part that turns a malware incident into a logistics incident: imaging pipelines, USB and PXE boot reliability,
driver packs, BitLocker recovery keys, and golden image drift all suddenly matter.
You also learn quickly which “backups” were just snapshots of already-encrypted data and which were actually immutable,
offline, and restorable under pressure.
4) Timing and reboot behavior: why it felt like the floor dropped
NotPetya typically scheduled a reboot to execute its boot-time sabotage. That’s psychologically important: a user sees a reboot,
shrugs, gets coffee, and comes back to a fake disk-repair screen followed by a ransom note. Then the help desk calls IT. Then IT calls everyone.
Meanwhile, lateral movement has already happened.
This is why “we’ll shut it down when we see it” is usually too late. If your detection depends on end users being observant,
you’re building a safety system out of hope and caffeine.
5) The real impact: identity, DNS, and shared services
NotPetya didn’t just kill desktops. It broke the things those desktops used: domain controllers, file servers, deployment servers,
software distribution points, monitoring collectors. When those go, the business discovers it can’t even coordinate recovery.
If your response plan requires “push an agent to all hosts,” and your software distribution is down, congratulations:
your plan is a document, not a capability.
Where organizations failed: the predictable weak points
Flat networks: the default architecture of regret
Lots of enterprise LANs still behave like a friendly village where every house shares the same hallway.
When NotPetya lands on one endpoint, it can see file shares, management ports, and remote execution paths everywhere.
Segmentation isn’t glamorous. It’s also the cheapest way to turn “global outage” into “local incident.”
Credential sprawl: domain admin on a workstation is a loaded gun
NotPetya loved credentials because credentials are universal. Exploits are picky; passwords are not.
If domain admins or high-privilege service accounts log into endpoints, those secrets end up in memory, and memory is a buffet.
The fix is boring and strict: tiered admin model, privileged access workstations, LAPS/Windows LAPS, and remove admin rights from users.
Remote execution left open: PSExec is not the villain, your controls are
PSExec and WMIC are common because they work. NotPetya used them because you let them work everywhere.
You can keep remote management and still reduce blast radius: restrict who can call it, where it can be called from,
log it, and isolate admin tools to management subnets.
Backups that weren’t recoverable at scale
“We have backups” is a sentence that means nothing unless you can answer: recover what, how fast, in what order,
and without AD.
NotPetya forced enterprises to discover that their restore process depended on production AD, production DNS, and production file shares.
When those are down, the restore runbook becomes interpretive dance.
Monitoring that went dark with the network
Centralized logging and monitoring are great—until they’re on the same network that just got bricked.
You need at least one out-of-band view: a tap, a separate management network, or cloud-based telemetry that survives internal chaos.
If you can’t see, you can’t triage, and you can’t prove containment.
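If that out-of-band view is a tap or SPAN port feeding a Linux sensor, even a one-liner can answer “are we still spreading?” when everything else is dark. A minimal sketch, assuming eth1 is the monitoring interface:
cr0x@server:~$ sudo tcpdump -ni eth1 -c 100 'tcp[tcpflags] & tcp-syn != 0 and tcp[tcpflags] & tcp-ack == 0 and dst port 445'   # assumption: eth1 receives mirrored east-west traffic
If 100 fresh SYNs to port 445 arrive within seconds from a handful of internal sources, treat it as active lateral movement and go straight to the containment steps below.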
Fast diagnosis playbook: what to check first/second/third to find the bottleneck quickly
This is for the first hour, when everyone is shouting and your job is to turn panic into a queue.
The goal is to identify: (1) are we still spreading, (2) what shared service is the choke point, (3) what can we restore first.
First: stop the bleeding (propagation)
- Confirm active lateral movement by looking for spikes in SMB/RPC traffic, authentication storms, and remote service creation.
- Pull the network brakes strategically: block SMB (445) between segments; restrict admin protocols to management subnets (a minimal firewall sketch follows this list).
- Disable known abused tooling paths temporarily: PSExec service creation, remote WMI, and admin shares between user VLANs.
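The firewall sketch referenced above, for a Linux core router or inter-VLAN gateway using iptables. Treating 10.10.20.0/24 and 10.10.21.0/24 as two user VLANs is an assumption based on the addresses used elsewhere in this article; the point is to drop workstation-to-workstation SMB/RPC while leaving the file-server path alone until you decide otherwise:
cr0x@server:~$ sudo iptables -I FORWARD -s 10.10.20.0/24 -d 10.10.21.0/24 -p tcp -m multiport --dports 135,139,445 -j DROP   # assumption: user VLAN subnets
cr0x@server:~$ sudo iptables -I FORWARD -s 10.10.21.0/24 -d 10.10.20.0/24 -p tcp -m multiport --dports 135,139,445 -j DROP
cr0x@server:~$ sudo iptables -L FORWARD -n --line-numbers | head
This is an emergency brake, not an architecture; once the fire is out, move the same intent into your permanent segmentation policy.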
Second: protect and validate identity services
- Check domain controllers: are they reachable, healthy, time-synced, and not rebooting into nonsense?
- Freeze privileged accounts: disable or reset high-risk accounts, rotate service credentials where feasible.
- Confirm DNS and DHCP integrity: if name resolution is poisoned or down, recovery tools will fail in weird ways.
Third: decide the recovery strategy (rebuild vs restore)
- Pick a “golden island”: a clean network segment with clean admin workstations and clean tooling.
- Prioritize business services: ERP, shipping, manufacturing, identity, email—whatever keeps the business breathing.
- Validate backups by restoring a representative system in the golden island. Don’t debate; restore.
Your fastest win is often to restore capability (identity + minimal apps) rather than chasing perfect forensics.
Forensics matter. So does payroll on Friday.
Hands-on tasks: commands, outputs, and decisions (12+)
These are practical tasks you can run during a NotPetya-style incident or in hardening exercises. The outputs are examples.
The point isn’t to memorize them; it’s to build muscle memory around what the output means and what you do next.
Task 1: Identify sudden SMB connection storms on a Linux file server (Samba host)
cr0x@server:~$ sudo ss -tnp '( sport = :445 or dport = :445 )' | head
ESTAB 0 0 10.10.30.12:445 10.10.20.5:51234 users:(("smbd",pid=2140,fd=45))
ESTAB 0 0 10.10.30.12:445 10.10.20.8:49822 users:(("smbd",pid=2183,fd=52))
ESTAB 0 0 10.10.30.12:445 10.10.21.33:50218 users:(("smbd",pid=2217,fd=61))
...
What it means: A sudden wave of new inbound SMB sessions from many distinct workstations can indicate worm-like scanning or mass share access. Note that ss only shows sockets terminated on this host; for traffic crossing a gateway, use conntrack as in Task 2.
Decision: If this is abnormal, block 445 between user VLANs immediately and restrict SMB to file server subnets only.
Task 2: See top talkers to port 445 on a router/firewall host
cr0x@server:~$ sudo conntrack -L | grep dport=445 | awk '{print $5}' | cut -d= -f2 | sort | uniq -c | sort -nr | head
482 10.10.20.5
219 10.10.20.8
77 10.10.21.33
...
What it means: A small set of sources generating hundreds of SMB flows is suspicious in office networks.
Decision: Isolate the top talkers at the switch port or VLAN level. Don’t wait for endpoint tooling to cooperate.
Task 3: Check Windows domain replication health from a management host (via WinRM)
cr0x@server:~$ ssh admin@mgmt-bastion "powershell -NoProfile -Command \"repadmin /replsummary\""
Replication Summary Start Time: 1/22/2026 02:14:50
Source DSA largest delta fails/total %% error
DC1 00:01:12 0 / 20 0
DC2 00:00:58 0 / 20 0
Destination DSA largest delta fails/total %% error
DC1 00:01:12 0 / 20 0
DC2 00:00:58 0 / 20 0
What it means: Healthy replication suggests AD isn’t currently collapsing. During a wiper event, AD stability is oxygen.
Decision: If replication fails spike or deltas grow, prioritize DC isolation and recovery over app servers. Everything else depends on it.
Task 4: Detect unexpected remote service creation (common with PSExec)
cr0x@server:~$ ssh admin@mgmt-bastion "powershell -NoProfile -Command \"Get-WinEvent -FilterHashtable @{LogName='System'; Id=7045; StartTime=(Get-Date).AddHours(-2)} | Select-Object -First 5 | Format-List\""
ProviderName : Service Control Manager
Id : 7045
Message : A service was installed in the system. Service Name: PSEXESVC Display Name: PSEXESVC
...
What it means: Event ID 7045 with PSEXESVC indicates PSExec-style remote execution.
Decision: If unexpected, block inbound admin shares and remote SCM calls between workstation segments; investigate source hosts immediately.
Task 5: Check for WMIC remote execution traces
cr0x@server:~$ ssh admin@mgmt-bastion "powershell -NoProfile -Command \"Get-WinEvent -FilterHashtable @{LogName='Security'; Id=4688; StartTime=(Get-Date).AddHours(-2)} | Where-Object { \$_.Message -match 'wmic.exe' } | Select-Object -First 3 | Format-Table -Auto\""
TimeCreated Id Message
----------- -- -------
1/22/2026 01:42:10 4688 A new process has been created... New Process Name: C:\Windows\System32\wbem\WMIC.exe ...
What it means: Process creation events referencing WMIC can indicate remote command execution or inventory tooling. Context matters.
Decision: If this shows up on user endpoints or from unusual parent processes, treat as active compromise and isolate the host.
Task 6: Confirm whether SMBv1 is still enabled on Windows hosts
cr0x@server:~$ ssh admin@mgmt-bastion "powershell -NoProfile -Command \"Get-WindowsOptionalFeature -Online -FeatureName SMB1Protocol\""
FeatureName : SMB1Protocol
State : Enabled
What it means: SMBv1 enabled is a persistent risk amplifier; it’s not “the cause,” but it increases the speed of bad days.
Decision: Disable SMBv1 broadly, but stage it: identify legacy dependencies first, then isolate them. Don’t let one ancient copier dictate network safety.
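A minimal sketch of the disable step on a single host, via the same mgmt-bastion pattern. These are standard cmdlets, but availability and the exact feature mechanism vary by Windows SKU (server editions may use Remove-WindowsFeature FS-SMB1 instead), so stage and test before pushing broadly:
cr0x@server:~$ ssh admin@mgmt-bastion "powershell -NoProfile -Command \"Set-SmbServerConfiguration -EnableSMB1Protocol \$false -Force\""
cr0x@server:~$ ssh admin@mgmt-bastion "powershell -NoProfile -Command \"Disable-WindowsOptionalFeature -Online -FeatureName SMB1Protocol -NoRestart\""
At fleet scale, deliver the same setting through Group Policy or your configuration management; the per-host commands are for verification and stragglers.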
Task 7: See which hosts are exposing admin shares from a Linux scanner box
cr0x@server:~$ for ip in 10.10.20.{1..20}; do timeout 1 bash -c "echo >/dev/tcp/$ip/445" 2>/dev/null && echo "$ip:445 open"; done
10.10.20.5:445 open
10.10.20.8:445 open
10.10.20.12:445 open
What it means: A lot of workstation-to-workstation 445 exposure means lateral movement has easy lanes.
Decision: Implement host firewall rules or network ACLs so workstations cannot accept SMB from peer workstations.
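A minimal per-host sketch with Windows Defender Firewall, again via the mgmt-bastion pattern. The workstation subnets (10.10.20.0/24, 10.10.21.0/24) are the same illustrative ranges used above; in practice you would deploy this as a GPO, not by hand:
cr0x@server:~$ ssh admin@mgmt-bastion "powershell -NoProfile -Command \"New-NetFirewallRule -DisplayName 'Block peer SMB' -Direction Inbound -Protocol TCP -LocalPort 445 -RemoteAddress 10.10.20.0/24,10.10.21.0/24 -Action Block -Profile Domain\""   # assumption: these ranges are workstation VLANs
The point is not this exact rule; it is that workstations have no business speaking SMB to each other, and the host firewall can enforce that even when network ACLs lag.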
Task 8: Identify machines rebooting unexpectedly (Linux virtualization host perspective)
cr0x@server:~$ sudo journalctl -u libvirtd --since "2 hours ago" | grep -E "shutdown|reboot|destroy" | head
Jan 22 01:31:02 hv01 libvirtd[1120]: Domain win-app-03 destroyed
Jan 22 01:31:18 hv01 libvirtd[1120]: Domain win-app-03 started
...
What it means: Sudden reboot patterns across Windows VMs can align with malware triggering reboots to execute boot-level changes.
Decision: Pause automated restarts; snapshot forensic disks where appropriate; isolate network connectivity for suspected VMs.
Task 9: Check MBR/partition table visibility from a Linux rescue environment
cr0x@server:~$ sudo fdisk -l /dev/sda | head -n 15
Disk /dev/sda: 240 GiB, 257698037760 bytes, 503316480 sectors
Disk model: SSD 860 EVO
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x00000000
Device Boot Start End Sectors Size Id Type
/dev/sda1 * 2048 503316479 503314432 240G 7 HPFS/NTFS/exFAT
What it means: If fdisk shows garbage, unknown disklabel, or missing partitions, boot record tampering is likely.
Decision: Don’t attempt “quick fixes” on production disks. Preserve images for investigation; proceed with rebuild + restore unless you have a proven recovery play.
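Preserving an image from the same rescue environment is a few commands. A minimal sketch, assuming /dev/sda is the affected disk and /mnt/evidence is mounted external storage with enough space (ddrescue is a reasonable substitute if the disk has read errors):
cr0x@server:~$ sudo dd if=/dev/sda of=/mnt/evidence/hostA-sda.img bs=4M conv=sync,noerror status=progress   # assumption: hostA and /mnt/evidence are illustrative names
cr0x@server:~$ sha256sum /mnt/evidence/hostA-sda.img | tee /mnt/evidence/hostA-sda.img.sha256
Record the hash alongside the image; if anyone later questions the chain of custody, you will be glad you spent the extra minute.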
Task 10: Validate backup immutability/air-gap assumptions (object storage example via s3cmd)
cr0x@server:~$ s3cmd info s3://corp-backups/windows/DC1-2026-01-22.vhdx
File size: 68719476736
Last mod: Tue, 22 Jan 2026 01:10:02 GMT
MIME type: application/octet-stream
MD5 sum: 1b2c3d4e5f...
Server side encryption: AES256
What it means: The backup exists and metadata is sane. This does not prove it’s unmodified or restorable, but it’s the first gate.
Decision: Immediately attempt a restore to an isolated network. If the restore cannot be completed in a controlled test, treat backups as untrusted.
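If the backup bucket lives on AWS S3 or an S3-compatible store that supports Object Lock (an assumption; check your backend), you can also verify that immutability is actually configured rather than merely intended:
cr0x@server:~$ aws s3api get-object-lock-configuration --bucket corp-backups   # assumption: aws CLI available and Object Lock supported
{
    "ObjectLockConfiguration": {
        "ObjectLockEnabled": "Enabled",
        "Rule": { "DefaultRetention": { "Mode": "COMPLIANCE", "Days": 30 } }
    }
}
No lock configuration, or a governance mode your admin credentials can bypass, means the “immutable” copy is only as immutable as the attacker’s access.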
Task 11: Check for “everyone is a local admin” drift on Windows
cr0x@server:~$ ssh admin@mgmt-bastion "powershell -NoProfile -Command \"Get-LocalGroupMember -Group 'Administrators' | Select-Object Name\""
Name
----
BUILTIN\Administrator
CORP\Domain Admins
CORP\Helpdesk
CORP\someuser
What it means: Domain groups and random users in local Administrators make credential theft far more valuable.
Decision: Remove broad groups from local admin; adopt LAPS; enforce privileged access via dedicated admin accounts and devices.
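The removal itself is one line per offending member, sketched here against the same host; at fleet scale this belongs in Group Policy Restricted Groups or your endpoint management, not an SSH loop:
cr0x@server:~$ ssh admin@mgmt-bastion "powershell -NoProfile -Command \"Remove-LocalGroupMember -Group 'Administrators' -Member 'CORP\Helpdesk','CORP\someuser'\""
Verify with Get-LocalGroupMember afterwards; drift has a way of reappearing unless policy enforces the membership.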
Task 12: Confirm whether credential caching policies are too permissive
cr0x@server:~$ ssh admin@mgmt-bastion "powershell -NoProfile -Command \"reg query 'HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Winlogon' /v CachedLogonsCount\""
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Winlogon
CachedLogonsCount REG_SZ 10
What it means: Cached logons help laptops work offline, but they also mean more credential material on endpoints.
Decision: Tune cached logons to business need. For high-risk segments (finance, admins), reduce caching and enforce MFA + privileged workstations.
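Tightening the value is a registry write, normally delivered via the GPO security option for cached interactive logons; the direct form, useful for verification or one-off hardening, looks like this (2 is an illustrative value for high-risk machines):
cr0x@server:~$ ssh admin@mgmt-bastion "powershell -NoProfile -Command \"reg add 'HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Winlogon' /v CachedLogonsCount /t REG_SZ /d 2 /f\""
Do not set it to zero everywhere without thinking about laptops that legitimately work offline; this is a dial, not a switch.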
Task 13: Find Kerberos authentication storms that hint at credential abuse
cr0x@server:~$ ssh admin@mgmt-bastion "powershell -NoProfile -Command \"Get-WinEvent -FilterHashtable @{LogName='Security'; Id=4769; StartTime=(Get-Date).AddMinutes(-30)} | Measure-Object\""
Count : 18423
Average :
Sum :
Maximum :
Minimum :
Property :
What it means: A sudden spike in ticket-granting events can correlate with widespread authentication attempts and lateral movement.
Decision: Rate-limit and isolate suspected subnets; investigate the top requesting hosts; consider emergency password resets for exposed service accounts.
Task 14: Verify that your “golden image” is actually retrievable and current
cr0x@server:~$ ls -lh /srv/images/windows-2019-golden.wim
-rw-r--r-- 1 root root 5.3G Jan 10 03:12 /srv/images/windows-2019-golden.wim
What it means: You have an image file. The timestamp suggests it may be stale relative to patch cadence.
Decision: If the image is old, rebuild it now in peacetime. During an outbreak, stale images create a second incident: mass redeploy into known vulnerabilities.
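While you are at it, record a checksum of the image and keep it (and ideally a copy of the image itself) somewhere the production network cannot touch, so that during an incident you can prove the image you are about to mass-deploy is the one you built:
cr0x@server:~$ sha256sum /srv/images/windows-2019-golden.wim | tee /srv/images/windows-2019-golden.wim.sha256
a94f7c21...  /srv/images/windows-2019-golden.wim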
Joke #2: Your incident runbook that depends on “remote into the infected host and run the tool” is adorable.
It’s like planning to fix a fire by emailing the smoke.
Three corporate mini-stories from the blast radius
Mini-story 1: The incident caused by a wrong assumption
A mid-sized manufacturing company ran a mixed Windows environment: a modern office network plus legacy OT-adjacent machines
that “couldn’t be touched” because they controlled packaging lines. The security team assumed that the OT-ish machines were isolated
because they were on a different IP range. They were not isolated. They were simply numbered differently.
When the initial infection showed up in accounting, the first response was classic: shut down email attachments, block a few hashes,
and tell users to stop clicking things. The team believed the threat would stay where it started, because “it’s ransomware.”
Within an hour, help desk tickets spiked: file shares unavailable, logins timing out, random reboots.
The wrong assumption showed itself in the firewall: almost no internal east-west rules existed. SMB, RPC, and admin shares flowed freely.
The “OT range” was fully reachable from office desktops because someone had once needed to push a software update and never removed the rule.
That old rule was now a highway.
The recovery was brutal but clarifying. They had to rebuild endpoints at scale, then rebuild trust: AD tiering, LAPS, segmentation by function,
and explicit “no workstation-to-workstation SMB.” The painful lesson wasn’t that malware is clever. The lesson was that networks are honest:
if it’s routable, it’s reachable.
Mini-story 2: The optimization that backfired
A retail company had optimized its Windows administration for speed. The endpoint management team used a single highly privileged
service account to deploy software through remote execution across thousands of machines. It was efficient, consistent, and beloved.
“One account to rule them all,” as they said in meetings, with exactly the wrong amount of confidence.
During the outbreak, credential dumping turned that optimization into a catastrophe multiplier.
Once the service account’s credential material was harvested on a single machine, it became a skeleton key for the fleet.
The malware didn’t need to brute force. It simply impersonated the exact thing the company depended on for control.
The team tried to contain by turning off the management platform, but the platform wasn’t the only problem—the credential was.
Disabling the account stopped some propagation, but it also broke legitimate remote recovery workflows.
They had accidentally designed an environment where the same mechanism that made operations easy also made the attacker fast.
The fix was not “never centralize.” It was to centralize with boundaries: per-segment deployment accounts,
constrained delegation, just-enough administration, and strong separation between management planes and user endpoints.
They still deploy software quickly today. They just don’t do it with a master password that can be scraped from RAM.
Mini-story 3: The boring but correct practice that saved the day
A professional services firm had a habit nobody bragged about: quarterly restore drills that assumed AD was unavailable.
It was not a tabletop exercise. It was hands-on, with a clean-room network, a documented order of operations,
and an irritating insistence on writing down what broke.
When NotPetya-style behavior hit (supply-chain entry, fast lateral movement, boot failure), their monitoring lit up,
and they did something unsexy immediately: they halted east-west SMB at the core, then carved out a clean management subnet.
They accepted that some hosts were gone and moved straight to rebuild.
The restore drills paid off in a very specific way: they had offline copies of AD recovery material and knew how to stand up
a minimal identity service without relying on the infected domain. They also knew which backups were “nice to have” and which
were essential for billing and client deliverables.
They still had a bad week. But it was a week, not a month. The “secret sauce” was not a product.
It was the discipline to test the dull parts: restores, privileged access, and segmentation rules that made architects yawn.
Common mistakes: symptoms → root cause → fix
1) Symptom: “We isolated the first infected laptop, but more machines keep dying.”
Root cause: Lateral movement already occurred via credentials and admin tooling; the initial host was not the only propagator.
Fix: Block SMB and remote admin protocols between workstation segments; rotate high-value credentials; hunt for PSExec/WMIC activity at scale.
2) Symptom: “Patching didn’t help; we were patched and still got hit.”
Root cause: Patch status reduces exploitability but does not prevent credential-based spread or supply-chain entry.
Fix: Enforce least privilege, LAPS, tiered admin, and network segmentation. Treat credential hygiene as a primary control, not a “later.”
3) Symptom: “Backups exist, but restore is impossible right now.”
Root cause: Restore process depends on production AD/DNS/file shares, which are down or untrusted.
Fix: Build a clean-room restore environment; maintain offline credentials/keys; rehearse AD-down restores; keep immutable/offline copies.
4) Symptom: “We can’t log in anywhere; authentication is failing across the board.”
Root cause: Domain controllers impacted, time skew, DNS issues, or mass password resets done without sequencing.
Fix: Stabilize DCs first; verify time sync; validate DNS; perform controlled credential rotation starting with tier-0 accounts.
5) Symptom: “Endpoint tooling shows green, but users report reboot loops and boot failures.”
Root cause: Boot record tampering and disk-level damage; agents can’t report if the OS doesn’t load.
Fix: Shift to imaging/rebuild workflow; preserve forensic disk images; don’t waste hours trying to ‘clean’ non-bootable systems.
6) Symptom: “Containment failed because we couldn’t push firewall rules fast enough.”
Root cause: Over-reliance on centralized management that shares fate with the infected network.
Fix: Pre-stage network ACL playbooks at core switches/firewalls; maintain out-of-band access; test emergency change paths.
7) Symptom: “Only one subsidiary office got the update, yet HQ died too.”
Root cause: Hub-and-spoke connectivity with broad trust; shared admin creds or shared services connected networks.
Fix: Segment by trust boundaries; restrict admin credential use across sites; use separate management domains or PAM jump hosts.
Checklists / step-by-step plan
Phase 0 (now, before the next outbreak): reduce blast radius
- Segment the network: block workstation-to-workstation SMB/RPC by default. Allow only to required servers.
- Disable SMBv1 broadly; isolate exceptions behind strict ACLs.
- Deploy LAPS/Windows LAPS so local admin passwords are unique and rotated (a verification sketch follows this list).
- Tier your admin accounts: no domain admin logons on workstations; use privileged access workstations.
- Lock down PSExec/WMIC: restrict who can execute remotely; monitor for service creation and suspicious WMI calls.
- Make backups survivable: immutable copies, offline/air-gapped where possible, and restore drills that assume AD is down.
- Prepare a golden island: clean jump hosts, clean tooling, and a management subnet that can operate during an incident.
- Instrument east-west traffic: if you can’t see lateral movement, you can’t contain it.
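The LAPS verification sketch referenced above: confirm that passwords are actually being set and rotated, not just that the policy exists. This assumes the newer Windows LAPS PowerShell module is available on the management host; WS-0420 is an illustrative computer name:
cr0x@server:~$ ssh admin@mgmt-bastion "powershell -NoProfile -Command \"Get-LapsADPassword -Identity WS-0420 -AsPlainText\""   # assumption: Windows LAPS module installed; WS-0420 is hypothetical
A recent password update timestamp per machine means rotation is working; a pile of stale or empty results means your “unique local admin passwords” are aspirational.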
Phase 1 (first hour): contain
- Declare the incident early and freeze risky changes. You need a single technical commander, not a committee.
- Block SMB (445) and remote admin (e.g., RPC/SMB admin shares) across workstation VLANs at the core.
- Isolate top talkers identified via network telemetry (SMB storms, auth storms).
- Protect DCs: isolate them logically; ensure only management subnets can reach admin ports.
- Disable or reset high-value accounts if compromise is suspected (domain admins, deployment accounts).
Phase 2 (day 1–3): recover capability, then confidence
- Stand up a clean management environment (jump host, imaging, logging) separate from infected segments.
- Rebuild from known-good images rather than “cleaning” endpoints. Wipers punish optimism.
- Restore priority services in dependency order: identity/DNS/DHCP → core apps → file services → endpoints.
- Rotate credentials systematically starting with tier-0. Document what was rotated and when.
- Prove containment with network and event log evidence before reconnecting segments.
Phase 3 (week 2+): don’t waste the pain
- Write the timeline while memories are fresh: initial access, propagation, detection, containment, recovery.
- Measure MTTR honestly: not when servers booted, but when the business resumed.
- Convert findings into controls: segmentation rules, account tiering, backup drills, and monitoring that survives outages.
FAQ
Is NotPetya ransomware or a wiper?
Treat it as a wiper operationally. It presented a ransom note, but recovery through payment was not reliably viable.
The effect was destructive and systemic.
Why did it spread so fast inside companies?
Because it combined multiple methods: SMB exploitation, credential theft, and remote execution via standard admin tools.
Flat networks and shared credentials made the internal environment behave like one big trust zone.
Would disabling SMBv1 have prevented it?
It would have reduced one high-risk propagation path, but not eliminated credential-based spread and abuse of admin tooling.
You need segmentation, least privilege, and credential hygiene in addition to patching and protocol hardening.
What is the single most important containment move?
Stop east-west movement: block SMB and remote admin protocols between workstation segments at the network layer.
Endpoint isolation is great when it works; network controls work when endpoints are already on fire.
Should we power off infected machines?
Sometimes. If you have evidence of active spread from a host, removing it from the network helps.
But don’t waste time playing whack-a-mole; focus on network containment and credential rotation. Also preserve forensic evidence if required.
How do backups fail in events like this?
Commonly by dependency: restores require production AD/DNS, the backup catalog is on an infected file share, or backups aren’t immutable and get wiped too.
The cure is restore drills in a clean-room environment and at least one offline/immutable copy.
What should we monitor to detect NotPetya-like lateral movement?
SMB connection spikes, event logs for remote service creation (e.g., PSEXESVC), unusual WMI execution, authentication storms (Kerberos/NTLM),
and sudden reboots across many hosts.
Does application allowlisting help?
Yes, especially against unexpected binaries and suspicious execution from user-writable paths.
But allowlisting won’t save you if the attacker is executing with legitimate admin tools and stolen credentials. Use it as a layer, not a talisman.
How do we recover Active Directory safely if DCs were impacted?
With a documented, rehearsed plan: isolate suspected DCs, validate replication/health, decide whether to restore from known-good backups,
and rotate privileged credentials after re-establishing trust. If you’ve never practiced AD recovery, do not start improvising during a crisis.
What’s the best long-term defense against a supply-chain entry?
You can’t “firewall” your way out of trusting vendors. Reduce impact with segmentation, least privilege, and strong monitoring.
Also harden update workflows: verify signing chains, restrict where update systems can run, and isolate those systems from user endpoints.
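One concrete way to spot-check an update workflow: verify that what the update agent downloaded is signed, and signed by the publisher you expect. A minimal sketch via the mgmt-bastion pattern; the path is illustrative:
cr0x@server:~$ ssh admin@mgmt-bastion "powershell -NoProfile -Command \"Get-AuthenticodeSignature 'C:\Vendors\UpdateAgent\update-package.exe' | Format-List Status, StatusMessage, SignerCertificate\""   # assumption: hypothetical update package path
A valid signature does not prove the vendor’s build pipeline is clean, but an unsigned binary or an unexpected signer is a stop-the-line finding.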
Practical next steps
NotPetya wasn’t impressive because it was novel. It was impressive because it was compatible with how enterprises actually behave:
shared credentials, broad network reachability, and an overconfidence that “internal” equals “safe.”
If you run production systems, do three things this quarter:
- Implement segmentation that blocks workstation-to-workstation SMB/RPC and prove it with testing.
- Fix credential hygiene: LAPS, tiered admin, and no domain admin sessions on endpoints.
- Run a restore drill assuming AD is down, using a clean-room network and a timed objective.
You don’t need to predict the next NotPetya. You need to make your environment boring to destroy: slow to spread, hard to credential-hop,
and easy to rebuild from clean parts. That’s what resilience looks like when the “malware” shows up with a sledgehammer.