You can patch fast, rotate secrets, and run fancy detections—and still get owned because someone else shipped you the problem.
The uncomfortable truth: attackers love vendors because vendors already have distribution, trust, and a direct line into your production fleet.
The pager doesn’t care whether the breach started in your repo or a dependency’s repo. Your customers only see that you ran the code.
So let’s talk about how supply-chain attacks actually work in the real world, what to check first when you suspect one, and what to change so you stop being the easy downstream victim.
What a supply-chain attack really is (and what it isn’t)
A supply-chain attack is when the attacker compromises a trusted upstream element—vendor software, open-source dependency,
build tooling, CI runners, package repositories, container registries, update channels, or even your managed service provider—
so the victim installs the attacker’s payload as part of “normal operations.”
This is not the same as “we got phished” or “someone brute-forced our VPN.” Those are direct attacks. In supply-chain cases,
the attacker weaponizes your trust relationships. They aim for a distribution mechanism, not an individual box.
In operations terms: a supply-chain attack turns your deployment pipeline into an infection vector. The blast radius is defined by
how broadly you distribute artifacts, and how quickly you can answer two questions:
- What exactly did we deploy, to where, and when?
- Can we prove it came from our intended source and wasn’t modified?
If you can’t answer those quickly, you’re not “behind on tooling.” You’re behind on survivability.
One quote worth keeping on a sticky note for incident response:
“Hope is not a strategy.”
— General Gordon R. Sullivan
Why attackers love vendors
Compromising customers one at a time means targeting each of them individually. Compromising a vendor lets you do bulk delivery with branding.
It’s the difference between trying every door in the neighborhood and bribing the locksmith.
Vendors also provide “cover traffic.” Their software is expected to be noisy: phone-home telemetry, update checks, plugins, and integrations.
Those are great hiding places for command-and-control.
What counts as “vendor” in 2026
It’s not just the company you cut a PO to. Your “vendor” is:
- Every dependency you transitively pull from package ecosystems (npm, PyPI, Maven, RubyGems, Go modules).
- Every container base image and OS repository.
- Your CI/CD service, runners, plugins, and marketplace actions.
- Infrastructure-as-code providers and modules.
- Managed services and SaaS that hold secrets and have API access into your systems.
If it can push code, run code, or mint credentials, it belongs in your threat model.
The attack paths you'll actually see
1) Malicious update through a trusted channel
Classic move: compromise the vendor’s build environment or signing keys, ship a signed update, and ride the auto-update mechanism into prod.
Your defenses may even help them: patch automation obediently accelerates the rollout.
2) Dependency substitution (typosquatting and dependency confusion)
If your build can pull from public registries, attackers can publish a package with the right name—or a near-right name—and wait for your build to “helpfully” fetch it.
Dependency confusion is especially nasty when internal package names collide with public namespaces.
Joke #1: Dependency names are like passwords—if you think “util” is fine, you also think “Password123” is edgy.
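A cheap, partial mitigation, assuming an npm-based build and an internal registry (the @acme scope and registry URL below are placeholders): make internal names resolve only from a registry you control, so a public lookalike can never win the race.
cr0x@server:~$ cat .npmrc
# Anything in the internal @acme scope resolves only from the internal registry
@acme:registry=https://registry.internal.example/npm/
The same idea exists for pip (index-url), Maven (mirror settings), and Go (GOPROXY/GOPRIVATE); the common thread is that name resolution never falls through to a public registry you don't control.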
3) Compromised maintainer account
A lot of open-source security is “the maintainer’s laptop is fine, probably.” When the maintainer account gets hijacked,
the attacker can push a new release that looks legitimate. If your process treats “new version exists” as “ship it,” you’ve built a pipeline that auto-installs compromised code.
4) CI/CD runner compromise
Runners are juicy: they see source code, secrets, artifact repositories, and usually have network access to everything.
If a runner is compromised, the attacker can alter build outputs, inject backdoors, or exfiltrate signing keys.
This is how “the repo looks fine” turns into “the artifact is evil.”
5) Artifact registry and mirror poisoning
If you mirror packages or container images for speed, that mirror is now a root-of-trust component. Attackers go after it.
If your clients trust the mirror blindly, you have centralized your failure mode.
6) Build-time script execution
Package installs frequently execute scripts (postinstall hooks, setup.py, etc.). That means “building” is also “running untrusted code.”
In other words, your build system is a production-like environment for attackers.
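Most ecosystems let you refuse install-time code execution outright; a minimal sketch for npm and pip (the flags are standard, whether your dependencies survive them is the interesting question):
cr0x@server:~$ npm ci --ignore-scripts
cr0x@server:~$ pip install --only-binary :all: -r requirements.txt
The first command installs exactly what the lockfile says without running postinstall hooks; the second refuses source builds, so no setup.py executes. Anything that breaks under these flags is a dependency demanding arbitrary code execution at build time; treat that as a finding, not an inconvenience.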
7) SaaS integration abuse
OAuth apps, GitHub Apps, CI marketplace actions, chatops bots—these are “vendors” that can read and write your repos, issues, secrets, and pipelines.
Many of these apps are over-privileged because the person who set them up was optimizing for “make it work,” not “make it survive.”
Facts and historical context you should know
- Software distribution has been a target since the 1980s: early PC malware spread via shared disks and “helpful utilities,” a primitive supply chain.
- Digital code signing became mainstream because integrity didn’t scale manually: once downloads replaced physical media, vendors needed a way to say “this came from us.”
- Package ecosystems turned libraries into infrastructure: modern apps routinely depend on hundreds to thousands of transitive packages, many maintained by volunteers.
- Build systems became networked: CI/CD connected builds to the internet, to registries, to cloud metadata, and to secret stores—great for speed, great for attackers.
- “Dependency confusion” wasn’t new when it got a name: internal/public namespace collisions existed for years; naming it made it easier to brief executives.
- SBOMs rose because procurement needed something auditable: engineering already knew dependencies were messy; SBOMs forced the mess into a format governance could touch.
- Reproducible builds are a response to “trust me” binaries: if you can rebuild and match bit-for-bit, you reduce reliance on the vendor’s build environment.
- Modern attackers optimize for dwell time: supply-chain access often enables long-lived access because payloads can look like legitimate components.
- Cloud identity changed the blast radius: a compromised CI token can be more powerful than a compromised server because it can mint and deploy everywhere.
Fast diagnosis playbook (first/second/third)
Supply-chain incidents are a race between your ability to bound the blast radius and the attacker’s ability to entrench.
The goal of fast diagnosis is not perfect attribution. It’s containment with evidence.
First: Prove what changed
- Inventory deployed artifacts (image digests, package versions, build IDs) across prod/stage/dev.
- Identify the earliest deployment of the suspicious version. Time matters for scope.
- Freeze the update path: stop auto-deploy and auto-update, but don’t wipe evidence.
Second: Validate provenance and integrity
- Check signatures on artifacts and verify the signing identity matches your policy.
- Compare SBOMs between known-good and suspicious builds to find injected dependencies.
- Rebuild from source if possible and compare hashes (or at least dependency lockfiles).
Third: Hunt for execution and persistence
- Search runtime indicators: unexpected outbound connections, new cron/systemd units, new binaries, suspicious processes.
- Audit credentials used by the vendor component: API keys, OAuth tokens, cloud roles.
- Rotate and re-issue credentials after containment; assume anything touched by CI is exposed.
If you do those in order, you can answer the executive questions quickly:
“Are we affected?”, “How big is it?”, “Can we stop it?”, “What do we rotate?”
Practical tasks: commands, outputs, decisions (12+)
The point of these tasks is operational clarity: commands you can run under pressure, the output you should care about,
and the decision that follows. Mix and match based on your stack.
Task 1: Identify running container images by digest (Kubernetes)
cr0x@server:~$ kubectl get pods -A -o=jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.metadata.name}{"\t"}{range .status.containerStatuses[*]}{.image}{"\t"}{.imageID}{"\n"}{end}{end}'
prod api-7b9c7bdfb8-k8r9t registry.local/api:1.4.2 docker-pullable://registry.local/api@sha256:8e6b...
prod worker-6f5c9cc8d9-2qvxn registry.local/worker:2.1.0 docker-pullable://registry.local/worker@sha256:2c1a...
What it means: Tags lie; digests don’t. The sha256 identifies exactly what ran.
Decision: If suspected compromise is tied to a tag, pivot immediately to digests and list every workload using affected digests.
Task 2: List deployments and their image tags (quick scope)
cr0x@server:~$ kubectl get deploy -A -o=jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.metadata.name}{"\t"}{range .spec.template.spec.containers[*]}{.image}{"\n"}{end}{end}'
prod api registry.local/api:1.4.2
prod worker registry.local/worker:2.1.0
What it means: This is your “who might be affected” list, not proof.
Decision: Use it to prioritize which digests to verify and which teams to notify first.
Task 3: Verify an image signature with cosign
cr0x@server:~$ cosign verify --key /etc/cosign/cosign.pub registry.local/api@sha256:8e6b...
Verification for registry.local/api@sha256:8e6b... --
The following checks were performed on each of these signatures:
- The cosign claims were validated
- Existence of the claims in the transparency log was verified offline
- The signatures were verified against the specified public key
What it means: Signature verified against your key. This checks integrity and signer identity (if your key handling is sane).
Decision: If verification fails, quarantine the digest: block it at admission control and stop rollouts that reference it.
Task 4: Fail closed with a Kubernetes validating policy (detect unsigned)
cr0x@server:~$ kubectl get validatingadmissionpolicies
NAME AGE
require-signed-images 41d
What it means: You have a policy object, but that doesn’t mean it’s enforced everywhere.
Decision: Confirm bindings and test with a known-unsigned image. If enforcement is partial, treat that as an incident-prep bug.
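To confirm enforcement rather than existence, check the bindings and run a negative test (the binding name and the scratch namespace are examples):
cr0x@server:~$ kubectl get validatingadmissionpolicybindings -o=jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.policyName}{"\n"}{end}'
require-signed-images-binding    require-signed-images
cr0x@server:~$ kubectl -n scratch run unsigned-test --image=docker.io/library/busybox:latest
If the second command succeeds instead of being rejected at admission, enforcement is partial and you found the gap before an attacker did.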
Task 5: Check what versions of a Debian package are installed (host scope)
cr0x@server:~$ dpkg -l | grep -E '^ii\s+(openssl|curl|sudo)\s'
ii curl 7.88.1-10+deb12u4 amd64 command line tool for transferring data with URL syntax
ii openssl 3.0.11-1~deb12u2 amd64 Secure Sockets Layer toolkit - cryptographic utility
ii sudo 1.9.13p3-1+deb12u1 amd64 Provide limited super user privileges to specific users
What it means: Concrete package versions to map against the vendor advisory / IOCs.
Decision: If versions match an affected range, isolate hosts from the network and plan a rebuild, not a “hotfix and hope.”
Task 6: Verify package origin and repository (APT)
cr0x@server:~$ apt-cache policy curl
curl:
  Installed: 7.88.1-10+deb12u4
  Candidate: 7.88.1-10+deb12u4
  Version table:
 *** 7.88.1-10+deb12u4 500
        500 http://deb.debian.org/debian bookworm/main amd64 Packages
        100 /var/lib/dpkg/status
What it means: Confirms the repository source for the installed package.
Decision: If you see unexpected repos (especially ad-hoc mirrors), assume repository compromise or misconfiguration and cut over to a known-good mirror.
Task 7: Check recent package installs and upgrades (host timeline)
cr0x@server:~$ grep -E " install | upgrade " /var/log/dpkg.log | tail -n 5
2026-01-20 12:14:02 upgrade curl:amd64 7.88.1-10+deb12u3 7.88.1-10+deb12u4
2026-01-20 12:14:03 upgrade openssl:amd64 3.0.11-1~deb12u1 3.0.11-1~deb12u2
What it means: Timestamps and package transitions give you a minimum bound for when the change landed.
Decision: Correlate with deployments and outbound traffic spikes. If install aligns with suspicious behavior, treat that host set as potentially compromised.
Task 8: Find suspicious new systemd units (persistence check)
cr0x@server:~$ systemctl list-unit-files --type=service --state=enabled | tail -n 10
ssh.service enabled
systemd-timesyncd.service enabled
telegraf.service enabled
vendor-agent.service enabled
What it means: A new enabled service is a persistence candidate. “vendor-agent” may be legitimate or may be the problem.
Decision: If the unit appeared around the incident window and isn’t in your golden image spec, isolate and investigate before disabling (evidence first).
Task 9: Inspect the service definition and binary path
cr0x@server:~$ systemctl cat vendor-agent.service
# /etc/systemd/system/vendor-agent.service
[Unit]
Description=Vendor Agent
After=network-online.target
[Service]
ExecStart=/usr/local/bin/vendor-agent --config /etc/vendor/agent.yaml
Restart=always
[Install]
WantedBy=multi-user.target
What it means: Shows exactly what runs and from where.
Decision: If it runs from unmanaged or writable locations (/usr/local, home directories), treat it as higher risk and move the component into immutable, managed paths.
Task 10: Check whether a binary changed recently (quick triage)
cr0x@server:~$ ls -l --time-style=long-iso /usr/local/bin/vendor-agent
-rwxr-xr-x 1 root root 184320 2026-01-20 12:13 /usr/local/bin/vendor-agent
What it means: Modification time gives you an anchor. Not proof, but a lead.
Decision: If mtime aligns with a suspicious update, hash it, compare with known-good, and consider rebuilding affected hosts from a trusted base.
Task 11: Hash the binary and compare across fleet (integrity)
cr0x@server:~$ sha256sum /usr/local/bin/vendor-agent
9d3f5a2c8f6d2c0e7c1a3b7b0d4f11d0f3e0f9a7a1c2b3d4e5f6a7b8c9d0e1f2 /usr/local/bin/vendor-agent
What it means: This is the actual identity of the binary you executed.
Decision: If hashes differ between hosts that “should be identical,” assume uncontrolled distribution and stop trusting “version strings.”
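To turn one hash into a fleet answer, a minimal sketch assuming SSH access and a plain host list in hosts.txt (the file name and the counts are illustrative):
cr0x@server:~$ while read -r h; do ssh -n "$h" sha256sum /usr/local/bin/vendor-agent; done < hosts.txt | sort | uniq -c | sort -rn
     94 9d3f5a2c8f6d2c0e7c1a3b7b0d4f11d0f3e0f9a7a1c2b3d4e5f6a7b8c9d0e1f2  /usr/local/bin/vendor-agent
      3 4b1e9c0d2a7f6e5d4c3b2a190817263544a5b6c7d8e9f0a1b2c3d4e5f6a7b8c9  /usr/local/bin/vendor-agent
The minority hash is your lead: those three hosts got their binary from somewhere else.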
Task 12: Identify unexpected outbound connections (runtime indicators)
cr0x@server:~$ sudo ss -tpn | head -n 12
State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
ESTAB 0 0 10.0.4.12:43122 203.0.113.77:443 users:(("vendor-agent",pid=1842,fd=9))
ESTAB 0 0 10.0.4.12:53618 10.0.1.9:5432 users:(("api",pid=2210,fd=12))
What it means: A vendor agent talking to a public IP may be normal—or may be exfil.
Decision: If the destination is not on your approved egress list, block it at the firewall/egress gateway and start packet capture on an isolated clone.
Task 13: Review DNS queries for odd domains (behavioral clue)
cr0x@server:~$ sudo journalctl -u systemd-resolved --since "2 hours ago" | grep -i "query" | tail -n 5
Jan 22 10:11:03 node-12 systemd-resolved[512]: Querying A record for updates.vendor.example.
Jan 22 10:11:09 node-12 systemd-resolved[512]: Querying A record for telemetry.vendor.example.
What it means: The host is resolving vendor endpoints. (Note: systemd-resolved only logs individual queries at debug verbosity, so you may need resolver logs or a packet capture instead.) Names matter: “updates” is expected; random lookalikes are not.
Decision: If you see newly introduced domains post-update, require the vendor to explain them and block until you’re satisfied.
Task 14: Check Git commit signature verification (repo hygiene)
cr0x@server:~$ git log --show-signature -n 3
commit 1a2b3c4d5e6f708192a3b4c5d6e7f8091a2b3c4d
gpg: Signature made Tue 21 Jan 2026 09:12:33 AM UTC
gpg: using RSA key 4A5B6C7D8E9F0123
gpg: Good signature from "Build Bot <buildbot@example.com>"
Author: Build Bot <buildbot@example.com>
Date: Tue Jan 21 09:12:33 2026 +0000
release: bump api to 1.4.2
What it means: “Good signature” means the commit matches a trusted key, not that the content is safe.
Decision: If signatures are missing or keys are unexpected, freeze releases and audit who can push to release branches.
Task 15: Detect dependency drift with lockfile diffs
cr0x@server:~$ git diff --name-only HEAD~1..HEAD | grep -E 'package-lock.json|poetry.lock|go.sum'
package-lock.json
What it means: A lockfile changed. That’s where surprise dependencies enter.
Decision: Treat lockfile changes as security-sensitive. Require review by someone who can read dependency graphs, not just application code.
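To see what actually entered the lockfile, not just that it changed, a minimal npm-flavored sketch (the added package shown is made up; poetry.lock and go.sum have equivalents):
cr0x@server:~$ git diff HEAD~1..HEAD -- package-lock.json | grep -E '^\+.*"(resolved|integrity)"' | head -n 4
+      "resolved": "https://registry.npmjs.org/http-utils-extra/-/http-utils-extra-1.0.3.tgz",
+      "integrity": "sha512-Qm3...",
Any new "resolved" URL that points outside your internal proxy is a question to answer before merge, not after deploy.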
Task 16: Audit recent CI/CD tokens usage (AWS CloudTrail example)
cr0x@server:~$ aws cloudtrail lookup-events --lookup-attributes AttributeKey=Username,AttributeValue=ci-deploy-role --max-results 3
{
    "Events": [
        {
            "EventName": "AssumeRole",
            "EventTime": "2026-01-22T09:58:14Z",
            "Username": "ci-deploy-role"
        },
        {
            "EventName": "PutObject",
            "EventTime": "2026-01-22T09:58:55Z",
            "Username": "ci-deploy-role"
        }
    ]
}
What it means: Shows activity by a role often used in pipelines.
Decision: If you see unusual regions, times, or API calls for the role, assume CI credential exposure and rotate/re-scope immediately.
Three corporate mini-stories from the trenches
Mini-story 1: The incident caused by a wrong assumption
A mid-sized fintech ran a popular commercial monitoring agent. The agent had deep host access (because of course it did),
and the vendor had a clean reputation. The internal assumption was simple: “If it’s signed, it’s safe.”
An alert came in: a set of application hosts started making outbound TLS connections to an IP range nobody recognized.
NetOps tagged it as “probably vendor telemetry.” The SRE on call wasn’t thrilled, but the change window had been hectic,
and nobody wanted to be the person who blocked monitoring.
The mistake was the assumption that signing equals safety. Signing equals authenticity. If the vendor’s signing pipeline is compromised,
the signature becomes a delivery confirmation.
When they finally pulled the agent binary from three hosts and compared hashes, they didn’t match—despite “same version.”
Turns out there were two distribution paths: a normal update channel and a “hotfix” channel support had enabled months ago.
One path was compromised, one wasn’t. Same version string. Different payload.
The fix wasn’t heroic. They built a policy: only allow agent updates from their internal repository mirror, only by digest,
and require an internal re-sign step. Also: monitoring agents no longer got blanket egress.
The biggest cultural change was admitting that “vendor trust” is not a control.
Mini-story 2: The optimization that backfired
A B2B SaaS company optimized builds for speed. They used a shared CI cache volume across many repositories.
Dependencies, compiled artifacts, even some tooling binaries were cached globally. It was fast. It was also a shared infection surface.
One repo pulled a compromised dependency through a public registry. The dependency had a build script that executed during install.
It dropped a “helpful” binary into the shared cache path, named like a normal tool. Future builds in other repos started using it
because the PATH and cache lookup favored the warmed cache.
The weird symptom: builds succeeded, tests passed, but release artifacts had subtle differences. The runtime bug reports were nonsense:
logging looked slightly off, and a few requests started timing out under load. It smelled like an app regression, not a security incident.
The turning point was when someone rebuilt the same commit on a fresh runner and got a different container digest.
That should never happen in a healthy pipeline. They traced it to the shared cache: the build was not hermetic.
They removed the global cache, or rather: they made caches per-repo and per-branch, with strict ownership and periodic purge.
They also locked dependency sources to internal proxies and required lockfile pinning. Builds got slower. Incidents got rarer.
Leadership stopped worshiping “fast CI” when they were shown the cost of “fast compromise.”
Mini-story 3: The boring but correct practice that saved the day
A healthcare org had a policy everyone complained about: production deployments could only use artifacts from an internal registry,
and the registry only accepted artifacts that were signed by the build system and accompanied by an SBOM.
People called it “bureaucracy.” It was, honestly, a little bureaucratic.
A vendor advisory dropped: a widely used base image in the ecosystem had a compromised variant circulating.
Teams panicked because their images were “FROM that base.” Slack went feral.
The incident commander ran a simple query: list all running images by digest, then check which digests existed in the internal registry.
Most production digests did; a handful did not—those came from a skunkworks environment that bypassed the standard pipeline.
They blocked external registry pulls at the cluster egress and admission layer. Production stayed stable.
The only workloads that broke were the ones already violating policy, which made the conversation refreshingly short.
The boring practice wasn’t “we are secure.” It was “we can prove what we run.”
That’s the difference between an incident and a rumor.
Joke #2: Your audit logs are like vegetables—nobody loves them, but skipping them eventually becomes a lifestyle problem.
Checklists / step-by-step plan
Step-by-step: harden your intake pipeline (what you pull into builds)
- Inventory all external sources: registries, repos, Git submodules, CI actions, Terraform modules, vendor update endpoints.
- Implement internal proxies/mirrors for packages and container images; make builds pull from one controlled place (see the pip.conf sketch after this list).
- Pin dependencies using lockfiles and immutable digests. Ban “latest” and floating tags in production manifests.
- Require provenance: enforce that artifacts must be signed and (ideally) have attestation/provenance metadata.
- Scan, but don’t worship scanning: use scanning to prioritize, not to declare victory. A clean scan is not a clean bill of health.
- Separate build and deploy identities: the role that builds should not be able to deploy to prod without policy checks.
- Make updates observable: record every artifact digest deployed, and keep it queryable for at least your incident lookback window.
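A minimal sketch of the "one controlled place" item for Python builds; the proxy hostname is a placeholder:
cr0x@server:~$ cat /etc/pip.conf
[global]
index-url = https://pypi-proxy.internal.example/simple/
Pair it with egress rules that block the public index from build runners, so a misconfigured job fails loudly instead of quietly pulling from the internet.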
Step-by-step: harden CI/CD runners (where supply-chain attacks become real)
- Ephemeral runners: prefer short-lived runners over long-lived pets. Compromise should die with the VM/container.
- Network egress control: builds should not have open internet access by default. Allow-list registries and package proxies.
- Secrets discipline: use short-lived tokens, OIDC federation where possible, and scope credentials per job (see the workflow sketch after this list).
- Cache safely: per-repo/per-branch caches, no shared writable paths across trust boundaries, and periodic purges.
- Protect signing keys: keep them in HSM/KMS-backed systems; never leave them lying around in runner filesystems.
- Record build metadata: who built it, from what commit, with what dependencies, on what runner image.
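For the secrets-discipline item, a minimal sketch of OIDC federation assuming GitHub Actions and AWS; the role ARN and region are placeholders and the workflow is abridged:
cr0x@server:~$ cat .github/workflows/deploy.yml
# (abridged: name, triggers, and build steps omitted)
jobs:
  deploy:
    runs-on: ubuntu-latest
    permissions:
      id-token: write      # let this job request a short-lived OIDC token
      contents: read
    steps:
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/ci-deploy-role
          aws-region: eu-west-1
No long-lived cloud keys sit on the runner; the job receives a credential that expires and can be constrained by the role's trust policy to a specific repository and branch.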
Step-by-step: production guardrails (where you stop bad artifacts)
- Admission control that enforces signature verification and blocks unknown registries.
- Egress filtering so vendor components can’t call arbitrary endpoints (see the NetworkPolicy sketch after this list).
- Runtime monitoring tuned for vendors: baseline expected domains/ports/processes; alert on deviations.
- Rollback-by-digest: ability to revert to known-good digests quickly, without “rebuild and pray.”
- Credential rotation drills: if a vendor component is compromised, you should know exactly which secrets it could touch.
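For the egress-filtering item, a minimal Kubernetes NetworkPolicy sketch; it assumes a CNI that enforces egress rules, and the names, namespace, and CIDR are placeholders:
cr0x@server:~$ cat vendor-agent-egress.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: vendor-agent-egress
  namespace: prod
spec:
  podSelector:
    matchLabels:
      app: vendor-agent
  policyTypes:
    - Egress
  egress:
    - to:
        - ipBlock:
            cidr: 203.0.113.0/24   # only the vendor's documented endpoint range
      ports:
        - protocol: TCP
          port: 443
    - to:
        - namespaceSelector: {}    # allow cluster DNS
      ports:
        - protocol: UDP
          port: 53
Everything else the agent tries to reach is dropped, which turns "the agent phoned somewhere weird" from a hunt into an alert.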
Common mistakes (symptoms → root cause → fix)
1) “We deployed the same version everywhere, but behavior differs”
Symptoms: Same semantic version reported, different hashes, inconsistent network activity.
Root cause: You track by tag/version string instead of immutable digest; multiple distribution channels exist (mirror vs direct, stable vs hotfix).
Fix: Enforce digest pinning and a single intake path. Record digests at deploy time and block drift at admission.
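A minimal sketch of what the fix looks like in a manifest; the registry and digest are the same placeholders used in the tasks above:
cr0x@server:~$ grep "image:" k8s/api-deployment.yaml
        image: registry.local/api@sha256:8e6b...
Anything still referencing registry.local/api:1.4.2 by tag is exactly the drift you are trying to eliminate.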
2) “The repo looks clean, but the built artifact is different”
Symptoms: Rebuilding the same commit yields different artifacts; only CI builds are affected.
Root cause: Non-hermetic builds: dependency fetching at build time, mutable base images, shared caches, compromised runner/toolchain.
Fix: Make builds deterministic where possible: pinned deps, locked base images, isolated caches, runner hardening, and provenance attestations.
3) “Security scan says green, but we still got popped”
Symptoms: No CVEs flagged; still see suspicious outbound traffic or data access.
Root cause: Supply-chain payload isn’t a known CVE; it’s a malicious feature. Scanners detect known badness, not intent.
Fix: Add provenance checks, behavioral monitoring, and strict egress policies. Treat scanning as one input, not a gate of truth.
4) “We can’t tell who ran what, where”
Symptoms: During incident response, teams argue about versions, and nobody can produce a definitive inventory quickly.
Root cause: No deployment event logging with digests; insufficient retention; too many manual deploy paths.
Fix: Centralize deployment metadata, require change management hooks, and make the inventory queryable (not buried in Slack threads).
5) “Vendor says rotate keys, but we don’t know which keys”
Symptoms: Panic rotations, service outages, missed credentials.
Root cause: Over-privileged vendor integrations and untracked secrets sprawl.
Fix: Build a secrets map: which system uses which credential, scope it, and rotate on a schedule so emergency rotation isn’t your first time.
6) “We blocked the bad package, but it came back”
Symptoms: Dependency reappears in builds after removal; developers reintroduce it unknowingly.
Root cause: Transitive dependency pull; lack of policy enforcement; lockfile not pinned or not enforced in CI.
Fix: Use lockfiles as the source of truth, enforce “no lockfile drift,” and add allow/deny policies at the proxy/mirror layer.
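A minimal CI guard for lockfile drift, assuming npm; any ecosystem with a lockfile has an equivalent:
cr0x@server:~$ npm ci
cr0x@server:~$ git diff --exit-code package-lock.json
The first command fails if package.json and package-lock.json disagree; the second fails the job if anything rewrote the lockfile during the build. Run both in the pipeline, not on laptops, so "it worked on my machine" cannot quietly reintroduce the package.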
7) “Our mirror made it worse”
Symptoms: Many systems get the compromised package quickly, all from the same internal source.
Root cause: Mirror is trusted but not monitored, and ingest is automatic without verification; no immutability or retention rules.
Fix: Add ingest verification (signatures, checksums), make the mirror append-only/immutable for releases, and alert on unexpected upstream changes.
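A minimal ingest check, assuming upstream publishes a checksum manifest; the filenames are placeholders:
cr0x@server:~$ sha256sum --ignore-missing -c SHA256SUMS
vendor-agent_2.4.1_amd64.deb: OK
Anything that fails verification never enters the mirror, and any change to an already-mirrored release raises an alert instead of a silent overwrite.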
FAQ
1) Are supply-chain attacks mostly an open-source problem?
No. Open-source is visible, so it gets discussed. Commercial vendors get compromised too—and sometimes have broader distribution and deeper privileges.
The risk is about trust and access, not license type.
2) If we require code signing, are we safe?
Safer, not safe. Signing tells you who signed it. If the signing key or signing pipeline is compromised, you’ll faithfully verify a malicious artifact.
You need key protection, provenance, and detection when signing identities change.
3) What’s the difference between an SBOM and provenance?
An SBOM lists what’s inside (components). Provenance tells you how it was built (process, environment, identities).
SBOM helps you scope exposure; provenance helps you decide whether to trust the artifact in the first place.
4) Do we need reproducible builds to be “good” at this?
Reproducible builds are great, but don’t let perfection block progress. Start with immutable digests, artifact signing, and locked dependencies.
Reproducibility is a longer project—worth it for critical components.
5) How do we handle vendors that require outbound internet access?
Treat it like any other high-risk integration: explicit allow-lists, TLS inspection where appropriate, and logging.
Ask the vendor for a fixed set of domains/IP ranges and a documented protocol. If they can’t provide it, that’s a risk decision, not a technical detail.
6) What should we do when a vendor publishes “rotate credentials” guidance?
Rotate in this order: (1) credentials used by CI/CD and build systems, (2) vendor integration tokens with broad API access,
(3) long-lived shared secrets. Also audit usage logs before and after rotation to spot continued abuse.
7) Can we just block public registries entirely?
Often, yes—and you should for production builds. Developers may need internet access for exploration, but CI should pull through controlled proxies.
Blocking is simple; the hard part is making the proxy usable and fast enough that teams don’t bypass it.
8) How do we prevent dependency confusion?
Use private namespaces where possible, configure package managers to prefer internal registries, and block unknown packages at the proxy.
The operational trick: make the secure path the easiest path, or people will find creative ways to break it.
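A quick way to verify the resolution path on a developer machine or runner, assuming npm and an internal scope (names and URLs are placeholders):
cr0x@server:~$ npm config get @acme:registry
https://registry.internal.example/npm/
cr0x@server:~$ npm view @acme/billing dist.tarball
https://registry.internal.example/npm/@acme/billing/-/billing-3.2.1.tgz
If either answer points at the public registry, that machine is one ambiguous package name away from dependency confusion.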
9) What’s the first metric you track to know you’re improving?
Mean time to inventory: how long it takes to answer “where is digest X running?” across environments.
If it’s more than minutes, your next incident will be expensive.
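One concrete way to measure it, assuming Kubernetes and jq on the incident commander's machine; the digest fragment and output are illustrative:
cr0x@server:~$ kubectl get pods -A -o json | jq -r --arg d "sha256:8e6b" '.items[] | select(.status.containerStatuses[]?.imageID | contains($d)) | .metadata.namespace + "/" + .metadata.name'
prod/api-7b9c7bdfb8-k8r9t
Start a stopwatch when someone asks; if getting that answer across every environment takes longer than a few minutes, that is the gap to close before the next advisory.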
Conclusion: next steps you can execute this week
Supply-chain security isn’t a vibe. It’s the ability to constrain trust, prove provenance, and respond without guessing.
Attackers are betting you can’t answer basic questions under pressure. Prove them wrong.
- Pick one: enforce image digest pinning in production, or block external registry pulls. Do at least one this week.
- Make inventory easy: store deployment metadata (digests, package versions, build IDs) where incident commanders can query it quickly.
- Harden CI: isolate runners, restrict egress, and move signing keys out of runner filesystems.
- Turn vendor trust into vendor constraints: least privilege, explicit egress, and verified artifacts only.
- Run a drill: simulate “vendor update compromised” and measure time-to-scope, time-to-block, time-to-rotate.
If you do nothing else: stop deploying by tag, start deploying by digest, and require a signature you actually verify.
That alone turns a whole class of supply-chain incidents from “mysterious widespread infection” into “blocked at the gate.”