Nothing ruins a clean deploy like an image that builds fine and then faceplants at runtime because you stripped out the one shared library your binary quietly needs. Your CI green checkmark becomes a little lie you tell yourself.
Multi-stage builds are the right tool for shrinking images without turning production into an archeological dig. But they’re also a sharp tool. Used well, they cut fat. Used carelessly, they cut arteries.
Table of contents
- What multi-stage actually does (and what it doesn’t)
- Interesting facts and quick history (because context saves outages)
- Core patterns that work in production
- The runtime contract: what you must keep
- Practical tasks: commands, outputs, and decisions (12+)
- Fast diagnosis playbook
- Common mistakes: symptoms → root cause → fix
- Three corporate mini-stories from the trenches
- Checklists / step-by-step plan
- FAQ
- Conclusion: next steps that pay rent
What multi-stage actually does (and what it doesn’t)
Multi-stage builds are Docker’s way of letting you use one image to build and another to run—inside a single Dockerfile. You compile in a fat stage with compilers and headers, then copy the results into a thin stage that contains only what the app needs at runtime.
The important part: multi-stage builds do not magically make your application “minimal-friendly.” They simply make it easier to separate build-time dependencies from runtime dependencies. If your app needs glibc at runtime and you ship it into a musl-based Alpine image, you didn’t “optimize.” You planted a time bomb.
Why ops people like it
Smaller images pull faster, store cheaper, scan faster, and have less attack surface. That’s not theoretical: it’s fewer bytes crossing the network on deploy day and fewer packages to patch at 2 a.m.
What it changes in practice
- Build reproducibility: you can pin build toolchains without bloating runtime.
- Security posture: fewer runtime packages equals fewer CVEs to triage.
- Failure modes: you’ll see missing libraries, missing CA certificates, missing timezone data, missing shell, missing users—things you didn’t realize you were relying on.
One quote to keep taped to your monitor:
“Hope is not a strategy.” — Gordon R. Dickson
Multi-stage builds are how you stop hoping your runtime image “probably has what it needs.” You verify it.
Interesting facts and quick history (because context saves outages)
Some short, concrete context points that explain why multi-stage builds became standard practice:
- Docker added multi-stage builds in 2017 (Docker 17.05). Before that, people used brittle “builder containers” and manual copy steps.
- Layer caching shaped Dockerfile style: ordering commands to maximize cache reuse became a skill because rebuilding everything was painfully slow.
- Alpine became popular because it was tiny, not because it was universally compatible. The musl vs glibc mismatch still bites teams shipping prebuilt binaries.
- Distroless images (minimal runtime images without package managers and shells) gained traction as supply-chain and attack-surface concerns grew.
- OCI image format standardized the container image layout across runtimes, which made image tooling (inspection, scanning) more consistent.
- BuildKit changed the game: improved parallelism, better caching, secrets mounts, and advanced copy patterns made multi-stage builds more maintainable.
- SBOMs became mainstream as organizations started needing to answer “what’s inside this image?” during audits and incident response.
- Language ecosystems responded differently: Go embraced static binaries; Node and Python leaned into slimmer bases; Java added jlink/jdeps to trim runtimes.
Core patterns that work in production
Pattern 1: Builder + runtime with explicit artifacts
This is the canonical multi-stage pattern: build in one stage, copy a binary or packaged app into a runtime stage.
cr0x@server:~$ cat Dockerfile
# syntax=docker/dockerfile:1
FROM golang:1.22-bookworm AS build
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -o /out/app ./cmd/app
FROM gcr.io/distroless/static-debian12:nonroot AS runtime
COPY --from=build /out/app /app
USER nonroot:nonroot
ENTRYPOINT ["/app"]
Opinion: if you can ship a truly static binary, do it. It’s the cleanest operational artifact. But only if you understand what you’re giving up (glibc features, DNS nuances, OS-level tooling). Static isn’t “better,” it’s “different.”
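If you go the static route, verify it instead of trusting the build flags. A minimal check, assuming the build stage is tagged myapp:build (the tag is an assumption; adjust to your CI naming). For a truly static binary, ldd should report something like the line below; if it prints a list of libraries instead, you’re not static:
cr0x@server:~$ docker build --target build -t myapp:build .
cr0x@server:~$ docker run --rm myapp:build sh -c "ldd /out/app || true"
        not a dynamic executable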
Pattern 2: “Slim base” runtime (still has a shell)
Distroless is excellent for hardening. It’s also a pain when you’re on-call and need to quickly inspect the container. Sometimes the correct answer is “slim Debian with a shell,” especially when your organization’s maturity says you’ll be debugging live.
cr0x@server:~$ cat Dockerfile
# syntax=docker/dockerfile:1
FROM node:22-bookworm AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
FROM node:22-bookworm-slim AS runtime
WORKDIR /app
ENV NODE_ENV=production
COPY package*.json ./
RUN npm ci --omit=dev
COPY --from=build /app/dist ./dist
USER node
CMD ["node", "dist/server.js"]
If you’re shrinking images, do it without deleting your ability to operate the service. A shell isn’t evil; shipping compilers and package managers in runtime is.
Pattern 3: “Toolbox stage” for debugging (kept out of production)
Keep production images minimal, but don’t pretend you’ll never need tools. Use an extra stage as a toolbox so you can reproduce issues locally or in a controlled debug deployment.
cr0x@server:~$ cat Dockerfile
# syntax=docker/dockerfile:1
FROM debian:bookworm AS toolbox
RUN apt-get update && apt-get install -y --no-install-recommends \
curl ca-certificates iproute2 dnsutils procps strace \
&& rm -rf /var/lib/apt/lists/*
# other stages ...
You don’t ship the toolbox stage. You keep it so your engineers can attach the same tools to the same filesystem layout when debugging.
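Building and attaching the toolbox is deliberately boring. A sketch, assuming a running container named myapp (the name and tag are assumptions):
cr0x@server:~$ docker build --target toolbox -t myapp:toolbox .
cr0x@server:~$ docker run --rm -it --pid container:myapp --network container:myapp myapp:toolbox bash
Sharing the PID and network namespaces lets you strace processes and test connectivity exactly as the app sees it, without shipping any of those tools in the production image.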
Pattern 4: Reusing artifacts across multiple final images
A good multi-stage Dockerfile can emit multiple targets: a runtime image, a debug image, a test image. Same source, different outputs.
cr0x@server:~$ docker buildx build --target runtime -t myapp:runtime .
[+] Building 18.7s (16/16) FINISHED
=> exporting to image
=> => naming to docker.io/library/myapp:runtime
This is how you keep parity without shipping junk. Same build, different target.
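A sketch of what a debug target can look like when layered on the slim runtime from Pattern 2 (stage names and the tool list are assumptions; this approach won’t work on distroless bases, which have no package manager):
# syntax=docker/dockerfile:1
# ... build and runtime stages as in Pattern 2 ...
FROM runtime AS debug
USER root
RUN apt-get update && apt-get install -y --no-install-recommends \
    curl ca-certificates procps iproute2 \
    && rm -rf /var/lib/apt/lists/*
USER node
Then build it only when you need it: docker buildx build --target debug -t myapp:debug .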
Joke #1: If your Dockerfile has one stage, it’s not “simple,” it’s “optimistic.”
The runtime contract: what you must keep
The runtime image is a contract between your app and the OS userland you ship. Multi-stage builds make it easy to violate that contract by accident. Here’s what tends to go missing:
1) Shared libraries and dynamic loader
If you compile with CGO enabled (common for DNS behavior, SQLite bindings, image processing, etc.), you’ll need the right libc and loader in the runtime stage. If you build on Debian and run on Alpine, you can get the classic:
exec /app: no such file or directory (even though the file exists), because the dynamic loader path doesn’t exist in the runtime.
2) CA certificates
Your service talks to HTTPS endpoints. Without CA certs, TLS fails. Many “minimal” images don’t include them by default.
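Two common fixes in the runtime stage; a sketch, assuming a Debian-based runtime and a builder stage named build that already has the bundle installed:
# Option 1: install the bundle in a Debian/Ubuntu runtime
RUN apt-get update && apt-get install -y --no-install-recommends ca-certificates \
    && rm -rf /var/lib/apt/lists/*
# Option 2: copy the bundle from the builder (useful when the runtime has no package manager)
COPY --from=build /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/ca-certificates.crt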
3) Timezone data and locale
UTC-only is fine until it isn’t—reporting jobs, customer-facing timestamps, and compliance logging can get weird fast.
4) Users, permissions, and file ownership
Multi-stage copies preserve file ownership unless you tell Docker otherwise. If you run as non-root (you should), verify the files are readable/executable.
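A minimal sketch of the relevant lines in a Pattern 2-style runtime stage: hand ownership to the runtime user for paths it must write, and create those paths explicitly (the cache directory is an assumption):
COPY --from=build --chown=node:node /app/dist ./dist
RUN mkdir -p /app/cache && chown node:node /app/cache
USER node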
5) Entrypoint semantics
Shell form vs exec form matters. If you rely on shell expansion but removed the shell, you’ll find out at runtime, not build time.
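The difference in one place (values are illustrative):
# Shell form: Docker wraps this in /bin/sh -c, so the image must ship a shell
ENTRYPOINT node dist/server.js --port $PORT
# Exec form: no shell required, but also no variable expansion or globbing
ENTRYPOINT ["node", "dist/server.js", "--port", "8080"]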
6) Observability expectations
If your on-call playbook assumes curl exists inside the container, distroless will disappoint you. Use sidecars, ephemeral debug containers, or separate debug targets. Decide intentionally.
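On Kubernetes, ephemeral debug containers are the usual answer for distroless: attach tooling to a running pod without rebuilding the image. A sketch, assuming a pod named myapp-7d9f and an app container named app (both names are assumptions), on a cluster that supports ephemeral containers:
cr0x@server:~$ kubectl debug -it myapp-7d9f --image=busybox:1.36 --target=app -- sh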
Practical tasks: commands, outputs, and decisions (12+)
These are real tasks you can run today. Each includes: command, what output means, and what decision you make.
Task 1: Compare image sizes and decide if optimization matters
cr0x@server:~$ docker images --format "table {{.Repository}}\t{{.Tag}}\t{{.Size}}"
REPOSITORY TAG SIZE
myapp fat 1.12GB
myapp slim 142MB
myapp distroless 34.6MB
Meaning: you’ve got an order-of-magnitude reduction available.
Decision: if deploys are slow, registry storage is costly, or scanners are drowning, you pursue multi-stage. If your image is already ~60–150MB and stable, prioritize correctness over shaving another 10MB.
Task 2: Inspect layers to find what’s bloating the image
cr0x@server:~$ docker history --no-trunc myapp:fat | head -n 8
IMAGE CREATED CREATED BY SIZE COMMENT
3f2c... 2 hours ago /bin/sh -c npm install 402MB
b18a... 2 hours ago /bin/sh -c apt-get update && apt-get install 311MB
9c10... 3 hours ago /bin/sh -c pip install -r requirements.txt 198MB
...
Meaning: you’re shipping build dependencies and package caches.
Decision: move compile/install tooling to builder stage; ensure caches are not copied into runtime; use npm ci --omit=dev or equivalent.
Task 3: Confirm which stages exist and what they’re named
cr0x@server:~$ docker buildx bake --print 2>/dev/null | sed -n '1,40p'
{
"target": {
"default": {
"context": ".",
"dockerfile": "Dockerfile"
}
}
}
Meaning: your build config is simple; stage discovery is in the Dockerfile.
Decision: explicitly name stages (AS build, AS runtime) so copy lines don’t rot.
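If you just want the stage names without parsing build config, grep is enough; the output below assumes the Node Dockerfile from Pattern 2:
cr0x@server:~$ grep -E '^FROM' Dockerfile
FROM node:22-bookworm AS build
FROM node:22-bookworm-slim AS runtime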
Task 4: Build a specific target to validate runtime stage alone
cr0x@server:~$ docker build --target runtime -t myapp:runtime .
[+] Building 21.3s (12/12) FINISHED
=> exporting to image
=> => naming to docker.io/library/myapp:runtime
Meaning: runtime stage builds and exports. Good start.
Decision: wire CI to build the runtime target explicitly, not just the default stage.
Task 5: Run the container and watch for the “it builds, it doesn’t run” class of failures
cr0x@server:~$ docker run --rm myapp:runtime
standard_init_linux.go:228: exec user process caused: no such file or directory
Meaning: usually a dynamic loader / libc mismatch, not a missing binary.
Decision: check whether the binary is dynamically linked and whether the runtime base has the correct loader (glibc vs musl). Fix base image choice or compile flags.
Task 6: Check if a binary is dynamically linked (builder stage inspection)
cr0x@server:~$ docker run --rm --entrypoint /bin/bash myapp:build -lc "file /out/app && ldd /out/app || true"
/out/app: ELF 64-bit LSB pie executable, x86-64, dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, ...
linux-vdso.so.1 (0x00007ffd6b3d9000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f9d7c3b4000)
/lib64/ld-linux-x86-64.so.2 (0x00007f9d7c5b6000)
Meaning: it’s dynamically linked and expects Debian-style glibc loader paths.
Decision: run on Debian/Ubuntu/distroless-glibc compatible base, or rebuild static (if viable), or vendor required libs carefully.
Task 7: Verify CA certificates exist in the runtime image
cr0x@server:~$ docker run --rm --entrypoint /bin/sh myapp:runtime -lc "ls -l /etc/ssl/certs/ca-certificates.crt 2>/dev/null || echo missing"
missing
Meaning: TLS calls will fail unless your language runtime bundles certs (many don’t, or not fully).
Decision: install ca-certificates in runtime stage (or copy from builder), or switch to a base that includes them.
Task 8: Test outbound TLS from inside the container (when tools exist)
cr0x@server:~$ docker run --rm --entrypoint /bin/sh myapp:slim -lc "curl -fsS https://example.com | head"
<!doctype html>
<html>
<head>
Meaning: certs and DNS work, network egress works, basic runtime sanity exists.
Decision: if this fails, don’t guess—check certs, DNS, proxies, and network policies before touching the Dockerfile again.
Task 9: Confirm the runtime image contains required config files
cr0x@server:~$ docker run --rm --entrypoint /bin/sh myapp:runtime -lc "ls -l /app/config || true"
ls: cannot access '/app/config': No such file or directory
Meaning: you forgot to copy config defaults, templates, migrations, or static assets.
Decision: explicitly COPY those artifacts from the build context or from builder output; don’t rely on “it was there in the old image.”
Task 10: Confirm user and file permissions (non-root runtime)
cr0x@server:~$ docker run --rm --entrypoint /bin/sh myapp:slim -lc "id && ls -l /app && test -x /app/app && echo executable"
uid=1000(node) gid=1000(node) groups=1000(node)
total 18240
-rwxr-xr-x 1 root root 18673664 Jan 2 12:11 app
executable
Meaning: the binary is executable, but owned by root.
Decision: decide if ownership matters. For read/execute it’s fine; for writing logs, temp files, caches, it will explode. Prefer COPY --chown=node:node for app directories that need writes.
Task 11: Measure build time and cache effectiveness
cr0x@server:~$ DOCKER_BUILDKIT=1 docker build --progress=plain -t myapp:test . 2>&1 | sed -n '1,35p'
#1 [internal] load build definition from Dockerfile
#2 [internal] load metadata for docker.io/library/node:22-bookworm
#3 [build 1/6] WORKDIR /app
#4 [build 2/6] COPY package*.json ./
#5 [build 3/6] RUN npm ci
#5 CACHED
#6 [build 4/6] COPY . .
#7 [build 5/6] RUN npm run build
...
Meaning: dependency install step is cached; source copy invalidates later layers only.
Decision: keep dependency descriptors copied before source, so small code changes don’t trigger full dependency rebuilds.
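To make the dependency layer cheaper even when it does rebuild, BuildKit cache mounts keep the package manager cache out of the image while persisting it between builds. A sketch for the npm step (requires the # syntax=docker/dockerfile:1 directive already used in the patterns above):
RUN --mount=type=cache,target=/root/.npm npm ci
# same idea for other ecosystems, e.g. pip:
# RUN --mount=type=cache,target=/root/.cache/pip pip install -r requirements.txt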
Task 12: Verify what actually ended up in the runtime image filesystem
cr0x@server:~$ docker run --rm --entrypoint /bin/sh myapp:slim -lc "du -sh /app/* | sort -h | tail"
4.0K /app/package.json
16M /app/node_modules
52M /app/dist
Meaning: node_modules is smaller than dist; you’re shipping only production deps (good).
Decision: if node_modules is huge, you probably shipped devDependencies or a build cache. Fix the install command and .dockerignore.
Task 13: Detect accidental inclusion of secrets or build junk
cr0x@server:~$ docker run --rm --entrypoint /bin/sh myapp:slim -lc "find /app -maxdepth 2 -name '*.pem' -o -name '.env' -o -name 'id_rsa' 2>/dev/null | head"
Meaning: no obvious secret files found (this is not a full audit, but it’s a fast sanity check).
Decision: if anything shows up, stop the line. Fix the build context and .dockerignore, then rotate secrets as needed.
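A starting .dockerignore; entries are illustrative, so adapt them to your repo:
cr0x@server:~$ cat .dockerignore
.git
.env
*.pem
id_rsa*
node_modules
coverage
dist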
Task 14: Quick CVE surface comparison via package listing
cr0x@server:~$ docker run --rm --entrypoint /bin/sh myapp:slim -lc "dpkg -l | wc -l"
196
Meaning: 196 packages installed. That’s a lot of potential patch surface.
Decision: if you can move to distroless or a slimmer base while keeping runtime contract intact, you reduce patch churn. If you can’t, at least pin and patch predictably.
Task 15: Confirm entrypoint and cmd are what you think they are
cr0x@server:~$ docker inspect myapp:runtime --format '{{json .Config.Entrypoint}} {{json .Config.Cmd}}'
["/app"] null
Meaning: container uses exec-form entrypoint; no shell required.
Decision: keep it this way. If you see ["/bin/sh","-c",...] in a minimal image, expect runtime failures when the shell isn’t there.
Fast diagnosis playbook
When a multi-stage “optimization” breaks production, you don’t have time for container philosophy. You need a sequence that finds the bottleneck quickly.
First: classify the failure (start vs serving vs external calls)
- Container won’t start: entrypoint missing, loader mismatch, permissions, wrong architecture.
- Starts then crashes: missing config/assets, missing environment variables, segfault from libc mismatch, wrong working directory.
- Starts and serves but broken features: missing CA certs, timezone, fonts, image codecs, locales, DNS behavior differences.
Second: confirm what you shipped (not what you meant to ship)
- Inspect entrypoint/cmd (docker inspect).
- List expected files inside runtime (ls, du).
- Check binary link mode (file, ldd in the builder stage).
Third: validate the runtime contract with one probing request
- For HTTPS clients: check CA bundle exists; test TLS to a known endpoint (in a debug image if needed).
- For DNS-heavy apps: verify resolver config and behavior; confirm /etc/resolv.conf inside container.
- For filesystem writes: check user, permissions, and writable paths (/tmp, app cache dirs).
Fourth: decide the remediation path
- Base image mismatch: switch runtime base, don’t duct-tape libraries unless you love surprises.
- Missing artifacts: copy them explicitly, add tests to fail the build when absent.
- Too minimal to debug: add a debug target; don’t shove shells into production “just in case.”
Common mistakes: symptoms → root cause → fix
This section exists because most failures repeat. Teams change; physics doesn’t.
1) “exec … no such file or directory” but the file exists
- Symptom: container exits immediately; error mentions “no such file or directory.”
- Root cause: dynamic loader path missing (glibc binary on musl image), wrong architecture, or CRLF line endings for scripts.
- Fix: run file and ldd in the builder stage; align the base image with your libc; ensure correct GOOS/GOARCH; use exec-form entrypoint; normalize line endings.
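For the wrong-architecture variant (common when building on an Apple Silicon laptop and deploying to amd64 nodes), check what the image actually declares; the image name is an assumption:
cr0x@server:~$ docker inspect myapp:runtime --format '{{.Os}}/{{.Architecture}}'
linux/amd64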
2) TLS errors: “x509: certificate signed by unknown authority”
- Symptom: app starts but can’t call HTTPS dependencies.
- Root cause: missing CA certificate bundle in runtime image.
- Fix: install ca-certificates in the runtime stage, or copy the cert bundle from the builder; verify with a TLS probe.
3) Works in CI, fails in production: missing config/templates/migrations
- Symptom: runtime errors about missing files; endpoints return 500; startup complains about templates.
- Root cause: multi-stage copy only brought the binary, not the support files.
- Fix: define an explicit artifact directory in builder output and copy it wholesale; add a build-time check that required paths exist.
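The build-time check can be as dumb as a RUN test at the end of the builder stage; the paths are assumptions:
# fail the build if required artifacts are missing from the output directory
RUN test -x /out/app && test -d /out/migrations && test -d /out/templates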
4) Permission denied when writing logs/cache
- Symptom: app crashes when writing to /app, /var/log, or cache dirs; works as root.
- Root cause: runtime runs as non-root, but copied files/dirs are owned by root and not writable.
- Fix: create writable directories in the runtime stage; use COPY --chown; set USER intentionally; prefer writing to /tmp or dedicated volume paths.
5) “sh: not found” or “bash: not found”
- Symptom: container fails because it tries to execute a shell command.
- Root cause: shell-form CMD/ENTRYPOINT in a minimal image that doesn’t ship a shell.
- Fix: use exec-form JSON arrays; remove shell scripts or ship an appropriate base; for complex startup logic, consider a tiny init binary.
6) Image is small but builds are now painfully slow
- Symptom: CI build time increases after “optimization.”
- Root cause: poor caching order, copying entire repo before dependency install, or disabling BuildKit features.
- Fix: copy dependency manifests first; use BuildKit; use .dockerignore; consider cache mounts for package managers.
7) “It runs locally, fails on Kubernetes” after shrinking
- Symptom: local Docker run fine; in cluster it fails with DNS/timeouts/permissions.
- Root cause: cluster security context runs as different UID; read-only filesystem; network policies stricter; missing tools to introspect.
- Fix: align runtime user; write to approved paths; test with the same security context; provide a debug target or ephemeral debugging method.
Joke #2: Distroless is great until you realize you’ve containerized your ability to panic quietly.
Three corporate mini-stories from the trenches
Mini-story 1: The incident caused by a wrong assumption
They had a Go service that “obviously” compiled to a static binary. That’s what everyone says about Go services right before they enable CGO for one tiny feature and forget about it. The team switched the runtime base from Debian slim to Alpine because the size looked fantastic on a slide.
The deploy rolled out during business hours. A few pods started, then immediately crashed. The logs were insulting: “no such file or directory.” People checked the image; the binary was right there. Someone suggested the registry corrupted it. Someone else suggested Kubernetes was “having one of those days.” Classic.
The actual issue: the binary was dynamically linked with glibc and expected /lib64/ld-linux-x86-64.so.2. Alpine didn’t have it. The binary couldn’t even reach main(). It wasn’t an application bug. It was a loader problem.
The fix was boring: switch the runtime base back to a glibc-compatible image, then decide whether CGO was truly needed. Later they rebuilt with CGO_ENABLED=0 and verified with file and ldd checks in CI.
The lesson stuck: “minimal” is a runtime contract, not a diet plan. If you can’t describe your binary’s dependencies, you can’t safely shrink the container.
Mini-story 2: The optimization that backfired
A platform team decided to speed up CI by caching everything. They introduced aggressive BuildKit caching and copied the entire monorepo early in the Dockerfile so builds had “context.” The image got smaller thanks to multi-stage. Build times, however, turned into a slow-motion disaster.
Why? Because copying the whole repo before dependency installation invalidated the cache on almost every commit. A doc change in a neighboring service caused the Node dependency layer to rebuild. Engineers started avoiding merges near release windows because builds were too slow to iterate safely.
The team then added a second “optimization”: a blanket RUN rm -rf cleanup step late in the builder stage. It didn’t help runtime size (multi-stage already discarded the builder), but it did increase build time and reduced cache reuse further, because the layers changed constantly.
They eventually refactored: service-specific contexts, strict .dockerignore, and dependency manifests copied early. They also stopped “cleaning” builder layers that never shipped.
The operational takeaway: optimize the thing you actually pay for. With multi-stage builds, runtime size is often cheap to fix; build latency requires discipline and structure.
Mini-story 3: The boring but correct practice that saved the day
An enterprise team was moving a Python API into multi-stage builds. Everyone wanted distroless because security liked the phrase “no shell.” The SREs pushed back, not because they loved shells, but because they loved sleeping.
They built two targets: runtime (minimal) and debug (same app bits, plus a shell and basic network tools). The debug image was not deployed by default. It was only used for incident response in a controlled namespace, with explicit approvals.
Two months later, a dependency started failing TLS handshakes intermittently due to a corporate MITM proxy rotation. The production image didn’t have curl or openssl, as intended. The on-call deployed the debug target to reproduce the failure and confirmed the CA chain problem in minutes, not hours.
The resolution was straightforward: update trusted CAs and validate proxy settings. The real win was time-to-diagnosis. The team didn’t have to rebuild a “one-off debug image” during an outage while everyone watched.
The lesson: having a debug target is dull governance work. It’s also how you avoid turning outages into improvisational theater.
Checklists / step-by-step plan
Step-by-step plan: migrate to multi-stage without breaking runtime
- Start with a known-good baseline. Keep your current Dockerfile/image as a reference. Don’t delete it until the new one proves itself.
- Name your stages. Use AS build and AS runtime. Future you will thank present you.
- Define an artifact directory. Example: /out contains binaries, assets, migrations, and configs that must ship.
- Copy only artifacts into runtime. Not the entire repo. Not /root/.cache. Not your feelings.
- Pick a runtime base that matches your linkage. If dynamic glibc: use Debian/Ubuntu/distroless-base. If static: distroless-static can be great.
- Add runtime contract checks in CI. Verify expected files exist; verify binary type; verify CA bundle presence if needed.
- Run as non-root. Set USER. Fix file ownership with COPY --chown and create writable dirs explicitly.
- Create a debug target. Same app bits, extra tools. Don’t ship it to prod by default, but keep it buildable.
- Measure before/after. Image size, pull time, build time, scan time, startup time. Choose optimizations that move real needles.
- Roll out gradually. Canary deploy. Watch logs for missing files/libs. If you’re surprised, your checks are incomplete.
Checklist: runtime contract items
- Entrypoint uses exec-form and exists
- Binary architecture matches the cluster nodes
- Binary linkage matches base image libc
- CA certificates available for HTTPS
- Timezone/locale strategy defined (UTC-only or include tzdata)
- Non-root user configured; writable directories created
- All required assets/config/migrations copied
- Healthcheck works (or external health checks account for minimal images)
FAQ
1) Are multi-stage builds always worth it?
No. If your runtime image is already small and stable, and you rarely rebuild, you might not get enough benefit. But for most services with frequent deployments, it’s worth doing once and keeping correct.
2) Should I use Alpine for runtime to save space?
Only if you’re sure your runtime dependencies are compatible with musl, or you built specifically for Alpine. Otherwise use Debian slim or distroless variants that match your linkage.
3) Distroless vs slim: which should I pick?
Distroless when you have strong observability and a clear debug path (debug target, ephemeral debug containers). Slim when your org still needs “ssh into the container” energy to survive incidents.
4) Why did my image shrink but startup got slower?
It’s usually not the image size. It’s missing caches (expected), cold JIT compilation, DNS behavior changes, or init logic changes. Measure startup time and check what changed besides bytes.
5) How do I avoid copying secrets into the runtime stage?
Keep your build context clean with .dockerignore, avoid copying the whole repo when you only need artifacts, and never bake runtime secrets into images. Use runtime secret injection mechanisms.
6) Can I run without a shell and still debug effectively?
Yes. Use a debug target image, sidecars, or ephemeral debugging containers. The key is planning for debugging, not pretending it won’t happen.
7) What’s the safest way to handle certificates in minimal images?
Prefer a base that includes CA certificates, or explicitly install/copy a known CA bundle. Then test TLS in an integration check that fails loudly.
8) How do I keep build times fast with multi-stage builds?
Order your Dockerfile for caching: copy dependency manifests first, install deps, then copy source. Use BuildKit. Use .dockerignore to keep unrelated files from invalidating layers.
9) Is it okay to have multiple final stages?
Yes. It’s a strong pattern: runtime for production, debug for incidents, test for CI. Same source and artifacts, different packaging.
10) What if my app needs OS packages at runtime?
Then install them in runtime—intentionally and minimally. Multi-stage builds aren’t about refusing runtime dependencies; they’re about refusing accidental ones.
Conclusion: next steps that pay rent
If you take one thing away: multi-stage builds are not a trick for winning a “smallest image” contest. They’re a mechanism for making your runtime dependencies explicit and reviewable.
Next steps you can do this week:
- Pick one service with an embarrassingly large image and implement a builder/runtime split with named stages.
- Add CI checks that validate the runtime contract: binary linkage/architecture, required files present, and CA certificates when applicable.
- Create a debug target and document when it can be used.
- Measure image size, build time, and deploy time before/after. Keep the changes that improve operations, not just aesthetics.