You deploy a container. It worked on your laptop. It even worked in staging. Then production says: exec format error.
The logs are empty, the pod restarts, and your incident channel fills with the same question asked in different fonts: “What changed?”
Usually: nothing “changed.” You just asked a CPU to run instructions meant for a different CPU. Containers are not magic.
They’re packaging. And packaging still cares about architecture.
What “exec format error” actually means
exec format error is a kernel-level complaint. Linux tried to execute a file and couldn’t recognize it as a runnable
binary for the current machine. This is not Docker being moody. This is the host OS refusing to load the program.
In container land, the most common causes are:
- Wrong architecture: you pulled an arm64 image onto an amd64 node, or vice versa.
- Wrong binary format inside the image: your base is amd64 but you copied in an arm64 binary during build.
- Bad shebang or CRLF in an entrypoint script: the kernel can’t parse the interpreter line or finds invisible Windows characters.
- Missing dynamic loader: you built for glibc but shipped an Alpine (musl) runtime without the expected loader.
But the headline error message is the same, which is why teams waste hours arguing about “Docker vs Kubernetes vs CI”
when the kernel already told you the real issue: “I can’t execute this.”
One operationally useful mental model: a container image is a tarball full of files plus some metadata. When the container starts,
the host kernel still executes the entrypoint. If that entrypoint (or the interpreter it points to) doesn’t match the host’s CPU
and ABI expectations, the kernel returns ENOEXEC, and your runtime converts it into a log line you’ll squint at during an outage.
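If you want to see that refusal in its natural habitat, force it on purpose. A minimal sketch, assuming an amd64 host with no QEMU emulation registered; the exact error wording varies slightly by Docker and containerd version:
cr0x@server:~$ docker run --rm --platform=linux/arm64 alpine:3.19 uname -m
exec /bin/uname: exec format error
Same kernel, same refusal, just without the outage attached.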
Fast diagnosis playbook
When you’re on-call, you do not want a lecture. You want a sequence that converges quickly. Here’s the order that tends to produce answers
in minutes, not hours.
1) Identify the node architecture (don’t assume)
First check what the host actually is. “It’s x86” is often folklore, and folklore is not a monitoring signal.
2) Identify the image’s architecture as pulled
Confirm the local image metadata: OS/Arch and whether it came from a manifest list.
3) Confirm what file is failing to execute
Find the entrypoint and command, then examine that file’s type inside the image. If it’s a script, check line endings and shebang.
If it’s a binary, check ELF headers and architecture.
4) Decide: rebuild vs select platform vs enable emulation
Production fix hierarchy:
- Best: publish a correct multi-arch image and redeploy.
- Acceptable emergency workaround: pin --platform during pull/run, or constrain scheduling in Kubernetes (if you know what you’re doing).
- Last resort: run through QEMU emulation. It can be fine for dev; it’s rarely a performance-neutral “solution” in prod.
Interesting facts and history you can use in postmortems
- Fact 1: “Exec format error” is older than containers; it’s a classic Unix/Linux error returned when the kernel can’t load a binary format.
- Fact 2: Docker’s multi-arch story was originally rough; “manifest lists” became the mainstream mechanism so one tag can refer to multiple architectures.
- Fact 3: Apple’s move to ARM (M1/M2/M3) dramatically increased wrong-arch incidents because developer laptops stopped matching many production servers.
- Fact 4: Kubernetes doesn’t “fix” architecture mismatch; it schedules pods to nodes, and nodes execute what they’re given. The mismatch shows up as CrashLoopBackOff.
- Fact 5: QEMU user-mode emulation via binfmt_misc is what makes “run ARM images on x86” feel possible, but it’s still emulation with real overhead and edge cases.
- Fact 6: Alpine Linux uses musl libc; Debian/Ubuntu typically use glibc. Shipping a glibc-linked binary into a musl image can look like “exec format error” or “no such file.”
- Fact 7: The ELF header in a binary includes the target architecture. You can often diagnose mismatches instantly with file inside the image.
- Fact 8: “Works on my machine” got a new variant: “works on my architecture.” The sentence is longer, but the blame is the same.
Where wrong-arch images sneak in
Scenario A: building on ARM laptops without multi-arch output
Developer builds an image on an Apple Silicon laptop. The image is linux/arm64. They push it to the registry under a tag used by CI.
In production (x86_64), the entrypoint binary is ARM. Boom: exec format error.
Scenario B: CI runners changed architecture quietly
You migrated from self-hosted x86 runners to managed runners that are “faster and cheaper.” Surprise: the pool now includes ARM runners.
Your build pipeline is deterministic—just not in the way you wanted.
Scenario C: a multi-stage Dockerfile copied the wrong binary
Multi-stage builds are great. They’re also very good at copying the wrong artifact efficiently.
If stage one runs on one platform and stage two runs on another, you can end up with mismatched binaries embedded in an otherwise correct base.
Scenario D: entrypoint script looks executable but isn’t
The entrypoint is a shell script committed with Windows CRLF line endings.
Linux tries to execute it, interprets /bin/sh^M as the interpreter, and you get a failure that looks suspiciously like architecture issues.
Scenario E: the tag points to a single-arch image, not a manifest list
Teams think “we publish multi-arch.” They don’t. They publish separate tags.
Then someone uses the “latest” tag in prod and gets whichever platform the last build overwrote.
Joke #1: Containers are like office coffee machines: they look standardized until you discover half the building runs on incompatible pods.
Practical tasks: commands, expected output, and decisions
This section is intentionally operational. Every task has three parts: the command, what the output means, and what decision you make next.
Run these from a node where the failure occurs, or from a workstation with access to the image.
Task 1: Confirm host architecture (Linux)
cr0x@server:~$ uname -m
x86_64
Meaning: The host CPU is x86_64 (amd64). If you see aarch64, that’s ARM64.
Decision: If host is x86_64 and your image is arm64, you have a mismatch. Keep going to prove it, then fix the build/publish process.
Task 2: Confirm architecture via OS metadata (more explicit)
cr0x@server:~$ dpkg --print-architecture
amd64
Meaning: Debian-family name for the architecture. This helps when scripts or config talk in distro terms.
Decision: Map amd64 ↔ x86_64, arm64 ↔ aarch64. If these don’t match your image platform, you know what’s coming.
Task 3: Inspect the image platform you pulled
cr0x@server:~$ docker image inspect --format '{{.Os}}/{{.Architecture}} {{.Id}}' myapp:prod
linux/arm64 sha256:5f9c8a0b6c0f...
Meaning: The local image is linux/arm64. On an amd64 host, it won’t execute without emulation.
Decision: Either pull the right platform explicitly (short-term) or rebuild/publish a multi-arch image (clean fix).
Task 4: Inspect the manifest list (what the tag actually points to)
cr0x@server:~$ docker manifest inspect myorg/myapp:prod | sed -n '1,60p'
{
"schemaVersion": 2,
"mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
"manifests": [
{
"mediaType": "application/vnd.docker.distribution.manifest.v2+json",
"size": 1784,
"digest": "sha256:9a1d...",
"platform": {
"architecture": "amd64",
"os": "linux"
}
},
{
"mediaType": "application/vnd.docker.distribution.manifest.v2+json",
"size": 1784,
"digest": "sha256:ab22...",
"platform": {
"architecture": "arm64",
"os": "linux"
}
}
]
}
Meaning: This tag is a multi-arch manifest list. That’s good. If you only see one manifest (no list), it’s single-arch.
Decision: If the list includes your node architecture, the registry is fine; your pull might be forced to another platform or your runtime is doing something odd.
If it does not include your node architecture, you must publish the missing one.
Task 5: Pull the correct platform explicitly (safe emergency move)
cr0x@server:~$ docker pull --platform=linux/amd64 myorg/myapp:prod
prod: Pulling from myorg/myapp
Digest: sha256:9a1d...
Status: Downloaded newer image for myorg/myapp:prod
Meaning: You fetched the amd64 variant. The digest should match the amd64 manifest’s digest from the manifest list.
Decision: If this works, your clean fix is still to make the tag resolve correctly without manual platform pinning.
Task 6: Re-run the container with explicit platform (diagnostic)
cr0x@server:~$ docker run --rm --platform=linux/amd64 myorg/myapp:prod --version
myapp 2.8.1
Meaning: The image runs when platform is correct. This strongly implicates a platform mismatch.
Decision: Stop debugging the app itself. Fix image publishing and deployment selection.
Task 7: Find the configured entrypoint and command
cr0x@server:~$ docker image inspect --format 'Entrypoint={{json .Config.Entrypoint}} Cmd={{json .Config.Cmd}}' myorg/myapp:prod
Entrypoint=["/usr/local/bin/entrypoint.sh"] Cmd=["myapp","serve"]
Meaning: The failing executable is likely /usr/local/bin/entrypoint.sh (or whatever shows here).
Decision: Inspect that file inside the image. Don’t guess.
Task 8: Inspect the entrypoint file type inside the image
cr0x@server:~$ docker run --rm --entrypoint /bin/sh myorg/myapp:prod -lc 'ls -l /usr/local/bin/entrypoint.sh; file /usr/local/bin/entrypoint.sh'
-rwxr-xr-x 1 root root 812 Jan 2 10:11 /usr/local/bin/entrypoint.sh
/usr/local/bin/entrypoint.sh: POSIX shell script, ASCII text executable, with CRLF line terminators
Meaning: CRLF line terminators are a red flag. The kernel may choke on the interpreter line.
Decision: Convert to LF in the repo or during build. If it’s a binary, file will tell you the architecture.
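A sketch of the LF fix, assuming a plain shell script in the repo; sed is the safer default because dos2unix is often absent from minimal images:
cr0x@server:~$ sed -i 's/\r$//' entrypoint.sh
cr0x@server:~$ file entrypoint.sh
entrypoint.sh: POSIX shell script, ASCII text executable
To keep it fixed, add a .gitattributes rule such as "*.sh text eol=lf" so Windows checkouts can’t reintroduce CRLF.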
Task 9: Inspect a binary’s ELF architecture inside the image
cr0x@server:~$ docker run --rm --entrypoint /bin/sh myorg/myapp:prod -lc 'file /usr/local/bin/myapp; readelf -h /usr/local/bin/myapp | sed -n "1,25p"'
/usr/local/bin/myapp: ELF 64-bit LSB executable, ARM aarch64, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-aarch64.so.1, not stripped
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Type: EXEC (Executable file)
Machine: AArch64
Meaning: That binary is ARM64. If the host is amd64, this is your smoking gun.
Decision: Rebuild the binary for the correct platform or publish multi-arch builds. Do not “fix” this by changing the entrypoint.
Task 10: Detect the “missing dynamic loader” trap
cr0x@server:~$ docker run --rm --entrypoint /bin/sh myorg/myapp:prod -lc 'ls -l /lib64/ld-linux-x86-64.so.2 /lib/ld-musl-x86_64.so.1 2>/dev/null || true; ldd /usr/local/bin/myapp || true'
ldd: /usr/local/bin/myapp: No such file or directory
Meaning: ldd reporting “No such file” for a file that clearly exists often means the interpreter (dynamic loader) path in the ELF header is missing in the image.
That’s an ABI/base-image mismatch, not a missing binary.
Decision: Ensure your runtime image matches the libc expectations (glibc vs musl) or ship a statically-linked binary where appropriate.
Task 11: Check Kubernetes node architecture and OS
cr0x@server:~$ kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
prod-node-a-01 Ready worker 92d v1.28.5 10.0.4.21 Ubuntu 22.04.3 LTS 5.15.0-91-generic containerd://1.7.11
Meaning: This is not enough by itself. You need architecture too.
Decision: Query node labels next; Kubernetes encodes arch as a label.
Task 12: Confirm Kubernetes node architecture label
cr0x@server:~$ kubectl get node prod-node-a-01 -o jsonpath='{.metadata.labels.kubernetes\.io/arch}{"\n"}{.metadata.labels.kubernetes\.io/os}{"\n"}'
amd64
linux
Meaning: The node is amd64. If the image is arm64-only, pods will fail or never start.
Decision: Either schedule to matching nodes (nodeSelector/affinity) or publish the correct image variant. Prefer publishing.
Task 13: Inspect a failing pod’s events for exec/CrashLoop hints
cr0x@server:~$ kubectl describe pod myapp-7d6c7b9cf4-kkp2l | sed -n '1,120p'
Name: myapp-7d6c7b9cf4-kkp2l
Namespace: prod
Containers:
myapp:
Image: myorg/myapp:prod
State: Waiting
Reason: CrashLoopBackOff
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Pulled 3m12s kubelet Successfully pulled image "myorg/myapp:prod"
Warning BackOff 2m41s (x7 over 3m10s) kubelet Back-off restarting failed container
Meaning: Kubelet successfully pulled the image; the container dies after start attempts. This is compatible with exec format error.
You still need container logs.
Decision: Fetch container logs (including previous) and check runtime messages.
Task 14: Grab the previous container logs (the error often shows here)
cr0x@server:~$ kubectl logs myapp-7d6c7b9cf4-kkp2l -c myapp --previous
exec /usr/local/bin/myapp: exec format error
Meaning: That’s the kernel refusal surfaced through the runtime.
Decision: Confirm image arch vs node arch, then move to publishing a correct manifest list.
Task 15: For containerd-based nodes, inspect image platform (if you have access)
cr0x@server:~$ sudo crictl inspecti myorg/myapp:prod | sed -n '1,80p'
{
"status": {
"repoTags": [
"myorg/myapp:prod"
],
"repoDigests": [
"myorg/myapp@sha256:9a1d..."
],
"image": {
"spec": {
"annotations": {
"org.opencontainers.image.ref.name": "myorg/myapp:prod"
}
}
}
}
}
Meaning: crictl output varies by runtime and config; not all setups expose platform directly in this view.
The digest is still useful: you can map it to a manifest entry and see which platform was selected.
Decision: If you can’t see platform here, rely on manifest inspection plus node architecture labels.
Task 16: Verify BuildKit/buildx is active (you want it)
cr0x@server:~$ docker buildx version
github.com/docker/buildx v0.12.1 3b6e3c5
Meaning: Buildx is installed. This is the modern path for multi-arch builds.
Decision: If buildx is missing, install/enable it in CI. Stop trying to duct-tape multi-arch with hand-rolled scripts.
The clean fix: build and publish multi-arch images
The clean fix is boring: build for the platforms you run, publish a manifest list, and let clients pull the correct variant automatically.
This is what tags are for. One tag. Multiple architectures. Zero surprises.
What “clean” looks like in practice
- One tag (e.g., myorg/myapp:prod) points to a manifest list including linux/amd64 and linux/arm64.
- Each platform image is built from the same source revision, with reproducible build steps.
- CI enforces that the published tag actually contains the required platforms.
- Runtime does not require --platform overrides.
Build and push multi-arch with buildx
On a machine with Docker BuildKit and buildx, you can build and push in one go:
cr0x@server:~$ docker buildx create --use --name multiarch
multiarch
cr0x@server:~$ docker buildx inspect --bootstrap | sed -n '1,120p'
Name: multiarch
Driver: docker-container
Nodes:
Name: multiarch0
Endpoint: unix:///var/run/docker.sock
Status: running
Platforms: linux/amd64, linux/arm64, linux/arm/v7
Meaning: Your builder supports multiple platforms. If linux/arm64 isn’t listed, you likely need binfmt/QEMU configured.
Now build and push:
cr0x@server:~$ docker buildx build --platform=linux/amd64,linux/arm64 -t myorg/myapp:prod --push .
[+] Building 128.4s (24/24) FINISHED
=> [internal] load build definition from Dockerfile
=> => transferring dockerfile: 2.12kB
=> exporting manifest list myorg/myapp:prod
=> => pushing manifest for myorg/myapp:prod
Meaning: The log line “exporting manifest list” is what you want. That’s the multi-arch tag.
Decision: If this produces only a single manifest, you didn’t actually build multi-arch. Check builder platforms and CI environment.
When you need binfmt/QEMU (and when you don’t)
If your build machine is amd64 and you’re building arm64 images (or the other way around), buildx can use emulation via binfmt_misc.
But if you can build natively on each architecture (separate runners), that’s usually faster and less flaky.
cr0x@server:~$ docker run --privileged --rm tonistiigi/binfmt --install arm64,amd64
installing: arm64
installing: amd64
Meaning: This registers QEMU handlers with the kernel so foreign-arch binaries can run under emulation.
Decision: Use this to enable multi-arch builds on a single runner. For production workloads, don’t assume emulation is acceptable just because it starts.
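If you prefer native builds, buildx can also drive one builder node per architecture instead of emulating. A sketch, assuming you have an arm64 host reachable over SSH with Docker installed; the hostname and user are hypothetical, and a named Docker context works as the endpoint too:
cr0x@server:~$ docker buildx create --use --name native --node amd64 unix:///var/run/docker.sock
cr0x@server:~$ docker buildx create --append --name native --node arm64 ssh://build@arm64-builder
Subsequent docker buildx build --platform=linux/amd64,linux/arm64 runs then compile each variant on real hardware.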
Multi-stage Dockerfiles: keep platforms consistent
The most common foot-gun is a multi-stage build where the builder stage and runtime stage don’t align by platform, or you copy a prebuilt binary
downloaded from the internet that defaults to amd64 while you’re building arm64.
Make platform explicit and use the BuildKit-provided build args:
cr0x@server:~$ cat Dockerfile
FROM --platform=$BUILDPLATFORM golang:1.22 AS build
ARG TARGETOS TARGETARCH
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 GOOS=$TARGETOS GOARCH=$TARGETARCH go build -o /out/myapp ./cmd/myapp
FROM alpine:3.19
COPY --from=build /out/myapp /usr/local/bin/myapp
ENTRYPOINT ["/usr/local/bin/myapp"]
Meaning: The builder runs on the build machine’s platform ($BUILDPLATFORM), but the output binary is compiled for the target platform.
Decision: Prefer this pattern over “download a binary in Dockerfile.” If you must download, select by TARGETARCH.
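If a build really must download a prebuilt artifact, key the filename on TARGETARCH instead of hardcoding amd64. A sketch of the Dockerfile lines; the URL, the artifact naming, and the presence of curl in that stage are all assumptions to adapt:
ARG TARGETARCH
RUN curl -fsSL -o /usr/local/bin/tool \
      "https://example.com/tool/v1.2.3/tool-linux-${TARGETARCH}" \
    && chmod +x /usr/local/bin/tool
Now the architecture comes from the build invocation, not from whoever happened to write the Dockerfile.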
Hardening: CI guardrails that prevent repeats
Incidents caused by wrong-arch images are rarely “hard.” They’re organizationally easy to repeat.
Fixing the build once is not the same as preventing the next one.
Guardrail 1: Assert the manifest contains required platforms
After pushing, inspect the manifest and fail the pipeline if it’s not multi-arch (or missing a required platform).
cr0x@server:~$ docker manifest inspect myorg/myapp:prod | grep -E '"architecture": "amd64"|"architecture": "arm64"'
"architecture": "amd64",
"architecture": "arm64",
Meaning: Both platforms appear. If one is missing, your tag is incomplete.
Decision: Fail the build. Do not “warn and proceed.” Warnings are how outages are scheduled.
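A minimal sketch of that gate, assuming the required platforms are amd64 and arm64; run in CI, it exits non-zero when the pushed tag is missing either one:
cr0x@server:~$ for arch in amd64 arm64; do docker manifest inspect myorg/myapp:prod | grep -q "\"architecture\": \"$arch\"" || { echo "missing platform: $arch"; exit 1; }; done
cr0x@server:~$ echo $?
0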
Guardrail 2: Record the image digest and deploy by digest
Tags are pointers. Pointers move. For production, deploy immutable digests where possible, especially for rollback hygiene.
You can still publish a tag for humans, but let automation use digests.
cr0x@server:~$ docker buildx imagetools inspect myorg/myapp:prod | sed -n '1,60p'
Name: myorg/myapp:prod
MediaType: application/vnd.docker.distribution.manifest.list.v2+json
Digest: sha256:4e3f...
Manifests:
Name: myorg/myapp@sha256:9a1d...
Platform: linux/amd64
Name: myorg/myapp@sha256:ab22...
Platform: linux/arm64
Meaning: You have a stable digest for the manifest list and per-arch digests beneath it.
Decision: Store the manifest list digest as the deployment artifact. It’s the correct unit for multi-arch.
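At the deployment end, “use the digest” is one command away. A sketch, assuming a Deployment and container both named myapp; the digest is the manifest list digest from the inspect output above:
cr0x@server:~$ kubectl set image deployment/myapp myapp=myorg/myapp@sha256:4e3f...
deployment.apps/myapp image updated
The runtime still resolves the correct per-arch image beneath the list; you’ve just removed the moving pointer.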
Guardrail 3: Make the platform a first-class parameter in builds
If your pipeline has hidden architecture variability, you will eventually ship the wrong bits. Make it explicit.
Build jobs should declare which platforms are built and validated.
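A sketch of what “explicit” can look like; PLATFORMS is a hypothetical variable your pipeline defines once and every build job reuses:
cr0x@server:~$ PLATFORMS=linux/amd64,linux/arm64
cr0x@server:~$ docker buildx build --platform "$PLATFORMS" -t myorg/myapp:prod --push .
Changing the platform set now means changing a reviewed line of configuration, not getting lucky with a runner pool.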
Guardrail 4: Stop letting developer laptops publish production tags
If a laptop can publish :prod, you will eventually get a production incident with a battery percentage in the root cause chain.
Separate “dev push” from “release push.”
Guardrail 5: Validate the entrypoint binary inside the built image
Add a post-build sanity check: run file on the main binary for each platform. This catches “copied the wrong artifact” even when the manifest looks right.
cr0x@server:~$ docker buildx build --platform=linux/amd64 -t myorg/myapp:test --load .
[+] Building 22.1s (18/18) FINISHED
=> => naming to docker.io/myorg/myapp:test
cr0x@server:~$ docker run --rm --entrypoint /bin/sh myorg/myapp:test -lc 'file /usr/local/bin/myapp'
/usr/local/bin/myapp: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, stripped
Meaning: The binary matches amd64. Repeat for arm64 using a native runner or emulated check if acceptable.
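The arm64 half of the same check is one more pair of commands, assuming your builder has binfmt/QEMU registered so the foreign-arch shell can run long enough to call file:
cr0x@server:~$ docker buildx build --platform=linux/arm64 -t myorg/myapp:test-arm64 --load .
cr0x@server:~$ docker run --rm --platform=linux/arm64 --entrypoint /bin/sh myorg/myapp:test-arm64 -lc 'file /usr/local/bin/myapp'
/usr/local/bin/myapp: ELF 64-bit LSB executable, ARM aarch64, version 1 (SYSV), statically linked, stripped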
One quote that belongs on every build wall: “Hope is not a strategy,” a line often repeated in operations circles for exactly this reason.
Three corporate mini-stories (anonymized, painfully familiar)
Mini-story 1: The incident caused by a wrong assumption
A mid-sized company ran most of its production workloads on amd64 Kubernetes nodes. A small but growing batch-processing cluster used arm64 nodes
because they were cheaper for the performance envelope needed. Both clusters pulled from the same registry, and both used the same image tags.
A service team pushed a new release late afternoon. They built on their own CI runner fleet, which had been amd64 for years.
During a cost-cutting sprint, the CI platform group quietly added arm64 runners to the pool. The scheduler started placing builds on arm64 for some jobs.
Nobody wrote it down because, from their perspective, it “shouldn’t matter.”
The team’s Docker pipeline produced a single-arch image. When the job ran on arm64, the pushed tag became arm64-only.
The amd64 production cluster updated, pulled the tag, and began crashing instantly with exec format error.
Alerts fired; rollback didn’t help because the previous tag had already been overwritten earlier in the day.
The fix was simple: rebuild for amd64 and push again. The lesson was not simple: “CI runner architecture is part of your supply chain.”
After the incident, they introduced mandatory multi-arch publishing, and they stopped allowing mutable tags for production deployments.
The most valuable change wasn’t technical. It was permission: any team could block a release if the manifest wasn’t multi-arch.
Mini-story 2: The optimization that backfired
A platform team decided to speed up builds by caching compiled artifacts between pipeline runs. Reasonable idea.
They introduced a shared cache bucket keyed on repository and branch, not on architecture. This is where the story gets expensive.
One week later, a developer on an ARM laptop ran the build locally, pushing a cache update as part of a “developer experience” enhancement.
CI picked up the cached artifact, happily copied it into the final image, and published an amd64-tagged image with an arm64 binary inside.
The base image was amd64; the binary was arm64. That mismatch is a special kind of cursed because metadata lies while the kernel tells the truth.
The incident was confusing. Image inspection said linux/amd64. Node architecture was amd64. Yet the entrypoint failed.
Engineers spent time suspecting corrupted layers, bad registries, and “maybe Kubernetes is pulling the wrong thing.”
Eventually someone ran file inside the container and got the truth in one line.
They kept the caching, but fixed the key: architecture and toolchain version became part of the cache identity.
They also added a post-build check that validated the binary architecture inside the image. The performance win remained; the surprise cost didn’t.
Mini-story 3: The boring but correct practice that saved the day
A regulated enterprise had a habit other companies make fun of: every deploy used digests, not tags.
The teams complained. It looked unfriendly in YAML. It wasn’t “modern.” It was, however, extremely hard to accidentally mutate.
An application team shipped a new version built on a developer workstation during an urgent hotfix. They pushed a tag that was used by staging.
The image was arm64. Staging was mixed-arch, and the rollout failed on half the nodes. Predictable.
Production didn’t care. Production referenced a manifest list digest produced by the official release pipeline.
The mutated tag never entered the deployment path. The hotfix was annoying, but it was contained.
The platform team didn’t have to “freeze tags” or play registry whack-a-mole. The process did its job.
In the post-incident review, the enterprise didn’t brag. They just pointed at the rule: “prod deploys only from signed pipeline artifacts by digest.”
Nobody applauded. Nothing broke. That’s the point.
Common mistakes: symptom → root cause → fix
1) Pod CrashLoopBackOff with “exec format error” in logs
Symptom: Container starts and immediately exits; kubectl logs --previous shows exec format error.
Root cause: Image architecture doesn’t match node architecture, or the main binary inside is for the wrong architecture.
Fix: Publish a multi-arch image (manifest list) and redeploy; verify node arch labels and manifest platforms match.
2) Docker run fails locally on Apple Silicon but works on CI
Symptom: Developer on M1/M2 sees exec format error when running an image built/pulled elsewhere.
Root cause: Single-arch amd64 image pulled onto arm64 host without emulation, or the tag points to amd64 only.
Fix: Use --platform=linux/arm64 temporarily; long-term publish multi-arch. If you rely on emulation, set it up intentionally and measure.
3) Image inspect says amd64, still “exec format error”
Symptom: docker image inspect shows linux/amd64, but startup fails with exec format error.
Root cause: Wrong-arch binary copied into the image (multi-stage mix-up, cached artifact, downloaded binary).
Fix: Run file on the actual entrypoint binary inside the image; fix the build step that injects the artifact.
4) Entrypoint script fails with exec format error, but it’s “just a script”
Symptom: Entrypoint is a shell script; error appears at startup.
Root cause: CRLF line endings or a bad shebang (interpreter path invalid in the image).
Fix: Ensure LF endings; ensure #!/bin/sh points to an existing interpreter; run file inside the image.
5) “No such file or directory” for a binary that exists
Symptom: Logs show no such file or directory for the binary; ls shows the file exists.
Root cause: Missing ELF interpreter (dynamic loader) or libc mismatch (glibc binary on Alpine/musl).
Fix: Use a compatible base image (glibc-based) or build static; validate interpreter path via readelf -l.
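To see exactly which loader a binary expects, ask the ELF headers directly. A sketch, assuming readelf (binutils) is available inside the image; otherwise run it against the artifact before packaging:
cr0x@server:~$ docker run --rm --entrypoint /bin/sh myorg/myapp:prod -lc "readelf -l /usr/local/bin/myapp | grep 'program interpreter'"
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
If that path doesn’t exist in the image (musl-based images ship /lib/ld-musl-x86_64.so.1 instead), you’ve found the mismatch.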
6) Multi-arch tag exists, but nodes still pull wrong variant
Symptom: Manifest list includes amd64 and arm64, yet a node pulls the wrong one.
Root cause: Platform is forced via --platform, runtime config, or cached local image/tag confusion.
Fix: Remove forced platform settings; pull by digest; clear local images on the node if needed; verify by inspecting the pulled image’s platform.
Joke #2: “Exec format error” is the kernel’s way of saying, “That’s not my job,” which is also my favorite way to decline meetings.
Checklists / step-by-step plan
Step-by-step: incident response (15–30 minutes)
- Confirm the failing nodes’ architecture (uname -m or the Kubernetes node label).
- Inspect the image you actually pulled on the node (docker image inspect or the runtime equivalent).
- Inspect the registry tag manifest (docker manifest inspect).
- Determine whether this is a platform mismatch or an internal binary mismatch (file inside the image).
- If mismatch: pull the correct platform explicitly as a short-term mitigation, or roll back to a known-good digest.
- Start the clean fix: rebuild and push multi-arch manifest list.
- Add a CI assertion that checks manifest platforms and binary architecture for the entrypoint.
Step-by-step: clean fix implementation (same day)
- Enable BuildKit/buildx in CI and standardize on it.
- Make the Dockerfile multi-arch-safe: use $BUILDPLATFORM and $TARGETARCH, and compile for the target platform.
- Build and push: docker buildx build --platform=linux/amd64,linux/arm64 ... --push.
- Verify the manifest list contains the required platforms.
- Smoke test per platform (native runner preferred; emulation acceptable for basic checks).
- Deploy using a manifest list digest, not a mutable tag.
Step-by-step: prevention (this sprint)
- Ban production deploys from mutable tags; use digests in prod environments.
- Lock down registry permissions: only CI can push release tags.
- Make runner architecture explicit in CI scheduling; don’t allow “mixed pools” without multi-arch builds.
- Add a build artifact provenance record that includes platforms built and the manifest list digest.
- Teach the team: architecture is part of the interface, not an implementation detail.
FAQ
Q1: Is “exec format error” always a CPU architecture problem?
No, but it’s the first thing to check because it’s common and fast to prove. Scripts with CRLF line endings and bad shebangs can also trigger it,
and ELF interpreter mismatches can look similar.
Q2: Why does Docker sometimes “just work” across architectures on my laptop?
Because you might have QEMU emulation registered via binfmt_misc, often installed by Docker Desktop or a previous setup step.
It’s convenient. It can also hide problems until you hit production nodes without emulation.
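Checking whether emulation is registered takes seconds on a Linux host. A sketch; handler names and interpreter paths vary with how binfmt was installed:
cr0x@server:~$ ls /proc/sys/fs/binfmt_misc/ | grep -i qemu
qemu-aarch64
cr0x@server:~$ head -2 /proc/sys/fs/binfmt_misc/qemu-aarch64
enabled
interpreter /usr/bin/qemu-aarch64
If there are no qemu-* entries, foreign-arch images will fail with exec format error instead of silently emulating.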
Q3: What’s the difference between an image and a manifest list?
A single image is one platform’s filesystem and config. A manifest list is an index that points to multiple platform-specific images under one tag.
Clients select the correct image based on their platform (unless forced otherwise).
Q4: In Kubernetes, can I force the right architecture with node selectors?
Yes. You can use kubernetes.io/arch in node selectors or affinity rules. This is useful when you truly run different builds per arch.
But it’s not a substitute for publishing multi-arch images when the application is supposed to run everywhere.
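A sketch of the node selector version, assuming a Deployment named myapp that should only land on amd64 nodes:
cr0x@server:~$ kubectl patch deployment myapp -p '{"spec":{"template":{"spec":{"nodeSelector":{"kubernetes.io/arch":"amd64"}}}}}'
deployment.apps/myapp patched
Useful as a deliberate constraint; a liability if it quietly becomes the way you hide single-arch images.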
Q5: Should we use --platform in production deployments?
Only as a temporary mitigation or in tightly controlled situations. It becomes a hidden policy that can break scheduling assumptions and mask a bad publish.
The long-term fix is correct manifests and correct builds.
Q6: Why does ldd sometimes say “No such file” for an existing binary?
Because the kernel can’t load the binary’s interpreter (dynamic loader) referenced in the ELF headers. That loader path doesn’t exist in the image,
often due to glibc/musl mismatches or missing loader packages.
Q7: Can we publish separate tags per architecture instead of multi-arch manifests?
You can, but you’ll regret it unless you have strict naming, scheduling, and deployment discipline. Multi-arch manifests let a single tag behave correctly
across environments, which reduces human error—the most abundant resource in most organizations.
Q8: What’s the fastest proof that a binary is built for the wrong arch?
Run file on it inside the container (or on the artifact before packaging). It will tell you “x86-64” vs “ARM aarch64” immediately.
Follow up with readelf -h if you need more detail.
Q9: If we build multi-arch, do we need to test on both architectures?
Yes. At least a smoke test. Multi-arch builds can fail in architecture-specific ways: different dependencies, CGO behavior, or native library availability.
“It built” is not the same as “it runs.”
Conclusion: next steps that stick
When you see Docker’s exec format error, treat it like a production fire alarm: it’s loud, it’s blunt, and it’s usually right.
Don’t start by rewriting entrypoints or blaming Kubernetes. Start by confirming architecture on both sides and validating what’s actually inside the image.
Practical next steps:
- Today: inspect the failing image’s platform and the entrypoint binary with docker image inspect and file.
- This week: publish a multi-arch manifest list using buildx, and verify it in CI.
- This sprint: deploy by digest in production and lock down who can push release tags.
The clean fix is not clever. It’s correct. And it’s much cheaper than discovering, again, that your CPUs have opinions.