“Exec format error.” The most honest message your container can give you. It’s the runtime equivalent of staring at a door labeled push and pulling anyway: you shipped the wrong CPU binary.
Multi-arch is supposed to make this boring: publish one tag, run everywhere. In real production pipelines, it’s easy to accidentally publish an amd64-only image under a tag your arm64 nodes will happily pull. Or worse: publish a manifest list that claims arm64 exists while the layer contents are secretly amd64. That’s when you get paged at 02:17 for “Kubernetes is broken again.” It’s not. Your build is.
What “multi-arch” really is (and what it is not)
Multi-arch in Docker is not magic. It’s a packaging trick: a single image tag points to a manifest list (a.k.a. “index”) that contains one image manifest per platform. The registry serves the right one based on the client’s requested platform.
That’s it. The registry doesn’t validate your binaries. Docker doesn’t read your ELF headers and say “hmm, suspicious.” If you push an arm64 manifest that references an amd64 layer, you will successfully publish a lie. The runtime will only complain when the kernel refuses to execute it.
Key terms you should stop hand-waving
- Platform: OS + architecture (+ optional variant). Example: linux/amd64, linux/arm64, linux/arm/v7.
- Manifest: JSON describing a specific image for one platform: config + layers.
- Manifest list / index: JSON mapping platforms to manifests. This is what you want your tag to be when you say “multi-arch.”
- BuildKit: The build engine behind docker buildx. It does the heavy lifting: cross-building, caching, exporting, and remote builders.
- QEMU emulation: A way to run non-native binaries during build. It’s convenient, slower, and sometimes subtly broken.
If you internalize one thing: multi-arch is about publishing correct metadata and correct bytes. You need both.
Quick facts and history that actually matter
- Docker “manifest lists” showed up years after Docker images became popular. Early Docker had no first-class multi-arch story; people used separate tags like :arm and :amd64 and prayed no one mistyped.
- OCI image specs standardized what registries store. Most modern registries store OCI-compatible manifests even if you call them “Docker images.” The JSON shapes are well-defined; your tooling is what varies.
- BuildKit was a major architectural shift. Classic docker build was tied to the daemon and not designed for cross-platform builds at scale. BuildKit made builds more parallel, cacheable, and exportable.
- “arm64” winning in servers changed the threat model. It used to be hobbyist boards. Now it’s mainstream cloud instances. Pulling an amd64-only image on an arm64 node is no longer “rare.”
- Kubernetes multi-arch isn’t special. It relies on the same registry manifest negotiation as Docker. If your tag is wrong, Kubernetes will dutifully pull the wrong thing faster than you can say “rollout restart.”
- Alpine’s musl and Debian’s glibc differences bite harder under emulation. QEMU can mask ABI mismatches until runtime, and then you get crashes that look like application bugs.
- “Variant” matters for some ARM targets. linux/arm/v7 vs linux/arm/v6 is not bikeshedding. If you ship the wrong variant, the binary may execute and still crash in weird ways.
- Registries generally don’t validate platform correctness. They store what you push. Treat the registry as durable storage, not a quality gate.
Multi-arch is mature enough to be boring—if you build it like an SRE who’s been burned before.
How you end up shipping the wrong binaries
1) You pushed a single-arch image under a “universal” tag
Someone ran docker build -t myapp:latest . on an amd64 laptop and pushed it. Your arm64 nodes pull :latest, get amd64 layers, and fail with exec format error.
This is the classic. It still happens in 2026 because people still have fingers.
2) You built multi-arch, but only one platform actually succeeded
Buildx can build multiple platforms in one command. It can also silently produce partial results if you’re not strict about failures in CI and you don’t check the manifest list afterward.
3) Your Dockerfile “helpfully” downloads the wrong prebuilt binary
The most common form of wrong-arch shipping is not compilation. It’s curl. A Dockerfile that does:
- curl -L -o tool.tgz ...linux-amd64... regardless of platform
- or it uses an installer script that defaults to amd64
- or it assumes uname -m inside the build container equals the target arch
Under emulation, uname -m might report what you expect, until it doesn’t. Under cross-compilation, it may report the build environment architecture, not the target.
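A minimal Dockerfile sketch of the difference, assuming a hypothetical tool hosted at example.com with per-arch archive names (the URL and version are placeholders):

FROM debian:stable-slim
# BuildKit sets TARGETARCH per platform being built (amd64, arm64, ...),
# but only if the Dockerfile declares it as an ARG.
ARG TARGETARCH
RUN apt-get update && apt-get install -y --no-install-recommends curl ca-certificates
# Fragile pattern (ships wrong bytes silently): a URL with "amd64" baked in.
# RUN curl -fsSL -o /tmp/tool.tgz "https://example.com/tool/v1.2.3/tool-linux-amd64.tgz"
# Platform-aware pattern: the URL follows the platform being built.
RUN curl -fsSL -o /tmp/tool.tgz "https://example.com/tool/v1.2.3/tool-linux-${TARGETARCH}.tgz"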
4) You relied on QEMU for “it’ll work” builds and got “it’ll work sometimes”
QEMU is great for making progress. It’s not great for pretending it’s identical to native execution. Some language ecosystems (not naming names, but yes, that one) run architecture detection during build. Under emulation, detection can be wrong, slow, or flaky.
5) Your cache mixed architectures and you didn’t notice
Build caches are content-addressed, but your build logic can still produce cross-arch contamination if you write to shared paths, reuse artifacts across stages, or fetch “latest” without pinning per-arch assets.
6) Your CI runners are multi-arch, your assumptions are single-arch
When your fleet includes amd64 and arm64 runners, “build on whatever is available” becomes “ship whatever it built.” Multi-arch requires explicitness: platforms, provenance, and verification.
One quote worth keeping in your head while you do this work: “Hope is not a strategy” (a paraphrase often repeated by engineers and operators in reliability circles).
Joke #1: Multi-arch images are like power adapters—everything looks compatible until you actually plug it in.
Fast diagnosis playbook
When something fails in prod, you don’t get bonus points for an elegant theory. You get points for restoring service and preventing a repeat. Here’s the fastest path I know.
First: prove what got pulled
- Check the node architecture. If the node is arm64 and the image is amd64-only, stop right there.
- Inspect the tag’s manifest list. Does the tag even include the platform you think it does?
- Resolve the digest actually pulled. Tags move. Digests don’t. Find the digest the runtime is using.
Second: validate the bytes inside the image
- Start a debug container and inspect a binary with file. If it says x86-64 on an arm64 target, you found the smoking gun.
- Check dynamic linker / libc expectations. Wrong arch is obvious. Wrong ABI can be sneakier.
Third: trace the build source of truth
- Look at the Dockerfile for downloads. Anything fetching prebuilt artifacts must be platform-aware.
- Check buildx logs for per-platform steps. A single green job badge can hide a partial build.
- Verify QEMU/binfmt setup if emulation is involved. If your build depends on emulation, treat it as a dependency with health checks.
Most common bottleneck
It’s not BuildKit. It’s your pipeline’s lack of verification. The build “worked” and published a broken tag. The system did exactly what you told it to do, which is the problem.
Practical tasks: commands, outputs, decisions
These are not “toy” commands. They’re the stuff you run during an incident, and the stuff you automate afterward so you don’t have incidents.
Task 1: Confirm your host architecture (don’t guess)
cr0x@server:~$ uname -m
aarch64
What it means: The node is ARM 64-bit. It expects linux/arm64 images.
Decision: If the image tag doesn’t advertise linux/arm64, you’re already in “wrong binary” territory.
Task 2: See what platforms your tag claims to support
cr0x@server:~$ docker buildx imagetools inspect myorg/myapp:latest
Name: myorg/myapp:latest
MediaType: application/vnd.oci.image.index.v1+json
Digest: sha256:8c9c2f7b4f8a5b0d5c0a2b1e9c3d1a6e2f4b7a9c0d1e2f3a4b5c6d7e8f9a0b1c
Manifests:
Name: myorg/myapp:latest@sha256:111...
Platform: linux/amd64
Name: myorg/myapp:latest@sha256:222...
Platform: linux/arm64
What it means: The tag is a multi-arch index containing amd64 and arm64 variants.
Decision: If your platform is missing here, fix publishing first. If it’s present, move on to verifying the contents of the arm64 image.
Task 3: Inspect the manifest list with the Docker CLI (alternate view)
cr0x@server:~$ docker manifest inspect myorg/myapp:latest | head -n 20
{
"schemaVersion": 2,
"mediaType": "application/vnd.oci.image.index.v1+json",
"manifests": [
{
"mediaType": "application/vnd.oci.image.manifest.v1+json",
"digest": "sha256:111...",
"platform": {
"architecture": "amd64",
"os": "linux"
}
},
{
"mediaType": "application/vnd.oci.image.manifest.v1+json",
"digest": "sha256:222...",
What it means: You’re looking at the raw platform mapping.
Decision: When debugging registry weirdness, JSON doesn’t lie. If the manifest list doesn’t include your platform, stop blaming Kubernetes.
Task 4: Force-pull a specific platform locally
cr0x@server:~$ docker pull --platform=linux/arm64 myorg/myapp:latest
latest: Pulling from myorg/myapp
Digest: sha256:222...
Status: Downloaded newer image for myorg/myapp:latest
What it means: You asked for the arm64 variant and got a specific digest.
Decision: Use that digest for deeper inspection. If the pull fails with “no matching manifest,” your multi-arch publish is incomplete.
Task 5: Confirm the image config architecture metadata
cr0x@server:~$ docker image inspect myorg/myapp:latest --format '{{.Architecture}} {{.Os}}'
arm64 linux
What it means: The local image metadata claims it’s arm64.
Decision: Good sign, not proof. Next, verify the actual executable files.
Task 6: Check the binary architecture inside the container
cr0x@server:~$ docker run --rm --entrypoint /bin/sh myorg/myapp:latest -c 'file /usr/local/bin/myapp'
/usr/local/bin/myapp: ELF 64-bit LSB executable, ARM aarch64, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-aarch64.so.1, stripped
What it means: The bytes are actually arm64.
Decision: If it says “x86-64,” you have a wrong-binary build step (usually download logic). Fix Dockerfile logic, not the registry.
Task 7: Reproduce the failure signature (“exec format error”) safely
cr0x@server:~$ docker run --rm --platform=linux/arm64 myorg/myapp:latest /usr/local/bin/myapp --version
myapp 2.7.1
What it means: The arm64 variant executes and prints version output.
Decision: If you get exec format error, the runtime is trying to execute the wrong arch (or the file is not an executable for that OS/arch). Go back to Task 6.
Task 8: Verify which builder you are using (and whether it’s sane)
cr0x@server:~$ docker buildx ls
NAME/NODE DRIVER/ENDPOINT STATUS BUILDKIT PLATFORMS
default docker
default unix:///var/run/docker.sock running v0.12.5 linux/amd64
multiarch docker-container
multiarch0 unix:///var/run/docker.sock running v0.12.5 linux/amd64,linux/arm64,linux/arm/v7
What it means: The default builder can only do amd64; the docker-container builder supports multiple platforms (likely via binfmt/QEMU).
Decision: Use a dedicated builder instance for multi-arch. Don’t trust “default” unless you like surprises.
Task 9: Create and select a proper multi-arch builder
cr0x@server:~$ docker buildx create --name multiarch --driver docker-container --use
multiarch
What it means: Buildx will run a BuildKit container for consistent behavior and multi-platform features.
Decision: In CI, always create/use a named builder. It makes builds reproducible and debuggable.
Task 10: Check binfmt/QEMU registration on the host
cr0x@server:~$ docker run --privileged --rm tonistiigi/binfmt --info
Supported platforms: linux/amd64, linux/arm64, linux/arm/v7, linux/arm/v6
Enabled platforms: linux/amd64, linux/arm64, linux/arm/v7
What it means: The kernel has binfmt handlers enabled for those architectures.
Decision: If your target platform isn’t enabled, multi-arch builds that require emulation will fail or silently skip. Enable the platform or switch to native builders per arch.
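If a platform you need is missing from the enabled list, the commonly documented fix is installing the handlers via the same image (swap all for a single platform name if you prefer), then re-running the check above:

cr0x@server:~$ docker run --privileged --rm tonistiigi/binfmt --install all

Treat this as provisioning, not a one-off: bake it into builder setup so a rebuilt CI host doesn’t quietly lose arm64 support.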
Task 11: Build and push a multi-arch image (the correct way)
cr0x@server:~$ docker buildx build --platform=linux/amd64,linux/arm64 -t myorg/myapp:2.7.1 --push .
[+] Building 142.6s (24/24) FINISHED
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 2.12kB 0.0s
=> [linux/amd64] exporting to image 8.1s
=> => pushing layers 6.7s
=> [linux/arm64] exporting to image 9.4s
=> => pushing layers 7.5s
=> exporting manifest list 1.2s
=> => pushing manifest list 0.6s
What it means: You built both platforms and pushed a manifest list.
Decision: If you don’t see “exporting manifest list,” you probably didn’t push a multi-arch index. Fix that before you declare victory.
Task 12: Validate the pushed tag is an index, not a single manifest
cr0x@server:~$ docker buildx imagetools inspect myorg/myapp:2.7.1 | sed -n '1,20p'
Name: myorg/myapp:2.7.1
MediaType: application/vnd.oci.image.index.v1+json
Digest: sha256:4aa...
Manifests:
Name: myorg/myapp:2.7.1@sha256:aaa...
Platform: linux/amd64
Name: myorg/myapp:2.7.1@sha256:bbb...
Platform: linux/arm64
What it means: Registry now stores a true multi-arch index for that tag.
Decision: Gate your pipeline on this check. If it’s not an index, fail the build.
Task 13: Diagnose a Dockerfile that downloads the wrong asset
cr0x@server:~$ docker buildx build --platform=linux/arm64 --progress=plain --no-cache .
#10 [linux/arm64 6/9] RUN curl -fsSL -o /usr/local/bin/helm.tgz "https://example/helm-linux-amd64.tgz"
#10 0.9s curl: (22) The requested URL returned error: 404
What it means: Your Dockerfile hard-coded amd64 (and it’s failing on arm64, which is merciful). Often it won’t 404 and will “work” while shipping the wrong bytes.
Decision: Replace hard-coded architecture with BuildKit’s platform args (TARGETARCH, TARGETOS, TARGETVARIANT) and map them to upstream naming.
Task 14: Use BuildKit-provided platform arguments correctly
cr0x@server:~$ docker buildx build --platform=linux/arm64 --progress=plain --no-cache -t test/myapp:arm64 .
#5 [linux/arm64 3/7] RUN echo "TARGETOS=$TARGETOS TARGETARCH=$TARGETARCH TARGETVARIANT=$TARGETVARIANT"
#5 0.1s TARGETOS=linux TARGETARCH=arm64 TARGETVARIANT=
What it means: BuildKit injects target platform values; you should base downloads/compilation on these, not uname.
Decision: If your Dockerfile uses uname -m to choose binaries, you’re one refactor away from pain. Switch to BuildKit args.
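One detail worth stating explicitly: BuildKit defines these args in the global scope, but a RUN step only sees them if the Dockerfile redeclares them. A minimal sketch of the pattern Task 14 assumes:

FROM debian:stable-slim
# Redeclare the automatic platform args; without these ARG lines they expand to empty strings in RUN.
ARG TARGETOS
ARG TARGETARCH
ARG TARGETVARIANT
RUN echo "TARGETOS=$TARGETOS TARGETARCH=$TARGETARCH TARGETVARIANT=$TARGETVARIANT"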
Task 15: Confirm the running container’s platform in Kubernetes (from a node)
cr0x@server:~$ kubectl get node -o wide | head -n 3
NAME STATUS ROLES AGE VERSION INTERNAL-IP OS-IMAGE KERNEL-VERSION
worker-arm01 Ready <none> 55d v1.29.1 10.0.3.21 Ubuntu 22.04.3 LTS 5.15.0-91-generic
worker-x86a01 Ready <none> 55d v1.29.1 10.0.2.14 Ubuntu 22.04.3 LTS 5.15.0-91-generic
What it means: You have a mixed-arch cluster (common now). Scheduling will place pods on either unless constrained.
Decision: If the image isn’t truly multi-arch, you must pin node selectors/affinity temporarily, or you’ll get “works on one node” roulette.
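For the temporary pin, the well-known kubernetes.io/arch node label is the handle. A sketch, assuming a Deployment named myapp and that only the amd64 variant is currently trustworthy:

cr0x@server:~$ kubectl patch deployment myapp -p '{"spec":{"template":{"spec":{"nodeSelector":{"kubernetes.io/arch":"amd64"}}}}}'

Remove the selector once the tag is verifiably multi-arch; leaving it in place just hides the publishing bug.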
Task 16: Identify the image digest actually used by a pod
cr0x@server:~$ kubectl get pod myapp-5f7f6d9c7b-2qk4p -o jsonpath='{.status.containerStatuses[0].imageID}{"\n"}'
docker-pullable://myorg/myapp@sha256:bbb...
What it means: The pod resolved the tag to a digest (immutable). That digest corresponds to one platform manifest.
Decision: Compare this digest with your manifest list. If it points at the wrong platform entry, your registry metadata or client platform negotiation is off.
Task 17: Debug BuildKit cache behavior (and stop “optimizing” blindly)
cr0x@server:~$ docker buildx build --platform=linux/amd64,linux/arm64 \
--cache-to=type=registry,ref=myorg/myapp:buildcache,mode=max \
--cache-from=type=registry,ref=myorg/myapp:buildcache \
-t myorg/myapp:ci-test --push .
[+] Building 98.3s (24/24) FINISHED
=> importing cache manifest from myorg/myapp:buildcache 2.4s
=> [linux/amd64] CACHED 0.6s
=> [linux/arm64] CACHED 0.8s
=> exporting manifest list 1.1s
What it means: Cache was reused for both platforms, and a manifest list was exported.
Decision: If you see one platform cached and the other rebuilding from scratch every time, your Dockerfile ordering or platform-conditional steps are defeating cache reuse.
Task 18: Catch “single-arch masquerading as multi-arch” in CI
cr0x@server:~$ docker buildx imagetools inspect myorg/myapp:ci-test | grep -E 'MediaType|Platform'
MediaType: application/vnd.oci.image.index.v1+json
Platform: linux/amd64
Platform: linux/arm64
What it means: Your tag is an index and includes both platforms.
Decision: Make this a required pipeline step. If it prints application/vnd.oci.image.manifest.v1+json instead, fail the job.
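A rough shape for that gate, assuming the tag and platform list from this example (the regex accepts both the OCI index and Docker manifest list media types, since registries report either):

# minimal CI gate sketch: fail unless the tag is a multi-arch index with the required platforms
set -eu
out="$(docker buildx imagetools inspect myorg/myapp:ci-test)"
echo "$out" | grep -Eq 'application/vnd\.(oci\.image\.index\.v1|docker\.distribution\.manifest\.list\.v2)\+json' \
  || { echo "tag is not a multi-arch index" >&2; exit 1; }
for p in linux/amd64 linux/arm64; do
  echo "$out" | grep -q "Platform: *$p" || { echo "missing platform: $p" >&2; exit 1; }
done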
A build strategy that behaves in CI
The correct strategy depends on what you build. Compiled languages behave differently than “curl a binary into /usr/local/bin” builds. But the principles are consistent:
Principle 1: Prefer native builds per architecture when you can
If you have the option to run an arm64 runner and an amd64 runner, take it. Native builds avoid emulation surprises and are usually faster for heavy compilation.
That doesn’t mean you need two totally separate pipelines. It means you should consider a builder topology:
- Remote builders per arch (BuildKit supports this pattern)
- Or separate CI jobs that push per-arch images and then publish a manifest list
Principle 2: If you use QEMU, treat it as production infrastructure
QEMU via binfmt is not “just a developer convenience” once it’s in CI. It can break across kernel updates, Docker updates, and even security hardening. Monitor it and validate it.
Principle 3: Make the Dockerfile platform-aware without being clever
Use BuildKit’s args. They exist for a reason:
- TARGETOS, TARGETARCH, TARGETVARIANT for the target platform
- BUILDOS, BUILDARCH for the build environment
Then map to vendor naming explicitly. Many upstreams use x86_64 instead of amd64, or aarch64 instead of arm64. Don’t guess; map.
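A sketch of that mapping, assuming a hypothetical upstream that names its archives with x86_64 and aarch64 (the echo stands in for the real download):

FROM debian:stable-slim
ARG TARGETARCH
# Translate Docker's architecture names into the upstream's names; fail loudly on anything unmapped.
RUN set -eu; \
    case "${TARGETARCH}" in \
      amd64) UPSTREAM_ARCH=x86_64 ;; \
      arm64) UPSTREAM_ARCH=aarch64 ;; \
      *) echo "unsupported TARGETARCH: ${TARGETARCH}" >&2; exit 1 ;; \
    esac; \
    echo "would fetch tool-linux-${UPSTREAM_ARCH}.tar.gz"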
Principle 4: Separate “fetch tools” from “build app”
The more you mix responsibilities, the more you risk cache pollution and platform mix-ups. A clean pattern:
- Stage A: fetch or build platform-specific tooling (keyed on target arch)
- Stage B: build your application (cross-compile if appropriate)
- Stage C: minimal runtime image (copy in exactly what you need)
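A skeletal Dockerfile for that shape; the base images, paths, and placeholder RUN steps are assumptions to swap for your real toolchain:

# Stage A: fetch or build platform-specific tooling, keyed on the target arch
FROM debian:stable-slim AS tools
ARG TARGETARCH
# placeholder for a TARGETARCH-aware fetch (see Principle 3 for the mapping)
RUN mkdir -p /out && echo "tooling for ${TARGETARCH}" > /out/manifest.txt

# Stage B: build the application (cross-compile here if your toolchain supports it)
FROM debian:stable-slim AS build
COPY --from=tools /out /opt/tools
COPY . /src
# placeholder build step; a real one would produce /src/myapp for the target platform
RUN printf '#!/bin/sh\necho "myapp placeholder"\n' > /src/myapp && chmod +x /src/myapp

# Stage C: minimal runtime image; copy in exactly what you need
FROM debian:stable-slim
COPY --from=build /src/myapp /usr/local/bin/myapp
ENTRYPOINT ["/usr/local/bin/myapp"]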
Principle 5: Always verify the published tag
Verification is cheap. Incidents are expensive. After pushing, inspect the manifest list, pull each platform variant, and verify the main binary with file. Automate it.
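A rough automation sketch for that verification, assuming the tag, platforms, and binary path used throughout this article, and a verification host with binfmt/QEMU so the non-native variant can actually execute:

#!/bin/sh
# Post-push check: pull each platform variant and confirm the entrypoint binary's architecture.
# Assumes the image ships /bin/sh and the file utility (as in Task 6); use a debug image if yours is distroless.
set -eu
IMAGE="myorg/myapp:2.7.1"
BINARY="/usr/local/bin/myapp"
for platform in linux/amd64 linux/arm64; do
  case "$platform" in
    linux/amd64) want="x86-64" ;;
    linux/arm64) want="aarch64" ;;
  esac
  docker pull --quiet --platform "$platform" "$IMAGE" >/dev/null
  docker run --rm --platform "$platform" --entrypoint /bin/sh "$IMAGE" -c "file $BINARY" \
    | grep -q "$want" || { echo "wrong architecture in $IMAGE for $platform" >&2; exit 1; }
done
echo "all platform variants match their advertised architecture"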
Joke #2: If you don’t verify multi-arch, you’re basically running a cross-platform lottery where the prize is an outage.
Three corporate mini-stories (anonymized, painfully familiar)
Mini-story 1: The incident caused by a wrong assumption
A mid-size company migrated a chunk of their Kubernetes nodes to arm64 to cut costs. The platform team did the reasonable thing: tainted the new nodes, moved low-risk services first, and watched error rates. It looked fine for a week.
Then a routine redeploy of a “boring” internal API started failing only on the arm nodes. On-call saw crash loops with exec format error. They assumed it was a base image issue and rolled back the cluster upgrade they’d just done. Nothing changed. They rolled back the application. Still broken on arm.
The wrong assumption was simple: “We use buildx, therefore our images are multi-arch.” They did use buildx. But the pipeline pushed a tag from a single docker build job that ran on an amd64 runner. Another job built arm64 for tests, but never pushed it.
The fix was equally simple: make publishing a manifest list the only path to :latest and release tags, and gate the pipeline with imagetools inspect plus a binary check. The deeper lesson was cultural: no tag should exist without a verification step, even if the team “knows” how it works.
Mini-story 2: The optimization that backfired
A different org had a slow build and decided to “make it faster” by aggressively caching and reusing a shared artifact directory across builds. They mounted a cache volume into the builder and stored compiled outputs keyed only by commit SHA. It cut build time dramatically—for a while.
Then the arm64 images started failing with segmentation faults in a crypto library during TLS handshakes. Not immediately. Only under load. The x86 images were fine. Everyone suspected a compiler bug, then a kernel regression, then cosmic rays.
The root cause was mundane and infuriating: the shared cache directory contained amd64-built objects that were being copied into the arm64 build stage because the build scripts used “if file exists, reuse it” logic. Under QEMU, some build steps were skipped and the cache short-circuited compilation in the worst possible way. The image metadata still declared arm64, but the bytes were mixed.
They fixed it by making caches platform-scoped and by refusing to share opaque artifact directories across platforms. BuildKit’s registry cache solved the original problem without the dangerous shortcut. The “optimization” had created a cross-architecture supply chain problem inside their own pipeline.
Mini-story 3: The boring but correct practice that saved the day
A finance-adjacent company had strict change control and a slightly annoying release process. Engineers complained about the extra steps: publish images by digest, record manifests, and keep a small “release evidence” log for each deploy. It felt like paperwork. It also meant their on-call engineers slept more.
During a busy week, a service started crashing only on a subset of nodes after a routine image rebuild. This time, the incident response was almost boring. They grabbed the pod’s imageID digest, matched it to the manifest list, and immediately saw that the arm64 digest referenced layers built two days earlier. The amd64 digest was new.
The build pipeline had partially failed pushing arm64 layers due to a transient registry permission issue. The job still published the tag. But because the team always deploys by digest in production and always records the mapping of tag → digest → platform, they could quickly pin the known-good digest for both platforms and restore service while they fixed CI.
The “boring” practice—digest-based promotion and manifest verification—didn’t prevent the mistake. It made the blast radius small and the diagnosis fast. That’s the real win.
Common mistakes: symptom → root cause → fix
1) Symptom: exec format error at container start
Root cause: Wrong architecture binary in the image, or the tag points to the wrong platform manifest.
Fix: Inspect the tag with docker buildx imagetools inspect. Pull the intended platform with docker pull --platform=.... Confirm binary arch with file. Fix Dockerfile downloads to use TARGETARCH.
2) Symptom: Works on amd64 nodes, crashes on arm64 nodes, but no exec format error
Root cause: Mixed-architecture userland components (plugins, shared libs) or an ABI mismatch (musl vs glibc), often introduced by “download latest” scripts.
Fix: Check file output for every copied binary, not just the main one. Pin base images and tool versions. Prefer distro packages or compile from source per arch.
3) Symptom: Tag claims arm64 support, but arm64 pull says “no matching manifest”
Root cause: You pushed a single-arch manifest, not a manifest list; or you overwrote the tag with a single-arch push.
Fix: Only allow tag publication from docker buildx build --platform ... --push. Use CI permissions to prevent manual pushes to release tags.
4) Symptom: Multi-arch builds are painfully slow
Root cause: QEMU emulation doing heavy compilation; no cache; Dockerfile invalidates cache too often.
Fix: Move to native builders per arch or cross-compile where appropriate. Add registry-backed cache. Reorder Dockerfile to maximize cache hits.
5) Symptom: arm64 build “succeeds” but runtime fails to find loader
Root cause: You copied a dynamically linked binary built against glibc into a musl image (or vice versa), or you copied only the binary without required shared libs.
Fix: Use a consistent base image family across stages, or build statically where appropriate. Validate with ldd (if available) or inspect with file and check interpreter path.
6) Symptom: CI shows green, but prod pulls an older digest for one architecture
Root cause: Partial push; manifest list updated incorrectly; cache or push step failed for one platform while another succeeded.
Fix: Gate on manifest verification and require both platforms to be present and freshly built. Consider pushing per-arch tags first, then creating the manifest list explicitly (see the sketch after this list).
7) Symptom: Docker Compose runs the wrong arch locally
Root cause: Local environment defaults to a platform; Compose may build locally for host arch unless platform: is set or you use buildx correctly.
Fix: Set explicit platform in Compose for testing, or pull with --platform. Don’t treat Compose behavior as proof your registry tag is correct.
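For the explicit assembly route mentioned in mistake 6, a sketch assuming per-arch tags were pushed as 2.7.1-amd64 and 2.7.1-arm64 (the suffix convention is yours to pick):

cr0x@server:~$ docker buildx imagetools create -t myorg/myapp:2.7.1 myorg/myapp:2.7.1-amd64 myorg/myapp:2.7.1-arm64

Because this step runs only after both per-arch pushes succeed, a partial failure leaves you without a release tag instead of with a lying one.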
Checklists / step-by-step plan
Step-by-step: harden a multi-arch release pipeline
- Decide target platforms explicitly. For most backend services: linux/amd64 and linux/arm64. Add linux/arm/v7 only if you truly support it.
- Create a named builder in CI. Use the docker-container driver for consistent BuildKit behavior.
- Install/verify binfmt if you rely on emulation. Run the binfmt info command and ensure target platforms are enabled.
- Make the Dockerfile platform-aware. Replace uname -m logic with TARGETARCH mapping.
- Build and push multi-arch in one operation. Use docker buildx build --platform=... --push.
- Verify the published tag is an index. Fail the pipeline if the media type is not an OCI index.
- Verify each platform variant’s main binary. Pull per platform and run file on the entrypoint binary.
- Promote by digest to production. Deploy immutable digests; keep tags for humans (see the example after this list).
- Monitor for platform skew. In mixed clusters, watch crash loops correlated with node arch.
- Lock down who can push release tags. Prevent “quick fixes” that bypass verification.
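The digest-promotion step from the list above, sketched with kubectl and the index digest from Task 12 (Deployment and container are both assumed to be named myapp):

cr0x@server:~$ kubectl set image deployment/myapp myapp=myorg/myapp@sha256:4aa...

Deploying the index digest still lets each node resolve its own platform manifest; what it removes is the moving tag.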
Checklist: Dockerfile review for multi-arch correctness
- Any curl/wget downloads? If yes, does the URL vary by TARGETARCH?
- Any installer scripts? Do they support arm64 explicitly or do they default to amd64?
- Any compiled artifacts copied from another stage? Are stages built for the same --platform?
- Any use of uname or dpkg --print-architecture? Are you sure it’s querying the target, not the build environment?
- Are you pinning versions and checksums per arch? If not, you’re trusting the internet in stereo.
Checklist: release verification gates (minimum viable safety)
- imagetools inspect shows an OCI index and includes all required platforms.
- Per-platform pull succeeds.
- Per-platform container runs --version (or a lightweight health command).
- Binary inspection via file matches expected arch.
FAQ
1) Do I always need QEMU to build multi-arch images?
No. If your build is pure cross-compilation (for example, Go with the right settings), you can build for other architectures without running foreign binaries. QEMU is needed when build steps execute target-arch binaries during the build.
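For example, a common shape for a Go service (a sketch, not your exact build; the package path and base images are assumptions):

# Build stage runs on the builder's native platform; no target-arch binaries execute here.
FROM --platform=$BUILDPLATFORM golang:1.22 AS build
ARG TARGETOS
ARG TARGETARCH
WORKDIR /src
COPY . .
# Go cross-compiles by setting GOOS/GOARCH from BuildKit's target args.
RUN CGO_ENABLED=0 GOOS=$TARGETOS GOARCH=$TARGETARCH go build -o /out/myapp ./cmd/myapp

# Runtime stage is assembled per target platform without running foreign code.
FROM gcr.io/distroless/static-debian12
COPY --from=build /out/myapp /usr/local/bin/myapp
ENTRYPOINT ["/usr/local/bin/myapp"]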
2) What’s the difference between docker build and docker buildx build here?
buildx uses BuildKit features: multi-platform outputs, advanced cache exporters, and remote builders. Classic docker build is typically single-platform and tied to the local daemon’s architecture.
3) Why does the tag show arm64 support but the binary is still amd64?
Because manifests are metadata. You can publish a manifest that says “arm64” while the layer contains an amd64 binary. This usually happens via hard-coded downloads or cross-arch cache contamination.
4) Should I build both architectures in one job or separate jobs?
If you have reliable native runners for each arch, separate jobs can be faster and more deterministic. If you rely on QEMU, a single buildx invocation is simpler but can be slower. Either way, publish one manifest list and verify it.
5) Can I “fix” an existing tag that’s wrong?
You can repush the tag to point to a corrected manifest list, but caches and rollouts might already have pulled the broken digest. In production, prefer deploying by digest so the fix is an explicit rollout decision, not a tag surprise.
6) Why does Kubernetes sometimes pull the wrong architecture?
Usually it doesn’t. It pulls what the manifest negotiation provides for the node platform. When it “pulls wrong,” it’s typically because the tag wasn’t a proper multi-arch index, or the index entry references the wrong content.
7) What about linux/arm/v7—should I support it?
Only if you truly have customers on 32-bit ARM. Supporting it increases build complexity, test matrix size, and the chance of variant mistakes. Don’t add it as a trophy platform.
8) How do I make “downloaded tools” safe across architectures?
Use TARGETARCH-based mapping, pin versions, and validate checksums per architecture. Better: use distro packages or build from source where it’s practical.
9) Is it okay to use :latest for multi-arch?
It’s okay for developer convenience, not okay as a production contract. Use immutable digests or versioned tags for releases, and treat :latest as a moving pointer.
10) What’s the single best guardrail to prevent wrong binaries?
A post-push verification job that (a) confirms the tag is an OCI index with the required platforms, and (b) validates the entrypoint binary architecture with file for each platform.
Conclusion: next steps you can do this week
If you run mixed architecture fleets—or you will soon—multi-arch isn’t optional polish. It’s basic release hygiene. The registry will happily store your mistakes, and Kubernetes will happily deploy them at scale.
- Add a verification gate: after push, require docker buildx imagetools inspect to show an OCI index with all platforms.
- Validate the bytes: for each platform, pull and run file on the main executable.
- Fix your Dockerfile downloads: replace hard-coded amd64 assets with TARGETARCH mapping.
- Pick a builder strategy: native builders per arch if you can; QEMU if you must—then monitor it like any other dependency.
- Deploy by digest: make rollouts explicit, reversible, and debuggable.
Multi-arch done right is quiet. That’s the goal: no heroics, no mystery crashes, no “works on my node.” Just correct binaries, every time.