You deploy a container, it starts fine, healthcheck is green… and then it can’t resolve a hostname.
apt update hangs. Your app logs say “Temporary failure in name resolution.”
The host itself resolves names perfectly, which is the most insulting part.
On Ubuntu 24.04, this often lands on the same tripwire: systemd-resolved is doing its job,
Docker is doing its job, and the overlap creates a DNS situation that looks like witchcraft. It isn’t.
It’s plumbing. And the fix does not require turning off half your network stack out of spite.
What actually breaks: stub resolvers, namespaces, and Docker’s DNS
Let’s name the moving parts, because the failure only looks mysterious when you treat DNS as a single box.
On Ubuntu 24.04, the host commonly uses systemd-resolved to provide a local “stub resolver”
bound to 127.0.0.53. The host’s /etc/resolv.conf is often a symlink to a file
that points at that stub.
Docker, meanwhile, typically injects a resolver into containers by copying or generating
/etc/resolv.conf inside the container’s network namespace. If Docker sees nameservers in the host’s
resolv.conf, it will try to use them. Here’s the catch: a container’s 127.0.0.53 is not the host.
It’s the container itself. So when a container tries to query 127.0.0.53, it’s basically asking
itself a DNS question, and nobody answers.
Docker does have an embedded DNS server (usually reachable at 127.0.0.11 inside containers)
for service discovery on user-defined networks. But that embedded DNS still needs upstream servers to forward
requests to the real world. If upstream is set to the host stub (127.0.0.53), forwarding fails.
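You can make the namespace point concrete with a quick check (a sketch; output omitted because it varies by system). The first command assumes systemd-resolved is active on the host; the second uses busybox netstat from the alpine image:
cr0x@server:~$ sudo ss -lunp | grep 127.0.0.53
cr0x@server:~$ docker run --rm alpine:3.20 sh -lc 'netstat -ltun | grep :53 || echo "no DNS listener in this namespace"'
On the host, the first command shows systemd-resolved bound to 127.0.0.53:53. Inside the container nothing listens on port 53, so the fallback message prints. Same address, different namespace, different answer.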
There are other variants:
- Docker picks up nameserver 127.0.0.53 from the host and writes it into containers. Immediate failure.
- Docker uses its embedded DNS (127.0.0.11), but its upstream is broken, so internal names work and external names fail.
- Split DNS (corporate VPN domains routed to specific resolvers) works on the host via systemd-resolved, but containers bypass that split logic and query public DNS for internal zones, getting NXDOMAIN or timeouts.
The goal here is not to “win” by disabling systemd-resolved. It provides legitimate value:
per-link DNS, DNSSEC state, caching, and good integration with NetworkManager and netplan. The goal is to make
Docker use a resolv.conf that contains reachable upstream servers, and to do it predictably.
One idea frames the mindset here (paraphrasing John Allspaw): reliability comes from understanding systems, not from blaming people.
DNS failures are nearly always system interactions, not operator incompetence.
Joke #1: DNS is the only part of the stack where “it worked yesterday” is considered a configuration format.
Interesting facts and historical context (so this stops feeling random)
- systemd-resolved's stub at 127.0.0.53 exists to avoid colliding with local resolvers like dnsmasq, and to keep a consistent loopback endpoint even when upstreams change.
- Docker's embedded DNS (127.0.0.11) was introduced to support container-to-container discovery on user-defined networks, not as a general-purpose "corporate DNS client."
- glibc's resolver library reads /etc/resolv.conf and has rules like "3 nameservers max" that can silently drop entries, which is fun when VPN clients add five resolvers.
- Ubuntu has moved further into systemd integration over the past releases; /etc/resolv.conf being a symlink is now normal, not exotic.
- Split DNS used to be rare outside enterprises. Now even home users hit it with mesh VPNs, dev tools, and "magic" DNS for internal domains.
- Network namespaces mean loopback is per-namespace. 127.0.0.1 inside a container is not the host. If you remember only one thing, remember that.
- resolv.conf in containers is not sacred. It's generated at container start and can change depending on Docker settings, network driver, and runtime.
- systemd-resolved exposes a "real" resolv.conf listing upstream servers at /run/systemd/resolve/resolv.conf, which is often the cleanest source for Docker to consume.
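That last point is easy to verify on a host running systemd-resolved in stub mode:
cr0x@server:~$ diff /run/systemd/resolve/stub-resolv.conf /run/systemd/resolve/resolv.conf
Expect the stub file to list nameserver 127.0.0.53 and the other to list your actual upstream servers; the rest is comments and search domains.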
Fast diagnosis playbook
When production is paging and you need signal fast, don’t start by editing config files.
Start by figuring out where resolution dies: inside the container, at Docker’s embedded DNS, or at the host resolver layer.
First: confirm what fails and where
- Inside container: can you resolve anything? Public names? Internal names?
- Inside container: what does /etc/resolv.conf actually contain?
- Host: is systemd-resolved healthy, and what upstream servers does it know?
Second: classify the failure
- nameserver 127.0.0.53 inside container → almost certainly wrong resolv.conf propagation.
- nameserver 127.0.0.11 inside container, but timeouts → Docker embedded DNS upstream is wrong/unreachable.
- Public works, corp domains fail → split DNS mismatch between host and containers.
- Only certain networks fail → VPN route, firewall, or MTU/fragmentation affecting DNS traffic.
Third: fix at the narrowest layer
- Prefer pointing Docker at a real resolv.conf or explicit DNS servers.
- Use per-container or per-Compose DNS settings when appropriate.
- Avoid disabling systemd-resolved unless you're replacing it with a better-defined resolver and you accept the blast radius.
Practical tasks: commands, expected output, and what decision to make
These are the checks I actually run on Ubuntu hosts when containers start failing DNS.
Each task includes (1) a command, (2) what the output means, and (3) the decision you make from it.
Run them in order until you have a confident diagnosis.
Task 1: Confirm Docker DNS symptom from inside a minimal container
cr0x@server:~$ docker run --rm alpine:3.20 sh -lc 'cat /etc/resolv.conf; nslookup -type=a example.com 2>&1 | sed -n "1,12p"'
nameserver 127.0.0.53
options edns0 trust-ad
nslookup: can't resolve '(null)': Name does not resolve
Meaning: the container is trying to talk to 127.0.0.53 (itself), not the host’s resolver.
The lookup fails immediately.
Decision: stop blaming upstream DNS. Fix how Docker populates container resolv.conf (or Docker’s daemon DNS).
Task 2: Check whether the host resolv.conf is the systemd stub
cr0x@server:~$ ls -l /etc/resolv.conf
lrwxrwxrwx 1 root root 39 Oct 10 09:12 /etc/resolv.conf -> ../run/systemd/resolve/stub-resolv.conf
Meaning: the host points to the stub resolver file.
Decision: if Docker is inheriting this, it will hand containers a dead-end nameserver. You want Docker to use the “real” resolv.conf.
Task 3: Inspect the “real” resolv.conf that lists upstream servers
cr0x@server:~$ sed -n '1,20p' /run/systemd/resolve/resolv.conf
# This is /run/systemd/resolve/resolv.conf managed by man:systemd-resolved(8).
nameserver 10.20.0.10
nameserver 10.20.0.11
search corp.example
Meaning: these are reachable upstream resolvers from the host network perspective.
Decision: the typical best-practice fix is to make Docker read this file instead of the stub file.
Task 4: Verify systemd-resolved is running and not degraded
cr0x@server:~$ systemctl status systemd-resolved --no-pager -l
● systemd-resolved.service - Network Name Resolution
Loaded: loaded (/usr/lib/systemd/system/systemd-resolved.service; enabled; preset: enabled)
Active: active (running) since Thu 2025-12-28 10:20:18 UTC; 3h 12min ago
Docs: man:systemd-resolved.service(8)
Main PID: 812 (systemd-resolve)
Status: "Processing requests..."
Tasks: 1 (limit: 38291)
Memory: 6.1M
CPU: 1.228s
Meaning: resolved is healthy.
Decision: do not “fix” this by disabling it. Your problem is Docker integration, not resolver health.
Task 5: See what DNS servers resolved believes it should use per-link
cr0x@server:~$ resolvectl status
Global
Protocols: -LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
resolv.conf mode: stub
Link 2 (ens160)
Current Scopes: DNS
Protocols: +DefaultRoute -LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
Current DNS Server: 10.20.0.10
DNS Servers: 10.20.0.10 10.20.0.11
DNS Domain: corp.example
Meaning: host uses ens160 DNS, and has a search domain.
Decision: if containers need corp.example resolution, you must make Docker use these upstream servers or you’ll lose split DNS behavior.
Task 6: Check Docker’s view of the host resolv.conf it will use
cr0x@server:~$ docker info 2>/dev/null | sed -n '/DNS/,+3p'
DNS: 127.0.0.53
Meaning: Docker daemon is configured (explicitly or implicitly) to use the stub resolver.
Decision: override Docker’s DNS to use real upstreams or point Docker at the non-stub resolv.conf.
Task 7: Confirm what resolv.conf a running container actually got
cr0x@server:~$ cid=$(docker run -d alpine:3.20 sleep 300); docker exec "$cid" cat /etc/resolv.conf; docker rm -f "$cid" >/dev/null
nameserver 127.0.0.53
options edns0 trust-ad
Meaning: this is not Docker’s embedded DNS; it’s directly inheriting the stub.
Decision: fix the input Docker uses (host resolv.conf or daemon DNS config).
Task 8: Check whether Docker’s embedded DNS is present (127.0.0.11 scenario)
cr0x@server:~$ docker network create testnet >/dev/null
cr0x@server:~$ cid=$(docker run -d --network testnet alpine:3.20 sleep 300)
cr0x@server:~$ docker exec "$cid" cat /etc/resolv.conf
nameserver 127.0.0.11
options ndots:0
cr0x@server:~$ docker rm -f "$cid" >/dev/null; docker network rm testnet >/dev/null
Meaning: on user-defined networks, Docker uses its embedded DNS.
Decision: if lookups still fail here, the upstream used by Docker’s embedded DNS is the problem.
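To separate "embedded DNS broken" from "embedded DNS fine, upstream broken," resolve a peer container and an external name in one shot. This is a sketch; testnet and web are throwaway names used only for the test:
cr0x@server:~$ docker network create testnet >/dev/null
cr0x@server:~$ docker run -d --name web --network testnet alpine:3.20 sleep 300 >/dev/null
cr0x@server:~$ docker run --rm --network testnet alpine:3.20 sh -lc 'nslookup web; nslookup example.com'
cr0x@server:~$ docker rm -f web >/dev/null; docker network rm testnet >/dev/null
If web resolves and example.com times out, the embedded DNS is healthy and its upstream is the problem.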
Task 9: Inspect iptables/nft rules that allow container DNS forwarding
cr0x@server:~$ sudo nft list ruleset | sed -n '1,80p'
table inet filter {
chain input {
type filter hook input priority filter; policy accept;
}
chain forward {
type filter hook forward priority filter; policy accept;
}
}
table ip nat {
chain POSTROUTING {
type nat hook postrouting priority srcnat; policy accept;
oifname "ens160" ip saddr 172.17.0.0/16 masquerade
}
}
Meaning: NAT masquerade exists, forward policy is accept (in this snippet).
Decision: if you see restrictive forward policies or missing masquerade, DNS queries may never leave the host. Fix firewall rules before touching DNS config.
Task 10: Watch systemd-resolved logs during a container DNS attempt
cr0x@server:~$ sudo journalctl -u systemd-resolved -n 30 --no-pager
Dec 28 13:21:04 server systemd-resolved[812]: Using degraded feature set UDP instead of UDP+EDNS0 for DNS server 10.20.0.10.
Dec 28 13:21:10 server systemd-resolved[812]: DNS server 10.20.0.11 does not support DNSSEC, disabling DNSSEC validation for this server.
Meaning: resolved is negotiating capabilities; these messages are not fatal by themselves.
Decision: if there are timeouts, SERVFAIL storms, or frequent server switching, you have upstream DNS issues too. Don’t stop at the Docker stub problem.
Task 11: Validate host can resolve what containers cannot (split DNS check)
cr0x@server:~$ resolvectl query registry.corp.example
registry.corp.example: 10.30.40.50 -- link: ens160
-- Information acquired via protocol DNS in 12.7ms.
-- Data is authenticated: no
Meaning: host resolves a corporate name through its configured link DNS.
Decision: if containers can’t resolve this, you need to pass those corporate resolvers to Docker, not public DNS.
Task 12: Validate container resolution against explicit DNS servers (quick proof)
cr0x@server:~$ docker run --rm --dns 10.20.0.10 --dns 10.20.0.11 alpine:3.20 sh -lc 'nslookup example.com | sed -n "1,8p"'
Server: 10.20.0.10
Address: 10.20.0.10:53
Non-authoritative answer:
Name: example.com
Address: 93.184.216.34
Meaning: with explicit upstream DNS, the container resolves fine.
Decision: your permanent fix is to configure Docker daemon DNS (or Compose/network DNS) to use reachable servers or the real resolv.conf.
Task 13: Check Docker daemon configuration file presence and effective settings
cr0x@server:~$ sudo test -f /etc/docker/daemon.json && sudo cat /etc/docker/daemon.json || echo "no /etc/docker/daemon.json"
no /etc/docker/daemon.json
Meaning: no explicit daemon configuration is set.
Decision: you can safely add a minimal daemon.json to control DNS rather than playing symlink games.
Task 14: Confirm what resolv.conf Docker uses when the host file changes
cr0x@server:~$ sudo readlink -f /etc/resolv.conf
/run/systemd/resolve/stub-resolv.conf
Meaning: Docker will commonly pick this up at daemon start, not necessarily dynamically per container.
Decision: if you change host resolv.conf symlink, restart Docker to ensure it re-reads it (plan maintenance window if needed).
Joke #2: Nothing says “high availability” like shipping a DNS fix that requires a daemon restart at 2 p.m. on a Tuesday.
Fix patterns that work (without disabling systemd-resolved)
There are three sane approaches. Pick one based on how “enterprise” your DNS is, and how much you value keeping host networking standard.
I’ll tell you what I’d do in production for each case.
Fix pattern A (recommended): tell Docker to use real upstream DNS servers
This is the cleanest in the “make failures boring” sense. You stop Docker from inheriting the stub resolver and give it
the resolvers it should forward to.
Create or edit /etc/docker/daemon.json:
cr0x@server:~$ sudo install -d -m 0755 /etc/docker
cr0x@server:~$ sudo tee /etc/docker/daemon.json >/dev/null <<'EOF'
{
  "dns": ["10.20.0.10", "10.20.0.11"],
  "dns-search": ["corp.example"]
}
EOF
cr0x@server:~$ sudo systemctl restart docker
cr0x@server:~$ docker info 2>/dev/null | sed -n '/DNS/,+3p'
DNS: 10.20.0.10
DNS: 10.20.0.11
What this does: Docker’s embedded DNS and/or container resolv.conf now points at resolvers reachable from containers (via host NAT).
Why it’s good: deterministic, no symlink games, resilient to Ubuntu updates.
Downside: if upstream DNS changes (DHCP, roaming laptops), you must update this config or automate it.
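A quick container-level check after the restart (the nameservers shown will be whatever you configured above):
cr0x@server:~$ docker run --rm alpine:3.20 cat /etc/resolv.conf
On the default bridge network you should see the configured servers listed directly; on user-defined networks you will still see 127.0.0.11, with Docker's embedded DNS forwarding to those servers.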
Fix pattern B: point host /etc/resolv.conf at systemd-resolved’s non-stub file
This is common and works well on servers with stable networking. The trick: you’re not disabling resolved; you’re just
making /etc/resolv.conf list real upstream servers instead of 127.0.0.53.
Docker then inherits usable resolvers.
cr0x@server:~$ sudo mv /etc/resolv.conf /etc/resolv.conf.bak
cr0x@server:~$ sudo ln -s /run/systemd/resolve/resolv.conf /etc/resolv.conf
cr0x@server:~$ ls -l /etc/resolv.conf
lrwxrwxrwx 1 root root 32 Dec 28 14:01 /etc/resolv.conf -> /run/systemd/resolve/resolv.conf
cr0x@server:~$ sudo systemctl restart docker
What this does: host glibc resolver now queries upstream servers directly.
Why it’s good: Docker inherits good nameservers without extra daemon config.
Downside: some setups rely on the stub mode for specific behaviors; also, laptops/VPN setups can rotate DNS a lot.
If this host roams networks, I’d rather configure Docker explicitly.
Fix pattern C: per-Compose / per-container DNS overrides (surgical, sometimes necessary)
This is what you do when one stack needs corporate DNS and another must use public resolvers, or when a vendor container
does something weird with DNS. It’s also what you do when you don’t own the host daemon settings (hello, shared runners).
Example with Docker Compose:
cr0x@server:~$ cat compose.yaml
services:
  app:
    image: alpine:3.20
    command: ["sh", "-lc", "cat /etc/resolv.conf; nslookup example.com | sed -n '1,8p'; sleep 3600"]
    dns:
      - 10.20.0.10
      - 10.20.0.11
    dns_search:
      - corp.example
What this does: overrides resolv.conf inside the container.
Why it’s good: limited blast radius; change control-friendly.
Downside: config sprawl. If you have 40 compose projects, you will forget one.
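To confirm the override took effect, bring the stack up and read the service's own output (assuming the compose.yaml above):
cr0x@server:~$ docker compose up -d
cr0x@server:~$ docker compose logs app | sed -n '1,12p'
The logs should show a resolv.conf listing 10.20.0.10 and 10.20.0.11, followed by a successful lookup.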
What to avoid: “just disable systemd-resolved” as a reflex
You can absolutely rip out systemd-resolved and use a static resolv.conf or another local resolver.
Sometimes that’s the right move, especially if you already run dnsmasq/unbound and you want a simpler stack.
But as a default response, it’s heavy-handed.
Disabling resolved often breaks:
- VPN split DNS behavior integrated with NetworkManager
- Per-interface DNS configuration on multi-homed hosts
- Systems expecting resolved’s stub (various desktop and dev tooling assumptions)
If you want reliability, don’t remove components until you can explain exactly what they were doing for you.
One subtle production detail: restart behavior and container lifecycle
Docker usually writes a container’s /etc/resolv.conf at container start. If you fix DNS and don’t restart containers,
some will keep the old broken config. So the rollout plan matters:
- Fix Docker daemon DNS.
- Restart Docker if needed.
- Redeploy / restart containers to refresh their resolv.conf.
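A quick way to find containers still carrying the stale config (a sketch; it assumes each container image ships a shell and grep, which the most minimal ones do not):
cr0x@server:~$ for cid in $(docker ps -q); do docker exec "$cid" grep -q 127.0.0.53 /etc/resolv.conf 2>/dev/null && docker inspect --format '{{.Name}}' "$cid"; done
Anything this prints still needs a restart or redeploy.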
Three corporate-world mini-stories (how this fails in real orgs)
Mini-story 1: the incident caused by a wrong assumption
A team I worked with ran build agents on Ubuntu hosts. Nothing fancy: Docker builds, a few integration tests,
then push artifacts to an internal registry. A routine OS upgrade rolled through: Ubuntu 24.04, kernel updates,
a reboot, and everything looked fine. Host could resolve anything. The CI agents? Half of them started failing
on “could not resolve host” mid-pipeline.
The wrong assumption was painfully simple: “If the host resolves DNS, containers will too.”
That belief held for years on older configurations because the host’s /etc/resolv.conf listed real upstream servers.
The upgrade flipped it to the systemd stub, and Docker dutifully copied the stub address into containers.
Containers then asked themselves for DNS answers. They were not as wise as they looked.
The initial response was classic corporate escalation theater: open tickets with the network team, ask if DNS is down,
retry builds, blame “flaky infra.” Meanwhile, the clue was sitting in plain sight: nameserver 127.0.0.53
inside the container. The fix was to set Docker daemon DNS to the corporate resolvers and restart the Docker service
during a maintenance window.
The lasting lesson wasn’t “systemd-resolved is bad.” It was “namespaces change what localhost means.”
On every incident review since, the team added one line: always check container /etc/resolv.conf
before diagnosing upstream DNS.
Mini-story 2: the optimization that backfired
Another org decided they were going to “reduce latency” by hardcoding public DNS servers into Docker on all dev machines.
The reasoning sounded decent: fewer moving parts, no reliance on corporate DNS, and faster resolution than the VPN.
They pushed a standard /etc/docker/daemon.json with two public resolvers.
It worked for a week. Then internal tooling started failing in containers: package mirrors, internal Git hosts,
service discovery under a private domain. On the host, it still worked because systemd-resolved
applied split DNS: corporate domains went to corporate resolvers, everything else went public.
Containers bypassed that logic and hit public DNS for private zones, returning NXDOMAIN quickly and confidently.
The “optimization” didn’t just fail; it failed fast, in the most misleading way. Developers saw “host works, container doesn’t,”
assumed Docker networking was broken, and started adding hacks: extra --add-host entries, bespoke DNS settings per project,
and caches that masked the issue until the next network change.
The rollback was to stop forcing public DNS globally, and instead configure Docker to use the same upstream servers as the host,
or use per-project DNS only when a legitimate exception existed. The deeper win was policy: DNS is part of environment parity.
If your containerized dev environment doesn’t follow the same DNS routing rules as production, you will debug ghosts.
Mini-story 3: the boring but correct practice that saved the day
One SRE team had an unglamorous habit: every host had a tiny “network sanity” script run by cron and also available as a runbook snippet.
It checked three things: host DNS resolution via resolvectl, container DNS resolution via a known minimal container,
and whether the host’s /etc/resolv.conf was stub or upstream.
When Ubuntu 24.04 rolled out, alerts started firing before customers noticed anything: “container DNS mismatch detected.”
Nothing was down yet because most services used already-running containers with cached results, but the team knew
that the next deploy would fail.
They used the script output to classify the issue as “stub propagated into containers” and applied a standard fix:
set Docker daemon DNS to upstream resolvers drawn from /run/systemd/resolve/resolv.conf, then restart Docker
during a controlled window, then restart only stateless services first.
The practice wasn’t glamorous. It didn’t involve service meshes or AI. It was basic validation at the boundaries
between host and container. And it turned what could have been a multi-team incident into a planned change.
Common mistakes: symptoms → root cause → fix
1) Containers show “Temporary failure in name resolution” immediately
Symptom: any DNS lookup fails instantly inside containers; host works.
Root cause: container /etc/resolv.conf has nameserver 127.0.0.53.
Fix: configure Docker daemon DNS (/etc/docker/daemon.json) or point host resolv.conf to
/run/systemd/resolve/resolv.conf, then restart Docker and restart affected containers.
2) Internal service names resolve, external names time out
Symptom: ping other-container works on a user-defined network, but apt update fails.
Root cause: Docker embedded DNS (127.0.0.11) works for internal discovery, but upstream forwarding breaks
due to stub resolver inheritance or unreachable upstream DNS.
Fix: set Docker daemon DNS to real upstream resolvers; confirm NAT/firewall permits UDP/TCP 53 egress.
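To separate "upstream unreachable" from "wrong upstream," query a specific resolver directly from inside a container, bypassing the embedded DNS (10.20.0.10 is the example resolver used throughout; substitute yours):
cr0x@server:~$ docker run --rm alpine:3.20 nslookup example.com 10.20.0.10
If this works while normal lookups fail, the network path is fine and Docker is forwarding to the wrong place; if it times out, look at NAT, firewall, or VPN routing before touching DNS config.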
3) Works off VPN, fails on VPN (or the other way around)
Symptom: corporate domains fail only when VPN is connected.
Root cause: host uses split DNS via resolved; containers are pointed at public resolvers or old resolvers.
Fix: feed Docker the same corporate resolvers (and search domains) the host uses, or add per-Compose DNS for stacks that require it.
4) Only some containers fail after a DNS change
Symptom: old containers keep working; new deployments fail.
Root cause: containers keep the resolv.conf they were created with; new ones get the new broken config.
Fix: after fixing Docker DNS, roll restart containers (prioritize stateless), or recreate compose stacks.
5) DNS works for small queries but fails for larger responses
Symptom: some lookups succeed; others hang or return SERVFAIL; often worse on VPN.
Root cause: MTU/fragmentation issues causing dropped UDP fragments, or EDNS0 size issues on a path.
Fix: reduce MTU on docker bridge or VPN interface, or ensure TCP fallback works; check resolved logs for “degraded feature set.”
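Two quick numbers to compare for the MTU theory (tun0 is a placeholder for your VPN interface name):
cr0x@server:~$ cat /sys/class/net/docker0/mtu /sys/class/net/tun0/mtu
If the Docker bridge MTU exceeds what the VPN path carries, lower it (for example "mtu": 1400 in /etc/docker/daemon.json, then restart Docker) or fix the VPN MTU.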
6) You “fixed” it by setting 8.8.8.8, now internal things are broken
Symptom: public domains resolve; internal domains do not.
Root cause: you bypassed corporate DNS and split-horizon zones.
Fix: use corporate resolvers as upstream for Docker, not public ones, unless you’re certain your workloads never need internal DNS.
7) Host DNS breaks after changing /etc/resolv.conf symlink
Symptom: host suddenly stops resolving or behaves inconsistently after symlink changes.
Root cause: tools expecting stub mode; or you overwrote resolv.conf in a way that fights NetworkManager/netplan.
Fix: prefer Docker daemon DNS config (pattern A). If you do symlink changes, keep a backup and verify with resolvectl.
Checklists / step-by-step plan
Plan A (recommended): stable servers, you control Docker daemon
- Confirm the failure: run a one-shot container and check /etc/resolv.conf. If you see 127.0.0.53, proceed.
- Identify upstream resolvers from /run/systemd/resolve/resolv.conf or resolvectl status.
- Configure Docker daemon DNS in /etc/docker/daemon.json with those upstream servers.
- Restart Docker in a controlled window.
- Roll containers (restart/recreate) to refresh container resolv.conf.
- Validate with both public and internal domains (if relevant).
- Record the decision in a runbook: "We pin Docker DNS to upstream resolvers; do not use 127.0.0.53."
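One command covers the last two validation steps (registry.corp.example is the internal example name used earlier; substitute a real internal hostname):
cr0x@server:~$ docker run --rm alpine:3.20 sh -lc 'nslookup example.com && nslookup registry.corp.example'
If both resolve, record the result and close the change.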
Plan B: you must preserve split DNS behavior precisely
- Use the corporate resolvers that resolved reports per-link, and include search domains if needed.
- For stacks that must resolve multiple zones differently, prefer per-Compose DNS overrides rather than one global setting.
- Validate with resolvectl query on the host and nslookup inside containers for the same names.
Plan C: short-term mitigation while you wait for change control
- Run critical containers with --dns set explicitly to restore functionality.
- Document which services you touched and why, because this will otherwise become "mystery config" in three months.
- Schedule the proper daemon-level fix to avoid permanent snowflakes.
FAQ
1) Why does 127.0.0.53 work on the host but not in containers?
Because loopback is per network namespace. The container’s 127.0.0.53 is not the host’s resolver; it’s inside the container.
Unless you run resolved inside that container (don’t), nothing is listening there.
2) Isn’t Docker supposed to use 127.0.0.11 for DNS?
Often yes, especially on user-defined networks. But Docker’s embedded DNS still forwards to upstream resolvers.
If Docker learned upstream as 127.0.0.53 from the host, forwarding fails anyway.
3) Can I just change /etc/resolv.conf to the non-stub file?
You can, and it often works well on servers. On laptops or VPN-heavy environments, it may behave differently than stub mode.
If you want the least surprising path, configure Docker daemon DNS explicitly.
4) Do I need to restart Docker after changing DNS settings?
Yes for daemon-level changes. Also restart or recreate containers so their /etc/resolv.conf is regenerated with the new settings.
Otherwise you’ll fix the host and keep running broken containers.
5) What if my DNS servers are provided by DHCP and change frequently?
Then hardcoding DNS in /etc/docker/daemon.json is brittle unless you automate it.
For roaming hosts, consider a small automation that updates Docker DNS from /run/systemd/resolve/resolv.conf and restarts Docker during safe windows.
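A sketch of that automation, assuming a hypothetical /usr/local/bin/sync-docker-dns.sh run from a systemd timer or a network dispatcher hook, and assuming DNS is the only thing you keep in daemon.json (merge with jq instead of overwriting if it is not):
cr0x@server:~$ cat /usr/local/bin/sync-docker-dns.sh
#!/bin/sh
# Sketch: mirror systemd-resolved's upstream servers into Docker's daemon DNS.
# WARNING: overwrites /etc/docker/daemon.json; merge instead if you keep other keys there.
set -eu
servers=$(awk '/^nameserver/ { printf "\"%s\",", $2 }' /run/systemd/resolve/resolv.conf | sed 's/,$//')
new=$(printf '{ "dns": [%s] }' "$servers")
current=$(cat /etc/docker/daemon.json 2>/dev/null || true)
if [ "$new" != "$current" ]; then
    printf '%s\n' "$new" > /etc/docker/daemon.json
    systemctl restart docker   # run this in a safe window; a restart briefly disrupts containers
fi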
6) Why not set Docker DNS to a public resolver and move on?
Because split-horizon DNS is real. Internal zones won’t exist publicly, and some orgs intentionally serve different answers
internally vs externally. Public DNS might “work” until you touch anything corporate, then it fails in confusing ways.
7) Does this affect Kubernetes/containerd too?
Yes, in spirit. The exact mechanics differ (CNI, kubelet resolvConf settings), but the underlying issue remains:
containers need resolvers reachable from their namespace. Stub resolvers on loopback are a recurring footgun.
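For reference, kubelet has a resolvConf setting for exactly this reason; on systemd-resolved hosts it is commonly pointed at the non-stub file (illustrative excerpt of a KubeletConfiguration):
# /var/lib/kubelet/config.yaml (excerpt)
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
resolvConf: /run/systemd/resolve/resolv.conf
That way, pod DNS and CoreDNS forwarding see real upstream servers instead of the loopback stub.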
8) I see timeouts, not immediate failures. Is it still the stub problem?
Sometimes. Immediate “can’t resolve” often points to 127.0.0.53 inside the container.
Timeouts can also indicate firewall/NAT issues, VPN routing, MTU problems, or upstream resolver health. Use the fast diagnosis playbook to classify it.
9) Can I run a DNS cache on the host and point Docker at that?
Yes, but do it deliberately. If you run unbound/dnsmasq bound to a non-loopback address reachable from container networks,
containers can use it. Binding only to 127.0.0.1 recreates the namespace problem in a different outfit.
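A minimal sketch with dnsmasq, assuming the default bridge gateway 172.17.0.1 and the example corporate resolvers from earlier (adjust addresses to your environment, and make sure dnsmasq starts after the docker0 bridge exists):
# /etc/dnsmasq.d/docker-upstream.conf
listen-address=172.17.0.1
bind-interfaces
server=10.20.0.10
server=10.20.0.11
Then point Docker at it with "dns": ["172.17.0.1"] in /etc/docker/daemon.json. The cache is reachable from container networks because it binds to the bridge address, not loopback.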
Conclusion: next steps you can actually ship
The recurring Ubuntu 24.04 + Docker DNS failure is not a “Docker is broken” story. It’s a predictable mismatch:
systemd-resolved advertises a loopback stub, and Docker (or your containers) can’t reach that loopback in a different namespace.
Once you see it, you can’t unsee it.
Practical next steps:
- Run the one-shot container check and look at /etc/resolv.conf. Don't guess.
- If you see 127.0.0.53, pick a fix pattern: daemon DNS (preferred) or a resolv.conf symlink to the non-stub file.
- Restart Docker in a controlled window, then restart/recreate containers to pick up the new resolver config.
- Validate both public and internal domains (if you have any split DNS at all, assume you do).
- Write down the decision in your runbook so the next upgrade doesn’t re-teach the same lesson the hard way.
The best DNS fix is the one that turns DNS back into boring infrastructure. Save your adrenaline for things that deserve it.