DNS inside containers fails in the most demoralizing way: not always. You ship an image, the health checks are green, then the app can’t resolve db.internal for two minutes, recovers, fails again, and your incident timeline turns into interpretive dance.
On modern Linux, this frequently isn’t “Docker DNS” so much as “your host’s resolv.conf points at systemd’s stub resolver (127.0.0.53), and Docker copied it into a network namespace where that address doesn’t mean what you think it means.” Fixing it once is easy. Making it stick across reboots, VPN changes, DHCP renewals, and distro updates is the real job.
Fast diagnosis playbook
This is the order I use when someone says “DNS is broken in Docker.” It’s designed to isolate the failure point quickly, not to satisfy philosophical curiosity.
First: is it container-only or host-wide?
- Resolve from the host with the same name the container is failing on.
- Resolve from a fresh container attached to the same network as the failing service.
- Compare the DNS servers in the host /etc/resolv.conf vs the container /etc/resolv.conf.
If the host can’t resolve it either, you don’t have a Docker problem. You have a DNS problem that Docker is politely inheriting.
Second: is 127.0.0.53 involved?
If the container’s /etc/resolv.conf lists nameserver 127.0.0.53, that’s your smoking gun on most systemd-resolved setups. Inside the container network namespace, 127.0.0.53 refers to the container itself, not the host stub.
Third: is Docker’s embedded DNS (127.0.0.11) failing, or upstream?
On user-defined bridge networks, Docker provides an embedded DNS server at 127.0.0.11 that forwards queries upstream. If containers can’t resolve service names (container-to-container) and can’t resolve external names, Docker’s embedded DNS or iptables path might be broken. If service discovery works but external doesn’t, upstream resolvers are likely wrong.
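A quick way to separate those two cases, assuming a user-defined network named appnet and a neighbor container web01 (both are example names used throughout this article): resolve a container name and an external name from a throwaway container on the same network.
cr0x@server:~$ docker run --rm --network appnet alpine:3.19 sh -c "nslookup web01; nslookup example.com"
If web01 resolves but example.com doesn’t, suspect the upstream resolver configuration. If neither resolves, suspect the embedded DNS itself or the iptables path.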
Fourth: are you dealing with split DNS?
VPNs and corporate networks love split DNS: internal domains go to internal resolvers; everything else goes public. systemd-resolved supports this elegantly. Docker… less so. Expect that a “fixed” DNS might still fail for *.corp names unless you explicitly teach Docker about the internal resolvers.
Fifth: check for MTU / TCP fallback / EDNS weirdness
Intermittent timeouts, especially across VPN, can be DNS packets being fragmented or dropped. Force TCP as a test. If TCP works and UDP doesn’t, you’re in PMTU / firewall land.
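A minimal way to run that test, assuming 10.20.30.40 is one of your upstream resolvers (a placeholder IP reused throughout this article): issue the same query over UDP, then force TCP.
cr0x@server:~$ dig +time=2 +tries=1 @10.20.30.40 example.com A
cr0x@server:~$ dig +tcp +time=2 +tries=1 @10.20.30.40 example.com A
If the TCP query answers reliably while the UDP one times out, you’re debugging fragmentation or a firewall, not Docker.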
What actually breaks: Docker, resolv.conf, and systemd-resolved
Let’s name the moving parts:
- systemd-resolved is a local resolver manager. It can run a “stub listener” on 127.0.0.53, maintain per-link DNS servers, and support split DNS routing by domain.
- /etc/resolv.conf is the file libc resolvers read to find nameservers and search domains. It can be a real file or a symlink managed by systemd-resolved.
- Docker generates a container’s /etc/resolv.conf based on the host’s resolver config, with some behavior differences depending on network mode and whether you override DNS settings.
- Docker’s embedded DNS (127.0.0.11) exists on user-defined bridge networks. It provides container name resolution and forwards queries to upstream servers.
The classic failure mode goes like this:
- Your distro enables systemd-resolved. /etc/resolv.conf becomes a symlink to a stub config that points to nameserver 127.0.0.53.
- Docker sees that and writes the same nameserver into containers.
- Inside containers, 127.0.0.53 is not the host’s resolver. It’s the container’s loopback.
- Queries time out or fail with “connection refused.” Applications interpret that as “DNS is down.” They are correct.
There’s a second, more subtle failure mode: Docker uses 127.0.0.11 in containers, which forwards to upstream resolvers. But upstream gets populated from host config at Docker start time. Then your VPN comes up, systemd-resolved changes upstream per-link servers, and Docker keeps forwarding to the old ones. Everything seems fine until you need an internal domain. Then it’s “why does my laptop resolve it but the container can’t?”
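A quick check for that drift, assuming web01 is a long-running container created before the VPN came up: compare what systemd-resolved uses right now with what the container was handed at creation time.
cr0x@server:~$ resolvectl dns
cr0x@server:~$ docker exec web01 cat /etc/resolv.conf
If the per-link servers from resolvectl and the container’s nameservers have diverged, you’ve found your staleness.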
And yes, you can paper over it by hardcoding 8.8.8.8 in Docker. That’s the engineering equivalent of treating chest pain with an energy drink. It might make the dashboard green. It can also bypass internal DNS, violate policy, break split DNS, and make your production debugging miserable.
One paraphrased idea from Dan Kaminsky (DNS researcher) that aged well: DNS is deceptively simple until it fails, and then it fails in ways that look like everything else.
One joke, since we’re here: DNS is the one dependency that can fail and still convince you your app is the problem. It’s like being ghosted by an IP address.
Interesting facts and history (the stuff that explains the weirdness)
- Fact 1: systemd-resolved’s stub listener uses 127.0.0.53 by convention, not magic. It’s just loopback, and namespaces change what loopback means.
- Fact 2: On many Ubuntu releases, /etc/resolv.conf is a symlink to /run/systemd/resolve/stub-resolv.conf when systemd-resolved is enabled.
- Fact 3: systemd-resolved maintains two generated resolv.conf variants: a stub file (points to 127.0.0.53) and a “real upstream” file (lists the actual DNS servers), usually at /run/systemd/resolve/resolv.conf.
- Fact 4: Docker’s embedded DNS at 127.0.0.11 isn’t a general-purpose recursive resolver. It’s a forwarder plus container service discovery, and it depends on upstream DNS being correct.
- Fact 5: The ndots option in resolv.conf changes how often “search domains” are appended. A high ndots combined with a long search list can turn a single lookup into multiple DNS queries and apparent slowness (see the sketch after this list).
- Fact 6: glibc’s resolver has historically favored UDP and falls back to TCP. Some firewalls drop fragmented UDP responses, causing slow fallbacks and timeouts that look intermittent.
- Fact 7: Prior to Docker’s embedded DNS behavior becoming common in user-defined networks, container DNS behavior varied more and people frequently bound containers to the host network to “fix DNS,” usually creating new problems.
- Fact 8: Split DNS (domain-based routing to specific resolvers) is common in enterprise environments and is a first-class feature of systemd-resolved; Docker does not inherently replicate per-domain routing unless you configure it deliberately.
- Fact 9: A surprising number of “DNS outages” in container fleets are actually about caching behavior: the resolver in the container caches differently than the host, or the app caches bad results and never retries correctly.
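Here is Fact 5 as a minimal illustration, using the placeholder domains and resolver IP from elsewhere in this article. With a resolv.conf like this, a lookup of the short name api tries the search list first, because “api” contains fewer than 5 dots:
# getent hosts api  ->  api.corp.example.  ->  api.svc.cluster.local.  ->  api.
options ndots:5
search corp.example svc.cluster.local
nameserver 10.20.30.40
Three names, each potentially queried for A and AAAA, for what the application thinks is one lookup.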
Hands-on tasks: commands, expected output, and what to decide
You don’t fix DNS with vibes. You fix it with targeted observations and decisions. Here are practical tasks I actually run, with what the output means and what to do next.
Task 1: Check what the host thinks /etc/resolv.conf is
cr0x@server:~$ ls -l /etc/resolv.conf
lrwxrwxrwx 1 root root 39 Nov 18 09:02 /etc/resolv.conf -> ../run/systemd/resolve/stub-resolv.conf
Meaning: The host resolver file is a symlink to the stub. Expect 127.0.0.53 inside it.
Decision: You likely need Docker to use the upstream resolver file instead, or explicitly set DNS servers for Docker.
Task 2: Inspect the host stub resolv.conf
cr0x@server:~$ cat /run/systemd/resolve/stub-resolv.conf
# This file is managed by man:systemd-resolved(8). Do not edit.
nameserver 127.0.0.53
options edns0 trust-ad
search corp.example
Meaning: The host uses systemd’s local stub. Great on the host, toxic if copied into containers.
Decision: Don’t point containers at 127.0.0.53. Ensure they use real upstream DNS servers.
Task 3: Inspect the host “real” resolver list
cr0x@server:~$ cat /run/systemd/resolve/resolv.conf
# This file is managed by man:systemd-resolved(8). Do not edit.
nameserver 10.20.30.40
nameserver 10.20.30.41
search corp.example
Meaning: These are the upstream DNS servers systemd-resolved will talk to.
Decision: These are usually the right servers to feed Docker, either via Docker daemon config or by repointing /etc/resolv.conf away from the stub (carefully).
Task 4: Confirm systemd-resolved status and stub listener
cr0x@server:~$ systemctl status systemd-resolved --no-pager
● systemd-resolved.service - Network Name Resolution
Loaded: loaded (/lib/systemd/system/systemd-resolved.service; enabled; vendor preset: enabled)
Active: active (running) since Mon 2026-01-02 07:41:12 UTC; 2h 18min ago
...
DNS Stub Listener: yes
Meaning: Resolved is active and the stub listener is enabled.
Decision: If you disable the stub listener, you must ensure the system still has a usable resolv.conf. Disabling blindly is how you create a host-wide outage.
Task 5: See per-link DNS, including split DNS routing
cr0x@server:~$ resolvectl status
Global
LLMNR setting: yes
MulticastDNS setting: no
DNSOverTLS setting: no
DNSSEC setting: no
DNSSEC supported: no
Current DNS Server: 10.20.30.40
DNS Servers: 10.20.30.40 10.20.30.41
DNS Domain: corp.example
Link 2 (ens18)
Current Scopes: DNS
Protocols: +DefaultRoute +LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
Current DNS Server: 10.20.30.40
DNS Servers: 10.20.30.40 10.20.30.41
DNS Domain: corp.example
Meaning: Confirms which DNS servers and domains are active. If a VPN link exists, it may have its own domains and servers.
Decision: If you depend on split DNS, you can’t just “use 1.1.1.1” and call it done. You need to propagate internal resolvers into Docker.
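If you want to see the split routing decision itself, resolvectl can tell you which link an answer came from (git.corp.example is a placeholder internal name):
cr0x@server:~$ resolvectl query git.corp.example
The output names the link that supplied the answer. If that link is your VPN interface, containers pointed at a generic upstream will never get the same answer.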
Task 6: Check what Docker thinks the DNS is (daemon view)
cr0x@server:~$ docker info --format '{{json .}}' | jq '.DockerRootDir, .Name, .SecurityOptions'
"/var/lib/docker"
"server"
[
"name=apparmor",
"name=seccomp,profile=default"
]
Meaning: This doesn’t show DNS directly, but confirms you’re not in some exotic runtime path. We’re setting context.
Decision: Proceed to inspect a container’s resolv.conf and the daemon config.
Task 7: Inspect DNS config inside a running container
cr0x@server:~$ docker exec -it web01 cat /etc/resolv.conf
nameserver 127.0.0.53
options edns0 trust-ad
search corp.example
Meaning: This container is pointing at itself for DNS. It will fail unless something inside the container listens on 127.0.0.53:53 (it won’t).
Decision: Fix Docker’s DNS input (daemon config) or the host’s resolv.conf linkage so containers get real upstream servers.
Task 8: Quick functional test from inside a container
cr0x@server:~$ docker exec -it web01 getent ahosts example.com
getent: Name or service not known
Meaning: libc resolution fails. This is not “curl can’t reach the internet,” it’s name resolution failing.
Decision: Confirm nameserver reachability and whether Docker embedded DNS is in play.
Task 9: Determine whether Docker embedded DNS is used (127.0.0.11)
cr0x@server:~$ docker run --rm alpine:3.19 cat /etc/resolv.conf
nameserver 127.0.0.11
options ndots:0
Meaning: This container is on a user-defined network or Docker decided to use embedded DNS. Good: it avoids the 127.0.0.53 trap. Bad: it still needs correct upstream servers.
Decision: If external resolution fails with 127.0.0.11, investigate Docker daemon upstream DNS, iptables, or firewall.
Task 10: Test upstream DNS reachability from the host network namespace
cr0x@server:~$ dig +time=1 +tries=1 @10.20.30.40 example.com A
; <<>> DiG 9.18.24 <<>> +time=1 +tries=1 @10.20.30.40 example.com A
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 22031
;; ANSWER SECTION:
example.com. 300 IN A 93.184.216.34
Meaning: The upstream resolver is reachable and answering.
Decision: If containers still fail, the problem is likely container-to-host forwarding, Docker DNS forwarding, or NAT/iptables.
Task 11: Test reachability to the upstream resolver from inside a container
cr0x@server:~$ docker run --rm alpine:3.19 sh -c "apk add --no-cache bind-tools >/dev/null; dig +time=1 +tries=1 @10.20.30.40 example.com A"
; <<>> DiG 9.18.24 <<>> +time=1 +tries=1 @10.20.30.40 example.com A
;; connection timed out; no servers could be reached
Meaning: From the container’s network, that resolver isn’t reachable. Could be routing, firewall, or VPN-only accessibility from the host namespace.
Decision: If internal resolvers are only reachable via host routes not NATed to containers, you need to fix routing/NAT, use host networking for that service, or run a DNS forwarder that containers can reach.
Task 12: Inspect Docker network and container DNS details
cr0x@server:~$ docker inspect web01 --format '[{{json .HostConfig.Dns}}, {{json .NetworkSettings.Networks}}]' | jq .
[
null,
{
"appnet": {
"IPAMConfig": null,
"Links": null,
"Aliases": [
"web01",
"web"
],
"NetworkID": "b3c2f5cbd7c7f0d7d3b7d7aa3d2c51a9c7bd22b9f5a3db0a3d35a8a2c4d9a111",
"EndpointID": "c66d42e9a07b6d6a8e6d6d3fb5a5a0de27b8464a9e7d0a2c4e5b11aa3aa2beef",
"Gateway": "172.18.0.1",
"IPAddress": "172.18.0.10",
"IPPrefixLen": 16,
"IPv6Gateway": "",
"GlobalIPv6Address": "",
"GlobalIPv6PrefixLen": 0,
"MacAddress": "02:42:ac:12:00:0a",
"DriverOpts": null
}
}
]
Meaning: No explicit DNS set on the container; it inherited defaults. Network is a user-defined bridge appnet.
Decision: If defaults are wrong, fix at the daemon level or per-network/per-service via Compose. Prefer daemon-level when you want consistency.
Task 13: Check Docker daemon DNS configuration file
cr0x@server:~$ sudo cat /etc/docker/daemon.json
{
"log-driver": "journald"
}
Meaning: No DNS configuration is defined. Docker will take cues from the host resolver and whatever it captured at start.
Decision: Add explicit DNS servers (and optionally search domains/options) if your host uses systemd-resolved stub or your upstream changes frequently.
Task 14: Validate if containers can reach Docker’s embedded DNS port
cr0x@server:~$ docker run --rm alpine:3.19 sh -c "apk add --no-cache drill >/dev/null; drill @127.0.0.11 example.com | head"
;; ->>HEADER<<- opcode: QUERY, rcode: NOERROR, id: 59030
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;; example.com. IN A
Meaning: Embedded DNS is responding. If name resolution still fails in the app, you may have libc/options/search behavior issues, not raw DNS connectivity.
Decision: Inspect ndots, search domains, and application-level DNS caching.
Task 15: Check for search/ndots-induced lookup storms
cr0x@server:~$ docker exec -it web01 sh -c "cat /etc/resolv.conf; echo; getent hosts api"
nameserver 127.0.0.11
options ndots:5
search corp.example svc.cluster.local
getent: Name or service not known
Meaning: With ndots:5, the resolver treats short names as “relative” and tries search domains first. That can cause slow failures if those suffixes don’t resolve.
Decision: For non-Kubernetes Docker workloads, keep ndots modest (often 0–1) and prune search domains. Or teach applications to use FQDNs.
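If you can’t fix the image or daemon defaults right away, the same knobs exist per container; a sketch with illustrative values:
cr0x@server:~$ docker run --rm --dns-opt ndots:1 --dns-search corp.example alpine:3.19 cat /etc/resolv.conf
Compose exposes the equivalents as dns_opt and dns_search. Keep them consistent with whatever you eventually set at the daemon level.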
Task 16: Capture DNS traffic to prove where it dies
cr0x@server:~$ sudo tcpdump -ni any port 53 -c 10
tcpdump: data link type LINUX_SLL2
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
10:22:01.112233 vethabc123 Out IP 172.18.0.10.45122 > 10.20.30.40.53: 12345+ A? example.com. (28)
10:22:02.113244 vethabc123 Out IP 172.18.0.10.45122 > 10.20.30.40.53: 12345+ A? example.com. (28)
10:22:03.114255 vethabc123 Out IP 172.18.0.10.45122 > 10.20.30.40.53: 12345+ A? example.com. (28)
Meaning: Queries leave the container veth, but there are no replies. That’s not a libc problem. That’s path, firewall, or the upstream dropping you.
Decision: Check routing to the resolver from the Docker bridge, verify NAT/masquerade, confirm upstream accepts queries from that source.
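Two host-side checks that usually settle the NAT question, assuming a standard iptables-based Docker setup:
cr0x@server:~$ sudo sysctl net.ipv4.ip_forward
net.ipv4.ip_forward = 1
cr0x@server:~$ sudo iptables -t nat -S POSTROUTING | grep -i masquerade
You want forwarding enabled and a MASQUERADE rule covering the bridge subnet (172.18.0.0/16 in this article’s examples). If either is missing, container traffic leaves with a source address the upstream resolver will never answer.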
Fixes that stick (pick one strategy and commit)
There are multiple valid fixes. What’s invalid is mixing them at random until it “works on my laptop.” Choose a strategy that matches your environment: laptops with VPN churn, servers with static resolvers, or mixed corporate networks.
Strategy A (most common): Configure Docker daemon with explicit DNS servers
This is the blunt, effective approach: tell Docker which DNS servers to hand to containers (or to use as upstream for the embedded DNS). It’s stable across systemd-resolved stub behavior and doesn’t depend on /etc/resolv.conf symlink games.
Do this when: your environment has known resolver IPs (internal resolvers, or a local DNS forwarder) and you want predictable behavior.
cr0x@server:~$ sudo tee /etc/docker/daemon.json >/dev/null <<'EOF'
{
"dns": ["10.20.30.40", "10.20.30.41"],
"dns-search": ["corp.example"],
"dns-opts": ["timeout:2", "attempts:2"]
}
EOF
cr0x@server:~$ sudo systemctl restart docker
What to look for: New containers should show either those resolvers directly or 127.0.0.11 with correct forwarding. Existing containers may need a restart to pick up new settings.
Trade-offs: Great consistency. Not great if your DNS servers are dynamic (like DHCP-provided on laptops). For laptops, consider pointing Docker at a local forwarder that tracks systemd-resolved.
Strategy B: Point host /etc/resolv.conf at systemd-resolved “real” upstream file
This fix makes Docker inherit actual upstream servers by changing what /etc/resolv.conf points to. It’s effective and fast, but it changes host behavior too. If you don’t understand the consequences, don’t do it on production nodes without a rollback plan.
Do this when: you want Docker to inherit upstream DNS automatically and you’re OK with bypassing the stub listener for libc clients reading /etc/resolv.conf.
cr0x@server:~$ sudo ln -sf /run/systemd/resolve/resolv.conf /etc/resolv.conf
cr0x@server:~$ ls -l /etc/resolv.conf
lrwxrwxrwx 1 root root 32 Jan 2 10:31 /etc/resolv.conf -> /run/systemd/resolve/resolv.conf
Meaning: You’ve made /etc/resolv.conf list real upstream servers, not 127.0.0.53.
Decision: Restart Docker so it re-reads the host resolver config; then restart affected containers.
cr0x@server:~$ sudo systemctl restart docker
Trade-offs: Split DNS can become less elegant if you relied on systemd-resolved doing per-domain routing for local stub clients. Some apps talk to resolved via NSS modules; many just read resolv.conf. Know your stack.
Strategy C: Run a local DNS forwarder reachable by containers, and point Docker at it
If you have split DNS, VPN churn, or “the host can reach internal resolvers but containers can’t,” a local forwarder is the boring, correct glue. The forwarder listens on an address reachable from containers (often the Docker bridge gateway) and forwards to systemd-resolved or upstream resolvers.
Do this when: resolver IPs change often, or internal resolvers are only reachable from the host namespace.
One practical pattern: run dnsmasq or unbound on the host, listening on 172.18.0.1 (your Docker bridge gateway) and forwarding to systemd-resolved upstreams. Then configure Docker daemon "dns": ["172.18.0.1"].
Why it sticks: Docker gets a stable DNS IP. The forwarder can track systemd-resolved changes or be reloaded on link change events.
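A minimal dnsmasq sketch of that pattern, assuming the appnet bridge gateway 172.18.0.1 and the placeholder corporate zone from earlier; adjust addresses, zones, and paths to your environment:
# /etc/dnsmasq.d/docker-bridge.conf
listen-address=172.18.0.1                      # an address containers can reach
bind-interfaces                                # don't bind every interface and fight systemd-resolved
resolv-file=/run/systemd/resolve/resolv.conf   # follow resolved's real upstream servers
server=/corp.example/10.20.30.40               # optional split DNS: internal zone to internal resolver
cache-size=1000
Then set "dns": ["172.18.0.1"] in /etc/docker/daemon.json and restart Docker. One caveat: dnsmasq can only bind that address once the bridge exists, so order its unit after Docker or listen on a stable host address instead.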
Strategy D: Use Compose/service-level DNS overrides sparingly
Compose lets you set dns, dns_search, dns_opt per service. It’s useful for exceptions, but it’s not a fleet strategy. You will forget the special case six months later and rediscover it during an outage.
Use it when: one service needs a special resolver for a short period, or you’re migrating.
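A Compose fragment for that kind of exception (service name, image, and values are illustrative):
services:
  legacy-importer:
    image: registry.corp.example/legacy-importer:1.4
    dns:
      - 10.20.30.40
    dns_search:
      - corp.example
    dns_opt:
      - timeout:2
Put a comment in the file explaining why the override exists, so the special case doesn’t have to be rediscovered during an outage.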
Strategy E: Host network mode for DNS-sensitive workloads (last resort)
Yes, --network host makes a lot of DNS problems disappear because the container shares the host namespace. It also removes network isolation and increases blast radius. It’s acceptable for a debugging container. It’s a last resort for production services unless you’ve got strong reasons.
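Where host networking does earn its keep is a throwaway debugging shell, because the container then sees exactly what the host sees:
cr0x@server:~$ docker run --rm -it --network host alpine:3.19 sh -c "cat /etc/resolv.conf; nslookup example.com"
If that works while your real container fails, the problem is in Docker’s network path or generated resolv.conf, not in upstream DNS.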
Second joke (and last one): using host networking to fix DNS is like fixing a leaky faucet by removing the plumbing. Technically no more leaks.
Split DNS, VPNs, and corporate networks
Split DNS is where most “it works on the host but not in Docker” tickets are born. systemd-resolved can route queries based on domain suffix to specific links and servers. Docker’s model is simpler: containers get a resolv.conf with a list of nameservers, plus optional search domains. There’s no native “send corp.example to resolver A and everything else to resolver B” semantics inside resolv.conf beyond the order of servers and search list.
What happens in practice:
- Your host resolves git.corp.example because systemd-resolved knows that domain belongs to the VPN interface.
- Your container uses 10.20.30.40 because that was “global” at Docker start, but it doesn’t know about the VPN-only resolver that arrived later.
- You hardcode a public resolver into Docker, and now internal names never resolve at all.
When you’re in this world, the “right” answer is usually to run a DNS forwarder on the host that can do the split routing, and make containers query that forwarder. systemd-resolved can be that brain, but containers can’t safely query the stub at 127.0.0.53. So you either:
- Expose a resolved listener on a non-loopback address (not my favorite), or
- Use a forwarder that queries resolved via its upstream file or via D-Bus integration, or
- Push the same internal resolvers into Docker and accept you may need to restart Docker when VPN changes.
For laptops, “restart Docker when VPN connects” is not elegant, but it is honest. Automate it with a NetworkManager dispatcher script if you must. For servers, prefer stable resolvers or a local forwarder.
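If you do automate it, a dispatcher sketch looks like this (NetworkManager passes the interface as $1 and the event as $2; restarting Docker restarts containers unless live-restore is enabled, which is why this stays a laptop pattern):
#!/bin/sh
# /etc/NetworkManager/dispatcher.d/90-docker-dns
case "$2" in
  vpn-up|vpn-down)
    systemctl restart docker
    ;;
esac
Make the script executable and owned by root, or NetworkManager will refuse to run it.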
Three corporate-world mini-stories
1) Incident caused by a wrong assumption: “127.0.0.53 is always the host resolver”
A mid-sized SaaS company had a fleet of Ubuntu hosts running Docker. The platform team standardized on systemd-resolved because it handled VPN clients cleanly and kept resolver state sane. Someone noticed /etc/resolv.conf pointed at 127.0.0.53 and concluded, confidently, that all local processes—including containers—should just use it.
They built a base image with a health check: run getent hosts api.corp.example. It passed in their staging environment intermittently, which they interpreted as “DNS is flaky.” They bumped timeouts. They added retries. They blamed the DNS team. Classic.
Production incident arrived on a Monday morning: several services failed to connect to internal APIs. External DNS worked sometimes (thanks to caching and some services using direct IPs). Internal names consistently failed. The first responders restarted containers and saw some recover, which was worse than failing consistently: it suggested the issue was “in the app.”
The fix took 20 minutes once someone actually looked at a container’s /etc/resolv.conf and realized it contained 127.0.0.53. In the container namespace, that’s the container’s loopback. There was nothing listening on port 53. They updated Docker daemon DNS settings to use the upstream resolvers and rolled restarts. The postmortem theme was painful but useful: assumptions about loopback addresses do not survive namespaces.
2) Optimization that backfired: “We’ll reduce DNS latency by hardcoding public resolvers”
A large enterprise team ran developer workstations with Docker Desktop-like setups on Linux. Their internal DNS servers were occasionally slow during peak hours. A well-meaning engineer proposed an “optimization”: set Docker daemon DNS to public resolvers to improve lookup performance for external dependencies (package repos, container registries).
It looked like a win for a week. Builds were a little faster. People stopped complaining about slow apt resolves. Then a new internal service launched with a name that only existed in internal DNS. Containers started failing to resolve it while the host could resolve it fine. The failures were confusing: developers could curl the service from the host, but their containers couldn’t.
Debugging turned into finger-pointing between app teams and infra. Eventually someone noticed the Docker daemon had been pinned to public DNS, bypassing split DNS entirely. The “optimization” had quietly removed access to internal name resolution for all containers, and in some environments it also violated policy around DNS logging and data exfiltration controls.
The rollback was straightforward: revert Docker DNS to internal resolvers, and add a local caching forwarder to reduce latency without bypassing corporate controls. The lesson was practical: optimizing DNS by picking a “fast” resolver is a trap when your organization uses DNS as a routing and policy mechanism.
3) Boring but correct practice that saved the day: “We standardize DNS tests in CI and on hosts”
A payments company had been burned by intermittent DNS failures during a prior migration. They responded with a boring playbook: every node had a small diagnostics container image available locally, and every deployment pipeline ran a DNS sanity suite before and after rollout.
The suite wasn’t fancy. It checked that containers could resolve a public name, resolve an internal name, and resolve a service name on the Docker network. It also checked for 127.0.0.53 inside container resolv.conf, because they’d seen that movie before. The pipeline failed fast if any of those checks failed.
Six months later, a distro update changed how /etc/resolv.conf was managed on a new base image for nodes. On half the new nodes, Docker started handing 127.0.0.53 into containers. The tests caught it before services rolled out widely. The incident was a non-event: a small batch of nodes never entered service until fixed.
That’s not glamorous engineering. It is the kind that lets you sleep. Standardized diagnostic checks don’t prevent every outage, but they stop the dumb ones from repeating.
Common mistakes: symptom → root cause → fix
1) “Containers can’t resolve anything; host resolves fine”
Symptom: getent hosts example.com fails in container; works on host.
Root cause: Container /etc/resolv.conf contains nameserver 127.0.0.53 inherited from systemd-resolved stub.
Fix: Configure Docker daemon DNS to real upstream servers, or repoint host /etc/resolv.conf to /run/systemd/resolve/resolv.conf and restart Docker.
2) “External DNS works, internal domains fail (especially after VPN connects)”
Symptom: example.com resolves; git.corp.example fails in containers.
Root cause: Docker captured DNS servers before VPN link added split DNS; containers don’t see VPN-provided resolvers.
Fix: Use a local DNS forwarder and point Docker to it, or restart Docker on VPN state changes (and accept the disruption), or explicitly set internal DNS servers in Docker.
3) “Intermittent DNS timeouts; retries help”
Symptom: Some queries time out, then succeed; logs show bursts of DNS errors.
Root cause: UDP fragmentation/MTU issues across VPN; EDNS0 responses dropped; firewall blocks large UDP DNS.
Fix: Test with TCP-based queries; adjust MTU; consider disabling EDNS0 for specific resolver paths; or run a local forwarder that handles TCP upstream.
4) “Service discovery inside the Docker network fails”
Symptom: Container A cannot resolve container B by name on a user-defined network.
Root cause: The embedded DNS path is broken, the container is on the default bridge (which lacks embedded DNS name resolution), or a per-container --dns override conflicts with the service discovery behavior the setup expects.
Fix: Use a user-defined bridge network; avoid overriding DNS per-container unless necessary; verify Docker network driver health; inspect iptables rules.
5) “Lookups are slow; CPU spikes in app during DNS calls”
Symptom: Latency increases; app threads block on DNS; high query volume.
Root cause: Overgrown search domain list + high ndots causing multiple queries per lookup; app uses short names.
Fix: Reduce search domains; set ndots appropriately; use FQDNs in config; consider local caching resolver.
6) “We set Docker DNS and now some apps still use old resolvers”
Symptom: After changing daemon.json, some containers still have old resolv.conf.
Root cause: Existing containers retain their generated resolv.conf until recreated/restarted (depending on runtime behavior and bind-mounts).
Fix: Restart/recreate containers; ensure you didn’t bind-mount /etc/resolv.conf unintentionally; verify with docker exec cat /etc/resolv.conf.
Checklists / step-by-step plan
Checklist 1: Verify the failure mode in 5 minutes
- From host: resolve the failing name with getent hosts and dig to confirm host health.
- From container: cat /etc/resolv.conf and look for 127.0.0.53 or suspicious search/ndots settings.
- From container: query a known resolver directly with dig @IP to separate “DNS path” from “DNS config.”
- From host: run resolvectl status to find the actual upstream servers and split DNS domains.
- Decide: do you need static DNS (servers) or dynamic/split behavior (forwarder)?
Checklist 2: Apply a stable fix on servers (recommended path)
- Choose DNS servers that are reachable from container networks (often internal resolvers).
- Set Docker daemon dns and dns-search in /etc/docker/daemon.json.
- Restart Docker during a maintenance window.
- Restart/recreate containers so they pick up changes.
- Run a small DNS test container in CI/CD and on-node as a readiness gate.
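A minimal on-node gate in that spirit, assuming the appnet network and the placeholder internal name used in this article:
#!/bin/sh
# dns-gate.sh -- fail fast if container DNS on this node is broken
set -e
docker run --rm --network appnet alpine:3.19 sh -c '
  grep -q 127.0.0.53 /etc/resolv.conf && { echo "FAIL: stub resolver leaked into container"; exit 1; }
  nslookup example.com >/dev/null 2>&1      || { echo "FAIL: external resolution"; exit 1; }
  nslookup git.corp.example >/dev/null 2>&1 || { echo "FAIL: internal resolution"; exit 1; }
  echo "DNS gate OK"
'
Wire it into the deployment pipeline as a readiness check and into node provisioning as a smoke test; it catches the 127.0.0.53 regression class before it ships.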
Checklist 3: Apply a stable fix on laptops with VPN churn
- Stop trying to keep Docker perfectly aligned with systemd-resolved’s per-link state manually.
- Run a local DNS forwarder bound to an address containers can reach (bridge gateway or host IP).
- Point Docker daemon DNS to that forwarder.
- Optionally reload the forwarder on VPN connect/disconnect events.
- Keep a single diagnostic container image handy and test internal + external resolution after network changes.
Checklist 4: Change management that prevents repeat incidents
- Codify Docker DNS configuration (daemon.json) in config management.
- Monitor for unexpected changes to the /etc/resolv.conf symlink target.
- Document whether you rely on split DNS and which internal domains matter.
- Add a small canary test that resolves at least one internal and one external name from a container on each node.
- When updating base images, validate systemd-resolved behavior before rolling widely.
FAQ
Why does 127.0.0.53 work on the host but not in containers?
Because containers run in their own network namespace. Loopback is per-namespace. 127.0.0.53 inside the container points to the container itself, not the host’s systemd-resolved stub.
Why do some containers show 127.0.0.11 instead?
That’s Docker’s embedded DNS server, commonly used on user-defined bridge networks. It provides container name resolution and forwards external queries upstream.
If Docker uses embedded DNS, why do I care about systemd-resolved?
Because embedded DNS is a forwarder. It still needs upstream resolvers. If Docker captured bad upstream settings (like the stub address) or stale ones (pre-VPN), you still lose.
Should I disable systemd-resolved to fix Docker DNS?
Usually no. systemd-resolved is fine; the problem is exporting its stub config into containers. Prefer configuring Docker DNS explicitly or using the upstream resolv.conf file. Disabling resolved can create new issues, especially with split DNS and modern network managers.
Is repointing /etc/resolv.conf safe?
It can be, but it changes host behavior. On some systems, tools expect the stub configuration. If you do it, validate host resolution behavior and make it a managed change, not an ad-hoc tweak at 2 a.m.
Do I need to restart Docker after changing DNS settings?
Yes, for daemon-level changes. And you typically need to restart or recreate containers for them to pick up new resolv.conf content.
Why does DNS break only after connecting to VPN?
Because VPN often injects DNS servers and search domains dynamically. systemd-resolved updates its per-link configuration, but Docker doesn’t automatically reconfigure running containers or always refresh upstream settings unless restarted or explicitly guided.
Can I just set Docker DNS to a public resolver and move on?
In corporate environments, that commonly breaks internal name resolution and may violate policy. Even outside corporate networks, it can hide the real issue (like MTU problems) and create new failure modes.
How do I tell if I’m dealing with MTU/fragmentation DNS issues?
Symptoms are intermittent timeouts, especially for records with larger responses. Use dig and compare UDP vs TCP behavior. If TCP succeeds consistently while UDP times out, suspect MTU/fragmentation or firewall drops.
What’s the most robust “set it and forget it” approach?
On servers: configure Docker daemon DNS to known internal resolvers (or a local forwarder) and keep it managed. On laptops with VPN churn: a local forwarder + Docker pointing to it usually wins.
Conclusion: next steps you can execute today
Docker DNS “mysteries” are often just namespacing meets systemd-resolved. The fix is to stop letting containers inherit a loopback-only stub resolver address, and to pick a stable source of truth for upstream DNS.
- Run the fast diagnosis playbook and confirm whether 127.0.0.53 is leaking into containers.
- Pick a strategy: daemon-level DNS configuration (most common), repointing /etc/resolv.conf to the upstream file (carefully), or a local forwarder for split DNS and VPN churn.
- Make it stick: manage the config, restart Docker deliberately, and add a container-based DNS test as a gate so this class of outage stops repeating.
If you do only one thing: inspect /etc/resolv.conf inside a failing container before you restart anything. It’s amazing how often that one file explains the whole incident.