Ubuntu 24.04: DNS caches lie — flush the right cache (and stop flushing the wrong one) (case #86)

DNS failures in production rarely look like “DNS is down.” They look like you staring at dig returning the right IP while your app stubbornly connects to the old one. Or worse, half your fleet works and the other half acts like it’s still Tuesday.

On Ubuntu 24.04, flushing “the DNS cache” is a trap because there isn’t one cache. There are layers. Some are local, some are per-process, some are in your browser, some are in your container runtime, and some are upstream resolvers quietly doing “helpful” things like negative caching. If you flush the wrong layer, you don’t fix anything—you just burn time and confidence.

The real model: where DNS answers get cached on Ubuntu 24.04

DNS on a modern Ubuntu host is a relay race, and each runner may keep its own notebook. When you query “what is api.example.internal today?”, the answer may come from:

1) Your application (yes, your app)

Many runtimes cache DNS aggressively or weirdly:

  • Java: DNS caching depends on security properties; it can cache forever in some setups.
  • Go: can use the system resolver or a pure-Go resolver depending on build flags and environment; behavior differs.
  • Python: libraries may cache, HTTP clients may pool connections, and “DNS problems” are sometimes stale keep-alive sockets.
  • glibc: traditionally does not provide a full DNS cache, but its behavior is shaped by resolv.conf options such as attempts and timeout, and by whatever NSS modules are configured.

If your app caches DNS, flushing systemd-resolved won’t help. Restarting the app might.
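
If you suspect the runtime layer, check its knobs before touching the OS. A quick sketch, not your deployment: myapp.jar and myapp are placeholders and the TTL values are illustrative. For the JVM, positive and negative DNS caching are governed by the networkaddress.cache.ttl and networkaddress.cache.negative.ttl security properties (set in $JAVA_HOME/conf/security/java.security); the legacy sun.net.inetaddr.ttl system property also caps positive caching. For Go binaries, GODEBUG=netdns selects the resolver and can print which one was chosen:

cr0x@server:~$ java -Dsun.net.inetaddr.ttl=30 -jar myapp.jar
cr0x@server:~$ GODEBUG=netdns=go+1 ./myapp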

2) NSS (Name Service Switch) and its order of operations

The file /etc/nsswitch.conf decides whether the host checks /etc/hosts before DNS, whether it uses mDNS, and more. A line like this:

cr0x@server:~$ grep -E '^hosts:' /etc/nsswitch.conf
hosts:          files mdns4_minimal [NOTFOUND=return] dns mymachines

…means /etc/hosts can override DNS, and mDNS can short-circuit lookups. When someone says “DNS is lying,” it’s sometimes not DNS—it’s NSS obeying your local files.

3) systemd-resolved (the usual suspect on Ubuntu 24.04)

Ubuntu 24.04 commonly runs systemd-resolved and points /etc/resolv.conf to a stub resolver (often 127.0.0.53). That stub caches answers, does DNSSEC validation depending on config, can do DNS over TLS, and can route queries per-link (per interface) based on network settings.
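
If you need to pin or inspect that behavior, resolved’s knobs live in resolved.conf or a drop-in under /etc/systemd/resolved.conf.d/. A minimal sketch, assuming the corporate resolvers and routing domain used throughout this case; the file name and values are illustrative:

cr0x@server:~$ cat /etc/systemd/resolved.conf.d/corp.conf
[Resolve]
DNS=10.20.0.53 10.20.0.54
Domains=corp.internal
DNSSEC=no
DNSOverTLS=no
Cache=yes
cr0x@server:~$ sudo systemctl restart systemd-resolved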

4) A local caching resolver you installed on purpose

Some environments deploy dnsmasq, unbound, or bind9 on hosts or nodes. That’s a separate cache. Flushing systemd-resolved won’t flush unbound. Shocking, I know.
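
If one of those is installed, flush it with its own tooling, not resolvectl. Hedged examples: flush_zone assumes unbound-control is configured, SIGHUP makes dnsmasq drop its cache and re-read /etc/hosts, and rndc flush is the bind9 equivalent:

cr0x@server:~$ sudo unbound-control flush_zone corp.internal
cr0x@server:~$ sudo pkill -HUP dnsmasq
cr0x@server:~$ sudo rndc flush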

5) Upstream resolvers: corporate DNS, VPC resolvers, kube-dns/CoreDNS, ISP resolvers

Even if your host is clean, upstream resolvers cache too—often with their own minimum TTLs, prefetch behavior, or negative caching settings. If your upstream resolver is serving stale data, your host can’t “flush” it. You can only bypass it or force it to refresh (and you might not have permission).
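
What you can do from a client is prove where the staleness lives: compare the recursive resolver’s answer with what the zone’s authoritative servers publish. A sketch using this case’s addresses; ns1.corp.internal is a placeholder for whatever the NS query actually returns:

cr0x@server:~$ dig +short NS corp.internal @10.20.0.53
cr0x@server:~$ dig +short api.corp.internal @ns1.corp.internal
cr0x@server:~$ dig +short api.corp.internal @10.20.0.53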

6) Network middleboxes and “helpful” security products

Some corporate networks intercept DNS. Some rewrite answers. Some block specific domains. If your packets never reach the resolver you think you’re using, the cache you flush is irrelevant.

One paraphrased idea, because it’s still the best operational advice: you can’t fix what you can’t observe (attributed to Gene Kim’s reliability/DevOps themes).

Operational rule: Before flushing anything, prove which layer answered the question and which resolver was actually contacted. Otherwise you’re “fixing” DNS the way people “fix” printers: by bullying them.

Joke #1: DNS is the only system where “it’s cached” is both an explanation and a confession.

Interesting facts and historical context (the stuff that makes today’s mess make sense)

  1. DNS negative caching has been around for decades. NXDOMAIN answers can be cached using the SOA “minimum” value, so “it didn’t exist five minutes ago” can stick around.
  2. Ubuntu’s resolver story has changed multiple times. Between resolvconf, systemd-resolved, NetworkManager, and netplan, the “right place” to configure DNS is era-dependent.
  3. /etc/resolv.conf is often a symlink now. On systemd systems it may point to a stub file managed by resolved; editing it directly can be undone on reboot or link changes.
  4. glibc historically avoided a full DNS cache. That decision pushed caching to dedicated daemons (like nscd) and later to systemd-resolved and applications.
  5. Browsers became their own resolvers. Modern browsers do aggressive caching, prefetching, and sometimes DNS-over-HTTPS; “works in dig” doesn’t guarantee “works in Chrome.”
  6. TTL is not a promise; it’s a hint. Some resolvers cap TTLs (minimum/maximum), and some clients extend caching due to internal policies.
  7. Split-horizon DNS is a feature, not a bug. The same name can legitimately resolve differently depending on source network; the “wrong” answer might be the right answer for a different interface.
  8. IPv6 can “win” even when you don’t want it. Many clients try AAAA first; if your IPv6 path is broken, it looks like DNS flakiness.
  9. systemd-resolved can route per-link DNS. VPN up? Now half your names go to the VPN resolver and half go to the LAN resolver, depending on routing domains.

Fast diagnosis playbook (check these first, in order)

This is the shortest path to the truth when “DNS is wrong” on Ubuntu 24.04. Don’t freestyle. Follow the chain.

Step 1: Identify what the application is actually doing

  • Is it using the system resolver (glibc/NSS) or an internal resolver (browser, JVM, Go netgo)?
  • Is the “DNS problem” really a stale TCP connection pool?
  • Does it run in a container with its own /etc/resolv.conf?

Step 2: Prove what the host resolver stack is

  • Is /etc/resolv.conf a stub to 127.0.0.53?
  • Is systemd-resolved active?
  • Is there dnsmasq, unbound, nscd involved?

Step 3: Compare answers across paths

  • Compare getent hosts (NSS path) vs dig @server (direct resolver) vs app behavior.
  • Check both A and AAAA, and confirm which is used (commands just below).
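
In practice that comparison is three commands; getent ahosts walks the same NSS path applications use and returns both address families, while dig talks to DNS directly:

cr0x@server:~$ getent ahosts api.corp.internal
cr0x@server:~$ dig +short A api.corp.internal
cr0x@server:~$ dig +short AAAA api.corp.internal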

Step 4: If it’s caching, find which cache

  • Flush systemd-resolved only if it’s the caching layer in play.
  • If upstream resolver is stale, bypass or query authoritative/alternative resolvers to confirm.
  • If the app caches, restart or reconfigure the app—don’t keep whacking the OS.

Step 5: Validate on the wire

  • Use tcpdump on port 53 (or 853 for DoT) to see if queries leave the box.
  • If no packets leave, you’re not dealing with “DNS server” issues; you’re dealing with local resolution.

Practical tasks: commands, expected output, and decisions (12+)

These are designed to be run on Ubuntu 24.04 servers and desktops. Each task includes (1) a command, (2) what the output means, and (3) what decision you make next.

Task 1: See what /etc/resolv.conf really is

cr0x@server:~$ ls -l /etc/resolv.conf
lrwxrwxrwx 1 root root 39 Jun 12 09:41 /etc/resolv.conf -> ../run/systemd/resolve/stub-resolv.conf

Meaning: You’re using the systemd-resolved stub. Most apps will query 127.0.0.53, not your upstream DNS servers directly.

Decision: Stop editing /etc/resolv.conf by hand. Inspect resolved configuration and per-link DNS instead.

Task 2: Confirm systemd-resolved is active

cr0x@server:~$ systemctl status systemd-resolved --no-pager
● systemd-resolved.service - Network Name Resolution
     Loaded: loaded (/usr/lib/systemd/system/systemd-resolved.service; enabled; preset: enabled)
     Active: active (running) since Fri 2025-12-26 10:12:01 UTC; 2h 41min ago
       Docs: man:systemd-resolved.service(8)

Meaning: There is a local caching stub in play.

Decision: Use resolvectl as your primary diagnostic tool; flushing resolved may be relevant.

Task 3: Find the actual upstream DNS servers resolved is using

cr0x@server:~$ resolvectl status
Global
       Protocols: -LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
resolv.conf mode: stub
Current DNS Server: 10.20.0.53
       DNS Servers: 10.20.0.53 10.20.0.54
        DNS Domain: corp.internal

Link 2 (ens160)
    Current Scopes: DNS
         Protocols: +DefaultRoute
Current DNS Server: 10.20.0.53
       DNS Servers: 10.20.0.53 10.20.0.54
        DNS Domain: corp.internal

Meaning: The upstream servers are 10.20.0.53/54, and there’s a routing domain (corp.internal).

Decision: If answers are stale, you must determine whether the staleness is in resolved’s cache or in 10.20.0.53.

Task 4: Compare the NSS path vs direct DNS query

cr0x@server:~$ getent hosts api.corp.internal
10.50.12.19     api.corp.internal

cr0x@server:~$ dig +short api.corp.internal
10.50.12.19

Meaning: At least right now, NSS and the stub resolver agree.

Decision: If the application still hits an old IP, suspect app-level caching or connection reuse.

Task 5: Bypass the stub and query the upstream server directly

cr0x@server:~$ dig @10.20.0.53 api.corp.internal +noall +answer +ttlid
api.corp.internal.  30  IN  A  10.50.12.19

Meaning: Upstream resolver returns 10.50.12.19 with TTL 30 seconds.

Decision: If your host returns something else, the problem is local caching/routing. If upstream is wrong, flushing locally is theater.

Task 6: Check whether /etc/hosts is overriding DNS

cr0x@server:~$ grep -n 'api.corp.internal' /etc/hosts || true
12:10.10.10.10 api.corp.internal

Meaning: You have a hard-coded override. NSS will likely return this before DNS.

Decision: Remove or correct the entry, then retest with getent hosts. Flushing caches won’t fix a file.

Task 7: Inspect NSS order for host lookups

cr0x@server:~$ grep -E '^hosts:' /etc/nsswitch.conf
hosts:          files mdns4_minimal [NOTFOUND=return] dns mymachines

Meaning: files (aka /etc/hosts) comes first. mDNS may intercept some names.

Decision: If you’re chasing a name that could be handled by mDNS (.local), test with and without it. If corporate policy forbids mDNS surprises, adjust the order.
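
If you do take mDNS out of the lookup path on servers, the change is one line in /etc/nsswitch.conf; a hedged example of a stripped-down hosts: entry (check it against your org’s policy before rolling it out):

hosts:          files dns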

Task 8: Flush systemd-resolved cache (only when it’s the right layer)

cr0x@server:~$ sudo resolvectl flush-caches

Meaning: Resolved drops cached positive and negative responses.

Decision: Immediately re-run resolvectl query name or getent hosts. If answers don’t change, the “lie” is not in resolved’s cache.

Task 9: Show cache statistics (is the cache even being used?)

cr0x@server:~$ resolvectl statistics
DNSSEC Verdicts:                 Secure=0 Insecure=0 Bogus=0 Indeterminate=0
Cache:                           Current Cache Size=42 Max Cache Size=4096
Cache Hits:                      118
Cache Misses:                    67
Cache Evictions:                 0
DNS Transactions:                79

Meaning: Resolved is caching and serving hits. If cache hits are high and wrong answers persist, flushing may help—unless the wrong answer is being refetched from upstream.

Decision: If cache hits are near zero, your issue likely isn’t resolved caching; look for app caching or a different resolver.

Task 10: Query through resolved with verbose details

cr0x@server:~$ resolvectl query api.corp.internal
api.corp.internal: 10.50.12.19                       -- link: ens160

-- Information acquired via protocol DNS in 28.4ms.
-- Data is authenticated: no

Meaning: It shows which link/interface was used. That’s critical when VPNs or multiple NICs exist.

Decision: If the query goes out the “wrong” link, fix per-link DNS settings (netplan/NetworkManager/systemd-networkd), not caches.
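
On servers managed by netplan and systemd-networkd, the per-link fix usually looks like this. A minimal sketch: the file name is arbitrary, and the addresses and search domain are the ones used throughout this case:

cr0x@server:~$ cat /etc/netplan/60-corp-dns.yaml
network:
  version: 2
  ethernets:
    ens160:
      nameservers:
        addresses: [10.20.0.53, 10.20.0.54]
        search: [corp.internal]
cr0x@server:~$ sudo netplan apply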

Task 11: Catch DNS on the wire (prove whether queries leave the host)

cr0x@server:~$ sudo tcpdump -ni any '(udp port 53 or tcp port 53)'
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
10:41:15.219233 ens160 Out IP 10.20.1.44.53344 > 10.20.0.53.53: 12345+ A? api.corp.internal. (35)
10:41:15.245090 ens160 In  IP 10.20.0.53.53 > 10.20.1.44.53344: 12345 1/0/0 A 10.50.12.19 (51)

Meaning: Queries are leaving to 10.20.0.53 and replies come back. If your app still sees a different IP, the issue is above the OS resolver path (app cache or connection reuse) or a split-horizon resolution difference in the app environment.

Decision: If you see no outbound queries while running lookups, you’re hitting a local cache or an entirely different mechanism (like DoH in a browser).

Task 12: Check if DoT (port 853) is used

cr0x@server:~$ sudo tcpdump -ni any 'tcp port 853'
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
10:43:50.102991 ens160 Out IP 10.20.1.44.42132 > 10.20.0.53.853: Flags [S], seq 205383221, win 64240, options [mss 1460,sackOK,TS val 151512 ecr 0,nop,wscale 7], length 0

Meaning: You’re using DNS-over-TLS. Some middleboxes and tools assume UDP/53 only; they’ll mislead you.

Decision: When diagnosing “DNS blocked,” check firewall rules for 853 and confirm resolver support.

Task 13: Validate what a container sees (Docker example)

cr0x@server:~$ sudo docker run --rm alpine:3.20 cat /etc/resolv.conf
nameserver 127.0.0.11
options ndots:0

Meaning: Containers may use Docker’s embedded DNS (127.0.0.11), which is its own caching/forwarding layer.

Decision: If only containers are broken, stop flushing host caches and inspect container DNS configuration and the engine.
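
If the engine’s forwarding is the problem, point Docker’s embedded DNS at the right upstreams instead of poking the host stub. A hedged example of /etc/docker/daemon.json with illustrative addresses; the daemon needs a restart to pick it up:

cr0x@server:~$ cat /etc/docker/daemon.json
{
  "dns": ["10.20.0.53", "10.20.0.54"],
  "dns-search": ["corp.internal"]
}
cr0x@server:~$ sudo systemctl restart docker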

Task 14: See what IPs your app is actually connecting to

cr0x@server:~$ sudo ss -tnp | grep ':443' | head
ESTAB 0 0 10.20.1.44:50412 10.10.10.10:443 users:(("myapp",pid=23144,fd=27))

Meaning: The app is connected to 10.10.10.10. If DNS says 10.50.12.19, you may have stale connection pooling, an old resolved result inside the process, or an /etc/hosts override.

Decision: Restart the app or rotate its connection pool; don’t keep blaming DNS until you’ve proven a fresh lookup happens.

Task 15: Check for nscd (another caching layer)

cr0x@server:~$ systemctl is-active nscd || true
inactive

Meaning: nscd is not active. Good: one less cache to argue with.

Decision: If it’s active in your environment, learn how it caches and flush it intentionally; otherwise you will chase ghosts.
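
For the record, the intentional flush for nscd is per-database rather than a blanket restart; assuming the hosts cache is the one in question:

cr0x@server:~$ sudo nscd -i hosts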

Task 16: Identify who owns port 53 locally (conflicts)

cr0x@server:~$ sudo ss -lunp | grep ':53 ' || true
UNCONN 0 0 127.0.0.53%lo:53 0.0.0.0:* users:(("systemd-resolve",pid=862,fd=13))

Meaning: The stub resolver is listening. If you expected dnsmasq on 127.0.0.1:53, that’s a mismatch.

Decision: Don’t run multiple local DNS daemons without a plan; decide which process owns the stub and how clients should reach it.

Task 17: Check for negative caching (NXDOMAIN) that might be stuck

cr0x@server:~$ resolvectl query doesnotexist.corp.internal
doesnotexist.corp.internal: resolve call failed: 'does not exist'

Meaning: NXDOMAIN is in play. If that name was just created, you may be fighting negative caching upstream or locally.

Decision: Flush local caches, then query upstream directly. If upstream still returns NXDOMAIN, the DNS change hasn’t propagated there.

Flushing the right cache (and why the usual advice is wrong)

The classic bad advice goes like this: “On Linux, run sudo systemctl restart networking” or “restart NetworkManager” or “edit /etc/resolv.conf and put 8.8.8.8.” That advice belongs in a museum next to dial-up modems.

What “flush DNS” can mean on Ubuntu 24.04

  • Flush systemd-resolved: sudo resolvectl flush-caches
  • Restart resolved (heavier): sudo systemctl restart systemd-resolved
  • Flush browser DNS cache: not an OS command; it’s a browser setting/action
  • Flush Docker embedded DNS: often requires restarting containers or the daemon; it’s not the host’s stub
  • Flush upstream resolver: requires access to that resolver; you can’t do it from a client box

How to choose the right flush

Use this decision tree (a small comparison helper follows it):

  1. If getent hosts name is wrong but dig @upstream name is right: flush systemd-resolved (and check per-link routing domains).
  2. If getent is right but your app is wrong: it’s probably app cache, stale connections, or a different resolver path (DoH, netgo, JVM).
  3. If dig @upstream is wrong: flushing locally won’t help. Escalate to DNS owners or bypass with a different resolver for validation.
  4. If containers are wrong but host is right: you’re in container DNS land (127.0.0.11, kube-dns, CoreDNS, node-local-dns).
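
A small comparison helper that automates the first check. It’s a sketch, not a product: you pass the name and the upstream resolver, and it only reports what each path returns; the decision is still yours:

cr0x@server:~$ cat ~/bin/dns-compare
#!/usr/bin/env bash
# dns-compare NAME UPSTREAM: show one name through NSS, the local stub, and an upstream resolver
set -u
name="$1"; upstream="$2"
echo "== NSS path (what apps see) =="
getent ahosts "$name" || echo "(no answer via NSS)"
echo "== Local stub (default resolver path) =="
dig +short "$name"
echo "== Upstream ${upstream} directly =="
dig +short @"${upstream}" "$name"
cr0x@server:~$ bash ~/bin/dns-compare api.corp.internal 10.20.0.53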

The only “flush” you should do by default

None. Default flushing is busywork. Your default should be to prove the path, then flush the layer that’s actually caching.

Joke #2: Restarting NetworkManager to fix DNS is like rebooting the office to fix a typo.

Three corporate mini-stories from the DNS trenches

Mini-story 1: The incident caused by a wrong assumption (“flushed DNS, still broken”)

A mid-sized SaaS company rolled out a new internal API endpoint behind a load balancer. The change was clean: update DNS from an old VIP to the new one, TTL set to 30 seconds, and a slow migration window. The on-call ran dig and saw the new IP immediately. Victory lap energy.

But the application servers kept calling the old VIP for another hour. Some requests succeeded (because the old VIP still served something), others hit a dead backend pool and returned timeouts. The incident channel filled with “DNS is cached somewhere.” Someone flushed systemd-resolved across a subset of servers. No change.

The wrong assumption: that the application was using the OS resolver on every request. It wasn’t. The app was a JVM service with a DNS cache configuration inherited from an ancient base image. It cached positive results longer than anyone remembered, and the service used a persistent connection pool that didn’t re-resolve anyway.

The fix wasn’t “flush DNS.” The fix was: lower JVM DNS cache TTL to something sane for that service, add a deploy-time “connection pool recycle” during endpoint changes, and improve runbooks so “dig says X” doesn’t end the investigation.

They also learned a painful side lesson: DNS TTL doesn’t force clients to behave. It politely suggests. Production systems are not known for their manners.

Mini-story 2: The optimization that backfired (local cache everywhere)

A large enterprise with a chatty microservice mesh decided to reduce load on central resolvers. The idea was rational: install a local caching resolver on every VM. They chose a lightweight forwarder with aggressive caching defaults. Latency improved on synthetic tests. Resolver graphs looked calmer. Everyone went home on time for two weeks.

Then they had a security incident response where they needed to reroute a set of service names quickly away from a compromised network segment. DNS changes were made centrally. Some nodes updated quickly, others didn’t. The “cache everywhere” plan had turned into “inconsistency everywhere.”

What bit them wasn’t just positive caching. Negative caching and serve-stale behavior (configured to hide upstream flaps) meant some hosts continued returning old answers even when upstream had changed. The whole point of DNS—rapid indirection—was blunted by “performance tuning.”

The fix required two changes: policy-driven TTL caps (don’t extend beyond authoritative TTL for critical names), and a clear operational lever to invalidate caches for specific zones during incidents. They kept local caching, but they treated it like production infrastructure with change control and observability, not a “speed hack.”

The lesson: caches are not free. They are tiny distributed databases with personality.

Mini-story 3: The boring but correct practice that saved the day (prove it on the wire)

A fintech team ran Ubuntu 24.04 on a mix of bare metal and VMs. One morning, a subset of nodes failed to reach a payment partner API after a DNS change on the partner side. The partner swore DNS was updated. The team’s dig on one host agreed. Others didn’t. Anxiety arrived early.

The on-call followed a dull but reliable routine: first, resolvectl status to confirm upstream servers; second, getent hosts for NSS behavior; third, direct dig @upstream. The results were consistent: some nodes were querying one corporate resolver, others a different one due to per-link DNS routing after a VPN failover event.

They didn’t “flush DNS and pray.” They captured a short tcpdump showing exactly which resolver each affected host was querying and the answers returned. That packet capture ended the debate in ten minutes, because packets don’t do politics.

Once the root cause was clear—split resolver paths—they fixed the network configuration to ensure all nodes used the same resolver pair for that routing domain, and added monitoring that checks resolved’s per-link DNS server list for drift.

It wasn’t glamorous. It was correct. And it prevented a multi-hour blame spiral between the app team, network team, and an external partner.

Common mistakes: symptom → root cause → fix

1) Symptom: dig shows new IP, app still uses old IP

Root cause: App-level DNS caching or persistent connection reuse (pooling/keep-alive). Sometimes the app never re-resolves after startup.

Fix: Verify with ss -tnp what IP is connected; recycle the app/pool; configure runtime DNS TTL; ensure lookups occur per request where appropriate.

2) Symptom: Flushing systemd-resolved changes nothing

Root cause: The wrong cache. Either upstream is stale, or the client isn’t using resolved (containers, DoH, custom resolver).

Fix: Query upstream directly with dig @server; check container /etc/resolv.conf; capture DNS traffic with tcpdump.

3) Symptom: Only some interfaces/nodes resolve internal domains

Root cause: Per-link DNS routing/split-horizon domains; VPN or secondary NIC changed resolved’s routing domains.

Fix: Use resolvectl status and resolvectl query to see which link answered; correct netplan/NetworkManager config so the right interface owns the domain.

4) Symptom: Name “does not exist” after you just created it

Root cause: Negative caching (local or upstream). Your resolver cached NXDOMAIN based on SOA settings.

Fix: Flush local caches; query multiple upstream resolvers; wait out negative TTL or invalidate upstream cache if you control it.

5) Symptom: getent hosts returns a weird IP that dig doesn’t

Root cause: /etc/hosts override or NSS order/mDNS interception.

Fix: Inspect /etc/hosts and /etc/nsswitch.conf; remove stale overrides; re-run getent hosts.

6) Symptom: IPv4 works sometimes, IPv6 fails and it “looks like DNS”

Root cause: AAAA record present; client prefers IPv6; IPv6 path broken.

Fix: Check dig AAAA name; validate IPv6 connectivity; fix routing/firewall or adjust client policy.

7) Symptom: Containers fail to resolve, host resolves fine

Root cause: Container runtime DNS layer (127.0.0.11) or Kubernetes DNS configuration; node-local caching may differ.

Fix: Inspect container /etc/resolv.conf; test resolution from inside the container; adjust daemon/kube DNS config instead of host caches.

8) Symptom: DNS queries time out sporadically

Root cause: Firewall, MTU issues, or resolver reachability; sometimes TCP fallback blocked; sometimes DoT blocked.

Fix: Use tcpdump to see retries; test TCP/53 and TCP/853; confirm routing and security group rules.
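
Two quick reachability checks that separate “slow resolver” from “blocked path”: force a query over TCP, and test whether the DoT port opens at all (assumes the OpenBSD netcat package is installed; addresses are this case’s):

cr0x@server:~$ dig +tcp @10.20.0.53 api.corp.internal
cr0x@server:~$ nc -zv -w 3 10.20.0.53 853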

Checklists / step-by-step plan

Incident checklist: “DNS is wrong on Ubuntu 24.04”

  1. Confirm what you’re testing: Are you testing the app, NSS, or direct DNS?
  2. Check resolver mode: ls -l /etc/resolv.conf and systemctl status systemd-resolved.
  3. Inspect per-link DNS: resolvectl status.
  4. Compare answers: getent hosts name vs dig name vs dig @upstream name.
  5. Check overrides: grep name /etc/hosts and grep '^hosts:' /etc/nsswitch.conf.
  6. Flush only when justified: sudo resolvectl flush-caches.
  7. Validate on the wire: tcpdump for port 53/853 while issuing queries.
  8. If app mismatch persists: confirm connections (ss -tnp), restart/recycle app/pool, and inspect runtime DNS caching behavior.
  9. If upstream is stale: escalate to DNS owners or bypass for validation; don’t keep flushing clients.

Prevention checklist: stop future “DNS lies”

  1. Standardize resolver ownership: decide whether resolved is the stub everywhere, or whether you run a dedicated local resolver. Don’t do both accidentally.
  2. Set sane TTL expectations: document what TTLs your org uses for critical names and how caches may cap them.
  3. Operationalize DNS changes: for endpoint migrations, include app restart/pool recycle steps when clients may cache.
  4. Monitor resolver drift: alert if per-link DNS servers differ from expected (especially after VPN events).
  5. Separate “name resolution” from “connectivity” in runbooks: always check the actual connected IP and route, not just DNS answers.
  6. Train teams on NSS: make getent the default “what will apps see?” tool.

FAQ

1) On Ubuntu 24.04, what is the correct command to flush DNS?

If you’re using systemd-resolved (common on 24.04), use sudo resolvectl flush-caches. But only after proving resolved is the caching layer in your path.

2) Why does dig show one IP but my application uses another?

dig is a direct DNS client; your app likely uses NSS and the system resolver stack, or it caches in-process, or it reuses existing connections. Compare getent hosts and check active connections with ss -tnp.

3) Is restarting systemd-resolved better than flushing caches?

Restarting is heavier and can briefly disrupt resolution. Prefer resolvectl flush-caches first. Restart only if resolved is wedged (rare) or you changed its configuration.

4) Why shouldn’t I edit /etc/resolv.conf directly?

Because on Ubuntu 24.04 it’s often a symlink managed by systemd/netplan/NetworkManager. Your edits can vanish on reboot or link changes, and you’ll be debugging yesterday’s file.

5) What’s the best “truth” command for what the system will resolve?

getent hosts name. It follows NSS rules, including /etc/hosts, mDNS, and DNS via the configured resolver stack.

6) Can systemd-resolved route DNS differently for different interfaces?

Yes. It can select DNS servers and search domains per link. This is a common cause of “some nodes work, some don’t,” especially with VPNs.

7) How do I know if the cache problem is upstream and not local?

Query the upstream server directly: dig @DNS_SERVER name. If upstream is wrong, flushing local caches won’t change the answer.

8) Why does a newly created DNS record still return NXDOMAIN?

Negative caching. NXDOMAIN responses can be cached according to SOA parameters. Flush local caches, then check upstream; if upstream still returns NXDOMAIN, wait or invalidate upstream cache if you control it.
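
To see how long an NXDOMAIN can legitimately stick, check the zone’s SOA record: the negative-caching window is bounded by the SOA minimum field (the last number in the record) and the SOA’s own TTL. Using this case’s zone and resolver:

cr0x@server:~$ dig +noall +answer SOA corp.internal @10.20.0.53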

9) My host resolves fine but containers don’t. Why?

Containers often use a different resolver address (like Docker’s 127.0.0.11) and different forwarding rules. Diagnose from inside the container and fix the container DNS layer—not the host stub.

10) How can I prove which resolver IP my host is using without guessing?

resolvectl status shows the current DNS server and per-link DNS servers. Pair it with tcpdump to confirm traffic goes where you think it does.

Conclusion: next steps that actually move the needle

If you take one operational habit from this case: stop saying “flush DNS” as if it’s a single lever. On Ubuntu 24.04 it’s a stack. Your job is to identify which layer is lying—or more often, which layer is faithfully repeating a stale truth it learned earlier.

Do this next time, in this order:

  1. Use getent hosts to see what applications will see.
  2. Use resolvectl status and resolvectl query to see which link and which upstream resolver are involved.
  3. Use dig @upstream to separate local caching from upstream staleness.
  4. Use tcpdump when arguments start. Packets settle arguments.
  5. Flush only the cache you proved is in the path.

And when someone suggests restarting random networking services because “DNS is weird,” hand them a packet capture and a copy of your runbook. Gently. You still have to work with them tomorrow.
