Ubuntu 24.04: VPN breaks DNS — fix resolvers and routes in the correct order (case #52)


You connect to the VPN and everything looks “up.” The tunnel is green. The icon smiles.
And then: nothing resolves. Internal hostnames fail, public DNS may fail too, and your browser does the
corporate equivalent of staring at a closed door.

This is the sort of failure that makes smart people do dumb things—like randomly editing /etc/resolv.conf
at 2 a.m. Don’t. On Ubuntu 24.04, resolver behavior is a layered system (systemd-resolved + NetworkManager + per-link DNS),
and VPNs are aggressive about routes. Fixing DNS without fixing routes is like changing the smoke detector batteries while the kitchen is still on fire.

The mental model: what changed in Ubuntu 24.04 and why VPNs trip it

Ubuntu 24.04 still “uses” /etc/resolv.conf, but in the same way a stage magician “uses” a deck of cards:
it’s not where the real decisions are happening. Most desktop and many server installs run
systemd-resolved with a stub resolver and per-interface DNS configuration.
NetworkManager is typically the policy engine feeding systemd-resolved DNS and domains.
VPN clients (OpenVPN, WireGuard via NetworkManager, proprietary agents) often modify:

  • Routes (default route, policy routes, split tunnel routes)
  • DNS servers (push internal resolvers, override to “VPN DNS only”)
  • Search domains / split DNS domains (route only corp.example via VPN DNS)
  • Firewall rules (some clients implement their own kill-switch)

The trap: DNS is network traffic. If your routing table points DNS packets to the wrong place,
“fixing DNS” at the resolver layer doesn’t help. Equally, a correct route with a wrong resolver
makes it look like the VPN is down when it isn’t. So the order matters:
routing first, then resolver selection, then split DNS behavior, then caches.

Here’s the baseline flow on a typical Ubuntu 24.04 system:

  1. Applications call glibc's resolver, which follows the hosts line in /etc/nsswitch.conf
  2. For the dns service, queries go to the nameserver listed in /etc/resolv.conf (often 127.0.0.53, the local stub)
  3. systemd-resolved selects DNS servers per-link and per-domain, using its routing rules
  4. Packets are sent using the kernel routing table (or policy routing) via the chosen interface
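You can spot-check each layer in about a minute. These are read-only commands; the hostname and resolver IP are placeholders (the same example values used in the tasks below), and your output will differ:

cr0x@server:~$ grep '^hosts' /etc/nsswitch.conf            # layer 1: NSS ordering
cr0x@server:~$ grep nameserver /etc/resolv.conf            # layer 2: usually 127.0.0.53 (the stub)
cr0x@server:~$ resolvectl query app01.corp.example         # layer 3: resolved's per-link, per-domain choice
cr0x@server:~$ ip route get 10.44.0.53                     # layer 4: which interface actually carries the DNS packets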

If a VPN “helpfully” installs a default route through the tunnel but forgets to allow your corporate DNS
to be reachable through that tunnel—or worse, installs DNS servers that are only reachable on the local LAN—
you get the classic symptom: IPs work, names don’t. Or “public names work, internal names don’t.”

One quote worth keeping on a sticky note:
“Hope is not a strategy.” — Gordon R. Dickson. It’s not an SRE quote, but it applies brutally well to DNS debugging.
(If you can’t verify it with commands, it’s hope.)

Joke #1: DNS is the only system where “it works on my machine” can literally mean “it works on my resolver.”

Interesting facts and context (quick, concrete, useful)

  • systemd-resolved has been the default resolver in Ubuntu for years, but the ecosystem still assumes /etc/resolv.conf is authoritative, which creates ghost bugs when it’s only a stub.
  • Split DNS is older than most engineers think: enterprises were doing “internal zones on internal resolvers” long before modern VPN apps branded it as a feature.
  • NetworkManager can set DNS per connection and per VPN, and it can also decide whether VPN DNS overrides or supplements existing DNS.
  • OpenVPN “push” options can deliver DNS servers and routes to clients; if the server config is sloppy, every client inherits the sloppiness.
  • WireGuard doesn’t have a native control plane for DNS; DNS is typically configured by the client (wg-quick, NetworkManager, or custom scripts), which means “DNS broke” is often “client didn’t apply DNS.”
  • 127.0.0.53 is not “the DNS server”; it’s a local stub listener for systemd-resolved. If you point applications elsewhere, you can bypass split DNS logic by accident.
  • DNS over TCP exists for a reason: large responses, DNSSEC, and some VPN/firewall combos that mishandle UDP fragmentation.
  • Search domains can be weaponized by accident: adding a search suffix like corp.example can make short names resolve differently and confuse logs and tooling.

Fast diagnosis playbook (what to check first/second/third)

First: prove whether the problem is routing or resolving

  1. Can you reach a known IP over the VPN?
    If you can ping or curl an internal IP but internal names fail, you’re in resolver land.
    If you can’t reach internal IPs either, start with routes/firewall.
  2. Can you reach the configured DNS server IP(s)?
    If the VPN pushed 10.0.0.53 but you can’t route to it, DNS is doomed no matter how correct configs look.
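A minimal way to run that split, assuming 10.44.20.17 is a known internal service and 10.44.0.53 is the resolver the VPN pushed (both are placeholder addresses; substitute your own):

cr0x@server:~$ curl -sS --connect-timeout 3 http://10.44.20.17/ -o /dev/null && echo "internal IP reachable"
cr0x@server:~$ curl -sS --connect-timeout 3 http://app01.corp.example/ -o /dev/null && echo "internal name resolves"
cr0x@server:~$ ip route get 10.44.0.53

If the first command succeeds and the second fails, you're in resolver land. If the first fails too, or ip route get sends the resolver's IP out the wrong interface, start with routes.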

Second: inspect systemd-resolved’s view, not your assumptions

  1. Use resolvectl status to see per-link DNS servers and domains.
    If your VPN link has DNS but the default route still points to Wi‑Fi, you may have split tunnel without split DNS (or vice versa).
  2. Check if /etc/resolv.conf is a stub pointing at 127.0.0.53 or something else.
    If it’s been overwritten by a VPN client, you may be bypassing systemd-resolved’s per-link logic.

Third: validate resolution from multiple angles

  1. Test with resolvectl query (goes through systemd-resolved).
  2. Test with dig @SERVER (bypasses local policy; good for checking reachability and server correctness).
  3. If results differ, you have a policy/config mismatch rather than a pure DNS outage.

Fourth: only then touch configuration

Don’t start “fixing” by hardcoding 8.8.8.8 or ripping out resolved. That’s how you get public DNS working
while internal zones fail silently for weeks. Fix the routing and resolver policy so the machine behaves correctly on and off VPN.

Practical tasks: commands, expected output, and decisions

These are the checks I actually run on Ubuntu 24.04 when a VPN “connects” but DNS acts like it’s in a different time zone.
Each task includes: the command, what output usually means, and the decision you make from it.
Do them in order unless you enjoy chaotic debugging.

Task 1: Confirm the VPN interface exists and is up

cr0x@server:~$ ip -brief link
lo               UNKNOWN        00:00:00:00:00:00 <LOOPBACK,UP,LOWER_UP>
enp0s31f6        UP             3c:52:82:ab:cd:ef <BROADCAST,MULTICAST,UP,LOWER_UP>
wg0              UNKNOWN                           <POINTOPOINT,NOARP,UP,LOWER_UP>

Meaning: wg0 (or tun0) exists and is UP. If it’s missing, you don’t have a DNS problem; you have a VPN problem.
Decision: If the interface isn’t present/up, fix VPN connectivity/auth first. Stop here.

Task 2: Inspect addresses on the VPN interface

cr0x@server:~$ ip addr show dev wg0
4: wg0: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1420 qdisc noqueue state UNKNOWN group default qlen 1000
    inet 10.44.0.12/32 scope global wg0
       valid_lft forever preferred_lft forever

Meaning: You got an address. A /32 on WireGuard is common. For OpenVPN, you may see /24 or /20.
Decision: No address means the tunnel isn’t properly configured. DNS can’t work if the VPN didn’t assign network identity.

Task 3: Check the default route and VPN routes

cr0x@server:~$ ip route
default via 192.168.1.1 dev enp0s31f6 proto dhcp metric 100
10.44.0.0/16 dev wg0 proto kernel scope link src 10.44.0.12
192.168.1.0/24 dev enp0s31f6 proto kernel scope link src 192.168.1.50 metric 100

Meaning: Default route is still local LAN; VPN provides a route to only 10.44/16 (split tunnel).
Decision: If corporate DNS is in 10.44/16, good. If the VPN pushed DNS like 10.99.0.53 but there’s no route to 10.99/16, you’ve found the bug: missing route(s).

Task 4: Ask “how do we route to the DNS server IP?”

cr0x@server:~$ ip route get 10.44.0.53
10.44.0.53 dev wg0 src 10.44.0.12 uid 1000
    cache

Meaning: Packets to 10.44.0.53 will go via wg0. Great.
Decision: If it routes via the wrong interface (like Wi‑Fi), fix routing/policy before touching DNS settings.

Task 5: Verify you can reach the DNS server (ICMP is optional; UDP/53 is the point)

cr0x@server:~$ nc -uvz -w2 10.44.0.53 53
Connection to 10.44.0.53 53 port [udp/domain] succeeded!

Meaning: UDP 53 appears reachable. Not a perfect guarantee, but it strongly suggests routing/firewall isn’t blocking basic DNS.
Decision: If this fails, focus on VPN routes, firewall rules, or the DNS server being down.

Task 6: Inspect what /etc/resolv.conf really is

cr0x@server:~$ ls -l /etc/resolv.conf
lrwxrwxrwx 1 root root 39 Apr 25 10:12 /etc/resolv.conf -> ../run/systemd/resolve/stub-resolv.conf

Meaning: Stub resolver in use; apps call 127.0.0.53 and systemd-resolved decides upstream.
Decision: If /etc/resolv.conf is a regular file pointing at some VPN-provided DNS, you may have bypassed resolved and broken split DNS. Decide whether you want that (you probably don’t).

Task 7: Check systemd-resolved status and per-link DNS

cr0x@server:~$ resolvectl status
Global
       Protocols: -LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
resolv.conf mode: stub

Link 2 (enp0s31f6)
    Current Scopes: DNS
         Protocols: +DefaultRoute
Current DNS Server: 192.168.1.1
       DNS Servers: 192.168.1.1

Link 4 (wg0)
    Current Scopes: DNS
         Protocols: -DefaultRoute
       DNS Servers: 10.44.0.53
        DNS Domain: ~corp.example

Meaning: Perfect split DNS setup: only corp.example goes to VPN DNS (routing domain prefixed by ~).
Decision: If the VPN link has no DNS servers, NetworkManager didn’t apply them (or a client clobbered things). Fix at NM/VPN profile.

Task 8: Confirm resolution via systemd-resolved path

cr0x@server:~$ resolvectl query app01.corp.example
app01.corp.example: 10.44.20.17                     -- link: wg0

-- Information acquired via protocol DNS in 42.1ms.
-- Data is authenticated: no

Meaning: Resolved is choosing wg0 and returning the right record.
Decision: If this works but applications still fail, check nsswitch.conf or app-specific DNS behavior (some apps bypass glibc).

Task 9: Compare with a direct dig against the VPN DNS server

cr0x@server:~$ dig @10.44.0.53 app01.corp.example +time=2 +tries=1

; <<>> DiG 9.18.28-0ubuntu0.24.04.1-Ubuntu <<>> @10.44.0.53 app01.corp.example +time=2 +tries=1
; (1 server found)
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 48291
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; ANSWER SECTION:
app01.corp.example.  60 IN A 10.44.20.17

;; Query time: 25 msec
;; SERVER: 10.44.0.53#53(10.44.0.53) (UDP)
;; WHEN: Thu Apr 25 10:26:51 UTC 2025
;; MSG SIZE  rcvd: 62

Meaning: The DNS server itself is fine.
Decision: If dig works but resolvectl fails, the issue is local policy/config (domains, per-link DNS, resolved state).

Task 10: Check whether the VPN changed DNS priority or default route handling

cr0x@server:~$ nmcli dev show wg0 | sed -n '1,120p'
GENERAL.DEVICE:                         wg0
GENERAL.TYPE:                           wireguard
GENERAL.STATE:                          100 (connected)
IP4.ADDRESS[1]:                         10.44.0.12/32
IP4.GATEWAY:                            --
IP4.DNS[1]:                             10.44.0.53
IP4.DOMAIN[1]:                          corp.example
IP4.ROUTE[1]:                           dst = 10.44.0.0/16, nh = 0.0.0.0, mt = 0

Meaning: NM believes wg0 has DNS and a route. Good.
Decision: If IP4.DNS is missing here, fix the VPN connection profile; don’t hack resolved directly as a first move.

Task 11: Check for policy routing rules (common with kill-switch or “full tunnel”)

cr0x@server:~$ ip rule show
0:      from all lookup local
32766:  from all lookup main
32767:  from all lookup default

Meaning: No special policy rules. Simple routing table applies.
Decision: If you see extra rules directing DNS traffic (or all traffic) to a separate table, you must inspect those tables too. Don’t assume ip route alone tells the story.

Task 12: Inspect firewall rules for DNS blocks (especially with VPN kill-switch)

cr0x@server:~$ sudo nft list ruleset | sed -n '1,160p'
table inet filter {
  chain output {
    type filter hook output priority filter; policy accept;
    ip daddr 10.44.0.0/16 udp dport 53 accept
    ip daddr 10.44.0.0/16 tcp dport 53 accept
  }
}

Meaning: DNS to the VPN space is allowed. Great.
Decision: If you see a default-drop output chain or rules that only allow traffic via the tunnel interface, confirm DNS is included. Many “privacy” VPN clients block UDP/53 unless explicitly allowed.
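If the ruleset does block it, a temporary unblock looks like the sketch below. The table and chain names match the example output above; real VPN clients often use their own tables, and the durable fix belongs in the client's configuration, not in ad-hoc rules:

cr0x@server:~$ sudo nft add rule inet filter output oifname "wg0" udp dport 53 accept
cr0x@server:~$ sudo nft add rule inet filter output oifname "wg0" tcp dport 53 accept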

Task 13: Check systemd-resolved logs when it looks “configured but dead”

cr0x@server:~$ journalctl -u systemd-resolved -n 80 --no-pager
Apr 25 10:24:10 server systemd-resolved[812]: Using degraded feature set UDP instead of UDP+EDNS0 for DNS server 10.44.0.53.
Apr 25 10:24:12 server systemd-resolved[812]: Failed to send hostname reply: Transport endpoint is not connected
Apr 25 10:24:20 server systemd-resolved[812]: Switching to DNS server 10.44.0.53 for link 4.

Meaning: You can see retries, fallbacks, and “degraded feature set” messages that hint at MTU/fragmentation issues across VPN.
Decision: If you see repeated timeouts or switching, move to MTU checks and try TCP DNS queries.

Task 14: Test DNS over TCP (because VPNs and MTU love drama)

cr0x@server:~$ dig @10.44.0.53 app01.corp.example +tcp +time=2 +tries=1

; <<>> DiG 9.18.28-0ubuntu0.24.04.1-Ubuntu <<>> @10.44.0.53 app01.corp.example +tcp +time=2 +tries=1
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 1136
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; ANSWER SECTION:
app01.corp.example.  60 IN A 10.44.20.17

;; Query time: 31 msec
;; SERVER: 10.44.0.53#53(10.44.0.53) (TCP)
;; WHEN: Thu Apr 25 10:28:51 UTC 2025
;; MSG SIZE  rcvd: 62

Meaning: TCP works. If UDP fails but TCP succeeds, suspect MTU/fragmentation or firewall rules targeting UDP.
Decision: Adjust MTU on tunnel, fix path MTU discovery issues, or allow fragmented UDP. Don’t “fix” by switching resolvers to public DNS; that just hides the problem.

Task 15: Confirm what applications will do via NSS

cr0x@server:~$ grep -E '^(hosts|resolve)' /etc/nsswitch.conf
hosts:          files mdns4_minimal [NOTFOUND=return] dns

Meaning: Standard ordering: files, then mDNS, then DNS. Fine.
Decision: If dns is missing or replaced by something odd, you can have “resolvectl works but getent fails.” Fix NSS ordering before blaming the VPN.

Task 16: Test resolution like a boring OS tool (because browsers lie)

cr0x@server:~$ getent ahosts app01.corp.example
10.44.20.17     STREAM app01.corp.example
10.44.20.17     DGRAM
10.44.20.17     RAW

Meaning: glibc resolver path works.
Decision: If getent fails but resolvectl query works, the problem is between glibc and resolved (resolv.conf mode, nsswitch, or per-app DNS).

Fix in the correct order: routes → resolvers → split DNS → caching

The theme: don’t “fix” DNS by forcing a single resolver globally unless you’re deliberately choosing to break split DNS.
Your goal is policy correctness: internal domains resolve via internal DNS when VPN is up; public domains resolve normally; VPN down returns you to normal.
That’s reliability, not heroics.

Step 1: Fix routing to the DNS servers (and internal zones)

Most VPN DNS failures are not DNS. They're routing. The VPN pushes a DNS server address that is only reachable via the tunnel,
but the split-tunnel routes don't cover that address. Or the opposite: a full tunnel is installed, but the client's firewall blocks DNS off-tunnel.

If you discover missing routes, decide where the fix belongs:

  • Correct place: the VPN profile/server configuration that pushes routes. Fix it once; all clients benefit.
  • Acceptable workaround: add static routes on the client when you can’t control the VPN server.
  • Bad idea: switch to a public resolver to “get internet back.” That hides the broken internal routing and kills internal name resolution.

Example: add a route temporarily (until the VPN profile is corrected):

cr0x@server:~$ sudo ip route add 10.99.0.0/16 dev wg0

Meaning: You’ve made the DNS server network reachable via the tunnel.
Decision: If this “fixes DNS,” go back and implement the route in NetworkManager connection settings or on the VPN server. Don’t leave snowflake routes on laptops as a permanent strategy.
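If NetworkManager manages the VPN connection, the durable version of that workaround lives in the profile rather than in a one-off ip route command. A sketch, assuming the connection is named "Corp WG" as in the examples below; the prefix is the same placeholder network:

cr0x@server:~$ nmcli connection modify "Corp WG" +ipv4.routes "10.99.0.0/16"
cr0x@server:~$ nmcli connection up "Corp WG"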

Step 2: Ensure systemd-resolved is the authority (unless you have a reason not to)

On Ubuntu 24.04, you want one decision-maker. If a VPN client replaces /etc/resolv.conf with a static file,
it can bypass systemd-resolved and break split DNS. Restore the stub symlink if needed.

cr0x@server:~$ sudo ln -sf /run/systemd/resolve/stub-resolv.conf /etc/resolv.conf

Meaning: Applications use the local stub; resolved can apply per-link DNS and routing domains.
Decision: If you run a headless server and explicitly don’t want systemd-resolved, that’s a different design. But don’t mix modes casually.

Step 3: Fix the VPN connection’s DNS and split DNS settings in NetworkManager

For corporate VPNs, you usually want split DNS: internal zones via VPN DNS, everything else via your normal resolver.
On Ubuntu with NetworkManager, that means the VPN connection should set:

  • VPN DNS server(s): internal resolvers
  • DNS domain(s): routing domains like ~corp.example (or configured in NM so resolved gets them)
  • Appropriate route(s): to internal networks and DNS servers

Inspect and adjust connection settings (example shows reading values; writing depends on VPN type):

cr0x@server:~$ nmcli -f NAME,TYPE,DEVICE connection show --active
NAME                TYPE       DEVICE
Wired connection 1  ethernet   enp0s31f6
Corp WG             wireguard  wg0
cr0x@server:~$ nmcli connection show "Corp WG" | sed -n '1,140p'
connection.id:                          Corp WG
connection.type:                        wireguard
ipv4.method:                            auto
ipv4.dns:                               10.44.0.53
ipv4.dns-search:                        corp.example
ipv4.never-default:                     yes

Meaning: ipv4.never-default: yes indicates split tunnel (no default route via VPN).
Decision: If you need full tunnel, set never-default to no (and make sure DNS routes/firewall allow the DNS server!). If you need split tunnel, keep it yes but ensure routes to internal DNS and internal networks exist.
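Writing those values back depends on the VPN type, but for a NetworkManager-managed WireGuard profile a minimal sketch looks like this. The tilde prefix in dns-search asks NetworkManager to hand corp.example to resolved as a routing domain rather than a search suffix (values are the same placeholders used above):

cr0x@server:~$ nmcli connection modify "Corp WG" ipv4.dns "10.44.0.53"
cr0x@server:~$ nmcli connection modify "Corp WG" ipv4.dns-search "~corp.example"
cr0x@server:~$ nmcli connection up "Corp WG"

Re-check resolvectl status afterwards; the wg0 link should show the DNS server and ~corp.example, as in Task 7.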

Step 4: Repair domain routing (the subtle part people skip)

You can have “DNS servers set” but still send the wrong queries to them.
With systemd-resolved, routing domains matter. A routing domain (prefixed with ~ in resolvectl output)
tells resolved which domains should be answered by which link.

If your VPN should only handle corp.example, set that, not a global search list that makes everything look internal.
A global search list is how you end up with printer resolving to printer.corp.example while you’re at home,
and suddenly your family network is “part of the incident.”
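If you want to test the intended policy before touching any profiles, resolved lets you set it at runtime. These settings last only until the link is reconfigured or the service restarts, so treat this as an experiment, not a fix (same placeholder names as above):

cr0x@server:~$ sudo resolvectl dns wg0 10.44.0.53
cr0x@server:~$ sudo resolvectl domain wg0 '~corp.example'
cr0x@server:~$ resolvectl status wg0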

Step 5: Flush caches after changes (but only after changes)

systemd-resolved caches. Browsers cache. Some apps cache aggressively. Flush the right cache at the right time so you don’t chase ghosts.

cr0x@server:~$ sudo resolvectl flush-caches
cr0x@server:~$ resolvectl statistics
DNSSEC supported by current servers: no
Transactions: 58
Cache hits: 11
Cache misses: 47

Meaning: If you keep seeing cache hits for wrong answers, flushing helps, but it doesn’t fix policy.
Decision: Flush once after configuration changes, then re-test. If you flush repeatedly during debugging, you lose signal.

Step 6: If UDP is flaky, treat MTU like a first-class citizen

VPNs change MTU. DNS over UDP can be impacted by fragmentation, EDNS0, and path MTU discovery failures.
If you see resolved degrading features or UDP timeouts, test TCP (Task 14).
Then consider reducing tunnel MTU or fixing firewall rules that drop fragments.
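A quick, low-risk way to probe the path and trial a smaller MTU (the sizes here are illustrative, not recommendations; persist any change in the VPN profile, not with an ad-hoc ip link command):

cr0x@server:~$ ping -M do -s 1392 -c 3 10.44.0.53      # 1392 bytes payload + 28 bytes of headers = 1420; shrink -s until it passes
cr0x@server:~$ sudo ip link set dev wg0 mtu 1380        # temporary experiment only
cr0x@server:~$ dig @10.44.0.53 app01.corp.example +time=2 +tries=1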

Joke #2: The fastest way to learn about MTU is to ignore it for a week and then “discover” it at 3 a.m.

Three corporate mini-stories from the DNS/VPN trenches

Mini-story 1: The incident caused by a wrong assumption

A mid-sized company rolled out Ubuntu 24.04 to a group of engineers who lived in VPN land. The migration “went fine”
until the first week when internal services started “flapping.” The symptom was odd: internal web apps worked by IP but failed by name,
and only for some users, on and off, depending on where they were sitting.

The assumption was classic: “resolv.conf is the DNS configuration.” A well-meaning team member wrote a script that rewrote
/etc/resolv.conf on VPN connect to point directly at the corporate DNS, and rewrote it again on disconnect.
It worked in their terminal tests. They declared victory and went back to shipping features.

On Ubuntu 24.04, that script bypassed systemd-resolved’s per-link logic and domain routing.
Users with split tunnel started sending all DNS queries to corporate resolvers—including public domains—over paths that weren’t routed.
The corporate DNS servers were reachable only through the tunnel, and the script sometimes ran before routes were installed.
DNS queries went nowhere, then cached NXDOMAINs, and the browser experience became a slot machine.

The fix was boring: remove the script, restore the stub resolver, and correct the VPN profile so NetworkManager fed resolved the right servers and routing domains.
The “incident” ended not with a heroic hack, but with the humility of admitting the system had a policy engine for a reason.

Mini-story 2: The optimization that backfired

Another org had a performance-minded networking team. They were tired of “slow DNS” complaints and decided to optimize:
push a single “fast” DNS server to all VPN clients. The server was a well-resourced resolver close to the VPN concentrator.
They also shortened TTLs for internal records because “it makes changes propagate faster.”

It backfired in two ways. First, clients in split tunnel mode now had a resolver that was only reachable via VPN, but the server push didn’t ensure
routes to that resolver in every tunnel mode. Second, the low TTLs increased query volume and made intermittent packet loss visible as user-facing timeouts.
The resolver was “fast” in isolation and fragile in the real network.

Engineers started hardcoding public resolvers to “fix their internet,” which created a new class of problems:
internal domains leaked to public DNS (harmless most of the time, but embarrassing), and internal names stopped resolving entirely.
The helpdesk playbook became a mess of contradictory advice.

The eventual remediation: restore split DNS (internal zones to internal resolvers only when routes exist),
keep TTLs reasonable, and add TCP fallback support for DNS across the VPN path.
Performance improved because the system was correct, not because someone declared a resolver “fast.”

Mini-story 3: The boring but correct practice that saved the day

A security-heavy enterprise had a habit that looked dull on paper: every VPN change required a test checklist run on a clean Ubuntu image.
Not a developer laptop. Not someone’s “already configured” machine. A disposable VM that represented the baseline reality.
They tested routing, DNS, and a small set of internal service lookups.

One week, they planned to add a new internal DNS zone and update the VPN to push the corresponding routing domain.
During the checklist run, resolvectl status showed the domain was being added as a search domain instead of a routing domain.
That meant resolved would treat it differently, and short names would start resolving in surprising ways.
It wasn’t a security incident yet, but it smelled like one.

The team caught it before rollout. They adjusted the NetworkManager profile generation so the VPN created proper per-domain routing.
They also verified that the DNS server IP was reachable in split tunnel mode by checking ip route get during connect.
This prevented the “VPN connected but DNS dead” ticket storm.

No heroics. No midnight edits. Just a checklist that treated DNS and routing as coupled systems.
The best outages are the ones you never get paged for.

Common mistakes (symptoms → root cause → fix)

1) Symptom: “VPN connects, but internal hostnames don’t resolve”

Root cause: VPN DNS server is configured, but there’s no route to it (split tunnel missing the resolver subnet).
Fix: Add/repair routes to DNS server IP(s) in the VPN profile or server push config. Verify with ip route get DNS_IP.

2) Symptom: “Public DNS works, internal DNS fails”

Root cause: Split DNS not configured; internal domains aren’t routed to the VPN resolver (or routing domains missing).
Fix: Configure routing domains (e.g., ~corp.example) for the VPN link in resolved/NM, and ensure VPN DNS servers are set on that link.

3) Symptom: “Internal works, but the internet dies when VPN is up”

Root cause: Full tunnel route installed (or kill-switch rules) but VPN DNS points to internal resolvers that can’t resolve public domains, or egress is restricted.
Fix: Decide policy: full tunnel with corporate recursive resolvers that resolve public domains, or split tunnel with non-VPN resolver for public. Don’t half-do both.

4) Symptom: “resolvectl query works, but browsers/apps still fail”

Root cause: Application bypasses system resolver (DoH, custom resolver, container DNS, or hardcoded nameserver).
Fix: Confirm with getent ahosts and app settings. Disable app DoH temporarily or align it with corporate DNS policy.

5) Symptom: “dig @DNS works, but resolvectl query times out”

Root cause: systemd-resolved is using different DNS server(s) or wrong link selection due to missing routing domains.
Fix: Inspect resolvectl status and ensure VPN link has DNS servers and ~domain routing domains.

6) Symptom: “DNS works for a while, then breaks after sleep/resume”

Root cause: VPN reconnects but DNS settings aren’t re-applied (race between VPN up and NM/resolved update), or stale routes remain.
Fix: Confirm post-resume with tasks 3, 7, 10. If missing, fix dispatcher scripts or VPN client integration; flushing caches is not a fix, it’s a bandage.
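If you do end up writing a dispatcher hook, keep it boring: observe, don't mutate. A minimal sketch (path, tag, and the resolver IP are all illustrative; dispatcher scripts must be owned by root and executable):

cr0x@server:~$ cat /etc/NetworkManager/dispatcher.d/90-vpn-dns-check
#!/bin/sh
# NetworkManager passes: $1 = interface, $2 = action (up, down, vpn-up, vpn-down, ...)
# Log resolver and route state when something comes up, so post-resume DNS
# breakage is visible in the journal instead of being a mystery.
case "$2" in
  up|vpn-up)
    {
      echo "=== state after $2 on ${1:-unknown} ==="
      resolvectl status
      ip route get 10.44.0.53   # placeholder: your internal resolver IP
    } 2>&1 | logger -t vpn-dns-check
    ;;
esac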

7) Symptom: “Only large DNS responses fail (some domains load, others don’t)”

Root cause: MTU/fragmentation issues, EDNS0 problems, or UDP fragments dropped across the tunnel.
Fix: Test TCP DNS. Adjust tunnel MTU, allow fragments, or configure resolver/client for safer UDP sizes.

8) Symptom: “Random NXDOMAIN for internal names”

Root cause: Wrong resolver is answering internal queries (public resolver or home router), sometimes due to missing routing domain or wrong DNS priority.
Fix: Enforce split DNS: internal zones routed exclusively to internal resolvers via VPN link.

Checklists / step-by-step plan

Checklist A: The “I need it working now” sequence (10–15 minutes)

  1. Confirm VPN interface is up: ip -brief link.
  2. Confirm VPN has an IP: ip addr show dev wg0 or ip addr show dev tun0.
  3. Check routes: ip route.
  4. Verify route to DNS server: ip route get DNS_IP.
  5. Test reachability to DNS server port: nc -uvz DNS_IP 53.
  6. Inspect resolver state: resolvectl status.
  7. Query internal name: resolvectl query host.corp.example.
  8. Compare with direct dig: dig @DNS_IP host.corp.example.
  9. Flush cache once after changes: sudo resolvectl flush-caches.
  10. If UDP flaky, test TCP: dig @DNS_IP host.corp.example +tcp.

Checklist B: Make the fix durable (so you don’t meet this ticket again)

  1. Stop editing /etc/resolv.conf manually; restore stub symlink if needed.
  2. Fix VPN profile/server push routes to include DNS server subnet and internal networks.
  3. Configure split DNS routing domains (not just search domains) for internal zones.
  4. Validate with resolvectl status that the VPN link owns ~corp.example.
  5. Confirm nmcli dev show shows DNS and domains on the VPN interface.
  6. Audit firewall/kill-switch rules for DNS over UDP and TCP.
  7. Document the expected state: which interface owns which domain and which DNS servers.
  8. Add a regression test: connect VPN on a clean image and run resolvectl query for at least one internal and one public name.
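For item 8, the regression check can be embarrassingly small. A sketch, assuming the VPN is already connected and using one placeholder internal name plus one public name:

cr0x@server:~$ cat vpn-dns-check.sh
#!/bin/bash
# Fail loudly if either an internal or a public name stops resolving via the
# normal glibc path (getent goes through nsswitch.conf, like real applications).
for name in app01.corp.example ubuntu.com; do
    if getent ahosts "$name" > /dev/null; then
        echo "OK   $name"
    else
        echo "FAIL $name"
        exit 1
    fi
done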

Checklist C: When you suspect the VPN client is “helping” too much

  1. Check if /etc/resolv.conf got replaced with a regular file.
  2. Check for extra ip rule entries.
  3. Check nft output filtering rules for UDP/53.
  4. Look for multiple DNS managers running (proprietary client + NetworkManager + resolved).
  5. Pick one manager as the source of truth; disable the others’ DNS manipulation if possible.

FAQ

1) Why does VPN break DNS on Ubuntu 24.04 specifically?

It isn't, specifically; Ubuntu 24.04 just makes the failure more visible because systemd-resolved with per-link DNS is the default there.
VPN clients that assume /etc/resolv.conf is the only knob end up fighting the actual resolver stack.

2) Should I disable systemd-resolved?

Usually no. Resolved is good at split DNS when fed correct per-link settings. Disable it only if you have a clear alternative plan
(like a local caching resolver you manage) and you understand how VPN clients will update it.

3) My /etc/resolv.conf points to 127.0.0.53. Is that wrong?

No. That’s the local stub. The real upstream servers are visible in resolvectl status.
If you point /etc/resolv.conf directly at an upstream server, you can bypass split DNS behavior.

4) What does ~corp.example mean in resolvectl status?

The tilde indicates a routing domain: queries for that domain go to the DNS servers for that link.
Without the tilde, it’s more like a search domain, which changes how short names are expanded.

5) Why does dig work but my apps still fail?

dig @server bypasses your resolver policy and asks a server directly. It proves server reachability and correctness,
not that your OS is using that server for that domain. Apps may also use DoH or their own resolver stack.

6) Why does DNS work for small records but fail for some domains?

Larger responses (multiple A/AAAA, DNSSEC, big TXT) can hit UDP fragmentation issues across VPN.
Test with dig +tcp. If TCP works but UDP doesn’t, address MTU/fragmentation or firewall behavior.

7) Can I just add public DNS like 1.1.1.1 as a fallback?

You can, but it’s a policy decision with consequences. It often causes internal queries to leak to public DNS or fail unpredictably,
and it can mask missing VPN routes. If your company requires internal-only name resolution, don’t do it.

8) How do I know if the VPN is forcing a default route?

Check ip route for the default route and nmcli connection show for ipv4.never-default.
Also check ip rule for policy routing that can override the default route.

9) Why does it break only after reconnecting the VPN?

Race conditions and stale state: routes and DNS domains might not be fully removed on disconnect,
or not re-applied on connect. Confirm state after reconnect with ip route and resolvectl status.

10) What’s the safest way to validate a fix?

Validate three things: (1) ip route get DNS_IP uses the expected interface, (2) resolvectl status shows the correct per-link DNS and routing domains,
(3) getent ahosts resolves internal and public names as expected with VPN up and down.

Conclusion: practical next steps

When VPN breaks DNS on Ubuntu 24.04, the winning move is order. Routes first. Resolver policy second. Split DNS third. Caches last.
If you fix it in reverse, you’ll get something that “works” until it doesn’t—usually when you’re presenting, on-call, or both.

Next steps you can do today:

  1. Run the Fast diagnosis playbook and capture outputs for ip route and resolvectl status.
  2. Verify reachability to the VPN DNS server with ip route get and nc -uvz.
  3. Restore the stub /etc/resolv.conf if a VPN client overwrote it.
  4. Fix the VPN profile/server so routes and routing domains match the DNS servers it pushes.
  5. Add a regression checklist on a clean Ubuntu 24.04 image, because yesterday’s laptop state is not a test environment.

DNS failures are rarely glamorous. They are, however, extremely honest: the system will do exactly what your routes and resolver policy tell it to do.
Make those two correct, and the rest tends to fall in line.
