dnsmasq Cache + DHCP: A Clean Config That Doesn’t Fight Your System


If your laptops “sometimes can’t resolve names” and your printers “randomly disappear,” you don’t have a printer problem.
You have a DNS and DHCP consistency problem. The worst part is how quiet it is: things mostly work, until they don’t.

dnsmasq is the rare tool that can make small networks behave like grown-ups—fast DNS caching, sane DHCP leases, and just enough
features to stop you from building a Rube Goldberg machine out of three daemons and a spreadsheet.
The trick is to configure it so it doesn’t fight your system: systemd-resolved, NetworkManager, resolvconf, and friends.

The mental model: who owns DNS, who owns DHCP

A “clean” dnsmasq deployment is mostly about boundaries. DNS and DHCP are separate protocols, but operationally they’re glued together:
DHCP hands out addresses (and DNS servers), and DNS answers the questions your clients ask when they try to use those addresses.

Your biggest source of pain is two different components believing they are the boss:
one daemon rewriting /etc/resolv.conf, another daemon binding to port 53, your router’s DHCP still running in the background,
and some well-meaning VPN client jamming split DNS rules into the mix.

What “doesn’t fight your system” means

  • Single authority for DHCP on a given L2 segment. No exceptions. Not “mostly.”
  • Clear ownership of stub resolution on clients: either systemd-resolved is the local stub, or dnsmasq is.
  • One place to define upstream DNS (or at least one place per class of clients), with predictable overrides.
  • Logging you can turn on when needed, but not a permanent disk-eating monument to your curiosity.

dnsmasq can play in multiple roles: it can be a DNS forwarder/cache, an authoritative DNS server for local zones, a DHCP server,
a TFTP server for PXE, and more. The failure mode is when it does half of each role while something else does the other half.
That’s when you get “works on Wi‑Fi but not Ethernet” and “only breaks on Tuesday.”

Interesting facts and small history that actually matters

  • dnsmasq’s design target was small networks: one daemon, low memory, minimal dependencies, predictable behavior under load.
  • Port 53 has been a battleground since the 1990s: “local caching resolvers” became popular because upstream recursion was slow and fragile.
  • DHCP’s real superpower is centralizing policy—not just addresses, but default gateway, DNS servers, NTP, and vendor-specific options.
  • systemd-resolved normalized the “stub resolver” pattern: apps query a local listener (127.0.0.53) and it forwards upstream with policy.
  • DNS caching can hide upstream outages temporarily, which is nice until TTLs expire and your “stable” system falls off a cliff at once.
  • Negative caching (NXDOMAIN caching) is a thing: caching “this name doesn’t exist” can speed up failures—and also prolong them.
  • Split DNS predates most VPN clients: enterprises have done “internal zones resolve internally” since before laptops were fashionable.
  • DHCP lease time is a lever: too short hammers your server; too long delays recovery from bad assignments and changes.
  • DNS search domains are a foot-gun: a single bad search list can turn one lookup into six and make latency look like packet loss.
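
The search-domain point is easier to see with the candidate names written out. The sketch below approximates how a glibc-style stub resolver expands a name; expand_search is a hypothetical helper written for this article, the default ndots of 1 is an assumption, and real resolvers differ in details.

```shell
# expand_search NAME "SEARCH LIST" [NDOTS]
# Prints, in order, the candidate names a glibc-style stub resolver would try.
# Illustrative approximation only; not taken from any resolver's source.
expand_search() {
    name=$1; search=$2; ndots=${3:-1}
    case $name in
        *.) printf '%s\n' "$name"; return 0 ;;   # trailing dot: absolute, no search
    esac
    only_dots=$(printf '%s' "$name" | tr -cd '.')
    dots=${#only_dots}                            # number of dots in the name
    # Enough dots: the name is tried as-is first.
    if [ "$dots" -ge "$ndots" ]; then printf '%s.\n' "$name"; fi
    # Every search domain produces one more candidate.
    for d in $search; do
        printf '%s.%s.\n' "$name" "$d"
    done
    # Too few dots: the bare name is tried last.
    if [ "$dots" -lt "$ndots" ]; then printf '%s.\n' "$name"; fi
    return 0
}
```

Running expand_search nas "lan corp.example" lists three candidates, and each one is usually queried for both A and AAAA records. A typo therefore costs several failed lookups before the application sees an error.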

One idea that should be stapled to your monitor, paraphrased from John Allspaw: reliability comes from understanding how systems fail,
not from pretending they won’t. DNS is a system that fails in creative ways.

Pick one architecture (and stick to it)

Architecture A: dnsmasq owns LAN DNS cache + DHCP; clients use it directly

This is the cleanest for home labs and small offices: clients get DHCP from dnsmasq and use dnsmasq as their DNS server.
dnsmasq forwards to upstream resolvers (your ISP, public resolvers, or an internal recursive resolver).

The operational win: one box is the source of truth for “what’s on my LAN,” and you can teach it local names.
The operational risk: if you put it on a laptop, you deserve what you get. Put it on something boring.

Architecture B: systemd-resolved on clients; dnsmasq is only DHCP (or only DNS)

This works when you’re in a corporate fleet where systemd-resolved is standard and VPN clients hook into it.
In this mode, dnsmasq can serve DHCP and hand out DNS servers, but clients still use their stub resolver locally.

It’s viable, but it’s also how you end up with “DNS is different depending on who logged in last.” You can do it.
You just need discipline.

Architecture C: dnsmasq as a local cache on each machine

Usually not worth it. If you do it, you’re essentially creating N small caches with N sets of upstream rules.
Debugging becomes interpretive dance.

Opinion: For most small networks, choose Architecture A. It’s simpler, debuggable, and you can make it highly reliable with almost no effort.

Joke #1: DNS is like office politics—everyone swears it’s “not their department,” yet it decides whether anything gets done.

The clean dnsmasq config (DNS cache + DHCP) for a LAN

Assumptions for this configuration:

  • Linux server running dnsmasq
  • LAN interface: eth0
  • LAN subnet: 192.168.50.0/24
  • dnsmasq LAN IP: 192.168.50.1
  • Local domain: lan
  • Upstream DNS: 1.1.1.1 and 9.9.9.9 (replace as needed)

Core principles baked into the config

  • Bind explicitly to the LAN interface/IP so you don’t accidentally become “DNS for the universe.”
  • Don’t read random resolver state unless you mean to. If you rely on /etc/resolv.conf, you inherit its chaos.
  • Be explicit about your local domain and local names.
  • Log only what you need, and know how to switch it on during incidents.

dnsmasq.conf (clean baseline)

cr0x@server:~$ sudo tee /etc/dnsmasq.d/lan.conf >/dev/null <<'EOF'
# ---- Identity & safety rails ----
domain-needed
bogus-priv
no-hosts
no-resolv

# Bind only where we intend to serve
interface=eth0
listen-address=192.168.50.1
bind-interfaces

# ---- Upstream DNS ----
server=1.1.1.1
server=9.9.9.9

# ---- Local DNS behavior ----
domain=lan
local=/lan/
expand-hosts
cache-size=10000
neg-ttl=60

# Local names come from a dedicated file (no-hosts above skips /etc/hosts):
addn-hosts=/etc/dnsmasq.hosts

# ---- DHCPv4 ----
dhcp-authoritative
dhcp-range=192.168.50.50,192.168.50.199,255.255.255.0,12h
dhcp-option=option:router,192.168.50.1
dhcp-option=option:dns-server,192.168.50.1
dhcp-option=option:domain-name,lan
dhcp-option=option:ntp-server,192.168.50.1

# Example static lease: MAC,IP,hostname,lease-time
dhcp-host=AA:BB:CC:DD:EE:FF,192.168.50.10,nas,24h

# ---- Logging (keep default quiet; enable when diagnosing) ----
log-async=20
#log-queries
#log-dhcp
EOF

A few notes that save real time:

  • no-resolv means “I do not trust /etc/resolv.conf.” Good. You should not, unless you control it.
  • bind-interfaces plus listen-address prevents accidental exposure on VPN interfaces or docker bridges.
  • dhcp-authoritative is how you stop clients from clinging to stale leases after you reconfigure. Use it only if you are truly the DHCP authority for that subnet.
  • cache-size=10000 is reasonable on modern hardware. Tiny caches turn DNS into a latency amplifier.
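
A quick arithmetic check on the DHCP numbers in the baseline can catch an undersized pool or a static lease that lands inside the dynamic range. This is a minimal sketch; ip_to_int, pool_size, and in_pool are hypothetical helper names, and keeping static addresses outside the pool is treated here as a convention rather than something dnsmasq requires.

```shell
# ip_to_int A.B.C.D -> prints the IPv4 address as an integer.
ip_to_int() {
    old_ifs=$IFS; IFS=.
    set -- $1                 # split the dotted quad into $1..$4
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# pool_size START END -> number of addresses in the range, inclusive.
pool_size() {
    echo $(( $(ip_to_int "$2") - $(ip_to_int "$1") + 1 ))
}

# in_pool IP START END -> exit status 0 if IP lies inside the range.
in_pool() {
    ip=$(ip_to_int "$1"); lo=$(ip_to_int "$2"); hi=$(ip_to_int "$3")
    [ "$ip" -ge "$lo" ] && [ "$ip" -le "$hi" ]
}
```

For the baseline, pool_size 192.168.50.50 192.168.50.199 prints 150, and in_pool 192.168.50.10 192.168.50.50 192.168.50.199 returns non-zero, so the nas reservation sits outside the dynamic range.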

Local host records (intentionally separate)

cr0x@server:~$ sudo tee /etc/dnsmasq.hosts >/dev/null <<'EOF'
192.168.50.1   gw gw.lan
192.168.50.10  nas nas.lan
192.168.50.20  build build.lan
EOF

Keeping dnsmasq’s local names out of /etc/hosts is an operations choice: it reduces accidental coupling with system resolver behavior.
When you’re debugging at 2 a.m., you want fewer moving parts, not more.
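
Because this file is hand-edited, a small lint pass before reloading can help. The sketch below flags lines that have an address but no name, and names that appear with two different addresses. lint_hosts is a hypothetical helper; a name with two addresses is legal in a hosts file, so treat that warning as "probably a typo", not as an error dnsmasq would report.

```shell
# lint_hosts FILE -> prints one line per suspicious entry; exit status 1 if any.
lint_hosts() {
    awk '
        /^[ \t]*(#|$)/ { next }                      # skip comments and blank lines
        NF < 2 { printf "line %d: address without a name\n", NR; bad = 1; next }
        {
            for (i = 2; i <= NF; i++) {
                if (substr($i, 1, 1) == "#") break   # trailing comment ends the names
                if (($i in seen) && seen[$i] != $1) {
                    printf "line %d: %s already maps to %s\n", NR, $i, seen[$i]
                    bad = 1
                }
                seen[$i] = $1
            }
        }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}
```

Usage: lint_hosts /etc/dnsmasq.hosts prints nothing and returns 0 for the file above.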

Enable and restart

cr0x@server:~$ sudo systemctl enable --now dnsmasq
...output...

If you see “failed,” don’t guess. Jump to the tasks section and verify port binding, config syntax, and competing services.

Stop fighting the OS: systemd-resolved, NetworkManager, resolvconf

The root of most fights: port 53 and /etc/resolv.conf

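On many modern Linux distros, systemd-resolved runs a stub resolver at 127.0.0.53:53.
That’s fine, until you run dnsmasq on the same host and let it bind port 53 on the wildcard address (0.0.0.0), which collides with the stub’s socket. Putting dnsmasq on 127.0.0.1 next to the stub does not collide, but it leaves you with two local resolvers to keep straight.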

The second fight is /etc/resolv.conf. It might be:

  • a plain file managed by you,
  • a symlink to systemd-resolved’s stub file,
  • managed by resolvconf,
  • rewritten by NetworkManager,
  • or all of the above, briefly, during boot.
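
A rough way to tell these cases apart is to look at where /etc/resolv.conf points. The sketch below is a best-guess classifier. resolv_conf_owner is a hypothetical helper, and the path patterns are the common defaults for systemd-resolved, resolvconf, and NetworkManager, so treat the result as a hint.

```shell
# resolv_conf_owner [PATH] -> prints a best-guess owner (default: /etc/resolv.conf).
resolv_conf_owner() {
    f=${1:-/etc/resolv.conf}
    if [ -L "$f" ]; then
        target=$(readlink "$f")
        case $target in
            */systemd/resolve/stub-resolv.conf) echo "systemd-resolved stub" ;;
            */systemd/resolve/resolv.conf)      echo "systemd-resolved upstream list" ;;
            */resolvconf/*)                     echo "resolvconf" ;;
            */NetworkManager/*)                 echo "NetworkManager" ;;
            *)                                  echo "symlink to $target" ;;
        esac
    elif [ -f "$f" ]; then
        # NetworkManager normally stamps files it writes with this header.
        if grep -q 'Generated by NetworkManager' "$f"; then
            echo "NetworkManager"
        else
            echo "plain file"
        fi
    else
        echo "missing"
    fi
}
```

Run it with no arguments on the dnsmasq host and write the answer into your notes. If the answer changes between boots, more than one component is writing the file.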

Recommended pattern for a dnsmasq server host

If this server is your LAN DNS/DHCP box, the cleanest approach is:

  • dnsmasq listens on the LAN IP (e.g., 192.168.50.1) for clients
  • the server itself can use either:
    • dnsmasq (point its resolver to 127.0.0.1 or 192.168.50.1), or
    • systemd-resolved (and dnsmasq never binds 127.0.0.1:53)

My preference: let the server itself use dnsmasq too, but do it deliberately. If you want systemd-resolved, keep dnsmasq off localhost.
Pick one, document it, and move on.

How to coexist with systemd-resolved (without drama)

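Option 1 (common): dnsmasq serves LAN on 192.168.50.1; systemd-resolved stays for local stub resolution. No conflict as long as dnsmasq binds specific addresses rather than the wildcard.
That’s what bind-interfaces with listen-address=192.168.50.1 accomplishes. Note that interface=eth0 also makes dnsmasq add the loopback interface implicitly; add except-interface=lo if you want it off 127.0.0.1 entirely.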

Option 2 (if you want dnsmasq as the local resolver): disable systemd-resolved and manage /etc/resolv.conf yourself.
This is fine on a dedicated server. It’s a bad idea on a laptop with VPN software that expects systemd-resolved.

NetworkManager gotcha

NetworkManager can run its own dnsmasq instance for caching on the client. That’s not the same as your server dnsmasq,
but it can confuse debugging because “dnsmasq is running” might refer to the wrong one.

If your server uses NetworkManager, be explicit about DNS handling (either let NM manage it or don’t).
Do not let it half-manage it.

Joke #2: The easiest way to test DNS is to ask three people how it works—then watch them all confidently disagree.

Practical tasks: commands, expected output, and the decision you make

These are real operational tasks. Each one includes: a command, what the output means, and what you decide next.
Run them on the dnsmasq host unless stated otherwise.

Task 1: Confirm dnsmasq is running and not flapping

cr0x@server:~$ systemctl status dnsmasq --no-pager
● dnsmasq.service - dnsmasq - A lightweight DHCP and caching DNS server
     Loaded: loaded (/lib/systemd/system/dnsmasq.service; enabled; vendor preset: enabled)
     Active: active (running) since Wed 2025-12-31 09:12:01 UTC; 2h 10min ago
   Main PID: 1324 (dnsmasq)
      Tasks: 1 (limit: 18956)
     Memory: 6.4M
     CGroup: /system.slice/dnsmasq.service
             └─1324 /usr/sbin/dnsmasq -k

Meaning: “active (running)” with stable uptime means the daemon isn’t crashing/restarting.
Decision: If uptime is seconds/minutes, go straight to journalctl (Task 2) and config test (Task 3).

Task 2: Read the last errors from dnsmasq

cr0x@server:~$ sudo journalctl -u dnsmasq -n 80 --no-pager
Dec 31 09:11:59 server dnsmasq[1312]: failed to bind DHCP server socket: Address already in use
Dec 31 09:12:00 server systemd[1]: dnsmasq.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Dec 31 09:12:00 server systemd[1]: dnsmasq.service: Failed with result 'exit-code'.

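Meaning: Another process on this host already holds UDP/67 (often isc-dhcp-server or a second dnsmasq instance). A router running DHCP elsewhere on the segment does not cause this error; it shows up as competing offers instead (Task 14).
Decision: Identify what owns UDP/67 (Task 4) and stop/disable the competing DHCP service on this host. Do not “work around” it.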

Task 3: Validate dnsmasq config syntax before restart

cr0x@server:~$ sudo dnsmasq --test
dnsmasq: syntax check OK.

Meaning: Syntax is fine; runtime failures are likely port binding, permissions, or environment.
Decision: If you get “bad option” or a file parse error, fix it before restarting—don’t do config roulette in production.

Task 4: Check who is listening on DNS and DHCP ports

cr0x@server:~$ sudo ss -ulpn | grep -E ':(53|67)\s'
udp   UNCONN 0      0             192.168.50.1:53        0.0.0.0:*    users:(("dnsmasq",pid=1324,fd=6))
udp   UNCONN 0      0               0.0.0.0:67        0.0.0.0:*    users:(("dnsmasq",pid=1324,fd=4))

Meaning: dnsmasq owns UDP/53 on the LAN IP and UDP/67 for DHCP.
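Decision: If another process holds :53 on the LAN IP or the wildcard address, or anything other than dnsmasq holds :67, resolve that conflict before anything else. systemd-resolved on 127.0.0.53:53 is expected when it remains the host’s stub.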
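
If you run this check often, a short filter can reduce the ss output to "port, address, process". The sketch below assumes output shaped like the sample above. port_owners is a hypothetical helper, and it finds the local address by looking for the first field that ends in a numeric port, because the column layout of ss varies with the flags used.

```shell
# port_owners: reads `ss -ulpn` output on stdin and prints "PORT ADDRESS PROCESS"
# for every socket bound to port 53 or 67.
port_owners() {
    awk '
        {
            local_addr = ""
            for (i = 1; i <= NF; i++)               # first field ending in :<digits>
                if ($i ~ /:[0-9]+$/) { local_addr = $i; break }
            if (local_addr == "") next
            n = split(local_addr, parts, ":")
            port = parts[n]
            if (port != "53" && port != "67") next
            addr = substr(local_addr, 1, length(local_addr) - length(port) - 1)
            proc = "unknown"                         # ss hides names without root
            if (match($0, /\(\("[^"]+"/))            # users:(("dnsmasq",pid=...
                proc = substr($0, RSTART + 3, RLENGTH - 4)
            print port, addr, proc
        }
    '
}
```

Usage: sudo ss -ulpn | port_owners. Any process other than dnsmasq on port 67, or on port 53 at the LAN IP, is the thing to chase.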

Task 5: Confirm /etc/resolv.conf is what you think it is

cr0x@server:~$ ls -l /etc/resolv.conf
lrwxrwxrwx 1 root root 39 Dec 31 08:58 /etc/resolv.conf -> ../run/systemd/resolve/stub-resolv.conf

Meaning: This host is using systemd-resolved’s stub resolver.
Decision: That’s fine if dnsmasq is only for LAN on 192.168.50.1. If you want the host itself to use dnsmasq, change the resolver strategy deliberately.

Task 6: Verify upstream DNS reachability from the server

cr0x@server:~$ dig +time=1 +tries=1 @1.1.1.1 example.com A
; <<>> DiG 9.18.24 <<>> +time=1 +tries=1 @1.1.1.1 example.com A
;; ANSWER SECTION:
example.com.        86400   IN      A       93.184.216.34
;; Query time: 18 msec
;; SERVER: 1.1.1.1#53(1.1.1.1)

Meaning: Upstream is reachable and responding quickly.
Decision: If this times out, your problem is not dnsmasq—it’s routing, firewall, or upstream outage. Fix that first.

Task 7: Verify dnsmasq answers locally (server-side)

cr0x@server:~$ dig +short @192.168.50.1 nas.lan
192.168.50.10

Meaning: Local name resolution works.
Decision: If local names fail, check local=/lan/, domain=lan, and that your addn-hosts file is readable.

Task 8: Verify caching is actually happening

cr0x@server:~$ sudo kill -USR1 $(pidof dnsmasq)
cr0x@server:~$ sudo journalctl -u dnsmasq -n 30 --no-pager
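Dec 31 11:18:10 server dnsmasq[1324]: cache size 10000, 0/8121 cache insertions re-used unexpired cache entries.

Meaning: This line counts evictions, not hits: 0 of 8121 insertions had to push out an entry that had not yet expired, so the cache is large enough. The hit indicator is the adjacent “queries forwarded N, queries answered locally M” line in the same dump.
Decision: If evictions are a large share of insertions, raise cache-size. If “answered locally” is near zero, clients may be bypassing dnsmasq, or TTLs are too low, or you’re not actually serving the clients you think you are.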
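
The same USR1 dump also contains a line of the form "queries forwarded N, queries answered locally M", which can be turned into a rough share of queries that never went upstream. cache_hit_ratio is a hypothetical helper, and the figure is only an approximation of a hit rate, since "answered locally" also covers names served from hosts files and local config.

```shell
# cache_hit_ratio: reads dnsmasq log lines on stdin and prints the share of
# queries answered without forwarding, using the last statistics line seen.
cache_hit_ratio() {
    awk '
        /queries forwarded [0-9]+, queries answered locally [0-9]+/ {
            for (i = 1; i < NF; i++) {
                if ($i == "forwarded") fwd = $(i + 1) + 0   # "+ 0" drops the comma
                if ($i == "locally")   loc = $(i + 1) + 0
            }
            found = 1
        }
        END {
            if (!found || fwd + loc == 0) { print "no data"; exit 1 }
            printf "%.1f%%\n", 100 * loc / (fwd + loc)
        }
    '
}
```

Usage: sudo journalctl -u dnsmasq -n 30 --no-pager | cache_hit_ratio, right after sending USR1.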

Task 9: Watch DHCP leases live

cr0x@server:~$ sudo tail -n 20 /var/lib/misc/dnsmasq.leases
1735643551 11:22:33:44:55:66 192.168.50.73 laptop *
1735644012 aa:bb:cc:dd:ee:ff 192.168.50.10 nas *

Meaning: Leases are being assigned. The fields are: expiry epoch, MAC, IP, hostname, client-id.
Decision: If a client isn’t getting an address, and it’s not here, you have a DHCP reachability issue (VLAN, firewall, competing DHCP).
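
Epoch timestamps are hard to read at a glance, so a small formatter helps. The sketch below follows the field order described above. lease_report is a hypothetical helper, and it assumes an expiry of 0 marks an infinite lease.

```shell
# lease_report FILE [NOW] -> one line per lease: IP, hostname, MAC, time left.
# NOW defaults to the current epoch time; pass it explicitly to get repeatable output.
lease_report() {
    now=${2:-$(date +%s)}
    awk -v now="$now" '
        NF >= 4 && $1 ~ /^[0-9]+$/ {          # skips non-lease lines such as "duid ..."
            left = $1 - now
            if ($1 == 0)       state = "infinite"
            else if (left > 0) state = sprintf("%dh%02dm left", left / 3600, (left % 3600) / 60)
            else               state = "expired"
            printf "%-15s %-12s %s  %s\n", $3, $4, $2, state
        }
    ' "$1"
}
```

Usage: sudo sh -c '. ./lease_report.sh; lease_report /var/lib/misc/dnsmasq.leases', where lease_report.sh is wherever you saved the function.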

Task 10: Confirm the client got the right DNS server from DHCP

cr0x@client:~$ ipconfig getpacket en0
...output...

Meaning: On macOS, this shows the DHCP offer/ack including DNS servers. On Linux, use Task 11.
Decision: If the DNS server isn’t 192.168.50.1, your DHCP options are wrong or another DHCP server is winning.

Task 11: On a Linux client, check what DNS is being used

cr0x@client:~$ resolvectl status
Global
       Protocols: -LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
resolv.conf mode: stub
Link 2 (eth0)
    Current Scopes: DNS
         DNS Servers: 192.168.50.1
          DNS Domain: lan

Meaning: systemd-resolved is active, but it is using dnsmasq as its upstream DNS server for that link. That’s normal and healthy.
Decision: If the link shows no DNS Servers entry, or a server other than 192.168.50.1, you’re missing DHCP-provided DNS or NetworkManager configuration is overriding it.

Task 12: Verify a client can resolve through dnsmasq and measure latency

cr0x@client:~$ dig @192.168.50.1 example.com A +stats
...output...
;; Query time: 3 msec

Meaning: Low query time suggests cache hit or fast upstream. If you run it twice, the second should typically be faster (cache hit).
Decision: If query time is hundreds of ms or timeouts occur, check upstream health, packet loss, MTU/VPN, and whether DNS is being intercepted.

Task 13: Check whether client DNS queries actually reach the server

cr0x@server:~$ sudo tcpdump -ni eth0 udp port 53 -c 10
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
10 packets captured

Meaning: You see DNS traffic on the LAN interface. If you see none while clients complain, they are not querying your dnsmasq at all.
Decision: Fix DHCP options or client resolver settings. Don’t tune dnsmasq for traffic it never sees.

Task 14: Find a rogue DHCP server on the LAN

cr0x@server:~$ sudo nmap --script broadcast-dhcp-discover -e eth0
Starting Nmap 7.94 ( https://nmap.org ) at 2025-12-31 11:41 UTC
Pre-scan script results:
| broadcast-dhcp-discover:
|   Response 1 of 2:
|     IP Offered: 192.168.50.120
|     DHCP Server Identifier: 192.168.50.1
|   Response 2 of 2:
|     IP Offered: 192.168.50.210
|     DHCP Server Identifier: 192.168.50.254

Meaning: Two DHCP servers responded. That’s your outage story before it becomes your outage incident.
Decision: Turn off DHCP on the router (192.168.50.254) or move dnsmasq to a different VLAN/subnet. Don’t run dueling DHCP.
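
To turn this into something a cron job or pre-change check can act on, the scan output can be reduced to the distinct server identifiers. dhcp_servers is a hypothetical helper that assumes nmap output shaped like the sample above.

```shell
# dhcp_servers: reads nmap broadcast-dhcp-discover output on stdin, prints each
# distinct server identifier once, and exits 1 when more than one server answered.
dhcp_servers() {
    awk '
        /Server Identifier:/ {
            id = $NF                                  # the address is the last field
            if (!(id in seen)) { seen[id] = 1; count++; print id }
        }
        END { exit (count > 1 ? 1 : 0) }
    '
}
```

Usage: sudo nmap --script broadcast-dhcp-discover -e eth0 | dhcp_servers || echo "more than one DHCP server answered".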

Task 15: Confirm firewall allows DHCP and DNS

cr0x@server:~$ sudo nft list ruleset | sed -n '1,120p'
table inet filter {
  chain input {
    type filter hook input priority 0;
    policy drop;
    iif "lo" accept
    ct state established,related accept
    iif "eth0" udp dport { 53, 67 } accept
    iif "eth0" tcp dport 53 accept
  }
}

Meaning: DNS and DHCP are allowed on the LAN interface.
Decision: If UDP/67 is blocked, DHCP won’t work. If UDP/53 is blocked, most DNS won’t work. Fix rules before touching dnsmasq.

Task 16: Test DHCP request path from a client (Linux)

cr0x@client:~$ sudo dhclient -v -r eth0
...output...
cr0x@client:~$ sudo dhclient -v eth0
DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 3
DHCPOFFER of 192.168.50.73 from 192.168.50.1
DHCPREQUEST for 192.168.50.73 on eth0 to 255.255.255.255 port 67
DHCPACK of 192.168.50.73 from 192.168.50.1
bound to 192.168.50.73 -- renewal in 19958 seconds.

Meaning: The client is receiving offers/acks from your dnsmasq server.
Decision: If you see offers from a different server, you’ve got competing DHCP. If you see no offers, it’s L2/VLAN/firewall.

Fast diagnosis playbook

When DNS/DHCP is broken, you don’t want a philosophical debate. You want a tight loop: verify authority, verify reachability, verify the client path.
Here’s the order that finds the bottleneck fastest.

First: establish who is in charge

  1. Run Task 14 (broadcast DHCP discover). If there are two DHCP servers, stop. Fix that first.
  2. On the dnsmasq host, run Task 4 (who owns ports 53 and 67). If dnsmasq isn’t bound as expected, stop. Fix binding conflicts.
  3. On a client, run Task 11 (resolvectl status or equivalent) to see what DNS server it is actually using.

Second: verify the server is healthy

  1. Task 1 + Task 2: service status and logs. If it’s flapping, don’t touch clients yet.
  2. Task 3: syntax check. This catches the “I changed one line at midnight” failures.
  3. Task 6: upstream DNS reachability. If upstream is down, caching may mask it briefly; don’t be fooled.

Third: verify the packet path

  1. Task 13: tcpdump on LAN interface for DNS traffic. No packets means clients aren’t asking you.
  2. Task 16: DHCP handshake from a client. If no DHCPOFFER, it’s L2/VLAN/firewall.
  3. Task 15: firewall ruleset sanity check.

Fourth: isolate “slow” from “broken”

  1. Task 12: dig +stats twice. First call tests upstream; second call tests cache.
  2. Task 8: cache stats via USR1. Confirms the cache is being used.

This playbook is deliberately boring. That’s the point: it avoids wasting time in the wrong layer.

Common mistakes: symptom → root cause → fix

1) “DHCP works, but DNS randomly times out”

Symptom: Clients get an IP, but name resolution is intermittent or slow.

Root cause: Clients aren’t using dnsmasq for DNS (wrong DHCP option) or are using a broken upstream resolver via VPN policy.

Fix: Verify DHCP handed out option:dns-server,192.168.50.1 (Task 10/11). Then verify traffic hits dnsmasq (Task 13). If VPN split DNS is involved, decide whether VPN should override LAN DNS and configure split forwarding explicitly.

2) “dnsmasq won’t start: address already in use”

Symptom: systemd shows dnsmasq failed; logs mention bind errors.

Root cause: systemd-resolved (or another DNS server) has port 53, or another DHCP server has port 67.

Fix: Use Task 4 to identify the process. If you only need dnsmasq on the LAN IP, set listen-address=192.168.50.1 together with bind-interfaces so dnsmasq binds only that address. If you need localhost too, disable the competing service intentionally.

3) “Clients get the wrong default gateway”

Symptom: Some clients can resolve names but can’t reach the internet or other VLANs.

Root cause: Wrong dhcp-option=option:router, or competing DHCP server handing out a different router option.

Fix: Run Task 14 to locate rogue DHCP. Then correct option:router. Confirm on client using its DHCP packet inspection (Task 10/16).

4) “Local names don’t resolve, but external names do”

Symptom: example.com resolves fine; nas.lan does not.

Root cause: Missing local=/lan/ or wrong domain=lan; the local hosts file isn’t being read; or the client search domain isn’t set.

Fix: Validate dnsmasq local answers (Task 7). Ensure DHCP hands out option:domain-name,lan. If you don’t want search domains, teach users to use FQDNs like nas.lan and move on.

5) “After a change, nothing updates for hours”

Symptom: You changed a record or upstream, but clients keep getting old answers.

Root cause: TTLs and caching (dnsmasq cache, client caches, browser caches), plus long DHCP lease times for DNS server changes.

Fix: Lower TTLs for records you plan to move. For emergencies, clear the dnsmasq cache (a restart does it; so does SIGHUP, which also re-reads the hosts files) and renew client DHCP leases. Don’t permanently run with 5-minute DHCP leases to “fix” change management.

6) “DNS looks fine from the server, broken from clients”

Symptom: dig @192.168.50.1 works on the server, but clients time out.

Root cause: Firewall blocks UDP/53, wrong interface binding, VLAN isolation, or client isn’t pointing at the server.

Fix: Task 15 (firewall), Task 4 (binding), Task 13 (tcpdump), then verify DHCP options on the client.

7) “DNS breaks only when the VPN is connected”

Symptom: Internal corporate names resolve, but LAN names stop (or vice versa).

Root cause: Split DNS policy overrides, or VPN client changes resolver order/search domains.

Fix: Decide policy: should VPN override everything, or only certain domains? Implement split forwarding in dnsmasq using domain-specific server=/corp.example/10.0.0.53, or let systemd-resolved handle it and keep dnsmasq purely for LAN clients.

Three corporate mini-stories from the trenches

Incident caused by a wrong assumption: “The router is only a router”

A mid-sized office moved into a new space. The network contractor left behind a shiny router/firewall appliance.
The internal team brought up a small Linux VM to run dnsmasq because they wanted consistent hostnames for dev kits and printers.
They assumed the appliance was “just routing” and that the old DHCP service had been disabled during the cutover.

Everything worked in the morning. By afternoon, conference room displays started dropping off the network.
Some laptops got the expected DNS server. Others got the appliance’s default DNS settings.
People blamed Wi‑Fi. People always blame Wi‑Fi.

The clue was boring: two different default gateways showed up in client configs, depending on who last renewed a lease.
The appliance was still running DHCP on the same VLAN, handing out its own gateway and DNS options.

The fix wasn’t clever. They ran a broadcast DHCP discover, confirmed two responders, and disabled DHCP on the appliance.
Then they forced a lease renew on problem devices. The “randomness” vanished instantly.

The lesson: never assume the edge appliance isn’t doing DHCP. Verify it. Appliances love features the way toddlers love markers.

Optimization that backfired: “Let’s reduce upstream queries to near zero”

In a different org, a team tuned dnsmasq’s cache aggressively and added domain search lists to “improve usability.”
Their idea was simple: big cache, long TTL behaviors, and a search domain so people could type short names.
They also enabled query logging permanently because “it’s useful.” It was.

Two weeks later, incident reports started showing slow logins to SaaS apps, intermittent failures resolving certain domains,
and periodic high IO wait on the VM. The CPU was fine. The NIC was fine. The disk was sad.

Permanent query logging produced a high churn rate in journald and the underlying storage.
The search domain list multiplied queries: typos and short names expanded into multiple failing lookups per attempt.
Negative caching did the rest—temporary upstream NXDOMAIN responses stuck around longer than the team expected.

They “optimized” their way into a bottleneck: the system was spending time writing logs about lookups that shouldn’t have existed.
The DNS server was accurate. It was just busy narrating its own life.

The fix: disable permanent log-queries, keep it as a switch for incidents, and trim search domains.
They kept a reasonably sized cache, but stopped trying to make DNS invisible through cleverness.

Boring but correct practice that saved the day: “Bind explicitly, document ownership”

A finance company ran a small fleet of branch-office servers. Each branch had a local dnsmasq providing DHCP and DNS cache
because WAN links were unreliable and latency-sensitive apps were everywhere.
The config standard was dull: dnsmasq bound only to the LAN interface, upstream resolvers were explicit,
and DHCP was authoritative on exactly one VLAN per branch.

One day, an emergency VPN rollout pushed a new tunnel interface to every server. Suddenly, lots of services started binding
to extra interfaces. This is how “internal-only” becomes “accessible from places you didn’t imagine.”

dnsmasq didn’t care. It was already pinned to the LAN IP with listen-address and bind-interfaces.
It didn’t start answering DNS on the VPN interface, and it didn’t leak local names into the wrong place.

Other services misbehaved and had to be patched quickly. DNS and DHCP kept working, quietly, in the background.
The branch offices noticed the VPN changes; they did not notice their network core services staying stable.
That is a compliment you don’t get in a ticketing system.

The lesson: the boring practice—explicit binds, explicit upstreams, explicit ownership—pays out exactly when your environment changes under you.

Checklists / step-by-step plan

Plan A: Deploy dnsmasq as the LAN DNS cache + DHCP server (recommended)

  1. Pick the authority boundary.
    Decide which subnet(s) dnsmasq will serve DHCP for. Disable DHCP elsewhere on that segment.
  2. Pick the listening scope.
    Choose the LAN IP and interface. Use listen-address and bind-interfaces.
  3. Write a minimal config.
    Start with the baseline config in this article. Avoid feature sprawl.
  4. Test syntax.
    Run dnsmasq --test before restarting.
  5. Start service and verify ports.
    Use ss -ulpn to confirm dnsmasq owns UDP/53 and UDP/67 as intended.
  6. Verify DHCP offers.
    Use a client with dhclient -v or run a broadcast DHCP discover scan.
  7. Verify DNS answers.
    Test both local names (e.g., nas.lan) and external names.
  8. Enable logging only when you need it.
    Keep log-queries commented by default; turn it on during incidents, then turn it off.
  9. Document it like you’ll forget.
    One page: what subnets served, where the config lives, who owns upstream DNS, and how to detect rogue DHCP.

Plan B: Migrate from “router DHCP” to dnsmasq without a surprise outage

  1. Lower router DHCP lease time temporarily (e.g., a few hours) one day before migration.
  2. Bring up dnsmasq in “DNS only” mode first (do not enable DHCP yet). Validate DNS behavior.
  3. Schedule a short cutover window.
  4. Disable DHCP on router.
  5. Enable dnsmasq DHCP with dhcp-authoritative and correct options.
  6. On a few test clients, renew leases and verify DNS/gateway are correct.
  7. Watch leases file and logs for the first hour. Expect some weird clients to cling to stale leases; force renew as needed.
  8. Restore sane lease times (e.g., 12h or 24h) after stabilization.
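
The timing in steps 1 and 8 follows from when clients try to renew. By default (RFC 2131) a client starts renewing at T1, half the lease, and falls back to rebinding at T2, seven eighths of the lease. The sketch below does that arithmetic for dnsmasq-style lease strings. lease_seconds and lease_timers are hypothetical helpers, and they handle only plain seconds and the m/h/d/w suffixes.

```shell
# lease_seconds 12h -> 43200   (plain numbers are taken as seconds)
lease_seconds() {
    n=${1%[mhdw]}
    case $n in
        ''|*[!0-9]*) echo "unsupported lease time: $1" >&2; return 1 ;;
    esac
    case $1 in
        *m) echo $(( n * 60 )) ;;
        *h) echo $(( n * 3600 )) ;;
        *d) echo $(( n * 86400 )) ;;
        *w) echo $(( n * 604800 )) ;;
        *)  echo "$n" ;;
    esac
}

# lease_timers 12h -> lease length plus the default renew (T1) and rebind (T2) points.
lease_timers() {
    s=$(lease_seconds "$1") || return 1
    echo "lease=${s}s T1=$(( s / 2 ))s T2=$(( s * 7 / 8 ))s"
}
```

lease_timers 24h shows T1 at 43200 seconds, so a client holding a 24-hour lease may not hear about a shortened lease for up to 12 hours. That is why the lease time is lowered a day ahead rather than an hour ahead.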

Plan C: Make it resilient (without turning it into a project)

  • Put dnsmasq on a stable host (small VM, NUC, or low-power server) with a UPS if you care about it.
  • Keep config in a repo. Even a private git repo on the box is better than “I think I changed it last month.”
  • Back up /etc/dnsmasq.d (static mappings live there) and the leases file if you want current assignments to survive a rebuild.
  • Monitor: service state, query latency (synthetic dig), and packet loss on the LAN.
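
For the synthetic dig check, the only number you need is the "Query time" line. The sketch below extracts it and compares it with a threshold. query_time_ms and latency_ok are hypothetical helpers, and the 50 ms threshold in the usage line is an arbitrary example.

```shell
# query_time_ms: reads dig output on stdin and prints the query time in milliseconds.
# Exits 1 if no timing line is present (for example, when the query timed out).
query_time_ms() {
    awk '/^;; Query time:/ { print $4; found = 1 } END { exit (found ? 0 : 1) }'
}

# latency_ok MAX_MS: reads dig output on stdin.
# Exit 0 if the query time is within MAX_MS, 1 if slower, 2 if no timing was found.
latency_ok() {
    ms=$(query_time_ms) || return 2
    [ "$ms" -le "$1" ]
}
```

Usage: dig @192.168.50.1 example.com A +stats | latency_ok 50 || echo "DNS probe slow or failed". Feed the exit status into whatever monitoring you already have.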

FAQ

1) Should I disable systemd-resolved on clients?

Usually no. In managed fleets and laptops with VPN clients, systemd-resolved is often the integration point.
Let it run, and just ensure it uses your dnsmasq as the upstream DNS server for the LAN link.

2) Should I disable systemd-resolved on the dnsmasq server?

If dnsmasq is dedicated and you want it to also be the server’s own resolver, disabling systemd-resolved can simplify things.
But you don’t have to—just make sure dnsmasq doesn’t bind to localhost if systemd-resolved owns it.

3) Why use no-resolv and explicit server= entries?

Because /etc/resolv.conf is frequently managed by other components and can change during boot, VPN connect, or link changes.
Explicit upstreams make behavior predictable and debugging faster.

4) What cache size should I use?

For small networks, 5,000–20,000 is usually fine. If you’re RAM-constrained, smaller is fine too—just don’t expect miracles.
Measure using USR1 stats and query latency before and after.

5) Can dnsmasq do split DNS?

Yes. You can forward specific domains to specific upstream resolvers using server=/corp.example/10.0.0.53.
Keep it explicit and document it, because future you won’t remember why only one domain behaves differently.
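
If several internal domains go to the same resolver, generating the lines keeps them consistent. This is a minimal sketch. split_dns_lines is a hypothetical helper, and corp.example and 10.0.0.53 are the placeholder values used above.

```shell
# split_dns_lines RESOLVER DOMAIN... -> prints one dnsmasq server=/domain/resolver
# line per domain; everything else still goes to the default server= entries.
split_dns_lines() {
    resolver=$1; shift
    for domain in "$@"; do
        printf 'server=/%s/%s\n' "$domain" "$resolver"
    done
}
```

Usage: split_dns_lines 10.0.0.53 corp.example | sudo tee /etc/dnsmasq.d/split-dns.conf, then run dnsmasq --test before restarting.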

6) How do I know if clients are bypassing dnsmasq?

Run tcpdump on the server interface for UDP/53 (Task 13). If clients complain and you see no traffic, they aren’t querying you.
Then check the client’s DNS server settings (Task 11) and DHCP options.

7) Is it safe to run dnsmasq on a host that also runs Docker or Kubernetes?

It can be, but you must bind dnsmasq to the intended interface/IP. Container stacks create extra interfaces,
and “listen everywhere” becomes “answer DNS where you shouldn’t.” Use listen-address and bind-interfaces.

8) What about IPv6 (SLAAC, RA, DHCPv6)?

dnsmasq can help with IPv6, but IPv6 introduces more moving parts (router advertisements, DHCPv6, DNS via RDNSS).
If you’re just starting, get IPv4 stable first. Then add IPv6 deliberately and test with packet captures.

9) Should I enable DNSSEC validation in dnsmasq?

Only if you understand the failure modes and have time to debug them. DNSSEC can improve integrity,
but misconfigurations and broken upstreams can create “everything is down” incidents that look like network issues.

10) Can I run two dnsmasq servers for redundancy?

For DNS caching/forwarding, yes—clients can have multiple DNS servers. For DHCP, redundancy is trickier.
You can split ranges or use DHCP failover mechanisms with other software, but dnsmasq alone isn’t a full enterprise DHCP HA solution.
For small networks, a second cold spare and good backups are often the pragmatic answer.

Next steps you can do today

If you want dnsmasq to behave, stop letting it negotiate authority with whatever else happens to be installed.
Choose the architecture, bind to the right interface, make upstream DNS explicit, and ensure DHCP has a single owner.
That’s 80% of the reliability story.

  1. Run the rogue DHCP check (Task 14) on your LAN and eliminate any double-servers.
  2. Implement the baseline config and validate ports (Task 4) and syntax (Task 3).
  3. Verify client path: DHCP handshake (Task 16) and active DNS server (Task 11).
  4. Keep logging as an on-demand tool, not a lifestyle.
  5. Write down ownership: which box is DHCP, which box is DNS, and what “normal” looks like.

You don’t need a fancy stack to have reliable naming and addressing. You need one boring box that does its job, and a system that doesn’t fight it.
