REFUSED is the DNS equivalent of being waved away at the door by someone with a clipboard. The packet arrived. The server understood what you asked.
It just decided you’re not invited.
That makes REFUSED both easier and more annoying than a timeout: your network is probably fine, your client is probably fine, and your server is very intentionally
declining to help. The trick is figuring out which policy is doing the refusing, and whether it’s correct.
What REFUSED really means (and what it doesn’t)
In DNS terms, REFUSED is an RCODE (response code) that says: “I’m not answering this query because of local policy.”
Not “I don’t know,” not “the name doesn’t exist,” not “I’m broken.” Policy.
The two most common misunderstandings:
- REFUSED is not NXDOMAIN. NXDOMAIN means the server did the work and concluded the name doesn’t exist.
REFUSED means the server won’t even try (or won’t disclose the result) for you.
- REFUSED is not SERVFAIL. SERVFAIL is “I tried, something went wrong.” REFUSED is “nope, intentionally.”
A good mental model: REFUSED is the DNS server’s bouncer enforcing the guest list. Your query can be perfectly valid.
You might even get an answer from the same server if you ask from a different IP, over a different interface, using the right TSIG key,
or with recursion enabled for your network.
Where you see it
You’ll see REFUSED in tools like dig as status: REFUSED, often with a small response and no answer section.
Some clients hide it behind vague error messages like “server refused” or “DNS name does not exist” (which is wrong, thanks).
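When you script around dig, it helps to pull the RCODE and flags out of the header rather than eyeball them. A minimal sketch, assuming dig's usual header layout; dig_status and dig_flags are helper names invented for this example, not part of any tool.

```shell
# Print the RCODE name (REFUSED, NOERROR, ...) from dig output on stdin.
# dig reports it in the header line as "status: <RCODE>,".
dig_status() {
    sed -n 's/^;; ->>HEADER<<-.*status: \([A-Z0-9]*\),.*/\1/p' | head -n 1
}

# Print the header flags (for example "qr rd ra") from dig output on stdin.
dig_flags() {
    sed -n 's/^;; flags: \([a-z ]*\);.*/\1/p' | head -n 1
}
```

Typical use would be something like `dig @10.20.30.40 www.example.com A | dig_status`, which prints nothing at all when no DNS response came back.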
Why you should care
REFUSED is usually security working as designed: no open resolvers, no zone transfers to the world, no recursion for random networks,
no data leakage across split-horizon views. The incident happens when that policy blocks the wrong people, or when your architecture relies
on a behavior (recursion, forwarding, transfer, update) that was never actually allowed.
One paraphrased idea from Gene Kim (operations/reliability author): make work visible and reduce uncertainty, because hidden queues and surprises drive outages.
REFUSED is often “hidden policy” made visible.
Interesting facts and historical context
- RCODEs are old: REFUSED is RCODE 5 in RFC 1035 (1987), which already describes it as refusing an operation “for policy reasons.”
- Open resolvers became a global problem: widespread abuse of open recursive resolvers for reflection/amplification attacks pushed operators
to default-deny recursion for untrusted clients.
- BIND’s defaults shifted over time: older configurations and copy-pasted blog snippets can leave recursion unexpectedly open or unexpectedly closed,
depending on version and distro packaging.
- Split-horizon DNS predates “zero trust” branding: serving different answers to different clients is an old enterprise trick for hiding internal
names and steering traffic.
- EDNS(0) changed the game: extending DNS to support larger UDP responses exposed new policy points: buffer sizes, TCP fallback, and filtering
“weird” EDNS clients.
- DNSSEC increased resolver complexity: validation failures typically surface as SERVFAIL, but many environments add policy layers (firewalls, RPZ)
that turn some queries into REFUSED on purpose.
- RPZ popularized policy-driven answers: Response Policy Zones let resolvers rewrite or block responses (NXDOMAIN, CNAME to walled garden, or REFUSED-like behavior).
- Anycast made “same IP, different policy” real: with anycast DNS, two users can hit different instances with different ACLs, making REFUSED feel intermittent.
Fast diagnosis playbook
If you remember nothing else, remember this order. It’s optimized for real incidents: short loops, high signal, minimal guessing.
1) Confirm what is refusing: which server, which path, which protocol
- Do you get REFUSED from the local stub (systemd-resolved), a corporate resolver, or an authoritative server?
- Is it UDP only? TCP only? Only over VPN? Only in one subnet?
- Is it one name, one zone, or everything?
2) Compare behavior from two source IPs
Same query, different client networks. If one works and one gets REFUSED, it’s almost always an ACL, a view, or “recursion allowed for A but not B.”
3) Determine if recursion is involved
Query an external name (like something not in your zones) directly against the server you suspect. If it refuses that but answers internal names,
you’re looking at recursion controls or forwarding restrictions.
4) Look for policy engines (RPZ, DNS firewall, views, geo rules)
If the refusal is name-specific (certain domains only), assume policy. If it’s client-specific (certain subnets only), assume ACL/view/interface binding.
5) Check logs last—but check them correctly
Logs can confirm the decision path, but in many environments they’re rate-limited, sampled, or only on the refusing node in an anycast fleet.
Use logs to validate a hypothesis, not to generate one.
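The first three steps of the playbook boil down to three probes and a small decision table. Here is a rough sketch of that table; classify_refusal and its labels are made up for illustration, and the three arguments are RCODE names you collected yourself (an internal name and an external name from the failing client, plus the same external name from a second network).

```shell
# classify_refusal <internal_status> <external_status> <other_source_status>
#   internal_status     - a name in a zone the server hosts, asked from the failing client
#   external_status     - a name outside its zones, asked from the failing client
#   other_source_status - the same external name, asked from a second network
classify_refusal() (
    internal=$1 external=$2 other=$3
    if [ "$internal" != "REFUSED" ] && [ "$external" != "REFUSED" ]; then
        echo "no-refusal: re-check which server and protocol the client really uses"
    elif [ "$external" != "REFUSED" ]; then
        # Only the hosted name is refused.
        echo "name-specific: suspect a zone ACL, a view, or RPZ"
    elif [ "$other" != "REFUSED" ]; then
        # Refused here, answered from the other network.
        echo "client-specific: suspect an ACL, a view, or per-network recursion"
    elif [ "$internal" != "REFUSED" ]; then
        # External names refused from both networks, hosted zones still answer.
        echo "recursion: suspect recursion or forwarding controls, or an authoritative-only server"
    else
        echo "blanket: suspect a query ACL, or the wrong server entirely"
    fi
)
```

It is deliberately dumb: it only turns observations into a first hypothesis, which you then confirm in config and logs.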
Joke #1: DNS policy is like office coffee—everyone depends on it, nobody owns it, and when it’s bad, productivity collapses.
Where REFUSED is generated: resolver vs authoritative vs middleboxes
Recursive resolvers
Recursive resolvers refuse queries when they are configured to do so based on client IP, interface, or query type. Common cases:
- Recursion disabled or restricted: server will answer authoritatively for its own zones, but refuses recursion for outsiders.
- ACL denies client: “access-control” (Unbound) or “allow-query/allow-recursion” (BIND) blocks the source.
- Policy zone / DNS firewall: RPZ or equivalent refuses (or rewrites) based on domain.
- Rate limiting / abuse controls: some stacks refuse when you trip thresholds.
Authoritative servers
Authoritative servers refuse when the query is outside what they serve, when views restrict who can ask, or when a special operation is attempted:
- AXFR/IXFR refused: zone transfers are commonly refused without explicit allow-transfer and sometimes TSIG.
- Dynamic update refused: RFC2136 updates require allow-update and usually TSIG. Otherwise: refusal.
- Query ACLs: some authoritative deployments restrict who can query internal zones.
Forwarders and “DNS in the middle”
REFUSED can also come from a forwarder (like CoreDNS, dnsmasq, systemd-resolved, or a cloud DNS proxy) that enforces rules before forwarding,
or passes back a REFUSED it received upstream.
Then there are middleboxes: firewalls doing DNS inspection, “security gateways” that answer on behalf of DNS, or load balancers with health-check logic.
They can generate REFUSED-like behavior that looks like a DNS server decision but is really a network appliance having a day.
The policies that block you: ACLs, recursion controls, RPZ, views, TSIG, and friends
1) Client ACLs: “allow-query” and “allow-recursion” (BIND)
In BIND, the two knobs people confuse are allow-query and allow-recursion. They are not interchangeable.
- allow-query: who may ask questions at all (for the zones served, and sometimes globally).
- allow-recursion: who may use recursion (fetching answers from elsewhere).
A server can be willing to answer authoritatively for corp.example but refuse recursion for internet names. That’s normal and good.
Trouble happens when you deploy “one resolver to rule them all” and forget that half your clients aren’t in the allowed subnet list.
2) Unbound access-control: deceptively simple
Unbound tends to be clearer: you define access-control rules per subnet and action (allow/refuse/deny).
But “clearer” doesn’t mean “hard to mess up.”
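Unbound applies the most specific matching netblock, so the order of lines in the file doesn’t decide the outcome. What bites is a missing allow for a new subnet, or a narrow refuse tucked inside a broader allow; either can lock out a region.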
And in an anycast fleet, you may only lock out the nodes that picked up that change.
3) Views and split-horizon: correct idea, sharp edges
Views (BIND) or similar constructs allow different answers depending on the source address. They’re great for:
- internal vs external records
- private zones only reachable via VPN
- cloud migrations where you steer certain networks to new endpoints
They also create a classic REFUSED pattern: clients fall into the “wrong view” and get REFUSED because that view has no recursion,
no access to the zone, or stricter ACLs.
4) RPZ / DNS firewall: when the server refuses because the name is “bad”
Many enterprises run Response Policy Zones or similar DNS firewall features. Depending on policy, a blocked domain might yield:
NXDOMAIN, a walled-garden IP, or occasionally a refusal-like behavior (or a refusal from an upstream policy engine).
If REFUSED is domain-specific and consistent across clients, suspect RPZ or an upstream security resolver. If it’s domain-specific and inconsistent,
suspect multiple resolvers with different policy versions.
5) Zone transfers (AXFR/IXFR): refusal is the default
If you attempt an AXFR from a random workstation and get REFUSED: congratulations, someone did at least one thing right.
Transfers should be explicitly permitted to secondaries, often with TSIG.
6) Dynamic updates: TSIG or it didn’t happen
RFC2136 updates are powerful and dangerous. A refusal here is normally a missing key, wrong key, wrong algorithm, or wrong “update-policy.”
7) Interface binding and source address surprises
DNS ACLs are evaluated against the source IP. In NAT, VPN, Kubernetes, and cloud load balancers, the “source IP you think you have”
is often not the source IP the DNS server sees.
REFUSED appears after a network change because the ACL didn’t change with it.
8) DNS over TCP / EDNS / fragmentation: policy by accident
Some environments restrict TCP/53, or block large UDP DNS responses. When a resolver tries UDP, gets truncated, retries with TCP, and TCP is blocked,
you usually see timeouts. But some middleboxes and proxies respond with REFUSED or an immediate failure.
Don’t assume REFUSED is purely a “DNS config issue.” It can be “network security thinks DNS should be 512 bytes forever.”
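A quick way to tell "DNS policy" from "transport policy" is to ask the same question over UDP and TCP and compare. A minimal sketch, assuming a stock dig; compare_transports and dig_status are names invented here, and the two-second timeout is an arbitrary choice.

```shell
# Print the RCODE name from dig output on stdin (empty if no response header).
dig_status() {
    sed -n 's/^;; ->>HEADER<<-.*status: \([A-Z0-9]*\),.*/\1/p' | head -n 1
}

# compare_transports <server> <name> [type]
# Sends the query once over UDP (+ignore stops dig retrying truncated answers
# over TCP) and once over TCP, then prints both results side by side.
compare_transports() (
    server=$1 name=$2 type=${3:-A}
    udp=$(dig "@$server" "$name" "$type" +notcp +ignore +tries=1 +time=2 </dev/null | dig_status)
    tcp=$(dig "@$server" "$name" "$type" +tcp +tries=1 +time=2 </dev/null | dig_status)
    echo "udp=${udp:-none} tcp=${tcp:-none}"
)
```

If the two sides disagree, for example a normal answer on UDP and nothing or REFUSED on TCP, the refusing party is probably something on the path rather than the resolver's ACLs.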
Hands-on tasks: commands, outputs, and decisions (12+)
These are the tasks I actually run during incidents. Each one includes the command, a realistic output snippet, what it means, and what decision to make.
Use them as a repeatable investigation script, not a one-off ritual.
Task 1: Reproduce with dig and capture the RCODE clearly
cr0x@server:~$ dig @10.20.30.40 www.example.com A +norecurse
; <<>> DiG 9.18.24 <<>> @10.20.30.40 www.example.com A +norecurse
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: REFUSED, id: 59821
;; flags: qr; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
;; QUESTION SECTION:
;www.example.com. IN A
;; Query time: 2 msec
;; SERVER: 10.20.30.40#53(10.20.30.40) (UDP)
;; WHEN: Wed Dec 31 11:02:18 UTC 2025
;; MSG SIZE rcvd: 56
What it means: The DNS server answered quickly with REFUSED. Network path is fine.
Decision: Don’t start with routing and firewalls. Start with ACLs/policy on that server or the layer in front of it.
Task 2: Check if recursion is the issue (same query with recursion desired)
cr0x@server:~$ dig @10.20.30.40 www.example.com A +recurse
;; ->>HEADER<<- opcode: QUERY, status: REFUSED, id: 3452
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
What it means: Refusal happens regardless of RD bit. This is likely “client not permitted” rather than “recursion not available.”
Decision: Validate client ACLs and views. If you expected authoritative answers, confirm you’re querying the right server.
Task 3: Verify the server answers anyone else (compare from a known-good subnet)
cr0x@server:~$ dig @10.20.30.40 www.example.com A +short
93.184.216.34
What it means: From this host, it works. So the DNS server can answer; the refusal is client-specific or path-specific.
Decision: Compare source IPs and NAT. Identify what IP the DNS server sees for the failing client.
Task 4: Confirm what resolver your client is actually using
cr0x@server:~$ resolvectl status
Global
Protocols: -LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
resolv.conf mode: stub
Current DNS Server: 127.0.0.53
DNS Servers: 127.0.0.53
What it means: systemd-resolved is in play; your “DNS server” might be a stub forwarding elsewhere.
Decision: Query the upstream resolver directly, not just 127.0.0.53, or inspect resolved’s upstream with resolvectl dns.
Task 5: Identify upstream DNS servers configured on the host (and surprises)
cr0x@server:~$ resolvectl dns
Global:
Link 2 (ens192): 10.20.30.40 10.20.30.41
Link 3 (tun0): 172.16.0.53
What it means: VPN interface provides its own resolver. REFUSED might happen only when the VPN is up and route priorities change.
Decision: Re-test by targeting each resolver IP directly. Decide if split-DNS rules are misapplied.
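Re-testing each upstream by hand gets old quickly, so here is a small loop for it. It is a sketch that assumes `resolvectl dns` prints one "scope: addresses" line per link, as in the output above; upstream_resolvers, probe_each and dig_status are hypothetical helper names.

```shell
# Print the RCODE name from dig output on stdin (empty if no response header).
dig_status() {
    sed -n 's/^;; ->>HEADER<<-.*status: \([A-Z0-9]*\),.*/\1/p' | head -n 1
}

# Read `resolvectl dns` output on stdin and print each resolver IP once.
upstream_resolvers() {
    sed -n 's/^[^:]*: *//p' | tr ' ' '\n' | grep -v '^$' | sort -u
}

# probe_each <name>: ask every configured upstream directly and print
# "<server> <status>" per line.
probe_each() {
    resolvectl dns | upstream_resolvers | while read -r server; do
        status=$(dig "@$server" "$1" A +tries=1 +time=2 </dev/null | dig_status)
        echo "$server ${status:-no-response}"
    done
}
```

Run it once with the VPN down and once with it up (`probe_each www.example.com`); a resolver that only refuses in one of the two states points straight at split-DNS or source-address policy.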
Task 6: Determine if you’re hitting an authoritative-only server
cr0x@server:~$ dig @10.20.30.40 corp.example SOA +norecurse
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 11906
;; flags: qr aa; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; ANSWER SECTION:
corp.example. 3600 IN SOA ns1.corp.example. hostmaster.corp.example. 2025123101 7200 3600 1209600 3600
What it means: Server is authoritative for corp.example (AA flag). It might refuse recursion for everything else by design.
Decision: If clients are using this as a general resolver, fix the architecture: point clients to recursive resolvers, not auth-only.
Task 7: Explicitly test recursion availability with a root hint style query
cr0x@server:~$ dig @10.20.30.40 . NS +recurse
;; ->>HEADER<<- opcode: QUERY, status: REFUSED, id: 41223
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
What it means: This server refuses recursion (or refuses you). Querying the root should be answerable by a recursive resolver.
Decision: Check recursion settings and client ACLs. Confirm whether this server is intended to recurse for your source IP.
Task 8: Check BIND view matching and ACLs (server-side)
cr0x@server:~$ sudo named-checkconf -p | sed -n '1,120p'
options {
directory "/var/cache/bind";
recursion yes;
allow-recursion { 10.20.0.0/16; 127.0.0.1; };
allow-query-cache { 10.20.0.0/16; 127.0.0.1; };
};
view "internal" {
match-clients { 10.20.0.0/16; };
recursion yes;
};
view "external" {
match-clients { any; };
recursion no;
};
What it means: Clients outside 10.20.0.0/16 land in “external” view with recursion disabled; they may see REFUSED for recursive lookups.
Decision: Either expand match-clients to include the intended networks (VPN, NAT ranges), or ensure those clients use the correct resolver.
Task 9: Check Unbound local zones (name-based policy)
cr0x@server:~$ sudo unbound-control list_local_zones
name type
. transparent
corp.example. transparent
blocked.example. static
What it means: Local zones and policy may exist. Not all refusals are “ACL”; some are “local policy.”
Decision: Inspect unbound.conf for access-control: and policy zones; confirm expected behavior for the blocked domains.
Task 10: Inspect Unbound access-control lines quickly
cr0x@server:~$ sudo grep -nE '^(access-control|interface|do-tcp|do-udp)' /etc/unbound/unbound.conf
12:interface: 10.20.30.40
23:do-udp: yes
24:do-tcp: yes
41:access-control: 10.20.0.0/16 allow
42:access-control: 0.0.0.0/0 refuse
What it means: Default is refuse for everyone except 10.20.0.0/16. If your clients are now in 10.21.0.0/16, they’re blocked.
Decision: Add the new subnet(s) as allow, or fix NAT so clients appear from an allowed source IP.
Task 11: Check if REFUSED is actually “AXFR denied”
cr0x@server:~$ dig @10.20.30.53 corp.example AXFR
; Transfer failed.
What it means: Transfers are blocked (as they should be by default). Some servers return REFUSED; dig just reports failure.
Decision: If you are configuring a secondary: implement allow-transfer on the primary and TSIG between them. Don’t “temporarily open it to any.”
Task 12: Validate BIND transfer permissions (server-side)
cr0x@server:~$ sudo named-checkconf -p | grep -n 'allow-transfer'
248: allow-transfer { key "xfr-tsig"; 10.20.30.54; };
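What it means: Transfers are allowed for requests signed with the xfr-tsig key, or for requests coming from 10.20.30.54 (entries in a BIND address match list are alternatives, not a combined requirement). If your secondary IP changed and it isn’t signing with the key, it will be refused.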
Decision: Update the allowed address list and roll the TSIG key correctly. Treat this as a controlled change, not an emergency hack.
Task 13: Check dynamic update refusal due to missing TSIG
cr0x@server:~$ nsupdate -v <<'EOF'
server 10.20.30.53
zone corp.example
update add test01.corp.example 60 A 10.20.40.10
send
EOF
update failed: REFUSED
What it means: Server refuses updates without appropriate auth. Good.
Decision: Use TSIG and update-policy. If this was expected to work, the client isn’t using the right key or you’re hitting the wrong view.
Task 14: Check server logs for explicit “denied” reason (BIND example)
cr0x@server:~$ sudo journalctl -u bind9 -n 20 --no-pager
Dec 31 11:02:18 ns1 named[1442]: client @0x7f9a64012a00 10.21.5.18#43812 (www.example.com): query (cache) 'www.example.com/A/IN' denied
Dec 31 11:02:18 ns1 named[1442]: client @0x7f9a64012a00 10.21.5.18#43812 (www.example.com): query failed (REFUSED) for www.example.com/IN/A at query.c:7750
What it means: The server explicitly denied a cache (recursive) query from 10.21.5.18.
Decision: Add 10.21.0.0/16 to allow-recursion / allow-query-cache (or route those clients to the correct resolver).
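Before widening an ACL, it is worth seeing which sources are actually being denied and how often. A minimal sketch that tallies denied clients from named log lines shaped like the ones above; denied_clients is a made-up helper name, and the handling of the "@0x..." token is an assumption about log format that you should check against your own BIND version.

```shell
# Read named log lines on stdin and print "<count> <client_ip>" for every
# client that had a query denied, most frequent first.
denied_clients() {
    awk '/ denied$/ {
        for (i = 1; i <= NF; i++) {
            if ($i == "client") {
                # Some versions log "client @0x... ip#port", others "client ip#port".
                addr = ($(i + 1) ~ /^@/) ? $(i + 2) : $(i + 1)
                sub(/#.*/, "", addr)
                count[addr]++
                break
            }
        }
    }
    END { for (a in count) print count[a], a }' | sort -rn
}
```

Something like `sudo journalctl -u bind9 --since "1 hour ago" --no-pager | denied_clients` gives you the list of ranges to argue about, instead of one anecdotal IP.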
Task 15: Prove NAT/source-IP mismatch (capture at the server)
cr0x@server:~$ sudo tcpdump -ni any port 53 and host 10.20.30.40 -c 2
tcpdump: data link type LINUX_SLL2
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
11:02:18.412345 ens192 In IP 192.0.2.45.53512 > 10.20.30.40.53: 59821+ A? www.example.com. (32)
11:02:18.412912 ens192 Out IP 10.20.30.40.53 > 192.0.2.45.53512: 59821 Refused 0/0/1 (56)
What it means: The “client” 10.21.5.18 is actually NATed to 192.0.2.45 from the DNS server’s perspective. That is also why the capture filters on the server’s address: a filter on 10.21.5.18 would match nothing here.
Decision: Fix ACLs to include the NAT egress range, or adjust routing so the original client IP is preserved (where feasible).
Task 16: Verify if a forwarder is returning REFUSED upstream
cr0x@server:~$ dig @10.20.30.10 www.example.com A +comments
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: REFUSED, id: 51001
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
What it means: The forwarder claims recursion available (RA), but still refuses. That often points to policy (RPZ/security) or upstream refusal.
Decision: Query the forwarder’s upstream (if known) or check forwarder configuration for domain blocks and ACLs.
Three corporate mini-stories from the trenches
Mini-story 1: The outage caused by a wrong assumption
A mid-sized company ran two BIND servers per datacenter. They were labeled “DNS1/DNS2” in inventory and everyone treated them as interchangeable.
One pair was authoritative for internal zones and also did recursion for clients. The other pair was authoritative-only for several delegated zones used
by a legacy platform.
During a network refresh, a team updated DHCP scopes and accidentally pointed a large office subnet at the authoritative-only pair.
The symptoms were weirdly selective: internal names in corp.example resolved fine, because the auth servers were authoritative.
Everything else—SaaS logins, package repositories, external APIs—started returning REFUSED.
The first responders chased firewalls and ISP links. They saw “REFUSED” and interpreted it as “remote DNS is blocking us.”
Meanwhile, the DNS servers were doing exactly what they were built to do: “I’m authoritative for these zones; I am not your recursive resolver.”
The fix was boring: correct DHCP options and add monitoring that checks recursion on the resolver VIP, not just “port 53 is open.”
The lesson was sharper: don’t give servers role-ambiguous names. “DNS1” is not a role; it’s a confession.
Mini-story 2: The optimization that backfired
Another organization wanted to reduce resolver load. Someone proposed a “smart” approach: route branch offices to a regional forwarder,
then to central resolvers. It looked neat in diagrams: fewer round trips, more caching at the edge.
They implemented a dnsmasq forwarder on branch routers with aggressive local caching and a minimal ACL: allow only the branch subnet.
Then a VPN rollout happened. Remote laptops entered the branch network over a tunnel, but their source IPs came from a different pool.
The router’s dnsmasq saw “unknown source” and returned REFUSED immediately.
The incident was painful because it was intermittent: some users were on Wi‑Fi (NATed into allowed space), others on wired (not NATed),
others remote (tunneled). Same laptop, different time of day, different outcome.
They fixed it by removing caching and policy from the router entirely: branch clients used central resolvers directly over the VPN,
and the router only provided DHCP. The optimization wasn’t evil; it just moved policy enforcement to the most fragile place in the system.
Mini-story 3: The boring, correct practice that saved the day
A global enterprise ran anycast recursive resolvers with strict ACLs and RPZ. They also had a habit that looked bureaucratic:
every resolver policy change required a “policy diff” output attached to the change request, plus a canary rollout to one site.
One day, an engineer updated the allowed client ranges to include a new cloud VPC. The change was correct in intent but wrong in a detail:
they added the VPC CIDR in one environment file but not in the generated ACL for production.
The canary site immediately showed REFUSED for clients in that VPC. The runbook had a specific check: run a query from an instance in the VPC,
confirm it hits the canary node, and confirm status: NOERROR for recursion.
The engineer caught the mismatch before the fleet rollout.
No drama. No executive bridge call. Just a change reverted and re-issued with correct generation. Boring won, again.
Common mistakes: symptom → root cause → fix
This section is intentionally specific. Generic advice causes generic outages.
1) Symptom: REFUSED for external names, internal names work
- Root cause: client is using an authoritative-only server; recursion disabled; or client falls into a “no recursion” view.
- Fix: point clients to recursive resolvers; or enable recursion for the right view and set allow-recursion correctly.
2) Symptom: REFUSED only from VPN users
- Root cause: ACLs don’t include VPN pool; split-DNS misconfigured; VPN DNS pushes a different resolver with stricter policy.
- Fix: include VPN CIDR in allow-recursion/access-control; ensure VPN clients use intended resolver; validate NAT behavior.
3) Symptom: REFUSED only for one domain (or category of domains)
- Root cause: RPZ/DNS firewall policy; security resolver upstream; domain placed in “local-zone” static block.
- Fix: confirm policy intent; add exception; roll policy consistently across resolvers; document who owns the block decision.
4) Symptom: REFUSED after moving resolvers behind a load balancer
- Root cause: source IP changes (SNAT), so ACLs no longer match; health-checks come from an untrusted range and get refused.
- Fix: preserve source IP where possible; otherwise add LB egress ranges to ACLs; set explicit “allow” for health-check sources.
5) Symptom: AXFR/IXFR fails with “Transfer failed” or REFUSED
- Root cause: missing allow-transfer; secondary IP changed; TSIG mismatch; view mismatch (secondary hits wrong view).
- Fix: add allow-transfer for correct IPs; use TSIG; verify both ends agree on key name/algorithm; ensure match-clients includes secondary.
6) Symptom: Dynamic updates return REFUSED
- Root cause: nsupdate missing TSIG; update-policy doesn’t permit the name; server is not the primary for that zone; wrong view.
- Fix: configure TSIG on client; adjust update-policy with least privilege; send updates to the primary; validate view selection.
7) Symptom: Some anycast sites refuse; others answer
- Root cause: inconsistent config/policy rollout; site-local ACLs; different upstream forwarders; incomplete RPZ replication.
- Fix: enforce config convergence; add per-site smoke tests; identify which node served the query via CHAOS TXT or instance tagging.
8) Symptom: REFUSED spikes only under load
- Root cause: rate limiting configured to refuse; abuse detection; upstream refusing due to your resolver behavior (too many clients behind one NAT).
- Fix: tune rate limiting; fix NAT fan-in; add more egress IPs; verify you’re not accidentally acting like a botnet.
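For the anycast case, the CHAOS TXT trick mentioned in item 7 looks roughly like this. A sketch only: id.server and hostname.bind are the conventional identity names, but many operators disable or override them, so an empty answer proves nothing. which_node is a hypothetical helper name.

```shell
# which_node <server>: ask a server to identify itself via CHAOS-class TXT.
# Tries id.server first, then hostname.bind, and prints the first answer.
which_node() {
    for qname in id.server hostname.bind; do
        answer=$(dig "@$1" "$qname" CH TXT +short +tries=1 +time=2 </dev/null |
                 grep -v '^;' | tr -d '"' | head -n 1)
        if [ -n "$answer" ]; then
            echo "$qname=$answer"
            return 0
        fi
    done
    echo "unidentified"
    return 1
}
```

Run it from a refused client and from a working one against the same anycast address; if the identities differ, you are comparing two nodes, not two clients.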
Joke #2: If you “temporarily” allow recursion for any, the internet will RSVP immediately.
Checklists / step-by-step plan
Step-by-step: isolate the refusing layer
- Query with dig against the configured resolver. Record server IP, status, flags, UDP/TCP.
- Bypass the stub. If systemd-resolved is used, query the real upstream resolver IPs directly.
- Query the same name from two networks. Office LAN vs VPN, or one pod vs one VM. Note differences.
- Query an authoritative zone you expect. Confirm AA flag appears where expected; confirm correct server role.
- Check recursion explicitly. Query the root or a clearly external domain with +recurse.
- Check server logs for “denied/refused.” Confirm the source IP the server sees.
- Validate NAT and routing. Packet-capture on the server if necessary to reveal the true client address.
- Identify policy layers. RPZ, views, DNS firewall upstream, forwarding chains.
- Fix with least privilege. Expand ACLs precisely; avoid opening recursion to broad ranges.
- Regression test. Test from a representative client set. Then monitor for open resolver exposure.
Checklist: safe changes that reduce REFUSED incidents without weakening security
- Define explicit resolver roles: recursive-only, authoritative-only, or mixed (and document it).
- Maintain an inventory of client CIDRs that are allowed recursion; update it with network change tickets.
- For VPN and cloud: decide whether source IP is preserved or NATed; write ACLs to match reality.
- For anycast: enforce config convergence and include a per-node “policy version” indicator.
- For RPZ: establish an exception process and a clear owner for policy decisions.
- Monitor for REFUSED rate increases, and split by client subnet and query type.
- Test AXFR/IXFR and updates from secondaries/automation after every DNS change.
Checklist: what not to do (unless you enjoy pager fatigue)
- Don’t enable recursion for any to “make it work.” That’s how you become an open resolver incident report.
- Don’t run policy enforcement on branch routers unless you can update and audit it like production.
- Don’t assume the “client IP” is what the server sees; verify with logs or tcpdump.
- Don’t mix authoritative and recursive roles casually; when you do, document the intent and enforce it with views.
FAQ
1) What’s the difference between REFUSED and NXDOMAIN?
NXDOMAIN means the server answered and the name does not exist in the DNS namespace it checked. REFUSED means the server declined to answer due to policy.
2) Can an authoritative server return REFUSED for normal queries?
Yes. Authoritative servers can restrict who may query certain zones (internal-only), and commonly refuse AXFR/IXFR and dynamic updates without permission.
3) Why do I get REFUSED only from certain subnets?
That’s almost always an ACL or view match issue. Either those subnets are not in allow-recursion/access-control, or they land in a view with recursion disabled.
NAT can make this appear random if the apparent source IP changes by path.
4) Why does dig show “ra” (recursion available) but status is REFUSED?
“RA” is a capability flag; it doesn’t mean recursion is permitted for your source. Many resolvers are capable of recursion but refuse it for unauthorized clients.
Some forwarders also pass RA while applying their own policy.
5) Is REFUSED always a misconfiguration?
No. REFUSED is frequently correct security posture: refusing recursion to the internet, refusing zone transfers broadly, refusing unsigned updates.
It becomes a problem when policy and intended users drift apart.
6) How do RPZ and DNS firewalls relate to REFUSED?
RPZ is a policy mechanism that can block or rewrite DNS answers based on the name or response. Depending on implementation and policy choice,
clients might see NXDOMAIN, a walled-garden record, or a refusal propagated from an upstream policy resolver.
7) Why do Kubernetes pods sometimes see REFUSED when nodes don’t?
Pods often query CoreDNS, which forwards to upstream resolvers. The upstream may see the node IP, a NAT IP, or a cloud NAT gateway IP.
If recursion ACLs don’t include that apparent source, you get REFUSED. Another cause: CoreDNS policy plugins or stub-domain configurations.
8) What’s the safest way to “fix” REFUSED for legitimate clients?
Identify the exact client source IP range as seen by the refusing server, then add the narrowest allow rule required (allow-recursion/access-control),
ideally in the correct view. Validate you did not create an open resolver. Test from multiple networks.
9) Can DNSSEC cause REFUSED?
DNSSEC validation failures typically lead to SERVFAIL, not REFUSED. But policy layers around DNSSEC (security resolvers, DNS firewalls) can decide to refuse
certain behaviors or clients. Treat DNSSEC as a complicating factor, not the default culprit for REFUSED.
10) Should I return REFUSED or drop packets for blocked clients?
Returning REFUSED is operationally nicer: clients fail fast and you can diagnose it. Dropping packets can look like a network outage.
If you’re blocking abusive traffic at scale, drops may be appropriate, but for internal policy enforcement, REFUSED usually reduces incident time.
Conclusion: next steps that actually reduce incidents
REFUSED is not a mystery code. It’s a policy decision, and DNS is full of policy: who can recurse, who can transfer, who can update,
which names are blocked, which clients get which view. When REFUSED bites you, it’s usually because your network reality changed faster than your DNS policy did.
Do these next:
- Map resolver roles (recursive vs authoritative) and stop pointing clients at the wrong one.
- Inventory allowed client source ranges as seen by the resolver (including NAT and VPN pools).
- Add a regression test that runs dig from each major client environment and alerts on REFUSED spikes.
- Treat policy like code: diffable configs, canary rollout, and a clear owner for RPZ/DNS firewall rules.
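The regression test can start as small as one function per probe. This is a minimal sketch, assuming a monitoring system that reads exit codes in the common 0/2/3 (OK/critical/unknown) convention; check_not_refused and dig_status are names invented for the example, and the thresholds and scheduling are left to whatever you already run.

```shell
# Print the RCODE name from dig output on stdin (empty if no response header).
dig_status() {
    sed -n 's/^;; ->>HEADER<<-.*status: \([A-Z0-9]*\),.*/\1/p' | head -n 1
}

# check_not_refused <server> <name>
# Returns 0 for any answer other than REFUSED, 2 for REFUSED,
# and 3 when no DNS response came back at all.
check_not_refused() {
    status=$(dig "@$1" "$2" A +tries=1 +time=2 </dev/null | dig_status)
    case "$status" in
        REFUSED) echo "CRITICAL: $1 refused $2"; return 2 ;;
        "")      echo "UNKNOWN: no response from $1 for $2"; return 3 ;;
        *)       echo "OK: $1 answered $2 with $status"; return 0 ;;
    esac
}
```

The value comes from where you run it: once per major client environment (office LAN, VPN pool, each cloud VPC, a pod), so the check sees the same source IP the resolver's ACLs see.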
The goal isn’t to eliminate REFUSED. The goal is to make refusals intentional, documented, and predictable—so the only thing getting blocked is the bad traffic,
not your own business.