DNS over HTTPS and DNS over TLS: privacy without breaking corporate networks

The day “DNS is down” hits your chat, it’s never just DNS. It’s VPN auth failing, SaaS apps timing out,
and half the company discovering they don’t actually know what “split-horizon” means. Now add encrypted DNS—
DoH and DoT—and you get a special flavor of outage: everything looks normal on the wire, and your logs go quiet.

You want user privacy. You also want corporate controls: internal zones, threat blocking, egress policy,
and incident response that doesn’t involve reading tea leaves. You can have both, but only if you stop treating
DoH/DoT as a browser toggle and start treating it as a production service with an architecture, policies, and tests.

What you’re really changing when you enable DoH/DoT

Classic DNS is mostly UDP/53 (plus TCP/53 for big answers, DNSSEC, and zone transfers). It’s fast, ubiquitous,
and painfully observable. Observability cuts both ways: your ISP can see where you’re going, and so can anyone
sitting in the path with a packet capture and a bad attitude.

DoT (DNS over TLS) wraps DNS in TLS, usually to port 853. DoH (DNS over HTTPS) wraps DNS in HTTP/2 or HTTP/3 over TLS,
usually to port 443. Both encrypt the question and answer. Both protect against passive observers on the network.
Neither magically makes DNS “secure” in the sense of “correct”—that’s still a combination of resolver integrity,
DNSSEC validation, and not running your entire estate on vibes.

In corporate networks, DNS is more than name resolution:

  • Identity glue: internal service discovery, AD, Kerberos, and SSO flows.
  • Policy point: blocklists, sinkholes, category filtering, and data-loss controls.
  • Telemetry: DNS logs are a cheap early-warning system.
  • Segmentation: split-horizon DNS is how you keep internal names internal.

Encrypt DNS without a plan and you don’t just “increase privacy.” You bypass your own controls, silently.
Then the first time someone can’t reach an internal host because their laptop is asking a public resolver,
you’ll hear a sentence that should be illegal in regulated industries: “But it works on my home Wi‑Fi.”

Facts and history that matter in ops

  • DNS shipped in 1983 (RFC 882/883 era). It was designed for a kinder network where “encryption everywhere” wasn’t the default.
  • DNSSEC started in the late 1990s and matured through the 2000s. It protects authenticity, not privacy. People confuse this constantly.
  • Root zone signing went live in 2010, making DNSSEC validation meaningful end-to-end for many domains, if your resolver actually validates.
  • DoT was standardized in 2016 (RFC 7858). It’s clean, direct, and obvious on the network: TLS to port 853 tends to stand out.
  • DoH was standardized in 2018 (RFC 8484). It hides inside normal HTTPS, which is both the point and the operational headache.
  • Firefox enabled DoH by default in some regions around 2019–2020 via “TRR” rollout. Enterprises discovered it the fun way: by getting tickets.
  • EDNS0 expanded DNS so responses could exceed 512 bytes over UDP, reducing TCP fallback—great for performance, messy for middleboxes that never learned.
  • QNAME minimization became a best practice to reduce data leakage to authoritative servers. It’s privacy help even without DoH/DoT, but depends on resolver behavior.
  • Encrypted ClientHello (ECH) is the next frontier: it can hide the SNI. When ECH becomes common, “block DoH by SNI” gets flimsy fast.

The industry didn’t wake up one morning and decide to encrypt DNS for fun. This was a reaction to pervasive monitoring,
ISP-level DNS tampering, and the fact that plain DNS is a tracking beacon with excellent uptime.

Threat model: who you’re protecting, and from what

“Privacy” is not a feature; it’s a decision about adversaries. For corporate networks, you usually have three:

1) Passive observers on untrusted networks

Coffee shop Wi‑Fi, hotel networks, airports. DoH/DoT helps a lot here. If you have remote workers and no always-on VPN,
encrypted DNS is one of the few controls that still works when the network is hostile.

2) On-path tampering

Captive portals, malware injecting NXDOMAINs, ISP “help” redirecting typos to ad pages. TLS removes most of this.
DNSSEC also helps, but its deployment is uneven, and validation usually happens at the resolver rather than the stub.

3) Your own enterprise controls (yes, you are an adversary here)

You log DNS for detection. You block known-bad domains. You route internal zones to internal servers.
Encrypted DNS can bypass those controls unless you provide a sanctioned encrypted resolver and force clients to use it.

The real enterprise goal is not “disable DoH.” It’s “make encrypted DNS go through our resolver where it belongs.”
Blocking is a blunt instrument. Sometimes necessary, rarely sufficient.

One idea has haunted operations for decades, here in paraphrase: “Hope is not a strategy.”
If you’re “hoping” clients won’t use public DoH endpoints, you’re already behind.

DoH vs DoT: practical differences (not marketing)

Transport and visibility

  • DoT: TLS on port 853. Easy to identify and to apply policy to at the firewall. Also easy to break accidentally if egress rules are strict.
  • DoH: HTTPS on port 443. Harder to distinguish from ordinary web traffic without TLS interception, endpoint controls, or known endpoint lists.

Operational control points

If you can manage endpoints (MDM/GPO), DoH is controllable: you can set system resolvers, configure “secure DNS” policies,
and pin to your resolver. If you can’t manage endpoints, DoH becomes “a browser feature that is now your network problem.”
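What “pin to your resolver” looks like varies per browser, but the managed-policy shape is similar. A minimal sketch, assuming a corporate DoH endpoint at https://dns.corp.example/dns-query (the URL is an assumption); the policy names are the standard Chrome and Firefox enterprise policies, and the file paths shown are Linux locations, so use GPO/MDM equivalents elsewhere:

# Chrome: /etc/opt/chrome/policies/managed/dns.json
{
  "DnsOverHttpsMode": "secure",
  "DnsOverHttpsTemplates": "https://dns.corp.example/dns-query"
}

# Firefox: policies.json in the distribution directory
{
  "policies": {
    "DNSOverHTTPS": {
      "Enabled": true,
      "ProviderURL": "https://dns.corp.example/dns-query",
      "Locked": true
    }
  }
}

“secure” means no fallback to plain DNS; start with “automatic” until you have proven the endpoint’s availability from every network your users actually sit on.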

Performance characteristics

DoH can ride existing HTTPS connections, reuse TCP/TLS, and behave nicely over HTTP/2. In some environments that’s faster,
especially on lossy networks. In others, it adds overhead and proxy complexity. DoT is simpler: one TLS session to a resolver,
lots of queries.

Middleboxes and proxies

Corporate proxies often understand HTTP; they don’t understand “DNS semantics inside HTTPS.” A proxy can happily forward DoH
while your security team loses DNS visibility, which is like leaving your front door locked but removing the walls.

Joke #1: DNS is the only system where you can be down, up, and “working as designed” at the same time, depending on who’s asking.

How corporate DNS breaks: split-horizon, proxies, and “helpful” middleboxes

Split-horizon DNS and why DoH/DoT can bypass it

Split-horizon means internal clients get internal answers for internal names, and external clients get external answers.
It’s used for:

  • Internal-only hostnames (e.g., git.corp)
  • Different answers based on location (inside vs outside)
  • Security: not leaking internal topology
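What that looks like on a resolver depends on your software. A minimal unbound sketch, assuming corp.example is the internal zone, 10.50.0.10 and 10.50.0.11 are internal authoritative servers, and the drop-in path is yours to choose (all three are assumptions):

# /etc/unbound/unbound.conf.d/split-horizon.conf (hypothetical path)
server:
    # internal zone is unsigned: don't demand DNSSEC proofs for it
    domain-insecure: "corp.example"
    # allow private (RFC1918) addresses in answers for this zone
    private-domain: "corp.example"

stub-zone:
    name: "corp.example"
    stub-addr: 10.50.0.10
    stub-addr: 10.50.0.11

The catch is that clients only get this internal view if their queries actually reach this resolver, which is exactly what an unmanaged public DoH setting breaks.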

If a laptop uses public DoH (say to a resolver on the Internet) while on VPN—or worse, while inside your office—it may:

  • Fail to resolve internal names (availability incident)
  • Resolve to public addresses (data exfil risk, wrong routing)
  • Leak internal query patterns to an external party (privacy and confidentiality issue)

DNS-based security controls and what encrypted DNS does to them

If you block malicious domains at the resolver, and clients bypass that resolver, you just traded “prevent” for “detect later”
(if you even can). And many organizations can’t. They run SIEM rules that assume DNS logs exist.

Encryption isn’t the enemy. Unmanaged encryption is.

Middleboxes that mishandle modern DNS features

Even with plain DNS, middleboxes can break things:
EDNS0, DNSSEC, fragmented UDP, TCP fallback. With DoT/DoH, the failure modes shift:
TLS inspection boxes that block unknown SNI, HTTP proxies that time out long-lived connections,
and egress NATs that churn ports fast enough to make resolvers look flaky.

Recommended enterprise architecture: private resolvers + policy + controlled encryption

The north star

Provide an enterprise resolver service that supports:

  • Internal zones (authoritative or forwarded)
  • Validated recursion (DNSSEC validation on)
  • Threat policy (RPZ or equivalent), with change control
  • Encrypted listener endpoints (DoT and/or DoH) for clients
  • Logging that’s privacy-aware and retention-scoped
  • High availability (anycast or at least redundant sites)

Where to terminate encryption

Terminate DoH/DoT on your resolvers, not at a random perimeter box that has no DNS context.
Keep the resolver close (network-wise) to clients where possible; latency matters.
Then your resolver can talk to upstream resolvers or authoritative servers using:

  • Plain DNS (acceptable on controlled networks, but less private)
  • DoT to upstream (good compromise; see the sketch after this list)
  • DNS over QUIC (emerging; treat as “test lab” unless you’re confident)
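A minimal unbound sketch of that shape: a DoT listener toward clients, DNSSEC validation, and DoT toward a chosen upstream. Certificate paths and the upstream (a public resolver here, purely as an example) are assumptions; substitute your own, or drop the forward-zone entirely and recurse yourself:

# /etc/unbound/unbound.conf.d/encrypted-dns.conf (hypothetical path)
server:
    interface: 0.0.0.0@53
    interface: 0.0.0.0@853                                   # DoT listener for clients
    tls-service-key: "/etc/unbound/tls/dns.corp.example.key"
    tls-service-pem: "/etc/unbound/tls/dns.corp.example.pem"
    auto-trust-anchor-file: "/var/lib/unbound/root.key"      # DNSSEC validation
    tls-cert-bundle: "/etc/ssl/certs/ca-certificates.crt"    # verify upstream DoT certificates

forward-zone:
    name: "."
    forward-tls-upstream: yes
    forward-addr: 1.1.1.1@853#cloudflare-dns.com

Forwarding everything upstream trades independence for simplicity; full recursion from your own resolvers is equally valid if your egress policy allows it.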

Client control strategy

  • Managed endpoints: configure OS-level DoH/DoT to your resolvers (a resolved.conf sketch follows this list); lock down browser “secure DNS” to system resolver or enterprise endpoint.
  • Unmanaged / BYOD: you may need network controls (egress filtering, captive portal policies, NAC) and a documented “if you want access, use our DNS.”
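For Linux endpoints running systemd-resolved, “strict DoT to corporate resolvers” can be expressed as a small drop-in. A minimal sketch; the resolver IPs, the certificate name after the #, and the drop-in path are assumptions:

# /etc/systemd/resolved.conf.d/corp-dot.conf (hypothetical drop-in)
[Resolve]
DNS=10.20.0.53#dns.corp.example 10.20.0.54#dns.corp.example
DNSOverTLS=yes
DNSSEC=allow-downgrade
Domains=~.

Restart systemd-resolved, then re-run Task 1 below and confirm the Protocols line flips to +DNSOverTLS. Roll this out to a canary group before you even think about fleet-wide.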

Don’t confuse “block public DoH” with “solve the problem”

Blocking known DoH endpoints by IP or SNI is whack-a-mole. It can reduce risk quickly, but it’s not sustainable
as the Internet moves to ECH and as providers host DoH behind CDNs. The long-term play is endpoint policy and
providing a better path: your own secure resolver with comparable performance.

Joke #2: The easiest way to find an undocumented dependency is to enable DoH and watch which internal apps immediately start screaming.

Three corporate mini-stories from the trenches

Mini-story 1: The incident caused by a wrong assumption

A mid-sized company rolled out a “privacy improvement” memo: “Enable Secure DNS in your browser.” It sounded harmless.
The security team liked the idea. The helpdesk liked the idea because it sounded like fewer Wi‑Fi problems.
No one asked which resolver would be used.

Within a week, internal web apps started failing for remote workers on VPN. The failure was oddly selective:
some users could reach intranet.corp, others got timeouts, and a few got redirected to a public landing page.
The network team checked split-tunnel routes and VPN DNS settings. Everything looked fine.

The wrong assumption was simple: “If you’re on VPN, you must be using corporate DNS.” Not true.
Browsers were sending DoH queries to public resolvers over the VPN tunnel (or sometimes outside it, depending on client routing),
bypassing internal DNS views. Internal names weren’t resolvable publicly, so they failed. Some names that existed externally
resolved to public services, which is a more interesting kind of bad.

The fix wasn’t “turn off DoH.” The fix was to provide a corporate DoH endpoint, configure managed browsers to use it,
and enforce on endpoints that “secure DNS” must use enterprise resolvers. Then internal zones were consistent again,
and the security team got their DNS telemetry back—at the resolver, not on the wire.

Mini-story 2: The optimization that backfired

A large enterprise decided to reduce DNS latency by deploying a local caching layer on branch routers.
The idea: cache aggressively, lower upstream traffic, faster page loads. Someone turned the TTL honoring knob into a suggestion.
Suddenly, common SaaS domains were cached “forever enough.”

Then a major SaaS provider shifted some endpoints during an incident on their side. The enterprise kept answering with stale
A/AAAA records. Users saw intermittent failures depending on which branch they were in. The resolver logs looked clean.
Packet captures looked clean. Everything was “correct” except reality.

The backfire got worse with DoH: browsers pinned to a public DoH resolver were getting fresh answers, while managed devices
using the branch cache got stale ones. Two realities, one company. Every debugging thread started with, “Works for me.”
That phrase should be considered a hazardous substance.

The fix was boring: respect TTLs, keep cache sizes sane, and stop trying to outsmart authoritative operators.
Add resolver-side monitoring for unusually high cache hit ratios on fast-changing domains. Performance matters, but correctness is a feature.

Mini-story 3: The boring but correct practice that saved the day

A financial org had a rule: every DNS change had to be staged through a test resolver pair, with a canary group of clients,
and with a rollback path pre-approved. Nobody loved it. It felt like paperwork for packets.

They introduced DoT for managed endpoints, terminating at their internal resolvers, and kept plain DNS as a fallback for legacy devices.
During rollout, an upstream connectivity issue caused sporadic TLS handshake failures from one data center to the DoT listeners.
The canary group saw it first—small blast radius, loud alarms.

Because they had instrumentation on resolver latency, TLS handshake error rates, and client distribution per site, they quickly
isolated the issue to a specific firewall policy update that impacted long-lived TLS sessions. They rolled back that specific policy,
not the entire DoT rollout. Users barely noticed.

Nothing heroic happened. That’s why it worked. When you make DNS changes boring, you can sleep.

Practical tasks: commands, outputs, and the decision you make

The fastest way to stop arguing about DoH/DoT is to collect evidence. Below are field-tested tasks you can run on Linux hosts
(or in troubleshooting containers) and what to do with the results. Adjust hostnames and IPs to your environment.

Task 1: Confirm which resolver the host is actually using

cr0x@server:~$ resolvectl status
Global
       Protocols: -LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
resolv.conf mode: stub
Current DNS Server: 10.20.0.53
       DNS Servers: 10.20.0.53 10.20.0.54
        DNS Domain: corp.example

Meaning: The system resolver points to 10.20.0.53/10.20.0.54. If users claim “DoH broke DNS,” but the OS points to corporate resolvers, the bypass is likely in an app (browser) or a VPN client.

Decision: If DNS servers are public or unexpected, fix DHCP/VPN DNS assignment first. If they’re correct, investigate application-level DoH.

Task 2: Observe live DNS traffic to see if UDP/53 is even being used

cr0x@server:~$ sudo tcpdump -ni any '(udp port 53 or tcp port 53 or tcp port 853)' -c 10
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
12:01:11.120345 eth0  Out IP 10.20.10.15.38422 > 10.20.0.53.53: 1234+ A? intranet.corp.example. (38)
12:01:11.121004 eth0  In  IP 10.20.0.53.53 > 10.20.10.15.38422: 1234* 1/0/0 A 10.50.1.20 (54)
10 packets captured

Meaning: Plain DNS is happening. If you see nothing on 53/853 while lookups work, DoH (443) is likely in play.

Decision: If you must enforce corporate DNS, move to endpoint/browser policy or implement network detection for DoH patterns.

Task 3: Test resolution against corporate resolver explicitly

cr0x@server:~$ dig @10.20.0.53 intranet.corp.example A +noall +answer
intranet.corp.example. 60 IN A 10.50.1.20

Meaning: Corporate resolver can resolve internal name. If users can’t, the issue is client path, not server data.

Decision: If this works but user apps fail, chase split-DNS settings, DoH bypass, or search domain differences.

Task 4: Compare with a public resolver to detect split-horizon leakage

cr0x@server:~$ dig @1.1.1.1 intranet.corp.example A +noall +comments +answer
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 22110

Meaning: Public resolver doesn’t know internal name. If endpoints use public DoH, internal names will fail.

Decision: Provide corporate DoH/DoT for managed clients; block or restrict public encrypted DNS where policy requires.

Task 5: Check whether systemd-resolved is configured for DoT

cr0x@server:~$ grep -R 'DNSOverTLS\|DNSSEC\|DNS=' /etc/systemd/resolved.conf /etc/systemd/resolved.conf.d 2>/dev/null
/etc/systemd/resolved.conf:DNS=10.20.0.53 10.20.0.54
/etc/systemd/resolved.conf:DNSOverTLS=opportunistic
/etc/systemd/resolved.conf:DNSSEC=allow-downgrade

Meaning: Opportunistic DoT means it will try DoT when supported, but fall back. That’s safer for compatibility, weaker for “must encrypt.”

Decision: For corporate-managed networks, prefer “strict” only when your resolvers are consistently reachable and monitored.

Task 6: Verify DoT connectivity to your resolver endpoint

cr0x@server:~$ openssl s_client -connect dns.corp.example:853 -servername dns.corp.example -brief
CONNECTION ESTABLISHED
Protocol version: TLSv1.3
Ciphersuite: TLS_AES_256_GCM_SHA384
Peer certificate: CN = dns.corp.example
Verification: OK

Meaning: TLS handshake succeeds, cert verifies. If it fails, DoT clients will fall back (opportunistic) or break (strict).

Decision: If verification fails, fix certificates/SANs/chain. If connection fails, fix firewall/NAT/MTU.

Task 7: Confirm your resolver is listening on 853 and/or 443

cr0x@server:~$ sudo ss -lntp | egrep '(:53|:853|:443)\s'
LISTEN 0      4096      0.0.0.0:53        0.0.0.0:*    users:(("unbound",pid=1442,fd=6))
LISTEN 0      4096      0.0.0.0:853       0.0.0.0:*    users:(("unbound",pid=1442,fd=8))
LISTEN 0      4096      0.0.0.0:443       0.0.0.0:*    users:(("caddy",pid=1310,fd=5))

Meaning: Unbound on 53/853, and a web server on 443 (often used as a DoH frontend). If 853 isn’t listening, DoT won’t work.

Decision: Ensure listeners match your rollout plan; don’t publish “DoT available” if it’s not.

Task 8: Measure resolver latency and detect packet loss symptoms

cr0x@server:~$ dig @10.20.0.53 www.example.com A +stats +tries=1 +timeout=1
;; Query time: 8 msec
;; SERVER: 10.20.0.53#53(10.20.0.53) (UDP)
;; WHEN: Tue Dec 31 12:05:22 UTC 2025
;; MSG SIZE  rcvd: 56

Meaning: 8ms is healthy for a nearby resolver. If you see timeouts or 800ms+ spikes, you have network path issues, overload, or upstream slowness.

Decision: If latency spikes correlate with DoT/DoH rollouts, check TLS handshake overhead, connection reuse, and firewall state table capacity.
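One sample proves nothing; a short series gives you a tail. A minimal sketch (resolver IP and test name are placeholders):

cr0x@server:~$ for i in $(seq 1 50); do dig @10.20.0.53 www.example.com A +tries=1 +timeout=2 +noall +stats | awk '/Query time/ {print $4}'; done | sort -n | tail -5

The last few numbers are your worst-case milliseconds. Because the name is cached after the first query, this mostly measures the client-to-resolver round trip, which is usually the thing you care about here; a tail an order of magnitude above the median points at the path or the firewall, not the resolver.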

Task 9: Inspect whether clients are using public DoH endpoints (Linux host perspective)

cr0x@server:~$ sudo ss -tnp | awk '$5 ~ /:443$/ {print $5}' | head
34.120.54.55:443
104.16.249.249:443
1.1.1.1:443

Meaning: Connections to common CDN ranges and known resolver IPs appear. This is not proof of DoH (443 is everything), but it’s a lead.

Decision: If you see repeated long-lived connections to known DoH providers during DNS failures, test browser secure DNS settings and enterprise policies.

Task 10: Check local stub resolver behavior and caching

cr0x@server:~$ resolvectl statistics
Transactions: 18234
Cache Hits:  12011
Cache Misses: 6223
DNSSEC Verdicts: secure=0 insecure=0 bogus=0 indeterminate=0

Meaning: High cache hits are normal. All-zero DNSSEC verdicts suggest the stub isn’t validating, or validation is disabled upstream.

Decision: Decide where DNSSEC validation lives (usually on enterprise recursive resolvers). Verify it’s enabled there, not necessarily on every endpoint.

Task 11: Validate DNSSEC on the corporate resolver (from a client)

cr0x@server:~$ dig @10.20.0.53 dnssec-failed.org A +dnssec +noall +comments
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 1512

Meaning: A validating resolver should return SERVFAIL for deliberately broken DNSSEC zones.

Decision: If you get a normal A record, your resolver isn’t validating DNSSEC. Decide if that’s acceptable; for many enterprises, it shouldn’t be.

Task 12: Identify whether a proxy is interfering with DoH (common in corporate networks)

cr0x@server:~$ curl -I --connect-timeout 3 https://dns.corp.example/dns-query
HTTP/2 405
server: caddy
date: Tue, 31 Dec 2025 12:07:11 GMT

Meaning: You reached the DoH endpoint. A 405 is fine here: curl -I sends HEAD, which many DoH implementations reject; the key is that TLS+HTTP works end-to-end.

Decision: If this times out or returns a proxy block page, fix proxy allowlists or bypass rules for the corporate DoH endpoint.

Task 13: Verify firewall egress policy for DoT

cr0x@server:~$ sudo nft list ruleset | sed -n '1,120p'
table inet filter {
  chain output {
    type filter hook output priority 0; policy accept;
  }
  chain forward {
    type filter hook forward priority 0; policy drop;
    tcp dport { 53, 853, 443 } accept
  }
}

Meaning: Forward chain allows 853. If 853 isn’t allowed from client VLANs to resolver VIPs, DoT will fail.

Decision: Permit 853 only to your resolvers, not the whole Internet, unless you enjoy surprise bypasses.
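A tightened version of that policy might look like the sketch below: DNS and DoT allowed only toward the resolver VIPs, DoT to the rest of the Internet dropped explicitly. The VIP addresses and the set name are assumptions; fold this into your existing table rather than pasting it verbatim:

# hypothetical nftables snippet
table inet filter {
  set corp_resolvers {
    type ipv4_addr
    elements = { 10.20.0.53, 10.20.0.54 }
  }
  chain forward {
    type filter hook forward priority 0; policy drop;
    ip daddr @corp_resolvers udp dport 53 accept
    ip daddr @corp_resolvers tcp dport { 53, 853 } accept
    tcp dport 853 drop comment "no DoT to the Internet"
    tcp dport 443 accept
  }
}

Note this also blocks plain UDP/53 to the Internet from client VLANs, which is usually what you want alongside this policy, but check your guest and lab segments before you enforce it.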

Task 14: On the resolver, check for saturation (CPU, sockets, open files)

cr0x@server:~$ sudo pidstat -p $(pgrep -x unbound) 1 3
Linux 6.8.0 (dns01) 	12/31/2025 	_x86_64_	(8 CPU)

12:08:01 PM   UID       PID    %usr %system  %CPU  Command
12:08:02 PM  unbound   1442    38.00   6.00 44.00  unbound
12:08:03 PM  unbound   1442    41.00   7.00 48.00  unbound

Meaning: If your resolver is chewing 50% CPU sustained, you’re fine until you’re not. Sudden spikes during DoH rollout can indicate TLS termination overhead or query floods.

Decision: Scale out (more resolver instances), enable connection reuse on DoH frontends, and rate-limit abusive clients rather than punishing everyone.
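Unbound exposes per-client rate limiting and the thread/socket knobs that matter here. A minimal sketch; the numbers are starting points to tune against your own QPS, not recommendations:

# /etc/unbound/unbound.conf.d/capacity.conf (hypothetical path)
server:
    num-threads: 8            # roughly one per CPU core
    so-reuseport: yes         # spread incoming load across threads
    ip-ratelimit: 50          # queries per second allowed per client IP (0 = disabled)
    incoming-num-tcp: 1024    # concurrent incoming TCP/TLS buffers per thread
    outgoing-range: 8192      # upstream query slots per thread
    msg-cache-size: 128m
    rrset-cache-size: 256m

Watch your logs and metrics after enabling ip-ratelimit; a chatty but legitimate monitoring host looks exactly like an abusive client until you account for it.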

Task 15: Confirm logs show the client subnet/source you expect (NAT hides sins)

cr0x@server:~$ sudo tail -n 5 /var/log/unbound/unbound.log
[1767182910] unbound[1442:0] info: 10.20.10.15 www.example.com. A IN
[1767182911] unbound[1442:0] info: 10.20.10.15 intranet.corp.example. A IN
[1767182912] unbound[1442:0] info: 10.20.10.15 api.vendor.example. AAAA IN

Meaning: You can attribute queries to client IPs. If everything comes from a NAT gateway, your forensic value drops.

Decision: Avoid NAT between clients and resolvers when possible. If unavoidable, include client identifiers at the DoH layer (with care) or use per-site resolvers.

Fast diagnosis playbook

When DoH/DoT is involved, the usual DNS debugging muscle memory fails because you can’t see queries on UDP/53 anymore.
This playbook is designed for on-call reality: find the bottleneck in minutes, not by writing a novel in the incident channel.

First: determine whether the client is bypassing corporate DNS

  • Check OS resolver settings (resolvectl status or OS equivalent).
  • Check browser secure DNS policy (managed vs user-toggled).
  • Look for absence of UDP/53 queries while browsing still “resolves” (hinting DoH).

Why: If the client isn’t using your resolver, nothing else you do matters.

Second: confirm reachability and TLS health to the corporate encrypted endpoint

  • For DoT: openssl s_client -connect ...:853 (cert verify, handshake time).
  • For DoH: curl -I https://... (HTTP response, proxy interference).

Why: Most enterprise DoT failures are not “DNS bugs.” They’re TLS/cert/egress policy issues.

Third: validate resolver correctness and upstream behavior

  • dig @resolver internal.name and dig @resolver external.name
  • Check SERVFAIL spikes; test DNSSEC on a known-bad domain
  • Look at resolver CPU and file descriptor limits

Why: If the resolver is overloaded or misconfigured, encryption just makes the outage harder to see.

Fourth: isolate network path problems (MTU, state tables, proxies)

  • MTU issues show up as TLS handshake failures or intermittent stalls.
  • Firewall state exhaustion shows up as random failures across many clients.
  • Proxy timeouts show up as DoH requests failing only on-net.

Common mistakes (and how they fail in production)

1) Symptom: Internal names fail only in browsers; OS tools work

Root cause: Browser DoH enabled to a public resolver; OS still uses corporate DNS.

Fix: Enforce browser policy: “use system resolver” or “use corporate DoH endpoint.” Provide a corporate DoH service that works off-VPN and on-VPN.

2) Symptom: Random DNS failures after enabling DoT “strict”

Root cause: Certificate mismatch (wrong SAN), missing intermediate chain, or firewall blocks 853 intermittently.

Fix: Fix PKI first. Then allow 853 only to your resolvers. Add monitoring on TLS handshake success rate.
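Two quick checks catch most of the PKI problems, assuming the DoT listener lives at dns.corp.example:853 and a reasonably recent OpenSSL (adjust host and port):

cr0x@server:~$ openssl s_client -connect dns.corp.example:853 -servername dns.corp.example </dev/null 2>/dev/null | openssl x509 -noout -ext subjectAltName -enddate -issuer
cr0x@server:~$ openssl s_client -connect dns.corp.example:853 -servername dns.corp.example -showcerts </dev/null 2>/dev/null | grep -c 'BEGIN CERTIFICATE'

The first shows the SANs, expiry, and issuer of the leaf certificate; if the SAN list doesn’t contain the exact name clients validate against, strict DoT fails even when the handshake “works” in casual testing. The second counts certificates the server presents; a count of 1 for a non-root-issued leaf usually means the intermediate chain is missing.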

3) Symptom: Some sites load slowly; resolver latency looks fine

Root cause: DoH via proxy adds head-of-line blocking or connection churn; HTTP/2 reuse not working; proxy inspects and delays.

Fix: Bypass proxy for corporate DoH endpoint. Ensure keepalive and HTTP/2 are enabled. Consider DoT if proxies are unavoidable.

4) Symptom: Security team loses DNS visibility overnight

Root cause: Clients silently switched to public DoH (browser update, OS feature), bypassing corporate resolvers and logs.

Fix: Endpoint management: disable public DoH, configure enterprise DoH/DoT. Network: restrict egress to known DoH endpoints if required, but treat as temporary.

5) Symptom: Split-horizon answers are inconsistent between subnets

Root cause: Multiple resolver stacks (branch caches, central resolvers, public DoH) with different forwarding rules and caches.

Fix: Consolidate policy: one resolver service with consistent views; keep branch resolvers but centrally managed; stop “creative caching.”

6) Symptom: High SERVFAIL rate to external domains only

Root cause: DNSSEC validation failures because of broken upstream domains, incorrect time, or blocked UDP fragments/TCP fallback.

Fix: Ensure TCP/53 allowed outbound from resolvers. Check NTP on resolvers. Use sane DNSSEC settings; don’t disable validation globally because one domain is broken.
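Quick checks from the resolver itself that map to those root causes (the root server name is just a reachable TCP/53 target; use any authoritative server you trust):

cr0x@server:~$ dig @a.root-servers.net . NS +tcp +noall +comments | head -3
cr0x@server:~$ timedatectl | grep -E 'Local time|synchronized'

If the TCP query to the roots times out, fix egress before touching DNSSEC settings; if the clock is unsynchronized, fix NTP first, because signature validity windows do not care about your firewall.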

7) Symptom: DoH works off-network, fails on corporate LAN

Root cause: SSL inspection or proxy blocks the DoH endpoint, or captive portal intercepts 443 in some segments.

Fix: Allowlist/bypass the corporate DoH endpoint from interception. If you must inspect, terminate carefully and preserve semantics—or accept you’re breaking privacy.

8) Symptom: Resolver CPU spikes after enabling DoH

Root cause: TLS termination overhead moved to the resolver without capacity planning; or DoH endpoint lacks connection reuse and opens too many sessions.

Fix: Offload TLS to a dedicated frontend with keepalive; scale resolver pool; implement rate limits and per-client quotas.

Checklists / step-by-step plan

Step 1: Decide your enterprise stance (write it down)

  • Are you providing a corporate encrypted DNS service? (You should.)
  • Do you require encryption off-network? On-network?
  • What logs do you retain, and why? Who can access them?
  • Which internal zones must never leak?

Step 2: Build the resolver service like a tier-0 dependency

  • Redundant resolvers per site or region.
  • Consistent forwarding and split-horizon rules.
  • DNSSEC validation enabled on recursive resolvers.
  • RPZ/threat feeds with change control and rollback.
  • Capacity tests: QPS, TLS handshakes, connection counts.

Step 3: Add DoT and/or DoH endpoints—intentionally

  • DoT on 853 for managed clients, simpler to reason about.
  • DoH on 443 for hostile networks and proxy-heavy environments, but make it a first-class service (monitor it).
  • Certificates from a trusted internal/public CA as appropriate; include correct SANs.
  • Make endpoints reachable from VPN and from the public Internet if your remote workforce needs it (with DDoS considerations).

Step 4: Enforce endpoint policy

  • OS-level DNS: set enterprise resolvers.
  • Browser-level: force “secure DNS” to corporate endpoint or system resolver.
  • Prevent user override where your risk model demands it (regulated environments).

Step 5: Network controls (use sparingly, but be ready)

  • Allow DoT (853) only to corporate resolvers; block to the Internet if policy requires.
  • For DoH, prefer endpoint controls; use network detection as a backstop.
  • Document exceptions (guest Wi‑Fi, BYOD, labs).

Step 6: Monitoring that catches the real failures

  • Resolver: QPS, latency, SERVFAIL/NXDOMAIN rates, cache hit ratio, upstream RTT.
  • DoT: TLS handshake error rate, session counts, certificate expiry alarms.
  • DoH: HTTP status distribution, p95 latency, proxy error codes, connection reuse metrics.
  • Client experience: synthetic queries for internal and external names from each site.
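The client-experience item is the one most teams skip. A minimal synthetic-probe sketch you could run from each site via cron or a monitoring agent; the resolver IP, test names, expected internal prefix, and the DoT certificate name are all placeholders:

#!/bin/sh
# Hypothetical synthetic DNS probe: exit non-zero if any check fails.
RESOLVER="10.20.0.53"
INTERNAL="intranet.corp.example"
EXTERNAL="www.example.com"
fail=0

# Internal name must resolve via the corporate resolver to an internal address
dig @"$RESOLVER" "$INTERNAL" A +time=2 +tries=1 +short | grep -q '^10\.' || { echo "FAIL internal ($INTERNAL)"; fail=1; }

# External name must resolve too (recursion and upstream health)
dig @"$RESOLVER" "$EXTERNAL" A +time=2 +tries=1 +short | grep -q . || { echo "FAIL external ($EXTERNAL)"; fail=1; }

# DoT listener must complete a TLS handshake with a certificate that verifies
echo | openssl s_client -connect "$RESOLVER:853" -servername dns.corp.example -verify_return_error >/dev/null 2>&1 || { echo "FAIL DoT handshake"; fail=1; }

exit $fail

Wire the exit code into whatever alerting you already have; the value is running the same probe from every site, so “works for me” becomes a graph instead of an argument.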

Step 7: Rollout plan that respects blast radius

  • Canary group (IT + a few power users).
  • Gradual expansion per site/region.
  • Rollback: clear steps to revert endpoint policy and routing.
  • Incident runbook: include “is client bypassing resolver?” as first check.

FAQ

1) Should enterprises allow DoH at all?

Yes, but not as “everyone uses whatever resolver their browser likes.” Provide a corporate DoH endpoint and manage clients to use it.
If you can’t manage clients, you may need to restrict public DoH to preserve split-horizon and security controls.

2) Is DoT easier than DoH for corporate networks?

Usually. Port 853 is explicit, easier to firewall, and avoids proxy weirdness. DoH is better when you must traverse networks
that block 853, but it’s harder to control at the network layer.

3) Does encrypted DNS stop malware?

No. It stops passive observers from reading DNS traffic. Malware can still resolve names—possibly more reliably.
Your mitigation is policy at the resolver (when clients use it), endpoint security, and egress controls.

4) If we do DNSSEC, do we still need DoH/DoT?

DNSSEC and DoH/DoT solve different problems. DNSSEC authenticates answers; DoH/DoT hides the questions and answers from observers.
You want DNSSEC validation on your resolvers whether or not you deploy encryption.

5) Can we block public DoH by SNI?

Sometimes, today. Less reliably tomorrow due to ECH and CDN fronting patterns. Treat SNI blocking as a short-term containment measure,
not a strategy you want to maintain for years.

6) Won’t DoH break our DNS logging?

It breaks on-path logging, which you shouldn’t rely on anyway. It doesn’t break resolver logging if clients use your resolver.
Move visibility to the resolver layer and keep retention scoped to legitimate security/ops needs.

7) What about guest Wi‑Fi and BYOD?

Decide what you’re optimizing for. If guests are not allowed to reach internal resources, you can let them use public DNS/DoH.
If they need internal access, treat them as managed or force them through a controlled access method (NAC/VDI/VPN).

8) How do we prevent internal domain leakage over public resolvers?

The reliable answer is endpoint configuration: ensure internal suffixes resolve via corporate resolvers, and configure DoH/DoT to your endpoints.
Network blocking helps but will never be perfect if clients can tunnel over 443.

9) Is running our own DoH endpoint risky?

It’s a production service, so yes. But it’s also a control point you already rely on (DNS). If you don’t run it,
someone else will—on behalf of your endpoints—without your split-horizon needs or incident response requirements.

10) What’s the “least bad” default for a mixed environment?

Corporate recursive resolvers with DNSSEC validation, internal zone forwarding, and DoT “opportunistic” for managed endpoints,
plus a corporate DoH endpoint for browsers and hostile networks. Then tighten to “strict” where monitoring proves it’s safe.

Conclusion: next steps that won’t embarrass you later

DoH and DoT aren’t just privacy switches. They move your DNS control plane from the network to the endpoint and the resolver.
If you don’t provide an enterprise-grade encrypted DNS service, your users will adopt one accidentally—via a browser update,
an OS toggle, or an enthusiastic blog post—then you’ll debug it at 2 a.m. with half your telemetry missing.

Practical next steps:

  1. Inventory: identify where DNS is resolved today (OS, browsers, VPN clients, branch caches).
  2. Build: deploy redundant corporate recursive resolvers with DNSSEC validation and consistent split-horizon rules.
  3. Encrypt: add DoT and/or DoH endpoints with real monitoring (handshake rates, latency, error codes).
  4. Control: enforce endpoint and browser policies to use corporate encrypted DNS, not public endpoints.
  5. Contain: use network blocks sparingly for public DoT and targeted DoH endpoints when policy demands, but don’t pretend it’s permanent.
  6. Rehearse: run failure drills—expired cert, resolver overload, upstream outage—so the first time isn’t in production.

If you do this right, users get privacy on hostile networks, security teams keep useful DNS visibility, and internal apps don’t faceplant.
The goal is not to “win” against encryption. The goal is to run a network where encryption is normal and your controls still work.
