Office VPN access control: allow only what’s needed (not full LAN-to-LAN)

You set up “temporary” office VPN access for a vendor, a contractor, or the new remote sales team. It works. Everyone celebrates.
Six months later you’re staring at a lateral movement incident, a mystery route leak, and a firewall that looks like a bowl of spaghetti.
The VPN didn’t fail. Your boundaries did.

The goal here isn’t to make VPN access impossible. It’s to make it boring: only the required apps, only the required routes,
with guardrails that keep working when the network changes at 2 a.m. and your pager is already mad at you.

The principle: VPN is not a teleporter into your LAN

“Office VPN” tends to get treated like a magic cable: plug a remote laptop into it and—poof—it’s “in the office.”
That’s the mental model that creates full LAN-to-LAN access, broad routes, and “any-any” firewall rules.
It’s also the mental model that makes incident response miserable.

A VPN is just a transport. It is not an authorization system. If you don’t deliberately layer authorization on top—routes,
firewall policy, and identity-aware gates—you’re granting whatever the network topology happens to allow today.
And topology changes all the time.

The right model is: a VPN is a controlled on-ramp into a small set of services. Those services can live on your LAN,
in a DMZ/VPN segment, or in cloud networks. But the on-ramp should not lead to “everything with an RFC1918 address.”

Opinionated guidance: treat “full LAN-to-LAN” as a default-deny violation. If you need it, you should be able to defend it in writing.
Not in a committee meeting. In a post-incident review.

Threat model in plain English: what you’re defending against

1) Lateral movement from one compromised endpoint

Remote endpoints are messy. They travel. They join coffee shop Wi‑Fi. They run browser extensions that were “free” for a reason.
If one endpoint gets popped and your VPN gives it the same L2/L3 reachability as file servers, printers, hypervisors, backup boxes,
and “temporary” admin consoles, you just gave an attacker a private highway.

2) Accidental access: the honest mistake that still hurts

Most “unauthorized access” in corporate networks is not a hoodie-wearing villain.
It’s a developer pointing a script at 10.0.0.0/8 because it was quicker than finding the right IP.
It’s a vendor “just checking something” on a host they weren’t meant to reach.
Least-privilege VPN design prevents both malice and oops.

3) Route leaks and overlapping subnets

If you’ve ever merged two companies or added a new cloud VPC, you’ve met the villain:
overlapping RFC1918 space. With broad VPN routes, it becomes ambiguous which “10.0.12.0/24” you meant,
and the wrong path sometimes still works—until it doesn’t, and you spend your weekend diffing routing tables.

4) DNS as an access vector

Private DNS can reveal internal naming, service discovery targets, and “hidden” management endpoints.
You don’t want every VPN user to query every internal zone, even if they can’t connect afterward.
Reconnaissance is a phase, not a permission.

5) Operational failure: the access model you can’t debug

Overly clever VPN policy is its own threat. If the only person who understands the routing is on vacation,
your mean time to repair becomes “whenever they land.” Debuggability is a security feature.

Paraphrased idea (attributed): “Hope is not a strategy.” — often credited in ops culture to General Gordon R. Sullivan.
Whether you like the phrasing or not, it applies: don’t “hope” broad VPN access stays safe.

Historical and interesting facts (because we keep relearning them)

  1. VPNs became mainstream as “perimeter extensions” in the 1990s/early 2000s, when offices had a clean inside/outside boundary and remote access was rare.
  2. IPsec was designed for network-to-network security, not per-app access—so “full subnet reachability” is a natural (but outdated) outcome.
  3. Split tunneling used to be treated as heresy because it breaks the “all traffic through HQ” inspection story; today it’s often necessary for performance and SaaS reality.
  4. NAT made mergers harder: overlapping RFC1918 networks were tolerated because NAT “fixed it,” until you try to join two NAT-heavy environments with a VPN and everything collides.
  5. Early remote access concentrated trust in the VPN gateway; modern designs distribute trust into identity providers, device posture checks, and app-layer proxies.
  6. “Flat networks” were once normal because internal threats were under-modeled; ransomware and credential theft made internal segmentation non-negotiable.
  7. WireGuard popularized route-as-policy: its AllowedIPs setting is both routing and access control, which is elegant, and just as easy to shoot yourself in the foot with.
  8. DNS split-horizon isn’t new, but VPNs made it ubiquitous: internal names for internal services became a dependency, and mis-scoped DNS became a common leak.
  9. Cloud networking pushed “service-first” thinking: security groups and L7 gateways forced teams to name what they’re connecting to, not just which subnet exists.

If you take one lesson from history: architectures outlive the reasons they were built. The VPN you inherit was probably reasonable once.
It’s just not reasonable anymore.

Reference architectures that avoid full LAN-to-LAN

Architecture A: “VPN to a service segment,” not VPN to the whole LAN

Put the VPN terminator (WireGuard, OpenVPN, IPsec gateway) into a dedicated VPN subnet/VLAN.
From there, allow egress only to the services required: the ticketing system, Git, CI, a handful of internal APIs, maybe RDP/SSH to a bastion.
Deny everything else by default.

Key point: the VPN subnet is a client network, not a peer LAN. Don’t route remote users directly into “corp-lan.”
Route them into “vpn-clients,” then firewall them to specific destinations.

Architecture B: Per-app access via bastions and proxies (boring, effective)

If users need admin access, they don’t need the whole admin network. They need a jump host with MFA, logging, and a short list of allowed outbound targets.
SSH bastions, RDP gateways, and HTTP reverse proxies are old tech. That’s why they work.

The VPN becomes an access method to a control point (bastion/proxy), not a blanket route to everything.
Once you accept that, your firewall policy gets smaller and your audits get easier.
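
The SSH flavor of this is old enough to be boring, which is the point. Here's a minimal client-side ~/.ssh/config sketch; the hostnames and addresses are invented for illustration:

# ~/.ssh/config — all admin SSH flows through the bastion
Host bastion
    HostName 10.0.30.20
    User alice

Host db-admin
    HostName 10.0.40.10
    User alice
    ProxyJump bastion

With ProxyJump, the client only ever needs a route to the bastion; the bastion's own outbound firewall decides which targets exist.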

Architecture C: Identity-aware access (when you can)

If your internal services can sit behind an identity-aware proxy (IAP) model, do it. It reduces network-level trust and shifts authorization to user and device identity.
But don’t lie to yourself: it’s not “set and forget.” You still need network segmentation to contain what happens when identity is bypassed or misconfigured.

Architecture D: True site-to-site, but tightly scoped and documented

Sometimes you really do need site-to-site connectivity: branch office printers, industrial controllers, or legacy systems that can’t speak through proxies.
Fine. But you still don’t need “any subnet to any subnet.”
Write down the exact prefixes that must be reachable, and enforce it in both routing and firewall layers.

Joke #1: A “temporary full-tunnel any-any VPN” is like a “temporary production admin password.” It exists forever, and it grows teeth.

Controls that actually work: routing, firewalling, identity, DNS

1) Routing: choose explicit prefixes, not “everything private”

The most common failure mode is advertising broad internal ranges to VPN clients because it’s convenient:
10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16. Don’t.
That’s not routing; that’s abdication.

Do this instead:

  • Advertise only the required service subnets (or even better, service IPs).
  • Keep the list short; if it grows, you have an architecture issue, not a routing issue.
  • For WireGuard, use AllowedIPs as a strict allowlist.
  • For OpenVPN, push only the routes you intend (push "route ..."), and block client-to-client by default unless required. A minimal sketch of both follows.
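
To make "explicit prefixes" concrete, here is a minimal sketch of both dialects. Keys, addresses, and hostnames are illustrative, not from a real deployment:

# Client-side /etc/wireguard/wg0.conf — only these prefixes enter the tunnel
[Interface]
Address = 10.250.0.10/32
PrivateKey = <client-private-key>

[Peer]
PublicKey = <gateway-public-key>
Endpoint = vpn.example.com:51820
AllowedIPs = 10.0.30.15/32, 10.0.40.0/24   # two services, not 10.0.0.0/8

# Server-side /etc/openvpn/server.conf — push only what you intend
push "route 10.0.30.15 255.255.255.255"
push "route 10.0.40.0 255.255.255.0"
# no "client-to-client" directive: peers stay isolated

Remember that client-side AllowedIPs is under the client's control; a tampered client can widen it. That's exactly why the gateway firewall (next section) has to enforce the same policy.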

2) Firewall: default deny, then allow by destination and port

Routing decides what’s reachable. Firewall decides what’s allowed. You need both because routes have a habit of changing,
and because “reachable” is a much larger set than “should be used.”

The basic pattern:

  • Define a VPN client source range (e.g., 10.250.0.0/24).
  • Allow that source to reach only specific destinations/ports (a set-based sketch follows this list).
  • Log denies (selectively; don’t DDoS your own logging pipeline).
  • Block east-west within the VPN pool unless you’re intentionally supporting peer traffic.
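
One way to keep that allowlist short and reviewable is an nftables named set: new destinations grow the set, not the rule count. The table and chain names match the examples later in this article; the set name vpn_https is made up:

cr0x@server:~$ sudo nft add set inet filter vpn_https '{ type ipv4_addr; }'
cr0x@server:~$ sudo nft add element inet filter vpn_https '{ 10.0.30.15 }'
cr0x@server:~$ sudo nft add rule inet filter forward iif "wg0" oif "lan0" ip daddr @vpn_https tcp dport 443 accept

Reviewing access then becomes "read one set," and revoking a destination is a single element delete instead of a rule hunt.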

3) Identity and device posture: the VPN should know who’s connecting

IP-based rules alone are brittle. Tie access to identity wherever possible:

  • Per-user certificates (OpenVPN), per-user WireGuard keys mapped to users, or SSO-integrated VPN solutions.
  • Short-lived credentials if your tooling supports it.
  • Device posture checks (managed device, disk encryption, OS version) if your environment can enforce them.

Even if you can’t do “zero trust” end-to-end, you can at least know which user a key belongs to,
and revoke it without emailing the whole company.

4) DNS: split-horizon with a scalpel, not a shovel

VPN clients often need internal DNS for a subset of zones. Give them those zones, not your entire corporate namespace.
If your VPN solution can do conditional forwarding (send only corp.example to internal resolvers), use it.
Otherwise, be careful: pushing internal DNS servers to clients can accidentally route all DNS queries through the VPN and create privacy, performance,
and debugging headaches.
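
If your clients run systemd-resolved, per-link scoping is two commands. A sketch, assuming the tunnel interface is wg0 and the internal resolver is 10.0.10.53; the ~ prefix makes corp.example a routing-only domain, so queries under it go to this link and nothing else does:

cr0x@server:~$ sudo resolvectl dns wg0 10.0.10.53
cr0x@server:~$ sudo resolvectl domain wg0 '~corp.example'
cr0x@server:~$ resolvectl domain wg0
Link 3 (wg0): ~corp.example

wg-quick's DNS= line can push a resolver and search domain from the client config too, though the exact scoping behavior depends on the resolver plumbing on the client.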

5) Observability: log what matters and make it queryable

You need to answer, quickly:

  • Who connected? From where? With what device/key?
  • What routes were assigned?
  • What was allowed/denied at the firewall?
  • Which DNS queries were made (at least for internal zones)?

If your answer is “we can probably grep the VPN server,” you’re one incident away from learning the definition of “probably.”

Practical tasks (commands, outputs, decisions)

These are the day-to-day moves that keep “least privilege VPN” from being a slide deck.
Each task includes: a command, what typical output means, and the decision you make.
Examples assume Linux-based VPN gateways and common tooling.

Task 1: Confirm what routes the VPN server is actually advertising

cr0x@server:~$ sudo ip route show
default via 203.0.113.1 dev eth0 proto static
10.0.10.0/24 dev lan0 proto kernel scope link src 10.0.10.1
10.250.0.0/24 dev wg0 proto kernel scope link src 10.250.0.1
172.16.50.0/24 via 10.0.10.2 dev lan0 proto static

What it means: The server has local LAN 10.0.10.0/24, VPN pool 10.250.0.0/24, and a routed subnet 172.16.50.0/24.
This is server-side routing, not client-side “pushed” routes, but it hints at what you might be enabling.

Decision: If you see broad routes (like 10.0.0.0/8) on the gateway, stop and justify them.
Tighten to only the subnets that host the services you intend to expose.

Task 2: For WireGuard, list peers and their AllowedIPs (your real access policy)

cr0x@server:~$ sudo wg show
interface: wg0
  public key: G2h9l8mRkQhYtYc9o2b1...
  listening port: 51820

peer: M4x9Vt2g9bBv7hPpP3c0...
  preshared key: (hidden)
  endpoint: 198.51.100.24:53321
  allowed ips: 10.250.0.10/32
  latest handshake: 1 minute, 12 seconds ago
  transfer: 128.21 MiB received, 44.02 MiB sent

peer: Q7u1kL0pZz2nWc3aY1r5...
  endpoint: 203.0.113.77:59610
  allowed ips: 10.250.0.11/32, 10.0.40.0/24
  latest handshake: 2 hours, 4 minutes ago
  transfer: 2.10 MiB received, 1.54 MiB sent

What it means: On the server, a peer's allowed ips are both a source filter (packets from that peer must use those addresses)
and return routing (the server sends traffic for those prefixes back to that peer). Peer 1 looks right: one /32 client address.
Peer 2 claims an entire subnet; unless that peer genuinely routes 10.0.40.0/24 (a site-to-site peer),
that entry lets a laptop spoof the whole range and can hijack the gateway's routing toward it.

Decision: Keep server-side AllowedIPs at each client's /32 unless the peer really is a router for a site.
Put destination policy in the client config and the gateway firewall. Any subnet-sized entry needs an explicit owner and a reason.
If you can't name the reason, remove the subnet.

Task 3: Check whether IP forwarding is enabled (and whether it should be)

cr0x@server:~$ sysctl net.ipv4.ip_forward
net.ipv4.ip_forward = 1

What it means: The host will route packets between interfaces. That’s required for many VPN gateway designs.
It’s also the switch that turns “VPN server” into “router into everything.”

Decision: If you’re running per-app proxy/bastion access and don’t need L3 routing, set it to 0.
If you do need routing, compensate with strict firewall rules on the forwarding path.
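
If you decide forwarding should stay off, persist it so a reboot doesn't silently re-enable routing. A sketch; the drop-in filename is arbitrary:

cr0x@server:~$ echo 'net.ipv4.ip_forward = 0' | sudo tee /etc/sysctl.d/90-vpn-no-forward.conf
net.ipv4.ip_forward = 0
cr0x@server:~$ sudo sysctl -p /etc/sysctl.d/90-vpn-no-forward.conf
net.ipv4.ip_forward = 0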

Task 4: Inspect nftables policy to ensure default-deny on forwarding from VPN

cr0x@server:~$ sudo nft list ruleset
table inet filter {
  chain input {
    type filter hook input priority 0; policy drop;
    iif "lo" accept
    ct state established,related accept
    tcp dport 22 accept
    udp dport 51820 accept
  }

  chain forward {
    type filter hook forward priority 0; policy drop;
    ct state established,related accept
    iif "wg0" oif "lan0" ip daddr 10.0.30.15 tcp dport 443 accept
    iif "wg0" oif "lan0" ip daddr 10.0.40.0/24 tcp dport 5432 accept
  }

  chain output {
    type filter hook output priority 0; policy accept;
  }
}

What it means: Forwarding is default drop. Two explicit allows exist: HTTPS to one host, and Postgres to one subnet.
This is the shape you want: narrow exceptions, readable rules.

Decision: If your forward chain policy is accept, you’re doing “full LAN-to-LAN” by accident.
Flip to drop and enumerate what must be reachable.

Task 5: Identify what VPN clients are trying (and failing) to reach

cr0x@server:~$ sudo nft add rule inet filter forward iif "wg0" counter log prefix "VPN-FWD-DROP " level info drop
cr0x@server:~$ sudo journalctl -n 5 -k
Aug 14 10:21:19 gw kernel: VPN-FWD-DROP IN=wg0 OUT=lan0 SRC=10.250.0.11 DST=10.0.10.25 LEN=60 TOS=0x00 PREC=0x00 TTL=63 ID=0 DF PROTO=TCP SPT=51234 DPT=445 WINDOW=64240 SYN
Aug 14 10:21:20 gw kernel: VPN-FWD-DROP IN=wg0 OUT=lan0 SRC=10.250.0.11 DST=10.0.10.53 LEN=60 TOS=0x00 PREC=0x00 TTL=63 ID=0 DF PROTO=UDP SPT=55911 DPT=53

What it means: A client attempted SMB (445) to 10.0.10.25 and DNS to 10.0.10.53, and both were dropped.
That’s either unauthorized access (good) or a missing rule (also good to discover early).

Decision: Decide if those destinations are legitimate business needs. If not, keep dropping and consider blocking at the client policy level too.
If yes, add a tight allow: specific server, specific port, logged.
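
If you leave a logging rule in place, rate-limit it so a single port scan can't flood the journal. A sketch with made-up rate numbers; packets over the limit skip this rule and still die on the chain's drop policy:

cr0x@server:~$ sudo nft add rule inet filter forward iif "wg0" limit rate 10/minute burst 20 packets counter log prefix "VPN-FWD-DROP " drop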

Task 6: Verify NAT behavior (NAT can hide problems, and sometimes creates them)

cr0x@server:~$ sudo nft list table ip nat
table ip nat {
  chain postrouting {
    type nat hook postrouting priority srcnat; policy accept;
    oif "lan0" ip saddr 10.250.0.0/24 masquerade
  }
}

What it means: VPN client traffic to the LAN is masqueraded. LAN hosts will see the gateway IP, not the client IP.
That can simplify routing on the LAN side but destroys per-client attribution at the destination.

Decision: Avoid NAT for admin/audit-sensitive environments. Prefer routed VPN pools with explicit return routes so logs show real client IPs.
If you must NAT (legacy constraints), compensate with gateway-side logging and per-user mapping.
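
"Explicit return routes" just means the LAN side must know that 10.250.0.0/24 lives behind the VPN gateway. A sketch on a Linux LAN router (10.0.10.1 is the gateway's LAN address from Task 1; your router may be a firewall appliance with its own syntax):

cr0x@lan-router:~$ sudo ip route add 10.250.0.0/24 via 10.0.10.1
cr0x@lan-router:~$ ip route get 10.250.0.10
10.250.0.10 via 10.0.10.1 dev lan0 src 10.0.10.2 uid 0
    cache

With that route in place you can drop the masquerade rule, and destination logs start showing real client IPs.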

Task 7: Check for route leaks to clients (OpenVPN example via status)

cr0x@server:~$ sudo tail -n 12 /var/log/openvpn/status.log
OpenVPN CLIENT LIST
Updated,2025-08-14 10:20:10
Common Name,Real Address,Bytes Received,Bytes Sent,Connected Since
alice,198.51.100.24:53321,134507892,46158212,2025-08-14 08:12:03

ROUTING TABLE
Virtual Address,Common Name,Real Address,Last Ref
10.250.0.10,alice,198.51.100.24:53321,2025-08-14 10:20:09
GLOBAL STATS
Max bcast/mcast queue length,0
END

What it means: You can see which user has which VPN IP. This is the minimum you need for incident response correlation.

Decision: If you can’t map users to VPN IPs reliably, fix that before you “optimize” anything else.
Access control without attribution is a confidence trick.

Task 8: Confirm DNS configuration on the client path (what resolver is used)

cr0x@server:~$ resolvectl status
Global
       Protocols: -LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
resolv.conf mode: stub

Link 3 (wg0)
    Current Scopes: DNS
         Protocols: -DefaultRoute
Current DNS Server: 10.0.10.53
       DNS Servers: 10.0.10.53
        DNS Domain: corp.example

What it means: Only names under corp.example are routed to the internal resolver on wg0, and -DefaultRoute keeps every other query off that link.
That's conditional DNS scoping; it avoids hijacking all DNS queries into the office.

Decision: If you see internal DNS set globally with no domain scoping, expect weirdness: SaaS logins failing, captive portals breaking, and hard-to-debug latency.
Fix scoping before users “fix” it themselves by hardcoding public resolvers.

Task 9: Prove reachability vs permission (ping works, TCP doesn’t)

cr0x@server:~$ ping -c 2 10.0.30.15
PING 10.0.30.15 (10.0.30.15) 56(84) bytes of data.
64 bytes from 10.0.30.15: icmp_seq=1 ttl=63 time=3.21 ms
64 bytes from 10.0.30.15: icmp_seq=2 ttl=63 time=3.08 ms

--- 10.0.30.15 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1002ms
cr0x@server:~$ nc -vz 10.0.30.15 443
Connection to 10.0.30.15 443 port [tcp/https] succeeded!

What it means: Both ICMP and TCP/443 are allowed, so the service is reachable and permitted.
If ping works but TCP fails, you likely have a firewall/SG/app policy issue. If ping fails but TCP works, someone is filtering ICMP (fine, but be consistent).

Decision: Use ICMP only as a coarse signal. Make decisions based on the actual application port.

Task 10: Identify which side is dropping packets with tcpdump

cr0x@server:~$ sudo tcpdump -ni wg0 host 10.0.30.15 and tcp port 443 -c 3
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on wg0, link-type RAW (Raw IP), snapshot length 262144 bytes
10:24:10.102345 IP 10.250.0.10.51522 > 10.0.30.15.443: Flags [S], seq 1987654321, win 64240, options [mss 1360,sackOK,TS val 123 ecr 0,nop,wscale 7], length 0
10:24:11.110221 IP 10.250.0.10.51522 > 10.0.30.15.443: Flags [S], seq 1987654321, win 64240, options [mss 1360,sackOK,TS val 1123 ecr 0,nop,wscale 7], length 0
10:24:13.126019 IP 10.250.0.10.51522 > 10.0.30.15.443: Flags [S], seq 1987654321, win 64240, options [mss 1360,sackOK,TS val 3123 ecr 0,nop,wscale 7], length 0

What it means: SYNs are arriving on wg0, but no SYN-ACK is seen returning (at least on wg0).
Either forwarding/firewall is blocking, return routing is broken, or the destination is down/refusing.

Decision: Capture on both interfaces (wg0 and lan0). If it leaves lan0 but no reply returns, it’s return path or destination-side firewall.
If it never leaves lan0, it’s your gateway firewall/forwarding.

Task 11: Check for overlapping subnets and “shadow routes”

cr0x@server:~$ ip route get 10.0.40.10
10.0.40.10 via 10.0.10.2 dev lan0 src 10.0.10.1 uid 0
    cache

What it means: Traffic to 10.0.40.10 goes via 10.0.10.2. If you expected it to be local or via another router, you have a route mismatch.

Decision: If a prefix can be reached via multiple paths, make it explicit and document the chosen one.
In VPN design, ambiguity is downtime.

Task 12: Validate MTU and detect fragmentation pain

cr0x@server:~$ ip link show wg0
6: wg0: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1420 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/none
cr0x@server:~$ ping -M do -s 1380 -c 2 10.0.30.15
PING 10.0.30.15 (10.0.30.15) 1380(1408) bytes of data.
1388 bytes from 10.0.30.15: icmp_seq=1 ttl=63 time=3.45 ms
1388 bytes from 10.0.30.15: icmp_seq=2 ttl=63 time=3.39 ms

--- 10.0.30.15 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1002ms

What it means: MTU on wg0 is 1420. A 1380-byte payload with DF works; you likely won’t see PMTUD issues for typical traffic.
If this fails, large TLS packets may stall, and users will report “it connects but times out randomly.”

Decision: If you see MTU blackholes, set a lower VPN MTU and clamp TCP MSS on the gateway.
Don’t “fix” it by allowing full LAN access; that’s not a fix, that’s surrender.
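
Clamping MSS on the gateway is one nftables rule; lowering the tunnel MTU is one line of client config. A sketch (the 1380 value is an example; derive yours from the DF-ping test above):

cr0x@server:~$ sudo nft add rule inet filter forward tcp flags syn tcp option maxseg size set rt mtu

# Client-side wg-quick config, [Interface] section:
MTU = 1380

The rt mtu variant sizes the MSS from the route's MTU automatically, which survives future MTU changes better than a hardcoded number.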

Task 13: Confirm that client-to-client traffic is blocked (prevent peer lateral movement)

cr0x@server:~$ sudo nft list chain inet filter forward
table inet filter {
  chain forward {
    type filter hook forward priority 0; policy drop;
    ct state established,related accept
    iif "wg0" oif "lan0" ip daddr 10.0.30.15 tcp dport 443 accept
    iif "wg0" oif "lan0" ip daddr 10.0.40.0/24 tcp dport 5432 accept
  }
}

What it means: There’s no rule permitting iif wg0 to oif wg0. VPN clients can’t talk to each other through the gateway.
That’s good: one compromised laptop can’t casually scan the rest of the VPN pool.

Decision: Only allow peer-to-peer VPN traffic if you have a clear product requirement (e.g., remote pair programming tools) and a containment story.
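
Default drop already blocks the hairpin path, but an explicit rule documents intent and gives you a counter to watch. A sketch; nft insert places it at the top of the chain:

cr0x@server:~$ sudo nft insert rule inet filter forward iif "wg0" oif "wg0" counter drop

For OpenVPN, the equivalent is simply not enabling client-to-client: without it, inter-client traffic traverses the server's IP stack, where this same forward chain applies.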

Task 14: Ensure logging shows identity, not just IPs (WireGuard mapping sanity)

cr0x@server:~$ sudo grep -R "M4x9Vt2g9bBv7hPpP3c0" -n /etc/wireguard/
/etc/wireguard/wg0.conf:18:PublicKey = M4x9Vt2g9bBv7hPpP3c0...
/etc/wireguard/wg0.conf:19:# owner=alice purpose=vendor-api access=10.0.30.15:443

What it means: The peer key is annotated with ownership and purpose. This is not fancy. It’s operationally priceless.

Decision: If peers are anonymous blobs, you will hesitate to revoke access because you won’t know who breaks.
Add ownership metadata and make it required for changes.

Joke #2: The fastest way to learn networking is to allow 10.0.0.0/8 over VPN and then wait for your first incident.
The second fastest is to not do that.

Fast diagnosis playbook

When VPN access is “slow,” “broken,” or “works for some users,” don’t start by rewriting configs.
Start by finding the layer that’s failing: transport, routing, firewall, DNS, or the app.
Here’s the order that keeps your time-to-truth short.

First: is the tunnel up and stable?

  • Check handshake/connected state (wg show / OpenVPN status / IPsec SA state).
  • Look for frequent reconnects (usually NAT timeouts, roaming endpoints, or MTU issues).
  • Confirm client got the expected VPN IP.

Second: is the route correct for the specific destination?

  • From the client: verify the route to the service IP uses the VPN interface.
  • From the gateway: ip route get <dest> and ensure it points where you think.
  • Watch for overlapping subnets and accidentally more-specific routes.

Third: is the firewall permitting the exact tuple (src, dst, proto, port)?

  • On the gateway: check forward rules and counters.
  • Temporarily add a logged drop rule to see what’s being attempted.
  • Confirm destination-side firewall/security groups also allow the VPN pool.

Fourth: DNS and name-based routing

  • Resolve the service name from the client and confirm it returns the intended IP (internal vs external split).
  • Confirm only required internal zones are routed to internal resolvers.
  • If users report “some sites break,” suspect full-tunnel DNS hijack or DNS timeouts through the VPN.

Fifth: MTU and “it times out only on big responses”

  • Run DF pings to find a safe payload size.
  • Clamp MSS if needed.
  • Symptoms often look like “login page loads, then spinning forever.”

Sixth: application and authentication

  • Once transport/routing/firewall/DNS are correct, debug the service itself (TLS, auth, app logs).
  • Don’t blame the VPN for a 401. It’s innocent until proven guilty.

Three corporate mini-stories from the trenches

Incident caused by a wrong assumption: “It’s only the dev subnet”

A mid-sized company spun up a WireGuard VPN for contractors working on a migration. The network engineer did the “reasonable” thing:
add 10.20.0.0/16 to AllowedIPs because that was “the dev environment.”
Everyone could reach the dev cluster and the internal Git host. Tickets stopped.

The assumption was that 10.20.0.0/16 was exclusively dev. It used to be. Then someone added a few “temporary” management endpoints into that range:
a virtualization console, a backup UI, and an old monitoring box that still had local accounts.
No one updated the VPN policy because no one remembered the VPN policy existed as a policy.

A contractor’s laptop got infected through a browser exploit. The attacker didn’t need to “break the VPN.”
They simply used the contractor’s established tunnel and scanned reachable IPs.
They found the backup UI. They tried credential stuffing. They got lucky.

The post-incident review was painful but clarifying: the VPN wasn’t misconfigured; it was under-specified.
“Dev subnet” was a story people told themselves, not an enforced boundary. They fixed it by moving exposed services behind a bastion and replacing the /16 route with a half-dozen /32s.
They also forced owners for every allowed destination. Suddenly, “temporary endpoints” stopped being invisible.

An optimization that backfired: centralizing everything through the office

Another organization had a mix of SaaS apps and internal tools. Someone decided full-tunnel VPN would “standardize security”:
all client traffic would go through HQ, so they could inspect it and apply consistent policies.
On paper, neat. In practice, this turned the VPN gateway into a choke point and the office internet link into the company’s unofficial backbone.

The first symptoms were subtle: video calls “occasionally stutter,” large file uploads “sometimes fail,” and people blame their home Wi‑Fi because that’s what everyone does.
Then the change rolled out to more users. DNS got slower because all queries hairpinned.
SaaS apps started rate-limiting because requests appeared to come from one or two egress IPs.

The security team responded by adding more inspection rules. Latency went up again.
Eventually the on-call engineer got the dreaded ticket: “VPN works but the internet is unusable.”
That’s not a networking puzzle; that’s architecture collecting its debt.

They backed out to split tunneling for internet-bound traffic and kept only a tight set of internal routes.
For inspection, they applied controls at endpoints and at the app layer where possible.
The lesson: pushing all traffic through one place is an availability gamble. Availability is part of security, even if the compliance spreadsheet disagrees.

The boring but correct practice that saved the day: explicit allowlists and change notes

A finance-adjacent company had a strict stance: VPN users got a dedicated pool, and the firewall forward chain was default deny.
Every allow rule had three pieces of metadata: service owner, business purpose, and an expiration review date.
It sounded bureaucratic until it wasn’t.

A partner organization reported they could suddenly reach an internal service they shouldn’t. Panic started, as it does.
The on-call engineer did not start “turning off the VPN.” They checked the forward chain counters and saw hits on a recently added rule.
The rule’s comment included the ticket reference and the owner.

The owner confirmed: during a maintenance window, they temporarily broadened access from a /32 to a /24 “to make testing easier,” and forgot to revert.
Because the rule was annotated and the change was traceable, the fix took minutes: revert the rule to the original /32 and redeploy.

No heroic debugging, no multi-hour war room, no vague blame. Just boring hygiene paying dividends.
If you want reliability, make reversibility cheap and accountability normal.

Common mistakes: symptom → root cause → fix

1) “I can reach everything on 10.x now”

Symptom: A VPN user can browse file shares, printer admin pages, and random UIs never mentioned in requirements.

Root cause: Broad routes pushed to clients and permissive forward policy (often default-accept) on the VPN gateway.

Fix: Remove broad route advertisements; set forward policy to drop; add explicit allows by destination and port. Block client-to-client unless required.

2) “VPN works, but one internal app is unreachable”

Symptom: Tunnel is up, DNS resolves, but TCP connects time out to a specific service.

Root cause: Missing firewall allow for that service/port, or destination-side firewall/security group lacks the VPN pool source.

Fix: Verify tuple in gateway forward rules; check counters/logs; add minimal allow; update destination ACLs. Avoid opening entire subnets “just to test.”

3) “Everything is slow on VPN; Zoom dies; SaaS logins are weird”

Symptom: After connecting, internet performance degrades; SaaS behaves unpredictably.

Root cause: Full-tunnel routing plus DNS hairpin through HQ; egress IP concentration triggers SaaS risk controls; gateway becomes bottleneck.

Fix: Use split tunneling for internet traffic; push only internal routes; scope DNS to internal zones; if inspection is required, use endpoint controls and selective proxies.

4) “It works for some users, fails for others”

Symptom: Same config, different results across users.

Root cause: NAT/firewall differences on client networks, MTU blackholes, or inconsistent per-user routing/AllowedIPs.

Fix: Compare peer configs; test MTU with DF pings; consider keepalives; avoid per-user snowflake routes unless necessary and documented.

5) “We can’t tell who accessed what”

Symptom: Logs show only gateway IP due to NAT, or users share keys.

Root cause: Masquerade hides client IP; lack of per-user credentials; poor log correlation.

Fix: Prefer routed pools without NAT; enforce per-user keys/certs; annotate keys; log connect/disconnect with identity and assigned IP.

6) “After the merger, VPN broke and now routes flap”

Symptom: Some internal ranges intermittently route to the wrong place; name resolution points to unreachable IPs.

Root cause: Overlapping RFC1918 ranges combined via VPN; ambiguous routes; split-horizon DNS not aligned with actual reachability.

Fix: Renumber where possible; otherwise use NAT at clear boundaries with strong logging; make routing more specific; align DNS views with reachable networks.

7) “VPN clients can scan each other”

Symptom: A client can connect to another client’s exposed ports via VPN IPs.

Root cause: Client-to-client allowed (OpenVPN client-to-client enabled, or forward rules allow wg0→wg0).

Fix: Disable client-to-client; block wg0→wg0 forwarding; if peer traffic is required, enforce explicit policy and consider separate pools/groups.

Checklists / step-by-step plan

Step-by-step: migrating from full LAN-to-LAN to least-privilege VPN

  1. Inventory what users actually need

    • List apps/services, not subnets. “Git over HTTPS,” “RDP to jump host,” “Postgres to reporting DB.”
    • Identify owners for each service and a business reason.
  2. Define a dedicated VPN client pool and segment

    • Pick a non-overlapping range (e.g., 10.250.0.0/24).
    • Route it properly (no NAT if you can avoid it).
  3. Turn on default-deny for forwarding from VPN

    • Forward chain policy drop, allow established/related.
    • Explicitly allow the minimum destinations/ports.
  4. Reduce routes pushed to clients

    • For WireGuard: tighten AllowedIPs per peer/group.
    • For OpenVPN: push specific routes only.
  5. Put admin access behind bastions

    • Prefer SSH/RDP gateways with MFA and logging.
    • Then allow bastion → target, not client → target.
  6. Scope DNS

    • Internal zones only; avoid forcing all DNS through VPN unless required.
    • Make internal names resolve to reachable IPs for VPN clients (no “dead” split-horizon records).
  7. Add identity mapping and key hygiene

    • Per-user keys/certs, no shared secrets.
    • Annotate ownership, purpose, and review date next to the config.
  8. Add logging you will actually use

    • Connection events + assigned IP, firewall denies (sampled), DNS queries for internal zones (if feasible).
    • Make it searchable by user, VPN IP, and destination.
  9. Roll out in phases

    • Start with a pilot group and the top 5 services.
    • Keep the old broad access behind an emergency toggle, but put an expiration date on the toggle.
  10. Run a tabletop incident drill

    • Practice: “Contractor key compromised, what can they reach?”
    • If the answer is “most of the LAN,” you’re not done.

Operational checklist: adding a new allowed service

  • Service owner approves and provides destination IP/hostname and port(s).
  • Decide: direct access vs via bastion/proxy.
  • Confirm destination-side ACLs allow the VPN pool (routed, not NATed, ideally).
  • Add firewall allow rule with a comment including owner/purpose and review date (see the sketch after this list).
  • Add only the needed route (prefer /32 or smallest subnet that hosts the service).
  • Test from a representative client network (home NAT, mobile hotspot, corporate Wi‑Fi).
  • Confirm logs show user identity for access attempts and denies.
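
The rule template from that checklist can carry its metadata inline, since nftables supports per-rule comments. A sketch; the owner/purpose/review format is a team convention, not anything nft enforces:

cr0x@server:~$ sudo nft add rule inet filter forward iif "wg0" oif "lan0" ip daddr 10.0.30.15 tcp dport 443 counter accept comment "owner=alice purpose=vendor-api review=2026-01-15"

Quarterly review then becomes: list the ruleset, grep for review dates in the past, and chase the owners.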

Security checklist: keeping the model tight over time

  • Quarterly review of allowed destinations per group/peer.
  • Expire vendor access automatically unless renewed.
  • Alert on new broad route announcements or changes to default policies.
  • Block known-dangerous protocols (SMB, RPC) from VPN unless explicitly required and constrained.
  • Enforce OS and endpoint security baselines for managed devices.

FAQ

1) Isn’t “full LAN-to-LAN” simpler and therefore safer?

It’s simpler in the same way removing brakes simplifies a car.
Fewer parts, fewer decisions—until the first time you need to stop. Full LAN-to-LAN expands blast radius and makes audits and incident response harder.
Simplicity that increases risk is not operational simplicity.

2) Split tunneling feels risky. Shouldn’t all traffic go through corporate security tools?

If you can guarantee capacity, reliability, and correct inspection without breaking SaaS, full-tunnel can be defensible.
Most organizations can’t. A pragmatic approach is split tunnel for internet, strict routes for internal services, and endpoint security controls for the rest.
Centralizing everything through HQ is an availability bet you place every workday.

3) Can I rely on WireGuard AllowedIPs alone for access control?

AllowedIPs is necessary but not sufficient. It’s a strong lever, but you still want firewall rules on the gateway and ACLs on the destination.
Defense in depth matters because someone will eventually add a route “temporarily.”

4) Should VPN clients be NATed to the gateway?

Avoid NAT when you can: it kills attribution at the destination and complicates audits.
Use routed pools and teach the LAN side how to return traffic to the VPN subnet. NAT is sometimes required for legacy networks,
but treat it as a compromise and add compensating controls.

5) How do I prevent VPN users from discovering internal hostnames?

Scope DNS. Don’t hand out a resolver that can answer the entire internal namespace if users only need one zone.
Use conditional forwarding or per-link DNS routing, and avoid leaking “mgmt” zones to general users.

6) What about “but the app uses dynamic ports”?

That’s a strong hint you should not be allowing the app directly over VPN.
Put it behind a gateway that normalizes access (proxy, bastion, published fixed ports), or redesign the service exposure.
Dynamic ports are fine inside a controlled segment; they’re not fine as a VPN policy requirement.

7) How do we handle vendors who need access “just for a week”?

Treat vendor access as expiring by default. Issue per-vendor credentials, restrict routes to the specific service, and set a review/expiration date.
If the work extends, renew deliberately. “It’s still needed” should be a choice, not a consequence of forgetting.

8) What’s the minimal viable setup if we’re small and busy?

Dedicated VPN pool, default-deny forwarding, and one bastion for admin access.
Add a small handful of explicit service allows. Annotate everything with ownership.
You can grow into fancier identity-aware access later, but you can’t retroactively shrink a flat network without pain.

9) How do we keep least-privilege from turning into endless ticket ping-pong?

Create a standard “service onboarding” path: owners provide destination/ports, you implement a rule template, and you review quarterly.
Also, push teams toward proxies/bastions so “access” is granted at a control point, not by spraying firewall rules across the LAN.
The goal is fewer decisions, not more.

Conclusion: practical next steps

The office VPN is not your LAN extension. It’s an access product, and like any product it needs a clear contract:
who can reach what, over which paths, with what logs, and with what expiration.
If you can’t state the contract, you don’t have control—you have connectivity.

Next steps you can do this week:

  1. Pick one VPN group (vendors or contractors are ideal) and replace broad routes with explicit destinations.
  2. Flip forwarding to default deny and add only the required allow rules with comments and review dates.
  3. Decide whether NAT is hiding your ability to audit; if yes, plan a routed pool migration.
  4. Scope DNS to internal zones that are actually required.
  5. Write down the “Fast diagnosis” playbook and keep it near the on-call runbook, not in someone’s head.

Do it right and your VPN becomes dull infrastructure: predictable, explainable, and uninteresting.
That’s the highest compliment production systems ever get.
