You finally get approval to connect Office A and Office B. Then you discover both LANs are 192.168.0.0/24, because of course they are. Everyone wants “just a VPN,” but routing doesn’t do magic: when both sides claim the same addresses, packets can’t know where “192.168.0.50” lives.
This is the reality of small-office networking meeting grown-up connectivity. The good news: you can connect them without renumbering. The bad news: you’ll need to be deliberate about which kind of “without renumbering” you mean, because some approaches are duct tape and some are a proper bracket.
What actually breaks when subnets overlap
If both offices use 192.168.0.0/24, you don’t have “a routing problem,” you have an identity problem. IP addresses are supposed to identify endpoints. When two endpoints share the same identity, things get weird fast.
Why a simple site-to-site VPN fails
Classic site-to-site VPN designs assume this: “Traffic destined for subnet X goes into the tunnel.” With overlap, both sites believe they are subnet X. So:
- A host in Office A sends to 192.168.0.50. Its OS decides “that’s on my local LAN,” ARPs for it, and never routes it to the VPN.
- If you force it into the VPN (policy routing, firewall rules), the far side receives a packet with a source address that also looks local, and replies get misrouted.
- Even if you “sort of” make it work, every service that bakes IPs into ACLs, logs, or config will become ambiguous and painful.
Joke #1: Overlapping subnets are like having two coworkers named “Alex” on the same on-call rotation. You can make it work, but your alerts will become performance art.
The three collisions you’ll hit
- Forwarding decision collision: hosts treat same-subnet destinations as on-link. They ARP, not route.
- Return path collision: replies go to the wrong place because the source/destination look local on both ends.
- State collision: firewalls/NAT tables can’t reliably distinguish flows that share the same 5-tuple semantics across sites if you don’t translate something.
The fix always involves making the traffic unambiguous. You do that by translating addresses, isolating routing tables, or moving the connectivity up the stack (overlay/proxy).
Interesting facts and historical context
Some perspective helps, because this mess isn’t new—it’s the predictable outcome of private addressing and decades of “just ship it.” Here are a few concrete bits of context that shape why overlapping subnets are so common:
- RFC 1918 (1996) popularized private address space (10/8, 172.16/12, 192.168/16), making it easy for everyone to pick the same subnets independently.
- 192.168.0.0/24 became “the default LAN” largely because early consumer routers shipped with it, and inertia is a powerful drug.
- NAT was originally treated as a pragmatic hack to conserve IPv4 addresses; it later became a security placebo and an architectural dependency.
- IPsec was designed for network-to-network security, but not for identity collisions; NAT traversal (NAT-T) solved a different problem (NAT on the path), not overlapping LANs.
- Early VPN appliances commonly pushed “policy-based” tunnels (encryption domains). That model collapses under overlap unless you add translation.
- VRFs (virtual routing and forwarding) came from carrier/ISP tech to keep customer routes separate on shared infrastructure; they’re now the cleanest way to keep overlapping networks distinct.
- Split-horizon DNS predates cloud by decades, and it’s still a practical tool when IP reachability is messy but names can be made consistent.
- IPv6 was supposed to end all of this. Instead, many orgs run dual-stack partially, keep IPv4 private LANs, and still overlap anyway.
Decision tree: what you should do (and what you’ll regret)
“Without renumbering” can mean several things. Be explicit before you touch a firewall:
- Hard requirement: endpoints keep their current IPs and still talk to each other directly by IP.
- Soft requirement: endpoints keep their IPs, but they can reach remote services via new IPs, DNS names, or a proxy.
- Temporary requirement: keep IPs for now, connect sites, then schedule renumbering later.
My opinionated guidance:
- If you need broad L3 reachability between most hosts on both sides: use NAT between sites (variously called network translation, twice NAT, or netmap), or VRFs if you’re in serious router land.
- If only a few services must be reachable (file server, RDP gateway, app): use application-layer proxies or publish services behind a reverse proxy/bastion.
- If you’re connecting users/devices, not networks: use an overlay with its own address space and identity model.
- Do not stretch L2 between offices to “make it one LAN.” Unless your job includes writing postmortems for sport.
And a single quote to keep you honest. Here’s a paraphrased idea often attributed to Peter Drucker: “If you can’t measure it, you can’t improve it.”
In networking terms: if you can’t observe it, you can’t debug it.
Solution patterns that work in production
Pattern 1: NAT / network translation (the usual winner)
This is the workhorse solution. You keep both offices on 192.168.0.0/24, but you introduce a translated subnet for one side (or both) when traffic crosses the inter-site link.
How it looks
- Office A LAN: 192.168.0.0/24
- Office B LAN: 192.168.0.0/24
- Translation for B as seen from A: 10.200.0.0/24 (example)
When an A host wants to reach a B host, it targets 10.200.0.x. The site-to-site edge device translates 10.200.0.x to 192.168.0.x across the tunnel, and also translates the source so return traffic comes back correctly.
Two variants: one-way vs two-way translation
- One-way (preferred when possible): Only Office A needs to initiate. A sees B as 10.200.0.0/24; B may not need to know A exists.
- Two-way: A sees B as 10.200.0.0/24 and B sees A as 10.201.0.0/24. This is more work but symmetrical and clearer for ACLs. (A sketch of this variant follows.)
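Here’s a minimal sketch of the two-way variant on a Linux edge at Office B, assuming nftables with netmap support (roughly nft 0.9.4+ on a 5.8+ kernel) and illustrative names (wg0 for the tunnel, a hypothetical /etc/nftables.d/intersite-b.nft for the file). Each edge owns its own translated prefix: inbound from the tunnel, it maps that prefix 1:1 onto its real LAN; outbound, it maps its real LAN back onto the prefix.
cr0x@server:~$ cat /etc/nftables.d/intersite-b.nft
table ip nat {
        chain prerouting {
                type nat hook prerouting priority dstnat; policy accept;
                # from the tunnel: translated B -> real B, host bits preserved
                iifname "wg0" dnat ip prefix to ip daddr map { 10.200.0.0/24 : 192.168.0.0/24 }
        }
        chain postrouting {
                type nat hook postrouting priority srcnat; policy accept;
                # toward the tunnel: real B -> translated B, so A sees unique sources
                oifname "wg0" snat ip prefix to ip saddr map { 192.168.0.0/24 : 10.200.0.0/24 }
        }
}
Office A’s edge mirrors this with 10.201.0.0/24, and each gateway routes the other side’s translated prefix into the tunnel. The tasks below show the other common placement, where translation happens on the LAN side before the tunnel; both work as long as the mapping stays deterministic and documented.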
Pros
- Works with normal routing; no endpoint changes if you use DNS and/or port-based service access.
- Compatible with IPsec route-based tunnels, WireGuard, GRE, etc.
- Scoped blast radius: translation only applies inter-site.
Cons
- Some apps break if they embed IPs in payloads (SIP, some old licensing, certain SMB edge cases).
- Operational complexity: troubleshooting NAT requires discipline and logging.
- Security teams sometimes dislike “hiding” addresses—though it’s not actually hiding, it’s just remapping.
What to avoid: ad-hoc SNAT “because it works.” Use deterministic, documented mapping (netmap or 1:1 where possible). Random port overload NAT across a tunnel is a future incident report.
Pattern 2: Application proxies (boring, effective, limited)
If the real goal is “Office A users need the accounting app in Office B” and that’s it, don’t build a network merger. Publish the app properly.
Examples:
- Reverse proxy for HTTP(S) apps (Nginx/HAProxy) in a DMZ or service network.
- RDP gateway / bastion host for a few Windows servers.
- File access through a dedicated service (SFTP/HTTPS) instead of full SMB across sites.
This bypasses the subnet overlap entirely: clients connect to a single reachable IP/name, not to random internal hosts.
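As a toy illustration (not a production proxy), a single relay host with a unique, reachable address can publish one service with no network merger at all. The sketch below uses socat to relay a listening port to the accounting server’s LAN address; the relay host, port numbers, and target IP are assumptions:
cr0x@server:~$ sudo socat TCP-LISTEN:13389,fork,reuseaddr TCP:192.168.0.40:3389
Clients connect to the relay’s unique IP on port 13389 and never need to reach 192.168.0.40 directly. A real deployment would put a proper gateway (RD Gateway, Nginx, HAProxy) there, with authentication and logging.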
Pattern 3: Overlay networks (WireGuard, “Zero Trust”, mesh)
Overlays assign their own address space (often 100.64.0.0/10 CGNAT-like ranges or a dedicated 10.x), and they route traffic based on identity and policy rather than “is it on my LAN.”
Good when:
- You need device-to-device connectivity, not subnet-to-subnet.
- You don’t control the edge routers cleanly (managed ISP CPE, third-party firewalls, politics).
- You want gradual rollout: add endpoints over time.
But: an overlay doesn’t automatically solve “I want every printer to see every laptop.” It solves “specific endpoints can reach specific endpoints,” which is usually what you actually want once you stop asking for “a VPN.”
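Concretely, a device’s overlay config can look like this sketch, assuming a wg-quick style file with illustrative keys and overlay addresses carved from 100.64.0.0/10. Each device gets a unique overlay IP, and reachability policy lives in AllowedIPs rather than in the overlapping LANs:
cr0x@server:~$ cat /etc/wireguard/wg0.conf
[Interface]
# This device's overlay identity; unrelated to the overlapping 192.168.0.0/24
Address = 100.64.0.11/32
PrivateKey = <redacted>

[Peer]
# File server in Office B, reached only by its overlay address
PublicKey = p9nS...redacted...
Endpoint = 198.51.100.10:51820
AllowedIPs = 100.64.0.21/32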
Pattern 4: VRFs and L3 segmentation (enterprise-grade)
If you have routers/switches that support VRFs (or firewall instances/VDOMs), you can keep both 192.168.0.0/24 networks separate by placing them in distinct routing tables. Then you selectively leak routes or NAT between VRFs at controlled points.
This is the cleanest model when you’re integrating many sites that collide (it happens in acquisitions). It’s also the most likely to be implemented incorrectly if your team hasn’t run VRFs in anger before. The tooling and observability must be there: per-VRF routes, per-VRF firewall policies, and clear diagrams.
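On a Linux box the moving parts look roughly like this (the VRF name and table number are illustrative; production versions usually live on routers and firewalls, but the model is identical): an interface enslaved to a VRF consults that VRF’s routing table instead of the global one, so two identical 192.168.0.0/24 routes can coexist without fighting.
cr0x@server:~$ sudo ip link add vrf-siteb type vrf table 100
cr0x@server:~$ sudo ip link set vrf-siteb up
cr0x@server:~$ sudo ip link set eth1 master vrf-siteb
cr0x@server:~$ ip route show vrf vrf-siteb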
Pattern 5: Bridging/L2 stretch (the “don’t” section)
Someone will propose “just bridge the networks” so it’s all one LAN. That means extending broadcast domains (ARP, DHCP, multicast) across a WAN link. The overlap problem doesn’t disappear; it mutates into a chaos problem.
Joke #2: Stretching Layer 2 across offices is like putting your datacenter on a power strip from the convenience store next door. It technically delivers electricity.
A WAN is not a LAN. Keep it routed. If you must do L2 for a niche legacy requirement, do it with explicit containment (EVPN/VXLAN, storm control, DHCP guard) and a rollback plan that you’ve tested while fully caffeinated.
Fast diagnosis playbook
When connectivity fails between overlapping subnets, people waste hours staring at the tunnel status. Don’t. Your bottleneck is usually one of three things: (1) host routing decision, (2) translation policy mismatch, (3) return path/ACL state. Here’s the order that finds truth fast.
1) Confirm what the host is trying to do (on-link vs routed)
- From a client in Office A, check the route to the target IP (or name) it’s using.
- If it believes the destination is on-link (same /24), it will ARP and never hit the gateway.
2) Confirm the edge device sees packets and applies the intended NAT
- Check firewall/NAT hit counters.
- Use packet capture on inside and tunnel interfaces to prove translation.
3) Confirm the return path is symmetric
- Look for replies leaving the wrong interface.
- Check conntrack/state table entries and reverse NAT.
4) Only then suspect the tunnel or MTU
- Most “VPN is up but nothing works” cases are policy/NAT/route issues, not crypto.
- If you see partial connectivity (ICMP works, TCP stalls), then look at MTU/MSS and fragmentation.
Practical tasks: commands, outputs, decisions (12+)
The following tasks assume Linux-based gateways/hosts for demonstration because the commands are explicit and inspectable. The same concepts map to firewalls and routers, but with different UI paint.
Task 1: Identify the local subnet and gateway on a client
cr0x@server:~$ ip -4 addr show dev eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
inet 192.168.0.23/24 brd 192.168.0.255 scope global eth0
valid_lft forever preferred_lft forever
What it means: the host is in 192.168.0.0/24. Any 192.168.0.x destination looks “local” to the kernel.
Decision: if you need this host to reach a remote office host also numbered 192.168.0.x, you must use a different destination IP (translated/overlay/proxy). Do not attempt to route 192.168.0.0/24 to the tunnel from this host; it will never pick the route.
Task 2: Prove the host will ARP instead of route
cr0x@server:~$ ip route get 192.168.0.50
192.168.0.50 dev eth0 src 192.168.0.23 uid 1000
cache
What it means: it’s on-link (dev eth0), not via a gateway.
Decision: don’t use 192.168.0.50 as the “remote” address from this LAN. Use a translated prefix like 10.200.0.50 (with NAT on the edge), or a DNS name that resolves to that translated address.
Task 3: Check the ARP table after a failed ping
cr0x@server:~$ ping -c 1 192.168.0.50
PING 192.168.0.50 (192.168.0.50) 56(84) bytes of data.
--- 192.168.0.50 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms
cr0x@server:~$ ip neigh show to 192.168.0.50
192.168.0.50 dev eth0 INCOMPLETE
What it means: the host ARPed on the local LAN, got no reply, and never tried the VPN gateway.
Decision: any “fix” that focuses on the VPN tunnel is premature. The endpoint is not sending traffic to the tunnel.
Task 4: Validate that the translated prefix is routable via the gateway
cr0x@server:~$ ip route get 10.200.0.50
10.200.0.50 via 192.168.0.1 dev eth0 src 192.168.0.23 uid 1000
cache
What it means: traffic to 10.200.0.0/24 will go to the gateway (192.168.0.1), which can then NAT/tunnel it.
Decision: proceed with edge NAT and tunnel policy for 10.200.0.0/24 as “remote B.” If this route doesn’t exist, add it via DHCP option 121/classless static routes, or advertise it via your internal routing protocol.
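If dnsmasq happens to be your DHCP server, pushing the classless static route (option 121) is one line; this is a sketch using the example prefix and gateway from above, and clients typically pick it up at lease renewal rather than instantly:
cr0x@server:~$ grep dhcp-option /etc/dnsmasq.conf
dhcp-option=121,10.200.0.0/24,192.168.0.1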
Task 5: On the gateway, confirm IP forwarding is enabled
cr0x@server:~$ sysctl net.ipv4.ip_forward
net.ipv4.ip_forward = 1
What it means: the gateway will route packets between interfaces.
Decision: if this is 0, nothing else matters. Enable it and ensure it persists. If it is 1, move on to routing and NAT rules.
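Enabling it live and making it persist are two separate actions; a common sketch (the drop-in file name is arbitrary):
cr0x@server:~$ sudo sysctl -w net.ipv4.ip_forward=1
net.ipv4.ip_forward = 1
cr0x@server:~$ echo 'net.ipv4.ip_forward = 1' | sudo tee /etc/sysctl.d/99-forward.conf
net.ipv4.ip_forward = 1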
Task 6: Check routes on the gateway for the translated subnet and tunnel
cr0x@server:~$ ip route show
default via 203.0.113.1 dev wan0
10.200.0.0/24 dev wg0 scope link
192.168.0.0/24 dev lan0 scope link src 192.168.0.1
What it means: the gateway believes 10.200.0.0/24 is reachable via the tunnel interface (wg0).
Decision: if 10.200.0.0/24 is missing or points somewhere else, fix routing first. NAT rules won’t rescue bad routing.
Task 7: Verify WireGuard peer AllowedIPs (common failure point)
cr0x@server:~$ wg show
interface: wg0
public key: 6YcD...redacted...
listening port: 51820
peer: p9nS...redacted...
endpoint: 198.51.100.10:51820
allowed ips: 10.200.0.0/24
latest handshake: 41 seconds ago
transfer: 120.33 KiB received, 98.11 KiB sent
What it means: the peer is up and the tunnel will accept/route 10.200.0.0/24.
Decision: if AllowedIPs doesn’t include your translated prefix, the kernel will drop or misroute traffic. Fix AllowedIPs, then retest.
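Fixing it live is one command, reusing the peer key from the output above. Note that allowed-ips replaces the peer’s list rather than appending, and runtime changes don’t write themselves back into wg-quick config files:
cr0x@server:~$ sudo wg set wg0 peer p9nS...redacted... allowed-ips 10.200.0.0/24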
Task 8: Inspect NAT rules (nftables) and confirm counters increment
cr0x@server:~$ sudo nft list ruleset
table ip nat {
        chain prerouting {
                type nat hook prerouting priority dstnat; policy accept;
                iifname "lan0" dnat ip prefix to ip daddr map { 10.200.0.0/24 : 192.168.0.0/24 }
        }
        chain postrouting {
                type nat hook postrouting priority srcnat; policy accept;
                oifname "wg0" ip saddr 192.168.0.0/24 snat to 10.201.0.1
        }
}
What it means: traffic from the LAN destined to 10.200.0.0/24 is destination-NATed 1:1 (netmap) onto 192.168.0.0/24, preserving the host bits, so 10.200.0.50 becomes 192.168.0.50 (prefix-to-prefix translation needs a reasonably recent kernel/nftables). Outbound to the tunnel is source-NATed so the far side can return traffic to a unique source.
Decision: if you only DNAT without a corresponding SNAT (or vice versa), replies will likely vanish into the wrong LAN. Use symmetric translation unless you have a proven asymmetric design.
Task 9: Use conntrack to confirm state and translation
cr0x@server:~$ sudo conntrack -L | head
tcp 6 431999 ESTABLISHED src=192.168.0.23 dst=10.200.0.50 sport=51544 dport=445 src=192.168.0.50 dst=10.201.0.1 sport=445 dport=51544 [ASSURED] mark=0 use=1
What it means: conntrack records both directions: the original flow (src=192.168.0.23 dst=10.200.0.50) and the expected reply tuple (src=192.168.0.50 dst=10.201.0.1). That proves both translations happened: the destination was rewritten to the real B host, and replies will return to the unique SNAT source.
Decision: if state never appears, traffic isn’t reaching the gateway or is blocked by firewall policy before NAT. If state appears but no return traffic, troubleshoot the far-side routing/NAT and tunnel policy.
Task 10: Packet capture on LAN and tunnel interfaces to prove NAT behavior
cr0x@server:~$ sudo tcpdump -ni lan0 host 192.168.0.23 and host 10.200.0.50
IP 192.168.0.23.51544 > 10.200.0.50.445: Flags [S], seq 123456, win 64240, length 0
cr0x@server:~$ sudo tcpdump -ni wg0 host 192.168.0.50 and port 445
IP 10.201.0.1.51544 > 192.168.0.50.445: Flags [S], seq 123456, win 64240, length 0
What it means: you see the original destination on LAN (10.200.0.50) and the post-NAT packet on the tunnel (dst 192.168.0.50, src 10.201.0.1).
Decision: if LAN capture shows traffic but tunnel capture doesn’t, NAT or routing is wrong on the gateway. If tunnel shows outbound but no inbound replies, the far side isn’t returning (policy, routing, or asymmetric NAT).
Task 11: Check reverse path filtering (rp_filter) on Linux gateways
cr0x@server:~$ sysctl net.ipv4.conf.all.rp_filter net.ipv4.conf.wg0.rp_filter
net.ipv4.conf.all.rp_filter = 1
net.ipv4.conf.wg0.rp_filter = 1
What it means: strict reverse path filtering may drop packets that arrive on an interface that doesn’t match the kernel’s idea of the return route. NAT plus tunnels plus multiple interfaces is exactly where this bites.
Decision: for gateways doing NAT across tunnels, set rp_filter to loose mode (2) or disable per interface, but only with understanding and compensating firewall policy. If you see mysterious one-way traffic, this is a prime suspect.
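The per-interface loosening is a one-liner; the kernel applies the maximum of the all and per-interface values, so setting wg0 to 2 while all stays at 1 yields loose mode on wg0 only:
cr0x@server:~$ sudo sysctl -w net.ipv4.conf.wg0.rp_filter=2
net.ipv4.conf.wg0.rp_filter = 2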
Task 12: Test MTU/MSS issues (the “VPN is up but TCP stalls” classic)
cr0x@server:~$ ping -M do -s 1380 -c 2 10.200.0.50
PING 10.200.0.50 (10.200.0.50) 1380(1408) bytes of data.
1388 bytes from 10.200.0.50: icmp_seq=1 ttl=63 time=22.1 ms
1388 bytes from 10.200.0.50: icmp_seq=2 ttl=63 time=21.7 ms
--- 10.200.0.50 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1002ms
rtt min/avg/max/mdev = 21.7/21.9/22.1/0.2 ms
What it means: with DF set, a 1380-byte payload works. If this fails while smaller pings succeed, you have an MTU issue on the path/tunnel.
Decision: clamp TCP MSS on the tunnel edge and/or lower the tunnel MTU. Don’t “fix” it by disabling PMTUD globally; that’s how you create new problems.
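With nftables, clamping MSS to the path MTU is one rule; this sketch assumes the inet filter forward chain shown in Task 14 and applies only to SYNs heading into the tunnel:
cr0x@server:~$ sudo nft add rule inet filter forward oifname "wg0" tcp flags syn tcp option maxseg size set rt mtu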
Task 13: Verify DNS answers match your translation plan
cr0x@server:~$ dig +short fileserver.officeb.example
10.200.0.50
What it means: clients are being pointed at the translated address, not the ambiguous real LAN address.
Decision: if DNS returns 192.168.0.50 for a remote service, clients will ARP and fail. Fix DNS (split-horizon or conditional forwarding) so names resolve to translated/overlay addresses.
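With dnsmasq, pinning a remote service name to its translated address is a single override; this sketch reuses the example name from above (BIND views or Windows DNS conditional forwarders achieve the same per-site answers):
cr0x@server:~$ grep officeb /etc/dnsmasq.conf
address=/fileserver.officeb.example/10.200.0.50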
Task 14: Confirm firewall policy isn’t silently dropping the translated traffic
cr0x@server:~$ sudo nft list chain inet filter forward
table inet filter {
        chain forward {
                type filter hook forward priority filter; policy drop;
                ct state established,related accept
                iifname "lan0" oifname "wg0" ip daddr 192.168.0.0/24 tcp dport { 22, 445, 3389 } accept
        }
}
What it means: default drop is in place, and only specific ports are allowed from LAN to tunnel toward the remote LAN after DNAT.
Decision: if you intended broad access but only allowed a few ports, you’ll see selective failures. Adjust policy to match requirements. Also: keep it tight. “Any-any across sites” is how internal malware becomes a frequent traveler.
Three corporate mini-stories
Mini-story 1: The incident caused by a wrong assumption
A mid-sized company acquired a smaller one and needed “quick connectivity” for finance systems. Both offices were on 192.168.0.0/24, but nobody noticed because the ticket said “set up site-to-site VPN, subnets are standard.” The network team configured an IPsec policy-based tunnel with local and remote encryption domains both set to 192.168.0.0/24. The tunnel came up. The dashboard was green. Everyone went home.
The next morning, accounting couldn’t reach anything. The first assumption: “VPN is down.” It wasn’t. The second assumption: “DNS issue.” Not really. The actual problem was banal: clients in Office A tried to access a server in Office B by IP (192.168.0.40), and every one of them ARPed locally. The helpdesk reported “no route to host” on Linux machines and “request timed out” on Windows. Engineers stared at IPsec SAs and rekey timers for two hours.
The breakthrough came from a packet capture on a client switchport: ARP requests for 192.168.0.40 repeating like a metronome. No default gateway involvement. No VPN traffic. The tunnel was innocent.
They fixed it by deploying NAT at the edge: Office B became 10.200.0.0/24 as seen from Office A, and DNS names for B services resolved to those translated IPs. The incident ended quietly. The postmortem theme wasn’t “IPsec is hard.” It was “we assumed uniqueness without checking.”
Mini-story 2: The optimization that backfired
Another org connected two overlapping sites with translation and things mostly worked. Then someone tried to “optimize latency” by adding a second tunnel over a cheaper ISP link and enabling ECMP (equal-cost multipath) between tunnels. The goal was load sharing. The reality was conntrack and NAT state getting sprayed across two paths without coordination.
Some flows landed on tunnel A, got NATed, and replies came back on tunnel B. Depending on the firewall, those replies were either dropped as invalid state, or accepted and then reverse-translated incorrectly. Users reported symptoms that looked random: file copies failing at 37%, RDP sessions freezing, and web apps “logging out” mid-click.
The team initially blamed the cheaper ISP for packet loss. There was some packet loss, sure. But the core issue was stateful translation across asymmetric paths. The tunnels were healthy; the architecture wasn’t.
The fix was not dramatic: pin flows (policy-based routing or connection marks) so both directions of a connection stay on one tunnel, or run active/standby instead of ECMP for stateful NAT traffic. “Optimization” became “stability regression” because nobody asked whether the system was allowed to be stateless. It wasn’t.
Mini-story 3: The boring but correct practice that saved the day
A global company had a habit: before any inter-site connectivity change, they produced a one-page “Address Identity Sheet.” It listed every relevant subnet, translation range, DNS behavior, and who owned it. It was dull. It also prevented a pile of outages.
During an office move, a contractor reconfigured a branch router and accidentally changed the DHCP pool from 192.168.1.0/24 to 192.168.0.0/24 because “that’s what we always use.” Overnight, printers and a couple of workstations pulled new leases. The site-to-site translation design now overlapped with local reality, and the first report was “remote file server is down.”
The on-call engineer pulled the Address Identity Sheet, saw that 192.168.0.0/24 was reserved for translation semantics in that region, and immediately suspected the local subnet had drifted. A five-minute check of DHCP logs confirmed it. They reverted the DHCP scope, flushed a handful of leases, and the “VPN outage” evaporated.
No heroics. No vendor calls. Just a boring document and the discipline to keep it accurate. In operations, boring is often the most sophisticated outcome.
Common mistakes: symptoms → root cause → fix
1) Symptom: “VPN is up, but I can’t ping remote 192.168.0.x”
Root cause: the client treats 192.168.0.x as on-link and ARPs locally; traffic never enters the tunnel.
Fix: do not target the remote site’s overlapping subnet directly. Use a translated prefix (e.g., 10.200.0.0/24), an overlay IP, or a proxy address. Adjust DNS accordingly.
2) Symptom: “Only one direction works” (A can reach B, B can’t reach A)
Root cause: asymmetric translation or missing SNAT/DNAT pairing; return traffic is routed locally on the far side.
Fix: implement symmetric mapping or ensure the far side has a route back to the translated source. Verify conntrack entries and captures on both sides.
3) Symptom: “Some ports work, SMB breaks, RDP resets”
Root cause: firewall policy mismatch, helper modules, or MTU/MSS problems; sometimes NAT hairpins unexpected flows.
Fix: confirm allowed ports in the forward policy, test PMTUD with DF pings, clamp MSS on the tunnel edge, and avoid overly clever ALGs unless you need them.
4) Symptom: “It works for a few minutes, then stops until we restart the tunnel”
Root cause: NAT state timeout, asymmetric routing, or ECMP across stateful devices; sometimes rekey changes the path.
Fix: enforce symmetric routing, increase relevant timeouts if justified, and avoid load sharing without state synchronization.
5) Symptom: “Only a few hosts can connect; others can’t”
Root cause: overlapping again at a smaller scope (static routes on some clients, multiple VLANs), duplicate IPs, or DHCP scope conflicts.
Fix: inventory addressing. Verify client routes and ARP behavior. Lock down DHCP scopes and ensure translated prefixes are not used anywhere locally.
6) Symptom: “DNS resolves, but connections go to the wrong machine”
Root cause: split-horizon DNS misconfigured; internal name resolves to local 192.168.0.x instead of translated/overlay IP.
Fix: implement conditional forwarding or views so each office gets the correct answer. For shared services, prefer names that resolve to unambiguous addresses.
Checklists / step-by-step plan
Step-by-step plan: connect two overlapping /24s with NAT over a tunnel
- Choose translation prefixes that don’t exist anywhere else. Don’t pick another “cute” default like 192.168.1.0/24. Use something obviously “virtual,” like 10.200.0.0/24 for Office B and 10.201.0.0/24 for Office A (as seen from B).
- Decide whether you need one-way or two-way initiation. If only a handful of services in B need to be accessed from A, one-way may be enough.
- Pick the connectivity substrate: route-based IPsec, WireGuard, GRE over IPsec, etc. Route-based makes NAT easier to reason about.
- Add routes: on the Office A gateway, route 10.200.0.0/24 into the tunnel. On the Office B gateway, route 10.201.0.0/24 into the tunnel (if two-way).
- Implement deterministic NAT:
  - DNAT 10.200.0.x to 192.168.0.x at the Office B edge (or the Office A edge, depending on design).
  - SNAT sources to a translated source that the far side can return to without ambiguity.
- Update DNS: make remote services resolve to translated IPs. Use split-horizon if the same name must resolve differently per site.
- Constrain access with firewall policy: allow required ports only. Start restrictive, expand with intent.
- Validate with packet captures at three points: client LAN, local gateway LAN side, gateway tunnel side.
- Harden MTU/MSS before users complain. Clamp MSS on the tunnel boundary if you can.
- Write down the mapping in a place people will actually find (ticket + runbook). Translation is a design choice, not a secret.
- Monitoring: add checks for tunnel health, NAT rule hit counters, and synthetic transactions, e.g., a TCP connect to the remote service IP/port (a one-line sketch follows this list).
- Rollback plan: be ready to disable NAT rules and routes cleanly. Don’t improvise under pressure.
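A synthetic check can be as blunt as a TCP connect with a timeout; the target and port reuse this plan’s examples, and the success line is what openbsd-netcat prints:
cr0x@server:~$ nc -vz -w 3 10.200.0.50 445
Connection to 10.200.0.50 445 port [tcp/microsoft-ds] succeeded!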
Operational checklist: before you declare it “done”
- Every cross-site service has a name that resolves to an unambiguous address from each office.
- NAT mappings are deterministic and documented (which translated range maps to which real range).
- Return traffic is proven with captures; not assumed.
- Firewall policies are principle-of-least-privilege, with explicit allow rules and logged drops during rollout.
- MTU/MSS tested with DF pings and a real TCP transfer.
- You can explain the design to a tired engineer at 3 a.m. using one diagram.
FAQ
1) Can I solve overlapping subnets with static routes alone?
No. If a host believes the destination is on-link (same subnet), it ARPs and never consults your clever static route. You must change the destination identity (translation/overlay/proxy) or change the host subnet mask (which is renumbering in disguise).
2) What about changing one office to /23 or /25 to “separate” them?
That’s partial renumbering and usually causes more collateral damage than people expect: DHCP scopes, ACLs, printer configs, and hardcoded assumptions. It can be a stepping stone, but don’t pretend it’s painless.
3) Is NAT between sites “bad practice”?
It’s a tradeoff, not a sin. For overlapping networks, NAT is often the least risky path to connectivity without touching endpoints. The bad practice is undocumented, inconsistent NAT that turns troubleshooting into archaeology.
4) Which translated range should I use?
Pick something unlikely to collide with existing or future networks. Many orgs reserve a block like 10.200.0.0/16 for translations. The exact choice matters less than making it globally unique in your environment and writing it down.
5) Will SMB/file shares work through NAT?
Usually yes, if MTU/MSS is sane and stateful firewalls aren’t mangling it. But SMB is sensitive to latency and packet loss, and it loves long-lived connections. Test real transfers, not just pings.
6) Do I need to translate both source and destination?
In most overlapping-subnet cases, yes, at least for the initiating direction. DNAT alone often fails because the far side routes replies locally (since the source looks like its own LAN). Symmetric translation makes return paths unambiguous.
7) Can overlays (like WireGuard) avoid NAT entirely?
Often, yes. If each endpoint gets an overlay IP in a unique range, endpoints talk using overlay addresses, not the overlapping LAN addresses. But if you need “every host in A can reach every host in B by their existing IPs,” overlays won’t grant that wish without additional translation or proxies.
8) What if I only need access to a handful of servers in the other office?
Use a proxy or publish those services behind a small service network with unique addressing. It’s less moving parts than full network translation, and it’s easier to secure.
9) How do cloud VPNs handle overlapping CIDRs (AWS/Azure style situations)?
They generally don’t “handle” it magically; they require you to use NAT/translation at the edge, or to assign unique address spaces per network attachment. Overlap is a topology problem, not a vendor checkbox.
10) What’s the cleanest long-term fix?
Renumbering to unique subnets, with a real IP plan. NAT is a perfectly acceptable bridge (sometimes for years), but the cleanest long-term identity model is unique addressing plus good routing.
Conclusion: next steps you can actually execute
If you connect two offices that both live on 192.168.0.0/24, you’re not fighting encryption. You’re fighting ambiguity. Your job is to make packets able to answer one question: “Which 192.168.0.50 do you mean?”
Practical next steps:
- Pick a translated prefix for one office (10.200.0.0/24 is fine as an example) and reserve it so nobody uses it locally later.
- Decide the access model: full L3 reachability (NAT/VRF) vs service reachability (proxy) vs device reachability (overlay).
- Implement route-based connectivity (WireGuard or route-based IPsec), then add deterministic NAT with logging and counters.
- Fix DNS so humans use names and names resolve to unambiguous addresses from each site.
- Run the fast diagnosis playbook once during rollout on purpose, so you can do it faster during the inevitable 2 a.m. surprise.
The most professional outcome isn’t “we made it work.” It’s “we made it predictable, observable, and easy to explain.” That’s how you connect overlapping networks without turning the next on-call into a scavenger hunt.