You built a ZeroTier network. Everyone’s “ONLINE.” The controller looks happy. And yet:
ping times out, SSH won’t connect, and your app behaves like it’s still stranded on a hotel Wi‑Fi network in 2009.
You don’t need vibes. You need packets.
This guide is for the moment where “works on my laptop” meets production reality: multiple OSes, firewalls you didn’t configure,
NAT you can’t control, and a coworker who insists ICMP is “optional.”
What ZeroTier really is (and what it isn’t)
ZeroTier is an overlay network: it creates a virtual Ethernet-like interface on each node and uses encrypted peer-to-peer
transport where possible. When peer-to-peer is impossible (strict NAT, blocked UDP, etc.), it can relay.
Think of it as “software-defined LAN over the internet,” except it behaves more like a hybrid:
sometimes L2-ish, often L3-ish, and always dependent on your real network conditions.
The practical implication: when you can’t ping, it’s rarely “ZeroTier is down.”
It’s usually one of five things:
authorization, addressing, OS firewall policy, routing/overlap, or path MTU/fragmentation.
The rest of this article is how to prove which one, quickly.
What ZeroTier is not: it’s not a magic bypass for corporate egress policy, and it’s not a substitute for
understanding routes. Also, it won’t make a Windows host reply to ICMP if Windows Firewall says “no.”
If you’re reading this because “it says ONLINE but I can’t ping,” welcome to the club.
Facts and context that make troubleshooting easier
Some history and mechanics matter because they explain failure modes. Here are nine concrete points worth knowing.
- ZeroTier uses UDP by default (commonly port 9993). If UDP egress is blocked, you may get relays or nothing at all.
- Nodes have a 10‑digit hexadecimal address derived from their cryptographic keys. That identity persists; changing IP doesn’t change the node identity.
- Networks are identified by a 16‑digit hexadecimal network ID. Joining the wrong network is a classic “everything looks fine” mistake.
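- “ONLINE” doesn’t mean “reachable”. In zerotier-cli info it means the node can reach ZeroTier’s root servers; in a controller UI it means the controller has heard from the node recently. Neither means “ICMP will work end-to-end.”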
- Direct peer-to-peer is opportunistic. NAT traversal succeeds or fails based on real-world NAT behavior, hairpin support, and port mapping.
- Relays are normal. When you see “RELAY,” latency goes up and throughput goes down. That can break apps that are picky about RTT or MTU.
- Managed routes are controller-side policy. If you expect site-to-site connectivity without announcing routes, you’ll be disappointed repeatedly.
- Bridging is powerful and risky. Bridging can turn a clean overlay into a broadcast party. Great for some legacy cases, terrible for “why is this slow?”
- Subnet overlaps are poison. If your ZeroTier network uses 192.168.1.0/24 and someone’s home router does too, that host has two routes for the same addresses and packets will take the dumbest possible path, often straight onto the local LAN.
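A quick sanity check for the ID facts above, as a minimal sketch in POSIX shell. The function names are made up for this example. It also leans on one detail not stated above: the first 10 hex digits of a network ID are the node address of that network's controller, which makes a wrong-network join easier to spot.

```sh
# is_hex_of_len STRING LENGTH: succeed if STRING is exactly LENGTH hex digits.
is_hex_of_len() {
  case $1 in
    ''|*[!0-9a-fA-F]*) return 1 ;;   # empty, or contains a non-hex character
  esac
  [ "${#1}" -eq "$2" ]
}

# Hypothetical helpers: shape checks only, they do not contact any controller.
is_node_id() { is_hex_of_len "$1" 10; }
is_nwid()    { is_hex_of_len "$1" 16; }

# nwid_controller NWID: print the controller's node address (first 10 digits).
nwid_controller() {
  is_nwid "$1" || return 1
  printf '%s\n' "$1" | cut -c1-10
}
```

If nwid_controller prints an address that is not your controller, you joined someone else's network, no matter how healthy the status looks.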
Build a private network that behaves
Pick an address plan that won’t collide with reality
Do not pick 192.168.0.0/24 or 10.0.0.0/24 unless your hobby is debugging split-brain routing.
Use something boring and unlikely: a dedicated slice of 10/8 (but not the usual suspects), or if your environment can handle it,
use IPv6 inside ZeroTier.
A simple, low-collision choice: 10.147.0.0/16 for the overlay, then carve /24s if you later bridge sites.
The exact range doesn’t matter; the lack of overlap does.
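To check a candidate overlay range against networks you already know about, here is a small sketch in POSIX shell. ip_to_int and cidr_overlap are hypothetical helper names; the sketch is IPv4 only and assumes well-formed input.

```sh
# ip_to_int A.B.C.D: print the address as a 32-bit integer.
ip_to_int() {
  local IFS=.
  set -- $1            # split the dotted quad into $1..$4
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# cidr_overlap NET1/LEN1 NET2/LEN2: succeed if the two blocks share any address.
# Two CIDR blocks overlap exactly when they agree on the shorter prefix.
cidr_overlap() {
  local net1 len1 net2 len2 len mask
  net1=${1%/*}; len1=${1#*/}
  net2=${2%/*}; len2=${2#*/}
  if [ "$len1" -lt "$len2" ]; then len=$len1; else len=$len2; fi
  if [ "$len" -eq 0 ]; then
    mask=0
  else
    mask=$(( (4294967295 << (32 - $len)) & 4294967295 ))
  fi
  [ $(( $(ip_to_int "$net1") & $mask )) -eq $(( $(ip_to_int "$net2") & $mask )) ]
}
```

For example, cidr_overlap 10.147.0.0/16 192.168.1.0/24 fails (no overlap), while cidr_overlap 10.0.0.0/8 10.0.0.0/24 succeeds. Run it against every office, VPN, and container range you can list before committing to an overlay subnet.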
Decide: routing or bridging (don’t “kind of” do both)
Routing is almost always the right answer for site-to-site connectivity. Bridging is for when you truly need L2 adjacency
(some ancient discovery protocol, non-routable app assumptions, or a vendor appliance that thinks routers are a conspiracy).
Bridging increases blast radius. Broadcast and multicast can spill into the overlay, and that’s how you get weird latency spikes.
If your goal is “SSH to servers and reach a couple subnets,” route it.
Control plane vs data plane: authorization is not optional
A private network requires member authorization. If you forget to authorize, the node may show up but won’t pass traffic.
This is not a subtle issue; it’s an “I swear it’s connected” issue.
Establish a baseline: ping is not the first test
ICMP is useful, but it’s also commonly blocked. Your first baseline should be:
(1) can nodes see each other as peers, and (2) can you pass TCP to a known open port (SSH, HTTPS, or a temporary netcat listener).
Joke #1: Ping is like a smoke alarm—when it’s silent you’re either safe, or the battery died months ago and nobody told you.
Fast diagnosis playbook (check this order)
When you’re on-call, you don’t get points for creativity. You get points for speed and certainty.
Here’s the order that finds the bottleneck with the least thrashing.
1) Confirm membership and authorization
- Is the node joined to the correct network ID?
- Is it authorized on the controller?
- Does it have an assigned ZeroTier IP?
2) Confirm the local interface and routes
- Does the ZeroTier interface exist and have an IP?
- Is the route table sending traffic to the ZeroTier interface, or to the wrong default gateway due to overlap?
3) Confirm peer path quality (direct vs relay) and UDP reachability
- Are peers “DIRECT” or “RELAY”?
- Is UDP 9993 blocked outbound/inbound?
- Are you behind symmetric NAT that kills P2P?
4) Check host firewalls and ICMP policy
- Does the OS firewall allow ICMP and the target TCP ports?
- Is the policy different for “Public” vs “Private” profiles (Windows)?
5) Check MTU/fragmentation when small packets work and big ones don’t
- Can you ping with small payload but not large?
- Do you see PMTUD blackholes due to blocked ICMP fragmentation-needed?
6) Only then: get fancy
- Bridging loops, managed routes misconfig, policy-based routing, container namespace quirks, moons, and so on.
Practical tasks: commands, outputs, and decisions
The following tasks are intentionally operational. Each one includes a command, realistic output, what it means,
and the next decision you should make. Run these on both ends (source and destination) unless the task is explicitly controller-side.
Task 1: Verify ZeroTier service is running (Linux systemd)
cr0x@server:~$ systemctl status zerotier-one --no-pager
● zerotier-one.service - ZeroTier One
Loaded: loaded (/lib/systemd/system/zerotier-one.service; enabled; vendor preset: enabled)
Active: active (running) since Sat 2025-12-27 09:11:42 UTC; 2h 13min ago
Main PID: 1178 (zerotier-one)
Tasks: 24 (limit: 19070)
Memory: 38.6M
CPU: 1min 22.413s
CGroup: /system.slice/zerotier-one.service
└─1178 /usr/sbin/zerotier-one
Meaning: The daemon is alive. If it’s inactive/failed, nothing else matters.
Decision: If not running, start it and check logs before touching routes/firewalls.
Task 2: Confirm you joined the intended network ID
cr0x@server:~$ sudo zerotier-cli listnetworks
200 listnetworks <nwid> <name> <mac> <status> <type> <dev> <ZT assigned ips>
200 listnetworks a84ac5c10a1b2c3d corp-overlay 5e:33:aa:9d:01:10 OK PRIVATE zt7nn3p3kq 10.147.10.23/16
Meaning: You’re joined, status OK, and you have an assigned overlay IP.
Decision: If status is ACCESS_DENIED or there’s no assigned IP, fix authorization or IP assignment first.
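If you check this on many hosts, the decision can be scripted. The following is a sketch, not an official parser: check_networks is a hypothetical name, it assumes the column layout shown above, and it assumes a network without an address prints "-" in the last column. Fields are counted from the end of the line so that an empty or multi-word network name does not shift the status column.

```sh
# check_networks: read `zerotier-cli listnetworks` output on stdin and
# print one verdict per joined network.
check_networks() {
  awk '
    $1 == "200" && $2 == "listnetworks" && $3 != "<nwid>" {
      nwid = $3; status = $(NF - 3); ips = $NF
      if (status != "OK")
        printf "%s: status %s, fix authorization first\n", nwid, status
      else if (ips == "-")
        printf "%s: OK but no assigned IP, check the assignment pool\n", nwid
      else
        printf "%s: OK, assigned %s\n", nwid, ips
    }'
}
```

Usage: sudo zerotier-cli listnetworks | check_networks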
Task 3: Check node identity (useful for controller authorization)
cr0x@server:~$ sudo zerotier-cli info
200 info 9f3c1a2b3c 1.14.2 ONLINE
Meaning: Node ID is 9f3c1a2b3c. You’ll use that on the controller to authorize and to match membership.
Decision: If it’s not ONLINE, you may have UDP reachability issues, DNS issues, or time skew causing trust problems.
Task 4: List peers and see DIRECT vs RELAY
cr0x@server:~$ sudo zerotier-cli listpeers
200 listpeers <ztaddr> <path> <latency> <version> <role>
200 listpeers 62f865ae71 203.0.113.19/9993;2684 34 1.14.2 LEAF
200 listpeers 9d219039f7 198.51.100.8/9993;58421 41 1.14.2 LEAF
200 listpeers 35c192ce9b - 0 1.14.2 PLANET
Meaning: You have direct paths to both LEAF peers (public IP/port shown). If a LEAF peer shows RELAY or a dash for path, expect higher latency and lower throughput to that peer.
Decision: If relayed and performance-sensitive, troubleshoot NAT/UDP or add a moon in a reachable place.
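For a baseline you can record and compare later, a short sketch that counts direct versus relayed LEAF peers. peer_summary is a hypothetical name, and the parsing assumes the column order shown in the output above (path in the fourth field, role last, a dash meaning no direct path). Other client versions may order the columns differently, so check the header line before trusting the numbers.

```sh
# peer_summary: read `zerotier-cli listpeers` output on stdin and count
# LEAF peers with and without a direct path. Root servers are ignored.
peer_summary() {
  awk '
    $1 == "200" && $2 == "listpeers" && $NF == "LEAF" {
      if ($4 == "-") relayed++; else direct++
    }
    END { printf "direct=%d relayed=%d\n", direct, relayed }'
}
```

Usage: sudo zerotier-cli listpeers | peer_summary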
Task 5: Confirm the ZeroTier interface and IP address
cr0x@server:~$ ip -brief addr show | grep -E 'zt|zerotier'
zt7nn3p3kq UP 10.147.10.23/16 fe80::5c33:aaff:fe9d:110/64
Meaning: The interface exists, it’s up, and has the expected overlay IP.
Decision: If interface is missing/down, your join likely failed or the service isn’t running.
Task 6: Check routing table for overlap or wrong next hop
cr0x@server:~$ ip route show
default via 192.168.50.1 dev eth0
10.147.0.0/16 dev zt7nn3p3kq proto kernel scope link src 10.147.10.23
192.168.50.0/24 dev eth0 proto kernel scope link src 192.168.50.20
Meaning: Overlay subnet routes to the ZeroTier interface. Good.
Decision: If the overlay subnet overlaps with a local subnet, you’ll see competing routes. Fix by changing the overlay range or using managed routes carefully.
Task 7: Try TCP connectivity instead of ping (netcat)
cr0x@server:~$ nc -vz 10.147.10.50 22
Connection to 10.147.10.50 22 port [tcp/ssh] succeeded!
Meaning: Even if ping fails, TCP works. This often means ICMP is blocked, not that ZeroTier is broken.
Decision: Stop wasting time on ping. Fix firewall policy if you truly need ICMP; otherwise proceed with your app.
Task 8: Confirm ICMP is blocked locally (Linux firewall: nftables)
cr0x@server:~$ sudo nft list ruleset | sed -n '1,80p'
table inet filter {
chain input {
type filter hook input priority 0; policy drop;
ct state established,related accept
iif "lo" accept
iifname "zt7nn3p3kq" tcp dport { 22, 443 } accept
}
}
Meaning: Default drop on input, and only TCP 22/443 are allowed on ZeroTier interface. ICMP is not allowed.
Decision: Add explicit ICMP accept on the ZeroTier interface if ping is required, or document “ping won’t work” and move on.
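If you decide ping is required, one way to add the rule is sketched below. It assumes the table and chain names from the ruleset above (inet filter, chain input); print_icmp_rules is a hypothetical helper that only prints the nft commands, so you can review them before anything changes.

```sh
# print_icmp_rules IFNAME: print (do not apply) nft commands that accept
# IPv4 and IPv6 echo requests arriving on one interface.
print_icmp_rules() {
  printf 'nft add rule inet filter input iifname "%s" icmp type echo-request accept\n' "$1"
  printf 'nft add rule inet filter input iifname "%s" icmpv6 type echo-request accept\n' "$1"
}
```

Review the output of print_icmp_rules zt7nn3p3kq, then apply it with print_icmp_rules zt7nn3p3kq | sudo sh. Rules added this way do not survive a reload of the ruleset, so copy them into your persistent nftables configuration once they prove correct.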
Task 9: Check Windows Firewall profile mismatch (PowerShell)
cr0x@server:~$ powershell.exe -NoProfile -Command "Get-NetConnectionProfile | Format-Table Name,InterfaceAlias,NetworkCategory"
Name InterfaceAlias NetworkCategory
corp-wifi Wi-Fi Public
ZeroTier One [a84ac5c10a1b2c3d] ZeroTier One Virtual Port Private
Meaning: The ZeroTier interface is “Private,” while Wi-Fi is “Public.” Windows Firewall rules are scoped to profiles, so a rule enabled for only one profile may not apply where you think.
Decision: Ensure inbound rules are bound to the ZeroTier interface or the right profile; don’t assume “Private” means “allowed.”
Task 10: Test MTU quickly (Linux ping with DF)
cr0x@server:~$ ping -c 2 -M do -s 1472 10.147.10.50
PING 10.147.10.50 (10.147.10.50) 1472(1500) bytes of data.
ping: local error: message too long, mtu=280
ping: local error: message too long, mtu=280
--- 10.147.10.50 ping statistics ---
2 packets transmitted, 0 received, +2 errors, 100% packet loss, time 1015ms
Meaning: The local stack refused to send: the MTU it knows for this path (280 here) is far smaller than Ethernet 1500. Something is tunneling inside something else (often VPN-on-VPN, PPPoE, or broken ICMP PMTUD).
Decision: Reduce ZeroTier MTU (or fix the underlying path) and re-test with smaller payload sizes until it passes.
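Re-testing by hand gets tedious, so here is a sketch that binary-searches for the largest payload that passes with DF set. max_payload and ping_probe are hypothetical names. The search assumes that if one size passes, every smaller size passes too, and the probe uses the Linux ping flags from the command above.

```sh
# ping_probe HOST SIZE: succeed if one DF-flagged ping with SIZE payload bytes gets a reply.
ping_probe() { ping -c 1 -W 1 -M do -s "$2" "$1" >/dev/null 2>&1; }

# max_payload PROBE HOST LO HI: print the largest size in [LO, HI] for which
# `PROBE HOST SIZE` succeeds, or -1 if no size in the range works.
max_payload() {
  local probe host lo hi mid best
  probe=$1; host=$2; lo=$3; hi=$4; best=-1
  while [ "$lo" -le "$hi" ]; do
    mid=$(( ($lo + $hi) / 2 ))
    if "$probe" "$host" "$mid"; then
      best=$mid; lo=$(( $mid + 1 ))   # passed: try larger
    else
      hi=$(( $mid - 1 ))              # failed: try smaller
    fi
  done
  echo "$best"
}
```

Usage: max_payload ping_probe 10.147.10.50 0 1472. Add 28 bytes (20 for the IPv4 header, 8 for the ICMP header) to the result to get the usable MTU, which is why the ping output above shows 1472(1500). A single lost packet counts as a failure here, so repeat the run on a lossy path before acting on the number.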
Task 11: Inspect ZeroTier local config and networks directory
cr0x@server:~$ sudo ls -al /var/lib/zerotier-one
total 64
drwxr-xr-x 5 root root 4096 Dec 27 09:11 .
drwxr-xr-x 38 root root 4096 Dec 27 09:11 ..
-rw-r--r-- 1 root root 64 Dec 27 09:11 identity.public
-rw------- 1 root root 256 Dec 27 09:11 identity.secret
drwxr-xr-x 2 root root 4096 Dec 27 09:11 networks.d
-rw-r--r-- 1 root root 1254 Dec 27 09:11 zerotier-one.port
Meaning: Identity exists; membership configs live under networks.d.
Decision: If identity files are missing or permissions are wrong, node identity changes or daemon fails in weird ways. Fix filesystem/permissions before networking.
Task 12: Validate that UDP 9993 is reachable from the host’s network
cr0x@server:~$ sudo ss -uapn | grep 9993
UNCONN 0 0 0.0.0.0:9993 0.0.0.0:* users:(("zerotier-one",pid=1178,fd=25))
UNCONN 0 0 [::]:9993 [::]:* users:(("zerotier-one",pid=1178,fd=26))
Meaning: ZeroTier is listening on UDP 9993. That’s necessary but not sufficient.
Decision: If it’s not listening, check config or that the daemon is bound correctly. If it is, but peers still relay, suspect upstream firewall/NAT.
Task 13: Quick packet capture on the ZeroTier interface
cr0x@server:~$ sudo tcpdump -ni zt7nn3p3kq icmp
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on zt7nn3p3kq, link-type EN10MB (Ethernet), snapshot length 262144 bytes
10:45:11.203812 IP 10.147.10.23 > 10.147.10.50: ICMP echo request, id 1789, seq 1, length 64
10:45:12.210914 IP 10.147.10.23 > 10.147.10.50: ICMP echo request, id 1789, seq 2, length 64
Meaning: Requests are leaving. If you don’t see replies, the problem is downstream: destination host firewall, route back, or packet loss.
Decision: Run tcpdump on the destination too. If the destination never sees the requests, you have a routing/ZeroTier path issue. If it sees them and doesn’t reply, it’s host policy.
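Comparing captures from both ends is easier with numbers than with scrolling text. A small sketch: icmp_tally (a hypothetical name) counts echo requests and replies in tcpdump's default text output, matching the wording shown in the capture above.

```sh
# icmp_tally: read tcpdump text output on stdin, count echo requests and replies.
icmp_tally() {
  awk '
    /ICMP echo request/ { req++ }
    /ICMP echo reply/   { rep++ }
    END { printf "requests=%d replies=%d\n", req, rep }'
}
```

Usage: sudo timeout 10 tcpdump -l -ni zt7nn3p3kq icmp 2>/dev/null | icmp_tally, run on both hosts while the ping is going. Requests on the source but none on the destination points at the path; requests on the destination with zero replies points at host policy.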
Task 14: Check reverse path and policy routing traps
cr0x@server:~$ sysctl net.ipv4.conf.all.rp_filter net.ipv4.conf.zt7nn3p3kq.rp_filter
net.ipv4.conf.all.rp_filter = 1
net.ipv4.conf.zt7nn3p3kq.rp_filter = 1
Meaning: Strict reverse path filtering is enabled. On multi-homed hosts, it can drop valid traffic if Linux thinks the return path is “wrong.”
Decision: If you’re routing between interfaces or using policy routing, consider setting rp_filter to loose (2) on relevant interfaces.
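One trap worth spelling out: Linux applies the higher of the "all" value and the per-interface value, so setting the interface to 0 while "all" stays at 1 changes nothing. The sketch below encodes that rule; effective_rp_filter is a hypothetical helper name.

```sh
# effective_rp_filter ALL IFACE: print the mode the kernel actually applies,
# given the two sysctl values. The kernel uses the maximum of the two.
effective_rp_filter() {
  local eff
  if [ "$1" -gt "$2" ]; then eff=$1; else eff=$2; fi
  case $eff in
    0) echo off ;;
    1) echo strict ;;
    2) echo loose ;;
    *) echo unknown ;;
  esac
}
```

Usage: effective_rp_filter "$(sysctl -n net.ipv4.conf.all.rp_filter)" "$(sysctl -n net.ipv4.conf.zt7nn3p3kq.rp_filter)". With the values shown above it prints strict; after sudo sysctl -w net.ipv4.conf.zt7nn3p3kq.rp_filter=2 it prints loose.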
Task 15: Verify IP forwarding if this node is meant to route traffic
cr0x@server:~$ sysctl net.ipv4.ip_forward
net.ipv4.ip_forward = 0
Meaning: This host will not route packets between interfaces.
Decision: If this box is your ZeroTier gateway for a site-to-site route, enable forwarding and add firewall rules for forwarding/NAT as needed.
Task 16: Check for subnet overlap that steals traffic (classic “can’t ping”)
cr0x@server:~$ ip route get 10.147.10.50
10.147.10.50 dev zt7nn3p3kq src 10.147.10.23 uid 1000
cache
Meaning: Kernel will send traffic over ZeroTier interface. If it instead chose eth0, you have overlap or a more-specific route.
Decision: Fix the address plan or routing priority. Don’t “work around” overlap with hacks unless you enjoy recurring incidents.
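The same check can run across a list of destinations or hosts. A sketch, assuming Linux, where ZeroTier interface names start with "zt" by default; route_dev and uses_overlay are hypothetical names.

```sh
# route_dev: read `ip route get` output on stdin, print the outgoing device.
route_dev() {
  awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# uses_overlay DEST: succeed if the kernel would send DEST out a zt* interface.
uses_overlay() {
  case $(ip route get "$1" | route_dev) in
    zt*) return 0 ;;
    *)   return 1 ;;
  esac
}
```

Usage: uses_overlay 10.147.10.50 || echo "overlap or wrong route: traffic is not leaving via ZeroTier"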
The “can’t ping” deep dive: where packets die
Failure zone 1: You’re not actually authorized (or you joined the wrong network)
The controller is the bouncer. If you didn’t get stamped, you’re not getting in.
A node can appear in the member list but remain unauthorized. Depending on your controller UI/policy, you might still see “connected”
indicators that lull you into false confidence.
Operational rule: treat “has a ZeroTier IP and can exchange packets” as the real definition of membership, not “I see it in the web UI.”
Failure zone 2: ICMP is blocked and you’re chasing the wrong problem
Ping is ICMP echo. Many environments block ICMP inbound by default, especially Windows hosts and hardened Linux servers.
If SSH works and ping doesn’t, that’s not “ZeroTier can’t ping.” That’s “ICMP is blocked.”
If your monitoring uses ping, either permit it explicitly on the ZeroTier interface or switch monitoring to TCP checks.
TCP checks are usually closer to what you care about anyway: the app.
Failure zone 3: Route asymmetry and “return traffic goes somewhere else”
Overlay networks still rely on routing decisions. If host A sends packets to host B over ZeroTier,
host B must send replies back over ZeroTier (or at least back to A in a way that makes sense).
If host B thinks A is on a different interface due to overlap or policy routing, replies get lost.
This shows up as: tcpdump on A shows echo requests leaving, tcpdump on B shows them arriving,
but B never replies (or replies out the wrong interface). Reverse path filtering can also drop those replies.
Failure zone 4: Subnet overlap with home/office networks
Overlap is the silent killer because it’s intermittent. Your office might use 10.0.0.0/8 broadly,
your overlay uses 10.0.0.0/24 because “it’s private,” and a remote user’s local network happens to match.
Some machines work, some don’t, and you start blaming ZeroTier like it’s haunted.
Fix overlap by choosing a dedicated overlay range and sticking with it. If you already shipped overlap,
migrate deliberately and in one direction. “Temporary” overlapping routes are how you end up with permanent chaos.
Failure zone 5: NAT traversal vs relays (DIRECT matters)
When NAT traversal succeeds, peers talk directly. When it fails, they may relay through infrastructure.
Relays are fine for admin traffic; they can be brutal for bulk transfers, databases, and anything with tight timeouts.
Sometimes ping works but your app times out because of relay latency and jitter.
You can often improve direct connectivity by allowing UDP outbound, ensuring no “UDP helper” middleboxes are mangling traffic,
or placing a moon in a location both sides can reach.
Failure zone 6: MTU and PMTUD black holes
You’ll see this when “small things work, big things break.” Maybe ping works with default size but fails with large payloads,
or SSH connects but file transfers stall, or TLS handshakes hang mysteriously.
Overlay encapsulation adds overhead. If the underlying path has a reduced MTU (PPPoE, mobile networks, nested tunnels),
you can exceed it. Proper PMTUD requires ICMP “fragmentation needed” messages; many networks block those, creating a black hole.
Failure zone 7: Bridging and broadcast storms (or just loud networks)
Bridging can make “can’t ping” look like “sometimes ping, sometimes not” because you’ve introduced L2 noise.
Broadcast and multicast can crowd out useful traffic, especially on constrained links.
The fix is usually to stop bridging and route instead.
One reliability quote, because it fits
“Hope is not a strategy.” — General Gordon R. Sullivan
In ops terms: stop hoping ping will start working and start collecting evidence in layers.
Joke #2: NAT is the corporate middle manager of networking: it adds meetings, loses messages, and still takes credit when things work.
Three corporate mini-stories from the trenches
Mini-story 1: The incident caused by a wrong assumption
A mid-size company rolled out ZeroTier to connect a handful of admin workstations to a set of test servers in a lab.
A developer said, “Once it says ONLINE, we can ping everything.” That assumption went straight into a runbook.
Monitoring was built on ping, because it was easy and it looked like networking.
The first weekend, a Windows jump host was patched and rebooted. It came back ONLINE in the controller.
Ping checks failed. Alerts fired. On-call spent an hour chasing ZeroTier peers and NAT traversal.
Eventually someone tried nc to port 3389 and it worked. RDP worked. Only ping failed.
Root cause: Windows Firewall had reverted to a stricter inbound ICMP policy on the ZeroTier adapter after a profile change.
The monitoring system interpreted “no ICMP” as “host down,” and the incident report blamed the overlay.
The overlay was innocent; the assumption was guilty.
The fix was boring and effective: change monitoring to TCP checks for the real service (RDP/SSH/HTTPS),
and explicitly allow ICMP only where it mattered. The runbook was updated with “ONLINE is not reachability.”
That single sentence saved several future Saturdays.
Mini-story 2: The optimization that backfired
Another organization wanted “better performance” across ZeroTier, so they decided to bridge two office LANs
into the overlay. The rationale sounded good: “If it’s one big LAN, discovery protocols will work and apps won’t need changes.”
They enabled bridging on a couple of Linux gateways and declared victory after a few successful pings.
By Monday afternoon, complaints rolled in: video calls stuttered, file shares paused, and random devices on one site
started showing duplicate IP warnings. The helpdesk blamed the ISP. The network team blamed Wi‑Fi.
The SRE on-call did the unfashionable thing: captured traffic on the ZeroTier interface and counted broadcast frames.
It turned out they had effectively extended broadcast domains over an internet overlay with variable latency and jitter,
and they were hauling a bunch of chatty L2 traffic (including things that were never meant to leave a building) across it.
Some devices didn’t cope with the new “LAN” topology. ARP behavior got weird. DHCP broadcasts became a horror story.
The rollback was immediate. They replaced bridging with routing, added a couple of managed routes for required subnets,
and used application-level configuration for discovery where possible.
Performance improved, because they stopped trying to make the internet behave like a switch.
Mini-story 3: The boring but correct practice that saved the day
A regulated enterprise ran ZeroTier for a small internal fleet of build agents spread across multiple providers.
They treated the overlay like production networking: consistent address plan, documented managed routes,
and a standard host firewall baseline that explicitly allowed required ports on the ZeroTier interface.
No heroics, no “we’ll remember later.”
One day, a cloud provider changed something in their network path and a subset of nodes started relaying.
Latency increased; a few build jobs began timing out when uploading artifacts. Nobody panicked.
The on-call used the same checklist every time: peers (DIRECT vs RELAY), route tables, then MTU tests.
They found PMTUD problems on one provider path. Because their baseline included an MTU validation test and a documented knob
for reducing MTU on affected nodes, they rolled out a temporary lower MTU for that provider’s nodes while they escalated upstream.
Builds stabilized within an hour. No incident bridge call needed.
The lesson: boring defaults and repeatable checks are a performance feature. They reduce time-to-truth.
In production, that’s what you actually want.
Common mistakes: symptom → root cause → fix
1) Symptom: “Node is ONLINE but has no ZeroTier IP”
Root cause: Member not authorized, auto-assign IP disabled, or the node joined the wrong network ID.
Fix: Authorize the member and ensure the network has an IPv4/IPv6 assignment pool. Re-check zerotier-cli listnetworks.
2) Symptom: “Ping fails, SSH works”
Root cause: ICMP blocked by host firewall or OS policy.
Fix: Allow ICMP on the ZeroTier interface/profile, or stop using ping as the primary reachability check.
3) Symptom: “Can ping by ZeroTier IP, but can’t reach remote LAN subnet”
Root cause: Missing managed route, missing IP forwarding, or missing NAT/forward rules on the gateway.
Fix: Add managed routes on the controller; enable net.ipv4.ip_forward=1; add forward rules (and NAT if required).
4) Symptom: “Some users can ping, others can’t (especially from home)”
Root cause: Subnet overlap with home router ranges, or split DNS sending traffic the wrong way.
Fix: Change the overlay subnet to a low-collision range; avoid the RFC1918 ranges home routers default to (192.168.0.0/24, 192.168.1.0/24, 10.0.0.0/24); verify ip route get.
5) Symptom: “Intermittent packet loss, jittery performance, ‘RELAY’ peers”
Root cause: NAT traversal failure or UDP blocked; traffic is relayed.
Fix: Allow UDP outbound; avoid restrictive egress networks; consider a moon in a reachable environment; re-check listpeers.
6) Symptom: “Small requests work; large transfers hang”
Root cause: MTU/PMTUD black hole; ICMP fragmentation-needed blocked somewhere.
Fix: Lower MTU on the overlay interface and/or fix upstream ICMP handling; validate with ping -M do.
7) Symptom: “After enabling bridging, everything got weird and slow”
Root cause: Broadcast/multicast amplification across the overlay; accidental L2 extension.
Fix: Stop bridging; route instead; narrow allowed traffic and subnets; confirm you truly need L2.
8) Symptom: “It works until we run containers”
Root cause: Container network namespaces and firewall rules don’t include the ZeroTier interface; asymmetric routing via docker0.
Fix: Decide whether containers should bind to the host’s ZeroTier IP, or run a dedicated routing/NAT rule set for container subnets.
Checklists / step-by-step plan
Checklist A: New ZeroTier network rollout (small org)
- Pick an overlay subnet that will not overlap with office/home ranges (avoid 192.168.0.0/16 and common 10.0.0.0/24).
- Decide routing vs bridging. Default to routing.
- Join 2–3 pilot nodes and authorize them.
- Verify zerotier-cli listnetworks shows OK and assigned IPs.
- Verify zerotier-cli listpeers shows direct paths where possible.
- Define host firewall policy: allow only required ports on the ZeroTier interface.
- Pick a real connectivity test: TCP to service port, plus ping only if you explicitly allow ICMP.
- Document managed routes and owners (who adds routes, who approves).
- Run MTU validation and record the working payload size baseline.
Checklist B: Site-to-site connectivity (route remote subnet into ZeroTier)
- Choose a gateway node at the remote site with stable uptime and known firewall ownership.
- Enable IP forwarding on the gateway (net.ipv4.ip_forward=1).
- Add a managed route for the remote site subnet pointing to the gateway’s ZeroTier IP.
- Ensure the remote site subnet has a return route back to the overlay (either via the gateway or NAT).
- Validate with a traceroute-like approach: host → gateway ZeroTier IP → remote LAN IP.
- Lock down forwarding rules so the overlay can only reach what it should.
Checklist C: “Can’t ping” incident response
- Confirm network join + authorization (Task 2).
- Confirm interface up + overlay IP (Task 5).
- Check route selection (Task 6 and Task 16).
- Check peer path (Task 4).
- Try TCP to a known port (Task 7).
- Check host firewall rules (Task 8 on Linux, Task 9 for Windows profiles).
- Capture on both ends (Task 13) to see if requests arrive and replies leave.
- If symptoms suggest it: MTU test (Task 10).
- Only then escalate to moons/bridging/policy routing changes.
FAQ
1) Why does ZeroTier show “ONLINE” but I can’t ping?
“ONLINE” is not a guarantee of ICMP reachability. Common causes: member not authorized, ICMP blocked by host firewall,
subnet overlap, or asymmetric routing. Use TCP tests and packet capture to prove where packets stop.
2) Is ping required to prove ZeroTier works?
No. Ping tests ICMP echo; many systems block it. A better “does it work” test is TCP to a known open port,
or an application-level health check. Allow ICMP only if you have an explicit operational need.
3) What does DIRECT vs RELAY mean in listpeers?
DIRECT means peers found a peer-to-peer path (usually better latency and throughput). RELAY means traffic is passing through a relay
due to NAT traversal failure or blocked UDP. RELAY is acceptable for admin access, less so for heavy traffic.
4) Which subnet should I use for my ZeroTier network?
Use a subnet that won’t overlap with real networks your users will be on. Avoid common home/SMB ranges.
Pick something uncommon inside 10/8, or use IPv6. Consistency beats cleverness.
5) How do managed routes work?
Managed routes tell ZeroTier members: “to reach this subnet, send traffic to that member as a gateway.”
They don’t magically enable forwarding; your gateway must have IP forwarding enabled and firewall rules to forward traffic.
6) Do I need a moon?
Not always. A moon can help when many nodes sit behind restrictive NAT or when you want more predictable reachability.
If most peers are RELAY and performance matters, a moon in a reachable environment is worth considering.
7) Why does SSH connect but file transfer stalls?
That pattern often screams MTU/PMTUD. SSH can establish a session with small packets, then hang when full-size TCP segments
fall into the MTU black hole. Test with ping -M do and reduce MTU if needed.
8) Can I bridge my LAN into ZeroTier?
You can, but you probably shouldn’t. Bridging extends L2 broadcast domains and can cause noisy traffic,
weird ARP/DHCP behavior, and performance issues. Route unless you have a specific L2 requirement you can defend in a postmortem.
9) Why can’t I reach a device on the remote LAN from a ZeroTier client?
Usually missing return path. You added a managed route, but the remote LAN device doesn’t know how to reply to the overlay subnet.
Fix with a return route on the LAN router, or NAT on the gateway (with eyes open about auditability and troubleshooting).
10) Is UDP 9993 always required?
ZeroTier’s normal transport is UDP, and 9993 is only the default port. If UDP is blocked, you may see degraded connectivity or no connectivity.
You can sometimes limp along via relays, but if you’re on a network that blocks outbound UDP broadly, expect pain.
Next steps you can do today
- Stop treating ping as the oracle. Decide what “up” means for your service, and test that.
- Standardize your overlay address plan. If you’ve already overlapped, schedule a migration before it schedules one for you.
- Baseline peer health. Record how many peers are DIRECT vs RELAY in normal operation.
- Write down your “Fast diagnosis” order. Put it in the on-call runbook, not in someone’s head.
- Make firewall policy explicit. Allow what you need on the ZeroTier interface; deny the rest; document the intent.
- Add an MTU test to your toolkit. Most teams discover MTU issues only after losing a day. Be better than that.
If you do those six things, “can’t ping” becomes a two-minute classification problem instead of a three-hour blame spiral.
That’s the difference between a neat overlay network and an expensive hobby.