WSL2 + Kubernetes: The Setup That Doesn’t Melt Your Laptop
https://cr0x.net/en/wsl2-kubernetes-laptop-safe-setup/ (Sat, 28 Feb 2026)

You installed Kubernetes locally because you wanted speed: iterate fast, test charts, debug controllers, maybe run a small platform stack.
Then your fans hit takeoff, your SSD starts doing push-ups, and your “quick dev cluster” becomes the reason Slack is asking if you’re online.

WSL2 can be a great place to run Kubernetes—if you treat it like a real VM with storage and memory constraints, not like a magical Linux folder.
This is the practical setup that avoids the classic failure modes: runaway RAM, slow I/O, DNS weirdness, and “why is kubectl hanging?”

What you’re actually building (and why it melts laptops)

On Windows with WSL2, “running Kubernetes” is rarely just “running Kubernetes.”
It’s a stack of nested abstractions, each with its own opinions about CPU scheduling, memory reclamation, filesystem semantics, and networking.
When any layer guesses wrong, your laptop pays.

The typical WSL2 Kubernetes setup looks like this:

  • Windows host OS
  • WSL2 VM (a lightweight Hyper-V VM with a virtual disk)
  • Linux userland distro (Ubuntu, Debian, etc.)
  • Container runtime (Docker Engine, containerd, or nerdctl stack)
  • Kubernetes distribution (kind, k3d, minikube, microk8s, or Docker Desktop’s integrated cluster)
  • Your workloads: databases, operators, build pipelines, ingress controllers, service meshes—aka “tiny production”

The meltdown usually comes from one of three places:

  1. Memory: Linux fills the page cache, and WSL2 doesn’t always hand it back quickly. Kubernetes happily schedules pods until the node is “fine” right up until it isn’t.
  2. Storage: crossing the Windows/Linux filesystem boundary can turn normal I/O into a slow-motion incident. Databases amplify this with fsync and small random writes.
  3. Networking/DNS: kube-dns, Windows DNS, WSL2 virtual NICs, and VPN clients form a triangle of sadness.

The goal isn’t “make it fast in benchmarks.” The goal is “make it predictably fast enough,”
and more importantly, make failure modes obvious.
Reliability in dev matters because dev is where you decide what you’ll regret in prod.

A few facts and historical context (so the weird parts make sense)

These are short on purpose. They’re the mental model upgrades that prevent hours of ghost-hunting.

  1. WSL1 was syscall translation; WSL2 is a real Linux kernel in a VM. That’s why WSL2 can run Kubernetes properly, but also why it behaves like a VM with its own disk and memory policies.
  2. WSL2 stores Linux files in a virtual disk (VHDX) by default. That disk can grow quickly and doesn’t always shrink unless you explicitly compact it.
  3. Accessing Linux files from Windows is not the same as accessing Windows files from Linux. The performance characteristics differ dramatically, and the slow path will hurt databases and image builds.
  4. Kubernetes wasn’t designed for laptops. It was designed for clusters where “one node is a cattle VM” and you can burn CPU on reconciliation loops without hearing fans.
  5. kind runs Kubernetes in Docker containers. That’s excellent for reproducibility, but it adds another layer where storage and networking can get “creative.”
  6. k3s (and by extension k3d) was built to be lightweight. It uses SQLite by default, which is fine locally, but it still stresses storage if you combine it with lots of controllers.
  7. cgroups v2 changed how resource isolation behaves. Many “why is my memory limit ignored?” conversations are really “which cgroup mode am I on?” conversations.
  8. DNS is a top-tier failure mode in local clusters. Not because DNS is hard, but because corporate VPNs and split-horizon DNS can quietly override everything.
  9. Container image builds punish filesystem metadata. The difference between a fast filesystem and a bridged one becomes obvious when you run multi-stage builds with thousands of small files.

Pick your local Kubernetes: kind vs k3d vs minikube (and what I recommend)

My default recommendation: kind inside WSL2, with constraints

For most developers and SREs doing platform work, kind gives you repeatability, speed, and clean teardown.
The cluster is “just containers,” and you can version pin Kubernetes easily.
The key is to constrain WSL2 resources and keep your workloads on the Linux filesystem.
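A pinned kind config is what makes “same cluster for everyone” concrete. A sketch that writes one via a heredoc; the cluster name, file path, and node image tag are examples, so pin whatever your team agreed on:

```shell
#!/usr/bin/env bash
# Sketch: write a pinned kind cluster config before `kind create`.
# The file path and image tag here are illustrative, not canon.
set -euo pipefail

cat > /tmp/kind-dev.yaml <<'EOF'
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
    image: kindest/node:v1.29.2   # pin the node image so everyone runs the same Kubernetes
EOF

# Then create the cluster from the pinned config:
#   kind create cluster --name dev --config /tmp/kind-dev.yaml
echo "wrote /tmp/kind-dev.yaml"
```

Check the config file into your repo next to the creation script, and bumping Kubernetes versions becomes a reviewed change instead of laptop drift.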

When to prefer k3d (k3s in Docker)

If your goal is “run a practical dev platform stack with minimal overhead,” k3d is excellent.
k3s trims fat: fewer components, less memory. It’s forgiving on smaller laptops.
It’s also closer to what many edge setups run, which is useful if you ship to constrained environments.

When to use minikube

minikube is fine when you want “one tool that does many drivers.”
But on WSL2, minikube can land you in a confusing driver matrix: Docker driver, KVM (usually not), Hyper-V (Windows-side),
and then you start debugging the driver more than the cluster.
If you’re already happy with Docker inside WSL2, minikube’s Docker driver is workable.

What to avoid (unless you have a reason)

  • Running heavy stateful workloads on /mnt/c and then blaming Kubernetes for being slow. That’s not Kubernetes; that’s the filesystem boundary asking you to stop.
  • Over-allocating CPUs and RAM “because it’s local.” WSL2 will take it, Windows will fight back, and your browser will lose.
  • Doing “prod-like” with everything turned on. Service mesh + distributed tracing + three operators + a database + a CI system is not a dev cluster; it’s a hobby data center.

One short joke, as promised: Kubernetes is like a kitchen with 30 chefs—nobody cooks faster, but everyone files a status update.

WSL2 baseline: limits, kernel, and the things Windows won’t tell you

Set WSL2 resource limits or accept chaos

By default, WSL2 will scale memory usage up to a large fraction of your system RAM.
It’s not malicious. It’s Linux doing Linux things—using memory for cache.
The problem is that Windows doesn’t always reclaim it in a way that feels polite.

Put a .wslconfig file in your Windows user profile directory (Windows side).
You want a ceiling on memory and CPU, and you want swap to exist but not become a substitute for RAM.

cr0x@server:~$ cat /mnt/c/Users/$WINUSER/.wslconfig
[wsl2]
memory=8GB
processors=4
swap=4GB
localhostForwarding=true

Decision: If you have 16GB RAM total, 8GB for WSL2 is usually sane.
If you have 32GB, you can go 12–16GB. More than that tends to hide leaks and bad pod limits.

WSL2 reclaim behavior: use the tools you have

WSL2 memory reclamation has improved, but it can still feel sticky after heavy builds or cluster churn.
If you need to reclaim memory quickly, shutting down WSL is the blunt instrument that works.

cr0x@server:~$ wsl.exe --shutdown

Output meaning: No output is normal. It stops all WSL distros and the WSL2 VM.
Decision: Use it when Windows is starved and you need RAM back now, not when you’re troubleshooting an in-cluster problem.

Use systemd in WSL2 (if available) and be explicit

Modern WSL supports systemd. That makes running Docker/containerd and Kubernetes tooling less awkward.
Check if systemd is enabled.

cr0x@server:~$ ps -p 1 -o comm=
systemd

Output meaning: If PID 1 is systemd, you can use normal service management.
If it’s something else (like init), you’ll need to manage daemons differently.
Decision: Prefer systemd-enabled WSL for stability and fewer “why didn’t it start?” mysteries.
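If PID 1 isn’t systemd, enabling it is a two-line change in /etc/wsl.conf plus a WSL restart (on a recent WSL build). A sketch that writes to a temp path for illustration; the real file is /etc/wsl.conf and needs sudo:

```shell
#!/usr/bin/env bash
# Sketch: enable systemd for this distro via wsl.conf.
# Writing to /tmp here for illustration; the real target is /etc/wsl.conf.
cat > /tmp/wsl.conf <<'EOF'
[boot]
systemd=true
EOF

# sudo cp /tmp/wsl.conf /etc/wsl.conf
# Then from Windows: wsl.exe --shutdown, and restart the distro.
echo "wrote /tmp/wsl.conf"
```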

Storage on WSL2: the difference between “works” and “doesn’t hate you”

Rule #1: keep Kubernetes data on the Linux filesystem

Put your cluster state, container images, and persistent volume data under your Linux distro filesystem (e.g., /home, /var).
Avoid storing heavy-write workloads on /mnt/c.
The interop layer can be fine for editing code, but it’s a trap for databases and anything fsync-heavy.
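This rule is easy to enforce mechanically rather than by memory. A hypothetical preflight helper that rejects data directories living on Windows mounts; the function name and example paths are made up for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical preflight check: refuse a data directory that lives on a
# Windows mount (/mnt/<drive>). Mirrors the "no hot data on /mnt/c" rule.
fs_boundary_ok() {
  case "$1" in
    /mnt/[a-z]|/mnt/[a-z]/*) echo "SLOW-PATH: $1 crosses the Windows interop boundary"; return 1 ;;
    *)                       echo "OK: $1 stays on the Linux filesystem"; return 0 ;;
  esac
}

fs_boundary_ok /home/cr0x/pgdata
fs_boundary_ok /mnt/c/Users/cr0x/pgdata || true
```

Wire a check like this into your cluster-creation script and it refuses to start a stack whose volumes point at the slow path, which is cheaper than diagnosing it later.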

Rule #2: understand where your bytes go

Docker/containerd store images and writable layers somewhere under /var/lib.
kind and k3d add their own layers.
If you build a lot of images or run CI-like workloads, your VHDX grows. It might not shrink.
That’s not a moral failing; that’s how virtual disks work.

Rule #3: measure I/O the boring way

You don’t need fancy storage tooling to catch the big problems. You need two comparisons:
Linux FS performance and Windows-mounted FS performance.
If one is 10x slower, don’t “tune Kubernetes.” Move the workload.

cr0x@server:~$ dd if=/dev/zero of=/home/cr0x/io-test.bin bs=1M count=512 conv=fdatasync
512+0 records in
512+0 records out
536870912 bytes (537 MB, 512 MiB) copied, 1.24 s, 433 MB/s

Output meaning: This is a rough sequential write with flush. Hundreds of MB/s is expected on SSD-backed storage.
Decision: If this is single-digit MB/s, your VM is under I/O pressure or your disk is unhappy. Fix that before Kubernetes.

cr0x@server:~$ dd if=/dev/zero of=/mnt/c/Users/$WINUSER/io-test.bin bs=1M count=256 conv=fdatasync
256+0 records in
256+0 records out
268435456 bytes (268 MB, 256 MiB) copied, 12.8 s, 21.0 MB/s

Output meaning: If you see this kind of drop, that’s the interop path.
Decision: Don’t put /var/lib/docker, /var/lib/containerd, or PV directories on /mnt/c.
Keep code there if you must, but keep build caches and databases in Linux space.

VHDX growth: compacting is a maintenance chore, not a one-time fix

If your WSL2 virtual disk grows due to image builds and churn, you can compact it—but compacting is not automatic.
First, clean up inside Linux: remove unused images, prune volumes, clear caches.
Then compact from Windows. The exact steps differ by Windows build and tooling, but the principle is consistent:
delete unused blocks in the guest, then tell the host to compact.

The boring truth: local clusters create data. They rarely delete data as aggressively as you think.
If you treat your dev environment like production, you’ll schedule maintenance like production. That’s the deal.
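The guest-side half of that chore can be scripted. A dry-run sketch: with DRY_RUN=1 (the default here) it only prints the plan, and the Windows-side compaction commands are left as comments because the exact tooling varies by Windows build:

```shell
#!/usr/bin/env bash
# Sketch of guest-side VHDX maintenance: free blocks inside Linux first,
# then compact from Windows. DRY_RUN=1 prints the plan instead of executing.
DRY_RUN=${DRY_RUN:-1}

run() { if [ "$DRY_RUN" = 1 ]; then echo "plan: $*"; else "$@"; fi; }

run docker system prune --force         # unused containers, images, networks
run docker builder prune --all --force  # the usual silent disk eater
run sudo fstrim -av                     # tell the virtual disk which blocks are free

# Host side (Windows, after `wsl.exe --shutdown`); exact tooling varies by build:
#   diskpart:  select vdisk file=<path-to-ext4.vhdx>  ->  compact vdisk
#   or:        Optimize-VHD -Path <path> -Mode Full   (requires the Hyper-V module)
```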

Networking: NodePorts, ingress, DNS, and the localhost trap

Localhost is not a philosophy; it’s a routing decision

In WSL2, “localhost” can mean Windows localhost or WSL localhost depending on how you started the process.
With localhostForwarding=true, WSL can forward ports to Windows, but it’s not magic for all traffic patterns.

If you run kind inside WSL2 and expose a service with NodePort, you’ll often access it from Windows via forwarded ports
or by hitting the WSL VM IP. Both can work. The important part is being consistent and documenting which one your team uses.

DNS: the silent killer of kubectl

A lot of “Kubernetes is slow” complaints are actually DNS timeouts.
kubectl looks like it’s hung, but it’s waiting for API server calls that never quite resolve cleanly.
VPN clients make this worse by injecting resolvers or forcing split DNS.

Your job is to separate “API server slow” from “name resolution slow” in under five minutes.
Later, you can argue with corporate VPN tooling.

Practical tasks (commands, outputs, and decisions)

These are real tasks you’ll do when setting up and when debugging.
Each one includes what the output means and the decision you make from it.
Do them in order when you’re building a new setup; do them selectively when you’re on fire.

Task 1: Confirm you’re on WSL2 (not WSL1)

cr0x@server:~$ wsl.exe -l -v
  NAME            STATE           VERSION
* Ubuntu-22.04    Running         2

Output meaning: VERSION 2 means a real Linux kernel VM. VERSION 1 means syscall translation.
Decision: If you’re on VERSION 1, stop and convert. Kubernetes needs the real kernel behavior.

Task 2: Check kernel and cgroup mode

cr0x@server:~$ uname -r
5.15.146.1-microsoft-standard-WSL2
cr0x@server:~$ mount | grep cgroup2
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime)

Output meaning: cgroup2 mounted means unified hierarchy.
Decision: If your tooling assumes cgroup v1 (older Docker configs, some monitoring agents), expect surprises.

Task 3: Verify systemd and service control

cr0x@server:~$ systemctl is-system-running
running

Output meaning: systemd is active and stable.
Decision: If you see degraded, check failed units before blaming Kubernetes for “random” issues.

Task 4: Confirm container runtime health (Docker example)

cr0x@server:~$ docker info --format '{{.ServerVersion}} {{.CgroupVersion}}'
24.0.7 2

Output meaning: Docker is reachable and reports cgroup version.
Decision: If this command is slow or hangs, fix Docker before touching kind/k3d.

Task 5: Measure memory pressure quickly

cr0x@server:~$ free -h
               total        used        free      shared  buff/cache   available
Mem:           7.7Gi       5.9Gi       310Mi       160Mi       1.5Gi       1.3Gi
Swap:          4.0Gi       1.2Gi       2.8Gi

Output meaning: available matters more than free. Swap use indicates pressure.
Decision: If available is under ~1Gi and swap is climbing, scale down the cluster, reduce limits, or raise WSL memory.

Task 6: Check disk usage where it counts

cr0x@server:~$ df -h /
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb       250G  180G   71G  72% /

Output meaning: This is your distro filesystem backed by VHDX.
Decision: Above ~85% usage, performance and compaction behavior get worse. Prune images and old PVs.
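The ~85% rule is easy to automate as a preflight nag. A small guard, assuming GNU coreutils df; the threshold is this article’s rule of thumb, not a hard limit:

```shell
#!/usr/bin/env bash
# Tiny guard for the ~85% rule: complain when the root FS is too full.
root_usage() { df --output=pcent / | tail -n 1 | tr -dc '0-9'; }

usage=$(root_usage)
if [ "$usage" -ge 85 ]; then
  echo "root FS at ${usage}% -- prune images/PVs before doing anything heavy"
else
  echo "root FS at ${usage}% -- fine"
fi
```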

Task 7: Install and create a kind cluster with sane defaults

cr0x@server:~$ kind create cluster --name dev --image kindest/node:v1.29.2
Creating cluster "dev" ...
 ✓ Ensuring node image (kindest/node:v1.29.2) 🖼
 ✓ Preparing nodes 📦
 ✓ Writing configuration 📜
 ✓ Starting control-plane 🕹
 ✓ Installing CNI 🔌
 ✓ Installing StorageClass 💾
Set kubectl context to "kind-dev"
You can now use your cluster with:

kubectl cluster-info --context kind-dev

Output meaning: Cluster created, context set. StorageClass installed (usually standard).
Decision: If CNI install hangs, suspect DNS/proxy/VPN or image pull issues—don’t keep retrying blindly.

Task 8: Verify cluster liveness and component health

cr0x@server:~$ kubectl cluster-info
Kubernetes control plane is running at https://127.0.0.1:40685
CoreDNS is running at https://127.0.0.1:40685/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
cr0x@server:~$ kubectl get nodes -o wide
NAME                STATUS   ROLES           AGE   VERSION   INTERNAL-IP   OS-IMAGE
dev-control-plane   Ready    control-plane   2m    v1.29.2   172.18.0.2    Debian GNU/Linux 12 (bookworm)

Output meaning: API server reachable, node Ready, internal IP assigned.
Decision: If node is NotReady, go straight to kubectl describe node and CNI logs—don’t reinstall everything yet.

Task 9: Spot the real resource hogs (nodes + pods)

cr0x@server:~$ kubectl top nodes
NAME                CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
dev-control-plane   620m         15%    2240Mi          56%

Output meaning: metrics-server is responding (kind doesn’t bundle it by default; you or your setup script installed it).
Decision: If memory is high when idle, check for chatty controllers, leaking dev builds, or a stuck log shipper.

Task 10: Validate DNS inside the cluster (the cheap smoke test)

cr0x@server:~$ kubectl run -it --rm dns-test --image=busybox:1.36 --restart=Never -- nslookup kubernetes.default.svc.cluster.local
Server:    10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local

Name:      kubernetes.default.svc.cluster.local
Address 1: 10.96.0.1 kubernetes.default.svc.cluster.local
pod "dns-test" deleted

Output meaning: CoreDNS responds, service discovery works.
Decision: If this times out, fix DNS before debugging your app. Your app is innocent until proven guilty.

Task 11: Confirm whether slow kubectl is network or API latency

cr0x@server:~$ time kubectl get pods -A
NAMESPACE     NAME                                        READY   STATUS    RESTARTS   AGE
kube-system   coredns-76f75df574-2lq9r                    1/1     Running   0          4m
kube-system   coredns-76f75df574-9x2ns                    1/1     Running   0          4m
kube-system   etcd-dev-control-plane                      1/1     Running   0          4m
kube-system   kindnet-4cdbf                               1/1     Running   0          4m
kube-system   kube-apiserver-dev-control-plane            1/1     Running   0          4m
kube-system   kube-controller-manager-dev-control-plane   1/1     Running   0          4m
kube-system   kube-proxy-4xw26                            1/1     Running   0          4m
kube-system   kube-scheduler-dev-control-plane            1/1     Running   0          4m

real    0m0.312s
user    0m0.127s
sys     0m0.046s

Output meaning: 300ms is fine for local. Multiple seconds suggests DNS, kubeconfig context confusion, or a struggling API server.
Decision: If it’s slow, run kubectl get --raw /readyz?verbose next.

Task 12: Check API server readiness endpoints for the actual blocker

cr0x@server:~$ kubectl get --raw='/readyz?verbose'
[+]ping ok
[+]log ok
[+]etcd ok
[+]informer-sync ok
[+]poststarthook/start-apiserver-admission-initializer ok
[+]poststarthook/start-apiextensions-informers ok
[+]poststarthook/start-apiextensions-controllers ok
readyz check passed

Output meaning: If etcd or informer sync is slow/failing, the control plane is the bottleneck.
Decision: etcd slow often means storage latency. That’s your cue to check disk and avoid cross-filesystem I/O.

Task 13: Find the pod that is burning your laptop (CPU/memory)

cr0x@server:~$ kubectl top pods -A --sort-by=memory | head -n 4
NAMESPACE       NAME                       CPU(cores)   MEMORY(bytes)
observability   loki-0                     240m         1530Mi
observability   prometheus-0               310m         1210Mi

Output meaning: You can see the top consumers.
Decision: If your dev cluster includes Prometheus/Loki by default, you just found your “why is it hot?” answer.
Scale down, reduce retention, or use lighter tools locally.

Task 14: Check container runtime disk usage (Docker)

cr0x@server:~$ docker system df
TYPE            TOTAL     ACTIVE    SIZE      RECLAIMABLE
Images          42        12        18.4GB    12.1GB (65%)
Containers      19        6         1.2GB     720MB (60%)
Local Volumes   27        9         9.8GB     6.3GB (64%)
Build Cache     124       0         22.6GB    22.6GB

Output meaning: Build cache is often the silent disk eater.
Decision: If reclaimable is large, prune deliberately. Don’t wait until your VHDX hits 95% and everything slows down.

Task 15: Prune safely (and accept the trade-off)

cr0x@server:~$ docker builder prune --all --force
Deleted build cache objects:
v8m3q8qgk7yq4o0l5u3f7s1m2
...
Total reclaimed space: 22.6GB

Output meaning: You reclaimed space, at the cost of rebuilding layers later.
Decision: In dev, disk space and stability are worth more than shaving 90 seconds off the next build.

Task 16: Confirm you’re not accidentally running workloads on /mnt/c

cr0x@server:~$ kubectl get pv -o wide
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                      STORAGECLASS   REASON   AGE   VOLUMEMODE
pvc-1d2c7f7e-2f5d-4c0d-9a3a-2c6f9d8b1a7c   10Gi       RWO            Delete           Bound    default/db-data           standard                8m    Filesystem

Output meaning: The PV listing doesn’t show where the bytes actually land. kind’s default standard class uses a local-path-style provisioner that writes inside the node container.
Decision: Inspect the StorageClass and provisioner behavior; if volumes resolve to a hostPath under a Windows mount (/mnt/c/...), fix it immediately.

Task 17: Diagnose kubelet/container issues via node logs (kind node container)

cr0x@server:~$ docker ps --format 'table {{.Names}}\t{{.Status}}' | grep dev-control-plane
dev-control-plane   Up 6 minutes
cr0x@server:~$ docker exec dev-control-plane journalctl -u kubelet --no-pager | tail -n 2
Feb 05 09:10:12 dev-control-plane kubelet[812]: I0205 09:10:12.123456     812 server.go:472] "Kubelet version" kubeletVersion="v1.29.2"
Feb 05 09:10:15 dev-control-plane kubelet[812]: I0205 09:10:15.234567     812 kubelet.go:2050] "Skipping pod synchronization" err="PLEG is not healthy"

Output meaning: If PLEG is unhealthy, the kubelet is struggling—often due to disk I/O or container runtime slowness.
Decision: Go check disk latency, container runtime health, and the amount of log churn.

Task 18: Quick check for log storms (they can be your hottest workload)

cr0x@server:~$ kubectl logs -n kube-system deploy/coredns --tail=20
.:53
[INFO] plugin/reload: Running configuration SHA512 = 7a9b...
[INFO] 10.244.0.1:52044 - 46483 "A IN kubernetes.default.svc.cluster.local. udp 62 false 512" NOERROR qr,aa,rd 114 0.000131268s

Output meaning: Some DNS logs are normal. Thousands per second are not.
Decision: If logs are hot, reduce verbosity, fix the client retry loop, or you’ll pay in CPU and disk writes.

Three corporate mini-stories from the trenches

Mini-story 1: The incident caused by a wrong assumption

A team I worked with standardized on WSL2 for dev clusters because it was “basically Linux.”
They kept their repo on the Windows filesystem for easy IDE integration, then mounted it into containers for builds and tests.
It seemed fine for small services. Then the platform group added a local Postgres, a controller, and a test suite that ran migrations on every run.

The symptom was weird: migrations would sometimes take 30 seconds, sometimes 10 minutes.
People blamed “Kubernetes overhead” and “that operator we added.”
One engineer tried to fix it by bumping CPU limits. It got worse—higher CPU just made the system hit the storage bottleneck faster.

The wrong assumption was simple: they assumed /mnt/c was “just another filesystem.”
It isn’t. It’s a boundary with different caching behavior, different metadata performance, and different flush semantics.
Their database volume and build caches were sitting on the slow path.

The fix was unglamorous: move the database and build cache into the Linux filesystem, and only keep the source checkout on Windows if needed.
They also added a preflight script that refused to start the stack if it detected PVs pointing at /mnt/c.
Performance stabilized instantly. Nobody celebrated, which is how you know it was the right fix.

Mini-story 2: The optimization that backfired

Another company wanted “faster local clusters,” so they preloaded everything: observability stack, ingress, cert-manager, and a couple operators.
The idea was noble: reduce onboarding time, ensure everyone had the same baseline, avoid “works on my laptop.”
They even built an internal script that would create the cluster and install all charts in one shot.

Then the tickets started. Laptops ran hot during meetings. Battery life cratered.
kubectl commands would sometimes lag by several seconds. Developers started disabling components “temporarily,” which turned into permanent drift.
The platform team responded by pushing more default resource limits and more replicas, because “prod parity.”

The backfire came from controller churn and background work.
Prometheus scraping, Loki ingestion, cert-manager reconciliation, and operator loops are not free.
In a real cluster, you amortize that cost across servers. On a laptop, you feel it every time the fan ramps.

The fix was to define two profiles: core (ingress + DNS + storage + metrics-server) and full (the heavy stack).
Core was the default; full was opt-in for debugging.
They also trimmed scrape intervals and retention in local mode. The irony: onboarding got faster because people stopped fighting their environment.

Mini-story 3: The boring but correct practice that saved the day

A regulated enterprise (the kind that loves spreadsheets) had a surprisingly smooth WSL2 Kubernetes experience.
Their secret wasn’t a fancy toolchain. It was discipline: version pinning, repeatable cluster creation, and aggressive cleanup.
Every developer had the same kind node image version, the same chart versions, and the same default resource limits.

They also had a weekly maintenance routine: prune unused images, delete unused namespaces, and compact the dev environment when needed.
It was scheduled, documented, and dull.
Developers initially complained—nobody wants “maintenance day” on a laptop.

Then came the day a Windows update changed something subtle in networking.
Half the org’s clusters started having intermittent DNS failures.
The teams with drifted environments had a mess: different CNI versions, random kubeconfigs, inconsistent local DNS overrides.
The disciplined teams could reproduce quickly and compare apples to apples.

They isolated it to resolver behavior under VPN, rolled out a standardized workaround, and got back to work.
Boring practice saved the day: consistent versions and consistent baselines make debugging finite.

Fast diagnosis playbook: what to check first, second, third

When your laptop is melting or your cluster is “slow,” you don’t have time to admire architecture.
You need a deterministic path to the bottleneck.

First: Is the host (Windows + WSL2) under resource pressure?

  1. Check WSL2 memory and swap usage: free -h. If available is low and swap is climbing, you’re memory-bound.
  2. Check disk fullness: df -h /. If near full, everything becomes slower and more fragile.
  3. Check quick I/O sanity: dd ... conv=fdatasync on Linux FS vs /mnt/c. If Linux FS is slow too, you have system-level I/O pressure.

Second: Is Kubernetes control plane healthy or stuck?

  1. API readyz: kubectl get --raw='/readyz?verbose'. If etcd checks are slow, suspect storage latency.
  2. Node status: kubectl get nodes and kubectl describe node. If NotReady, look at CNI and kubelet symptoms.
  3. CoreDNS smoke test: run a busybox nslookup. If DNS is broken, stop pretending the app is the problem.

Third: Which workload is actually burning CPU/memory/I/O?

  1. Top consumers: kubectl top pods -A --sort-by=memory and --sort-by=cpu.
  2. Log storms: check logs on the suspected pods. High write rate equals I/O pressure equals global slowdown.
  3. Image/build churn: docker system df and prune build cache if it’s ballooned.
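The first pass can be collapsed into one script. This sketch covers host pressure only, since the control-plane and workload passes need a live cluster; it assumes GNU df and a Linux /proc, and the thresholds are this article’s rules of thumb:

```shell
#!/usr/bin/env bash
# First-pass triage: memory, swap, and root FS pressure in one screen.
avail_kb=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)
swap_total_kb=$(awk '/^SwapTotal:/ {print $2}' /proc/meminfo)
swap_free_kb=$(awk '/^SwapFree:/ {print $2}' /proc/meminfo)
disk_pct=$(df --output=pcent / | tail -n 1 | tr -dc '0-9')

echo "MEM available: $(( avail_kb / 1024 ))Mi"
echo "SWAP used:     $(( (swap_total_kb - swap_free_kb) / 1024 ))Mi"
echo "DISK / usage:  ${disk_pct}%"

if [ "$avail_kb" -lt 1048576 ]; then   # under ~1Gi available
  echo "-> memory-bound: shrink the cluster or raise the WSL cap"
fi
if [ "$disk_pct" -ge 85 ]; then
  echo "-> disk pressure: prune images and build cache"
fi
```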

Paraphrased idea from Werner Vogels: “Everything fails, all the time.” Build your local setup so failure is quick to identify, not mysterious.

Common mistakes: symptom → root cause → fix

1) Symptom: Laptop fans spike when cluster is “idle”

Root cause: background controllers (observability stack, operators) doing constant reconciliation; or a pod in a crash loop writing logs.

Fix: kubectl top pods -A, find the hog; scale it down; reduce retention/scrape intervals; fix crash loops; set sane requests/limits.

2) Symptom: kubectl commands take 5–30 seconds randomly

Root cause: DNS timeouts or VPN resolver interference; sometimes kubeconfig points to a dead context.

Fix: run time kubectl get pods -A, then kubectl get --raw='/readyz?verbose'. If readyz is fine, test DNS inside the cluster. If DNS is failing, fix resolv.conf/VPN split-DNS policies or run local cluster tasks off-VPN.

3) Symptom: Postgres/MySQL in cluster is painfully slow

Root cause: PV or bind mounts are on /mnt/c or another slow boundary; fsync-heavy workloads amplify it.

Fix: keep PV data on the Linux filesystem; use a local-path provisioner that writes to /var inside WSL, not Windows mounts.

4) Symptom: WSL2 eats RAM and never gives it back

Root cause: Linux page cache + WSL2 reclaim behavior; big builds and image pulls fill cache; memory limit not configured.

Fix: set .wslconfig memory cap; restart WSL with wsl.exe --shutdown when needed; reduce cluster footprint and avoid running everything at once.

5) Symptom: Disk fills up “mysteriously”

Root cause: container image layers, build cache, leftover PV data, and logs; VHDX grows; pruning not done.

Fix: docker system df and docker builder prune --all; delete unused namespaces/PVs; keep an eye on df -h.

6) Symptom: Node becomes NotReady; pods stuck ContainerCreating

Root cause: CNI broken or kubelet/container runtime struggling due to I/O pressure; kind node container unhealthy.

Fix: check kind node container logs; check CNI pods in kube-system; fix disk pressure; recreate cluster if the node image is corrupted.

7) Symptom: Ingress works from WSL but not from Windows

Root cause: port forwarding expectations wrong; Windows firewall/VPN; confusion between WSL IP and Windows localhost.

Fix: decide on one access method (forwarded localhost vs WSL VM IP); document it; expose ingress with a predictable mapping; verify with curl from both sides.

8) Symptom: Builds inside containers are slower than builds on Windows

Root cause: source tree on Windows mount; heavy metadata ops across boundary; antivirus scanning on Windows path.

Fix: keep source inside Linux filesystem for builds; use Windows IDE via WSL integration; exclude build directories from Windows AV if policy allows.

Second and final short joke: If your dev cluster needs a runbook, congratulations—you’ve built a small production environment with worse funding.

Checklists / step-by-step plan

Setup plan (do this once per laptop)

  1. Set WSL2 limits in .wslconfig: cap memory and CPU; enable reasonable swap.
  2. Enable systemd in WSL (if supported) and standardize on it across your team.
  3. Choose one runtime: Docker Engine inside WSL2 is fine; avoid mixing Docker Desktop + WSL Docker unless you enjoy ambiguity.
  4. Choose one Kubernetes tool: kind (recommended) or k3d. Pick one and standardize cluster creation scripts.
  5. Keep cluster data in Linux FS: ensure container runtime storage and PV paths are not on /mnt/c.
  6. Define profiles: “core” default; “full” opt-in. Your laptop is not a staging cluster.
  7. Pin versions: kind node image version, chart versions, and critical addons.

Daily workflow checklist (stay fast, stay sane)

  1. Before heavy work: free -h and df -h /. If you’re low, prune first.
  2. After a big build day: docker system df. If build cache is huge, prune it.
  3. When something feels slow: run the DNS smoke test and /readyz checks before changing anything.
  4. Keep your repo where your tools are fastest: if builds are in Linux, keep the working copy in Linux.

Weekly maintenance checklist (boring, effective)

  1. Prune build cache: docker builder prune --all.
  2. Prune unused images and containers: docker system prune (careful: understand what it deletes).
  3. Delete unused namespaces and PVs in the dev cluster.
  4. Recreate the cluster if it has accumulated too much drift. Tear down is a feature.
  5. Reclaim memory if Windows is tight: wsl.exe --shutdown.

FAQ

1) Should I use Docker Desktop or Docker Engine inside WSL2?

If you want the simplest Windows integration and your company standardizes on it, Docker Desktop is fine.
If you want fewer moving parts and clearer Linux behavior, use Docker Engine inside WSL2.
Pick one and commit; mixed setups create debugging folklore.

2) Why is storing data on /mnt/c such a problem?

Because you’re crossing a virtualization/interop boundary with different caching and metadata semantics.
Databases do lots of small writes and fsync calls. That path punishes them.
Keep heavy I/O inside the Linux filesystem and treat /mnt/c as “good for documents, not for hot data.”

3) kind or k3d: which is less likely to melt my laptop?

k3d often uses less memory at baseline because k3s is smaller.
kind is incredibly predictable and easy to pin versions. Both can be laptop-safe if you keep the workload set lean and set WSL2 limits.
My bias: kind for platform work and multi-node simulation; k3d for “just run the stack.”

4) How much RAM should I allocate to WSL2?

On 16GB total: 6–8GB is a good ceiling.
On 32GB: 12–16GB is fine.
If you allocate too much, you’ll hide bad pod limits and starve Windows apps in subtle ways.

5) Why does WSL2 keep memory after I stop workloads?

Linux aggressively uses memory for filesystem cache. That’s normally good.
WSL2’s reclamation back to Windows can be slower than you’d like.
If you need RAM back immediately, shut down WSL; if you want long-term sanity, constrain memory and reduce background churn.

6) Why are my kubectl commands slow only when the VPN is on?

VPN clients commonly inject DNS resolvers and routing rules.
Kubernetes relies on DNS internally, and kubectl relies on reliable connectivity to the API endpoint.
Diagnose with the DNS smoke test and readyz checks; then decide whether to split-tunnel, adjust resolvers, or run local cluster tasks off-VPN.

7) How do I expose services to Windows from a cluster running in WSL2?

Decide if you’re using forwarded localhost ports or the WSL VM IP.
For predictable dev UX, many teams port-forward (kubectl port-forward) or run an ingress that maps to known ports.
Document the method so your team doesn’t debug “localhost” for fun.
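
One way to make "known ports" concrete is a tiny helper that keeps the mapping documented in one place. This is a sketch, not a standard tool—the ingress-nginx namespace and service names are assumptions; substitute your own. Without --run it only prints the command it would execute:

```shell
# Hypothetical helper: document the service -> localhost port mapping once.
# Dry-run by default; pass --run as the 4th argument to actually forward.
pf() {  # usage: pf <namespace> <service> <local:remote> [--run]
  local ns="$1" svc="$2" ports="$3"
  if [ "${4:-}" = "--run" ]; then
    kubectl -n "$ns" port-forward "svc/$svc" "$ports"
  else
    echo "kubectl -n $ns port-forward svc/$svc $ports"
  fi
}
```

Usage: pf ingress-nginx ingress-nginx-controller 8080:80 --run, then hit http://localhost:8080 from Windows.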

8) Can I run stateful workloads (Postgres, Kafka) locally in WSL2 Kubernetes?

Yes, but be realistic. Postgres is fine for dev if its data lives on the Linux filesystem and you don’t run five other heavy stacks.
Kafka is possible, but it’s often where “local parity” becomes “local punishment.” Consider lighter substitutes unless you’re debugging Kafka-specific behavior.

9) How often should I recreate my local cluster?

If you’re doing platform work with lots of CRDs and controller installs, recreating weekly or biweekly is normal.
If your cluster is stable and you keep it lean, you can run it longer.
Recreate immediately when you suspect drift: weird DNS, stuck webhooks, or mysterious admission failures.

10) What’s the most common root cause of “everything is slow”?

Storage. Either the workload is on the wrong filesystem path, or the disk is near full, or the system is writing logs like it’s paid by the line.
Memory is second. DNS is third. Kubernetes itself is rarely the first cause; it’s just the stage where the problem performs.

Next steps you can do today

  1. Set your WSL2 limits (memory, CPU, swap). If you do only one thing, do this.
  2. Move hot data off /mnt/c: container runtime storage, PVs, databases, build caches—keep them in the Linux filesystem.
  3. Pick a lean default cluster profile: kind or k3d with only core addons. Make “full stack” an opt-in profile.
  4. Adopt the fast diagnosis playbook: check host pressure, then control plane health, then top consumers. Stop guessing.
  5. Schedule boring maintenance: prune caches, delete unused namespaces, and recreate the cluster when drift sets in.

The endgame isn’t to build the most impressive local cluster. It’s to build one that behaves consistently under stress.
Predictability is what keeps your laptop cool—and your brain cooler.

]]>
https://cr0x.net/en/wsl2-kubernetes-laptop-safe-setup/feed/ 0
WSL2 + VPN: Why It Breaks (and How to Fix It) https://cr0x.net/en/wsl2-vpn-breaks-and-fix/ https://cr0x.net/en/wsl2-vpn-breaks-and-fix/#respond Sat, 28 Feb 2026 00:42:44 +0000 https://cr0x.net/?p=34934 You connect to the corporate VPN, open your WSL2 shell, and suddenly nothing works. Git can’t reach the internal repo.
Curl hangs. DNS returns lies. Your app can hit the internet or the intranet, but never both.

This isn’t you being cursed. It’s the predictable outcome of layering a Linux VM (WSL2) behind Windows NAT, then letting a VPN client
rewrite routes and DNS like it owns the machine. Which, to be fair, it sort of does.

The mental model: why WSL2 and VPNs collide

WSL2 is not a compatibility layer in the old sense. It’s a lightweight VM with a real Linux kernel.
Networking-wise, it behaves like a machine sitting behind a small NAT gateway implemented by Windows.
Your Linux distro gets an address in a private range (typically 172.16.0.0/12 or 192.168.0.0/16-ish),
and Windows does translation and forwarding.

VPN clients, on the other hand, are professionally intrusive. They install virtual adapters, inject routes,
override DNS, enforce policy, and sometimes block “non-corporate” forwarding paths on purpose.
The VPN usually assumes it’s dealing with one host network stack. WSL2 is a second stack behind it.
Now you’ve got two routing domains and at least three places DNS might be “helpfully” rewritten:
Windows, the VPN client, and WSL’s auto-generated /etc/resolv.conf.

Where the packets actually go

When a process in WSL2 connects to git.internal.corp, it does:

  • Linux app asks Linux resolver for DNS.
  • Linux resolver consults /etc/resolv.conf (often pointing to a Windows-side resolver IP).
  • Traffic leaves the WSL VM to the Windows host via the WSL virtual switch.
  • Windows decides which interface/route wins: VPN adapter, Wi‑Fi/Ethernet, or “nope”.
  • VPN client may NAT, encrypt, or block forwarding depending on policy.

The breakage usually falls into one of four buckets:
routing, DNS, MTU/fragmentation, or policy/firewall.
You can fix all four, but you have to stop guessing which bucket you’re in.
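
The bucket-picking reduces to three observable facts: did DNS resolve, did a small TCP connect succeed, did a large transfer succeed. A toy classifier under those assumptions (it folds routing and policy/firewall into one bucket, because you separate those later with host-side checks anyway):

```shell
# Maps three test outcomes (ok/fail) to the bucket worth investigating first.
classify_failure() {  # usage: classify_failure <dns> <small_tcp> <large_tcp>
  case "$1:$2:$3" in
    fail:*)     echo "dns" ;;
    ok:fail:*)  echo "routing-or-policy" ;;
    ok:ok:fail) echo "mtu-fragmentation" ;;
    *)          echo "no-fault-found" ;;
  esac
}
```

Feed it what you actually observed (e.g. classify_failure ok ok fail) and follow the bucket it names instead of your hunch.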

Joke #1: VPN clients are like cats—independent, territorial, and convinced they’re the only thing in the house that matters.

The uncomfortable truth

Some VPN policies intentionally make WSL2 hard. Not because WSL2 is evil, but because it is effectively a second machine.
Security teams worry about unmanaged Linux environments pivoting into internal networks.
So the “fix” might not be a tweak; it might be “get your VPN team to allow it” or “use a sanctioned dev VM”.
Still, in many companies, it’s just misconfiguration and defaults fighting defaults.

Fast diagnosis playbook (check 1/2/3)

When WSL2 + VPN breaks, don’t start by reinstalling WSL or changing ten settings at once.
Do the following in order. Each step narrows the fault domain.

1) Is it DNS or routing?

  • If ping 10.x.x.x works but ping git.internal.corp doesn’t, it’s DNS.
  • If DNS resolves to the right IP but TCP can’t connect, it’s routing/firewall/MTU.
  • If only some internal subnets work, it’s split tunnel route injection.

2) Compare Windows vs WSL behavior

  • If Windows can reach internal resources but WSL can’t, the problem is the NAT/forwarding boundary or DNS handoff.
  • If Windows can’t reach them either, stop blaming WSL. Fix VPN connection, routes, or corporate DNS first.

3) Identify the interface that should win

  • Full tunnel: default route should go via VPN. DNS should be corporate.
  • Split tunnel: only specific internal prefixes go via VPN; default stays on local internet.
  • Either way: WSL’s traffic must be allowed to traverse Windows to that interface.

4) Check MTU last (but don’t forget it)

MTU problems look like “DNS works, ping works, HTTPS hangs” or “small requests work, large ones stall”.
If you see that pattern, test MTU before you waste an hour arguing with routing tables.

Facts and history: why this is a recurring mess

  • WSL1 and WSL2 are fundamentally different networks. WSL1 shared the Windows stack; WSL2 is NATed behind a VM.
  • WSL2 uses a Hyper‑V virtual switch under the hood. Even on Windows Home, the plumbing behaves like a mini Hyper‑V setup.
  • Many VPN clients still ship kernel drivers. They hook deeply into Windows networking to enforce policy and routes.
  • DNS “split horizon” is common in enterprises. The same hostname resolves differently inside vs outside the VPN, making stale resolvers disastrous.
  • NRPT and per-interface DNS are a Windows thing. Windows can send different DNS queries to different servers depending on domain rules; Linux typically doesn’t unless configured.
  • MTU pain is older than WSL. VPN encapsulation reduces effective MTU, and PMTUD is still fragile in real networks.
  • Default route precedence is not universal. Windows route metrics and VPN “force tunnel” settings often surprise people coming from Linux.
  • Corporate security baselines often block IP forwarding. Even if Windows can reach the VPN, it may refuse to forward traffic from a “virtual” NIC.
  • WSL2 localhost access changed over time. Recent Windows builds improved localhost forwarding between Windows and WSL, but VPNs can still interfere.

Failure modes mapped to root causes

Failure mode A: “DNS is broken in WSL, but Windows is fine”

Most common. WSL autogenerates /etc/resolv.conf pointing at a Windows-side resolver IP.
When you connect VPN, Windows DNS servers change. WSL doesn’t always update cleanly, or it updates to a resolver that can’t reach the
corporate DNS due to interface binding.

Failure mode B: “Internal IPs time out from WSL, but resolve correctly”

That’s typically routing or policy. Windows can route to the internal subnet via VPN adapter,
but WSL traffic might not be allowed to traverse that path (firewall, VPN client policy, or missing route to return traffic).

Failure mode C: “Split tunnel: one subnet works, another doesn’t”

Split tunneling relies on pushing many internal routes. Some VPN clients push routes only into the Windows routing table and assume Windows processes are the only traffic source.
WSL traffic may use a different source address range (the WSL NAT subnet), and the corporate side might not have a return route.

Failure mode D: “HTTPS stalls, git clone hangs, but ping works”

MTU / fragmentation. Encapsulation shrinks MTU; some paths drop ICMP “fragmentation needed” messages; TCP blackholes.
It’s boring, it’s ancient, and it still burns teams weekly.

Failure mode E: “Everything works until sleep/resume”

Resume often changes interface ordering, route metrics, or the VPN client’s internal state.
WSL keeps running, still holding old DNS or routes, while Windows has moved on.

Hands-on tasks: commands, outputs, and decisions (12+)

These tasks are designed to be run exactly when things are broken. Each one tells you something specific,
and tells you what to do next. Run both Windows-side and WSL-side checks; the delta is the clue.

Task 1 (WSL): What IP and default gateway do I have?

cr0x@server:~$ ip -br addr
lo               UNKNOWN        127.0.0.1/8 ::1/128
eth0             UP             172.26.112.45/20

cr0x@server:~$ ip route show
default via 172.26.112.1 dev eth0
172.26.112.0/20 dev eth0 proto kernel scope link src 172.26.112.45

What it means: WSL is on 172.26.112.0/20 and uses 172.26.112.1 as a gateway (Windows NAT).
Decision: If the gateway is missing or the interface is DOWN, your WSL networking is broken before the VPN even enters the story.
Restart WSL (wsl --shutdown from Windows) and try again.

Task 2 (WSL): Is DNS pointing somewhere sane?

cr0x@server:~$ cat /etc/resolv.conf
# This file was automatically generated by WSL. To stop automatic generation of this file, add the following entry to /etc/wsl.conf:
# [network]
# generateResolvConf = false
nameserver 172.26.112.1

What it means: WSL is using the Windows gateway as a DNS proxy/resolver.
Decision: If corporate DNS is failing, this is your first suspect. Either the Windows-side resolver is wrong for your VPN state,
or WSL is stuck with stale DNS. Proceed to Tasks 3 and 4.

Task 3 (WSL): Can I resolve an internal name, and what server answered?

cr0x@server:~$ resolvectl status 2>/dev/null | sed -n '1,120p'
Global
       Protocols: -LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
resolv.conf mode: foreign
Current DNS Server: 172.26.112.1
       DNS Servers: 172.26.112.1

What it means: Your resolver is the WSL gateway. That can work, but only if Windows is forwarding queries appropriately.
Decision: If internal domains don’t resolve, test direct DNS to corporate servers next (Task 4) to separate “WSL resolver” from “DNS reachability”.

Task 4 (WSL): Query corporate DNS directly (bypass the proxy)

cr0x@server:~$ dig @10.20.30.40 git.internal.corp +time=2 +tries=1

; <<>> DiG 9.18.24-1ubuntu1.3-Ubuntu <<>> @10.20.30.40 git.internal.corp +time=2 +tries=1
; (1 server found)
;; global options: +cmd
;; connection timed out; no servers could be reached

What it means: WSL cannot reach the corporate DNS server IP directly.
Decision: This is routing/policy, not a resolver file problem. Go to Task 6 (routing) and Task 10 (firewall/policy hints).

Task 5 (Windows in WSL shell): Can I compare with Windows DNS quickly?

cr0x@server:~$ powershell.exe -NoProfile -Command "Resolve-DnsName git.internal.corp | Select-Object -First 1"
Name                                           Type   TTL   Section    IPAddress
----                                           ----   ---   -------    ---------
git.internal.corp                              A      60    Answer     10.50.12.34

What it means: Windows can resolve the name.
Decision: Now you know: DNS works on Windows, but WSL can’t use it. Fix is usually WSL DNS integration (see fixes section: “pin resolv.conf” or “use systemd-resolved properly”).

Task 6 (WSL): What route would I use to reach an internal IP?

cr0x@server:~$ ip route get 10.50.12.34
10.50.12.34 via 172.26.112.1 dev eth0 src 172.26.112.45 uid 1000

What it means: WSL will send internal traffic to the Windows gateway. That’s expected.
Decision: If Windows then doesn’t forward it into the VPN, you’re dealing with a Windows routing/policy/firewall problem or a VPN client that blocks forwarding from WSL’s vNIC.

Task 7 (WSL): Can I reach the internal IP at all?

cr0x@server:~$ ping -c 2 10.50.12.34
PING 10.50.12.34 (10.50.12.34) 56(84) bytes of data.

--- 10.50.12.34 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1027ms

What it means: No ICMP reachability. Not definitive (ICMP may be blocked), but it’s a strong signal.
Decision: Try TCP connectivity next. If TCP also fails, it’s routing/policy. If TCP works and ping fails, it’s just ICMP policy.

Task 8 (WSL): Test TCP to a known internal port

cr0x@server:~$ nc -vz -w 2 10.50.12.34 443
nc: connect to 10.50.12.34 port 443 (tcp) timed out: Operation now in progress

What it means: TCP can’t get through.
Decision: Move up the stack: check Windows routes and VPN adapter behavior. If Windows can connect but WSL can’t, suspect VPN client restrictions on forwarding/NAT.

Task 9 (Windows via WSL): What does Windows think the routes are?

cr0x@server:~$ powershell.exe -NoProfile -Command "Get-NetRoute -AddressFamily IPv4 | Sort-Object -Property RouteMetric | Select-Object -First 8"
ifIndex DestinationPrefix NextHop     RouteMetric ifMetric PolicyStore
------- ----------------- -------     ----------- -------- -----------
19      0.0.0.0/0         10.8.0.1    5           25       ActiveStore
3       0.0.0.0/0         192.168.1.1 35          25       ActiveStore
19      10.0.0.0/8        0.0.0.0     5           25       ActiveStore
19      172.16.0.0/12     0.0.0.0     5           25       ActiveStore
19      192.168.0.0/16    0.0.0.0     5           25       ActiveStore
1       127.0.0.0/8       0.0.0.0     256         75       ActiveStore

What it means: The VPN (ifIndex 19) is winning default route and internal RFC1918 ranges.
Decision: In full tunnel, this is expected. In split tunnel, it might be wrong and will break local network access.
Either way, if WSL can’t traverse this, the VPN client may be blocking forwarding or Windows firewall is filtering the WSL vEthernet interface.

Task 10 (Windows via WSL): Is the WSL vEthernet profile/firewall hostile?

cr0x@server:~$ powershell.exe -NoProfile -Command "Get-NetAdapter -Name 'vEthernet (WSL)' | Format-List -Property Name,Status,MacAddress,LinkSpeed"
Name       : vEthernet (WSL)
Status     : Up
MacAddress : 00-15-5D-4A-2B-1C
LinkSpeed  : 10 Gbps

cr0x@server:~$ powershell.exe -NoProfile -Command "Get-NetConnectionProfile -InterfaceAlias 'vEthernet (WSL)' | Format-List"
Name             : Network
InterfaceAlias   : vEthernet (WSL)
NetworkCategory  : Public
IPv4Connectivity : Internet

What it means: The WSL virtual NIC is categorized as Public. Many corporate firewalls clamp down hard on Public profiles.
Decision: If policy allows, set it to Private. If policy doesn’t allow, you need explicit firewall rules or a sanctioned approach.

Task 11 (WSL): Check MTU and test for blackholing

cr0x@server:~$ ip link show dev eth0 | sed -n '1,3p'
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 00:15:5d:4a:2b:1c brd ff:ff:ff:ff:ff:ff

cr0x@server:~$ ping -c 2 -M do -s 1473 10.50.12.34
PING 10.50.12.34 (10.50.12.34) 1473(1501) bytes of data.
ping: local error: message too long, mtu=1500

cr0x@server:~$ ping -c 2 -M do -s 1360 10.50.12.34
PING 10.50.12.34 (10.50.12.34) 1360(1388) bytes of data.

--- 10.50.12.34 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1018ms

What it means: The first ping (1473-byte payload + 28 bytes of headers = 1501) trips the local MTU check, confirming the interface MTU of 1500. The second, well under that limit, still fails—so it's not just "too big packets".
Decision: If smaller sizes succeed but larger ones stall, clamp MTU (or MSS) on the WSL side or fix PMTUD. If all sizes fail, return to routing/policy.
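
If you want the exact path limit instead of guessing payload sizes, a binary search finds it in a handful of probes. This is a sketch: "probe" is any command that exits 0 when a payload size fits (in real use, a ping -M do wrapper against your internal IP), and it assumes the low bound passes while the high bound fails.

```shell
# Binary-search the largest payload a probe accepts. Path MTU is the result
# plus 28 bytes (IPv4 + ICMP headers). Assumes probe succeeds at <lo> and
# fails at <hi>.
mtu_bisect() {  # usage: mtu_bisect <probe_cmd> <lo> <hi>
  local probe="$1" lo="$2" hi="$3" mid
  while [ $((hi - lo)) -gt 1 ]; do
    mid=$(((lo + hi) / 2))
    if "$probe" "$mid"; then lo="$mid"; else hi="$mid"; fi
  done
  echo "$lo"
}

# A real probe would look like this (hypothetical target IP):
#   probe() { ping -c1 -W2 -M do -s "$1" 10.50.12.34 >/dev/null 2>&1; }
#   mtu_bisect probe 1200 1500
```

The number it prints, plus 28, is the MTU to clamp to in the fixes section.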

Task 12 (WSL): Trace where packets stop (basic)

cr0x@server:~$ traceroute -n -w 1 -q 1 10.50.12.34
traceroute to 10.50.12.34 (10.50.12.34), 30 hops max, 60 byte packets
 1  172.26.112.1  0.240 ms
 2  * 
 3  * 
 4  *

What it means: Traffic reaches the Windows gateway, then disappears. That’s the boundary.
Decision: Stop tweaking Linux routes. You need to make Windows+VPN forward traffic properly, or adjust the VPN posture to support WSL.

Task 13 (WSL): See if a proxy is expected (common in corporate VPNs)

cr0x@server:~$ env | grep -iE 'http_proxy|https_proxy|no_proxy'
https_proxy=http://proxy.internal.corp:8080
http_proxy=http://proxy.internal.corp:8080
no_proxy=localhost,127.0.0.1,.internal.corp

What it means: Your environment expects an HTTP proxy.
Decision: If direct TCP to internal endpoints fails but proxy access works on Windows, you may need consistent proxy vars inside WSL (or use a PAC solution via Windows). Also ensure no_proxy includes internal domains that shouldn’t be proxied.

Task 14 (WSL): Confirm what git actually tries to do

cr0x@server:~$ GIT_CURL_VERBOSE=1 GIT_TRACE=1 git ls-remote https://git.internal.corp/team/repo 2>&1 | sed -n '1,40p'
20:17:14.123456 git.c:460               trace: built-in: git ls-remote https://git.internal.corp/team/repo
20:17:14.130000 http.c:725              == Info:   Trying 10.50.12.34:443...
20:17:16.132000 http.c:725              == Info: connect to 10.50.12.34 port 443 failed: Connection timed out
fatal: unable to access 'https://git.internal.corp/team/repo/': Failed to connect to git.internal.corp port 443 after 2002 ms: Timeout was reached

What it means: Resolution succeeded; connect timed out.
Decision: Don’t touch DNS. Fix routing/forwarding/policy or use the corporate proxy if required.

Fix patterns that actually hold up

There are dozens of forum “fixes” for WSL2+VPN. Half are cargo cult. A quarter are security policy violations.
The rest can be made reliable if you choose the right pattern for your environment: full tunnel, split tunnel, proxy-only, or “VPN client forbids it”.

Fix pattern 1: Stabilize DNS inside WSL (stop chasing moving targets)

If Windows resolves internal names but WSL doesn’t, you usually want WSL to use the corporate DNS servers directly,
or use a stable local forwarder that actually follows VPN state.

Option A: Pin resolv.conf (simple, blunt, effective)

Disable auto-generation and set explicit DNS servers.
This works best when corporate DNS IPs are stable while connected to VPN.

cr0x@server:~$ sudo tee /etc/wsl.conf >/dev/null <<'EOF'
[network]
generateResolvConf = false
EOF

cr0x@server:~$ sudo rm -f /etc/resolv.conf
cr0x@server:~$ sudo tee /etc/resolv.conf >/dev/null <<'EOF'
nameserver 10.20.30.40
nameserver 10.20.30.41
search internal.corp
options timeout:2 attempts:2
EOF

What the output means: No output is fine; you’re writing files.
Decision: Restart WSL (wsl --shutdown) so it re-reads config. If this fixes internal resolution but breaks public DNS off-VPN,
you need conditional DNS or a resolver that switches with VPN.

Option B: Use systemd-resolved properly (less blunt, more correct)

On newer WSL builds, systemd can be enabled. When it’s working, it gives you a real resolver daemon,
and you can point it at DNS servers with proper caching and fallback behavior.
The catch: you must keep the Windows/VPN DNS story consistent, or you’ll just fail faster.

If you’re not already using systemd in WSL, don’t enable it solely to fix VPN DNS unless you can support the change.
It’s good. It’s also another moving part. Production rule: add complexity only when it buys you stability.

Fix pattern 2: Split tunnel reality check (routes must exist both ways)

Split tunnel is where optimism goes to die. Your Windows box gets routes to 10.0.0.0/8 and friends via VPN.
WSL sends packets to Windows gateway. Windows forwards into VPN. Great.
Then the corporate side receives packets sourced from 172.26.112.45 (WSL’s private NAT range) and has no idea how to reply.

In full tunnel, this often “accidentally” works because the VPN client NATs outbound traffic and makes it look like it’s coming from the Windows host’s VPN address.
In split tunnel, NAT behavior varies wildly. Some clients NAT, some don’t, some do it only for Windows processes, and some refuse for “virtual adapters”.

Your fix options:

  • Best: VPN client supports NAT/forwarding for WSL traffic (or is configured to).
  • Also valid: Corporate network has routes back to the WSL NAT subnet (rare; requires network team buy-in and strict controls).
  • Workaround: Use a proxy/jump host/bastion that Windows can reach, and tunnel from WSL over that single allowed path.

Fix pattern 3: Make Windows firewall stop sabotaging the vEthernet (WSL) interface

If the WSL adapter is “Public” and your corporate endpoint firewall is strict, WSL traffic may be blocked from reaching the VPN adapter.
This is common on hardened laptops.

If you are allowed to, set the WSL virtual NIC profile to Private. If you’re not allowed, add narrow firewall rules instead of flipping profiles broadly.
Your security team will appreciate “narrow and auditable” more than “I toggled random stuff until it worked”.

Fix pattern 4: Fix MTU (or MSS) when TCP stalls

VPN encapsulation eats MTU. If you have an underlying MTU of 1500 and the VPN adds overhead,
your effective path MTU might be 1400-ish or lower. If PMTUD fails (because ICMP is filtered somewhere),
large TCP segments blackhole and connections hang in ways that look mystical.

The pragmatic fix is to clamp MTU on the WSL interface or clamp TCP MSS.
You’re trading a bit of throughput for stability. In corporate networks, stability is the only currency accepted.

cr0x@server:~$ sudo ip link set dev eth0 mtu 1400
cr0x@server:~$ ip link show dev eth0 | sed -n '1,2p'
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 00:15:5d:4a:2b:1c brd ff:ff:ff:ff:ff:ff

What it means: MTU is now 1400.
Decision: Retest the hanging workflow (git clone, curl, whatever). If it fixes stalls, make it persistent (via distro network config),
and document why. If it changes nothing, revert and keep diagnosing.
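
If you'd rather clamp MSS than lower the interface MTU, the arithmetic is fixed: MSS is the path MTU minus 40 bytes of IPv4 + TCP headers. A sketch—the iptables rule is the standard TCPMSS mangle clamp, shown commented because it needs root and should be applied deliberately:

```shell
# MSS to advertise for a given path MTU (20B IPv4 + 20B TCP headers).
mss_for_mtu() { echo $(( $1 - 40 )); }

# Example clamp (run manually, as root; assumes iptables is available):
#   iptables -t mangle -A POSTROUTING -p tcp --tcp-flags SYN,RST SYN \
#     -j TCPMSS --set-mss "$(mss_for_mtu 1400)"
```

Same trade as the MTU clamp: a little throughput for a lot of stability.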

Fix pattern 5: Treat proxies as first-class citizens

Many corporate VPN environments are not “route anywhere you want”. They are “route to proxy, then proxy”.
If Windows has proxy auto-config, but WSL does not, WSL will fail even though the laptop “works”.

In that environment, the right fix is to configure proxy variables in WSL consistently and maintain no_proxy.
Don’t hardcode this into random shell startup files with mystery conditions; use a managed approach (profile scripts),
and keep it visible. Secrets belong in credential stores, not in .bashrc.
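
A managed, visible version of that is a single function in a profile script (say, a hypothetical /etc/profile.d/corp-proxy.sh). The proxy host and no_proxy list below are the example values from Task 13, not universal defaults:

```shell
# One place for proxy settings instead of scattered .bashrc lines.
set_corp_proxy() {
  export http_proxy="http://proxy.internal.corp:8080"
  export https_proxy="$http_proxy"
  export no_proxy="localhost,127.0.0.1,.internal.corp"
  # Many tools only read the uppercase variants.
  export HTTP_PROXY="$http_proxy" HTTPS_PROXY="$https_proxy" NO_PROXY="$no_proxy"
}
```

Because it's one auditable file, changing the proxy later is an edit, not an archaeology dig.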

Joke #2: Debugging VPN networking is like watching a magic show, except the rabbit is your packet and it never comes back.

Fix pattern 6: When the VPN client forbids WSL, stop fighting it

Some VPN clients enforce “no traffic from virtual adapters” or similar. Sometimes this is explicit policy; sometimes it’s an implementation artifact.
If your diagnosis consistently shows packets die at the Windows gateway and your VPN client is known to lock down forwarding,
you have three sane options:

  • Use WSL1 (if it fits your workload) because it shares the Windows network stack and often avoids the NAT boundary entirely.
  • Use a sanctioned dev VM on the VPN (managed by IT) or a remote development host inside the network.
  • Use an SSH bastion reachable from Windows and connect through it (port-forwarding) from WSL; treat it as a controlled egress.

The unsane option is “install a sketchy driver” or “disable the endpoint firewall” to make it work.
That’s how you get your laptop quarantined, and then your week gets worse.

A reliability principle worth keeping

Hope is not a strategy. — James Cameron

Apply it here: stop hoping the route table “looks fine” and prove every layer with targeted tests.

Common mistakes: symptom → root cause → fix

1) Symptom: “WSL can’t resolve internal domains; Windows can”

Root cause: WSL /etc/resolv.conf points to a resolver that doesn’t follow VPN DNS changes, or Windows DNS is per-interface and WSL’s DNS proxy path misses NRPT rules.

Fix: Pin DNS inside WSL (disable generateResolvConf) to corporate DNS servers when on VPN, or implement a resolver approach that follows VPN state reliably.

2) Symptom: “DNS resolves, but TCP to internal IP times out from WSL”

Root cause: VPN client blocks forwarding from the WSL vEthernet adapter, or Windows firewall profile blocks it.

Fix: Check WSL adapter network category; add narrow firewall exceptions; if VPN policy forbids it, use WSL1 or a sanctioned dev host.

3) Symptom: “Split tunnel: some internal subnets work, others don’t”

Root cause: Missing routes from VPN client, overlapping local networks, or return path not known for WSL NAT range.

Fix: Verify Windows routes for each prefix; avoid local LAN ranges that overlap corporate RFC1918; push correct routes via VPN profile; ensure NAT behavior is consistent.

4) Symptom: “After sleep/resume, WSL loses connectivity until restart”

Root cause: Interface metric changes or stale resolver state in WSL; VPN reconnect changes DNS/route ordering.

Fix: Automate wsl --shutdown after VPN reconnect (or at least document it). Prefer stable DNS configuration instead of relying on auto-generated resolv.conf.

5) Symptom: “HTTPS stalls; small requests work; ping works”

Root cause: MTU blackhole due to VPN encapsulation and broken PMTUD.

Fix: Lower MTU on WSL interface (or clamp MSS) and retest. If it works, make it persistent and note the VPN overhead.

6) Symptom: “Local services on Windows aren’t reachable from WSL while on VPN”

Root cause: VPN forces routes or firewall policies that interfere with local subnet/localhost forwarding; sometimes the VPN toggles firewall rules aggressively.

Fix: Use explicit host IPs, validate Windows firewall rules, and consider binding services to the right interface. For development, prefer connecting via localhost when supported, but verify it after VPN connect.

Checklists / step-by-step plan

Checklist A: You need internal DNS + internal TCP from WSL (full tunnel VPN)

  1. Confirm Windows can resolve and connect to the internal service.
  2. In WSL, check /etc/resolv.conf; test dig to corporate DNS directly.
  3. If Windows works and WSL doesn’t, pin WSL DNS to corporate resolvers or fix the DNS proxy path.
  4. Verify Windows routes: VPN default route and internal prefixes should go via VPN adapter.
  5. Check WSL adapter firewall profile; change to Private if allowed or add narrow rules.
  6. If TCP stalls in weird ways, test MTU and clamp if needed.
  7. Document the final state: DNS servers, MTU, and what breaks when off-VPN.

Checklist B: You need split tunnel (internet local, corp subnets via VPN)

  1. List the exact internal prefixes you need (don’t guess; get them from the VPN profile or network team).
  2. On Windows, confirm those prefixes have routes via VPN adapter (Task 9 approach).
  3. In WSL, verify traffic to each prefix goes to the Windows gateway (Task 6) and doesn’t detour.
  4. Test reachability to an internal IP per prefix (Task 8).
  5. If it fails, suspect return path/NAT behavior: see if corporate side expects traffic from the Windows VPN IP only.
  6. Decide: (a) VPN client supports WSL forwarding/NAT; (b) corporate routes back to WSL NAT subnet; or (c) use proxy/bastion.
  7. Stabilize DNS: internal domains must resolve to internal IPs only while on VPN; avoid mixing public and private resolvers.

Checklist C: You’re on a locked-down corporate endpoint

  1. Assume you cannot change firewall profile, install drivers, or add routes persistently.
  2. Prove where packets die (Task 12 traceroute to internal IP).
  3. If the drop is at Windows gateway and Windows itself can reach the target, it’s likely policy against forwarding from virtual adapters.
  4. Stop fighting the laptop. Switch strategy: WSL1, remote dev host, or bastion with port forwards.
  5. Get the exception documented if WSL2 is a business need; “it’s annoying” is not a business need.

Three corporate mini-stories (anonymized, plausible, and painfully familiar)

Mini-story 1: The incident caused by a wrong assumption

A team rolled out a new internal package mirror during a migration. Developers used WSL2 for builds.
The mirror lived on an internal subnet reachable only via VPN.
Windows laptops could reach it. The team assumed that meant “developers can reach it,” full stop.

The first Monday after rollout, build times went from normal to catastrophic. Not because the mirror was down.
Because WSL2 could not reach it, so tooling fell back to public mirrors and retries. Some builds succeeded slowly; some failed; everyone blamed the mirror anyway.
The on-call got paged for “registry outage” that wasn’t an outage.

The failure mode was clean in hindsight: Windows had the VPN routes; WSL2 traffic died at the host boundary.
The VPN client enforced a policy that didn’t allow forwarding from virtual adapters. Windows processes were fine; WSL2 was not.
Nobody checked the assumption early because “it works on my machine” was true—just not in the same network stack.

The fix wasn’t a clever route hack. It was a decision: for this company’s security posture, WSL2 would not get direct VPN access.
They shipped a sanctioned Linux VM image for builds and kept WSL2 for local-only workflows. It was less convenient.
It also stopped the weekly “why is apt broken” Slack archaeology.

Mini-story 2: The optimization that backfired

Another org tried to “speed up VPN” by enforcing split tunneling aggressively. Internet traffic stayed local; only internal prefixes went to VPN.
It reduced load on their concentrators and made video calls less miserable. Everybody celebrated.

Then the dev tooling started failing in weird ways. Containers in WSL2 would reach some internal services but not others.
DNS sometimes returned internal IPs, sometimes public ones. Git over HTTPS hung intermittently.
It wasn’t random. It was deterministic chaos.

The optimization introduced two coupling points: route completeness and DNS consistency.
Their split tunnel routes didn’t include every internal dependency, especially newly added ones. And their DNS strategy relied on Windows NRPT rules,
which WSL2 didn’t honor when it used a generic resolver path.
The result: resolution succeeded, but the route didn’t exist, so connections timed out. Developers “fixed” it with hosts files and hardcoded IPs.
Which worked until the next renumbering.

The rollback wasn’t total. The team kept split tunnel for most staff, but created a “developer” VPN profile:
either full tunnel or a much broader set of prefixes plus stable DNS. Less elegant, more bandwidth, fewer mystery outages.
They also banned hosts-file fixes for internal services, not out of purity, but because it was operational debt with teeth.

Mini-story 3: The boring but correct practice that saved the day

A platform team had a habit that looked like bureaucracy: every time a developer reported “VPN broke WSL,”
the first response wasn’t advice—it was a tiny checklist of command outputs to paste.
WSL: ip route, cat /etc/resolv.conf, dig internal domain.
Windows: route table snippet and DNS server list. Same request, every time.

People complained at first. “Why do you need all that? It’s obviously DNS.” It was not always DNS.
The checklist made them stop arguing with their own hunches.
After a month, patterns emerged: one VPN client version broke forwarding; one Wi‑Fi driver update changed metrics; one endpoint firewall policy update reclassified the WSL NIC.

When a real incident hit—an internal repo unreachable from WSL across a department—those baseline outputs made triage fast.
They could say: Windows resolves and connects, WSL resolves but cannot connect, traceroute dies at gateway, and the WSL NIC is Public.
That narrowed the blast radius from “network is down” to “endpoint policy regression”.

The eventual fix was dull: a policy adjustment to allow specific outbound flows from the WSL adapter into the VPN adapter,
plus a documented workaround for MTU clamping when on a certain ISP. Nobody wrote a blog post about it.
But the next time it happened, the on-call slept.

FAQ

1) Why does WSL2 break on VPN when WSL1 didn’t?

WSL1 shared the Windows network stack. WSL2 is a VM behind a NAT boundary. VPN clients and firewalls that handle the Windows stack
don’t automatically forward traffic for a second stack.

2) Should I switch to WSL1 just for VPN compatibility?

If your workload tolerates WSL1 (filesystem performance and kernel features can be limiting), it’s a legitimate fix.
It avoids the NAT boundary and often behaves like “normal Windows networking”.

3) Is the problem always DNS?

No. DNS is common, but plenty of failures are routing/policy: traffic reaches the Windows gateway and dies because the VPN client blocks forwarding
or the firewall profile is hostile.

4) Why does it work until I reconnect VPN or resume from sleep?

VPN reconnect and resume can reorder routes, change DNS servers, and change interface metrics.
WSL may keep old resolver configuration, and the Windows/VPN state changes underneath it.

5) How do I know if it’s MTU?

If name resolution works and small requests succeed, but HTTPS downloads, git clones, or API calls hang or stall, suspect MTU/PMTUD.
Clamp MTU in WSL temporarily and retest the failing workflow.

6) Can I just add routes inside WSL to fix split tunnel?

Usually no. WSL routes still go through the Windows gateway. If Windows/VPN won’t forward, Linux-side route tweaks won’t change policy.
Focus on Windows routes and VPN behavior first.

7) Why can Windows reach internal IPs but WSL can’t?

Windows processes originate from Windows interfaces and are handled by the VPN client as expected.
WSL traffic originates from the WSL vEthernet/NAT subnet; some VPN clients treat that as “forwarded traffic” and block it.

8) What’s the cleanest enterprise-friendly solution?

A sanctioned development environment inside the corporate network: managed VM, VDI, or remote dev host.
If local WSL2 must be used, negotiate a documented policy exception and implement narrow firewall rules rather than broad profile changes.

9) Does enabling systemd in WSL fix VPN issues?

Sometimes it helps with DNS because you get a real resolver service, but it doesn’t magically bypass VPN policy or firewall rules.
Use it when you can support it operationally, not as a superstition.

Practical next steps

Do three things, in this order:

  1. Classify the failure: DNS vs routing/policy vs MTU. Use the fast diagnosis steps and the tasks above.
  2. Pick a stable fix pattern: pin DNS, adjust firewall/profile, clamp MTU, adopt proxy, or switch strategy (WSL1/remote dev host).
  3. Make it repeatable: document the expected state (DNS servers, VPN mode, MTU) and keep a copy-paste checklist for the next outage.

The goal isn’t “make it work on your laptop today.” The goal is “make it boring next month.”
WSL2 is excellent. VPNs are necessary. Getting them to cooperate is less about clever commands and more about admitting where the boundary really is.

Port Forwarding in WSL2: Make Your Services Reachable from LAN
https://cr0x.net/en/wsl2-port-forwarding-lan/ (Tue, 24 Feb 2026)

You spun up a perfectly good service in WSL2—an API, a Prometheus endpoint, a dev database—and it works from the Windows host.
Then someone on the LAN tries to hit it and gets nothing but a timeout and a vague sense of betrayal.

This isn’t a Linux problem. It’s not even really a Windows problem. It’s an expectation management problem with a NAT’d VM
that changes IPs, plus a firewall that treats you like a stranger until you prove otherwise.

The mental model: what “WSL2 networking” actually is

WSL2 is not “Linux processes on Windows” in the way WSL1 was. WSL2 is a lightweight VM running a real Linux kernel.
That VM sits behind a virtual switch and typically a NAT. Translation: your WSL2 instance has its own IP address, usually on a private
Hyper-V network, and that address can change whenever WSL restarts.

When you hit localhost on Windows, WSL2 tries to be helpful. There’s a feature commonly referred to as
“localhost forwarding” that makes ports reachable from the Windows host without you doing explicit routing.
But “reachable from the Windows host” is not the same as “reachable from the rest of your network.”

For the LAN to reach a service inside WSL2, you need an ingress point on the Windows host (which does have a LAN-reachable IP),
and then a forward or proxy from that host into the VM’s private IP/port. That’s the whole game.

Think of Windows as your edge node, and WSL2 as a backend service in a private subnet. Suddenly it looks like any other environment
you’ve supported in production. Congratulations: you’re running a tiny NAT’d datacenter on your laptop.

Interesting facts and quick history

  • WSL1 vs WSL2 networking is fundamentally different. WSL1 shared the host network stack; WSL2 does not. That’s why port exposure got “harder.”
  • WSL2 uses a real Linux kernel. This is why iptables, sysctl, and “normal Linux” debugging tools are relevant again.
  • Hyper-V’s virtual switch is the plumbing. Even on Windows editions where you never opened Hyper-V Manager, WSL2 still relies on Hyper-V components.
  • The WSL2 VM IP is usually ephemeral. It’s allocated on startup; shutdown/restart often changes it, which breaks static forwards if you hardcode it.
  • netsh interface portproxy predates WSL2 by years. It’s an old-but-effective TCP forwarding tool that became popular again because it works.
  • Portproxy is TCP-only. If you need UDP, you’ll use a different approach (or a different design).
  • Windows Firewall treats forwarded ports like real inbound services. If the firewall blocks the port on Windows, your forwarding plan is dead on arrival.
  • “It works on localhost” is a trap. Localhost success often proves only that Windows-to-WSL forwarding is active—not that your service is listening on the right interface.

Fast diagnosis playbook

When someone says “I can’t reach my WSL2 service from the LAN,” the fastest route is to identify the first hop where packets stop.
Don’t guess. Don’t reboot. Measure.

1) First check: is the service listening in Linux on the right interface?

If it only listens on 127.0.0.1 inside WSL, forwarding from Windows won’t help. You’ll be forwarding into a brick wall.

2) Second check: can Windows reach the service by WSL IP?

If Windows can’t connect to WSL_IP:PORT, your issue is not the LAN. It’s service binding, Linux firewall, or routing between host and VM.

3) Third check: is Windows listening on the LAN IP/port and is the firewall open?

If Windows isn’t listening (or firewall is blocking), the LAN will time out. If Windows is listening but forwarding is wrong, you’ll see quick failures or resets.

4) Fourth check: does it fail only from some LAN clients?

That’s a clue for client-side firewall/VPN split tunneling, wrong subnet profiles (Public vs Private), or a corporate endpoint policy.

Baseline tasks: prove where the packets die

These are the tasks I run in order. Each one has: command, example output, what it means, and what decision you make next.
The goal is to stop arguing with your future self.

Task 1: Find your Windows LAN IP (the real ingress point)

cr0x@server:~$ ipconfig

Windows IP Configuration

Ethernet adapter Ethernet:

   IPv4 Address. . . . . . . . . . . : 192.168.1.40
   Subnet Mask . . . . . . . . . . . : 255.255.255.0
   Default Gateway . . . . . . . . . : 192.168.1.1

Meaning: LAN clients must target 192.168.1.40 (or your Wi-Fi IP) to reach anything on this machine.
Decision: Use this IP in your test from another computer. Don’t use the WSL IP; the LAN can’t route to it by default.

Task 2: Find the WSL2 VM IP (the backend target)

cr0x@server:~$ wsl.exe -e bash -lc "ip -4 addr show eth0 | sed -n 's/.*inet \\([0-9.]*\\).*/\\1/p'"
172.24.181.166

Meaning: This is the address Windows must forward to.
Decision: If you implement portproxy, you’ll point it at this IP. Also: expect it to change later.

Task 3: Confirm the service is listening in WSL2, and where

cr0x@server:~$ wsl.exe -e bash -lc "ss -lntp | head -n 20"
State  Recv-Q Send-Q Local Address:Port  Peer Address:Port Process
LISTEN 0      4096   127.0.0.1:8000      0.0.0.0:*     users:(("python3",pid=2217,fd=3))
LISTEN 0      4096   0.0.0.0:9090        0.0.0.0:*     users:(("prometheus",pid=1022,fd=7))

Meaning: Port 8000 is bound to 127.0.0.1 only (loopback). Port 9090 is bound to 0.0.0.0 (all interfaces).
Decision: Anything bound to 127.0.0.1 will not be reachable through a forward to eth0. Fix the service config first.

Task 4: Test from Windows to the WSL IP directly (bypasses portproxy)

cr0x@server:~$ powershell.exe -NoProfile -Command "Test-NetConnection -ComputerName 172.24.181.166 -Port 9090"

ComputerName     : 172.24.181.166
RemoteAddress    : 172.24.181.166
RemotePort       : 9090
InterfaceAlias   : vEthernet (WSL)
SourceAddress    : 172.24.176.1
TcpTestSucceeded : True

Meaning: Windows can reach the service inside WSL. That’s good.
Decision: Now you only need to solve LAN-to-Windows ingress + forward.

Task 5: If the Windows-to-WSL test fails, check Linux firewall inside WSL

cr0x@server:~$ wsl.exe -e bash -lc "sudo iptables -S | head -n 30"
-P INPUT ACCEPT
-P FORWARD ACCEPT
-P OUTPUT ACCEPT

Meaning: No iptables blocking here (policy ACCEPT).
Decision: If your policy is DROP or you have explicit rejects, fix that before touching Windows.

Task 6: Check whether Windows is already listening on the target port

cr0x@server:~$ powershell.exe -NoProfile -Command "Get-NetTCPConnection -LocalPort 9090 -State Listen | Format-Table -AutoSize"

LocalAddress LocalPort RemoteAddress RemotePort State  OwningProcess
----------- --------- ------------- ---------- -----  -------------
0.0.0.0     9090      0.0.0.0       0          Listen 1234

Meaning: Something is already listening on Windows on port 9090. Portproxy cannot bind to a port that’s taken.
Decision: Choose a different external port, or stop the conflicting listener.

Task 7: Check existing portproxy rules (because ghosts exist)

cr0x@server:~$ powershell.exe -NoProfile -Command "netsh interface portproxy show v4tov4"

Listen on ipv4:             Connect to ipv4:

Address         Port        Address         Port
--------------- ----------  --------------- ----------
0.0.0.0         8080        172.24.181.166  8000

Meaning: Windows listens on 0.0.0.0:8080 and forwards to WSL 172.24.181.166:8000.
Decision: Validate that WSL is actually listening on 8000 on a non-loopback interface, or this forward is lying to you.

Task 8: Confirm Windows is listening after you add portproxy

cr0x@server:~$ powershell.exe -NoProfile -Command "netstat -ano | findstr :8080"
  TCP    0.0.0.0:8080           0.0.0.0:0              LISTENING       4

Meaning: PID 4 is the System process hosting the portproxy listener. That’s normal.
Decision: If you don’t see LISTENING, portproxy isn’t active or the port is blocked/occupied.

Task 9: Test from a different LAN machine (the only test that matters)

cr0x@server:~$ curl -I http://192.168.1.40:8080
HTTP/1.1 200 OK
Server: uvicorn
Date: Tue, 06 Feb 2026 11:20:14 GMT
Content-Type: text/html; charset=utf-8

Meaning: LAN can reach Windows, Windows forwards to WSL, and the app responds.
Decision: Ship it. Then make it persistent, because WSL IPs are fickle.

Task 10: If LAN test times out, check Windows firewall rule state

cr0x@server:~$ powershell.exe -NoProfile -Command "Get-NetFirewallRule -DisplayName 'WSL2 8080 Inbound' -ErrorAction SilentlyContinue | Select-Object DisplayName, Enabled, Profile, Direction, Action"
DisplayName            Enabled Profile Direction Action
-----------            ------- ------- --------- ------
WSL2 8080 Inbound      True    Private Inbound   Allow

Meaning: Rule exists, enabled, applies to Private profile, allows inbound.
Decision: If your network is set to Public, this rule won’t match. Fix the profile or add Public (carefully).

Task 11: Verify the Windows network profile (Public vs Private)

cr0x@server:~$ powershell.exe -NoProfile -Command "Get-NetConnectionProfile | Format-Table -AutoSize"
Name             InterfaceAlias      NetworkCategory IPv4Connectivity
----             --------------      --------------- ----------------
OfficeLAN        Ethernet            Public          Internet

Meaning: You’re on a Public profile, so Private-only firewall rules won’t help.
Decision: Either switch the profile to Private (if appropriate) or allow the port on Public for this interface only.

Task 12: Confirm traffic is arriving on Windows (packet capture, quick and honest)

cr0x@server:~$ powershell.exe -NoProfile -Command "pktmon filter remove; pktmon filter add -p 8080; pktmon start --etw -m real-time"
PktMon started with Real Time display.

Meaning: You’re now watching for inbound packets on port 8080.
Decision: If you see nothing during a client test, the problem is upstream (wrong IP, VLAN, client route, or network policy). If you see traffic, focus on firewall/portproxy/service.

Joke #1: Port forwarding is like office politics—everything works until you assume the other side can “just reach it.”

Three ways to expose WSL2 services to the LAN (and when to use each)

Approach A: Windows portproxy (simple, works, TCP-only)

This is the go-to for “I need my WSL2 TCP service reachable from my LAN today.” You bind a listener on Windows, forward to WSL’s IP.
It’s boring. Boring is good.

Use it when:

  • You need TCP (HTTP, gRPC, SSH, Prometheus, databases over TCP).
  • You can tolerate a small amount of Windows-side configuration.
  • You want LAN reachability without changing the WSL2 networking mode.

Approach B: Reverse proxy on Windows (IIS/NGINX/Caddy on the host)

If you want TLS termination, host-based auth, nicer logs, or multiple backends with sane routing, run a proper reverse proxy on Windows
and point it at WSL. This is cleaner than stacking multiple portproxy rules, and it’s easier to reason about when you run more than one service.

Use it when:

  • You need HTTPS and don’t want your dev certs living inside WSL only.
  • You want path-based routing (/api, /grafana), or multiple apps on port 443.
  • You need real logs and rate limiting.
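As a sketch of Approach B, here is a minimal nginx server block for the Windows side. The hostname, certificate paths, and backend ports are illustrative assumptions, not values from this article; proxying to 127.0.0.1 relies on WSL's localhost forwarding being active, which sidesteps the ephemeral WSL IP problem:

```nginx
# Hypothetical Windows-side nginx config: one inbound hole (443), two backends.
server {
    listen 443 ssl;
    server_name dev.example.internal;          # hypothetical hostname
    ssl_certificate     certs/dev.pem;         # hypothetical cert paths
    ssl_certificate_key certs/dev-key.pem;

    location /api/ {
        proxy_pass http://127.0.0.1:8000/;     # via WSL localhost forwarding
        proxy_set_header Host $host;
    }
    location /grafana/ {
        proxy_pass http://127.0.0.1:3000/;
        proxy_set_header Host $host;
    }
}
```

One firewall rule for 443, real access logs, and path-based routing: this is why the proxy scales past two services where stacked portproxy rules don't.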

Approach C: Bridging / mirrored networking modes (tempting, but environment-specific)

Depending on your Windows and WSL version, there are newer networking modes that can change how WSL is connected (including “mirrored” behavior).
This can reduce the need for explicit port forwards, but it’s not a universal fix—especially in corporate environments with endpoint controls,
VPN clients, and managed firewall policies.

Use it when:

  • You’re standardizing a dev fleet and can validate behavior across identical builds.
  • You want fewer moving parts than portproxy + scheduled tasks.
  • You understand that “it works on my laptop” is not a deployment strategy.

Method 1 (most common): Windows netsh portproxy + firewall

Step 1: Make sure your service listens on something reachable from WSL’s eth0

If your app binds to 127.0.0.1 inside WSL, it will only accept connections originating inside that same WSL instance.
That’s fine for local dev. It’s useless for LAN exposure.

Example: Uvicorn/FastAPI. Don’t do this:

cr0x@server:~$ wsl.exe -e bash -lc "python3 -m uvicorn app:app --host 127.0.0.1 --port 8000"
INFO:     Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)

Do this:

cr0x@server:~$ wsl.exe -e bash -lc "python3 -m uvicorn app:app --host 0.0.0.0 --port 8000"
INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)

Decision: Before touching Windows networking, get ss -lntp to show 0.0.0.0:8000 (or the WSL eth0 IP).
That’s the contract you’re going to forward to.

Step 2: Add a portproxy rule (Windows listens, forwards to WSL)

Pick an external listening port on Windows. Often you’ll keep it the same as the internal port.
Here, we’ll forward Windows port 8080 to WSL port 8000.

cr0x@server:~$ powershell.exe -NoProfile -Command "netsh interface portproxy add v4tov4 listenaddress=0.0.0.0 listenport=8080 connectaddress=172.24.181.166 connectport=8000"
Ok.

Meaning: Windows now accepts TCP connections on all interfaces (including LAN) on port 8080 and forwards to WSL.
Decision: If you only want LAN exposure on one interface/IP, set listenaddress to your LAN IP instead of 0.0.0.0.

Step 3: Verify the rule and listener

cr0x@server:~$ powershell.exe -NoProfile -Command "netsh interface portproxy show v4tov4"
Listen on ipv4:             Connect to ipv4:

Address         Port        Address         Port
--------------- ----------  --------------- ----------
0.0.0.0         8080        172.24.181.166  8000
cr0x@server:~$ powershell.exe -NoProfile -Command "netstat -ano | findstr :8080"
  TCP    0.0.0.0:8080           0.0.0.0:0              LISTENING       4

Meaning: The portproxy mapping exists and Windows is listening.
Decision: If netstat shows nothing, you have a conflict (port in use), insufficient privileges, or a policy blocking it.

Step 4: Test from Windows, then from LAN

cr0x@server:~$ powershell.exe -NoProfile -Command "curl.exe -I http://127.0.0.1:8080"
HTTP/1.1 200 OK
Server: uvicorn
Date: Tue, 06 Feb 2026 11:22:40 GMT
Content-Type: text/html; charset=utf-8

Meaning: Windows can hit its own listener and get forwarded into WSL.
Decision: If localhost works but LAN doesn’t, your issue is almost always Windows Firewall profile/rules or upstream network.

Windows Firewall rules that don’t make you hate your life

Windows Firewall is not your enemy. It’s a bouncer. You just need to put your name on the list.
The subtle part is the profile: Domain, Private, Public. Your “OfficeLAN” might be categorized as Public, and then your carefully crafted
Private-only rule does exactly nothing.

Create an inbound allow rule for the listening port

cr0x@server:~$ powershell.exe -NoProfile -Command "New-NetFirewallRule -DisplayName 'WSL2 8080 Inbound' -Direction Inbound -Action Allow -Protocol TCP -LocalPort 8080 -Profile Private"
Name                  : {2d4e3f2b-41c0-4b84-9c85-1c09f3324c7f}
DisplayName           : WSL2 8080 Inbound
Enabled               : True
Direction             : Inbound
Profiles              : Private

Meaning: TCP/8080 inbound is allowed when your interface is in the Private profile.
Decision: If you’re on Public (Task 11), either change the network category or add Public. Don’t blindly open ports on Public on a laptop you roam with.

Limit scope: allow only your LAN subnet (recommended)

cr0x@server:~$ powershell.exe -NoProfile -Command "Set-NetFirewallRule -DisplayName 'WSL2 8080 Inbound' -RemoteAddress 192.168.1.0/24"

Meaning: Only clients in 192.168.1.0/24 can connect.
Decision: This is the grown-up move for home labs and office subnets. You’ll thank yourself later.

Confirm the rule applies and is enabled

cr0x@server:~$ powershell.exe -NoProfile -Command "Get-NetFirewallRule -DisplayName 'WSL2 8080 Inbound' | Get-NetFirewallAddressFilter | Format-Table -AutoSize"
LocalAddress RemoteAddress
------------ -------------
Any          192.168.1.0/24

Meaning: The address filter is scoped to your LAN subnet, as intended. (Port scoping lives in Get-NetFirewallPortFilter; address scoping lives here.)
Decision: If RemoteAddress shows “Any” and you didn’t intend that, tighten it. Security incidents often begin with “It’s just my dev box.”

Make it persistent across WSL restarts

WSL2’s IP changes. Portproxy rules don’t magically follow it. If you hardcode yesterday’s WSL IP, you will be
“mysteriously down” after a reboot, a WSL shutdown, or sometimes after the machine wakes up cranky.

There are two sane options:

  • Automate updating portproxy whenever WSL starts (or at user logon).
  • Stop using portproxy for dynamic backends and put a reverse proxy on Windows that targets localhost via WSL localhost forwarding (where appropriate).

Option 1: Script portproxy update at logon (simple, effective)

This PowerShell one-liner gets the current WSL IP and updates a portproxy mapping.

cr0x@server:~$ powershell.exe -NoProfile -Command "$wslip = (wsl.exe -e bash -lc `"ip -4 addr show eth0 | sed -n 's/.*inet \\([0-9.]*\\).*/\\1/p'`").Trim(); netsh interface portproxy delete v4tov4 listenport=8080 listenaddress=0.0.0.0 2>$null; netsh interface portproxy add v4tov4 listenaddress=0.0.0.0 listenport=8080 connectaddress=$wslip connectport=8000; netsh interface portproxy show v4tov4"
Listen on ipv4:             Connect to ipv4:

Address         Port        Address         Port
--------------- ----------  --------------- ----------
0.0.0.0         8080        172.24.181.166  8000

Meaning: The rule is now aligned with today’s WSL IP.
Decision: Put this in a Scheduled Task “At log on” or “At startup” with highest privileges.
If you don’t run it elevated, portproxy changes can fail silently or half-work.
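One way to wire that up, sketched with the ScheduledTasks cmdlets from an elevated PowerShell session. The script path is a hypothetical placeholder for a file containing the refresh one-liner above:

```powershell
# Hypothetical path: C:\tools\update-wsl-portproxy.ps1 holds the refresh logic.
$action  = New-ScheduledTaskAction -Execute "powershell.exe" `
           -Argument "-NoProfile -ExecutionPolicy Bypass -File C:\tools\update-wsl-portproxy.ps1"
$trigger = New-ScheduledTaskTrigger -AtLogOn
Register-ScheduledTask -TaskName "WSL2 portproxy refresh" `
    -Action $action -Trigger $trigger -RunLevel Highest
```

The -RunLevel Highest part matters: without elevation, portproxy changes can fail silently, and you get the half-configured state you were trying to avoid.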

Option 2: Confirm WSL isn’t being shut down unexpectedly

cr0x@server:~$ wsl.exe --status
Default Distribution: Ubuntu
Default Version: 2
WSL version: 2.1.5.0
Kernel version: 5.15.146.1

Meaning: WSL is installed and active; version details help when behavior differs across machines.
Decision: If your org has mixed versions, standardize or expect inconsistent port/localhost behavior.

Make the service listen correctly (the part everyone forgets)

Most “WSL2 forwarding” failures are not forwarding failures. They’re binding failures.
Your service is listening only on loopback, so it works from within WSL and maybe from Windows via special localhost forwarding,
but it won’t accept connections coming in on eth0.

Diagnose binding: loopback vs all interfaces

cr0x@server:~$ wsl.exe -e bash -lc "ss -lnt | awk 'NR==1 || /:8000|:9090/'"
State  Recv-Q Send-Q Local Address:Port Peer Address:Port Process
LISTEN 0      4096   127.0.0.1:8000    0.0.0.0:*
LISTEN 0      4096   0.0.0.0:9090      0.0.0.0:*

Meaning: 8000 is loopback-only; 9090 is reachable from the WSL VM network.
Decision: For LAN exposure, bind to 0.0.0.0 or the WSL IP, then lock it down with firewall/proxy if needed.
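The loopback-vs-everything decision is mechanical enough to capture in a tiny helper, sketched here in bash. Feed it the Local Address:Port column from `ss -lnt`; the function name and sample inputs are illustrative:

```shell
#!/usr/bin/env bash
# Classify an ss/netstat local-address field: loopback-only binds are
# invisible to forwards that arrive on eth0.
classify_bind() {
  case "$1" in
    127.*|\[::1\]*)          echo "loopback-only: eth0 forwards will fail" ;;
    0.0.0.0:*|\[::\]*|\*:*)  echo "all-interfaces: forwardable" ;;
    *)                       echo "specific IP: forwardable only if it is the eth0 address" ;;
  esac
}

classify_bind "127.0.0.1:8000"   # loopback-only: eth0 forwards will fail
classify_bind "0.0.0.0:9090"     # all-interfaces: forwardable
```

Run it against every listener before touching Windows networking; anything in the first category needs a service config change, not a forwarding rule.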

Check what address your app thinks it should bind to

cr0x@server:~$ wsl.exe -e bash -lc "grep -R --line-number 'listen\\|bind\\|host' /etc/nginx 2>/dev/null | head"

Meaning: No nginx config in this example. On real systems, you’ll find listen 127.0.0.1:... surprises.
Decision: If you see loopback binds, decide whether you want portproxy to forward to loopback (not typical) or change the service to listen on eth0.

Joke #2: If you bind to 127.0.0.1 and expect the LAN to reach it, you’re basically mailing a package to yourself and blaming the postal service.

Docker/Desktop/containers inside WSL2: the extra layer cake

Containers don’t remove networking complexity; they concentrate it. If you run Docker inside WSL2 (or Docker Desktop using WSL2),
you may have three networking layers: LAN → Windows → WSL2 VM → container.

The right move depends on where the container port is published.

Task: Check container port publishing inside WSL2

cr0x@server:~$ wsl.exe -e bash -lc "docker ps --format 'table {{.Names}}\t{{.Ports}}' | head"
NAMES        PORTS
api-dev      0.0.0.0:8000->8000/tcp
grafana      127.0.0.1:3000->3000/tcp

Meaning: api-dev is published on all interfaces in WSL2; grafana is loopback-only inside WSL2.
Decision: For LAN exposure, publish the container port on 0.0.0.0 in WSL2 (or use a WSL-side reverse proxy), then forward from Windows.

Task: Verify from Windows to WSL IP for a containerized service

cr0x@server:~$ powershell.exe -NoProfile -Command "Test-NetConnection -ComputerName 172.24.181.166 -Port 8000"
ComputerName     : 172.24.181.166
RemoteAddress    : 172.24.181.166
RemotePort       : 8000
InterfaceAlias   : vEthernet (WSL)
SourceAddress    : 172.24.176.1
TcpTestSucceeded : True

Meaning: The container-published port is reachable from Windows through the vEthernet interface.
Decision: Now portproxy that Windows port to WSL:8000, and handle firewall.

Three corporate-world mini-stories (so you don’t repeat them)

1) The incident caused by a wrong assumption

A team built an internal toolchain: a small API in WSL2, a UI on another developer’s machine, and some automation hitting the API from a CI runner on the LAN.
The developer demoed it: works on their Windows laptop using http://localhost:5000. Everyone nodded. They merged.

When the CI runner tried to call it, timeouts. The assumption was “localhost means reachable,” which is the kind of lie that survives just long enough to ship.
They had confused Windows-host localhost forwarding with actual LAN reachability.

The debugging started in the wrong place. People stared at application logs and blame-shifted to HTTP clients. Meanwhile, packets never made it to Windows at all.
The firewall profile was Public on the laptop because the network was treated as untrusted by policy. Inbound was blocked.

The fix was straightforward: a Windows inbound firewall rule scoped to the CI subnet, plus a portproxy mapping that updated on logon.
The important part wasn’t the commands. It was the change in mental model: WSL2 is a backend network segment, and Windows is the edge.

Afterward they wrote a one-page runbook: “If it only works on localhost, it’s not a service. It’s a local demo.” That saved them from repeating the same mistake later.

2) The optimization that backfired

Another team had multiple WSL2 services and got tired of managing portproxy rules. Someone suggested a “cleaner” approach:
bind everything inside WSL2 to 0.0.0.0, then open a broad inbound firewall rule on Windows for a port range.
Less admin overhead, faster onboarding. It sounded efficient.

Then a security scan flagged the machine as exposing unexpected services on the corporate network.
Some of those ports were dev databases without auth (because “it’s just on my laptop”), and one was a debugging endpoint that happily spilled environment variables.

The team argued it was a false positive. It wasn’t. They had created a “soft target” workstation that behaved like a lightly defended server.
Even if nobody exploited it, the exposure alone was enough to trigger incident response and policy scrutiny.

The rollback was painful but educational: they moved to a Windows reverse proxy on 443 with authentication, and only forwarded specific ports.
They also restricted firewall rules to known subnets. The onboarding got slightly more complex, but the security posture stopped being embarrassing.

Optimization isn’t just about speed. In networking, “fewer rules” often means “bigger blast radius.”

3) The boring but correct practice that saved the day

A platform team had a predictable problem: WSL2 IP changes broke local integrations for a handful of engineers every week.
Instead of heroic debugging sessions, they wrote a dull script and standardized it across machines via endpoint management.

The script did three things: query current WSL IP, rewrite portproxy mappings for a short list of required ports, and validate listeners with netstat.
If validation failed, it logged an error and stopped. No guessing, no partial state.

They also created firewall rules with explicit scopes (office subnets only) and locked them to Private/Domain profiles.
On Public networks (coffee shops), the services simply weren’t reachable. That was a feature, not a bug.

Months later, when a Windows update changed behavior on a subset of machines, their validation step caught it early.
They didn’t discover the break during a demo. They discovered it during a login script run, with a clear failure point.

The most valuable SRE work often looks like paperwork: standardization, idempotent scripts, and checks that fail loudly.
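That validate-or-stop pattern is small enough to sketch in bash. The `get_wsl_ip` function is a stub standing in for the real `wsl.exe` query, and the ports are illustrative:

```shell
#!/usr/bin/env bash
# "Fail loudly, no partial state": validate the WSL IP before rewriting rules.
set -euo pipefail

# Stub: on a real machine this would shell out to
#   wsl.exe -e bash -lc "ip -4 addr show eth0 | ..."
get_wsl_ip() { echo "172.24.181.166"; }

ip="$(get_wsl_ip)"
# An empty or malformed IP means stop: no rule rewrite, no guessing.
if [[ ! "$ip" =~ ^([0-9]{1,3}\.){3}[0-9]{1,3}$ ]]; then
  echo "ERROR: no valid WSL IP; refusing to touch portproxy" >&2
  exit 1
fi
echo "would rewrite: 0.0.0.0:8080 -> ${ip}:8000"
```

The point is the guard clause, not the commands: a script that refuses to run with bad input never leaves you with a forward pointing at nothing.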

Common mistakes: symptom → root cause → fix

1) Symptom: works in WSL with curl, fails from LAN with timeout

Root cause: Windows Firewall blocks inbound on the listening port (often wrong network profile: Public vs Private).

Fix: Create/enable an inbound allow rule for the port and correct profile, scope to your LAN subnet.

2) Symptom: works from Windows localhost, fails from Windows to WSL_IP

Root cause: Service in WSL binds to 127.0.0.1 only, or Linux firewall blocks non-loopback.

Fix: Bind service to 0.0.0.0 (or WSL eth0 IP). Re-check with ss -lntp.

3) Symptom: LAN sees “connection refused” immediately

Root cause: Windows is reachable, but nothing is listening (portproxy missing, wrong listenport, or port conflict prevented binding).

Fix: Check netstat -ano for LISTENING. Fix port conflicts or recreate portproxy mapping.

4) Symptom: it worked yesterday; today it times out

Root cause: WSL2 IP changed; portproxy still points to the old IP.

Fix: Re-query WSL IP and update portproxy. Automate it via Scheduled Task.

5) Symptom: only some LAN clients can connect

Root cause: Firewall rule scoped too tightly, client on different subnet/VLAN, or client is on VPN with different routing.

Fix: Validate client source IP/subnet. Expand RemoteAddress scope intentionally, or route appropriately.

6) Symptom: HTTPS/TLS errors or wrong Host header behavior

Root cause: Using raw portproxy for multiple HTTP apps without a reverse proxy; backend expects specific Host/SNI.

Fix: Put a reverse proxy on Windows, terminate TLS there, route by host/path, and forward to WSL.

7) Symptom: UDP service never works through portproxy

Root cause: Portproxy is TCP-only.

Fix: Redesign (use TCP), run a different proxy that supports UDP, or use a networking mode that provides direct addressing.

8) Symptom: portproxy exists, firewall open, still nothing

Root cause: You’re listening on 0.0.0.0 but the Windows NIC is on Public profile with inbound restrictions or corporate endpoint rules override local firewall.

Fix: Check network profile, endpoint security policies, and validate with packet capture (pktmon). If policy blocks inbound, stop fighting it and use an approved reverse proxy/ingress solution.

Checklists / step-by-step plan

Checklist A: One service, one port, needs LAN access (TCP)

  1. Confirm service binds correctly in WSL2.

    Run ss -lntp. You want 0.0.0.0:PORT or WSL_IP:PORT, not 127.0.0.1:PORT.

  2. Get the WSL IP.

    Use wsl.exe -e bash -lc "ip -4 addr show eth0 ...". Write it down; it’s your connectaddress.

  3. Test Windows to WSL directly.

    Test-NetConnection -ComputerName WSL_IP -Port PORT. If this fails, don’t touch the LAN yet.

  4. Create portproxy mapping.

    Bind on Windows listenaddress=0.0.0.0 and forward to WSL.

  5. Open Windows Firewall for the listening port.

    Create an inbound allow rule on Private/Domain; scope RemoteAddress to your LAN subnet.

  6. Test from another LAN machine.

    Use curl or a browser. Don’t call it done until a second machine succeeds.

  7. Make it persistent.

    Schedule a script to update portproxy mapping with the current WSL IP at startup/logon.
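The host-side half of this checklist reduces to two commands. A bash sketch that renders them from parameters, so you can review before pasting into an elevated session; the sample values are illustrative:

```shell
#!/usr/bin/env bash
# Render the two Windows-side commands Checklist A needs (portproxy + firewall)
# from parameters. This prints commands; it does not execute them.
render_forward() {
  local listen="$1" ext="$2" wslip="$3" int="$4"
  echo "netsh interface portproxy add v4tov4 listenaddress=$listen listenport=$ext connectaddress=$wslip connectport=$int"
  echo "New-NetFirewallRule -DisplayName 'WSL2 $ext Inbound' -Direction Inbound -Action Allow -Protocol TCP -LocalPort $ext -Profile Private"
}

render_forward 0.0.0.0 8080 172.24.181.166 8000
```

Generating commands instead of running them blind is a cheap review step; it also makes the mapping easy to diff when the WSL IP changes.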

Checklist B: Multiple services (HTTP) and you want it clean

  1. Run a reverse proxy on Windows on 443 (single inbound hole).
  2. Forward to WSL backends on their internal ports.
  3. Keep WSL ports closed to the LAN; only Windows proxy is exposed.
  4. Use firewall scopes to limit who can hit the proxy.
  5. Log requests on the Windows side; debugging becomes survivable.
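If you go the reverse-proxy route, the Windows-side config can stay tiny. A sketch in nginx syntax (hostname, backend IP, and port are placeholders; any proxy that runs on Windows works the same way):

```nginx
# Listen once on 443; route by Host header to WSL backends.
server {
    listen 443 ssl;
    server_name app.example.internal;          # placeholder hostname
    # ssl_certificate / ssl_certificate_key omitted for brevity

    location / {
        proxy_pass http://172.29.112.5:3000;   # current WSL eth0 IP + app port
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $remote_addr;
    }
}
```

One inbound firewall hole (443), TLS terminated in one place, and access logs on the Windows side where you can actually read them.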

Checklist C: You keep breaking after reboot

  1. Assume the WSL IP changed until proven otherwise.
  2. Update portproxy mapping programmatically.
  3. Validate: netsh ... show and netstat -ano.
  4. If it still fails, run the fast diagnosis playbook again. Don’t “just reboot.”

One quote worth keeping on your monitor

“Everything fails, all the time.” — Werner Vogels

It’s not pessimism. It’s a design constraint. Your port forwarding should be built like it’ll be rebooted, renumbered, and mis-profiled—because it will.

FAQ

1) Why can I reach the service from Windows localhost but not from other LAN machines?

Because Windows-to-WSL localhost forwarding is not LAN routing. LAN machines hit your Windows LAN IP, and Windows must accept inbound traffic
(firewall) and forward it (portproxy or proxy) to WSL.

2) Do I always need netsh portproxy?

No. It’s the simplest for TCP. If you want TLS, routing, and logs, run a reverse proxy on Windows instead.
If you need UDP, portproxy won’t help.

3) Why does portproxy break after a reboot or wsl --shutdown?

The WSL2 VM gets a new IP. Portproxy still points at the old one. Automate updating the mapping with the current WSL IP at startup/logon.

4) Can I forward to 127.0.0.1 inside WSL?

Not directly. Portproxy forwards to an IP that Windows can route to, and WSL’s loopback is not reachable from Windows, so you typically forward to the WSL eth0 IP.
If your service is loopback-only, change it to listen on 0.0.0.0 (and then use firewall/proxy to control exposure).

5) Is it safe to set listenaddress=0.0.0.0?

It’s safe only if your firewall scopes are sane. If you expose a port on all interfaces and allow it from “Any,” you are publishing a service.
Restrict by profile and RemoteAddress, especially on laptops.

6) Does portproxy support UDP?

No. It’s TCP-only. If you need UDP services (some games, DNS, certain telemetry), use a different proxy mechanism or redesign toward TCP.

7) Why do I see LISTENING PID 4 for my forwarded port?

That’s expected. The System process hosts the listener for portproxy rules.
Validate the mapping with netsh interface portproxy show v4tov4 and end-to-end testing.

8) Can I bind the forward to only my LAN IP instead of all interfaces?

Yes. Set listenaddress to your LAN IP (like 192.168.1.40) and keep the firewall rule scoped too.
This reduces accidental exposure on VPN adapters or other interfaces.

9) What’s the quickest way to tell if the LAN request even reaches my Windows machine?

Use pktmon filtered on the listening port while you run a LAN client request. If no packets arrive, stop blaming WSL.
Fix IP targeting, VLAN routing, or client VPN behavior.

Conclusion: next steps that actually work

Make this boring. Boring scales.

  1. Fix binding in WSL first: ensure your service listens on 0.0.0.0 (or WSL eth0 IP), not just loopback.
  2. Prove Windows can reach WSL by IP with Test-NetConnection. If this fails, don’t involve the LAN.
  3. Use portproxy for simple TCP exposure, then open Windows Firewall narrowly (profile + subnet scope).
  4. Automate portproxy updates so WSL IP churn doesn’t page you (or ruin your demo) after every reboot.
  5. When complexity grows (TLS, multiple apps, auth), stop stacking hacks and run a proper Windows reverse proxy.

If you treat your laptop like a tiny edge server—with explicit ingress, explicit policy, and repeatable config—you’ll stop “debugging networking”
and start shipping services.

]]>
https://cr0x.net/en/wsl2-port-forwarding-lan/feed/ 0
Install Node, Python, and Go in WSL: Clean Dev Environments Without Windows Mess https://cr0x.net/en/install-node-python-go-wsl/ https://cr0x.net/en/install-node-python-go-wsl/#respond Tue, 24 Feb 2026 09:49:10 +0000 https://cr0x.net/?p=34952 You want a dev box that behaves like production Linux, but your laptop came with Windows, a corporate image, and a “developer experience” that’s mostly registry keys and regret.
WSL is the compromise that actually works—if you stop treating it like “Linux but in a folder” and start treating it like a real system with boundaries.

This is the playbook I use when I need Node, Python, and Go installed cleanly, versioned sanely, and reproducible across a team—without smearing dependencies across Windows,
breaking PATH, or turning file I/O into a performance art piece.

The rules: keep Windows clean, keep WSL honest

A clean dev environment is mostly about refusing to “just install it real quick.” You can absolutely install Node on Windows, Python on Windows, Go on Windows,
then install them again in WSL, then wonder why your tooling randomly uses the wrong binary. That’s not a rite of passage. That’s a preventable incident.

My opinionated default:

  • Install language runtimes inside WSL, not on Windows. Use Windows only for editors/terminals and optional GUI tools.
  • Use version managers (nvm, pyenv) unless you have a controlled monorepo with pinned toolchains baked into containers.
  • Keep source code and dependency caches in the Linux filesystem (inside the WSL distro), not under /mnt/c.
  • Keep Windows PATH out of WSL PATH unless you have a specific reason. Interop is a scalpel, not a diet.
  • Automate setup with a bootstrap script and dotfiles. If your environment can’t be rebuilt, it’s not an environment; it’s a pet.

Exactly one quote, because it’s worth the ink. Gene Kim’s paraphrased idea is: reliability comes from systems and feedback loops, not heroics.
WSL setup is the same: boring defaults beat clever hacks.

Joke #1: The quickest way to learn about PATH precedence is to break PATH precedence.
The second quickest way is to read the rest of this article.

Facts and history you can weaponize

A few short facts that explain why WSL behaves the way it does—and why the “easy” install path often becomes the expensive one.

  1. WSL 1 and WSL 2 are different beasts. WSL 2 uses a real Linux kernel in a lightweight VM; WSL 1 translated Linux syscalls. Performance and semantics differ.
  2. WSL 2 networking is NAT’d by default. Localhost usually works, but some corporate proxies, VPNs, and port exposure assumptions will betray you.
  3. Cross-filesystem access is asymmetric. Linux filesystem access from WSL is fast; accessing Windows files under /mnt/c is slower, especially for many small files (hello, node_modules).
  4. Node tooling got more complex over time. npm became a platform; Yarn and pnpm fought for mindshare; Corepack showed up to standardize package manager versions.
  5. Python packaging is still a warzone. Wheels vs source builds, system libraries, and virtual environments create failure modes that look like “Python is broken” but are really “your build inputs are inconsistent.”
  6. Go deliberately reduced environment complexity. Modules (Go 1.11+) moved dependency management away from GOPATH-centric workflows, but GOPATH still matters for caches and older tooling.
  7. Ubuntu’s packages trade freshness for stability. Apt packages can lag language releases. Version managers exist because “latest” and “secure” are not synonyms.
  8. Corporate Windows images are opinionated. They ship with Python launchers, old Git, and security agents that hook file I/O. Mixing runtimes across Windows and WSL makes debugging twice as fun and half as useful.

Foundation: choose your distro, set boundaries, and verify WSL health

Pick one WSL distro per “persona.” For most people that’s one Ubuntu distro for day-to-day work. Multiple distros are fine when you need isolation
(e.g., one for legacy Python 2 archaeology, one for modern tooling), but don’t create ten distros because you can’t decide between shells.

WSL version and distro sanity

Before installing any language toolchains, confirm you’re on WSL 2 and that your distro is healthy. If your base is shaky, every install step turns into folklore.

Boundary #1: stop auto-inheriting Windows PATH unless you mean it

The sneakiest breakage happens when WSL “helpfully” includes Windows paths. Suddenly your python inside WSL points at a Windows executable,
which then tries to read Linux paths and fails in creative ways.

Disable that by editing /etc/wsl.conf inside the distro and then restarting WSL.
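The relevant fragment of /etc/wsl.conf (these are the documented section and key names; apply with wsl --shutdown from Windows and start a fresh session):

```ini
# /etc/wsl.conf — stop inheriting Windows PATH into WSL shells
[interop]
enabled = true            # keep the ability to call Windows exes explicitly
appendWindowsPath = false # but don't let them shadow Linux binaries
```

You can still run wsl.exe-adjacent tools by full path when you need them; you just stop letting python.exe win PATH lotteries.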

Boundary #2: treat the WSL filesystem as your dev disk

Put repos under ~ (or another Linux path), not under /mnt/c/Users/.... The latter will burn you on performance and tooling edge cases.
If you must operate on Windows files (compliance, shared drives), isolate that workflow and accept it will be slower.

Filesystem placement: where your code lives decides your speed

Your toolchains aren’t slow; your storage path is slow. Node and Python are especially sensitive because they create and read thousands of small files.
If those live under /mnt/c, every filesystem operation takes the long way through interop.

The rule is simple:

  • Code + dependencies inside WSL’s ext4 filesystem (typically under /home) for speed.
  • Windows-mounted paths for occasional file exchange, not for builds.

When someone says “WSL is slow,” I ask one question: “Where is the repo located?” Most of the time, that’s the whole mystery.

Node.js in WSL (nvm, corepack, and avoiding global chaos)

Install Node in WSL using nvm. I don’t care if apt has nodejs. I don’t care if Windows has Node already. You want fast, repeatable,
per-user versions that won’t collide with system packages or corporate tooling.

Version policy that doesn’t create pager fatigue

  • Use Active LTS for most teams.
  • Use current only when you’re validating ahead of time, not because you felt lucky.
  • Pin Node major/minor in .nvmrc at the repo root. That’s your contract.
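The contract is just a file in the repo root. A minimal sketch (the version number is an example):

```shell
# Pin the Node major for this repo; nvm reads this file.
echo "20" > .nvmrc
cat .nvmrc   # → 20
# In a fresh shell at the repo root, `nvm install` and `nvm use`
# both honor .nvmrc, so everyone lands on the same major.
```

Commit .nvmrc like you commit lockfiles; it’s cheap and it ends “which Node are you on?” threads.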

Package managers: let Corepack do its job

Modern Node setups often use Yarn or pnpm. Corepack ships with Node and can install and pin the package manager version declared by the project.
This reduces “works on my machine” because “my machine” ran Yarn 1 while CI ran Yarn 3.

Global npm installs are fine for a small set of things (linters, scaffolding) but prefer project-local binaries and npx or scripts.
Global installs become a shared junk drawer with no labels.

Python in WSL (pyenv, venv, and compiled dependencies)

Python is the runtime that will punish vague intentions. Install with pyenv when you need multiple versions (you do), and use venv
per project. Don’t install random pip packages globally and then wonder why one project’s dependencies sabotage another.

System dependencies: the part everyone forgets

Many Python packages include native extensions (cryptography, numpy, lxml). Wheels often exist, but not always for your Python version or architecture.
When pip builds from source, you need system headers and compilers. That’s why the “simple pip install” sometimes becomes a C toolchain install.
This is normal. It’s just not pleasant.

Two rules that eliminate most Python drama

  • Always use a venv and keep it inside the project or under a dedicated directory.
  • Pin dependencies with a lock approach appropriate to your org (requirements.txt with hashes, pip-tools, poetry, etc.). The mechanism matters less than the discipline.
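The first rule in practice, assuming python3 with the venv module available (the .venv name is a convention, not a requirement):

```shell
# One venv per project; nothing leaks into global site-packages.
python3 -m venv .venv
. .venv/bin/activate
python -c "import sys; print(sys.prefix)"   # prints a path ending in .venv
deactivate
```

If sys.prefix doesn’t point inside the venv after activation, stop and fix interpreter selection before installing a single package.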

Go in WSL (GOVERSION, GOPATH, and module hygiene)

Go is the least dramatic of the three, which is why people get complacent and do weird things like manually unpack tarballs into random directories.
Install Go in WSL in a predictable location and keep your environment variables boring.

Modules-first worldview

If you’re building modern Go, use modules. Put code wherever you want (under your home directory is fine), and let go env tell you what matters.
GOPATH still exists and still affects caches and older tools, but it’s no longer the place your source code must live.

Go version strategy

If your org ships multiple Go services, you will eventually need multiple Go versions. You can use a Go version manager, or you can standardize per quarter.
What you shouldn’t do is “whatever Go was installed on the laptop that day.”

Interop without pain: PATH, Git, SSH, and VS Code

The cleanest setup is: languages inside WSL, editor on Windows, and a minimal interop layer between them. VS Code’s WSL integration is popular because it
doesn’t pretend the boundary doesn’t exist—it just makes it less annoying.

Git: pick one side and be consistent

Use Git inside WSL for repos stored inside WSL. If you use Windows Git on a repo inside WSL, you’re inviting line-ending weirdness and permission surprises.
If you use WSL Git on a repo under /mnt/c, you’ll likely get performance issues and occasional metadata confusion.

SSH keys: don’t copy them around casually

Store keys inside WSL (~/.ssh) and protect them with proper permissions. If you must use Windows-managed keys, be explicit about the agent strategy.
The “it worked yesterday” class of SSH problems often comes from agent confusion across the boundary.

Joke #2: Nothing ages faster than an SSH key you pasted into a chat “just for a minute.”

Three corporate mini-stories (how teams actually break this)

Incident: the wrong assumption (PATH inheritance and the phantom Python)

A team migrated a mid-sized Node + Python monorepo to WSL to get closer to Linux production. People were happy for about a week.
Then CI started failing for one developer only, with Python tooling errors that looked like bad dependencies. Everyone did the usual dance:
delete venv, reinstall, clear caches, swear quietly.

The root cause wasn’t the repo. It was an assumption: “If I’m in WSL, python is Linux Python.” On that machine, WSL inherited Windows PATH,
and python.exe from Windows was getting called first. It didn’t always fail; it failed when scripts passed Linux paths or relied on Linux-only libraries.
The error messages were misleading because the wrong interpreter was running.

The fix was boring: disable Windows PATH injection in /etc/wsl.conf, restart WSL, and verify interpreter identity with which python
and python -c checks. After that, their setup docs explicitly included “prove your Python is Linux Python” as a required step.

The lesson: when a failure is machine-specific and inconsistent, suspect the boundary—PATH, filesystem location, proxy settings—before suspecting dependencies.
Dependencies are deterministic; your environment often isn’t.

Optimization that backfired (moving repos to /mnt/c to “simplify backups”)

Another org decided it would be “cleaner” if all source code lived under the Windows user profile so it would be included in corporate backups and endpoint DLP scans.
Their WSL distro was treated as disposable. That sounded reasonable in a meeting.

The first complaint was slow installs: npm ci taking forever. Then Go builds started dragging. Python venv creation became sluggish.
Engineers started switching back to Windows-native tooling “just for speed,” which reintroduced the exact “two environments” mess they were trying to avoid.

Security agents on Windows were scanning file operations, which multiplied the pain for the “many small files” workloads. Node’s dependency tree is essentially
a benchmark for filesystem overhead, and it failed the benchmark loudly. The team tried to optimize by excluding directories from scanning, but policy exceptions were slow,
and the exclusions didn’t cover every path.

They eventually flipped the model: repos and dependency directories live in WSL’s filesystem for performance; backup is handled by pushing to remote git and,
when needed, exporting artifacts. DLP concerns were addressed by controlling what can be copied out of WSL rather than punishing every read/write.

The lesson: optimizing for governance by moving hot build workloads onto the Windows filesystem is like putting your database on network storage “for convenience.”
It will work. It will also be slow. And then people will work around it in ways you don’t control.

Boring but correct practice that saved the day (pinning toolchains and verifying them)

A platform team supported dozens of services across Node, Python, and Go. They didn’t want bespoke snowflake laptops. They wrote a bootstrap script that:
installed base packages, configured WSL boundaries, installed nvm/pyenv, and set default versions. They also added a “verify” step that printed versions and key paths.

It wasn’t flashy. It didn’t use a fancy provisioning framework. It was a shell script and a checklist. It also forced standard outputs: everyone’s
node, python, pip, and go were checked the same way. That meant support tickets began with facts, not vibes.

When a new corporate laptop image rolled out, it quietly changed Windows PATH ordering and introduced a Windows Python shim that confused WSL users—on some machines.
The bootstrap verification caught it immediately because the output didn’t match expected patterns. Instead of a month of intermittent issues, it was a one-day fix:
update /etc/wsl.conf guidance and re-run the verification step.

The lesson: the boring practice is “pin versions and verify identity.” Not once, but every rebuild. It’s dull until it saves you, and then it’s your favorite.

Fast diagnosis playbook

When “Node/Python/Go in WSL is broken” lands in your lap, don’t start reinstalling things. You’ll destroy evidence and waste time.
Diagnose in this order to find the bottleneck fast.

First: confirm the boundary (WSL version, PATH, filesystem location)

  • Is this WSL 2?
  • Is the repo under Linux filesystem or /mnt/c?
  • Is WSL inheriting Windows PATH? Are Windows binaries shadowing Linux ones?

Second: confirm toolchain identity (which binary, which version, which install method)

  • For Node: which node, node -v, nvm current
  • For Python: which python, python -V, python -c "import sys; print(sys.executable)"
  • For Go: which go, go version, go env
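The identity checks above can be collapsed into one loop you paste into any broken shell (a sketch; extend the tool list as needed):

```shell
# Print which binary each toolchain resolves to; "MISSING" beats a lie.
for tool in node python3 go; do
  printf '%-8s %s\n' "$tool" "$(command -v "$tool" || echo MISSING)"
done
```

Any path starting with /mnt/c or ending in .exe means you’re running a Windows binary from inside WSL, and no amount of reinstalling Linux packages will fix that.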

Third: confirm performance constraints (I/O, CPU, memory, antivirus interaction)

  • If installs/builds are slow: check if you’re operating on /mnt/c.
  • If random hangs: check free disk space and memory pressure inside WSL.
  • If network fetches fail: check proxy settings and DNS resolution inside WSL.

Fourth: only then reinstall

Reinstalling without knowing what went wrong is how you end up with three broken installs instead of one.

Common mistakes: symptoms → root cause → fix

1) “python points to python.exe”

Symptoms: python runs but can’t import Linux modules; paths look like C:\...; venv behaves strangely.

Root cause: WSL inherited Windows PATH; Windows Python is first in precedence.

Fix: Disable PATH append in /etc/wsl.conf, restart WSL, verify with which python and file $(which python).

2) “npm install takes forever”

Symptoms: npm ci or pnpm install is dramatically slower than teammates; high CPU in Windows security processes.

Root cause: Repo or node_modules under /mnt/c; Windows filesystem + scanning overhead for tiny files.

Fix: Move repo into WSL filesystem (~/src), reinstall dependencies, keep Windows access for editing only.

3) “pyenv install fails with missing headers”

Symptoms: Build errors mentioning zlib, openssl, readline, or ffi.

Root cause: Missing build dependencies for compiling Python.

Fix: Install required packages via apt (compiler, dev headers), then retry pyenv install.

4) “go get works, but builds fail on CI”

Symptoms: Local build succeeds; CI or teammate build fails; module versions differ.

Root cause: Module versions not pinned/committed; go.mod/go.sum drift.

Fix: Run go mod tidy, commit go.sum, and set a minimum Go version in go.mod.

5) “Permission denied in repo”

Symptoms: Git can’t write; tools can’t create files; weird UID/GID mismatches.

Root cause: Working in Windows-mounted filesystem with permission metadata mismatch; sometimes bad mount options.

Fix: Move repo into WSL filesystem; if you must use /mnt/c, tune mount options and accept limitations.

6) “VS Code terminal shows one Node, tasks use another”

Symptoms: Terminal prints Node 20; tasks/build scripts behave like Node 18; inconsistent results.

Root cause: Mixed contexts: Windows VS Code launching tasks in Windows shell vs WSL; or PATH differs between login/non-login shells.

Fix: Ensure you’re in a WSL Remote window; ensure shell init loads nvm/pyenv for non-interactive shells, or configure tasks explicitly.

Practical tasks (commands + outputs + decisions)

These are real checks I run. Each has: a command, what the output means, and the decision you make from it.
Run them inside WSL unless noted.

Task 1: Confirm WSL version and distros (from Windows)

cr0x@server:~$ wsl.exe -l -v
  NAME            STATE           VERSION
* Ubuntu-22.04    Running         2
  Debian          Stopped         2

Meaning: Your primary distro is Ubuntu-22.04 and it’s WSL 2.

Decision: If VERSION is 1, convert it or install a WSL 2 distro. Don’t build modern toolchains on WSL 1 unless you enjoy edge cases.

Task 2: Check kernel and distro basics

cr0x@server:~$ uname -a
Linux cr0x-laptop 5.15.133.1-microsoft-standard-WSL2 #1 SMP Thu Oct 12 20:38:48 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

Meaning: You’re on the WSL2 kernel line; architecture is x86_64.

Decision: If you see unexpected architecture (e.g., arm64), ensure your language builds and binaries match.

Task 3: Detect Windows PATH leakage

cr0x@server:~$ echo "$PATH" | tr ':' '\n' | head -n 10
/home/cr0x/.nvm/versions/node/v20.11.1/bin
/home/cr0x/.pyenv/shims
/home/cr0x/.pyenv/bin
/usr/local/sbin
/usr/local/bin
/usr/sbin
/usr/bin
/sbin
/bin

Meaning: PATH starts with Linux user toolchains; no obvious /mnt/c/Windows entries in the first lines.

Decision: If you see many /mnt/c entries, disable PATH append in /etc/wsl.conf and restart WSL.

Task 4: Verify where Node comes from

cr0x@server:~$ which node
/home/cr0x/.nvm/versions/node/v20.11.1/bin/node

Meaning: Node is provided by nvm in your home directory.

Decision: If which node points to /usr/bin/node, you’re using apt’s Node. Decide if that’s acceptable; usually it’s not for multi-version work.

Task 5: Confirm Node and npm are coherent

cr0x@server:~$ node -v && npm -v
v20.11.1
10.2.4

Meaning: Node and npm versions are aligned for that Node release.

Decision: If npm is unexpectedly old/new, you may have PATH shadowing or a partial install.

Task 6: Check Corepack status (Yarn/pnpm control plane)

cr0x@server:~$ corepack --version
0.24.1

Meaning: Corepack exists and can manage Yarn/pnpm versions.

Decision: If missing, you’re likely on an older Node or a custom build; decide whether to upgrade Node or manage package managers manually.

Task 7: Verify Python identity and executable path

cr0x@server:~$ which python
/home/cr0x/.pyenv/shims/python

Meaning: Python is controlled by pyenv shims (good for multi-version).

Decision: If it points to /mnt/c/... or ends in .exe, you’re running Windows Python inside WSL. Fix PATH leakage.

Task 8: Confirm Python version and where it runs from

cr0x@server:~$ python -V
Python 3.12.2
cr0x@server:~$ python -c "import sys; print(sys.executable)"
/home/cr0x/.pyenv/versions/3.12.2/bin/python

Meaning: You’re running the pyenv-installed interpreter.

Decision: If sys.executable points somewhere surprising, stop and fix interpreter selection before touching dependencies.

Task 9: Create and validate a venv (proves pip isolation)

cr0x@server:~$ python -m venv .venv
cr0x@server:~$ source .venv/bin/activate
(.venv) cr0x@server:~$ which python
/home/cr0x/.venv/bin/python

Meaning: Your shell is using the venv interpreter; pip installs will go into the venv.

Decision: If which python doesn’t change after activation, your shell init is broken or you’re not activating correctly.

Task 10: Verify Go installation and environment

cr0x@server:~$ which go
/usr/local/go/bin/go
cr0x@server:~$ go version
go version go1.22.1 linux/amd64

Meaning: Go is installed in a standard location and reports the correct OS/arch.

Decision: If it reports windows/amd64, you somehow called Windows Go from WSL. That’s a boundary violation.

Task 11: Confirm Go module mode and key paths

cr0x@server:~$ go env GOPATH GOMOD GOCACHE
/home/cr0x/go
/home/cr0x/src/myservice/go.mod
/home/cr0x/.cache/go-build

Meaning: GOPATH is in your home; modules are active (GOMOD points to a go.mod); caches are in Linux space.

Decision: If GOMOD is empty inside a module repo, you’re not in the module directory or GO111MODULE behavior is odd; fix before debugging dependencies.

Task 12: Detect “repo is on /mnt/c” (performance red flag)

cr0x@server:~$ pwd
/mnt/c/Users/cr0x/source/myapp

Meaning: You’re building from the Windows filesystem mount.

Decision: If this repo contains Node or Python dependencies, move it to ~/src in WSL. Expect big speedups.

Task 13: Measure basic disk space (mysterious install failures love full disks)

cr0x@server:~$ df -h /
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdc        251G  198G   41G  83% /

Meaning: You have ~41G free in the distro filesystem.

Decision: If you’re near 100%, stop. Clean caches before blaming package managers.

Task 14: Check memory pressure (random build kills and slowdowns)

cr0x@server:~$ free -h
               total        used        free      shared  buff/cache   available
Mem:           7.8Gi       5.6Gi       410Mi       112Mi       1.8Gi       1.7Gi
Swap:          2.0Gi       1.4Gi       640Mi

Meaning: Available memory is low, swap is active.

Decision: Expect slow builds. Close memory-heavy apps, tune WSL memory limits, or reduce parallelism for installs/tests.

Task 15: Check DNS and outbound connectivity (package fetch failures)

cr0x@server:~$ getent hosts pypi.org
151.101.64.223  pypi.org
151.101.128.223 pypi.org
151.101.192.223 pypi.org
151.101.0.223   pypi.org

Meaning: DNS resolution works from inside WSL.

Decision: If resolution fails, troubleshoot WSL DNS/proxy before touching language installers.

Checklists / step-by-step plan

Plan A: clean, repeatable setup (recommended)

  1. Confirm WSL 2 and pick one distro.
    Use wsl.exe -l -v. If you see WSL 1, fix that first.
  2. Update base packages.
    Run apt update/upgrade and install build essentials. This avoids “pip tried to compile something and died” later.
  3. Set boundaries in /etc/wsl.conf.
    Disable Windows PATH injection unless you have a specific need.
  4. Create a Linux-native workspace.
    Make ~/src, clone repos there, and stop developing under /mnt/c.
  5. Install nvm, then Node LTS.
    Add .nvmrc per repo. Prefer corepack for Yarn/pnpm pinning.
  6. Install pyenv, then Python versions you need.
    Set a global default and a per-repo local version when required.
  7. Use venv per project.
    Create .venv, activate it, then install dependencies.
  8. Install Go in a stable path.
    Set PATH to include Go, verify go env.
  9. Run the verification tasks.
    Check which and versions for node/python/go, and check repo location.
  10. Automate it.
    Put these steps into a bootstrap script and require it for onboarding.
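The verification half of that bootstrap can be a few lines. A sketch (tool list and output format are examples; it reports MISSING instead of failing so the output is always comparable):

```shell
#!/usr/bin/env bash
# Verification step: print facts a support ticket can start from.
fact() { printf '%-10s %s\n' "$1" "${2:-MISSING}"; }
fact node   "$(command -v node)"
fact python "$(command -v python3)"
fact go     "$(command -v go)"
fact repo   "$PWD"
case "$PWD" in
  /mnt/*) echo "WARNING: working under a Windows mount; expect slow builds" ;;
esac
```

Require this output in onboarding docs and bug reports; machine-specific mysteries shrink fast when every report starts from the same facts.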

Plan B: when you’re forced to work under /mnt/c (acceptable but slower)

  1. Keep language runtimes in WSL anyway. Don’t install parallel Windows runtimes “to speed things up.”
  2. Keep dependency-heavy directories out of /mnt/c if possible (some tools support configuring cache and build output paths).
  3. Expect slow Node installs. Plan for it: use lockfiles, avoid repeated clean installs, and don’t benchmark WSL based on this mode.
  4. Be explicit about line endings and executable bits; Windows filesystems don’t naturally preserve Linux semantics.

FAQ

Should I install Node/Python/Go on Windows as well?

Generally no. Install them in WSL. Installing them on Windows too creates ambiguity: editors, terminals, and scripts may pick different runtimes.
If you need Windows-native builds for a specific product, isolate that workflow and document it.

Is WSL 2 always better than WSL 1 for dev?

For most modern dev stacks, yes. WSL 2 is closer to real Linux behavior and usually better for containers and tooling compatibility.
WSL 1 can be useful for certain filesystem access patterns, but it’s not the default choice for clean toolchains.

Why is Node so much slower in some setups?

Because Node workloads stress filesystems: many small files, frequent metadata operations. If your repo is on /mnt/c,
you pay interop overhead plus whatever Windows security tooling is doing.

Can I use apt to install Node and Python instead of version managers?

You can, but you’ll trade simplicity today for pain later when versions diverge across projects. apt is fine when you need one stable version
and the distro provides what you need. Most teams need multiple versions. Use nvm and pyenv.

What about Conda for Python?

Conda can work well, especially for scientific stacks. In corporate environments, it can also introduce its own dependency universe and disk footprint.
If your org already standardizes on Conda, use it. If not, pyenv + venv is usually simpler and more Linux-native.

Where should I store my Git repositories?

Inside WSL’s Linux filesystem (e.g., ~/src) for performance and correct Linux semantics. Use Windows paths for occasional file sharing,
not for dependency-heavy builds.

How do I keep my environment reproducible across a team?

Pin tool versions in-repo (.nvmrc, Python version files, go.mod), commit lockfiles, and provide a bootstrap script that installs
the version managers and required packages. Add a verification step that prints paths and versions.

Do I need Docker if I use WSL?

Not always. WSL gives you a Linux userland that’s good for dev. Docker adds runtime isolation and production parity for services.
Many teams use both: WSL for the developer shell, Docker for services and dependencies.

What’s the safest way to handle SSH keys with WSL?

Keep keys in WSL under ~/.ssh, use strict permissions, and use an SSH agent inside WSL when possible.
Avoid copying keys between Windows and WSL casually. If corporate policy requires a Windows agent, document the integration explicitly.

Why do I see different behavior between terminal sessions?

Shell initialization differs between interactive and non-interactive shells, login vs non-login shells, and between terminal apps.
If nvm/pyenv initialization lives only in one file, your tools may vanish in another context. Standardize your shell init.

Next steps that keep it clean

If you want a WSL dev environment that doesn’t rot:

  1. Move active repos into ~/src inside WSL and re-run installs. This alone fixes a shocking number of “WSL is slow” complaints.
  2. Disable Windows PATH inheritance in WSL unless you’ve justified it with a real requirement.
  3. Adopt version managers: nvm for Node, pyenv for Python, and a consistent Go installation strategy. Pin versions per repo.
  4. Write a bootstrap script plus a verification script. Make “print versions and paths” the first step of debugging, not the last.
  5. When something breaks, follow the diagnosis order: boundary → identity → performance → reinstall. Don’t guess.

The goal isn’t to worship cleanliness. It’s to make your laptop behave like a small, predictable production system: controlled inputs, clear boundaries,
and failures that are explainable instead of mystical.

]]>
https://cr0x.net/en/install-node-python-go-wsl/feed/ 0
Pick a WSL Distro That Won’t Annoy You (Ubuntu vs Debian vs Others) https://cr0x.net/en/pick-wsl-distro-ubuntu-debian-others/ https://cr0x.net/en/pick-wsl-distro-ubuntu-debian-others/#respond Tue, 24 Feb 2026 06:59:58 +0000 https://cr0x.net/?p=34905 WSL is the closest thing Windows has to a “just run Linux” button. And like every button in ops, it works great until you pick the wrong default and spend a Tuesday debugging something that was never your actual job.

The distro choice is one of those decisions you’ll forget you made—right up until it starts making decisions for you: which libc you’re on, how old your OpenSSL is, whether packages exist, whether your coworkers can reproduce your bug, and how often an upgrade turns your day into a minor incident.

What you’re really choosing when you “pick a distro”

On bare metal, distro selection can feel ideological. On WSL, it’s more like choosing which set of defaults you want to live under while Windows quietly holds the steering wheel.

You’re picking:

  • Release cadence and upgrade blast radius. How often you’ll be forced to deal with toolchain churn versus how quickly you get security fixes and new compiler/runtime features.
  • Package ecosystem friction. Whether your daily drivers (Python, Node, Go, Rust, OpenJDK, PostgreSQL clients, Azure/AWS CLIs) are first-class citizens or constant exceptions.
  • Community “works on my machine” compatibility. If everyone in your org uses Ubuntu, being the lone Fedora-on-WSL person is like writing your own on-call schedule.
  • Default security posture. AppArmor vs SELinux expectations, how sudo is configured, how SSH keys are handled, and how services start.
  • How painful it is to debug. The volume of answers on internal wikis and the internet matters. You don’t get points for originality at 2 a.m.

One reliability quote you should tattoo on your sprint board

Paraphrased idea (John Ousterhout): “Complexity is the root cause of most software problems.”

WSL is already a layered cake. Your distro choice should reduce complexity, not add a new personality disorder.

Interesting facts and historical context (the bits that shape today’s defaults)

  1. WSL 1 (2016) wasn’t a VM. It translated Linux syscalls into Windows NT calls; great for some tools, awkward for others.
  2. WSL 2 (2019) switched to a real Linux kernel in a lightweight VM. That’s why Docker and real kernel features became sane.
  3. Ubuntu became the de facto WSL default early. Microsoft partnered closely with Canonical; inertia is a powerful devops tool.
  4. Systemd support in WSL arrived much later than users wanted. For years, distros had to fake service management or you had to run daemons manually.
  5. WSL’s filesystem split is foundational. Linux-side files (ext4 in the VM) behave differently than Windows-mounted files under /mnt/c.
  6. Debian’s “stable” branch aims for predictability over novelty. That ideology maps well to corporate reproducibility.
  7. Fedora is upstream-adjacent for a lot of Linux tech. It often gets new tooling earlier—great for experimentation, spicy for change control.
  8. Alpine’s musl libc is a real compatibility boundary. It’s small and fast, but not everything expects musl in dev environments.
  9. Kali is designed for security testing workflows. Using it as a daily dev box is like bringing a chainsaw to slice bread: possible, but everyone gets nervous.

WSL realities that change the distro math

Before you compare Ubuntu vs Debian, understand the ground rules. WSL is not “Linux on Windows” so much as “Linux beside Windows, sharing a few organs.” Those shared organs are where most annoyance comes from.

The two filesystems problem (and why it dominates performance)

Inside WSL2, your Linux root filesystem lives on a virtual ext4 disk (a VHDX). That path is fast for Linux syscalls. Windows files mounted at /mnt/c are convenient, but crossing that boundary costs you. A lot.

If you do heavy I/O (git status in huge repos, node_modules installs, Python virtualenv creation) on /mnt/c, you will blame your distro. Wrong culprit. Your bottleneck is the filesystem boundary.

Networking isn’t “just Linux networking”

WSL2 uses a virtualized network stack. It mostly works. When it doesn’t, the failure mode looks like “Linux DNS is broken” or “my proxy hates me.” The fixes often involve Windows-side settings and WSL restarting, not changing distros.

Kernel is Microsoft-managed (mostly)

Unlike a traditional distro install, your kernel updates are tied to WSL and Windows updates, not your distro’s kernel packages. That reduces one dimension of distro difference—but increases the importance of userland compatibility and tooling.

WSLg and GUI apps are a separate axis

Running Linux GUI apps (WSLg) works across distros, but the out-of-box experience varies: packages, fonts, GPU libraries, and desktop-ish dependencies can be smoother on Ubuntu than on minimal distros.

Short joke #1: Choosing a distro for WSL based on wallpaper aesthetics is bold. It’s like selecting a database because the logo is cute.

Quick picks (opinionated): what to install for common jobs

If you want the least drama: Ubuntu LTS

Pick Ubuntu LTS if you want maximum compatibility with tutorials, corporate images, third-party repos, and teammates. It’s the default because it works, not because it’s cool.

Choose: Ubuntu 22.04 LTS or Ubuntu 24.04 LTS (depending on what your org supports).

If you want minimal and predictable: Debian stable

Pick Debian stable if you want fewer “surprises” and you don’t need the newest compilers by default. It’s slower-moving, which is a feature when you’re trying to keep dev and CI aligned.

If you live in containers and want latest tooling: Fedora (carefully)

Fedora is great when you’re testing modern toolchains, kernel-adjacent features (even though WSL kernel is fixed), or you like newer versions of languages. But Fedora upgrades are not shy. If you hate churn, Fedora will find you.

If you think Alpine will be “small and fast”: reconsider

Alpine’s minimalism is real, but musl-based environments can create compatibility potholes in dev workflows. Alpine shines inside containers. As your primary WSL distro, it’s often a tax you didn’t budget.

If you’re doing security training: Kali (for that purpose only)

Kali is not “better Linux.” It’s a curated toolbox. Use it as a separate distro you spin up when needed, not as your daily driver.

Ubuntu on WSL: the default for a reason

Ubuntu on WSL is the path of least resistance, which in production engineering is a compliment. Most vendor install instructions assume Ubuntu. Many internal runbooks do too. And if you ever need to ask a coworker for help, “I’m on Ubuntu” lowers the cognitive load.

What Ubuntu gets right on WSL

  • LTS cadence fits corporate life. You can stay stable for years while still getting security updates.
  • Toolchain availability. PPAs, vendor repos, and packages tend to exist and be tested.
  • Better default ergonomics. Reasonable base packages, predictable behavior, and lots of “this tutorial just works.”
  • WSL mindshare. If there’s a WSL-specific workaround, someone has probably written it for Ubuntu first.

What Ubuntu does that annoys some people

  • Snap. On classic Linux machines, Snap is a religious war. On WSL, it’s mostly a practical question: do you need snap-packaged apps, and does snapd behave well under WSL + systemd? Sometimes yes, sometimes it’s friction.
  • More “stuff” by default. If you want a lean environment, Ubuntu may feel heavy compared to Debian minimal installs.
  • Non-LTS releases are churny. If you pick a non-LTS release for “newer packages,” expect more frequent upgrades and occasional regression whack-a-mole.

When I recommend Ubuntu without debate

Teams. Shared dev environments. Corporate laptops. CI parity with Ubuntu runners. New hires. People who want to work, not curate.

Debian on WSL: boring, lean, and usually correct

Debian stable is the friend who shows up on time, doesn’t talk about cryptocurrency, and leaves the kitchen cleaner than they found it. For WSL, that’s a strong pitch.

What Debian gets right on WSL

  • Predictable upgrades. Stable is stable. You get security fixes, but the base doesn’t change under your feet.
  • Minimalism without being weird. You can keep your environment small and still be compatible with most Linux expectations.
  • Excellent packaging discipline. Debian’s packaging norms tend to reduce “mystery behavior.”

Debian’s real tradeoffs

  • Older defaults. That can mean older Python, Node, GCC/Clang, OpenSSL, etc. You can use backports or language-specific installers, but now you’re doing work.
  • Some vendor scripts assume Ubuntu. They might check lsb_release and refuse to run, or they’ll reference Ubuntu-specific packages.

When Debian is the best call

When you care about reproducibility, when you’re supporting long-lived internal tooling, when you want fewer surprises, and when you’re okay installing newer language runtimes explicitly.

Others (Fedora, openSUSE, Alpine, Kali): when they’re great and when they’re a trap

Fedora: modern, fast-moving, sometimes too honest

Fedora is terrific if you want current compilers, newer language runtimes, and a distro culture that ships modern tech quickly. In WSL, that can be a productivity boost—until a major upgrade lands and your tooling decides to reenact a dependency graph collapse.

Fedora on WSL is a good choice for advanced users who are comfortable treating their dev environment as cattle, not a pet: export, nuke, re-import, move on.

openSUSE (Leap vs Tumbleweed): the underrated option

openSUSE tends to be solid, especially if your org runs SUSE in production. Leap is the stable line; Tumbleweed is rolling. On WSL, rolling releases can be fun until they’re your problem.

Alpine: great in containers, not always great as a workstation

Alpine’s musl libc and busybox-centric userland are excellent for minimal container images. For WSL as a general dev distro, you’ll hit edge cases: prebuilt binaries that assume glibc, build scripts that assume GNU coreutils behavior, and colleagues who can’t reproduce your environment without also drinking the Alpine kool-aid.

Kali: a specialty toolset, not a lifestyle

Kali is excellent for its intended job. Install it as an additional WSL distro for security work. Keep your daily dev on Ubuntu or Debian.

Short joke #2: Running a rolling release on your work laptop is exciting. So is juggling knives—both are better as hobbies than as job requirements.

Systemd and services: the “do I need this?” section

Modern Linux distros assume systemd. Many dev workflows assume services: Docker daemon (if you use Docker-in-WSL), PostgreSQL, Redis, ssh-agent, cron-like tasks. Historically, WSL made that awkward. These days, systemd can be enabled, but you should still decide intentionally.

Enable systemd if:

  • You need services to start reliably on WSL launch.
  • You’re using service units, timers, or journalctl for debugging.
  • You want parity with Linux servers where systemd is the norm.

Skip systemd if:

  • Your WSL distro is mainly for CLI tooling and builds.
  • You run services in containers (Docker Desktop integration) or on remote hosts.
  • You want the smallest surface area for “why is my WSL boot slow?” tickets.
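If you do decide to enable it, the switch is the documented boot setting in /etc/wsl.conf. A minimal sketch, assuming a WSL build recent enough to support systemd:

```shell
# Enable systemd as PID 1 for this distro (documented wsl.conf [boot] key).
# Takes effect only after a full restart from Windows: wsl.exe --shutdown
sudo tee -a /etc/wsl.conf >/dev/null <<'EOF'
[boot]
systemd=true
EOF
# Verify after restart: ps -p 1 -o comm=   should print "systemd"
```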

Filesystem and storage performance: where WSL actually bites

I’ll say it bluntly: most WSL “distro performance” complaints are storage layout mistakes.

Golden rule

Keep code you build and test inside the Linux filesystem (somewhere under your WSL home directory), not on /mnt/c. Use Windows mounts for file sharing, light editing, and convenience—not for heavy churn.

Why it matters (practical)

  • Git operations touch tons of small files. Crossing filesystem boundaries makes that slower.
  • Node/npm/pnpm create and scan huge directory trees.
  • Python virtualenv/pip is small-file I/O heavy.
  • Language servers index everything; they are storage latency amplifiers.

Where to put what

  • Put repos here: ~/src inside WSL.
  • Put caches here: default Linux cache dirs are fine; don’t redirect them to Windows.
  • Share with Windows via: \\wsl$ path from Windows Explorer (works well for editing with Windows tools).

Practical tasks: commands that answer real questions

These are the checks I actually run when someone says “WSL is slow” or “this distro is weird.” Each task includes: command, what the output means, and what decision you make.

Task 1: Confirm WSL version (1 vs 2) and distro list

cr0x@server:~$ wsl.exe -l -v
  NAME            STATE           VERSION
* Ubuntu-24.04    Running         2
  Debian          Stopped         2

Meaning: VERSION 2 means a real Linux kernel VM. VERSION 1 means syscall translation (different perf and compatibility).

Decision: If you’re on WSL1 and using Docker, modern filesystems, or expecting kernel features, migrate to WSL2. If your main pain is /mnt/c performance, WSL2 usually helps—but you still need correct file placement.

Task 2: Check distro release and support window

cr0x@server:~$ cat /etc/os-release
PRETTY_NAME="Ubuntu 24.04 LTS"
NAME="Ubuntu"
VERSION_ID="24.04"
VERSION="24.04 LTS (Noble Numbat)"
ID=ubuntu

Meaning: You’re on a specific release; LTS implies longer support, fewer disruptive changes.

Decision: For corporate/dev team consistency, prefer LTS (Ubuntu) or stable (Debian). If you’re on an end-of-life release, plan an export/import upgrade rather than “just keep patching.”

Task 3: Check whether systemd is enabled (and whether it’s a mess)

cr0x@server:~$ ps -p 1 -o comm=
systemd

Meaning: PID 1 is systemd. If it prints something else (or errors), systemd isn’t your init.

Decision: If you rely on services, enable systemd. If not, disable it to reduce moving parts—especially on older corporate builds where WSL updates lag.

Task 4: Check if you’re doing work on Windows-mounted storage

cr0x@server:~$ pwd
/mnt/c/Users/cr0x/source/big-repo

Meaning: You’re in the Windows filesystem.

Decision: If this is a build/test repo, move it to ~/src and access it from Windows via \\wsl$. Expect big wins for node/python/git heavy workflows.

Task 5: Measure filesystem boundary pain with a simple file storm

cr0x@server:~$ cd ~ && mkdir -p /tmp/io-test && cd /tmp/io-test
cr0x@server:~$ time bash -c 'for i in $(seq 1 20000); do echo x > f.$i; done'
real    0m1.8s
user    0m0.5s
sys     0m1.2s

Meaning: This is on Linux filesystem (fast-ish). Repeat under /mnt/c and compare.

Decision: If the same test is dramatically slower on /mnt/c, stop blaming Ubuntu vs Debian. Fix file placement, then revisit.
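To make the comparison mechanical, here is a small sketch that times the same storm in any directory you point it at. The function name, paths, and file count are illustrative:

```shell
#!/usr/bin/env bash
# storm <dir> <count>: create <count> tiny files in <dir>, report elapsed time.
storm() {
  local dir=$1 count=$2 i start
  mkdir -p "$dir"
  start=$SECONDS
  for ((i = 0; i < count; i++)); do echo x > "$dir/f.$i"; done
  echo "$dir: $((SECONDS - start))s for $count files"
  rm -rf "$dir"   # clean up the test directory
}
storm "$HOME/io-test" 20000          # Linux filesystem side
if [ -d /mnt/c ]; then
  storm /mnt/c/Temp/io-test 20000    # Windows mount; only exists inside WSL
fi
```

Run it once, read the two numbers, and the argument about which distro is "slow" usually ends.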

Task 6: Check disk usage inside the WSL VHDX world

cr0x@server:~$ df -h /
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb       100G   41G   55G  43% /

Meaning: Your Linux filesystem has headroom (or not).

Decision: If you’re above ~85% used, expect weirdness: package installs failing, builds failing, performance degrading. Clean caches, prune containers, or expand/move the distro.

Task 7: Identify your worst space hogs quickly

cr0x@server:~$ sudo du -xhd1 /home/cr0x | sort -h
2.1G    /home/cr0x/.cache
6.8G    /home/cr0x/.local
14G     /home/cr0x/src
23G     /home/cr0x

Meaning: -x stays on one filesystem; you’re seeing real space costs inside the distro.

Decision: If caches are huge, prune them (pip/npm). If repos are huge, consider shallow clones for giant vendor repos or use sparse checkout.
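A hedged cleanup sketch for the cache half of that decision. The commands are the standard ones for npm, pip, Docker, and apt; DRY_RUN=1 previews instead of deleting:

```shell
#!/usr/bin/env bash
# prune-caches: reclaim space inside the distro. DRY_RUN=1 prints, doesn't run.
prune_caches() {
  local run
  if [ "${DRY_RUN:-0}" = "1" ]; then run="echo WOULD RUN:"; else run="eval"; fi
  command -v npm     >/dev/null 2>&1 && $run "npm cache clean --force"
  command -v pip     >/dev/null 2>&1 && $run "pip cache purge"
  command -v docker  >/dev/null 2>&1 && $run "docker system prune -f"
  command -v apt-get >/dev/null 2>&1 && $run "sudo apt-get clean"
  return 0
}
DRY_RUN=1 prune_caches   # preview first; rerun without DRY_RUN to execute
```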

Task 8: Check memory pressure and swap behavior

cr0x@server:~$ free -h
               total        used        free      shared  buff/cache   available
Mem:           7.7Gi       6.1Gi       0.6Gi       0.2Gi       1.0Gi       1.1Gi
Swap:          2.0Gi       1.4Gi       0.6Gi

Meaning: You’re swapping. That can make “WSL is slow” feel like a personal attack.

Decision: Reduce parallel builds, increase WSL memory limits, or stop running heavyweight services locally. Distro choice won’t fix memory starvation.

Task 9: Confirm DNS behavior inside WSL (common corporate failure)

cr0x@server:~$ resolvectl status
Global
       Protocols: -LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
resolv.conf mode: stub
Current DNS Server: 10.10.0.53
       DNS Servers: 10.10.0.53 10.10.0.54

Meaning: DNS servers are set; if they’re wrong or missing, name resolution will be flaky.

Decision: If DNS points to unreachable servers or a captive VPN interface, fix Windows DNS/VPN split tunnel policy, or override WSL resolv.conf generation carefully.
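The "override carefully" path uses the documented generateResolvConf key. The resolver addresses below are the placeholders from the example output above; use your runbook's values, not these:

```shell
# Stop WSL from regenerating /etc/resolv.conf (documented wsl.conf [network] key).
sudo tee -a /etc/wsl.conf >/dev/null <<'EOF'
[network]
generateResolvConf = false
EOF
# /etc/resolv.conf may be a symlink to the generated stub; replace it outright.
sudo rm -f /etc/resolv.conf
printf 'nameserver 10.10.0.53\nnameserver 10.10.0.54\n' | sudo tee /etc/resolv.conf >/dev/null
# Restart from Windows afterwards: wsl.exe --shutdown
```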

Task 10: Validate proxy environment (because enterprises love proxies)

cr0x@server:~$ env | grep -i proxy
HTTP_PROXY=http://proxy.corp:8080
HTTPS_PROXY=http://proxy.corp:8080
NO_PROXY=localhost,127.0.0.1,.corp

Meaning: Your shell has proxy vars set. Tools like apt, curl, npm may or may not honor them consistently.

Decision: If downloads fail, standardize proxy config at one layer (shell + apt config + git config) and keep NO_PROXY sane for internal services.
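One way to standardize all three layers at once. The proxy host and NO_PROXY list come from the example output above and are placeholders; the apt and git config mechanisms are the standard ones:

```shell
#!/usr/bin/env bash
# Standardize proxy config across shell, apt, and git so tools stop disagreeing.
PROXY="http://proxy.corp:8080"
NOPROXY="localhost,127.0.0.1,.corp"

# Shell layer (persist these lines in your shell init, not just this session):
export HTTP_PROXY="$PROXY" HTTPS_PROXY="$PROXY" NO_PROXY="$NOPROXY"

# apt layer: apt under sudo does not reliably inherit your shell environment.
printf 'Acquire::http::Proxy "%s";\nAcquire::https::Proxy "%s";\n' "$PROXY" "$PROXY" \
  | sudo tee /etc/apt/apt.conf.d/95proxy >/dev/null

# git layer:
git config --global http.proxy "$PROXY"
```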

Task 11: Check package manager health and lock contention

cr0x@server:~$ sudo apt update
Hit:1 http://archive.ubuntu.com/ubuntu noble InRelease
Get:2 http://security.ubuntu.com/ubuntu noble-security InRelease [110 kB]
Fetched 110 kB in 1s (135 kB/s)
Reading package lists... Done
Building dependency tree... Done
All packages are up to date.

Meaning: apt works and repos are reachable.

Decision: If you see lock errors, a background apt job is running (or stuck). Fix that before you “debug WSL.” If repos are blocked, you have proxy/DNS/cert problems—not a distro problem.

Task 12: Check glibc vs musl (Alpine compatibility tripwire)

cr0x@server:~$ ldd --version
ldd (Ubuntu GLIBC 2.39-0ubuntu8) 2.39
Copyright (C) 2024 Free Software Foundation, Inc.

Meaning: You’re on glibc. On Alpine, this would be musl, and some binaries won’t run.

Decision: If your workflow requires vendor binaries (cloud CLIs, proprietary agents, some language toolchains), prefer glibc distros (Ubuntu/Debian/Fedora/openSUSE).

Task 13: Confirm where your git is and whether line endings will hurt you

cr0x@server:~$ git config --global --get core.autocrlf
input

Meaning: With core.autocrlf=input, Git converts CRLF to LF when files are staged and committed, and never rewrites line endings on checkout. That’s a common safe setting when mixing Windows and Linux tooling.

Decision: If you see weird diffs or shell scripts failing with /bin/bash^M, fix autocrlf and enforce .gitattributes. This is not Ubuntu’s fault.
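A baseline .gitattributes that makes the repo enforce its own endings. The patterns are a common starting point, not gospel; adjust per project and commit the file:

```shell
# Run in the repo root. '* text=auto' normalizes text files to LF in the index;
# the explicit rules pin the files where a wrong ending actually breaks things.
cat > .gitattributes <<'EOF'
* text=auto
*.sh text eol=lf
*.bat text eol=crlf
EOF
```

With this committed, a teammate's autocrlf misconfiguration stops being your diff noise.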

Task 14: Check Docker strategy (Desktop integration vs Docker-in-WSL)

cr0x@server:~$ docker context ls
NAME                DESCRIPTION                               DOCKER ENDPOINT               ERROR
default             Current DOCKER_HOST based configuration   unix:///var/run/docker.sock
desktop-linux *     Docker Desktop                            unix:///home/cr0x/.docker/desktop/docker.sock

Meaning: You have contexts; desktop-linux suggests Docker Desktop integration.

Decision: For most devs, Docker Desktop integration reduces daemon babysitting. If you need full Linux control or you’re on restricted machines, Docker-in-WSL can work—but plan for more service management.

Task 15: Export a distro before you “just upgrade”

cr0x@server:~$ wsl.exe --export Ubuntu-24.04 "C:\Users\cr0x\backup\ubuntu24.tar"
Export in progress, this may take a few minutes.
The operation completed successfully.

Meaning: You have a portable backup you can re-import.

Decision: Always export before major changes. It’s the difference between “oops” and “restore in 5 minutes.”

Fast diagnosis playbook (first/second/third)

When WSL feels slow or broken, you need a triage order. Here’s the one that finds the bottleneck quickly, without turning your laptop into a science fair.

First: identify the class of problem (I/O, CPU, memory, network)

  • I/O suspicion: git commands slow, installs slow, language server indexing slow, lots of small files. Check if you’re under /mnt/c.
  • CPU suspicion: compiles slow but disk seems fine. Check load and CPU limits.
  • Memory suspicion: everything intermittently slow, fan noise, swapping. Check free -h.
  • Network suspicion: apt/npm/curl fails, DNS timeouts, proxy errors. Check DNS and proxy vars.

Second: confirm the WSL substrate is healthy

  • Verify WSL2, not WSL1.
  • Check disk usage (df -h) and space hogs (du).
  • If things are “haunted,” restart WSL from Windows: wsl.exe --shutdown and reopen.

Third: only then debate distro choice

If performance pain is due to the filesystem boundary, memory pressure, or Windows VPN/proxy policy, switching Ubuntu to Debian won’t save you. It just changes the shape of the same fire.

Change distros when:

  • You need older/more stable packages (Debian stable) to match production.
  • You need easier vendor support and common tutorials (Ubuntu LTS).
  • You need bleeding-edge userland (Fedora/openSUSE Tumbleweed) and accept churn.

Common mistakes: symptoms → root cause → fix

1) “Git is unbearably slow in WSL”

Symptoms: git status takes seconds; npm install takes forever; CPU looks mostly idle.

Root cause: Repo is on /mnt/c (Windows filesystem mount).

Fix: Move repo to ~/src inside WSL. Access it from Windows via \\wsl$. Re-test.

2) “apt update fails randomly at work”

Symptoms: Timeouts, TLS errors, “Temporary failure resolving,” only on VPN or only off VPN.

Root cause: Corporate proxy/DNS split behavior; WSL DNS auto-generation conflicts with VPN adapters.

Fix: Standardize proxy env vars; verify DNS servers inside WSL; if necessary, disable auto resolv.conf and manage it intentionally (with an internal runbook, not vibes).

3) “Service won’t start / systemctl doesn’t work”

Symptoms: systemctl errors, daemons die after closing terminal.

Root cause: systemd not enabled, or you’re expecting background services without an init system.

Fix: Enable systemd if you need it; otherwise run services via Docker Desktop or explicit scripts.

4) “Docker builds are slow and weird”

Symptoms: Builds take ages; file change detection is flaky; volumes behave oddly.

Root cause: Mixing Windows filesystem, WSL filesystem, and Docker contexts; bind mounts across boundary are slow.

Fix: Keep build context inside WSL filesystem; choose one Docker strategy and stick to it (Desktop integration is usually easiest).

5) “After an upgrade, Python/Node broke”

Symptoms: Toolchain mismatch, missing libs, pip wheels failing, node-gyp drama.

Root cause: Non-LTS distro upgrades or mixing system packages with language version managers incorrectly.

Fix: Pin versions and use a version manager (pyenv/nvm/asdf) consistently, or stick to Ubuntu LTS/Debian stable and upgrade on purpose with a backup export.

6) “WSL is eating my disk space”

Symptoms: Windows drive fills up; WSL reports moderate usage; things don’t add up.

Root cause: VHDX grows and doesn’t always shrink automatically; caches and container layers accumulate.

Fix: Prune caches and containers; export/import to compact when needed; avoid storing giant artifacts in WSL if Windows-side storage policy is tight.

Checklists / step-by-step plan

Checklist A: Choose your distro in 10 minutes

  1. Are you joining an existing team? Use whatever they use unless you have a strong reason. Usually Ubuntu LTS.
  2. Need maximum tutorial/vendor compatibility? Ubuntu LTS.
  3. Need “don’t change my base” stability? Debian stable.
  4. Need newest userland and you accept churn? Fedora or openSUSE Tumbleweed.
  5. Need a security testing toolbox? Kali as an additional distro, not your primary.
  6. Think you want Alpine? Put Alpine in a container. Use Ubuntu/Debian as the host distro unless you love debugging libc issues.

Checklist B: Set up WSL so it stays fast

  1. Install WSL2 and verify with wsl.exe -l -v.
  2. Create ~/src and clone repos there.
  3. Access files from Windows via \\wsl$ instead of working under /mnt/c.
  4. Decide on systemd: enable only if you need services.
  5. Pick one Docker approach and standardize (usually Docker Desktop integration).
  6. Export the distro before major upgrades.

Checklist C: Upgrade without drama

  1. Export: wsl.exe --export <Distro> C:\path\backup.tar.
  2. Document your must-have packages: compilers, language runtimes, key CLIs.
  3. If your environment is fragile, consider rebuilding from scratch and restoring only dotfiles and SSH keys.
  4. Re-import into a new distro name if you want a clean rollback path.
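Step 4 in command form, as a hedged sketch: the distro names and Windows paths are examples, and these run from a Windows shell (quote the paths if you invoke wsl.exe from inside bash):

```shell
# Re-import the backup under a NEW name so the old registration stays untouched.
# --version 2 keeps the imported distro on WSL2.
wsl.exe --import Ubuntu-24.04-rebuild "C:\WSL\ubuntu24-rebuild" "C:\Users\cr0x\backup\ubuntu24.tar" --version 2
wsl.exe -d Ubuntu-24.04-rebuild      # test-drive the rebuild
# Only after it proves out: wsl.exe --unregister Ubuntu-24.04
```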

Three corporate mini-stories from the trenches

Mini-story #1: The incident caused by a wrong assumption (filesystem boundary edition)

A product team had a WSL-based dev environment and a large monorepo. New developers were told, casually, to “just clone the repo in your Windows home directory and use WSL for builds.” It sounded reasonable: Windows tools can see the files, Linux tools can build them. Convenience wins, right?

Within a month, the team started seeing intermittent test failures and “random” timeouts. CI was fine. Only laptops were melting. The main symptom was slow file scanning: the build system and language servers would crawl, then developers would kill processes, then incremental builds would get confused about what changed.

Someone finally profiled it the unglamorous way: timing file creation and git status in two places. Linux filesystem: fast enough. /mnt/c: painfully slower, and jittery. The “random timeouts” weren’t random. They were the build tool waiting on filesystem operations that were crossing the Windows/Linux boundary thousands of times per minute.

The fix was boring but immediate: move the repo to ~/src inside WSL, document “do not build on /mnt/c,” and teach Windows users to open the repo via \\wsl$. The incident ended, not with a patch to the build tool, but with a corrected assumption about where the files lived.

Mini-story #2: The optimization that backfired (rolling release bravado)

An infrastructure subgroup wanted “newer everything” on developer machines to reduce friction with modern toolchains. They standardized on a fast-moving distro for WSL because it shipped newer compilers and libraries without extra repos. For a while, it was great. Faster builds, fewer manual installs, happier engineers.

Then came a wave of upgrades. A handful of core packages changed behavior—nothing outrageous, but enough to break a couple of internal scripts that assumed older defaults. At the same time, a vendor CLI used by half the company shipped a binary that didn’t like a newer library version. The failure mode was ugly: “works on my machine” turned into “works on my machine yesterday.”

Support load went up. Not because the distro was “bad,” but because the organization didn’t have the operational discipline for frequent upgrades. People pinned packages ad hoc. Some stopped upgrading entirely. Now the fleet had three states: updated, partially updated, and fossilized.

The eventual recovery was to define two tiers: a stable default (Ubuntu LTS) and an “advanced/experimental” option (rolling). They also added a basic policy: major upgrades happen intentionally, with an export backup first. The lesson wasn’t “never use modern distros.” It was that “modern by default” is an operational commitment, not a vibe.

Mini-story #3: The boring but correct practice that saved the day (export/import discipline)

A finance-adjacent team had a WSL distro that accumulated years of tooling: Python envs, local databases for integration tests, custom binaries, you name it. One engineer’s laptop started behaving oddly after a Windows update: WSL would launch, then hang. Reboots didn’t help. The immediate fear was data loss and a multi-day rebuild.

But the team had an unglamorous habit: before major changes, they exported their WSL distros to a standard location. Not daily, not perfect, but often enough that it mattered. The engineer had an export from the prior week.

The recovery playbook was straightforward: wsl.exe --shutdown, uninstall the broken distro registration, and re-import from the tarball under a new name. They were back in business quickly, and they could diff the old and new environment without panic.

That “boring practice” didn’t just save time; it saved decision quality. Without it, people tend to make desperate changes and create a bigger mess. With a known rollback, the team could fix the root cause calmly.

FAQ

1) Should I pick Ubuntu or Debian for WSL if I’m new to Linux?

Ubuntu LTS. You’ll get more “it just works,” more tutorials that match your system, and fewer packaging surprises.

2) Is Debian “more stable” than Ubuntu LTS?

In terms of change rate to base packages, Debian stable tends to be more conservative. Ubuntu LTS is also stable, but with different defaults and sometimes newer stacks via backports/HWE approaches. For most WSL users, both are stable enough; choose based on ecosystem compatibility.

3) Does distro choice fix slow performance on WSL?

Sometimes, but not usually. The biggest performance lever is putting your workload on the Linux filesystem, not /mnt/c. After that: memory pressure, CPU limits, and Docker strategy.

4) Should I enable systemd in WSL?

Enable it if you need services to behave like Linux servers: systemctl, timers, journald logs. If you just need shells and compilers, skip it and keep the environment simpler.

5) Can I run Docker in WSL without Docker Desktop?

Yes, but you’ll be managing a daemon and its storage, plus service lifecycle. Most developers should use Docker Desktop integration unless policy prevents it.

6) What about openSUSE or Fedora if my production servers use them?

That’s a legitimate reason to match production userland. Just be honest about upgrade cadence: Fedora is faster-moving; openSUSE has both stable and rolling options.

7) Is Alpine a good WSL distro for dev work?

Alpine is great for containers. For general dev on WSL, musl-related compatibility issues and differing userland behavior can cost time. Use it when you specifically need Alpine parity.

8) How do I keep multiple WSL distros without chaos?

Name them by purpose (e.g., Ubuntu-Work, Debian-CI-Parity, Kali-Lab). Keep your “default” boring. Export before upgrades. Don’t share the same repos across distros via /mnt/c if performance matters.

9) Can I access my WSL files from Windows safely?

Yes—use \\wsl$ from Windows. Avoid poking directly into the underlying VHDX or distro filesystem from Windows paths not meant for it.

10) If I already picked wrong, how painful is it to switch?

Not that painful if you treat the distro as replaceable. Export if needed, re-install a new distro, and re-provision via scripts. The pain comes from snowflake environments; fix that once and future you will stop sending angry emails to present you.

Conclusion: practical next steps

If you want a WSL distro that won’t annoy you, optimize for predictability and shared reality, not personal taste.

  1. Default choice: install Ubuntu LTS unless you have a specific reason not to.
  2. Stability-first choice: use Debian stable when you want minimal churn and clean defaults.
  3. Performance choice: keep repos in the Linux filesystem; stop building on /mnt/c.
  4. Operational hygiene: export your distro before upgrades, and keep a rebuild script for core tooling.
  5. Triage like an SRE: check filesystem location, memory pressure, and network/proxy before blaming the distro.

Pick the boring option, set it up correctly, and spend your time shipping instead of learning new and exciting ways for a package manager to disappoint you.

]]>
WSL: The Fastest Way to Get a Real Dev Environment on Windows (No VM Drama)
https://cr0x.net/en/wsl-real-dev-environment-windows/
Sat, 21 Feb 2026 04:03:43 +0000

If you’ve ever tried to do “serious” development on Windows, you know the pattern: you start with good intentions, then spend a day installing toolchains, chasing DLL errors, and discovering your build scripts assume bash exists. You can run a full VM, sure—until the fan noise becomes your primary interface and your laptop turns into a portable space heater.

WSL2 is the least dramatic way to get a real Linux environment on a Windows machine, with near-native performance, first-class tooling, and enough escape hatches to keep SREs from developing a nervous tic. But it’s also easy to use wrong. This is the field guide for using it right.

What WSL actually is (and what it isn’t)

WSL is not “Linux apps running in a Windows compatibility layer” (that was the vibe of WSL1). WSL2 is a real Linux kernel running inside a lightweight VM managed by Windows. Your Linux environment is real enough that you can run standard packages, real system calls, real networking stacks, and modern container workflows.

It’s also not a full server VM you should treat like a pet. WSL instances are disposable-ish. You want repeatable configuration, versioned dotfiles, and backups you can actually restore. If your workflow depends on “I configured it once and now I’m afraid to touch it,” you’ve rebuilt the VM drama you came here to avoid.

WSL2 gives you:

  • A Linux filesystem stored in a VHDX (virtual disk) on Windows.
  • Fast Linux-side filesystem operations.
  • Interop: calling Windows binaries from Linux and vice versa.
  • Reasonable integration with editors (especially VS Code Remote).
  • Enough tunables for CPU/memory/swap to stop runaway builds from eating your lunch.

WSL2 also gives you new ways to fail. Mostly around filesystems, networking assumptions, and resource tuning. Conveniently, those are the same areas that bite production systems, so you can treat your workstation like a tiny datacenter and behave accordingly.

Interesting facts and short history you can weaponize

  • WSL1 and WSL2 are fundamentally different beasts. WSL1 translated Linux syscalls to Windows; WSL2 runs a real Linux kernel in a managed VM.
  • WSL2 stores your distro in a VHDX file. That single file is both a blessing (easy backup/export) and a curse (you can fill it with junk fast).
  • Early WSL1 didn’t support key kernel features that container tooling assumes. WSL2 changed the game for Docker and Kubernetes development workflows.
  • Microsoft made systemd support official in WSL after years of “please don’t do init systems here” cultural friction. It’s now a standard option, not a hack.
  • File performance depends on where you keep your code. Linux filesystem operations are fast on the ext4 inside WSL; crossing into /mnt/c is slower and has different semantics.
  • WSLg is a real thing. Linux GUI apps can run integrated on Windows 11 without a DIY X server circus.
  • WSL networking uses NAT and virtual switches. That makes localhost mostly friendly, but “my service is bound to the wrong interface” becomes a frequent workplace mystery.
  • Windows and Linux disagree about case sensitivity and filename rules. If you mix filesystems casually, Git will eventually make it personal.
  • WSL got a Store-delivered versioning model so updates can ship faster than OS releases. This matters when you’re debugging “it works on my Windows build” differences.

Why WSL2 is fast (and when it’s not)

WSL2 feels fast because your Linux tools aren’t emulated; they’re running against a real kernel. Process startup is snappy, the package manager behaves, and typical dev tasks—compiling, running tests, running a local database—are solid.

The fastest thing you can do for WSL performance is boring: keep your repositories in the Linux filesystem (~ or anywhere under /home) and treat Windows mounts (/mnt/c) as “interop-only.” If you ignore this, you’ll spend the rest of your week “optimizing” the wrong thing.

Where it can get slow or weird:

  • Cross-filesystem I/O. Node/npm, Rust, Go, Python virtualenvs, and Git generate lots of small files. Doing that on /mnt/c is a tax you pay per syscall.
  • Antivirus scanning. When Windows Defender scans your build artifacts, it’s like running chaos engineering against your laptop.
  • Memory pressure. WSL2 is a VM. If you let it balloon or swap aggressively, your “fast” dev environment becomes a stuttery horror show.
  • Networking assumptions. Binding to localhost is usually fine, but inbound traffic from other machines requires explicit thought.

One operational truth worth pinning to your monitor: your workstation is a multi-tenant system (Windows + WSL + maybe Docker). Resource isolation is not optional; it’s the difference between “productive” and “why is Teams using 4GB.”

A sane WSL2 setup for real work

Pick a distro like an adult

Use Ubuntu LTS unless you have a strong reason not to. It has the broadest package ecosystem and the fewest “why is this tutorial different” moments. Debian is fine if you prefer it. Alpine is great for containers, not for humans as a daily driver.

Enable systemd (if you want services that behave normally)

If you’re running Docker Engine inside WSL, or you want ssh-agent, postgresql, cron, etc. to behave like they do on real Linux, enable systemd. If you don’t need it, don’t enable it—less moving parts.

Use Windows Terminal and VS Code Remote

Windows Terminal is the right shell UI. VS Code Remote to WSL is the right editing model. Editing Linux files with Windows processes works, but it’s a great way to meet line-ending issues and file watcher quirks you didn’t order.

Resource tuning: set limits before your laptop sets them for you

WSL2 can consume memory opportunistically. That’s great until it isn’t. Put boundaries in place using .wslconfig. You want enough RAM for builds and containers, but you also want Windows to remain responsive.

And now the one quote you should remember when you’re tempted to “just wing it”:

“Hope is not a strategy.” — paraphrased idea often cited in engineering and operations circles

Joke #1: WSL is like a well-run on-call rotation—quiet when you’re doing things correctly, and loud when you get creative.

12+ practical tasks: commands, outputs, decisions

These are the daily-driver checks I use when a WSL setup is being weird. Each task includes: the command, what the output means, and the decision you make from it.

Task 1: Verify you’re on WSL2 (not WSL1)

cr0x@server:~$ wsl.exe -l -v
  NAME            STATE           VERSION
* Ubuntu-22.04    Running         2

Meaning: VERSION 2 confirms you’re on the real kernel-based WSL.

Decision: If you see VERSION 1, convert it (wsl.exe --set-version <distro> 2) before you benchmark anything.

Task 2: Confirm the kernel and environment from inside Linux

cr0x@server:~$ uname -a
Linux cr0x 5.15.133.1-microsoft-standard-WSL2 #1 SMP Wed Oct 11 16:01:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

Meaning: You’re on the Microsoft WSL2 kernel build.

Decision: If you’re missing expected kernel features, update WSL from Windows and reboot WSL.

Task 3: Check disk usage inside the distro (spot silent VHDX growth)

cr0x@server:~$ df -h /
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdc        251G   76G  163G  32% /

Meaning: Your Linux root filesystem has headroom.

Decision: If usage is high, hunt large directories (next task) before you “fix” performance with random tweaks.

Task 4: Find the top space consumers quickly

cr0x@server:~$ sudo du -xhd1 /home | sort -h
4.0K    /home/lost+found
12G     /home/cr0x/.cache
38G     /home/cr0x/projects
50G     /home/cr0x

Meaning: Caches and project directories dominate.

Decision: Clear language/tool caches thoughtfully (pip, npm, cargo) and consider pruning container images if Docker is involved.

Task 5: Measure whether your repo is on Linux FS or Windows mount

cr0x@server:~$ pwd
/home/cr0x/projects/api-service

Meaning: You’re in the Linux filesystem. Good.

Decision: If you see /mnt/c/..., move the repo into /home unless you have a specific interop need.

Task 6: Check mount type and options (diagnose permission weirdness)

cr0x@server:~$ mount | grep -E '/mnt/c|/ '
C: on /mnt/c type drvfs (rw,noatime,uid=1000,gid=1000,metadata,case=off)
none on / type ext4 (rw,relatime,discard,errors=remount-ro)

Meaning: Windows drive is mounted as drvfs; Linux root is ext4.

Decision: If metadata is missing, you’ll have permission/exec-bit pain on /mnt/c. Fix via WSL mount options in /etc/wsl.conf.

Task 7: Verify CPU and memory view (and catch runaway allocation)

cr0x@server:~$ free -h
               total        used        free      shared  buff/cache   available
Mem:           15Gi       6.2Gi       1.9Gi       128Mi       6.9Gi       8.5Gi
Swap:          8.0Gi       0.0Ki       8.0Gi

Meaning: Plenty of available memory; swap not in use.

Decision: If swap is hot, fix memory limits or stop running a database, three browsers, and a container cluster simultaneously.

Task 8: Identify what’s chewing CPU (and whether it’s Windows or Linux)

cr0x@server:~$ top -b -n1 | head -n 12
top - 14:22:01 up  2:11,  0 users,  load average: 2.31, 2.10, 1.88
Tasks:  95 total,   1 running,  94 sleeping,   0 stopped,   0 zombie
%Cpu(s): 12.5 us,  2.1 sy,  0.0 ni, 84.8 id,  0.2 wa,  0.0 hi,  0.4 si,  0.0 st
MiB Mem :  15360.0 total,   2048.0 free,   6360.0 used,   6952.0 buff/cache
PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
812 cr0x      20   0 2539888 312432  40288 R  78.2   2.0   0:34.12 node

Meaning: The Linux process node is hot; CPU wait is low.

Decision: If wa (I/O wait) is high, look at filesystem placement and antivirus. If CPU is hot, it’s probably your build/test workload.

Task 9: Check DNS behavior (common source of “pip/npm is broken”)

cr0x@server:~$ resolvectl status | sed -n '1,30p'
Global
       Protocols: -LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
resolv.conf mode: stub
Current DNS Server: 172.24.112.1
       DNS Servers: 172.24.112.1

Meaning: DNS is coming from a virtual gateway.

Decision: If name resolution fails intermittently, you may need to stop auto-generating /etc/resolv.conf and set stable DNS, especially behind corporate VPN/proxy setups.

Task 10: Confirm localhost ports and listening services

cr0x@server:~$ ss -lntp | head
State  Recv-Q Send-Q Local Address:Port  Peer Address:Port Process
LISTEN 0      4096   127.0.0.1:5432     0.0.0.0:*     users:(("postgres",pid=941,fd=7))
LISTEN 0      4096   0.0.0.0:8080       0.0.0.0:*     users:(("python",pid=1202,fd=3))

Meaning: Postgres is bound to localhost; your app is bound to all interfaces.

Decision: If Windows can’t reach a WSL service, bind to 0.0.0.0 (carefully) or configure port forwarding rules; don’t guess.

Task 11: Validate systemd status (if enabled)

cr0x@server:~$ systemctl is-system-running
running

Meaning: systemd is healthy; services should behave normally.

Decision: If you see degraded, check failing units; don’t ignore it just because “it’s only a dev box.”

Task 12: Check journal logs for service failures

cr0x@server:~$ journalctl -u docker --no-pager -n 10
Feb 04 14:01:12 cr0x systemd[1]: Starting Docker Application Container Engine...
Feb 04 14:01:12 cr0x dockerd[1333]: time="2026-02-04T14:01:12.412" level=info msg="Starting up"
Feb 04 14:01:12 cr0x dockerd[1333]: time="2026-02-04T14:01:12.980" level=info msg="Loading containers: start."
Feb 04 14:01:13 cr0x systemd[1]: Started Docker Application Container Engine.

Meaning: Docker service started cleanly.

Decision: If you see storage-driver errors or permission issues, fix the underlying cause (filesystem location, cgroup settings), not by reinstalling everything.

Task 13: Confirm Git is not silently fighting case sensitivity

cr0x@server:~$ git config --get core.ignorecase
false

Meaning: Git will treat case changes as meaningful, which matches Linux behavior.

Decision: If this is true and your team uses case-sensitive paths, you’re inviting weird diffs and missing files—fix it per-repo.

Task 14: Export a distro for backup (disaster recovery for your dev life)

cr0x@server:~$ wsl.exe --export Ubuntu-22.04 D:\wsl-backups\ubuntu-22.04.tar
Exporting, this may take a few minutes...

Meaning: You created a portable backup archive.

Decision: If you cannot restore your environment quickly, you don’t have a setup—you have a snowflake.

Task 15: Import a distro to a different drive (space and performance management)

cr0x@server:~$ wsl.exe --import Ubuntu-22.04-Work D:\WSL\Ubuntu-Work D:\wsl-backups\ubuntu-22.04.tar --version 2
Importing, this may take a few minutes...

Meaning: You installed a new instance on a chosen path, useful for fast NVMe or larger disks.

Decision: If your system drive is cramped, move the distro. Don’t wait for “disk full” during a release.

Task 16: Sanity-check interop when calling Windows tools

cr0x@server:~$ powershell.exe -NoProfile -Command '$PSVersionTable.PSVersion.ToString()'
7.4.1

Meaning: You can call Windows PowerShell from Linux. Handy for automation glue.

Decision: Use interop deliberately. If your build depends on it, document it, because someone will run it on pure Linux and curse your name.

Storage and filesystem rules that prevent regret

I’m a storage person. I don’t trust “it feels fine” as a metric, and neither should you. WSL2 performance lives and dies by filesystem boundaries.

Rule 1: Keep repos inside the Linux filesystem

Do your work under /home. Put your source code there. Put your dependency caches there. Let Linux do Linux things where Linux semantics exist.

When you develop under /mnt/c, you’re asking for:

  • Slow metadata operations (lots of small file creates/renames).
  • Confusing permission bits (unless you enable metadata support).
  • File watcher weirdness (especially in Node, webpack, and Python reloaders).
  • Case sensitivity mismatches that cause “works for me” bugs.

Rule 2: Treat /mnt/c like an exchange zone

Use it for things that need to be visible to Windows apps: exporting artifacts, dropping logs for a Windows-side viewer, copying data sets, or integrating with corporate tools that only run on Windows. Don’t make it your build workspace.

Rule 3: Plan for VHDX growth and cleanup

Your WSL distro’s disk is a virtual disk image. It grows as you install packages and generate build artifacts. It doesn’t always shrink automatically when you delete files. That’s not a moral failing; it’s how thin-provisioned virtual disks work.

Operational implication: periodically prune caches and container layers. Also, if you run heavy databases locally, consider putting their data in a separate distro or separate storage strategy so your primary dev environment doesn’t become a landfill.

Rule 4: Know what “fast I/O” means for your workload

Compilers, package managers, and test runners often do lots of small random I/O and metadata operations (stat, open, close). That’s where filesystem translation layers hurt. Large sequential reads of a few files might not show it. So when you benchmark, benchmark your actual workload: npm ci, pip install, go test ./..., your real build step.

Networking: localhost, DNS, proxies, and the sharp edges

WSL2 networking is good enough that you forget it exists—until you join a corporate VPN, run a local proxy, or bind to the wrong interface and wonder why nothing can talk to anything.

Localhost: the common happy path

Most dev workflows are “I run a server in WSL and hit it from a browser on Windows.” That usually works because Windows and WSL cooperate for localhost forwarding in common scenarios. But don’t assume it’s magic; verify with ss and actual requests.

Bind addresses: choose intentionally

Binding to 127.0.0.1 means “only inside WSL.” Binding to 0.0.0.0 means “all interfaces,” which makes it reachable from Windows and potentially other networks depending on firewall and routing.

If you’re working with sensitive services (databases, internal admin UIs), keep them on localhost unless you have a clear need. Developers accidentally exposing a debug port is how you end up in a security training slideshow.

DNS: when “it’s intermittent” is your clue

Corporate VPN clients often install their own DNS and routing policies. WSL may pick up a stub resolver address that sometimes works and sometimes doesn’t, depending on whether the VPN is in a good mood. When you see package installs failing with name resolution errors, treat DNS as a first-class suspect.

Proxies: don’t half-configure them

If your network requires an HTTP proxy, configure it consistently across:

  • Environment variables (HTTP_PROXY, HTTPS_PROXY, NO_PROXY)
  • Git proxy settings if needed
  • Language-specific tooling (npm, pip, cargo) if needed
  • Docker daemon / build configuration if you build images

Half-configured proxies create “works sometimes” failures that waste the most expensive resource in engineering: your attention.

Docker and containers on WSL2 without foot-guns

There are two common patterns:

  1. Docker Desktop using the WSL2 backend. Usually the easiest. Windows manages the integration, and your Linux tools talk to Docker seamlessly.
  2. Docker Engine installed inside the WSL distro. Works well if you want a more Linux-native experience, especially with systemd, but you own more of the plumbing.

Storage for containers: the invisible disk hog

Container layers accumulate. They always do. If you don’t prune, your VHDX grows, performance degrades, and you’ll eventually hit “no space left on device” mid-build.

Filesystem placement still matters

Build contexts living on /mnt/c can make Docker builds painfully slow because sending build context and reading metadata crosses the boundary. Keep build contexts in the Linux filesystem.

Joke #2: Docker image layers are like office snacks—nobody admits they’re responsible, but they disappear slowly until the budget meeting.

Three corporate-world mini-stories (pain included)

Mini-story 1: The incident caused by a wrong assumption

A mid-sized company rolled out a “standard dev environment” policy: Windows laptops, WSL2 for backend services, a shared monorepo, and a local Postgres container. Sounds modern. It worked fine for weeks—until a release week when a few developers started reporting that migrations “randomly fail” and the API “sometimes can’t connect to the database.”

The wrong assumption was subtle: they assumed localhost meant the same thing everywhere, and that if a service listened on localhost inside WSL, it was automatically reachable from Windows apps. Some developers ran the API in WSL and the DB in Docker Desktop; others ran both inside WSL; a few ran the API in Windows and the DB in WSL. The connection string localhost:5432 was treated like a universal truth.

In practice, the networking path depended on where the client lived. A Windows process hitting localhost was not always talking to WSL’s localhost. Some setups forwarded correctly; others didn’t, especially when the VPN client toggled routes and the Hyper-V virtual switch reset.

The fix wasn’t a magical registry tweak. They standardized the topology: API and DB run in the same environment (both in WSL, or both in Docker Desktop with WSL backend). They wrote two canonical connection strings and put them in tooling that detected the runtime context. They also added a simple health check script that validated connectivity from the actual client process, not from “where we think it runs.”

After that, the “randomness” vanished. It wasn’t random; it was unstated architecture.

Mini-story 2: The optimization that backfired

A different org had a performance problem: npm install was slow. Someone decided to “optimize” by keeping the repo on the Windows filesystem so Windows indexing and enterprise backup tools could see it. They moved the monorepo under C:\dev and accessed it from WSL via /mnt/c/dev.

At first it looked fine. Builds started, tests ran, and the team congratulated itself for being pragmatic. Two weeks later, the bug reports started: hot reload missing changes, Jest watchers spiking CPU, Git reporting modified files that nobody touched, and TypeScript builds occasionally failing because a file “does not exist” for a millisecond.

They’d optimized for the wrong observer. The Windows filesystem path introduced metadata latency and semantic mismatches, especially for tools that watch file events and expect Linux behavior. The overhead wasn’t just speed—it was correctness. Developers wasted hours “fixing” their code when the filesystem was the one lying to them.

The actual optimization was the opposite: keep repos in the Linux filesystem for correctness and speed, then selectively sync artifacts back to Windows when needed. They also excluded the WSL VHDX location from aggressive scanning policies where possible. The end result: faster installs and fewer phantom file issues.

Performance wins that break correctness are not wins. They’re debt with interest.

Mini-story 3: The boring but correct practice that saved the day

A large enterprise team treated WSL dev environments the way SREs treat production: as disposable, reproducible units. Every developer had a bootstrap script that installed packages, configured shell tooling, pinned language runtimes, and set up SSH keys and known hosts policies. It wasn’t glamorous; it was a pile of scripts and a checklist.

One Monday, a Windows update caused a subset of machines to behave oddly with virtualization features. WSL distros wouldn’t start reliably; some got stuck “Running” but didn’t respond. The usual outcome for teams is lost days and a lot of personal debugging rituals.

This team’s response was almost boring: export the distro if possible, then rebuild from scratch on affected machines using the script. If export failed, they restored from a recent export archive. Developers were back online the same day, and the team lead didn’t have to beg IT for miracles.

The practice that saved them was simple: periodic WSL exports + reproducible setup. It turned a workstation incident into a minor inconvenience.

In ops terms, they had an RTO for developer productivity. Most teams don’t. They should.

Fast diagnosis playbook: what to check first/second/third

When WSL “feels slow” or “is acting weird,” you want to identify the bottleneck in minutes, not hours. Here’s the triage order that works in practice.

First: Confirm where the work is happening (filesystem boundary)

  1. Run pwd. If you’re under /mnt/c, assume that’s the culprit until proven otherwise.
  2. Run mount | grep /mnt/c and check if metadata is enabled if you rely on permissions/exec bits.
  3. Move the repo to /home and rerun the slow command. If it’s suddenly fine, stop diagnosing and adopt the fix.

Second: Check resource pressure (memory and swap)

  1. Run free -h. If available memory is low and swap is used, expect stutters.
  2. Run top. If load average is high with low CPU usage, it may be I/O wait or contention.
  3. Decide: set sane WSL limits, reduce container footprint, or stop running heavyweight services locally.

Third: Check antivirus and indexing impacts (Windows side)

  1. If your repo is on Windows filesystem, expect scanning overhead—move it.
  2. If your distro’s VHDX location is being scanned aggressively, you’ll see intermittent spikes during builds.
  3. Decision: coordinate with corporate endpoint policy if allowed; otherwise design your workflow to minimize the scan surface.

Fourth: Networking and DNS (especially behind VPN)

  1. Run resolvectl status and validate DNS server presence.
  2. Try name resolution (getent hosts for a known domain).
  3. Decision: pin DNS config, adjust VPN split-tunnel settings if permitted, and ensure proxy variables are consistent.

Fifth: Version drift (WSL, kernel, distro packages)

  1. Run wsl.exe --status on Windows and uname -a in Linux.
  2. Update WSL and the distro packages if you’re chasing a known bug.
  3. Decision: standardize versions across the team; don’t debug “works on my laptop” kernel mismatches forever.

Common mistakes: symptom → root cause → fix

1) Symptom: Git status is slow, npm installs take forever

Root cause: Repo is on /mnt/c (drvfs). Heavy metadata workloads are penalized.

Fix: Move repo into /home. Keep Windows access via selective sync (copy artifacts out), not by developing on drvfs.

2) Symptom: Permission denied / scripts not executable under /mnt/c

Root cause: drvfs mounted without metadata, so Linux permission bits aren’t preserved.

Fix: Configure /etc/wsl.conf to mount with metadata, or keep executable scripts inside the Linux filesystem where permissions behave.

3) Symptom: “Temporary failure resolving …” during package installs

Root cause: DNS forwarding instability, often triggered by VPN clients or changing network profiles.

Fix: Disable auto-generation of /etc/resolv.conf and set stable DNS, or adjust the VPN/proxy policy to provide consistent resolver behavior.

4) Symptom: Service runs in WSL but browser on Windows can’t reach it

Root cause: Service bound to 127.0.0.1 inside WSL or port forwarding not active.

Fix: Bind to 0.0.0.0 for dev where appropriate, verify with ss -lntp, and use explicit port forwarding if required by your setup.

5) Symptom: Docker builds are slow; context send takes ages

Root cause: Build context on Windows filesystem; lots of file metadata traversals cross boundaries.

Fix: Keep Dockerfile and context in Linux filesystem. Prune images and builder cache routinely.

6) Symptom: WSL memory usage keeps growing and Windows feels sluggish

Root cause: WSL2 VM opportunistically uses RAM; memory isn’t reclaimed quickly under some workloads.

Fix: Set limits in .wslconfig, restart WSL when needed, and avoid running container stacks you don’t need daily.

7) Symptom: File watchers miss changes or spin CPU

Root cause: Editing Linux files with Windows tooling across filesystem boundaries, or watching files on drvfs.

Fix: Use VS Code Remote to WSL; keep watched files in Linux FS; reduce watcher scope; use polling only as a last resort.

8) Symptom: “No space left on device” even after deleting stuff

Root cause: VHDX grew; deleting files doesn’t always shrink the virtual disk immediately.

Fix: Prune caches, export/import to compact logically, and keep heavy data in a planned location. Don’t wait for a crisis.

Checklists / step-by-step plan

Checklist A: 30-minute setup that won’t embarrass you later

  1. Install WSL2 and choose Ubuntu LTS.
  2. Update packages inside the distro; install core tools (git, build-essential, curl, ca-certificates).
  3. Decide on systemd: enable only if you need long-running services.
  4. Create ~/projects and keep repos there.
  5. Use Windows Terminal as your shell UI.
  6. Use VS Code Remote to WSL for editing and debugging.
  7. Configure SSH keys and agent strategy (Windows agent forwarding or Linux agent, but pick one).
  8. Set .wslconfig memory/CPU/swap boundaries appropriate to your machine.
  9. Run the “12+ tasks” checks once to baseline behavior.
  10. Export the distro once it’s clean. This is your golden image.

Checklist B: Daily operating habits (the ones that prevent slow creep)

  1. Keep build caches under control (prune old virtualenvs, node_modules in dead branches, stale container layers).
  2. Don’t run heavyweight services you don’t need today.
  3. Prefer Linux-native tools inside WSL; use interop sparingly.
  4. When something breaks after a VPN change, check DNS before reinstalling toolchains.
  5. Export your distro periodically if your work depends on local state.

Checklist C: When performance tanks, do this in order

  1. Confirm repo location: if it’s on /mnt/c, move it.
  2. Check memory/swap: if swapping, set limits and reduce workload.
  3. Check I/O wait and top CPU offenders.
  4. Check DNS and proxy settings if network operations are slow.
  5. Only then consider WSL updates, resets, or a distro rebuild.

FAQ

Is WSL2 “just a VM”?

Technically yes: a lightweight VM with a real Linux kernel. Practically, it behaves like an integrated subsystem with excellent interop and tooling support.

Should I use WSL1 for anything?

Rarely. WSL1 can be useful for specific filesystem access patterns on Windows drives, but for modern Linux tooling and containers, WSL2 is the default.

Where should I put my source code?

Inside the Linux filesystem, under /home. Use /mnt/c for exchange with Windows tools, not for builds and dependency installs.

Can I run Docker without Docker Desktop?

Yes, by installing Docker Engine inside the WSL distro. It’s viable, especially with systemd enabled, but you’ll own more configuration and updates.

Why does my WSL disk usage not shrink after deleting files?

Because the virtual disk grew to accommodate data, and thin-provisioned disks don’t always compact automatically. Plan cleanup and consider export/import as a practical compaction approach.

How do I make Windows reach a service running in WSL?

Ensure the service is listening on an appropriate interface (0.0.0.0 if needed) and verify with ss -lntp. Then test connectivity from Windows. Don’t rely on assumptions about localhost forwarding.

Do I need systemd in WSL?

Only if you want Linux services to run and be managed normally (Docker Engine, databases, background daemons). If you’re just running CLI tools and ephemeral processes, you can skip it.

What’s the safest backup strategy for WSL?

Use wsl.exe --export to create periodic archives, plus keep your configuration in version control (dotfiles, bootstrap scripts). Backups without restore practice are theoretical.

Why do file watchers behave oddly?

Often because the files live on drvfs (/mnt/c) or are edited across boundaries. Keep watched project files in Linux FS and use WSL-native editing (VS Code Remote).

Can I run GUI Linux apps?

On Windows 11 with WSLg, yes, and it’s surprisingly usable. For production-grade Linux GUI work, you still want a real Linux desktop or remote environment, but for dev tooling it’s fine.

Next steps that actually improve your life

If you want the shortest path to a “real dev environment” on Windows, do these things and stop improvising:

  1. Move your repos into the Linux filesystem and treat /mnt/c as a staging area.
  2. Baseline your system using the tasks above: verify WSL2, check mounts, confirm memory behavior, validate DNS, and check listening ports.
  3. Set WSL resource limits so your dev environment can’t DOS your desktop.
  4. Pick one container strategy (Docker Desktop WSL backend or Docker in-distro) and standardize it for your team.
  5. Make it recoverable: export the distro once you’ve got it right, and keep a reproducible bootstrap script so a rebuild is a routine event, not a catastrophe.

WSL2 is not a science project. It’s infrastructure. Treat it like infrastructure and it will behave like it. Treat it like a weekend hobby and it will eventually do what all weekend hobbies do: break right before Monday.

]]>
https://cr0x.net/en/wsl-real-dev-environment-windows/feed/ 0
WSL2 Time Drift: Fix Clock Skew the Right Way https://cr0x.net/en/wsl2-time-drift-fix/ https://cr0x.net/en/wsl2-time-drift-fix/#respond Fri, 20 Feb 2026 04:50:17 +0000 https://cr0x.net/?p=34953 When WSL2 time drifts, nothing fails politely. Git fetches die with TLS errors. Package managers swear the repository metadata is “from the future.” Logs become fiction. And your incident timeline—already fragile—turns into interpretive dance.

The worst part: it feels random. You close your laptop, walk to a meeting, come back, and suddenly your Linux environment believes it’s last Tuesday. Let’s fix it like adults: measure first, sync the right clock, and stop papering over skew with hacks that make security tools scream.

What WSL2 time drift looks like (and why you should care)

Clock skew is one of those “it’s only a few minutes” problems that can stop an entire pipeline. In production operations, time is an API contract: TLS certificate validity, Kerberos tickets, OAuth token expiry, log correlation, distributed tracing spans, cache invalidation, build reproducibility. Break time and you break trust.

WSL2 adds a twist: it’s a Linux VM running under Hyper-V virtualization plumbing, with its own kernel. That means time can drift at the guest level even when Windows looks fine. Sleep/hibernate transitions and host load can exacerbate it. Some users get a WSL2 instance that resumes with a stale clock; others see periodic drift under sustained CPU pressure or when the guest isn’t getting scheduled frequently.

Two quick realities:

  • Your Windows clock being correct does not guarantee your WSL2 clock is correct.
  • Fixing time by “just setting the date” inside the guest is a band-aid. It may even create security problems by masking the fact that you have no reliable time source.

Joke #1: Time drift is like technical debt—you ignore it until it charges interest during an outage.

Interesting facts and historical context (you’ll be smarter at the end)

  • NTP is older than many modern OSes. The Network Time Protocol has been around since the early 1980s and is still the backbone of time sync on the internet.
  • Clock sync is not just “set the time.” Modern implementations discipline the clock gradually to avoid breaking time-sensitive applications.
  • Virtualized time has always been weird. Early VM platforms struggled with timer interrupts and scheduling, leading to guests that ran “fast” or “slow” under load.
  • Monotonic vs wall-clock time matters. Linux provides a monotonic clock for measuring durations; wall clock time can jump due to NTP adjustments or manual changes.
  • Kerberos is famously intolerant of skew. Typical default tolerances are minutes, not hours. Past that, auth fails in ways that look like “credentials broken.”
  • TLS relies on time for basic safety. Certificate validity windows are a simple check that stops replay and misuse. Wrong time makes secure systems look broken.
  • Leap seconds are a real thing. They’re rare, but they’ve caused production incidents when systems disagree about how to handle them.
  • Windows and Linux differ on time assumptions. Historically, dual-boot setups fought over whether the hardware clock is local time or UTC; that fight taught a generation to respect timekeeping.

How WSL2 keeps time: the moving parts

WSL2 is a lightweight VM with a real Linux kernel. It shares a lot with Hyper-V guests: virtual timers, a virtualization bus, and integration mechanisms that let the guest cooperate with the host. But “lightweight” doesn’t mean “immune to VM time issues.” It just means the failure modes are sneakier.

Three clocks you should keep distinct

  • Windows host wall clock: what your system tray shows; generally disciplined via Windows Time service (w32time) or domain time.
  • WSL2 guest wall clock: what date prints in Linux; can drift or resume stale after sleep.
  • Monotonic clocks: used for measuring intervals; rarely your issue in “TLS not yet valid,” but relevant if you see strange timeouts or scheduling behavior.
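A quick sketch to eyeball all three from one WSL2 shell. Hedged assumptions: powershell.exe is reachable only when WSL interop is enabled (the sketch degrades gracefully if it isn't), and /proc/uptime is boot-time based, which is close to but not exactly the monotonic clock:

```shell
# Guest wall clock, host wall clock, and an interval clock, side by side.
date -Is                                           # WSL2 guest wall clock
powershell.exe -NoProfile -Command 'Get-Date -Format o' 2>/dev/null \
  || echo "host clock: interop unavailable"        # Windows host wall clock
awk '{printf "uptime (interval-ish): %ss\n", $1}' /proc/uptime
```

If the first two lines disagree by more than a second or two, you have real guest skew, not timezone confusion.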

Why sleep/hibernate is the usual villain

When a laptop sleeps, CPU execution stops. When it wakes, the host updates its time, but the guest VM may not immediately receive a clean time sync event. If the VM is paused or its timekeeping relies on a virtual clock source that doesn’t get corrected promptly, you can resume with a clock that’s behind. The more you sleep, the more you drift. It’s not mystical; it’s state resumption.

Why “just run ntpdate” is not the right instinct

One-off time jumps can break running processes. Databases, build tools, and anything relying on timestamps can misbehave. The right approach is to enable a time service that disciplines the clock (gradual corrections), and to fix the root cause (resync on resume, ensure WSL2 integration, ensure host time is stable).

Paraphrased idea (not verbatim) from Richard Cook, reliability researcher: Failures are rarely single-point; they’re the result of normal assumptions lining up badly.

Fast diagnosis playbook

This is the “you’re on call, things are failing, and you need to know what’s wrong in five minutes” flow. Don’t overthink it.

First: confirm the skew and its direction

  1. Check Windows time vs an external reference (or at least confirm Windows isn’t wildly off).
  2. Check WSL2 time inside the distro.
  3. Compute the delta. Seconds? Minutes? Hours? That scale changes the likely root cause.
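The three steps above can be collapsed into one snippet run inside WSL2. Hedged sketch: it assumes WSL interop so powershell.exe is callable; when interop is unavailable it cannot measure and reports 0:

```shell
# Sketch: measure WSL2-vs-Windows clock skew in whole seconds.
skew_seconds() {
  local wsl_epoch win_epoch
  wsl_epoch=$(date +%s)
  win_epoch=$(powershell.exe -NoProfile -Command \
    '[DateTimeOffset]::Now.ToUnixTimeSeconds()' 2>/dev/null | tr -d '\r')
  [ -n "$win_epoch" ] || win_epoch=$wsl_epoch   # no interop: can't measure, report 0
  echo $(( wsl_epoch - win_epoch ))
}
skew_seconds
```

Seconds of skew: keep an eye on it. Minutes: suspect resume behavior or a dead sync daemon. Hours: suspect manual changes or a sync mechanism that has never worked.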

Second: decide if the guest can reach a time source

  1. Verify DNS and outbound UDP/123 reachability (or at least HTTPS access to corporate time proxies if that’s your environment).
  2. Check whether a time daemon exists and is running (systemd-timesyncd, chrony).

Third: isolate host vs guest vs network policy

  1. If Windows is wrong: fix Windows time first. The guest riding a broken host is not a guest problem.
  2. If Windows is right but WSL2 is wrong: fix WSL2’s sync mechanism and resume behavior.
  3. If both are right sometimes and wrong after sleep: implement resync on wake and verify WSL version/kernel updates.
  4. If only corporate network breaks it: you likely have NTP blocked and need an allowed time source or a different approach.

Practical tasks: commands, outputs, decisions (12+)

These are hands-on checks you can run right now. Each task includes: command, what typical output means, and what decision you make from it.

Task 1: Check WSL2 wall clock immediately

cr0x@server:~$ date -Is
2026-02-05T14:22:09+00:00

Meaning: ISO-8601 wall clock time inside WSL2. Compare it to Windows or a known-good reference.

Decision: If it’s off by more than ~2 minutes, treat it as a real problem. If it’s off by hours, suspect sleep/hibernate resume or manual changes.

Task 2: Check time zone and whether it’s a “timezone confusion” not drift

cr0x@server:~$ timedatectl
               Local time: Thu 2026-02-05 14:22:12 UTC
           Universal time: Thu 2026-02-05 14:22:12 UTC
                 RTC time: Thu 2026-02-05 14:22:12
                Time zone: Etc/UTC (UTC, +0000)
System clock synchronized: no
              NTP service: inactive
          RTC in local TZ: no

Meaning: If “Local time” matches expectations but “Time zone” is wrong, you might be seeing offset confusion. If “System clock synchronized” is no and NTP is inactive, the guest isn’t being disciplined.

Decision: If NTP is inactive, you need to enable a time service (systemd-timesyncd or chrony). If the timezone is wrong, fix that separately—don’t chase “drift” that’s actually an offset.

Task 3: Confirm systemd is actually running (WSL2 varies by config)

cr0x@server:~$ ps -p 1 -o comm=
systemd

Meaning: If PID 1 is systemd, you can use systemd services normally. If it’s something else (like init or bash), you’ll need a different approach (or enable systemd in WSL).

Decision: If systemd is not PID 1, decide whether you will enable systemd (recommended for a “real Linux” experience) or run chrony manually.

Task 4: If systemd exists, check timesync service status

cr0x@server:~$ systemctl status systemd-timesyncd --no-pager
● systemd-timesyncd.service - Network Time Synchronization
     Loaded: loaded (/lib/systemd/system/systemd-timesyncd.service; enabled)
     Active: active (running) since Thu 2026-02-05 14:10:03 UTC; 12min ago
       Docs: man:systemd-timesyncd.service(8)
   Main PID: 319 (systemd-timesyn)
     Status: "Synchronized to time server 10.0.0.53:123 (ntp.corp)"

Meaning: “Synchronized” is what you want. If it says “No network connectivity” or “Timed out,” your network path to NTP is blocked or misconfigured.

Decision: If it’s active and synchronized, drift might be resume-related (needs wake resync) or a clock source issue. If it’s not synchronized, fix connectivity or NTP server config.

Task 5: Inspect the last sync and current offset (timesyncd)

cr0x@server:~$ timedatectl timesync-status
       Server: 10.0.0.53 (ntp.corp)
Poll interval: 32min 0s (min: 32s; max: 34min 8s)
         Leap: normal
      Version: 4
      Stratum: 3
    Reference: 3B4E2A91
    Precision: 1us (-24)
Root distance: 18.525ms
       Offset: +1.842ms
        Delay: 2.913ms
       Jitter: 0
 Packet count: 109
    Frequency: -12.345ppm

Meaning: Offset in milliseconds is healthy. Root distance and stratum give you a sense of quality. Giant offsets (seconds+) mean your guest is unstable or unsynchronized.

Decision: If offset is small but you still see TLS errors, your problem may be timezone, cached DNS, or certificate chain—not time drift.

Task 6: If you’re using chrony, confirm sources and tracking

cr0x@server:~$ chronyc tracking
Reference ID    : 0A000035 (10.0.0.53)
Stratum         : 4
Ref time (UTC)  : Thu Feb 05 14:22:24 2026
System time     : 0.000001234 seconds fast of NTP time
Last offset     : -0.000000321 seconds
RMS offset      : 0.000012345 seconds
Frequency       : 15.123 ppm slow
Residual freq   : -0.001 ppm
Skew            : 0.020 ppm
Root delay      : 0.002911 seconds
Root dispersion : 0.000812 seconds
Update interval : 64.0 seconds
Leap status     : Normal

Meaning: “System time … fast/slow of NTP time” tells you your error. Healthy is near zero, stable frequency, normal leap status.

Decision: If chrony shows no reference or large offsets, you’re not actually syncing. Fix network or server configuration before blaming WSL.

Task 7: Verify name resolution and basic connectivity (because NTP can fail quietly)

cr0x@server:~$ getent hosts ntp.corp
10.0.0.53        ntp.corp

Meaning: DNS resolves the NTP name. If it doesn’t, timesync will fail even if the network is fine.

Decision: If DNS fails, fix WSL2 resolv.conf generation issues or corporate DNS access before touching time services.

Task 8: Check UDP/123 reachability indirectly (no fancy tools required)

cr0x@server:~$ sudo timeout 3 bash -c 'cat < /dev/null > /dev/udp/10.0.0.53/123' && echo ok || echo blocked
ok

Meaning: This is a crude check: it doesn’t prove NTP responses, but “blocked” suggests routing/firewall/policy issues.

Decision: If blocked, stop trying random NTP servers; you need a corporate-approved time path (or to use host-provided time sync).

Task 9: Look for suspend/resume patterns in dmesg

cr0x@server:~$ dmesg | tail -n 20
[  812.345678] systemd[1]: Started Network Time Synchronization.
[ 5432.123456] PM: suspend entry (s2idle)
[ 5432.234567] Freezing user space processes
[ 5432.345678] PM: suspend exit
[ 5432.456789] systemd-timesyncd[319]: Network configuration changed, trying to establish connection.

Meaning: If you see suspend/resume markers followed by a time sync re-init, you're likely dealing with post-resume stale time. Note that in WSL2 the host often pauses the VM without the guest logging a suspend at all, so an empty dmesg does not rule out resume-related drift.

Decision: Implement a post-resume resync hook (in WSL context: restart WSL or trigger time service restart) rather than hand-setting time.

Task 10: Confirm the guest is actually WSL2 and see version info

cr0x@server:~$ uname -a
Linux cr0x-wsl 5.15.146.1-microsoft-standard-WSL2 #1 SMP Fri Jan 10 18:20:00 UTC 2026 x86_64 GNU/Linux

Meaning: The kernel string includes Microsoft WSL2 branding. Kernel age matters; timekeeping improvements land in newer kernels.

Decision: If your kernel is old, update WSL. Don’t debug 2020-era timekeeping if you can install 2026 fixes.

Task 11: Detect “time went backwards” symptoms in logs (a classic red flag)

cr0x@server:~$ journalctl -b -p warning --no-pager | grep -i time | tail -n 5
Feb 05 14:05:01 cr0x-wsl systemd[1]: Time has been changed
Feb 05 14:05:01 cr0x-wsl systemd[1]: Detected clock change. Skipping time warp.

Meaning: Time jumps happened. Not necessarily fatal, but it explains flaky builds and “not yet valid” errors.

Decision: If time is jumping often, focus on stable discipline (chrony) and resume hooks, not manual corrections.

Task 12: Reproduce the failure in a controlled way (TLS validity check)

cr0x@server:~$ curl -Iv https://example.internal 2>&1 | sed -n '1,15p'
*   Trying 10.0.1.25:443...
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* SSL certificate problem: certificate is not yet valid
* Closing connection 0
curl: (60) SSL certificate problem: certificate is not yet valid

Meaning: “Not yet valid” often indicates the client clock is behind the certificate’s NotBefore timestamp. It can also be a weird proxy doing TLS interception with fresh certs. But time is the first suspect.

Decision: Compare date to Windows time and to the cert validity window (next task). If skew is real, fix time before chasing certificate stores.

Task 13: Inspect certificate validity dates to prove it’s time, not “mystery TLS”

cr0x@server:~$ echo | openssl s_client -connect example.internal:443 -servername example.internal 2>/dev/null | openssl x509 -noout -dates
notBefore=Feb  5 14:10:00 2026 GMT
notAfter=May  6 14:10:00 2026 GMT

Meaning: If your WSL2 time is earlier than notBefore, TLS fails exactly as shown.

Decision: Stop arguing with TLS. Fix the clock.

Task 14: Quick, low-risk “kick” — restart WSL from Windows side

cr0x@server:~$ powershell.exe -NoProfile -Command "wsl --shutdown"

Meaning: This shuts down all WSL instances, including the session you ran it from, so expect your shell to terminate. On next launch, the VM boots fresh and often picks up correct host time.

Decision: If this fixes it consistently, your root cause is resume behavior. You still want a durable fix, but now you have a safe workaround.

Task 15: Validate whether your time daemon is stepping the clock (dangerous) or slewing (preferred)

cr0x@server:~$ chronyc -n sourcestats | head
Name/IP Address            NP  NR  Span  Frequency  Freq Skew  Offset  Std Dev
10.0.0.53                   7   4   256     +0.012      0.034   +0.000   0.001

Meaning: Stable frequency and small offsets indicate slewing discipline. If you see violent corrections, you may be stepping often, which breaks apps.

Decision: Configure chrony to step only at startup and slew thereafter, unless you have a special case.

Fixes that work (and why)

There are three tiers of fixes: update/enable the plumbing, run a real time sync daemon, and handle sleep/resume. Do them in that order. Avoid “set the time in a cron job” unless your goal is to create new and exciting failure modes.

1) Keep WSL and Windows time healthy

If Windows time is wrong, everything downstream becomes theater. Fix Windows time sync first (domain joined machines should already be disciplined). On non-domain machines, make sure Windows Time service is running and not blocked by VPN policy. Then update WSL so you’re not stuck with older kernel behavior.
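A host-side sanity check you can run without leaving the WSL shell. Hedged: interop must be enabled, and `w32tm` is the Windows-native time service CLI, so the fallback path tells you to run it from Windows directly:

```shell
# Query the Windows Time service status from inside WSL2.
out=$(powershell.exe -NoProfile -Command 'w32tm /query /status' 2>/dev/null | tr -d '\r')
if [ -n "$out" ]; then
  printf '%s\n' "$out"   # key lines: "Source:" and "Last Successful Sync Time:"
else
  echo "interop unavailable: run 'w32tm /query /status' in a Windows shell instead"
fi
```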

2) Enable systemd in WSL (if you can)

Modern WSL supports systemd when configured. Having systemd makes time sync a normal Linux problem, which is a gift. You can run systemd-timesyncd (simple) or chrony (better under weird conditions).

If you can’t or won’t enable systemd, you can still run chrony in a more manual way, but you’ll lose the clean service lifecycle hooks that make this reliable.
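For reference, enabling systemd is a two-line config change plus a restart. A minimal /etc/wsl.conf (the `[boot]` section is the documented switch; it requires a reasonably current WSL, followed by `wsl --shutdown` from Windows and a relaunch of the distro):

```ini
# /etc/wsl.conf — per-distribution WSL settings
[boot]
systemd=true
```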

3) Prefer chrony when drift is chronic

systemd-timesyncd is fine for many dev laptops. But WSL2 drift patterns—especially post-suspend and under CPU contention—often benefit from chrony’s robustness and better tracking/steering behavior.

chrony also copes better with intermittent network and long gaps between sync opportunities. That’s basically the job description of a laptop VM.

4) Handle resume explicitly

If the drift appears after sleep/hibernate, you want an explicit re-sync trigger. A clean approach is to restart the time daemon (or even WSL) on resume. The “correct” hook depends on whether you manage it from Windows (Task Scheduler) or inside Linux (systemd units and timers). In a corporate fleet, Windows-driven remediation is often easier to distribute.
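One way to wire the Windows-driven variant is an event-triggered scheduled task. The sketch below only builds and prints the command (dry run) so you can inspect it before running it from an elevated Windows shell. Hedged assumptions: the task name `WSL-TimeResync` and the chrony restart target are mine; Power-Troubleshooter EventID 1 is the standard Windows resume-from-sleep event:

```shell
# Build (but don't execute) a schtasks command that restarts chrony inside WSL
# whenever Windows logs a resume-from-sleep event.
build_resync_task() {
  printf '%s' 'schtasks.exe /Create /F /TN "WSL-TimeResync" /SC ONEVENT /EC System'
  printf ' %s' "/MO \"*[System[Provider[@Name='Microsoft-Windows-Power-Troubleshooter'] and EventID=1]]\""
  printf ' %s\n' '/TR "wsl.exe -u root systemctl restart chrony"'
}
build_resync_task    # inspect the output, then run it from an elevated Windows shell
```

Swap the `/TR` payload for `systemctl restart systemd-timesyncd` (or even `wsl --shutdown`) depending on which remediation you chose.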

Joke #2: A laptop waking from sleep is basically a tiny disaster recovery exercise—just with fewer postmortems and more coffee.

chrony configuration pattern that behaves in the real world

On a corporate network, you often have an internal NTP server. Use it. Random public NTP from inside a locked-down VPN is how you spend a Thursday explaining UDP egress to security.

Example: install and configure chrony (commands differ by distro). Here’s a Debian/Ubuntu-flavored setup:

cr0x@server:~$ sudo apt-get update
Hit:1 http://archive.ubuntu.com/ubuntu jammy InRelease
Reading package lists... Done

Meaning: If this fails with “Release file is not valid yet,” your clock is behind. Fix time before continuing.

Decision: If update succeeded, proceed with installing chrony.

cr0x@server:~$ sudo apt-get install -y chrony
Reading package lists... Done
Building dependency tree... Done
The following NEW packages will be installed:
  chrony
Setting up chrony (4.4-1ubuntu0.1) ...
Created symlink /etc/systemd/system/multi-user.target.wants/chrony.service → /lib/systemd/system/chrony.service.

Meaning: chrony is installed and enabled as a service.

Decision: Configure it to point at allowed servers and to step only at startup.

cr0x@server:~$ sudo bash -c 'cat >/etc/chrony/chrony.conf <<EOF
pool ntp.corp iburst
driftfile /var/lib/chrony/chrony.drift
makestep 1.0 3
rtcsync
logdir /var/log/chrony
EOF'

Meaning: makestep 1.0 3 allows stepping if offset >1s for the first 3 updates. After that, it slews. That’s a sane balance for laptop resume weirdness without breaking running workloads repeatedly.

Decision: Restart and verify tracking.

cr0x@server:~$ sudo systemctl restart chrony
cr0x@server:~$ chronyc sources -v
  .-- Source mode  '^' = server, '=' = peer, '#' = local clock.
 / .- Source state '*' = current best, '+' = combined, '-' = not combined.
| /   Reachability register (octal) - 377 means responses to the last 8 polls.
||
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
^* 10.0.0.53                       3   6   377    12   +0us[ +5us] +/-  20ms

Meaning: The * indicates the selected source. Reachability 377 is excellent. Offset is tiny.

Decision: If you don’t get a source and reach stays low (like 0 or 1), it’s network policy or DNS. Fix that upstream.

Resume remediation: the pragmatic approach

On affected laptops, the most consistent “fix it now” action is restarting WSL. It’s brutal, but it’s effective. For a dev workstation, this is often acceptable. For long-running processes in WSL, less so.

If you can’t afford a full WSL shutdown, restart your time service inside the distro after resume. If systemd is enabled, that’s straightforward.

cr0x@server:~$ sudo systemctl restart systemd-timesyncd

Meaning: Forces timesyncd to re-evaluate network and resync.

Decision: If timesyncd doesn’t actually sync post-resume, switch to chrony or address NTP reachability.

Common mistakes: symptoms → root cause → fix

These are the repeat offenders. If you recognize one, you can skip hours of “maybe it’s DNS” flailing.

1) Symptom: “SSL certificate is not yet valid” inside WSL2 only

  • Root cause: WSL2 guest time is behind after sleep/resume; Windows time is correct.
  • Fix: Enable chrony or timesyncd; implement post-resume resync; quick workaround is wsl --shutdown then relaunch.

2) Symptom: apt-get update says “Release file is not valid yet”

  • Root cause: Guest clock behind repository metadata timestamp.
  • Fix: Fix time first. Don’t disable repository date checks; that’s a security downgrade disguised as convenience.

3) Symptom: time is off by exactly your timezone offset

  • Root cause: Wrong timezone configuration inside the distro, not drift.
  • Fix: Set timezone with timedatectl set-timezone (systemd) or distro-specific tools. Keep clock in UTC; display local as needed.

4) Symptom: time correct at boot, drifts during heavy CPU load

  • Root cause: Guest not receiving scheduling time slices reliably; time discipline too weak; older kernel/WSL build.
  • Fix: Update WSL/kernel; use chrony; reduce extreme CPU starvation; avoid pinning WSL to minimal resources if it breaks timekeeping.

5) Symptom: chrony/timesyncd can’t reach any server on VPN

  • Root cause: Corporate network blocks UDP/123; NTP allowed only to internal servers or via special proxy.
  • Fix: Point your time service at approved internal NTP servers; if none exist, escalate to network/security for an approved path. Don’t tunnel to random internet NTP from corporate endpoints.

6) Symptom: “System clock synchronized: yes” but apps still complain about time

  • Root cause: App is using cached tokens/certs created during skew; or the issue is monotonic time vs wall clock; or it’s not time at all.
  • Fix: Clear/restart the affected processes after fixing time; verify with certificate dates; confirm actual wall clock with date -Is and compare.

7) Symptom: “time went backwards” warnings in journald frequently

  • Root cause: Repeated stepping due to aggressive config or unstable time source; or resume causes large backward jump.
  • Fix: Use chrony with controlled stepping (makestep only early) and slewing afterward; implement resume resync to avoid huge jumps.

Corporate mini-stories from the clock-skew trenches

Mini-story 1: The incident caused by a wrong assumption

The company had a mixed Windows/Linux dev environment, and WSL2 was the unofficial standard. Everything was fine until the team rolled out a new internal package repository with short-lived TLS certificates issued by an internal CA. The CA was properly managed. The certificates were valid. The repository was boring in the best way.

Then builds started failing. Not all builds. Not even most builds. Just a rotating set of developers, mostly laptop users, mostly the ones who closed their lids between meetings. The error was consistent: “certificate is not yet valid.” Security teams got pinged. The repository team got blamed. Someone suggested the CA clock was wrong (it wasn’t).

The wrong assumption was simple: “If Windows time is fine, WSL time must be fine.” It’s a comforting thought, like believing your backups are good because the dashboard is green. An SRE finally had a developer run date in WSL2 next to the Windows clock. The guest was seven minutes behind. That was enough to fall before the cert NotBefore boundary.

The fix wasn’t heroic. They enabled a proper time sync service in WSL2, pointed it at the internal NTP servers already used by Linux hosts, and added a lightweight “resync-on-resume” job on Windows that restarted WSL for the most affected user group. Once time was stable, TLS errors evaporated. The CA team stopped getting angry emails. The repo team stopped getting angry emails. Everyone went back to ignoring time—until the next incident, because that’s how humans work.

Mini-story 2: The optimization that backfired

A different org had a battery-life initiative. They tuned laptops for “efficiency,” which included aggressive sleep policies and CPU power saving. Someone also recommended limiting background services. In WSL2, developers started stripping services to reduce “overhead,” because of course they did. A few even disabled time sync daemons with the logic: “I’m not a server.”

A month later, intermittent auth failures showed up against an internal Kerberos-backed service. Again: not everyone, not always. The failures tended to happen in the afternoon, often after a long meeting. Developers blamed VPN. VPN blamed DNS. DNS blamed “something with Linux.” The classic corporate triangle of blame.

The real culprit was clock drift inside WSL2 after repeated suspend/resume cycles, compounded by the optimization: no time daemon to correct it. Kerberos tolerance is not a suggestion. When the drift crossed the threshold, tickets were rejected. The failures looked like “wrong password” to the application layer, which is a cruel joke performed by protocols.

The rollback was simple: re-enable time sync and relax the most aggressive sleep tuning for developers who used WSL2 heavily. Battery life dipped slightly. Productivity recovered significantly. The lesson landed the way these lessons always land: “Your laptop is not a server, but your laptop runs systems that care about time like a server.”

Mini-story 3: The boring but correct practice that saved the day

A finance-adjacent team ran a build-and-release pipeline locally in WSL2 because the corporate CI capacity was tight. It wasn’t ideal, but it was reality. The team lead had one non-negotiable rule: every dev environment must point at the same internal NTP sources as production Linux hosts, and it must be verifiable.

So they standardized on chrony in WSL2, with a config managed via a small bootstrap script. They didn’t trust “works on my machine.” They trusted “chronyc tracking shows low offset.” They also logged a simple health signal: a daily check that recorded offset and stratum into a local file. Not because it was glamorous—because it turned time into a measurable property.

One day, a corporate network change accidentally blocked UDP/123 on a subset of Wi-Fi VLANs. Developers started seeing weird issues elsewhere, but this team caught the time issue early because their daily check started reporting no reachable sources. They had evidence, not vibes.

While other groups argued about whether “the internet is down,” this team switched temporarily to an alternate approved NTP source on a different network segment and kept shipping. Later, network fixed the policy. Nobody wrote a dramatic postmortem. That’s what success looks like: boring, correct, and largely invisible.

Checklists / step-by-step plan

Checklist A: One-time stabilization (do this once per machine)

  1. Update WSL and kernel (Windows-side) and reboot if needed.
  2. Confirm Windows time is disciplined (domain time or reliable sync).
  3. Inside WSL2, confirm systemd status (ps -p 1).
  4. Enable a time sync daemon:
    • If systemd is present and you want simple: enable systemd-timesyncd.
    • If you see chronic drift: install and run chrony.
  5. Point at approved NTP sources (internal NTP if corporate; otherwise stable public pools if allowed).
  6. Verify offset (timesyncd status or chronyc tracking).

Checklist B: Post-sleep reliability (the “laptop reality” plan)

  1. Reproduce the bug: close lid for 5–10 minutes, resume, run date -Is.
  2. If drift occurs: decide between:
    • Fast workaround: restart WSL (wsl --shutdown).
    • Durable fix: implement a resume-triggered service restart (timesyncd/chrony) or WSL restart via Windows Task Scheduler.
  3. After implementing: repeat the sleep/resume test and verify offset stays small.

Checklist C: When corporate networks get in the way

  1. Assume UDP/123 is blocked until proven otherwise.
  2. Find approved time servers (internal NTP, domain controllers, or sanctioned time sources).
  3. Configure your daemon to use only those servers.
  4. Verify reachability and tracking with chrony/timesyncd status.
  5. If none exist: escalate. Time is a security dependency, not a personal preference.

FAQ

1) Why does WSL2 drift more than “normal Linux” on bare metal?

Because it’s a VM with a different scheduling and timer model, and laptop sleep/hibernate adds discontinuities. Bare metal usually has tighter hardware timer behavior and fewer suspend/resume edge cases in the guest.

2) Should I just run sudo date -s when it happens?

No. Manual time setting causes jumps that can break running processes and hides the real issue: lack of continuous discipline. Use a time daemon and fix resume behavior.

3) Is systemd-timesyncd enough, or do I need chrony?

timesyncd is enough when the network is stable and drift is mild. If you see repeated post-suspend skew, intermittent connectivity, or large corrections, chrony is usually the better tool.

4) My WSL2 doesn’t have systemd. Am I stuck?

You’re not stuck, but you’re playing on hard mode. You can enable systemd in WSL (recommended) or run chrony without systemd supervision. The latter works, but it’s easier to get wrong and harder to keep consistent.

5) Why does time drift break apt and git so dramatically?

apt validates repository metadata timestamps; TLS validates certificate validity windows. Both are security features. Incorrect time looks indistinguishable from an attack or corruption, so the correct behavior is to fail.

6) Can Docker inside WSL2 make this worse?

Docker itself doesn’t “cause” drift, but heavy container workloads can increase CPU contention. If the guest gets starved or your time daemon is absent/misconfigured, the skew becomes visible faster.

7) Why do my logs look out of order?

If wall clock time jumps backward or forward, log timestamps follow it. Your systems may still be functioning, but your observability becomes unreliable. Use monotonic time for durations, and keep wall clock disciplined.

8) What’s an acceptable time offset for dev work?

Keep it within a second if you can; within a few seconds is usually fine for most tasks. Once you’re in minutes, expect auth and TLS failures. In corporate environments with strict security, even small skew can be painful.

9) If Windows time is correct, why doesn’t WSL2 automatically inherit it perfectly?

Because “inherit” isn’t a constant stream; it’s mediated by virtualization integration and guest behavior. Suspend/resume, scheduling delays, and guest configuration can still produce skew.

10) Is it safe to restart WSL frequently?

It’s safe in the sense that it’s a supported operation. It’s not safe for unsaved state: it kills running processes in WSL. Treat it like restarting a VM: fine when planned, rude when done mid-workload.

Next steps you can actually do today

Do this in order. Stop when the problem stops.

  1. Measure: run date -Is and timedatectl in WSL2. Confirm whether it’s drift or timezone.
  2. Confirm sync service: if systemd is present, check systemctl status systemd-timesyncd (or chrony if installed).
  3. Fix the basics: enable a daemon (timesyncd or chrony), point it at approved time sources, verify offset.
  4. Handle sleep: if skew appears after resume, implement a deterministic action—restart time service or, if necessary, restart WSL—triggered after wake.
  5. Validate with a failure test: repeat the TLS validity check and repository update. Don’t declare victory until the original symptom is gone.

If you take one opinionated piece of advice from this: treat time like infrastructure, even on a laptop. Especially on a laptop. Your tools are now distributed systems wearing a hoodie.

]]>
https://cr0x.net/en/wsl2-time-drift-fix/feed/ 0
Run systemd in WSL: What Works Now (and What Still Breaks) https://cr0x.net/en/run-systemd-in-wsl/ https://cr0x.net/en/run-systemd-in-wsl/#respond Thu, 19 Feb 2026 10:52:53 +0000 https://cr0x.net/?p=34919 You wanted a Linux environment on Windows that behaves like the servers you actually run. Then you typed
systemctl status in WSL and got the classic deadpan: “System has not been booted with systemd.”
Suddenly your “quick dev box” turns into a weird petri dish of half-working daemons, missing logs, and services
that only start if you remember to kick them.

The good news: systemd in WSL is now a real supported mode, not a pile of hacks. The bad news: it’s still not a
full VM, it still isn’t your production kernel, and some things fail in ways that feel personal.
This is the practical map—what works, what breaks, and how to debug it like you’re on call.

The state of play: systemd in WSL today

If you’re on WSL2 and a reasonably current WSL package, you can enable systemd per distribution using
/etc/wsl.conf. WSL will then launch systemd as PID 1 inside the distro, and the rest of your services
behave like they do on a normal Linux machine—mostly.

“Mostly” is doing work here. WSL is not a generic hypervisor. It’s a Linux kernel hosted by Windows with a tight
integration layer. Some plumbing is different by design: networking is virtualized; mounts are translated; the
lifecycle is session-driven; Windows is still holding the keys to power management and host networking.

My opinionated guidance:

  • Use systemd in WSL when you need realistic service management (Docker Engine, podman, sshd,
    timers, journald, dbus-mediated services).
  • Don’t use WSL as production (yes, people try). Treat it as a dev/CI environment with
    production-like behavior, not a production substrate.
  • Keep a boundary between “Linux services” and “Windows host responsibilities”. When you blur that
    boundary, troubleshooting becomes performance art.

How systemd actually boots inside WSL (and why that matters)

Traditionally, WSL launched your shell (or configured command) directly; there was no init system. That meant:
no PID 1 doing service supervision, no consistent boot target, and no journald-managed logs. If a daemon died,
it died quietly or left behind a stale PID file like a tiny tombstone.

With systemd enabled, WSL starts systemd as PID 1. D-Bus comes along for the ride, and systemd units can run
normally. You now have a coherent way to start services, define dependencies, set restart policies, and use
timers instead of “I put it in my shell rc file and hoped.”

But WSL’s “boot” is not a hardware boot. It’s a user-space start inside a managed environment. Two consequences
show up immediately:

  1. Lifecycle is tied to the WSL instance. When WSL stops (or is terminated), your services stop.
    When it starts, unit ordering happens, but it’s not the same as a full server boot with device discovery.
  2. Integration layers can override Linux defaults. DNS resolution, network interfaces, and mounts
    can behave differently than on a normal distribution install.

The practical lesson: treat systemd in WSL as “real Linux service management running in a constrained host
container,” not as “I installed Ubuntu.” This framing prevents a lot of wrong assumptions.

Interesting facts and historical context

These are short, concrete points that explain why this topic is still spicy:

  1. WSL1 didn’t use a Linux kernel at all; it translated Linux syscalls to Windows NT kernel calls.
    That model was clever, but it limited compatibility for kernel-dependent workloads.
  2. WSL2 switched to a real Linux kernel running in a lightweight VM. That’s why containers and
    cgroups became viable.
  3. For years, systemd wasn’t supported in WSL, partly because WSL did not present itself as a
    traditional booted system with PID 1 ownership and device management.
  4. Early “systemd on WSL” attempts relied on wrapper scripts and manual PID 1 tricks. They worked
    until they didn’t, and they broke in wonderfully confusing ways.
  5. Linux distros increasingly expect systemd. Even when a service can run without it, packaging
    and service integration often assume systemd units exist.
  6. systemd is not just “an init system”; it’s a suite (journald, resolved, networkd, logind, timers,
    unit dependencies). In WSL, you might want exactly some of these and not others.
  7. cgroup v2 is the modern default in many environments. That matters for Docker, podman, and any
    resource control; WSL2 has steadily improved cgroup support.
  8. WSL networking historically used NAT with a changing VM IP. That’s fine for outbound calls and
    painful for inbound services unless you plan for it.
  9. WSL regenerates parts of your Linux config (notably /etc/resolv.conf) unless you
    tell it not to, which collides with systemd-resolved expectations.

What works well (enough) with systemd in WSL

Service management that behaves like Linux

The headline benefit is obvious: systemctl works. Units can start at “boot” (WSL instance start),
restart on failure, and express dependencies. That means:

  • sshd can be a service, not a manual command you forget.
  • cron-like work can be a systemd timer with logging and failure handling.
  • dbus-using services behave like they do on servers.

journald makes debugging less superstitious

Without journald, you end up scraping random files under /var/log and wondering why nothing is there.
With systemd, you can query structured logs per unit, per boot, and per time range. It’s not magic, but it’s
predictable—my favorite flavor of magic.

Docker Engine and friends can be more “native-Linux”

Many teams use Docker Desktop with WSL integration, which can work fine without systemd. But if you want to run
Docker Engine inside the distro like you would on a Linux host, systemd support removes a lot of papercuts:
unit files, socket activation patterns, and predictable startup.

Systemd timers are a sane alternative to “backgrounding”

WSL sessions come and go. Background processes that you started manually can disappear, or keep running and
surprise you later. Timers + services give you idempotent, inspectable scheduling.

What still breaks or stays weird

DNS and resolv.conf: the perennial footgun

WSL likes to manage DNS for you. systemd-resolved likes to manage DNS for you. Put them together and you can get:
missing name resolution, different behavior between host and WSL, or “it works until I reconnect to Wi‑Fi.”

The fix is usually not complicated, but it requires choosing who owns the configuration. Either let WSL generate
/etc/resolv.conf and don’t run systemd-resolved, or disable WSL’s generation and wire it correctly
for resolved. Indecision is the worst configuration management tool.
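
If you choose resolved as the owner, the wiring is a small, explicit change. A sketch, assuming Ubuntu-style paths (the snippet filename is illustrative; `/etc/wsl.conf`, the `generateResolvConf` key, and resolved’s stub path are standard):

```shell
# Generate the wsl.conf stanza that stops WSL from rewriting /etc/resolv.conf.
# Review it, then append it to /etc/wsl.conf with sudo.
cat > wsl.conf.snippet <<'EOF'
[network]
generateResolvConf=false
EOF
echo "Append wsl.conf.snippet to /etc/wsl.conf, then from Windows run: wsl --shutdown"

# After the WSL instance restarts, hand the file to systemd-resolved:
#   sudo rm -f /etc/resolv.conf
#   sudo ln -s /run/systemd/resolve/stub-resolv.conf /etc/resolv.conf
#   sudo systemctl restart systemd-resolved
```

If you choose the other owner instead, skip all of this, leave WSL’s generation on, and mask systemd-resolved.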

Networking for inbound services is still not “server-like”

Inbound networking depends on how Windows exposes ports to the WSL VM. Your service might be listening on
127.0.0.1 inside WSL and unreachable from Windows, or vice versa. Sometimes the “right” bind address
is counterintuitive. Sometimes it changes with updates.

Mount semantics: /mnt/c is convenient and slow

The Windows filesystem mounted under /mnt/c is great for interoperability and terrible for workloads
that want Linux filesystem semantics and speed (metadata-heavy builds, databases, large Git repos). systemd doesn’t
fix that; it just makes the pain more organized.

Not all kernel knobs behave like your data center

WSL’s kernel is real, but it’s managed. Some modules, sysctls, and device behaviors may differ from your fleet.
If you’re testing anything kernel-sensitive, validate on a real Linux VM too.

Joke #1: WSL with systemd is like giving your dev laptop a pager—suddenly it has opinions about uptime.

Practical tasks: commands, outputs, and decisions

The fastest way to understand whether your setup is “healthy enough” is to interrogate it like you would a
production host. Below are real commands, what realistic output looks like, and the decision you make.

Task 1: Confirm you’re on WSL2 (not WSL1)

cr0x@server:~$ uname -r
5.15.133.1-microsoft-standard-WSL2

What it means: If the kernel string includes microsoft-standard-WSL2, you’re on WSL2.

Decision: If you’re on WSL1, stop. systemd support is for WSL2. Convert the distro to WSL2 before continuing.

Task 2: Verify PID 1 is systemd

cr0x@server:~$ ps -p 1 -o pid,comm,args
  PID COMMAND         COMMAND
    1 systemd         /sbin/init

What it means: PID 1 being systemd is the whole ballgame.

Decision: If PID 1 is init or bash, systemd is not enabled (or WSL didn’t restart cleanly).

Task 3: Check WSL systemd enablement configuration

cr0x@server:~$ cat /etc/wsl.conf
[boot]
systemd=true

What it means: This is the per-distro switch.

Decision: If missing/false, set it, then restart the WSL instance from Windows (or terminate that distro) so it takes effect.

Task 4: Prove systemd is functional (not just present)

cr0x@server:~$ systemctl is-system-running
running

What it means: running is healthy. You might also see degraded if a unit failed.

Decision: If degraded, immediately list failed units and fix the first failure that matters.

Task 5: List failed units and pick the right fight

cr0x@server:~$ systemctl --failed
  UNIT                    LOAD   ACTIVE SUB    DESCRIPTION
● systemd-resolved.service loaded failed failed Network Name Resolution
● snapd.service            loaded failed failed Snap Daemon

LOAD   = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state.
SUB    = The low-level unit activation state, often a more detailed state.
2 loaded units listed.

What it means: Some defaults fail in WSL, depending on distro and config.

Decision: If the failed unit is irrelevant (like snapd in many WSL dev setups), mask it. If it’s resolved, fix DNS ownership first.

Task 6: Inspect why a unit failed (systemd’s perspective)

cr0x@server:~$ systemctl status systemd-resolved.service --no-pager -l
● systemd-resolved.service - Network Name Resolution
     Loaded: loaded (/lib/systemd/system/systemd-resolved.service; enabled; preset: enabled)
     Active: failed (Result: exit-code) since Mon 2026-02-05 09:12:17 UTC; 25s ago
       Docs: man:systemd-resolved.service(8)
    Process: 412 ExecStart=/lib/systemd/systemd-resolved (code=exited, status=1/FAILURE)
     Status: "Failed to read /etc/resolv.conf: Too many levels of symbolic links"

What it means: A classic loop: resolved wants a symlink to stub resolv.conf, WSL may rewrite it, and you get a symlink circus.

Decision: Decide who owns /etc/resolv.conf. Then implement cleanly and restart the unit.

Task 7: Check who owns DNS right now

cr0x@server:~$ ls -l /etc/resolv.conf
-rw-r--r-- 1 root root 116 Feb  5 09:10 /etc/resolv.conf

What it means: A regular file typically suggests WSL generated it (or you pinned it manually).

Decision: If you want systemd-resolved, you generally want /etc/resolv.conf to be a symlink to resolved’s stub or generated file—without WSL rewriting it.

Task 8: Confirm journald is capturing logs (and not silently useless)

cr0x@server:~$ journalctl -b -n 10 --no-pager
Feb 05 09:12:01 server systemd[1]: Starting Network Name Resolution...
Feb 05 09:12:17 server systemd[1]: systemd-resolved.service: Main process exited, code=exited, status=1/FAILURE
Feb 05 09:12:17 server systemd[1]: systemd-resolved.service: Failed with result 'exit-code'.
Feb 05 09:12:17 server systemd[1]: Failed to start Network Name Resolution.
Feb 05 09:12:20 server systemd[1]: Starting OpenSSH Daemon...
Feb 05 09:12:20 server sshd[522]: Server listening on 0.0.0.0 port 22.
Feb 05 09:12:20 server systemd[1]: Started OpenSSH Daemon.

What it means: You have boot-scoped logs. That’s your root of truth now.

Decision: If journald is empty or errors, fix logging before chasing service bugs. No logs means you’re debugging by vibe.

Task 9: Check dbus health (lots of things quietly depend on it)

cr0x@server:~$ systemctl status dbus --no-pager
● dbus.service - D-Bus System Message Bus
     Loaded: loaded (/lib/systemd/system/dbus.service; static)
     Active: active (running) since Mon 2026-02-05 09:12:00 UTC; 1min 2s ago
TriggeredBy: ● dbus.socket
       Docs: man:dbus-daemon(1)
   Main PID: 210 (dbus-daemon)
      Tasks: 1 (limit: 18947)
     Memory: 2.3M
     CGroup: /system.slice/dbus.service
             └─210 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only

What it means: dbus is running, socket activation is working.

Decision: If dbus is dead, expect “weird” everywhere: hostnamectl, timedatectl, resolved, logind-adjacent tools.

Task 10: Verify cgroup version (containers care)

cr0x@server:~$ stat -fc %T /sys/fs/cgroup
cgroup2fs

What it means: cgroup2fs means cgroup v2. Older setups might show tmpfs and require different container config.

Decision: If you run Docker Engine or podman, align your cgroup driver expectations with what’s available.
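
One way to make those expectations explicit for an in-distro Docker Engine is a minimal /etc/docker/daemon.json. The keys below are standard dockerd options; whether you need to set them at all depends on your distro’s defaults, so treat this as a sketch, not a prescription:

```json
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "storage-driver": "overlay2"
}
```

Restart the daemon (sudo systemctl restart docker) and confirm the active cgroup driver in docker info afterward.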

Task 11: If Docker is installed, check the daemon unit

cr0x@server:~$ systemctl status docker --no-pager
● docker.service - Docker Application Container Engine
     Loaded: loaded (/lib/systemd/system/docker.service; enabled; preset: enabled)
     Active: active (running) since Mon 2026-02-05 09:12:45 UTC; 35s ago
TriggeredBy: ● docker.socket
       Docs: man:dockerd(8)
   Main PID: 900 (dockerd)
      Tasks: 18
     Memory: 78.4M
     CGroup: /system.slice/docker.service
             └─900 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock

What it means: Running daemon, socket activation, normal-looking cgroup path.

Decision: If Docker fails, check cgroups, iptables/nft assumptions, and whether you’re colliding with Docker Desktop integration.

Task 12: Check what’s listening and where (127.0.0.1 vs 0.0.0.0)

cr0x@server:~$ ss -lntp
State  Recv-Q Send-Q Local Address:Port  Peer Address:Port Process
LISTEN 0      4096   0.0.0.0:22         0.0.0.0:*     users:(("sshd",pid=522,fd=3))
LISTEN 0      4096   127.0.0.1:5432     0.0.0.0:*     users:(("postgres",pid=1203,fd=6))

What it means: sshd is reachable on all interfaces inside WSL; postgres is bound to loopback only.

Decision: If Windows can’t reach a service, first check bind addresses. “It’s running” is not “it’s reachable.”

Task 13: Confirm mount type for your project directory

cr0x@server:~$ df -T .
Filesystem     Type  1K-blocks      Used Available Use% Mounted on
/dev/sdd       ext4  263174212  18234568 231513928   8% /

What it means: If you’re on ext4 under the Linux filesystem, performance is usually decent.

Decision: If your project lives under /mnt/c (typically 9p or similar), move it into the distro filesystem for heavy builds/databases.

Task 14: Identify if WSL is rewriting resolv.conf behind your back

cr0x@server:~$ grep -n "generateResolvConf" /etc/wsl.conf || true

What it means: No output usually means default behavior (WSL may generate resolv.conf).

Decision: If you want deterministic DNS with systemd-resolved, explicitly set generateResolvConf=false and manage the file/symlink yourself.

Task 15: Check boot performance and blame the slow units

cr0x@server:~$ systemd-analyze blame | head
3.214s snapd.service
1.122s docker.service
 812ms systemd-resolved.service
 402ms ssh.service
 211ms systemd-journald.service

What it means: You have unit-level timing. This is gold when “WSL feels slow today.”

Decision: Mask or disable units that are irrelevant in WSL and slow down every start.

Task 16: Prove a timer works and produces logs

cr0x@server:~$ systemctl list-timers --all --no-pager | head
NEXT                        LEFT     LAST                        PASSED  UNIT                         ACTIVATES
Mon 2026-02-05 09:20:00 UTC 2min 10s Mon 2026-02-05 09:15:00 UTC 2min ago apt-daily.timer              apt-daily.service
Mon 2026-02-05 10:00:00 UTC 42min    Mon 2026-02-05 09:00:00 UTC 17min ago man-db.timer                man-db.service

What it means: Timers are scheduled and tracked.

Decision: If you rely on scheduled tasks, prefer timers over shell hacks. If timers spam logs or burn CPU, adjust or disable them.
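
As an illustration of the pattern (the unit names and script path are hypothetical), a service/timer pair that replaces a backgrounded shell hack looks like this:

```ini
# /etc/systemd/system/cache-prune.service
[Unit]
Description=Prune local build caches

[Service]
Type=oneshot
ExecStart=/usr/local/bin/prune-caches.sh

# /etc/systemd/system/cache-prune.timer
[Unit]
Description=Run cache pruning hourly

[Timer]
OnCalendar=hourly
Persistent=true

[Install]
WantedBy=timers.target
```

Enable it with sudo systemctl enable --now cache-prune.timer. The timer only fires while the WSL instance is running, and Persistent=true lets a missed run catch up on the next start, which is exactly the inspectable behavior you want here.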

Fast diagnosis playbook

When systemd-in-WSL is “broken,” it rarely fails in a novel way. Most outages are the same small set of failure
modes wearing different hats. Here’s the order that gets you to the bottleneck fast.

First: Is systemd actually PID 1 and is the system running?

  • ps -p 1 -o comm,args → if not systemd, stop and fix enablement/restart WSL.
  • systemctl is-system-running → if degraded, list failed units and address the first relevant one.

Second: Are the logs usable?

  • journalctl -b -n 50 → if empty, you don’t have observability; fix journald/storage permissions or your expectations.
  • journalctl -u <service> -b → if the service is failing, prefer this over random log files.

Third: Is it DNS/networking or is it the service itself?

  • resolvectl status (if using resolved) or inspect /etc/resolv.conf.
  • ss -lntp to confirm bind addresses and listeners.
  • ip addr + ip route to confirm interfaces and default route exist.

Fourth: Is performance being killed by mounts or a slow unit?

  • df -T and path sanity: don’t run databases on /mnt/c if you want joy.
  • systemd-analyze blame to identify slow startup units; disable what you don’t need.

Fifth: Container workloads—confirm cgroups and daemon assumptions

  • stat -fc %T /sys/fs/cgroup to confirm cgroup v2.
  • systemctl status docker and journalctl -u docker -b for daemon errors.

The goal is not to “try random fixes.” The goal is to classify the problem in under five minutes.
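
The playbook above is small enough to freeze into a script so everyone runs the same checks in the same order. A minimal sketch, with an illustrative check set (extend it with your own services):

```shell
#!/usr/bin/env bash
# Golden diagnosis sketch: each check prints PASS or FAIL plus the command
# to re-run by hand. Labels and the exact checks are illustrative.
check() {  # usage: check <label> <command> [args...]
  label="$1"; shift
  if "$@" >/dev/null 2>&1; then
    printf 'PASS  %s\n' "$label"
  else
    printf 'FAIL  %s  (re-run by hand: %s)\n' "$label" "$*"
  fi
}
check "PID 1 is systemd"        sh -c '[ "$(ps -p 1 -o comm=)" = systemd ]'
check "system state is running" sh -c '[ "$(systemctl is-system-running 2>/dev/null)" = running ]'
check "journald has boot logs"  journalctl -b -n 1
check "name resolution works"   getent hosts localhost
check "working dir not on 9p"   sh -c '[ "$(stat -fc %T .)" != v9fs ]'
check "docker daemon reachable" docker info
```

The output is deliberately boring: a consistent pass/fail pattern across laptops is what turns “whose machine is cursed” into a five-minute classification.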

Three corporate mini-stories (all true-to-life, none traceable)

Mini-story 1: The incident caused by a wrong assumption

A finance-adjacent team standardized on WSL for development because it reduced laptop variance. They added systemd
so services could start “like in prod.” It worked for months. Then a new developer joined, pulled the repo, ran
the bootstrap, and their local service mesh never came up correctly.

The wrong assumption was subtle: they assumed WSL “boot” was equivalent to a VM boot and that systemd units would
be running before the developer’s first shell prompt reliably. In practice, WSL instance startup is triggered by
launching the distro, and the timing depends on what else the host is doing. The bootstrap script immediately ran
a client command against a service that was still starting, failed once, and then cached the failure state in a
local config.

The symptom looked like a network problem: timeouts, refused connections, and one person insisting “it works on my
machine,” which is both a statement and a lifestyle. The root cause was just ordering: services were fine but not
ready when queried, and the script had no retries/backoff.

The fix was boring: add start ordering with systemd After= and Wants= where appropriate (remembering that
After= only orders startup; it doesn’t wait for readiness unless the service signals it, e.g. via Type=notify),
add a client-side retry loop with timeouts, and stop treating first-boot behavior as deterministic. They also
added a single “health” command that checked critical units and printed actionable output.
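
The retry half of that fix is small enough to sketch (the function name and the health URL are illustrative):

```shell
# Client-side retry with a fixed delay, so first-boot timing stops mattering.
retry() {  # usage: retry <attempts> <delay_seconds> <command> [args...]
  attempts="$1"; delay="$2"; shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then return 0; fi          # success: stop retrying
    [ "$i" -lt "$attempts" ] && sleep "$delay"
    i=$((i + 1))
  done
  echo "giving up after ${attempts} attempts: $*" >&2
  return 1
}

# Example: wait for a slow-starting local service before querying it
# retry 10 2 curl -fsS http://127.0.0.1:8080/healthz
```

Wrap the bootstrap’s first client call in this instead of caching the first failure, and the “works on my machine” variance mostly disappears.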

Mini-story 2: The optimization that backfired

Another team was chasing faster builds. They noticed that keeping source under /mnt/c made it easier to
edit from Windows tools, but builds were slow. Someone proposed an “optimization”: symlink build outputs and caches
back to Windows storage so they’d survive WSL resets and be shared across distros.

It sped up a few small builds and then detonated in the way only I/O optimizations can: the largest builds started
failing intermittently. They saw corrupted artifact caches, spurious permission errors, and file watcher storms.
systemd wasn’t the direct cause, but systemd made it more visible because background services were now reliably
running—indexers, language servers, and file watchers were all active and amplifying the filesystem’s worst traits.

The backfire was classic: the Windows-mounted filesystem had different semantics and performance characteristics.
Metadata operations were slower, file locks behaved differently, and the entire pipeline was now sensitive to
timing. The team spent a week “tuning” before admitting the plan was flawed.

The fix was to keep heavy build caches and repos inside the Linux filesystem (ext4 in the WSL virtual disk), then
expose only the final artifacts back to Windows if needed. They kept an explicit “sync” step rather than trying
to make two worlds share a hot directory structure.

Mini-story 3: The boring but correct practice that saved the day

A platform group maintained a WSL-based dev environment for dozens of engineers. The setup included systemd,
Docker, a local registry mirror, and a few internal agents. This was the kind of environment that works until one
Tuesday morning when everyone updates something and chaos blooms.

Their boring practice: they kept a minimal, versioned “golden” diagnosis script that ran the same checks every
time: PID 1, failed units, last boot logs, DNS ownership, listening sockets, disk type for working directory, and
container daemon health. It printed simple pass/fail lines and suggested the next command.

When a WSL update changed networking behavior enough to break inbound connections for a subset of laptops, the
group didn’t spend hours arguing about whose machine was cursed. They ran the script, saw that services were
listening on 127.0.0.1, and Windows could no longer reach them. The pattern was consistent.

Because they had a baseline and logs, they could craft a targeted fix: adjust bind addresses for dev services,
update firewall rules on the host, and document a single workaround for affected users while waiting for a more
permanent Windows-side change. It wasn’t glamorous, but it prevented a slow-motion productivity outage.

Common mistakes: symptom → root cause → fix

1) “System has not been booted with systemd”

Symptom: systemctl errors, units can’t be managed.

Root cause: systemd not enabled, or WSL instance not restarted after config change.

Fix: Set [boot] systemd=true in /etc/wsl.conf and restart the WSL instance. Confirm PID 1 is systemd.

2) DNS works once, then breaks after network changes

Symptom: apt update fails, internal names stop resolving after VPN/Wi‑Fi changes.

Root cause: Ownership conflict: WSL auto-generating /etc/resolv.conf while systemd-resolved expects control (or vice versa).

Fix: Choose one owner. If using resolved, set generateResolvConf=false and wire /etc/resolv.conf to resolved’s expected file without loops. If not using resolved, disable/mask it.

3) Services run but are unreachable from Windows

Symptom: curl localhost:PORT works inside WSL but not from Windows, or the reverse.

Root cause: Bind address mismatch (127.0.0.1 vs 0.0.0.0), NAT/port forwarding assumptions, or host firewall.

Fix: Use ss -lntp to see where it’s bound. Bind dev services intentionally. Don’t guess. If needed, adjust Windows firewall rules.

4) WSL feels slow “after enabling systemd”

Symptom: Distro startup feels heavier; commands lag; CPU spikes on launch.

Root cause: Unneeded services auto-starting (snapd, apt timers, indexers), or filesystem choice (working under /mnt/c).

Fix: Use systemd-analyze blame and systemctl --failed. Disable/mask irrelevant units. Move heavy projects into the Linux filesystem.

5) Docker inside WSL fails oddly (iptables, cgroups, permissions)

Symptom: Docker daemon won’t start, containers can’t create networks, errors about cgroups or nft.

Root cause: Mismatched expectations: cgroup v2 vs v1, firewall backend differences, or conflict with Docker Desktop integration.

Fix: Confirm cgroup mode, inspect journalctl -u docker -b, and pick one model: Docker Desktop-managed or distro-managed. Mixing increases entropy.

6) systemd-resolved fails with symlink loops

Symptom: resolved won’t start; errors mention /etc/resolv.conf symlink depth.

Root cause: Incorrect symlink chain, often due to WSL regeneration or manual edits layered on top of distro defaults.

Fix: Remove the loop and rebuild the intended state. Ensure exactly one owner of the file and that WSL isn’t regenerating it against your wishes.

7) Timers run when you don’t want them to

Symptom: Random CPU/network usage; apt-daily wakes up during demos; fans spin like a jet.

Root cause: Default distro timers aren’t tuned for “laptop dev environment.”

Fix: Disable or reschedule timers you don’t need. Validate with systemctl list-timers and check logs for actual impact.

Checklists / step-by-step plan

Step-by-step: enabling systemd safely

  1. Confirm WSL2: run uname -r and look for WSL2 kernel string.
  2. Enable systemd in the distro: edit /etc/wsl.conf:

    cr0x@server:~$ sudo sh -c 'cat > /etc/wsl.conf <<EOF
    [boot]
    systemd=true
    EOF'
    
  3. Restart the WSL instance: from Windows, terminate the distro or restart WSL so the change applies.
  4. Verify PID 1: ps -p 1 -o comm,args.
  5. Check for degraded state: systemctl is-system-running, then systemctl --failed.
  6. Make DNS a deliberate choice: decide whether to use systemd-resolved. If you don’t need it, mask it and move on.

    cr0x@server:~$ sudo systemctl mask systemd-resolved.service
    Created symlink /etc/systemd/system/systemd-resolved.service → /dev/null.
    
  7. Disable slow/unneeded units: use systemd-analyze blame and disable the offenders you don’t use.
  8. Move performance-sensitive work into ext4: keep repos/databases under the Linux filesystem, not /mnt/c.

Checklist: production-like dev setup (the sane version)

  • Critical services have systemd units with restart policies.
  • Logs are queried via journalctl, not “whatever file exists.”
  • DNS ownership is explicit and documented.
  • Inbound networking assumptions are tested (from Windows and inside WSL).
  • Timers are pruned to what you actually want.
  • Repos and databases live on the Linux filesystem.
  • Container toolchain has one owner (Docker Desktop integration or in-distro Docker), not both.

Checklist: when you should not bother with systemd in WSL

  • You just need a shell, compilers, and a few CLI tools.
  • Your workflow is “run one foreground process” and kill it when done.
  • You rely on Windows-native services and only use WSL for scripting.

FAQ

1) Do I need systemd in WSL to run Docker?

Not strictly. Docker Desktop can integrate with WSL without you managing systemd.
But if you want a Linux-like Docker Engine service inside the distro with standard unit management, systemd helps.
Pick one model and commit to it.

2) Why does systemctl work but my service still doesn’t start on WSL “boot”?

WSL “boot” is “instance start,” and timing varies. Ensure your unit has the right dependencies, and use readiness checks.
Then verify systemctl is-enabled and inspect journalctl -u your.service -b.

3) Should I run systemd-resolved in WSL?

Only if you need it and you’re willing to own DNS configuration explicitly. Many dev environments are fine letting WSL manage
/etc/resolv.conf and masking resolved. The worst option is running both “half on.”

4) Why are my logs missing under /var/log?

With systemd, many services log to journald, not to flat files. Use journalctl. If you want file logs, configure the service
or journald forwarding deliberately.

5) Is systemd in WSL stable enough for daily work?

Yes for dev workflows, especially when you keep configs simple and avoid filesystem and DNS footguns.
No if your definition of “stable” includes “behaves exactly like our production kernel and network.”

6) Why is my service listening but unreachable from Windows?

Usually it’s binding to 127.0.0.1 inside WSL, or Windows firewall/port forwarding behavior changed. Start with
ss -lntp and bind deliberately. Confirm from both sides.

7) Will enabling systemd slow down WSL?

It can, if your distro enables a bunch of services/timers you don’t need. The fix is straightforward: measure with
systemd-analyze blame, then disable/mask the noise.

8) Can I use systemd timers instead of cron in WSL?

Yes, and you probably should. Timers have better logging and dependency control. Just remember that if the WSL instance isn’t running,
timers won’t fire until it is.

9) What’s the single most common systemd-in-WSL failure mode?

DNS ownership conflicts, followed by slow or failing “extra” units (snapd, auto-updaters) that weren’t designed for a WSL lifecycle.

10) If this is for dev, why be so strict about diagnostics?

Because dev downtime is real downtime—just paid in engineers instead of customers. Also because a clean diagnostic path prevents the team
from cargo-culting fixes that break later.

Next steps you can actually take

systemd in WSL is now a legitimate tool: you can manage services, get coherent logs, and run a more production-shaped dev environment
without kludges. But WSL still has its own physics. Treat it like a constrained Linux host with Windows as the platform—not as a tiny
server that just happens to live in a laptop.

Practical next steps:

  1. Enable systemd, restart WSL, and verify PID 1 + systemctl is-system-running.
  2. Kill the irrelevant services: disable/mask what doesn’t belong in your dev environment.
  3. Pick a DNS owner and enforce it. Half measures create the hardest incidents.
  4. Move performance-sensitive repos/databases off /mnt/c and into ext4.
  5. Write a tiny team “health check” script that runs the same 8–10 commands every time.

Quote (paraphrased idea), attributed: Werner Vogels is often credited with the principle “you build it, you run it”:
the team that builds a service also operates and owns it, which is the heart of reliability culture.

Joke #2: If you mask snapd in WSL and nothing breaks, you have discovered the rarest creature in ops: a dependency you didn’t need.

Docker Desktop vs Docker in WSL: Which One Is Actually Faster?
https://cr0x.net/en/docker-desktop-vs-docker-wsl-speed/ (Wed, 18 Feb 2026)

If your containers feel like they’re running through wet cement on Windows, you’re not imagining it. The performance gap between “Docker Desktop” and “Docker in WSL” is real—but it’s also misunderstood. Most people benchmark the wrong thing, then “fix” it by making their setup harder to debug.

I’m going to tell you what’s actually faster, why, and how to prove it on your machine in under an hour—without cargo-culting settings you won’t remember next quarter.

What you’re really comparing (and why the names are misleading)

On Windows, Docker “the product” and Docker “the engine” get tangled. People say “Docker Desktop is slow” when they mean “my bind mounts are slow.” Or they say “WSL Docker is faster” when they accidentally moved their source code into a Linux filesystem and stopped paying the Windows filesystem tax.

Let’s define the two setups in the way performance actually cares about:

  • Docker Desktop: a Windows app that runs Docker Engine inside a Linux VM. On modern versions, that VM is usually backed by WSL2. Desktop also adds integrations: UI, credential helpers, networking glue, file sharing paths, extensions, and policy knobs.
  • Docker Engine inside WSL2 (“Docker in WSL”): you install and run the Linux Docker Engine directly in your WSL distro. No Desktop app required. Containers run in the same WSL2 VM environment, but you manage the daemon like a normal Linux system.

Notice what’s missing: the actual container runtime doesn’t magically change. Both end up in a Linux kernel context (WSL2’s VM). The difference is where your daemon lives, how file sharing is wired, and how many translation layers you accidentally add.

Here’s the fastest way to think about it:

  • If your workload is CPU-bound (compiling, compression, crypto), Desktop vs WSL Engine is usually a rounding error.
  • If your workload is filesystem-bound (Node.js installs, hot reloaders, language servers, large monorepos), file placement and mount type dominate everything.
  • If your workload is network-bound (lots of localhost services, proxies, VPNs), integration details and NAT paths can make one feel “randomly” worse.

One quote (paraphrased), because it fits: Werner Vogels’ idea that everything fails, all the time, means you design and operate assuming failure. Performance work is similar: assume your fastest path will degrade unless you keep it simple and observable.

Joke #1: Benchmarking Docker on Windows without checking where your code lives is like timing a sports car while dragging the parking brake. You’ll get a number, sure.

A few facts and historical context you can use in arguments

These aren’t trivia-night facts. They explain why the performance profile looks the way it does.

  1. Docker on Windows initially leaned heavily on Hyper-V to run a Linux VM, because containers need a Linux kernel. Early setups had noticeably higher overhead and brittle networking.
  2. WSL1 wasn’t a VM; it translated Linux syscalls into Windows behavior. That was clever, but it wasn’t “real Linux,” and many container behaviors were awkward or impossible.
  3. WSL2 switched to a real Linux kernel in a lightweight VM, which made Linux container workloads much more compatible—and often faster for Linux-native filesystem operations.
  4. File performance problems often come from crossing the Windows/Linux filesystem boundary (e.g., accessing /mnt/c from Linux). That boundary has to translate metadata, permissions, and notification semantics.
  5. Bind mounts aren’t inherently slow; bind mounts that traverse a virtualization boundary are. A Linux bind mount inside the Linux filesystem is typically fine.
  6. Docker Desktop has evolved into a platform, not just a daemon launcher. The extra features are useful, but they add moving parts that can impact performance and debugging.
  7. BuildKit changed Docker build performance characteristics by improving caching, parallelism, and mount-based build steps. But it also made certain filesystem bottlenecks more visible.
  8. Windows file change notifications differ from Linux in edge cases; hot reload stacks can behave differently depending on whether events are bridged or polled.

Where speed actually comes from: CPU, disk, mounts, and network

CPU: usually not the deciding factor

CPU-bound workloads (Go/Rust builds, gzip, test runners that don’t thrash the disk) tend to perform similarly between Desktop and WSL Engine because both are executing inside the same WSL2-backed Linux environment in many modern setups. The VM boundary exists either way. The question becomes: did you accidentally add extra layers (like running Docker Desktop but also calling it from Windows paths with heavy bind mounts)?

Disk I/O: where most people lose the week

The single biggest speed lever on Windows container dev is where your project files live and how they’re mounted.

  • Best-case: code is stored inside the WSL2 Linux filesystem (your distro’s ext4 in a VHDX), and containers use volumes or bind mounts that stay within Linux. Fast metadata, fast small-file operations.
  • Worst-case: code is stored on NTFS (e.g., C:\src) and you bind mount it into containers via the boundary (/mnt/c/src or Desktop file sharing). Small-file heavy workloads get punished.

Why? Because a “simple” operation like “stat 30,000 files” turns into a translation party: metadata mapping, permission mapping, case sensitivity weirdness, and cache invalidations. Node’s node_modules is basically a small-file torture test. So are Python virtualenvs and Rust cargo registries.
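You can measure that boundary tax directly. A minimal sketch, assuming GNU date and a POSIX shell; `walk_ms` is a hypothetical helper name, and the example paths are illustrative:

```shell
# Minimal sketch: time a metadata-heavy file walk, reported in milliseconds.
# walk_ms is a hypothetical helper; run it against both sides of the boundary.
walk_ms() {
  dir=$1
  start=$(date +%s%N)                    # GNU date: nanoseconds since epoch
  count=$(find "$dir" -type f | wc -l)   # pure metadata traversal, no data reads
  end=$(date +%s%N)
  echo "$dir: $count files in $(( (end - start) / 1000000 )) ms"
}

# Example comparison (paths illustrative):
#   walk_ms /mnt/c/src/myapp     # Windows-backed: expect this to be slower
#   walk_ms "$HOME/src/myapp"    # Linux-native ext4 inside the VHDX
```

The absolute numbers matter less than the ratio between the two paths on the same machine.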

Bind mounts vs volumes: your hidden performance contract

Volumes live in Docker’s Linux storage backend (overlay2 on ext4 in WSL2). They’re usually fast and predictable. Bind mounts mirror a host path into the container. If that path is on Windows, you pay for boundary crossing. If that path is inside the WSL2 distro filesystem, bind mounts can be fine.

Translated into advice: keep your dependencies inside Linux (volumes) even if your source has to be on Windows. Or better: keep both in Linux and use editor integration to work with WSL paths.
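Translated into compose terms, a hedged sketch of that hybrid layout (the service name and paths are illustrative, not from this article's project):

```yaml
services:
  myapp:
    build: .
    volumes:
      - .:/app                           # source code: bind mount (ideally a WSL path)
      - node_modules:/app/node_modules   # dependencies: named volume, Linux-native storage
volumes:
  node_modules:
```

The named volume shadows /app/node_modules inside the container, so installs land in Docker's Linux storage even when the source directory lives on Windows.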

Networking: mostly fine until VPNs and localhost enter the chat

WSL2 uses NAT’d virtual networking. Docker adds its own virtual network plumbing. Docker Desktop adds more integration to make localhost behave like people expect on Windows.

Performance issues in networking usually show up as:

  • mysterious latency spikes when a corporate VPN is connected
  • DNS resolution delays inside containers
  • port-forwarding inconsistencies across Windows ↔ WSL ↔ container

Desktop tends to “just work” more often for port publishing to Windows. WSL Engine can be cleaner for Linux-to-Linux service meshes inside WSL, but you might do more manual work to make Windows tools see the services.

Memory and page cache: the quiet reason one setup “feels” faster

Linux page cache is a performance superpower. If your workload repeatedly reads the same dependency trees, caching helps a lot. But WSL2’s memory behavior (dynamic allocation and reclaim) can make performance feel inconsistent if the VM is starved or constantly ballooning. Desktop adds its own resource limits and UI knobs, which can be helpful—or can throttle you if set too low.

Who’s faster for what: a blunt decision matrix

There isn’t one universal “faster.” There is a faster-for-your-bottleneck. Here’s the practical take.

If you want maximum speed for filesystem-heavy dev

Pick Docker Engine inside WSL2, and keep your repos inside the WSL filesystem (\\wsl$ path from Windows; actual storage under the distro VHDX). Use Linux-native tools. Bind mount from Linux paths, not /mnt/c.

If you want the most predictable developer experience in a mixed Windows shop

Pick Docker Desktop, especially if you need GUI management, credential helpers, corporate proxies, Kubernetes toggles, or fewer “why doesn’t localhost work” tickets. But still: keep your project in WSL if you care about speed.

If your bottleneck is builds, not live-reload

Either setup can be fast. What matters is BuildKit, caching strategy, and avoiding pointless invalidation (copying your whole repo early in the Dockerfile, or rebuilding dependencies every edit).
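A hedged Dockerfile sketch of cache-friendly ordering, assuming a typical Node project layout with lockfiles and a build script:

```dockerfile
# Cache-friendly ordering: the expensive dependency layer only rebuilds
# when lockfiles change, not on every source edit.
FROM node:20-alpine
WORKDIR /app
COPY package*.json ./    # changes rarely -> stable cache key
RUN npm ci               # expensive step, cached across source edits
COPY . .                 # changes constantly -> keep it late
RUN npm run build        # assumes a "build" script exists in package.json
```

The point is not this exact file; it is that anything volatile (your source tree) sits below anything expensive (dependency installation).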

If you’re on a corporate VPN that breaks DNS

Desktop often has better knobs and integration. WSL Engine can be “purer” but you may end up debugging resolv.conf and Windows DNS proxy behavior at 4:55 PM.

Joke #2: Nothing accelerates a “let’s optimize Docker” meeting like someone saying “it works on my laptop.” Then everyone’s laptop becomes evidence.

Practical tasks: commands, expected output, and what decision to make

These are real checks I’d run before believing anyone’s benchmark. Each task includes: command, what the output means, and what you do next.

Task 1: Confirm where Docker is running (Desktop vs WSL Engine)

cr0x@server:~$ docker context ls
NAME                DESCRIPTION                               DOCKER ENDPOINT                             ERROR
default *           Current DOCKER_HOST based configuration   unix:///var/run/docker.sock
desktop-linux       Docker Desktop                            unix:///var/run/docker.sock

Meaning: Contexts tell you which daemon you’re talking to. If you see desktop-linux, you’re likely using Desktop’s daemon integration.

Decision: If you’re trying to compare, explicitly switch contexts and re-run tests. Don’t “think” you switched; prove it.

Task 2: Identify the daemon and storage driver

cr0x@server:~$ docker info --format '{{.ServerVersion}} {{.Driver}} {{.OperatingSystem}}'
27.3.1 overlay2 Docker Desktop

Meaning: Overlay2 on Linux is typical. If you don’t see overlay2, something unusual is happening. If it’s not Linux, you’re not in the intended path.

Decision: If the storage driver is unexpected, stop. Your performance investigation will be fiction until you fix that.

Task 3: Measure whether your project is on Windows or WSL filesystem

cr0x@server:~$ pwd
/mnt/c/src/myapp

Meaning: /mnt/c is Windows NTFS mounted into WSL. If your project is here and you’re bind mounting it, you’re likely paying the boundary tax.

Decision: If you care about speed, move the repo into Linux (e.g., ~/src/myapp inside WSL) and re-test before changing anything else.

Task 4: Compare metadata-heavy ops (the “node_modules test” without installing anything)

cr0x@server:~$ time find . -type f | wc -l
48219

real    0m3.214s
user    0m0.112s
sys     0m0.988s

Meaning: This approximates small-file traversal cost. On /mnt/c, this can be dramatically slower than inside Linux.

Decision: If sys time is high and real is much higher than you expect, your bottleneck is filesystem translation. Fix placement/mounts, not CPU.

Task 5: Compare bind mount performance inside a container

cr0x@server:~$ docker run --rm -v "$PWD":/work -w /work alpine sh -lc 'time find . -type f >/dev/null'
real    0m4.02s
user    0m0.07s
sys     0m1.21s

Meaning: This measures the bind mount path from container view. If it’s much slower than running find on the host filesystem directly, the mount bridge is the culprit.

Decision: If slow, switch to Linux-side repo or use Docker volumes for dependency-heavy dirs.

Task 6: Confirm whether your container writes are going to a volume or a bind mount

cr0x@server:~$ docker inspect -f '{{range .Mounts}}{{.Type}} {{.Source}} -> {{.Destination}}{{"\n"}}{{end}}' myapp
bind /mnt/c/src/myapp -> /app
volume myapp_node_modules -> /app/node_modules

Meaning: A hybrid setup is often best: source via bind mount, dependencies via volume.

Decision: If node_modules or target or .venv lives on a Windows-backed bind mount, move it to a volume.

Task 7: Check WSL version and distro state (from within WSL)

cr0x@server:~$ wsl.exe -l -v
  NAME            STATE           VERSION
* Ubuntu-22.04    Running         2

Meaning: You want WSL2 for performance and kernel compatibility. WSL1 is a different beast.

Decision: If you’re on WSL1, migrate. Don’t tune around it.

Task 8: Check whether Docker Desktop is the daemon you’re talking to (WSL integration can blur it)

cr0x@server:~$ ps -ef | grep -E 'dockerd|containerd' | grep -v grep
root        2660       1  0 10:14 ?        00:00:02 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock

Meaning: If you see dockerd running inside the WSL distro, you’re likely running Engine in WSL. If not, you might still be using Desktop’s socket integration.

Decision: Decide who owns the daemon: Desktop or systemd in WSL. Avoid half-and-half ownership if you want predictable debugging.

Task 9: Check systemd state in WSL (relevant for running Docker Engine properly)

cr0x@server:~$ ps -p 1 -o comm=
systemd

Meaning: If PID 1 is systemd, you can manage Docker like a normal Linux service. If it’s not, you may be using alternative init methods.

Decision: If you want WSL Engine stability, enable systemd and run Docker under it.

Task 10: Measure build performance (BuildKit on, cache working)

cr0x@server:~$ DOCKER_BUILDKIT=1 docker build -t myapp:bench .
[+] Building 18.2s (12/12) FINISHED
 => [internal] load build definition from Dockerfile                                      0.0s
 => => transferring dockerfile: 1.23kB                                                    0.0s
 => [internal] load metadata for docker.io/library/node:20-alpine                         0.8s
 => [1/6] FROM docker.io/library/node:20-alpine@sha256:...                                0.0s
 => [internal] load .dockerignore                                                         0.0s
 => => transferring context: 2.31kB                                                       0.0s
 => [2/6] WORKDIR /app                                                                    0.1s
 => [3/6] COPY package*.json ./                                                           0.1s
 => [4/6] RUN npm ci                                                                      14.9s
 => [5/6] COPY . .                                                                        1.5s
 => [6/6] RUN npm test                                                                    0.8s
 => exporting to image                                                                    0.0s

Meaning: The slow step is obvious. If COPY . . is slow, your context transfer and filesystem path are bad. If npm ci is slow, you’re I/O bound or network bound.

Decision: If COPY is slow, fix repo placement and .dockerignore. If dependency install is slow, move cache directories to volumes and ensure your DNS/proxy isn’t throttling.

Task 11: Observe container CPU throttling or host contention

cr0x@server:~$ docker stats --no-stream
CONTAINER ID   NAME        CPU %     MEM USAGE / LIMIT     MEM %     NET I/O       BLOCK I/O      PIDS
c3a1d2e9f0b1   myapp       225.41%   612.3MiB / 7.6GiB     7.86%     12.4MB / 9MB  1.2GB / 38MB  34

Meaning: If CPU is pegged and you’re still slow, you might be compute-bound, or you’re being throttled by assigned CPUs.

Decision: If CPU is high and build steps are compute-heavy, allocate more CPUs/memory to WSL2/Desktop or reduce parallelism in toolchains.

Task 12: Check disk usage and where Docker stores its data

cr0x@server:~$ docker system df
TYPE            TOTAL     ACTIVE    SIZE      RECLAIMABLE
Images          27        6         9.812GB   5.104GB (52%)
Containers      14        2         1.102GB   812.4MB (73%)
Local Volumes   19        7         22.44GB   8.201GB (36%)
Build Cache     45        0         3.993GB   3.993GB

Meaning: Volume bloat and build cache bloat can turn into “my disk is slow” because your VHDX grows and your host disk gets fragmented or pressured.

Decision: If reclaimable is high, prune build cache/unused volumes. If the host disk is nearly full, stop pretending this is a Docker problem.

Task 13: Check DNS latency inside containers (network “slow” often means DNS “slow”)

cr0x@server:~$ docker run --rm alpine sh -lc 'time nslookup registry-1.docker.io >/dev/null'
real    0m0.42s
user    0m0.01s
sys     0m0.01s

Meaning: If DNS takes seconds, package installs and pulls will crawl. VPNs and split-DNS can trigger this.

Decision: If slow, focus on DNS configuration (WSL resolv.conf behavior, Desktop DNS proxy settings, corporate DNS rules), not Docker flags.
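If measurement points at WSL's auto-generated resolver, one common approach (not the only one, and worth coordinating with IT on a corporate network) is to stop WSL from managing resolv.conf and pin a resolver yourself:

```ini
# /etc/wsl.conf inside the distro
[network]
generateResolvConf = false   # stop WSL from rewriting /etc/resolv.conf
```

After this, write a nameserver your network actually allows into /etc/resolv.conf and do a full WSL restart (wsl.exe --shutdown) so the change takes effect.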

Task 14: Measure bind mount event behavior for hot reloaders (inotify vs polling)

cr0x@server:~$ docker run --rm -v "$PWD":/work -w /work alpine sh -lc 'apk add --no-cache inotify-tools >/dev/null && inotifywait -t 2 -e modify . || echo timeout'
Setting up watches.
Watches established.
timeout

Meaning: While the 2-second watch runs, modify a file in this directory from the Windows side. If the event never arrives and you see timeout, your hot reload tool may fall back to polling, which burns CPU and “feels slow.”

Decision: If events don’t trigger across your mount, move code into WSL filesystem or configure the tool to use polling with sane intervals (and accept the CPU cost).
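If your watcher is chokidar-based (common in Node dev servers), its documented environment knobs make the polling fallback explicit rather than accidental. The compose fragment below is illustrative:

```yaml
services:
  web:
    environment:
      CHOKIDAR_USEPOLLING: "true"   # don't wait for inotify events that may never bridge
      CHOKIDAR_INTERVAL: "1000"     # poll once per second; lower = snappier, hotter CPU
```

Other watchers have equivalent flags; the principle is the same: choose polling deliberately and pick the interval, instead of letting the tool thrash.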

Fast diagnosis playbook (first/second/third checks)

This is the order that finds the bottleneck fastest in real teams. Not theory. Triage.

First: identify the boundary you’re crossing

  • Is your repo under /mnt/c?
  • Are you bind mounting Windows paths into containers?
  • Are your dependency directories living on Windows-backed mounts?

If yes: expect slow metadata operations. Fix file placement/mount strategy before touching anything else.

Second: decide who owns the daemon and keep it consistent

  • Are you using Docker Desktop’s daemon via integration, or a dockerd you run inside WSL?
  • Do you have both installed and occasionally switch without realizing?

If inconsistent: your measurements will vary. Pick one, document it, and enforce it with contexts or team scripts.

Third: isolate build vs runtime vs network

  • Build slow? Inspect docker build output: is it COPY, dependency installs, or tests?
  • Runtime slow? Check docker stats, disk write patterns, and filesystem event behavior.
  • Pull/install slow? Check DNS latency and VPN/proxy behavior.

Fourth: check resource caps and host contention

  • Is WSL/Desktop limited to 2 CPUs and 2GB RAM because someone “optimized” it?
  • Is your host disk nearly full or under heavy antivirus scanning?

If yes: you’re not tuning Docker, you’re tuning the host environment.

Three corporate mini-stories from the trenches

Incident: the wrong assumption that “WSL = Linux speed”

A mid-sized company rolled out Windows laptops for a dev team that previously used Linux workstations. The plan was simple: “Use WSL2, run Docker, everything is basically Linux again.” The first week, engineers complained that test runs and dependency installs were painfully slow, but only for some repos.

The team assumed the problem was Docker Desktop overhead and started uninstalling Desktop, switching to Docker Engine inside WSL, and tweaking daemon flags. It got… slightly better. Not good.

The real culprit was boring: the repos lived on C:\ because corporate backup and endpoint protection policies were tied to Windows paths. Devs edited code in Windows tools, and WSL accessed it via /mnt/c. Their container bind mounts crossed the boundary, and the workload was a perfect storm of small-file metadata operations.

The fix wasn’t “switch Docker.” It was a policy exception and a workflow change: store repos inside WSL for active development, keep “exported” artifacts on Windows, and use IDE WSL integration. The same Docker setup suddenly looked “faster” because the filesystem stopped being a translator for every syscall.

Optimization that backfired: starving the VM to “save battery”

Another team wanted laptops to run cooler during travel. Someone decided to clamp WSL2/Docker resources: fewer CPUs, low memory, aggressive reclaim. On paper, it reduced fan noise. In practice, it turned builds into a slot machine.


Symptoms were weird: first build after reboot was okay, second build crawled, then it got better, then it stalled again. Engineers blamed caches, then blamed Docker, then blamed each other’s branches. Classic.

The hidden behavior was memory pressure. With too little RAM, the Linux page cache couldn’t do its job. Meanwhile, repeated compiles and dependency scans churned the filesystem cache and triggered swap-like behavior inside the VM. The CPU limits amplified the pain: less parallelism, longer wall time, more time for background processes to collide.

They reverted to sane limits, then made battery savings explicit and opt-in. Moral: “optimization” that removes headroom tends to create jitter, and jitter is what makes engineers distrust every other metric.

Boring but correct practice that saved the day: pinning the workflow and measuring one thing at a time

A larger org had a mix of Desktop users and WSL Engine users. Performance complaints kept arriving, but every ticket was un-actionable: “Docker is slow.” Helpful.

An SRE-minded engineer created a short internal “Docker on Windows contract”: where repos live, which daemon is standard, how to mount dependencies, and a minimal benchmark script that measures file traversal, build time, and DNS latency separately. It was not glamorous. It was a checklist.

When an outage-like incident hit—developers couldn’t pull images and pipelines slowed—the team used the script and immediately saw DNS latency spikes inside containers. That pointed to a corporate DNS change interacting with VPN split tunneling. Without the baseline tests, they would have wasted days “tuning Docker.”

They fixed DNS policy, not Docker. The boring practice was consistency and measurement discipline, and it paid for itself the first time something went weird at scale.

Common mistakes: symptom → root cause → fix

1) “npm install takes forever in containers”

Symptom: dependency installs are 5–20× slower than expected; CPU is low; disk activity is constant.

Root cause: dependencies are being written to a Windows-backed bind mount (/mnt/c), causing slow metadata operations.

Fix: store repo in WSL filesystem, or keep node_modules in a Docker volume and mount only source code.

2) “Hot reload doesn’t trigger, so the dev server polls and burns CPU”

Symptom: file changes sometimes don’t propagate; fans spin; reload is delayed.

Root cause: inotify events don’t bridge cleanly across certain mount paths; tool falls back to polling.

Fix: move repo into WSL filesystem; configure the watcher to use polling with reasonable intervals if you must stay on Windows paths.

3) “Docker build is slow at COPY”

Symptom: COPY . . takes seconds to minutes; context transfer is heavy.

Root cause: massive build context from poor .dockerignore; repo on Windows path; many small files.

Fix: improve .dockerignore; relocate repo; reorder Dockerfile to maximize caching (copy manifests first, then install, then copy source).

4) “Pulls and apt installs hang randomly on VPN”

Symptom: image pulls stall; package installs wait on name resolution.

Root cause: DNS inside containers is slow or broken due to VPN split-DNS and NAT.

Fix: measure DNS latency; adjust DNS configuration for WSL/Desktop; coordinate with IT for correct resolver behavior.

5) “It’s fast for me but slow for new hires”

Symptom: some machines are fine, others terrible, same repo.

Root cause: inconsistent daemon ownership (Desktop vs WSL Engine), inconsistent file locations, inconsistent resource caps.

Fix: standardize: one supported setup, one repo location recommendation, one baseline diagnostic script.

6) “Disk space disappears and everything gets slower over time”

Symptom: builds slow down; disk nearly full; Docker reports lots of reclaimable data.

Root cause: build cache and volumes grow; WSL VHDX expands; host disk pressure increases.

Fix: prune unused images/volumes/build cache; monitor host disk; avoid keeping endless old layers locally.

Checklists / step-by-step plan

Plan A: you want the fastest dev loop on Windows

  1. Move active repos into WSL filesystem: ~/src inside your distro.
  2. Use IDE support for WSL so editing still feels native.
  3. Use Docker Engine inside WSL2 if you want minimal layers, or Desktop if you need corporate-friendly integration—but keep the files in WSL either way.
  4. Mount source via Linux path. Keep dependency directories in volumes.
  5. Measure: file traversal time, container traversal time, build step timings, DNS latency.

Plan B: you must keep code on C:\ due to corporate policy

  1. Accept that /mnt/c is slower for small files. Don’t fight physics with Slack messages.
  2. Use volumes for heavy dependency directories: node_modules, .venv, target, vendor, tool caches.
  3. Tune watch behavior: prefer polling with sane intervals rather than “hope inotify works.”
  4. Improve .dockerignore to reduce context transfer.
  5. Keep an eye on antivirus/endpoint protection exceptions (with security approval). Those scanners love chewing on dependency trees.

Plan C: you’re optimizing builds, not live dev

  1. Turn on BuildKit; confirm caching works.
  2. Reorder Dockerfile for cache hits: copy lockfiles first, install deps, then copy source.
  3. Minimize build context with .dockerignore.
  4. Separate network issues from disk issues with a DNS timing check.
  5. Use multi-stage builds where it reduces final image size without increasing rebuild time.
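A minimal multi-stage sketch for step 5; the stage names and paths like dist/ are assumptions about your project layout:

```dockerfile
# Builder stage: full toolchain, dependency install, compile.
FROM node:20-alpine AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# Runtime stage: only what the app needs; smaller image, same rebuild speed.
FROM node:20-alpine
WORKDIR /app
COPY --from=build /app/dist ./dist
COPY --from=build /app/package*.json ./
RUN npm ci --omit=dev
CMD ["node", "dist/server.js"]
```

The cache-ordering rule from step 2 still applies inside the builder stage; multi-stage only changes what ships, not what rebuilds.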

FAQ

1) Is Docker Desktop always slower than Docker Engine in WSL2?

No. For CPU-bound workloads they’re often similar. The big wins usually come from keeping files inside the WSL filesystem and avoiding Windows-backed bind mounts.

2) What’s the single biggest performance improvement?

Move your repo from C:\ (accessed as /mnt/c) into the WSL distro filesystem, then bind mount from there. It removes a translation layer from hot paths.

3) Why are bind mounts slow on Windows?

Bind mounts that cross Windows ↔ Linux boundaries must translate filesystem semantics. Small-file metadata ops get expensive. Volumes avoid that by staying in Linux storage.

4) Can I keep using Windows editors if my repo is in WSL?

Yes. Most modern editors have WSL integration. The key is: store the files in Linux, edit via a bridge designed for it, and let containers access them without crossing NTFS.

5) Should I uninstall Docker Desktop if I run Docker Engine in WSL?

If you don’t need Desktop features, uninstalling can reduce confusion and background complexity. If your org depends on Desktop integration (proxies, credential store, support), keep it—but be explicit about which daemon you use.

6) Why does my performance change after reboot or sleep?

WSL2 VM lifecycle, page cache warmth, and dynamic memory allocation all affect “feel.” Cold caches make everything look worse. Resource caps can make it inconsistent.

7) Is Kubernetes in Docker Desktop a performance problem?

It can be, mainly by consuming CPU/RAM and adding background churn. If you’re not using it, turn it off. If you are, budget resources like an adult system, not like a demo.

8) Are Docker volumes always faster than bind mounts?

Not always. Volumes are usually faster than Windows-backed bind mounts. But a bind mount inside the WSL filesystem can be perfectly fine and more convenient for dev workflows.

9) How do I know if the bottleneck is DNS?

Time an nslookup inside a container. If it takes seconds, your “slow pulls” and “slow installs” are probably resolver/VPN/proxy issues.

10) What if my team uses Compose and it’s slow?

Compose amplifies filesystem and watch problems because it often mounts multiple services from the same repo. Fix mounts and file location first, then look at per-service resource usage.

Next steps you can actually do this week

  1. Pick a baseline: decide whether your standard is Docker Desktop or WSL Engine. Standardize contexts and scripts so people can’t “accidentally” compare different daemons.
  2. Move one repo into WSL and re-run two measurements: find traversal time on host and in a bind-mounted container. If you see a big drop, you’ve found your main lever.
  3. Fix mounts strategically: keep source editable, but keep dependency trees in volumes. It’s the best compromise when policy forces Windows paths.
  4. Run the fast diagnosis playbook on the next “Docker is slow” complaint. If you can’t say whether it’s filesystem, DNS, or resource caps, you’re not diagnosing—just narrating.
  5. Document the boring rules (repo location, mounts, DNS check, build cache hygiene). This is how you avoid repeating the same performance investigation every onboarding cycle.

If you want a single opinionated recommendation: put your code in WSL2’s Linux filesystem and stop bind-mounting NTFS into containers unless you have to. Docker Desktop vs WSL Engine is secondary; the filesystem boundary is the boss fight.

]]>
WSL2 Setup That Doesn’t Break Docker (Yes, There’s a Right Way)
https://cr0x.net/en/wsl2-setup-doesnt-break-docker/ Wed, 18 Feb 2026 10:54:31 +0000

Most “WSL2 + Docker” guides are written like nobody has ever shipped software under deadline pressure. They skip the parts where your containers crawl because you cloned a repo into the wrong directory, or Docker Desktop decides your images are “gone” because you “cleaned up” WSL. Then Monday happens.

This is the production-minded way to set up WSL2 so Docker stays fast, predictable, and recoverable. It’s opinionated because reality is opinionated.

The right way: what you’re actually building

You are not “installing Linux on Windows.” You are building a small, managed virtualization stack:

  • Windows hosts the kernel-adjacent infrastructure (Hyper-V components) that WSL2 uses.
  • WSL2 runs one or more Linux distributions inside a lightweight VM. Each distro has its own virtual disk (a VHDX).
  • Docker runs either:
    • Docker Desktop, which runs its own Linux VM and can integrate with WSL2 distros, or
    • Docker Engine inside a WSL2 distro (no Docker Desktop), with systemd and Linux-native behavior.

The “doesn’t break” setup comes down to three themes:

  1. Put the right files on the right filesystem. Linux-on-Linux for builds, node_modules, and image layers. Windows paths only when you truly need Windows tooling.
  2. Control resources deliberately. WSL2 will happily eat RAM and disk until your laptop becomes a space heater with a keyboard.
  3. Pick one Docker ownership model. Either Docker Desktop owns Docker, or your Linux distro does. Mixing them without intent is how you get “it worked yesterday.”

One quote worth keeping above your monitor: Hope is not a strategy. — General Gordon R. Sullivan

Interesting facts and context (so you stop making 2019 mistakes)

  • WSL1 wasn’t a VM. It translated Linux syscalls into Windows kernel calls. Great trick, but it wasn’t “real Linux,” and Docker-in-WSL1 was a science project.
  • WSL2 switched to a real Linux kernel. That’s why Docker became practical. It’s also why you now have a real virtual disk that can fill up and fragment.
  • Early WSL2 builds had notorious file I/O pain on /mnt/c. This is why “clone into Linux home” became the default advice—and it’s still correct for most dev workloads.
  • Docker Desktop moved its backend multiple times. Hyper-V backend, then WSL2 backend, plus shifting integration behavior across versions. Old blog posts lie by accident.
  • WSL2 networking uses NAT. That’s why the WSL2 VM has its own IP and why some VPN/DNS setups behave like they’re haunted.
  • VHDX “shrink” isn’t automatic. Deleting files in Linux doesn’t necessarily return space to Windows. You need explicit compaction steps.
  • Systemd support in WSL is recent. For years it was “no systemd,” which changed how services and Docker behaved. Now you can enable it, but many guides still assume you can’t.
  • Case sensitivity differences are real. Windows filesystems are usually case-insensitive; Linux is case-sensitive. This shows up as weird Git diffs, broken builds, or duplicate files.

WSL2 + Docker: architecture and the sharp edges

Two supported models (pick one)

Model A: Docker Desktop + WSL2 integration. Docker Desktop owns the daemon. Your WSL distro gets Docker CLI access and can run containers, but the “real” Docker state lives in Docker Desktop’s managed environment.

Model B: Docker Engine inside a WSL2 distro. Your distro owns the daemon and storage. This feels more like a Linux server. It’s great if you want Linux-native behavior and fewer surprises from a GUI app.

The storage reality: where performance lives or dies

WSL2 Linux filesystem (your distro’s ext4 inside VHDX) is fast for Linux tooling. Windows-mounted files under /mnt/c are slower and have metadata translation overhead. Docker bind mounts and build contexts amplify that overhead.

If you run builds on Windows paths and mount them into Linux containers, you are asking for performance problems. You may also be asking for file watcher weirdness (especially with Node, Python, or any tool that thinks in inotify).

Networking: “localhost works” until it doesn’t

WSL2 has its own virtual NIC. Windows and WSL2 cooperate so that “localhost” often works both ways, but the path is not symmetric and not guaranteed under VPNs, firewalls, or custom DNS.

Docker adds its own network namespaces and bridge networks inside the Linux environment. Translation layers stack, and every translation layer is a new place for latency, MTU issues, or name resolution to go wrong.

Joke #1: NAT is like middle management—useful in theory, but sometimes your packets leave the meeting more confused than when they entered.

Non-negotiable rules that keep Docker from breaking

Rule 1: Keep source code in the Linux filesystem unless you have a strong reason

Default: clone into ~/src inside WSL. Build there. Run Docker from there. Your life improves.

Exceptions exist (Windows-only IDE workflows, corporate endpoint scanning constraints), but you should treat exceptions like production change requests: documented, justified, reversible.

Rule 2: Don’t let Docker data drift across ownership boundaries

If you use Docker Desktop, let it own the engine and its storage. Don’t also run a separate dockerd inside your distro unless you’re intentionally running two daemons (most people aren’t).

Rule 3: Control WSL resources with a .wslconfig

Unbounded WSL can starve Windows. That can look like “Docker is slow,” “VS Code is laggy,” or “my fan is now my primary input device.” Put limits in place, then tune.
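A starting-point .wslconfig; the values are examples to tune against your hardware, not recommendations:

```ini
# %UserProfile%\.wslconfig on the Windows side
[wsl2]
memory=8GB       # hard cap on the WSL2 VM's RAM
processors=4     # vCPUs available to the VM
swap=8GB         # headroom before the OOM killer gets creative
```

Changes apply only after a full wsl --shutdown; editing the file and hoping is not applying it.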

Rule 4: Treat WSL distros like cattle, but export before surgery

WSL makes it easy to nuke and recreate a distro. That’s a feature. But Docker images, volumes, and databases inside WSL are state. Export or snapshot before you “just reinstall Ubuntu real quick.”
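The export-before-surgery step, sketched from PowerShell on the Windows side; the distro name and paths are examples:

```powershell
# Snapshot the distro before risky changes
wsl --export Ubuntu-22.04 C:\backups\ubuntu-2204.tar

# Restore later into a fresh install location if surgery goes wrong
wsl --import Ubuntu-restored C:\wsl\ubuntu-restored C:\backups\ubuntu-2204.tar
```

In Model B, /var/lib/docker lives inside the distro's VHDX, so an export captures your images and volumes too. In Model A it does not; Desktop's state lives in Desktop's own distros.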

Rule 5: Never debug performance without measuring the filesystem boundary

When Docker feels slow, you need to answer: are we doing slow I/O on /mnt/c, or is this CPU/memory/network? Guessing is how you burn afternoons.

Practical tasks (commands, outputs, decisions)

These are the day-to-day commands that keep WSL2 + Docker boring. Each task includes: command, what the output means, and what decision you make next.

Task 1: Confirm WSL versions and default distro

cr0x@server:~$ wsl.exe -l -v
  NAME                   STATE           VERSION
* Ubuntu-22.04           Running         2
  docker-desktop         Running         2
  docker-desktop-data    Running         2

Meaning: You have Ubuntu on WSL2, plus Docker Desktop’s internal distros. That’s normal for Model A.

Decision: If you don’t want Docker Desktop, you should not see docker-desktop distros. If you do, you’re in Model A whether you admit it or not.

Task 2: Check the WSL kernel version (sanity + bug context)

cr0x@server:~$ wsl.exe --status
Default Distribution: Ubuntu-22.04
Default Version: 2
WSL version: 2.1.5.0
Kernel version: 5.15.133.1-1
WSLg version: 1.0.59
MSRDC version: 1.2.4677
Direct3D version: 1.611.1-81528511
DXCore version: 10.0.26091.1-240325-1447.ge-release
Windows version: 10.0.22631.3085

Meaning: Modern WSL with a 5.15-based kernel. Many older networking and systemd issues depend on kernel/WSL version.

Decision: If you’re on a much older WSL version, upgrade first before chasing ghosts. Debugging old WSL builds is paying interest on technical debt.

Task 3: Verify Docker ownership (Desktop vs in-distro)

cr0x@server:~$ docker context ls
NAME                DESCRIPTION                               DOCKER ENDPOINT                  ERROR
default *           Current DOCKER_HOST based configuration   unix:///var/run/docker.sock
desktop-linux       Docker Desktop                            unix:///var/run/docker.sock

Meaning: Both contexts point at a Unix socket. With Desktop’s WSL integration enabled, the socket inside your distro can belong to Desktop’s daemon, so the context name alone doesn’t prove who owns the engine.

Decision: Pick a context and stick to it per workflow. If contexts flip under you, that’s a configuration problem, not a “Docker is flaky” problem.

Task 4: Check if systemd is enabled in the WSL distro (Model B prerequisite)

cr0x@server:~$ cat /etc/wsl.conf
[boot]
systemd=true

Meaning: systemd is enabled. You can run docker.service properly inside WSL (Model B).

Decision: If you want Model B and this is missing, add it and restart WSL. If you want Model A (Docker Desktop), you typically don’t need dockerd inside the distro at all.

Task 5: Restart WSL cleanly after config changes

cr0x@server:~$ wsl --shutdown
The operation completed successfully.

Meaning: All WSL VMs stopped. Next launch will re-read config.

Decision: Do this after changing .wslconfig or /etc/wsl.conf. If you don’t, you’re debugging a configuration you didn’t actually apply.

Task 6: Measure where your repo lives (Linux vs Windows mount)

cr0x@server:~$ pwd
/home/cr0x/src/myapp

Meaning: Repo is in the Linux filesystem. Good for Docker build contexts and bind mounts.

Decision: If you’re under /mnt/c/..., expect slower I/O. Move the repo unless you have a hard requirement to keep it on Windows.

Task 7: Demonstrate the performance boundary quickly (fio)

cr0x@server:~$ fio --name=randread --filename=/home/cr0x/fio.test --size=256M --bs=4k --rw=randread --ioengine=libaio --iodepth=32 --numjobs=1 --runtime=10 --time_based --direct=1
randread: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=32
fio-3.33
Starting 1 process
randread: Laying out IO file (1 file / 256MiB)
randread: (groupid=0, jobs=1): err= 0: pid=221: Fri Feb  2 10:01:16 2026
  read: IOPS=21.4k, BW=83.5MiB/s (87.6MB/s)(835MiB/10000msec)

Meaning: You’re getting healthy random read IOPS on the Linux filesystem.

Decision: Run the same test on /mnt/c. If IOPS crater, stop blaming Docker and stop building on Windows mounts.
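If fio isn't installed on the box you're checking, the same boundary can be exposed with nothing but coreutils. A minimal sketch, timing small-file writes (the /mnt/c path in the comment is an assumption; point it at your own repo location):

```shell
#!/bin/sh
# Sketch: crude small-file write benchmark to expose the /mnt/c boundary.
# fio gives real numbers; this version needs only coreutils, so it runs anywhere.
bench_dir() {
  d="$1/wsl-iotest.$$"
  mkdir -p "$d" || return 1
  start=$(date +%s%N)
  i=0
  while [ "$i" -lt 500 ]; do
    echo payload > "$d/f$i"     # lots of tiny metadata-heavy operations
    i=$((i + 1))
  done
  sync
  end=$(date +%s%N)
  rm -rf "$d"
  echo "$1: $(( (end - start) / 1000000 )) ms for 500 small writes"
}

linux_result=$(bench_dir "${TMPDIR:-/tmp}")   # ext4 inside the VHDX
echo "$linux_result"
# On WSL, compare against the Windows mount (expect a much bigger number):
#   bench_dir /mnt/c/Users/cr0x
```

If the /mnt/c number is an order of magnitude worse, that's your answer before any Docker tuning.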

Task 8: Check Docker storage driver and root dir (where your layers actually live)

cr0x@server:~$ docker info --format 'RootDir={{.DockerRootDir}} Driver={{.Driver}} Cgroup={{.CgroupDriver}}'
RootDir=/var/lib/docker Driver=overlay2 Cgroup=systemd

Meaning: Docker uses overlay2 and stores data in /var/lib/docker inside the Linux environment that owns the daemon.

Decision: If you see an unexpected RootDir or a driver you didn’t plan for, you’re not running the daemon you think you are.

Task 9: Find where the VHDX lives on Windows (capacity planning)

cr0x@server:~$ powershell.exe -NoProfile -Command "Get-ChildItem -Recurse $env:LOCALAPPDATA\Packages\*Ubuntu*\LocalState\ext4.vhdx | Select-Object FullName,Length"
FullName                                                             Length
--------                                                             ------
C:\Users\cr0x\AppData\Local\Packages\CanonicalGroupLimited.Ubuntu22.04LTS_79rhkp1fndgsc\LocalState\ext4.vhdx  107374182400

Meaning: Your Ubuntu distro’s virtual disk file is 100 GiB on C:. That’s where your Linux filesystem lives; note the file can be larger than the data used inside it, because VHDX allocations don’t shrink on their own.

Decision: If C: is small or protected by corporate policies, plan a controlled move using export/import (later tasks). Randomly “moving folders” is how you create unrecoverable WSL states.

Task 10: Check WSL disk usage from inside Linux (before you “clean up”)

cr0x@server:~$ df -hT /
Filesystem     Type  Size  Used Avail Use% Mounted on
/dev/sdb       ext4   251G   96G  143G  41% /

Meaning: The Linux root filesystem has space. This is inside the VHDX.

Decision: If Use% is high, clean inside Linux first (Docker prune, log cleanup). Don’t assume Windows sees freed space afterward.

Task 11: Identify Docker’s biggest space consumers (images/volumes)

cr0x@server:~$ docker system df
TYPE            TOTAL     ACTIVE    SIZE      RECLAIMABLE
Images          24        6         18.3GB    12.7GB (69%)
Containers      11        2         1.4GB     1.1GB (78%)
Local Volumes   13        5         22.9GB    9.6GB (41%)
Build Cache     0         0         0B        0B

Meaning: Volumes are the heavy hitters. Images are reclaimable too.

Decision: If you need to free disk, prune images/containers first, then audit volumes carefully. Pruning volumes blindly is how you “accidentally” delete someone’s local database.
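The prune-or-not call can even be scripted. A hedged sketch that parses the Images row of the table above; the 5 GB threshold is an arbitrary assumption, and the sample row is hardcoded so the sketch runs without a daemon (real usage: `docker system df | decide`):

```shell
#!/bin/sh
# Sketch: turn the `docker system df` table into a prune/skip decision.
decide() {
  awk -v threshold=5 '
    $1 == "Images" {
      val = $5
      sub(/GB/, "", val)                      # "12.7GB" -> "12.7"
      if (val + 0 > threshold) print "prune: " val "GB reclaimable"
      else                     print "skip: only " val "GB reclaimable"
    }'
}

# Sample row from Task 11, hardcoded so this runs anywhere:
result=$(printf '%s\n' \
  'TYPE            TOTAL     ACTIVE    SIZE      RECLAIMABLE' \
  'Images          24        6         18.3GB    12.7GB (69%)' \
  | decide)
echo "$result"
```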

Task 12: Safe-ish cleanup (prune with intent)

cr0x@server:~$ docker image prune -a
WARNING! This will remove all images without at least one container associated to them.
Are you sure you want to continue? [y/N] y
Deleted Images:
deleted: sha256:3f1e2a0f7c8d...
Total reclaimed space: 9.8GB

Meaning: Unused images removed, ~10 GB back inside Docker’s storage.

Decision: If your builds are slow, pruning won’t help. If your disk is full, pruning helps. Different problems, different levers.

Task 13: Verify bind mount performance risk (are you mounting Windows paths?)

cr0x@server:~$ docker run --rm -v /mnt/c/Users/cr0x/src:/src alpine ls -la /src | head
total 64
drwxrwxrwx    1 root     root          4096 Feb  2 10:22 .
drwxr-xr-x    1 root     root          4096 Feb  2 10:22 ..
-rwxrwxrwx    1 root     root          1962 Feb  1 19:04 README.md
-rwxrwxrwx    1 root     root           372 Feb  1 19:04 docker-compose.yml

Meaning: You’re bind-mounting a Windows path. It works, but it’s a performance trap for heavy I/O workloads.

Decision: If your stack does lots of file reads (Node, PHP, webpack, Python), move the repo into Linux and mount from /home instead.

Task 14: Confirm port reachability from Windows to WSL containers

cr0x@server:~$ docker run --rm -d -p 8080:80 nginx:alpine
c7c51d1c65b98a0c2a5c2b9d9a3a9b10f1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6
cr0x@server:~$ powershell.exe -NoProfile -Command "Test-NetConnection -ComputerName 127.0.0.1 -Port 8080 | Select-Object TcpTestSucceeded"
TcpTestSucceeded
----------------
True

Meaning: Port publishing works and Windows can reach it on localhost.

Decision: If this fails under VPN, the issue is usually name resolution, firewall rules, or WSL NAT behavior—not “nginx is broken.”

Task 15: Diagnose DNS inside WSL (the silent productivity killer)

cr0x@server:~$ cat /etc/resolv.conf
nameserver 172.22.48.1
search corp.example
cr0x@server:~$ getent hosts registry-1.docker.io | head -n 1
2600:1f18:2148:bc02:7fd3:1c63:6f57:fc3a registry-1.docker.io

Meaning: WSL is using a NAT-provided resolver; DNS lookups work.

Decision: If lookups hang or return wrong results on VPN, you may need to disable auto-generated resolv.conf and set DNS explicitly (with care—see common mistakes).
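One common shape for that explicit-DNS fix, shown here as a sketch (the resolver IPs are placeholders for your environment, not a recommendation):

```ini
# /etc/wsl.conf — stop WSL from regenerating resolv.conf.
# Apply with `wsl --shutdown`, then write /etc/resolv.conf by hand.
[network]
generateResolvConf = false

# Example hand-written /etc/resolv.conf (placeholder resolvers):
#   nameserver 10.0.0.53
#   nameserver 1.1.1.1
```

Coordinate this with whoever owns VPN policy; a static resolver that's only reachable on VPN breaks DNS off VPN.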

Task 16: Export a distro before risky changes (your “undo” button)

cr0x@server:~$ wsl --export Ubuntu-22.04 D:\wsl-backups\Ubuntu-22.04.tar
The operation completed successfully.

Meaning: You now have an offline backup you can import elsewhere.

Decision: Do this before moving disks, reinstalling, or “cleaning up WSL.” If you don’t, you’re betting your work on luck.

Task 17: Move a distro to another drive (the correct way)

cr0x@server:~$ wsl --unregister Ubuntu-22.04
The operation completed successfully.
cr0x@server:~$ wsl --import Ubuntu-22.04 D:\wsl\Ubuntu-22.04 D:\wsl-backups\Ubuntu-22.04.tar --version 2
The operation completed successfully.

Meaning: The distro is now stored under D:\wsl\Ubuntu-22.04 (including its new VHDX). This is how you relocate WSL storage without corruption. Run the unregister/import pair from a Windows shell (PowerShell), not from inside the distro you are about to unregister.

Decision: Do this if C: is constrained or if corporate encryption policies make WSL disk I/O miserable on the system drive.

Task 18: Verify Docker still points where you think after moves

cr0x@server:~$ wsl -d Ubuntu-22.04 -- bash -lc "docker ps"
CONTAINER ID   IMAGE           COMMAND                  CREATED       STATUS       PORTS                  NAMES
c7c51d1c65b9   nginx:alpine    "/docker-entrypoint.…"   2 min ago     Up 2 min     0.0.0.0:8080->80/tcp   nostalgic_borg

Meaning: Docker CLI can still talk to a daemon and your container state exists.

Decision: If containers/images “vanish,” you switched daemons or contexts. Diagnose ownership model drift first, not “Docker ate my homework.”

Fast diagnosis playbook

This is the order that finds the bottleneck quickly, without chasing folklore.

1) Identify the ownership model in 30 seconds

  • Check: wsl -l -v for docker-desktop distros; docker info for root dir and driver.
  • Why: If you’re unsure which daemon you’re talking to, every other observation is suspect.
  • Decision: Commit to Model A or Model B for this machine and team. “Both” is a debugging tax.

2) Measure the filesystem boundary (Linux vs /mnt/c)

  • Check: Where the repo lives (pwd), where bind mounts point, and do a quick fio test in both locations.
  • Why: Most “Docker on WSL is slow” reports are actually “I’m doing Linux builds on a Windows-mounted path.”
  • Decision: If /mnt/c is the hotspot, move code into Linux filesystem and adjust editor workflow.

3) Confirm resource limits (RAM/swap) and pressure signals

  • Check: WSL memory usage, swap behavior, and whether Windows is paging heavily.
  • Why: Under memory pressure, everything looks like I/O slowness.
  • Decision: Set .wslconfig limits; increase memory for build-heavy workloads; keep swap sane.
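Before adjusting limits, get a quick snapshot from inside WSL. A small sketch using only /proc, so it works on any distro (PSI output needs a reasonably modern kernel):

```shell
#!/bin/sh
# Sketch: memory snapshot from inside WSL before blaming disk I/O.
mem=$(awk '/^(MemTotal|MemAvailable|SwapTotal|SwapFree)/ {printf "%-14s %6.1f GiB\n", $1, $2/1048576}' /proc/meminfo)
echo "$mem"
# Pressure Stall Information: sustained "avg10" above ~10 means real memory stalls.
cat /proc/pressure/memory 2>/dev/null || echo "PSI not exposed by this kernel"
```

Low MemAvailable plus nonzero memory pressure means fix RAM limits first; I/O tuning can wait.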

4) Validate DNS and proxy behavior (especially on VPN)

  • Check: getent hosts, pulls from registries, and whether /etc/resolv.conf is auto-generated.
  • Why: Build steps often fetch dependencies; broken DNS looks like “Docker build hangs.”
  • Decision: Fix DNS generation or set static DNS with a controlled policy.

5) Only then: Docker-specific performance (buildkit, cache, overlay2)

  • Check: Build logs, cache misses, layer invalidation patterns, and whether bind mounts are causing rebuild storms.
  • Why: Docker performance tuning is worthless if the underlying filesystem is wrong.
  • Decision: Tune Dockerfile ordering, use BuildKit, and reduce bind-mounted churn.

Common mistakes: symptom → root cause → fix

1) “Docker is painfully slow”

Symptom: npm install, composer install, pip install, or webpack builds take 5–20× longer than expected.

Root cause: Repo lives under /mnt/c and you’re bind-mounting it into containers; metadata operations are expensive across the Windows↔Linux boundary.

Fix: Move the repo into the Linux filesystem (/home/.../src). If you need Windows editor access, use VS Code Remote / WSL or access via \\wsl$\ from Windows.

2) “My images/containers disappeared after a reboot”

Symptom: Yesterday’s images are gone; docker ps is empty; volumes aren’t there.

Root cause: You’re talking to a different Docker daemon/context (Desktop vs in-distro), or Docker Desktop reset its data distro.

Fix: Run docker context ls, then docker info to confirm RootDir. Decide Model A or B and remove the other daemon to prevent drift.

3) “Docker build hangs on ‘Downloading…’”

Symptom: Builds stall fetching base images or dependencies; retry sometimes works.

Root cause: DNS resolution issues under VPN, or proxy settings not applied inside WSL/Docker environment.

Fix: Test name resolution with getent hosts. If needed, disable auto-generated resolv.conf and set known-good resolvers; align proxy env vars for both WSL and Docker.

4) “Published ports randomly stop working”

Symptom: -p 8080:80 worked earlier; now Windows can’t reach it on localhost.

Root cause: WSL NAT and firewall changes after sleep/VPN toggles; sometimes the WSL virtual switch is in a bad state.

Fix: Restart WSL (wsl --shutdown). If using Docker Desktop, restart it too. Re-test with Test-NetConnection.

5) “Disk is full but I deleted a ton of stuff”

Symptom: Windows drive space doesn’t increase after deleting Docker images or Linux files.

Root cause: VHDX does not shrink automatically; space remains allocated inside the virtual disk.

Fix: Clean up inside Linux, then compact the VHDX using supported tooling. If you can’t compact easily, export/import to a new VHDX as a pragmatic reset.
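For reference, one supported compaction route uses stock diskpart from an elevated Windows prompt; the VHDX path below is the one found in Task 9 (Hyper-V's Optimize-VHD cmdlet is an alternative if that module is installed):

```
rem From an elevated Windows prompt, after cleaning up inside Linux:
wsl --shutdown
diskpart

rem Inside diskpart:
select vdisk file="C:\Users\cr0x\AppData\Local\Packages\CanonicalGroupLimited.Ubuntu22.04LTS_79rhkp1fndgsc\LocalState\ext4.vhdx"
compact vdisk
exit
```

Take the export from Task 16 first; compacting a disk you can't restore is gambling, not maintenance.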

6) “File watching doesn’t work (hot reload broken)”

Symptom: App doesn’t reload on file changes; watchers miss events or spike CPU.

Root cause: Watching a Windows-mounted filesystem from Linux containers can degrade inotify semantics; or too many files overwhelm watchers.

Fix: Keep code in Linux filesystem; for large monorepos, configure polling with sane intervals as a fallback, but treat it as last resort.

7) “Git shows weird case-only renames or duplicates”

Symptom: CI passes on Linux but local behaves oddly; files differ only by case.

Root cause: Case sensitivity mismatch when working on Windows filesystem.

Fix: Work inside Linux filesystem for repos that assume Linux semantics; enforce case-consistent naming in the repo.

Three corporate mini-stories from the trenches

Incident: the wrong assumption (“WSL is basically just a folder”)

A product team rolled out WSL2 to standardize dev environments. The mandate was reasonable: Docker, Linux tooling, consistent builds. The implementation was not. Someone assumed WSL distros were just directories you could move around like any other cache.

So an engineer “saved space” by copying the WSL distro’s LocalState directory to D: and deleting the original from C:. It worked for a day, mostly. Then Docker Desktop updated, WSL registered state no longer matched the disk content, and the distro refused to start. The immediate symptom was “Docker won’t start,” but the real failure was “the VM disk is no longer where WSL thinks it is.”

The recovery was ugly. They had to unregister and re-import from whatever copy still existed. Some people lost local databases stored in volumes because nobody had a policy for exporting WSL state before surgery.

The lesson wasn’t “don’t use WSL.” It was: treat WSL distros like managed VM assets. If you want them on another drive, use export/import. If you want to reduce space, prune and compact, don’t play filesystem roulette.

Optimization that backfired: “Let’s put everything on /mnt/c so Windows tools can see it”

A different org wanted a smooth experience for Windows-first developers. Their idea: keep repos on C:\ so Windows security tools and IDEs “see everything,” then run builds in WSL2 and containers by bind-mounting the same Windows paths.

On small projects it looked fine. On the main monorepo it was a slow-motion incident. Incremental builds were no longer incremental. File scanning steps (linters, test discovery, TypeScript compilation) became the new bottleneck. People compensated by turning off checks locally—because deadlines don’t care about your architecture diagram.

Then came the real backfire: Docker builds started timing out in CI-like local scripts. Not because the CPU was weak, but because the build context tar and overlay operations were spending their time in the Windows↔Linux translation layer. Developers called it “Docker overhead.” It wasn’t. It was storage placement.

They eventually switched to Linux-native repo locations and used editor integrations to keep the Windows UX. The speedup was immediate and boring, which is the kind of success you want in operations.

Boring but correct practice that saved the day: “Export before changes”

A platform team maintained a standard WSL2 image for developers working on multiple containerized services. They had a rule: any change that might touch WSL distro storage requires an export first. No exceptions, no “it’ll be fine,” no heroics.

One quarter, a Windows update plus a security agent update caused disk I/O stalls on C:. Developers reported Docker builds taking forever, plus occasional WSL timeouts. The fix was to relocate distros to a different drive where the agent’s scanning policy was less aggressive. This is the kind of sentence that makes everyone sigh, but it happens.

Because they had a recent export, the move was procedural: shutdown WSL, export, unregister, import to the new location, verify. They didn’t have to debug corrupted VHDX files at 2 a.m. They also didn’t have to ask hundreds of engineers to rebuild environments from scratch.

It wasn’t glamorous. It was disciplined. And it turned a potentially chaotic migration into a controlled maintenance task.

Checklists / step-by-step plan

Step 0: Decide your Docker model (don’t improvise later)

  • Choose Model A if you want the easiest desktop experience, GUI management, and don’t need Linux-native daemon control.
  • Choose Model B if you want Linux-like server behavior, explicit control of /var/lib/docker, and fewer Docker Desktop surprises.

Step 1: Install WSL2 cleanly and verify

  • Install WSL and a distro.
  • Run wsl -l -v and confirm VERSION=2.
  • Run wsl --status and confirm you’re on a modern WSL version.

Step 2: Configure WSL resources (the boring guardrails)

Create or edit %UserProfile%\.wslconfig on Windows. Example policy (adjust to your machine):

  • Memory: cap to something reasonable (e.g., 8–16 GB on dev laptops).
  • Swap: don’t set it to 0 unless you like OOM kills during builds.
  • Processors: cap if you want your fans to stop negotiating for hazard pay.

Then wsl --shutdown to apply.
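A concrete starting point; every number here is an assumption to size against your own hardware, and the D: swap path is only an example:

```ini
# %UserProfile%\.wslconfig — example guardrails for a 32 GB / 8-core laptop
[wsl2]
memory=12GB        # hard cap so Windows keeps breathing
processors=6       # leave cores for the host OS and your editor
swap=8GB           # nonzero: builds spike, and OOM kills are worse than slow swap
swapfile=D:\\wsl\\swap.vhdx
```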

Step 3: Put code in the right place

  • Create ~/src inside WSL.
  • Clone repos there.
  • Run Docker builds from there.

Step 4: Docker setup by model

Model A: Docker Desktop + WSL2 integration

  • Enable WSL2 backend in Docker Desktop settings.
  • Enable integration for your chosen distro.
  • Confirm docker ps works inside the distro.
  • Do not install and run a separate dockerd inside WSL unless you intentionally want two daemons.

Model B: Docker Engine inside WSL2

  • Enable systemd in /etc/wsl.conf.
  • Restart WSL.
  • Install Docker Engine packages inside the distro.
  • Start and enable docker via systemd; confirm docker info shows overlay2 and sane root dir.
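For reference, the Model B install boils down to something like this, assuming Docker's official apt repository is already configured per their docs (package names are the upstream ones; they differ on non-Debian distros):

```shell
# Inside the WSL distro, after enabling systemd and restarting WSL:
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo systemctl enable --now docker
sudo usermod -aG docker "$USER"   # re-login (or wsl --shutdown) to pick up the group

# Sanity check — expect: overlay2 systemd
docker info --format '{{.Driver}} {{.CgroupDriver}}'
```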

Step 5: Make disaster recovery boring

  • Before major changes, export the distro: wsl --export.
  • For drive relocation: export → unregister → import.
  • Keep a policy for what is allowed to live in Docker volumes locally (databases, caches) and how to back it up if it matters.

Joke #2: If your only backup plan is “I’ll remember what I did,” congratulations—you’ve invented single-point-of-failure memory.

FAQ

1) Should I use Docker Desktop or Docker Engine inside WSL?

If you want the least friction on Windows, use Docker Desktop (Model A). If you want Linux-native control and predictable daemon behavior, run Docker Engine inside WSL (Model B). Pick one per machine.

2) Why is building from /mnt/c so much slower?

Because you’re crossing a filesystem boundary with metadata translation. Builds do lots of tiny file operations. Tiny operations are where the boundary costs show up.

3) Where are my Docker images actually stored?

Run docker info and look at DockerRootDir. In Docker Desktop, the backing storage is in its managed environment; in in-distro Docker, it’s usually /var/lib/docker inside that distro’s VHDX.

4) Can I move WSL to another drive safely?

Yes: export the distro to a tar, unregister it, then import it to a directory on the target drive. Do not manually copy the LocalState VHDX and hope.

5) Why does Windows disk space not come back after deleting files in WSL?

Because the VHDX file doesn’t automatically shrink. You freed space inside ext4, but Windows still sees the VHDX allocation. Use compaction workflows or export/import to recreate a smaller VHDX.

6) How do I know if “Docker is slow” is actually DNS?

If pulls/build steps hang on network fetches, test with getent hosts and a simple image pull. If name resolution is slow or wrong, fix DNS before touching Dockerfiles.

7) Is it safe to run two Docker daemons (Desktop + in-distro)?

It can be done, but it’s usually a self-inflicted outage. You’ll confuse contexts, ports, and image locations. If you must, document contexts and pin them per project.

8) Why do ports sometimes work only from Windows or only from WSL?

Because NAT and forwarding rules differ by direction. Also, VPN/firewall changes can break the implicit glue. Restart WSL and Docker Desktop, then re-test with a known container and Test-NetConnection.

9) What’s the safest way to handle databases in local Docker volumes?

Assume volumes are disposable unless you back them up. For anything important, script exports (e.g., pg_dump) to a known location inside Linux filesystem and/or a Windows backup path.
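One way to make that scripting concrete. In this sketch the container name (mydb), database (appdb), and paths are assumptions, and the dump command is passed in as arguments so the retention logic can be exercised without a running Postgres:

```shell
#!/bin/sh
# Sketch: logical backup of a database in a local Docker volume, with retention.
backup_db() {
  dir="$1"; keep="$2"; shift 2
  mkdir -p "$dir" || return 1
  stamp=$(date +%Y%m%d-%H%M%S)
  "$@" > "$dir/appdb-$stamp.sql" || return 1
  # Keep the newest $keep dumps, delete the rest.
  ls -1t "$dir"/appdb-*.sql | tail -n +$((keep + 1)) | xargs -r rm --
  echo "appdb-$stamp.sql"
}

# Real usage, run from inside WSL (names are assumptions):
#   backup_db "$HOME/db-backups" 7 docker exec mydb pg_dump -U postgres appdb
# Demo with a stub dump command so the sketch runs anywhere:
file=$(backup_db /tmp/db-backups 7 echo "-- fake dump --")
echo "wrote $file"
```

Cron or a systemd timer (Model B) can run the real version; what matters is that the dump lands outside the volume it's protecting.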

10) Does enabling systemd in WSL matter for Docker?

For Model B, yes: systemd makes Docker behave like it does on real Linux, including service lifecycle and cgroup driver alignment. For Model A, it’s optional.

Next steps (do these, not vibes)

  1. Decide your Docker model and remove ambiguity: Desktop integration or in-distro daemon.
  2. Move active repos into Linux filesystem and stop bind-mounting Windows paths for heavy builds.
  3. Add WSL resource limits so Windows stays responsive under load.
  4. Export your distro before changes like drive moves, reinstallations, or “cleanup.”
  5. Use the fast diagnosis playbook the next time performance tanks: ownership → filesystem boundary → resources → DNS → Docker tuning.

When WSL2 and Docker are set up correctly, they don’t feel like a “stack.” They feel like your machine is finally cooperating. That’s the goal: boring, fast, recoverable.
