WSL2 + Kubernetes: The Setup That Doesn’t Melt Your Laptop


You installed Kubernetes locally because you wanted speed: iterate fast, test charts, debug controllers, maybe run a small platform stack.
Then your fans hit takeoff, your SSD starts doing push-ups, and your “quick dev cluster” becomes the reason Slack is asking if you’re online.

WSL2 can be a great place to run Kubernetes—if you treat it like a real VM with storage and memory constraints, not like a magical Linux folder.
This is the practical setup that avoids the classic failure modes: runaway RAM, slow I/O, DNS weirdness, and “why is kubectl hanging?”

What you’re actually building (and why it melts laptops)

On Windows with WSL2, “running Kubernetes” is rarely just “running Kubernetes.”
It’s a stack of nested abstractions, each with its own opinions about CPU scheduling, memory reclamation, filesystem semantics, and networking.
When any layer guesses wrong, your laptop pays.

The typical WSL2 Kubernetes setup looks like this:

  • Windows host OS
  • WSL2 VM (a lightweight Hyper-V VM with a virtual disk)
  • Linux userland distro (Ubuntu, Debian, etc.)
  • Container runtime (Docker Engine, containerd, or nerdctl stack)
  • Kubernetes distribution (kind, k3d, minikube, microk8s, or Docker Desktop’s integrated cluster)
  • Your workloads: databases, operators, build pipelines, ingress controllers, service meshes—aka “tiny production”

The meltdown usually comes from one of three places:

  1. Memory: the WSL2 kernel happily fills its page cache and doesn’t always hand memory back to Windows quickly. Kubernetes happily schedules pods until the node is “fine” right up until it isn’t.
  2. Storage: crossing the Windows/Linux filesystem boundary can turn normal I/O into a slow-motion incident. Databases amplify this with fsync and small random writes.
  3. Networking/DNS: kube-dns, Windows DNS, WSL2 virtual NICs, and VPN clients form a triangle of sadness.

The goal isn’t “make it fast in benchmarks.” The goal is “make it predictably fast enough,”
and more importantly, make failure modes obvious.
Reliability in dev matters because dev is where you decide what you’ll regret in prod.

A few facts and historical context (so the weird parts make sense)

These are short on purpose. They’re the mental model upgrades that prevent hours of ghost-hunting.

  1. WSL1 was syscall translation; WSL2 is a real Linux kernel in a VM. That’s why WSL2 can run Kubernetes properly, but also why it behaves like a VM with its own disk and memory policies.
  2. WSL2 stores Linux files in a virtual disk (VHDX) by default. That disk can grow quickly and doesn’t always shrink unless you explicitly compact it.
  3. Accessing Linux files from Windows is not the same as accessing Windows files from Linux. The performance characteristics differ dramatically, and the slow path will hurt databases and image builds.
  4. Kubernetes wasn’t designed for laptops. It was designed for clusters where “one node is a cattle VM” and you can burn CPU on reconciliation loops without hearing fans.
  5. kind runs Kubernetes in Docker containers. That’s excellent for reproducibility, but it adds another layer where storage and networking can get “creative.”
  6. k3s (and by extension k3d) was built to be lightweight. It uses SQLite by default, which is fine locally, but it still stresses storage if you combine it with lots of controllers.
  7. cgroups v2 changed how resource isolation behaves. Many “why is my memory limit ignored?” conversations are really “which cgroup mode am I on?” conversations.
  8. DNS is a top-tier failure mode in local clusters. Not because DNS is hard, but because corporate VPNs and split-horizon DNS can quietly override everything.
  9. Container image builds punish filesystem metadata. The difference between a fast filesystem and a bridged one becomes obvious when you run multi-stage builds with thousands of small files.

Pick your local Kubernetes: kind vs k3d vs minikube (and what I recommend)

My default recommendation: kind inside WSL2, with constraints

For most developers and SREs doing platform work, kind gives you repeatability, speed, and clean teardown.
The cluster is “just containers,” and you can version pin Kubernetes easily.
The key is to constrain WSL2 resources and keep your workloads on the Linux filesystem.

When to prefer k3d (k3s in Docker)

If your goal is “run a practical dev platform stack with minimal overhead,” k3d is excellent.
k3s trims fat: fewer components, less memory. It’s forgiving on smaller laptops.
It’s also closer to what many edge setups run, which is useful if you ship to constrained environments.

When to use minikube

minikube is fine when you want “one tool that does many drivers.”
But on WSL2, minikube can land you in a confusing driver matrix: Docker driver, KVM (usually not viable inside WSL2), Hyper-V (Windows-side),
and then you start debugging the driver more than the cluster.
If you’re already happy with Docker inside WSL2, minikube’s Docker driver is workable.

What to avoid (unless you have a reason)

  • Running heavy stateful workloads on /mnt/c and then blaming Kubernetes for being slow. That’s not Kubernetes; that’s the filesystem boundary asking you to stop.
  • Over-allocating CPUs and RAM “because it’s local.” WSL2 will take it, Windows will fight back, and your browser will lose.
  • Doing “prod-like” with everything turned on. Service mesh + distributed tracing + three operators + a database + a CI system is not a dev cluster; it’s a hobby data center.

One short joke, as promised: Kubernetes is like a kitchen with 30 chefs—nobody cooks faster, but everyone files a status update.

WSL2 baseline: limits, kernel, and the things Windows won’t tell you

Set WSL2 resource limits or accept chaos

By default, WSL2 will scale memory usage up to a large fraction of your system RAM.
It’s not malicious. It’s Linux doing Linux things—using memory for cache.
The problem is that Windows doesn’t always reclaim it in a way that feels polite.

Put a .wslconfig file in your Windows user profile directory (Windows side).
You want a ceiling on memory and CPU, and you want swap to exist but not become a substitute for RAM.

cr0x@server:~$ cat /mnt/c/Users/$WINUSER/.wslconfig
[wsl2]
memory=8GB
processors=4
swap=4GB
localhostForwarding=true

Decision: If you have 16GB RAM total, 8GB for WSL2 is usually sane.
If you have 32GB, you can go 12–16GB. More than that tends to hide leaks and bad pod limits.
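The sizing rule above is easy to encode if your setup script should compute the cap instead of a human. This is a sketch; wsl_mem_gb is a hypothetical helper, not a real WSL tool, and the half-of-RAM-capped-at-16 rule is just the guidance from this section:

```shell
#!/usr/bin/env bash
# Sketch of the sizing rule above: give WSL2 roughly half of physical RAM,
# capped at 16GB so leaks and bad pod limits still surface.
wsl_mem_gb() {
  local total_gb=$1
  local half=$(( total_gb / 2 ))
  if [ "$half" -gt 16 ]; then
    echo 16
  else
    echo "$half"
  fi
}

wsl_mem_gb 16   # -> 8
wsl_mem_gb 32   # -> 16
```

Drop the result into the memory= line of .wslconfig and forget about it.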

WSL2 reclaim behavior: use the tools you have

WSL2 memory reclamation has improved, but it can still feel sticky after heavy builds or cluster churn.
If you need to reclaim memory quickly, shutting down WSL is the blunt instrument that works.

cr0x@server:~$ wsl.exe --shutdown

Output meaning: No output is normal. It stops all WSL distros and the WSL2 VM.
Decision: Use it when Windows is starved and you need RAM back now, not when you’re troubleshooting an in-cluster problem.

Use systemd in WSL2 (if available) and be explicit

Modern WSL supports systemd. That makes running Docker/containerd and Kubernetes tooling less awkward.
Check if systemd is enabled.

cr0x@server:~$ ps -p 1 -o comm=
systemd

Output meaning: If PID 1 is systemd, you can use normal service management.
If it’s something else (like init), you’ll need to manage daemons differently.
Decision: Prefer systemd-enabled WSL for stability and fewer “why didn’t it start?” mysteries.

Storage on WSL2: the difference between “works” and “doesn’t hate you”

Rule #1: keep Kubernetes data on the Linux filesystem

Put your cluster state, container images, and persistent volume data under your Linux distro filesystem (e.g., /home, /var).
Avoid storing heavy-write workloads on /mnt/c.
The interop layer can be fine for editing code, but it’s a trap for databases and anything fsync-heavy.

Rule #2: understand where your bytes go

Docker/containerd store images and writable layers somewhere under /var/lib.
kind and k3d add their own layers.
If you build a lot of images or run CI-like workloads, your VHDX grows. It might not shrink.
That’s not a moral failing; that’s how virtual disks work.

Rule #3: measure I/O the boring way

You don’t need fancy storage tooling to catch the big problems. You need two comparisons:
Linux FS performance and Windows-mounted FS performance.
If one is 10x slower, don’t “tune Kubernetes.” Move the workload.

cr0x@server:~$ dd if=/dev/zero of=/home/cr0x/io-test.bin bs=1M count=512 conv=fdatasync
512+0 records in
512+0 records out
536870912 bytes (537 MB, 512 MiB) copied, 1.24 s, 433 MB/s

Output meaning: This is a rough sequential write with flush. Hundreds of MB/s is expected on SSD-backed storage.
Decision: If this is single-digit MB/s, your VM is under I/O pressure or your disk is unhappy. Fix that before Kubernetes.

cr0x@server:~$ dd if=/dev/zero of=/mnt/c/Users/$WINUSER/io-test.bin bs=1M count=256 conv=fdatasync
256+0 records in
256+0 records out
268435456 bytes (268 MB, 256 MiB) copied, 12.8 s, 21.0 MB/s

Output meaning: If you see this kind of drop, that’s the interop path.
Decision: Don’t put /var/lib/docker, /var/lib/containerd, or PV directories on /mnt/c.
Keep code there if you must, but keep build caches and databases in Linux space.
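If you want this comparison scripted instead of run by hand, a small probe works. write_speed_mb_s is a hypothetical helper that mirrors the dd tests above; run it once against a Linux path and once against /mnt/c and compare:

```shell
#!/usr/bin/env bash
# Rough fdatasync write probe for one directory, matching the dd tests above.
write_speed_mb_s() {
  local dir=$1 mb=${2:-64}
  local f="$dir/io-probe.$$"
  local start end
  start=$(date +%s%N)                       # nanoseconds before the write
  dd if=/dev/zero of="$f" bs=1M count="$mb" conv=fdatasync status=none
  end=$(date +%s%N)                         # nanoseconds after fdatasync returns
  rm -f "$f"
  # MB written divided by elapsed seconds, as an integer
  echo $(( mb * 1000000000 / (end - start) ))
}

write_speed_mb_s /tmp 64                          # prints integer MB/s for this path
# write_speed_mb_s /mnt/c/Users/$WINUSER 64       # compare: if ~10x slower, move the workload
```

Same decision rule as before: if the two numbers differ by an order of magnitude, move the workload, don’t tune Kubernetes.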

VHDX growth: compacting is a maintenance chore, not a one-time fix

If your WSL2 virtual disk grows due to image builds and churn, you can compact it—but compacting is not automatic.
First, clean up inside Linux: remove unused images, prune volumes, clear caches.
Then compact from Windows. The exact steps differ by Windows build and tooling, but the principle is consistent:
delete unused blocks in the guest, then tell the host to compact.
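One sequence that has worked for compaction, hedged: the VHDX path varies per distro package, Optimize-VHD needs the Hyper-V PowerShell module, and diskpart’s compact vdisk is the lowest-common-denominator fallback. Placeholders in angle brackets are yours to fill in:

```
# 1. Inside the distro: free blocks first
docker system prune -a        # or targeted image/volume cleanup

# 2. From Windows (elevated prompt): stop the WSL2 VM
wsl --shutdown

# 3. Compact with diskpart (locate your ext4.vhdx first; path varies)
diskpart
#   select vdisk file="C:\Users\<you>\AppData\Local\Packages\<distro>\LocalState\ext4.vhdx"
#   compact vdisk
#   exit
```

Step 1 matters: compaction can only reclaim blocks the guest has actually freed.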

The boring truth: local clusters create data. They rarely delete data as aggressively as you think.
If you treat your dev environment like production, you’ll schedule maintenance like production. That’s the deal.

Networking: NodePorts, ingress, DNS, and the localhost trap

Localhost is not a philosophy; it’s a routing decision

In WSL2, “localhost” can mean Windows localhost or WSL localhost depending on how you started the process.
With localhostForwarding=true, WSL can forward ports to Windows, but it’s not magic for all traffic patterns.

If you run kind inside WSL2 and expose a service with NodePort, you’ll often access it from Windows via forwarded ports
or by hitting the WSL VM IP. Both can work. The important part is being consistent and documenting which one your team uses.

DNS: the silent killer of kubectl

A lot of “Kubernetes is slow” complaints are actually DNS timeouts.
kubectl looks like it’s hung, but it’s waiting for API server calls that never quite resolve cleanly.
VPN clients make this worse by injecting resolvers or forcing split DNS.

Your job is to separate “API server slow” from “name resolution slow” in under five minutes.
Later, you can argue with corporate VPN tooling.
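A quick way to see which resolvers your distro is actually using: list_resolvers is a hypothetical helper that just reads resolv.conf, which WSL generates and VPN clients sometimes rewrite underneath you:

```shell
#!/usr/bin/env bash
# Print the nameservers this distro will use. An unexpected corporate/VPN
# address here explains a lot of "kubectl is hanging" reports.
list_resolvers() {
  awk '/^nameserver/ { print $2 }' "${1:-/etc/resolv.conf}"
}

list_resolvers
```

If the list changes when the VPN connects, you’ve found your “slow kubectl” suspect.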

Practical tasks (commands, outputs, and decisions)

These are real tasks you’ll do when setting up and when debugging.
Each one includes what the output means and the decision you make from it.
Do them in order when you’re building a new setup; do them selectively when you’re on fire.

Task 1: Confirm you’re on WSL2 (not WSL1)

cr0x@server:~$ wsl.exe -l -v
  NAME            STATE           VERSION
* Ubuntu-22.04    Running         2

Output meaning: VERSION 2 means a real Linux kernel VM. VERSION 1 means syscall translation.
Decision: If you’re on VERSION 1, stop and convert. Kubernetes needs the real kernel behavior.

Task 2: Check kernel and cgroup mode

cr0x@server:~$ uname -r
5.15.146.1-microsoft-standard-WSL2
cr0x@server:~$ mount | grep cgroup2
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime)

Output meaning: cgroup2 mounted means unified hierarchy.
Decision: If your tooling assumes cgroup v1 (older Docker configs, some monitoring agents), expect surprises.

Task 3: Verify systemd and service control

cr0x@server:~$ systemctl is-system-running
running

Output meaning: systemd is active and stable.
Decision: If you see degraded, check failed units before blaming Kubernetes for “random” issues.

Task 4: Confirm container runtime health (Docker example)

cr0x@server:~$ docker info --format '{{.ServerVersion}} {{.CgroupVersion}}'
24.0.7 2

Output meaning: Docker is reachable and reports cgroup version.
Decision: If this command is slow or hangs, fix Docker before touching kind/k3d.

Task 5: Measure memory pressure quickly

cr0x@server:~$ free -h
               total        used        free      shared  buff/cache   available
Mem:           7.7Gi       5.9Gi       310Mi       160Mi       1.5Gi       1.3Gi
Swap:          4.0Gi       1.2Gi       2.8Gi

Output meaning: available matters more than free. Swap use indicates pressure.
Decision: If available is under ~1Gi and swap is climbing, scale down the cluster, reduce limits, or raise WSL memory.

Task 6: Check disk usage where it counts

cr0x@server:~$ df -h /
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb       250G  180G   71G  72% /

Output meaning: This is your distro filesystem backed by VHDX.
Decision: Above ~85% usage, performance and compaction behavior get worse. Prune images and old PVs.
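The threshold is easy to automate. check_disk is a hypothetical name; the ~85% cutoff comes from the decision above, and the point is to fail loudly before the VHDX gets into the bad zone:

```shell
#!/usr/bin/env bash
# Preflight: warn when the distro root (backed by the VHDX) crosses a
# usage threshold where performance and compaction behavior degrade.
check_disk() {
  local mount=${1:-/} threshold=${2:-85}
  local pct
  pct=$(df --output=pcent "$mount" | tail -n 1 | tr -dc '0-9')
  if [ "$pct" -ge "$threshold" ]; then
    echo "warn: $mount at ${pct}% (>= ${threshold}%) - prune images and old PVs"
    return 1
  fi
  echo "ok: $mount at ${pct}%"
}

check_disk / 95 || echo "time to prune"
```

Wire it into your cluster-creation script so a full disk blocks setup instead of sabotaging it later.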

Task 7: Install and create a kind cluster with sane defaults

cr0x@server:~$ kind create cluster --name dev --image kindest/node:v1.29.2
Creating cluster "dev" ...
 ✓ Ensuring node image (kindest/node:v1.29.2) 🖼
 ✓ Preparing nodes 📦
 ✓ Writing configuration 📜
 ✓ Starting control-plane 🕹️
 ✓ Installing CNI 🔌
 ✓ Installing StorageClass 💾
Set kubectl context to "kind-dev"
You can now use your cluster with:

kubectl cluster-info --context kind-dev

Output meaning: Cluster created, context set. StorageClass installed (usually standard).
Decision: If CNI install hangs, suspect DNS/proxy/VPN or image pull issues—don’t keep retrying blindly.

Task 8: Verify cluster liveness and component health

cr0x@server:~$ kubectl cluster-info
Kubernetes control plane is running at https://127.0.0.1:40685
CoreDNS is running at https://127.0.0.1:40685/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
cr0x@server:~$ kubectl get nodes -o wide
NAME                STATUS   ROLES           AGE   VERSION   INTERNAL-IP   OS-IMAGE
dev-control-plane   Ready    control-plane   2m    v1.29.2   172.18.0.2    Debian GNU/Linux 12 (bookworm)

Output meaning: API server reachable, node Ready, internal IP assigned.
Decision: If node is NotReady, go straight to kubectl describe node and CNI logs—don’t reinstall everything yet.

Task 9: Spot the real resource hogs (nodes + pods)

cr0x@server:~$ kubectl top nodes
NAME                CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
dev-control-plane   620m         15%    2240Mi          56%

Output meaning: metrics-server is responding (kind doesn’t ship it by default, so you or your setup script installed it).
Decision: If memory is high when idle, check for chatty controllers, leaking dev builds, or a stuck log shipper.

Task 10: Validate DNS inside the cluster (the cheap smoke test)

cr0x@server:~$ kubectl run -it --rm dns-test --image=busybox:1.36 --restart=Never -- nslookup kubernetes.default.svc.cluster.local
Server:    10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local

Name:      kubernetes.default.svc.cluster.local
Address 1: 10.96.0.1 kubernetes.default.svc.cluster.local
pod "dns-test" deleted

Output meaning: CoreDNS responds, service discovery works.
Decision: If this times out, fix DNS before debugging your app. Your app is innocent until proven guilty.

Task 11: Confirm whether slow kubectl is network or API latency

cr0x@server:~$ time kubectl get pods -A
NAMESPACE     NAME                                        READY   STATUS    RESTARTS   AGE
kube-system   coredns-76f75df574-2lq9r                    1/1     Running   0          4m
kube-system   coredns-76f75df574-9x2ns                    1/1     Running   0          4m
kube-system   etcd-dev-control-plane                      1/1     Running   0          4m
kube-system   kindnet-4cdbf                               1/1     Running   0          4m
kube-system   kube-apiserver-dev-control-plane            1/1     Running   0          4m
kube-system   kube-controller-manager-dev-control-plane   1/1     Running   0          4m
kube-system   kube-proxy-4xw26                            1/1     Running   0          4m
kube-system   kube-scheduler-dev-control-plane            1/1     Running   0          4m

real    0m0.312s
user    0m0.127s
sys     0m0.046s

Output meaning: 300ms is fine for local. Multiple seconds suggests DNS, kubeconfig context confusion, or a struggling API server.
Decision: If it’s slow, run kubectl get --raw /readyz?verbose next.

Task 12: Check API server readiness endpoints for the actual blocker

cr0x@server:~$ kubectl get --raw='/readyz?verbose'
[+]ping ok
[+]log ok
[+]etcd ok
[+]informer-sync ok
[+]poststarthook/start-apiserver-admission-initializer ok
[+]poststarthook/start-apiextensions-informers ok
[+]poststarthook/start-apiextensions-controllers ok
readyz check passed

Output meaning: If etcd or informer sync is slow/failing, the control plane is the bottleneck.
Decision: etcd slow often means storage latency. That’s your cue to check disk and avoid cross-filesystem I/O.

Task 13: Find the pod that is burning your laptop (CPU/memory)

cr0x@server:~$ kubectl top pods -A --sort-by=memory | tail -n 3
default         api-7b7c7d8f6c-9bq5m             110m         980Mi
observability   prometheus-0                     170m         1210Mi
observability   loki-0                           240m         1530Mi

Output meaning: You can see the top consumers.
Decision: If your dev cluster includes Prometheus/Loki by default, you just found your “why is it hot?” answer.
Scale down, reduce retention, or use lighter tools locally.

Task 14: Check container runtime disk usage (Docker)

cr0x@server:~$ docker system df
TYPE            TOTAL     ACTIVE    SIZE      RECLAIMABLE
Images          42        12        18.4GB    12.1GB (65%)
Containers      19        6         1.2GB     720MB (60%)
Local Volumes   27        9         9.8GB     6.3GB (64%)
Build Cache     124       0         22.6GB    22.6GB

Output meaning: Build cache is often the silent disk eater.
Decision: If reclaimable is large, prune deliberately. Don’t wait until your VHDX hits 95% and everything slows down.

Task 15: Prune safely (and accept the trade-off)

cr0x@server:~$ docker builder prune --all --force
Deleted build cache objects:
v8m3q8qgk7yq4o0l5u3f7s1m2
...
Total reclaimed space: 22.6GB

Output meaning: You reclaimed space, at the cost of rebuilding layers later.
Decision: In dev, disk space and stability are worth more than shaving 90 seconds off the next build.

Task 16: Confirm you’re not accidentally running workloads on /mnt/c

cr0x@server:~$ kubectl get pv -o wide
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                      STORAGECLASS   REASON   AGE   VOLUMEMODE
pvc-1d2c7f7e-2f5d-4c0d-9a3a-2c6f9d8b1a7c   10Gi       RWO            Delete           Bound    default/db-data           standard                8m    Filesystem

Output meaning: The PV listing doesn’t show the backing host path. kind’s default standard class uses a local-path-style provisioner that writes inside the node container, which normally lives on the Linux filesystem.
Decision: Inspect the StorageClass and provisioner behavior; if anything binds to a hostPath on a Windows-mounted path (/mnt/c/...), fix it immediately.
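This check is simple to script. on_windows_mount is a hypothetical helper that flags any path under a Windows drive mount; feed it hostPath values pulled with kubectl -o jsonpath:

```shell
#!/usr/bin/env bash
# Guard: does a path cross the Windows interop boundary (/mnt/c, /mnt/d, ...)?
# Feed it PV hostPath values, e.g.:
#   kubectl get pv -o jsonpath='{range .items[*]}{.spec.hostPath.path}{"\n"}{end}'
on_windows_mount() {
  case "$1" in
    /mnt/[a-z]|/mnt/[a-z]/*) return 0 ;;   # slow interop path
    *)                       return 1 ;;   # native Linux filesystem
  esac
}

on_windows_mount /mnt/c/Users/dev/pgdata && echo "refuse: data on Windows mount"
on_windows_mount /var/lib/rancher || echo "ok: Linux filesystem"
```

A preflight that refuses to start the stack on a bad path is cheaper than a week of “why is Postgres slow?”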

Task 17: Diagnose kubelet/container issues via node logs (kind node container)

cr0x@server:~$ docker ps --format 'table {{.Names}}\t{{.Status}}' | grep dev-control-plane
dev-control-plane   Up 6 minutes
cr0x@server:~$ docker exec dev-control-plane journalctl -u kubelet --no-pager | tail -n 2
Feb 05 09:10:12 dev-control-plane kubelet[245]: I0205 09:10:12.123456     245 server.go:472] "Kubelet version" kubeletVersion="v1.29.2"
Feb 05 09:10:15 dev-control-plane kubelet[245]: E0205 09:10:15.234567     245 kubelet.go:2050] "Skipping pod synchronization" err="PLEG is not healthy"

Output meaning: If PLEG is unhealthy, the kubelet is struggling—often due to disk I/O or container runtime slowness.
Decision: Go check disk latency, container runtime health, and the amount of log churn.

Task 18: Quick check for log storms (they can be your hottest workload)

cr0x@server:~$ kubectl logs -n kube-system deploy/coredns --tail=20
.:53
[INFO] plugin/reload: Running configuration SHA512 = 7a9b...
[INFO] 10.244.0.1:52044 - 46483 "A IN kubernetes.default.svc.cluster.local. udp 62 false 512" NOERROR qr,aa,rd 114 0.000131268s

Output meaning: Some DNS logs are normal. Thousands per second are not.
Decision: If logs are hot, reduce verbosity, fix the client retry loop, or you’ll pay in CPU and disk writes.

Three corporate mini-stories from the trenches

Mini-story 1: The incident caused by a wrong assumption

A team I worked with standardized on WSL2 for dev clusters because it was “basically Linux.”
They kept their repo on the Windows filesystem for easy IDE integration, then mounted it into containers for builds and tests.
It seemed fine for small services. Then the platform group added a local Postgres, a controller, and a test suite that ran migrations on every run.

The symptom was weird: migrations would sometimes take 30 seconds, sometimes 10 minutes.
People blamed “Kubernetes overhead” and “that operator we added.”
One engineer tried to fix it by bumping CPU limits. It got worse—higher CPU just made the system hit the storage bottleneck faster.

The wrong assumption was simple: they assumed /mnt/c was “just another filesystem.”
It isn’t. It’s a boundary with different caching behavior, different metadata performance, and different flush semantics.
Their database volume and build caches were sitting on the slow path.

The fix was unglamorous: move the database and build cache into the Linux filesystem, and only keep the source checkout on Windows if needed.
They also added a preflight script that refused to start the stack if it detected PVs pointing at /mnt/c.
Performance stabilized instantly. Nobody celebrated, which is how you know it was the right fix.

Mini-story 2: The optimization that backfired

Another company wanted “faster local clusters,” so they preloaded everything: observability stack, ingress, cert-manager, and a couple operators.
The idea was noble: reduce onboarding time, ensure everyone had the same baseline, avoid “works on my laptop.”
They even built an internal script that would create the cluster and install all charts in one shot.

Then the tickets started. Laptops ran hot during meetings. Battery life cratered.
kubectl commands would sometimes lag by several seconds. Developers started disabling components “temporarily,” which turned into permanent drift.
The platform team responded by pushing more default resource limits and more replicas, because “prod parity.”

The backfire came from controller churn and background work.
Prometheus scraping, Loki ingestion, cert-manager reconciliation, and operator loops are not free.
In a real cluster, you amortize that cost across servers. On a laptop, you feel it every time the fan ramps.

The fix was to define two profiles: core (ingress + DNS + storage + metrics-server) and full (the heavy stack).
Core was the default; full was opt-in for debugging.
They also trimmed scrape intervals and retention in local mode. The irony: onboarding got faster because people stopped fighting their environment.

Mini-story 3: The boring but correct practice that saved the day

A regulated enterprise (the kind that loves spreadsheets) had a surprisingly smooth WSL2 Kubernetes experience.
Their secret wasn’t a fancy toolchain. It was discipline: version pinning, repeatable cluster creation, and aggressive cleanup.
Every developer had the same kind node image version, the same chart versions, and the same default resource limits.

They also had a weekly maintenance routine: prune unused images, delete unused namespaces, and compact the dev environment when needed.
It was scheduled, documented, and dull.
Developers initially complained—nobody wants “maintenance day” on a laptop.

Then came the day a Windows update changed something subtle in networking.
Half the org’s clusters started having intermittent DNS failures.
The teams with drifted environments had a mess: different CNI versions, random kubeconfigs, inconsistent local DNS overrides.
The disciplined teams could reproduce quickly and compare apples to apples.

They isolated it to resolver behavior under VPN, rolled out a standardized workaround, and got back to work.
Boring practice saved the day: consistent versions and consistent baselines make debugging finite.

Fast diagnosis playbook: what to check first, second, third

When your laptop is melting or your cluster is “slow,” you don’t have time to admire architecture.
You need a deterministic path to the bottleneck.

First: Is the host (Windows + WSL2) under resource pressure?

  1. Check WSL2 memory and swap usage: free -h. If available is low and swap is climbing, you’re memory-bound.
  2. Check disk fullness: df -h /. If near full, everything becomes slower and more fragile.
  3. Check quick I/O sanity: dd ... conv=fdatasync on Linux FS vs /mnt/c. If Linux FS is slow too, you have system-level I/O pressure.

Second: Is Kubernetes control plane healthy or stuck?

  1. API readyz: kubectl get --raw='/readyz?verbose'. If etcd checks are slow, suspect storage latency.
  2. Node status: kubectl get nodes and kubectl describe node. If NotReady, look at CNI and kubelet symptoms.
  3. CoreDNS smoke test: run a busybox nslookup. If DNS is broken, stop pretending the app is the problem.

Third: Which workload is actually burning CPU/memory/I/O?

  1. Top consumers: kubectl top pods -A --sort-by=memory and --sort-by=cpu.
  2. Log storms: check logs on the suspected pods. High write rate equals I/O pressure equals global slowdown.
  3. Image/build churn: docker system df and prune build cache if it’s ballooned.

Paraphrased idea from Werner Vogels: “Everything fails, all the time.” Build your local setup so failure is quick to identify, not mysterious.

Common mistakes: symptom → root cause → fix

1) Symptom: Laptop fans spike when cluster is “idle”

Root cause: background controllers (observability stack, operators) doing constant reconciliation; or a pod in a crash loop writing logs.

Fix: kubectl top pods -A, find the hog; scale it down; reduce retention/scrape intervals; fix crash loops; set sane requests/limits.

2) Symptom: kubectl commands take 5–30 seconds randomly

Root cause: DNS timeouts or VPN resolver interference; sometimes kubeconfig points to a dead context.

Fix: run time kubectl get pods -A, then kubectl get --raw='/readyz?verbose'. If readyz is fine, test DNS inside cluster. If DNS is failing, fix resolv.conf/vpn split DNS policies or run without VPN for local cluster tasks.

3) Symptom: Postgres/MySQL in cluster is painfully slow

Root cause: PV or bind mounts are on /mnt/c or another slow boundary; fsync-heavy workloads amplify it.

Fix: keep PV data on the Linux filesystem; use a local-path provisioner that writes to /var inside WSL, not Windows mounts.

4) Symptom: WSL2 eats RAM and never gives it back

Root cause: Linux page cache + WSL2 reclaim behavior; big builds and image pulls fill cache; memory limit not configured.

Fix: set .wslconfig memory cap; restart WSL with wsl.exe --shutdown when needed; reduce cluster footprint and avoid running everything at once.

5) Symptom: Disk fills up “mysteriously”

Root cause: container image layers, build cache, leftover PV data, and logs; VHDX grows; pruning not done.

Fix: docker system df and docker builder prune --all; delete unused namespaces/PVs; keep an eye on df -h.

6) Symptom: Node becomes NotReady; pods stuck ContainerCreating

Root cause: CNI broken or kubelet/container runtime struggling due to I/O pressure; kind node container unhealthy.

Fix: check kind node container logs; check CNI pods in kube-system; fix disk pressure; recreate cluster if the node image is corrupted.

7) Symptom: Ingress works from WSL but not from Windows

Root cause: port forwarding expectations wrong; Windows firewall/VPN; confusion between WSL IP and Windows localhost.

Fix: decide on one access method (forwarded localhost vs WSL VM IP); document it; expose ingress with a predictable mapping; verify with curl from both sides.

8) Symptom: Builds inside containers are slower than builds on Windows

Root cause: source tree on Windows mount; heavy metadata ops across boundary; antivirus scanning on Windows path.

Fix: keep source inside Linux filesystem for builds; use Windows IDE via WSL integration; exclude build directories from Windows AV if policy allows.

Second and final short joke: If your dev cluster needs a runbook, congratulations—you’ve built a small production environment with worse funding.

Checklists / step-by-step plan

Setup plan (do this once per laptop)

  1. Set WSL2 limits in .wslconfig: cap memory and CPU; enable reasonable swap.
  2. Enable systemd in WSL (if supported) and standardize on it across your team.
  3. Choose one runtime: Docker Engine inside WSL2 is fine; avoid mixing Docker Desktop + WSL Docker unless you enjoy ambiguity.
  4. Choose one Kubernetes tool: kind (recommended) or k3d. Pick one and standardize cluster creation scripts.
  5. Keep cluster data in Linux FS: ensure container runtime storage and PV paths are not on /mnt/c.
  6. Define profiles: “core” default; “full” opt-in. Your laptop is not a staging cluster.
  7. Pin versions: kind node image version, chart versions, and critical addons.
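Steps 4 and 7 combine naturally into a config file you commit to the repo. This is a minimal sketch using kind’s Cluster schema; the cluster name and node image version here are examples, not mandates:

```yaml
# kind-dev.yaml - create with: kind create cluster --config kind-dev.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
name: dev
nodes:
  - role: control-plane
    image: kindest/node:v1.29.2   # pin this; upgrade deliberately, not by accident
```

With the file in version control, “what Kubernetes version are you on?” stops being a question.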

Daily workflow checklist (stay fast, stay sane)

  1. Before heavy work: free -h and df -h /. If you’re low, prune first.
  2. After a big build day: docker system df. If build cache is huge, prune it.
  3. When something feels slow: run the DNS smoke test and /readyz checks before changing anything.
  4. Keep your repo where your tools are fastest: if builds are in Linux, keep the working copy in Linux.

Weekly maintenance checklist (boring, effective)

  1. Prune build cache: docker builder prune --all.
  2. Prune unused images and containers: docker system prune (careful: understand what it deletes).
  3. Delete unused namespaces and PVs in the dev cluster.
  4. Recreate the cluster if it has accumulated too much drift. Tear down is a feature.
  5. Reclaim memory if Windows is tight: wsl.exe --shutdown.

FAQ

1) Should I use Docker Desktop or Docker Engine inside WSL2?

If you want the simplest Windows integration and your company standardizes on it, Docker Desktop is fine.
If you want fewer moving parts and clearer Linux behavior, use Docker Engine inside WSL2.
Pick one and commit; mixed setups create debugging folklore.

2) Why is storing data on /mnt/c such a problem?

Because you’re crossing a virtualization/interop boundary with different caching and metadata semantics.
Databases do lots of small writes and fsync calls. That path punishes them.
Keep heavy I/O inside the Linux filesystem and treat /mnt/c as “good for documents, not for hot data.”

3) kind or k3d: which is less likely to melt my laptop?

k3d often uses less memory at baseline because k3s is smaller.
kind is incredibly predictable and easy to pin versions. Both can be laptop-safe if you keep the workload set lean and set WSL2 limits.
My bias: kind for platform work and multi-node simulation; k3d for “just run the stack.”

4) How much RAM should I allocate to WSL2?

On 16GB total: 6–8GB is a good ceiling.
On 32GB: 12–16GB is fine.
If you allocate too much, you’ll hide bad pod limits and starve Windows apps in subtle ways.

5) Why does WSL2 keep memory after I stop workloads?

Linux aggressively uses memory for filesystem cache. That’s normally good.
WSL2’s reclamation back to Windows can be slower than you’d like.
If you need RAM back immediately, shut down WSL; if you want long-term sanity, constrain memory and reduce background churn.

6) Why are my kubectl commands slow only when the VPN is on?

VPN clients commonly inject DNS resolvers and routing rules.
Kubernetes relies on DNS internally, and kubectl relies on reliable connectivity to the API endpoint.
Diagnose with the DNS smoke test and readyz checks; then decide whether to split-tunnel, adjust resolvers, or run local cluster tasks off-VPN.

7) How do I expose services to Windows from a cluster running in WSL2?

Decide if you’re using forwarded localhost ports or the WSL VM IP.
For predictable dev UX, many teams port-forward (kubectl port-forward) or run an ingress that maps to known ports.
Document the method so your team doesn’t debug “localhost” for fun.

8) Can I run stateful workloads (Postgres, Kafka) locally in WSL2 Kubernetes?

Yes, but be realistic. Postgres is fine for dev if its data lives on the Linux filesystem and you don’t run five other heavy stacks.
Kafka is possible, but it’s often where “local parity” becomes “local punishment.” Consider lighter substitutes unless you’re debugging Kafka-specific behavior.

9) How often should I recreate my local cluster?

If you’re doing platform work with lots of CRDs and controller installs, recreating weekly or biweekly is normal.
If your cluster is stable and you keep it lean, you can run it longer.
Recreate immediately when you suspect drift: weird DNS, stuck webhooks, or mysterious admission failures.

10) What’s the most common root cause of “everything is slow”?

Storage. Either the workload is on the wrong filesystem path, or the disk is near full, or the system is writing logs like it’s paid by the line.
Memory is second. DNS is third. Kubernetes itself is rarely the first cause; it’s just the stage where the problem performs.

Next steps you can do today

  1. Set your WSL2 limits (memory, CPU, swap). If you do only one thing, do this.
  2. Move hot data off /mnt/c: container runtime storage, PVs, databases, build caches—keep them in Linux filesystem.
  3. Pick a lean default cluster profile: kind or k3d with only core addons. Make “full stack” an opt-in profile.
  4. Adopt the fast diagnosis playbook: check host pressure, then control plane health, then top consumers. Stop guessing.
  5. Schedule boring maintenance: prune caches, delete unused namespaces, and recreate the cluster when drift sets in.

The endgame isn’t to build the most impressive local cluster. It’s to build one that behaves consistently under stress.
Predictability is what keeps your laptop cool—and your brain cooler.
