The promise is seductive: keep Windows for everything corporate, keep Linux for everything useful, and still hit the GPU like you’re on a real workstation.
Then you run nvidia-smi in WSL and it says “not found”, or worse, “No devices were found”.
This is the reality: WSL GPU support is solid now, but it’s picky. You don’t “install CUDA in WSL” the way you did on bare metal.
You assemble a chain of dependencies across Windows, the WSL kernel, the Linux userland, and sometimes Docker. One weak link and you get a slow, confusing failure.
The mental model: what “GPU in WSL” actually means
WSL2 is not “Linux on Windows” in the cute sense. It’s a lightweight VM. Your Linux processes run in a real Linux kernel (Microsoft-shipped),
and they see hardware through a virtualization boundary.
For GPU compute, WSL2 uses a paravirtualized GPU interface. On modern Windows builds, the Windows GPU driver exposes a compute path to the WSL VM.
Linux processes don’t talk to a physical PCI device; they talk to a virtual device node that forwards work to the host driver.
The Windows driver is the source of truth. In practice, that means:
- You do not install a full Linux NVIDIA kernel driver in WSL for the GPU. If you try, you’ll usually make things worse.
- WSL’s libcuda comes from a special integration layer, not from your distro’s standard kernel-module stack.
- Most failures are mismatches between Windows driver version, WSL kernel capability, and userland libraries (CUDA toolkit / ML frameworks).
Think of WSL GPU support as “remote CUDA to the Windows driver over a very fast local channel”, not as “Linux owns the GPU”.
This mental shift prevents the classic mistake: chasing Linux kernel modules that don’t even apply.
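If you want to see that boundary for yourself, a short probe is enough. The following is a minimal sketch (hypothetical file name check_dxg.py, run with Python 3 inside WSL): it checks for the paravirtual device node and the WSL-provided library directory, and confirms that the state a native Linux NVIDIA kernel driver would create is absent.
#!/usr/bin/env python3
"""Tiny probe of the WSL GPU model: a /dev/dxg bridge device instead of a
native NVIDIA kernel driver. Paths are the commonly documented ones; adjust
if your environment differs."""
import os

checks = {
    "/dev/dxg (WSL GPU paravirtual device)": "/dev/dxg",
    "/proc/driver/nvidia (native kernel driver state)": "/proc/driver/nvidia",
    "/usr/lib/wsl/lib (WSL-provided CUDA userland)": "/usr/lib/wsl/lib",
}

for label, path in checks.items():
    status = "present" if os.path.exists(path) else "absent"
    print(f"{status:8} {label}")

# Expected on a healthy WSL2 GPU setup (an assumption, not a guarantee):
# /dev/dxg and /usr/lib/wsl/lib present, /proc/driver/nvidia absent.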
Joke #1: WSL GPU troubleshooting is like herding cats—except the cats are drivers, and they all insist they’re already installed.
The real requirements (what must be true)
1) Your Windows build must support WSL2 GPU compute
GPU compute in WSL2 depends on the Windows host stack: WDDM GPU drivers, WSL plumbing, and kernel support.
If Windows is too old, you’ll get a clean-looking WSL install with a completely dead GPU path.
Practical guidance: keep Windows updated. If you’re in an enterprise ring that lags, verify GPU-in-WSL support before you promise anything to your team.
This isn’t optional; it’s the foundation.
2) Your GPU vendor and driver must explicitly support WSL
In the real world, most people doing CUDA in WSL are on NVIDIA. AMD can work in certain scenarios (and DirectML exists),
but CUDA-in-WSL is primarily an NVIDIA story.
The key requirement is the Windows NVIDIA driver version that includes WSL support.
Installing a random “Studio” or “Game Ready” driver is fine as long as it’s new enough and includes the WSL compute components.
Old drivers will happily render games while refusing to expose compute to WSL.
3) WSL2, not WSL1
WSL1 translates Linux syscalls. It’s clever, but it’s not the platform for GPU compute. You want WSL2 with the real kernel.
If your distribution is still on WSL1, stop. Convert it. Don’t debug anything else until you do.
4) A supported WSL kernel and WSL version
There are two moving pieces: the Windows OS build and the WSL component itself (which now updates more like an app).
The GPU path lives in that space. If you’re on a stale WSL version, you can see weird mismatches: Windows driver looks correct, but WSL lacks the glue.
5) Userland libraries must match the framework expectations
Here’s where people shoot themselves in the foot: they install the full CUDA toolkit in WSL as if they were on Ubuntu bare metal,
then overwrite key libraries, and then wonder why nvidia-smi behaves but PyTorch doesn’t.
Decide what you actually need:
- If you only run PyTorch or TensorFlow: you often don’t need the full CUDA toolkit; you need the right framework build and compatible userland libs.
- If you compile custom CUDA code: you need the toolkit inside WSL, but you still should not install a Linux kernel driver.
6) Docker adds another dependency layer
Docker in WSL is common—and a frequent failure amplifier. Now you have:
Windows driver → WSL GPU interface → WSL userland → Docker engine → NVIDIA container runtime → your container image.
One mismatch and you lose the GPU, or you get a container that sees the GPU but can’t run kernels.
Facts and historical context (why it’s weird)
These aren’t trivia for trivia’s sake. They explain today’s design and the failure modes you’ll actually see.
- WSL1 (2016) wasn’t a VM. It translated syscalls. Great for CLI tools; not a natural fit for GPUs and kernel-mode drivers.
- WSL2 (2019) switched to a real Linux kernel. That made container workloads and kernel-dependent tooling practical.
- GPU compute in WSL2 arrived via paravirtualization, not PCI passthrough. Your Linux guest doesn’t “own” the GPU device like a typical hypervisor passthrough setup.
- The Windows display driver model (WDDM) is in the critical path. If WDDM components are out of date, compute can fail even if graphics work fine.
- NVIDIA’s WSL support included a dedicated user-mode CUDA path. This is why the Windows driver version matters more than the Linux package manager’s idea of “latest”.
- nvidia-smi in WSL is a compatibility signal, not the whole truth. It can succeed while your ML framework fails due to cuDNN/cuBLAS mismatches.
- Early WSL GPU support was constrained by Windows Insider builds. A lot of “it works on my machine” lore comes from that era.
- DirectML became a pragmatic alternative for some Windows-first ML workloads. But if you want mainstream CUDA ecosystems, you’re still in NVIDIA land.
Practical tasks: commands, outputs, decisions
These are the production-grade checks. Not the “try reinstalling everything” rituals.
Each task includes a command, what typical output means, and what decision you make next.
Task 1: Confirm WSL version and basic health (Windows side)
cr0x@server:~$ wsl.exe --version
WSL version: 2.1.5.0
Kernel version: 5.15.146.1-2
WSLg version: 1.0.61
MSRDC version: 1.2.5326
Direct3D version: 1.611.1-81528511
DXCore version: 10.0.26052.1000-240405-2035.ge-release
Windows version: 10.0.22631.3007
Interpretation: You want a modern WSL version and a 5.15+ kernel in most environments. The exact numbers vary, but “ancient” is obvious.
Decision: If wsl.exe --version fails or shows very old components, update WSL before touching CUDA or frameworks.
Task 2: Confirm your distro runs as WSL2 (not WSL1)
cr0x@server:~$ wsl.exe -l -v
NAME STATE VERSION
* Ubuntu-22.04 Running 2
Debian Stopped 2
Interpretation: VERSION must be 2.
Decision: If it’s 1, convert: wsl.exe --set-version <distro> 2. Don’t debug GPU on WSL1.
Task 3: Verify Windows can see the NVIDIA driver (host sanity)
cr0x@server:~$ nvidia-smi.exe
Tue Feb 6 09:14:22 2026
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 551.23 Driver Version: 551.23 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA RTX 4090 On | 00000000:01:00.0 On | N/A |
| 0% 44C P8 21W / 450W | 987MiB / 24564MiB | 2% Default |
+-----------------------------------------+------------------------+----------------------+
Interpretation: Windows driver is installed and the GPU is visible. The CUDA Version shown here is the driver capability, not your WSL toolkit.
Decision: If this fails on Windows, stop and fix the driver on Windows first. WSL won’t magically improve it.
Task 4: Verify the GPU device node exists inside WSL
cr0x@server:~$ ls -l /dev/dxg
crw-rw-rw- 1 root root 509, 0 Feb 6 09:10 /dev/dxg
Interpretation: /dev/dxg is the WSL GPU bridge device. No /dev/dxg, no GPU compute path.
Decision: If it’s missing, suspect outdated WSL/kernel, wrong Windows build, or GPU driver lacking WSL support.
Task 5: Confirm WSL can call into NVIDIA management stack
cr0x@server:~$ nvidia-smi
Tue Feb 6 09:16:05 2026
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 551.23 Driver Version: 551.23 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA RTX 4090 On | 00000000:01:00.0 On | N/A |
| 0% 45C P8 23W / 450W | 992MiB / 24564MiB | 1% Default |
+-----------------------------------------+------------------------+----------------------+
Interpretation: This is a necessary but not sufficient check. It proves the management stack works through WSL.
Decision: If this fails, don’t waste time reinstalling PyTorch yet. Fix the driver/WSL integration first.
Task 6: Check whether you accidentally installed a Linux kernel NVIDIA driver (usually bad in WSL)
cr0x@server:~$ lsmod | grep -E 'nvidia|nouveau' || true
nouveau 2539520 0
drm_ttm_helper 16384 1 nouveau
ttm 106496 2 drm_ttm_helper,nouveau
drm_kms_helper 315392 1 nouveau
Interpretation: Seeing nouveau or nvidia kernel modules loaded in WSL is a red flag.
In typical WSL GPU setups, you should not be using those kernel modules for the paravirtual path.
Decision: If these are loaded, remove/blacklist them and revert to the supported WSL integration model. This is a common self-inflicted outage.
Task 7: Verify libcuda.so resolution inside WSL
cr0x@server:~$ ldconfig -p | grep -E 'libcuda\.so|libnvidia-ml\.so' | head
libcuda.so.1 (libc6,x86-64) => /usr/lib/wsl/lib/libcuda.so.1
libnvidia-ml.so.1 (libc6,x86-64) => /usr/lib/wsl/lib/libnvidia-ml.so.1
Interpretation: In WSL, you typically want CUDA stubs and integration libraries under /usr/lib/wsl/lib.
If libcuda.so.1 resolves somewhere else (like /usr/lib/x86_64-linux-gnu from a distro package), you may have library conflicts.
Decision: If resolution is wrong, fix the library pathing and remove conflicting packages before you debug frameworks.
Task 8: Confirm the environment is not sabotaging library load paths
cr0x@server:~$ env | grep -E 'LD_LIBRARY_PATH|CUDA_HOME|CUDA_PATH' || true
LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64
CUDA_HOME=/usr/local/cuda
Interpretation: This can be fine if you intentionally installed a matching toolkit.
It can also force your apps to pick up the wrong libraries and ignore /usr/lib/wsl/lib.
Decision: If you’re just running framework wheels/conda packages, consider unsetting these variables and letting the framework manage dependencies.
Task 9: Validate GPU access from a framework (PyTorch example)
cr0x@server:~$ python3 -c "import torch; print(torch.__version__); print('cuda:', torch.version.cuda); print('is_available:', torch.cuda.is_available()); print('device:', torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'none')"
2.2.1
cuda: 12.1
is_available: True
device: NVIDIA RTX 4090
Interpretation: This is the money test. If nvidia-smi works but torch.cuda.is_available() is false, you likely have userland library mismatches.
Decision: If false, inspect ldd on the relevant CUDA libs, and verify you installed a CUDA-enabled build of the framework.
Task 10: Inspect dynamic linking for a CUDA library (catch “wrong libcuda” fast)
cr0x@server:~$ python3 -c "import ctypes; import os; print(ctypes.CDLL('libcuda.so.1'))"
<CDLL 'libcuda.so.1', handle 55b7f6d8f900 at 0x7f2df0b7f2d0>
Interpretation: If this throws “cannot open shared object file”, your library path is broken.
Decision: If it fails, return to Task 7 and Task 8. Don’t reinstall the world; fix linkage.
Task 11: Check WSL memory and swap configuration (silent performance killer)
cr0x@server:~$ free -h
total used free shared buff/cache available
Mem: 31Gi 4.2Gi 21Gi 136Mi 5.8Gi 26Gi
Swap: 16Gi 0B 16Gi
Interpretation: WSL uses dynamic memory, but it can still hit swap or memory pressure under training workloads.
If your dataset pipeline is memory-hungry, GPU utilization can crater while the CPU churns on paging.
Decision: If swap is active during training, fix memory limits and data pipeline buffering. Don’t blame CUDA first.
Task 12: Measure GPU utilization while running a workload
cr0x@server:~$ nvidia-smi dmon -s pucm -d 1
# gpu pwr gtemp sm mem enc dec mclk pclk
# Idx W C % % % % MHz MHz
0 85 62 97 71 0 0 10501 2520
0 92 63 98 73 0 0 10501 2520
Interpretation: If sm is low and your job is “training”, you’re not GPU-bound. You’re bottlenecked upstream (CPU, I/O, data loader).
Decision: Low utilization means you should profile input pipeline, storage, and CPU threads before touching GPU settings.
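You can confirm that verdict from inside the training script by splitting each step into “waiting for the loader” and “GPU work”. A minimal PyTorch sketch, assuming a CUDA-enabled build and an existing DataLoader, model, loss_fn, and optimizer (all illustrative names):
import time
import torch

def profile_steps(loader, model, loss_fn, optimizer, device="cuda", steps=50):
    """Rough split of per-step time into 'waiting for data' vs 'GPU work'.
    Assumes the loader yields at least `steps` (input, target) batches."""
    model.to(device)
    data_wait = gpu_work = 0.0
    batches = iter(loader)
    for _ in range(steps):
        t0 = time.perf_counter()
        inputs, targets = next(batches)      # blocks until the loader has a batch ready
        t1 = time.perf_counter()
        inputs, targets = inputs.to(device), targets.to(device)
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), targets)
        loss.backward()
        optimizer.step()
        torch.cuda.synchronize()             # make the GPU portion measurable
        t2 = time.perf_counter()
        data_wait += t1 - t0
        gpu_work += t2 - t1
    print(f"data wait: {data_wait:.2f}s   gpu+transfer work: {gpu_work:.2f}s over {steps} steps")
If data_wait dominates, the low sm number from dmon is exactly what you should expect; fix the pipeline, not CUDA.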
Task 13: Check disk and filesystem type (I/O bottleneck audit)
cr0x@server:~$ df -Th / /mnt/c
Filesystem Type Size Used Avail Use% Mounted on
/dev/sdd ext4 250G 42G 196G 18% /
C:\ drvfs 953G 611G 342G 65% /mnt/c
Interpretation: WSL’s native ext4 filesystem (/) behaves like Linux. /mnt/c is DrvFS and has different performance characteristics.
Many ML workloads get wrecked by dataset reads from /mnt/c.
Decision: Keep training datasets and heavy I/O inside the WSL ext4 filesystem. Use /mnt/c for convenience, not performance.
Task 14: Quick and dirty I/O benchmark where your dataset lives
cr0x@server:~$ dd if=/dev/zero of=./io-test.bin bs=1M count=2048 oflag=direct status=progress
2147483648 bytes (2.1 GB, 2.0 GiB) copied, 1.92 s, 1.1 GB/s
2048+0 records in
2048+0 records out
2147483648 bytes (2.1 GB, 2.0 GiB) copied, 1.92 s, 1.1 GB/s
Interpretation: This tests sequential write throughput. If you run the same on /mnt/c and get dramatically worse numbers, you found a bottleneck.
Decision: Move data, or redesign the pipeline (tar shards, caching, fewer tiny file stats) if you must stay on DrvFS.
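Sequential throughput is only part of the picture; DrvFS tends to hurt worst on metadata-heavy access. A crude small-file comparison, assuming Python 3 and example paths (adjust /mnt/c/Temp to a Windows directory you can actually write to):
#!/usr/bin/env python3
"""Create and stat many tiny files to compare metadata overhead between a
WSL ext4 path and a DrvFS path. Paths below are examples; adjust to taste."""
import os
import tempfile
import time

def tiny_file_bench(base_dir, count=2000):
    start = time.perf_counter()
    with tempfile.TemporaryDirectory(dir=base_dir) as d:
        for i in range(count):
            with open(os.path.join(d, f"f{i:05d}.bin"), "wb") as f:
                f.write(b"x" * 512)
        for i in range(count):
            os.stat(os.path.join(d, f"f{i:05d}.bin"))
    return time.perf_counter() - start

for base in ("/tmp", "/mnt/c/Temp"):   # ext4 vs DrvFS (example paths)
    if os.path.isdir(base):
        print(f"{base}: {tiny_file_bench(base):.2f}s for create+stat of 2000 files")
    else:
        print(f"{base}: skipped (directory not found)")
If the DrvFS number comes out an order of magnitude worse, that is the “millions of small files” problem in miniature.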
Task 15: Check container GPU visibility (if you use Docker)
cr0x@server:~$ docker run --rm --gpus all nvidia/cuda:12.3.2-base-ubuntu22.04 nvidia-smi
Tue Feb 6 09:22:18 2026
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 551.23 Driver Version: 551.23 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
|=========================================+========================+======================|
| 0 NVIDIA RTX 4090 On | 00000000:01:00.0 On | N/A |
+---------------------------------------------------------------------------------------+
Interpretation: If this fails but WSL nvidia-smi works, your issue is in the container runtime path (toolkit/runtime integration).
Decision: Fix NVIDIA container runtime configuration and ensure Docker Desktop/Engine is wired to WSL correctly.
Task 16: Validate CPU-side bottlenecks (because GPU is often innocent)
cr0x@server:~$ sudo apt-get update >/dev/null 2>&1; sudo apt-get install -y sysstat >/dev/null 2>&1; mpstat -P ALL 1 3
Linux 5.15.146.1-microsoft-standard-WSL2 (server) 02/06/2026 _x86_64_ (24 CPU)
09:23:10 AM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
09:23:11 AM  all   22.40    0.00    6.10   58.30    0.00    0.40    0.00    0.00    0.00   12.80
09:23:12 AM  all   24.10    0.00    5.80   60.20    0.00    0.30    0.00    0.00    0.00    9.60
Interpretation: High %iowait says you’re waiting on storage, not compute. That’s the classic “GPU at 10%” mystery.
Decision: Fix I/O (Task 13/14), data loader parallelism, and file layout before you start tuning CUDA flags.
Fast diagnosis playbook (find the bottleneck quickly)
When someone pings you with “WSL GPU is broken,” they usually mean one of three things:
no GPU visibility, GPU visible but frameworks fail, or GPU works but performance is trash.
This playbook gets you to the right bucket fast.
First: Is this a Windows driver / WSL integration failure?
- Run nvidia-smi.exe on Windows. If it fails, stop. Fix the Windows driver installation.
- In WSL, check ls -l /dev/dxg. If it's missing, stop. Update WSL and confirm the Windows build/driver supports WSL.
- In WSL, run nvidia-smi. If it fails but /dev/dxg exists, suspect library conflicts (libnvidia-ml/libcuda) or broken WSL integration.
Second: Is it a userland mismatch (framework/toolkit confusion)?
- Check ldconfig -p | grep libcuda and ensure it points to /usr/lib/wsl/lib.
- Test the framework directly (PyTorch/TensorFlow minimal GPU probe).
- If the framework fails but nvidia-smi works, fix versions: framework build vs CUDA userland vs any pinned LD_LIBRARY_PATH.
Third: Is it performance (data pipeline / storage / CPU limits)?
- While running the workload: nvidia-smi dmon -s pucm -d 1. If sm is low, you’re not GPU-bound.
- Check where data lives: df -Th. If training reads from /mnt/c, move it.
- Watch %iowait with mpstat. High iowait: storage bottleneck. Low iowait but CPU pegged: data loader / preprocessing is the limiter.
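The three checks above are easy to script. Here is a minimal triage sketch (hypothetical file name wsl_gpu_triage.py); it reuses only the probes already shown in the tasks and is a starting point, not a support tool.
#!/usr/bin/env python3
"""Rough triage for 'WSL GPU is broken': integration problem, userland
problem, or 'go look at performance'."""
import ctypes
import os
import shutil
import subprocess

def nvidia_smi_ok():
    exe = shutil.which("nvidia-smi")
    if not exe:
        return False
    return subprocess.run([exe], capture_output=True).returncode == 0

if not os.path.exists("/dev/dxg"):
    print("Bucket 1: no /dev/dxg -> Windows driver / WSL integration problem.")
elif not nvidia_smi_ok():
    print("Bucket 1/2 boundary: /dev/dxg exists but nvidia-smi fails -> "
          "suspect library conflicts or broken WSL integration.")
else:
    try:
        ctypes.CDLL("libcuda.so.1")
        print("Bucket 3 candidate: driver path and libcuda load fine. "
              "If training is still slow, profile the data pipeline.")
    except OSError as err:
        print(f"Bucket 2: libcuda.so.1 failed to load ({err}) -> userland mismatch.")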
Paraphrased idea from Werner Vogels (Amazon CTO): “Everything fails; reliability comes from designing and operating for that truth.”
Three corporate mini-stories from the trenches
Incident: the wrong assumption (“WSL is just Ubuntu, right?”)
A data science team had a Windows standard image and used WSL2 for dev. They hired a new ML engineer who had done years of CUDA on bare-metal Ubuntu.
The engineer did what any competent person would do: installed the NVIDIA Linux driver and a matching CUDA toolkit in the WSL distro.
The symptoms were beautifully confusing. nvidia-smi worked on Windows, but inside WSL it alternated between “No devices were found” and a segfault.
Sometimes it worked after a reboot. Sometimes it worked until Docker started. The team burned a week in Slack archaeology.
The root cause was simple: they had installed packages that tried to manage a kernel driver that WSL wasn’t supposed to use.
Library files got overwritten, module loading logic ran into a kernel boundary, and the environment became nondeterministic depending on what loaded first.
The fix was boring: remove the Linux NVIDIA driver packages, purge stale CUDA repos, and restore the WSL-provided /usr/lib/wsl/lib stack.
After that, they installed only the userland bits they actually needed for building code, and pinned the framework versions.
What changed culturally was more important than the commands: they stopped treating WSL like a “real Linux host” and started treating it like a specialized runtime.
The incident writeup literally had one line highlighted: “The Windows driver is the driver.”
Optimization that backfired: “Let’s keep datasets on C: so OneDrive backs them up”
A team wanted a single source of truth for datasets across Windows tools and WSL notebooks.
They stored everything under C:\Users\...\datasets and accessed it from WSL via /mnt/c.
It looked clean. It made compliance people happy. It made training painfully slow.
The GPU graphs told the story: utilization oscillated, training time doubled, and random stalls appeared.
The team assumed GPU virtualization overhead. They tweaked batch size, enabled mixed precision, changed CUDA versions, and even swapped GPUs.
Nothing stuck.
The real villain was file I/O behavior under DrvFS, amplified by a dataset layout with millions of small files and metadata-heavy access patterns.
Add corporate endpoint protection scanning and cloud sync behavior, and you get death by a thousand tiny stat() calls.
They fixed it by moving hot datasets into the WSL ext4 filesystem and exporting only finalized artifacts back to Windows paths.
For datasets that had to live on Windows, they repacked them into sharded archives to reduce metadata churn.
The “optimization” was reversed, and performance came back like someone took a boot off the data loader’s neck.
Boring but correct practice: version pinning and a smoke test saved the day
Another org had a proper platform team. Not glamorous. Mostly spreadsheets and guardrails.
They built a standard WSL base image: Windows driver version range, WSL minimum version, Ubuntu distro version, and a blessed set of ML frameworks.
Every Monday, a scheduled job ran a smoke test on a handful of machines: check /dev/dxg, run nvidia-smi, import PyTorch, run a tiny CUDA kernel,
and record the results. No deep benchmarking—just “does the chain still work.”
One week, Windows updates rolled out and the smoke test caught that a subset of machines had an older OEM GPU driver silently reinstalled.
Users hadn’t noticed yet, because graphics were fine. Compute would have been the first thing to break during a deadline.
The fix was a driver enforcement policy and an automated remediation script. The key win wasn’t the script.
It was the existence of a routine test that treated GPU-in-WSL as an operational dependency, not a developer superstition.
Joke #2: The most reliable GPU setup is the one nobody “improves” on a Friday afternoon.
Common mistakes: symptom → root cause → fix
1) Symptom: nvidia-smi in WSL says “command not found”
- Root cause: NVIDIA userland tools not installed in the distro, or PATH missing.
- Fix: Install the appropriate userland package set for your distro, or use framework-level tests. Don’t install kernel drivers. Ensure /usr/lib/wsl/lib exists.
2) Symptom: nvidia-smi in WSL says “No devices were found”
- Root cause: Missing WSL GPU bridge (/dev/dxg), unsupported Windows driver, or outdated WSL/kernel.
- Fix: Update Windows, update WSL, and install a Windows NVIDIA driver that supports WSL compute. Validate with ls -l /dev/dxg.
3) Symptom: /dev/dxg missing
- Root cause: Not on WSL2, WSL component too old, or Windows build doesn’t support GPU compute path.
- Fix: Convert distro to WSL2, update WSL, and ensure host OS meets GPU-in-WSL requirements.
4) Symptom: nvidia-smi works, but PyTorch says CUDA not available
- Root cause: Installed a CPU-only PyTorch build, or userland CUDA libraries mismatched/overridden by LD_LIBRARY_PATH.
- Fix: Install a CUDA-enabled framework build; remove conflicting CUDA libs; ensure libcuda.so.1 resolves to /usr/lib/wsl/lib (a quick probe for this case follows the list).
5) Symptom: Docker container can’t see GPU, but WSL can
- Root cause: NVIDIA container runtime not configured, Docker not using the WSL engine correctly, or a missing --gpus all flag.
- Fix: Repair the Docker + NVIDIA runtime integration; re-test with a known CUDA base image and nvidia-smi inside the container.
6) Symptom: Training is slow; GPU utilization is low and spiky
- Root cause: Data pipeline bottleneck (I/O, CPU preprocessing, too few workers), often worsened by using /mnt/c.
- Fix: Move datasets into WSL ext4; increase data loader workers; shard files; watch %iowait and GPU sm during runs.
7) Symptom: Random segfaults or “illegal instruction” after installing CUDA toolkit
- Root cause: Library conflicts between distro CUDA packages and WSL-provided integration libs; occasionally mixing repo versions.
- Fix: Audit ldconfig -p, remove conflicting packages, and avoid overriding critical libs with LD_LIBRARY_PATH.
8) Symptom: GPU visible, but you get out-of-memory earlier than expected
- Root cause: Misreading VRAM usage (framework caching), or WSL memory pressure causing CPU-side issues that look like GPU issues.
- Fix: Use nvidia-smi to inspect VRAM and free -h to inspect host memory/swap behavior; tune batch size and data pipeline memory.
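For mistake 4 in the list above, it helps to distinguish “I installed a CPU-only wheel” from “the wheel is fine but the userland is broken”. A quick probe, assuming PyTorch is the framework in question:
import torch

# A CPU-only wheel reports no CUDA at build time; a CUDA wheel with a broken
# userland reports a CUDA version but still fails the availability check.
print("torch:", torch.__version__)
print("built with CUDA:", torch.version.cuda)     # None => CPU-only build
print("cuda available:", torch.cuda.is_available())

if torch.version.cuda is None:
    print("-> Reinstall a CUDA-enabled build; no amount of library fixing will help.")
elif not torch.cuda.is_available():
    print("-> CUDA build, but the runtime can't see the GPU: check libcuda "
          "resolution and LD_LIBRARY_PATH (Tasks 7, 8, and 10).")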
Checklists / step-by-step plan
Checklist A: Clean baseline for GPU compute in WSL (NVIDIA-centric)
- Windows: Install a current NVIDIA driver with WSL compute support. Verify with nvidia-smi.exe.
- Windows: Ensure WSL is current: wsl.exe --version must show modern components.
- WSL: Ensure your distro is WSL2: wsl.exe -l -v.
- WSL: Confirm the GPU bridge device: ls -l /dev/dxg.
- WSL: Confirm the management stack: nvidia-smi.
- WSL: Confirm library resolution: ldconfig -p | grep libcuda should point to /usr/lib/wsl/lib.
- WSL: Run a framework smoke test (PyTorch/TensorFlow); a minimal version is sketched below.
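That framework smoke test doesn’t need to be elaborate. Here is a minimal sketch of the “tiny CUDA kernel” check, assuming a CUDA-enabled PyTorch build (matrix size and tolerance are arbitrary):
import sys
import torch

# End-to-end check: allocate on the GPU, run a kernel, and verify the result
# against the CPU. Exit non-zero so a scheduler can alert on failure.
torch.backends.cuda.matmul.allow_tf32 = False   # keep the comparison apples-to-apples

if not torch.cuda.is_available():
    sys.exit("FAIL: torch.cuda.is_available() is False")

a = torch.randn(1024, 1024)
b = torch.randn(1024, 1024)
gpu = (a.cuda() @ b.cuda()).cpu()
cpu = a @ b

# Loose tolerance: we only care that the kernel ran and produced sane output.
if not torch.allclose(gpu, cpu, atol=1e-2):
    sys.exit("FAIL: GPU matmul result diverged from CPU reference")

print("OK:", torch.cuda.get_device_name(0))
Run it from a weekly scheduled task and you get the Monday smoke test from the third story essentially for free.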
Checklist B: Performance sanity for ML workloads
- Keep datasets under WSL ext4, not /mnt/c. Verify with df -Th.
- During training, watch the GPU: nvidia-smi dmon -s pucm -d 1.
- During training, watch CPU iowait: mpstat -P ALL 1.
- If iowait is high, benchmark the disk path with dd (or better tools later) and fix the data layout.
- If CPU is pegged but iowait is low, reduce preprocessing cost, increase worker threads/processes, and cache decoded data.
Checklist C: Docker path (only if you need it)
- First prove the GPU works in WSL without containers (nvidia-smi and a framework test).
- Then test a known CUDA base image with docker run --gpus all ... nvidia-smi.
- If the container fails, treat it as a runtime integration issue, not a GPU issue.
- Pin container base image CUDA version to what your framework expects. Avoid “latest” unless you enjoy archaeology.
FAQ
1) Do I need to install the NVIDIA Linux driver inside WSL?
Typically, no. WSL GPU compute relies on the Windows driver and a WSL integration layer.
Installing Linux kernel drivers in WSL is a common way to create library conflicts and weird instability.
2) If Windows nvidia-smi.exe works, why doesn’t WSL nvidia-smi?
Because “driver installed” is necessary but not sufficient. WSL needs the GPU bridge device (/dev/dxg) and correct userland libraries.
If /usr/lib/wsl/lib is being ignored or overwritten, WSL tools can fail.
3) What’s the single fastest check for “GPU available to WSL”?
Check ls -l /dev/dxg. If it’s missing, the compute path isn’t even present.
Then run nvidia-smi as a follow-up.
4) Why does GPU training run slower in WSL than native Linux sometimes?
Often it isn’t the GPU path—it’s storage and file I/O. Training from /mnt/c can be dramatically slower for metadata-heavy workloads.
Put data on the WSL ext4 filesystem and watch GPU utilization.
5) Can I use AMD GPUs for compute in WSL?
There are options (notably DirectML for certain stacks), but CUDA ecosystems are overwhelmingly NVIDIA-focused.
If your workload depends on CUDA libraries, plan on NVIDIA hardware and drivers.
6) Docker can’t see my GPU in WSL. What’s the usual culprit?
Missing or misconfigured GPU runtime integration for containers, or simply not using --gpus all.
Prove the GPU works outside Docker first, then debug the container layer.
7) Do I need the full CUDA toolkit in WSL to run PyTorch or TensorFlow?
Not always. Many framework distributions bundle the necessary CUDA userland libraries (or expect specific ones).
Install the toolkit only if you compile CUDA code or have a specific dependency that requires it.
8) Why does nvidia-smi show a CUDA version that doesn’t match my toolkit?
The CUDA version in nvidia-smi reflects the driver’s maximum supported CUDA capability.
Your toolkit/framework CUDA version is a separate userland component. Mismatch isn’t automatically wrong; incompatibility is.
9) Is WSL GPU support stable enough for production training?
For many teams, yes—especially for developer workstations and repeatable CI-like pipelines.
For “this is our only training cluster” situations, you still want the operational control of native Linux servers, but WSL can be perfectly credible for a lot of workflows.
Practical next steps
- Decide your target stack. Framework-only (simpler) versus custom CUDA compilation (needs toolkit). Stop mixing goals.
- Prove the chain in order. Windows driver → WSL2 → /dev/dxg → nvidia-smi → framework GPU probe → (optional) Docker probe.
- Move data off /mnt/c for training. If you do nothing else for performance, do this.
- Pin versions and add a smoke test. A one-minute automated check beats a week of “but it worked yesterday” every time.
If you treat WSL GPU like a system—with dependencies, contracts, and a verification routine—it behaves like one.
If you treat it like a magic trick, it will eventually do the classic trick: disappearing right before your deadline.