The promise is seductive: keep Windows for everything corporate, keep Linux for everything useful, and still hit the GPU like you’re on a real workstation.
Then you run nvidia-smi in WSL and it says “not found”, or worse, “No devices were found”.
This is the reality: WSL GPU support is solid now, but it’s picky. You don’t “install CUDA in WSL” the way you did on bare metal.
You assemble a chain of dependencies across Windows, the WSL kernel, the Linux userland, and sometimes Docker. One weak link and you get a slow, confusing failure.
The mental model: what “GPU in WSL” actually means
WSL2 is not “Linux on Windows” in the cute sense. It’s a lightweight VM. Your Linux processes run in a real Linux kernel (Microsoft-shipped),
and they see hardware through a virtualization boundary.
For GPU compute, WSL2 uses a paravirtualized GPU interface. On modern Windows builds, the Windows GPU driver exposes a compute path to the WSL VM.
Linux processes don’t talk to a physical PCI device; they talk to a virtual device node that forwards work to the host driver.
The Windows driver is the source of truth. In practice, that means:
- You do not install a full Linux NVIDIA kernel driver in WSL for the GPU. If you try, you’ll usually make things worse.
- WSL’s libcuda comes from a special integration layer, not from your distro’s standard kernel-module stack.
- Most failures are mismatches between Windows driver version, WSL kernel capability, and userland libraries (CUDA toolkit / ML frameworks).
Think of WSL GPU support as “remote CUDA to the Windows driver over a very fast local channel”, not as “Linux owns the GPU”.
This mental shift prevents the classic mistake: chasing Linux kernel modules that don’t even apply.
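If you want to see that boundary for yourself, a short probe is enough. The following is a minimal sketch (hypothetical file name check_dxg.py, run with Python 3 inside WSL): it checks for the paravirtual device node and the WSL-provided library directory, and confirms that the state a native Linux NVIDIA kernel driver would create is absent.
#!/usr/bin/env python3
"""Tiny probe of the WSL GPU model: a /dev/dxg bridge device instead of a
native NVIDIA kernel driver. Paths are the commonly documented ones; adjust
if your environment differs."""
import os

checks = {
    "/dev/dxg (WSL GPU paravirtual device)": "/dev/dxg",
    "/proc/driver/nvidia (native kernel driver state)": "/proc/driver/nvidia",
    "/usr/lib/wsl/lib (WSL-provided CUDA userland)": "/usr/lib/wsl/lib",
}

for label, path in checks.items():
    status = "present" if os.path.exists(path) else "absent"
    print(f"{status:8} {label}")

# Expected on a healthy WSL2 GPU setup (an assumption, not a guarantee):
# /dev/dxg and /usr/lib/wsl/lib present, /proc/driver/nvidia absent.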
Joke #1: WSL GPU troubleshooting is like herding cats—except the cats are drivers, and they all insist they’re already installed.
The real requirements (what must be true)
1) Your Windows build must support WSL2 GPU compute
GPU compute in WSL2 depends on the Windows host stack: WDDM GPU drivers, WSL plumbing, and kernel support.
If Windows is too old, you’ll get a clean-looking WSL install with a completely dead GPU path.
Practical guidance: keep Windows updated. If you’re in an enterprise ring that lags, verify GPU-in-WSL support before you promise anything to your team.
This isn’t optional; it’s the foundation.
2) Your GPU vendor and driver must explicitly support WSL
In the real world, most people doing CUDA in WSL are on NVIDIA. AMD can work in certain scenarios (and DirectML exists),
but CUDA-in-WSL is primarily an NVIDIA story.
The key requirement is the Windows NVIDIA driver version that includes WSL support.
Installing a random “Studio” or “Game Ready” driver is fine as long as it’s new enough and includes the WSL compute components.
Old drivers will happily render games while refusing to expose compute to WSL.
3) WSL2, not WSL1
WSL1 translates Linux syscalls. It’s clever, but it’s not the platform for GPU compute. You want WSL2 with the real kernel.
If your distribution is still on WSL1, stop. Convert it. Don’t debug anything else until you do.
4) A supported WSL kernel and WSL version
There are two moving pieces: the Windows OS build and the WSL component itself (which now updates more like an app).
The GPU path lives in that space. If you’re on a stale WSL version, you can see weird mismatches: Windows driver looks correct, but WSL lacks the glue.
5) Userland libraries must match the framework expectations
Here’s where people shoot themselves in the foot: they install the full CUDA toolkit in WSL as if they were on Ubuntu bare metal,
then overwrite key libraries, and then wonder why nvidia-smi behaves but PyTorch doesn’t.
Decide what you actually need:
- If you only run PyTorch or TensorFlow: you often don’t need the full CUDA toolkit; you need the right framework build and compatible userland libs.
- If you compile custom CUDA code: you need the toolkit inside WSL, but you still should not install a Linux kernel driver.
6) Docker adds another dependency layer
Docker in WSL is common—and a frequent failure amplifier. Now you have:
Windows driver → WSL GPU interface → WSL userland → Docker engine → NVIDIA container runtime → your container image.
One mismatch and you lose the GPU, or you get a container that sees the GPU but can’t run kernels.
Facts and historical context (why it’s weird)
These aren’t trivia for trivia’s sake. They explain today’s design and the failure modes you’ll actually see.
- WSL1 (2016) wasn’t a VM. It translated syscalls. Great for CLI tools; not a natural fit for GPUs and kernel-mode drivers.
- WSL2 (2019) switched to a real Linux kernel. That made container workloads and kernel-dependent tooling practical.
- GPU compute in WSL2 arrived via paravirtualization, not PCI passthrough. Your Linux guest doesn’t “own” the GPU device like a typical hypervisor passthrough setup.
- The Windows display driver model (WDDM) is in the critical path. If WDDM components are out of date, compute can fail even if graphics work fine.
- NVIDIA’s WSL support included a dedicated user-mode CUDA path. This is why the Windows driver version matters more than the Linux package manager’s idea of “latest”.
- nvidia-smi in WSL is a compatibility signal, not the whole truth. It can succeed while your ML framework fails due to cuDNN/cuBLAS mismatches.
- Early WSL GPU support was constrained by Windows Insider builds. A lot of “it works on my machine” lore comes from that era.
- DirectML became a pragmatic alternative for some Windows-first ML workloads. But if you want mainstream CUDA ecosystems, you’re still in NVIDIA land.
Practical tasks: commands, outputs, decisions
These are the production-grade checks. Not the “try reinstalling everything” rituals.
Each task includes a command, what typical output means, and what decision you make next.
Task 1: Confirm WSL version and basic health (Windows side)
cr0x@server:~$ wsl.exe --version
WSL version: 2.1.5.0
Kernel version: 5.15.146.1-2
WSLg version: 1.0.61
MSRDC version: 1.2.5326
Direct3D version: 1.611.1-81528511
DXCore version: 10.0.26052.1000-240405-2035.ge-release
Windows version: 10.0.22631.3007
Interpretation: You want a modern WSL version and a 5.15+ kernel in most environments. The exact numbers vary, but “ancient” is obvious.
Decision: If wsl.exe --version fails or shows very old components, update WSL before touching CUDA or frameworks.
Task 2: Confirm your distro runs as WSL2 (not WSL1)
cr0x@server:~$ wsl.exe -l -v
NAME STATE VERSION
* Ubuntu-22.04 Running 2
Debian Stopped 2
Interpretation: VERSION must be 2.
Decision: If it’s 1, convert: wsl.exe --set-version <distro> 2. Don’t debug GPU on WSL1.
Task 3: Verify Windows can see the NVIDIA driver (host sanity)
cr0x@server:~$ nvidia-smi.exe
Tue Feb 6 09:14:22 2026
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 551.23 Driver Version: 551.23 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA RTX 4090 On | 00000000:01:00.0 On | N/A |
| 0% 44C P8 21W / 450W | 987MiB / 24564MiB | 2% Default |
+-----------------------------------------+------------------------+----------------------+
Interpretation: Windows driver is installed and the GPU is visible. The CUDA Version shown here is the driver capability, not your WSL toolkit.
Decision: If this fails on Windows, stop and fix the driver on Windows first. WSL won’t magically improve it.
Task 4: Verify the GPU device node exists inside WSL
cr0x@server:~$ ls -l /dev/dxg
crw-rw-rw- 1 root root 509, 0 Feb 6 09:10 /dev/dxg
Interpretation: /dev/dxg is the WSL GPU bridge device. No /dev/dxg, no GPU compute path.
Decision: If it’s missing, suspect outdated WSL/kernel, wrong Windows build, or GPU driver lacking WSL support.
Task 5: Confirm WSL can call into NVIDIA management stack
cr0x@server:~$ nvidia-smi
Tue Feb 6 09:16:05 2026
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 551.23 Driver Version: 551.23 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA RTX 4090 On | 00000000:01:00.0 On | N/A |
| 0% 45C P8 23W / 450W | 992MiB / 24564MiB | 1% Default |
+-----------------------------------------+------------------------+----------------------+
Interpretation: This is a necessary but not sufficient check. It proves the management stack works through WSL.
Decision: If this fails, don’t waste time reinstalling PyTorch yet. Fix the driver/WSL integration first.
Task 6: Check whether you accidentally installed a Linux kernel NVIDIA driver (usually bad in WSL)
cr0x@server:~$ lsmod | grep -E 'nvidia|nouveau' || true
nouveau 2539520 0
drm_ttm_helper 16384 1 nouveau
ttm 106496 2 drm_ttm_helper,nouveau
drm_kms_helper 315392 1 nouveau
Interpretation: Seeing nouveau or nvidia kernel modules loaded in WSL is a red flag.
In typical WSL GPU setups, you should not be using those kernel modules for the paravirtual path.
Decision: If these are loaded, remove/blacklist them and revert to the supported WSL integration model. This is a common self-inflicted outage.
Task 7: Verify libcuda.so resolution inside WSL
cr0x@server:~$ ldconfig -p | grep -E 'libcuda\.so|libnvidia-ml\.so' | head
libcuda.so.1 (libc6,x86-64) => /usr/lib/wsl/lib/libcuda.so.1
libnvidia-ml.so.1 (libc6,x86-64) => /usr/lib/wsl/lib/libnvidia-ml.so.1
Interpretation: In WSL, you typically want CUDA stubs and integration libraries under /usr/lib/wsl/lib.
If libcuda.so.1 resolves somewhere else (like /usr/lib/x86_64-linux-gnu from a distro package), you may have library conflicts.
Decision: If resolution is wrong, fix the library pathing and remove conflicting packages before you debug frameworks.
Task 8: Confirm the environment is not sabotaging library load paths
cr0x@server:~$ env | grep -E 'LD_LIBRARY_PATH|CUDA_HOME|CUDA_PATH' || true
LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64
CUDA_HOME=/usr/local/cuda
Interpretation: This can be fine if you intentionally installed a matching toolkit.
It can also force your apps to pick up the wrong libraries and ignore /usr/lib/wsl/lib.
Decision: If you’re just running framework wheels/conda packages, consider unsetting these variables and letting the framework manage dependencies.
Task 9: Validate GPU access from a framework (PyTorch example)
cr0x@server:~$ python3 -c "import torch; print(torch.__version__); print('cuda:', torch.version.cuda); print('is_available:', torch.cuda.is_available()); print('device:', torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'none')"
2.2.1
cuda: 12.1
is_available: True
device: NVIDIA RTX 4090
Interpretation: This is the money test. If nvidia-smi works but torch.cuda.is_available() is false, you likely have userland library mismatches.
Decision: If false, inspect ldd on the relevant CUDA libs, and verify you installed a CUDA-enabled build of the framework.
Task 10: Inspect dynamic linking for a CUDA library (catch “wrong libcuda” fast)
cr0x@server:~$ python3 -c "import ctypes; import os; print(ctypes.CDLL('libcuda.so.1'))"
<CDLL 'libcuda.so.1', handle 55b7f6d8f900 at 0x7f2df0b7f2d0>
Interpretation: If this throws “cannot open shared object file”, your library path is broken.
Decision: If it fails, return to Task 7 and Task 8. Don’t reinstall the world; fix linkage.
Task 11: Check WSL memory and swap configuration (silent performance killer)
cr0x@server:~$ free -h
total used free shared buff/cache available
Mem: 31Gi 4.2Gi 21Gi 136Mi 5.8Gi 26Gi
Swap: 16Gi 0B 16Gi
Interpretation: WSL uses dynamic memory, but it can still hit swap or memory pressure under training workloads.
If your dataset pipeline is memory-hungry, GPU utilization can crater while the CPU churns on paging.
Decision: If swap is active during training, fix memory limits and data pipeline buffering. Don’t blame CUDA first.
Task 12: Measure GPU utilization while running a workload
cr0x@server:~$ nvidia-smi dmon -s pucm -d 1
# gpu pwr gtemp sm mem enc dec mclk pclk
# Idx W C % % % % MHz MHz
0 85 62 97 71 0 0 10501 2520
0 92 63 98 73 0 0 10501 2520
Interpretation: If sm is low and your job is “training”, you’re not GPU-bound. You’re bottlenecked upstream (CPU, I/O, data loader).
Decision: Low utilization means you should profile input pipeline, storage, and CPU threads before touching GPU settings.
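You can confirm that verdict from inside the training script by splitting each step into “waiting for the loader” and “GPU work”. A minimal PyTorch sketch, assuming a CUDA-enabled build and an existing DataLoader, model, loss_fn, and optimizer (all illustrative names):
import time
import torch

def profile_steps(loader, model, loss_fn, optimizer, device="cuda", steps=50):
    """Rough split of per-step time into 'waiting for data' vs 'GPU work'.
    Assumes the loader yields at least `steps` (input, target) batches."""
    model.to(device)
    data_wait = gpu_work = 0.0
    batches = iter(loader)
    for _ in range(steps):
        t0 = time.perf_counter()
        inputs, targets = next(batches)      # blocks until the loader has a batch ready
        t1 = time.perf_counter()
        inputs, targets = inputs.to(device), targets.to(device)
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), targets)
        loss.backward()
        optimizer.step()
        torch.cuda.synchronize()             # make the GPU portion measurable
        t2 = time.perf_counter()
        data_wait += t1 - t0
        gpu_work += t2 - t1
    print(f"data wait: {data_wait:.2f}s   gpu+transfer work: {gpu_work:.2f}s over {steps} steps")
If data_wait dominates, the low sm number from dmon is exactly what you should expect; fix the pipeline, not CUDA.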
Task 13: Check disk and filesystem type (I/O bottleneck audit)
cr0x@server:~$ df -Th / /mnt/c
Filesystem Type Size Used Avail Use% Mounted on
/dev/sdd ext4 250G 42G 196G 18% /
C:\ drvfs 953G 611G 342G 65% /mnt/c
Interpretation: WSL’s native ext4 filesystem (/) behaves like Linux. /mnt/c is DrvFS and has different performance characteristics.
Many ML workloads get wrecked by dataset reads from /mnt/c.
Decision: Keep training datasets and heavy I/O inside the WSL ext4 filesystem. Use /mnt/c for convenience, not performance.
Task 14: Quick and dirty I/O benchmark where your dataset lives
cr0x@server:~$ dd if=/dev/zero of=./io-test.bin bs=1M count=2048 oflag=direct status=progress
2147483648 bytes (2.1 GB, 2.0 GiB) copied, 1.92 s, 1.1 GB/s
2048+0 records in
2048+0 records out
2147483648 bytes (2.1 GB, 2.0 GiB) copied, 1.92 s, 1.1 GB/s
Interpretation: This tests sequential write throughput. If you run the same on /mnt/c and get dramatically worse numbers, you found a bottleneck.
Decision: Move data, or redesign the pipeline (tar shards, caching, fewer tiny file stats) if you must stay on DrvFS.
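Sequential throughput is only part of the picture; DrvFS tends to hurt worst on metadata-heavy access. A crude small-file comparison, assuming Python 3 and example paths (adjust /mnt/c/Temp to a Windows directory you can actually write to):
#!/usr/bin/env python3
"""Create and stat many tiny files to compare metadata overhead between a
WSL ext4 path and a DrvFS path. Paths below are examples; adjust to taste."""
import os
import tempfile
import time

def tiny_file_bench(base_dir, count=2000):
    start = time.perf_counter()
    with tempfile.TemporaryDirectory(dir=base_dir) as d:
        for i in range(count):
            with open(os.path.join(d, f"f{i:05d}.bin"), "wb") as f:
                f.write(b"x" * 512)
        for i in range(count):
            os.stat(os.path.join(d, f"f{i:05d}.bin"))
    return time.perf_counter() - start

for base in ("/tmp", "/mnt/c/Temp"):   # ext4 vs DrvFS (example paths)
    if os.path.isdir(base):
        print(f"{base}: {tiny_file_bench(base):.2f}s for create+stat of 2000 files")
    else:
        print(f"{base}: skipped (directory not found)")
If the DrvFS number comes out an order of magnitude worse, that is the “millions of small files” problem in miniature.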
Task 15: Check container GPU visibility (if you use Docker)
cr0x@server:~$ docker run --rm --gpus all nvidia/cuda:12.3.2-base-ubuntu22.04 nvidia-smi
Tue Feb 6 09:22:18 2026
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 551.23 Driver Version: 551.23 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
|=========================================+========================+======================|
| 0 NVIDIA RTX 4090 On | 00000000:01:00.0 On | N/A |
+---------------------------------------------------------------------------------------+
Interpretation: If this fails but WSL nvidia-smi works, your issue is in the container runtime path (toolkit/runtime integration).
Decision: Fix NVIDIA container runtime configuration and ensure Docker Desktop/Engine is wired to WSL correctly.
Task 16: Validate CPU-side bottlenecks (because GPU is often innocent)
cr0x@server:~$ sudo apt-get update >/dev/null 2>&1; sudo apt-get install -y sysstat >/dev/null 2>&1; mpstat -P ALL 1 3
Linux 5.15.146.1-microsoft-standard-WSL2 (server) 02/06/2026 _x86_64_ (24 CPU)
09:23:10 AM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
09:23:11 AM  all   22.40    0.00    6.10   58.30    0.00    0.40    0.00    0.00    0.00   12.80
09:23:12 AM  all   24.10    0.00    5.80   60.20    0.00    0.30    0.00    0.00    0.00    9.60
Interpretation: High %iowait says you’re waiting on storage, not compute. That’s the classic “GPU at 10%” mystery.
Decision: Fix I/O (Task 13/14), data loader parallelism, and file layout before you start tuning CUDA flags.
Fast diagnosis playbook (find the bottleneck quickly)
When someone pings you with “WSL GPU is broken,” they usually mean one of three things:
no GPU visibility, GPU visible but frameworks fail, or GPU works but performance is trash.
This playbook gets you to the right bucket fast.
First: Is this a Windows driver / WSL integration failure?
- Run nvidia-smi.exe on Windows. If it fails, stop. Fix the Windows driver installation.
- In WSL, check ls -l /dev/dxg. If it's missing, stop. Update WSL and confirm the Windows build/driver supports WSL.
- In WSL, run nvidia-smi. If it fails but /dev/dxg exists, suspect library conflicts (libnvidia-ml/libcuda) or broken WSL integration.
Second: Is it a userland mismatch (framework/toolkit confusion)?
- Check ldconfig -p | grep libcuda and ensure it points to /usr/lib/wsl/lib.
- Test the framework directly (PyTorch/TensorFlow minimal GPU probe).
- If the framework fails but nvidia-smi works, fix versions: framework build vs CUDA userland vs any pinned LD_LIBRARY_PATH.
Third: Is it performance (data pipeline / storage / CPU limits)?
- While running the workload: nvidia-smi dmon -s pucm -d 1. If sm is low, you’re not GPU-bound.
- Check where data lives: df -Th. If training reads from /mnt/c, move it.
- Watch %iowait with mpstat. High iowait: storage bottleneck. Low iowait but CPU pegged: data loader / preprocessing is the limiter.
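The three checks above are easy to script. Here is a minimal triage sketch (hypothetical file name wsl_gpu_triage.py); it reuses only the probes already shown in the tasks and is a starting point, not a support tool.
#!/usr/bin/env python3
"""Rough triage for 'WSL GPU is broken': integration problem, userland
problem, or 'go look at performance'."""
import ctypes
import os
import shutil
import subprocess

def nvidia_smi_ok():
    exe = shutil.which("nvidia-smi")
    if not exe:
        return False
    return subprocess.run([exe], capture_output=True).returncode == 0

if not os.path.exists("/dev/dxg"):
    print("Bucket 1: no /dev/dxg -> Windows driver / WSL integration problem.")
elif not nvidia_smi_ok():
    print("Bucket 1/2 boundary: /dev/dxg exists but nvidia-smi fails -> "
          "suspect library conflicts or broken WSL integration.")
else:
    try:
        ctypes.CDLL("libcuda.so.1")
        print("Bucket 3 candidate: driver path and libcuda load fine. "
              "If training is still slow, profile the data pipeline.")
    except OSError as err:
        print(f"Bucket 2: libcuda.so.1 failed to load ({err}) -> userland mismatch.")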
Paraphrased idea from Werner Vogels (Amazon CTO): “Everything fails; reliability comes from designing and operating for that truth.”
Three corporate mini-stories from the trenches
Incident: the wrong assumption (“WSL is just Ubuntu, right?”)
A data science team had a Windows standard image and used WSL2 for dev. They hired a new ML engineer who had done years of CUDA on bare-metal Ubuntu.
The engineer did what any competent person would do: installed the NVIDIA Linux driver and a matching CUDA toolkit in the WSL distro.
The symptoms were beautifully confusing. nvidia-smi worked on Windows, but inside WSL it alternated between “No devices were found” and a segfault.
Sometimes it worked after a reboot. Sometimes it worked until Docker started. The team burned a week in Slack archaeology.
The root cause was simple: they had installed packages that tried to manage a kernel driver that WSL wasn’t supposed to use.
Library files got overwritten, module loading logic ran into a kernel boundary, and the environment became nondeterministic depending on what loaded first.
The fix was boring: remove the Linux NVIDIA driver packages, purge stale CUDA repos, and restore the WSL-provided /usr/lib/wsl/lib stack.
After that, they installed only the userland bits they actually needed for building code, and pinned the framework versions.
What changed culturally was more important than the commands: they stopped treating WSL like a “real Linux host” and started treating it like a specialized runtime.
The incident writeup literally had one line highlighted: “The Windows driver is the driver.”
Optimization that backfired: “Let’s keep datasets on C: so OneDrive backs them up”
A team wanted a single source of truth for datasets across Windows tools and WSL notebooks.
They stored everything under C:\Users\...\datasets and accessed it from WSL via /mnt/c.
It looked clean. It made compliance people happy. It made training painfully slow.
The GPU graphs told the story: utilization oscillated, training time doubled, and random stalls appeared.
The team assumed GPU virtualization overhead. They tweaked batch size, enabled mixed precision, changed CUDA versions, and even swapped GPUs.
Nothing stuck.
The real villain was file I/O behavior under DrvFS, amplified by a dataset layout with millions of small files and metadata-heavy access patterns.
Add corporate endpoint protection scanning and cloud sync behavior, and you get death by a thousand tiny stat() calls.
They fixed it by moving hot datasets into the WSL ext4 filesystem and exporting only finalized artifacts back to Windows paths.
For datasets that had to live on Windows, they repacked them into sharded archives to reduce metadata churn.
The “optimization” was reversed, and performance came back like someone took a boot off the data loader’s neck.
Boring but correct practice: version pinning and a smoke test saved the day
Another org had a proper platform team. Not glamorous. Mostly spreadsheets and guardrails.
They built a standard WSL base image: Windows driver version range, WSL minimum version, Ubuntu distro version, and a blessed set of ML frameworks.
Every Monday, a scheduled job ran a smoke test on a handful of machines: check /dev/dxg, run nvidia-smi, import PyTorch, run a tiny CUDA kernel,
and record the results. No deep benchmarking—just “does the chain still work.”
One week, Windows updates rolled out and the smoke test caught that a subset of machines had an older OEM GPU driver silently reinstalled.
Users hadn’t noticed yet, because graphics were fine. Compute would have been the first thing to break during a deadline.
The fix was a driver enforcement policy and an automated remediation script. The key win wasn’t the script.
It was the existence of a routine test that treated GPU-in-WSL as an operational dependency, not a developer superstition.
Joke #2: The most reliable GPU setup is the one nobody “improves” on a Friday afternoon.
Common mistakes: symptom → root cause → fix
1) Symptom: nvidia-smi in WSL says “command not found”
- Root cause: NVIDIA userland tools not installed in the distro, or PATH missing.
- Fix: Install the appropriate userland package set for your distro, or use framework-level tests. Don’t install kernel drivers. Ensure /usr/lib/wsl/lib exists.
2) Symptom: nvidia-smi in WSL says “No devices were found”
- Root cause: Missing WSL GPU bridge (/dev/dxg), unsupported Windows driver, or outdated WSL/kernel.
- Fix: Update Windows, update WSL, and install a Windows NVIDIA driver that supports WSL compute. Validate with ls -l /dev/dxg.
3) Symptom: /dev/dxg missing
- Root cause: Not on WSL2, WSL component too old, or Windows build doesn’t support GPU compute path.
- Fix: Convert distro to WSL2, update WSL, and ensure host OS meets GPU-in-WSL requirements.
4) Symptom: nvidia-smi works, but PyTorch says CUDA not available
- Root cause: Installed a CPU-only PyTorch build, or userland CUDA libraries mismatched/overridden by LD_LIBRARY_PATH.
- Fix: Install a CUDA-enabled framework build; remove conflicting CUDA libs; ensure libcuda.so.1 resolves to /usr/lib/wsl/lib (a quick probe for this case follows the list).
5) Symptom: Docker container can’t see GPU, but WSL can
- Root cause: NVIDIA container runtime not configured, Docker not using the WSL engine correctly, or a missing --gpus all flag.
- Fix: Repair the Docker + NVIDIA runtime integration; re-test with a known CUDA base image and nvidia-smi inside the container.
6) Symptom: Training is slow; GPU utilization is low and spiky
- Root cause: Data pipeline bottleneck (I/O, CPU preprocessing, too few workers), often worsened by using /mnt/c.
- Fix: Move datasets into WSL ext4; increase data loader workers; shard files; watch %iowait and GPU sm during runs.
7) Symptom: Random segfaults or “illegal instruction” after installing CUDA toolkit
- Root cause: Library conflicts between distro CUDA packages and WSL-provided integration libs; occasionally mixing repo versions.
- Fix: Audit ldconfig -p, remove conflicting packages, and avoid overriding critical libs with LD_LIBRARY_PATH.
8) Symptom: GPU visible, but you get out-of-memory earlier than expected
- Root cause: Misreading VRAM usage (framework caching), or WSL memory pressure causing CPU-side issues that look like GPU issues.
- Fix: Use nvidia-smi to inspect VRAM and free -h to inspect host memory/swap behavior; tune batch size and data pipeline memory.
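For mistake 4 in the list above, it helps to distinguish “I installed a CPU-only wheel” from “the wheel is fine but the userland is broken”. A quick probe, assuming PyTorch is the framework in question:
import torch

# A CPU-only wheel reports no CUDA at build time; a CUDA wheel with a broken
# userland reports a CUDA version but still fails the availability check.
print("torch:", torch.__version__)
print("built with CUDA:", torch.version.cuda)     # None => CPU-only build
print("cuda available:", torch.cuda.is_available())

if torch.version.cuda is None:
    print("-> Reinstall a CUDA-enabled build; no amount of library fixing will help.")
elif not torch.cuda.is_available():
    print("-> CUDA build, but the runtime can't see the GPU: check libcuda "
          "resolution and LD_LIBRARY_PATH (Tasks 7, 8, and 10).")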
Checklists / step-by-step plan
Checklist A: Clean baseline for GPU compute in WSL (NVIDIA-centric)
- Windows: Install a current NVIDIA driver with WSL compute support. Verify with nvidia-smi.exe.
- Windows: Ensure WSL is current: wsl.exe --version must show modern components.
- WSL: Ensure your distro is WSL2: wsl.exe -l -v.
- WSL: Confirm the GPU bridge device: ls -l /dev/dxg.
- WSL: Confirm the management stack: nvidia-smi.
- WSL: Confirm library resolution: ldconfig -p | grep libcuda should point to /usr/lib/wsl/lib.
- WSL: Run a framework smoke test (PyTorch/TensorFlow); a minimal version is sketched below.
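That framework smoke test doesn’t need to be elaborate. Here is a minimal sketch of the “tiny CUDA kernel” check, assuming a CUDA-enabled PyTorch build (matrix size and tolerance are arbitrary):
import sys
import torch

# End-to-end check: allocate on the GPU, run a kernel, and verify the result
# against the CPU. Exit non-zero so a scheduler can alert on failure.
torch.backends.cuda.matmul.allow_tf32 = False   # keep the comparison apples-to-apples

if not torch.cuda.is_available():
    sys.exit("FAIL: torch.cuda.is_available() is False")

a = torch.randn(1024, 1024)
b = torch.randn(1024, 1024)
gpu = (a.cuda() @ b.cuda()).cpu()
cpu = a @ b

# Loose tolerance: we only care that the kernel ran and produced sane output.
if not torch.allclose(gpu, cpu, atol=1e-2):
    sys.exit("FAIL: GPU matmul result diverged from CPU reference")

print("OK:", torch.cuda.get_device_name(0))
Run it from a weekly scheduled task and you get the Monday smoke test from the third story essentially for free.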
Checklist B: Performance sanity for ML workloads
- Keep datasets under WSL ext4, not /mnt/c. Verify with df -Th.
- During training, watch the GPU: nvidia-smi dmon -s pucm -d 1.
- During training, watch CPU iowait: mpstat -P ALL 1.
- If iowait is high, benchmark the disk path with dd (or better tools later) and fix the data layout.
- If CPU is pegged but iowait is low, reduce preprocessing cost, increase worker threads/processes, and cache decoded data.
Checklist C: Docker path (only if you need it)
- First prove the GPU works in WSL without containers (nvidia-smi and a framework test).
- Then test a known CUDA base image with docker run --gpus all ... nvidia-smi.
- If the container fails, treat it as a runtime integration issue, not a GPU issue.
- Pin container base image CUDA version to what your framework expects. Avoid “latest” unless you enjoy archaeology.
FAQ
1) Do I need to install the NVIDIA Linux driver inside WSL?
Typically, no. WSL GPU compute relies on the Windows driver and a WSL integration layer.
Installing Linux kernel drivers in WSL is a common way to create library conflicts and weird instability.
2) If Windows nvidia-smi.exe works, why doesn’t WSL nvidia-smi?
Because “driver installed” is necessary but not sufficient. WSL needs the GPU bridge device (/dev/dxg) and correct userland libraries.
If /usr/lib/wsl/lib is being ignored or overwritten, WSL tools can fail.
3) What’s the single fastest check for “GPU available to WSL”?
Check ls -l /dev/dxg. If it’s missing, the compute path isn’t even present.
Then run nvidia-smi as a follow-up.
4) Why does GPU training run slower in WSL than native Linux sometimes?
Often it isn’t the GPU path—it’s storage and file I/O. Training from /mnt/c can be dramatically slower for metadata-heavy workloads.
Put data on the WSL ext4 filesystem and watch GPU utilization.
5) Can I use AMD GPUs for compute in WSL?
There are options (notably DirectML for certain stacks), but CUDA ecosystems are overwhelmingly NVIDIA-focused.
If your workload depends on CUDA libraries, plan on NVIDIA hardware and drivers.
6) Docker can’t see my GPU in WSL. What’s the usual culprit?
Missing or misconfigured GPU runtime integration for containers, or simply not using --gpus all.
Prove the GPU works outside Docker first, then debug the container layer.
7) Do I need the full CUDA toolkit in WSL to run PyTorch or TensorFlow?
Not always. Many framework distributions bundle the necessary CUDA userland libraries (or expect specific ones).
Install the toolkit only if you compile CUDA code or have a specific dependency that requires it.
8) Why does nvidia-smi show a CUDA version that doesn’t match my toolkit?
The CUDA version in nvidia-smi reflects the driver’s maximum supported CUDA capability.
Your toolkit/framework CUDA version is a separate userland component. Mismatch isn’t automatically wrong; incompatibility is.
9) Is WSL GPU support stable enough for production training?
For many teams, yes—especially for developer workstations and repeatable CI-like pipelines.
For “this is our only training cluster” situations, you still want the operational control of native Linux servers, but WSL can be perfectly credible for a lot of workflows.
Practical next steps
- Decide your target stack. Framework-only (simpler) versus custom CUDA compilation (needs toolkit). Stop mixing goals.
- Prove the chain in order. Windows driver → WSL2 → /dev/dxg → nvidia-smi → framework GPU probe → (optional) Docker probe.
- Move data off /mnt/c for training. If you do nothing else for performance, do this.
- Pin versions and add a smoke test. A one-minute automated check beats a week of “but it worked yesterday” every time.
If you treat WSL GPU like a system—with dependencies, contracts, and a verification routine—it behaves like one.
If you treat it like a magic trick, it will eventually do the classic trick: disappearing right before your deadline.