From 60c62f8f84274d80d0d83718b3e6026afa88a191 Mon Sep 17 00:00:00 2001
From: Quentin Fuxa <quentin.fuxa@gmail.com>
Date: Tue, 25 Nov 2025 23:31:46 +0100
Subject: [PATCH] troubleshooting #271 #276 #284 #286

---
 README.md               |   6 ++-
 docs/troubleshooting.md | 113 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 117 insertions(+), 2 deletions(-)
 create mode 100644 docs/troubleshooting.md
diff --git a/README.md b/README.md
index b867199..9ae9f08 100644
--- a/README.md
+++ b/README.md
@@ -51,9 +51,11 @@ pip install whisperlivekit
 2. **Open your browser** and navigate to `http://localhost:8000`. Start speaking and watch your words appear in real-time!
 
 
-> - See [tokenizer.py](https://github.com/QuentinFuxa/WhisperLiveKit/blob/main/whisperlivekit/simul_whisper/whisper/tokenizer.py) for the list of all available languages.
-> - For HTTPS requirements, see the **Parameters** section for SSL configuration options.
+> - See [here](https://github.com/QuentinFuxa/WhisperLiveKit/blob/main/whisperlivekit/simul_whisper/whisper/tokenizer.py) for the list of all available languages.
+> - Check the [troubleshooting guide](docs/troubleshooting.md) for step-by-step fixes collected from recent GPU setup/env issues.
 > - The CLI entry point is exposed as both `wlk` and `whisperlivekit-server`; they are equivalent.
+> - For HTTPS requirements, see the **Parameters** section for SSL configuration options.
+
 
 #### Use it to capture audio from web pages.
 
diff --git a/docs/troubleshooting.md b/docs/troubleshooting.md
new file mode 100644
index 0000000..2581a41
--- /dev/null
+++ b/docs/troubleshooting.md
@@ -0,0 +1,113 @@
+# Troubleshooting
+
+
+## GPU drivers & cuDNN visibility
+
+### Linux error: `Unable to load libcudnn_ops.so* / cudnnCreateTensorDescriptor`
+> Reported in issue #271 (Arch/CachyOS)
+
+`faster-whisper` (used for the SimulStreaming encoder) dynamically loads cuDNN.  
+If the runtime cannot find `libcudnn_*`, verify that CUDA and cuDNN match the PyTorch build you installed:
+
+1. **Install CUDA + cuDNN** (Arch/CachyOS example):
+   ```bash
+   sudo pacman -S cuda cudnn
+   sudo ldconfig
+   ```
+2. **Make sure the shared objects are visible**:
+   ```bash
+   ls /usr/lib/libcudnn*
+   ```
+3. **Check which CUDA version PyTorch expects** and compare it with the toolkit version reported by `nvcc`:
+   ```bash
+   python - <<'EOF'
+   import torch
+   print(torch.version.cuda)
+   EOF
+   nvcc --version
+   ```
+4. If you installed CUDA in a non-default location, export `CUDA_HOME` and add `$CUDA_HOME/lib64` to `LD_LIBRARY_PATH`.
+
+Once the CUDA/cuDNN versions match, `whisperlivekit-server` should start normally.
+
+### Windows error: `Could not locate cudnn_ops64_9.dll`
+> Reported in issue #286 (Conda on Windows)
+
+PyTorch bundles cuDNN DLLs inside your environment (`<env>\Lib\site-packages\torch\lib`).  
+When `ctranslate2` or `faster-whisper` cannot find `cudnn_ops64_9.dll`:
+
+1. Locate the DLL shipped with PyTorch, e.g.
+   ```
+   E:\conda\envs\WhisperLiveKit\Lib\site-packages\torch\lib\cudnn_ops64_9.dll
+   ```
+2. Add that directory to your `PATH` **or** copy the `cudnn_*64_9.dll` files into a directory that is already on `PATH` (such as the environment's `Scripts/` folder).
+3. Restart the shell before launching `wlk`.
+
+Installing NVIDIA's standalone cuDNN 9.x and pointing `PATH`/`CUDNN_PATH` to it works as well, but is usually not required.
+
+---
+
+## PyTorch / CTranslate2 GPU builds
+
+### `Torch not compiled with CUDA enabled`
+> Reported in issue #284
+
+If `torch.zeros(1).cuda()` raises that assertion, you installed a CPU-only wheel.  
+Install the GPU-enabled wheels that match your CUDA toolkit:
+
+```bash
+pip install --upgrade torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu130
+```
+
+Replace `cu130` with the CUDA version supported by your driver (see [PyTorch install selector](https://pytorch.org/get-started/locally/)).  
+Validate with:
+
+```python
+import torch
+print(torch.cuda.is_available(), torch.cuda.get_device_name(0) if torch.cuda.is_available() else "no CUDA device")
+```
+
+### `CTranslate2 device count: 0` or `Could not infer dtype of ctranslate2._ext.StorageView`
+> Follow-up in issue #284
+
+`ctranslate2` publishes separate CPU and CUDA wheels. The default `pip install ctranslate2` installs the CPU build, which makes WhisperLiveKit fall back to CPU tensors and leads to the dtype error above.
+
+1. Uninstall the CPU build: `pip uninstall -y ctranslate2`.
+2. Install the CUDA wheel that matches your toolkit (example for CUDA 13.0):
+   ```bash
+   pip install ctranslate2==4.5.0 -f https://opennmt.net/ctranslate2/whl/cu130
+   ```
+   (See the [CTranslate2 installation table](https://opennmt.net/CTranslate2/installation.html) for other CUDA versions.)
+3. Verify:
+   ```python
+   import ctranslate2
+   print("CUDA devices:", ctranslate2.get_cuda_device_count())
+   ```
+
+If you intentionally want CPU inference, run `wlk --backend whisper` to avoid mixing CPU-only CTranslate2 with a GPU Torch build.
+
+---
+
+## Blackwell (`sm_121a`) systems
+> Reported in issue #276 (NVIDIA DGX Spark)
+
+GPUs with compute capability 12.1 (`sm_121a`) shipped before some toolchains recognized the architecture ID, so Triton and `ptxas` need manual hints:
+
+```bash
+export CUDA_HOME="/usr/local/cuda-13.0"
+export PATH="$CUDA_HOME/bin:$PATH"
+export LD_LIBRARY_PATH="$CUDA_HOME/lib64:$LD_LIBRARY_PATH"
+
+# Tell Triton where the new ptxas lives
+export TRITON_PTXAS_PATH="$CUDA_HOME/bin/ptxas"
+
+# Force PyTorch to JIT kernels for all needed architectures
+export TORCH_CUDA_ARCH_LIST="8.0 9.0 10.0 12.0 12.1a"
+```
+
+After exporting those variables (or adding them to your systemd service / shell profile), restart `wlk`. Kernels targeting `sm_121a` should then compile without crashing when streams come in.
+
+---
+
+Need help with another recurring issue? Open a GitHub discussion or PR and reference this document so we can keep it current.
+