new vac parameters

2025-08-17 22:26:28 +02:00 · 2025-08-17 22:26:28 +02:00 · eb96153ffd
commit eb96153ffd
parent 47e3eb9b5b
1 changed file with 42 additions and 92 deletions
--- a/README.md
+++ b/README.md
@@ -40,56 +40,37 @@ WhisperLiveKit brings real-time speech transcription directly to your browser, w

 <img alt="Architecture" src="architecture.png" />

-
-## Quick Start
+### Installation & Quick Start

 ```bash
-# Install the package
 pip install whisperlivekit
-
-# Start the transcription server
-whisperlivekit-server --model tiny.en
-
-# Open your browser at http://localhost:8000 to see the interface.
-# Use  -ssl-certfile public.crt --ssl-keyfile private.key parameters to use SSL
 ```

-That's it! Start speaking and watch your words appear on screen.
+> **FFmpeg is required** and must be installed before using WhisperLiveKit.
+>
+> | OS | How to install |
+> |-----------|-------------|
+> | Ubuntu/Debian | `sudo apt install ffmpeg` |
+> | macOS | `brew install ffmpeg` |
+> | Windows | Download .exe from https://ffmpeg.org/download.html and add to PATH |

-## Installation
+#### Quick Start
+1. **Start the transcription server:**
+   ```bash
+   whisperlivekit-server --model tiny.en
+   ```
+
+2. **Open your browser** and navigate to `http://localhost:8000`
+
+3. **Start speaking** and watch your words appear in real time!
+
+> For production use or HTTPS requirements, see the [Parameters](#parameters) section for SSL configuration options.
+
+#### Optional Dependencies

 ```bash
-#Install from PyPI (Recommended)
-pip install whisperlivekit

-#Install from Source
-git clone https://github.com/QuentinFuxa/WhisperLiveKit
-cd WhisperLiveKit
-pip install -e .
-```
-
-### FFmpeg Dependency
-
-```bash
-# Ubuntu/Debian
-sudo apt install ffmpeg
-
-# macOS
-brew install ffmpeg
-
-# Windows
-# Download from https://ffmpeg.org/download.html and add to PATH
-```
-
-### Optional Dependencies
-
-```bash
-# Sentence-based buffer trimming
-pip install mosestokenizer wtpsplit
-pip install tokenize_uk  # If you work with Ukrainian text
-
-# Speaker diarization
-pip install diart
+pip install whisperlivekit[diarization] # Speaker diarization

 # Alternative Whisper backends (default is faster-whisper)
 pip install whisperlivekit[whisper]              # Original Whisper
@@ -98,29 +79,23 @@ pip install whisperlivekit[mlx-whisper]          # Apple Silicon optimization
 pip install whisperlivekit[openai]               # OpenAI API
 ```

-### 🎹 Pyannote Models Setup
-
-For diarization, you need access to pyannote.audio models:
-
-1. [Accept user conditions](https://huggingface.co/pyannote/segmentation) for the `pyannote/segmentation` model
-2. [Accept user conditions](https://huggingface.co/pyannote/segmentation-3.0) for the `pyannote/segmentation-3.0` model
-3. [Accept user conditions](https://huggingface.co/pyannote/embedding) for the `pyannote/embedding` model
-4. Login with HuggingFace:
-```bash
-pip install huggingface_hub
-huggingface-cli login
-```
+ 
+> **Pyannote Models Setup** For diarization, you need access to pyannote.audio models:
+> 1. [Accept user conditions](https://huggingface.co/pyannote/segmentation) for the `pyannote/segmentation` model
+> 2. [Accept user conditions](https://huggingface.co/pyannote/segmentation-3.0) for the `pyannote/segmentation-3.0` model
+> 3. [Accept user conditions](https://huggingface.co/pyannote/embedding) for the `pyannote/embedding` model
+> 4. Log in with Hugging Face:
+> ```bash
+> huggingface-cli login
+> ```

 ## 💻 Usage Examples

-### Command-line Interface
+#### Command-line Interface

 Start the transcription server with various options:

 ```bash
-# Basic server with English model
-whisperlivekit-server --model tiny.en
-
 # Advanced configuration with diarization
 whisperlivekit-server --host 0.0.0.0 --port 8000 --model medium --diarization --language auto

@@ -129,8 +104,8 @@ whisperlivekit-server --backend simulstreaming --model large-v3 --frame-threshol
 ```


-### Python API Integration (Backend)
-Check [basic_server.py](https://github.com/QuentinFuxa/WhisperLiveKit/blob/main/whisperlivekit/basic_server.py) for a complete example.
+#### Python API Integration (Backend)
+Check [basic_server](https://github.com/QuentinFuxa/WhisperLiveKit/blob/main/whisperlivekit/basic_server.py) for a more complete example of how to use the functions and classes.

 ```python
 from whisperlivekit import TranscriptionEngine, AudioProcessor, parse_args
@@ -145,14 +120,10 @@ transcription_engine = None
 async def lifespan(app: FastAPI):
    global transcription_engine
    transcription_engine = TranscriptionEngine(model="medium", diarization=True, lan="en")
-    # You can also load from command-line arguments using parse_args()
-    # args = parse_args()
-    # transcription_engine = TranscriptionEngine(**vars(args))
    yield

 app = FastAPI(lifespan=lifespan)

-# Process WebSocket connections
 async def handle_websocket_results(websocket: WebSocket, results_generator):
    async for response in results_generator:
        await websocket.send_json(response)
@@ -172,16 +143,16 @@ async def websocket_endpoint(websocket: WebSocket):
        await audio_processor.process_audio(message)        
 ```

-### Frontend Implementation
+#### Frontend Implementation

-The package includes a simple HTML/JavaScript implementation that you can adapt for your project. You can find it [here](https://github.com/QuentinFuxa/WhisperLiveKit/blob/main/whisperlivekit/web/live_transcription.html), or load its content using `get_web_interface_html()` :
+The package includes an [HTML/JavaScript implementation](https://github.com/QuentinFuxa/WhisperLiveKit/blob/main/whisperlivekit/web/live_transcription.html) that you can adapt for your project.

 ```python
-from whisperlivekit import get_web_interface_html
+from whisperlivekit import get_web_interface_html  # You can also load the page content in your own code
 html_content = get_web_interface_html()
 ```

-## ⚙️ Configuration Reference
+### ⚙️ Configuration Reference

 WhisperLiveKit offers extensive configuration options:

@@ -223,14 +194,8 @@ WhisperLiveKit offers extensive configuration options:
 | `--model-path` | Direct path to .pt model file. Download it if not found | `./base.pt` |
 | `--preloaded-model-count` | Optional. Number of models to preload in memory to speed up loading (set up to the expected number of concurrent users) | `1` |

-## 🔧 How It Works

-1. **Audio Capture**: Browser's MediaRecorder API captures audio in webm/opus format
-2. **Streaming**: Audio chunks are sent to the server via WebSocket
-3. **Processing**: Server decodes audio with FFmpeg and streams into the model for transcription
-4. **Real-time Output**: Partial transcriptions appear immediately in light gray (the 'aperçu') and finalized text appears in normal color
-
-## 🚀 Deployment Guide
+### 🚀 Deployment Guide

 To deploy WhisperLiveKit in production:

@@ -243,9 +208,7 @@ To deploy WhisperLiveKit in production:
   gunicorn -k uvicorn.workers.UvicornWorker -w 4 your_app:app
   ```

-2. **Frontend Integration**:
-   - Host your customized version of the example HTML/JS in your web application
-   - Ensure WebSocket connection points to your server's address
+2. **Frontend**: Host your customized version of the HTML example and ensure the WebSocket connection points to your server's address

 3. **Nginx Configuration** (recommended for production):
    ```nginx    
@@ -272,31 +235,18 @@ A basic Dockerfile is provided which allows re-use of Python package installatio
 - Create a reusable image with only the basics and then run as a named container:
    ```bash
    docker build -t whisperlivekit-defaults .
-    docker create --gpus all --name whisperlivekit -p 8000:8000 whisperlivekit-defaults
+    docker create --gpus all --name whisperlivekit -p 8000:8000 whisperlivekit-defaults --model base
    docker start -i whisperlivekit
    ```

    > **Note**: If you're running on a system without NVIDIA GPU support (such as Mac with Apple Silicon or any system without CUDA capabilities), you need to **remove the `--gpus all` flag** from the `docker create` command. Without GPU acceleration, transcription will use CPU only, which may be significantly slower. Consider using small models for better performance on CPU-only systems.

 #### Customization
- Customize the container options:
-    ```bash
-    docker build -t whisperlivekit-defaults .
-    docker create --gpus all --name whisperlivekit-base -p 8000:8000 whisperlivekit-defaults --model base
-    docker start -i whisperlivekit-base
-    ```

 - `--build-arg` Options:
  - `EXTRAS="whisper-timestamped"` - Add extras to the image's installation (no spaces). Remember to set necessary container options!
  - `HF_PRECACHE_DIR="./.cache/"` - Pre-load a model cache for faster first-time start
  - `HF_TKN_FILE="./token"` - Add your Hugging Face Hub access token to download gated models

-## 🔮 Use Cases
+#### 🔮 Use Cases
 Capture discussions in real-time for meeting transcription, help hearing-impaired users follow conversations through accessibility tools, transcribe podcasts or videos automatically for content creation, transcribe support calls with speaker identification for customer service...
-
-## 🙏 Acknowledgments
-
-We extend our gratitude to the original authors of:
-
-| [Whisper Streaming](https://github.com/ufal/whisper_streaming)  | [SimulStreaming](https://github.com/ufal/SimulStreaming) | [Diart](https://github.com/juanmc2005/diart) | [OpenAI Whisper](https://github.com/openai/whisper) |
-| -------- | ------- | -------- | ------- |
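
The revised README in this diff states that FFmpeg must be installed before using WhisperLiveKit. As a rough illustration only (it is not part of this commit or of the package), a launcher script could check for the binary before starting the server. The helper name `ffmpeg_available` is hypothetical; the sketch relies solely on `shutil.which` from the Python standard library.

```python
import shutil


def ffmpeg_available(binary: str = "ffmpeg") -> bool:
    """Return True if an executable named `binary` is found on PATH.

    Hypothetical helper for illustration; WhisperLiveKit does not ship it.
    """
    return shutil.which(binary) is not None


if __name__ == "__main__":
    if ffmpeg_available():
        print("FFmpeg found on PATH.")
    else:
        print("FFmpeg not found on PATH; install it as described in the README.")
```

A check like this only confirms that an executable with that name is discoverable; it says nothing about the FFmpeg version or the codecs it was built with.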