From eb96153ffd0b37d5ef449b112de1f6172c01e13f Mon Sep 17 00:00:00 2001
From: Quentin Fuxa <38427957+QuentinFuxa@users.noreply.github.com>
Date: Sun, 17 Aug 2025 22:26:28 +0200
Subject: [PATCH] new vac parameters

---
 README.md | 134 +++++++++++++++++------------------------------------
 1 file changed, 42 insertions(+), 92 deletions(-)

diff --git a/README.md b/README.md
index a180936..c2fdd3d 100644
--- a/README.md
+++ b/README.md
@@ -40,56 +40,37 @@ WhisperLiveKit brings real-time speech transcription directly to your browser, w
 Architecture
-
-## Quick Start
+### Installation & Quick Start
 ```bash
-# Install the package
 pip install whisperlivekit
-
-# Start the transcription server
-whisperlivekit-server --model tiny.en
-
-# Open your browser at http://localhost:8000 to see the interface.
-# Use -ssl-certfile public.crt --ssl-keyfile private.key parameters to use SSL
 ```
-That's it! Start speaking and watch your words appear on screen.
+> **FFmpeg is required** and must be installed before using WhisperLiveKit
+>
+> | OS | How to install |
+> |-----------|-------------|
+> | Ubuntu/Debian | `sudo apt install ffmpeg` |
+> | MacOS | `brew install ffmpeg` |
+> | Windows | Download .exe from https://ffmpeg.org/download.html and add to PATH |
-## Installation
+#### Quick Start
+1. **Start the transcription server:**
+   ```bash
+   whisperlivekit-server --model tiny.en
+   ```
+
+2. **Open your browser** and navigate to `http://localhost:8000`
+
+3. **Start speaking** and watch your words appear in real-time!
+
+> For production use or HTTPS requirements, see the [Parameters](#parameters) section for SSL configuration options.
+
+#### Optional Dependencies
 ```bash
-#Install from PyPI (Recommended)
-pip install whisperlivekit
-#Install from Source
-git clone https://github.com/QuentinFuxa/WhisperLiveKit
-cd WhisperLiveKit
-pip install -e .
-```
-
-### FFmpeg Dependency
-
-```bash
-# Ubuntu/Debian
-sudo apt install ffmpeg
-
-# macOS
-brew install ffmpeg
-
-# Windows
-# Download from https://ffmpeg.org/download.html and add to PATH
-```
-
-### Optional Dependencies
-
-```bash
-# Sentence-based buffer trimming
-pip install mosestokenizer wtpsplit
-pip install tokenize_uk # If you work with Ukrainian text
-
-# Speaker diarization
-pip install diart
+pip install whisperlivekit[diarization] # Speaker diarization
 # Alternative Whisper backends (default is faster-whisper)
 pip install whisperlivekit[whisper] # Original Whisper
@@ -98,29 +79,23 @@ pip install whisperlivekit[mlx-whisper] # Apple Silicon optimization
 pip install whisperlivekit[openai] # OpenAI API
 ```
-### 🎹 Pyannote Models Setup
-
-For diarization, you need access to pyannote.audio models:
-
-1. [Accept user conditions](https://huggingface.co/pyannote/segmentation) for the `pyannote/segmentation` model
-2. [Accept user conditions](https://huggingface.co/pyannote/segmentation-3.0) for the `pyannote/segmentation-3.0` model
-3. [Accept user conditions](https://huggingface.co/pyannote/embedding) for the `pyannote/embedding` model
-4. Login with HuggingFace:
-```bash
-pip install huggingface_hub
-huggingface-cli login
-```
+
+> **Pyannote Models Setup** For diarization, you need access to pyannote.audio models:
+> 1. [Accept user conditions](https://huggingface.co/pyannote/segmentation) for the `pyannote/segmentation` model
+> 2. [Accept user conditions](https://huggingface.co/pyannote/segmentation-3.0) for the `pyannote/segmentation-3.0` model
+> 3. [Accept user conditions](https://huggingface.co/pyannote/embedding) for the `pyannote/embedding` model
+>4. Login with HuggingFace:
+> ```bash
+> huggingface-cli login
+> ```
 ## 💻 Usage Examples
-### Command-line Interface
+#### Command-line Interface
 Start the transcription server with various options:
 ```bash
-# Basic server with English model
-whisperlivekit-server --model tiny.en
-
 # Advanced configuration with diarization
 whisperlivekit-server --host 0.0.0.0 --port 8000 --model medium --diarization --language auto
@@ -129,8 +104,8 @@ whisperlivekit-server --backend simulstreaming --model large-v3 --frame-threshol
 ```
-### Python API Integration (Backend)
-Check [basic_server.py](https://github.com/QuentinFuxa/WhisperLiveKit/blob/main/whisperlivekit/basic_server.py) for a complete example.
+#### Python API Integration (Backend)
+Check [basic_server](https://github.com/QuentinFuxa/WhisperLiveKit/blob/main/whisperlivekit/basic_server.py) for a more complete example of how to use the functions and classes.
 ```python
 from whisperlivekit import TranscriptionEngine, AudioProcessor, parse_args
@@ -145,14 +120,10 @@ transcription_engine = None
 async def lifespan(app: FastAPI):
     global transcription_engine
     transcription_engine = TranscriptionEngine(model="medium", diarization=True, lan="en")
-    # You can also load from command-line arguments using parse_args()
-    # args = parse_args()
-    # transcription_engine = TranscriptionEngine(**vars(args))
     yield
 app = FastAPI(lifespan=lifespan)
-# Process WebSocket connections
 async def handle_websocket_results(websocket: WebSocket, results_generator):
     async for response in results_generator:
         await websocket.send_json(response)
@@ -172,16 +143,16 @@ async def websocket_endpoint(websocket: WebSocket):
             await audio_processor.process_audio(message)
 ```
-### Frontend Implementation
+#### Frontend Implementation
-The package includes a simple HTML/JavaScript implementation that you can adapt for your project. You can find it [here](https://github.com/QuentinFuxa/WhisperLiveKit/blob/main/whisperlivekit/web/live_transcription.html), or load its content using `get_web_interface_html()` :
+The package includes an HTML/JavaScript implementation [here](https://github.com/QuentinFuxa/WhisperLiveKit/blob/main/whisperlivekit/web/live_transcription.html)
 ```python
-from whisperlivekit import get_web_interface_html
+from whisperlivekit import get_web_interface_html #You can also import it in your code
 html_content = get_web_interface_html()
 ```
-## ⚙️ Configuration Reference
+### ⚙️ Configuration Reference
 WhisperLiveKit offers extensive configuration options:
@@ -223,14 +194,8 @@ WhisperLiveKit offers extensive configuration options:
 | `--model-path` | Direct path to .pt model file. Download it if not found | `./base.pt` |
 | `--preloaded-model-count` | Optional. Number of models to preload in memory to speed up loading (set up to the expected number of concurrent users) | `1` |
-## 🔧 How It Works
-1. **Audio Capture**: Browser's MediaRecorder API captures audio in webm/opus format
-2. **Streaming**: Audio chunks are sent to the server via WebSocket
-3. **Processing**: Server decodes audio with FFmpeg and streams into the model for transcription
-4. **Real-time Output**: Partial transcriptions appear immediately in light gray (the 'aperçu') and finalized text appears in normal color
-
-## 🚀 Deployment Guide
+### 🚀 Deployment Guide
 To deploy WhisperLiveKit in production:
@@ -243,9 +208,7 @@ To deploy WhisperLiveKit in production:
    gunicorn -k uvicorn.workers.UvicornWorker -w 4 your_app:app
    ```
-2. **Frontend Integration**:
-   - Host your customized version of the example HTML/JS in your web application
-   - Ensure WebSocket connection points to your server's address
+2. **Frontend**: Host your customized version of the `html` example & ensure WebSocket connection points correctly
 3. **Nginx Configuration** (recommended for production):
    ```nginx
@@ -272,31 +235,18 @@ A basic Dockerfile is provided which allows re-use of Python package installatio
 - Create a reusable image with only the basics and then run as a named container:
   ```bash
   docker build -t whisperlivekit-defaults .
-  docker create --gpus all --name whisperlivekit -p 8000:8000 whisperlivekit-defaults
+  docker create --gpus all --name whisperlivekit -p 8000:8000 whisperlivekit-defaults --model base
   docker start -i whisperlivekit
   ```
 > **Note**: If you're running on a system without NVIDIA GPU support (such as Mac with Apple Silicon or any system without CUDA capabilities), you need to **remove the `--gpus all` flag** from the `docker create` command. Without GPU acceleration, transcription will use CPU only, which may be significantly slower. Consider using small models for better performance on CPU-only systems.
 #### Customization
-- Customize the container options:
-  ```bash
-  docker build -t whisperlivekit-defaults .
-  docker create --gpus all --name whisperlivekit-base -p 8000:8000 whisperlivekit-defaults --model base
-  docker start -i whisperlivekit-base
-  ```
-
 - `--build-arg` Options:
   - `EXTRAS="whisper-timestamped"` - Add extras to the image's installation (no spaces). Remember to set necessary container options!
   - `HF_PRECACHE_DIR="./.cache/"` - Pre-load a model cache for faster first-time start
   - `HF_TKN_FILE="./token"` - Add your Hugging Face Hub access token to download gated models
-## 🔮 Use Cases
+#### 🔮 Use Cases
 Capture discussions in real-time for meeting transcription, help hearing-impaired users follow conversations through accessibility tools, transcribe podcasts or videos automatically for content creation, transcribe support calls with speaker identification for customer service...
-
-## 🙏 Acknowledgments
-
-We extend our gratitude to the original authors of:
-
-| [Whisper Streaming](https://github.com/ufal/whisper_streaming) | [SimulStreaming](https://github.com/ufal/SimulStreaming) | [Diart](https://github.com/juanmc2005/diart) | [OpenAI Whisper](https://github.com/openai/whisper) |
-| -------- | ------- | -------- | ------- |