From 8ae38a48efaa6aaa0c2b6e8953b4c0a25f5c6f91 Mon Sep 17 00:00:00 2001
From: Quentin Fuxa <38427957+QuentinFuxa@users.noreply.github.com>
Date: Wed, 5 Mar 2025 18:18:38 +0100
Subject: [PATCH] Update README.md
---
README.md | 19 ++++++++-----------
1 file changed, 8 insertions(+), 11 deletions(-)
diff --git a/README.md b/README.md
index e7fd47c..06db726 100644
--- a/README.md
+++ b/README.md
@@ -3,30 +3,25 @@
This project is based on [Whisper Streaming](https://github.com/ufal/whisper_streaming) and lets you transcribe audio directly from your browser. Simply launch the local server and grant microphone access. Everything runs locally on your machine ✨
-
+
### Differences from [Whisper Streaming](https://github.com/ufal/whisper_streaming)
#### ⚙️ **Core Improvements**
-- **Buffering Preview** – Displays unvalidated transcription segments for immediate feedback.
-- **Multi-User Support** – Handles multiple users simultaneously without conflicts.
+- **Buffering Preview** – Displays unvalidated transcription segments
+- **Multi-User Support** – Handles multiple users simultaneously by decoupling the backend from the online ASR
- **MLX Whisper Backend** – Optimized for Apple Silicon for faster local processing.
-- **Enhanced Sentence Segmentation** – Improved buffer trimming for better accuracy across languages.
- **Confidence validation** – Immediately validate high-confidence tokens for faster inference
#### 🎙️ **Speaker Identification**
-- **Real-Time Diarization** – Identify different speakers in real time using [Diart](https://github.com/juanmc2005/diart).
+- **Real-Time Diarization** – Identify different speakers in real time using [Diart](https://github.com/juanmc2005/diart)
#### 🌐 **Web & API**
-- **Built-in Web UI** – Simple browser interface with no frontend setup required
+- **Built-in Web UI** – Simple raw HTML browser interface with no frontend setup required
- **FastAPI WebSocket Server** – Real-time speech-to-text processing with async FFmpeg streaming.
- **JavaScript Client** – Ready-to-use MediaRecorder implementation for seamless client-side integration.
-#### 🚀 **Coming Soon**
-
-- **Enhanced Diarization Performance** – Optimize speaker identification by implementing longer steps for Diart processing and leveraging language-specific segmentation patterns to improve speaker boundary detection
-
## Installation
@@ -86,6 +81,8 @@ This project is based on [Whisper Streaming](https://github.com/ufal/whisper_str
python whisper_fastapi_online_server.py --host 0.0.0.0 --port 8000
```
+ **Parameters**
+
All [Whisper Streaming](https://github.com/ufal/whisper_streaming) parameters are supported.
Additional parameters:
- `--host` and `--port` let you specify the server’s IP/port.
@@ -94,7 +91,7 @@ This project is based on [Whisper Streaming](https://github.com/ufal/whisper_str
- `--diarization`: Enable/disable speaker diarization (default: False)
- `--confidence-validation`: Use confidence scores for faster validation. Transcription will be faster but punctuation might be less accurate (default: True)
-4. **Open the Provided HTML**:
+5. **Open the Provided HTML**:
- By default, the server root endpoint `/` serves a simple `live_transcription.html` page.
- Open your browser at `http://localhost:8000` (or replace `localhost` and `8000` with whatever you specified).
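
The relationship between the `--host`/`--port` values and the URL to open in the browser can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `build_command` and `browser_url` are hypothetical and not part of the project, and only the script name and the two flags shown in the README are used. The one non-obvious point is that `0.0.0.0` is a bind-all address for the server, so the browser should be pointed at `localhost` (or the machine's IP) instead.

```python
import shlex

# Script name as it appears in the README's launch command.
SERVER_SCRIPT = "whisper_fastapi_online_server.py"


def build_command(host: str = "0.0.0.0", port: int = 8000) -> list[str]:
    """Hypothetical helper: assemble the launch command shown in the README."""
    return ["python", SERVER_SCRIPT, "--host", host, "--port", str(port)]


def browser_url(host: str = "0.0.0.0", port: int = 8000) -> str:
    """Hypothetical helper: derive the URL to open for a given host/port.

    0.0.0.0 tells the server to listen on all interfaces; it is not an
    address to browse to, so it is mapped to localhost here.
    """
    browse_host = "localhost" if host == "0.0.0.0" else host
    return f"http://{browse_host}:{port}"


if __name__ == "__main__":
    print(shlex.join(build_command()))
    print(browser_url())
```

With the defaults, the command matches the README's launch line and the URL matches the `http://localhost:8000` address given above; passing a different host or port changes both consistently.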