WhisperLiveKit

Author	SHA1	Message	Date
Quentin Fuxa	3b7a2fcc87	Add Qwen3-ASR MLX SimulStreaming backend New backend 'qwen3-mlx-simul' for Apple Silicon: AlignAtt border detection via monkey-patched cross-attention on MLX Qwen3-ASR. Supports 0.6B (RTF 0.236 on M5) and 1.7B models. - qwen3_mlx_simul.py: full streaming implementation with KV cache, alignment head attention extraction, border-distance policy - core.py: register new backend in TranscriptionEngine + online_factory - parse_args.py: add qwen3-mlx-simul to CLI choices	2026-03-15 11:00:00 +01:00
Quentin Fuxa	22325ba326	tune simul-kv: 2s inference interval, configurable min_new_seconds	2026-03-15 21:30:00 +01:00
Quentin Fuxa	a540a5fd10	fix simul-kv audio trim bug, add 1.7B v2 alignment heads	2026-03-15 20:45:00 +01:00
Quentin Fuxa	b69eaf82be	qwen3 simul+kv: optimized streaming with kv cache reuse	2026-03-15 18:30:00 +01:00
Quentin Fuxa	ed503be140	qwen	2026-01-02 23:52:00 +01:00
Quentin Fuxa	a6a85431f6	update benchmark with qwen3 which reuses kv cache	2026-03-15 22:32:01 +01:00
Quentin Fuxa	dd48997674	qwen3: reuse encoder kv cache	2026-03-15 22:31:39 +01:00
Quentin Fuxa	0c73a73aa3	update benchmark results and procedure	2026-03-15 11:16:15 +01:00
Quentin Fuxa	dfd5bf417c	voxtral mlx : improved chunking	2026-03-14 00:13:29 +01:00
Quentin Fuxa	e7e3441ca4	Add Qwen3 ASR backend	2026-03-07 11:48:00 +01:00
Quentin Fuxa	9abe26a996	Add CLI with serve, transcribe, listen, pull, diagnose	2026-03-01 13:37:00 +01:00
Quentin Fuxa	586540ae36	Add test harness and test client	2026-02-22 16:19:00 +01:00
Quentin Fuxa	cd8df8e1aa	Update package setup and exports	2026-02-21 11:33:00 +01:00
Quentin Fuxa	e30f9a2573	Improve diarization backends	2026-02-15 14:55:00 +01:00
Quentin Fuxa	32de7b1276	Fix frontend buffer rendering for slow backends	2026-02-14 09:28:00 +01:00
Quentin Fuxa	9ac7c26a0b	Add OpenAI REST API and Deepgram WebSocket	2026-02-08 15:42:00 +01:00
Quentin Fuxa	c0e2600993	Add snapshot-then-diff WebSocket protocol	2026-02-07 10:17:00 +01:00
Quentin Fuxa	e0db3a98f9	Add per-session language proxy	2026-02-01 17:03:00 +01:00
Quentin Fuxa	2fe34427ef	Fix voxtral streaming drain and silence flush	2026-01-31 11:12:00 +01:00
Quentin Fuxa	d58365421f	Refactor audio processor async pipeline	2026-01-25 13:48:00 +01:00
Quentin Fuxa	a282cbe75f	Improve tokens alignment and silence handling	2026-01-24 10:55:00 +01:00
Quentin Fuxa	6e85c16614	Refactor TranscriptionEngine singleton	2026-01-18 15:27:00 +01:00
Quentin Fuxa	e1823dd99c	Improve online ASR processor	2026-01-17 09:35:00 +01:00
Quentin Fuxa	e144abbbc7	Refactor timed objects and data structures	2026-01-11 16:08:00 +01:00
Quentin Fuxa	83362c89c4	Clean up config and model paths	2026-01-10 11:42:00 +01:00
Quentin Fuxa	cf6c49f502	Ruff lint cleanup	2026-01-03 10:23:00 +01:00
Quentin Fuxa	451535d48f	Fix ctranslate2 encoder conversion (#345) and memory leak in TokensAlignment (#344) - Add fallback chain for StorageView to numpy conversion - Prune old tokens/segments after 5min to bound memory	2026-03-10 22:37:00 +01:00
Quentin Fuxa	abfaf06203	Merge branch 'main' of https://github.com/QuentinFuxa/WhisperLiveKit	2026-03-04 18:17:23 +01:00
Quentin Fuxa	d1fe932241	Apply DRY method v0 - to try to catch and resolve infinite loops such as in #338	2026-03-03 22:52:00 +01:00
Chingning Chen	b63f54e838	fix(whisper/tokenizer): prevent IndexError from crashing multilingual streams This fix addresses a critical bug in the Whisper tokenizer that causes the transcription server to crash with an `IndexError: string index out of range` when streaming audio in languages utilizing multi-byte UTF-8 characters (e.g., Cantonese, Japanese, Mandarin). When a 3-byte character is cut off at the boundary of an audio chunk, incomplete bytes are decoded into a single Unicode replacement character (`\ufffd`), artificially shortening the string and breaking the offset mapping assumed by `split_tokens_on_unicode`. This ports the upstream fix from SYSTRAN/faster-whisper (PR #111) to add a strict bounds check before accessing the string index, allowing incomplete bytes to be safely caught and handled in the next chunk.	2026-03-02 15:31:43 +08:00
Quentin Fuxa	66e58624b9	disable MLXAlignAtt which fails on special characters	2026-03-01 11:52:00 +01:00
Quentin Fuxa	4b2377c243	fix: correct false auto-detect claim, median bug, RTF inflation - BENCHMARK.md: whisper also supports --language auto, voxtral is not the only one. Fixed mlx-whisper speed comparison (LA is actually faster than SS for mlx-whisper, not comparable). - metrics.py: median calculation was wrong for even-length lists (took upper middle instead of averaging the two middle values). - metrics_collector.py: RTF was inflated because log_summary() used wall-clock elapsed time instead of sum of actual ASR call durations. - README.md: clarified that whisper also supports auto language detection, voxtral just does it better. - Added 2 new median tests (even + odd length).	2026-02-22 23:38:04 +01:00
Quentin Fuxa	a4da246ea5	feat: add voxtral-mlx native backend for Apple Silicon Pure-MLX implementation of Voxtral Mini 4B Realtime for low-latency speech transcription on Apple Silicon. Avoids the transformers/torch overhead and runs at 0.18-0.32x real-time factor. - voxtral_mlx/model.py: MLX model with spectrogram, encoder, decoder - voxtral_mlx/loader.py: model loading with 6-bit quantized weights - voxtral_mlx/spectrogram.py: mel spectrogram computation in MLX - voxtral_mlx_asr.py: VoxtralASR adapter for the AudioProcessor pipeline	2026-02-22 23:28:10 +01:00
Quentin Fuxa	9b2c3ee844	docs: update README with voxtral backend, benchmarks, testing sections - Add Voxtral Backend section explaining voxtral-mlx and voxtral (HF). - Add Testing & Benchmarks section with commands to run tests/benchmarks. - Update --backend parameter docs to include voxtral-mlx and voxtral. - Update optional dependencies table with Voxtral entry. - Link to BENCHMARK.md for detailed performance comparisons.	2026-02-22 23:27:57 +01:00
Quentin Fuxa	f5eee67b11	fix: silence double-counting bug, add metrics module and runtime instrumentation - Fix _begin_silence pushing same object reference as _end_silence, causing the consumer to process two ended events and double the silence duration. - Fix initial silence never cleared when VAC is disabled, causing the no-VAC path to enqueue zero audio. - Add sample-precise silence boundaries (at_sample parameter). - Add whisperlivekit/metrics.py with WER computation (word-level Levenshtein) and timestamp accuracy (greedy alignment). No external dependencies. - Add whisperlivekit/metrics_collector.py with SessionMetrics dataclass for per-session runtime observability. Instrumented at 6 points in AudioProcessor: init, process_audio, transcription_processor, _end_silence, results_formatter, cleanup. Emits SESSION_METRICS structured log line on session end.	2026-02-22 23:27:12 +01:00
Quentin Fuxa	4a6868e3e1	correct processor attributes mixtral	2026-02-22 21:13:21 +01:00
Quentin Fuxa	3c15246fc0	mixstral hf v0	2026-02-20 20:49:57 +01:00
Quentin Fuxa	b8d9d7d289	fix: handle numpy object_ dtype from ctranslate2 encoder (#337)	2026-02-20 20:48:28 +01:00
Quentin Fuxa	4c7706e2cf	fix: use vac_chunk_size for audio processing interval when VAC is enabled (#334)	2026-02-20 20:48:06 +01:00
Quentin Fuxa	7f3a3df620	simulstreaming mlx & torch dedup of common base	2025-02-15 23:52:00 +01:00
Quentin Fuxa	8c799fa4d1	fix simulstreaming vram leak: cap cross-attn accumulation + token budget fixes #283, fixes #275 - accumulated_cross_attns was growing unboundedly during decoding loop, using up to ~5GB for repetition loops. now capped to rolling window of 16 - max_tokens_per_chunk was using TOKENS_PER_SECOND (mel frame rate = 50) instead of actual text token rate (~15/s), allowing 10-40x too many decoding steps - removed unused torch.cat on early return path - removed dead self.committed/last_result_tokens lists (never read) - same fixes applied to mlx variant	2026-02-11 22:10:00 +01:00
Quentin Fuxa	8923337380	fix --direct-english-translation not setting task=translate for localagreement backends the flag was only used for tokenizer language selection but never actually passed to whisper/faster-whisper transcribe calls. also init OpenaiApiASR.task and read from transcribe_kargs. fixes #306	2026-02-11 22:10:00 +01:00
Quentin Fuxa	aded1649ae	fix model_cache_dir + direct_english_translation task in simulstreaming pass actual cache dir instead of None, and use proper task string instead of boolean for AlignAttConfig fixes #310	2026-02-11 22:10:00 +01:00
Quentin Fuxa	3b535e857a	fix NoneType concatenation in add_translation fixes #296	2026-02-11 22:10:00 +01:00
Quentin Fuxa	d649250b9a	fix Segment classmethod call + isinstance type narrowing fixes #331, fixes #329	2026-02-11 22:10:00 +01:00
Quentin Fuxa	7735478286	add insert_audio_chunk to DiartDiarization fixes #332	2026-02-11 22:10:00 +01:00
Quentin Fuxa	b9e72d2b9a	add probability field to ASRToken fixes #330, fixes #313	2026-02-11 22:10:00 +01:00
Emmanuel Schmidbauer	d59ddbaeae	Fix critical thread safety issues	2026-01-09 11:23:19 -05:00
Anton Jacobson	3db5d81a20	update diarization lag after stream analysed	2025-12-18 14:13:28 +01:00
Quentin Fuxa	ec09898a9f	fixes #301	2025-12-06 10:19:50 +01:00
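
The median fix described in commit 4b2377c243 (average the two middle values for even-length lists instead of taking the upper middle) can be illustrated with a minimal sketch. The function name `median` and its signature are assumptions made for illustration; this is not the actual code in metrics.py.

```python
def median(values):
    """Return the median of a non-empty sequence of numbers.

    Hypothetical helper: illustrates the even-length fix only.
    """
    if not values:
        raise ValueError("median() requires at least one value")
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        # Odd length: the single middle element is the median.
        return ordered[mid]
    # Even length: average the two middle elements.
    # The bug described in the commit corresponds to returning ordered[mid] here.
    return (ordered[mid - 1] + ordered[mid]) / 2
```

With this sketch, `median([1, 2, 3, 4])` averages 2 and 3, whereas the upper-middle behaviour would have returned 3.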


303 commits
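
The failure mode described in commit b63f54e838 (a 3-byte UTF-8 character cut at a chunk boundary decoding to a single `\ufffd`) can be reproduced with the standard library alone. The sketch below is not the tokenizer fix itself, which adds a bounds check in `split_tokens_on_unicode`. It swaps in a different technique, Python's incremental decoder, to show how incomplete bytes can be held back until the next chunk arrives. The helper name `decode_stream` is hypothetical.

```python
import codecs

text = "你好"                      # two characters, 3 bytes each in UTF-8
data = text.encode("utf-8")        # 6 bytes in total
first, second = data[:4], data[4:]  # split in the middle of the second character

# Decoding the first chunk alone replaces the dangling byte with U+FFFD,
# so the decoded string no longer matches the byte offsets of the input.
naive = first.decode("utf-8", errors="replace")


def decode_stream(chunks):
    """Decode byte chunks, buffering incomplete sequences across chunks.

    Hypothetical helper for illustration only.
    """
    decoder = codecs.getincrementaldecoder("utf-8")(errors="strict")
    pieces = [decoder.decode(chunk) for chunk in chunks]
    # final=True raises if the stream ends in the middle of a character.
    pieces.append(decoder.decode(b"", final=True))
    return "".join(pieces)


recovered = decode_stream([first, second])
```

Here `naive` ends in a replacement character, while `recovered` equals the original text because the dangling byte is kept until its continuation bytes arrive.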