Commit graph

303 commits

Author SHA1 Message Date
Quentin Fuxa
3b7a2fcc87 Add Qwen3-ASR MLX SimulStreaming backend
New backend 'qwen3-mlx-simul' for Apple Silicon: AlignAtt border
detection via monkey-patched cross-attention on MLX Qwen3-ASR.
Supports 0.6B (RTF 0.236 on M5) and 1.7B models.

- qwen3_mlx_simul.py: full streaming implementation with KV cache,
  alignment head attention extraction, border-distance policy
- core.py: register new backend in TranscriptionEngine + online_factory
- parse_args.py: add qwen3-mlx-simul to CLI choices
2026-03-15 11:00:00 +01:00
Quentin Fuxa
22325ba326 tune simul-kv: 2s inference interval, configurable min_new_seconds 2026-03-15 21:30:00 +01:00
Quentin Fuxa
a540a5fd10 fix simul-kv audio trim bug, add 1.7B v2 alignment heads 2026-03-15 20:45:00 +01:00
Quentin Fuxa
b69eaf82be qwen3 simul+kv: optimized streaming with kv cache reuse 2026-03-15 18:30:00 +01:00
Quentin Fuxa
ed503be140 qwen 2026-01-02 23:52:00 +01:00
Quentin Fuxa
a6a85431f6 update benchmark with qwen3 which reuses kv cache 2026-03-15 22:32:01 +01:00
Quentin Fuxa
dd48997674 qwen3: reuse encoder kv cache 2026-03-15 22:31:39 +01:00
Quentin Fuxa
0c73a73aa3 update benchmark results and procedure 2026-03-15 11:16:15 +01:00
Quentin Fuxa
dfd5bf417c voxtral mlx : improved chunking 2026-03-14 00:13:29 +01:00
Quentin Fuxa
e7e3441ca4 Add Qwen3 ASR backend 2026-03-07 11:48:00 +01:00
Quentin Fuxa
9abe26a996 Add CLI with serve, transcribe, listen, pull, diagnose 2026-03-01 13:37:00 +01:00
Quentin Fuxa
586540ae36 Add test harness and test client 2026-02-22 16:19:00 +01:00
Quentin Fuxa
cd8df8e1aa Update package setup and exports 2026-02-21 11:33:00 +01:00
Quentin Fuxa
e30f9a2573 Improve diarization backends 2026-02-15 14:55:00 +01:00
Quentin Fuxa
32de7b1276 Fix frontend buffer rendering for slow backends 2026-02-14 09:28:00 +01:00
Quentin Fuxa
9ac7c26a0b Add OpenAI REST API and Deepgram WebSocket 2026-02-08 15:42:00 +01:00
Quentin Fuxa
c0e2600993 Add snapshot-then-diff WebSocket protocol 2026-02-07 10:17:00 +01:00
Quentin Fuxa
e0db3a98f9 Add per-session language proxy 2026-02-01 17:03:00 +01:00
Quentin Fuxa
2fe34427ef Fix voxtral streaming drain and silence flush 2026-01-31 11:12:00 +01:00
Quentin Fuxa
d58365421f Refactor audio processor async pipeline 2026-01-25 13:48:00 +01:00
Quentin Fuxa
a282cbe75f Improve tokens alignment and silence handling 2026-01-24 10:55:00 +01:00
Quentin Fuxa
6e85c16614 Refactor TranscriptionEngine singleton 2026-01-18 15:27:00 +01:00
Quentin Fuxa
e1823dd99c Improve online ASR processor 2026-01-17 09:35:00 +01:00
Quentin Fuxa
e144abbbc7 Refactor timed objects and data structures 2026-01-11 16:08:00 +01:00
Quentin Fuxa
83362c89c4 Clean up config and model paths 2026-01-10 11:42:00 +01:00
Quentin Fuxa
cf6c49f502 Ruff lint cleanup 2026-01-03 10:23:00 +01:00
Quentin Fuxa
451535d48f Fix ctranslate2 encoder conversion (#345) and memory leak in TokensAlignment (#344)
- Add fallback chain for StorageView to numpy conversion
- Prune old tokens/segments after 5min to bound memory
2026-03-10 22:37:00 +01:00
Quentin Fuxa
abfaf06203 Merge branch 'main' of https://github.com/QuentinFuxa/WhisperLiveKit 2026-03-04 18:17:23 +01:00
Quentin Fuxa
d1fe932241 Apply DRY method v0 - to try to catch and resolve infinite loops such as in #338 2026-03-03 22:52:00 +01:00
Chingning Chen
b63f54e838 fix(whisper/tokenizer): prevent IndexError from crashing multilingual streams
This fix addresses a critical bug in the Whisper tokenizer that causes
the transcription server to crash with an `IndexError: string index out
of range` when streaming audio in languages utilizing multi-byte UTF-8
characters (e.g., Cantonese, Japanese, Mandarin).

When a 3-byte character is cut off at the boundary of an audio chunk,
incomplete bytes are decoded into a single Unicode replacement character
(`\ufffd`), artificially shortening the string and breaking the offset
mapping assumed by `split_tokens_on_unicode`.

This ports the upstream fix from SYSTRAN/faster-whisper (PR #111) to add
a strict bounds check before accessing the string index, allowing
incomplete bytes to be safely caught and handled in the next chunk.
2026-03-02 15:31:43 +08:00
Quentin Fuxa
66e58624b9 disable MLXAlignAtt which fails on special characters 2026-03-01 11:52:00 +01:00
Quentin Fuxa
4b2377c243 fix: correct false auto-detect claim, median bug, RTF inflation
- BENCHMARK.md: whisper also supports --language auto, voxtral is not
  the only one. Fixed mlx-whisper speed comparison (LA is actually
  faster than SS for mlx-whisper, not comparable).
- metrics.py: median calculation was wrong for even-length lists
  (took upper middle instead of averaging the two middle values).
- metrics_collector.py: RTF was inflated because log_summary() used
  wall-clock elapsed time instead of sum of actual ASR call durations.
- README.md: clarified that whisper also supports auto language
  detection, voxtral just does it better.
- Added 2 new median tests (even + odd length).
2026-02-22 23:38:04 +01:00
Quentin Fuxa
a4da246ea5 feat: add voxtral-mlx native backend for Apple Silicon
Pure-MLX implementation of Voxtral Mini 4B Realtime for low-latency
speech transcription on Apple Silicon. Avoids the transformers/torch
overhead and runs at 0.18-0.32x real-time factor.

- voxtral_mlx/model.py: MLX model with spectrogram, encoder, decoder
- voxtral_mlx/loader.py: model loading with 6-bit quantized weights
- voxtral_mlx/spectrogram.py: mel spectrogram computation in MLX
- voxtral_mlx_asr.py: VoxtralASR adapter for the AudioProcessor pipeline
2026-02-22 23:28:10 +01:00
Quentin Fuxa
9b2c3ee844 docs: update README with voxtral backend, benchmarks, testing sections
- Add Voxtral Backend section explaining voxtral-mlx and voxtral (HF).
- Add Testing & Benchmarks section with commands to run tests/benchmarks.
- Update --backend parameter docs to include voxtral-mlx and voxtral.
- Update optional dependencies table with Voxtral entry.
- Link to BENCHMARK.md for detailed performance comparisons.
2026-02-22 23:27:57 +01:00
Quentin Fuxa
f5eee67b11 fix: silence double-counting bug, add metrics module and runtime instrumentation
- Fix _begin_silence pushing same object reference as _end_silence,
  causing the consumer to process two ended events and double the
  silence duration.
- Fix initial silence never cleared when VAC is disabled, causing
  the no-VAC path to enqueue zero audio.
- Add sample-precise silence boundaries (at_sample parameter).
- Add whisperlivekit/metrics.py with WER computation (word-level
  Levenshtein) and timestamp accuracy (greedy alignment). No
  external dependencies.
- Add whisperlivekit/metrics_collector.py with SessionMetrics
  dataclass for per-session runtime observability. Instrumented
  at 6 points in AudioProcessor: init, process_audio,
  transcription_processor, _end_silence, results_formatter, cleanup.
  Emits SESSION_METRICS structured log line on session end.
2026-02-22 23:27:12 +01:00
Quentin Fuxa
4a6868e3e1 correct processor attributes mixtral 2026-02-22 21:13:21 +01:00
Quentin Fuxa
3c15246fc0 mixstral hf v0 2026-02-20 20:49:57 +01:00
Quentin Fuxa
b8d9d7d289 fix: handle numpy object_ dtype from ctranslate2 encoder (#337) 2026-02-20 20:48:28 +01:00
Quentin Fuxa
4c7706e2cf fix: use vac_chunk_size for audio processing interval when VAC is enabled (#334) 2026-02-20 20:48:06 +01:00
Quentin Fuxa
7f3a3df620 simulstreaming mlx & torch dedup of common base 2025-02-15 23:52:00 +01:00
Quentin Fuxa
8c799fa4d1 fix simulstreaming vram leak: cap cross-attn accumulation + token budget
fixes #283, fixes #275

- accumulated_cross_attns was growing unboundedly during decoding loop,
  using up to ~5GB for repetition loops. now capped to rolling window of 16
- max_tokens_per_chunk was using TOKENS_PER_SECOND (mel frame rate = 50)
  instead of actual text token rate (~15/s), allowing 10-40x too many
  decoding steps
- removed unused torch.cat on early return path
- removed dead self.committed/last_result_tokens lists (never read)
- same fixes applied to mlx variant
2026-02-11 22:10:00 +01:00
Quentin Fuxa
8923337380 fix --direct-english-translation not setting task=translate for localagreement backends
the flag was only used for tokenizer language selection but never
actually passed to whisper/faster-whisper transcribe calls. also init
OpenaiApiASR.task and read from transcribe_kargs.

fixes #306
2026-02-11 22:10:00 +01:00
Quentin Fuxa
aded1649ae fix model_cache_dir + direct_english_translation task in simulstreaming
pass actual cache dir instead of None, and use proper task string
instead of boolean for AlignAttConfig

fixes #310
2026-02-11 22:10:00 +01:00
Quentin Fuxa
3b535e857a fix NoneType concatenation in add_translation
fixes #296
2026-02-11 22:10:00 +01:00
Quentin Fuxa
d649250b9a fix Segment classmethod call + isinstance type narrowing
fixes #331, fixes #329
2026-02-11 22:10:00 +01:00
Quentin Fuxa
7735478286 add insert_audio_chunk to DiartDiarization
fixes #332
2026-02-11 22:10:00 +01:00
Quentin Fuxa
b9e72d2b9a add probability field to ASRToken
fixes #330, fixes #313
2026-02-11 22:10:00 +01:00
Emmanuel Schmidbauer
d59ddbaeae Fix critical thread safety issues 2026-01-09 11:23:19 -05:00
Anton Jacobson
3db5d81a20 update diarization lag after stream analysed 2025-12-18 14:13:28 +01:00
Quentin Fuxa
ec09898a9f fixes #301 2025-12-06 10:19:50 +01:00