Commit graph

303 commits

Author SHA1 Message Date
Quentin Fuxa
719e8b1a20 adapt online for mlx detection 2024-11-25 23:52:00 +01:00
Quentin Fuxa
f1b47178d8 adapt online for mlx detection 2024-11-25 23:52:00 +01:00
Quentin Fuxa
59db08e961 loader for full mlx 2024-11-25 23:52:00 +01:00
Quentin Fuxa
6fc20b9562 new dec class 2024-11-21 23:52:00 +01:00
Quentin Fuxa
fac8659161 uses native mlx function for attention 2024-11-21 23:52:00 +01:00
Quentin Fuxa
4d9332ce7d fixes #299 2025-12-05 17:54:14 +01:00
Quentin Fuxa
62444ce746 session parameter required in OnnxWrapper 2025-12-05 15:37:18 +01:00
Quentin Fuxa
2431a6bf91 isolated VAD states per user: .onnx: share a stateless model. .jit: require duplicating the model.
Co-authored-by: eschmidbauer <eschmidbauer@gmail.com>
2025-12-05 15:27:14 +01:00
Zizheng Guo
30ddd522a4
Fix local agreement backend, removing excess parameter, fixes https://github.com/QuentinFuxa/WhisperLiveKit/issues/295 2025-12-04 16:45:23 +08:00
Quentin Fuxa
82cd24bb75 LoRa path v0 - functional 2025-11-29 17:21:10 +01:00
Quentin Fuxa
d45c397c6a simulstreaming: limit n tokens to prevent hallucinations 2025-11-28 21:41:19 +01:00
Quentin Fuxa
1d88ba9d69 Fixes #294. improve model path backend detection and file extraction 2025-11-27 23:14:00 +01:00
Quentin Fuxa
c0965c6c31 Lines to Segments. Merging dataclasses 2025-11-27 21:54:58 +01:00
Quentin Fuxa
7faa21f95f alignatt: enable model sharing by removing hooks and centralizing session state. Solves #282
Co-authored-by: Emmanuel Schmidbauer <eschmidbauer@gmail.com>
2025-11-25 23:07:42 +01:00
Quentin Fuxa
4e9f951551 correct silences handling when language not auto 2025-11-20 11:20:00 +01:00
Quentin Fuxa
870141298c isort 2025-11-23 11:20:00 +01:00
Quentin Fuxa
a175d1a327 fixes silence detected but never reported by silero 2025-11-23 11:20:00 +01:00
Quentin Fuxa
6206fff118 0.2.15 2025-11-21 23:52:00 +01:00
Quentin Fuxa
b5067249c0 stt/diar/nllw alignment: internal rework 5 2025-11-20 23:52:00 +01:00
Quentin Fuxa
f4f9831d39 stt/diar/nllw alignment: internal rework 5 2025-11-20 23:52:00 +01:00
Quentin Fuxa
254faaf64c stt/diar/nllw alignment: internal rework 5 2025-11-20 23:52:00 +01:00
Quentin Fuxa
8e7aea4fcf internal rework 4 2025-11-20 23:45:20 +01:00
Quentin Fuxa
270faf2069 internal rework 3 2025-11-20 22:28:30 +01:00
Quentin Fuxa
b7c1cc77cc internal rework 2 2025-11-20 22:06:38 +01:00
Quentin Fuxa
9a45ec221c internal rework 1 2025-11-20 12:58:38 +01:00
Quentin Fuxa
b7d20a0ff0 segment attribution in result formatter 2025-11-19 21:10:28 +01:00
Quentin Fuxa
c1bb9c2bde reduce flickering remaining_time_transcription 2025-11-19 19:09:37 +01:00
Quentin Fuxa
11e9def0b2 diarization corrections 2025-11-19 19:06:03 +01:00
Quentin Fuxa
3104f40f6e fixes #279 #278 2025-11-19 18:17:50 +01:00
Quentin Fuxa
e9b4ceeee5 Add audio partial silence in chunks handling. bump to 0.2.14.post3 2025-11-17 22:52:00 +01:00
Quentin Fuxa
437641fb43 reduce min-chunk-size to 0.1, set default model to base 2027-04-25 23:52:00 +02:00
Quentin Fuxa
bfd60b3921 Add audio partial silence in chunks handling. bump to 0.2.14.post2 2025-11-17 22:52:00 +01:00
Quentin Fuxa
1e67bf97f0 improve buffering when use of heavy models 2027-04-25 23:52:00 +02:00
Quentin Fuxa
bbd4fd6cff Merge branch 'improve_EOS_handling' 2025-11-16 22:30:31 +01:00
Quentin Fuxa
28985962a0 Silence handling: finish transcription even if not validated at the BEGINNING of the silence 2025-11-16 22:29:08 +01:00
Quentin Fuxa
a38c103fcd simulstreaming coreml encoder compatibility 2025-11-16 21:24:14 +01:00
Quentin Fuxa
4d2ffb24f8 coreml conversion 2025-11-16 19:11:43 +01:00
Quentin Fuxa
1bbbb7903c lora loader in shared whisper core 2025-11-16 18:44:35 +01:00
Quentin Fuxa
80b77998f9 Refactor backend handling 2025-11-15 19:51:41 +01:00
Quentin Fuxa
d310f7e25f hf compatibility 2025-11-15 18:34:19 +01:00
Quentin Fuxa
8d9be88fe6 translation buffer is now displayed in frontend 2025-11-10 15:22:26 +01:00
Quentin Fuxa
16461052ed task to direct-english-translation 2025-11-10 13:20:26 +01:00
Quentin Fuxa
5491dbd824 last_validated_token handled in state 2025-11-10 13:18:52 +01:00
Quentin Fuxa
13401ffe24 whisper core at root of wlk 2025-11-10 12:17:18 +01:00
Quentin Fuxa
7108d2ddc5 fixes https://github.com/QuentinFuxa/WhisperLiveKit/issues/269 2025-11-09 20:08:18 +01:00
Quentin Fuxa
a732e0903e Add a script to detect alignement heads, usefull for distilled whisper 2025-11-09 18:12:09 +01:00
Quentin Fuxa
0491681be4 Distilled model compatibility with HF config.json to ModelDimensions 2025-11-08 20:20:05 +01:00
Quentin Fuxa
ffe5284764 _processing_tasks_done checks task completion 2025-11-05 23:34:00 +01:00
Quentin Fuxa
06b31f51eb exception when translation and no nllw 2025-10-30 23:30:19 +01:00
Quentin Fuxa
ece02db6a3 Use optional new separate NLLW package for translation 2025-10-30 19:36:28 +01:00
Quentin Fuxa
939a7ebf8b Translation Local Agreement + Cache optimization v0. Not connected yet 2025-10-28 00:16:52 +01:00
Quentin Fuxa
61edb70fff audioProcessor state variables are now uniquely in State dataclass 2025-10-26 18:54:47 +01:00
Quentin Fuxa
4e455b8aab translation now separates validated from output buffer tokens 2025-10-26 18:51:09 +01:00
Quentin Fuxa
9434390ad3 simplify task stopping condition 2025-10-26 17:26:43 +01:00
Quentin Fuxa
65250db92c tensor to list at the stream end 2025-10-26 16:40:12 +01:00
Quentin Fuxa
416dce7975 fixes #261
Co-authored-by: yosagi <11404771+yosagi@users.noreply.github.com>"
2025-10-25 14:20:08 +02:00
Quentin Fuxa
0c5365e7c6 fixes #258 2025-10-24 20:51:16 +02:00
Quentin Fuxa
e7b05b0138 migration to silero vad v6: supports onnx 2025-10-23 23:52:00 +02:00
Quentin Fuxa
714fb3b14a custom faster-whisper/mlx whisper encoder available 2025-10-23 20:33:17 +02:00
Quentin Fuxa
0af379c465 DOC: information about file format 2025-10-23 20:32:05 +02:00
Quentin Fuxa
1f684cdd97 fixes #251 2025-10-06 19:53:27 +02:00
Quentin Fuxa
9b1e061b32 forwarded_allow_ips in core 2025-10-04 23:04:00 +02:00
Quentin Fuxa
b4abc158b9
Merge pull request #249 from Damrod/add-ip-forwarding-support
fix wss for reverse proxying
2025-10-06 10:20:05 +02:00
Alvaro Ollero
3736458503 Uvicorn exposes a configuration option to enable reverse proxying from a trusted ip. This PR exposes it downstreams to end clients 2025-10-04 22:21:06 +02:00
Quentin Fuxa
374618e050 token speakers are only reattributed for token coming after last_validated_token 2025-10-04 09:52:00 +02:00
Quentin Fuxa
543972ef38 fixes #248 2025-10-04 09:52:00 +02:00
Quentin Fuxa
a7db39d999 solves incorrect spacing in buffer diarization 2025-10-02 23:04:00 +02:00
Quentin Fuxa
a153e11fe0 update when self.diarization_before_transcription 2025-09-28 11:04:00 +02:00
Quentin Fuxa
ca6f9246cc force language = en for .en models 2025-09-28 11:04:00 +02:00
Quentin Fuxa
d080d675a8 cutom alignment heads parameter for custom models 2025-09-27 11:04:00 +02:00
Quentin Fuxa
40bff38933
Merge pull request #239 from msghik/feature/fine-tuned-model-support
feat: Allow loading fine-tuned models in simulstreaming
2025-09-29 10:08:26 +02:00
Quentin Fuxa
2fe3ca0188 connect source to output destination when used as chrome extension to keep audio playing 2025-09-27 13:59:44 +02:00
Quentin Fuxa
545ea15c9a ensure buffer size to be a multiple of the element size 2025-09-27 13:58:32 +02:00
Quentin Fuxa
8cbaeecc75 cutom alignment heads parameter for custom models 2025-09-27 11:04:00 +02:00
google-labs-jules[bot]
70e854b346 feat: Allow loading fine-tuned models in simulstreaming
This change modifies the `simulstreaming` backend to support loading fine-tuned Whisper models via the `--model_dir` argument.

The `SimulStreamingASR` class has been updated to:
- Use the `model_dir` path directly to load the model, which is the correct procedure for fine-tuned `.pt` files.
- Automatically disable the `faster-whisper` and `mlx-whisper` fast encoders when `model_dir` is used, as they are not compatible with standard fine-tuned models.

The call site in `core.py` already passed the `model_dir` argument, so no changes were needed there. This change makes the `simulstreaming` backend more flexible and allows users to leverage their own custom models.
2025-09-27 07:29:30 +00:00
Quentin Fuxa
d55490cd27 typo and simpler conditions 2025-09-26 20:38:26 +02:00
Quentin Fuxa
b22478c0b4 correct silences handling when language not auto 2025-09-25 23:20:00 +02:00
Quentin Fuxa
94c34efd90 chrome extension ws default to localhost 2025-09-25 23:04:00 +02:00
Quentin Fuxa
9fc6654a4a common frontend for web/ and chrome extension 2025-09-25 23:14:25 +02:00
Quentin Fuxa
4dd5d8bf8a translation compatible with auto and detected language 2025-09-22 11:20:00 +02:00
Quentin Fuxa
6caf3e0485 correct silence handling in translation 2025-09-27 11:58:00 +02:00
Quentin Fuxa
93f002cafb language detection after few seconds working 2025-09-20 11:08:00 +02:00
Quentin Fuxa
c5e30c2c07 svg loaded once in javascript, no more need for StaticFiles 2025-09-20 11:06:00 +02:00
Quentin Fuxa
1c2afb8bd2 svg loaded once in javascript, no more need for StaticFiles 2025-09-20 11:06:00 +02:00
Quentin Fuxa
674b20d3af in buffer while language not detected » 2025-09-21 11:05:00 +02:00
Quentin Fuxa
a5503308c5 O(n) to O(1) for simulstreaming timestamp determination 2025-09-21 11:04:00 +02:00
Quentin Fuxa
e61afdefa3 punctuation is now checked in timed_object 2025-09-22 22:40:39 +02:00
Quentin Fuxa
426d70a790 simulstreaming infer does not return a dictionary anymore 2025-09-21 11:03:00 +02:00
Quentin Fuxa
b03a212fbf fixes #227 , auto language dectection v0.1 - simulstreaming only - when diarization and auto 2025-09-19 19:15:28 +02:00
Quentin Fuxa
1833e7c921 0.2.10 2025-09-16 23:45:00 +02:00
Quentin Fuxa
0a6e5ae9c1 ffmpeg install instruction error indicates --pcm-input alternative 2025-09-17 16:04:17 +02:00
Quentin Fuxa
ee448a37e9 when pcm-input is set, the frontend uses AudioWorklet 2025-09-17 14:55:57 +02:00
Quentin Fuxa
9c051052b0 Merge branch 'main' into ScriptProcessorNode-to-AudioWorklet 2025-09-17 11:28:36 +02:00
Quentin Fuxa
4d7c487614 replace deprecated ScriptProcessorNode with AudioWorklet 2025-09-17 10:53:53 +02:00
Quentin Fuxa
65025cc448 nllb backend can be transformers, and model size can be 1.3B 2025-09-17 10:20:31 +02:00
Quentin Fuxa
bbba1d9bb7 add nllb-backend and translation perf test in dev_notes 2025-09-16 20:45:01 +02:00
Quentin Fuxa
99dc96c644 fixes #224 2025-09-16 18:34:35 +02:00
GeorgeCaoJ
2a27d2030a feat: support web audio 16kHz PCM input and remove ffmpeg dependency 2025-09-15 23:22:25 +08:00
Quentin Fuxa
cd160caaa1 asyncio.to_thread for transcription and translation 2025-09-15 15:23:22 +02:00
Quentin Fuxa
5aa312e437 simulstreaming warmup is done in whisperlivekit.simul_whisper.backend.load_model, not in warmup_online 2025-09-13 20:19:19 +01:00