WhisperLiveKit

Quentin Fuxa 8c799fa4d1 fix simulstreaming vram leak: cap cross-attn accumulation + token budget	2026-02-11 22:10:00 +01:00

fixes #283, fixes #275
- accumulated_cross_attns was growing unboundedly during decoding loop, using up to ~5GB for repetition loops. now capped to rolling window of 16
- max_tokens_per_chunk was using TOKENS_PER_SECOND (mel frame rate = 50) instead of actual text token rate (~15/s), allowing 10-40x too many decoding steps
- removed unused torch.cat on early return path
- removed dead self.committed/last_result_tokens lists (never read)
- same fixes applied to mlx variant
mlx	fix simulstreaming vram leak: cap cross-attn accumulation + token budget	2026-02-11 22:10:00 +01:00
__init__.py	refactor(simulstreaming): extract backend + online module into separate files from whisper streaming	2025-08-08 18:07:51 +02:00
backend.py	fix simulstreaming vram leak: cap cross-attn accumulation + token budget	2026-02-11 22:10:00 +01:00
beam.py	alignatt: enable model sharing by removing hooks and centralizing session state. Solves #282	2025-11-25 23:07:42 +01:00
config.py	isort	2025-11-23 11:20:00 +01:00
decoder_state.py	Fix critical thread safety issues	2026-01-09 11:23:19 -05:00
eow_detection.py	modify import paths in simul whisper backend so that it works in lib mode	2025-07-01 20:34:47 +02:00
mlx_encoder.py	loader for full mlx	2024-11-25 23:52:00 +01:00
simul_whisper.py	fix simulstreaming vram leak: cap cross-attn accumulation + token budget	2026-02-11 22:10:00 +01:00
token_buffer.py	isort	2025-11-23 11:20:00 +01:00
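
The latest commit message describes two fixes: capping accumulated cross-attention to a rolling window of 16, and budgeting decoding steps from the text token rate (~15/s) rather than the mel frame rate (50). The sketch below illustrates both ideas in plain Python. It is not the code from simul_whisper.py: the helper names `make_cross_attn_buffer` and `max_tokens_per_chunk` (as a free function), the constants, and the use of `collections.deque` are assumptions made for illustration, and the real implementation works on tensors rather than strings.

```python
import math
from collections import deque

# Values quoted in the commit message; the constant names are hypothetical.
CROSS_ATTN_WINDOW = 16        # rolling-window cap on stored cross-attentions
MEL_FRAMES_PER_SECOND = 50    # mel frame rate (the rate used before the fix)
TEXT_TOKENS_PER_SECOND = 15   # approximate text token rate (used after the fix)


def make_cross_attn_buffer(window=CROSS_ATTN_WINDOW):
    """Return a bounded buffer: appending beyond `window` drops the oldest item."""
    return deque(maxlen=window)


def max_tokens_per_chunk(chunk_seconds, tokens_per_second=TEXT_TOKENS_PER_SECOND):
    """Upper bound on decoding steps for an audio chunk of `chunk_seconds`."""
    return max(1, math.ceil(chunk_seconds * tokens_per_second))


if __name__ == "__main__":
    buf = make_cross_attn_buffer()
    for step in range(1000):          # stand-in for a long repetition loop
        buf.append(f"attn-{step}")    # stand-in for one step's attention tensor
    print(len(buf))                   # stays at the window size

    # Budget for a 2-second chunk under each rate.
    print(max_tokens_per_chunk(2.0))
    print(max_tokens_per_chunk(2.0, MEL_FRAMES_PER_SECOND))
```

With a bounded buffer, memory held by the accumulated attentions no longer grows with the number of decoding steps, which is the property the commit message attributes to the fix.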