fixes #283, fixes #275

- accumulated_cross_attns was growing unboundedly during the decoding loop, using up to ~5GB for repetition loops. now capped to a rolling window of 16
- max_tokens_per_chunk was using TOKENS_PER_SECOND (mel frame rate = 50) instead of the actual text token rate (~15/s), allowing 10-40x too many decoding steps
- removed unused torch.cat on early return path
- removed dead self.committed/last_result_tokens lists (never read)
- same fixes applied to mlx variant

| Name |
|---|
| `mlx` |
| `__init__.py` |
| `backend.py` |
| `beam.py` |
| `config.py` |
| `decoder_state.py` |
| `eow_detection.py` |
| `mlx_encoder.py` |
| `simul_whisper.py` |
| `token_buffer.py` |
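
The two main fixes in the commit message (capping accumulated_cross_attns to a rolling window of 16, and sizing max_tokens_per_chunk from the text token rate rather than the mel frame rate) can be illustrated with a minimal sketch. This is not the repository's code: the constant and function names below are hypothetical, the integers stand in for per-step attention tensors, and only the values 16, 50 and ~15 are taken from the commit message.

```python
from collections import deque

# Hypothetical names for illustration; only the numbers come from the commit message.
CROSS_ATTN_WINDOW = 16        # rolling window size for per-step cross-attention
MEL_FRAMES_PER_SECOND = 50    # mel frame rate, previously used by mistake
TEXT_TOKENS_PER_SECOND = 15   # approximate text token rate


def make_attn_buffer(window=CROSS_ATTN_WINDOW):
    """Return a buffer that silently drops the oldest entry once full."""
    return deque(maxlen=window)


def max_tokens_per_chunk(chunk_seconds, tokens_per_second=TEXT_TOKENS_PER_SECOND):
    """Upper bound on decoding steps for a chunk of the given duration."""
    return max(1, int(chunk_seconds * tokens_per_second))


# Simulate a long repetition loop: memory stays bounded at 16 entries.
buf = make_attn_buffer()
for step in range(1000):
    buf.append(step)  # stand-in for one decoding step's cross-attention tensor

budget_text = max_tokens_per_chunk(2.0)                        # text token rate
budget_mel = max_tokens_per_chunk(2.0, MEL_FRAMES_PER_SECOND)  # old, too large
```

A `deque` with `maxlen` is one simple way to get rolling-window behaviour; an implementation that keeps tensors in a plain list could instead slice to the last 16 entries after each append.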