Commit graph

4 commits

Author SHA1 Message Date
Quentin Fuxa a6a85431f6 update benchmark with qwen3 which reuses kv cache 2026-03-15 22:32:01 +01:00
Quentin Fuxa dfd5bf417c voxtral mlx : improved chunking 2026-03-14 00:13:29 +01:00
Quentin Fuxa 2fe34427ef Fix voxtral streaming drain and silence flush 2026-01-31 11:12:00 +01:00
Quentin Fuxa a4da246ea5 feat: add voxtral-mlx native backend for Apple Silicon
Pure-MLX implementation of Voxtral Mini 4B Realtime for low-latency
speech transcription on Apple Silicon. Avoids the transformers/torch
overhead and runs at 0.18-0.32x real-time factor.

- voxtral_mlx/model.py: MLX model with spectrogram, encoder, decoder
- voxtral_mlx/loader.py: model loading with 6-bit quantized weights
- voxtral_mlx/spectrogram.py: mel spectrogram computation in MLX
- voxtral_mlx_asr.py: VoxtralASR adapter for the AudioProcessor pipeline
2026-02-22 23:28:10 +01:00
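
The 0.18-0.32x real-time factor quoted in the commit message above can be read with the conventional definition: processing time divided by audio duration, so values below 1.0 mean transcription runs faster than the audio plays. The commit does not say how the figure was measured, so the sketch below is only an illustration under that assumption; `real_time_factor` and the sample timings are hypothetical and not part of the repository.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return processing time divided by audio duration (conventional RTF).

    Hypothetical helper for illustration; not taken from the voxtral_mlx code.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


# Illustrative timings (assumed, not measured): 30 s of audio
# transcribed in 5.4 s and in 9.6 s of wall-clock time.
fast_rtf = real_time_factor(5.4, 30.0)
slow_rtf = real_time_factor(9.6, 30.0)

# An RTF below 1.0 means the backend keeps up with live audio.
keeps_up = fast_rtf < 1.0 and slow_rtf < 1.0
```

With these assumed timings the two values land at the ends of the quoted 0.18-0.32 range, which corresponds to roughly 3x to 5.5x faster than real time.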