Local Text-to-Speech Setup
Learn how to run text-to-speech models completely locally using OpenAI-compatible TTS servers, giving you full privacy control and zero ongoing costs for podcast and audio generation.
This guide uses Speaches as an example implementation, but the principles apply to any OpenAI-compatible TTS server.
Why Local Text-to-Speech?
Running text-to-speech locally offers significant advantages:
- 🔒 Complete Privacy: Your content never leaves your machine
- 💰 Zero Ongoing Costs: No per-character or per-minute charges
- ⚡ No Rate Limits: Generate unlimited audio without restrictions
- 🌐 Offline Capability: Works without an internet connection once models are downloaded
- 🎯 Full Control: Choose and customize your voice models
- 📈 Predictable Costs: One-time setup, no surprises
Available Local TTS Solutions
Open Notebook supports any OpenAI-compatible text-to-speech server. This guide uses Speaches as an example because it's:
- Open-source and actively maintained
- Easy to set up with Docker
- Compatible with OpenAI's TTS API specification
- Capable of running multiple high-quality voice models
About Speaches
Speaches is an open-source, OpenAI-compatible text-to-speech server that runs locally on your machine. It provides:
- OpenAI API Compatibility: Works seamlessly with Open Notebook's OpenAI-compatible provider
- High-Quality Voices: Support for multiple neural TTS models
- Easy Model Management: Simple CLI for downloading and managing voice models
- Docker Support: Run in containers for easy deployment
- Multiple Voice Options: Various voices and languages available
- Customizable Speed: Adjust speech rate to your preference
Note: If you're using a different OpenAI-compatible TTS server, the configuration steps will be similar; just adjust the endpoints and model names accordingly.
Quick Start with Speaches
This section demonstrates setup using Speaches as an example. If you're using a different local TTS solution, adapt the steps accordingly.
Prerequisites
- Docker installed on your system
- At least 2GB RAM available
- 5GB disk space for models
Basic Setup
The fastest way to get started is using our example setup:
1. Create a project directory:
mkdir speaches-setup
cd speaches-setup
2. Create a docker-compose.yml file:
services:
  speaches:
    image: ghcr.io/speaches-ai/speaches:latest-cpu
    container_name: speaches
    ports:
      - "8969:8000"
    volumes:
      - hf-hub-cache:/home/ubuntu/.cache/huggingface/hub
    restart: unless-stopped
volumes:
  hf-hub-cache:
3. Start the Speaches server:
docker compose up -d
4. Download a TTS model:
# Wait a few seconds for the container to start
sleep 10
# Download the recommended Kokoro model
docker compose exec speaches uv tool run speaches-cli model download speaches-ai/Kokoro-82M-v1.0-ONNX
5. Test the setup:
curl "http://localhost:8969/v1/audio/speech" -s -H "Content-Type: application/json" \
--output test.mp3 \
--data '{
"input": "Hello! This is a test of local text to speech.",
"model": "speaches-ai/Kokoro-82M-v1.0-ONNX",
"voice": "af_bella",
"speed": 1.0
}'
If successful, you'll have a test.mp3 file with the generated speech!
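Because the request above uses `curl -s` without `--fail`, an HTTP error response is still written to test.mp3, so the file can exist even when generation failed. The Bash sketch below flags that case. It assumes Speaches returns errors as a JSON body (starting with `{`); `check_audio_file` is a hypothetical helper, not part of Speaches.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if FILE exists, is non-empty, and does not
# look like a JSON error body (assumption: Speaches error bodies start with "{").
check_audio_file() {
  local file="$1" first
  if [ ! -s "$file" ]; then
    echo "$file is missing or empty" >&2
    return 1
  fi
  first=$(head -c 1 "$file")
  if [ "$first" = "{" ]; then
    echo "$file contains a JSON response, not audio:" >&2
    cat "$file" >&2
    return 1
  fi
}

# Example, after the curl command above:
#   check_audio_file test.mp3 && echo "test.mp3 looks like audio"
```

Alternatively, adding curl's `--fail` option makes curl exit with status 22 when the server returns an HTTP error, which is easier to detect in scripts.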
Configure Open Notebook
Now that Speaches is running, configure Open Notebook to use it:
1. Set the environment variable:
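For Docker deployments (the --add-host line makes host.docker.internal resolve on Linux, where Docker does not provide it by default; it requires Docker 20.10 or later):
docker run -d \
--name open-notebook \
-p 8502:8502 -p 5055:5055 \
-v ./notebook_data:/app/data \
-v ./surreal_data:/mydata \
--add-host=host.docker.internal:host-gateway \
-e OPENAI_COMPATIBLE_BASE_URL_TTS=http://host.docker.internal:8969/v1 \
lfnovo/open_notebook:v1-latest-single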
For local development:
export OPENAI_COMPATIBLE_BASE_URL_TTS=http://localhost:8969/v1
2. Add the model in Open Notebook:
- Go to the Settings → Models page
- Click Add Model in the Text-to-Speech section
- Configure the model:
  - Provider: openai_compatible
  - Model Name: speaches-ai/Kokoro-82M-v1.0-ONNX
  - Display Name: Kokoro Local TTS (or your preference)
- Click Save
3. Set as default (optional):
- In Settings, set this model as your default Text-to-Speech model
- Now all podcast generation will use your local TTS
Available Voice Models
Speaches supports various TTS models from Hugging Face. Here are some recommended options:
Kokoro (Recommended)
- Model ID: speaches-ai/Kokoro-82M-v1.0-ONNX
- Size: ~500MB
- Quality: High
- Speed: Fast
- Languages: English
- Voices: af_bella, af_sarah, am_adam, am_michael, and more
Other Models
You can use any compatible ONNX TTS model from Hugging Face. Check the Speaches documentation for a complete list.
Available Voices
The Kokoro model includes multiple voices with different characteristics:
Female Voices:
- af_bella: Clear, professional
- af_sarah: Warm, friendly
- af_nicole: Energetic, expressive
- bf_emma: British accent, professional
Male Voices:
- am_adam: Deep, authoritative
- am_michael: Friendly, conversational
- bm_george: British accent, formal
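The voice IDs follow Kokoro's naming convention: the first letter is the accent (a for American English, b for British English) and the second is the gender (f or m). The Bash sketch below decodes that prefix; `describe_voice` is a hypothetical helper, and it only knows the two English prefixes used in this guide.

```bash
#!/usr/bin/env bash
# Hypothetical helper: decode a Kokoro voice ID prefix such as "af_" or "bm_".
# Only the American (a) and British (b) English prefixes are handled here.
describe_voice() {
  local voice="$1" accent gender
  case "${voice:0:1}" in
    a) accent="American English" ;;
    b) accent="British English" ;;
    *) accent="unknown accent" ;;
  esac
  case "${voice:1:1}" in
    f) gender="female" ;;
    m) gender="male" ;;
    *) gender="unknown gender" ;;
  esac
  echo "${voice}: ${accent}, ${gender}"
}

# Example:
#   for v in af_bella am_adam bf_emma bm_george; do describe_voice "$v"; done
```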
Testing Voices:
# Try different voices to find your favorite
for voice in af_bella af_sarah am_adam am_michael; do
curl "http://localhost:8969/v1/audio/speech" -s \
-H "Content-Type: application/json" \
--output "test_${voice}.mp3" \
--data "{
\"input\": \"Hello! This is a test of the ${voice} voice.\",
\"model\": \"speaches-ai/Kokoro-82M-v1.0-ONNX\",
\"voice\": \"${voice}\"
}"
done
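The loop above builds its JSON with inline escaped quotes, which breaks if the input text itself contains a double quote or a backslash. A Bash sketch that escapes those characters before building the request body follows; `json_escape` and `speech_payload` are hypothetical helpers, and the fields mirror the request used earlier in this guide.

```bash
#!/usr/bin/env bash
# Hypothetical helper: escape a string for use inside a JSON string literal.
# Handles backslashes, double quotes, newlines, carriage returns, and tabs.
json_escape() {
  local s="$1" bs='\' q='"'
  s=${s//"$bs"/"$bs$bs"}   # backslash first, so later escapes are not doubled
  s=${s//"$q"/"$bs$q"}     # double quote -> \"
  s=${s//$'\n'/"$bs"n}     # newline -> \n
  s=${s//$'\r'/"$bs"r}     # carriage return -> \r
  s=${s//$'\t'/"$bs"t}     # tab -> \t
  printf '%s' "$s"
}

# Hypothetical helper: build the /v1/audio/speech request body used in this guide.
# Usage: speech_payload TEXT MODEL VOICE [SPEED]   (SPEED defaults to 1.0)
speech_payload() {
  printf '{"input": "%s", "model": "%s", "voice": "%s", "speed": %s}' \
    "$(json_escape "$1")" "$(json_escape "$2")" "$(json_escape "$3")" "${4:-1.0}"
}

# Example, inside the voice loop:
#   --data "$(speech_payload "Hello! This is a test of the ${voice} voice." \
#            speaches-ai/Kokoro-82M-v1.0-ONNX "$voice")"
```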
Advanced Configuration
GPU Acceleration
For faster processing with NVIDIA GPUs (the host needs the NVIDIA driver and the NVIDIA Container Toolkit installed):
services:
  speaches:
    image: ghcr.io/speaches-ai/speaches:latest-cuda  # GPU-enabled image
    container_name: speaches
    ports:
      - "8969:8000"
    volumes:
      - hf-hub-cache:/home/ubuntu/.cache/huggingface/hub
    restart: unless-stopped
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
volumes:
  hf-hub-cache:
Custom Port
If port 8969 is already in use, change it in docker-compose.yml:
ports:
  - "9000:8000"  # Use port 9000 instead
Then update your environment variable:
export OPENAI_COMPATIBLE_BASE_URL_TTS=http://localhost:9000/v1
Multiple Models
Download and use multiple models for different purposes:
# Download additional models
docker compose exec speaches uv tool run speaches-cli model download model-name-1
docker compose exec speaches uv tool run speaches-cli model download model-name-2
# List downloaded models
docker compose exec speaches uv tool run speaches-cli model list
In Open Notebook, add each model separately and choose which to use for different podcasts.
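The download commands above can be wrapped in a loop over a list of model IDs. In this Bash sketch, `download_models` is a hypothetical helper, the model names are the same placeholders used above, and setting `DRY_RUN=1` prints the commands instead of running them.

```bash
#!/usr/bin/env bash
# Hypothetical helper: download each model ID passed as an argument, using the
# same speaches-cli command shown above. Set DRY_RUN=1 to only print commands.
download_models() {
  local model
  for model in "$@"; do
    if [ -n "${DRY_RUN:-}" ]; then
      echo "docker compose exec speaches uv tool run speaches-cli model download $model"
    else
      docker compose exec speaches uv tool run speaches-cli model download "$model" || return 1
    fi
  done
}

# Example, from the speaches-setup directory:
#   download_models model-name-1 model-name-2
```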
Network Configuration
Docker Networking
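When Open Notebook runs in Docker and needs to reach Speaches, OPENAI_COMPATIBLE_BASE_URL_TTS must be set inside the Open Notebook container (for example with -e on docker run) to one of the following values: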
On macOS/Windows:
export OPENAI_COMPATIBLE_BASE_URL_TTS=http://host.docker.internal:8969/v1
On Linux:
# Option 1: Use Docker bridge IP
export OPENAI_COMPATIBLE_BASE_URL_TTS=http://172.17.0.1:8969/v1
# Option 2: Use host networking (supported on Linux); Open Notebook can then use http://localhost:8969/v1
docker run --network host ...
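The right URL depends on the host OS. This Bash sketch picks the value to pass to the Open Notebook container, assuming the default port 8969 and Docker's default bridge gateway 172.17.0.1 on Linux; `tts_base_url_for_docker` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the OPENAI_COMPATIBLE_BASE_URL_TTS value for an
# Open Notebook container, given the host OS name (as printed by `uname -s`)
# and an optional Speaches port (default 8969).
# Note: WSL reports "Linux"; with Docker Desktop, host.docker.internal may be needed instead.
tts_base_url_for_docker() {
  local os="$1" port="${2:-8969}" host
  case "$os" in
    Darwin|MINGW*|MSYS*|CYGWIN*) host="host.docker.internal" ;;
    Linux) host="172.17.0.1" ;;   # default docker0 bridge gateway
    *) echo "unsupported OS: $os" >&2; return 1 ;;
  esac
  case "$port" in
    ''|*[!0-9]*) echo "invalid port: $port" >&2; return 1 ;;
  esac
  printf 'http://%s:%s/v1\n' "$host" "$port"
}

# Example:
#   docker run ... -e OPENAI_COMPATIBLE_BASE_URL_TTS="$(tts_base_url_for_docker "$(uname -s)")" ...
```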
Remote Speaches Server
Run Speaches on a different machine to offload audio generation from the one running Open Notebook:
# On the server machine
docker compose up -d
# The default "8969:8000" port mapping already listens on all interfaces (0.0.0.0),
# so other machines can connect; make sure only trusted clients can reach port 8969
Then configure Open Notebook:
export OPENAI_COMPATIBLE_BASE_URL_TTS=http://server-ip:8969/v1
Security Warning: Only expose Speaches on trusted networks or use proper authentication/firewall rules.
Podcast Generation
Creating Podcasts with Local TTS
Once configured, use Speaches for podcast generation:
- Go to the Podcasts page in Open Notebook
- Create or edit an Episode Profile
- Configure speakers:
  - For each speaker, select your Speaches model
  - Choose different voices (e.g., af_bella for host, am_adam for guest)
- Generate the podcast
- Audio is generated locally using your Speaches server
Multi-Speaker Setup
Create natural-sounding conversations with different voices:
Speaker 1 (Host):
- Model: speaches-ai/Kokoro-82M-v1.0-ONNX
- Voice: af_bella
Speaker 2 (Guest):
- Model: speaches-ai/Kokoro-82M-v1.0-ONNX
- Voice: am_adam
Speaker 3 (Narrator):
- Model: speaches-ai/Kokoro-82M-v1.0-ONNX
- Voice: bf_emma
Performance Optimization
CPU Performance
Recommended Specs:
- 4+ CPU cores
- 4GB+ RAM
- SSD storage
Tips:
- Close unnecessary applications
- Use quantized models when available
- Higher speech speed produces shorter audio, which may generate slightly faster but can sound rushed
Memory Management
Monitor Docker memory usage:
docker stats speaches
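Set a memory limit for the container if needed (containers have no limit by default; on Docker Desktop, also make sure enough memory is assigned to Docker in its settings):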
services:
  speaches:
    # ... other config ...
    mem_limit: 4g  # Adjust based on your system
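To check memory from a script instead of watching `docker stats` interactively, you can read the container's memory percentage once with `docker stats --no-stream --format '{{.MemPerc}}' speaches` and compare it to a threshold. The comparison is sketched below in Bash; `mem_over_threshold` is a hypothetical helper, and the 80% threshold in the example is arbitrary.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if a percentage such as "91.50%" (the format
# docker stats prints for MemPerc) is greater than THRESHOLD.
mem_over_threshold() {
  local perc="$1" threshold="$2"
  perc=${perc%\%}   # strip a trailing percent sign
  awk -v p="$perc" -v t="$threshold" 'BEGIN { exit !(p + 0 > t + 0) }'
}

# Example:
#   if mem_over_threshold "$(docker stats --no-stream --format '{{.MemPerc}}' speaches)" 80; then
#     echo "Speaches is using more than 80% of its available memory"
#   fi
```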
Batch Processing
Speaches accepts concurrent requests, and Open Notebook manages them automatically during podcast generation, so you don't need to batch requests yourself.
Troubleshooting
Service Won't Start
Symptom: Container exits immediately
Solutions:
# Check logs
docker compose logs speaches
# Verify Docker is running
docker ps
# Check port availability
lsof -i :8969 # macOS/Linux
netstat -ano | findstr :8969 # Windows
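If `lsof` or `netstat` isn't available, Bash can probe a port itself through its built-in `/dev/tcp` redirection (a Bash feature, not a real file). A minimal sketch follows; `port_in_use` is a hypothetical helper, and it only detects a listener that accepts TCP connections on the given host (127.0.0.1 by default).

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if something accepts TCP connections on PORT.
# Uses Bash's /dev/tcp redirection in a subshell, so the connection is closed
# again as soon as the subshell exits.
port_in_use() {
  local port="$1" host="${2:-127.0.0.1}"
  (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null
}

# Example:
#   if port_in_use 8969; then echo "Port 8969 is already in use"; fi
```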
Connection Refused
Symptom: Open Notebook can't reach Speaches
Solutions:
- Verify Speaches is running:
curl http://localhost:8969/v1/models
- Check Docker networking:
  - Use host.docker.internal instead of localhost when Open Notebook is in Docker
  - Verify firewall settings
- Test from inside the Open Notebook container:
docker exec -it open-notebook curl http://host.docker.internal:8969/v1/models
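When you're unsure which host name works from where Open Notebook runs, you can try several candidate base URLs and keep the first whose `/models` endpoint answers. A Bash sketch follows; `first_reachable_base_url` is a hypothetical helper, the candidates are the URLs discussed in this guide, and it needs `curl` wherever you run it (the Open Notebook container may not include Bash).

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the first base URL whose /models endpoint responds
# successfully within 3 seconds; return 1 if none do.
first_reachable_base_url() {
  local url
  for url in "$@"; do
    if curl -sf --max-time 3 "${url%/}/models" > /dev/null 2>&1; then
      echo "$url"
      return 0
    fi
  done
  echo "none of the candidate URLs responded" >&2
  return 1
}

# Example:
#   first_reachable_base_url http://localhost:8969/v1 \
#     http://host.docker.internal:8969/v1 http://172.17.0.1:8969/v1
```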
Model Not Found
Symptom: Error about missing model during generation
Solutions:
- Verify the model is downloaded:
docker compose exec speaches uv tool run speaches-cli model list
- Download the model:
docker compose exec speaches uv tool run speaches-cli model download speaches-ai/Kokoro-82M-v1.0-ONNX
- Check that the model name matches what you configured in Open Notebook
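You can also check for the model over HTTP. This Bash sketch assumes `GET /v1/models` returns OpenAI-style JSON in which each available model appears as an `"id"` value; `model_listed` is a hypothetical helper, and because it is a plain text match rather than a JSON parser, any exact quoted occurrence of the ID counts as a match.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read a /v1/models JSON response on stdin and succeed if
# the exact model ID appears as a quoted string (plain text match, no JSON parsing).
model_listed() {
  grep -qF -- "\"$1\""
}

# Example:
#   curl -s http://localhost:8969/v1/models | model_listed speaches-ai/Kokoro-82M-v1.0-ONNX \
#     && echo "Model is available"
```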
Poor Audio Quality
Symptom: Generated speech sounds robotic or unclear
Solutions:
- Try different voices
- Adjust speech speed (1.0 is normal, try 0.9-1.2)
- Use higher-quality models if available
- Check that model downloaded completely
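When experimenting with speed values, a quick sanity check catches typos before a long generation run. This Bash sketch accepts plain decimal numbers from 0.25 to 4.0; that range comes from OpenAI's TTS API and is an assumption here, since Speaches or your model may accept a different range. `valid_speed` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if SPEED is a plain decimal number in the range
# 0.25-4.0 (OpenAI's documented range; Speaches may differ).
valid_speed() {
  local speed="$1" re='^[0-9]+(\.[0-9]+)?$'
  [[ "$speed" =~ $re ]] || return 1
  awk -v s="$speed" 'BEGIN { exit !(s + 0 >= 0.25 && s + 0 <= 4.0) }'
}

# Example:
#   for s in 0.9 1.0 1.1 1.2; do valid_speed "$s" && echo "speed $s ok"; done
```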
Slow Generation
Symptom: Audio generation takes a long time
Solutions:
- Enable GPU acceleration if you have an NVIDIA GPU
- Use faster models (smaller models = faster generation)
- Raise speech speed (e.g., 1.5-2.0) for shorter output, keeping in mind that speech at these values sounds noticeably rushed
- Allocate more CPU cores in Docker settings
- Use SSD storage instead of HDD
Out of Memory
Symptom: Container crashes or system freezes
Solutions:
- Increase the Docker memory limit:
services:
  speaches:
    mem_limit: 4g  # Increase this value
- Use smaller models
- Close other applications
- Monitor with docker stats
Voice Not Available
Symptom: Requested voice doesn't work
Solutions:
- Check available voices for your model
- Use one of the documented voices (af_bella, am_adam, etc.)
- Verify voice name spelling (case-sensitive)
Comparison: Local vs Cloud TTS
| Aspect | Local (Speaches) | Cloud (OpenAI/ElevenLabs) |
|---|---|---|
| Cost | Free after setup | $15-50 per 1M characters |
| Privacy | Complete | Data sent to provider |
| Speed | Depends on hardware | Usually faster |
| Quality | Good (improving) | Excellent |
| Setup | Moderate complexity | Simple API key |
| Offline | Yes | No |
| Rate Limits | None | Yes |
| Voices | Limited selection | Many options |
| Languages | Limited | 50+ languages |
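To turn the per-character pricing in the table into a figure for your own content, multiply the character count by the price per million characters. A Bash sketch using `awk` for the decimal arithmetic follows; `estimate_cloud_cost` and `count_chars` are hypothetical helpers, and the price you pass in should come from your provider's current price list.

```bash
#!/usr/bin/env bash
# Hypothetical helper: estimated cloud TTS cost in dollars.
# Usage: estimate_cloud_cost CHARACTERS PRICE_PER_MILLION_CHARACTERS
estimate_cloud_cost() {
  awk -v chars="$1" -v price="$2" 'BEGIN { printf "%.2f\n", chars / 1000000 * price }'
}

# Hypothetical helper: count the characters in a text file (e.g., a podcast script).
# wc -m counts characters according to the current locale.
count_chars() {
  wc -m < "$1" | tr -d ' '
}

# Example, at the low end of the table's price range:
#   estimate_cloud_cost "$(count_chars script.txt)" 15
```

For instance, 200,000 characters at $15 per million comes to $3.00.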
Recommendation:
- Use Local for: Privacy-sensitive content, high-volume generation, development
- Use Cloud for: Production podcasts, multiple languages, premium quality needs
Best Practices
1. Model Management
Download Models Ahead of Time:
# Don't wait until generation time
docker compose exec speaches uv tool run speaches-cli model download speaches-ai/Kokoro-82M-v1.0-ONNX
Keep Models Updated:
# Periodically check for model updates
# Remove old models to save space
docker compose exec speaches uv tool run speaches-cli model list
2. Voice Selection
Test Before Production:
- Generate test audio with different voices
- Choose voices that match your podcast style
- Use consistent voices for recurring speakers
Voice Characteristics:
- Clear pronunciation for educational content
- Expressive voices for storytelling
- Professional voices for business content
3. Resource Management
Monitor System Resources:
# Check Docker resource usage
docker stats speaches
# Monitor disk space for models
docker compose exec speaches df -h
Optimize Docker:
# Set appropriate limits
services:
  speaches:
    mem_limit: 4g
    cpus: 2
4. Backup Strategy
Persist Model Cache:
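The hf-hub-cache volume stores downloaded models. Docker Compose prefixes named volumes with the project name, which defaults to the directory name, so with the setup above the volume is called speaches-setup_hf-hub-cache. To back it up:
# List volumes (confirm the exact volume name)
docker volume ls
# Backup volume
docker run --rm -v speaches-setup_hf-hub-cache:/data -v "$(pwd)":/backup ubuntu tar czf /backup/speaches-models-backup.tar.gz /data
Restore if needed:
docker run --rm -v speaches-setup_hf-hub-cache:/data -v "$(pwd)":/backup ubuntu tar xzf /backup/speaches-models-backup.tar.gz -C /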
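Because Compose derives the volume name from the project name, a backup script can compute it instead of hard-coding it. This Bash sketch assumes the default project name (the directory name, lowercased, keeping only letters, digits, `-` and `_`, which approximates Compose's normalization) and adds a date to the archive name. Both helpers are hypothetical; confirm the result with `docker volume ls`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: approximate the volume name Docker Compose creates for a
# named volume, i.e. "<project>_<volume>", where the project defaults to the
# directory name normalized to lowercase letters, digits, "-" and "_".
compose_volume_name() {
  local project
  project=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr -cd 'a-z0-9_-' | sed 's/^[-_]*//')
  printf '%s_%s\n' "$project" "$2"
}

# Hypothetical helper: dated archive name; pass a date or default to today (YYYY-MM-DD).
backup_file_name() {
  printf 'speaches-models-%s.tar.gz\n' "${1:-$(date +%F)}"
}

# Example, from the speaches-setup directory:
#   vol=$(compose_volume_name "$(basename "$PWD")" hf-hub-cache)
#   docker run --rm -v "$vol":/data -v "$(pwd)":/backup ubuntu \
#     tar czf "/backup/$(backup_file_name)" /data
```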
5. Testing
Always Test First:
# Test with short text before generating long podcasts
curl "http://localhost:8969/v1/audio/speech" -s \
-H "Content-Type: application/json" \
--output test.mp3 \
--data '{
"input": "Test",
"model": "speaches-ai/Kokoro-82M-v1.0-ONNX",
"voice": "af_bella"
}'
Complete Setup Script
For quick setup, save this as setup-speaches.sh:
#!/bin/bash
set -e
echo "Creating Speaches setup directory..."
mkdir -p speaches-setup
cd speaches-setup
echo "Creating docker-compose.yml..."
cat > docker-compose.yml << 'EOF'
services:
  speaches:
    image: ghcr.io/speaches-ai/speaches:latest-cpu
    container_name: speaches
    ports:
      - "8969:8000"
    volumes:
      - hf-hub-cache:/home/ubuntu/.cache/huggingface/hub
    restart: unless-stopped
volumes:
  hf-hub-cache:
EOF
echo "Starting Speaches container..."
docker compose up -d
echo "Waiting for service to be ready..."
sleep 10
echo "Downloading TTS model..."
docker compose exec speaches uv tool run speaches-cli model download speaches-ai/Kokoro-82M-v1.0-ONNX
echo "Testing speech generation..."
curl --fail "http://localhost:8969/v1/audio/speech" -s -H "Content-Type: application/json" \
--output test-audio.mp3 \
--data '{
"input": "Hello! Speaches is now configured and ready to use with Open Notebook.",
"model": "speaches-ai/Kokoro-82M-v1.0-ONNX",
"voice": "af_bella",
"speed": 1.0
}'
echo ""
echo "✅ Setup complete!"
echo ""
echo "Next steps:"
echo "1. Test the audio file: test-audio.mp3"
echo "2. Set environment variable: export OPENAI_COMPATIBLE_BASE_URL_TTS=http://localhost:8969/v1"
echo "3. Configure in Open Notebook Settings → Models"
echo ""
echo "To stop Speaches: docker compose down"
echo "To restart: docker compose up -d"
Make it executable and run:
chmod +x setup-speaches.sh
./setup-speaches.sh
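The fixed `sleep 10` used in the quick start and in this script can be too short if the server takes longer to start, or longer than necessary otherwise. A polling alternative in Bash follows: it retries a command once per second until it succeeds or a timeout expires. `wait_for_ready` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run COMMAND once per second until it succeeds,
# giving up after TIMEOUT seconds.
# Usage: wait_for_ready TIMEOUT COMMAND [ARGS...]
wait_for_ready() {
  local timeout="$1" waited=0
  shift
  until "$@" > /dev/null 2>&1; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "Timed out after ${timeout}s waiting for: $*" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}

# Example, in place of `sleep 10`:
#   wait_for_ready 60 curl -sf http://localhost:8969/v1/models
```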
Using Other Local TTS Servers
The principles in this guide apply to any OpenAI-compatible TTS server. When using a different solution:
- Start your TTS server following its documentation
- Set the environment variable to point to your server:
export OPENAI_COMPATIBLE_BASE_URL_TTS=http://your-server-url:port/v1
- Add the model in Open Notebook using provider openai_compatible
- Use the model name as specified by your TTS server
The key requirement is OpenAI API compatibility - specifically, the /v1/audio/speech endpoint.
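One way to confirm that another server meets this requirement is to send a single request to its /v1/audio/speech endpoint. This Bash sketch uses curl's `--fail` option so HTTP errors produce a non-zero exit status. `tts_smoke_test` is a hypothetical helper; the model and voice must be values your server accepts, and they are inserted into the JSON unescaped, so keep them to plain identifiers.

```bash
#!/usr/bin/env bash
# Hypothetical helper: request a short clip from an OpenAI-compatible server.
# Usage: tts_smoke_test BASE_URL MODEL VOICE OUTPUT_FILE
# MODEL and VOICE are inserted into the JSON as-is (no escaping).
tts_smoke_test() {
  local base_url="${1%/}" model="$2" voice="$3" out="$4"
  curl -sf --max-time 60 "${base_url}/audio/speech" \
    -H "Content-Type: application/json" \
    --output "$out" \
    --data "{\"input\": \"Test\", \"model\": \"${model}\", \"voice\": \"${voice}\"}" \
    || { echo "request to ${base_url}/audio/speech failed" >&2; return 1; }
  [ -s "$out" ] || { echo "${out} is empty" >&2; return 1; }
}

# Example:
#   tts_smoke_test http://your-server-url:port/v1 your-model your-voice test.mp3
```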
Getting Help
Resources:
- Open Notebook Discord: https://discord.gg/37XJPXfz2w - Get help with Open Notebook integration
- Open Notebook Issues: Report integration issues to Open Notebook
- Speaches GitHub: https://github.com/speaches-ai/speaches - For Speaches-specific questions
- Your TTS Server Documentation: Consult the docs for your chosen TTS solution
Common Questions:
Q: Can I use Speaches with multiple Open Notebook instances? A: Yes! Just point each instance to the same Speaches server URL.
Q: How much disk space do I need? A: Each model is 300-800MB. Start with 5GB and add more as you download models.
Q: Can I use this for commercial podcasts? A: Check each model's license on Hugging Face. Terms vary between models, so confirm that commercial use is allowed before publishing.
Q: How does quality compare to ElevenLabs or OpenAI? A: Local models are improving rapidly. For most use cases, quality is very good. Premium services still have an edge for the highest quality needs.
Related Documentation
- OpenAI-Compatible Setup - General OpenAI-compatible provider configuration
- AI Models Guide - Complete AI model configuration
- Podcast Generation - Learn about creating podcasts
- Ollama Setup - Another local AI option for language models
This guide should get you up and running with local text-to-speech in Open Notebook. Enjoy complete privacy and unlimited audio generation! 🎙️