# Local Text-to-Speech Setup Learn how to run text-to-speech models completely locally using OpenAI-compatible TTS servers, giving you full privacy control and zero ongoing costs for podcast and audio generation. This guide uses **Speaches** as an example implementation, but the principles apply to any OpenAI-compatible TTS server. ## Why Local Text-to-Speech? Running text-to-speech locally offers significant advantages: - **🔒 Complete Privacy**: Your content never leaves your machine - **💰 Zero Ongoing Costs**: No per-character or per-minute charges - **⚡ No Rate Limits**: Generate unlimited audio without restrictions - **🌐 Offline Capability**: Works without internet connection - **🎯 Full Control**: Choose and customize your voice models - **📈 Predictable Costs**: One-time setup, no surprises ## Available Local TTS Solutions Open Notebook supports any OpenAI-compatible text-to-speech server. This guide uses **Speaches** as an example because it's: - Open-source and actively maintained - Easy to set up with Docker - Compatible with OpenAI's TTS API specification - Supports multiple high-quality voice models ### About Speaches [Speaches](https://github.com/speaches-ai/speaches) is an open-source, OpenAI-compatible text-to-speech server that runs locally on your machine. It provides: - **OpenAI API Compatibility**: Works seamlessly with Open Notebook's OpenAI-compatible provider - **High-Quality Voices**: Support for multiple neural TTS models - **Easy Model Management**: Simple CLI for downloading and managing voice models - **Docker Support**: Run in containers for easy deployment - **Multiple Voice Options**: Various voices and languages available - **Customizable Speed**: Adjust speech rate to your preference > **Note**: If you're using a different OpenAI-compatible TTS server, the configuration steps will be similar - just adjust the endpoints and model names accordingly. ## Quick Start with Speaches This section demonstrates setup using Speaches as an example. If you're using a different local TTS solution, adapt the steps accordingly. ### Prerequisites - **Docker** installed on your system - At least **2GB RAM** available - **5GB disk space** for models ### Basic Setup The fastest way to get started is using our example setup: **1. Create a project directory:** ```bash mkdir speaches-setup cd speaches-setup ``` **2. Create a `docker-compose.yml` file:** ```yaml services: speaches: image: ghcr.io/speaches-ai/speaches:latest-cpu container_name: speaches ports: - "8969:8000" volumes: - hf-hub-cache:/home/ubuntu/.cache/huggingface/hub restart: unless-stopped volumes: hf-hub-cache: ``` **3. Start the Speaches server:** ```bash docker compose up -d ``` **4. Download a TTS model:** ```bash # Wait a few seconds for the container to start sleep 10 # Download the recommended Kokoro model docker compose exec speaches uv tool run speaches-cli model download speaches-ai/Kokoro-82M-v1.0-ONNX ``` **5. Test the setup:** ```bash curl "http://localhost:8969/v1/audio/speech" -s -H "Content-Type: application/json" \ --output test.mp3 \ --data '{ "input": "Hello! This is a test of local text to speech.", "model": "speaches-ai/Kokoro-82M-v1.0-ONNX", "voice": "af_bella", "speed": 1.0 }' ``` If successful, you'll have a `test.mp3` file with the generated speech! ### Configure Open Notebook Now that Speaches is running, configure Open Notebook to use it: **1. Set the environment variable:** For Docker deployments: ```bash docker run -d \ --name open-notebook \ -p 8502:8502 -p 5055:5055 \ -v ./notebook_data:/app/data \ -v ./surreal_data:/mydata \ -e OPENAI_COMPATIBLE_BASE_URL_TTS=http://host.docker.internal:8969/v1 \ lfnovo/open_notebook:v1-latest-single ``` For local development: ```bash export OPENAI_COMPATIBLE_BASE_URL_TTS=http://localhost:8969/v1 ``` **2. Add the model in Open Notebook:** 1. Go to **Settings** → **Models** page 2. Click **Add Model** in the Text-to-Speech section 3. Configure the model: - **Provider**: `openai_compatible` - **Model Name**: `speaches-ai/Kokoro-82M-v1.0-ONNX` - **Display Name**: `Kokoro Local TTS` (or your preference) 4. Click **Save** **3. Set as default (optional):** - In Settings, set this model as your default Text-to-Speech model - Now all podcast generation will use your local TTS ## Available Voice Models Speaches supports various TTS models from Hugging Face. Here are some recommended options: ### Kokoro (Recommended) - **Model ID**: `speaches-ai/Kokoro-82M-v1.0-ONNX` - **Size**: ~500MB - **Quality**: High - **Speed**: Fast - **Languages**: English - **Voices**: `af_bella`, `af_sarah`, `am_adam`, `am_michael`, and more ### Other Models You can use any compatible ONNX TTS model from Hugging Face. Check the [Speaches documentation](https://github.com/speaches-ai/speaches) for a complete list. ## Available Voices The Kokoro model includes multiple voices with different characteristics: **Female Voices:** - `af_bella` - Clear, professional - `af_sarah` - Warm, friendly - `af_nicole` - Energetic, expressive **Male Voices:** - `am_adam` - Deep, authoritative - `am_michael` - Friendly, conversational - `bf_emma` - British accent, professional - `bm_george` - British accent, formal **Testing Voices:** ```bash # Try different voices to find your favorite for voice in af_bella af_sarah am_adam am_michael; do curl "http://localhost:8969/v1/audio/speech" -s \ -H "Content-Type: application/json" \ --output "test_${voice}.mp3" \ --data "{ \"input\": \"Hello! This is a test of the ${voice} voice.\", \"model\": \"speaches-ai/Kokoro-82M-v1.0-ONNX\", \"voice\": \"${voice}\" }" done ``` ## Advanced Configuration ### GPU Acceleration For faster processing with NVIDIA GPUs: ```yaml services: speaches: image: ghcr.io/speaches-ai/speaches:latest-cuda # GPU-enabled image container_name: speaches ports: - "8969:8000" volumes: - hf-hub-cache:/home/ubuntu/.cache/huggingface/hub restart: unless-stopped deploy: resources: reservations: devices: - driver: nvidia count: 1 capabilities: [gpu] volumes: hf-hub-cache: ``` ### Custom Port If port 8969 is already in use, change it in docker-compose.yml: ```yaml ports: - "9000:8000" # Use port 9000 instead ``` Then update your environment variable: ```bash export OPENAI_COMPATIBLE_BASE_URL_TTS=http://localhost:9000/v1 ``` ### Multiple Models Download and use multiple models for different purposes: ```bash # Download additional models docker compose exec speaches uv tool run speaches-cli model download model-name-1 docker compose exec speaches uv tool run speaches-cli model download model-name-2 # List downloaded models docker compose exec speaches uv tool run speaches-cli model list ``` In Open Notebook, add each model separately and choose which to use for different podcasts. ## Network Configuration ### Docker Networking When Open Notebook runs in Docker and needs to reach Speaches: **On macOS/Windows:** ```bash export OPENAI_COMPATIBLE_BASE_URL_TTS=http://host.docker.internal:8969/v1 ``` **On Linux:** ```bash # Option 1: Use Docker bridge IP export OPENAI_COMPATIBLE_BASE_URL_TTS=http://172.17.0.1:8969/v1 # Option 2: Use host networking docker run --network host ... ``` ### Remote Speaches Server Run Speaches on a different machine for distributed processing: ```bash # On the server machine docker compose up -d # Allow external connections (be careful with firewall settings) # Update docker-compose.yml to bind to 0.0.0.0:8969 ``` Then configure Open Notebook: ```bash export OPENAI_COMPATIBLE_BASE_URL_TTS=http://server-ip:8969/v1 ``` **Security Warning:** Only expose Speaches on trusted networks or use proper authentication/firewall rules. ## Podcast Generation ### Creating Podcasts with Local TTS Once configured, use Speaches for podcast generation: 1. **Go to Podcasts page** in Open Notebook 2. **Create or edit an Episode Profile** 3. **Configure speakers:** - For each speaker, select your Speaches model - Choose different voices (e.g., `af_bella` for host, `am_adam` for guest) 4. **Generate podcast** 5. **Audio is generated locally** using your Speaches server ### Multi-Speaker Setup Create natural-sounding conversations with different voices: ``` Speaker 1 (Host): - Model: speaches-ai/Kokoro-82M-v1.0-ONNX - Voice: af_bella Speaker 2 (Guest): - Model: speaches-ai/Kokoro-82M-v1.0-ONNX - Voice: am_adam Speaker 3 (Narrator): - Model: speaches-ai/Kokoro-82M-v1.0-ONNX - Voice: bf_emma ``` ## Performance Optimization ### CPU Performance **Recommended Specs:** - 4+ CPU cores - 4GB+ RAM - SSD storage **Tips:** - Close unnecessary applications - Use quantized models when available - Adjust speech speed for faster generation ### Memory Management Monitor Docker memory usage: ```bash docker stats speaches ``` Allocate more memory if needed: ```yaml services: speaches: # ... other config ... mem_limit: 4g # Adjust based on your system ``` ### Batch Processing For generating multiple audio files, Speaches handles concurrent requests efficiently. Open Notebook automatically manages this during podcast generation. ## Troubleshooting ### Service Won't Start **Symptom:** Container exits immediately **Solutions:** ```bash # Check logs docker compose logs speaches # Verify Docker is running docker ps # Check port availability lsof -i :8969 # macOS/Linux netstat -ano | findstr :8969 # Windows ``` --- ### Connection Refused **Symptom:** Open Notebook can't reach Speaches **Solutions:** 1. **Verify Speaches is running:** ```bash curl http://localhost:8969/v1/models ``` 2. **Check Docker networking:** - Use `host.docker.internal` instead of `localhost` when Open Notebook is in Docker - Verify firewall settings 3. **Test from inside Open Notebook container:** ```bash docker exec -it open-notebook curl http://host.docker.internal:8969/v1/models ``` --- ### Model Not Found **Symptom:** Error about missing model during generation **Solutions:** 1. **Verify model is downloaded:** ```bash docker compose exec speaches uv tool run speaches-cli model list ``` 2. **Download the model:** ```bash docker compose exec speaches uv tool run speaches-cli model download speaches-ai/Kokoro-82M-v1.0-ONNX ``` 3. **Check model name matches** what you configured in Open Notebook --- ### Poor Audio Quality **Symptom:** Generated speech sounds robotic or unclear **Solutions:** - Try different voices - Adjust speech speed (1.0 is normal, try 0.9-1.2) - Use higher-quality models if available - Check that model downloaded completely --- ### Slow Generation **Symptom:** Audio generation takes a long time **Solutions:** - **Enable GPU acceleration** if you have an NVIDIA GPU - **Use faster models** (smaller models = faster generation) - **Adjust speech speed** to 1.5-2.0 for quicker output - **Allocate more CPU cores** in Docker settings - **Use SSD storage** instead of HDD --- ### Out of Memory **Symptom:** Container crashes or system freezes **Solutions:** 1. **Increase Docker memory limit:** ```yaml services: speaches: mem_limit: 4g # Increase this value ``` 2. **Use smaller models** 3. **Close other applications** 4. **Monitor with** `docker stats` --- ### Voice Not Available **Symptom:** Requested voice doesn't work **Solutions:** - Check available voices for your model - Use one of the documented voices (af_bella, am_adam, etc.) - Verify voice name spelling (case-sensitive) ## Comparison: Local vs Cloud TTS | Aspect | Local (Speaches) | Cloud (OpenAI/ElevenLabs) | |--------|------------------|---------------------------| | **Cost** | Free after setup | $15-50 per 1M characters | | **Privacy** | Complete | Data sent to provider | | **Speed** | Depends on hardware | Usually faster | | **Quality** | Good (improving) | Excellent | | **Setup** | Moderate complexity | Simple API key | | **Offline** | Yes | No | | **Rate Limits** | None | Yes | | **Voices** | Limited selection | Many options | | **Languages** | Limited | 50+ languages | **Recommendation:** - **Use Local** for: Privacy-sensitive content, high-volume generation, development - **Use Cloud** for: Production podcasts, multiple languages, premium quality needs ## Best Practices ### 1. Model Management **Download Models Ahead of Time:** ```bash # Don't wait until generation time docker compose exec speaches uv tool run speaches-cli model download speaches-ai/Kokoro-82M-v1.0-ONNX ``` **Keep Models Updated:** ```bash # Periodically check for model updates # Remove old models to save space docker compose exec speaches uv tool run speaches-cli model list ``` ### 2. Voice Selection **Test Before Production:** - Generate test audio with different voices - Choose voices that match your podcast style - Use consistent voices for recurring speakers **Voice Characteristics:** - Clear pronunciation for educational content - Expressive voices for storytelling - Professional voices for business content ### 3. Resource Management **Monitor System Resources:** ```bash # Check Docker resource usage docker stats speaches # Monitor disk space for models docker compose exec speaches df -h ``` **Optimize Docker:** ```yaml # Set appropriate limits services: speaches: mem_limit: 4g cpus: 2 ``` ### 4. Backup Strategy **Persist Model Cache:** The `hf-hub-cache` volume stores downloaded models. To backup: ```bash # List volumes docker volume ls # Backup volume docker run --rm -v hf-hub-cache:/data -v $(pwd):/backup ubuntu tar czf /backup/speaches-models-backup.tar.gz /data ``` **Restore if needed:** ```bash docker run --rm -v hf-hub-cache:/data -v $(pwd):/backup ubuntu tar xzf /backup/speaches-models-backup.tar.gz -C / ``` ### 5. Testing **Always Test First:** ```bash # Test with short text before generating long podcasts curl "http://localhost:8969/v1/audio/speech" -s \ -H "Content-Type: application/json" \ --output test.mp3 \ --data '{ "input": "Test", "model": "speaches-ai/Kokoro-82M-v1.0-ONNX", "voice": "af_bella" }' ``` ## Complete Setup Script For quick setup, save this as `setup-speaches.sh`: ```bash #!/bin/bash set -e echo "Creating Speaches setup directory..." mkdir -p speaches-setup cd speaches-setup echo "Creating docker-compose.yml..." cat > docker-compose.yml << 'EOF' services: speaches: image: ghcr.io/speaches-ai/speaches:latest-cpu container_name: speaches ports: - "8969:8000" volumes: - hf-hub-cache:/home/ubuntu/.cache/huggingface/hub restart: unless-stopped volumes: hf-hub-cache: EOF echo "Starting Speaches container..." docker compose up -d echo "Waiting for service to be ready..." sleep 10 echo "Downloading TTS model..." docker compose exec speaches uv tool run speaches-cli model download speaches-ai/Kokoro-82M-v1.0-ONNX echo "Testing speech generation..." curl "http://localhost:8969/v1/audio/speech" -s -H "Content-Type: application/json" \ --output test-audio.mp3 \ --data '{ "input": "Hello! Speaches is now configured and ready to use with Open Notebook.", "model": "speaches-ai/Kokoro-82M-v1.0-ONNX", "voice": "af_bella", "speed": 1.0 }' echo "" echo "✅ Setup complete!" echo "" echo "Next steps:" echo "1. Test the audio file: test-audio.mp3" echo "2. Set environment variable: export OPENAI_COMPATIBLE_BASE_URL_TTS=http://localhost:8969/v1" echo "3. Configure in Open Notebook Settings → Models" echo "" echo "To stop Speaches: docker compose down" echo "To restart: docker compose up -d" ``` Make it executable and run: ```bash chmod +x setup-speaches.sh ./setup-speaches.sh ``` ## Using Other Local TTS Servers The principles in this guide apply to any OpenAI-compatible TTS server. When using a different solution: 1. **Start your TTS server** following its documentation 2. **Set the environment variable** to point to your server: ```bash export OPENAI_COMPATIBLE_BASE_URL_TTS=http://your-server-url:port/v1 ``` 3. **Add the model in Open Notebook** using provider `openai_compatible` 4. **Use the model name** as specified by your TTS server The key requirement is OpenAI API compatibility - specifically, the `/v1/audio/speech` endpoint. ## Getting Help **Resources:** - **Open Notebook Discord**: [https://discord.gg/37XJPXfz2w](https://discord.gg/37XJPXfz2w) - Get help with Open Notebook integration - **Open Notebook Issues**: Report integration issues to Open Notebook - **Speaches GitHub**: [https://github.com/speaches-ai/speaches](https://github.com/speaches-ai/speaches) - For Speaches-specific questions - **Your TTS Server Documentation**: Consult the docs for your chosen TTS solution **Common Questions:** **Q: Can I use Speaches with multiple Open Notebook instances?** A: Yes! Just point each instance to the same Speaches server URL. **Q: How much disk space do I need?** A: Each model is 300-800MB. Start with 5GB and add more as you download models. **Q: Can I use this for commercial podcasts?** A: Check the model's license on Hugging Face. Most open models allow commercial use. **Q: How does quality compare to ElevenLabs or OpenAI?** A: Local models are improving rapidly. For most use cases, quality is very good. Premium services still have an edge for the highest quality needs. ## Related Documentation - **[OpenAI-Compatible Setup](openai-compatible.md)** - General OpenAI-compatible provider configuration - **[AI Models Guide](ai-models.md)** - Complete AI model configuration - **[Podcast Generation](podcasts.md)** - Learn about creating podcasts - **[Ollama Setup](ollama.md)** - Another local AI option for language models --- This guide should get you up and running with local text-to-speech in Open Notebook. Enjoy complete privacy and unlimited audio generation! 🎙️