From 80c9b7d3ff2f560a11fe5eafbebe3095f1581fb0 Mon Sep 17 00:00:00 2001
From: LUIS NOVO
Date: Sun, 19 Oct 2025 12:06:29 -0300
Subject: [PATCH] docs: local tts setup guide

---
 docs/features/index.md             |  23 +
 docs/features/local_tts.md         | 668 +++++++++++++++++++++++++++++
 docs/features/openai-compatible.md |  11 +-
 docs/features/podcasts.md          |  10 +
 docs/index.md                      |   3 +
 5 files changed, 714 insertions(+), 1 deletion(-)
 create mode 100644 docs/features/local_tts.md

diff --git a/docs/features/index.md b/docs/features/index.md
index c7bb2ce..79f4479 100644
--- a/docs/features/index.md
+++ b/docs/features/index.md
@@ -20,6 +20,29 @@ Master Open Notebook's granular context control system.
 - Integration with AI models for cost control
 - Advanced context features and automation
 
+### 🔧 **[OpenAI-Compatible Providers](openai-compatible.md)**
+Use any OpenAI-compatible endpoint with Open Notebook.
+- LM Studio, Text Generation WebUI, vLLM support
+- Mode-specific configuration for different capabilities
+- Docker networking and remote server setup
+- Comprehensive troubleshooting and best practices
+- Works with local and cloud endpoints
+
+### 🎙️ **[Local Text-to-Speech](local_tts.md)**
+Run text-to-speech completely locally using OpenAI-compatible TTS servers.
+- Zero ongoing costs after setup
+- Complete privacy - audio never leaves your machine
+- Multiple voice options and models
+- Perfect for podcast generation
+- Various local TTS solutions available
+
+### 🦙 **[Ollama Setup](ollama.md)**
+Configure local language models and embeddings with Ollama.
+- Free, privacy-focused AI models
+- Network configuration and Docker integration
+- Model recommendations and optimization
+- Troubleshooting and best practices
+
 ## 🔧 Content Processing
 
 ### ⚡ **[Transformations](transformations.md)**
diff --git a/docs/features/local_tts.md b/docs/features/local_tts.md
new file mode 100644
index 0000000..5247422
--- /dev/null
+++ b/docs/features/local_tts.md
@@ -0,0 +1,668 @@
+# Local Text-to-Speech Setup
+
+Learn how to run text-to-speech models completely locally using OpenAI-compatible TTS servers, giving you full privacy control and zero ongoing costs for podcast and audio generation.
+
+This guide uses **Speaches** as an example implementation, but the principles apply to any OpenAI-compatible TTS server.
+
+## Why Local Text-to-Speech?
+
+Running text-to-speech locally offers significant advantages:
+
+- **🔒 Complete Privacy**: Your content never leaves your machine
+- **💰 Zero Ongoing Costs**: No per-character or per-minute charges
+- **⚡ No Rate Limits**: Generate unlimited audio without restrictions
+- **🌐 Offline Capability**: Works without internet connection
+- **🎯 Full Control**: Choose and customize your voice models
+- **📈 Predictable Costs**: One-time setup, no surprises
+
+## Available Local TTS Solutions
+
+Open Notebook supports any OpenAI-compatible text-to-speech server. This guide uses **Speaches** as an example because it's:
+
+- Open-source and actively maintained
+- Easy to set up with Docker
+- Compatible with OpenAI's TTS API specification
+- Supports multiple high-quality voice models
+
+### About Speaches
+
+[Speaches](https://github.com/speaches-ai/speaches) is an open-source, OpenAI-compatible text-to-speech server that runs locally on your machine. It provides:
+
+- **OpenAI API Compatibility**: Works seamlessly with Open Notebook's OpenAI-compatible provider
+- **High-Quality Voices**: Support for multiple neural TTS models
+- **Easy Model Management**: Simple CLI for downloading and managing voice models
+- **Docker Support**: Run in containers for easy deployment
+- **Multiple Voice Options**: Various voices and languages available
+- **Customizable Speed**: Adjust speech rate to your preference
+
+> **Note**: If you're using a different OpenAI-compatible TTS server, the configuration steps will be similar - just adjust the endpoints and model names accordingly.
+
+## Quick Start with Speaches
+
+This section demonstrates setup using Speaches as an example. If you're using a different local TTS solution, adapt the steps accordingly.
+
+### Prerequisites
+
+- **Docker** installed on your system
+- At least **2GB RAM** available
+- **5GB disk space** for models
+
+### Basic Setup
+
+The fastest way to get started is using our example setup:
+
+**1. Create a project directory:**
+```bash
+mkdir speaches-setup
+cd speaches-setup
+```
+
+**2. Create a `docker-compose.yml` file:**
+```yaml
+services:
+  speaches:
+    image: ghcr.io/speaches-ai/speaches:latest-cpu
+    container_name: speaches
+    ports:
+      - "8969:8000"
+    volumes:
+      - hf-hub-cache:/home/ubuntu/.cache/huggingface/hub
+    restart: unless-stopped
+
+volumes:
+  hf-hub-cache:
+```
+
+**3. Start the Speaches server:**
+```bash
+docker compose up -d
+```
+
+**4. Download a TTS model:**
+```bash
+# Wait a few seconds for the container to start
+sleep 10
+
+# Download the recommended Kokoro model
+docker compose exec speaches uv tool run speaches-cli model download speaches-ai/Kokoro-82M-v1.0-ONNX
+```
+
+**5. Test the setup:**
+```bash
+curl "http://localhost:8969/v1/audio/speech" -s -H "Content-Type: application/json" \
+  --output test.mp3 \
+  --data '{
+    "input": "Hello! This is a test of local text to speech.",
+    "model": "speaches-ai/Kokoro-82M-v1.0-ONNX",
+    "voice": "af_bella",
+    "speed": 1.0
+  }'
+```
+
+If successful, you'll have a `test.mp3` file with the generated speech!
+
+### Configure Open Notebook
+
+Now that Speaches is running, configure Open Notebook to use it:
+
+**1. Set the environment variable (the base URL includes the `/v1` path):**
+
+For Docker deployments (on Linux, `host.docker.internal` needs extra setup; see **Docker Networking** below):
+```bash
+docker run -d \
+  --name open-notebook \
+  -p 8502:8502 -p 5055:5055 \
+  -v ./notebook_data:/app/data \
+  -v ./surreal_data:/mydata \
+  -e OPENAI_COMPATIBLE_BASE_URL_TTS=http://host.docker.internal:8969/v1 \
+  lfnovo/open_notebook:v1-latest-single
+```
+
+For local development:
+```bash
+export OPENAI_COMPATIBLE_BASE_URL_TTS=http://localhost:8969/v1
+```
+
+**2. Add the model in Open Notebook:**
+
+1. Go to **Settings** → **Models** page
+2. Click **Add Model** in the Text-to-Speech section
+3. Configure the model:
+   - **Provider**: `openai_compatible`
+   - **Model Name**: `speaches-ai/Kokoro-82M-v1.0-ONNX`
+   - **Display Name**: `Kokoro Local TTS` (or your preference)
+4. Click **Save**
+
+**3. Set as default (optional):**
+- In Settings, set this model as your default Text-to-Speech model
+- Now all podcast generation will use your local TTS
+
+## Available Voice Models
+
+Speaches supports various TTS models from Hugging Face. Here are some recommended options:
+
+### Kokoro (Recommended)
+- **Model ID**: `speaches-ai/Kokoro-82M-v1.0-ONNX`
+- **Size**: ~500MB
+- **Quality**: High
+- **Speed**: Fast
+- **Languages**: English
+- **Voices**: `af_bella`, `af_sarah`, `am_adam`, `am_michael`, and more
+
+### Other Models
+You can use any compatible ONNX TTS model from Hugging Face. Check the [Speaches documentation](https://github.com/speaches-ai/speaches) for a complete list.
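To confirm which model IDs your server currently exposes before adding one in Open Notebook, you can query the same `/v1/models` endpoint used in Troubleshooting. The script below is a minimal sketch, assuming Speaches answers with an OpenAI-style list (`{"data": [{"id": "..."}]}`) that includes your downloaded models; `has_model` and `TTS_URL` are hypothetical names for this example, not part of Speaches or Open Notebook.

```bash
#!/usr/bin/env bash
# Sketch: check whether a model ID is listed by an OpenAI-style /v1/models endpoint.
# Assumption: the response looks like {"data":[{"id":"<model-id>", ...}, ...]}.
# TTS_URL and has_model are hypothetical names used only in this example.
TTS_URL="${TTS_URL:-http://localhost:8969/v1}"

# Reads the /v1/models JSON on stdin; returns 0 if "$1" appears as an "id" value.
has_model() {
  # Drop all whitespace so compact and pretty-printed JSON look the same,
  # then search for the exact "id":"<model>" pair as a fixed string.
  tr -d '[:space:]' | grep -Fq "\"id\":\"$1\""
}

# Only query the server when run directly with a model ID argument.
if [[ "${BASH_SOURCE[0]}" == "$0" && -n "${1:-}" ]]; then
  if curl -sf "${TTS_URL}/models" | has_model "$1"; then
    echo "available: $1"
  else
    echo "not listed: $1 (download it with speaches-cli, or check TTS_URL)" >&2
    exit 1
  fi
fi
```

Saved as, for example, `check-model.sh`, running `bash check-model.sh speaches-ai/Kokoro-82M-v1.0-ONNX` exits non-zero when the ID is not listed. Matching the full `"id":"..."` pair, closing quote included, avoids false positives when one model ID is a prefix of another.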
+
+## Available Voices
+
+The Kokoro model includes multiple voices with different characteristics. Voice ID prefixes encode accent and gender (`af_` American female, `am_` American male, `bf_` British female, `bm_` British male):
+
+**Female Voices:**
+- `af_bella` - Clear, professional
+- `af_sarah` - Warm, friendly
+- `af_nicole` - Energetic, expressive
+- `bf_emma` - British accent, professional
+
+**Male Voices:**
+- `am_adam` - Deep, authoritative
+- `am_michael` - Friendly, conversational
+- `bm_george` - British accent, formal
+
+**Testing Voices:**
+```bash
+# Try different voices to find your favorite
+for voice in af_bella af_sarah am_adam am_michael; do
+  curl "http://localhost:8969/v1/audio/speech" -s \
+    -H "Content-Type: application/json" \
+    --output "test_${voice}.mp3" \
+    --data "{
+      \"input\": \"Hello! This is a test of the ${voice} voice.\",
+      \"model\": \"speaches-ai/Kokoro-82M-v1.0-ONNX\",
+      \"voice\": \"${voice}\"
+    }"
+done
+```
+
+## Advanced Configuration
+
+### GPU Acceleration
+
+For faster processing with NVIDIA GPUs (requires the NVIDIA Container Toolkit on the host):
+
+```yaml
+services:
+  speaches:
+    image: ghcr.io/speaches-ai/speaches:latest-cuda  # GPU-enabled image
+    container_name: speaches
+    ports:
+      - "8969:8000"
+    volumes:
+      - hf-hub-cache:/home/ubuntu/.cache/huggingface/hub
+    restart: unless-stopped
+    deploy:
+      resources:
+        reservations:
+          devices:
+            - driver: nvidia
+              count: 1
+              capabilities: [gpu]
+
+volumes:
+  hf-hub-cache:
+```
+
+### Custom Port
+
+If port 8969 is already in use, change the host-side port (the number before the colon) in docker-compose.yml:
+
+```yaml
+ports:
+  - "9000:8000"  # Use port 9000 instead
+```
+
+Then update your environment variable:
+```bash
+export OPENAI_COMPATIBLE_BASE_URL_TTS=http://localhost:9000/v1
+```
+
+### Multiple Models
+
+Download and use multiple models for different purposes:
+
+```bash
+# Download additional models (model-name-1/2 are placeholders for real model IDs)
+docker compose exec speaches uv tool run speaches-cli model download model-name-1
+docker compose exec speaches uv tool run speaches-cli model download model-name-2
+
+# List downloaded models
+docker compose exec speaches uv tool run speaches-cli model list
+```
+
+In Open Notebook, add each model separately and choose which to use for different podcasts.
+
+## Network Configuration
+
+### Docker Networking
+
+When Open Notebook runs in Docker and needs to reach Speaches (pass the variable to the container with `-e` or in your compose `environment:` section):
+
+**On macOS/Windows:**
+```bash
+export OPENAI_COMPATIBLE_BASE_URL_TTS=http://host.docker.internal:8969/v1
+```
+
+**On Linux:**
+```bash
+# Option 1: Use the Docker bridge gateway IP (172.17.0.1 by default; check yours)
+export OPENAI_COMPATIBLE_BASE_URL_TTS=http://172.17.0.1:8969/v1
+
+# Option 2: Use host networking, then point the URL at http://localhost:8969/v1
+docker run --network host ...
+```
+
+### Remote Speaches Server
+
+Run Speaches on a different machine for distributed processing:
+
+```bash
+# On the server machine
+docker compose up -d
+
+# The default "8969:8000" mapping already publishes the port on all interfaces (0.0.0.0).
+# Limit access with firewall rules, or use "127.0.0.1:8969:8000" to allow local connections only.
+```
+
+Then configure Open Notebook:
+```bash
+export OPENAI_COMPATIBLE_BASE_URL_TTS=http://server-ip:8969/v1
+```
+
+**Security Warning:** Only expose Speaches on trusted networks or use proper authentication/firewall rules.
+
+## Podcast Generation
+
+### Creating Podcasts with Local TTS
+
+Once configured, use Speaches for podcast generation:
+
+1. **Go to Podcasts page** in Open Notebook
+2. **Create or edit an Episode Profile**
+3. **Configure speakers:**
+   - For each speaker, select your Speaches model
+   - Choose different voices (e.g., `af_bella` for host, `am_adam` for guest)
+4. **Generate podcast**
+5. **Audio is generated locally** using your Speaches server
+
+### Multi-Speaker Setup
+
+Create natural-sounding conversations with different voices:
+
+```
+Speaker 1 (Host):
+- Model: speaches-ai/Kokoro-82M-v1.0-ONNX
+- Voice: af_bella
+
+Speaker 2 (Guest):
+- Model: speaches-ai/Kokoro-82M-v1.0-ONNX
+- Voice: am_adam
+
+Speaker 3 (Narrator):
+- Model: speaches-ai/Kokoro-82M-v1.0-ONNX
+- Voice: bf_emma
+```
+
+## Performance Optimization
+
+### CPU Performance
+
+**Recommended Specs:**
+- 4+ CPU cores
+- 4GB+ RAM
+- SSD storage
+
+**Tips:**
+- Close unnecessary applications
+- Use quantized models when available
+- A higher speech speed produces shorter audio, which can slightly reduce generation time
+
+### Memory Management
+
+Monitor Docker memory usage:
+```bash
+docker stats speaches
+```
+
+Raise the container's memory limit if needed (on Docker Desktop, also check the memory assigned to Docker in its settings):
+```yaml
+services:
+  speaches:
+    # ... other config ...
+    mem_limit: 4g  # Adjust based on your system
+```
+
+### Batch Processing
+
+Speaches can handle concurrent requests when generating multiple audio files, and Open Notebook manages these requests automatically during podcast generation.
+
+## Troubleshooting
+
+### Service Won't Start
+
+**Symptom:** Container exits immediately
+
+**Solutions:**
+```bash
+# Check logs
+docker compose logs speaches
+
+# Verify Docker is running
+docker ps
+
+# Check port availability
+lsof -i :8969                   # macOS/Linux
+netstat -ano | findstr :8969    # Windows
+```
+
+---
+
+### Connection Refused
+
+**Symptom:** Open Notebook can't reach Speaches
+
+**Solutions:**
+1. **Verify Speaches is running:**
+   ```bash
+   curl http://localhost:8969/v1/models
+   ```
+
+2. **Check Docker networking:**
+   - Use `host.docker.internal` instead of `localhost` when Open Notebook is in Docker
+   - Verify firewall settings
+
+3. **Test from inside Open Notebook container:**
+   ```bash
+   docker exec -it open-notebook curl http://host.docker.internal:8969/v1/models
+   ```
+
+---
+
+### Model Not Found
+
+**Symptom:** Error about missing model during generation
+
+**Solutions:**
+1. **Verify model is downloaded:**
+   ```bash
+   docker compose exec speaches uv tool run speaches-cli model list
+   ```
+
+2. **Download the model:**
+   ```bash
+   docker compose exec speaches uv tool run speaches-cli model download speaches-ai/Kokoro-82M-v1.0-ONNX
+   ```
+
+3. **Check model name matches** what you configured in Open Notebook
+
+---
+
+### Poor Audio Quality
+
+**Symptom:** Generated speech sounds robotic or unclear
+
+**Solutions:**
+- Try different voices
+- Adjust speech speed (1.0 is normal, try 0.9-1.2)
+- Use higher-quality models if available
+- Check that model downloaded completely
+
+---
+
+### Slow Generation
+
+**Symptom:** Audio generation takes a long time
+
+**Solutions:**
+- **Enable GPU acceleration** if you have an NVIDIA GPU
+- **Use faster models** (smaller models = faster generation)
+- **Raise speech speed** (e.g., 1.5-2.0) for shorter, faster-paced output
+- **Allocate more CPU cores** in Docker settings
+- **Use SSD storage** instead of HDD
+
+---
+
+### Out of Memory
+
+**Symptom:** Container crashes or system freezes
+
+**Solutions:**
+1. **Increase Docker memory limit:**
+   ```yaml
+   services:
+     speaches:
+       mem_limit: 4g  # Increase this value
+   ```
+
+2. **Use smaller models**
+3. **Close other applications**
+4. **Monitor with** `docker stats`
+
+---
+
+### Voice Not Available
+
+**Symptom:** Requested voice doesn't work
+
+**Solutions:**
+- Check available voices for your model
+- Use one of the documented voices (`af_bella`, `am_adam`, etc.)
+- Verify voice name spelling (case-sensitive)
+
+## Comparison: Local vs Cloud TTS
+
+| Aspect | Local (Speaches) | Cloud (OpenAI/ElevenLabs) |
+|--------|------------------|---------------------------|
+| **Cost** | Free after setup | $15-50 per 1M characters |
+| **Privacy** | Complete | Data sent to provider |
+| **Speed** | Depends on hardware | Usually faster |
+| **Quality** | Good (improving) | Excellent |
+| **Setup** | Moderate complexity | Simple API key |
+| **Offline** | Yes | No |
+| **Rate Limits** | None | Yes |
+| **Voices** | Limited selection | Many options |
+| **Languages** | Limited | 50+ languages |
+
+**Recommendation:**
+- **Use Local** for: Privacy-sensitive content, high-volume generation, development
+- **Use Cloud** for: Production podcasts, multiple languages, premium quality needs
+
+## Best Practices
+
+### 1. Model Management
+
+**Download Models Ahead of Time:**
+```bash
+# Don't wait until generation time
+docker compose exec speaches uv tool run speaches-cli model download speaches-ai/Kokoro-82M-v1.0-ONNX
+```
+
+**Keep Models Updated:**
+```bash
+# Periodically check for model updates
+# Remove old models to save space
+docker compose exec speaches uv tool run speaches-cli model list
+```
+
+### 2. Voice Selection
+
+**Test Before Production:**
+- Generate test audio with different voices
+- Choose voices that match your podcast style
+- Use consistent voices for recurring speakers
+
+**Voice Characteristics:**
+- Clear pronunciation for educational content
+- Expressive voices for storytelling
+- Professional voices for business content
+
+### 3. Resource Management
+
+**Monitor System Resources:**
+```bash
+# Check Docker resource usage
+docker stats speaches
+
+# Monitor disk space for models
+docker compose exec speaches df -h
+```
+
+**Optimize Docker:**
+```yaml
+# Set appropriate limits
+services:
+  speaches:
+    mem_limit: 4g
+    cpus: 2
+```
+
+### 4. Backup Strategy
+
+**Persist Model Cache:**
+The `hf-hub-cache` volume stores downloaded models. Docker Compose prefixes volume names with the project name (the directory name by default), so with the setup above the volume is usually named `speaches-setup_hf-hub-cache`. To back it up:
+```bash
+# List volumes and note the exact volume name
+docker volume ls
+
+# Backup volume
+docker run --rm -v speaches-setup_hf-hub-cache:/data -v "$(pwd)":/backup ubuntu tar czf /backup/speaches-models-backup.tar.gz /data
+```
+
+**Restore if needed:**
+```bash
+docker run --rm -v speaches-setup_hf-hub-cache:/data -v "$(pwd)":/backup ubuntu tar xzf /backup/speaches-models-backup.tar.gz -C /
+```
+
+### 5. Testing
+
+**Always Test First:**
+```bash
+# Test with short text before generating long podcasts
+curl "http://localhost:8969/v1/audio/speech" -s \
+  -H "Content-Type: application/json" \
+  --output test.mp3 \
+  --data '{
+    "input": "Test",
+    "model": "speaches-ai/Kokoro-82M-v1.0-ONNX",
+    "voice": "af_bella"
+  }'
+```
+
+## Complete Setup Script
+
+For quick setup, save this as `setup-speaches.sh`:
+
+```bash
+#!/bin/bash
+set -e
+
+echo "Creating Speaches setup directory..."
+mkdir -p speaches-setup
+cd speaches-setup
+
+echo "Creating docker-compose.yml..."
+cat > docker-compose.yml << 'EOF'
+services:
+  speaches:
+    image: ghcr.io/speaches-ai/speaches:latest-cpu
+    container_name: speaches
+    ports:
+      - "8969:8000"
+    volumes:
+      - hf-hub-cache:/home/ubuntu/.cache/huggingface/hub
+    restart: unless-stopped
+
+volumes:
+  hf-hub-cache:
+EOF
+
+echo "Starting Speaches container..."
+docker compose up -d
+
+echo "Waiting for service to be ready..."
+sleep 10
+
+echo "Downloading TTS model..."
+docker compose exec speaches uv tool run speaches-cli model download speaches-ai/Kokoro-82M-v1.0-ONNX
+
+echo "Testing speech generation..."
+curl --fail "http://localhost:8969/v1/audio/speech" -s -H "Content-Type: application/json" \
+  --output test-audio.mp3 \
+  --data '{
+    "input": "Hello! Speaches is now configured and ready to use with Open Notebook.",
+    "model": "speaches-ai/Kokoro-82M-v1.0-ONNX",
+    "voice": "af_bella",
+    "speed": 1.0
+  }'
+
+echo ""
+echo "✅ Setup complete!"
+echo ""
+echo "Next steps:"
+echo "1. Test the audio file: speaches-setup/test-audio.mp3"
+echo "2. Set environment variable: export OPENAI_COMPATIBLE_BASE_URL_TTS=http://localhost:8969/v1"
+echo "3. Configure in Open Notebook Settings → Models"
+echo ""
+echo "To stop Speaches: docker compose down"
+echo "To restart: docker compose up -d"
+```
+
+Make it executable and run:
+```bash
+chmod +x setup-speaches.sh
+./setup-speaches.sh
+```
+
+## Using Other Local TTS Servers
+
+The principles in this guide apply to any OpenAI-compatible TTS server. When using a different solution:
+
+1. **Start your TTS server** following its documentation
+2. **Set the environment variable** to point to your server:
+   ```bash
+   export OPENAI_COMPATIBLE_BASE_URL_TTS=http://your-server-url:port/v1
+   ```
+3. **Add the model in Open Notebook** using provider `openai_compatible`
+4. **Use the model name** as specified by your TTS server
+
+The key requirement is OpenAI API compatibility - specifically, the `/v1/audio/speech` endpoint.
+
+## Getting Help
+
+**Resources:**
+- **Open Notebook Discord**: [https://discord.gg/37XJPXfz2w](https://discord.gg/37XJPXfz2w) - Get help with Open Notebook integration
+- **Open Notebook Issues**: Report integration issues to Open Notebook
+- **Speaches GitHub**: [https://github.com/speaches-ai/speaches](https://github.com/speaches-ai/speaches) - For Speaches-specific questions
+- **Your TTS Server Documentation**: Consult the docs for your chosen TTS solution
+
+**Common Questions:**
+
+**Q: Can I use Speaches with multiple Open Notebook instances?**
+A: Yes! Just point each instance to the same Speaches server URL.
+
+**Q: How much disk space do I need?**
+A: Each model is 300-800MB. Start with 5GB and add more as you download models.
+
+**Q: Can I use this for commercial podcasts?**
+A: Check the model's license on Hugging Face. Many open models allow commercial use, but terms vary.
+
+**Q: How does quality compare to ElevenLabs or OpenAI?**
+A: Local models are improving rapidly. For most use cases, quality is very good. Premium services still have an edge for the highest quality needs.
+
+## Related Documentation
+
+- **[OpenAI-Compatible Setup](openai-compatible.md)** - General OpenAI-compatible provider configuration
+- **[AI Models Guide](ai-models.md)** - Complete AI model configuration
+- **[Podcast Generation](podcasts.md)** - Learn about creating podcasts
+- **[Ollama Setup](ollama.md)** - Another local AI option for language models
+
+---
+
+This guide should get you up and running with local text-to-speech in Open Notebook. Enjoy complete privacy and unlimited audio generation! 🎙️
diff --git a/docs/features/openai-compatible.md b/docs/features/openai-compatible.md
index 9bcd974..c463350 100644
--- a/docs/features/openai-compatible.md
+++ b/docs/features/openai-compatible.md
@@ -53,9 +53,11 @@ export OPENAI_COMPATIBLE_BASE_URL_EMBEDDING=http://localhost:8080/v1
 
 # Speech services on a different server
 export OPENAI_COMPATIBLE_BASE_URL_STT=http://localhost:9000/v1
-export OPENAI_COMPATIBLE_BASE_URL_TTS=http://localhost:9000/v1
+export OPENAI_COMPATIBLE_BASE_URL_TTS=http://localhost:8969/v1
 ```
 
+> **🎙️ Want free, local text-to-speech?** Check our [Local TTS Setup Guide](local_tts.md) for completely private, zero-cost podcast generation!
+
 ## Environment Variable Reference
 
 ### Generic Configuration
@@ -530,6 +532,13 @@ export OPENAI_API_KEY=your_openai_key  # Fallback to OpenAI for embeddings
 - Validate all required modalities work
 - Check error handling
 
+## Related Guides
+
+**OpenAI-Compatible Setups:**
+- **[Local TTS Setup](local_tts.md)** - Free, private text-to-speech for podcasts
+- **[Ollama Setup](ollama.md)** - Local language models and embeddings
+- **[AI Models Guide](ai-models.md)** - Complete model configuration overview
+
 ## Getting Help
 
 **Community Resources:**
diff --git a/docs/features/podcasts.md b/docs/features/podcasts.md
index e0288da..f366442 100644
--- a/docs/features/podcasts.md
+++ b/docs/features/podcasts.md
@@ -114,6 +114,16 @@ Each speaker profile includes:
 - Emotional expression control
 - Professional-grade output
 
+#### **Local TTS (OpenAI-Compatible)**
+- 🆕 **Completely Free**: Zero ongoing costs after setup
+- 🔒 **Full Privacy**: Audio generation never leaves your machine
+- 🚀 **No Rate Limits**: Generate unlimited podcasts
+- 🎙️ **Multiple Voices**: Various high-quality voice options
+- ⚡ **Fast Processing**: Local generation without network latency
+- 🔧 **Multiple Options**: Various local TTS servers available
+
+> **💡 Want to run TTS locally?** Check our comprehensive [Local TTS Setup Guide](local_tts.md) for step-by-step setup instructions, voice selection tips, and troubleshooting help. Perfect for privacy-focused users or high-volume podcast generation!
+
 ## 🔄 Background Processing & Queue Management
 
 ### Non-Blocking Experience
diff --git a/docs/index.md b/docs/index.md
index fe53cc2..ad69554 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -41,6 +41,9 @@ Deep dives into what makes Open Notebook special.
 - **[Transformations](features/transformations.md)** - Custom content processing
 - **[Podcasts](features/podcasts.md)** - Multi-speaker podcast generation
 - **[Citations](features/citations.md)** - Research integrity support
+- **[Local TTS](features/local_tts.md)** - 🆕 Free, private local text-to-speech
+- **[OpenAI-Compatible](features/openai-compatible.md)** - Use LM Studio and compatible endpoints
+- **[Ollama](features/ollama.md)** - Local AI models setup
 - **[REST API](development/api-reference.md)** - [![API Docs](https://img.shields.io/badge/API-Documentation-blue?style=flat-square)](http://localhost:5055/docs) Complete programmatic access
 
 ---