LUIS NOVO dd79d7a511 docs: clearly shows /v1 to prevent user mistakes

2025-10-19 12:08:48 -03:00

Local Text-to-Speech Setup

Learn how to run text-to-speech models completely locally using OpenAI-compatible TTS servers. This gives you full control over your data and no per-use costs for podcast and audio generation.

This guide uses Speaches as an example implementation, but the principles apply to any OpenAI-compatible TTS server.

Why Local Text-to-Speech?

Running text-to-speech locally offers significant advantages:

🔒 Complete Privacy: Your content never leaves your machine
💰 Zero Ongoing Costs: No per-character or per-minute charges
⚡ No Rate Limits: Generate as much audio as your hardware can handle
🌐 Offline Capability: Works without an internet connection once models are downloaded
🎯 Full Control: Choose and customize your voice models
📈 Predictable Costs: One-time setup, no surprises

Available Local TTS Solutions

Open Notebook supports any OpenAI-compatible text-to-speech server. This guide uses Speaches as an example because it is:

Open-source and actively maintained
Easy to set up with Docker
Compatible with OpenAI's TTS API specification
Able to run multiple high-quality voice models

About Speaches

Speaches is an open-source, OpenAI-compatible text-to-speech server that runs locally on your machine. It provides:

OpenAI API Compatibility: Works seamlessly with Open Notebook's OpenAI-compatible provider
High-Quality Voices: Support for multiple neural TTS models
Easy Model Management: Simple CLI for downloading and managing voice models
Docker Support: Run in containers for easy deployment
Multiple Voice Options: Various voices and languages available
Customizable Speed: Adjust speech rate to your preference

Note: If you're using a different OpenAI-compatible TTS server, the configuration steps are similar; just adjust the endpoints and model names accordingly.

Quick Start with Speaches

This section demonstrates setup using Speaches as an example. If you're using a different local TTS solution, adapt the steps accordingly.

Prerequisites

Docker with the Compose plugin (the docker compose command used throughout this guide)
At least 2GB RAM available
5GB of free disk space for models

Basic Setup

The fastest way to get started is to use the example setup below:

1. Create a project directory:

mkdir speaches-setup
cd speaches-setup

2. Create a docker-compose.yml file:

services:
  speaches:
    image: ghcr.io/speaches-ai/speaches:latest-cpu
    container_name: speaches
    ports:
      - "8969:8000"
    volumes:
      - hf-hub-cache:/home/ubuntu/.cache/huggingface/hub
    restart: unless-stopped

volumes:
  hf-hub-cache:

3. Start the Speaches server:

docker compose up -d

4. Download a TTS model:

# Wait a few seconds for the container to start
sleep 10

# Download the recommended Kokoro model
docker compose exec speaches uv tool run speaches-cli model download speaches-ai/Kokoro-82M-v1.0-ONNX
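The fixed sleep 10 above is a guess; on a slow first start the server may need longer. As an alternative, here is a minimal sketch of a polling helper, assuming curl is installed on the host. It requests the /v1/models endpoint (the same one used under Troubleshooting) until the server answers. The function name wait_for_speaches is hypothetical and not part of Speaches.

```shell
# Hypothetical helper (not part of Speaches): poll the server until it responds.
# Usage: wait_for_speaches BASE_URL [MAX_ATTEMPTS]
# BASE_URL is the /v1 base, e.g. http://localhost:8969/v1
wait_for_speaches() {
  local base_url="${1%/}"        # drop one trailing slash, if present
  local max_attempts="${2:-60}"  # roughly one attempt per second
  local attempt=0
  # -s: silent, -f: treat HTTP errors as failures, --max-time: never hang
  until curl -sf --max-time 2 "${base_url}/models" > /dev/null; do
    attempt=$((attempt + 1))
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "Speaches did not respond at ${base_url} after ${max_attempts} attempts" >&2
      return 1
    fi
    sleep 1
  done
  echo "Speaches is responding at ${base_url}"
}
```

For example, wait_for_speaches http://localhost:8969/v1 && docker compose exec speaches uv tool run speaches-cli model download speaches-ai/Kokoro-82M-v1.0-ONNX starts the download only once the server is up.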

5. Test the setup:

curl "http://localhost:8969/v1/audio/speech" -s -H "Content-Type: application/json" \
  --output test.mp3 \
  --data '{
    "input": "Hello! This is a test of local text to speech.",
    "model": "speaches-ai/Kokoro-82M-v1.0-ONNX",
    "voice": "af_bella",
    "speed": 1.0
  }'

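If successful, test.mp3 will contain the generated speech! If the request failed, curl still writes the server's response to test.mp3, so a very small file usually holds a JSON error message you can read with cat test.mp3.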

Configure Open Notebook

Now that Speaches is running, configure Open Notebook to use it:

1. Set the environment variable:

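For Docker deployments (on Linux, host.docker.internal does not resolve by default; see Network Configuration below for alternatives):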

docker run -d \
  --name open-notebook \
  -p 8502:8502 -p 5055:5055 \
  -v ./notebook_data:/app/data \
  -v ./surreal_data:/mydata \
  -e OPENAI_COMPATIBLE_BASE_URL_TTS=http://host.docker.internal:8969/v1 \
  lfnovo/open_notebook:v1-latest-single

For local development:

export OPENAI_COMPATIBLE_BASE_URL_TTS=http://localhost:8969/v1
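A common mistake is leaving off the /v1 suffix. The following is a minimal sketch of a hypothetical check_tts_base_url helper (not part of Open Notebook) that checks a value against the format used throughout this guide: an http:// or https:// URL ending in /v1 with no trailing slash.

```shell
# Hypothetical helper (not part of Open Notebook): check that a TTS base URL
# has the shape used in this guide, e.g. http://localhost:8969/v1
check_tts_base_url() {
  case "$1" in
    http://*/v1 | https://*/v1)
      return 0 ;;
    http://* | https://*)
      echo "Base URL should end in /v1 with no trailing slash: $1" >&2
      return 1 ;;
    *)
      echo "Base URL should start with http:// or https://: $1" >&2
      return 1 ;;
  esac
}

# Example: validate the variable before starting Open Notebook
# check_tts_base_url "$OPENAI_COMPATIBLE_BASE_URL_TTS" && echo "looks good"
```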

2. Add the model in Open Notebook:

Go to Settings → Models page
Click Add Model in the Text-to-Speech section
Configure the model:
- Provider: openai_compatible
- Model Name: speaches-ai/Kokoro-82M-v1.0-ONNX
- Display Name: Kokoro Local TTS (or your preference)
Click Save

3. Set as default (optional):

In Settings, set this model as your default Text-to-Speech model
Now all podcast generation will use your local TTS

Available Voice Models

Speaches supports various TTS models from Hugging Face. Here are some recommended options:

Kokoro (Recommended)

Model ID: speaches-ai/Kokoro-82M-v1.0-ONNX
Size: ~500MB
Quality: High
Speed: Fast
Languages: English
Voices: af_bella, af_sarah, am_adam, am_michael, and more

Other Models

You can use any compatible ONNX TTS model from Hugging Face. Check the Speaches documentation for a complete list.

Available Voices

The Kokoro model includes multiple voices with different characteristics:

Female Voices:

af_bella - Clear, professional
af_sarah - Warm, friendly
af_nicole - Energetic, expressive
bf_emma - British accent, professional

Male Voices:

am_adam - Deep, authoritative
am_michael - Friendly, conversational
bm_george - British accent, formal
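Kokoro voice IDs encode accent and gender in a two-letter prefix: the first letter is the accent (a for American English, b for British English) and the second is f (female) or m (male), so bf_emma is a British English female voice. The sketch below decodes that prefix. describe_voice is a hypothetical helper, and it deliberately handles only the English prefixes used in this guide.

```shell
# Hypothetical helper: decode the prefix of a Kokoro voice ID.
# Only the English prefixes used in this guide (a*, b*) are handled.
describe_voice() {
  local voice="$1" accent gender
  case "$voice" in
    a?_*) accent="American English" ;;
    b?_*) accent="British English" ;;
    *) echo "unrecognized voice prefix: $voice" >&2; return 1 ;;
  esac
  case "$voice" in
    ?f_*) gender="female" ;;
    ?m_*) gender="male" ;;
    *) echo "unrecognized voice prefix: $voice" >&2; return 1 ;;
  esac
  echo "$accent, $gender"
}

# Example:
# for v in af_bella am_adam bf_emma bm_george; do echo "$v: $(describe_voice "$v")"; done
```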

Testing Voices:

# Try different voices to find your favorite
for voice in af_bella af_sarah am_adam am_michael; do
  curl "http://localhost:8969/v1/audio/speech" -s \
    -H "Content-Type: application/json" \
    --output "test_${voice}.mp3" \
    --data "{
      \"input\": \"Hello! This is a test of the ${voice} voice.\",
      \"model\": \"speaches-ai/Kokoro-82M-v1.0-ONNX\",
      \"voice\": \"${voice}\"
    }"
done

Advanced Configuration

GPU Acceleration

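For faster processing with an NVIDIA GPU, use the CUDA image and reserve a GPU for the container (this requires the NVIDIA Container Toolkit on the host):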

services:
  speaches:
    image: ghcr.io/speaches-ai/speaches:latest-cuda  # GPU-enabled image
    container_name: speaches
    ports:
      - "8969:8000"
    volumes:
      - hf-hub-cache:/home/ubuntu/.cache/huggingface/hub
    restart: unless-stopped
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

volumes:
  hf-hub-cache:

Custom Port

If port 8969 is already in use, change it in docker-compose.yml:

ports:
  - "9000:8000"  # Use port 9000 instead

Then update your environment variable:

export OPENAI_COMPATIBLE_BASE_URL_TTS=http://localhost:9000/v1

Multiple Models

Download and use multiple models for different purposes:

# Download additional models
docker compose exec speaches uv tool run speaches-cli model download model-name-1
docker compose exec speaches uv tool run speaches-cli model download model-name-2

# List downloaded models
docker compose exec speaches uv tool run speaches-cli model list

In Open Notebook, add each model separately and choose which to use for different podcasts.

Network Configuration

Docker Networking

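When Open Notebook runs in Docker, localhost inside its container refers to the container itself, not to your machine. Pass one of the following values into the Open Notebook container (for example with -e on docker run, as in the Configure Open Notebook step):

On macOS/Windows:

-e OPENAI_COMPATIBLE_BASE_URL_TTS=http://host.docker.internal:8969/v1

On Linux:

# Option 1: Use the Docker bridge gateway IP (172.17.0.1 by default)
-e OPENAI_COMPATIBLE_BASE_URL_TTS=http://172.17.0.1:8969/v1

# Option 2: Use host networking; localhost then reaches the host directly
docker run --network host -e OPENAI_COMPATIBLE_BASE_URL_TTS=http://localhost:8969/v1 ...

# Option 3: Map host.docker.internal to the host gateway (Docker 20.10+)
docker run --add-host=host.docker.internal:host-gateway -e OPENAI_COMPATIBLE_BASE_URL_TTS=http://host.docker.internal:8969/v1 ...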
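The choice above can be sketched as a small function that prints the base URL for a given host OS and port. This is a hypothetical helper, assuming Speaches publishes its port on the host and, on Linux, that the default bridge gateway is 172.17.0.1 (check yours with docker network inspect bridge).

```shell
# Hypothetical helper: print the base URL that Open Notebook, running in Docker,
# could use to reach Speaches on the host.
# Usage: speaches_url_from_container [OS] [PORT]   (defaults: output of uname -s, 8969)
speaches_url_from_container() {
  local os="${1:-$(uname -s)}" port="${2:-8969}" host
  case "$os" in
    Darwin | MINGW* | MSYS* | CYGWIN*)
      host="host.docker.internal" ;;   # provided by Docker Desktop
    Linux)
      host="172.17.0.1" ;;             # default docker0 bridge gateway
    *)
      echo "unsupported OS: $os" >&2
      return 1 ;;
  esac
  echo "http://${host}:${port}/v1"
}
```

For example, docker run ... -e OPENAI_COMPATIBLE_BASE_URL_TTS="$(speaches_url_from_container)" ... fills in the value for the current machine. Pass the port as the second argument if you changed it under Custom Port.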

Remote Speaches Server

Run Speaches on a different machine for distributed processing:

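# On the server machine
docker compose up -d

# The "8969:8000" port mapping already listens on all interfaces (0.0.0.0),
# so other machines can connect as long as your firewall allows port 8969.
# To keep Speaches reachable only from the server itself, use "127.0.0.1:8969:8000".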

Then configure Open Notebook:

export OPENAI_COMPATIBLE_BASE_URL_TTS=http://server-ip:8969/v1

Security Warning: Only expose Speaches on trusted networks or use proper authentication/firewall rules.

Podcast Generation

Creating Podcasts with Local TTS

Once configured, use Speaches for podcast generation:

Go to Podcasts page in Open Notebook
Create or edit an Episode Profile
Configure speakers:
- For each speaker, select your Speaches model
- Choose different voices (e.g., af_bella for host, am_adam for guest)
Generate podcast
Audio is generated locally using your Speaches server

Multi-Speaker Setup

Create natural-sounding conversations with different voices:

Speaker 1 (Host):
- Model: speaches-ai/Kokoro-82M-v1.0-ONNX
- Voice: af_bella

Speaker 2 (Guest):
- Model: speaches-ai/Kokoro-82M-v1.0-ONNX
- Voice: am_adam

Speaker 3 (Narrator):
- Model: speaches-ai/Kokoro-82M-v1.0-ONNX
- Voice: bf_emma

Performance Optimization

CPU Performance

Recommended Specs:

4+ CPU cores
4GB+ RAM
SSD storage

Tips:

Close unnecessary applications
Use quantized models when available
Raise speech speed slightly; shorter audio may generate a little faster

Memory Management

Monitor Docker memory usage:

docker stats speaches

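Set or raise the container's memory limit if needed (on Docker Desktop, also check the memory assigned to Docker under Settings → Resources):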

services:
  speaches:
    # ... other config ...
    mem_limit: 4g  # Adjust based on your system

Batch Processing

For generating multiple audio files, Speaches can serve concurrent requests, and Open Notebook automatically manages these requests during podcast generation.

Troubleshooting

Service Won't Start

Symptom: Container exits immediately

Solutions:

# Check logs
docker compose logs speaches

# Verify Docker is running
docker ps

# Check port availability
lsof -i :8969  # macOS/Linux
netstat -ano | findstr :8969  # Windows

Connection Refused

Symptom: Open Notebook can't reach Speaches

Solutions:

Verify Speaches is running:
```
curl http://localhost:8969/v1/models
```
Check Docker networking:
- Use host.docker.internal instead of localhost when Open Notebook is in Docker
- Verify firewall settings

Test from inside Open Notebook container:

docker exec -it open-notebook curl http://host.docker.internal:8969/v1/models

Model Not Found

Symptom: Error about missing model during generation

Solutions:

Verify model is downloaded:

docker compose exec speaches uv tool run speaches-cli model list

Download the model:

docker compose exec speaches uv tool run speaches-cli model download speaches-ai/Kokoro-82M-v1.0-ONNX

Check model name matches what you configured in Open Notebook

Poor Audio Quality

Symptom: Generated speech sounds robotic or unclear

Solutions:

Try different voices
Adjust speech speed (1.0 is normal, try 0.9-1.2)
Use higher-quality models if available
Check that model downloaded completely

Slow Generation

Symptom: Audio generation takes a long time

Solutions:

Enable GPU acceleration if you have an NVIDIA GPU
Use faster models (smaller models = faster generation)
Raise speech speed slightly (for example 1.1-1.2) for shorter audio; much higher values tend to sound rushed
Allocate more CPU cores in Docker settings
Use SSD storage instead of HDD

Out of Memory

Symptom: Container crashes or system freezes

Solutions:

Increase Docker memory limit:

services:
  speaches:
    mem_limit: 4g  # Increase this value

Use smaller models
Close other applications
Monitor with docker stats

Voice Not Available

Symptom: Requested voice doesn't work

Solutions:

Check available voices for your model
Use one of the documented voices (af_bella, am_adam, etc.)
Verify voice name spelling (case-sensitive)

Comparison: Local vs Cloud TTS

Aspect	Local (Speaches)	Cloud (OpenAI/ElevenLabs)
Cost	Free after setup	$15-50 per 1M characters
Privacy	Complete	Data sent to provider
Speed	Depends on hardware	Usually faster
Quality	Good (improving)	Excellent
Setup	Moderate complexity	Simple API key
Offline	Yes	No
Rate Limits	None	Yes
Voices	Limited selection	Many options
Languages	Limited	50+ languages
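To put the cloud cost row in perspective, the arithmetic is characters × price per million ÷ 1,000,000. The sketch below does it in integer cents, rounded down. The function name estimate_cloud_cost and the 27,000-character example script are illustrative assumptions, and the prices are the range quoted in the table, not current vendor pricing.

```shell
# Hypothetical helper: estimated cloud TTS cost in dollars for a script.
# Usage: estimate_cloud_cost CHARACTERS PRICE_USD_PER_MILLION_CHARS (whole dollars)
# Uses integer arithmetic, so the result is rounded down to whole cents.
estimate_cloud_cost() {
  local chars="$1" price="$2" cents
  cents=$(( chars * price * 100 / 1000000 ))
  printf '%d.%02d\n' $((cents / 100)) $((cents % 100))
}

# Example: a 27,000-character script at both ends of the quoted range
# estimate_cloud_cost 27000 15   # prints 0.40
# estimate_cloud_cost 27000 50   # prints 1.35
```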

Recommendation:

Use Local for: Privacy-sensitive content, high-volume generation, development
Use Cloud for: Production podcasts, multiple languages, premium quality needs

Best Practices

1. Model Management

Download Models Ahead of Time:

# Don't wait until generation time
docker compose exec speaches uv tool run speaches-cli model download speaches-ai/Kokoro-82M-v1.0-ONNX

Keep Models Updated:

# Periodically check for model updates
# Remove old models to save space
docker compose exec speaches uv tool run speaches-cli model list

2. Voice Selection

Test Before Production:

Generate test audio with different voices
Choose voices that match your podcast style
Use consistent voices for recurring speakers

Voice Characteristics:

Clear pronunciation for educational content
Expressive voices for storytelling
Professional voices for business content

3. Resource Management

Monitor System Resources:

# Check Docker resource usage
docker stats speaches

# Monitor disk space for models
docker compose exec speaches df -h

Optimize Docker:

# Set appropriate limits
services:
  speaches:
    mem_limit: 4g
    cpus: 2

4. Backup Strategy

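Persist Model Cache: The hf-hub-cache volume stores downloaded models. Docker Compose prefixes volume names with the project name (by default, the name of the directory containing docker-compose.yml), so in the speaches-setup directory the volume is called speaches-setup_hf-hub-cache. To back it up:

# List volumes to confirm the exact name
docker volume ls

# Backup volume
docker run --rm -v speaches-setup_hf-hub-cache:/data -v "$(pwd)":/backup ubuntu tar czf /backup/speaches-models-backup.tar.gz /data

Restore if needed:

docker run --rm -v speaches-setup_hf-hub-cache:/data -v "$(pwd)":/backup ubuntu tar xzf /backup/speaches-models-backup.tar.gz -C /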
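Compose-managed volumes usually carry a project-name prefix, so the exact volume name depends on where docker compose was run. Here is a minimal sketch of a hypothetical find_cache_volume filter that reads volume names on standard input (for example from docker volume ls --format '{{.Name}}') and prints the single one named hf-hub-cache or ending in _hf-hub-cache.

```shell
# Hypothetical helper: pick the Speaches model-cache volume from a list of
# volume names on stdin. Fails unless exactly one candidate is found.
find_cache_volume() {
  awk '$0 == "hf-hub-cache" || /_hf-hub-cache$/ { name = $0; n++ }
       END {
         if (n != 1) {
           printf("expected one hf-hub-cache volume, found %d\n", n) > "/dev/stderr"
           exit 1
         }
         print name
       }'
}

# Example backup using the detected name:
# VOL=$(docker volume ls --format '{{.Name}}' | find_cache_volume) &&
#   docker run --rm -v "$VOL":/data -v "$(pwd)":/backup ubuntu \
#     tar czf /backup/speaches-models-backup.tar.gz /data
```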

5. Testing

Always Test First:

# Test with short text before generating long podcasts
curl "http://localhost:8969/v1/audio/speech" -s \
  -H "Content-Type: application/json" \
  --output test.mp3 \
  --data '{
    "input": "Test",
    "model": "speaches-ai/Kokoro-82M-v1.0-ONNX",
    "voice": "af_bella"
  }'

Complete Setup Script

For quick setup, save this as setup-speaches.sh:

#!/bin/bash
set -e

echo "Creating Speaches setup directory..."
mkdir -p speaches-setup
cd speaches-setup

echo "Creating docker-compose.yml..."
cat > docker-compose.yml << 'EOF'
services:
  speaches:
    image: ghcr.io/speaches-ai/speaches:latest-cpu
    container_name: speaches
    ports:
      - "8969:8000"
    volumes:
      - hf-hub-cache:/home/ubuntu/.cache/huggingface/hub
    restart: unless-stopped

volumes:
  hf-hub-cache:
EOF

echo "Starting Speaches container..."
docker compose up -d

echo "Waiting for service to be ready..."
sleep 10

echo "Downloading TTS model..."
docker compose exec speaches uv tool run speaches-cli model download speaches-ai/Kokoro-82M-v1.0-ONNX

echo "Testing speech generation..."
curl "http://localhost:8969/v1/audio/speech" -s -H "Content-Type: application/json" \
  --output test-audio.mp3 \
  --data '{
    "input": "Hello! Speaches is now configured and ready to use with Open Notebook.",
    "model": "speaches-ai/Kokoro-82M-v1.0-ONNX",
    "voice": "af_bella",
    "speed": 1.0
  }'

echo ""
echo "✅ Setup complete!"
echo ""
echo "Next steps:"
echo "1. Test the audio file: speaches-setup/test-audio.mp3"
echo "2. Set environment variable: export OPENAI_COMPATIBLE_BASE_URL_TTS=http://localhost:8969/v1"
echo "3. Configure in Open Notebook Settings → Models"
echo ""
echo "To stop Speaches: cd speaches-setup && docker compose down"
echo "To restart: cd speaches-setup && docker compose up -d"

Make it executable and run:

chmod +x setup-speaches.sh
./setup-speaches.sh

Using Other Local TTS Servers

The principles in this guide apply to any OpenAI-compatible TTS server. When using a different solution:

Start your TTS server following its documentation

Set the environment variable to point to your server:

export OPENAI_COMPATIBLE_BASE_URL_TTS=http://your-server-url:port/v1

Add the model in Open Notebook using provider openai_compatible
Use the model name as specified by your TTS server

The key requirement is OpenAI API compatibility - specifically, the /v1/audio/speech endpoint.
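A quick way to check that a server honours this contract is to send a short request and look at the response's Content-Type. The sketch below is a hypothetical probe_tts helper. It assumes a compliant server answers with an audio/* Content-Type (an error usually comes back as JSON instead), and the model and voice you pass must be ones your server actually provides.

```shell
# Hypothetical helper: send a tiny request to an OpenAI-compatible
# /audio/speech endpoint and report whether audio came back.
# Usage: probe_tts BASE_URL MODEL VOICE   (BASE_URL ends in /v1)
probe_tts() {
  local base_url="$1" model="$2" voice="$3" out ctype
  out=$(mktemp) || return 1
  # -o saves the body to a temp file; -w prints only the response Content-Type
  if ! ctype=$(curl -s --max-time 60 -o "$out" -w '%{content_type}' \
      -H "Content-Type: application/json" \
      -d "{\"input\": \"Test\", \"model\": \"${model}\", \"voice\": \"${voice}\"}" \
      "${base_url}/audio/speech"); then
    echo "No response from ${base_url}/audio/speech" >&2
    rm -f "$out"
    return 1
  fi
  case "$ctype" in
    audio/*)
      echo "OK: received ${ctype}"
      rm -f "$out"
      return 0 ;;
    *)
      echo "Unexpected Content-Type '${ctype}'; response body:" >&2
      cat "$out" >&2
      rm -f "$out"
      return 1 ;;
  esac
}

# Example:
# probe_tts http://localhost:8969/v1 speaches-ai/Kokoro-82M-v1.0-ONNX af_bella
```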

Getting Help

Resources:

Open Notebook Discord: https://discord.gg/37XJPXfz2w - Get help with Open Notebook integration
Open Notebook Issues: Report integration issues to Open Notebook
Speaches GitHub: https://github.com/speaches-ai/speaches - For Speaches-specific questions
Your TTS Server Documentation: Consult the docs for your chosen TTS solution

Common Questions:

Q: Can I use Speaches with multiple Open Notebook instances? A: Yes! Just point each instance to the same Speaches server URL.

Q: How much disk space do I need? A: Each model is 300-800MB. Start with 5GB and add more as you download models.

Q: Can I use this for commercial podcasts? A: Check each model's license on Hugging Face. Licenses vary between models, so confirm that commercial use is permitted before publishing.

Q: How does quality compare to ElevenLabs or OpenAI? A: Local models are improving rapidly. For most use cases, quality is very good. Premium services still have an edge for the highest quality needs.

Related Documentation

OpenAI-Compatible Setup - General OpenAI-compatible provider configuration
AI Models Guide - Complete AI model configuration
Podcast Generation - Learn about creating podcasts
Ollama Setup - Another local AI option for language models

This guide should get you up and running with local text-to-speech in Open Notebook. Enjoy complete privacy and unlimited audio generation! 🎙️
