## ElatoAI: Realtime Voice AI Models on FastAPI

`server/fastapi` is the simplest self-hosted Elato backend for people who want a normal Python server instead of an edge runtime.

Use this if you want:

- a FastAPI server you can run on your own machine or VM
- a classic `STT -> LLM -> TTS` voice pipeline
- a smaller provider surface that is easy to understand
- the same ESP32 transport shape as the rest of Elato

If you are new to the project, read these first:

- `README.md` (repository root)
- `server/README.md`

## The Simple Provider Set

To keep onboarding straightforward, the classic FastAPI route is centered around a small set of providers.

### LLM

- `openai`
- `claude`
- `gemini`
- `grok`

### STT

- `deepgram`
- `whisper`

### TTS

- `elevenlabs`
- `cartesia`
- `deepgram`
- `openai`

The code still uses the `models/llm`, `models/stt`, and `models/tts` layout, but the active registry is intentionally trimmed so the default experience stays simple.

## Default Setup

The default classic route is:

- STT: `deepgram`
- LLM: `openai`
- TTS: `elevenlabs`

That gives people one obvious path to get running before they start swapping providers.

## Project Layout

```text
server/fastapi/
├── bot.py
├── voice_pipeline.py
├── esp32_transport.py
├── server.py
├── env.example
└── models/
    ├── llm/
    ├── stt/
    └── tts/
```

## How The FastAPI Server Fits Into Elato

Elato has three backend options right now:

- `server/deno`
- `server/cloudflare`
- `server/fastapi`

A clean way to think about them is:

- `Deno`: edge-first, mature provider integrations
- `Cloudflare`: Workers + Durable Objects + Workers AI
- `FastAPI`: normal Python server, easy to self-host, easy to reason about

## Quick Start

### 1. Create or activate your Python environment

Use whatever you prefer. If you already use `uv`, that is a good default.

### 2. Install dependencies

This backend declares its dependencies in `pyproject.toml`, so install from that file rather than from a `requirements.txt`.

With `uv`:

```bash
# Run from the repository root
cd server/fastapi
uv sync
```

Or with plain pip in your venv:

```bash
# Run from the repository root
cd server/fastapi
pip install -e .
```

### 3. Create your env file

Copy the example values from:

- `server/fastapi/env.example`

Minimum example for the default route:

```env
DEEPGRAM_API_KEY=your_deepgram_api_key
OPENAI_API_KEY=your_openai_api_key
ELEVENLABS_API_KEY=your_elevenlabs_api_key

CURRENT_VOICE_ROUTE=classic
CLASSIC_STT_PROVIDER=deepgram
CLASSIC_LLM_PROVIDER=openai
CLASSIC_TTS_PROVIDER=elevenlabs

ESP32_INPUT_SAMPLE_RATE=16000
BROWSER_INPUT_SAMPLE_RATE=16000
AUDIO_OUTPUT_SAMPLE_RATE=24000
PIPELINE_AUDIO_IN_SAMPLE_RATE=16000
PIPELINE_AUDIO_OUT_SAMPLE_RATE=24000

ALLOWED_ORIGINS=*
HOST=0.0.0.0
PORT=7860
```
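The numeric values in that file arrive as strings, so the server has to convert them at some point. The following is a minimal sketch of one way to do that, not the code in `server.py`: the function name `load_int_settings` is hypothetical, and falling back to the example values when a variable is unset is an assumption.

```python
import os

# Defaults copied from the example env block above.
INT_DEFAULTS = {
    "ESP32_INPUT_SAMPLE_RATE": 16000,
    "BROWSER_INPUT_SAMPLE_RATE": 16000,
    "AUDIO_OUTPUT_SAMPLE_RATE": 24000,
    "PIPELINE_AUDIO_IN_SAMPLE_RATE": 16000,
    "PIPELINE_AUDIO_OUT_SAMPLE_RATE": 24000,
    "PORT": 7860,
}


def load_int_settings(env=None):
    """Parse the integer settings, using the example values when unset (assumed behavior)."""
    env = os.environ if env is None else env
    settings = {}
    for name, default in INT_DEFAULTS.items():
        raw = env.get(name, "").strip()
        try:
            settings[name] = int(raw) if raw else default
        except ValueError:
            # Fail with the variable name so a typo in the env file is easy to find.
            raise ValueError(f"{name} must be an integer, got {raw!r}") from None
    return settings


if __name__ == "__main__":
    print(load_int_settings())
```

Passing a plain dict instead of `os.environ` makes the parsing easy to check without touching the real environment.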

### 4. Run the server

If you use `uv`:

```bash
# Run from the repository root
cd server/fastapi
uv run server.py
```

If you use your activated venv directly:

```bash
# Run from the repository root
cd server/fastapi
python server.py
```

### 5. Point your ESP32 at the FastAPI backend

Update the firmware config so your hardware connects to this server instead of the Deno or Cloudflare backend.

The ESP32 route is:

```text
/ws/esp32
```

For browser or Next.js testing, the server also exposes:

- `/ws/browser`
- `/ws/nextjs`
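When filling in the firmware config, keep in mind that `HOST=0.0.0.0` is a bind address, not something a device can connect to; the ESP32 needs an address at which the machine is actually reachable. As a rough illustration, the helper below builds the full WebSocket URL for each route. The function `ws_url` and the address `192.168.1.50` are placeholders for this example, not part of the repo.

```python
# Routes listed in this README.
ROUTES = {
    "esp32": "/ws/esp32",
    "browser": "/ws/browser",
    "nextjs": "/ws/nextjs",
}


def ws_url(host, port, client="esp32", secure=False):
    """Build the WebSocket URL a given client type would connect to."""
    scheme = "wss" if secure else "ws"  # use wss only behind a TLS-terminating proxy
    return f"{scheme}://{host}:{port}{ROUTES[client]}"


if __name__ == "__main__":
    # 192.168.1.50 is a placeholder LAN address for the machine running the server.
    for client in ROUTES:
        print(ws_url("192.168.1.50", 7860, client))
```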

## How Provider Selection Works

The classic route reads three env vars:

- `CLASSIC_STT_PROVIDER`
- `CLASSIC_LLM_PROVIDER`
- `CLASSIC_TTS_PROVIDER`

So changing providers is just an env change.
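As a minimal sketch of what reading those variables can look like (assuming unset variables fall back to the default stack, which this README implies but does not state; `read_classic_selection` is a hypothetical name, not a function from the repo):

```python
import os

# Defaults mirror the "Default Setup" section.
DEFAULTS = {
    "CLASSIC_STT_PROVIDER": "deepgram",
    "CLASSIC_LLM_PROVIDER": "openai",
    "CLASSIC_TTS_PROVIDER": "elevenlabs",
}


def read_classic_selection(env=None):
    """Return {"stt": ..., "llm": ..., "tts": ...} from the CLASSIC_* variables."""
    env = os.environ if env is None else env
    selection = {}
    for var, default in DEFAULTS.items():
        category = var.split("_")[1].lower()  # "CLASSIC_STT_PROVIDER" -> "stt"
        # Normalize so that "OpenAI " and "openai" select the same provider.
        selection[category] = (env.get(var) or default).strip().lower()
    return selection


if __name__ == "__main__":
    print(read_classic_selection())
```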

Pipecat handles the runtime orchestration for us:

- STT turns incoming audio into transcripts
- the LLM receives conversation context and streams text back
- TTS turns that streamed text into audio

In other words, Pipecat stitches the pipeline together, but Elato still needs to provide:

- the provider selection UX
- the transport protocol for ESP32
- the environment-variable contract for API keys
- the recommended defaults

That is why this FastAPI backend has a simple provider catalog and validation layer in:

- `server/fastapi/models/providers.py`

This lets the app answer questions like:

- which LLMs do we support?
- which key does `deepgram` require?
- can the server start with the currently selected stack?

### Required API Keys By Provider

The current simple provider map is:

- `openai` LLM: `OPENAI_API_KEY`
- `claude` LLM: `ANTHROPIC_API_KEY`
- `gemini` LLM: `GEMINI_API_KEY`
- `grok` LLM: `XAI_API_KEY`
- `deepgram` STT: `DEEPGRAM_API_KEY`
- `whisper` STT: no external API key required
- `elevenlabs` TTS: `ELEVENLABS_API_KEY`
- `cartesia` TTS: `CARTESIA_API_KEY`
- `deepgram` TTS: `DEEPGRAM_API_KEY`
- `openai` TTS: `OPENAI_API_KEY`

At startup, the server now validates the selected `CLASSIC_*_PROVIDER` values and fails early if the required keys are missing.
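The fail-early check can be pictured roughly as follows. This is a sketch built only from the key list above, not the contents of `models/providers.py`; the names `REQUIRED_KEYS` and `missing_keys` are hypothetical.

```python
import os

# Required key per (category, provider), copied from the list above.
# None means no external API key is needed.
REQUIRED_KEYS = {
    ("llm", "openai"): "OPENAI_API_KEY",
    ("llm", "claude"): "ANTHROPIC_API_KEY",
    ("llm", "gemini"): "GEMINI_API_KEY",
    ("llm", "grok"): "XAI_API_KEY",
    ("stt", "deepgram"): "DEEPGRAM_API_KEY",
    ("stt", "whisper"): None,
    ("tts", "elevenlabs"): "ELEVENLABS_API_KEY",
    ("tts", "cartesia"): "CARTESIA_API_KEY",
    ("tts", "deepgram"): "DEEPGRAM_API_KEY",
    ("tts", "openai"): "OPENAI_API_KEY",
}


def missing_keys(selection, env=None):
    """Return the env var names the selected stack needs but the environment lacks."""
    env = os.environ if env is None else env
    missing = []
    for category, provider in selection.items():
        if (category, provider) not in REQUIRED_KEYS:
            raise ValueError(f"unsupported {category} provider: {provider!r}")
        key = REQUIRED_KEYS[(category, provider)]
        # Report each key once, even if two categories share it (e.g. Deepgram STT + TTS).
        if key and not env.get(key) and key not in missing:
            missing.append(key)
    return missing


if __name__ == "__main__":
    stack = {"stt": "deepgram", "llm": "openai", "tts": "elevenlabs"}
    problems = missing_keys(stack)
    if problems:
        raise SystemExit(f"Missing API keys: {', '.join(problems)}")
    print("Selected stack is ready to start.")
```

Keying the map on `(category, provider)` rather than on the provider name alone matters because `deepgram` and `openai` each appear in two categories.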

### Provider Modules

Each supported provider now has its own module file so the layout is easy to understand:

- `server/fastapi/models/llm/openai.py`
- `server/fastapi/models/llm/anthropic.py`
- `server/fastapi/models/llm/gemini.py`
- `server/fastapi/models/llm/grok.py`
- `server/fastapi/models/stt/deepgram.py`
- `server/fastapi/models/stt/whisper.py`
- `server/fastapi/models/tts/elevenlabs.py`
- `server/fastapi/models/tts/cartesia.py`
- `server/fastapi/models/tts/deepgram.py`
- `server/fastapi/models/tts/openai.py`

Under the hood, these modules delegate to Pipecat service implementations. We keep that wiring thin on purpose so users mostly think in terms of:

- `STT`
- `LLM`
- `TTS`

not internal service classes.

Examples:

### OpenAI + Deepgram + ElevenLabs

```env
CLASSIC_STT_PROVIDER=deepgram
CLASSIC_LLM_PROVIDER=openai
CLASSIC_TTS_PROVIDER=elevenlabs
```

### Whisper + Claude + Cartesia

```env
CLASSIC_STT_PROVIDER=whisper
CLASSIC_LLM_PROVIDER=claude
CLASSIC_TTS_PROVIDER=cartesia
```

### Deepgram + Gemini + OpenAI TTS

```env
CLASSIC_STT_PROVIDER=deepgram
CLASSIC_LLM_PROVIDER=gemini
CLASSIC_TTS_PROVIDER=openai
```

## Unified Experience Across Elato

A simple way to keep the product understandable is:

- keep the Next.js frontend focused on character creation and device management
- keep the ESP32 firmware focused on one transport protocol
- let users choose one backend runtime:
  - Deno
  - Cloudflare
  - FastAPI
- inside each backend, expose the same conceptual knobs:
  - `STT`
  - `LLM`
  - `TTS`

That means the hardware story stays stable:

- one firmware
- one websocket-style mental model
- three server deployment choices

The cleanest unification strategy is not “every backend supports every provider.”
It is:

- every backend should expose the same categories
- each backend should have one recommended default stack
- advanced users can swap providers later

## What This Looks Like In A UI

For Elato, the cleanest UI model is:

1. user picks a backend runtime:
   - `deno`
   - `cloudflare`
   - `fastapi`
2. user picks one option in each category:
   - `stt`
   - `llm`
   - `tts`
3. UI shows which API keys are required
4. backend validates the selection before starting a session

This FastAPI server now exposes a simple provider catalog at:

- `/providers`

So your Next.js frontend can eventually fetch the available providers and render a model picker without hardcoding everything in the UI.
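Before wiring this into a frontend, it can help to inspect the catalog from a terminal. The sketch below uses only the Python standard library and assumes the endpoint answers a plain `GET` with JSON on the default port from the example env file; the response shape is not documented here, so the script prints whatever comes back rather than assuming field names. The helper names are hypothetical.

```python
import json
from urllib.request import urlopen


def providers_url(base_url="http://localhost:7860"):
    """Join the server base URL with the /providers route."""
    return base_url.rstrip("/") + "/providers"


def fetch_providers(base_url="http://localhost:7860"):
    """GET the provider catalog and decode it, assuming a JSON response."""
    with urlopen(providers_url(base_url), timeout=5) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Requires the FastAPI server to be running locally.
    print(json.dumps(fetch_providers(), indent=2))
```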

## Recommended Defaults

If you want a simple opinionated experience for users, keep one default combo per backend.

Suggested defaults:

- `Deno`: OpenAI realtime
- `Cloudflare`: Workers AI STT/TTS + OpenAI LLM
- `FastAPI`: Deepgram + OpenAI + ElevenLabs

That gives users one obvious starting point without taking away flexibility.

## Important Files

If you want to change the FastAPI backend, start here:

- `server/fastapi/server.py`
- `server/fastapi/voice_pipeline.py`
- `server/fastapi/esp32_transport.py`
- `server/fastapi/models/llm/__init__.py`
- `server/fastapi/models/stt/__init__.py`
- `server/fastapi/models/tts/__init__.py`

## Current Notes

- The filesystem still contains many scaffolded provider modules from the earlier broader experiment.
- The active provider registry is now intentionally much smaller.
- That means the codebase stays extensible, but the user-facing default path stays simple.