ElatoAI: Realtime Voice AI Models on FastAPI

server/fastapi is the simplest self-hosted Elato backend for people who want a normal Python server instead of an edge runtime.

Use this if you want:

  • a FastAPI server you can run on your own machine or VM
  • a classic STT -> LLM -> TTS voice pipeline
  • a smaller provider surface that is easy to understand
  • the same ESP32 transport shape as the rest of Elato

If you are new to the project, read these first:

  • README.md (at the repository root)
  • server/README.md

The Simple Provider Set

To keep onboarding straightforward, the classic FastAPI route is centered around a small set of providers.

LLM

  • openai
  • claude
  • gemini
  • grok

STT

  • deepgram
  • whisper

TTS

  • elevenlabs
  • cartesia
  • deepgram
  • openai

The code still uses the models/llm, models/stt, and models/tts layout, but the active registry is intentionally trimmed so the default experience stays simple.

Default Setup

The default classic route is:

  • STT: deepgram
  • LLM: openai
  • TTS: elevenlabs

That gives people one obvious path to get running before they start swapping providers.

Project Layout

server/fastapi/
├── bot.py
├── voice_pipeline.py
├── esp32_transport.py
├── server.py
├── env.example
└── models/
    ├── llm/
    ├── stt/
    └── tts/

How The FastAPI Server Fits Into Elato

Elato has three backend options right now:

  • server/deno
  • server/cloudflare
  • server/fastapi

A clean way to think about them is:

  • Deno: edge-first, mature provider integrations
  • Cloudflare: Workers + Durable Objects + Workers AI
  • FastAPI: normal Python server, easy to self-host, easy to reason about

Quick Start

1. Create or activate your Python environment

Use whatever you prefer. If you already use uv, that is a good default.

2. Install dependencies

This repo declares its dependencies in pyproject.toml, so install from that file rather than looking for a requirements.txt.

With uv:

cd server/fastapi
uv sync

Or with plain pip in your venv:

cd server/fastapi
pip install -e .

3. Create your env file

Copy the example values from:

  • server/fastapi/env.example

Minimum example for the default route:

DEEPGRAM_API_KEY=your_deepgram_api_key
OPENAI_API_KEY=your_openai_api_key
ELEVENLABS_API_KEY=your_elevenlabs_api_key

CURRENT_VOICE_ROUTE=classic
CLASSIC_STT_PROVIDER=deepgram
CLASSIC_LLM_PROVIDER=openai
CLASSIC_TTS_PROVIDER=elevenlabs

ESP32_INPUT_SAMPLE_RATE=16000
BROWSER_INPUT_SAMPLE_RATE=16000
AUDIO_OUTPUT_SAMPLE_RATE=24000
PIPELINE_AUDIO_IN_SAMPLE_RATE=16000
PIPELINE_AUDIO_OUT_SAMPLE_RATE=24000

ALLOWED_ORIGINS=*
HOST=0.0.0.0
PORT=7860
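
As a rough illustration of how values like these could be read at startup, here is a minimal sketch using only the standard library. The `ServerSettings` class and `load_settings` function are hypothetical names, not taken from server.py, and treating ALLOWED_ORIGINS as a comma-separated list is an assumption.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping, Optional


@dataclass(frozen=True)
class ServerSettings:
    """Hypothetical container for a few of the env values shown above."""
    host: str
    port: int
    allowed_origins: List[str]
    esp32_input_sample_rate: int
    audio_output_sample_rate: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> ServerSettings:
    """Read settings from a mapping (os.environ by default).

    The fallback values mirror the example env file; whether the real
    server uses the same fallbacks is an assumption.
    """
    env = os.environ if env is None else env
    # Assumption: multiple origins are separated by commas.
    origins = [o.strip() for o in env.get("ALLOWED_ORIGINS", "*").split(",") if o.strip()]
    return ServerSettings(
        host=env.get("HOST", "0.0.0.0"),
        port=int(env.get("PORT", "7860")),
        allowed_origins=origins,
        esp32_input_sample_rate=int(env.get("ESP32_INPUT_SAMPLE_RATE", "16000")),
        audio_output_sample_rate=int(env.get("AUDIO_OUTPUT_SAMPLE_RATE", "24000")),
    )
```

Parsing the numeric values with `int()` up front means a typo such as `PORT=78six0` fails at startup rather than in the middle of a session.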

4. Run the server

If you use uv:

cd server/fastapi
uv run server.py

If you use your activated venv directly:

cd server/fastapi
python server.py

5. Point your ESP32 at the FastAPI backend

Update the firmware config so your hardware connects to this server instead of the Deno or Cloudflare backend.

The ESP32 route is:

/ws/esp32

For browser or Next.js testing, the server also exposes:

  • /ws/browser
  • /ws/nextjs
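
To make the three routes concrete, here is a small sketch that builds the WebSocket URL a client would connect to. The `websocket_url` helper is hypothetical and only illustrates the URL shape; the IP address in the usage comment is a placeholder.

```python
# Route paths as listed in this README.
ROUTES = {
    "esp32": "/ws/esp32",
    "browser": "/ws/browser",
    "nextjs": "/ws/nextjs",
}


def websocket_url(host: str, port: int, client: str = "esp32", secure: bool = False) -> str:
    """Build the WebSocket URL for one of the documented routes."""
    if client not in ROUTES:
        raise ValueError(f"unknown client type: {client!r}")
    scheme = "wss" if secure else "ws"
    return f"{scheme}://{host}:{port}{ROUTES[client]}"


# Example: websocket_url("192.168.1.20", 7860)
```

Note that HOST=0.0.0.0 is a bind address, not something a device can dial. The firmware needs an address it can actually reach, such as the LAN IP or hostname of the machine running the server.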

How Provider Selection Works

The classic route reads three env vars:

  • CLASSIC_STT_PROVIDER
  • CLASSIC_LLM_PROVIDER
  • CLASSIC_TTS_PROVIDER

So changing providers is just an env change.

Pipecat handles the runtime orchestration for us:

  • STT turns incoming audio into transcripts
  • the LLM receives conversation context and streams text back
  • TTS turns that streamed text into audio
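
The three stages above can be pictured as chained async streams. The sketch below is a plain-Python stand-in with fake stages, not Pipecat code and not the contents of voice_pipeline.py; every function name in it is hypothetical.

```python
import asyncio
from typing import AsyncIterator, List


async def microphone() -> AsyncIterator[bytes]:
    """Stand-in audio source: two chunks of silence."""
    for chunk in (b"\x00" * 320, b"\x00" * 320):
        yield chunk


async def fake_stt(audio: AsyncIterator[bytes]) -> AsyncIterator[str]:
    """Pretend to transcribe each audio chunk."""
    async for chunk in audio:
        yield f"heard-{len(chunk)}-bytes"


async def fake_llm(transcripts: AsyncIterator[str]) -> AsyncIterator[str]:
    """Pretend to stream a reply, one token at a time, per transcript."""
    async for text in transcripts:
        for token in ("reply", "to", text):
            yield token


async def fake_tts(tokens: AsyncIterator[str]) -> AsyncIterator[bytes]:
    """Pretend to synthesize audio for each streamed token."""
    async for token in tokens:
        yield token.encode("utf-8")


async def run_pipeline() -> List[bytes]:
    """Chain the stages: audio -> transcripts -> tokens -> audio."""
    out = []
    async for audio_out in fake_tts(fake_llm(fake_stt(microphone()))):
        out.append(audio_out)
    return out


if __name__ == "__main__":
    print(asyncio.run(run_pipeline()))
```

Because each stage consumes the previous one lazily, output audio can start flowing before the LLM has finished its reply, which is the property a streaming voice pipeline relies on.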

In other words, Pipecat stitches the pipeline together, but Elato still needs to provide:

  • the provider selection UX
  • the transport protocol for ESP32
  • the environment-variable contract for API keys
  • the recommended defaults

That is why this FastAPI backend now has a simple provider catalog and validation layer in:

  • server/fastapi/models/providers.py

This lets the app answer questions like:

  • which LLMs do we support?
  • which key does deepgram require?
  • can the server start with the currently selected stack?

Required API Keys By Provider

The current simple provider map is:

  • openai LLM: OPENAI_API_KEY
  • claude LLM: ANTHROPIC_API_KEY
  • gemini LLM: GEMINI_API_KEY
  • grok LLM: XAI_API_KEY
  • deepgram STT: DEEPGRAM_API_KEY
  • whisper STT: no external API key required
  • elevenlabs TTS: ELEVENLABS_API_KEY
  • cartesia TTS: CARTESIA_API_KEY
  • deepgram TTS: DEEPGRAM_API_KEY
  • openai TTS: OPENAI_API_KEY

At startup, the server now validates the selected CLASSIC_*_PROVIDER values and fails early if the required keys are missing.
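
A minimal sketch of that kind of fail-early check is shown below. The key names and provider names come from the list above, but `REQUIRED_KEYS` and `missing_keys` are hypothetical names, and the real logic in models/providers.py may be organized differently.

```python
import os
from typing import List, Mapping, Optional

# (category, provider) -> required env var, or None when no key is needed.
REQUIRED_KEYS = {
    ("llm", "openai"): "OPENAI_API_KEY",
    ("llm", "claude"): "ANTHROPIC_API_KEY",
    ("llm", "gemini"): "GEMINI_API_KEY",
    ("llm", "grok"): "XAI_API_KEY",
    ("stt", "deepgram"): "DEEPGRAM_API_KEY",
    ("stt", "whisper"): None,
    ("tts", "elevenlabs"): "ELEVENLABS_API_KEY",
    ("tts", "cartesia"): "CARTESIA_API_KEY",
    ("tts", "deepgram"): "DEEPGRAM_API_KEY",
    ("tts", "openai"): "OPENAI_API_KEY",
}


def missing_keys(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the env vars the selected stack needs but does not have."""
    env = os.environ if env is None else env
    selected = {
        "stt": env.get("CLASSIC_STT_PROVIDER", "deepgram"),
        "llm": env.get("CLASSIC_LLM_PROVIDER", "openai"),
        "tts": env.get("CLASSIC_TTS_PROVIDER", "elevenlabs"),
    }
    missing: List[str] = []
    for category, provider in selected.items():
        if (category, provider) not in REQUIRED_KEYS:
            raise ValueError(f"unsupported {category} provider: {provider!r}")
        key = REQUIRED_KEYS[(category, provider)]
        # Skip providers without a key, and avoid listing a shared key twice.
        if key and not env.get(key) and key not in missing:
            missing.append(key)
    return missing
```

A startup hook could call `missing_keys()` and exit with a clear message if the returned list is not empty, so a misconfigured stack never reaches the first WebSocket connection.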

Provider Modules

Each supported provider now has its own module file so the layout is easy to understand:

  • server/fastapi/models/llm/openai.py
  • server/fastapi/models/llm/anthropic.py
  • server/fastapi/models/llm/gemini.py
  • server/fastapi/models/llm/grok.py
  • server/fastapi/models/stt/deepgram.py
  • server/fastapi/models/stt/whisper.py
  • server/fastapi/models/tts/elevenlabs.py
  • server/fastapi/models/tts/cartesia.py
  • server/fastapi/models/tts/deepgram.py
  • server/fastapi/models/tts/openai.py

Under the hood, these modules delegate to Pipecat service implementations. We keep that wiring thin on purpose so users mostly think in terms of:

  • STT
  • LLM
  • TTS

not internal service classes.
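
One way to picture this thin wiring is a name-to-builder registry, sketched below. This is an illustration only: `register_llm`, `create_llm`, and the builder functions are hypothetical, and the builders return plain dicts where a real module would construct a Pipecat service.

```python
from typing import Callable, Dict

# Maps the user-facing provider name to a function that builds the service.
LLM_BUILDERS: Dict[str, Callable[[str], dict]] = {}


def register_llm(name: str):
    """Decorator that adds a builder to the registry under `name`."""
    def decorator(fn: Callable[[str], dict]) -> Callable[[str], dict]:
        LLM_BUILDERS[name] = fn
        return fn
    return decorator


@register_llm("openai")
def build_openai(api_key: str) -> dict:
    # Placeholder: a real module would return a Pipecat service instance here.
    return {"provider": "openai", "api_key": api_key}


@register_llm("claude")
def build_claude(api_key: str) -> dict:
    # The user-facing name is "claude" even though the module file is anthropic.py.
    return {"provider": "claude", "api_key": api_key}


def create_llm(name: str, api_key: str) -> dict:
    """Look up the builder for `name` and call it."""
    try:
        builder = LLM_BUILDERS[name]
    except KeyError:
        raise ValueError(f"unsupported LLM provider: {name!r}") from None
    return builder(api_key)
```

With this shape, callers only ever pass a provider name such as "claude", and the choice of underlying service class stays inside the provider module.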

Examples:

OpenAI + Deepgram + ElevenLabs

CLASSIC_STT_PROVIDER=deepgram
CLASSIC_LLM_PROVIDER=openai
CLASSIC_TTS_PROVIDER=elevenlabs

Whisper + Claude + Cartesia

CLASSIC_STT_PROVIDER=whisper
CLASSIC_LLM_PROVIDER=claude
CLASSIC_TTS_PROVIDER=cartesia

Deepgram + Gemini + OpenAI TTS

CLASSIC_STT_PROVIDER=deepgram
CLASSIC_LLM_PROVIDER=gemini
CLASSIC_TTS_PROVIDER=openai

Unified Experience Across Elato

A simple way to keep the product understandable is:

  • keep the Next.js frontend focused on character creation and device management
  • keep the ESP32 firmware focused on one transport protocol
  • let users choose one backend runtime:
    • Deno
    • Cloudflare
    • FastAPI
  • inside each backend, expose the same conceptual knobs:
    • STT
    • LLM
    • TTS

That means the hardware story stays stable:

  • one firmware
  • one websocket-style mental model
  • three server deployment choices

The cleanest unification strategy is not “every backend supports every provider.” It is:

  • every backend should expose the same categories
  • each backend should have one recommended default stack
  • advanced users can swap providers later

What This Looks Like In A UI

For Elato, the cleanest UI model is:

  1. user picks a backend runtime:
    • deno
    • cloudflare
    • fastapi
  2. user picks one option in each category:
    • stt
    • llm
    • tts
  3. UI shows which API keys are required
  4. backend validates the selection before starting a session

This FastAPI server now exposes a simple provider catalog at:

  • /providers

So your Next.js frontend can eventually fetch the available providers and render a model picker without hardcoding everything in the UI.
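
The exact response shape of /providers is not documented here, so the sketch below assumes one plausible layout (a default plus a list of options per category) and shows how a picker could order its choices from it. `providers_payload` and `picker_options` are hypothetical names.

```python
import json
from typing import Dict, List

# Provider names and defaults as listed earlier in this README.
CATALOG: Dict[str, List[str]] = {
    "stt": ["deepgram", "whisper"],
    "llm": ["openai", "claude", "gemini", "grok"],
    "tts": ["elevenlabs", "cartesia", "deepgram", "openai"],
}
DEFAULTS: Dict[str, str] = {"stt": "deepgram", "llm": "openai", "tts": "elevenlabs"}


def providers_payload() -> dict:
    """Assumed JSON-serializable shape for a provider catalog response."""
    return {
        category: {"default": DEFAULTS[category], "options": list(options)}
        for category, options in CATALOG.items()
    }


def picker_options(payload: dict, category: str) -> List[str]:
    """Return the options for one category with the default listed first."""
    entry = payload[category]
    default = entry["default"]
    return [default] + [o for o in entry["options"] if o != default]


if __name__ == "__main__":
    print(json.dumps(providers_payload(), indent=2))
```

Keeping the default inside the payload lets the frontend preselect the recommended stack without duplicating that knowledge in UI code.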

If you want a simple opinionated experience for users, keep one default combo per backend.

Suggested defaults:

  • Deno: OpenAI realtime
  • Cloudflare: Workers AI STT/TTS + OpenAI LLM
  • FastAPI: Deepgram + OpenAI + ElevenLabs

That gives users one obvious starting point without taking away flexibility.

Important Files

If you want to change the FastAPI backend, start here:

  • server/fastapi/server.py
  • server/fastapi/voice_pipeline.py
  • server/fastapi/esp32_transport.py
  • server/fastapi/models/llm/__init__.py
  • server/fastapi/models/stt/__init__.py
  • server/fastapi/models/tts/__init__.py

Current Notes

  • The filesystem still contains many scaffolded provider modules from the earlier broader experiment.
  • The active provider registry is now intentionally much smaller.
  • That means the codebase stays extensible, but the user-facing default path stays simple.