diff --git a/server-fastapi/README.md b/server-fastapi/README.md
deleted file mode 100644
index 194c2b3..0000000
--- a/server-fastapi/README.md
+++ /dev/null
@@ -1,256 +0,0 @@
-## ElatoAI: Realtime Voice AI Models on FastAPI
-
-`server-fastapi` is the simplest self-hosted Elato backend for people who want a normal Python server instead of an edge runtime.
-
-Use this if you want:
-
-- a FastAPI server you can run on your own machine or VM
-- a classic `STT -> LLM -> TTS` voice pipeline
-- a smaller provider surface that is easy to understand
-- the same ESP32 transport shape as the rest of Elato
-
-If you are new to the project, read these first:
-
-- `/Users/akashdeepdeb/Desktop/Projects/ElatoAI/README.md`
-- `/Users/akashdeepdeb/Desktop/Projects/ElatoAI/server/README.md`
-
-## The Simple Provider Set
-
-To keep onboarding straightforward, the classic FastAPI route is centered around a small set of providers.
-
-### LLM
-
-- `openai`
-- `claude`
-- `gemini`
-- `grok`
-
-### STT
-
-- `deepgram`
-- `whisper`
-
-### TTS
-
-- `elevenlabs`
-- `cartesia`
-- `deepgram`
-- `openai`
-
-The code still uses the `models/llm`, `models/stt`, and `models/tts` layout, but the active registry is intentionally trimmed so the default experience stays simple.
-
-## Default Setup
-
-The default classic route is:
-
-- STT: `deepgram`
-- LLM: `openai`
-- TTS: `elevenlabs`
-
-That gives people one obvious path to get running before they start swapping providers.
-
-## Project Layout
-
-```text
-server-fastapi/
-├── bot.py
-├── classic_route.py
-├── esp32_transport.py
-├── server.py
-├── env.example
-└── models/
-    ├── llm/
-    ├── stt/
-    └── tts/
-```
-
-## How The FastAPI Server Fits Into Elato
-
-Elato has three backend options right now:
-
-- `/Users/akashdeepdeb/Desktop/Projects/ElatoAI/server/deno`
-- `/Users/akashdeepdeb/Desktop/Projects/ElatoAI/server/cloudflare`
-- `/Users/akashdeepdeb/Desktop/Projects/ElatoAI/server-fastapi`
-
-A clean way to think about them is:
-
-- `Deno`: edge-first, mature provider integrations
-- `Cloudflare`: Workers + Durable Objects + Workers AI
-- `FastAPI`: normal Python server, easy to self-host, easy to reason about
-
-## Quick Start
-
-### 1. Create or activate your Python environment
-
-Use whatever you prefer. If you already use `uv`, that is a good default.
-
-### 2. Install dependencies
-
-This repo uses `pyproject.toml`, so install from that environment rather than a `requirements.txt` file.
-
-With `uv`:
-
-```bash
-cd /Users/akashdeepdeb/Desktop/Projects/ElatoAI/server-fastapi
-uv sync
-```
-
-Or with plain pip in your venv:
-
-```bash
-cd /Users/akashdeepdeb/Desktop/Projects/ElatoAI/server-fastapi
-pip install -e .
-```
-
-### 3. Create your env file
-
-Copy the example values from:
-
-- `/Users/akashdeepdeb/Desktop/Projects/ElatoAI/server-fastapi/env.example`
-
-Minimum example for the default route:
-
-```env
-DEEPGRAM_API_KEY=your_deepgram_api_key
-OPENAI_API_KEY=your_openai_api_key
-ELEVENLABS_API_KEY=your_elevenlabs_api_key
-
-CURRENT_VOICE_ROUTE=classic
-CLASSIC_STT_PROVIDER=deepgram
-CLASSIC_LLM_PROVIDER=openai
-CLASSIC_TTS_PROVIDER=elevenlabs
-
-ESP32_INPUT_SAMPLE_RATE=16000
-BROWSER_INPUT_SAMPLE_RATE=16000
-AUDIO_OUTPUT_SAMPLE_RATE=24000
-PIPELINE_AUDIO_IN_SAMPLE_RATE=16000
-PIPELINE_AUDIO_OUT_SAMPLE_RATE=24000
-
-ALLOWED_ORIGINS=*
-HOST=0.0.0.0
-PORT=7860
-```
-
-### 4. Run the server
-
-If you use `uv`:
-
-```bash
-cd /Users/akashdeepdeb/Desktop/Projects/ElatoAI/server-fastapi
-uv run server.py
-```
-
-If you use your activated venv directly:
-
-```bash
-cd /Users/akashdeepdeb/Desktop/Projects/ElatoAI/server-fastapi
-python server.py
-```
-
-### 5. Point your ESP32 at the FastAPI backend
-
-Update the firmware config so your hardware connects to this server instead of the Deno or Cloudflare backend.
-
-The ESP32 route is:
-
-```text
-/ws/esp32
-```
-
-For browser or Next.js testing, the server also exposes:
-
-- `/ws/browser`
-- `/ws/nextjs`
-
-## How Provider Selection Works
-
-The classic route reads three env vars:
-
-- `CLASSIC_STT_PROVIDER`
-- `CLASSIC_LLM_PROVIDER`
-- `CLASSIC_TTS_PROVIDER`
-
-So changing providers is just an env change.
-
-Examples:
-
-### OpenAI + Deepgram + ElevenLabs
-
-```env
-CLASSIC_STT_PROVIDER=deepgram
-CLASSIC_LLM_PROVIDER=openai
-CLASSIC_TTS_PROVIDER=elevenlabs
-```
-
-### Whisper + Claude + Cartesia
-
-```env
-CLASSIC_STT_PROVIDER=whisper
-CLASSIC_LLM_PROVIDER=claude
-CLASSIC_TTS_PROVIDER=cartesia
-```
-
-### Deepgram + Gemini + OpenAI TTS
-
-```env
-CLASSIC_STT_PROVIDER=deepgram
-CLASSIC_LLM_PROVIDER=gemini
-CLASSIC_TTS_PROVIDER=openai
-```
-
-## Unified Experience Across Elato
-
-A simple way to keep the product understandable is:
-
-- keep the Next.js frontend focused on character creation and device management
-- keep the ESP32 firmware focused on one transport protocol
-- let users choose one backend runtime:
-  - Deno
-  - Cloudflare
-  - FastAPI
-- inside each backend, expose the same conceptual knobs:
-  - `STT`
-  - `LLM`
-  - `TTS`
-
-That means the hardware story stays stable:
-
-- one firmware
-- one websocket-style mental model
-- three server deployment choices
-
-The cleanest unification strategy is not “every backend supports every provider.”
-It is:
-
-- every backend should expose the same categories
-- each backend should have one recommended default stack
-- advanced users can swap providers later
-
-## Recommended Defaults
-
-If you want a simple opinionated experience for users, keep one default combo per backend.
-
-Suggested defaults:
-
-- `Deno`: OpenAI realtime
-- `Cloudflare`: Workers AI STT/TTS + OpenAI LLM
-- `FastAPI`: Deepgram + OpenAI + ElevenLabs
-
-That gives users one obvious starting point without taking away flexibility.
-
-## Important Files
-
-If you want to change the FastAPI backend, start here:
-
-- `/Users/akashdeepdeb/Desktop/Projects/ElatoAI/server-fastapi/server.py`
-- `/Users/akashdeepdeb/Desktop/Projects/ElatoAI/server-fastapi/classic_route.py`
-- `/Users/akashdeepdeb/Desktop/Projects/ElatoAI/server-fastapi/esp32_transport.py`
-- `/Users/akashdeepdeb/Desktop/Projects/ElatoAI/server-fastapi/models/llm/__init__.py`
-- `/Users/akashdeepdeb/Desktop/Projects/ElatoAI/server-fastapi/models/stt/__init__.py`
-- `/Users/akashdeepdeb/Desktop/Projects/ElatoAI/server-fastapi/models/tts/__init__.py`
-
-## Current Notes
-
-- The filesystem still contains many scaffolded provider modules from the earlier broader experiment.
-- The active provider registry is now intentionally much smaller.
-- That means the codebase stays extensible, but the user-facing default path stays simple.
diff --git a/server-fastapi/__pycache__/classic_route.cpython-313.pyc b/server-fastapi/__pycache__/classic_route.cpython-313.pyc
deleted file mode 100644
index efd08aa..0000000
Binary files a/server-fastapi/__pycache__/classic_route.cpython-313.pyc and /dev/null differ
diff --git a/server-fastapi/classic_route.py b/server-fastapi/classic_route.py
deleted file mode 100644
index b8355a4..0000000
--- a/server-fastapi/classic_route.py
+++ /dev/null
@@ -1,55 +0,0 @@
-"""Classic STT -> LLM -> TTS pipeline builder."""
-
-from __future__ import annotations
-
-import os
-
-from character_prompt import LANGUAGE_LEARNING_PAL_PROMPT
-from loguru import logger
-from models.llm import create_llm_service
-from models.stt import create_stt_service
-from models.tts import create_tts_service
-from pipecat.audio.vad.silero import SileroVADAnalyzer
-from pipecat.audio.vad.vad_analyzer import VADParams
-from pipecat.processors.aggregators.llm_context import LLMContext
-from pipecat.processors.aggregators.llm_response_universal import (
-    LLMContextAggregatorPair,
-    LLMUserAggregatorParams,
-)
-
-
-def build_classic_route(input_processor, context: LLMContext):
-    stt_provider = os.getenv("CLASSIC_STT_PROVIDER", "deepgram")
-    llm_provider = os.getenv("CLASSIC_LLM_PROVIDER", "openai")
-    tts_provider = os.getenv("CLASSIC_TTS_PROVIDER", "elevenlabs")
-
-    logger.info(
-        "Building classic route with stt={} llm={} tts={}",
-        stt_provider,
-        llm_provider,
-        tts_provider,
-    )
-
-    stt = create_stt_service(stt_provider)
-    llm = create_llm_service(
-        llm_provider,
-        system_instruction=LANGUAGE_LEARNING_PAL_PROMPT,
-    )
-    tts = create_tts_service(tts_provider)
-
-    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
-        context,
-        user_params=LLMUserAggregatorParams(
-            vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=1))
-        ),
-    )
-
-    processors = [
-        input_processor,
-        stt,
-        user_aggregator,
-        llm,
-        tts,
-    ]
-
-    return processors, assistant_aggregator
diff --git a/server-fastapi/env.example b/server-fastapi/env.example
deleted file mode 100644
index 8f22444..0000000
--- a/server-fastapi/env.example
+++ /dev/null
@@ -1,27 +0,0 @@
-DEEPGRAM_API_KEY=your_deepgram_api_key
-OPENAI_API_KEY=your_openai_api_key
-ANTHROPIC_API_KEY=your_anthropic_api_key
-GEMINI_API_KEY=your_gemini_api_key
-XAI_API_KEY=your_xai_api_key
-ELEVENLABS_API_KEY=your_elevenlabs_api_key
-CARTESIA_API_KEY=your_cartesia_api_key
-
-# Classic route providers
-CURRENT_VOICE_ROUTE=classic
-CLASSIC_STT_PROVIDER=deepgram
-CLASSIC_LLM_PROVIDER=openai
-CLASSIC_TTS_PROVIDER=elevenlabs
-
-# Transport and pipeline sample rates
-ESP32_INPUT_SAMPLE_RATE=16000
-BROWSER_INPUT_SAMPLE_RATE=16000
-AUDIO_OUTPUT_SAMPLE_RATE=24000
-PIPELINE_AUDIO_IN_SAMPLE_RATE=16000
-PIPELINE_AUDIO_OUT_SAMPLE_RATE=24000
-
-# Browser / Next.js access
-ALLOWED_ORIGINS=*
-
-# WebSocket server settings
-HOST=0.0.0.0
-PORT=7860
diff --git a/server-fastapi/models/llm/__init__.py b/server-fastapi/models/llm/__init__.py
deleted file mode 100644
index fe825f2..0000000
--- a/server-fastapi/models/llm/__init__.py
+++ /dev/null
@@ -1,20 +0,0 @@
-"""LLM provider registry."""
-
-from __future__ import annotations
-
-from models._provider_loader import load_provider_factory
-
-LLM_REGISTRY = {
-    "claude": "models.llm.anthropic",
-    "anthropic": "models.llm.anthropic",
-    "gemini": "models.llm.google_gemini",
-    "google_gemini": "models.llm.google_gemini",
-    "google_vertex_ai": "models.llm.google_vertex_ai",
-    "grok": "models.llm.grok",
-    "openai": "models.llm.openai",
-}
-
-
-def create_llm_service(provider_name: str, **kwargs):
-    factory = load_provider_factory(LLM_REGISTRY, provider_name, "LLM")
-    return factory(**kwargs)
diff --git a/server-fastapi/models/llm/__pycache__/__init__.cpython-313.pyc b/server-fastapi/models/llm/__pycache__/__init__.cpython-313.pyc
deleted file mode 100644
index 26d562f..0000000
Binary files a/server-fastapi/models/llm/__pycache__/__init__.cpython-313.pyc and /dev/null differ
diff --git a/server-fastapi/models/stt/__init__.py b/server-fastapi/models/stt/__init__.py
deleted file mode 100644
index 26cb9d7..0000000
--- a/server-fastapi/models/stt/__init__.py
+++ /dev/null
@@ -1,16 +0,0 @@
-"""STT provider registry."""
-
-from __future__ import annotations
-
-from models._provider_loader import load_provider_factory
-
-STT_REGISTRY = {
-    "deepgram": "models.stt.deepgram",
-    "openai": "models.stt.openai",
-    "whisper": "models.stt.whisper",
-}
-
-
-def create_stt_service(provider_name: str, **kwargs):
-    factory = load_provider_factory(STT_REGISTRY, provider_name, "STT")
-    return factory(**kwargs)
diff --git a/server-fastapi/models/stt/__pycache__/__init__.cpython-313.pyc b/server-fastapi/models/stt/__pycache__/__init__.cpython-313.pyc
deleted file mode 100644
index 4aaac67..0000000
Binary files a/server-fastapi/models/stt/__pycache__/__init__.cpython-313.pyc and /dev/null differ
diff --git a/server-fastapi/models/tts/__init__.py b/server-fastapi/models/tts/__init__.py
deleted file mode 100644
index efc7468..0000000
--- a/server-fastapi/models/tts/__init__.py
+++ /dev/null
@@ -1,17 +0,0 @@
-"""TTS provider registry."""
-
-from __future__ import annotations
-
-from models._provider_loader import load_provider_factory
-
-TTS_REGISTRY = {
-    "cartesia": "models.tts.cartesia",
-    "deepgram": "models.tts.deepgram",
-    "elevenlabs": "models.tts.elevenlabs",
-    "openai": "models.tts.openai",
-}
-
-
-def create_tts_service(provider_name: str, **kwargs):
-    factory = load_provider_factory(TTS_REGISTRY, provider_name, "TTS")
-    return factory(**kwargs)
diff --git a/server-fastapi/models/tts/__pycache__/__init__.cpython-313.pyc b/server-fastapi/models/tts/__pycache__/__init__.cpython-313.pyc
deleted file mode 100644
index eaf3005..0000000
Binary files a/server-fastapi/models/tts/__pycache__/__init__.cpython-313.pyc and /dev/null differ
diff --git a/server/fastapi/README.md b/server/fastapi/README.md
index d11ee4d..e443b67 100644
--- a/server/fastapi/README.md
+++ b/server/fastapi/README.md
@@ -54,7 +54,7 @@ That gives people one obvious path to get running before they start swapping pro
 ```text
 server/fastapi/
 ├── bot.py
-├── classic_route.py
+├── voice_pipeline.py
 ├── esp32_transport.py
 ├── server.py
 ├── env.example
@@ -327,7 +327,7 @@ That gives users one obvious starting point without taking away flexibility.
 If you want to change the FastAPI backend, start here:
 
 - `/Users/akashdeepdeb/Desktop/Projects/ElatoAI/server/fastapi/server.py`
-- `/Users/akashdeepdeb/Desktop/Projects/ElatoAI/server/fastapi/classic_route.py`
+- `/Users/akashdeepdeb/Desktop/Projects/ElatoAI/server/fastapi/voice_pipeline.py`
 - `/Users/akashdeepdeb/Desktop/Projects/ElatoAI/server/fastapi/esp32_transport.py`
 - `/Users/akashdeepdeb/Desktop/Projects/ElatoAI/server/fastapi/models/llm/__init__.py`
 - `/Users/akashdeepdeb/Desktop/Projects/ElatoAI/server/fastapi/models/stt/__init__.py`
diff --git a/server/fastapi/bot.py b/server/fastapi/bot.py
index b6c4207..e766c3a 100644
--- a/server/fastapi/bot.py
+++ b/server/fastapi/bot.py
@@ -9,7 +9,7 @@
 import os
 from typing import Literal
 
-from classic_route import build_classic_route
+from voice_pipeline import build_voice_pipeline
 from dotenv import load_dotenv
 from gem_live_route import build_gem_live_route
 from grok_route import build_grok_route
@@ -155,7 +155,7 @@ async def run_bot_session(
     elif voice_route == "grok":
         route_processors, assistant_aggregator = build_grok_route(input_processor, context)
     else:
-        route_processors, assistant_aggregator = build_classic_route(input_processor, context)
+        route_processors, assistant_aggregator = build_voice_pipeline(input_processor, context)
 
     processors = [transport.input(), *route_processors]
 
diff --git a/server/fastapi/classic_route.py b/server/fastapi/voice_pipeline.py
similarity index 92%
rename from server/fastapi/classic_route.py
rename to server/fastapi/voice_pipeline.py
index b8355a4..8fa61de 100644
--- a/server/fastapi/classic_route.py
+++ b/server/fastapi/voice_pipeline.py
@@ -1,4 +1,4 @@
-"""Classic STT -> LLM -> TTS pipeline builder."""
+"""Default STT -> LLM -> TTS voice pipeline builder."""
 
 from __future__ import annotations
 
@@ -18,7 +18,7 @@ from pipecat.processors.aggregators.llm_response_universal import (
 )
 
 
-def build_classic_route(input_processor, context: LLMContext):
+def build_voice_pipeline(input_processor, context: LLMContext):
     stt_provider = os.getenv("CLASSIC_STT_PROVIDER", "deepgram")
     llm_provider = os.getenv("CLASSIC_LLM_PROVIDER", "openai")
     tts_provider = os.getenv("CLASSIC_TTS_PROVIDER", "elevenlabs")
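
Reviewer note on the registries in this diff: `create_llm_service`, `create_stt_service`, and `create_tts_service` all delegate to `load_provider_factory` from `models/_provider_loader.py`, which the diff does not show. The sketch below is one way such a lazy loader could work, not the repository's implementation. It assumes each provider module exposes a module-level factory callable; the attribute name `create_service`, the error wording, the `factory_attr` parameter, and the stand-in `DEMO_REGISTRY` are all hypothetical.

```python
"""Hypothetical sketch of a lazy provider loader; not the repository's code."""

from __future__ import annotations

import importlib
from typing import Any, Callable


def load_provider_factory(
    registry: dict[str, str],
    provider_name: str,
    kind: str,
    factory_attr: str = "create_service",  # assumed name, not from the diff
) -> Callable[..., Any]:
    """Resolve a provider name to a factory, importing its module on demand."""
    key = provider_name.strip().lower()
    if key not in registry:
        known = ", ".join(sorted(registry))
        raise ValueError(
            f"Unknown {kind} provider {provider_name!r}; expected one of: {known}"
        )
    # Import only the selected provider's module, so SDKs for unused
    # providers are never imported.
    module = importlib.import_module(registry[key])
    return getattr(module, factory_attr)


# Stand-in registry for demonstration only: it maps a name to a stdlib
# module so the sketch runs without any provider SDK installed.
DEMO_REGISTRY = {"json": "json"}

if __name__ == "__main__":
    dumps = load_provider_factory(DEMO_REGISTRY, "JSON", "demo", factory_attr="dumps")
    print(dumps({"provider": "openai"}))
```

If the real loader has roughly this shape, an unknown value in `CLASSIC_STT_PROVIDER`, `CLASSIC_LLM_PROVIDER`, or `CLASSIC_TTS_PROVIDER` would fail at pipeline build time with a message listing the registered names, rather than later inside a provider SDK.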