remove stale files
parent b7fd3a865b
commit 72a5c60ec1
13 changed files with 6 additions and 397 deletions
@@ -1,256 +0,0 @@
## ElatoAI: Realtime Voice AI Models on FastAPI

`server-fastapi` is the simplest self-hosted Elato backend for people who want a normal Python server instead of an edge runtime.

Use this if you want:

- a FastAPI server you can run on your own machine or VM
- a classic `STT -> LLM -> TTS` voice pipeline
- a smaller provider surface that is easy to understand
- the same ESP32 transport shape as the rest of Elato

If you are new to the project, read these first:

- `/Users/akashdeepdeb/Desktop/Projects/ElatoAI/README.md`
- `/Users/akashdeepdeb/Desktop/Projects/ElatoAI/server/README.md`

## The Simple Provider Set

To keep onboarding straightforward, the classic FastAPI route is centered around a small set of providers.

### LLM

- `openai`
- `claude`
- `gemini`
- `grok`

### STT

- `deepgram`
- `whisper`

### TTS

- `elevenlabs`
- `cartesia`
- `deepgram`
- `openai`

The code still uses the `models/llm`, `models/stt`, and `models/tts` layout, but the active registry is intentionally trimmed so the default experience stays simple.
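
The trimmed-registry idea can be sketched as a mapping from provider name to a module path that is imported only when that provider is selected. This is a minimal illustration, not the project's actual loader: `DEMO_REGISTRY`, `load_factory`, and the `factory_attr` parameter are hypothetical names, and standard-library modules stand in for real provider modules so the snippet runs on its own.

```python
import importlib

# Hypothetical registry: provider name -> module path.
# Standard-library modules stand in for real provider modules here.
DEMO_REGISTRY = {
    "json": "json",
    "pickle": "pickle",
}


def load_factory(registry: dict, name: str, kind: str, factory_attr: str = "dumps"):
    """Import the module registered under `name` and return its factory.

    Only names present in the registry are accepted, so modules that still
    exist on disk but are not registered stay unreachable.
    """
    key = name.strip().lower()
    if key not in registry:
        supported = ", ".join(sorted(registry))
        raise ValueError(f"Unknown {kind} provider {name!r}. Supported: {supported}")
    module = importlib.import_module(registry[key])  # imported lazily, on demand
    return getattr(module, factory_attr)


if __name__ == "__main__":
    dumps = load_factory(DEMO_REGISTRY, "json", "demo")
    print(dumps({"provider": "json"}))
```

With this shape, trimming the user-facing provider list means deleting registry entries; the unregistered modules can stay in the tree without being importable by name.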

## Default Setup

The default classic route is:

- STT: `deepgram`
- LLM: `openai`
- TTS: `elevenlabs`

That gives people one obvious path to get running before they start swapping providers.

## Project Layout

```text
server-fastapi/
├── bot.py
├── classic_route.py
├── esp32_transport.py
├── server.py
├── env.example
└── models/
    ├── llm/
    ├── stt/
    └── tts/
```

## How The FastAPI Server Fits Into Elato

Elato has three backend options right now:

- `/Users/akashdeepdeb/Desktop/Projects/ElatoAI/server/deno`
- `/Users/akashdeepdeb/Desktop/Projects/ElatoAI/server/cloudflare`
- `/Users/akashdeepdeb/Desktop/Projects/ElatoAI/server-fastapi`

A clean way to think about them is:

- `Deno`: edge-first, mature provider integrations
- `Cloudflare`: Workers + Durable Objects + Workers AI
- `FastAPI`: normal Python server, easy to self-host, easy to reason about

## Quick Start

### 1. Create or activate your Python environment

Use whatever you prefer. If you already use `uv`, that is a good default.

### 2. Install dependencies

This repo uses `pyproject.toml`, so install from that environment rather than a `requirements.txt` file.

With `uv`:

```bash
cd /Users/akashdeepdeb/Desktop/Projects/ElatoAI/server-fastapi
uv sync
```

Or with plain pip in your venv:

```bash
cd /Users/akashdeepdeb/Desktop/Projects/ElatoAI/server-fastapi
pip install -e .
```

### 3. Create your env file

Copy the example values from:

- `/Users/akashdeepdeb/Desktop/Projects/ElatoAI/server-fastapi/env.example`

Minimum example for the default route:

```env
DEEPGRAM_API_KEY=your_deepgram_api_key
OPENAI_API_KEY=your_openai_api_key
ELEVENLABS_API_KEY=your_elevenlabs_api_key

CURRENT_VOICE_ROUTE=classic
CLASSIC_STT_PROVIDER=deepgram
CLASSIC_LLM_PROVIDER=openai
CLASSIC_TTS_PROVIDER=elevenlabs

ESP32_INPUT_SAMPLE_RATE=16000
BROWSER_INPUT_SAMPLE_RATE=16000
AUDIO_OUTPUT_SAMPLE_RATE=24000
PIPELINE_AUDIO_IN_SAMPLE_RATE=16000
PIPELINE_AUDIO_OUT_SAMPLE_RATE=24000

ALLOWED_ORIGINS=*
HOST=0.0.0.0
PORT=7860
```
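
Environment values always arrive as strings, so numeric settings such as `PORT` and the sample rates need converting before use. The sketch below shows one way to do that with the defaults from the example above; `ServerSettings`, `load_settings`, and `_int_setting` are hypothetical names and are not taken from the server code.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ServerSettings:
    host: str
    port: int
    esp32_input_sample_rate: int
    audio_output_sample_rate: int


def _int_setting(env: Mapping[str, str], name: str, default: int) -> int:
    """Read a positive integer from `env`, falling back to `default` if unset."""
    raw = env.get(name, "").strip()
    if not raw:
        return default
    try:
        value = int(raw)
    except ValueError:
        raise ValueError(f"{name} must be an integer, got {raw!r}") from None
    if value <= 0:
        raise ValueError(f"{name} must be positive, got {value}")
    return value


def load_settings(env: Mapping[str, str] = os.environ) -> ServerSettings:
    """Build typed settings; defaults mirror the example env file above."""
    return ServerSettings(
        host=env.get("HOST", "0.0.0.0"),
        port=_int_setting(env, "PORT", 7860),
        esp32_input_sample_rate=_int_setting(env, "ESP32_INPUT_SAMPLE_RATE", 16000),
        audio_output_sample_rate=_int_setting(env, "AUDIO_OUTPUT_SAMPLE_RATE", 24000),
    )


if __name__ == "__main__":
    print(load_settings({"PORT": "8000"}))
```

Failing early on a malformed value such as `PORT=abc` tends to be easier to debug than an error raised later, once the server is already starting up.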

### 4. Run the server

If you use `uv`:

```bash
cd /Users/akashdeepdeb/Desktop/Projects/ElatoAI/server-fastapi
uv run server.py
```

If you use your activated venv directly:

```bash
cd /Users/akashdeepdeb/Desktop/Projects/ElatoAI/server-fastapi
python server.py
```

### 5. Point your ESP32 at the FastAPI backend

Update the firmware config so your hardware connects to this server instead of the Deno or Cloudflare backend.

The ESP32 route is:

```text
/ws/esp32
```

For browser or Next.js testing, the server also exposes:

- `/ws/browser`
- `/ws/nextjs`
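
As a small illustration of how the three routes map to client URLs, the helper below builds a WebSocket URL from a host, a port, and a client type. The names `WS_ROUTES` and `websocket_url` are hypothetical, and the IP address is a placeholder. Note that `HOST=0.0.0.0` is a bind address for the server; a device would normally connect to the machine's LAN IP or hostname instead.

```python
# Route paths as listed above; the dict and function names are illustrative.
WS_ROUTES = {
    "esp32": "/ws/esp32",
    "browser": "/ws/browser",
    "nextjs": "/ws/nextjs",
}


def websocket_url(host: str, port: int, client: str, secure: bool = False) -> str:
    """Return the WebSocket URL a given client type would connect to."""
    if client not in WS_ROUTES:
        raise ValueError(f"Unknown client {client!r}. Expected one of {sorted(WS_ROUTES)}")
    scheme = "wss" if secure else "ws"
    return f"{scheme}://{host}:{port}{WS_ROUTES[client]}"


if __name__ == "__main__":
    # 192.168.1.50 is a placeholder for the server's address on your network.
    print(websocket_url("192.168.1.50", 7860, "esp32"))
```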

## How Provider Selection Works

The classic route reads three env vars:

- `CLASSIC_STT_PROVIDER`
- `CLASSIC_LLM_PROVIDER`
- `CLASSIC_TTS_PROVIDER`

So changing providers is just an env change.

Examples:

### OpenAI + Deepgram + ElevenLabs

```env
CLASSIC_STT_PROVIDER=deepgram
CLASSIC_LLM_PROVIDER=openai
CLASSIC_TTS_PROVIDER=elevenlabs
```

### Whisper + Claude + Cartesia

```env
CLASSIC_STT_PROVIDER=whisper
CLASSIC_LLM_PROVIDER=claude
CLASSIC_TTS_PROVIDER=cartesia
```

### Deepgram + Gemini + OpenAI TTS

```env
CLASSIC_STT_PROVIDER=deepgram
CLASSIC_LLM_PROVIDER=gemini
CLASSIC_TTS_PROVIDER=openai
```
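
The selection rule in this section can be sketched as a small validation step that resolves the three variables, applies the documented defaults, and reports which API keys are still missing. This is an illustrative sketch: the function names are hypothetical, the provider-to-key mapping is an assumption inferred from the key names in `env.example`, and `whisper` is left out of that mapping because this README does not say which key it needs, if any. Lowercasing and stripping the values is a choice made here, not something the server is known to do.

```python
import os
from typing import Mapping

# Env var -> (documented default, providers listed in this README).
SUPPORTED = {
    "CLASSIC_STT_PROVIDER": ("deepgram", {"deepgram", "whisper"}),
    "CLASSIC_LLM_PROVIDER": ("openai", {"openai", "claude", "gemini", "grok"}),
    "CLASSIC_TTS_PROVIDER": ("elevenlabs", {"elevenlabs", "cartesia", "deepgram", "openai"}),
}

# ASSUMPTION: mapping inferred from the key names in env.example.
API_KEY_FOR = {
    "deepgram": "DEEPGRAM_API_KEY",
    "openai": "OPENAI_API_KEY",
    "claude": "ANTHROPIC_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "grok": "XAI_API_KEY",
    "elevenlabs": "ELEVENLABS_API_KEY",
    "cartesia": "CARTESIA_API_KEY",
}


def select_providers(env: Mapping[str, str] = os.environ) -> dict:
    """Resolve the three provider variables, applying defaults and validating."""
    chosen = {}
    for var, (default, allowed) in SUPPORTED.items():
        value = env.get(var, default).strip().lower()
        if value not in allowed:
            raise ValueError(f"{var}={value!r} is not one of {sorted(allowed)}")
        chosen[var] = value
    return chosen


def missing_api_keys(env: Mapping[str, str] = os.environ) -> list:
    """List the API key variables the selected providers need but `env` lacks."""
    providers = select_providers(env).values()
    needed = {API_KEY_FOR[p] for p in providers if p in API_KEY_FOR}
    return sorted(key for key in needed if not env.get(key))


if __name__ == "__main__":
    print(select_providers({}))
    print(missing_api_keys({}))
```

Running a check like this at startup turns a misspelled provider name or a forgotten key into an immediate, readable error.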

## Unified Experience Across Elato

A simple way to keep the product understandable is:

- keep the Next.js frontend focused on character creation and device management
- keep the ESP32 firmware focused on one transport protocol
- let users choose one backend runtime:
  - Deno
  - Cloudflare
  - FastAPI
- inside each backend, expose the same conceptual knobs:
  - `STT`
  - `LLM`
  - `TTS`

That means the hardware story stays stable:

- one firmware
- one websocket-style mental model
- three server deployment choices

The cleanest unification strategy is not “every backend supports every provider.”
It is:

- every backend should expose the same categories
- each backend should have one recommended default stack
- advanced users can swap providers later

## Recommended Defaults

If you want a simple opinionated experience for users, keep one default combo per backend.

Suggested defaults:

- `Deno`: OpenAI realtime
- `Cloudflare`: Workers AI STT/TTS + OpenAI LLM
- `FastAPI`: Deepgram + OpenAI + ElevenLabs

That gives users one obvious starting point without taking away flexibility.

## Important Files

If you want to change the FastAPI backend, start here:

- `/Users/akashdeepdeb/Desktop/Projects/ElatoAI/server-fastapi/server.py`
- `/Users/akashdeepdeb/Desktop/Projects/ElatoAI/server-fastapi/classic_route.py`
- `/Users/akashdeepdeb/Desktop/Projects/ElatoAI/server-fastapi/esp32_transport.py`
- `/Users/akashdeepdeb/Desktop/Projects/ElatoAI/server-fastapi/models/llm/__init__.py`
- `/Users/akashdeepdeb/Desktop/Projects/ElatoAI/server-fastapi/models/stt/__init__.py`
- `/Users/akashdeepdeb/Desktop/Projects/ElatoAI/server-fastapi/models/tts/__init__.py`

## Current Notes

- The filesystem still contains many scaffolded provider modules from the earlier broader experiment.
- The active provider registry is now intentionally much smaller.
- That means the codebase stays extensible, but the user-facing default path stays simple.

Binary file not shown.
@@ -1,55 +0,0 @@
"""Classic STT -> LLM -> TTS pipeline builder."""

from __future__ import annotations

import os

from character_prompt import LANGUAGE_LEARNING_PAL_PROMPT
from loguru import logger
from models.llm import create_llm_service
from models.stt import create_stt_service
from models.tts import create_tts_service
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import (
    LLMContextAggregatorPair,
    LLMUserAggregatorParams,
)


def build_classic_route(input_processor, context: LLMContext):
    stt_provider = os.getenv("CLASSIC_STT_PROVIDER", "deepgram")
    llm_provider = os.getenv("CLASSIC_LLM_PROVIDER", "openai")
    tts_provider = os.getenv("CLASSIC_TTS_PROVIDER", "elevenlabs")

    logger.info(
        "Building classic route with stt={} llm={} tts={}",
        stt_provider,
        llm_provider,
        tts_provider,
    )

    stt = create_stt_service(stt_provider)
    llm = create_llm_service(
        llm_provider,
        system_instruction=LANGUAGE_LEARNING_PAL_PROMPT,
    )
    tts = create_tts_service(tts_provider)

    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(
            vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=1))
        ),
    )

    processors = [
        input_processor,
        stt,
        user_aggregator,
        llm,
        tts,
    ]

    return processors, assistant_aggregator
@@ -1,27 +0,0 @@
DEEPGRAM_API_KEY=your_deepgram_api_key
OPENAI_API_KEY=your_openai_api_key
ANTHROPIC_API_KEY=your_anthropic_api_key
GEMINI_API_KEY=your_gemini_api_key
XAI_API_KEY=your_xai_api_key
ELEVENLABS_API_KEY=your_elevenlabs_api_key
CARTESIA_API_KEY=your_cartesia_api_key

# Classic route providers
CURRENT_VOICE_ROUTE=classic
CLASSIC_STT_PROVIDER=deepgram
CLASSIC_LLM_PROVIDER=openai
CLASSIC_TTS_PROVIDER=elevenlabs

# Transport and pipeline sample rates
ESP32_INPUT_SAMPLE_RATE=16000
BROWSER_INPUT_SAMPLE_RATE=16000
AUDIO_OUTPUT_SAMPLE_RATE=24000
PIPELINE_AUDIO_IN_SAMPLE_RATE=16000
PIPELINE_AUDIO_OUT_SAMPLE_RATE=24000

# Browser / Next.js access
ALLOWED_ORIGINS=*

# WebSocket server settings
HOST=0.0.0.0
PORT=7860
@@ -1,20 +0,0 @@
"""LLM provider registry."""

from __future__ import annotations

from models._provider_loader import load_provider_factory

LLM_REGISTRY = {
    "claude": "models.llm.anthropic",
    "anthropic": "models.llm.anthropic",
    "gemini": "models.llm.google_gemini",
    "google_gemini": "models.llm.google_gemini",
    "google_vertex_ai": "models.llm.google_vertex_ai",
    "grok": "models.llm.grok",
    "openai": "models.llm.openai",
}


def create_llm_service(provider_name: str, **kwargs):
    factory = load_provider_factory(LLM_REGISTRY, provider_name, "LLM")
    return factory(**kwargs)
Binary file not shown.
@@ -1,16 +0,0 @@
"""STT provider registry."""

from __future__ import annotations

from models._provider_loader import load_provider_factory

STT_REGISTRY = {
    "deepgram": "models.stt.deepgram",
    "openai": "models.stt.openai",
    "whisper": "models.stt.whisper",
}


def create_stt_service(provider_name: str, **kwargs):
    factory = load_provider_factory(STT_REGISTRY, provider_name, "STT")
    return factory(**kwargs)
Binary file not shown.
@@ -1,17 +0,0 @@
"""TTS provider registry."""

from __future__ import annotations

from models._provider_loader import load_provider_factory

TTS_REGISTRY = {
    "cartesia": "models.tts.cartesia",
    "deepgram": "models.tts.deepgram",
    "elevenlabs": "models.tts.elevenlabs",
    "openai": "models.tts.openai",
}


def create_tts_service(provider_name: str, **kwargs):
    factory = load_provider_factory(TTS_REGISTRY, provider_name, "TTS")
    return factory(**kwargs)
Binary file not shown.
@@ -54,7 +54,7 @@ That gives people one obvious path to get running before they start swapping pro
 ```text
 server/fastapi/
 ├── bot.py
-├── classic_route.py
+├── voice_pipeline.py
 ├── esp32_transport.py
 ├── server.py
 ├── env.example
@@ -327,7 +327,7 @@ That gives users one obvious starting point without taking away flexibility.
 If you want to change the FastAPI backend, start here:
 
 - `/Users/akashdeepdeb/Desktop/Projects/ElatoAI/server/fastapi/server.py`
-- `/Users/akashdeepdeb/Desktop/Projects/ElatoAI/server/fastapi/classic_route.py`
+- `/Users/akashdeepdeb/Desktop/Projects/ElatoAI/server/fastapi/voice_pipeline.py`
 - `/Users/akashdeepdeb/Desktop/Projects/ElatoAI/server/fastapi/esp32_transport.py`
 - `/Users/akashdeepdeb/Desktop/Projects/ElatoAI/server/fastapi/models/llm/__init__.py`
 - `/Users/akashdeepdeb/Desktop/Projects/ElatoAI/server/fastapi/models/stt/__init__.py`
@@ -9,7 +9,7 @@
 import os
 from typing import Literal
 
-from classic_route import build_classic_route
+from voice_pipeline import build_voice_pipeline
 from dotenv import load_dotenv
 from gem_live_route import build_gem_live_route
 from grok_route import build_grok_route
@@ -155,7 +155,7 @@ async def run_bot_session(
     elif voice_route == "grok":
         route_processors, assistant_aggregator = build_grok_route(input_processor, context)
     else:
-        route_processors, assistant_aggregator = build_classic_route(input_processor, context)
+        route_processors, assistant_aggregator = build_voice_pipeline(input_processor, context)
 
     processors = [transport.input(), *route_processors]
 
@@ -1,4 +1,4 @@
-"""Classic STT -> LLM -> TTS pipeline builder."""
+"""Default STT -> LLM -> TTS voice pipeline builder."""
 
 from __future__ import annotations
 
@@ -18,7 +18,7 @@ from pipecat.processors.aggregators.llm_response_universal import (
 )
 
 
-def build_classic_route(input_processor, context: LLMContext):
+def build_voice_pipeline(input_processor, context: LLMContext):
     stt_provider = os.getenv("CLASSIC_STT_PROVIDER", "deepgram")
     llm_provider = os.getenv("CLASSIC_LLM_PROVIDER", "openai")
     tts_provider = os.getenv("CLASSIC_TTS_PROVIDER", "elevenlabs")