AIvoices/server/cloudflare/README.md
2026-04-18 02:08:37 +05:30

# server/cloudflare
Cloudflare Workers + Durable Objects backend for Elato's ESP32 realtime voice flow.
This server keeps the existing Elato device protocol and routes audio through Cloudflare-hosted services:
- STT: Cloudflare Workers AI via `@cloudflare/voice`
- LLM: OpenAI Chat Completions
- TTS: Cloudflare Workers AI Deepgram Aura
- Transport: WebSocket + Opus packetization for ESP32
If you are new to the overall project, start with the root and server READMEs first (paths are relative to the repository root):
- `README.md`
- `server/README.md`
## What This Server Does
This backend is meant to be an alternative to the Deno edge server, not a separate firmware protocol.
The ESP32 still talks to the same Elato-style control surface:
- `auth`
- `AUDIO.COMMITTED`
- `RESPONSE.CREATED`
- binary audio frames
- `RESPONSE.COMPLETE`
- `SESSION.END`
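As a rough illustration of that control surface, the sketch below separates control events from binary audio frames. Only the event names come from this README. The assumption that control events arrive as JSON text frames with a `type` field, and the `classifyFrame` helper itself, are hypothetical and may not match the real wire format.

```typescript
// Event names are taken from this README; everything else here is an assumption.
type ControlEventName =
  | "auth"
  | "AUDIO.COMMITTED"
  | "RESPONSE.CREATED"
  | "RESPONSE.COMPLETE"
  | "SESSION.END";

const CONTROL_EVENTS: readonly ControlEventName[] = [
  "auth",
  "AUDIO.COMMITTED",
  "RESPONSE.CREATED",
  "RESPONSE.COMPLETE",
  "SESSION.END",
];

type Frame =
  | { kind: "control"; name: ControlEventName }
  | { kind: "audio"; bytes: Uint8Array }
  | { kind: "unknown" };

// Hypothetical helper: text frames are treated as JSON control events
// keyed by `type`, binary frames as raw audio.
function classifyFrame(data: string | ArrayBuffer): Frame {
  if (typeof data !== "string") {
    return { kind: "audio", bytes: new Uint8Array(data) };
  }
  let parsed: unknown;
  try {
    parsed = JSON.parse(data);
  } catch {
    return { kind: "unknown" };
  }
  if (typeof parsed !== "object" || parsed === null) {
    return { kind: "unknown" };
  }
  const name = (parsed as { type?: unknown }).type;
  const match = CONTROL_EVENTS.find((eventName) => eventName === name);
  return match ? { kind: "control", name: match } : { kind: "unknown" };
}
```

A discriminated union like `Frame` keeps the text and binary paths explicit, so a handler cannot accidentally treat audio bytes as a control message.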
Public route:
```text
/ws/esp32
```
Health check:
```text
/healthz
```
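The two routes above could be dispatched along these lines. This is a minimal sketch, not the code in `src/index.ts`: `classifyRequest` and the `RouteDecision` names are hypothetical, and the check on the `Upgrade` header is an assumption about how the Worker rejects non-WebSocket requests.

```typescript
// Hypothetical routing decision for the two public paths in this README.
type RouteDecision = "health" | "websocket" | "upgrade-required" | "not-found";

function classifyRequest(pathname: string, upgradeHeader: string | null): RouteDecision {
  if (pathname === "/healthz") {
    return "health";
  }
  if (pathname === "/ws/esp32") {
    // A WebSocket handshake carries `Upgrade: websocket`; anything else is
    // treated here as a plain HTTP request that cannot be served on this path.
    return upgradeHeader?.toLowerCase() === "websocket" ? "websocket" : "upgrade-required";
  }
  return "not-found";
}
```

In a real Worker, the `fetch` handler would typically derive `pathname` from `new URL(request.url)` and read the header with `request.headers.get("Upgrade")` before handing the request to the session.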
## Current Layout
```text
server/cloudflare/
├── models/
│ ├── llm.ts
│ ├── session.ts
│ ├── stt.ts
│ └── tts.ts
├── src/
│ ├── index.ts
│ ├── opus.ts
│ ├── prompt.ts
│ └── types.ts
├── package.json
└── wrangler.toml
```
## How It Works
1. The ESP32 opens a WebSocket to `/ws/esp32` (`wss://` in production).
2. The Worker creates a fresh Durable Object session for that WebSocket.
3. The server sends the Elato `auth` payload.
4. The server triggers the first assistant turn.
5. LLM output is synthesized to audio.
6. Audio is packetized into Opus frames and streamed back to the ESP32.
7. After playback, the ESP32 goes back to listening.
8. Incoming mic audio is fed to the STT session for the next turn.
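The turn loop in the steps above can be pictured as a small state machine. The sketch below is an assumption about how the listed control events map onto turn states; the state names and the `nextTurnState` function are hypothetical and are not taken from `models/session.ts`.

```typescript
// Hypothetical turn states; the event names come from this README.
type TurnState = "listening" | "thinking" | "speaking" | "ended";
type TurnEvent =
  | "AUDIO.COMMITTED"
  | "RESPONSE.CREATED"
  | "RESPONSE.COMPLETE"
  | "SESSION.END";

// Assumed mapping:
//   AUDIO.COMMITTED   -> the user turn is finished, the server starts working
//   RESPONSE.CREATED  -> assistant audio is about to stream
//   RESPONSE.COMPLETE -> playback is done, go back to listening
//   SESSION.END       -> the session is over
function nextTurnState(state: TurnState, event: TurnEvent): TurnState {
  if (state === "ended" || event === "SESSION.END") return "ended";
  if (state === "listening" && event === "AUDIO.COMMITTED") return "thinking";
  if (state === "thinking" && event === "RESPONSE.CREATED") return "speaking";
  if (state === "speaking" && event === "RESPONSE.COMPLETE") return "listening";
  // Events that do not fit the current state are ignored in this sketch.
  return state;
}
```

Because the server triggers the first assistant turn right after `auth`, a session in this model would start in `thinking` rather than `listening`.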
## Prerequisites
You need:
- Node.js 22+
- npm
- a Cloudflare account with Workers enabled
- a Workers AI binding
- an OpenAI API key for the LLM path
## Local Development
### 1. Install dependencies
```bash
# from the repository root
cd server/cloudflare
npm install
```
### 2. Create local env vars
Copy the example file:
```bash
cp .dev.vars.example .dev.vars
```
Then fill in the values you actually need.
Typical local file:
```env
OPENAI_API_KEY=...
ELATO_OPENAI_MODEL=gpt-4.1-mini
ELATO_OPENAI_SYSTEM_PROMPT=You are a friendly toy character.
ELATO_OPENAI_FIRST_MESSAGE=Say hello first in one short sentence.
```
Notes:
- `JWT_SECRET_KEY` is not currently required for the stripped-down iteration unless you wire auth back in.
- Do not commit real secrets.
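One way to turn those variables into a typed config object is sketched below. The variable names come from the example file above; the `resolveConfig` helper, the choice to fall back to the example values as defaults, and the decision to throw on a missing key are all assumptions, not the behavior of the actual server code.

```typescript
// Variable names match the example .dev.vars above.
interface RawEnv {
  OPENAI_API_KEY?: string;
  ELATO_OPENAI_MODEL?: string;
  ELATO_OPENAI_SYSTEM_PROMPT?: string;
  ELATO_OPENAI_FIRST_MESSAGE?: string;
}

interface VoiceConfig {
  apiKey: string;
  model: string;
  systemPrompt: string;
  firstMessage: string;
}

// Assumed defaults: these simply reuse the example values from this README.
const DEFAULTS = {
  model: "gpt-4.1-mini",
  systemPrompt: "You are a friendly toy character.",
  firstMessage: "Say hello first in one short sentence.",
};

// Hypothetical helper: fail fast on the one required value, default the rest.
function resolveConfig(env: RawEnv): VoiceConfig {
  const apiKey = env.OPENAI_API_KEY?.trim();
  if (!apiKey) {
    throw new Error("OPENAI_API_KEY is required for the LLM path");
  }
  return {
    apiKey,
    model: env.ELATO_OPENAI_MODEL?.trim() || DEFAULTS.model,
    systemPrompt: env.ELATO_OPENAI_SYSTEM_PROMPT?.trim() || DEFAULTS.systemPrompt,
    firstMessage: env.ELATO_OPENAI_FIRST_MESSAGE?.trim() || DEFAULTS.firstMessage,
  };
}
```

Validating once at session start keeps a missing key from surfacing later as an opaque LLM request failure in the middle of a turn.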
### 3. Run locally
```bash
npm run dev
```
This uses:
```bash
wrangler dev --ip 0.0.0.0 --port 8787
```
So local access is typically:
```text
http://<your-lan-ip>:8787/healthz
ws://<your-lan-ip>:8787/ws/esp32
```
For local firmware testing:
- point the ESP32 at your machine's LAN IP, not `0.0.0.0`
- local plain `ws://` is fine for quick testing if your firmware build allows it
- production firmware should use `wss://`
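The URL rules above are easy to get wrong when switching between local and production testing, so a small helper can make them explicit. This is an illustrative sketch only; `buildEndpoints` is a hypothetical name and is not part of this repository.

```typescript
// Hypothetical helper that builds the two endpoint URLs from a host.
function buildEndpoints(
  host: string,
  port?: number,
  secure = false,
): { health: string; websocket: string } {
  // 0.0.0.0 is only the address wrangler binds to; a device cannot dial it.
  if (host === "0.0.0.0") {
    throw new Error("0.0.0.0 is a bind address; use the machine's LAN IP instead");
  }
  const authority = port === undefined ? host : `${host}:${port}`;
  return {
    health: `${secure ? "https" : "http"}://${authority}/healthz`,
    websocket: `${secure ? "wss" : "ws"}://${authority}/ws/esp32`,
  };
}
```

For local testing this would be called with a LAN IP and port `8787`; for a deployed Worker, with the Worker hostname, no port, and `secure` set to `true`.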
## Deploying to Cloudflare
### 1. Set Worker secrets
Set the runtime secrets in Cloudflare, for example with `npx wrangler secret put OPENAI_API_KEY`:
- `OPENAI_API_KEY`
- optionally `ELATO_OPENAI_MODEL`
- optionally `ELATO_OPENAI_SYSTEM_PROMPT`
- optionally `ELATO_OPENAI_FIRST_MESSAGE`
### 2. Deploy
```bash
# from the repository root
cd server/cloudflare
npm run deploy
```
### 3. Point the ESP32 at the Worker
Example production route:
```text
wss://<your-worker>.workers.dev/ws/esp32
```
## Durable Object Model
The current setup uses one fresh Durable Object per websocket voice session.
That is the sensible default for realtime voice apps because:
- each call/session gets isolated state
- reconnects do not inherit stale memory
- turn state is easier to reason about
- cleanup is straightforward
This is what the Worker does in `server/cloudflare/src/index.ts`.
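The "one fresh Durable Object per connection" pattern usually comes down to minting a unique ID for each incoming request and forwarding the request to that object. The sketch below uses small structural stand-ins so it type-checks on its own. `newUniqueId()`, `get(id)`, and `stub.fetch(request)` mirror the real Durable Object namespace API, while `forwardToFreshSession` and the interface names are hypothetical and may differ from the actual Worker code.

```typescript
// Structural stand-ins for the Workers runtime types (assumption: the real
// code would use DurableObjectNamespace from @cloudflare/workers-types).
interface SessionId {
  toString(): string;
}

interface SessionStub<Req, Res> {
  fetch(request: Req): Promise<Res>;
}

interface SessionNamespace<Req, Res> {
  newUniqueId(): SessionId;
  get(id: SessionId): SessionStub<Req, Res>;
}

// Hypothetical helper: every call creates a brand-new session object, so a
// reconnecting device never lands on state left over from an earlier call.
function forwardToFreshSession<Req, Res>(
  request: Req,
  sessions: SessionNamespace<Req, Res>,
): Promise<Res> {
  const id = sessions.newUniqueId();
  return sessions.get(id).fetch(request);
}
```

The alternative, deriving the ID from a stable name such as a device identifier, would route reconnects back to the same object, which is exactly the stale-state behavior this setup avoids.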
## Migrations
This backend already has a Durable Object rename migration in `server/cloudflare/wrangler.toml`:
- `ElatoOpenAiVoiceAgent` -> `ElatoVoiceSession`
If you rename the DO again later, add another `[[migrations]]` entry with a new `tag` and a `renamed_classes` mapping instead of just changing the class name.
## Common Commands
Typecheck:
```bash
npm run typecheck
```
Local dev:
```bash
npm run dev
```
Deploy:
```bash
npm run deploy
```
## Operational Notes
A few things matter in practice:
- Rapid reconnect testing can trigger Workers AI rate limits, especially on TTS.
- If you redeploy while a websocket session is active, Cloudflare may log:
  `This script has been upgraded. Please send a new request to connect to the new version.`
  That is expected during deploy churn.
- If the ESP32 flips into speaking briefly and then falls back, check whether TTS actually produced audio or hit a `429`.
- If STT does not advance turns, inspect the STT provider logs first before debugging firmware state.
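If TTS `429` responses become a recurring problem, one common mitigation is to retry with capped exponential backoff. The sketch below only computes the retry schedule. The function names, the 250 ms base, and the 4 s cap are illustrative assumptions, and nothing here reflects retry logic that exists in `models/tts.ts` today.

```typescript
// Only 429 is treated as retryable here because that is the status this
// README calls out; other statuses are left to the caller.
function isRateLimited(status: number): boolean {
  return status === 429;
}

// Hypothetical schedule: the delay doubles per attempt and is capped at capMs.
function backoffDelaysMs(maxRetries: number, baseMs = 250, capMs = 4000): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    delays.push(Math.min(capMs, baseMs * 2 ** attempt));
  }
  return delays;
}
```

With the defaults, `backoffDelaysMs(5)` yields `[250, 500, 1000, 2000, 4000]`. For a realtime voice turn the total wait adds up quickly, so a low retry count is probably the safer choice.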
## Current Limitations
This Cloudflare backend is still a pragmatic project backend, not a polished platform product.
Current caveats:
- auth is still intentionally stubbed out with comments
- DB writes are still placeholders
- Workers AI rate limiting can affect repeated testing
- the stack is still operationally rough compared with the more mature Deno path
## Recommended Files To Read
If you are modifying this backend, read these first (paths are relative to the repository root):
- `server/cloudflare/src/index.ts`
- `server/cloudflare/models/session.ts`
- `server/cloudflare/models/stt.ts`
- `server/cloudflare/models/llm.ts`
- `server/cloudflare/models/tts.ts`
- `firmware-arduino/src/Audio.cpp`
- `firmware-arduino/src/Config.cpp`
## Relationship To Other Servers
Elato currently includes multiple backend paths:
- `server/deno`
- `server/cloudflare`
- `server/fastapi`
Use Cloudflare when you want:
- Workers + Durable Objects
- Cloudflare-hosted STT/TTS
- a stateful edge session model
Use Deno when you want:
- the most battle-tested Elato path right now
- direct provider integrations already working in production