server/cloudflare

Cloudflare Workers + Durable Objects backend for Elato's ESP32 realtime voice flow.

This server keeps the existing Elato device protocol and routes each voice turn through the following services:

STT: Cloudflare Workers AI via @cloudflare/voice
LLM: OpenAI Chat Completions
TTS: Cloudflare Workers AI Deepgram Aura
Transport: WebSocket + Opus packetization for ESP32
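
To make the packetization step concrete, here is a minimal sketch of slicing 16-bit mono PCM into fixed-duration frames before they are handed to an Opus encoder. The 24 kHz sample rate and 20 ms frame duration are illustrative assumptions, not values taken from this repo, and `samplesPerFrame` and `splitIntoFrames` are hypothetical helpers. The encoder call itself is left out.

```typescript
// Hypothetical helpers; check src/opus.ts for the values the server really uses.

// Number of PCM samples in one frame of the given duration.
function samplesPerFrame(sampleRateHz: number, frameMs: number): number {
  return Math.floor((sampleRateHz * frameMs) / 1000);
}

// Split PCM into equal-sized frames. A short final frame is padded with
// silence, because an Opus encoder expects a full frame on every call.
function splitIntoFrames(pcm: Int16Array, frameSamples: number): Int16Array[] {
  if (!Number.isInteger(frameSamples) || frameSamples <= 0) {
    throw new RangeError("frameSamples must be a positive integer");
  }
  const frames: Int16Array[] = [];
  for (let offset = 0; offset < pcm.length; offset += frameSamples) {
    const frame = new Int16Array(frameSamples); // zero-filled by default
    frame.set(pcm.subarray(offset, offset + frameSamples));
    frames.push(frame);
  }
  return frames;
}
```

Each resulting frame would then be encoded and sent as one binary WebSocket message.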

If you are new to the overall project, start with the root and server READMEs first (paths relative to the repository root):

README.md
server/README.md

What This Server Does

This backend is meant to be an alternative to the Deno edge server, not a separate firmware protocol.

The ESP32 still talks to the same Elato-style control surface:

auth
AUDIO.COMMITTED
RESPONSE.CREATED
binary audio frames
RESPONSE.COMPLETE
SESSION.END
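
The control surface above can be modeled as a small tagged union. The sketch below is only an illustration: it assumes control messages arrive as JSON text frames that carry the event name in a `type` field, which this README does not confirm. `classifyFrame` and the `Frame` type are hypothetical names.

```typescript
// Event names copied from the list above.
const CONTROL_EVENTS = [
  "auth",
  "AUDIO.COMMITTED",
  "RESPONSE.CREATED",
  "RESPONSE.COMPLETE",
  "SESSION.END",
] as const;
type ControlEvent = (typeof CONTROL_EVENTS)[number];

type Frame =
  | { kind: "audio"; bytes: Uint8Array }
  | { kind: "control"; event: ControlEvent }
  | { kind: "unknown" };

// ASSUMPTION: text frames are JSON objects shaped like { "type": "<event>" }.
// Binary frames are treated as audio without inspecting their contents.
function classifyFrame(data: string | Uint8Array): Frame {
  if (typeof data !== "string") return { kind: "audio", bytes: data };
  let parsed: unknown;
  try {
    parsed = JSON.parse(data);
  } catch {
    return { kind: "unknown" };
  }
  const name = (parsed as { type?: unknown } | null)?.type;
  const event = CONTROL_EVENTS.find((e) => e === name);
  return event ? { kind: "control", event } : { kind: "unknown" };
}
```

Keeping the event names in one constant makes it harder for the Cloudflare and Deno backends to drift apart on spelling.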

Public route:

/ws/esp32

Health check:

/healthz
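
As a rough sketch of how these two routes might be dispatched, the pure function below maps a request path and `Upgrade` header to a routing decision. `resolveRoute` and the `Route` labels are hypothetical and do not describe the actual code in src/index.ts.

```typescript
type Route = "healthz" | "esp32-websocket" | "upgrade-required" | "not-found";

// Decide what to do with a request based on its path and Upgrade header.
function resolveRoute(pathname: string, upgradeHeader: string | null): Route {
  if (pathname === "/healthz") return "healthz";
  if (pathname === "/ws/esp32") {
    // Only a websocket upgrade request should reach the voice session.
    return upgradeHeader?.toLowerCase() === "websocket"
      ? "esp32-websocket"
      : "upgrade-required";
  }
  return "not-found";
}
```

In a Worker, `upgrade-required` would typically map to an HTTP 426 response and `not-found` to a 404.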

Current Layout

server/cloudflare/
├── models/
│   ├── llm.ts
│   ├── session.ts
│   ├── stt.ts
│   └── tts.ts
├── src/
│   ├── index.ts
│   ├── opus.ts
│   ├── prompt.ts
│   └── types.ts
├── package.json
└── wrangler.toml

How It Works

1. The ESP32 opens a secure websocket to /ws/esp32.
2. The Worker creates a fresh Durable Object session for that websocket.
3. The server sends the Elato auth payload.
4. The server triggers the first assistant turn.
5. LLM output is synthesized to audio.
6. Audio is packetized into Opus frames and streamed back to the ESP32.
7. After playback, the ESP32 goes back to listening.
8. Incoming mic audio is fed to the STT session for the next turn.
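
The turn loop described above can be pictured as a small state machine. The following sketch is one possible reading of it, not the session logic in models/session.ts: the state names and the mapping from events to transitions are assumptions made for illustration.

```typescript
// Hypothetical state names; only the event names come from the device protocol.
type TurnState = "idle" | "speaking" | "listening" | "processing" | "ended";
type TurnEvent =
  | "RESPONSE.CREATED"
  | "RESPONSE.COMPLETE"
  | "AUDIO.COMMITTED"
  | "SESSION.END";

// Pure transition function: given the current state and an event, return the next state.
function nextState(state: TurnState, event: TurnEvent): TurnState {
  if (state === "ended") return "ended"; // a finished session never restarts
  switch (event) {
    case "RESPONSE.CREATED":
      return "speaking"; // assistant audio is about to stream
    case "RESPONSE.COMPLETE":
      return state === "speaking" ? "listening" : state; // playback done, open the mic
    case "AUDIO.COMMITTED":
      return state === "listening" ? "processing" : state; // user turn captured
    case "SESSION.END":
      return "ended";
  }
}
```

Keeping the transitions in a pure function makes it easy to replay a sequence of logged events and see where a session stalled.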

Prerequisites

You need:

Node.js 22+
npm
a Cloudflare account with Workers enabled
a Workers AI binding
an OpenAI API key for the LLM path

Local Development

1. Install dependencies

cd server/cloudflare  # from the repository root
npm install

2. Create local env vars

Copy the example file:

cp .dev.vars.example .dev.vars

Then fill in the values you actually need.

Typical local file:

OPENAI_API_KEY=...
ELATO_OPENAI_MODEL=gpt-4.1-mini
ELATO_OPENAI_SYSTEM_PROMPT=You are a friendly toy character.
ELATO_OPENAI_FIRST_MESSAGE=Say hello first in one short sentence.

Notes:

JWT_SECRET_KEY is not currently required for the stripped-down iteration unless you wire auth back in.
Do not commit real secrets.
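
For illustration, here is a minimal sketch of how these variables might be read and validated at startup. `loadConfig` and `VoiceConfig` are hypothetical, and the fallback values simply reuse the example file above; the defaults the server actually applies may differ.

```typescript
interface VoiceConfig {
  apiKey: string;
  model: string;
  systemPrompt: string;
  firstMessage: string;
}

// Read settings from an env-like object. Only OPENAI_API_KEY is mandatory.
// ASSUMPTION: the fallbacks mirror the sample .dev.vars shown above.
function loadConfig(env: Record<string, string | undefined>): VoiceConfig {
  const apiKey = env["OPENAI_API_KEY"]?.trim();
  if (!apiKey) {
    throw new Error("OPENAI_API_KEY is required for the LLM path");
  }
  return {
    apiKey,
    model: env["ELATO_OPENAI_MODEL"]?.trim() || "gpt-4.1-mini",
    systemPrompt:
      env["ELATO_OPENAI_SYSTEM_PROMPT"]?.trim() || "You are a friendly toy character.",
    firstMessage:
      env["ELATO_OPENAI_FIRST_MESSAGE"]?.trim() || "Say hello first in one short sentence.",
  };
}
```

Failing fast on a missing key gives a clearer error than a 401 from the LLM provider in the middle of a voice session.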

3. Run locally

npm run dev

This uses:

wrangler dev --ip 0.0.0.0 --port 8787

So local access is typically:

http://<your-lan-ip>:8787/healthz
ws://<your-lan-ip>:8787/ws/esp32

For local firmware testing:

point the ESP32 at your machine's LAN IP, not 0.0.0.0
local plain ws:// is fine for quick testing if your firmware build allows it
production firmware should use wss://
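
The addressing rules above can be captured in a small helper. This is only a sketch: `deviceUrls` is a hypothetical function, and 192.168.1.20 in the usage note is a placeholder LAN address.

```typescript
// Build the health-check and websocket URLs for a given host.
// Rejects 0.0.0.0 because it is a bind address, not something a device can dial.
function deviceUrls(
  host: string,
  port: number,
  secure: boolean,
): { health: string; websocket: string } {
  if (host === "0.0.0.0") {
    throw new Error("0.0.0.0 is a bind address; use the machine's LAN IP instead");
  }
  const authority = `${host}:${port}`;
  return {
    health: `${secure ? "https" : "http"}://${authority}/healthz`,
    websocket: `${secure ? "wss" : "ws"}://${authority}/ws/esp32`,
  };
}
```

For example, `deviceUrls("192.168.1.20", 8787, false).websocket` yields `ws://192.168.1.20:8787/ws/esp32`, which is the value to flash into a local test build.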

Deploying to Cloudflare

1. Set Worker secrets

Set the runtime secrets in Cloudflare:

OPENAI_API_KEY
optionally ELATO_OPENAI_MODEL
optionally ELATO_OPENAI_SYSTEM_PROMPT
optionally ELATO_OPENAI_FIRST_MESSAGE

2. Deploy

cd server/cloudflare  # from the repository root
npm run deploy

3. Point the ESP32 at the Worker

Example production route:

wss://<your-worker>.workers.dev/ws/esp32

Durable Object Model

The current setup uses one fresh Durable Object per websocket voice session.

That is the sensible default for realtime voice apps because:

each call/session gets isolated state
reconnects do not inherit stale memory
turn state is easier to reason about
cleanup is straightforward

The Worker does this in src/index.ts.
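
A minimal sketch of the one-object-per-session idea follows. It uses local stand-in interfaces so it compiles without the Workers type package; the method names `newUniqueId`, `get`, and `fetch` match the Durable Objects namespace and stub APIs, but `forwardToFreshSession` is hypothetical, and whether the Worker uses `newUniqueId` or a randomly named `idFromName` is an assumption.

```typescript
// Local stand-ins for the Workers runtime types, kept generic for testing.
interface SessionId {
  toString(): string;
}
interface SessionStub<Req, Res> {
  fetch(request: Req): Promise<Res>;
}
interface SessionNamespace<Req, Res> {
  newUniqueId(): SessionId;
  get(id: SessionId): SessionStub<Req, Res>;
}

// Forward one websocket upgrade request to a brand-new Durable Object.
// A new ID per connection means no state is shared across reconnects.
function forwardToFreshSession<Req, Res>(
  ns: SessionNamespace<Req, Res>,
  request: Req,
): Promise<Res> {
  const id = ns.newUniqueId();
  return ns.get(id).fetch(request);
}
```

The alternative, deriving the ID from a device identifier, would let a reconnect land on the same object, which is exactly the stale-state behavior this design avoids.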

Migrations

This backend already has a Durable Object rename migration in wrangler.toml:

ElatoOpenAiVoiceAgent -> ElatoVoiceSession

If you rename the DO again later, add another migration instead of just changing the class name.
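
As a rough illustration, a rename migration in wrangler.toml generally looks like the fragment below. The `tag` values and the future class name are placeholders, not copied from this repo's file; only the `renamed_classes` pair reflects the rename described above.

```toml
# Illustrative only: the tag values are placeholders.
[[migrations]]
tag = "v2"
renamed_classes = [{ from = "ElatoOpenAiVoiceAgent", to = "ElatoVoiceSession" }]

# A later rename is appended as a new entry with a new tag,
# rather than editing the entry above:
# [[migrations]]
# tag = "v3"
# renamed_classes = [{ from = "ElatoVoiceSession", to = "<NewClassName>" }]
```

Migration entries are applied in order and identified by tag, which is why earlier entries should stay untouched once deployed.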

Common Commands

Typecheck:

npm run typecheck

Local dev:

npm run dev

Deploy:

npm run deploy

Operational Notes

A few things matter in practice:

Rapid reconnect testing can trigger Workers AI rate limits, especially on TTS.
If you redeploy while a websocket session is active, Cloudflare may log "This script has been upgraded. Please send a new request to connect to the new version." That is expected during deploy churn.
If the ESP32 flips into speaking briefly and then falls back, check whether TTS actually produced audio or hit a 429.
If STT does not advance turns, inspect the STT provider logs first before debugging firmware state.
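
When a TTS call hits a 429, one common mitigation is to retry with capped exponential backoff. The helpers below sketch that schedule; `isRateLimited` and `backoffDelayMs` are hypothetical, the 500 ms base and 8 s cap are arbitrary assumptions, and this backend is not known to retry this way.

```typescript
// True when the status code signals rate limiting.
function isRateLimited(status: number): boolean {
  return status === 429;
}

// Capped exponential backoff: base, 2*base, 4*base, ... never above maxMs.
// `attempt` is zero-based (0 = delay before the first retry).
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 8000): number {
  if (!Number.isInteger(attempt) || attempt < 0) {
    throw new RangeError("attempt must be a non-negative integer");
  }
  return Math.min(maxMs, baseMs * 2 ** attempt);
}
```

For realtime voice, a small retry budget matters more than eventual success: after one or two delays it is usually better to end the turn than to leave the device waiting in the speaking state.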

Current Limitations

This Cloudflare backend is still a pragmatic project backend, not a polished platform product.

Current caveats:

auth is still intentionally stubbed out with comments
DB writes are still placeholders
Workers AI rate limiting can affect repeated testing
the stack is still operationally rough compared with the more mature Deno path

Relationship To Other Servers

Elato currently includes multiple backend paths:

server/deno
server/cloudflare
server/fastapi

Use Cloudflare when you want:

Workers + Durable Objects
Cloudflare-hosted STT/TTS
a stateful edge session model

Use Deno when you want:

the most battle-tested Elato path right now
direct provider integrations already working in production