Realtime Voice AI with 100+ Models on Arduino ESP32 with Secure Websockets and E

🚀 ElatoAI: Realtime AI Speech for ESP32

Realtime AI speech powered by the OpenAI Realtime API, ESP32, Secure WebSockets, and Deno Edge Functions, supporting uninterrupted conversations of more than 10 minutes from anywhere in the world.

📺 Demo Video

Watch Demo on YouTube

🌟 Features

  • Realtime Speech-to-Speech: Instant speech conversion powered by OpenAI's Realtime APIs.
  • Secure WebSockets: Reliable, encrypted WebSocket communication.
  • Server Turn Detection: Intelligent conversation flow handling for smooth interactions.
  • Opus Audio Compression: High-quality audio streaming with minimal bandwidth.
  • Global Edge Performance: Low-latency Deno Edge Functions that keep conversations responsive worldwide.
  • ESP32 Arduino Framework: Optimized and easy-to-use hardware integration.
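As an illustration of server turn detection, the sketch below builds a `session.update` message that asks the OpenAI Realtime API to detect end-of-speech on the server side. The field names (`turn_detection`, `server_vad`, `threshold`, `prefix_padding_ms`, `silence_duration_ms`) follow the public Realtime API event format as we understand it and should be checked against the current API reference. The helper name `buildSessionUpdate`, the option names, and the sample values are assumptions for this example, not taken from the ElatoAI source.

```typescript
// Hypothetical options type for this sketch (not part of ElatoAI).
interface ServerVadOptions {
  threshold: number; // voice activity sensitivity, expected in the range 0..1
  prefixPaddingMs: number; // audio kept before detected speech starts
  silenceDurationMs: number; // silence required before the turn is considered over
}

// Builds the JSON text of a `session.update` event enabling server-side VAD.
function buildSessionUpdate(opts: ServerVadOptions, voice = "alloy"): string {
  if (opts.threshold < 0 || opts.threshold > 1) {
    throw new RangeError("threshold must be between 0 and 1");
  }
  return JSON.stringify({
    type: "session.update",
    session: {
      voice,
      turn_detection: {
        type: "server_vad",
        threshold: opts.threshold,
        prefix_padding_ms: opts.prefixPaddingMs,
        silence_duration_ms: opts.silenceDurationMs,
      },
    },
  });
}

// Example: illustrative values only, tune them for your microphone and room.
const sessionUpdate = buildSessionUpdate({
  threshold: 0.5,
  prefixPaddingMs: 300,
  silenceDurationMs: 500,
});
```

The resulting string would typically be sent over the upstream WebSocket once the session is open. A longer `silenceDurationMs` tolerates pauses in speech at the cost of slower replies.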

📌 Project Architecture

ElatoAI consists of three main components:

  1. Frontend Client (Next.js hosted on Vercel)
  2. Edge Server Functions (Deno running on Supabase Edge)
  3. ESP32 IoT Client (PlatformIO/Arduino)

🛠 Tech Stack

| Component | Technology Used |
|---|---|
| Frontend | Next.js, Vercel, Supabase DB |
| Edge Functions | Deno Edge Functions on Supabase |
| IoT Client | PlatformIO, Arduino Framework, ESP32-S3 |
| Audio Codec | Opus |
| Communication | Secure WebSockets |
| Libraries | ArduinoJson, WebSockets, AsyncTCP |

🗺️ High-Level Flow

flowchart TD
  User[User Speech] --> ESP32
  ESP32[ESP32 Device] -->|WebSocket| Edge[Deno Edge Function]
  Edge -->|OpenAI API| OpenAI[OpenAI Realtime API]
  OpenAI --> Edge
  Edge -->|WebSocket| ESP32
  ESP32 --> Speaker[AI Generated Speech]
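The relay role of the edge function in the flow above can be sketched as a two-way bridge between the device socket and the upstream socket. This is a simplified illustration, not the ElatoAI implementation: `SocketLike` and `bridge` are hypothetical names, and the sketch forwards payloads unchanged, whereas a real relay would likely also transcode audio and wrap it in API events.

```typescript
// Hypothetical minimal socket surface used by this sketch. A real WebSocket
// would need a thin adapter to match this shape.
interface SocketLike {
  send(data: string | Uint8Array): void;
  close(): void;
  onmessage: ((event: { data: string | Uint8Array }) => void) | null;
  onclose: (() => void) | null;
}

// Forwards every message from one side to the other and closes both
// sides when either one closes.
function bridge(device: SocketLike, upstream: SocketLike): void {
  device.onmessage = (event) => upstream.send(event.data);
  upstream.onmessage = (event) => device.send(event.data);
  device.onclose = () => upstream.close();
  upstream.onclose = () => device.close();
}
```

Keeping the bridge behind a small interface makes it possible to exercise the routing logic with in-memory fakes before wiring it to real sockets.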

📂 Project Structure

graph TD
  repo[ElatoAI]
  repo --> frontend[Frontend - Next.js]
  repo --> deno[Deno Edge Function]
  repo --> esp32[ESP32 Arduino Client]

  frontend --> supabase[Supabase DB]
  esp32 --> websockets[Secure WebSockets]
  esp32 --> opus[Opus Codec]
  esp32 --> audio_tools[arduino-audio-tools]

⚙️ PlatformIO Configuration

[env:esp32-s3-devkitc-1]
platform = espressif32 @ 6.10.0
board = esp32-s3-devkitc-1
framework = arduino
monitor_speed = 115200

lib_deps =
    bblanchon/ArduinoJson@^7.1.0
    links2004/WebSockets@^2.4.1
    https://github.com/esp-arduino-libs/ESP32_Button.git
    https://github.com/pschatzmann/arduino-audio-tools.git#v1.0.1
    https://github.com/pschatzmann/arduino-libopus.git
    ESP32Async/AsyncTCP
    ESP32Async/ESPAsyncWebServer

📊 Important Stats

  • Latency: <150 ms round-trip globally
  • 🎧 Audio Quality: Opus codec at 24 kbps (high clarity)
  • Uninterrupted Conversations: Over 10 minutes of continuous conversation
  • 🌎 Global Availability: Optimized with edge computing via Supabase
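To put the 24 kbps figure in perspective, the arithmetic below works out the Opus payload per frame and per 10-minute conversation. The 20 ms frame duration and the comparison against 16-bit mono PCM at 24 kHz are assumptions made for this example; the numbers cover audio payload only and ignore WebSocket and TLS framing overhead.

```typescript
const OPUS_BITRATE_BPS = 24_000; // 24 kbps, from the stats above
const FRAME_MS = 20; // assumption: a common Opus frame duration
const CONVERSATION_SECONDS = 10 * 60; // the 10-minute conversation from the stats

// Payload bytes carried by one encoded frame at a constant bitrate.
function bytesPerFrame(bitrateBps: number, frameMs: number): number {
  return (bitrateBps * frameMs) / 1000 / 8;
}

// Total payload bytes streamed in one direction over a given duration.
function streamBytes(bitrateBps: number, seconds: number): number {
  return (bitrateBps * seconds) / 8;
}

// Bitrate of uncompressed PCM audio.
function pcmBitrateBps(sampleRateHz: number, bitsPerSample: number, channels: number): number {
  return sampleRateHz * bitsPerSample * channels;
}

const frameBytes = bytesPerFrame(OPUS_BITRATE_BPS, FRAME_MS); // 60 bytes
const totalBytes = streamBytes(OPUS_BITRATE_BPS, CONVERSATION_SECONDS); // 1,800,000 bytes
const rawBps = pcmBitrateBps(24_000, 16, 1); // 384,000 bps (assumed PCM format)
const compressionRatio = rawBps / OPUS_BITRATE_BPS; // 16
```

Under these assumptions, a 10-minute one-way stream is about 1.8 MB of Opus payload, roughly one sixteenth of the equivalent raw PCM.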

🛡 Security

  • Secure WebSockets (WSS) for encrypted data transfers
  • Edge validation and error handling for robust, secure deployment
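The two points above could translate into checks like the following on the edge side. This is a minimal sketch under stated assumptions: the README does not describe how ElatoAI authenticates devices, so the bearer-token scheme and the helper names `isSecureWebSocketUrl` and `extractBearerToken` are hypothetical. Only the standard `URL` class is used.

```typescript
// Returns true only for URLs that use the encrypted WebSocket scheme.
function isSecureWebSocketUrl(raw: string): boolean {
  try {
    return new URL(raw).protocol === "wss:";
  } catch {
    return false; // not a parseable URL
  }
}

// Extracts the token from an "Authorization: Bearer <token>" header value.
// Returns null when the header is missing or malformed.
function extractBearerToken(authorization: string | null): string | null {
  if (!authorization) return null;
  const match = /^Bearer\s+(\S+)$/i.exec(authorization.trim());
  return match?.[1] ?? null;
}
```

A handler would typically reject the upgrade request early when either check fails, before opening any upstream connection. Token verification itself (signature, expiry) is out of scope for this sketch.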

🤝 Contributing

We welcome contributions!

  • Fork this repository.
  • Create your feature branch (git checkout -b feature/AmazingFeature).
  • Commit your changes (git commit -m 'Add AmazingFeature').
  • Push to the branch (git push origin feature/AmazingFeature).
  • Open a pull request!

📜 License

This project is licensed under the MIT License - see the LICENSE file for details.


🎉 Give a ⭐️ if you found this project interesting!