diff --git a/README.md b/README.md
index fdad7a6..8afb9a8 100644
--- a/README.md
+++ b/README.md
@@ -69,7 +69,7 @@ Realtime AI Speech powered by **OpenAI Realtime API**, **Eleven Labs AI Agents**
-## 📽️ Demo Video [(✨ Gemini demo)](https://youtu.be/_zUBue3pfVI)
+## 📽️ Demo Video ([✨ Gemini demo](https://youtu.be/_zUBue3pfVI), [Eleven Labs Demo](https://youtu.be/7LKTIuEW-hg))
-Video links: [OpenAI Demo](https://youtu.be/o1eIAwVll5I) | [Gemini Demo](https://youtu.be/_zUBue3pfVI)
+Video links: [OpenAI Demo](https://youtu.be/o1eIAwVll5I) | [Gemini Demo](https://youtu.be/_zUBue3pfVI) | [Eleven Labs Demo](https://youtu.be/7LKTIuEW-hg)
 ## π·ββοΈ DIY Hardware Design
@@ -235,10 +235,10 @@ flowchart TD
     ESP32[ESP32 Device] -->|WebSocket| Edge[Deno Edge Function]
     Edge -->|OpenAI API| OpenAI[OpenAI Realtime API]
     Edge -->|Gemini API| Gemini[Gemini Live API]
-    Edge -->|ElevenLabs API| ElevenLabs[ElevenLabs AI Agents]
+    Edge -->|Eleven Labs API| Eleven Labs[Eleven Labs AI Agents]
     OpenAI --> Edge
     Gemini --> Edge
-    ElevenLabs --> Edge
+    Eleven Labs --> Edge
     Edge -->|WebSocket| ESP32
     ESP32 --> UserOutput
 ```
@@ -281,7 +281,7 @@ lib_deps =
 ```
 ## Additional Docs
-- [⏸️ Using the ElevenLabs API](./docs/ElevenLabs.md)
+- [⏸️ Using the Eleven Labs API](./docs/ElevenLabs.md)
 - [π Core Use Cases](./docs/Usecases.md)
 - [π€π€π€ Getting Started with multiple devices](./docs/MultipleDevices.md)
@@ -311,7 +311,7 @@ lib_deps =
 2. Adding Arduino IDE support
 3. Add Hume API client for emotion detection
 4. Add MCP support on Deno Edge
-5. Plug in ElevenLabs API for voice generation
+5. Plug in Eleven Labs API for voice generation
 6. Add Azure OpenAI Support (easy pickings)
 We welcome contributions
diff --git a/docs/ElevenLabs.md b/docs/ElevenLabs.md
index 1cc2302..59fe872 100644
--- a/docs/ElevenLabs.md
+++ b/docs/ElevenLabs.md
@@ -7,7 +7,7 @@
 ```
 2. **Agent Configuration**
-   - Create an agent in the ElevenLabs dashboard
+   - Create an agent in the Eleven Labs dashboard
    - Copy the agent ID
    - On the Elato UI, Click `+ Create new` and create an Eleven Labs character with a `title` and the `agentId`
@@ -16,16 +16,16 @@
 1. **Connection Flow**:
    - ESP32 connects to your Deno server via WebSocket
    - Server authenticates the user and gets their personality configuration
-   - If provider is "elevenlabs", server requests a signed URL from ElevenLabs API
-   - Server establishes WebSocket connection to ElevenLabs using the signed URL
-   - Server acts as a relay between ESP32 and ElevenLabs
+   - If provider is "elevenlabs", server requests a signed URL from Eleven Labs API
+   - Server establishes WebSocket connection to Eleven Labs using the signed URL
+   - Server acts as a relay between ESP32 and Eleven Labs
 2. **Audio Processing**:
-   - IMPORTANT: In your ElevenLabs Agent Settings > Voice > TTS Output Format > Set this to PCM 24kHz.
+   - IMPORTANT: In your Eleven Labs Agent Settings > Voice > TTS Output Format > Set this to PCM 24kHz.
    - Currently the AI Agent must speak first. You can change this behaviour in `Audio.cpp`.
-   - ESP32 sends PCM16 audio data (binary) to server (If you change this, you will also want to change your ElevenLabs Audio input settings)
-   - Server converts to base64 and forwards to ElevenLabs
-   - ElevenLabs sends back base64 audio data
+   - ESP32 sends PCM16 audio data (binary) to server (If you change this, you will also want to change your Eleven Labs Audio input settings)
+   - Server converts to base64 and forwards to Eleven Labs
+   - Eleven Labs sends back base64 audio data
    - Server converts to PCM16, encodes with Opus, and sends to ESP32
 3. **Message Types**:
@@ -35,16 +35,16 @@
 ## Usage
-Once configured, the ElevenLabs provider works exactly like OpenAI and Gemini. The server will automatically route to the ElevenLabs implementation when the provider is set to "elevenlabs".
+Once configured, the Eleven Labs provider works exactly like OpenAI and Gemini. The server will automatically route to the Eleven Labs implementation when the provider is set to "elevenlabs".
 ## Differences from OpenAI/Gemini
-- **No system prompts**: ElevenLabs agents are configured in their dashboard through your account (and api key via a signedUrl)
-- **Agent-based**: More advanced with workflow handling and tool calling with MCP through the ElevenLabs dashboard
+- **No system prompts**: Eleven Labs agents are configured in their dashboard through your account (and api key via a signedUrl)
+- **Agent-based**: More advanced with workflow handling and tool calling with MCP through the Eleven Labs dashboard
 ## Troubleshooting
-1. **Connection Issues**: Ensure your ElevenLabs API key is valid and has access to Conversational AI
-2. **Agent Not Found**: Verify the agent ID is correct and the agent exists in your ElevenLabs account
+1. **Connection Issues**: Ensure your Eleven Labs API key is valid and has access to Conversational AI
+2. **Agent Not Found**: Verify the agent ID is correct and the agent exists in your Eleven Labs account
 3. **Audio Issues**: Check that the AI is speaking first. And that TTS Output is 24kHz pcm not 16kHz pcm (which is default)
-4. **Transcription Missing**: Ensure your ElevenLabs agent has transcription enabled in the dashboard
\ No newline at end of file
+4. **Transcription Missing**: Ensure your Eleven Labs agent has transcription enabled in the dashboard
\ No newline at end of file
diff --git a/frontend-nextjs/app/components/CreateCharacter/ElevenLabsModal.tsx b/frontend-nextjs/app/components/CreateCharacter/ElevenLabsModal.tsx
index ce603bd..d91eec0 100644
--- a/frontend-nextjs/app/components/CreateCharacter/ElevenLabsModal.tsx
+++ b/frontend-nextjs/app/components/CreateCharacter/ElevenLabsModal.tsx
@@ -84,7 +84,7 @@ export default function ElevenLabsModal({ isOpen, onClose, onSuccess, selectedUs
-  Find this in your ElevenLabs dashboard under your agent settings
+  Find this in your Eleven Labs dashboard under your agent settings
@@ -130,7 +130,7 @@ export default function ElevenLabsModal({ isOpen, onClose, onSuccess, selectedUs