diff --git a/docs/research/acp-deep-dive.md b/docs/research/acp-deep-dive.md
new file mode 100644
index 00000000..bad16629
--- /dev/null
+++ b/docs/research/acp-deep-dive.md
@@ -0,0 +1,847 @@
+# Agent Client Protocol (ACP): Deep Technical Analysis
+
+> Research date: 2026-03-24
+> Context: integrating ACP into Claude Agent Teams UI (Electron 40.x)
+
+---
+
+## 1. What is ACP?
+
+**Agent Client Protocol (ACP)** is an open standard for communication between code editors (IDEs) and AI agents. It was created by Zed Industries and has been supported by JetBrains since October 2025.
+
+**Analogy:** LSP (Language Server Protocol) standardized how language servers integrate with editors. ACP does the same for AI coding agents.
+
+**The problem it solves:**
+- Every editor built a custom integration for every agent (M x N)
+- Agents were tied to specific IDEs
+- ACP reduces M x N to M + N (an agent implements ACP once and works in every IDE)
+
+**License:** Apache 2.0
+**Governance:** Lead Maintainers: Ben Brandt (Zed Industries), Sergey Ignatov (JetBrains)
+
+**Sources:**
+- Specification: https://agentclientprotocol.com/
+- GitHub: https://github.com/agentclientprotocol/agent-client-protocol
+- Zed ACP: https://zed.dev/acp
+
+> **IMPORTANT:** There are THREE different protocols abbreviated as ACP:
+> 1. **Agent Client Protocol** (Zed/JetBrains): editor ↔ agent. **This is our focus.**
+> 2. **Agent Communication Protocol** (IBM BeeAI): agent ↔ agent. Merging into A2A (Linux Foundation). Not relevant.
+> 3. **Agent Connect Protocol** (Agntcy Collective): REST API for remote agents. Not relevant.
+
+---
+
+## 2. Protocol Architecture
+
+### 2.1 Transport
+
+| Mode | Transport | Format | Status |
+|------|-----------|--------|--------|
+| **Local** | stdio (stdin/stdout) | NDJSON (newline-delimited JSON) | Stable |
+| **TCP** | TCP socket (port) | NDJSON | Stable (Copilot CLI: `--acp --port 8080`) |
+| **Remote** | HTTP / WebSocket | JSON-RPC | **Work in progress** |
+
+The primary mode is **JSON-RPC 2.0 over NDJSON via stdio**. The client (IDE) spawns the agent as a subprocess, and the agent's stdin/stdout become the transport.
+
+### 2.2 Message Types
+
+Two types (JSON-RPC 2.0):
+- **Methods**: request-response pairs that expect a result or an error
+- **Notifications**: one-way messages with no response
+
+### 2.3 Lifecycle
+
+```
+Client                                 Agent
+  |                                     |
+  |------- initialize ----------------->|  (protocol version + capabilities)
+  |<------ InitializeResponse ----------|  (agent capabilities)
+  |                                     |
+  |------- authenticate --------------->|  (if required)
+  |<------ AuthenticateResponse --------|
+  |                                     |
+  |------- session/new ---------------->|  (cwd, mcpServers[])
+  |<------ NewSessionResponse ----------|  (sessionId)
+  |                                     |
+  |------- session/prompt ------------->|  (prompt content)
+  |<~~~~~~ session/update (notification)|  (streaming chunks, tool calls, plans)
+  |<~~~~~~ session/update               |
+  |<------ request_permission ----------|  (tool approval)
+  |------- permission response -------->|
+  |<~~~~~~ session/update               |
+  |<------ PromptResponse --------------|  (stopReason)
+  |                                     |
+  |------- session/prompt (next) ------>|
+  |              ...                    |
+```
+
+### 2.4 Session Update Events (streaming)
+
+While a `prompt` is in flight, the agent sends `session/update` notifications:
+
+| Event | Description |
+|-------|-------------|
+| `agent_message_chunk` | Text chunk from the agent (streaming) |
+| `agent_thought_chunk` | Agent thoughts (thinking) |
+| `user_message_chunk` | Echo of the user's input |
+| `tool_call` | New tool call (pending/completed) |
+| `tool_call_update` | Status update for a tool call |
+| `plan` | Action plan with priorities and statuses |
+| `available_commands_update` | Update to the available commands |
+| `config_option_update` | Configuration change |
+| `current_mode_update` | Session mode change |
+| `session_info_update` | Session metadata (title, activity) |
+| `usage_update` | Token consumption (draft) |
+
+### 2.5 Client-Provided Methods
+
+The client (IDE) gives the agent access to:
+
+| Method | Description | Required? |
+|--------|-------------|-----------|
+| `session/request_permission` | Request permission to run a tool | **Required** |
+| `fs/read_text_file` | Read a file | Optional |
+| `fs/write_text_file` | Write a file | Optional |
+| `terminal/create` | Create a terminal | Optional |
+| `terminal/output` | Get terminal output | Optional |
+| `terminal/wait_for_exit` | Wait for the process to exit | Optional |
+| `terminal/kill` | Kill the process | Optional |
+| `terminal/release` | Release resources | Optional |
+
+### 2.6 MCP Integration
+
+ACP reuses JSON representations from MCP where possible. An agent can accept MCP servers when a session is created:
+
+```typescript
+connection.newSession({
+  cwd: '/path/to/project',
+  mcpServers: [
+    { type: 'stdio', command: 'node', args: ['mcp-server.js'] },
+    { type: 'http', url: 'https://mcp.example.com', headers: {} },
+    { type: 'sse', url: 'https://mcp.example.com/sse', headers: {} },
+  ],
+});
+```
+
+---
+
+## 3. TypeScript SDK: API Surface
+
+### 3.1 Package Info
+
+| Property | Value |
+|----------|-------|
+| **npm** | `@agentclientprotocol/sdk` |
+| **Version** | 0.16.1 (March 2026) |
+| **Size** | 863 kB |
+| **Dependencies** | **0** (zero dependencies!) |
+| **License** | Apache-2.0 |
+| **Dependents** | 245+ packages |
+| **GitHub stars** | 122 |
+| **Contributors** | 31 |
+| **Commits** | 544 |
+| **Used by** | 823+ projects |
+
+**Fact:** the package was previously published as `@zed-industries/agent-client-protocol` and later renamed.
+
+### 3.2 Exported Classes (4)
+
+```typescript
+import {
+  ClientSideConnection, // For clients (IDEs): the part we care about
+  AgentSideConnection,  // For agents (server side)
+  TerminalHandle,       // Terminal management
+  RequestError,         // Typed error
+} from '@agentclientprotocol/sdk';
+```
+
+### 3.3 Exported Interfaces (2)
+
+```typescript
+// Return types follow the XRequest -> XResponse naming used by the SDK;
+// check the API reference for the exact unions.
+interface Client {
+  requestPermission(params: RequestPermissionRequest): Promise<RequestPermissionResponse>;
+  sessionUpdate(params: SessionNotification): Promise<void>;
+  writeTextFile?(params: WriteTextFileRequest): Promise<WriteTextFileResponse>;
+  readTextFile?(params: ReadTextFileRequest): Promise<ReadTextFileResponse>;
+  // terminal methods...
+}
+
+interface Agent {
+  initialize(params: InitializeRequest): Promise<InitializeResponse>;
+  newSession(params: NewSessionRequest): Promise<NewSessionResponse>;
+  authenticate?(params: AuthenticateRequest): Promise<AuthenticateResponse>;
+  prompt(params: PromptRequest): Promise<PromptResponse>;
+  cancel?(params: CancelNotification): Promise<void>;
+  setSessionMode?(params: SetSessionModeRequest): Promise<SetSessionModeResponse>;
+  // ...
+}
+```
+
+### 3.4 Exported Functions (1) + Variables (3)
+
+```typescript
+// The only utility function: builds an NDJSON stream
+function ndJsonStream(input: WritableStream<Uint8Array>, output: ReadableStream<Uint8Array>): Stream;
+
+// Constants
+const PROTOCOL_VERSION: number; // Current protocol version (an integer)
+const AGENT_METHODS: string[];  // List of agent methods
+const CLIENT_METHODS: string[]; // List of client methods
+```
+
+### 3.5 Type Aliases (~180+)
+
+Type categories:
+
+- **Content:** `TextContent`, `ImageContent`, `AudioContent`, `Content`, `ContentBlock`, `ContentChunk`
+- **Authentication:** `AuthMethod`, `AuthCapabilities`, `AuthenticateRequest/Response`
+- **Sessions:** `SessionId`, `SessionInfo`, `SessionCapabilities`, `SessionUpdate`, `SessionMode`
+- **Tools:** `ToolCall`, `ToolCallUpdate`, `ToolCallId`, `ToolCallStatus`, `ToolKind`
+- **Permissions:** `RequestPermissionRequest/Response`, `PermissionOption`, `PermissionOptionKind`
+- **Plans:** `Plan`, `PlanEntry`, `PlanEntryStatus`, `PlanEntryPriority`
+- **Diffs:** `Diff` (`path`, `oldText`, `newText`)
+- **File System:** `ReadTextFileRequest/Response`, `WriteTextFileRequest/Response`
+- **Terminals:** `Terminal`, `CreateTerminalRequest/Response`, `TerminalExitStatus`
+- **MCP:** `McpCapabilities`, `McpServer`, `McpServerStdio`, `McpServerHttp`, `McpServerSse`
+- **Protocol:** `InitializeRequest/Response`, `PromptRequest/Response`, `StopReason`, `Cost`, `Usage`
+- **Elicitation (draft):** `ElicitationRequest/Response`, `ElicitationSchema` (agent-driven input forms)
+- **Config:** `SessionConfigOption`, `SessionConfigBoolean`, `SessionConfigSelect`
+- **Models:** `ModelId`, `ModelInfo`
+
+### 3.6 ClientSideConnection: Full API
+
+```typescript
+// Return types follow the XRequest -> XResponse naming used by the SDK;
+// check the API reference for the exact unions.
+class ClientSideConnection {
+  constructor(toClient: (agent: Agent) => Client, stream: Stream);
+
+  // Properties
+  signal: AbortSignal;   // Aborts when connection closes
+  closed: Promise<void>; // Resolves when connection ends
+
+  // Core methods
+  initialize(params: InitializeRequest): Promise<InitializeResponse>;
+  newSession(params: NewSessionRequest): Promise<NewSessionResponse>;
+  prompt(params: PromptRequest): Promise<PromptResponse>;
+  authenticate(params: AuthenticateRequest): Promise<AuthenticateResponse>;
+
+  // Session management
+  loadSession(params: LoadSessionRequest): Promise<LoadSessionResponse>;    // Resume previous
+  listSessions(params: ListSessionsRequest): Promise<ListSessionsResponse>; // List available
+  setSessionMode(params: SetSessionModeRequest): Promise<SetSessionModeResponse>;
+  setSessionConfigOption(params: SetSessionConfigOptionRequest): Promise<SetSessionConfigOptionResponse>;
+
+  // Unstable/experimental
+  unstable_forkSession(params: ForkSessionRequest): Promise<ForkSessionResponse>;
+  unstable_resumeSession(params: ResumeSessionRequest): Promise<ResumeSessionResponse>;
+  unstable_closeSession(params: CloseSessionRequest): Promise<CloseSessionResponse>;
+  unstable_setSessionModel(params: SetSessionModelRequest): Promise<SetSessionModelResponse>;
+  unstable_logout(params: LogoutRequest): Promise<LogoutResponse>;
+
+  // Notifications
+  cancel(params: CancelNotification): Promise<void>; // Cancel ongoing prompt
+
+  // Extensibility
+  extMethod(method: string, params: Record<string, unknown>): Promise<Record<string, unknown>>;
+  extNotification(method: string, params: Record<string, unknown>): Promise<void>;
+}
+```
+
+---
+
+## 4. Which agents support ACP?
+
+### Confirmed (with evidence)
+
+| Agent | ACP support | How it is implemented | Source |
+|-------|-------------|-----------------------|--------|
+| **Gemini CLI** | Native (reference implementation) | Built-in ACP server | [zed.dev/acp](https://zed.dev/acp) |
+| **Claude Code** | Via adapter | `@zed-industries/claude-code-acp` (npm, Apache 2.0) | [GitHub](https://github.com/zed-industries/claude-agent-acp) |
+| **Codex CLI** | Via community adapter | Zed adapter | [zed.dev/docs/ai/external-agents](https://zed.dev/docs/ai/external-agents) |
+| **GitHub Copilot CLI** | Native (public preview) | `copilot --acp` / `copilot --acp --port 8080` | [GitHub Blog](https://github.blog/changelog/2026-01-28-acp-support-in-copilot-cli-is-now-in-public-preview/) |
+| **Goose** (Block) | Native | Built-in ACP server | [goose blog](https://block.github.io/goose/blog/2025/10/24/intro-to-agent-client-protocol-acp/) |
+| **Junie** (JetBrains) | Native | Built into JetBrains AI Assistant | [JetBrains](https://www.jetbrains.com/help/ai-assistant/acp.html) |
+| **Cline** | Native | Built-in ACP server | [DeepWiki](https://deepwiki.com/cline/cline/12.5-agent-client-protocol-(acp)) |
+| **Kiro CLI** | Native | Built-in ACP server | [Kiro docs](https://kiro.dev/docs/cli/acp/) |
+| **OpenCode** | Native | Built-in ACP server | [opencode.ai](https://opencode.ai/docs/acp/) |
+| **Augment Code** | Native | ACP Registry | [Registry](https://agentclientprotocol.com/registry) |
+| **Qwen Code** | Native | ACP Registry | VS Code ACP Client |
+
+**Claude Code does NOT have a native `--acp` flag** (there is an open [Feature Request #6686](https://github.com/anthropics/claude-code/issues/6686)). It works through the `@zed-industries/claude-code-acp` adapter, which uses the Claude Agent SDK.
+
+### IDEs/clients with ACP support
+
+| Client | Status |
+|--------|--------|
+| **Zed** | Native (creators of the protocol) |
+| **JetBrains** (IntelliJ, PyCharm, etc.) | Native (co-maintainer) |
+| **Neovim** | Via plugins (CodeCompanion, avante.nvim) |
+| **Emacs** | Community extensions |
+| **Marimo** (Python notebooks) | Built-in |
+| **VS Code** | **NO** (a key question for the ecosystem) |
+| **Cursor** | **NO** (may appear if there is demand) |
+
+---
+
+## 5. Concrete code example (from the SDK)
+
+### Client (IDE side)
+
+```typescript
+import { spawn } from 'node:child_process';
+import * as fs from 'node:fs';
+import { Writable, Readable } from 'node:stream';
+import * as acp from '@agentclientprotocol/sdk';
+
+class MyClient implements acp.Client {
+  async requestPermission(params: acp.RequestPermissionRequest): Promise<acp.RequestPermissionResponse> {
+    // A real UI would show a dialog with params.options; this sketch picks the first one
+    return {
+      outcome: { outcome: 'selected', optionId: params.options[0].optionId },
+    };
+  }
+
+  async sessionUpdate(params: acp.SessionNotification): Promise<void> {
+    const update = params.update;
+    switch (update.sessionUpdate) {
+      case 'agent_message_chunk':
+        if (update.content.type === 'text') {
+          console.log(update.content.text); // Streaming text
+        }
+        break;
+      case 'tool_call':
+        console.log(`Tool: ${update.title} (${update.status})`);
+        break;
+      case 'tool_call_update':
+        console.log(`Tool ${update.toolCallId}: ${update.status}`);
+        break;
+      case 'plan':
+        // Plan entries with status/priority
+        break;
+    }
+  }
+
+  async readTextFile(params: acp.ReadTextFileRequest): Promise<acp.ReadTextFileResponse> {
+    const content = fs.readFileSync(params.path, 'utf-8');
+    return { content };
+  }
+
+  async writeTextFile(params: acp.WriteTextFileRequest): Promise<acp.WriteTextFileResponse> {
+    fs.writeFileSync(params.path, params.content);
+    return {};
+  }
+}
+
+async function main() {
+  // Spawn an agent process. Claude Code has no native --acp flag (see section 4),
+  // so this uses Copilot CLI, which the table above lists as supporting `copilot --acp`.
+  const agentProcess = spawn('copilot', ['--acp'], {
+    stdio: ['pipe', 'pipe', 'inherit'],
+  });
+
+  // Create NDJSON stream over stdio
+  const input = Writable.toWeb(agentProcess.stdin!);
+  const output = Readable.toWeb(agentProcess.stdout!) as ReadableStream<Uint8Array>;
+  const stream = acp.ndJsonStream(input, output);
+
+  // Create connection
+  const client = new MyClient();
+  const connection = new acp.ClientSideConnection((_agent) => client, stream);
+
+  // Initialize
+  const initResult = await connection.initialize({
+    protocolVersion: acp.PROTOCOL_VERSION,
+    clientCapabilities: {
+      fs: { readTextFile: true, writeTextFile: true },
+      terminal: { create: true, output: true, kill: true },
+    },
+  });
+
+  // Create session
+  const session = await connection.newSession({
+    cwd: '/path/to/project',
+    mcpServers: [],
+  });
+
+  // Send prompt (blocks until agent completes turn)
+  const result = await connection.prompt({
+    sessionId: session.sessionId,
+    prompt: [{ type: 'text', text: 'Fix the bug in main.ts' }],
+  });
+
+  console.log(`Stop reason: ${result.stopReason}`); // 'end_turn' | 'cancelled' | ...
+}
+
+main().catch(console.error);
+```
+
+### Process spawning
+
+**The ACP SDK does NOT manage process spawning.** That is the client's responsibility. The SDK only handles the protocol on top of an already established stream (stdin/stdout).
+
+```typescript
+// ACP SDK expects web streams:
+const input = Writable.toWeb(childProcess.stdin!);   // WritableStream
+const output = Readable.toWeb(childProcess.stdout!); // ReadableStream
+const stream = acp.ndJsonStream(input, output);      // ACP Stream
+```
+
+---
+
+## 6. ACP vs MCP: Differences
+
+| Aspect | MCP (Model Context Protocol) | ACP (Agent Client Protocol) |
+|--------|-----|-----|
+| **Creator** | Anthropic | Zed Industries + JetBrains |
+| **Focus** | Tools/data for the model | IDE ↔ agent communication |
+| **Analogy** | "Give a person better tools" | "Assemble a team of people" |
+| **Relationship** | **What** (access to data/tools) | **Where** (where the agent lives in the workflow) |
+| **Protocol** | JSON-RPC 2.0 over stdio/SSE/HTTP | JSON-RPC 2.0 over NDJSON stdio |
+| **Content types** | Tools, Resources, Prompts | Messages, Tool Calls, Plans, Diffs, Permissions |
+| **State** | Stateless at the protocol level | Stateful (sessions, message history) |
+| **Sessions** | No (transport sessions only) | Yes (conversation sessions with IDs) |
+| **Streaming** | Via SSE or notifications | `session/update` notifications |
+
+**Key point:** ACP and MCP are complementary. An ACP session can accept MCP servers (`mcpServers` in `newSession`). The agent uses MCP to access tools and ACP to talk to the IDE.
+
+---
+
+## 7. Maturity and stability
+
+### Versioning
+
+The SDK is at v0.16.1 (March 2026), which is still **pre-1.0**, and it exposes many `unstable_` methods.
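
Because `unstable_`-prefixed methods may be renamed or removed between pre-1.0 releases, a client could guard calls to them at runtime. The following is only a minimal sketch under stated assumptions: `callIfAvailable` is a hypothetical helper (not part of the SDK), and `fakeConnection` is a stand-in object rather than a real `ClientSideConnection`.

```typescript
// Hypothetical helper, not part of the ACP SDK: call a method only if the
// target object actually exposes it, so a renamed or removed `unstable_`
// method yields `undefined` instead of throwing a TypeError.
type AnyMethod = (params: unknown) => Promise<unknown>;

async function callIfAvailable(
  target: object,
  method: string,
  params: unknown,
): Promise<unknown> {
  const candidate = (target as Record<string, unknown>)[method];
  if (typeof candidate !== 'function') {
    return undefined; // Method is missing in this SDK version
  }
  return (candidate as AnyMethod).call(target, params);
}

// Stand-in for a connection; a real one would be an acp.ClientSideConnection.
const fakeConnection = {
  async unstable_closeSession(params: unknown): Promise<unknown> {
    return { closed: true, params };
  },
};

async function demo(): Promise<[unknown, unknown]> {
  const present = await callIfAvailable(fakeConnection, 'unstable_closeSession', { sessionId: 's1' });
  const missing = await callIfAvailable(fakeConnection, 'unstable_forkSession', { sessionId: 's1' });
  return [present, missing];
}
```

Returning `undefined` keeps the caller in control of the fallback (for example, killing the process when a graceful close is unavailable); whether that trade-off suits our code is a design choice, not something the SDK prescribes.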
+
+### Timeline of key events
+
+| Date | Event |
+|------|-------|
+| September 2025 | Zed announces ACP |
+| October 2025 | JetBrains joins |
+| October 2025 | Gemini CLI: first integration |
+| January 2026 | Copilot CLI ACP public preview |
+| January 2026 | ACP Registry launched |
+| February 2026 | Session Config Options stabilized |
+| March 2026 | session/list + session_info_update stabilized |
+| March 2026 | SDK v0.16.1 |
+
+### Still in draft (not yet stabilized)
+
+- `session/close`: closing sessions
+- `session/fork`: forking sessions
+- `session/resume`: resuming sessions
+- Elicitation: agent-driven input forms
+- Usage updates: token statistics
+- Message IDs: message identifiers
+- Delete in Diff: deleting files through a diff
+- Next Edit Suggestions: suggestions for the next edits
+
+### Breaking Changes
+
+The protocol is at 0.x, so breaking changes are possible between minor versions. The package rename `@zed-industries/agent-client-protocol` → `@agentclientprotocol/sdk` has already happened.
+
+---
+
+## 8. Analysis of integration into our Electron app
+
+### 8.1 Current architecture (how we work today)
+
+Our communication stack with Claude Code:
+
+```
+Electron Main Process
+  └── TeamProvisioningService
+        ├── spawnCli() → ChildProcess (stream-json)
+        ├── stdin.write(NDJSON) → Claude CLI
+        ├── stdout → parse NDJSON lines
+        │     ├── type: "user" / "assistant" / "result" / "system"
+        │     ├── type: "control_request" (tool approval)
+        │     └── result.success → turn complete
+        └── stderr → logs, error detection
+```
+
+**Key CLI arguments:**
+```
+--input-format stream-json --output-format stream-json
+```
+
+**Our handling:**
+- `HANDLED_STREAM_JSON_TYPES = ['user', 'assistant', 'control_request', 'result', 'system']`
+- `stdin.write(message + '\n')`: sending
+- Manual NDJSON parsing with a carry buffer for incomplete lines
+- `control_request` → UI dialog for tool approval
+- `result.success` → turn complete, process alive
+- SIGKILL to stop (SIGTERM triggers cleanup)
+
+### 8.2 What ACP would replace
+
+| Component | Today (stream-json) | With ACP |
+|-----------|---------------------|----------|
+| **Spawn** | `spawnCli()` | Our `spawnCli()` stays |
+| **Transport** | Manual NDJSON parsing with a carry buffer | `acp.ndJsonStream()` + `ClientSideConnection` |
+| **Initialize** | None (we just send the prompt) | `connection.initialize()`: capabilities negotiation |
+| **Session** | None (implicit) | `connection.newSession()`: explicit session ID |
+| **Prompt** | `stdin.write(JSON.stringify({type:'user',...}) + '\n')` | `connection.prompt({sessionId, prompt})` |
+| **Streaming** | Manual parsing of stdout lines | `sessionUpdate()` callback with typed events |
+| **Tool approval** | `control_request` parsing | `requestPermission()` callback |
+| **File ops** | None (the agent does it itself) | `readTextFile()` / `writeTextFile()` callbacks |
+| **Terminal** | None | `terminal/*` callbacks |
+| **Cancel** | SIGKILL | `connection.cancel()` (graceful) |
+
+### 8.3 What ACP does NOT solve (we still need it)
+
+1. **Agent Teams orchestration**: ACP is one-agent ↔ one-client. Team orchestration, TaskCreate, SendMessage, and TeamCreate are all our domain logic on top of a CLI-specific protocol.
+
+2. **Claude Code's stream-json specifics**: Claude Code does NOT support `--acp` natively. It uses `--input-format stream-json --output-format stream-json`. ACP requires an adapter (`@zed-industries/claude-code-acp`), which uses the Claude Agent SDK internally.
+
+3. **Team file monitoring**: our `TeamConfigReader`, `TeamTaskReader`, and `TeamInboxReader` watch files on disk. ACP has no concept of teams or tasks.
+
+4. **Cross-team communication**: our `cross_team_send`, inbox relay, and sentinel messages are all specific to our architecture.
+
+5. **Post-compact context recovery**: our `pendingPostCompactReminder` and context reinjection are domain-specific.
+
+6. **Member spawn management**: tracking `MemberSpawnStatus`, reconnect, and stall detection is our code.
+
+7. **MCP config building**: `TeamMcpConfigBuilder` is our code for assembling MCP configs.
+
+8. **Tool approval auto-resolve**: `shouldAutoAllow()` and custom rules are our logic.
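
To illustrate item 8, auto-approval rules could be layered on top of an ACP permission request by picking one of the options the agent offers. This is a minimal sketch under stated assumptions: the local `PermissionOption` type mirrors only the `optionId` and `kind` fields, the `ask_user` outcome is an invented marker for "fall back to the UI dialog", and this `shouldAutoAllow` is a hypothetical stand-in whose signature and tool names are illustrative, not our real implementation.

```typescript
// Simplified local types for illustration; the real SDK types carry more fields.
type PermissionOptionKind = 'allow_once' | 'allow_always' | 'reject_once' | 'reject_always';

interface PermissionOption {
  optionId: string;
  kind: PermissionOptionKind;
}

type Outcome =
  | { outcome: 'selected'; optionId: string }
  | { outcome: 'ask_user' }; // Invented marker: defer to the UI dialog

// Hypothetical stand-in for our rule function; the real signature may differ.
function shouldAutoAllow(toolName: string): boolean {
  const readOnlyTools = new Set(['Read', 'Glob', 'Grep']); // illustrative names
  return readOnlyTools.has(toolName);
}

// Auto-select an `allow_once` option when the rules permit it; otherwise
// leave the decision to the user.
function resolvePermission(toolName: string, options: PermissionOption[]): Outcome {
  if (!shouldAutoAllow(toolName)) {
    return { outcome: 'ask_user' };
  }
  const allowOnce = options.find((option) => option.kind === 'allow_once');
  return allowOnce
    ? { outcome: 'selected', optionId: allowOnce.optionId }
    : { outcome: 'ask_user' };
}
```

Preferring `allow_once` over `allow_always` keeps the rule engine, rather than the agent, as the source of truth for standing approvals; that is one possible design, not a requirement of the protocol.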
+
+### 8.4 Hypothetical integration (pseudocode)
+
+```typescript
+// === OPTION A: ACP for a new multi-agent client ===
+// Only applicable if Claude Code supported --acp natively
+
+import { Readable, Writable } from 'node:stream';
+import * as acp from '@agentclientprotocol/sdk';
+import { spawnCli } from '@main/utils/childProcess';
+
+class TeamAgentClient implements acp.Client {
+  constructor(
+    private teamName: string,
+    private memberName: string,
+    private onUpdate: (event: SessionUpdate) => void,
+    // ToolApprovalDecision is a placeholder name for our approval result type
+    private onPermission: (request: ToolApprovalRequest) => Promise<ToolApprovalDecision>,
+  ) {}
+
+  async requestPermission(params: acp.RequestPermissionRequest) {
+    // Proxy into our UI through the existing tool approval flow
+    const approval = await this.onPermission(mapToOurFormat(params));
+    return mapToAcpResponse(approval);
+  }
+
+  async sessionUpdate(params: acp.SessionNotification) {
+    // Map ACP events to our TeamChangeEvents
+    const update = params.update;
+    switch (update.sessionUpdate) {
+      case 'agent_message_chunk':
+        if (update.content.type === 'text') {
+          this.onUpdate({ type: 'agent-text', text: update.content.text });
+        }
+        break;
+      case 'tool_call':
+        this.onUpdate({ type: 'tool-call', ...mapToolCall(update) });
+        break;
+      case 'plan':
+        this.onUpdate({ type: 'plan-update', entries: update.entries });
+        break;
+    }
+  }
+}
+
+// teamName, memberName, onUpdate, onPermission, mcpBuilder, mcpServerPath and
+// provisioningPrompt are assumed to be in scope; this is pseudocode.
+async function spawnAgentWithAcp(claudePath: string, args: string[], cwd: string) {
+  // 1. Spawn process (our existing code)
+  const child = spawnCli(claudePath, ['--acp', ...args], { cwd, stdio: ['pipe', 'pipe', 'pipe'] });
+
+  // 2. Create ACP connection (replaces all manual NDJSON parsing)
+  const input = Writable.toWeb(child.stdin!);
+  const output = Readable.toWeb(child.stdout!) as ReadableStream<Uint8Array>;
+  const stream = acp.ndJsonStream(input, output);
+
+  const client = new TeamAgentClient(teamName, memberName, onUpdate, onPermission);
+  const connection = new acp.ClientSideConnection((_agent) => client, stream);
+
+  // 3. Initialize
+  const initResult = await connection.initialize({
+    protocolVersion: acp.PROTOCOL_VERSION,
+    clientCapabilities: {
+      fs: { readTextFile: true, writeTextFile: true },
+      terminal: { create: true, output: true, kill: true },
+    },
+  });
+
+  // 4. Create session with MCP servers
+  const mcpConfigPath = await mcpBuilder.writeConfigFile();
+  const session = await connection.newSession({
+    cwd,
+    mcpServers: [
+      { type: 'stdio', command: 'node', args: [mcpServerPath] },
+    ],
+  });
+
+  // 5. Send prompt
+  const result = await connection.prompt({
+    sessionId: session.sessionId,
+    prompt: [{ type: 'text', text: provisioningPrompt }],
+  });
+
+  // 6. Graceful cancel instead of SIGKILL (when a turn has to be interrupted)
+  await connection.cancel({ sessionId: session.sessionId });
+
+  return { connection, session, child };
+}
+```
+
+```typescript
+// === OPTION B: ACP as an additional protocol (realistic) ===
+// Claude Code -> stream-json (as today)
+// Other agents (Gemini, Codex, Copilot) -> ACP
+// Our app supports BOTH protocols
+
+interface AgentConnection {
+  sendPrompt(text: string): Promise<void>;
+  onMessage(callback: (msg: AgentMessage) => void): void;
+  cancel(): Promise<void>;
+  close(): void;
+}
+
+class StreamJsonConnection implements AgentConnection {
+  // Existing code from TeamProvisioningService
+  // Claude Code's stream-json protocol
+}
+
+class AcpConnection implements AgentConnection {
+  private connection: acp.ClientSideConnection;
+  private sessionId: string;
+
+  constructor(connection: acp.ClientSideConnection, sessionId: string) {
+    this.connection = connection;
+    this.sessionId = sessionId;
+  }
+
+  async sendPrompt(text: string) {
+    await this.connection.prompt({
+      sessionId: this.sessionId,
+      prompt: [{ type: 'text', text }],
+    });
+  }
+
+  async cancel() {
+    await this.connection.cancel({ sessionId: this.sessionId });
+  }
+
+  // onMessage() and close() are omitted for brevity
+}
+
+async function createAgentConnection(
+  agent: AgentType,
+  child: ChildProcess,
+  cwd: string,
+): Promise<AgentConnection> {
+  if (agent === 'claude-code') {
+    return new StreamJsonConnection(child); // As today
+  }
+  // Gemini CLI, Codex CLI, Copilot CLI, and others
+  const stream = acp.ndJsonStream(
+    Writable.toWeb(child.stdin!),
+    Readable.toWeb(child.stdout!) as ReadableStream<Uint8Array>,
+  );
+  const conn = new acp.ClientSideConnection((_agent) => new AcpClient(), stream);
+  // The session ID only exists after the initialize + session/new handshake
+  await conn.initialize({ protocolVersion: acp.PROTOCOL_VERSION, clientCapabilities: {} });
+  const { sessionId } = await conn.newSession({ cwd, mcpServers: [] });
+  return new AcpConnection(conn, sessionId);
+}
+```
+
+### 8.5 Key technical integration problems
+
+#### Problem 1: Web Streams vs Node Streams
+The ACP SDK uses the Web Streams API (`ReadableStream`, `WritableStream`). Node.js child_process returns Node streams, so a conversion is needed:
+```typescript
+const input = Writable.toWeb(child.stdin!);   // node:stream → web stream
+const output = Readable.toWeb(child.stdout!); // node:stream → web stream
+```
+In Electron 40.x (Node 22+) these conversions are available natively.
+
+#### Problem 2: Claude Code does not support ACP
+Claude Code uses `stream-json`, not ACP. ACP requires the `@zed-industries/claude-code-acp` adapter (which in turn uses the Claude Agent SDK, a separate npm package that needs an Anthropic API key).
+
+**Our current approach (direct CLI)** does not require an API key; it uses the user's auth token. The `claude-code-acp` adapter requires `ANTHROPIC_API_KEY`, which makes it impractical for our zero-config approach.
+
+#### Problem 3: Blocking prompt()
+`connection.prompt()` does not resolve until the turn completes. Streaming arrives through callbacks (`sessionUpdate`). This differs from our approach, where we parse stdout line by line.
+
+#### Problem 4: Team orchestration
+ACP is 1:1 (one client, one agent). We have N agents in a team. Each agent is a separate ACP connection, and coordination between them is entirely our code.
+
+---
+
+## 9. 
What code we KEEP vs. what ACP replaces
+
+### We keep (our domain logic):
+
+| File/Module | Reason |
+|-------------|--------|
+| `TeamProvisioningService.ts` (80%) | Team orchestration, member management, task tracking |
+| `TeamConfigReader.ts` | File-based team config monitoring |
+| `TeamTaskReader.ts` | File-based task monitoring |
+| `TeamInboxReader.ts` | File-based inbox monitoring |
+| `TeamMcpConfigBuilder.ts` | MCP config generation |
+| `TeamMembersMetaStore.ts` | Member metadata |
+| `TeamSentMessagesStore.ts` | Sent messages tracking |
+| `ClaudeBinaryResolver.ts` | CLI binary resolution |
+| `childProcess.ts` | Process spawning (spawnCli, killProcessTree) |
+| `toolApprovalRules.ts` | Auto-approval logic |
+| `actionModeInstructions.ts` | Agent instructions |
+| Cross-team communication | Inbox relay, sentinel messages |
+| Post-compact recovery | Context reinjection |
+| Stall detection | Watchdog timers |
+| Auth retry | Re-spawn on auth failure |
+
+### ACP replaces (if Claude Code supported it):
+
+| Component | Lines of code | Replaced by |
+|-----------|---------------|-------------|
+| NDJSON stdout parsing | ~200 LOC | `ndJsonStream()` + `ClientSideConnection` |
+| Carry buffer logic | ~50 LOC | Handled automatically by the SDK |
+| Message type dispatching | ~150 LOC | Typed `sessionUpdate()` callback |
+| Tool approval protocol | ~100 LOC | `requestPermission()` callback |
+| Session init handshake | ~30 LOC | `initialize()` + `newSession()` |
+| **Total** | **~530 LOC** | Typed SDK |
+
+**Of the ~6000 LOC in TeamProvisioningService**, ACP replaces ~530 LOC (less than 9%). The remaining 91% is domain-specific orchestration.
+
+---
+
+## 10. Honest assessments
+
+### Integration complexity: 6/10 (Confidence: 8/10)
+
+- The SDK itself is simple (0 dependencies, clean API)
+- The problem: Claude Code does not support ACP natively
+- A mapping between ACP events and our internal types is needed
+- Web Streams conversion in Electron is trivial
+- The main difficulty is supporting two protocols (stream-json + ACP)
+
+### Usefulness for our case: 4/10 (Confidence: 9/10)
+
+- Our primary agent (Claude Code) does NOT support ACP
+- 91% of our code is domain-specific and untouched by ACP
+- The upside: if we want to support OTHER agents (Gemini, Codex, Copilot), ACP becomes very useful (8/10)
+- For Claude Code only, it is pointless: we already talk to it directly over stream-json
+
+### Maturity/stability: 5/10 (Confidence: 7/10)
+
+- Pre-1.0 (v0.16.1)
+- Many `unstable_` methods
+- Breaking changes between minor versions are possible
+- BUT: 31 contributors, 544 commits, JetBrains + Zed backing
+- Active development at a fast pace (18 npm versions)
+- Usage updates and session management are still in draft
+
+### Adoption risk: 5/10 (Confidence: 7/10)
+
+- Zero dependencies, so it is safe for bundle size
+- Pre-1.0, so the API may change
+- Claude Code may get native ACP support in the future (a Feature Request exists)
+- VS Code does not support ACP, which is a risk for the whole ecosystem
+- JetBrains backing is a strong stability signal
+
+---
+
+## 11. Recommendation
+
+### WAIT: do not integrate now. Observe.
+
+**Reliability of the decision: 8/10. Confidence in the recommendation: 9/10.**
+
+**Why WAIT rather than ADOPT:**
+
+1. **Claude Code is our primary agent and it does NOT speak ACP.** Until Anthropic adds an `--acp` flag (or replaces `stream-json` with ACP), integrating ACP adds no value for Claude Code.
+
+2. **We would replace less than 9% of the code.** The ROI does not justify the migration effort plus maintaining two protocols.
+
+3. **Pre-1.0 API.** Breaking changes are real. It is better to wait for stabilization.
+
+**When to ADOPT:**
+
+1. **Claude Code gets native ACP support**: then we can migrate stream-json → ACP and simplify parsing.
+
+2. **We decide to support multi-agent (Gemini + Codex + Claude)**: then ACP becomes the single protocol for non-Claude agents. Architecture: stream-json for Claude, ACP for the rest, and a shared `AgentConnection` interface.
+
+3. **ACP reaches 1.0**: a stable API that is worth investing in.
+
+**What to do right now:**
+
+1. Watch [Feature Request #6686](https://github.com/anthropics/claude-code/issues/6686) (Claude Code ACP support)
+2. Watch [ACP Updates](https://agentclientprotocol.com/updates) (protocol evolution)
+3. Design an `AgentConnection` abstraction in our code so that stream-json and ACP can be interchangeable in the future
+4. If we decide to support Gemini/Codex, start with ACP as the protocol for them
+
+---
+
+## Appendix A: Full Architecture of the ACP Protocol Schema
+
+### Error Codes (JSON-RPC)
+
+| Code | Meaning |
+|------|---------|
+| -32700 | Parse error |
+| -32600 | Invalid request |
+| -32601 | Method not found |
+| -32602 | Invalid params |
+| -32603 | Internal error |
+| -32000 | Authentication required |
+| -32002 | Resource not found |
+
+### Permission Option Kinds
+
+| Kind | Description |
+|------|-------------|
+| `allow_once` | Allow once |
+| `allow_always` | Always allow |
+| `reject_once` | Reject once |
+| `reject_always` | Always reject |
+
+### Stop Reasons
+
+| Reason | Description |
+|--------|-------------|
+| `end_turn` | The agent finished the turn normally |
+| `cancelled` | The user cancelled |
+| `max_tokens` | Token limit reached |
+| `tool_use` | The agent is waiting for a tool result (rare in ACP) |
+
+### Tool Call Kinds
+
+| Kind | Description |
+|------|-------------|
+| `read` | Reading (file, search) |
+| `edit` | Editing a file |
+| `command` | Running a command |
+| `tool` | Calling an MCP tool |
+
+### Diff Format
+
+```json
+{
+  "path": "/absolute/path/to/file.ts",
+  "oldText": "original content (null for new files)",
+  "newText": "modified content"
+}
+```
+
+---
+
+## Appendix B: Links
+
+### Specification and documentation
+- [ACP Introduction](https://agentclientprotocol.com/get-started/introduction)
+- [ACP Protocol Overview](https://agentclientprotocol.com/protocol/overview)
+- [ACP Schema](https://agentclientprotocol.com/protocol/schema)
+- [ACP Updates](https://agentclientprotocol.com/updates)
+- [ACP Registry](https://agentclientprotocol.com/registry)
+
+### SDK
+- [npm: @agentclientprotocol/sdk](https://www.npmjs.com/package/@agentclientprotocol/sdk)
+- [GitHub: typescript-sdk](https://github.com/agentclientprotocol/typescript-sdk)
+- [API Reference](https://agentclientprotocol.github.io/typescript-sdk/)
+- [SDK Examples](https://github.com/agentclientprotocol/typescript-sdk/tree/main/src/examples)
+
+### Claude Code ACP
+- [Feature Request #6686](https://github.com/anthropics/claude-code/issues/6686)
+- [Zed Claude Code ACP Adapter](https://github.com/zed-industries/claude-agent-acp)
+- [Zed Blog: Claude Code via ACP](https://zed.dev/blog/claude-code-via-acp)
+
+### Ecosystem
+- [Zed ACP](https://zed.dev/acp)
+- [JetBrains ACP](https://www.jetbrains.com/acp/)
+- [JetBrains ACP Docs](https://www.jetbrains.com/help/ai-assistant/acp.html)
+- [GitHub Copilot ACP](https://github.blog/changelog/2026-01-28-acp-support-in-copilot-cli-is-now-in-public-preview/)
+- [Goose ACP](https://block.github.io/goose/blog/2025/10/24/intro-to-agent-client-protocol-acp/)
+- [Kiro CLI ACP](https://kiro.dev/docs/cli/acp/)
+- [OpenCode ACP](https://opencode.ai/docs/acp/)
diff --git a/docs/research/ai-agent-protocols-and-routing.md b/docs/research/ai-agent-protocols-and-routing.md
new file mode 100644
index 00000000..1365dc84
--- /dev/null
+++ b/docs/research/ai-agent-protocols-and-routing.md
@@ -0,0 +1,782 @@
+# AI Agent Orchestration Landscape: Protocols, Routing & Desktop Tools
+
+**Date:** March 24, 2026
+**Status:** Research snapshot (rapidly evolving landscape)
+
+---
+
+## Table of Contents
+
+1. [Executive Summary](#executive-summary)
+2. [Protocol-Level Standards](#1-protocol-level-standards)
+   - [MCP (Model Context Protocol)](#11-mcp--model-context-protocol)
+   - [A2A (Agent2Agent Protocol)](#12-a2a--agent2agent-protocol)
+   - [ACP (Agent Communication Protocol)](#13-acp--agent-communication-protocol)
+   - [AGENTS.md](#14-agentsmd)
+   - [Protocol Layer Summary](#15-protocol-layer-summary)
+3. [Governance: Agentic AI Foundation (AAIF)](#2-governance-agentic-ai-foundation-aaif)
+4. [Multi-Model Routing & Proxy Tools](#3-multi-model-routing--proxy-tools)
+   - [LiteLLM](#31-litellm)
+   - [OpenRouter](#32-openrouter)
+5. [Agent Orchestration Frameworks](#4-agent-orchestration-frameworks)
+   - [LangGraph](#41-langgraph)
+   - [CrewAI](#42-crewai)
+   - [AutoGen / Microsoft Agent Framework](#43-autogen--microsoft-agent-framework)
+   - [OpenAI Agents SDK](#44-openai-agents-sdk)
+   - [Google Agent Development Kit (ADK)](#45-google-agent-development-kit-adk)
+   - [AWS Strands Agents](#46-aws-strands-agents)
+   - [OpenAgents](#47-openagents)
+   - [GitAgent](#48-gitagent)
+   - [Goose (Block)](#49-goose-block)
+   - [Framework Comparison Table](#410-framework-comparison-table)
+6. [Desktop/Local Orchestration Tools](#5-desktoplocal-orchestration-tools)
+   - [VS Code Multi-Agent Hub](#51-vs-code-multi-agent-hub)
+   - [Augment Code Intent](#52-augment-code-intent)
+   - [OpenAI Codex Desktop App](#53-openai-codex-desktop-app)
+7. [Relevance for Claude Agent Teams UI](#6-relevance-for-claude-agent-teams-ui)
+8. [Sources](#sources)
+
+---
+
+## Executive Summary
+
+As of March 2026, the AI agent ecosystem has consolidated around three complementary protocol layers:
+
+| Layer | Protocol | Purpose | Governance |
+|-------|----------|---------|------------|
+| **Agent-to-Tool** | MCP | Connect agents to tools/data | AAIF (Linux Foundation) |
+| **Agent-to-Agent** | A2A | Agents discover/communicate with each other | Linux Foundation |
+| **Agent Config** | AGENTS.md | Project-level agent instructions | AAIF (Linux Foundation) |
+
+All three are open-source, vendor-neutral, and governed by the Linux Foundation. The Agentic AI Foundation (AAIF), co-founded by Anthropic, OpenAI, and Block in December 2025, is the umbrella organization.
+
+Key numbers:
+- **MCP:** 97M monthly SDK downloads, 10,000+ servers, 300+ clients
+- **A2A:** 22.7K GitHub stars, 150+ supporting organizations, v0.3 released
+- **AGENTS.md:** Adopted by 60,000+ open-source projects, supported by all major coding agents except Claude Code
+
+The framework landscape is fragmenting into three tiers:
+1. **Cloud-vendor SDKs** (OpenAI Agents SDK, Google ADK, AWS Strands, Microsoft Agent Framework) -- production-grade, tied to ecosystems
+2. **Independent frameworks** (LangGraph, CrewAI, OpenAgents) -- model-agnostic, community-driven
+3. **Portability layers** (GitAgent, MCP, A2A) -- cross-framework interop
+
+Desktop orchestration is emerging as a new category, with VS Code, Augment Intent, and OpenAI Codex App leading the charge.
+
+---
+
+## 1. 
Protocol-Level Standards + +### 1.1 MCP -- Model Context Protocol + +| Field | Value | +|-------|-------| +| **URL** | [modelcontextprotocol.io](https://modelcontextprotocol.io/) | +| **GitHub** | [modelcontextprotocol](https://github.com/modelcontextprotocol) | +| **Created by** | Anthropic (November 2024) | +| **Governance** | AAIF / Linux Foundation (donated December 2025) | +| **License** | Apache 2.0 | +| **Maturity** | Production -- spec version 2025-11-25 | +| **Adoption** | 97M monthly SDK downloads, 10,000+ servers, 300+ clients | +| **Reliability** | 9/10 | +| **Confidence** | 9/10 | + +**What it enables:** Standardized agent-to-tool communication. Any AI model can connect to any data source or tool through a universal interface (tools, resources, prompts). Often compared to "USB-C for AI." + +**Key facts:** +- Adopted by every major AI platform: Claude, ChatGPT, Cursor, Gemini, Microsoft Copilot, VS Code +- OpenAI adopted MCP across its products in March 2025 +- 2026 roadmap focuses on: transport scalability (remote servers), agent communication upgrades (chunked messages, multipart streams), enterprise readiness (audit trails, SSO) +- Security concerns: prompt injection, tool poisoning, cross-server shadowing identified in April 2025 analysis + +**Relation to A2A:** MCP handles agent-to-tool connections. A2A handles agent-to-agent. Complementary, not competing. A common production pattern: MCP for tool connections + A2A for agent coordination. 
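
To make the agent-to-tool layer concrete: MCP messages are JSON-RPC 2.0, and a tool invocation is a `tools/call` request carrying the tool name and its arguments. The sketch below only builds such a request. The tool name `search_issues` and its arguments are hypothetical, and the transport (stdio or HTTP) is deliberately left out.

```typescript
// Minimal JSON-RPC 2.0 request shape, as used by MCP for tool invocation.
interface JsonRpcRequest<P> {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params: P;
}

interface ToolCallParams {
  name: string; // tool name as advertised by the server
  arguments: Record<string, unknown>; // tool-specific input
}

let nextId = 1;

// Build a `tools/call` request; sending it over a transport is not shown.
function buildToolCall(
  name: string,
  args: Record<string, unknown>
): JsonRpcRequest<ToolCallParams> {
  return {
    jsonrpc: "2.0",
    id: nextId++,
    method: "tools/call",
    params: { name: name, arguments: args },
  };
}

// Hypothetical tool and arguments, for illustration only.
const toolRequest = buildToolCall("search_issues", { query: "label:bug", limit: 5 });
console.log(JSON.stringify(toolRequest));
```

An A2A exchange would wrap a different payload (tasks rather than tool calls), which is why the two protocols can coexist in one agent.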
+
+> Source: [A Year of MCP (Pento)](https://www.pento.ai/blog/a-year-of-mcp-2025-review), [The 2026 MCP Roadmap](http://blog.modelcontextprotocol.io/posts/2026-mcp-roadmap/), [MCP Wikipedia](https://en.wikipedia.org/wiki/Model_Context_Protocol), [MCP Specification](https://modelcontextprotocol.io/specification/2025-11-25), [The New Stack - MCP 2026 Roadmap](https://thenewstack.io/model-context-protocol-roadmap-2026/)
+
+---
+
+### 1.2 A2A -- Agent2Agent Protocol
+
+| Field | Value |
+|-------|-------|
+| **URL** | [github.com/a2aproject/A2A](https://github.com/a2aproject/A2A) |
+| **Created by** | Google (April 9, 2025, Cloud Next) |
+| **Governance** | Linux Foundation (June 2025) |
+| **License** | Apache 2.0 |
+| **Version** | 0.3 (July 2025) -- added gRPC, signed security cards |
+| **GitHub Stars** | 22.7K (main repo) |
+| **Supporting Orgs** | 150+ (Atlassian, Salesforce, SAP, PayPal, etc.) |
+| **Reliability** | 8/10 |
+| **Confidence** | 8/10 |
+
+**What it enables:** Standardized agent-to-agent communication. Agents discover each other via "Agent Cards" (JSON at `/.well-known/agent.json`), negotiate capabilities, and exchange tasks over HTTP/SSE/JSON-RPC.
+
+**Key features:**
+- **Capability discovery** via Agent Cards (name, endpoint, skills, auth flows)
+- **Flexible modalities**: text, audio, video streaming
+- **Enterprise auth**: parity with OpenAPI authentication schemes
+- **Supports async**: tasks from quick responses to multi-day research
+- Protocol: JSON-RPC 2.0 over HTTP(S), SSE for streaming, push notifications
+
+**ACP merger (August 2025):** IBM's Agent Communication Protocol (ACP) officially merged into A2A under the Linux Foundation. BeeAI platform now uses A2A.
+
+**Ecosystem:** Native support in Google ADK, AWS Strands, Microsoft Agent Framework, LiteLLM, OpenAgents. CrewAI added A2A support. LangGraph and AutoGen have not yet adopted natively.
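
The Agent Card discovery described above can be sketched as follows. This is a simplified illustration, not the full A2A schema: the `AgentCard` type keeps only a few fields (name, endpoint, skills), and the example cards and hostnames are hypothetical.

```typescript
// Simplified subset of an A2A Agent Card; the real schema has more fields.
interface AgentSkill {
  id: string;
  name: string;
  tags: string[];
}

interface AgentCard {
  name: string;
  url: string; // endpoint where the agent accepts A2A requests
  skills: AgentSkill[];
}

// Well-known location of the card, as named in the text above.
function agentCardUrl(baseUrl: string): string {
  return baseUrl.replace(/\/+$/, "") + "/.well-known/agent.json";
}

// Select the agents that advertise at least one skill carrying the given tag.
function findAgentsWithTag(cards: AgentCard[], tag: string): AgentCard[] {
  return cards.filter(function (card) {
    return card.skills.some(function (skill) {
      return skill.tags.indexOf(tag) !== -1;
    });
  });
}

// Hypothetical cards, as they might look after being fetched and parsed.
const knownCards: AgentCard[] = [
  {
    name: "reviewer",
    url: "https://reviewer.example.com/a2a",
    skills: [{ id: "review", name: "Code review", tags: ["code", "review"] }],
  },
  {
    name: "translator",
    url: "https://translator.example.com/a2a",
    skills: [{ id: "translate", name: "Translate text", tags: ["text"] }],
  },
];

console.log(agentCardUrl("https://reviewer.example.com"));
console.log(findAgentsWithTag(knownCards, "review").map(function (c) { return c.name; }));
```

A client would fetch each card once, cache it, and route a task to whichever agent advertises a matching skill.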
+
+> Source: [Google Developers Blog - A2A](https://developers.googleblog.com/en/a2a-a-new-era-of-agent-interoperability/), [Google Cloud Blog - A2A Upgrade](https://cloud.google.com/blog/products/ai-machine-learning/agent2agent-protocol-is-getting-an-upgrade), [Linux Foundation - A2A Project](https://www.linuxfoundation.org/press/linux-foundation-launches-the-agent2agent-protocol-project-to-enable-secure-intelligent-communication-between-ai-agents), [IBM - A2A](https://www.ibm.com/think/topics/agent2agent-protocol), [ACP Joins A2A](https://lfaidata.foundation/communityblog/2025/08/29/acp-joins-forces-with-a2a-under-the-linux-foundations-lf-ai-data/)
+
+---
+
+### 1.3 ACP -- Agent Communication Protocol
+
+| Field | Value |
+|-------|-------|
+| **URL** | [github.com/i-am-bee/acp](https://github.com/i-am-bee/acp) |
+| **Created by** | IBM BeeAI (March 2025) |
+| **Status** | **Merged into A2A** (August 2025) |
+| **License** | Apache 2.0 |
+| **Reliability** | 7/10 (merged, not standalone) |
+| **Confidence** | 8/10 |
+
+**What it was:** A lightweight REST-based protocol for agent-to-agent messaging. No SDK required -- curl/Postman compatible. Key differentiators were offline agent discovery and peer-to-peer interaction.
+
+**Current status:** ACP merged into A2A. The BeeAI platform now runs on A2A. IBM stated: "By bringing the assets and expertise behind ACP into A2A, we can build a single, more powerful standard." Migration guides are available.
+
+**Legacy significance:** ACP influenced A2A's design toward simpler REST-based patterns and offline discovery capabilities.
+
+> Source: [IBM Research - ACP](https://research.ibm.com/blog/agent-communication-protocol-ai), [IBM - What is ACP](https://www.ibm.com/think/topics/agent-communication-protocol), [ACP Joins A2A](https://lfaidata.foundation/communityblog/2025/08/29/acp-joins-forces-with-a2a-under-the-linux-foundations-lf-ai-data/)
+
+---
+
+### 1.4 AGENTS.md
+
+| Field | Value |
+|-------|-------|
+| **URL** | [agents.md](https://agents.md/) |
+| **Created by** | OpenAI (August 2025) |
+| **Governance** | AAIF / Linux Foundation |
+| **License** | Open standard (Markdown convention) |
+| **Adoption** | 60,000+ repositories |
+| **Reliability** | 8/10 |
+| **Confidence** | 9/10 |
+
+**What it enables:** A standardized Markdown file that gives AI coding agents project-specific instructions (build commands, coding conventions, testing requirements, boundaries). Like `.gitignore` but for agents.
+
+**Adoption:** Supported by GitHub Copilot, Cursor, Windsurf, Zed, Warp, VS Code, JetBrains Junie, OpenAI Codex CLI, Google Jules, Gemini CLI, Amp, Devin, Aider, goose, RooCode, Augment Code.
+
+**Notable exception:** Claude Code uses its own `CLAUDE.md` format. Open issue with 3,000+ upvotes requesting AGENTS.md support, but Anthropic has not committed to it.
+
+**For monorepos:** Nested AGENTS.md files work (agents parse nearest file in directory tree). OpenAI's main repo has 88 AGENTS.md files.
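
The "nearest file in directory tree" rule for monorepos can be illustrated with a small lookup. This is a sketch under stated assumptions: paths are POSIX-style and relative to the repo root, the list of existing AGENTS.md files is supplied by the caller, and the function name `nearestAgentsFile` and the example layout are hypothetical. Individual agents may apply additional rules on top of this.

```typescript
// Given the AGENTS.md files present in a repo (POSIX-style paths relative to the
// repo root), return the one closest to `filePath`, walking up the directory tree.
function nearestAgentsFile(filePath: string, agentsFiles: string[]): string | null {
  const dirParts = filePath.split("/").slice(0, -1); // directory segments of the file
  for (let depth = dirParts.length; depth >= 0; depth--) {
    const dir = dirParts.slice(0, depth).join("/");
    const candidate = dir === "" ? "AGENTS.md" : dir + "/AGENTS.md";
    if (agentsFiles.indexOf(candidate) !== -1) {
      return candidate;
    }
  }
  return null; // no AGENTS.md anywhere above the file
}

// Hypothetical monorepo layout.
const repoAgentsFiles = ["AGENTS.md", "packages/api/AGENTS.md"];

console.log(nearestAgentsFile("packages/api/src/routes/user.ts", repoAgentsFiles));
console.log(nearestAgentsFile("packages/web/src/App.tsx", repoAgentsFiles));
```

In this layout a file under `packages/api` resolves to the package-level instructions, while everything else falls back to the root file.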
+
+> Source: [InfoQ - AGENTS.md](https://www.infoq.com/news/2025/08/agents-md/), [agents.md official site](https://agents.md/), [OpenAI AAIF announcement](https://openai.com/index/agentic-ai-foundation/)
+
+---
+
+### 1.5 Protocol Layer Summary
+
+```
++--------------------------------------------------+
+|              AGENTS.md / CLAUDE.md               |  <- Agent config/instructions
++--------------------------------------------------+
+|          A2A (Agent-to-Agent Protocol)           |  <- Agent discovery & communication
+|              (includes former ACP)               |
++--------------------------------------------------+
+|           MCP (Model Context Protocol)           |  <- Agent-to-tool connections
++--------------------------------------------------+
+|           HTTP / SSE / JSON-RPC / gRPC           |  <- Transport layer
++--------------------------------------------------+
+```
+
+All three major layers are:
+- Open source (Apache 2.0)
+- Governed by the Linux Foundation (via AAIF or directly)
+- Backed by every major AI company
+- Production-ready or approaching it
+
+---
+
+## 2. Governance: Agentic AI Foundation (AAIF)
+
+| Field | Value |
+|-------|-------|
+| **URL** | [aaif.io](https://aaif.io/) |
+| **Parent** | Linux Foundation |
+| **Founded** | December 9, 2025 |
+| **Co-founders** | Anthropic, Block, OpenAI |
+| **Platinum Members** | AWS, Anthropic, Block, Bloomberg, Cloudflare, Google, Microsoft, OpenAI |
+| **Total Members** | 97+ |
+| **Board Chair** | David Nalley (AWS) |
+| **Reliability** | 9/10 |
+| **Confidence** | 9/10 |
+
+**What it does:** Neutral governance body for agentic AI open standards. Hosts MCP, goose, and AGENTS.md as founding projects. A2A is governed separately under the Linux Foundation but aligned.
+
+**Key principles:**
+- Open governance: contributors from all backgrounds shape direction
+- Project autonomy: individual projects maintain full technical independence
+- Sustainability: neutral infrastructure and funding (not vendor-controlled)
+- Focused scope: agentic AI only (not all of AI/ML/data science)
+
+**Funding model:** "Directed fund" -- companies contribute through membership dues. Roadmaps set by technical steering committees, not sponsors.
+
+**Government alignment:** NIST launched the "AI Agent Standards Initiative" in February 2026 to foster industry-led technical standards for AI agents.
+
+**Upcoming event:** MCP Dev Summit North America, April 2-3, 2026, New York City.
+
+> Source: [Linux Foundation - AAIF](https://www.linuxfoundation.org/press/linux-foundation-announces-the-formation-of-the-agentic-ai-foundation), [OpenAI - AAIF](https://openai.com/index/agentic-ai-foundation/), [Anthropic - AAIF](https://www.anthropic.com/news/donating-the-model-context-protocol-and-establishing-of-the-agentic-ai-foundation), [NIST AI Agent Standards Initiative](https://www.nist.gov/news-events/news/2026/02/announcing-ai-agent-standards-initiative-interoperable-and-secure)
+
+---
+
+## 3. Multi-Model Routing & Proxy Tools
+
+### 3.1 LiteLLM
+
+| Field | Value |
+|-------|-------|
+| **URL** | [litellm.ai](https://docs.litellm.ai/) |
+| **GitHub** | [BerriAI/litellm](https://github.com/BerriAI/litellm) |
+| **Type** | LLM Gateway / Proxy (self-hosted) |
+| **License** | MIT (Enterprise features paid) |
+| **LLM Support** | 100+ models |
+| **Agent Support** | A2A agents (LangGraph, Vertex AI, Azure, Bedrock, Pydantic AI) |
+| **MCP Support** | Yes (central endpoint with per-key ACL) |
+| **Reliability** | 7/10 |
+| **Confidence** | 8/10 |
+
+**What it enables:**
+- Unified OpenAI-compatible gateway for 100+ LLMs from all providers
+- A2A agent routing through the same gateway
+- MCP tool access with per-key access control
+- Load balancing: simple-shuffle, least-busy, usage-based, latency-based
+- Retry/fallback across deployments
+- Cost tracking per key/team/user
+- Content filtering, PII masking, guardrails
+
+**Performance:** 8ms P95 latency at 1K RPS.
+
+**Known issues (2025-2026):**
+- Python GIL limits concurrency under high load
+- DB logging degrades after 1M+ logs (GitHub issue #12067)
+- Enterprise features (SSO, RBAC, budgets) locked behind paid license
+- 800+ open GitHub issues; September 2025 release caused OOM on Kubernetes
+- Bifrost (Go-based competitor) claims 50x faster performance
+
+**Agent routing capability:** LiteLLM supports adding A2A agents as first-class endpoints, meaning you can route to both LLMs and agents through the same gateway. This makes it a potential universal backend for agent orchestration.
+
+**Relevance for desktop agent UI:** High. Could serve as a unified backend that routes requests to different LLM providers and A2A agents through a single API. The self-hosted nature and OpenAI-compatible API make it easy to integrate.
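
Because the gateway speaks the OpenAI chat-completions format, integrating it from an Electron main process could be as small as the sketch below. The values here are assumptions to verify against your own deployment: the proxy address `http://localhost:4000`, the `/v1/chat/completions` path, the placeholder key, and the model alias `claude-sonnet`. The helper only builds the request so it can be inspected or tested without a running proxy.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface GatewayRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Build an OpenAI-style chat completion request aimed at a gateway.
function buildChatRequest(
  baseUrl: string,
  apiKey: string,
  model: string,
  messages: ChatMessage[]
): GatewayRequest {
  return {
    url: baseUrl.replace(/\/+$/, "") + "/v1/chat/completions",
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: "Bearer " + apiKey,
    },
    body: JSON.stringify({ model: model, messages: messages }),
  };
}

// Assumed values: proxy address, key, and model alias are placeholders.
const chatRequest = buildChatRequest("http://localhost:4000", "sk-placeholder", "claude-sonnet", [
  { role: "user", content: "Summarize the open tasks on the board." },
]);

console.log(chatRequest.url);
// Sending is one call away, for example:
// fetch(chatRequest.url, { method: chatRequest.method, headers: chatRequest.headers, body: chatRequest.body })
```

Switching providers then means changing the `model` alias in one place, which is the main attraction of a gateway for a multi-provider UI.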
+
+> Source: [LiteLLM Docs](https://docs.litellm.ai/docs/), [LiteLLM GitHub](https://github.com/BerriAI/litellm), [Top 5 LiteLLM Alternatives 2026](https://www.getmaxim.ai/articles/top-5-litellm-alternatives-in-2026/)
+
+---
+
+### 3.2 OpenRouter
+
+| Field | Value |
+|-------|-------|
+| **URL** | [openrouter.ai](https://openrouter.ai/) |
+| **Type** | Cloud-hosted LLM routing service |
+| **Models** | 500+ from 60+ providers |
+| **Scale** | 250K+ apps, 4.2M+ users |
+| **API** | OpenAI SDK compatible |
+| **License** | Proprietary (cloud service) |
+| **Reliability** | 8/10 |
+| **Confidence** | 8/10 |
+
+**What it enables:**
+- Single API for 500+ models (OpenAI, Anthropic, Google, Meta, Mistral, etc.)
+- Auto-routing: cheap models for simple queries, premium for complex
+- Automatic provider fallback for reliability
+- Low latency: ~15ms overhead (edge infrastructure)
+- 29 free models available (no credit card)
+
+**Agent support:** Supports building agentic workflows through the API, but no native A2A/MCP protocol support. It is an LLM routing layer, not an agent orchestration layer.
+
+**Multi-model strategy for agents:** The recommended approach is to use different models for different tasks (e.g., Devstral for coding, MiniMax for agents, DeepSeek for general). OpenRouter's auto-routing facilitates this.
+
+**Relevance for desktop agent UI:** Medium. Excellent for LLM routing (choosing models per task), but lacks native agent orchestration. Would need to be paired with an agent framework. Not self-hostable.
+
+> Source: [OpenRouter](https://openrouter.ai/), [OpenRouter Review 2026](https://aiagentslist.com/agents/openrouter), [Building Agentic AI with OpenRouter](https://dev.to/allanninal/building-your-first-agentic-ai-workflow-with-openrouter-api-1fo6)
+
+---
+
+## 4. Agent Orchestration Frameworks
+
+### 4.1 LangGraph
+
+| Field | Value |
+|-------|-------|
+| **GitHub** | [langchain-ai/langgraph](https://github.com/langchain-ai/langgraph) |
+| **Architecture** | Graph-based workflows (nodes + edges) |
+| **Languages** | Python, JavaScript/TypeScript |
+| **License** | MIT |
+| **Best for** | Production-grade stateful systems |
+| **MCP/A2A** | No native support yet |
+| **Reliability** | 8/10 |
+| **Confidence** | 8/10 |
+
+**Key strengths:**
+- Most control over execution flow (conditional logic, branching, parallel)
+- Best debugging/observability via LangSmith companion tooling
+- Production-proven with enterprise deployments
+- Model-agnostic: assign different models to different agent nodes
+- Mature checkpointing and state persistence
+
+**Key weaknesses:**
+- Steepest learning curve (requires graph theory knowledge)
+- No native MCP/A2A support yet
+- Higher initial development time vs. CrewAI
+
+> Source: [DataCamp - Framework Comparison](https://www.datacamp.com/tutorial/crewai-vs-langgraph-vs-autogen), [DEV - Agent Showdown 2026](https://dev.to/topuzas/the-great-ai-agent-showdown-of-2026-openai-autogen-crewai-or-langgraph-1ea8)
+
+---
+
+### 4.2 CrewAI
+
+| Field | Value |
+|-------|-------|
+| **URL** | [crewai.com](https://crewai.com/) |
+| **Architecture** | Role-based teams (roles, goals, backstories) |
+| **Languages** | Python |
+| **License** | MIT |
+| **Best for** | Quick prototyping, team-based workflows |
+| **A2A** | Added A2A support |
+| **MCP** | Not natively |
+| **Reliability** | 7/10 |
+| **Confidence** | 8/10 |
+
+**Key strengths:**
+- Most beginner-friendly (40% faster time-to-production vs. LangGraph)
+- Role-based metaphor mirrors real organizations
+- YAML config keeps agent definitions readable
+- Active development (unlike AutoGen)
+- Added A2A support for interoperability
+
+**Key weaknesses:**
+- Less mature monitoring/observability tooling
+- Python-only
+- Less granular control than LangGraph for complex workflows
+
+> Source: [CrewAI](https://crewai.com/), [OpenAgents Blog - Frameworks Compared](https://openagents.org/blog/posts/2026-02-23-open-source-ai-agent-frameworks-compared)
+
+---
+
+### 4.3 AutoGen / Microsoft Agent Framework
+
+| Field | Value |
+|-------|-------|
+| **URL** | [github.com/microsoft/agent-framework](https://github.com/microsoft/agent-framework) |
+| **Previous** | AutoGen + Semantic Kernel (merged October 2025) |
+| **Languages** | Python, .NET |
+| **License** | MIT |
+| **Status** | Release Candidate (February 2026), GA target end of Q1 2026 |
+| **MCP/A2A** | Both supported natively |
+| **Reliability** | 8/10 |
+| **Confidence** | 8/10 |
+
+**What happened:**
+- Microsoft merged AutoGen and Semantic Kernel into a unified "Microsoft Agent Framework" in October 2025
+- AutoGen is now in maintenance mode (bug fixes/security only)
+- Semantic Kernel features are being absorbed
+- GA 1.0 targeted for end of Q1 2026
+
+**Key features:**
+- Unified programming model: Python and .NET
+- Graph-based workflows: sequential, concurrent, handoff, group chat patterns
+- Multi-provider: Azure OpenAI, OpenAI, Anthropic, AWS Bedrock, Ollama, etc.
+- Native interoperability: A2A, AG-UI, MCP, OpenAPI
+- Enterprise: session-based state management, middleware, telemetry
+
+**Key concern:** Community disruption from the merge. AutoGen users forced to migrate. Strategic shift raises questions about long-term stability of Microsoft's agent strategy.
+
+> Source: [Visual Studio Magazine - Agent Framework](https://visualstudiomagazine.com/articles/2025/10/01/semantic-kernel-autogen--open-source-microsoft-agent-framework.aspx), [Microsoft Learn - Agent Framework](https://learn.microsoft.com/en-us/agent-framework/overview/), [Microsoft Azure Blog](https://azure.microsoft.com/en-us/blog/introducing-microsoft-agent-framework/)
+
+---
+
+### 4.4 OpenAI Agents SDK
+
+| Field | Value |
+|-------|-------|
+| **URL** | [openai.github.io/openai-agents-python](https://openai.github.io/openai-agents-python/) |
+| **GitHub** | [openai/openai-agents-python](https://github.com/openai/openai-agents-python) |
+| **Languages** | Python, TypeScript/JavaScript |
+| **License** | MIT |
+| **Version** | 0.13.0 (March 2026) |
+| **Maturity** | Production-ready |
+| **Reliability** | 8/10 |
+| **Confidence** | 8/10 |
+
+**Core primitives:** Agents, Handoffs, Tools (functions + MCP + hosted), Guardrails, Human-in-the-loop, Sessions, Tracing, Realtime Agents (voice).
+
+**Provider-agnostic:** Supports OpenAI Responses/Chat APIs and 100+ other LLMs despite being OpenAI-branded.
+
+**Orchestration patterns:** Agents-as-tools (bounded subtask) and handoffs (specialist takes over).
+
+**MCP support:** Native. Agents can use MCP servers as tool providers.
+
+> Source: [OpenAI Agents SDK](https://openai.github.io/openai-agents-python/), [Agents SDK Review (mem0)](https://mem0.ai/blog/openai-agents-sdk-review), [OpenAI Developers 2025](https://developers.openai.com/blog/openai-for-developers-2025/)
+
+---
+
+### 4.5 Google Agent Development Kit (ADK)
+
+| Field | Value |
+|-------|-------|
+| **URL** | [google.github.io/adk-docs](https://google.github.io/adk-docs/) |
+| **GitHub** | [google/adk-python](https://github.com/google/adk-python) (17.8K stars) |
+| **Languages** | Python, Go |
+| **License** | Apache 2.0 |
+| **A2A** | Native integration |
+| **MCP** | Native support |
+| **Reliability** | 8/10 |
+| **Confidence** | 8/10 |
+
+**Key strengths:**
+- Same framework powering Google's Agentspace and Customer Engagement Suite
+- Native A2A + MCP: first-party protocol support
+- Rich tool ecosystem: built-in tools, MCP servers, LangChain/LlamaIndex integration, agents as tools
+- LiteLLM integration for multi-provider model access (Anthropic, Meta, Mistral, etc.)
+- Deploy anywhere: Cloud Run, Vertex AI Agent Engine, GKE
+- 3.3M monthly downloads
+
+**Key weakness:** Optimized for Gemini/Google ecosystem. Model-agnostic in theory, but best experience with Google Cloud.
+
+> Source: [Google Developers Blog - ADK](https://developers.googleblog.com/en/agent-development-kit-easy-to-build-multi-agent-applications/), [ADK Docs](https://google.github.io/adk-docs/), [ADK + A2A](https://google.github.io/adk-docs/a2a/)
+
+---
+
+### 4.6 AWS Strands Agents
+
+| Field | Value |
+|-------|-------|
+| **URL** | [strandsagents.com](https://strandsagents.com/) |
+| **GitHub** | [strands-agents](https://github.com/strands-agents) (2,000+ stars) |
+| **Languages** | Python, TypeScript |
+| **License** | Apache 2.0 |
+| **Version** | 1.0 (production-ready) |
+| **A2A** | Native support |
+| **MCP** | First-class support |
+| **Downloads** | 150K+ on PyPI |
+| **Reliability** | 7/10 |
+| **Confidence** | 7/10 |
+
+**Key features:**
+- Model-driven approach: model reasons about when to use sub-agents
+- Multi-agent patterns: Graph, Swarm, Workflow
+- Native A2A: expose agents as A2A servers, communicate with other A2A agents
+- First-class MCP: thousands of tools accessible
+- Model-agnostic: Bedrock, Anthropic, Gemini, LiteLLM, Ollama, OpenAI, and more
+- Deploy: Lambda, Fargate, EKS, Bedrock AgentCore, Docker, Kubernetes
+- OpenTelemetry observability built-in
+
+**Key concern:** Newer entrant (May 2025), smaller community than LangGraph/CrewAI. AWS ecosystem-optimized.
+
+> Source: [AWS Blog - Strands Agents](https://aws.amazon.com/blogs/opensource/introducing-strands-agents-an-open-source-ai-agents-sdk/), [Strands 1.0](https://aws.amazon.com/blogs/opensource/introducing-strands-agents-1-0-production-ready-multi-agent-orchestration-made-simple/), [AWS - A2A on Strands](https://aws.amazon.com/blogs/opensource/open-protocols-for-agent-interoperability-part-4-inter-agent-communication-on-a2a/)
+
+---
+
+### 4.7 OpenAgents
+
+| Field | Value |
+|-------|-------|
+| **URL** | [openagents.org](https://openagents.org/) |
+| **GitHub** | [openagents-org/openagents](https://github.com/openagents-org/openagents) |
+| **Languages** | Python |
+| **License** | Open source |
+| **A2A** | Native support |
+| **MCP** | Native support |
+| **Reliability** | 6/10 |
+| **Confidence** | 7/10 |
+
+**Positioning:** Purpose-built for interoperable agent networks, with native first-class support for both MCP and A2A. The project presents this as unique, although the comparison table in 4.10 also lists Microsoft Agent Framework, Google ADK, and AWS Strands as supporting both protocols.
+
+**Key features:**
+- Persistent agent communities (not one-shot pipelines)
+- LLM-agnostic (any model provider)
+- Agent discovery: agents find each other in workspaces
+- @mention delegation between agents
+- Manages Claude, Codex, Aider, and more from a single CLI
+- Self-hosted agent networks via SDK
+
+**Key concern:** Smaller community and less production-hardened than LangGraph/CrewAI. Newer project.
+
+> Source: [OpenAgents Blog - Comparison](https://openagents.org/blog/posts/2026-02-23-open-source-ai-agent-frameworks-compared), [OpenAgents GitHub](https://github.com/openagents-org/openagents)
+
+---
+
+### 4.8 GitAgent
+
+| Field | Value |
+|-------|-------|
+| **URL** | [github.com/open-gitagent/gitagent](https://github.com/open-gitagent/gitagent) |
+| **Created** | March 2026 (very new) |
+| **Type** | Framework-agnostic agent definition format |
+| **License** | Open source |
+| **Reliability** | 5/10 |
+| **Confidence** | 6/10 |
+
+**What it does:** "Docker for AI Agents" -- a universal format to define an agent once and export it to any framework.
+
+**Export targets:** `gitagent export -f [framework]` supports OpenAI, Claude Code, LangChain/LangGraph, CrewAI, AutoGen.
+
+**Key innovation:**
+- Agent identity in SOUL.md + skills/ directories
+- Git-native state management (Markdown files, not vector DBs)
+- Human-in-the-loop via standard PRs (not custom dashboards)
+- Enterprise compliance (FINRA, SEC) built-in
+
+**What ports:** Prompts, persona, constraints, tool schemas, role policies, model preferences.
+**What stays:** Runtime orchestration, state machines, live tool execution, memory I/O.
+
+**Key concern:** Brand new (March 2026). No production track record. Early-stage community.
+
+> Source: [MarkTechPost - GitAgent](https://www.marktechpost.com/2026/03/22/meet-gitagent-the-docker-for-ai-agents-that-is-finally-solving-the-fragmentation-between-langchain-autogen-and-claude-code/), [GitAgent GitHub](https://github.com/open-gitagent/gitagent)
+
+---
+
+### 4.9 Goose (Block)
+
+| Field | Value |
+|-------|-------|
+| **URL** | [block.github.io/goose](https://block.github.io/goose/) |
+| **GitHub** | [block/goose](https://github.com/block/goose) (30,000+ stars, 350+ contributors) |
+| **Created by** | Block (January 2025) |
+| **Governance** | AAIF / Linux Foundation |
+| **License** | Apache 2.0 |
+| **Type** | Local-first AI agent (CLI + Desktop) |
+| **MCP** | Core architecture built on MCP |
+| **LLM Support** | 25+ providers (commercial + local models) |
+| **Reliability** | 8/10 |
+| **Confidence** | 8/10 |
+
+**What it does:** An extensible, local-first AI agent. Goes beyond code suggestions -- runs shell commands, edits files, executes code, orchestrates multi-step workflows. Reference implementation for MCP.
+
+**Key facts:**
+- 110+ releases since January 2025
+- 3,000+ MCP servers available in the ecosystem
+- Founding project of AAIF alongside MCP and AGENTS.md
+- Works with any LLM (multi-model config for cost optimization)
+- Modular via MCP extensions
+
+> Source: [Block - Introducing Goose](https://block.xyz/inside/block-open-source-introduces-codename-goose), [Goose GitHub](https://github.com/block/goose), [Linux Foundation - AAIF](https://www.linuxfoundation.org/press/linux-foundation-announces-the-formation-of-the-agentic-ai-foundation)
+
+---
+
+### 4.10 Framework Comparison Table
+
+| Framework | MCP | A2A | Multi-Provider | Languages | Architecture | Maturity | GitHub Stars |
+|-----------|-----|-----|----------------|-----------|-------------|----------|-------------|
+| **LangGraph** | No | No | Yes | Py, JS/TS | Graph-based | High | ~40K |
+| **CrewAI** | No | Yes | Yes | Py | Role-based | Medium-High | ~30K |
+| **MS Agent Framework** | Yes | Yes | Yes | Py, .NET | Graph + Conversational | Medium (RC) | ~40K (combined) |
+| **OpenAI Agents SDK** | Yes | No | Yes (100+ LLMs) | Py, TS/JS | Handoff-based | High | N/A |
+| **Google ADK** | Yes | Yes | Yes (via LiteLLM) | Py, Go | Hierarchical | Medium-High | ~18K |
+| **AWS Strands** | Yes | Yes | Yes | Py, TS | Model-driven | Medium | ~2K |
+| **OpenAgents** | Yes | Yes | Yes | Py | Network-based | Low | ~1K |
+| **Goose** | Yes (core) | No | Yes (25+) | Rust/TS | MCP-based | Medium-High | ~30K |
+| **GitAgent** | No | No | Yes (portability) | Universal | Format/spec | Very Low | New |
+
+---
+
+## 5. Desktop/Local Orchestration Tools
+
+### 5.1 VS Code Multi-Agent Hub
+
+| Field | Value |
+|-------|-------|
+| **URL** | [code.visualstudio.com](https://code.visualstudio.com/blogs/2026/02/05/multi-agent-development) |
+| **Release** | January 2026 (v1.109) |
+| **Agents** | GitHub Copilot + Claude + Codex |
+| **Subagents** | Parallel execution |
+| **MCP** | Full MCP Apps support |
+| **Reliability** | 9/10 |
+| **Confidence** | 9/10 |
+
+**What it is:** VS Code as a multi-agent command center. Run Claude, Codex, and Copilot side by side from a single interface.
+
+**Key features (v1.109+):**
+- **Agent Sessions view**: orchestrate multiple AI assistants, delegate tasks, compare outputs
+- **Parallel subagents**: fire off multiple independent tasks simultaneously
+- **Agent types**: local (interactive), background (CLI/worktrees), cloud (GitHub PRs), third-party
+- **Custom agents**: specialized roles (research, implementation, security) with defined tools, instructions, and models
+- **MCP Apps**: tool calls return interactive UI components (dashboards, forms, visualizations)
+- **Copilot Memory**: context retention across interactions
+
+**Agent HQ (GitHub):** Announced at GitHub Universe 2025, launched February 2026. Assign issues to Copilot, Claude, Codex, or all three to compare results.
+
+**Agent strengths differentiation:**
+- Copilot: fast autocomplete, repo-specific patterns, inline experience
+- Claude: thorough, trade-off analysis, multi-file changes
+- Codex: fast generation, algorithmic tasks, concise output
+
+> Source: [VS Code Blog - Multi-Agent](https://code.visualstudio.com/blogs/2026/02/05/multi-agent-development), [The New Stack - VS Code Multi-Agent](https://thenewstack.io/vs-code-becomes-multi-agent-command-center-for-developers/), [GitHub Blog - Agent HQ](https://github.blog/news-insights/company-news/pick-your-agent-use-claude-and-codex-on-agent-hq/)
+
+---
+
+### 5.2 Augment Code Intent
+
+| Field | Value |
+|-------|-------|
+| **URL** | [augmentcode.com](https://www.augmentcode.com/blog/intent-a-workspace-for-agent-orchestration) |
+| **Platform** | macOS (public beta, February 2026); Windows waitlist |
+| **Type** | Standalone desktop app |
+| **Architecture** | Living Spec + three-tier agents (Coordinator, Specialists, Verifier) |
+| **BYOA** | Yes (Claude Code, Codex, OpenCode) |
+| **Reliability** | 6/10 |
+| **Confidence** | 7/10 |
+
+**Unique concept: Living Spec.** A shared document that acts as the canonical source of truth. Reduces prompt drift, stale assumptions, and conflicting parallel work. Coordinator breaks requirements into tasks, specialists execute in isolated git worktrees, verifier checks results against spec.
+
+**BYOA (Bring Your Own Agent):** Use Claude Code, Codex, or OpenCode inside Intent's workspace. Free tier for BYOA; Context Engine requires subscription.
+
+**Context Engine:** Processes 400,000+ files through semantic dependency analysis. Agents gain understanding of service boundaries, API contracts, dependency relationships.
+
+**Benchmark claims:** SWE-bench Pro: Auggie 51.80% vs Claude Code 49.75% vs Cursor 50.21%.
+
+**Relevance to Claude Agent Teams UI:** Intent is the closest conceptual competitor. Both aim to be a desktop UI for multi-agent coding orchestration. Key differences:
+- Intent uses living specs; our app uses kanban boards
+- Intent is macOS-only; our app is cross-platform (Electron)
+- Intent is commercial (freemium); ours is 100% free/open-source
+- Intent requires BYOA agents; ours is Claude Code-native with potential for multi-provider
+
+> Source: [Augment Code - Intent](https://www.augmentcode.com/blog/intent-a-workspace-for-agent-orchestration), [Intent vs Claude Code](https://www.augmentcode.com/tools/intent-vs-claude-code), [Best AI Coding Desktop Apps 2026](https://www.augmentcode.com/tools/best-ai-coding-agent-desktop-apps)
+
+---
+
+### 5.3 OpenAI Codex Desktop App
+
+| Field | Value |
+|-------|-------|
+| **Created** | February 2, 2026 |
+| **Platform** | macOS only (Windows late 2026) |
+| **Type** | Standalone desktop app |
+| **Architecture** | "Command center for agents" |
+| **Reliability** | 7/10 |
+| **Confidence** | 7/10 |
+
+**What it does:** Centralizes multiple AI coding agents in a single interface. Manage parallel AI workflows, review automated changes, run long-running background tasks.
+
+**Key gap vs. our app:** Codex Desktop is OpenAI-only. No multi-provider agent support. No kanban board. No team collaboration features.
+
+> Source: [IntuitionLabs - Codex App](https://intuitionlabs.ai/articles/openai-codex-app-ai-coding-agents), [Augment Code - Desktop Apps Comparison](https://www.augmentcode.com/tools/best-ai-coding-agent-desktop-apps)
+
+---
+
+## 6. Relevance for Claude Agent Teams UI
+
+### Could any of these serve as a universal backend for a desktop AI team management UI?
+
+**Highest relevance tools:**
+
+| Tool | Why Relevant | Integration Path | Effort |
+|------|-------------|------------------|--------|
+| **MCP** | Our agents already use MCP. Universal tool protocol. | Already integrated via Claude Code | Low |
+| **A2A** | Could enable cross-provider agent communication (Claude + Codex + Gemini agents) | Implement A2A client/server in Electron main process | Medium-High |
+| **LiteLLM** | Unified routing to any LLM. A2A agent support. Self-hosted. | Spawn local proxy, route all requests through it | Medium |
+| **OpenAgents** | Native MCP + A2A. Manages Claude, Codex, Aider from single CLI. | Could replace/augment Claude Code CLI orchestration | High |
+| **AGENTS.md** | Would make our kanban tasks/specs consumable by any agent | Generate AGENTS.md from team config | Low |
+
+### Strategic positioning
+
+Our app (Claude Agent Teams UI) combines advantages that none of the competitors surveyed above offers together:
+
+1. **Kanban board** -- none of the tools surveyed above has one for agent orchestration
+2. **100% free, open-source, local-first** -- vs. Augment Intent (freemium), Codex App (OpenAI-only), VS Code (ecosystem lock-in)
+3. **Claude Code-native** -- deepest integration with Claude's agent teams feature
+4. **Cross-team communication** -- agents coordinate across teams, not just within
+
+### Potential evolution path
+
+```
+Phase 1 (Current): Claude Code-native orchestration
+    |
+Phase 2: Add AGENTS.md export (make teams consumable by other agents)
+    |
+Phase 3: Add A2A server (expose our teams as A2A-discoverable agents)
+    |
+Phase 4: Add multi-provider support via LiteLLM/A2A
+         (Claude + Codex + Gemini agents on same kanban board)
+    |
+Phase 5: Full "universal AI team management" platform
+```
+
+**Key risk:** The VS Code multi-agent hub (Agent HQ) has massive distribution advantage. Our differentiation must come from superior UX (kanban), deeper team management, and open-source community.
+ +### Market context +- Gartner: 40% of enterprise apps will feature AI agents by end of 2026 (up from 5%) +- IDC: agentic AI spending to exceed $1.3T by 2029 (31.9% CAGR) +- UiPath: 65% of organizations piloting agentic systems by mid-2025 + +--- + +## Sources + +### Protocols & Standards +- [Google Developers Blog - A2A Protocol](https://developers.googleblog.com/en/a2a-a-new-era-of-agent-interoperability/) +- [Google Cloud Blog - A2A Upgrade](https://cloud.google.com/blog/products/ai-machine-learning/agent2agent-protocol-is-getting-an-upgrade) +- [Linux Foundation - A2A Project](https://www.linuxfoundation.org/press/linux-foundation-launches-the-agent2agent-protocol-project-to-enable-secure-intelligent-communication-between-ai-agents) +- [A2A GitHub](https://github.com/a2aproject/A2A) +- [MCP Official Site](https://modelcontextprotocol.io/) +- [MCP 2026 Roadmap](http://blog.modelcontextprotocol.io/posts/2026-mcp-roadmap/) +- [MCP Specification 2025-11-25](https://modelcontextprotocol.io/specification/2025-11-25) +- [Pento - A Year of MCP](https://www.pento.ai/blog/a-year-of-mcp-2025-review) +- [The New Stack - MCP Roadmap 2026](https://thenewstack.io/model-context-protocol-roadmap-2026/) +- [MCP Wikipedia](https://en.wikipedia.org/wiki/Model_Context_Protocol) +- [IBM - ACP](https://www.ibm.com/think/topics/agent-communication-protocol) +- [IBM Research - ACP](https://research.ibm.com/blog/agent-communication-protocol-ai) +- [ACP Joins A2A](https://lfaidata.foundation/communityblog/2025/08/29/acp-joins-forces-with-a2a-under-the-linux-foundations-lf-ai-data/) +- [AGENTS.md Official Site](https://agents.md/) +- [InfoQ - AGENTS.md](https://www.infoq.com/news/2025/08/agents-md/) +- [IBM - What is BeeAI](https://www.ibm.com/think/topics/beeai) +- [NIST - AI Agent Standards Initiative](https://www.nist.gov/news-events/news/2026/02/announcing-ai-agent-standards-initiative-interoperable-and-secure) + +### Governance +- [Linux Foundation - 
AAIF](https://www.linuxfoundation.org/press/linux-foundation-announces-the-formation-of-the-agentic-ai-foundation) +- [OpenAI - AAIF](https://openai.com/index/agentic-ai-foundation/) +- [Anthropic - AAIF](https://www.anthropic.com/news/donating-the-model-context-protocol-and-establishing-of-the-agentic-ai-foundation) +- [Block - AAIF](https://block.xyz/inside/block-anthropic-and-openai-launch-the-agentic-ai-foundation) +- [AAIF Official Site](https://aaif.io/) + +### Frameworks & SDKs +- [OpenAI Agents SDK](https://openai.github.io/openai-agents-python/) +- [OpenAI Agents SDK GitHub](https://github.com/openai/openai-agents-python) +- [Google ADK Docs](https://google.github.io/adk-docs/) +- [Google ADK GitHub](https://github.com/google/adk-python) +- [AWS Strands Agents](https://strandsagents.com/) +- [AWS - Introducing Strands](https://aws.amazon.com/blogs/opensource/introducing-strands-agents-an-open-source-ai-agents-sdk/) +- [AWS - Strands 1.0](https://aws.amazon.com/blogs/opensource/introducing-strands-agents-1-0-production-ready-multi-agent-orchestration-made-simple/) +- [Microsoft Agent Framework](https://github.com/microsoft/agent-framework) +- [Microsoft Learn - Agent Framework](https://learn.microsoft.com/en-us/agent-framework/overview/) +- [Visual Studio Magazine - Agent Framework](https://visualstudiomagazine.com/articles/2025/10/01/semantic-kernel-autogen--open-source-microsoft-agent-framework.aspx) +- [LangGraph](https://github.com/langchain-ai/langgraph) +- [CrewAI](https://crewai.com/) +- [OpenAgents](https://openagents.org/) +- [OpenAgents GitHub](https://github.com/openagents-org/openagents) +- [GitAgent GitHub](https://github.com/open-gitagent/gitagent) +- [MarkTechPost - GitAgent](https://www.marktechpost.com/2026/03/22/meet-gitagent-the-docker-for-ai-agents-that-is-finally-solving-the-fragmentation-between-langchain-autogen-and-claude-code/) +- [Goose GitHub](https://github.com/block/goose) +- [Block - Introducing 
Goose](https://block.xyz/inside/block-open-source-introduces-codename-goose) + +### Routing & Gateways +- [LiteLLM Docs](https://docs.litellm.ai/) +- [LiteLLM GitHub](https://github.com/BerriAI/litellm) +- [OpenRouter](https://openrouter.ai/) +- [Top 5 LiteLLM Alternatives 2026](https://www.getmaxim.ai/articles/top-5-litellm-alternatives-in-2026/) + +### Desktop Tools +- [VS Code Blog - Multi-Agent](https://code.visualstudio.com/blogs/2026/02/05/multi-agent-development) +- [The New Stack - VS Code Multi-Agent](https://thenewstack.io/vs-code-becomes-multi-agent-command-center-for-developers/) +- [GitHub Blog - Agent HQ](https://github.blog/news-insights/company-news/pick-your-agent-use-claude-and-codex-on-agent-hq/) +- [Augment Code - Intent](https://www.augmentcode.com/blog/intent-a-workspace-for-agent-orchestration) +- [Augment Code - Best Desktop Apps](https://www.augmentcode.com/tools/best-ai-coding-agent-desktop-apps) +- [IntuitionLabs - Codex App](https://intuitionlabs.ai/articles/openai-codex-app-ai-coding-agents) + +### Framework Comparisons +- [DataCamp - CrewAI vs LangGraph vs AutoGen](https://www.datacamp.com/tutorial/crewai-vs-langgraph-vs-autogen) +- [OpenAgents Blog - Frameworks Compared](https://openagents.org/blog/posts/2026-02-23-open-source-ai-agent-frameworks-compared) +- [DEV - Agent Showdown 2026](https://dev.to/topuzas/the-great-ai-agent-showdown-of-2026-openai-autogen-crewai-or-langgraph-1ea8) +- [Shakudo - Top 9 AI Agent Frameworks](https://www.shakudo.io/blog/top-9-ai-agent-frameworks) +- [AIMultiple - Top 5 Agentic Frameworks 2026](https://aimultiple.com/agentic-frameworks) + +### Market Research +- [Gravitee - A2A vs MCP](https://www.gravitee.io/blog/googles-agent-to-agent-a2a-and-anthropics-model-context-protocol-mcp) +- [RUH.AI - AI Agent Protocols 2026 Complete Guide](https://www.ruh.ai/blogs/ai-agent-protocols-2026-complete-guide) +- [Thoughtworks - MCP Impact 
2025](https://www.thoughtworks.com/en-us/insights/blog/generative-ai/model-context-protocol-mcp-impact-2025) +- [Shipyard - Claude Code Multi-Agent 2026](https://shipyard.build/blog/claude-code-multi-agent/) diff --git a/docs/research/ai-orchestration-tools-part2.md b/docs/research/ai-orchestration-tools-part2.md new file mode 100644 index 00000000..374387d2 --- /dev/null +++ b/docs/research/ai-orchestration-tools-part2.md @@ -0,0 +1,705 @@ +# AI Agent Orchestrators & Dispatchers — Part 2 + +> Research date: 2026-03-24 +> Focus: Provider-agnostic agent abstraction layers, dispatch systems, and multi-agent coding orchestrators +> Scope: NEW tools not covered in Part 1 + +--- + +## Tier 1: Desktop Apps & ADEs (Agentic Development Environments) + +These are the most relevant to our product — desktop applications that provide a UI layer for managing multiple coding agents. + +### 1. Emdash (YC W26) + +- **GitHub:** https://github.com/generalaction/emdash +- **Stars:** ~2,700+ +- **License:** Open source (exact license TBD) +- **Language:** Electron-based desktop app +- **Unique:** First YC-backed "Agentic Development Environment" (ADE). Run multiple coding agents in parallel, each isolated in its own git worktree, either locally or over SSH. + +**Agent providers:** 22 CLI agents supported — Claude Code, Qwen Code, Amp, Codex, Gemini CLI, and more. + +**Architecture:** +- Each agent runs in its own git worktree with full isolation +- Built-in ticket integrations: Linear, GitHub, Jira — pass tickets directly to agents +- Remote development via SSH/SFTP with secure keychain credential storage +- Built-in diff review, PR creation, CI/CD checks, and merge +- Privacy-first: Emdash itself sends no code/chat data to any servers + +**Integration potential:** DIRECT COMPETITOR. Very similar concept to our app. Key differences: Emdash is more a "parallel agent launcher" while we focus on team orchestration with inter-agent communication and kanban management. 
+ +**Maturity:** Active development, YC-backed, growing fast (966 -> 2700 stars in weeks). Available for macOS (Apple Silicon + Intel) and Linux. + +**Source:** [GitHub](https://github.com/generalaction/emdash) | [emdash.sh](https://www.emdash.sh/) | [YC profile](https://www.ycombinator.com/companies/emdash) + +--- + +### 2. Constellagent + +- **GitHub:** https://github.com/owengretzinger/constellagent +- **Stars:** TBD (listed in awesome-agent-orchestrators) +- **License:** Open source +- **Language:** macOS desktop app +- **Unique:** Each agent gets its own terminal, editor, and git worktree — all in one window. macOS-native UI. + +**Agent providers:** Any CLI-based coding agent (Claude Code, Codex, Gemini CLI, etc.) + +**Architecture:** +- Side-by-side agent sessions with isolated git worktrees +- Built-in terminal + code editor per agent +- macOS-native (not Electron) + +**Integration potential:** Simpler than our app but validates the "multi-agent desktop UI" market. macOS-only limits audience. + +**Source:** [GitHub](https://github.com/owengretzinger/constellagent) + +--- + +## Tier 2: CLI Orchestrators with Provider Abstraction + +### 3. ORCH + +- **GitHub:** https://www.orch.one/ (listed in awesome-agent-orchestrators) +- **Stars:** TBD +- **License:** MIT +- **Language:** TypeScript +- **Unique:** CLI runtime with formal STATE MACHINE for task lifecycle (`todo -> in_progress -> review -> done`). Agents talk to each other, share context, and run 24/7 as a daemon. + +**Agent providers:** 5 built-in adapters — Claude (Anthropic), OpenCode (multi-provider via OpenRouter), Codex (OpenAI), Cursor, and a universal Shell adapter (anything that takes a prompt). 
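
The task lifecycle that ORCH formalizes (`todo -> in_progress -> review -> done`) can be illustrated with a small transition table. This is a minimal sketch under stated assumptions, not ORCH's actual code: the `TaskState` type, the `TRANSITIONS` table, the `transition` function, and the extra `review -> in_progress` rework edge are all hypothetical.

```typescript
// Hypothetical sketch of a task lifecycle state machine (not ORCH's real implementation).
type TaskState = "todo" | "in_progress" | "review" | "done";

// Allowed transitions. The review -> in_progress edge (rework) is an assumption.
const TRANSITIONS: Record<TaskState, readonly TaskState[]> = {
  todo: ["in_progress"],
  in_progress: ["review"],
  review: ["done", "in_progress"],
  done: [],
};

interface Task {
  id: string;
  state: TaskState;
}

// Returns a copy of the task in the next state, or throws if the move is not allowed.
function transition(task: Task, next: TaskState): Task {
  if (TRANSITIONS[task.state].indexOf(next) === -1) {
    throw new Error(`Illegal transition: ${task.state} -> ${next}`);
  }
  return { ...task, state: next };
}
```

Keeping the allowed moves in one table means a kanban UI could derive its valid drop targets from the same data the orchestrator enforces.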
+ +**Architecture:** +- Each AI tool wrapped in adapter implementing common interface (`src/infrastructure/adapters/`) +- Event bus with wildcard subscriptions for TUI activity feed +- Git worktree isolation per agent +- Inter-agent messaging + shared context +- All state stored locally in `.orchestry/` — no telemetry +- "Set goal at 10pm, wake up to pull requests" + +**Integration potential:** Very interesting adapter pattern. The common interface + event bus architecture is close to what we'd need for a provider abstraction layer. Could study their adapter implementations. + +**Source:** [orch.one](https://www.orch.one/) | [DEV article](https://dev.to/oxgeneral/orchestrating-a-team-of-ai-agents-from-a-single-cli-4h6) + +--- + +### 4. Agent Swarm (Desplega AI) + +- **GitHub:** https://github.com/desplega-ai/agent-swarm +- **Stars:** Notable stargazers (Andrew Ng, Chip Huyen). Exact count TBD. +- **License:** MIT +- **Language:** TypeScript +- **Unique:** Full lead/worker coordination with Docker isolation, compounding memory, persistent agent identity (SOUL.md, IDENTITY.md), and DAG-based workflow engine. + +**Agent providers:** Claude Code (primary), pi-mono. Provider adapter pattern via `HARNESS_PROVIDER=claude|pi`. Codex, Gemini CLI support planned. + +**Architecture:** +- Lead agent decomposes tasks, delegates to worker agents in Docker containers +- MCP API server backed by SQLite for communication and state +- Persistent searchable filesystem shared across swarm (agent-fs) +- Compounding memory: agents learn from every session via summaries + OpenAI embeddings +- Persistent identity: agents have evolving SOUL.md/IDENTITY.md files +- DAG-based workflow engine with triggers, conditions, checkpoint durability +- Integrations: Slack, GitHub, GitLab, Email, Linear +- Dashboard UI with real-time monitoring + debug dashboard with SQL query interface + +**Integration potential:** Most feature-rich orchestrator found. 
The persistent identity and compounding memory concepts are innovative. Dashboard UI could inspire features. + +**Source:** [GitHub](https://github.com/desplega-ai/agent-swarm) | [Docs](https://docs.agent-swarm.dev) | [Dashboard](https://agent-swarm.desplega.sh/) + +--- + +### 5. Kodo + +- **GitHub:** Listed in awesome-agent-orchestrators +- **Stars:** ~37 +- **License:** Open source +- **Unique:** SWE-bench verified. Autonomous multi-agent orchestrator with independent architect and tester verification stages in work cycles. + +**Agent providers:** Claude Code, Codex, Gemini CLI + +**Architecture:** +- Directs agents through work cycles +- Independent architect verification +- Independent tester verification +- SWE-bench validated results + +**Integration potential:** Small project but interesting verification-centric workflow approach. + +**Source:** [awesome-agent-orchestrators](https://github.com/andyrewlee/awesome-agent-orchestrators) + +--- + +### 6. AgentFactory (Supaku) + +- **GitHub:** https://github.com/supaku/agentfactory +- **Stars:** TBD +- **License:** Open source +- **Language:** TypeScript +- **Unique:** "Software factory" with assembly-line pipeline (dev -> QA -> acceptance). Distributed worker pool via Redis. Exposes fleet as MCP server. Implements A2A protocol v0.3.0. 
+ +**Agent providers:** Claude, Codex, Spring AI (via `AgentProvider` interface) + +**Architecture:** +- `AgentProvider` interface for pluggable agent backends +- Pipeline: development -> QA -> acceptance (like CI/CD for agents) +- Distributed worker pool: webhook server + Redis queue + multiple worker nodes +- MCP server exposure: any MCP-aware client can interact with fleet +- A2A protocol support (v0.3.0) — operates as both client and server +- Spring AI Bench integration for benchmarking +- Scaffolding: `@supaku/create-agentfactory-app` +- One-click deploy to Vercel/Railway +- Linear integration for issue tracking + +**Integration potential:** The A2A + MCP server approach is very forward-looking. Enterprise Java teams can use Spring AI agents alongside Claude/Codex. + +**Source:** [GitHub](https://github.com/supaku/agentfactory) + +--- + +## Tier 3: Framework-Level Abstraction Layers + +### 7. Mozilla any-agent + +- **GitHub:** https://github.com/mozilla-ai/any-agent +- **Stars:** ~1,100+ +- **License:** Open source (Mozilla) +- **Language:** Python +- **Unique:** META-FRAMEWORK. Build agent once, switch frameworks by changing `AgentFramework` config parameter. Normalized logging via open-inference. Trace-first evaluation with LLM-as-judge. + +**Agent frameworks supported:** Abstraction over multiple agent frameworks (not providers) — lets you swap between different frameworks without rewriting agent code. + +**Architecture:** +- Single interface to different agent frameworks +- Normalized logging regardless of framework +- Trace-first evaluation approach +- Multi-agent via "Agents-As-Tools" pattern +- Companion projects: `any-llm` (LLM provider abstraction), `any-guardrail`, `Agent Factory` (natural language to agents), `mcpd` ("requirements.txt for agentic systems") + +**Integration potential:** Different abstraction level than what we need. Useful if we want to abstract over agent frameworks rather than coding agent CLIs. 
The `mcpd` tool for MCP server management is interesting. + +**Source:** [GitHub](https://github.com/mozilla-ai/any-agent) | [Blog](https://blog.mozilla.ai/introducing-any-agent-an-abstraction-layer-between-your-code-and-the-many-agentic-frameworks/) | [Docs](https://mozilla-ai.github.io/any-agent/) + +--- + +### 8. VoltAgent + +- **GitHub:** https://github.com/VoltAgent/voltagent +- **Stars:** TBD (active GitHub org with multiple repos) +- **License:** MIT +- **Language:** TypeScript +- **Unique:** "Refine.dev for AI agents" — TypeScript-first with n8n-style visual debugging console. Multi-agent orchestration with resumable streaming and voice support. + +**Agent providers:** OpenAI, Anthropic, Google, and others — swap by changing config, not code. + +**Architecture:** +- LLM-agnostic: provider swap via config +- Memory adapters (durable, cross-run) +- Resumable streaming: clients reconnect to in-flight streams after refresh +- RAG + Knowledge Base: managed document ingestion, chunking, embeddings, search +- Guardrails: runtime input/output validation +- Evals: built-in eval suites +- Voice: TTS/STT with OpenAI, ElevenLabs, custom providers +- VoltOps Console: observability, automation, deployment, evals (cloud & self-hosted) +- MCP docs server for AI coding assistants + +**Integration potential:** Great TypeScript framework if we want to build our own agent abstraction. The resumable streaming pattern is relevant for Electron apps. + +**Source:** [GitHub](https://github.com/VoltAgent/voltagent) | [voltagent.dev](https://voltagent.dev/) + +--- + +### 9. Mastra + +- **GitHub:** https://github.com/mastra-ai/mastra +- **Stars:** 7,500+ (as of early reports, likely higher now) +- **License:** Open source (EE features source-available under enterprise license) +- **Language:** TypeScript +- **Created by:** Team behind Gatsby (YC-backed) +- **Unique:** "Batteries-included TypeScript AI framework." Used by Replit Agent 3 (improved task success 80% -> 96%). 
Supports 81 LLM providers and 2,436+ models via Vercel AI SDK. + +**Agent providers:** 40+ providers via Vercel AI SDK (OpenAI, Anthropic, Gemini, etc.) + +**Architecture:** +- Model routing: 40+ providers through one interface +- Human-in-the-loop: suspend/resume with stored execution state +- Context management: conversation history, data retrieval, working + semantic memory +- MCP servers: expose agents/tools/resources via MCP +- Integration with React, Next.js, Node.js +- Serverless deployment: Vercel, Cloudflare, Netlify, or Mastra hosting +- `npm create mastra@latest` for quick start + +**Integration potential:** Very mature TypeScript SDK. Could be used as an underlying agent framework in our Electron app. The human-in-the-loop suspend/resume is exactly what we need for kanban workflows. + +**Source:** [GitHub](https://github.com/mastra-ai/mastra) | [mastra.ai](https://mastra.ai/) | [YC profile](https://www.ycombinator.com/companies/mastra) + +--- + +## Tier 4: Coding Agent Platforms (Individual Agents with Multi-Provider Support) + +### 10. Goose (Block) + +- **GitHub:** https://github.com/block/goose +- **Stars:** 27,000+ +- **License:** Apache 2.0 +- **Language:** Rust +- **Unique:** By Block (Square, Cash App). 25+ LLM providers, 3,000+ MCP servers. Contributed to Linux Foundation's Agentic AI Foundation alongside Anthropic's MCP and OpenAI's AGENTS.md. + +**Agent providers:** 25+ LLM providers (OpenAI, Anthropic, Google, DeepSeek, local via Ollama). Can even use Claude Code as a model provider inside Goose. 
+ +**Architecture:** +- Multi-provider with multi-model configuration (use different models for different tasks in same session) +- Subagents for parallel task execution with isolated workspaces +- MCP-native (among first agents to support MCP) +- CLI + Desktop app (not IDE-locked) +- Recipes system for reusable workflows +- Completely free + open source; you only pay LLM API costs + +**Integration potential:** Goose itself is a coding agent, not an orchestrator. But its multi-provider architecture and MCP integration patterns are worth studying. Could be one of the agents our UI orchestrates. + +**Source:** [GitHub](https://github.com/block/goose) | [block.github.io/goose](https://block.github.io/goose/) | [AI Tool Analysis Review](https://aitoolanalysis.com/goose-ai-review/) + +--- + +### 11. OpenCode + +- **GitHub:** https://github.com/opencode-ai/opencode +- **Stars:** 95K-120K+ (massive growth, surpassed Claude Code in stars) +- **License:** Open source +- **Language:** Go (Bubble Tea TUI) +- **Created by:** Team behind SST (Serverless Stack) and terminal.shop +- **Unique:** Go-based terminal agent with 75+ LLM providers. Built-in TUI with Vim-like editor. 5M+ monthly developers. + +**Agent providers:** 75+ providers — OpenAI, Anthropic, Google Gemini, AWS Bedrock, Groq, Azure OpenAI, OpenRouter, and more. + +**Architecture:** +- Interactive TUI built with Bubble Tea +- Session management with persistent SQLite storage +- Multiple agent types: plan agent (analysis), general-purpose agent (full tool access) +- Parallel work units +- MCP integration for external tools +- LSP integration for code intelligence +- Provider-agnostic philosophy: "as models evolve, being provider-agnostic is important" + +**Integration potential:** OpenCode is a single-agent tool, not an orchestrator. However, it's the most popular open-source alternative to Claude Code. Worth considering as a supported runtime for our orchestrator. 
+ +**Source:** [GitHub](https://github.com/opencode-ai/opencode) | [opencode.ai](https://opencode.ai/) | [OpenCode Docs - Agents](https://opencode.ai/docs/agents/) | [OpenCode Docs - Providers](https://opencode.ai/docs/providers/) + +--- + +### 12. OpenHands (formerly OpenDevin) + +- **GitHub:** https://github.com/OpenHands/OpenHands +- **Stars:** 68,600+ +- **License:** MIT +- **Language:** Python +- **Unique:** Cloud coding agent platform with $18.8M Series A. Solves 87% of bug tickets same day. Event stream architecture with typed events. + +**Agent providers:** 100+ providers via LiteLLM (OpenAI, Anthropic, Google, etc.). Git providers: GitHub, GitLab, Bitbucket, Azure DevOps, Forgejo. + +**Architecture:** +- Event stream architecture: all agent-environment interactions as typed events through central hub +- Agent -> Runtime -> EventStream -> LLM pipeline +- Hierarchical agent coordination via delegation tool +- Sub-agents as independent conversations inheriting parent config +- Distributed deployment: WebSocket for agent/runtime communication +- Isolated Docker/Kubernetes environments +- V1 SDK transition: moving from mandatory Docker to optional sandboxing +- Software Agent SDK for building custom agents + +**Integration potential:** Enterprise-grade platform. The event stream architecture and typed events pattern could inspire our agent communication protocol. + +**Source:** [GitHub](https://github.com/OpenHands/OpenHands) | [openhands.dev](https://openhands.dev/) | [Software Agent SDK paper](https://arxiv.org/html/2511.03690v1) + +--- + +## Tier 5: Specialized Multi-Agent Coding Systems + +### 13. Liza (Disciplined Multi Coding Agent System) + +- **GitHub:** https://github.com/liza-mas/liza +- **Stars:** TBD +- **License:** Open source +- **Unique:** "Lisa Simpson vs Ralph Wiggum" philosophy. 55+ LLM failure modes mapped to countermeasures. Behavioral contracts, blackboard coordination, and explicit state machine. 
The most disciplined approach to multi-agent coding among the tools surveyed here. + +**Architecture:** +- Behavioral contract with Tier 0 invariants (never violated) +- Blackboard coordination: shared file tracks goals, tasks, assignments, history +- Stateless agents with external specs for context handoff +- Approval Request mechanism forces reasoning before acting +- Deterministic pre/post hooks at role transitions +- Orchestrator-routed model selection +- Agent roles: Coder, Security Auditor, Security Audit Reviewer +- Sprint-based workflow: autonomous within sprints, human reviews between sprints +- CLI: `liza setup`, `liza init`, `liza agent coder`, `liza validate`, `liza watch`, `liza sprint-checkpoint` + +**Integration potential:** The behavioral contract and blackboard coordination concepts are academically interesting and could improve agent reliability. + +**Source:** [GitHub](https://github.com/liza-mas/liza) + +--- + +### 14. Multi-Agent Coding System (Danau5tin) + +- **GitHub:** https://github.com/Danau5tin/multi-agent-coding-system +- **Stars:** TBD +- **License:** Open source +- **Unique:** Reached #13 on Stanford's TerminalBench (slightly above Claude Code). Novel "Context Store" for multi-agent knowledge sharing. RL-trained 14B Orca-Agent model. + +**Architecture:** +- Orchestrator + Explorer + Coder agents with knowledge artifacts +- Context Store: persistent knowledge layer with selective injection +- Trust Calibration Strategy: adaptive delegation based on task complexity +- Orchestrator cannot read/modify code directly; it operates at the architectural level only +- Companion project: Orca-Agent-RL (14B model, trained on 32x H100s) + +**Integration potential:** The Context Store pattern for multi-agent knowledge sharing is a novel approach worth studying. + +**Source:** [GitHub](https://github.com/Danau5tin/multi-agent-coding-system) | [Hacker News](https://news.ycombinator.com/item?id=45113348) + +--- + +### 15. 
Open SWE (LangChain) + +- **GitHub:** https://github.com/langchain-ai/open-swe +- **Stars:** 7,700+ +- **License:** MIT +- **Language:** Python +- **Unique:** Built on LangGraph Deep Agents framework. Multi-agent architecture (Manager, Planner, Programmer, Reviewer). Captures patterns used by Stripe, Ramp, Coinbase for internal coding agents. + +**Agent providers:** Any LLM via LangGraph. Multiple sandbox providers: Modal, Daytona, Runloop, LangSmith. + +**Architecture:** +- Manager -> Planner -> Programmer -> Reviewer pipeline +- Isolated Daytona sandboxes per task +- Subagent orchestration via Deep Agents task tool +- Middleware hooks: deterministic middleware around agent loop +- AGENTS.md support: read from sandbox, injected into system prompt +- Async & cloud-native: multiple tasks in parallel, "double texting" support +- Integrations: Linear, Slack, GitHub + +**Integration potential:** Enterprise-grade coding agent framework. The middleware hook pattern and AGENTS.md support are interesting patterns. + +**Source:** [GitHub](https://github.com/langchain-ai/open-swe) | [LangChain Blog](https://blog.langchain.com/introducing-open-swe-an-open-source-asynchronous-coding-agent/) + +--- + +### 16. DeerFlow 2.0 (ByteDance) + +- **GitHub:** https://github.com/bytedance/deer-flow +- **Stars:** 37,000+ +- **License:** MIT +- **Language:** Python +- **Unique:** ByteDance's "SuperAgent harness." Ground-up rewrite of v1. Multi-service architecture with Nginx reverse proxy. Skills system for extensibility. #1 GitHub Trending within 24h of launch. + +**Agent providers:** Model-agnostic — any OpenAI-compatible API (GPT-4, Claude, Gemini, DeepSeek, local models via Ollama). 
+ +**Architecture:** +- Harness (core): agent orchestration, tools, sandbox, models, MCP, skills, config +- App layer: FastAPI Gateway API + IM channel integrations (Feishu, Slack, Telegram) +- Lead agent decomposes tasks, spawns sub-agents with scoped contexts +- Docker-sandboxed execution per sub-agent (own filesystem, bash terminal) +- Skills system: Markdown-based workflow definitions with best practices +- Persistent JSON memory system (user context, history, facts with confidence scores) +- Three sandbox modes (configurable via config.yaml) +- MCP servers with OAuth token flows + +**Integration potential:** Impressive scale and ByteDance backing. Skills system is interesting — Markdown-based workflow definitions could be adapted for our agent team recipes. + +**Source:** [GitHub](https://github.com/bytedance/deer-flow) | [deerflow.tech](https://deerflow.tech/) | [DeepWiki analysis](https://deepwiki.com/bytedance/deer-flow) + +--- + +## Tier 6: Infrastructure & Runtime Frameworks + +### 17. Dapr Agents (CNCF) + +- **GitHub:** https://github.com/dapr/dapr-agents +- **Stars:** Part of Dapr ecosystem (34K+ stars for main Dapr project) +- **License:** Open source (CNCF) +- **Language:** Python (only) +- **Unique:** v1.0 GA announced at KubeCon Europe 2026. DurableAgent class: every LLM call and tool execution is a checkpoint. Kill process mid-workflow, resume from last saved point. + +**Agent providers:** LLM provider decoupling via Dapr Conversation API — swap LLMs without code changes (OpenAI, Anthropic, AWS Bedrock, etc.) 
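
The checkpoint-per-step idea attributed to `DurableAgent` above can be illustrated independently of Dapr. The sketch below is written in TypeScript (the language of our Electron app) and does not use the Dapr Agents API; `Checkpoint`, `CheckpointStore`, `Step`, and `runDurable` are hypothetical names, and the in-memory object stands in for a real durable store.

```typescript
// Hypothetical sketch of checkpoint/resume; this is NOT the Dapr Agents API.
interface Step {
  name: string;
  run: (input: string) => string;
}

interface Checkpoint {
  nextStep: number; // index of the first step that has not completed yet
  value: string;    // output of the last completed step
}

// Stand-in for a durable store (a real one would persist to disk or a database).
class CheckpointStore {
  private saved: Record<string, Checkpoint | undefined> = {};

  load(workflowId: string): Checkpoint | undefined {
    return this.saved[workflowId];
  }

  save(workflowId: string, nextStep: number, value: string): void {
    this.saved[workflowId] = { nextStep, value };
  }
}

// Runs steps in order and saves a checkpoint after each completed step.
// When called again with the same workflowId, completed steps are skipped.
function runDurable(workflowId: string, steps: Step[], input: string, store: CheckpointStore): string {
  const checkpoint = store.load(workflowId);
  const start = checkpoint ? checkpoint.nextStep : 0;
  let value = checkpoint ? checkpoint.value : input;
  steps.slice(start).forEach((step, offset) => {
    value = step.run(value);
    store.save(workflowId, start + offset + 1, value);
  });
  return value;
}
```

If a step throws (standing in for a killed process), a second call with the same `workflowId` resumes from the last saved step rather than from the beginning, which is the property that matters for agent crash recovery.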
+ +**Architecture:** +- Kubernetes-native: distribute thousands of agents across pods/nodes +- DurableAgent with checkpoint/resume +- Multi-agent via Dapr pub/sub messaging +- Coordination models: LLM-based, random, round-robin +- SPIFFE identity for agent-to-agent authorization +- Distributed tracing via OTEL + Prometheus metrics +- mTLS encrypted communication +- Enterprise adoption: ZEISS, EU logistics companies + +**Integration potential:** Overkill for desktop app, but the DurableAgent checkpoint/resume pattern could inspire our agent crash recovery. Python-only is a limitation. + +**Source:** [GitHub](https://github.com/dapr/dapr-agents) | [Diagrid Blog](https://www.diagrid.io/blog/dapr-agents-1-0-durable-cloud-native-production-ready) | [KubeCon announcement](https://jangwook.net/en/blog/en/dapr-agents-v1-cncf-production-ai-framework/) + +--- + +### 18. Sandcastle + +- **GitHub:** https://github.com/gizmax/Sandcastle +- **Stars:** TBD +- **License:** Open source +- **Language:** Python +- **Unique:** EU AI Act compliance built-in. 63 integrations. YAML-defined workflows. Smart model routing (quality/cost/latency constraints per step). 118 built-in + 118 community workflow templates. + +**Agent providers:** OpenAI, Anthropic, plus many more via multi-provider routing. Budget pressure detection forces cheaper models. 
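
Per-step routing under quality, cost, and latency constraints, as described for Sandcastle above, can be sketched as "pick the cheapest model that meets the constraints". This is an illustrative assumption about how such routing might work, not Sandcastle's actual algorithm; `ModelOption`, `StepConstraints`, `routeModel`, and all scores and prices are hypothetical.

```typescript
// Hypothetical sketch of constraint-based model routing (not Sandcastle's actual code).
interface ModelOption {
  name: string;
  quality: number;     // 0..1, higher is better (illustrative score)
  costPerCall: number; // illustrative cost in USD
  latencyMs: number;   // illustrative typical latency
}

interface StepConstraints {
  minQuality: number;
  maxLatencyMs: number;
}

// Picks the cheapest model that satisfies the constraints, or undefined if none does.
function routeModel(models: ModelOption[], constraints: StepConstraints): ModelOption | undefined {
  const eligible = models.filter(
    (m) => m.quality >= constraints.minQuality && m.latencyMs <= constraints.maxLatencyMs,
  );
  // filter() returned a new array, so sorting does not mutate the caller's list.
  return eligible.sort((a, b) => a.costPerCall - b.costPerCall)[0];
}
```

Under this sketch, "budget pressure" could be modeled by lowering `minQuality` for a step, which makes cheaper models eligible.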
+ +**Architecture:** +- YAML workflow definitions with DAG dependencies and parallel branches +- 4 sandbox backends: E2B cloud microVMs, Docker, Cloudflare Workers edge, local subprocess +- Smart model routing with historical performance data +- 5 browser automation modes (Playwright, Computer Use, DOM Extract, LightPanda, Browserbase) +- Real-time SSE dashboard (runs, costs, schedules, approvals, experiments) +- A/B testing models and prompts per step with auto-deployment +- Replay & checkpoints: re-run from any step +- PII redaction and tamper-evident audit trail +- Agent runtime with circuit breaker and pool management + +**Integration potential:** Enterprise-grade workflow orchestrator. The smart model routing and A/B testing capabilities could be interesting for our team management feature. + +**Source:** [GitHub](https://github.com/gizmax/Sandcastle) | [gizmax.cz/sandcastle](https://gizmax.cz/sandcastle/) + +--- + +### 19. AgentScope + Runtime (Alibaba/Tongyi Lab) + +- **GitHub:** https://github.com/agentscope-ai/agentscope (~18,900+ stars) + https://github.com/agentscope-ai/agentscope-runtime +- **License:** Open source +- **Language:** Python (+ Java implementation) +- **Unique:** Production-ready agent platform with SEPARATE runtime framework. Framework-agnostic runtime (not tied to AgentScope itself). "Agent as API" approach. Java SDK available. + +**Agent providers:** OpenAI, DashScope, Gemini, Anthropic, self-hosted open-source models. Provider-agnostic via formatter system. 
+ +**Architecture:** +- AgentScope: agent development framework with multi-agent collaboration +- AgentScope Runtime: separate deployment infrastructure (sandboxing, state management, memory) +- Runtime is framework-agnostic — works with other agent frameworks too +- Agent-as-API: white-box development experience +- Multi-layer hook system for observability (OpenTelemetry integration) +- Serverless deployment support (Alibaba Cloud FC) +- Java implementation (Spring AI Alibaba, Langchain4j) +- ReAct agent built implementation-agnostic + +**Integration potential:** The separation of agent framework from runtime is architecturally clean. The framework-agnostic runtime concept aligns with our need for a provider-neutral orchestration layer. + +**Source:** [GitHub (main)](https://github.com/agentscope-ai/agentscope) | [GitHub (runtime)](https://github.com/agentscope-ai/agentscope-runtime) + +--- + +### 20. OpenAgentsControl (OAC) + +- **GitHub:** https://github.com/darrenhinde/OpenAgentsControl +- **Stars:** ~2,900 +- **License:** Open source +- **Language:** Built on OpenCode +- **Unique:** Plan-first, approval-based execution. "Minimal Viable Information" (MVI) principle = 80% token reduction. Editable agents via Markdown files. + +**Agent providers:** Model-agnostic — Claude, GPT, Gemini, local models (Ollama, LM Studio). Built on OpenCode. + +**Architecture:** +- Propose -> Approve -> Execute model +- MVI principle: load only relevant patterns per task (80% token savings) +- Editable agents: modify behavior by editing Markdown files +- Custom Agent System Builder wizard +- Coding patterns committed to repos (team consistency) +- Multi-language: TypeScript, Python, Go, Rust + +**Integration potential:** The MVI token reduction technique and editable Markdown agents are useful ideas. Plan-first approach aligns with structured team workflows. 
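
The Propose -> Approve -> Execute model described for OpenAgentsControl can be expressed as a simple gate in front of execution. This is a minimal sketch, assuming a synchronous approval callback; `Proposal`, `Outcome`, and `runWithApproval` are hypothetical names, and OAC's own implementation may differ.

```typescript
// Hypothetical sketch of a propose -> approve -> execute gate (not OAC's actual code).
interface Proposal {
  summary: string;
  actions: string[];
}

interface Outcome {
  status: "executed" | "rejected";
  results: string[];
}

// Executes the proposed actions only if the approver accepts the proposal.
function runWithApproval(
  proposal: Proposal,
  approve: (p: Proposal) => boolean,
  execute: (action: string) => string,
): Outcome {
  if (!approve(proposal)) {
    return { status: "rejected", results: [] };
  }
  return { status: "executed", results: proposal.actions.map((action) => execute(action)) };
}
```

In a kanban-style UI, the `approve` callback could correspond to a human moving a card out of a review column; a real implementation would likely make it asynchronous.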
+ +**Source:** [GitHub](https://github.com/darrenhinde/OpenAgentsControl) | [BrightCoding review](https://www.blog.brightcoding.dev/2026/02/19/openagentscontrol-the-revolutionary-ai-agent-framework) + +--- + +### 21. NeuroLink (Juspay) + +- **GitHub:** https://github.com/juspay/neurolink +- **Stars:** ~119 +- **License:** MIT +- **Language:** TypeScript +- **Unique:** Enterprise-grade unified API for 12 major AI providers and 100+ models. Extracted from production systems at Juspay. Multi-provider failover and automatic cost optimization. + +**Agent providers:** 12 providers unified: OpenAI, Google, Anthropic, AWS, Azure, Groq, Together AI, Mistral, Cohere, Fireworks, Cloudflare, Ollama. 300+ models via OpenRouter integration. + +**Architecture:** +- Single API for 12+ providers (switch with one parameter change) +- 64+ built-in tools and MCP servers +- Multi-step agentic loops with per-step tool execution control +- Persistent memory (Redis/S3/SQLite) +- HITL workflows +- Structured output with Zod schemas +- Auto cost optimization and multi-provider failover +- LiteLLM integration for 100+ models +- TypeScript SDK + professional CLI + +**Integration potential:** Good TypeScript SDK for unified LLM access. If we need to add direct LLM provider abstraction (beyond just spawning CLI agents), NeuroLink's approach is solid. + +**Source:** [GitHub](https://github.com/juspay/neurolink) + +--- + +### 22. Pi-mono (badlogic) + +- **GitHub:** https://github.com/badlogic/pi-mono +- **Stars:** TBD +- **License:** Open source +- **Language:** TypeScript (npm packages) +- **Unique:** Minimal terminal coding harness with 4 modes: interactive, print/JSON, RPC, and SDK for embedding. Extensible via TypeScript Extensions, Skills, Prompt Templates, and Themes. + +**Agent providers:** Multi-provider via `Api` type union. Providers added by extending the API identifier system. + +**Architecture:** +- Monorepo with multiple packages (`packages/coding-agent`, etc.) 
+- 4 modes: interactive, print/JSON, RPC (process integration), SDK (embedding) +- OpenClaw SDK integration for real-world use +- Extension system: TypeScript Extensions, Skills, Prompt Templates, Themes +- Packaged as npm packages for sharing +- Used as a provider in Agent Swarm (`HARNESS_PROVIDER=pi`) + +**Integration potential:** The RPC and SDK modes are interesting for embedding a coding agent into our Electron app. Minimal footprint philosophy is appealing. + +**Source:** [GitHub](https://github.com/badlogic/pi-mono) + +--- + +### 23. Agentic Fleet (Qredence) + +- **GitHub:** https://github.com/Qredence/agentic-fleet +- **Stars:** TBD +- **License:** Open source +- **Language:** Python (backend) + React 19 + TypeScript (frontend) +- **Unique:** Built on Microsoft Agent Framework's Magentic Fleet pattern. Five-phase pipeline: analysis -> routing -> execution -> progress -> quality. + +**Architecture:** +- Backend: Python 3.12/3.13, FastAPI, Typer CLI, DSPy, Microsoft Agent Framework +- Frontend: React 19, TypeScript, Vite, Tailwind CSS, Radix UI, Shadcn UI +- ToolRegistry adapters (Tavily search, browser automation, code execution, MCP) +- Real-time SSE/WebSocket streaming +- Five-phase task pipeline + +**Integration potential:** Good example of combining Microsoft Agent Framework with a React frontend. The ToolRegistry adapter pattern is relevant. + +**Source:** [GitHub](https://github.com/Qredence/agentic-fleet) + +--- + +### 24. Plandex + +- **GitHub:** https://github.com/plandex-ai/plandex +- **Stars:** 15,086 +- **License:** MIT +- **Language:** Go +- **Unique:** Terminal-based AI coding with 2M token context, full version control for AI plans (branches, diff review), and cumulative diff review sandbox. + +**Agent providers:** Combine models from Anthropic, OpenAI, Google, and open source providers. 
+ +**Architecture:** +- 2M token context handling (~100k per file) +- Tree-sitter project maps for 20M+ token directories +- Version control for plans (branches, compare models) +- Cumulative diff review sandbox (changes separate until approved) +- Full autonomy capable but highly configurable step-by-step review +- Git integration with auto-commit + +**Integration potential:** Single agent, not an orchestrator. But the plan version control and diff sandbox concepts are relevant to our code review feature. + +**Source:** [GitHub](https://github.com/plandex-ai/plandex) | [plandex.ai](https://plandex.ai/) + +--- + +## Tier 7: Evolving / Archived (Notable Mentions) + +### 25. ControlFlow -> Marvin 3.0 (PrefectHQ) + +- **GitHub:** https://github.com/PrefectHQ/ControlFlow (archived) -> https://github.com/PrefectHQ/marvin +- **Unique:** Task-centric architecture with Prefect 3.0 observability. Evolved into Marvin 3.0 using Pydantic AI for LLM interactions (full range of providers). +- **Note:** ControlFlow is archived, Marvin 3.0 is the successor with broader provider support. 
+ +**Source:** [GitHub (ControlFlow)](https://github.com/PrefectHQ/ControlFlow) | [GitHub (Marvin)](https://github.com/PrefectHQ/marvin) + +--- + +## Summary Comparison Table + +| Tool | Type | Stars | Language | Agent Providers | Desktop App | Key Differentiator | +|------|------|-------|----------|----------------|-------------|-------------------| +| **Emdash** | ADE | 2,700+ | Electron | 22 CLI agents | Yes | YC W26, tickets integration | +| **Constellagent** | ADE | TBD | macOS native | Any CLI agent | Yes (macOS only) | Terminal+editor+worktree per agent | +| **ORCH** | CLI | TBD | TypeScript | 5 adapters | TUI | State machine, inter-agent messaging | +| **Agent Swarm** | CLI+Dashboard | TBD | TypeScript | Claude, Pi | Dashboard UI | Compounding memory, persistent identity | +| **AgentFactory** | CLI+Web | TBD | TypeScript | Claude, Codex, Spring AI | Dashboard | A2A protocol, MCP server, Redis pool | +| **Goose** | Agent | 27K+ | Rust | 25+ LLM providers | Desktop+CLI | Linux Foundation, MCP-native | +| **OpenCode** | Agent | 95K+ | Go | 75+ providers | TUI | Fastest-growing, Bubble Tea UI | +| **OpenHands** | Platform | 68K+ | Python | 100+ via LiteLLM | Web UI | $18.8M Series A, event stream arch | +| **DeerFlow** | Harness | 37K+ | Python | Any OpenAI-compatible | Web UI | ByteDance, skills system | +| **Open SWE** | Framework | 7,700+ | Python | Any via LangGraph | No | LangChain, enterprise patterns | +| **Mastra** | Framework | 7,500+ | TypeScript | 40+ providers | No | By Gatsby team, used by Replit | +| **Mozilla any-agent** | Meta-framework | 1,100+ | Python | Framework abstraction | No | Switch frameworks, not providers | +| **VoltAgent** | Framework | TBD | TypeScript | OpenAI, Anthropic, Google | Console UI | Resumable streaming, voice | +| **Dapr Agents** | Runtime | Part of 34K+ | Python | Via Conversation API | No | CNCF, Kubernetes-native, durable agents | +| **Liza** | System | TBD | CLI | Any LLM | No | Behavioral contracts, 55+ failure 
modes | +| **Sandcastle** | Orchestrator | TBD | Python | Multi-provider routing | Dashboard | EU AI Act, YAML workflows, 118 templates | + +--- + +## Key Architectural Patterns Observed + +### 1. Agent Runtime Interface Pattern +**Used by:** ORCH, Overstory, Agent Swarm, AgentFactory +- Define a common interface (spawn, configure, detect readiness, parse transcript) +- Each agent provider gets an adapter implementing this interface +- Swap providers without changing orchestration logic + +### 2. Git Worktree Isolation Pattern +**Used by:** Emdash, Constellagent, ORCH, Agent Swarm, ComposioHQ +- Standard approach for multi-agent parallel work +- Each agent gets its own worktree + branch +- Merge back via PR/conflict resolution + +### 3. Event Stream / Pub-Sub Architecture +**Used by:** OpenHands, ORCH, Dapr Agents +- All agent interactions as typed events through central hub +- Enables observability, replay, and debugging + +### 4. Checkpoint/Resume (Durable Execution) +**Used by:** Dapr Agents, Sandcastle, Mastra +- Every step saves a checkpoint +- Kill process mid-workflow -> resume from last saved point +- Critical for production reliability + +### 5. Lead-Worker Decomposition +**Used by:** Agent Swarm, DeerFlow, Open SWE, Claude Agent Teams (ours) +- Lead agent decomposes tasks +- Workers execute in isolation +- Results stitched back together + +--- + +## Integration Relevance for Claude Agent Teams UI + +### Direct Competitors (UI level) +1. **Emdash** — Most direct competitor. YC-backed. 22 agents. But lacks kanban, inter-agent communication, and team orchestration. +2. **Constellagent** — macOS-only. Simpler scope. + +### Architectural Inspiration +1. **ORCH** — Adapter interface pattern for agent providers + state machine for task lifecycle +2. **Agent Swarm** — Compounding memory + persistent identity + dashboard UI +3. **AgentFactory** — A2A protocol + MCP server exposure + pipeline stages +4. 
**VoltAgent** — TypeScript-first framework with resumable streaming (relevant for Electron) +5. **Mastra** — Human-in-the-loop suspend/resume via stored state + +### Worth Studying +1. **Liza** — Behavioral contracts for agent reliability +2. **Mozilla any-agent** — Meta-framework approach +3. **OpenHands** — Event stream architecture at scale +4. **DeerFlow** — Skills system (Markdown-based workflow definitions) + +### Key Competitive Advantages We Have +- **Kanban board** — NO ONE else has this for agent orchestration +- **Inter-agent communication** — Most tools only have lead-worker, not peer-to-peer +- **Code review workflow** — Diff view per task with approve/reject +- **Claude Code Agent Teams native support** — Built specifically for Claude's team protocol +- **Context monitoring** — Token usage tracking by category (unique) +- **Zero-setup onboarding** — Built-in Claude Code installation diff --git a/docs/research/ai-orchestration-tools-part3.md b/docs/research/ai-orchestration-tools-part3.md new file mode 100644 index 00000000..3fc536c4 --- /dev/null +++ b/docs/research/ai-orchestration-tools-part3.md @@ -0,0 +1,861 @@ +# AI Orchestration Tools Research — Part 3 + +**Date:** 2026-03-24 +**Focus:** Emerging/niche agent orchestrators, infrastructure-level tools, protocol-first frameworks, TypeScript/Node-based solutions, fleet managers + +--- + +## Table of Contents + +1. [TypeScript-First Agent Frameworks](#1-typescript-first-agent-frameworks) +2. [Infrastructure & Gateway Layer](#2-infrastructure--gateway-layer) +3. [Durable Execution & Workflow Engines](#3-durable-execution--workflow-engines) +4. [Visual & Low-Code Agent Builders](#4-visual--low-code-agent-builders) +5. [Protocol Standards & Ecosystem](#5-protocol-standards--ecosystem) +6. [Coding Agent Fleet Managers](#6-coding-agent-fleet-managers) +7. [Python-First Frameworks (with TS relevance)](#7-python-first-frameworks-with-ts-relevance) +8. [Summary Matrix](#8-summary-matrix) +9. 
[Recommendations for Claude Agent Teams UI](#9-recommendations-for-claude-agent-teams-ui) + +--- + +## 1. TypeScript-First Agent Frameworks + +### 1.1 Mastra AI + +- **URL:** https://github.com/mastra-ai/mastra +- **Stars:** ~22.3k (March 2026) +- **npm downloads:** 300k+/week +- **License:** Apache 2.0 +- **Funding:** $13M seed (YC W25, Paul Graham, Gradient Ventures) +- **Source:** [Mastra GitHub](https://github.com/mastra-ai/mastra), [Mastra Docs](https://mastra.ai/docs), [The New Stack](https://thenewstack.io/mastra-empowers-web-devs-to-build-ai-agents-in-typescript/) + +**What it is:** From the team behind Gatsby — a full-featured TypeScript framework for AI agents, workflows, RAG, and memory. Model routing to 40+ providers through one interface (OpenAI, Anthropic, Gemini, etc.). + +**Architecture highlights:** +- **Agents** — autonomous entities with LLM + tools + system instructions +- **Workflows** — graph-based state machines with discrete steps, inputs/outputs +- **Memory** — short-term and long-term memory across threads and sessions +- **Mastra Studio** — local developer playground for visualization/debugging +- **Production tools** — built-in evals, observability, tracing + +**Enterprise adoption:** Replit (Agent 3), SoftBank, Marsh McLennan (75k employees), PayPal, Adobe, Docker. 
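The workflow model listed under the architecture highlights above (discrete steps with inputs and outputs, connected as a graph) can be pictured with a small framework-independent sketch. It deliberately does not use Mastra's API; `Step`, `runWorkflow`, and the step limit are hypothetical choices made for illustration.

```typescript
// Generic illustration only; this is NOT Mastra's API. All names are hypothetical.
type State = Record<string, unknown>;

interface Step {
  id: string;
  // Receives the current state and returns the next state.
  run: (state: State) => State;
  // Chooses the id of the next step, or null to stop.
  next: (state: State) => string | null;
}

function runWorkflow(
  steps: Step[],
  startId: string,
  input: State,
): { state: State; visited: string[] } {
  const byId = new Map<string, Step>();
  for (const s of steps) byId.set(s.id, s);

  const visited: string[] = [];
  let state: State = input;
  let current: string | null = startId;

  while (current !== null) {
    const step = byId.get(current);
    if (!step) throw new Error(`Unknown step: ${current}`);
    visited.push(step.id);
    state = step.run(state);
    current = step.next(state);
    // Guard against accidental cycles in the graph.
    if (visited.length > 100) throw new Error("Step limit exceeded");
  }
  return { state, visited };
}
```

Because `next` inspects the state, the same structure covers both linear pipelines and branching flows.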
+ +**Relevance for Electron integration:** +- Pure TypeScript, runs on Node.js natively +- Can deploy as standalone server or embed in existing Node apps +- Most mature TS agent framework by adoption metrics +- Workflow engine could serve as orchestration backend +- **Confidence: 9/10, Reliability: 9/10** + +--- + +### 1.2 Inngest AgentKit + +- **URL:** https://github.com/inngest/agent-kit +- **Stars:** ~793 +- **npm:** `@inngest/agent-kit` +- **License:** Apache 2.0 (core), proprietary cloud +- **Source:** [AgentKit Docs](https://agentkit.inngest.com/overview), [Inngest Blog](https://www.inngest.com/blog/ai-orchestration-with-agentkit-step-ai) + +**What it is:** TypeScript library for building multi-agent networks with deterministic routing, MCP tooling, and durable execution through Inngest's workflow engine. + +**Architecture highlights:** +- **Agents** — LLM calls with prompts, tools, and MCP +- **Networks** — agents collaborate with shared State and handoff +- **Routers** — from code-based to LLM-based (ReAct) orchestration +- **State** — typed state machine combined with conversation history +- **Tracing** — built-in debug/optimize locally and in cloud +- **React hooks** — `@inngest/use-agent` for frontend integration +- Supports OpenAI, Anthropic, Gemini, and OpenAI-compatible models + +**Key differentiator:** Backed by Inngest's durable execution engine — agents survive crashes, can pause/resume, and handle long-running tasks with automatic retries. This is critical for production reliability. 
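The durable-execution idea above (completed steps are not re-run, failed steps are retried) can be illustrated with a toy checkpoint store. This is a sketch under stated assumptions, not Inngest's API: `Checkpoints`, `durableStep`, and the in-memory `Map` are hypothetical, and a real engine would persist checkpoints outside the process.

```typescript
// Toy illustration of checkpoint-and-retry; not Inngest's API. Names are hypothetical.
type Checkpoints = Map<string, string>;

function durableStep(
  store: Checkpoints,
  stepId: string,
  work: () => string,
  maxAttempts: number = 3,
): string {
  // A completed step is never re-run: return its saved result.
  const saved = store.get(stepId);
  if (saved !== undefined) return saved;

  let lastError: unknown = undefined;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const result = work();
      store.set(stepId, result); // checkpoint before moving on
      return result;
    } catch (err) {
      lastError = err; // remember the failure and try again
    }
  }
  throw new Error(
    `Step ${stepId} failed after ${maxAttempts} attempts: ${String(lastError)}`,
  );
}
```

If the process restarts with the same store, calling `durableStep` again for a finished step returns the saved value without invoking `work`.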
+ +**Relevance for Electron integration:** +- Pure TypeScript, lightweight +- Good abstraction for multi-agent networks with routing +- Durable execution is exactly what production agent teams need +- React hooks for UI integration +- **Confidence: 7/10, Reliability: 7/10** + +--- + +### 1.3 VoltAgent + +- **URL:** https://github.com/VoltAgent/voltagent +- **Stars:** ~5.1k (March 2026) +- **License:** MIT +- **Source:** [VoltAgent site](https://voltagent.dev/), [GitHub](https://github.com/VoltAgent/voltagent), [MarkTechPost](https://www.marktechpost.com/2025/04/22/meet-voltagent-a-typescript-ai-framework-for-building-and-orchestrating-scalable-ai-agents/) + +**What it is:** Observability-first TypeScript AI agent framework with Memory, RAG, Guardrails, Tools, MCP, Voice, Workflow support. + +**Architecture highlights:** +- **VoltOps Console** — like n8n but for debugging AI agents (cloud & self-hosted) +- Multi-agent workflows via Chain API — compose, branch, orchestrate +- Workflow steps typed with Zod schemas (compile-time safety + runtime validation) +- Human-in-the-loop with pause/resume +- MCP support, bring-your-own LLMs + +**Key differentiator:** Observability as a first-class concern. The VoltOps console provides real-time monitoring, debugging, and workflow visualization — useful for our kanban-style task monitoring. + +**Relevance for Electron integration:** +- MIT license, TypeScript-first, Node.js native +- Observability features could complement our session analysis +- Zod-based typing aligns with our codebase patterns +- **Confidence: 7/10, Reliability: 6/10** + +--- + +### 1.4 HazelJS + +- **URL:** https://github.com/hazel-js/hazeljs +- **Stars:** Small (early alpha) +- **npm:** `@hazeljs/core`, `@hazeljs/agent`, `@hazeljs/ai`, etc. 
(38+ packages) +- **License:** Apache 2.0 +- **Source:** [HazelJS site](https://hazeljs.ai/), [DEV.to](https://dev.to/arslan_mecom/from-beta-to-alpha-the-hazeljs-journey-in-38-packages-3nad) + +**What it is:** AI-native backend framework with production-grade Agent Runtime, Agentic RAG, and persistent memory. NestJS-style decorator-based API. + +**Architecture highlights:** +- Modular: 40+ installable npm packages (core, ai, agent, rag, memory, flow, auth, cache...) +- **AgentGraph** + **SupervisorAgent** for multi-agent orchestration +- **@hazeljs/flow** — durable workflow engine with wait/resume, idempotency, retries +- **@hazeljs/memory** — pluggable user memory (in-memory, Postgres, Redis, Prisma, vector) +- Decorator-based: `@Agent`, `@Tool`, `@Controller`, `@SemanticSearch` +- Supports OpenAI, Anthropic, Ollama + +**Key differentiator:** Full backend framework approach (not just agents), NestJS-inspired architecture. Combines web framework + agent runtime + durable workflows in one stack. + +**Relevance for Electron integration:** +- TypeScript-first, modular npm packages +- Durable flow engine could be useful +- Very early (alpha) — risky for production +- **Confidence: 5/10, Reliability: 4/10** + +--- + +### 1.5 Agentica + +- **URL:** https://github.com/wrtnlabs/agentica +- **npm:** `@agentica/core`, `@agentica/rpc` +- **License:** MIT +- **Source:** [Agentica Docs](https://wrtnlabs.io/agentica/), [GitHub](https://github.com/wrtnlabs/agentica) + +**What it is:** TypeScript framework specialized in LLM Function Calling, enhanced by the TypeScript compiler. By Wrtn Technologies. 
+ +**Architecture highlights:** +- **Compiler-driven development** — constructs function calling schemas automatically from TypeScript types via `typia` +- Auto-converts Swagger/OpenAPI/MCP documents into function calling schemas +- **Validation feedback** — detects and corrects AI mistakes in argument composition +- **Selector agent** — filters candidate functions to minimize context/tokens +- Supports embedded controllers: Google Calendar, GitHub, Reddit, Slack, etc. + +**Key differentiator:** Instead of complex agent graphs/workflows, you just list TypeScript class types or OpenAPI docs, and Agentica handles function calling automatically. The compiler does the heavy lifting. + +**Relevance for Electron integration:** +- MIT license, TypeScript-native +- Interesting approach for auto-generating tool interfaces +- Could be useful for generating agent tool schemas from existing code +- **Confidence: 6/10, Reliability: 5/10** + +--- + +### 1.6 Strands Agents (AWS) + +- **URL:** https://github.com/strands-agents +- **Downloads:** 14M+ total (since May 2025) +- **License:** Open source (Apache 2.0) +- **Source:** [Strands site](https://strandsagents.com/), [AWS Blog](https://aws.amazon.com/blogs/opensource/introducing-strands-agents-an-open-source-ai-agents-sdk/) + +**What it is:** Open source SDK from AWS for building AI agents in Python and TypeScript. Model-driven approach — works with Bedrock, Anthropic, OpenAI, and more. + +**Architecture highlights:** +- TypeScript SDK (preview, December 2025) with full type safety, async/await +- Native tools for AWS service interactions +- Edge device support (sub-100ms latency, ARM/x86, offline with llama.cpp) +- **Steering** — modular prompt mechanism to guide agents mid-execution +- **Evaluations** — validate agent behavior +- Multi-agent patterns: Agent-as-Tool, Swarm + +**Key differentiator:** AWS backing, production-tested at enterprise scale. TypeScript support enables browser/server/Lambda deployment. 
Edge device support is unique. + +**Relevance for Electron integration:** +- TypeScript SDK available +- AWS-heavy ecosystem may add unwanted dependencies +- Good multi-agent patterns (Agent-as-Tool, Swarm) +- **Confidence: 7/10, Reliability: 7/10** + +--- + +### 1.7 OpenAI Agents SDK (TypeScript) + +- **URL:** https://github.com/openai/openai-agents-js +- **Stars:** ~2.1k +- **npm downloads:** ~128k/week +- **License:** MIT +- **Source:** [OpenAI Agents SDK TS](https://openai.github.io/openai-agents-js/) + +**What it is:** Official OpenAI framework for multi-agent workflows and voice agents in TypeScript. + +**Architecture highlights:** +- Agents as tools / Handoffs for cross-agent delegation +- Guardrails for input validation, run in parallel with agent execution +- Function tools with Zod-powered validation and automatic schema generation +- Built-in MCP server tool integration +- TypeScript-first: orchestrate agents using native language features + +**Key differentiator:** Official OpenAI support, lightweight but powerful. Handoff mechanism is well-designed for multi-agent coordination. + +**Relevance for Electron integration:** +- MIT license, pure TypeScript +- Strong typing with Zod +- OpenAI-centric by default; other providers require an adapter layer (primary limitation) +- **Confidence: 8/10, Reliability: 7/10** + +--- + +### 1.8 Google ADK for TypeScript + +- **URL:** https://developers.googleblog.com/introducing-agent-development-kit-for-typescript-build-ai-agents-with-the-power-of-a-code-first-approach/ +- **Stars:** ~581 (December 2025 launch) +- **npm downloads:** ~5k/week +- **License:** Apache 2.0 +- **Source:** [Google Developers Blog](https://developers.googleblog.com/introducing-agent-development-kit-for-typescript-build-ai-agents-with-the-power-of-a-code-first-approach/) + +**What it is:** Google's open-source TypeScript framework for building AI agents and multi-agent systems. Code-first approach.
+ +**Architecture highlights:** +- First-class MCP and A2A protocol support +- Multi-agent coordination +- Code-first TypeScript development + +**Key differentiator:** Google backing, first-class A2A support. Strong protocol-first approach. + +**Relevance for Electron integration:** +- Pure TypeScript, Apache 2.0 +- Still young (December 2025 launch) +- A2A support could be important for future interop +- **Confidence: 6/10, Reliability: 5/10** + +--- + +## 2. Infrastructure & Gateway Layer + +### 2.1 AgentGateway + +- **URL:** https://github.com/agentgateway/agentgateway +- **Stars:** ~2k+ (hit 1M image pulls, 115 contributors) +- **License:** Open source (Linux Foundation) +- **Language:** Rust +- **Source:** [AgentGateway site](https://agentgateway.dev/), [GitHub](https://github.com/agentgateway/agentgateway), [Solo.io Blog](https://www.solo.io/blog/updated-a2a-and-mcp-gateway) + +**What it is:** Next-generation agentic proxy for AI agents and MCP servers. A production-ready gateway for the agentic era, written in Rust. + +**Architecture highlights:** +- **MCP + A2A protocol support** — deep protocol awareness +- **RBAC** — robust role-based access control for MCP/A2A +- **Multi-tenancy** — each tenant with own resources and users +- **Dynamic config via xDS** — no downtime updates +- **Kubernetes-native** — built-in Kubernetes controller via Gateway API +- **LLM routing** — can route traffic to OpenAI, Anthropic, Gemini, Bedrock +- **Legacy API translation** — transforms OpenAPI specs into MCP tools automatically +- **v1.0 released** — production-ready milestone + +**Key differentiator:** The infrastructure layer between agents and their tools/peers. Not an agent framework itself, but the network fabric that makes multi-agent systems work in production. Backed by Solo.io (Envoy/Istio experts), donated to Linux Foundation. 
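The "legacy API translation" bullet above (OpenAPI specs turned into MCP tools) can be made concrete with a reduced sketch. It is not AgentGateway's implementation: `OpenApiOperation` is a hypothetical subset of OpenAPI that only covers simple parameters, and the output follows the general shape of an MCP tool definition (a name, a description, and a JSON Schema for inputs).

```typescript
// Simplified sketch; not AgentGateway's implementation.
// OpenApiOperation is a hypothetical subset of OpenAPI (simple parameters only).
interface OpenApiParam {
  name: string;
  required?: boolean;
  schema: { type: string; description?: string };
}

interface OpenApiOperation {
  operationId: string;
  summary?: string;
  parameters?: OpenApiParam[];
}

// General shape of an MCP tool: name, description, JSON Schema for inputs.
interface McpTool {
  name: string;
  description: string;
  inputSchema: {
    type: "object";
    properties: Record<string, { type: string; description?: string }>;
    required: string[];
  };
}

function operationToTool(op: OpenApiOperation): McpTool {
  const params = op.parameters ?? [];
  const properties: McpTool["inputSchema"]["properties"] = {};
  for (const p of params) {
    properties[p.name] = { ...p.schema };
  }
  return {
    name: op.operationId,
    description: op.summary ?? "",
    inputSchema: {
      type: "object",
      properties,
      // Only parameters explicitly marked required end up in the schema.
      required: params.filter((p) => p.required === true).map((p) => p.name),
    },
  };
}
```

A real translator would also need to handle request bodies, nested schemas, and authentication, which this sketch leaves out.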
+ +**Relevance for Electron integration:** +- Written in Rust — not directly embeddable in Node.js +- Could be used as a sidecar/proxy process alongside Electron +- OpenAPI-to-MCP translation is very useful for tool integration +- **Confidence: 6/10, Reliability: 8/10** + +--- + +### 2.2 MCP Gateway & Registry + +- **URL:** https://github.com/agentic-community/mcp-gateway-registry +- **License:** Open source +- **Source:** [GitHub](https://github.com/agentic-community/mcp-gateway-registry) + +**What it is:** Enterprise-ready MCP Gateway & Registry that centralizes AI development tools with OAuth authentication, dynamic tool discovery, and unified access for AI agents and coding assistants. + +**Architecture highlights:** +- Unified MCP Server Gateway — single access point +- MCP Servers Registry — dynamic tool discovery +- Agent Registry & A2A Communication Hub +- Dual authentication: human user + machine-to-machine agent auth +- Keycloak/Entra integration for enterprise SSO + +**Key differentiator:** Governance layer for MCP servers — transforms "scattered MCP server chaos into governed, auditable tool access." This is the missing middleware between agents and tools. + +**Relevance for Electron integration:** +- Could solve MCP server management for team agents +- OAuth/auth layer would be useful for enterprise deployments +- **Confidence: 5/10, Reliability: 5/10** + +--- + +### 2.3 Invariant Gateway + +- **URL:** https://github.com/invariantlabs-ai/invariant-gateway +- **License:** Open source +- **Source:** [GitHub](https://github.com/invariantlabs-ai/invariant-gateway) + +**What it is:** LLM proxy to observe and debug what AI agents are doing. Supports MCP (stdio, SSE, Streamable HTTP) tool calling. Integrates with LiteLLM. + +**Key differentiator:** Focused on observability and debugging of agent tool calls — complementary to our session analysis features. + +--- + +## 3. 
Durable Execution & Workflow Engines + +### 3.1 Temporal + +- **URL:** https://github.com/temporalio/temporal +- **Stars:** 13k+ +- **Valuation:** $5B (Series D, February 2026, led by a16z) +- **License:** MIT +- **Source:** [Temporal Blog](https://temporal.io/blog/of-course-you-can-build-dynamic-ai-agents-with-temporal), [Temporal A16Z Funding](https://temporal.io/blog/temporal-raises-usd300m-series-d-at-a-usd5b-valuation) + +**What it is:** The foundational durable execution platform. Separates Workflows (orchestration) from Activities (actual work like LLM calls). Agents survive crashes and resume exactly where they left off. + +**Architecture highlights:** +- **Workflow/Activity separation** — deterministic orchestration + non-deterministic LLM calls +- **Event History** — full record of past decisions for crash recovery +- **OpenAI Agents SDK integration** (public preview) — durable agents out of the box +- **PydanticAI integration** — durable Python agents +- **Handles 150k+ actions/second** — battle-tested at scale + +**Enterprise adoption:** OpenAI (Codex runs on Temporal), Replit, Lovable, ADP, Abridge, Washington Post, Block. + +**Key differentiator:** The gold standard for durable execution. If AI agents need to run for hours/days, survive crashes, and handle human-in-the-loop — Temporal is the infrastructure layer that makes it work. 
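The Workflow/Activity separation and Event History described above can be illustrated with a toy replay model: the workflow function is deterministic, and each activity result is recorded so that a restarted workflow reuses recorded results instead of repeating side effects. This is a sketch for intuition only and does not use the Temporal SDK; `WorkflowContext` and `codingWorkflow` are hypothetical names.

```typescript
// Toy model of replay from an event history; this is not the Temporal SDK.
class WorkflowContext {
  private readonly history: string[];
  private cursor = 0;

  constructor(history: string[]) {
    this.history = history;
  }

  // Runs a side-effecting activity once; on replay, returns the recorded result.
  activity(run: () => string): string {
    const recorded = this.history[this.cursor];
    this.cursor++;
    if (recorded !== undefined) return recorded;
    const result = run();
    this.history.push(result);
    return result;
  }
}

// Deterministic orchestration: the same history always yields the same path.
// `effects` counts how often the (simulated) LLM calls actually execute.
function codingWorkflow(ctx: WorkflowContext, effects: { calls: number }): string {
  const plan = ctx.activity(() => {
    effects.calls++;
    return "plan";
  });
  const code = ctx.activity(() => {
    effects.calls++;
    return `${plan}->code`;
  });
  return code;
}
```

Running `codingWorkflow` twice against the same history array executes the activities only the first time; the second run is a pure replay.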
+ +**Relevance for Electron integration:** +- TypeScript SDK available +- Requires a server component (can self-host or use cloud) +- Adds significant operational complexity +- Best for server-side orchestration, not embedded in Electron +- **Confidence: 9/10, Reliability: 10/10** + +--- + +### 3.2 Trigger.dev + +- **URL:** https://github.com/triggerdotdev/trigger.dev +- **Stars:** ~13.9k +- **License:** Apache 2.0 +- **Source:** [Trigger.dev site](https://trigger.dev/), [AI Agents docs](https://trigger.dev/product/ai-agents), [GitHub](https://github.com/triggerdotdev/trigger.dev) + +**What it is:** Platform for building and deploying fully-managed AI agents and workflows. Durable execution with checkpoint-resume (CRIU). + +**Architecture highlights:** +- **Orchestrator pattern** — breaks jobs into smaller tasks, assigns to specialists +- **Realtime streaming** — live status updates, LLM response streaming to frontend +- **Vercel AI SDK integration** — `ai.tool` creates tools from tasks +- **MCP Server** — interact with projects from Claude Code, Cursor, etc. +- **batch.triggerByTaskAndWait** — efficient parallel coordination +- **Elastic infrastructure** — auto-scaling, concurrency control + +**Key differentiator:** Durable execution + realtime streaming + MCP server. The MCP server integration means agents in our app could trigger/monitor Trigger.dev tasks. 
+ +**Relevance for Electron integration:** +- TypeScript-native +- Server-side platform (not embeddable in Electron directly) +- Good as external orchestration backend +- MCP integration is a natural bridge +- **Confidence: 7/10, Reliability: 8/10** + +--- + +### 3.3 Hatchet + +- **URL:** https://github.com/hatchet-dev/hatchet +- **Stars:** ~4.5k+ +- **License:** MIT +- **SDKs:** Python, TypeScript, Golang +- **Source:** [Hatchet site](https://hatchet.run/), [Docs](https://docs.hatchet.run/v1), [GitHub](https://github.com/hatchet-dev/hatchet) + +**What it is:** Open-source platform for AI agent orchestration, background tasks, and mission-critical workflows. YC W24. + +**Architecture highlights:** +- General-purpose: queue + DAG orchestrator + durable execution engine +- **AI agent primitives** — retries, parallel tool calls, state management, guardrails +- **Fairness** — distributes requests fairly, prevents busy-user overwhelm +- **Concurrency control** — FIFO, LIFO, Round Robin, Priority Queues +- **Human-in-the-loop** — eventing for signaling and streaming +- Built on PostgreSQL — simple self-hosting +- Web UI for monitoring + +**Key differentiator:** Lower operational overhead than Temporal (just PostgreSQL), while providing similar durable execution guarantees. The fairness and concurrency controls are specifically designed for AI agent workloads. + +**Relevance for Electron integration:** +- TypeScript SDK available +- Simpler to self-host than Temporal +- Could be bundled with Electron app (just needs PostgreSQL) +- **Confidence: 7/10, Reliability: 7/10** + +--- + +### 3.4 Windmill + +- **URL:** https://github.com/windmill-labs/windmill +- **Stars:** ~13k+ +- **License:** AGPLv3 +- **Source:** [Windmill site](https://www.windmill.dev/), [AI Agents Blog](https://www.windmill.dev/blog/ai-agents) + +**What it is:** Open-source developer platform for building internal tools, workflows, and automations. Supports 20+ languages including TypeScript (Bun runtime). 
+ +**Architecture highlights:** +- **AI Agent Steps** — any Windmill script becomes a tool the AI agent can invoke +- **Automatic tool definitions** — JSON schema from scripts becomes agent tool definitions +- **Multi-language tools** — Python, TypeScript, Go, Rust, PHP, Bash, SQL, etc. +- **MCP integration** — agents connect to external MCP servers +- **Visual DAG editor** + workflows-as-code (Python/TypeScript) +- **~50ms added latency** — very performant + +**Key differentiator:** Any script in any language automatically becomes an agent tool. The "scripts as tools" approach is uniquely pragmatic — no separate tool registration needed. + +**Relevance for Electron integration:** +- AGPLv3 license (restrictive for embedding) +- Docker-based deployment +- Better as external orchestration service +- **Confidence: 6/10, Reliability: 7/10** + +--- + +## 4. Visual & Low-Code Agent Builders + +### 4.1 Dify + +- **URL:** https://github.com/langgenius/dify +- **Stars:** ~129.8k (among the most-starred AI agent platforms on GitHub) +- **License:** Apache 2.0 (core) +- **Source:** [Dify site](https://dify.ai/), [GitHub](https://github.com/langgenius/dify), [Medium](https://medium.com/@gptproto.official/dify-the-open-source-standard-for-ai-orchestration-777a7bae3bb4) + +**What it is:** Open-source LLM app development platform with visual workflow builder, RAG pipeline, agent capabilities, and model management. + +**Architecture highlights:** +- **Visual canvas** for building AI workflows +- **Hundreds of LLM integrations** — any OpenAI-compatible model +- **50+ built-in tools** for agents +- **MCP integration** — supports HTTP-based MCP services (protocol 2025-03-26) +- Can turn Dify workflows/agents into MCP servers +- **Backend-as-a-Service** — all features via REST API +- 180k+ developers, 59k+ end users + +**Key differentiator:** One of the most-starred open-source agent platforms (n8n in 4.2 has more stars but is a general-purpose automation tool). Strong visual workflow editor.
Can expose workflows as MCP servers — meaning our app could consume Dify workflows as tools. + +**Relevance for Electron integration:** +- Python/Docker backend — not embeddable in Electron +- REST API could be consumed from our Electron app +- MCP server mode is very interesting for integration +- **Confidence: 7/10, Reliability: 8/10** + +--- + +### 4.2 n8n + +- **URL:** https://github.com/n8n-io/n8n +- **Stars:** ~180.7k +- **License:** Fair-code (Sustainable Use License) +- **Source:** [n8n site](https://n8n.io/), [AI Agents](https://n8n.io/ai-agents/), [GitHub](https://github.com/n8n-io/n8n) + +**What it is:** Fair-code workflow automation platform with native AI capabilities. 400+ integrations, visual builder + code. + +**Architecture highlights:** +- **AI Agent node** — connects to LLMs, integrates with tools +- **MCP Server** — call n8n workflows from other AI systems +- **Human-in-the-loop** — approval at any workflow point +- **Multi-agent & RAG support** +- Full observability: inspect prompts, responses, execution flow + +**Limitations:** Lacks persistent memory, autonomous planning, and dynamic decision-making. Better for structured tasks than truly autonomous agents. + +**Relevance for Electron integration:** +- TypeScript-based (Node.js) +- Could theoretically be embedded, but it's a full platform +- Fair-code license may be restrictive +- Better as external orchestration service consumed via MCP +- **Confidence: 6/10, Reliability: 7/10** + +--- + +### 4.3 Rivet + +- **URL:** https://github.com/Ironclad/rivet +- **Stars:** ~3.9k +- **License:** Open source +- **Source:** [Rivet site](https://rivet.ironcladapp.com/), [GitHub](https://github.com/Ironclad/rivet) + +**What it is:** Visual AI programming environment for building AI agents with LLMs. By Ironclad. Desktop app + TypeScript runtime library. 
+ +**Architecture highlights:** +- **Node-based visual editor** — drag-and-drop AI chains +- **Real-time debugging** — watch graph execute step-by-step, remote debugging +- **Graph nesting** — modular, reusable components +- **Graphs as YAML** — version control, code review +- **TypeScript runtime library** (`rivet-core`) — run graphs programmatically +- **`rivet serve`** — expose any graph as HTTP endpoint +- **Plugin ecosystem** — Anthropic, HuggingFace, MongoDB plugins + +**Key differentiator:** Desktop Electron app with visual AI chain builder + TypeScript runtime. The "graphs as YAML + TypeScript execution" approach is very relevant — could potentially embed Rivet's runtime in our app. + +**Relevance for Electron integration:** +- TypeScript runtime library for programmatic execution +- Already built as an Electron app — proven pattern +- YAML-based graph definitions could be stored/versioned +- Plugin architecture for extensibility +- **Confidence: 7/10, Reliability: 6/10** + +--- + +## 5. 
Protocol Standards & Ecosystem + +### 5.1 Protocol Landscape (2026) + +The AI agent ecosystem has converged on a layered protocol stack: + +| Protocol | Owner | Focus | Scope | +|----------|-------|-------|-------| +| **MCP** (Model Context Protocol) | Anthropic / AAIF | Agent-to-Tool | Tool access, context | +| **A2A** (Agent2Agent Protocol) | Google / AAIF | Agent-to-Agent | Task delegation | +| **ACP** (Agent Communication Protocol) | IBM BeeAI / LF | Agent Communication | REST-based, merged into A2A Aug 2025 | +| **AG-UI** (Agent-User Interaction Protocol) | Community | Agent-to-User | Real-time interactivity | +| **AGNTCY** | Cisco / LF | Agent Infrastructure | Discovery, identity, security | + +**Sources:** [DEV.to MCP vs A2A](https://dev.to/pockit_tools/mcp-vs-a2a-the-complete-guide-to-ai-agent-protocols-in-2026-30li), [Agentic AI Foundation](https://intuitionlabs.ai/articles/agentic-ai-foundation-open-standards), [Pento MCP Review](https://www.pento.ai/blog/a-year-of-mcp-2025-review) + +**Key facts (March 2026):** +- MCP: 97M+ monthly SDK downloads (Python + TypeScript combined) +- AAIF (Agentic AI Foundation): Co-founded by OpenAI, Anthropic, Google, Microsoft, AWS, Block — hosts both MCP and A2A +- TypeScript MCP SDK: v1.27.1 (March 2026) +- A2A Agent Cards: `/.well-known/agent.json` for discovery +- Consensus architecture: MCP for tools, A2A for agents, AG-UI for humans + +**Key insight for our product:** "If your agents are all within the same organization, running in the same infrastructure — you don't need A2A. Use simpler orchestration. A2A's overhead isn't justified for single-org setups." 
([Source](https://dev.to/pockit_tools/mcp-vs-a2a-the-complete-guide-to-ai-agent-protocols-in-2026-30li)) + +--- + +### 5.2 Semantic Router (Aurelio AI) + +- **URL:** https://github.com/aurelio-labs/semantic-router +- **License:** MIT +- **Language:** Python +- **Source:** [Aurelio AI](https://www.aurelio.ai/semantic-router), [GitHub](https://github.com/aurelio-labs/semantic-router) + +**What it is:** Superfast decision-making layer for LLMs and agents. Routes requests using semantic vector space instead of slow LLM calls. + +**Key capability:** Tool selection, guardrails, intent routing — all without LLM calls. Scales to thousands of tools. + +### 5.3 vLLM Semantic Router + +- **URL:** https://github.com/vllm-project/semantic-router +- **License:** Open source +- **Language:** Rust +- **Source:** [vLLM Blog](https://blog.vllm.ai/2026/01/05/vllm-sr-iris.html), [Red Hat](https://developers.redhat.com/articles/2025/09/11/vllm-semantic-router-improving-efficiency-ai-reasoning) + +**What it is:** System-level intelligent router for Mixture-of-Models. Routes queries to the best model based on complexity analysis. + +**v0.1 "Iris" release (January 2026):** Production-ready, 600+ PRs merged, 300+ issues, 50+ engineers. Supports OpenAI Responses API with conversation state for intelligent routing in multi-turn agent apps. + +**Key stats:** +10.2% accuracy on complex tasks, -47.1% latency, -48.5% token usage. + +--- + +## 6. Coding Agent Fleet Managers + +### 6.1 Angy + +- **URL:** Product Hunt (recent launch, ~1 week ago) +- **License:** Open source +- **Source:** [Product Hunt](https://www.producthunt.com/products/angy) + +**What it is:** Open-source fleet manager and IDE for Claude Code. Orchestrates a deterministic multi-phase pipeline (Plan -> Build -> Test) with adversarial verification. 
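
A multi-phase pipeline with an adversarial check can be sketched in a few lines of TypeScript. This is a minimal sketch, not Angy's implementation: `PipelinePhase`, `Verifier`, and `runPipeline` are hypothetical names, and each phase is a plain synchronous function standing in for an agent run.

```typescript
// Hypothetical sketch: phases run in a fixed order, and a separate
// "counterpart" verifier must accept each phase's output before the
// pipeline moves on. Names and shapes are illustrative only.
interface PipelinePhase {
  id: string;
  run: (input: string) => string;
}

interface PipelineOutcome {
  completed: string[];        // ids of phases whose output was accepted
  rejectedAt: string | null;  // id of the phase the verifier rejected, if any
  output: string;             // last accepted output
}

type Verifier = (phaseId: string, output: string) => boolean;

function runPipeline(
  phases: PipelinePhase[],
  verify: Verifier,
  initialInput: string
): PipelineOutcome {
  const completed: string[] = [];
  let current = initialInput;
  for (const phase of phases) {
    const candidate = phase.run(current);
    if (!verify(phase.id, candidate)) {
      // Stop at the first rejection; later phases never see unverified work.
      return { completed, rejectedAt: phase.id, output: current };
    }
    completed.push(phase.id);
    current = candidate;
  }
  return { completed, rejectedAt: null, output: current };
}

const demoPhases: PipelinePhase[] = [
  { id: "plan", run: (s) => s + " -> planned" },
  { id: "build", run: (s) => s + " -> built" },
  { id: "test", run: (s) => s + " -> tested" },
];

// A toy verifier that only rejects empty output.
const demoOutcome = runPipeline(demoPhases, (_id, out) => out.length > 0, "epic-1");
console.log(demoOutcome.completed.join(", "));
```

Returning the last accepted output on rejection keeps the pipeline deterministic: a rerun can resume from the rejected phase without redoing verified work.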
+ +**Architecture:** +- **Adversarial Counterpart agent** that strictly verifies code +- **Git worktree isolation** for parallel agent execution +- **Scheduler** for running epics overnight +- **Multi-phase pipeline:** Architect -> Counterpart -> Build -> Test +- Self-bootstrapped after one day of initial work + +--- + +### 6.2 GitHub Agent HQ + +- **URL:** https://github.blog/news-insights/company-news/welcome-home-agents/ +- **Source:** [GitHub Blog](https://github.blog/news-insights/company-news/welcome-home-agents/), [Eficode](https://www.eficode.com/blog/why-github-agent-hq-matters-for-engineering-teams-in-2026) + +**What it is:** GitHub's platform for orchestrating AI agent fleets. Multi-agent support with Claude Code, Codex, and custom agents. + +**Architecture:** +- **Mission Control** — unified command center across GitHub, VS Code, mobile, CLI +- **Fleet of specialized agents** — security, testing, refactoring specialists +- **Multi-vendor:** Anthropic, OpenAI, Google, Cognition, xAI +- **Governance controls** — branch controls, identity, agent access policies +- **Squad** — coordinated AI teams inside repositories + +--- + +### 6.3 Hephaestus + +- **URL:** https://github.com/Ido-Levi/Hephaestus +- **License:** Open source (alpha) +- **Source:** [GitHub](https://github.com/Ido-Levi/Hephaestus), [HN](https://news.ycombinator.com/item?id=45796897) + +**What it is:** Semi-structured agentic framework where workflows build themselves as agents discover what needs to be done. + +**Architecture:** +- Define phase types (Analyze -> Implement -> Test), agents dynamically create tasks +- **Ticket-based coordination** — tickets flow through workflow carrying context +- **Guardian system** — LLM-powered coherence scoring for alignment checking +- **Parallel agents** in isolated Claude Code sessions +- Real-time observability + +**Key differentiator:** Emergent workflows — agents discover tasks rather than following predefined plans. 
Interesting alternative to rigid kanban task assignment. + +--- + +### 6.4 KAOS (Kubernetes Agent Orchestration System) + +- **URL:** https://github.com/axsaucedo/kaos +- **License:** Open source +- **Source:** [GitHub](https://github.com/axsaucedo/kaos), [HN](https://news.ycombinator.com/item?id=46688521) + +**What it is:** Kubernetes-native framework for deploying and orchestrating AI agents at scale. + +**Architecture:** +- **Golang control plane** — manages Agentic CRDs (Custom Resource Definitions) +- **Python data plane** — implements A2A, memory, tool/model management +- **React UI** — CRUD + debugging +- **PAIS** — enterprise wrapper for Pydantic AI with OpenAI-compatible HTTP API +- **A2A discovery** built in +- **OpenTelemetry** instrumentation + +**Key differentiator:** Kubernetes-native multi-agent system for hundreds/thousands of services. Production infrastructure approach. + +--- + +## 7. Python-First Frameworks (with TS relevance) + +### 7.1 BeeAI Framework (IBM) + +- **URL:** https://github.com/i-am-bee/beeai-framework +- **Stars:** 3k+ +- **License:** Open source (Linux Foundation governance) +- **Source:** [IBM Think](https://www.ibm.com/think/news/beeai-open-source-multiagent), [BeeAI Docs](https://framework.beeai.dev/) + +**What it is:** IBM's open-source framework for production-grade multi-agent systems. **Dual language: Python AND TypeScript with complete feature parity.** + +**Architecture:** +- 10+ LLM providers including Ollama, OpenAI, Watsonx.ai +- **MCP tool integration** +- **A2A protocol support** (ACP merged into A2A) +- **Agent Stack** — framework-agnostic deployment (BeeAI, LangGraph, CrewAI, custom) +- Built-in constraint enforcement and rule-based governance +- Each agent runs in its own container with resource limits +- OpenTelemetry observability + +**Key differentiator:** TypeScript with feature parity is rare among IBM projects. Linux Foundation governance ensures long-term stability. 
The Agent Stack deploy layer is uniquely framework-agnostic. + +**Relevance for Electron integration:** +- TypeScript SDK with full feature parity +- Framework-agnostic Agent Stack could deploy any agent +- MCP + A2A support aligns with protocol trends +- **Confidence: 7/10, Reliability: 7/10** + +--- + +### 7.2 Letta (formerly MemGPT) + +- **URL:** https://github.com/letta-ai/letta +- **Stars:** 16.2k+ +- **License:** Open source +- **Source:** [Letta site](https://www.letta.com/), [GitHub](https://github.com/letta-ai/letta) + +**What it is:** Platform for stateful agents with advanced memory that learn and self-improve over time. + +**Architecture:** +- **Self-editing memory** — agents manage their own memory blocks +- **Sleep-time compute** — agents "think" during downtime, rewrite memory +- **Skill learning** — agents learn new skills from experience +- **Letta Code** — #1 model-agnostic open source agent on Terminal-Bench +- **REST API + TypeScript SDK** +- Model-agnostic: OpenAI, Anthropic, local models + +**Key differentiator:** Memory-first architecture is unique. Sleep-time compute and skill learning are research-frontier features. TypeScript SDK available. + +**Relevance for Electron integration:** +- TypeScript SDK for client-side integration +- REST API for server-side +- Memory architecture could inform our agent context management +- **Confidence: 7/10, Reliability: 6/10** + +--- + +### 7.3 CAMEL-AI + +- **URL:** https://github.com/camel-ai/camel +- **Stars:** Growing (active research community) +- **License:** Apache 2.0 (code), CC BY NC 4.0 (datasets) +- **Source:** [CAMEL-AI site](https://www.camel-ai.org/), [GitHub](https://github.com/camel-ai/camel) + +**What it is:** The first open-source multi-agent framework, focused on dialog-driven collaboration and scaling laws of agents. 
+ +**Architecture:** +- **Role-based agents** — structured conversations between assigned roles +- **OWL** — Optimized Workforce Learning, #1 on GAIA benchmark (69.09%) +- **OASIS** — simulations with 1M agents +- **MCPify** — project for MCP integration +- Accepted at NeurIPS 2025 + +**Key differentiator:** Research-first approach focused on scaling laws of multi-agent systems. OWL's GAIA benchmark performance is state-of-the-art. Python only. + +--- + +### 7.4 Julep AI + +- **URL:** https://github.com/julep-ai/julep +- **License:** Open source +- **Source:** [Julep site](https://julep.ai/), [GitHub](https://github.com/julep-ai/julep), [Temporal Blog](https://temporal.io/blog/julep-ai-future-ai-workflows) + +**What it is:** "Firebase for AI agents" — serverless platform for multi-step AI workflows. Persistent memory, modular workflows (YAML or code), built-in retries. + +**Status:** Hosted backend shut down December 31, 2025. Open-source self-hosting available. Team pivoted to **memory.store**. + +**Note:** Python and Node.js SDKs available, but future unclear given the pivot. + +--- + +### 7.5 ChatDev 2.0 + +- **URL:** https://github.com/OpenBMB/ChatDev +- **License:** Apache 2.0 +- **Source:** [GitHub](https://github.com/OpenBMB/ChatDev), [IBM](https://www.ibm.com/think/topics/chatdev) + +**What it is:** Zero-code multi-agent orchestration platform simulating a virtual software company. ChatDev 2.0 (January 2026) transforms rigid structures into flexible workflow systems. + +**Architecture:** +- **Visual canvas (Workflow)** — drag-and-drop multi-agent system design +- **Python SDK** (PyPI: chatdev) — run YAML workflows in Python +- **MacNet** — multi-agent collaboration networks for complex topologies +- **Puppeteer** — dynamic orchestration with RL-optimized agent sequencing +- FastAPI backend + Vue 3 frontend + +**Key differentiator:** NeurIPS 2025 accepted research, zero-code visual approach, software company simulation metaphor. Python + Vue only. 
+ +--- + +### 7.6 Haystack (deepset) + +- **URL:** https://github.com/deepset-ai/haystack +- **Stars:** High (enterprise adoption: Airbus, NVIDIA, Comcast) +- **License:** Apache 2.0 +- **Source:** [Haystack site](https://haystack.deepset.ai/), [Haystack Docs](https://docs.haystack.deepset.ai/docs/agents) + +**What it is:** Open-source AI orchestration framework for production-ready LLM applications. Modular pipelines + agent workflows. + +**Architecture:** +- **Context engineering** — explicit control over retrieval, ranking, filtering, routing +- **Universal Agent** component with Chat Generator + tools +- **ComponentTool** — wrap any Haystack component as a callable tool +- **@tool decorator** — create tools from Python functions +- **Hayhooks** — expose pipelines/agents via HTTP or MCP +- **AgentSnapshot** — stepwise debugging with breakpoints +- Model-agnostic: OpenAI, Anthropic, Cohere, HuggingFace, Azure, Bedrock +- Latest: v2.25 (March 2026) + +**Key differentiator:** Enterprise-grade, context-engineering focused. The MCP exposure via Hayhooks means our app could consume Haystack agents as tools. + +--- + +### 7.7 ControlFlow (Prefect) -> Marvin + +- **URL:** https://github.com/PrefectHQ/ControlFlow (archived) +- **License:** Apache 2.0 +- **Source:** [Prefect Blog](https://www.prefect.io/blog/controlflow-intro) + +**What it is:** Task-centric AI workflow framework built on Prefect 3.0. **Archived** — merged into Marvin framework. + +**Key ideas (preserved in Marvin):** +- Tasks, Agents, Flows as core abstractions +- "AI agents are most effective when applied to small, well-defined tasks" +- Multi-agent collaboration strategies: Round-robin, Random, Moderated +- Every flow is a Prefect flow — full orchestration + observability + +--- + +## 8. 
Summary Matrix + +| Tool | Language | Stars | License | MCP | A2A | Multi-Agent | Electron-Ready | Maturity | +|------|----------|-------|---------|-----|-----|-------------|----------------|----------| +| **Mastra** | TypeScript | 22.3k | Apache 2.0 | Yes | -- | Yes | **Native** | Production | +| **Inngest AgentKit** | TypeScript | 793 | Apache 2.0 | Yes | -- | Yes (Networks) | **Native** | Beta | +| **VoltAgent** | TypeScript | 5.1k | MIT | Yes | -- | Yes (Chain API) | **Native** | Early | +| **HazelJS** | TypeScript | Small | Apache 2.0 | -- | -- | Yes (AgentGraph) | **Native** | Alpha | +| **Agentica** | TypeScript | Small | MIT | Yes | -- | No | **Native** | Beta | +| **Strands (AWS)** | Python+TS | 14M DL | Apache 2.0 | Yes | -- | Yes (Swarm) | TS SDK | Preview | +| **OpenAI Agents SDK** | TypeScript | 2.1k | MIT | Yes | -- | Yes (Handoffs) | **Native** | GA | +| **Google ADK TS** | TypeScript | 581 | Apache 2.0 | Yes | Yes | Yes | **Native** | Early | +| **BeeAI** | Python+TS | 3k | Open (LF) | Yes | Yes | Yes | TS SDK | Production | +| **AgentGateway** | Rust | 2k | Open (LF) | Yes | Yes | -- (infra) | Sidecar | v1.0 | +| **Temporal** | Multi | 13k | MIT | -- | -- | -- (infra) | TS SDK | Production | +| **Trigger.dev** | TypeScript | 13.9k | Apache 2.0 | Yes | -- | Yes | Server-side | v4 | +| **Hatchet** | Multi | 4.5k | MIT | -- | -- | -- (infra) | TS SDK | Production | +| **Dify** | Python | 129.8k | Apache 2.0 | Yes | -- | Yes | REST API | Production | +| **n8n** | TypeScript | 180.7k | Fair-code | Yes | -- | Yes (basic) | Heavy | Production | +| **Rivet** | TypeScript | 3.9k | Open | -- | -- | -- | **Electron app** | v4.1 | +| **Letta** | Python+TS | 16.2k | Open | -- | -- | -- | TS SDK | Production | +| **CAMEL-AI** | Python | Growing | Apache 2.0 | -- | -- | Yes | -- | Research | +| **ChatDev 2.0** | Python | Growing | Apache 2.0 | -- | -- | Yes | -- | v2.0 | +| **Haystack** | Python | High | Apache 2.0 | Yes | -- | Yes | REST/MCP | v2.25 | + +--- + 
+## 9. Recommendations for Claude Agent Teams UI + +### Tier 1: Most Relevant for Integration (TypeScript-native, embeddable) + +1. **Mastra** — The most mature TS agent framework. Could serve as orchestration backend for agent workflows, multi-model routing, and memory management. Proven at scale (Replit, PayPal). + +2. **Inngest AgentKit** — Lightweight multi-agent networks with durable execution. The Agent -> Network -> Router -> State model maps well to our team/agent/task architecture. + +3. **OpenAI Agents SDK (TS)** — If we want to support OpenAI models natively. Handoff mechanism is clean for agent-to-agent delegation. + +4. **VoltAgent** — Observability-first approach complements our session analysis. Chain API for multi-agent workflows is well-designed. + +### Tier 2: Protocol & Infrastructure Integration + +5. **AgentGateway** — Could be bundled as a sidecar process. Handles MCP/A2A protocol routing, OpenAPI-to-MCP translation, multi-tenancy. + +6. **MCP Gateway Registry** — Solves MCP server governance for enterprise deployments. + +7. **Rivet** — TypeScript runtime library for visual AI chain execution. Already an Electron app. + +### Tier 3: External Services (consume via API/MCP) + +8. **Dify** — Expose visual workflows as MCP servers that our app consumes. +9. **Trigger.dev** — Durable execution backend via MCP server integration. +10. **Hatchet** — Lightweight durable execution (just PostgreSQL). 
+ +### Key Architectural Insight + +The emerging pattern for 2026 is a **layered architecture**: +- **Protocol layer:** MCP (tools) + A2A (agents) + AG-UI (humans) +- **Execution layer:** Durable workflows (Temporal/Hatchet/Inngest) +- **Agent layer:** Framework-specific (Mastra/AgentKit/custom) +- **Orchestration layer:** Fleet management (our kanban board / Agent HQ / Hephaestus) +- **Gateway layer:** AgentGateway for routing, security, observability + +Our product (Claude Agent Teams UI) sits at the **orchestration layer** — the kanban-based fleet management interface. The key opportunity is to become framework-agnostic by integrating with the protocol layer (MCP/A2A) and supporting multiple agent frameworks underneath. + +### Unique Competitive Advantages We Have + +Based on this research, no tool combines ALL of: +1. Kanban-based task management (visual orchestration) +2. Multi-agent team coordination with real-time communication +3. Code review (diff view) per task +4. Deep session analysis (bash commands, reasoning, tokens) +5. Desktop-native (Electron) with zero-setup + +The closest competitors are GitHub Agent HQ (platform-level, not desktop) and Angy (fleet manager, but IDE-focused not kanban). Our kanban + code review + session analysis combination remains unique. diff --git a/docs/research/ai-orchestration-tools.md b/docs/research/ai-orchestration-tools.md new file mode 100644 index 00000000..9928836b --- /dev/null +++ b/docs/research/ai-orchestration-tools.md @@ -0,0 +1,550 @@ +# AI Agent Orchestration Tools & Frameworks (March 2026) + +> Research date: 2026-03-24 +> Focus: Multi-provider AI coding agent orchestration — tools that coordinate Claude Code, Codex CLI, Gemini CLI, and other AI agents together. + +## Executive Summary + +The multi-agent AI orchestration market has exploded in 2025-2026. Gartner reports a **1,445% surge** in multi-agent system inquiries from Q1 2024 to Q2 2025. 
The AI agent market reached **$7.84B in 2025**, projected to hit **$52.62B by 2030** (CAGR 46.3%). + +The landscape splits into three distinct categories: +1. **Desktop orchestrators** — Electron/Tauri apps managing parallel coding agents with kanban boards, diff viewers, git worktree isolation +2. **CLI/framework orchestrators** — Command-line tools and Python/TypeScript frameworks for multi-agent coordination +3. **General-purpose multi-agent frameworks** — Provider-agnostic frameworks for building any multi-agent system (not coding-specific) + +**Key finding for our project:** Multiple direct competitors have emerged with kanban boards + multi-agent orchestration (Vibe Kanban, Dorothy, Mozzie). However, none combine all of: multi-provider agent support + kanban + code review + team communication + Electron desktop app in the way Claude Agent Teams UI does. + +--- + +## Category 1: Desktop Orchestrators (Most Relevant to Our Project) + +### 1.1 Vibe Kanban (BloopAI) + +| Attribute | Details | +|-----------|---------| +| **URL** | [github.com/BloopAI/vibe-kanban](https://github.com/BloopAI/vibe-kanban) | +| **Stars** | ~23,700 | +| **License** | Open source (free) | +| **Tech Stack** | Rust (backend) + TypeScript/React (frontend) | +| **AI Providers** | Claude Code, Codex, Gemini CLI, GitHub Copilot, Amp, Cursor, OpenCode, Droid, CCR, Qwen Code (10+) | +| **Reliability** | 8/10 | +| **Confidence** | 9/10 | + +**Architecture:** Cross-platform orchestration platform (CLI + web UI) with kanban board. Each agent gets its own git worktree and branch. Implements MCP both as client and server — the kanban board itself becomes an API for AI agents. 
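
The worktree-per-agent isolation described above can be sketched as a function that derives a branch, a directory, and the `git worktree add` arguments for each task. This is a minimal sketch under assumptions: the `agent/<task-id>` branch scheme, the `.worktrees` directory, and the helper names are hypothetical, not Vibe Kanban's actual layout.

```typescript
interface WorktreePlan {
  branch: string;
  path: string;
  gitArgs: string[]; // arguments to pass to the `git` executable
}

// Turn a free-form task title into a safe branch/directory fragment.
function slugify(title: string): string {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

function planWorktree(repoRoot: string, taskId: string, title: string): WorktreePlan {
  const slug = slugify(title);
  const branch = "agent/" + taskId + "-" + slug;
  const path = repoRoot + "/.worktrees/" + taskId + "-" + slug;
  // `git worktree add -b <branch> <path>` creates the branch and checks it
  // out in a separate directory, so parallel agents never share a working tree.
  return { branch, path, gitArgs: ["worktree", "add", "-b", branch, path] };
}

const worktreePlan = planWorktree("/repo", "T42", "Fix login bug!");
console.log("git " + worktreePlan.gitArgs.join(" "));
```

Keeping the planning step pure (no process spawning) makes it easy to unit-test; the caller can hand `gitArgs` to whatever process runner the app already uses.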
+ +**Key features:** +- Kanban board with drag-and-drop task management +- Parallel agent execution in isolated workspaces +- Built-in diff review with inline comments +- Built-in browser preview with devtools +- MCP server — other agents can create tasks, move cards, read board status +- PR creation and merge from UI +- Install via `npx vibe-kanban` + +**Relevance to us:** **DIRECT COMPETITOR.** Has kanban + multi-agent + diff review. Key differences: no team communication/messaging between agents, no session analysis, no context monitoring. Uses Rust backend (not Electron). + +--- + +### 1.2 Dorothy + +| Attribute | Details | +|-----------|---------| +| **URL** | [github.com/Charlie85270/Dorothy](https://github.com/Charlie85270/Dorothy) | +| **Website** | [dorothyai.app](https://dorothyai.app/) | +| **License** | Open source | +| **Tech Stack** | Electron + React/Next.js | +| **AI Providers** | Claude Code, Codex, Gemini CLI | +| **Reliability** | 7/10 | +| **Confidence** | 8/10 | + +**Architecture:** Electron desktop app with isolated PTY terminal sessions per agent. Features a "Super Agent" orchestrator that programmatically controls all other agents via MCP tools. + +**Key features:** +- Kanban board with drag-and-drop, agents auto-pick work by skill +- 5 MCP servers (40+ tools) for programmatic agent control +- Super Agent meta-orchestrator that delegates across agent pool +- GitHub, JIRA, Telegram, Slack integrations +- Google Workspace integration (Gmail, Drive, Sheets, Calendar) +- Community skill plugins from skills.sh +- 3D animated agent visualization +- Agent automations (trigger on GitHub PRs, issues, events) +- Scheduling and recurring agent tasks + +**Relevance to us:** **DIRECT COMPETITOR.** Electron + kanban + multi-agent + MCP. Most similar to our architecture. Lacks: team-level communication, deep session analysis, context token tracking, structured code review workflow. 
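
The "agents auto-pick work by skill" behaviour listed above can be approximated with a greedy matcher. This is a sketch only: the `SkilledAgent` and `BoardTask` shapes and the first-match policy are assumptions for illustration, not Dorothy's actual scheduling logic.

```typescript
interface SkilledAgent {
  id: string;
  skills: string[];
  busy: boolean;
}

interface BoardTask {
  id: string;
  requiredSkill: string;
}

interface Assignment {
  taskId: string;
  agentId: string;
}

// Greedy matching: walk the backlog in order and give each task to the
// first idle agent that advertises the required skill.
function assignBySkill(agents: SkilledAgent[], backlog: BoardTask[]): Assignment[] {
  const taken: Record<string, boolean> = {};
  const assignments: Assignment[] = [];
  for (const task of backlog) {
    for (const agent of agents) {
      const idle = !agent.busy && !taken[agent.id];
      if (idle && agent.skills.indexOf(task.requiredSkill) !== -1) {
        taken[agent.id] = true;
        assignments.push({ taskId: task.id, agentId: agent.id });
        break;
      }
    }
  }
  return assignments;
}

const demoAssignments = assignBySkill(
  [
    { id: "claude-1", skills: ["frontend", "tests"], busy: false },
    { id: "codex-1", skills: ["backend"], busy: false },
    { id: "gemini-1", skills: ["frontend"], busy: true },
  ],
  [
    { id: "card-1", requiredSkill: "backend" },
    { id: "card-2", requiredSkill: "frontend" },
    { id: "card-3", requiredSkill: "frontend" },
  ]
);
console.log(JSON.stringify(demoAssignments));
```

In this example `card-3` stays unassigned because the only other frontend-capable agent is busy; a real board would leave it in the backlog until an agent frees up.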
+ +--- + +### 1.3 Superset + +| Attribute | Details | +|-----------|---------| +| **URL** | [github.com/superset-sh/superset](https://github.com/superset-sh/superset) | +| **Website** | [superset.sh](https://superset.sh/) | +| **Stars** | ~7,800 | +| **License** | Elastic License 2.0 (ELv2) — NOT MIT/Apache | +| **Tech Stack** | Electron + React + xterm.js + TailwindCSS v4, Bun + Turborepo | +| **AI Providers** | Claude Code, Codex, OpenCode, Cursor Agent — any CLI agent | +| **Reliability** | 7/10 | +| **Confidence** | 8/10 | + +**Architecture:** Electron desktop terminal environment. Each task gets its own git worktree. Built-in diff viewer and editor. Same terminal stack as VS Code (xterm.js). + +**Key features:** +- Run 10+ agents simultaneously +- Git worktree isolation per task +- Built-in diff viewer +- Workspace presets (automate env setup, deps) +- One-click open in external IDE +- Agent status monitoring and notifications + +**Relevance to us:** Competitor in the parallel-agent-desktop space. Less feature-rich (no kanban, no team messaging, no code review workflow). More of a "terminal multiplexer for agents" than a full management platform. + +--- + +### 1.4 Mozzie + +| Attribute | Details | +|-----------|---------| +| **URL** | [github.com/usemozzie/mozzie](https://github.com/usemozzie/mozzie) | +| **License** | Open source | +| **Tech Stack** | Tauri (Rust) + Node + pnpm | +| **AI Providers** | Claude Code, Gemini CLI, Codex CLI, custom scripts | +| **Reliability** | 6/10 | +| **Confidence** | 7/10 | + +**Architecture:** Tauri desktop app with LLM orchestrator. Agents communicate via ACP (Agent Client Protocol) over stdio. Persistent orchestrator conversation history. 
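
Stdio-based agent protocols of this kind typically frame messages as newline-delimited JSON. The sketch below assumes JSON-RPC 2.0 messages, one per line; the `session/prompt` method name and the params shape are illustrative placeholders rather than a verified part of Mozzie's wire format, and `RpcRequest`, `encodeLine`, and `LineDecoder` are hypothetical names.

```typescript
interface RpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params: unknown;
}

// One message per line: serialize and terminate with "\n".
function encodeLine(req: RpcRequest): string {
  return JSON.stringify(req) + "\n";
}

// Stdout arrives in arbitrary chunks, so keep the trailing partial line
// in a buffer until its newline shows up.
class LineDecoder {
  private buffer = "";

  push(chunk: string): unknown[] {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    this.buffer = lines.pop() || "";
    const messages: unknown[] = [];
    for (const line of lines) {
      if (line.trim().length > 0) {
        messages.push(JSON.parse(line));
      }
    }
    return messages;
  }
}

const wire = encodeLine({
  jsonrpc: "2.0",
  id: 1,
  method: "session/prompt", // placeholder method name
  params: { text: "hi" },
});
const demoDecoder = new LineDecoder();
const firstHalf = demoDecoder.push(wire.slice(0, 10)); // incomplete line, nothing parsed yet
const secondHalf = demoDecoder.push(wire.slice(10));   // newline arrives, one message parsed
console.log(firstHalf.length, secondHalf.length);
```

Buffering the trailing fragment is the part that is easy to get wrong: a chunk boundary can fall anywhere inside a JSON object, so parsing each chunk directly would fail intermittently.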
+ +**Key features:** +- LLM orchestrator that creates work items, sets dependencies, assigns agents +- Git worktree isolation per work item +- Dependency graph with cycle detection +- Sub-work-items with stacked branches +- Review workflow (approve to push, reject with feedback) +- Live streaming of agent output with tool-call visualization +- Agents learn from rejection history + +**Relevance to us:** Competitor. Tauri-based (lighter than Electron). Has dependency management and review workflow. No kanban board per se, more of a work-item queue. + +--- + +### 1.5 Parallel Code + +| Attribute | Details | +|-----------|---------| +| **URL** | [github.com/johannesjo/parallel-code](https://github.com/johannesjo/parallel-code) | +| **License** | MIT | +| **AI Providers** | Claude Code, Codex CLI, Gemini CLI | +| **Reliability** | 6/10 | +| **Confidence** | 7/10 | + +**Architecture:** Desktop app with automatic git worktree creation per task. Keyboard-first design. + +**Key features:** +- Automatic branch + worktree per task +- 5+ agents in parallel, zero conflicts +- Unified session view +- Built-in diff viewer with one-click merge +- Mobile monitoring via QR code (Wi-Fi/Tailscale) +- Keyboard-first, mouse optional + +**Relevance to us:** Simpler competitor focused on parallel execution + diff review. No kanban, no team communication. + +--- + +## Category 2: CLI/Framework Orchestrators for Coding Agents + +### 2.1 MCO (Multi-CLI Orchestrator) + +| Attribute | Details | +|-----------|---------| +| **URL** | [github.com/mco-org/mco](https://github.com/mco-org/mco) | +| **License** | Open source | +| **Language** | TypeScript/Node | +| **AI Providers** | Claude Code, Codex CLI, Gemini CLI, OpenCode, Qwen Code | +| **Reliability** | 7/10 | +| **Confidence** | 7/10 | + +**Architecture:** Neutral orchestration layer. Dispatches prompts to multiple agent CLIs in parallel, aggregates results, returns structured output (JSON, SARIF, PR-ready Markdown). No vendor lock-in. 
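
The "aggregate results" half of this fan-out pattern can be sketched without spawning any CLI. The example below is a hedged illustration, not MCO's output schema: `AgentRun`, `RunSummary`, and `summarizeRuns` are hypothetical, and the parallel dispatch itself (one child process per agent CLI) is assumed to have already produced the `runs` array.

```typescript
interface AgentRun {
  agent: string;
  ok: boolean;
  output: string;
  durationMs: number;
}

interface RunSummary {
  succeeded: string[];
  failed: string[];
  fastest: string | null;
  agreement: number; // share of successful runs that returned the most common output
}

function summarizeRuns(runs: AgentRun[]): RunSummary {
  const okRuns = runs.filter((r) => r.ok);
  const counts: Record<string, number> = {};
  let topCount = 0;
  let fastest: AgentRun | null = null;
  for (const run of okRuns) {
    // Count identical outputs to measure how much the agents agree.
    const seen = (counts[run.output] || 0) + 1;
    counts[run.output] = seen;
    if (seen > topCount) topCount = seen;
    if (fastest === null || run.durationMs < fastest.durationMs) fastest = run;
  }
  return {
    succeeded: okRuns.map((r) => r.agent),
    failed: runs.filter((r) => !r.ok).map((r) => r.agent),
    fastest: fastest === null ? null : fastest.agent,
    agreement: okRuns.length === 0 ? 0 : topCount / okRuns.length,
  };
}

const demoSummary = summarizeRuns([
  { agent: "claude", ok: true, output: "patch-A", durationMs: 1200 },
  { agent: "codex", ok: true, output: "patch-A", durationMs: 900 },
  { agent: "gemini", ok: true, output: "patch-B", durationMs: 1500 },
  { agent: "qwen", ok: false, output: "", durationMs: 300 },
]);
console.log(JSON.stringify(demoSummary));
```

A summary like this is what a UI layer would render when one task is assigned to several agents: who finished, who failed, and how far the successful answers diverge.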
+ +**Key concept:** "Work like a Tech Lead" — assign one task to multiple agents, run in parallel, compare outcomes. Designed to be called by any IDE or agent (Cursor, Trae, Copilot, Windsurf). + +**Integration potential:** Could be used as a backend dispatch layer. MCO handles the multi-agent fan-out; our UI handles the visualization and management. + +--- + +### 2.2 Agent Orchestrator (ComposioHQ) + +| Attribute | Details | +|-----------|---------| +| **URL** | [github.com/ComposioHQ/agent-orchestrator](https://github.com/ComposioHQ/agent-orchestrator) | +| **Stars** | ~4,500 | +| **License** | MIT | +| **Language** | TypeScript | +| **AI Providers** | Claude Code, Codex, Aider (agent-agnostic plugin system) | +| **Reliability** | 7/10 | +| **Confidence** | 8/10 | + +**Architecture:** Plugin-based orchestrator managing fleets of coding agents. 8 pluggable abstraction slots: agent, runtime, tracker, reviewer, etc. Each agent gets own git worktree, branch, and PR. + +**Key features:** +- Agent-agnostic (Claude Code, Codex, Aider) +- Runtime-agnostic (tmux, Docker) +- Tracker-agnostic (GitHub, Linear) +- Auto-fix CI failures and address review comments +- Centralized dashboard for monitoring +- 100% AI co-authored codebase (impressive dogfooding) +- 30 concurrent agents at peak + +**Impressive stat:** 8 days from first commit to 43K lines of TypeScript, 91 commits, 61 PRs merged, 84% of PRs created by AI agent sessions. + +--- + +### 2.3 AWS CLI Agent Orchestrator (CAO) + +| Attribute | Details | +|-----------|---------| +| **URL** | [github.com/awslabs/cli-agent-orchestrator](https://github.com/awslabs/cli-agent-orchestrator) | +| **License** | Open source | +| **Language** | Python | +| **AI Providers** | Amazon Q CLI, Claude Code (Codex CLI, Gemini CLI, Qwen CLI planned) | +| **Reliability** | 7/10 | +| **Confidence** | 8/10 | + +**Architecture:** Hierarchical multi-agent system with Supervisor Agent coordinating Worker Agents. 
Each agent in isolated tmux session. Communication via MCP servers. Local HTTP server processes orchestration requests. + +**Orchestration patterns:** +- Handoff (synchronous task transfer) +- Assign (async parallel execution) +- Send Message (direct agent communication) +- Flow — scheduled cron-like runs + +**Caveat:** Supervisor runs on Amazon Bedrock — requires AWS credentials and account. Open source code but can't run without AWS infrastructure. + +--- + +### 2.4 MetaSwarm + +| Attribute | Details | +|-----------|---------| +| **URL** | [github.com/dsifry/metaswarm](https://github.com/dsifry/metaswarm) | +| **License** | Open source | +| **Language** | TypeScript/Node | +| **AI Providers** | Claude Code, Gemini CLI, Codex CLI | +| **Reliability** | 7/10 | +| **Confidence** | 7/10 | + +**Architecture:** Self-improving multi-agent orchestration with 18 specialized agent personas, 13 skills, 15 commands. 9-phase workflow from issue to merged PR. + +**Key features:** +- Recursive orchestration (swarm of swarms) +- Cross-model review (writer reviewed by different AI model) +- Per-task and per-session USD budget circuit breakers +- TDD enforcement, quality gates +- Git worktree isolation with sandbox protection +- Auto-detects Team Mode when multiple sessions active +- Install via `npx metaswarm init` + +--- + +### 2.5 Overstory + +| Attribute | Details | +|-----------|---------| +| **URL** | [github.com/jayminwest/overstory](https://github.com/jayminwest/overstory) | +| **License** | Open source | +| **Language** | TypeScript (Bun) | +| **AI Providers** | Claude Code, Pi, Gemini CLI, Aider, Goose, Amp (11 runtime adapters) | +| **Reliability** | 6/10 | +| **Confidence** | 7/10 | + +**Architecture:** Pluggable `AgentRuntime` interface. Tmux isolation per agent in git worktrees. SQLite WAL-mode mail system for inter-agent messaging (~1-5ms per query). Two-layer instruction system (Base + per-task Overlay). 
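
The mail-style messaging between agents can be illustrated with an in-memory stand-in. This sketch deliberately omits SQLite: the `Mailbox` class, its fields, and the read-marking policy are assumptions chosen to mirror what a mail table with an unread flag might do, not Overstory's schema.

```typescript
interface MailMessage {
  seq: number;   // monotonically increasing, like an autoincrement row id
  from: string;
  to: string;
  body: string;
  read: boolean;
}

class Mailbox {
  private rows: MailMessage[] = [];
  private nextSeq = 1;

  send(from: string, to: string, body: string): number {
    const seq = this.nextSeq++;
    this.rows.push({ seq, from, to, body, read: false });
    return seq;
  }

  // Return unread mail for one agent in arrival order and mark it read,
  // roughly what a SELECT followed by an UPDATE would do in one transaction.
  fetchUnread(agent: string): MailMessage[] {
    const unread = this.rows.filter((m) => m.to === agent && !m.read);
    for (const m of unread) m.read = true;
    return unread;
  }
}

const mail = new Mailbox();
mail.send("lead", "builder-1", "implement parser");
mail.send("lead", "builder-2", "write tests");
mail.send("builder-2", "builder-1", "fixtures are in ./fixtures");
const inbox = mail.fetchUnread("builder-1");
console.log(inbox.map((m) => m.body).join(" | "));
```

Polling a mailbox keeps agents decoupled: a sender never needs the recipient's process to be alive at send time, which suits agents that start and stop independently.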
+ +**Key features:** +- 11 runtime adapters +- FIFO merge queue with 4-tier conflict resolution +- Tiered watchdog system (mechanical daemon + AI triage + monitor agent) +- Instruction overlays for orchestrated workers +- Honest self-critique in project docs (refreshing transparency) + +--- + +### 2.6 Claude Octopus + +| Attribute | Details | +|-----------|---------| +| **URL** | [github.com/nyldn/claude-octopus](https://github.com/nyldn/claude-octopus) | +| **License** | Open source | +| **AI Providers** | Codex, Gemini, Claude, Perplexity, OpenRouter, Copilot, Qwen, Ollama (8 providers) | +| **Reliability** | 6/10 | +| **Confidence** | 7/10 | + +**Architecture:** Multi-LLM orchestration plugin for Claude Code. 75% consensus gate catches disagreements before production. 32 specialized personas, 47 commands, 50 skills. Zero providers required to start — add them one at a time. + +--- + +### 2.7 agtx + +| Attribute | Details | +|-----------|---------| +| **URL** | [github.com/fynnfluegge/agtx](https://github.com/fynnfluegge/agtx) | +| **License** | Open source | +| **AI Providers** | Claude Code, Codex, Gemini CLI, OpenCode, Cursor | +| **Reliability** | 6/10 | +| **Confidence** | 6/10 | + +**Architecture:** Multi-session AI coding terminal manager. Orchestrator agent picks up tasks, plans, and delegates to multiple coding agents running in parallel. + +--- + +## Category 3: General-Purpose Multi-Agent Frameworks + +### 3.1 CrewAI + +| Attribute | Details | +|-----------|---------| +| **URL** | [github.com/crewAIInc/crewAI](https://github.com/crewAIInc/crewAI) | +| **Stars** | ~45,900 | +| **License** | MIT | +| **Language** | Python | +| **AI Providers** | OpenAI, Anthropic, Gemini, Ollama, any via LiteLLM | +| **Maturity** | Production-ready, 100K+ certified developers | +| **Reliability** | 9/10 | +| **Confidence** | 9/10 | + +**Architecture:** Role-based metaphor (role, goal, backstory per agent). Three process types: sequential, hierarchical, consensual. 
Native MCP and A2A support. Two approaches: Crews (autonomy) and Flows (enterprise production). + +**Electron integration potential:** Python-based, so would need a subprocess/API bridge. Not designed for desktop UI integration but could serve as an orchestration backend. + +--- + +### 3.2 Microsoft Agent Framework (AutoGen + Semantic Kernel) + +| Attribute | Details | +|-----------|---------| +| **URL** | [learn.microsoft.com/en-us/agent-framework](https://learn.microsoft.com/en-us/agent-framework/overview/) | +| **Stars** | AutoGen: ~52,000 | +| **License** | Open source (MIT) | +| **Language** | Python, .NET | +| **AI Providers** | OpenAI, Azure OpenAI, Anthropic, Gemini, local models | +| **Maturity** | GA targeted end Q1 2026 | +| **Reliability** | 8/10 | +| **Confidence** | 8/10 | + +**Architecture:** Unified SDK + runtime merging AutoGen + Semantic Kernel. Orchestration patterns: sequential, concurrent, group chat, handoff, Magentic (dynamic task ledger). Event-driven core, async-first. + +**Electron integration potential:** Primarily Python/.NET. Could use as a backend runtime via API. + +--- + +### 3.3 Agno + +| Attribute | Details | +|-----------|---------| +| **URL** | [github.com/agno-agi/agno](https://github.com/agno-agi/agno) | +| **Stars** | ~38,900 | +| **License** | Apache-2.0 | +| **Language** | Python | +| **AI Providers** | OpenAI, Anthropic, Groq, and many more | +| **Maturity** | Production-ready (AgentOS + FastAPI runtime) | +| **Reliability** | 8/10 | +| **Confidence** | 8/10 | + +**Architecture:** Three-layer design: framework (agents, teams, workflows), runtime (stateless FastAPI backends), monitoring. Claims 529x faster instantiation than LangGraph. Teams with automatic agent-to-agent communication, context passing, result aggregation. + +**Electron integration potential:** FastAPI backend makes it easy to integrate via HTTP API. 
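A minimal sketch of what such an HTTP bridge might look like from the Electron main process. The route layout (`/agents/{id}/runs`), the request body shape, and the names `buildAgentRunRequest`, `runAgent`, and `FetchLike` are assumptions for illustration; the real backend's API documentation would define the actual endpoints.

```typescript
// Hypothetical endpoint layout; check the backend's own API docs for real routes.
interface AgentRunRequest {
  url: string;
  init: { method: 'POST'; headers: Record<string, string>; body: string };
}

// Minimal structural type so the sketch does not depend on DOM or Node typings.
type FetchLike = (
  url: string,
  init: AgentRunRequest['init'],
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

/** Pure helper: build the URL and request options for one agent run. */
function buildAgentRunRequest(baseUrl: string, agentId: string, message: string): AgentRunRequest {
  const root = baseUrl.replace(/\/+$/, ''); // tolerate trailing slashes
  return {
    url: `${root}/agents/${encodeURIComponent(agentId)}/runs`,
    init: {
      method: 'POST',
      headers: { 'content-type': 'application/json' },
      body: JSON.stringify({ message }),
    },
  };
}

/** Send the request through an injected fetch-like function and return the parsed JSON. */
async function runAgent(
  fetchFn: FetchLike,
  baseUrl: string,
  agentId: string,
  message: string,
): Promise<unknown> {
  const { url, init } = buildAgentRunRequest(baseUrl, agentId, message);
  const response = await fetchFn(url, init);
  if (!response.ok) {
    throw new Error(`Backend returned HTTP ${response.status}`);
  }
  return response.json();
}

const example = buildAgentRunRequest('http://127.0.0.1:8000/', 'planner', 'Split this epic into tasks');
console.log(example.url);
```

In the main process, the global `fetch` available in recent Node.js and Electron versions could be passed as `fetchFn`; injecting it keeps the helper easy to test without a running backend.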
+ +--- + +### 3.4 OpenAI Agents SDK (successor to Swarm) + +| Attribute | Details | +|-----------|---------| +| **URL** | [github.com/openai/openai-agents-python](https://github.com/openai/openai-agents-python) | +| **License** | MIT | +| **Language** | Python | +| **AI Providers** | OpenAI + 100+ LLMs via provider-agnostic design | +| **Maturity** | Production-ready (launched March 2025) | +| **Reliability** | 8/10 | +| **Confidence** | 9/10 | + +**Architecture:** Core primitives: Agents, Handoffs, Guardrails, Function tools, MCP server tool calling, Sessions, Tracing. Handoff pattern: agents transfer control explicitly, carrying conversation context. Built-in MCP integration. + +--- + +### 3.5 LangGraph (by LangChain) + +| Attribute | Details | +|-----------|---------| +| **URL** | [github.com/langchain-ai/langgraph](https://github.com/langchain-ai/langgraph) | +| **License** | MIT | +| **Language** | Python, TypeScript | +| **AI Providers** | Model-agnostic (plug different LLMs into different nodes) | +| **Maturity** | Production-ready, LangSmith observability | +| **Reliability** | 8/10 | +| **Confidence** | 9/10 | + +**Architecture:** Graph-based design. Each agent is a node maintaining its own state. Conditional edges, multi-team coordination, hierarchical control. Supervisor nodes for scalable orchestration. + +--- + +### 3.6 AWS Agent Squad (formerly Multi-Agent Orchestrator) + +| Attribute | Details | +|-----------|---------| +| **URL** | [github.com/awslabs/agent-squad](https://github.com/awslabs/agent-squad) | +| **License** | Open source | +| **Language** | Python, TypeScript (dual) | +| **AI Providers** | AWS Bedrock, extensible | +| **Reliability** | 7/10 | +| **Confidence** | 8/10 | + +**Architecture:** Intelligent intent classification routes queries dynamically. Streaming + non-streaming support. Context management across agents. Universal deployment (Lambda to any cloud). 
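To make the routing idea concrete, here is a toy sketch in which a keyword score stands in for the intent classifier. Agent Squad itself uses an LLM-based classifier; `AgentRoute` and `classifyIntent` are hypothetical names and not part of its API.

```typescript
// Toy stand-in for an intent classifier; names are hypothetical.
interface AgentRoute {
  agentId: string;
  keywords: string[];
}

/** Pick the agent whose keywords overlap most with the query, or the fallback on no match. */
function classifyIntent(query: string, routes: AgentRoute[], fallbackAgentId: string): string {
  const words = new Set(query.toLowerCase().split(/\W+/).filter(Boolean));
  let best = { agentId: fallbackAgentId, score: 0 };
  for (const route of routes) {
    const score = route.keywords.filter((keyword) => words.has(keyword)).length;
    if (score > best.score) {
      best = { agentId: route.agentId, score };
    }
  }
  return best.agentId;
}

const routes: AgentRoute[] = [
  { agentId: 'devops', keywords: ['deploy', 'cluster', 'pipeline'] },
  { agentId: 'frontend', keywords: ['css', 'react', 'layout'] },
];

console.log(classifyIntent('Deploy the staging cluster', routes, 'general'));
console.log(classifyIntent('hello there', routes, 'general'));
```

The important property is the shape, not the scoring: one classification step returns an agent id, and everything downstream (context lookup, streaming) is keyed by that id.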
+ +--- + +### 3.7 Google ADK (Agent Development Kit) + +| Attribute | Details | +|-----------|---------| +| **URL** | [cloud.google.com](https://cloud.google.com/blog/products/ai-machine-learning/unlock-ai-agent-collaboration-convert-adk-agents-for-a2a) | +| **License** | Open source | +| **Language** | Python | +| **AI Providers** | Gemini (primary), extensible | +| **Reliability** | 7/10 | +| **Confidence** | 8/10 | + +**Architecture:** Hierarchical agent tree. Native A2A protocol support — agents from different frameworks can discover and invoke each other. + +--- + +### 3.8 OpenAI Symphony (New — March 2026) + +| Attribute | Details | +|-----------|---------| +| **URL** | See [Medium article](https://medium.com/@georgethomasm_89397/openai-symphony-the-new-orchestration-framework-for-multi-agent-systems-2ec991ee74cc) | +| **License** | Open source | +| **Language** | Python | +| **Maturity** | Very early (released March 5, 2026) | +| **Reliability** | 4/10 | +| **Confidence** | 5/10 | + +**Architecture:** Hierarchical delegation, iterative refinement, composable workflows. Checkpoint-based recovery — if agent fails mid-execution, workflow resumes from last checkpoint. Documentation sparse, community small, but growing. + +--- + +## Key Protocols & Standards + +### Google A2A (Agent-to-Agent Protocol) + +| Attribute | Details | +|-----------|---------| +| **URL** | [a2a-protocol.org](https://a2a-protocol.org/latest/) | +| **GitHub** | [github.com/a2aproject/A2A](https://github.com/a2aproject/A2A) | +| **Status** | v0.3 (July 2025), donated to Linux Foundation | +| **Supporters** | 150+ organizations (Google, Atlassian, Salesforce, SAP, etc.) | +| **Confidence** | 9/10 | + +**Purpose:** Agent-to-agent communication standard. Complementary to MCP (agent-to-tool). Agent Cards (JSON) for capability discovery. HTTP + gRPC transport. Becoming the de facto interop standard. + +### Anthropic MCP (Model Context Protocol) + +Already integrated into our project. 
MCP = agent-to-tool communication. A2A = agent-to-agent communication. The two are complementary. + +--- + +## Comparison Matrix: Desktop Orchestrators + +| Feature | **Our App** | **Vibe Kanban** | **Dorothy** | **Superset** | **Mozzie** | +|---------|------------|-----------------|-------------|--------------|------------| +| **Kanban board** | Yes | Yes | Yes | No | No | +| **Multi-provider agents** | Claude only* | 10+ agents | 3 agents | Any CLI | 3+ agents | +| **Code review / diff** | Yes | Yes | No | Yes | Yes | +| **Team communication** | Yes | No | Via Super Agent | No | No | +| **Session analysis** | Yes (deep) | No | No | No | No | +| **Context monitoring** | Yes | No | No | No | No | +| **MCP integration** | Yes | Yes (client+server) | Yes (5 servers) | No | ACP | +| **Agent-to-agent messaging** | Yes | Via MCP | Via Super Agent | No | Via ACP | +| **Dependency graph** | No | No | No | No | Yes | +| **External integrations** | No | GitHub | GitHub, JIRA, Slack, Telegram | IDE integration | No | +| **Tech stack** | Electron/React | Rust/React | Electron/React | Electron/React | Tauri | +| **License** | MIT | Free/OSS | OSS | ELv2 | OSS | +| **GitHub stars** | ~small | ~23,700 | Unknown | ~7,800 | Unknown | + +*Currently Claude-only, but the architecture could support multi-provider agents. + +--- + +## Strategic Recommendations + +### Immediate Opportunities + +1. **Multi-provider support is the #1 gap.** Every competitor now supports Claude + Codex + Gemini. Our single-provider approach is a significant limitation. Priority: HIGH. + +2. **MCP server exposure.** Dorothy and Vibe Kanban expose their kanban board as an MCP server — agents can programmatically create tasks, move cards, check status. This is a powerful pattern we should adopt. + +3. **A2A protocol awareness.** The A2A standard (150+ orgs, Linux Foundation) is becoming the agent-to-agent interop standard. We should monitor and potentially implement it. 
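The MCP server exposure pattern from item 2 can be sketched as a small tool table wrapped around a board model. This is a protocol-free illustration: the tool names (`create_task`, `move_card`, `list_tasks`) and all types are hypothetical, and a real server would register equivalent handlers through an MCP SDK rather than a plain object.

```typescript
// Hypothetical tool names and shapes; a real server would register these through an MCP SDK.
type Column = 'backlog' | 'todo' | 'in_progress' | 'done';

interface Task {
  id: string;
  title: string;
  column: Column;
}

class KanbanBoard {
  private tasks = new Map<string, Task>();
  private nextId = 1;

  createTask(title: string, column: Column = 'backlog'): Task {
    const task: Task = { id: `task-${this.nextId++}`, title, column };
    this.tasks.set(task.id, task);
    return task;
  }

  moveTask(id: string, column: Column): Task {
    const task = this.tasks.get(id);
    if (!task) throw new Error(`Unknown task: ${id}`);
    task.column = column;
    return task;
  }

  list(): Task[] {
    return [...this.tasks.values()];
  }
}

type ToolHandler = (args: Record<string, unknown>) => unknown;

interface KanbanTools {
  create_task: ToolHandler;
  move_card: ToolHandler;
  list_tasks: ToolHandler;
}

/** Map tool names to board operations; arguments arrive untyped, as they would from an agent. */
function buildKanbanTools(board: KanbanBoard): KanbanTools {
  return {
    create_task: (args) => board.createTask(String(args.title)),
    move_card: (args) => board.moveTask(String(args.id), args.column as Column),
    list_tasks: () => board.list(),
  };
}

const board = new KanbanBoard();
const tools = buildKanbanTools(board);
const created = tools.create_task({ title: 'Write adapter tests' }) as Task;
tools.move_card({ id: created.id, column: 'in_progress' });
console.log(board.list());
```

A production version would validate `args` against a schema before touching the board, since agent-supplied arguments cannot be trusted to match the expected types.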
+ +### Integration Paths for Multi-Provider Support + +| Approach | Description | Effort | Reliability | +|----------|-------------|--------|-------------| +| **Direct CLI integration** | Spawn Codex CLI / Gemini CLI alongside Claude Code in separate processes | Medium | 8/10 | +| **MCO as dispatch layer** | Use MCO to fan out tasks across multiple agent CLIs | Low | 7/10 | +| **Plugin architecture** | Build pluggable AgentRuntime interface (like Overstory) | High | 9/10 | +| **A2A protocol** | Implement A2A for cross-agent communication | High | 7/10 | + +### Unique Differentiators We Should Protect + +1. **Deep session analysis** (bash commands, reasoning, subprocesses) — nobody else has this +2. **Context monitoring** (token usage by category) — unique feature +3. **Team communication model** (lead + teammates with direct messaging) — only Dorothy's Super Agent comes close +4. **Post-compact context recovery** — unique +5. **Code review workflow** (accept/reject/comment per task) — Vibe Kanban is closest competitor here + +### Tools Worth Investigating Further + +1. **Vibe Kanban** — most direct competitor, 23.7K stars, Rust backend, mature feature set +2. **Dorothy** — Electron architecture closest to ours, MCP-heavy, good integration model +3. **Agent Orchestrator (ComposioHQ)** — plugin architecture is excellent, could inspire our multi-provider design +4. **MCO** — lightweight dispatch layer we could integrate as-is +5. 
**Overstory** — SQLite mail system for inter-agent messaging is elegant + +--- + +## Curated Resource Lists + +- [awesome-agent-orchestrators](https://github.com/andyrewlee/awesome-agent-orchestrators) — Comprehensive list of orchestration tools +- [awesome-cli-coding-agents](https://github.com/bradAGI/awesome-cli-coding-agents) — 80+ CLI coding agents + orchestration harnesses +- [awesome-ai-agents-2026](https://github.com/caramaschiHG/awesome-ai-agents-2026) — 300+ resources across 20+ categories + +--- + +## Sources + +- [Top 5 Open-Source Agentic AI Frameworks in 2026](https://aimultiple.com/agentic-frameworks) +- [Top 9 AI Agent Frameworks — Shakudo](https://www.shakudo.io/blog/top-9-ai-agent-frameworks) +- [Best Open Source Frameworks for AI Agents — Firecrawl](https://www.firecrawl.dev/blog/best-open-source-agent-frameworks) +- [Microsoft Agent Framework Announcement](https://devblogs.microsoft.com/foundry/introducing-microsoft-agent-framework-the-open-source-engine-for-agentic-ai-apps/) +- [OpenAI Symphony — Medium](https://medium.com/@georgethomasm_89397/openai-symphony-the-new-orchestration-framework-for-multi-agent-systems-2ec991ee74cc) +- [CrewAI Open Source](https://crewai.com/open-source) +- [OpenAI Agents SDK](https://openai.github.io/openai-agents-python/) +- [AWS CLI Agent Orchestrator](https://aws.amazon.com/blogs/opensource/introducing-cli-agent-orchestrator-transforming-developer-cli-tools-into-a-multi-agent-powerhouse/) +- [Google A2A Protocol](https://developers.googleblog.com/en/a2a-a-new-era-of-agent-interoperability/) +- [A2A Protocol v0.3 Upgrade](https://cloud.google.com/blog/products/ai-machine-learning/agent2agent-protocol-is-getting-an-upgrade) +- [Warp Oz Platform](https://www.warp.dev/blog/oz-orchestration-platform-cloud-agents) +- [Vibe Kanban](https://vibekanban.com/) +- [Dorothy AI](https://dorothyai.app/) +- [Superset IDE](https://superset.sh/) +- [MCO — mco-org/mco](https://github.com/mco-org/mco) +- [Agent Orchestrator — 
ComposioHQ](https://github.com/ComposioHQ/agent-orchestrator) +- [MetaSwarm](https://github.com/dsifry/metaswarm) +- [Overstory](https://github.com/jayminwest/overstory) +- [Claude Octopus](https://github.com/nyldn/claude-octopus) +- [Mozzie](https://github.com/usemozzie/mozzie) +- [Parallel Code](https://github.com/johannesjo/parallel-code) +- [Orchestral AI Paper](https://arxiv.org/abs/2601.02577) +- [LLM Orchestration 2026 — AIMultiple](https://aimultiple.com/llm-orchestration) +- [Multi-Agent Frameworks 2026 — GuruSup](https://gurusup.com/blog/best-multi-agent-frameworks-2026) +- [Agno Framework](https://github.com/agno-agi/agno) +- [awesome-agent-orchestrators](https://github.com/andyrewlee/awesome-agent-orchestrators) +- [awesome-cli-coding-agents](https://github.com/bradAGI/awesome-cli-coding-agents) diff --git a/docs/research/best-abstraction-for-electron.md b/docs/research/best-abstraction-for-electron.md new file mode 100644 index 00000000..9af717cc --- /dev/null +++ b/docs/research/best-abstraction-for-electron.md @@ -0,0 +1,726 @@ +# Best Abstraction Tool for Multi-Provider Agent Support in Electron + +**Date**: 2026-03-24 +**Branch**: `dev` +**Based on**: actual source analysis of `TeamProvisioningService.ts` (7,982 LOC), `childProcess.ts`, `TeamMcpConfigBuilder.ts`, `PtyTerminalService.ts`, `agent-teams-controller/`, and prior research in `docs/research/` + +--- + +## Context: What We Have Today + +Our Electron app (40.x) manages Claude Code CLI processes via: + +| Component | File | Role | +|-----------|------|------| +| `spawnCli()` | `src/main/utils/childProcess.ts` | child_process.spawn wrapper with Windows EINVAL fallback, injects `CLI_ENV_DEFAULTS` | +| `TeamProvisioningService` | `src/main/services/team/TeamProvisioningService.ts` | 7,982 LOC monolith: process lifecycle, stream-json NDJSON parsing, prompt engineering, stall watchdog, tool approval relay | +| `ClaudeBinaryResolver` | `src/main/services/team/ClaudeBinaryResolver.ts` | Resolves 
`claude` binary across PATH, NVM, platform dirs | +| `TeamMcpConfigBuilder` | `src/main/services/team/TeamMcpConfigBuilder.ts` | Builds `--mcp-config` JSON for every spawned process | +| `PtyTerminalService` | `src/main/services/infrastructure/PtyTerminalService.ts` | node-pty for embedded terminal (used separately, NOT for agent processes) | +| `agent-teams-controller` | `agent-teams-controller/` | Provider-agnostic file CRUD (tasks, kanban, inbox, reviews) | +| `killTeamProcess()` | TeamProvisioningService | Uses SIGKILL to prevent Claude CLI SIGTERM cleanup deleting team files | + +**Current protocol**: Claude CLI `--input-format stream-json --output-format stream-json` — proprietary NDJSON with types: `user`, `assistant`, `control_request`, `result`, `system`. + +**Current coupling**: 9/10 to Claude Code CLI (see `best-integration-approach.md` for full coupling map). + +--- + +## Two Distinct Needs + +### Level 1: CLI Agent Process Management +Spawn external CLI agents (Claude Code, Codex CLI, Gemini CLI, Goose) as child processes, each with its own protocol, binary resolution, health monitoring, and MCP config. + +### Level 2: Programmatic LLM API Calls +Call LLM APIs directly for lightweight tasks (code review bot, triage bot, task planning, MCP tool calling). No CLI process — just HTTP to provider APIs. + +These are **fundamentally different problems** and should use **different solutions**. + +--- + +## Level 1: CLI Agent Process Management + +### The Candidates + +#### Option A: Own Adapter Pattern (Overstory-style) +**Reliability: 9/10 | Confidence: 9/10** + +Build a thin `AgentCliAdapter` interface with per-CLI implementations. 
+ +```typescript +// src/main/services/agent/AgentCliAdapter.ts +export interface AgentCliAdapter { + readonly providerId: string; // 'claude' | 'codex' | 'gemini' | 'goose' + + /** Resolve binary path on this machine */ + resolveBinary(): Promise<string>; + + /** Build spawn args for creating/launching a team */ + buildSpawnArgs(request: AgentSpawnRequest): string[]; + + /** Build env vars for the spawned process */ + buildEnv(base: NodeJS.ProcessEnv): NodeJS.ProcessEnv; + + /** Parse a line of stdout. Returns typed event or null (skip). */ + parseStdoutLine(line: string): AgentOutputEvent | null; + + /** Format a user message for stdin */ + formatUserMessage(text: string): string; + + /** Process exit semantics: what does exit code mean? */ + interpretExitCode(code: number | null): 'success' | 'error' | 'killed'; + + /** Kill semantics: SIGTERM vs SIGKILL */ + killProcess(child: ChildProcess): void; + + /** Whether this CLI needs MCP config file */ + needsMcpConfig: boolean; + + /** Build MCP config in the format this CLI expects */ + buildMcpConfig?(servers: Record<string, unknown>): object; +} +``` + +Per-provider implementations: + +```typescript +// src/main/services/agent/adapters/ClaudeCliAdapter.ts +export class ClaudeCliAdapter implements AgentCliAdapter { + readonly providerId = 'claude'; + readonly needsMcpConfig = true; + + async resolveBinary(): Promise<string> { + return new ClaudeBinaryResolver().resolve(); + } + + buildSpawnArgs(request: AgentSpawnRequest): string[] { + return [ + '--input-format', 'stream-json', + '--output-format', 'stream-json', + '--verbose', + '--setting-sources', 'user,project,local', + '--mcp-config', request.mcpConfigPath!, + '--disallowedTools', 'TeamDelete,TodoWrite', + ...(request.skipPermissions + ? ['--dangerously-skip-permissions', '--permission-mode', 'bypassPermissions'] + : ['--permission-prompt-tool', 'stdio', '--permission-mode', 'default']), + ...(request.model ? 
['--model', request.model] : []), + ]; + } + + buildEnv(base: NodeJS.ProcessEnv): NodeJS.ProcessEnv { + return { ...base, CLAUDE_HOOK_JUDGE_MODE: 'true' }; + } + + parseStdoutLine(line: string): AgentOutputEvent | null { + const msg = JSON.parse(line); + // Existing 60+ branch logic from handleStreamJsonMessage() + switch (msg.type) { + case 'assistant': return { kind: 'text', content: extractText(msg) }; + case 'result': return { kind: 'result', success: msg.subtype !== 'error' }; + case 'control_request': return { kind: 'approval', request: msg }; + // ... etc + default: return null; // unrecognized message types are skipped + } + } + + formatUserMessage(text: string): string { + return JSON.stringify({ + type: 'user', + message: { role: 'user', content: [{ type: 'text', text }] }, + }) + '\n'; + } + + killProcess(child: ChildProcess): void { + killProcessTree(child, 'SIGKILL'); // SIGKILL to prevent cleanup + } +} +``` + +```typescript +// src/main/services/agent/adapters/CodexCliAdapter.ts +export class CodexCliAdapter implements AgentCliAdapter { + readonly providerId = 'codex'; + readonly needsMcpConfig = false; // Codex uses MCP differently + + async resolveBinary(): Promise<string> { + // which codex + return resolveWhich('codex'); + } + + buildSpawnArgs(request: AgentSpawnRequest): string[] { + return ['app-server']; // JSON-RPC mode + } + + parseStdoutLine(line: string): AgentOutputEvent | null { + // JSON-RPC notification parsing + const msg = JSON.parse(line); + if (msg.method === 'item/agentMessage/delta') { + return { kind: 'text_delta', content: msg.params.delta }; + } + // ... 
+ } + + formatUserMessage(text: string): string { + // JSON-RPC request for turn/start + return JSON.stringify({ + jsonrpc: '2.0', id: nextId(), + method: 'turn/start', + params: { message: text }, + }) + '\n'; + } + + killProcess(child: ChildProcess): void { + killProcessTree(child, 'SIGTERM'); // Codex handles SIGTERM gracefully + } +} +``` + +```typescript +// src/main/services/agent/adapters/GeminiCliAdapter.ts +export class GeminiCliAdapter implements AgentCliAdapter { + readonly providerId = 'gemini'; + readonly needsMcpConfig = false; + + async resolveBinary(): Promise<string> { + return resolveWhich('gemini'); + } + + buildSpawnArgs(request: AgentSpawnRequest): string[] { + return [ + '--output-format', 'stream-json', + '-p', request.prompt, + ]; + } + + parseStdoutLine(line: string): AgentOutputEvent | null { + // Gemini NDJSON events + const event = JSON.parse(line); + // ... + } + + formatUserMessage(text: string): string { + // Gemini headless doesn't support multi-turn stdin in stream-json + // (one-shot with -p flag). For multi-turn, need new process per turn. 
+ throw new Error('Gemini CLI does not support multi-turn stdin'); + } + + killProcess(child: ChildProcess): void { + killProcessTree(child, 'SIGTERM'); + } +} +``` + +**Pros:** +- Zero new dependencies +- Perfectly fits existing `spawnCli()` / `killProcessTree()` infrastructure +- Each adapter is ~100-200 LOC — easy to test in isolation +- Can be extracted incrementally from the existing TeamProvisioningService +- No framework overhead in the Electron main process +- Each CLI's quirks handled explicitly (Claude SIGKILL vs Codex SIGTERM, stream-json vs JSON-RPC) + +**Cons:** +- We write the adapter code ourselves (~500 LOC total for 4 adapters) +- No built-in CLI discovery / health check framework + +**Effort**: ~800 LOC (interface + 4 adapters + factory), 3-5 days + +--- + +#### Option B: node-pty Based Approach +**Reliability: 5/10 | Confidence: 4/10** + +Use pseudo-terminal for all CLI agents (captures raw terminal output). + +```typescript +import * as pty from 'node-pty'; + +const proc = pty.spawn('claude', ['--verbose'], { + name: 'xterm-256color', + cols: 120, rows: 40, + cwd: projectPath, + env: process.env, +}); + +proc.onData((data) => { + // Problem: raw terminal output with ANSI codes, cursor movement, etc. + // We'd need to strip all that to parse structured JSON +}); +``` + +**Pros:** +- Already have `node-pty` in dependencies (for embedded terminal) +- Works with any CLI that has a TUI mode + +**Cons:** +- node-pty is a native addon requiring electron-rebuild (fragile across platforms) +- All CLIs output ANSI escape codes in TTY mode — parsing structured data from raw terminal output is extremely unreliable +- We ALREADY use stream-json/JSON-RPC specifically to AVOID the TTY problem +- Memory overhead of full PTY per agent process +- Claude Code, Codex, and Gemini all have headless/programmatic modes — PTY is the WRONG abstraction + +**Verdict: REJECT.** PTY is for interactive terminals, not programmatic agent management. 
We already learned this — `PtyTerminalService` is used only for the embedded terminal, not for agent processes. + +--- + +#### Option C: MCO / Third-Party Orchestrator Library +**Reliability: 3/10 | Confidence: 3/10** + +No mature, production-ready TypeScript library exists for "spawn and manage multiple AI CLI agents as child processes." The closest is `pi-builder` from the `awesome-cli-coding-agents` ecosystem, but it's a young project (~100 stars) with no stability guarantees. + +**Verdict: REJECT.** The problem is too niche and CLI-specific for a generic library. Each CLI has its own protocol (Claude stream-json, Codex JSON-RPC, Gemini NDJSON, Goose recipes). A generic library would either be too thin to be useful or too opinionated to handle the differences. + +--- + +#### Level 1 Recommendation: Option A (Own Adapter Pattern) + +| Criteria | Score | +|----------|-------| +| Fit with existing code patterns | 10/10 — mirrors how `spawnCli()` and `ClaudeBinaryResolver` already work | +| Lines of code to integrate | ~800 LOC (interface + 4 adapters + factory) | +| Heavy dependencies added | 0 | +| Runs in Electron main process | Yes (pure Node.js) | +| License compatibility | N/A (our own code, AGPL-3.0) | +| Active maintenance | By us — full control | + +**Migration path**: Extract current Claude-specific logic from `TeamProvisioningService` into `ClaudeCliAdapter`, then add adapters for other CLIs one by one. The monster 7,982 LOC monolith gets decomposed as a side effect. + +--- + +## Level 2: Programmatic LLM API Calls + +### The Candidates + +#### Option A: Vercel AI SDK (`ai` + `@ai-sdk/*`) +**Reliability: 9/10 | Confidence: 9/10** (Recommended) + +The leading TypeScript LLM abstraction. 20M+ monthly npm downloads, backed by Vercel, 30K+ GitHub stars. 
+ +```typescript +// src/main/services/llm/LlmService.ts +import { generateObject, generateText, streamText, tool } from 'ai'; +import { anthropic } from '@ai-sdk/anthropic'; +import { openai } from '@ai-sdk/openai'; +import { google } from '@ai-sdk/google'; +import { z } from 'zod'; + +// Simple code review: runs in Electron main process +export async function reviewCode(diff: string) { + const { text } = await generateText({ + model: anthropic('claude-sonnet-4-20250514'), + system: 'You are a code reviewer. Be concise.', + prompt: `Review this diff:\n\n${diff}`, + }); + return text; +} + +// Streaming task planning with tool calling: relayed to renderer via IPC +export async function planTasks( + description: string, + onChunk: (text: string) => void, +) { + const result = streamText({ + model: openai('gpt-4o'), + system: 'You are a project planner.', + prompt: description, + tools: { + createTask: tool({ + description: 'Create a new task on the kanban board', + parameters: z.object({ + title: z.string(), + assignee: z.string().optional(), + column: z.enum(['backlog', 'todo', 'in_progress']), + }), + execute: async ({ title, assignee, column }) => { + // Call our agent-teams-controller to create task + // (`controller` is assumed to be an instance already in scope) + return controller.createTask({ title, assignee, column }); + }, + }), + }, + maxSteps: 10, // Allow multi-step tool calling loops + }); + + for await (const chunk of result.textStream) { + onChunk(chunk); + } +} + +// Triage incoming issue: pick best team member +export async function triageTask(taskDescription: string) { + const { object } = await generateObject({ + model: google('gemini-2.5-flash'), + schema: z.object({ + assignee: z.string(), + priority: z.enum(['low', 'medium', 'high', 'critical']), + reasoning: z.string(), + }), + prompt: `Triage this task: ${taskDescription}\nAvailable members: alice (frontend), bob (backend), carol (devops)`, + }); + return object; // Typed: { assignee: string; priority: string; reasoning: 
string } +} +``` + +**What we install:** +```bash +pnpm add ai @ai-sdk/anthropic @ai-sdk/openai @ai-sdk/google zod +# ai: 67.5 kB gzipped (core) +# @ai-sdk/anthropic: ~15 kB gzipped +# @ai-sdk/openai: ~19.5 kB gzipped +# @ai-sdk/google: ~15 kB gzipped +# Total: ~117 kB gzipped, very reasonable for Electron +``` + +**Pros:** +- Unified `generateText()` / `streamText()` / `generateObject()` API across ALL providers +- Swap provider with one line change: `anthropic('claude-sonnet-4-20250514')` → `openai('gpt-4o')` +- First-class tool calling with Zod schema validation +- Streaming works perfectly in Node.js (Electron main process) +- Sentry already has `vercelAIIntegration` for Electron, and we already use `@sentry/electron` +- TypeScript-first: full type inference for tool parameters and structured outputs +- AI SDK 6 `Agent` class for reusable agent patterns +- 20M+ monthly downloads, extremely active maintenance, battle-tested +- Apache-2.0 license, compatible with our AGPL-3.0 + +**Cons:** +- Adds ~4 new deps (ai, 3 providers), but they're lightweight +- Learning curve for Zod schemas (though Zod is industry standard) +- AI SDK 5→6 had some breaking changes, so major-version churn is a risk + +**Electron main process integration:** +```typescript +// src/main/ipc/llm.ts: IPC handlers for renderer +import { ipcMain } from 'electron'; +import { wrapHandler } from './utils'; +import { generateText, streamText } from 'ai'; +import { anthropic } from '@ai-sdk/anthropic'; + +export function registerLlmHandlers() { + // One-shot generation + ipcMain.handle('llm:generate', wrapHandler(async (_event, params) => { + const { text } = await generateText({ + model: resolveModel(params.model), // 'anthropic/claude-sonnet-4-20250514' → anthropic('claude-sonnet-4-20250514') + system: params.system, + prompt: params.prompt, + }); + return { text }; + })); + + // Streaming: emit chunks via webContents.send() + ipcMain.handle('llm:stream', wrapHandler(async (event, params) => { + const result = streamText({ + model: 
resolveModel(params.model), + system: params.system, + prompt: params.prompt, + }); + + const sender = event.sender; + for await (const chunk of result.textStream) { + sender.send('llm:chunk', { requestId: params.requestId, chunk }); + } + sender.send('llm:done', { requestId: params.requestId }); + return { started: true }; + })); +} +``` + +--- + +#### Option B: Mastra (LLM layer only) +**Reliability: 6/10 | Confidence: 5/10** + +Mastra is a full agent framework (workflows, RAG, memory, server). Using "just the LLM layer" means using Mastra's `Agent` class which internally uses AI SDK anyway. + +```typescript +import { Agent } from '@mastra/core/agent'; + +const reviewer = new Agent({ + id: 'code-reviewer', + instructions: 'You are a code reviewer.', + model: 'anthropic/claude-sonnet-4-20250514', +}); + +const result = await reviewer.generate('Review this diff...'); +``` + +**Pros:** +- Nice `Agent` abstraction with built-in memory and workflow support +- Uses AI SDK internally — same providers +- TypeScript-native + +**Cons:** +- `@mastra/core` pulls in significant dependencies (server framework, storage adapters, DI container) +- Overkill for our use case — we need `generateText()`, not the full agent runtime +- Our agent runtime IS the CLI process management layer, not Mastra's in-process loop +- Less mature than AI SDK (smaller community, fewer downloads) +- Adds unnecessary abstraction layer on top of AI SDK +- YC-backed startup — could pivot or die; AI SDK is backed by Vercel ($3.2B company) + +**See also:** `docs/research/mastra-integration-analysis.md` (full analysis, verdict 6/10 feasibility) + +--- + +#### Option C: LangChain.js +**Reliability: 4/10 | Confidence: 3/10** + +```typescript +import { ChatAnthropic } from '@langchain/anthropic'; +import { ChatOpenAI } from '@langchain/openai'; + +const chat = new ChatAnthropic({ model: 'claude-sonnet-4-20250514' }); +const result = await chat.invoke('Review this diff...'); +``` + +**Pros:** +- Largest 
ecosystem (chains, agents, RAG, memory) +- Many tutorials and examples + +**Cons:** +- **101 kB gzipped** — 3x the size of OpenAI SDK, 1.5x AI SDK +- Heavy dependency tree (infamous for bloat) +- Frequent breaking changes between versions +- Overcomplicated abstractions for simple LLM calls +- Edge runtime incompatible (uses Node `fs`) +- Community frustration well-documented: "LangChain adds unnecessary complexity" +- For our use case (simple API calls with tool calling), it's a 10-ton truck for a bicycle ride + +--- + +#### Option D: LiteLLM (via proxy) +**Reliability: 5/10 | Confidence: 4/10** + +Run a Python proxy process, point OpenAI SDK at it. + +```typescript +import OpenAI from 'openai'; + +const client = new OpenAI({ + baseURL: 'http://localhost:4000', // LiteLLM proxy + apiKey: 'sk-anything', +}); + +const result = await client.chat.completions.create({ + model: 'anthropic/claude-sonnet-4-20250514', + messages: [{ role: 'user', content: 'Review this diff...' }], +}); +``` + +**Pros:** +- 100+ providers through OpenAI-compatible API +- Rate limiting, fallbacks, cost tracking built-in +- Established in production at many companies + +**Cons:** +- **Requires Python runtime** — catastrophic for an Electron desktop app +- Another long-lived process to manage (proxy lifecycle) +- Performance degrades under concurrency (Python GIL) +- Extra latency hop: Electron → proxy → provider → proxy → Electron +- Enterprise features (SSO, RBAC) behind paid license +- Electron users expect a self-contained app, not "also install Python 3.11" + +--- + +#### Option E: Direct Provider SDKs with Thin Wrapper +**Reliability: 7/10 | Confidence: 7/10** + +```typescript +import Anthropic from '@anthropic-ai/sdk'; +import OpenAI from 'openai'; + +async function callLlm(provider: string, prompt: string) { + switch (provider) { + case 'anthropic': { + const client = new Anthropic(); + const msg = await client.messages.create({ + model: 'claude-sonnet-4-20250514', + max_tokens: 4096, 
+ messages: [{ role: 'user', content: prompt }], + }); + return msg.content[0].type === 'text' ? msg.content[0].text : ''; + } + case 'openai': { + const client = new OpenAI(); + const result = await client.chat.completions.create({ + model: 'gpt-4o', + messages: [{ role: 'user', content: prompt }], + }); + return result.choices[0]?.message?.content ?? ''; + } + // ...each provider has different API shape + } +} +``` + +**Pros:** +- Each SDK is lightweight and well-maintained +- No abstraction overhead — direct control + +**Cons:** +- Must implement unified tool calling ourselves (Anthropic tools format ≠ OpenAI function calling ≠ Google tool format) +- Must implement streaming ourselves for each provider +- Must implement structured output extraction per-provider +- Maintenance burden grows linearly with each new provider +- This is literally what AI SDK already does, but worse + +--- + +### Level 2 Recommendation: Option A (Vercel AI SDK) + +| Criteria | Score | +|----------|-------| +| Fit with existing code patterns | 9/10 — pure TypeScript, Node.js-compatible, modular | +| Lines of code to integrate | ~200 LOC (LlmService + IPC handlers) | +| Heavy dependencies added | No — ~117 kB gzipped total for core + 3 providers | +| Runs in Electron main process | Yes — confirmed by Sentry Electron integration docs | +| License compatibility | Apache-2.0 → compatible with our AGPL-3.0 | +| Active maintenance | 10/10 — 20M+ monthly downloads, Vercel-backed | + +--- + +## Combined Architecture + +``` +┌─────────────────────────────────────────────────────────┐ +│ Electron Main Process │ +│ │ +│ ┌──────────────────────────────────────────────────┐ │ +│ │ Level 1: CLI Process Management │ │ +│ │ │ │ +│ │ AgentCliAdapter (interface) │ │ +│ │ ├─ ClaudeCliAdapter (stream-json NDJSON) │ │ +│ │ ├─ CodexCliAdapter (app-server JSON-RPC) │ │ +│ │ ├─ GeminiCliAdapter (stream-json NDJSON) │ │ +│ │ └─ GooseCliAdapter (stdin recipes) │ │ +│ │ │ │ +│ │ spawnCli() + killProcessTree() 
(unchanged) │ │ +│ │ TeamMcpConfigBuilder (unchanged) │ │ +│ │ TeamProvisioningService (refactored to use │ │ +│ │ adapter.parseStdoutLine() etc.) │ │ +│ └──────────────────────────────────────────────────┘ │ +│ │ +│ ┌──────────────────────────────────────────────────┐ │ +│ │ Level 2: Programmatic LLM API Calls │ │ +│ │ │ │ +│ │ Vercel AI SDK (ai + @ai-sdk/*) │ │ +│ │ ├─ generateText() → code review, triage │ │ +│ │ ├─ streamText() → task planning, chat │ │ +│ │ ├─ generateObject()→ structured extraction │ │ +│ │ └─ tool() → MCP tool bridges │ │ +│ │ │ │ +│ │ LlmService.ts (~200 LOC) │ │ +│ │ IPC handlers → renderer │ │ +│ └──────────────────────────────────────────────────┘ │ +│ │ +│ ┌──────────────────────────────────────────────────┐ │ +│ │ Shared: agent-teams-controller │ │ +│ │ (provider-agnostic task/kanban/inbox CRUD) │ │ +│ └──────────────────────────────────────────────────┘ │ +└─────────────────────────────────────────────────────────┘ +``` + +--- + +## Comparison Matrix + +### Level 1: CLI Process Management + +| Criterion | Own Adapter | node-pty | MCO/Third-Party | +|-----------|-------------|----------|-----------------| +| Reliability | 9/10 | 5/10 | 3/10 | +| Confidence | 9/10 | 4/10 | 3/10 | +| Fit with codebase | 10/10 | 4/10 | 3/10 | +| New dependencies | 0 | 0 (already have) | Unknown | +| LOC to integrate | ~800 | ~600 | ~1000+ | +| Electron compatible | Yes | Yes (fragile) | Unknown | +| Handles protocol diffs | Explicit | No (raw PTY) | Generic/lossy | + +### Level 2: Programmatic LLM API Calls + +| Criterion | AI SDK | Mastra | LangChain | LiteLLM | Direct SDKs | +|-----------|--------|--------|-----------|---------|-------------| +| Reliability | 9/10 | 6/10 | 4/10 | 5/10 | 7/10 | +| Confidence | 9/10 | 5/10 | 3/10 | 4/10 | 7/10 | +| Fit with codebase | 9/10 | 5/10 | 3/10 | 2/10 | 7/10 | +| Bundle size | 117 kB | ~400+ kB | 101 kB + deps | N/A (Python) | ~80 kB | +| Tool calling | Unified | Unified (via AI SDK) | Unified | OpenAI-compat | 
Per-provider | +| Streaming | Async iterator | Async iterator | Chains | SSE proxy | Per-provider | +| Providers | 20+ | 94 (via AI SDK) | 20+ | 100+ | Each separate | +| Electron main proc | Confirmed | Untested | Problematic | Requires Python | Yes | +| License | Apache-2.0 | Elastic-2.0 / AGPL-3.0 | MIT | MIT | Varies | +| Maintenance | Vercel (huge team) | Startup (small) | Community | Community | Per-vendor | + +--- + +## Final Recommendation + +### Level 1: Own Adapter Pattern +- **0 new dependencies**, ~800 LOC +- Extract Claude-specific logic from the 7,982 LOC monolith into `ClaudeCliAdapter` +- Add `CodexCliAdapter`, `GeminiCliAdapter`, `GooseCliAdapter` incrementally +- Each adapter handles that CLI's unique protocol, binary resolution, spawn args, kill semantics +- Decomposes the monolith as a beneficial side effect + +### Level 2: Vercel AI SDK (`ai` + `@ai-sdk/*`) +- **4 lightweight deps** (~117 kB gzipped total), ~200 LOC integration +- `generateText()` for one-shot tasks, `streamText()` for interactive use, `generateObject()` for structured extraction +- Unified tool calling with Zod schemas +- Swap any provider with a one-line change +- Apache-2.0 is compatible with our AGPL-3.0 +- 20M+ monthly downloads, Electron compatibility confirmed + +### Implementation Order + +1. **Week 1**: Create `AgentCliAdapter` interface, extract `ClaudeCliAdapter` from `TeamProvisioningService` +2. **Week 1**: Install AI SDK, create `LlmService.ts` with `generateText()` wrapper, add IPC handlers +3. **Week 2**: Add `CodexCliAdapter` (app-server JSON-RPC mode) +4. **Week 2**: Build code review bot using AI SDK + MCP tools +5. **Week 3**: Add `GeminiCliAdapter`, `GooseCliAdapter` +6. **Week 3**: Build triage bot, task planning with `streamText()` + tool calling + +**Total effort**: ~3 weeks for full multi-provider support at both levels. 
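The Level 1 adapter pattern recommended above could look roughly like the sketch below. Only `AgentCliAdapter`, `ClaudeCliAdapter`, and `parseStdoutLine()` are names taken from this research; `AdapterEvent`, `buildSpawnArgs()`, `killSignal()`, and the assumed shapes of the `assistant` and `result` output messages (including the `subtype` field) are hypothetical and would need checking against the real stream-json output.

```typescript
// Sketch only. AdapterEvent, buildSpawnArgs() and killSignal() are hypothetical;
// the output message shapes parsed below are assumptions, not a documented schema.
type AdapterEvent =
  | { kind: 'assistant_text'; text: string }
  | { kind: 'turn_complete'; ok: boolean }
  | { kind: 'ignored'; raw: string };

interface AgentCliAdapter {
  readonly name: string;
  buildSpawnArgs(opts: { model?: string }): string[];
  parseStdoutLine(line: string): AdapterEvent;
  killSignal(): 'SIGTERM' | 'SIGKILL';
}

class ClaudeCliAdapter implements AgentCliAdapter {
  readonly name = 'claude';

  // Flags are the ones this research lists as hardcoded for the Claude CLI.
  buildSpawnArgs(opts: { model?: string }): string[] {
    const args = ['--input-format', 'stream-json', '--output-format', 'stream-json', '--verbose'];
    if (opts.model) args.push('--model', opts.model);
    return args;
  }

  parseStdoutLine(line: string): AdapterEvent {
    let msg: any;
    try {
      msg = JSON.parse(line);
    } catch {
      return { kind: 'ignored', raw: line }; // non-JSON noise on stdout
    }
    if (msg?.type === 'assistant' && Array.isArray(msg.message?.content)) {
      // Assumed: assistant output carries the same content blocks as the input envelope.
      const text = msg.message.content
        .filter((b: any) => b?.type === 'text' && typeof b.text === 'string')
        .map((b: any) => b.text)
        .join('');
      return { kind: 'assistant_text', text };
    }
    if (msg?.type === 'result') {
      // Assumed field name: `subtype` distinguishes success from error.
      return { kind: 'turn_complete', ok: msg.subtype === 'success' };
    }
    return { kind: 'ignored', raw: line };
  }

  killSignal(): 'SIGTERM' | 'SIGKILL' {
    return 'SIGKILL'; // the analysis notes that SIGTERM cleanup deletes team files
  }
}
```

With this shape, `TeamProvisioningService` would only switch on `AdapterEvent.kind`, and each additional CLI (Codex, Gemini, Goose) would supply its own parser and spawn arguments behind the same interface.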
+ +--- + +## Sources + +### AI SDK (Vercel) +- [AI SDK Introduction](https://ai-sdk.dev/docs/introduction) +- [AI SDK 6 Announcement](https://vercel.com/blog/ai-sdk-6) +- [Node.js Getting Started](https://ai-sdk.dev/docs/getting-started/nodejs) +- [Providers and Models](https://ai-sdk.dev/docs/foundations/providers-and-models) +- [Sentry Electron + Vercel AI Integration](https://docs.sentry.io/platforms/javascript/guides/electron/configuration/integrations/vercelai/) +- [Generating Text](https://ai-sdk.dev/docs/ai-sdk-core/generating-text) +- [npm: ai](https://www.npmjs.com/package/ai) +- [GitHub: vercel/ai](https://github.com/vercel/ai) + +### Codex CLI +- [Codex SDK](https://developers.openai.com/codex/sdk) +- [Codex App Server](https://developers.openai.com/codex/app-server) +- [npm: @openai/codex-sdk](https://www.npmjs.com/package/@openai/codex-sdk) +- [CLI Reference](https://developers.openai.com/codex/cli/reference) + +### Gemini CLI +- [Headless Mode Reference](https://geminicli.com/docs/cli/headless/) +- [GitHub: google-gemini/gemini-cli](https://github.com/google-gemini/gemini-cli) + +### Goose +- [GitHub: block/goose](https://github.com/block/goose) +- [CLI Commands](https://block.github.io/goose/docs/guides/goose-cli-commands/) + +### Mastra +- [GitHub: mastra-ai/mastra](https://github.com/mastra-ai/mastra) +- [Mastra Docs: Models](https://mastra.ai/models) + +### LangChain.js +- [LangChain vs Vercel AI SDK vs OpenAI SDK: 2026 Guide](https://strapi.io/blog/langchain-vs-vercel-ai-sdk-vs-openai-sdk-comparison-guide) +- [Bundle Size Issue #809](https://github.com/langchain-ai/langchainjs/issues/809) +- [LangChain Criticism](https://community.latenode.com/t/why-im-avoiding-langchain-in-2025/39046) + +### LiteLLM +- [LiteLLM Proxy Docs](https://docs.litellm.ai/docs/simple_proxy) +- [Best LiteLLM Alternatives 2026](https://www.getmaxim.ai/articles/best-litellm-alternatives-in-2026/) + +### License Compatibility +- [Apache License and GPL 
Compatibility](https://www.apache.org/licenses/GPL-compatibility.html) +- [Apache 2.0 Compatible Licenses Guide](https://licensecheck.io/guides/apache-compatible) + +### Ecosystem +- [CLI Coding Agents Comparison 2026](https://www.tembo.io/blog/coding-cli-tools-comparison) +- [awesome-cli-coding-agents](https://github.com/bradAGI/awesome-cli-coding-agents) diff --git a/docs/research/best-integration-approach.md b/docs/research/best-integration-approach.md new file mode 100644 index 00000000..45cbb764 --- /dev/null +++ b/docs/research/best-integration-approach.md @@ -0,0 +1,406 @@ +# Best Integration Approach for Multi-Provider Agent Support + +**Date**: 2026-03-24 +**Branch**: `dev` +**Based on**: deep codebase analysis of actual source files + +--- + +## Executive Summary + +After analyzing 21,584 LOC in `src/main/services/team/`, 2,973 LOC in `src/main/ipc/teams.ts`, 1,245 LOC in `mcp-server/src/`, and all prompt engineering in `TeamProvisioningService.ts` (7,982 LOC), the recommendation is: + +**Option 7: Hybrid approach** — keep Claude Code native support as-is, enhance the existing MCP server to be the universal integration point for other agents. + +This is the only approach that ships incrementally, preserves our working architecture, and provides real multi-provider value within 2-3 weeks. 
+ +--- + +## Architecture Deep Dive + +### Coupling Map (actual file references) + +#### Layer 1: Process Management (9/10 coupling to Claude) +- `src/main/services/team/ClaudeBinaryResolver.ts` (292 LOC) — resolves `claude` binary across PATH, NVM, platform-specific dirs +- `src/main/services/team/TeamProvisioningService.ts` (7,982 LOC) — the monolith: process spawn, stream-json parsing, prompt engineering, inbox relay, tool approval, stall detection, auth retry +- `src/main/utils/childProcess.ts` — `spawnCli()` injects `CLAUDE_HOOK_JUDGE_MODE` env var +- Claude CLI flags hardcoded: `--input-format stream-json`, `--output-format stream-json`, `--verbose`, `--setting-sources`, `--mcp-config`, `--disallowedTools`, `--dangerously-skip-permissions`, `--permission-prompt-tool`, `--permission-mode`, `--model`, `--effort`, `--worktree`, `--resume` +- Kill semantics: `killTeamProcess()` uses SIGKILL because Claude CLI SIGTERM cleanup **deletes team files** + +#### Layer 2: Protocol (10/10 coupling to Claude) +- stream-json protocol is entirely Claude-proprietary +- `HANDLED_STREAM_JSON_TYPES` = `user`, `assistant`, `control_request`, `result`, `system` +- Input format: `{"type":"user","message":{"role":"user","content":[{"type":"text","text":"..."}]}}\n` +- Output parsing: 60+ branches in `handleStreamJsonMessage()` (lines 4858-5294) +- `control_request` for tool approval — Claude Code-specific flow +- Teammate message format: `content` + +#### Layer 3: Prompt Engineering (10/10 coupling to Claude) +- `buildProvisioningPrompt()` (lines 860-953) — tells Claude to use `TeamCreate` built-in tool, then `Task` tool with `team_name` parameter to spawn teammates +- `buildMemberSpawnPrompt()` (lines 444-478) — instructs member to call `member_briefing` MCP tool first, then work with MCP task tools +- `buildPersistentLeadContext()` (lines 664-766) — 100+ line constraint block teaching Claude about kanban, review workflow, delegation-first behavior, agent block policy, cross-team 
messaging +- `buildTeamCtlOpsInstructions()` (lines 563-662) — exact MCP tool call examples: `task_create`, `task_get`, `kanban_set_column`, `review_approve`, etc. +- `buildActionModeProtocol()` — imports from `agent-teams-controller` via `protocols.buildActionModeProtocolText()` + +**Key insight**: The prompt teaches Claude to use two categories of tools: +1. **Claude Code built-in tools**: `TeamCreate`, `TeamDelete`, `TaskCreate` (the CLI's internal Task tool for spawning subagents), `SendMessage` — these exist ONLY in Claude Code +2. **MCP tools**: `task_create`, `task_get`, `task_list`, `kanban_get`, `review_approve`, `message_send`, etc. — these come from our `agent-teams-mcp` server and are **provider-agnostic** + +#### Layer 4: Data Layer (5/10 coupling — mostly agnostic) +- `agent-teams-controller` (workspace package) — **provider-agnostic** file-based CRUD for tasks, kanban, reviews, messages, processes +- `TeamDataService.ts` (1,953 LOC) — reads team data, invokes controller. Most logic is generic +- `TeamInboxWriter.ts` — writes JSON inbox files. 
No Claude-specific code +- `TeamTaskReader.ts`, `TeamTaskWriter.ts` — file-based task CRUD via controller +- `TeamKanbanManager.ts` — kanban state management via controller +- `TeamConfigReader.ts` — reads `config.json` from `~/.claude/teams//` +- Path dependency: `~/.claude/teams/` and `~/.claude/tasks/` via `pathDecoder.ts` + +#### Layer 5: MCP Server (0/10 coupling — fully agnostic) +- `mcp-server/src/` (1,245 LOC) — FastMCP server exposing 30+ tools +- **Already exposed tools**: + - Tasks: `task_create`, `task_get`, `task_get_comment`, `task_list`, `task_set_status`, `task_start`, `task_complete`, `task_set_owner`, `task_add_comment`, `task_attach_file`, `task_attach_comment_file`, `task_set_clarification`, `task_link`, `task_unlink`, `member_briefing`, `task_briefing` + - Kanban: `kanban_get`, `kanban_set_column`, `kanban_clear`, `kanban_list_reviewers`, `kanban_add_reviewer`, `kanban_remove_reviewer` + - Review: `review_request`, `review_start`, `review_approve`, `review_request_changes` + - Messages: `message_send` + - Processes: `process_register`, `process_list`, `process_unregister`, `process_stop` + - Cross-team: `cross_team_send`, `cross_team_list_targets`, `cross_team_get_outbox` + - Runtime: `team_launch`, `team_stop` +- Uses `agent-teams-controller` directly — no Claude Code dependency in MCP tools +- All tools take `teamName` + `claudeDir` as context parameters + +#### Layer 6: HTTP Control API (2/10 coupling) +- `src/main/http/teams.ts` — REST API for `POST /api/teams/:teamName/launch` and `/stop` +- `TeamControlApiState.ts` — publishes control endpoint to `~/.claude/team-control-api.json` +- Thin wrapper over `TeamProvisioningService` — the provisioning itself is Claude-coupled, but the HTTP API shape is generic + +--- + +## Approach Evaluation + +### 1. 
Mastra (TS-native orchestration framework) + +**Confidence: 4/10 | Reliability: 5/10** + +- **What it is**: Full TS-native agent framework with workflows, tools, memory, RAG +- **Effort**: 8-12 weeks +- **What breaks**: Everything. Mastra has its own agent lifecycle, tool system, and workflow engine. Our entire `TeamProvisioningService` (8K LOC), `TeamDataService` (2K LOC), prompt engineering, stream-json protocol, inbox system, kanban logic would need to be replaced or wrapped +- **What stays**: UI components (renderer), shared types, some utility code +- **Reusable code**: ~20% (UI, types, file watching) +- **Risk**: Very High. Mastra is designed for API-based agents (OpenAI, Anthropic API), not CLI-based agents. Claude Code Agent Teams runs as a CLI process with stream-json — Mastra has no concept of this. Would require either: + - Abandoning Claude Code CLI in favor of raw Anthropic API calls (losing Agent Teams, built-in tools, session persistence) + - Building a massive adapter layer to make Claude Code CLI look like a Mastra "agent" +- **Quality**: Medium. Multi-provider support would be good, but we'd lose all Claude Code-specific features that make the product unique +- **Verdict**: Massive rewrite for uncertain benefit. Our product IS Claude Code Agent Teams UI — Mastra would replace the foundation + +### 2. MCO (dispatch layer) + +**Confidence: 3/10 | Reliability: 4/10** + +- **What it is**: Lightweight dispatch layer for routing tasks to different agent providers +- **Effort**: 6-8 weeks +- **What breaks**: Same fundamental problem as Mastra — MCO dispatches to "agents" but doesn't understand Claude Code's CLI protocol, stream-json, Agent Teams, or our inbox system +- **What stays**: Data layer, UI, some services +- **Reusable code**: ~30% +- **Risk**: High. MCO is minimal and would require us to build most of the integration ourselves anyway +- **Quality**: Low-Medium. 
MCO is too thin to solve the real problems (protocol translation, process management, prompt adaptation) +- **Verdict**: All the work of a custom solution without the benefit of framework support + +### 3. Overstory Pattern (AgentRuntime interface + SQLite mail) + +**Confidence: 5/10 | Reliability: 6/10** + +- **What it is**: Abstract `AgentRuntime` interface with SQLite-backed message queue +- **Effort**: 6-10 weeks +- **What breaks**: Process management, protocol layer, prompt engineering +- **What stays**: UI, kanban logic, data layer structure (would migrate from JSON files to SQLite) +- **Reusable code**: ~35% +- **Risk**: High. Major architectural change (JSON files -> SQLite, inbox files -> SQLite mail queue). All of `TeamProvisioningService` would need rewriting for each provider +- **Quality**: Good long-term architecture, but: + - We already HAVE a working message system (JSON inbox files + file watchers) + - SQLite migration would break compatibility with Claude Code CLI's native file format + - Claude Code reads/writes `~/.claude/teams//inboxes/.json` directly — switching to SQLite means Claude Code can't participate without a shim +- **Verdict**: Architecturally elegant but fights against Claude Code's native file-based protocol + +### 4. mozilla/any-agent (meta-framework) + +**Confidence: 3/10 | Reliability: 3/10** + +- **What it is**: Python meta-framework to switch agent providers via config +- **Effort**: 10-14 weeks +- **What breaks**: Language barrier — our entire codebase is TypeScript/Electron. any-agent is Python +- **What stays**: UI (renderer) +- **Reusable code**: ~15% (UI only) +- **Risk**: Very High. Would need either: + - Python backend + IPC bridge to Electron renderer (architectural nightmare) + - Port any-agent concepts to TypeScript (then it's really option 5) +- **Quality**: Theoretically good multi-provider support, but wrong language ecosystem +- **Verdict**: Non-starter for a TypeScript/Electron project + +### 5. 
Our own AgentRuntime abstraction + +**Confidence: 6/10 | Reliability: 7/10** + +- **What it is**: Custom `AgentRuntime` interface inspired by the patterns above, implemented in TypeScript +- **Effort**: 8-12 weeks for full implementation, 4-6 weeks for MVP +- **What breaks**: `TeamProvisioningService` would be refactored into multiple provider-specific implementations +- **What stays**: Data layer (`agent-teams-controller`, TeamDataService, MCP server), UI, kanban, review, cross-team +- **Reusable code**: ~55-60% +- **Risk**: Medium-High. The abstraction must account for fundamentally different agent lifecycles: + - Claude Code: CLI process, stream-json, Agent Teams built-in, teammate spawning via Task tool + - Codex: subprocess, different CLI protocol, no native team tools + - Gemini CLI: yet another protocol + - API-based agents: HTTP calls, no process management at all +- **Quality**: Could be excellent if done right. But the abstraction boundary is extremely hard to get right because Claude Code's Agent Teams is so deeply integrated +- **Key interfaces needed**: + +```typescript +import type { Readable, Writable } from 'node:stream'; + +// AgentSpawnConfig, ParsedAgentOutput and PromptContext are still to be defined. +// The Promise type arguments for sendMessage() and checkAuth() are assumed. +interface AgentRuntime { + name: string; + spawn(config: AgentSpawnConfig): Promise<AgentProcess>; + sendMessage(process: AgentProcess, message: string): Promise<void>; + parseOutput(line: string): ParsedAgentOutput; + kill(process: AgentProcess): void; + checkAuth(): Promise<boolean>; + buildPrompt(context: PromptContext): string; +} + +interface AgentProcess { + pid: number; + stdin: Writable; + stdout: Readable; + stderr: Readable; + on(event: 'exit', handler: (code: number) => void): void; +} +``` + +- **The hard part**: `TeamProvisioningService` is 7,982 LOC of deeply intertwined logic. Splitting it into provider-agnostic + provider-specific parts is a multi-week refactoring effort. The `handleStreamJsonMessage()` method alone (lines 4858-5294) handles 15+ message types with side effects throughout +- **Verdict**: Right direction, but expensive and risky as a first step + +### 6. 
MCP-Based Approach (expose kanban as MCP server for external agents) + +**Confidence: 8/10 | Reliability: 8/10** + +- **What it is**: Enhance our existing MCP server so external agents (Codex, Gemini, any MCP-capable agent) connect TO us and use our kanban, tasks, messages, review system +- **Effort**: 2-3 weeks +- **What breaks**: Nothing. This is additive +- **What stays**: Everything. 100% of existing code remains unchanged +- **Reusable code**: 100% +- **Risk**: Low. We already have a working MCP server with 30+ tools +- **Quality**: Surprisingly good for the effort level. Here's why: + - **MCP is a cross-vendor standard** — Codex, Gemini CLI, Cursor, and many others already support MCP + - **Our MCP server already exposes the full API**: tasks, kanban, review, messages, cross-team, processes + - **External agents don't need our prompts** — they bring their own intelligence. They just need tools to interact with our kanban board + - **The user experience is**: open our app, see the kanban board, agents from different providers create tasks, update statuses, send messages, request reviews — all visible on the same board + +What's missing from the current MCP server for this to work: +1. **Team creation/config via MCP** — currently only `team_launch`/`team_stop` exist as runtime tools; need `team_create_config` MCP tool +2. **Member registration via MCP** — external agents need to register themselves as team members without Claude Code's `TeamCreate` built-in +3. **Agent identification** — MCP tools need a way for agents to identify themselves (which provider, which model) +4. **Task assignment notifications** — when a task is assigned to an external agent, something needs to notify that agent (webhook? polling? SSE?) +5. **Standalone MCP server mode** — currently our MCP server is spawned as a child process by `TeamMcpConfigBuilder`. 
For external agents, it needs to run standalone (it already can via `agent-teams-mcp` bin) + +- **Verdict**: Best bang for the buck. Low risk, high reuse, ships fast, provider-agnostic by design + +### 7. Hybrid: Native Claude Code + MCP Server for Others (RECOMMENDED) + +**Confidence: 9/10 | Reliability: 9/10** + +- **What it is**: Keep Claude Code Agent Teams as the primary (optimized) path. Enhance MCP server as the universal integration point for all other agents. Eventually, even Claude Code agents could use MCP tools (they already do via `--mcp-config`) +- **Effort**: 3-4 weeks for Phase 1, incremental thereafter +- **What breaks**: Nothing +- **What stays**: Everything +- **Reusable code**: 100% +- **Risk**: Very Low + +#### Why this is the right answer + +1. **We already have 90% of the infrastructure**: + - `mcp-server/` with 30+ tools covering tasks, kanban, review, messages, cross-team, processes + - `agent-teams-controller` as provider-agnostic data layer + - HTTP control API for launch/stop + - File watcher system that detects changes from ANY source (not just Claude Code) + +2. **Claude Code is our strongest path — don't break it**: + - `TeamProvisioningService` (8K LOC) is battle-tested, handles edge cases (auth retry, stall detection, post-compact context recovery, tool approval) + - The prompt engineering works. It took months to tune delegation-first behavior, task board discipline, review workflow, cross-team messaging + - Replacing this with a generic abstraction would lose all these optimizations + +3. **MCP is the industry standard for tool interop**: + - Claude Code already uses our MCP tools via `--mcp-config` + - OpenAI Codex supports MCP (announced 2025) + - Google Gemini supports MCP + - Cursor/Windsurf support MCP + - Any MCP-capable agent can connect today + +4. **The prompt is NOT a blocker for other agents**: + - Our prompts teach Claude Code agents how to use MCP tools (`task_create`, `kanban_set_column`, etc.) 
+ - External agents using MCP don't need our prompts — MCP tool descriptions ARE the prompt + - Each MCP tool already has a `description` field that tells any agent what it does + +5. **Incremental delivery**: + - Phase 1: Publish `agent-teams-mcp` as standalone npm package, add missing tools + - Phase 2: Add UI support for "external member" type, show provider badge + - Phase 3: Add notification/polling mechanism for task assignments + - Phase 4: Optionally abstract `TeamProvisioningService` for a second native provider + +--- + +## Implementation Plan + +### Phase 1: MCP Server Enhancement (Week 1-2) + +**Goal**: Any MCP-capable agent can join an existing team and work on tasks. + +New MCP tools to add to `mcp-server/src/tools/`: + +``` +team_join — register external agent as team member +team_leave — unregister from team +team_list_teams — discover available teams +team_get_config — get team configuration +member_register — register with provider/model metadata +member_heartbeat — keepalive for external agents +task_poll_assigned — poll for newly assigned tasks (for agents without push) +task_claim — claim an unassigned task +``` + +Files to modify: +- `mcp-server/src/tools/index.ts` — register new tool modules +- `mcp-server/src/tools/memberTools.ts` — NEW: member lifecycle tools +- `mcp-server/src/tools/teamDiscoveryTools.ts` — NEW: team discovery +- `mcp-server/package.json` — prepare for standalone npm publish +- `mcp-server/src/agent-teams-controller.d.ts` — extend controller types if needed + +Files unchanged (0 modifications to core): +- `src/main/services/team/TeamProvisioningService.ts` — untouched +- `src/main/services/team/TeamDataService.ts` — untouched +- `src/main/ipc/teams.ts` — untouched +- All prompt engineering — untouched + +### Phase 2: UI Support for External Agents (Week 2-3) + +**Goal**: External agents appear on the kanban board with provider badges. 
+ +- `src/shared/types/team.ts` — add `provider?: string`, `model?: string` to `TeamMember` +- `src/renderer/components/team/` — show provider icon/badge next to member name +- `src/main/services/team/TeamDataService.ts` — recognize external members in data reads +- File watchers already detect changes from any source — no changes needed + +### Phase 3: Notification Mechanism (Week 3-4) + +**Goal**: External agents get notified of task assignments without polling. + +Options (ranked): +1. **SSE endpoint** — `GET /api/teams/:teamName/events` — server-sent events for task changes. Reliability: 8/10, Confidence: 8/10 +2. **Webhook** — configure callback URL per member. Reliability: 7/10, Confidence: 7/10 +3. **Polling** — `task_poll_assigned` MCP tool (already planned in Phase 1). Reliability: 9/10, Confidence: 9/10 + +Recommend: Start with polling (simplest), add SSE later. + +### Phase 4: Optional Native Provider (Week 6+, if demand exists) + +**Goal**: Add a second native CLI provider (e.g., Codex) with process management. + +Only NOW would we extract the `AgentRuntime` abstraction from option 5, but scoped: +- Extract binary resolution from `ClaudeBinaryResolver` into `CliProvider` interface +- Extract process spawn from `TeamProvisioningService.createTeam()`/`launchTeam()` into provider-specific implementations +- Keep `TeamProvisioningService` as `ClaudeProvisioningService` (rename) +- Create `CodexProvisioningService` implementing same interface + +This is the expensive part (6-8 weeks), but by Phase 4 we'll know if there's actual demand. + +--- + +## Comparison Table + +| Criterion | Mastra | MCO | Overstory | any-agent | AgentRuntime | MCP-Only | **Hybrid** | +|---|---|---|---|---|---|---|---| +| Effort (weeks) | 8-12 | 6-8 | 6-10 | 10-14 | 8-12 | 2-3 | **3-4** | +| Code reuse | 20% | 30% | 35% | 15% | 55% | 100% | **100%** | +| Risk | Very High | High | High | Very High | Medium-High | Low | **Very Low** | +| Breaks existing? 
| Yes | Yes | Yes | Yes | Partially | No | **No** | +| Multi-provider quality | Good | Low-Med | Good | Good | Good | Good | **Good** | +| Incremental? | No | No | No | No | Partially | Yes | **Yes** | +| Ships fast? | No | No | No | No | No | Yes | **Yes** | +| Keeps Claude optimized? | No | No | No | No | Partially | Yes | **Yes** | +| Industry standard? | Custom | Custom | Custom | Python | Custom | MCP | **MCP** | +| Confidence | 4/10 | 3/10 | 5/10 | 3/10 | 6/10 | 8/10 | **9/10** | +| Reliability | 5/10 | 4/10 | 6/10 | 3/10 | 7/10 | 8/10 | **9/10** | + +--- + +## Prompt Engineering Analysis + +### What percentage is Claude-specific vs generic? + +| Prompt Section | Claude-Specific? | LOC | Purpose | +|---|---|---|---| +| `buildProvisioningPrompt()` | **100% Claude** | ~95 | Uses TeamCreate built-in, Task tool for spawning | +| `buildMemberSpawnPrompt()` | **30% Claude** | ~35 | MCP tool calls are generic; `Task tool` spawn is Claude | +| `buildPersistentLeadContext()` | **20% Claude** | ~100 | Constraints are generic; `TeamCreate`/`TeamDelete` refs are Claude | +| `buildTeamCtlOpsInstructions()` | **0% Claude** | ~100 | Pure MCP tool examples — any agent can use these | +| `buildActionModeProtocol()` | **0% Claude** | ~30 | Generic action mode behavior | +| `buildAgentBlockUsagePolicy()` | **50% Claude** | ~30 | Agent block format is Claude-specific; concept is generic | +| `buildReconnectMemberSpawnPrompt()` | **30% Claude** | ~50 | Similar to spawn prompt | + +**Overall**: ~35% of prompt content is Claude-specific (spawning, built-in tools). ~65% is generic task management behavior that any agent needs (use MCP tools, update task status, post comments before completing, notify lead after completion). + +**For MCP-based external agents**: The MCP tool `description` fields already serve as the "prompt". External agents don't need our big prompt — they discover tools via MCP protocol and use tool descriptions. 
The only thing missing is a "bootstrap briefing" MCP tool that gives a new agent its role, workflow instructions, and team context — and we already have `member_briefing` for this. + +--- + +## Risk Analysis for Recommended Approach (Hybrid) + +| Risk | Probability | Impact | Mitigation | +|---|---|---|---| +| MCP adoption stalls | Low | Medium | MCP is already adopted by Claude, Codex, Gemini, Cursor | +| External agents can't follow task workflow | Medium | Low | `member_briefing` provides onboarding; tool descriptions guide behavior | +| Performance with many external agents | Low | Medium | MCP server is lightweight; file I/O is the bottleneck (same as now) | +| Breaking changes in MCP protocol | Very Low | High | MCP spec is stable (v1.0+), FastMCP library handles protocol | +| External agent quality varies | High | Medium | This is a feature, not a bug — user chooses which agents to use | +| Path coupling (`~/.claude/`) | Low | Low | `claudeDir` parameter already supported in all MCP tools | + +--- + +## Final Recommendation + +**Go with Option 7: Hybrid (Claude Code native + MCP for others).** + +Reasoning: +1. **Zero risk to existing product** — nothing changes for Claude Code users +2. **Fastest time to market** — 3-4 weeks for meaningful multi-provider support +3. **100% code reuse** — no refactoring, no migration, no breaking changes +4. **Industry standard** — MCP is the protocol all major AI tools are converging on +5. **Natural evolution** — Phase 4 (native providers) can happen later if justified by demand +6. **Our MCP server already works** — 30+ tools, battle-tested with Claude Code Agent Teams +7. **Competitive advantage** — no one else has a kanban board + MCP server combination + +The key insight is: **we don't need to abstract our process management layer to support multiple providers**. Instead, we expose our **data layer** (tasks, kanban, reviews, messages) via MCP, and let each agent provider bring their own process management. 
Our app becomes the **collaboration hub** — the kanban board where all agents converge, regardless of provider. + +--- + +## Appendix: Key Source Files Referenced + +| File | LOC | Role | +|---|---|---| +| `src/main/services/team/TeamProvisioningService.ts` | 7,982 | Process lifecycle, prompt engineering, stream-json protocol | +| `src/main/services/team/TeamDataService.ts` | 1,953 | Data reads, controller integration | +| `src/main/ipc/teams.ts` | 2,973 | IPC handlers for all team operations | +| `src/main/services/team/ClaudeBinaryResolver.ts` | 292 | Claude binary resolution | +| `src/main/services/team/TeamInboxWriter.ts` | 80+ | File-based inbox writes | +| `src/main/services/team/TeamMcpConfigBuilder.ts` | 228 | MCP config generation for Claude | +| `src/main/services/team/CrossTeamService.ts` | 60+ | Cross-team messaging | +| `src/main/services/team/actionModeInstructions.ts` | 51 | Action mode protocol | +| `src/main/http/teams.ts` | 160+ | HTTP control API | +| `src/main/utils/childProcess.ts` | 182 | CLI spawn/kill utilities | +| `mcp-server/src/index.ts` | 24 | MCP server entry | +| `mcp-server/src/controller.ts` | 19 | Controller factory | +| `mcp-server/src/tools/taskTools.ts` | 501 | Task MCP tools | +| `mcp-server/src/tools/kanbanTools.ts` | 82 | Kanban MCP tools | +| `mcp-server/src/tools/reviewTools.ts` | 104 | Review MCP tools | +| `mcp-server/src/tools/messageTools.ts` | 60 | Message MCP tools | +| `mcp-server/src/tools/processTools.ts` | 89 | Process MCP tools | +| `mcp-server/src/tools/crossTeamTools.ts` | 81 | Cross-team MCP tools | +| `mcp-server/src/tools/runtimeTools.ts` | 78 | Runtime MCP tools | +| `src/types/agent-teams-controller.d.ts` | 101 | Controller type definitions | +| `src/shared/types/team.ts` | 100+ | Shared team types | diff --git a/docs/research/claude-coupling-analysis.md b/docs/research/claude-coupling-analysis.md new file mode 100644 index 00000000..da8e4b34 --- /dev/null +++ b/docs/research/claude-coupling-analysis.md @@ 
-0,0 +1,536 @@ +# Claude Coupling Analysis + +Comprehensive analysis of how tightly the Claude Agent Teams UI codebase is coupled to Claude/Claude Code/Claude Agent Teams. The goal is to understand the effort required to abstract the AI provider layer to support other agents (OpenAI Codex, Gemini CLI, etc.). + +**Date**: 2026-03-24 +**Branch**: `dev` +**Commit**: `08be859` + +--- + +## Summary Table + +| Area | Coupling (1-10) | Effort | Key Blockers | +|---|---|---|---| +| 1. Process Management | **9** | High | Binary name, CLI flags, kill semantics | +| 2. Protocol / Communication | **10** | High | stream-json is Claude-proprietary | +| 3. Message Parsing (JSONL) | **9** | High | Schema is Claude Code's internal format | +| 4. Team Management | **10** | Very High | Agent Teams is a Claude Code feature | +| 5. Session Data / Paths | **9** | Medium | `~/.claude/` hardcoded everywhere | +| 6. Authentication | **8** | Medium | `claude auth status`, GCS binary dist | +| 7. MCP Integration | **5** | Low | MCP is a cross-vendor standard | +| 8. UI Components | **6** | Medium | Branding strings, CLAUDE.md references | +| 9. Types / Interfaces | **8** | High | Types mirror Claude Code JSONL schema | +| 10. Configuration | **7** | Medium | Path constants, env vars, config files | +| **Pricing / Cost** | **7** | Medium | pricing.json is Claude-model-centric | +| **Model Parsing** | **9** | Low | `parseModelString()` only handles `claude-*` | + +**Overall Coupling Score: 8.3 / 10** — Deeply coupled to Claude Code at nearly every layer. + +--- + +## 1. 
Process Management + +**Coupling: 9/10 | Effort: High** + +### Specific Files +- `src/main/services/team/ClaudeBinaryResolver.ts` — resolves the `claude` binary across platforms +- `src/main/utils/childProcess.ts` — `spawnCli()` / `execCli()` wrappers inject `CLAUDE_HOOK_JUDGE_MODE` env var +- `src/main/services/team/TeamProvisioningService.ts` — spawns `claude` with Claude-specific flags +- `src/main/services/infrastructure/CliInstallerService.ts` — downloads `claude` binary from GCS +- `src/main/services/schedule/ScheduledTaskExecutor.ts` — spawns `claude -p` for scheduled tasks + +### What's Claude-specific +1. **Binary name**: `ClaudeBinaryResolver` searches for `claude` binary across PATH, NVM, platform-specific dirs +2. **CLI flags**: `--input-format stream-json`, `--output-format stream-json`, `--verbose`, `--setting-sources`, `--mcp-config`, `--disallowedTools`, `--dangerously-skip-permissions`, `--permission-prompt-tool`, `--model`, `--effort`, `--worktree`, `--resume`, `--no-session-persistence`, `--max-turns`, `--permission-mode` +3. **Env var**: `CLAUDE_HOOK_JUDGE_MODE: 'true'` injected into every CLI process +4. **Env var**: `CLAUDE_CONFIG_DIR` set in `buildEnrichedEnv()` +5. **Env var override**: `CLAUDE_CLI_PATH` for custom binary location +6. **Kill semantics**: `killTeamProcess()` uses SIGKILL specifically because Claude CLI cleanup on SIGTERM deletes team files +7. **GCS distribution**: `CliInstallerService` downloads from `https://storage.googleapis.com/claude-code-dist-.../claude-code-releases` +8. **Version command**: `claude --version` expected to output `"X.Y.Z (Claude Code)"` +9. 
**Install command**: `claude install` for shell integration + +### Abstraction Approach +Create a `CliProvider` interface: +```typescript +interface CliProvider { + name: string; + resolveBinaryPath(): Promise; + buildSpawnArgs(options: SpawnOptions): string[]; + buildEnv(binaryPath: string): NodeJS.ProcessEnv; + parseVersionOutput(stdout: string): string; + getKillSignal(): NodeJS.Signals; + install(): Promise; + checkAuth(): Promise; +} +``` +Each provider (ClaudeCliProvider, CodexCliProvider, GeminiCliProvider) implements this. `ClaudeBinaryResolver` becomes `ClaudeCliProvider.resolveBinaryPath()`. + +--- + +## 2. Protocol / Communication + +**Coupling: 10/10 | Effort: High** + +### Specific Files +- `src/main/services/team/TeamProvisioningService.ts` (lines 126-132, 2742-2980, 4849-5290) — stream-json parser +- `src/renderer/utils/streamJsonParser.ts` — renderer-side stream-json log parsing +- `src/renderer/components/team/CliLogsRichView.tsx` — renders stream-json output +- `src/shared/utils/teammateMessageParser.ts` — parses `` XML format + +### What's Claude-specific +1. **stream-json protocol**: Claude Code's proprietary newline-delimited JSON over stdin/stdout + - Input: `{"type":"user","message":{"role":"user","content":[...]}}\n` + - Output types: `user`, `assistant`, `control_request`, `result`, `system` + - `result.success` = turn complete, `result.error` = failure + - `control_request` for tool approval prompts +2. **Message envelope**: `{"type":"user","message":{"role":"user","content":[{"type":"text","text":"..."}]}}` +3. **Teammate message format**: XML tags `content` +4. **Preflight ping**: `claude -p "Output only the single word PONG." --output-format text --model haiku --max-turns 1 --no-session-persistence` +5. **Tool approval**: `control_request` type with `tool_input`, `tool_name`, approval via stdin + +### Abstraction Approach +This is the hardest area. 
Create a `CliProtocol` interface: +```typescript +interface CliProtocol { + formatInputMessage(text: string): string; + parseOutputLine(line: string): ParsedOutputMessage; + isResultSuccess(msg: ParsedOutputMessage): boolean; + isResultError(msg: ParsedOutputMessage): boolean; + isToolApprovalRequest(msg: ParsedOutputMessage): ToolApprovalRequest | null; + formatToolApprovalResponse(approved: boolean): string; + getProtocolFlags(): string[]; // e.g. ['--input-format', 'stream-json', ...] +} +``` +Each agent's protocol would need a distinct implementation. OpenAI Codex uses a different protocol (REST-based sandbox execution, not stdin/stdout). This would require major architectural changes. + +--- + +## 3. Message Parsing (JSONL) + +**Coupling: 9/10 | Effort: High** + +### Specific Files +- `src/main/types/jsonl.ts` — raw JSONL entry types (Claude Code session file format) +- `src/main/types/messages.ts` — parsed message types and type guards +- `src/main/types/domain.ts` — domain types referencing `~/.claude/projects/` structure +- `src/main/types/chunks.ts` — chunk building from parsed messages +- `src/main/utils/jsonl.ts` — JSONL file parser +- `src/main/constants/messageTags.ts` — ``, ``, `` tags + +### What's Claude-specific +1. **JSONL schema**: Entry types (`user`, `assistant`, `system`, `summary`, `file-history-snapshot`, `queue-operation`) are Claude Code's internal format +2. **Content blocks**: `text`, `thinking`, `tool_use`, `tool_result`, `image` — follows Anthropic Messages API schema +3. **`thinking` + `signature`**: Extended thinking is an Anthropic-specific feature +4. **`isMeta` flag**: Claude Code's internal convention for distinguishing real user messages from tool results +5. **`isSidechain`**: Claude Code's flag for subagent messages +6. **`stop_reason`**: `end_turn`, `tool_use`, `max_tokens`, `stop_sequence` — Anthropic API values +7. **XML tags in content**: ``, ``, ``, `` are Claude Code's internal message wrapping +8. 
**`` model**: Claude Code's marker for system-generated placeholders +9. **`isCompactSummary`**: Claude Code's context compaction mechanism +10. **Usage metadata**: `cache_read_input_tokens`, `cache_creation_input_tokens` — Anthropic cache API + +### Abstraction Approach +Create a `SessionParser` interface that converts provider-specific session data to a normalized `ParsedMessage`: +```typescript +interface SessionDataProvider { + parseSessionFile(path: string): AsyncIterable; + isRealUserMessage(msg: ParsedMessage): boolean; + isToolCall(block: ContentBlock): boolean; + extractToolResult(msg: ParsedMessage): ToolResult | null; +} +``` +The existing `ParsedMessage` type is actually reasonably generic (it has `toolCalls`, `toolResults`, `content`). The provider-specific part is the parsing FROM the raw format TO `ParsedMessage`. New providers would implement different parsers. + +--- + +## 4. Team Management + +**Coupling: 10/10 | Effort: Very High** + +### Specific Files +- `src/main/services/team/TeamProvisioningService.ts` (~7800 lines) — the monolith +- `agent-teams-controller/` — workspace package for file-level team operations +- `src/main/services/team/*.ts` (~35 files) — team data, inbox, tasks, kanban, review, cross-team +- `src/shared/types/team.ts` — TeamConfig, TeamTask, SendMessageRequest, etc. +- `src/main/ipc/teams.ts` — ~65 IPC handlers for team operations +- `src/shared/utils/leadDetection.ts` — detects team lead by `agentType` values + +### What's Claude-specific +1. **Agent Teams is a Claude Code feature**: `TeamCreate`, `TaskCreate`, `TaskUpdate`, `TaskList`, `TaskGet`, `SendMessage`, `TeamDelete` are Claude Code CLI tools +2. **Team file structure**: `~/.claude/teams/{teamName}/config.json`, `inboxes/{member}.json`, `kanban-state.json`, `processes.json`, `members.meta.json` +3. **Task file structure**: `~/.claude/tasks/{teamName}/{taskId}.json` +4. **Inbox protocol**: File-based message passing — lead reads stdin, teammates read inbox files +5. 
**Lead/teammate distinction**: Lead uses stream-json, teammates are independent CLI processes +6. **Tool blocking**: `--disallowedTools TeamDelete,TodoWrite` +7. **`agentType` values**: `team-lead`, `lead`, `orchestrator`, `general-purpose` — Claude Code internal values +8. **`teammate_spawned` tool results**: How team member processes are detected +9. **Cross-team communication**: `cross_team_send`, `cross_team_list_targets`, `cross_team_get_outbox` +10. **Action mode instructions**: Custom protocol text injected into team prompts +11. **`agent-teams-controller` package**: Pure JS module that reads Claude Code's team filesystem directly + +### Abstraction Approach +This is by far the hardest area. Agent Teams is a unique Claude Code feature with no equivalent in other CLI agents. Options: +- **Option A**: Keep team management as Claude-only feature, abstract only session viewing +- **Option B**: Build a generic team orchestration layer that wraps different agent CLIs. Would need to implement inbox/task/kanban semantics independently of Claude Code. +- **Option C**: Make team management pluggable — each provider declares `supportsTeams: boolean` and provides a `TeamOrchestrator` implementation if supported + +Option A is the most realistic short-term approach. + +--- + +## 5. 
Session Data / Paths + +**Coupling: 9/10 | Effort: Medium** + +### Specific Files +- `src/main/utils/pathDecoder.ts` — all path construction (`~/.claude/projects/`, `~/.claude/todos/`, `~/.claude/teams/`, `~/.claude/tasks/`) +- `src/main/services/discovery/ProjectScanner.ts` — scans `~/.claude/projects/` +- `src/main/services/infrastructure/FileWatcher.ts` — watches `~/.claude/projects/`, `~/.claude/todos/`, `~/.claude/teams/`, `~/.claude/tasks/` +- `src/main/services/infrastructure/SshConnectionManager.ts` — hardcodes `~/.claude/projects` for remote +- `src/main/services/infrastructure/ConfigManager.ts` — config at `~/.claude/claude-devtools-config.json` +- `src/main/constants/worktreePatterns.ts` — detects `.claude/worktrees/` pattern + +### What's Claude-specific +1. **Base path**: `~/.claude/` as root for all data +2. **Path encoding**: `/Users/name/project` → `-Users-name-project` (Claude Code's convention) +3. **Session files**: `~/.claude/projects/{encoded-path}/{uuid}.jsonl` +4. **Subagent files**: `~/.claude/projects/{path}/{session_uuid}/agent_{uuid}.jsonl` +5. **Todo files**: `~/.claude/todos/{sessionId}.json` +6. **Team files**: `~/.claude/teams/{teamName}/...` +7. **Task files**: `~/.claude/tasks/{teamName}/{taskId}.json` +8. **Config**: `~/.claude/claude-devtools-config.json` (our config, stored in Claude's directory) +9. **SSH remote**: Hardcoded `/home/{user}/.claude/projects`, `/Users/{user}/.claude/projects`, `/root/.claude/projects` +10. **Worktree patterns**: `.claude/worktrees/` as a known source + +### Abstraction Approach +Path resolution is already partially abstracted via `getClaudeBasePath()` with override support (`setClaudeBasePathOverride`). Extend to: +```typescript +interface DataPathProvider { + getBasePath(): string; // ~/.claude/, ~/.codex/, etc. 
+ getProjectsPath(): string; // {base}/projects/ + getSessionPath(projectId: string, sessionId: string): string; + getSubagentPath(projectId: string, sessionId: string): string; + encodeProjectPath(absolutePath: string): string; + decodeProjectPath(encoded: string): string; +} +``` +Medium effort because path functions are centralized in `pathDecoder.ts`. The SSH remote paths would need provider-specific resolution. + +--- + +## 6. Authentication + +**Coupling: 8/10 | Effort: Medium** + +### Specific Files +- `src/main/services/infrastructure/CliInstallerService.ts` — `claude auth status --output-format json`, `claude --version` +- `src/shared/types/cliInstaller.ts` — `CliInstallationStatus.authLoggedIn`, `authMethod` +- `src/main/utils/cliAuthDiagLog.ts` — diagnostic logging for auth issues +- `src/renderer/components/dashboard/CliStatusBanner.tsx` — shows login status + +### What's Claude-specific +1. **Auth check**: `claude auth status --output-format json` — returns `{loggedIn: boolean, authMethod: string}` +2. **Auth method types**: `"oauth_token"`, `"api_key"` — Claude-specific +3. **Binary distribution**: GCS bucket `claude-code-dist-*` with platform-specific binaries +4. **Install flow**: Downloads binary → SHA256 verify → `claude install` for shell integration +5. **Version parsing**: `"2.1.59 (Claude Code)"` format +6. **Preflight auth check**: Runs `claude -p "PONG"` to verify auth works + +### Abstraction Approach +```typescript +interface CliInstallerProvider { + getLatestVersion(): Promise; + downloadBinary(platform: CliPlatform): Promise; // returns temp path + installBinary(binaryPath: string): Promise; + checkVersion(binaryPath: string): Promise; + checkAuth(binaryPath: string): Promise; +} +``` + +--- + +## 7. 
MCP Integration + +**Coupling: 5/10 | Effort: Low** + +### Specific Files +- `src/main/services/team/TeamMcpConfigBuilder.ts` — builds MCP config JSON for team processes +- `src/main/services/extensions/install/McpInstallService.ts` — installs MCP servers +- `src/shared/types/extensions/mcp.ts` — MCP types +- `mcp-server/` — built-in MCP server for the app + +### What's Claude-specific +1. **Config file location**: `.claude.json` in home dir, `.mcp.json` in project +2. **CLI flag**: `--mcp-config` to pass config path to CLI +3. **Config format**: Standard MCP format (`{mcpServers: {name: {command, args}}}`) +4. **Built-in server**: `mcp-server/` is our own — not Claude-specific + +### What's NOT Claude-specific +MCP (Model Context Protocol) is becoming a cross-vendor standard. The protocol itself is vendor-neutral. The config format may vary by agent but the server implementation is portable. + +### Abstraction Approach +MCP is already the most abstracted area. The only coupling is the config file naming (`.claude.json`) and the `--mcp-config` flag. A provider interface would specify how to pass MCP config to the CLI. + +--- + +## 8. 
UI Components + +**Coupling: 6/10 | Effort: Medium** + +### Specific Files +- `src/renderer/index.html` — title "Claude Agent Teams UI" +- `src/renderer/components/common/ErrorBoundary.tsx` — CSS classes `bg-claude-dark-bg`, `text-claude-dark-text` +- `src/renderer/components/team/ClaudeLogsDialog.tsx`, `ClaudeLogsPanel.tsx`, `ClaudeLogsSection.tsx`, `ClaudeLogsFilterPopover.tsx`, `useClaudeLogsController.ts` — "Claude Logs" feature naming +- `src/renderer/types/claudeMd.ts` — CLAUDE.md tracking types +- `src/renderer/utils/claudeMdTracker.ts` (70 occurrences) — CLAUDE.md context tracking +- `src/renderer/utils/contextTracker.ts` (56 occurrences) — references CLAUDE.md sources +- `src/renderer/components/chat/SessionContextPanel/` — CLAUDE.md section +- `src/renderer/components/settings/sections/GeneralSection.tsx` (69 occurrences) — "Claude Root" settings +- `src/renderer/components/dashboard/CliStatusBanner.tsx` — "Claude CLI" status +- `src/renderer/index.css` — comments mentioning "Claude Code" +- `src/shared/constants/cli.ts` — `CLI_NOT_FOUND_MESSAGE = 'Claude CLI not found...'` + +### What's Claude-specific +1. **Branding strings**: "Claude Agent Teams UI", "Claude CLI", "Claude Logs", "Claude Root" +2. **CSS theme variables**: `claude-dark-bg`, `claude-dark-text`, `claude-dark-border`, `claude-dark-surface` in ErrorBoundary +3. **CLAUDE.md feature**: The entire CLAUDE.md tracking system (types, tracker, UI) is Claude Code specific +4. **"Claude Logs"**: 5+ components for viewing CLI logs named "ClaudeLogs*" +5. **Settings**: "Local Claude Root" setting for `~/.claude` override + +### What's NOT Claude-specific +- Chat rendering (UserChunk, AIChunk, SystemChunk) is generic +- Kanban board UI is generic +- Team member list, task management UI is generic +- Tool call visualization is generic (tool_use/tool_result pattern is shared across LLM providers) + +### Abstraction Approach +1. Replace hardcoded strings with a config/branding module +2. 
Rename `ClaudeLogs*` → `CliLogs*` or `AgentLogs*` +3. Rename `claudeMdTracker` → `instructionFileTracker` (provider specifies filename pattern) +4. CSS variable renaming is mechanical (`claude-dark-*` → `app-dark-*`) +5. "Claude Root" → "Agent Data Directory" + +--- + +## 9. Types / Interfaces + +**Coupling: 8/10 | Effort: High** + +### Specific Files +- `src/main/types/jsonl.ts` — `ChatHistoryEntry` union follows Claude Code JSONL exactly +- `src/main/types/messages.ts` — `ParsedMessage` with Claude-specific fields (`isMeta`, `isSidechain`, `isCompactSummary`) +- `src/main/types/domain.ts` — `MessageType`, `TokenUsage` with `cache_read_input_tokens` +- `src/shared/types/team.ts` — Team types entirely Claude Agent Teams specific +- `src/shared/types/api.ts` — API surface exposes Claude-specific session/team types +- `src/shared/utils/modelParser.ts` — parses `claude-*` model strings only +- `src/shared/utils/pricing.ts` — pricing data is Claude/Anthropic model centric + +### What's Claude-specific +1. **Content block types**: `thinking` with `signature` field — Anthropic extended thinking +2. **Token usage fields**: `cache_read_input_tokens`, `cache_creation_input_tokens` — Anthropic prompt caching +3. **Model string format**: `claude-{family}-{major}-{minor}-{date}` and old `claude-{major}-{family}-{date}` +4. **Model families**: `sonnet`, `opus`, `haiku` — Anthropic model names +5. **`isMeta`/`isSidechain`**: Claude Code's internal conventions +6. **`stop_reason` values**: `end_turn`, `tool_use`, `max_tokens`, `stop_sequence` +7. **Pricing data**: `resources/pricing.json` is Anthropic-model-only (includes Bedrock/Vertex variants) + +### Abstraction Approach +The `ParsedMessage` type is actually fairly close to a generic representation. 
Key changes: +- Make `thinking` content optional/provider-specific +- Generalize token usage (some fields are Anthropic-specific) +- `modelParser.ts` needs a provider-aware implementation +- Pricing needs multi-provider support (or provider-supplied pricing) + +--- + +## 10. Configuration + +**Coupling: 7/10 | Effort: Medium** + +### Specific Files +- `src/main/services/infrastructure/ConfigManager.ts` — stores config in `~/.claude/claude-devtools-config.json` +- `src/main/utils/cliEnv.ts` — sets `CLAUDE_CONFIG_DIR` env var +- `src/main/utils/pathDecoder.ts` — `getClaudeBasePath()` with override support +- `src/shared/utils/cliArgsParser.ts` — `PROTECTED_CLI_FLAGS` are Claude CLI flags +- `src/main/ipc/config.ts` — configuration IPC handlers +- `src/main/services/team/TeamMcpConfigBuilder.ts` — `.claude.json` user MCP config + +### What's Claude-specific +1. **Config dir**: `~/.claude/` as base +2. **Config filename**: `claude-devtools-config.json` +3. **Env vars**: `CLAUDE_CONFIG_DIR`, `CLAUDE_CLI_PATH`, `CLAUDE_HOOK_JUDGE_MODE` +4. **Protected flags**: `--input-format`, `--output-format`, `--setting-sources`, `--mcp-config`, `--disallowedTools`, `--verbose` +5. **Settings sources**: `user,project,local` — Claude CLI setting hierarchy +6. **User config files**: `.claude.json` (MCP), `~/.claude/settings.json` + +### Abstraction Approach +Already partially abstracted (`setClaudeBasePathOverride` exists). 
Extend: +```typescript +interface ProviderConfig { + basePath: string; + configFileName: string; + envVars: Record; + protectedFlags: Set; + settingSources?: string; +} +``` + +--- + +## Additional Coupling: `agent-teams-controller` Package + +**Coupling: 10/10 | Effort: High** + +The `agent-teams-controller/` workspace package is a pure JS module that directly reads/writes Claude Code's team filesystem: +- `runtimeHelpers.js`: `getPaths()` returns `~/.claude/teams/{name}/`, `~/.claude/tasks/{name}/` +- `context.js`: `createControllerContext({teamName, claudeDir})` +- `tasks.js`, `kanban.js`, `review.js`, `messages.js`, etc. — all operate on Claude's file structures + +This package would need to be either: +- Made provider-aware (different file layouts per provider) +- Replaced with a generic team data layer + +--- + +## Estimated Overall Effort for Full Abstraction + +| Phase | Scope | Estimated Effort | +|---|---|---| +| **Phase 1**: Session viewing only | Path abstraction + JSONL parser + model parser | 2-3 weeks | +| **Phase 2**: UI de-branding | Rename strings, CSS vars, component names | 1 week | +| **Phase 3**: CLI provider interface | Binary resolution + auth + install | 2 weeks | +| **Phase 4**: Protocol abstraction | stream-json → generic protocol layer | 3-4 weeks | +| **Phase 5**: Team management abstraction | Generic orchestration layer | 4-8 weeks | +| **Total** | Full multi-provider support | 12-18 weeks | + +--- + +## Recommended Abstraction Strategy + +### Priority Order (what to do first) + +1. **Paths first** (low risk, high reward) — `pathDecoder.ts` already has override support. Make `getBasePath()` provider-aware. This unblocks session viewing for other agents. + +2. **Session parser second** — Create `SessionDataProvider` interface. The existing `ParsedMessage` type works as the normalized target. Each provider implements a parser FROM their raw format TO `ParsedMessage`. + +3. 
**Model/pricing third** — Make `parseModelString()` and pricing lookup provider-aware. Use a registry pattern where each provider registers its models. + +4. **CLI provider fourth** — Abstract binary resolution, auth, install, spawning. This is where protocol differences become critical. + +5. **Team management last** — This is the hardest and most Claude-specific feature. Consider keeping it as a Claude-only feature initially. + +### What's Hardest + +1. **stream-json protocol** — This is Claude Code's proprietary stdin/stdout protocol. Other agents use completely different paradigms (OpenAI Codex uses a sandboxed REST API, Gemini CLI may use a different protocol). Abstracting this requires a fundamental architectural decision about how the app communicates with agents. + +2. **Agent Teams** — No other CLI agent has an equivalent feature. The entire team management subsystem (~35 service files, ~65 IPC handlers, controller package) is built around Claude Code's Agent Teams. Supporting multi-agent orchestration for other providers would essentially mean building this from scratch. + +3. **JSONL session format** — Claude Code's JSONL format is deeply embedded in the codebase (types, parsers, chunk builders, context trackers). While `ParsedMessage` serves as a reasonable intermediary, the raw parsing layer touching 10+ files would need provider-specific implementations. + +### What's Easiest + +1. **MCP** — Already vendor-neutral. Only the config file naming and the CLI flag need adjustment. +2. **UI branding** — Mechanical string/CSS replacement. +3. **Path configuration** — Override mechanism already exists. 
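
The registry pattern suggested for the model/pricing step in the priority order could look roughly like the sketch below. It is an illustration under stated assumptions, not code from the repository: `ModelInfo`, `ModelRegistry`, `claudeModels`, and the `example-` provider are hypothetical names, and the family matching is deliberately simpler than whatever `parseModelString()` does today.

```typescript
// Hypothetical sketch: only the ModelInfoProvider and parseModelString names
// come from this document; every other name is invented for illustration.

interface ModelInfo {
  provider: string; // which provider recognized the string
  family: string;   // e.g. "sonnet"
  raw: string;      // the original, unmodified model string
}

interface ModelInfoProvider {
  readonly name: string;
  // Returns null when the string does not belong to this provider.
  parseModelString(model: string): ModelInfo | null;
}

class ModelRegistry {
  private readonly providers: ModelInfoProvider[] = [];

  register(provider: ModelInfoProvider): void {
    this.providers.push(provider);
  }

  // Ask each provider in registration order; the first non-null answer wins.
  parse(model: string): ModelInfo | null {
    for (const provider of this.providers) {
      const info = provider.parseModelString(model);
      if (info !== null) return info;
    }
    return null;
  }
}

const CLAUDE_FAMILIES: string[] = ['sonnet', 'opus', 'haiku'];

// Simplified stand-in for the existing Claude-only parser: it only finds the
// family token and ignores version and date segments.
const claudeModels: ModelInfoProvider = {
  name: 'anthropic',
  parseModelString(model: string): ModelInfo | null {
    if (!model.startsWith('claude-')) return null;
    const parts = model.split('-');
    const family = CLAUDE_FAMILIES.find((f) => parts.includes(f));
    return family ? { provider: 'anthropic', family, raw: model } : null;
  },
};

// Placeholder second provider; the "example-" prefix is made up.
const exampleModels: ModelInfoProvider = {
  name: 'example',
  parseModelString(model: string): ModelInfo | null {
    return model.startsWith('example-')
      ? { provider: 'example', family: 'default', raw: model }
      : null;
  },
};

const registry = new ModelRegistry();
registry.register(claudeModels);
registry.register(exampleModels);
```

Callers would then use `registry.parse(modelString)` instead of calling the Claude-specific parser directly, and treat a `null` result as "unknown model" rather than as an error. Pricing lookup could hang off the same provider objects, but that part is left out here.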
+ +--- + +## Architecture Diagram: Provider-Agnostic Layer + +``` +┌─────────────────────────────────────────────────────────────┐ +│ Renderer (UI) │ +│ Components (generic) │ Store (generic) │ Types (generic)│ +└────────────────────────────┬────────────────────────────────┘ + │ IPC +┌────────────────────────────┴────────────────────────────────┐ +│ Provider Manager │ +│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ +│ │ Session │ │ CLI │ │ Team │ │ +│ │ Provider │ │ Provider │ │ Provider │ ← Interfaces │ +│ └─────┬────┘ └─────┬────┘ └─────┬────┘ │ +│ │ │ │ │ +│ ┌─────┴────┐ ┌─────┴────┐ ┌─────┴────────┐ │ +│ │ Claude │ │ Claude │ │ Claude Agent │ │ +│ │ JSONL │ │ CLI │ │ Teams │ ← Impls │ +│ │ Parser │ │ Spawner │ │ Orchestrator │ │ +│ └──────────┘ └──────────┘ └──────────────┘ │ +│ ┌──────────┐ ┌──────────┐ ┌──────────────┐ │ +│ │ Codex │ │ Codex │ │ (not │ │ +│ │ Session │ │ CLI │ │ supported) │ ← Future │ +│ │ Parser │ │ Spawner │ │ │ │ +│ └──────────┘ └──────────┘ └──────────────┘ │ +└─────────────────────────────────────────────────────────────┘ + │ +┌────────────────────────────┴────────────────────────────────┐ +│ Data Path Provider │ +│ ~/.claude/ │ ~/.codex/ │ ~/.gemini/ │ etc. 
│ +└─────────────────────────────────────────────────────────────┘ +``` + +### Key Interfaces + +``` +CliProvider +├── resolveBinaryPath() → string | null +├── buildSpawnArgs(opts) → string[] +├── buildEnv(binary) → ProcessEnv +├── checkAuth(binary) → AuthStatus +├── getKillSignal() → Signals +└── getProtocolFlags() → string[] + +CliProtocol +├── formatInputMessage(text) → string +├── parseOutputLine(line) → ParsedOutput +├── isSuccess(msg) → boolean +├── isError(msg) → boolean +└── isToolApproval(msg) → ToolApproval | null + +SessionDataProvider +├── parseSessionFile(path) → AsyncIterable +├── getSessionPaths(basePath) → string[] +├── getSubagentPaths(sessionPath) → string[] +└── encodeProjectPath(path) → string + +DataPathProvider +├── getBasePath() → string +├── getProjectsPath() → string +├── getTeamsPath() → string | null +├── getSessionFilePath(project, session) → string +└── getConfigFilePath() → string + +TeamOrchestrator (optional per provider) +├── supportsTeams: boolean +├── createTeam(request) → TeamCreateResponse +├── launchTeam(request) → TeamLaunchResponse +├── sendMessage(team, request) → SendMessageResult +└── stopTeam(teamName) → void + +ModelInfoProvider +├── parseModelString(model) → ModelInfo | null +├── getModelFamilies() → string[] +├── getPricing(model) → Pricing | null +└── getContextWindow(model) → number + +InstructionFileProvider +├── getFilename() → string // "CLAUDE.md", ".codexrc", etc. +├── getGlobalPath() → string +├── getProjectPath(projectDir) → string +└── getSourceTypes() → string[] +``` + +--- + +## Conclusion + +The codebase is deeply coupled to Claude Code at approximately 8.3/10 overall. The coupling is most severe in: +1. **Team management** (10/10) — Claude Agent Teams is a unique feature with no equivalent +2. **Protocol** (10/10) — stream-json is proprietary +3. **Session data** (9/10) — JSONL format, path encoding, file structure +4. 
**Process management** (9/10) — Claude binary, flags, kill semantics + +The most pragmatic path to multi-provider support would be a phased approach starting with session viewing (paths + JSONL parser abstraction), which delivers value after an estimated 2-3 weeks of effort (Phase 1), before tackling the much harder protocol and team management layers. + +Full abstraction to support other agents with team management would require 12-18 weeks of focused effort, with the protocol and team management layers being the primary engineering challenges. diff --git a/docs/research/claude-kanban-dataflow.md b/docs/research/claude-kanban-dataflow.md new file mode 100644 index 00000000..fe696db9 --- /dev/null +++ b/docs/research/claude-kanban-dataflow.md @@ -0,0 +1,431 @@ +# Claude Kanban Data Flow: Full Architecture Analysis + +## Executive Summary + +Claude Code **does NOT use its own built-in Agent Teams tools** (TaskCreate, TaskUpdate, TaskList, etc.) for kanban management. Instead, our app injects a **custom MCP server** (`agent-teams-mcp`) that provides its own set of tools (`task_create`, `task_list`, `task_start`, `task_complete`, `review_request`, etc.). Claude's built-in `TaskCreate` is explicitly demoted to "optional for private planning only" via the provisioning prompt. + +The data flow is: **Claude calls MCP tools → agent-teams-controller writes JSON files to disk → fs.watch() detects changes → IPC event → React UI updates**. + +--- + +## 1. 
How the MCP Server Gets Injected + +### TeamMcpConfigBuilder (`src/main/services/team/TeamMcpConfigBuilder.ts`) + +When a team is created or launched, `TeamMcpConfigBuilder.writeConfigFile()` generates a temporary JSON file: + +``` +/tmp/claude-team-mcp/agent-teams-mcp-.json +``` + +Contents: +```json +{ + "mcpServers": { + "agent-teams": { + "command": "node", + "args": ["/path/to/mcp-server/index.js"] + }, + ...userMcpServers + } +} +``` + +This merges the user's `~/.claude.json` MCP servers with the injected `agent-teams` server (our server wins on name collision). + +### CLI Launch Args (`TeamProvisioningService.ts`, lines 2986-2989) + +```typescript +'--mcp-config', mcpConfigPath, +'--disallowedTools', 'TeamDelete,TodoWrite', +``` + +- `--mcp-config` points Claude CLI to our generated config +- `TeamDelete` is blocked to prevent team cleanup +- `TodoWrite` is blocked because Opus tends to use it instead of our MCP tools +- Claude's native `TaskCreate`/`TaskUpdate` are NOT blocked — they are left available but deprioritized via prompt engineering + +### The Provisioning Prompt (line 724) + +``` +- TaskCreate is optional for private planning only; do NOT use it for team-board tasks. +``` + +The prompt then explicitly instructs Claude to use MCP tools: + +``` +Task board operations — use MCP tools directly: +- Get task details: task_get { teamName: "...", taskId: "" } +- Create task: task_create { teamName: "...", subject: "...", ... } +- Start task: task_start { teamName: "...", taskId: "" } +... +``` + +--- + +## 2. What MCP Tools Exist + +### MCP Server Structure (`mcp-server/`) + +``` +mcp-server/ +├── src/ +│ ├── index.ts — FastMCP server, stdio transport +│ ├── controller.ts — wraps agent-teams-controller +│ └── tools/ +│ ├── taskTools.ts — task_create, task_list, task_get, task_set_status, task_start, +│ │ task_complete, task_set_owner, task_add_comment, task_link, etc. 
+│ ├── kanbanTools.ts — kanban_get, kanban_set_column, kanban_clear, kanban_add_reviewer +│ ├── reviewTools.ts — review_request, review_start, review_approve, review_request_changes +│ ├── messageTools.ts +│ ├── processTools.ts +│ ├── runtimeTools.ts +│ └── crossTeamTools.ts +``` + +### Full MCP Tool List + +| Domain | Tools | +|--------|-------| +| Task | `task_create`, `task_create_from_message`, `task_get`, `task_get_comment`, `task_list`, `task_set_status`, `task_start`, `task_complete`, `task_set_owner`, `task_add_comment`, `task_attach_file`, `task_attach_comment_file`, `task_set_clarification`, `task_link`, `task_unlink`, `member_briefing`, `task_briefing` | +| Kanban | `kanban_get`, `kanban_set_column`, `kanban_clear`, `kanban_list_reviewers`, `kanban_add_reviewer`, `kanban_remove_reviewer` | +| Review | `review_request`, `review_start`, `review_approve`, `review_request_changes` | +| Message | (message-related tools) | +| Process | (process-related tools) | +| Runtime | (runtime-related tools) | +| Cross-team | `cross_team_send`, `cross_team_list_targets`, `cross_team_get_outbox` | + +--- + +## 3. Data Flow: Claude MCP Tool Call → Disk + +### The Shared Library: `agent-teams-controller` + +Both the MCP server and the Electron main process use the same `agent-teams-controller` package (workspace dependency). This is a plain JS library that provides: + +```javascript +// agent-teams-controller/src/controller.js +function createController(options) { + const context = createControllerContext(options); // { teamName, paths } + return { + tasks: bindModule(context, tasks), + kanban: bindModule(context, kanban), + review: bindModule(context, review), + messages: bindModule(context, messages), + ... 
+ }; +} +``` + +### Path Resolution + +```javascript +// agent-teams-controller/src/internal/runtimeHelpers.js +function getPaths(flags, teamName) { + const claudeDir = getClaudeDir(flags); // ~/.claude + return { + teamDir: path.join(claudeDir, 'teams', teamName), + tasksDir: path.join(claudeDir, 'tasks', teamName), + kanbanPath: path.join(claudeDir, 'teams', teamName, 'kanban-state.json'), + ... + }; +} +``` + +So tasks live in `~/.claude/tasks//.json` and kanban state lives in `~/.claude/teams//kanban-state.json`. + +### Task Creation Flow (MCP → Disk) + +1. Claude calls MCP tool: `task_create { teamName: "my-team", subject: "Fix bug" }` +2. `mcp-server/src/tools/taskTools.ts` → `getController(teamName).tasks.createTask(...)` +3. `agent-teams-controller/src/internal/tasks.js` → `taskStore.createTask(context, params)` +4. `agent-teams-controller/src/internal/taskStore.js`: + ```javascript + function writeJson(filePath, value) { + ensureDir(path.dirname(filePath)); + const tempPath = `${filePath}.${process.pid}.${Date.now()}.tmp`; + fs.writeFileSync(tempPath, JSON.stringify(value, null, 2)); + fs.renameSync(tempPath, filePath); // atomic write + } + ``` +5. Result: `~/.claude/tasks/my-team/.json` is created + +### Kanban State Flow + +The kanban state is a separate JSON file (`kanban-state.json`) in the teams directory. When Claude calls `review_request` or `kanban_set_column`, the controller writes to `~/.claude/teams//kanban-state.json`. + +--- + +## 4. Data Flow: Disk → UI + +### FileWatcher (`src/main/services/infrastructure/FileWatcher.ts`) + +There are **two separate fs.watch()** watchers: + +1. **Teams watcher** — watches `~/.claude/teams/` (recursive) + - Detects: `config.json`, `kanban-state.json`, `inboxes/*.json`, `sentMessages.json`, `processes.json` + +2. 
**Tasks watcher** — watches `~/.claude/tasks/` (recursive) + - Detects: `/.json` changes + +When a file changes: + +```typescript +// FileWatcher.ts, line 404 +this.tasksWatcher = fs.watch(this.tasksPath, { recursive: true }, (eventType, filename) => { + this.handleTasksChange(eventType, filename); +}); +``` + +`processTasksChange()` (line 1028) parses the filename to extract `teamName` and `detail` (e.g., "12.json"), then emits: + +```typescript +const event: TeamChangeEvent = { type: 'task', teamName, detail: relative }; +this.emit('team-change', event); +``` + +### Event Propagation (`src/main/index.ts`, line 500-608) + +`wireFileWatcherEvents()` listens for `team-change` events: + +```typescript +context.fileWatcher.on('team-change', teamChangeHandler); +``` + +For task events (`row.type === 'task'`): + +1. **Sends IPC to renderer**: `mainWindow.webContents.send(TEAM_CHANGE, event)` (line 502) +2. **Broadcasts to HTTP SSE**: `httpServer?.broadcast('team-change', event)` (line 504) +3. **Reconciles artifacts**: `teamDataService.reconcileTeamArtifacts(teamName)` (line 583) +4. **Notifies lead**: `teamDataService.notifyLeadOnTeammateTaskStart(teamName, taskId)` (line 590) +5. **Backs up task**: `teamBackupService.scheduleTaskBackup(teamName, detail)` (line 606) + +### UI Data Reading + +The renderer (React) receives `TEAM_CHANGE` events and re-fetches task data via IPC: + +- `team:getTasks` → calls `TeamTaskReader.getTasks(teamName)` which reads all `~/.claude/tasks//*.json` files +- `team:updateKanban` → calls `TeamKanbanManager.updateTask()` which reads/writes `kanban-state.json` + +The Electron `TeamTaskReader` (`src/main/services/team/TeamTaskReader.ts`) re-reads all task JSON files from disk, parses them, filters out `_internal` tasks, normalizes fields, and returns `TeamTask[]` to the renderer. + +--- + +## 5. Claude's Built-in Tools vs Our MCP Tools + +### Claude's Native Built-in Tools (Agent Teams Protocol) + +| Native Tool | Purpose | Blocked? 
| +|-------------|---------|----------| +| `TeamCreate` | Create team structure (config.json, state) | No — used during provisioning | +| `TaskCreate` | Create a task via CLI internal mechanism | No — but deprioritized by prompt ("optional for private planning only") | +| `TaskUpdate` | Update task via CLI internal mechanism | No — but never instructed to use | +| `TaskList` | List tasks via CLI | No — but never instructed to use | +| `TaskGet` | Get task via CLI | No — but never instructed to use | +| `SendMessage` | Send message between agents | No — actively used for inter-agent chat | +| `TeamDelete` | Delete team | **YES — blocked via --disallowedTools** | +| `TodoWrite` | Write todo items | **YES — blocked via --disallowedTools** | +| `Agent` | Spawn subagent/teammate | No — actively used to spawn teammates | + +### Our MCP Tools (agent-teams-mcp) + +| MCP Tool | Purpose | Claude instructed to use? | +|----------|---------|-------------------------| +| `task_create` | Create task on board | **YES** — primary task creation | +| `task_start` | Move task to in_progress | **YES** | +| `task_complete` | Move task to completed | **YES** | +| `task_add_comment` | Add comment to task | **YES** | +| `task_get` | Read task details | **YES** | +| `task_list` | List all tasks | **YES** | +| `review_request` | Move to review column | **YES** | +| `review_approve` | Approve review | **YES** | +| `kanban_set_column` | Move task on kanban | **YES** | + +### Why This Split? + +Claude's native `TaskCreate` writes tasks to `~/.claude/tasks//.json` too — the same location. But: + +1. **Our MCP tools add richer fields** (displayId, workIntervals, historyEvents, comments, attachments, reviewState, sourceMessage, etc.) +2. **Our MCP tools enforce board discipline** (via agent-teams-controller logic) +3. **Our kanban state is a separate file** (`kanban-state.json`) that Claude's native tools don't manage +4. 
**Review workflow** (review_request → review_start → review_approve / review_request_changes) is entirely our MCP layer + +Claude's native TaskCreate creates simpler task JSON files. The CLI's internal Zod schema requires the `description`, `blocks`, and `blockedBy` fields, so our `TeamTaskWriter.createTask()` (lines 68-71) fills in defaults to stay CLI-compatible: +```typescript +const cliCompatibleTask = { + ...task, + description: task.description ?? '', + blocks: task.blocks ?? [], + blockedBy: task.blockedBy ?? [], +}; +``` + +--- + +## 6. The Two-Writer Problem + +Two main writers (the MCP server and Electron main) hit the same filesystem, and the Claude CLI's built-in tools are a possible third: + +| Writer | Writes to | When | +|--------|-----------|------| +| MCP server (agent-teams-controller) | `~/.claude/tasks//.json` | Claude calls `task_create`, `task_set_status`, etc. | +| Electron main (TeamTaskWriter) | `~/.claude/tasks//.json` | UI creates/updates tasks (user clicks "Create Task", drag-drop, etc.) | +| Claude CLI built-in | `~/.claude/tasks//.json` | If Claude uses native TaskCreate (deprioritized) | + +All three write to the same files. Concurrent writes are handled by: +- MCP: `taskStore.writeJson()` uses atomic temp+rename +- Electron: `TeamTaskWriter` uses per-file locks + `atomicWriteAsync()` +- CLI: Its own write mechanism + +There is NO cross-process lock between MCP and Electron. They rely on atomic writes and eventual consistency (the file watcher picks up changes after a ~100ms debounce). Atomic rename prevents torn files, but it does not prevent a lost update when two processes read, modify, and write the same task at the same time. + +--- + +## 7. 
Full Data Flow Diagram + +``` +┌─────────────────────────────────────────────────────────────────┐ +│ Claude Code CLI (stream-json process) │ +│ │ +│ ┌─────────────────┐ ┌──────────────────────┐ │ +│ │ Built-in Tools │ │ MCP Tools │ │ +│ │ • SendMessage │ │ (agent-teams-mcp) │ │ +│ │ • Agent │ │ • task_create │ │ +│ │ • TaskCreate(*) │ │ • task_start │ │ +│ │ • Read/Write/Bash │ │ • task_complete │ │ +│ └────────┬──────────┘ │ • task_add_comment │ │ +│ │ │ • review_request │ │ +│ │ │ • kanban_set_column │ │ +│ │ └───────────┬──────────────┘ │ +│ │ │ │ +│ stdout (stream-json) agent-teams-controller │ +│ │ │ │ +└───────────┼───────────────────────────┼─────────────────────────────┘ + │ │ + │ ┌──────▼──────────────────┐ + │ │ File System (disk) │ + │ │ │ + │ │ ~/.claude/tasks// │ + │ │ ├── 1.json │ + │ │ ├── 2.json │ + │ │ └── ... │ + │ │ │ + │ │ ~/.claude/teams// │ + │ │ ├── config.json │ + │ │ ├── kanban-state.json │ + │ │ └── inboxes/ │ + │ └──────┬──────────────────┘ + │ │ + │ fs.watch() (recursive) + │ │ +┌───────────┼───────────────────────────┼─────────────────────────────┐ +│ Electron Main Process │ │ +│ │ │ │ +│ ┌────────▼──────────┐ ┌───────────▼───────────┐ │ +│ │ TeamProvisioning │ │ FileWatcher │ │ +│ │ Service │ │ • tasksWatcher │ │ +│ │ (parses stdout) │ │ • teamsWatcher │ │ +│ │ │ └───────────┬───────────┘ │ +│ │ • captureSendMsg │ │ │ +│ │ • captureSpawnEvt │ TeamChangeEvent { type: 'task' } │ +│ │ • detectSessionId │ │ │ +│ └─────────────────────┘ ┌───────────▼───────────┐ │ +│ │ wireFileWatcherEvents │ │ +│ │ (src/main/index.ts) │ │ +│ ┌──────────────────────┐ └───────────┬───────────┘ │ +│ │ TeamTaskReader │ │ │ +│ │ (re-reads all .json) │◄─────────────┤ │ +│ │ │ │ │ +│ │ TeamKanbanManager │ IPC: TEAM_CHANGE │ +│ │ (reads kanban-state) │ │ │ +│ └──────────────────────┘ │ │ +│ │ │ +│ ┌──────────────────────┐ │ │ +│ │ TeamTaskWriter │ │ │ +│ │ (UI-initiated writes) │ │ │ +│ └──────────────────────┘ │ │ 
+└────────────────────────────────────────┼────────────────────────────┘ + │ + IPC (webContents.send) + │ +┌────────────────────────────────────────┼────────────────────────────┐ +│ Renderer (React + Zustand) │ │ +│ │ │ +│ team-change event → refetch tasks via IPC → update Zustand store │ +│ → re-render KanbanBoard │ +│ │ +└────────────────────────────────────────────────────────────────────┘ + +(*) TaskCreate — Claude's native tool, deprioritized by prompt. + Writes to same location but lacks our rich metadata. +``` + +--- + +## 8. Key Questions Answered + +### Does Claude currently use MCP for kanban management? + +**YES.** Claude uses our `agent-teams-mcp` MCP server for ALL task board operations. The server is injected via `--mcp-config` when spawning the CLI process. Claude's native `TaskCreate` is not blocked but is explicitly deprioritized ("optional for private planning only") via the system prompt. + +### How does task data flow? + +1. **Claude calls MCP tool** (e.g., `task_create`) via the stdio MCP transport +2. **agent-teams-controller** writes a JSON file to `~/.claude/tasks//.json` (atomic write via temp+rename) +3. **fs.watch()** in FileWatcher detects the change (100ms debounce) +4. **TeamChangeEvent** `{ type: 'task', teamName, detail: '.json' }` emitted +5. **wireFileWatcherEvents()** forwards to renderer via IPC (`webContents.send('team:change', event)`) +6. **Renderer** re-fetches full task list via IPC → `TeamTaskReader.getTasks()` re-reads all JSON files +7. **Zustand store** updates → React components re-render + +### Could we replace Claude's built-in tools with MCP tools? + +**We already did, effectively.** Claude's built-in `TaskCreate`/`TaskUpdate`/`TaskList`/`TaskGet` are NOT blocked, but the prompt instructs Claude to use our MCP tools exclusively. The built-in `SendMessage` and `Agent` tools are still used (they handle inter-agent communication and teammate spawning — responsibilities our MCP server doesn't cover). 
+ +What we CANNOT replace via MCP: +- `SendMessage`: Claude's native inter-agent messaging protocol +- `Agent`: the tool that spawns teammate subprocesses +- `TeamCreate`: bootstraps the team structure + +### If Claude also used MCP (like Codex/Gemini would), would that unify the architecture? + +**Partially, but with important nuances:** + +**What's already unified:** +- The `agent-teams-controller` package is the single source of truth for task/kanban/review operations. Both the MCP server and the Electron main process import it. +- Any AI agent (Claude, Codex, Gemini) that connects to our MCP server gets the same tools and writes to the same files. + +**What would still differ per agent:** +- **Team spawning**: Claude uses `Agent(team_name=...)`, which is proprietary. Other agents would need their own subprocess spawning mechanism. +- **Inter-agent messaging**: Claude uses `SendMessage` (part of its Agent Teams protocol). Other agents would need a different approach (perhaps an MCP-based `send_message` tool). +- **Process lifecycle**: Claude's `--input-format stream-json` / `--output-format stream-json` keeps the CLI alive. Other agents would need different process management. +- **Provisioning prompt**: Our provisioning prompt is Claude-specific. Other agents would need their own system prompts. + +**To truly unify for multi-agent support:** +1. The MCP server already provides all task/kanban operations, so any agent with MCP support can use them +2. We'd need to add MCP tools for messaging (`send_message`, `read_inbox`) to replace the Claude-specific `SendMessage` +3. We'd need a generic agent spawning mechanism (not Claude's `Agent` tool) +4. The stdout parsing in `TeamProvisioningService` is Claude-specific, so other agents would need different adapters + +--- + +## 9. 
File Index + +| File | Role | +|------|------| +| `src/main/services/team/TeamProvisioningService.ts` | Spawns Claude CLI, attaches stdout parser, handles stream-json, manages team lifecycle | +| `src/main/services/team/TeamMcpConfigBuilder.ts` | Generates `--mcp-config` JSON file that injects our MCP server | +| `mcp-server/src/index.ts` | FastMCP server entry point (stdio transport) | +| `mcp-server/src/controller.ts` | Wraps `agent-teams-controller` for MCP tools | +| `mcp-server/src/tools/taskTools.ts` | Task CRUD MCP tools (17 tools) | +| `mcp-server/src/tools/kanbanTools.ts` | Kanban state MCP tools (6 tools) | +| `mcp-server/src/tools/reviewTools.ts` | Review workflow MCP tools | +| `agent-teams-controller/src/controller.js` | Shared controller factory — creates context + binds all domain modules | +| `agent-teams-controller/src/internal/taskStore.js` | Low-level task JSON file read/write operations | +| `agent-teams-controller/src/internal/tasks.js` | Task business logic (create, start, complete, comment, etc.) | +| `agent-teams-controller/src/internal/runtimeHelpers.js` | Path resolution (`~/.claude/tasks/`, `~/.claude/teams/`) | +| `src/main/services/infrastructure/FileWatcher.ts` | Watches `~/.claude/tasks/` and `~/.claude/teams/` with fs.watch() | +| `src/main/index.ts` (lines 425-620) | `wireFileWatcherEvents()` — forwards file changes to renderer via IPC | +| `src/main/services/team/TeamTaskReader.ts` | Reads all task JSON files, normalizes, returns `TeamTask[]` | +| `src/main/services/team/TeamTaskWriter.ts` | UI-side writes (create, update status, add comment, etc.) 
| +| `src/main/services/team/TeamKanbanManager.ts` | Reads/writes `kanban-state.json` for UI kanban overlay | diff --git a/docs/research/cli-adapter-exhaustive-search.md b/docs/research/cli-adapter-exhaustive-search.md new file mode 100644 index 00000000..8dd76217 --- /dev/null +++ b/docs/research/cli-adapter-exhaustive-search.md @@ -0,0 +1,305 @@ +# Exhaustive Search: Unified CLI Agent Adapter Libraries + +**Date:** 2026-03-24 +**Goal:** Find ANY existing library/package that provides a unified interface for spawning and communicating with multiple AI coding CLI agents (Claude Code, Codex, Gemini CLI, Goose, Aider, OpenCode, etc.) +**Verdict:** Multiple viable options now exist. The landscape has changed dramatically since last check. + +--- + +## Executive Summary + +The "nothing exists" conclusion from previous research is **no longer accurate**. As of March 2026, there are at least **6 serious contenders** that provide a unified interface for controlling multiple CLI coding agents. The ecosystem exploded in late 2025 / early 2026 driven by the Agent Client Protocol (ACP) standard and the proliferation of CLI coding agents. + +However, **none of them are a drop-in library for Electron** in the way we need. Each has tradeoffs. The analysis below is ordered from most to least relevant for our use case. + +--- + +## Tier 1: Directly Relevant — Unified Agent Interface Libraries + +### 1. Rivet Sandbox Agent SDK +- **Repo:** https://github.com/rivet-dev/sandbox-agent +- **npm:** `@sandbox-agent/cli` (v0.2.x), `sandbox-agent` (TS SDK) +- **Website:** https://sandboxagent.dev +- **Language:** Rust server + TypeScript SDK +- **Supported agents:** Claude Code, Codex, OpenCode, Cursor, Amp, Pi (6 agents) +- **Last activity:** Active (HN launch Feb 2026) +- **Stars:** High interest (featured on InfoQ, HN front page) +- **TypeScript types:** Yes, full TypeScript SDK with embedded mode +- **Installable via npm:** Yes +- **Can embed in Electron:** Partially. 
The TS SDK can spawn the Rust binary as a subprocess. However, it's designed for sandboxed environments (Docker, E2B, Daytona), not local Electron apps. +- **How it works:** Rust HTTP server runs inside a sandbox, exposes unified REST + SSE API. TS SDK connects over HTTP or spawns daemon. +- **Universal session schema:** Yes — normalizes all agent events into consistent format (session lifecycle, items, questions, permissions) +- **Reliability:** 8/10 — Backed by Rivet (YC company), clean architecture +- **Confidence this fits our needs:** 5/10 — Sandbox-first design doesn't map well to local Electron. We'd need to run the binary locally without a sandbox. The TS SDK embed mode is promising but untested for our use case. + +### 2. Agent Client Protocol (ACP) + TypeScript SDK +- **Repo:** https://github.com/agentclientprotocol/typescript-sdk +- **npm:** `@agentclientprotocol/sdk` (v0.14.1, 245 dependents) +- **Spec:** https://agentclientprotocol.com +- **Language:** TypeScript +- **Supported agents:** 25+ agents (Claude, Codex, Gemini CLI, Copilot, Goose, OpenCode, Pi, Kiro, Junie, Cline, OpenHands, Qoder, Kimi, and many more) +- **Last publish:** 15 days ago (very active) +- **Stars:** Growing rapidly (Zed-backed, GitHub Copilot adopted it) +- **TypeScript types:** Yes, full TypeScript SDK +- **Installable via npm:** Yes +- **Can embed in Electron:** Yes. The SDK provides `ClientSideConnection` that connects to agents via stdio or TCP. You spawn the agent CLI process and pipe stdio — exactly like what we do now with Claude Code. +- **How it works:** Standardized JSON-RPC protocol over stdio/TCP. Each agent implements ACP server. Client spawns process, communicates via NDJSON. +- **Reliability:** 9/10 — Backed by Zed Industries, adopted by GitHub Copilot CLI, Gemini CLI, Goose, and 20+ agents. This is becoming the industry standard. +- **Confidence this fits our needs:** 7/10 — This is the most promising approach. 
However: not all agents support ACP natively yet (Claude Code's ACP support is via adapter, not native). The protocol covers editor-agent communication, which is close to but not identical to our CLI orchestration needs. +- **Critical note:** ACP is about standardizing the *protocol* between a client and an agent. It does NOT handle process spawning, worktree management, or team coordination — we'd still build that ourselves on top. + +### 3. @posthog/code-agent +- **Repo:** https://github.com/PostHog/code (monorepo) +- **npm:** `@posthog/code-agent` (v0.2.0) +- **Language:** TypeScript +- **Supported agents:** Claude Code (Anthropic), OpenAI Codex (2 agents) +- **Last publish:** ~3 months ago +- **Stars:** Part of PostHog's code monorepo +- **TypeScript types:** Yes, full TypeScript +- **Installable via npm:** Yes +- **Can embed in Electron:** Yes — it's a pure TypeScript library +- **How it works:** Wraps Anthropic Claude Agent SDK and OpenAI Codex SDK behind a unified interface. Single API for streaming events, tool calls, diffs, permissions. +- **Features:** Unified permissions (strict/auto/permissive), MCP bridge, diff normalization, streaming events, auth discovery +- **Reliability:** 6/10 — Only 2 providers, no community adoption (0 dependents), published by PostHog for their own products +- **Confidence this fits our needs:** 4/10 — Too limited (only 2 agents). Uses official SDKs (not CLI spawn), which means it talks to APIs, not CLI processes. Different paradigm from what we need. + +### 4. one-agent-sdk +- **Repo:** https://github.com/odysa/one-agent-sdk +- **Language:** TypeScript +- **Supported agents:** Claude Code, Codex, Kimi CLI (3 agents) +- **TypeScript types:** Yes +- **Installable via npm:** Appears to be (uses official provider SDKs) +- **Can embed in Electron:** Yes — pure TypeScript +- **How it works:** Wraps official SDKs (@anthropic-ai/claude-agent-sdk, @openai/codex-sdk, @moonshot-ai/kimi-agent-sdk) behind unified interface. 
Provider-agnostic tools, handoffs, middleware. +- **Reliability:** 4/10 — Very new, minimal community, only 3 providers +- **Confidence this fits our needs:** 3/10 — Same limitation as @posthog/code-agent: uses SDKs not CLI spawn. Only 3 agents. Too narrow. + +### 5. Coder AgentAPI +- **Repo:** https://github.com/coder/agentapi +- **Language:** Go (server), OpenAPI 3.0.3 spec available +- **Supported agents:** Claude Code, Goose, Aider, Gemini, Amp, Codex (6 agents) +- **Stars:** ~996 +- **Latest version:** v0.11.2 +- **TypeScript types:** No official TS SDK, but OpenAPI spec available for generation +- **Installable via npm:** No (Go binary) +- **Can embed in Electron:** Partially. We'd bundle the Go binary and spawn it as subprocess. +- **How it works:** Runs an in-memory terminal emulator. Translates API calls into terminal keystrokes, parses agent outputs into messages. Simple 4-endpoint REST API (POST /message, GET /status, GET /events SSE, GET /messages). +- **Reliability:** 7/10 — Built by Coder (well-funded company), clean design, but terminal emulation approach has inherent limitations +- **Confidence this fits our needs:** 5/10 — Terminal emulation is clever but fragile. We'd need to bundle a Go binary. No native TypeScript SDK. Could generate one from OpenAPI spec. + +--- + +## Tier 2: Standalone Apps with Adapter Architecture (Not Reusable Libraries) + +These projects have interesting adapter/plugin architectures but are **standalone applications**, not importable libraries. + +### 6. Overstory +- **Repo:** https://github.com/jayminwest/overstory +- **Language:** TypeScript (Bun runtime) +- **Architecture:** Pluggable `AgentRuntime` interface at `src/runtimes/types.ts` +- **Supported runtimes:** 11 (Claude Code, Pi, Gemini CLI, Aider, Goose, Amp, and custom) +- **Stars:** Growing +- **Reusable as library:** No. It's a CLI orchestrator (Bun-only, uses tmux). The `AgentRuntime` interface is embedded in the app, not published as a package. 
+- **Relevance:** The `AgentRuntime` interface design is good reference material for our own adapter pattern. Worth studying `src/runtimes/types.ts`. + +### 7. conductor-oss (by charannyk06) +- **Repo:** https://github.com/charannyk06/conductor-oss +- **npm:** `conductor-oss` (launcher only) +- **Language:** Rust backend + TypeScript frontend (Next.js dashboard) +- **Architecture:** `conductor-executors` crate contains adapters for 10 agents +- **Supported agents:** Claude Code, Codex, Gemini, Qwen Code, Cursor Agent, Amp, OpenCode, Copilot, CCR (10 agents) +- **Reusable as library:** No. The agent adapters are Rust code in a Rust crate. The npm package is just a launcher that starts the Rust server. +- **Relevance:** Good reference for agent adapter patterns. The adapter architecture handles binary detection, launch commands, process monitoring, and prompt delivery. + +### 8. Vibe Kanban +- **Repo:** https://github.com/BloopAI/vibe-kanban +- **npm:** `vibe-kanban` (npx wrapper) +- **Stars:** ~23.4k +- **Language:** Rust backend + TypeScript/React frontend +- **Architecture:** "Executor" plugin pattern for each agent +- **Supported agents:** 10+ (Claude Code, Codex, Gemini CLI, GitHub Copilot, Amp, Cursor, OpenCode, Droid, CCR, Qwen Code) +- **Reusable as library:** No. Executors are Rust code. TypeScript types are generated from Rust via ts-rs. +- **Relevance:** Closest competitor to our product. Their agent adapter pattern is in Rust, not reusable by us. But: there is a community TypeScript port `@nogataka/coding-agent-mgr` that claims to be a drop-in replacement — worth investigating. + +### 9. Dorothy +- **Repo:** https://github.com/Charlie85270/Dorothy +- **Language:** Electron + React/Next.js +- **Architecture:** Agent Manager using node-pty +- **Supported agents:** Claude Code, Codex, Gemini +- **Reusable as library:** No. Standalone Electron desktop app. +- **Relevance:** Very similar architecture to ours (Electron + node-pty). 
Good reference for how they handle agent spawning. MCP server integration is interesting. + +### 10. Emdash +- **Repo:** https://github.com/generalaction/emdash +- **Backed by:** Y Combinator W26 +- **Language:** Electron + TypeScript +- **Supported agents:** 23 CLI providers +- **Reusable as library:** No. Standalone Electron app with SQLite/Drizzle. +- **Relevance:** Most similar to our product architecture-wise (Electron + TypeScript). Supports 23 agents. Worth studying their provider integration code for patterns. Auto-detects installed CLIs. + +### 11. ComposioHQ Agent Orchestrator +- **Repo:** https://github.com/ComposioHQ/agent-orchestrator +- **npm:** `@composio/ao` (global CLI) +- **Language:** TypeScript (40,000 LOC) +- **Architecture:** 8 plugin slots (runtime, agent, workspace, tracker, SCM, notifier, terminal, lifecycle) +- **Supported agents:** Claude Code, Codex, Aider (and more via plugins) +- **Reusable as library:** Partially. The plugin interfaces are TypeScript, but the system is designed as a standalone CLI orchestrator. +- **Stars:** Growing (17 plugins, 3,288 tests) +- **Relevance:** The TypeScript plugin interface pattern could be extracted/adapted. + +### 12. Parallel Code +- **Repo:** https://github.com/johannesjo/parallel-code +- **Language:** Desktop app (unspecified stack) +- **Supported agents:** Claude Code, Codex CLI, Gemini CLI +- **Reusable as library:** No. Standalone desktop app. + +--- + +## Tier 3: MCP-Based Orchestrators (Different Paradigm) + +### 13. all-agents-mcp +- **Repo:** https://github.com/Dokkabei97/all-agents-mcp +- **npm:** `all-agents-mcp` (npx) +- **Language:** TypeScript +- **Supported agents:** Claude Code, Codex, Gemini CLI, Copilot CLI (4 agents) +- **Architecture:** MCP server with agent abstraction layer (`src/agents/types.ts`, `base-agent.ts`, per-agent adapters) +- **Reusable as library:** Partially. The agent abstraction layer (`src/agents/`) could be extracted. 
But it's designed as an MCP server, not a library. +- **TypeScript types:** Yes +- **Relevance:** The `src/agents/` directory contains a clean TypeScript agent abstraction with `types.ts`, `base-agent.ts`, and per-agent implementations. This is the closest to a reusable adapter pattern in pure TypeScript. + +### 14. agents-mcp (d-kimuson) +- **Repo:** https://github.com/d-kimuson/agents-mcp +- **Description:** MCP server for unified AI agents interface +- **Relevance:** Minimal info, likely similar pattern to all-agents-mcp + +--- + +## Tier 4: Protocols / Standards (Not Libraries, But Important Context) + +### 15. Agent Client Protocol (ACP) +- **Spec:** https://agentclientprotocol.com +- **Repo:** https://github.com/agentclientprotocol/agent-client-protocol +- **Created by:** Zed Industries +- **Adopted by:** GitHub Copilot CLI, Gemini CLI, Goose, Pi, OpenClaw, OpenCode, Cline, Codex, and 20+ agents +- **TypeScript SDK:** `@agentclientprotocol/sdk` (v0.14.1, 245 dependents, published 15 days ago) +- **This is becoming THE standard.** JSON-RPC over stdio/TCP. Editor spawns agent process, communicates via NDJSON. +- **Key insight:** If most agents converge on ACP, our adapter layer becomes simpler — we just need an ACP client. + +### 16. agent-protocol (AI Engineers Foundation) +- **npm:** `agent-protocol` (v1.0.5) +- **Last published:** 2 years ago (dead) +- **Relevance:** Superseded by ACP. Not relevant. + +--- + +## Tier 5: Tangentially Related (Process Management / Terminal Control) + +### 17. terminalcp (@mariozechner/terminalcp) +- **Repo:** https://github.com/badlogic/terminalcp +- **npm:** `@mariozechner/terminalcp` +- **What:** "Playwright for the terminal" — MCP server that lets agents spawn and interact with any CLI tool +- **Uses:** node-pty + xterm.js for terminal emulation +- **Relevance:** Not an agent adapter, but the terminal spawn/control pattern (node-pty + xterm.js + Unix socket daemon) is exactly what we'd use if building our own. + +### 18. 
Network-AI +- **Repo:** https://github.com/jovanSAPFIONEER/Network-AI +- **npm:** `network-ai` +- **Language:** TypeScript +- **What:** Multi-agent orchestrator with 14 adapters (LangChain, AutoGen, CrewAI, OpenAI, etc.) +- **Relevance:** The adapters are for AI *frameworks*, not CLI coding agents. Different domain. + +### 19. execa +- **npm:** `execa` (millions of weekly downloads) +- **What:** Process execution for humans. Wrapper around child_process. +- **Relevance:** Not agent-specific, but the best foundation for spawning CLI processes in Node.js. We already use this pattern. + +--- + +## Comprehensive Comparison Matrix + +| Project | Type | npm pkg? | TS types? | Agents | Electron-safe? | Active? | Our fit | +|---------|------|----------|-----------|--------|-----------------|---------|---------| +| **ACP SDK** | Protocol SDK | Yes | Yes | 25+ | Yes | Very | **Best** | +| **Sandbox Agent SDK** | Unified API | Yes | Yes | 6 | Partial | Active | Good | +| **@posthog/code-agent** | SDK wrapper | Yes | Yes | 2 | Yes | Stale | Poor | +| **one-agent-sdk** | SDK wrapper | Yes | Yes | 3 | Yes | New | Poor | +| **Coder AgentAPI** | HTTP server | No (Go) | OpenAPI | 6 | Partial | Active | OK | +| **all-agents-mcp** | MCP server | Yes | Yes | 4 | Partial | Active | Reference | +| **Overstory** | CLI app | No | Yes | 11 | No (Bun+tmux) | Active | Reference | +| **conductor-oss** | App | Launcher | Rust | 10 | No (Rust) | Active | Reference | +| **Vibe Kanban** | App | Wrapper | Generated | 10+ | No (Rust) | Active | Reference | +| **Dorothy** | Electron app | No | Yes | 3+ | Same arch | Active | Reference | +| **Emdash** | Electron app | No | Yes | 23 | Same arch | Active | Reference | +| **ComposioHQ AO** | CLI app | Global | Yes | 3+ | Partial | Active | Reference | + +--- + +## Recommendation + +### Best Option: ACP SDK (`@agentclientprotocol/sdk`) +- **Reliability:** 9/10 +- **Confidence:** 7/10 + +**Why:** ACP is becoming the industry standard. 
25+ agents support it. Backed by Zed, adopted by GitHub Copilot. The TypeScript SDK is actively maintained and widely used (v0.14.1, 245 dependents). It handles the protocol layer; we handle process spawning and team coordination on top. + +**Risk:** Claude Code supports ACP only through an adapter, not natively (its native protocol is stream-json). We'd need to verify Claude Code works with ACP in our specific use case (Agent Teams, stream-json mode). The protocol focuses on editor-agent communication, not CLI orchestration. + +### Fallback: Build Our Own Adapter Layer +- **Reliability:** 8/10 +- **Confidence:** 9/10 + +**Why:** Given that: +1. No library perfectly fits our Electron + Agent Teams architecture +2. The adapter layer is relatively thin (spawn process, pipe stdio, parse output) +3. We already have a working Claude Code integration via stream-json +4. ACP can be adopted incrementally as agents converge on it + +We should define our own `IAgentRuntime` interface (inspired by Overstory's `AgentRuntime` and ACP's `AgentSideConnection`), implement the Claude Code adapter first, then add ACP-based adapters for other agents. + +### Reference implementations to study: +1. **ACP TypeScript SDK**: Protocol design, event schema, NDJSON streaming +2. **Overstory `src/runtimes/types.ts`**: AgentRuntime interface design for CLI agents +3. **all-agents-mcp `src/agents/`**: Clean TypeScript agent abstraction with base class +4. **Emdash provider integration**: How they handle 23 agents in Electron +5. 
**Sandbox Agent SDK event schema** — Universal session schema for normalizing agent events + +--- + +## Sources + +### Tier 1 (Libraries/SDKs) +- [Rivet Sandbox Agent SDK](https://github.com/rivet-dev/sandbox-agent) | [Docs](https://sandboxagent.dev/) | [InfoQ](https://www.infoq.com/news/2026/02/rivet-agent-sandbox-sdk/) +- [ACP TypeScript SDK](https://github.com/agentclientprotocol/typescript-sdk) | [npm](https://www.npmjs.com/package/@agentclientprotocol/sdk) | [Spec](https://agentclientprotocol.com) +- [@posthog/code-agent](https://www.npmjs.com/package/@posthog/code-agent) | [PostHog/code](https://github.com/PostHog/code) +- [one-agent-sdk](https://github.com/odysa/one-agent-sdk) +- [Coder AgentAPI](https://github.com/coder/agentapi) + +### Tier 2 (Apps with Adapter Architecture) +- [Overstory](https://github.com/jayminwest/overstory) +- [conductor-oss](https://github.com/charannyk06/conductor-oss) | [npm](https://www.npmjs.com/package/conductor-oss) +- [Vibe Kanban](https://github.com/BloopAI/vibe-kanban) | [npm](https://www.npmjs.com/package/vibe-kanban) +- [Dorothy](https://github.com/Charlie85270/Dorothy) | [Site](https://dorothyai.app/) +- [Emdash](https://github.com/generalaction/emdash) | [Site](https://www.emdash.sh/) +- [ComposioHQ Agent Orchestrator](https://github.com/ComposioHQ/agent-orchestrator) +- [Parallel Code](https://github.com/johannesjo/parallel-code) + +### Tier 3 (MCP Orchestrators) +- [all-agents-mcp](https://github.com/Dokkabei97/all-agents-mcp) + +### Tier 4 (Protocols) +- [Agent Client Protocol](https://agentclientprotocol.com) | [GitHub](https://github.com/agentclientprotocol/agent-client-protocol) | [Copilot ACP](https://github.blog/changelog/2026-01-28-acp-support-in-copilot-cli-is-now-in-public-preview/) +- [AI Code Agents SDK](https://felix-arntz.me/blog/introducing-ai-code-agents-a-typescript-sdk-to-solve-vendor-lock-in-for-coding-agents/) (Vercel AI SDK based, early stage) + +### Tier 5 (Process/Terminal) +- 
[terminalcp](https://github.com/badlogic/terminalcp) | [npm](https://www.npmjs.com/package/@mariozechner/terminalcp) +- [Network-AI](https://github.com/jovanSAPFIONEER/Network-AI) + +### Curated Lists +- [awesome-agent-orchestrators](https://github.com/andyrewlee/awesome-agent-orchestrators) +- [awesome-cli-coding-agents](https://github.com/bradAGI/awesome-cli-coding-agents) + +### HN Discussions +- [Show HN: Sandbox Agent SDK](https://news.ycombinator.com/item?id=46795584) +- [Show HN: OpenSwarm](https://news.ycombinator.com/item?id=47160980) +- [Show HN: Bridge from Copilot SDK to ACP](https://news.ycombinator.com/item?id=47165572) +- [Ask HN: Why CLI coding agents?](https://news.ycombinator.com/item?id=45115303) diff --git a/docs/research/mastra-integration-analysis.md b/docs/research/mastra-integration-analysis.md new file mode 100644 index 00000000..4a4f2541 --- /dev/null +++ b/docs/research/mastra-integration-analysis.md @@ -0,0 +1,756 @@ +# Mastra Integration Analysis + +> Technical feasibility study for integrating Mastra (TypeScript agent framework) with Claude Agent Teams UI. +> Date: 2026-03-24 + +## Table of Contents + +1. [Executive Summary](#executive-summary) +2. [Mastra Architecture Overview](#mastra-architecture-overview) +3. [Our Codebase Architecture](#our-codebase-architecture) +4. [Integration Points Analysis](#integration-points-analysis) +5. [Concrete Integration Approaches](#concrete-integration-approaches) +6. [Architecture Diagram](#architecture-diagram) +7. [What Stays the Same](#what-stays-the-same) +8. [What Must Change](#what-must-change) +9. [Effort Estimate](#effort-estimate) +10. [Risks and Blockers](#risks-and-blockers) +11. [Recommendations](#recommendations) +12. [Sources](#sources) + +--- + +## Executive Summary + +Mastra is a TypeScript-first agent framework (22K+ stars, $13M seed, YC-backed) from the Gatsby team. 
It provides unified primitives for agents, tools, workflows, RAG, and multi-agent orchestration with 40+ LLM provider support. + +**Key finding: Mastra operates at a fundamentally different level than Claude CLI.** Our app is a process manager and UI for Claude Code CLI sessions. Mastra is an SDK for building agents programmatically. Integration is possible but requires a significant architectural shift — specifically, replacing Claude CLI process management with in-process Mastra agent runtime. + +**Verdict: 6/10 feasibility, 4/10 reliability of quick integration.** The integration is architecturally sound but represents 6-10 person-weeks of work with significant risk to our core differentiator (Claude Code CLI features: file editing, terminal, git, Agent tool for spawning teammates). + +--- + +## Mastra Architecture Overview + +### Core Packages + +| Package | Purpose | +|---------|---------| +| `@mastra/core` | Agent, Workflow, Tool, Server, Storage, Vector, DI | +| `@mastra/mcp` | MCPClient (consume) + MCPServer (expose) | +| `@mastra/ai-sdk` | AI SDK v5 compatibility layer | +| `@mastra/client-js` | HTTP client for remote Mastra servers | +| `mastra` | CLI for project scaffolding | + +### Agent Definition + +```typescript +import { Agent } from '@mastra/core/agent'; +import { MCPClient } from '@mastra/mcp'; + +const agent = new Agent({ + id: 'team-lead', + name: 'Team Lead', + instructions: 'You coordinate the team...', + model: 'anthropic/claude-sonnet-4-20250514', // any of 40+ providers + tools: { taskCreate, taskUpdate, sendMessage }, +}); + +// Usage +const result = await agent.generate('Create tasks for the frontend sprint'); +const stream = await agent.stream('Review the PR'); +``` + +### Multi-Agent: Supervisor Pattern (recommended as of Feb 2026) + +```typescript +const researcher = new Agent({ + id: 'researcher', + description: 'Researches technical topics', + model: 'anthropic/claude-sonnet-4-20250514', + tools: { webSearch, readFile }, +}); + 
+const developer = new Agent({ + id: 'developer', + description: 'Implements code changes', + model: 'anthropic/claude-sonnet-4-20250514', + tools: { editFile, runTests, bash }, +}); + +const supervisor = new Agent({ + id: 'supervisor', + name: 'Team Lead', + instructions: 'Coordinate researcher and developer...', + model: 'anthropic/claude-sonnet-4-20250514', + agents: { researcher, developer }, // auto-converted to tools + memory: new Memory(), +}); + +const stream = await supervisor.stream('Fix the authentication bug', { + maxSteps: 20, +}); +``` + +### MCP Integration + +```typescript +import { MCPClient } from '@mastra/mcp'; + +const mcp = new MCPClient({ + servers: { + 'agent-teams': { + command: 'node', + args: ['/path/to/mcp-server/dist/index.js'], + }, + }, +}); + +const agent = new Agent({ + id: 'worker', + model: 'anthropic/claude-sonnet-4-20250514', + instructions: '...', +}); + +// Dynamic tool injection +const response = await agent.stream('Update task #abc to in_progress', { + toolsets: await mcp.listToolsets(), +}); +``` + +--- + +## Our Codebase Architecture + +### Process Management Layer (Claude-specific) + +The core of our backend is `TeamProvisioningService` — an 8000+ line service that manages Claude CLI processes. + +**Key file**: `src/main/services/team/TeamProvisioningService.ts` + +Core flow: +1. **Resolve Claude binary** via `ClaudeBinaryResolver` +2. **Build provisioning prompt** (~100 lines of structured instructions) via `buildProvisioningPrompt()` +3. **Spawn CLI process** with stream-json protocol: + ``` + spawnCli(claudePath, [ + '--input-format', 'stream-json', + '--output-format', 'stream-json', + '--verbose', + '--mcp-config', mcpConfigPath, + '--disallowedTools', 'TeamDelete,TodoWrite', + '--dangerously-skip-permissions', + ]) + ``` +4. **Parse stdout** as NDJSON (newline-delimited JSON) — types: `user`, `assistant`, `control_request`, `result`, `system` +5. 
**Send input via stdin** using stream-json protocol: `{"type":"user","message":{"role":"user","content":[...]}}\n` +6. **Monitor filesystem** for team config, tasks, inboxes written by CLI +7. **Relay messages** between lead and teammates via inbox files + +**Key file**: `src/main/utils/childProcess.ts` — `spawnCli()` and `execCli()` wrappers with Windows shell fallback and EINVAL handling. + +### Prompt/Instruction System + +The prompt system is deeply intertwined with Claude Code's native capabilities: + +**`buildProvisioningPrompt()`** (line ~860) constructs a multi-section prompt: +- Team identity (name, project, lead) +- Step 1: Call **BUILT-IN TeamCreate tool** (Claude Code native) +- Step 2: Spawn teammates via **Agent tool** (Claude Code native) with `team_name` parameter +- Step 3: Create tasks via **MCP board tools** +- Persistent lead context: communication protocol, board MCP operations, agent block policy + +**`buildMemberSpawnPrompt()`** (line ~444) constructs per-teammate instructions: +- Role and workflow injection +- `member_briefing` MCP bootstrap call +- Task lifecycle protocol (comment -> start -> work -> comment -> complete) +- Detailed notification/escalation rules + +**Critical Claude-specific constructs in prompts:** +- `Agent` tool with `team_name` parameter — Claude Code's native teammate spawning +- `TeamCreate` built-in tool — Claude Code's team lifecycle management +- `SendMessage` built-in tool — Claude Code's inter-agent messaging +- `--disallowedTools TeamDelete,TodoWrite` — Claude Code CLI flags +- `--permission-mode bypassPermissions` — Claude Code permission system +- stream-json protocol for bidirectional communication +- Post-compact context reinjection for context window management + +### MCP Server + +**Key files**: `mcp-server/src/` (16 TypeScript files) + +Our MCP server (FastMCP-based) exposes domain tools to agents: + +| Tool Domain | Tools | File | +|------------|-------|------| +| Tasks | task_create, task_get, 
task_list, task_start, task_complete, etc. | `taskTools.ts` | +| Kanban | kanban_get, kanban_set_column, kanban_clear | `kanbanTools.ts` | +| Review | review_request, review_approve, review_request_changes | `reviewTools.ts` | +| Messages | send messages between agents | `messageTools.ts` | +| Process | process management | `processTools.ts` | +| Cross-team | cross_team_send, cross_team_list_targets | `crossTeamTools.ts` | +| Runtime | runtime state queries | `runtimeTools.ts` | + +All tools delegate to `agent-teams-controller` — a workspace package that manages team state (config.json, tasks/, inboxes/). + +### Message Parsing Pipeline + +**Key files**: +- `src/main/types/jsonl.ts` — Raw JSONL format types (Claude Code session files) +- `src/main/types/messages.ts` — ParsedMessage with type guards +- `src/main/services/analysis/ChunkBuilder.ts` — Builds timeline chunks from parsed messages + +The JSONL parsing is tightly coupled to Claude Code's output format: +- Entry types: `user`, `assistant`, `system`, `summary`, `file-history-snapshot`, `queue-operation` +- Content blocks: `text`, `thinking`, `tool_use`, `tool_result`, `image` +- Usage metadata: `input_tokens`, `output_tokens`, `cache_read_input_tokens` +- Claude-specific fields: `model`, `stop_reason`, `cwd`, `gitBranch`, `agentId`, `isSidechain` + +### IPC Layer + +**Key file**: `src/main/ipc/teams.ts` — 60+ IPC channels for team operations + +The renderer communicates with main process via Electron IPC. The channels include team CRUD, task management, message sending, provisioning control, tool approval, and process lifecycle. + +--- + +## Integration Points Analysis + +### 1. 
Process Spawning — Claude CLI vs Mastra Agent Runtime + +| Aspect | Current (Claude CLI) | Mastra Integration | +|--------|---------------------|-------------------| +| **Runtime** | External process (`claude` binary) | In-process Node.js (`Agent.stream()`) | +| **Protocol** | stream-json over stdin/stdout | Programmatic TypeScript API | +| **Agent spawning** | `Agent` tool with `team_name` param | `new Agent({ agents: {...} })` supervisor pattern | +| **Tool execution** | Claude Code built-in + MCP | Mastra tools + `@mastra/mcp` MCPClient | +| **File editing** | Claude Code's built-in file tools | Must provide custom tools (Read, Write, Bash) | +| **Terminal** | Claude Code's built-in terminal | Must provide custom Bash tool | +| **Git** | Claude Code's built-in git support | Must provide custom git tools | +| **Context window** | Claude Code manages (200K) | Mastra manages via provider settings | + +**Claude-specificity score: 9/10** — This is the most tightly coupled area. + +### 2. Prompt/Instruction System + +| Aspect | Current | Mastra Equivalent | +|--------|---------|-------------------| +| System prompt | Injected via stream-json first message | `Agent.instructions` property | +| Dynamic instructions | Post-compact reinjection via stdin | `instructions` as function returning dynamic text | +| Built-in tools refs | `TeamCreate`, `Agent`, `SendMessage` in prompt | Must be replaced with Mastra tool calls | +| MCP tool refs | `task_create { teamName: "..." }` | Same MCP tools via `@mastra/mcp` MCPClient | + +**Claude-specificity score: 7/10** — Prompts reference Claude Code native tools extensively. + +### 3. 
MCP Server + +| Aspect | Current | Mastra Integration | +|--------|---------|-------------------| +| Server framework | FastMCP (stdio transport) | Same — OR convert to Mastra tools directly | +| Tool definitions | `server.addTool({ name, parameters, execute })` | `createTool({ id, inputSchema, execute })` | +| Transport | stdio (spawned by Claude CLI) | Could use `@mastra/mcp` MCPClient or convert to native Mastra tools | +| Controller | `agent-teams-controller` package | **Unchanged** — pure JS, no Claude dependency | + +**Claude-specificity score: 2/10** — MCP is provider-agnostic. Our `agent-teams-controller` is pure business logic. + +### 4. Message Parsing / JSONL Pipeline + +| Aspect | Current | Mastra Integration | +|--------|---------|-------------------| +| Session storage | `~/.claude/projects/{path}/*.jsonl` | Mastra has its own storage/memory system | +| Format | Claude Code JSONL (specific schema) | Mastra streaming chunks (text-delta, tool-call, etc.) | +| Type guards | `isParsedRealUserMessage`, etc. | New type guards for Mastra output format | +| Chunk building | `ChunkBuilder` from JSONL messages | New adapter from Mastra stream events | +| Subagent detection | `SubagentResolver` from tool_use content | Mastra supervisor tracks sub-agent calls natively | + +**Claude-specificity score: 8/10** — The entire analysis pipeline assumes Claude Code JSONL format. + +### 5. 
Team Lifecycle (config, inboxes, tasks) + +| Aspect | Current | Mastra Integration | +|--------|---------|-------------------| +| Team config | `~/.claude/teams/{name}/config.json` (Claude CLI creates) | Must be managed by our app directly | +| Task storage | `~/.claude/tasks/{name}/` (agent-teams-controller) | **Unchanged** | +| Inbox messaging | `~/.claude/teams/{name}/inboxes/{member}.json` | Replace with Mastra memory or direct tool calls | +| Cross-team comms | Inbox files with relay | Mastra agents can call each other directly | + +**Claude-specificity score: 6/10**: File-based coordination is a Claude CLI convention, but our controller is independent. + +--- + +## Concrete Integration Approaches + +### Approach A: Mastra as Agent Runtime (Replace Claude CLI) + +**Confidence: 5/10 | Reliability: 4/10** + +Replace `spawnCli()` with in-process Mastra agents. The lead becomes a `supervisor` Agent, teammates become sub-agents. + +```typescript +// src/main/services/team/MastraTeamRuntime.ts (new file) +import { exec } from 'node:child_process'; +import * as fs from 'node:fs'; +import { promisify } from 'node:util'; +import { z } from 'zod'; +import { Agent } from '@mastra/core/agent'; +import { createTool } from '@mastra/core/tools'; +import { Memory } from '@mastra/memory'; + +// Promise-based exec used by the bash tool below +const execAsync = promisify(exec); + +// Convert our MCP tools to native Mastra tools +// (getController is assumed to be provided by agent-teams-controller) +const taskCreateTool = createTool({ + id: 'task_create', + description: 'Create a team task', + inputSchema: z.object({ + teamName: z.string(), + subject: z.string(), + description: z.string().optional(), + owner: z.string().optional(), + }), + execute: async (input) => { + const controller = getController(input.teamName); + return controller.tasks.createTask(input); + }, +}); + +// File editing tool (replaces Claude Code built-in) +const editFileTool = createTool({ + id: 'edit_file', + description: 'Edit a file on disk', + inputSchema: z.object({ + path: z.string(), + oldText: z.string(), + newText: z.string(), + }), + execute: async (input) => { + // Must implement file editing logic ourselves + const content = await fs.promises.readFile(input.path, 'utf8'); + const updated = 
content.replace(input.oldText, () => input.newText); // function replacer keeps `$` in newText literal + await fs.promises.writeFile(input.path, updated); + return { success: true }; + }, +}); + +// Bash tool (replaces Claude Code built-in) +const bashTool = createTool({ + id: 'bash', + description: 'Execute a bash command', + inputSchema: z.object({ command: z.string() }), + execute: async (input) => { + const { stdout, stderr } = await execAsync(input.command); + return { stdout, stderr }; + }, +}); + +// Shared type for the board tools passed into both factories +type TeamTools = Record<string, ReturnType<typeof createTool>>; + +// Create teammate agents +function createTeammateAgent(member: TeamMember, teamTools: TeamTools) { + return new Agent({ + id: `teammate-${member.name}`, + name: member.name, + description: member.role || 'Team member', + instructions: buildMemberInstructions(member), // adapted from buildMemberSpawnPrompt + model: 'anthropic/claude-sonnet-4-20250514', + tools: { + ...teamTools, + editFileTool, + bashTool, + readFileTool, // assumed to be defined like editFileTool + // ... other dev tools + }, + }); +} + +// Create supervisor (lead) agent +function createLeadAgent( + request: TeamCreateRequest, + teammates: Record<string, Agent>, + teamTools: TeamTools, // passed in explicitly so the lead can reach the board tools +) { + return new Agent({ + id: `lead-${request.teamName}`, + name: 'team-lead', + instructions: buildLeadInstructions(request), // adapted from buildPersistentLeadContext + model: request.model || 'anthropic/claude-sonnet-4-20250514', + agents: teammates, // Mastra auto-converts to tools + tools: { + ...teamTools, // task_create, kanban_get, etc. 
+ editFileTool, + bashTool, + readFileTool, + }, + memory: new Memory(), + }); +} +``` + +**What breaks:** +- Claude Code's file editing (diff view, permission system), which must be reimplemented +- Claude Code's terminal integration +- Claude Code's git support +- Claude Code's extended thinking +- Claude Code's session persistence/resume +- The entire JSONL parsing pipeline +- Tool approval flow (our `control_request` handling) +- Post-compact context reinjection + +### Approach B: Mastra as Middleware / Orchestration Layer (Keep Claude CLI) + +**Confidence: 7/10 | Reliability: 6/10** + +Use Mastra as an orchestration layer that manages routing and coordination, while still spawning Claude CLI processes for actual work. + +```typescript +// src/main/services/team/MastraOrchestrator.ts (new file) +import { z } from 'zod'; +import { Agent } from '@mastra/core/agent'; +import { createTool } from '@mastra/core/tools'; +import { execCli } from '../../utils/childProcess'; + +// claudePath is assumed to be resolved beforehand via ClaudeBinaryResolver + +// Tool that spawns a Claude CLI process for actual work +const claudeCliTool = createTool({ + id: 'claude_cli_execute', + description: 'Execute a task using Claude Code CLI', + inputSchema: z.object({ + prompt: z.string(), + cwd: z.string(), + model: z.string().optional(), + }), + execute: async (input) => { + // Spawn Claude CLI with -p (one-shot) + const result = await execCli(claudePath, [ + '-p', input.prompt, + '--output-format', 'text', + ...(input.model ? ['--model', input.model] : []), + ], { cwd: input.cwd }); + return { output: result.stdout }; + }, +}); + +// Mastra agent for high-level orchestration +const orchestrator = new Agent({ + id: 'orchestrator', + name: 'Task Orchestrator', + instructions: `You coordinate a development team. + Use claude_cli_execute for actual coding tasks. 
+ Use task tools for board management.`, + model: 'anthropic/claude-sonnet-4-20250514', + tools: { + claudeCliTool, + ...teamBoardTools, + }, +}); + +// Orchestrator decides what to do, Claude CLI does the coding +const stream = await orchestrator.stream(userMessage); +``` + +**What this preserves:** +- Claude CLI's file editing, terminal, git, etc. +- Our existing JSONL pipeline (for CLI-executed tasks) +- MCP server tools (used by CLI processes) + +**What this adds:** +- Model-agnostic orchestration layer +- Ability to use OpenAI/Gemini/etc. for routing decisions +- Mastra's workflow engine for deterministic task flows + +**What breaks / gets complex:** +- Two runtime models (Mastra in-process + Claude CLI processes) +- Doubled complexity for message flow +- Unclear who "owns" the conversation state + +### Approach C: Mastra MCP Bridge (Minimal Integration) + +**Confidence: 8/10 | Reliability: 7/10** + +Use `@mastra/mcp` MCPServer to expose our existing tools to any Mastra-compatible client, and `@mastra/mcp` MCPClient to consume external MCP tools. 
+ +```typescript +// mcp-server/src/mastra-bridge.ts (new file) +import { MCPClient, MCPServer } from '@mastra/mcp'; +import { Agent } from '@mastra/core/agent'; +import { registerTools } from './tools'; + +// Expose our existing tools as an MCP server that Mastra agents can consume +const mcpServer = new MCPServer({ + name: 'agent-teams-mcp', + version: '1.0.0', + tools: { + // Convert FastMCP tools to Mastra tools, or expose via MCP protocol + // (convertFastMcpToMastraTools is a helper we would still have to write) + ...convertFastMcpToMastraTools(registerTools), + }, +}); + +// Any Mastra agent can now use our board tools +const externalAgent = new Agent({ + id: 'external-worker', + model: 'openai/gpt-4o', + instructions: 'You manage tasks on the team board.', + tools: await new MCPClient({ + servers: { + 'agent-teams': { + command: 'node', + args: ['path/to/mcp-server/dist/index.js'], + }, + }, + }).listTools(), +}); +``` + +**What this preserves:** +- Everything: this is additive, not a replacement +- Claude CLI remains the primary runtime + +**What this adds:** +- Mastra agents can interact with our board +- Path to multi-provider support +- Future extensibility + +--- + +## Architecture Diagram + +``` +┌──────────────────────────────────────────────────────────────────┐ +│ Electron App (Renderer) │ +│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌───────────────┐ │ +│ │ Kanban │ │ Timeline │ │ Inbox │ │ Code Editor │ │ +│ │ Board │ │ View │ │ Chat │ │ (Diff View) │ │ +│ └────┬─────┘ └────┬─────┘ └────┬─────┘ └───────┬───────┘ │ +│ └──────────────┴─────────────┴────────────────┘ │ +│ │ IPC │ +└──────────────────────────────┼───────────────────────────────────┘ + │ +┌──────────────────────────────┼───────────────────────────────────┐ +│ Electron App (Main) │ +│ │ │ +│ ┌───────────────────────────┴──────────────────────────────┐ │ +│ │ IPC Handler Layer (teams.ts) │ │ +│ └───────────────────────────┬──────────────────────────────┘ │ +│ │ │ +│ ┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┼ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐ │ +│ MASTRA MIDDLEWARE LAYER (Approach B / Future) │ +│ 
│ ┌─────────────────┐ │ ┌─────────────────────┐ │ │ +│ │ Mastra Agent │ │ │ Mastra Workflow │ │ +│ │ │ (Orchestrator) │ │ │ (Task Routing) │ │ │ +│ │ model-agnostic │ │ │ DAG execution │ │ +│ │ └────────┬─────────┘ │ └──────────┬──────────┘ │ │ +│ │ │ │ │ +│ └ ─ ─ ─ ─ ─┼─ ─ ─ ─ ─ ─ ─┼─ ─ ─ ─ ─ ─ ─ ┼─ ─ ─ ─ ─ ─ ─ ┘ │ +│ │ │ │ │ +│ ┌───────────┴──────────────┴───────────────┴──────────────┐ │ +│ │ TeamProvisioningService (existing) │ │ +│ │ ┌─────────────┐ ┌──────────────┐ ┌───────────────┐ │ │ +│ │ │ spawnCli() │ │ stream-json │ │ FS monitor │ │ │ +│ │ │ (Claude CLI)│ │ parser │ │ (tasks/inbox) │ │ │ +│ │ └──────┬──────┘ └──────┬───────┘ └───────┬───────┘ │ │ +│ └─────────┼────────────────┼───────────────────┼──────────┘ │ +│ │ │ │ │ +│ ┌─────────┴────────────────┴───────────────────┴──────────┐ │ +│ │ agent-teams-controller (pure JS) │ │ +│ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌────────┐ │ │ +│ │ │ Tasks │ │ Kanban │ │ Inbox │ │ Config │ │ │ +│ │ │ CRUD │ │ State │ │ Messages │ │ Reader │ │ │ +│ │ └──────────┘ └──────────┘ └──────────┘ └────────┘ │ │ +│ └──────────────────────────────────────────────────────────┘ │ +│ │ │ +│ ┌──────────────────────────┼───────────────────────────────┐ │ +│ │ MCP Server (agent-teams-mcp) │ │ +│ │ ┌────────┐ ┌────────┐ ┌────────┐ ┌────────────────┐ │ │ +│ │ │ Tasks │ │ Kanban │ │ Review │ │ Messages │ │ │ +│ │ │ Tools │ │ Tools │ │ Tools │ │ & Cross-team │ │ │ +│ │ └────────┘ └────────┘ └────────┘ └────────────────┘ │ │ +│ └──────────────────────────────────────────────────────────┘ │ +│ │ +└───────────────────────────────────────────────────────────────────┘ + │ + ┌──────────┴──────────┐ + │ Claude CLI Process │ ← or Mastra agent.stream() + │ (stream-json) │ + │ ┌────────────────┐ │ + │ │ File Edit │ │ + │ │ Terminal/Bash │ │ + │ │ Git │ │ + │ │ Agent (spawn) │ │ + │ │ SendMessage │ │ + │ │ MCP tools │ │ + │ └────────────────┘ │ + └─────────────────────┘ +``` + +--- + +## What Stays the Same + +These modules are **not 
Claude-specific** and would survive any integration: + +| Module | Path | Why | +|--------|------|-----| +| `agent-teams-controller` | `agent-teams-controller/` | Pure JS business logic for tasks, kanban, review, inbox. Zero Claude dependency. | +| MCP Server tools | `mcp-server/src/tools/*.ts` | Standard MCP protocol. Works with any MCP-compatible agent. | +| UI components | `src/renderer/` | React/Zustand/Tailwind. Communicates via IPC, agnostic to backend. | +| IPC layer interface | `src/preload/constants/ipcChannels.ts` | Channel names are just strings. | +| Shared types | `src/shared/types/team.ts` | TeamTask, InboxMessage, etc. — domain types. | +| Team data services | `TeamDataService`, `TeamConfigReader`, `TeamTaskReader` | File-based, read team state from disk. | +| TeamMcpConfigBuilder | `src/main/services/team/TeamMcpConfigBuilder.ts` | Builds MCP config files. Could serve Mastra MCPClient too. | +| Notification system | `NotificationManager` | UI notifications, not Claude-specific. | + +--- + +## What Must Change + +### Tier 1: Core Runtime (Required for any Mastra integration) + +| File/Module | Lines | Change Required | +|------------|-------|-----------------| +| `TeamProvisioningService.ts` | ~8000 | Major refactor: abstract `AgentRuntime` interface. spawnCli() becomes one implementation, Mastra becomes another. | +| `childProcess.ts` | 220 | Keep as-is for Claude CLI path. New `MastraRuntime.ts` for in-process agents. | +| `ClaudeBinaryResolver.ts` | ~200 | Keep for Claude CLI path. Not needed for Mastra path. | + +### Tier 2: Message Parsing (Required for Approach A) + +| File/Module | Lines | Change Required | +|------------|-------|-----------------| +| `src/main/types/jsonl.ts` | 200+ | New parallel types for Mastra streaming events. | +| `src/main/types/messages.ts` | 377 | Extend ParsedMessage or create MastraMessage adapter. | +| `ChunkBuilder.ts` | ~600 | Abstract chunk building from JSONL parsing. Mastra adapter produces same chunk types. 
| +| `SubagentResolver.ts` | ~400 | Mastra supervisor natively tracks sub-agents. Simpler resolver. | +| `SemanticStepExtractor.ts` | ~300 | Mastra tool calls have different structure. Adapter needed. | + +### Tier 3: Prompt System (Required for all approaches) + +| File/Module | Lines | Change Required | +|------------|-------|-----------------| +| `buildProvisioningPrompt()` | ~100 | Remove Claude-specific steps (TeamCreate, Agent tool). Replace with Mastra tool references. | +| `buildMemberSpawnPrompt()` | ~80 | Convert to Mastra Agent `instructions`. Remove Agent tool spawn references. | +| `buildPersistentLeadContext()` | ~100 | Remove Agent tool references. Keep MCP tool instructions (they still apply). | +| `buildTeamCtlOpsInstructions()` | ~100 | Keep — these reference MCP tools which are provider-agnostic. | +| `actionModeInstructions.ts` | 50 | Keep — action modes are prompt-level, not provider-specific. | + +### Tier 4: Tool Approval (Required for Approach A) + +| File/Module | Change Required | +|------------|-----------------| +| Tool approval flow | Mastra has its own `requireApproval: true` on tools + `approveToolCall()`/`declineToolCall()`. Must adapt our UI's approval dialog to use Mastra's API instead of `control_request` stream-json messages. 
| + +--- + +## Effort Estimate + +### Approach A: Full Mastra Runtime (Replace Claude CLI) + +| Phase | Effort | Risk | +|-------|--------|------| +| Abstract AgentRuntime interface | 2 weeks | Medium — large refactor of 8K line service | +| Implement Mastra runtime adapter | 2 weeks | High — need to reimplement file/terminal/git tools | +| Adapt message parsing pipeline | 1 week | Medium — new adapter for Mastra events | +| Adapt prompt system | 1 week | Low — mostly string template changes | +| Tool approval integration | 1 week | Medium — different approval API | +| Testing + stabilization | 2 weeks | High — regression risk | +| **Total** | **9-10 weeks** | **High** | + +### Approach B: Mastra Middleware (Keep Claude CLI) + +| Phase | Effort | Risk | +|-------|--------|------| +| Mastra orchestrator service | 1 week | Medium | +| Claude CLI adapter tool | 1 week | Low | +| Dual runtime state management | 2 weeks | High — complexity | +| Message flow unification | 1 week | Medium | +| Testing | 1 week | Medium | +| **Total** | **6-7 weeks** | **Medium-High** | + +### Approach C: MCP Bridge (Minimal) + +| Phase | Effort | Risk | +|-------|--------|------| +| @mastra/mcp MCPServer wrapper | 3 days | Low | +| Example Mastra agent consuming our tools | 2 days | Low | +| Documentation + examples | 2 days | Low | +| **Total** | **1-2 weeks** | **Low** | + +--- + +## Risks and Blockers + +### Critical Blockers + +1. **Claude Code's built-in tools are not replicable via Mastra.** + Claude Code has deep integration with the filesystem, terminal, git, and its own Agent tool for spawning teammates. Mastra provides no equivalent — you would need to build `editFile`, `bash`, `readFile`, `glob`, `grep`, `git` tools from scratch. These tools must handle permissions, sandboxing, diff generation, and conflict resolution. This is not just wrapping `fs.writeFile()` — it's thousands of lines of battle-tested code. + +2. 
**stream-json protocol is Claude Code proprietary.** + Our entire real-time UI (live typing, tool progress, subagent tracking) depends on the stream-json wire format. Mastra's streaming format is different (AI SDK compatible). The translation layer is non-trivial. + +3. **Team/teammate lifecycle is Claude Code's native feature.** + `TeamCreate`, `Agent` with `team_name`, and `SendMessage` are built into Claude Code CLI. Mastra's supervisor pattern is conceptually similar but mechanically different (in-process sub-agents vs. separate CLI processes). + +4. **Context window management.** + Claude Code manages its own context window, compaction, and session persistence. Mastra delegates this to the model provider's API. Our post-compact reinjection system would need a complete redesign. + +### High Risks + +5. **Performance: in-process vs. out-of-process.** + Claude CLI runs as a separate process with its own Node.js runtime. Mastra agents run in-process within Electron's main process. Long-running agent tasks could block the Electron event loop, so we would need worker threads or separate Node processes. + +6. **Authentication divergence.** + Claude Code CLI handles its own auth (OAuth, API key). Mastra uses provider API keys directly. This means different auth models for different users. + +7. **Losing Claude Code ecosystem.** + Claude Code has CLAUDE.md, settings.json, .mcp.json, hooks, and a growing feature set. Switching to Mastra means losing access to this ecosystem for Claude users. + +### Medium Risks + +8. **Mastra version churn.** + Mastra is a young framework (currently ~1.10.x) and is still evolving rapidly. The AgentNetwork API was deprecated in favor of supervisor agents within just a few months. API stability is not guaranteed. + +9. **Dual dependency burden.** + Adding `@mastra/core` (~150KB+ with deps) to an Electron app increases bundle size and the potential for version conflicts. 
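
The translation layer named in blocker 2 can be sketched roughly as follows. This is a minimal illustration, not the real wire formats: the `UiEvent` type and the `MastraLikeChunk` field names (`text`, `toolName`) are assumptions made for the example, and the stream-json shape is simplified to the `assistant` / `result` message types and `text` / `tool_use` content blocks listed earlier in this document.

```typescript
// Hypothetical unified event shape for the UI; not part of Claude Code or Mastra.
type UiEvent =
  | { kind: "text"; text: string }
  | { kind: "tool"; name: string }
  | { kind: "done" }
  | { kind: "ignored"; reason: string };

// Assumed, simplified chunk shape for a Mastra-style stream (field names are guesses).
type MastraLikeChunk =
  | { type: "text-delta"; text: string }
  | { type: "tool-call"; toolName: string }
  | { type: "finish" };

// Split buffered stdout into complete NDJSON lines and the unfinished tail.
function splitNdjson(buffer: string): { lines: string[]; rest: string } {
  const parts = buffer.split("\n");
  const rest = parts.pop() ?? "";
  return { lines: parts.filter((line) => line.trim() !== ""), rest };
}

// Translate one stream-json line into UI events.
function fromStreamJson(line: string): UiEvent[] {
  let msg: any;
  try {
    msg = JSON.parse(line);
  } catch {
    return [{ kind: "ignored", reason: "invalid JSON" }];
  }
  if (msg === null || typeof msg !== "object") {
    return [{ kind: "ignored", reason: "not an object" }];
  }
  if (msg.type === "result") return [{ kind: "done" }];
  if (msg.type !== "assistant") {
    return [{ kind: "ignored", reason: `type ${String(msg.type)}` }];
  }
  const blocks: any[] = Array.isArray(msg.message?.content) ? msg.message.content : [];
  const events: UiEvent[] = [];
  for (const block of blocks) {
    if (block?.type === "text") events.push({ kind: "text", text: String(block.text) });
    else if (block?.type === "tool_use") events.push({ kind: "tool", name: String(block.name) });
  }
  return events;
}

// Translate one Mastra-style chunk into the same UI events.
function fromMastraChunk(chunk: MastraLikeChunk): UiEvent[] {
  switch (chunk.type) {
    case "text-delta":
      return [{ kind: "text", text: chunk.text }];
    case "tool-call":
      return [{ kind: "tool", name: chunk.toolName }];
    case "finish":
      return [{ kind: "done" }];
  }
}
```

Both adapters emit the same `UiEvent` values, so a timeline component written against that type would not need to know which runtime produced the stream. The real effort lies in the fields this sketch ignores (usage metadata, thinking blocks, subagent markers).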
+ +--- + +## Recommendations + +### Short Term (Now): Approach C — MCP Bridge + +**Confidence: 9/10 | Reliability: 8/10** + +- Wrap our MCP server with `@mastra/mcp` MCPServer +- Publish as a standalone MCP endpoint that any Mastra agent can consume +- Zero risk to existing functionality +- Opens the door for external Mastra agents to manage our board +- 1-2 weeks effort + +### Medium Term (Q2-Q3 2026): Abstract AgentRuntime Interface + +**Confidence: 7/10 | Reliability: 6/10** + +- Extract `AgentRuntime` interface from `TeamProvisioningService` +- `ClaudeCliRuntime` implements it (current behavior) +- Prepare the seam for `MastraRuntime` without building it yet +- De-risk the eventual full integration +- 2-3 weeks effort + +### Long Term (Q4 2026+): Approach B — Mastra Middleware + +**Confidence: 6/10 | Reliability: 5/10** + +- Add Mastra as orchestration layer for routing and multi-provider support +- Keep Claude CLI as the "worker" runtime for actual coding +- Use Mastra for decision-making, task routing, and provider switching +- Full multi-model support without losing Claude Code's tooling +- 6-7 weeks effort + +### NOT Recommended: Approach A (Full Replacement) + +**Confidence: 3/10 | Reliability: 2/10** + +Replacing Claude CLI entirely with Mastra-managed agents would lose our core differentiator (deep Claude Code integration: file editing, terminal, git, session persistence, extended thinking, etc.). The effort (~10 weeks) and risk are not justified unless Claude Code CLI is deprecated, which shows no signs of happening. 
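
The `AgentRuntime` seam from the medium-term recommendation could look something like the sketch below. Every name here (`AgentRuntime`, `RuntimeEvent`, `EchoRuntime`, `runOnce`) is hypothetical and exists only to illustrate the shape of the abstraction; the real interface would have to be extracted from what `TeamProvisioningService` actually needs.

```typescript
// Hypothetical event type shared by all runtimes.
interface RuntimeEvent {
  type: "text" | "done";
  text?: string;
}

// Hypothetical runtime contract: a Claude CLI runtime would wrap spawnCli(),
// a Mastra runtime would wrap agent.stream(). Neither is implemented here.
interface AgentRuntime {
  readonly id: string;
  start(): Promise<void>;
  send(prompt: string): Promise<void>;
  onEvent(listener: (event: RuntimeEvent) => void): () => void;
  stop(): Promise<void>;
}

// In-memory stand-in used only to exercise the interface.
class EchoRuntime implements AgentRuntime {
  readonly id = "echo";
  private listeners = new Set<(event: RuntimeEvent) => void>();
  private running = false;

  async start(): Promise<void> {
    this.running = true;
  }

  async send(prompt: string): Promise<void> {
    if (!this.running) throw new Error("runtime not started");
    this.emit({ type: "text", text: `echo: ${prompt}` });
    this.emit({ type: "done" });
  }

  onEvent(listener: (event: RuntimeEvent) => void): () => void {
    this.listeners.add(listener);
    // The returned function unsubscribes the listener.
    return () => {
      this.listeners.delete(listener);
    };
  }

  async stop(): Promise<void> {
    this.running = false;
    this.listeners.clear();
  }

  private emit(event: RuntimeEvent): void {
    for (const listener of this.listeners) listener(event);
  }
}

// Caller code depends on the interface only, never on a concrete runtime.
async function runOnce(runtime: AgentRuntime, prompt: string): Promise<string> {
  const chunks: string[] = [];
  const unsubscribe = runtime.onEvent((event) => {
    if (event.type === "text" && event.text) chunks.push(event.text);
  });
  await runtime.start();
  try {
    await runtime.send(prompt);
  } finally {
    unsubscribe();
    await runtime.stop();
  }
  return chunks.join("");
}
```

With a seam like this, IPC handlers would call something like `runOnce(runtime, prompt)` without caring whether `runtime` wraps a CLI process or an in-process agent, which is the de-risking the medium-term step aims for.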
+ +--- + +## Sources + +- [Mastra GitHub Repository](https://github.com/mastra-ai/mastra) +- [Mastra Official Documentation](https://mastra.ai/docs) +- [Mastra Agent Overview](https://mastra.ai/docs/agents/overview) +- [Mastra MCP Overview](https://mastra.ai/docs/tools-mcp/mcp-overview) +- [Mastra Agent Network Evolution](https://mastra.ai/blog/agent-network) +- [Mastra vNext Agent Network](https://mastra.ai/blog/vnext-agent-network) +- [Mastra Supervisor Pattern (Feb 2026)](https://mastra.ai/blog/announcing-mastra-improved-agent-orchestration-ai-sdk-v5-support) +- [Mastra Agent Streaming Reference](https://mastra.ai/reference/agents/stream) +- [@mastra/core npm](https://www.npmjs.com/package/@mastra/core) +- [@mastra/mcp npm](https://www.npmjs.com/package/@mastra/mcp) +- [Mastra $13M Seed Round](https://technews180.com/funding-news/mastra-raises-13m-seed-for-typescript-ai-framework/) +- [Mastra on Y Combinator](https://www.ycombinator.com/companies/mastra) diff --git a/docs/research/mastra-vs-direct-mcp.md b/docs/research/mastra-vs-direct-mcp.md new file mode 100644 index 00000000..4a8bc52e --- /dev/null +++ b/docs/research/mastra-vs-direct-mcp.md @@ -0,0 +1,345 @@ +# @mastra/mcp vs Direct MCP: do we need Mastra as a universal integration layer? + +**Date:** 2026-03-24 +**Context:** User question: "Maybe we should use @mastra/mcp since it has many agents built-in?" +**Related documents:** +- `docs/research/mastra-integration-analysis.md`: full technical analysis of the Mastra integration +- `docs/research/best-integration-approach.md`: comparison of all approaches to multi-provider support +- `docs/research/ai-agent-protocols-and-routing.md`: overview of protocols and frameworks + +--- + +## Short Answer + +**Mastra does NOT give us "many agents built-in" in the sense the phrase suggests.** Mastra is an SDK for building YOUR OWN agents through API calls to LLM providers. It cannot launch or manage CLI agents (Claude Code, Codex, Gemini CLI, etc.) as processes. 
For our product, an Electron app that manages CLI processes through a kanban board, direct MCP remains the right choice. + +**Final recommendation: Direct MCP (Option A)** +- Reliability: 9/10 +- Confidence: 9/10 + +--- + +## 1. What @mastra/mcp actually is + +### What Mastra is NOT + +Mastra is **NOT** a library that plugs in ready-made agents (Claude Code, Codex, Gemini CLI). It is an SDK for building your own agents through API calls. When Mastra talks about "40+ providers", it means 40+ LLM providers (OpenAI, Anthropic, Google, etc.) that you can send API requests to, not CLI agents that run as processes. + +### What Mastra IS + +| Component | Description | +|-----------|----------| +| `@mastra/core` | Agent runtime: agents are created via `new Agent({model, instructions, tools})` | +| `@mastra/mcp` | MCPClient (connects to MCP servers) + MCPServer (exposes tools) | +| Agent | A TS object that calls the LLM API and tools in a loop | +| Supervisor | Multi-agent pattern: one agent coordinates the others | +| Memory | Observational Memory for long-term context | +| Workflows | DAG-based workflow engine | +| ToolSearchProcessor | Dynamic tool loading (saves tokens) | + +### Key point + +A Mastra agent is `agent.generate("prompt")` or `agent.stream("prompt")`. That is an **HTTP call to an LLM API** (OpenAI, Anthropic, etc.). It is **NOT** the launch of a CLI process such as `claude --input-format stream-json`. + +Our product is a CLI process manager with a kanban board. Mastra operates at a different level of abstraction. + +--- + +## 2. 
MCP support in CLI agents (March 2026) + +**Key question: if the agents already support MCP natively, why would we need Mastra as an intermediate layer?** + +| CLI agent | MCP support | How it is configured | Source | +|-----------|---------------|-------------------|----------| +| **Claude Code** | Native | `--mcp-config path.json`, `.mcp.json`, `~/.claude.json` | [code.claude.com/docs/en/mcp](https://code.claude.com/docs/en/mcp) | +| **OpenAI Codex** | Native | `~/.codex/config.toml`, `codex mcp add` | [developers.openai.com/codex/mcp](https://developers.openai.com/codex/mcp) | +| **Gemini CLI** | Native | `~/.gemini/settings.json` | [geminicli.com/docs/tools/mcp-server](https://geminicli.com/docs/tools/mcp-server/) | +| **Goose** | Native (MCP is the basis of its extensions) | Built-in, Remote/Stdio/Command | [github.com/block/goose](https://github.com/block/goose) | +| **OpenCode** | Native | `opencode.json`, `opencode mcp add` | [opencode.ai/docs/mcp-servers](https://opencode.ai/docs/mcp-servers/) | +| **Kilo Code** | Native | `mcp_settings.json`, `.kilocode/mcp.json` | [kilo.ai/docs/automate/mcp/using-in-kilo-code](https://kilo.ai/docs/automate/mcp/using-in-kilo-code) | +| **Aider** | Via adapters (mcpm-aider) | MCP client packages | [pulsemcp.com/servers/disler-aider](https://www.pulsemcp.com/servers/disler-aider) | + +**Conclusion: 6 of the 7 major CLI agents already support MCP natively.** They do not need Mastra in between; they can connect to our MCP server directly. + +--- + +## 3. What Mastra adds on top of "raw" MCP + +### Real advantages of Mastra (and why we do NOT need them) + +| Mastra feature | What it is | Do we need it?
| Why | +|-------------|---------|---------------|--------| +| **MCPClient**: connects to multiple MCP servers | A single client for N servers | No | Our product PROVIDES an MCP server rather than consuming them | +| **MCPServer**: exports agents/tools | Expose agents as MCP tools | No | We already have a FastMCP server with 30+ tools | +| **ToolSearchProcessor**: dynamic tool search | The agent looks up the tool it needs on demand | No | We have ~30 tools, not hundreds. Context is not a problem | +| **Agent Runtime**: reason-act loop with memory | A full runtime for API agents | No | Our agents are CLI processes (Claude Code, Codex) with their own runtime | +| **Observability**: MCP_TOOL_CALL spans, Studio UI | Tracing of MCP calls | No | We have our own UI with timeline, chunks, context tracking | +| **Serverless adapters**: Express/Hono/Koa | Running MCP in serverless | No | We are an Electron app, not serverless | +| **Multi-registry**: Composio, Smithery | Discovery of MCP servers in registries | No | We provide one specific MCP server | +| **Supervisor pattern**: multi-agent orchestration | One agent manages the others | Partially | Claude Code Agent Teams ALREADY does this natively via `TeamCreate` + `Agent tool` | +| **600+ models** via 40+ providers | Model routing | No | CLI agents decide for themselves which model to use | + +### What Mastra can NOT do + +| Task | Can Mastra do it? | How we solve it | +|--------|-------------------|---------------| +| Launch the `claude` CLI as a process | No | `spawnCli()` + stream-json | +| Manage the `codex` CLI as a subprocess | No | Needs our own ProvisioningService | +| Parse stream-json stdout | No | `handleStreamJsonMessage()`, our code | +| Use Agent Teams built-in tools | No | Claude Code natively | +| Work with the `~/.claude/teams/` file system layout | No | `agent-teams-controller` | +| Show a kanban board | No | Our Electron UI | + +--- + +## 4.
Three approaches: a comparison + +### Option A: Direct MCP (our current/recommended approach) + +**Reliability: 9/10 | Confidence: 9/10** + +``` +┌─────────────────────────────┐ +│ Electron App (UI) │ +│ ┌───────┐ ┌──────────┐ │ +│ │Kanban │ │ Timeline │ │ +│ │Board │ │ Messages │ │ +│ └───┬───┘ └────┬─────┘ │ +│ └──────────┘ │ +│ │ IPC │ +├───────────┼─────────────────┤ +│ Main Process │ +│ ┌────────────────────┐ │ +│ │ TeamProvisioning │ │ +│ │ Service │ │ +│ └─────────┬──────────┘ │ +│ │ │ +│ ┌─────────┴──────────┐ │ +│ │ MCP Server │ ←── Any agent connects here +│ │ (agent-teams-mcp) │ via --mcp-config +│ │ 30+ tools │ +│ └─────────────────────┘ │ +└─────────────────────────────┘ + │ │ + ┌────┴────┐ ┌────┴────┐ + │ Claude │ │ Codex / │ + │ Code │ │ Gemini │ + │ CLI │ │ CLI │ + │ (native)│ │ (via │ + │ │ │ MCP) │ + └─────────┘ └─────────┘ +``` + +**How it works:** +1. Claude Code: native integration (process + stream-json + Agent Teams) +2. Other agents (Codex, Gemini, Goose, OpenCode, Kilo) connect to our MCP server through their own native MCP client +3. All agents see the same kanban board, create tasks, and update statuses through MCP tools + +**Effort:** 0 additional work for the MCP part (already working). 2-3 weeks for new MCP tools (`team_join`, `task_poll_assigned`, etc.) + UI for external agents.
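
As a rough illustration of the second step in the list above, an external agent's native MCP client could be pointed at our server through a config file in the `mcpServers` shape that Claude Code reads via `--mcp-config`. This is only a sketch under stated assumptions: the server name `agent-teams`, the entry-point path, and the `AGENT_TEAMS_TEAM` variable are hypothetical names chosen for illustration, not taken from the codebase.

```typescript
// Hypothetical helper that builds the JSON body of an MCP config file.
// The mcpServers / command / args / env layout follows the stdio server
// format of Claude Code's .mcp.json; every concrete value is an assumption.

interface StdioServerEntry {
  command: string;
  args: string[];
  env?: Record<string, string>;
}

interface McpConfig {
  mcpServers: Record<string, StdioServerEntry>;
}

function buildMcpConfig(serverScript: string, teamName: string): McpConfig {
  return {
    mcpServers: {
      // "agent-teams" is an illustrative server name.
      "agent-teams": {
        command: "node",
        args: [serverScript],
        // Illustrative variable telling the server which team the agent joins.
        env: { AGENT_TEAMS_TEAM: teamName },
      },
    },
  };
}

const config = buildMcpConfig("/path/to/agent-teams-mcp/index.js", "demo-team");
console.log(JSON.stringify(config, null, 2));
```

The resulting JSON could be written to a file and handed to Claude Code through `--mcp-config`. Codex and Gemini CLI would need the same server described in their own config files (`~/.codex/config.toml`, `~/.gemini/settings.json`), whose formats differ.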
+ +**What it gives us:** +- Any MCP-compatible agent connects out of the box +- Zero dependency overhead (no `@mastra/*` packages) +- Our MCP server is the single integration point +- Full compatibility with Claude Code Agent Teams + +**What it does not give us:** +- No built-in agent-to-agent (A2A) protocol (but we do not need one: we have inbox files) +- No automatic model routing (but CLI agents handle that themselves) +- No built-in observability for external agents (but we see their actions through MCP tool calls) + +### Option B: @mastra/mcp as a wrapper around our MCP server + +**Reliability: 5/10 | Confidence: 4/10** + +``` +┌────────────────────────────────┐ +│ Electron App (UI) │ +├────────────────────────────────┤ +│ Main Process │ +│ ┌────────────────────┐ │ +│ │ TeamProvisioning │ │ +│ └─────────┬──────────┘ │ +│ │ │ +│ ┌─────────┴────────────────┐ │ +│ │ @mastra/mcp MCPServer │ │ ← Mastra wrapper +│ │ wraps our FastMCP tools │ │ +│ └─────────┬────────────────┘ │ +│ │ │ +│ ┌─────────┴────────────────┐ │ +│ │ @mastra/mcp MCPClient │ │ ← Mastra client +│ │ connects to external │ │ for external servers +│ │ MCP servers │ │ +│ └──────────────────────────┘ │ +└────────────────────────────────┘ +``` + +**Effort:** 1-2 weeks for the wrapper + a dependency on `@mastra/core` (~150KB+) + +**What it gives us:** +- Typed MCPClient with auto-detect transport (stdio/HTTP/SSE) +- ToolSearchProcessor for dynamic tool loading (if we ever have 100+ tools) +- Tracing integration with Langfuse/LangSmith + +**What it does not give us:** +- Nothing beyond what direct MCP already provides +- CLI agents still connect through their own native MCP client, not through Mastra + +**Problems:** +- An extra layer of abstraction (FastMCP -> Mastra MCPServer -> MCP protocol -> agent) +- Dependency on a fast-changing framework (Mastra has already changed its API from agent networks to supervisors) +- Bundle size increase in Electron (~150KB+ from @mastra/core) +- No real benefit: CLI agents do not use @mastra/mcp; they use
their own native MCP clients + +### Option C: Mastra as the orchestrator (creates/manages agents programmatically) + +**Reliability: 3/10 | Confidence: 3/10** + +``` +┌────────────────────────────────┐ +│ Electron App (UI) │ +├────────────────────────────────┤ +│ Main Process │ +│ ┌──────────────────────────┐ │ +│ │ Mastra Supervisor Agent │ │ ← Mastra manages everything +│ │ model: anthropic/... │ │ +│ │ agents: { worker1, ... }│ │ +│ │ tools: { task_create } │ │ +│ └──────────┬───────────────┘ │ +│ │ │ +│ ┌──────────┴───────────────┐ │ +│ │ Mastra Sub-Agents │ │ ← API-based, not CLI +│ │ openai/gpt-4o │ │ +│ │ anthropic/claude-sonnet │ │ +│ │ google/gemini-2.5-pro │ │ +│ └──────────────────────────┘ │ +└────────────────────────────────┘ +``` + +**Effort:** 8-12 weeks + +**What it gives us:** +- Full multi-model support via API (40+ providers) +- Mastra memory, workflows, evals + +**What it does not give us:** +- Claude Code Agent Teams (native CLI tools: file editing, terminal, git, session persistence) +- Management of CLI processes +- Parsing of JSONL sessions +- Everything that makes our product unique + +**Problems:** +- **It completely breaks our product.** We stop being "Claude Agent Teams UI" and become "yet another Mastra-based agent manager" +- We would have to re-implement file editing, bash, and git tools (thousands of lines of battle-tested code in Claude Code) +- We lose CLAUDE.md, hooks, settings.json, extended thinking: the whole Claude Code ecosystem +- Mastra agents are API-based. They do NOT run as CLI processes with their own terminal and git integration + +--- + +## 5. What about "Skills": 40+ AI agents in Mastra? + +This is a separate topic that can be misleading. + +**"Skills" in Mastra are NOT ready-made agents.** They are markdown files with instructions (CLAUDE.md, AGENTS.md) that teach external AI coding agents (Claude Code, Cursor, Windsurf, Copilot, etc.) to use the Mastra API. In other words, Mastra generates `.cursor/rules` or `CLAUDE.md` containing documentation for its SDK.
+ +The list of 40+ "agents" (AdaL, Amp, Antigravity, Augment, CodeBuddy, Crush, Droid, Goose, Kilo, Kimi CLI, Kiro CLI, Kode, etc.) is a list of IDE/CLI tools for which Mastra can generate instruction files. It is **NOT** something Mastra can launch or manage programmatically. + +--- + +## 6. Ecosystem: tools for orchestrating CLI agents + +For completeness, here is what exists as of March 2026 for managing CLI agents as processes (our task): + +| Tool | What it does | GitHub | Does it fit us? | +|------------|-----------|--------|---------------| +| **CCManager** | Session manager for Claude/Codex/Gemini/OpenCode/Kilo CLI | [kbwo/ccmanager](https://github.com/kbwo/ccmanager) | No: TUI, not Electron; no kanban | +| **MCO** | Neutral orchestration layer for CLI agents | [mco-org/mco](https://github.com/mco-org/mco) | Partially: a dispatch layer, but without a UI | +| **Mozzie** | Desktop tool for parallel orchestration | [ProductHunt](https://www.producthunt.com/products/mozzie) | Competitor | +| **Nexus MCP** | MCP server for calling CLI agents as tools | [glama.ai](https://glama.ai/mcp/servers/j7an/nexus-mcp) | Interesting: lets one agent call another through MCP | +| **claude-code-teams-mcp** | Reimplementation of Agent Teams as a standalone MCP server | [cs50victor/claude-code-teams-mcp](https://github.com/cs50victor/claude-code-teams-mcp) | Validates our approach: MCP as a universal integration layer | + +**Conclusion:** The industry is moving toward MCP as the universal protocol, not toward Mastra as the universal framework. Mastra is for building API-based agents. MCP is for integrating any agent with tools. + +--- + +## 7.
Summary table + +| Criterion | Option A: Direct MCP | Option B: @mastra/mcp wrapper | Option C: Mastra orchestrator | +|----------|----------------------|-------------------------------|------------------------------| +| **Effort** | 2-3 weeks (new MCP tools) | 3-4 weeks | 8-12 weeks | +| **What breaks** | Nothing | Nothing (additive) | Everything | +| **Code reuse** | 100% | 100% | ~20% | +| **Multi-provider support** | Any MCP-compatible agent | Any MCP-compatible agent | 40+ API providers | +| **CLI agents** | Native support | Native support | Not supported | +| **Bundle size** | +0 KB | +150 KB+ (@mastra/core) | +150 KB+ | +| **Dependency on Mastra** | None | Weak | Full | +| **Risk** | Very low | Low | Very high | +| **Does our product survive?** | Yes | Yes | No: it becomes a different product | +| **Uniqueness** | Kanban + Claude Code Teams + MCP | Kanban + Claude Code Teams + MCP | Yet another Mastra-based agent manager | +| **Reliability** | 9/10 | 5/10 | 3/10 | +| **Confidence** | 9/10 | 4/10 | 3/10 | + +--- + +## 8. Final recommendation + +### We do not use @mastra/mcp. We use direct MCP. + +**Reasons:** + +1. **Mastra solves a problem we do not have.** Mastra is an SDK for building API-based agents. Our product is a manager of CLI processes. Different domains. + +2. **CLI agents already support MCP natively.** Claude Code, Codex, Gemini CLI, Goose, OpenCode, and Kilo can all connect to our MCP server without Mastra. + +3. **@mastra/mcp is an extra layer.** CLI agents do not use Mastra MCPClient. They use their own native MCP clients. Mastra MCPServer would simply wrap our FastMCP server without adding value. + +4. **Our MCP server already works.** 30+ tools, battle-tested with Claude Code Agent Teams. We need to add 5-8 new tools for external agents, and that is it. + +5. **Zero dependency = zero risk.** Mastra changes its API quickly (agent networks -> supervisors within months).
Direct MCP is a stable standard (v1.0+, AAIF/Linux Foundation). + +6. **Our competitive advantage is kanban + Claude Code Agent Teams.** Mastra does not strengthen it. Mastra would turn us into a generic agent manager, of which there are already dozens. + +### What to do instead of Mastra + +Follow the plan from `docs/research/best-integration-approach.md`, **Option 7: Hybrid**: + +1. **Phase 1 (weeks 1-2):** Add MCP tools for external agents: `team_join`, `team_leave`, `task_poll_assigned`, `task_claim`, `member_register`, `member_heartbeat` +2. **Phase 2 (weeks 2-3):** UI support for external agents: provider badge, external member type +3. **Phase 3 (weeks 3-4):** Notification mechanism (polling, SSE) +4. **Phase 4 (on request):** Native support for a second CLI agent (Codex) through an `AgentRuntime` abstraction + +### When Mastra MIGHT be needed + +- If we decide to build **API-based agents** for tasks that do not require a CLI (code review, planning, triage): Mastra Agent + our MCP server +- If we decide to add **ToolSearchProcessor** for discovery among hundreds of tools (we currently have 30+, so this is not relevant) +- If we decide to export our agents/workflows as a **standalone MCP server** for external systems (Mastra MCPServer may be more convenient than FastMCP) +- If the Claude Code CLI becomes **deprecated** (there are no signs of this) + +But these are all "if" scenarios for the distant future. For now, direct MCP is the right and sufficient choice.
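
To make the Phase 1 tools from the plan in this section more concrete, here is a minimal in-memory sketch of how `team_join`, `member_heartbeat`, `task_claim`, and `task_poll_assigned` might behave. Only the tool names come from the plan; the `TeamBoard` class, its method signatures, the task fields, and the claim rules are all assumptions made for illustration, not the actual server design.

```typescript
// Hypothetical in-memory model of a few Phase 1 tools. Tool names come from
// the plan; every signature, field, and rule below is an assumption.

type TaskStatus = "todo" | "in_progress";

interface Task {
  id: string;
  status: TaskStatus;
  assignee?: string;
}

class TeamBoard {
  private members = new Map<string, number>(); // memberId -> last heartbeat (ms)
  private tasks = new Map<string, Task>();

  addTask(id: string): void {
    this.tasks.set(id, { id, status: "todo" });
  }

  // team_join: register an external agent as a team member.
  teamJoin(memberId: string, now: number): void {
    this.members.set(memberId, now);
  }

  // member_heartbeat: refresh liveness; unknown members are rejected.
  memberHeartbeat(memberId: string, now: number): boolean {
    if (!this.members.has(memberId)) return false;
    this.members.set(memberId, now);
    return true;
  }

  // task_claim: succeeds only for a known member and an unclaimed task.
  taskClaim(taskId: string, memberId: string): boolean {
    const task = this.tasks.get(taskId);
    if (!task || task.assignee !== undefined || !this.members.has(memberId)) {
      return false;
    }
    task.assignee = memberId;
    task.status = "in_progress";
    return true;
  }

  // task_poll_assigned: tasks currently assigned to the caller.
  taskPollAssigned(memberId: string): Task[] {
    return Array.from(this.tasks.values()).filter((t) => t.assignee === memberId);
  }
}

const board = new TeamBoard();
board.addTask("T-1");
board.teamJoin("codex-1", Date.now());
console.log(board.taskClaim("T-1", "codex-1")); // true
console.log(board.taskClaim("T-1", "gemini-1")); // false: already claimed, and not a member
console.log(board.taskPollAssigned("codex-1").map((t) => t.id));
```

In a real server each method would be registered as an MCP tool and backed by the existing task storage; the sketch only shows the claim and polling semantics an external agent would rely on.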
+ +--- + +## Sources + +- [Mastra GitHub Repository (22K+ stars)](https://github.com/mastra-ai/mastra) +- [Mastra MCP Overview](https://mastra.ai/docs/mcp/overview) +- [Mastra Agents Overview](https://mastra.ai/docs/agents/overview) +- [Mastra Agent Networks](https://mastra.ai/docs/agents/networks) +- [@mastra/mcp npm](https://www.npmjs.com/package/@mastra/mcp) +- [Why We're All-In on MCP (Mastra Blog)](https://mastra.ai/blog/mastra-mcp) +- [Mastra 1.0 Announcement (300K+ weekly downloads, 19.4K stars)](https://mastra.ai/blog/announcing-mastra-1) +- [Mastra Changelog 2026-03-12](https://mastra.ai/blog/changelog-2026-03-12) +- [Claude Code MCP Docs](https://code.claude.com/docs/en/mcp) +- [OpenAI Codex MCP](https://developers.openai.com/codex/mcp) +- [Gemini CLI MCP](https://geminicli.com/docs/tools/mcp-server/) +- [Goose: open source AI agent by Block](https://github.com/block/goose) +- [OpenCode MCP](https://opencode.ai/docs/mcp-servers/) +- [Kilo Code MCP](https://kilo.ai/docs/automate/mcp/using-in-kilo-code) +- [Aider MCP Server](https://www.pulsemcp.com/servers/disler-aider) +- [claude-code-teams-mcp (standalone reimplementation)](https://github.com/cs50victor/claude-code-teams-mcp) +- [CCManager (session manager)](https://github.com/kbwo/ccmanager) +- [MCO (multi-agent orchestrator)](https://github.com/mco-org/mco) +- [Nexus MCP (CLI agents as MCP tools)](https://glama.ai/mcp/servers/j7an/nexus-mcp) +- [Mastra ToolSearchProcessor (Feb 2026)](https://mastra.ai/blog/changelog-2026-02-04) +- [Google Official MCP Support Announcement](https://cloud.google.com/blog/products/ai-machine-learning/announcing-official-mcp-support-for-google-services) +- [Agentic AI Foundation (AAIF), Linux Foundation](https://www.linuxfoundation.org/press/linux-foundation-announces-the-formation-of-the-agentic-ai-foundation) diff --git a/docs/research/unified-cli-agent-interface.md b/docs/research/unified-cli-agent-interface.md new file mode 100644 index 00000000..76489a32 ---
/dev/null +++ b/docs/research/unified-cli-agent-interface.md @@ -0,0 +1,533 @@ +# Unified CLI Agent Interface — Research (March 2026) + +Research on tools/libraries providing a unified interface for calling multiple AI coding CLI agents abstractly (Claude Code, Codex CLI, Gemini CLI, Goose, OpenCode, Aider, etc.). + +## Summary & Recommendation + +**No single battle-tested npm library exists** that abstracts CLI agent spawning behind a clean TypeScript interface suitable for embedding in an Electron app. The ecosystem is fragmented across ~10 projects, each with tradeoffs. The most relevant options for our use case are: + +| Project | Lang | Approach | Library Use | Agents | Our Fit | +|---------|------|----------|-------------|--------|---------| +| **Coder AgentAPI** | Go | HTTP API over terminal emulation | Via HTTP (language-agnostic) | 11 | 8/10 | +| **all-agents-mcp** | TS | MCP server, child process spawn | npm import or MCP | 4 | 7/10 | +| **Overstory** | TS | AgentRuntime interface + tmux | CLI only (Bun) | 11 | 6/10 | +| **Composio Agent Orchestrator** | TS | Plugin architecture, worktrees | Build from source | 4+ | 5/10 | +| **MCO** | Python | CLI adapter hooks | CLI/MCP only | 5+ | 4/10 | +| **Network-AI** | TS | Blackboard coordination | npm library | 17 adapters* | 3/10 | +| **AWS CAO** | Python | tmux + MCP server | REST API | 7 | 4/10 | + +\* Network-AI adapters are for AI frameworks (LangChain, CrewAI, etc.), not CLI coding agents directly. + +**Recommended approach**: Extract the adapter pattern from Coder AgentAPI or all-agents-mcp, build our own thin `AgentAdapter` interface in TypeScript. + +--- + +## Tier 1 — Most Relevant for Our Use Case + +### 1. 
Coder AgentAPI + +**The most mature unified interface for controlling CLI coding agents programmatically.** + +- **URL**: https://github.com/coder/agentapi +- **Stars**: ~1,300 +- **Language**: Go (82%), TypeScript (15% — web UI) +- **License**: MIT +- **npm package**: None (Go binary, HTTP API) + +#### How it works +Runs an in-memory terminal emulator (Go). Translates API calls into terminal keystrokes, parses agent output into structured messages. Each agent type has a message formatter in `lib/msgfmt/`. + +#### Supported agents (11) +Claude Code, Goose, Aider, Gemini CLI, GitHub Copilot, AmazonQ, OpenCode, Sourcegraph Amp, Codex, Auggie, Cursor CLI. + +#### API surface +``` +GET /messages — conversation history +POST /message — send message (type: "user" | "raw") +GET /status — "stable" | "running" +GET /events — SSE stream (real-time) +GET /openapi.json — full OpenAPI schema +``` + +#### Integration with Electron +- Spawn `agentapi server --type=claude -- claude` as child process +- Communicate via HTTP (localhost:3284) +- SSE events for real-time status updates +- Can generate TS client from OpenAPI spec using `@hey-api/openapi-ts` +- **Con**: Requires Go binary distribution alongside Electron app +- **Con**: Terminal emulation approach is fragile — keystrokes, not stdin/stdout protocol + +#### Reliability: 7/10 +#### Confidence: 8/10 — well-maintained by Coder (enterprise company), actively updated + +Source: [github.com/coder/agentapi](https://github.com/coder/agentapi) + +--- + +### 2. all-agents-mcp + +**TypeScript MCP server that orchestrates agents via unified stdio interface.** + +- **URL**: https://github.com/Dokkabei97/all-agents-mcp +- **npm**: `all-agents-mcp` +- **Language**: TypeScript (100%) +- **License**: MIT (assumed) + +#### How it works +Invokes each agent's CLI binary as a child process. Each agent implementation extends `BaseAgent` abstract class which handles process spawning, stdin/stdout capture. 
No API bypass — pure process orchestration. + +#### Key TypeScript interface +``` +src/agents/ + IAgent interface — identity, availability, execution, health + BaseAgent abstract class — spawn logic, stdin/stdout + claude-agent.ts + codex-agent.ts + gemini-agent.ts + copilot-agent.ts +``` + +#### Supported agents (4) +Claude Code, Codex CLI, Gemini CLI, GitHub Copilot CLI. + +#### API surface (MCP tools) +- `ask_agent` — single agent query +- `ask_all` — parallel multi-agent comparison +- `delegate_task` — complexity-based routing +- `cross_verify` — same agent, multiple models +- Plus specialized: code review, debug, explain, test gen, refactor + +#### Integration with Electron +- **Pure TypeScript** — best language fit +- Can import as library or run as MCP server +- Child process spawning maps well to our existing architecture +- `IAgent` interface is close to what we need +- **Con**: Only 4 agents (vs 11 in AgentAPI) +- **Con**: Young project, may lack edge case handling +- **Con**: MCP-first design, not raw process management + +#### Reliability: 5/10 +#### Confidence: 6/10 — concept is solid, but limited agent coverage + +Source: [github.com/Dokkabei97/all-agents-mcp](https://github.com/Dokkabei97/all-agents-mcp) + +--- + +### 3. Overstory + +**Multi-agent orchestration with pluggable AgentRuntime interface — most agents supported.** + +- **URL**: https://github.com/jayminwest/overstory +- **npm**: `@os-eco/overstory-cli` +- **Language**: TypeScript (Bun runtime) +- **License**: MIT + +#### AgentRuntime interface (`src/runtimes/types.ts`) +Defines the contract each adapter must implement: +- Spawning +- Config deployment +- Guard enforcement +- Readiness detection +- Transcript parsing + +#### Supported runtimes (11) +Claude Code, Pi, Copilot, Cursor, Codex, Gemini CLI, Aider, Goose, Amp, OpenCode, Sapling. 
+ +#### Architecture +- Agents run in isolated **git worktrees via tmux** +- Inter-agent messaging via **SQLite** (`.overstory/mail.db`, WAL mode) +- Tiered conflict resolution for merge +- Watchdog daemon for health monitoring +- Hierarchy: Orchestrator → Coordinator → Supervisor → Workers + +#### Integration with Electron +- TypeScript — good language fit +- `AgentRuntime` interface is the cleanest abstraction found +- **Con**: Requires Bun (not Node.js) +- **Con**: Hard dependency on tmux (not available on Windows, awkward in Electron) +- **Con**: Designed as CLI orchestrator, not embeddable library +- **Con**: Heavy — mail system, worktrees, watchdog are overhead we don't need + +#### What we can extract +The `AgentRuntime` interface pattern is the most instructive. We could model our own adapter interface after it, implementing only spawn/communicate/status methods. + +#### Reliability: 6/10 +#### Confidence: 5/10 — great architecture design but tmux/Bun deps make it impractical for Electron + +Source: [github.com/jayminwest/overstory](https://github.com/jayminwest/overstory) + +--- + +## Tier 2 — Useful Reference, Not Direct Import + +### 4. ComposioHQ Agent Orchestrator + +**Enterprise-grade TypeScript orchestrator with plugin architecture.** + +- **URL**: https://github.com/ComposioHQ/agent-orchestrator +- **Language**: TypeScript (91.5%), pnpm monorepo +- **npm**: Not published (build from source, `npm link -g packages/cli`) +- **License**: Not specified +- **Stars**: Growing, backed by Composio (well-funded company) + +#### Plugin architecture (8 slots) +| Slot | Default | Alternatives | +|------|---------|-------------| +| Runtime | tmux | docker, k8s, process | +| Agent | claude-code | codex, aider, opencode | +| Workspace | worktree | clone | +| Tracker | github | linear | +| Notifier | desktop | slack, composio, webhook | +| Terminal | iterm2 | web | + +All interfaces in `packages/core/src/types.ts`. 
Plugins implement one interface and export a `PluginModule`. + +#### Key stats +40,000 lines of TypeScript, 17 plugins, 3,288 tests. + +#### Integration with Electron +- TypeScript monorepo — compatible +- Plugin interface is clean and extensible +- **Con**: Not published as npm package +- **Con**: Heavy — includes dashboard, CI integration, PR management +- **Con**: tmux as default runtime +- **Con**: Designed for autonomous operation, not interactive control + +#### Reliability: 6/10 +#### Confidence: 5/10 — impressive codebase but too heavy for embedding + +Source: [github.com/ComposioHQ/agent-orchestrator](https://github.com/ComposioHQ/agent-orchestrator) + +--- + +### 5. MCO (Multi-CLI Orchestrator) + +**Python-based neutral orchestration layer for CLI coding agents.** + +- **URL**: https://github.com/mco-org/mco +- **npm**: `@tt-a1i/mco` (Node.js wrapper around Python) +- **Language**: Python (core), Node.js (wrapper) +- **Requires**: Python 3.10+ + +#### Adapter architecture +Adding a new agent CLI requires implementing three hooks: +1. Auth check +2. Command builder +3. Output normalizer + +Supports two transport modes: Shim (stdout parsing) and ACP (JSON-RPC). + +#### Supported agents (5+) +Claude Code, Codex CLI, Gemini CLI, OpenCode, Qwen Code. Custom agents via `.mco/agents.yaml`. + +#### Features +- Parallel dispatch + consensus engine (`agreement_ratio`, `consensus_score`) +- JSON/SARIF/Markdown output +- Debate mode, divide mode (files/dimensions) +- MCP server mode for programmatic access + +#### Integration with Electron +- **Con**: Python dependency — very problematic for Electron distribution +- **Con**: Not a library, primarily CLI +- MCP server mode could work but adds complexity +- The 3-hook adapter pattern is a useful design reference + +#### Reliability: 5/10 +#### Confidence: 4/10 — Python dependency is a dealbreaker for Electron + +Source: [github.com/mco-org/mco](https://github.com/mco-org/mco) + +--- + +### 6. 
AWS CLI Agent Orchestrator (CAO) + +**AWS-backed orchestrator with supervisor-worker pattern via tmux + MCP.** + +- **URL**: https://github.com/awslabs/cli-agent-orchestrator +- **Language**: Python 3.10+ +- **Install**: `uv tool install` (not on PyPI) +- **License**: Apache 2.0 + +#### Supported providers (7) +Kiro CLI, Claude Code, Codex CLI, Gemini CLI, Kimi CLI, GitHub Copilot CLI, Q CLI. + +#### Orchestration patterns +1. **Handoff** — synchronous task transfer with wait-for-completion +2. **Assign** — asynchronous spawning for parallel execution +3. **Send Message** — direct communication with existing agents + +#### REST API +Server on `localhost:9889` — session management, terminal control, messaging. + +#### Integration with Electron +- **Con**: Python — not suitable for Electron +- **Con**: tmux dependency +- REST API approach could be adapted +- Agent profile system is well-designed (provider key in frontmatter) + +#### Reliability: 7/10 +#### Confidence: 4/10 — solid engineering (AWS) but Python/tmux deps block Electron use + +Source: [github.com/awslabs/cli-agent-orchestrator](https://github.com/awslabs/cli-agent-orchestrator), [AWS Blog](https://aws.amazon.com/blogs/opensource/introducing-cli-agent-orchestrator-transforming-developer-cli-tools-into-a-multi-agent-powerhouse/) + +--- + +### 7. Network-AI + +**TypeScript multi-agent coordination with atomic shared state.** + +- **URL**: https://github.com/jovanSAPFIONEER/Network-AI +- **npm**: `network-ai` +- **Language**: TypeScript +- **License**: MIT + +#### Key concept +Solves the "last-write-wins" problem with atomic `propose -> validate -> commit` semantics using filesystem-based mutual exclusion. + +#### 17 adapters +LangChain, AutoGen, CrewAI, OpenAI Assistants, LlamaIndex, Semantic Kernel, Haystack, DSPy, Agno, MCP, Custom, OpenClaw, A2A, Codex, MiniMax, NemoClaw, APS. + +**Important caveat**: These are adapters for AI *frameworks* (LangChain, CrewAI), not CLI coding agents (Claude Code, Aider). 
The Codex adapter is for OpenAI API, not Codex CLI. + +#### Library usage +```typescript +import { LockedBlackboard, CustomAdapter, createSwarmOrchestrator } from 'network-ai'; +``` + +#### Integration with Electron +- TypeScript + npm — good language fit +- Importable as library +- **Con**: Solves a different problem (framework coordination, not CLI agent spawning) +- **Con**: No adapters for CLI coding agents specifically +- Blackboard pattern could be useful for inter-agent state + +#### Reliability: 5/10 +#### Confidence: 3/10 — wrong abstraction level for our needs + +Source: [github.com/jovanSAPFIONEER/Network-AI](https://github.com/jovanSAPFIONEER/Network-AI) + +--- + +## Tier 3 — Ecosystem Context + +### 8. Pi (pi-mono) + +**TypeScript monorepo — coding agent toolkit with unified LLM API.** + +- **URL**: https://github.com/badlogic/pi-mono +- **npm**: `@mariozechner/pi-coding-agent` +- **Language**: TypeScript (Bun) +- **Stars**: 25,400+ + +Not a multi-agent orchestrator — it's a coding agent itself (like Claude Code but open source). Relevant because its modular package design (`pi-ai`, `pi-agent-core`, `pi-coding-agent`, `pi-tui`) shows how to abstract agent internals. Supports 15+ LLM providers. + +Source: [github.com/badlogic/pi-mono](https://github.com/badlogic/pi-mono) + +--- + +### 9. AI Code Agents SDK (Felix Arntz) + +**TypeScript SDK for vendor-lock-in-free coding agents.** + +- **Blog**: https://felix-arntz.me/blog/introducing-ai-code-agents-a-typescript-sdk-to-solve-vendor-lock-in-for-coding-agents/ +- **Language**: TypeScript +- **Built on**: Vercel AI SDK +- **Status**: Very early stage (announced November 2025) + +Abstracts **Environment** (sandboxed execution contexts) and **Tools** (file system, commands) behind interfaces. Model-agnostic via Vercel AI SDK. + +**Not a CLI agent spawner** — it's an SDK for *building* coding agents, not orchestrating existing ones. No GitHub repository found (may be private or unreleased). 
+ +#### Reliability: 2/10 (not yet available) +#### Confidence: 3/10 + +--- + +### 10. Claude Code Bridge (ccb) + +**Terminal-based multi-AI collaboration via split panes.** + +- **URL**: https://github.com/bfly123/claude_code_bridge +- **Stars**: 1,759 +- **Language**: **Python** (not TypeScript) + +Orchestrates Claude, Codex, Gemini, OpenCode, Droid through terminal multiplexer (WezTerm/tmux) split panes. 50-200 tokens per call via persistent sessions. + +**Not suitable**: Python, tmux-based, designed for human-visible terminal interaction. + +Source: [github.com/bfly123/claude_code_bridge](https://github.com/bfly123/claude_code_bridge) + +--- + +## Related Infrastructure + +### node-pty + xterm.js (Terminal Emulation in Electron) + +The foundational building blocks if we build our own solution: + +- **node-pty**: `npm install node-pty` — fork pseudoterminals in Node.js. Used by VS Code, Hyper, and many Electron terminal apps. Supports Linux, macOS, Windows (conpty). [github.com/microsoft/node-pty](https://github.com/microsoft/node-pty) +- **xterm.js**: Terminal emulator for the browser/Electron renderer. [github.com/xtermjs/xterm.js](https://github.com/xtermjs/xterm.js) +- **@loopmode/xpty**: React component + helpers for building terminals in Electron with xterm.js + node-pty. [github.com/loopmode/xpty](https://github.com/loopmode/xpty) + +This is essentially what Coder AgentAPI does in Go. We could replicate the approach in TypeScript using node-pty directly. + +**Important**: node-pty is **not thread-safe** and requires native compilation. Already used by many Electron apps successfully. + +--- + +### Anthropic Claude Agent SDK (Official) + +- **npm**: `@anthropic-ai/claude-agent-sdk` +- **URL**: https://github.com/anthropics/claude-agent-sdk-typescript +- **Docs**: https://platform.claude.com/docs/en/agent-sdk/typescript + +Official SDK for spawning Claude Code programmatically. Includes `spawnClaudeCodeProcess` option, `AgentDefinition` for subagents. 
Only works with Claude Code. + +--- + +### Awesome CLI Coding Agents (Curated List) + +Comprehensive directory of 80+ CLI coding agents + orchestrators: +- **URL**: https://github.com/bradAGI/awesome-cli-coding-agents + +Notable orchestrators from the list: +- **Superset** (7.4k stars) — terminal for coding agents, parallel sessions +- **Claude Squad** (6.4k stars) — tmux multi-session Claude Code +- **Crystal** (3.0k stars) — parallel agents in git worktrees +- **Toad** (2.7k stars) — agent orchestrator for parallel CLI sessions +- **Emdash** (2.7k stars) — concurrent coding agents + +--- + +## Key Findings + +### 1. No universal npm library exists +There is no `npm install universal-agent` that gives you a clean TypeScript interface to spawn and communicate with arbitrary CLI coding agents. The ecosystem is solving this problem in different ways (MCP servers, HTTP APIs, tmux wrappers, CLI tools) but none are designed as embeddable libraries for Electron. + +### 2. Two architectural approaches dominate + +**Terminal emulation** (AgentAPI approach): +- Spawn a PTY, type into it, parse output +- Works with ANY CLI agent without modification +- Fragile — depends on terminal output format +- Message boundaries are hard to detect + +**stdin/stdout protocol** (our current Claude Code approach): +- `--input-format stream-json --output-format stream-json` +- Clean structured communication +- Only works if CLI supports it +- Each agent has its own protocol (or none) + +### 3. Agent protocol fragmentation +Each CLI agent has a different communication protocol: +- **Claude Code**: stream-json stdin/stdout +- **Codex CLI**: `--json` flag, structured output +- **Gemini CLI**: No programmatic API documented +- **Goose**: Custom protocol +- **Aider**: Text-based, `--message` flag +- **OpenCode**: No public programmatic API + +This fragmentation is why projects like AgentAPI resort to terminal emulation — it's the only truly universal approach. + +### 4. 
MCP as potential unifier +MCP (Model Context Protocol) is emerging as a common integration point. All major coding agents now support MCP for tools, and projects like MCO and all-agents-mcp use MCP as the orchestration transport. However, MCP doesn't solve the agent *spawning* and *lifecycle management* problem. + +### 5. The ACP (Agent Client Protocol) is emerging +The Agent Client Protocol (mentioned in MCO's ACP mode and the Cursor ACP adapter) may become a standard for editor-to-agent communication, but it is still early and not yet widely adopted. + +--- + +## Proposed Architecture for Our Project + +Based on this research, the recommended approach is to build our own thin abstraction layer: + +```typescript +// AgentAdapter interface (inspired by Overstory's AgentRuntime + all-agents-mcp's IAgent) +// Sketch only: Promise/Record type arguments and the AgentMessage/AgentStatus shapes are assumptions +import type { Readable, Writable } from 'node:stream'; + +type AgentMessage = unknown; // placeholder shape +type AgentStatus = unknown; // placeholder shape + +interface AgentAdapter { + // Identity + readonly id: string; // "claude-code" | "codex" | "gemini" | etc. + readonly displayName: string; + + // Detection + isInstalled(): Promise<boolean>; + getVersion(): Promise<string>; + + // Lifecycle + spawn(config: AgentSpawnConfig): Promise<AgentProcess>; + + // Capabilities + supportsMcp(): boolean; + supportsStreamJson(): boolean; + supportsTeams(): boolean; +} + +interface AgentProcess { + // Communication + sendMessage(text: string): Promise<void>; + onMessage(handler: (msg: AgentMessage) => void): void; + onStatus(handler: (status: AgentStatus) => void): void; + + // Lifecycle + isAlive(): boolean; + kill(): Promise<void>; + + // Process + readonly pid: number; + readonly stdin: Writable; + readonly stdout: Readable; +} + +interface AgentSpawnConfig { + workingDir: string; + mcpConfig?: string; // path to MCP config file + model?: string; + maxTokens?: number; + disallowedTools?: string[]; + env?: Record<string, string>; + systemPrompt?: string; +} +``` + +### Implementation approaches (ranked) + +**Option A: Direct child_process spawn with per-agent formatters (Recommended)** +- Use Node.js `child_process.spawn()` for each agent +- Each adapter knows the correct CLI flags and I/O
format +- Similar to all-agents-mcp's `BaseAgent` approach +- Reliability: 8/10, Confidence: 9/10 + +**Option B: node-pty terminal emulation (AgentAPI approach in TS)** +- Use `node-pty` to spawn PTY for each agent +- Parse terminal output, inject keystrokes +- Works with any agent but fragile +- Reliability: 6/10, Confidence: 7/10 + +**Option C: Wrap Coder AgentAPI as subprocess** +- Spawn `agentapi server` as a sidecar process +- Communicate via HTTP API +- Leverage their 11 agent support +- Reliability: 7/10, Confidence: 6/10 (Go binary distribution complexity) + +**Option D: Fork all-agents-mcp's TypeScript code** +- Take the IAgent/BaseAgent pattern +- Extend with more agents +- Reliability: 6/10, Confidence: 7/10 + +--- + +## Sources + +- [Coder AgentAPI](https://github.com/coder/agentapi) — HTTP API for 11 coding agents (Go) +- [all-agents-mcp](https://github.com/Dokkabei97/all-agents-mcp) — TypeScript MCP server for 4 agents +- [Overstory](https://github.com/jayminwest/overstory) — AgentRuntime interface with 11 runtimes (TS/Bun) +- [ComposioHQ Agent Orchestrator](https://github.com/ComposioHQ/agent-orchestrator) — TS monorepo, plugin architecture +- [MCO](https://github.com/mco-org/mco) — Python multi-CLI orchestrator with adapter hooks +- [AWS CLI Agent Orchestrator](https://github.com/awslabs/cli-agent-orchestrator) — Python, supervisor-worker pattern +- [Network-AI](https://github.com/jovanSAPFIONEER/Network-AI) — TS, 17 framework adapters, npm library +- [Pi (pi-mono)](https://github.com/badlogic/pi-mono) — TS coding agent toolkit +- [Claude Code Bridge](https://github.com/bfly123/claude_code_bridge) — Python multi-AI collaboration +- [Awesome CLI Coding Agents](https://github.com/bradAGI/awesome-cli-coding-agents) — curated directory of 80+ agents +- [node-pty](https://github.com/microsoft/node-pty) — PTY for Node.js (Microsoft) +- [Anthropic Claude Agent SDK](https://github.com/anthropics/claude-agent-sdk-typescript) — Official TS SDK +- [Felix Arntz 
blog — AI Code Agents SDK](https://felix-arntz.me/blog/introducing-ai-code-agents-a-typescript-sdk-to-solve-vendor-lock-in-for-coding-agents/) — Vendor lock-in abstraction concept +- [AWS Blog — CLI Agent Orchestrator](https://aws.amazon.com/blogs/opensource/introducing-cli-agent-orchestrator-transforming-developer-cli-tools-into-a-multi-agent-powerhouse/) diff --git a/docs/research/unified-llm-api-tools.md b/docs/research/unified-llm-api-tools.md new file mode 100644 index 00000000..3b8e0e8d --- /dev/null +++ b/docs/research/unified-llm-api-tools.md @@ -0,0 +1,571 @@ +# Unified LLM API Libraries for TypeScript/Electron + +> **Date:** 2026-03-24 +> **Goal:** Find the best library that provides a single API for calling multiple LLM providers (OpenAI, Anthropic, Google, etc.) from our Electron app. +> **Requirements:** TypeScript-native, tool calling, streaming, can run in Electron (no server), open source, actively maintained, MCP integration + +--- + +## TL;DR — Recommendation + +**Vercel AI SDK (`ai` + `@ai-sdk/*` providers)** is the clear winner for our use case. + +| Criteria | Winner | +|---|---| +| Best as a library (not framework) | Vercel AI SDK | +| Tool calling across providers | Vercel AI SDK | +| Streaming | Vercel AI SDK | +| TypeScript DX | Vercel AI SDK | +| MCP integration | Vercel AI SDK | +| Runs in Electron (no server) | Vercel AI SDK, multi-llm-ts | +| Community & maintenance | Vercel AI SDK | +| Lightweight / minimal footprint | multi-llm-ts | + +If we need something **even simpler** with zero framework overhead and 12 provider support, `multi-llm-ts` is a solid lightweight alternative (already used by a production Electron app — Witsy). + +--- + +## Candidates Compared + +### 1. Vercel AI SDK (RECOMMENDED) + +| | | +|---|---| +| **Package** | `ai` (core), `@ai-sdk/openai`, `@ai-sdk/anthropic`, `@ai-sdk/google`, etc. 
| +| **GitHub** | [github.com/vercel/ai](https://github.com/vercel/ai) | +| **Stars** | ~23K | +| **npm downloads** | ~4.5M/week (across `ai` + `@ai-sdk/*` packages) | +| **License** | Apache 2.0 | +| **Latest version** | ai@6.0.138 (March 2026) | +| **TypeScript** | Native TypeScript, written from scratch. Excellent DX. | +| **Contributors** | 597+ | + +**Provider coverage:** +100+ models supported. Official provider packages for: OpenAI, Anthropic, Google (Gemini), Mistral, Cohere, Amazon Bedrock, Azure OpenAI, xAI (Grok), Groq, Perplexity, Fireworks, Together AI, DeepSeek, Ollama (local), and 40+ community providers including OpenRouter, Portkey, etc. + +**Tool calling:** Full support via `generateText` and `streamText`. Multi-step tool execution loops with `stopWhen`. AI SDK 6 introduces `ToolLoopAgent` for automatic tool execution. `needsApproval: true` for human-in-the-loop. Type-safe tool definitions with Zod schemas. + +**Streaming:** First-class streaming via `streamText()` and `streamObject()`. Returns async iterable `textStream`. No custom parsing needed. + +**MCP integration:** Full MCP support since AI SDK 6. Built-in MCP client with `tools()` method that adapts MCP tools to AI SDK tools. Supports HTTP/SSE/stdio transports. OAuth authentication for MCP servers. Elicitation support (MCP servers can request user input). + +**Can run in Electron:** YES. `generateText()` and `streamText()` are pure Node.js functions — no web server required. Work directly in Electron's main process. Confirmed by Sentry's Electron + Vercel AI integration. Community project [electron-ai-chatbot](https://github.com/pashvc/electron-ai-chatbot) exists. + +**Maturity:** Very high. Used by Thomson Reuters, Clay, and "teams ranging from startups to Fortune 500 companies". 20M+ monthly downloads. Active development with frequent releases (multiple per week). 
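The multi-step tool loop and the human-in-the-loop gate described above can be pictured with a small provider-agnostic sketch. This is not the AI SDK API: `ModelStep`, `ToolDef`, `Model`, and `runToolLoop` are hypothetical names, the model is a plain synchronous function, and the step cap only stands in for what the SDK expresses through `stopWhen`.

```typescript
// A provider-agnostic sketch of a multi-step tool loop with an approval gate.
// NOT the AI SDK API: every name here is hypothetical and for illustration only.
type ModelStep =
  | { kind: "text"; text: string }
  | { kind: "tool-call"; tool: string; input: string };

interface ToolDef {
  needsApproval: boolean; // mirrors the idea of `needsApproval: true`
  execute: (input: string) => string;
}

// The "model" sees the running history and returns either text or a tool call.
type Model = (history: readonly string[]) => ModelStep;

function runToolLoop(
  model: Model,
  tools: Record<string, ToolDef>,
  prompt: string,
  approve: (tool: string, input: string) => boolean,
  maxSteps = 5,
): { text: string; steps: number } {
  const history: string[] = [`user: ${prompt}`];
  for (let step = 1; step <= maxSteps; step++) {
    const next = model(history);
    // A plain text answer ends the loop.
    if (next.kind === "text") return { text: next.text, steps: step };

    const tool = tools[next.tool];
    if (!tool) {
      history.push(`tool-error: unknown tool ${next.tool}`);
    } else if (tool.needsApproval && !approve(next.tool, next.input)) {
      // The human (or policy) refused, so the tool is never executed.
      history.push(`tool-denied: ${next.tool}`);
    } else {
      history.push(`tool-result: ${tool.execute(next.input)}`);
    }
  }
  // Step budget exhausted without a final text answer.
  return { text: "", steps: maxSteps };
}
```

In an Electron app the `approve` callback would typically be backed by an IPC round trip to the renderer, which is why the gate is modelled as a separate function rather than folded into the tool itself.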
+ +**Strengths:** +- Most library-like: single function calls (`generateText`, `streamText`, `generateObject`), no framework lock-in +- Switch providers by changing one line of code +- Best TypeScript DX in the category +- Huge ecosystem of provider packages +- Excellent documentation at [ai-sdk.dev](https://ai-sdk.dev/) +- Built-in fallbacks in AI SDK 6 +- DevTools for debugging LLM calls + +**Weaknesses:** +- Provider packages add separate dependencies (though each is small) +- UI hooks (`useChat`, `useCompletion`) are React/web focused — not relevant for our Electron main process use +- Some newer features (AI SDK 6) are still stabilizing + +**Reliability: 9/10 | Confidence: 9/10** + +**Links:** +- [Official docs](https://ai-sdk.dev/docs/introduction) +- [Tool calling docs](https://ai-sdk.dev/docs/ai-sdk-core/tools-and-tool-calling) +- [MCP tools docs](https://ai-sdk.dev/docs/ai-sdk-core/mcp-tools) +- [Node.js getting started](https://ai-sdk.dev/docs/getting-started/nodejs) +- [AI SDK 6 announcement](https://vercel.com/blog/ai-sdk-6) +- [npm: ai](https://www.npmjs.com/package/ai) +- [GitHub](https://github.com/vercel/ai) + +--- + +### 2. multi-llm-ts (Lightweight Alternative) + +| | | +|---|---| +| **Package** | `multi-llm-ts` | +| **GitHub** | [github.com/nbonamy/multi-llm-ts](https://github.com/nbonamy/multi-llm-ts) | +| **Stars** | ~50 (small project) | +| **npm downloads** | ~211/week | +| **License** | MIT | +| **Latest version** | 4.6.2 (March 2026) | +| **TypeScript** | Native TypeScript | +| **Maintainers** | 1 | + +**Provider coverage:** +12 providers: OpenAI, Anthropic, Google, Mistral, Groq, Ollama, xAI, DeepSeek, Cerebras, Meta/Llama, Azure AI, OpenRouter. + +**Tool calling:** Built-in plugin/tool system. Define tools with parameter descriptions and execution logic. Tool calling handled automatically across all providers. + +**Streaming:** `complete()` (non-streaming) and `generate()` (streaming) methods. + +**MCP integration:** None built-in. 
+ +**Can run in Electron:** YES. Already powering [Witsy](https://github.com/nbonamy/witsy) — a production Electron desktop AI assistant using 20+ providers through this library. This is the most proven Electron integration of any library on this list. + +**Maturity:** Active development, frequent releases. Small community but proven in production via Witsy. + +**Strengths:** +- Smallest, most focused library — does exactly one thing well +- Already proven in a real Electron desktop app +- MIT license +- Clean abstraction: `igniteEngine()` / `igniteModel()` → `complete()` / `generate()` +- AbortSignal support for cancellation +- Token usage tracking +- Multi-attachment support + +**Weaknesses:** +- Single maintainer — bus factor risk +- Very small community (~211 downloads/week) +- No MCP integration +- No structured output (generateObject equivalent) +- 12 providers vs 100+ in Vercel AI SDK +- Limited documentation + +**Reliability: 6/10 | Confidence: 7/10** + +**Links:** +- [npm: multi-llm-ts](https://www.npmjs.com/package/multi-llm-ts) +- [GitHub](https://github.com/nbonamy/multi-llm-ts) +- [Witsy (Electron app using it)](https://github.com/nbonamy/witsy) + +--- + +### 3. Mastra + +| | | +|---|---| +| **Package** | `@mastra/core` | +| **GitHub** | [github.com/mastra-ai/mastra](https://github.com/mastra-ai/mastra) | +| **Stars** | ~19.8K | +| **npm downloads** | ~300K/week | +| **License** | Apache 2.0 (core), Enterprise License (ee/ features) | +| **Latest version** | 1.x (January 2026 v1.0) | +| **TypeScript** | Native TypeScript, from Gatsby team | + +**Provider coverage:** +3,388 models from 94 providers — because it uses Vercel AI SDK under the hood for model routing. + +**Tool calling:** Full tool calling support. Define tools with schemas and descriptions. `ToolSearchProcessor` lets agents search for and load tools on demand. + +**Streaming:** Yes, via Vercel AI SDK. + +**MCP integration:** Yes, via `@mastra/mcp` package. 
Acts as both MCP client and server. Supports SSE, HTTP, and Hono-based MCP servers. MCP tool calls are traced with dedicated span types. + +**Can run in Electron:** Partially. Mastra has an official [Electron guide](https://mastra.ai/guides/getting-started/electron). However, it's designed as a server-side framework with HTTP endpoints. Using it in Electron's main process would mean importing a framework designed for servers into a desktop app. + +**Maturity:** High. v1.0 since January 2026. Y Combinator W25 batch ($13M funding). Used by Replit, PayPal, Sanity. + +**Strengths:** +- Huge provider coverage (94 providers through Vercel AI SDK) +- Built-in agents, workflows, memory, evals +- Clean TypeScript DX +- Strong MCP integration including MCP server authoring +- Backed by VC funding and large team +- Official Electron guide exists + +**Weaknesses:** +- It's a FRAMEWORK, not a library — brings entire agent/workflow/memory system +- Heavy dependency graph (`@mastra/core` pulls in many dependencies) +- Enterprise license for some features (RBAC, ACL) +- Designed primarily for server environments +- Overkill if you just need to call LLMs from Electron +- Uses Vercel AI SDK internally — so you'd be adding a framework layer on top of the library we actually need + +**Reliability: 8/10 | Confidence: 6/10** (for our "library" use case — it's a great framework but overkill) + +**Links:** +- [mastra.ai](https://mastra.ai/) +- [Docs](https://mastra.ai/docs) +- [Electron guide](https://mastra.ai/guides/getting-started/electron) +- [npm: @mastra/core](https://www.npmjs.com/package/@mastra/core) +- [GitHub](https://github.com/mastra-ai/mastra) +- [MCP integration docs](https://docs.mcp.run/integrating/tutorials/mcpx-mastra-ts/) + +--- + +### 4. LangChain.js + +| | | +|---|---| +| **Package** | `langchain`, `@langchain/core`, `@langchain/openai`, etc. 
| +| **GitHub** | [github.com/langchain-ai/langchainjs](https://github.com/langchain-ai/langchainjs) | +| **Stars** | ~17.3K | +| **npm downloads** | ~1M/week | +| **License** | MIT | +| **Latest version** | langchain@1.2.30 (March 2026) | +| **TypeScript** | TypeScript, ported from Python | + +**Provider coverage:** +100+ LLM providers, 50+ vector stores, hundreds of tools. + +**Tool calling:** Standardized `tool_calls` interface on AIMessage. `bind_tools()` and `create_tool_calling_agent()`. Dynamic tools and recovery from hallucinated tool calls (since v1.2.13). Custom Vitest matchers for tool call assertions. + +**Streaming:** Yes, via `streamEvents` and async iterators. Real-time streaming with `StreamEvents`. + +**MCP integration:** Community integrations exist but not first-party like Vercel AI SDK. + +**Can run in Electron:** Yes, technically (it's Node.js), but: +- Heavy: 101.2 kB gzipped bundle +- Designed for server environments +- Many abstractions add overhead + +**Maturity:** Very high. Largest ecosystem. LangSmith for observability. 8 maintainers. 
+ +**Strengths:** +- Largest ecosystem and community +- Most integrations (100+ providers, 50+ vector stores) +- LangSmith for production observability +- LangGraph for complex agent workflows +- Mature, well-documented + +**Weaknesses:** +- Most framework-like — imposes architecture +- Heaviest bundle (101.2 kB gzipped) +- More boilerplate than Vercel AI SDK +- TypeScript feels like a port from Python (Python-first design) +- Frequent breaking changes historically +- "Powerful but sometimes overly complex for straightforward use cases" +- Edge runtime blocked + +**Reliability: 8/10 | Confidence: 5/10** (for our use case — great framework, wrong fit for lightweight Electron integration) + +**Links:** +- [langchain.com](https://www.langchain.com/) +- [JS docs](https://docs.langchain.com/oss/javascript/langchain/overview) +- [npm: langchain](https://www.npmjs.com/package/langchain) +- [GitHub](https://github.com/langchain-ai/langchainjs) +- [Tool calling with LangChain](https://blog.langchain.com/tool-calling-with-langchain/) + +--- + +### 5. Portkey AI Gateway + +| | | +|---|---| +| **Package** | `@portkey-ai/gateway` (self-hosted), `portkey-ai` (SDK), `@portkey-ai/vercel-provider` | +| **GitHub** | [github.com/Portkey-AI/gateway](https://github.com/Portkey-AI/gateway) | +| **Stars** | ~11K | +| **npm downloads** | Low (niche) | +| **License** | MIT | +| **Latest version** | gateway@1.15.2 | +| **TypeScript** | Written in TypeScript | + +**Provider coverage:** +1,600+ models. 200+ LLM providers. 50+ AI guardrails. + +**Tool calling:** Supported via OpenAI-compatible API. Also integrates as [Vercel AI SDK provider](https://ai-sdk.dev/providers/community-providers/portkey). + +**Streaming:** Yes. + +**MCP integration:** Has MCP Gateway feature for centralized MCP server management. + +**Can run in Electron:** PARTIALLY. The gateway itself can run via `npx @portkey-ai/gateway` (starts a local server). The SDK (`portkey-ai`) is a client that needs a running gateway. 
This means you'd need to either: (a) run the gateway as a subprocess in Electron, or (b) use the hosted Portkey service. Neither is ideal vs just importing a library. + +**Maturity:** High. 10B+ tokens processed daily. SOC2, HIPAA, GDPR compliant. Used by Postman, Haptik, Turing. + +**Strengths:** +- Enterprise-grade: fallbacks, retries, load balancing, guardrails +- 1,600+ models +- <1ms gateway latency, 122kb footprint +- Excellent observability and logging +- MCP Gateway for centralized tool management +- Integrates with Vercel AI SDK as a provider + +**Weaknesses:** +- Gateway architecture — needs a running server/proxy, doesn't work as a pure import +- For Electron, adds unnecessary complexity (subprocess management) +- Best as a production gateway, not as an embedded library +- Hosted service has latency (25-40ms added) +- Primarily designed for server/cloud deployments + +**Reliability: 9/10 | Confidence: 4/10** (excellent product, wrong architecture for embedded Electron use) + +**Links:** +- [portkey.ai](https://portkey.ai/) +- [Gateway docs](https://portkey.ai/docs/product/ai-gateway) +- [npm: @portkey-ai/gateway](https://www.npmjs.com/package/@portkey-ai/gateway) +- [GitHub](https://github.com/Portkey-AI/gateway) +- [Vercel AI SDK provider](https://ai-sdk.dev/providers/community-providers/portkey) + +--- + +### 6. OpenRouter SDK + +| | | +|---|---| +| **Package** | `@openrouter/sdk` | +| **GitHub** | [github.com/OpenRouterTeam/typescript-sdk](https://github.com/OpenRouterTeam/typescript-sdk) | +| **Stars** | ~148 | +| **npm downloads** | ~345K/week | +| **License** | Apache 2.0 | +| **Latest version** | 0.9.11 (beta) | +| **TypeScript** | Auto-generated from OpenAPI spec | + +**Provider coverage:** +300+ models from 60+ providers through OpenRouter's unified endpoint. + +**Tool calling:** Yes, built-in. Clean architecture for agentic workflows. + +**Streaming:** Yes. + +**MCP integration:** Not built-in. 
OpenRouter is a routing service, not an MCP-aware system. + +**Can run in Electron:** YES, but requires internet connectivity to OpenRouter's API. All requests go through OpenRouter's servers (adds 25-40ms latency). Cannot use API keys directly with providers — must go through OpenRouter. + +**Maturity:** SDK is in BETA. May have breaking changes between versions. + +**Strengths:** +- Simple: one API key, one endpoint, 300+ models +- Auto-generated types always match the API +- High weekly downloads (345K) +- Pay-as-you-go pricing +- Also available as Vercel AI SDK provider (`@openrouter/ai-sdk-provider`, 611 stars) + +**Weaknesses:** +- BETA status — not production-stable +- Requires routing through OpenRouter's servers (vendor dependency) +- Added latency per request +- Cannot use your own API keys directly with providers +- ESM-only (no CommonJS support) +- Not a library — it's a client for a service + +**Reliability: 6/10 | Confidence: 5/10** (good service, but vendor dependency + beta status) + +**Links:** +- [openrouter.ai](https://openrouter.ai/) +- [TypeScript SDK docs](https://openrouter.ai/docs/sdks/typescript) +- [npm: @openrouter/sdk](https://www.npmjs.com/package/@openrouter/sdk) +- [GitHub](https://github.com/OpenRouterTeam/typescript-sdk) +- [AI SDK provider](https://www.npmjs.com/package/@openrouter/ai-sdk-provider) + +--- + +### 7. Google Genkit + +| | | +|---|---| +| **Package** | `genkit` | +| **GitHub** | [github.com/firebase/genkit](https://github.com/firebase/genkit) | +| **Stars** | ~5.7K | +| **npm downloads** | ~moderate (41 dependents) | +| **License** | Apache 2.0 | +| **Latest version** | 1.30.1 | +| **TypeScript** | TypeScript + Go + Python | + +**Provider coverage:** +Google (Gemini), OpenAI, Anthropic, Ollama, AWS Bedrock, Azure OpenAI, Mistral, Cloudflare Workers AI, Hugging Face, and more via plugins. + +**Tool calling:** Full support via `defineTool` API. Interrupts for human-in-the-loop. 
Multi-agent architectures with sub-agents as tools. + +**Streaming:** Yes. + +**MCP integration:** Yes, supports connecting to external MCP servers for tool discovery and execution. + +**Can run in Electron:** Technically yes (Node.js), but designed for Firebase/Cloud Run deployment. Brings CLI, local dev UI, and server deployment patterns. + +**Maturity:** Built by Google, used in production by Firebase. Active development. + +**Strengths:** +- Built by Google, used in production +- Clean tool calling API +- Multi-agent support +- MCP integration +- Dev UI for debugging + +**Weaknesses:** +- Firebase/Google ecosystem bias +- Server-oriented design (CLI, cloud deployment focus) +- Smaller ecosystem than Vercel AI SDK or LangChain +- Not designed for desktop/Electron apps + +**Reliability: 7/10 | Confidence: 4/10** (good framework, Google-centric, not ideal for Electron) + +**Links:** +- [genkit.dev](https://genkit.dev/) +- [Firebase docs](https://firebase.google.com/docs/genkit) +- [npm: genkit](https://www.npmjs.com/package/genkit) +- [GitHub](https://github.com/firebase/genkit) +- [Tool calling docs](https://genkit.dev/docs/js/tool-calling/) + +--- + +### 8. Bifrost (Maxim AI) + +| | | +|---|---| +| **Package** | `@maximhq/bifrost` (via npx) | +| **GitHub** | [github.com/maximhq/bifrost](https://github.com/maximhq/bifrost) | +| **Stars** | ~2K+ | +| **License** | Source-available (check repo) | +| **Language** | Go (not TypeScript) | + +**Provider coverage:** +15+ providers through OpenAI-compatible API. + +**Tool calling:** Yes, via "Code Mode" — innovative approach reducing token usage by 50%. + +**MCP integration:** Yes, acts as both MCP client and server. Centralized MCP tool management. + +**Can run in Electron:** NO — it's a Go binary that runs as a server. Would need to be spawned as a subprocess and communicated with via HTTP. 
+ +**Strengths:** +- Blazing fast: 11us overhead (50x faster than LiteLLM) +- Code Mode innovation for tool calling +- Strong MCP gateway features + +**Weaknesses:** +- Go binary, not a JS library +- Requires running a separate server process +- Wrong architecture for embedded Electron use + +**Reliability: 7/10 | Confidence: 2/10** (great gateway, completely wrong for our use case) + +**Links:** +- [docs.getbifrost.ai](https://docs.getbifrost.ai/overview) +- [GitHub](https://github.com/maximhq/bifrost) + +--- + +## Comparison Matrix + +| Library | Stars | npm/week | Tool Calling | Streaming | MCP | Electron | TypeScript | License | Library vs Framework | +|---|---|---|---|---|---|---|---|---|---| +| **Vercel AI SDK** | 23K | 4.5M | Excellent | Excellent | Full (v6) | YES | Native | Apache 2.0 | Library | +| **multi-llm-ts** | ~50 | 211 | Good | Good | No | YES (proven) | Native | MIT | Library | +| **Mastra** | 19.8K | 300K | Excellent | Excellent | Full | Partial | Native | Apache 2.0* | Framework | +| **LangChain.js** | 17.3K | 1M | Excellent | Good | Partial | Heavy | Ported | MIT | Framework | +| **Portkey** | 11K | Low | Good | Yes | MCP Gateway | Needs server | Native TS | MIT | Gateway | +| **OpenRouter SDK** | 148 | 345K | Good | Yes | No | Via service | Auto-gen | Apache 2.0 | Service client | +| **Google Genkit** | 5.7K | Moderate | Good | Yes | Yes | Server-focused | Native | Apache 2.0 | Framework | +| **Bifrost** | 2K+ | N/A | Innovative | Yes | Full | No (Go binary) | N/A | Source-avail | Gateway | + +--- + +## Architecture for Our Electron App + +### Recommended Approach: Vercel AI SDK in Electron Main Process + +``` +Renderer (React UI) + │ + │ IPC (ipcMain / ipcRenderer) + │ +Main Process (Node.js) + ├── AI SDK Core (generateText, streamText, generateObject) + │ ├── @ai-sdk/openai → OpenAI API + │ ├── @ai-sdk/anthropic → Anthropic API + │ ├── @ai-sdk/google → Google Gemini API + │ └── @ai-sdk/xai → xAI/Grok API + │ + ├── MCP Client (AI SDK 
built-in) + │ └── Connect to MCP servers for tool discovery + │ + └── API Key Storage (local, secure) +``` + +### Installation + +```bash +pnpm add ai zod @ai-sdk/openai @ai-sdk/anthropic @ai-sdk/google +``` + +### Example Usage (Electron Main Process) + +```typescript +import { generateText, streamText, tool } from 'ai'; +import { z } from 'zod'; +import { openai } from '@ai-sdk/openai'; +import { anthropic } from '@ai-sdk/anthropic'; +import { google } from '@ai-sdk/google'; + +// Switch provider by changing one line +const model = anthropic('claude-sonnet-4-20250514'); +// const model = openai('gpt-4o'); +// const model = google('gemini-2.0-flash'); + +// Non-streaming +const { text } = await generateText({ + model, + prompt: 'Explain quantum computing', +}); + +// Streaming +const result = streamText({ + model, + prompt: 'Write a story', +}); +for await (const chunk of result.textStream) { + // Send to renderer via IPC (`mainWindow` is the app's BrowserWindow, assumed to exist) + mainWindow.webContents.send('ai:chunk', chunk); +} + +// Tool calling (`fetchWeather` is a placeholder helper, not part of the SDK) +const { text: weatherText, toolCalls } = await generateText({ + model, + tools: { + getWeather: tool({ + description: 'Get weather for a location', + // AI SDK 5+ names this field `inputSchema`; v4 used `parameters` + inputSchema: z.object({ city: z.string() }), + execute: async ({ city }) => fetchWeather(city), + }), + }, + prompt: 'What is the weather in Tokyo?', +}); +``` + +--- + +## Decision + +**Primary choice: Vercel AI SDK (`ai` + provider packages)** +- Reliability: 9/10 +- Confidence: 9/10 +- Reason: Best TypeScript DX, most library-like, full MCP support, huge ecosystem, works in Electron main process, active development + +**Fallback / lightweight alternative: `multi-llm-ts`** +- Reliability: 6/10 +- Confidence: 7/10 +- Reason: Already proven in production Electron app (Witsy), minimal footprint, but small community and no MCP + +**NOT recommended for our use case:** +- LangChain.js — too heavy, framework-oriented, Python-first design +- Mastra — excellent framework but overkill (and uses Vercel AI SDK internally anyway) +- Portkey/Bifrost — gateway architecture, needs running server
+- OpenRouter SDK — vendor dependency, beta status +- Google Genkit — server/Firebase oriented + +--- + +## Sources + +- [Vercel AI SDK — Official docs](https://ai-sdk.dev/docs/introduction) +- [Vercel AI SDK — GitHub](https://github.com/vercel/ai) +- [AI SDK 6 announcement](https://vercel.com/blog/ai-sdk-6) +- [AI SDK MCP tools](https://ai-sdk.dev/docs/ai-sdk-core/mcp-tools) +- [AI SDK Tool Calling](https://ai-sdk.dev/docs/ai-sdk-core/tools-and-tool-calling) +- [LangChain.js — GitHub](https://github.com/langchain-ai/langchainjs) +- [LangChain.js — npm](https://www.npmjs.com/package/langchain) +- [LangChain vs Vercel AI SDK vs OpenAI SDK: 2026 Guide](https://strapi.io/blog/langchain-vs-vercel-ai-sdk-vs-openai-sdk-comparison-guide) +- [Mastra — Official site](https://mastra.ai/) +- [Mastra — GitHub](https://github.com/mastra-ai/mastra) +- [Mastra Electron guide](https://mastra.ai/guides/getting-started/electron) +- [Mastra Licensing](https://mastra.ai/docs/community/licensing) +- [Portkey AI Gateway — GitHub](https://github.com/Portkey-AI/gateway) +- [Portkey AI docs](https://portkey.ai/docs/product/ai-gateway) +- [Portkey Vercel provider](https://ai-sdk.dev/providers/community-providers/portkey) +- [OpenRouter — TypeScript SDK docs](https://openrouter.ai/docs/sdks/typescript) +- [OpenRouter — npm](https://www.npmjs.com/package/@openrouter/sdk) +- [Google Genkit — GitHub](https://github.com/firebase/genkit) +- [Genkit Tool Calling](https://genkit.dev/docs/js/tool-calling/) +- [Bifrost — GitHub](https://github.com/maximhq/bifrost) +- [Bifrost docs](https://docs.getbifrost.ai/overview) +- [multi-llm-ts — GitHub](https://github.com/nbonamy/multi-llm-ts) +- [multi-llm-ts — npm](https://www.npmjs.com/package/multi-llm-ts) +- [Witsy (Electron app using multi-llm-ts)](https://github.com/nbonamy/witsy) +- [3 Best Open Source LiteLLM Alternatives in 2026](https://openalternative.co/alternatives/litellm) +- [Best LiteLLM Alternatives in 
2026](https://www.getmaxim.ai/articles/best-litellm-alternatives-in-2026/) +- [AI Framework Comparison: Vercel AI SDK, Mastra, Langchain and Genkit](https://komelin.com/blog/ai-framework-comparison) +- [Top 5 TypeScript AI Agent Frameworks 2026](https://blog.agentailor.com/posts/top-typescript-ai-agent-frameworks-2026) +- [Sentry Electron + Vercel AI integration](https://docs.sentry.io/platforms/javascript/guides/electron/configuration/integrations/vercelai/) +- [Electron AI Chatbot](https://github.com/pashvc/electron-ai-chatbot) diff --git a/docs/research/unified-mcp-architecture.md b/docs/research/unified-mcp-architecture.md new file mode 100644 index 00000000..b2c92a00 --- /dev/null +++ b/docs/research/unified-mcp-architecture.md @@ -0,0 +1,485 @@ +# Unified MCP Architecture: Should Claude Also Use MCP for Kanban? + +**Date**: 2026-03-24 +**Branch**: `dev` +**Based on**: deep analysis of `mcp-server/`, `agent-teams-controller/`, `TeamProvisioningService.ts`, `TeamMcpConfigBuilder.ts`, and data flow through file watchers + +--- + +## The Question + +If Codex/Gemini use MCP for kanban management, should Claude also use MCP instead of its native built-in tools? This would unify the architecture into a single code path. 
+ +--- + +## Current State: What Exists Today + +### MCP Server (30+ tools, fully provider-agnostic) + +Our `mcp-server/` package exposes these tools via FastMCP over stdio: + +| Category | Tools | Count | +|----------|-------|-------| +| Tasks | `task_create`, `task_create_from_message`, `task_get`, `task_get_comment`, `task_list`, `task_set_status`, `task_start`, `task_complete`, `task_set_owner`, `task_add_comment`, `task_attach_file`, `task_attach_comment_file`, `task_set_clarification`, `task_link`, `task_unlink`, `member_briefing`, `task_briefing` | 17 | +| Kanban | `kanban_get`, `kanban_set_column`, `kanban_clear`, `kanban_list_reviewers`, `kanban_add_reviewer`, `kanban_remove_reviewer` | 6 | +| Review | `review_request`, `review_start`, `review_approve`, `review_request_changes` | 4 | +| Messages | `message_send` | 1 | +| Processes | `process_register`, `process_list`, `process_unregister`, `process_stop` | 4 | +| Cross-team | `cross_team_send`, `cross_team_list_targets`, `cross_team_get_outbox` | 3 | +| Runtime | `team_launch`, `team_stop` | 2 | +| **Total** | | **37** | + +### Claude's Native Built-in Tools (Claude Code Agent Teams) + +These exist ONLY inside Claude Code CLI and cannot be replaced: + +| Tool | Purpose | Can MCP replace? 
| +|------|---------|------------------| +| `TeamCreate` | Creates team config on disk, initializes team state | Partially (MCP can write config, but Claude Code uses this to enter "team mode") | +| `TeamDelete` | Deletes team, cleans up processes | Partially | +| `TaskCreate` (Agent tool with `team_name`) | Spawns a teammate subprocess | **NO** -- this is process spawning, not task creation | +| `SendMessage` | Claude's native inbox message delivery | Partially (MCP `message_send` writes to same files) | +| `TaskGet` | Claude's native task query | Yes, `task_get` MCP does the same | +| `TaskList` | Claude's native task listing | Yes, `task_list` MCP does the same | +| `TaskUpdate` | Claude's native task update | Yes, `task_set_status`/`task_set_owner` MCP do the same | + +**Critical insight**: Claude Code's `TaskCreate` with `team_name` parameter is NOT a task-creation tool -- it's a **teammate process spawner**. It tells Claude Code CLI to fork a new subprocess for a teammate. No MCP tool can replace this because it's an internal CLI operation. + +### Data Flow: Where Files Live + +Both Claude's native tools AND our MCP server write to the **same directories**: + +``` +~/.claude/ + teams// + config.json -- team configuration + kanban-state.json -- kanban board state + processes.json -- registered processes + members.meta.json -- member metadata + inboxes/ + .json -- per-member inbox messages + user.json -- messages to the user + task-attachments/ + / -- file attachments + tasks// + .json -- individual task files +``` + +**This is already a shared data layer.** Our MCP server uses `agent-teams-controller` which reads/writes these exact files. Claude Code CLI also reads/writes these files via its built-in Agent Teams feature. The file watchers in `src/main/` detect changes from ANY source. 
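The "Can MCP replace?" table earlier in this section can be encoded as data, which makes it easy to ask which native tools still need the Claude Code path. The tool names and verdicts below are taken from that table; the `Replaceability` labels, `NATIVE_TO_MCP`, and both helper functions are hypothetical names introduced for this sketch, not part of `mcp-server/` or `agent-teams-controller`.

```typescript
// Encodes the "Can MCP replace?" table as data. Tool names come from the table;
// the type, constant, and helper names are illustrative assumptions.
type Replaceability = "yes" | "partial" | "no";

interface NativeToolMapping {
  replaceability: Replaceability;
  mcpTools: readonly string[]; // MCP tools covering the same operation, if any
}

const NATIVE_TO_MCP: Record<string, NativeToolMapping> = {
  TeamCreate: { replaceability: "partial", mcpTools: [] },
  TeamDelete: { replaceability: "partial", mcpTools: [] },
  // TaskCreate with `team_name` spawns a teammate process, so nothing maps to it.
  TaskCreate: { replaceability: "no", mcpTools: [] },
  SendMessage: { replaceability: "partial", mcpTools: ["message_send"] },
  TaskGet: { replaceability: "yes", mcpTools: ["task_get"] },
  TaskList: { replaceability: "yes", mcpTools: ["task_list"] },
  TaskUpdate: {
    replaceability: "yes",
    mcpTools: ["task_set_status", "task_set_owner"],
  },
};

// Native tools that MCP cannot fully replace (verdict "partial" or "no").
function notFullyReplaceable(): string[] {
  return Object.entries(NATIVE_TO_MCP)
    .filter(([, mapping]) => mapping.replaceability !== "yes")
    .map(([name]) => name)
    .sort();
}

// MCP tools that cover a given native tool; empty for unknown or unmapped tools.
function mcpToolsFor(nativeTool: string): readonly string[] {
  return NATIVE_TO_MCP[nativeTool]?.mcpTools ?? [];
}
```

Keeping the mapping as plain data rather than branching logic means a prompt builder or a test can read the same source of truth when deciding which tool names to steer an agent toward.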
+ +### How Claude ALREADY Uses MCP + +Claude Code agents (both lead and teammates) **already** receive our MCP server via `--mcp-config`: + +``` +TeamMcpConfigBuilder.writeConfigFile() + → generates temp JSON config pointing to mcp-server/dist/index.js + → passed to Claude CLI via --mcp-config + → Claude Code loads our MCP tools alongside its built-in tools +``` + +The prompt in `buildTeamCtlOpsInstructions()` teaches Claude to use MCP tools: +``` +Internal task board tooling (MCP): +- Use the board-management MCP tools for tasks that must appear on the team board +``` + +And `buildMemberSpawnPrompt()` instructs teammates: +``` +First call member_briefing to learn your current assigned tasks... +Use task_start/task_complete/task_add_comment to track progress... +``` + +**Claude Code agents already use our MCP tools for task/kanban management.** They use native tools only for: team creation, teammate spawning, and direct messaging (though `message_send` MCP also works). + +--- + +## Three Architectures Compared + +### Architecture A: Dual-Path (Current Proposal for Multi-Provider) + +``` + +-----------------+ + | Kanban UI | + | (Electron) | + +--------+--------+ + | + +--------+--------+ + | File Watchers | + | (chokidar) | + +--------+--------+ + | + +--------------+--------------+ + | | + +---------+----------+ +------------+-----------+ + | ~/.claude/teams/ | | ~/.claude/tasks/ | + | config, kanban, | | .json files | + | inboxes, processes | | | + +----+----------+----+ +-----+------------+-----+ + | | | | + | | | | + +----+----+ +---+--------+ +---+----+ +-----+------+ + | Claude | | MCP Server | | Claude | | MCP Server | + | Native | | (agent- | | Native | | (agent- | + | Tools | | teams-mcp) | | Tools | | teams-mcp) | + +---------+ +-----+------+ +--------+ +-----+------+ + | | | | + +----+----+ +----+-----+ +----+----+ +----+-----+ + | Claude | | Codex/ | | Claude | | Codex/ | + | Code | | Gemini/ | | Code | | Gemini/ | + | CLI | | Any MCP | | CLI | | Any 
MCP | + +---------+ | Agent | +---------+ | Agent | + +----------+ +----------+ +``` + +**Data flow:** +- Claude -> native built-in tools -> writes directly to `~/.claude/teams/` and `~/.claude/tasks/` +- Claude -> MCP tools -> `agent-teams-controller` -> writes to same files +- Codex/Gemini -> MCP tools -> `agent-teams-controller` -> writes to same files +- File watchers detect ALL changes -> UI updates + +| Criterion | Score | +|-----------|-------| +| Reliability | **9/10** | +| Confidence | **9/10** | +| Effort | 3-4 weeks | +| Risk | Very Low | +| Code reuse | 100% | + +**Pros:** +- Zero risk to existing Claude Code functionality +- Claude uses its battle-tested native tools (TeamCreate, Agent/Task tool for spawning) +- MCP tools handle task/kanban CRUD (Claude already uses these) +- External agents use MCP exclusively +- Both paths write to same files, file watchers don't care who writes +- 30+ MCP tools already exist and are tested + +**Cons:** +- Two "entry points" for writes (native tools + MCP tools), though they share the same data layer +- Claude has redundant tools (native TaskGet + MCP task_get), but the prompt steers which to use +- If agent-teams-controller changes, both native and MCP paths need verification + +--- + +### Architecture B: Unified MCP (ALL agents use MCP only) + +``` + +-----------------+ + | Kanban UI | + | (Electron) | + +--------+--------+ + | + +--------+--------+ + | File Watchers | + | (chokidar) | + +--------+--------+ + | + +--------------+--------------+ + | | + +---------+----------+ +------------+-----------+ + | ~/.claude/teams/ | | ~/.claude/tasks/ | + +--------+-----------+ +--------+---------------+ + | | + +----------+---------------+ + | + +--------+--------+ + | MCP Server | + | (agent-teams- | + | controller) | + +--------+--------+ + | + +-------------+-------------+ + | | | + +----+----+ +----+-----+ +-----+----+ + | Claude | | Codex/ | | Gemini/ | + | Code | | Gemini | | Other | + | CLI | | CLI | | Agents | + 
+---------+ +----------+ +----------+ +``` + +**Data flow:** +- ALL agents -> MCP tools only -> `agent-teams-controller` -> writes to `~/.claude/tasks/` and `~/.claude/teams/` +- Claude Code's native tools (TeamCreate, TaskCreate, SendMessage) are DISABLED or unused +- File watchers detect changes -> UI updates + +| Criterion | Score | +|-----------|-------| +| Reliability | **4/10** | +| Confidence | **3/10** | +| Effort | 8-12 weeks | +| Risk | Very High | +| Code reuse | ~40% | + +**Pros:** +- Single code path for all agents +- Single set of tools to maintain +- Architecturally "clean" + +**Cons -- and this is where the analysis gets critical:** + +1. **Cannot disable Claude's `TaskCreate` (Agent tool with team_name)** + - This is how Claude Code spawns teammate subprocesses + - There is no MCP equivalent -- MCP tools return JSON responses, they cannot fork processes + - `--disallowedTools TaskCreate` would break teammate spawning entirely + - Our `team_launch` MCP tool talks to the desktop runtime HTTP API -- it's a different mechanism (launches the whole team, not individual teammates) + +2. **Cannot fully replace `TeamCreate`** + - `TeamCreate` puts Claude Code CLI into "team mode" -- it enables Agent Teams features, stdin relay, inbox monitoring + - Writing `config.json` via MCP creates the files but doesn't activate the CLI-side features + - The CLI needs to be told about the team through its own internal protocol + +3. **Cannot fully replace `SendMessage`** + - Our MCP `message_send` writes to inbox files, which works for teammates (they read inbox files directly) + - But the lead reads messages via stdin relay (`relayLeadInboxMessages()`). MCP `message_send` to lead would require the relay to detect the file write and relay it -- this works but is a longer path with more latency + - Risk of message delivery race conditions during high-frequency messaging + +4. 
**Prompt rewrite is massive and risky** + - `buildProvisioningPrompt()` (95 LOC) teaches Claude to use `TeamCreate` + `Agent` tool -- would need complete rewrite + - `buildPersistentLeadContext()` (100+ LOC) references built-in tools throughout + - `buildMemberSpawnPrompt()` references `member_briefing` MCP tool (this part is already MCP-based) + - Total: ~300 LOC of prompt engineering that took months to tune for delegation-first behavior, task board discipline, review workflow + - Any prompt change risks breaking the finely-tuned agent behavior + +5. **Token overhead from MCP tool descriptions** + - 37 MCP tools * ~50-100 tokens each = 1,850-3,700 additional tokens per turn + - Claude's native tools add no extra description overhead on top of the CLI baseline (they're built into the CLI) + - For long sessions this accumulates significantly + +6. **MCP tool call overhead** + - Each MCP tool call has stdio round-trip overhead vs native tool calls which are in-process + - For high-frequency operations (an agent creating many tasks) this adds latency + +7. 
**Loss of Claude Code optimizations** + - Claude Code's built-in tools are optimized for its internal state machine + - `TeamCreate` triggers internal event routing, session persistence, teammate monitoring + - Replacing with MCP tools means these side effects would need to be triggered differently + +--- + +### Architecture C: Hybrid Unified (RECOMMENDED) + +``` + +-----------------+ + | Kanban UI | + | (Electron) | + +--------+--------+ + | + +--------+--------+ + | File Watchers | + | (chokidar) | + +--------+--------+ + | + +--------------+--------------+ + | | + +---------+----------+ +------------+-----------+ + | ~/.claude/teams/ | | ~/.claude/tasks/ | + | config, kanban, | | .json files | + | inboxes, processes | | | + +--+---------+---+---+ +---+---------+---+------+ + | | | | | | + | +----+---+----+ | +----+---+-----+ + | | agent-teams- | | | agent-teams- | + | | controller +-------+ | controller | + | | (shared | | (shared | + | | data layer) | | data layer) | + | +----+---------+ +-----+---------+ + | | | + | +----+---------+ +-----+---------+ + | | MCP Server | | MCP Server | + | | (agent- | | (agent- | + | | teams-mcp) | | teams-mcp) | + | +----+---------+ +-----+---------+ + | | | + +----+----+ | +----------+ +-----+----+ + | Claude | +----+ Codex/ | | Gemini/ | + | Native | | Any MCP | | Other | + | Tools: | | Agent | | Agents | + | Team | +----------+ +----------+ + | Create, | + | Agent | Claude ALSO uses MCP for: + | Spawn, | task_create, task_get, task_list, + | Send | task_set_status, kanban_get, + | Message | kanban_set_column, review_request, + +---------+ review_approve, message_send, etc. 
+``` + +**Data flow:** +- Claude -> native tools for LIFECYCLE operations (TeamCreate, Agent/Task spawning, SendMessage to lead) +- Claude -> MCP tools for CRUD operations (task management, kanban, review, comments) -- **already happens today** +- Codex/Gemini -> MCP tools for ALL operations +- ALL writes go to the same `~/.claude/` directories via `agent-teams-controller` +- File watchers detect ALL changes regardless of source + +| Criterion | Score | +|-----------|-------| +| Reliability | **9/10** | +| Confidence | **9/10** | +| Effort | 3-4 weeks (same as Architecture A) | +| Risk | Very Low | +| Code reuse | 100% | + +**Pros:** +- Claude keeps its native tools for things MCP cannot do (process spawning, entering team mode) +- Claude uses MCP for task/kanban CRUD -- THIS IS ALREADY THE CASE TODAY +- External agents use MCP exclusively -- works today with our 37 tools +- Single data layer (`agent-teams-controller`) for all writes +- File watchers are source-agnostic +- Zero prompt rewriting for Claude +- Zero risk to existing functionality + +**Cons:** +- Claude has both native + MCP tools available (mild complexity) +- Need to ensure no conflicts when Claude's native tools and MCP tools modify the same task + +--- + +## Critical Finding: Architecture C IS Architecture A + +After thorough analysis, Architectures A and C are **functionally identical** because: + +1. **Claude already uses our MCP tools for kanban/task management** -- the prompt explicitly instructs this via `buildTeamCtlOpsInstructions()` +2. **Claude only uses native tools for what MCP cannot do** -- TeamCreate (entering team mode), Agent tool (spawning subprocesses), SendMessage (lead stdin relay) +3. **Both paths already write to the same files** -- `agent-teams-controller` is the shared data layer + +The "dual-path" concern is a misconception. 
There aren't two competing paths -- there's one path for **lifecycle operations** (Claude Code native) and one path for **data operations** (MCP), and they already coexist. + +--- + +## Does Our MCP Server Write to the Same Files as Claude's Native Tools? + +**YES, unequivocally.** + +Evidence from source code: + +1. `agent-teams-controller/src/internal/runtimeHelpers.js` line 117-124: +```javascript +function getPaths(flags, teamName) { + const claudeDir = getClaudeDir(flags); // defaults to ~/.claude + const teamDir = path.join(claudeDir, 'teams', safeTeam); + const tasksDir = path.join(claudeDir, 'tasks', safeTeam); + const kanbanPath = path.join(teamDir, 'kanban-state.json'); + const processesPath = path.join(teamDir, 'processes.json'); + return { claudeDir, teamDir, tasksDir, kanbanPath, processesPath }; +} +``` + +2. `mcp-server/src/controller.ts` uses `createController({ teamName })` which calls `getPaths()` above +3. Claude Code's native Agent Teams also writes to `~/.claude/teams//` and `~/.claude/tasks//` +4. Both use the same JSON file format with atomic write (temp file + rename) + +**Conflict risk**: Very low. File writes use atomic rename (`writeJson` creates a temp file then `fs.renameSync`). The `fileLock.js` module provides advisory locking for concurrent writes. Task files are per-task (one JSON per task), so different agents working on different tasks don't collide. + +--- + +## Architecture Decision + +### Architecture B (Unified MCP-only) is NOT viable + +The fundamental blocker: **Claude Code's Agent Teams is a CLI feature, not a data feature.** The built-in tools (TeamCreate, Agent tool for spawning) trigger internal CLI state changes that cannot be replicated via MCP. 
Disabling them would break: + +- Team mode activation +- Teammate process spawning +- Lead inbox relay +- Tool approval flow +- Post-compact context recovery +- Auth retry logic + +These are 7,982 LOC of battle-tested code in `TeamProvisioningService.ts` that would need to be rebuilt from scratch with worse ergonomics. + +### Architecture A/C (Hybrid) is already the architecture we have + +The "should Claude use MCP?" question has already been answered: **Claude already uses MCP for kanban/task operations.** The prompt instructs it. The `--mcp-config` flag delivers our MCP server to every Claude Code agent (lead and teammates). + +The only remaining question is: what do we need to add to support Codex/Gemini? + +--- + +## What's Actually Needed for Multi-Provider Support + +### Already complete (0 additional work) + +- MCP server with 37 tools +- `agent-teams-controller` as provider-agnostic data layer +- File watchers that detect changes from any source +- Atomic file writes to prevent corruption +- HTTP control API for launch/stop + +### Needed: New MCP tools for external agent lifecycle + +``` +team_join -- register as external team member (provider, model metadata) +team_leave -- unregister from team +team_list_teams -- discover available teams +team_get_config -- get team configuration and member list +member_heartbeat -- keepalive signal for external agents +task_poll_assigned -- poll for tasks assigned to this agent +task_claim -- claim an unassigned pending task +``` + +**Files to add:** +- `mcp-server/src/tools/memberTools.ts` (new) +- `mcp-server/src/tools/teamDiscoveryTools.ts` (new) +- `agent-teams-controller/src/internal/memberLifecycle.js` (new) + +**Files to modify:** +- `mcp-server/src/tools/index.ts` -- register new tool modules +- `agent-teams-controller/src/internal/runtimeHelpers.js` -- member metadata helpers +- `src/shared/types/team.ts` -- add `provider?: string`, `model?: string` fields + +**Files unchanged (0 modifications):** +- 
`TeamProvisioningService.ts` -- untouched +- `TeamDataService.ts` -- reads data generically, will pick up new fields +- `TeamMcpConfigBuilder.ts` -- untouched (Claude-specific) +- All prompt engineering -- untouched + +### Needed: UI enhancements + +- Provider badge/icon on member cards +- "External agent" indicator on kanban task cards +- Different color/treatment for externally-managed agents + +--- + +## Comparison Matrix + +| Criterion | A: Dual-Path | B: Unified MCP | C: Hybrid (=A) | +|-----------|:---:|:---:|:---:| +| Reliability | 9/10 | 4/10 | **9/10** | +| Confidence | 9/10 | 3/10 | **9/10** | +| Effort (weeks) | 3-4 | 8-12 | **3-4** | +| Risk level | Very Low | Very High | **Very Low** | +| Existing code reuse | 100% | ~40% | **100%** | +| Breaks Claude flow? | No | Yes | **No** | +| Breaks prompts? | No | Yes (300+ LOC rewrite) | **No** | +| Single data layer? | Yes | Yes | **Yes** | +| Claude keeps optimizations? | Yes | No | **Yes** | +| Supports Codex/Gemini? | Yes, via MCP | Yes, via MCP | **Yes, via MCP** | +| Token overhead | None extra | +1.8-3.7K tokens/turn | **None extra** | +| MCP standard compliance | Yes | Yes | **Yes** | +| Incremental delivery? 
| Yes | No | **Yes** | +| Time to first external agent | 2-3 weeks | 8+ weeks | **2-3 weeks** | + +--- + +## Risks and Mitigations + +| Risk | Probability | Impact | Mitigation | +|------|:-----------:|:------:|------------| +| MCP tool conflicts with native tools | Low | Medium | Tools operate on different task IDs; atomic writes; file-level locking in `fileLock.js` | +| External agent writes corrupt state | Low | High | `agent-teams-controller` validates all inputs; atomic write-rename pattern; per-task file isolation | +| External agent doesn't follow workflow | Medium | Low | `member_briefing` provides onboarding; tool descriptions guide behavior; `task_set_clarification` for issues | +| Performance under many agents | Low | Medium | File I/O is the bottleneck (same as now); no additional overhead | +| Claude Code updates break file format | Low | High | `agent-teams-controller` is our adapter layer -- update it when format changes | +| MCP protocol evolution | Very Low | Low | FastMCP library handles protocol; MCP spec revisions are date-versioned and the version is negotiated at initialization | + +--- + +## Conclusion + +**Architecture C (Hybrid) is the answer, and it's essentially what we already have.** + +The realization that resolves the question: Claude Code already uses our MCP tools for task/kanban management. The "should Claude use MCP too?" question is already answered with "yes, and it does." Claude keeps its native tools for the things that ONLY Claude Code can do (process spawning, team mode activation), and uses MCP for everything that's shared (tasks, kanban, review, messages, comments). + +For Codex/Gemini, we add 7 new MCP tools for agent lifecycle management. That's it. No architectural changes, no prompt rewrites, no refactoring. The data layer is already shared, the file watchers are already source-agnostic, and the MCP server already exposes the full task/kanban API. + +The **single most important insight** from this analysis: the architecture is NOT "dual-path." 
It's a single shared data layer (`agent-teams-controller`) with two access methods -- native tools for Claude Code internal operations, MCP tools for everything else. Both access methods are complementary, not competing.
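To make the proposed lifecycle tools more concrete, here is a minimal sketch of what the logic behind a `task_claim` tool might look like on top of the per-task JSON files. Everything in it is an assumption: the `claimTask` helper, the `owner` and `status` field names, the `in_progress` value, and the `.lock` file convention are hypothetical, and the real `fileLock.js` module may work differently. It only shows that an exclusive-create lock plus an atomic rename is enough to stop two external agents from claiming the same task.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical claim helper; field names and lock convention are assumptions.
function claimTask(tasksDir, taskId, agentName) {
  const taskFile = path.join(tasksDir, `${taskId}.json`);
  const lockFile = `${taskFile}.lock`;
  let lockFd;
  try {
    // 'wx' creates the file exclusively and throws EEXIST if it already exists.
    lockFd = fs.openSync(lockFile, 'wx');
  } catch (err) {
    if (err.code === 'EEXIST') return { claimed: false, reason: 'locked' };
    throw err;
  }
  try {
    const task = JSON.parse(fs.readFileSync(taskFile, 'utf8'));
    if (task.owner) return { claimed: false, reason: `owned by ${task.owner}` };
    const updated = { ...task, owner: agentName, status: 'in_progress' };
    // Atomic update: write a temp file, then rename it over the task file.
    const tmpFile = `${taskFile}.${process.pid}.tmp`;
    fs.writeFileSync(tmpFile, JSON.stringify(updated, null, 2));
    fs.renameSync(tmpFile, taskFile);
    return { claimed: true, task: updated };
  } finally {
    // Always release the advisory lock, including on early return.
    fs.closeSync(lockFd);
    fs.unlinkSync(lockFile);
  }
}

// Demo in a throwaway directory: the second agent loses the claim.
const claimDemoDir = fs.mkdtempSync(path.join(os.tmpdir(), 'claim-demo-'));
fs.writeFileSync(path.join(claimDemoDir, '7.json'), JSON.stringify({ id: '7', status: 'pending' }));
const first = claimTask(claimDemoDir, '7', 'codex-1');
const second = claimTask(claimDemoDir, '7', 'gemini-1');
console.log(first.claimed, second.claimed, second.reason);
```

A production version would also need to handle stale lock files left behind by a crashed agent, which this sketch deliberately ignores.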