agent-ecosystem/docs/research/sdk-vs-cli-comparison.md
iliya fe90ac866d Add deep dive research documentation on AI agent orchestration and communication standards
- Introduced multiple markdown files covering agent spawn packages, inter-agent communication protocols, and multi-agent orchestration tools.
- Detailed analysis of official SDKs for CLI agents (Claude Code, Codex, Gemini) and their integration potential.
- Documented various competitor approaches to agent spawning and communication, highlighting strengths and weaknesses.
- Provided insights into best practices for implementing multi-provider support within Electron applications.

This comprehensive documentation aims to enhance understanding of the current AI agent ecosystem and serve as a resource for developers and stakeholders.
2026-03-27 17:51:49 +02:00

376 lines
21 KiB
Markdown

# SDK vs CLI Direct Spawn: Honest Comparison
**Date:** 2026-03-25
**Status:** Research complete
**Verdict:** SDKs are NOT limiting — they ARE the CLI with a nicer API, plus extras. But there are real tradeoffs.
---
## TL;DR
All three SDKs (Claude Agent SDK, Codex SDK, Gemini CLI SDK) **spawn the CLI as a child process** under the hood and communicate via stdin/stdout JSON protocol. The SDK IS the CLI — it just wraps `child_process.spawn()` with a typed API. There is **no functional limitation** vs direct spawn, because the SDK literally does the same thing. However, there is a **real performance overhead** (~12s per `query()` call for Claude) and some **CLI-only features** (Agent Teams for Claude) that require workarounds.
---
## 1. Claude Agent SDK (`@anthropic-ai/claude-agent-sdk`)
### Architecture: How It Works Under the Hood
The SDK **bundles `cli.js`** directly inside the npm package. When you call `query()`, it spawns a Node.js process running this bundled CLI with `--input-format stream-json --output-format stream-json --verbose`. Communication is via NDJSON over stdin/stdout.
> "The SDK code actually bundles a cli.js file directly — which contains the entire Claude Code CLI."
> — [Claude Agent SDK Pitfalls](https://liruifengv.com/posts/claude-agent-sdk-pitfalls-en/)
**Key insight:** The `spawnClaudeCodeProcess` option lets you provide a completely custom spawn function. Node's `ChildProcess` already satisfies the `SpawnedProcess` interface. This means you can override HOW the CLI is spawned — Docker, VM, remote, whatever.
### Complete Options Reference (from [official docs](https://platform.claude.com/docs/en/agent-sdk/typescript))
| Option | Type | Description |
|--------|------|-------------|
| `model` | `string` | Claude model to use |
| `cwd` | `string` | Working directory |
| `env` | `Record<string, string>` | Environment variables |
| `systemPrompt` | `string \| preset` | Custom or `claude_code` preset |
| `allowedTools` | `string[]` | Auto-approve tools |
| `disallowedTools` | `string[]` | Deny tools (checked first, overrides everything) |
| `mcpServers` | `Record<string, McpServerConfig>` | MCP server configs (stdio, SSE, HTTP, in-process SDK) |
| `strictMcpConfig` | `boolean` | Only use MCP servers from this config |
| `settingSources` | `SettingSource[]` | `["user", "project", "local"]` to match CLI behavior |
| `permissionMode` | `PermissionMode` | `default`, `acceptEdits`, `bypassPermissions`, `plan`, `dontAsk` |
| `canUseTool` | `Function` | Custom permission callback |
| `agents` | `Record<string, AgentDefinition>` | Programmatic subagents |
| `hooks` | `Partial<Record<HookEvent, ...>>` | Programmatic hook callbacks |
| `plugins` | `SdkPluginConfig[]` | Load custom plugins |
| `maxTurns` | `number` | Max agentic turns |
| `maxBudgetUsd` | `number` | Budget cap |
| `effort` | `'low'\|'medium'\|'high'\|'max'` | Thinking depth |
| `thinking` | `ThinkingConfig` | Adaptive thinking config |
| `betas` | `SdkBeta[]` | Beta features (e.g., `context-1m-2025-08-07`) |
| `includePartialMessages` | `boolean` | Stream partial responses |
| `outputFormat` | `{ type: 'json_schema', schema }` | Structured output |
| `spawnClaudeCodeProcess` | `Function` | Custom spawn function |
| `pathToClaudeCodeExecutable` | `string` | Custom CLI path |
| `executable` | `'bun'\|'deno'\|'node'` | JS runtime |
| `executableArgs` | `string[]` | Runtime args |
| **`extraArgs`** | **`Record<string, string\|null>`** | **ANY arbitrary CLI flags** |
| `debug` | `boolean` | Debug mode |
| `debugFile` | `string` | Debug log file |
| `sandbox` | `SandboxSettings` | Sandbox config |
| `persistSession` | `boolean` | Disable session persistence |
| `resume` | `string` | Resume session by ID |
| `forkSession` | `boolean` | Fork on resume |
| `enableFileCheckpointing` | `boolean` | File change tracking |
| `fallbackModel` | `string` | Fallback model |
| `promptSuggestions` | `boolean` | Emit prompt suggestions |
| `stderr` | `Function` | Stderr callback |
### Feature Comparison: SDK vs CLI Direct
| Feature | CLI Direct | SDK | Notes |
|---------|-----------|-----|-------|
| MCP config | `--mcp-config '{...}'` | `mcpServers: {...}` | SDK has typed config + in-process MCP servers (SDK advantage) |
| Strict MCP | `--strict-mcp-config` | `strictMcpConfig: true` | Equivalent |
| Disallowed tools | `--disallowedTools X,Y` | `disallowedTools: ['X','Y']` | Equivalent. Known bug: both ignore MCP tools in `-p` mode ([#12863](https://github.com/anthropics/claude-code/issues/12863)) |
| Allowed tools | `--allowedTools X,Y` | `allowedTools: ['X','Y']` | Equivalent |
| stream-json I/O | `--input-format stream-json --output-format stream-json` | Automatic (SDK default) | SDK uses this internally, no config needed |
| Permission mode | `--permission-mode X` | `permissionMode: 'X'` | Equivalent |
| Custom flags | Any `--flag value` | `extraArgs: { flag: 'value' }` | **SDK supports arbitrary flags via `extraArgs`** |
| CLAUDE.md | Auto-loaded | `settingSources: ['project']` | Opt-in in SDK, auto in CLI |
| Custom spawn | Manual `child_process.spawn()` | `spawnClaudeCodeProcess: (opts) => spawn(...)` | SDK provides typed interface |
| In-process MCP | Not possible | `createSdkMcpServer()` | **SDK-only advantage** — no subprocess overhead |
| Custom tools | Via MCP only | In-process functions | **SDK-only advantage** |
| Programmatic hooks | Via config files | Callback functions | **SDK-only advantage** |
| Programmatic subagents | Via config files | `agents: {...}` inline | **SDK-only advantage** |
| Agent Teams | Full support | **CLI-only feature** | Not configurable via SDK options. Must use CLI |
| Auto memory | Full support | **Never loaded by SDK** | CLI-only feature |
| Skills | Full support | Via `settingSources` + `allowedTools: ['Skill']` | Equivalent when configured |
| Session resume | `claude --resume ID` | `resume: 'sessionId'` | Equivalent |
| Streaming | Via flags | `includePartialMessages: true` | SDK provides typed events |
| Structured output | Not available | `outputFormat: { type: 'json_schema', ... }` | **SDK-only advantage** |
| File checkpointing | Not available | `enableFileCheckpointing: true` | **SDK-only advantage** |
| V2 Session API | Not available | `unstable_v2_*` | **SDK-only**, unstable |
### Performance: ~12s Overhead Per `query()` Call
**This is real and documented.** Each `query()` call spawns a new CLI process, which takes ~12s to initialize.
> "The Claude Agent SDK `query()` has ~12s overhead per call — no hot process reuse"
> — [GitHub Issue #34](https://github.com/anthropics/claude-agent-sdk-typescript/issues/34)
For comparison, direct Anthropic Messages API: 1-3s. Previous SDK versions: ~40s (improved 70%).
**But for our use case (long-running agent sessions), this doesn't matter.** We spawn teams that run for minutes/hours. 12s startup is amortized. If you need sub-second responses, use the Anthropic API directly — not the SDK and not CLI direct.
**Direct CLI spawn has the SAME overhead** — the 12s is the CLI initialization time, not SDK overhead. SDK adds negligible wrapper cost on top.
### SDK-Only Features (Not Available in CLI Direct)
1. **In-process MCP servers**`createSdkMcpServer()`, no subprocess management
2. **Custom tools as functions** — No separate MCP server needed
3. **Programmatic hooks** — TypeScript/Python callbacks, not shell scripts
4. **Structured output** — JSON schema for typed responses
5. **File checkpointing** — Rewind file changes to any point
6. **Typed message stream**`SDKMessage` union type with discriminators
7. **Dynamic MCP management**`setMcpServers()`, `toggleMcpServer()`, `reconnectMcpServer()`
8. **Prompt suggestions** — AI-generated next prompt
9. **Permission callbacks**`canUseTool()` with structured decisions
### CLI-Only Features (Not Available in SDK)
1. **Agent Teams** — Multiple coordinated sessions (our core feature!)
2. **Auto memory**`~/.claude/projects/*/memory/` persistence
3. **Interactive TUI** — Terminal UI
### Critical Finding for Our Project
**Agent Teams are CLI-only.** The official docs explicitly state:
> "Agent teams are a CLI feature where one session acts as the team lead, coordinating work across independent teammates."
> — [Claude Code Features in SDK](https://platform.claude.com/docs/en/agent-sdk/claude-code-features)
This means for our team management feature, we **MUST** use CLI direct (which we already do). The SDK cannot replace our current architecture for teams.
However, for solo agents or subagent workflows, the SDK provides a better API.
---
## 2. Codex SDK (`@openai/codex-sdk`)
### Architecture
> "The TypeScript SDK wraps the Codex CLI from `@openai/codex`. It spawns the CLI and exchanges JSONL events over stdin/stdout."
> — [Codex SDK docs](https://developers.openai.com/codex/sdk)
Same pattern as Claude: SDK spawns CLI subprocess.
### API Surface
```typescript
const codex = new Codex({
env?: Record<string, string>, // Environment variables
baseUrl?: string, // API base URL (→ --config openai_base_url=...)
config?: Record<string, any> // Arbitrary config (→ --config key=value)
});
const thread = codex.startThread({
workingDirectory?: string,
skipGitRepoCheck?: boolean
});
// Buffered
const result = await thread.run(prompt, { outputSchema?: JSONSchema });
// Streaming
for await (const event of thread.runStreamed(prompt)) { ... }
// Resume
const thread = codex.resumeThread(threadId);
```
### Feature Comparison: SDK vs CLI Direct
| Feature | CLI Direct | SDK | Notes |
|---------|-----------|-----|-------|
| MCP config | `config.toml` / `codex mcp` | Via `config` option passthrough | CLI manages MCP directly |
| Custom flags | Any flag | `config: { key: value }``--config key=value` | Limited to config passthrough |
| Model selection | `/model` command | Not directly exposed | Must use config |
| Approval modes | `--full-auto`, etc. | Not directly exposed | Must use config or env |
| Structured output | Not in interactive mode | `outputSchema` (Zod → JSON Schema) | **SDK advantage** |
| Thread persistence | `codex resume` | `resumeThread(threadId)` | Equivalent |
| Streaming | JSONL stdout | `runStreamed()` async generator | SDK provides typed events |
| Multimodal input | Screenshots, sketches | `{ type: 'local_image', path }` | Equivalent |
| Performance | Baseline (CLI init) | Same + minimal SDK overhead | No significant difference |
### Native SDK Alternative: `@codex-native/sdk`
There's a Rust-based alternative via napi-rs that **does NOT spawn child processes**:
> "The Native SDK provides Rust-powered bindings via napi-rs, giving you direct access to Codex functionality without spawning child processes."
> — [@codex-native/sdk npm](https://www.npmjs.com/package/@codex-native/sdk)
Full API compatibility with the TypeScript SDK, but with native performance. However, only 33 weekly downloads — practically nobody uses it.
### Known Issues
- **Windows spawn EPERM** — CLI fails on Windows ([#7810](https://github.com/openai/codex/issues/7810))
- **Zombie MCP processes** — 1,300+ zombies, 37GB memory leak ([#12491](https://github.com/openai/codex/issues/12491))
- **Subagents experimental** — Gated behind `features.multi_agent` flag
---
## 3. Gemini CLI SDK (`@google/gemini-cli-sdk`)
### Architecture
Monorepo with three packages:
- `@google/gemini-cli` — Bundled single executable (CLI)
- `@google/gemini-cli-core` — Core logic, API orchestration, tool execution
- `@google/gemini-cli-sdk` — Programmatic API layer over core
**Key difference from Claude/Codex:** The Gemini SDK **does NOT spawn CLI as subprocess**. It uses `@google/gemini-cli-core` directly as a library. This is architecturally different — the SDK calls core functions in-process.
### API Surface
```typescript
// Agent-based API
const agent = new GeminiCliAgent(definition: LocalAgentDefinition);
// Includes: model config, tools, system instructions
// Session management
const session = new GeminiCliSession(context: AgentLoopContext);
// Activity monitoring
agent.onActivity((activity) => { ... });
```
### Feature Comparison: SDK vs CLI Direct
| Feature | CLI Direct | SDK | Notes |
|---------|-----------|-----|-------|
| MCP support | `config.toml` | Via core ToolRegistry | Same underlying system |
| Custom tools | Via MCP servers | Via ToolRegistry + custom definitions | SDK has more direct access |
| Model routing | Auto fallback | Via ModelConfig | Same capability |
| Hooks | Shell scripts | Programmatic callbacks | SDK advantage |
| Sandboxing | Built-in | Via SandboxManager | Same capability |
| Output format | `--output-format json/stream-json` | Typed events via callbacks | SDK provides typed events |
| Extensions | Plugin architecture | Same plugin system | Equivalent |
| Agent Skills | Custom skills | Custom skills | Equivalent |
| Performance | Baseline | **No subprocess — in-process** | **SDK is faster** |
| Abort support | Ctrl+C | Limited — aborted requests continue ([known issue](https://github.com/google-gemini/gemini-cli/issues/15539)) | CLI wins here |
| Checkpointing | Automatic snapshots | Via SDK session | Equivalent |
### Maturity
The SDK was introduced in v0.30.0. GitHub issue [#15539](https://github.com/google-gemini/gemini-cli/issues/15539) requesting a formal SDK is now **CLOSED as completed**. The core API surface is still evolving — `@google/gemini-cli-core` includes "robust compatibility measures" suggesting instability.
---
## 4. Cross-SDK Comparison Matrix
| Dimension | Claude Agent SDK | Codex SDK | Gemini CLI SDK |
|-----------|-----------------|-----------|----------------|
| **Architecture** | Spawns CLI subprocess | Spawns CLI subprocess | In-process (uses core directly) |
| **Startup overhead** | ~12s per query() | Unknown (similar pattern) | Minimal (no subprocess) |
| **CLI flag passthrough** | `extraArgs` for ANY flag | `config` for config flags | N/A (not subprocess-based) |
| **MCP support** | Full (stdio, SSE, HTTP, in-process) | Full (stdio, HTTP) | Full (via ToolRegistry) |
| **In-process tools** | `createSdkMcpServer()` | Custom tool registration | ToolRegistry |
| **Structured output** | JSON Schema | JSON Schema (Zod) | Zod schemas |
| **Agent teams** | CLI-only | N/A | N/A |
| **Subagents** | Programmatic + filesystem | Experimental | Via LocalAgentExecutor |
| **Streaming** | AsyncGenerator<SDKMessage> | AsyncGenerator events | onActivity callback |
| **Custom spawn** | `spawnClaudeCodeProcess` | Not exposed | N/A (no subprocess) |
| **Session resume** | Full (resume, fork) | Full (resumeThread) | Via GeminiCliSession |
| **Hooks** | Programmatic callbacks | Not documented | Programmatic callbacks |
| **License** | Proprietary | Open source (Apache 2.0) | Open source (Apache 2.0) |
| **npm weekly DL** | High (official Anthropic) | High (official OpenAI) | Medium (newer) |
| **Maturity** | Production (v0.2.81) | Production | Early (v0.30.0+) |
---
## 5. Key Questions Answered
### Q1: Does Claude Agent SDK support ALL CLI flags?
**YES.** Via `extraArgs: Record<string, string | null>` you can pass ANY arbitrary flag. Plus most important flags have dedicated typed options (`mcpServers`, `disallowedTools`, `allowedTools`, `permissionMode`, etc.).
### Q2: Does Codex SDK expose all CLI capabilities?
**Partially.** The `config` option can pass arbitrary config values, but not all CLI flags are exposed as typed options. The API surface is minimal compared to Claude's SDK.
### Q3: Does Gemini CLI SDK expose all CLI capabilities?
**Mostly.** Since it uses `@google/gemini-cli-core` directly (not subprocess), it has access to all internal APIs. But the public SDK surface is still maturing.
### Q4: Is there a performance overhead?
**Claude/Codex: YES — ~12s startup per query() call.** This is the CLI initialization time, not SDK overhead. Direct spawn has the same cost.
**Gemini: NO additional overhead** — in-process architecture, no subprocess.
### Q5: Can we pass arbitrary flags through the SDK?
- **Claude:** YES, via `extraArgs`
- **Codex:** Partially, via `config` (maps to `--config key=value`)
- **Gemini:** N/A (not subprocess-based)
### Q6: Does the SDK actually spawn the CLI?
- **Claude:** YES — spawns bundled `cli.js` via `child_process`
- **Codex:** YES — spawns CLI and exchanges JSONL over stdin/stdout
- **Gemini:** NO — uses core library in-process
### Q7: What happens when a new CLI flag is added?
- **Claude:** `extraArgs` passes ANY flag through immediately. No SDK update needed.
- **Codex:** May need SDK update for new flags not covered by `config`
- **Gemini:** Core library update needed, but it's the same package ecosystem
### Q8: Can we use SDK and direct CLI interchangeably?
**YES.** They are not mutually exclusive. Use SDK for simple flows, CLI direct for advanced (Agent Teams, etc.). Both produce the same session files, use the same auth, same MCP servers.
---
## 6. Verdict for Our Project (Claude Agent Teams UI)
### What We Need
1. **Agent Teams** (lead + teammates, stream-json, inbox messaging) — **CLI-only**
2. **MCP config passthrough** (`--mcp-config`, `--strict-mcp-config`) — **Both work**
3. **Disallowed tools** (`--disallowedTools`) — **Both work**
4. **stream-json stdin/stdout****SDK uses this internally, CLI direct also works**
5. **Custom spawn control** (Electron, process management) — **Both work**
6. **Long-running sessions** (teams run for hours) — **Both work, 12s overhead irrelevant**
### Recommendation
| Use Case | Approach | Confidence |
|----------|----------|------------|
| Agent Teams (lead + teammates) | **CLI direct spawn** (current) | 10/10 — SDK cannot do this |
| Solo agent mode | **Either works**, SDK is nicer | 9/10 |
| Future multi-provider support | **CLI direct for each** | 8/10 — more flexible |
| Subagent orchestration | **SDK preferred** (typed subagents) | 8/10 |
### Final Assessment
**The concern about SDKs being "less flexible" is UNFOUNDED for most use cases.** The SDKs provide typed access to everything the CLI does, plus extras (in-process MCP, programmatic hooks, structured output). The `extraArgs` option in Claude SDK means you're never blocked by missing typed options.
**The concern about SDKs being "slower" is VALID but IRRELEVANT for long-running agents.** The ~12s startup overhead is the CLI itself, not the SDK wrapper. Direct spawn has the same cost.
**The ONE real limitation: Agent Teams are CLI-only.** Since our core feature IS Agent Teams, we MUST use CLI direct for team management. This is not a limitation of "SDKs in general" — it's a specific architectural decision by Anthropic.
### Hybrid Approach (Best of Both Worlds)
```
Agent Teams → CLI direct spawn (mandatory)
Solo agents → SDK query() (nicer API, typed, in-process MCP)
Subagents → SDK agents option (programmatic, isolated)
Multi-provider → CLI direct for each provider
```
Reliability: 9/10
Confidence: 9/10
---
## Sources
- [Claude Agent SDK TypeScript Reference](https://platform.claude.com/docs/en/agent-sdk/typescript)
- [Claude Code Features in SDK](https://platform.claude.com/docs/en/agent-sdk/claude-code-features)
- [Claude Agent SDK Overview](https://platform.claude.com/docs/en/agent-sdk/overview)
- [Claude Agent SDK MCP](https://platform.claude.com/docs/en/agent-sdk/mcp)
- [Claude Agent SDK Performance Issue #34](https://github.com/anthropics/claude-agent-sdk-typescript/issues/34)
- [Claude Agent SDK Custom Spawn #103](https://github.com/anthropics/claude-agent-sdk-typescript/issues/103)
- [Claude Agent SDK Pitfalls](https://liruifengv.com/posts/claude-agent-sdk-pitfalls-en/)
- [Claude Agent SDK vs CLI System Prompts](https://github.com/shanraisshan/claude-code-best-practice/blob/main/reports/claude-agent-sdk-vs-cli-system-prompts.md)
- [Claude Code vs Claude Agent SDK (Medium)](https://drlee.io/claude-code-vs-claude-agent-sdk-whats-the-difference-177971c442a9)
- [--disallowedTools MCP bug #12863](https://github.com/anthropics/claude-code/issues/12863)
- [Codex SDK Documentation](https://developers.openai.com/codex/sdk)
- [Codex CLI Documentation](https://developers.openai.com/codex/cli)
- [Codex MCP Support](https://developers.openai.com/codex/mcp)
- [@codex-native/sdk npm](https://www.npmjs.com/package/@codex-native/sdk)
- [Codex Zombie Process Bug #12491](https://github.com/openai/codex/issues/12491)
- [Gemini CLI SDK DeepWiki](https://deepwiki.com/google-gemini/gemini-cli/5.9-sdk-and-programmatic-api)
- [Gemini CLI Formal SDK Request #15539](https://github.com/google-gemini/gemini-cli/issues/15539)
- [Gemini CLI npm Package](https://geminicli.com/docs/npm/)
- [Making Claude Agents Run Faster](https://medium.com/@bayllama/making-your-agents-built-using-claude-agent-sdk-run-faster-2f2526a5cb42)
- [Building Agents with Claude Agent SDK (Anthropic)](https://www.anthropic.com/engineering/building-agents-with-the-claude-agent-sdk)