agent-ecosystem/docs/research/sdk-vs-cli-comparison.md
iliya fe90ac866d Add deep dive research documentation on AI agent orchestration and communication standards
- Introduced multiple markdown files covering agent spawn packages, inter-agent communication protocols, and multi-agent orchestration tools.
- Detailed analysis of official SDKs for CLI agents (Claude Code, Codex, Gemini) and their integration potential.
- Documented various competitor approaches to agent spawning and communication, highlighting strengths and weaknesses.
- Provided insights into best practices for implementing multi-provider support within Electron applications.

This comprehensive documentation aims to enhance understanding of the current AI agent ecosystem and serve as a resource for developers and stakeholders.
2026-03-27 17:51:49 +02:00

21 KiB

SDK vs CLI Direct Spawn: Honest Comparison

Date: 2026-03-25 Status: Research complete Verdict: SDKs are NOT limiting — they ARE the CLI with a nicer API, plus extras. But there are real tradeoffs.


TL;DR

All three SDKs (Claude Agent SDK, Codex SDK, Gemini CLI SDK) spawn the CLI as a child process under the hood and communicate via stdin/stdout JSON protocol. The SDK IS the CLI — it just wraps child_process.spawn() with a typed API. There is no functional limitation vs direct spawn, because the SDK literally does the same thing. However, there is a real performance overhead (~12s per query() call for Claude) and some CLI-only features (Agent Teams for Claude) that require workarounds.


1. Claude Agent SDK (@anthropic-ai/claude-agent-sdk)

Architecture: How It Works Under the Hood

The SDK bundles cli.js directly inside the npm package. When you call query(), it spawns a Node.js process running this bundled CLI with --input-format stream-json --output-format stream-json --verbose. Communication is via NDJSON over stdin/stdout.

"The SDK code actually bundles a cli.js file directly — which contains the entire Claude Code CLI." — Claude Agent SDK Pitfalls

Key insight: The spawnClaudeCodeProcess option lets you provide a completely custom spawn function. Node's ChildProcess already satisfies the SpawnedProcess interface. This means you can override HOW the CLI is spawned — Docker, VM, remote, whatever.

Complete Options Reference (from official docs)

Option Type Description
model string Claude model to use
cwd string Working directory
env Record<string, string> Environment variables
systemPrompt string | preset Custom or claude_code preset
allowedTools string[] Auto-approve tools
disallowedTools string[] Deny tools (checked first, overrides everything)
mcpServers Record<string, McpServerConfig> MCP server configs (stdio, SSE, HTTP, in-process SDK)
strictMcpConfig boolean Only use MCP servers from this config
settingSources SettingSource[] ["user", "project", "local"] to match CLI behavior
permissionMode PermissionMode default, acceptEdits, bypassPermissions, plan, dontAsk
canUseTool Function Custom permission callback
agents Record<string, AgentDefinition> Programmatic subagents
hooks Partial<Record<HookEvent, ...>> Programmatic hook callbacks
plugins SdkPluginConfig[] Load custom plugins
maxTurns number Max agentic turns
maxBudgetUsd number Budget cap
effort 'low'|'medium'|'high'|'max' Thinking depth
thinking ThinkingConfig Adaptive thinking config
betas SdkBeta[] Beta features (e.g., context-1m-2025-08-07)
includePartialMessages boolean Stream partial responses
outputFormat { type: 'json_schema', schema } Structured output
spawnClaudeCodeProcess Function Custom spawn function
pathToClaudeCodeExecutable string Custom CLI path
executable 'bun'|'deno'|'node' JS runtime
executableArgs string[] Runtime args
extraArgs Record<string, string|null> ANY arbitrary CLI flags
debug boolean Debug mode
debugFile string Debug log file
sandbox SandboxSettings Sandbox config
persistSession boolean Disable session persistence
resume string Resume session by ID
forkSession boolean Fork on resume
enableFileCheckpointing boolean File change tracking
fallbackModel string Fallback model
promptSuggestions boolean Emit prompt suggestions
stderr Function Stderr callback

Feature Comparison: SDK vs CLI Direct

Feature CLI Direct SDK Notes
MCP config --mcp-config '{...}' mcpServers: {...} SDK has typed config + in-process MCP servers (SDK advantage)
Strict MCP --strict-mcp-config strictMcpConfig: true Equivalent
Disallowed tools --disallowedTools X,Y disallowedTools: ['X','Y'] Equivalent. Known bug: both ignore MCP tools in -p mode (#12863)
Allowed tools --allowedTools X,Y allowedTools: ['X','Y'] Equivalent
stream-json I/O --input-format stream-json --output-format stream-json Automatic (SDK default) SDK uses this internally, no config needed
Permission mode --permission-mode X permissionMode: 'X' Equivalent
Custom flags Any --flag value extraArgs: { flag: 'value' } SDK supports arbitrary flags via extraArgs
CLAUDE.md Auto-loaded settingSources: ['project'] Opt-in in SDK, auto in CLI
Custom spawn Manual child_process.spawn() spawnClaudeCodeProcess: (opts) => spawn(...) SDK provides typed interface
In-process MCP Not possible createSdkMcpServer() SDK-only advantage — no subprocess overhead
Custom tools Via MCP only In-process functions SDK-only advantage
Programmatic hooks Via config files Callback functions SDK-only advantage
Programmatic subagents Via config files agents: {...} inline SDK-only advantage
Agent Teams Full support CLI-only feature Not configurable via SDK options. Must use CLI
Auto memory Full support Never loaded by SDK CLI-only feature
Skills Full support Via settingSources + allowedTools: ['Skill'] Equivalent when configured
Session resume claude --resume ID resume: 'sessionId' Equivalent
Streaming Via flags includePartialMessages: true SDK provides typed events
Structured output Not available outputFormat: { type: 'json_schema', ... } SDK-only advantage
File checkpointing Not available enableFileCheckpointing: true SDK-only advantage
V2 Session API Not available unstable_v2_* SDK-only, unstable

Performance: ~12s Overhead Per query() Call

This is real and documented. Each query() call spawns a new CLI process, which takes ~12s to initialize.

"The Claude Agent SDK query() has ~12s overhead per call — no hot process reuse" — GitHub Issue #34

For comparison, direct Anthropic Messages API: 1-3s. Previous SDK versions: ~40s (improved 70%).

But for our use case (long-running agent sessions), this doesn't matter. We spawn teams that run for minutes/hours. 12s startup is amortized. If you need sub-second responses, use the Anthropic API directly — not the SDK and not CLI direct.

Direct CLI spawn has the SAME overhead — the 12s is the CLI initialization time, not SDK overhead. SDK adds negligible wrapper cost on top.

SDK-Only Features (Not Available in CLI Direct)

  1. In-process MCP serverscreateSdkMcpServer(), no subprocess management
  2. Custom tools as functions — No separate MCP server needed
  3. Programmatic hooks — TypeScript/Python callbacks, not shell scripts
  4. Structured output — JSON schema for typed responses
  5. File checkpointing — Rewind file changes to any point
  6. Typed message streamSDKMessage union type with discriminators
  7. Dynamic MCP managementsetMcpServers(), toggleMcpServer(), reconnectMcpServer()
  8. Prompt suggestions — AI-generated next prompt
  9. Permission callbackscanUseTool() with structured decisions

CLI-Only Features (Not Available in SDK)

  1. Agent Teams — Multiple coordinated sessions (our core feature!)
  2. Auto memory~/.claude/projects/*/memory/ persistence
  3. Interactive TUI — Terminal UI

Critical Finding for Our Project

Agent Teams are CLI-only. The official docs explicitly state:

"Agent teams are a CLI feature where one session acts as the team lead, coordinating work across independent teammates." — Claude Code Features in SDK

This means for our team management feature, we MUST use CLI direct (which we already do). The SDK cannot replace our current architecture for teams.

However, for solo agents or subagent workflows, the SDK provides a better API.


2. Codex SDK (@openai/codex-sdk)

Architecture

"The TypeScript SDK wraps the Codex CLI from @openai/codex. It spawns the CLI and exchanges JSONL events over stdin/stdout." — Codex SDK docs

Same pattern as Claude: SDK spawns CLI subprocess.

API Surface

const codex = new Codex({
  env?: Record<string, string>,     // Environment variables
  baseUrl?: string,                  // API base URL (→ --config openai_base_url=...)
  config?: Record<string, any>       // Arbitrary config (→ --config key=value)
});

const thread = codex.startThread({
  workingDirectory?: string,
  skipGitRepoCheck?: boolean
});

// Buffered
const result = await thread.run(prompt, { outputSchema?: JSONSchema });

// Streaming
for await (const event of thread.runStreamed(prompt)) { ... }

// Resume
const thread = codex.resumeThread(threadId);

Feature Comparison: SDK vs CLI Direct

Feature CLI Direct SDK Notes
MCP config config.toml / codex mcp Via config option passthrough CLI manages MCP directly
Custom flags Any flag config: { key: value }--config key=value Limited to config passthrough
Model selection /model command Not directly exposed Must use config
Approval modes --full-auto, etc. Not directly exposed Must use config or env
Structured output Not in interactive mode outputSchema (Zod → JSON Schema) SDK advantage
Thread persistence codex resume resumeThread(threadId) Equivalent
Streaming JSONL stdout runStreamed() async generator SDK provides typed events
Multimodal input Screenshots, sketches { type: 'local_image', path } Equivalent
Performance Baseline (CLI init) Same + minimal SDK overhead No significant difference

Native SDK Alternative: @codex-native/sdk

There's a Rust-based alternative via napi-rs that does NOT spawn child processes:

"The Native SDK provides Rust-powered bindings via napi-rs, giving you direct access to Codex functionality without spawning child processes." — @codex-native/sdk npm

Full API compatibility with the TypeScript SDK, but with native performance. However, only 33 weekly downloads — practically nobody uses it.

Known Issues

  • Windows spawn EPERM — CLI fails on Windows (#7810)
  • Zombie MCP processes — 1,300+ zombies, 37GB memory leak (#12491)
  • Subagents experimental — Gated behind features.multi_agent flag

3. Gemini CLI SDK (@google/gemini-cli-sdk)

Architecture

Monorepo with three packages:

  • @google/gemini-cli — Bundled single executable (CLI)
  • @google/gemini-cli-core — Core logic, API orchestration, tool execution
  • @google/gemini-cli-sdk — Programmatic API layer over core

Key difference from Claude/Codex: The Gemini SDK does NOT spawn CLI as subprocess. It uses @google/gemini-cli-core directly as a library. This is architecturally different — the SDK calls core functions in-process.

API Surface

// Agent-based API
const agent = new GeminiCliAgent(definition: LocalAgentDefinition);
// Includes: model config, tools, system instructions

// Session management
const session = new GeminiCliSession(context: AgentLoopContext);

// Activity monitoring
agent.onActivity((activity) => { ... });

Feature Comparison: SDK vs CLI Direct

Feature CLI Direct SDK Notes
MCP support config.toml Via core ToolRegistry Same underlying system
Custom tools Via MCP servers Via ToolRegistry + custom definitions SDK has more direct access
Model routing Auto fallback Via ModelConfig Same capability
Hooks Shell scripts Programmatic callbacks SDK advantage
Sandboxing Built-in Via SandboxManager Same capability
Output format --output-format json/stream-json Typed events via callbacks SDK provides typed events
Extensions Plugin architecture Same plugin system Equivalent
Agent Skills Custom skills Custom skills Equivalent
Performance Baseline No subprocess — in-process SDK is faster
Abort support Ctrl+C Limited — aborted requests continue (known issue) CLI wins here
Checkpointing Automatic snapshots Via SDK session Equivalent

Maturity

The SDK was introduced in v0.30.0. GitHub issue #15539 requesting a formal SDK is now CLOSED as completed. The core API surface is still evolving — @google/gemini-cli-core includes "robust compatibility measures" suggesting instability.


4. Cross-SDK Comparison Matrix

Dimension Claude Agent SDK Codex SDK Gemini CLI SDK
Architecture Spawns CLI subprocess Spawns CLI subprocess In-process (uses core directly)
Startup overhead ~12s per query() Unknown (similar pattern) Minimal (no subprocess)
CLI flag passthrough extraArgs for ANY flag config for config flags N/A (not subprocess-based)
MCP support Full (stdio, SSE, HTTP, in-process) Full (stdio, HTTP) Full (via ToolRegistry)
In-process tools createSdkMcpServer() Custom tool registration ToolRegistry
Structured output JSON Schema JSON Schema (Zod) Zod schemas
Agent teams CLI-only N/A N/A
Subagents Programmatic + filesystem Experimental Via LocalAgentExecutor
Streaming AsyncGenerator AsyncGenerator events onActivity callback
Custom spawn spawnClaudeCodeProcess Not exposed N/A (no subprocess)
Session resume Full (resume, fork) Full (resumeThread) Via GeminiCliSession
Hooks Programmatic callbacks Not documented Programmatic callbacks
License Proprietary Open source (Apache 2.0) Open source (Apache 2.0)
npm weekly DL High (official Anthropic) High (official OpenAI) Medium (newer)
Maturity Production (v0.2.81) Production Early (v0.30.0+)

5. Key Questions Answered

Q1: Does Claude Agent SDK support ALL CLI flags?

YES. Via extraArgs: Record<string, string | null> you can pass ANY arbitrary flag. Plus most important flags have dedicated typed options (mcpServers, disallowedTools, allowedTools, permissionMode, etc.).

Q2: Does Codex SDK expose all CLI capabilities?

Partially. The config option can pass arbitrary config values, but not all CLI flags are exposed as typed options. The API surface is minimal compared to Claude's SDK.

Q3: Does Gemini CLI SDK expose all CLI capabilities?

Mostly. Since it uses @google/gemini-cli-core directly (not subprocess), it has access to all internal APIs. But the public SDK surface is still maturing.

Q4: Is there a performance overhead?

Claude/Codex: YES — ~12s startup per query() call. This is the CLI initialization time, not SDK overhead. Direct spawn has the same cost.

Gemini: NO additional overhead — in-process architecture, no subprocess.

Q5: Can we pass arbitrary flags through the SDK?

  • Claude: YES, via extraArgs
  • Codex: Partially, via config (maps to --config key=value)
  • Gemini: N/A (not subprocess-based)

Q6: Does the SDK actually spawn the CLI?

  • Claude: YES — spawns bundled cli.js via child_process
  • Codex: YES — spawns CLI and exchanges JSONL over stdin/stdout
  • Gemini: NO — uses core library in-process

Q7: What happens when a new CLI flag is added?

  • Claude: extraArgs passes ANY flag through immediately. No SDK update needed.
  • Codex: May need SDK update for new flags not covered by config
  • Gemini: Core library update needed, but it's the same package ecosystem

Q8: Can we use SDK and direct CLI interchangeably?

YES. They are not mutually exclusive. Use SDK for simple flows, CLI direct for advanced (Agent Teams, etc.). Both produce the same session files, use the same auth, same MCP servers.


6. Verdict for Our Project (Claude Agent Teams UI)

What We Need

  1. Agent Teams (lead + teammates, stream-json, inbox messaging) — CLI-only
  2. MCP config passthrough (--mcp-config, --strict-mcp-config) — Both work
  3. Disallowed tools (--disallowedTools) — Both work
  4. stream-json stdin/stdoutSDK uses this internally, CLI direct also works
  5. Custom spawn control (Electron, process management) — Both work
  6. Long-running sessions (teams run for hours) — Both work, 12s overhead irrelevant

Recommendation

Use Case Approach Confidence
Agent Teams (lead + teammates) CLI direct spawn (current) 10/10 — SDK cannot do this
Solo agent mode Either works, SDK is nicer 9/10
Future multi-provider support CLI direct for each 8/10 — more flexible
Subagent orchestration SDK preferred (typed subagents) 8/10

Final Assessment

The concern about SDKs being "less flexible" is UNFOUNDED for most use cases. The SDKs provide typed access to everything the CLI does, plus extras (in-process MCP, programmatic hooks, structured output). The extraArgs option in Claude SDK means you're never blocked by missing typed options.

The concern about SDKs being "slower" is VALID but IRRELEVANT for long-running agents. The ~12s startup overhead is the CLI itself, not the SDK wrapper. Direct spawn has the same cost.

The ONE real limitation: Agent Teams are CLI-only. Since our core feature IS Agent Teams, we MUST use CLI direct for team management. This is not a limitation of "SDKs in general" — it's a specific architectural decision by Anthropic.

Hybrid Approach (Best of Both Worlds)

Agent Teams → CLI direct spawn (mandatory)
Solo agents → SDK query() (nicer API, typed, in-process MCP)
Subagents → SDK agents option (programmatic, isolated)
Multi-provider → CLI direct for each provider

Reliability: 9/10 Confidence: 9/10


Sources