iliya fe90ac866d Add deep dive research documentation on AI agent orchestration and communication standards

- Introduced multiple markdown files covering agent spawn packages, inter-agent communication protocols, and multi-agent orchestration tools.
- Detailed analysis of official SDKs for CLI agents (Claude Code, Codex, Gemini) and their integration potential.
- Documented various competitor approaches to agent spawning and communication, highlighting strengths and weaknesses.
- Provided insights into best practices for implementing multi-provider support within Electron applications.

This comprehensive documentation aims to enhance understanding of the current AI agent ecosystem and serve as a resource for developers and stakeholders.

2026-03-27 17:51:49 +02:00

21 KiB

Raw Blame History

SDK vs CLI Direct Spawn: Honest Comparison

Date: 2026-03-25 Status: Research complete Verdict: SDKs are NOT limiting — they ARE the CLI with a nicer API, plus extras. But there are real tradeoffs.

TL;DR

All three SDKs (Claude Agent SDK, Codex SDK, Gemini CLI SDK) spawn the CLI as a child process under the hood and communicate via stdin/stdout JSON protocol. The SDK IS the CLI — it just wraps child_process.spawn() with a typed API. There is no functional limitation vs direct spawn, because the SDK literally does the same thing. However, there is a real performance overhead (~12s per query() call for Claude) and some CLI-only features (Agent Teams for Claude) that require workarounds.

1. Claude Agent SDK (`@anthropic-ai/claude-agent-sdk`)

Architecture: How It Works Under the Hood

The SDK bundles cli.js directly inside the npm package. When you call query(), it spawns a Node.js process running this bundled CLI with --input-format stream-json --output-format stream-json --verbose. Communication is via NDJSON over stdin/stdout.

"The SDK code actually bundles a cli.js file directly — which contains the entire Claude Code CLI." — Claude Agent SDK Pitfalls

Key insight: The spawnClaudeCodeProcess option lets you provide a completely custom spawn function. Node's ChildProcess already satisfies the SpawnedProcess interface. This means you can override HOW the CLI is spawned — Docker, VM, remote, whatever.

Complete Options Reference (from official docs)

Option	Type	Description
`model`	`string`	Claude model to use
`cwd`	`string`	Working directory
`env`	`Record<string, string>`	Environment variables
`systemPrompt`	`string \| preset`	Custom or `claude_code` preset
`allowedTools`	`string[]`	Auto-approve tools
`disallowedTools`	`string[]`	Deny tools (checked first, overrides everything)
`mcpServers`	`Record<string, McpServerConfig>`	MCP server configs (stdio, SSE, HTTP, in-process SDK)
`strictMcpConfig`	`boolean`	Only use MCP servers from this config
`settingSources`	`SettingSource[]`	`["user", "project", "local"]` to match CLI behavior
`permissionMode`	`PermissionMode`	`default`, `acceptEdits`, `bypassPermissions`, `plan`, `dontAsk`
`canUseTool`	`Function`	Custom permission callback
`agents`	`Record<string, AgentDefinition>`	Programmatic subagents
`hooks`	`Partial<Record<HookEvent, ...>>`	Programmatic hook callbacks
`plugins`	`SdkPluginConfig[]`	Load custom plugins
`maxTurns`	`number`	Max agentic turns
`maxBudgetUsd`	`number`	Budget cap
`effort`	`'low'\|'medium'\|'high'\|'max'`	Thinking depth
`thinking`	`ThinkingConfig`	Adaptive thinking config
`betas`	`SdkBeta[]`	Beta features (e.g., `context-1m-2025-08-07`)
`includePartialMessages`	`boolean`	Stream partial responses
`outputFormat`	`{ type: 'json_schema', schema }`	Structured output
`spawnClaudeCodeProcess`	`Function`	Custom spawn function
`pathToClaudeCodeExecutable`	`string`	Custom CLI path
`executable`	`'bun'\|'deno'\|'node'`	JS runtime
`executableArgs`	`string[]`	Runtime args
`extraArgs`	`Record<string, string\|null>`	ANY arbitrary CLI flags
`debug`	`boolean`	Debug mode
`debugFile`	`string`	Debug log file
`sandbox`	`SandboxSettings`	Sandbox config
`persistSession`	`boolean`	Disable session persistence
`resume`	`string`	Resume session by ID
`forkSession`	`boolean`	Fork on resume
`enableFileCheckpointing`	`boolean`	File change tracking
`fallbackModel`	`string`	Fallback model
`promptSuggestions`	`boolean`	Emit prompt suggestions
`stderr`	`Function`	Stderr callback

Feature Comparison: SDK vs CLI Direct

Feature	CLI Direct	SDK	Notes
MCP config	`--mcp-config '{...}'`	`mcpServers: {...}`	SDK has typed config + in-process MCP servers (SDK advantage)
Strict MCP	`--strict-mcp-config`	`strictMcpConfig: true`	Equivalent
Disallowed tools	`--disallowedTools X,Y`	`disallowedTools: ['X','Y']`	Equivalent. Known bug: both ignore MCP tools in `-p` mode (#12863)
Allowed tools	`--allowedTools X,Y`	`allowedTools: ['X','Y']`	Equivalent
stream-json I/O	`--input-format stream-json --output-format stream-json`	Automatic (SDK default)	SDK uses this internally, no config needed
Permission mode	`--permission-mode X`	`permissionMode: 'X'`	Equivalent
Custom flags	Any `--flag value`	`extraArgs: { flag: 'value' }`	SDK supports arbitrary flags via `extraArgs`
CLAUDE.md	Auto-loaded	`settingSources: ['project']`	Opt-in in SDK, auto in CLI
Custom spawn	Manual `child_process.spawn()`	`spawnClaudeCodeProcess: (opts) => spawn(...)`	SDK provides typed interface
In-process MCP	Not possible	`createSdkMcpServer()`	SDK-only advantage — no subprocess overhead
Custom tools	Via MCP only	In-process functions	SDK-only advantage
Programmatic hooks	Via config files	Callback functions	SDK-only advantage
Programmatic subagents	Via config files	`agents: {...}` inline	SDK-only advantage
Agent Teams	Full support	CLI-only feature	Not configurable via SDK options. Must use CLI
Auto memory	Full support	Never loaded by SDK	CLI-only feature
Skills	Full support	Via `settingSources` + `allowedTools: ['Skill']`	Equivalent when configured
Session resume	`claude --resume ID`	`resume: 'sessionId'`	Equivalent
Streaming	Via flags	`includePartialMessages: true`	SDK provides typed events
Structured output	Not available	`outputFormat: { type: 'json_schema', ... }`	SDK-only advantage
File checkpointing	Not available	`enableFileCheckpointing: true`	SDK-only advantage
V2 Session API	Not available	`unstable_v2_*`	SDK-only, unstable

Performance: ~12s Overhead Per `query()` Call

This is real and documented. Each query() call spawns a new CLI process, which takes ~12s to initialize.

"The Claude Agent SDK query() has ~12s overhead per call — no hot process reuse" — GitHub Issue #34

For comparison, direct Anthropic Messages API: 1-3s. Previous SDK versions: ~40s (improved 70%).

But for our use case (long-running agent sessions), this doesn't matter. We spawn teams that run for minutes/hours. 12s startup is amortized. If you need sub-second responses, use the Anthropic API directly — not the SDK and not CLI direct.

Direct CLI spawn has the SAME overhead — the 12s is the CLI initialization time, not SDK overhead. SDK adds negligible wrapper cost on top.

SDK-Only Features (Not Available in CLI Direct)

In-process MCP servers — createSdkMcpServer(), no subprocess management
Custom tools as functions — No separate MCP server needed
Programmatic hooks — TypeScript/Python callbacks, not shell scripts
Structured output — JSON schema for typed responses
File checkpointing — Rewind file changes to any point
Typed message stream — SDKMessage union type with discriminators
Dynamic MCP management — setMcpServers(), toggleMcpServer(), reconnectMcpServer()
Prompt suggestions — AI-generated next prompt
Permission callbacks — canUseTool() with structured decisions

CLI-Only Features (Not Available in SDK)

Agent Teams — Multiple coordinated sessions (our core feature!)
Auto memory — ~/.claude/projects/*/memory/ persistence
Interactive TUI — Terminal UI

Critical Finding for Our Project

Agent Teams are CLI-only. The official docs explicitly state:

"Agent teams are a CLI feature where one session acts as the team lead, coordinating work across independent teammates." — Claude Code Features in SDK

This means for our team management feature, we MUST use CLI direct (which we already do). The SDK cannot replace our current architecture for teams.

However, for solo agents or subagent workflows, the SDK provides a better API.

2. Codex SDK (`@openai/codex-sdk`)

Architecture

"The TypeScript SDK wraps the Codex CLI from @openai/codex. It spawns the CLI and exchanges JSONL events over stdin/stdout." — Codex SDK docs

Same pattern as Claude: SDK spawns CLI subprocess.

API Surface

const codex = new Codex({
  env?: Record<string, string>,     // Environment variables
  baseUrl?: string,                  // API base URL (→ --config openai_base_url=...)
  config?: Record<string, any>       // Arbitrary config (→ --config key=value)
});

const thread = codex.startThread({
  workingDirectory?: string,
  skipGitRepoCheck?: boolean
});

// Buffered
const result = await thread.run(prompt, { outputSchema?: JSONSchema });

// Streaming
for await (const event of thread.runStreamed(prompt)) { ... }

// Resume
const thread = codex.resumeThread(threadId);

Feature Comparison: SDK vs CLI Direct

Feature	CLI Direct	SDK	Notes
MCP config	`config.toml` / `codex mcp`	Via `config` option passthrough	CLI manages MCP directly
Custom flags	Any flag	`config: { key: value }` → `--config key=value`	Limited to config passthrough
Model selection	`/model` command	Not directly exposed	Must use config
Approval modes	`--full-auto`, etc.	Not directly exposed	Must use config or env
Structured output	Not in interactive mode	`outputSchema` (Zod → JSON Schema)	SDK advantage
Thread persistence	`codex resume`	`resumeThread(threadId)`	Equivalent
Streaming	JSONL stdout	`runStreamed()` async generator	SDK provides typed events
Multimodal input	Screenshots, sketches	`{ type: 'local_image', path }`	Equivalent
Performance	Baseline (CLI init)	Same + minimal SDK overhead	No significant difference

Native SDK Alternative: `@codex-native/sdk`

There's a Rust-based alternative via napi-rs that does NOT spawn child processes:

"The Native SDK provides Rust-powered bindings via napi-rs, giving you direct access to Codex functionality without spawning child processes." — @codex-native/sdk npm

Full API compatibility with the TypeScript SDK, but with native performance. However, only 33 weekly downloads — practically nobody uses it.

Known Issues

Windows spawn EPERM — CLI fails on Windows (#7810)
Zombie MCP processes — 1,300+ zombies, 37GB memory leak (#12491)
Subagents experimental — Gated behind features.multi_agent flag

3. Gemini CLI SDK (`@google/gemini-cli-sdk`)

Architecture

Monorepo with three packages:

@google/gemini-cli — Bundled single executable (CLI)
@google/gemini-cli-core — Core logic, API orchestration, tool execution
@google/gemini-cli-sdk — Programmatic API layer over core

Key difference from Claude/Codex: The Gemini SDK does NOT spawn CLI as subprocess. It uses @google/gemini-cli-core directly as a library. This is architecturally different — the SDK calls core functions in-process.

API Surface

// Agent-based API
const agent = new GeminiCliAgent(definition: LocalAgentDefinition);
// Includes: model config, tools, system instructions

// Session management
const session = new GeminiCliSession(context: AgentLoopContext);

// Activity monitoring
agent.onActivity((activity) => { ... });

Feature Comparison: SDK vs CLI Direct

Feature	CLI Direct	SDK	Notes
MCP support	`config.toml`	Via core ToolRegistry	Same underlying system
Custom tools	Via MCP servers	Via ToolRegistry + custom definitions	SDK has more direct access
Model routing	Auto fallback	Via ModelConfig	Same capability
Hooks	Shell scripts	Programmatic callbacks	SDK advantage
Sandboxing	Built-in	Via SandboxManager	Same capability
Output format	`--output-format json/stream-json`	Typed events via callbacks	SDK provides typed events
Extensions	Plugin architecture	Same plugin system	Equivalent
Agent Skills	Custom skills	Custom skills	Equivalent
Performance	Baseline	No subprocess — in-process	SDK is faster
Abort support	Ctrl+C	Limited — aborted requests continue (known issue)	CLI wins here
Checkpointing	Automatic snapshots	Via SDK session	Equivalent

Maturity

The SDK was introduced in v0.30.0. GitHub issue #15539 requesting a formal SDK is now CLOSED as completed. The core API surface is still evolving — @google/gemini-cli-core includes "robust compatibility measures" suggesting instability.

4. Cross-SDK Comparison Matrix

Dimension	Claude Agent SDK	Codex SDK	Gemini CLI SDK
Architecture	Spawns CLI subprocess	Spawns CLI subprocess	In-process (uses core directly)
Startup overhead	~12s per query()	Unknown (similar pattern)	Minimal (no subprocess)
CLI flag passthrough	`extraArgs` for ANY flag	`config` for config flags	N/A (not subprocess-based)
MCP support	Full (stdio, SSE, HTTP, in-process)	Full (stdio, HTTP)	Full (via ToolRegistry)
In-process tools	`createSdkMcpServer()`	Custom tool registration	ToolRegistry
Structured output	JSON Schema	JSON Schema (Zod)	Zod schemas
Agent teams	CLI-only	N/A	N/A
Subagents	Programmatic + filesystem	Experimental	Via LocalAgentExecutor
Streaming	AsyncGenerator	AsyncGenerator events	onActivity callback
Custom spawn	`spawnClaudeCodeProcess`	Not exposed	N/A (no subprocess)
Session resume	Full (resume, fork)	Full (resumeThread)	Via GeminiCliSession
Hooks	Programmatic callbacks	Not documented	Programmatic callbacks
License	Proprietary	Open source (Apache 2.0)	Open source (Apache 2.0)
npm weekly DL	High (official Anthropic)	High (official OpenAI)	Medium (newer)
Maturity	Production (v0.2.81)	Production	Early (v0.30.0+)

5. Key Questions Answered

Q1: Does Claude Agent SDK support ALL CLI flags?

YES. Via extraArgs: Record<string, string | null> you can pass ANY arbitrary flag. Plus most important flags have dedicated typed options (mcpServers, disallowedTools, allowedTools, permissionMode, etc.).

Q2: Does Codex SDK expose all CLI capabilities?

Partially. The config option can pass arbitrary config values, but not all CLI flags are exposed as typed options. The API surface is minimal compared to Claude's SDK.

Q3: Does Gemini CLI SDK expose all CLI capabilities?

Mostly. Since it uses @google/gemini-cli-core directly (not subprocess), it has access to all internal APIs. But the public SDK surface is still maturing.

Q4: Is there a performance overhead?

Claude/Codex: YES — ~12s startup per query() call. This is the CLI initialization time, not SDK overhead. Direct spawn has the same cost.

Gemini: NO additional overhead — in-process architecture, no subprocess.

Q5: Can we pass arbitrary flags through the SDK?

Claude: YES, via extraArgs
Codex: Partially, via config (maps to --config key=value)
Gemini: N/A (not subprocess-based)

Q6: Does the SDK actually spawn the CLI?

Claude: YES — spawns bundled cli.js via child_process
Codex: YES — spawns CLI and exchanges JSONL over stdin/stdout
Gemini: NO — uses core library in-process

Q7: What happens when a new CLI flag is added?

Claude: extraArgs passes ANY flag through immediately. No SDK update needed.
Codex: May need SDK update for new flags not covered by config
Gemini: Core library update needed, but it's the same package ecosystem

Q8: Can we use SDK and direct CLI interchangeably?

YES. They are not mutually exclusive. Use SDK for simple flows, CLI direct for advanced (Agent Teams, etc.). Both produce the same session files, use the same auth, same MCP servers.

6. Verdict for Our Project (Claude Agent Teams UI)

What We Need

Agent Teams (lead + teammates, stream-json, inbox messaging) — CLI-only
MCP config passthrough (--mcp-config, --strict-mcp-config) — Both work
Disallowed tools (--disallowedTools) — Both work
stream-json stdin/stdout — SDK uses this internally, CLI direct also works
Custom spawn control (Electron, process management) — Both work
Long-running sessions (teams run for hours) — Both work, 12s overhead irrelevant

Recommendation

Use Case	Approach	Confidence
Agent Teams (lead + teammates)	CLI direct spawn (current)	10/10 — SDK cannot do this
Solo agent mode	Either works, SDK is nicer	9/10
Future multi-provider support	CLI direct for each	8/10 — more flexible
Subagent orchestration	SDK preferred (typed subagents)	8/10

Final Assessment

The concern about SDKs being "less flexible" is UNFOUNDED for most use cases. The SDKs provide typed access to everything the CLI does, plus extras (in-process MCP, programmatic hooks, structured output). The extraArgs option in Claude SDK means you're never blocked by missing typed options.

The concern about SDKs being "slower" is VALID but IRRELEVANT for long-running agents. The ~12s startup overhead is the CLI itself, not the SDK wrapper. Direct spawn has the same cost.

The ONE real limitation: Agent Teams are CLI-only. Since our core feature IS Agent Teams, we MUST use CLI direct for team management. This is not a limitation of "SDKs in general" — it's a specific architectural decision by Anthropic.

Hybrid Approach (Best of Both Worlds)

Agent Teams → CLI direct spawn (mandatory)
Solo agents → SDK query() (nicer API, typed, in-process MCP)
Subagents → SDK agents option (programmatic, isolated)
Multi-provider → CLI direct for each provider

Reliability: 9/10 Confidence: 9/10

21 KiB Raw Blame History

SDK vs CLI Direct Spawn: Honest Comparison

TL;DR

1. Claude Agent SDK (@anthropic-ai/claude-agent-sdk)

Architecture: How It Works Under the Hood

Complete Options Reference (from official docs)

Feature Comparison: SDK vs CLI Direct

Performance: ~12s Overhead Per query() Call

SDK-Only Features (Not Available in CLI Direct)

CLI-Only Features (Not Available in SDK)

Critical Finding for Our Project

2. Codex SDK (@openai/codex-sdk)

Architecture

API Surface

Feature Comparison: SDK vs CLI Direct

Native SDK Alternative: @codex-native/sdk

Known Issues

3. Gemini CLI SDK (@google/gemini-cli-sdk)

Architecture

API Surface

Feature Comparison: SDK vs CLI Direct

Maturity

4. Cross-SDK Comparison Matrix

5. Key Questions Answered

Q1: Does Claude Agent SDK support ALL CLI flags?

Q2: Does Codex SDK expose all CLI capabilities?

Q3: Does Gemini CLI SDK expose all CLI capabilities?

Q4: Is there a performance overhead?

Q5: Can we pass arbitrary flags through the SDK?

Q6: Does the SDK actually spawn the CLI?

Q7: What happens when a new CLI flag is added?

Q8: Can we use SDK and direct CLI interchangeably?

6. Verdict for Our Project (Claude Agent Teams UI)

What We Need

Recommendation

Final Assessment

Hybrid Approach (Best of Both Worlds)

Sources

21 KiB

Raw Blame History

1. Claude Agent SDK (`@anthropic-ai/claude-agent-sdk`)

Performance: ~12s Overhead Per `query()` Call

2. Codex SDK (`@openai/codex-sdk`)

Native SDK Alternative: `@codex-native/sdk`

3. Gemini CLI SDK (`@google/gemini-cli-sdk`)