# Codex Native Runtime Integration Decision
**Status**: Decision
**Date**: 2026-04-19
**Owner repos**:
- `claude_team`
- `agent_teams_orchestrator`
- `plugin-kit-ai`
## Purpose
Record the chosen direction for improving Codex integration in the multimodel runtime without losing native Codex capabilities such as plugins, skills, and MCP.
## Chosen Plan Assessment
- Chosen plan: normalized internal event/log layer plus staged `Codex-native` backend lane
- Assessment: `🎯 9 🛡️ 9 🧠 7`
- Estimated size of the first serious wave: `2200-4500` lines across `agent_teams_orchestrator`, `claude_team`, and `plugin-kit-ai`
## Current Reality
Today, `Codex` inside our multimodel runtime is **not** executed through the real Codex runtime.
Instead, the current path is:
- `claude_team`
- `agent_teams_orchestrator`
- internal Codex backend
- OpenAI Responses API
In practice this means:
- the orchestrator keeps Anthropic-style streaming semantics
- `Codex` is treated as a model backend, not as a native runtime
- native Codex plugins are not genuinely supported end to end
- current `Codex` capability support is limited by our adapter, not by the real Codex runtime
## What We Learned
After deep code and docs analysis, the most important conclusions are:
1. `@openai/codex-sdk` and `codex exec --json` are the official execution seams for embedding the real Codex runtime.
2. `codex exec` supports API-key mode, so API-key mode itself is not the blocker.
3. `Codex` native plugins, apps, skills, and MCP are part of the real Codex runtime flow.
4. Our current `agent_teams_orchestrator` query loop is deeply coupled to Anthropic-style events and tool semantics.
5. A full drop-in swap from the current Codex adapter to `@openai/codex-sdk / codex exec` would not be a safe transport-only change. It would change runtime semantics.
6. `plugin-kit-ai` is a good fit for plugin management and native plugin placement.
7. `codex app-server` is promising for richer control-plane features, but should not be the foundation of the first production rollout for plugin management.
## Chosen Direction
We will **not** force Codex into the current Anthropic-shaped runtime contract.
We will instead:
- add a new **internal normalized event/log layer**
- keep execution semantics provider-native where needed
- add a separate **Codex-native runtime lane**
- use `plugin-kit-ai` for plugin management and native plugin placement
In practical terms:
- current Codex path stays available as the fallback/default path at first
- real Codex runtime execution becomes a separate lane instead of a drop-in replacement
- unified logs come from normalization, not from pretending every provider has Anthropic-native runtime semantics
## Decision Summary
### We are doing this
- keep the current Codex adapter path as the fallback/default path initially
- introduce a new `Codex-native` backend lane using `@openai/codex-sdk / codex exec`
- introduce a normalized internal event/log format for all providers
- map Anthropic, Gemini, and future Codex-native events into that normalized format
- keep unified logging, transcript projection, analytics, and UI-facing event handling on top of the normalized layer
- use `plugin-kit-ai` for:
  - install
  - update
  - remove
  - repair
  - discover
  - catalog
  - native Codex plugin placement through native marketplace/filesystem layout
### We are not doing this
- not replacing the whole multimodel runtime in one shot
- not forcing real Codex runtime execution into fake Anthropic transport semantics
- not pretending a full `@openai/codex-sdk / codex exec` swap is a drop-in backend replacement
- not making `app-server plugin/*` the first production seam
## Why We Chose This
### Main benefit
This path gives us both of the following without requiring a full rewrite of the current multimodel runtime:
- unified internal logs/events
- a real path to native Codex runtime capabilities
### Main reason against a direct full swap
The current orchestrator is deeply coupled to Anthropic-shaped runtime behavior:
- `tool_use`
- `tool_result`
- `content_block_start`
- `input_json_delta`
- `message_delta`
- current permission and sandbox flow
- current synthetic tool/result handling
- current transcript persistence and resume logic
`codex exec` emits a different event model:
- `thread.started`
- `turn.started`
- `turn.completed`
- `turn.failed`
- `item.started`
- `item.updated`
- `item.completed`
and item types such as:
- `agent_message`
- `reasoning`
- `command_execution`
- `file_change`
- `mcp_tool_call`
That is not just a different wire format. It is a different runtime shape.
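
To make the shape difference concrete, here is a minimal TypeScript sketch that parses one line of `codex exec --json` output into a typed event. The event and item type names come from the lists above. The nesting of the item under an `item` field, the `parseCodexLine` helper, and the choice to drop unrecognized lines are assumptions made for illustration, not a documented contract.

```typescript
// Event and item type names are taken from the lists above.
// Everything else (field layout, helper names) is an assumption.
const TURN_EVENTS = ["thread.started", "turn.started", "turn.completed", "turn.failed"] as const;
const ITEM_EVENTS = ["item.started", "item.updated", "item.completed"] as const;
const ITEM_TYPES = [
  "agent_message",
  "reasoning",
  "command_execution",
  "file_change",
  "mcp_tool_call",
] as const;

type TurnEventType = (typeof TURN_EVENTS)[number];
type ItemEventType = (typeof ITEM_EVENTS)[number];
type CodexItemType = (typeof ITEM_TYPES)[number];

type CodexExecEvent =
  | { type: TurnEventType }
  | { type: ItemEventType; item: { type: CodexItemType } };

// Type guard: true when `value` is one of the allowed string literals.
function isOneOf<T extends string>(allowed: readonly T[], value: unknown): value is T {
  return typeof value === "string" && (allowed as readonly string[]).includes(value);
}

// Parses a single JSONL line. Returns null for malformed JSON or
// for event/item types this sketch does not know about.
function parseCodexLine(line: string): CodexExecEvent | null {
  let raw: unknown;
  try {
    raw = JSON.parse(line);
  } catch {
    return null;
  }
  if (typeof raw !== "object" || raw === null) return null;

  const record = raw as { type?: unknown; item?: unknown };
  if (isOneOf(TURN_EVENTS, record.type)) {
    return { type: record.type };
  }
  if (isOneOf(ITEM_EVENTS, record.type)) {
    const item = record.item as { type?: unknown } | null | undefined;
    if (item && isOneOf(ITEM_TYPES, item.type)) {
      return { type: record.type, item: { type: item.type } };
    }
  }
  return null;
}
```

Nothing in this union carries a `tool_use`/`tool_result` pairing or content-block deltas, which is why the orchestrator needs a mapper above the lane rather than a transport shim.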
## What Changes Per Repo
### `agent_teams_orchestrator`
This repo takes the biggest change.
We want to:
- introduce a provider-neutral normalized event/log model
- add adapter mappers from current Anthropic/Gemini style streams into that model
- add a separate `Codex-native` backend lane through `@openai/codex-sdk / codex exec`
- keep the current Codex adapter path alive as fallback during migration
- avoid forcing `codex exec` events into fake `tool_use/tool_result` transport semantics
We do **not** want to:
- replace the current Codex backend in one shot
- rewrite all providers around Codex-native semantics
- make transcript/log normalization depend on Anthropic wire events
### `claude_team`
This repo should stay relatively stable compared with the orchestrator.
We want to:
- keep one multimodel runtime concept
- stay capability-aware per provider/backend lane
- consume normalized runtime/log DTOs rather than assuming one provider-shaped event model
- integrate plugin management through `plugin-kit-ai`
- keep Codex plugin support gated behind the real Codex-native lane
We do **not** want to:
- invent a fake Codex plugin support state while execution still goes through the old adapter lane
- force UI logic to infer runtime truth from provider labels alone
### `plugin-kit-ai`
This repo remains the management engine, not the execution engine.
We want to:
- use it for catalog
- use it for discover
- use it for install/update/remove/repair
- use it for native Codex plugin placement through native marketplace/filesystem layout
We do **not** want to:
- make it responsible for running Codex plugins inside sessions
- blur installation and execution into one concern
## Target Architecture
### Runtime execution
- `Anthropic` can continue on the current path for now
- `Gemini` can continue on the current path for now
- `Codex-native` gets a dedicated backend lane through `@openai/codex-sdk / codex exec`
### Internal normalization
All runtime backends must project into a shared internal event/log model.
The normalized layer should represent concepts such as:
- turn started
- assistant text
- reasoning
- command execution
- MCP call
- file change
- approval request
- turn completed
- turn failed
The normalized format is the source of truth for:
- logs
- transcript projection
- analytics
- UI-facing activity/event summaries
The normalized format is **not** required to preserve provider-native wire semantics.
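
One possible shape for the normalized layer is sketched below in TypeScript. The kind names mirror the concept list above. All identifiers (`NormalizedEvent`, `normalizeCodexEvent`, the provider labels) are hypothetical, and projecting only `item.completed` into the normalized stream is an assumption chosen to avoid duplicate entries, not a settled rule.

```typescript
// Normalized kinds mirror the concept list above. Names are hypothetical.
type NormalizedKind =
  | "turn_started"
  | "assistant_text"
  | "reasoning"
  | "command_execution"
  | "mcp_call"
  | "file_change"
  | "approval_request"
  | "turn_completed"
  | "turn_failed";

type Provider = "anthropic" | "gemini" | "codex-native";

interface NormalizedEvent {
  kind: NormalizedKind;
  provider: Provider;
}

// Assumed minimal view of a Codex event: only `type` and `item.type` are read.
interface CodexEventLike {
  type: string;
  item?: { type: string };
}

// Maps are used instead of plain objects so that keys such as
// "constructor" cannot accidentally resolve to inherited properties.
const CODEX_TURN_KINDS = new Map<string, NormalizedKind>([
  ["turn.started", "turn_started"],
  ["turn.completed", "turn_completed"],
  ["turn.failed", "turn_failed"],
]);

const CODEX_ITEM_KINDS = new Map<string, NormalizedKind>([
  ["agent_message", "assistant_text"],
  ["reasoning", "reasoning"],
  ["command_execution", "command_execution"],
  ["mcp_tool_call", "mcp_call"],
  ["file_change", "file_change"],
]);

// Projects a Codex-native event into the normalized model.
// Returns null for events that have no normalized counterpart in this sketch.
function normalizeCodexEvent(event: CodexEventLike): NormalizedEvent | null {
  const turnKind = CODEX_TURN_KINDS.get(event.type);
  if (turnKind !== undefined) {
    return { kind: turnKind, provider: "codex-native" };
  }
  if (event.type === "item.completed" && event.item !== undefined) {
    const itemKind = CODEX_ITEM_KINDS.get(event.item.type);
    if (itemKind !== undefined) {
      return { kind: itemKind, provider: "codex-native" };
    }
  }
  return null;
}
```

Anthropic and Gemini mappers would target the same `NormalizedEvent` type, so logs, transcript projection, and analytics depend only on the normalized kinds and never on provider wire events.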
## Codex Plugins Strategy
For Codex plugins we want:
- native Codex runtime execution
- native Codex marketplace/filesystem placement
- provider-aware plugin management in `claude_team`
Therefore:
- `plugin-kit-ai` is the management engine
- real Codex runtime is the execution engine
This is important because plugin installation and plugin execution are different concerns.
Installing a native Codex plugin is not enough by itself if the session still runs through our current Responses API adapter path.
## App Server Position
`codex app-server` remains relevant, but not as the first critical path for this migration.
It is better positioned as a later control-plane enhancement for things like:
- auth state
- MCP status and OAuth flows
- skills/config inspection
- external config import
For the first production rollout, it should not be a hard dependency for plugin lifecycle management.
## Implementation Phases
### Phase 1
- design and introduce the normalized internal event/log layer
- keep current backends working
- define the internal mapping contract clearly
### Phase 2
- add a `Codex-native` backend lane through `@openai/codex-sdk / codex exec`
- keep the current Codex adapter as fallback
- validate API-key mode, working directory behavior, sandbox mode, approval policy, thread resume, and streaming
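
The fallback rule in this phase can be sketched as a small lane selector. This is an illustration only: the check identifiers, the lane labels, and `selectCodexLane` are hypothetical names, and gating the native lane on every check passing is an assumption about how the validation list might be enforced.

```typescript
// The six checks correspond to the Phase 2 validation list above.
// Identifiers are hypothetical.
const PHASE2_CHECKS = [
  "api_key_mode",
  "working_directory",
  "sandbox_mode",
  "approval_policy",
  "thread_resume",
  "streaming",
] as const;

type Phase2Check = (typeof PHASE2_CHECKS)[number];
type CodexLane = "codex-adapter" | "codex-native";

// Chooses the native lane only when it was requested AND every check passed.
// In all other cases the current adapter remains the fallback/default path.
function selectCodexLane(
  nativeRequested: boolean,
  passedChecks: ReadonlySet<Phase2Check>,
): CodexLane {
  const allPassed = PHASE2_CHECKS.every((check) => passedChecks.has(check));
  return nativeRequested && allPassed ? "codex-native" : "codex-adapter";
}
```

Keeping the adapter as the result of every non-passing branch means a partially validated native lane can never be selected by accident.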
### Phase 3
- integrate `plugin-kit-ai` for provider-aware plugin management
- add native Codex plugin placement through the native marketplace/filesystem layout
- keep current UI provider-aware and capability-aware
### Phase 4
- optionally add selective `codex app-server` control-plane integration where it provides clear value
## Main Risks And Guardrails
### Risk 1 - treating `@openai/codex-sdk / codex exec` as a transport-only swap
This is the most dangerous mistake.
Guardrail:
- treat `Codex-native` as a separate runtime lane
- normalize logs/events above it
- do not assume the current Anthropic-shaped tool loop can be preserved unchanged
### Risk 2 - claiming Codex plugin support too early
Installing native Codex plugins is not enough if execution still runs through the current adapter path.
Guardrail:
- only advertise Codex plugin support when the session actually runs through the Codex-native lane
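
As a minimal sketch of this guardrail, the capability check below derives plugin support from the lane a session actually runs on rather than from the provider label. The `SessionRuntime` shape, the lane labels, and `advertisesCodexPlugins` are hypothetical names introduced for illustration.

```typescript
// Hypothetical session descriptor; not taken from any existing API.
type BackendLane = "codex-adapter" | "codex-native";

interface SessionRuntime {
  provider: "anthropic" | "gemini" | "codex";
  // null for providers that have no Codex lane at all
  lane: BackendLane | null;
}

// Plugin support is advertised only when the session really executes
// through the Codex-native lane. The provider label alone is not enough.
function advertisesCodexPlugins(session: SessionRuntime): boolean {
  return session.provider === "codex" && session.lane === "codex-native";
}
```

A session with `provider: "codex"` on the adapter lane returns `false` here, even if native plugins are already installed on disk.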
### Risk 3 - overcommitting to `app-server` too early
`codex app-server` is useful, but it should not become a hard dependency for the first production plugin rollout.
Guardrail:
- use it later for selective control-plane features
- do not block the first migration on `app-server plugin/*`
## Practical Rule
If we need **unified logs**, we normalize events.
If we need **native Codex capabilities**, we do not fake Codex into Anthropic runtime semantics.
That is the core architectural rule for this migration.