# Codex Native Runtime - Phase 0 Implementation Spec

Status:

- working spec, implementation-backed
- intended companion to [codex-native-runtime-integration-decision.md](codex-native-runtime-integration-decision.md)
- scope: minimal safe spike, not broad rollout
- audited against current code and a live local `codex exec` run on 2026-04-19
- safe to continue coding against
- not ready to unlock `codex-native` for normal runtime selection yet

## Purpose

This document turns the Codex-native decision doc into an execution spec for Phase 0.

Phase 0 is not the full migration.

Its only job is to prove that we can add a feature-flagged `codex-native` lane without:

- breaking current transcript consumers
- lying about status/capabilities in UI
- silently changing launch, replay, or approval semantics

If Phase 0 succeeds, we should know whether the first implementation wave can proceed as a minimal safe swap.

## Current Readiness Verdict

The spec itself is now ready to drive implementation.

Phase 0 implementation is now wired and evidence-backed.

Current state:

- ✅ ready and already implemented:
  - `codex-native` backend vocabulary in `agent_teams_orchestrator`
  - `codex-native` backend vocabulary in `claude_team` config and validation
  - backend-aware Codex connection-routing in `claude_team`
  - lane-aware Codex status/copy in `claude_team`
  - raw `codex exec` arg builder
  - raw JSONL-to-normalized-event mapper
  - real process-owned `codex exec` runner
  - transcript-compatible projector
  - persisted history wiring through the native lane
  - native executable identity, credential source, and completion metadata capture
  - parser coverage for native projected assistant rows
  - parser coverage for modern system warning rows
  - conservative selector lock policy
  - targeted tests for the above slices
- ⚠️ partially implemented:
  - `codex-native` runtime status can now represent the lane honestly, and the execution lane is real, but the lane remains intentionally locked and non-selectable
  - native lane credentials are routed honestly end-to-end, but the lane still exposes only a conservative headless-limited capability profile
  - the lane remains intentionally conservative in UI exposure and unlock policy even though transcript authority is now stronger
- ✅ sign-off evidence package is now captured in [codex-native-runtime-phase-0-signoff-evidence.md](./codex-native-runtime-phase-0-signoff-evidence.md)

Practical meaning:

- the Phase 0 contract is now strong enough to keep implementing against
- the product is still protected from false rollout because `codex-native` remains a locked experimental lane

## Spec Maintenance Rule

This document is allowed to evolve only in two ways:

1. to reflect implementation-backed reality more accurately
2. to tighten gates when a new risk is discovered

It must not drift into a second speculative architecture document.

Required maintenance behavior:

- if a Phase 0 PR changes authority order, capability truth, lock policy, or exit criteria, this spec must be updated in the same PR
- if a Phase 0 PR only adds implementation under an already-frozen contract, this spec should update only its status/checklist sections
- if current code and this spec disagree, either the code is wrong, or the spec is stale - do not leave the disagreement implicit
- if the implementation-status snapshot changes materially, update the `Implementation Status As Of ...` date in the same PR

## Phase 0 Source Of Truth Rule

For Phase 0 implementation work:

- this document is the execution contract
- [codex-native-runtime-integration-decision.md](codex-native-runtime-integration-decision.md) remains the broader strategy and risk document

If the two documents appear to disagree on a Phase 0 implementation detail:

- this spec wins until both documents are reconciled

Reason:

- the decision doc is intentionally broader
- this spec is intentionally narrower and implementation-facing

## Implementation Status As Of 2026-04-19

### Foundation already landed

- `agent_teams_orchestrator` now knows `codex-native` as a first-class backend id
- `agent_teams_orchestrator` status and registry surfaces can describe the lane without auto-resolving into it
- `claude_team` config vocabulary, validation, connection routing, and runtime UI copy are lane-aware
- old Codex auth mode no longer silently chooses the runtime lane
- raw exec Phase 0 modules already exist for:
  - arg building
  - JSONL mapping
  - normalized event shape
- the live orchestrator execution path now has:
  - a real `codex exec` runner
  - transcript-compatible projection
  - persisted history writes
  - executable identity and completion metadata capture
- native projected transcript rows now carry:
  - thread-status authority
  - warning-source attribution
  - execution-summary and history-completeness metadata
- targeted tests now exist for resolver, registry, config validation, connection routing, lane-aware UI, exec arg building, JSONL mapping, transcript projection, thread-status authority, turn execution, JSONL parsing, exact-log parsing, and session parsing

### Foundation intentionally still locked

- `codex-native` is not selectable for normal users
- `auto` never resolves to `codex-native`
- targeted client guard still rejects live interactive execution on the lane
- renderer/status surfaces may show the lane diagnostically, but not as a fully usable runtime

### Remaining Phase 0 blockers

- no code blockers remain inside Phase 0
- lane unlock remains intentionally blocked by rollout policy

### Phase 0 readiness verdict

- ✅ implementation-complete
- ✅ sign-off evidence captured
- ✅ raw-exec execution slice is landed
- ✅ ready to treat the spec as the contract for remaining work
- ✅ ready to declare Phase 0 complete
- ⚠️ still not ready to unlock `codex-native` as a selectable runtime lane

## Observed Current Codex Exec Facts

The following are no longer assumptions. They were observed locally on 2026-04-19 with:

- `codex-cli 0.117.0`
- `codex exec --json --ephemeral --skip-git-repo-check -C /tmp 'Reply only with OK'`

Observed event shape:

- `thread.started`
- `turn.started`
- `item.completed`
- `turn.completed`

Observed successful assistant payload:

- `item.completed.item.type = "agent_message"`
- `item.completed.item.text = "OK"`

Observed usage payload:

- `turn.completed.usage.input_tokens`
- `turn.completed.usage.cached_input_tokens`
- `turn.completed.usage.output_tokens`

Observed seam-critical warning:

- `thread/read failed while backfilling turn items for turn completion`
- `ephemeral threads do not support includeTurns`
- non-JSON warning lines may be interleaved with JSONL and must stay source-attributed

Observed practical implication:

- `--ephemeral` gives useful live events
- `--ephemeral` does not give final completion backfill via `thread/read`
- this confirms the Phase 0 rule that live stream and canonical history are different authorities

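The interleaving rule above (non-JSON warning lines mixed into JSONL must stay source-attributed) can be illustrated with a small line classifier. This is a minimal sketch, not the landed `jsonlMapper.ts`: the names `classifyOutputLine`, `ClassifiedLine`, and `OutputStream` are hypothetical, and tagging each warning with the stream it arrived on is an assumed way of keeping attribution.

```ts
// Sketch only: `classifyOutputLine` and its types are hypothetical names,
// not the real module API.
type OutputStream = 'stdout' | 'stderr'

type ClassifiedLine =
  | { kind: 'json_event'; eventType: string; payload: Record<string, unknown> }
  | { kind: 'non_json_warning'; stream: OutputStream; text: string }

function classifyOutputLine(line: string, stream: OutputStream): ClassifiedLine | null {
  const trimmed = line.trim()
  if (trimmed === '') return null // blank lines carry no information

  try {
    const parsed: unknown = JSON.parse(trimmed)
    if (typeof parsed === 'object' && parsed !== null && !Array.isArray(parsed)) {
      const payload = parsed as Record<string, unknown>
      const eventType = payload['type']
      if (typeof eventType === 'string') {
        return { kind: 'json_event', eventType, payload }
      }
    }
  } catch {
    // Not JSON: fall through and keep the line as an attributed warning.
  }
  // Anything that is not a typed JSON event stays visible, tagged with its stream.
  return { kind: 'non_json_warning', stream, text: trimmed }
}
```

The point of the design is that a malformed or plain-text line is never dropped and never promoted to an event; it is kept verbatim with its origin.
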
## Current Implemented Routing Facts

These are current implementation-backed truths, not future intentions:

- `codex-native` is a distinct backend lane, not a rename of old Codex `api` or `adapter`
- `auto` does not resolve to `codex-native`
- `codex-native` requires its own native-lane readiness path
- the native credential surface is `CODEX_API_KEY`, not implicit old-lane readiness
- `claude_team` now keeps auth routing and backend-lane routing separate
- when the selected backend is `codex-native`, app-side credential bridging may populate `CODEX_API_KEY`
- manual early routing into live `codex-native` execution is still protected by a targeted runtime guard
- once a real native runner exists, native-lane truth must also carry executable identity, not only backend id

Practical rule:

- if later code or copy contradicts any item above, it should be treated as a regression unless the Phase 0 contract is intentionally amended

## Scope

In scope:

- one experimental `codex-native` backend lane
- one chosen execution seam for the spike
- normalized runtime events for the spike lane
- transcript-compatible projection for the spike lane
- explicit authority order for:
  - history
  - status
  - warnings
  - launch intent versus native thread defaults
  - credential routing
- feature-flagged runtime exposure only
- explicit unsupported-state treatment for headless-limited interactions

Out of scope:

- making `codex-native` the default
- broad plugin UX rollout
- detached review parity
- full app-server integration
- changing `claude_team` transcript parser format
- removing the old Codex `adapter/api` lane

## Phase 0 Deliverable

Phase 0 is complete only if all of the following are true:

- `agent_teams_orchestrator` can run one real Codex-native session through a feature-flagged lane
- the spike emits normalized events
- normalized events can be projected into transcript-compatible persisted history
- current `claude_team` transcript readers still parse the output without schema rewrite
- runtime status can represent the lane honestly as selected, resolved, degraded, or unavailable
- UI copy does not overclaim:
  - plugin support
  - approval support
  - interactive prompt support
  - current-session plugin activation
  - thread health from process health

## Phase 0 Exit Checklist

Use this as the stop/go gate before declaring Phase 0 done.

| Gate | Current state | Requirement to pass |
| --- | --- | --- |
| `codex-native` backend truth exists in both repos | ✅ done | keep green |
| lane remains additive and non-default | ✅ done | keep green |
| lane remains locked until execution is real | ✅ done | keep green |
| old Codex `api/adapter` lane remains behaviorally unchanged | ✅ targeted regression coverage green | required |
| old Codex lane remains the safe fallback when native lane is absent, locked, or degraded | ✅ targeted regression coverage green | required |
| real `codex exec` process run is wired into orchestrator | ✅ done | keep green |
| executable identity is captured per run | ✅ done | keep green |
| runner records executable source and completion policy | ✅ done | keep green |
| normalized native events flow from live process output | ✅ done | keep green |
| native lane capability profile remains explicit and conservative | ✅ done | keep green |
| transcript-compatible projection is written to persisted history | ✅ done | keep green |
| current parser and exact-log paths still parse the projection | ✅ parser and exact-log proof green | keep green |
| native thread-status authority exists or degrades honestly | ✅ projected thread-status rows and targeted tests green | keep green |
| warning sources remain separated end-to-end | ✅ warning-source attribution survives projected transcript rows | keep green |
| replay and history fixtures exist for `ephemeral` and non-ephemeral runs | ✅ targeted replay/history fixtures green | keep green |
| UI copy stays lane-aware and capability-honest | ✅ targeted UI/runtime tests green | keep green |

## Completion Versus Unlock Policy

Phase 0 completion and lane unlock are related, but they are not the same event.

Phase 0 completion means:

- one real `codex-native` execution path works end-to-end
- transcript, status, warning, and history truth stay honest
- internal fixtures prove the chosen seam well enough to proceed

Phase 0 completion does **not** mean:

- `codex-native` becomes default
- `auto` may resolve to `codex-native`
- the lane is generally available without a feature flag
- the lane suddenly gains plugin, MCP, approval, or app-server-grade interactive claims

Default post-Phase-0 policy:

- keep `codex-native` feature-flagged
- keep capability truth conservative
- unlock only for explicit internal usage first
- treat broader rollout as a later decision after Phase 1 gates, not as an automatic consequence of finishing Phase 0

## Old Codex Lane Regression Guardrail

Phase 0 is not allowed to “succeed” by quietly making the existing Codex lane worse.

Required rule:

- all `codex-native` work remains additive until a later explicit migration decision

That means:

- old Codex `api/adapter` execution remains routable
- old Codex connection/auth behavior remains valid for the old lane
- `auto` keeps today’s old-lane behavior
- status, settings, and selector surfaces keep showing a truthful fallback path when native lane is absent, locked, or degraded
- a failed or unavailable `codex-native` lane must not make the whole Codex provider story look unavailable if the old lane still works

Not allowed:

- reinterpreting old-lane readiness as native-lane readiness
- changing old-lane defaults only because the new lane exists
- breaking old-lane tests while claiming the work is “only for native”

## Chosen Phase 0 Default

Phase 0 default:

- execution seam: raw `codex exec` wrapper first
- lane shape: headless-limited until proven otherwise
- old Codex lane remains intact and is the fallback
- `codex-native` is additive, behind a feature flag

Reason:

- raw exec exposes session ownership and `--ephemeral` tradeoffs more honestly than the current TypeScript SDK wrapper
- it reduces the chance of hiding critical persistence or capability differences under a convenience API too early

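To show why the raw seam keeps the `--ephemeral` tradeoff visible, here is a minimal sketch of an arg builder that makes the persistence policy an explicit input rather than a hidden default. The function and option names (`buildCodexExecArgs`, `ExecArgOptions`) are hypothetical and do not describe the landed arg builder; the only flags used are the ones from the locally observed command in this spec.

```ts
// Sketch only: hypothetical names. Flags are limited to those in the
// observed command: exec --json --ephemeral --skip-git-repo-check -C <dir> <prompt>
interface ExecArgOptions {
  prompt: string
  workingDirectory: string
  ephemeral: boolean // explicit, so the history tradeoff is a caller decision
  skipGitRepoCheck: boolean
}

function buildCodexExecArgs(options: ExecArgOptions): string[] {
  const args: string[] = ['exec', '--json']
  if (options.ephemeral) args.push('--ephemeral')
  if (options.skipGitRepoCheck) args.push('--skip-git-repo-check')
  // Working directory and prompt go last, matching the observed invocation.
  args.push('-C', options.workingDirectory, options.prompt)
  return args
}
```

Because `ephemeral` is a required field, a caller cannot start a run without stating which history-completeness policy applies to it.
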
## Execution Seam Freeze Rule

Phase 0 currently chooses one seam:

- raw `codex exec` wrapper first

That choice is now frozen for the remainder of Phase 0 unless explicitly amended.

Practical rule:

- do not quietly switch the live implementation to the current TypeScript SDK mid-Phase-0 while keeping the same checklist and evidence package
- if the chosen seam changes, the following must be re-evaluated and updated together:
  - capability matrix
  - credential-routing contract
  - history-completeness contract
  - sign-off evidence package
  - sign-off command package

Reason:

- otherwise Phase 0 can look “complete” while its evidence package still proves a different seam than the one actually being shipped

## Current Phase 0 Contract State

This spec now serves two jobs at once:

1. freeze the minimum safe contract for the remaining Phase 0 work
2. record which pieces of that contract already exist in code

That distinction matters because Phase 0 is no longer theoretical.

It already has grounded slices in both repos and is now implementation-complete, but it remains deliberately rollout-limited.

Rule:

- if a section below describes authority or capability truth that is not implemented yet, it is still binding for the next code slices
- if current code violates that truth, current code must change before `codex-native` is unlocked

## Repo Ownership

### `agent_teams_orchestrator`

Owns:

- Codex-native execution seam
- normalized event schema
- raw native event mapping
- transcript-compatible projector
- lane capability truth
- thread-status and warning authority
- credential routing for the chosen seam

Recommended touched areas:

- `src/services/runtimeBackends/types.ts`
- `src/services/runtimeBackends/registry.ts`
- `src/services/runtimeBackends/codexBackendResolver.ts`
- `src/services/boardTaskActivity/contract.ts`
- `src/services/boardTaskActivity/BoardTaskTranscriptProjector.ts`
- `src/query.ts`
- `src/utils/config.ts`

Path note:

- the paths above are in the `agent_teams_orchestrator` repo, not in `claude_team`

Recommended new module split for the spike:

- `src/services/codexNative/execRunner.ts`
- `src/services/codexNative/jsonlMapper.ts`
- `src/services/codexNative/normalizedEvents.ts`
- `src/services/codexNative/capabilities.ts`
- `src/services/codexNative/statusAuthority.ts`
- `src/services/codexNative/transcriptProjector.ts`

Current implementation status:

- ✅ created:
  - `src/services/codexNative/execRunner.ts`
  - `src/services/codexNative/jsonlMapper.ts`
  - `src/services/codexNative/normalizedEvents.ts`
  - `src/services/codexNative/capabilities.ts`
  - `src/services/codexNative/statusAuthority.ts`
  - `src/services/codexNative/transcriptProjector.ts`
  - `src/services/codexNative/signOffHarness.ts`

### `claude_team`

Owns:

- backend-lane-aware status ingestion
- lane-aware copy
- feature-flag exposure
- preserving current transcript/read-model path

Recommended touched areas:

- [ClaudeMultimodelBridgeService.ts](../../src/main/services/runtime/ClaudeMultimodelBridgeService.ts)
- [CliStatusBanner.tsx](../../src/renderer/components/dashboard/CliStatusBanner.tsx)
- [CliStatusSection.tsx](../../src/renderer/components/settings/sections/CliStatusSection.tsx)
- [providerConnectionUi.ts](../../src/renderer/components/runtime/providerConnectionUi.ts)
- [ProviderRuntimeSettingsDialog.tsx](../../src/renderer/components/runtime/ProviderRuntimeSettingsDialog.tsx)
- [SessionParser.ts](../../src/main/services/parsing/SessionParser.ts)
- [BoardTaskExactLogStrictParser.ts](../../src/main/services/team/taskLogs/exact/BoardTaskExactLogStrictParser.ts)

### `plugin-kit-ai`

Not required for the Phase 0 spike.

Only Phase-0-adjacent requirement:

- no UI or status copy may imply plugin execution support for `codex-native` before Phase 3

## Recommended Coding Order

Phase 0 should be cut in this order:

1. `agent_teams_orchestrator` type freeze
   - add `codex-native` backend id to runtime backend types
   - keep old Codex lane untouched
   - add feature flag gates only, no behavior switch yet
   - status: ✅ done
   - grounded by:
     - backend id additions
     - resolver gates
     - registry/status exposure
     - targeted runtime backend tests

2. raw exec spike seam
   - add a tiny native runner that can start one Codex-native session
   - capture raw JSONL
   - record executable source, credential path, and `ephemeral` policy
   - status: ✅ done
   - grounded by:
     - arg builder
     - real process runner in orchestrator
     - live event fixture mapping
     - observed local seam validation
     - executable-source capture
     - executable-version capture
     - completion-policy and backfill metadata capture
     - explicit client guard that keeps rollout conservative

3. normalized mapper
   - map raw events into the Phase-0 normalized schema
   - do not wire UI to raw events
   - status: ✅ done
   - grounded by:
     - thread started
     - turn started
     - assistant text
     - usage updated
     - turn completed
     - stderr warning passthrough
     - unsupported raw event preservation
     - stable minimal Phase-0 event contract frozen in code

4. transcript-compatible projector
   - project the normalized subset into persisted transcript-compatible history
   - verify current parser path still works
   - status: ✅ done
   - grounded by:
     - persisted assistant projection
     - projected warning rows with source attribution
     - projected thread-status rows
     - projected execution-summary rows with history-completeness metadata
     - green parser and exact-log fixtures

5. status and warning authority
   - keep lane status, thread status, and warning-source truth separate
   - update bridge payloads before touching UI copy
   - status: ✅ done
   - grounded by:
     - backend lane truth in runtime status
     - selectable-vs-available distinction
     - `codex-native` remains locked
     - targeted UI copy no longer claims auth mode equals runtime lane
     - projected thread-status authority in persisted history
     - projected warning-source attribution in persisted history
     - sign-off evidence for `process` versus `history` warning attribution

6. `claude_team` feature-flagged exposure
   - show lane only when the backend truth can already represent it honestly
   - keep unsupported capabilities visibly unsupported
   - status: ✅ done
   - grounded by:
     - lane-aware config vocabulary
     - lane-aware connection/runtime copy
     - lane-aware selector behavior
     - backend env kept independent from auth mode
     - locked-lane affordance in runtime settings surfaces
     - targeted UI/runtime tests for locked-lane truth

7. fixture and regression pass
   - add the mandatory Phase-0 fixtures
   - only then allow limited internal usage of the new lane
   - status: ✅ done
   - grounded by:
     - resolver fixtures
     - runtime status fixtures
     - raw exec arg-builder fixtures
     - raw JSONL mapper fixtures
     - `claude_team` config/routing/UI fixtures
     - transcript/replay/history fixtures
     - thread-status authority fixtures
     - exact-log compatibility fixtures
     - repo-visible sign-off evidence package

## Authority Order

This is the most important part of the spec.

### 1. Execution authority

For the spike lane:

1. raw `codex exec` JSONL output
2. normalized-event mapping
3. transcript-compatible projection
4. current `claude_team` transcript/read-model path

Rule:

- no UI surface consumes raw native events directly in Phase 0

### 2. History authority

History truth order:

1. explicit seam-owned completion or hydration source for the chosen lane
2. persisted transcript-compatible projection written by orchestrator
3. live event cache for activity only

Rule:

- live stream is never canonical history by itself

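The history truth order can be sketched as a small selection function. This is an illustration under assumptions, not the landed implementation: `HistorySource` and `pickCanonicalHistorySource` are hypothetical names chosen to mirror the three entries above.

```ts
// Sketch only: hypothetical names mirroring the three history sources.
type HistorySource = 'seam_completion' | 'persisted_projection' | 'live_event_cache'

// Only the first two sources may ever be canonical; the live cache is
// deliberately absent from this list.
const CANONICAL_ORDER: readonly HistorySource[] = ['seam_completion', 'persisted_projection']

function pickCanonicalHistorySource(available: readonly HistorySource[]): HistorySource | null {
  for (const candidate of CANONICAL_ORDER) {
    if (available.indexOf(candidate) !== -1) return candidate
  }
  // Live events alone never count as canonical history.
  return null
}
```

Returning `null` when only the live cache exists forces the caller to report incomplete history instead of silently promoting the stream.
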
### 3. Status authority

Status truth must stay split by scope:

1. native thread status
2. provider-lane status
3. host process/provisioning status

Rules:

- thread health is not inferred from process liveness
- provider-global runtime banners are not allowed to masquerade as thread-specific health
- if native thread status is unavailable on the chosen seam, UI must say degraded or unavailable, not synthesize `active`

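The status rules can be sketched as a resolver that accepts process liveness as an input but never uses it to produce thread health. The names `StatusInputs`, `DisplayedThreadHealth`, and `resolveThreadHealth` are hypothetical; `NativeThreadStatus` is copied from the target schema later in this spec.

```ts
// Sketch only: hypothetical resolver. NativeThreadStatus matches the target schema.
type NativeThreadStatus =
  | { type: 'not_loaded' }
  | { type: 'idle' }
  | { type: 'active'; activeFlags?: string[] }
  | { type: 'system_error' }

type DisplayedThreadHealth =
  | { kind: 'native'; status: NativeThreadStatus }
  | { kind: 'degraded'; reason: string }
  | { kind: 'unavailable'; reason: string }

interface StatusInputs {
  nativeThreadStatus?: NativeThreadStatus // reported by the seam, if at all
  laneAvailable: boolean // provider-lane scope
  processAlive: boolean // host-process scope; intentionally unused for thread health
}

function resolveThreadHealth(inputs: StatusInputs): DisplayedThreadHealth {
  if (inputs.nativeThreadStatus !== undefined) {
    return { kind: 'native', status: inputs.nativeThreadStatus }
  }
  if (!inputs.laneAvailable) {
    return { kind: 'unavailable', reason: 'provider lane is not available' }
  }
  // A live process with no native status is degraded, never a synthesized `active`.
  return { kind: 'degraded', reason: 'native thread status is not reported on this seam' }
}
```
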
### 4. Warning authority

Warning channels remain separate:

1. native thread warnings
2. config/startup warnings
3. provisioning/process warnings

Rules:

- do not merge these channels into one generic warning field
- if a UI surface can only show one summary line, it must still preserve source attribution in detail text

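A minimal sketch of the single-summary-line rule follows. `AttributedWarning` and `summarizeWarnings` are hypothetical names, and the `[source] message` detail format is an assumption made for illustration; `NativeWarningSource` is taken from the target schema in this spec.

```ts
// Sketch only: hypothetical helper. The detail format is an assumption.
type NativeWarningSource = 'thread' | 'config' | 'process' | 'provisioning'

interface AttributedWarning {
  source: NativeWarningSource
  message: string
}

function summarizeWarnings(
  warnings: readonly AttributedWarning[],
): { summary: string; detail: string } | null {
  const [first, ...rest] = warnings
  if (first === undefined) return null

  // The summary line may be terse, but it never hides that more warnings exist.
  const summary = rest.length === 0 ? first.message : `${first.message} (+${rest.length} more)`
  // The detail text keeps every warning with its source, one per line.
  const detail = warnings.map((w) => `[${w.source}] ${w.message}`).join('\n')
  return { summary, detail }
}
```
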
### 5. Launch-intent authority

There are two different truths:

- host launch intent
- live native thread defaults

Rules:

- `provider/model/effort` in launch config is launch intent only
- resumed native thread defaults may differ
- if they differ, UI must show either:
  - inherited native defaults
  - explicit override pending
  - or forced fresh-thread policy

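The launch-intent rules can be sketched as a comparison that always yields one of the three permitted display states when intent and native defaults differ. Every name here (`LaunchIntent`, `NativeThreadDefaults`, `MismatchPolicy`, `describeLaunchDefaults`) is hypothetical, and treating the mismatch policy as a caller-supplied input is an assumption.

```ts
// Sketch only: hypothetical types for comparing launch intent with native defaults.
interface LaunchIntent {
  model?: string
  effort?: string
}

interface NativeThreadDefaults {
  model?: string
  reasoningEffort?: string
}

// The three display states the rules allow when the two truths differ.
type MismatchPolicy = 'inherited_native_defaults' | 'explicit_override_pending' | 'forced_fresh_thread'

type LaunchDefaultsDisplay =
  | { kind: 'in_sync' }
  | { kind: MismatchPolicy; intent: LaunchIntent; nativeDefaults: NativeThreadDefaults }

function differs(a: string | undefined, b: string | undefined): boolean {
  return a !== undefined && b !== undefined && a !== b
}

function describeLaunchDefaults(
  intent: LaunchIntent,
  nativeDefaults: NativeThreadDefaults | undefined,
  policy: MismatchPolicy,
): LaunchDefaultsDisplay {
  // No resumed thread means there is nothing to disagree with.
  if (nativeDefaults === undefined) return { kind: 'in_sync' }
  const mismatch =
    differs(intent.model, nativeDefaults.model) ||
    differs(intent.effort, nativeDefaults.reasoningEffort)
  // On mismatch, both truths are surfaced; neither silently overwrites the other.
  return mismatch ? { kind: policy, intent, nativeDefaults } : { kind: 'in_sync' }
}
```
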
### 6. Credential authority

Rules:

- old Codex lane auth truth and `codex-native` auth truth must not share one fake readiness source
- old lane may still use current app-side `OPENAI_API_KEY` flow
- `codex-native` must use only the credential contract actually required by the chosen seam
- UI must not infer native readiness from old-lane auth success

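A minimal sketch of per-lane credential readiness, assuming readiness is judged only from environment variables. The function names are hypothetical, and the old lane may have other auth modes this sketch ignores; the variable names `OPENAI_API_KEY` and `CODEX_API_KEY` and the lane ids come from this spec.

```ts
// Sketch only: hypothetical helpers. Real old-lane auth may involve more
// than an environment variable.
type CodexLaneId = 'codex-adapter' | 'codex-api' | 'codex-native'

function requiredCredentialEnvVar(lane: CodexLaneId): 'OPENAI_API_KEY' | 'CODEX_API_KEY' {
  // The native lane has its own credential surface; it never borrows the old one.
  return lane === 'codex-native' ? 'CODEX_API_KEY' : 'OPENAI_API_KEY'
}

function isLaneCredentialReady(
  lane: CodexLaneId,
  env: Record<string, string | undefined>,
): boolean {
  const value = env[requiredCredentialEnvVar(lane)]
  return typeof value === 'string' && value.trim() !== ''
}
```

With only `OPENAI_API_KEY` set, `isLaneCredentialReady('codex-api', env)` is true while `isLaneCredentialReady('codex-native', env)` stays false, which is the separation the rules require.
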
## Phase 0 Capability Matrix

Phase 0 should assume the following unless the spike proves otherwise:

| Capability | Old Codex lane | `codex-native` spike lane |
| --- | --- | --- |
| Team launch | supported | supported behind flag |
| Transcript-compatible history | supported | required |
| Plugins | unsupported | unsupported in Phase 0 |
| MCP | unsupported or existing-lane-specific | unsupported unless explicitly proven on chosen seam |
| Skills | unsupported or existing-lane-specific | unsupported unless explicitly proven on chosen seam |
| Manual approvals | current lane semantics | unsupported or limited unless explicitly proven |
| Generic interactive prompts | n/a | unsupported in Phase 0 |
| Detached review | current lane semantics | unsupported in Phase 0 |
| Lane-aware status | partial | required |

Practical rule:

- Phase 0 defaults to conservative capability truth
- nothing upgrades from unsupported to supported by implication
- if the live seam only proves diagnostic readiness, capability must remain diagnostic-only

## Current Lock Policy

This is now a required Phase 0 rule, not a suggestion.

`codex-native` may be:

- visible in runtime status
- visible in backend options
- resolved diagnostically

But it must remain:

- `selectable: false`
- non-default
- non-auto-resolved
- non-routable into live execution without an explicit execution-lane implementation
- protected by a targeted runtime error if manually forced too early

Reason:

- Phase 0 now has honest backend truth, real end-to-end native execution, and transcript projection
- the remaining lock is now a rollout-policy choice, not a missing-code problem
- therefore unlocking the lane would still create worse product truth than the current state

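The lock policy can be sketched as a resolver in which `auto` only ever maps to an old lane and a forced `codex-native` request fails with a targeted error. All names are hypothetical, and passing the old-lane `auto` target in as a parameter is an assumption, since this spec does not state which old lane `auto` picks today.

```ts
// Sketch only: hypothetical resolver illustrating the lock policy.
type OldCodexBackend = 'adapter' | 'api'
type RequestedBackend = 'auto' | OldCodexBackend | 'codex-native'

interface BackendOption {
  id: RequestedBackend
  visible: boolean
  selectable: boolean
}

function describeNativeOption(): BackendOption {
  // Visible for diagnostics, but never selectable while the lane is locked.
  return { id: 'codex-native', visible: true, selectable: false }
}

function resolveBackendForExecution(
  requested: RequestedBackend,
  autoTarget: OldCodexBackend, // ASSUMPTION: caller supplies today's old-lane default
): OldCodexBackend {
  if (requested === 'auto') return autoTarget // auto never resolves to codex-native
  if (requested === 'codex-native') {
    // Targeted runtime error for a lane that was manually forced too early.
    throw new Error('codex-native is locked and cannot be routed into live execution')
  }
  return requested
}
```

The return type excludes `codex-native` entirely, so a later change that lets the locked lane leak into live routing would fail to compile rather than fail at runtime.
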
## Normalized Event Schema

Phase 0 does not need the full future schema.

It does need a small, stable subset with explicit source attribution.

The important distinction is:

- one minimal schema is already implemented and should now be treated as frozen groundwork
- a richer schema is still allowed later, but only as an additive expansion

### Current minimal schema already frozen in code

Current grounded contract in `src/services/codexNative/normalizedEvents.ts`:

```ts
type CodexNativeNormalizedEvent =
  | {
      type: 'thread_started'
      threadId: string
    }
  | {
      type: 'turn_started'
    }
  | {
      type: 'assistant_text'
      itemId: string
      text: string
    }
  | {
      type: 'usage_updated'
      inputTokens: number
      cachedInputTokens: number
      outputTokens: number
    }
  | {
      type: 'turn_completed'
    }
  | {
      type: 'warning'
      source: 'stderr'
      text: string
    }
  | {
      type: 'unsupported_raw_event'
      rawType: string
      payload: unknown
    }
```

Rules for this already-landed minimal schema:

- it is sufficient for the raw-exec spike groundwork
- it is not yet sufficient for final Phase 0 completion
- it must not be broken or renamed casually while the runner and projector are being wired
- any richer shape added next must be additive or accompanied by projector updates in the same slice

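To make the minimal schema concrete, here is a sketch of a mapper from the observed raw events to it. This is not the landed `jsonlMapper.ts`. The raw field names `thread_id` and `item.id` are assumptions (this spec records only the event names, the `item.type`/`item.text` fields, and the usage fields), and `mapRawEvent`, `RawEvent`, and `asRecord` are hypothetical names.

```ts
// Sketch only. ASSUMPTIONS: raw `thread_id` and `item.id` field names.
type CodexNativeNormalizedEvent =
  | { type: 'thread_started'; threadId: string }
  | { type: 'turn_started' }
  | { type: 'assistant_text'; itemId: string; text: string }
  | { type: 'usage_updated'; inputTokens: number; cachedInputTokens: number; outputTokens: number }
  | { type: 'turn_completed' }
  | { type: 'warning'; source: 'stderr'; text: string }
  | { type: 'unsupported_raw_event'; rawType: string; payload: unknown }

type RawEvent = { type: string } & Record<string, unknown>

function asRecord(value: unknown): Record<string, unknown> | undefined {
  return typeof value === 'object' && value !== null
    ? (value as Record<string, unknown>)
    : undefined
}

function mapRawEvent(raw: RawEvent): CodexNativeNormalizedEvent[] {
  // Anything unrecognized is preserved rather than dropped.
  const unsupported: CodexNativeNormalizedEvent[] = [
    { type: 'unsupported_raw_event', rawType: raw.type, payload: raw },
  ]
  switch (raw.type) {
    case 'thread.started': {
      const threadId = raw['thread_id'] // ASSUMPTION: field name
      return typeof threadId === 'string' ? [{ type: 'thread_started', threadId }] : unsupported
    }
    case 'turn.started':
      return [{ type: 'turn_started' }]
    case 'item.completed': {
      const item = asRecord(raw['item'])
      const itemId = item?.['id'] // ASSUMPTION: field name
      const text = item?.['text']
      if (item?.['type'] === 'agent_message' && typeof itemId === 'string' && typeof text === 'string') {
        return [{ type: 'assistant_text', itemId, text }]
      }
      return unsupported
    }
    case 'turn.completed': {
      const usage = asRecord(raw['usage'])
      const input = usage?.['input_tokens']
      const cached = usage?.['cached_input_tokens']
      const output = usage?.['output_tokens']
      const events: CodexNativeNormalizedEvent[] = []
      if (typeof input === 'number' && typeof cached === 'number' && typeof output === 'number') {
        events.push({ type: 'usage_updated', inputTokens: input, cachedInputTokens: cached, outputTokens: output })
      }
      events.push({ type: 'turn_completed' })
      return events
    }
    default:
      return unsupported
  }
}
```

One raw `turn.completed` fans out into `usage_updated` plus `turn_completed`, which is why the sketch returns an array instead of a single event.
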
### Target additive schema before Phase 0 can be called complete
|
||
|
||
This is the richer schema the remaining implementation should converge toward:
|
||
|
||
```ts
|
||
type NormalizedProviderId = 'anthropic' | 'codex' | 'gemini'
|
||
type NormalizedRuntimeLaneId = 'anthropic' | 'gemini-cli-sdk' | 'codex-adapter' | 'codex-api' | 'codex-native'
|
||
|
||
type NativeThreadStatus =
|
||
| { type: 'not_loaded' }
|
||
| { type: 'idle' }
|
||
| { type: 'active'; activeFlags?: string[] }
|
||
| { type: 'system_error' }
|
||
|
||
type NativeWarningSource = 'thread' | 'config' | 'process' | 'provisioning'
|
||
|
||
type NormalizedRuntimeEvent =
|
||
| {
|
||
kind: 'thread_started'
|
||
provider: NormalizedProviderId
|
||
laneId: NormalizedRuntimeLaneId
|
||
threadId: string
|
||
requestId?: string
|
||
status?: NativeThreadStatus
|
||
timestamp: string
|
||
}
|
||
| {
|
||
kind: 'thread_status_changed'
|
||
provider: NormalizedProviderId
|
||
laneId: NormalizedRuntimeLaneId
|
||
threadId: string
|
||
status: NativeThreadStatus
|
||
timestamp: string
|
||
}
|
||
| {
|
||
kind: 'thread_defaults_restored'
|
||
provider: NormalizedProviderId
|
||
laneId: NormalizedRuntimeLaneId
|
||
threadId: string
|
||
model?: string
|
||
reasoningEffort?: string
|
||
timestamp: string
|
||
}
|
||
| {
|
||
kind: 'turn_started'
|
||
provider: NormalizedProviderId
|
||
laneId: NormalizedRuntimeLaneId
|
||
threadId: string
|
||
turnId?: string
|
||
requestId?: string
|
      timestamp: string
    }
  | {
      kind: 'assistant_text'
      provider: NormalizedProviderId
      laneId: NormalizedRuntimeLaneId
      threadId: string
      requestId?: string
      text: string
      isDelta: boolean
      timestamp: string
    }
  | {
      kind: 'reasoning'
      provider: NormalizedProviderId
      laneId: NormalizedRuntimeLaneId
      threadId: string
      requestId?: string
      text?: string
      timestamp: string
    }
  | {
      kind: 'usage_updated'
      provider: NormalizedProviderId
      laneId: NormalizedRuntimeLaneId
      threadId: string
      requestId?: string
      inputTokens?: number
      outputTokens?: number
      contextWindow?: number
      timestamp: string
    }
  | {
      kind: 'model_rerouted'
      provider: NormalizedProviderId
      laneId: NormalizedRuntimeLaneId
      threadId: string
      requestId?: string
      configuredModel?: string
      effectiveModel?: string
      reasoningEffort?: string
      timestamp: string
    }
  | {
      kind: 'turn_plan_updated'
      provider: NormalizedProviderId
      laneId: NormalizedRuntimeLaneId
      threadId: string
      requestId?: string
      summary?: string
      timestamp: string
    }
  | {
      kind: 'turn_diff_updated'
      provider: NormalizedProviderId
      laneId: NormalizedRuntimeLaneId
      threadId: string
      requestId?: string
      summary?: string
      timestamp: string
    }
  | {
      kind: 'warning_emitted'
      provider: NormalizedProviderId
      laneId: NormalizedRuntimeLaneId
      source: NativeWarningSource
      threadId?: string
      requestId?: string
      message: string
      detail?: string
      timestamp: string
    }
  | {
      kind: 'turn_completed'
      provider: NormalizedProviderId
      laneId: NormalizedRuntimeLaneId
      threadId: string
      requestId?: string
      timestamp: string
    }
  | {
      kind: 'turn_failed'
      provider: NormalizedProviderId
      laneId: NormalizedRuntimeLaneId
      threadId: string
      requestId?: string
      error: string
      timestamp: string
    }
```

Schema rules:

- every event carries `provider` and `laneId`
- every event is source-attributed
- thread status and warnings are not hidden inside generic `detailMessage`
- `requestId` is optional on the wire but mandatory once known
- expansion from the current minimal schema must be additive until projector and fixture coverage are in place

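One possible reading of the `requestId` rule is that a mapper remembers the id per thread and re-attaches it to later events that arrive without one. The sketch below illustrates that reading only. `WireEvent`, `RequestIdTracker`, `stamp`, and `clear` are hypothetical names, not part of the current code, and the real event union carries more fields.

```typescript
// Hypothetical minimal event shape; the real normalized union has more fields.
type WireEvent = {
  kind: string
  threadId?: string
  requestId?: string
}

// Remembers the last known requestId per thread so later events on the same
// thread cannot silently lose it. Illustrative only.
class RequestIdTracker {
  private readonly known = new Map<string, string>()

  stamp<T extends WireEvent>(event: T): T {
    // Without a thread id there is nothing to correlate against.
    if (!event.threadId) return event
    if (event.requestId) {
      this.known.set(event.threadId, event.requestId)
      return event
    }
    const remembered = this.known.get(event.threadId)
    // Unknown stays unknown; the id is never invented.
    return remembered ? { ...event, requestId: remembered } : event
  }

  // Call when a turn ends so the next turn does not inherit a stale id.
  clear(threadId: string): void {
    this.known.delete(threadId)
  }
}
```

Whether the id should survive across turns is a policy choice this sketch leaves to the caller through `clear`.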
## Transcript Projector Contract

Phase 0 projector requirements:

- produce persisted history that current `SessionParser` and exact-log readers can parse
- preserve request-correlation fields where available
- preserve board-task carrier fields
- never require `claude_team` to understand raw Codex item shapes

Projector rules:

1. `assistant_text`
   - may append or extend assistant transcript content

2. `usage_updated`
   - does not need to become a visible assistant row
   - may project into additive metadata or side-channel metadata
   - must not be silently dropped if it is the only authoritative usage source

3. `thread_status_changed`
   - does not become canonical transcript history by default
   - stays in the normalized/status layer

4. `warning_emitted`
   - thread and config warnings should be projectable to later UI/debug surfaces
   - do not force them into fake assistant rows

5. `thread_defaults_restored`
   - must not rewrite old launch config
   - must remain explicit metadata

6. `model_rerouted`
   - must not overwrite configured model copy invisibly
   - may project to normalized-only metadata in Phase 0 if the transcript row shape has no truthful home

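The projector rules above amount to a routing decision per event kind. A minimal sketch of that routing follows, assuming a hypothetical `ProjectionTarget` vocabulary and a hypothetical `projectionTargetFor` helper. The event kind names come from this spec, while the target names are illustrative and do not describe the landed projector.

```typescript
// Hypothetical destinations for a projected event.
type ProjectionTarget = 'transcript' | 'metadata' | 'status' | 'diagnostics'

// Event kinds covered by the projector rules in this section.
type ProjectableKind =
  | 'assistant_text'
  | 'usage_updated'
  | 'thread_status_changed'
  | 'warning_emitted'
  | 'thread_defaults_restored'
  | 'model_rerouted'

function projectionTargetFor(kind: ProjectableKind): ProjectionTarget {
  switch (kind) {
    case 'assistant_text':
      return 'transcript' // appends or extends assistant content
    case 'usage_updated':
      return 'metadata' // kept, but not as a visible assistant row
    case 'thread_status_changed':
      return 'status' // stays out of canonical transcript history
    case 'warning_emitted':
      return 'diagnostics' // never a fake assistant row
    case 'thread_defaults_restored':
      return 'metadata' // explicit metadata, launch config untouched
    case 'model_rerouted':
      return 'metadata' // configured model copy is not overwritten
    default: {
      // Compile-time exhaustiveness check: a new kind must be routed here.
      const unreachable: never = kind
      throw new Error(`unhandled event kind: ${String(unreachable)}`)
    }
  }
}
```

The `never` branch makes an unrouted kind a compile error, which fits the additive-schema rule: a new event kind cannot be added without an explicit projection decision.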
## Raw Exec Spike Contract

The spike runner must prove all of the following:

- start a Codex-native session in a chosen working directory
- pass native credentials in the seam-native way
- capture JSONL events
- map them to normalized events
- persist transcript-compatible projection
- record:
  - thread id
  - executable identity
  - whether the run was `ephemeral`
  - whether completion backfill existed
  - whether final usage/model truth came from the live stream or an explicit seam-owned completion path

The spike runner must explicitly capture these facts:

- executable source:
  - bundled
  - external CLI
- executable version:
  - exact reported version string when available
- runtime identity:
  - backend lane id
  - executable source
  - executable version
- credential source:
  - native API-key path
  - or explicit unsupported state
- interactive capability:
  - unsupported
  - limited
  - proven
- final history completeness:
  - live-only
  - backfilled
  - explicit hydration required

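The facts listed above could be held in a single record so that a run with a missing fact is detectable rather than silently incomplete. The following is a sketch under assumptions: `SpikeRunFacts`, its field names, the string literals, and `missingFacts` are all hypothetical and are not taken from the actual runner.

```typescript
// Hypothetical record of the facts the spike runner is required to capture.
interface SpikeRunFacts {
  threadId: string
  laneId: 'codex-native'
  executableSource: 'bundled' | 'external-cli'
  executableVersion: string | null // exact reported string, null when unavailable
  credentialSource: 'native-api-key' | 'unsupported'
  interactiveCapability: 'unsupported' | 'limited' | 'proven'
  ephemeral: boolean
  completionBackfillExisted: boolean
  finalTruthSource: 'live-stream' | 'completion-path'
  historyCompleteness: 'live-only' | 'backfilled' | 'explicit-hydration-required'
}

const REQUIRED_FACTS: (keyof SpikeRunFacts)[] = [
  'threadId',
  'laneId',
  'executableSource',
  'executableVersion',
  'credentialSource',
  'interactiveCapability',
  'ephemeral',
  'completionBackfillExisted',
  'finalTruthSource',
  'historyCompleteness',
]

// Returns the names of facts that were never recorded. A null version counts
// as recorded, because "unavailable" is itself an explicit answer.
function missingFacts(facts: Partial<SpikeRunFacts>): (keyof SpikeRunFacts)[] {
  return REQUIRED_FACTS.filter((key) => facts[key] === undefined)
}
```

Treating `null` as recorded and `undefined` as missing keeps "version not reported" distinct from "nobody checked".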
Current implementation note:

- the spec is already grounded by one live local run
- the next required step is to turn that manual seam proof into a reusable runner contract
- until that happens, `codex-native` remains a locked diagnostic lane
- current code already enforces this lock from both status/selectability truth and live client guardrails

## Status Contract

Phase 0 status payload changes must allow `claude_team` to say all of the following truthfully:

- lane exists but is not selected
- lane is selected but not verified
- lane is resolved but degraded
- lane is running but the thread is not loaded
- lane process is alive but the thread is in `systemError`

Minimum required additions for the spike path:

- keep `selectedBackendId`
- keep `resolvedBackendId`
- keep `availableBackends`
- keep native executable identity in diagnostic or detail truth once the runner exists
- do not let degraded transport erase backend truth
- keep thread health separate from provider-global health

Current implementation note:

- backend-level status truth is already in place
- thread-level status truth is not
- therefore current Phase 0 must still describe `codex-native` as execution-locked

If native thread status is unavailable on the chosen seam:

- surface `unknown` or `degraded`
- do not synthesize `active`

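The rule against synthesizing `active` can be sketched as a small derivation that only reports a thread state when the seam actually supplied one. The status names other than `active` and `systemError` (`idle`, `notLoaded`), and the `LaneStatusInput` and `surfacedThreadHealth` names, are assumptions for illustration.

```typescript
// Hypothetical native thread states; only `active` and `systemError` are
// named in this spec, the others are placeholders.
type NativeThreadStatus = 'active' | 'idle' | 'notLoaded' | 'systemError'
type SurfacedThreadHealth = NativeThreadStatus | 'unknown' | 'degraded'

interface LaneStatusInput {
  processAlive: boolean
  threadStatus?: NativeThreadStatus // absent when the seam cannot report it
}

function surfacedThreadHealth(input: LaneStatusInput): SurfacedThreadHealth {
  // Authoritative thread truth always wins, even over a live process.
  if (input.threadStatus !== undefined) return input.threadStatus
  // No thread truth: a live process proves nothing about the thread.
  return input.processAlive ? 'unknown' : 'degraded'
}
```

A live process with no thread status maps to `unknown`, never to `active`, so process truth and thread truth stay separate in the payload.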
## Warning Contract

Phase 0 UI must be able to distinguish:

- startup/config warning
- native thread warning
- provisioning/process warning

Allowed compromise:

- a single banner may summarize all warning presence

Not allowed:

- one combined warning string with no source attribution anywhere

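The allowed compromise and the disallowed case differ only in whether attribution survives somewhere. A minimal sketch, assuming hypothetical `WarningSource` labels that mirror the three classes above (the members of the real `NativeWarningSource` type are not shown in this section):

```typescript
// Hypothetical source labels mirroring the three warning classes above.
type WarningSource = 'config' | 'thread' | 'provisioning'

interface SourcedWarning {
  source: WarningSource
  message: string
}

interface WarningSummary {
  banner: string | null // one summary line is allowed by the contract
  bySource: Record<WarningSource, string[]> // attribution kept for detail views
}

function summarizeWarnings(warnings: SourcedWarning[]): WarningSummary {
  const bySource: Record<WarningSource, string[]> = {
    config: [],
    thread: [],
    provisioning: [],
  }
  for (const warning of warnings) bySource[warning.source].push(warning.message)
  const count = warnings.length
  const banner = count === 0 ? null : `${count} runtime warning${count === 1 ? '' : 's'}`
  return { banner, bySource }
}
```

The banner text carries no source, but the summary object does, so a detail or debug surface can still show where each warning came from.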
## Launch Intent vs Native Defaults Contract

Phase 0 must choose one of these policies and implement it explicitly:

1. fresh-thread only
2. resume with inherited native defaults
3. resume but force explicit override

Default for the spike:

- support resume only behind a flag
- if resumed defaults differ from launch intent, keep that drift explicit

Minimum required surfaced truth:

- requested launch model/effort
- effective native defaults after resume, if known
- warning or degraded state when they differ

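Keeping drift explicit can be sketched as a comparison that returns both sides plus a state, instead of collapsing them into one value. `ModelSettings`, `LaunchDriftReport`, and `compareLaunchIntent` are hypothetical names used only for illustration.

```typescript
interface ModelSettings {
  model: string
  reasoningEffort?: string
}

type DriftState = 'match' | 'drift' | 'unknown'

// Hypothetical report shape: both sides stay visible, nothing is overwritten.
interface LaunchDriftReport {
  requested: ModelSettings
  effective: ModelSettings | null // null when native defaults are not known
  state: DriftState
}

function compareLaunchIntent(
  requested: ModelSettings,
  effective?: ModelSettings,
): LaunchDriftReport {
  if (!effective) return { requested, effective: null, state: 'unknown' }
  const same =
    requested.model === effective.model &&
    requested.reasoningEffort === effective.reasoningEffort
  return { requested, effective, state: same ? 'match' : 'drift' }
}
```

A `drift` state is what would drive the warning or degraded copy. The `unknown` state covers the "if known" qualifier without pretending the defaults matched.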
## Credential Routing Contract

Phase 0 must not reuse old-lane readiness assumptions.

Rules:

- `codex-native` readiness is computed only from the chosen seam's credential contract
- old Codex API-key success does not imply native-lane readiness
- missing or wrong native credentials must degrade only the native lane, not the entire provider story

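The independence these rules ask for can be sketched as two readiness values, each computed from its own input only. The field names below, and the simplification that a present native API key means ready, are assumptions. The real seam's credential contract may require more than presence.

```typescript
type Readiness = 'ready' | 'degraded'

// Hypothetical inputs; each lane has its own credential fact.
interface CredentialFacts {
  oldLaneApiKeyValid: boolean
  nativeApiKeyPresent: boolean
}

interface ProviderReadiness {
  oldLane: Readiness
  nativeLane: Readiness
}

function computeReadiness(facts: CredentialFacts): ProviderReadiness {
  return {
    // Old-lane truth depends only on old-lane credentials.
    oldLane: facts.oldLaneApiKeyValid ? 'ready' : 'degraded',
    // Native-lane truth never reads the old-lane result.
    nativeLane: facts.nativeApiKeyPresent ? 'ready' : 'degraded',
  }
}
```

Because neither field reads the other's input, a missing native credential cannot degrade the old lane, and old-lane success cannot mark the native lane ready.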
## Test Matrix

Minimum must-exist tests for Phase 0:

### `agent_teams_orchestrator`

- `codex-native-api-key-routing`
- `native-binary-identity-metadata`
- `exec-headless-rejects-interactive-server-requests`
- `live-turn-stream-vs-hydrated-history`
- `thread-system-error-vs-process-alive`
- `thread-not-loaded-vs-runtime-still-running`
- `thread-warning-vs-config-warning-truth`
- `resume-persisted-thread-defaults-vs-launch-intent`
- `resume-model-switch-warning-vs-runtime-copy`
- `ephemeral-turn-completed-without-backfill`
- `non-ephemeral-completed-turn-backfill`
- `request-chain-invariants`

### `claude_team`

- `runtime-selector-visible-but-not-ready`
- `headless-lane-capability-copy`
- `native-lane-auth-copy`
- `exact-log-hydrated-after-live-stream`
- `approval-cleared-on-lifecycle`
- `native-thread-status-vs-process-copy`
- `warning-channel-copy`
- `launch-intent-vs-native-defaults-copy`

## Required Evidence Package For Phase 0 Sign-off

Phase 0 should not be declared complete from code inspection alone.

Minimum sign-off evidence must include all of the following:

1. one real successful `codex exec`-backed native run through the orchestrator lane
2. persisted transcript-compatible output from that run
3. recorded native executable identity for that run:
   - source
   - exact version string when available
4. parser proof that current `claude_team` transcript readers still parse it
5. exact-log or replay proof for both:
   - `--ephemeral`
   - non-ephemeral or explicit replacement hydration path
6. one degraded-path proof showing native lane failure does not erase old-lane fallback truth
7. one status proof showing process-alive does not masquerade as native thread healthy
8. one warning proof showing config warnings and native thread warnings remain attributable
9. green targeted test runs for:
   - existing old-lane fallback/regression coverage
   - new native-lane runner/mapper/projector coverage

Practical rule:

- if any one of the nine items above is missing, Phase 0 is still implementation-in-progress, not sign-off ready

Recommended evidence placement:

- keep sign-off artifacts close to this doc under `docs/research/` or another explicit repo-visible location
- do not rely only on terminal memory or one-off local runs as the sole proof of completion

## Minimum Sign-off Command Package

Phase 0 sign-off should include a reproducible command package, not only prose.

Minimum command set:

### In `agent_teams_orchestrator`

- `bun test src/services/runtimeBackends/codexBackendResolver.test.ts`
- `bun test src/services/runtimeBackends/registry.agentTeams.test.ts`
- `bun test src/services/codexNative/execRunner.test.ts`
- `bun test src/services/codexNative/jsonlMapper.test.ts`
- `bun test src/services/codexNative/transcriptProjector.test.ts`
- `bun test src/services/codexNative/statusAuthority.test.ts`
- `bun test src/services/codexNative/turnExecutor.test.ts`
- `bun test src/services/codexNative/signOffHarness.test.ts`
- `git diff --check`

### In `claude_team`

- `pnpm exec vitest run test/main/ipc/configValidation.test.ts`
- `pnpm exec vitest run test/main/services/runtime/ProviderConnectionService.test.ts`
- `pnpm exec vitest run test/main/services/runtime/providerAwareCliEnv.test.ts`
- `pnpm exec vitest run test/main/services/runtime/ClaudeMultimodelBridgeService.test.ts`
- `pnpm exec vitest run test/renderer/components/runtime/providerConnectionUi.test.ts`
- `pnpm exec vitest run test/renderer/components/runtime/ProviderRuntimeSettingsDialog.test.ts`
- `pnpm exec vitest run test/renderer/components/cli/CliStatusVisibility.test.ts`
- `pnpm exec vitest run test/main/utils/jsonl.test.ts`
- `pnpm exec vitest run test/main/services/parsing/SessionParser.test.ts`
- `pnpm exec vitest run test/main/services/team/BoardTaskExactLogStrictParser.test.ts`
- `git diff --check`

### Manual native-lane proof

- one real `codex exec --json` run through the chosen orchestrator seam
- `bun run ./scripts/codex-native-phase0-signoff.ts --cwd /tmp --prompt 'Reply only with OK' --ephemeral`
- `bun run ./scripts/codex-native-phase0-signoff.ts --cwd /tmp --prompt 'Reply only with OK' --persistent`
- one recorded native executable identity proof:
  - source
  - version string when available
- one explicit `--ephemeral` proof
- one non-ephemeral or explicit replacement-hydration proof
- one degraded-lane proof that old Codex fallback still stays truthful

Rule:

- if the command package is not written down and reproducible, the evidence package is incomplete even if one local run looked good

## Tests Already In Place

The following tests already exist and should remain green while Phase 0 continues:

### `agent_teams_orchestrator`

- `src/services/runtimeBackends/codexBackendResolver.test.ts`
- `src/services/runtimeBackends/registry.agentTeams.test.ts`
- `src/services/codexNative/execRunner.test.ts`
- `src/services/codexNative/jsonlMapper.test.ts`
- `src/services/codexNative/transcriptProjector.test.ts`
- `src/services/codexNative/statusAuthority.test.ts`
- `src/services/codexNative/turnExecutor.test.ts`
- `src/services/codexNative/signOffHarness.test.ts`

### `claude_team`

- `test/main/services/parsing/CodexNativePhase0Smoke.test.ts`
- `test/main/ipc/configValidation.test.ts`
- `test/main/utils/jsonl.test.ts`
- `test/main/services/parsing/SessionParser.test.ts`
- `test/main/services/runtime/ProviderConnectionService.test.ts`
- `test/main/services/runtime/providerAwareCliEnv.test.ts`
- `test/main/services/runtime/ClaudeMultimodelBridgeService.test.ts`
- `test/main/services/team/BoardTaskExactLogStrictParser.test.ts`
- `test/renderer/components/runtime/providerConnectionUi.test.ts`
- `test/renderer/components/runtime/ProviderRuntimeSettingsDialog.test.ts`
- `test/renderer/components/cli/CliStatusVisibility.test.ts`

## Exact Remaining Work Before Phase 0 Can Be Called Complete

There is no remaining required Phase 0 code work.

The remaining steps are rollout-policy decisions:

1. decide whether to keep the lane locked through early internal rollout
2. if unlock is proposed later, make that a separate rollout decision rather than a hidden consequence of Phase 0 completion

## Remaining Implementation Surface From Today

The original Phase 0 estimate was:

- `agent_teams_orchestrator`: `450-1100` lines
- `claude_team`: `180-450` lines
- tests: `250-700` lines

That estimate still looks directionally correct for total Phase 0 scope.

But from the current implementation state, the remaining required surface is now:

- `agent_teams_orchestrator`: `0` lines required for Phase 0
- `claude_team`: `0` lines required for Phase 0
- tests and fixtures: `0` lines required for Phase 0

Remaining total from today:

- roughly `0` lines of required Phase 0 code
- rollout decisions remain separate from implementation completion

Practical reading:

- the big architecture uncertainty is mostly resolved
- execution wiring, projection, parser truth, and proof fixtures are already landed
- the remaining work is rollout policy only

## No-Go Rules For Starting Phase 1 Code

Do not move past Phase 0 if any of these remain ambiguous:

- whether the chosen seam is headless-limited
- whether final history completeness depends on seam-specific backfill
- whether thread status is authoritative or only guessed from process truth
- whether native thread warnings can be attributed separately from config and provisioning warnings
- whether resumed native defaults can diverge from launch intent without visible warning
- whether native credentials are routed independently from the old Codex lane

## Estimated Implementation Surface

For Phase 0 only:

- `agent_teams_orchestrator`: `450-1100` lines
- `claude_team`: `180-450` lines
- tests: `250-700` lines

Total Phase 0 expectation:

- roughly `900-2250` lines

That is intentionally smaller than the broader first-wave rollout.

## Practical Rule

Phase 0 is successful if it proves one thing:

- we can run a real `codex-native` lane and keep our current transcript/UI world honest without pretending Codex is just another Anthropic-shaped transport.