agent-ecosystem/docs/research/codex-native-runtime-phase-0-implementation-spec.md

Codex Native Runtime - Phase 0 Implementation Spec

Status:

  • working spec, implementation-backed
  • intended companion to codex-native-runtime-integration-decision.md
  • scope: minimal safe spike, not broad rollout
  • audited against current code and a live local codex exec run on 2026-04-19
  • safe to continue coding against
  • not ready to unlock codex-native for normal runtime selection yet

Purpose

This document turns the Codex-native decision doc into an execution spec for Phase 0.

Phase 0 is not the full migration.

Its only job is to prove that we can add a feature-flagged codex-native lane without:

  • breaking current transcript consumers
  • lying about status/capabilities in UI
  • silently changing launch, replay, or approval semantics

If Phase 0 succeeds, we should know whether the first implementation wave can proceed as a minimal safe swap.

Current Readiness Verdict

The spec itself is now ready to drive implementation.

Phase 0 implementation is now wired and evidence-backed.

Current state:

  • ready and already implemented:
    • codex-native backend vocabulary in agent_teams_orchestrator
    • codex-native backend vocabulary in claude_team config and validation
    • backend-aware Codex connection-routing in claude_team
    • lane-aware Codex status/copy in claude_team
    • raw codex exec arg builder
    • raw JSONL-to-normalized-event mapper
    • real process-owned codex exec runner
    • transcript-compatible projector
    • persisted history wiring through the native lane
    • native executable identity, credential source, and completion metadata capture
    • parser coverage for native projected assistant rows
    • parser coverage for modern system warning rows
    • conservative selector lock policy
    • targeted tests for the above slices
  • ⚠️ partially implemented:
    • codex-native runtime status can now represent the lane honestly, and the execution lane is real, but the lane remains intentionally locked and non-selectable
    • native lane credentials are routed honestly end-to-end, but the lane still exposes only a conservative headless-limited capability profile
    • the lane remains intentionally conservative in UI exposure and unlock policy even though transcript authority is now stronger
  • sign-off evidence package is now captured in codex-native-runtime-phase-0-signoff-evidence.md

Practical meaning:

  • the Phase 0 contract is now strong enough to keep implementing against
  • the product is still protected from false rollout because codex-native remains a locked experimental lane

Spec Maintenance Rule

This document is allowed to evolve only in two ways:

  1. to reflect implementation-backed reality more accurately
  2. to tighten gates when a new risk is discovered

It must not drift into a second speculative architecture document.

Required maintenance behavior:

  • if a Phase 0 PR changes authority order, capability truth, lock policy, or exit criteria, this spec must be updated in the same PR
  • if a Phase 0 PR only adds implementation under an already-frozen contract, this spec should update only its status/checklist sections
  • if current code and this spec disagree, either the code is wrong or the spec is stale; do not leave the disagreement implicit
  • if the implementation-status snapshot changes materially, update the Implementation Status As Of ... date in the same PR

Phase 0 Source Of Truth Rule

For Phase 0 implementation work, if this spec and the decision doc (codex-native-runtime-integration-decision.md) appear to disagree on a Phase 0 implementation detail:

  • this spec wins until both documents are reconciled

Reason:

  • the decision doc is intentionally broader
  • this spec is intentionally narrower and implementation-facing

Implementation Status As Of 2026-04-19

Foundation already landed

  • agent_teams_orchestrator now knows codex-native as a first-class backend id
  • agent_teams_orchestrator status and registry surfaces can describe the lane without auto-resolving into it
  • claude_team config vocabulary, validation, connection routing, and runtime UI copy are lane-aware
  • old Codex auth mode no longer silently chooses the runtime lane
  • raw exec Phase 0 modules already exist for:
    • arg building
    • JSONL mapping
    • normalized event shape
  • the live orchestrator execution path now has:
    • a real codex exec runner
    • transcript-compatible projection
    • persisted history writes
    • executable identity and completion metadata capture
  • native projected transcript rows now carry:
    • thread-status authority
    • warning-source attribution
    • execution-summary and history-completeness metadata
  • targeted tests now exist for resolver, registry, config validation, connection routing, lane-aware UI, exec arg building, JSONL mapping, transcript projection, thread-status authority, turn execution, JSONL parsing, exact-log parsing, and session parsing

Foundation intentionally still locked

  • codex-native is not selectable for normal users
  • auto never resolves to codex-native
  • targeted client guard still rejects live interactive execution on the lane
  • renderer/status surfaces may show the lane diagnostically, but not as a fully usable runtime

Remaining Phase 0 blockers

  • no code blockers remain inside Phase 0
  • lane unlock remains intentionally blocked by rollout policy

Phase 0 readiness verdict

  • implementation-complete
  • sign-off evidence captured
  • raw-exec execution slice is landed
  • ready to treat the spec as the contract for remaining work
  • ready to declare Phase 0 complete
  • ⚠️ still not ready to unlock codex-native as a selectable runtime lane

Observed Current Codex Exec Facts

The following are no longer assumptions. They were observed locally on 2026-04-19 with:

  • codex-cli 0.117.0
  • codex exec --json --ephemeral --skip-git-repo-check -C /tmp 'Reply only with OK'
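
The observed invocation can be expressed as a small argument builder. This is a minimal sketch, not the arg builder that already exists in the orchestrator: the `CodexExecArgOptions` shape and the `buildCodexExecArgs` name are hypothetical, and only flags that appear in the command above are used.

```typescript
// Hypothetical option shape; not the orchestrator's real arg-builder API.
interface CodexExecArgOptions {
  cwd: string // working directory, passed via -C
  prompt: string // single prompt, passed as the final positional argument
  ephemeral: boolean // whether to add --ephemeral
}

// Builds the argv that follows the `codex` binary name for one headless run.
// Only flags seen in the observed command are emitted.
function buildCodexExecArgs(options: CodexExecArgOptions): string[] {
  const args = ['exec', '--json']
  if (options.ephemeral) {
    args.push('--ephemeral')
  }
  args.push('--skip-git-repo-check', '-C', options.cwd, options.prompt)
  return args
}
```

Keeping the prompt as one argv element means it needs no shell quoting when the process is spawned without a shell.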

Observed event shape:

  • thread.started
  • turn.started
  • item.completed
  • turn.completed

Observed successful assistant payload:

  • item.completed.item.type = "agent_message"
  • item.completed.item.text = "OK"

Observed usage payload:

  • turn.completed.usage.input_tokens
  • turn.completed.usage.cached_input_tokens
  • turn.completed.usage.output_tokens

Observed seam-critical warning:

  • thread/read failed while backfilling turn items for turn completion
  • ephemeral threads do not support includeTurns
  • non-JSON warning lines may be interleaved with JSONL and must stay source-attributed

Observed practical implication:

  • --ephemeral gives useful live events
  • --ephemeral does not give final completion backfill via thread/read
  • this confirms the Phase 0 rule that live stream and canonical history are different authorities
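
Because non-JSON warning lines can be interleaved with JSONL, each output line has to be classified before it is mapped. The sketch below is one way to do that while keeping stream attribution; `classifyExecLine` and the `ClassifiedExecLine` shape are illustrative names, not the existing mapper's API.

```typescript
type ExecStream = 'stdout' | 'stderr'

// Hypothetical classification result: every non-blank line keeps its stream.
type ClassifiedExecLine =
  | { kind: 'event'; stream: ExecStream; rawType: string; payload: Record<string, unknown> }
  | { kind: 'non_json'; stream: ExecStream; text: string }
  | { kind: 'blank' }

function classifyExecLine(line: string, stream: ExecStream): ClassifiedExecLine {
  const text = line.trim()
  if (text === '') {
    return { kind: 'blank' }
  }
  try {
    const parsed: unknown = JSON.parse(text)
    if (typeof parsed === 'object' && parsed !== null && !Array.isArray(parsed)) {
      const payload = parsed as Record<string, unknown>
      const rawType = payload.type
      if (typeof rawType === 'string') {
        return { kind: 'event', stream, rawType, payload }
      }
    }
  } catch {
    // Not JSON: fall through and keep the text with its stream attribution.
  }
  // Anything that is not a typed JSON object is preserved, never dropped.
  return { kind: 'non_json', stream, text }
}
```

A line such as `ephemeral threads do not support includeTurns` ends up as `non_json` with its stream recorded, so a later stage can attribute it to the right warning source.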

Current Implemented Routing Facts

These are current implementation-backed truths, not future intentions:

  • codex-native is a distinct backend lane, not a rename of old Codex api or adapter
  • auto does not resolve to codex-native
  • codex-native requires its own native-lane readiness path
  • the native credential surface is CODEX_API_KEY, not implicit old-lane readiness
  • claude_team now keeps auth routing and backend-lane routing separate
  • when the selected backend is codex-native, app-side credential bridging may populate CODEX_API_KEY
  • manual early routing into live codex-native execution is still protected by a targeted runtime guard
  • once a real native runner exists, native-lane truth must also carry executable identity, not only backend id

Practical rule:

  • if later code or copy contradicts any item above, it should be treated as regression unless the Phase 0 contract is intentionally amended
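
The routing facts above can be read as a small resolution function. The following is a sketch under stated assumptions: the backend id spellings `'adapter'` and `'api'`, the `nativeFlagEnabled` and `nativeReady` inputs, and the result shape are all hypothetical and do not describe the real resolver in `codexBackendResolver.ts`.

```typescript
// Hypothetical ids for the old lanes; only 'codex-native' is named by this spec.
type OldCodexBackendId = 'adapter' | 'api'
type CodexBackendId = OldCodexBackendId | 'codex-native'
type CodexBackendSelection = CodexBackendId | 'auto'

interface CodexResolutionInput {
  selected: CodexBackendSelection
  oldLaneDefault: OldCodexBackendId // whatever auto resolves to today
  nativeFlagEnabled: boolean // hypothetical feature flag
  nativeReady: boolean // native-lane readiness, computed separately from old-lane auth
}

interface CodexResolution {
  selectedBackendId: CodexBackendSelection
  resolvedBackendId: CodexBackendId | null
  reason: string
}

function resolveCodexBackend(input: CodexResolutionInput): CodexResolution {
  const selectedBackendId = input.selected
  if (input.selected === 'auto') {
    // auto never resolves to codex-native, even when the native lane is ready.
    return { selectedBackendId, resolvedBackendId: input.oldLaneDefault, reason: 'auto keeps old-lane behavior' }
  }
  if (input.selected !== 'codex-native') {
    return { selectedBackendId, resolvedBackendId: input.selected, reason: 'explicit old-lane selection' }
  }
  if (!input.nativeFlagEnabled) {
    return { selectedBackendId, resolvedBackendId: null, reason: 'codex-native is behind a feature flag' }
  }
  if (!input.nativeReady) {
    return { selectedBackendId, resolvedBackendId: null, reason: 'native-lane readiness not satisfied' }
  }
  // Diagnostic resolution only: live execution is still gated by the runtime guard.
  return { selectedBackendId, resolvedBackendId: 'codex-native', reason: 'native-lane readiness satisfied' }
}
```

Returning `selectedBackendId` alongside a possibly-null `resolvedBackendId` keeps "selected but not resolved" representable instead of silently falling through to another lane.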

Scope

In scope:

  • one experimental codex-native backend lane
  • one chosen execution seam for the spike
  • normalized runtime events for the spike lane
  • transcript-compatible projection for the spike lane
  • explicit authority order for:
    • history
    • status
    • warnings
    • launch intent versus native thread defaults
    • credential routing
  • feature-flagged runtime exposure only
  • explicit unsupported-state treatment for headless-limited interactions

Out of scope:

  • making codex-native the default
  • broad plugin UX rollout
  • detached review parity
  • full app-server integration
  • changing claude_team transcript parser format
  • removing the old Codex adapter/api lane

Phase 0 Deliverable

Phase 0 is complete only if all of the following are true:

  • agent_teams_orchestrator can run one real Codex-native session through a feature-flagged lane
  • the spike emits normalized events
  • normalized events can be projected into transcript-compatible persisted history
  • current claude_team transcript readers still parse the output without schema rewrite
  • runtime status can represent the lane honestly as selected, resolved, degraded, or unavailable
  • UI copy does not overclaim:
    • plugin support
    • approval support
    • interactive prompt support
    • current-session plugin activation
    • thread health from process health

Phase 0 Exit Checklist

Use this as the stop/go gate before declaring Phase 0 done.

| Gate | Current state | Requirement to pass |
| --- | --- | --- |
| codex-native backend truth exists in both repos | done | keep green |
| lane remains additive and non-default | done | keep green |
| lane remains locked until execution is real | done | keep green |
| old Codex api/adapter lane remains behaviorally unchanged | targeted regression coverage green | required |
| old Codex lane remains the safe fallback when native lane is absent, locked, or degraded | targeted regression coverage green | required |
| real codex exec process run is wired into orchestrator | done | keep green |
| executable identity is captured per run | done | keep green |
| runner records executable source and completion policy | done | keep green |
| normalized native events flow from live process output | done | keep green |
| native lane capability profile remains explicit and conservative | done | keep green |
| transcript-compatible projection is written to persisted history | done | keep green |
| current parser and exact-log paths still parse the projection | parser and exact-log proof green | keep green |
| native thread-status authority exists or degrades honestly | projected thread-status rows and targeted tests green | keep green |
| warning sources remain separated end-to-end | warning-source attribution survives projected transcript rows | keep green |
| replay and history fixtures exist for ephemeral and non-ephemeral runs | targeted replay/history fixtures green | keep green |
| UI copy stays lane-aware and capability-honest | targeted UI/runtime tests green | keep green |

Completion Versus Unlock Policy

Phase 0 completion and lane unlock are related, but they are not the same event.

Phase 0 completion means:

  • one real codex-native execution path works end-to-end
  • transcript, status, warning, and history truth stay honest
  • internal fixtures prove the chosen seam well enough to proceed

Phase 0 completion does not mean:

  • codex-native becomes default
  • auto may resolve to codex-native
  • the lane is generally available without a feature flag
  • the lane suddenly gains plugin, MCP, approval, or app-server-grade interactive claims

Default post-Phase-0 policy:

  • keep codex-native feature-flagged
  • keep capability truth conservative
  • unlock only for explicit internal usage first
  • treat broader rollout as a later decision after Phase 1 gates, not as an automatic consequence of finishing Phase 0

Old Codex Lane Regression Guardrail

Phase 0 is not allowed to “succeed” by quietly making the existing Codex lane worse.

Required rule:

  • all codex-native work remains additive until a later explicit migration decision

That means:

  • old Codex api/adapter execution remains routable
  • old Codex connection/auth behavior remains valid for the old lane
  • auto keeps today's old-lane behavior
  • status, settings, and selector surfaces keep showing a truthful fallback path when native lane is absent, locked, or degraded
  • a failed or unavailable codex-native lane must not make the whole Codex provider story look unavailable if the old lane still works

Not allowed:

  • reinterpreting old-lane readiness as native-lane readiness
  • changing old-lane defaults only because the new lane exists
  • breaking old-lane tests while claiming the work is “only for native”

Chosen Phase 0 Default

Phase 0 default:

  • execution seam: raw codex exec wrapper first
  • lane shape: headless-limited until proven otherwise
  • old Codex lane remains intact and is the fallback
  • codex-native is additive, behind feature flag

Reason:

  • raw exec exposes session ownership and --ephemeral tradeoffs more honestly than the current TypeScript SDK wrapper
  • it reduces the chance of hiding critical persistence or capability differences under a convenience API too early

Execution Seam Freeze Rule

Phase 0 currently chooses one seam:

  • raw codex exec wrapper first

That choice is now frozen for the remainder of Phase 0 unless explicitly amended.

Practical rule:

  • do not quietly switch the live implementation to current TypeScript SDK mid-Phase-0 while keeping the same checklist and evidence package
  • if the chosen seam changes, the following must be re-evaluated and updated together:
    • capability matrix
    • credential-routing contract
    • history-completeness contract
    • sign-off evidence package
    • sign-off command package

Reason:

  • otherwise Phase 0 can look “complete” while its evidence package still proves a different seam than the one actually being shipped

Current Phase 0 Contract State

This spec now serves two jobs at once:

  1. freeze the minimum safe contract for the remaining Phase 0 work
  2. record which pieces of that contract already exist in code

That distinction matters because Phase 0 is no longer theoretical.

It already has grounded slices in both repos and is now implementation-complete, but it remains deliberately rollout-limited.

Rule:

  • if a section below describes authority or capability truth that is not implemented yet, it is still binding for the next code slices
  • if current code violates that truth, current code must change before codex-native is unlocked

Repo Ownership

agent_teams_orchestrator

Owns:

  • Codex-native execution seam
  • normalized event schema
  • raw native event mapping
  • transcript-compatible projector
  • lane capability truth
  • thread-status and warning authority
  • credential routing for the chosen seam

Recommended touched areas:

  • src/services/runtimeBackends/types.ts
  • src/services/runtimeBackends/registry.ts
  • src/services/runtimeBackends/codexBackendResolver.ts
  • src/services/boardTaskActivity/contract.ts
  • src/services/boardTaskActivity/BoardTaskTranscriptProjector.ts
  • src/query.ts
  • src/utils/config.ts

Path note:

  • the paths above are in the agent_teams_orchestrator repo, not in claude_team

Recommended new module split for the spike:

  • src/services/codexNative/execRunner.ts
  • src/services/codexNative/jsonlMapper.ts
  • src/services/codexNative/normalizedEvents.ts
  • src/services/codexNative/capabilities.ts
  • src/services/codexNative/statusAuthority.ts
  • src/services/codexNative/transcriptProjector.ts

Current implementation status:

  • created:
    • src/services/codexNative/execRunner.ts
    • src/services/codexNative/jsonlMapper.ts
    • src/services/codexNative/normalizedEvents.ts
    • src/services/codexNative/capabilities.ts
    • src/services/codexNative/statusAuthority.ts
    • src/services/codexNative/transcriptProjector.ts
    • src/services/codexNative/signOffHarness.ts

claude_team

Owns:

  • backend-lane-aware status ingestion
  • lane-aware copy
  • feature-flag exposure
  • preserving current transcript/read-model path

Recommended touched areas:

plugin-kit-ai

Not required for the Phase 0 spike.

Only Phase-0-adjacent requirement:

  • no UI or status copy may imply plugin execution support for codex-native before Phase 3

Phase 0 should be cut in this order:

  1. agent_teams_orchestrator type freeze
    • add codex-native backend id to runtime backend types
    • keep old Codex lane untouched
    • add feature flag gates only, no behavior switch yet
    • status: done
      • grounded by:
        • backend id additions
        • resolver gates
        • registry/status exposure
        • targeted runtime backend tests
  2. raw exec spike seam
    • add a tiny native runner that can start one Codex-native session
    • capture raw JSONL
    • record executable source, credential path, and ephemeral policy
    • status: done
      • grounded by:
        • arg builder
        • real process runner in orchestrator
        • live event fixture mapping
        • observed local seam validation
        • executable-source capture
        • executable-version capture
        • completion-policy and backfill metadata capture
        • explicit client guard that keeps rollout conservative
  3. normalized mapper
    • map raw events into the Phase-0 normalized schema
    • do not wire UI to raw events
    • status: done
      • grounded by:
        • thread started
        • turn started
        • assistant text
        • usage updated
        • turn completed
        • stderr warning passthrough
        • unsupported raw event preservation
        • stable minimal Phase-0 event contract frozen in code
  4. transcript-compatible projector
    • project the normalized subset into persisted transcript-compatible history
    • verify current parser path still works
    • status: done
      • grounded by:
        • persisted assistant projection
        • projected warning rows with source attribution
        • projected thread-status rows
        • projected execution-summary rows with history-completeness metadata
        • green parser and exact-log fixtures
  5. status and warning authority
    • keep lane status, thread status, and warning-source truth separate
    • update bridge payloads before touching UI copy
    • status: done
      • grounded by:
        • backend lane truth in runtime status
        • selectable-vs-available distinction
        • codex-native remains locked
        • targeted UI copy no longer claims auth mode equals runtime lane
        • projected thread-status authority in persisted history
        • projected warning-source attribution in persisted history
        • sign-off evidence for process versus history warning attribution
  6. claude_team feature-flagged exposure
    • show lane only when the backend truth can already represent it honestly
    • keep unsupported capabilities visibly unsupported
    • status: done
      • grounded by:
        • lane-aware config vocabulary
        • lane-aware connection/runtime copy
        • lane-aware selector behavior
        • backend env kept independent from auth mode
        • locked-lane affordance in runtime settings surfaces
        • targeted UI/runtime tests for locked-lane truth
  7. fixture and regression pass
    • add the mandatory Phase-0 fixtures
    • only then allow limited internal usage of the new lane
    • status: done
      • grounded by:
        • resolver fixtures
        • runtime status fixtures
        • raw exec arg-builder fixtures
        • raw JSONL mapper fixtures
        • claude_team config/routing/UI fixtures
        • transcript/replay/history fixtures
        • thread-status authority fixtures
        • exact-log compatibility fixtures
        • repo-visible sign-off evidence package

Authority Order

This is the most important part of the spec.

1. Execution authority

For the spike lane:

  1. raw codex exec JSONL output
  2. normalized-event mapping
  3. transcript-compatible projection
  4. current claude_team transcript/read-model path

Rule:

  • no UI surface consumes raw native events directly in Phase 0

2. History authority

History truth order:

  1. explicit seam-owned completion or hydration source for the chosen lane
  2. persisted transcript-compatible projection written by orchestrator
  3. live event cache for activity only

Rule:

  • live stream is never canonical history by itself

3. Status authority

Status truth must stay split by scope:

  1. native thread status
  2. provider-lane status
  3. host process/provisioning status

Rules:

  • thread health is not inferred from process liveness
  • provider-global runtime banners are not allowed to masquerade as thread-specific health
  • if native thread status is unavailable on the chosen seam, UI must say degraded or unavailable, not synthesize active
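
The status rules above amount to never deriving thread health from process liveness. A minimal sketch, assuming a hypothetical `ScopedStatusView` shape that a bridge payload might carry:

```typescript
type NativeThreadStatusType = 'not_loaded' | 'idle' | 'active' | 'system_error'

// Hypothetical view model: thread scope and process scope stay separate fields.
interface ScopedStatusView {
  thread: NativeThreadStatusType | 'unavailable'
  process: 'alive' | 'exited'
}

function buildScopedStatusView(
  threadStatus: NativeThreadStatusType | undefined,
  processAlive: boolean,
): ScopedStatusView {
  return {
    // A missing thread status is reported as unavailable; it is never
    // upgraded to 'active' just because the process is still running.
    thread: threadStatus ?? 'unavailable',
    process: processAlive ? 'alive' : 'exited',
  }
}
```

With this split, "process alive, thread in system error" and "process alive, thread status unknown" remain two distinct, displayable states.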

4. Warning authority

Warning channels remain separate:

  1. native thread warnings
  2. config/startup warnings
  3. provisioning/process warnings

Rules:

  • do not merge these channels into one generic warning field
  • if a UI surface can only show one summary line, it must still preserve source attribution in detail text
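
One way to honor both warning rules is to keep warnings as source-tagged records and derive any single-line summary from them. The sketch below assumes hypothetical `SourcedWarning` and `WarningSummary` shapes; the source names follow the warning sources used elsewhere in this spec.

```typescript
type WarningSource = 'thread' | 'config' | 'process' | 'provisioning'

interface SourcedWarning {
  source: WarningSource
  message: string
}

// Hypothetical UI summary: one banner line plus source-attributed detail lines.
interface WarningSummary {
  banner: string | null
  detail: string[]
}

function summarizeWarnings(warnings: SourcedWarning[]): WarningSummary {
  if (warnings.length === 0) {
    return { banner: null, detail: [] }
  }
  return {
    banner: `${warnings.length} warning(s)`,
    // Each detail line keeps its source, so attribution survives the summary.
    detail: warnings.map((warning) => `[${warning.source}] ${warning.message}`),
  }
}
```

The banner is derived output only; the stored data stays a list of per-source records rather than one merged string.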

5. Launch-intent authority

There are two different truths:

  • host launch intent
  • live native thread defaults

Rules:

  • provider/model/effort in launch config is launch intent only
  • resumed native thread defaults may differ
  • if they differ, UI must show either:
    • inherited native defaults
    • explicit override pending
    • or forced fresh-thread policy

6. Credential authority

Rules:

  • old Codex lane auth truth and codex-native auth truth must not share one fake readiness source
  • old lane may still use current app-side OPENAI_API_KEY flow
  • codex-native must use only the credential contract actually required by the chosen seam
  • UI must not infer native readiness from old-lane auth success

Phase 0 Capability Matrix

Phase 0 should assume the following unless the spike proves otherwise:

| Capability | Old Codex lane | codex-native spike lane |
| --- | --- | --- |
| Team launch | supported | supported behind flag |
| Transcript-compatible history | supported | required |
| Plugins | unsupported | unsupported in Phase 0 |
| MCP | unsupported or existing-lane-specific | unsupported unless explicitly proven on chosen seam |
| Skills | unsupported or existing-lane-specific | unsupported unless explicitly proven on chosen seam |
| Manual approvals | current lane semantics | unsupported or limited unless explicitly proven |
| Generic interactive prompts | n/a | unsupported in Phase 0 |
| Detached review | current lane semantics | unsupported in Phase 0 |
| Lane-aware status | partial | required |

Practical rule:

  • Phase 0 defaults to conservative capability truth
  • nothing upgrades from unsupported to supported by implication
  • if the live seam only proves diagnostic readiness, capability must remain diagnostic-only

Current Lock Policy

This is now a required Phase 0 rule, not a suggestion.

codex-native may be:

  • visible in runtime status
  • visible in backend options
  • resolved diagnostically

But it must remain:

  • selectable: false
  • non-default
  • non-auto-resolved
  • non-routable into live execution without an explicit execution-lane implementation
  • protected by a targeted runtime error if manually forced too early

Reason:

  • Phase 0 now has honest backend truth, real end-to-end native execution, and transcript projection
  • the remaining lock is now a rollout-policy choice, not a missing-code problem
  • therefore unlocking the lane would still create worse product truth than the current state
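
The lock policy can be sketched as a backend option that is visible but not selectable, plus a guard that fails loudly when execution is forced. The `BackendOption` shape, the error name, and `assertLaneExecutable` are illustrative assumptions, not the actual guard in the client.

```typescript
// Hypothetical option descriptor for selector and status surfaces.
interface BackendOption {
  id: string
  visible: boolean
  selectable: boolean
  isDefault: boolean
}

const CODEX_NATIVE_OPTION: BackendOption = {
  id: 'codex-native',
  visible: true, // may appear in runtime status and backend options
  selectable: false, // but stays locked for normal selection
  isDefault: false,
}

// Targeted guard: refuses live execution on a locked lane with a specific error.
function assertLaneExecutable(option: BackendOption): void {
  if (!option.selectable) {
    const error = new Error(`Backend lane "${option.id}" is locked and cannot be routed into live execution`)
    error.name = 'CodexNativeLaneLockedError'
    throw error
  }
}
```

A dedicated error name lets callers tell "lane is locked by policy" apart from a generic runtime failure.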

Normalized Event Schema

Phase 0 does not need the full future schema.

It does need a small, stable subset with explicit source attribution.

The important distinction is:

  • one minimal schema is already implemented and should now be treated as frozen groundwork
  • a richer schema is still allowed later, but only as an additive expansion

Current minimal schema already frozen in code

Current grounded contract in src/services/codexNative/normalizedEvents.ts:

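```typescript
type CodexNativeNormalizedEvent =
  | {
      // emitted once the native thread exists
      type: 'thread_started'
      threadId: string
    }
  | {
      type: 'turn_started'
    }
  | {
      // assistant message text from a completed item
      type: 'assistant_text'
      itemId: string
      text: string
    }
  | {
      // token usage reported at turn completion
      type: 'usage_updated'
      inputTokens: number
      cachedInputTokens: number
      outputTokens: number
    }
  | {
      type: 'turn_completed'
    }
  | {
      // warning text that stays attributed to its source
      type: 'warning'
      source: 'stderr'
      text: string
    }
  | {
      // raw events the mapper does not understand are preserved, not dropped
      type: 'unsupported_raw_event'
      rawType: string
      payload: unknown
    }
```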

Rules for this already-landed minimal schema:

  • it is sufficient for the raw-exec spike groundwork
  • it is not yet sufficient for final Phase 0 completion
  • it must not be broken or renamed casually while the runner and projector are being wired
  • any richer shape added next must be additive or accompanied by projector updates in the same slice
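
To illustrate how the observed raw events line up with this minimal schema, here is a sketch of a mapper. It is not the code in `jsonlMapper.ts`. The field names `thread_id` and `item.id` are assumptions (the observations above confirm only `item.type`, `item.text`, and the `usage` fields), and the fallback of `0` for a missing token count is an illustrative choice.

```typescript
type CodexNativeNormalizedEvent =
  | { type: 'thread_started'; threadId: string }
  | { type: 'turn_started' }
  | { type: 'assistant_text'; itemId: string; text: string }
  | { type: 'usage_updated'; inputTokens: number; cachedInputTokens: number; outputTokens: number }
  | { type: 'turn_completed' }
  | { type: 'warning'; source: 'stderr'; text: string }
  | { type: 'unsupported_raw_event'; rawType: string; payload: unknown }

function asRecord(value: unknown): Record<string, unknown> | null {
  return typeof value === 'object' && value !== null ? (value as Record<string, unknown>) : null
}

// Illustrative fallback: a missing or non-numeric count becomes 0.
function asCount(value: unknown): number {
  return typeof value === 'number' ? value : 0
}

function mapRawExecEvent(raw: Record<string, unknown>): CodexNativeNormalizedEvent[] {
  const rawType = typeof raw.type === 'string' ? raw.type : 'unknown'
  if (rawType === 'thread.started') {
    const threadId = raw.thread_id // ASSUMPTION: field name not confirmed by this spec
    if (typeof threadId === 'string') return [{ type: 'thread_started', threadId }]
  } else if (rawType === 'turn.started') {
    return [{ type: 'turn_started' }]
  } else if (rawType === 'item.completed') {
    const item = asRecord(raw.item)
    const text = item ? item.text : undefined
    if (item && item.type === 'agent_message' && typeof text === 'string') {
      const itemId = typeof item.id === 'string' ? item.id : '' // ASSUMPTION: `id` field
      return [{ type: 'assistant_text', itemId, text }]
    }
  } else if (rawType === 'turn.completed') {
    const events: CodexNativeNormalizedEvent[] = []
    const usage = asRecord(raw.usage)
    if (usage) {
      events.push({
        type: 'usage_updated',
        inputTokens: asCount(usage.input_tokens),
        cachedInputTokens: asCount(usage.cached_input_tokens),
        outputTokens: asCount(usage.output_tokens),
      })
    }
    events.push({ type: 'turn_completed' })
    return events
  }
  // Anything unrecognized is preserved for later inspection.
  return [{ type: 'unsupported_raw_event', rawType, payload: raw }]
}
```

Returning an array lets one raw `turn.completed` line produce both a usage event and a completion event without widening the schema.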

Target additive schema before Phase 0 can be called complete

This is the richer schema the remaining implementation should converge toward:

```typescript
type NormalizedProviderId = 'anthropic' | 'codex' | 'gemini'
type NormalizedRuntimeLaneId = 'anthropic' | 'gemini-cli-sdk' | 'codex-adapter' | 'codex-api' | 'codex-native'

type NativeThreadStatus =
  | { type: 'not_loaded' }
  | { type: 'idle' }
  | { type: 'active'; activeFlags?: string[] }
  | { type: 'system_error' }

type NativeWarningSource = 'thread' | 'config' | 'process' | 'provisioning'

type NormalizedRuntimeEvent =
  | {
      kind: 'thread_started'
      provider: NormalizedProviderId
      laneId: NormalizedRuntimeLaneId
      threadId: string
      requestId?: string
      status?: NativeThreadStatus
      timestamp: string
    }
  | {
      kind: 'thread_status_changed'
      provider: NormalizedProviderId
      laneId: NormalizedRuntimeLaneId
      threadId: string
      status: NativeThreadStatus
      timestamp: string
    }
  | {
      kind: 'thread_defaults_restored'
      provider: NormalizedProviderId
      laneId: NormalizedRuntimeLaneId
      threadId: string
      model?: string
      reasoningEffort?: string
      timestamp: string
    }
  | {
      kind: 'turn_started'
      provider: NormalizedProviderId
      laneId: NormalizedRuntimeLaneId
      threadId: string
      turnId?: string
      requestId?: string
      timestamp: string
    }
  | {
      kind: 'assistant_text'
      provider: NormalizedProviderId
      laneId: NormalizedRuntimeLaneId
      threadId: string
      requestId?: string
      text: string
      isDelta: boolean
      timestamp: string
    }
  | {
      kind: 'reasoning'
      provider: NormalizedProviderId
      laneId: NormalizedRuntimeLaneId
      threadId: string
      requestId?: string
      text?: string
      timestamp: string
    }
  | {
      kind: 'usage_updated'
      provider: NormalizedProviderId
      laneId: NormalizedRuntimeLaneId
      threadId: string
      requestId?: string
      inputTokens?: number
      outputTokens?: number
      contextWindow?: number
      timestamp: string
    }
  | {
      kind: 'model_rerouted'
      provider: NormalizedProviderId
      laneId: NormalizedRuntimeLaneId
      threadId: string
      requestId?: string
      configuredModel?: string
      effectiveModel?: string
      reasoningEffort?: string
      timestamp: string
    }
  | {
      kind: 'turn_plan_updated'
      provider: NormalizedProviderId
      laneId: NormalizedRuntimeLaneId
      threadId: string
      requestId?: string
      summary?: string
      timestamp: string
    }
  | {
      kind: 'turn_diff_updated'
      provider: NormalizedProviderId
      laneId: NormalizedRuntimeLaneId
      threadId: string
      requestId?: string
      summary?: string
      timestamp: string
    }
  | {
      kind: 'warning_emitted'
      provider: NormalizedProviderId
      laneId: NormalizedRuntimeLaneId
      source: NativeWarningSource
      threadId?: string
      requestId?: string
      message: string
      detail?: string
      timestamp: string
    }
  | {
      kind: 'turn_completed'
      provider: NormalizedProviderId
      laneId: NormalizedRuntimeLaneId
      threadId: string
      requestId?: string
      timestamp: string
    }
  | {
      kind: 'turn_failed'
      provider: NormalizedProviderId
      laneId: NormalizedRuntimeLaneId
      threadId: string
      requestId?: string
      error: string
      timestamp: string
    }
```

Schema rules:

  • every event carries provider and laneId
  • every event is source-attributed
  • thread status and warnings are not hidden inside generic detailMessage
  • requestId is optional on the wire but mandatory once known
  • expansion from the current minimal schema must be additive until projector and fixture coverage are in place

Transcript Projector Contract

Phase 0 projector requirements:

  • produce persisted history that current SessionParser and exact-log readers can parse
  • preserve request-correlation fields where available
  • preserve board-task carrier fields
  • never require claude_team to understand raw Codex item shapes

Projector rules:

  1. assistant_text
    • may append or extend assistant transcript content
  2. usage_updated
    • does not need to become a visible assistant row
    • may project into additive metadata or side-channel metadata
    • must not be silently dropped if it is the only authoritative usage source
  3. thread_status_changed
    • does not become canonical transcript history by default
    • stays in normalized/status layer
  4. warning_emitted
    • thread and config warnings should be projectable to later UI/debug surfaces
    • do not force them into fake assistant rows
  5. thread_defaults_restored
    • must not rewrite old launch config
    • must remain explicit metadata
  6. model_rerouted
    • must not overwrite configured model copy invisibly
    • may project to normalized-only metadata in Phase 0 if transcript row shape has no truthful home
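
The first four projector rules can be sketched as a fold over normalized events. This is an illustration under assumptions: `ProjectedTurn` is a hypothetical intermediate shape, not the persisted transcript row format, and treating non-delta text as a new paragraph is one possible reading of "append or extend".

```typescript
// Reduced event subset, enough to show the projection rules.
type ProjectableEvent =
  | { kind: 'assistant_text'; text: string; isDelta: boolean }
  | { kind: 'usage_updated'; inputTokens?: number; outputTokens?: number }
  | { kind: 'thread_status_changed'; status: string }
  | { kind: 'warning_emitted'; source: string; message: string }

// Hypothetical projection result; the real persisted row format is not shown here.
interface ProjectedTurn {
  assistantText: string // the only part that becomes a visible assistant row
  usage: { inputTokens?: number; outputTokens?: number } | null // side-channel metadata
  warnings: { source: string; message: string }[] // kept out of assistant rows
  statusTrail: string[] // status layer only, not canonical transcript history
}

function projectTurn(events: ProjectableEvent[]): ProjectedTurn {
  const projected: ProjectedTurn = { assistantText: '', usage: null, warnings: [], statusTrail: [] }
  for (const event of events) {
    switch (event.kind) {
      case 'assistant_text':
        // Deltas extend the current text; full messages start a new paragraph.
        projected.assistantText =
          event.isDelta || projected.assistantText === ''
            ? projected.assistantText + event.text
            : `${projected.assistantText}\n\n${event.text}`
        break
      case 'usage_updated':
        // Usage is retained as metadata rather than silently dropped.
        projected.usage = { inputTokens: event.inputTokens, outputTokens: event.outputTokens }
        break
      case 'thread_status_changed':
        projected.statusTrail.push(event.status)
        break
      case 'warning_emitted':
        projected.warnings.push({ source: event.source, message: event.message })
        break
    }
  }
  return projected
}
```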

Raw Exec Spike Contract

The spike runner must prove all of the following:

  • start a Codex-native session in a chosen working directory
  • pass native credentials in the seam-native way
  • capture JSONL events
  • map them to normalized events
  • persist transcript-compatible projection
  • record:
    • thread id
    • executable identity
    • whether run was ephemeral
    • whether completion backfill existed
    • whether final usage/model truth came from live stream or explicit seam-owned completion path

The spike runner must explicitly capture these facts:

  • executable source:
    • bundled
    • external CLI
  • executable version:
    • exact reported version string when available
  • runtime identity:
    • backend lane id
    • executable source
    • executable version
  • credential source:
    • native API-key path
    • or explicit unsupported state
  • interactive capability:
    • unsupported
    • limited
    • proven
  • final history completeness:
    • live-only
    • backfilled
    • explicit hydration required

Current implementation note:

  • the spec is already grounded by one live local run
  • the next required step is to turn that manual seam proof into a reusable runner contract
  • until that happens, codex-native remains a locked diagnostic lane
  • current code already enforces this lock from both status/selectability truth and live client guardrails

Status Contract

Phase 0 status payload changes must allow claude_team to say all of the following truthfully:

  • lane exists but is not selected
  • lane is selected but not verified
  • lane is resolved but degraded
  • lane is running but the thread is not loaded
  • lane process is alive but the thread is in systemError

Minimum required additions for the spike path:

  • keep selectedBackendId
  • keep resolvedBackendId
  • keep availableBackends
  • keep native executable identity in diagnostic or detail truth once the runner exists
  • do not let degraded transport erase backend truth
  • keep thread health separate from provider-global health

Current implementation note:

  • backend-level status truth is already in place
  • thread-level status truth is not
  • therefore current Phase 0 must still describe codex-native as execution-locked

If native thread status is unavailable on the chosen seam:

  • surface unknown or degraded
  • do not synthesize active
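
One way to keep thread health from being guessed out of process truth is to derive it only from what the seam reports. The sketch below assumes a hypothetical `LaneStatus` shape and `deriveThreadHealth` helper. The three backend fields come from the list above; everything else is illustrative.

```typescript
// Hypothetical status shape; only the three backend fields are named by the spec.
type ThreadHealth = "active" | "notLoaded" | "systemError" | "unknown" | "degraded";

interface LaneStatus {
  selectedBackendId: string | null;
  resolvedBackendId: string | null;
  availableBackends: string[];
  processAlive: boolean; // provider/process truth, kept apart from thread truth
  threadHealth: ThreadHealth;
}

// Thread health comes only from what the seam reported.
// processAlive is deliberately NOT an input, so it can never be promoted to "active".
function deriveThreadHealth(
  reported: ThreadHealth | undefined,
  transportDegraded: boolean,
): ThreadHealth {
  if (reported === undefined) {
    return transportDegraded ? "degraded" : "unknown";
  }
  return reported;
}

// Degraded transport changes thread health but leaves backend truth untouched.
function applyThreadReport(
  status: LaneStatus,
  reported: ThreadHealth | undefined,
  transportDegraded: boolean,
): LaneStatus {
  return { ...status, threadHealth: deriveThreadHealth(reported, transportDegraded) };
}
```

Because `applyThreadReport` only replaces `threadHealth`, a degraded transport cannot erase `selectedBackendId`, `resolvedBackendId`, or `availableBackends`.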

Warning Contract

Phase 0 UI must be able to distinguish:

  • startup/config warning
  • native thread warning
  • provisioning/process warning

Allowed compromise:

  • a single banner may summarize all warning presence

Not allowed:

  • one combined warning string with no source attribution anywhere
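
The allowed compromise and the disallowed case can be illustrated with a banner that summarizes warning presence while keeping per-source detail. This is a sketch under assumed names (`AttributedWarning`, `buildBanner`, and the source labels are hypothetical), not the actual UI model.

```typescript
// Hypothetical labels for the three warning sources the UI must distinguish.
type WarningSource = "startup-config" | "native-thread" | "provisioning-process";

interface AttributedWarning {
  source: WarningSource;
  message: string;
}

interface WarningBanner {
  summary: string; // a single banner line is allowed
  bySource: Record<WarningSource, string[]>; // attribution must survive somewhere
}

function buildBanner(warnings: AttributedWarning[]): WarningBanner | null {
  if (warnings.length === 0) return null;
  const bySource: Record<WarningSource, string[]> = {
    "startup-config": [],
    "native-thread": [],
    "provisioning-process": [],
  };
  for (const warning of warnings) {
    bySource[warning.source].push(warning.message);
  }
  const noun = warnings.length === 1 ? "warning" : "warnings";
  return { summary: `${warnings.length} ${noun}`, bySource };
}
```

The summary line alone would violate the contract. The `bySource` map is what keeps each warning attributable.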

Launch Intent vs Native Defaults Contract

Phase 0 must choose one of these policies and implement it explicitly:

  1. fresh-thread only
  2. resume with inherited native defaults
  3. resume but force explicit override

Default for the spike:

  • support resume only behind a flag
  • if resumed defaults differ from launch intent, keep that drift explicit

Minimum required surfaced truth:

  • requested launch model/effort
  • effective native defaults after resume, if known
  • warning or degraded state when they differ
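
The surfaced truth above amounts to a comparison between requested and effective settings. A minimal sketch follows, assuming hypothetical `LaunchSettings` and `compareLaunchIntent` names and treating model and effort as opaque strings.

```typescript
// Hypothetical shapes; the spec only requires that both sides and any drift be visible.
interface LaunchSettings {
  model: string;
  effort: string;
}

type LaunchDrift =
  | { kind: "effective-unknown"; requested: LaunchSettings }
  | { kind: "match"; requested: LaunchSettings; effective: LaunchSettings }
  | { kind: "drift"; requested: LaunchSettings; effective: LaunchSettings; warning: string };

function compareLaunchIntent(
  requested: LaunchSettings,
  effective: LaunchSettings | null, // null when the seam does not expose resumed defaults
): LaunchDrift {
  if (effective === null) {
    return { kind: "effective-unknown", requested };
  }
  const same =
    requested.model === effective.model && requested.effort === effective.effort;
  if (same) {
    return { kind: "match", requested, effective };
  }
  const warning =
    `Resumed thread uses ${effective.model}/${effective.effort}, ` +
    `but launch requested ${requested.model}/${requested.effort}.`;
  return { kind: "drift", requested, effective, warning };
}
```

The `effective-unknown` case is kept separate from `match` so that missing information is never presented as agreement.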

Credential Routing Contract

Phase 0 must not reuse old-lane readiness assumptions.

Rules:

  • codex-native readiness is computed only from the chosen seam's credential contract
  • old Codex API-key success does not imply native-lane readiness
  • missing or wrong native credentials must degrade only the native lane, not the entire provider story
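
These rules can be sketched as a readiness computation in which each lane reads only its own credential probe. The names below (`CredentialProbe`, `computeReadiness`, `oldLane`, `nativeLane`) are hypothetical and stand in for whatever the chosen seam's credential contract actually exposes.

```typescript
// Hypothetical shapes for illustration only.
interface LaneReadiness {
  ready: boolean;
  reason: string;
}

interface CredentialProbe {
  oldLaneApiKeyOk: boolean;
  nativeCredentialOk: boolean;
}

interface ProviderReadiness {
  oldLane: LaneReadiness;
  nativeLane: LaneReadiness;
}

function computeReadiness(probe: CredentialProbe): ProviderReadiness {
  return {
    // The old lane is judged only by the old-lane probe.
    oldLane: probe.oldLaneApiKeyOk
      ? { ready: true, reason: "old-lane API key verified" }
      : { ready: false, reason: "old-lane API key missing or invalid" },
    // The native lane never inherits old-lane success.
    nativeLane: probe.nativeCredentialOk
      ? { ready: true, reason: "native credential verified" }
      : { ready: false, reason: "native credential missing or invalid" },
  };
}
```

With this shape a failed native probe degrades only `nativeLane`, and the old lane's readiness is left as it was.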

Test Matrix

Minimum must-exist tests for Phase 0:

agent_teams_orchestrator

  • codex-native-api-key-routing
  • native-binary-identity-metadata
  • exec-headless-rejects-interactive-server-requests
  • live-turn-stream-vs-hydrated-history
  • thread-system-error-vs-process-alive
  • thread-not-loaded-vs-runtime-still-running
  • thread-warning-vs-config-warning-truth
  • resume-persisted-thread-defaults-vs-launch-intent
  • resume-model-switch-warning-vs-runtime-copy
  • ephemeral-turn-completed-without-backfill
  • non-ephemeral-completed-turn-backfill
  • request-chain-invariants

claude_team

  • runtime-selector-visible-but-not-ready
  • headless-lane-capability-copy
  • native-lane-auth-copy
  • exact-log-hydrated-after-live-stream
  • approval-cleared-on-lifecycle
  • native-thread-status-vs-process-copy
  • warning-channel-copy
  • launch-intent-vs-native-defaults-copy

Required Evidence Package For Phase 0 Sign-off

Phase 0 should not be declared complete from code inspection alone.

Minimum sign-off evidence must include all of the following:

  1. one real successful codex exec-backed native run through the orchestrator lane
  2. persisted transcript-compatible output from that run
  3. recorded native executable identity for that run:
    • source
    • exact version string when available
  4. parser proof that current claude_team transcript readers still parse it
  5. exact-log or replay proof for both:
    • --ephemeral
    • non-ephemeral or explicit replacement hydration path
  6. one degraded-path proof showing native lane failure does not erase old-lane fallback truth
  7. one status proof showing process-alive does not masquerade as a healthy native thread
  8. one warning proof showing config warnings and native thread warnings remain attributable
  9. green targeted test runs for:
    • existing old-lane fallback/regression coverage
    • new native-lane runner/mapper/projector coverage

Practical rule:

  • if any one of the nine items above is missing, Phase 0 is still implementation-in-progress, not sign-off ready
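
The practical rule is mechanical enough to express as a checklist evaluation. The sketch below uses hypothetical item labels that paraphrase the nine evidence items. It is not an existing harness API.

```typescript
// Hypothetical labels, one per numbered evidence item above, in the same order.
const SIGN_OFF_ITEMS = [
  "real-native-run",
  "persisted-transcript-output",
  "executable-identity",
  "parser-proof",
  "exact-log-or-replay-proof",
  "degraded-path-proof",
  "status-proof",
  "warning-proof",
  "green-targeted-tests",
] as const;

type SignOffItem = (typeof SIGN_OFF_ITEMS)[number];

type SignOffVerdict =
  | { state: "sign-off-ready" }
  | { state: "implementation-in-progress"; missing: SignOffItem[] };

// Any single missing item keeps Phase 0 in progress.
function evaluateSignOff(collected: ReadonlySet<SignOffItem>): SignOffVerdict {
  const missing = SIGN_OFF_ITEMS.filter((item) => !collected.has(item));
  return missing.length === 0
    ? { state: "sign-off-ready" }
    : { state: "implementation-in-progress", missing };
}
```

Returning the missing items, rather than only a boolean, makes it easier to see which artifact still has to be recorded.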

Recommended evidence placement:

  • keep sign-off artifacts close to this doc under docs/research/ or another explicit repo-visible location
  • do not rely only on terminal memory or one-off local runs as the sole proof of completion

Minimum Sign-off Command Package

Phase 0 sign-off should include a reproducible command package, not only prose.

Minimum command set:

In agent_teams_orchestrator

  • bun test src/services/runtimeBackends/codexBackendResolver.test.ts
  • bun test src/services/runtimeBackends/registry.agentTeams.test.ts
  • bun test src/services/codexNative/execRunner.test.ts
  • bun test src/services/codexNative/jsonlMapper.test.ts
  • bun test src/services/codexNative/transcriptProjector.test.ts
  • bun test src/services/codexNative/statusAuthority.test.ts
  • bun test src/services/codexNative/turnExecutor.test.ts
  • bun test src/services/codexNative/signOffHarness.test.ts
  • git diff --check

In claude_team

  • pnpm exec vitest run test/main/ipc/configValidation.test.ts
  • pnpm exec vitest run test/main/services/runtime/ProviderConnectionService.test.ts
  • pnpm exec vitest run test/main/services/runtime/providerAwareCliEnv.test.ts
  • pnpm exec vitest run test/main/services/runtime/ClaudeMultimodelBridgeService.test.ts
  • pnpm exec vitest run test/renderer/components/runtime/providerConnectionUi.test.ts
  • pnpm exec vitest run test/renderer/components/runtime/ProviderRuntimeSettingsDialog.test.ts
  • pnpm exec vitest run test/renderer/components/cli/CliStatusVisibility.test.ts
  • pnpm exec vitest run test/main/utils/jsonl.test.ts
  • pnpm exec vitest run test/main/services/parsing/SessionParser.test.ts
  • pnpm exec vitest run test/main/services/team/BoardTaskExactLogStrictParser.test.ts
  • git diff --check

Manual native-lane proof

  • one real codex exec --json run through the chosen orchestrator seam
  • bun run ./scripts/codex-native-phase0-signoff.ts --cwd /tmp --prompt 'Reply only with OK' --ephemeral
  • bun run ./scripts/codex-native-phase0-signoff.ts --cwd /tmp --prompt 'Reply only with OK' --persistent
  • one recorded native executable identity proof:
    • source
    • version string when available
  • one explicit --ephemeral proof
  • one non-ephemeral or explicit replacement-hydration proof
  • one degraded-lane proof that old Codex fallback still stays truthful

Rule:

  • if the command package is not written down and reproducible, the evidence package is incomplete even if one local run looked good

Tests Already In Place

The following tests already exist and should remain green while Phase 0 continues:

agent_teams_orchestrator

  • src/services/runtimeBackends/codexBackendResolver.test.ts
  • src/services/runtimeBackends/registry.agentTeams.test.ts
  • src/services/codexNative/execRunner.test.ts
  • src/services/codexNative/jsonlMapper.test.ts
  • src/services/codexNative/transcriptProjector.test.ts
  • src/services/codexNative/statusAuthority.test.ts
  • src/services/codexNative/turnExecutor.test.ts
  • src/services/codexNative/signOffHarness.test.ts

claude_team

  • test/main/services/parsing/CodexNativePhase0Smoke.test.ts
  • test/main/ipc/configValidation.test.ts
  • test/main/utils/jsonl.test.ts
  • test/main/services/parsing/SessionParser.test.ts
  • test/main/services/runtime/ProviderConnectionService.test.ts
  • test/main/services/runtime/providerAwareCliEnv.test.ts
  • test/main/services/runtime/ClaudeMultimodelBridgeService.test.ts
  • test/main/services/team/BoardTaskExactLogStrictParser.test.ts
  • test/renderer/components/runtime/providerConnectionUi.test.ts
  • test/renderer/components/runtime/ProviderRuntimeSettingsDialog.test.ts
  • test/renderer/components/cli/CliStatusVisibility.test.ts

Exact Remaining Work Before Phase 0 Can Be Called Complete

There is no remaining required Phase 0 code work.

The remaining steps are rollout-policy decisions:

  1. decide whether to keep the lane locked through early internal rollout
  2. if unlock is proposed later, make that a separate rollout decision rather than a hidden consequence of Phase 0 completion

Remaining Implementation Surface From Today

The original Phase 0 estimate was:

  • agent_teams_orchestrator: 450-1100 lines
  • claude_team: 180-450 lines
  • tests: 250-700 lines

That estimate still looks directionally correct for total Phase 0 scope.

But from the current implementation state, the remaining required surface is now:

  • agent_teams_orchestrator: 0 lines required for Phase 0
  • claude_team: 0 lines required for Phase 0
  • tests and fixtures: 0 lines required for Phase 0

Remaining total from today:

  • no required Phase 0 code remains
  • rollout decisions remain separate from implementation completion

Practical reading:

  • the big architecture uncertainty is mostly resolved
  • execution wiring, projection, parser truth, and proof fixtures are already landed
  • the remaining work is rollout policy only

No-Go Rules For Starting Phase 1 Code

Do not move past Phase 0 if any of these remain ambiguous:

  • whether the chosen seam is headless-limited
  • whether final history completeness depends on seam-specific backfill
  • whether thread status is authoritative or only guessed from process truth
  • whether native thread warnings can be attributed separately from config and provisioning warnings
  • whether resumed native defaults can diverge from launch intent without visible warning
  • whether native credentials are routed independently from the old Codex lane

Estimated Implementation Surface

For Phase 0 only:

  • agent_teams_orchestrator: 450-1100 lines
  • claude_team: 180-450 lines
  • tests: 250-700 lines

Total Phase 0 expectation:

  • roughly 900-2250 lines

That is intentionally smaller than the broader first-wave rollout.

Practical Rule

Phase 0 is successful if it proves one thing:

  • we can run a real codex-native lane and keep our current transcript/UI world honest without pretending Codex is just another Anthropic-shaped transport.