agent-ecosystem/docs/research/codex-app-server-model-catalog-plan.md

# Codex App-Server Model Catalog Plan

**Date**: 2026-04-21
**Status**: implementation complete in feature worktrees, pending final review/commit
**Worktree**: `/Users/belief/dev/projects/claude/claude_team_codex_model_catalog_plan`
**Branch**: `spike/codex-model-catalog-plan`
**Primary repo**: `claude_team`
**Secondary repo worktree**: `/Users/belief/dev/projects/claude/agent_teams_orchestrator_codex_native_spike`
**Architecture reference**: [FEATURE_ARCHITECTURE_STANDARD.md](../FEATURE_ARCHITECTURE_STANDARD.md)

## Executive Summary

Codex model selection should move from hardcoded local lists to the official Codex app-server `model/list` catalog.

Chosen implementation:

- Add a dedicated `src/features/codex-model-catalog` feature in `claude_team`.
- Use `codex app-server` JSON-RPC `model/list` as the primary source for Codex models.
- Keep the existing static Codex catalog only as a bounded fallback when app-server is unavailable.
- Add rich, additive model metadata to `CliProviderStatus` while keeping `models: string[]` for backwards compatibility.
- Use per-model `supportedReasoningEfforts` and `defaultReasoningEffort` for the Codex model picker and launch validation.
- Keep Anthropic and Gemini behavior unchanged by default.
- Update `agent_teams_orchestrator` so Codex launches pass reasoning effort through Codex config key `model_reasoning_effort`, not through an invented `--effort` flag.

Decision score:

- `🎯 9   🛡️ 9   🧠 6`
- estimated implementation size: `1200-2400` lines across `claude_team` and `agent_teams_orchestrator`

Why this is the safest path:

- It follows the real Codex client contract instead of chasing static releases.
- It solves future model releases like `gpt-5.5` without an app release, as long as Codex app-server already exposes the model.
- It avoids breaking Anthropic by making the new catalog contract additive and provider-scoped.
- It handles `xhigh` correctly as Codex-specific reasoning effort, not as Anthropic `max`.

Current implementation state:

- `claude_team` has the dedicated Codex model catalog feature, app-server JSON-RPC client, static fallback, provider status integration, Codex model picker integration, provider-aware effort UI, launch validation, launch identity persistence, and targeted tests.
- `agent_teams_orchestrator_codex_native_spike` exposes runtime capabilities for dynamic Codex models and Codex reasoning config pass-through, and its Codex native exec runner passes effort through `-c model_reasoning_effort="value"`.
- Anthropic remains isolated from Codex-only effort values. Anthropic launch UI still uses `low | medium | high`; Codex can use per-model `minimal | low | medium | high | xhigh` only where catalog/runtime policy allows it.
- Future Codex app-server models can appear immediately in UI. Launch is allowed only when the local runtime declares dynamic Codex model support; otherwise they remain visible with upgrade/policy copy instead of failing late during spawn.
- `Default` Codex selection is resolved to a concrete model immediately before provisioning and stored as additive launch identity metadata.
- The remaining work before merge is review/signoff, not more architecture discovery.

## Sources And Verification

Official sources checked:

- [Codex App Server](https://developers.openai.com/codex/app-server)
- [Codex CLI command line options](https://developers.openai.com/codex/cli/reference)
- [Codex Configuration Reference](https://developers.openai.com/codex/config-reference)

Important official facts:

- `model/list` is explicitly intended for rendering model and personality selectors.
- `model/list` returns `id`, `model`, `displayName`, `hidden`, `defaultReasoningEffort`, `supportedReasoningEfforts`, `inputModalities`, `supportsPersonality`, `isDefault`, `upgrade`, and `upgradeInfo`.
- `includeHidden: false` returns picker-visible models by default.
- `codex exec` has `--model` and `-c, --config key=value`.
- `codex exec` does not expose a first-class `--effort` flag.
- Codex config key `model_reasoning_effort` supports `minimal | low | medium | high | xhigh`.
- `xhigh` is model-dependent.
- `config/read` exists in app-server and returns effective configuration after configuration layering.
- Codex loads user config from `~/.codex/config.toml` and can also load project-scoped `.codex/config.toml` only for trusted projects.
- `model_catalog_json` can override the model catalog, including profile-level overrides.
- `codex exec` supports `--cd` and `--profile`, and `-c key=value` overrides take precedence for one invocation.

Local probe:

- binary: `codex-cli 0.117.0`
- method: `codex app-server` over JSON-RPC stdio
- transport: newline-delimited JSON-RPC over stdio, not `Content-Length` framing
- request: `model/list` with `{ "limit": 20, "includeHidden": false }`
- result: 8 visible models, `gpt-5.4` marked default, `nextCursor: null`
- visible models returned: `gpt-5.4`, `gpt-5.2-codex`, `gpt-5.1-codex-max`, `gpt-5.4-mini`, `gpt-5.3-codex`, `gpt-5.3-codex-spark`, `gpt-5.2`, `gpt-5.1-codex-mini`
- `xhigh` is already returned for most models.
- `gpt-5.1-codex-mini` only returned `medium | high`, so effort options must be per-model.
- `gpt-5.3-codex-spark` returned default effort `high`, so default effort must not be global.
- `codex exec --help` locally confirms `--cd`, `--profile`, `--model`, `--oss`, `--local-provider`, and repeatable `-c key=value`.
- local help confirms `--oss` is equivalent to `-c model_provider=oss`, so provider scope can differ from subscription-backed OpenAI Codex if not guarded.
- live `config/read` probe returned `{ config, origins }`.
- live `config/read` probe requires `params` object; missing `params` returns JSON-RPC error `-32600`.
- live `config/read` probe accepted `{ cwd }` and `{ profile }` without error, so the implementation should feature-detect and test scoped reads instead of assuming only global config.
- final live smoke on this worktree confirmed `model/list` returns 8 visible models, default `gpt-5.4`, `xhigh` for most models, and `medium | high` only for `gpt-5.1-codex-mini`.

Combined app-server session probe:

- one initialized app-server process successfully handled `account/read`, `account/rateLimits/read`, and `model/list` sequentially
- `account/read` returned a ChatGPT account shape in the local environment
- `account/rateLimits/read` returned `primary.windowDurationMins = 300` and `secondary.windowDurationMins = 10080`
- `model/list` returned the same 8 visible models in that same session
- conclusion: provider refresh should use a combined control-plane session when it needs account, limits, and catalog truth

## Lowest-Confidence Areas And Decisions

### 1. Auth-scoped catalog truth

`🎯 7   🛡️ 9   🧠 6`
Estimated implementation impact: `180-350` lines

Uncertainty:

- app-server `model/list` may return different catalogs depending on active Codex auth state, account plan, org policy, API-key mode, or future Codex rollout flags.
- The local probe only proves one logged-in environment, not all account modes.

Decision:

- treat Codex model catalog as auth-scoped, not global
- cache key must include binary path, Codex home, preferred auth mode, effective auth mode, managed account stable identity when available, and API-key availability source
- never reuse a ChatGPT-account catalog as API-key-mode catalog
- never reuse an API-key-mode catalog as ChatGPT-account catalog
- when auth mode changes, keep previous catalog visible only as stale UI while refresh is in flight, then replace it

Implementation rule:

```text
catalogCacheKey =
  binaryPath
  + binaryVersion
  + codexHome
  + preferredAuthMode
  + effectiveAuthMode
  + managedAccountHash or "no-chatgpt-account"
  + apiKey.source or "no-api-key"
```

The hash should use a per-process salt and should not be persisted. Do not persist raw email solely for catalog cache.

### 2. Default model determinism

`🎯 8   🛡️ 9   🧠 6`
Estimated implementation impact: `220-420` lines

Uncertainty:

- current UI can represent model as empty string meaning `Default`
- Codex app-server default can change after a Codex release
- launch logs, relaunch, replay, and team metadata need to stay explainable

Decision:

- keep `Default` as a UI selection
- resolve `Default` to a concrete `resolvedLaunchModel` immediately before launch
- persist both user selection and resolved runtime truth in launch metadata
- never silently rewrite old team config from one concrete model to another
- if a team stored `Default`, relaunch should show that it will resolve to the current Codex default before launch

Required persisted launch identity:

```ts
export interface ProviderModelLaunchIdentity {
  providerId: TeamProviderId;
  providerBackendId: TeamProviderBackendId | null;
  selectedModel: string | null;
  selectedModelKind: 'default' | 'explicit';
  resolvedLaunchModel: string;
  catalogId: string | null;
  catalogSource: 'app-server' | 'static-fallback' | 'unavailable';
  catalogFetchedAt: string | null;
  selectedEffort: string | null;
  resolvedEffort: string | null;
}
```

This identity should be written into exact logs and launch-derived metadata. It should not replace existing fields in Phase 1, but it should become the canonical explanation layer for Codex relaunch/replay.

### 3. Effort transport through orchestrator

`🎯 8   🛡️ 9   🧠 5`
Estimated implementation impact: `180-320` lines

Uncertainty:

- Agent Teams exposes a generic `--effort` concept today
- Codex CLI does not expose `--effort`
- Codex uses config key `model_reasoning_effort`

Decision:

- UI and main process may accept provider-aware effort strings
- orchestrator public Agent Teams CLI can continue accepting `--effort`
- Codex executor must translate Codex effort to `codex exec -c model_reasoning_effort='"value"'`
- Anthropic executor must not see Codex-only effort values
- Codex executor must not see Anthropic `max`

No implementation phase may ship `xhigh` as selectable until this pass-through is tested.

### 4. Catalog availability vs team-agent safety policy

`🎯 8   🛡️ 9   🧠 5`
Estimated implementation impact: `160-280` lines

Uncertainty:

- app-server `model/list` says a model is available to Codex
- our team-agent contract can still make a model unsafe for Agent Teams if it breaks task/reply/bootstrap conventions
- current UI has local disabled policy for `gpt-5.3-codex-spark`, `gpt-5.2-codex`, and `gpt-5.1-codex-mini`

Decision:

- model catalog answers "can Codex offer this model"
- team policy answers "can Agent Teams safely launch this model"
- keep these as separate layers
- do not remove current disabled policies just because app-server returns a model
- show clear disabled copy: `Available in Codex, disabled for Agent Teams`
- disabled models can still display catalog metadata and effort metadata for transparency

### 5. Codex binary version and app-server method compatibility

`🎯 7   🛡️ 9   🧠 6`
Estimated implementation impact: `220-420` lines

Uncertainty:

- `codex app-server` is documented as an app-server integration surface, but local users can have older Codex binaries.
- `model/list` may be missing, renamed, or return a narrower shape in older binaries.
- Current `JsonRpcStdioClient` collapses JSON-RPC errors to `Error(message)`, which loses method, code, and structured details needed to distinguish `method not found` from auth/network/timeout.
- Current `CodexBinaryResolver` caches only binary path, not binary version.

Decision:

- make binary version part of catalog cache identity
- add structured JSON-RPC error metadata before implementing catalog fallback
- treat `method not found` as `static-fallback`, not as account failure
- treat malformed model rows as catalog degradation, not app-server runtime failure
- clear catalog cache when resolved Codex binary path or version changes

Required implementation detail:

```ts
export class JsonRpcRequestError extends Error {
  readonly method: string;
  readonly code: number | null;
  readonly details: unknown;
}
```

The app-server model client should classify:

- `method_not_found`: fallback to static catalog and show upgrade hint
- `timeout`: stale cache if available, then fallback
- `malformed_response`: fallback plus diagnostics
- `process_exit`: shared app-server failure for all sub-results in combined snapshot
- `auth_required`: account/read decides auth truth; model/list must not invent auth truth

### 6. `auto` auth resolution for model catalog

`🎯 7   🛡️ 9   🧠 6`
Estimated implementation impact: `180-320` lines

Uncertainty:

- UI lets users pick `auto`, `chatgpt`, or `api_key`.
- Catalog can differ between ChatGPT subscription and API key.
- The model picker must preview the catalog for the mode that launch will actually use, not only the configured preference.

Decision:

- `preferredAuthMode=auto` is not a catalog scope by itself
- resolve `auto` into `effectiveAuthMode` using the same readiness logic as launch
- catalog request should be scoped to the effective launch mode
- Provider Settings can show both preference and effective catalog scope when they differ
- if effective mode flips from ChatGPT to API key because ChatGPT becomes unavailable, keep stale ChatGPT catalog visually stale and refresh API-key catalog

UX copy rule:

- do not say `Detected from OPENAI_API_KEY` as the primary model catalog source when ChatGPT account is the effective mode
- show API-key availability only as fallback/secondary when selected auth is ChatGPT or auto resolves to ChatGPT

### 7. App-server notifications and refresh cadence

`🎯 8   🛡️ 8   🧠 5`
Estimated implementation impact: `160-260` lines

Uncertainty:

- account login flow has notifications
- current docs and local probe do not establish a dedicated model-catalog changed notification
- keeping a long-lived app-server just for model catalog would increase lifecycle complexity

Decision:

- do not introduce a long-lived model catalog subscription in this rollout
- use short-lived app-server sessions for refresh
- trigger catalog refresh after login success, logout, auth mode change, API-key source change, manual refresh, and provider status refresh
- do not poll `model/list` aggressively from renderer
- use `10 minute` success TTL and stale cache for UI continuity

If a future app-server release adds model catalog notifications, integrate them later behind the catalog feature port without changing renderer contracts.

### 8. Backup, restore, and relaunch compatibility

`🎯 7   🛡️ 9   🧠 7`
Estimated implementation impact: `240-520` lines

Uncertainty:

- team launch metadata already persisted provider/model/effort/backend in several places
- adding dynamic defaults and resolved model identity can make old backups ambiguous
- old teams may contain no `modelCatalog` metadata

Decision:

- new `ProviderModelLaunchIdentity` is additive
- old teams without it remain readable
- relaunch derives missing identity from existing provider/model/effort/backend fields
- restore does not require the old catalog to be available
- if restored explicit model is missing from current catalog, UI preserves the explicit model with a warning instead of silently replacing it with current default
- if restored model was `Default`, relaunch preview resolves it against current catalog and says so before launch

Migration rule:

```text
old explicit model -> selectedModelKind="explicit", resolvedLaunchModel=old model
old empty model -> selectedModelKind="default", resolvedLaunchModel=current default at next launch
missing effort -> selectedEffort=null, resolvedEffort=current model default at next launch
```

### 9. UI and orchestrator version skew

`🎯 7   🛡️ 10   🧠 7`
Estimated implementation impact: `280-620` lines

Uncertainty:

- `claude_team` and `agent_teams_orchestrator` can be updated at different times.
- UI can learn about `xhigh`, `minimal`, or a future model like `gpt-5.5` before the installed orchestrator can launch it safely.
- The current orchestrator static Codex helpers can reject a model that Codex app-server already exposed.

Decision:

- catalog visibility and launch capability are separate contracts
- UI may display app-server catalog metadata as soon as it is available
- UI must not enable launch controls that require new orchestrator behavior until runtime capability says that behavior exists
- provider-explicit Codex model strings can be accepted only after orchestrator declares dynamic Codex model support
- Codex `xhigh` can be shown as metadata before Phase 4, but it is disabled for launch until Codex effort pass-through is available

Required runtime capability contract:

```ts
export interface ProviderRuntimeCapabilities {
  providerId: TeamProviderId;
  codex?: {
    supportsDynamicAppServerModels: boolean;
    supportsCodexReasoningEffortConfig: boolean;
    supportedCodexReasoningEfforts: Array<'minimal' | 'low' | 'medium' | 'high' | 'xhigh'>;
    acceptsProviderExplicitFutureModels: boolean;
  };
}
```

Compatibility rule:

```text
catalog says model/effort exists
+ team policy says model is not disabled
+ runtime capability says launch path supports it
= launch control enabled
```

If any part is missing, the picker can still display the model, but launch must be disabled with explicit copy.

Recommended copy:

- `Available in Codex, waiting for Agent Teams runtime support`
- `This Codex effort is visible in Codex, but this Agent Teams runtime cannot launch it yet`
- `Upgrade the Agent Teams runtime to use this model`

This avoids a bad state where the user selects `xhigh` successfully in UI and then gets a late `codex exec` failure.

### 10. Future model policy, including `gpt-5.5`

`🎯 8   🛡️ 8   🧠 6`
Estimated implementation impact: `240-520` lines

Uncertainty:

- app-server can expose a new model immediately after OpenAI releases it.
- the user goal is that new Codex models appear without us shipping a new static list.
- Agent Teams still needs a safety layer so one unexpected model row does not break team launch flows.

Top 3 policies:

1. Allow every app-server-visible model immediately: `🎯 8   🛡️ 5   🧠 3`, `80-180` lines. This best solves future releases, but it can route unverified models into team launch without product copy or rollback clarity.
2. Show every app-server-visible model immediately, launch with capability gate plus "new model" warning: `🎯 9   🛡️ 8   🧠 5`, `240-520` lines. This keeps future models visible without app releases, but still blocks only real launch incompatibilities.
3. Hide or disable unknown models until a code release updates policy: `🎯 4   🛡️ 9   🧠 2`, `60-120` lines. This is safe but defeats the reason to use `model/list`.

Chosen policy: option 2.

Implementation rule:

- app-server-visible, non-hidden models appear in the picker immediately
- known disabled Agent Teams models remain disabled
- new unknown models are selectable only if runtime capabilities support dynamic Codex models
- new unknown models get a `New from Codex catalog` note until a successful launch or explicit policy promotion marks them `verified`
- if the new model does not expose usable text input or any supported effort we can launch, it is shown but disabled
- hidden models are never introduced into new-team pickers by default

Policy statuses:

```ts
export type CodexTeamModelPolicyStatus =
  | 'verified'
  | 'new-from-codex-catalog'
  | 'disabled-for-agent-teams'
  | 'requires-runtime-upgrade'
  | 'missing-from-current-catalog';
```

This means `gpt-5.5` can appear the day app-server returns it, but the UI will not pretend the full Agent Teams launch path is verified unless the local runtime can actually handle provider-explicit dynamic Codex models.

### 11. Hidden, upgraded, and persisted models

`🎯 8   🛡️ 9   🧠 5`
Estimated implementation impact: `160-340` lines

Uncertainty:

- official docs say `includeHidden: false` returns picker-visible models by default.
- persisted teams can reference a model that later becomes hidden, upgraded, renamed, or unavailable.
- app-server exposes `upgrade` and `upgradeInfo`, but we do not know every future migration shape.

Decision:

- normal picker uses `includeHidden: false`
- if a persisted explicit Codex model is not found in the visible catalog, run one scoped refresh with `includeHidden: true`
- if hidden lookup finds the model, show it as `Hidden in Codex catalog` and keep relaunch possible only if runtime capability and team policy allow it
- if `upgrade` points to a visible replacement, show a non-destructive migration suggestion
- never auto-rewrite persisted model ids during restore or relaunch

Relaunch behavior:

```text
visible model found -> normal relaunch
hidden model found -> relaunch allowed only with warning and policy pass
upgrade available -> show "Switch to recommended model" action
missing model -> keep value visible, require user to choose another model before launch
```

This avoids both failure modes: silently changing a user's team model, or breaking old teams because a model moved out of the default picker.

### 12. Stored effort schema and non-dialog launch paths

`🎯 7   🛡️ 9   🧠 7`
Estimated implementation impact: `320-760` lines

Uncertainty:

- effort is not used only in launch dialogs.
- team metadata, member metadata, backup/restore, draft retry, localStorage launch params, and scheduled/provisioned flows can all carry `effort`.
- current normalizers in team data paths may silently discard anything outside `low | medium | high`.

Decision:

- provider-aware effort parsing must be added at every inbound boundary, not only in React components
- old persisted `low | medium | high` values stay valid
- new Codex-specific values are preserved only with provider/model context
- if provider context is missing, parse as legacy effort and do not invent Codex-specific meaning
- scheduled launches and automation-like flows must either be updated in the same phase or explicitly block Codex-only efforts until updated

High-risk code paths to audit during implementation:

- `src/main/services/team/TeamMembersMetaStore.ts`
- `src/main/services/team/TeamDataService.ts`
- `src/main/services/team/TeamBackupService.ts`
- `src/main/services/team/TeamProvisioningService.ts`
- `src/shared/types/schedule.ts`
- `src/main/ipc/teams.ts`
- `src/main/http/teams.ts`
- renderer launch prefill and draft retry localStorage state

Migration rule:

```text
legacy effort with no provider context -> keep if low | medium | high
codex effort with provider=codex -> validate against selected model catalog
codex effort with provider missing -> store as selected string only, resolve before launch
unsupported restored effort -> show warning, do not silently downgrade
```

### 13. Renderer stale state, HMR, and out-of-order refreshes

`🎯 8   🛡️ 9   🧠 5`
Estimated implementation impact: `180-380` lines

Uncertainty:

- previous provider settings work showed transient wrong states after HMR and slow refreshes.
- catalog, account, and rate limits can refresh with different timings.
- a stale app-server response can arrive after a newer auth-mode change.

Decision:

- every provider status refresh should carry a monotonic `requestId` or `snapshotVersion`
- renderer stores the latest accepted version per provider
- responses older than the latest accepted version are ignored
- `modelCatalog.schemaVersion` is required and future versions are treated as degraded, not fatal
- HMR should keep last ready provider status visible while a refresh is in flight
- a catalog refresh cannot overwrite account connected state unless it came from the same combined snapshot

Required stale-write guard:

```text
if incoming.providerId != current.providerId -> reject
if incoming.requestId < current.requestId -> reject
if incoming.authScope != current.authScope and incoming.status is not from current auth selection -> keep as stale diagnostics only
```

This directly targets flicker like `Codex native unavailable` followed by ready state, or fallback API-key copy appearing while ChatGPT account mode is selected.

### 14. Privacy, logs, and diagnostics

`🎯 8   🛡️ 9   🧠 4`
Estimated implementation impact: `120-260` lines

Uncertainty:

- account-scoped cache keys need stable identity, but raw email should not leak into exact logs, runtime snapshots, or persistent diagnostics.
- API-key source is useful for UX, but no secret or env value should be logged.

Decision:

- hash managed account identity in memory for cache keys
- use a per-process salt for volatile cache keys
- do not persist raw account email solely for model catalog cache
- exact logs can record `authScope=chatgpt` or `authScope=api_key`, not raw account identity
- diagnostics can record `apiKeySource=OPENAI_API_KEY` but never the value
- error messages preserve method/code/timeout, but redact command env and tokens

Required diagnostic fields:

```ts
export interface CodexModelCatalogDiagnostics {
  source: 'app-server' | 'static-fallback' | 'unavailable';
  status: 'ready' | 'stale' | 'degraded' | 'unavailable';
  method?: 'model/list';
  errorCode?: string | number | null;
  errorCategory?: string | null;
  binaryVersion?: string | null;
  effectiveAuthMode?: 'chatgpt' | 'api_key' | null;
  cacheAgeMs?: number | null;
}
```

No UI surface should show `Unknown error` for catalog failures after this feature.

### 15. Rollout ordering across repos

`🎯 8   🛡️ 10   🧠 6`
Estimated implementation impact: `120-260` lines

Uncertainty:

- `claude_team` can ship UI before the user has a compatible `agent_teams_orchestrator` runtime in cache.
- the app can point to `CLAUDE_DEV_RUNTIME_ROOT`, bundled runtime cache, or a user-installed runtime binary.

Decision:

- implement orchestrator support first or behind a UI capability gate
- Provider Settings can show catalog metadata before launch support exists
- Create/Launch dialogs must consult runtime capabilities before enabling new Codex models or new Codex efforts
- the runtime health check should expose a version/capability payload, not force UI to infer support from binary version strings
- if capabilities are unavailable, default to safe: display metadata, disable launch-only features

Rollout sequence:

1. Add orchestrator dynamic Codex model and effort capability support.
2. Add `claude_team` catalog feature and provider status metadata.
3. Show catalog in UI with capability gates.
4. Enable launch when capability and catalog agree.
5. Remove any temporary guard only after bundled runtime and dev runtime both report capabilities in CI/smoke.

This is the cleanest way to avoid UI and runtime getting out of sync.

### 16. Codex config/profile/cwd catalog mismatch

`🎯 6   🛡️ 10   🧠 8`
Estimated implementation impact: `360-900` lines

Uncertainty:

- official config docs allow `model_catalog_json`, and profile-level `profiles.<name>.model_catalog_json` can override it.
- Codex loads project-scoped `.codex/config.toml` only when a project or worktree is trusted.
- `codex exec` can run with a different `cwd`, profile, and inline `-c` overrides than the short-lived app-server preview session.
- current `CodexAppServerSessionFactory` starts `codex app-server` without an explicit `cwd` or profile.

Failure mode:

- Provider Settings shows catalog A from global config.
- Launch runs `codex exec` in project cwd with project-scoped or profile config and effectively uses catalog B.
- The user selects a model that preview says is valid, but launch resolves against a different provider/catalog.

Top 3 policies:

1. Global-only catalog preview: `🎯 7   🛡️ 5   🧠 3`, `80-180` lines. Fast and simple, but wrong for project-scoped Codex configs.
2. Project-scoped catalog preview for launch flows, global preview for dashboard: `🎯 9   🛡️ 9   🧠 7`, `360-900` lines. More work, but it matches actual `codex exec` launch context.
3. Ignore config and force a static OpenAI Codex provider always: `🎯 5   🛡️ 8   🧠 4`, `200-420` lines. Safer than mismatch, but it discards legitimate user Codex config and can surprise power users.

Chosen policy: option 2.

Decision:

- dashboard/provider card can show a global Codex catalog snapshot
- Create/Launch dialogs must fetch or resolve catalog for the selected launch `cwd`
- if profile selection exists or is introduced, catalog cache key must include profile name
- if we pass inline config overrides to `codex exec`, equivalent preview scope must include those overrides or launch must be marked "not preview-verified"
- if project trust/config cannot be resolved, launch UI falls back to global catalog but shows `Catalog may differ for this project`

Required preview scope:

```ts
export interface CodexModelCatalogScope {
  codexHome: string;
  binaryPath: string;
  binaryVersion: string | null;
  cwd: string | null;
  projectTrust: 'trusted' | 'untrusted' | 'unknown';
  profileName: string | null;
  configFingerprint: string | null;
  preferredAuthMode: 'auto' | 'chatgpt' | 'api_key' | null;
  effectiveAuthMode: 'chatgpt' | 'api_key' | null;
  launchOverridesFingerprint: string | null;
}
```

Cache key correction:

```text
catalogCacheKey =
  binaryPath
  + binaryVersion
  + codexHome
  + cwd or "global"
  + projectTrust
  + profileName or "default-profile"
  + configFingerprint or "unknown-config"
  + launchOverridesFingerprint or "no-launch-overrides"
  + preferredAuthMode
  + effectiveAuthMode
  + forcedLoginMethod or "no-forced-login-method"
  + forcedWorkspaceHash or "no-forced-workspace"
  + managedAccountHash or "no-chatgpt-account"
  + apiKey.source or "no-api-key"
```

Implementation notes:

- use app-server `config/read` when available to get effective config fingerprints for the same scope that launch will use
- do not parse arbitrary TOML as the primary config source if app-server can resolve effective configuration
- if app-server cannot scope `config/read` by cwd/profile, keep that uncertainty visible in diagnostics
- do not use raw config file contents as a cache key or log payload; hash only the relevant effective keys

Relevant effective keys:

- `model`
- `model_provider`
- `model_catalog_json`
- `profiles.<name>.model_catalog_json`
- `model_reasoning_effort`
- `forced_login_method`
- `forced_chatgpt_workspace_id`
- `openai_base_url`
- `model_providers.*` only as a redacted structural fingerprint
- `projects.<path>.trust_level`

Acceptance:

- a team launch from project A and project B can have different Codex catalog cache entries
- a trusted project `.codex/config.toml` changing `model_catalog_json` invalidates preview for that project
- global dashboard status does not claim to be launch-exact for every project
- exact logs record the catalog scope fingerprint, not raw config values

### 17. Built-in OpenAI Codex provider vs custom/OSS Codex config

`🎯 7   🛡️ 9   🧠 7`
Estimated implementation impact: `260-620` lines

Uncertainty:

- Codex config supports `model_provider`, custom providers, `oss_provider`, and provider auth settings.
- Agent Teams "Codex" provider is intended to mean native Codex through OpenAI/ChatGPT subscription or API-key billing, not arbitrary custom provider execution.
- app-server `model/list` can be influenced by configuration, but our product copy currently talks about Codex subscription.

Decision:

- this cutover should keep Agent Teams Codex scoped to the built-in OpenAI Codex provider
- custom provider and OSS provider support should be a separate provider feature, not silently mixed into `provider=codex`
- if effective config says `model_provider` is not built-in OpenAI for the launch scope, show a clear warning and block subscription-mode launch unless the user intentionally switches to a future custom-provider flow
- when launching Agent Teams Codex, pass or enforce provider config consistently so `codex exec` uses the same provider class previewed by the catalog

Recommended launch guard:

```text
if provider=codex and effective model_provider is neither missing nor "openai":
  status = degraded
  launch = blocked
  copy = "This project config points Codex at a custom/local provider. Agent Teams Codex currently supports the built-in OpenAI Codex provider only."
```

If the team wants to support custom providers later:

- add a separate `provider=codex-custom` or generic OpenAI-compatible provider
- do not reuse subscription UX or rate-limit UI
- do not show ChatGPT account limits for custom provider launches

This prevents a confusing case where UI says "Codex subscription" but runtime actually routes to local OSS or a custom endpoint.

### 18. Modalities and personality support

`🎯 8   🛡️ 9   🧠 4`
Estimated implementation impact: `120-280` lines

Uncertainty:

- app-server model rows expose `inputModalities` and `supportsPersonality`.
- Agent Teams launch prompts are text-first today, but future UI can attach images or personality-like instructions.
- older model catalogs can omit `inputModalities`, and docs say missing modalities should be treated as `["text", "image"]` for backward compatibility.

Decision:

- launchability requires `text` input support
- image support is displayed as capability metadata, not required for normal team launch
- `supportsPersonality=false` must not disable normal team launch, but the UI must not claim `/personality` or personality-specific behavior for that model
- missing `inputModalities` uses the documented backward-compatible default

Validation rule:

```text
if inputModalities exists and does not include "text":
  show model, disable launch, copy "This Codex model is not text-launch compatible for Agent Teams"

if supportsPersonality=false:
  hide personality controls for this model if those controls exist
```

This keeps model picker truthful without overfitting to the current text-only launch flow.

### 19. Stable app-server surface vs experimental fields

`🎯 8   🛡️ 9   🧠 4`
Estimated implementation impact: `80-180` lines

Uncertainty:

- app-server has an `experimentalApi` capability.
- `model/list` itself is documented on the stable API overview, but adjacent methods and future richer fields can be experimental.
- opting into experimental API globally can change response surface and error behavior.

Decision:

- keep `experimentalApi=false` for the model catalog rollout
- rely only on stable `model/list` fields listed in the docs
- treat extra fields as diagnostics only
- add a later explicit spike before using experimental catalog, plugin, or app-server thread features in this path

Acceptance:

- catalog tests run with `experimentalApi=false`
- no Phase 1-5 task depends on experimental fields
- if a future field appears, normalization ignores it unless we add a typed, tested use case

### 20. App-server preview vs native exec signoff

`🎯 8   🛡️ 10   🧠 6`
Estimated implementation impact: `180-420` lines

Uncertainty:

- `model/list` is the correct picker source, but the actual launch surface remains `codex exec --json`.
- a model can appear in app-server before `codex exec` in the installed binary handles it correctly.
- effort config can be accepted syntactically but rejected by the model/provider at runtime.

Decision:

- app-server catalog is necessary for UI, but not the only release gate for enabling new launch capability
- Phase 4 must include a live or mocked native-exec compatibility probe for the selected launch path
- native exec signoff should test model, provider scope, cwd, profile, and non-default effort together
- if live signoff is not available in CI, use a fixture-based unit test plus one documented local smoke command before merging

Required signoff matrix:

```text
default model + default effort + selected cwd
explicit gpt-5.4 + xhigh + selected cwd
gpt-5.1-codex-mini + high + selected cwd
gpt-5.1-codex-mini + xhigh -> blocked before exec
synthetic future model + capability disabled -> blocked before exec
synthetic future model + capability enabled -> argv accepted by orchestrator test
custom model_provider config -> blocked or explicit custom-provider copy
```

This prevents the plan from treating app-server catalog presence as proof that the full Agent Teams runtime path is healthy.

### 21. `config/read` scope contract is only partially documented

`🎯 7   🛡️ 9   🧠 6`
Estimated implementation impact: `180-420` lines

Uncertainty:

- docs list `config/read`, but the detailed request/response shape is not as explicit as `model/list`.
- local probe confirms `config/read` returns `{ config, origins }` and accepts `params`.
- local probe confirms missing `params` returns `-32600`, so callers must always send `{}` at minimum.
- local probe confirms `{ cwd }` and `{ profile }` are accepted, but we still need tests around whether they fully mirror `codex exec --cd/--profile` in all installations.

Decision:

- treat `config/read` as a feature-detected helper, not as a hard dependency for model catalog availability
- always call `config/read` with an object, never with missing params
- include `config/read` method/code/details in diagnostics
- if scoped `config/read` fails but global succeeds, mark launch catalog as `scope_unverified`, not `ready`
- if `config/read` is missing on older binaries, fall back to global catalog and require runtime capability plus explicit degraded copy before launch enablement

Recommended DTO:

```ts
export interface CodexAppServerConfigReadParams {
  cwd?: string | null;
  profile?: string | null;
}

export interface CodexAppServerConfigReadResponse {
  config: Record<string, unknown>;
  origins: Record<string, unknown>;
}
```

Feature-detect result:

```ts
export type CodexConfigReadSupport =
  | 'supported-scoped'
  | 'supported-global-only'
  | 'method-missing'
  | 'failed';
```

Acceptance:

- unit tests cover missing `params`, method-not-found, global success, scoped success, and scoped failure
- `config/read` failure never breaks account/rate-limit/model-list reads
- launch UI does not present a project-scoped catalog as verified unless config scope was actually checked

### 22. Forced login method and ChatGPT workspace scope

`🎯 7   🛡️ 10   🧠 6`
Estimated implementation impact: `180-420` lines

Uncertainty:

- effective Codex config can include `forced_login_method`.
- effective Codex config can include `forced_chatgpt_workspace_id`.
- workspace/account policy can affect available models, rate limits, and whether ChatGPT subscription mode is valid.
- previous Codex account work already had a real bug around forced login method, so this is not theoretical.

Decision:

- auth scope must include forced login method and forced workspace identity when present
- if UI-selected auth mode conflicts with `forced_login_method`, effective auth mode wins and UI must explain why
- forced workspace id must be hashed before cache/log usage
- rate-limit, account, and model catalog snapshots must be scoped together so workspace changes cannot reuse stale catalog

Auth scope correction:

```ts
export interface CodexCatalogAuthScope {
  preferredAuthMode: 'auto' | 'chatgpt' | 'api_key' | null;
  effectiveAuthMode: 'chatgpt' | 'api_key' | null;
  forcedLoginMethod: 'chatgpt' | 'api_key' | null;
  managedAccountHash: string | null;
  forcedWorkspaceHash: string | null;
  apiKeySource: string | null;
}
```

UX rules:

- if user selected ChatGPT but config forces API key, show `Codex config forces API key mode for this scope`
- if user selected API key but config forces ChatGPT, show `Codex config forces ChatGPT account mode for this scope`
- if workspace id changes, show `Codex workspace changed, refreshing subscription limits and model catalog`
- never show raw workspace id in UI unless Codex app-server provides a display name that is intended for users

Cache invalidation:

- forced login method change invalidates both auth and catalog cache
- forced workspace hash change invalidates ChatGPT-scoped rate limits and catalog
- account logout clears all ChatGPT workspace-scoped entries

### 23. Model catalog file trust and local file changes

`🎯 6   🛡️ 9   🧠 7`
Estimated implementation impact: `220-520` lines

Uncertainty:

- `model_catalog_json` can point to a local JSON file.
- app-server resolves effective config, but our app may not know if that JSON file changed unless config fingerprint includes enough origin data.
- project-scoped `.codex/config.toml` only applies for trusted projects, so a file can exist but not be active.

Decision:

- treat `model_catalog_json` as part of effective config, not as a file we parse directly by default
- if `config/read.origins` exposes enough origin/path data, hash only path and mtime for invalidation, not file contents
- if origin/path data is unavailable, rely on manual refresh and short TTL
- never read arbitrary `model_catalog_json` file contents into logs or diagnostics
- do not apply project-scoped model catalog unless Codex effective config says the project is trusted and the catalog is active

Top 3 invalidation policies:

1. TTL/manual-refresh only: `🎯 7   🛡️ 6   🧠 2`, `40-100` lines. Simple but stale after local file edits.
2. Hash effective config plus optional mtime for active catalog file: `🎯 8   🛡️ 9   🧠 5`, `220-520` lines. Best balance without parsing arbitrary catalog files ourselves.
3. Parse and watch every possible catalog file: `🎯 5   🛡️ 7   🧠 8`, `500-1000` lines. Too much responsibility and security surface for this feature.

Chosen policy: option 2.

Acceptance:

- active `model_catalog_json` path change invalidates cache
- active catalog file mtime change invalidates cache when path is available
- inactive untrusted project `.codex/config.toml` does not affect the trusted/global catalog

## Top 3 Implementation Options

### 1. Dedicated Codex model catalog feature - chosen

`🎯 9   🛡️ 9   🧠 6`
Estimated size: `1200-2400` lines

Core idea:

- create `src/features/codex-model-catalog`
- keep model catalog rules isolated from account UI, provider status plumbing, and Electron transport
- reuse existing `CodexAppServerSessionFactory`
- expose a small feature facade to provider status and renderer model picker
- update orchestrator only where runtime status and launch effort transport require it

Why it wins:

- best SOLID alignment
- clean domain rules for model visibility, effort validation, fallback, and default selection
- does not make `codex-account` responsible for model policy
- least risk to Anthropic
- easiest to test without full app startup

Main tradeoff:

- needs small integration glue in existing provider status and team launch flows

### 2. Fold catalog into `codex-account`

`🎯 7   🛡️ 7   🧠 5`
Estimated size: `800-1600` lines

Core idea:

- extend `src/features/codex-account` with `model/list`
- use account snapshot as the only Codex control-plane snapshot
- merge account, rate limits, and model catalog in one feature

Why it is tempting:

- fewer new folders
- account feature already owns app-server account/rate-limit reads
- easier to fetch account plus model catalog in one app-server session

Why I do not recommend it:

- model catalog is not account management
- the feature becomes a broad Codex control-plane catch-all
- future provider catalog work would have to pull model rules back out
- more risk of account UI churn when only model picker changes are needed

### 3. Full provider model catalog for all providers now

`🎯 7   🛡️ 8   🧠 9`
Estimated size: `2500-4500` lines

Core idea:

- build one provider-agnostic model catalog for Anthropic, Codex, Gemini, and future providers
- move static renderer catalog policy into a shared feature
- expose one rich contract for all provider model pickers

Why it is attractive:

- cleanest long-term abstraction
- one UI model for labels, availability, capabilities, and efforts
- reduces future duplication

Why not now:

- too much surface area while Codex runtime cutover is still fresh
- Anthropic model behavior is already stable and should not be reworked for a Codex catalog issue
- would delay the concrete Codex model release problem

## Current Code Reality

### `claude_team`

Existing app-server infrastructure:

- `src/main/services/infrastructure/codexAppServer/JsonRpcStdioClient.ts`
- `src/main/services/infrastructure/codexAppServer/CodexAppServerSessionFactory.ts`
- `src/main/services/infrastructure/codexAppServer/protocol.ts`
- `src/features/codex-account/main/infrastructure/CodexAccountAppServerClient.ts`

Current account client behavior:

- `readAccount()` opens one app-server session.
- `readRateLimits()` opens another app-server session.
- `logout()` opens another app-server session.
- no `model/list` protocol types exist yet.
- `CodexAppServerSessionFactory` starts `codex app-server` with no explicit `cwd` or profile option.
- app-server initialize response includes `codexHome`, but the current protocol types do not expose effective config or config fingerprint.

Current shared provider status:

- `CliProviderStatus.models` is only `string[]`.
- `CliProviderStatus.modelAvailability` has per-model verification status but no rich model metadata.
- renderer model selector can already prefer runtime-provided `providerStatus.models`.

Current effort type:

```ts
export type EffortLevel = 'low' | 'medium' | 'high';
```

Risk:

- adding `xhigh` directly without provider-specific validation would let Anthropic UI accidentally offer unsupported choices.

Current persistence and non-dialog launch paths:

- team metadata and member metadata normalize launch-derived provider/model/effort in multiple services.
- backup/restore copies metadata but restore-time launch preview must still tolerate missing catalog metadata.
- draft retry and launch prefill can reuse old localStorage state.
- scheduled launch types can reference the shared effort type.

Risk:

- updating only the visible launch dialogs would leave hidden paths that silently drop Codex-only efforts or relaunch with stale default semantics.

### `agent_teams_orchestrator`

Current Codex model catalog:

- `src/utils/model/codex.ts`
- static `CODEX_MODELS`
- static `DEFAULT_CODEX_MODEL`
- `isCodexModel()` checks only static ids

Current runtime status:

- `getUnifiedRuntimeStatusPayload('codex')` returns static Codex model ids.

Current CLI effort:

- top-level `--effort <level>` currently accepts `low | medium | high | max`.
- Codex native execution is ultimately `codex exec --json`.
- installed `codex exec --help` shows no `--effort` flag.

Risk:

- if we send `--effort xhigh` through current orchestrator, it fails before Codex can use it.
- if we map Anthropic `max` to Codex `xhigh`, the semantics are wrong.
- if we show `xhigh` in UI before the launch path supports it, the picker becomes misleading.

## Target Architecture

### Feature folder

```text
src/features/codex-model-catalog/
  contracts/
    codexModelCatalog.dto.ts
    index.ts
  core/
    domain/
      codexModelCatalog.ts
      codexReasoningEffort.ts
      codexModelCatalogFallback.ts
      normalizeCodexAppServerModel.ts
    application/
      GetCodexModelCatalogUseCase.ts
      CodexModelCatalogPorts.ts
  main/
    composition/
      createCodexModelCatalogFeature.ts
    adapters/
      output/
        CodexAppServerModelCatalogSource.ts
        StaticCodexModelCatalogSource.ts
    infrastructure/
      CodexModelCatalogAppServerClient.ts
      InMemoryCodexModelCatalogCache.ts
  preload/
    index.ts
  renderer/
    adapters/
      codexModelCatalogViewModel.ts
    hooks/
      useCodexModelCatalog.ts
    ui/
      CodexModelEffortHint.tsx
```

Rules:

- `core/domain` has all normalization and validation rules.
- `main/infrastructure` is the only layer that knows JSON-RPC method names.
- renderer never receives raw app-server rows.
- app shell imports only public feature entrypoints.

### App-server lifecycle

Use the existing `CodexAppServerSessionFactory`.

Request sequence:

1. Spawn `codex app-server`.
2. Send `initialize` with `clientInfo` and capabilities.
3. Send `initialized`.
4. Request `model/list`.
5. Drain or ignore notifications safely.
6. Close stdin and terminate the process on completion or timeout.

Recommended timeouts:

- initialize: `6000ms`
- `model/list`: `4500ms`
- total model catalog read: `9000ms`

Recommended pagination:

- request `limit: 100`, `includeHidden: false` for normal UI
- follow `nextCursor` until `null`
- hard-stop after 5 pages to avoid runaway loops
- log a degraded catalog warning if the hard-stop is hit

### Single-session snapshot policy

Provider status currently risks multiple sequential app-server starts:

- account read
- rate limits read
- future model list read

This caused slow provider loading in earlier UI work, so the plan should not add another app-server spawn in the hot path.

Preferred design:

- keep `codex-model-catalog` as a separate feature for ownership
- add an optional combined Codex control-plane read in composition
- when provider status refresh needs account plus rate limits plus model catalog, use one app-server session and issue all three requests inside it
- each sub-result has independent soft-failure state
- total snapshot can be partially healthy

Snapshot shape:

```ts
export interface CodexControlPlaneSnapshot {
  binary: {
    path: string;
    version: string | null;
  };
  account: CodexAccountSnapshotResult;
  rateLimits: CodexRateLimitsSnapshotResult;
  modelCatalog: CodexModelCatalogSnapshotResult;
  configScope: {
    cwd: string | null;
    profileName: string | null;
    projectTrust: 'trusted' | 'untrusted' | 'unknown';
    configReadSupport: CodexConfigReadSupport;
    effectiveConfigFingerprint: string | null;
    launchOverridesFingerprint: string | null;
    activeModelCatalogFileFingerprint: string | null;
  };
  initialize: {
    codexHome: string;
    platformFamily: string;
    platformOs: string;
  };
  fetchedAt: string;
}
```

Soft-failure rules:

- account failure must not erase a fresh cached model catalog
- model catalog failure must not mark ChatGPT account disconnected
- rate-limit failure must not hide model picker options
- if app-server initialize fails, all three sub-results are degraded from the same root cause

Required correction to the existing account flow:

- current `CodexAccountAppServerClient.readAccount()` and `readRateLimits()` each open their own app-server process
- adding a third standalone `readModelCatalog()` would be a Provider Settings latency regression
- implement a combined app-server read path before wiring catalog into provider refresh
- keep separate methods for mutations and focused tests, but use the combined path for normal status refresh
- enrich `JsonRpcStdioClient` errors before catalog integration so the combined reader can classify `model/list` method failures without losing account truth

Recommended application service shape:

```ts
export interface CodexControlPlaneReader {
  readSnapshot(options: CodexControlPlaneReadOptions): Promise<CodexControlPlaneSnapshot>;
}
```

This can live in `codex-model-catalog` composition or in a small shared Codex control-plane composition module. Do not put model normalization inside `codex-account`.

Read scope:

- Provider Settings global refresh can pass `cwd=null`.
- Create/Launch dialogs should pass the selected absolute `cwd`.
- Relaunch/restore should pass the team's persisted project path.
- Scheduled launch validation should pass `schedule.launchConfig.cwd`.
- If a future UI supports Codex profile selection, the same profile must be passed to preview and launch.

## Contracts

### App-server protocol types

Add protocol DTOs to `src/main/services/infrastructure/codexAppServer/protocol.ts`:

```ts
export type CodexAppServerReasoningEffort =
  | 'none'
  | 'minimal'
  | 'low'
  | 'medium'
  | 'high'
  | 'xhigh';

export interface CodexAppServerReasoningEffortOption {
  reasoningEffort: CodexAppServerReasoningEffort;
  description?: string | null;
}

export type CodexAppServerInputModality = 'text' | 'image' | string;

export interface CodexAppServerModel {
  id: string;
  model: string;
  displayName: string;
  description?: string | null;
  hidden: boolean;
  supportedReasoningEfforts: CodexAppServerReasoningEffortOption[];
  defaultReasoningEffort: CodexAppServerReasoningEffort;
  inputModalities?: CodexAppServerInputModality[] | null;
  supportsPersonality?: boolean | null;
  isDefault: boolean;
  upgrade?: string | null;
  upgradeInfo?: unknown;
  availabilityNux?: unknown;
}

export interface CodexAppServerModelListParams {
  cursor?: string | null;
  limit?: number | null;
  includeHidden?: boolean | null;
}

export interface CodexAppServerModelListResponse {
  data: CodexAppServerModel[];
  nextCursor: string | null;
}

export interface CodexAppServerConfigReadParams {
  cwd?: string | null;
  profile?: string | null;
}

export interface CodexAppServerConfigReadResponse {
  config: Record<string, unknown>;
  origins: Record<string, unknown>;
}
```

`config/read` caller rule:

- always pass a params object, even when empty
- call global config as `config/read` with `{}`
- call project scope as `config/read` with `{ cwd }`
- call profile scope as `config/read` with `{ profile }`
- if both cwd and profile are needed, test `{ cwd, profile }` in Phase 1 and record the behavior before enabling profile-aware UI

### Domain model

Use separate ids:

- `catalogId`: app-server `id`, stable identity for React keys, telemetry, and dedupe
- `launchModel`: app-server `model` when non-empty, otherwise `id`

Reason:

- local probe currently returned equal values, but official schema exposes both fields, so they can diverge later.
- using `id` for launch would be a latent bug if Codex introduces a display/catalog alias.

```ts
export interface CodexCatalogModel {
  catalogId: string;
  launchModel: string;
  displayName: string;
  description: string | null;
  hidden: boolean;
  isDefault: boolean;
  supportedReasoningEfforts: CodexReasoningEffort[];
  defaultReasoningEffort: CodexReasoningEffort | null;
  inputModalities: CodexInputModality[];
  supportsPersonality: boolean;
  upgrade: string | null;
  source: 'app-server' | 'static-fallback';
}
```

Normalization rules:

- reject rows without a usable `id`
- derive `launchModel` from `model || id`
- default missing `inputModalities` to `['text', 'image']` for older catalogs
- default missing `supportsPersonality` to `false`
- accept documented `supportedReasoningEfforts` objects with `reasoningEffort`
- defensively accept string effort entries in tests, because older generated local types and live clients can drift
- drop duplicate `catalogId` rows after the first visible row
- drop duplicate `launchModel` rows after the first visible row unless a hidden row is the only available row
- keep unknown effort strings out of the selectable UI, but preserve them in diagnostics
- if no model is marked `isDefault`, choose static fallback default only as degraded fallback and label it as such

### Provider status contract

Add an optional rich catalog to `CliProviderStatus`:

```ts
export interface CliProviderModelCatalog {
  schemaVersion: 1;
  source: 'app-server' | 'static-fallback' | 'unavailable';
  status: 'ready' | 'stale' | 'degraded' | 'unavailable';
  fetchedAt: string | null;
  staleAt: string | null;
  binary?: {
    path: string | null;
    version: string | null;
  };
  authScope?: {
    preferredAuthMode: 'auto' | 'chatgpt' | 'api_key' | null;
    effectiveAuthMode: 'chatgpt' | 'api_key' | null;
    forcedLoginMethod?: 'chatgpt' | 'api_key' | null;
    managedAccountHash?: string | null;
    forcedWorkspaceHash?: string | null;
    apiKeySource?: string | null;
  };
  launchScope?: {
    cwd: string | null;
    profileName: string | null;
    projectTrust: 'trusted' | 'untrusted' | 'unknown';
    configFingerprint: string | null;
    launchOverridesFingerprint: string | null;
  };
  errorMessage?: string | null;
  defaultModelId?: string | null;
  defaultLaunchModel?: string | null;
  models: CliProviderModelInfo[];
}

export interface CliProviderModelInfo {
  catalogId: string;
  launchModel: string;
  displayName: string;
  description?: string | null;
  hidden?: boolean;
  isDefault?: boolean;
  supportedReasoningEfforts?: CliProviderReasoningEffort[];
  defaultReasoningEffort?: CliProviderReasoningEffort | null;
  inputModalities?: string[];
  supportsPersonality?: boolean;
  upgrade?: string | null;
}

export type CliProviderReasoningEffort =
  | 'none'
  | 'minimal'
  | 'low'
  | 'medium'
  | 'high'
  | 'xhigh'
  | 'max';

export interface CliProviderRuntimeCapabilities {
  schemaVersion: 1;
  codex?: {
    supportsDynamicAppServerModels: boolean;
    supportsCodexReasoningEffortConfig: boolean;
    supportedCodexReasoningEfforts: Array<'minimal' | 'low' | 'medium' | 'high' | 'xhigh'>;
    acceptsProviderExplicitFutureModels: boolean;
  };
}
```

Backwards compatibility:

- keep `CliProviderStatus.models: string[]`
- add `CliProviderStatus.runtimeCapabilities?: CliProviderRuntimeCapabilities`
- for Codex, derive `models` from `modelCatalog.models.map(model => model.launchModel)`
- for Anthropic and Gemini, do not require `modelCatalog`
- old renderers continue to work from `models`
- new renderers prefer `modelCatalog` when present
- never put team-agent disabled policy directly into `CliProviderModelCatalog`; catalog describes Codex availability, while Agent Teams policy is applied by renderer and launch validators
- never infer launch capability only from catalog presence

Renderer integration hotspot:

- update `TeamModelRuntimeProviderStatus` in `src/renderer/utils/teamModelAvailability.ts` to include `modelCatalog`
- update `getRuntimeSelectorModels()` to use `modelCatalog.models[*].launchModel` for Codex
- update `getAvailableTeamProviderModelOptions()` to map rich Codex options with display labels, default badge, and catalog diagnostics
- keep Anthropic path on `getFallbackTeamProviderModelOptions()`
- keep Gemini path on existing `models: string[]` until Gemini has a richer catalog

### Team launch effort contract

Do not add a separate per-provider lane for this feature.

Use existing team-level model/provider selection, but make effort provider-aware.

Recommended implementation:

- keep persisted field name `effort`
- widen internal effort type to `ProviderReasoningEffort`
- add provider/model validators at every launch boundary
- Anthropic UI only shows `low | medium | high`
- Codex UI shows only the selected model's `supportedReasoningEfforts`
- orchestrator accepts `minimal | low | medium | high | xhigh` for Codex and `low | medium | high | max` for Anthropic paths

Existing validator hotspots:

- `src/shared/types/team.ts` currently defines `EffortLevel = 'low' | 'medium' | 'high'`
- `src/main/ipc/teams.ts` currently validates only `low | medium | high`
- `src/main/http/teams.ts` currently validates only `low | medium | high`
- `src/renderer/components/team/dialogs/EffortLevelSelector.tsx` currently hardcodes only `Default | Low | Medium | High`
- `LaunchTeamDialog`, `CreateTeamDialog`, member draft rows, and member editor utilities currently cast strings with `as EffortLevel`

Required migration:

- replace unsafe `as EffortLevel` casts with a provider-aware normalization function
- parse provider before parsing effort in IPC and HTTP paths
- validate lead effort against lead provider/model
- validate member effort against each member's resolved provider/model
- keep old persisted `low | medium | high` values readable without migration

Validation rule:

```text
provider=codex:
  effort must be in selectedModel.supportedReasoningEfforts

provider=anthropic:
  effort must be low | medium | high

provider=gemini:
  keep current behavior unless Gemini gets a richer effort contract
```

Important:

- do not map Anthropic `max` to Codex `xhigh`
- do not map Codex `xhigh` to Anthropic `max`
- if selected Codex model changes and old effort is unsupported, reset to the new model's `defaultReasoningEffort`
- if catalog is unavailable, only allow static fallback efforts that are proven launchable

Launch identity rule:

- `effort` is user selection
- `resolvedEffort` is what launch sends to runtime
- if user selection is empty/default, `resolvedEffort` comes from app-server `defaultReasoningEffort`
- if resolved effort equals app-server default, runtime transport may omit `model_reasoning_effort`, but exact logs still record the resolved value

## Runtime Launch Transport

This was the highest-risk area in the earlier plan. The corrected plan is explicit.

Facts:

- `codex exec` has `--model`.
- `codex exec` has `-c, --config key=value`.
- `codex exec` has no documented `--effort`.
- Codex config has `model_reasoning_effort`.
- `model_reasoning_effort` supports `minimal | low | medium | high | xhigh`.

Therefore:

- Codex native launch must not pass `--effort xhigh` to Codex CLI.
- Orchestrator may keep accepting `--effort` as its public Agent Teams flag.
- When provider is Codex native, orchestrator must translate accepted effort into `codex exec -c model_reasoning_effort="value"`.
- When no effort is selected, omit `model_reasoning_effort` and let Codex use its model default.
- When effort equals the selected model's app-server default, either omit it or pass it consistently, but pick one policy and test it.

Recommended policy:

- omit effort when it equals app-server `defaultReasoningEffort`
- pass effort only when user explicitly selected a non-default value

Reason:

- this tracks Codex defaults as Codex evolves
- exact logs remain cleaner
- future app-server default changes are not blocked by stale persisted values

Live signoff command shape:

```bash
codex exec --json --model gpt-5.4 -c model_reasoning_effort='"xhigh"' --skip-git-repo-check --ephemeral "Return only: ok"
```

Quoting requirement:

- command builder must pass `-c` and `model_reasoning_effort="xhigh"` as separate argv entries
- shell-rendered exact logs can show `-c model_reasoning_effort='"xhigh"'`
- tests should assert argv arrays, not only shell strings
- never concatenate user-controlled effort into a shell string without argv escaping

Prelaunch validation must block:

- `gpt-5.1-codex-mini` with `low`
- `gpt-5.1-codex-mini` with `xhigh`
- unknown effort strings from app-server until explicitly supported by our UI and orchestrator type

## Static Fallback

Fallback stays necessary because:

- user may have an older Codex binary
- app-server may fail to initialize
- app-server may start but not support `model/list`
- offline usage should not make the entire model picker empty
- tests should not depend on live Codex availability

Fallback rules:

- fallback source is explicitly marked `static-fallback`
- fallback never claims to be current
- fallback has a short visible warning in Provider Settings only when user is choosing Codex models
- fallback model list should be minimal and conservative
- fallback must not include newly guessed future models
- fallback caused by missing `model/list` should include an upgrade hint tied to the detected Codex binary version when available

Recommended fallback models:

- `gpt-5.4`
- `gpt-5.4-mini`
- `gpt-5.3-codex`
- `gpt-5.2`
- `gpt-5.1-codex-mini`

Fallback effort rules:

- use `medium | high` for `gpt-5.1-codex-mini`
- use `low | medium | high | xhigh` for known models only if live signoff confirms `model_reasoning_effort` pass-through
- otherwise fallback UI can show richer metadata but disable non-launchable options

API-key mode note:

- do not use OpenAI `/v1/models` as the primary Codex picker for subscription-backed Codex
- optional API `/v1/models` fallback is allowed only for explicit API-key mode diagnostics
- if API `/v1/models` disagrees with Codex app-server `model/list`, Codex app-server wins for native Codex execution
- reason: the actual runtime surface is `codex exec`, and app-server describes what Codex clients should show

## Cache And Refresh

Goal:

- make model updates feel fresh without making Provider Settings slow or flaky.

Main-process cache:

- key: Codex binary path plus Codex binary version plus Codex home plus launch cwd/profile/config fingerprint plus preferred auth mode plus effective auth mode plus managed account hash plus API-key source
- success TTL: `10 minutes`
- stale TTL: `24 hours`
- in-flight dedupe: one live `model/list` request per key
- manual refresh bypasses success TTL but still dedupes in-flight work
- auth mode change invalidates the ready cache for UI selection purposes
- `forced_login_method` and forced workspace changes invalidate the affected auth/catalog scope
- logout clears ChatGPT-scoped catalog cache
- API key source change clears API-key-scoped catalog cache
- project `.codex/config.toml`, global `config.toml`, or `model_catalog_json` changes clear the affected scope when detected by fingerprint change
- binary path or version change clears all Codex model catalog cache entries

Renderer cache:

- consume `CliProviderStatus.modelCatalog`
- no independent polling loop in the model picker
- refresh through existing provider status refresh action

Dashboard policy:

- do not run `model/list` on every dashboard render
- use existing provider status refresh cadence
- model catalog stale state can be shown only inside settings/model picker, not as a scary dashboard error
- dashboard catalog is a global/default-scope summary, not a promise that every project cwd has the same catalog

Provider Settings policy:

- open dialog with cached provider status immediately
- refresh in background
- show `Checking...` only for the area still being refreshed
- never replace a ready catalog with empty state during a refresh

Avoid this bug:

- do not set global provider status to `unavailable` while only the model catalog refresh is pending
- do not replace a ChatGPT-ready account state with a catalog timeout
- do not show generic `Unknown error`; preserve app-server method, timeout, and fallback source in diagnostics
- if `auto` resolves to ChatGPT, API-key detection copy stays secondary
- if `auto` resolves to API key because ChatGPT is unavailable, show why ChatGPT was skipped before showing API-key catalog

## UI Behavior

### Model picker

When `provider=codex`:

- prefer `providerStatus.modelCatalog.models`
- option value is `launchModel`
- React key can use `catalogId`
- label uses `displayName`
- default badge uses `isDefault`
- hidden app-server models are excluded from normal selector unless already persisted in a team
- disabled state uses existing Agent Teams policy plus app-server `upgrade` hints
- runtime-capability state controls whether a visible model is launchable
- fallback badge says `Using fallback catalog` only when source is fallback
- if app-server says a model is available but Agent Teams disables it, show `Available in Codex, disabled for Agent Teams`
- if app-server says a future model exists but runtime capability is missing, show `Available in Codex, waiting for Agent Teams runtime support`
- if a persisted model is missing from current catalog, show it as `Unavailable in current Codex catalog` and require user confirmation before relaunch
- if the dialog has a selected cwd and only a global catalog is available, show global options as provisional until project-scoped catalog finishes
- if project-scoped catalog differs from global catalog, keep the user's explicit selection only if it exists in the project-scoped catalog or is a preserved persisted value

When catalog is loading:

- keep previous options visible
- show a subtle "Refreshing models" state
- do not show an empty Codex picker unless no cached or fallback models exist
- label provisional global catalog rows as `Checking this project...` when launch cwd is known

When catalog fails:

- use stale cache if present
- otherwise use static fallback
- show the app-server error in diagnostics, not as a generic unknown error

### Effort selector

When `provider=codex` and selected model has catalog metadata:

- show efforts from `supportedReasoningEfforts`
- mark `defaultReasoningEffort` as default
- include `xhigh` if returned by app-server and runtime capability says Codex effort config pass-through is supported
- if runtime capability is missing, show Codex-only efforts as metadata or disabled rows, not selectable launch values
- if selected effort is no longer valid, reset to default with a small explanation
- if model is Agent Teams-disabled, keep effort selector read-only or disabled to avoid suggesting launchability

When selected model has no catalog metadata:

- show only safe fallback efforts
- do not show `xhigh` unless launch pass-through is implemented and tested

When `provider=anthropic`:

- keep current selector behavior
- do not show Codex-only `minimal`, `none`, or `xhigh`
- do not change Anthropic copy

### Default model

Recommended behavior:

- app-server `isDefault` defines the Codex default in UI
- "Default" label can render as `Default (gpt-5.4)` or `Default (GPT-5.4)` when catalog is ready
- new Codex teams can display `Default`, but launch must resolve it to a concrete `resolvedLaunchModel`
- existing teams keep their persisted model unless user changes it
- do not rewrite old team metadata just because app-server default changed
- exact logs and team metadata should record both selected `Default` and concrete resolved model

Reason:

- new teams benefit from current Codex defaults
- existing teams remain explainable even if Codex default changes later

## Orchestrator Changes

### Model status

Short term:

- keep static `CODEX_MODELS` for standalone fallback and non-app UI compatibility
- add richer status only if orchestrator can read app-server directly without slowing CLI startup

Recommended first cut:

- `claude_team` owns app-server model catalog for UI
- orchestrator keeps static runtime status until a dedicated orchestrator catalog source is added
- launch validation accepts provider-explicit Codex model strings even if not in static `CODEX_MODELS`
- orchestrator exposes runtime capabilities for dynamic Codex model ids and Codex reasoning effort config pass-through

Reason:

- UI is where the dynamic picker is needed immediately
- orchestrator should not reject a future model that Codex app-server already exposed and `claude_team` selected
- UI should not guess whether the current runtime can launch that future model

### Validation

Update validation so:

- provider-explicit `codex` launches can use model strings from app-server catalog
- unknown model strings are not guessed as Codex without provider context
- static `isCodexModel()` remains valid for generic detection, not authoritative for provider-explicit launches
- if provider context is missing, keep existing conservative static validation

### Effort transport

Update orchestrator:

- accept Codex efforts `minimal | low | medium | high | xhigh`
- preserve Anthropic `max`
- in Codex native executor, convert Codex effort to `-c model_reasoning_effort='"value"'`
- do not pass unsupported effort values to `codex exec`
- exact logs should show the selected effort as normalized Agent Teams metadata and the actual Codex config override

Required tests:

- Codex native `xhigh` becomes `-c model_reasoning_effort='"xhigh"'`
- no effort omits `model_reasoning_effort`
- Anthropic `max` remains Anthropic-only
- Codex `max` is rejected
- Anthropic `xhigh` is rejected

## Concrete Implementation Touchpoints

`claude_team`:

- `src/main/services/infrastructure/codexAppServer/protocol.ts` - add app-server model DTOs
- `src/main/services/infrastructure/codexAppServer/JsonRpcStdioClient.ts` - preserve JSON-RPC error code, method, and details
- `src/main/services/infrastructure/codexAppServer/CodexBinaryResolver.ts` or a nearby service - expose binary version for cache invalidation
- `src/features/codex-model-catalog` - new feature for catalog domain, use case, app-server source, fallback source, and cache
- `src/features/codex-account/main/composition/createCodexAccountFeature.ts` - coordinate combined control-plane snapshot or delegate to shared reader
- `src/features/codex-account/renderer/mergeCodexProviderStatusWithSnapshot.ts` - preserve account truth while merging model catalog truth
- `src/shared/types/cliInstaller.ts` - add optional provider model catalog
- `src/shared/types/team.ts` - widen provider-aware effort types without breaking old persisted values
- `src/shared/types/schedule.ts` - prevent scheduled launches from dropping Codex-specific efforts
- `src/main/services/team/TeamDataService.ts` - preserve provider-aware effort and launch identity when reconstructing team state
- `src/main/services/team/TeamMembersMetaStore.ts` - stop filtering Codex efforts down to legacy `low | medium | high`
- `src/main/services/team/TeamBackupService.ts` and restore paths - preserve additive launch identity and tolerate old backups
- `src/main/services/runtime/CliProviderModelAvailabilityService.ts` - keep runtime verification compatible with `launchModel` values and do not verify hidden/catalog-only rows by accident
- `src/main/ipc/teams.ts` and `src/main/http/teams.ts` - parse provider first, then validate effort
- `src/renderer/utils/teamModelAvailability.ts` - consume rich Codex catalog
- `src/renderer/utils/teamModelCatalog.ts` - demote Codex static list to fallback and labels only
- `src/renderer/components/team/dialogs/EffortLevelSelector.tsx` - make options provider/model-aware
- `src/renderer/components/team/dialogs/LaunchTeamDialog.tsx` and `CreateTeamDialog.tsx` - remove unsafe effort casts and persist resolved launch identity
- member draft/editor components - validate per-member resolved provider/model/effort
- renderer launch prefill and draft retry storage - add a versioned launch identity payload and tolerate old entries

`agent_teams_orchestrator`:

- `src/entrypoints/sdk/runtimeTypes.ts` - add provider-aware Codex effort support
- `src/main.tsx` - update `--effort` parser or provider-specific validation path
- `src/utils/effort.ts` and `src/utils/providerEffort.ts` - separate Anthropic `max` from Codex `xhigh`
- Codex native executor path - convert effort to `-c model_reasoning_effort`
- `src/utils/model/codex.ts` - rename static list semantics to fallback/static detection
- `src/utils/model/validateModel.ts` - allow provider-explicit Codex app-catalog models
- runtime status/capability endpoint - expose dynamic Codex model and effort pass-through support
- exact-log/runtime status code - record selected model, resolved model, selected effort, resolved effort, and config override

## Phased Implementation

### Phase 0 - contracts and live spike

Commit boundary: `docs(codex): plan app-server model catalog`

Tasks:

- add this plan
- keep live probe output in signoff notes or test fixture
- confirm installed Codex supports `model/list`
- confirm one app-server session can read account, rate limits, and model catalog
- confirm docs support `model_reasoning_effort`
- decide exact shell quoting for `-c model_reasoning_effort`
- capture fixtures for at least two catalog shapes: current live shape and synthetic `id !== model`
- capture current Codex binary version and document cache invalidation expectations

Acceptance:

- plan exists in the dedicated worktree
- no code behavior changes
- weak areas are explicitly called out

### Phase 1 - app-server model catalog feature

Commit boundary: `feat(codex): add app-server model catalog source`

Tasks:

- add structured JSON-RPC request errors with method/code/details
- expose or probe Codex binary version for catalog cache keys
- add effective config fingerprint support using app-server `config/read` when available
- add `config/read` support detection and always send `{}` params at minimum
- add `src/features/codex-model-catalog`
- add app-server protocol types
- add `CodexModelCatalogAppServerClient`
- add normalization domain rules
- add static fallback source
- add in-memory cache with TTL and in-flight dedupe
- include launch scope fields in cache keys: cwd, profile, trust, config fingerprint, launch override fingerprint
- include forced login method and forced workspace hash in auth-scoped cache keys
- normalize both documented effort option objects and defensive string effort values
- classify `method not found`, timeout, malformed response, and empty catalog separately
- add structured diagnostics without raw account email or secret-bearing env values
- expose feature facade from main composition

Acceptance:

- JSON-RPC `method not found` can be detected in tests
- binary version changes invalidate catalog cache
- config fingerprint changes invalidate catalog cache for that scope
- forced login/workspace changes invalidate account, limits, and catalog cache for that scope
- unit tests cover normalization, fallback, pagination, duplicate ids, missing modalities, unknown effort strings, and `id !== model`
- app-server client tests cover `model/list` request params and timeout labels
- method-not-found falls back without marking account disconnected
- diagnostics include source, status, method, error category, binary version, effective auth mode, and cache age
- no renderer behavior changes yet

### Phase 2 - provider status integration

Commit boundary: `feat(runtime): expose codex model catalog metadata`

Tasks:

- add optional `modelCatalog` to `CliProviderStatus`
- add optional `runtimeCapabilities` to `CliProviderStatus`
- merge Codex model catalog into provider status
- keep `models: string[]` derived from `launchModel`
- make provider refresh use cached, auth-scoped catalog
- implement combined account/rate-limits/catalog app-server read for normal refresh
- avoid extra app-server session in hot paths where account snapshot already refreshes
- clear ChatGPT-scoped catalog on logout and API-key-scoped catalog when API key source changes
- clear all catalog entries when Codex binary path or version changes
- ensure `auto` catalog scope follows effective launch auth mode, not just configured preference
- add request/snapshot versioning so stale refresh responses cannot overwrite newer auth state
- support global provider refresh and project-scoped launch refresh as different catalog scopes
- preserve Anthropic provider status shape

Acceptance:

- Codex provider status includes `modelCatalog`
- Codex provider status includes runtime capability metadata when available
- old `models` still works
- `auto` with ChatGPT ready uses ChatGPT-scoped catalog even if API key is detected
- `auto` with ChatGPT unavailable and API key ready uses API-key-scoped catalog with clear degraded copy
- forced login method overrides are reflected in effective auth copy and cache scope
- one normal Codex provider refresh does not spawn separate app-server processes for account, limits, and catalog
- Anthropic snapshots are byte-for-byte equivalent except ordering noise already present
- provider dashboard does not block on a slow catalog refresh when stale cache exists
- older refresh results are ignored after auth mode or runtime capability changes
- global dashboard catalog and project launch catalog do not overwrite each other

### Phase 3 - dynamic UI model picker and effort selector

Commit boundary: `feat(codex): use dynamic model catalog in team launch UI`

Tasks:

- update Codex model picker to prefer rich catalog
- show app-server labels, default badge, and fallback source state
- update effort selector to be provider/model-aware
- show `xhigh` metadata only for Codex models that return it
- make `xhigh` selectable only when runtime capability says Codex effort config pass-through is supported
- hide Codex-only efforts for Anthropic
- reset invalid effort on model change
- preserve missing persisted models as visible warning rows instead of silently clearing selection
- keep Agent Teams disabled policy separate from Codex app-server availability
- show future app-server models immediately, with `New from Codex catalog` status when policy has not verified them yet
- when cwd is selected, refresh project-scoped Codex catalog before enabling launch-only controls

Acceptance:

- `gpt-5.1-codex-mini` shows only `medium | high`
- `gpt-5.3-codex-spark` defaults to `high`
- `gpt-5.4` shows `low | medium | high | xhigh` as catalog metadata
- `xhigh` is disabled with runtime-upgrade copy until capability support is present
- app-server-visible but Agent Teams-disabled model shows disabled copy, not unavailable copy
- synthetic future `gpt-5.5` fixture appears without touching static catalog
- persisted model missing from current catalog is visible with a warning
- Anthropic UI remains `low | medium | high`
- static fallback still renders when app-server is unavailable
- global catalog can be displayed provisionally, but launch enablement waits for project-scoped catalog or explicit degraded confirmation

### Phase 4 - launch validation and Codex effort pass-through

Commit boundary: `feat(runtime): pass codex reasoning effort through native exec`

Tasks:

- widen team launch effort validation with provider-specific rules
- update IPC and HTTP validators
- update `TeamProvisioningService` request shaping
- persist additive `ProviderModelLaunchIdentity` into team metadata, exact-log metadata, and backup/restore payloads where launch identity is reconstructed
- update orchestrator parser and runtime types
- expose orchestrator runtime capability metadata for dynamic Codex models and Codex effort config
- translate Codex effort to argv entries `['-c', 'model_reasoning_effort="value"']`
- keep Anthropic `max` separate
- add exact-log metadata for selected model, resolved launch model, catalog source, selected effort, and resolved effort
- resolve `Default` to concrete launch model before provisioning
- update scheduled/provisioned launch paths or block Codex-only efforts in those paths until updated
- enforce built-in OpenAI Codex provider scope or block custom/OSS provider configs with clear copy
- pass profile/cwd/config overrides consistently between preview and `codex exec`

Acceptance:

- Codex `xhigh` launch reaches `codex exec` as `model_reasoning_effort`
- Codex `max` is rejected before launch
- Anthropic `xhigh` is rejected before launch
- unsupported model-effort pairs are blocked before provisioning
- provider-explicit synthetic future model is accepted only when runtime capability says dynamic Codex models are supported
- member metadata, team metadata, draft retry, and backup/restore preserve provider-aware effort
- replay/exact logs show what was selected, what default resolved to, and what was passed to Codex
- exact logs include catalog scope fingerprint and provider scope, but not raw config values

### Phase 5 - cleanup and fallback tightening

Commit boundary: `refactor(codex): demote static model catalog to fallback`

Tasks:

- rename static Codex catalog helpers to make fallback status explicit
- remove UI assumptions that static list is authoritative
- make future provider-explicit Codex ids launchable when selected from app-server catalog
- add diagnostics for catalog source and staleness
- document fallback behavior
- add a fixture/test with synthetic future model `gpt-5.5`
- remove any remaining hardcoded Codex model order from the primary Codex UI path
- add hidden-model fixture and upgrade-suggestion fixture
- add one migration test for old localStorage launch prefill without provider model launch identity
- add project-scoped catalog fixture with `model_catalog_json`
- add custom-provider config fixture
- add forced login method and forced workspace fixtures
- add `config/read` method-missing and invalid-params fixtures

Acceptance:

- new app-server model can appear in UI without code changes
- static fallback is visible as fallback in diagnostics
- no code path treats static `CODEX_MODELS` as the only valid Codex provider model list
- synthetic `gpt-5.5` appears through app-server fixture and can be selected without touching static catalog
- hidden persisted model is preserved with warning and is not introduced into new-team picker
- project-scoped catalog differences are visible and do not corrupt global provider status
- forced login method changes are visible and do not reuse stale catalog/rate-limit scope

## Test Plan

### `claude_team` unit tests

Add tests for:

- structured JSON-RPC error classification
- binary version cache invalidation
- effective config fingerprint cache invalidation
- `config/read` support detection, including invalid missing params
- project-scoped `model_catalog_json` fixture
- app-server model normalization
- `id` vs `model` split
- default model selection
- per-model effort options
- unknown effort filtering
- auth-scoped catalog cache keys
- `auto` auth resolving to ChatGPT vs API-key catalog scope
- combined app-server snapshot partial failures
- method-not-found fallback for older Codex app-server
- fallback catalog source
- stale cache behavior
- stale refresh response is ignored after newer auth-scope request
- global catalog and project-scoped catalog use separate cache entries
- forced login method and forced workspace hash use separate cache entries
- custom/OSS `model_provider` config is blocked or marked unsupported for Agent Teams Codex
- raw managed account email does not appear in catalog diagnostics or exact-log metadata
- provider status `models` compatibility
- provider status runtime capabilities compatibility
- provider model availability uses `launchModel`, not `catalogId`
- renderer model picker with rich catalog
- renderer effort selector with Codex and Anthropic providers
- renderer disables Codex-only efforts when runtime capability is missing
- renderer shows synthetic future model as `New from Codex catalog`
- renderer preserves hidden persisted model after `includeHidden: true` recovery
- persisted missing model warning row
- Agent Teams disabled policy overlay for app-server-visible models
- backup/restore reads old metadata and preserves new launch identity when present
- draft retry and launch prefill read old localStorage entries without dropping provider/model identity
- scheduled launch validation either supports Codex-specific effort or blocks it with explicit error
- launch preview with selected cwd does not enable launch from global-only catalog when project-scoped catalog is still unknown

Suggested commands:

```bash
pnpm vitest run \
  test/features/codex-model-catalog \
  test/features/codex-account \
  test/renderer/components/team \
  test/renderer/utils/teamModelCatalog.test.ts
```

### `agent_teams_orchestrator` tests

Add tests for:

- provider-explicit Codex model validation
- Codex effort parser accepts `minimal | low | medium | high | xhigh`
- Anthropic effort parser keeps existing behavior
- Codex native executor emits `-c model_reasoning_effort`
- Codex native executor builds argv entries, not unsafe shell concatenation
- no effort omits Codex effort config
- `max` is not accepted for Codex
- synthetic `gpt-5.5` passes when provider is explicitly Codex and model came from app catalog
- capability payload reports dynamic Codex model support and effort config support
- provider-explicit future model fails closed when capability is disabled
- Codex native exec argv includes cwd/profile/config override semantics that match preview scope
- custom provider config is not silently routed through subscription Codex UX

Suggested command:

```bash
pnpm test -- runtimeBackends providerEffort spawnMultiAgent codex
```

### live smoke

Run only when developer has Codex login/API available:

```bash
codex app-server
```

JSON-RPC smoke:

```json
{ "jsonrpc": "2.0", "id": 1, "method": "model/list", "params": { "limit": 20, "includeHidden": false } }
```

Native exec effort smoke:

```bash
codex exec --json --model gpt-5.4 -c model_reasoning_effort='"xhigh"' --skip-git-repo-check --ephemeral "Return only: ok"
```

Failure smoke:

```bash
codex exec --json --model gpt-5.1-codex-mini -c model_reasoning_effort='"xhigh"' --skip-git-repo-check --ephemeral "Return only: ok"
```

Expected:

- our app should block the second case before launch once catalog metadata is available
- if run manually, Codex may return model/provider-specific error, but product UX should not rely on that late failure

## Risks And Mitigations

### Risk 1 - app-server startup slows provider settings

`🎯 8   🛡️ 8   🧠 5`

Mitigation:

- cache model catalog in main process
- dedupe in-flight refreshes
- use stale cache while refreshing
- combine account/rate-limit/catalog reads where possible
- never clear ready UI while refresh is pending

### Risk 2 - effort values leak into Anthropic

`🎯 9   🛡️ 9   🧠 4`

Mitigation:

- provider-specific effort validation
- renderer selector branches by provider and selected model
- tests for Anthropic not showing `xhigh`, `minimal`, or `none`
- orchestrator rejects invalid provider-effort pairs

### Risk 3 - `id` and `model` diverge later

`🎯 8   🛡️ 9   🧠 3`

Mitigation:

- use `catalogId` for identity
- use `launchModel` for runtime
- tests with fixture where `id !== model`

### Risk 4 - app-server catalog has unknown fields or new efforts

`🎯 8   🛡️ 8   🧠 5`

Mitigation:

- tolerant protocol DTOs
- unknown efforts preserved in diagnostics but not selectable
- add one small allow-list update when product intentionally supports a new effort
- no hard crash on unknown `inputModalities`

### Risk 5 - static fallback becomes accidentally authoritative again

`🎯 7   🛡️ 8   🧠 4`

Mitigation:

- name fallback helpers clearly
- include `source` in model catalog
- tests assert app-server source wins over fallback
- UI diagnostics expose fallback source

### Risk 6 - launch path accepts model from UI but orchestrator rejects it

`🎯 8   🛡️ 8   🧠 6`

Mitigation:

- provider-explicit Codex launch validation should trust `provider=codex` plus app-server-selected model
- static `isCodexModel()` remains only a generic detector
- exact tests with a future-model fixture like `gpt-5.5`

### Risk 7 - auth-scoped catalog leaks between modes

`🎯 7   🛡️ 9   🧠 6`

Mitigation:

- include auth scope in catalog cache key
- clear scoped cache on logout and API-key source changes
- tests for ChatGPT catalog not being reused in API-key mode
- UI labels catalog source and auth scope in diagnostics

### Risk 8 - Default becomes nondeterministic across relaunch

`🎯 8   🛡️ 9   🧠 6`

Mitigation:

- persist selected model kind and resolved launch model in launch identity
- exact logs record both `Default` and concrete model
- relaunch preview shows current default resolution before launch
- do not silently rewrite old explicit models

### Risk 9 - older Codex binary lacks `model/list`

`🎯 7   🛡️ 9   🧠 5`

Mitigation:

- preserve JSON-RPC error code and method
- classify method-not-found separately from app-server failure
- show static fallback with Codex upgrade hint
- cache key includes binary version so upgrades refresh the catalog

### Risk 10 - `auto` auth shows the wrong catalog

`🎯 7   🛡️ 9   🧠 6`

Mitigation:

- resolve effective auth mode before catalog scope
- keep ChatGPT and API-key catalogs separate
- UI copy distinguishes selected preference, effective launch mode, and fallback credentials
- tests cover ChatGPT-ready + API-key-present and ChatGPT-missing + API-key-ready cases

### Risk 11 - UI enables a capability the installed runtime cannot launch

`🎯 7   🛡️ 10   🧠 7`

Mitigation:

- add explicit runtime capability metadata
- display catalog metadata separately from launch enablement
- fail closed when capability is missing or stale
- test Phase 3 UI against a pre-Phase-4 runtime fixture

### Risk 12 - future models appear but break team-agent behavior

`🎯 8   🛡️ 8   🧠 6`

Mitigation:

- split Codex catalog availability from Agent Teams policy status
- show new models as `New from Codex catalog`
- block only hard incompatibilities: runtime capability missing, unsupported modality, disabled policy, unsupported effort
- exact logs record new-model status for later debugging

### Risk 13 - hidden or upgraded persisted models are silently lost

`🎯 8   🛡️ 9   🧠 5`

Mitigation:

- run one `includeHidden: true` lookup for persisted explicit models missing from visible catalog
- preserve model value during restore and relaunch preview
- show upgrade suggestions without auto-rewriting metadata
- test hidden-model and upgrade fixtures

### Risk 14 - non-dialog launch path drops Codex effort

`🎯 7   🛡️ 9   🧠 7`

Mitigation:

- audit team metadata, members metadata, backup/restore, draft retry, launch prefill, and schedule types
- parse provider before parsing effort at every main-process boundary
- block Codex-only effort in any path not updated in the same phase
- add tests outside React launch dialogs

### Risk 15 - HMR or slow refresh overwrites correct provider state

`🎯 8   🛡️ 9   🧠 5`

Mitigation:

- add request/snapshot versioning
- ignore out-of-order provider status responses
- do not let catalog failures overwrite account truth
- keep last ready state visible while a refresh is pending

### Risk 16 - global catalog preview differs from project launch catalog

`🎯 6   🛡️ 10   🧠 8`

Mitigation:

- include cwd, profile, trust, config fingerprint, and launch override fingerprint in catalog scope
- use app-server `config/read` when available to derive effective config
- keep dashboard/global catalog separate from launch/project catalog
- require project-scoped catalog before enabling launch-only controls when cwd is known

### Risk 17 - custom or OSS Codex config is mistaken for subscription Codex

`🎯 7   🛡️ 9   🧠 7`

Mitigation:

- keep Agent Teams Codex scoped to built-in OpenAI Codex provider
- detect effective `model_provider` when possible
- block or degrade custom/OSS provider configs with explicit copy
- do not show ChatGPT account limits for custom provider execution

### Risk 18 - non-text model row appears in catalog

`🎯 8   🛡️ 9   🧠 4`

Mitigation:

- require `text` input modality for Agent Teams launch
- treat missing `inputModalities` with the documented backward-compatible default
- do not claim personality support when `supportsPersonality=false`

### Risk 19 - experimental app-server surface changes behavior

`🎯 8   🛡️ 9   🧠 4`

Mitigation:

- keep `experimentalApi=false`
- rely only on documented stable `model/list` fields
- ignore unknown fields unless a typed use case is added

### Risk 20 - app-server catalog passes but native exec fails

`🎯 8   🛡️ 10   🧠 6`

Mitigation:

- treat app-server catalog as picker truth, not full launch proof
- require Phase 4 native exec argv tests and live smoke where possible
- test model, effort, cwd, profile, and provider scope together
- block unsupported model-effort pairs before `codex exec`

### Risk 21 - `config/read` behavior differs across Codex versions

`🎯 7   🛡️ 9   🧠 6`

Mitigation:

- feature-detect `config/read`
- always send `{}` params at minimum
- classify method-missing, invalid-params, scoped-failure, and global-success separately
- never make config-read failure disconnect the Codex account

### Risk 22 - forced login/workspace reuses stale catalog

`🎯 7   🛡️ 10   🧠 6`

Mitigation:

- include forced login method and forced workspace hash in auth scope
- invalidate account, limits, and catalog together when either changes
- display forced auth copy instead of showing conflicting selected auth copy
- redact workspace ids in logs and diagnostics

### Risk 23 - local `model_catalog_json` changes without config change

`🎯 6   🛡️ 9   🧠 7`

Mitigation:

- hash effective config and optionally active catalog file mtime when app-server exposes enough origin data
- keep TTL/manual refresh fallback when origin data is unavailable
- do not parse or log arbitrary catalog file contents
- do not apply untrusted project-scoped catalog files unless effective config says they are active

## Definition Of Done

The feature is done when:

- Codex model picker uses app-server `model/list` when available.
- New app-server-visible Codex models appear without app code changes.
- `supportedReasoningEfforts` and `defaultReasoningEffort` drive Codex effort UI.
- `xhigh` appears only where Codex reports it.
- Anthropic UI and launch behavior are unchanged.
- Codex launches pass effort through `model_reasoning_effort`.
- UI launch controls are gated by runtime capabilities, not by catalog metadata alone.
- Future app-server-visible models appear without code changes and are marked as new until policy/runtime support is clear.
- `Default` Codex selection resolves to concrete launch identity before provisioning.
- Auth changes do not reuse stale model catalogs across ChatGPT and API-key modes.
- Project-scoped Codex config and `model_catalog_json` cannot make launch use a different catalog than preview without explicit degraded copy.
- Custom or OSS Codex provider config is not silently presented as ChatGPT subscription-backed Agent Teams Codex.
- `config/read` compatibility is feature-detected and never breaks account truth on older binaries.
- Forced login method and forced workspace changes cannot reuse stale account, rate-limit, or catalog cache.
- Codex binary upgrades invalidate stale catalog cache and retry `model/list`.
- Older Codex binaries without `model/list` fall back without breaking account state.
- Static Codex catalog is clearly fallback, not primary truth.
- Hidden persisted models are preserved with explicit warnings.
- Backup/restore, draft retry, launch prefill, member metadata, and scheduled paths do not drop provider-aware effort.
- Exact logs and diagnostics do not persist raw account identifiers or secret values.
- Exact logs include catalog scope and provider scope fingerprints for debugging preview vs launch mismatch.
- HMR and out-of-order refreshes do not replace ready provider status with stale fallback/error state.
- Provider Settings remains fast and does not show transient empty/error states during refresh.
- Tests cover catalog source, fallback, effort validation, and launch pass-through.

## Final Signoff And Handoff

The implementation is now ready for review after these checks stay green:

1. `claude_team`: `pnpm typecheck`
2. `claude_team`: targeted catalog/runtime/team provisioning Vitest suites
3. `agent_teams_orchestrator_codex_native_spike`: targeted Codex native exec and runtime capability Bun suites
4. Live `codex app-server model/list` smoke against the installed Codex binary
5. Optional UI smoke with `CLAUDE_DEV_RUNTIME_ROOT=/Users/belief/dev/projects/claude/agent_teams_orchestrator_codex_native_spike`

Merge requirement:

- merge/pair the `claude_team` branch with the `agent_teams_orchestrator_codex_native_spike` runtime capability change.
- if the UI branch is merged without the runtime capability change, the feature remains safe but conservative: dynamic future Codex models and `xhigh` are visible as catalog metadata but blocked for launch.
- if the runtime capability change is merged without the UI branch, existing Codex native behavior remains unchanged except for the explicit runtime status payload and `xhigh` exact argv support already covered by tests.

Recommended final manual smoke:

```bash
CLAUDE_DEV_RUNTIME_ROOT=/Users/belief/dev/projects/claude/agent_teams_orchestrator_codex_native_spike pnpm dev
```

Then verify:

- Provider Settings Codex model list is populated from app-server catalog.
- `gpt-5.1-codex-mini` shows only `medium | high`.
- `gpt-5.4` shows `low | medium | high | xhigh`.
- Anthropic does not show `minimal`, `none`, or `xhigh`.
- A synthetic or newly released Codex model is not silently hidden by static UI code.
- Launch logs include selected model, resolved launch model, selected effort, resolved effort, catalog source, and runtime capability truth.