agent-ecosystem/docs/research/codex-app-server-model-catalog-plan.md

Codex App-Server Model Catalog Plan

Date: 2026-04-21
Status: implementation complete in feature worktrees, pending final review/commit
Worktree: /Users/belief/dev/projects/claude/claude_team_codex_model_catalog_plan
Branch: spike/codex-model-catalog-plan
Primary repo: claude_team
Secondary repo worktree: /Users/belief/dev/projects/claude/agent_teams_orchestrator_codex_native_spike
Architecture reference: FEATURE_ARCHITECTURE_STANDARD.md

Executive Summary

Codex model selection should move from hardcoded local lists to the official Codex app-server model/list catalog.

Chosen implementation:

  • Add a dedicated src/features/codex-model-catalog feature in claude_team.
  • Use codex app-server JSON-RPC model/list as the primary source for Codex models.
  • Keep the existing static Codex catalog only as a bounded fallback when app-server is unavailable.
  • Add rich, additive model metadata to CliProviderStatus while keeping models: string[] for backwards compatibility.
  • Use per-model supportedReasoningEfforts and defaultReasoningEffort for the Codex model picker and launch validation.
  • Keep Anthropic and Gemini behavior unchanged by default.
  • Update agent_teams_orchestrator so Codex launches pass reasoning effort through Codex config key model_reasoning_effort, not through an invented --effort flag.

Decision score:

  • 🎯 9 🛡️ 9 🧠 6
  • estimated implementation size: 1200-2400 lines across claude_team and agent_teams_orchestrator

Why this is the safest path:

  • It follows the real Codex client contract instead of updating a static list after every Codex release.
  • It supports future model releases such as gpt-5.5 without an app release, as long as Codex app-server already exposes the model.
  • It avoids breaking Anthropic by making the new catalog contract additive and provider-scoped.
  • It handles xhigh correctly as Codex-specific reasoning effort, not as Anthropic max.

Current implementation state:

  • claude_team has the dedicated Codex model catalog feature, app-server JSON-RPC client, static fallback, provider status integration, Codex model picker integration, provider-aware effort UI, launch validation, launch identity persistence, and targeted tests.
  • agent_teams_orchestrator_codex_native_spike exposes runtime capabilities for dynamic Codex models and Codex reasoning config pass-through, and its Codex native exec runner passes effort through -c model_reasoning_effort="value".
  • Anthropic remains isolated from Codex-only effort values. Anthropic launch UI still uses low | medium | high; Codex can use per-model minimal | low | medium | high | xhigh only where catalog/runtime policy allows it.
  • Future Codex app-server models can appear immediately in UI. Launch is allowed only when the local runtime declares dynamic Codex model support; otherwise they remain visible with upgrade/policy copy instead of failing late during spawn.
  • Default Codex selection is resolved to a concrete model immediately before provisioning and stored as additive launch identity metadata.
  • The remaining work before merge is review/signoff, not more architecture discovery.

Sources And Verification

Official sources checked:

Important official facts:

  • model/list is explicitly intended for rendering model and personality selectors.
  • model/list returns id, model, displayName, hidden, defaultReasoningEffort, supportedReasoningEfforts, inputModalities, supportsPersonality, isDefault, upgrade, and upgradeInfo.
  • includeHidden: false returns picker-visible models by default.
  • codex exec has --model and -c, --config key=value.
  • codex exec does not expose a first-class --effort flag.
  • Codex config key model_reasoning_effort supports minimal | low | medium | high | xhigh.
  • xhigh is model-dependent.
  • config/read exists in app-server and returns effective configuration after configuration layering.
  • Codex loads user config from ~/.codex/config.toml and can also load project-scoped .codex/config.toml only for trusted projects.
  • model_catalog_json can override the model catalog, including profile-level overrides.
  • codex exec supports --cd and --profile, and -c key=value overrides take precedence for one invocation.

Local probe:

  • binary: codex-cli 0.117.0
  • method: codex app-server over JSON-RPC stdio
  • transport: newline-delimited JSON-RPC over stdio, not Content-Length framing
  • request: model/list with { "limit": 20, "includeHidden": false }
  • result: 8 visible models, gpt-5.4 marked default, nextCursor: null
  • visible models returned: gpt-5.4, gpt-5.2-codex, gpt-5.1-codex-max, gpt-5.4-mini, gpt-5.3-codex, gpt-5.3-codex-spark, gpt-5.2, gpt-5.1-codex-mini
  • xhigh is already returned for most models.
  • gpt-5.1-codex-mini only returned medium | high, so effort options must be per-model.
  • gpt-5.3-codex-spark returned default effort high, so default effort must not be global.
  • codex exec --help locally confirms --cd, --profile, --model, --oss, --local-provider, and repeatable -c key=value.
  • local help confirms --oss is equivalent to -c model_provider=oss, so provider scope can differ from subscription-backed OpenAI Codex if not guarded.
  • live config/read probe returned { config, origins }.
  • live config/read probe requires params object; missing params returns JSON-RPC error -32600.
  • live config/read probe accepted { cwd } and { profile } without error, so the implementation should feature-detect and test scoped reads instead of assuming only global config.
  • final live smoke on this worktree confirmed model/list returns 8 visible models, default gpt-5.4, xhigh for most models, and medium | high only for gpt-5.1-codex-mini.
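
The transport finding above (one JSON message per line, no Content-Length header) can be sketched as a pair of framing helpers. This is a minimal sketch, not the repository's JsonRpcStdioClient: encodeRequest and LineDecoder are hypothetical names, and the exact envelope members beyond id, method, and params are an assumption that should be checked against the real client.

```typescript
// Hypothetical helpers for illustration; only "model/list" and its params come from the probe.
interface RpcRequest {
  id: number;
  method: string;
  params: Record<string, unknown>;
}

// One request per line: a JSON object followed by "\n", with no Content-Length header.
function encodeRequest(request: RpcRequest): string {
  return JSON.stringify(request) + '\n';
}

// Buffers stdout chunks and returns every complete JSON message seen so far.
// A trailing partial line stays in the buffer until its newline arrives.
class LineDecoder {
  private buffer = '';

  push(chunk: string): unknown[] {
    this.buffer += chunk;
    const lines = this.buffer.split('\n');
    this.buffer = lines.pop() ?? '';
    return lines
      .filter((line) => line.trim() !== '')
      .map((line) => JSON.parse(line) as unknown);
  }
}

const wire = encodeRequest({
  id: 1,
  method: 'model/list',
  params: { limit: 20, includeHidden: false },
});

// A response split across two stdout chunks is only emitted once its newline arrives.
const decoder = new LineDecoder();
const partial = decoder.push('{"id":1,"result":{"next');
const complete = decoder.push('Cursor":null}}\n');
```

Buffering by line rather than by chunk matters because stdout chunk boundaries do not align with message boundaries.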

Combined app-server session probe:

  • one initialized app-server process successfully handled account/read, account/rateLimits/read, and model/list sequentially
  • account/read returned a ChatGPT account shape in the local environment
  • account/rateLimits/read returned primary.windowDurationMins = 300 and secondary.windowDurationMins = 10080
  • model/list returned the same 8 visible models in that same session
  • conclusion: provider refresh should use a combined control-plane session when it needs account, limits, and catalog truth

Lowest-Confidence Areas And Decisions

1. Auth-scoped catalog truth

🎯 7 🛡️ 9 🧠 6
Estimated implementation impact: 180-350 lines

Uncertainty:

  • app-server model/list may return different catalogs depending on active Codex auth state, account plan, org policy, API-key mode, or future Codex rollout flags.
  • The local probe only proves one logged-in environment, not all account modes.

Decision:

  • treat Codex model catalog as auth-scoped, not global
  • cache key must include binary path, binary version, Codex home, preferred auth mode, effective auth mode, managed account stable identity when available, and API-key availability source
  • never reuse a ChatGPT-account catalog as API-key-mode catalog
  • never reuse an API-key-mode catalog as ChatGPT-account catalog
  • when auth mode changes, keep previous catalog visible only as stale UI while refresh is in flight, then replace it

Implementation rule:

catalogCacheKey =
  binaryPath
  + binaryVersion
  + codexHome
  + preferredAuthMode
  + effectiveAuthMode
  + managedAccountHash or "no-chatgpt-account"
  + apiKey.source or "no-api-key"

The hash should use a per-process salt and should not be persisted. Do not persist raw email solely for catalog cache.
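
A minimal sketch of the cache key rule above, assuming Node's built-in crypto module. CatalogCacheKeyInput, hashAccount, and buildCatalogCacheKey are hypothetical names, and sha256 is an illustrative choice; the plan only requires a salted hash that is never persisted.

```typescript
import { createHash, randomBytes } from 'node:crypto';

// Field names mirror the cache key rule above; the helper names are hypothetical.
interface CatalogCacheKeyInput {
  binaryPath: string;
  binaryVersion: string;
  codexHome: string;
  preferredAuthMode: 'auto' | 'chatgpt' | 'api_key';
  effectiveAuthMode: 'chatgpt' | 'api_key';
  managedAccountId: string | null; // stable identity; never written into the key in raw form
  apiKeySource: string | null; // the source name (e.g. "OPENAI_API_KEY"), never the key value
}

// Per-process salt: created once in memory, so hashes cannot be compared across restarts.
const processSalt = randomBytes(16);

function hashAccount(accountId: string): string {
  return createHash('sha256').update(processSalt).update(accountId).digest('hex');
}

function buildCatalogCacheKey(input: CatalogCacheKeyInput): string {
  const parts = [
    input.binaryPath,
    input.binaryVersion,
    input.codexHome,
    input.preferredAuthMode,
    input.effectiveAuthMode,
    input.managedAccountId === null ? 'no-chatgpt-account' : hashAccount(input.managedAccountId),
    input.apiKeySource ?? 'no-api-key',
  ];
  // JSON encoding keeps the parts unambiguous even if a path contains a separator character.
  return JSON.stringify(parts);
}
```

Because the salt lives only in process memory, the key works for an in-memory cache but is useless as a persistent identifier, which is the intent of the rule.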

2. Default model determinism

🎯 8 🛡️ 9 🧠 6
Estimated implementation impact: 220-420 lines

Uncertainty:

  • current UI can represent model as empty string meaning Default
  • Codex app-server default can change after a Codex release
  • launch logs, relaunch, replay, and team metadata need to stay explainable

Decision:

  • keep Default as a UI selection
  • resolve Default to a concrete resolvedLaunchModel immediately before launch
  • persist both user selection and resolved runtime truth in launch metadata
  • never silently rewrite old team config from one concrete model to another
  • if a team stored Default, relaunch should show that it will resolve to the current Codex default before launch

Required persisted launch identity:

export interface ProviderModelLaunchIdentity {
  providerId: TeamProviderId;
  providerBackendId: TeamProviderBackendId | null;
  selectedModel: string | null;
  selectedModelKind: 'default' | 'explicit';
  resolvedLaunchModel: string;
  catalogId: string | null;
  catalogSource: 'app-server' | 'static-fallback' | 'unavailable';
  catalogFetchedAt: string | null;
  selectedEffort: string | null;
  resolvedEffort: string | null;
}

This identity should be written into exact logs and launch-derived metadata. It should not replace existing fields in Phase 1, but it should become the canonical explanation layer for Codex relaunch/replay.
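
The Default resolution step can be sketched as follows. This is a reduced illustration covering only the selection fields of ProviderModelLaunchIdentity; CatalogModel, ResolvedSelection, and resolveSelection are hypothetical names, and treating the catalog row's model field as the launch string is an assumption.

```typescript
// Simplified catalog row; the real model/list row carries more fields.
interface CatalogModel {
  model: string;
  isDefault: boolean;
  defaultReasoningEffort: string;
}

interface ResolvedSelection {
  selectedModel: string | null;
  selectedModelKind: 'default' | 'explicit';
  resolvedLaunchModel: string;
  selectedEffort: string | null;
  resolvedEffort: string | null;
}

function resolveSelection(
  catalog: CatalogModel[],
  selectedModel: string | null,
  selectedEffort: string | null,
): ResolvedSelection {
  // The UI represents Default as an empty string; treat null the same way.
  if (selectedModel === null || selectedModel === '') {
    const fallback = catalog.find((row) => row.isDefault);
    if (!fallback) {
      throw new Error('Catalog has no default model; Default cannot be resolved');
    }
    return {
      selectedModel: null,
      selectedModelKind: 'default',
      resolvedLaunchModel: fallback.model,
      selectedEffort,
      resolvedEffort: selectedEffort ?? fallback.defaultReasoningEffort,
    };
  }
  // Explicit selections are never rewritten, even if the catalog no longer lists them.
  const row = catalog.find((candidate) => candidate.model === selectedModel);
  return {
    selectedModel,
    selectedModelKind: 'explicit',
    resolvedLaunchModel: selectedModel,
    selectedEffort,
    resolvedEffort: selectedEffort ?? row?.defaultReasoningEffort ?? null,
  };
}
```

Keeping the user's selection and the resolved values side by side is what lets a later relaunch explain why a Default team ran on a particular model.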

3. Effort transport through orchestrator

🎯 8 🛡️ 9 🧠 5
Estimated implementation impact: 180-320 lines

Uncertainty:

  • Agent Teams exposes a generic --effort concept today
  • Codex CLI does not expose --effort
  • Codex uses config key model_reasoning_effort

Decision:

  • UI and main process may accept provider-aware effort strings
  • orchestrator public Agent Teams CLI can continue accepting --effort
  • Codex executor must translate Codex effort to codex exec -c model_reasoning_effort='"value"'
  • Anthropic executor must not see Codex-only effort values
  • Codex executor must not see Anthropic max

No implementation phase may ship xhigh as selectable until this pass-through is tested.
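
A minimal sketch of the Codex-side translation, assuming the executor spawns codex with an argv array rather than through a shell. buildCodexExecArgs is a hypothetical helper name; the flags it emits (exec, --model, -c key=value) and the effort values are the ones listed in this document. The prompt argument and other flags are omitted.

```typescript
type CodexReasoningEffort = 'minimal' | 'low' | 'medium' | 'high' | 'xhigh';

const CODEX_EFFORTS: readonly string[] = ['minimal', 'low', 'medium', 'high', 'xhigh'];

function isCodexEffort(value: string): value is CodexReasoningEffort {
  return CODEX_EFFORTS.includes(value);
}

// Builds argv for `codex exec`. There is no --effort flag; effort travels as a config override.
function buildCodexExecArgs(model: string, effort: string | null): string[] {
  const args = ['exec', '--model', model];
  if (effort !== null) {
    if (!isCodexEffort(effort)) {
      // Rejects Anthropic-only values such as "max" before they reach the Codex CLI.
      throw new Error(`Unsupported Codex reasoning effort: ${effort}`);
    }
    // The value is a quoted string. With an argv array no shell quoting is needed;
    // in a shell the same argument is written as model_reasoning_effort='"xhigh"'.
    args.push('-c', `model_reasoning_effort="${effort}"`);
  }
  return args;
}

const exampleArgs = buildCodexExecArgs('gpt-5.4', 'xhigh');
```

Validating against the closed effort list inside the executor keeps the isolation rule enforceable even if an upstream caller forwards a raw --effort string.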

4. Catalog availability vs team-agent safety policy

🎯 8 🛡️ 9 🧠 5
Estimated implementation impact: 160-280 lines

Uncertainty:

  • app-server model/list says a model is available to Codex
  • our team-agent contract can still make a model unsafe for Agent Teams if it breaks task/reply/bootstrap conventions
  • current UI has local disabled policy for gpt-5.3-codex-spark, gpt-5.2-codex, and gpt-5.1-codex-mini

Decision:

  • model catalog answers "can Codex offer this model"
  • team policy answers "can Agent Teams safely launch this model"
  • keep these as separate layers
  • do not remove current disabled policies just because app-server returns a model
  • show clear disabled copy: Available in Codex, disabled for Agent Teams
  • disabled models can still display catalog metadata and effort metadata for transparency

5. Codex binary version and app-server method compatibility

🎯 7 🛡️ 9 🧠 6
Estimated implementation impact: 220-420 lines

Uncertainty:

  • codex app-server is documented as an app-server integration surface, but local users can have older Codex binaries.
  • model/list may be missing, renamed, or return a narrower shape in older binaries.
  • Current JsonRpcStdioClient collapses JSON-RPC errors to Error(message), which loses method, code, and structured details needed to distinguish method not found from auth/network/timeout.
  • Current CodexBinaryResolver caches only binary path, not binary version.

Decision:

  • make binary version part of catalog cache identity
  • add structured JSON-RPC error metadata before implementing catalog fallback
  • treat method not found as static-fallback, not as account failure
  • treat malformed model rows as catalog degradation, not app-server runtime failure
  • clear catalog cache when resolved Codex binary path or version changes

Required implementation detail:

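export class JsonRpcRequestError extends Error {
  // Keeps the JSON-RPC method, error code, and structured details that Error(message) alone drops.
  constructor(message: string, readonly method: string, readonly code: number | null, readonly details: unknown) {
    super(message);
    this.name = 'JsonRpcRequestError';
  }
}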

The app-server model client should classify:

  • method_not_found: fallback to static catalog and show upgrade hint
  • timeout: stale cache if available, then fallback
  • malformed_response: fallback plus diagnostics
  • process_exit: shared app-server failure for all sub-results in combined snapshot
  • auth_required: account/read decides auth truth; model/list must not invent auth truth

6. auto auth resolution for model catalog

🎯 7 🛡️ 9 🧠 6
Estimated implementation impact: 180-320 lines

Uncertainty:

  • UI lets users pick auto, chatgpt, or api_key.
  • Catalog can differ between ChatGPT subscription and API key.
  • The model picker must preview the catalog for the mode that launch will actually use, not only the configured preference.

Decision:

  • preferredAuthMode=auto is not a catalog scope by itself
  • resolve auto into effectiveAuthMode using the same readiness logic as launch
  • catalog request should be scoped to the effective launch mode
  • Provider Settings can show both preference and effective catalog scope when they differ
  • if effective mode flips from ChatGPT to API key because ChatGPT becomes unavailable, keep stale ChatGPT catalog visually stale and refresh API-key catalog

UX copy rule:

  • do not say Detected from OPENAI_API_KEY as the primary model catalog source when ChatGPT account is the effective mode
  • show API-key availability only as fallback/secondary when selected auth is ChatGPT or auto resolves to ChatGPT

7. App-server notifications and refresh cadence

🎯 8 🛡️ 8 🧠 5
Estimated implementation impact: 160-260 lines

Uncertainty:

  • account login flow has notifications
  • current docs and local probe do not establish a dedicated model-catalog changed notification
  • keeping a long-lived app-server just for model catalog would increase lifecycle complexity

Decision:

  • do not introduce a long-lived model catalog subscription in this rollout
  • use short-lived app-server sessions for refresh
  • trigger catalog refresh after login success, logout, auth mode change, API-key source change, manual refresh, and provider status refresh
  • do not poll model/list aggressively from renderer
  • use 10 minute success TTL and stale cache for UI continuity

If a future app-server release adds model catalog notifications, integrate them later behind the catalog feature port without changing renderer contracts.

8. Backup, restore, and relaunch compatibility

🎯 7 🛡️ 9 🧠 7
Estimated implementation impact: 240-520 lines

Uncertainty:

  • team launch metadata already persists provider/model/effort/backend in several places
  • adding dynamic defaults and resolved model identity can make old backups ambiguous
  • old teams may contain no modelCatalog metadata

Decision:

  • new ProviderModelLaunchIdentity is additive
  • old teams without it remain readable
  • relaunch derives missing identity from existing provider/model/effort/backend fields
  • restore does not require the old catalog to be available
  • if restored explicit model is missing from current catalog, UI preserves the explicit model with a warning instead of silently replacing it with current default
  • if restored model was Default, relaunch preview resolves it against current catalog and says so before launch

Migration rule:

old explicit model -> selectedModelKind="explicit", resolvedLaunchModel=old model
old empty model -> selectedModelKind="default", resolvedLaunchModel=current default at next launch
missing effort -> selectedEffort=null, resolvedEffort=current model default at next launch

9. UI and orchestrator version skew

🎯 7 🛡️ 10 🧠 7
Estimated implementation impact: 280-620 lines

Uncertainty:

  • claude_team and agent_teams_orchestrator can be updated at different times.
  • UI can learn about xhigh, minimal, or a future model like gpt-5.5 before the installed orchestrator can launch it safely.
  • The current orchestrator static Codex helpers can reject a model that Codex app-server already exposed.

Decision:

  • catalog visibility and launch capability are separate contracts
  • UI may display app-server catalog metadata as soon as it is available
  • UI must not enable launch controls that require new orchestrator behavior until runtime capability says that behavior exists
  • provider-explicit Codex model strings can be accepted only after orchestrator declares dynamic Codex model support
  • Codex xhigh can be shown as metadata before Phase 4, but it is disabled for launch until Codex effort pass-through is available

Required runtime capability contract:

export interface ProviderRuntimeCapabilities {
  providerId: TeamProviderId;
  codex?: {
    supportsDynamicAppServerModels: boolean;
    supportsCodexReasoningEffortConfig: boolean;
    supportedCodexReasoningEfforts: Array<'minimal' | 'low' | 'medium' | 'high' | 'xhigh'>;
    acceptsProviderExplicitFutureModels: boolean;
  };
}

Compatibility rule:

catalog says model/effort exists
+ team policy says model is not disabled
+ runtime capability says launch path supports it
= launch control enabled

If any part is missing, the picker can still display the model, but launch must be disabled with explicit copy.
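
The three-part compatibility rule can be sketched as a single gate function. LaunchGateInput and evaluateLaunchGate are hypothetical names; the status values reuse CodexTeamModelPolicyStatus from this plan, and the copy strings are the ones recommended in this document. No copy is defined here for a model missing from the catalog, so that case returns null.

```typescript
interface LaunchGateInput {
  catalogHasModelAndEffort: boolean;
  disabledByTeamPolicy: boolean;
  runtimeSupportsModel: boolean;
  runtimeSupportsEffort: boolean;
}

type LaunchGate =
  | { enabled: true }
  | {
      enabled: false;
      status: 'missing-from-current-catalog' | 'disabled-for-agent-teams' | 'requires-runtime-upgrade';
      copy: string | null;
    };

// Launch is enabled only when catalog, team policy, and runtime capability all agree.
function evaluateLaunchGate(input: LaunchGateInput): LaunchGate {
  if (!input.catalogHasModelAndEffort) {
    return { enabled: false, status: 'missing-from-current-catalog', copy: null };
  }
  if (input.disabledByTeamPolicy) {
    return {
      enabled: false,
      status: 'disabled-for-agent-teams',
      copy: 'Available in Codex, disabled for Agent Teams',
    };
  }
  if (!input.runtimeSupportsModel) {
    return {
      enabled: false,
      status: 'requires-runtime-upgrade',
      copy: 'Available in Codex, waiting for Agent Teams runtime support',
    };
  }
  if (!input.runtimeSupportsEffort) {
    return {
      enabled: false,
      status: 'requires-runtime-upgrade',
      copy: 'This Codex effort is visible in Codex, but this Agent Teams runtime cannot launch it yet',
    };
  }
  return { enabled: true };
}
```

Returning a status and copy with every disabled result means the picker never has to show a disabled control without an explanation.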

Recommended copy:

  • Available in Codex, waiting for Agent Teams runtime support
  • This Codex effort is visible in Codex, but this Agent Teams runtime cannot launch it yet
  • Upgrade the Agent Teams runtime to use this model

This avoids a bad state where the user selects xhigh successfully in UI and then gets a late codex exec failure.

10. Future model policy, including gpt-5.5

🎯 8 🛡️ 8 🧠 6
Estimated implementation impact: 240-520 lines

Uncertainty:

  • app-server can expose a new model immediately after OpenAI releases it.
  • the user goal is that new Codex models appear without us shipping a new static list.
  • Agent Teams still needs a safety layer so one unexpected model row does not break team launch flows.

Top 3 policies:

  1. Allow every app-server-visible model immediately: 🎯 8 🛡️ 5 🧠 3, 80-180 lines. This best solves future releases, but it can route unverified models into team launch without product copy or rollback clarity.
  2. Show every app-server-visible model immediately, launch with capability gate plus "new model" warning: 🎯 9 🛡️ 8 🧠 5, 240-520 lines. This keeps future models visible without app releases while blocking only real launch incompatibilities.
  3. Hide or disable unknown models until a code release updates policy: 🎯 4 🛡️ 9 🧠 2, 60-120 lines. This is safe but defeats the reason to use model/list.

Chosen policy: option 2.

Implementation rule:

  • app-server-visible, non-hidden models appear in the picker immediately
  • known disabled Agent Teams models remain disabled
  • new unknown models are selectable only if runtime capabilities support dynamic Codex models
  • new unknown models get a New from Codex catalog note until a successful launch or explicit policy promotion marks them verified
  • if the new model does not expose usable text input or any supported effort we can launch, it is shown but disabled
  • hidden models are never introduced into new-team pickers by default

Policy statuses:

export type CodexTeamModelPolicyStatus =
  | 'verified'
  | 'new-from-codex-catalog'
  | 'disabled-for-agent-teams'
  | 'requires-runtime-upgrade'
  | 'missing-from-current-catalog';

This means gpt-5.5 can appear the day app-server returns it, but the UI will not pretend the full Agent Teams launch path is verified unless the local runtime can actually handle provider-explicit dynamic Codex models.

11. Hidden, upgraded, and persisted models

🎯 8 🛡️ 9 🧠 5
Estimated implementation impact: 160-340 lines

Uncertainty:

  • official docs say includeHidden: false returns picker-visible models by default.
  • persisted teams can reference a model that later becomes hidden, upgraded, renamed, or unavailable.
  • app-server exposes upgrade and upgradeInfo, but we do not know every future migration shape.

Decision:

  • normal picker uses includeHidden: false
  • if a persisted explicit Codex model is not found in the visible catalog, run one scoped refresh with includeHidden: true
  • if hidden lookup finds the model, show it as Hidden in Codex catalog and keep relaunch possible only if runtime capability and team policy allow it
  • if upgrade points to a visible replacement, show a non-destructive migration suggestion
  • never auto-rewrite persisted model ids during restore or relaunch

Relaunch behavior:

visible model found -> normal relaunch
hidden model found -> relaunch allowed only with warning and policy pass
upgrade available -> show "Switch to recommended model" action
missing model -> keep value visible, require user to choose another model before launch

This avoids both failure modes: silently changing a user's team model, and breaking old teams because a model moved out of the default picker.
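
The relaunch behavior table can be sketched as a pure decision function. RelaunchInput, RelaunchPlan, and planRelaunch are hypothetical names, and modeling the app-server upgrade field as a replacement model id or null is an assumption, since the future migration shape is not known.

```typescript
interface RelaunchInput {
  persistedModel: string;
  foundInVisibleCatalog: boolean;
  foundInHiddenCatalog: boolean; // result of the one scoped includeHidden: true lookup
  policyAndRuntimeAllow: boolean;
  upgradeTarget: string | null; // assumption: a visible replacement model id, if any
}

interface RelaunchPlan {
  action: 'relaunch' | 'relaunch-with-warning' | 'blocked' | 'choose-another-model';
  launchModel: string | null; // always the persisted id or null, never a rewritten id
  suggestSwitchTo: string | null; // drives the "Switch to recommended model" action
}

function planRelaunch(input: RelaunchInput): RelaunchPlan {
  const suggestSwitchTo = input.upgradeTarget;
  if (input.foundInVisibleCatalog) {
    return { action: 'relaunch', launchModel: input.persistedModel, suggestSwitchTo };
  }
  if (input.foundInHiddenCatalog) {
    return input.policyAndRuntimeAllow
      ? { action: 'relaunch-with-warning', launchModel: input.persistedModel, suggestSwitchTo }
      : { action: 'blocked', launchModel: null, suggestSwitchTo };
  }
  // Missing model: keep the value visible in the UI, but require an explicit user choice.
  return { action: 'choose-another-model', launchModel: null, suggestSwitchTo };
}
```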

12. Stored effort schema and non-dialog launch paths

🎯 7 🛡️ 9 🧠 7
Estimated implementation impact: 320-760 lines

Uncertainty:

  • effort is not used only in launch dialogs.
  • team metadata, member metadata, backup/restore, draft retry, localStorage launch params, and scheduled/provisioned flows can all carry effort.
  • current normalizers in team data paths may silently discard anything outside low | medium | high.

Decision:

  • provider-aware effort parsing must be added at every inbound boundary, not only in React components
  • old persisted low | medium | high values stay valid
  • new Codex-specific values are preserved only with provider/model context
  • if provider context is missing, parse as legacy effort and do not invent Codex-specific meaning
  • scheduled launches and automation-like flows must either be updated in the same phase or explicitly block Codex-only efforts until updated

High-risk code paths to audit during implementation:

  • src/main/services/team/TeamMembersMetaStore.ts
  • src/main/services/team/TeamDataService.ts
  • src/main/services/team/TeamBackupService.ts
  • src/main/services/team/TeamProvisioningService.ts
  • src/shared/types/schedule.ts
  • src/main/ipc/teams.ts
  • src/main/http/teams.ts
  • renderer launch prefill and draft retry localStorage state

Migration rule:

legacy effort with no provider context -> keep if low | medium | high
codex effort with provider=codex -> validate against selected model catalog
codex effort with provider missing -> store as selected string only, resolve before launch
unsupported restored effort -> show warning, do not silently downgrade
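
A minimal sketch of a provider-aware parser that follows the migration rule above. parseStoredEffort and the ParsedEffort shape are hypothetical; the "deferred" status stands for "store as selected string only, resolve before launch". Treating an unavailable Codex catalog as deferred is an assumption.

```typescript
const LEGACY_EFFORTS: readonly string[] = ['low', 'medium', 'high'];
const CODEX_ONLY_EFFORTS: readonly string[] = ['minimal', 'xhigh'];

type ParsedEffort =
  | { status: 'valid'; value: string }
  | { status: 'deferred'; value: string } // kept as the selected string, resolved before launch
  | { status: 'unsupported'; value: string }; // surfaced as a warning, never silently downgraded

function parseStoredEffort(
  raw: string,
  providerId: string | null,
  modelEfforts: readonly string[] | null, // supportedReasoningEfforts of the selected model, if known
): ParsedEffort {
  if (providerId === 'codex') {
    // Assumption: with no catalog available, defer instead of rejecting.
    if (modelEfforts === null) return { status: 'deferred', value: raw };
    return modelEfforts.includes(raw)
      ? { status: 'valid', value: raw }
      : { status: 'unsupported', value: raw };
  }
  if (LEGACY_EFFORTS.includes(raw)) return { status: 'valid', value: raw };
  if (providerId === null && CODEX_ONLY_EFFORTS.includes(raw)) {
    return { status: 'deferred', value: raw };
  }
  return { status: 'unsupported', value: raw };
}
```

Every branch returns the original string unchanged, so a restore path built on this parser cannot discard or downgrade a stored effort by accident.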

13. Renderer stale state, HMR, and out-of-order refreshes

🎯 8 🛡️ 9 🧠 5
Estimated implementation impact: 180-380 lines

Uncertainty:

  • previous provider settings work showed transient wrong states after HMR and slow refreshes.
  • catalog, account, and rate limits can refresh with different timings.
  • a stale app-server response can arrive after a newer auth-mode change.

Decision:

  • every provider status refresh should carry a monotonic requestId or snapshotVersion
  • renderer stores the latest accepted version per provider
  • responses older than the latest accepted version are ignored
  • modelCatalog.schemaVersion is required and future versions are treated as degraded, not fatal
  • HMR should keep last ready provider status visible while a refresh is in flight
  • a catalog refresh cannot overwrite account connected state unless it came from the same combined snapshot

Required stale-write guard:

if incoming.providerId != current.providerId -> reject
if incoming.requestId < current.requestId -> reject
if incoming.authScope != current.authScope and incoming.status is not from current auth selection -> keep as stale diagnostics only

This directly targets flicker like Codex native unavailable followed by ready state, or fallback API-key copy appearing while ChatGPT account mode is selected.
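
The stale-write guard could be expressed in the renderer store roughly as below. ProviderSnapshot and guardSnapshotWrite are hypothetical names, and interpreting "from current auth selection" as a comparison against a selectedAuthScope value is an assumption.

```typescript
interface ProviderSnapshot {
  providerId: string;
  requestId: number; // monotonic per provider
  authScope: string; // e.g. "chatgpt" or "api_key"
}

type WriteVerdict = 'accept' | 'reject' | 'stale-diagnostics-only';

function guardSnapshotWrite(
  current: ProviderSnapshot,
  incoming: ProviderSnapshot,
  selectedAuthScope: string,
): WriteVerdict {
  if (incoming.providerId !== current.providerId) return 'reject';
  // Responses older than the latest accepted version are ignored.
  if (incoming.requestId < current.requestId) return 'reject';
  // A response for an auth scope the user has moved away from is kept for diagnostics only.
  if (incoming.authScope !== current.authScope && incoming.authScope !== selectedAuthScope) {
    return 'stale-diagnostics-only';
  }
  return 'accept';
}
```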

14. Privacy, logs, and diagnostics

🎯 8 🛡️ 9 🧠 4
Estimated implementation impact: 120-260 lines

Uncertainty:

  • account-scoped cache keys need stable identity, but raw email should not leak into exact logs, runtime snapshots, or persistent diagnostics.
  • API-key source is useful for UX, but no secret or env value should be logged.

Decision:

  • hash managed account identity in memory for cache keys
  • use a per-process salt for volatile cache keys
  • do not persist raw account email solely for model catalog cache
  • exact logs can record authScope=chatgpt or authScope=api_key, not raw account identity
  • diagnostics can record apiKeySource=OPENAI_API_KEY but never the value
  • error messages preserve method/code/timeout, but redact command env and tokens

Required diagnostic fields:

export interface CodexModelCatalogDiagnostics {
  source: 'app-server' | 'static-fallback' | 'unavailable';
  status: 'ready' | 'stale' | 'degraded' | 'unavailable';
  method?: 'model/list';
  errorCode?: string | number | null;
  errorCategory?: string | null;
  binaryVersion?: string | null;
  effectiveAuthMode?: 'chatgpt' | 'api_key' | null;
  cacheAgeMs?: number | null;
}

No UI surface should show Unknown error for catalog failures after this feature.

15. Rollout ordering across repos

🎯 8 🛡️ 10 🧠 6
Estimated implementation impact: 120-260 lines

Uncertainty:

  • claude_team can ship UI before the user has a compatible agent_teams_orchestrator runtime in cache.
  • the app can point to CLAUDE_DEV_RUNTIME_ROOT, bundled runtime cache, or a user-installed runtime binary.

Decision:

  • implement orchestrator support first or behind a UI capability gate
  • Provider Settings can show catalog metadata before launch support exists
  • Create/Launch dialogs must consult runtime capabilities before enabling new Codex models or new Codex efforts
  • the runtime health check should expose a version/capability payload, not force UI to infer support from binary version strings
  • if capabilities are unavailable, default to safe: display metadata, disable launch-only features

Rollout sequence:

  1. Add orchestrator dynamic Codex model and effort capability support.
  2. Add claude_team catalog feature and provider status metadata.
  3. Show catalog in UI with capability gates.
  4. Enable launch when capability and catalog agree.
  5. Remove any temporary guard only after bundled runtime and dev runtime both report capabilities in CI/smoke.

This is the cleanest way to avoid UI and runtime getting out of sync.

16. Codex config/profile/cwd catalog mismatch

🎯 6 🛡️ 10 🧠 8
Estimated implementation impact: 360-900 lines

Uncertainty:

  • official config docs allow model_catalog_json, and profile-level profiles.<name>.model_catalog_json can override it.
  • Codex loads project-scoped .codex/config.toml only when a project or worktree is trusted.
  • codex exec can run with a different cwd, profile, and inline -c overrides than the short-lived app-server preview session.
  • current CodexAppServerSessionFactory starts codex app-server without an explicit cwd or profile.

Failure mode:

  • Provider Settings shows catalog A from global config.
  • Launch runs codex exec in project cwd with project-scoped or profile config and effectively uses catalog B.
  • The user selects a model that preview says is valid, but launch resolves against a different provider/catalog.

Top 3 policies:

  1. Global-only catalog preview: 🎯 7 🛡️ 5 🧠 3, 80-180 lines. Fast and simple, but wrong for project-scoped Codex configs.
  2. Project-scoped catalog preview for launch flows, global preview for dashboard: 🎯 9 🛡️ 9 🧠 7, 360-900 lines. More work, but it matches actual codex exec launch context.
  3. Ignore config and force a static OpenAI Codex provider always: 🎯 5 🛡️ 8 🧠 4, 200-420 lines. Safer than mismatch, but it discards legitimate user Codex config and can surprise power users.

Chosen policy: option 2.

Decision:

  • dashboard/provider card can show a global Codex catalog snapshot
  • Create/Launch dialogs must fetch or resolve catalog for the selected launch cwd
  • if profile selection exists or is introduced, catalog cache key must include profile name
  • if we pass inline config overrides to codex exec, equivalent preview scope must include those overrides or launch must be marked "not preview-verified"
  • if project trust/config cannot be resolved, launch UI falls back to global catalog but shows Catalog may differ for this project

Required preview scope:

export interface CodexModelCatalogScope {
  codexHome: string;
  binaryPath: string;
  binaryVersion: string | null;
  cwd: string | null;
  projectTrust: 'trusted' | 'untrusted' | 'unknown';
  profileName: string | null;
  configFingerprint: string | null;
  preferredAuthMode: 'auto' | 'chatgpt' | 'api_key' | null;
  effectiveAuthMode: 'chatgpt' | 'api_key' | null;
  launchOverridesFingerprint: string | null;
}

Cache key correction:

catalogCacheKey =
  binaryPath
  + binaryVersion
  + codexHome
  + cwd or "global"
  + projectTrust
  + profileName or "default-profile"
  + configFingerprint or "unknown-config"
  + launchOverridesFingerprint or "no-launch-overrides"
  + preferredAuthMode
  + effectiveAuthMode
  + forcedLoginMethod or "no-forced-login-method"
  + forcedWorkspaceHash or "no-forced-workspace"
  + managedAccountHash or "no-chatgpt-account"
  + apiKey.source or "no-api-key"

Implementation notes:

  • use app-server config/read when available to get effective config fingerprints for the same scope that launch will use
  • do not parse arbitrary TOML as the primary config source if app-server can resolve effective configuration
  • if app-server cannot scope config/read by cwd/profile, keep that uncertainty visible in diagnostics
  • do not use raw config file contents as a cache key or log payload; hash only the relevant effective keys

Relevant effective keys:

  • model
  • model_provider
  • model_catalog_json
  • profiles.<name>.model_catalog_json
  • model_reasoning_effort
  • forced_login_method
  • forced_chatgpt_workspace_id
  • openai_base_url
  • model_providers.* only as a redacted structural fingerprint
  • projects.<path>.trust_level

Acceptance:

  • a team launch from project A and project B can have different Codex catalog cache entries
  • a trusted project .codex/config.toml changing model_catalog_json invalidates preview for that project
  • global dashboard status does not claim to be launch-exact for every project
  • exact logs record the catalog scope fingerprint, not raw config values

17. Built-in OpenAI Codex provider vs custom/OSS Codex config

🎯 7 🛡️ 9 🧠 7
Estimated implementation impact: 260-620 lines

Uncertainty:

  • Codex config supports model_provider, custom providers, oss_provider, and provider auth settings.
  • Agent Teams "Codex" provider is intended to mean native Codex through OpenAI/ChatGPT subscription or API-key billing, not arbitrary custom provider execution.
  • app-server model/list can be influenced by configuration, but our product copy currently talks about Codex subscription.

Decision:

  • this cutover should keep Agent Teams Codex scoped to the built-in OpenAI Codex provider
  • custom provider and OSS provider support should be a separate provider feature, not silently mixed into provider=codex
  • if effective config says model_provider is not built-in OpenAI for the launch scope, show a clear warning and block subscription-mode launch unless the user intentionally switches to a future custom-provider flow
  • when launching Agent Teams Codex, pass or enforce provider config consistently so codex exec uses the same provider class previewed by the catalog

Recommended launch guard:

if provider=codex and effective model_provider is neither missing nor "openai":
  status = degraded
  launch = blocked
  copy = "This project config points Codex at a custom/local provider. Agent Teams Codex currently supports the built-in OpenAI Codex provider only."
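
The guard above can be sketched as a small function. guardCodexModelProvider and ProviderGuardResult are hypothetical names; "missing" is modeled as null or undefined, which is an assumption about how the effective config is read.

```typescript
interface ProviderGuardResult {
  status: 'ok' | 'degraded';
  launch: 'allowed' | 'blocked';
  copy: string | null;
}

const CUSTOM_PROVIDER_COPY =
  'This project config points Codex at a custom/local provider. ' +
  'Agent Teams Codex currently supports the built-in OpenAI Codex provider only.';

function guardCodexModelProvider(
  providerId: string,
  effectiveModelProvider: string | null | undefined,
): ProviderGuardResult {
  // Assumption: a missing model_provider shows up as null or undefined in the effective config.
  const isBuiltIn = effectiveModelProvider == null || effectiveModelProvider === 'openai';
  if (providerId === 'codex' && !isBuiltIn) {
    return { status: 'degraded', launch: 'blocked', copy: CUSTOM_PROVIDER_COPY };
  }
  return { status: 'ok', launch: 'allowed', copy: null };
}
```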

If the team wants to support custom providers later:

  • add a separate provider=codex-custom or generic OpenAI-compatible provider
  • do not reuse subscription UX or rate-limit UI
  • do not show ChatGPT account limits for custom provider launches

This prevents a confusing case where UI says "Codex subscription" but runtime actually routes to local OSS or a custom endpoint.

18. Modalities and personality support

🎯 8 🛡️ 9 🧠 4
Estimated implementation impact: 120-280 lines

Uncertainty:

  • app-server model rows expose inputModalities and supportsPersonality.
  • Agent Teams launch prompts are text-first today, but future UI can attach images or personality-like instructions.
  • older model catalogs can omit inputModalities, and docs say missing modalities should be treated as ["text", "image"] for backward compatibility.

Decision:

  • launchability requires text input support
  • image support is displayed as capability metadata, not required for normal team launch
  • supportsPersonality=false must not disable normal team launch, but the UI must not claim /personality or personality-specific behavior for that model
  • missing inputModalities uses the documented backward-compatible default

Validation rule:

if inputModalities exists and does not include "text":
  show model, disable launch, copy "This Codex model is not text-launch compatible for Agent Teams"

if supportsPersonality=false:
  hide personality controls for this model if those controls exist

This keeps model picker truthful without overfitting to the current text-only launch flow.
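
The validation rule above could be sketched as a pure function. This is an illustrative sketch only: `ModalityRow`, `ModalityDecision`, and `evaluateModalities` are hypothetical names, and treating a missing `supportsPersonality` as "no personality controls" is an assumption that matches the normalization default used elsewhere in this plan.

```typescript
// Hypothetical input/output shapes for illustration only.
interface ModalityRow {
  inputModalities?: string[] | null;
  supportsPersonality?: boolean | null;
}

interface ModalityDecision {
  inputModalities: string[];
  launchable: boolean;
  showPersonalityControls: boolean;
  copy: string | null;
}

function evaluateModalities(row: ModalityRow): ModalityDecision {
  // Documented backward-compatible default for catalogs that omit the field.
  const inputModalities = row.inputModalities ?? ["text", "image"];
  // Launchability requires text input support; image is metadata only.
  const launchable = inputModalities.includes("text");
  return {
    inputModalities,
    launchable,
    // Personality controls are shown only when support is explicitly true.
    showPersonalityControls: row.supportsPersonality === true,
    copy: launchable
      ? null
      : "This Codex model is not text-launch compatible for Agent Teams",
  };
}
```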

19. Stable app-server surface vs experimental fields

🎯 8 🛡️ 9 🧠 4
Estimated implementation impact: 80-180 lines

Uncertainty:

  • app-server has an experimentalApi capability.
  • model/list itself is documented on the stable API overview, but adjacent methods and future richer fields can be experimental.
  • opting into experimental API globally can change response surface and error behavior.

Decision:

  • keep experimentalApi=false for the model catalog rollout
  • rely only on stable model/list fields listed in the docs
  • treat extra fields as diagnostics only
  • add a later explicit spike before using experimental catalog, plugin, or app-server thread features in this path

Acceptance:

  • catalog tests run with experimentalApi=false
  • no Phase 1-5 task depends on experimental fields
  • if a future field appears, normalization ignores it unless we add a typed, tested use case

20. App-server preview vs native exec signoff

🎯 8 🛡️ 10 🧠 6
Estimated implementation impact: 180-420 lines

Uncertainty:

  • model/list is the correct picker source, but the actual launch surface remains codex exec --json.
  • a model can appear in app-server before codex exec in the installed binary handles it correctly.
  • effort config can be accepted syntactically but rejected by the model/provider at runtime.

Decision:

  • app-server catalog is required for the UI, but catalog presence alone is not a sufficient release gate for enabling new launch capability
  • Phase 4 must include a live or mocked native-exec compatibility probe for the selected launch path
  • native exec signoff should test model, provider scope, cwd, profile, and non-default effort together
  • if live signoff is not available in CI, use a fixture-based unit test plus one documented local smoke command before merging

Required signoff matrix:

default model + default effort + selected cwd
explicit gpt-5.4 + xhigh + selected cwd
gpt-5.1-codex-mini + high + selected cwd
gpt-5.1-codex-mini + xhigh -> blocked before exec
synthetic future model + capability disabled -> blocked before exec
synthetic future model + capability enabled -> argv accepted by orchestrator test
custom model_provider config -> blocked or explicit custom-provider copy

This prevents the plan from treating app-server catalog presence as proof that the full Agent Teams runtime path is healthy.

21. config/read scope contract is only partially documented

🎯 7 🛡️ 9 🧠 6
Estimated implementation impact: 180-420 lines

Uncertainty:

  • docs list config/read, but the detailed request/response shape is not as explicit as model/list.
  • local probe confirms config/read returns { config, origins } and accepts params.
  • local probe confirms missing params returns -32600, so callers must always send {} at minimum.
  • local probe confirms { cwd } and { profile } are accepted, but we still need tests around whether they fully mirror codex exec --cd/--profile in all installations.

Decision:

  • treat config/read as a feature-detected helper, not as a hard dependency for model catalog availability
  • always call config/read with an object, never with missing params
  • include config/read method/code/details in diagnostics
  • if scoped config/read fails but global succeeds, mark launch catalog as scope_unverified, not ready
  • if config/read is missing on older binaries, fall back to global catalog and require runtime capability plus explicit degraded copy before launch enablement

Recommended DTO:

export interface CodexAppServerConfigReadParams {
  cwd?: string | null;
  profile?: string | null;
}

export interface CodexAppServerConfigReadResponse {
  config: Record<string, unknown>;
  origins: Record<string, unknown>;
}

Feature-detect result:

export type CodexConfigReadSupport =
  | 'supported-scoped'
  | 'supported-global-only'
  | 'method-missing'
  | 'failed';

Acceptance:

  • unit tests cover missing params, method-not-found, global success, scoped success, and scoped failure
  • config/read failure never breaks account/rate-limit/model-list reads
  • launch UI does not present a project-scoped catalog as verified unless config scope was actually checked
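
A sketch of how the feature-detect result could be derived from two probe attempts. `ConfigReadAttempt` and `classifyConfigReadSupport` are hypothetical names. The sketch assumes an older binary without config/read answers with the standard JSON-RPC 2.0 "method not found" code (-32601); that behavior should be confirmed by the Phase 1 probe before relying on it.

```typescript
type CodexConfigReadSupport =
  | "supported-scoped"
  | "supported-global-only"
  | "method-missing"
  | "failed";

// Hypothetical outcome of one config/read call.
type ConfigReadAttempt =
  | { ok: true }
  | { ok: false; code: number | null };

// Reserved JSON-RPC 2.0 error code for an unknown method.
const JSON_RPC_METHOD_NOT_FOUND = -32601;

function classifyConfigReadSupport(
  globalAttempt: ConfigReadAttempt,
  scopedAttempt: ConfigReadAttempt | null, // null when no scoped read was attempted
): CodexConfigReadSupport {
  if (!globalAttempt.ok) {
    return globalAttempt.code === JSON_RPC_METHOD_NOT_FOUND
      ? "method-missing"
      : "failed";
  }
  // Global succeeded; scope is only verified if the scoped read also succeeded.
  if (scopedAttempt !== null && scopedAttempt.ok) return "supported-scoped";
  return "supported-global-only";
}
```

With this shape, a scoped failure after a global success lands in `supported-global-only`, which the launch UI can surface as scope_unverified rather than ready.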

22. Forced login method and ChatGPT workspace scope

🎯 7 🛡️ 10 🧠 6
Estimated implementation impact: 180-420 lines

Uncertainty:

  • effective Codex config can include forced_login_method.
  • effective Codex config can include forced_chatgpt_workspace_id.
  • workspace/account policy can affect available models, rate limits, and whether ChatGPT subscription mode is valid.
  • previous Codex account work already had a real bug around forced login method, so this is not theoretical.

Decision:

  • auth scope must include forced login method and forced workspace identity when present
  • if UI-selected auth mode conflicts with forced_login_method, effective auth mode wins and UI must explain why
  • forced workspace id must be hashed before cache/log usage
  • rate-limit, account, and model catalog snapshots must be scoped together so workspace changes cannot reuse stale catalog

Auth scope correction:

export interface CodexCatalogAuthScope {
  preferredAuthMode: 'auto' | 'chatgpt' | 'api_key' | null;
  effectiveAuthMode: 'chatgpt' | 'api_key' | null;
  forcedLoginMethod: 'chatgpt' | 'api_key' | null;
  managedAccountHash: string | null;
  forcedWorkspaceHash: string | null;
  apiKeySource: string | null;
}

UX rules:

  • if user selected ChatGPT but config forces API key, show Codex config forces API key mode for this scope
  • if user selected API key but config forces ChatGPT, show Codex config forces ChatGPT account mode for this scope
  • if workspace id changes, show Codex workspace changed, refreshing subscription limits and model catalog
  • never show raw workspace id in UI unless Codex app-server provides a display name that is intended for users

Cache invalidation:

  • forced login method change invalidates both auth and catalog cache
  • forced workspace hash change invalidates ChatGPT-scoped rate limits and catalog
  • account logout clears all ChatGPT workspace-scoped entries
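
The "effective auth mode wins" rule and its UX copy could be sketched as follows. `resolveEffectiveAuthMode` and `AuthResolution` are hypothetical names; how `auto` resolves when nothing is forced is outside this sketch and is returned as `null` here purely as a placeholder.

```typescript
type AuthMode = "chatgpt" | "api_key";

// Hypothetical result shape for illustration only.
interface AuthResolution {
  effectiveAuthMode: AuthMode | null;
  notice: string | null;
}

function resolveEffectiveAuthMode(
  preferred: "auto" | AuthMode | null,
  forced: AuthMode | null,
): AuthResolution {
  if (forced === null) {
    // Placeholder: "auto" is resolved by account detection elsewhere.
    return {
      effectiveAuthMode: preferred === "auto" ? null : preferred,
      notice: null,
    };
  }
  // A conflict exists only when the user made an explicit, different choice.
  const conflicts =
    preferred !== null && preferred !== "auto" && preferred !== forced;
  let notice: string | null = null;
  if (conflicts) {
    notice =
      forced === "api_key"
        ? "Codex config forces API key mode for this scope"
        : "Codex config forces ChatGPT account mode for this scope";
  }
  return { effectiveAuthMode: forced, notice };
}
```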

23. Model catalog file trust and local file changes

🎯 6 🛡️ 9 🧠 7
Estimated implementation impact: 220-520 lines

Uncertainty:

  • model_catalog_json can point to a local JSON file.
  • app-server resolves effective config, but our app may not know if that JSON file changed unless config fingerprint includes enough origin data.
  • project-scoped .codex/config.toml only applies for trusted projects, so a file can exist but not be active.

Decision:

  • treat model_catalog_json as part of effective config, not as a file we parse directly by default
  • if config/read.origins exposes enough origin/path data, hash only path and mtime for invalidation, not file contents
  • if origin/path data is unavailable, rely on manual refresh and short TTL
  • never read arbitrary model_catalog_json file contents into logs or diagnostics
  • do not apply project-scoped model catalog unless Codex effective config says the project is trusted and the catalog is active

Top 3 invalidation policies:

  1. TTL/manual-refresh only: 🎯 7 🛡️ 6 🧠 2, 40-100 lines. Simple but stale after local file edits.
  2. Hash effective config plus optional mtime for active catalog file: 🎯 8 🛡️ 9 🧠 5, 220-520 lines. Best balance without parsing arbitrary catalog files ourselves.
  3. Parse and watch every possible catalog file: 🎯 5 🛡️ 7 🧠 8, 500-1000 lines. Too much responsibility and security surface for this feature.

Chosen policy: option 2.

Acceptance:

  • active model_catalog_json path change invalidates cache
  • active catalog file mtime change invalidates cache when path is available
  • inactive untrusted project .codex/config.toml does not affect the trusted/global catalog

Top 3 Implementation Options

1. Dedicated Codex model catalog feature - chosen

🎯 9 🛡️ 9 🧠 6
Estimated size: 1200-2400 lines

Core idea:

  • create src/features/codex-model-catalog
  • keep model catalog rules isolated from account UI, provider status plumbing, and Electron transport
  • reuse existing CodexAppServerSessionFactory
  • expose a small feature facade to provider status and renderer model picker
  • update orchestrator only where runtime status and launch effort transport require it

Why it wins:

  • best SOLID alignment
  • clean domain rules for model visibility, effort validation, fallback, and default selection
  • does not make codex-account responsible for model policy
  • least risk to Anthropic
  • easiest to test without full app startup

Main tradeoff:

  • needs small integration glue in existing provider status and team launch flows

2. Fold catalog into codex-account

🎯 7 🛡️ 7 🧠 5
Estimated size: 800-1600 lines

Core idea:

  • extend src/features/codex-account with model/list
  • use account snapshot as the only Codex control-plane snapshot
  • merge account, rate limits, and model catalog in one feature

Why it is tempting:

  • fewer new folders
  • account feature already owns app-server account/rate-limit reads
  • easier to fetch account plus model catalog in one app-server session

Why it is not recommended:

  • model catalog is not account management
  • the feature becomes a broad Codex control-plane catch-all
  • future provider catalog work would have to pull model rules back out
  • more risk of account UI churn when only model picker changes are needed

3. Full provider model catalog for all providers now

🎯 7 🛡️ 8 🧠 9
Estimated size: 2500-4500 lines

Core idea:

  • build one provider-agnostic model catalog for Anthropic, Codex, Gemini, and future providers
  • move static renderer catalog policy into a shared feature
  • expose one rich contract for all provider model pickers

Why it is attractive:

  • cleanest long-term abstraction
  • one UI model for labels, availability, capabilities, and efforts
  • reduces future duplication

Why not now:

  • too much surface area while Codex runtime cutover is still fresh
  • Anthropic model behavior is already stable and should not be reworked for a Codex catalog issue
  • would delay the concrete Codex model release problem

Current Code Reality

claude_team

Existing app-server infrastructure:

  • src/main/services/infrastructure/codexAppServer/JsonRpcStdioClient.ts
  • src/main/services/infrastructure/codexAppServer/CodexAppServerSessionFactory.ts
  • src/main/services/infrastructure/codexAppServer/protocol.ts
  • src/features/codex-account/main/infrastructure/CodexAccountAppServerClient.ts

Current account client behavior:

  • readAccount() opens one app-server session.
  • readRateLimits() opens another app-server session.
  • logout() opens another app-server session.
  • no model/list protocol types exist yet.
  • CodexAppServerSessionFactory starts codex app-server with no explicit cwd or profile option.
  • app-server initialize response includes codexHome, but the current protocol types do not expose effective config or config fingerprint.

Current shared provider status:

  • CliProviderStatus.models is only string[].
  • CliProviderStatus.modelAvailability has per-model verification status but no rich model metadata.
  • renderer model selector can already prefer runtime-provided providerStatus.models.

Current effort type:

export type EffortLevel = 'low' | 'medium' | 'high';

Risk:

  • adding xhigh directly without provider-specific validation would let Anthropic UI accidentally offer unsupported choices.

Current persistence and non-dialog launch paths:

  • team metadata and member metadata normalize launch-derived provider/model/effort in multiple services.
  • backup/restore copies metadata but restore-time launch preview must still tolerate missing catalog metadata.
  • draft retry and launch prefill can reuse old localStorage state.
  • scheduled launch types can reference the shared effort type.

Risk:

  • updating only the visible launch dialogs would leave hidden paths that silently drop Codex-only efforts or relaunch with stale default semantics.

agent_teams_orchestrator

Current Codex model catalog:

  • src/utils/model/codex.ts
  • static CODEX_MODELS
  • static DEFAULT_CODEX_MODEL
  • isCodexModel() checks only static ids

Current runtime status:

  • getUnifiedRuntimeStatusPayload('codex') returns static Codex model ids.

Current CLI effort:

  • top-level --effort <level> currently accepts low | medium | high | max.
  • Codex native execution is ultimately codex exec --json.
  • installed codex exec --help shows no --effort flag.

Risk:

  • if we send --effort xhigh through current orchestrator, it fails before Codex can use it.
  • if we map Anthropic max to Codex xhigh, the semantics are wrong.
  • if we show xhigh in UI before the launch path supports it, the picker becomes misleading.

Target Architecture

Feature folder

src/features/codex-model-catalog/
  contracts/
    codexModelCatalog.dto.ts
    index.ts
  core/
    domain/
      codexModelCatalog.ts
      codexReasoningEffort.ts
      codexModelCatalogFallback.ts
      normalizeCodexAppServerModel.ts
    application/
      GetCodexModelCatalogUseCase.ts
      CodexModelCatalogPorts.ts
  main/
    composition/
      createCodexModelCatalogFeature.ts
    adapters/
      output/
        CodexAppServerModelCatalogSource.ts
        StaticCodexModelCatalogSource.ts
    infrastructure/
      CodexModelCatalogAppServerClient.ts
      InMemoryCodexModelCatalogCache.ts
  preload/
    index.ts
  renderer/
    adapters/
      codexModelCatalogViewModel.ts
    hooks/
      useCodexModelCatalog.ts
    ui/
      CodexModelEffortHint.tsx

Rules:

  • core/domain has all normalization and validation rules.
  • main/infrastructure is the only layer that knows JSON-RPC method names.
  • renderer never receives raw app-server rows.
  • app shell imports only public feature entrypoints.

App-server lifecycle

Use the existing CodexAppServerSessionFactory.

Request sequence:

  1. Spawn codex app-server.
  2. Send initialize with clientInfo and capabilities.
  3. Send initialized.
  4. Request model/list.
  5. Drain or ignore notifications safely.
  6. Close stdin and terminate the process on completion or timeout.

Recommended timeouts:

  • initialize: 6000ms
  • model/list: 4500ms
  • total model catalog read: 9000ms

Recommended pagination:

  • request limit: 100, includeHidden: false for normal UI
  • follow nextCursor until null
  • hard-stop after 5 pages to avoid runaway loops
  • log a degraded catalog warning if the hard-stop is hit
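
The pagination policy above could be sketched as a loop over an injected page fetcher. `listAllModels`, `ModelListPage`, and `ModelListParams` are hypothetical names; the injected `requestPage` stands in for whatever JSON-RPC call the real client makes, which this sketch does not assume anything about.

```typescript
// Hypothetical shapes mirroring the model/list cursor contract in this plan.
interface ModelListPage<T> {
  data: T[];
  nextCursor: string | null;
}

interface ModelListParams {
  cursor: string | null;
  limit: number;
  includeHidden: boolean;
}

const MAX_PAGES = 5;

async function listAllModels<T>(
  requestPage: (params: ModelListParams) => Promise<ModelListPage<T>>,
): Promise<{ models: T[]; truncated: boolean }> {
  const models: T[] = [];
  let cursor: string | null = null;
  for (let page = 0; page < MAX_PAGES; page += 1) {
    const response: ModelListPage<T> = await requestPage({
      cursor,
      limit: 100,
      includeHidden: false,
    });
    models.push(...response.data);
    // A null cursor means the catalog has been fully read.
    if (response.nextCursor === null) return { models, truncated: false };
    cursor = response.nextCursor;
  }
  // Hard stop hit: the caller should log a degraded catalog warning.
  return { models, truncated: true };
}
```

Returning `truncated` instead of throwing keeps the partial catalog usable while still letting the caller mark it degraded.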

Single-session snapshot policy

Provider status currently risks multiple sequential app-server starts:

  • account read
  • rate limits read
  • future model list read

Sequential starts like these caused slow provider loading in earlier UI work, so the plan should not add another app-server spawn to the hot path.

Preferred design:

  • keep codex-model-catalog as a separate feature for ownership
  • add an optional combined Codex control-plane read in composition
  • when provider status refresh needs account plus rate limits plus model catalog, use one app-server session and issue all three requests inside it
  • each sub-result has independent soft-failure state
  • total snapshot can be partially healthy

Snapshot shape:

export interface CodexControlPlaneSnapshot {
  binary: {
    path: string;
    version: string | null;
  };
  account: CodexAccountSnapshotResult;
  rateLimits: CodexRateLimitsSnapshotResult;
  modelCatalog: CodexModelCatalogSnapshotResult;
  configScope: {
    cwd: string | null;
    profileName: string | null;
    projectTrust: 'trusted' | 'untrusted' | 'unknown';
    configReadSupport: CodexConfigReadSupport;
    effectiveConfigFingerprint: string | null;
    launchOverridesFingerprint: string | null;
    activeModelCatalogFileFingerprint: string | null;
  };
  initialize: {
    codexHome: string;
    platformFamily: string;
    platformOs: string;
  };
  fetchedAt: string;
}

Soft-failure rules:

  • account failure must not erase a fresh cached model catalog
  • model catalog failure must not mark ChatGPT account disconnected
  • rate-limit failure must not hide model picker options
  • if app-server initialize fails, all three sub-results are degraded from the same root cause
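
One way to get independent soft-failure state is to wrap each read so a rejection becomes data instead of aborting the whole snapshot. This is a sketch under assumptions: `SubResult`, `settle`, `readControlPlane`, and the session method names are hypothetical, and opening the session (where an initialize failure would degrade all three results at once) is left to the caller.

```typescript
// Hypothetical per-read result: either a value or an error message.
type SubResult<T> = { ok: true; value: T } | { ok: false; error: string };

async function settle<T>(read: () => Promise<T>): Promise<SubResult<T>> {
  try {
    return { ok: true, value: await read() };
  } catch (error) {
    const message = error instanceof Error ? error.message : String(error);
    return { ok: false, error: message };
  }
}

// Hypothetical view of an already-initialized app-server session.
interface ControlPlaneSession<A, R, M> {
  readAccount(): Promise<A>;
  readRateLimits(): Promise<R>;
  readModelCatalog(): Promise<M>;
}

async function readControlPlane<A, R, M>(
  session: ControlPlaneSession<A, R, M>,
): Promise<{
  account: SubResult<A>;
  rateLimits: SubResult<R>;
  modelCatalog: SubResult<M>;
}> {
  // Requests run one after another inside the same session; a failure in one
  // does not prevent the next from running.
  const account = await settle(() => session.readAccount());
  const rateLimits = await settle(() => session.readRateLimits());
  const modelCatalog = await settle(() => session.readModelCatalog());
  return { account, rateLimits, modelCatalog };
}
```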

Required correction to the existing account flow:

  • current CodexAccountAppServerClient.readAccount() and readRateLimits() each open their own app-server process
  • adding a third standalone readModelCatalog() would be a Provider Settings latency regression
  • implement a combined app-server read path before wiring catalog into provider refresh
  • keep separate methods for mutations and focused tests, but use the combined path for normal status refresh
  • enrich JsonRpcStdioClient errors before catalog integration so the combined reader can classify model/list method failures without losing account truth

Recommended application service shape:

export interface CodexControlPlaneReader {
  readSnapshot(options: CodexControlPlaneReadOptions): Promise<CodexControlPlaneSnapshot>;
}

This can live in codex-model-catalog composition or in a small shared Codex control-plane composition module. Do not put model normalization inside codex-account.

Read scope:

  • Provider Settings global refresh can pass cwd=null.
  • Create/Launch dialogs should pass the selected absolute cwd.
  • Relaunch/restore should pass the team's persisted project path.
  • Scheduled launch validation should pass schedule.launchConfig.cwd.
  • If a future UI supports Codex profile selection, the same profile must be passed to preview and launch.

Contracts

App-server protocol types

Add protocol DTOs to src/main/services/infrastructure/codexAppServer/protocol.ts:

export type CodexAppServerReasoningEffort =
  | 'none'
  | 'minimal'
  | 'low'
  | 'medium'
  | 'high'
  | 'xhigh';

export interface CodexAppServerReasoningEffortOption {
  reasoningEffort: CodexAppServerReasoningEffort;
  description?: string | null;
}

export type CodexAppServerInputModality = 'text' | 'image' | string;

export interface CodexAppServerModel {
  id: string;
  model: string;
  displayName: string;
  description?: string | null;
  hidden: boolean;
  supportedReasoningEfforts: CodexAppServerReasoningEffortOption[];
  defaultReasoningEffort: CodexAppServerReasoningEffort;
  inputModalities?: CodexAppServerInputModality[] | null;
  supportsPersonality?: boolean | null;
  isDefault: boolean;
  upgrade?: string | null;
  upgradeInfo?: unknown;
  availabilityNux?: unknown;
}

export interface CodexAppServerModelListParams {
  cursor?: string | null;
  limit?: number | null;
  includeHidden?: boolean | null;
}

export interface CodexAppServerModelListResponse {
  data: CodexAppServerModel[];
  nextCursor: string | null;
}

export interface CodexAppServerConfigReadParams {
  cwd?: string | null;
  profile?: string | null;
}

export interface CodexAppServerConfigReadResponse {
  config: Record<string, unknown>;
  origins: Record<string, unknown>;
}

config/read caller rule:

  • always pass a params object, even when empty
  • call global config as config/read with {}
  • call project scope as config/read with { cwd }
  • call profile scope as config/read with { profile }
  • if both cwd and profile are needed, test { cwd, profile } in Phase 1 and record the behavior before enabling profile-aware UI

Domain model

Use separate ids:

  • catalogId: app-server id, stable identity for React keys, telemetry, and dedupe
  • launchModel: app-server model when non-empty, otherwise id

Reason:

  • the local probe returned equal values for id and model, but the official schema exposes both fields, so they can diverge later.
  • using id for launch would be a latent bug if Codex introduces a display/catalog alias.

export interface CodexCatalogModel {
  catalogId: string;
  launchModel: string;
  displayName: string;
  description: string | null;
  hidden: boolean;
  isDefault: boolean;
  supportedReasoningEfforts: CodexReasoningEffort[];
  defaultReasoningEffort: CodexReasoningEffort | null;
  inputModalities: CodexInputModality[];
  supportsPersonality: boolean;
  upgrade: string | null;
  source: 'app-server' | 'static-fallback';
}

Normalization rules:

  • reject rows without a usable id
  • derive launchModel from model || id
  • default missing inputModalities to ['text', 'image'] for older catalogs
  • default missing supportsPersonality to false
  • accept documented supportedReasoningEfforts objects with reasoningEffort
  • defensively accept string effort entries in tests, because older generated local types and live clients can drift
  • drop duplicate catalogId rows after the first visible row
  • drop duplicate launchModel rows after the first visible row unless a hidden row is the only available row
  • keep unknown effort strings out of the selectable UI, but preserve them in diagnostics
  • if no model is marked isDefault, choose static fallback default only as degraded fallback and label it as such
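
A sketch of single-row normalization covering several of the rules above (usable id, launchModel derivation, modality and personality defaults, object or string effort entries, unknown efforts kept for diagnostics). `RawModelRow`, `NormalizedModel`, and `normalizeRow` are hypothetical names. Falling back to the id when displayName is missing is an assumption, and duplicate handling and default selection are intentionally left out.

```typescript
type CodexReasoningEffort = "none" | "minimal" | "low" | "medium" | "high" | "xhigh";
const KNOWN_EFFORTS: readonly string[] = ["none", "minimal", "low", "medium", "high", "xhigh"];

// Hypothetical untrusted row as received from app-server.
interface RawModelRow {
  id?: unknown;
  model?: unknown;
  displayName?: unknown;
  supportedReasoningEfforts?: unknown;
  inputModalities?: unknown;
  supportsPersonality?: unknown;
}

interface NormalizedModel {
  catalogId: string;
  launchModel: string;
  displayName: string;
  supportedReasoningEfforts: CodexReasoningEffort[];
  unknownEfforts: string[]; // diagnostics only, never selectable
  inputModalities: string[];
  supportsPersonality: boolean;
}

function isKnownEffort(value: string): value is CodexReasoningEffort {
  return KNOWN_EFFORTS.includes(value);
}

function normalizeRow(row: RawModelRow): NormalizedModel | null {
  const id = typeof row.id === "string" ? row.id.trim() : "";
  if (id === "") return null; // reject rows without a usable id

  const launchModel =
    typeof row.model === "string" && row.model.trim() !== "" ? row.model : id;

  const supported: CodexReasoningEffort[] = [];
  const unknownEfforts: string[] = [];
  const entries: unknown[] = Array.isArray(row.supportedReasoningEfforts)
    ? row.supportedReasoningEfforts
    : [];
  for (const entry of entries) {
    // Accept documented { reasoningEffort } objects and, defensively, bare strings.
    const value =
      typeof entry === "string"
        ? entry
        : typeof entry === "object" && entry !== null
          ? (entry as { reasoningEffort?: unknown }).reasoningEffort
          : undefined;
    if (typeof value !== "string") continue;
    if (isKnownEffort(value)) supported.push(value);
    else unknownEfforts.push(value);
  }

  return {
    catalogId: id,
    launchModel,
    // Assumption: fall back to the id when no display name is provided.
    displayName: typeof row.displayName === "string" ? row.displayName : id,
    supportedReasoningEfforts: supported,
    unknownEfforts,
    inputModalities: Array.isArray(row.inputModalities)
      ? row.inputModalities.filter((m: unknown): m is string => typeof m === "string")
      : ["text", "image"],
    supportsPersonality: row.supportsPersonality === true,
  };
}
```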

Provider status contract

Add an optional rich catalog to CliProviderStatus:

export interface CliProviderModelCatalog {
  schemaVersion: 1;
  source: 'app-server' | 'static-fallback' | 'unavailable';
  status: 'ready' | 'stale' | 'degraded' | 'unavailable';
  fetchedAt: string | null;
  staleAt: string | null;
  binary?: {
    path: string | null;
    version: string | null;
  };
  authScope?: {
    preferredAuthMode: 'auto' | 'chatgpt' | 'api_key' | null;
    effectiveAuthMode: 'chatgpt' | 'api_key' | null;
    forcedLoginMethod?: 'chatgpt' | 'api_key' | null;
    managedAccountHash?: string | null;
    forcedWorkspaceHash?: string | null;
    apiKeySource?: string | null;
  };
  launchScope?: {
    cwd: string | null;
    profileName: string | null;
    projectTrust: 'trusted' | 'untrusted' | 'unknown';
    configFingerprint: string | null;
    launchOverridesFingerprint: string | null;
  };
  errorMessage?: string | null;
  defaultModelId?: string | null;
  defaultLaunchModel?: string | null;
  models: CliProviderModelInfo[];
}

export interface CliProviderModelInfo {
  catalogId: string;
  launchModel: string;
  displayName: string;
  description?: string | null;
  hidden?: boolean;
  isDefault?: boolean;
  supportedReasoningEfforts?: CliProviderReasoningEffort[];
  defaultReasoningEffort?: CliProviderReasoningEffort | null;
  inputModalities?: string[];
  supportsPersonality?: boolean;
  upgrade?: string | null;
}

export type CliProviderReasoningEffort =
  | 'none'
  | 'minimal'
  | 'low'
  | 'medium'
  | 'high'
  | 'xhigh'
  | 'max';

export interface CliProviderRuntimeCapabilities {
  schemaVersion: 1;
  codex?: {
    supportsDynamicAppServerModels: boolean;
    supportsCodexReasoningEffortConfig: boolean;
    supportedCodexReasoningEfforts: Array<'minimal' | 'low' | 'medium' | 'high' | 'xhigh'>;
    acceptsProviderExplicitFutureModels: boolean;
  };
}

Backwards compatibility:

  • keep CliProviderStatus.models: string[]
  • add CliProviderStatus.runtimeCapabilities?: CliProviderRuntimeCapabilities
  • for Codex, derive models from modelCatalog.models.map(model => model.launchModel)
  • for Anthropic and Gemini, do not require modelCatalog
  • old renderers continue to work from models
  • new renderers prefer modelCatalog when present
  • never put team-agent disabled policy directly into CliProviderModelCatalog; catalog describes Codex availability, while Agent Teams policy is applied by renderer and launch validators
  • never infer launch capability only from catalog presence
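
Deriving the legacy `models: string[]` field from the rich catalog could look like the sketch below. `LegacyCompatibleStatus` and `deriveLegacyModels` are hypothetical names, and de-duplicating launch models while preserving order is an assumption, not a stated requirement.

```typescript
// Hypothetical minimal view of the provider status fields involved.
interface LegacyCompatibleStatus {
  models: string[];
  modelCatalog?: { models: Array<{ launchModel: string }> };
}

function deriveLegacyModels(
  provider: string,
  status: LegacyCompatibleStatus,
): string[] {
  // Anthropic and Gemini keep their existing list; so does Codex without a catalog.
  if (provider !== "codex" || status.modelCatalog === undefined) {
    return status.models;
  }
  const seen = new Set<string>();
  const result: string[] = [];
  for (const model of status.modelCatalog.models) {
    if (!seen.has(model.launchModel)) {
      seen.add(model.launchModel);
      result.push(model.launchModel);
    }
  }
  return result;
}
```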

Renderer integration hotspot:

  • update TeamModelRuntimeProviderStatus in src/renderer/utils/teamModelAvailability.ts to include modelCatalog
  • update getRuntimeSelectorModels() to use modelCatalog.models[*].launchModel for Codex
  • update getAvailableTeamProviderModelOptions() to map rich Codex options with display labels, default badge, and catalog diagnostics
  • keep Anthropic path on getFallbackTeamProviderModelOptions()
  • keep Gemini path on existing models: string[] until Gemini has a richer catalog

Team launch effort contract

Do not add a separate per-provider lane for this feature.

Use existing team-level model/provider selection, but make effort provider-aware.

Recommended implementation:

  • keep persisted field name effort
  • widen internal effort type to ProviderReasoningEffort
  • add provider/model validators at every launch boundary
  • Anthropic UI only shows low | medium | high
  • Codex UI shows only the selected model's supportedReasoningEfforts
  • orchestrator accepts minimal | low | medium | high | xhigh for Codex and low | medium | high | max for Anthropic paths

Existing validator hotspots:

  • src/shared/types/team.ts currently defines EffortLevel = 'low' | 'medium' | 'high'
  • src/main/ipc/teams.ts currently validates only low | medium | high
  • src/main/http/teams.ts currently validates only low | medium | high
  • src/renderer/components/team/dialogs/EffortLevelSelector.tsx currently hardcodes only Default | Low | Medium | High
  • LaunchTeamDialog, CreateTeamDialog, member draft rows, and member editor utilities currently cast strings with as EffortLevel

Required migration:

  • replace unsafe as EffortLevel casts with a provider-aware normalization function
  • parse provider before parsing effort in IPC and HTTP paths
  • validate lead effort against lead provider/model
  • validate member effort against each member's resolved provider/model
  • keep old persisted low | medium | high values readable without migration

Validation rule:

provider=codex:
  effort must be in selectedModel.supportedReasoningEfforts

provider=anthropic:
  effort must be low | medium | high

provider=gemini:
  keep current behavior unless Gemini gets a richer effort contract

Important:

  • do not map Anthropic max to Codex xhigh
  • do not map Codex xhigh to Anthropic max
  • if selected Codex model changes and old effort is unsupported, reset to the new model's defaultReasoningEffort
  • if catalog is unavailable, only allow static fallback efforts that are proven launchable
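
The validation rule and the reset-on-model-change rule could be sketched as two small functions. `isEffortAllowed`, `effortAfterModelChange`, and `SelectedCodexModel` are hypothetical names. The Gemini branch returns `true` only as a placeholder, since this plan does not specify the current Gemini behavior.

```typescript
type Provider = "anthropic" | "codex" | "gemini";

// Hypothetical view of the selected Codex catalog model.
interface SelectedCodexModel {
  supportedReasoningEfforts: string[];
  defaultReasoningEffort: string | null;
}

const ANTHROPIC_EFFORTS = ["low", "medium", "high"];

function isEffortAllowed(
  provider: Provider,
  effort: string,
  codexModel: SelectedCodexModel | null,
): boolean {
  if (provider === "codex") {
    // Without a resolved model there is nothing to validate against.
    return codexModel !== null && codexModel.supportedReasoningEfforts.includes(effort);
  }
  if (provider === "anthropic") {
    // No cross-provider mapping: xhigh is never accepted here.
    return ANTHROPIC_EFFORTS.includes(effort);
  }
  // Placeholder assumption: Gemini keeps whatever it does today.
  return true;
}

function effortAfterModelChange(
  currentEffort: string | null,
  nextModel: SelectedCodexModel,
): string | null {
  if (currentEffort !== null && nextModel.supportedReasoningEfforts.includes(currentEffort)) {
    return currentEffort;
  }
  // Unsupported or empty selection resets to the new model's default.
  return nextModel.defaultReasoningEffort;
}
```

Calling the same validator from IPC, HTTP, and renderer code would replace the scattered `as EffortLevel` casts with one provider-aware check.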

Launch identity rule:

  • effort is user selection
  • resolvedEffort is what launch sends to runtime
  • if user selection is empty/default, resolvedEffort comes from app-server defaultReasoningEffort
  • if resolved effort equals app-server default, runtime transport may omit model_reasoning_effort, but exact logs still record the resolved value

Runtime Launch Transport

This was the highest-risk area in the earlier plan. The corrected plan is explicit.

Facts:

  • codex exec has --model.
  • codex exec has -c, --config key=value.
  • codex exec has no documented --effort.
  • Codex config has model_reasoning_effort.
  • model_reasoning_effort supports minimal | low | medium | high | xhigh.

Therefore:

  • Codex native launch must not pass --effort xhigh to Codex CLI.
  • Orchestrator may keep accepting --effort as its public Agent Teams flag.
  • When provider is Codex native, orchestrator must translate accepted effort into codex exec -c model_reasoning_effort="value".
  • When no effort is selected, omit model_reasoning_effort and let Codex use its model default.
  • When effort equals the selected model's app-server default, either omit it or pass it consistently, but pick one policy and test it.

Recommended policy:

  • omit effort when it equals app-server defaultReasoningEffort
  • pass effort only when user explicitly selected a non-default value

Reason:

  • this tracks Codex defaults as Codex evolves
  • exact logs remain cleaner
  • future app-server default changes are not blocked by stale persisted values

Live signoff command shape:

codex exec --json --model gpt-5.4 -c model_reasoning_effort='"xhigh"' --skip-git-repo-check --ephemeral "Return only: ok"

Quoting requirement:

  • command builder must pass -c and model_reasoning_effort="xhigh" as separate argv entries
  • shell-rendered exact logs can show -c model_reasoning_effort='"xhigh"'
  • tests should assert argv arrays, not only shell strings
  • never concatenate user-controlled effort into a shell string without argv escaping
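
A sketch of an argv builder that follows the recommended policy (omit effort when it equals the app-server default) and the quoting requirement (separate argv entries, no shell string). `buildCodexExecArgv` and `CodexExecLaunch` are hypothetical names; the sketch uses only `exec`, `--json`, `--model`, and `-c` from the facts listed above and leaves out other flags such as those in the signoff command.

```typescript
// Hypothetical launch description for illustration only.
interface CodexExecLaunch {
  model: string;
  resolvedEffort: string | null;
  defaultEffort: string | null; // app-server defaultReasoningEffort for the model
  prompt: string;
}

// Values model_reasoning_effort supports according to this plan.
const TRANSPORT_EFFORTS = ["minimal", "low", "medium", "high", "xhigh"];

function buildCodexExecArgv(launch: CodexExecLaunch): string[] {
  const argv = ["exec", "--json", "--model", launch.model];
  const effort = launch.resolvedEffort;
  if (effort !== null && effort !== launch.defaultEffort) {
    // The allowlist check keeps arbitrary strings out of the config override.
    if (!TRANSPORT_EFFORTS.includes(effort)) {
      throw new Error(`Unsupported Codex reasoning effort: ${effort}`);
    }
    // Two separate argv entries: "-c" and key="value".
    argv.push("-c", `model_reasoning_effort="${effort}"`);
  }
  argv.push(launch.prompt);
  return argv;
}
```

Tests can then compare the returned array directly, which is what the quoting requirement asks for.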

Prelaunch validation must block:

  • gpt-5.1-codex-mini with low
  • gpt-5.1-codex-mini with xhigh
  • unknown effort strings from app-server until explicitly supported by our UI and orchestrator type

Static Fallback

Fallback stays necessary because:

  • user may have an older Codex binary
  • app-server may fail to initialize
  • app-server may start but not support model/list
  • offline usage should not make the entire model picker empty
  • tests should not depend on live Codex availability

Fallback rules:

  • fallback source is explicitly marked static-fallback
  • fallback never claims to be current
  • fallback has a short visible warning in Provider Settings only when user is choosing Codex models
  • fallback model list should be minimal and conservative
  • fallback must not include newly guessed future models
  • fallback caused by missing model/list should include an upgrade hint tied to the detected Codex binary version when available

Recommended fallback models:

  • gpt-5.4
  • gpt-5.4-mini
  • gpt-5.3-codex
  • gpt-5.2
  • gpt-5.1-codex-mini

Fallback effort rules:

  • use medium | high for gpt-5.1-codex-mini
  • use low | medium | high | xhigh for the other fallback models only if live signoff confirms model_reasoning_effort pass-through
  • otherwise fallback UI can show richer metadata but disable non-launchable options

API-key mode note:

  • do not use OpenAI /v1/models as the primary Codex picker for subscription-backed Codex
  • optional API /v1/models fallback is allowed only for explicit API-key mode diagnostics
  • if API /v1/models disagrees with Codex app-server model/list, Codex app-server wins for native Codex execution
  • reason: the actual runtime surface is codex exec, and app-server describes what Codex clients should show

Cache And Refresh

Goal:

  • make model updates feel fresh without making Provider Settings slow or flaky.

Main-process cache:

  • key: Codex binary path, Codex binary version, Codex home, launch cwd/profile/config fingerprint, preferred auth mode, effective auth mode, managed account hash, and API-key source
  • success TTL: 10 minutes
  • stale TTL: 24 hours
  • in-flight dedupe: one live model/list request per key
  • manual refresh bypasses success TTL but still dedupes in-flight work
  • auth mode change invalidates the ready cache for UI selection purposes
  • forced_login_method and forced workspace changes invalidate the affected auth/catalog scope
  • logout clears ChatGPT-scoped catalog cache
  • API key source change clears API-key-scoped catalog cache
  • project .codex/config.toml, global config.toml, or model_catalog_json changes clear the affected scope when detected by fingerprint change
  • binary path or version change clears all Codex model catalog cache entries
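
The cache key and TTL handling could be sketched as below. `CatalogCacheScope`, `buildCatalogCacheKey`, and `classifyCacheEntry` are hypothetical names. Reading "stale TTL: 24 hours" as "an entry older than the success TTL stays usable as stale until it is 24 hours old" is an assumption about intent.

```typescript
// Hypothetical scope fields, following the key components listed above.
interface CatalogCacheScope {
  binaryPath: string;
  binaryVersion: string | null;
  codexHome: string;
  cwd: string | null;
  profileName: string | null;
  configFingerprint: string | null;
  preferredAuthMode: string | null;
  effectiveAuthMode: string | null;
  managedAccountHash: string | null;
  apiKeySource: string | null;
}

const SUCCESS_TTL_MS = 10 * 60 * 1000;
const STALE_TTL_MS = 24 * 60 * 60 * 1000;

function buildCatalogCacheKey(scope: CatalogCacheScope): string {
  // A JSON array keeps field order fixed and avoids delimiter collisions.
  return JSON.stringify([
    scope.binaryPath,
    scope.binaryVersion,
    scope.codexHome,
    scope.cwd,
    scope.profileName,
    scope.configFingerprint,
    scope.preferredAuthMode,
    scope.effectiveAuthMode,
    scope.managedAccountHash,
    scope.apiKeySource,
  ]);
}

type CacheFreshness = "fresh" | "stale" | "expired";

function classifyCacheEntry(fetchedAtMs: number, nowMs: number): CacheFreshness {
  const ageMs = nowMs - fetchedAtMs;
  if (ageMs < SUCCESS_TTL_MS) return "fresh";
  if (ageMs < STALE_TTL_MS) return "stale"; // usable, but refresh in background
  return "expired";
}
```

Because every scope component is part of the key, a change in any of them naturally misses the cache instead of requiring a targeted invalidation path for each case.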

Renderer cache:

  • consume CliProviderStatus.modelCatalog
  • no independent polling loop in the model picker
  • refresh through existing provider status refresh action

Dashboard policy:

  • do not run model/list on every dashboard render
  • use existing provider status refresh cadence
  • show model catalog stale state only inside settings/model picker, never as a dashboard-level error
  • dashboard catalog is a global/default-scope summary, not a promise that every project cwd has the same catalog

Provider Settings policy:

  • open dialog with cached provider status immediately
  • refresh in background
  • show Checking... only for the area still being refreshed
  • never replace a ready catalog with empty state during a refresh

Avoid this bug:

  • do not set global provider status to unavailable while only the model catalog refresh is pending
  • do not replace a ChatGPT-ready account state with a catalog timeout
  • do not show generic Unknown error; preserve app-server method, timeout, and fallback source in diagnostics
  • if auto resolves to ChatGPT, API-key detection copy stays secondary
  • if auto resolves to API key because ChatGPT is unavailable, show why ChatGPT was skipped before showing API-key catalog

UI Behavior

Model picker

When provider=codex:

  • prefer providerStatus.modelCatalog.models
  • option value is launchModel
  • React key can use catalogId
  • label uses displayName
  • default badge uses isDefault
  • hidden app-server models are excluded from normal selector unless already persisted in a team
  • disabled state uses existing Agent Teams policy plus app-server upgrade hints
  • runtime-capability state controls whether a visible model is launchable
  • fallback badge says Using fallback catalog only when source is fallback
  • if app-server says a model is available but Agent Teams disables it, show Available in Codex, disabled for Agent Teams
  • if app-server says a future model exists but runtime capability is missing, show Available in Codex, waiting for Agent Teams runtime support
  • if a persisted model is missing from current catalog, show it as Unavailable in current Codex catalog and require user confirmation before relaunch
  • if the dialog has a selected cwd and only a global catalog is available, show global options as provisional until project-scoped catalog finishes
  • if project-scoped catalog differs from global catalog, keep the user's explicit selection only if it exists in the project-scoped catalog or is a preserved persisted value
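The mapping rules above (value from launchModel, key from catalogId, label from displayName, hidden rows excluded unless persisted, missing persisted model kept as a warning row) can be sketched as one pure function. The `hidden` field and the `PickerOption` shape are assumptions made for illustration; the plan only names catalogId, launchModel, displayName, and isDefault.

```typescript
// Sketch only: `hidden` and PickerOption are hypothetical shapes.
interface CatalogModel {
  catalogId: string;
  launchModel: string;
  displayName: string;
  isDefault: boolean;
  hidden: boolean;
}

interface PickerOption {
  key: string; // React key
  value: string; // value passed to launch
  label: string;
  isDefault: boolean;
  warning?: string;
}

function buildPickerOptions(
  models: CatalogModel[],
  persistedModel?: string,
): PickerOption[] {
  const options: PickerOption[] = models
    // Hidden rows stay out of the selector unless a team already persisted them.
    .filter((m) => !m.hidden || m.launchModel === persistedModel)
    .map((m) => ({
      key: m.catalogId,
      value: m.launchModel,
      label: m.displayName,
      isDefault: m.isDefault,
    }));

  // A persisted model missing from the catalog stays visible as a warning row.
  if (persistedModel && !options.some((o) => o.value === persistedModel)) {
    options.push({
      key: `persisted:${persistedModel}`,
      value: persistedModel,
      label: persistedModel,
      isDefault: false,
      warning: "Unavailable in current Codex catalog",
    });
  }
  return options;
}
```

Keeping this as a pure function makes the hidden-model and missing-model cases easy to cover with fixtures, independent of the React component.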

When catalog is loading:

  • keep previous options visible
  • show a subtle "Refreshing models" state
  • do not show an empty Codex picker unless no cached or fallback models exist
  • label provisional global catalog rows as Checking this project... when launch cwd is known

When catalog fails:

  • use stale cache if present
  • otherwise use static fallback
  • show the app-server error in diagnostics, not as a generic unknown error

Effort selector

When provider=codex and selected model has catalog metadata:

  • show efforts from supportedReasoningEfforts
  • mark defaultReasoningEffort as default
  • include xhigh if returned by app-server and runtime capability says Codex effort config pass-through is supported
  • if runtime capability is missing, show Codex-only efforts as metadata or disabled rows, not selectable launch values
  • if selected effort is no longer valid, reset to default with a small explanation
  • if model is Agent Teams-disabled, keep effort selector read-only or disabled to avoid suggesting launchability

When selected model has no catalog metadata:

  • show only safe fallback efforts
  • do not show xhigh unless launch pass-through is implemented and tested

When provider=anthropic:

  • keep current selector behavior
  • do not show Codex-only minimal, none, or xhigh
  • do not change Anthropic copy
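The Codex effort rules in this section can be sketched as two small helpers: one builds the rows, one resets an invalid selection. This is a sketch under assumptions: the helper names are hypothetical, and treating low | medium | high as the "safe fallback efforts" is an assumption based on the legacy effort set mentioned elsewhere in this plan.

```typescript
// Assumption: low | medium | high are launchable without Codex effort pass-through.
const LEGACY_EFFORTS = new Set(["low", "medium", "high"]);

interface ModelEffortMetadata {
  supportedReasoningEfforts: string[];
  defaultReasoningEffort: string;
}

interface EffortRow {
  effort: string;
  isDefault: boolean;
  selectable: boolean;
}

function buildEffortRows(
  model: ModelEffortMetadata | undefined,
  codexEffortPassThrough: boolean,
): EffortRow[] {
  if (!model) {
    // No catalog metadata: show only the safe fallback efforts.
    return Array.from(LEGACY_EFFORTS).map((effort) => ({
      effort,
      isDefault: false,
      selectable: true,
    }));
  }
  return model.supportedReasoningEfforts.map((effort) => ({
    effort,
    isDefault: effort === model.defaultReasoningEffort,
    // Codex-only efforts such as xhigh stay visible but disabled without capability.
    selectable: LEGACY_EFFORTS.has(effort) || codexEffortPassThrough,
  }));
}

// Resets an invalid selection to the model default and reports that it did so.
function reconcileSelectedEffort(
  rows: EffortRow[],
  selected: string | undefined,
): { effort: string | undefined; wasReset: boolean } {
  if (selected === undefined) return { effort: undefined, wasReset: false };
  if (rows.some((r) => r.effort === selected && r.selectable)) {
    return { effort: selected, wasReset: false };
  }
  const fallback = rows.find((r) => r.isDefault && r.selectable);
  return { effort: fallback?.effort, wasReset: true };
}
```

The `wasReset` flag is what the UI would use to show the small explanation when a previously selected effort is no longer valid.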

Default model

Recommended behavior:

  • app-server isDefault defines the Codex default in UI
  • "Default" label can render as Default (gpt-5.4) or Default (GPT-5.4) when catalog is ready
  • new Codex teams can display Default, but launch must resolve it to a concrete resolvedLaunchModel
  • existing teams keep their persisted model unless user changes it
  • do not rewrite old team metadata just because app-server default changed
  • exact logs and team metadata should record both the selected Default and the concrete resolved model

Reason:

  • new teams benefit from current Codex defaults
  • existing teams remain explainable even if Codex default changes later
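Resolving Default to a concrete model before launch can be sketched as follows. The types and the function name are hypothetical; `resolvedLaunchModel` and the idea of recording the selected model kind come from this plan.

```typescript
// Sketch only: type and function names are illustrative.
interface CatalogModelRef {
  launchModel: string;
  isDefault: boolean;
}

type ModelSelection =
  | { kind: "default" }
  | { kind: "explicit"; model: string };

interface LaunchIdentity {
  selectedModelKind: "default" | "explicit";
  resolvedLaunchModel: string;
}

function resolveLaunchIdentity(
  selection: ModelSelection,
  catalog: CatalogModelRef[],
): LaunchIdentity {
  if (selection.kind === "explicit") {
    // Explicit selections are never rewritten when the catalog default changes.
    return { selectedModelKind: "explicit", resolvedLaunchModel: selection.model };
  }
  const catalogDefault = catalog.find((m) => m.isDefault);
  if (!catalogDefault) {
    throw new Error("Codex catalog has no default model; cannot resolve Default");
  }
  return {
    selectedModelKind: "default",
    resolvedLaunchModel: catalogDefault.launchModel,
  };
}
```

Persisting both fields keeps a relaunch explainable: the log shows that the user chose Default and which concrete model it resolved to at that time.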

Orchestrator Changes

Model status

Short term:

  • keep static CODEX_MODELS for standalone fallback and non-app UI compatibility
  • add richer status only if orchestrator can read app-server directly without slowing CLI startup

Recommended first cut:

  • claude_team owns app-server model catalog for UI
  • orchestrator keeps static runtime status until a dedicated orchestrator catalog source is added
  • launch validation accepts provider-explicit Codex model strings even if not in static CODEX_MODELS
  • orchestrator exposes runtime capabilities for dynamic Codex model ids and Codex reasoning effort config pass-through

Reason:

  • UI is where the dynamic picker is needed immediately
  • orchestrator should not reject a future model that Codex app-server already exposed and claude_team selected
  • UI should not guess whether the current runtime can launch that future model

Validation

Update validation so:

  • provider-explicit codex launches can use model strings from app-server catalog
  • unknown model strings are not guessed as Codex without provider context
  • static isCodexModel() remains valid for generic detection, not authoritative for provider-explicit launches
  • if provider context is missing, keep existing conservative static validation
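A minimal sketch of these validation rules, assuming a hypothetical `validateCodexModel` helper and a hypothetical `dynamicCodexModels` capability flag. The static list mirrors the recommended fallback models in this plan; it is not the real CODEX_MODELS constant.

```typescript
// Assumption: mirrors the fallback list from this plan, not the real constant.
const STATIC_CODEX_MODELS = new Set([
  "gpt-5.4",
  "gpt-5.4-mini",
  "gpt-5.3-codex",
  "gpt-5.2",
  "gpt-5.1-codex-mini",
]);

interface CodexModelRequest {
  provider?: "codex"; // undefined means the caller gave no provider context
  model: string;
}

type ValidationResult = { ok: true } | { ok: false; reason: string };

function validateCodexModel(
  request: CodexModelRequest,
  catalogLaunchModels: Set<string>,
  dynamicCodexModels: boolean,
): ValidationResult {
  if (STATIC_CODEX_MODELS.has(request.model)) return { ok: true };
  if (request.provider !== "codex") {
    // Without provider context, never guess that an unknown string is Codex.
    return { ok: false, reason: "unknown model without provider context" };
  }
  if (!catalogLaunchModels.has(request.model)) {
    return { ok: false, reason: "model is not in the Codex app-server catalog" };
  }
  if (!dynamicCodexModels) {
    // Fail closed when the runtime has not declared dynamic model support.
    return { ok: false, reason: "runtime does not support dynamic Codex models" };
  }
  return { ok: true };
}
```

The static set stays a generic detector here; only the provider-explicit branch consults the catalog and the runtime capability.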

Effort transport

Update orchestrator:

  • accept Codex efforts minimal | low | medium | high | xhigh
  • preserve Anthropic max
  • in Codex native executor, convert Codex effort to -c model_reasoning_effort='"value"'
  • do not pass unsupported effort values to codex exec
  • exact logs should show the selected effort as normalized Agent Teams metadata and the actual Codex config override

Required tests:

  • Codex native xhigh becomes -c model_reasoning_effort='"xhigh"'
  • no effort omits model_reasoning_effort
  • Anthropic max remains Anthropic-only
  • Codex max is rejected
  • Anthropic xhigh is rejected
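The effort transport can be sketched as argv construction with no shell quoting involved. The function names are hypothetical; the accepted effort set and the `-c model_reasoning_effort` override come from this plan. When the argv array is passed directly to a process spawn, the shell form `'"xhigh"'` becomes the single argument `model_reasoning_effort="xhigh"`.

```typescript
// Efforts the plan accepts for Codex; Anthropic `max` is intentionally absent.
const CODEX_EFFORTS = ["minimal", "low", "medium", "high", "xhigh"] as const;
type CodexEffort = (typeof CODEX_EFFORTS)[number];

function parseCodexEffort(value: string): CodexEffort {
  const match = CODEX_EFFORTS.find((effort) => effort === value);
  if (!match) {
    throw new Error(`Unsupported Codex reasoning effort: ${value}`);
  }
  return match;
}

// Returns argv entries for `codex exec`, or nothing when no effort was selected.
function buildCodexEffortArgs(effort?: string): string[] {
  if (effort === undefined) return [];
  const parsed = parseCodexEffort(effort);
  // The value is a TOML string, so the double quotes are part of the argument.
  return ["-c", `model_reasoning_effort="${parsed}"`];
}
```

Because the value is validated against a closed list before interpolation, no user-controlled text reaches the config override.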

Concrete Implementation Touchpoints

claude_team:

  • src/main/services/infrastructure/codexAppServer/protocol.ts - add app-server model DTOs
  • src/main/services/infrastructure/codexAppServer/JsonRpcStdioClient.ts - preserve JSON-RPC error code, method, and details
  • src/main/services/infrastructure/codexAppServer/CodexBinaryResolver.ts or a nearby service - expose binary version for cache invalidation
  • src/features/codex-model-catalog - new feature for catalog domain, use case, app-server source, fallback source, and cache
  • src/features/codex-account/main/composition/createCodexAccountFeature.ts - coordinate combined control-plane snapshot or delegate to shared reader
  • src/features/codex-account/renderer/mergeCodexProviderStatusWithSnapshot.ts - preserve account truth while merging model catalog truth
  • src/shared/types/cliInstaller.ts - add optional provider model catalog
  • src/shared/types/team.ts - widen provider-aware effort types without breaking old persisted values
  • src/shared/types/schedule.ts - prevent scheduled launches from dropping Codex-specific efforts
  • src/main/services/team/TeamDataService.ts - preserve provider-aware effort and launch identity when reconstructing team state
  • src/main/services/team/TeamMembersMetaStore.ts - stop filtering Codex efforts down to legacy low | medium | high
  • src/main/services/team/TeamBackupService.ts and restore paths - preserve additive launch identity and tolerate old backups
  • src/main/services/runtime/CliProviderModelAvailabilityService.ts - keep runtime verification compatible with launchModel values and do not verify hidden/catalog-only rows by accident
  • src/main/ipc/teams.ts and src/main/http/teams.ts - parse provider first, then validate effort
  • src/renderer/utils/teamModelAvailability.ts - consume rich Codex catalog
  • src/renderer/utils/teamModelCatalog.ts - demote Codex static list to fallback and labels only
  • src/renderer/components/team/dialogs/EffortLevelSelector.tsx - make options provider/model-aware
  • src/renderer/components/team/dialogs/LaunchTeamDialog.tsx and CreateTeamDialog.tsx - remove unsafe effort casts and persist resolved launch identity
  • member draft/editor components - validate per-member resolved provider/model/effort
  • renderer launch prefill and draft retry storage - add a versioned launch identity payload and tolerate old entries

agent_teams_orchestrator:

  • src/entrypoints/sdk/runtimeTypes.ts - add provider-aware Codex effort support
  • src/main.tsx - update --effort parser or provider-specific validation path
  • src/utils/effort.ts and src/utils/providerEffort.ts - separate Anthropic max from Codex xhigh
  • Codex native executor path - convert effort to -c model_reasoning_effort
  • src/utils/model/codex.ts - rename static list semantics to fallback/static detection
  • src/utils/model/validateModel.ts - allow provider-explicit Codex app-catalog models
  • runtime status/capability endpoint - expose dynamic Codex model and effort pass-through support
  • exact-log/runtime status code - record selected model, resolved model, selected effort, resolved effort, and config override

Phased Implementation

Phase 0 - contracts and live spike

Commit boundary: docs(codex): plan app-server model catalog

Tasks:

  • add this plan
  • keep live probe output in signoff notes or test fixture
  • confirm installed Codex supports model/list
  • confirm one app-server session can read account, rate limits, and model catalog
  • confirm docs support model_reasoning_effort
  • decide exact shell quoting for -c model_reasoning_effort
  • capture fixtures for at least two catalog shapes: current live shape and synthetic id !== model
  • capture current Codex binary version and document cache invalidation expectations

Acceptance:

  • plan exists in the dedicated worktree
  • no code behavior changes
  • weak areas are explicitly called out

Phase 1 - app-server model catalog feature

Commit boundary: feat(codex): add app-server model catalog source

Tasks:

  • add structured JSON-RPC request errors with method/code/details
  • expose or probe Codex binary version for catalog cache keys
  • add effective config fingerprint support using app-server config/read when available
  • add config/read support detection and always send {} params at minimum
  • add src/features/codex-model-catalog
  • add app-server protocol types
  • add CodexModelCatalogAppServerClient
  • add normalization domain rules
  • add static fallback source
  • add in-memory cache with TTL and in-flight dedupe
  • include launch scope fields in cache keys: cwd, profile, trust, config fingerprint, launch override fingerprint
  • include forced login method and forced workspace hash in auth-scoped cache keys
  • normalize both documented effort option objects and defensive string effort values
  • classify method not found, timeout, malformed response, and empty catalog separately
  • add structured diagnostics without raw account email or secret-bearing env values
  • expose feature facade from main composition

Acceptance:

  • JSON-RPC method not found can be detected in tests
  • binary version changes invalidate catalog cache
  • config fingerprint changes invalidate catalog cache for that scope
  • forced login/workspace changes invalidate account, limits, and catalog cache for that scope
  • unit tests cover normalization, fallback, pagination, duplicate ids, missing modalities, unknown effort strings, and id !== model
  • app-server client tests cover model/list request params and timeout labels
  • method-not-found falls back without marking account disconnected
  • diagnostics include source, status, method, error category, binary version, effective auth mode, and cache age
  • no renderer behavior changes yet
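The separate failure classes above can be sketched as a pure classifier. The codes -32601 and -32602 are the standard JSON-RPC 2.0 values for method not found and invalid params. Everything else is an assumption for illustration: the `CatalogAttempt` shape, the category names, and the idea that a model/list result carries its rows in a `data` array.

```typescript
// Standard JSON-RPC 2.0 error codes.
const METHOD_NOT_FOUND = -32601;
const INVALID_PARAMS = -32602;

type CatalogOutcome =
  | "ok"
  | "method-not-found"
  | "invalid-params"
  | "timeout"
  | "malformed-response"
  | "empty-catalog"
  | "app-server-error";

// Hypothetical summary of one model/list attempt.
interface CatalogAttempt {
  method: string; // kept for diagnostics
  timedOut?: boolean;
  errorCode?: number;
  result?: unknown;
}

function classifyCatalogAttempt(attempt: CatalogAttempt): CatalogOutcome {
  if (attempt.timedOut) return "timeout";
  if (attempt.errorCode === METHOD_NOT_FOUND) return "method-not-found";
  if (attempt.errorCode === INVALID_PARAMS) return "invalid-params";
  if (attempt.errorCode !== undefined) return "app-server-error";

  const result = attempt.result;
  if (typeof result !== "object" || result === null) return "malformed-response";
  // Assumption: the catalog rows live in a `data` array on the result.
  const data = (result as { data?: unknown }).data;
  if (!Array.isArray(data)) return "malformed-response";
  return data.length === 0 ? "empty-catalog" : "ok";
}
```

Any outcome other than "ok" would select the stale cache or static fallback; none of them should change account connection state.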

Phase 2 - provider status integration

Commit boundary: feat(runtime): expose codex model catalog metadata

Tasks:

  • add optional modelCatalog to CliProviderStatus
  • add optional runtimeCapabilities to CliProviderStatus
  • merge Codex model catalog into provider status
  • keep models: string[] derived from launchModel
  • make provider refresh use cached, auth-scoped catalog
  • implement combined account/rate-limits/catalog app-server read for normal refresh
  • avoid extra app-server session in hot paths where account snapshot already refreshes
  • clear ChatGPT-scoped catalog on logout and API-key-scoped catalog when API key source changes
  • clear all catalog entries when Codex binary path or version changes
  • ensure auto catalog scope follows effective launch auth mode, not just configured preference
  • add request/snapshot versioning so stale refresh responses cannot overwrite newer auth state
  • support global provider refresh and project-scoped launch refresh as different catalog scopes
  • preserve Anthropic provider status shape

Acceptance:

  • Codex provider status includes modelCatalog
  • Codex provider status includes runtime capability metadata when available
  • the legacy models: string[] field still works
  • auto with ChatGPT ready uses ChatGPT-scoped catalog even if API key is detected
  • auto with ChatGPT unavailable and API key ready uses API-key-scoped catalog with clear degraded copy
  • forced login method overrides are reflected in effective auth copy and cache scope
  • one normal Codex provider refresh does not spawn separate app-server processes for account, limits, and catalog
  • Anthropic snapshots are byte-for-byte equivalent except ordering noise already present
  • provider dashboard does not block on a slow catalog refresh when stale cache exists
  • older refresh results are ignored after auth mode or runtime capability changes
  • global dashboard catalog and project launch catalog do not overwrite each other
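Request/snapshot versioning can be sketched with a monotonic ticket counter. This is one possible approach, not the implemented one; `RefreshGate` and its methods are hypothetical names.

```typescript
// Sketch only: a latest-wins gate for provider status refreshes.
class RefreshGate {
  private latest = 0;

  // Call when a refresh starts; keep the ticket with the pending request.
  begin(): number {
    this.latest += 1;
    return this.latest;
  }

  // Call on auth mode or runtime capability changes to drop all pending results.
  invalidate(): void {
    this.latest += 1;
  }

  // Only the most recently started refresh may write its result.
  isCurrent(ticket: number): boolean {
    return ticket === this.latest;
  }
}

// Returns the next state only when the response is not stale.
function applyIfCurrent<T>(
  gate: RefreshGate,
  ticket: number,
  current: T,
  next: T,
): T {
  return gate.isCurrent(ticket) ? next : current;
}
```

Using one gate per catalog scope (global vs project) would also keep the two scopes from overwriting each other.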

Phase 3 - dynamic UI model picker and effort selector

Commit boundary: feat(codex): use dynamic model catalog in team launch UI

Tasks:

  • update Codex model picker to prefer rich catalog
  • show app-server labels, default badge, and fallback source state
  • update effort selector to be provider/model-aware
  • show xhigh metadata only for Codex models that return it
  • make xhigh selectable only when runtime capability says Codex effort config pass-through is supported
  • hide Codex-only efforts for Anthropic
  • reset invalid effort on model change
  • preserve missing persisted models as visible warning rows instead of silently clearing selection
  • keep Agent Teams disabled policy separate from Codex app-server availability
  • show future app-server models immediately, with New from Codex catalog status when policy has not verified them yet
  • when cwd is selected, refresh project-scoped Codex catalog before enabling launch-only controls

Acceptance:

  • gpt-5.1-codex-mini shows only medium | high
  • gpt-5.3-codex-spark defaults to high
  • gpt-5.4 shows low | medium | high | xhigh as catalog metadata
  • xhigh is disabled with runtime-upgrade copy until capability support is present
  • app-server-visible but Agent Teams-disabled model shows disabled copy, not unavailable copy
  • synthetic future gpt-5.5 fixture appears without touching static catalog
  • persisted model missing from current catalog is visible with a warning
  • Anthropic UI remains low | medium | high
  • static fallback still renders when app-server is unavailable
  • global catalog can be displayed provisionally, but launch enablement waits for project-scoped catalog or explicit degraded confirmation

Phase 4 - launch validation and Codex effort pass-through

Commit boundary: feat(runtime): pass codex reasoning effort through native exec

Tasks:

  • widen team launch effort validation with provider-specific rules
  • update IPC and HTTP validators
  • update TeamProvisioningService request shaping
  • persist additive ProviderModelLaunchIdentity into team metadata, exact-log metadata, and backup/restore payloads where launch identity is reconstructed
  • update orchestrator parser and runtime types
  • expose orchestrator runtime capability metadata for dynamic Codex models and Codex effort config
  • translate Codex effort to argv entries ['-c', 'model_reasoning_effort="value"']
  • keep Anthropic max separate
  • add exact-log metadata for selected model, resolved launch model, catalog source, selected effort, and resolved effort
  • resolve Default to concrete launch model before provisioning
  • update scheduled/provisioned launch paths or block Codex-only efforts in those paths until updated
  • enforce built-in OpenAI Codex provider scope or block custom/OSS provider configs with clear copy
  • pass profile/cwd/config overrides consistently between preview and codex exec

Acceptance:

  • Codex xhigh launch reaches codex exec as model_reasoning_effort
  • Codex max is rejected before launch
  • Anthropic xhigh is rejected before launch
  • unsupported model-effort pairs are blocked before provisioning
  • provider-explicit synthetic future model is accepted only when runtime capability says dynamic Codex models are supported
  • member metadata, team metadata, draft retry, and backup/restore preserve provider-aware effort
  • replay/exact logs show what was selected, what default resolved to, and what was passed to Codex
  • exact logs include catalog scope fingerprint and provider scope, but not raw config values

Phase 5 - cleanup and fallback tightening

Commit boundary: refactor(codex): demote static model catalog to fallback

Tasks:

  • rename static Codex catalog helpers to make fallback status explicit
  • remove UI assumptions that static list is authoritative
  • make future provider-explicit Codex ids launchable when selected from app-server catalog
  • add diagnostics for catalog source and staleness
  • document fallback behavior
  • add a fixture/test with synthetic future model gpt-5.5
  • remove any remaining hardcoded Codex model order from the primary Codex UI path
  • add hidden-model fixture and upgrade-suggestion fixture
  • add one migration test for old localStorage launch prefill without provider model launch identity
  • add project-scoped catalog fixture with model_catalog_json
  • add custom-provider config fixture
  • add forced login method and forced workspace fixtures
  • add config/read method-missing and invalid-params fixtures

Acceptance:

  • new app-server model can appear in UI without code changes
  • static fallback is visible as fallback in diagnostics
  • no code path treats static CODEX_MODELS as the only valid Codex provider model list
  • synthetic gpt-5.5 appears through app-server fixture and can be selected without touching static catalog
  • hidden persisted model is preserved with warning and is not introduced into new-team picker
  • project-scoped catalog differences are visible and do not corrupt global provider status
  • forced login method changes are visible and do not reuse stale catalog/rate-limit scope

Test Plan

claude_team unit tests

Add tests for:

  • structured JSON-RPC error classification
  • binary version cache invalidation
  • effective config fingerprint cache invalidation
  • config/read support detection, including invalid or missing params
  • project-scoped model_catalog_json fixture
  • app-server model normalization
  • id vs model split
  • default model selection
  • per-model effort options
  • unknown effort filtering
  • auth-scoped catalog cache keys
  • auto auth resolving to ChatGPT vs API-key catalog scope
  • combined app-server snapshot partial failures
  • method-not-found fallback for older Codex app-server
  • fallback catalog source
  • stale cache behavior
  • stale refresh response is ignored after newer auth-scope request
  • global catalog and project-scoped catalog use separate cache entries
  • forced login method and forced workspace hash use separate cache entries
  • custom/OSS model_provider config is blocked or marked unsupported for Agent Teams Codex
  • raw managed account email does not appear in catalog diagnostics or exact-log metadata
  • provider status models compatibility
  • provider status runtime capabilities compatibility
  • provider model availability uses launchModel, not catalogId
  • renderer model picker with rich catalog
  • renderer effort selector with Codex and Anthropic providers
  • renderer disables Codex-only efforts when runtime capability is missing
  • renderer shows synthetic future model as New from Codex catalog
  • renderer preserves hidden persisted model after includeHidden: true recovery
  • persisted missing model warning row
  • Agent Teams disabled policy overlay for app-server-visible models
  • backup/restore reads old metadata and preserves new launch identity when present
  • draft retry and launch prefill read old localStorage entries without dropping provider/model identity
  • scheduled launch validation either supports Codex-specific effort or blocks it with explicit error
  • launch preview with selected cwd does not enable launch from global-only catalog when project-scoped catalog is still unknown

Suggested commands:

pnpm vitest run \
  test/features/codex-model-catalog \
  test/features/codex-account \
  test/renderer/components/team \
  test/renderer/utils/teamModelCatalog.test.ts

agent_teams_orchestrator tests

Add tests for:

  • provider-explicit Codex model validation
  • Codex effort parser accepts minimal | low | medium | high | xhigh
  • Anthropic effort parser keeps existing behavior
  • Codex native executor emits -c model_reasoning_effort
  • Codex native executor builds argv entries, not unsafe shell concatenation
  • no effort omits Codex effort config
  • max is not accepted for Codex
  • synthetic gpt-5.5 passes when provider is explicitly Codex and model came from app catalog
  • capability payload reports dynamic Codex model support and effort config support
  • provider-explicit future model fails closed when capability is disabled
  • Codex native exec argv includes cwd/profile/config override semantics that match preview scope
  • custom provider config is not silently routed through subscription Codex UX

Suggested command:

pnpm test -- runtimeBackends providerEffort spawnMultiAgent codex

live smoke

Run only when the developer has a Codex login or API key available:

codex app-server

JSON-RPC smoke:

{ "jsonrpc": "2.0", "id": 1, "method": "model/list", "params": { "limit": 20, "includeHidden": false } }

Native exec effort smoke:

codex exec --json --model gpt-5.4 -c model_reasoning_effort='"xhigh"' --skip-git-repo-check --ephemeral "Return only: ok"

Failure smoke:

codex exec --json --model gpt-5.1-codex-mini -c model_reasoning_effort='"xhigh"' --skip-git-repo-check --ephemeral "Return only: ok"

Expected:

  • our app should block the second case before launch once catalog metadata is available
  • if run manually, Codex may return a model- or provider-specific error, but product UX should not rely on that late failure

Risks And Mitigations

Risk 1 - app-server startup slows provider settings

🎯 8 🛡️ 8 🧠 5

Mitigation:

  • cache model catalog in main process
  • dedupe in-flight refreshes
  • use stale cache while refreshing
  • combine account/rate-limit/catalog reads where possible
  • never clear ready UI while refresh is pending

Risk 2 - effort values leak into Anthropic

🎯 9 🛡️ 9 🧠 4

Mitigation:

  • provider-specific effort validation
  • renderer selector branches by provider and selected model
  • tests for Anthropic not showing xhigh, minimal, or none
  • orchestrator rejects invalid provider-effort pairs

Risk 3 - id and model diverge later

🎯 8 🛡️ 9 🧠 3

Mitigation:

  • use catalogId for identity
  • use launchModel for runtime
  • tests with fixture where id !== model

Risk 4 - app-server catalog has unknown fields or new efforts

🎯 8 🛡️ 8 🧠 5

Mitigation:

  • tolerant protocol DTOs
  • unknown efforts preserved in diagnostics but not selectable
  • add one small allow-list update when product intentionally supports a new effort
  • no hard crash on unknown inputModalities
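Tolerant effort normalization can be sketched as below. It accepts both option objects and plain strings, and separates allow-listed efforts from unknown ones that are kept only for diagnostics. The `reasoningEffort` property name is an assumption about the option object shape, and the helper names are hypothetical; the allow-list mirrors the Codex effort set in this plan.

```typescript
// Efforts the product intentionally supports; extend only on purpose.
const SELECTABLE_EFFORTS = new Set(["minimal", "low", "medium", "high", "xhigh"]);

interface NormalizedEfforts {
  selectable: string[];
  unknown: string[]; // preserved for diagnostics, never selectable
}

// Assumption: option objects carry the effort name in `reasoningEffort`.
function readEffortName(raw: unknown): string | undefined {
  if (typeof raw === "string") return raw;
  if (typeof raw === "object" && raw !== null) {
    const name = (raw as { reasoningEffort?: unknown }).reasoningEffort;
    if (typeof name === "string") return name;
  }
  return undefined;
}

function normalizeEfforts(raw: unknown): NormalizedEfforts {
  const result: NormalizedEfforts = { selectable: [], unknown: [] };
  if (!Array.isArray(raw)) return result; // tolerate a missing or malformed field
  for (const item of raw) {
    const name = readEffortName(item);
    if (name === undefined) continue; // skip entries with no readable name
    const bucket = SELECTABLE_EFFORTS.has(name) ? result.selectable : result.unknown;
    if (bucket.indexOf(name) === -1) bucket.push(name); // drop duplicates
  }
  return result;
}
```

Supporting a new effort then becomes a one-line allow-list change, while unknown values never crash the picker.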

Risk 5 - static fallback becomes accidentally authoritative again

🎯 7 🛡️ 8 🧠 4

Mitigation:

  • name fallback helpers clearly
  • include source in model catalog
  • tests assert app-server source wins over fallback
  • UI diagnostics expose fallback source

Risk 6 - launch path accepts model from UI but orchestrator rejects it

🎯 8 🛡️ 8 🧠 6

Mitigation:

  • provider-explicit Codex launch validation should trust provider=codex plus app-server-selected model
  • static isCodexModel() remains only a generic detector
  • exact tests with a future-model fixture like gpt-5.5

Risk 7 - auth-scoped catalog leaks between modes

🎯 7 🛡️ 9 🧠 6

Mitigation:

  • include auth scope in catalog cache key
  • clear scoped cache on logout and API-key source changes
  • tests for ChatGPT catalog not being reused in API-key mode
  • UI labels catalog source and auth scope in diagnostics

Risk 8 - Default becomes nondeterministic across relaunch

🎯 8 🛡️ 9 🧠 6

Mitigation:

  • persist selected model kind and resolved launch model in launch identity
  • exact logs record both Default and concrete model
  • relaunch preview shows current default resolution before launch
  • do not silently rewrite old explicit models

Risk 9 - older Codex binary lacks model/list

🎯 7 🛡️ 9 🧠 5

Mitigation:

  • preserve JSON-RPC error code and method
  • classify method-not-found separately from app-server failure
  • show static fallback with Codex upgrade hint
  • cache key includes binary version so upgrades refresh the catalog

Risk 10 - auto auth shows the wrong catalog

🎯 7 🛡️ 9 🧠 6

Mitigation:

  • resolve effective auth mode before catalog scope
  • keep ChatGPT and API-key catalogs separate
  • UI copy distinguishes selected preference, effective launch mode, and fallback credentials
  • tests cover ChatGPT-ready + API-key-present and ChatGPT-missing + API-key-ready cases

Risk 11 - UI enables a capability the installed runtime cannot launch

🎯 7 🛡️ 10 🧠 7

Mitigation:

  • add explicit runtime capability metadata
  • display catalog metadata separately from launch enablement
  • fail closed when capability is missing or stale
  • test Phase 3 UI against a pre-Phase-4 runtime fixture

Risk 12 - future models appear but break team-agent behavior

🎯 8 🛡️ 8 🧠 6

Mitigation:

  • split Codex catalog availability from Agent Teams policy status
  • show new models as New from Codex catalog
  • block only hard incompatibilities: runtime capability missing, unsupported modality, disabled policy, unsupported effort
  • exact logs record new-model status for later debugging

Risk 13 - hidden or upgraded persisted models are silently lost

🎯 8 🛡️ 9 🧠 5

Mitigation:

  • run one includeHidden: true lookup for persisted explicit models missing from visible catalog
  • preserve model value during restore and relaunch preview
  • show upgrade suggestions without auto-rewriting metadata
  • test hidden-model and upgrade fixtures

Risk 14 - non-dialog launch path drops Codex effort

🎯 7 🛡️ 9 🧠 7

Mitigation:

  • audit team metadata, members metadata, backup/restore, draft retry, launch prefill, and schedule types
  • parse provider before parsing effort at every main-process boundary
  • block Codex-only effort in any path not updated in the same phase
  • add tests outside React launch dialogs

Risk 15 - HMR or slow refresh overwrites correct provider state

🎯 8 🛡️ 9 🧠 5

Mitigation:

  • add request/snapshot versioning
  • ignore out-of-order provider status responses
  • do not let catalog failures overwrite account truth
  • keep last ready state visible while a refresh is pending

Risk 16 - global catalog preview differs from project launch catalog

🎯 6 🛡️ 10 🧠 8

Mitigation:

  • include cwd, profile, trust, config fingerprint, and launch override fingerprint in catalog scope
  • use app-server config/read when available to derive effective config
  • keep dashboard/global catalog separate from launch/project catalog
  • require project-scoped catalog before enabling launch-only controls when cwd is known

Risk 17 - custom or OSS Codex config is mistaken for subscription Codex

🎯 7 🛡️ 9 🧠 7

Mitigation:

  • keep Agent Teams Codex scoped to built-in OpenAI Codex provider
  • detect effective model_provider when possible
  • block or degrade custom/OSS provider configs with explicit copy
  • do not show ChatGPT account limits for custom provider execution

Risk 18 - non-text model row appears in catalog

🎯 8 🛡️ 9 🧠 4

Mitigation:

  • require text input modality for Agent Teams launch
  • treat missing inputModalities with the documented backward-compatible default
  • do not claim personality support when supportsPersonality=false

Risk 19 - experimental app-server surface changes behavior

🎯 8 🛡️ 9 🧠 4

Mitigation:

  • keep experimentalApi=false
  • rely only on documented stable model/list fields
  • ignore unknown fields unless a typed use case is added

Risk 20 - app-server catalog passes but native exec fails

🎯 8 🛡️ 10 🧠 6

Mitigation:

  • treat app-server catalog as picker truth, not full launch proof
  • require Phase 4 native exec argv tests and live smoke where possible
  • test model, effort, cwd, profile, and provider scope together
  • block unsupported model-effort pairs before codex exec

Risk 21 - config/read behavior differs across Codex versions

🎯 7 🛡️ 9 🧠 6

Mitigation:

  • feature-detect config/read
  • always send {} params at minimum
  • classify method-missing, invalid-params, scoped-failure, and global-success separately
  • never make config-read failure disconnect the Codex account

Risk 22 - forced login/workspace reuses stale catalog

🎯 7 🛡️ 10 🧠 6

Mitigation:

  • include forced login method and forced workspace hash in auth scope
  • invalidate account, limits, and catalog together when either changes
  • display forced auth copy instead of showing conflicting selected auth copy
  • redact workspace ids in logs and diagnostics

Risk 23 - local model_catalog_json changes without config change

🎯 6 🛡️ 9 🧠 7

Mitigation:

  • hash effective config and optionally active catalog file mtime when app-server exposes enough origin data
  • keep TTL/manual refresh fallback when origin data is unavailable
  • do not parse or log arbitrary catalog file contents
  • do not apply untrusted project-scoped catalog files unless effective config says they are active

Definition Of Done

The feature is done when:

  • Codex model picker uses app-server model/list when available.
  • New app-server-visible Codex models appear without app code changes.
  • supportedReasoningEfforts and defaultReasoningEffort drive Codex effort UI.
  • xhigh appears only where Codex reports it.
  • Anthropic UI and launch behavior are unchanged.
  • Codex launches pass effort through model_reasoning_effort.
  • UI launch controls are gated by runtime capabilities, not by catalog metadata alone.
  • Future app-server-visible models appear without code changes and are marked as new until policy/runtime support is clear.
  • Default Codex selection resolves to concrete launch identity before provisioning.
  • Auth changes do not reuse stale model catalogs across ChatGPT and API-key modes.
  • Project-scoped Codex config and model_catalog_json cannot make launch use a different catalog than preview without explicit degraded copy.
  • Custom or OSS Codex provider config is not silently presented as ChatGPT subscription-backed Agent Teams Codex.
  • config/read compatibility is feature-detected and never breaks account truth on older binaries.
  • Forced login method and forced workspace changes cannot reuse stale account, rate-limit, or catalog cache.
  • Codex binary upgrades invalidate stale catalog cache and retry model/list.
  • Older Codex binaries without model/list fall back without breaking account state.
  • Static Codex catalog is clearly fallback, not primary truth.
  • Hidden persisted models are preserved with explicit warnings.
  • Backup/restore, draft retry, launch prefill, member metadata, and scheduled paths do not drop provider-aware effort.
  • Exact logs and diagnostics do not persist raw account identifiers or secret values.
  • Exact logs include catalog scope and provider scope fingerprints for debugging preview vs launch mismatch.
  • HMR and out-of-order refreshes do not replace ready provider status with stale fallback/error state.
  • Provider Settings remains fast and does not show transient empty/error states during refresh.
  • Tests cover catalog source, fallback, effort validation, and launch pass-through.
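
As an illustration of the effort-related criteria above, here is a sketch of effort validation and pass-through. It assumes each catalog entry has been flattened to a list of effort strings plus an optional default. `CatalogModel`, `resolveEffort`, and `toConfigOverrides` are hypothetical names. The only identifier taken from the plan is the `model_reasoning_effort` config key.

```typescript
// Hypothetical, flattened view of one model/list entry. The field names follow
// the plan; the string-array shape is an assumption made for this sketch.
interface CatalogModel {
  id: string;
  supportedReasoningEfforts: string[];
  defaultReasoningEffort: string | null;
}

type EffortResolution =
  | { ok: true; effort: string | null }
  | { ok: false; reason: string };

// No explicit selection falls back to the model's own default. An explicit
// selection must be one of the efforts the catalog reports for that model.
function resolveEffort(model: CatalogModel, requested: string | null): EffortResolution {
  if (requested === null) {
    return { ok: true, effort: model.defaultReasoningEffort };
  }
  if (model.supportedReasoningEfforts.includes(requested)) {
    return { ok: true, effort: requested };
  }
  return { ok: false, reason: `${model.id} does not support effort "${requested}"` };
}

// Effort travels as a Codex config override, not as a dedicated CLI flag.
function toConfigOverrides(effort: string | null): Record<string, string> {
  return effort === null ? {} : { model_reasoning_effort: effort };
}
```

Returning a discriminated result rather than throwing lets the picker show the rejection reason inline while launch validation reuses the same function.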

Final Signoff And Handoff

The implementation is ready for review as long as these checks stay green:

  1. claude_team: pnpm typecheck
  2. claude_team: targeted catalog/runtime/team provisioning Vitest suites
  3. agent_teams_orchestrator_codex_native_spike: targeted Codex native exec and runtime capability Bun suites
  4. Live codex app-server model/list smoke against the installed Codex binary
  5. Optional UI smoke with CLAUDE_DEV_RUNTIME_ROOT=/Users/belief/dev/projects/claude/agent_teams_orchestrator_codex_native_spike

Merge requirement:

  • merge/pair the claude_team branch with the agent_teams_orchestrator_codex_native_spike runtime capability change.
  • if the UI branch is merged without the runtime capability change, the feature remains safe but conservative: dynamic future Codex models and xhigh are visible as catalog metadata but blocked for launch.
  • if the runtime capability change is merged without the UI branch, existing Codex native behavior remains unchanged except for the explicit runtime status payload and xhigh exact argv support already covered by tests.

Recommended final manual smoke:

CLAUDE_DEV_RUNTIME_ROOT=/Users/belief/dev/projects/claude/agent_teams_orchestrator_codex_native_spike pnpm dev

Then verify:

  • Provider Settings Codex model list is populated from app-server catalog.
  • gpt-5.1-codex-mini shows only medium | high.
  • gpt-5.4 shows low | medium | high | xhigh.
  • Anthropic does not show minimal, none, or xhigh.
  • A synthetic or newly released Codex model is not silently hidden by static UI code.
  • Launch logs include selected model, resolved launch model, selected effort, resolved effort, catalog source, and runtime capability truth.
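
To make the last check easier to verify, a launch log entry could carry the listed fields in one structured record. The sketch below is illustrative only: `LaunchLogRecord`, `formatLaunchLog`, `describeResolutionDrift`, the `catalogSource` values, and the `event` name are hypothetical and are not the actual log format.

```typescript
// Hypothetical record covering the fields named in the checklist above.
interface LaunchLogRecord {
  selectedModel: string | null; // null when the default selection was used
  resolvedLaunchModel: string;
  selectedEffort: string | null;
  resolvedEffort: string | null;
  catalogSource: "app-server" | "static-fallback";
  runtimeCapabilities: Record<string, boolean>;
}

// One JSON line per launch keeps the entry easy to grep and diff.
function formatLaunchLog(record: LaunchLogRecord): string {
  return JSON.stringify({ event: "codex-launch", ...record });
}

// List where the launch used something other than what was selected, so a
// preview vs launch mismatch stands out during the manual smoke.
function describeResolutionDrift(record: LaunchLogRecord): string[] {
  const notes: string[] = [];
  if (record.selectedModel !== record.resolvedLaunchModel) {
    notes.push(`model: ${record.selectedModel ?? "default"} -> ${record.resolvedLaunchModel}`);
  }
  if (record.selectedEffort !== record.resolvedEffort) {
    notes.push(`effort: ${record.selectedEffort ?? "default"} -> ${record.resolvedEffort ?? "none"}`);
  }
  return notes;
}
```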