777genius 7a337b6268 feat(codex): add app-server model catalog

2026-04-21 12:45:34 +03:00

98 KiB

Raw Blame History

Codex App-Server Model Catalog Plan

Date: 2026-04-21
Status: implementation complete in feature worktrees, pending final review/commit
Worktree: /Users/belief/dev/projects/claude/claude_team_codex_model_catalog_plan
Branch: spike/codex-model-catalog-plan
Primary repo: claude_team
Secondary repo worktree: /Users/belief/dev/projects/claude/agent_teams_orchestrator_codex_native_spike
Architecture reference: FEATURE_ARCHITECTURE_STANDARD.md

Executive Summary

Codex model selection should move from hardcoded local lists to the official Codex app-server model/list catalog.

Chosen implementation:

Add a dedicated src/features/codex-model-catalog feature in claude_team.
Use codex app-server JSON-RPC model/list as the primary source for Codex models.
Keep the existing static Codex catalog only as a bounded fallback when app-server is unavailable.
Add rich, additive model metadata to CliProviderStatus while keeping models: string[] for backwards compatibility.
Use per-model supportedReasoningEfforts and defaultReasoningEffort for the Codex model picker and launch validation.
Keep Anthropic and Gemini behavior unchanged by default.
Update agent_teams_orchestrator so Codex launches pass reasoning effort through Codex config key model_reasoning_effort, not through an invented --effort flag.

Decision score:

🎯 9 🛡️ 9 🧠 6
estimated implementation size: 1200-2400 lines across claude_team and agent_teams_orchestrator

Why this is the safest path:

It follows the real Codex client contract instead of chasing static releases.
It solves future model releases like gpt-5.5 without an app release, as long as Codex app-server already exposes the model.
It avoids breaking Anthropic by making the new catalog contract additive and provider-scoped.
It handles xhigh correctly as Codex-specific reasoning effort, not as Anthropic max.

Current implementation state:

claude_team has the dedicated Codex model catalog feature, app-server JSON-RPC client, static fallback, provider status integration, Codex model picker integration, provider-aware effort UI, launch validation, launch identity persistence, and targeted tests.
agent_teams_orchestrator_codex_native_spike exposes runtime capabilities for dynamic Codex models and Codex reasoning config pass-through, and its Codex native exec runner passes effort through -c model_reasoning_effort="value".
Anthropic remains isolated from Codex-only effort values. Anthropic launch UI still uses low | medium | high; Codex can use per-model minimal | low | medium | high | xhigh only where catalog/runtime policy allows it.
Future Codex app-server models can appear immediately in UI. Launch is allowed only when the local runtime declares dynamic Codex model support; otherwise they remain visible with upgrade/policy copy instead of failing late during spawn.
Default Codex selection is resolved to a concrete model immediately before provisioning and stored as additive launch identity metadata.
The remaining work before merge is review/signoff, not more architecture discovery.

Sources And Verification

Official sources checked:

Important official facts:

model/list is explicitly intended for rendering model and personality selectors.
model/list returns id, model, displayName, hidden, defaultReasoningEffort, supportedReasoningEfforts, inputModalities, supportsPersonality, isDefault, upgrade, and upgradeInfo.
includeHidden: false returns picker-visible models by default.
codex exec has --model and -c, --config key=value.
codex exec does not expose a first-class --effort flag.
Codex config key model_reasoning_effort supports minimal | low | medium | high | xhigh.
xhigh is model-dependent.
config/read exists in app-server and returns effective configuration after configuration layering.
Codex loads user config from ~/.codex/config.toml and can also load project-scoped .codex/config.toml only for trusted projects.
model_catalog_json can override the model catalog, including profile-level overrides.
codex exec supports --cd and --profile, and -c key=value overrides take precedence for one invocation.

Local probe:

binary: codex-cli 0.117.0
method: codex app-server over JSON-RPC stdio
transport: newline-delimited JSON-RPC over stdio, not Content-Length framing
request: model/list with { "limit": 20, "includeHidden": false }
result: 8 visible models, gpt-5.4 marked default, nextCursor: null
visible models returned: gpt-5.4, gpt-5.2-codex, gpt-5.1-codex-max, gpt-5.4-mini, gpt-5.3-codex, gpt-5.3-codex-spark, gpt-5.2, gpt-5.1-codex-mini
xhigh is already returned for most models.
gpt-5.1-codex-mini only returned medium | high, so effort options must be per-model.
gpt-5.3-codex-spark returned default effort high, so default effort must not be global.
codex exec --help locally confirms --cd, --profile, --model, --oss, --local-provider, and repeatable -c key=value.
local help confirms --oss is equivalent to -c model_provider=oss, so provider scope can differ from subscription-backed OpenAI Codex if not guarded.
live config/read probe returned { config, origins }.
live config/read probe requires params object; missing params returns JSON-RPC error -32600.
live config/read probe accepted { cwd } and { profile } without error, so the implementation should feature-detect and test scoped reads instead of assuming only global config.
final live smoke on this worktree confirmed model/list returns 8 visible models, default gpt-5.4, xhigh for most models, and medium | high only for gpt-5.1-codex-mini.

Combined app-server session probe:

one initialized app-server process successfully handled account/read, account/rateLimits/read, and model/list sequentially
account/read returned a ChatGPT account shape in the local environment
account/rateLimits/read returned primary.windowDurationMins = 300 and secondary.windowDurationMins = 10080
model/list returned the same 8 visible models in that same session
conclusion: provider refresh should use a combined control-plane session when it needs account, limits, and catalog truth

Lowest-Confidence Areas And Decisions

1. Auth-scoped catalog truth

🎯 7 🛡️ 9 🧠 6
Estimated implementation impact: 180-350 lines

Uncertainty:

app-server model/list may return different catalogs depending on active Codex auth state, account plan, org policy, API-key mode, or future Codex rollout flags.
The local probe only proves one logged-in environment, not all account modes.

Decision:

treat Codex model catalog as auth-scoped, not global
cache key must include binary path, Codex home, preferred auth mode, effective auth mode, managed account stable identity when available, and API-key availability source
never reuse a ChatGPT-account catalog as API-key-mode catalog
never reuse an API-key-mode catalog as ChatGPT-account catalog
when auth mode changes, keep previous catalog visible only as stale UI while refresh is in flight, then replace it

Implementation rule:

catalogCacheKey =
  binaryPath
  + binaryVersion
  + codexHome
  + preferredAuthMode
  + effectiveAuthMode
  + managedAccountHash or "no-chatgpt-account"
  + apiKey.source or "no-api-key"

The hash should use a per-process salt and should not be persisted. Do not persist raw email solely for catalog cache.

2. Default model determinism

🎯 8 🛡️ 9 🧠 6
Estimated implementation impact: 220-420 lines

Uncertainty:

current UI can represent model as empty string meaning Default
Codex app-server default can change after a Codex release
launch logs, relaunch, replay, and team metadata need to stay explainable

Decision:

keep Default as a UI selection
resolve Default to a concrete resolvedLaunchModel immediately before launch
persist both user selection and resolved runtime truth in launch metadata
never silently rewrite old team config from one concrete model to another
if a team stored Default, relaunch should show that it will resolve to the current Codex default before launch

Required persisted launch identity:

export interface ProviderModelLaunchIdentity {
  providerId: TeamProviderId;
  providerBackendId: TeamProviderBackendId | null;
  selectedModel: string | null;
  selectedModelKind: 'default' | 'explicit';
  resolvedLaunchModel: string;
  catalogId: string | null;
  catalogSource: 'app-server' | 'static-fallback' | 'unavailable';
  catalogFetchedAt: string | null;
  selectedEffort: string | null;
  resolvedEffort: string | null;
}

This identity should be written into exact logs and launch-derived metadata. It should not replace existing fields in Phase 1, but it should become the canonical explanation layer for Codex relaunch/replay.

3. Effort transport through orchestrator

🎯 8 🛡️ 9 🧠 5
Estimated implementation impact: 180-320 lines

Uncertainty:

Agent Teams exposes a generic --effort concept today
Codex CLI does not expose --effort
Codex uses config key model_reasoning_effort

Decision:

UI and main process may accept provider-aware effort strings
orchestrator public Agent Teams CLI can continue accepting --effort
Codex executor must translate Codex effort to codex exec -c model_reasoning_effort='"value"'
Anthropic executor must not see Codex-only effort values
Codex executor must not see Anthropic max

No implementation phase may ship xhigh as selectable until this pass-through is tested.

4. Catalog availability vs team-agent safety policy

🎯 8 🛡️ 9 🧠 5
Estimated implementation impact: 160-280 lines

Uncertainty:

app-server model/list says a model is available to Codex
our team-agent contract can still make a model unsafe for Agent Teams if it breaks task/reply/bootstrap conventions
current UI has local disabled policy for gpt-5.3-codex-spark, gpt-5.2-codex, and gpt-5.1-codex-mini

Decision:

model catalog answers "can Codex offer this model"
team policy answers "can Agent Teams safely launch this model"
keep these as separate layers
do not remove current disabled policies just because app-server returns a model
show clear disabled copy: Available in Codex, disabled for Agent Teams
disabled models can still display catalog metadata and effort metadata for transparency

5. Codex binary version and app-server method compatibility

🎯 7 🛡️ 9 🧠 6
Estimated implementation impact: 220-420 lines

Uncertainty:

codex app-server is documented as an app-server integration surface, but local users can have older Codex binaries.
model/list may be missing, renamed, or return a narrower shape in older binaries.
Current JsonRpcStdioClient collapses JSON-RPC errors to Error(message), which loses method, code, and structured details needed to distinguish method not found from auth/network/timeout.
Current CodexBinaryResolver caches only binary path, not binary version.

Decision:

make binary version part of catalog cache identity
add structured JSON-RPC error metadata before implementing catalog fallback
treat method not found as static-fallback, not as account failure
treat malformed model rows as catalog degradation, not app-server runtime failure
clear catalog cache when resolved Codex binary path or version changes

Required implementation detail:

export class JsonRpcRequestError extends Error {
  readonly method: string;
  readonly code: number | null;
  readonly details: unknown;
}

The app-server model client should classify:

method_not_found: fallback to static catalog and show upgrade hint
timeout: stale cache if available, then fallback
malformed_response: fallback plus diagnostics
process_exit: shared app-server failure for all sub-results in combined snapshot
auth_required: account/read decides auth truth; model/list must not invent auth truth

6. `auto` auth resolution for model catalog

🎯 7 🛡️ 9 🧠 6
Estimated implementation impact: 180-320 lines

Uncertainty:

UI lets users pick auto, chatgpt, or api_key.
Catalog can differ between ChatGPT subscription and API key.
The model picker must preview the catalog for the mode that launch will actually use, not only the configured preference.

Decision:

preferredAuthMode=auto is not a catalog scope by itself
resolve auto into effectiveAuthMode using the same readiness logic as launch
catalog request should be scoped to the effective launch mode
Provider Settings can show both preference and effective catalog scope when they differ
if effective mode flips from ChatGPT to API key because ChatGPT becomes unavailable, keep stale ChatGPT catalog visually stale and refresh API-key catalog

UX copy rule:

do not say Detected from OPENAI_API_KEY as the primary model catalog source when ChatGPT account is the effective mode
show API-key availability only as fallback/secondary when selected auth is ChatGPT or auto resolves to ChatGPT

7. App-server notifications and refresh cadence

🎯 8 🛡️ 8 🧠 5
Estimated implementation impact: 160-260 lines

Uncertainty:

account login flow has notifications
current docs and local probe do not establish a dedicated model-catalog changed notification
keeping a long-lived app-server just for model catalog would increase lifecycle complexity

Decision:

do not introduce a long-lived model catalog subscription in this rollout
use short-lived app-server sessions for refresh
trigger catalog refresh after login success, logout, auth mode change, API-key source change, manual refresh, and provider status refresh
do not poll model/list aggressively from renderer
use 10 minute success TTL and stale cache for UI continuity

If a future app-server release adds model catalog notifications, integrate them later behind the catalog feature port without changing renderer contracts.

8. Backup, restore, and relaunch compatibility

🎯 7 🛡️ 9 🧠 7
Estimated implementation impact: 240-520 lines

Uncertainty:

team launch metadata already persisted provider/model/effort/backend in several places
adding dynamic defaults and resolved model identity can make old backups ambiguous
old teams may contain no modelCatalog metadata

Decision:

new ProviderModelLaunchIdentity is additive
old teams without it remain readable
relaunch derives missing identity from existing provider/model/effort/backend fields
restore does not require the old catalog to be available
if restored explicit model is missing from current catalog, UI preserves the explicit model with a warning instead of silently replacing it with current default
if restored model was Default, relaunch preview resolves it against current catalog and says so before launch

Migration rule:

old explicit model -> selectedModelKind="explicit", resolvedLaunchModel=old model
old empty model -> selectedModelKind="default", resolvedLaunchModel=current default at next launch
missing effort -> selectedEffort=null, resolvedEffort=current model default at next launch

9. UI and orchestrator version skew

🎯 7 🛡️ 10 🧠 7
Estimated implementation impact: 280-620 lines

Uncertainty:

claude_team and agent_teams_orchestrator can be updated at different times.
UI can learn about xhigh, minimal, or a future model like gpt-5.5 before the installed orchestrator can launch it safely.
The current orchestrator static Codex helpers can reject a model that Codex app-server already exposed.

Decision:

catalog visibility and launch capability are separate contracts
UI may display app-server catalog metadata as soon as it is available
UI must not enable launch controls that require new orchestrator behavior until runtime capability says that behavior exists
provider-explicit Codex model strings can be accepted only after orchestrator declares dynamic Codex model support
Codex xhigh can be shown as metadata before Phase 4, but it is disabled for launch until Codex effort pass-through is available

Required runtime capability contract:

export interface ProviderRuntimeCapabilities {
  providerId: TeamProviderId;
  codex?: {
    supportsDynamicAppServerModels: boolean;
    supportsCodexReasoningEffortConfig: boolean;
    supportedCodexReasoningEfforts: Array<'minimal' | 'low' | 'medium' | 'high' | 'xhigh'>;
    acceptsProviderExplicitFutureModels: boolean;
  };
}

Compatibility rule:

catalog says model/effort exists
+ team policy says model is not disabled
+ runtime capability says launch path supports it
= launch control enabled

If any part is missing, the picker can still display the model, but launch must be disabled with explicit copy.

Recommended copy:

Available in Codex, waiting for Agent Teams runtime support
This Codex effort is visible in Codex, but this Agent Teams runtime cannot launch it yet
Upgrade the Agent Teams runtime to use this model

This avoids a bad state where the user selects xhigh successfully in UI and then gets a late codex exec failure.

10. Future model policy, including `gpt-5.5`

🎯 8 🛡️ 8 🧠 6
Estimated implementation impact: 240-520 lines

Uncertainty:

app-server can expose a new model immediately after OpenAI releases it.
the user goal is that new Codex models appear without us shipping a new static list.
Agent Teams still needs a safety layer so one unexpected model row does not break team launch flows.

Top 3 policies:

Allow every app-server-visible model immediately: 🎯 8 🛡️ 5 🧠 3, 80-180 lines. This best solves future releases, but it can route unverified models into team launch without product copy or rollback clarity.
Show every app-server-visible model immediately, launch with capability gate plus "new model" warning: 🎯 9 🛡️ 8 🧠 5, 240-520 lines. This keeps future models visible without app releases, but still blocks only real launch incompatibilities.
Hide or disable unknown models until a code release updates policy: 🎯 4 🛡️ 9 🧠 2, 60-120 lines. This is safe but defeats the reason to use model/list.

Chosen policy: option 2.

Implementation rule:

app-server-visible, non-hidden models appear in the picker immediately
known disabled Agent Teams models remain disabled
new unknown models are selectable only if runtime capabilities support dynamic Codex models
new unknown models get a New from Codex catalog note until a successful launch or explicit policy promotion marks them verified
if the new model does not expose usable text input or any supported effort we can launch, it is shown but disabled
hidden models are never introduced into new-team pickers by default

Policy statuses:

export type CodexTeamModelPolicyStatus =
  | 'verified'
  | 'new-from-codex-catalog'
  | 'disabled-for-agent-teams'
  | 'requires-runtime-upgrade'
  | 'missing-from-current-catalog';

This means gpt-5.5 can appear the day app-server returns it, but the UI will not pretend the full Agent Teams launch path is verified unless the local runtime can actually handle provider-explicit dynamic Codex models.

11. Hidden, upgraded, and persisted models

🎯 8 🛡️ 9 🧠 5
Estimated implementation impact: 160-340 lines

Uncertainty:

official docs say includeHidden: false returns picker-visible models by default.
persisted teams can reference a model that later becomes hidden, upgraded, renamed, or unavailable.
app-server exposes upgrade and upgradeInfo, but we do not know every future migration shape.

Decision:

normal picker uses includeHidden: false
if a persisted explicit Codex model is not found in the visible catalog, run one scoped refresh with includeHidden: true
if hidden lookup finds the model, show it as Hidden in Codex catalog and keep relaunch possible only if runtime capability and team policy allow it
if upgrade points to a visible replacement, show a non-destructive migration suggestion
never auto-rewrite persisted model ids during restore or relaunch

Relaunch behavior:

visible model found -> normal relaunch
hidden model found -> relaunch allowed only with warning and policy pass
upgrade available -> show "Switch to recommended model" action
missing model -> keep value visible, require user to choose another model before launch

This avoids both failure modes: silently changing a user's team model, or breaking old teams because a model moved out of the default picker.

12. Stored effort schema and non-dialog launch paths

🎯 7 🛡️ 9 🧠 7
Estimated implementation impact: 320-760 lines

Uncertainty:

effort is not used only in launch dialogs.
team metadata, member metadata, backup/restore, draft retry, localStorage launch params, and scheduled/provisioned flows can all carry effort.
current normalizers in team data paths may silently discard anything outside low | medium | high.

Decision:

provider-aware effort parsing must be added at every inbound boundary, not only in React components
old persisted low | medium | high values stay valid
new Codex-specific values are preserved only with provider/model context
if provider context is missing, parse as legacy effort and do not invent Codex-specific meaning
scheduled launches and automation-like flows must either be updated in the same phase or explicitly block Codex-only efforts until updated

High-risk code paths to audit during implementation:

src/main/services/team/TeamMembersMetaStore.ts
src/main/services/team/TeamDataService.ts
src/main/services/team/TeamBackupService.ts
src/main/services/team/TeamProvisioningService.ts
src/shared/types/schedule.ts
src/main/ipc/teams.ts
src/main/http/teams.ts
renderer launch prefill and draft retry localStorage state

Migration rule:

legacy effort with no provider context -> keep if low | medium | high
codex effort with provider=codex -> validate against selected model catalog
codex effort with provider missing -> store as selected string only, resolve before launch
unsupported restored effort -> show warning, do not silently downgrade

13. Renderer stale state, HMR, and out-of-order refreshes

🎯 8 🛡️ 9 🧠 5
Estimated implementation impact: 180-380 lines

Uncertainty:

previous provider settings work showed transient wrong states after HMR and slow refreshes.
catalog, account, and rate limits can refresh with different timings.
a stale app-server response can arrive after a newer auth-mode change.

Decision:

every provider status refresh should carry a monotonic requestId or snapshotVersion
renderer stores the latest accepted version per provider
responses older than the latest accepted version are ignored
modelCatalog.schemaVersion is required and future versions are treated as degraded, not fatal
HMR should keep last ready provider status visible while a refresh is in flight
a catalog refresh cannot overwrite account connected state unless it came from the same combined snapshot

Required stale-write guard:

if incoming.providerId != current.providerId -> reject
if incoming.requestId < current.requestId -> reject
if incoming.authScope != current.authScope and incoming.status is not from current auth selection -> keep as stale diagnostics only

This directly targets flicker like Codex native unavailable followed by ready state, or fallback API-key copy appearing while ChatGPT account mode is selected.

14. Privacy, logs, and diagnostics

🎯 8 🛡️ 9 🧠 4
Estimated implementation impact: 120-260 lines

Uncertainty:

account-scoped cache keys need stable identity, but raw email should not leak into exact logs, runtime snapshots, or persistent diagnostics.
API-key source is useful for UX, but no secret or env value should be logged.

Decision:

hash managed account identity in memory for cache keys
use a per-process salt for volatile cache keys
do not persist raw account email solely for model catalog cache
exact logs can record authScope=chatgpt or authScope=api_key, not raw account identity
diagnostics can record apiKeySource=OPENAI_API_KEY but never the value
error messages preserve method/code/timeout, but redact command env and tokens

Required diagnostic fields:

export interface CodexModelCatalogDiagnostics {
  source: 'app-server' | 'static-fallback' | 'unavailable';
  status: 'ready' | 'stale' | 'degraded' | 'unavailable';
  method?: 'model/list';
  errorCode?: string | number | null;
  errorCategory?: string | null;
  binaryVersion?: string | null;
  effectiveAuthMode?: 'chatgpt' | 'api_key' | null;
  cacheAgeMs?: number | null;
}

No UI surface should show Unknown error for catalog failures after this feature.

15. Rollout ordering across repos

🎯 8 🛡️ 10 🧠 6
Estimated implementation impact: 120-260 lines

Uncertainty:

claude_team can ship UI before the user has a compatible agent_teams_orchestrator runtime in cache.
the app can point to CLAUDE_DEV_RUNTIME_ROOT, bundled runtime cache, or a user-installed runtime binary.

Decision:

implement orchestrator support first or behind a UI capability gate
Provider Settings can show catalog metadata before launch support exists
Create/Launch dialogs must consult runtime capabilities before enabling new Codex models or new Codex efforts
the runtime health check should expose a version/capability payload, not force UI to infer support from binary version strings
if capabilities are unavailable, default to safe: display metadata, disable launch-only features

Rollout sequence:

Add orchestrator dynamic Codex model and effort capability support.
Add claude_team catalog feature and provider status metadata.
Show catalog in UI with capability gates.
Enable launch when capability and catalog agree.
Remove any temporary guard only after bundled runtime and dev runtime both report capabilities in CI/smoke.

This is the cleanest way to avoid UI and runtime getting out of sync.

16. Codex config/profile/cwd catalog mismatch

🎯 6 🛡️ 10 🧠 8
Estimated implementation impact: 360-900 lines

Uncertainty:

official config docs allow model_catalog_json, and profile-level profiles.<name>.model_catalog_json can override it.
Codex loads project-scoped .codex/config.toml only when a project or worktree is trusted.
codex exec can run with a different cwd, profile, and inline -c overrides than the short-lived app-server preview session.
current CodexAppServerSessionFactory starts codex app-server without an explicit cwd or profile.

Failure mode:

Provider Settings shows catalog A from global config.
Launch runs codex exec in project cwd with project-scoped or profile config and effectively uses catalog B.
The user selects a model that preview says is valid, but launch resolves against a different provider/catalog.

Top 3 policies:

Global-only catalog preview: 🎯 7 🛡️ 5 🧠 3, 80-180 lines. Fast and simple, but wrong for project-scoped Codex configs.
Project-scoped catalog preview for launch flows, global preview for dashboard: 🎯 9 🛡️ 9 🧠 7, 360-900 lines. More work, but it matches actual codex exec launch context.
Ignore config and force a static OpenAI Codex provider always: 🎯 5 🛡️ 8 🧠 4, 200-420 lines. Safer than mismatch, but it discards legitimate user Codex config and can surprise power users.

Chosen policy: option 2.

Decision:

dashboard/provider card can show a global Codex catalog snapshot
Create/Launch dialogs must fetch or resolve catalog for the selected launch cwd
if profile selection exists or is introduced, catalog cache key must include profile name
if we pass inline config overrides to codex exec, equivalent preview scope must include those overrides or launch must be marked "not preview-verified"
if project trust/config cannot be resolved, launch UI falls back to global catalog but shows Catalog may differ for this project

Required preview scope:

export interface CodexModelCatalogScope {
  codexHome: string;
  binaryPath: string;
  binaryVersion: string | null;
  cwd: string | null;
  projectTrust: 'trusted' | 'untrusted' | 'unknown';
  profileName: string | null;
  configFingerprint: string | null;
  preferredAuthMode: 'auto' | 'chatgpt' | 'api_key' | null;
  effectiveAuthMode: 'chatgpt' | 'api_key' | null;
  launchOverridesFingerprint: string | null;
}

Cache key correction:

catalogCacheKey =
  binaryPath
  + binaryVersion
  + codexHome
  + cwd or "global"
  + projectTrust
  + profileName or "default-profile"
  + configFingerprint or "unknown-config"
  + launchOverridesFingerprint or "no-launch-overrides"
  + preferredAuthMode
  + effectiveAuthMode
  + forcedLoginMethod or "no-forced-login-method"
  + forcedWorkspaceHash or "no-forced-workspace"
  + managedAccountHash or "no-chatgpt-account"
  + apiKey.source or "no-api-key"

Implementation notes:

use app-server config/read when available to get effective config fingerprints for the same scope that launch will use
do not parse arbitrary TOML as the primary config source if app-server can resolve effective configuration
if app-server cannot scope config/read by cwd/profile, keep that uncertainty visible in diagnostics
do not use raw config file contents as a cache key or log payload; hash only the relevant effective keys

Relevant effective keys:

model
model_provider
model_catalog_json
profiles.<name>.model_catalog_json
model_reasoning_effort
forced_login_method
forced_chatgpt_workspace_id
openai_base_url
model_providers.* only as a redacted structural fingerprint
projects.<path>.trust_level

Acceptance:

a team launch from project A and project B can have different Codex catalog cache entries
a trusted project .codex/config.toml changing model_catalog_json invalidates preview for that project
global dashboard status does not claim to be launch-exact for every project
exact logs record the catalog scope fingerprint, not raw config values

17. Built-in OpenAI Codex provider vs custom/OSS Codex config

🎯 7 🛡️ 9 🧠 7
Estimated implementation impact: 260-620 lines

Uncertainty:

Codex config supports model_provider, custom providers, oss_provider, and provider auth settings.
Agent Teams "Codex" provider is intended to mean native Codex through OpenAI/ChatGPT subscription or API-key billing, not arbitrary custom provider execution.
app-server model/list can be influenced by configuration, but our product copy currently talks about Codex subscription.

Decision:

this cutover should keep Agent Teams Codex scoped to the built-in OpenAI Codex provider
custom provider and OSS provider support should be a separate provider feature, not silently mixed into provider=codex
if effective config says model_provider is not built-in OpenAI for the launch scope, show a clear warning and block subscription-mode launch unless the user intentionally switches to a future custom-provider flow
when launching Agent Teams Codex, pass or enforce provider config consistently so codex exec uses the same provider class previewed by the catalog

Recommended launch guard:

if provider=codex and effective model_provider is neither missing nor "openai":
  status = degraded
  launch = blocked
  copy = "This project config points Codex at a custom/local provider. Agent Teams Codex currently supports the built-in OpenAI Codex provider only."

If the team wants to support custom providers later:

add a separate provider=codex-custom or generic OpenAI-compatible provider
do not reuse subscription UX or rate-limit UI
do not show ChatGPT account limits for custom provider launches

This prevents a confusing case where UI says "Codex subscription" but runtime actually routes to local OSS or a custom endpoint.

18. Modalities and personality support

🎯 8 🛡️ 9 🧠 4
Estimated implementation impact: 120-280 lines

Uncertainty:

app-server model rows expose inputModalities and supportsPersonality.
Agent Teams launch prompts are text-first today, but future UI can attach images or personality-like instructions.
older model catalogs can omit inputModalities, and docs say missing modalities should be treated as ["text", "image"] for backward compatibility.

Decision:

launchability requires text input support
image support is displayed as capability metadata, not required for normal team launch
supportsPersonality=false must not disable normal team launch, but the UI must not claim /personality or personality-specific behavior for that model
missing inputModalities uses the documented backward-compatible default

Validation rule:

if inputModalities exists and does not include "text":
  show model, disable launch, copy "This Codex model is not text-launch compatible for Agent Teams"

if supportsPersonality=false:
  hide personality controls for this model if those controls exist

This keeps model picker truthful without overfitting to the current text-only launch flow.

19. Stable app-server surface vs experimental fields

🎯 8 🛡️ 9 🧠 4
Estimated implementation impact: 80-180 lines

Uncertainty:

app-server has an experimentalApi capability.
model/list itself is documented on the stable API overview, but adjacent methods and future richer fields can be experimental.
opting into experimental API globally can change response surface and error behavior.

Decision:

keep experimentalApi=false for the model catalog rollout
rely only on stable model/list fields listed in the docs
treat extra fields as diagnostics only
add a later explicit spike before using experimental catalog, plugin, or app-server thread features in this path

Acceptance:

catalog tests run with experimentalApi=false
no Phase 1-5 task depends on experimental fields
if a future field appears, normalization ignores it unless we add a typed, tested use case

20. App-server preview vs native exec signoff

🎯 8 🛡️ 10 🧠 6
Estimated implementation impact: 180-420 lines

Uncertainty:

model/list is the correct picker source, but the actual launch surface remains codex exec --json.
a model can appear in app-server before codex exec in the installed binary handles it correctly.
effort config can be accepted syntactically but rejected by the model/provider at runtime.

Decision:

app-server catalog is necessary for UI, but not the only release gate for enabling new launch capability
Phase 4 must include a live or mocked native-exec compatibility probe for the selected launch path
native exec signoff should test model, provider scope, cwd, profile, and non-default effort together
if live signoff is not available in CI, use a fixture-based unit test plus one documented local smoke command before merging

Required signoff matrix:

default model + default effort + selected cwd
explicit gpt-5.4 + xhigh + selected cwd
gpt-5.1-codex-mini + high + selected cwd
gpt-5.1-codex-mini + xhigh -> blocked before exec
synthetic future model + capability disabled -> blocked before exec
synthetic future model + capability enabled -> argv accepted by orchestrator test
custom model_provider config -> blocked or explicit custom-provider copy

This prevents the plan from treating app-server catalog presence as proof that the full Agent Teams runtime path is healthy.

21. `config/read` scope contract is only partially documented

🎯 7 🛡️ 9 🧠 6
Estimated implementation impact: 180-420 lines

Uncertainty:

docs list config/read, but the detailed request/response shape is not as explicit as model/list.
local probe confirms config/read returns { config, origins } and accepts params.
local probe confirms missing params returns -32600, so callers must always send {} at minimum.
local probe confirms { cwd } and { profile } are accepted, but we still need tests around whether they fully mirror codex exec --cd/--profile in all installations.

Decision:

treat config/read as a feature-detected helper, not as a hard dependency for model catalog availability
always call config/read with an object, never with missing params
include config/read method/code/details in diagnostics
if scoped config/read fails but global succeeds, mark launch catalog as scope_unverified, not ready
if config/read is missing on older binaries, fall back to global catalog and require runtime capability plus explicit degraded copy before launch enablement

Recommended DTO:

export interface CodexAppServerConfigReadParams {
  cwd?: string | null;
  profile?: string | null;
}

export interface CodexAppServerConfigReadResponse {
  config: Record<string, unknown>;
  origins: Record<string, unknown>;
}

Feature-detect result:

export type CodexConfigReadSupport =
  | 'supported-scoped'
  | 'supported-global-only'
  | 'method-missing'
  | 'failed';

Acceptance:

unit tests cover missing params, method-not-found, global success, scoped success, and scoped failure
config/read failure never breaks account/rate-limit/model-list reads
launch UI does not present a project-scoped catalog as verified unless config scope was actually checked

🎯 7 🛡️ 10 🧠 6
Estimated implementation impact: 180-420 lines

Uncertainty:

effective Codex config can include forced_login_method.
effective Codex config can include forced_chatgpt_workspace_id.
workspace/account policy can affect available models, rate limits, and whether ChatGPT subscription mode is valid.
previous Codex account work already had a real bug around forced login method, so this is not theoretical.

Decision:

auth scope must include forced login method and forced workspace identity when present
if UI-selected auth mode conflicts with forced_login_method, effective auth mode wins and UI must explain why
forced workspace id must be hashed before cache/log usage
rate-limit, account, and model catalog snapshots must be scoped together so workspace changes cannot reuse stale catalog

Auth scope correction:

export interface CodexCatalogAuthScope {
  preferredAuthMode: 'auto' | 'chatgpt' | 'api_key' | null;
  effectiveAuthMode: 'chatgpt' | 'api_key' | null;
  forcedLoginMethod: 'chatgpt' | 'api_key' | null;
  managedAccountHash: string | null;
  forcedWorkspaceHash: string | null;
  apiKeySource: string | null;
}

UX rules:

if user selected ChatGPT but config forces API key, show Codex config forces API key mode for this scope
if user selected API key but config forces ChatGPT, show Codex config forces ChatGPT account mode for this scope
if workspace id changes, show Codex workspace changed, refreshing subscription limits and model catalog
never show raw workspace id in UI unless Codex app-server provides a display name that is intended for users

Cache invalidation:

forced login method change invalidates both auth and catalog cache
forced workspace hash change invalidates ChatGPT-scoped rate limits and catalog
account logout clears all ChatGPT workspace-scoped entries

23. Model catalog file trust and local file changes

🎯 6 🛡️ 9 🧠 7
Estimated implementation impact: 220-520 lines

Uncertainty:

model_catalog_json can point to a local JSON file.
app-server resolves effective config, but our app may not know if that JSON file changed unless config fingerprint includes enough origin data.
project-scoped .codex/config.toml only applies for trusted projects, so a file can exist but not be active.

Decision:

treat model_catalog_json as part of effective config, not as a file we parse directly by default
if config/read.origins exposes enough origin/path data, hash only path and mtime for invalidation, not file contents
if origin/path data is unavailable, rely on manual refresh and short TTL
never read arbitrary model_catalog_json file contents into logs or diagnostics
do not apply project-scoped model catalog unless Codex effective config says the project is trusted and the catalog is active

Top 3 invalidation policies:

TTL/manual-refresh only: 🎯 7 🛡️ 6 🧠 2, 40-100 lines. Simple but stale after local file edits.
Hash effective config plus optional mtime for active catalog file: 🎯 8 🛡️ 9 🧠 5, 220-520 lines. Best balance without parsing arbitrary catalog files ourselves.
Parse and watch every possible catalog file: 🎯 5 🛡️ 7 🧠 8, 500-1000 lines. Too much responsibility and security surface for this feature.

Chosen policy: option 2.

Acceptance:

active model_catalog_json path change invalidates cache
active catalog file mtime change invalidates cache when path is available
inactive untrusted project .codex/config.toml does not affect the trusted/global catalog

Top 3 Implementation Options

1. Dedicated Codex model catalog feature - chosen

🎯 9 🛡️ 9 🧠 6
Estimated size: 1200-2400 lines

Core idea:

create src/features/codex-model-catalog
keep model catalog rules isolated from account UI, provider status plumbing, and Electron transport
reuse existing CodexAppServerSessionFactory
expose a small feature facade to provider status and renderer model picker
update orchestrator only where runtime status and launch effort transport require it

Why it wins:

best SOLID alignment
clean domain rules for model visibility, effort validation, fallback, and default selection
does not make codex-account responsible for model policy
least risk to Anthropic
easiest to test without full app startup

Main tradeoff:

needs small integration glue in existing provider status and team launch flows

2. Fold catalog into `codex-account`

🎯 7 🛡️ 7 🧠 5
Estimated size: 800-1600 lines

Core idea:

extend src/features/codex-account with model/list
use account snapshot as the only Codex control-plane snapshot
merge account, rate limits, and model catalog in one feature

Why it is tempting:

fewer new folders
account feature already owns app-server account/rate-limit reads
easier to fetch account plus model catalog in one app-server session

Why I do not recommend it:

model catalog is not account management
the feature becomes a broad Codex control-plane catch-all
future provider catalog work would have to pull model rules back out
more risk of account UI churn when only model picker changes are needed

3. Full provider model catalog for all providers now

🎯 7 🛡️ 8 🧠 9
Estimated size: 2500-4500 lines

Core idea:

build one provider-agnostic model catalog for Anthropic, Codex, Gemini, and future providers
move static renderer catalog policy into a shared feature
expose one rich contract for all provider model pickers

Why it is attractive:

cleanest long-term abstraction
one UI model for labels, availability, capabilities, and efforts
reduces future duplication

Why not now:

too much surface area while Codex runtime cutover is still fresh
Anthropic model behavior is already stable and should not be reworked for a Codex catalog issue
would delay the concrete Codex model release problem

Current Code Reality

`claude_team`

Existing app-server infrastructure:

src/main/services/infrastructure/codexAppServer/JsonRpcStdioClient.ts
src/main/services/infrastructure/codexAppServer/CodexAppServerSessionFactory.ts
src/main/services/infrastructure/codexAppServer/protocol.ts
src/features/codex-account/main/infrastructure/CodexAccountAppServerClient.ts

Current account client behavior:

readAccount() opens one app-server session.
readRateLimits() opens another app-server session.
logout() opens another app-server session.
no model/list protocol types exist yet.
CodexAppServerSessionFactory starts codex app-server with no explicit cwd or profile option.
app-server initialize response includes codexHome, but the current protocol types do not expose effective config or config fingerprint.

Current shared provider status:

CliProviderStatus.models is only string[].
CliProviderStatus.modelAvailability has per-model verification status but no rich model metadata.
renderer model selector can already prefer runtime-provided providerStatus.models.

Current effort type:

export type EffortLevel = 'low' | 'medium' | 'high';

Risk:

adding xhigh directly without provider-specific validation would let Anthropic UI accidentally offer unsupported choices.

Current persistence and non-dialog launch paths:

team metadata and member metadata normalize launch-derived provider/model/effort in multiple services.
backup/restore copies metadata but restore-time launch preview must still tolerate missing catalog metadata.
draft retry and launch prefill can reuse old localStorage state.
scheduled launch types can reference the shared effort type.

Risk:

updating only the visible launch dialogs would leave hidden paths that silently drop Codex-only efforts or relaunch with stale default semantics.

`agent_teams_orchestrator`

Current Codex model catalog:

src/utils/model/codex.ts
static CODEX_MODELS
static DEFAULT_CODEX_MODEL
isCodexModel() checks only static ids

Current runtime status:

getUnifiedRuntimeStatusPayload('codex') returns static Codex model ids.

Current CLI effort:

top-level --effort <level> currently accepts low | medium | high | max.
Codex native execution is ultimately codex exec --json.
installed codex exec --help shows no --effort flag.

Risk:

if we send --effort xhigh through current orchestrator, it fails before Codex can use it.
if we map Anthropic max to Codex xhigh, the semantics are wrong.
if we show xhigh in UI before the launch path supports it, the picker becomes misleading.

Target Architecture

Feature folder

src/features/codex-model-catalog/
  contracts/
    codexModelCatalog.dto.ts
    index.ts
  core/
    domain/
      codexModelCatalog.ts
      codexReasoningEffort.ts
      codexModelCatalogFallback.ts
      normalizeCodexAppServerModel.ts
    application/
      GetCodexModelCatalogUseCase.ts
      CodexModelCatalogPorts.ts
  main/
    composition/
      createCodexModelCatalogFeature.ts
    adapters/
      output/
        CodexAppServerModelCatalogSource.ts
        StaticCodexModelCatalogSource.ts
    infrastructure/
      CodexModelCatalogAppServerClient.ts
      InMemoryCodexModelCatalogCache.ts
  preload/
    index.ts
  renderer/
    adapters/
      codexModelCatalogViewModel.ts
    hooks/
      useCodexModelCatalog.ts
    ui/
      CodexModelEffortHint.tsx

Rules:

core/domain has all normalization and validation rules.
main/infrastructure is the only layer that knows JSON-RPC method names.
renderer never receives raw app-server rows.
app shell imports only public feature entrypoints.

App-server lifecycle

Use the existing CodexAppServerSessionFactory.

Request sequence:

Spawn codex app-server.
Send initialize with clientInfo and capabilities.
Send initialized.
Request model/list.
Drain or ignore notifications safely.
Close stdin and terminate the process on completion or timeout.

Recommended timeouts:

initialize: 6000ms
model/list: 4500ms
total model catalog read: 9000ms

Recommended pagination:

request limit: 100, includeHidden: false for normal UI
follow nextCursor until null
hard-stop after 5 pages to avoid runaway loops
log a degraded catalog warning if the hard-stop is hit

Single-session snapshot policy

Provider status currently risks multiple sequential app-server starts:

account read
rate limits read
future model list read

This caused slow provider loading in earlier UI work, so the plan should not add another app-server spawn in the hot path.

Preferred design:

keep codex-model-catalog as a separate feature for ownership
add an optional combined Codex control-plane read in composition
when provider status refresh needs account plus rate limits plus model catalog, use one app-server session and issue all three requests inside it
each sub-result has independent soft-failure state
total snapshot can be partially healthy

Snapshot shape:

export interface CodexControlPlaneSnapshot {
  binary: {
    path: string;
    version: string | null;
  };
  account: CodexAccountSnapshotResult;
  rateLimits: CodexRateLimitsSnapshotResult;
  modelCatalog: CodexModelCatalogSnapshotResult;
  configScope: {
    cwd: string | null;
    profileName: string | null;
    projectTrust: 'trusted' | 'untrusted' | 'unknown';
    configReadSupport: CodexConfigReadSupport;
    effectiveConfigFingerprint: string | null;
    launchOverridesFingerprint: string | null;
    activeModelCatalogFileFingerprint: string | null;
  };
  initialize: {
    codexHome: string;
    platformFamily: string;
    platformOs: string;
  };
  fetchedAt: string;
}

Soft-failure rules:

account failure must not erase a fresh cached model catalog
model catalog failure must not mark ChatGPT account disconnected
rate-limit failure must not hide model picker options
if app-server initialize fails, all three sub-results are degraded from the same root cause

Required correction to the existing account flow:

current CodexAccountAppServerClient.readAccount() and readRateLimits() each open their own app-server process
adding a third standalone readModelCatalog() would be a Provider Settings latency regression
implement a combined app-server read path before wiring catalog into provider refresh
keep separate methods for mutations and focused tests, but use the combined path for normal status refresh
enrich JsonRpcStdioClient errors before catalog integration so the combined reader can classify model/list method failures without losing account truth

Recommended application service shape:

export interface CodexControlPlaneReader {
  readSnapshot(options: CodexControlPlaneReadOptions): Promise<CodexControlPlaneSnapshot>;
}

This can live in codex-model-catalog composition or in a small shared Codex control-plane composition module. Do not put model normalization inside codex-account.

Read scope:

Provider Settings global refresh can pass cwd=null.
Create/Launch dialogs should pass the selected absolute cwd.
Relaunch/restore should pass the team's persisted project path.
Scheduled launch validation should pass schedule.launchConfig.cwd.
If a future UI supports Codex profile selection, the same profile must be passed to preview and launch.

Contracts

App-server protocol types

Add protocol DTOs to src/main/services/infrastructure/codexAppServer/protocol.ts:

export type CodexAppServerReasoningEffort =
  | 'none'
  | 'minimal'
  | 'low'
  | 'medium'
  | 'high'
  | 'xhigh';

export interface CodexAppServerReasoningEffortOption {
  reasoningEffort: CodexAppServerReasoningEffort;
  description?: string | null;
}

export type CodexAppServerInputModality = 'text' | 'image' | string;

export interface CodexAppServerModel {
  id: string;
  model: string;
  displayName: string;
  description?: string | null;
  hidden: boolean;
  supportedReasoningEfforts: CodexAppServerReasoningEffortOption[];
  defaultReasoningEffort: CodexAppServerReasoningEffort;
  inputModalities?: CodexAppServerInputModality[] | null;
  supportsPersonality?: boolean | null;
  isDefault: boolean;
  upgrade?: string | null;
  upgradeInfo?: unknown;
  availabilityNux?: unknown;
}

export interface CodexAppServerModelListParams {
  cursor?: string | null;
  limit?: number | null;
  includeHidden?: boolean | null;
}

export interface CodexAppServerModelListResponse {
  data: CodexAppServerModel[];
  nextCursor: string | null;
}

export interface CodexAppServerConfigReadParams {
  cwd?: string | null;
  profile?: string | null;
}

export interface CodexAppServerConfigReadResponse {
  config: Record<string, unknown>;
  origins: Record<string, unknown>;
}

config/read caller rule:

always pass a params object, even when empty
call global config as config/read with {}
call project scope as config/read with { cwd }
call profile scope as config/read with { profile }
if both cwd and profile are needed, test { cwd, profile } in Phase 1 and record the behavior before enabling profile-aware UI

Domain model

Use separate ids:

catalogId: app-server id, stable identity for React keys, telemetry, and dedupe
launchModel: app-server model when non-empty, otherwise id

Reason:

local probe currently returned equal values, but official schema exposes both fields, so they can diverge later.
using id for launch would be a latent bug if Codex introduces a display/catalog alias.

export interface CodexCatalogModel {
  catalogId: string;
  launchModel: string;
  displayName: string;
  description: string | null;
  hidden: boolean;
  isDefault: boolean;
  supportedReasoningEfforts: CodexReasoningEffort[];
  defaultReasoningEffort: CodexReasoningEffort | null;
  inputModalities: CodexInputModality[];
  supportsPersonality: boolean;
  upgrade: string | null;
  source: 'app-server' | 'static-fallback';
}

Normalization rules:

reject rows without a usable id
derive launchModel from model || id
default missing inputModalities to ['text', 'image'] for older catalogs
default missing supportsPersonality to false
accept documented supportedReasoningEfforts objects with reasoningEffort
defensively accept string effort entries in tests, because older generated local types and live clients can drift
drop duplicate catalogId rows after the first visible row
drop duplicate launchModel rows after the first visible row unless a hidden row is the only available row
keep unknown effort strings out of the selectable UI, but preserve them in diagnostics
if no model is marked isDefault, choose static fallback default only as degraded fallback and label it as such

Provider status contract

Add an optional rich catalog to CliProviderStatus:

export interface CliProviderModelCatalog {
  schemaVersion: 1;
  source: 'app-server' | 'static-fallback' | 'unavailable';
  status: 'ready' | 'stale' | 'degraded' | 'unavailable';
  fetchedAt: string | null;
  staleAt: string | null;
  binary?: {
    path: string | null;
    version: string | null;
  };
  authScope?: {
    preferredAuthMode: 'auto' | 'chatgpt' | 'api_key' | null;
    effectiveAuthMode: 'chatgpt' | 'api_key' | null;
    forcedLoginMethod?: 'chatgpt' | 'api_key' | null;
    managedAccountHash?: string | null;
    forcedWorkspaceHash?: string | null;
    apiKeySource?: string | null;
  };
  launchScope?: {
    cwd: string | null;
    profileName: string | null;
    projectTrust: 'trusted' | 'untrusted' | 'unknown';
    configFingerprint: string | null;
    launchOverridesFingerprint: string | null;
  };
  errorMessage?: string | null;
  defaultModelId?: string | null;
  defaultLaunchModel?: string | null;
  models: CliProviderModelInfo[];
}

export interface CliProviderModelInfo {
  catalogId: string;
  launchModel: string;
  displayName: string;
  description?: string | null;
  hidden?: boolean;
  isDefault?: boolean;
  supportedReasoningEfforts?: CliProviderReasoningEffort[];
  defaultReasoningEffort?: CliProviderReasoningEffort | null;
  inputModalities?: string[];
  supportsPersonality?: boolean;
  upgrade?: string | null;
}

export type CliProviderReasoningEffort =
  | 'none'
  | 'minimal'
  | 'low'
  | 'medium'
  | 'high'
  | 'xhigh'
  | 'max';

export interface CliProviderRuntimeCapabilities {
  schemaVersion: 1;
  codex?: {
    supportsDynamicAppServerModels: boolean;
    supportsCodexReasoningEffortConfig: boolean;
    supportedCodexReasoningEfforts: Array<'minimal' | 'low' | 'medium' | 'high' | 'xhigh'>;
    acceptsProviderExplicitFutureModels: boolean;
  };
}

Backwards compatibility:

keep CliProviderStatus.models: string[]
add CliProviderStatus.runtimeCapabilities?: CliProviderRuntimeCapabilities
for Codex, derive models from modelCatalog.models.map(model => model.launchModel)
for Anthropic and Gemini, do not require modelCatalog
old renderers continue to work from models
new renderers prefer modelCatalog when present
never put team-agent disabled policy directly into CliProviderModelCatalog; catalog describes Codex availability, while Agent Teams policy is applied by renderer and launch validators
never infer launch capability only from catalog presence

Renderer integration hotspot:

update TeamModelRuntimeProviderStatus in src/renderer/utils/teamModelAvailability.ts to include modelCatalog
update getRuntimeSelectorModels() to use modelCatalog.models[*].launchModel for Codex
update getAvailableTeamProviderModelOptions() to map rich Codex options with display labels, default badge, and catalog diagnostics
keep Anthropic path on getFallbackTeamProviderModelOptions()
keep Gemini path on existing models: string[] until Gemini has a richer catalog

Team launch effort contract

Do not add a separate per-provider lane for this feature.

Use existing team-level model/provider selection, but make effort provider-aware.

Recommended implementation:

keep persisted field name effort
widen internal effort type to ProviderReasoningEffort
add provider/model validators at every launch boundary
Anthropic UI only shows low | medium | high
Codex UI shows only the selected model's supportedReasoningEfforts
orchestrator accepts minimal | low | medium | high | xhigh for Codex and low | medium | high | max for Anthropic paths

Existing validator hotspots:

src/shared/types/team.ts currently defines EffortLevel = 'low' | 'medium' | 'high'
src/main/ipc/teams.ts currently validates only low | medium | high
src/main/http/teams.ts currently validates only low | medium | high
src/renderer/components/team/dialogs/EffortLevelSelector.tsx currently hardcodes only Default | Low | Medium | High
LaunchTeamDialog, CreateTeamDialog, member draft rows, and member editor utilities currently cast strings with as EffortLevel

Required migration:

replace unsafe as EffortLevel casts with a provider-aware normalization function
parse provider before parsing effort in IPC and HTTP paths
validate lead effort against lead provider/model
validate member effort against each member's resolved provider/model
keep old persisted low | medium | high values readable without migration

Validation rule:

provider=codex:
  effort must be in selectedModel.supportedReasoningEfforts

provider=anthropic:
  effort must be low | medium | high

provider=gemini:
  keep current behavior unless Gemini gets a richer effort contract

Important:

do not map Anthropic max to Codex xhigh
do not map Codex xhigh to Anthropic max
if selected Codex model changes and old effort is unsupported, reset to the new model's defaultReasoningEffort
if catalog is unavailable, only allow static fallback efforts that are proven launchable

Launch identity rule:

effort is user selection
resolvedEffort is what launch sends to runtime
if user selection is empty/default, resolvedEffort comes from app-server defaultReasoningEffort
if resolved effort equals app-server default, runtime transport may omit model_reasoning_effort, but exact logs still record the resolved value

Runtime Launch Transport

This was the highest-risk area in the earlier plan. The corrected plan is explicit.

Facts:

codex exec has --model.
codex exec has -c, --config key=value.
codex exec has no documented --effort.
Codex config has model_reasoning_effort.
model_reasoning_effort supports minimal | low | medium | high | xhigh.

Therefore:

Codex native launch must not pass --effort xhigh to Codex CLI.
Orchestrator may keep accepting --effort as its public Agent Teams flag.
When provider is Codex native, orchestrator must translate accepted effort into codex exec -c model_reasoning_effort="value".
When no effort is selected, omit model_reasoning_effort and let Codex use its model default.
When effort equals the selected model's app-server default, either omit it or pass it consistently, but pick one policy and test it.

Recommended policy:

omit effort when it equals app-server defaultReasoningEffort
pass effort only when user explicitly selected a non-default value

Reason:

this tracks Codex defaults as Codex evolves
exact logs remain cleaner
future app-server default changes are not blocked by stale persisted values

Live signoff command shape:

codex exec --json --model gpt-5.4 -c model_reasoning_effort='"xhigh"' --skip-git-repo-check --ephemeral "Return only: ok"

Quoting requirement:

command builder must pass -c and model_reasoning_effort="xhigh" as separate argv entries
shell-rendered exact logs can show -c model_reasoning_effort='"xhigh"'
tests should assert argv arrays, not only shell strings
never concatenate user-controlled effort into a shell string without argv escaping

Prelaunch validation must block:

gpt-5.1-codex-mini with low
gpt-5.1-codex-mini with xhigh
unknown effort strings from app-server until explicitly supported by our UI and orchestrator type

Static Fallback

Fallback stays necessary because:

user may have an older Codex binary
app-server may fail to initialize
app-server may start but not support model/list
offline usage should not make the entire model picker empty
tests should not depend on live Codex availability

Fallback rules:

fallback source is explicitly marked static-fallback
fallback never claims to be current
fallback has a short visible warning in Provider Settings only when user is choosing Codex models
fallback model list should be minimal and conservative
fallback must not include newly guessed future models
fallback caused by missing model/list should include an upgrade hint tied to the detected Codex binary version when available

Recommended fallback models:

gpt-5.4
gpt-5.4-mini
gpt-5.3-codex
gpt-5.2
gpt-5.1-codex-mini

Fallback effort rules:

use medium | high for gpt-5.1-codex-mini
use low | medium | high | xhigh for known models only if live signoff confirms model_reasoning_effort pass-through
otherwise fallback UI can show richer metadata but disable non-launchable options

API-key mode note:

do not use OpenAI /v1/models as the primary Codex picker for subscription-backed Codex
optional API /v1/models fallback is allowed only for explicit API-key mode diagnostics
if API /v1/models disagrees with Codex app-server model/list, Codex app-server wins for native Codex execution
reason: the actual runtime surface is codex exec, and app-server describes what Codex clients should show

Cache And Refresh

Goal:

make model updates feel fresh without making Provider Settings slow or flaky.

Main-process cache:

key: Codex binary path plus Codex binary version plus Codex home plus launch cwd/profile/config fingerprint plus preferred auth mode plus effective auth mode plus managed account hash plus API-key source
success TTL: 10 minutes
stale TTL: 24 hours
in-flight dedupe: one live model/list request per key
manual refresh bypasses success TTL but still dedupes in-flight work
auth mode change invalidates the ready cache for UI selection purposes
forced_login_method and forced workspace changes invalidate the affected auth/catalog scope
logout clears ChatGPT-scoped catalog cache
API key source change clears API-key-scoped catalog cache
project .codex/config.toml, global config.toml, or model_catalog_json changes clear the affected scope when detected by fingerprint change
binary path or version change clears all Codex model catalog cache entries

Renderer cache:

consume CliProviderStatus.modelCatalog
no independent polling loop in the model picker
refresh through existing provider status refresh action

Dashboard policy:

do not run model/list on every dashboard render
use existing provider status refresh cadence
model catalog stale state can be shown only inside settings/model picker, not as a scary dashboard error
dashboard catalog is a global/default-scope summary, not a promise that every project cwd has the same catalog

Provider Settings policy:

open dialog with cached provider status immediately
refresh in background
show Checking... only for the area still being refreshed
never replace a ready catalog with empty state during a refresh

Avoid this bug:

do not set global provider status to unavailable while only the model catalog refresh is pending
do not replace a ChatGPT-ready account state with a catalog timeout
do not show generic Unknown error; preserve app-server method, timeout, and fallback source in diagnostics
if auto resolves to ChatGPT, API-key detection copy stays secondary
if auto resolves to API key because ChatGPT is unavailable, show why ChatGPT was skipped before showing API-key catalog

UI Behavior

Model picker

When provider=codex:

prefer providerStatus.modelCatalog.models
option value is launchModel
React key can use catalogId
label uses displayName
default badge uses isDefault
hidden app-server models are excluded from normal selector unless already persisted in a team
disabled state uses existing Agent Teams policy plus app-server upgrade hints
runtime-capability state controls whether a visible model is launchable
fallback badge says Using fallback catalog only when source is fallback
if app-server says a model is available but Agent Teams disables it, show Available in Codex, disabled for Agent Teams
if app-server says a future model exists but runtime capability is missing, show Available in Codex, waiting for Agent Teams runtime support
if a persisted model is missing from current catalog, show it as Unavailable in current Codex catalog and require user confirmation before relaunch
if the dialog has a selected cwd and only a global catalog is available, show global options as provisional until project-scoped catalog finishes
if project-scoped catalog differs from global catalog, keep the user's explicit selection only if it exists in the project-scoped catalog or is a preserved persisted value

When catalog is loading:

keep previous options visible
show a subtle "Refreshing models" state
do not show an empty Codex picker unless no cached or fallback models exist
label provisional global catalog rows as Checking this project... when launch cwd is known

When catalog fails:

use stale cache if present
otherwise use static fallback
show the app-server error in diagnostics, not as a generic unknown error

Effort selector

When provider=codex and selected model has catalog metadata:

show efforts from supportedReasoningEfforts
mark defaultReasoningEffort as default
include xhigh if returned by app-server and runtime capability says Codex effort config pass-through is supported
if runtime capability is missing, show Codex-only efforts as metadata or disabled rows, not selectable launch values
if selected effort is no longer valid, reset to default with a small explanation
if model is Agent Teams-disabled, keep effort selector read-only or disabled to avoid suggesting launchability

When selected model has no catalog metadata:

show only safe fallback efforts
do not show xhigh unless launch pass-through is implemented and tested

When provider=anthropic:

keep current selector behavior
do not show Codex-only minimal, none, or xhigh
do not change Anthropic copy

Default model

Recommended behavior:

app-server isDefault defines the Codex default in UI
"Default" label can render as Default (gpt-5.4) or Default (GPT-5.4) when catalog is ready
new Codex teams can display Default, but launch must resolve it to a concrete resolvedLaunchModel
existing teams keep their persisted model unless user changes it
do not rewrite old team metadata just because app-server default changed
exact logs and team metadata should record both selected Default and concrete resolved model

Reason:

new teams benefit from current Codex defaults
existing teams remain explainable even if Codex default changes later

Orchestrator Changes

Model status

Short term:

keep static CODEX_MODELS for standalone fallback and non-app UI compatibility
add richer status only if orchestrator can read app-server directly without slowing CLI startup

Recommended first cut:

claude_team owns app-server model catalog for UI
orchestrator keeps static runtime status until a dedicated orchestrator catalog source is added
launch validation accepts provider-explicit Codex model strings even if not in static CODEX_MODELS
orchestrator exposes runtime capabilities for dynamic Codex model ids and Codex reasoning effort config pass-through

Reason:

UI is where the dynamic picker is needed immediately
orchestrator should not reject a future model that Codex app-server already exposed and claude_team selected
UI should not guess whether the current runtime can launch that future model

Validation

Update validation so:

provider-explicit codex launches can use model strings from app-server catalog
unknown model strings are not guessed as Codex without provider context
static isCodexModel() remains valid for generic detection, not authoritative for provider-explicit launches
if provider context is missing, keep existing conservative static validation

Effort transport

Update orchestrator:

accept Codex efforts minimal | low | medium | high | xhigh
preserve Anthropic max
in Codex native executor, convert Codex effort to -c model_reasoning_effort='"value"'
do not pass unsupported effort values to codex exec
exact logs should show the selected effort as normalized Agent Teams metadata and the actual Codex config override

Required tests:

Codex native xhigh becomes -c model_reasoning_effort='"xhigh"'
no effort omits model_reasoning_effort
Anthropic max remains Anthropic-only
Codex max is rejected
Anthropic xhigh is rejected

Concrete Implementation Touchpoints

claude_team:

src/main/services/infrastructure/codexAppServer/protocol.ts - add app-server model DTOs
src/main/services/infrastructure/codexAppServer/JsonRpcStdioClient.ts - preserve JSON-RPC error code, method, and details
src/main/services/infrastructure/codexAppServer/CodexBinaryResolver.ts or a nearby service - expose binary version for cache invalidation
src/features/codex-model-catalog - new feature for catalog domain, use case, app-server source, fallback source, and cache
src/features/codex-account/main/composition/createCodexAccountFeature.ts - coordinate combined control-plane snapshot or delegate to shared reader
src/features/codex-account/renderer/mergeCodexProviderStatusWithSnapshot.ts - preserve account truth while merging model catalog truth
src/shared/types/cliInstaller.ts - add optional provider model catalog
src/shared/types/team.ts - widen provider-aware effort types without breaking old persisted values
src/shared/types/schedule.ts - prevent scheduled launches from dropping Codex-specific efforts
src/main/services/team/TeamDataService.ts - preserve provider-aware effort and launch identity when reconstructing team state
src/main/services/team/TeamMembersMetaStore.ts - stop filtering Codex efforts down to legacy low | medium | high
src/main/services/team/TeamBackupService.ts and restore paths - preserve additive launch identity and tolerate old backups
src/main/services/runtime/CliProviderModelAvailabilityService.ts - keep runtime verification compatible with launchModel values and do not verify hidden/catalog-only rows by accident
src/main/ipc/teams.ts and src/main/http/teams.ts - parse provider first, then validate effort
src/renderer/utils/teamModelAvailability.ts - consume rich Codex catalog
src/renderer/utils/teamModelCatalog.ts - demote Codex static list to fallback and labels only
src/renderer/components/team/dialogs/EffortLevelSelector.tsx - make options provider/model-aware
src/renderer/components/team/dialogs/LaunchTeamDialog.tsx and CreateTeamDialog.tsx - remove unsafe effort casts and persist resolved launch identity
member draft/editor components - validate per-member resolved provider/model/effort
renderer launch prefill and draft retry storage - add a versioned launch identity payload and tolerate old entries

agent_teams_orchestrator:

src/entrypoints/sdk/runtimeTypes.ts - add provider-aware Codex effort support
src/main.tsx - update --effort parser or provider-specific validation path
src/utils/effort.ts and src/utils/providerEffort.ts - separate Anthropic max from Codex xhigh
Codex native executor path - convert effort to -c model_reasoning_effort
src/utils/model/codex.ts - rename static list semantics to fallback/static detection
src/utils/model/validateModel.ts - allow provider-explicit Codex app-catalog models
runtime status/capability endpoint - expose dynamic Codex model and effort pass-through support
exact-log/runtime status code - record selected model, resolved model, selected effort, resolved effort, and config override

Phased Implementation

Phase 0 - contracts and live spike

Commit boundary: docs(codex): plan app-server model catalog

Tasks:

add this plan
keep live probe output in signoff notes or test fixture
confirm installed Codex supports model/list
confirm one app-server session can read account, rate limits, and model catalog
confirm docs support model_reasoning_effort
decide exact shell quoting for -c model_reasoning_effort
capture fixtures for at least two catalog shapes: current live shape and synthetic id !== model
capture current Codex binary version and document cache invalidation expectations

Acceptance:

plan exists in the dedicated worktree
no code behavior changes
weak areas are explicitly called out

Phase 1 - app-server model catalog feature

Commit boundary: feat(codex): add app-server model catalog source

Tasks:

add structured JSON-RPC request errors with method/code/details
expose or probe Codex binary version for catalog cache keys
add effective config fingerprint support using app-server config/read when available
add config/read support detection and always send {} params at minimum
add src/features/codex-model-catalog
add app-server protocol types
add CodexModelCatalogAppServerClient
add normalization domain rules
add static fallback source
add in-memory cache with TTL and in-flight dedupe
include launch scope fields in cache keys: cwd, profile, trust, config fingerprint, launch override fingerprint
include forced login method and forced workspace hash in auth-scoped cache keys
normalize both documented effort option objects and defensive string effort values
classify method not found, timeout, malformed response, and empty catalog separately
add structured diagnostics without raw account email or secret-bearing env values
expose feature facade from main composition

Acceptance:

JSON-RPC method not found can be detected in tests
binary version changes invalidate catalog cache
config fingerprint changes invalidate catalog cache for that scope
forced login/workspace changes invalidate account, limits, and catalog cache for that scope
unit tests cover normalization, fallback, pagination, duplicate ids, missing modalities, unknown effort strings, and id !== model
app-server client tests cover model/list request params and timeout labels
method-not-found falls back without marking account disconnected
diagnostics include source, status, method, error category, binary version, effective auth mode, and cache age
no renderer behavior changes yet

Phase 2 - provider status integration

Commit boundary: feat(runtime): expose codex model catalog metadata

Tasks:

add optional modelCatalog to CliProviderStatus
add optional runtimeCapabilities to CliProviderStatus
merge Codex model catalog into provider status
keep models: string[] derived from launchModel
make provider refresh use cached, auth-scoped catalog
implement combined account/rate-limits/catalog app-server read for normal refresh
avoid extra app-server session in hot paths where account snapshot already refreshes
clear ChatGPT-scoped catalog on logout and API-key-scoped catalog when API key source changes
clear all catalog entries when Codex binary path or version changes
ensure auto catalog scope follows effective launch auth mode, not just configured preference
add request/snapshot versioning so stale refresh responses cannot overwrite newer auth state
support global provider refresh and project-scoped launch refresh as different catalog scopes
preserve Anthropic provider status shape

Acceptance:

Codex provider status includes modelCatalog
Codex provider status includes runtime capability metadata when available
old models still works
auto with ChatGPT ready uses ChatGPT-scoped catalog even if API key is detected
auto with ChatGPT unavailable and API key ready uses API-key-scoped catalog with clear degraded copy
forced login method overrides are reflected in effective auth copy and cache scope
one normal Codex provider refresh does not spawn separate app-server processes for account, limits, and catalog
Anthropic snapshots are byte-for-byte equivalent except ordering noise already present
provider dashboard does not block on a slow catalog refresh when stale cache exists
older refresh results are ignored after auth mode or runtime capability changes
global dashboard catalog and project launch catalog do not overwrite each other

Phase 3 - dynamic UI model picker and effort selector

Commit boundary: feat(codex): use dynamic model catalog in team launch UI

Tasks:

update Codex model picker to prefer rich catalog
show app-server labels, default badge, and fallback source state
update effort selector to be provider/model-aware
show xhigh metadata only for Codex models that return it
make xhigh selectable only when runtime capability says Codex effort config pass-through is supported
hide Codex-only efforts for Anthropic
reset invalid effort on model change
preserve missing persisted models as visible warning rows instead of silently clearing selection
keep Agent Teams disabled policy separate from Codex app-server availability
show future app-server models immediately, with New from Codex catalog status when policy has not verified them yet
when cwd is selected, refresh project-scoped Codex catalog before enabling launch-only controls

Acceptance:

gpt-5.1-codex-mini shows only medium | high
gpt-5.3-codex-spark defaults to high
gpt-5.4 shows low | medium | high | xhigh as catalog metadata
xhigh is disabled with runtime-upgrade copy until capability support is present
app-server-visible but Agent Teams-disabled model shows disabled copy, not unavailable copy
synthetic future gpt-5.5 fixture appears without touching static catalog
persisted model missing from current catalog is visible with a warning
Anthropic UI remains low | medium | high
static fallback still renders when app-server is unavailable
global catalog can be displayed provisionally, but launch enablement waits for project-scoped catalog or explicit degraded confirmation

Phase 4 - launch validation and Codex effort pass-through

Commit boundary: feat(runtime): pass codex reasoning effort through native exec

Tasks:

widen team launch effort validation with provider-specific rules
update IPC and HTTP validators
update TeamProvisioningService request shaping
persist additive ProviderModelLaunchIdentity into team metadata, exact-log metadata, and backup/restore payloads where launch identity is reconstructed
update orchestrator parser and runtime types
expose orchestrator runtime capability metadata for dynamic Codex models and Codex effort config
translate Codex effort to argv entries ['-c', 'model_reasoning_effort="value"']
keep Anthropic max separate
add exact-log metadata for selected model, resolved launch model, catalog source, selected effort, and resolved effort
resolve Default to concrete launch model before provisioning
update scheduled/provisioned launch paths or block Codex-only efforts in those paths until updated
enforce built-in OpenAI Codex provider scope or block custom/OSS provider configs with clear copy
pass profile/cwd/config overrides consistently between preview and codex exec

Acceptance:

Codex xhigh launch reaches codex exec as model_reasoning_effort
Codex max is rejected before launch
Anthropic xhigh is rejected before launch
unsupported model-effort pairs are blocked before provisioning
provider-explicit synthetic future model is accepted only when runtime capability says dynamic Codex models are supported
member metadata, team metadata, draft retry, and backup/restore preserve provider-aware effort
replay/exact logs show what was selected, what default resolved to, and what was passed to Codex
exact logs include catalog scope fingerprint and provider scope, but not raw config values

Phase 5 - cleanup and fallback tightening

Commit boundary: refactor(codex): demote static model catalog to fallback

Tasks:

rename static Codex catalog helpers to make fallback status explicit
remove UI assumptions that static list is authoritative
make future provider-explicit Codex ids launchable when selected from app-server catalog
add diagnostics for catalog source and staleness
document fallback behavior
add a fixture/test with synthetic future model gpt-5.5
remove any remaining hardcoded Codex model order from the primary Codex UI path
add hidden-model fixture and upgrade-suggestion fixture
add one migration test for old localStorage launch prefill without provider model launch identity
add project-scoped catalog fixture with model_catalog_json
add custom-provider config fixture
add forced login method and forced workspace fixtures
add config/read method-missing and invalid-params fixtures

Acceptance:

new app-server model can appear in UI without code changes
static fallback is visible as fallback in diagnostics
no code path treats static CODEX_MODELS as the only valid Codex provider model list
synthetic gpt-5.5 appears through app-server fixture and can be selected without touching static catalog
hidden persisted model is preserved with warning and is not introduced into new-team picker
project-scoped catalog differences are visible and do not corrupt global provider status
forced login method changes are visible and do not reuse stale catalog/rate-limit scope

Test Plan

`claude_team` unit tests

Add tests for:

structured JSON-RPC error classification
binary version cache invalidation
effective config fingerprint cache invalidation
config/read support detection, including invalid missing params
project-scoped model_catalog_json fixture
app-server model normalization
id vs model split
default model selection
per-model effort options
unknown effort filtering
auth-scoped catalog cache keys
auto auth resolving to ChatGPT vs API-key catalog scope
combined app-server snapshot partial failures
method-not-found fallback for older Codex app-server
fallback catalog source
stale cache behavior
stale refresh response is ignored after newer auth-scope request
global catalog and project-scoped catalog use separate cache entries
forced login method and forced workspace hash use separate cache entries
custom/OSS model_provider config is blocked or marked unsupported for Agent Teams Codex
raw managed account email does not appear in catalog diagnostics or exact-log metadata
provider status models compatibility
provider status runtime capabilities compatibility
provider model availability uses launchModel, not catalogId
renderer model picker with rich catalog
renderer effort selector with Codex and Anthropic providers
renderer disables Codex-only efforts when runtime capability is missing
renderer shows synthetic future model as New from Codex catalog
renderer preserves hidden persisted model after includeHidden: true recovery
persisted missing model warning row
Agent Teams disabled policy overlay for app-server-visible models
backup/restore reads old metadata and preserves new launch identity when present
draft retry and launch prefill read old localStorage entries without dropping provider/model identity
scheduled launch validation either supports Codex-specific effort or blocks it with explicit error
launch preview with selected cwd does not enable launch from global-only catalog when project-scoped catalog is still unknown

Suggested commands:

pnpm vitest run \
  test/features/codex-model-catalog \
  test/features/codex-account \
  test/renderer/components/team \
  test/renderer/utils/teamModelCatalog.test.ts

`agent_teams_orchestrator` tests

Add tests for:

provider-explicit Codex model validation
Codex effort parser accepts minimal | low | medium | high | xhigh
Anthropic effort parser keeps existing behavior
Codex native executor emits -c model_reasoning_effort
Codex native executor builds argv entries, not unsafe shell concatenation
no effort omits Codex effort config
max is not accepted for Codex
synthetic gpt-5.5 passes when provider is explicitly Codex and model came from app catalog
capability payload reports dynamic Codex model support and effort config support
provider-explicit future model fails closed when capability is disabled
Codex native exec argv includes cwd/profile/config override semantics that match preview scope
custom provider config is not silently routed through subscription Codex UX

Suggested command:

pnpm test -- runtimeBackends providerEffort spawnMultiAgent codex

live smoke

Run only when developer has Codex login/API available:

codex app-server

JSON-RPC smoke:

{ "jsonrpc": "2.0", "id": 1, "method": "model/list", "params": { "limit": 20, "includeHidden": false } }

Native exec effort smoke:

codex exec --json --model gpt-5.4 -c model_reasoning_effort='"xhigh"' --skip-git-repo-check --ephemeral "Return only: ok"

Failure smoke:

codex exec --json --model gpt-5.1-codex-mini -c model_reasoning_effort='"xhigh"' --skip-git-repo-check --ephemeral "Return only: ok"

Expected:

our app should block the second case before launch once catalog metadata is available
if run manually, Codex may return model/provider-specific error, but product UX should not rely on that late failure

Risks And Mitigations

Risk 1 - app-server startup slows provider settings

🎯 8 🛡️ 8 🧠 5

Mitigation:

cache model catalog in main process
dedupe in-flight refreshes
use stale cache while refreshing
combine account/rate-limit/catalog reads where possible
never clear ready UI while refresh is pending

Risk 2 - effort values leak into Anthropic

🎯 9 🛡️ 9 🧠 4

Mitigation:

provider-specific effort validation
renderer selector branches by provider and selected model
tests for Anthropic not showing xhigh, minimal, or none
orchestrator rejects invalid provider-effort pairs

Risk 3 - `id` and `model` diverge later

🎯 8 🛡️ 9 🧠 3

Mitigation:

use catalogId for identity
use launchModel for runtime
tests with fixture where id !== model

Risk 4 - app-server catalog has unknown fields or new efforts

🎯 8 🛡️ 8 🧠 5

Mitigation:

tolerant protocol DTOs
unknown efforts preserved in diagnostics but not selectable
add one small allow-list update when product intentionally supports a new effort
no hard crash on unknown inputModalities

Risk 5 - static fallback becomes accidentally authoritative again

🎯 7 🛡️ 8 🧠 4

Mitigation:

name fallback helpers clearly
include source in model catalog
tests assert app-server source wins over fallback
UI diagnostics expose fallback source

Risk 6 - launch path accepts model from UI but orchestrator rejects it

🎯 8 🛡️ 8 🧠 6

Mitigation:

provider-explicit Codex launch validation should trust provider=codex plus app-server-selected model
static isCodexModel() remains only a generic detector
exact tests with a future-model fixture like gpt-5.5

Risk 7 - auth-scoped catalog leaks between modes

🎯 7 🛡️ 9 🧠 6

Mitigation:

include auth scope in catalog cache key
clear scoped cache on logout and API-key source changes
tests for ChatGPT catalog not being reused in API-key mode
UI labels catalog source and auth scope in diagnostics

Risk 8 - Default becomes nondeterministic across relaunch

🎯 8 🛡️ 9 🧠 6

Mitigation:

persist selected model kind and resolved launch model in launch identity
exact logs record both Default and concrete model
relaunch preview shows current default resolution before launch
do not silently rewrite old explicit models

Risk 9 - older Codex binary lacks `model/list`

🎯 7 🛡️ 9 🧠 5

Mitigation:

preserve JSON-RPC error code and method
classify method-not-found separately from app-server failure
show static fallback with Codex upgrade hint
cache key includes binary version so upgrades refresh the catalog

Risk 10 - `auto` auth shows the wrong catalog

🎯 7 🛡️ 9 🧠 6

Mitigation:

resolve effective auth mode before catalog scope
keep ChatGPT and API-key catalogs separate
UI copy distinguishes selected preference, effective launch mode, and fallback credentials
tests cover ChatGPT-ready + API-key-present and ChatGPT-missing + API-key-ready cases

Risk 11 - UI enables a capability the installed runtime cannot launch

🎯 7 🛡️ 10 🧠 7

Mitigation:

add explicit runtime capability metadata
display catalog metadata separately from launch enablement
fail closed when capability is missing or stale
test Phase 3 UI against a pre-Phase-4 runtime fixture

Risk 12 - future models appear but break team-agent behavior

🎯 8 🛡️ 8 🧠 6

Mitigation:

split Codex catalog availability from Agent Teams policy status
show new models as New from Codex catalog
block only hard incompatibilities: runtime capability missing, unsupported modality, disabled policy, unsupported effort
exact logs record new-model status for later debugging

Risk 13 - hidden or upgraded persisted models are silently lost

🎯 8 🛡️ 9 🧠 5

Mitigation:

run one includeHidden: true lookup for persisted explicit models missing from visible catalog
preserve model value during restore and relaunch preview
show upgrade suggestions without auto-rewriting metadata
test hidden-model and upgrade fixtures

Risk 14 - non-dialog launch path drops Codex effort

🎯 7 🛡️ 9 🧠 7

Mitigation:

audit team metadata, members metadata, backup/restore, draft retry, launch prefill, and schedule types
parse provider before parsing effort at every main-process boundary
block Codex-only effort in any path not updated in the same phase
add tests outside React launch dialogs

Risk 15 - HMR or slow refresh overwrites correct provider state

🎯 8 🛡️ 9 🧠 5

Mitigation:

add request/snapshot versioning
ignore out-of-order provider status responses
do not let catalog failures overwrite account truth
keep last ready state visible while a refresh is pending

Risk 16 - global catalog preview differs from project launch catalog

🎯 6 🛡️ 10 🧠 8

Mitigation:

include cwd, profile, trust, config fingerprint, and launch override fingerprint in catalog scope
use app-server config/read when available to derive effective config
keep dashboard/global catalog separate from launch/project catalog
require project-scoped catalog before enabling launch-only controls when cwd is known

Risk 17 - custom or OSS Codex config is mistaken for subscription Codex

🎯 7 🛡️ 9 🧠 7

Mitigation:

keep Agent Teams Codex scoped to built-in OpenAI Codex provider
detect effective model_provider when possible
block or degrade custom/OSS provider configs with explicit copy
do not show ChatGPT account limits for custom provider execution

Risk 18 - non-text model row appears in catalog

🎯 8 🛡️ 9 🧠 4

Mitigation:

require text input modality for Agent Teams launch
treat missing inputModalities with the documented backward-compatible default
do not claim personality support when supportsPersonality=false

Risk 19 - experimental app-server surface changes behavior

🎯 8 🛡️ 9 🧠 4

Mitigation:

keep experimentalApi=false
rely only on documented stable model/list fields
ignore unknown fields unless a typed use case is added

Risk 20 - app-server catalog passes but native exec fails

🎯 8 🛡️ 10 🧠 6

Mitigation:

treat app-server catalog as picker truth, not full launch proof
require Phase 4 native exec argv tests and live smoke where possible
test model, effort, cwd, profile, and provider scope together
block unsupported model-effort pairs before codex exec

Risk 21 - `config/read` behavior differs across Codex versions

🎯 7 🛡️ 9 🧠 6

Mitigation:

feature-detect config/read
always send {} params at minimum
classify method-missing, invalid-params, scoped-failure, and global-success separately
never make config-read failure disconnect the Codex account

🎯 7 🛡️ 10 🧠 6

Mitigation:

include forced login method and forced workspace hash in auth scope
invalidate account, limits, and catalog together when either changes
display forced auth copy instead of showing conflicting selected auth copy
redact workspace ids in logs and diagnostics

Risk 23 - local `model_catalog_json` changes without config change

🎯 6 🛡️ 9 🧠 7

Mitigation:

hash effective config and optionally active catalog file mtime when app-server exposes enough origin data
keep TTL/manual refresh fallback when origin data is unavailable
do not parse or log arbitrary catalog file contents
do not apply untrusted project-scoped catalog files unless effective config says they are active

Definition Of Done

The feature is done when:

Codex model picker uses app-server model/list when available.
New app-server-visible Codex models appear without app code changes.
supportedReasoningEfforts and defaultReasoningEffort drive Codex effort UI.
xhigh appears only where Codex reports it.
Anthropic UI and launch behavior are unchanged.
Codex launches pass effort through model_reasoning_effort.
UI launch controls are gated by runtime capabilities, not by catalog metadata alone.
Future app-server-visible models appear without code changes and are marked as new until policy/runtime support is clear.
Default Codex selection resolves to concrete launch identity before provisioning.
Auth changes do not reuse stale model catalogs across ChatGPT and API-key modes.
Project-scoped Codex config and model_catalog_json cannot make launch use a different catalog than preview without explicit degraded copy.
Custom or OSS Codex provider config is not silently presented as ChatGPT subscription-backed Agent Teams Codex.
config/read compatibility is feature-detected and never breaks account truth on older binaries.
Forced login method and forced workspace changes cannot reuse stale account, rate-limit, or catalog cache.
Codex binary upgrades invalidate stale catalog cache and retry model/list.
Older Codex binaries without model/list fall back without breaking account state.
Static Codex catalog is clearly fallback, not primary truth.
Hidden persisted models are preserved with explicit warnings.
Backup/restore, draft retry, launch prefill, member metadata, and scheduled paths do not drop provider-aware effort.
Exact logs and diagnostics do not persist raw account identifiers or secret values.
Exact logs include catalog scope and provider scope fingerprints for debugging preview vs launch mismatch.
HMR and out-of-order refreshes do not replace ready provider status with stale fallback/error state.
Provider Settings remains fast and does not show transient empty/error states during refresh.
Tests cover catalog source, fallback, effort validation, and launch pass-through.

Final Signoff And Handoff

The implementation is now ready for review after these checks stay green:

claude_team: pnpm typecheck
claude_team: targeted catalog/runtime/team provisioning Vitest suites
agent_teams_orchestrator_codex_native_spike: targeted Codex native exec and runtime capability Bun suites
Live codex app-server model/list smoke against the installed Codex binary
Optional UI smoke with CLAUDE_DEV_RUNTIME_ROOT=/Users/belief/dev/projects/claude/agent_teams_orchestrator_codex_native_spike

Merge requirement:

merge/pair the claude_team branch with the agent_teams_orchestrator_codex_native_spike runtime capability change.
if the UI branch is merged without the runtime capability change, the feature remains safe but conservative: dynamic future Codex models and xhigh are visible as catalog metadata but blocked for launch.
if the runtime capability change is merged without the UI branch, existing Codex native behavior remains unchanged except for the explicit runtime status payload and xhigh exact argv support already covered by tests.

Recommended final manual smoke:

CLAUDE_DEV_RUNTIME_ROOT=/Users/belief/dev/projects/claude/agent_teams_orchestrator_codex_native_spike pnpm dev

Then verify:

Provider Settings Codex model list is populated from app-server catalog.
gpt-5.1-codex-mini shows only medium | high.
gpt-5.4 shows low | medium | high | xhigh.
Anthropic does not show minimal, none, or xhigh.
A synthetic or newly released Codex model is not silently hidden by static UI code.
Launch logs include selected model, resolved launch model, selected effort, resolved effort, catalog source, and runtime capability truth.

98 KiB Raw Blame History

Codex App-Server Model Catalog Plan

Executive Summary

Sources And Verification

Lowest-Confidence Areas And Decisions

1. Auth-scoped catalog truth

2. Default model determinism

3. Effort transport through orchestrator

4. Catalog availability vs team-agent safety policy

5. Codex binary version and app-server method compatibility

6. auto auth resolution for model catalog

7. App-server notifications and refresh cadence

8. Backup, restore, and relaunch compatibility

9. UI and orchestrator version skew

10. Future model policy, including gpt-5.5

11. Hidden, upgraded, and persisted models

12. Stored effort schema and non-dialog launch paths

13. Renderer stale state, HMR, and out-of-order refreshes

14. Privacy, logs, and diagnostics

15. Rollout ordering across repos

16. Codex config/profile/cwd catalog mismatch

17. Built-in OpenAI Codex provider vs custom/OSS Codex config

18. Modalities and personality support

19. Stable app-server surface vs experimental fields

20. App-server preview vs native exec signoff

21. config/read scope contract is only partially documented

22. Forced login method and ChatGPT workspace scope

23. Model catalog file trust and local file changes

Top 3 Implementation Options

1. Dedicated Codex model catalog feature - chosen

2. Fold catalog into codex-account

3. Full provider model catalog for all providers now

Current Code Reality

claude_team

agent_teams_orchestrator

Target Architecture

Feature folder

App-server lifecycle

Single-session snapshot policy

Contracts

App-server protocol types

Domain model

Provider status contract

Team launch effort contract

Runtime Launch Transport

Static Fallback

Cache And Refresh

UI Behavior

Model picker

Effort selector

Default model

Orchestrator Changes

Model status

Validation

Effort transport

Concrete Implementation Touchpoints

Phased Implementation

Phase 0 - contracts and live spike

Phase 1 - app-server model catalog feature

Phase 2 - provider status integration

Phase 3 - dynamic UI model picker and effort selector

Phase 4 - launch validation and Codex effort pass-through

Phase 5 - cleanup and fallback tightening

Test Plan

claude_team unit tests

agent_teams_orchestrator tests

live smoke

Risks And Mitigations

Risk 1 - app-server startup slows provider settings

Risk 2 - effort values leak into Anthropic

Risk 3 - id and model diverge later

Risk 4 - app-server catalog has unknown fields or new efforts

Risk 5 - static fallback becomes accidentally authoritative again

Risk 6 - launch path accepts model from UI but orchestrator rejects it

Risk 7 - auth-scoped catalog leaks between modes

Risk 8 - Default becomes nondeterministic across relaunch

Risk 9 - older Codex binary lacks model/list

Risk 10 - auto auth shows the wrong catalog

Risk 11 - UI enables a capability the installed runtime cannot launch

Risk 12 - future models appear but break team-agent behavior

98 KiB

Raw Blame History

6. `auto` auth resolution for model catalog

10. Future model policy, including `gpt-5.5`

21. `config/read` scope contract is only partially documented

2. Fold catalog into `codex-account`

`claude_team`

`agent_teams_orchestrator`

`claude_team` unit tests

`agent_teams_orchestrator` tests

Risk 3 - `id` and `model` diverge later

Risk 9 - older Codex binary lacks `model/list`

Risk 10 - `auto` auth shows the wrong catalog

Risk 21 - `config/read` behavior differs across Codex versions

Risk 23 - local `model_catalog_json` changes without config change