98 KiB
Codex App-Server Model Catalog Plan
Date: 2026-04-21
Status: implementation complete in feature worktrees, pending final review/commit
Worktree: /Users/belief/dev/projects/claude/claude_team_codex_model_catalog_plan
Branch: spike/codex-model-catalog-plan
Primary repo: claude_team
Secondary repo worktree: /Users/belief/dev/projects/claude/agent_teams_orchestrator_codex_native_spike
Architecture reference: FEATURE_ARCHITECTURE_STANDARD.md
Executive Summary
Codex model selection should move from hardcoded local lists to the official Codex app-server model/list catalog.
Chosen implementation:
- Add a dedicated
src/features/codex-model-catalogfeature inclaude_team. - Use
codex app-serverJSON-RPCmodel/listas the primary source for Codex models. - Keep the existing static Codex catalog only as a bounded fallback when app-server is unavailable.
- Add rich, additive model metadata to
CliProviderStatuswhile keepingmodels: string[]for backwards compatibility. - Use per-model
supportedReasoningEffortsanddefaultReasoningEffortfor the Codex model picker and launch validation. - Keep Anthropic and Gemini behavior unchanged by default.
- Update
agent_teams_orchestratorso Codex launches pass reasoning effort through Codex config keymodel_reasoning_effort, not through an invented--effortflag.
Decision score:
🎯 9 🛡️ 9 🧠 6- estimated implementation size:
1200-2400lines acrossclaude_teamandagent_teams_orchestrator
Why this is the safest path:
- It follows the real Codex client contract instead of chasing static releases.
- It solves future model releases like
gpt-5.5without an app release, as long as Codex app-server already exposes the model. - It avoids breaking Anthropic by making the new catalog contract additive and provider-scoped.
- It handles
xhighcorrectly as Codex-specific reasoning effort, not as Anthropicmax.
Current implementation state:
claude_teamhas the dedicated Codex model catalog feature, app-server JSON-RPC client, static fallback, provider status integration, Codex model picker integration, provider-aware effort UI, launch validation, launch identity persistence, and targeted tests.agent_teams_orchestrator_codex_native_spikeexposes runtime capabilities for dynamic Codex models and Codex reasoning config pass-through, and its Codex native exec runner passes effort through-c model_reasoning_effort="value".- Anthropic remains isolated from Codex-only effort values. Anthropic launch UI still uses
low | medium | high; Codex can use per-modelminimal | low | medium | high | xhighonly where catalog/runtime policy allows it. - Future Codex app-server models can appear immediately in UI. Launch is allowed only when the local runtime declares dynamic Codex model support; otherwise they remain visible with upgrade/policy copy instead of failing late during spawn.
DefaultCodex selection is resolved to a concrete model immediately before provisioning and stored as additive launch identity metadata.- The remaining work before merge is review/signoff, not more architecture discovery.
Sources And Verification
Official sources checked:
Important official facts:
model/listis explicitly intended for rendering model and personality selectors.model/listreturnsid,model,displayName,hidden,defaultReasoningEffort,supportedReasoningEfforts,inputModalities,supportsPersonality,isDefault,upgrade, andupgradeInfo.includeHidden: falsereturns picker-visible models by default.codex exechas--modeland-c, --config key=value.codex execdoes not expose a first-class--effortflag.- Codex config key
model_reasoning_effortsupportsminimal | low | medium | high | xhigh. xhighis model-dependent.config/readexists in app-server and returns effective configuration after configuration layering.- Codex loads user config from
~/.codex/config.tomland can also load project-scoped.codex/config.tomlonly for trusted projects. model_catalog_jsoncan override the model catalog, including profile-level overrides.codex execsupports--cdand--profile, and-c key=valueoverrides take precedence for one invocation.
Local probe:
- binary:
codex-cli 0.117.0 - method:
codex app-serverover JSON-RPC stdio - transport: newline-delimited JSON-RPC over stdio, not
Content-Lengthframing - request:
model/listwith{ "limit": 20, "includeHidden": false } - result: 8 visible models,
gpt-5.4marked default,nextCursor: null - visible models returned:
gpt-5.4,gpt-5.2-codex,gpt-5.1-codex-max,gpt-5.4-mini,gpt-5.3-codex,gpt-5.3-codex-spark,gpt-5.2,gpt-5.1-codex-mini xhighis already returned for most models.gpt-5.1-codex-minionly returnedmedium | high, so effort options must be per-model.gpt-5.3-codex-sparkreturned default efforthigh, so default effort must not be global.codex exec --helplocally confirms--cd,--profile,--model,--oss,--local-provider, and repeatable-c key=value.- local help confirms
--ossis equivalent to-c model_provider=oss, so provider scope can differ from subscription-backed OpenAI Codex if not guarded. - live
config/readprobe returned{ config, origins }. - live
config/readprobe requiresparamsobject; missingparamsreturns JSON-RPC error-32600. - live
config/readprobe accepted{ cwd }and{ profile }without error, so the implementation should feature-detect and test scoped reads instead of assuming only global config. - final live smoke on this worktree confirmed
model/listreturns 8 visible models, defaultgpt-5.4,xhighfor most models, andmedium | highonly forgpt-5.1-codex-mini.
Combined app-server session probe:
- one initialized app-server process successfully handled
account/read,account/rateLimits/read, andmodel/listsequentially account/readreturned a ChatGPT account shape in the local environmentaccount/rateLimits/readreturnedprimary.windowDurationMins = 300andsecondary.windowDurationMins = 10080model/listreturned the same 8 visible models in that same session- conclusion: provider refresh should use a combined control-plane session when it needs account, limits, and catalog truth
Lowest-Confidence Areas And Decisions
1. Auth-scoped catalog truth
🎯 7 🛡️ 9 🧠 6
Estimated implementation impact: 180-350 lines
Uncertainty:
- app-server
model/listmay return different catalogs depending on active Codex auth state, account plan, org policy, API-key mode, or future Codex rollout flags. - The local probe only proves one logged-in environment, not all account modes.
Decision:
- treat Codex model catalog as auth-scoped, not global
- cache key must include binary path, Codex home, preferred auth mode, effective auth mode, managed account stable identity when available, and API-key availability source
- never reuse a ChatGPT-account catalog as API-key-mode catalog
- never reuse an API-key-mode catalog as ChatGPT-account catalog
- when auth mode changes, keep previous catalog visible only as stale UI while refresh is in flight, then replace it
Implementation rule:
catalogCacheKey =
binaryPath
+ binaryVersion
+ codexHome
+ preferredAuthMode
+ effectiveAuthMode
+ managedAccountHash or "no-chatgpt-account"
+ apiKey.source or "no-api-key"
The hash should use a per-process salt and should not be persisted. Do not persist raw email solely for catalog cache.
2. Default model determinism
🎯 8 🛡️ 9 🧠 6
Estimated implementation impact: 220-420 lines
Uncertainty:
- current UI can represent model as empty string meaning
Default - Codex app-server default can change after a Codex release
- launch logs, relaunch, replay, and team metadata need to stay explainable
Decision:
- keep
Defaultas a UI selection - resolve
Defaultto a concreteresolvedLaunchModelimmediately before launch - persist both user selection and resolved runtime truth in launch metadata
- never silently rewrite old team config from one concrete model to another
- if a team stored
Default, relaunch should show that it will resolve to the current Codex default before launch
Required persisted launch identity:
export interface ProviderModelLaunchIdentity {
providerId: TeamProviderId;
providerBackendId: TeamProviderBackendId | null;
selectedModel: string | null;
selectedModelKind: 'default' | 'explicit';
resolvedLaunchModel: string;
catalogId: string | null;
catalogSource: 'app-server' | 'static-fallback' | 'unavailable';
catalogFetchedAt: string | null;
selectedEffort: string | null;
resolvedEffort: string | null;
}
This identity should be written into exact logs and launch-derived metadata. It should not replace existing fields in Phase 1, but it should become the canonical explanation layer for Codex relaunch/replay.
3. Effort transport through orchestrator
🎯 8 🛡️ 9 🧠 5
Estimated implementation impact: 180-320 lines
Uncertainty:
- Agent Teams exposes a generic
--effortconcept today - Codex CLI does not expose
--effort - Codex uses config key
model_reasoning_effort
Decision:
- UI and main process may accept provider-aware effort strings
- orchestrator public Agent Teams CLI can continue accepting
--effort - Codex executor must translate Codex effort to
codex exec -c model_reasoning_effort='"value"' - Anthropic executor must not see Codex-only effort values
- Codex executor must not see Anthropic
max
No implementation phase may ship xhigh as selectable until this pass-through is tested.
4. Catalog availability vs team-agent safety policy
🎯 8 🛡️ 9 🧠 5
Estimated implementation impact: 160-280 lines
Uncertainty:
- app-server
model/listsays a model is available to Codex - our team-agent contract can still make a model unsafe for Agent Teams if it breaks task/reply/bootstrap conventions
- current UI has local disabled policy for
gpt-5.3-codex-spark,gpt-5.2-codex, andgpt-5.1-codex-mini
Decision:
- model catalog answers "can Codex offer this model"
- team policy answers "can Agent Teams safely launch this model"
- keep these as separate layers
- do not remove current disabled policies just because app-server returns a model
- show clear disabled copy:
Available in Codex, disabled for Agent Teams - disabled models can still display catalog metadata and effort metadata for transparency
5. Codex binary version and app-server method compatibility
🎯 7 🛡️ 9 🧠 6
Estimated implementation impact: 220-420 lines
Uncertainty:
codex app-serveris documented as an app-server integration surface, but local users can have older Codex binaries.model/listmay be missing, renamed, or return a narrower shape in older binaries.- Current
JsonRpcStdioClientcollapses JSON-RPC errors toError(message), which loses method, code, and structured details needed to distinguishmethod not foundfrom auth/network/timeout. - Current
CodexBinaryResolvercaches only binary path, not binary version.
Decision:
- make binary version part of catalog cache identity
- add structured JSON-RPC error metadata before implementing catalog fallback
- treat
method not foundasstatic-fallback, not as account failure - treat malformed model rows as catalog degradation, not app-server runtime failure
- clear catalog cache when resolved Codex binary path or version changes
Required implementation detail:
export class JsonRpcRequestError extends Error {
readonly method: string;
readonly code: number | null;
readonly details: unknown;
}
The app-server model client should classify:
method_not_found: fallback to static catalog and show upgrade hinttimeout: stale cache if available, then fallbackmalformed_response: fallback plus diagnosticsprocess_exit: shared app-server failure for all sub-results in combined snapshotauth_required: account/read decides auth truth; model/list must not invent auth truth
6. auto auth resolution for model catalog
🎯 7 🛡️ 9 🧠 6
Estimated implementation impact: 180-320 lines
Uncertainty:
- UI lets users pick
auto,chatgpt, orapi_key. - Catalog can differ between ChatGPT subscription and API key.
- The model picker must preview the catalog for the mode that launch will actually use, not only the configured preference.
Decision:
preferredAuthMode=autois not a catalog scope by itself- resolve
autointoeffectiveAuthModeusing the same readiness logic as launch - catalog request should be scoped to the effective launch mode
- Provider Settings can show both preference and effective catalog scope when they differ
- if effective mode flips from ChatGPT to API key because ChatGPT becomes unavailable, keep stale ChatGPT catalog visually stale and refresh API-key catalog
UX copy rule:
- do not say
Detected from OPENAI_API_KEYas the primary model catalog source when ChatGPT account is the effective mode - show API-key availability only as fallback/secondary when selected auth is ChatGPT or auto resolves to ChatGPT
7. App-server notifications and refresh cadence
🎯 8 🛡️ 8 🧠 5
Estimated implementation impact: 160-260 lines
Uncertainty:
- account login flow has notifications
- current docs and local probe do not establish a dedicated model-catalog changed notification
- keeping a long-lived app-server just for model catalog would increase lifecycle complexity
Decision:
- do not introduce a long-lived model catalog subscription in this rollout
- use short-lived app-server sessions for refresh
- trigger catalog refresh after login success, logout, auth mode change, API-key source change, manual refresh, and provider status refresh
- do not poll
model/listaggressively from renderer - use
10 minutesuccess TTL and stale cache for UI continuity
If a future app-server release adds model catalog notifications, integrate them later behind the catalog feature port without changing renderer contracts.
8. Backup, restore, and relaunch compatibility
🎯 7 🛡️ 9 🧠 7
Estimated implementation impact: 240-520 lines
Uncertainty:
- team launch metadata already persisted provider/model/effort/backend in several places
- adding dynamic defaults and resolved model identity can make old backups ambiguous
- old teams may contain no
modelCatalogmetadata
Decision:
- new
ProviderModelLaunchIdentityis additive - old teams without it remain readable
- relaunch derives missing identity from existing provider/model/effort/backend fields
- restore does not require the old catalog to be available
- if restored explicit model is missing from current catalog, UI preserves the explicit model with a warning instead of silently replacing it with current default
- if restored model was
Default, relaunch preview resolves it against current catalog and says so before launch
Migration rule:
old explicit model -> selectedModelKind="explicit", resolvedLaunchModel=old model
old empty model -> selectedModelKind="default", resolvedLaunchModel=current default at next launch
missing effort -> selectedEffort=null, resolvedEffort=current model default at next launch
9. UI and orchestrator version skew
🎯 7 🛡️ 10 🧠 7
Estimated implementation impact: 280-620 lines
Uncertainty:
claude_teamandagent_teams_orchestratorcan be updated at different times.- UI can learn about
xhigh,minimal, or a future model likegpt-5.5before the installed orchestrator can launch it safely. - The current orchestrator static Codex helpers can reject a model that Codex app-server already exposed.
Decision:
- catalog visibility and launch capability are separate contracts
- UI may display app-server catalog metadata as soon as it is available
- UI must not enable launch controls that require new orchestrator behavior until runtime capability says that behavior exists
- provider-explicit Codex model strings can be accepted only after orchestrator declares dynamic Codex model support
- Codex
xhighcan be shown as metadata before Phase 4, but it is disabled for launch until Codex effort pass-through is available
Required runtime capability contract:
export interface ProviderRuntimeCapabilities {
providerId: TeamProviderId;
codex?: {
supportsDynamicAppServerModels: boolean;
supportsCodexReasoningEffortConfig: boolean;
supportedCodexReasoningEfforts: Array<'minimal' | 'low' | 'medium' | 'high' | 'xhigh'>;
acceptsProviderExplicitFutureModels: boolean;
};
}
Compatibility rule:
catalog says model/effort exists
+ team policy says model is not disabled
+ runtime capability says launch path supports it
= launch control enabled
If any part is missing, the picker can still display the model, but launch must be disabled with explicit copy.
Recommended copy:
Available in Codex, waiting for Agent Teams runtime supportThis Codex effort is visible in Codex, but this Agent Teams runtime cannot launch it yetUpgrade the Agent Teams runtime to use this model
This avoids a bad state where the user selects xhigh successfully in UI and then gets a late codex exec failure.
10. Future model policy, including gpt-5.5
🎯 8 🛡️ 8 🧠 6
Estimated implementation impact: 240-520 lines
Uncertainty:
- app-server can expose a new model immediately after OpenAI releases it.
- the user goal is that new Codex models appear without us shipping a new static list.
- Agent Teams still needs a safety layer so one unexpected model row does not break team launch flows.
Top 3 policies:
- Allow every app-server-visible model immediately:
🎯 8 🛡️ 5 🧠 3,80-180lines. This best solves future releases, but it can route unverified models into team launch without product copy or rollback clarity. - Show every app-server-visible model immediately, launch with capability gate plus "new model" warning:
🎯 9 🛡️ 8 🧠 5,240-520lines. This keeps future models visible without app releases, but still blocks only real launch incompatibilities. - Hide or disable unknown models until a code release updates policy:
🎯 4 🛡️ 9 🧠 2,60-120lines. This is safe but defeats the reason to usemodel/list.
Chosen policy: option 2.
Implementation rule:
- app-server-visible, non-hidden models appear in the picker immediately
- known disabled Agent Teams models remain disabled
- new unknown models are selectable only if runtime capabilities support dynamic Codex models
- new unknown models get a
New from Codex catalognote until a successful launch or explicit policy promotion marks themverified - if the new model does not expose usable text input or any supported effort we can launch, it is shown but disabled
- hidden models are never introduced into new-team pickers by default
Policy statuses:
export type CodexTeamModelPolicyStatus =
| 'verified'
| 'new-from-codex-catalog'
| 'disabled-for-agent-teams'
| 'requires-runtime-upgrade'
| 'missing-from-current-catalog';
This means gpt-5.5 can appear the day app-server returns it, but the UI will not pretend the full Agent Teams launch path is verified unless the local runtime can actually handle provider-explicit dynamic Codex models.
11. Hidden, upgraded, and persisted models
🎯 8 🛡️ 9 🧠 5
Estimated implementation impact: 160-340 lines
Uncertainty:
- official docs say
includeHidden: falsereturns picker-visible models by default. - persisted teams can reference a model that later becomes hidden, upgraded, renamed, or unavailable.
- app-server exposes
upgradeandupgradeInfo, but we do not know every future migration shape.
Decision:
- normal picker uses
includeHidden: false - if a persisted explicit Codex model is not found in the visible catalog, run one scoped refresh with
includeHidden: true - if hidden lookup finds the model, show it as
Hidden in Codex catalogand keep relaunch possible only if runtime capability and team policy allow it - if
upgradepoints to a visible replacement, show a non-destructive migration suggestion - never auto-rewrite persisted model ids during restore or relaunch
Relaunch behavior:
visible model found -> normal relaunch
hidden model found -> relaunch allowed only with warning and policy pass
upgrade available -> show "Switch to recommended model" action
missing model -> keep value visible, require user to choose another model before launch
This avoids both failure modes: silently changing a user's team model, or breaking old teams because a model moved out of the default picker.
12. Stored effort schema and non-dialog launch paths
🎯 7 🛡️ 9 🧠 7
Estimated implementation impact: 320-760 lines
Uncertainty:
- effort is not used only in launch dialogs.
- team metadata, member metadata, backup/restore, draft retry, localStorage launch params, and scheduled/provisioned flows can all carry
effort. - current normalizers in team data paths may silently discard anything outside
low | medium | high.
Decision:
- provider-aware effort parsing must be added at every inbound boundary, not only in React components
- old persisted
low | medium | highvalues stay valid - new Codex-specific values are preserved only with provider/model context
- if provider context is missing, parse as legacy effort and do not invent Codex-specific meaning
- scheduled launches and automation-like flows must either be updated in the same phase or explicitly block Codex-only efforts until updated
High-risk code paths to audit during implementation:
src/main/services/team/TeamMembersMetaStore.tssrc/main/services/team/TeamDataService.tssrc/main/services/team/TeamBackupService.tssrc/main/services/team/TeamProvisioningService.tssrc/shared/types/schedule.tssrc/main/ipc/teams.tssrc/main/http/teams.ts- renderer launch prefill and draft retry localStorage state
Migration rule:
legacy effort with no provider context -> keep if low | medium | high
codex effort with provider=codex -> validate against selected model catalog
codex effort with provider missing -> store as selected string only, resolve before launch
unsupported restored effort -> show warning, do not silently downgrade
13. Renderer stale state, HMR, and out-of-order refreshes
🎯 8 🛡️ 9 🧠 5
Estimated implementation impact: 180-380 lines
Uncertainty:
- previous provider settings work showed transient wrong states after HMR and slow refreshes.
- catalog, account, and rate limits can refresh with different timings.
- a stale app-server response can arrive after a newer auth-mode change.
Decision:
- every provider status refresh should carry a monotonic
requestIdorsnapshotVersion - renderer stores the latest accepted version per provider
- responses older than the latest accepted version are ignored
modelCatalog.schemaVersionis required and future versions are treated as degraded, not fatal- HMR should keep last ready provider status visible while a refresh is in flight
- a catalog refresh cannot overwrite account connected state unless it came from the same combined snapshot
Required stale-write guard:
if incoming.providerId != current.providerId -> reject
if incoming.requestId < current.requestId -> reject
if incoming.authScope != current.authScope and incoming.status is not from current auth selection -> keep as stale diagnostics only
This directly targets flicker like Codex native unavailable followed by ready state, or fallback API-key copy appearing while ChatGPT account mode is selected.
14. Privacy, logs, and diagnostics
🎯 8 🛡️ 9 🧠 4
Estimated implementation impact: 120-260 lines
Uncertainty:
- account-scoped cache keys need stable identity, but raw email should not leak into exact logs, runtime snapshots, or persistent diagnostics.
- API-key source is useful for UX, but no secret or env value should be logged.
Decision:
- hash managed account identity in memory for cache keys
- use a per-process salt for volatile cache keys
- do not persist raw account email solely for model catalog cache
- exact logs can record
authScope=chatgptorauthScope=api_key, not raw account identity - diagnostics can record
apiKeySource=OPENAI_API_KEYbut never the value - error messages preserve method/code/timeout, but redact command env and tokens
Required diagnostic fields:
export interface CodexModelCatalogDiagnostics {
source: 'app-server' | 'static-fallback' | 'unavailable';
status: 'ready' | 'stale' | 'degraded' | 'unavailable';
method?: 'model/list';
errorCode?: string | number | null;
errorCategory?: string | null;
binaryVersion?: string | null;
effectiveAuthMode?: 'chatgpt' | 'api_key' | null;
cacheAgeMs?: number | null;
}
No UI surface should show Unknown error for catalog failures after this feature.
15. Rollout ordering across repos
🎯 8 🛡️ 10 🧠 6
Estimated implementation impact: 120-260 lines
Uncertainty:
claude_teamcan ship UI before the user has a compatibleagent_teams_orchestratorruntime in cache.- the app can point to
CLAUDE_DEV_RUNTIME_ROOT, bundled runtime cache, or a user-installed runtime binary.
Decision:
- implement orchestrator support first or behind a UI capability gate
- Provider Settings can show catalog metadata before launch support exists
- Create/Launch dialogs must consult runtime capabilities before enabling new Codex models or new Codex efforts
- the runtime health check should expose a version/capability payload, not force UI to infer support from binary version strings
- if capabilities are unavailable, default to safe: display metadata, disable launch-only features
Rollout sequence:
- Add orchestrator dynamic Codex model and effort capability support.
- Add
claude_teamcatalog feature and provider status metadata. - Show catalog in UI with capability gates.
- Enable launch when capability and catalog agree.
- Remove any temporary guard only after bundled runtime and dev runtime both report capabilities in CI/smoke.
This is the cleanest way to avoid UI and runtime getting out of sync.
16. Codex config/profile/cwd catalog mismatch
🎯 6 🛡️ 10 🧠 8
Estimated implementation impact: 360-900 lines
Uncertainty:
- official config docs allow
model_catalog_json, and profile-levelprofiles.<name>.model_catalog_jsoncan override it. - Codex loads project-scoped
.codex/config.tomlonly when a project or worktree is trusted. codex execcan run with a differentcwd, profile, and inline-coverrides than the short-lived app-server preview session.- current
CodexAppServerSessionFactorystartscodex app-serverwithout an explicitcwdor profile.
Failure mode:
- Provider Settings shows catalog A from global config.
- Launch runs
codex execin project cwd with project-scoped or profile config and effectively uses catalog B. - The user selects a model that preview says is valid, but launch resolves against a different provider/catalog.
Top 3 policies:
- Global-only catalog preview:
🎯 7 🛡️ 5 🧠 3,80-180lines. Fast and simple, but wrong for project-scoped Codex configs. - Project-scoped catalog preview for launch flows, global preview for dashboard:
🎯 9 🛡️ 9 🧠 7,360-900lines. More work, but it matches actualcodex execlaunch context. - Ignore config and force a static OpenAI Codex provider always:
🎯 5 🛡️ 8 🧠 4,200-420lines. Safer than mismatch, but it discards legitimate user Codex config and can surprise power users.
Chosen policy: option 2.
Decision:
- dashboard/provider card can show a global Codex catalog snapshot
- Create/Launch dialogs must fetch or resolve catalog for the selected launch
cwd - if profile selection exists or is introduced, catalog cache key must include profile name
- if we pass inline config overrides to
codex exec, equivalent preview scope must include those overrides or launch must be marked "not preview-verified" - if project trust/config cannot be resolved, launch UI falls back to global catalog but shows
Catalog may differ for this project
Required preview scope:
export interface CodexModelCatalogScope {
codexHome: string;
binaryPath: string;
binaryVersion: string | null;
cwd: string | null;
projectTrust: 'trusted' | 'untrusted' | 'unknown';
profileName: string | null;
configFingerprint: string | null;
preferredAuthMode: 'auto' | 'chatgpt' | 'api_key' | null;
effectiveAuthMode: 'chatgpt' | 'api_key' | null;
launchOverridesFingerprint: string | null;
}
Cache key correction:
catalogCacheKey =
binaryPath
+ binaryVersion
+ codexHome
+ cwd or "global"
+ projectTrust
+ profileName or "default-profile"
+ configFingerprint or "unknown-config"
+ launchOverridesFingerprint or "no-launch-overrides"
+ preferredAuthMode
+ effectiveAuthMode
+ forcedLoginMethod or "no-forced-login-method"
+ forcedWorkspaceHash or "no-forced-workspace"
+ managedAccountHash or "no-chatgpt-account"
+ apiKey.source or "no-api-key"
Implementation notes:
- use app-server
config/readwhen available to get effective config fingerprints for the same scope that launch will use - do not parse arbitrary TOML as the primary config source if app-server can resolve effective configuration
- if app-server cannot scope
config/readby cwd/profile, keep that uncertainty visible in diagnostics - do not use raw config file contents as a cache key or log payload; hash only the relevant effective keys
Relevant effective keys:
modelmodel_providermodel_catalog_jsonprofiles.<name>.model_catalog_jsonmodel_reasoning_effortforced_login_methodforced_chatgpt_workspace_idopenai_base_urlmodel_providers.*only as a redacted structural fingerprintprojects.<path>.trust_level
Acceptance:
- a team launch from project A and project B can have different Codex catalog cache entries
- a trusted project
.codex/config.tomlchangingmodel_catalog_jsoninvalidates preview for that project - global dashboard status does not claim to be launch-exact for every project
- exact logs record the catalog scope fingerprint, not raw config values
17. Built-in OpenAI Codex provider vs custom/OSS Codex config
🎯 7 🛡️ 9 🧠 7
Estimated implementation impact: 260-620 lines
Uncertainty:
- Codex config supports
model_provider, custom providers,oss_provider, and provider auth settings. - Agent Teams "Codex" provider is intended to mean native Codex through OpenAI/ChatGPT subscription or API-key billing, not arbitrary custom provider execution.
- app-server
model/listcan be influenced by configuration, but our product copy currently talks about Codex subscription.
Decision:
- this cutover should keep Agent Teams Codex scoped to the built-in OpenAI Codex provider
- custom provider and OSS provider support should be a separate provider feature, not silently mixed into
provider=codex - if effective config says
model_provideris not built-in OpenAI for the launch scope, show a clear warning and block subscription-mode launch unless the user intentionally switches to a future custom-provider flow - when launching Agent Teams Codex, pass or enforce provider config consistently so
codex execuses the same provider class previewed by the catalog
Recommended launch guard:
if provider=codex and effective model_provider is neither missing nor "openai":
status = degraded
launch = blocked
copy = "This project config points Codex at a custom/local provider. Agent Teams Codex currently supports the built-in OpenAI Codex provider only."
If the team wants to support custom providers later:
- add a separate
provider=codex-customor generic OpenAI-compatible provider - do not reuse subscription UX or rate-limit UI
- do not show ChatGPT account limits for custom provider launches
This prevents a confusing case where UI says "Codex subscription" but runtime actually routes to local OSS or a custom endpoint.
18. Modalities and personality support
🎯 8 🛡️ 9 🧠 4
Estimated implementation impact: 120-280 lines
Uncertainty:
- app-server model rows expose
inputModalitiesandsupportsPersonality. - Agent Teams launch prompts are text-first today, but future UI can attach images or personality-like instructions.
- older model catalogs can omit
inputModalities, and docs say missing modalities should be treated as["text", "image"]for backward compatibility.
Decision:
- launchability requires
textinput support - image support is displayed as capability metadata, not required for normal team launch
supportsPersonality=falsemust not disable normal team launch, but the UI must not claim/personalityor personality-specific behavior for that model- missing
inputModalitiesuses the documented backward-compatible default
Validation rule:
if inputModalities exists and does not include "text":
show model, disable launch, copy "This Codex model is not text-launch compatible for Agent Teams"
if supportsPersonality=false:
hide personality controls for this model if those controls exist
This keeps model picker truthful without overfitting to the current text-only launch flow.
19. Stable app-server surface vs experimental fields
🎯 8 🛡️ 9 🧠 4
Estimated implementation impact: 80-180 lines
Uncertainty:
- app-server has an
experimentalApicapability. model/listitself is documented on the stable API overview, but adjacent methods and future richer fields can be experimental.- opting into experimental API globally can change response surface and error behavior.
Decision:
- keep
experimentalApi=falsefor the model catalog rollout - rely only on stable
model/listfields listed in the docs - treat extra fields as diagnostics only
- add a later explicit spike before using experimental catalog, plugin, or app-server thread features in this path
Acceptance:
- catalog tests run with
experimentalApi=false - no Phase 1-5 task depends on experimental fields
- if a future field appears, normalization ignores it unless we add a typed, tested use case
20. App-server preview vs native exec signoff
🎯 8 🛡️ 10 🧠 6
Estimated implementation impact: 180-420 lines
Uncertainty:
model/listis the correct picker source, but the actual launch surface remainscodex exec --json.- a model can appear in app-server before
codex execin the installed binary handles it correctly. - effort config can be accepted syntactically but rejected by the model/provider at runtime.
Decision:
- app-server catalog is necessary for UI, but not the only release gate for enabling new launch capability
- Phase 4 must include a live or mocked native-exec compatibility probe for the selected launch path
- native exec signoff should test model, provider scope, cwd, profile, and non-default effort together
- if live signoff is not available in CI, use a fixture-based unit test plus one documented local smoke command before merging
Required signoff matrix:
default model + default effort + selected cwd
explicit gpt-5.4 + xhigh + selected cwd
gpt-5.1-codex-mini + high + selected cwd
gpt-5.1-codex-mini + xhigh -> blocked before exec
synthetic future model + capability disabled -> blocked before exec
synthetic future model + capability enabled -> argv accepted by orchestrator test
custom model_provider config -> blocked or explicit custom-provider copy
This prevents the plan from treating app-server catalog presence as proof that the full Agent Teams runtime path is healthy.
21. config/read scope contract is only partially documented
🎯 7 🛡️ 9 🧠 6
Estimated implementation impact: 180-420 lines
Uncertainty:
- docs list
config/read, but the detailed request/response shape is not as explicit asmodel/list. - local probe confirms
config/readreturns{ config, origins }and acceptsparams. - local probe confirms missing
paramsreturns-32600, so callers must always send{}at minimum. - local probe confirms
{ cwd }and{ profile }are accepted, but we still need tests around whether they fully mirrorcodex exec --cd/--profilein all installations.
Decision:
- treat
config/readas a feature-detected helper, not as a hard dependency for model catalog availability - always call
config/readwith an object, never with missing params - include
config/readmethod/code/details in diagnostics - if scoped
config/readfails but global succeeds, mark launch catalog asscope_unverified, notready - if
config/readis missing on older binaries, fall back to global catalog and require runtime capability plus explicit degraded copy before launch enablement
Recommended DTO:
export interface CodexAppServerConfigReadParams {
cwd?: string | null;
profile?: string | null;
}
export interface CodexAppServerConfigReadResponse {
config: Record<string, unknown>;
origins: Record<string, unknown>;
}
Feature-detect result:
export type CodexConfigReadSupport =
| 'supported-scoped'
| 'supported-global-only'
| 'method-missing'
| 'failed';
Acceptance:
- unit tests cover missing
params, method-not-found, global success, scoped success, and scoped failure config/readfailure never breaks account/rate-limit/model-list reads- launch UI does not present a project-scoped catalog as verified unless config scope was actually checked
22. Forced login method and ChatGPT workspace scope
🎯 7 🛡️ 10 🧠 6
Estimated implementation impact: 180-420 lines
Uncertainty:
- effective Codex config can include
forced_login_method. - effective Codex config can include
forced_chatgpt_workspace_id. - workspace/account policy can affect available models, rate limits, and whether ChatGPT subscription mode is valid.
- previous Codex account work already had a real bug around forced login method, so this is not theoretical.
Decision:
- auth scope must include forced login method and forced workspace identity when present
- if UI-selected auth mode conflicts with
forced_login_method, effective auth mode wins and UI must explain why - forced workspace id must be hashed before cache/log usage
- rate-limit, account, and model catalog snapshots must be scoped together so workspace changes cannot reuse stale catalog
Auth scope correction:
export interface CodexCatalogAuthScope {
preferredAuthMode: 'auto' | 'chatgpt' | 'api_key' | null;
effectiveAuthMode: 'chatgpt' | 'api_key' | null;
forcedLoginMethod: 'chatgpt' | 'api_key' | null;
managedAccountHash: string | null;
forcedWorkspaceHash: string | null;
apiKeySource: string | null;
}
UX rules:
- if user selected ChatGPT but config forces API key, show
Codex config forces API key mode for this scope - if user selected API key but config forces ChatGPT, show
Codex config forces ChatGPT account mode for this scope - if workspace id changes, show
Codex workspace changed, refreshing subscription limits and model catalog - never show raw workspace id in UI unless Codex app-server provides a display name that is intended for users
Cache invalidation:
- forced login method change invalidates both auth and catalog cache
- forced workspace hash change invalidates ChatGPT-scoped rate limits and catalog
- account logout clears all ChatGPT workspace-scoped entries
23. Model catalog file trust and local file changes
🎯 6 🛡️ 9 🧠 7
Estimated implementation impact: 220-520 lines
Uncertainty:
model_catalog_jsoncan point to a local JSON file.- app-server resolves effective config, but our app may not know if that JSON file changed unless config fingerprint includes enough origin data.
- project-scoped
.codex/config.tomlonly applies for trusted projects, so a file can exist but not be active.
Decision:
- treat
model_catalog_jsonas part of effective config, not as a file we parse directly by default - if
config/read.originsexposes enough origin/path data, hash only path and mtime for invalidation, not file contents - if origin/path data is unavailable, rely on manual refresh and short TTL
- never read arbitrary
model_catalog_jsonfile contents into logs or diagnostics - do not apply project-scoped model catalog unless Codex effective config says the project is trusted and the catalog is active
Top 3 invalidation policies:
- TTL/manual-refresh only:
🎯 7 🛡️ 6 🧠 2,40-100lines. Simple but stale after local file edits. - Hash effective config plus optional mtime for active catalog file:
🎯 8 🛡️ 9 🧠 5,220-520lines. Best balance without parsing arbitrary catalog files ourselves. - Parse and watch every possible catalog file:
🎯 5 🛡️ 7 🧠 8,500-1000lines. Too much responsibility and security surface for this feature.
Chosen policy: option 2.
Acceptance:
- active
model_catalog_jsonpath change invalidates cache - active catalog file mtime change invalidates cache when path is available
- inactive untrusted project
.codex/config.tomldoes not affect the trusted/global catalog
Top 3 Implementation Options
1. Dedicated Codex model catalog feature - chosen
🎯 9 🛡️ 9 🧠 6
Estimated size: 1200-2400 lines
Core idea:
- create
src/features/codex-model-catalog - keep model catalog rules isolated from account UI, provider status plumbing, and Electron transport
- reuse existing
CodexAppServerSessionFactory - expose a small feature facade to provider status and renderer model picker
- update orchestrator only where runtime status and launch effort transport require it
Why it wins:
- best SOLID alignment
- clean domain rules for model visibility, effort validation, fallback, and default selection
- does not make
codex-accountresponsible for model policy - least risk to Anthropic
- easiest to test without full app startup
Main tradeoff:
- needs small integration glue in existing provider status and team launch flows
2. Fold catalog into codex-account
🎯 7 🛡️ 7 🧠 5
Estimated size: 800-1600 lines
Core idea:
- extend
src/features/codex-accountwithmodel/list - use account snapshot as the only Codex control-plane snapshot
- merge account, rate limits, and model catalog in one feature
Why it is tempting:
- fewer new folders
- account feature already owns app-server account/rate-limit reads
- easier to fetch account plus model catalog in one app-server session
Why I do not recommend it:
- model catalog is not account management
- the feature becomes a broad Codex control-plane catch-all
- future provider catalog work would have to pull model rules back out
- more risk of account UI churn when only model picker changes are needed
3. Full provider model catalog for all providers now
🎯 7 🛡️ 8 🧠 9
Estimated size: 2500-4500 lines
Core idea:
- build one provider-agnostic model catalog for Anthropic, Codex, Gemini, and future providers
- move static renderer catalog policy into a shared feature
- expose one rich contract for all provider model pickers
Why it is attractive:
- cleanest long-term abstraction
- one UI model for labels, availability, capabilities, and efforts
- reduces future duplication
Why not now:
- too much surface area while Codex runtime cutover is still fresh
- Anthropic model behavior is already stable and should not be reworked for a Codex catalog issue
- would delay the concrete Codex model release problem
Current Code Reality
claude_team
Existing app-server infrastructure:
src/main/services/infrastructure/codexAppServer/JsonRpcStdioClient.tssrc/main/services/infrastructure/codexAppServer/CodexAppServerSessionFactory.tssrc/main/services/infrastructure/codexAppServer/protocol.tssrc/features/codex-account/main/infrastructure/CodexAccountAppServerClient.ts
Current account client behavior:
readAccount()opens one app-server session.readRateLimits()opens another app-server session.logout()opens another app-server session.- no
model/listprotocol types exist yet. CodexAppServerSessionFactorystartscodex app-serverwith no explicitcwdor profile option.- app-server initialize response includes
codexHome, but the current protocol types do not expose effective config or config fingerprint.
Current shared provider status:
CliProviderStatus.modelsis onlystring[].CliProviderStatus.modelAvailabilityhas per-model verification status but no rich model metadata.- renderer model selector can already prefer runtime-provided
providerStatus.models.
Current effort type:
export type EffortLevel = 'low' | 'medium' | 'high';
Risk:
- adding
xhighdirectly without provider-specific validation would let Anthropic UI accidentally offer unsupported choices.
Current persistence and non-dialog launch paths:
- team metadata and member metadata normalize launch-derived provider/model/effort in multiple services.
- backup/restore copies metadata but restore-time launch preview must still tolerate missing catalog metadata.
- draft retry and launch prefill can reuse old localStorage state.
- scheduled launch types can reference the shared effort type.
Risk:
- updating only the visible launch dialogs would leave hidden paths that silently drop Codex-only efforts or relaunch with stale default semantics.
agent_teams_orchestrator
Current Codex model catalog:
src/utils/model/codex.ts- static
CODEX_MODELS - static
DEFAULT_CODEX_MODEL isCodexModel()checks only static ids
Current runtime status:
getUnifiedRuntimeStatusPayload('codex')returns static Codex model ids.
Current CLI effort:
- top-level
--effort <level>currently acceptslow | medium | high | max. - Codex native execution is ultimately
codex exec --json. - installed
codex exec --helpshows no--effortflag.
Risk:
- if we send
--effort xhighthrough current orchestrator, it fails before Codex can use it. - if we map Anthropic
maxto Codexxhigh, the semantics are wrong. - if we show
xhighin UI before the launch path supports it, the picker becomes misleading.
Target Architecture
Feature folder
src/features/codex-model-catalog/
contracts/
codexModelCatalog.dto.ts
index.ts
core/
domain/
codexModelCatalog.ts
codexReasoningEffort.ts
codexModelCatalogFallback.ts
normalizeCodexAppServerModel.ts
application/
GetCodexModelCatalogUseCase.ts
CodexModelCatalogPorts.ts
main/
composition/
createCodexModelCatalogFeature.ts
adapters/
output/
CodexAppServerModelCatalogSource.ts
StaticCodexModelCatalogSource.ts
infrastructure/
CodexModelCatalogAppServerClient.ts
InMemoryCodexModelCatalogCache.ts
preload/
index.ts
renderer/
adapters/
codexModelCatalogViewModel.ts
hooks/
useCodexModelCatalog.ts
ui/
CodexModelEffortHint.tsx
Rules:
core/domainhas all normalization and validation rules.main/infrastructureis the only layer that knows JSON-RPC method names.- renderer never receives raw app-server rows.
- app shell imports only public feature entrypoints.
App-server lifecycle
Use the existing CodexAppServerSessionFactory.
Request sequence:
- Spawn
codex app-server. - Send
initializewithclientInfoand capabilities. - Send
initialized. - Request
model/list. - Drain or ignore notifications safely.
- Close stdin and terminate the process on completion or timeout.
Recommended timeouts:
- initialize:
6000ms model/list:4500ms- total model catalog read:
9000ms
Recommended pagination:
- request
limit: 100,includeHidden: falsefor normal UI - follow
nextCursoruntilnull - hard-stop after 5 pages to avoid runaway loops
- log a degraded catalog warning if the hard-stop is hit
Single-session snapshot policy
Provider status currently risks multiple sequential app-server starts:
- account read
- rate limits read
- future model list read
This caused slow provider loading in earlier UI work, so the plan should not add another app-server spawn in the hot path.
Preferred design:
- keep
codex-model-catalogas a separate feature for ownership - add an optional combined Codex control-plane read in composition
- when provider status refresh needs account plus rate limits plus model catalog, use one app-server session and issue all three requests inside it
- each sub-result has independent soft-failure state
- total snapshot can be partially healthy
Snapshot shape:
export interface CodexControlPlaneSnapshot {
binary: {
path: string;
version: string | null;
};
account: CodexAccountSnapshotResult;
rateLimits: CodexRateLimitsSnapshotResult;
modelCatalog: CodexModelCatalogSnapshotResult;
configScope: {
cwd: string | null;
profileName: string | null;
projectTrust: 'trusted' | 'untrusted' | 'unknown';
configReadSupport: CodexConfigReadSupport;
effectiveConfigFingerprint: string | null;
launchOverridesFingerprint: string | null;
activeModelCatalogFileFingerprint: string | null;
};
initialize: {
codexHome: string;
platformFamily: string;
platformOs: string;
};
fetchedAt: string;
}
Soft-failure rules:
- account failure must not erase a fresh cached model catalog
- model catalog failure must not mark ChatGPT account disconnected
- rate-limit failure must not hide model picker options
- if app-server initialize fails, all three sub-results are degraded from the same root cause
Required correction to the existing account flow:
- current
CodexAccountAppServerClient.readAccount()andreadRateLimits()each open their own app-server process - adding a third standalone
readModelCatalog()would be a Provider Settings latency regression - implement a combined app-server read path before wiring catalog into provider refresh
- keep separate methods for mutations and focused tests, but use the combined path for normal status refresh
- enrich
JsonRpcStdioClienterrors before catalog integration so the combined reader can classifymodel/listmethod failures without losing account truth
Recommended application service shape:
export interface CodexControlPlaneReader {
readSnapshot(options: CodexControlPlaneReadOptions): Promise<CodexControlPlaneSnapshot>;
}
This can live in codex-model-catalog composition or in a small shared Codex control-plane composition module. Do not put model normalization inside codex-account.
Read scope:
- Provider Settings global refresh can pass
cwd=null. - Create/Launch dialogs should pass the selected absolute
cwd. - Relaunch/restore should pass the team's persisted project path.
- Scheduled launch validation should pass
schedule.launchConfig.cwd. - If a future UI supports Codex profile selection, the same profile must be passed to preview and launch.
Contracts
App-server protocol types
Add protocol DTOs to src/main/services/infrastructure/codexAppServer/protocol.ts:
export type CodexAppServerReasoningEffort =
| 'none'
| 'minimal'
| 'low'
| 'medium'
| 'high'
| 'xhigh';
export interface CodexAppServerReasoningEffortOption {
reasoningEffort: CodexAppServerReasoningEffort;
description?: string | null;
}
export type CodexAppServerInputModality = 'text' | 'image' | string;
export interface CodexAppServerModel {
id: string;
model: string;
displayName: string;
description?: string | null;
hidden: boolean;
supportedReasoningEfforts: CodexAppServerReasoningEffortOption[];
defaultReasoningEffort: CodexAppServerReasoningEffort;
inputModalities?: CodexAppServerInputModality[] | null;
supportsPersonality?: boolean | null;
isDefault: boolean;
upgrade?: string | null;
upgradeInfo?: unknown;
availabilityNux?: unknown;
}
export interface CodexAppServerModelListParams {
cursor?: string | null;
limit?: number | null;
includeHidden?: boolean | null;
}
export interface CodexAppServerModelListResponse {
data: CodexAppServerModel[];
nextCursor: string | null;
}
export interface CodexAppServerConfigReadParams {
cwd?: string | null;
profile?: string | null;
}
export interface CodexAppServerConfigReadResponse {
config: Record<string, unknown>;
origins: Record<string, unknown>;
}
config/read caller rule:
- always pass a params object, even when empty
- call global config as
config/readwith{} - call project scope as
config/readwith{ cwd } - call profile scope as
config/readwith{ profile } - if both cwd and profile are needed, test
{ cwd, profile }in Phase 1 and record the behavior before enabling profile-aware UI
Domain model
Use separate ids:
catalogId: app-serverid, stable identity for React keys, telemetry, and dedupelaunchModel: app-servermodelwhen non-empty, otherwiseid
Reason:
- local probe currently returned equal values, but official schema exposes both fields, so they can diverge later.
- using
idfor launch would be a latent bug if Codex introduces a display/catalog alias.
export interface CodexCatalogModel {
catalogId: string;
launchModel: string;
displayName: string;
description: string | null;
hidden: boolean;
isDefault: boolean;
supportedReasoningEfforts: CodexReasoningEffort[];
defaultReasoningEffort: CodexReasoningEffort | null;
inputModalities: CodexInputModality[];
supportsPersonality: boolean;
upgrade: string | null;
source: 'app-server' | 'static-fallback';
}
Normalization rules:
- reject rows without a usable
id - derive
launchModelfrommodel || id - default missing
inputModalitiesto['text', 'image']for older catalogs - default missing
supportsPersonalitytofalse - accept documented
supportedReasoningEffortsobjects withreasoningEffort - defensively accept string effort entries in tests, because older generated local types and live clients can drift
- drop duplicate
catalogIdrows after the first visible row - drop duplicate
launchModelrows after the first visible row unless a hidden row is the only available row - keep unknown effort strings out of the selectable UI, but preserve them in diagnostics
- if no model is marked
isDefault, choose static fallback default only as degraded fallback and label it as such
Provider status contract
Add an optional rich catalog to CliProviderStatus:
export interface CliProviderModelCatalog {
schemaVersion: 1;
source: 'app-server' | 'static-fallback' | 'unavailable';
status: 'ready' | 'stale' | 'degraded' | 'unavailable';
fetchedAt: string | null;
staleAt: string | null;
binary?: {
path: string | null;
version: string | null;
};
authScope?: {
preferredAuthMode: 'auto' | 'chatgpt' | 'api_key' | null;
effectiveAuthMode: 'chatgpt' | 'api_key' | null;
forcedLoginMethod?: 'chatgpt' | 'api_key' | null;
managedAccountHash?: string | null;
forcedWorkspaceHash?: string | null;
apiKeySource?: string | null;
};
launchScope?: {
cwd: string | null;
profileName: string | null;
projectTrust: 'trusted' | 'untrusted' | 'unknown';
configFingerprint: string | null;
launchOverridesFingerprint: string | null;
};
errorMessage?: string | null;
defaultModelId?: string | null;
defaultLaunchModel?: string | null;
models: CliProviderModelInfo[];
}
export interface CliProviderModelInfo {
catalogId: string;
launchModel: string;
displayName: string;
description?: string | null;
hidden?: boolean;
isDefault?: boolean;
supportedReasoningEfforts?: CliProviderReasoningEffort[];
defaultReasoningEffort?: CliProviderReasoningEffort | null;
inputModalities?: string[];
supportsPersonality?: boolean;
upgrade?: string | null;
}
export type CliProviderReasoningEffort =
| 'none'
| 'minimal'
| 'low'
| 'medium'
| 'high'
| 'xhigh'
| 'max';
export interface CliProviderRuntimeCapabilities {
schemaVersion: 1;
codex?: {
supportsDynamicAppServerModels: boolean;
supportsCodexReasoningEffortConfig: boolean;
supportedCodexReasoningEfforts: Array<'minimal' | 'low' | 'medium' | 'high' | 'xhigh'>;
acceptsProviderExplicitFutureModels: boolean;
};
}
Backwards compatibility:
- keep
CliProviderStatus.models: string[] - add
CliProviderStatus.runtimeCapabilities?: CliProviderRuntimeCapabilities - for Codex, derive
modelsfrommodelCatalog.models.map(model => model.launchModel) - for Anthropic and Gemini, do not require
modelCatalog - old renderers continue to work from
models - new renderers prefer
modelCatalogwhen present - never put team-agent disabled policy directly into
CliProviderModelCatalog; catalog describes Codex availability, while Agent Teams policy is applied by renderer and launch validators - never infer launch capability only from catalog presence
Renderer integration hotspot:
- update
TeamModelRuntimeProviderStatusinsrc/renderer/utils/teamModelAvailability.tsto includemodelCatalog - update
getRuntimeSelectorModels()to usemodelCatalog.models[*].launchModelfor Codex - update
getAvailableTeamProviderModelOptions()to map rich Codex options with display labels, default badge, and catalog diagnostics - keep Anthropic path on
getFallbackTeamProviderModelOptions() - keep Gemini path on existing
models: string[]until Gemini has a richer catalog
Team launch effort contract
Do not add a separate per-provider lane for this feature.
Use existing team-level model/provider selection, but make effort provider-aware.
Recommended implementation:
- keep persisted field name
effort - widen internal effort type to
ProviderReasoningEffort - add provider/model validators at every launch boundary
- Anthropic UI only shows
low | medium | high - Codex UI shows only the selected model's
supportedReasoningEfforts - orchestrator accepts
minimal | low | medium | high | xhighfor Codex andlow | medium | high | maxfor Anthropic paths
Existing validator hotspots:
src/shared/types/team.tscurrently definesEffortLevel = 'low' | 'medium' | 'high'src/main/ipc/teams.tscurrently validates onlylow | medium | highsrc/main/http/teams.tscurrently validates onlylow | medium | highsrc/renderer/components/team/dialogs/EffortLevelSelector.tsxcurrently hardcodes onlyDefault | Low | Medium | HighLaunchTeamDialog,CreateTeamDialog, member draft rows, and member editor utilities currently cast strings withas EffortLevel
Required migration:
- replace unsafe
as EffortLevelcasts with a provider-aware normalization function - parse provider before parsing effort in IPC and HTTP paths
- validate lead effort against lead provider/model
- validate member effort against each member's resolved provider/model
- keep old persisted
low | medium | highvalues readable without migration
Validation rule:
provider=codex:
effort must be in selectedModel.supportedReasoningEfforts
provider=anthropic:
effort must be low | medium | high
provider=gemini:
keep current behavior unless Gemini gets a richer effort contract
Important:
- do not map Anthropic
maxto Codexxhigh - do not map Codex
xhighto Anthropicmax - if selected Codex model changes and old effort is unsupported, reset to the new model's
defaultReasoningEffort - if catalog is unavailable, only allow static fallback efforts that are proven launchable
Launch identity rule:
effortis user selectionresolvedEffortis what launch sends to runtime- if user selection is empty/default,
resolvedEffortcomes from app-serverdefaultReasoningEffort - if resolved effort equals app-server default, runtime transport may omit
model_reasoning_effort, but exact logs still record the resolved value
Runtime Launch Transport
This was the highest-risk area in the earlier plan. The corrected plan is explicit.
Facts:
codex exechas--model.codex exechas-c, --config key=value.codex exechas no documented--effort.- Codex config has
model_reasoning_effort. model_reasoning_effortsupportsminimal | low | medium | high | xhigh.
Therefore:
- Codex native launch must not pass
--effort xhighto Codex CLI. - Orchestrator may keep accepting
--effortas its public Agent Teams flag. - When provider is Codex native, orchestrator must translate accepted effort into
codex exec -c model_reasoning_effort="value". - When no effort is selected, omit
model_reasoning_effortand let Codex use its model default. - When effort equals the selected model's app-server default, either omit it or pass it consistently, but pick one policy and test it.
Recommended policy:
- omit effort when it equals app-server
defaultReasoningEffort - pass effort only when user explicitly selected a non-default value
Reason:
- this tracks Codex defaults as Codex evolves
- exact logs remain cleaner
- future app-server default changes are not blocked by stale persisted values
Live signoff command shape:
codex exec --json --model gpt-5.4 -c model_reasoning_effort='"xhigh"' --skip-git-repo-check --ephemeral "Return only: ok"
Quoting requirement:
- command builder must pass
-candmodel_reasoning_effort="xhigh"as separate argv entries - shell-rendered exact logs can show
-c model_reasoning_effort='"xhigh"' - tests should assert argv arrays, not only shell strings
- never concatenate user-controlled effort into a shell string without argv escaping
Prelaunch validation must block:
gpt-5.1-codex-miniwithlowgpt-5.1-codex-miniwithxhigh- unknown effort strings from app-server until explicitly supported by our UI and orchestrator type
Static Fallback
Fallback stays necessary because:
- user may have an older Codex binary
- app-server may fail to initialize
- app-server may start but not support
model/list - offline usage should not make the entire model picker empty
- tests should not depend on live Codex availability
Fallback rules:
- fallback source is explicitly marked
static-fallback - fallback never claims to be current
- fallback has a short visible warning in Provider Settings only when user is choosing Codex models
- fallback model list should be minimal and conservative
- fallback must not include newly guessed future models
- fallback caused by missing
model/listshould include an upgrade hint tied to the detected Codex binary version when available
Recommended fallback models:
gpt-5.4gpt-5.4-minigpt-5.3-codexgpt-5.2gpt-5.1-codex-mini
Fallback effort rules:
- use
medium | highforgpt-5.1-codex-mini - use
low | medium | high | xhighfor known models only if live signoff confirmsmodel_reasoning_effortpass-through - otherwise fallback UI can show richer metadata but disable non-launchable options
API-key mode note:
- do not use OpenAI
/v1/modelsas the primary Codex picker for subscription-backed Codex - optional API
/v1/modelsfallback is allowed only for explicit API-key mode diagnostics - if API
/v1/modelsdisagrees with Codex app-servermodel/list, Codex app-server wins for native Codex execution - reason: the actual runtime surface is
codex exec, and app-server describes what Codex clients should show
Cache And Refresh
Goal:
- make model updates feel fresh without making Provider Settings slow or flaky.
Main-process cache:
- key: Codex binary path plus Codex binary version plus Codex home plus launch cwd/profile/config fingerprint plus preferred auth mode plus effective auth mode plus managed account hash plus API-key source
- success TTL:
10 minutes - stale TTL:
24 hours - in-flight dedupe: one live
model/listrequest per key - manual refresh bypasses success TTL but still dedupes in-flight work
- auth mode change invalidates the ready cache for UI selection purposes
forced_login_methodand forced workspace changes invalidate the affected auth/catalog scope- logout clears ChatGPT-scoped catalog cache
- API key source change clears API-key-scoped catalog cache
- project
.codex/config.toml, globalconfig.toml, ormodel_catalog_jsonchanges clear the affected scope when detected by fingerprint change - binary path or version change clears all Codex model catalog cache entries
Renderer cache:
- consume
CliProviderStatus.modelCatalog - no independent polling loop in the model picker
- refresh through existing provider status refresh action
Dashboard policy:
- do not run
model/liston every dashboard render - use existing provider status refresh cadence
- model catalog stale state can be shown only inside settings/model picker, not as a scary dashboard error
- dashboard catalog is a global/default-scope summary, not a promise that every project cwd has the same catalog
Provider Settings policy:
- open dialog with cached provider status immediately
- refresh in background
- show
Checking...only for the area still being refreshed - never replace a ready catalog with empty state during a refresh
Avoid this bug:
- do not set global provider status to
unavailablewhile only the model catalog refresh is pending - do not replace a ChatGPT-ready account state with a catalog timeout
- do not show generic
Unknown error; preserve app-server method, timeout, and fallback source in diagnostics - if
autoresolves to ChatGPT, API-key detection copy stays secondary - if
autoresolves to API key because ChatGPT is unavailable, show why ChatGPT was skipped before showing API-key catalog
UI Behavior
Model picker
When provider=codex:
- prefer
providerStatus.modelCatalog.models - option value is
launchModel - React key can use
catalogId - label uses
displayName - default badge uses
isDefault - hidden app-server models are excluded from normal selector unless already persisted in a team
- disabled state uses existing Agent Teams policy plus app-server
upgradehints - runtime-capability state controls whether a visible model is launchable
- fallback badge says
Using fallback catalogonly when source is fallback - if app-server says a model is available but Agent Teams disables it, show
Available in Codex, disabled for Agent Teams - if app-server says a future model exists but runtime capability is missing, show
Available in Codex, waiting for Agent Teams runtime support - if a persisted model is missing from current catalog, show it as
Unavailable in current Codex catalogand require user confirmation before relaunch - if the dialog has a selected cwd and only a global catalog is available, show global options as provisional until project-scoped catalog finishes
- if project-scoped catalog differs from global catalog, keep the user's explicit selection only if it exists in the project-scoped catalog or is a preserved persisted value
When catalog is loading:
- keep previous options visible
- show a subtle "Refreshing models" state
- do not show an empty Codex picker unless no cached or fallback models exist
- label provisional global catalog rows as
Checking this project...when launch cwd is known
When catalog fails:
- use stale cache if present
- otherwise use static fallback
- show the app-server error in diagnostics, not as a generic unknown error
Effort selector
When provider=codex and selected model has catalog metadata:
- show efforts from
supportedReasoningEfforts - mark
defaultReasoningEffortas default - include
xhighif returned by app-server and runtime capability says Codex effort config pass-through is supported - if runtime capability is missing, show Codex-only efforts as metadata or disabled rows, not selectable launch values
- if selected effort is no longer valid, reset to default with a small explanation
- if model is Agent Teams-disabled, keep effort selector read-only or disabled to avoid suggesting launchability
When selected model has no catalog metadata:
- show only safe fallback efforts
- do not show
xhighunless launch pass-through is implemented and tested
When provider=anthropic:
- keep current selector behavior
- do not show Codex-only
minimal,none, orxhigh - do not change Anthropic copy
Default model
Recommended behavior:
- app-server
isDefaultdefines the Codex default in UI - "Default" label can render as
Default (gpt-5.4)orDefault (GPT-5.4)when catalog is ready - new Codex teams can display
Default, but launch must resolve it to a concreteresolvedLaunchModel - existing teams keep their persisted model unless user changes it
- do not rewrite old team metadata just because app-server default changed
- exact logs and team metadata should record both selected
Defaultand concrete resolved model
Reason:
- new teams benefit from current Codex defaults
- existing teams remain explainable even if Codex default changes later
Orchestrator Changes
Model status
Short term:
- keep static
CODEX_MODELSfor standalone fallback and non-app UI compatibility - add richer status only if orchestrator can read app-server directly without slowing CLI startup
Recommended first cut:
claude_teamowns app-server model catalog for UI- orchestrator keeps static runtime status until a dedicated orchestrator catalog source is added
- launch validation accepts provider-explicit Codex model strings even if not in static
CODEX_MODELS - orchestrator exposes runtime capabilities for dynamic Codex model ids and Codex reasoning effort config pass-through
Reason:
- UI is where the dynamic picker is needed immediately
- orchestrator should not reject a future model that Codex app-server already exposed and
claude_teamselected - UI should not guess whether the current runtime can launch that future model
Validation
Update validation so:
- provider-explicit
codexlaunches can use model strings from app-server catalog - unknown model strings are not guessed as Codex without provider context
- static
isCodexModel()remains valid for generic detection, not authoritative for provider-explicit launches - if provider context is missing, keep existing conservative static validation
Effort transport
Update orchestrator:
- accept Codex efforts
minimal | low | medium | high | xhigh - preserve Anthropic
max - in Codex native executor, convert Codex effort to
-c model_reasoning_effort='"value"' - do not pass unsupported effort values to
codex exec - exact logs should show the selected effort as normalized Agent Teams metadata and the actual Codex config override
Required tests:
- Codex native
xhighbecomes-c model_reasoning_effort='"xhigh"' - no effort omits
model_reasoning_effort - Anthropic
maxremains Anthropic-only - Codex
maxis rejected - Anthropic
xhighis rejected
Concrete Implementation Touchpoints
claude_team:
src/main/services/infrastructure/codexAppServer/protocol.ts- add app-server model DTOssrc/main/services/infrastructure/codexAppServer/JsonRpcStdioClient.ts- preserve JSON-RPC error code, method, and detailssrc/main/services/infrastructure/codexAppServer/CodexBinaryResolver.tsor a nearby service - expose binary version for cache invalidationsrc/features/codex-model-catalog- new feature for catalog domain, use case, app-server source, fallback source, and cachesrc/features/codex-account/main/composition/createCodexAccountFeature.ts- coordinate combined control-plane snapshot or delegate to shared readersrc/features/codex-account/renderer/mergeCodexProviderStatusWithSnapshot.ts- preserve account truth while merging model catalog truthsrc/shared/types/cliInstaller.ts- add optional provider model catalogsrc/shared/types/team.ts- widen provider-aware effort types without breaking old persisted valuessrc/shared/types/schedule.ts- prevent scheduled launches from dropping Codex-specific effortssrc/main/services/team/TeamDataService.ts- preserve provider-aware effort and launch identity when reconstructing team statesrc/main/services/team/TeamMembersMetaStore.ts- stop filtering Codex efforts down to legacylow | medium | highsrc/main/services/team/TeamBackupService.tsand restore paths - preserve additive launch identity and tolerate old backupssrc/main/services/runtime/CliProviderModelAvailabilityService.ts- keep runtime verification compatible withlaunchModelvalues and do not verify hidden/catalog-only rows by accidentsrc/main/ipc/teams.tsandsrc/main/http/teams.ts- parse provider first, then validate effortsrc/renderer/utils/teamModelAvailability.ts- consume rich Codex catalogsrc/renderer/utils/teamModelCatalog.ts- demote Codex static list to fallback and labels onlysrc/renderer/components/team/dialogs/EffortLevelSelector.tsx- make options provider/model-awaresrc/renderer/components/team/dialogs/LaunchTeamDialog.tsxandCreateTeamDialog.tsx- remove unsafe effort casts and persist resolved launch identity- member draft/editor components - validate per-member resolved provider/model/effort
- renderer launch prefill and draft retry storage - add a versioned launch identity payload and tolerate old entries
agent_teams_orchestrator:
src/entrypoints/sdk/runtimeTypes.ts- add provider-aware Codex effort supportsrc/main.tsx- update--effortparser or provider-specific validation pathsrc/utils/effort.tsandsrc/utils/providerEffort.ts- separate Anthropicmaxfrom Codexxhigh- Codex native executor path - convert effort to
-c model_reasoning_effort src/utils/model/codex.ts- rename static list semantics to fallback/static detectionsrc/utils/model/validateModel.ts- allow provider-explicit Codex app-catalog models- runtime status/capability endpoint - expose dynamic Codex model and effort pass-through support
- exact-log/runtime status code - record selected model, resolved model, selected effort, resolved effort, and config override
Phased Implementation
Phase 0 - contracts and live spike
Commit boundary: docs(codex): plan app-server model catalog
Tasks:
- add this plan
- keep live probe output in signoff notes or test fixture
- confirm installed Codex supports
model/list - confirm one app-server session can read account, rate limits, and model catalog
- confirm docs support
model_reasoning_effort - decide exact shell quoting for
-c model_reasoning_effort - capture fixtures for at least two catalog shapes: current live shape and synthetic
id !== model - capture current Codex binary version and document cache invalidation expectations
Acceptance:
- plan exists in the dedicated worktree
- no code behavior changes
- weak areas are explicitly called out
Phase 1 - app-server model catalog feature
Commit boundary: feat(codex): add app-server model catalog source
Tasks:
- add structured JSON-RPC request errors with method/code/details
- expose or probe Codex binary version for catalog cache keys
- add effective config fingerprint support using app-server
config/readwhen available - add
config/readsupport detection and always send{}params at minimum - add
src/features/codex-model-catalog - add app-server protocol types
- add
CodexModelCatalogAppServerClient - add normalization domain rules
- add static fallback source
- add in-memory cache with TTL and in-flight dedupe
- include launch scope fields in cache keys: cwd, profile, trust, config fingerprint, launch override fingerprint
- include forced login method and forced workspace hash in auth-scoped cache keys
- normalize both documented effort option objects and defensive string effort values
- classify
method not found, timeout, malformed response, and empty catalog separately - add structured diagnostics without raw account email or secret-bearing env values
- expose feature facade from main composition
Acceptance:
- JSON-RPC
method not foundcan be detected in tests - binary version changes invalidate catalog cache
- config fingerprint changes invalidate catalog cache for that scope
- forced login/workspace changes invalidate account, limits, and catalog cache for that scope
- unit tests cover normalization, fallback, pagination, duplicate ids, missing modalities, unknown effort strings, and
id !== model - app-server client tests cover
model/listrequest params and timeout labels - method-not-found falls back without marking account disconnected
- diagnostics include source, status, method, error category, binary version, effective auth mode, and cache age
- no renderer behavior changes yet
Phase 2 - provider status integration
Commit boundary: feat(runtime): expose codex model catalog metadata
Tasks:
- add optional
modelCatalogtoCliProviderStatus - add optional
runtimeCapabilitiestoCliProviderStatus - merge Codex model catalog into provider status
- keep
models: string[]derived fromlaunchModel - make provider refresh use cached, auth-scoped catalog
- implement combined account/rate-limits/catalog app-server read for normal refresh
- avoid extra app-server session in hot paths where account snapshot already refreshes
- clear ChatGPT-scoped catalog on logout and API-key-scoped catalog when API key source changes
- clear all catalog entries when Codex binary path or version changes
- ensure
autocatalog scope follows effective launch auth mode, not just configured preference - add request/snapshot versioning so stale refresh responses cannot overwrite newer auth state
- support global provider refresh and project-scoped launch refresh as different catalog scopes
- preserve Anthropic provider status shape
Acceptance:
- Codex provider status includes
modelCatalog - Codex provider status includes runtime capability metadata when available
- old
modelsstill works autowith ChatGPT ready uses ChatGPT-scoped catalog even if API key is detectedautowith ChatGPT unavailable and API key ready uses API-key-scoped catalog with clear degraded copy- forced login method overrides are reflected in effective auth copy and cache scope
- one normal Codex provider refresh does not spawn separate app-server processes for account, limits, and catalog
- Anthropic snapshots are byte-for-byte equivalent except ordering noise already present
- provider dashboard does not block on a slow catalog refresh when stale cache exists
- older refresh results are ignored after auth mode or runtime capability changes
- global dashboard catalog and project launch catalog do not overwrite each other
Phase 3 - dynamic UI model picker and effort selector
Commit boundary: feat(codex): use dynamic model catalog in team launch UI
Tasks:
- update Codex model picker to prefer rich catalog
- show app-server labels, default badge, and fallback source state
- update effort selector to be provider/model-aware
- show
xhighmetadata only for Codex models that return it - make
xhighselectable only when runtime capability says Codex effort config pass-through is supported - hide Codex-only efforts for Anthropic
- reset invalid effort on model change
- preserve missing persisted models as visible warning rows instead of silently clearing selection
- keep Agent Teams disabled policy separate from Codex app-server availability
- show future app-server models immediately, with
New from Codex catalogstatus when policy has not verified them yet - when cwd is selected, refresh project-scoped Codex catalog before enabling launch-only controls
Acceptance:
gpt-5.1-codex-minishows onlymedium | highgpt-5.3-codex-sparkdefaults tohighgpt-5.4showslow | medium | high | xhighas catalog metadataxhighis disabled with runtime-upgrade copy until capability support is present- app-server-visible but Agent Teams-disabled model shows disabled copy, not unavailable copy
- synthetic future
gpt-5.5fixture appears without touching static catalog - persisted model missing from current catalog is visible with a warning
- Anthropic UI remains
low | medium | high - static fallback still renders when app-server is unavailable
- global catalog can be displayed provisionally, but launch enablement waits for project-scoped catalog or explicit degraded confirmation
Phase 4 - launch validation and Codex effort pass-through
Commit boundary: feat(runtime): pass codex reasoning effort through native exec
Tasks:
- widen team launch effort validation with provider-specific rules
- update IPC and HTTP validators
- update
TeamProvisioningServicerequest shaping - persist additive
ProviderModelLaunchIdentityinto team metadata, exact-log metadata, and backup/restore payloads where launch identity is reconstructed - update orchestrator parser and runtime types
- expose orchestrator runtime capability metadata for dynamic Codex models and Codex effort config
- translate Codex effort to argv entries
['-c', 'model_reasoning_effort="value"'] - keep Anthropic
maxseparate - add exact-log metadata for selected model, resolved launch model, catalog source, selected effort, and resolved effort
- resolve
Defaultto concrete launch model before provisioning - update scheduled/provisioned launch paths or block Codex-only efforts in those paths until updated
- enforce built-in OpenAI Codex provider scope or block custom/OSS provider configs with clear copy
- pass profile/cwd/config overrides consistently between preview and
codex exec
Acceptance:
- Codex
xhighlaunch reachescodex execasmodel_reasoning_effort - Codex
maxis rejected before launch - Anthropic
xhighis rejected before launch - unsupported model-effort pairs are blocked before provisioning
- provider-explicit synthetic future model is accepted only when runtime capability says dynamic Codex models are supported
- member metadata, team metadata, draft retry, and backup/restore preserve provider-aware effort
- replay/exact logs show what was selected, what default resolved to, and what was passed to Codex
- exact logs include catalog scope fingerprint and provider scope, but not raw config values
Phase 5 - cleanup and fallback tightening
Commit boundary: refactor(codex): demote static model catalog to fallback
Tasks:
- rename static Codex catalog helpers to make fallback status explicit
- remove UI assumptions that static list is authoritative
- make future provider-explicit Codex ids launchable when selected from app-server catalog
- add diagnostics for catalog source and staleness
- document fallback behavior
- add a fixture/test with synthetic future model
gpt-5.5 - remove any remaining hardcoded Codex model order from the primary Codex UI path
- add hidden-model fixture and upgrade-suggestion fixture
- add one migration test for old localStorage launch prefill without provider model launch identity
- add project-scoped catalog fixture with
model_catalog_json - add custom-provider config fixture
- add forced login method and forced workspace fixtures
- add
config/readmethod-missing and invalid-params fixtures
Acceptance:
- new app-server model can appear in UI without code changes
- static fallback is visible as fallback in diagnostics
- no code path treats static
CODEX_MODELSas the only valid Codex provider model list - synthetic
gpt-5.5appears through app-server fixture and can be selected without touching static catalog - hidden persisted model is preserved with warning and is not introduced into new-team picker
- project-scoped catalog differences are visible and do not corrupt global provider status
- forced login method changes are visible and do not reuse stale catalog/rate-limit scope
Test Plan
claude_team unit tests
Add tests for:
- structured JSON-RPC error classification
- binary version cache invalidation
- effective config fingerprint cache invalidation
config/readsupport detection, including invalid missing params- project-scoped
model_catalog_jsonfixture - app-server model normalization
idvsmodelsplit- default model selection
- per-model effort options
- unknown effort filtering
- auth-scoped catalog cache keys
autoauth resolving to ChatGPT vs API-key catalog scope- combined app-server snapshot partial failures
- method-not-found fallback for older Codex app-server
- fallback catalog source
- stale cache behavior
- stale refresh response is ignored after newer auth-scope request
- global catalog and project-scoped catalog use separate cache entries
- forced login method and forced workspace hash use separate cache entries
- custom/OSS
model_providerconfig is blocked or marked unsupported for Agent Teams Codex - raw managed account email does not appear in catalog diagnostics or exact-log metadata
- provider status
modelscompatibility - provider status runtime capabilities compatibility
- provider model availability uses
launchModel, notcatalogId - renderer model picker with rich catalog
- renderer effort selector with Codex and Anthropic providers
- renderer disables Codex-only efforts when runtime capability is missing
- renderer shows synthetic future model as
New from Codex catalog - renderer preserves hidden persisted model after
includeHidden: truerecovery - persisted missing model warning row
- Agent Teams disabled policy overlay for app-server-visible models
- backup/restore reads old metadata and preserves new launch identity when present
- draft retry and launch prefill read old localStorage entries without dropping provider/model identity
- scheduled launch validation either supports Codex-specific effort or blocks it with explicit error
- launch preview with selected cwd does not enable launch from global-only catalog when project-scoped catalog is still unknown
Suggested commands:
pnpm vitest run \
test/features/codex-model-catalog \
test/features/codex-account \
test/renderer/components/team \
test/renderer/utils/teamModelCatalog.test.ts
agent_teams_orchestrator tests
Add tests for:
- provider-explicit Codex model validation
- Codex effort parser accepts
minimal | low | medium | high | xhigh - Anthropic effort parser keeps existing behavior
- Codex native executor emits
-c model_reasoning_effort - Codex native executor builds argv entries, not unsafe shell concatenation
- no effort omits Codex effort config
maxis not accepted for Codex- synthetic
gpt-5.5passes when provider is explicitly Codex and model came from app catalog - capability payload reports dynamic Codex model support and effort config support
- provider-explicit future model fails closed when capability is disabled
- Codex native exec argv includes cwd/profile/config override semantics that match preview scope
- custom provider config is not silently routed through subscription Codex UX
Suggested command:
pnpm test -- runtimeBackends providerEffort spawnMultiAgent codex
live smoke
Run only when developer has Codex login/API available:
codex app-server
JSON-RPC smoke:
{ "jsonrpc": "2.0", "id": 1, "method": "model/list", "params": { "limit": 20, "includeHidden": false } }
Native exec effort smoke:
codex exec --json --model gpt-5.4 -c model_reasoning_effort='"xhigh"' --skip-git-repo-check --ephemeral "Return only: ok"
Failure smoke:
codex exec --json --model gpt-5.1-codex-mini -c model_reasoning_effort='"xhigh"' --skip-git-repo-check --ephemeral "Return only: ok"
Expected:
- our app should block the second case before launch once catalog metadata is available
- if run manually, Codex may return model/provider-specific error, but product UX should not rely on that late failure
Risks And Mitigations
Risk 1 - app-server startup slows provider settings
🎯 8 🛡️ 8 🧠 5
Mitigation:
- cache model catalog in main process
- dedupe in-flight refreshes
- use stale cache while refreshing
- combine account/rate-limit/catalog reads where possible
- never clear ready UI while refresh is pending
Risk 2 - effort values leak into Anthropic
🎯 9 🛡️ 9 🧠 4
Mitigation:
- provider-specific effort validation
- renderer selector branches by provider and selected model
- tests for Anthropic not showing
xhigh,minimal, ornone - orchestrator rejects invalid provider-effort pairs
Risk 3 - id and model diverge later
🎯 8 🛡️ 9 🧠 3
Mitigation:
- use
catalogIdfor identity - use
launchModelfor runtime - tests with fixture where
id !== model
Risk 4 - app-server catalog has unknown fields or new efforts
🎯 8 🛡️ 8 🧠 5
Mitigation:
- tolerant protocol DTOs
- unknown efforts preserved in diagnostics but not selectable
- add one small allow-list update when product intentionally supports a new effort
- no hard crash on unknown
inputModalities
Risk 5 - static fallback becomes accidentally authoritative again
🎯 7 🛡️ 8 🧠 4
Mitigation:
- name fallback helpers clearly
- include
sourcein model catalog - tests assert app-server source wins over fallback
- UI diagnostics expose fallback source
Risk 6 - launch path accepts model from UI but orchestrator rejects it
🎯 8 🛡️ 8 🧠 6
Mitigation:
- provider-explicit Codex launch validation should trust
provider=codexplus app-server-selected model - static
isCodexModel()remains only a generic detector - exact tests with a future-model fixture like
gpt-5.5
Risk 7 - auth-scoped catalog leaks between modes
🎯 7 🛡️ 9 🧠 6
Mitigation:
- include auth scope in catalog cache key
- clear scoped cache on logout and API-key source changes
- tests for ChatGPT catalog not being reused in API-key mode
- UI labels catalog source and auth scope in diagnostics
Risk 8 - Default becomes nondeterministic across relaunch
🎯 8 🛡️ 9 🧠 6
Mitigation:
- persist selected model kind and resolved launch model in launch identity
- exact logs record both
Defaultand concrete model - relaunch preview shows current default resolution before launch
- do not silently rewrite old explicit models
Risk 9 - older Codex binary lacks model/list
🎯 7 🛡️ 9 🧠 5
Mitigation:
- preserve JSON-RPC error code and method
- classify method-not-found separately from app-server failure
- show static fallback with Codex upgrade hint
- cache key includes binary version so upgrades refresh the catalog
Risk 10 - auto auth shows the wrong catalog
🎯 7 🛡️ 9 🧠 6
Mitigation:
- resolve effective auth mode before catalog scope
- keep ChatGPT and API-key catalogs separate
- UI copy distinguishes selected preference, effective launch mode, and fallback credentials
- tests cover ChatGPT-ready + API-key-present and ChatGPT-missing + API-key-ready cases
Risk 11 - UI enables a capability the installed runtime cannot launch
🎯 7 🛡️ 10 🧠 7
Mitigation:
- add explicit runtime capability metadata
- display catalog metadata separately from launch enablement
- fail closed when capability is missing or stale
- test Phase 3 UI against a pre-Phase-4 runtime fixture
Risk 12 - future models appear but break team-agent behavior
🎯 8 🛡️ 8 🧠 6
Mitigation:
- split Codex catalog availability from Agent Teams policy status
- show new models as
New from Codex catalog - block only hard incompatibilities: runtime capability missing, unsupported modality, disabled policy, unsupported effort
- exact logs record new-model status for later debugging
Risk 13 - hidden or upgraded persisted models are silently lost
🎯 8 🛡️ 9 🧠 5
Mitigation:
- run one
includeHidden: truelookup for persisted explicit models missing from visible catalog - preserve model value during restore and relaunch preview
- show upgrade suggestions without auto-rewriting metadata
- test hidden-model and upgrade fixtures
Risk 14 - non-dialog launch path drops Codex effort
🎯 7 🛡️ 9 🧠 7
Mitigation:
- audit team metadata, members metadata, backup/restore, draft retry, launch prefill, and schedule types
- parse provider before parsing effort at every main-process boundary
- block Codex-only effort in any path not updated in the same phase
- add tests outside React launch dialogs
Risk 15 - HMR or slow refresh overwrites correct provider state
🎯 8 🛡️ 9 🧠 5
Mitigation:
- add request/snapshot versioning
- ignore out-of-order provider status responses
- do not let catalog failures overwrite account truth
- keep last ready state visible while a refresh is pending
Risk 16 - global catalog preview differs from project launch catalog
🎯 6 🛡️ 10 🧠 8
Mitigation:
- include cwd, profile, trust, config fingerprint, and launch override fingerprint in catalog scope
- use app-server
config/readwhen available to derive effective config - keep dashboard/global catalog separate from launch/project catalog
- require project-scoped catalog before enabling launch-only controls when cwd is known
Risk 17 - custom or OSS Codex config is mistaken for subscription Codex
🎯 7 🛡️ 9 🧠 7
Mitigation:
- keep Agent Teams Codex scoped to built-in OpenAI Codex provider
- detect effective
model_providerwhen possible - block or degrade custom/OSS provider configs with explicit copy
- do not show ChatGPT account limits for custom provider execution
Risk 18 - non-text model row appears in catalog
🎯 8 🛡️ 9 🧠 4
Mitigation:
- require
textinput modality for Agent Teams launch - treat missing
inputModalitieswith the documented backward-compatible default - do not claim personality support when
supportsPersonality=false
Risk 19 - experimental app-server surface changes behavior
🎯 8 🛡️ 9 🧠 4
Mitigation:
- keep
experimentalApi=false - rely only on documented stable
model/listfields - ignore unknown fields unless a typed use case is added
Risk 20 - app-server catalog passes but native exec fails
🎯 8 🛡️ 10 🧠 6
Mitigation:
- treat app-server catalog as picker truth, not full launch proof
- require Phase 4 native exec argv tests and live smoke where possible
- test model, effort, cwd, profile, and provider scope together
- block unsupported model-effort pairs before
codex exec
Risk 21 - config/read behavior differs across Codex versions
🎯 7 🛡️ 9 🧠 6
Mitigation:
- feature-detect
config/read - always send
{}params at minimum - classify method-missing, invalid-params, scoped-failure, and global-success separately
- never make config-read failure disconnect the Codex account
Risk 22 - forced login/workspace reuses stale catalog
🎯 7 🛡️ 10 🧠 6
Mitigation:
- include forced login method and forced workspace hash in auth scope
- invalidate account, limits, and catalog together when either changes
- display forced auth copy instead of showing conflicting selected auth copy
- redact workspace ids in logs and diagnostics
Risk 23 - local model_catalog_json changes without config change
🎯 6 🛡️ 9 🧠 7
Mitigation:
- hash effective config and optionally active catalog file mtime when app-server exposes enough origin data
- keep TTL/manual refresh fallback when origin data is unavailable
- do not parse or log arbitrary catalog file contents
- do not apply untrusted project-scoped catalog files unless effective config says they are active
Definition Of Done
The feature is done when:
- Codex model picker uses app-server
model/listwhen available. - New app-server-visible Codex models appear without app code changes.
supportedReasoningEffortsanddefaultReasoningEffortdrive Codex effort UI.xhighappears only where Codex reports it.- Anthropic UI and launch behavior are unchanged.
- Codex launches pass effort through
model_reasoning_effort. - UI launch controls are gated by runtime capabilities, not by catalog metadata alone.
- Future app-server-visible models appear without code changes and are marked as new until policy/runtime support is clear.
DefaultCodex selection resolves to concrete launch identity before provisioning.- Auth changes do not reuse stale model catalogs across ChatGPT and API-key modes.
- Project-scoped Codex config and
model_catalog_jsoncannot make launch use a different catalog than preview without explicit degraded copy. - Custom or OSS Codex provider config is not silently presented as ChatGPT subscription-backed Agent Teams Codex.
config/readcompatibility is feature-detected and never breaks account truth on older binaries.- Forced login method and forced workspace changes cannot reuse stale account, rate-limit, or catalog cache.
- Codex binary upgrades invalidate stale catalog cache and retry
model/list. - Older Codex binaries without
model/listfall back without breaking account state. - Static Codex catalog is clearly fallback, not primary truth.
- Hidden persisted models are preserved with explicit warnings.
- Backup/restore, draft retry, launch prefill, member metadata, and scheduled paths do not drop provider-aware effort.
- Exact logs and diagnostics do not persist raw account identifiers or secret values.
- Exact logs include catalog scope and provider scope fingerprints for debugging preview vs launch mismatch.
- HMR and out-of-order refreshes do not replace ready provider status with stale fallback/error state.
- Provider Settings remains fast and does not show transient empty/error states during refresh.
- Tests cover catalog source, fallback, effort validation, and launch pass-through.
Final Signoff And Handoff
The implementation is now ready for review after these checks stay green:
claude_team:pnpm typecheckclaude_team: targeted catalog/runtime/team provisioning Vitest suitesagent_teams_orchestrator_codex_native_spike: targeted Codex native exec and runtime capability Bun suites- Live
codex app-server model/listsmoke against the installed Codex binary - Optional UI smoke with
CLAUDE_DEV_RUNTIME_ROOT=/Users/belief/dev/projects/claude/agent_teams_orchestrator_codex_native_spike
Merge requirement:
- merge/pair the
claude_teambranch with theagent_teams_orchestrator_codex_native_spikeruntime capability change. - if the UI branch is merged without the runtime capability change, the feature remains safe but conservative: dynamic future Codex models and
xhighare visible as catalog metadata but blocked for launch. - if the runtime capability change is merged without the UI branch, existing Codex native behavior remains unchanged except for the explicit runtime status payload and
xhighexact argv support already covered by tests.
Recommended final manual smoke:
CLAUDE_DEV_RUNTIME_ROOT=/Users/belief/dev/projects/claude/agent_teams_orchestrator_codex_native_spike pnpm dev
Then verify:
- Provider Settings Codex model list is populated from app-server catalog.
gpt-5.1-codex-minishows onlymedium | high.gpt-5.4showslow | medium | high | xhigh.- Anthropic does not show
minimal,none, orxhigh. - A synthetic or newly released Codex model is not silently hidden by static UI code.
- Launch logs include selected model, resolved launch model, selected effort, resolved effort, catalog source, and runtime capability truth.