35 KiB
Phase 3 - Codex native image attachment delivery
⚠️ Историческая документация: Этот файл содержит дизайн-документацию. Фактическая реализация в src/features/agent-attachments/ может отличаться от описанной архитектуры.
Summary
Goal: support image attachments for Codex native teammates by using Codex CLI's supported --image <FILE> transport rather than embedding base64 in prompt text.
Chosen approach: optimized image artifact files + Codex native exec args extension + text-only fallback errors for unsupported attachment kinds.
🎯 8.6 🛡️ 8.4 🧠 6.6
Estimated change size: 260-440 LOC across two repos.
Repos:
/Users/belief/dev/projects/claude/claude_team/Users/belief/dev/projects/claude/agent_teams_orchestrator
Live proof
Validated manually:
printf '%s\n' 'Look at the attached image. Reply with exactly one word: red, green, or blue.' \
| codex exec --json --skip-git-repo-check -C /tmp \
--model gpt-5.4-mini \
--image /tmp/agent-attachment-prototypes/red-card-valid.png \
--output-last-message /tmp/agent-attachment-prototypes/codex-last.txt \
-
Result:
red
Therefore Codex adapter should pass file paths.
Current blocker
agent_teams_orchestrator currently rejects non-text prompts in Codex native:
Codex native phase 0 only supports text-only prompts. Images, documents, and structured input are not wired yet.
Likely locations:
agent_teams_orchestrator/src/services/codexNative/turnExecutor.ts
agent_teams_orchestrator/src/services/codexNative/execRunner.ts
Do not remove this guard globally. Replace it with structured extraction for supported image content only.
Data contract
Add a Codex native input shape that can represent text plus image files.
export interface CodexNativeTurnInput {
promptText: string;
imagePaths: string[];
}
If the current internal API only accepts text, introduce a narrow overload or adapter:
export type CodexNativePromptInput =
| { kind: 'text'; text: string }
| { kind: 'text-with-images'; text: string; imagePaths: string[] };
Do not pass base64 into Codex prompt text.
Claude-team side adapter
CodexNativeAttachmentAdapter should prepare image artifacts.
export class CodexNativeAttachmentAdapter implements AttachmentDeliveryAdapter {
readonly runtimeKind = 'codex-native' as const;
canDeliver(ctx: AttachmentRuntimeContext, attachment: NormalizedAgentAttachment) {
if (attachment.kind !== 'image') {
return block('Codex native currently supports image attachments only.');
}
return allowIfMime(attachment, ['image/png', 'image/jpeg', 'image/webp']);
}
async prepare(ctx: AttachmentRuntimeContext, attachment: NormalizedAgentAttachment) {
const variant = selectCodexImageFileVariant(attachment);
const path = await this.artifactStore.materializeFileVariant(variant, {
teamName: ctx.teamName,
runtime: 'codex-native',
});
return {
runtimeKind: this.runtimeKind,
attachmentId: attachment.id,
part: { kind: 'codex-image-arg', path },
diagnostics: [`prepared image file for Codex native: ${formatBytes(variant.byteSize)}`],
};
}
}
Artifact directory should be app-owned and not user-editable:
~/.claude/teams/<team>/attachments/<message-id>/<attachment-id>/<variant-id>.<ext>
If existing team data conventions prefer another base path, use that. The key is deterministic metadata and cleanup safety.
Orchestrator changes
Extract Codex image paths from content blocks
export function toCodexNativeTurnInput(input: string | ContentBlockParam[]): CodexNativeTurnInput {
if (typeof input === 'string') {
return { promptText: input, imagePaths: [] };
}
const textParts: string[] = [];
const imagePaths: string[] = [];
for (const block of input) {
if (block.type === 'text') {
textParts.push(block.text);
continue;
}
if (block.type === 'image') {
const path = materializeCodexImageBlockToTempFile(block);
imagePaths.push(path);
continue;
}
throw new Error(`Codex native does not support ${block.type} attachments yet.`);
}
return {
promptText: textParts.join('\n\n').trim(),
imagePaths,
};
}
Preferred path: desktop already materializes artifacts and passes paths, so orchestrator should not need to decode base64 except for compatibility with direct SDK/fork calls.
Extend exec args
export function buildCodexNativeExecArgs(options: CodexNativeExecOptions): string[] {
return [
'exec',
'--json',
'--skip-git-repo-check',
'-C', options.cwd,
...options.imagePaths.flatMap(path => ['--image', path]),
'-',
];
}
Preserve stdin prompt behavior
Keep:
child.stdin.end(options.prompt);
Do not switch to putting the prompt in argv if it can be long.
Edge cases
Image file path missing before Codex starts
Expected behavior:
- fail before spawn with a clear error;
- do not start Codex with missing
--imagepath.
Multiple images
Codex CLI supports repeatable --image <FILE>. Pass one arg pair per image.
Unsupported document/PDF
Expected behavior:
Codex native does not support PDF attachments yet. Send text or images only.
Do not silently convert PDF to text in this phase.
OpenAI account/session issue
Attachment code must not mask auth errors. If Codex says login required, show Codex auth error unchanged.
Artifact cleanup
Do not delete image files immediately after spawn. Codex may read after process start. Keep artifacts with message/team data and clean with team cleanup or retention policy.
Project path sandbox
Codex gets --image absolute paths outside project. Confirm current Codex CLI accepts this. Live test used /tmp, so it does. If future sandbox blocks, copy artifacts into an app-owned allowed directory.
Test plan
Orchestrator unit
- text-only input produces no
--imageargs; - text plus one image produces one
--imagearg; - multiple images produce repeated args in order;
- unsupported document block throws clear error;
- missing image path throws before spawn;
- prompt still goes to stdin.
Desktop unit/service
- Codex adapter chooses file variant;
- artifact materialization writes expected file;
- planner blocks PDF for Codex;
- error messages do not include base64.
Live e2e
Only when explicitly requested:
codex exec --json --skip-git-repo-check -C /tmp --model gpt-5.4-mini --image red-card-valid.png -
Expected final message:
red
Suggested focused checks:
# claude_team
pnpm vitest run src/features/agent-attachments/**/*.test.ts test/main/services/team/TeamProvisioningService.test.ts
pnpm typecheck --pretty false
# agent_teams_orchestrator
bun test src/services/codexNative/*.test.ts
Safety checklist
- Text-only Codex path unchanged.
- Auth/session errors preserved.
- No base64 in prompt text.
- No immediate cleanup of image files after spawn.
- Unsupported files fail before model call.
Deep implementation details
Two-repo boundary
Desktop should decide and materialize attachment artifacts. Orchestrator should execute Codex with prepared input.
claude_team:
normalize/optimize/store image
decide Codex supports image
pass prepared prompt + image artifact refs into runtime handoff
agent_teams_orchestrator:
accept text + imagePaths
validate files exist/readable
append --image args
keep prompt on stdin
Avoid making orchestrator depend on desktop feature internals.
Minimal orchestrator type extension
export interface CodexNativeExecOptions {
cwd: string;
prompt: string;
model?: string;
imagePaths?: string[];
env?: NodeJS.ProcessEnv;
}
Default imagePaths = [] preserves existing callers.
Args builder exact behavior
export function buildCodexNativeExecArgs(options: CodexNativeExecOptions): string[] {
const args = [
'exec',
'--json',
'--skip-git-repo-check',
'-C',
options.cwd,
];
if (options.model) {
args.push('--model', options.model);
}
for (const imagePath of options.imagePaths ?? []) {
args.push('--image', imagePath);
}
args.push('-');
return args;
}
Order matters. Keep - last so stdin is prompt.
File validation before spawn
async function assertCodexImageFilesReady(paths: string[]): Promise<void> {
for (const imagePath of paths) {
const stat = await fs.promises.stat(imagePath).catch(() => null);
if (!stat?.isFile()) {
throw new Error(`Codex image attachment is missing: ${path.basename(imagePath)}`);
}
if (stat.size <= 0) {
throw new Error(`Codex image attachment is empty: ${path.basename(imagePath)}`);
}
if (stat.size > CODEX_IMAGE_FILE_MAX_BYTES) {
throw new Error(`Codex image attachment is too large: ${path.basename(imagePath)}`);
}
}
}
Do not include full absolute paths in user messages unless copied diagnostics need them and they are redacted/safe.
Desktop artifact store contract
export interface AttachmentArtifactStore {
materializeVariantFile(input: {
teamName: string;
messageId: string;
attachmentId: string;
variantId: string;
filename: string;
base64: string;
expectedSha256: string;
}): Promise<{ path: string; bytes: number; sha256: string }>;
}
Validation:
- directory created with recursive mkdir;
- filename sanitized;
- write to temp file then rename;
- sha256 verified after write;
- if existing file has same sha256, reuse;
- if existing file mismatch, rewrite from original.
Artifact write pattern
const tmp = `${target}.${process.pid}.${Date.now()}.tmp`;
await fs.promises.writeFile(tmp, bytes, { flag: 'wx' }).catch(async error => {
if (error.code === 'EEXIST') {
await fs.promises.rm(tmp, { force: true });
await fs.promises.writeFile(tmp, bytes, { flag: 'wx' });
return;
}
throw error;
});
await fs.promises.rename(tmp, target);
Prefer a shared atomic write helper if one already exists.
More edge cases
| Edge case | Expected behavior |
|---|---|
| Codex login expires | Codex auth error shown unchanged |
| Image path contains spaces | args array handles it, no shell quoting needed |
| Artifact deleted between validation and spawn | Codex may fail; surface exact stderr, but pre-spawn validation reduces probability |
| Multiple Codex members use same attachment | artifact store can reuse same variant path by hash |
| User sends image to Codex lead while lead busy | existing lead busy/message delivery semantics remain unchanged |
| Codex model selected is text-only in future | capability gate should block when catalog knows; otherwise live model may error, preserve exact error |
| Image is WebP | if Codex accepts through --image, allow; otherwise convert to PNG/JPEG in Phase 1/adapter policy |
| PDF attached to Codex | block in v1 with clear message |
Test additions in orchestrator
test('adds repeated --image args before stdin marker', () => {
expect(buildCodexNativeExecArgs({ cwd: '/tmp', prompt: 'x', imagePaths: ['/a.png', '/b.jpg'] }))
.toContainSequence(['--image', '/a.png', '--image', '/b.jpg', '-']);
});
test('keeps text-only args unchanged', () => {
expect(buildCodexNativeExecArgs({ cwd: '/tmp', prompt: 'x' }))
.not.toContain('--image');
});
Regression traps
- Passing prompt as argv and accidentally truncating/escaping long prompts.
- Deleting artifact file in finally before Codex has read it.
- Allowing arbitrary renderer-supplied paths into
--image. - Hiding Codex auth errors behind
Attachment failed. - Treating Codex CLI
turn.completedas image-understood proof without response content.
File-by-file implementation plan
claude_team
Potential files:
src/features/agent-attachments/main/adapters/output/CodexNativeAttachmentAdapter.ts
src/features/agent-attachments/main/infrastructure/AttachmentArtifactStore.ts
src/main/services/team/TeamProvisioningService.ts
src/main/ipc/teams.ts
Keep the desktop side responsible for app-owned artifact paths.
agent_teams_orchestrator
Potential files:
src/services/codexNative/turnExecutor.ts
src/services/codexNative/execRunner.ts
src/services/codexNative/*.test.ts
Make the orchestrator change backward compatible by defaulting imagePaths to [].
Integration contract between repos
If the desktop already invokes orchestrator through a structured prompt, add image paths explicitly rather than hiding them in text.
Preferred:
interface NativeRuntimePromptEnvelope {
text: string;
attachments?: Array<{
kind: 'image-file';
path: string;
mimeType: 'image/png' | 'image/jpeg' | 'image/webp';
sha256: string;
}>;
}
Avoid:
Here is an image: /tmp/foo.png
unless the runtime explicitly cannot accept images and user chose a textual fallback.
Artifact security rules
- Renderer never supplies final file path.
- Backend chooses artifact path under app-owned directory.
- Path traversal in filename is sanitized.
- Artifact path passed to Codex is absolute.
- Artifact checksum is verified after write.
- Artifact metadata does not include API keys or user prompt text.
More detailed edge cases
| Edge case | Expected behavior |
|---|---|
| Codex process starts but exits before reading image | surface exact Codex stderr/exit code |
| Artifact file exists but unreadable due permissions | fail before spawn if detectable |
| Two sends with same image and same message id | reuse same artifact variant |
| Two sends with same image but different message id | allow separate metadata, optionally same content-addressed blob |
| Image path has non-ASCII filename | store sanitized ASCII filename plus metadata originalName |
| User cancels send during artifact write | abort write if supported, cleanup temp file |
Codex CLI changes --image flag |
tests fail at args builder/live smoke before release |
Test code skeleton
describe('CodexNativeAttachmentAdapter', () => {
it('materializes image variant and returns codex image arg', async () => {
const adapter = new CodexNativeAttachmentAdapter(fakeArtifactStore);
const part = await adapter.prepare(ctx, imageAttachment);
expect(part.part).toEqual({ kind: 'codex-image-arg', path: '/tmp/app/att/red.png' });
});
it('blocks PDF attachments', () => {
const decision = adapter.canDeliver(ctx, pdfAttachment);
expect(decision.allowed).toBe(false);
expect(decision.blockers[0].code).toBe('attachment_runtime_unsupported');
});
});
describe('buildCodexNativeExecArgs', () => {
it('keeps stdin marker last with image args before it', () => {
expect(buildCodexNativeExecArgs({ cwd: '/tmp', prompt: 'x', imagePaths: ['/a.png'] }))
.toEqual(expect.arrayContaining(['--image', '/a.png', '-']));
});
});
Review checklist
- Existing text-only Codex tests still pass.
imagePathsdefault is empty.- No shell string command building for image paths.
- Missing image file fails before spawn where possible.
- Auth errors are not converted to attachment errors.
- The feature works with Codex subscription auth, not only API key.
Phase 3 exit criteria
Phase 3 is complete only when:
- text-only Codex native still uses the same exec path;
- Codex image send uses
--image <path>and stdin prompt; - image paths come only from app-owned artifacts;
- missing artifact fails before spawn;
- unsupported PDFs/documents are blocked before Codex call;
- Codex auth errors remain exact;
- no OpenCode/Claude code changes are included except shared interfaces.
Cross-repo sequencing
Recommended order:
- Orchestrator: add optional
imagePathsto Codex exec runner with tests. - Orchestrator: keep turn executor text-only behavior unless image paths are explicitly supplied.
- Desktop: add Codex adapter that materializes image files.
- Desktop: wire Codex adapter only for Codex native send path.
- Live smoke Codex with
gpt-5.4-mini.
Do not wire desktop before orchestrator can safely accept imagePaths.
Backward compatibility in orchestrator
function normalizeExecOptions(options: CodexNativeExecOptions): Required<Pick<CodexNativeExecOptions, 'imagePaths'>> {
return {
imagePaths: options.imagePaths ?? [],
};
}
Existing tests should not need imagePaths.
Handling structured content blocks
If orchestrator receives Anthropic-style content blocks, only support image blocks when they already point to app-owned artifacts or can be materialized safely.
V1 preference:
Desktop passes file paths, not base64 blocks, to Codex native.
If a direct orchestrator caller passes base64 image blocks, fail with clear TODO unless implementing materialization there too.
throw new Error('Codex native image blocks must be materialized to file paths before execution.');
This prevents duplicate artifact stores across repos.
Codex path validation nuance
Do not require image file to be inside project cwd. Live test showed /tmp works. Requiring cwd-only would break app-owned artifact store. Instead require:
- absolute path;
- file exists;
- file extension/MIME allowed;
- size under budget;
- path was produced by trusted desktop adapter or trusted test input.
More Codex tests
it('rejects relative image paths', async () => {
await expect(runCodexNativeExec({ prompt: 'x', imagePaths: ['foo.png'] }))
.rejects.toThrow(/absolute/i);
});
it('does not include image path in prompt stdin', async () => {
const child = fakeCodexChild();
await runner.run({ prompt: 'describe', imagePaths: ['/tmp/a.png'] });
expect(child.stdin.end).toHaveBeenCalledWith('describe');
});
More Codex bug traps
| Trap | Prevention |
|---|---|
| prompt accidentally becomes argv | assert - remains final arg |
| image path included twice | test exact args |
| artifact path deleted too early | keep artifacts with message retention |
| base64 path from renderer | backend-only artifact store |
| Codex auth failure hidden | do not catch provider errors as attachment errors |
| unsupported PDF converted to prompt text silently | block explicitly |
Detailed Implementation Checklist
Step 1 - Define desktop to orchestrator attachment contract
Codex native invocation details should stay in the runtime/orchestrator boundary. Desktop should pass managed artifact references, not raw arbitrary paths.
export interface CodexNativeAttachmentRequest {
kind: 'image';
artifactId: string;
mimeType: 'image/png' | 'image/jpeg';
absolutePath: string;
sizeBytes: number;
}
Validation before crossing process boundary:
- Path must be under app-managed attachment artifact directory.
- MIME type must be PNG/JPEG for Phase 3.
- File must exist and match expected size budget.
- File path must not come directly from renderer.
Step 2 - Add Codex adapter in desktop
Desktop adapter should produce a provider-neutral runtime request, not CLI args directly.
export function buildCodexAttachmentRuntimeRequest(input: {
text: string;
attachments: AgentAttachmentPayload[];
}): CodexRuntimePromptRequest {
return {
text: input.text,
images: input.attachments.map((attachment) => ({
path: selectBestImageArtifact(attachment, 'codex').path,
mimeType: selectBestImageArtifact(attachment, 'codex').mimeType,
})),
};
}
Step 3 - Add orchestrator CLI serialization
In orchestrator, serialize to Codex-native image args only at the final command builder.
Expected conceptual shape:
const args = ['exec', '--json', '--model', model];
for (const image of request.images) {
args.push('--image', image.path);
}
args.push('-');
Do not use base64 text fallback.
Step 4 - Preserve Codex auth behavior
Codex attachment changes must not modify:
CODEX_HOMEselection.- ChatGPT subscription vs API key logic.
forced_login_method=chatgptpropagation.- existing diagnostics for “ChatGPT login is required”.
If Codex auth fails, the error should remain auth-specific, not attachment-specific.
Codex-Specific Edge Cases
| Case | Expected behavior |
|---|---|
| Codex ChatGPT subscription logged out | Preserve existing login required diagnostic. |
| Codex API key mode selected but native subscription expected | Preserve existing auth mode diagnostic. |
| Image artifact path deleted before send | Delivery fails with artifact missing. |
| Large image after optimization | Block before Codex CLI. |
| Non-image file | Phase 3 does not route it through --image. |
| Multiple images | Pass repeated --image args if Codex supports them, otherwise enforce count 1 with clear warning. |
| Codex model without vision | Block based on capability matrix if known. |
Cross-Repo Contract Test Idea
Add a small pure test in orchestrator for command building:
it('passes codex images as repeated --image args', () => {
const command = buildCodexExecCommand({
prompt: 'What color?',
images: ['/tmp/a.png', '/tmp/b.jpg'],
});
expect(command.args).toContainSequence(['--image', '/tmp/a.png']);
expect(command.args).toContainSequence(['--image', '/tmp/b.jpg']);
expect(command.stdin).toBe('What color?');
});
Add desktop-side test for path safety:
it('rejects codex image paths outside managed artifact directory', () => {
expect(() => validateManagedArtifactPath('/etc/passwd')).toThrow(/outside managed attachment directory/);
});
Manual Codex QA
- Confirm dashboard shows Codex ChatGPT account ready.
- Send red-card image to Codex lead or member.
- Expected answer:
redor equivalent. - Send unsupported model if available and verify warning.
- Send oversized optimized image and verify block before runtime call.
- Temporarily invalidate Codex login and verify auth diagnostic remains clear.
Phase 3 Exit Criteria
- Codex text-only prompt path unchanged.
- Codex image prompt uses native image channel, not base64 text.
- Codex auth diagnostics still match current behavior.
- Desktop never passes renderer-provided arbitrary file paths to orchestrator.
- One real Codex subscription visual smoke test passes.
Implementation Safeguards
Keep Codex login/session handling untouched
The recent launch stability work around Codex auth is fragile and should not be mixed with attachment delivery.
Do not change:
- Codex auth discovery.
- ChatGPT account vs API key selection.
CODEX_HOMEpropagation.forced_login_method=chatgptsettings.- preflight auth status copy.
Attachment support should sit after auth has already resolved.
Avoid stdin bloat
Codex image paths should be CLI args or native request fields. The prompt on stdin should stay text-only.
Bad:
stdin = `${prompt}\n\nIMAGE_BASE64=${base64}`;
Good:
args.push('--image', managedImagePath);
stdin = prompt;
Artifact lifetime
Codex runtime may read image files after process spawn. Do not delete artifacts immediately after creating command args.
Safe policy:
- Keep managed artifacts at least until delivery result is terminal.
- If message remains retryable, keep artifacts until retry window expires.
- Garbage collect old artifacts by age and reachability from message records.
Codex retry edge cases
| Case | Safe behavior |
|---|---|
| Retry after artifact GC | Fail with artifact missing, do not send text-only replacement silently. |
| Retry after model switch | Revalidate capability and budgets. |
| Retry after Codex logout | Show auth error, preserve message. |
| Runtime exits after receiving image | Runtime diagnostic, not attachment validation failure. |
Cross-repo PR checklist
Desktop repo:
- Produces provider-neutral image request.
- Validates managed artifact paths.
- Preserves message/ledger semantics.
Orchestrator repo:
- Converts request to Codex native args.
- Does not inspect renderer metadata.
- Redacts command diagnostics.
- Tests command builder with one and multiple images.
Failure Injection Tests for Phase 3
describe('Codex native image delivery', () => {
it('keeps prompt on stdin and images as args', () => {
const command = buildCodexNativeCommand({
prompt: 'What color?',
images: [managedImage('/tmp/app/attachments/a.png')],
});
expect(command.stdin).toBe('What color?');
expect(command.args).toContain('--image');
expect(command.stdin).not.toContain('base64');
});
it('does not mask Codex login failure as attachment failure', async () => {
const result = await runCodexImageDelivery(fakeCodexLoggedOut());
expect(result.error.message).toMatch(/codex login|ChatGPT/i);
expect(result.error.code).not.toBe('attachment_provider_rejected');
});
});
Codex Command Builder Invariants
--imageargs must appear before prompt stdin is consumed if Codex CLI requires it.- Paths must be absolute managed artifact paths.
- Do not shell-concatenate paths. Use argv array.
- Do not quote manually inside argv array.
- Do not pass images through environment variables.
- Do not write temp prompt files containing base64 image text.
Codex Existing-Team Edge Case
Existing teams may have members created before attachment support. That must not matter. Attachment capability is based on current provider/model/runtime, not team creation date.
If old metadata lacks provider/model:
- Use existing compatibility probe behavior.
- Do not infer vision support from old inbox names.
- Block image send until team/member metadata is stable.
Phase 3 Stop Conditions
Stop and reassess if:
- Codex image support requires changing auth detection.
- Codex CLI version in app runtime does not support
--image. - Image args only work in standalone shell but not app-managed runtime.
- Provider diagnostics start saying API key mode when ChatGPT subscription was selected.
File-Level Implementation Plan
Desktop suggested files:
src/features/agent-attachments/main/providers/codexAttachmentAdapter.ts
src/features/agent-attachments/main/providers/codexAttachmentAdapter.test.ts
Orchestrator suggested files:
src/runtime/codex/codexImageArgs.ts
src/runtime/codex/codexImageArgs.test.ts
Use actual repo structure names during implementation, but keep this separation: desktop plans delivery, orchestrator serializes runtime command.
Desktop adapter skeleton
export function buildCodexNativeAttachmentRequest(input: {
text: string;
attachments: AgentAttachmentPayload[];
}): CodexNativePromptRequest {
return {
text: input.text,
images: input.attachments.map((attachment) => {
const artifact = selectProviderImageArtifact(attachment, 'codex');
return {
artifactId: artifact.artifactId,
path: artifact.absolutePath,
mimeType: artifact.mimeType,
sizeBytes: artifact.sizeBytes,
};
}),
};
}
Orchestrator command args skeleton
export function appendCodexImageArgs(args: string[], images: CodexRuntimeImage[]): void {
for (const image of images) {
assertManagedRuntimeImagePath(image.path);
args.push('--image', image.path);
}
}
Never build a shell string like:
`codex exec --image ${path}`
Use argv arrays only.
Codex provider proof interaction
Codex image delivery should reuse existing delivery proof gates. Image delivery itself is not success.
Success examples:
- Visible direct reply to user.
- Correct
message_sendrelay for delegated/peer path. - Correct work-sync report for work-sync path.
Failure examples:
- Codex command exits before assistant message.
- Codex auth fails.
- Codex returns text saying it cannot inspect image.
The last one may still be a visible reply. Treat it as delivered but semantically unhelpful, not as transport failure.
Phase 3 Deep Review Addendum
Codex runtime compatibility probe
Before enabling Codex image delivery in UI, the app should know whether the configured Codex runtime supports image args.
Conservative options:
- Static capability by known runtime version - 🎯 8 🛡️ 8.5 🧠 4, примерно
80-160строк. - One-time local CLI help parse/cache - 🎯 7.5 🛡️ 8 🧠 5, примерно
120-220строк. - Always attempt and surface provider error - 🎯 6 🛡️ 6.5 🧠 2, примерно
30-80строк.
Recommended for release: option 1 if runtime version is already controlled by the app, otherwise option 2 with short cache.
Do not run expensive live Codex image probes on every composer render.
Codex request lifecycle
Composer send
-> persist message and artifacts
-> backend validates Codex capability
-> desktop builds Codex runtime request
-> orchestrator builds argv with --image
-> Codex runtime produces response
-> existing delivery proof gates decide delivered/failed
Every arrow should have tests or explicit diagnostics.
Codex diagnostics examples
Good:
Codex image delivery failed: optimized image artifact is missing. The message was saved but cannot be retried with the image.
Codex ChatGPT login is required before sending image attachments.
Bad:
Agent failed.
Spawn failed.
unless process launch actually failed.
Phase 3 Implementation Contract Addendum
Codex DTO between desktop and orchestrator
Use a stable JSON-serializable shape.
export interface CodexImageAttachmentForRuntime {
artifactId: string;
path: string;
mimeType: 'image/png' | 'image/jpeg';
sizeBytes: number;
}
export interface CodexPromptRuntimeRequest {
text: string;
images?: CodexImageAttachmentForRuntime[];
}
Rules:
imagesomitted or empty means exact text-only legacy behavior.imagespresent means orchestrator appends native--imageargs.- Unknown fields should be ignored or rejected consistently, not partially consumed.
Codex compatibility check
If the app bundles/controls Codex CLI version, use static support knowledge. If not, parse codex exec --help once and cache whether --image exists.
Pseudo-code:
export async function detectCodexImageArgSupport(runtime: CodexRuntime): Promise<boolean> {
const cached = codexImageSupportCache.get(runtime.binaryPath);
if (cached) return cached.supported;
const help = await runtime.run(['exec', '--help'], { timeoutMs: 5000 });
const supported = /--image\b/.test(help.stdout + help.stderr);
codexImageSupportCache.set(runtime.binaryPath, { supported, checkedAt: Date.now() });
return supported;
}
Do not run this on every message send if it is slow.
Codex error mapping table
| Error text category | Final classification |
|---|---|
| login required / ChatGPT session | provider auth/session error |
unknown option --image |
runtime does not support image attachments |
| file not found | attachment artifact missing |
| max tokens/quota | provider rejected message |
| model cannot inspect image | visible semantic response if delivered |
Business logic should not depend on fragile regex except for display cleanup. Prefer structured exit codes/known adapter failures where available.
Implementation Readiness Addendum
Definition of Ready for Phase 3
Before coding Phase 3:
- Phase 1 normalized artifact storage is in place.
- Codex text-only delivery tests are green.
- Orchestrator branch/worktree is aligned with desktop branch.
- Actual app-managed Codex runtime has been checked for image support.
Cross-repo compatibility rule
Desktop must tolerate orchestrator without image support during development by failing safely before send, not by crashing runtime.
if (!runtimeCapabilities.codexImageArgs) {
throw new AgentAttachmentError(
'attachment_runtime_transport_failed',
'Current Codex runtime does not support image attachments.'
);
}
Codex no-regression assertions
- Text-only Codex request shape unchanged when
imagesis empty. - Codex ChatGPT account mode still selected when user chose subscription.
- No
OPENAI_API_KEYfallback is introduced for subscription image send. CODEX_HOMEstill points to expected local auth state.- Provider auth errors are not swallowed by attachment adapter.
Codex additional edge cases
| Edge case | Expected behavior |
|---|---|
--image unsupported |
Block with runtime unsupported diagnostic. |
| Two images but Codex supports one only | Block or reduce with explicit user choice, no silent drop. |
| Image path contains spaces | argv array handles it. Test it. |
| Image path contains shell metacharacters | argv array handles it. No shell interpolation. |
| Codex returns answer in non-English | Accept semantic visual answer if it clearly identifies image. |
Final Phase 3 Acceptance Specs
Spec 1 - Codex text-only no regression
Given a user sends a text-only message to a Codex target
When the runtime request is built
Then images is omitted or empty
And the existing Codex text-only path is used
And auth/session handling is unchanged
Spec 2 - Codex image uses native args
Given a user sends a PNG image to Codex
When the orchestrator command is built
Then the prompt remains text-only stdin
And the image path is passed with --image argv
And no shell string interpolation is used
And no base64 appears in stdin
Spec 3 - Codex runtime lacks image support
Given the configured Codex runtime does not support --image
When the user tries to send an image
Then the send is blocked or delivery fails before runtime prompt
And the diagnostic says Codex runtime does not support image attachments
And text-only Codex messages still work
Phase 3 exact PR contract
The Phase 3 PR is acceptable only if:
- Desktop and orchestrator agree on DTO shape.
- Orchestrator command builder has argv tests.
- Codex auth tests or diagnostics are not weakened.
- Runtime support detection is cached or static, not repeated on every render.
- Paths with spaces are covered in tests.
- No API-key fallback is introduced for subscription mode.
Codex failure copy examples
Codex image delivery is unavailable because the configured Codex runtime does not support --image.
Codex ChatGPT login is required before this image can be sent.
Prepared image file is missing. Remove and attach the image again.
Phase 3 Pre-Mortem and Extra Safeguards
Likely Codex mistakes
| Mistake | Concrete prevention |
|---|---|
--image command built through shell string |
Use argv arrays only. |
| Image support check runs too often | Cache runtime capability. |
| API key mode accidentally used for subscription | Preserve selected auth mode and existing env rules. |
| Orchestrator receives desktop fields it ignores | Add contract test across DTO. |
| Runtime does not support image but UI allows send | Capability gate checks runtime support. |
Cross-version DTO tolerance
During development, desktop and orchestrator can briefly be out of sync. The final merged state must be compatible, but code should fail clearly if mismatch happens.
export function assertCodexRuntimeUnderstandsImages(runtimeCaps: RuntimeCapabilities): void {
if (!runtimeCaps.codexImages) {
throw new AgentAttachmentError(
'attachment_runtime_transport_failed',
'This Codex runtime does not support image attachments yet.'
);
}
}
Codex diagnostic redaction
If command args are included in diagnostics, redact paths if needed and never include prompt text with secrets.
Safe diagnostic:
{
"provider": "codex",
"model": "gpt-5.4-mini",
"imageCount": 1,
"imageArgsUsed": true,
"stdinBytes": 31
}
Unsafe diagnostic:
{
"stdin": "full user prompt with private text",
"imageBase64": "..."
}