docs(team): add agent attachment implementation plans

This commit is contained in:
777genius 2026-05-09 01:09:36 +03:00
parent deef93a81f
commit 7b88d495d0
6 changed files with 4412 additions and 0 deletions

File diff suppressed because it is too large Load diff

View file

@ -909,3 +909,811 @@ describe('picaImageOptimizer', () => {
});
});
```
## Detailed Implementation Checklist
### Step 1 - Add pure domain module
Create a feature-local domain module with no Electron, DOM, or filesystem dependency.
Suggested files:
```text
src/features/agent-attachments/shared/types.ts
src/features/agent-attachments/shared/budgets.ts
src/features/agent-attachments/shared/capabilities.ts
src/features/agent-attachments/shared/validation.ts
src/features/agent-attachments/shared/index.ts
```
The first implementation should be intentionally boring:
```ts
export function classifyAttachmentMime(mimeType: string): AttachmentKind {
if (mimeType === 'image/png') return 'image';
if (mimeType === 'image/jpeg') return 'image';
if (mimeType === 'image/webp') return 'image';
if (mimeType === 'application/pdf') return 'file';
if (mimeType.startsWith('text/')) return 'file';
return 'unsupported';
}
```
Avoid clever extension guessing in v1. Extension fallback can be a later hardening step if real users need it.
### Step 2 - Add renderer optimizer behind composer-only call site
Keep optimization in renderer because browser APIs are good at image decode and `pica` is browser-oriented.
```ts
export interface OptimizeImageForAgentInput {
file: File;
budget: ImageOptimizationBudget;
}
export interface OptimizeImageForAgentResult {
original: BrowserAttachmentArtifact;
optimized: BrowserAttachmentArtifact;
warnings: string[];
}
```
Implementation order:
1. Decode image dimensions with `createImageBitmap` where available.
2. Compute target dimensions from total batch budget and per-image max dimension.
3. Use `pica.resize` into canvas.
4. Encode JPEG/PNG based on input and transparency.
5. Return warnings, not thrown errors, for non-fatal resize degradation.
### Step 3 - Add backend validation
Main process must re-check everything received from renderer.
```ts
export function validateNormalizedAttachmentForSend(input: {
attachment: AgentAttachmentPayload;
target: ProviderTarget;
}): AttachmentValidationResult {
const capability = resolveAgentAttachmentCapability(input.target);
const sizeResult = validateAttachmentBudget(input.attachment, capability);
if (!sizeResult.ok) return sizeResult;
return { ok: true, warnings: [] };
}
```
Backend validation should never trust:
- MIME type from browser alone.
- File name extension.
- Renderer-provided optimized dimensions.
- Renderer-provided `supported: true` capability result.
### Step 4 - Wire UI warnings without changing delivery
Before provider adapters are implemented, UI can show warnings but must not pretend unsupported providers work.
Safe UX:
- Allow attach, show preview.
- On send, block if selected target cannot receive the attachment yet.
- Explain which phase/provider support is missing.
- Keep text-only send untouched.
## Phase 1 Exit Criteria
Phase 1 is complete when:
- Text-only composer behavior is unchanged.
- Image preview still works for small images.
- Large image preview shows optimized size and warnings.
- Unsupported file type produces a clear local validation error.
- Backend rejects forged oversized attachment metadata.
- No provider delivery path receives new attachment data yet unless explicitly wired in later phases.
## Edge Case Matrix
| Case | Expected behavior |
|---|---|
| Animated GIF | Treat as unsupported for image delivery in v1, or convert first frame only with explicit warning if implemented. |
| Transparent PNG | Prefer PNG if small, JPEG only if transparency is absent or user accepts flattened background. |
| Huge panorama | Downscale by max edge and total pixel budget. |
| Tiny image | Do not upscale. |
| Corrupt image | Show decode failed, do not send attachment. |
| HEIC on macOS | Do not promise support unless decode pipeline is explicitly tested. |
| Clipboard image with no filename | Generate stable display name like `clipboard-image.png`. |
| Same image attached twice | Keep two attachment ids, do not dedupe content silently. |
| Multiple images exceed total budget | Optimize all proportionally, then block if still over cap. |
| User switches target after attaching | Recompute warnings for new provider/model before send. |
## Common Bug Patterns to Avoid
- Storing base64 in message JSON. This can bloat inboxes and break process stdin.
- Mutating the original attachment when optimization runs.
- Letting renderer decide final support without backend validation.
- Showing “sent” when attachment was dropped from provider payload.
- Collapsing all failures into generic “delivery failed”.
- Running optimization in Electron main with a new native dependency right before release.
- Changing current file attachment behavior for text/PDF before image flow is stable.
## Focused Test Examples
```ts
describe('attachment budgets', () => {
it('blocks a single optimized image over hard cap', () => {
const result = validateAttachmentBudget(
fakeImage({ sizeBytes: 12 * 1024 * 1024 }),
codexVisionCapability()
);
expect(result.ok).toBe(false);
expect(result.code).toBe('attachment_too_large');
});
it('does not upscale small images', () => {
const plan = planImageResize({ width: 320, height: 200 }, { maxEdge: 1600 });
expect(plan.width).toBe(320);
expect(plan.height).toBe(200);
});
});
```
## Manual QA Script
Use a clean dev profile and verify:
1. Attach a 200 KB PNG screenshot - preview appears, no warning.
2. Attach a 12 MB PNG screenshot - preview appears, optimized size shown.
3. Attach three screenshots - total budget warning is deterministic.
4. Attach a `.txt` file - current file behavior remains unchanged.
5. Attach a corrupt image renamed `.png` - decode error appears.
6. Switch target from Claude to non-vision OpenCode model - unsupported warning appears.
7. Remove attachment - warnings clear.
## Rollback Plan
If Phase 1 causes UI instability:
- Keep domain files, but remove composer call to optimizer.
- Leave backend validation unused.
- No persisted migration is needed if message schema was not changed.
## Implementation Safeguards
### Keep Phase 1 read-only for provider delivery
Phase 1 should not change how messages are sent to agents. It should only add normalization, previews, warnings, and backend validation primitives.
Safe call sites:
- Composer attachment selection.
- Composer warning rendering.
- Draft serialization of normalized metadata.
- Backend validation helper tests.
Unsafe call sites for Phase 1:
- Claude delivery transport.
- Codex runtime invocation.
- OpenCode prompt ledger.
- Team launch/provisioning.
- Member runtime liveness.
### Suggested internal state machine
```ts
type AttachmentPrepareState =
| { status: 'idle' }
| { status: 'decoding'; attachmentId: string }
| { status: 'optimizing'; attachmentId: string; progress?: number }
| { status: 'ready'; attachment: AgentAttachmentPayload }
| { status: 'blocked'; attachmentId: string; reason: AttachmentDeliveryFailureCode }
| { status: 'failed'; attachmentId: string; error: string };
```
UI rule:
- `blocked` means user can fix/remove/change target.
- `failed` means local processing failed.
- Neither state should enqueue runtime delivery.
### Budget planning algorithm
Use deterministic, simple budget allocation. Do not optimize each image independently to the max, because a batch can still exceed total budget.
```ts
export function allocateImageBudgets(input: {
images: ImageCandidate[];
totalMaxBytes: number;
perImageMaxBytes: number;
}): ImageBudgetAllocation[] {
const perImageFairShare = Math.floor(input.totalMaxBytes / Math.max(1, input.images.length));
const targetBytes = Math.min(input.perImageMaxBytes, perImageFairShare);
return input.images.map((image) => ({ imageId: image.id, targetBytes }));
}
```
If the optimized output is still above target:
1. Reduce dimensions down to min edge threshold.
2. Reduce JPEG quality down to minimum acceptable quality.
3. If still too large, block and explain.
Do not loop indefinitely.
### Cancellation edge cases
| User action | Safe behavior |
|---|---|
| Removes image during optimization | Cancel or ignore result by generation id. |
| Adds image then switches team | Cancel or detach optimization result from old draft. |
| Switches provider/model | Recompute capability warnings without re-encoding if artifact is reusable. |
| Sends while optimization pending | Disable send or show “attachment still processing”. |
| Closes app mid-optimization | No partially written artifact should be treated as ready. |
### Generation id pattern
```ts
let prepareGeneration = 0;
async function prepareAttachments(files: File[]) {
const generation = ++prepareGeneration;
const result = await optimize(files);
if (generation !== prepareGeneration) return;
setPreparedAttachments(result);
}
```
This avoids stale async results restoring removed attachments.
### Phase 1 PR checklist
- No provider delivery path changed.
- No launch/runtime tests need snapshot updates.
- Text-only message send still works manually.
- Image preview removal cannot leave hidden payload in draft.
- Backend validation is stricter than renderer validation.
- All user-visible errors are actionable.
## Failure Injection Tests for Phase 1
Add tests that intentionally simulate bad renderer or corrupted local state.
```ts
describe('attachment backend validation hardening', () => {
it('rejects renderer supplied absolute paths outside managed storage', () => {
const result = validateAttachmentStorageReference({
artifactId: 'att_1',
path: '/Users/belief/.ssh/id_rsa',
});
expect(result.ok).toBe(false);
expect(result.code).toBe('attachment_artifact_path_unsafe');
});
it('rejects metadata that claims small size but file is large', async () => {
const result = await validateAttachmentArtifactOnDisk({
expectedSizeBytes: 100,
actualPath: fixturePath('large-image.png'),
maxBytes: 1024,
});
expect(result.ok).toBe(false);
expect(result.code).toBe('attachment_too_large');
});
});
```
## Browser and Platform Edge Cases
| Edge case | Safe implementation note |
|---|---|
| Safari/WebKit image decode differences | Use feature detection, not browser assumptions. |
| macOS clipboard TIFF/HEIC | Treat unsupported until explicitly converted/tested. |
| EXIF orientation | Prefer decode path that respects orientation or normalize orientation during canvas draw. |
| Color profiles | Accept minor color shift for screenshots, do not promise color-managed output. |
| Canvas memory pressure | Bound megapixels before drawing. |
| Pica worker failure | Fall back to canvas resize with warning, or block with clear error. |
| Transparent UI screenshot | Avoid JPEG flattening unless transparency absent or user warning is shown. |
| Very long screenshot | Limit max edge and total pixels to prevent huge canvas allocation. |
## Memory Safety Budget
Renderer memory is the main Phase 1 risk.
Recommended starting limits:
```ts
export const ATTACHMENT_IMAGE_LIMITS = {
maxInputBytes: 20 * 1024 * 1024,
maxInputPixels: 32_000_000,
maxOutputBytesPerImage: 4 * 1024 * 1024,
maxOutputBytesTotal: 8 * 1024 * 1024,
maxOutputEdge: 2400,
minJpegQuality: 0.72,
defaultJpegQuality: 0.86,
};
```
These are release-safe starting values, not final product limits. They should be tuned after real usage.
## Phase 1 Stop Conditions
Stop implementation and reassess if any of these happen:
- Optimized images are stored as base64 in persisted message JSON.
- Main process needs a new native image dependency.
- Existing file attachments stop sending.
- Composer draft state becomes provider-specific.
- Text-only messages require schema migration.
## File-Level Implementation Plan
Suggested new files:
```text
src/features/agent-attachments/shared/types.ts
src/features/agent-attachments/shared/budgets.ts
src/features/agent-attachments/shared/capabilities.ts
src/features/agent-attachments/shared/validation.ts
src/features/agent-attachments/shared/storageIds.ts
src/features/agent-attachments/renderer/optimizeImageForAgent.ts
src/features/agent-attachments/renderer/usePreparedAttachments.ts
src/features/agent-attachments/main/validateAttachmentArtifact.ts
src/features/agent-attachments/main/attachmentArtifactStore.ts
```
Suggested tests:
```text
test/features/agent-attachments/budgets.test.ts
test/features/agent-attachments/capabilities.test.ts
test/features/agent-attachments/validation.test.ts
test/features/agent-attachments/attachmentArtifactStore.test.ts
```
### Storage id validation
```ts
const SAFE_ID_RE = /^[a-zA-Z0-9][a-zA-Z0-9_-]{0,120}$/;
export function assertSafeAttachmentStorageId(name: string, value: string): void {
if (!SAFE_ID_RE.test(value)) {
throw new Error(`Invalid ${name}`);
}
}
```
Use this for `teamName`, `messageId`, and `attachmentId` before building artifact paths.
### Managed path resolver
```ts
export function resolveAttachmentArtifactPath(input: {
teamRoot: string;
teamName: string;
messageId: string;
attachmentId: string;
fileName: 'original.png' | 'original.jpg' | 'optimized.png' | 'optimized.jpg' | 'thumb.jpg' | 'meta.json';
}): string {
assertSafeAttachmentStorageId('teamName', input.teamName);
assertSafeAttachmentStorageId('messageId', input.messageId);
assertSafeAttachmentStorageId('attachmentId', input.attachmentId);
const base = path.resolve(input.teamRoot, input.teamName, 'attachments', input.messageId, input.attachmentId);
const resolved = path.resolve(base, input.fileName);
if (!resolved.startsWith(base + path.sep)) {
throw new Error('Attachment artifact path escaped managed directory');
}
return resolved;
}
```
### Renderer optimizer guardrails
```ts
export async function optimizeImageForAgent(input: OptimizeImageForAgentInput): Promise<OptimizeImageForAgentResult> {
const bitmap = await createImageBitmap(input.file);
assertPixelBudget(bitmap.width, bitmap.height, input.budget.maxInputPixels);
const target = planResizeDimensions({
width: bitmap.width,
height: bitmap.height,
maxEdge: input.budget.maxOutputEdge,
});
const canvas = document.createElement('canvas');
canvas.width = target.width;
canvas.height = target.height;
await resizeWithPicaOrFallback(bitmap, canvas);
const blob = await encodeCanvasForProvider(canvas, input.budget);
return buildOptimizationResult(input.file, blob, target);
}
```
Do not implement infinite quality-search loops. Use a small bounded set of quality attempts like `[0.86, 0.8, 0.74, 0.72]`.
### Phase 1 exact acceptance criteria
- Backend path resolver rejects traversal in tests.
- Budget tests cover single, multiple, and oversized images.
- Renderer hook ignores stale optimization result after removal.
- Composer cannot send while any attachment is `optimizing`.
- Existing text-only composer tests do not need behavioral changes.
- No provider delivery module imports renderer optimizer.
## Phase 1 Deep Review Addendum
### Exact UI states
Composer should expose these states clearly:
| State | Send allowed | Copy |
|---|---:|---|
| No attachments | yes | normal text-only behavior |
| Attachments optimizing | no | `Preparing image...` |
| Attachments ready and supported | yes | optional optimized size summary |
| Attachment too large after optimization | no | `Image is too large after optimization. Remove it or use a smaller image.` |
| Target model unsupported | no | `Selected model does not support image attachments.` |
| Decode failed | no | `Could not read this image.` |
| Artifact persistence failed | no | `Could not prepare attachment for sending.` |
### Accessibility and UX guardrails
- Warning text should be readable without relying on color.
- Remove button must remain available for blocked attachments.
- If multiple attachments have warnings, show per-attachment reason and aggregate reason near send.
- Do not hide text draft when attachment fails.
- If user removes the blocked attachment, send button should recover immediately.
### Persistence safety
When user sends a message with attachments:
1. Generate message id first.
2. Persist original/optimized artifacts under message id.
3. Persist message record referencing attachment ids.
4. Begin provider delivery.
Do not begin provider delivery before message record and artifacts are durably written.
This avoids a retry state where ledger knows about a message but artifacts are missing because write happened later.
### Phase 1 anti-regression tests
```ts
it('does not serialize image bytes into message json', () => {
const message = buildMessageRecordWithAttachment(fakeAttachment());
expect(JSON.stringify(message)).not.toContain('base64');
expect(JSON.stringify(message)).not.toContain('data:image');
});
it('restores send button after removing blocked attachment', () => {
const state = reducer(blockedAttachmentState(), removeAttachment('att_1'));
expect(selectCanSend(state)).toBe(true);
});
```
### Phase 1 implementation order
Implement in this order to reduce bug risk:
1. Pure budget/capability tests.
2. Safe id/path helpers.
3. Artifact store tests.
4. Renderer optimization hook without composer integration.
5. Composer preview/warning integration.
6. Send blocking.
7. Backend validation on send.
If a later step fails, earlier pure modules remain useful and low-risk.
## Phase 1 Implementation Contract Addendum
### Exact domain types
```ts
export type AttachmentWarningCode =
| 'image_was_resized'
| 'image_was_reencoded'
| 'image_quality_reduced'
| 'model_support_unknown'
| 'model_does_not_support_images'
| 'file_type_not_supported';
export interface AttachmentWarning {
code: AttachmentWarningCode;
message: string;
attachmentId?: string;
}
export interface ImageOptimizationBudget {
maxInputBytes: number;
maxInputPixels: number;
maxOutputBytesPerImage: number;
maxOutputBytesTotal: number;
maxOutputEdge: number;
jpegQualityAttempts: readonly number[];
}
```
Keep these in shared pure code. Renderer and main may import types and pure validators, but renderer-specific optimizer code must not be imported by main.
### Quality strategy
Use a deterministic quality strategy instead of adaptive unbounded loops.
```ts
const JPEG_QUALITY_ATTEMPTS = [0.86, 0.82, 0.78, 0.74, 0.72] as const;
for (const quality of JPEG_QUALITY_ATTEMPTS) {
const blob = await encodeCanvas(canvas, 'image/jpeg', quality);
if (blob.size <= targetBytes) return blob;
}
throw new AgentAttachmentError(
'attachment_too_large',
'Image is too large after optimization. Remove it or use a smaller image.'
);
```
PNG strategy:
- Keep PNG for small screenshots and transparency.
- Re-encode to JPEG only when transparency is absent and size requires it.
- If transparency exists and PNG remains too large, block with clear reason instead of silently flattening unless product explicitly accepts flattening.
### UI copy table
| Code | Copy |
|---|---|
| `attachment_too_large` | `Image is too large after optimization. Remove it or use a smaller image.` |
| `attachment_type_unsupported` | `This file type is not supported for agent image delivery.` |
| `attachment_model_unsupported` | `Selected model does not support image attachments. Switch model or remove the image.` |
| `attachment_optimization_failed` | `Could not prepare this image for sending.` |
| `attachment_artifact_missing` | `Prepared image file is missing. Remove and attach the image again.` |
Copy should be short in UI. Detailed diagnostics can go into copy diagnostics/logs.
### Batch behavior
Multiple images should preserve user order.
```ts
export function sortAttachmentsForDelivery(attachments: AgentAttachmentPayload[]): AgentAttachmentPayload[] {
return [...attachments].sort((a, b) => a.order - b.order);
}
```
Do not sort by size or file name because the prompt may refer to “first image” and “second image”.
### Draft persistence edge cases
- Draft can reference attachment ids before final message id exists.
- On send, draft attachment ids should be reparented or copied into message artifact directory.
- If reparenting fails, send should stop before provider delivery.
- Removing attachment from draft should remove draft artifact eventually, but not synchronously block UI.
## Implementation Readiness Addendum
### Definition of Ready for Phase 1
Before coding Phase 1:
- Confirm exact composer components that own attachment state.
- Confirm where message ids are generated for user sends.
- Confirm where existing attachments are persisted today.
- Confirm whether current image previews store bytes, paths, or blobs.
- Confirm no provider delivery changes are included in the Phase 1 PR.
### Exact reducer-style behavior
```ts
type ComposerAttachmentAction =
| { type: 'attachment_added'; file: File; draftAttachmentId: string }
| { type: 'attachment_prepare_started'; draftAttachmentId: string; generation: number }
| { type: 'attachment_prepare_succeeded'; draftAttachmentId: string; generation: number; payload: AgentAttachmentPayload }
| { type: 'attachment_prepare_failed'; draftAttachmentId: string; generation: number; error: AgentAttachmentErrorJson }
| { type: 'attachment_removed'; draftAttachmentId: string }
| { type: 'target_changed'; target: ProviderTarget };
```
Reducer rule:
- Ignore `attachment_prepare_succeeded` if generation is stale.
- Ignore prepare results for removed attachment ids.
- Recompute capability warnings on `target_changed` without reprocessing image bytes.
- Do not clear text draft when attachment fails.
### Backend artifact write order
```ts
async function persistMessageAttachments(input: PersistMessageAttachmentsInput): Promise<PersistedAttachmentBundle> {
const messageDir = resolveMessageAttachmentDir(input.teamName, input.messageId);
await fs.mkdir(messageDir, { recursive: true });
const persisted: PersistedAttachment[] = [];
for (const attachment of input.attachments) {
const paths = resolveAttachmentPaths(input.teamName, input.messageId, attachment.id);
await writeFileAtomic(paths.original, attachment.originalBytes);
if (attachment.optimizedBytes) await writeFileAtomic(paths.optimized, attachment.optimizedBytes);
await writeFileAtomic(paths.meta, JSON.stringify(buildMeta(attachment), null, 2));
persisted.push(toPersistedAttachment(paths, attachment));
}
return { attachments: persisted };
}
```
Use atomic writes for metadata and artifacts where practical. A partially written optimized image must not be treated as ready.
### Atomic write requirement
```ts
async function writeFileAtomic(path: string, bytes: Buffer | string): Promise<void> {
const tmp = `${path}.${process.pid}.${Date.now()}.tmp`;
await fs.writeFile(tmp, bytes);
await fs.rename(tmp, path);
}
```
If write fails, cleanup tmp best-effort and return typed error.
### Phase 1 additional edge cases
| Edge case | Expected behavior |
|---|---|
| Disk full during artifact write | Send blocked, text draft preserved, typed error shown. |
| App crashes after artifacts written before message record | Orphan artifacts may remain, later GC can clean. No message corruption. |
| App crashes after message record before provider delivery | Message exists with attachments and can be retried later. |
| Thumbnail write fails | Do not block delivery if original/optimized artifact is valid. Show preview fallback. |
| Original write succeeds, optimized write fails | Block image delivery unless original fits provider budget. |
## Final Phase 1 Acceptance Specs
### Spec 1 - small supported image
```gherkin
Given a user attaches a 200 KB PNG screenshot
And the selected target supports images
When the composer prepares the attachment
Then the preview is shown
And the send button remains enabled
And the persisted message references artifact ids
And the message JSON does not contain base64
```
### Spec 2 - oversized image optimized successfully
```gherkin
Given a user attaches a 12 MB PNG screenshot
When optimization completes below provider budget
Then the user sees that the image was optimized
And the send button is enabled
And the optimized artifact is used for provider delivery later
And the original artifact remains available for retry/regeneration
```
### Spec 3 - oversized image still too large
```gherkin
Given a user attaches a huge image
When bounded optimization cannot bring it below budget
Then the send button is disabled
And the text draft remains intact
And the user can remove the image
And no provider delivery is attempted
```
### Spec 4 - stale optimization result
```gherkin
Given a user attaches an image
And removes it while optimization is running
When the optimization promise resolves
Then the removed attachment is not restored
And the send state reflects the current draft only
```
### Phase 1 exact PR contract
The Phase 1 PR is acceptable only if:
- It adds shared attachment domain types.
- It adds budget/capability/validation tests.
- It adds safe managed artifact path/id helpers.
- It adds renderer optimization or a clearly documented placeholder if split.
- It does not change provider delivery behavior.
- It does not import provider runtime code into renderer optimizer.
- It does not import attachment feature into launch/provisioning.
### Phase 1 likely review findings to prevent
| Finding | Prevention |
|---|---|
| Message JSON contains base64 | persist artifact ids only |
| Send starts before artifact write | enforce write-before-delivery order |
| Draft removal race restores attachment | generation id guard |
| Backend trusts renderer size | stat artifact on disk |
| Unsupported model warning only in UI | backend validation also blocks |
## Phase 1 Pre-Mortem and Extra Safeguards
### Likely Phase 1 mistakes
| Mistake | Concrete prevention |
|---|---|
| Optimizer result races with removed attachment | Generation id guard in reducer/hook. |
| Backend trusts renderer MIME | Backend validates by allowlist and artifact metadata. |
| Draft artifacts leak forever | Mark draft artifacts and add later GC policy. |
| UI blocks text-only send after image error removed | Selector tests for `canSend`. |
| Multiple images reorder | Preserve user-provided order field. |
| Image-only send unclear | Product decision before coding: allow image-only with default prompt or require text. |
### Image-only message decision
Top options:
1. Require text with image - 🎯 8.5 🛡️ 9 🧠 2, примерно `20-50` строк.
Safest release behavior. It avoids confusing empty prompt behavior across providers.
2. Allow image-only with generated prompt - 🎯 7 🛡️ 7.5 🧠 4, примерно `60-120` строк.
Useful UX, but generated prompts can surprise users and differ by action mode.
3. Allow image-only raw - 🎯 5.5 🛡️ 5 🧠 2, примерно `20-40` строк.
Some providers may accept it, but behavior is inconsistent.
Recommendation: start with option 1 for release.
### Pica wrapper contract
```ts
export interface ImageResizeEngine {
resize(input: {
source: ImageBitmap;
targetWidth: number;
targetHeight: number;
}): Promise<HTMLCanvasElement>;
}
```
Reason:
- Keeps `pica` behind a tiny interface.
- Allows tests with fake resize engine.
- Allows fallback or replacement without changing composer state.
### Quality acceptance rules
- UI screenshots should remain legible at common zoom levels after resize.
- Do not reduce JPEG quality below configured floor.
- If text in screenshot becomes unreadable in manual QA, increase budget before release.
- If multiple images exceed total budget, prefer blocking over making all unreadable.
### Phase 1 contract tests to add before UI wiring
```ts
it('preserves attachment order for delivery', () => {
const sorted = sortAttachmentsForDelivery([
fakeImageAttachment({ id: 'b', order: 2 }),
fakeImageAttachment({ id: 'a', order: 1 }),
]);
expect(sorted.map((item) => item.id)).toEqual(['a', 'b']);
});
it('does not allow send while attachment is optimizing', () => {
expect(selectCanSend(fakeDraft({ attachmentState: 'optimizing' }))).toBe(false);
});
```

View file

@ -613,3 +613,567 @@ it('does not write to stdin when attachment payload exceeds Claude budget', asyn
## Code review notes
If the diff shows a new `if (mimeType)` ladder inside `TeamProvisioningService`, the refactor failed. That logic belongs in adapter/helper tests.
## Detailed Implementation Checklist
### Step 1 - Locate and isolate current Claude message serialization
Do not rewrite the full delivery path. Add a seam where text and attachments are converted into Claude input blocks.
Target shape:
```ts
export function buildClaudeInputBlocks(input: {
text: string;
attachments: AgentAttachmentPayload[];
}): ClaudeInputBlock[] {
return [
{ type: 'text', text: input.text },
...input.attachments.map(toClaudeImageBlock),
];
}
```
### Step 2 - Use provider-native image blocks only
For Claude, image data should be represented as structured image blocks for the SDK/stream-json path. Do not paste base64 into text.
```ts
function toClaudeImageBlock(attachment: AgentAttachmentPayload): ClaudeInputBlock {
const artifact = selectBestImageArtifact(attachment, 'claude');
return {
type: 'image',
source: {
type: 'base64',
media_type: artifact.mimeType,
data: artifact.base64,
},
};
}
```
Important: this function should live in a Claude adapter, not in the composer or generic delivery service.
### Step 3 - Preserve text-only path
Guard the new block builder so text-only Claude messages produce byte-for-byte equivalent semantic behavior.
```ts
if (attachments.length === 0) {
return buildExistingClaudeTextPayload(text);
}
```
Only remove this branch after tests prove the generic block path is fully equivalent.
### Step 4 - Add exact diagnostics
Claude attachment failures should say what failed:
- `Claude image artifact missing`
- `Claude image MIME type unsupported: image/webp`
- `Claude image payload exceeds budget after optimization`
- `Claude stream-json image delivery rejected: <redacted provider error>`
Avoid saying “teammate crashed” unless process liveness confirms that.
## Claude-Specific Edge Cases
| Case | Risk | Safe behavior |
|---|---|---|
| Claude subscription not logged in | Could be misread as attachment failure | Preserve existing auth diagnostic. |
| Multiple images | Token/latency spike | Enforce count and total byte budget before SDK call. |
| Image plus task delegation | Tool-call response still expected | Existing proof gates stay unchanged. |
| Lead prompt with images | Lead may consume image but not delegate | This is normal model behavior, not transport failure. |
| Assistant refuses visual task | Delivery succeeded, model response is semantic failure | Do not retry transport automatically. |
| Claude CLI path lacking image support | Attachment delivery blocked with provider capability error | Do not fallback to base64 text. |
## Golden Serialization Tests
Add tests that validate payload shape without hitting Claude.
```ts
it('serializes a png as a Claude image block', () => {
const blocks = buildClaudeInputBlocks({
text: 'What color?',
attachments: [fakePngAttachment({ base64: 'abc' })],
});
expect(blocks).toEqual([
{ type: 'text', text: 'What color?' },
{
type: 'image',
source: { type: 'base64', media_type: 'image/png', data: 'abc' },
},
]);
});
```
Negative test:
```ts
it('does not serialize image as plain base64 text', () => {
const payload = JSON.stringify(buildClaudeInputBlocks(input));
expect(payload).not.toContain('data:image/png;base64');
});
```
## Manual Claude QA
Use the known-good red-card PNG:
1. Send to Claude lead: “What color is the square? Answer one word.”
2. Expected answer: `red` or equivalent.
3. Send two images if supported by budget.
4. Send oversized image and verify UI blocks before provider call.
5. Temporarily break auth and verify auth error is preserved, not attachment error.
## Phase 2 Exit Criteria
- Claude text-only delivery remains green.
- Claude image delivery answers visual smoke prompt.
- Claude oversized image is blocked before SDK/provider call.
- Claude provider rejection is shown as delivery failure, not launch failure.
- No Codex or OpenCode code path changes are required for Phase 2 to pass.
## Implementation Safeguards
### Claude adapter should be additive
Do not refactor all Claude runtime code just to add image support. Add a small adapter and call it from the existing send path only when attachments are present.
```ts
if (validatedAttachments.length > 0) {
const payload = await claudeAttachmentAdapter.buildDeliveryParts({
text,
attachments: validatedAttachments,
budget,
});
return sendClaudeStructuredPayload(payload);
}
return sendClaudeTextOnlyPayload(text);
```
This limits blast radius.
### Do not infer readiness from Claude image response
A Claude member answering an image prompt proves message delivery, not bootstrap readiness. Keep these concepts separate:
- `bootstrapConfirmed`: launch/runtime proof.
- `messageDelivered`: prompt delivery proof.
- `visibleReply`: user-visible response proof.
### Claude streaming failure mapping
| Failure source | User-facing classification |
|---|---|
| SDK rejects image MIME | `attachment_type_unsupported` |
| SDK rejects payload size | `attachment_too_large` |
| Auth token invalidated | existing provider auth error |
| Runtime exits after request | runtime crash/exit diagnostic |
| Assistant answers “cannot view image” | semantic model response, not transport failure |
| Stream closes without response | delivery failure, eligible for existing bounded retry only if text path already retries |
### Claude PR review checklist
- Does the text-only path avoid new serialization code?
- Are images sent as structured image blocks, not text?
- Is artifact content loaded as late as possible?
- Are size and MIME checked before reading large file into memory?
- Is provider error redacted?
- Does delivery failure preserve the inbox message?
### Additional Claude tests
```ts
it('keeps text-only Claude path unchanged', async () => {
const result = await buildClaudeDeliveryRequest({ text: 'hello', attachments: [] });
expect(result.kind).toBe('legacy_text');
});
it('classifies missing optimized artifact before provider call', async () => {
await expect(buildClaudeDeliveryRequest({
text: 'see image',
attachments: [fakeAttachmentWithMissingPath()],
})).rejects.toMatchObject({ code: 'attachment_artifact_missing' });
});
```
## Failure Injection Tests for Phase 2
```ts
describe('Claude attachment delivery failures', () => {
it('does not mark teammate offline when Claude rejects image payload', async () => {
const result = await deliverClaudeMessageWithAttachment(fakeProviderRejectsImage());
expect(result.delivery.status).toBe('failed');
expect(result.delivery.failureCode).toBe('attachment_provider_rejected');
expect(result.memberPatch).toBeUndefined();
});
it('preserves existing auth error when Claude token is invalid', async () => {
const result = await deliverClaudeMessageWithAttachment(fakeInvalidAuth());
expect(result.error.message).toMatch(/authentication|sign in|token/i);
expect(result.error.code).not.toBe('attachment_provider_rejected');
});
});
```
## Claude Serialization Gotchas
- Some Claude SDK/CLI surfaces accept message arrays, some accept stream-json lines, and some accept plain prompt text. Attachments must use the surface that truly supports image blocks.
- If the current runtime path cannot send image blocks safely, Phase 2 must block image send with a clear message instead of falling back to base64 text.
- Do not mix `@file` syntax with image block syntax unless that exact runtime path has been tested.
- If multiple Claude launch contexts exist, validate the one used by the app, not only a standalone prototype.
## Claude Runtime Exit Correlation
If Claude process exits shortly after image delivery, diagnostics should show both facts separately:
```text
Image delivery was attempted using Claude image blocks.
The Claude runtime exited 18s later.
Last stderr: <redacted tail>
```
Do not claim the image caused the exit unless provider stderr explicitly says so.
## Phase 2 Stop Conditions
Stop and reassess if:
- Claude implementation requires changing team bootstrap prompts.
- Claude text-only path must be rewritten broadly.
- Auth/session diagnostics change unexpectedly.
- Provider image block support is not available in the actual app runtime path.
## File-Level Implementation Plan
Suggested files:
```text
src/features/agent-attachments/main/providers/claudeAttachmentAdapter.ts
src/features/agent-attachments/main/providers/claudeAttachmentAdapter.test.ts
```
Existing delivery call site should import only the adapter public function, not internal attachment storage helpers.
### Adapter skeleton
```ts
export async function buildClaudeAttachmentDeliveryParts(input: {
text: string;
attachments: AgentAttachmentPayload[];
readArtifact: AttachmentArtifactReader;
}): Promise<ClaudeDeliveryParts> {
const blocks: ClaudeInputBlock[] = [];
if (input.text.trim().length > 0) {
blocks.push({ type: 'text', text: input.text });
}
for (const attachment of input.attachments) {
const artifact = selectProviderImageArtifact(attachment, 'anthropic');
const bytes = await input.readArtifact.readBytes(artifact.artifactId);
blocks.push(toClaudeImageBlock(artifact.mimeType, bytes));
}
return { kind: 'claude_structured_blocks', blocks };
}
```
### Artifact reader abstraction
```ts
export interface AttachmentArtifactReader {
readBytes(artifactId: string): Promise<Buffer>;
stat(artifactId: string): Promise<{ sizeBytes: number }>;
}
```
This makes adapter unit tests independent from filesystem.
### Claude image block conversion
```ts
function toClaudeImageBlock(mimeType: string, bytes: Buffer): ClaudeInputBlock {
if (mimeType !== 'image/png' && mimeType !== 'image/jpeg') {
throw new AttachmentDeliveryError('attachment_type_unsupported', `Claude image MIME unsupported: ${mimeType}`);
}
return {
type: 'image',
source: {
type: 'base64',
media_type: mimeType,
data: bytes.toString('base64'),
},
};
}
```
This base64 is provider-native structured payload data, not prompt text. That distinction should be documented in code comments because it is easy to confuse.
### Claude phase review traps
- If a reviewer sees `data:image` in prompt text, reject.
- If a reviewer sees `spawnStatus` or `launchState` imports in adapter, reject.
- If a reviewer sees attachment adapter imported by bootstrap/provisioning code, reject.
- If text-only Claude messages start going through `readArtifact`, reject.
## Phase 2 Deep Review Addendum
### Claude adapter contract tests
Test adapter without real Claude first. The real smoke test should only prove provider behavior after contract is stable.
```ts
describe('buildClaudeAttachmentDeliveryParts', () => {
it('preserves text order before images', async () => {
const parts = await buildClaudeAttachmentDeliveryParts(fakeInputWithOneImage());
expect(parts.blocks[0]).toMatchObject({ type: 'text' });
expect(parts.blocks[1]).toMatchObject({ type: 'image' });
});
it('rejects webp until explicitly supported', async () => {
await expect(buildClaudeAttachmentDeliveryParts(fakeWebpInput()))
.rejects.toMatchObject({ code: 'attachment_type_unsupported' });
});
});
```
### Claude delivery proof matrix
| Message target | Required proof after Claude response |
|---|---|
| Direct user ask | visible reply or safe plain text materialization |
| Delegate to teammate | structured relay/message proof if current flow requires it |
| Work-sync | existing work-sync proof, not visual answer |
| Task progress | existing task/progress proof |
Image support must not weaken these proof gates.
### Live smoke minimum
Use a prompt that cannot be answered correctly without image access:
```text
Look at the attached image. What is the single dominant color of the square? Answer with one English word.
```
Passing answer: `red`.
Failing answers:
- `I cannot view images`.
- Any generic guess not grounded in image.
- Empty turn.
- Tool-only response without visible answer for direct ask.
## Phase 2 Implementation Contract Addendum
### Runtime path decision tree
Before implementing, identify the exact Claude runtime path used by team messages.
```text
Does app path support structured image blocks?
yes -> implement native Claude image blocks
no -> block image send for Claude with clear unsupported runtime message
```
Do not implement a fallback that pastes base64 text.
### Claude provider adapter return shape
```ts
export type ClaudeAttachmentDeliveryParts =
| { kind: 'legacy_text'; text: string }
| { kind: 'structured_blocks'; blocks: ClaudeInputBlock[] };
```
Rules:
- `legacy_text` only when no attachments.
- `structured_blocks` when attachments exist.
- Call site must handle both explicitly.
- Any unsupported attachment throws typed `AgentAttachmentError` before provider call.
### Claude provider smoke record format
Record smoke results in PR notes:
```text
Provider: Claude subscription
Runtime path: <actual app path>
Model: <model>
Prompt: What color is the square? One word.
Image: red-square.png
Expected: red
Observed: red
Date: 2026-05-09
Result: pass
```
This avoids relying on stale memory about prototype success.
## Implementation Readiness Addendum
### Definition of Ready for Phase 2
Before coding Phase 2:
- Phase 1 normalized attachment payload exists and is tested.
- Backend can read optimized artifact bytes by attachment id.
- Exact Claude app runtime path for team messages is identified.
- Text-only Claude delivery tests are green before changes.
### Mocking strategy
Use a fake artifact reader and fake Claude sender.
```ts
const fakeArtifactReader: AttachmentArtifactReader = {
async readBytes(id) {
if (id === 'missing') throw new AgentAttachmentError('attachment_artifact_missing', 'missing');
return Buffer.from([1, 2, 3]);
},
async stat() {
return { sizeBytes: 3 };
},
};
```
Do not unit-test by invoking real Claude. Real Claude belongs to smoke/e2e only.
### Claude fallback decision
If structured image blocks are unavailable in the actual app runtime:
- Block image send for Claude.
- Keep text-only send working.
- Add diagnostic: `Claude runtime path does not support image attachments yet.`
- Do not fallback to `@file`, Markdown image links, or base64 text unless separately proven.
### Claude additional edge cases
| Edge case | Expected behavior |
|---|---|
| Empty text with image | Allow if UX supports image-only prompt, otherwise require text. Do not crash adapter. |
| Multiple images | Preserve order. |
| Unsupported MIME after hydration | Typed pre-provider error. |
| Artifact read fails | Message saved, delivery fails actionable. |
| Claude returns visible refusal | Delivery succeeded with refusal visible to user. |
## Final Phase 2 Acceptance Specs
### Spec 1 - Claude text-only no regression
```gherkin
Given a user sends a text-only message to a Claude target
When the message is delivered
Then the legacy Claude text path is used
And no artifact reader is invoked
And existing proof gates are unchanged
```
### Spec 2 - Claude image delivered through structured block
```gherkin
Given a user sends a PNG image to a Claude target
When the Claude adapter builds delivery parts
Then it returns structured blocks
And the first block is the text prompt
And the second block is a provider-native image block
And no data:image text is present in the prompt
```
### Spec 3 - Claude artifact missing
```gherkin
Given a message references an optimized image artifact
And the artifact file is missing
When delivery is attempted
Then delivery fails with attachment_artifact_missing
And the member launch state is unchanged
And the message remains available for user action
```
### Phase 2 exact PR contract
The Phase 2 PR is acceptable only if:
- It adds a Claude adapter with fake artifact-reader tests.
- It preserves text-only fast path.
- It maps provider image rejection to attachment delivery failure.
- It preserves Claude auth/session diagnostics.
- It includes one real smoke note or explicitly marks smoke as pending.
- It does not touch Codex/OpenCode delivery code except shared types.
### Claude provider-native base64 comment requirement
If implementation converts bytes to base64 for Claude structured image blocks, add a comment explaining why this is not the forbidden base64-in-text fallback.
```ts
// Claude expects image bytes inside the structured image block as base64.
// This is provider-native payload data, not text appended to the user prompt.
```
## Phase 2 Pre-Mortem and Extra Safeguards
### Likely Claude mistakes
| Mistake | Concrete prevention |
|---|---|
| Testing standalone Claude path but shipping different app path | Smoke actual app-managed team delivery path. |
| Treating Claude visible refusal as transport failure | Visible refusal is delivered response. |
| Weakening relay/work-sync proof gates | Keep existing proof gates after response. |
| Reading large artifact before budget validation | Validate size before reading bytes. |
| Logging structured payload with base64 | Redact image block data in diagnostics. |
### Redaction helper requirement
Claude structured payload may contain base64. Any debug output must redact it.
```ts
export function redactClaudeBlocksForDiagnostics(blocks: ClaudeInputBlock[]): unknown[] {
return blocks.map((block) => {
if (block.type !== 'image') return block;
return {
...block,
source: {
...block.source,
data: `[redacted image bytes: ${block.source.media_type}]`,
},
};
});
}
```
### Claude adapter test for redaction
```ts
it('redacts image bytes in diagnostics', () => {
const redacted = redactClaudeBlocksForDiagnostics([fakeClaudeImageBlock('SECRET_BASE64')]);
expect(JSON.stringify(redacted)).not.toContain('SECRET_BASE64');
});
```
### Claude stream handling edge case
If Claude streams partial text then errors:
- Preserve partial visible text only if existing delivery layer already supports partials safely.
- Otherwise report delivery failed with provider diagnostic.
- Do not mark message read if required visible proof was not committed.

View file

@ -624,3 +624,595 @@ it('does not include image path in prompt stdin', async () => {
| base64 path from renderer | backend-only artifact store |
| Codex auth failure hidden | do not catch provider errors as attachment errors |
| unsupported PDF converted to prompt text silently | block explicitly |
## Detailed Implementation Checklist
### Step 1 - Define desktop to orchestrator attachment contract
Codex native invocation details should stay in the runtime/orchestrator boundary. Desktop should pass managed artifact references, not raw arbitrary paths.
```ts
export interface CodexNativeAttachmentRequest {
kind: 'image';
artifactId: string;
mimeType: 'image/png' | 'image/jpeg';
absolutePath: string;
sizeBytes: number;
}
```
Validation before crossing process boundary:
- Path must be under app-managed attachment artifact directory.
- MIME type must be PNG/JPEG for Phase 3.
- File must exist and match expected size budget.
- File path must not come directly from renderer.
### Step 2 - Add Codex adapter in desktop
Desktop adapter should produce a provider-neutral runtime request, not CLI args directly.
```ts
export function buildCodexAttachmentRuntimeRequest(input: {
text: string;
attachments: AgentAttachmentPayload[];
}): CodexRuntimePromptRequest {
return {
text: input.text,
images: input.attachments.map((attachment) => ({
path: selectBestImageArtifact(attachment, 'codex').path,
mimeType: selectBestImageArtifact(attachment, 'codex').mimeType,
})),
};
}
```
### Step 3 - Add orchestrator CLI serialization
In orchestrator, serialize to Codex-native image args only at the final command builder.
Expected conceptual shape:
```ts
const args = ['exec', '--json', '--model', model];
for (const image of request.images) {
args.push('--image', image.path);
}
args.push('-');
```
Do not use base64 text fallback.
### Step 4 - Preserve Codex auth behavior
Codex attachment changes must not modify:
- `CODEX_HOME` selection.
- ChatGPT subscription vs API key logic.
- `forced_login_method=chatgpt` propagation.
- existing diagnostics for “ChatGPT login is required”.
If Codex auth fails, the error should remain auth-specific, not attachment-specific.
## Codex-Specific Edge Cases
| Case | Expected behavior |
|---|---|
| Codex ChatGPT subscription logged out | Preserve existing login required diagnostic. |
| Codex API key mode selected but native subscription expected | Preserve existing auth mode diagnostic. |
| Image artifact path deleted before send | Delivery fails with artifact missing. |
| Large image after optimization | Block before Codex CLI. |
| Non-image file | Phase 3 does not route it through `--image`. |
| Multiple images | Pass repeated `--image` args if Codex supports them, otherwise enforce count 1 with clear warning. |
| Codex model without vision | Block based on capability matrix if known. |
## Cross-Repo Contract Test Idea
Add a small pure test in orchestrator for command building:
```ts
it('passes codex images as repeated --image args', () => {
const command = buildCodexExecCommand({
prompt: 'What color?',
images: ['/tmp/a.png', '/tmp/b.jpg'],
});
expect(command.args).toContainSequence(['--image', '/tmp/a.png']);
expect(command.args).toContainSequence(['--image', '/tmp/b.jpg']);
expect(command.stdin).toBe('What color?');
});
```
Add desktop-side test for path safety:
```ts
it('rejects codex image paths outside managed artifact directory', () => {
expect(() => validateManagedArtifactPath('/etc/passwd')).toThrow(/outside managed attachment directory/);
});
```
## Manual Codex QA
1. Confirm dashboard shows Codex ChatGPT account ready.
2. Send red-card image to Codex lead or member.
3. Expected answer: `red` or equivalent.
4. Send unsupported model if available and verify warning.
5. Send oversized optimized image and verify block before runtime call.
6. Temporarily invalidate Codex login and verify auth diagnostic remains clear.
## Phase 3 Exit Criteria
- Codex text-only prompt path unchanged.
- Codex image prompt uses native image channel, not base64 text.
- Codex auth diagnostics still match current behavior.
- Desktop never passes renderer-provided arbitrary file paths to orchestrator.
- One real Codex subscription visual smoke test passes.
## Implementation Safeguards
### Keep Codex login/session handling untouched
The recent launch stability work around Codex auth is fragile and should not be mixed with attachment delivery.
Do not change:
- Codex auth discovery.
- ChatGPT account vs API key selection.
- `CODEX_HOME` propagation.
- `forced_login_method=chatgpt` settings.
- preflight auth status copy.
Attachment support should sit after auth has already resolved.
### Avoid stdin bloat
Codex image paths should be CLI args or native request fields. The prompt on stdin should stay text-only.
Bad:
```ts
stdin = `${prompt}\n\nIMAGE_BASE64=${base64}`;
```
Good:
```ts
args.push('--image', managedImagePath);
stdin = prompt;
```
### Artifact lifetime
Codex runtime may read image files after process spawn. Do not delete artifacts immediately after creating command args.
Safe policy:
- Keep managed artifacts at least until delivery result is terminal.
- If message remains retryable, keep artifacts until retry window expires.
- Garbage collect old artifacts by age and reachability from message records.
### Codex retry edge cases
| Case | Safe behavior |
|---|---|
| Retry after artifact GC | Fail with artifact missing, do not send text-only replacement silently. |
| Retry after model switch | Revalidate capability and budgets. |
| Retry after Codex logout | Show auth error, preserve message. |
| Runtime exits after receiving image | Runtime diagnostic, not attachment validation failure. |
### Cross-repo PR checklist
Desktop repo:
- Produces provider-neutral image request.
- Validates managed artifact paths.
- Preserves message/ledger semantics.
Orchestrator repo:
- Converts request to Codex native args.
- Does not inspect renderer metadata.
- Redacts command diagnostics.
- Tests command builder with one and multiple images.
## Failure Injection Tests for Phase 3
```ts
describe('Codex native image delivery', () => {
it('keeps prompt on stdin and images as args', () => {
const command = buildCodexNativeCommand({
prompt: 'What color?',
images: [managedImage('/tmp/app/attachments/a.png')],
});
expect(command.stdin).toBe('What color?');
expect(command.args).toContain('--image');
expect(command.stdin).not.toContain('base64');
});
it('does not mask Codex login failure as attachment failure', async () => {
const result = await runCodexImageDelivery(fakeCodexLoggedOut());
expect(result.error.message).toMatch(/codex login|ChatGPT/i);
expect(result.error.code).not.toBe('attachment_provider_rejected');
});
});
```
## Codex Command Builder Invariants
- `--image` args must appear before prompt stdin is consumed if Codex CLI requires it.
- Paths must be absolute managed artifact paths.
- Do not shell-concatenate paths. Use argv array.
- Do not quote manually inside argv array.
- Do not pass images through environment variables.
- Do not write temp prompt files containing base64 image text.
## Codex Existing-Team Edge Case
Existing teams may have members created before attachment support. That must not matter. Attachment capability is based on current provider/model/runtime, not team creation date.
If old metadata lacks provider/model:
- Use existing compatibility probe behavior.
- Do not infer vision support from old inbox names.
- Block image send until team/member metadata is stable.
## Phase 3 Stop Conditions
Stop and reassess if:
- Codex image support requires changing auth detection.
- Codex CLI version in app runtime does not support `--image`.
- Image args only work in standalone shell but not app-managed runtime.
- Provider diagnostics start saying API key mode when ChatGPT subscription was selected.
## File-Level Implementation Plan
Desktop suggested files:
```text
src/features/agent-attachments/main/providers/codexAttachmentAdapter.ts
src/features/agent-attachments/main/providers/codexAttachmentAdapter.test.ts
```
Orchestrator suggested files:
```text
src/runtime/codex/codexImageArgs.ts
src/runtime/codex/codexImageArgs.test.ts
```
Use actual repo structure names during implementation, but keep this separation: desktop plans delivery, orchestrator serializes runtime command.
### Desktop adapter skeleton
```ts
export function buildCodexNativeAttachmentRequest(input: {
text: string;
attachments: AgentAttachmentPayload[];
}): CodexNativePromptRequest {
return {
text: input.text,
images: input.attachments.map((attachment) => {
const artifact = selectProviderImageArtifact(attachment, 'codex');
return {
artifactId: artifact.artifactId,
path: artifact.absolutePath,
mimeType: artifact.mimeType,
sizeBytes: artifact.sizeBytes,
};
}),
};
}
```
### Orchestrator command args skeleton
```ts
export function appendCodexImageArgs(args: string[], images: CodexRuntimeImage[]): void {
for (const image of images) {
assertManagedRuntimeImagePath(image.path);
args.push('--image', image.path);
}
}
```
Never build a shell string like:
```ts
`codex exec --image ${path}`
```
Use argv arrays only.
### Codex provider proof interaction
Codex image delivery should reuse existing delivery proof gates. Image delivery itself is not success.
Success examples:
- Visible direct reply to user.
- Correct `message_send` relay for delegated/peer path.
- Correct work-sync report for work-sync path.
Failure examples:
- Codex command exits before assistant message.
- Codex auth fails.
- Codex returns text saying it cannot inspect image.
The last one may still be a visible reply. Treat it as delivered but semantically unhelpful, not as transport failure.
## Phase 3 Deep Review Addendum
### Codex runtime compatibility probe
Before enabling Codex image delivery in UI, the app should know whether the configured Codex runtime supports image args.
Conservative options:
1. Static capability by known runtime version - 🎯 8 🛡️ 8.5 🧠 4, примерно `80-160` строк.
2. One-time local CLI help parse/cache - 🎯 7.5 🛡️ 8 🧠 5, примерно `120-220` строк.
3. Always attempt and surface provider error - 🎯 6 🛡️ 6.5 🧠 2, примерно `30-80` строк.
Recommended for release: option 1 if runtime version is already controlled by the app, otherwise option 2 with short cache.
Do not run expensive live Codex image probes on every composer render.
### Codex request lifecycle
```text
Composer send
-> persist message and artifacts
-> backend validates Codex capability
-> desktop builds Codex runtime request
-> orchestrator builds argv with --image
-> Codex runtime produces response
-> existing delivery proof gates decide delivered/failed
```
Every arrow should have tests or explicit diagnostics.
### Codex diagnostics examples
Good:
```text
Codex image delivery failed: optimized image artifact is missing. The message was saved but cannot be retried with the image.
```
```text
Codex ChatGPT login is required before sending image attachments.
```
Bad:
```text
Agent failed.
```
```text
Spawn failed.
```
unless process launch actually failed.
## Phase 3 Implementation Contract Addendum
### Codex DTO between desktop and orchestrator
Use a stable JSON-serializable shape.
```ts
export interface CodexImageAttachmentForRuntime {
artifactId: string;
path: string;
mimeType: 'image/png' | 'image/jpeg';
sizeBytes: number;
}
export interface CodexPromptRuntimeRequest {
text: string;
images?: CodexImageAttachmentForRuntime[];
}
```
Rules:
- `images` omitted or empty means exact text-only legacy behavior.
- `images` present means orchestrator appends native `--image` args.
- Unknown fields should be ignored or rejected consistently, not partially consumed.
### Codex compatibility check
If the app bundles/controls Codex CLI version, use static support knowledge. If not, parse `codex exec --help` once and cache whether `--image` exists.
Pseudo-code:
```ts
export async function detectCodexImageArgSupport(runtime: CodexRuntime): Promise<boolean> {
const cached = codexImageSupportCache.get(runtime.binaryPath);
if (cached) return cached.supported;
const help = await runtime.run(['exec', '--help'], { timeoutMs: 5000 });
const supported = /--image\b/.test(help.stdout + help.stderr);
codexImageSupportCache.set(runtime.binaryPath, { supported, checkedAt: Date.now() });
return supported;
}
```
Do not run this on every message send if it is slow.
### Codex error mapping table
| Error text category | Final classification |
|---|---|
| login required / ChatGPT session | provider auth/session error |
| unknown option `--image` | runtime does not support image attachments |
| file not found | attachment artifact missing |
| max tokens/quota | provider rejected message |
| model cannot inspect image | visible semantic response if delivered |
Business logic should not depend on fragile regex except for display cleanup. Prefer structured exit codes/known adapter failures where available.
## Implementation Readiness Addendum
### Definition of Ready for Phase 3
Before coding Phase 3:
- Phase 1 normalized artifact storage is in place.
- Codex text-only delivery tests are green.
- Orchestrator branch/worktree is aligned with desktop branch.
- Actual app-managed Codex runtime has been checked for image support.
### Cross-repo compatibility rule
Desktop must tolerate orchestrator without image support during development by failing safely before send, not by crashing runtime.
```ts
if (!runtimeCapabilities.codexImageArgs) {
throw new AgentAttachmentError(
'attachment_runtime_transport_failed',
'Current Codex runtime does not support image attachments.'
);
}
```
### Codex no-regression assertions
- Text-only Codex request shape unchanged when `images` is empty.
- Codex ChatGPT account mode still selected when user chose subscription.
- No `OPENAI_API_KEY` fallback is introduced for subscription image send.
- `CODEX_HOME` still points to expected local auth state.
- Provider auth errors are not swallowed by attachment adapter.
### Codex additional edge cases
| Edge case | Expected behavior |
|---|---|
| `--image` unsupported | Block with runtime unsupported diagnostic. |
| Two images but Codex supports one only | Block or reduce with explicit user choice, no silent drop. |
| Image path contains spaces | argv array handles it. Test it. |
| Image path contains shell metacharacters | argv array handles it. No shell interpolation. |
| Codex returns answer in non-English | Accept semantic visual answer if it clearly identifies image. |
## Final Phase 3 Acceptance Specs
### Spec 1 - Codex text-only no regression
```gherkin
Given a user sends a text-only message to a Codex target
When the runtime request is built
Then images is omitted or empty
And the existing Codex text-only path is used
And auth/session handling is unchanged
```
### Spec 2 - Codex image uses native args
```gherkin
Given a user sends a PNG image to Codex
When the orchestrator command is built
Then the prompt remains text-only stdin
And the image path is passed with --image argv
And no shell string interpolation is used
And no base64 appears in stdin
```
### Spec 3 - Codex runtime lacks image support
```gherkin
Given the configured Codex runtime does not support --image
When the user tries to send an image
Then the send is blocked or delivery fails before runtime prompt
And the diagnostic says Codex runtime does not support image attachments
And text-only Codex messages still work
```
### Phase 3 exact PR contract
The Phase 3 PR is acceptable only if:
- Desktop and orchestrator agree on DTO shape.
- Orchestrator command builder has argv tests.
- Codex auth tests or diagnostics are not weakened.
- Runtime support detection is cached or static, not repeated on every render.
- Paths with spaces are covered in tests.
- No API-key fallback is introduced for subscription mode.
### Codex failure copy examples
```text
Codex image delivery is unavailable because the configured Codex runtime does not support --image.
```
```text
Codex ChatGPT login is required before this image can be sent.
```
```text
Prepared image file is missing. Remove and attach the image again.
```
## Phase 3 Pre-Mortem and Extra Safeguards
### Likely Codex mistakes
| Mistake | Concrete prevention |
|---|---|
| `--image` command built through shell string | Use argv arrays only. |
| Image support check runs too often | Cache runtime capability. |
| API key mode accidentally used for subscription | Preserve selected auth mode and existing env rules. |
| Orchestrator receives desktop fields it ignores | Add contract test across DTO. |
| Runtime does not support image but UI allows send | Capability gate checks runtime support. |
### Cross-version DTO tolerance
During development, desktop and orchestrator can briefly be out of sync. The final merged state must be compatible, but code should fail clearly if mismatch happens.
```ts
export function assertCodexRuntimeUnderstandsImages(runtimeCaps: RuntimeCapabilities): void {
if (!runtimeCaps.codexImages) {
throw new AgentAttachmentError(
'attachment_runtime_transport_failed',
'This Codex runtime does not support image attachments yet.'
);
}
}
```
### Codex diagnostic redaction
If command args are included in diagnostics, redact paths if needed and never include prompt text with secrets.
Safe diagnostic:
```json
{
"provider": "codex",
"model": "gpt-5.4-mini",
"imageCount": 1,
"imageArgsUsed": true,
"stdinBytes": 31
}
```
Unsafe diagnostic:
```json
{
"stdin": "full user prompt with private text",
"imageBase64": "..."
}
```

View file

@ -692,3 +692,646 @@ it('serializes image file part without changing response observer', async () =>
| provider key logged in smoke | redactor in smoke script |
| model capability string hardcoded in UI only | backend resolver is source of truth |
| OpenRouter model id normalization inconsistent | shared `normalizeOpenCodeModelRef` tests |
## Detailed Implementation Checklist
### Step 1 - Add OpenCode capability matrix
OpenCode support depends on provider and model, not just provider id.
```ts
export interface OpenCodeModelVisionCapability {
provider: 'openai' | 'openrouter' | string;
modelPattern: RegExp;
vision: 'yes' | 'no' | 'unknown';
evidence: 'live-smoke' | 'docs' | 'manual' | 'default';
}
```
Initial known evidence from prototype:
- `openai/gpt-5.4-mini`: vision yes.
- `openrouter/moonshotai/kimi-k2.6`: vision yes.
- `openrouter/z-ai/glm-4.5v`: vision yes.
- `openrouter/z-ai/glm-5.1`: vision no or unreliable for images based on smoke result.
Unknown models should not silently accept images. Use warning/block depending on UX decision:
- For release-safe behavior: block unknown vision models and explain.
- For exploratory dev behavior: allow with explicit “model may not support images” warning only if user confirms. Not recommended before release.
### Step 2 - Route OpenCode attachments through file parts
Use OpenCode file attachment mechanism, not base64 text.
Conceptual orchestrator command:
```ts
const args = ['run', '--format', 'json', '--model', model, prompt];
for (const image of request.images) {
args.push('-f', image.path);
}
```
If using HTTP/server API instead of CLI, keep the same contract: file path/part is transport-native.
### Step 3 - Integrate with delivery ledger safely
Attachment delivery result must still satisfy the existing prompt delivery proof rules.
| Result | Ledger behavior |
|---|---|
| Model returns visible reply | Existing success path. |
| Model returns plain text direct user reply | Existing safe materialization path. |
| Model says it cannot view image | Delivery succeeded, semantic unsupported response. Mark delivered if visible reply exists. |
| Provider rejects file | Delivery failed with provider diagnostic. |
| Empty assistant turn | Existing bounded repair policy applies. |
| Non-visible tool without progress | Existing proof-directed repair applies. |
### Step 4 - Add exact diagnostics
OpenCode attachment diagnostics should include provider/model context but redact secrets.
Examples:
- `OpenCode model openrouter/z-ai/glm-5.1 is not marked vision-capable for image attachments.`
- `OpenCode image artifact missing before delivery.`
- `OpenCode provider rejected image attachment: <redacted error>`
- `OpenCode returned a visible reply saying it cannot inspect images.`
Do not turn a model “I cannot see images” answer into transport failure if the model produced a visible answer. That is useful user feedback.
## OpenCode Edge Cases
| Case | Expected behavior |
|---|---|
| OpenRouter key quota exceeded | Provider failure with exact redacted diagnostic. |
| OpenCode OAuth stale | Auth/session failure, not attachment failure. |
| Model vision unknown | Block or warn according to release setting, default block. |
| Model claims no image support despite capability yes | Visible semantic answer, not retry-loop. |
| File path accepted but model returns empty turn | Existing bounded empty-turn repair. |
| Concurrent messages with images | Ledger correlation by original message id remains authoritative. |
| Retry of failed image delivery | Reuse same managed artifact if still present, otherwise fail artifact missing. |
| User restarts member after image failure | Restart does not delete inbox message or attachment metadata. |
## Capability Governance
Do not scatter model lists across UI and backend.
Recommended single source:
```text
src/features/agent-attachments/shared/opencodeVisionCapabilities.ts
```
Export both backend and renderer-safe functions:
```ts
export function resolveOpenCodeVisionCapability(input: {
providerId: string;
model: string;
}): AgentAttachmentCapability {
// Pure function, no filesystem, no network.
}
```
When adding a new model:
1. Add capability entry with evidence comment.
2. Add unit test.
3. Add manual smoke result to this plan or docs.
4. Do not infer full provider support from one model.
## Manual OpenCode QA
Run at least these visual prompts:
1. `openai/gpt-5.4-mini`: red-card image -> expected `red`.
2. `openrouter/moonshotai/kimi-k2.6`: red-card image -> expected `red`.
3. `openrouter/z-ai/glm-4.5v`: red-card image -> expected `red`.
4. `openrouter/z-ai/glm-5.1`: red-card image -> expected block or clear unsupported warning.
5. Quota/key failure simulation -> exact provider error visible, no teammate offline unless runtime exits.
## Phase 4 Exit Criteria
- OpenCode text-only prompt path unchanged.
- Vision-capable OpenCode models receive real image files.
- Non-vision/unknown OpenCode models do not silently hallucinate image answers.
- Existing OpenCode repair policy remains bounded.
- Provider quota/auth errors remain exact and do not become generic “spawn failed”.
## Implementation Safeguards
### Capability result must be explainable
Users need to understand why one OpenCode model supports screenshots and another does not.
```ts
export interface CapabilityExplanation {
supported: boolean;
reason: 'known_vision_model' | 'known_non_vision_model' | 'unknown_model' | 'unsupported_provider';
displayText: string;
}
```
Examples:
- `openai/gpt-5.4-mini supports image attachments.`
- `openrouter/z-ai/glm-5.1 is not marked as image-capable. Choose a vision-capable model or remove images.`
- `This OpenCode model has unknown image support. Image delivery is blocked for reliability.`
### Do not couple OpenCode vision support to weak-model repair
The proof-directed repair policy handles missing response proof. It should not decide model vision support.
Keep separate:
- `OpenCodeVisionCapability`: can this model receive images?
- `OpenCodePromptDeliveryRepairPolicy`: did a delivered prompt produce required proof?
- `OpenCodeDeliveryLedger`: what happened to this message attempt?
### OpenCode provider rejection mapping
OpenRouter/OpenCode errors should remain exact but redacted.
| Provider error | Mapping |
|---|---|
| Quota/credits/token budget | provider rejected attachment/message, exact diagnostic. |
| Model not found | model/provider config diagnostic. |
| Unsupported file/input | attachment provider rejected. |
| OAuth/session stale | provider auth/session diagnostic. |
| Empty assistant turn | existing bounded no-response repair. |
No regex classification is required to decide business logic. Regex/string matching can be used only for safe redaction and known diagnostic display cleanup.
### OpenCode PR checklist
- Unknown models do not silently receive images.
- Supported models use native file parts.
- No model-specific prompt hacks are added.
- Existing text-only OpenCode tests still pass.
- Existing empty-turn retry remains bounded.
- Provider quota/auth errors are not masked by attachment code.
### Additional OpenCode tests
```ts
it('blocks unknown OpenCode vision model before delivery', () => {
const decision = resolveOpenCodeVisionCapability({
providerId: 'openrouter',
model: 'some/new-model',
});
expect(decision.supported).toBe(false);
expect(decision.reason).toBe('unknown_model');
});
it('does not invoke repair policy for model capability block', () => {
const result = planOpenCodeAttachmentDelivery(unknownVisionModelInput());
expect(result.status).toBe('blocked');
expect(result.retryable).toBe(false);
});
```
## Failure Injection Tests for Phase 4
```ts
describe('OpenCode image capability and delivery', () => {
it('blocks a known non-vision OpenCode model before prompt delivery', () => {
const result = planOpenCodeImageDelivery({
providerId: 'openrouter',
model: 'z-ai/glm-5.1',
attachments: [fakeImageAttachment()],
});
expect(result.status).toBe('blocked');
expect(result.failureCode).toBe('attachment_model_unsupported');
});
it('does not retry provider quota rejection as proof repair', async () => {
const result = await deliverOpenCodeImage(fakeOpenRouterQuotaError());
expect(result.retryKind).not.toBe('proof_directed_repair');
expect(result.failureCode).toBe('attachment_provider_rejected');
});
});
```
## OpenCode Multi-Hop Edge Cases
| Scenario | Safe behavior |
|---|---|
| User sends image to lead, lead delegates to OpenCode member | Lead message contains image; delegated message should include image only if explicitly attached/forwarded by lead protocol. Do not automatically leak user image to all teammates. |
| User sends direct image to OpenCode member | Direct delivery uses OpenCode file parts. |
| OpenCode member replies with tool call but no visible reply | Existing proof policy decides, not image layer. |
| OpenCode member starts task from image | Task evidence/progress still required for task state. |
| OpenCode model answers “I cannot access images” | Visible reply delivered; UI can show model limitation, no retry loop. |
## Privacy Boundary for Delegation
Do not automatically fan out attachments to every teammate. Attachment propagation should follow explicit message routing.
Rules:
- Direct `user -> member`: deliver attachment to that member only.
- `user -> lead`: deliver attachment to lead only.
- Lead delegation: attachment forwarded only if the generated structured message includes or references the attachment intentionally.
- System nudges/work-sync: do not include user attachments.
This avoids accidental data exposure across teammates.
## Phase 4 Stop Conditions
Stop and reassess if:
- OpenCode requires model-specific prompt hacks to see images.
- Unknown models must be allowed silently for UX convenience.
- Existing proof repair tests start changing because of image capability logic.
- Provider quota errors get converted into generic delivery proof failures.
## File-Level Implementation Plan
Desktop suggested files:
```text
src/features/agent-attachments/main/providers/opencodeAttachmentAdapter.ts
src/features/agent-attachments/shared/opencodeVisionCapabilities.ts
src/features/agent-attachments/main/providers/opencodeAttachmentAdapter.test.ts
```
Orchestrator suggested files:
```text
src/runtime/opencode/opencodeFileParts.ts
src/runtime/opencode/opencodeFileParts.test.ts
```
### Capability matrix skeleton
```ts
const OPENCODE_VISION_MODELS: OpenCodeModelVisionCapability[] = [
{
provider: 'openai',
modelPattern: /^gpt-5\.4-mini$/,
vision: 'yes',
evidence: 'live-smoke',
},
{
provider: 'openrouter',
modelPattern: /^moonshotai\/kimi-k2\.6$/,
vision: 'yes',
evidence: 'live-smoke',
},
{
provider: 'openrouter',
modelPattern: /^z-ai\/glm-4\.5v$/,
vision: 'yes',
evidence: 'live-smoke',
},
{
provider: 'openrouter',
modelPattern: /^z-ai\/glm-5\.1$/,
vision: 'no',
evidence: 'live-smoke',
},
];
```
Keep comments near each entry with date and smoke summary. Do not infer provider-wide support.
### OpenCode adapter skeleton
```ts
export function buildOpenCodeAttachmentPromptRequest(input: {
text: string;
providerId: string;
model: string;
attachments: AgentAttachmentPayload[];
}): OpenCodePromptRequest {
const capability = resolveOpenCodeVisionCapability({ providerId: input.providerId, model: input.model });
if (!capability.supported) {
throw new AttachmentDeliveryError('attachment_model_unsupported', capability.displayText);
}
return {
text: input.text,
files: input.attachments.map((attachment) => selectProviderImageArtifact(attachment, 'opencode').absolutePath),
};
}
```
### OpenCode warning copy examples
Use precise user-facing copy:
```text
This OpenCode model is not verified for image attachments. Choose a vision-capable model or remove the image.
```
```text
OpenCode provider rejected the image attachment: <redacted provider error>
```
Avoid vague copy:
```text
Message failed.
```
```text
OpenCode error.
```
## Phase 4 Deep Review Addendum
### OpenCode capability decision must be cached but refreshable
Model capability matrix is mostly static, but the user may change provider/model in Edit Team.
Rules:
- Resolve capability from current member metadata at send time.
- Do not cache capability per member forever.
- If member model changes, next send uses new model capability.
- UI warnings should refresh after Edit Team save/restart.
### OpenCode direct vs lead-routed images
Direct user to OpenCode member:
```text
user -> alice(OpenCode) with image
```
Deliver image to Alice only.
User to lead:
```text
user -> lead with image
```
Deliver image to lead only. Lead may choose to ask teammate, but automatic image forwarding should not happen unless future protocol explicitly supports attachment references.
This is a privacy and predictability decision.
### OpenCode unsupported model message examples
For direct send:
```text
Alice is using openrouter/z-ai/glm-5.1, which is not verified for image attachments. Remove the image or switch Alice to a vision-capable model.
```
For team send to lead where lead is not OpenCode:
```text
The lead can receive this image. It will not be automatically forwarded to OpenCode teammates unless the lead explicitly sends it.
```
### OpenCode model capability update procedure
When adding a new OpenCode vision model:
1. Run red-square smoke test.
2. Record provider, exact model id, date, result.
3. Add capability matrix entry.
4. Add unit test.
5. If model is unreliable, mark unsupported or unknown, not supported.
## Phase 4 Implementation Contract Addendum
### Provider/model canonicalization
OpenCode model ids can appear with or without provider prefix depending on UI/runtime source. Normalize before capability lookup.
```ts
export function canonicalizeOpenCodeModel(input: {
providerId: string;
model: string;
}): { providerId: string; model: string } {
const providerId = input.providerId.toLowerCase();
const model = input.model.replace(/^openrouter\//, '').replace(/^openai\//, '');
return { providerId, model };
}
```
Capability tests must cover both forms:
- `providerId=openrouter`, `model=moonshotai/kimi-k2.6`
- `providerId=openrouter`, `model=openrouter/moonshotai/kimi-k2.6`
### OpenCode DTO between desktop and orchestrator
```ts
export interface OpenCodePromptRuntimeRequest {
text: string;
files?: Array<{
artifactId: string;
path: string;
mimeType: 'image/png' | 'image/jpeg';
sizeBytes: number;
}>;
}
```
Rules:
- `files` are provider-native attachments, not arbitrary user paths.
- `files` empty means text-only behavior.
- Orchestrator validates path before passing `-f` or HTTP file part.
### OpenCode ledger interaction examples
```ts
if (capability.supported === false) {
return {
accepted: false,
delivered: false,
retryable: false,
failureCode: 'attachment_model_unsupported',
};
}
```
This must not create a ledger retry attempt because the prompt was not delivered.
If provider accepts prompt but returns empty assistant turn, existing ledger retry applies:
```ts
return {
accepted: true,
delivered: false,
retryable: true,
failureCode: 'empty_assistant_turn',
};
```
## Implementation Readiness Addendum
### Definition of Ready for Phase 4
Before coding Phase 4:
- Phase 1 normalized artifacts are stable.
- Existing OpenCode delivery ledger tests are green.
- OpenCode text-only direct reply path is green.
- Capability matrix initial entries are accepted.
- Orchestrator OpenCode file-part mechanism is identified.
### OpenCode mocked sender strategy
Use fake OpenCode sender outcomes:
```ts
type FakeOpenCodeOutcome =
| { kind: 'visible_reply'; text: string }
| { kind: 'empty_assistant_turn' }
| { kind: 'provider_error'; message: string }
| { kind: 'non_visible_tool' };
```
Test attachment adapter and ledger separately:
- Adapter tests: capability and file-part request building.
- Ledger tests: how delivery outcome affects retry/proof.
- Do not combine both in one huge brittle test unless it is e2e.
### OpenCode additional edge cases
| Edge case | Expected behavior |
|---|---|
| Model id casing differs | Canonicalize before capability lookup. |
| Provider prefix duplicated | Canonicalize. |
| OpenRouter key missing | Existing provider auth diagnostic. |
| OpenRouter credits exhausted | Provider diagnostic, no proof repair. |
| Vision model returns empty | Existing bounded empty-turn retry. |
| Non-vision model says cannot see image | If blocked before send, this should not occur. If it occurs from stale capability, visible reply is shown and capability should be corrected. |
### Capability matrix change control
Do not accept capability entries without evidence.
Required evidence for `vision: yes`:
- Real smoke prompt.
- Model id exactly as runtime uses it.
- Date.
- Observed answer.
Required evidence for `vision: no`:
- Real smoke or provider documentation.
- Observed refusal/unsupported behavior.
## Final Phase 4 Acceptance Specs
### Spec 1 - known vision OpenCode model
```gherkin
Given a user sends an image to openrouter/moonshotai/kimi-k2.6
And the capability matrix marks it vision-capable
When delivery is planned
Then the image is sent as a native file part
And existing ledger proof gates decide delivery success
```
### Spec 2 - known non-vision OpenCode model
```gherkin
Given a user sends an image to openrouter/z-ai/glm-5.1
And the capability matrix marks it unsupported
When delivery is planned
Then delivery is blocked before prompt send
And no retry ledger attempt is created
And the UI explains that the model is not verified for image attachments
```
### Spec 3 - OpenCode provider quota error
```gherkin
Given an OpenCode provider rejects a message due to quota
When an image message is sent
Then the exact redacted provider diagnostic is shown
And the failure is not converted into proof repair
And the member launch state is unchanged
```
### Phase 4 exact PR contract
The Phase 4 PR is acceptable only if:
- Capability matrix has tests for every entry.
- Unknown models default safe.
- File-part request builder has tests.
- Existing OpenCode ledger bounded retry tests remain green.
- Unsupported capability block does not create a retry attempt.
- Direct and lead-routed privacy rules are documented in tests or code comments.
### OpenCode capability copy examples
```text
Jack is using openrouter/z-ai/glm-5.1, which is not verified for image attachments. Switch to a vision-capable model or remove the image.
```
```text
Alice can receive this image because openai/gpt-5.4-mini is verified for image attachments.
```
## Phase 4 Pre-Mortem and Extra Safeguards
### Likely OpenCode mistakes
| Mistake | Concrete prevention |
|---|---|
| Unknown model allowed because “maybe vision” | Unknown defaults blocked. |
| Capability lookup misses provider prefix variant | Canonicalization tests. |
| Provider quota error enters proof repair | Provider rejection is terminal for that attempt, not proof repair. |
| Direct user image gets auto-forwarded to teammates | Explicit route-only attachment propagation. |
| Non-vision model gets image-less prompt silently | Capability block before delivery. |
### OpenCode file-part transport abstraction
Do not let high-level delivery know whether OpenCode uses CLI `-f` or HTTP multipart internally.
```ts
export interface OpenCodeFilePartTransport {
sendPrompt(input: {
sessionId: string;
text: string;
files: OpenCodeRuntimeFilePart[];
}): Promise<OpenCodePromptOutcome>;
}
```
This allows future switch from CLI to server API without changing capability or ledger logic.
### OpenCode semantic refusal handling
If model responds visibly with “I cannot inspect images”:
- Direct user message: show response as delivered.
- Add optional diagnostic warning that capability matrix may be wrong.
- Do not retry automatically.
- Consider downgrading capability entry after repeated smoke failure.
### OpenCode tests for no silent drop
```ts
it('does not send text-only prompt when image is unsupported', async () => {
const sender = createFakeOpenCodeSender();
await expect(deliverImageToUnsupportedOpenCodeModel(sender)).rejects.toMatchObject({
code: 'attachment_model_unsupported',
});
expect(sender.calls).toHaveLength(0);
});
```

View file

@ -629,3 +629,683 @@ All OpenRouter models support screenshots.
| docs become stale | capability catalog references live smoke date |
| unsupported negative model changes behavior | update catalog/test explicitly |
| copied diagnostics leak image data | redaction unit tests |
## Detailed E2E Matrix
Use one tiny deterministic image fixture: a red square with no metadata dependencies.
| Provider path | Model | Prompt | Expected |
|---|---|---|---|
| Claude subscription | current configured Claude model | `What color is the square? One word.` | `red` |
| Codex subscription | `gpt-5.4-mini` or available vision model | same | `red` |
| OpenCode OpenAI | `openai/gpt-5.4-mini` | same | `red` |
| OpenCode OpenRouter | `openrouter/moonshotai/kimi-k2.6` | same | `red` |
| OpenCode OpenRouter | `openrouter/z-ai/glm-4.5v` | same | `red` |
| OpenCode OpenRouter | `openrouter/z-ai/glm-5.1` | same | blocked/unsupported |
## Negative E2E Matrix
| Scenario | Expected |
|---|---|
| Oversized image | UI blocks before send with size/optimization details. |
| Corrupt image | UI blocks with decode error. |
| Missing optimized artifact | Delivery fails actionable, message preserved. |
| Provider quota exhausted | Provider diagnostic shown, no generic spawn failed. |
| Codex logged out | Existing Codex login diagnostic shown. |
| OpenCode model no vision | Unsupported warning/block shown. |
| Multiple images over total budget | Deterministic budget warning. |
| Runtime exits after send | Runtime exit diagnostic separate from attachment validation. |
## Automated Test Layers
### Pure tests
- Budget selection.
- Capability resolution.
- MIME/type classification.
- Provider adapter serialization.
- Redaction.
### Main process tests
- IPC validation rejects invalid attachment ids.
- Backend never accepts raw renderer paths.
- Missing artifacts produce controlled failure.
- Delivery failure does not mutate launch state.
### Renderer tests
- Preview and optimization warnings.
- Target switching recomputes capability warning.
- Send button blocks unsupported attachments.
- Removing attachments clears warnings.
### Safe E2E tests
- Provider command builders receive native attachment parts.
- One real smoke per supported provider before release.
## Release Checklist
Before enabling broadly:
- `pnpm typecheck --pretty false` passes.
- Focused attachment pure tests pass.
- Existing team launch tests still pass.
- Existing OpenCode delivery tests still pass.
- At least one real visual smoke is green for Claude, Codex, and OpenCode.
- No new dependency is added without version/license check.
- App logs do not contain raw base64 image data.
- Copy diagnostics redacts paths/secrets where required.
- User-facing warnings are clear and actionable.
## Observability Additions
Add lightweight diagnostics that help debug without leaking content.
Safe diagnostic fields:
```ts
interface AttachmentDiagnosticSummary {
attachmentCount: number;
totalBytes: number;
optimizedTotalBytes: number;
providers: string[];
blockedReason?: AttachmentDeliveryFailureCode;
warningCodes: string[];
}
```
Unsafe diagnostic fields:
- Base64 content.
- Full OCR/text extracted from images.
- Provider API keys.
- User home directory paths unless already part of explicit local diagnostics.
- Full binary file hashes if privacy-sensitive.
## Post-Release Monitoring
Watch for these regressions after release:
- Users report team going offline after sending images.
- Users report image sent but model says it cannot see image on a supposedly vision-capable model.
- Inbox/message JSON grows unexpectedly large.
- Renderer memory spikes after attaching multiple screenshots.
- Codex/OpenCode diagnostics become generic after attachment failures.
- Retry loops repeat the same image message without bounded attempts.
If any happen:
1. Disable only affected provider adapter path.
2. Keep normalization and UI warnings if stable.
3. Preserve messages and attachment metadata for forensic debugging.
4. Do not revert unrelated bootstrap/process backend changes in the same hotfix.
## Final Safety Review Questions
Before implementation starts, answer these in the PR description:
- Does any provider receive base64 as plain text? If yes, why is that unavoidable?
- Can renderer force backend to read an arbitrary file path? It must not.
- What happens if optimized artifact is deleted before retry?
- What happens if user switches target model after attaching image?
- What happens if a provider supports text but not vision?
- Does this change launch readiness or only message delivery?
- Are all retries bounded by existing ledger/runtime rules?
## Deep Verification Plan
### Deterministic fixture generation
Keep fixture generation local and deterministic so E2E does not depend on external image files.
```ts
export async function createRedSquareFixture(path: string): Promise<void> {
// Use a tiny PNG fixture committed to test fixtures, or generate with deterministic bytes.
}
```
Preferred:
- Commit a tiny PNG fixture under test fixtures if repository policy allows binary fixtures.
- Otherwise generate once in test setup using a deterministic encoder.
### E2E assertion style
Do not require exact model wording. Normalize response and assert semantic color.
```ts
function answerMentionsRed(text: string): boolean {
return /\bred\b/i.test(text) || /красн/i.test(text);
}
```
Avoid accepting `I cannot view images` as pass.
### E2E timeout strategy
- Provider smoke tests should have generous but bounded timeout.
- Failure should print provider/model, delivery path, and redacted stderr tail.
- One provider failure should not hide other provider results.
### Release-blocking vs non-blocking tests
Release-blocking:
- Pure unit tests.
- Serialization tests.
- One Claude smoke if Claude support is enabled.
- One Codex smoke if Codex support is enabled.
- One OpenCode OpenAI smoke if OpenCode image support is enabled for OpenAI.
Non-blocking/manual before release:
- Multiple OpenRouter model smokes.
- Very large image performance test.
- HEIC/clipboard platform-specific tests.
### User documentation checklist
Document these user-facing facts:
- Screenshots are automatically optimized before sending.
- Some OpenCode models do not support image attachments.
- If a model cannot receive images, the UI will ask the user to switch model or remove images.
- Raw files are not pasted as base64 into messages.
- If delivery fails, the message is preserved and the error explains whether it was size, model support, auth, quota, or runtime.
### Final PR template section
Each implementation PR should include:
```md
## Attachment safety checklist
- [ ] Text-only messages unchanged.
- [ ] No base64 plain-text prompt fallback.
- [ ] Backend validates attachment ids and paths.
- [ ] Unsupported model behavior tested.
- [ ] Provider auth errors remain provider auth errors.
- [ ] Delivery failure does not mark teammate offline unless runtime exits.
- [ ] Copy diagnostics redacts secrets.
- [ ] E2E smoke listed or intentionally deferred.
```
## Failure Injection E2E Scenarios
These should be run after unit coverage is green.
### Runtime survives attachment failure
1. Start a team with one known working member.
2. Send unsupported oversized image.
3. Verify send is blocked or delivery fails without marking team/member offline.
4. Send text-only message afterward.
5. Verify member still responds.
### Provider auth failure remains auth failure
1. Temporarily use stale/invalid provider auth in test environment.
2. Send image message.
3. Verify diagnostics mention auth/login, not image unsupported.
4. Restore auth.
### Non-vision model blocked
1. Select known non-vision OpenCode model.
2. Attach image.
3. Verify UI blocks before delivery.
4. Remove image.
5. Verify text-only send works.
### Artifact missing on retry
1. Save message with image artifact.
2. Delete artifact file in test harness before retry.
3. Trigger retry.
4. Verify `attachment_artifact_missing` and no text-only fallback.
## Release Candidate Checklist
A release candidate can include attachment support only if:
- All Phase 1 pure/domain tests pass.
- Provider enabled in UI has at least one real visual smoke test green.
- Unsupported model UX has been manually verified.
- Copy diagnostics includes attachment error code and provider/model, but no base64.
- Team launch tests are unchanged or green.
- Existing user message delivery tests are green.
- OpenCode proof repair tests are green.
- Codex auth/preflight tests are green.
## Suggested rollout order in release notes
1. “Image attachments are optimized before sending.”
2. “Claude and Codex vision-capable models receive screenshots through native image channels.”
3. “OpenCode image support depends on the selected model. The UI warns when a model is not image-capable.”
4. “Unsupported images are blocked with actionable diagnostics instead of being pasted as text.”
## Post-Implementation Audit Checklist
After all phases are implemented, search the codebase for these anti-patterns:
```text
base64,
data:image,
--image,
-f,
attachment_provider_rejected,
attachment_model_unsupported,
launch-state,
spawnStatus,
bootstrapConfirmed
```
Audit goals:
- `base64` appears only in provider-native block serialization or tests.
- `data:image` is not inserted into prompt text.
- `--image` appears only in Codex command builder and tests.
- `-f` image usage appears only in OpenCode adapter/command builder and tests.
- Attachment failures do not write launch-state/spawn status.
- Bootstrap code does not import attachment modules.
## Implementation Completion Checklist by Phase
### Phase 1 completion evidence
Attach to PR:
- Unit test output for budget/validation/capability modules.
- Screenshot of composer warning for unsupported/oversized image.
- Screenshot of optimized image metadata display if UI exposes it.
- Confirmation that no provider delivery code changed.
### Phase 2 completion evidence
Attach to PR:
- Claude serialization test output.
- Real Claude visual smoke result.
- Text-only Claude regression result.
- Diagnostic screenshot for oversized image block.
### Phase 3 completion evidence
Attach to PR:
- Codex command builder test output.
- Real Codex visual smoke result using subscription mode.
- Confirmation Codex auth diagnostics unchanged.
- Confirmation no shell string command construction was added.
### Phase 4 completion evidence
Attach to PR:
- OpenCode capability matrix tests.
- OpenCode OpenAI visual smoke result.
- OpenRouter Kimi/GLM visual smoke results if enabled.
- Unsupported GLM/non-vision model warning screenshot.
- Confirmation OpenCode proof repair lifecycle unchanged.
### Phase 5 completion evidence
Attach to PR:
- Full focused test list.
- Manual smoke matrix with provider/model/date.
- Known limitations section.
- Release note draft.
## Final bug-prevention checklist
Before merging the final phase, manually inspect these concerns:
- Does every provider adapter have tests for empty attachments and image attachments?
- Does every adapter reject unsupported MIME types?
- Does every adapter avoid changing text-only behavior?
- Does any code parse provider error text to decide model capability? It should not.
- Does any code write attachment failures into launch state? It must not.
- Does any code drop attachments silently? It must not.
- Does unsupported model UI block send or require explicit user action? It should.
- Does retry preserve attachment identity? It must.
- Does artifact missing produce a clear error? It must.
## Honest risk estimate after this planning
If implemented phase-by-phase exactly as planned:
- Phase 1 risk: `2.5/10`
- Phase 2 risk: `3/10`
- Phase 3 risk: `4/10`
- Phase 4 risk: `5/10`
- Phase 5 risk: `2/10`
- Overall release risk: `3.5/10`
If implemented as one broad refactor:
- Overall release risk: `7/10`
Main reason: the dangerous part is not image resizing itself. The dangerous part is accidentally coupling attachments to delivery proofs, runtime liveness, auth diagnostics, or provider-specific launch code.
## Phase 5 Deep Review Addendum
### Cross-provider E2E script shape
A single script can reduce manual drift, but it should not be required for normal unit tests.
Conceptual CLI:
```bash
pnpm tsx scripts/smoke-agent-attachments.ts \
--provider claude \
--model current \
--image test/fixtures/red-square.png
```
Output should be machine-readable enough for logs:
```json
{
"provider": "codex",
"model": "gpt-5.4-mini",
"image": "red-square.png",
"result": "pass",
"answerPreview": "red",
"durationMs": 18432
}
```
### E2E failure report template
When an image smoke fails, report:
```text
Provider: OpenCode
Model: openrouter/z-ai/glm-5.1
Expected: red
Observed: I cannot view images
Classification: model_not_vision_capable
Action: keep blocked/unsupported in capability matrix
```
Do not report it as generic runtime failure.
### Manual release drill
Before release, do this exact drill:
1. Start app fresh.
2. Create or open a simple team with Claude lead.
3. Send text-only message, confirm normal reply.
4. Send one screenshot to Claude, confirm visual answer.
5. Switch to Codex member, send same screenshot, confirm visual answer.
6. Switch to OpenCode vision member, send same screenshot, confirm visual answer.
7. Switch to OpenCode unsupported model, confirm UI blocks image.
8. Send text-only to unsupported model, confirm it still works.
9. Restart app, verify messages render and attachments do not disappear.
10. Copy diagnostics from failed unsupported send, verify no base64/secrets.
### Final release confidence rating target
Do not ship broad attachment support unless confidence reaches:
- 🎯 at least `8.5/10` for Claude.
- 🎯 at least `8/10` for Codex.
- 🎯 at least `7.5/10` for OpenCode vision-capable known models.
- 🛡️ at least `9/10` that unsupported models fail safely.
- 🛡️ at least `9/10` that launch/readiness is unaffected.
## Phase 5 Implementation Contract Addendum
### Test command matrix
Focused checks after each phase should be small and targeted.
Phase 1:
```bash
pnpm vitest run test/features/agent-attachments/budgets.test.ts test/features/agent-attachments/validation.test.ts
pnpm typecheck --pretty false
```
Phase 2:
```bash
pnpm vitest run test/features/agent-attachments/claudeAttachmentAdapter.test.ts
pnpm vitest run test/main/services/team/TeamProvisioningServiceRelay.test.ts
```
Phase 3:
```bash
pnpm vitest run test/features/agent-attachments/codexAttachmentAdapter.test.ts
# plus orchestrator codex command builder test in agent_teams_orchestrator
```
Phase 4:
```bash
pnpm vitest run test/features/agent-attachments/opencodeAttachmentAdapter.test.ts
pnpm vitest run test/main/services/team/OpenCodePromptDeliveryLedger.test.ts test/main/services/team/OpenCodePromptDeliveryWatchdog.test.ts
```
Phase 5:
```bash
pnpm typecheck --pretty false
pnpm vitest run test/features/agent-attachments/**/*.test.ts
```
Exact test paths may change during implementation, but the test categories should remain.
### Smoke result ledger
Create a simple markdown table in PR or docs after live smokes:
| Date | Provider | Model | Runtime path | Result | Notes |
|---|---|---|---|---|---|
| 2026-05-09 | Claude | current | app team delivery | pass | red-square -> red |
| 2026-05-09 | Codex | gpt-5.4-mini | app team delivery | pass | red-square -> red |
| 2026-05-09 | OpenCode | openai/gpt-5.4-mini | app team delivery | pass | red-square -> red |
This prevents accidental reliance on standalone prototype results when app path differs.
### Final implementation order reminder
Do not start with provider adapters. Start with Phase 1 domain and storage because provider adapters need a stable input contract.
Correct order:
1. Normalize and persist artifacts safely.
2. Block unsupported/oversized sends.
3. Add Claude adapter.
4. Add Codex adapter.
5. Add OpenCode adapter.
6. Add E2E and docs polish.
Wrong order:
1. Hack provider CLI args.
2. Later figure out storage.
3. Later figure out UI warnings.
4. Later discover retries lost files.
## Implementation Readiness Addendum
### Definition of Ready for Phase 5
Before Phase 5:
- All provider adapters have unit tests.
- At least one provider image path works in app-managed runtime.
- Unsupported model UX exists.
- Diagnostics taxonomy is implemented.
### Fixture strategy
Prefer one committed tiny PNG fixture plus optional generated variants.
Fixtures:
```text
test/fixtures/attachments/red-square.png
test/fixtures/attachments/red-square-large.png
test/fixtures/attachments/corrupt.png
test/fixtures/attachments/not-image.txt
```
If binary fixtures are not desired, generate in test setup, but keep generation deterministic.
### Smoke classification helper
```ts
export function classifyVisualSmokeAnswer(answer: string): 'pass' | 'refusal' | 'wrong' | 'empty' {
const normalized = answer.trim().toLowerCase();
if (!normalized) return 'empty';
if (/cannot|can't|unable|view|image/.test(normalized) && !/red/.test(normalized)) return 'refusal';
if (/\bred\b|красн/.test(normalized)) return 'pass';
return 'wrong';
}
```
Do not overfit this helper to one provider. It is only for smoke classification.
### Release notes limitations section
Document limitations honestly:
```md
Known limitations:
- Image support depends on selected provider and model.
- Some OpenCode/OpenRouter models are intentionally blocked until verified for vision.
- Very large images are optimized and may lose some detail.
- Non-image files continue using existing file behavior and are not part of the new image pipeline.
```
### Final post-merge watchlist
For the first release after merge, watch for:
- Increased renderer memory usage after attaching screenshots.
- Any report of team launch failures after sending images.
- Any report that images appear in message text as base64.
- Any OpenCode model that should be marked unsupported.
- Any Codex auth diagnostic regression.
- Any failed retry due to missing artifacts.
## Final Phase 5 Acceptance Specs
### Spec 1 - cross-provider visual smoke matrix
```gherkin
Given the red-square fixture
When visual smoke runs for each enabled provider path
Then every supported provider/model answers red
And every unsupported model is blocked or classified unsupported
And no failure is reported as generic spawn failure
```
### Spec 2 - diagnostics are safe
```gherkin
Given an attachment delivery fails
When the user copies diagnostics
Then diagnostics include provider, model, failure code, and redacted reason
And diagnostics do not include base64 image bytes
And diagnostics do not include API keys or bearer tokens
```
### Spec 3 - launch unaffected
```gherkin
Given a team launches without attachments
When launch completes
Then launch behavior matches pre-attachment behavior
And no attachment module participates in bootstrap readiness
```
### Phase 5 exact PR contract
The Phase 5 PR is acceptable only if:
- Smoke results are recorded with date/provider/model/runtime path.
- Unsupported model behavior is manually verified.
- Existing launch and delivery focused tests are green.
- Documentation explains limitations honestly.
- Release notes avoid promising universal vision support.
- Post-merge watchlist is included in release checklist.
### Final “do not ship if” list
Do not ship if any of these are true:
- Text-only messages regress for any provider.
- Any provider receives base64 as plain prompt text.
- Unsupported OpenCode models silently accept image messages.
- Attachment failure marks teammate spawn failed without runtime exit evidence.
- Codex subscription auth diagnostics regress.
- Image retry can silently send text without the original image.
- Copy diagnostics leaks base64 or secrets.
## Phase 5 Pre-Mortem and Extra Safeguards
### Likely verification mistakes
| Mistake | Prevention |
|---|---|
| Standalone provider smoke passes but app path fails | Smoke through app-managed team delivery. |
| E2E accepts refusal as pass | Use classifier that rejects “cannot view image”. |
| Only happy path tested | Include unsupported model, artifact missing, and auth failure. |
| Logs leak base64 | Copy diagnostics test searches for base64 markers. |
| Release notes overpromise | Explicit known limitations section. |
### Copy diagnostics acceptance test
```ts
it('copy diagnostics omits image bytes and secrets', () => {
const text = buildAttachmentFailureDiagnostics(fakeImageFailure());
expect(text).toContain('attachment_model_unsupported');
expect(text).not.toContain('data:image');
expect(text).not.toMatch(/[A-Za-z0-9+/]{200,}={0,2}/);
expect(text).not.toMatch(/sk-[a-zA-Z0-9_-]+/);
});
```
### App-path smoke script must prove route
Smoke result should include route:
```json
{
"route": "user->lead",
"teamName": "attachment-smoke-claude",
"provider": "anthropic",
"model": "current",
"attachmentTransport": "claude_structured_blocks",
"result": "pass"
}
```
If route is standalone CLI only, mark as prototype evidence, not release evidence.
### Release manager checklist
- Read all `Do not ship if` items.
- Confirm no broad feature flag remains.
- Confirm capability gates are user-visible product behavior.
- Confirm no launch/provisioning files import attachment modules.
- Confirm provider smoke evidence is app-path evidence.
- Confirm rollback by provider is possible without data migration.