From 7b88d495d0beec53faf470b6fac44442d5b4d9b0 Mon Sep 17 00:00:00 2001 From: 777genius Date: Sat, 9 May 2026 01:09:36 +0300 Subject: [PATCH] docs(team): add agent attachment implementation plans --- .../agent-attachments-architecture-plan.md | 1125 +++++++++++++++++ ...-phase-1-normalization-and-budgets-plan.md | 808 ++++++++++++ ...chments-phase-2-claude-stream-json-plan.md | 564 +++++++++ ...t-attachments-phase-3-codex-native-plan.md | 592 +++++++++ ...ttachments-phase-4-opencode-vision-plan.md | 643 ++++++++++ ...attachments-phase-5-e2e-and-polish-plan.md | 680 ++++++++++ 6 files changed, 4412 insertions(+) diff --git a/docs/team-management/agent-attachments-architecture-plan.md b/docs/team-management/agent-attachments-architecture-plan.md index d69e0b38..ef868cef 100644 --- a/docs/team-management/agent-attachments-architecture-plan.md +++ b/docs/team-management/agent-attachments-architecture-plan.md @@ -1020,3 +1020,1128 @@ These are intentionally broad. Not all need implementation in v1, but each shoul | OpenCode | model accepts file but refuses vision | capability catalog update, no transport blame | | Codex | image path missing | pre-spawn local error | | Claude | API says image invalid | provider/runtime error, not optimizer error | + +## Execution Protocol for Implementation + +This project should be implemented as a sequence of narrow commits. Each commit must preserve all existing non-attachment behavior. + +Recommended commit sequence: + +1. `feat(attachments): add normalized attachment domain` +2. `feat(attachments): add renderer image optimization` +3. `feat(attachments): route claude images through stream-json blocks` +4. `feat(attachments): pass codex images through native image args` +5. `feat(attachments): pass opencode vision images through file parts` +6. `test(attachments): add cross-provider image delivery e2e` + +Rules during implementation: + +- Do not mix launch/runtime stabilization changes into attachment commits. +- Do not change bootstrap, member readiness, process backend, tmux fallback, or work-sync semantics. +- Do not add a native image processing dependency to Electron main in Phase 1. +- Do not introduce provider-specific image normalization logic in renderer. +- Do not trust renderer-computed budget/capability decisions as final authority. +- Do not log attachment base64, OAuth tokens, API keys, absolute temp artifact content, or full prompt payloads. + +## Cross-Phase Compatibility Contract + +All phases should agree on the same normalized attachment contract. This prevents Phase 2/3/4 from each inventing a slightly different shape. + +```ts +export interface AgentAttachmentPayload { + id: string; + originalName: string; + mimeType: 'image/png' | 'image/jpeg' | 'image/webp' | 'application/pdf' | 'text/plain' | string; + sizeBytes: number; + kind: 'image' | 'file'; + source: 'composer' | 'clipboard' | 'drag-drop' | 'task' | 'inbox'; + storage: { + originalPath?: string; + optimizedPath?: string; + thumbnailPath?: string; + }; + image?: { + width?: number; + height?: number; + animated?: boolean; + optimizedWidth?: number; + optimizedHeight?: number; + optimization: 'none' | 'lossless' | 'resized' | 'jpeg-reencoded' | 'unsupported'; + }; + warnings: string[]; +} +``` + +Compatibility rules: + +- New fields must be optional until every consumer is migrated. +- Existing persisted message attachments must continue rendering even if they lack normalized metadata. +- Provider adapters must accept both freshly normalized attachments and old persisted attachments after backend hydration. +- Missing optimized variant must not crash delivery. It should fall back to original only if the provider budget allows it. +- Missing original path must fail visibly for delivery, but must not corrupt the inbox record. + +## Provider Adapter Boundary + +Use a single boundary where provider-specific logic begins. + +```ts +export interface AttachmentDeliveryAdapter { + readonly providerId: 'anthropic' | 'codex' | 'opencode'; + + canDeliverAttachment(input: { + model: string; + attachment: AgentAttachmentPayload; + capability: AgentAttachmentCapability; + }): AttachmentDeliveryDecision; + + buildDeliveryParts(input: { + text: string; + attachments: AgentAttachmentPayload[]; + budget: AgentAttachmentBudget; + }): Promise; +} +``` + +Architecture intent: + +- Renderer owns user experience and local image optimization. +- Main process owns validation, redaction, budget enforcement, and persistence. +- Provider adapters own transport-specific serialization. +- Orchestrator owns CLI/runtime invocation details where those details are outside Electron. + +This keeps SOLID boundaries clear: + +- Single Responsibility: normalization, capability, and transport are separate modules. +- Open-Closed: adding a new provider should add an adapter, not mutate every call site. +- Interface Segregation: UI code should not know about `--image`, OpenCode `-f`, or Claude stream-json details. +- Dependency Inversion: delivery flow depends on adapter interfaces, not concrete provider branches. + +## Invariants and Enforcement Points + +| Invariant | Enforced in renderer | Enforced in main | Enforced in adapter | Notes | +|---|---:|---:|---:|---| +| No raw base64 text in prompt | yes | yes | yes | Base64 may exist in provider-native SDK block only when required. | +| Original attachment remains immutable | yes | yes | no | Optimized files are derived artifacts. | +| User sees unsupported-model warning before send | yes | yes | no | Backend still blocks if renderer is stale. | +| Attachments do not change launch readiness | no | yes | yes | Delivery errors are message-level, not bootstrap truth. | +| Provider limits are bounded | yes | yes | yes | Renderer preflight is UX, backend is authority. | +| Secrets are never copied into diagnostics | no | yes | yes | Provider stderr tails must be redacted. | +| File paths are never supplied by renderer unchecked | no | yes | no | Backend resolves managed artifact ids to paths. | + +## Failure Taxonomy + +Attachment failures should be separated from runtime failures. + +```ts +type AttachmentDeliveryFailureCode = + | 'attachment_too_large' + | 'attachment_type_unsupported' + | 'attachment_model_unsupported' + | 'attachment_optimization_failed' + | 'attachment_artifact_missing' + | 'attachment_provider_rejected' + | 'attachment_runtime_transport_failed'; +``` + +Mapping rules: + +- `attachment_model_unsupported`: show before send when possible. Do not start delivery. +- `attachment_too_large`: show before send when possible. Do not start delivery. +- `attachment_artifact_missing`: message stays saved, delivery fails actionable, no team offline state. +- `attachment_provider_rejected`: delivery failure with exact redacted provider error. +- `attachment_runtime_transport_failed`: delivery failure, not teammate spawn failure unless runtime actually exits. + +## Release Risk Budget + +Overall implementation risk if all phases are done at once: `6/10`. + +Recommended risk after phased implementation: `3/10`. + +Why phased risk is lower: + +- Phase 1 is UI/domain-only and can ship disabled from provider paths until Phase 2. +- Claude, Codex, and OpenCode adapters can be validated independently. +- E2E can prove each provider before enabling broad UX expectations. +- Rollback of a provider adapter does not require removing normalization. + +## Whole-Project Definition of Done + +The attachment architecture should be considered done only when all of these are true: + +- A user can attach a large screenshot and see deterministic resize/compression feedback before send. +- Claude subscription path receives the image as a real image and answers a visual question. +- Codex subscription path receives the image as a real image and answers a visual question. +- OpenCode OpenAI vision-capable model receives the image as a real image and answers a visual question. +- OpenCode OpenRouter vision-capable model receives the image as a real image and answers a visual question. +- OpenCode non-vision model is blocked or clearly warned, not allowed to silently hallucinate. +- Multiple images are budgeted deterministically. +- Message delivery failure does not mark the teammate offline unless the teammate runtime actually dies. +- Diagnostics explain whether the problem is size, type, model capability, provider rejection, or runtime crash. +- Existing text-only messages, task delegation, work-sync, and bootstrap flows behave exactly as before. + + +## Deep Hardening Addendum + +### Top 3 architectural options considered + +1. Provider-native adapters with shared normalization - 🎯 9.4 🛡️ 9.3 🧠 5.5, примерно `900-1500` строк across all phases. + + This is the selected approach. The shared layer owns optimization, budgets, capability checks, and diagnostics. Each provider adapter only knows how to serialize already-normalized artifacts into its native transport. + + Why this is safest: + + - No fake universal base64 prompt format. + - Each provider receives images through a mechanism already proven in prototypes. + - Attachment support can be rolled out provider-by-provider. + - Text-only behavior remains unchanged. + +2. Universal base64-in-text fallback - 🎯 4.5 🛡️ 3 🧠 3, примерно `250-450` строк. + + This looks simple, but it is the wrong abstraction. It bloats prompts, breaks stdin/process limits, degrades model understanding, and makes provider behavior unpredictable. It also risks causing exactly the kind of runtime crashes we have been trying to stabilize. + +3. Fully external artifact URL service - 🎯 7 🛡️ 7.5 🧠 8.5, примерно `1600-2600` строк. + + This may be good long-term for remote/cloud agents, but it is too much for the current release. It introduces upload lifecycle, access control, expiry, privacy, and offline-mode concerns. + +### Global go/no-go gates + +Do not implement the next phase until the previous phase satisfies its gate. + +| Gate | Required before | +|---|---| +| Normalized attachment type is stable and tested | Claude/Codex/OpenCode provider work | +| Renderer optimizer does not mutate existing composer behavior | Any provider delivery wiring | +| Backend can reject forged attachment metadata | Any real provider send | +| Claude image smoke passes | Shipping Claude support | +| Codex image smoke passes | Shipping Codex support | +| OpenCode model capability matrix is explicit | Shipping OpenCode support | +| Negative unsupported-model UX is clear | Any broad release | + +### Migration seams to keep stable + +These seams should become small stable boundaries. Do not leak implementation details across them. + +```ts +// Renderer boundary +prepareComposerAttachmentsForSend(files, target): Promise; + +// Main boundary +hydrateAndValidateAgentAttachments(messageDraft, target): Promise; + +// Delivery boundary +planAttachmentDelivery(providerTarget, attachments): AttachmentDeliveryPlan; + +// Provider boundary +adapter.buildDeliveryParts({ text, attachments, budget }): Promise; +``` + +If a future provider is added, only `planAttachmentDelivery` and a provider adapter should change. The composer should not grow provider-specific branches. + +### Data compatibility strategy + +Persisted messages can exist in three versions: + +1. Legacy messages without attachments. +2. Current messages with file/image metadata but no optimized variant metadata. +3. New normalized attachment messages. + +Hydration should be tolerant: + +```ts +export function hydratePersistedAttachment(raw: unknown): AgentAttachmentPayload | null { + if (isNormalizedAttachment(raw)) return raw; + if (isLegacyImageAttachment(raw)) return normalizeLegacyImageAttachment(raw); + if (isLegacyFileAttachment(raw)) return normalizeLegacyFileAttachment(raw); + return null; +} +``` + +Rules: + +- Do not mutate old message JSON on read. +- Normalize in memory for delivery/rendering. +- Only write new schema for newly created or edited messages. +- If old attachment cannot be hydrated, show it as unavailable, not crash message rendering. + +### Security checklist + +Before implementation PR is accepted: + +- Renderer cannot pass arbitrary absolute paths to backend. +- Backend resolves attachment ids to managed artifact paths. +- IPC rejects `../`, absolute member-controlled paths, unknown attachment ids, and unknown log paths. +- Diagnostics redact API keys and bearer tokens. +- Copy diagnostics does not include raw image content. +- Temp artifacts are not world-readable if stored outside app data. +- Provider errors are exact enough to debug but redacted. + +### Performance checklist + +- Large images are downscaled before send. +- Renderer optimization is cancellable when attachment is removed. +- Multiple concurrent optimizations are bounded. +- Message JSON does not contain base64 blobs. +- Virtualized rendering is not required for attachment previews in v1, but previews should use thumbnails. +- Main process does not synchronously read huge files on UI-critical IPC handlers. + +### Reliability checklist + +- Delivery success is still provider proof-based, not “we attached a file”. +- Attachment failure does not imply teammate offline. +- Runtime crash after attachment send is surfaced separately with stderr/process diagnostics. +- Retry uses same artifact if available, otherwise fails as artifact missing. +- User can remove/replace attachment after validation failure. +- Unsupported vision model is blocked before expensive runtime call. + + +## Red-Team Review: Most Likely Regressions + +These are the failure modes most likely to slip into implementation if the plan is not followed strictly. + +### Regression 1 - Attachments accidentally become launch/runtime truth + +Risk: attachment delivery failure gets mixed with teammate liveness or bootstrap projection. + +How to prevent: + +- Keep attachment failures in message delivery diagnostics only. +- Do not write attachment errors into `launch-state.json`. +- Do not update member `spawnStatus` from attachment provider errors. +- If the process exits, runtime liveness code may report that separately, but the attachment layer must not infer it. + +Review question: + +```text +Can this code path mark a member failed/offline without process/liveness evidence? +``` + +If yes, reject the implementation. + +### Regression 2 - Renderer paths become backend file access + +Risk: UI sends `/Users/.../secret` path and backend reads it as an attachment. + +How to prevent: + +- Renderer sends attachment ids and normalized metadata. +- Backend resolves ids to managed artifact paths. +- Backend rejects paths outside the managed artifact root. +- Provider adapters never accept user-supplied paths directly. + +Review question: + +```text +Could a malicious renderer cause main process to read an arbitrary local file? +``` + +If yes, reject the implementation. + +### Regression 3 - Text-only delivery changes subtly + +Risk: refactor changes text message serialization while adding image support. + +How to prevent: + +- Add fast path for `attachments.length === 0` that calls existing text-only function. +- Add regression tests for text-only direct user, delegate, ask, task, work-sync, and system delivery. +- Do not normalize text-only messages through provider image adapters. + +Review question: + +```text +Does this PR modify text-only provider payload shape? +``` + +If yes, require explicit tests and reason. + +### Regression 4 - OpenCode weak models get infinite repair loops + +Risk: image support and proof repair interact, causing repeated retries. + +How to prevent: + +- Capability block is terminal and not retryable. +- Provider rejection follows existing ledger maxAttempts. +- Empty assistant turn remains bounded by existing ledger policy. +- Semantic “I cannot view images” visible reply is not transport retry. + +Review question: + +```text +Can this path enqueue another prompt without consuming one bounded ledger attempt? +``` + +If yes, reject. + +### Regression 5 - Image optimization hides too much quality loss + +Risk: screenshots become unreadable, model fails, user blames agent. + +How to prevent: + +- Use conservative max edge and JPEG quality floor. +- Show optimized size and warning if aggressive compression was required. +- Never upscale. +- Preserve PNG for small UI screenshots or transparency where practical. + +Review question: + +```text +Can the user understand that the image was compressed and why? +``` + +If no, improve UX before release. + +## File Ownership Map by Phase + +This is not a hard rule, but it helps keep PR blast radius contained. + +| Phase | Primary ownership | Must avoid | +|---|---|---| +| Phase 1 | `src/features/agent-attachments`, composer UI, shared validation | Provider runtime code, launch code | +| Phase 2 | Claude delivery adapter/call site | Codex/OpenCode runtime changes | +| Phase 3 | Codex adapter + orchestrator command builder | Codex auth/session logic | +| Phase 4 | OpenCode adapter + capability matrix | OpenCode launch/bootstrap logic | +| Phase 5 | tests, docs, diagnostics | broad runtime refactors | + +## Suggested Review Order + +For each implementation PR, review in this order: + +1. Public contract and type shape. +2. Path/security validation. +3. Text-only no-regression path. +4. Provider-specific serialization. +5. Error mapping and diagnostics. +6. Retry/lifecycle interaction. +7. Tests. +8. UI copy. + +Do not start by reviewing UI. Most severe bugs will be in path validation, lifecycle coupling, or provider transport. + + +## Implementation Playbook Addendum + +### Schema versioning + +Attachment metadata should have an explicit schema version from day one. This prevents future migrations from relying on brittle field existence checks. + +```ts +export const AGENT_ATTACHMENT_SCHEMA_VERSION = 1 as const; + +export interface VersionedAgentAttachmentPayload extends AgentAttachmentPayload { + schemaVersion: typeof AGENT_ATTACHMENT_SCHEMA_VERSION; +} +``` + +Versioning rules: + +- New messages write `schemaVersion: 1`. +- Old messages without `schemaVersion` are hydrated as legacy attachments in memory. +- Hydration must be pure and non-mutating. +- A future migration can upgrade persisted records, but Phase 1 should not rewrite historical inboxes. + +### Managed artifact storage layout + +Use deterministic app-managed storage. Do not store optimized files in random temp locations that disappear before retry. + +Recommended layout: + +```text +~/.claude/teams//attachments///original. +~/.claude/teams//attachments///optimized. +~/.claude/teams//attachments///thumb. +~/.claude/teams//attachments///meta.json +``` + +Storage rules: + +- `teamName`, `messageId`, and `attachmentId` must be validated identifiers, not raw paths. +- Original artifact is immutable after initial write. +- Optimized artifacts may be regenerated if missing and original exists. +- Thumbnail is UI-only and must not be used for provider delivery. +- Artifact cleanup must be reachability-based or age-based, never immediate after send. + +### Artifact metadata example + +```json +{ + "schemaVersion": 1, + "attachmentId": "att_abc123", + "messageId": "msg_abc123", + "originalName": "screenshot.png", + "mimeType": "image/png", + "originalBytes": 1452000, + "optimizedBytes": 422000, + "width": 2560, + "height": 1440, + "optimizedWidth": 1600, + "optimizedHeight": 900, + "createdAt": "2026-05-09T00:00:00.000Z" +} +``` + +Do not store provider credentials, prompt text, or base64 content in this metadata. + +### Concurrency model + +Attachment processing has three independent concurrency domains: + +1. Renderer optimization jobs. +2. Main-process artifact validation and persistence. +3. Provider delivery attempts. + +They should not share mutable state directly. + +```ts +export interface AttachmentProcessingToken { + draftId: string; + generation: number; + attachmentIds: string[]; +} +``` + +Concurrency rules: + +- Renderer jobs are invalidated by draft generation. +- Main-process validation is tied to message id. +- Provider delivery is tied to ledger/delivery attempt id. +- Retry must use message id and attachment id, not current composer draft state. + +### Garbage collection policy + +Attachment artifacts can grow large. Add a conservative GC policy but do not make it part of the critical send path. + +Safe GC inputs: + +- Message records that still reference attachments. +- Delivery ledger records that are still retryable. +- Artifact creation time. +- Team deletion/removal. + +Unsafe GC behavior: + +- Deleting artifacts immediately after first successful delivery. +- Deleting artifacts while a retry is pending. +- Deleting original while keeping optimized only. +- Running expensive GC synchronously during send. + +Recommended first version: + +```ts +export interface AttachmentGarbageCollectionPolicy { + minAgeBeforeUnreferencedDeleteMs: number; + maxTotalBytesPerTeam?: number; + dryRun?: boolean; +} +``` + +Phase 1 can define the policy and tests. Actual scheduled GC can be a later small PR if release risk is high. + +### Telemetry and diagnostics boundaries + +Useful diagnostic summary: + +```ts +export interface AgentAttachmentDeliveryDiagnostic { + messageId: string; + targetProvider: string; + targetModel: string; + attachmentCount: number; + totalOriginalBytes: number; + totalOptimizedBytes: number; + capabilityDecision: string; + failureCode?: AttachmentDeliveryFailureCode; +} +``` + +Never include: + +- Raw image bytes. +- Base64 strings. +- OCR output. +- Secret-bearing provider request payloads. +- Full local absolute artifact path in normal UI diagnostics. + +### Explicit accepted risks + +These are acceptable for the first release: + +- Some image formats are unsupported even if the browser can preview them. +- Unknown OpenCode models are blocked for image delivery until tested. +- Optimized screenshots may lose minor visual fidelity. +- Artifact GC may initially be conservative and leave extra files. +- Full image search/annotation/editing is out of scope. + +These are not acceptable: + +- Images pasted as base64 text to all providers. +- Unsupported model silently receiving image-less prompt. +- Attachment failure marking teammate spawn failed. +- Arbitrary renderer path read by backend. +- Text-only messaging regression. + + +## Traceability Matrix: Real Problems to Design Decisions + +This section maps the original user pain points to concrete technical controls. It should be used during implementation review to verify that the plan solves the actual problem, not only the happy path. + +| Problem observed | Root risk | Design decision | Phase | +|---|---|---|---| +| Large screenshots may crash or destabilize runtime delivery | Huge base64/stdin payloads and unbounded image bytes | Optimize in renderer, enforce backend budgets, never paste base64 as plain text | 1 | +| Different providers support images differently | False universal abstraction | Shared normalization plus provider-native adapters | 2, 3, 4 | +| Codex and OpenCode may not support every model equally | Model capability ambiguity | Capability matrix and backend final support gate | 3, 4 | +| User needs clear UI when model cannot see images | Silent hallucination or confusing failure | Pre-send warnings/blocking with exact provider/model reason | 1, 4, 5 | +| Attachment failure could be mistaken for team crash | Lifecycle coupling | Attachment errors remain message-delivery diagnostics only | all | +| Existing launch stability is fragile | Accidental runtime/provisioning changes | No launch/readiness imports in attachment feature | all | +| Retry could resend without files or as text fallback | Silent data loss | Retry preserves attachment identity, missing artifact is explicit failure | 3, 4, 5 | +| User privacy across teammates | Accidental fanout | Attachments follow explicit message route only | 4 | + +## Dependency Policy + +Phase 1 may add or use `pica` in the renderer for high-quality resizing only if dependency review passes. + +Before adding or updating dependency: + +```bash +pnpm view pica version +pnpm view pica license +pnpm view pica time --json +``` + +Dependency rules: + +- Prefer an already-installed dependency if it satisfies quality and maintenance needs. +- Do not add native dependencies to Electron main for image processing in this phase. +- Do not add a full log/image viewer package for basic image resizing. +- Record the checked version and date in the PR description. +- If the installed version is older than latest, decide explicitly whether to upgrade or use current lockfile version. + +Recommended dependency risk rating for `pica` approach: + +- 🎯 8.8 🛡️ 8.7 🧠 3.5, roughly `120-220` LOC renderer integration. + +Why not sharp/jimp in main for v1: + +- `sharp` adds native packaging risk in Electron. +- `jimp` is slower and lower quality for large screenshots. +- Renderer has browser decode/canvas APIs and keeps image work close to the user action. + +## Phase Slicing for Small PRs + +Even inside each phase, split implementation into small reviewable PRs where possible. + +### Phase 1 PR slices + +1. Shared types and pure budget/capability functions. +2. Artifact storage/path validation in main. +3. Renderer optimization hook and UI warnings. +4. Composer send blocking and backend validation integration. + +### Phase 2 PR slices + +1. Claude adapter pure serializer and tests. +2. Claude delivery call-site integration behind attachment-present branch. +3. Claude live smoke and diagnostics polish. + +### Phase 3 PR slices + +1. Desktop Codex adapter request shape. +2. Orchestrator Codex image arg builder. +3. End-to-end Codex smoke and failure mapping. + +### Phase 4 PR slices + +1. OpenCode vision capability matrix. +2. OpenCode file-part adapter. +3. Ledger/proof interaction tests. +4. OpenRouter model smoke and unsupported-model UX. + +### Phase 5 PR slices + +1. Automated test matrix. +2. Manual smoke scripts/docs. +3. Diagnostics and release notes. + +Do not ship a phase as one huge PR unless the repo owner explicitly accepts the higher review risk. + +## Contract Boundaries That Must Not Be Crossed + +```text +Renderer composer + -> normalized draft attachment metadata +Main attachment feature + -> validated managed artifact bundle +Provider delivery adapter + -> provider-native delivery parts +Runtime/orchestrator + -> process/CLI execution only +Delivery ledger + -> proof/retry state only +Launch/provisioning + -> no dependency on attachments +``` + +Forbidden imports: + +- Launch/provisioning code importing `agent-attachments`. +- Attachment adapter importing launch-state helpers. +- Renderer optimizer importing main-process path/storage helpers. +- OpenCode repair policy importing model vision capability to decide retry loops. + +## Design Comments to Add in Code + +Add short comments only where the decision is non-obvious. + +Example near provider-native base64 block: + +```ts +// This base64 is part of Claude's structured image block, not plain prompt text. +// Do not replace this with data:image text, because that breaks provider-native image handling. +``` + +Example near launch/liveness separation: + +```ts +// Attachment delivery errors are message-level diagnostics. +// They must not update teammate launchState/spawnStatus unless runtime liveness independently reports an exit. +``` + +Example near OpenCode capability block: + +```ts +// OpenCode image support is model-specific. Unknown models are blocked for reliability instead of silently receiving image-less prompts. +``` + + +## Implementation Contract Addendum + +### No feature flags, but explicit capability gates + +This project should not use hidden feature flags or environment flags for the final behavior. The safe gating mechanism is provider/model capability, not runtime flags. + +Rules: + +- If a provider/model supports images, the UI may allow image send after validation. +- If a provider/model does not support images, the UI blocks image send with explanation. +- If support is unknown, default to safe block for release. +- Do not add `ENABLE_AGENT_ATTACHMENTS`, `USE_NEW_ATTACHMENTS`, or similar flags. +- Temporary local dev toggles are allowed only in uncommitted experiments, not in final implementation. + +Reason: + +Feature flags would create two production paths and double the regression surface. Capability gates are part of the domain and must remain after release. + +### Public API and IPC principles + +Attachments should not create broad filesystem IPC. If new IPC is required, it should be attachment-specific and validated. + +Allowed IPC style: + +```ts +attachments:prepareImageArtifact(draftAttachmentId, metadata) +attachments:getAttachmentPreview(attachmentId) +attachments:removeDraftAttachment(attachmentId) +``` + +Avoid IPC style: + +```ts +fs:readFile(path) +attachments:readArbitraryPath(path) +provider:sendRawPayload(payload) +``` + +IPC validation requirements: + +- Validate all ids. +- Validate MIME type against allowlist. +- Validate declared byte size against file stat when file exists. +- Reject unknown provider/model values unless they pass existing provider metadata validation. +- Return typed errors, not raw thrown stack traces. + +### Error class pattern + +Use typed domain errors so UI and provider adapters can map failures consistently. + +```ts +export class AgentAttachmentError extends Error { + constructor( + readonly code: AttachmentDeliveryFailureCode, + message: string, + readonly details: Record = {} + ) { + super(message); + this.name = 'AgentAttachmentError'; + } +} +``` + +Mapping rule: + +- Domain validation throws/returns `AgentAttachmentError`. +- Provider adapter wraps provider rejection into `AgentAttachmentError` after redaction. +- UI consumes `code` for behavior and `message` for display. +- Logs may include safe `details`, never raw bytes or secrets. + +### DTO stability + +Keep DTOs discriminated and serializable. + +```ts +export type AttachmentValidationResult = + | { ok: true; warnings: AttachmentWarning[] } + | { ok: false; code: AttachmentDeliveryFailureCode; message: string; warnings: AttachmentWarning[] }; +``` + +Avoid returning `Error` objects over IPC. Convert to plain JSON. + +### Attachment lifecycle diagram + +```mermaid +flowchart TD + A["User selects image"] --> B["Renderer decodes and optimizes"] + B --> C["Renderer shows preview and warnings"] + C --> D["User sends message"] + D --> E["Main validates and persists artifacts"] + E --> F["Message record references attachment ids"] + F --> G["Provider adapter builds native payload"] + G --> H["Existing delivery proof gates decide success"] + H --> I["UI shows reply or delivery diagnostic"] +``` + +Critical separation: + +- Steps A-C are draft UX. +- Steps D-F are persistence and validation. +- Step G is provider transport. +- Step H is existing message proof lifecycle. +- None of these steps should mutate launch/bootstrap state. + +### Multi-repo sequencing + +When a phase touches both `claude_team` and `agent_teams_orchestrator`, use this sequence: + +1. Add passive types/tests in `claude_team`. +2. Add orchestrator command builder support with tests. +3. Wire desktop to send new request shape. +4. Run focused tests in both repos. +5. Run one manual smoke. +6. Commit both repos only after compatible state is reached. + +Do not merge desktop code that emits image request fields before orchestrator understands them. + + +## Implementation Readiness Addendum + +### Definition of Ready before coding + +Do not start production implementation until these decisions are accepted: + +- Image delivery uses provider-native channels, not base64 plain text. +- OpenCode unknown vision models are blocked for release reliability. +- Attachments are message delivery artifacts, not launch/runtime truth. +- Renderer optimization is allowed, but backend remains final authority. +- No feature flags are used for final behavior; capability gates are the product behavior. +- Non-image file semantics are not expanded until image flow is stable. + +### Image vs file scope boundary + +This plan is primarily for image attachments. Existing file attachments should not be redesigned in this phase. + +| Attachment kind | Phase behavior | +|---|---| +| PNG/JPEG screenshot | Supported through image pipeline. | +| WebP | Optional only if all provider paths are tested. Otherwise block or convert explicitly. | +| GIF/animated image | Unsupported in v1 unless first-frame conversion is intentionally implemented. | +| PDF | Keep current file behavior. Do not pretend all agents can visually inspect PDF pages. | +| TXT/MD/code file | Keep current file/path/text behavior. Not part of image pipeline. | +| Arbitrary binary | Unsupported for agent delivery unless existing file flow already supports it. | + +If users attach non-image files while targeting a provider with image support, do not route those files through image adapters. + +### Persisted message attachment example + +New messages should reference attachments, not embed bytes. + +```json +{ + "id": "msg_123", + "from": "user", + "to": "lead", + "text": "What is wrong with this screenshot?", + "attachments": [ + { + "schemaVersion": 1, + "id": "att_1", + "kind": "image", + "originalName": "screenshot.png", + "mimeType": "image/png", + "sizeBytes": 1452000, + "storage": { + "originalArtifactId": "art_original_1", + "optimizedArtifactId": "art_optimized_1", + "thumbnailArtifactId": "art_thumb_1" + }, + "image": { + "width": 2560, + "height": 1440, + "optimizedWidth": 1600, + "optimizedHeight": 900, + "optimization": "resized" + }, + "warnings": ["image_was_resized"] + } + ] +} +``` + +Forbidden persisted shape: + +```json +{ + "text": "What is wrong with this screenshot? data:image/png;base64,..." +} +``` + +### Provider payload examples + +Claude structured image block: + +```ts +[ + { type: 'text', text: 'What color is this square?' }, + { + type: 'image', + source: { + type: 'base64', + media_type: 'image/png', + data: '', + }, + }, +] +``` + +Codex native runtime request: + +```ts +{ + text: 'What color is this square?', + images: [ + { path: '/managed/team/msg/att/optimized.png', mimeType: 'image/png' }, + ], +} +``` + +OpenCode native runtime request: + +```ts +{ + text: 'What color is this square?', + files: [ + { path: '/managed/team/msg/att/optimized.png', mimeType: 'image/png' }, + ], +} +``` + +### Rollback by phase + +| Phase | Safe rollback | +|---|---| +| Phase 1 | Disable composer attachment preparation call. Leave pure modules unused. | +| Phase 2 | Disable Claude image adapter branch. Text-only Claude path remains. | +| Phase 3 | Disable Codex image capability gate. Orchestrator image arg builder can remain unused. | +| Phase 4 | Mark OpenCode image capability unsupported for all models. Text-only OpenCode remains. | +| Phase 5 | Remove only smoke scripts/docs if unstable. Runtime behavior unaffected. | + +Rollback must not delete persisted messages or artifacts. If needed, UI can show attachments as unavailable until support is restored. + + +## Final Implementation Guardrails Addendum + +### Bug ledger to keep during implementation + +During implementation, keep a short bug ledger in the PR description. This prevents hidden tradeoffs from being forgotten. + +```md +## Attachment implementation bug ledger + +| Risk | Status | Mitigation | +|---|---|---| +| Text-only path changed | checked | attachments.length === 0 uses legacy path | +| Renderer path could escape storage | checked | backend id-based resolver only | +| Unsupported OpenCode model silently accepts image | checked | capability block before delivery | +| Retry loses attachment artifact | checked | retry resolves same artifact id | +| Attachment error mutates launch state | checked | no launch-state imports | +``` + +### Acceptance-test specification format + +Every phase should define tests in this style before code is written: + +```md +Given: a user attaches a 5 MB PNG screenshot +When: the selected model supports images +Then: the image is optimized below the provider budget +And: the persisted message references artifact ids, not base64 +And: provider delivery uses the native image channel +``` + +This keeps implementation aligned with product behavior instead of internal mechanics only. + +### Provider capability maintenance policy + +Capabilities are product behavior, so changes must be reviewed like code, not treated as config trivia. + +Rules: + +- Add a test for every new model capability entry. +- Include evidence for `vision: yes` and `vision: no`. +- Prefer `unknown` or blocked over optimistic support. +- Remove or downgrade support if live smoke becomes unstable. +- Do not infer that all OpenRouter models support vision because one OpenRouter model worked. + +### Provider capability review table + +| Decision | Required evidence | Risk if wrong | +|---|---|---| +| Mark model vision-capable | live image smoke or official docs | user sends image, model cannot see it | +| Mark model unsupported | live refusal or official docs | false negative, user cannot use capable model | +| Keep model unknown | no stable evidence | safest release behavior | + +### Architectural quality score target + +Implementation should target: + +- SOLID: `8.5/10` or higher. +- DRY: `8/10` or higher. +- Provider isolation: `9/10` or higher. +- Launch/runtime isolation: `9.5/10` or higher. +- Testability: `8.5/10` or higher. + +If a phase drops below these, do not continue to the next provider. Refactor the boundary first. + +### What not to optimize prematurely + +Do not add these before the basic image path is stable: + +- Full image gallery management. +- Image annotation/editing. +- OCR extraction. +- Remote artifact upload service. +- Streaming upload progress per provider. +- Cross-teammate automatic attachment forwarding. +- Automatic model selection based on attachment. + +Each adds product and reliability complexity that is not needed for the first stable release. + + +## Pre-Mortem: How This Project Could Fail + +Use this section before implementation and before merge. If any item becomes true, stop and fix the design before continuing. + +| Failure mode | Early warning sign | Prevention | +|---|---|---| +| Attachment logic leaks into launch | `agent-attachments` imported from provisioning/runtime bootstrap files | Keep attachment layer under message delivery only. | +| Provider adapters duplicate validation | Same budget/type checks copy-pasted in Claude/Codex/OpenCode adapters | Shared validator produces validated bundle before adapter. | +| UI says image sent but provider got text only | Provider payload tests do not assert native image channel | Add serialization tests for each provider. | +| Retry loses image | Retry code reads current composer draft instead of persisted message attachments | Retry resolves attachments by message id and attachment id. | +| OpenCode weak model loops forever | Capability unsupported case creates ledger attempt | Capability block happens before delivery attempt. | +| Codex auth regresses | Image support touches auth/env construction | Keep Codex auth path untouched and test diagnostics. | +| Attachments bloat inbox files | Message JSON contains base64 | Persist artifact ids only. | +| Screenshots become unreadable | Aggressive compression without warning | Conservative quality floor and explicit warning. | +| User image leaks to all teammates | Team send fanouts attachment automatically | Attachments follow explicit route only. | + +## Error Propagation Contract + +Errors should move through the system with enough structure to drive UI and diagnostics. + +```ts +export interface AgentAttachmentErrorJson { + code: AttachmentDeliveryFailureCode; + message: string; + providerId?: string; + model?: string; + attachmentId?: string; + retryable: boolean; + safeDetails?: Record; +} +``` + +Rules: + +- Renderer displays `message` and may branch on `code`. +- Main logs `safeDetails` only after redaction. +- Provider adapter must preserve provider/model in error context. +- `retryable` means the same message can be attempted again without user edits. +- Capability errors are not retryable until user changes model or removes attachment. +- Artifact missing is not retryable unless original can be regenerated. + +## Test Builder Conventions + +Use builders to keep tests readable and avoid fragile object literals. + +```ts +export function fakeImageAttachment(overrides: Partial = {}): AgentAttachmentPayload { + return { + schemaVersion: 1, + id: 'att_test_1', + kind: 'image', + originalName: 'red-square.png', + mimeType: 'image/png', + sizeBytes: 1024, + storage: { + originalArtifactId: 'art_original_1', + optimizedArtifactId: 'art_optimized_1', + }, + image: { + width: 64, + height: 64, + optimizedWidth: 64, + optimizedHeight: 64, + optimization: 'none', + }, + warnings: [], + ...overrides, + }; +} +``` + +Builder rules: + +- Defaults should represent a valid supported image. +- Invalid cases should override one field at a time. +- Builders must not hide required fields with `as any`. +- Provider adapter tests should use fake artifact readers, not real files unless testing storage. + +## Review Grep Checklist + +Run these searches during implementation review. + +```bash +rg "data:image|base64" src test +rg "launchState|spawnStatus|bootstrapConfirmed" src/features/agent-attachments src/main/services/team +rg "--image|-f" src /Users/belief/dev/projects/claude/agent_teams_orchestrator/src +rg "readFile\(|writeFile\(" src/features/agent-attachments src/main/services +rg "OPENAI_API_KEY|ANTHROPIC_API_KEY|CODEX_API_KEY|Authorization" src/features/agent-attachments src/main/services +``` + +Review meaning: + +- `base64` should appear only in Claude structured image block code/tests. +- `launchState`/`spawnStatus` should not appear inside attachment feature. +- `--image` should appear only in Codex runtime command builder/tests. +- `-f` should appear only in OpenCode runtime file-part builder/tests. +- Secret env names should appear only in redaction tests or existing auth code, not attachment diagnostics. + +## Minimum Viable Shipping Scope + +If release pressure is high, ship only this subset: + +1. Phase 1 normalization, optimization, validation, and warnings. +2. Claude image delivery if real app-path smoke is green. +3. Codex image delivery if `--image` support and subscription auth are green. +4. OpenCode support only for models with live smoke evidence. + +Do not ship broad “all OpenCode models support images” behavior. + diff --git a/docs/team-management/agent-attachments-phase-1-normalization-and-budgets-plan.md b/docs/team-management/agent-attachments-phase-1-normalization-and-budgets-plan.md index 59d9f7af..9614dd53 100644 --- a/docs/team-management/agent-attachments-phase-1-normalization-and-budgets-plan.md +++ b/docs/team-management/agent-attachments-phase-1-normalization-and-budgets-plan.md @@ -909,3 +909,811 @@ describe('picaImageOptimizer', () => { }); }); ``` + +## Detailed Implementation Checklist + +### Step 1 - Add pure domain module + +Create a feature-local domain module with no Electron, DOM, or filesystem dependency. + +Suggested files: + +```text +src/features/agent-attachments/shared/types.ts +src/features/agent-attachments/shared/budgets.ts +src/features/agent-attachments/shared/capabilities.ts +src/features/agent-attachments/shared/validation.ts +src/features/agent-attachments/shared/index.ts +``` + +The first implementation should be intentionally boring: + +```ts +export function classifyAttachmentMime(mimeType: string): AttachmentKind { + if (mimeType === 'image/png') return 'image'; + if (mimeType === 'image/jpeg') return 'image'; + if (mimeType === 'image/webp') return 'image'; + if (mimeType === 'application/pdf') return 'file'; + if (mimeType.startsWith('text/')) return 'file'; + return 'unsupported'; +} +``` + +Avoid clever extension guessing in v1. Extension fallback can be a later hardening step if real users need it. + +### Step 2 - Add renderer optimizer behind composer-only call site + +Keep optimization in renderer because browser APIs are good at image decode and `pica` is browser-oriented. + +```ts +export interface OptimizeImageForAgentInput { + file: File; + budget: ImageOptimizationBudget; +} + +export interface OptimizeImageForAgentResult { + original: BrowserAttachmentArtifact; + optimized: BrowserAttachmentArtifact; + warnings: string[]; +} +``` + +Implementation order: + +1. Decode image dimensions with `createImageBitmap` where available. +2. Compute target dimensions from total batch budget and per-image max dimension. +3. Use `pica.resize` into canvas. +4. Encode JPEG/PNG based on input and transparency. +5. Return warnings, not thrown errors, for non-fatal resize degradation. + +### Step 3 - Add backend validation + +Main process must re-check everything received from renderer. + +```ts +export function validateNormalizedAttachmentForSend(input: { + attachment: AgentAttachmentPayload; + target: ProviderTarget; +}): AttachmentValidationResult { + const capability = resolveAgentAttachmentCapability(input.target); + const sizeResult = validateAttachmentBudget(input.attachment, capability); + if (!sizeResult.ok) return sizeResult; + return { ok: true, warnings: [] }; +} +``` + +Backend validation should never trust: + +- MIME type from browser alone. +- File name extension. +- Renderer-provided optimized dimensions. +- Renderer-provided `supported: true` capability result. + +### Step 4 - Wire UI warnings without changing delivery + +Before provider adapters are implemented, UI can show warnings but must not pretend unsupported providers work. + +Safe UX: + +- Allow attach, show preview. +- On send, block if selected target cannot receive the attachment yet. +- Explain which phase/provider support is missing. +- Keep text-only send untouched. + +## Phase 1 Exit Criteria + +Phase 1 is complete when: + +- Text-only composer behavior is unchanged. +- Image preview still works for small images. +- Large image preview shows optimized size and warnings. +- Unsupported file type produces a clear local validation error. +- Backend rejects forged oversized attachment metadata. +- No provider delivery path receives new attachment data yet unless explicitly wired in later phases. + +## Edge Case Matrix + +| Case | Expected behavior | +|---|---| +| Animated GIF | Treat as unsupported for image delivery in v1, or convert first frame only with explicit warning if implemented. | +| Transparent PNG | Prefer PNG if small, JPEG only if transparency is absent or user accepts flattened background. | +| Huge panorama | Downscale by max edge and total pixel budget. | +| Tiny image | Do not upscale. | +| Corrupt image | Show decode failed, do not send attachment. | +| HEIC on macOS | Do not promise support unless decode pipeline is explicitly tested. | +| Clipboard image with no filename | Generate stable display name like `clipboard-image.png`. | +| Same image attached twice | Keep two attachment ids, do not dedupe content silently. | +| Multiple images exceed total budget | Optimize all proportionally, then block if still over cap. | +| User switches target after attaching | Recompute warnings for new provider/model before send. | + +## Common Bug Patterns to Avoid + +- Storing base64 in message JSON. This can bloat inboxes and break process stdin. +- Mutating the original attachment when optimization runs. +- Letting renderer decide final support without backend validation. +- Showing “sent” when attachment was dropped from provider payload. +- Collapsing all failures into generic “delivery failed”. +- Running optimization in Electron main with a new native dependency right before release. +- Changing current file attachment behavior for text/PDF before image flow is stable. + +## Focused Test Examples + +```ts +describe('attachment budgets', () => { + it('blocks a single optimized image over hard cap', () => { + const result = validateAttachmentBudget( + fakeImage({ sizeBytes: 12 * 1024 * 1024 }), + codexVisionCapability() + ); + + expect(result.ok).toBe(false); + expect(result.code).toBe('attachment_too_large'); + }); + + it('does not upscale small images', () => { + const plan = planImageResize({ width: 320, height: 200 }, { maxEdge: 1600 }); + expect(plan.width).toBe(320); + expect(plan.height).toBe(200); + }); +}); +``` + +## Manual QA Script + +Use a clean dev profile and verify: + +1. Attach a 200 KB PNG screenshot - preview appears, no warning. +2. Attach a 12 MB PNG screenshot - preview appears, optimized size shown. +3. Attach three screenshots - total budget warning is deterministic. +4. Attach a `.txt` file - current file behavior remains unchanged. +5. Attach a corrupt image renamed `.png` - decode error appears. +6. Switch target from Claude to non-vision OpenCode model - unsupported warning appears. +7. Remove attachment - warnings clear. + +## Rollback Plan + +If Phase 1 causes UI instability: + +- Keep domain files, but remove composer call to optimizer. +- Leave backend validation unused. +- No persisted migration is needed if message schema was not changed. + + +## Implementation Safeguards + +### Keep Phase 1 read-only for provider delivery + +Phase 1 should not change how messages are sent to agents. It should only add normalization, previews, warnings, and backend validation primitives. + +Safe call sites: + +- Composer attachment selection. +- Composer warning rendering. +- Draft serialization of normalized metadata. +- Backend validation helper tests. + +Unsafe call sites for Phase 1: + +- Claude delivery transport. +- Codex runtime invocation. +- OpenCode prompt ledger. +- Team launch/provisioning. +- Member runtime liveness. + +### Suggested internal state machine + +```ts +type AttachmentPrepareState = + | { status: 'idle' } + | { status: 'decoding'; attachmentId: string } + | { status: 'optimizing'; attachmentId: string; progress?: number } + | { status: 'ready'; attachment: AgentAttachmentPayload } + | { status: 'blocked'; attachmentId: string; reason: AttachmentDeliveryFailureCode } + | { status: 'failed'; attachmentId: string; error: string }; +``` + +UI rule: + +- `blocked` means user can fix/remove/change target. +- `failed` means local processing failed. +- Neither state should enqueue runtime delivery. + +### Budget planning algorithm + +Use deterministic, simple budget allocation. Do not optimize each image independently to the max, because a batch can still exceed total budget. + +```ts +export function allocateImageBudgets(input: { + images: ImageCandidate[]; + totalMaxBytes: number; + perImageMaxBytes: number; +}): ImageBudgetAllocation[] { + const perImageFairShare = Math.floor(input.totalMaxBytes / Math.max(1, input.images.length)); + const targetBytes = Math.min(input.perImageMaxBytes, perImageFairShare); + return input.images.map((image) => ({ imageId: image.id, targetBytes })); +} +``` + +If the optimized output is still above target: + +1. Reduce dimensions down to min edge threshold. +2. Reduce JPEG quality down to minimum acceptable quality. +3. If still too large, block and explain. + +Do not loop indefinitely. + +### Cancellation edge cases + +| User action | Safe behavior | +|---|---| +| Removes image during optimization | Cancel or ignore result by generation id. | +| Adds image then switches team | Cancel or detach optimization result from old draft. | +| Switches provider/model | Recompute capability warnings without re-encoding if artifact is reusable. | +| Sends while optimization pending | Disable send or show “attachment still processing”. | +| Closes app mid-optimization | No partially written artifact should be treated as ready. | + +### Generation id pattern + +```ts +let prepareGeneration = 0; + +async function prepareAttachments(files: File[]) { + const generation = ++prepareGeneration; + const result = await optimize(files); + if (generation !== prepareGeneration) return; + setPreparedAttachments(result); +} +``` + +This avoids stale async results restoring removed attachments. + +### Phase 1 PR checklist + +- No provider delivery path changed. +- No launch/runtime tests need snapshot updates. +- Text-only message send still works manually. +- Image preview removal cannot leave hidden payload in draft. +- Backend validation is stricter than renderer validation. +- All user-visible errors are actionable. + + +## Failure Injection Tests for Phase 1 + +Add tests that intentionally simulate bad renderer or corrupted local state. + +```ts +describe('attachment backend validation hardening', () => { + it('rejects renderer supplied absolute paths outside managed storage', () => { + const result = validateAttachmentStorageReference({ + artifactId: 'att_1', + path: '/Users/belief/.ssh/id_rsa', + }); + + expect(result.ok).toBe(false); + expect(result.code).toBe('attachment_artifact_path_unsafe'); + }); + + it('rejects metadata that claims small size but file is large', async () => { + const result = await validateAttachmentArtifactOnDisk({ + expectedSizeBytes: 100, + actualPath: fixturePath('large-image.png'), + maxBytes: 1024, + }); + + expect(result.ok).toBe(false); + expect(result.code).toBe('attachment_too_large'); + }); +}); +``` + +## Browser and Platform Edge Cases + +| Edge case | Safe implementation note | +|---|---| +| Safari/WebKit image decode differences | Use feature detection, not browser assumptions. | +| macOS clipboard TIFF/HEIC | Treat unsupported until explicitly converted/tested. | +| EXIF orientation | Prefer decode path that respects orientation or normalize orientation during canvas draw. | +| Color profiles | Accept minor color shift for screenshots, do not promise color-managed output. | +| Canvas memory pressure | Bound megapixels before drawing. | +| Pica worker failure | Fall back to canvas resize with warning, or block with clear error. | +| Transparent UI screenshot | Avoid JPEG flattening unless transparency absent or user warning is shown. | +| Very long screenshot | Limit max edge and total pixels to prevent huge canvas allocation. | + +## Memory Safety Budget + +Renderer memory is the main Phase 1 risk. + +Recommended starting limits: + +```ts +export const ATTACHMENT_IMAGE_LIMITS = { + maxInputBytes: 20 * 1024 * 1024, + maxInputPixels: 32_000_000, + maxOutputBytesPerImage: 4 * 1024 * 1024, + maxOutputBytesTotal: 8 * 1024 * 1024, + maxOutputEdge: 2400, + minJpegQuality: 0.72, + defaultJpegQuality: 0.86, +}; +``` + +These are release-safe starting values, not final product limits. They should be tuned after real usage. + +## Phase 1 Stop Conditions + +Stop implementation and reassess if any of these happen: + +- Optimized images are stored as base64 in persisted message JSON. +- Main process needs a new native image dependency. +- Existing file attachments stop sending. +- Composer draft state becomes provider-specific. +- Text-only messages require schema migration. + + +## File-Level Implementation Plan + +Suggested new files: + +```text +src/features/agent-attachments/shared/types.ts +src/features/agent-attachments/shared/budgets.ts +src/features/agent-attachments/shared/capabilities.ts +src/features/agent-attachments/shared/validation.ts +src/features/agent-attachments/shared/storageIds.ts +src/features/agent-attachments/renderer/optimizeImageForAgent.ts +src/features/agent-attachments/renderer/usePreparedAttachments.ts +src/features/agent-attachments/main/validateAttachmentArtifact.ts +src/features/agent-attachments/main/attachmentArtifactStore.ts +``` + +Suggested tests: + +```text +test/features/agent-attachments/budgets.test.ts +test/features/agent-attachments/capabilities.test.ts +test/features/agent-attachments/validation.test.ts +test/features/agent-attachments/attachmentArtifactStore.test.ts +``` + +### Storage id validation + +```ts +const SAFE_ID_RE = /^[a-zA-Z0-9][a-zA-Z0-9_-]{0,120}$/; + +export function assertSafeAttachmentStorageId(name: string, value: string): void { + if (!SAFE_ID_RE.test(value)) { + throw new Error(`Invalid ${name}`); + } +} +``` + +Use this for `teamName`, `messageId`, and `attachmentId` before building artifact paths. + +### Managed path resolver + +```ts +export function resolveAttachmentArtifactPath(input: { + teamRoot: string; + teamName: string; + messageId: string; + attachmentId: string; + fileName: 'original.png' | 'original.jpg' | 'optimized.png' | 'optimized.jpg' | 'thumb.jpg' | 'meta.json'; +}): string { + assertSafeAttachmentStorageId('teamName', input.teamName); + assertSafeAttachmentStorageId('messageId', input.messageId); + assertSafeAttachmentStorageId('attachmentId', input.attachmentId); + + const base = path.resolve(input.teamRoot, input.teamName, 'attachments', input.messageId, input.attachmentId); + const resolved = path.resolve(base, input.fileName); + if (!resolved.startsWith(base + path.sep)) { + throw new Error('Attachment artifact path escaped managed directory'); + } + return resolved; +} +``` + +### Renderer optimizer guardrails + +```ts +export async function optimizeImageForAgent(input: OptimizeImageForAgentInput): Promise { + const bitmap = await createImageBitmap(input.file); + assertPixelBudget(bitmap.width, bitmap.height, input.budget.maxInputPixels); + + const target = planResizeDimensions({ + width: bitmap.width, + height: bitmap.height, + maxEdge: input.budget.maxOutputEdge, + }); + + const canvas = document.createElement('canvas'); + canvas.width = target.width; + canvas.height = target.height; + + await resizeWithPicaOrFallback(bitmap, canvas); + const blob = await encodeCanvasForProvider(canvas, input.budget); + return buildOptimizationResult(input.file, blob, target); +} +``` + +Do not implement infinite quality-search loops. Use a small bounded set of quality attempts like `[0.86, 0.8, 0.74, 0.72]`. + +### Phase 1 exact acceptance criteria + +- Backend path resolver rejects traversal in tests. +- Budget tests cover single, multiple, and oversized images. +- Renderer hook ignores stale optimization result after removal. +- Composer cannot send while any attachment is `optimizing`. +- Existing text-only composer tests do not need behavioral changes. +- No provider delivery module imports renderer optimizer. + + +## Phase 1 Deep Review Addendum + +### Exact UI states + +Composer should expose these states clearly: + +| State | Send allowed | Copy | +|---|---:|---| +| No attachments | yes | normal text-only behavior | +| Attachments optimizing | no | `Preparing image...` | +| Attachments ready and supported | yes | optional optimized size summary | +| Attachment too large after optimization | no | `Image is too large after optimization. Remove it or use a smaller image.` | +| Target model unsupported | no | `Selected model does not support image attachments.` | +| Decode failed | no | `Could not read this image.` | +| Artifact persistence failed | no | `Could not prepare attachment for sending.` | + +### Accessibility and UX guardrails + +- Warning text should be readable without relying on color. +- Remove button must remain available for blocked attachments. +- If multiple attachments have warnings, show per-attachment reason and aggregate reason near send. +- Do not hide text draft when attachment fails. +- If user removes the blocked attachment, send button should recover immediately. + +### Persistence safety + +When user sends a message with attachments: + +1. Generate message id first. +2. Persist original/optimized artifacts under message id. +3. Persist message record referencing attachment ids. +4. Begin provider delivery. + +Do not begin provider delivery before message record and artifacts are durably written. + +This avoids a retry state where ledger knows about a message but artifacts are missing because write happened later. + +### Phase 1 anti-regression tests + +```ts +it('does not serialize image bytes into message json', () => { + const message = buildMessageRecordWithAttachment(fakeAttachment()); + expect(JSON.stringify(message)).not.toContain('base64'); + expect(JSON.stringify(message)).not.toContain('data:image'); +}); + +it('restores send button after removing blocked attachment', () => { + const state = reducer(blockedAttachmentState(), removeAttachment('att_1')); + expect(selectCanSend(state)).toBe(true); +}); +``` + +### Phase 1 implementation order + +Implement in this order to reduce bug risk: + +1. Pure budget/capability tests. +2. Safe id/path helpers. +3. Artifact store tests. +4. Renderer optimization hook without composer integration. +5. Composer preview/warning integration. +6. Send blocking. +7. Backend validation on send. + +If a later step fails, earlier pure modules remain useful and low-risk. + + +## Phase 1 Implementation Contract Addendum + +### Exact domain types + +```ts +export type AttachmentWarningCode = + | 'image_was_resized' + | 'image_was_reencoded' + | 'image_quality_reduced' + | 'model_support_unknown' + | 'model_does_not_support_images' + | 'file_type_not_supported'; + +export interface AttachmentWarning { + code: AttachmentWarningCode; + message: string; + attachmentId?: string; +} + +export interface ImageOptimizationBudget { + maxInputBytes: number; + maxInputPixels: number; + maxOutputBytesPerImage: number; + maxOutputBytesTotal: number; + maxOutputEdge: number; + jpegQualityAttempts: readonly number[]; +} +``` + +Keep these in shared pure code. Renderer and main may import types and pure validators, but renderer-specific optimizer code must not be imported by main. + +### Quality strategy + +Use a deterministic quality strategy instead of adaptive unbounded loops. + +```ts +const JPEG_QUALITY_ATTEMPTS = [0.86, 0.82, 0.78, 0.74, 0.72] as const; + +for (const quality of JPEG_QUALITY_ATTEMPTS) { + const blob = await encodeCanvas(canvas, 'image/jpeg', quality); + if (blob.size <= targetBytes) return blob; +} + +throw new AgentAttachmentError( + 'attachment_too_large', + 'Image is too large after optimization. Remove it or use a smaller image.' +); +``` + +PNG strategy: + +- Keep PNG for small screenshots and transparency. +- Re-encode to JPEG only when transparency is absent and size requires it. +- If transparency exists and PNG remains too large, block with clear reason instead of silently flattening unless product explicitly accepts flattening. + +### UI copy table + +| Code | Copy | +|---|---| +| `attachment_too_large` | `Image is too large after optimization. Remove it or use a smaller image.` | +| `attachment_type_unsupported` | `This file type is not supported for agent image delivery.` | +| `attachment_model_unsupported` | `Selected model does not support image attachments. Switch model or remove the image.` | +| `attachment_optimization_failed` | `Could not prepare this image for sending.` | +| `attachment_artifact_missing` | `Prepared image file is missing. Remove and attach the image again.` | + +Copy should be short in UI. Detailed diagnostics can go into copy diagnostics/logs. + +### Batch behavior + +Multiple images should preserve user order. + +```ts +export function sortAttachmentsForDelivery(attachments: AgentAttachmentPayload[]): AgentAttachmentPayload[] { + return [...attachments].sort((a, b) => a.order - b.order); +} +``` + +Do not sort by size or file name because the prompt may refer to “first image” and “second image”. + +### Draft persistence edge cases + +- Draft can reference attachment ids before final message id exists. +- On send, draft attachment ids should be reparented or copied into message artifact directory. +- If reparenting fails, send should stop before provider delivery. +- Removing attachment from draft should remove draft artifact eventually, but not synchronously block UI. + + +## Implementation Readiness Addendum + +### Definition of Ready for Phase 1 + +Before coding Phase 1: + +- Confirm exact composer components that own attachment state. +- Confirm where message ids are generated for user sends. +- Confirm where existing attachments are persisted today. +- Confirm whether current image previews store bytes, paths, or blobs. +- Confirm no provider delivery changes are included in the Phase 1 PR. + +### Exact reducer-style behavior + +```ts +type ComposerAttachmentAction = + | { type: 'attachment_added'; file: File; draftAttachmentId: string } + | { type: 'attachment_prepare_started'; draftAttachmentId: string; generation: number } + | { type: 'attachment_prepare_succeeded'; draftAttachmentId: string; generation: number; payload: AgentAttachmentPayload } + | { type: 'attachment_prepare_failed'; draftAttachmentId: string; generation: number; error: AgentAttachmentErrorJson } + | { type: 'attachment_removed'; draftAttachmentId: string } + | { type: 'target_changed'; target: ProviderTarget }; +``` + +Reducer rule: + +- Ignore `attachment_prepare_succeeded` if generation is stale. +- Ignore prepare results for removed attachment ids. +- Recompute capability warnings on `target_changed` without reprocessing image bytes. +- Do not clear text draft when attachment fails. + +### Backend artifact write order + +```ts +async function persistMessageAttachments(input: PersistMessageAttachmentsInput): Promise { + const messageDir = resolveMessageAttachmentDir(input.teamName, input.messageId); + await fs.mkdir(messageDir, { recursive: true }); + + const persisted: PersistedAttachment[] = []; + for (const attachment of input.attachments) { + const paths = resolveAttachmentPaths(input.teamName, input.messageId, attachment.id); + await writeFileAtomic(paths.original, attachment.originalBytes); + if (attachment.optimizedBytes) await writeFileAtomic(paths.optimized, attachment.optimizedBytes); + await writeFileAtomic(paths.meta, JSON.stringify(buildMeta(attachment), null, 2)); + persisted.push(toPersistedAttachment(paths, attachment)); + } + return { attachments: persisted }; +} +``` + +Use atomic writes for metadata and artifacts where practical. A partially written optimized image must not be treated as ready. + +### Atomic write requirement + +```ts +async function writeFileAtomic(path: string, bytes: Buffer | string): Promise { + const tmp = `${path}.${process.pid}.${Date.now()}.tmp`; + await fs.writeFile(tmp, bytes); + await fs.rename(tmp, path); +} +``` + +If write fails, cleanup tmp best-effort and return typed error. + +### Phase 1 additional edge cases + +| Edge case | Expected behavior | +|---|---| +| Disk full during artifact write | Send blocked, text draft preserved, typed error shown. | +| App crashes after artifacts written before message record | Orphan artifacts may remain, later GC can clean. No message corruption. | +| App crashes after message record before provider delivery | Message exists with attachments and can be retried later. | +| Thumbnail write fails | Do not block delivery if original/optimized artifact is valid. Show preview fallback. | +| Original write succeeds, optimized write fails | Block image delivery unless original fits provider budget. | + + +## Final Phase 1 Acceptance Specs + +### Spec 1 - small supported image + +```gherkin +Given a user attaches a 200 KB PNG screenshot +And the selected target supports images +When the composer prepares the attachment +Then the preview is shown +And the send button remains enabled +And the persisted message references artifact ids +And the message JSON does not contain base64 +``` + +### Spec 2 - oversized image optimized successfully + +```gherkin +Given a user attaches a 12 MB PNG screenshot +When optimization completes below provider budget +Then the user sees that the image was optimized +And the send button is enabled +And the optimized artifact is used for provider delivery later +And the original artifact remains available for retry/regeneration +``` + +### Spec 3 - oversized image still too large + +```gherkin +Given a user attaches a huge image +When bounded optimization cannot bring it below budget +Then the send button is disabled +And the text draft remains intact +And the user can remove the image +And no provider delivery is attempted +``` + +### Spec 4 - stale optimization result + +```gherkin +Given a user attaches an image +And removes it while optimization is running +When the optimization promise resolves +Then the removed attachment is not restored +And the send state reflects the current draft only +``` + +### Phase 1 exact PR contract + +The Phase 1 PR is acceptable only if: + +- It adds shared attachment domain types. +- It adds budget/capability/validation tests. +- It adds safe managed artifact path/id helpers. +- It adds renderer optimization or a clearly documented placeholder if split. +- It does not change provider delivery behavior. +- It does not import provider runtime code into renderer optimizer. +- It does not import attachment feature into launch/provisioning. + +### Phase 1 likely review findings to prevent + +| Finding | Prevention | +|---|---| +| Message JSON contains base64 | persist artifact ids only | +| Send starts before artifact write | enforce write-before-delivery order | +| Draft removal race restores attachment | generation id guard | +| Backend trusts renderer size | stat artifact on disk | +| Unsupported model warning only in UI | backend validation also blocks | + + +## Phase 1 Pre-Mortem and Extra Safeguards + +### Likely Phase 1 mistakes + +| Mistake | Concrete prevention | +|---|---| +| Optimizer result races with removed attachment | Generation id guard in reducer/hook. | +| Backend trusts renderer MIME | Backend validates by allowlist and artifact metadata. | +| Draft artifacts leak forever | Mark draft artifacts and add later GC policy. | +| UI blocks text-only send after image error removed | Selector tests for `canSend`. | +| Multiple images reorder | Preserve user-provided order field. | +| Image-only send unclear | Product decision before coding: allow image-only with default prompt or require text. | + +### Image-only message decision + +Top options: + +1. Require text with image - 🎯 8.5 🛡️ 9 🧠 2, примерно `20-50` строк. + + Safest release behavior. It avoids confusing empty prompt behavior across providers. + +2. Allow image-only with generated prompt - 🎯 7 🛡️ 7.5 🧠 4, примерно `60-120` строк. + + Useful UX, but generated prompts can surprise users and differ by action mode. + +3. Allow image-only raw - 🎯 5.5 🛡️ 5 🧠 2, примерно `20-40` строк. + + Some providers may accept it, but behavior is inconsistent. + +Recommendation: start with option 1 for release. + +### Pica wrapper contract + +```ts +export interface ImageResizeEngine { + resize(input: { + source: ImageBitmap; + targetWidth: number; + targetHeight: number; + }): Promise; +} +``` + +Reason: + +- Keeps `pica` behind a tiny interface. +- Allows tests with fake resize engine. +- Allows fallback or replacement without changing composer state. + +### Quality acceptance rules + +- UI screenshots should remain legible at common zoom levels after resize. +- Do not reduce JPEG quality below configured floor. +- If text in screenshot becomes unreadable in manual QA, increase budget before release. +- If multiple images exceed total budget, prefer blocking over making all unreadable. + +### Phase 1 contract tests to add before UI wiring + +```ts +it('preserves attachment order for delivery', () => { + const sorted = sortAttachmentsForDelivery([ + fakeImageAttachment({ id: 'b', order: 2 }), + fakeImageAttachment({ id: 'a', order: 1 }), + ]); + expect(sorted.map((item) => item.id)).toEqual(['a', 'b']); +}); + +it('does not allow send while attachment is optimizing', () => { + expect(selectCanSend(fakeDraft({ attachmentState: 'optimizing' }))).toBe(false); +}); +``` + diff --git a/docs/team-management/agent-attachments-phase-2-claude-stream-json-plan.md b/docs/team-management/agent-attachments-phase-2-claude-stream-json-plan.md index 18ac1a73..8549c0ee 100644 --- a/docs/team-management/agent-attachments-phase-2-claude-stream-json-plan.md +++ b/docs/team-management/agent-attachments-phase-2-claude-stream-json-plan.md @@ -613,3 +613,567 @@ it('does not write to stdin when attachment payload exceeds Claude budget', asyn ## Code review notes If the diff shows a new `if (mimeType)` ladder inside `TeamProvisioningService`, the refactor failed. That logic belongs in adapter/helper tests. + +## Detailed Implementation Checklist + +### Step 1 - Locate and isolate current Claude message serialization + +Do not rewrite the full delivery path. Add a seam where text and attachments are converted into Claude input blocks. + +Target shape: + +```ts +export function buildClaudeInputBlocks(input: { + text: string; + attachments: AgentAttachmentPayload[]; +}): ClaudeInputBlock[] { + return [ + { type: 'text', text: input.text }, + ...input.attachments.map(toClaudeImageBlock), + ]; +} +``` + +### Step 2 - Use provider-native image blocks only + +For Claude, image data should be represented as structured image blocks for the SDK/stream-json path. Do not paste base64 into text. + +```ts +function toClaudeImageBlock(attachment: AgentAttachmentPayload): ClaudeInputBlock { + const artifact = selectBestImageArtifact(attachment, 'claude'); + return { + type: 'image', + source: { + type: 'base64', + media_type: artifact.mimeType, + data: artifact.base64, + }, + }; +} +``` + +Important: this function should live in a Claude adapter, not in the composer or generic delivery service. + +### Step 3 - Preserve text-only path + +Guard the new block builder so text-only Claude messages produce byte-for-byte equivalent semantic behavior. + +```ts +if (attachments.length === 0) { + return buildExistingClaudeTextPayload(text); +} +``` + +Only remove this branch after tests prove the generic block path is fully equivalent. + +### Step 4 - Add exact diagnostics + +Claude attachment failures should say what failed: + +- `Claude image artifact missing` +- `Claude image MIME type unsupported: image/webp` +- `Claude image payload exceeds budget after optimization` +- `Claude stream-json image delivery rejected: ` + +Avoid saying “teammate crashed” unless process liveness confirms that. + +## Claude-Specific Edge Cases + +| Case | Risk | Safe behavior | +|---|---|---| +| Claude subscription not logged in | Could be misread as attachment failure | Preserve existing auth diagnostic. | +| Multiple images | Token/latency spike | Enforce count and total byte budget before SDK call. | +| Image plus task delegation | Tool-call response still expected | Existing proof gates stay unchanged. | +| Lead prompt with images | Lead may consume image but not delegate | This is normal model behavior, not transport failure. | +| Assistant refuses visual task | Delivery succeeded, model response is semantic failure | Do not retry transport automatically. | +| Claude CLI path lacking image support | Attachment delivery blocked with provider capability error | Do not fallback to base64 text. | + +## Golden Serialization Tests + +Add tests that validate payload shape without hitting Claude. + +```ts +it('serializes a png as a Claude image block', () => { + const blocks = buildClaudeInputBlocks({ + text: 'What color?', + attachments: [fakePngAttachment({ base64: 'abc' })], + }); + + expect(blocks).toEqual([ + { type: 'text', text: 'What color?' }, + { + type: 'image', + source: { type: 'base64', media_type: 'image/png', data: 'abc' }, + }, + ]); +}); +``` + +Negative test: + +```ts +it('does not serialize image as plain base64 text', () => { + const payload = JSON.stringify(buildClaudeInputBlocks(input)); + expect(payload).not.toContain('data:image/png;base64'); +}); +``` + +## Manual Claude QA + +Use the known-good red-card PNG: + +1. Send to Claude lead: “What color is the square? Answer one word.” +2. Expected answer: `red` or equivalent. +3. Send two images if supported by budget. +4. Send oversized image and verify UI blocks before provider call. +5. Temporarily break auth and verify auth error is preserved, not attachment error. + +## Phase 2 Exit Criteria + +- Claude text-only delivery remains green. +- Claude image delivery answers visual smoke prompt. +- Claude oversized image is blocked before SDK/provider call. +- Claude provider rejection is shown as delivery failure, not launch failure. +- No Codex or OpenCode code path changes are required for Phase 2 to pass. + + +## Implementation Safeguards + +### Claude adapter should be additive + +Do not refactor all Claude runtime code just to add image support. Add a small adapter and call it from the existing send path only when attachments are present. + +```ts +if (validatedAttachments.length > 0) { + const payload = await claudeAttachmentAdapter.buildDeliveryParts({ + text, + attachments: validatedAttachments, + budget, + }); + return sendClaudeStructuredPayload(payload); +} + +return sendClaudeTextOnlyPayload(text); +``` + +This limits blast radius. + +### Do not infer readiness from Claude image response + +A Claude member answering an image prompt proves message delivery, not bootstrap readiness. Keep these concepts separate: + +- `bootstrapConfirmed`: launch/runtime proof. +- `messageDelivered`: prompt delivery proof. +- `visibleReply`: user-visible response proof. + +### Claude streaming failure mapping + +| Failure source | User-facing classification | +|---|---| +| SDK rejects image MIME | `attachment_type_unsupported` | +| SDK rejects payload size | `attachment_too_large` | +| Auth token invalidated | existing provider auth error | +| Runtime exits after request | runtime crash/exit diagnostic | +| Assistant answers “cannot view image” | semantic model response, not transport failure | +| Stream closes without response | delivery failure, eligible for existing bounded retry only if text path already retries | + +### Claude PR review checklist + +- Does the text-only path avoid new serialization code? +- Are images sent as structured image blocks, not text? +- Is artifact content loaded as late as possible? +- Are size and MIME checked before reading large file into memory? +- Is provider error redacted? +- Does delivery failure preserve the inbox message? + +### Additional Claude tests + +```ts +it('keeps text-only Claude path unchanged', async () => { + const result = await buildClaudeDeliveryRequest({ text: 'hello', attachments: [] }); + expect(result.kind).toBe('legacy_text'); +}); + +it('classifies missing optimized artifact before provider call', async () => { + await expect(buildClaudeDeliveryRequest({ + text: 'see image', + attachments: [fakeAttachmentWithMissingPath()], + })).rejects.toMatchObject({ code: 'attachment_artifact_missing' }); +}); +``` + + +## Failure Injection Tests for Phase 2 + +```ts +describe('Claude attachment delivery failures', () => { + it('does not mark teammate offline when Claude rejects image payload', async () => { + const result = await deliverClaudeMessageWithAttachment(fakeProviderRejectsImage()); + + expect(result.delivery.status).toBe('failed'); + expect(result.delivery.failureCode).toBe('attachment_provider_rejected'); + expect(result.memberPatch).toBeUndefined(); + }); + + it('preserves existing auth error when Claude token is invalid', async () => { + const result = await deliverClaudeMessageWithAttachment(fakeInvalidAuth()); + + expect(result.error.message).toMatch(/authentication|sign in|token/i); + expect(result.error.code).not.toBe('attachment_provider_rejected'); + }); +}); +``` + +## Claude Serialization Gotchas + +- Some Claude SDK/CLI surfaces accept message arrays, some accept stream-json lines, and some accept plain prompt text. Attachments must use the surface that truly supports image blocks. +- If the current runtime path cannot send image blocks safely, Phase 2 must block image send with a clear message instead of falling back to base64 text. +- Do not mix `@file` syntax with image block syntax unless that exact runtime path has been tested. +- If multiple Claude launch contexts exist, validate the one used by the app, not only a standalone prototype. + +## Claude Runtime Exit Correlation + +If Claude process exits shortly after image delivery, diagnostics should show both facts separately: + +```text +Image delivery was attempted using Claude image blocks. +The Claude runtime exited 18s later. +Last stderr: +``` + +Do not claim the image caused the exit unless provider stderr explicitly says so. + +## Phase 2 Stop Conditions + +Stop and reassess if: + +- Claude implementation requires changing team bootstrap prompts. +- Claude text-only path must be rewritten broadly. +- Auth/session diagnostics change unexpectedly. +- Provider image block support is not available in the actual app runtime path. + + +## File-Level Implementation Plan + +Suggested files: + +```text +src/features/agent-attachments/main/providers/claudeAttachmentAdapter.ts +src/features/agent-attachments/main/providers/claudeAttachmentAdapter.test.ts +``` + +Existing delivery call site should import only the adapter public function, not internal attachment storage helpers. + +### Adapter skeleton + +```ts +export async function buildClaudeAttachmentDeliveryParts(input: { + text: string; + attachments: AgentAttachmentPayload[]; + readArtifact: AttachmentArtifactReader; +}): Promise { + const blocks: ClaudeInputBlock[] = []; + if (input.text.trim().length > 0) { + blocks.push({ type: 'text', text: input.text }); + } + + for (const attachment of input.attachments) { + const artifact = selectProviderImageArtifact(attachment, 'anthropic'); + const bytes = await input.readArtifact.readBytes(artifact.artifactId); + blocks.push(toClaudeImageBlock(artifact.mimeType, bytes)); + } + + return { kind: 'claude_structured_blocks', blocks }; +} +``` + +### Artifact reader abstraction + +```ts +export interface AttachmentArtifactReader { + readBytes(artifactId: string): Promise; + stat(artifactId: string): Promise<{ sizeBytes: number }>; +} +``` + +This makes adapter unit tests independent from filesystem. + +### Claude image block conversion + +```ts +function toClaudeImageBlock(mimeType: string, bytes: Buffer): ClaudeInputBlock { + if (mimeType !== 'image/png' && mimeType !== 'image/jpeg') { + throw new AttachmentDeliveryError('attachment_type_unsupported', `Claude image MIME unsupported: ${mimeType}`); + } + + return { + type: 'image', + source: { + type: 'base64', + media_type: mimeType, + data: bytes.toString('base64'), + }, + }; +} +``` + +This base64 is provider-native structured payload data, not prompt text. That distinction should be documented in code comments because it is easy to confuse. + +### Claude phase review traps + +- If a reviewer sees `data:image` in prompt text, reject. +- If a reviewer sees `spawnStatus` or `launchState` imports in adapter, reject. +- If a reviewer sees attachment adapter imported by bootstrap/provisioning code, reject. +- If text-only Claude messages start going through `readArtifact`, reject. + + +## Phase 2 Deep Review Addendum + +### Claude adapter contract tests + +Test adapter without real Claude first. The real smoke test should only prove provider behavior after contract is stable. + +```ts +describe('buildClaudeAttachmentDeliveryParts', () => { + it('preserves text order before images', async () => { + const parts = await buildClaudeAttachmentDeliveryParts(fakeInputWithOneImage()); + expect(parts.blocks[0]).toMatchObject({ type: 'text' }); + expect(parts.blocks[1]).toMatchObject({ type: 'image' }); + }); + + it('rejects webp until explicitly supported', async () => { + await expect(buildClaudeAttachmentDeliveryParts(fakeWebpInput())) + .rejects.toMatchObject({ code: 'attachment_type_unsupported' }); + }); +}); +``` + +### Claude delivery proof matrix + +| Message target | Required proof after Claude response | +|---|---| +| Direct user ask | visible reply or safe plain text materialization | +| Delegate to teammate | structured relay/message proof if current flow requires it | +| Work-sync | existing work-sync proof, not visual answer | +| Task progress | existing task/progress proof | + +Image support must not weaken these proof gates. + +### Live smoke minimum + +Use a prompt that cannot be answered correctly without image access: + +```text +Look at the attached image. What is the single dominant color of the square? Answer with one English word. +``` + +Passing answer: `red`. + +Failing answers: + +- `I cannot view images`. +- Any generic guess not grounded in image. +- Empty turn. +- Tool-only response without visible answer for direct ask. + + +## Phase 2 Implementation Contract Addendum + +### Runtime path decision tree + +Before implementing, identify the exact Claude runtime path used by team messages. + +```text +Does app path support structured image blocks? + yes -> implement native Claude image blocks + no -> block image send for Claude with clear unsupported runtime message +``` + +Do not implement a fallback that pastes base64 text. + +### Claude provider adapter return shape + +```ts +export type ClaudeAttachmentDeliveryParts = + | { kind: 'legacy_text'; text: string } + | { kind: 'structured_blocks'; blocks: ClaudeInputBlock[] }; +``` + +Rules: + +- `legacy_text` only when no attachments. +- `structured_blocks` when attachments exist. +- Call site must handle both explicitly. +- Any unsupported attachment throws typed `AgentAttachmentError` before provider call. + +### Claude provider smoke record format + +Record smoke results in PR notes: + +```text +Provider: Claude subscription +Runtime path: +Model: +Prompt: What color is the square? One word. +Image: red-square.png +Expected: red +Observed: red +Date: 2026-05-09 +Result: pass +``` + +This avoids relying on stale memory about prototype success. + + +## Implementation Readiness Addendum + +### Definition of Ready for Phase 2 + +Before coding Phase 2: + +- Phase 1 normalized attachment payload exists and is tested. +- Backend can read optimized artifact bytes by attachment id. +- Exact Claude app runtime path for team messages is identified. +- Text-only Claude delivery tests are green before changes. + +### Mocking strategy + +Use a fake artifact reader and fake Claude sender. + +```ts +const fakeArtifactReader: AttachmentArtifactReader = { + async readBytes(id) { + if (id === 'missing') throw new AgentAttachmentError('attachment_artifact_missing', 'missing'); + return Buffer.from([1, 2, 3]); + }, + async stat() { + return { sizeBytes: 3 }; + }, +}; +``` + +Do not unit-test by invoking real Claude. Real Claude belongs to smoke/e2e only. + +### Claude fallback decision + +If structured image blocks are unavailable in the actual app runtime: + +- Block image send for Claude. +- Keep text-only send working. +- Add diagnostic: `Claude runtime path does not support image attachments yet.` +- Do not fallback to `@file`, Markdown image links, or base64 text unless separately proven. + +### Claude additional edge cases + +| Edge case | Expected behavior | +|---|---| +| Empty text with image | Allow if UX supports image-only prompt, otherwise require text. Do not crash adapter. | +| Multiple images | Preserve order. | +| Unsupported MIME after hydration | Typed pre-provider error. | +| Artifact read fails | Message saved, delivery fails actionable. | +| Claude returns visible refusal | Delivery succeeded with refusal visible to user. | + + +## Final Phase 2 Acceptance Specs + +### Spec 1 - Claude text-only no regression + +```gherkin +Given a user sends a text-only message to a Claude target +When the message is delivered +Then the legacy Claude text path is used +And no artifact reader is invoked +And existing proof gates are unchanged +``` + +### Spec 2 - Claude image delivered through structured block + +```gherkin +Given a user sends a PNG image to a Claude target +When the Claude adapter builds delivery parts +Then it returns structured blocks +And the first block is the text prompt +And the second block is a provider-native image block +And no data:image text is present in the prompt +``` + +### Spec 3 - Claude artifact missing + +```gherkin +Given a message references an optimized image artifact +And the artifact file is missing +When delivery is attempted +Then delivery fails with attachment_artifact_missing +And the member launch state is unchanged +And the message remains available for user action +``` + +### Phase 2 exact PR contract + +The Phase 2 PR is acceptable only if: + +- It adds a Claude adapter with fake artifact-reader tests. +- It preserves text-only fast path. +- It maps provider image rejection to attachment delivery failure. +- It preserves Claude auth/session diagnostics. +- It includes one real smoke note or explicitly marks smoke as pending. +- It does not touch Codex/OpenCode delivery code except shared types. + +### Claude provider-native base64 comment requirement + +If implementation converts bytes to base64 for Claude structured image blocks, add a comment explaining why this is not the forbidden base64-in-text fallback. + +```ts +// Claude expects image bytes inside the structured image block as base64. +// This is provider-native payload data, not text appended to the user prompt. +``` + + +## Phase 2 Pre-Mortem and Extra Safeguards + +### Likely Claude mistakes + +| Mistake | Concrete prevention | +|---|---| +| Testing standalone Claude path but shipping different app path | Smoke actual app-managed team delivery path. | +| Treating Claude visible refusal as transport failure | Visible refusal is delivered response. | +| Weakening relay/work-sync proof gates | Keep existing proof gates after response. | +| Reading large artifact before budget validation | Validate size before reading bytes. | +| Logging structured payload with base64 | Redact image block data in diagnostics. | + +### Redaction helper requirement + +Claude structured payload may contain base64. Any debug output must redact it. + +```ts +export function redactClaudeBlocksForDiagnostics(blocks: ClaudeInputBlock[]): unknown[] { + return blocks.map((block) => { + if (block.type !== 'image') return block; + return { + ...block, + source: { + ...block.source, + data: `[redacted image bytes: ${block.source.media_type}]`, + }, + }; + }); +} +``` + +### Claude adapter test for redaction + +```ts +it('redacts image bytes in diagnostics', () => { + const redacted = redactClaudeBlocksForDiagnostics([fakeClaudeImageBlock('SECRET_BASE64')]); + expect(JSON.stringify(redacted)).not.toContain('SECRET_BASE64'); +}); +``` + +### Claude stream handling edge case + +If Claude streams partial text then errors: + +- Preserve partial visible text only if existing delivery layer already supports partials safely. +- Otherwise report delivery failed with provider diagnostic. +- Do not mark message read if required visible proof was not committed. + diff --git a/docs/team-management/agent-attachments-phase-3-codex-native-plan.md b/docs/team-management/agent-attachments-phase-3-codex-native-plan.md index 42c140af..9402c899 100644 --- a/docs/team-management/agent-attachments-phase-3-codex-native-plan.md +++ b/docs/team-management/agent-attachments-phase-3-codex-native-plan.md @@ -624,3 +624,595 @@ it('does not include image path in prompt stdin', async () => { | base64 path from renderer | backend-only artifact store | | Codex auth failure hidden | do not catch provider errors as attachment errors | | unsupported PDF converted to prompt text silently | block explicitly | + +## Detailed Implementation Checklist + +### Step 1 - Define desktop to orchestrator attachment contract + +Codex native invocation details should stay in the runtime/orchestrator boundary. Desktop should pass managed artifact references, not raw arbitrary paths. + +```ts +export interface CodexNativeAttachmentRequest { + kind: 'image'; + artifactId: string; + mimeType: 'image/png' | 'image/jpeg'; + absolutePath: string; + sizeBytes: number; +} +``` + +Validation before crossing process boundary: + +- Path must be under app-managed attachment artifact directory. +- MIME type must be PNG/JPEG for Phase 3. +- File must exist and match expected size budget. +- File path must not come directly from renderer. + +### Step 2 - Add Codex adapter in desktop + +Desktop adapter should produce a provider-neutral runtime request, not CLI args directly. + +```ts +export function buildCodexAttachmentRuntimeRequest(input: { + text: string; + attachments: AgentAttachmentPayload[]; +}): CodexRuntimePromptRequest { + return { + text: input.text, + images: input.attachments.map((attachment) => ({ + path: selectBestImageArtifact(attachment, 'codex').path, + mimeType: selectBestImageArtifact(attachment, 'codex').mimeType, + })), + }; +} +``` + +### Step 3 - Add orchestrator CLI serialization + +In orchestrator, serialize to Codex-native image args only at the final command builder. + +Expected conceptual shape: + +```ts +const args = ['exec', '--json', '--model', model]; +for (const image of request.images) { + args.push('--image', image.path); +} +args.push('-'); +``` + +Do not use base64 text fallback. + +### Step 4 - Preserve Codex auth behavior + +Codex attachment changes must not modify: + +- `CODEX_HOME` selection. +- ChatGPT subscription vs API key logic. +- `forced_login_method=chatgpt` propagation. +- existing diagnostics for “ChatGPT login is required”. + +If Codex auth fails, the error should remain auth-specific, not attachment-specific. + +## Codex-Specific Edge Cases + +| Case | Expected behavior | +|---|---| +| Codex ChatGPT subscription logged out | Preserve existing login required diagnostic. | +| Codex API key mode selected but native subscription expected | Preserve existing auth mode diagnostic. | +| Image artifact path deleted before send | Delivery fails with artifact missing. | +| Large image after optimization | Block before Codex CLI. | +| Non-image file | Phase 3 does not route it through `--image`. | +| Multiple images | Pass repeated `--image` args if Codex supports them, otherwise enforce count 1 with clear warning. | +| Codex model without vision | Block based on capability matrix if known. | + +## Cross-Repo Contract Test Idea + +Add a small pure test in orchestrator for command building: + +```ts +it('passes codex images as repeated --image args', () => { + const command = buildCodexExecCommand({ + prompt: 'What color?', + images: ['/tmp/a.png', '/tmp/b.jpg'], + }); + + expect(command.args).toContainSequence(['--image', '/tmp/a.png']); + expect(command.args).toContainSequence(['--image', '/tmp/b.jpg']); + expect(command.stdin).toBe('What color?'); +}); +``` + +Add desktop-side test for path safety: + +```ts +it('rejects codex image paths outside managed artifact directory', () => { + expect(() => validateManagedArtifactPath('/etc/passwd')).toThrow(/outside managed attachment directory/); +}); +``` + +## Manual Codex QA + +1. Confirm dashboard shows Codex ChatGPT account ready. +2. Send red-card image to Codex lead or member. +3. Expected answer: `red` or equivalent. +4. Send unsupported model if available and verify warning. +5. Send oversized optimized image and verify block before runtime call. +6. Temporarily invalidate Codex login and verify auth diagnostic remains clear. + +## Phase 3 Exit Criteria + +- Codex text-only prompt path unchanged. +- Codex image prompt uses native image channel, not base64 text. +- Codex auth diagnostics still match current behavior. +- Desktop never passes renderer-provided arbitrary file paths to orchestrator. +- One real Codex subscription visual smoke test passes. + + +## Implementation Safeguards + +### Keep Codex login/session handling untouched + +The recent launch stability work around Codex auth is fragile and should not be mixed with attachment delivery. + +Do not change: + +- Codex auth discovery. +- ChatGPT account vs API key selection. +- `CODEX_HOME` propagation. +- `forced_login_method=chatgpt` settings. +- preflight auth status copy. + +Attachment support should sit after auth has already resolved. + +### Avoid stdin bloat + +Codex image paths should be CLI args or native request fields. The prompt on stdin should stay text-only. + +Bad: + +```ts +stdin = `${prompt}\n\nIMAGE_BASE64=${base64}`; +``` + +Good: + +```ts +args.push('--image', managedImagePath); +stdin = prompt; +``` + +### Artifact lifetime + +Codex runtime may read image files after process spawn. Do not delete artifacts immediately after creating command args. + +Safe policy: + +- Keep managed artifacts at least until delivery result is terminal. +- If message remains retryable, keep artifacts until retry window expires. +- Garbage collect old artifacts by age and reachability from message records. + +### Codex retry edge cases + +| Case | Safe behavior | +|---|---| +| Retry after artifact GC | Fail with artifact missing, do not send text-only replacement silently. | +| Retry after model switch | Revalidate capability and budgets. | +| Retry after Codex logout | Show auth error, preserve message. | +| Runtime exits after receiving image | Runtime diagnostic, not attachment validation failure. | + +### Cross-repo PR checklist + +Desktop repo: + +- Produces provider-neutral image request. +- Validates managed artifact paths. +- Preserves message/ledger semantics. + +Orchestrator repo: + +- Converts request to Codex native args. +- Does not inspect renderer metadata. +- Redacts command diagnostics. +- Tests command builder with one and multiple images. + + +## Failure Injection Tests for Phase 3 + +```ts +describe('Codex native image delivery', () => { + it('keeps prompt on stdin and images as args', () => { + const command = buildCodexNativeCommand({ + prompt: 'What color?', + images: [managedImage('/tmp/app/attachments/a.png')], + }); + + expect(command.stdin).toBe('What color?'); + expect(command.args).toContain('--image'); + expect(command.stdin).not.toContain('base64'); + }); + + it('does not mask Codex login failure as attachment failure', async () => { + const result = await runCodexImageDelivery(fakeCodexLoggedOut()); + + expect(result.error.message).toMatch(/codex login|ChatGPT/i); + expect(result.error.code).not.toBe('attachment_provider_rejected'); + }); +}); +``` + +## Codex Command Builder Invariants + +- `--image` args must appear before prompt stdin is consumed if Codex CLI requires it. +- Paths must be absolute managed artifact paths. +- Do not shell-concatenate paths. Use argv array. +- Do not quote manually inside argv array. +- Do not pass images through environment variables. +- Do not write temp prompt files containing base64 image text. + +## Codex Existing-Team Edge Case + +Existing teams may have members created before attachment support. That must not matter. Attachment capability is based on current provider/model/runtime, not team creation date. + +If old metadata lacks provider/model: + +- Use existing compatibility probe behavior. +- Do not infer vision support from old inbox names. +- Block image send until team/member metadata is stable. + +## Phase 3 Stop Conditions + +Stop and reassess if: + +- Codex image support requires changing auth detection. +- Codex CLI version in app runtime does not support `--image`. +- Image args only work in standalone shell but not app-managed runtime. +- Provider diagnostics start saying API key mode when ChatGPT subscription was selected. + + +## File-Level Implementation Plan + +Desktop suggested files: + +```text +src/features/agent-attachments/main/providers/codexAttachmentAdapter.ts +src/features/agent-attachments/main/providers/codexAttachmentAdapter.test.ts +``` + +Orchestrator suggested files: + +```text +src/runtime/codex/codexImageArgs.ts +src/runtime/codex/codexImageArgs.test.ts +``` + +Use actual repo structure names during implementation, but keep this separation: desktop plans delivery, orchestrator serializes runtime command. + +### Desktop adapter skeleton + +```ts +export function buildCodexNativeAttachmentRequest(input: { + text: string; + attachments: AgentAttachmentPayload[]; +}): CodexNativePromptRequest { + return { + text: input.text, + images: input.attachments.map((attachment) => { + const artifact = selectProviderImageArtifact(attachment, 'codex'); + return { + artifactId: artifact.artifactId, + path: artifact.absolutePath, + mimeType: artifact.mimeType, + sizeBytes: artifact.sizeBytes, + }; + }), + }; +} +``` + +### Orchestrator command args skeleton + +```ts +export function appendCodexImageArgs(args: string[], images: CodexRuntimeImage[]): void { + for (const image of images) { + assertManagedRuntimeImagePath(image.path); + args.push('--image', image.path); + } +} +``` + +Never build a shell string like: + +```ts +`codex exec --image ${path}` +``` + +Use argv arrays only. + +### Codex provider proof interaction + +Codex image delivery should reuse existing delivery proof gates. Image delivery itself is not success. + +Success examples: + +- Visible direct reply to user. +- Correct `message_send` relay for delegated/peer path. +- Correct work-sync report for work-sync path. + +Failure examples: + +- Codex command exits before assistant message. +- Codex auth fails. +- Codex returns text saying it cannot inspect image. + +The last one may still be a visible reply. Treat it as delivered but semantically unhelpful, not as transport failure. + + +## Phase 3 Deep Review Addendum + +### Codex runtime compatibility probe + +Before enabling Codex image delivery in UI, the app should know whether the configured Codex runtime supports image args. + +Conservative options: + +1. Static capability by known runtime version - 🎯 8 🛡️ 8.5 🧠 4, примерно `80-160` строк. +2. One-time local CLI help parse/cache - 🎯 7.5 🛡️ 8 🧠 5, примерно `120-220` строк. +3. Always attempt and surface provider error - 🎯 6 🛡️ 6.5 🧠 2, примерно `30-80` строк. + +Recommended for release: option 1 if runtime version is already controlled by the app, otherwise option 2 with short cache. + +Do not run expensive live Codex image probes on every composer render. + +### Codex request lifecycle + +```text +Composer send + -> persist message and artifacts + -> backend validates Codex capability + -> desktop builds Codex runtime request + -> orchestrator builds argv with --image + -> Codex runtime produces response + -> existing delivery proof gates decide delivered/failed +``` + +Every arrow should have tests or explicit diagnostics. + +### Codex diagnostics examples + +Good: + +```text +Codex image delivery failed: optimized image artifact is missing. The message was saved but cannot be retried with the image. +``` + +```text +Codex ChatGPT login is required before sending image attachments. +``` + +Bad: + +```text +Agent failed. +``` + +```text +Spawn failed. +``` + +unless process launch actually failed. + + +## Phase 3 Implementation Contract Addendum + +### Codex DTO between desktop and orchestrator + +Use a stable JSON-serializable shape. + +```ts +export interface CodexImageAttachmentForRuntime { + artifactId: string; + path: string; + mimeType: 'image/png' | 'image/jpeg'; + sizeBytes: number; +} + +export interface CodexPromptRuntimeRequest { + text: string; + images?: CodexImageAttachmentForRuntime[]; +} +``` + +Rules: + +- `images` omitted or empty means exact text-only legacy behavior. +- `images` present means orchestrator appends native `--image` args. +- Unknown fields should be ignored or rejected consistently, not partially consumed. + +### Codex compatibility check + +If the app bundles/controls Codex CLI version, use static support knowledge. If not, parse `codex exec --help` once and cache whether `--image` exists. + +Pseudo-code: + +```ts +export async function detectCodexImageArgSupport(runtime: CodexRuntime): Promise { + const cached = codexImageSupportCache.get(runtime.binaryPath); + if (cached) return cached.supported; + + const help = await runtime.run(['exec', '--help'], { timeoutMs: 5000 }); + const supported = /--image\b/.test(help.stdout + help.stderr); + codexImageSupportCache.set(runtime.binaryPath, { supported, checkedAt: Date.now() }); + return supported; +} +``` + +Do not run this on every message send if it is slow. + +### Codex error mapping table + +| Error text category | Final classification | +|---|---| +| login required / ChatGPT session | provider auth/session error | +| unknown option `--image` | runtime does not support image attachments | +| file not found | attachment artifact missing | +| max tokens/quota | provider rejected message | +| model cannot inspect image | visible semantic response if delivered | + +Business logic should not depend on fragile regex except for display cleanup. Prefer structured exit codes/known adapter failures where available. + + +## Implementation Readiness Addendum + +### Definition of Ready for Phase 3 + +Before coding Phase 3: + +- Phase 1 normalized artifact storage is in place. +- Codex text-only delivery tests are green. +- Orchestrator branch/worktree is aligned with desktop branch. +- Actual app-managed Codex runtime has been checked for image support. + +### Cross-repo compatibility rule + +Desktop must tolerate orchestrator without image support during development by failing safely before send, not by crashing runtime. + +```ts +if (!runtimeCapabilities.codexImageArgs) { + throw new AgentAttachmentError( + 'attachment_runtime_transport_failed', + 'Current Codex runtime does not support image attachments.' + ); +} +``` + +### Codex no-regression assertions + +- Text-only Codex request shape unchanged when `images` is empty. +- Codex ChatGPT account mode still selected when user chose subscription. +- No `OPENAI_API_KEY` fallback is introduced for subscription image send. +- `CODEX_HOME` still points to expected local auth state. +- Provider auth errors are not swallowed by attachment adapter. + +### Codex additional edge cases + +| Edge case | Expected behavior | +|---|---| +| `--image` unsupported | Block with runtime unsupported diagnostic. | +| Two images but Codex supports one only | Block or reduce with explicit user choice, no silent drop. | +| Image path contains spaces | argv array handles it. Test it. | +| Image path contains shell metacharacters | argv array handles it. No shell interpolation. | +| Codex returns answer in non-English | Accept semantic visual answer if it clearly identifies image. | + + +## Final Phase 3 Acceptance Specs + +### Spec 1 - Codex text-only no regression + +```gherkin +Given a user sends a text-only message to a Codex target +When the runtime request is built +Then images is omitted or empty +And the existing Codex text-only path is used +And auth/session handling is unchanged +``` + +### Spec 2 - Codex image uses native args + +```gherkin +Given a user sends a PNG image to Codex +When the orchestrator command is built +Then the prompt remains text-only stdin +And the image path is passed with --image argv +And no shell string interpolation is used +And no base64 appears in stdin +``` + +### Spec 3 - Codex runtime lacks image support + +```gherkin +Given the configured Codex runtime does not support --image +When the user tries to send an image +Then the send is blocked or delivery fails before runtime prompt +And the diagnostic says Codex runtime does not support image attachments +And text-only Codex messages still work +``` + +### Phase 3 exact PR contract + +The Phase 3 PR is acceptable only if: + +- Desktop and orchestrator agree on DTO shape. +- Orchestrator command builder has argv tests. +- Codex auth tests or diagnostics are not weakened. +- Runtime support detection is cached or static, not repeated on every render. +- Paths with spaces are covered in tests. +- No API-key fallback is introduced for subscription mode. + +### Codex failure copy examples + +```text +Codex image delivery is unavailable because the configured Codex runtime does not support --image. +``` + +```text +Codex ChatGPT login is required before this image can be sent. +``` + +```text +Prepared image file is missing. Remove and attach the image again. +``` + + +## Phase 3 Pre-Mortem and Extra Safeguards + +### Likely Codex mistakes + +| Mistake | Concrete prevention | +|---|---| +| `--image` command built through shell string | Use argv arrays only. | +| Image support check runs too often | Cache runtime capability. | +| API key mode accidentally used for subscription | Preserve selected auth mode and existing env rules. | +| Orchestrator receives desktop fields it ignores | Add contract test across DTO. | +| Runtime does not support image but UI allows send | Capability gate checks runtime support. | + +### Cross-version DTO tolerance + +During development, desktop and orchestrator can briefly be out of sync. The final merged state must be compatible, but code should fail clearly if mismatch happens. + +```ts +export function assertCodexRuntimeUnderstandsImages(runtimeCaps: RuntimeCapabilities): void { + if (!runtimeCaps.codexImages) { + throw new AgentAttachmentError( + 'attachment_runtime_transport_failed', + 'This Codex runtime does not support image attachments yet.' + ); + } +} +``` + +### Codex diagnostic redaction + +If command args are included in diagnostics, redact paths if needed and never include prompt text with secrets. + +Safe diagnostic: + +```json +{ + "provider": "codex", + "model": "gpt-5.4-mini", + "imageCount": 1, + "imageArgsUsed": true, + "stdinBytes": 31 +} +``` + +Unsafe diagnostic: + +```json +{ + "stdin": "full user prompt with private text", + "imageBase64": "..." +} +``` + diff --git a/docs/team-management/agent-attachments-phase-4-opencode-vision-plan.md b/docs/team-management/agent-attachments-phase-4-opencode-vision-plan.md index 647492a4..4782bf65 100644 --- a/docs/team-management/agent-attachments-phase-4-opencode-vision-plan.md +++ b/docs/team-management/agent-attachments-phase-4-opencode-vision-plan.md @@ -692,3 +692,646 @@ it('serializes image file part without changing response observer', async () => | provider key logged in smoke | redactor in smoke script | | model capability string hardcoded in UI only | backend resolver is source of truth | | OpenRouter model id normalization inconsistent | shared `normalizeOpenCodeModelRef` tests | + +## Detailed Implementation Checklist + +### Step 1 - Add OpenCode capability matrix + +OpenCode support depends on provider and model, not just provider id. + +```ts +export interface OpenCodeModelVisionCapability { + provider: 'openai' | 'openrouter' | string; + modelPattern: RegExp; + vision: 'yes' | 'no' | 'unknown'; + evidence: 'live-smoke' | 'docs' | 'manual' | 'default'; +} +``` + +Initial known evidence from prototype: + +- `openai/gpt-5.4-mini`: vision yes. +- `openrouter/moonshotai/kimi-k2.6`: vision yes. +- `openrouter/z-ai/glm-4.5v`: vision yes. +- `openrouter/z-ai/glm-5.1`: vision no or unreliable for images based on smoke result. + +Unknown models should not silently accept images. Use warning/block depending on UX decision: + +- For release-safe behavior: block unknown vision models and explain. +- For exploratory dev behavior: allow with explicit “model may not support images” warning only if user confirms. Not recommended before release. + +### Step 2 - Route OpenCode attachments through file parts + +Use OpenCode file attachment mechanism, not base64 text. + +Conceptual orchestrator command: + +```ts +const args = ['run', '--format', 'json', '--model', model, prompt]; +for (const image of request.images) { + args.push('-f', image.path); +} +``` + +If using HTTP/server API instead of CLI, keep the same contract: file path/part is transport-native. + +### Step 3 - Integrate with delivery ledger safely + +Attachment delivery result must still satisfy the existing prompt delivery proof rules. + +| Result | Ledger behavior | +|---|---| +| Model returns visible reply | Existing success path. | +| Model returns plain text direct user reply | Existing safe materialization path. | +| Model says it cannot view image | Delivery succeeded, semantic unsupported response. Mark delivered if visible reply exists. | +| Provider rejects file | Delivery failed with provider diagnostic. | +| Empty assistant turn | Existing bounded repair policy applies. | +| Non-visible tool without progress | Existing proof-directed repair applies. | + +### Step 4 - Add exact diagnostics + +OpenCode attachment diagnostics should include provider/model context but redact secrets. + +Examples: + +- `OpenCode model openrouter/z-ai/glm-5.1 is not marked vision-capable for image attachments.` +- `OpenCode image artifact missing before delivery.` +- `OpenCode provider rejected image attachment: ` +- `OpenCode returned a visible reply saying it cannot inspect images.` + +Do not turn a model “I cannot see images” answer into transport failure if the model produced a visible answer. That is useful user feedback. + +## OpenCode Edge Cases + +| Case | Expected behavior | +|---|---| +| OpenRouter key quota exceeded | Provider failure with exact redacted diagnostic. | +| OpenCode OAuth stale | Auth/session failure, not attachment failure. | +| Model vision unknown | Block or warn according to release setting, default block. | +| Model claims no image support despite capability yes | Visible semantic answer, not retry-loop. | +| File path accepted but model returns empty turn | Existing bounded empty-turn repair. | +| Concurrent messages with images | Ledger correlation by original message id remains authoritative. | +| Retry of failed image delivery | Reuse same managed artifact if still present, otherwise fail artifact missing. | +| User restarts member after image failure | Restart does not delete inbox message or attachment metadata. | + +## Capability Governance + +Do not scatter model lists across UI and backend. + +Recommended single source: + +```text +src/features/agent-attachments/shared/opencodeVisionCapabilities.ts +``` + +Export both backend and renderer-safe functions: + +```ts +export function resolveOpenCodeVisionCapability(input: { + providerId: string; + model: string; +}): AgentAttachmentCapability { + // Pure function, no filesystem, no network. +} +``` + +When adding a new model: + +1. Add capability entry with evidence comment. +2. Add unit test. +3. Add manual smoke result to this plan or docs. +4. Do not infer full provider support from one model. + +## Manual OpenCode QA + +Run at least these visual prompts: + +1. `openai/gpt-5.4-mini`: red-card image -> expected `red`. +2. `openrouter/moonshotai/kimi-k2.6`: red-card image -> expected `red`. +3. `openrouter/z-ai/glm-4.5v`: red-card image -> expected `red`. +4. `openrouter/z-ai/glm-5.1`: red-card image -> expected block or clear unsupported warning. +5. Quota/key failure simulation -> exact provider error visible, no teammate offline unless runtime exits. + +## Phase 4 Exit Criteria + +- OpenCode text-only prompt path unchanged. +- Vision-capable OpenCode models receive real image files. +- Non-vision/unknown OpenCode models do not silently hallucinate image answers. +- Existing OpenCode repair policy remains bounded. +- Provider quota/auth errors remain exact and do not become generic “spawn failed”. + + +## Implementation Safeguards + +### Capability result must be explainable + +Users need to understand why one OpenCode model supports screenshots and another does not. + +```ts +export interface CapabilityExplanation { + supported: boolean; + reason: 'known_vision_model' | 'known_non_vision_model' | 'unknown_model' | 'unsupported_provider'; + displayText: string; +} +``` + +Examples: + +- `openai/gpt-5.4-mini supports image attachments.` +- `openrouter/z-ai/glm-5.1 is not marked as image-capable. Choose a vision-capable model or remove images.` +- `This OpenCode model has unknown image support. Image delivery is blocked for reliability.` + +### Do not couple OpenCode vision support to weak-model repair + +The proof-directed repair policy handles missing response proof. It should not decide model vision support. + +Keep separate: + +- `OpenCodeVisionCapability`: can this model receive images? +- `OpenCodePromptDeliveryRepairPolicy`: did a delivered prompt produce required proof? +- `OpenCodeDeliveryLedger`: what happened to this message attempt? + +### OpenCode provider rejection mapping + +OpenRouter/OpenCode errors should remain exact but redacted. + +| Provider error | Mapping | +|---|---| +| Quota/credits/token budget | provider rejected attachment/message, exact diagnostic. | +| Model not found | model/provider config diagnostic. | +| Unsupported file/input | attachment provider rejected. | +| OAuth/session stale | provider auth/session diagnostic. | +| Empty assistant turn | existing bounded no-response repair. | + +No regex classification is required to decide business logic. Regex/string matching can be used only for safe redaction and known diagnostic display cleanup. + +### OpenCode PR checklist + +- Unknown models do not silently receive images. +- Supported models use native file parts. +- No model-specific prompt hacks are added. +- Existing text-only OpenCode tests still pass. +- Existing empty-turn retry remains bounded. +- Provider quota/auth errors are not masked by attachment code. + +### Additional OpenCode tests + +```ts +it('blocks unknown OpenCode vision model before delivery', () => { + const decision = resolveOpenCodeVisionCapability({ + providerId: 'openrouter', + model: 'some/new-model', + }); + + expect(decision.supported).toBe(false); + expect(decision.reason).toBe('unknown_model'); +}); + +it('does not invoke repair policy for model capability block', () => { + const result = planOpenCodeAttachmentDelivery(unknownVisionModelInput()); + expect(result.status).toBe('blocked'); + expect(result.retryable).toBe(false); +}); +``` + + +## Failure Injection Tests for Phase 4 + +```ts +describe('OpenCode image capability and delivery', () => { + it('blocks a known non-vision OpenCode model before prompt delivery', () => { + const result = planOpenCodeImageDelivery({ + providerId: 'openrouter', + model: 'z-ai/glm-5.1', + attachments: [fakeImageAttachment()], + }); + + expect(result.status).toBe('blocked'); + expect(result.failureCode).toBe('attachment_model_unsupported'); + }); + + it('does not retry provider quota rejection as proof repair', async () => { + const result = await deliverOpenCodeImage(fakeOpenRouterQuotaError()); + + expect(result.retryKind).not.toBe('proof_directed_repair'); + expect(result.failureCode).toBe('attachment_provider_rejected'); + }); +}); +``` + +## OpenCode Multi-Hop Edge Cases + +| Scenario | Safe behavior | +|---|---| +| User sends image to lead, lead delegates to OpenCode member | Lead message contains image; delegated message should include image only if explicitly attached/forwarded by lead protocol. Do not automatically leak user image to all teammates. | +| User sends direct image to OpenCode member | Direct delivery uses OpenCode file parts. | +| OpenCode member replies with tool call but no visible reply | Existing proof policy decides, not image layer. | +| OpenCode member starts task from image | Task evidence/progress still required for task state. | +| OpenCode model answers “I cannot access images” | Visible reply delivered; UI can show model limitation, no retry loop. | + +## Privacy Boundary for Delegation + +Do not automatically fan out attachments to every teammate. Attachment propagation should follow explicit message routing. + +Rules: + +- Direct `user -> member`: deliver attachment to that member only. +- `user -> lead`: deliver attachment to lead only. +- Lead delegation: attachment forwarded only if the generated structured message includes or references the attachment intentionally. +- System nudges/work-sync: do not include user attachments. + +This avoids accidental data exposure across teammates. + +## Phase 4 Stop Conditions + +Stop and reassess if: + +- OpenCode requires model-specific prompt hacks to see images. +- Unknown models must be allowed silently for UX convenience. +- Existing proof repair tests start changing because of image capability logic. +- Provider quota errors get converted into generic delivery proof failures. + + +## File-Level Implementation Plan + +Desktop suggested files: + +```text +src/features/agent-attachments/main/providers/opencodeAttachmentAdapter.ts +src/features/agent-attachments/shared/opencodeVisionCapabilities.ts +src/features/agent-attachments/main/providers/opencodeAttachmentAdapter.test.ts +``` + +Orchestrator suggested files: + +```text +src/runtime/opencode/opencodeFileParts.ts +src/runtime/opencode/opencodeFileParts.test.ts +``` + +### Capability matrix skeleton + +```ts +const OPENCODE_VISION_MODELS: OpenCodeModelVisionCapability[] = [ + { + provider: 'openai', + modelPattern: /^gpt-5\.4-mini$/, + vision: 'yes', + evidence: 'live-smoke', + }, + { + provider: 'openrouter', + modelPattern: /^moonshotai\/kimi-k2\.6$/, + vision: 'yes', + evidence: 'live-smoke', + }, + { + provider: 'openrouter', + modelPattern: /^z-ai\/glm-4\.5v$/, + vision: 'yes', + evidence: 'live-smoke', + }, + { + provider: 'openrouter', + modelPattern: /^z-ai\/glm-5\.1$/, + vision: 'no', + evidence: 'live-smoke', + }, +]; +``` + +Keep comments near each entry with date and smoke summary. Do not infer provider-wide support. + +### OpenCode adapter skeleton + +```ts +export function buildOpenCodeAttachmentPromptRequest(input: { + text: string; + providerId: string; + model: string; + attachments: AgentAttachmentPayload[]; +}): OpenCodePromptRequest { + const capability = resolveOpenCodeVisionCapability({ providerId: input.providerId, model: input.model }); + if (!capability.supported) { + throw new AttachmentDeliveryError('attachment_model_unsupported', capability.displayText); + } + + return { + text: input.text, + files: input.attachments.map((attachment) => selectProviderImageArtifact(attachment, 'opencode').absolutePath), + }; +} +``` + +### OpenCode warning copy examples + +Use precise user-facing copy: + +```text +This OpenCode model is not verified for image attachments. Choose a vision-capable model or remove the image. +``` + +```text +OpenCode provider rejected the image attachment: +``` + +Avoid vague copy: + +```text +Message failed. +``` + +```text +OpenCode error. +``` + + +## Phase 4 Deep Review Addendum + +### OpenCode capability decision must be cached but refreshable + +Model capability matrix is mostly static, but the user may change provider/model in Edit Team. + +Rules: + +- Resolve capability from current member metadata at send time. +- Do not cache capability per member forever. +- If member model changes, next send uses new model capability. +- UI warnings should refresh after Edit Team save/restart. + +### OpenCode direct vs lead-routed images + +Direct user to OpenCode member: + +```text +user -> alice(OpenCode) with image +``` + +Deliver image to Alice only. + +User to lead: + +```text +user -> lead with image +``` + +Deliver image to lead only. Lead may choose to ask teammate, but automatic image forwarding should not happen unless future protocol explicitly supports attachment references. + +This is a privacy and predictability decision. + +### OpenCode unsupported model message examples + +For direct send: + +```text +Alice is using openrouter/z-ai/glm-5.1, which is not verified for image attachments. Remove the image or switch Alice to a vision-capable model. +``` + +For team send to lead where lead is not OpenCode: + +```text +The lead can receive this image. It will not be automatically forwarded to OpenCode teammates unless the lead explicitly sends it. +``` + +### OpenCode model capability update procedure + +When adding a new OpenCode vision model: + +1. Run red-square smoke test. +2. Record provider, exact model id, date, result. +3. Add capability matrix entry. +4. Add unit test. +5. If model is unreliable, mark unsupported or unknown, not supported. + + +## Phase 4 Implementation Contract Addendum + +### Provider/model canonicalization + +OpenCode model ids can appear with or without provider prefix depending on UI/runtime source. Normalize before capability lookup. + +```ts +export function canonicalizeOpenCodeModel(input: { + providerId: string; + model: string; +}): { providerId: string; model: string } { + const providerId = input.providerId.toLowerCase(); + const model = input.model.replace(/^openrouter\//, '').replace(/^openai\//, ''); + return { providerId, model }; +} +``` + +Capability tests must cover both forms: + +- `providerId=openrouter`, `model=moonshotai/kimi-k2.6` +- `providerId=openrouter`, `model=openrouter/moonshotai/kimi-k2.6` + +### OpenCode DTO between desktop and orchestrator + +```ts +export interface OpenCodePromptRuntimeRequest { + text: string; + files?: Array<{ + artifactId: string; + path: string; + mimeType: 'image/png' | 'image/jpeg'; + sizeBytes: number; + }>; +} +``` + +Rules: + +- `files` are provider-native attachments, not arbitrary user paths. +- `files` empty means text-only behavior. +- Orchestrator validates path before passing `-f` or HTTP file part. + +### OpenCode ledger interaction examples + +```ts +if (capability.supported === false) { + return { + accepted: false, + delivered: false, + retryable: false, + failureCode: 'attachment_model_unsupported', + }; +} +``` + +This must not create a ledger retry attempt because the prompt was not delivered. + +If provider accepts prompt but returns empty assistant turn, existing ledger retry applies: + +```ts +return { + accepted: true, + delivered: false, + retryable: true, + failureCode: 'empty_assistant_turn', +}; +``` + + +## Implementation Readiness Addendum + +### Definition of Ready for Phase 4 + +Before coding Phase 4: + +- Phase 1 normalized artifacts are stable. +- Existing OpenCode delivery ledger tests are green. +- OpenCode text-only direct reply path is green. +- Capability matrix initial entries are accepted. +- Orchestrator OpenCode file-part mechanism is identified. + +### OpenCode mocked sender strategy + +Use fake OpenCode sender outcomes: + +```ts +type FakeOpenCodeOutcome = + | { kind: 'visible_reply'; text: string } + | { kind: 'empty_assistant_turn' } + | { kind: 'provider_error'; message: string } + | { kind: 'non_visible_tool' }; +``` + +Test attachment adapter and ledger separately: + +- Adapter tests: capability and file-part request building. +- Ledger tests: how delivery outcome affects retry/proof. +- Do not combine both in one huge brittle test unless it is e2e. + +### OpenCode additional edge cases + +| Edge case | Expected behavior | +|---|---| +| Model id casing differs | Canonicalize before capability lookup. | +| Provider prefix duplicated | Canonicalize. | +| OpenRouter key missing | Existing provider auth diagnostic. | +| OpenRouter credits exhausted | Provider diagnostic, no proof repair. | +| Vision model returns empty | Existing bounded empty-turn retry. | +| Non-vision model says cannot see image | If blocked before send, this should not occur. If it occurs from stale capability, visible reply is shown and capability should be corrected. | + +### Capability matrix change control + +Do not accept capability entries without evidence. + +Required evidence for `vision: yes`: + +- Real smoke prompt. +- Model id exactly as runtime uses it. +- Date. +- Observed answer. + +Required evidence for `vision: no`: + +- Real smoke or provider documentation. +- Observed refusal/unsupported behavior. + + +## Final Phase 4 Acceptance Specs + +### Spec 1 - known vision OpenCode model + +```gherkin +Given a user sends an image to openrouter/moonshotai/kimi-k2.6 +And the capability matrix marks it vision-capable +When delivery is planned +Then the image is sent as a native file part +And existing ledger proof gates decide delivery success +``` + +### Spec 2 - known non-vision OpenCode model + +```gherkin +Given a user sends an image to openrouter/z-ai/glm-5.1 +And the capability matrix marks it unsupported +When delivery is planned +Then delivery is blocked before prompt send +And no retry ledger attempt is created +And the UI explains that the model is not verified for image attachments +``` + +### Spec 3 - OpenCode provider quota error + +```gherkin +Given an OpenCode provider rejects a message due to quota +When an image message is sent +Then the exact redacted provider diagnostic is shown +And the failure is not converted into proof repair +And the member launch state is unchanged +``` + +### Phase 4 exact PR contract + +The Phase 4 PR is acceptable only if: + +- Capability matrix has tests for every entry. +- Unknown models default safe. +- File-part request builder has tests. +- Existing OpenCode ledger bounded retry tests remain green. +- Unsupported capability block does not create a retry attempt. +- Direct and lead-routed privacy rules are documented in tests or code comments. + +### OpenCode capability copy examples + +```text +Jack is using openrouter/z-ai/glm-5.1, which is not verified for image attachments. Switch to a vision-capable model or remove the image. +``` + +```text +Alice can receive this image because openai/gpt-5.4-mini is verified for image attachments. +``` + + +## Phase 4 Pre-Mortem and Extra Safeguards + +### Likely OpenCode mistakes + +| Mistake | Concrete prevention | +|---|---| +| Unknown model allowed because “maybe vision” | Unknown defaults blocked. | +| Capability lookup misses provider prefix variant | Canonicalization tests. | +| Provider quota error enters proof repair | Provider rejection is terminal for that attempt, not proof repair. | +| Direct user image gets auto-forwarded to teammates | Explicit route-only attachment propagation. | +| Non-vision model gets image-less prompt silently | Capability block before delivery. | + +### OpenCode file-part transport abstraction + +Do not let high-level delivery know whether OpenCode uses CLI `-f` or HTTP multipart internally. + +```ts +export interface OpenCodeFilePartTransport { + sendPrompt(input: { + sessionId: string; + text: string; + files: OpenCodeRuntimeFilePart[]; + }): Promise; +} +``` + +This allows future switch from CLI to server API without changing capability or ledger logic. + +### OpenCode semantic refusal handling + +If model responds visibly with “I cannot inspect images”: + +- Direct user message: show response as delivered. +- Add optional diagnostic warning that capability matrix may be wrong. +- Do not retry automatically. +- Consider downgrading capability entry after repeated smoke failure. + +### OpenCode tests for no silent drop + +```ts +it('does not send text-only prompt when image is unsupported', async () => { + const sender = createFakeOpenCodeSender(); + await expect(deliverImageToUnsupportedOpenCodeModel(sender)).rejects.toMatchObject({ + code: 'attachment_model_unsupported', + }); + expect(sender.calls).toHaveLength(0); +}); +``` + diff --git a/docs/team-management/agent-attachments-phase-5-e2e-and-polish-plan.md b/docs/team-management/agent-attachments-phase-5-e2e-and-polish-plan.md index 78c671dd..af15e401 100644 --- a/docs/team-management/agent-attachments-phase-5-e2e-and-polish-plan.md +++ b/docs/team-management/agent-attachments-phase-5-e2e-and-polish-plan.md @@ -629,3 +629,683 @@ All OpenRouter models support screenshots. | docs become stale | capability catalog references live smoke date | | unsupported negative model changes behavior | update catalog/test explicitly | | copied diagnostics leak image data | redaction unit tests | + +## Detailed E2E Matrix + +Use one tiny deterministic image fixture: a red square with no metadata dependencies. + +| Provider path | Model | Prompt | Expected | +|---|---|---|---| +| Claude subscription | current configured Claude model | `What color is the square? One word.` | `red` | +| Codex subscription | `gpt-5.4-mini` or available vision model | same | `red` | +| OpenCode OpenAI | `openai/gpt-5.4-mini` | same | `red` | +| OpenCode OpenRouter | `openrouter/moonshotai/kimi-k2.6` | same | `red` | +| OpenCode OpenRouter | `openrouter/z-ai/glm-4.5v` | same | `red` | +| OpenCode OpenRouter | `openrouter/z-ai/glm-5.1` | same | blocked/unsupported | + +## Negative E2E Matrix + +| Scenario | Expected | +|---|---| +| Oversized image | UI blocks before send with size/optimization details. | +| Corrupt image | UI blocks with decode error. | +| Missing optimized artifact | Delivery fails actionable, message preserved. | +| Provider quota exhausted | Provider diagnostic shown, no generic spawn failed. | +| Codex logged out | Existing Codex login diagnostic shown. | +| OpenCode model no vision | Unsupported warning/block shown. | +| Multiple images over total budget | Deterministic budget warning. | +| Runtime exits after send | Runtime exit diagnostic separate from attachment validation. | + +## Automated Test Layers + +### Pure tests + +- Budget selection. +- Capability resolution. +- MIME/type classification. +- Provider adapter serialization. +- Redaction. + +### Main process tests + +- IPC validation rejects invalid attachment ids. +- Backend never accepts raw renderer paths. +- Missing artifacts produce controlled failure. +- Delivery failure does not mutate launch state. + +### Renderer tests + +- Preview and optimization warnings. +- Target switching recomputes capability warning. +- Send button blocks unsupported attachments. +- Removing attachments clears warnings. + +### Safe E2E tests + +- Provider command builders receive native attachment parts. +- One real smoke per supported provider before release. + +## Release Checklist + +Before enabling broadly: + +- `pnpm typecheck --pretty false` passes. +- Focused attachment pure tests pass. +- Existing team launch tests still pass. +- Existing OpenCode delivery tests still pass. +- At least one real visual smoke is green for Claude, Codex, and OpenCode. +- No new dependency is added without version/license check. +- App logs do not contain raw base64 image data. +- Copy diagnostics redacts paths/secrets where required. +- User-facing warnings are clear and actionable. + +## Observability Additions + +Add lightweight diagnostics that help debug without leaking content. + +Safe diagnostic fields: + +```ts +interface AttachmentDiagnosticSummary { + attachmentCount: number; + totalBytes: number; + optimizedTotalBytes: number; + providers: string[]; + blockedReason?: AttachmentDeliveryFailureCode; + warningCodes: string[]; +} +``` + +Unsafe diagnostic fields: + +- Base64 content. +- Full OCR/text extracted from images. +- Provider API keys. +- User home directory paths unless already part of explicit local diagnostics. +- Full binary file hashes if privacy-sensitive. + +## Post-Release Monitoring + +Watch for these regressions after release: + +- Users report team going offline after sending images. +- Users report image sent but model says it cannot see image on a supposedly vision-capable model. +- Inbox/message JSON grows unexpectedly large. +- Renderer memory spikes after attaching multiple screenshots. +- Codex/OpenCode diagnostics become generic after attachment failures. +- Retry loops repeat the same image message without bounded attempts. + +If any happen: + +1. Disable only affected provider adapter path. +2. Keep normalization and UI warnings if stable. +3. Preserve messages and attachment metadata for forensic debugging. +4. Do not revert unrelated bootstrap/process backend changes in the same hotfix. + +## Final Safety Review Questions + +Before implementation starts, answer these in the PR description: + +- Does any provider receive base64 as plain text? If yes, why is that unavoidable? +- Can renderer force backend to read an arbitrary file path? It must not. +- What happens if optimized artifact is deleted before retry? +- What happens if user switches target model after attaching image? +- What happens if a provider supports text but not vision? +- Does this change launch readiness or only message delivery? +- Are all retries bounded by existing ledger/runtime rules? + + +## Deep Verification Plan + +### Deterministic fixture generation + +Keep fixture generation local and deterministic so E2E does not depend on external image files. + +```ts +export async function createRedSquareFixture(path: string): Promise { + // Use a tiny PNG fixture committed to test fixtures, or generate with deterministic bytes. +} +``` + +Preferred: + +- Commit a tiny PNG fixture under test fixtures if repository policy allows binary fixtures. +- Otherwise generate once in test setup using a deterministic encoder. + +### E2E assertion style + +Do not require exact model wording. Normalize response and assert semantic color. + +```ts +function answerMentionsRed(text: string): boolean { + return /\bred\b/i.test(text) || /красн/i.test(text); +} +``` + +Avoid accepting `I cannot view images` as pass. + +### E2E timeout strategy + +- Provider smoke tests should have generous but bounded timeout. +- Failure should print provider/model, delivery path, and redacted stderr tail. +- One provider failure should not hide other provider results. + +### Release-blocking vs non-blocking tests + +Release-blocking: + +- Pure unit tests. +- Serialization tests. +- One Claude smoke if Claude support is enabled. +- One Codex smoke if Codex support is enabled. +- One OpenCode OpenAI smoke if OpenCode image support is enabled for OpenAI. + +Non-blocking/manual before release: + +- Multiple OpenRouter model smokes. +- Very large image performance test. +- HEIC/clipboard platform-specific tests. + +### User documentation checklist + +Document these user-facing facts: + +- Screenshots are automatically optimized before sending. +- Some OpenCode models do not support image attachments. +- If a model cannot receive images, the UI will ask the user to switch model or remove images. +- Raw files are not pasted as base64 into messages. +- If delivery fails, the message is preserved and the error explains whether it was size, model support, auth, quota, or runtime. + +### Final PR template section + +Each implementation PR should include: + +```md +## Attachment safety checklist + +- [ ] Text-only messages unchanged. +- [ ] No base64 plain-text prompt fallback. +- [ ] Backend validates attachment ids and paths. +- [ ] Unsupported model behavior tested. +- [ ] Provider auth errors remain provider auth errors. +- [ ] Delivery failure does not mark teammate offline unless runtime exits. +- [ ] Copy diagnostics redacts secrets. +- [ ] E2E smoke listed or intentionally deferred. +``` + + +## Failure Injection E2E Scenarios + +These should be run after unit coverage is green. + +### Runtime survives attachment failure + +1. Start a team with one known working member. +2. Send unsupported oversized image. +3. Verify send is blocked or delivery fails without marking team/member offline. +4. Send text-only message afterward. +5. Verify member still responds. + +### Provider auth failure remains auth failure + +1. Temporarily use stale/invalid provider auth in test environment. +2. Send image message. +3. Verify diagnostics mention auth/login, not image unsupported. +4. Restore auth. + +### Non-vision model blocked + +1. Select known non-vision OpenCode model. +2. Attach image. +3. Verify UI blocks before delivery. +4. Remove image. +5. Verify text-only send works. + +### Artifact missing on retry + +1. Save message with image artifact. +2. Delete artifact file in test harness before retry. +3. Trigger retry. +4. Verify `attachment_artifact_missing` and no text-only fallback. + +## Release Candidate Checklist + +A release candidate can include attachment support only if: + +- All Phase 1 pure/domain tests pass. +- Provider enabled in UI has at least one real visual smoke test green. +- Unsupported model UX has been manually verified. +- Copy diagnostics includes attachment error code and provider/model, but no base64. +- Team launch tests are unchanged or green. +- Existing user message delivery tests are green. +- OpenCode proof repair tests are green. +- Codex auth/preflight tests are green. + +## Suggested rollout order in release notes + +1. “Image attachments are optimized before sending.” +2. “Claude and Codex vision-capable models receive screenshots through native image channels.” +3. “OpenCode image support depends on the selected model. The UI warns when a model is not image-capable.” +4. “Unsupported images are blocked with actionable diagnostics instead of being pasted as text.” + +## Post-Implementation Audit Checklist + +After all phases are implemented, search the codebase for these anti-patterns: + +```text +base64, +data:image, +--image, +-f, +attachment_provider_rejected, +attachment_model_unsupported, +launch-state, +spawnStatus, +bootstrapConfirmed +``` + +Audit goals: + +- `base64` appears only in provider-native block serialization or tests. +- `data:image` is not inserted into prompt text. +- `--image` appears only in Codex command builder and tests. +- `-f` image usage appears only in OpenCode adapter/command builder and tests. +- Attachment failures do not write launch-state/spawn status. +- Bootstrap code does not import attachment modules. + + +## Implementation Completion Checklist by Phase + +### Phase 1 completion evidence + +Attach to PR: + +- Unit test output for budget/validation/capability modules. +- Screenshot of composer warning for unsupported/oversized image. +- Screenshot of optimized image metadata display if UI exposes it. +- Confirmation that no provider delivery code changed. + +### Phase 2 completion evidence + +Attach to PR: + +- Claude serialization test output. +- Real Claude visual smoke result. +- Text-only Claude regression result. +- Diagnostic screenshot for oversized image block. + +### Phase 3 completion evidence + +Attach to PR: + +- Codex command builder test output. +- Real Codex visual smoke result using subscription mode. +- Confirmation Codex auth diagnostics unchanged. +- Confirmation no shell string command construction was added. + +### Phase 4 completion evidence + +Attach to PR: + +- OpenCode capability matrix tests. +- OpenCode OpenAI visual smoke result. +- OpenRouter Kimi/GLM visual smoke results if enabled. +- Unsupported GLM/non-vision model warning screenshot. +- Confirmation OpenCode proof repair lifecycle unchanged. + +### Phase 5 completion evidence + +Attach to PR: + +- Full focused test list. +- Manual smoke matrix with provider/model/date. +- Known limitations section. +- Release note draft. + +## Final bug-prevention checklist + +Before merging the final phase, manually inspect these concerns: + +- Does every provider adapter have tests for empty attachments and image attachments? +- Does every adapter reject unsupported MIME types? +- Does every adapter avoid changing text-only behavior? +- Does any code parse provider error text to decide model capability? It should not. +- Does any code write attachment failures into launch state? It must not. +- Does any code drop attachments silently? It must not. +- Does unsupported model UI block send or require explicit user action? It should. +- Does retry preserve attachment identity? It must. +- Does artifact missing produce a clear error? It must. + +## Honest risk estimate after this planning + +If implemented phase-by-phase exactly as planned: + +- Phase 1 risk: `2.5/10` +- Phase 2 risk: `3/10` +- Phase 3 risk: `4/10` +- Phase 4 risk: `5/10` +- Phase 5 risk: `2/10` +- Overall release risk: `3.5/10` + +If implemented as one broad refactor: + +- Overall release risk: `7/10` + +Main reason: the dangerous part is not image resizing itself. The dangerous part is accidentally coupling attachments to delivery proofs, runtime liveness, auth diagnostics, or provider-specific launch code. + + +## Phase 5 Deep Review Addendum + +### Cross-provider E2E script shape + +A single script can reduce manual drift, but it should not be required for normal unit tests. + +Conceptual CLI: + +```bash +pnpm tsx scripts/smoke-agent-attachments.ts \ + --provider claude \ + --model current \ + --image test/fixtures/red-square.png +``` + +Output should be machine-readable enough for logs: + +```json +{ + "provider": "codex", + "model": "gpt-5.4-mini", + "image": "red-square.png", + "result": "pass", + "answerPreview": "red", + "durationMs": 18432 +} +``` + +### E2E failure report template + +When an image smoke fails, report: + +```text +Provider: OpenCode +Model: openrouter/z-ai/glm-5.1 +Expected: red +Observed: I cannot view images +Classification: model_not_vision_capable +Action: keep blocked/unsupported in capability matrix +``` + +Do not report it as generic runtime failure. + +### Manual release drill + +Before release, do this exact drill: + +1. Start app fresh. +2. Create or open a simple team with Claude lead. +3. Send text-only message, confirm normal reply. +4. Send one screenshot to Claude, confirm visual answer. +5. Switch to Codex member, send same screenshot, confirm visual answer. +6. Switch to OpenCode vision member, send same screenshot, confirm visual answer. +7. Switch to OpenCode unsupported model, confirm UI blocks image. +8. Send text-only to unsupported model, confirm it still works. +9. Restart app, verify messages render and attachments do not disappear. +10. Copy diagnostics from failed unsupported send, verify no base64/secrets. + +### Final release confidence rating target + +Do not ship broad attachment support unless confidence reaches: + +- 🎯 at least `8.5/10` for Claude. +- 🎯 at least `8/10` for Codex. +- 🎯 at least `7.5/10` for OpenCode vision-capable known models. +- 🛡️ at least `9/10` that unsupported models fail safely. +- 🛡️ at least `9/10` that launch/readiness is unaffected. + + +## Phase 5 Implementation Contract Addendum + +### Test command matrix + +Focused checks after each phase should be small and targeted. + +Phase 1: + +```bash +pnpm vitest run test/features/agent-attachments/budgets.test.ts test/features/agent-attachments/validation.test.ts +pnpm typecheck --pretty false +``` + +Phase 2: + +```bash +pnpm vitest run test/features/agent-attachments/claudeAttachmentAdapter.test.ts +pnpm vitest run test/main/services/team/TeamProvisioningServiceRelay.test.ts +``` + +Phase 3: + +```bash +pnpm vitest run test/features/agent-attachments/codexAttachmentAdapter.test.ts +# plus orchestrator codex command builder test in agent_teams_orchestrator +``` + +Phase 4: + +```bash +pnpm vitest run test/features/agent-attachments/opencodeAttachmentAdapter.test.ts +pnpm vitest run test/main/services/team/OpenCodePromptDeliveryLedger.test.ts test/main/services/team/OpenCodePromptDeliveryWatchdog.test.ts +``` + +Phase 5: + +```bash +pnpm typecheck --pretty false +pnpm vitest run test/features/agent-attachments/**/*.test.ts +``` + +Exact test paths may change during implementation, but the test categories should remain. + +### Smoke result ledger + +Create a simple markdown table in PR or docs after live smokes: + +| Date | Provider | Model | Runtime path | Result | Notes | +|---|---|---|---|---|---| +| 2026-05-09 | Claude | current | app team delivery | pass | red-square -> red | +| 2026-05-09 | Codex | gpt-5.4-mini | app team delivery | pass | red-square -> red | +| 2026-05-09 | OpenCode | openai/gpt-5.4-mini | app team delivery | pass | red-square -> red | + +This prevents accidental reliance on standalone prototype results when app path differs. + +### Final implementation order reminder + +Do not start with provider adapters. Start with Phase 1 domain and storage because provider adapters need a stable input contract. + +Correct order: + +1. Normalize and persist artifacts safely. +2. Block unsupported/oversized sends. +3. Add Claude adapter. +4. Add Codex adapter. +5. Add OpenCode adapter. +6. Add E2E and docs polish. + +Wrong order: + +1. Hack provider CLI args. +2. Later figure out storage. +3. Later figure out UI warnings. +4. Later discover retries lost files. + + +## Implementation Readiness Addendum + +### Definition of Ready for Phase 5 + +Before Phase 5: + +- All provider adapters have unit tests. +- At least one provider image path works in app-managed runtime. +- Unsupported model UX exists. +- Diagnostics taxonomy is implemented. + +### Fixture strategy + +Prefer one committed tiny PNG fixture plus optional generated variants. + +Fixtures: + +```text +test/fixtures/attachments/red-square.png +test/fixtures/attachments/red-square-large.png +test/fixtures/attachments/corrupt.png +test/fixtures/attachments/not-image.txt +``` + +If binary fixtures are not desired, generate in test setup, but keep generation deterministic. + +### Smoke classification helper + +```ts +export function classifyVisualSmokeAnswer(answer: string): 'pass' | 'refusal' | 'wrong' | 'empty' { + const normalized = answer.trim().toLowerCase(); + if (!normalized) return 'empty'; + if (/cannot|can't|unable|view|image/.test(normalized) && !/red/.test(normalized)) return 'refusal'; + if (/\bred\b|красн/.test(normalized)) return 'pass'; + return 'wrong'; +} +``` + +Do not overfit this helper to one provider. It is only for smoke classification. + +### Release notes limitations section + +Document limitations honestly: + +```md +Known limitations: + +- Image support depends on selected provider and model. +- Some OpenCode/OpenRouter models are intentionally blocked until verified for vision. +- Very large images are optimized and may lose some detail. +- Non-image files continue using existing file behavior and are not part of the new image pipeline. +``` + +### Final post-merge watchlist + +For the first release after merge, watch for: + +- Increased renderer memory usage after attaching screenshots. +- Any report of team launch failures after sending images. +- Any report that images appear in message text as base64. +- Any OpenCode model that should be marked unsupported. +- Any Codex auth diagnostic regression. +- Any failed retry due to missing artifacts. + + +## Final Phase 5 Acceptance Specs + +### Spec 1 - cross-provider visual smoke matrix + +```gherkin +Given the red-square fixture +When visual smoke runs for each enabled provider path +Then every supported provider/model answers red +And every unsupported model is blocked or classified unsupported +And no failure is reported as generic spawn failure +``` + +### Spec 2 - diagnostics are safe + +```gherkin +Given an attachment delivery fails +When the user copies diagnostics +Then diagnostics include provider, model, failure code, and redacted reason +And diagnostics do not include base64 image bytes +And diagnostics do not include API keys or bearer tokens +``` + +### Spec 3 - launch unaffected + +```gherkin +Given a team launches without attachments +When launch completes +Then launch behavior matches pre-attachment behavior +And no attachment module participates in bootstrap readiness +``` + +### Phase 5 exact PR contract + +The Phase 5 PR is acceptable only if: + +- Smoke results are recorded with date/provider/model/runtime path. +- Unsupported model behavior is manually verified. +- Existing launch and delivery focused tests are green. +- Documentation explains limitations honestly. +- Release notes avoid promising universal vision support. +- Post-merge watchlist is included in release checklist. + +### Final “do not ship if” list + +Do not ship if any of these are true: + +- Text-only messages regress for any provider. +- Any provider receives base64 as plain prompt text. +- Unsupported OpenCode models silently accept image messages. +- Attachment failure marks teammate spawn failed without runtime exit evidence. +- Codex subscription auth diagnostics regress. +- Image retry can silently send text without the original image. +- Copy diagnostics leaks base64 or secrets. + + +## Phase 5 Pre-Mortem and Extra Safeguards + +### Likely verification mistakes + +| Mistake | Prevention | +|---|---| +| Standalone provider smoke passes but app path fails | Smoke through app-managed team delivery. | +| E2E accepts refusal as pass | Use classifier that rejects “cannot view image”. | +| Only happy path tested | Include unsupported model, artifact missing, and auth failure. | +| Logs leak base64 | Copy diagnostics test searches for base64 markers. | +| Release notes overpromise | Explicit known limitations section. | + +### Copy diagnostics acceptance test + +```ts +it('copy diagnostics omits image bytes and secrets', () => { + const text = buildAttachmentFailureDiagnostics(fakeImageFailure()); + expect(text).toContain('attachment_model_unsupported'); + expect(text).not.toContain('data:image'); + expect(text).not.toMatch(/[A-Za-z0-9+/]{200,}={0,2}/); + expect(text).not.toMatch(/sk-[a-zA-Z0-9_-]+/); +}); +``` + +### App-path smoke script must prove route + +Smoke result should include route: + +```json +{ + "route": "user->lead", + "teamName": "attachment-smoke-claude", + "provider": "anthropic", + "model": "current", + "attachmentTransport": "claude_structured_blocks", + "result": "pass" +} +``` + +If route is standalone CLI only, mark as prototype evidence, not release evidence. + +### Release manager checklist + +- Read all `Do not ship if` items. +- Confirm no broad feature flag remains. +- Confirm capability gates are user-visible product behavior. +- Confirm no launch/provisioning files import attachment modules. +- Confirm provider smoke evidence is app-path evidence. +- Confirm rollback by provider is possible without data migration. +