37 KiB
Phase 5 - Cross-runtime attachment E2E, diagnostics, docs, and polish
⚠️ Историческая документация: Этот файл содержит дизайн-документацию. Фактическая реализация в src/features/agent-attachments/ может отличаться от описанной архитектуры.
Summary
Goal: make the completed attachment system observable, testable, and understandable for users before release.
Chosen approach: small live smoke harness + deterministic diagnostics + UI copy polish + documentation, with no new runtime semantics.
🎯 8.8 🛡️ 8.7 🧠 5.4
Estimated change size: 180-320 LOC plus tests/docs.
This phase should happen after Claude, Codex, and OpenCode adapters are implemented. It should not introduce new delivery behavior.
Deliverables
- live attachment smoke script;
- reusable test fixture image generator;
- user-visible diagnostics for unsupported models and oversized images;
- docs for supported runtimes/models;
- release checklist.
Live smoke harness
Create a script that generates a deterministic image and runs each supported runtime.
Suggested location:
scripts/smoke/agent-attachments-smoke.mjs
Sketch:
const cases = [
{
id: 'claude-subscription-streaming',
runtime: 'claude',
model: 'claude-haiku-4-5',
expected: /red/i,
},
{
id: 'codex-native-gpt-5-4-mini',
runtime: 'codex',
model: 'gpt-5.4-mini',
expected: /red/i,
},
{
id: 'opencode-openai-gpt-5-4-mini',
runtime: 'opencode',
model: 'openai/gpt-5.4-mini',
expected: /red/i,
},
{
id: 'opencode-openrouter-kimi-k2-6',
runtime: 'opencode',
model: 'openrouter/moonshotai/kimi-k2.6',
envRequired: ['OPENROUTER_API_KEY'],
expected: /red/i,
},
{
id: 'opencode-openrouter-glm-4-5v',
runtime: 'opencode',
model: 'openrouter/z-ai/glm-4.5v',
envRequired: ['OPENROUTER_API_KEY'],
expected: /red/i,
},
{
id: 'opencode-openrouter-glm-5-1-negative',
runtime: 'opencode',
model: 'openrouter/z-ai/glm-5.1',
envRequired: ['OPENROUTER_API_KEY'],
expectedUnsupported: true,
},
];
The harness must:
- redact keys;
- use timeouts;
- kill child processes on timeout;
- write structured JSON result;
- skip cases when required auth/env is missing;
- never print base64 image content.
Deterministic fixture image
Do not depend on external image files.
Generate a small valid PNG with Node zlib and CRC32, like the prototype did.
export function writeRedCardPng(path: string): void {
// 320x240 red card with white center marker.
}
This avoids flaky fixtures and keeps smoke tests self-contained.
Diagnostics UX
Add compact diagnostics wherever attachments are shown or rejected.
Examples:
Sent 1 optimized image: screenshot.jpg, 1920x1080, 612 KB.
Images are not supported by openrouter/z-ai/glm-5.1. Choose GLM 4.5V, Kimi K2.6, GPT-5.4-mini, Claude, or Codex.
Attachment payload is too large after optimization: 8.4 MB serialized. Limit is 7.5 MB.
OpenRouter is not connected in OpenCode. Connect OpenRouter before using this model.
Copy diagnostics
When user copies diagnostics for a failed send, include:
Attachment summary:
- files: 2
- optimized bytes: 1.2 MB
- estimated serialized payload: 1.7 MB
- target runtime: opencode
- target model: openrouter/z-ai/glm-5.1
- capability decision: unsupported image input
Do not include:
- base64;
- full API keys;
- bearer tokens;
- raw data URLs.
Documentation
Add docs under:
docs/team-management/agent-attachments.md
Contents:
- supported runtimes;
- supported model examples;
- unsupported model examples;
- why images may be resized;
- why some models cannot receive screenshots;
- troubleshooting auth/provider issues;
- how to run smoke tests.
Release checklist
Before release:
- text-only messages still work for Claude/Codex/OpenCode;
- oversized image blocked before send;
- Claude image send works;
- Codex image send works;
- OpenCode OpenAI image send works;
- OpenCode OpenRouter Kimi works if key configured;
- OpenCode GLM 5.1 image is blocked or clearly marked unsupported;
- no base64 appears in logs, copied diagnostics, or UI error text;
- retry with attachments reuses artifacts or fails loudly;
- removing attachments clears warnings;
- unsupported model warning updates when model changes.
E2E scenarios
Scenario 1 - Claude lead screenshot
Create/launch Claude team -> send screenshot to lead -> lead answers about image.
Expected:
- no process crash;
- message visible;
- optimized attachment notice visible;
- lead response received.
Scenario 2 - Codex lead screenshot
Create/launch Codex team -> send screenshot -> Codex sees image via --image.
Expected:
- artifact file created;
- Codex args include
--image; - no base64 in prompt text;
- response received.
Scenario 3 - OpenCode supported model
OpenCode Kimi K2.6 secondary -> direct user message with screenshot.
Expected:
- file part delivered;
- delivery proof still required;
- response visible.
Scenario 4 - OpenCode unsupported model
OpenCode GLM 5.1 secondary -> attempt screenshot send.
Expected:
- send blocked before model call;
- message explains model does not support image input;
- no fake queued/pending delivery;
- text-only send still works.
Scenario 5 - Oversized multi-image send
Attach 5 large screenshots.
Expected:
- optimizer reduces where safe;
- if still too large, send blocked;
- no partial delivery.
Test plan
Suggested focused checks:
pnpm vitest run src/features/agent-attachments/**/*.test.ts test/main/ipc/teams.test.ts test/renderer/components/team/messages/MessageComposer.test.tsx
pnpm vitest run test/main/services/team/TeamProvisioningService.test.ts test/main/services/team/OpenCodePromptDeliveryLedger.test.ts
pnpm typecheck --pretty false
Live smoke only when requested:
node scripts/smoke/agent-attachments-smoke.mjs --case claude-subscription-streaming
node scripts/smoke/agent-attachments-smoke.mjs --case codex-native-gpt-5-4-mini
OPENROUTER_API_KEY=... node scripts/smoke/agent-attachments-smoke.mjs --case opencode-openrouter-kimi-k2-6
Safety checklist
- Smoke harness redacts secrets.
- Live tests have timeouts and cleanup.
- Docs clearly separate transport support from model vision support.
- No new runtime behavior is introduced in this phase.
Deep implementation details
Live smoke output contract
The smoke script should write machine-readable JSON and concise console output.
export interface AttachmentSmokeResult {
id: string;
runtime: 'claude' | 'codex' | 'opencode';
model: string;
status: 'passed' | 'failed' | 'skipped';
reason?: string;
responseText?: string;
durationMs: number;
diagnostics: string[];
}
Console output example:
PASS claude-subscription-streaming -> red
PASS codex-native-gpt-5-4-mini -> red
SKIP opencode-openrouter-kimi-k2-6 -> OPENROUTER_API_KEY not set
FAIL opencode-openrouter-glm-5-1-negative -> expected unsupported but got red
Never print secrets.
Timeout wrapper
async function runWithTimeout<T>(label: string, timeoutMs: number, run: (signal: AbortSignal) => Promise<T>): Promise<T> {
const controller = new AbortController();
const timer = setTimeout(() => controller.abort(new Error(`${label} timed out`)), timeoutMs);
try {
return await run(controller.signal);
} finally {
clearTimeout(timer);
}
}
For child processes, abort must kill process group when possible.
Redaction helper
export function redactAttachmentSmokeLog(input: string): string {
return input
.replace(/sk-or-v1-[A-Za-z0-9_-]+/g, 'sk-or-v1-[REDACTED]')
.replace(/Bearer\s+[A-Za-z0-9._-]+/gi, 'Bearer [REDACTED]')
.replace(/data:image\/[a-z0-9.+-]+;base64,[A-Za-z0-9+/=]+/gi, 'data:image/[REDACTED];base64,[REDACTED]');
}
Docs structure
docs/team-management/agent-attachments.md should include:
# Agent attachments
## Supported runtimes
## Supported image models
## Unsupported or unverified models
## Why screenshots are optimized
## Troubleshooting
## Running smoke tests
## Security and privacy notes
UI polish details
Attachment preview should show:
screenshot.jpg
1920x1080 - 612 KB - optimized
Unsupported model warning should include direct action:
Change model
Remove image
Do not show internal provider ids only. Use friendly label when available:
GLM 5.1 via OpenRouter
But copied diagnostics should include exact model id:
modelId=openrouter/z-ai/glm-5.1
More e2e cases
| Scenario | Expected |
|---|---|
| Text-only message after failed image send | succeeds normally |
| User removes unsupported image and sends text | no stale warning blocks send |
| User switches from GLM 5.1 to GLM 4.5V | warning clears and send allowed |
| User switches from OpenCode to Claude | OpenCode model warning disappears, Claude budget warning remains if oversized |
| OpenRouter key missing | OpenRouter smoke skipped, not failed |
| OpenRouter quota exhausted | smoke failed with provider quota diagnostic, no secret printed |
| Codex auth expired | Codex smoke failed with auth diagnostic, attachment system not blamed |
| Claude subscription over limit | Claude smoke failed with provider limit diagnostic, attachment system not blamed |
Release readiness scoring
Before shipping, score each area:
| Area | Target score |
|---|---|
| Text-only regression confidence | 9/10 |
| Oversized image protection | 9/10 |
| Claude image path | 8.5/10 |
| Codex image path | 8/10 |
| OpenCode OpenAI image path | 8/10 |
| OpenCode OpenRouter model gating | 7.5/10 |
| User-facing errors | 8.5/10 |
If any score is below target, do not release the whole attachment feature. Ship only earlier phases.
Regression traps
- Smoke tests accidentally depend on local user secrets and fail in CI.
- UI says “image sent” when only optimization happened.
- Diagnostics copy includes data URL.
- Docs overpromise unknown OpenRouter models.
- Negative model smoke becomes flaky because provider upgrades model capability. If GLM 5.1 starts supporting images, update catalog and test expectation.
File-by-file implementation plan
Smoke script
Create:
scripts/smoke/agent-attachments-smoke.mjs
Optional helper:
scripts/smoke/lib/write-red-card-png.mjs
scripts/smoke/lib/redact-smoke-log.mjs
Do not put live smoke in normal test suite by default.
Documentation
Create:
docs/team-management/agent-attachments.md
Link it from:
docs/team-management/debugging-agent-teams.md
only if it helps support/debugging.
UI polish tests
Potential tests:
test/renderer/components/team/messages/MessageComposer.test.tsx
test/renderer/utils/attachmentUtils.test.ts
src/features/agent-attachments/**/*.test.ts
Smoke script behavior details
CLI options
node scripts/smoke/agent-attachments-smoke.mjs --all
node scripts/smoke/agent-attachments-smoke.mjs --case codex-native-gpt-5-4-mini
node scripts/smoke/agent-attachments-smoke.mjs --json /tmp/attachment-smoke.json
Skip logic
if (case.envRequired?.some(name => !process.env[name])) {
return { status: 'skipped', reason: `${name} not set` };
}
Missing auth should be failed if the runtime is expected to be locally logged in, but OpenRouter env cases can be skipped if key absent.
Child process cleanup
const child = spawn(command, args, { detached: true });
try {
return await waitForResult(child, timeoutMs);
} finally {
if (!child.killed) {
try { process.kill(-child.pid!, 'SIGTERM'); } catch {}
}
}
Be careful on macOS where process groups may differ. If not detached, kill child pid only.
Docs examples
Supported model section
## Verified image-capable models
- Claude subscription via stream-json
- Codex native GPT-5.4-mini via `--image`
- OpenCode OpenAI GPT-5.4-mini
- OpenCode OpenRouter Kimi K2.6
- OpenCode OpenRouter GLM 4.5V
Unsupported model section
## Known unsupported or text-only models
- OpenCode OpenRouter GLM 5.1: accepts text but does not support image input in live smoke.
Troubleshooting section
If OpenCode says `Provider not found: openrouter`, connect OpenRouter in provider management or provide `OPENROUTER_API_KEY` for smoke tests.
More polish edge cases
| Edge case | UI/docs behavior |
|---|---|
| User sees “not verified” for a model they know supports vision | docs explain conservative default and how to request/verify model |
| Live smoke passes for a previously unknown model | update capability catalog in separate commit |
| Provider changes model behavior | negative smoke catches mismatch, catalog updated deliberately |
| User reports model saw image but UI blocked | add override only after reproducing or provider metadata confirms |
| User reports image too blurry | adjust Phase 1 quality policy, not provider adapters |
| User reports process crashed with image | diagnostics should include payload bytes and runtime stderr tail, not base64 |
Final release decision tree
If Phase 1 is green but Phase 2 is risky -> ship safer budget validation only.
If Claude is green but Codex is flaky -> ship Claude only, keep Codex blocked.
If Codex is green but OpenCode model gate is incomplete -> ship Claude+Codex, keep OpenCode blocked.
If OpenCode OpenAI is green but OpenRouter is unstable -> allow OpenAI, block OpenRouter unknowns.
Do not hold safer early phases hostage to later dynamic OpenRouter model risk.
Phase 5 exit criteria
Phase 5 is complete only when:
- smoke harness can run selected cases independently;
- smoke harness redacts secrets and data URLs;
- docs list verified and unsupported models separately;
- UI copy does not overpromise unknown models;
- copied diagnostics include enough metadata to debug without leaking payload;
- release checklist is green or explicitly scoped down.
Smoke harness case definitions
const cases: AttachmentSmokeCase[] = [
{
id: 'claude-streaming-haiku',
runtime: 'claude',
command: 'node',
args: ['scripts/smoke/runners/claude-sdk-image.mjs'],
expected: /\bred\b/i,
timeoutMs: 60_000,
},
{
id: 'codex-native-gpt-5-4-mini',
runtime: 'codex',
command: 'codex',
args: ['exec', '--json', '--skip-git-repo-check', '-C', '/tmp', '--model', 'gpt-5.4-mini', '--image', '$IMAGE', '-'],
stdin: 'Look at the attached image. Reply with exactly one word: red, green, or blue.',
expected: /\bred\b/i,
timeoutMs: 90_000,
},
{
id: 'opencode-openrouter-glm-5-1-negative',
runtime: 'opencode',
envRequired: ['OPENROUTER_API_KEY'],
expectCapabilityBlocked: true,
},
];
For negative cases after Phase 4, prefer testing the app capability gate rather than spending OpenRouter tokens calling known unsupported models.
Diagnostics copy example
Attachment delivery diagnostic
team: atlas-hq
recipient: jack
runtime: opencode
model: openrouter/z-ai/glm-5.1
attachments: 1 image
optimized bytes: 612 KB
estimated serialized bytes: 842 KB
capability: unsupported
reason: GLM 5.1 is text-only for image input in verified OpenCode/OpenRouter smoke.
No base64, no data URL, no API key.
Documentation warnings
Docs must say:
Verified model support can change. If a model starts or stops accepting images, update the capability catalog and smoke expectations in a separate commit.
Docs must not say:
All OpenRouter models support screenshots.
Final pre-release manual checklist
- Send text-only message to Claude lead.
- Send optimized image to Claude lead.
- Send text-only message to Codex lead.
- Send image to Codex lead.
- Send text-only direct message to OpenCode member.
- Send image to OpenCode OpenAI member.
- Send image to OpenCode Kimi K2.6 member if OpenRouter configured.
- Attempt image to OpenCode GLM 5.1 and confirm it blocks before send.
- Attempt oversized image and confirm it blocks before send.
- Copy diagnostics and confirm no data URL/base64/key.
Phase 5 bug traps
| Trap | Prevention |
|---|---|
| live smoke consumes tokens in normal CI | not part of default test command |
| smoke fails due missing auth and blocks release | missing optional env is skipped, not failed |
| docs become stale | capability catalog references live smoke date |
| unsupported negative model changes behavior | update catalog/test explicitly |
| copied diagnostics leak image data | redaction unit tests |
Detailed E2E Matrix
Use one tiny deterministic image fixture: a red square with no metadata dependencies.
| Provider path | Model | Prompt | Expected |
|---|---|---|---|
| Claude subscription | current configured Claude model | What color is the square? One word. |
red |
| Codex subscription | gpt-5.4-mini or available vision model |
same | red |
| OpenCode OpenAI | openai/gpt-5.4-mini |
same | red |
| OpenCode OpenRouter | openrouter/moonshotai/kimi-k2.6 |
same | red |
| OpenCode OpenRouter | openrouter/z-ai/glm-4.5v |
same | red |
| OpenCode OpenRouter | openrouter/z-ai/glm-5.1 |
same | blocked/unsupported |
Negative E2E Matrix
| Scenario | Expected |
|---|---|
| Oversized image | UI blocks before send with size/optimization details. |
| Corrupt image | UI blocks with decode error. |
| Missing optimized artifact | Delivery fails actionable, message preserved. |
| Provider quota exhausted | Provider diagnostic shown, no generic spawn failed. |
| Codex logged out | Existing Codex login diagnostic shown. |
| OpenCode model no vision | Unsupported warning/block shown. |
| Multiple images over total budget | Deterministic budget warning. |
| Runtime exits after send | Runtime exit diagnostic separate from attachment validation. |
Automated Test Layers
Pure tests
- Budget selection.
- Capability resolution.
- MIME/type classification.
- Provider adapter serialization.
- Redaction.
Main process tests
- IPC validation rejects invalid attachment ids.
- Backend never accepts raw renderer paths.
- Missing artifacts produce controlled failure.
- Delivery failure does not mutate launch state.
Renderer tests
- Preview and optimization warnings.
- Target switching recomputes capability warning.
- Send button blocks unsupported attachments.
- Removing attachments clears warnings.
Safe E2E tests
- Provider command builders receive native attachment parts.
- One real smoke per supported provider before release.
Release Checklist
Before enabling broadly:
pnpm typecheck --pretty falsepasses.- Focused attachment pure tests pass.
- Existing team launch tests still pass.
- Existing OpenCode delivery tests still pass.
- At least one real visual smoke is green for Claude, Codex, and OpenCode.
- No new dependency is added without version/license check.
- App logs do not contain raw base64 image data.
- Copy diagnostics redacts paths/secrets where required.
- User-facing warnings are clear and actionable.
Observability Additions
Add lightweight diagnostics that help debug without leaking content.
Safe diagnostic fields:
interface AttachmentDiagnosticSummary {
attachmentCount: number;
totalBytes: number;
optimizedTotalBytes: number;
providers: string[];
blockedReason?: AttachmentDeliveryFailureCode;
warningCodes: string[];
}
Unsafe diagnostic fields:
- Base64 content.
- Full OCR/text extracted from images.
- Provider API keys.
- User home directory paths unless already part of explicit local diagnostics.
- Full binary file hashes if privacy-sensitive.
Post-Release Monitoring
Watch for these regressions after release:
- Users report team going offline after sending images.
- Users report image sent but model says it cannot see image on a supposedly vision-capable model.
- Inbox/message JSON grows unexpectedly large.
- Renderer memory spikes after attaching multiple screenshots.
- Codex/OpenCode diagnostics become generic after attachment failures.
- Retry loops repeat the same image message without bounded attempts.
If any happen:
- Disable only affected provider adapter path.
- Keep normalization and UI warnings if stable.
- Preserve messages and attachment metadata for forensic debugging.
- Do not revert unrelated bootstrap/process backend changes in the same hotfix.
Final Safety Review Questions
Before implementation starts, answer these in the PR description:
- Does any provider receive base64 as plain text? If yes, why is that unavoidable?
- Can renderer force backend to read an arbitrary file path? It must not.
- What happens if optimized artifact is deleted before retry?
- What happens if user switches target model after attaching image?
- What happens if a provider supports text but not vision?
- Does this change launch readiness or only message delivery?
- Are all retries bounded by existing ledger/runtime rules?
Deep Verification Plan
Deterministic fixture generation
Keep fixture generation local and deterministic so E2E does not depend on external image files.
export async function createRedSquareFixture(path: string): Promise<void> {
// Use a tiny PNG fixture committed to test fixtures, or generate with deterministic bytes.
}
Preferred:
- Commit a tiny PNG fixture under test fixtures if repository policy allows binary fixtures.
- Otherwise generate once in test setup using a deterministic encoder.
E2E assertion style
Do not require exact model wording. Normalize response and assert semantic color.
function answerMentionsRed(text: string): boolean {
return /\bred\b/i.test(text) || /красн/i.test(text);
}
Avoid accepting I cannot view images as pass.
E2E timeout strategy
- Provider smoke tests should have generous but bounded timeout.
- Failure should print provider/model, delivery path, and redacted stderr tail.
- One provider failure should not hide other provider results.
Release-blocking vs non-blocking tests
Release-blocking:
- Pure unit tests.
- Serialization tests.
- One Claude smoke if Claude support is enabled.
- One Codex smoke if Codex support is enabled.
- One OpenCode OpenAI smoke if OpenCode image support is enabled for OpenAI.
Non-blocking/manual before release:
- Multiple OpenRouter model smokes.
- Very large image performance test.
- HEIC/clipboard platform-specific tests.
User documentation checklist
Document these user-facing facts:
- Screenshots are automatically optimized before sending.
- Some OpenCode models do not support image attachments.
- If a model cannot receive images, the UI will ask the user to switch model or remove images.
- Raw files are not pasted as base64 into messages.
- If delivery fails, the message is preserved and the error explains whether it was size, model support, auth, quota, or runtime.
Final PR template section
Each implementation PR should include:
## Attachment safety checklist
- [ ] Text-only messages unchanged.
- [ ] No base64 plain-text prompt fallback.
- [ ] Backend validates attachment ids and paths.
- [ ] Unsupported model behavior tested.
- [ ] Provider auth errors remain provider auth errors.
- [ ] Delivery failure does not mark teammate offline unless runtime exits.
- [ ] Copy diagnostics redacts secrets.
- [ ] E2E smoke listed or intentionally deferred.
Failure Injection E2E Scenarios
These should be run after unit coverage is green.
Runtime survives attachment failure
- Start a team with one known working member.
- Send unsupported oversized image.
- Verify send is blocked or delivery fails without marking team/member offline.
- Send text-only message afterward.
- Verify member still responds.
Provider auth failure remains auth failure
- Temporarily use stale/invalid provider auth in test environment.
- Send image message.
- Verify diagnostics mention auth/login, not image unsupported.
- Restore auth.
Non-vision model blocked
- Select known non-vision OpenCode model.
- Attach image.
- Verify UI blocks before delivery.
- Remove image.
- Verify text-only send works.
Artifact missing on retry
- Save message with image artifact.
- Delete artifact file in test harness before retry.
- Trigger retry.
- Verify
attachment_artifact_missingand no text-only fallback.
Release Candidate Checklist
A release candidate can include attachment support only if:
- All Phase 1 pure/domain tests pass.
- Provider enabled in UI has at least one real visual smoke test green.
- Unsupported model UX has been manually verified.
- Copy diagnostics includes attachment error code and provider/model, but no base64.
- Team launch tests are unchanged or green.
- Existing user message delivery tests are green.
- OpenCode proof repair tests are green.
- Codex auth/preflight tests are green.
Suggested rollout order in release notes
- “Image attachments are optimized before sending.”
- “Claude and Codex vision-capable models receive screenshots through native image channels.”
- “OpenCode image support depends on the selected model. The UI warns when a model is not image-capable.”
- “Unsupported images are blocked with actionable diagnostics instead of being pasted as text.”
Post-Implementation Audit Checklist
After all phases are implemented, search the codebase for these anti-patterns:
base64,
data:image,
--image,
-f,
attachment_provider_rejected,
attachment_model_unsupported,
launch-state,
spawnStatus,
bootstrapConfirmed
Audit goals:
base64appears only in provider-native block serialization or tests.data:imageis not inserted into prompt text.--imageappears only in Codex command builder and tests.-fimage usage appears only in OpenCode adapter/command builder and tests.- Attachment failures do not write launch-state/spawn status.
- Bootstrap code does not import attachment modules.
Implementation Completion Checklist by Phase
Phase 1 completion evidence
Attach to PR:
- Unit test output for budget/validation/capability modules.
- Screenshot of composer warning for unsupported/oversized image.
- Screenshot of optimized image metadata display if UI exposes it.
- Confirmation that no provider delivery code changed.
Phase 2 completion evidence
Attach to PR:
- Claude serialization test output.
- Real Claude visual smoke result.
- Text-only Claude regression result.
- Diagnostic screenshot for oversized image block.
Phase 3 completion evidence
Attach to PR:
- Codex command builder test output.
- Real Codex visual smoke result using subscription mode.
- Confirmation Codex auth diagnostics unchanged.
- Confirmation no shell string command construction was added.
Phase 4 completion evidence
Attach to PR:
- OpenCode capability matrix tests.
- OpenCode OpenAI visual smoke result.
- OpenRouter Kimi/GLM visual smoke results if enabled.
- Unsupported GLM/non-vision model warning screenshot.
- Confirmation OpenCode proof repair lifecycle unchanged.
Phase 5 completion evidence
Attach to PR:
- Full focused test list.
- Manual smoke matrix with provider/model/date.
- Known limitations section.
- Release note draft.
Final bug-prevention checklist
Before merging the final phase, manually inspect these concerns:
- Does every provider adapter have tests for empty attachments and image attachments?
- Does every adapter reject unsupported MIME types?
- Does every adapter avoid changing text-only behavior?
- Does any code parse provider error text to decide model capability? It should not.
- Does any code write attachment failures into launch state? It must not.
- Does any code drop attachments silently? It must not.
- Does unsupported model UI block send or require explicit user action? It should.
- Does retry preserve attachment identity? It must.
- Does artifact missing produce a clear error? It must.
Honest risk estimate after this planning
If implemented phase-by-phase exactly as planned:
- Phase 1 risk:
2.5/10 - Phase 2 risk:
3/10 - Phase 3 risk:
4/10 - Phase 4 risk:
5/10 - Phase 5 risk:
2/10 - Overall release risk:
3.5/10
If implemented as one broad refactor:
- Overall release risk:
7/10
Main reason: the dangerous part is not image resizing itself. The dangerous part is accidentally coupling attachments to delivery proofs, runtime liveness, auth diagnostics, or provider-specific launch code.
Phase 5 Deep Review Addendum
Cross-provider E2E script shape
A single script can reduce manual drift, but it should not be required for normal unit tests.
Conceptual CLI:
pnpm tsx scripts/smoke-agent-attachments.ts \
--provider claude \
--model current \
--image test/fixtures/red-square.png
Output should be machine-readable enough for logs:
{
"provider": "codex",
"model": "gpt-5.4-mini",
"image": "red-square.png",
"result": "pass",
"answerPreview": "red",
"durationMs": 18432
}
E2E failure report template
When an image smoke fails, report:
Provider: OpenCode
Model: openrouter/z-ai/glm-5.1
Expected: red
Observed: I cannot view images
Classification: model_not_vision_capable
Action: keep blocked/unsupported in capability matrix
Do not report it as generic runtime failure.
Manual release drill
Before release, do this exact drill:
- Start app fresh.
- Create or open a simple team with Claude lead.
- Send text-only message, confirm normal reply.
- Send one screenshot to Claude, confirm visual answer.
- Switch to Codex member, send same screenshot, confirm visual answer.
- Switch to OpenCode vision member, send same screenshot, confirm visual answer.
- Switch to OpenCode unsupported model, confirm UI blocks image.
- Send text-only to unsupported model, confirm it still works.
- Restart app, verify messages render and attachments do not disappear.
- Copy diagnostics from failed unsupported send, verify no base64/secrets.
Final release confidence rating target
Do not ship broad attachment support unless confidence reaches:
- 🎯 at least
8.5/10for Claude. - 🎯 at least
8/10for Codex. - 🎯 at least
7.5/10for OpenCode vision-capable known models. - 🛡️ at least
9/10that unsupported models fail safely. - 🛡️ at least
9/10that launch/readiness is unaffected.
Phase 5 Implementation Contract Addendum
Test command matrix
Focused checks after each phase should be small and targeted.
Phase 1:
pnpm vitest run test/features/agent-attachments/budgets.test.ts test/features/agent-attachments/validation.test.ts
pnpm typecheck --pretty false
Phase 2:
pnpm vitest run test/features/agent-attachments/claudeAttachmentAdapter.test.ts
pnpm vitest run test/main/services/team/TeamProvisioningServiceRelay.test.ts
Phase 3:
pnpm vitest run test/features/agent-attachments/codexAttachmentAdapter.test.ts
# plus orchestrator codex command builder test in agent_teams_orchestrator
Phase 4:
pnpm vitest run test/features/agent-attachments/opencodeAttachmentAdapter.test.ts
pnpm vitest run test/main/services/team/OpenCodePromptDeliveryLedger.test.ts test/main/services/team/OpenCodePromptDeliveryWatchdog.test.ts
Phase 5:
pnpm typecheck --pretty false
pnpm vitest run test/features/agent-attachments/**/*.test.ts
Exact test paths may change during implementation, but the test categories should remain.
Smoke result ledger
Create a simple markdown table in PR or docs after live smokes:
| Date | Provider | Model | Runtime path | Result | Notes |
|---|---|---|---|---|---|
| 2026-05-09 | Claude | current | app team delivery | pass | red-square -> red |
| 2026-05-09 | Codex | gpt-5.4-mini | app team delivery | pass | red-square -> red |
| 2026-05-09 | OpenCode | openai/gpt-5.4-mini | app team delivery | pass | red-square -> red |
This prevents accidental reliance on standalone prototype results when app path differs.
Final implementation order reminder
Do not start with provider adapters. Start with Phase 1 domain and storage because provider adapters need a stable input contract.
Correct order:
- Normalize and persist artifacts safely.
- Block unsupported/oversized sends.
- Add Claude adapter.
- Add Codex adapter.
- Add OpenCode adapter.
- Add E2E and docs polish.
Wrong order:
- Hack provider CLI args.
- Later figure out storage.
- Later figure out UI warnings.
- Later discover retries lost files.
Implementation Readiness Addendum
Definition of Ready for Phase 5
Before Phase 5:
- All provider adapters have unit tests.
- At least one provider image path works in app-managed runtime.
- Unsupported model UX exists.
- Diagnostics taxonomy is implemented.
Fixture strategy
Prefer one committed tiny PNG fixture plus optional generated variants.
Fixtures:
test/fixtures/attachments/red-square.png
test/fixtures/attachments/red-square-large.png
test/fixtures/attachments/corrupt.png
test/fixtures/attachments/not-image.txt
If binary fixtures are not desired, generate in test setup, but keep generation deterministic.
Smoke classification helper
export function classifyVisualSmokeAnswer(answer: string): 'pass' | 'refusal' | 'wrong' | 'empty' {
const normalized = answer.trim().toLowerCase();
if (!normalized) return 'empty';
if (/cannot|can't|unable|view|image/.test(normalized) && !/red/.test(normalized)) return 'refusal';
if (/\bred\b|красн/.test(normalized)) return 'pass';
return 'wrong';
}
Do not overfit this helper to one provider. It is only for smoke classification.
Release notes limitations section
Document limitations honestly:
Known limitations:
- Image support depends on selected provider and model.
- Some OpenCode/OpenRouter models are intentionally blocked until verified for vision.
- Very large images are optimized and may lose some detail.
- Non-image files continue using existing file behavior and are not part of the new image pipeline.
Final post-merge watchlist
For the first release after merge, watch for:
- Increased renderer memory usage after attaching screenshots.
- Any report of team launch failures after sending images.
- Any report that images appear in message text as base64.
- Any OpenCode model that should be marked unsupported.
- Any Codex auth diagnostic regression.
- Any failed retry due to missing artifacts.
Final Phase 5 Acceptance Specs
Spec 1 - cross-provider visual smoke matrix
Given the red-square fixture
When visual smoke runs for each enabled provider path
Then every supported provider/model answers red
And every unsupported model is blocked or classified unsupported
And no failure is reported as generic spawn failure
Spec 2 - diagnostics are safe
Given an attachment delivery fails
When the user copies diagnostics
Then diagnostics include provider, model, failure code, and redacted reason
And diagnostics do not include base64 image bytes
And diagnostics do not include API keys or bearer tokens
Spec 3 - launch unaffected
Given a team launches without attachments
When launch completes
Then launch behavior matches pre-attachment behavior
And no attachment module participates in bootstrap readiness
Phase 5 exact PR contract
The Phase 5 PR is acceptable only if:
- Smoke results are recorded with date/provider/model/runtime path.
- Unsupported model behavior is manually verified.
- Existing launch and delivery focused tests are green.
- Documentation explains limitations honestly.
- Release notes avoid promising universal vision support.
- Post-merge watchlist is included in release checklist.
Final “do not ship if” list
Do not ship if any of these are true:
- Text-only messages regress for any provider.
- Any provider receives base64 as plain prompt text.
- Unsupported OpenCode models silently accept image messages.
- Attachment failure marks teammate spawn failed without runtime exit evidence.
- Codex subscription auth diagnostics regress.
- Image retry can silently send text without the original image.
- Copy diagnostics leaks base64 or secrets.
Phase 5 Pre-Mortem and Extra Safeguards
Likely verification mistakes
| Mistake | Prevention |
|---|---|
| Standalone provider smoke passes but app path fails | Smoke through app-managed team delivery. |
| E2E accepts refusal as pass | Use classifier that rejects “cannot view image”. |
| Only happy path tested | Include unsupported model, artifact missing, and auth failure. |
| Logs leak base64 | Copy diagnostics test searches for base64 markers. |
| Release notes overpromise | Explicit known limitations section. |
Copy diagnostics acceptance test
it('copy diagnostics omits image bytes and secrets', () => {
const text = buildAttachmentFailureDiagnostics(fakeImageFailure());
expect(text).toContain('attachment_model_unsupported');
expect(text).not.toContain('data:image');
expect(text).not.toMatch(/[A-Za-z0-9+/]{200,}={0,2}/);
expect(text).not.toMatch(/sk-[a-zA-Z0-9_-]+/);
});
App-path smoke script must prove route
Smoke result should include route:
{
"route": "user->lead",
"teamName": "attachment-smoke-claude",
"provider": "anthropic",
"model": "current",
"attachmentTransport": "claude_structured_blocks",
"result": "pass"
}
If route is standalone CLI only, mark as prototype evidence, not release evidence.
Release manager checklist
- Read all
Do not ship ifitems. - Confirm no broad feature flag remains.
- Confirm capability gates are user-visible product behavior.
- Confirm no launch/provisioning files import attachment modules.
- Confirm provider smoke evidence is app-path evidence.
- Confirm rollback by provider is possible without data migration.