OpenCode Delivery Ledger And Bounded Retry Plan
Summary
Recommended implementation: add an OpenCode prompt-delivery ledger with explicit accepted, responded, unanswered, retried, and failed states, plus a bounded retry watchdog.
Goal: when the app delivers an inbox row into a live OpenCode teammate session, we stop treating promptAsync() acceptance as success. Acceptance only means the bridge accepted the prompt. A row is fully delivered only after OpenCode shows evidence that the teammate acted on it.
Expected implementation size:
- claude_team: 🎯 8 🛡️ 9 🧠 7 - roughly 950-1450 changed lines.
- agent_teams_orchestrator: 🎯 8 🛡️ 9 🧠 6 - roughly 500-850 changed lines.
- Tests and fixtures: 🎯 9 🛡️ 9 🧠 6 - roughly 750-1250 changed lines.
The plan is intentionally OpenCode-only. Native Codex, Claude, and Gemini teammate delivery paths stay unchanged.
Why This Is Needed
Current OpenCode relay semantics are too optimistic:
1. claude_team writes an inbox row to inboxes/<member>.json.
2. relayOpenCodeMemberInboxMessages() sees the unread row.
3. It calls deliverOpenCodeMemberMessage().
4. OpenCodeTeamRuntimeAdapter.sendMessageToMember() calls the orchestrator bridge.
5. OpenCodeBridgeCommandHandler.runSendMessage() calls promptAsync().
6. The bridge returns accepted: true.
7. claude_team marks the inbox row read: true.
The problem: accepted: true only proves that OpenCode accepted a prompt into a session. It does not prove the model used tools, sent a visible message, touched a task, or even produced meaningful assistant text.
Observed failure mode from beacon-desk-92121221:
- Jack and Tom had task assignment rows marked read.
- The OpenCode sessions had received the app-delivery prompts.
- The transcript showed empty assistant turns after those prompts.
- The assigned tasks stayed in_progress.
- No comments, task events, or visible messages were produced.
That is a durable message-loss bug because the inbox row is already read, so normal retry logic has nothing left to pick up.
Current Code Paths
App-side OpenCode delivery
Primary files:
- src/main/services/team/TeamProvisioningService.ts
- src/main/services/team/runtime/OpenCodeTeamRuntimeAdapter.ts
- src/main/services/team/opencode/bridge/OpenCodeBridgeCommandContract.ts
- src/main/ipc/teams.ts
- src/shared/types/team.ts
Important current behavior:
// TeamProvisioningService.ts
const delivery = await this.deliverOpenCodeMemberMessage(teamName, {
memberName,
text: message.text,
messageId: message.messageId,
replyRecipient,
actionMode,
taskRefs,
});
if (delivery.delivered) {
await this.markInboxMessagesRead(teamName, memberName, [message]);
}
delivery.delivered currently means bridge prompt acceptance. It should become "bridge accepted plus response proof exists" for OpenCode inbox relay completion.
Deeper research findings
The highest-risk parts after code review are:
- promptAsync() returns Promise<void>. It does not return the OpenCode user message id created by /session/:id/prompt_async, so v1 cannot rely on a prompt return value as proof. The response observer must use transcript reconcile.
- sendSessionMessage() does return an OpenCodeMessage, but it is the synchronous /session/:id/message path with a long request timeout. It is a bad default for live app delivery because it can block UI/runtime relay behind model execution. It is useful as a future fallback, not the v1 delivery primitive.
- markInboxMessagesRead() writes a separate JSON file from any new ledger file. There is no atomic cross-file commit. The ledger must represent "response observed, inbox read commit still pending" so crashes or mark-read failures do not cause duplicate prompts.
- Controller inbox writes always create a messageId when one is missing. Task assignment notifications use messageStore.buildMessage(), so they have stable ids. That means the OpenCode prompt ledger can require inboxMessageId for every eligible OpenCode relay row.
- OpenCode runtime stores are lane-scoped for secondary lanes. The prompt delivery ledger should be lane-scoped too, not only team-root scoped, because run id, session id, and manifest recovery are lane facts.
- Existing relayedMemberInboxMessageIds is a native relay dedupe cache. It is not durable, it is not response-aware, and it should not be reused as the OpenCode delivery truth.
- opencode.sendMessage in OpenCodeReadinessBridge is already a direct bridge command, not a state-changing command service call. The new observe command should follow this pattern. Do not route observe through OpenCodeStateChangingBridgeCommandService, because observe must not acquire state-changing command leases, write command-ledger entries, or commit runtime-store manifests.
- State-changing bridge service is currently only for:
  'opencode.launchTeam' | 'opencode.reconcileTeam' | 'opencode.stopTeam'
  opencode.observeMessageDelivery must not be added to that union. It should be added only to the general bridge command contract and orchestrator supported-command dispatch.
- reconcileSession(record, { limit }) can miss old prompt history when a long OpenCode session has many later events. If the observer cannot find the inbound messageId or the prePromptCursor inside the limited transcript, it must do one wider/full-history reconcile before returning prompt_not_indexed or empty_assistant_turn.
- Real OpenCode fixtures use canonical MCP tool ids like agent-teams_message_send, but tool names can also appear as mcp__agent-teams__message_send or, in lower-level proof paths, plain names such as message_send. The observer must normalize tool names before classifying response proof.
- Real OpenCode fixtures also use normal tool names such as bash and read in lowercase. Execution-tool response proof must not only match Codex-style display names like Bash.
- If the bridge command itself times out after the prompt may have been submitted, the app cannot know whether OpenCode accepted the prompt. Treat that as failed_retryable with acceptanceUnknown: true, and make the watchdog observe the transcript before sending any retry. Do not immediately send a duplicate retry after an acceptance-unknown timeout.
- messageStore and renderer/feed services already understand relayOfMessageId, but the OpenCode MCP message_send schema does not expose it yet. Exposing the existing field is lower risk than inventing a new reply-id contract, and it gives the watchdog a non-heuristic visible-reply correlation signal.
Orchestrator-side OpenCode bridge
Primary files:
- /Users/belief/dev/projects/claude/agent_teams_orchestrator/src/services/opencode/OpenCodeBridgeCommandHandler.ts
- /Users/belief/dev/projects/claude/agent_teams_orchestrator/src/services/opencode/OpenCodeSessionBridge.ts
- /Users/belief/dev/projects/claude/agent_teams_orchestrator/src/services/opencode/OpenCodeEventTranslator.ts
- /Users/belief/dev/projects/claude/agent_teams_orchestrator/src/services/opencode/OpenCodeTranscriptProjector.ts
Important current behavior:
await openCodeSessionBridge.promptAsync(record, {
text: identityReminder ? `${identityReminder}\n\n${text}` : text,
agent: asString(body.agent) ?? 'teammate',
noReply: body.noReply === true,
});
const reconciled = await withTimeout(
openCodeSessionBridge.reconcileSession(record, { limit: 50 }),
OPENCODE_SEND_RECONCILE_TIMEOUT_MS,
);
return {
accepted: true,
sessionId: record.opencodeSessionId,
memberName,
runtimePid,
diagnostics: reconcileDiagnostics,
};
This already does a post-accept reconcile, but it does not turn the reconcile summary into a response-proof contract.
Transcript capabilities
OpenCodeTranscriptProjector already gives enough normalized fields:
export type OpenCodeCanonicalMessage = {
id: string | null;
parentId: string | null;
role: 'user' | 'assistant' | 'system' | 'unknown';
completedAt: number | null;
text: string;
reasoningText: string;
previewText: string | null;
partTypes: string[];
toolCalls: OpenCodeCanonicalToolCall[];
finishReason: string | null;
hasError: boolean;
};
This is enough to classify a prompt outcome.
Key Design Decision
Recommended read semantics
Do not mark OpenCode member inbox rows read: true at accepted.
Mark them read only when the delivery ledger reaches a terminal success state:
- responded with a tool call that is sufficient for the message intent.
- responded with visible agent-teams_message_send.
- responded with a useful plain assistant answer, with diagnostic warning because visible UI capture may not happen.
Do not auto-mark failed_terminal rows read in v1. Keep them unread and surface diagnostics. A later product action can add explicit "ack failed delivery" behavior, but the delivery watchdog should not hide failed OpenCode prompts.
This is the safest model because inboxes/<member>.json remains the durable queue of uncommitted work.
Options Considered
1. Recommended: keep row unread until response proof, ledger suppresses duplicate relays. 🎯 9 🛡️ 9 🧠 6 - roughly 650-1050 lines total. This preserves durable queue truth and fixes the actual loss bug. The complexity is in preventing the file watcher from retrying immediately.
2. Mark row read at prompt acceptance, but use a separate retry ledger. 🎯 7 🛡️ 7 🧠 5 - roughly 400-750 lines total. Easier to bolt on, but it keeps inbox truth dishonest. Recovery after app restart is harder because the original row no longer looks pending.
3. Make UI send block until OpenCode response proof. 🎯 6 🛡️ 8 🧠 4 - roughly 250-450 lines total. Simpler conceptually, but bad UX and still incomplete for watcher-driven task assignment rows.
Use option 1.
Ledger Model
Add a new OpenCode-specific prompt delivery ledger. Do not reuse RuntimeDeliveryJournal; that journal is for the opposite direction, where a runtime writes into canonical app destinations through runtime_deliver_message.
Recommended file for secondary OpenCode lanes:
<team>/.opencode-runtime/lanes/<encodeURIComponent(laneId)>/opencode-prompt-delivery-ledger.json
Recommended schema name:
'opencode.promptDeliveryLedger'
Add it to the runtime store manifest descriptors, schema-name validator, cross-store invariant input, and lane-scoped manifest recovery checks:
{
schemaName: 'opencode.promptDeliveryLedger',
schemaVersion: 1,
relativePath: 'opencode-prompt-delivery-ledger.json',
criticality: 'rebuildable_from_canonical_destination',
owner: 'delivery',
rebuildStrategy: 'verify_canonical_destinations',
}
Do not store secondary-lane prompt delivery truth in the team root .opencode-runtime directory. Use getOpenCodeLaneScopedRuntimeFilePath() and the lane manifest path from getOpenCodeRuntimeManifestPath(teamsBasePath, teamName, laneId).
Reason: activeRunId, runtime session id, stale runtime detection, and durable launch evidence are lane scoped. A team-root ledger would make mixed launches harder to recover correctly after app restart or lane replacement.
Readiness impact:
- Prompt ledger corruption should not block OpenCode provider/model readiness.
- It should surface degraded diagnostics and force conservative observe-first reconstruction.
- Launch should not fail only because the prompt ledger is missing; unread inbox rows remain the canonical destination.
Missing Or Quarantined Ledger Recovery
Because the ledger is rebuildable_from_canonical_destination, a missing ledger can be reconstructed from unread inbox rows. But reconstruction must be conservative.
If the ledger is missing/quarantined while OpenCode lane/session evidence exists:
1. Read unread OpenCode member inbox rows.
2. Create reconstructed records with status: 'failed_retryable', responseState: 'not_observed', and acceptanceUnknown: true.
3. Run opencode.observeMessageDelivery before sending any new prompt.
4. Only prompt if observation cannot find a delivered user prompt or response proof after the normal grace window.
Reason: app crash or file corruption may have happened after promptAsync() accepted but before the ledger persisted accepted. Rebuilding directly as pending and immediately prompting can duplicate messages.
If no active lane/session can be resolved, keep the reconstructed record failed_retryable with opencode_runtime_not_active; do not prompt until the runtime is live again.
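The conservative reconstruction described above could be sketched roughly as follows. This is an illustration under assumptions, not the app's implementation: UnreadInboxRow, ReconstructedRecord, reconstructLedgerRecords, and mayPromptAfterReconstruction are hypothetical names, and the real ledger record carries many more fields.

```typescript
// Hypothetical, trimmed-down shapes for illustration only.
interface UnreadInboxRow {
  messageId: string;
  timestamp: string;
}

interface ReconstructedRecord {
  inboxMessageId: string;
  inboxTimestamp: string;
  status: 'failed_retryable';
  responseState: 'not_observed';
  acceptanceUnknown: true;
  lastReason: string | null;
}

// Rebuild records from unread rows without assuming the prompt was never sent.
function reconstructLedgerRecords(
  unreadRows: UnreadInboxRow[],
  runtimeIsLive: boolean,
): ReconstructedRecord[] {
  return unreadRows.map((row): ReconstructedRecord => ({
    inboxMessageId: row.messageId,
    inboxTimestamp: row.timestamp,
    status: 'failed_retryable',
    responseState: 'not_observed',
    acceptanceUnknown: true,
    // Without a live lane/session the record waits; it is never prompted blindly.
    lastReason: runtimeIsLive ? null : 'opencode_runtime_not_active',
  }));
}

// A reconstructed record may be prompted only after an observe pass found
// neither the delivered user prompt nor response proof, and the grace window passed.
function mayPromptAfterReconstruction(input: {
  runtimeIsLive: boolean;
  observed: boolean;
  foundDeliveredPrompt: boolean;
  foundResponseProof: boolean;
  graceWindowElapsed: boolean;
}): boolean {
  if (!input.runtimeIsLive || !input.observed) return false;
  if (input.foundDeliveredPrompt || input.foundResponseProof) return false;
  return input.graceWindowElapsed;
}
```

The point of the second function is that observation gates prompting: a record that was never observed can never be re-sent, whatever its other fields say.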
Record Shape
export type OpenCodePromptDeliveryStatus =
| 'pending'
| 'accepted'
| 'responded'
| 'unanswered'
| 'retry_scheduled'
| 'retried'
| 'failed_retryable'
| 'failed_terminal';
export type OpenCodePromptResponseState =
| 'not_observed'
| 'pending'
| 'prompt_not_indexed'
| 'responded_tool_call'
| 'responded_visible_message'
| 'responded_non_visible_tool'
| 'responded_plain_text'
| 'permission_blocked'
| 'tool_error'
| 'empty_assistant_turn'
| 'session_stale'
| 'session_error'
| 'reconcile_failed';
export interface OpenCodePromptDeliveryLedgerRecord {
id: string;
teamName: string;
memberName: string;
laneId: string;
runId: string | null;
runtimeSessionId: string | null;
inboxMessageId: string;
inboxTimestamp: string;
source: 'watcher' | 'ui-send' | 'manual' | 'watchdog';
replyRecipient: string;
actionMode: 'do' | 'ask' | 'delegate' | null;
taskRefs: TaskRef[];
payloadHash: string;
status: OpenCodePromptDeliveryStatus;
responseState: OpenCodePromptResponseState;
attempts: number;
maxAttempts: number;
acceptanceUnknown: boolean;
nextAttemptAt: string | null;
lastAttemptAt: string | null;
lastObservedAt: string | null;
acceptedAt: string | null;
respondedAt: string | null;
failedAt: string | null;
inboxReadCommittedAt: string | null;
inboxReadCommitError: string | null;
prePromptCursor: string | null;
postPromptCursor: string | null;
deliveredUserMessageId: string | null;
observedAssistantMessageId: string | null;
observedToolCallNames: string[];
observedVisibleMessageId: string | null;
visibleReplyMessageId: string | null;
visibleReplyInbox: string | null;
visibleReplyCorrelation:
| 'relayOfMessageId'
| 'direct_child_message_send'
| 'plain_assistant_text'
| null;
lastReason: string | null;
diagnostics: string[];
createdAt: string;
updatedAt: string;
}
Stable Record ID
Use a stable deterministic ID:
sha256([
'opencode-prompt-delivery-v1',
teamName,
memberName.toLowerCase(),
laneId,
inboxMessageId,
].join('\0'))
Do not use message text as the primary key. Two different messages can have identical text and both must be delivered.
Use payloadHash only as a safety check. If the same record ID appears with a different payload hash, mark failed_terminal and log a diagnostic because the inbox row identity is corrupt or reused.
Payload hash input should include:
- message text,
- summary,
- actionMode,
- taskRefs,
- replyRecipient,
- attachment metadata ids/filenames/mime types/sizes,
- source/from/to,
- conversation ids if present.
Do not hash attachment file bytes in v1. The watchdog needs stable delivery identity, not expensive file integrity checks.
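A minimal sketch of the record ID and payload-hash check, assuming a Node.js runtime where node:crypto is available. The names buildDeliveryRecordId, buildPayloadHash, checkPayloadHash, and the PayloadHashInput shape are hypothetical; taskRefs is simplified to string[] here, and source/from/to and conversation ids are omitted for brevity even though the list above includes them.

```typescript
import { createHash } from 'node:crypto';

interface RecordIdentity {
  teamName: string;
  memberName: string;
  laneId: string;
  inboxMessageId: string;
}

// Deterministic ID: same inbox row on the same lane always maps to the same record.
function buildDeliveryRecordId(identity: RecordIdentity): string {
  return createHash('sha256')
    .update(
      [
        'opencode-prompt-delivery-v1',
        identity.teamName,
        identity.memberName.toLowerCase(),
        identity.laneId,
        identity.inboxMessageId,
      ].join('\0'),
    )
    .digest('hex');
}

// Hypothetical, simplified payload shape (attachment metadata only, no file bytes).
interface PayloadHashInput {
  text: string;
  summary: string | null;
  actionMode: 'do' | 'ask' | 'delegate' | null;
  taskRefs: string[];
  replyRecipient: string;
  attachments: { id: string; filename: string; mimeType: string; size: number }[];
}

function buildPayloadHash(payload: PayloadHashInput): string {
  // A fixed-order array keeps the hash independent of object key order.
  const canonical = [
    payload.text,
    payload.summary ?? '',
    payload.actionMode ?? '',
    payload.taskRefs,
    payload.replyRecipient,
    payload.attachments.map((a) => [a.id, a.filename, a.mimeType, a.size]),
  ];
  return createHash('sha256').update(JSON.stringify(canonical)).digest('hex');
}

// Same record ID with a different payload hash means the row identity was reused.
function checkPayloadHash(storedHash: string, currentHash: string): 'ok' | 'failed_terminal' {
  return storedHash === currentHash ? 'ok' : 'failed_terminal';
}
```

Two rows with identical text still get distinct record IDs because inboxMessageId is part of the ID input and the text is not.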
Runtime Identity Binding
The first delivery attempt must resolve and persist:
- laneId,
- runId if known,
- runtimeSessionId when available,
- memberName canonical casing,
- providerId: 'opencode'.
Retries and observations must use the ledger laneId, not recompute a new lane from current member config unless the ledger is being created for the first time.
If the member's current provider/model changes while a record is non-terminal:
- do not redirect the old OpenCode delivery to a new provider or lane,
- mark the record failed_terminal with opencode_recipient_runtime_identity_changed,
- leave the inbox row unread with diagnostics,
- let any new user message use the new provider path normally.
Reason: a retry belongs to the original runtime session that accepted or may have accepted the prompt. Re-resolving from mutable team config can deliver stale messages to the wrong runtime.
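The binding rule can be illustrated with a small check that runs before every retry or observation. This is a sketch under assumptions: BoundRuntimeIdentity, CurrentMemberRuntime, and checkRuntimeIdentity are hypothetical names, and a model comparison is left out because the ledger fields listed above do not store the model.

```typescript
// Identity persisted on the ledger record at first delivery attempt.
interface BoundRuntimeIdentity {
  laneId: string;
  providerId: string;
}

// Identity resolved from the member's current (mutable) team config.
interface CurrentMemberRuntime {
  laneId: string;
  providerId: string;
}

type IdentityCheck =
  | { ok: true; laneId: string }
  | {
      ok: false;
      status: 'failed_terminal';
      reason: 'opencode_recipient_runtime_identity_changed';
    };

function checkRuntimeIdentity(
  bound: BoundRuntimeIdentity,
  current: CurrentMemberRuntime,
): IdentityCheck {
  if (current.providerId !== bound.providerId || current.laneId !== bound.laneId) {
    // Never redirect the old delivery; fail it and leave the inbox row unread.
    return {
      ok: false,
      status: 'failed_terminal',
      reason: 'opencode_recipient_runtime_identity_changed',
    };
  }
  // Always continue with the lane stored on the ledger, not a recomputed one.
  return { ok: true, laneId: bound.laneId };
}
```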
Response Versus Inbox Commit
responded means OpenCode acted. It does not necessarily mean the inbox read flag was committed.
Track this separately:
status: 'responded',
inboxReadCommittedAt: null,
inboxReadCommitError: 'write failed'
Relay behavior for this state:
- Do not re-prompt.
- Retry only the markInboxMessagesRead() commit.
- Once read commit succeeds, set inboxReadCommittedAt.
This closes the crash window between observing a response and flipping read: true.
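One way to keep the two facts separate is to model the read commit as its own pure update on the record. The sketch below assumes hypothetical names (RespondedRecordSlice, ReadCommitResult, needsReadCommitRetry, applyReadCommitResult); the caller is assumed to invoke markInboxMessagesRead() and pass its outcome in.

```typescript
// Only the fields relevant to the read commit; the real record has more.
interface RespondedRecordSlice {
  status: string;
  inboxReadCommittedAt: string | null;
  inboxReadCommitError: string | null;
}

type ReadCommitResult =
  | { ok: true; committedAt: string }
  | { ok: false; error: string };

// True when OpenCode has responded but the inbox row is not yet flipped to read.
function needsReadCommitRetry(record: RespondedRecordSlice): boolean {
  return record.status === 'responded' && record.inboxReadCommittedAt === null;
}

// Applies the outcome of a mark-read attempt. It never changes status,
// so a failed commit cannot cause a re-prompt.
function applyReadCommitResult(
  record: RespondedRecordSlice,
  result: ReadCommitResult,
): RespondedRecordSlice {
  if (!needsReadCommitRetry(record)) return record;
  return result.ok
    ? { ...record, inboxReadCommittedAt: result.committedAt, inboxReadCommitError: null }
    : { ...record, inboxReadCommitError: result.error };
}
```

After a crash, a relay pass that sees needsReadCommitRetry(record) === true would retry only the commit and skip prompt delivery entirely.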
Why Not Use sendSessionMessage() For v1
There is a tempting alternative: replace promptAsync() with synchronous sendSessionMessage() because it returns an OpenCodeMessage.
Options:
1. Recommended: keep promptAsync() plus observe-only reconcile. 🎯 9 🛡️ 9 🧠 6 - roughly 550-950 lines total. It preserves non-blocking relay and gives durable response proof through the watchdog.
2. Use sendSessionMessage() for all OpenCode deliveries. 🎯 5 🛡️ 6 🧠 4 - roughly 250-500 lines total. It may block up to the model execution timeout, couples UI send to model latency, and still needs transcript projection for tool side effects.
3. Hybrid: use sendSessionMessage() only for manual UI sends, keep promptAsync() for watcher deliveries. 🎯 6 🛡️ 6 🧠 7 - roughly 450-800 lines total. It creates two delivery semantics and makes failures harder to reason about.
Use option 1.
State Machine
stateDiagram-v2
[*] --> pending
pending --> accepted: promptAsync accepted
pending --> failed_retryable: bridge transient failure
pending --> failed_terminal: bad recipient/runtime stopped
accepted --> responded: response proof observed
accepted --> unanswered: idle + no meaningful response
accepted --> failed_retryable: reconcile failed or session stale
accepted --> failed_terminal: permanent contract error
unanswered --> retry_scheduled: attempts < maxAttempts
retry_scheduled --> retried: due time reached
retried --> accepted: promptAsync accepted again
retried --> failed_retryable: transient retry failure
failed_retryable --> retry_scheduled: attempts < maxAttempts
failed_retryable --> failed_terminal: attempts exhausted
responded --> [*]
failed_terminal --> [*]
Important semantics:
- pending: ledger row exists but no bridge attempt has completed.
- accepted: OpenCode accepted a prompt, but no meaningful response proof exists yet.
- responded: OpenCode acted. This is the only normal success state that commits the inbox row as read.
- unanswered: OpenCode became idle and the transcript has no meaningful response after the delivered user prompt.
- retry_scheduled: watchdog will retry after nextAttemptAt.
- retried: diagnostic state for audit; implementation can immediately move from retry_scheduled to accepted after the next attempt.
- failed_retryable: transient bridge/runtime issue.
- failed_terminal: no more retries or permanent error.
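The diagram can be encoded as a transition table so that illegal status changes are rejected in one place. This is a sketch that mirrors only the edges drawn in the state diagram; TRANSITIONS, canTransition, and isTerminalStatus are hypothetical names, and guard conditions such as attempts < maxAttempts are assumed to be checked by the caller.

```typescript
type DeliveryStatus =
  | 'pending'
  | 'accepted'
  | 'responded'
  | 'unanswered'
  | 'retry_scheduled'
  | 'retried'
  | 'failed_retryable'
  | 'failed_terminal';

// Allowed next states, copied edge by edge from the state diagram.
const TRANSITIONS: Record<DeliveryStatus, readonly DeliveryStatus[]> = {
  pending: ['accepted', 'failed_retryable', 'failed_terminal'],
  accepted: ['responded', 'unanswered', 'failed_retryable', 'failed_terminal'],
  unanswered: ['retry_scheduled'],
  retry_scheduled: ['retried'],
  retried: ['accepted', 'failed_retryable'],
  failed_retryable: ['retry_scheduled', 'failed_terminal'],
  responded: [],
  failed_terminal: [],
};

function canTransition(from: DeliveryStatus, to: DeliveryStatus): boolean {
  return TRANSITIONS[from].includes(to);
}

// Terminal states have no outgoing edges.
function isTerminalStatus(status: DeliveryStatus): boolean {
  return TRANSITIONS[status].length === 0;
}
```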
Response Proof Contract
Add a pure response observer in the orchestrator:
/Users/belief/dev/projects/claude/agent_teams_orchestrator/src/services/opencode/OpenCodeDeliveryResponseObserver.ts
Suggested API:
export interface OpenCodeDeliveryResponseObservation {
state:
| 'pending'
| 'responded_tool_call'
| 'responded_visible_message'
| 'responded_plain_text'
| 'responded_non_visible_tool'
| 'permission_blocked'
| 'tool_error'
| 'empty_assistant_turn'
| 'not_observed'
| 'prompt_not_indexed'
| 'session_stale'
| 'session_error'
| 'reconcile_failed';
deliveredUserMessageId: string | null;
assistantMessageId: string | null;
toolCallNames: string[];
visibleMessageToolCallId: string | null;
visibleReplyMessageId: string | null;
visibleReplyCorrelation:
| 'relayOfMessageId'
| 'direct_child_message_send'
| 'plain_assistant_text'
| null;
visibleReplyMissingRelayOfMessageId?: boolean;
latestAssistantPreview: string | null;
needsFullHistory?: boolean;
reason: string | null;
}
How To Identify The Delivered Prompt
The app-delivery prompt already embeds:
The inbound app messageId is "<messageId>"
Observer should:
1. Collect all user canonical messages whose text contains the exact inbound messageId.
2. Also pass prePromptCursor, usually the session record's lastCanonicalCursor before promptAsync().
3. If exact messageId matches exist, treat them as attempts for the same logical delivery. A sufficient response to any matching attempt proves the delivery.
4. Use the newest exact messageId match only for pending/unanswered diagnostics when no matching attempt has sufficient response proof.
5. Fallback to the first user message after prePromptCursor only if exactly one candidate exists and the inbound messageId is missing from the indexed text.
6. If neither the exact messageId nor the prePromptCursor appears in the limited reconcile result, run one wider reconcile before classifying the prompt as missing.
7. Return prompt_not_indexed if the prompt is not yet visible in transcript and raw status is busy or retry.
8. Never use "latest user message" without either exact message-id match or a cursor-bounded unique candidate, because OpenCode sessions can contain bootstrap, briefing, assignment, comment, and retry prompts.
prompt_not_indexed is not a failure. It means OpenCode accepted the async prompt but /session/:id/message history has not caught up yet.
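The lookup rules above could be sketched as a pure function over the reconciled transcript. Several things here are assumptions made for illustration: the TranscriptMessage and PromptLookup shapes, the function name findDeliveredPrompt, the idea that prePromptCursor equals the id of the last message seen before the prompt, and matching the id in its quoted form as it appears in the prompt wording.

```typescript
interface TranscriptMessage {
  id: string;
  role: 'user' | 'assistant' | 'system' | 'unknown';
  text: string;
}

type PromptLookup =
  | { kind: 'exact'; attempts: TranscriptMessage[] }
  | { kind: 'cursor_fallback'; prompt: TranscriptMessage }
  | { kind: 'needs_full_history' }
  | { kind: 'prompt_not_indexed' }
  | { kind: 'not_found' };

function findDeliveredPrompt(input: {
  messages: TranscriptMessage[];
  inboundMessageId: string;
  prePromptCursor: string | null; // assumed: id of the last message before promptAsync()
  rawStatus: 'busy' | 'retry' | 'idle';
  isFullHistory: boolean;
}): PromptLookup {
  const { messages, inboundMessageId, prePromptCursor, rawStatus, isFullHistory } = input;
  const sessionBusy = rawStatus === 'busy' || rawStatus === 'retry';

  // Exact matches first; every match is an attempt for the same logical delivery.
  const quotedId = `"${inboundMessageId}"`;
  const attempts = messages.filter((m) => m.role === 'user' && m.text.includes(quotedId));
  if (attempts.length > 0) return { kind: 'exact', attempts };

  // Without the cursor anchor, a limited transcript cannot be trusted.
  const cursorIndex =
    prePromptCursor === null ? -1 : messages.findIndex((m) => m.id === prePromptCursor);
  if (cursorIndex === -1) {
    if (!isFullHistory) return { kind: 'needs_full_history' };
    return sessionBusy ? { kind: 'prompt_not_indexed' } : { kind: 'not_found' };
  }

  // Cursor-bounded fallback only when exactly one user message follows the cursor.
  const candidates = messages.slice(cursorIndex + 1).filter((m) => m.role === 'user');
  const [only] = candidates;
  if (candidates.length === 1 && only !== undefined) {
    return { kind: 'cursor_fallback', prompt: only };
  }
  if (candidates.length === 0 && sessionBusy) return { kind: 'prompt_not_indexed' };

  // Zero candidates while idle, or several candidates: never guess "latest user message".
  return { kind: 'not_found' };
}
```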
Visible Reply Correlation Contract
The response observer should not rely on text/time heuristics to decide whether a visible OpenCode reply belongs to an app-delivered inbox row.
The app already has a durable message id for every relayable inbox row. The controller message store already persists relayOfMessageId. The missing piece is exposing it through the OpenCode-visible MCP message_send schema and teaching the runtime prompt to use it.
Recommended v1 contract:
- Extend agent-teams_message_send / message_send parameters with relayOfMessageId?: string.
- Preserve relayOfMessageId through the MCP server, controller messages.sendMessage(), and messageStore.sendInboxMessage().
- In buildOpenCodeRuntimeMessageText(), instruct OpenCode replies to include:
source="runtime_delivery"
relayOfMessageId="<inbound app messageId>"
- If the message has taskRefs, still include those taskRefs exactly.
- Do not introduce a second field named replyToMessageId in v1. Use existing relayOfMessageId so renderer, feed services, and TeamDataService can share one correlation primitive.
Suggested prompt wording:
The inbound app messageId is "<messageId>".
When you reply with agent-teams_message_send, include source="runtime_delivery" and relayOfMessageId="<messageId>".
Strongest visible response proof is a canonical destination row where:
message.from === memberName
message.relayOfMessageId === inboxMessageId
message.source === 'runtime_delivery'
For the normal user-directed case, this row is in inboxes/user.json. If the reply target is a same-team teammate or lead, check that recipient's inbox file instead of assuming user inbox.
The transcript observer still matters because OpenCode can act with task tools or execution tools without creating a visible inbox message. But for visible replies, destination-store proof should outrank transcript-only proof because it proves the app actually received the reply, not only that OpenCode attempted a tool call.
Recommended proof priority for visible replies:
1. Destination row with matching relayOfMessageId.
2. Successful direct-child message_send tool call whose arguments include matching relayOfMessageId.
3. Successful direct-child message_send tool call without relayOfMessageId, accepted only as a fallback with diagnostic visible_reply_missing_relayOfMessageId.
4. Plain assistant text, accepted only as responded_plain_text with diagnostic because it may not appear in Messages UI.
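The priority order above can be expressed as a single selection function. This is a sketch under assumptions: the input shapes, the rank field, the function name pickVisibleReplyProof, and the plain-text diagnostic string are hypothetical; only visible_reply_missing_relayOfMessageId and the correlation values come from this plan.

```typescript
type VisibleReplyCorrelation =
  | 'relayOfMessageId'
  | 'direct_child_message_send'
  | 'plain_assistant_text';

interface DestinationRow {
  from: string;
  source?: string;
  relayOfMessageId?: string;
}

interface DirectChildSend {
  isError: boolean;
  relayOfMessageId?: string;
}

interface VisibleReplyProof {
  rank: 1 | 2 | 3 | 4;
  correlation: VisibleReplyCorrelation;
  diagnostics: string[];
}

function pickVisibleReplyProof(input: {
  memberName: string;
  inboxMessageId: string;
  destinationRows: DestinationRow[];
  directChildSends: DirectChildSend[];
  assistantText: string;
}): VisibleReplyProof | null {
  const { memberName, inboxMessageId } = input;

  // 1. The app actually received a correlated reply.
  const hasDestinationRow = input.destinationRows.some(
    (row) =>
      row.from === memberName &&
      row.relayOfMessageId === inboxMessageId &&
      row.source === 'runtime_delivery',
  );
  if (hasDestinationRow) return { rank: 1, correlation: 'relayOfMessageId', diagnostics: [] };

  const successfulSends = input.directChildSends.filter((send) => !send.isError);

  // 2. Transcript shows a successful send carrying the matching id.
  if (successfulSends.some((send) => send.relayOfMessageId === inboxMessageId)) {
    return { rank: 2, correlation: 'relayOfMessageId', diagnostics: [] };
  }

  // 3. Fallback: successful send with no id at all. A mismatched id is not proof.
  if (successfulSends.some((send) => !send.relayOfMessageId)) {
    return {
      rank: 3,
      correlation: 'direct_child_message_send',
      diagnostics: ['visible_reply_missing_relayOfMessageId'],
    };
  }

  // 4. Plain text only; the diagnostic name below is a placeholder.
  if (input.assistantText.trim().length > 0) {
    return {
      rank: 4,
      correlation: 'plain_assistant_text',
      diagnostics: ['plain_text_reply_may_not_appear_in_messages_ui'],
    };
  }
  return null;
}
```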
Do not use TeamDataService.linkPassiveUserReplySummaries() or summary/time matching as OpenCode delivery proof. That method is useful for rendering passive duplicate summaries, but it is intentionally heuristic and should not decide inbox read commits.
Visible Reply Semantic Sufficiency
Correlation proves that a reply belongs to a delivered message. It does not prove that the reply is useful.
For visible message_send replies, evaluate semantic sufficiency before read commit:
function isVisibleReplySemanticallySufficient(input: {
actionMode: 'do' | 'ask' | 'delegate' | null;
taskRefs: TaskRef[];
text: string;
summary?: string | null;
}): boolean
Rules:
- For actionMode: 'ask', ack-only text such as Понял, Ок, Understood, Got it, I'll check, or I'll do it is not sufficient. The reply must contain an answer, a blocking question, or a concrete status/result.
- For no action mode and no task refs, use the same rule as ask.
- For taskRefs.length > 0, ack-only visible text alone is not sufficient unless there is also a successful task tool or execution tool. The goal is not just acknowledgement; the prompt delivery should produce task-side action or a concrete status.
- For actionMode: 'do', ack-only visible text alone is not sufficient unless there is also successful task/execution proof.
- For actionMode: 'delegate', a visible message is sufficient only if it names the delegated action, target, or status. Pure acknowledgement is not enough.
Ack-only detection must be intentionally narrow:
- Normalize whitespace and lowercase.
- Only treat as ack-only when the combined text/summary is short, for example under 120 characters.
- Match exact phrases or simple phrase prefixes from a small allow-reviewed list: понял, ок, принял, сделаю, разберусь, understood, got it, ok, will do, I'll check, I'll take a look.
- If the text contains a concrete blocker, question, result, file path, task id, error, number, code span, or more than one sentence of substance, treat it as sufficient.
- If uncertain, prefer sufficient and log visible_reply_semantic_uncertain. False retries are usually worse than accepting a borderline concrete status.
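A deliberately narrow ack-only detector might look like the sketch below. It is stricter than the rules above in one respect, which is an assumption of this example: it matches exact phrases only (possibly several joined by punctuation) and skips prefix matching, so anything borderline falls through as sufficient. The names ACK_PHRASES, MAX_ACK_LENGTH, and isAckOnly are hypothetical.

```typescript
const ACK_PHRASES: ReadonlySet<string> = new Set([
  'понял', 'ок', 'принял', 'сделаю', 'разберусь',
  'understood', 'got it', 'ok', 'will do', "i'll check", "i'll take a look",
]);

const MAX_ACK_LENGTH = 120;

// Collapse whitespace, lowercase, and normalize curly apostrophes.
function normalizeForAck(value: string): string {
  return value.replace(/\u2019/g, "'").replace(/\s+/g, ' ').trim().toLowerCase();
}

function isAckOnly(text: string, summary?: string | null): boolean {
  const parts = [text, summary ?? ''].map(normalizeForAck).filter((p) => p.length > 0);
  if (parts.length === 0) return false; // empty replies are handled elsewhere

  const combined = parts.join(' ');
  if (combined.length >= MAX_ACK_LENGTH) return false;

  // Questions, digits, code spans, and path separators signal concrete content.
  if (/[?\d`\/\\]/.test(combined)) return false;

  // Every punctuation-separated segment must be a known ack phrase.
  return parts.every((part) =>
    part
      .split(/[.,!;:]+/)
      .map((segment) => segment.trim())
      .filter((segment) => segment.length > 0)
      .every((segment) => ACK_PHRASES.has(segment)),
  );
}
```

With this shape, "Understood, will do." is ack-only, while "Got it. The build fails in src/app.ts" is not, because the second sentence is not an ack phrase and the text contains a path.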
If a visible reply is correlated but semantically insufficient:
- keep the ledger accepted,
- set responseState = 'responded_visible_message',
- set lastReason = 'visible_reply_ack_only_still_requires_answer',
- do not mark the original OpenCode inbox row read,
- retry after grace with copy asking for a concrete answer/status, not just acknowledgement.
This is deliberately stricter than the controller-level idle/ack-only guard. The controller guard decides whether to persist a message. The watchdog read-commit policy decides whether that persisted message satisfies the delivered prompt.
OpenCode Ack-Only Guard Tightening
Current controller logic treats any non-empty source as explicit delivery context for the idle/ack-only guard. That is too broad for OpenCode because a model can emit source="runtime_delivery" on a generic "Понял" / "Understood" message that is not tied to an app-delivered row.
For OpenCode senders, hasExplicitDeliveryContext() should treat a message as explicit delivery context only when at least one of these is true:
- relayOfMessageId is present and non-empty.
- taskRefs is present and non-empty.
- attachments is present and non-empty.
- leadSessionId is present for a real lead-session reply.
- Another future typed delivery context field is present.
source="runtime_delivery" alone is metadata, not proof. It should not bypass ack-only filtering unless it is paired with relayOfMessageId or task/action context.
This keeps the guard from dropping useful non-ack answers: a substantive answer with text still passes because the idle/ack-only detector only rejects ack/no-task/no-work phrases. The stricter context rule only affects OpenCode ack-only noise.
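The tightened rule for OpenCode senders could look roughly like this. The message shape and the function name hasOpenCodeDeliveryContext are hypothetical; the existing hasExplicitDeliveryContext() in the controller is not reproduced here, and the check for a "real" lead-session reply is reduced to a non-empty leadSessionId.

```typescript
// Hypothetical subset of an outbound message as seen by the guard.
interface OutboundMessageContext {
  source?: string | null;
  relayOfMessageId?: string | null;
  taskRefs?: unknown[] | null;
  attachments?: unknown[] | null;
  leadSessionId?: string | null;
}

function isNonEmptyString(value: string | null | undefined): boolean {
  return typeof value === 'string' && value.trim().length > 0;
}

// For OpenCode senders, `source` alone never counts as delivery context.
function hasOpenCodeDeliveryContext(message: OutboundMessageContext): boolean {
  return (
    isNonEmptyString(message.relayOfMessageId) ||
    (message.taskRefs?.length ?? 0) > 0 ||
    (message.attachments?.length ?? 0) > 0 ||
    isNonEmptyString(message.leadSessionId)
  );
}
```

A message carrying only source="runtime_delivery" returns false here, so it stays subject to the ack-only filter.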
Reconcile Limit Fallback
The first post-send reconcile should stay bounded for latency:
await reconcileSession(record, { limit: 80 });
But bounded history is not enough for durable watchdog observation. If observeOpenCodeDeliveryResponse() cannot find either:
- the exact inbound messageId, or
- the prePromptCursor that was captured before promptAsync(),
then runObserveMessageDelivery() must retry with a wider/full transcript once:
const limited = await reconcileSession(record, { limit: 80 });
const first = observeOpenCodeDeliveryResponse({ ...limited, prePromptCursor });
if (first.state === 'prompt_not_indexed' && first.needsFullHistory === true) {
const full = await reconcileSession(record);
return observeOpenCodeDeliveryResponse({ ...full, prePromptCursor });
}
Do not add this full-history fallback to every sendMessage call. Use it in watchdog/observe paths and only when the anchor is missing. UI send should stay latency bounded.
What Counts As Responded
Use a direct child assistant message where:
assistant.parentId === deliveredUserMessage.id
There can be multiple assistant children for one delivered user prompt. Classify all children and choose the strongest successful response proof, not just the first or last child.
Classify as success if any of these are true:
- Tool call exists:
assistant.toolCalls.length > 0
This is the strongest signal, but only after filtering tool calls.
Meaningful tool calls:
- agent-teams_message_send
- mcp__agent-teams__message_send
- message_send
- agent-teams_task_*
- mcp__agent-teams__task_*
- task_*
- normal execution tools when the prompt action mode is do, such as shell/edit/read tools, if exposed by OpenCode in canonical transcript
Non-meaningful tool calls for delivery proof:
- runtime_bootstrap_checkin
- agent-teams_runtime_bootstrap_checkin
- mcp__agent-teams__runtime_bootstrap_checkin
- member_briefing
- agent-teams_member_briefing
- mcp__agent-teams__member_briefing
- runtime_heartbeat
- process_register
- process_list
Reason: identity/bootstrap tools can be emitted because of the prepended reminder, not because the agent responded to the delivered message.
Tool-name normalization should happen before classification:
function normalizeToolName(name: string): string {
return name
.replace(/^mcp__agent-teams__/, '')
.replace(/^agent-teams_/, '')
.toLowerCase();
}
Then classify by normalized names:
- visible message tool: message_send
- task tools: task_get, task_start, task_add_comment, task_complete, and other task_ names
- bootstrap tools: runtime_bootstrap_checkin, member_briefing, runtime_heartbeat, process_register, process_list
- execution tools for actionMode: 'do': bash, read, edit, write, grep, glob, and OpenCode equivalents observed in fixtures
Real fixture coverage should include at least these observed names:
- agent-teams_runtime_bootstrap_checkin
- agent-teams_member_briefing
- agent-teams_message_send
- agent-teams_task_get
- agent-teams_task_start
- agent-teams_task_add_comment
- agent-teams_task_complete
- bash
- read
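Building on the normalization step, classification by normalized name might be sketched as follows. normalizeToolName is repeated from the plan so the example is self-contained; ToolClass, classifyToolName, and isMeaningfulForDelivery are hypothetical names, and the execution-tool set contains only the names listed in this plan.

```typescript
function normalizeToolName(name: string): string {
  return name
    .replace(/^mcp__agent-teams__/, '')
    .replace(/^agent-teams_/, '')
    .toLowerCase();
}

type ToolClass = 'visible_message' | 'task' | 'bootstrap' | 'execution' | 'other';

const BOOTSTRAP_TOOLS: ReadonlySet<string> = new Set([
  'runtime_bootstrap_checkin',
  'member_briefing',
  'runtime_heartbeat',
  'process_register',
  'process_list',
]);

const EXECUTION_TOOLS: ReadonlySet<string> = new Set([
  'bash', 'read', 'edit', 'write', 'grep', 'glob',
]);

function classifyToolName(rawName: string): ToolClass {
  const name = normalizeToolName(rawName);
  if (name === 'message_send') return 'visible_message';
  if (name.startsWith('task_')) return 'task';
  if (BOOTSTRAP_TOOLS.has(name)) return 'bootstrap';
  if (EXECUTION_TOOLS.has(name)) return 'execution';
  return 'other';
}

// Bootstrap tools never count; execution tools count only for actionMode 'do'.
function isMeaningfulForDelivery(
  rawName: string,
  actionMode: 'do' | 'ask' | 'delegate' | null,
): boolean {
  const toolClass = classifyToolName(rawName);
  if (toolClass === 'visible_message' || toolClass === 'task') return true;
  return toolClass === 'execution' && actionMode === 'do';
}
```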
- Visible team message tool call exists:
assistant.toolCalls.some((tool) =>
normalizeToolName(tool.toolName) === 'message_send'
)
This should classify as responded_visible_message.
If the tool call arguments include relayOfMessageId matching the inbound inbox messageId, set visibleReplyCorrelation = 'relayOfMessageId'.
If the tool call is a direct child of the delivered prompt but lacks relayOfMessageId, still classify it as responded_visible_message in v1, but emit diagnostic visible_reply_missing_relayOfMessageId. This avoids losing a real answer during rollout while making the missing correlation visible in logs/tests.
- Meaningful non-visible tool call exists:
This classifies as responded_non_visible_tool.
For task assignment prompts, this is enough to mark the message responded if the tool call is a task tool or execution tool. The delivery watchdog is proving that the prompt was noticed, not that the whole task is complete.
For direct user questions, non-visible tool activity is not always enough. A read or bash call can prove the model noticed the prompt, but it does not prove the user got an answer. The read-commit policy must use actionMode and taskRefs, not only responseState.
- Non-empty assistant text exists:
assistant.text.trim().length > 0 || (assistant.previewText?.trim().length ?? 0) > 0
This classifies as responded_plain_text, not ideal success. It prevents prompt spam when the model actually answered in plain text, but diagnostics should warn that the app may not show it as a normal team message.
Reasoning-only output is not a user-visible answer:
assistant.reasoningText.trim().length > 0 && assistant.text.trim().length === 0
Do not classify this as responded_plain_text. Treat it like no visible response unless a meaningful successful tool call exists.
- Tool call exists but all meaningful calls are errored:
assistant.toolCalls.some((tool) => tool.isError)
This classifies as tool_error, not responded. Do not mark the inbox row read unless there is also a successful meaningful tool call or non-empty plain text.
- Assistant error exists:
assistant.hasError === true
This is not success. It should map to session_error and failed_retryable or failed_terminal depending on error type.
- Pending permission exists:
summary.pendingPermissionRequestIds.length > 0
This is permission_blocked. Do not retry automatically while blocked.
What Counts As Unanswered
Classify as empty_assistant_turn if a direct assistant child exists and all of the following are true:
assistant.toolCalls.length === 0
assistant.text.trim().length === 0
assistant.reasoningText.trim().length === 0
assistant.completedAt == null
assistant.finishReason == null
assistant.hasError === false
Classify as pending if:
- raw status is busy or retry, and no meaningful child exists yet.
Classify as empty_assistant_turn or unanswered if:
- raw status is idle.
- no meaningful child exists.
- no permission block exists.
Do not rely on replyPendingSinceMessageId from OpenCodeEventTranslator alone. It currently treats any child message as a reply, including the empty assistant-turn failure.
If the direct child contains only reasoning/step metadata and no visible text or meaningful successful tool call, treat it as unanswered for delivery-read purposes.
Response Proof Priority
Use this classification order:
- session_stale if reconcile outcome is stale.
- permission_blocked if pending permissions exist.
- prompt_not_indexed if the delivered prompt is not indexed yet and raw status is busy or retry.
- pending if direct child has running tool calls or session status is busy/retry.
- responded_visible_message if a successful visible message tool call exists.
- responded_non_visible_tool if a successful meaningful non-visible tool call exists.
- responded_tool_call only as a backward-compatible generic response if the implementation cannot distinguish visible vs non-visible tool classes.
- responded_plain_text if non-empty assistant text exists.
- tool_error if errored meaningful tool calls exist and no success signal exists.
- empty_assistant_turn if direct assistant child is empty or session is idle with no child.
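The classification order above can be sketched as a pure function. This is a minimal illustration, not the orchestrator's implementation: the ObservationInput, AssistantChildView, and ToolCallView shapes are hypothetical simplifications of the reconcile summary, the generic responded_tool_call state is left out because the sketch can always tell visible from non-visible tools, and reasoning-only output is simply not modelled as text.

```typescript
// Hypothetical, simplified shapes for illustration; not the real summary types.
type RawStatus = 'busy' | 'retry' | 'idle';

interface ToolCallView {
  isError: boolean;
  running: boolean;
  visible: boolean; // true for a visible message tool such as message_send
  meaningful: boolean; // false for bootstrap/identity tools
}

interface AssistantChildView {
  text: string; // visible assistant text only, never reasoning text
  toolCalls: ToolCallView[];
}

interface ObservationInput {
  stale: boolean;
  pendingPermissionRequestIds: string[];
  promptIndexed: boolean;
  rawStatus: RawStatus;
  children: AssistantChildView[]; // direct assistant children of the delivered prompt
}

type ResponseState =
  | 'session_stale'
  | 'permission_blocked'
  | 'prompt_not_indexed'
  | 'pending'
  | 'responded_visible_message'
  | 'responded_non_visible_tool'
  | 'responded_plain_text'
  | 'tool_error'
  | 'empty_assistant_turn';

function classifyResponse(input: ObservationInput): ResponseState {
  if (input.stale) return 'session_stale';
  if (input.pendingPermissionRequestIds.length > 0) return 'permission_blocked';

  const busy = input.rawStatus === 'busy' || input.rawStatus === 'retry';
  if (!input.promptIndexed && busy) return 'prompt_not_indexed';

  const calls: ToolCallView[] = [];
  for (const child of input.children) calls.push(...child.toolCalls);
  if (busy || calls.some((call) => call.running)) return 'pending';

  const meaningful = calls.filter((call) => call.meaningful);
  const succeeded = meaningful.filter((call) => !call.isError);
  if (succeeded.some((call) => call.visible)) return 'responded_visible_message';
  if (succeeded.length > 0) return 'responded_non_visible_tool';

  if (input.children.some((child) => child.text.trim().length > 0)) return 'responded_plain_text';
  if (meaningful.some((call) => call.isError)) return 'tool_error';
  return 'empty_assistant_turn';
}
```

Because each check returns early, a later state is only reachable when every earlier condition is false, which is what keeps an errored tool call from masking a plain-text answer in the same turn.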
Response Sufficiency Policy
responseState is observation. Inbox read commit is policy.
Use this policy:
| Message intent | Sufficient to commit read | Not sufficient by itself |
|---|---|---|
| taskRefs.length > 0 | semantically sufficient visible message, plain text with concrete status, successful task tool, successful execution tool | bootstrap/identity tools, errored tools, ack-only visible message with no task/execution proof |
| actionMode: 'do' | semantically sufficient visible message, plain text with concrete status, successful meaningful execution or task tool | bootstrap/identity tools, errored tools, ack-only visible message with no tool proof |
| actionMode: 'delegate' | visible message that names target/action/status, plain text with concrete status, successful task tool or message_send delegation/status | unrelated execution-only tool with no task/message effect, ack-only text |
| actionMode: 'ask' | visible message or plain text that contains an answer, blocking question, or concrete status | non-visible execution/task lookup only, ack-only text |
| no action mode and no task refs | visible message or plain text that contains an answer, blocking question, or concrete status | non-visible tool activity only, ack-only text |
This avoids the most dangerous false success: a user asks an OpenCode teammate a question, the model performs a read, then goes idle without sending any answer. That should stay pending or become retryable; it should not mark the inbox row read.
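The policy table can be read as a small decision function. The sketch below is illustrative only: the Evidence labels and the MessageIntent shape are hypothetical names, not part of the plan, and it assumes taskRefs take precedence over actionMode when both are present (consistent with the rule that non-visible tools are insufficient only for ask rows without task refs).

```typescript
// Hypothetical evidence labels; the plan does not define this type.
type ActionMode = 'do' | 'delegate' | 'ask';

type Evidence =
  | 'visible_sufficient' // visible message judged semantically sufficient
  | 'visible_ack_only'
  | 'plain_text_concrete' // plain text with an answer, blocker, or concrete status
  | 'plain_text_ack_only'
  | 'task_tool'
  | 'execution_tool'
  | 'bootstrap_tool'
  | 'errored_tool';

interface MessageIntent {
  actionMode?: ActionMode;
  taskRefs: string[];
}

function isSufficientForReadCommit(intent: MessageIntent, evidence: Evidence[]): boolean {
  const has = (label: Evidence): boolean => evidence.indexOf(label) !== -1;

  // A concrete visible message or concrete plain text is sufficient for every intent.
  if (has('visible_sufficient') || has('plain_text_concrete')) return true;

  // Task-linked rows and 'do' rows also accept a successful task or execution tool.
  if (intent.taskRefs.length > 0 || intent.actionMode === 'do') {
    return has('task_tool') || has('execution_tool');
  }

  // Delegation accepts a task tool, but not unrelated execution-only work.
  if (intent.actionMode === 'delegate') return has('task_tool');

  // 'ask' rows and rows with no intent need a visible or plain-text answer.
  return false;
}
```

With this shape, the dangerous case from the paragraph above falls out directly: an ask row with only an execution tool as evidence returns false, so the inbox row stays unread.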
Before committing read for responded_visible_message, the app should prefer destination-store proof:
const visibleReply = await findVisibleReplyByRelayOfMessageId({
teamName,
replyRecipient,
from: memberName,
relayOfMessageId: inboxMessageId,
});
If destination proof exists, commit read even if the bounded transcript observation is still delayed. The MCP tool call already wrote the canonical app row, which is stronger than waiting for OpenCode transcript indexing.
If destination proof does not exist but transcript proof shows a successful direct-child message_send, keep the normal transcript-based policy and log visible_reply_destination_not_found_yet. Do not immediately retry; re-observe/re-scan once because inbox writes and transcript reconciliation can race in either direction.
If responded_non_visible_tool is not sufficient for the message intent:
- keep the ledger non-terminal as accepted,
- store responseState: 'responded_non_visible_tool',
- set lastReason: 'visible_reply_still_required',
- re-observe after a short delay,
- after grace, retry with copy asking for a visible agent-teams_message_send or plain answer.
Bridge Contract Extension
Extend both sides of the OpenCode bridge:
export interface OpenCodeSendMessageCommandData {
accepted: boolean;
sessionId?: string;
memberName: string;
runtimePid?: number;
prePromptCursor?: string | null;
responseObservation?: OpenCodeDeliveryResponseObservation;
diagnostics: OpenCodeTeamBridgeDiagnostic[];
}
Orchestrator runSendMessage() should still return accepted: true after promptAsync() succeeds, even if response observation is pending or unanswered.
Reason: acceptance and response are different facts. The bridge should not turn post-accept reconcile weakness into a prompt delivery failure.
Add Observe-Only Bridge Command
The watchdog must be able to inspect a previous accepted prompt without sending a new prompt.
Add a new command:
export interface OpenCodeObserveMessageDeliveryCommandBody {
runId?: string;
laneId: string;
teamId: string;
teamName: string;
projectPath: string;
memberName: string;
messageId: string;
prePromptCursor?: string | null;
}
export interface OpenCodeObserveMessageDeliveryCommandData {
observed: boolean;
sessionId?: string;
memberName: string;
runtimePid?: number;
responseObservation: OpenCodeDeliveryResponseObservation;
diagnostics: OpenCodeTeamBridgeDiagnostic[];
}
Recommended command name:
opencode.observeMessageDelivery
Why this is required:
- Calling opencode.sendMessage for "check before retry" would create a duplicate prompt.
- claude_team does not have direct OpenCode transcript access.
- The orchestrator already owns session records, stale-session detection, host access, and transcript projection.
observeMessageDelivery should run the same runtime/session validation as sendMessage except it must not call promptAsync().
This is intentionally not an app-side state-changing bridge command.
It is acceptable to reuse the orchestrator-side runtime precondition validator even if the current helper name includes "StateChanging". The hard boundary is app-side command execution: observe must not use the state-changing command service or its command ledger.
Concrete bridge wiring:
- Add opencode.observeMessageDelivery to OpenCodeBridgeCommandName and VALID_COMMANDS in OpenCodeBridgeCommandContract.ts.
- Add it to orchestrator SUPPORTED_COMMANDS and the dispatch switch in OpenCodeBridgeCommandHandler.ts.
- Add observeOpenCodeTeamMessageDelivery() to OpenCodeReadinessBridge.
- Add observeMessageDelivery() to OpenCodeTeamRuntimeAdapter and the bridge port type.
- Use direct this.bridge.execute('opencode.observeMessageDelivery', ...), matching current sendOpenCodeTeamMessage().
- Do not add it to OpenCodeStateChangingTeamCommandName.
- Do not route it through OpenCodeStateChangingBridgeCommandService.
- Do not create command-ledger entries or runtime-store manifest commits for observe.
- Use a short observe timeout separate from send timeout, for example OPENCODE_OBSERVE_MESSAGE_TIMEOUT_MS = 8_000.
Reason: observe is read-only from the app perspective. It reconciles an existing OpenCode session and returns response evidence. Treating it as a state-changing command would add lease/idempotency behavior intended for launch/reconcile/stop and would create failure modes unrelated to delivery observation.
Clarification: reconcileSession() may still update the orchestrator's own session record, for example stale flags or last cursor. That is acceptable. The constraint is that the app bridge must not treat observe as a launch/reconcile/stop command and must not commit app runtime-store manifests for it.
Implementation sketch:
async function runObserveMessageDelivery(envelope: BridgeEnvelope) {
const record = await openCodeSessionStore.get(teamId, laneId, memberName);
const limited = await openCodeSessionBridge.reconcileSession(record, { limit: 80 });
let responseObservation = observeOpenCodeDeliveryResponse({
inboundMessageId: messageId,
prePromptCursor,
summary: limited.summary,
});
if (responseObservation.needsFullHistory === true) {
const full = await openCodeSessionBridge.reconcileSession(record);
responseObservation = observeOpenCodeDeliveryResponse({
inboundMessageId: messageId,
prePromptCursor,
summary: full.summary,
});
}
return { snapshot, data: { observed: true, responseObservation, ... } };
}
Watchdog flow must be:
- Observe existing prompt.
- If response is now proven, commit read.
- If pending or blocked, wait.
- If unanswered and retry due, send a retry prompt.
Never retry first and observe second.
Retry Policy
Recommended defaults:
const OPENCODE_PROMPT_DELIVERY_MAX_ATTEMPTS = 3;
const OPENCODE_PROMPT_DELIVERY_RETRY_DELAYS_MS = [30_000, 90_000, 180_000];
const OPENCODE_PROMPT_RESPONSE_GRACE_MS = 20_000;
const OPENCODE_PROMPT_RESPONSE_GRACE_FOR_TASK_MS = 45_000;
const OPENCODE_PROMPT_WATCHDOG_SCAN_MS = 15_000;
Rules:
- UI send gets one immediate bridge attempt.
- Watcher delivery gets one immediate bridge attempt.
- Watchdog always calls opencode.observeMessageDelivery before deciding to retry.
- If OpenCode is busy or retry, keep the record accepted or pending and scan later.
- If OpenCode is idle with empty/no response after grace period, mark unanswered.
- Schedule retry if attempts remain.
- Retry prompt should include a short retry header:
<opencode_delivery_retry>
Previous app message delivery was accepted by OpenCode but no action was observed.
Retry attempt 2/3 for inbound app messageId "<messageId>".
If you already acted on this message, do not duplicate work; send a concrete status via agent-teams_message_send with relayOfMessageId="<messageId>" or update the related task. Do not reply only with acknowledgement.
</opencode_delivery_retry>
For direct ask messages where OpenCode already used non-visible tools but did not answer visibly, use a different header:
<opencode_delivery_retry>
Previous app message delivery was noticed, but no visible answer was observed.
Retry attempt 2/3 for inbound app messageId "<messageId>".
Please reply with agent-teams_message_send to "<replyRecipient>" and include relayOfMessageId="<messageId>". If that tool is unavailable, provide a concise plain-text answer. Do not repeat tool work unless needed and do not reply only with acknowledgement.
</opencode_delivery_retry>
- After attempts are exhausted, mark failed_terminal.
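As a rough sketch of how the defaults and the retry header fit together, the helpers below compute the delay before the next attempt and render the header text. The function names nextRetryDelayMs and buildRetryHeader are hypothetical, and the mapping from attempt count to delay (the delay before attempt N+1 is the entry at index N-1) is an assumption; the plan does not state it.

```typescript
const OPENCODE_PROMPT_DELIVERY_MAX_ATTEMPTS = 3;
const OPENCODE_PROMPT_DELIVERY_RETRY_DELAYS_MS = [30_000, 90_000, 180_000];

// Assumption: the delay before attempt N+1 is RETRY_DELAYS_MS[N-1].
// Returns null when no further attempt is allowed.
function nextRetryDelayMs(attemptsSoFar: number): number | null {
  if (attemptsSoFar >= OPENCODE_PROMPT_DELIVERY_MAX_ATTEMPTS) return null;
  const index = Math.max(attemptsSoFar, 1) - 1;
  const delay = OPENCODE_PROMPT_DELIVERY_RETRY_DELAYS_MS[index];
  return delay === undefined ? null : delay;
}

// Renders the generic retry header for the given upcoming attempt number.
// The messageId is interpolated as-is; escaping is out of scope for this sketch.
function buildRetryHeader(messageId: string, nextAttempt: number): string {
  return [
    '<opencode_delivery_retry>',
    'Previous app message delivery was accepted by OpenCode but no action was observed.',
    `Retry attempt ${nextAttempt}/${OPENCODE_PROMPT_DELIVERY_MAX_ATTEMPTS} for inbound app messageId "${messageId}".`,
    `If you already acted on this message, do not duplicate work; send a concrete status via agent-teams_message_send with relayOfMessageId="${messageId}" or update the related task. Do not reply only with acknowledgement.`,
    '</opencode_delivery_retry>',
  ].join('\n');
}
```

Under this assumption only the first two delays are used when the maximum is three attempts; the third entry would matter only if the cap were raised.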
Acceptance-Unknown Bridge Failures
Some failures happen at the command boundary after the prompt may already have reached OpenCode. Example: the app bridge times out while the orchestrator is still inside or just after promptAsync().
For these cases:
- Create or update the ledger record as failed_retryable.
- Set acceptanceUnknown: true.
- Store responseState: 'not_observed' and the bridge timeout diagnostic.
- The next watchdog action must be opencode.observeMessageDelivery.
- Only retry the prompt if observe finds no matching delivered user prompt or meaningful response after the normal grace window.
Do not immediately retry acceptance-unknown failures. That would turn a transport timeout into duplicate user prompts.
UI Timeout Is Not Delivery Failure
The current IPC send path wraps OpenCode live relay in a short UI timeout. That timeout protects the renderer, but it does not necessarily cancel the underlying relay promise. The relay may still accept the prompt and update the ledger after the UI receives a timeout-shaped result.
Therefore:
- OPENCODE_RUNTIME_DELIVERY_UI_TIMEOUT_MS should produce runtimeDelivery.responsePending = true, not a terminal failed delivery.
- The ledger record should be left pending or failed_retryable with acceptanceUnknown: true.
- The watchdog must observe before any retry.
- The UI copy should say that live delivery is still being checked, not that the message definitively failed.
- If the underlying relay later records response proof, normal read commit should complete.
Suggested reason code:
'opencode_runtime_delivery_ui_timeout_pending'
Do not reuse the old hard-failure copy for this state.
Retry Eligibility Matrix
| Observation state | Retry? | Commit inbox read? | Notes |
|---|---|---|---|
| responded_tool_call | Depends | Depends | Generic responded state, prefer more specific visible/non-visible states when possible. Apply response sufficiency policy. |
| responded_visible_message | Depends | Depends | Commit only when correlated and semantically sufficient for the message intent. Ack-only visible replies can require retry. |
| responded_non_visible_tool | Depends | Depends | Commit for task/do/delegate when sufficient. For ask/direct questions, require visible/plain reply. |
| responded_plain_text | No | Yes | Warn because app Messages may not show it as normal teammate reply. |
| permission_blocked | No | No | Wait for permission resolution. |
| prompt_not_indexed | No | No | Re-observe after short delay. |
| pending | No | No | Re-observe after short delay. |
| empty_assistant_turn | Yes, if due | No | Core retry path. |
| tool_error | Yes, if retryable | No | Retry only after grace and bounded attempts. |
| session_stale | Not immediately | No | Require lane/session recovery first. |
| reconcile_failed | Yes, if retryable | No | Prefer observe retry before prompt retry. |
| not_observed with acceptanceUnknown | Observe first | No | Never retry prompt before transcript observation. |
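One way to keep the watchdog consistent with this matrix is a single pure decision function that runs after every observe call. The sketch below is a simplified reading, with hypothetical names (NextStep, DecisionContext, decideNextStep). It assumes the sufficiency policy has already been evaluated, always re-observes on reconcile_failed instead of re-prompting, and treats a non-retryable tool_error as terminal; none of these choices is stated by the plan.

```typescript
type ObservationState =
  | 'responded_tool_call'
  | 'responded_visible_message'
  | 'responded_non_visible_tool'
  | 'responded_plain_text'
  | 'permission_blocked'
  | 'prompt_not_indexed'
  | 'pending'
  | 'empty_assistant_turn'
  | 'tool_error'
  | 'session_stale'
  | 'reconcile_failed'
  | 'not_observed';

// Hypothetical action labels for the watchdog.
type NextStep =
  | 'commit_read'
  | 'reobserve'
  | 'wait_for_permission'
  | 'recover_session'
  | 'retry_prompt'
  | 'fail_terminal';

interface DecisionContext {
  sufficientForIntent: boolean; // result of the response sufficiency policy
  retryDue: boolean; // grace elapsed and nextAttemptAt reached
  attemptsRemaining: boolean;
  retryable: boolean; // the observed error is classified as retryable
}

function decideNextStep(state: ObservationState, ctx: DecisionContext): NextStep {
  const retryOrWait = (): NextStep => {
    if (!ctx.retryDue) return 'reobserve';
    return ctx.attemptsRemaining ? 'retry_prompt' : 'fail_terminal';
  };

  switch (state) {
    case 'responded_plain_text':
      return 'commit_read';
    case 'responded_tool_call':
    case 'responded_visible_message':
    case 'responded_non_visible_tool':
      return ctx.sufficientForIntent ? 'commit_read' : retryOrWait();
    case 'permission_blocked':
      return 'wait_for_permission';
    case 'prompt_not_indexed':
    case 'pending':
    case 'reconcile_failed':
    case 'not_observed':
      return 'reobserve'; // never re-prompt before a successful observation
    case 'session_stale':
      return 'recover_session';
    case 'empty_assistant_turn':
      return retryOrWait();
    case 'tool_error':
      return ctx.retryable ? retryOrWait() : 'fail_terminal';
  }
}
```

Keeping the matrix in one function makes it easy to cover each row with a fixture test, and it keeps not_observed from ever reaching the prompt-retry branch.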
Single Outstanding Delivery Per OpenCode Member
This is important.
Current relay processes up to 10 unread rows per member in one loop. For OpenCode, that can hammer a model session with multiple prompts before it has acted on the first one. That increases the chance of empty turns, ignored tasks, or responses to the wrong prompt.
Recommended v1 behavior:
- For OpenCode members, process only the oldest eligible unread row if there is no active non-terminal ledger record for that member.
- If a record is accepted, pending, unanswered with future nextAttemptAt, or retry_scheduled, do not relay newer rows for that member yet.
- Once the active record becomes responded or failed_terminal, the next scan can pick the next inbox row.
- Rows with failed_terminal ledger records stay unread for visibility, but automatic relay selection must skip them unless the user explicitly retries that failed message.
This is not required for native teammates because native runtimes already consume inbox files directly and have their own event loops.
Required Serialization
Use one per-member OpenCode delivery mutex for all entry points:
- inbox file watcher relay,
- UI send relay,
- manual relay,
- watchdog observe/retry,
- startup recovery.
The existing openCodeMemberInboxRelayInFlight pattern is the right shape, but v1 must ensure the watchdog also goes through the same per-member gate. Do not create a second independent watchdog path that can race the relay path.
The mutex should cover:
- reading inbox rows,
- checking active ledger state,
- creating or resuming a ledger record,
- deciding whether a bridge call is allowed,
- applying the bridge result or observation.
Do not hold an inbox file lock across a bridge call. The durable guard is the ledger record, written before the bridge call. The in-memory per-member mutex prevents duplicate bridge calls inside one app process.
Lock ordering:
- Acquire per-member in-memory delivery gate.
- Mutate ledger with VersionedJsonStore.updateLocked(), then release ledger file lock.
- Call bridge if needed, with no inbox or ledger file lock held.
- Mutate ledger with result/observation, then release ledger file lock.
- If read commit is allowed, call markInboxMessagesRead() with inbox lock.
- Mutate ledger to record inboxReadCommittedAt or inboxReadCommitError.
Never hold ledger and inbox file locks at the same time. This keeps the watchdog compatible with existing inbox writer/reader paths and avoids lock inversion.
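The in-memory per-member gate can be as small as a promise chain keyed by member. The following is a minimal sketch under that assumption; KeyedMutex and memberGateKey are hypothetical names, and the existing openCodeMemberInboxRelayInFlight pattern may already provide equivalent behavior.

```typescript
// Serializes async tasks per key inside one process; different keys run independently.
class KeyedMutex {
  private readonly tails = new Map<string, Promise<void>>();

  run<T>(key: string, task: () => Promise<T>): Promise<T> {
    const previous = this.tails.get(key) ?? Promise.resolve();
    // The task starts only after the previous task for this key has settled.
    const result = previous.then(task);
    // The tail swallows rejections so one failed task does not poison the queue.
    const tail = result.then(
      () => undefined,
      () => undefined,
    );
    this.tails.set(key, tail);
    tail.then(() => {
      // Drop the entry once the queue for this key has drained.
      if (this.tails.get(key) === tail) this.tails.delete(key);
    });
    return result;
  }
}

// Hypothetical key builder matching the (teamName, laneId, memberName) grouping.
function memberGateKey(teamName: string, laneId: string, memberName: string): string {
  return [teamName, laneId, memberName].join('\u0000');
}
```

Relay, UI send, manual relay, watchdog, and startup recovery would all call run() with the same key for a member. File locks are taken and released inside the task, so the gate itself never spans a held ledger or inbox lock.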
If another call arrives with onlyMessageId while a different message is active:
- wait for the existing per-member work to finish,
- re-read the ledger and inbox,
- return queued/pending for the new message if it is still unread,
- do not report the new message as delivered just because the older in-flight relay finished.
Active Record Definition
A record blocks newer OpenCode inbox rows for the same member if:
record.status !== 'responded' &&
record.status !== 'failed_terminal'
or if:
record.status === 'responded' && record.inboxReadCommittedAt == null
The second case blocks newer rows only until the read commit is retried. It should not send another prompt.
Failed Terminal Rows
failed_terminal is terminal for automatic delivery attempts, not for user visibility.
Rules:
- Do not mark the inbox row read automatically.
- Do not select that row again in normal oldest-unread relay scans.
- Do not let that row block newer unread rows for the same member.
- Show diagnostics on the member/card/message surface.
- Allow a future explicit manual retry action to reset the ledger record or create a new attempt.
This avoids both bad outcomes: hiding a failed message by marking it read, or permanently blocking the member queue behind a failed unread row.
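The active-record definition and the failed-terminal rules can be combined into two small pure helpers. This is an illustrative sketch only: InboxRowView and LedgerRecordView are hypothetical reduced shapes, the status union is assembled from states named in this plan, and timestamps are assumed to be ISO 8601 UTC strings so that string order equals time order.

```typescript
type LedgerStatus =
  | 'pending'
  | 'accepted'
  | 'unanswered'
  | 'retry_scheduled'
  | 'failed_retryable'
  | 'responded'
  | 'failed_terminal';

interface InboxRowView {
  messageId: string;
  timestamp: string; // assumed ISO 8601 UTC
  read: boolean;
}

interface LedgerRecordView {
  inboxMessageId: string;
  status: LedgerStatus;
  inboxReadCommittedAt: string | null;
}

// True when this record must hold back newer inbox rows for the same member.
function blocksNewerRows(record: LedgerRecordView): boolean {
  if (record.status === 'failed_terminal') return false;
  if (record.status === 'responded') return record.inboxReadCommittedAt === null;
  return true;
}

// Picks the oldest unread row, skipping failed_terminal rows unless explicitly targeted.
function selectOldestEligibleUnreadMessage(
  rows: InboxRowView[],
  records: LedgerRecordView[],
  onlyMessageId?: string,
): InboxRowView | null {
  const statusById = new Map<string, LedgerStatus>();
  for (const record of records) statusById.set(record.inboxMessageId, record.status);

  const eligible = rows.filter((row) => {
    if (row.read) return false;
    if (onlyMessageId !== undefined) return row.messageId === onlyMessageId;
    return statusById.get(row.messageId) !== 'failed_terminal';
  });
  eligible.sort((a, b) => (a.timestamp < b.timestamp ? -1 : a.timestamp > b.timestamp ? 1 : 0));

  const first = eligible[0];
  return first === undefined ? null : first;
}
```

A failed unread row therefore stays visible in the inbox but neither blocks the queue nor gets re-sent, unless a manual retry passes its id as onlyMessageId.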
UI Send Exception
If the user sends a direct UI message to a member while an older OpenCode delivery is active:
Recommended v1 behavior:
- Persist the new inbox row as usual.
- Return runtimeDelivery.responsePending = true.
- Do not bypass the active record and send the new prompt immediately.
- Show copy like
Message saved. OpenCode is still processing an earlier delivery.
Reason: letting UI sends bypass the ordering rule can reintroduce the same prompt collision that caused lost task assignments.
App-Side Relay Changes
New Service
Add:
src/main/services/team/opencode/delivery/OpenCodePromptDeliveryLedger.ts
Responsibilities:
- Create or read records by teamName/memberName/laneId/inboxMessageId.
- Enforce payload hash consistency.
- Transition records with validation.
- List due retry records.
- List active member records.
- Prune terminal records older than a retention window.
- Use VersionedJsonStore.updateLocked() for every mutation.
Suggested retention:
const RESPONDED_RETENTION_MS = 7 * 24 * 60 * 60 * 1000;
const FAILED_RETENTION_MS = 30 * 24 * 60 * 60 * 1000;
Ledger API Surface
Recommended methods:
interface OpenCodePromptDeliveryLedgerStore {
ensurePending(input: EnsurePromptDeliveryInput): Promise<OpenCodePromptDeliveryLedgerRecord>;
getByInboxMessage(input: DeliveryRecordKeyInput): Promise<OpenCodePromptDeliveryLedgerRecord | null>;
getActiveForMember(input: { teamName: string; laneId: string; memberName: string }): Promise<OpenCodePromptDeliveryLedgerRecord | null>;
listDue(input: { teamName?: string; now: Date; limit: number }): Promise<OpenCodePromptDeliveryLedgerRecord[]>;
markAccepted(input: MarkAcceptedInput): Promise<OpenCodePromptDeliveryLedgerRecord>;
applyObservation(input: ApplyObservationInput): Promise<OpenCodePromptDeliveryLedgerRecord>;
applyDestinationProof(input: ApplyDestinationProofInput): Promise<OpenCodePromptDeliveryLedgerRecord>;
markRetryScheduled(input: MarkRetryScheduledInput): Promise<OpenCodePromptDeliveryLedgerRecord>;
markRetryAttempted(input: MarkRetryAttemptedInput): Promise<OpenCodePromptDeliveryLedgerRecord>;
markInboxReadCommitted(input: { id: string; committedAt: string }): Promise<OpenCodePromptDeliveryLedgerRecord>;
markInboxReadCommitFailed(input: { id: string; error: string }): Promise<OpenCodePromptDeliveryLedgerRecord>;
markFailed(input: MarkFailedInput): Promise<OpenCodePromptDeliveryLedgerRecord>;
}
Suggested destination proof input:
interface ApplyDestinationProofInput {
id: string;
visibleReplyInbox: string;
visibleReplyMessageId: string;
visibleReplyCorrelation: 'relayOfMessageId';
visibleReplyText: string;
visibleReplySummary?: string | null;
semanticallySufficient: boolean;
observedAt: string;
}
applyObservation() must be idempotent. Re-observing a previously responded record must not change attempts or schedule another prompt.
applyDestinationProof() must also be idempotent. It should set responseState = 'responded_visible_message' and visible-reply fields without incrementing attempts. It should set status = 'responded' and respondedAt only when semanticallySufficient === true; otherwise keep the delivery non-terminal and set lastReason = 'visible_reply_ack_only_still_requires_answer'.
Implementation constraints:
- Use the existing RuntimeDeliveryJournalStore style as a template, but do not copy its begin() attempt semantics blindly.
- ensurePending() must not increment attempts when a duplicate watcher event sees the same pending or accepted record.
- Increment attempts only when a new prompt attempt is actually about to be sent.
- markRetryAttempted() should be the only normal transition that increments attempts after the initial attempt.
- Validate duplicate ids on read, like RuntimeDeliveryJournalStore does.
- A payload hash conflict is terminal for that record and must not send a prompt.
- Ledger mutation methods should return the updated record so relay/watchdog code never reasons from stale in-memory copies.
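The attempt-counting constraints are easiest to see in a toy in-memory ledger. The sketch below is not the VersionedJsonStore-backed implementation: InMemoryPromptLedger is a hypothetical stand-in, it assumes a new record starts with attempts = 1 for the initial attempt, and the reason code payload_hash_conflict is invented for illustration.

```typescript
type LedgerStatus = 'pending' | 'accepted' | 'responded' | 'failed_terminal';

interface LedgerRecord {
  id: string;
  payloadHash: string;
  status: LedgerStatus;
  attempts: number;
  lastReason: string | null;
}

class InMemoryPromptLedger {
  private readonly records = new Map<string, LedgerRecord>();

  // Idempotent: a duplicate watcher event for the same payload never bumps attempts.
  ensurePending(id: string, payloadHash: string): LedgerRecord {
    const existing = this.records.get(id);
    if (existing === undefined) {
      // Assumption: the record is created for the initial attempt, so attempts starts at 1.
      const created: LedgerRecord = { id, payloadHash, status: 'pending', attempts: 1, lastReason: null };
      this.records.set(id, created);
      return { ...created };
    }
    if (existing.payloadHash !== payloadHash) {
      // A payload hash conflict is terminal; callers must not send a prompt.
      existing.status = 'failed_terminal';
      existing.lastReason = 'payload_hash_conflict'; // hypothetical reason code
    }
    return { ...existing };
  }

  // The only transition in this sketch that increments attempts after creation.
  markRetryAttempted(id: string): LedgerRecord {
    const record = this.records.get(id);
    if (record === undefined) throw new Error(`unknown ledger record: ${id}`);
    if (record.status === 'responded' || record.status === 'failed_terminal') {
      throw new Error(`record ${id} is terminal (${record.status})`);
    }
    record.attempts += 1;
    return { ...record };
  }
}
```

Every method returns a copy of the updated record, so relay and watchdog code always continues from the state the ledger actually stored.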
TeamProvisioningService Relay Flow
Current simplified flow:
for (const message of unread.slice(0, 10)) {
const delivery = await deliverOpenCodeMemberMessage(...);
if (delivery.delivered) {
await markInboxMessagesRead(...);
}
}
New flow:
const active = await promptDeliveryLedger.getActiveForMember({ teamName, laneId, memberName });
if (active && !isDueForRetry(active, now)) {
return {
relayed: 0,
attempted: 0,
delivered: 0,
failed: 0,
lastDelivery: {
delivered: true,
responsePending: true,
reason: 'opencode_delivery_response_pending',
},
};
}
const message = selectOldestEligibleUnreadMessage(...);
let record = await promptDeliveryLedger.ensurePending(...);
const delivery = await deliverOpenCodeMemberMessage(...);
record = await promptDeliveryLedger.applyDeliveryResult(record.id, delivery);
const visibleReply = await findOpenCodeVisibleReplyByRelayOfMessageId({
teamName,
replyRecipient: record.replyRecipient,
from: memberName,
relayOfMessageId: record.inboxMessageId,
});
if (visibleReply) {
record = await promptDeliveryLedger.applyDestinationProof({
id: record.id,
visibleReplyInbox: visibleReply.inboxName,
visibleReplyMessageId: visibleReply.messageId,
visibleReplyText: visibleReply.text,
visibleReplySummary: visibleReply.summary,
visibleReplyCorrelation: 'relayOfMessageId',
semanticallySufficient: isVisibleReplySemanticallySufficient({
actionMode: record.actionMode,
taskRefs: record.taskRefs,
text: visibleReply.text,
summary: visibleReply.summary,
}),
observedAt: new Date().toISOString(),
});
}
if (isOpenCodeDeliveryReadCommitAllowed(delivery, record)) {
await markInboxMessagesRead(teamName, memberName, [message]);
await promptDeliveryLedger.markInboxReadCommitted(...);
}
selectOldestEligibleUnreadMessage() must skip unread rows whose ledger record is failed_terminal, unless options.onlyMessageId explicitly targets that message for a manual retry path.
isOpenCodeDeliveryReadCommitAllowed() should return true for:
- semantically sufficient visible response: responded_visible_message,
- plain assistant text: responded_plain_text,
- generic responded_tool_call only when it can be classified as sufficient for the message intent,
- non-visible tool activity only when actionMode/taskRefs make that sufficient.
It should return false for:
- responded_non_visible_tool on direct ask messages without task refs,
- tool_error,
- bootstrap/identity-only tools,
- pending,
- prompt_not_indexed,
- empty_assistant_turn,
- reconcile_failed,
- session_stale.
Destination proof helper:
async function findOpenCodeVisibleReplyByRelayOfMessageId(input: {
teamName: string;
replyRecipient: string;
from: string;
relayOfMessageId: string;
}): Promise<{
inboxName: string;
messageId: string;
timestamp: string;
text: string;
summary: string | null;
} | null>
Rules:
- For replyRecipient === 'user', read inboxes/user.json.
- For lead aliases, resolve to the configured lead inbox name.
- For same-team teammate recipients, resolve to that teammate inbox name.
- Require exact from match after member-name canonicalization.
- Require exact relayOfMessageId match.
- Prefer a row with source === 'runtime_delivery', but allow missing source only as fallback with diagnostic visible_reply_missing_runtime_delivery_source.
- Do not match by text, summary, timestamp, or task display id.
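The matching rules above reduce to a pure function over rows that have already been read from the resolved destination inbox. The sketch below covers only that matching step. DestinationRow, VisibleReplyMatch, and matchVisibleReply are hypothetical names, and the canonicalization shown (trim plus lowercase) is an assumption standing in for the app's real member-name rule.

```typescript
interface DestinationRow {
  messageId: string;
  from: string;
  relayOfMessageId?: string;
  source?: string;
}

interface VisibleReplyMatch {
  messageId: string;
  diagnostics: string[];
}

// Assumption: canonicalization is trim + lowercase; the real rule may differ.
function canonicalMemberName(name: string): string {
  return name.trim().toLowerCase();
}

function matchVisibleReply(
  rows: DestinationRow[],
  from: string,
  relayOfMessageId: string,
): VisibleReplyMatch | null {
  const wantedFrom = canonicalMemberName(from);
  // Only sender and relayOfMessageId are compared; text, summary, and timestamp are ignored.
  const candidates = rows.filter(
    (row) => canonicalMemberName(row.from) === wantedFrom && row.relayOfMessageId === relayOfMessageId,
  );

  const preferred = candidates.filter((row) => row.source === 'runtime_delivery')[0];
  if (preferred !== undefined) return { messageId: preferred.messageId, diagnostics: [] };

  // Fallback: a row with no source at all, flagged with a diagnostic.
  const fallback = candidates.filter((row) => row.source === undefined)[0];
  if (fallback !== undefined) {
    return { messageId: fallback.messageId, diagnostics: ['visible_reply_missing_runtime_delivery_source'] };
  }
  return null;
}
```

Rows that carry a different, non-empty source are deliberately not matched in this sketch, since the rule allows only a missing source as the fallback.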
If destination proof appears after the first bridge response observation, the next watchdog scan should mark the delivery responded, commit the original OpenCode recipient row read, and not send another prompt.
applyDeliveryResult() must persist:
- acceptedAt when bridge accepted.
- prePromptCursor returned by the orchestrator.
- runtimeSessionId and runtimePid if present.
- acceptanceUnknown when a command-boundary timeout means prompt acceptance cannot be proven.
- responseState, deliveredUserMessageId, observedAssistantMessageId, and observedToolCallNames when observation is available.
- visibleReplyMessageId, visibleReplyInbox, and visibleReplyCorrelation when destination-store proof exists.
If delivery.responseState is already responded_* but markInboxMessagesRead() fails, the relay must not treat that as prompt failure. It should store inboxReadCommitError and retry the read commit on the next scan.
If an unread row already has a ledger record with status: 'responded' and inboxReadCommittedAt: null, relay should only retry markInboxMessagesRead().
Delivery Result Shape
Extend internal delivery result:
interface OpenCodeMemberInboxDelivery {
delivered: boolean; // bridge accepted or full response? See below.
accepted?: boolean;
responsePending?: boolean;
responseState?: OpenCodePromptResponseState;
ledgerStatus?: OpenCodePromptDeliveryStatus;
acceptanceUnknown?: boolean;
visibleReplyMessageId?: string;
visibleReplyCorrelation?: 'relayOfMessageId' | 'direct_child_message_send' | 'plain_assistant_text';
queuedBehindMessageId?: string;
reason?: string;
diagnostics?: string[];
}
For internal relay completion:
- accepted: true, responsePending: true, delivered: true is acceptable for UI feedback, but must not mark row read.
- responseState: responded_*, delivered: true, responsePending: false commits read.
- delivered: false means prompt acceptance failed.
For SendMessageResult.runtimeDelivery, keep backward compatibility:
runtimeDelivery: {
providerId: 'opencode',
attempted: true,
delivered: accepted,
responsePending: !responseState?.startsWith('responded_'), // true until a responded_* state is observed
responseState,
ledgerStatus,
acceptanceUnknown,
visibleReplyMessageId,
visibleReplyCorrelation,
queuedBehindMessageId,
reason,
diagnostics,
}
This avoids making UI send look like a hard failure when OpenCode accepted the message but has not responded yet.
Watchdog Scheduling
Add a lightweight app-side scheduler:
src/main/services/team/opencode/delivery/OpenCodePromptDeliveryWatchdog.ts
Responsibilities:
- Wake after inbox relay creates accepted, unanswered, or retry_scheduled.
- On app startup or team activation, scan due records.
- For each due record, first call the observe-only bridge path.
- If observe shows response proof, commit the inbox row read.
- If observe shows pending/blocked, reschedule without prompt retry.
- If observe shows unanswered and retry is due, call back into TeamProvisioningService.relayOpenCodeMemberInboxMessages(teamName, memberName, { onlyMessageId, source: 'watchdog' }).
- Cap concurrency globally and per member.
- Use the same per-member OpenCode delivery gate as relay, not a separate parallel executor for the same member.
Recommended caps:
const OPENCODE_PROMPT_WATCHDOG_GLOBAL_CONCURRENCY = 2;
const OPENCODE_PROMPT_WATCHDOG_PER_TEAM_CONCURRENCY = 1;
Do not use a tight interval. Use:
- a delayed timer for the nearest nextAttemptAt,
- plus opportunistic scans on inbox watcher events,
- plus app/team startup recovery.
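A delayed timer for the nearest nextAttemptAt could compute its delay roughly as follows. This is a sketch with hypothetical names (msUntilNearestAttempt, MIN_WATCHDOG_DELAY_MS); the one-second floor is an assumption added so that overdue records do not turn the timer into a tight loop.

```typescript
// Assumption: a floor of 1 second keeps overdue records from causing a tight loop.
const MIN_WATCHDOG_DELAY_MS = 1_000;

// Returns the delay until the earliest valid nextAttemptAt, or null when nothing is scheduled.
function msUntilNearestAttempt(nextAttemptAts: Array<string | null>, nowMs: number): number | null {
  let nearest: number | null = null;
  for (const value of nextAttemptAts) {
    if (value === null) continue;
    const at = Date.parse(value);
    if (Number.isNaN(at)) continue; // ignore unparseable timestamps
    if (nearest === null || at < nearest) nearest = at;
  }
  if (nearest === null) return null;
  return Math.max(nearest - nowMs, MIN_WATCHDOG_DELAY_MS);
}
```

The caller would pass the result to a single setTimeout and recompute it whenever a ledger mutation changes nextAttemptAt; a null result means no timer is needed until the next watcher event or startup scan.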
Startup Recovery Algorithm
On app startup or when a team detail view becomes active:
- Read OpenCode lane index.
- For each active lane, read lane-scoped prompt delivery ledger.
- For each responded record with inboxReadCommittedAt === null, retry only markInboxMessagesRead().
- For each accepted or retry_scheduled record, call opencode.observeMessageDelivery.
- For each due unanswered record, observe first, then retry only if still unanswered.
- For missing/quarantined prompt ledger files, reconstruct unread inbox rows as acceptanceUnknown and observe before prompt.
- Ignore records whose lane is no longer active unless the row is still unread and the team is running. In that case classify as failed_retryable with opencode_runtime_not_active.
Recovery must group work by (teamName, laneId, memberName) and run each group through the per-member gate. Do not let startup recovery observe/retry a member while the inbox watcher is also relaying that member.
This avoids retrying stale prompts immediately after app restart.
Orchestrator Response Observer
New Pure Helper
Add:
/Users/belief/dev/projects/claude/agent_teams_orchestrator/src/services/opencode/OpenCodeDeliveryResponseObserver.ts
Example:
export function observeOpenCodeDeliveryResponse(input: {
inboundMessageId: string | null;
prePromptCursor: string | null;
summary: OpenCodeSessionReconcileSummary;
}): OpenCodeDeliveryResponseObservation {
if (input.summary.reconcileOutcome === 'stale') {
return { state: 'session_stale', ... };
}
if (input.summary.pendingPermissionRequestIds.length > 0) {
return { state: 'permission_blocked', ... };
}
const deliveryAttempts = findDeliveredUserMessageAttempts(
input.summary.messages,
input.inboundMessageId,
input.prePromptCursor,
);
if (deliveryAttempts.length === 0) {
return input.summary.rawStatus === 'busy' || input.summary.rawStatus === 'retry'
? { state: 'pending', ... }
: { state: 'empty_assistant_turn', reason: 'delivered_user_message_not_found', ... };
}
const attemptChildren = deliveryAttempts.map((attempt) => ({
deliveredUser: attempt,
assistantChildren: input.summary.messages.filter(
(message) => message.role === 'assistant' && message.parentId === attempt.id
),
}));
return classifyDeliveryAttempts(attemptChildren, input.summary.rawStatus);
}
Keep this helper pure so fixture tests are cheap and deterministic.
Bridge Integration
Update runSendMessage():
const inboundMessageId = asString(body.messageId);
const prePromptCursor = record.lastCanonicalCursor ?? null;
await openCodeSessionBridge.promptAsync(...);
let responseObservation: OpenCodeDeliveryResponseObservation = {
state: 'pending',
...
};
try {
const reconciled = await withTimeout(...);
responseObservation = observeOpenCodeDeliveryResponse({
inboundMessageId,
prePromptCursor,
summary: reconciled.summary,
});
} catch (error) {
responseObservation = {
state: 'reconcile_failed',
reason: stringifyError(error),
...
};
}
return {
accepted: true,
prePromptCursor,
responseObservation,
...
};
The app ledger must persist the returned prePromptCursor with the delivery record. If the bridge cannot return it because the command failed before loading a session record, store null and rely only on exact messageId matching.
UI Semantics
UI should not synthesize replies.
For user sends to OpenCode:
- If inbox persistence succeeds and bridge acceptance fails: show warning as today.
- If bridge acceptance succeeds but response is pending: show a non-blocking warning/info:
Message delivered to OpenCode. Waiting for teammate response...
- If watchdog later marks failed_terminal: surface diagnostic in team warning area or member card, not as a fake message from the teammate.
Add fields to SendMessageResult.runtimeDelivery:
responsePending?: boolean;
responseState?: OpenCodePromptResponseState;
ledgerStatus?: OpenCodePromptDeliveryStatus;
acceptanceUnknown?: boolean;
visibleReplyMessageId?: string;
visibleReplyCorrelation?: 'relayOfMessageId' | 'direct_child_message_send' | 'plain_assistant_text';
queuedBehindMessageId?: string;
Renderer store should continue to return the SendMessageResult and only rethrow real send failures.
Important current UI detail:
- The renderer inserts an optimistic copy of the user's sent message with read: true in the sender-side message feed.
- That is separate from the recipient's inbox row and should not be changed by the OpenCode prompt ledger.
- The ledger controls only the inboxes/<opencode-member>.json read commit.
Warning copy should distinguish five cases:
- Bridge failed:
OpenCode runtime delivery failed: <reason>. Message was saved to inbox.
- UI timeout while delivery is still being checked:
OpenCode delivery is still being checked. Message was saved and will be observed before retry.
- Bridge accepted but response pending:
Message delivered to OpenCode. Waiting for teammate response...
- Message queued behind older OpenCode delivery:
Message saved. OpenCode is still processing an earlier delivery.
- Previous OpenCode delivery exhausted retries:
OpenCode did not respond after retries. Message is still unread and will not be retried automatically.
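The warning cases listed above can be kept in one place so the renderer never builds copy ad hoc. The sketch below is illustrative only: the type OpenCodeDeliveryWarningCase, its kind names, and the function openCodeDeliveryWarningCopy are hypothetical, while the strings are the ones from this plan.

```typescript
// Hypothetical discriminated union; the `kind` names are illustrative, not part of the plan.
type OpenCodeDeliveryWarningCase =
  | { kind: 'bridge_failed'; reason: string }
  | { kind: 'ui_timeout' }
  | { kind: 'response_pending' }
  | { kind: 'queued_behind_older' }
  | { kind: 'retries_exhausted' };

// Maps each warning case to the user-facing copy listed in this section.
function openCodeDeliveryWarningCopy(c: OpenCodeDeliveryWarningCase): string {
  switch (c.kind) {
    case 'bridge_failed':
      return `OpenCode runtime delivery failed: ${c.reason}. Message was saved to inbox.`;
    case 'ui_timeout':
      return 'OpenCode delivery is still being checked. Message was saved and will be observed before retry.';
    case 'response_pending':
      return 'Message delivered to OpenCode. Waiting for teammate response...';
    case 'queued_behind_older':
      return 'Message saved. OpenCode is still processing an earlier delivery.';
    case 'retries_exhausted':
      return 'OpenCode did not respond after retries. Message is still unread and will not be retried automatically.';
  }
}
```

A discriminated union makes the compiler flag any case that is added later without matching copy.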
Renderer pending-reply behavior:
- Keep the recipient in a pending state for responsePending: true.
- Clear pending when a real message from that member appears in inboxes/user.json.
- Clear pending when runtime delivery fails terminally.
- Do not clear pending merely because delivered: true; for OpenCode that can mean prompt accepted, not teammate answered.
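A minimal sketch of the pending-reply rule, assuming the renderer can derive the four booleans below. The interface PendingReplyInput and the function shouldShowPendingReply are hypothetical names, not existing store code.

```typescript
// Hypothetical input shape; field names other than responsePending and delivered are assumptions.
interface PendingReplyInput {
  responsePending: boolean;
  delivered: boolean; // for OpenCode this can mean "prompt accepted", so it is ignored below
  terminalFailure: boolean; // runtime delivery failed terminally
  realReplyInUserInbox: boolean; // a real message from that member appeared in inboxes/user.json
}

// Returns true while the recipient should stay in the pending-reply state.
function shouldShowPendingReply(input: PendingReplyInput): boolean {
  if (input.realReplyInUserInbox) return false; // a real reply clears pending
  if (input.terminalFailure) return false; // terminal failure clears pending
  // `delivered` is deliberately not consulted.
  return input.responsePending;
}
```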
Task Assignment Integration
Task assignment notifications are the highest-risk OpenCode path because they are often watcher-driven, not UI-driven.
Required behavior:
- Assignment inbox rows must include taskRefs.
- OpenCode delivery prompt already includes taskRefs when present.
- Watchdog response proof for task rows accepts task MCP tool calls as success.
- Direct user ask rows should not be closed by a non-visible tool call alone; they need visible message or plain assistant text.
- If OpenCode sends a visible message saying blocked or busy with concrete blocker/status, classify as responded and mark read.
- If OpenCode does nothing, retry bounded.
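The required behavior above amounts to a small per-intent decision. The following is a sketch under assumptions: DeliveryIntent, ResponseProof, and canCommitRead are hypothetical names, and substantiveVisibleMessage stands for a visible message that carries a concrete answer or status rather than a bare acknowledgement.

```typescript
// Hypothetical intent and proof shapes for illustration only.
type DeliveryIntent = 'ask' | 'task_assignment';

interface ResponseProof {
  taskToolCall: boolean; // a task MCP tool call such as task_get or task_start
  substantiveVisibleMessage: boolean; // visible message with concrete answer/blocker/status
  plainAssistantText: boolean; // plain text answer instead of a message tool call
}

// Decides whether the inbox row may be marked read for the given intent.
function canCommitRead(intent: DeliveryIntent, proof: ResponseProof): boolean {
  const visible = proof.substantiveVisibleMessage || proof.plainAssistantText;
  if (intent === 'ask') {
    // A non-visible tool call alone never closes a direct ask.
    return visible;
  }
  // Task rows accept task tool calls as well as a visible status message.
  return proof.taskToolCall || visible;
}
```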
Already fixed in controller during research:
- agent-teams-controller/src/internal/tasks.js now includes the assigned task as a structured taskRefs item in owner assignment notifications.
- Test added: includes the assigned task ref in owner assignment notifications.
This should be part of the final implementation baseline.
Delivery Watchdog Is Not A Task Stall Monitor
The delivery ledger should answer only one question:
Did OpenCode notice and act on this delivered inbox row?
It should not try to prove:
- the task is complete,
- the implementation is correct,
- the teammate is making enough progress after the first action.
If an agent responds to an assignment by calling task_get or task_start, the delivery should be considered responded. If the task later stalls, that belongs to the task stall monitor or a separate OpenCode task-progress watchdog.
This boundary avoids turning message delivery into a broad autonomous task supervisor.
Do not enable or rewrite the global task-stall monitor as part of this v1. It is a separate product behavior with different thresholds, recipients, and false-positive risks. The only v1 integration is that task-related MCP tool calls can prove prompt delivery.
Edge Cases
Duplicate inbox watcher events
Problem: leaving rows unread until responded means file watchers can repeatedly call relay.
Mitigation:
- Ledger lookup happens before any bridge call.
- If active record is not due, relay returns responsePending: true without sending another prompt.
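A minimal sketch of that gate, assuming the ledger exposes the active record for a member with a nextAttemptAt timestamp in milliseconds. ActiveDeliveryRecord, RelayDecision, and decideRelay are hypothetical names.

```typescript
// Hypothetical shapes; only nextAttemptAt and responsePending are named in this plan.
interface ActiveDeliveryRecord {
  inboxMessageId: string;
  nextAttemptAt: number; // epoch milliseconds
}

type RelayDecision =
  | { action: 'send_prompt' }
  | { action: 'skip'; responsePending: true }
  | { action: 'observe_then_retry' };

// Runs before any bridge call, so duplicate watcher events cannot re-prompt early.
function decideRelay(active: ActiveDeliveryRecord | undefined, nowMs: number): RelayDecision {
  if (!active) return { action: 'send_prompt' };
  if (nowMs < active.nextAttemptAt) return { action: 'skip', responsePending: true };
  // Due records are observed first; a retry prompt only follows if still unanswered.
  return { action: 'observe_then_retry' };
}
```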
App restart after accepted prompt
Problem: app exits after prompt acceptance but before response proof.
Mitigation:
- Ledger persists accepted.
- Startup recovery scans accepted records.
- If transcript now shows response proof, mark responded and commit inbox row read.
- If still unanswered and due, retry.
OpenCode session stale after behavior/config change
Problem: reconcileSession() can return stale because behavior fingerprint changed.
Mitigation:
- Observer returns session_stale.
- Ledger moves to failed_retryable with diagnostic.
- If active lane recovery later resolves session identity, retry can continue.
- If no active runtime can be resolved, move to failed_terminal after max attempts or team stop.
Permission request blocks the session
Problem: OpenCode may need tool permission approval.
Mitigation:
- Observer returns permission_blocked.
- Do not retry while blocked.
- Keep row unread.
- Existing permission UI should handle the request.
- After permission answer, watchdog scan should re-check response state.
Plain assistant text
Problem: OpenCode may answer as plain text instead of agent-teams_message_send.
Mitigation:
- Classify as responded_plain_text.
- Mark row read to avoid prompt spam.
- Emit diagnostic warning because app Messages may not display that as a normal teammate message unless transcript projection surfaces it elsewhere.
Do not retry plain text. Retrying would likely produce duplicate or contradictory answers.
Multiple unread messages
Problem: old code can deliver up to 10 unread OpenCode prompts in one loop.
Mitigation:
- One outstanding non-terminal ledger record per OpenCode member.
- Newer rows wait in inbox until the active one is answered or terminal.
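One way to express the one-outstanding rule is a selector over the unread rows. This is a sketch, not the relay implementation: InboxRow, RowLedgerState, and selectNextRowToPrompt are hypothetical, rows are assumed to be ordered oldest first, and failed_terminal rows are skipped as the plan requires elsewhere so they do not block newer rows.

```typescript
// Hypothetical shapes for illustration.
interface InboxRow {
  messageId: string;
  read: boolean;
}

// 'active' means a non-terminal ledger record exists for the row.
type RowLedgerState = 'none' | 'active' | 'failed_terminal';

// Returns the next row that may receive a fresh prompt, or undefined if none may.
function selectNextRowToPrompt(
  rows: InboxRow[],
  stateOf: (messageId: string) => RowLedgerState,
): InboxRow | undefined {
  const unread = rows.filter((row) => !row.read);
  // One outstanding non-terminal delivery per member: newer rows wait.
  if (unread.some((row) => stateOf(row.messageId) === 'active')) return undefined;
  // Terminal-failed rows stay unread but are not selected again.
  return unread.find((row) => stateOf(row.messageId) === 'none');
}
```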
Same text, multiple messages
Problem: duplicate text should not dedupe distinct messages.
Mitigation:
- Primary key is inboxMessageId.
- Payload hash only detects corruption.
Message text changed under same messageId
Problem: inbox row edited after ledger creation.
Mitigation:
- If payload hash differs for same record ID, do not retry silently.
- Mark failed_terminal with opencode_prompt_delivery_payload_mismatch.
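The two rules above (key by inboxMessageId, use the payload hash only to detect changes) can be sketched as an upsert. The in-memory Map, LedgerRecordKey, UpsertResult, and upsertLedgerRecord are assumptions for illustration; the real store is persisted and versioned, and the hash is assumed to be computed by the caller.

```typescript
// Hypothetical minimal record; the real ledger record carries more fields.
interface LedgerRecordKey {
  inboxMessageId: string;
  payloadHash: string;
}

type UpsertResult =
  | { kind: 'created' }
  | { kind: 'idempotent' }
  | { kind: 'payload_mismatch'; diagnostic: 'opencode_prompt_delivery_payload_mismatch' };

// Same ID and hash is a no-op; same ID with a different hash fails loudly.
function upsertLedgerRecord(ledger: Map<string, LedgerRecordKey>, incoming: LedgerRecordKey): UpsertResult {
  const existing = ledger.get(incoming.inboxMessageId);
  if (!existing) {
    ledger.set(incoming.inboxMessageId, incoming);
    return { kind: 'created' };
  }
  if (existing.payloadHash === incoming.payloadHash) return { kind: 'idempotent' };
  return { kind: 'payload_mismatch', diagnostic: 'opencode_prompt_delivery_payload_mismatch' };
}
```

Two distinct messages with identical text produce two records because the hash is never used as a key.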
Team stopped while retries are scheduled
Problem: retries after stop would resurrect stale runtime assumptions.
Mitigation:
- Watchdog checks current team/run/lane before retry.
- If runtime stopped, mark failed_terminal or failed_retryable depending on whether team is still running.
Lead is OpenCode
Pure OpenCode lead remains unsupported in v1.
If recipient is OpenCode lead and no stored lead session exists:
- Do not fake delivery.
- Leave inbox unread.
- Emit explicit diagnostic.
This plan is for OpenCode secondary teammate delivery.
Attachments
Current renderer UX only supports file attachments for live team-lead sends. OpenCode secondary teammate live delivery does not have an attachment transport in v1.
Rules:
- Do not add OpenCode attachment transport in this watchdog work.
- If an OpenCode secondary inbox row has attachments anyway, include attachment metadata in payloadHash.
- Do not embed file contents into the OpenCode prompt.
- Return failed_terminal with opencode_attachments_not_supported_for_secondary_runtime unless a later implementation adds a real attachment delivery contract.
- Keep the row unread and surface the diagnostic.
Reason: silently delivering only text for an attachment-bearing message would be another form of message loss.
Implementation Phases
Phase 0b - MCP Visible Reply Correlation
Files:
- mcp-server/src/tools/messageTools.ts
- agent-teams-controller/src/internal/messages.js
- agent-teams-controller/src/internal/messageStore.js
- src/main/services/team/runtime/OpenCodeTeamRuntimeAdapter.ts
- src/main/services/team/TeamDataService.ts
- src/main/services/team/TeamMessageFeedService.ts
- src/renderer/store/slices/teamSlice.ts
- test/controller.test.js
Definition of done:
- message_send accepts optional relayOfMessageId.
- Controller passes relayOfMessageId through to the message store without inventing it.
- Message store persists relayOfMessageId on the visible reply row.
- OpenCode runtime prompt asks for relayOfMessageId="<inbound app messageId>" on visible runtime replies.
- OpenCode ack-only guard does not treat source alone as explicit delivery context.
- Existing renderer/feed correlation by relayOfMessageId continues to work for native and OpenCode messages.
- No existing message_send caller is required to provide relayOfMessageId; the field is only mandatory for OpenCode runtime-delivery prompt wording.
This phase is small but important. It gives the watchdog a hard correlation signal and prevents future implementation from falling back to summary/time heuristics.
Phase 1 - Orchestrator Response Observer
Files:
- /Users/belief/dev/projects/claude/agent_teams_orchestrator/src/services/opencode/OpenCodeDeliveryResponseObserver.ts
- /Users/belief/dev/projects/claude/agent_teams_orchestrator/src/services/opencode/OpenCodeBridgeCommandHandler.ts
- /Users/belief/dev/projects/claude/agent_teams_orchestrator/src/services/opencode/types.ts
- /Users/belief/dev/projects/claude/agent_teams_orchestrator/src/services/opencode/OpenCodeBridgeCommandHandler.test.ts
Definition of done:
- runSendMessage() still returns accepted: true after promptAsync() succeeds.
- Response observation is included when reconcile succeeds.
- Reconcile failure after prompt acceptance returns accepted: true with responseObservation.state = 'reconcile_failed'.
- Empty assistant child is classified as unanswered, not responded.
- Tool call child is classified as responded.
- Plain assistant text is classified as responded_plain_text.
- prompt_not_indexed is returned when async prompt acceptance has not appeared in history yet.
Phase 1b - Observe-Only Bridge Command
Files:
- /Users/belief/dev/projects/claude/agent_teams_orchestrator/src/services/opencode/OpenCodeBridgeCommandHandler.ts
- /Users/belief/dev/projects/claude/agent_teams_orchestrator/src/services/opencode/OpenCodeBridgeCommandHandler.test.ts
- src/main/services/team/opencode/bridge/OpenCodeBridgeCommandContract.ts
- src/main/services/team/opencode/bridge/OpenCodeReadinessBridge.ts
- src/main/services/team/runtime/OpenCodeTeamRuntimeAdapter.ts
Definition of done:
- opencode.observeMessageDelivery validates runtime/session preconditions.
- It does not call promptAsync().
- It is executed as a direct bridge command, like current opencode.sendMessage.
- It is not part of OpenCodeStateChangingTeamCommandName.
- It does not use OpenCodeStateChangingBridgeCommandService.
- It returns the same response observation shape as opencode.sendMessage.
- App watchdog uses this before every retry.
Phase 2 - Bridge Contract Propagation In claude_team
Files:
- src/main/services/team/opencode/bridge/OpenCodeBridgeCommandContract.ts
- src/main/services/team/runtime/OpenCodeTeamRuntimeAdapter.ts
- src/main/services/team/TeamProvisioningService.ts
- src/shared/types/team.ts
Definition of done:
- OpenCodeTeamRuntimeAdapter.sendMessageToMember() maps bridge responseObservation.
- OpenCodeTeamRuntimeAdapter.observeMessageDelivery() maps observe-only bridge responseObservation.
- deliverOpenCodeMemberMessage() returns accepted, responsePending, and responseState.
- IPC UI relay timeout maps to responsePending plus acceptanceUnknown, not terminal failure.
- UI SendMessageResult.runtimeDelivery can show pending response without treating it as send failure.
Phase 3 - Prompt Delivery Ledger
Files:
- src/main/services/team/opencode/delivery/OpenCodePromptDeliveryLedger.ts
- src/main/services/team/opencode/store/RuntimeStoreManifest.ts
- test/main/services/team/OpenCodePromptDeliveryLedger.test.ts
Definition of done:
- Store is versioned and validates records.
- Store is lane-scoped for secondary OpenCode lanes.
- RuntimeStoreSchemaName, OPENCODE_RUNTIME_STORE_DESCRIPTORS, and schema validators know opencode.promptDeliveryLedger.
- Duplicate record with same payload is idempotent.
- Same record ID with different payload hash fails loudly.
- Due retry query works.
- Terminal record pruning is deterministic.
Phase 4 - Relay Integration
Files:
- src/main/services/team/TeamProvisioningService.ts
- test/main/services/team/TeamProvisioningServiceRelay.test.ts
Definition of done:
- OpenCode row is not marked read at prompt acceptance if response is pending/unanswered.
- OpenCode row is marked read when response proof is responded_*.
- OpenCode row is not re-prompted when response is observed but read commit failed.
- Duplicate watcher events do not re-prompt before nextAttemptAt.
- One outstanding OpenCode delivery per member is enforced.
- Existing native relay tests still pass.
Phase 5 - Watchdog Scheduler
Files:
- src/main/services/team/opencode/delivery/OpenCodePromptDeliveryWatchdog.ts
- src/main/services/team/TeamProvisioningService.ts
- src/main/index.ts
- test/main/services/team/TeamProvisioningServiceRelay.test.ts
Definition of done:
- Startup/team activation scans due OpenCode prompt deliveries.
- File watcher events schedule nearest retry instead of tight loops.
- Bounded retry stops at maxAttempts.
- Team stop cancels or ignores due retries.
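A sketch of the bounded schedule, using the values this plan recommends under Open Questions (3 attempts, delays of 30s, 90s, 180s). The interpretation is an assumption: attempts counts prompts already sent, the Nth delay is the grace period after the Nth prompt, and the record becomes failed_terminal when the last grace period expires unanswered. The names nextAttemptAtMs and onDeadlineUnanswered are hypothetical.

```typescript
// Recommended values from this plan; treat them as tunable, not final.
const RETRY_DELAYS_MS: readonly number[] = [30000, 90000, 180000];
const MAX_ATTEMPTS = 3;

// Timestamp at which the watchdog should observe again after the given attempt.
function nextAttemptAtMs(attempts: number, nowMs: number): number {
  // Clamp so out-of-range attempt counts reuse the first or last delay.
  const index = Math.min(Math.max(attempts, 1), RETRY_DELAYS_MS.length) - 1;
  const delay = RETRY_DELAYS_MS[index] ?? 0;
  return nowMs + delay;
}

// Called when the grace period expired and observation still shows no response.
function onDeadlineUnanswered(attempts: number): 'retry' | 'failed_terminal' {
  return attempts < MAX_ATTEMPTS ? 'retry' : 'failed_terminal';
}
```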
Phase 6 - UI Diagnostics
Files:
- src/shared/types/team.ts
- src/renderer/store/slices/teamSlice.ts
- src/renderer/components/team/messages/MessageComposer.tsx
- src/renderer/components/team/dialogs/SendMessageDialog.tsx
- src/renderer/components/team/messages/MessagesPanel.tsx
Definition of done:
- UI send does not clear as hard failure when OpenCode accepted but response pending.
- UI shows a clear pending response diagnostic.
- UI does not synthesize teammate replies.
- Real bridge failure still shows warning and preserves draft behavior.
Phase 7 - Fixture E2E Coverage
Files:
- /Users/belief/dev/projects/claude/agent_teams_orchestrator/src/services/opencode/fixtures/
- test/fixtures/team/opencode/
- new orchestrator observer tests.
- new app relay ledger tests.
Required fixture scenarios:
- Real success with OpenCode task/message tool calls.
- Empty assistant turn after delivered prompt.
- Session stale after prompt acceptance.
- Permission blocked.
- Plain assistant text fallback.
- Multiple unread rows with one active outstanding record.
- App restart with accepted ledger record and later transcript proof.
- Long transcript where limited reconcile misses the delivered prompt and full-history fallback finds it.
- Bridge timeout after possible prompt acceptance, followed by observe-before-retry.
Minimum Real-Fixture Assertions
For every real OpenCode fixture test, assert all of these:
- Delivered prompt is matched by inbound messageId, not by latest-user heuristic.
- Multiple retry prompts with the same inbound messageId are treated as one logical delivery.
- Multiple assistant children under one delivered prompt are all inspected.
- Empty assistant child does not count as responded.
- Bootstrap-only tool calls do not count as responded.
- Successful agent-teams_message_send counts as visible response, but read commit still requires correlation and semantic sufficiency.
- Successful task tool call counts as delivery response for task assignment.
- Errored agent-teams_message_send does not mark inbox read.
- Plain text response counts as responded but emits diagnostic.
- Tool names are normalized across agent-teams_, mcp__agent-teams__, and plain forms.
- Lowercase OpenCode execution tools such as bash and read count only when actionMode allows execution response proof.
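The tool-name normalization assertion can be illustrated with a small prefix stripper. This is a sketch that assumes the two prefixes named above are the only ones in play; normalizeToolName and TOOL_NAME_PREFIXES are hypothetical names, not the observer's actual helper.

```typescript
// Prefixes taken from this plan; any other prefix is left untouched.
const TOOL_NAME_PREFIXES: readonly string[] = ['mcp__agent-teams__', 'agent-teams_'];

// Reduces a prefixed tool name to its plain form, e.g. for matching message_send.
function normalizeToolName(raw: string): string {
  for (const prefix of TOOL_NAME_PREFIXES) {
    if (raw.startsWith(prefix)) return raw.slice(prefix.length);
  }
  return raw;
}
```

With this, agent-teams_message_send, mcp__agent-teams__message_send, and message_send all compare equal after normalization, while execution tools such as bash pass through unchanged.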
Test Plan
Orchestrator
cd /Users/belief/dev/projects/claude/agent_teams_orchestrator
bun test src/services/opencode/OpenCodeBridgeCommandHandler.test.ts src/services/opencode/OpenCodeDeliveryResponseObserver.test.ts
Target tests:
- sendMessage returns accepted with responded_tool_call when child assistant has MCP tool call.
- sendMessage returns visibleReplyCorrelation relayOfMessageId when message_send args reference inbound messageId.
- sendMessage returns visible_reply_missing_relayOfMessageId diagnostic when direct-child message_send lacks relayOfMessageId.
- sendMessage returns accepted with empty_assistant_turn when child assistant has no text/tool/error/finish.
- sendMessage returns accepted with pending when OpenCode is busy and no child exists.
- sendMessage returns accepted with permission_blocked when pending permission exists.
- sendMessage returns accepted with reconcile_failed warning when reconcile throws.
- observeMessageDelivery does not call promptAsync.
- observeMessageDelivery returns prompt_not_indexed for accepted prompt not visible in transcript yet.
- observeMessageDelivery uses full-history fallback when limited reconcile misses the prompt anchor.
- observeMessageDelivery is not routed through state-changing bridge command service.
- observer normalizes agent-teams, mcp agent-teams, and plain tool names.
- Fixture test for real OpenCode transcript projection.
App main process
cd /Users/belief/dev/projects/claude/claude_team
pnpm vitest run \
test/main/services/team/TeamProvisioningServiceRelay.test.ts \
test/main/services/team/OpenCodeTeamRuntimeAdapter.test.ts \
test/main/ipc/teams.test.ts
Target tests:
- visible reply with relayOfMessageId commits OpenCode inbox read even when transcript observe is still prompt_not_indexed.
- visible reply destination row is preferred over transcript-only response proof.
- transcript message_send without destination row is re-observed once before retry.
- correlated ack-only visible reply to ask message stays unread and schedules visible answer retry.
- correlated ack-only visible reply to task assignment stays unread unless task or execution proof exists.
- semantic sufficiency classifier only blocks short exact ack-only phrases and allows concrete status text.
- accepted pending OpenCode delivery keeps inbox row unread and ledger accepted.
- responded OpenCode delivery marks inbox row read.
- direct ask with only non-visible tool activity stays unread and requests visible reply.
- reasoning-only assistant child does not mark direct ask read.
- task assignment with task_start marks delivery responded and read.
- missing ledger rebuilds unread row as acceptanceUnknown and observes before prompt.
- failed_terminal unread row is skipped by automatic relay and does not block newer rows.
- OpenCode secondary row with attachments fails terminal with attachment unsupported diagnostic and is not text-only delivered.
- pending record is not retried into a different lane after recipient provider/model changes.
- responded but mark-read failed stores inboxReadCommitError and does not retry prompt.
- duplicate watcher while accepted does not call bridge again.
- watchdog and watcher share one per-member delivery gate.
- onlyMessageId request behind active older delivery returns queued pending, not delivered.
- due unanswered row retries once and increments attempts.
- watchdog observes before retry and skips prompt when response appeared after previous timeout.
- bridge timeout with acceptanceUnknown observes before retrying.
- max attempts moves to failed_terminal.
- one outstanding record blocks newer unread rows.
- app restart scans accepted ledger and commits read when transcript now proves response.
Controller
cd /Users/belief/dev/projects/claude/claude_team
pnpm --filter agent-teams-controller test -- test/controller.test.js
Must stay green, including assignment notification taskRefs.
Additional target tests:
- message_send accepts relayOfMessageId and persists it.
- OpenCode ack-only runtime_delivery without relayOfMessageId is rejected.
- OpenCode substantive runtime_delivery without relayOfMessageId is allowed but not treated as correlated delivery proof.
- source alone does not bypass idle ack filtering for OpenCode senders.
Renderer
cd /Users/belief/dev/projects/claude/claude_team
pnpm vitest run \
test/renderer/store/teamSlice.test.ts \
test/renderer/components/team/messages/MessagesPanel.test.ts \
test/renderer/components/team/dialogs/SendMessageDialog.test.tsx
Target tests:
- OpenCode accepted but response pending is not treated as send failure.
- OpenCode UI relay timeout is shown as pending/unknown, not terminal failure.
- Bridge failure still returns warning.
- Draft is not cleared on true send failure.
- Pending reply indicator remains while responsePending is true and clears on real reply or terminal delivery failure.
- Pending reply indicator clears when a user-inbox row arrives with matching relayOfMessageId.
- No fake teammate reply is projected.
Rollout And Safety
Recommended rollout:
- Default enabled for OpenCode only.
- Add an emergency kill switch:
CLAUDE_TEAM_OPENCODE_PROMPT_DELIVERY_WATCHDOG=0
If disabled:
- Keep current prompt delivery behavior.
- Log that OpenCode prompt delivery watchdog is disabled.
This is a pragmatic safety valve, not a product feature flag.
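A minimal sketch of reading the kill switch, assuming only the exact value 0 disables the watchdog and anything else (including an unset variable) leaves it enabled. The function name isPromptDeliveryWatchdogEnabled is hypothetical; in the main process the argument would typically be process.env.

```typescript
// Default enabled; only the literal string '0' turns the watchdog off (an assumption).
function isPromptDeliveryWatchdogEnabled(env: Record<string, string | undefined>): boolean {
  return env['CLAUDE_TEAM_OPENCODE_PROMPT_DELIVERY_WATCHDOG'] !== '0';
}
```

Taking the environment as a parameter keeps the check pure, so tests do not need to mutate global process state.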
Logging
Add structured logs for these events:
- opencode_prompt_delivery_ledger_created
- opencode_prompt_delivery_prompt_accepted
- opencode_prompt_delivery_response_observed
- opencode_prompt_delivery_unanswered
- opencode_prompt_delivery_retry_scheduled
- opencode_prompt_delivery_retry_attempted
- opencode_prompt_delivery_terminal_failure
- opencode_prompt_delivery_inbox_committed_read
Minimum fields:
{
  teamName,
  memberName,
  laneId,
  runId,
  inboxMessageId,
  runtimeSessionId,
  status,
  responseState,
  attempts,
  nextAttemptAt,
  visibleReplyCorrelation,
  visibleReplySemanticallySufficient,
  reason,
}
Do not log full message text by default. Log payloadHash and first safe preview only under debug.
Production Risks And Mitigations
Risk: legitimate long-running work gets retried
Mitigation:
- If OpenCode status is busy or retry, do not retry.
- For task assignments, use longer grace.
- Meaningful tool calls count as response proof after filtering bootstrap and identity-only tools.
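These mitigations combine with the permission and observe-before-retry rules into one gate that runs just before a retry prompt. The sketch below is an assumption-laden illustration: RetryGateInput and mayRePrompt are hypothetical names, and the idle status value is inferred from the plan's wording about a session going idle.

```typescript
// busy and retry are named in this plan; idle is an assumed third value.
type OpenCodeSessionStatus = 'idle' | 'busy' | 'retry';

interface RetryGateInput {
  sessionStatus: OpenCodeSessionStatus;
  permissionBlocked: boolean; // a pending permission request blocks the session
  respondedSinceLastAttempt: boolean; // result of the observe-only check before retry
}

// Returns true only when sending another prompt cannot collide with ongoing work.
function mayRePrompt(input: RetryGateInput): boolean {
  if (input.respondedSinceLastAttempt) return false; // response proof exists, no retry
  if (input.permissionBlocked) return false; // wait for the permission answer
  return input.sessionStatus === 'idle'; // busy or retry means work may still be running
}
```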
Risk: plain text response is not visible in Messages UI
Mitigation:
- Classify plain text as responded_plain_text.
- Do not retry.
- Add diagnostic so we can later improve transcript-to-UI projection.
Risk: unread rows accumulate after terminal failure
Mitigation:
- Keep terminal diagnostics in ledger.
- Skip terminal-failed rows during automatic relay selection so they do not block newer rows.
- Optionally mark terminal failures read only if UI has an explicit "ack failed delivery" action. Do not auto-read in v1.
Risk: user asks a question, model uses a tool, but never answers
Mitigation:
- Treat non-visible tool activity as sufficient only for task/do/delegate intents.
- For ask messages, require visible agent-teams_message_send or plain assistant text before read commit.
- Retry with copy that asks for a visible answer if the session goes idle after only non-visible tool activity.
Risk: model sends correlated acknowledgement but no answer/action
Mitigation:
- relayOfMessageId proves correlation only, not semantic completion.
- For asks and direct messages, ack-only text is not read-commit sufficient.
- For task/do delivery, ack-only text needs task or execution proof before read commit.
- Retry copy asks for concrete answer/status and tells the model not to send only acknowledgement.
Risk: retry prompt causes duplicate task work
Mitigation:
- Retry prompt explicitly says not to duplicate if already acted.
- Observer checks transcript before every retry.
- One outstanding delivery per member reduces prompt collisions.
Risk: app and orchestrator disagree on response states
Mitigation:
- Response state enum lives in bridge contract.
- App treats unknown response states as pending with diagnostic, not success.
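A sketch of the app-side fallback for states it does not recognize. The list of known states below is only the subset mentioned in this section, not the full bridge contract enum, and interpretResponseState plus the diagnostic wording are hypothetical.

```typescript
// Subset of response states named in this plan; the real enum lives in the bridge contract.
const KNOWN_RESPONSE_STATES = [
  'pending',
  'responded_tool_call',
  'responded_plain_text',
  'empty_assistant_turn',
  'permission_blocked',
  'session_stale',
  'prompt_not_indexed',
  'reconcile_failed',
] as const;

type KnownResponseState = (typeof KNOWN_RESPONSE_STATES)[number];

interface InterpretedResponseState {
  state: KnownResponseState;
  diagnostic?: string;
}

// Unknown states degrade to pending with a diagnostic; they never count as success.
function interpretResponseState(raw: string): InterpretedResponseState {
  const known = (KNOWN_RESPONSE_STATES as readonly string[]).indexOf(raw) !== -1;
  if (known) return { state: raw as KnownResponseState };
  return { state: 'pending', diagnostic: `unknown OpenCode response state: ${raw}` };
}
```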
Implementation Guardrails
Reject an implementation if any of these are true:
- It marks an OpenCode member inbox row read immediately after promptAsync() acceptance.
- It marks direct ask messages read after only non-visible tool activity.
- It uses summary/time/passive-message linking as OpenCode delivery proof instead of explicit relayOfMessageId or transcript parent/child proof.
- It treats source="runtime_delivery" alone as explicit context for OpenCode ack-only messages.
- It marks direct asks or task deliveries read after a correlated but ack-only visible message with no answer/status/tool proof.
- It requires every message_send caller to provide relayOfMessageId; only OpenCode runtime-delivery replies should be prompted to include it.
- It retries a prompt before running observe-only reconcile.
- It uses latest user message as response-proof anchor without exact messageId or pre-prompt cursor.
- It returns terminal missing-prompt states from a bounded transcript before trying full-history fallback when the anchor is absent.
- It treats bootstrap-only tools as response proof.
- It routes opencode.observeMessageDelivery through the state-changing bridge command service.
- It uses a team-root prompt ledger for secondary OpenCode lanes.
- It allows multiple active non-terminal OpenCode prompt deliveries for the same member.
- It treats mark-read failure after response proof as a reason to re-prompt.
- It hides terminal OpenCode delivery failure by silently marking the inbox row read.
- It lets a failed_terminal unread row block newer automatic OpenCode deliveries forever.
- It silently drops attachments when delivering an OpenCode secondary inbox row.
- It retries a ledger record through a newly resolved lane/provider instead of the record's original OpenCode lane identity.
- It changes native Codex, Claude, or Gemini teammate inbox semantics.
The core invariant:
OpenCode prompt accepted != OpenCode teammate responded.
Everything else follows from this.
Open Questions
These are not blockers for v1, but they should be decided before implementation review:
- Should failed_terminal rows remain unread forever, or should the UI get an explicit "ack failed delivery" action? Recommended: keep unread and surface diagnostics in v1. 🎯 8 🛡️ 8 🧠 4
- Should responded_plain_text be enough to mark read? Recommended: yes, with warning. Retrying plain text is more dangerous than accepting it. 🎯 8 🛡️ 8 🧠 3
- Should max attempts be 3 or 2? Recommended: 3 with delays [30s, 90s, 180s]. 🎯 8 🛡️ 8 🧠 2
- Should OpenCode process only one unread row per member at a time? Recommended: yes. This is probably as important as the retry ledger. 🎯 9 🛡️ 9 🧠 5
Final Recommendation
Implement the ledger with read-on-responded semantics and one outstanding OpenCode delivery per member.
This fixes the real bug class:
- prompt accepted but no action,
- inbox row already read,
- task stuck forever,
- no durable retry path.
It also avoids the two common bad fixes:
- prompt spam loops,
- fake frontend replies.