777genius 19b6937446 feat(opencode): harden delivery and provider UI

2026-04-25 14:30:10 +03:00

93 KiB

Raw Permalink Blame History

OpenCode Delivery Ledger And Bounded Retry Plan

Summary

Recommended implementation: add an OpenCode prompt-delivery ledger with explicit accepted, responded, unanswered, retried, and failed states, plus a bounded retry watchdog.

Goal: when the app delivers an inbox row into a live OpenCode teammate session, we stop treating promptAsync() acceptance as success. Acceptance only means the bridge accepted the prompt. A row is fully delivered only after OpenCode shows evidence that the teammate acted on it.

Expected implementation size:

claude_team: 🎯 8 🛡️ 9 🧠 7 - roughly 950-1450 changed lines.
agent_teams_orchestrator: 🎯 8 🛡️ 9 🧠 6 - roughly 500-850 changed lines.
Tests and fixtures: 🎯 9 🛡️ 9 🧠 6 - roughly 750-1250 changed lines.

The plan is intentionally OpenCode-only. Native Codex, Claude, and Gemini teammate delivery paths stay unchanged.

Why This Is Needed

Current OpenCode relay semantics are too optimistic:

claude_team writes an inbox row to inboxes/<member>.json.
relayOpenCodeMemberInboxMessages() sees the unread row.
It calls deliverOpenCodeMemberMessage().
OpenCodeTeamRuntimeAdapter.sendMessageToMember() calls the orchestrator bridge.
OpenCodeBridgeCommandHandler.runSendMessage() calls promptAsync().
The bridge returns accepted: true.
claude_team marks the inbox row read: true.

The problem: accepted: true only proves that OpenCode accepted a prompt into a session. It does not prove the model used tools, sent a visible message, touched a task, or even produced meaningful assistant text.

Observed failure mode from beacon-desk-92121221:

Jack and Tom had task assignment rows marked read.
The OpenCode sessions had received the app-delivery prompts.
The transcript showed empty assistant turns after those prompts.
The assigned tasks stayed in_progress.
No comments, task events, or visible messages were produced.

That is a durable message-loss bug because the inbox row is already read, so normal retry logic has nothing left to pick up.

Current Code Paths

App-side OpenCode delivery

Primary files:

src/main/services/team/TeamProvisioningService.ts
src/main/services/team/runtime/OpenCodeTeamRuntimeAdapter.ts
src/main/services/team/opencode/bridge/OpenCodeBridgeCommandContract.ts
src/main/ipc/teams.ts
src/shared/types/team.ts

Important current behavior:

// TeamProvisioningService.ts
const delivery = await this.deliverOpenCodeMemberMessage(teamName, {
  memberName,
  text: message.text,
  messageId: message.messageId,
  replyRecipient,
  actionMode,
  taskRefs,
});

if (delivery.delivered) {
  await this.markInboxMessagesRead(teamName, memberName, [message]);
}

delivery.delivered currently means bridge prompt acceptance. It should become "bridge accepted plus response proof exists" for OpenCode inbox relay completion.

Deeper research findings

The highest-risk parts after code review are:

promptAsync() returns Promise<void>. It does not return the OpenCode user message id created by /session/:id/prompt_async, so v1 cannot rely on a prompt return value as proof. The response observer must use transcript reconcile.
sendSessionMessage() does return an OpenCodeMessage, but it is the synchronous /session/:id/message path with a long request timeout. It is a bad default for live app delivery because it can block UI/runtime relay behind model execution. It is useful as a future fallback, not the v1 delivery primitive.
markInboxMessagesRead() writes a separate JSON file from any new ledger file. There is no atomic cross-file commit. The ledger must represent "response observed, inbox read commit still pending" so crashes or mark-read failures do not cause duplicate prompts.
Controller inbox writes always create a messageId when one is missing. Task assignment notifications use messageStore.buildMessage(), so they have stable ids. That means the OpenCode prompt ledger can require inboxMessageId for every eligible OpenCode relay row.
OpenCode runtime stores are lane-scoped for secondary lanes. The prompt delivery ledger should be lane-scoped too, not only team-root scoped, because run id, session id, and manifest recovery are lane facts.
Existing relayedMemberInboxMessageIds is a native relay dedupe cache. It is not durable, it is not response-aware, and it should not be reused as the OpenCode delivery truth.
opencode.sendMessage in OpenCodeReadinessBridge is already a direct bridge command, not a state-changing command service call. The new observe command should follow this pattern. Do not route observe through OpenCodeStateChangingBridgeCommandService, because observe must not acquire state-changing command leases, write command-ledger entries, or commit runtime-store manifests.
State-changing bridge service is currently only for:

'opencode.launchTeam' | 'opencode.reconcileTeam' | 'opencode.stopTeam'

opencode.observeMessageDelivery must not be added to that union. It should be added only to the general bridge command contract and orchestrator supported-command dispatch.

reconcileSession(record, { limit }) can miss old prompt history when a long OpenCode session has many later events. If the observer cannot find the inbound messageId or the prePromptCursor inside the limited transcript, it must do one wider/full-history reconcile before returning prompt_not_indexed or empty_assistant_turn.
Real OpenCode fixtures use canonical MCP tool ids like agent-teams_message_send, but tool names can also appear as mcp__agent-teams__message_send or, in lower-level proof paths, plain names such as message_send. The observer must normalize tool names before classifying response proof.
Real OpenCode fixtures also use normal tool names such as bash and read in lowercase. Execution-tool response proof must not only match Codex-style display names like Bash.
If the bridge command itself times out after the prompt may have been submitted, the app cannot know whether OpenCode accepted the prompt. Treat that as failed_retryable with acceptanceUnknown: true, and make the watchdog observe the transcript before sending any retry. Do not immediately send a duplicate retry after an acceptance-unknown timeout.
messageStore and renderer/feed services already understand relayOfMessageId, but the OpenCode MCP message_send schema does not expose it yet. Exposing the existing field is lower risk than inventing a new reply-id contract, and it gives the watchdog a non-heuristic visible-reply correlation signal.

Orchestrator-side OpenCode bridge

Primary files:

/Users/belief/dev/projects/claude/agent_teams_orchestrator/src/services/opencode/OpenCodeBridgeCommandHandler.ts
/Users/belief/dev/projects/claude/agent_teams_orchestrator/src/services/opencode/OpenCodeSessionBridge.ts
/Users/belief/dev/projects/claude/agent_teams_orchestrator/src/services/opencode/OpenCodeEventTranslator.ts
/Users/belief/dev/projects/claude/agent_teams_orchestrator/src/services/opencode/OpenCodeTranscriptProjector.ts

Important current behavior:

await openCodeSessionBridge.promptAsync(record, {
  text: identityReminder ? `${identityReminder}\n\n${text}` : text,
  agent: asString(body.agent) ?? 'teammate',
  noReply: body.noReply === true,
});

const reconciled = await withTimeout(
  openCodeSessionBridge.reconcileSession(record, { limit: 50 }),
  OPENCODE_SEND_RECONCILE_TIMEOUT_MS,
);

return {
  accepted: true,
  sessionId: record.opencodeSessionId,
  memberName,
  runtimePid,
  diagnostics: reconcileDiagnostics,
};

This already does a post-accept reconcile, but it does not turn the reconcile summary into a response-proof contract.

Transcript capabilities

OpenCodeTranscriptProjector already gives enough normalized fields:

export type OpenCodeCanonicalMessage = {
  id: string | null;
  parentId: string | null;
  role: 'user' | 'assistant' | 'system' | 'unknown';
  completedAt: number | null;
  text: string;
  reasoningText: string;
  previewText: string | null;
  partTypes: string[];
  toolCalls: OpenCodeCanonicalToolCall[];
  finishReason: string | null;
  hasError: boolean;
};

This is enough to classify a prompt outcome.

Key Design Decision

Options Considered

Recommended: keep row unread until response proof, ledger suppresses duplicate relays. 🎯 9 🛡️ 9 🧠 6 - roughly 650-1050 lines total. This preserves durable queue truth and fixes the actual loss bug. The complexity is in preventing the file watcher from retrying immediately.
Mark row read at prompt acceptance, but use a separate retry ledger. 🎯 7 🛡️ 7 🧠 5 - roughly 400-750 lines total. Easier to bolt on, but it keeps inbox truth dishonest. Recovery after app restart is harder because the original row no longer looks pending.
Make UI send block until OpenCode response proof. 🎯 6 🛡️ 8 🧠 4 - roughly 250-450 lines total. Simpler conceptually, but bad UX and still incomplete for watcher-driven task assignment rows.

Use option 1.

Ledger Model

Add a new OpenCode-specific prompt delivery ledger. Do not reuse RuntimeDeliveryJournal; that journal is for the opposite direction, where a runtime writes into canonical app destinations through runtime_deliver_message.

Recommended file for secondary OpenCode lanes:

<team>/.opencode-runtime/lanes/<encodeURIComponent(laneId)>/opencode-prompt-delivery-ledger.json

Recommended schema name:

'opencode.promptDeliveryLedger'

Add it to the runtime store manifest descriptors, schema-name validator, cross-store invariant input, and lane-scoped manifest recovery checks:

{
  schemaName: 'opencode.promptDeliveryLedger',
  schemaVersion: 1,
  relativePath: 'opencode-prompt-delivery-ledger.json',
  criticality: 'rebuildable_from_canonical_destination',
  owner: 'delivery',
  rebuildStrategy: 'verify_canonical_destinations',
}

Do not store secondary-lane prompt delivery truth in the team root .opencode-runtime directory. Use getOpenCodeLaneScopedRuntimeFilePath() and the lane manifest path from getOpenCodeRuntimeManifestPath(teamsBasePath, teamName, laneId).

Reason: activeRunId, runtime session id, stale runtime detection, and durable launch evidence are lane scoped. A team-root ledger would make mixed launches harder to recover correctly after app restart or lane replacement.

Readiness impact:

Prompt ledger corruption should not block OpenCode provider/model readiness.
It should surface degraded diagnostics and force conservative observe-first reconstruction.
Launch should not fail only because the prompt ledger is missing; unread inbox rows remain the canonical destination.

Missing Or Quarantined Ledger Recovery

Because the ledger is rebuildable_from_canonical_destination, a missing ledger can be reconstructed from unread inbox rows. But reconstruction must be conservative.

If the ledger is missing/quarantined while OpenCode lane/session evidence exists:

Read unread OpenCode member inbox rows.
Create reconstructed records with status: 'failed_retryable', responseState: 'not_observed', and acceptanceUnknown: true.
Run opencode.observeMessageDelivery before sending any new prompt.
Only prompt if observation cannot find a delivered user prompt or response proof after the normal grace window.

Reason: app crash or file corruption may have happened after promptAsync() accepted but before the ledger persisted accepted. Rebuilding directly as pending and immediately prompting can duplicate messages.

If no active lane/session can be resolved, keep the reconstructed record failed_retryable with opencode_runtime_not_active; do not prompt until the runtime is live again.

Record Shape

export type OpenCodePromptDeliveryStatus =
  | 'pending'
  | 'accepted'
  | 'responded'
  | 'unanswered'
  | 'retry_scheduled'
  | 'retried'
  | 'failed_retryable'
  | 'failed_terminal';

export type OpenCodePromptResponseState =
  | 'not_observed'
  | 'pending'
  | 'prompt_not_indexed'
  | 'responded_tool_call'
  | 'responded_visible_message'
  | 'responded_non_visible_tool'
  | 'responded_plain_text'
  | 'permission_blocked'
  | 'tool_error'
  | 'empty_assistant_turn'
  | 'session_stale'
  | 'session_error'
  | 'reconcile_failed';

export interface OpenCodePromptDeliveryLedgerRecord {
  id: string;
  teamName: string;
  memberName: string;
  laneId: string;
  runId: string | null;
  runtimeSessionId: string | null;
  inboxMessageId: string;
  inboxTimestamp: string;
  source: 'watcher' | 'ui-send' | 'manual' | 'watchdog';
  replyRecipient: string;
  actionMode: 'do' | 'ask' | 'delegate' | null;
  taskRefs: TaskRef[];
  payloadHash: string;
  status: OpenCodePromptDeliveryStatus;
  responseState: OpenCodePromptResponseState;
  attempts: number;
  maxAttempts: number;
  acceptanceUnknown: boolean;
  nextAttemptAt: string | null;
  lastAttemptAt: string | null;
  lastObservedAt: string | null;
  acceptedAt: string | null;
  respondedAt: string | null;
  failedAt: string | null;
  inboxReadCommittedAt: string | null;
  inboxReadCommitError: string | null;
  prePromptCursor: string | null;
  postPromptCursor: string | null;
  deliveredUserMessageId: string | null;
  observedAssistantMessageId: string | null;
  observedToolCallNames: string[];
  observedVisibleMessageId: string | null;
  visibleReplyMessageId: string | null;
  visibleReplyInbox: string | null;
  visibleReplyCorrelation:
    | 'relayOfMessageId'
    | 'direct_child_message_send'
    | 'plain_assistant_text'
    | null;
  lastReason: string | null;
  diagnostics: string[];
  createdAt: string;
  updatedAt: string;
}

Stable Record ID

Use a stable deterministic ID:

sha256([
  'opencode-prompt-delivery-v1',
  teamName,
  memberName.toLowerCase(),
  laneId,
  inboxMessageId,
].join('\0'))

Do not use message text as the primary key. Two different messages can have identical text and both must be delivered.

Use payloadHash only as a safety check. If the same record ID appears with a different payload hash, mark failed_terminal and log a diagnostic because the inbox row identity is corrupt or reused.

Payload hash input should include:

message text,
summary,
actionMode,
taskRefs,
replyRecipient,
attachment metadata ids/filenames/mime types/sizes,
source/from/to,
conversation ids if present.

Do not hash attachment file bytes in v1. The watchdog needs stable delivery identity, not expensive file integrity checks.

Runtime Identity Binding

The first delivery attempt must resolve and persist:

laneId,
runId if known,
runtimeSessionId when available,
memberName canonical casing,
providerId: 'opencode'.

Retries and observations must use the ledger laneId, not recompute a new lane from current member config unless the ledger is being created for the first time.

If the member's current provider/model changes while a record is non-terminal:

do not redirect the old OpenCode delivery to a new provider or lane,
mark the record failed_terminal with opencode_recipient_runtime_identity_changed,
leave the inbox row unread with diagnostics,
let any new user message use the new provider path normally.

Reason: a retry belongs to the original runtime session that accepted or may have accepted the prompt. Re-resolving from mutable team config can deliver stale messages to the wrong runtime.

Response Versus Inbox Commit

responded means OpenCode acted. It does not necessarily mean the inbox read flag was committed.

Track this separately:

status: 'responded',
inboxReadCommittedAt: null,
inboxReadCommitError: 'write failed'

Relay behavior for this state:

Do not re-prompt.
Retry only the markInboxMessagesRead() commit.
Once read commit succeeds, set inboxReadCommittedAt.

This closes the crash window between observing a response and flipping read: true.

Why Not Use `sendSessionMessage()` For v1

There is a tempting alternative: replace promptAsync() with synchronous sendSessionMessage() because it returns an OpenCodeMessage.

Options:

Recommended: keep promptAsync() plus observe-only reconcile. 🎯 9 🛡️ 9 🧠 6 - roughly 550-950 lines total. It preserves non-blocking relay and gives durable response proof through the watchdog.
Use sendSessionMessage() for all OpenCode deliveries. 🎯 5 🛡️ 6 🧠 4 - roughly 250-500 lines total. It may block up to the model execution timeout, couples UI send to model latency, and still needs transcript projection for tool side effects.
Hybrid: use sendSessionMessage() only for manual UI sends, keep promptAsync() for watcher deliveries. 🎯 6 🛡️ 6 🧠 7 - roughly 450-800 lines total. It creates two delivery semantics and makes failures harder to reason about.

Use option 1.

State Machine

stateDiagram-v2
  [*] --> pending
  pending --> accepted: promptAsync accepted
  pending --> failed_retryable: bridge transient failure
  pending --> failed_terminal: bad recipient/runtime stopped

  accepted --> responded: response proof observed
  accepted --> unanswered: idle + no meaningful response
  accepted --> failed_retryable: reconcile failed or session stale
  accepted --> failed_terminal: permanent contract error

  unanswered --> retry_scheduled: attempts < maxAttempts
  retry_scheduled --> retried: due time reached
  retried --> accepted: promptAsync accepted again
  retried --> failed_retryable: transient retry failure

  failed_retryable --> retry_scheduled: attempts < maxAttempts
  failed_retryable --> failed_terminal: attempts exhausted
  responded --> [*]
  failed_terminal --> [*]

Important semantics:

pending: ledger row exists but no bridge attempt has completed.
accepted: OpenCode accepted a prompt, but no meaningful response proof exists yet.
responded: OpenCode acted. This is the only normal success state that commits the inbox row as read.
unanswered: OpenCode became idle and the transcript has no meaningful response after the delivered user prompt.
retry_scheduled: watchdog will retry after nextAttemptAt.
retried: diagnostic state for audit; implementation can immediately move from retry_scheduled to accepted after the next attempt.
failed_retryable: transient bridge/runtime issue.
failed_terminal: no more retries or permanent error.

Response Proof Contract

Add a pure response observer in the orchestrator:

/Users/belief/dev/projects/claude/agent_teams_orchestrator/src/services/opencode/OpenCodeDeliveryResponseObserver.ts

Suggested API:

export interface OpenCodeDeliveryResponseObservation {
  state:
    | 'pending'
    | 'responded_tool_call'
    | 'responded_visible_message'
    | 'responded_plain_text'
    | 'responded_non_visible_tool'
    | 'permission_blocked'
    | 'tool_error'
    | 'empty_assistant_turn'
    | 'not_observed'
    | 'prompt_not_indexed'
    | 'session_stale'
    | 'session_error'
    | 'reconcile_failed';
  deliveredUserMessageId: string | null;
  assistantMessageId: string | null;
  toolCallNames: string[];
  visibleMessageToolCallId: string | null;
  visibleReplyMessageId: string | null;
  visibleReplyCorrelation:
    | 'relayOfMessageId'
    | 'direct_child_message_send'
    | 'plain_assistant_text'
    | null;
  visibleReplyMissingRelayOfMessageId?: boolean;
  latestAssistantPreview: string | null;
  needsFullHistory?: boolean;
  reason: string | null;
}

How To Identify The Delivered Prompt

The app-delivery prompt already embeds:

The inbound app messageId is "<messageId>"

Observer should:

Collect all user canonical messages whose text contains the exact inbound messageId.
Also pass prePromptCursor, usually the session record's lastCanonicalCursor before promptAsync().
If exact messageId matches exist, treat them as attempts for the same logical delivery. A sufficient response to any matching attempt proves the delivery.
Use the newest exact messageId match only for pending/unanswered diagnostics when no matching attempt has sufficient response proof.
Fallback to the first user message after prePromptCursor only if exactly one candidate exists and the inbound messageId is missing from the indexed text.
If neither the exact messageId nor the prePromptCursor appears in the limited reconcile result, run one wider reconcile before classifying the prompt as missing.
Return prompt_not_indexed if the prompt is not yet visible in transcript and raw status is busy or retry.
Never use "latest user message" without either exact message-id match or a cursor-bounded unique candidate, because OpenCode sessions can contain bootstrap, briefing, assignment, comment, and retry prompts.

prompt_not_indexed is not a failure. It means OpenCode accepted the async prompt but /session/:id/message history has not caught up yet.

Visible Reply Correlation Contract

The response observer should not rely on text/time heuristics to decide whether a visible OpenCode reply belongs to an app-delivered inbox row.

The app already has a durable message id for every relayable inbox row. The controller message store already persists relayOfMessageId. The missing piece is exposing it through the OpenCode-visible MCP message_send schema and teaching the runtime prompt to use it.

Recommended v1 contract:

Extend agent-teams_message_send / message_send parameters with relayOfMessageId?: string.
Preserve relayOfMessageId through the MCP server, controller messages.sendMessage(), and messageStore.sendInboxMessage().
In buildOpenCodeRuntimeMessageText(), instruct OpenCode replies to include:

source="runtime_delivery"
relayOfMessageId="<inbound app messageId>"

If the message has taskRefs, still include those taskRefs exactly.
Do not introduce a second field named replyToMessageId in v1. Use existing relayOfMessageId so renderer, feed services, and TeamDataService can share one correlation primitive.

Suggested prompt wording:

The inbound app messageId is "<messageId>".
When you reply with agent-teams_message_send, include source="runtime_delivery" and relayOfMessageId="<messageId>".

Strongest visible response proof is a canonical destination row where:

message.from === memberName
message.relayOfMessageId === inboxMessageId
message.source === 'runtime_delivery'

For the normal user-directed case, this row is in inboxes/user.json. If the reply target is a same-team teammate or lead, check that recipient's inbox file instead of assuming user inbox.

The transcript observer still matters because OpenCode can act with task tools or execution tools without creating a visible inbox message. But for visible replies, destination-store proof should outrank transcript-only proof because it proves the app actually received the reply, not only that OpenCode attempted a tool call.

Recommended proof priority for visible replies:

Destination row with matching relayOfMessageId.
Successful direct-child message_send tool call whose arguments include matching relayOfMessageId.
Successful direct-child message_send tool call without relayOfMessageId, accepted only as a fallback with diagnostic visible_reply_missing_relayOfMessageId.
Plain assistant text, accepted only as responded_plain_text with diagnostic because it may not appear in Messages UI.

Do not use TeamDataService.linkPassiveUserReplySummaries() or summary/time matching as OpenCode delivery proof. That method is useful for rendering passive duplicate summaries, but it is intentionally heuristic and should not decide inbox read commits.

Visible Reply Semantic Sufficiency

Correlation proves that a reply belongs to a delivered message. It does not prove that the reply is useful.

For visible message_send replies, evaluate semantic sufficiency before read commit:

function isVisibleReplySemanticallySufficient(input: {
  actionMode: 'do' | 'ask' | 'delegate' | null;
  taskRefs: TaskRef[];
  text: string;
  summary?: string | null;
}): boolean

Rules:

For actionMode: 'ask', ack-only text such as Понял, Ок, Understood, Got it, I'll check, or I'll do it is not sufficient. The reply must contain an answer, a blocking question, or a concrete status/result.
For no action mode and no task refs, use the same rule as ask.
For taskRefs.length > 0, ack-only visible text alone is not sufficient unless there is also a successful task tool or execution tool. The goal is not just acknowledgement; the prompt delivery should produce task-side action or a concrete status.
For actionMode: 'do', ack-only visible text alone is not sufficient unless there is also successful task/execution proof.
For actionMode: 'delegate', a visible message is sufficient only if it names the delegated action, target, or status. Pure acknowledgement is not enough.

Ack-only detection must be intentionally narrow:

Normalize whitespace and lowercase.
Only treat as ack-only when the combined text/summary is short, for example under 120 characters.
Match exact phrases or simple phrase prefixes from a small allow-reviewed list: понял, ок, принял, сделаю, разберусь, understood, got it, ok, will do, I'll check, I'll take a look.
If the text contains a concrete blocker, question, result, file path, task id, error, number, code span, or more than one sentence of substance, treat it as sufficient.
If uncertain, prefer sufficient and log visible_reply_semantic_uncertain. False retries are usually worse than accepting a borderline concrete status.

If a visible reply is correlated but semantically insufficient:

keep the ledger accepted,
set responseState = 'responded_visible_message',
set lastReason = 'visible_reply_ack_only_still_requires_answer',
do not mark the original OpenCode inbox row read,
retry after grace with copy asking for a concrete answer/status, not just acknowledgement.

This is deliberately stricter than the controller-level idle/ack-only guard. The controller guard decides whether to persist a message. The watchdog read-commit policy decides whether that persisted message satisfies the delivered prompt.

OpenCode Ack-Only Guard Tightening

Current controller logic treats any non-empty source as explicit delivery context for the idle/ack-only guard. That is too broad for OpenCode because a model can emit source="runtime_delivery" on a generic "Понял" / "Understood" message that is not tied to an app-delivered row.

For OpenCode senders, hasExplicitDeliveryContext() should treat a message as explicit delivery context only when at least one of these is true:

relayOfMessageId is present and non-empty.
taskRefs is present and non-empty.
attachments is present and non-empty.
leadSessionId is present for a real lead-session reply.
Another future typed delivery context field is present.

source="runtime_delivery" alone is metadata, not proof. It should not bypass ack-only filtering unless it is paired with relayOfMessageId or task/action context.

This keeps the guard from dropping useful non-ack answers: a substantive answer with text still passes because the idle/ack-only detector only rejects ack/no-task/no-work phrases. The stricter context rule only affects OpenCode ack-only noise.

Reconcile Limit Fallback

The first post-send reconcile should stay bounded for latency:

await reconcileSession(record, { limit: 80 });

But bounded history is not enough for durable watchdog observation. If observeOpenCodeDeliveryResponse() cannot find either:

the exact inbound messageId, or
the prePromptCursor that was captured before promptAsync(),

then runObserveMessageDelivery() must retry with a wider/full transcript once:

const limited = await reconcileSession(record, { limit: 80 });
const first = observeOpenCodeDeliveryResponse({ ...limited, prePromptCursor });

if (first.state === 'prompt_not_indexed' && first.needsFullHistory === true) {
  const full = await reconcileSession(record);
  return observeOpenCodeDeliveryResponse({ ...full, prePromptCursor });
}

Do not add this full-history fallback to every sendMessage call. Use it in watchdog/observe paths and only when the anchor is missing. UI send should stay latency bounded.

What Counts As Responded

Use a direct child assistant message where:

assistant.parentId === deliveredUserMessage.id

There can be multiple assistant children for one delivered user prompt. Classify all children and choose the strongest successful response proof, not just the first or last child.

Classify as success if any of these are true:

Tool call exists:

assistant.toolCalls.length > 0

This is the strongest signal, but only after filtering tool calls.

Meaningful tool calls:

agent-teams_message_send
mcp__agent-teams__message_send
message_send
agent-teams_task_*
mcp__agent-teams__task_*
task_*
normal execution tools when the prompt action mode is do, such as shell/edit/read tools, if exposed by OpenCode in canonical transcript

Non-meaningful tool calls for delivery proof:

runtime_bootstrap_checkin
agent-teams_runtime_bootstrap_checkin
mcp__agent-teams__runtime_bootstrap_checkin
member_briefing
agent-teams_member_briefing
mcp__agent-teams__member_briefing
runtime_heartbeat
process_register
process_list

Reason: identity/bootstrap tools can be emitted because of the prepended reminder, not because the agent responded to the delivered message.

Tool-name normalization should happen before classification:

function normalizeToolName(name: string): string {
  return name
    .replace(/^mcp__agent-teams__/, '')
    .replace(/^agent-teams_/, '')
    .toLowerCase();
}

Then classify by normalized names:

visible message tool: message_send
task tools: task_get, task_start, task_add_comment, task_complete, and other task_ names
bootstrap tools: runtime_bootstrap_checkin, member_briefing, runtime_heartbeat, process_register, process_list
execution tools for actionMode: 'do': bash, read, edit, write, grep, glob, and OpenCode equivalents observed in fixtures

Real fixture coverage should include at least these observed names:

agent-teams_runtime_bootstrap_checkin
agent-teams_member_briefing
agent-teams_message_send
agent-teams_task_get
agent-teams_task_start
agent-teams_task_add_comment
agent-teams_task_complete
bash
read

Visible team message tool call exists:

assistant.toolCalls.some((tool) =>
  normalizeToolName(tool.toolName) === 'message_send'
)

This should classify as responded_visible_message.

If the tool call arguments include relayOfMessageId matching the inbound inbox messageId, set visibleReplyCorrelation = 'relayOfMessageId'.

If the tool call is a direct child of the delivered prompt but lacks relayOfMessageId, still classify it as responded_visible_message in v1, but emit diagnostic visible_reply_missing_relayOfMessageId. This avoids losing a real answer during rollout while making the missing correlation visible in logs/tests.

Meaningful non-visible tool call exists:

This classifies as responded_non_visible_tool.

For task assignment prompts, this is enough to mark the message responded if the tool call is a task tool or execution tool. The delivery watchdog is proving that the prompt was noticed, not that the whole task is complete.

For direct user questions, non-visible tool activity is not always enough. A read or bash call can prove the model noticed the prompt, but it does not prove the user got an answer. The read-commit policy must use actionMode and taskRefs, not only responseState.

Non-empty assistant text exists:

assistant.text.trim().length > 0 || assistant.previewText?.trim()

This classifies as responded_plain_text, not ideal success. It prevents prompt spam when the model actually answered in plain text, but diagnostics should warn that the app may not show it as a normal team message.

Reasoning-only output is not a user-visible answer:

assistant.reasoningText.trim().length > 0 && assistant.text.trim().length === 0

Do not classify this as responded_plain_text. Treat it like no visible response unless a meaningful successful tool call exists.

Tool call exists but all meaningful calls are errored:

assistant.toolCalls.some((tool) => tool.isError)

This classifies as tool_error, not responded. Do not mark the inbox row read unless there is also a successful meaningful tool call or non-empty plain text.

Assistant error exists:

assistant.hasError === true

This is not success. It should map to session_error and failed_retryable or failed_terminal depending on error type.

Pending permission exists:

summary.pendingPermissionRequestIds.length > 0

This is permission_blocked. Do not retry automatically while blocked.

What Counts As Unanswered

Classify as empty_assistant_turn if a direct assistant child exists but all are true:

assistant.toolCalls.length === 0
assistant.text.trim().length === 0
assistant.reasoningText.trim().length === 0
assistant.completedAt == null
assistant.finishReason == null
assistant.hasError === false

Classify as pending if:

raw status is busy or retry, and no meaningful child exists yet.

Classify as empty_assistant_turn or unanswered if:

raw status is idle.
no meaningful child exists.
no permission block exists.

Do not rely on replyPendingSinceMessageId from OpenCodeEventTranslator alone. It currently treats any child message as a reply, including the empty assistant-turn failure.

If the direct child contains only reasoning/step metadata and no visible text or meaningful successful tool call, treat it as unanswered for delivery-read purposes.

Response Proof Priority

Use this classification order:

session_stale if reconcile outcome is stale.
permission_blocked if pending permissions exist.
prompt_not_indexed if the delivered prompt is not indexed yet and raw status is busy or retry.
pending if direct child has running tool calls or session status is busy/retry.
responded_visible_message if a successful visible message tool call exists.
responded_non_visible_tool if a successful meaningful non-visible tool call exists.
responded_tool_call only as a backward-compatible generic response if the implementation cannot distinguish visible vs non-visible tool classes.
responded_plain_text if non-empty assistant text exists.
tool_error if errored meaningful tool calls exist and no success signal exists.
empty_assistant_turn if direct assistant child is empty or session is idle with no child.

Response Sufficiency Policy

responseState is observation. Inbox read commit is policy.

Use this policy:

Message intent	Sufficient to commit read	Not sufficient by itself
`taskRefs.length > 0`	semantically sufficient visible message, plain text with concrete status, successful task tool, successful execution tool	bootstrap/identity tools, errored tools, ack-only visible message with no task/execution proof
`actionMode: 'do'`	semantically sufficient visible message, plain text with concrete status, successful meaningful execution or task tool	bootstrap/identity tools, errored tools, ack-only visible message with no tool proof
`actionMode: 'delegate'`	visible message that names target/action/status, plain text with concrete status, successful task tool or `message_send` delegation/status	unrelated execution-only tool with no task/message effect, ack-only text
`actionMode: 'ask'`	visible message or plain text that contains an answer, blocking question, or concrete status	non-visible execution/task lookup only, ack-only text
no action mode and no task refs	visible message or plain text that contains an answer, blocking question, or concrete status	non-visible tool activity only, ack-only text

This avoids the most dangerous false success: a user asks an OpenCode teammate a question, the model performs a read, then goes idle without sending any answer. That should stay pending or become retryable; it should not mark the inbox row read.

Before committing read for responded_visible_message, the app should prefer destination-store proof:

const visibleReply = await findVisibleReplyByRelayOfMessageId({
  teamName,
  replyRecipient,
  from: memberName,
  relayOfMessageId: inboxMessageId,
});

If destination proof exists, commit read even if the bounded transcript observation is still delayed. The MCP tool call already wrote the canonical app row, which is stronger than waiting for OpenCode transcript indexing.

If destination proof does not exist but transcript proof shows a successful direct-child message_send, keep the normal transcript-based policy and log visible_reply_destination_not_found_yet. Do not immediately retry; re-observe/re-scan once because inbox writes and transcript reconciliation can race in either direction.

If responded_non_visible_tool is not sufficient for the message intent:

keep the ledger non-terminal as accepted,
store responseState: 'responded_non_visible_tool',
set lastReason: 'visible_reply_still_required',
re-observe after a short delay,
after grace, retry with copy asking for a visible agent-teams_message_send or plain answer.

Bridge Contract Extension

Extend both sides of the OpenCode bridge:

export interface OpenCodeSendMessageCommandData {
  accepted: boolean;
  sessionId?: string;
  memberName: string;
  runtimePid?: number;
  prePromptCursor?: string | null;
  responseObservation?: OpenCodeDeliveryResponseObservation;
  diagnostics: OpenCodeTeamBridgeDiagnostic[];
}

Orchestrator runSendMessage() should still return accepted: true after promptAsync() succeeds, even if response observation is pending or unanswered.

Reason: acceptance and response are different facts. The bridge should not turn post-accept reconcile weakness into a prompt delivery failure.

Add Observe-Only Bridge Command

The watchdog must be able to inspect a previous accepted prompt without sending a new prompt.

Add a new command:

export interface OpenCodeObserveMessageDeliveryCommandBody {
  runId?: string;
  laneId: string;
  teamId: string;
  teamName: string;
  projectPath: string;
  memberName: string;
  messageId: string;
  prePromptCursor?: string | null;
}

export interface OpenCodeObserveMessageDeliveryCommandData {
  observed: boolean;
  sessionId?: string;
  memberName: string;
  runtimePid?: number;
  responseObservation: OpenCodeDeliveryResponseObservation;
  diagnostics: OpenCodeTeamBridgeDiagnostic[];
}

Recommended command name:

opencode.observeMessageDelivery

Why this is required:

Calling opencode.sendMessage for "check before retry" would create a duplicate prompt.
claude_team does not have direct OpenCode transcript access.
The orchestrator already owns session records, stale-session detection, host access, and transcript projection.

observeMessageDelivery should run the same runtime/session validation as sendMessage except it must not call promptAsync(). This is intentionally not an app-side state-changing bridge command.

It is acceptable to reuse the orchestrator-side runtime precondition validator even if the current helper name includes "StateChanging". The hard boundary is app-side command execution: observe must not use the state-changing command service or its command ledger.

Concrete bridge wiring:

Add opencode.observeMessageDelivery to OpenCodeBridgeCommandName and VALID_COMMANDS in OpenCodeBridgeCommandContract.ts.
Add it to orchestrator SUPPORTED_COMMANDS and the dispatch switch in OpenCodeBridgeCommandHandler.ts.
Add observeOpenCodeTeamMessageDelivery() to OpenCodeReadinessBridge.
Add observeMessageDelivery() to OpenCodeTeamRuntimeAdapter and the bridge port type.
Use direct this.bridge.execute('opencode.observeMessageDelivery', ...), matching current sendOpenCodeTeamMessage().
Do not add it to OpenCodeStateChangingTeamCommandName.
Do not route it through OpenCodeStateChangingBridgeCommandService.
Do not create command-ledger entries or runtime-store manifest commits for observe.
Use a short observe timeout separate from send timeout, for example OPENCODE_OBSERVE_MESSAGE_TIMEOUT_MS = 8_000.

Reason: observe is read-only from the app perspective. It reconciles an existing OpenCode session and returns response evidence. Treating it as a state-changing command would add lease/idempotency behavior intended for launch/reconcile/stop and would create failure modes unrelated to delivery observation.

Clarification: reconcileSession() may still update the orchestrator's own session record, for example stale flags or last cursor. That is acceptable. The constraint is that the app bridge must not treat observe as a launch/reconcile/stop command and must not commit app runtime-store manifests for it.

Implementation sketch:

async function runObserveMessageDelivery(envelope: BridgeEnvelope) {
  const record = await openCodeSessionStore.get(teamId, laneId, memberName);
  const limited = await openCodeSessionBridge.reconcileSession(record, { limit: 80 });
  let responseObservation = observeOpenCodeDeliveryResponse({
    inboundMessageId: messageId,
    prePromptCursor,
    summary: limited.summary,
  });
  if (responseObservation.needsFullHistory === true) {
    const full = await openCodeSessionBridge.reconcileSession(record);
    responseObservation = observeOpenCodeDeliveryResponse({
      inboundMessageId: messageId,
      prePromptCursor,
      summary: full.summary,
    });
  }
  return { snapshot, data: { observed: true, responseObservation, ... } };
}

Watchdog flow must be:

Observe existing prompt.
If response is now proven, commit read.
If pending or blocked, wait.
If unanswered and retry due, send a retry prompt.

Never retry first and observe second.

Retry Policy

Recommended defaults:

const OPENCODE_PROMPT_DELIVERY_MAX_ATTEMPTS = 3;
const OPENCODE_PROMPT_DELIVERY_RETRY_DELAYS_MS = [30_000, 90_000, 180_000];
const OPENCODE_PROMPT_RESPONSE_GRACE_MS = 20_000;
const OPENCODE_PROMPT_RESPONSE_GRACE_FOR_TASK_MS = 45_000;
const OPENCODE_PROMPT_WATCHDOG_SCAN_MS = 15_000;

Rules:

UI send gets one immediate bridge attempt.
Watcher delivery gets one immediate bridge attempt.
Watchdog always calls opencode.observeMessageDelivery before deciding to retry.
If OpenCode is busy or retry, keep the record accepted or pending and scan later.
If OpenCode is idle with empty/no response after grace period, mark unanswered.
Schedule retry if attempts remain.
Retry prompt should include a short retry header:

<opencode_delivery_retry>
Previous app message delivery was accepted by OpenCode but no action was observed.
Retry attempt 2/3 for inbound app messageId "<messageId>".
If you already acted on this message, do not duplicate work; send a concrete status via agent-teams_message_send with relayOfMessageId="<messageId>" or update the related task. Do not reply only with acknowledgement.
</opencode_delivery_retry>

For direct ask messages where OpenCode already used non-visible tools but did not answer visibly, use a different header:

<opencode_delivery_retry>
Previous app message delivery was noticed, but no visible answer was observed.
Retry attempt 2/3 for inbound app messageId "<messageId>".
Please reply with agent-teams_message_send to "<replyRecipient>" and include relayOfMessageId="<messageId>". If that tool is unavailable, provide a concise plain-text answer. Do not repeat tool work unless needed and do not reply only with acknowledgement.
</opencode_delivery_retry>

After attempts are exhausted, mark failed_terminal.

Acceptance-Unknown Bridge Failures

Some failures happen at the command boundary after the prompt may already have reached OpenCode. Example: the app bridge times out while the orchestrator is still inside or just after promptAsync().

For these cases:

Create or update the ledger record as failed_retryable.
Set acceptanceUnknown: true.
Store responseState: 'not_observed' and the bridge timeout diagnostic.
The next watchdog action must be opencode.observeMessageDelivery.
Only retry the prompt if observe finds no matching delivered user prompt or meaningful response after the normal grace window.

Do not immediately retry acceptance-unknown failures. That would turn a transport timeout into duplicate user prompts.

UI Timeout Is Not Delivery Failure

The current IPC send path wraps OpenCode live relay in a short UI timeout. That timeout protects the renderer, but it does not necessarily cancel the underlying relay promise. The relay may still accept the prompt and update the ledger after the UI receives a timeout-shaped result.

Therefore:

OPENCODE_RUNTIME_DELIVERY_UI_TIMEOUT_MS should produce runtimeDelivery.responsePending = true, not a terminal failed delivery.
The ledger record should be left pending or failed_retryable with acceptanceUnknown: true.
The watchdog must observe before any retry.
The UI copy should say that live delivery is still being checked, not that the message definitively failed.
If the underlying relay later records response proof, normal read commit should complete.

Suggested reason code:

'opencode_runtime_delivery_ui_timeout_pending'

Do not reuse the old hard-failure copy for this state.

Retry Eligibility Matrix

Observation state	Retry?	Commit inbox read?	Notes
`responded_tool_call`	Depends	Depends	Generic responded state, prefer more specific visible/non-visible states when possible. Apply response sufficiency policy.
`responded_visible_message`	Depends	Depends	Commit only when correlated and semantically sufficient for the message intent. Ack-only visible replies can require retry.
`responded_non_visible_tool`	Depends	Depends	Commit for task/do/delegate when sufficient. For ask/direct questions, require visible/plain reply.
`responded_plain_text`	No	Yes	Warn because app Messages may not show it as normal teammate reply.
`permission_blocked`	No	No	Wait for permission resolution.
`prompt_not_indexed`	No	No	Re-observe after short delay.
`pending`	No	No	Re-observe after short delay.
`empty_assistant_turn`	Yes, if due	No	Core retry path.
`tool_error`	Yes, if retryable	No	Retry only after grace and bounded attempts.
`session_stale`	Not immediately	No	Require lane/session recovery first.
`reconcile_failed`	Yes, if retryable	No	Prefer observe retry before prompt retry.
`not_observed` with `acceptanceUnknown`	Observe first	No	Never retry prompt before transcript observation.

Single Outstanding Delivery Per OpenCode Member

This is important.

Current relay processes up to 10 unread rows per member in one loop. For OpenCode, that can hammer a model session with multiple prompts before it has acted on the first one. That increases the chance of empty turns, ignored tasks, or responses to the wrong prompt.

Recommended v1 behavior:

For OpenCode members, process only the oldest eligible unread row if there is no active non-terminal ledger record for that member.
If a record is accepted, pending, unanswered with future nextAttemptAt, or retry_scheduled, do not relay newer rows for that member yet.
Once the active record becomes responded or failed_terminal, the next scan can pick the next inbox row.
Rows with failed_terminal ledger records stay unread for visibility, but automatic relay selection must skip them unless the user explicitly retries that failed message.

This is not required for native teammates because native runtimes already consume inbox files directly and have their own event loops.

Required Serialization

Use one per-member OpenCode delivery mutex for all entry points:

inbox file watcher relay,
UI send relay,
manual relay,
watchdog observe/retry,
startup recovery.

The existing openCodeMemberInboxRelayInFlight pattern is the right shape, but v1 must ensure the watchdog also goes through the same per-member gate. Do not create a second independent watchdog path that can race the relay path.

The mutex should cover:

reading inbox rows,
checking active ledger state,
creating or resuming a ledger record,
deciding whether a bridge call is allowed,
applying the bridge result or observation.

Do not hold an inbox file lock across a bridge call. The durable guard is the ledger record, written before the bridge call. The in-memory per-member mutex prevents duplicate bridge calls inside one app process.

Lock ordering:

Acquire per-member in-memory delivery gate.
Mutate ledger with VersionedJsonStore.updateLocked(), then release ledger file lock.
Call bridge if needed, with no inbox or ledger file lock held.
Mutate ledger with result/observation, then release ledger file lock.
If read commit is allowed, call markInboxMessagesRead() with inbox lock.
Mutate ledger to record inboxReadCommittedAt or inboxReadCommitError.

Never hold ledger and inbox file locks at the same time. This keeps the watchdog compatible with existing inbox writer/reader paths and avoids lock inversion.

If another call arrives with onlyMessageId while a different message is active:

wait for the existing per-member work to finish,
re-read the ledger and inbox,
return queued/pending for the new message if it is still unread,
do not report the new message as delivered just because the older in-flight relay finished.

Active Record Definition

A record blocks newer OpenCode inbox rows for the same member if:

record.status !== 'responded' &&
record.status !== 'failed_terminal'

or if:

record.status === 'responded' && record.inboxReadCommittedAt == null

The second case blocks newer rows only until the read commit is retried. It should not send another prompt.

Failed Terminal Rows

failed_terminal is terminal for automatic delivery attempts, not for user visibility.

Rules:

Do not mark the inbox row read automatically.
Do not select that row again in normal oldest-unread relay scans.
Do not let that row block newer unread rows for the same member.
Show diagnostics on the member/card/message surface.
Allow a future explicit manual retry action to reset the ledger record or create a new attempt.

This avoids both bad outcomes: hiding a failed message by marking it read, or permanently blocking the member queue behind a failed unread row.

UI Send Exception

If the user sends a direct UI message to a member while an older OpenCode delivery is active:

Recommended v1 behavior:

Persist the new inbox row as usual.
Return runtimeDelivery.responsePending = true.
Do not bypass the active record and send the new prompt immediately.
Show copy like Message saved. OpenCode is still processing an earlier delivery.

Reason: letting UI sends bypass the ordering rule can reintroduce the same prompt collision that caused lost task assignments.

App-Side Relay Changes

New Service

Add:

src/main/services/team/opencode/delivery/OpenCodePromptDeliveryLedger.ts

Responsibilities:

Create or read records by teamName/memberName/laneId/inboxMessageId.
Enforce payload hash consistency.
Transition records with validation.
List due retry records.
List active member records.
Prune terminal records older than a retention window.
Use VersionedJsonStore.updateLocked() for every mutation.

Suggested retention:

const RESPONDED_RETENTION_MS = 7 * 24 * 60 * 60 * 1000;
const FAILED_RETENTION_MS = 30 * 24 * 60 * 60 * 1000;

Ledger API Surface

Recommended methods:

interface OpenCodePromptDeliveryLedgerStore {
  ensurePending(input: EnsurePromptDeliveryInput): Promise<OpenCodePromptDeliveryLedgerRecord>;
  getByInboxMessage(input: DeliveryRecordKeyInput): Promise<OpenCodePromptDeliveryLedgerRecord | null>;
  getActiveForMember(input: { teamName: string; laneId: string; memberName: string }): Promise<OpenCodePromptDeliveryLedgerRecord | null>;
  listDue(input: { teamName?: string; now: Date; limit: number }): Promise<OpenCodePromptDeliveryLedgerRecord[]>;
  markAccepted(input: MarkAcceptedInput): Promise<OpenCodePromptDeliveryLedgerRecord>;
  applyObservation(input: ApplyObservationInput): Promise<OpenCodePromptDeliveryLedgerRecord>;
  applyDestinationProof(input: ApplyDestinationProofInput): Promise<OpenCodePromptDeliveryLedgerRecord>;
  markRetryScheduled(input: MarkRetryScheduledInput): Promise<OpenCodePromptDeliveryLedgerRecord>;
  markRetryAttempted(input: MarkRetryAttemptedInput): Promise<OpenCodePromptDeliveryLedgerRecord>;
  markInboxReadCommitted(input: { id: string; committedAt: string }): Promise<OpenCodePromptDeliveryLedgerRecord>;
  markInboxReadCommitFailed(input: { id: string; error: string }): Promise<OpenCodePromptDeliveryLedgerRecord>;
  markFailed(input: MarkFailedInput): Promise<OpenCodePromptDeliveryLedgerRecord>;
}

Suggested destination proof input:

interface ApplyDestinationProofInput {
  id: string;
  visibleReplyInbox: string;
  visibleReplyMessageId: string;
  visibleReplyCorrelation: 'relayOfMessageId';
  visibleReplyText: string;
  visibleReplySummary?: string | null;
  semanticallySufficient: boolean;
  observedAt: string;
}

applyObservation() must be idempotent. Re-observing a previously responded record must not change attempts or schedule another prompt.

applyDestinationProof() must also be idempotent. It should set responseState = 'responded_visible_message' and visible-reply fields without incrementing attempts. It should set status = 'responded' and respondedAt only when semanticallySufficient === true; otherwise keep the delivery non-terminal and set lastReason = 'visible_reply_ack_only_still_requires_answer'.

Implementation constraints:

Use the existing RuntimeDeliveryJournalStore style as a template, but do not copy its begin() attempt semantics blindly.
ensurePending() must not increment attempts when a duplicate watcher event sees the same pending or accepted record.
Increment attempts only when a new prompt attempt is actually about to be sent.
markRetryAttempted() should be the only normal transition that increments attempts after the initial attempt.
Validate duplicate ids on read, like RuntimeDeliveryJournalStore does.
A payload hash conflict is terminal for that record and must not send a prompt.
Ledger mutation methods should return the updated record so relay/watchdog code never reasons from stale in-memory copies.

TeamProvisioningService Relay Flow

Current simplified flow:

for (const message of unread.slice(0, 10)) {
  const delivery = await deliverOpenCodeMemberMessage(...);
  if (delivery.delivered) {
    await markInboxMessagesRead(...);
  }
}

New flow:

const active = await promptDeliveryLedger.getActiveForMember(teamName, memberName);
if (active && !isDueForRetry(active, now)) {
  return {
    relayed: 0,
    attempted: 0,
    delivered: 0,
    failed: 0,
    lastDelivery: {
      delivered: true,
      responsePending: true,
      reason: 'opencode_delivery_response_pending',
    },
  };
}

const message = selectOldestEligibleUnreadMessage(...);
let record = await promptDeliveryLedger.ensurePending(...);

const delivery = await deliverOpenCodeMemberMessage(...);
record = await promptDeliveryLedger.applyDeliveryResult(record.id, delivery);

const visibleReply = await findOpenCodeVisibleReplyByRelayOfMessageId({
  teamName,
  replyRecipient: record.replyRecipient,
  from: memberName,
  relayOfMessageId: record.inboxMessageId,
});

if (visibleReply) {
  record = await promptDeliveryLedger.applyDestinationProof({
    id: record.id,
    visibleReplyInbox: visibleReply.inboxName,
    visibleReplyMessageId: visibleReply.messageId,
    visibleReplyText: visibleReply.text,
    visibleReplySummary: visibleReply.summary,
    visibleReplyCorrelation: 'relayOfMessageId',
    semanticallySufficient: isVisibleReplySemanticallySufficient({
      actionMode: record.actionMode,
      taskRefs: record.taskRefs,
      text: visibleReply.text,
      summary: visibleReply.summary,
    }),
    observedAt: new Date().toISOString(),
  });
}

if (isOpenCodeDeliveryReadCommitAllowed(delivery, record)) {
  await markInboxMessagesRead(teamName, memberName, [message]);
  await promptDeliveryLedger.markInboxReadCommitted(...);
}

selectOldestEligibleUnreadMessage() must skip unread rows whose ledger record is failed_terminal, unless options.onlyMessageId explicitly targets that message for a manual retry path.

isOpenCodeDeliveryReadCommitAllowed() should return true for:

semantically sufficient visible response: responded_visible_message,
plain assistant text: responded_plain_text,
generic responded_tool_call only when it can be classified as sufficient for the message intent,
non-visible tool activity only when actionMode/taskRefs make that sufficient.

It should return false for:

responded_non_visible_tool on direct ask messages without task refs,
tool_error,
bootstrap/identity-only tools,
pending,
prompt_not_indexed,
empty_assistant_turn,
reconcile_failed,
session_stale.

Destination proof helper:

async function findOpenCodeVisibleReplyByRelayOfMessageId(input: {
  teamName: string;
  replyRecipient: string;
  from: string;
  relayOfMessageId: string;
}): Promise<{
  inboxName: string;
  messageId: string;
  timestamp: string;
  text: string;
  summary: string | null;
} | null>

Rules:

For replyRecipient === 'user', read inboxes/user.json.
For lead aliases, resolve to the configured lead inbox name.
For same-team teammate recipients, resolve to that teammate inbox name.
Require exact from match after member-name canonicalization.
Require exact relayOfMessageId match.
Prefer a row with source === 'runtime_delivery', but allow missing source only as fallback with diagnostic visible_reply_missing_runtime_delivery_source.
Do not match by text, summary, timestamp, or task display id.

If destination proof appears after the first bridge response observation, the next watchdog scan should mark the delivery responded, commit the original OpenCode recipient row read, and not send another prompt.

applyDeliveryResult() must persist:

acceptedAt when bridge accepted.
prePromptCursor returned by the orchestrator.
runtimeSessionId and runtimePid if present.
acceptanceUnknown when a command-boundary timeout means prompt acceptance cannot be proven.
responseState, deliveredUserMessageId, observedAssistantMessageId, and observedToolCallNames when observation is available.
visibleReplyMessageId, visibleReplyInbox, and visibleReplyCorrelation when destination-store proof exists.

If delivery.responseState is already responded_* but markInboxMessagesRead() fails, the relay must not treat that as prompt failure. It should store inboxReadCommitError and retry the read commit on the next scan.

If an unread row already has a ledger record with status: 'responded' and inboxReadCommittedAt: null, relay should only retry markInboxMessagesRead().

Delivery Result Shape

Extend internal delivery result:

interface OpenCodeMemberInboxDelivery {
  delivered: boolean;          // bridge accepted or full response? See below.
  accepted?: boolean;
  responsePending?: boolean;
  responseState?: OpenCodePromptResponseState;
  ledgerStatus?: OpenCodePromptDeliveryStatus;
  acceptanceUnknown?: boolean;
  visibleReplyMessageId?: string;
  visibleReplyCorrelation?: 'relayOfMessageId' | 'direct_child_message_send' | 'plain_assistant_text';
  queuedBehindMessageId?: string;
  reason?: string;
  diagnostics?: string[];
}

For internal relay completion:

accepted: true, responsePending: true, delivered: true is acceptable for UI feedback, but must not mark row read.
responseState: responded_*, delivered: true, responsePending: false commits read.
delivered: false means prompt acceptance failed.

For SendMessageResult.runtimeDelivery, keep backward compatibility:

runtimeDelivery: {
  providerId: 'opencode',
  attempted: true,
  delivered: accepted,
  responsePending: observation is not responded,
  responseState,
  ledgerStatus,
  acceptanceUnknown,
  visibleReplyMessageId,
  visibleReplyCorrelation,
  queuedBehindMessageId,
  reason,
  diagnostics,
}

This avoids making UI send look like a hard failure when OpenCode accepted the message but has not responded yet.

Watchdog Scheduling

Add a lightweight app-side scheduler:

src/main/services/team/opencode/delivery/OpenCodePromptDeliveryWatchdog.ts

Responsibilities:

Wake after inbox relay creates accepted, unanswered, or retry_scheduled.
On app startup or team activation, scan due records.
For each due record, first call the observe-only bridge path.
If observe shows response proof, commit the inbox row read.
If observe shows pending/blocked, reschedule without prompt retry.
If observe shows unanswered and retry is due, call back into TeamProvisioningService.relayOpenCodeMemberInboxMessages(teamName, memberName, { onlyMessageId, source: 'watchdog' }).
Cap concurrency globally and per member.
Use the same per-member OpenCode delivery gate as relay, not a separate parallel executor for the same member.

Recommended caps:

const OPENCODE_PROMPT_WATCHDOG_GLOBAL_CONCURRENCY = 2;
const OPENCODE_PROMPT_WATCHDOG_PER_TEAM_CONCURRENCY = 1;

Do not use a tight interval. Use:

a delayed timer for the nearest nextAttemptAt,
plus opportunistic scans on inbox watcher events,
plus app/team startup recovery.

Startup Recovery Algorithm

On app startup or when a team detail view becomes active:

Read OpenCode lane index.
For each active lane, read lane-scoped prompt delivery ledger.
For each responded record with inboxReadCommittedAt === null, retry only markInboxMessagesRead().
For each accepted or retry_scheduled record, call opencode.observeMessageDelivery.
For each due unanswered record, observe first, then retry only if still unanswered.
For missing/quarantined prompt ledger files, reconstruct unread inbox rows as acceptanceUnknown and observe before prompt.
Ignore records whose lane is no longer active unless the row is still unread and the team is running. In that case classify as failed_retryable with opencode_runtime_not_active.

Recovery must group work by (teamName, laneId, memberName) and run each group through the per-member gate. Do not let startup recovery observe/retry a member while the inbox watcher is also relaying that member.

This avoids retrying stale prompts immediately after app restart.

Orchestrator Response Observer

New Pure Helper

Add:

/Users/belief/dev/projects/claude/agent_teams_orchestrator/src/services/opencode/OpenCodeDeliveryResponseObserver.ts

Example:

export function observeOpenCodeDeliveryResponse(input: {
  inboundMessageId: string | null;
  prePromptCursor: string | null;
  summary: OpenCodeSessionReconcileSummary;
}): OpenCodeDeliveryResponseObservation {
  if (input.summary.reconcileOutcome === 'stale') {
    return { state: 'session_stale', ... };
  }

  if (input.summary.pendingPermissionRequestIds.length > 0) {
    return { state: 'permission_blocked', ... };
  }

  const deliveryAttempts = findDeliveredUserMessageAttempts(
    input.summary.messages,
    input.inboundMessageId,
    input.prePromptCursor,
  );
  if (deliveryAttempts.length === 0) {
    return input.summary.rawStatus === 'busy' || input.summary.rawStatus === 'retry'
      ? { state: 'pending', ... }
      : { state: 'empty_assistant_turn', reason: 'delivered_user_message_not_found', ... };
  }

  const attemptChildren = deliveryAttempts.map((attempt) => ({
    deliveredUser: attempt,
    assistantChildren: input.summary.messages.filter(
      (message) => message.role === 'assistant' && message.parentId === attempt.id
    ),
  }));

  return classifyDeliveryAttempts(attemptChildren, input.summary.rawStatus);
}

Keep this helper pure so fixture tests are cheap and deterministic.

Bridge Integration

Update runSendMessage():

const inboundMessageId = asString(body.messageId);
const prePromptCursor = record.lastCanonicalCursor ?? null;

await openCodeSessionBridge.promptAsync(...);

let responseObservation: OpenCodeDeliveryResponseObservation = {
  state: 'pending',
  ...
};

try {
  const reconciled = await withTimeout(...);
  responseObservation = observeOpenCodeDeliveryResponse({
    inboundMessageId,
    prePromptCursor,
    summary: reconciled.summary,
  });
} catch (error) {
  responseObservation = {
    state: 'reconcile_failed',
    reason: stringifyError(error),
    ...
  };
}

return {
  accepted: true,
  prePromptCursor,
  responseObservation,
  ...
};

The app ledger must persist the returned prePromptCursor with the delivery record. If the bridge cannot return it because the command failed before loading a session record, store null and rely only on exact messageId matching.

UI Semantics

UI should not synthesize replies.

For user sends to OpenCode:

If inbox persistence succeeds and bridge acceptance fails: show warning as today.
If bridge acceptance succeeds but response is pending: show a non-blocking warning/info:

Message delivered to OpenCode. Waiting for teammate response...

If watchdog later marks failed_terminal: surface diagnostic in team warning area or member card, not as a fake message from the teammate.

Add fields to SendMessageResult.runtimeDelivery:

responsePending?: boolean;
responseState?: OpenCodePromptResponseState;
ledgerStatus?: OpenCodePromptDeliveryStatus;
acceptanceUnknown?: boolean;
visibleReplyMessageId?: string;
visibleReplyCorrelation?: 'relayOfMessageId' | 'direct_child_message_send' | 'plain_assistant_text';
queuedBehindMessageId?: string;

Renderer store should continue to return the SendMessageResult and only rethrow real send failures.

Important current UI detail:

The renderer inserts an optimistic copy of the user's sent message with read: true in the sender-side message feed.
That is separate from the recipient's inbox row and should not be changed by the OpenCode prompt ledger.
The ledger controls only inboxes/<opencode-member>.json read commit.

Warning copy should distinguish three cases:

Bridge failed:

OpenCode runtime delivery failed: <reason>. Message was saved to inbox.

UI timeout while delivery is still being checked:

OpenCode delivery is still being checked. Message was saved and will be observed before retry.

Bridge accepted but response pending:

Message delivered to OpenCode. Waiting for teammate response...

Message queued behind older OpenCode delivery:

Message saved. OpenCode is still processing an earlier delivery.

Previous OpenCode delivery exhausted retries:

OpenCode did not respond after retries. Message is still unread and will not be retried automatically.

Renderer pending-reply behavior:

Keep the recipient in a pending state for responsePending: true.
Clear pending when a real message from that member appears in inboxes/user.json.
Clear pending when runtime delivery fails terminally.
Do not clear pending merely because delivered: true; for OpenCode that can mean prompt accepted, not teammate answered.

Task Assignment Integration

Task assignment notifications are the highest-risk OpenCode path because they are often watcher-driven, not UI-driven.

Required behavior:

Assignment inbox rows must include taskRefs.
OpenCode delivery prompt already includes taskRefs when present.
Watchdog response proof for task rows accepts task MCP tool calls as success.
Direct user ask rows should not be closed by a non-visible tool call alone; they need visible message or plain assistant text.
If OpenCode sends a visible message saying blocked or busy with concrete blocker/status, classify as responded and mark read.
If OpenCode does nothing, retry bounded.

Already fixed in controller during research:

agent-teams-controller/src/internal/tasks.js now includes the assigned task as a structured taskRefs item in owner assignment notifications.
Test added: includes the assigned task ref in owner assignment notifications.

This should be part of the final implementation baseline.

Delivery Watchdog Is Not A Task Stall Monitor

The delivery ledger should answer only one question:

Did OpenCode notice and act on this delivered inbox row?

It should not try to prove:

the task is complete,
the implementation is correct,
the teammate is making enough progress after the first action.

If an agent responds to an assignment by calling task_get or task_start, the delivery should be considered responded. If the task later stalls, that belongs to the task stall monitor or a separate OpenCode task-progress watchdog.

This boundary avoids turning message delivery into a broad autonomous task supervisor.

Do not enable or rewrite the global task-stall monitor as part of this v1. It is a separate product behavior with different thresholds, recipients, and false-positive risks. The only v1 integration is that task-related MCP tool calls can prove prompt delivery.

Edge Cases

Duplicate inbox watcher events

Problem: leaving rows unread until responded means file watchers can repeatedly call relay.

Mitigation:

Ledger lookup happens before any bridge call.
If active record is not due, relay returns responsePending: true without sending another prompt.

App restart after accepted prompt

Problem: app exits after prompt acceptance but before response proof.

Mitigation:

Ledger persists accepted.
Startup recovery scans accepted records.
If transcript now shows response proof, mark responded and commit inbox row read.
If still unanswered and due, retry.

OpenCode session stale after behavior/config change

Problem: reconcileSession() can return stale because behavior fingerprint changed.

Mitigation:

Observer returns session_stale.
Ledger moves to failed_retryable with diagnostic.
If active lane recovery later resolves session identity, retry can continue.
If no active runtime can be resolved, move to failed_terminal after max attempts or team stop.

Permission request blocks the session

Problem: OpenCode may need tool permission approval.

Mitigation:

Observer returns permission_blocked.
Do not retry while blocked.
Keep row unread.
Existing permission UI should handle the request.
After permission answer, watchdog scan should re-check response state.

Plain assistant text

Problem: OpenCode may answer as plain text instead of agent-teams_message_send.

Mitigation:

Classify as responded_plain_text.
Mark row read to avoid prompt spam.
Emit diagnostic warning because app Messages may not display that as a normal teammate message unless transcript projection surfaces it elsewhere.

Do not retry plain text. Retrying would likely produce duplicate or contradictory answers.

Multiple unread messages

Problem: old code can deliver up to 10 unread OpenCode prompts in one loop.

Mitigation:

One outstanding non-terminal ledger record per OpenCode member.
Newer rows wait in inbox until the active one is answered or terminal.

Same text, multiple messages

Problem: duplicate text should not dedupe distinct messages.

Mitigation:

Primary key is inboxMessageId.
Payload hash only detects corruption.

Message text changed under same messageId

Problem: inbox row edited after ledger creation.

Mitigation:

If payload hash differs for same record ID, do not retry silently.
Mark failed_terminal with opencode_prompt_delivery_payload_mismatch.

Team stopped while retries are scheduled

Problem: retries after stop would resurrect stale runtime assumptions.

Mitigation:

Watchdog checks current team/run/lane before retry.
If runtime stopped, mark failed_terminal or failed_retryable depending on whether team is still running.

Lead is OpenCode

Pure OpenCode lead remains unsupported in v1.

If recipient is OpenCode lead and no stored lead session exists:

Do not fake delivery.
Leave inbox unread.
Emit explicit diagnostic.

This plan is for OpenCode secondary teammate delivery.

Attachments

Current renderer UX only supports file attachments for live team-lead sends. OpenCode secondary teammate live delivery does not have an attachment transport in v1.

Rules:

Do not add OpenCode attachment transport in this watchdog work.
If an OpenCode secondary inbox row has attachments anyway, include attachment metadata in payloadHash.
Do not embed file contents into the OpenCode prompt.
Return failed_terminal with opencode_attachments_not_supported_for_secondary_runtime unless a later implementation adds a real attachment delivery contract.
Keep the row unread and surface the diagnostic.

Reason: silently delivering only text for an attachment-bearing message would be another form of message loss.

Implementation Phases

Phase 0b - MCP Visible Reply Correlation

Files:

mcp-server/src/tools/messageTools.ts
agent-teams-controller/src/internal/messages.js
agent-teams-controller/src/internal/messageStore.js
src/main/services/team/runtime/OpenCodeTeamRuntimeAdapter.ts
src/main/services/team/TeamDataService.ts
src/main/services/team/TeamMessageFeedService.ts
src/renderer/store/slices/teamSlice.ts
test/controller.test.js

Definition of done:

message_send accepts optional relayOfMessageId.
Controller passes relayOfMessageId through to the message store without inventing it.
Message store persists relayOfMessageId on the visible reply row.
OpenCode runtime prompt asks for relayOfMessageId="<inbound app messageId>" on visible runtime replies.
OpenCode ack-only guard does not treat source alone as explicit delivery context.
Existing renderer/feed correlation by relayOfMessageId continues to work for native and OpenCode messages.
No existing message_send caller is required to provide relayOfMessageId; the field is only mandatory for OpenCode runtime-delivery prompt wording.

This phase is small but important. It gives the watchdog a hard correlation signal and prevents future implementation from falling back to summary/time heuristics.

Phase 1 - Orchestrator Response Observer

Files:

/Users/belief/dev/projects/claude/agent_teams_orchestrator/src/services/opencode/OpenCodeDeliveryResponseObserver.ts
/Users/belief/dev/projects/claude/agent_teams_orchestrator/src/services/opencode/OpenCodeBridgeCommandHandler.ts
/Users/belief/dev/projects/claude/agent_teams_orchestrator/src/services/opencode/types.ts
/Users/belief/dev/projects/claude/agent_teams_orchestrator/src/services/opencode/OpenCodeBridgeCommandHandler.test.ts

Definition of done:

runSendMessage() still returns accepted: true after promptAsync() succeeds.
Response observation is included when reconcile succeeds.
Reconcile failure after prompt acceptance returns accepted: true with responseObservation.state = 'reconcile_failed'.
Empty assistant child is classified as unanswered, not responded.
Tool call child is classified as responded.
Plain assistant text is classified as responded_plain_text.
prompt_not_indexed is returned when async prompt acceptance has not appeared in history yet.

Phase 1b - Observe-Only Bridge Command

Files:

/Users/belief/dev/projects/claude/agent_teams_orchestrator/src/services/opencode/OpenCodeBridgeCommandHandler.ts
/Users/belief/dev/projects/claude/agent_teams_orchestrator/src/services/opencode/OpenCodeBridgeCommandHandler.test.ts
src/main/services/team/opencode/bridge/OpenCodeBridgeCommandContract.ts
src/main/services/team/opencode/bridge/OpenCodeReadinessBridge.ts
src/main/services/team/runtime/OpenCodeTeamRuntimeAdapter.ts

Definition of done:

opencode.observeMessageDelivery validates runtime/session preconditions.
It does not call promptAsync().
It is executed as a direct bridge command, like current opencode.sendMessage.
It is not part of OpenCodeStateChangingTeamCommandName.
It does not use OpenCodeStateChangingBridgeCommandService.
It returns the same response observation shape as opencode.sendMessage.
App watchdog uses this before every retry.

Phase 2 - Bridge Contract Propagation In claude_team

Files:

src/main/services/team/opencode/bridge/OpenCodeBridgeCommandContract.ts
src/main/services/team/runtime/OpenCodeTeamRuntimeAdapter.ts
src/main/services/team/TeamProvisioningService.ts
src/shared/types/team.ts

Definition of done:

OpenCodeTeamRuntimeAdapter.sendMessageToMember() maps bridge responseObservation.
OpenCodeTeamRuntimeAdapter.observeMessageDelivery() maps observe-only bridge responseObservation.
deliverOpenCodeMemberMessage() returns accepted, responsePending, and responseState.
IPC UI relay timeout maps to responsePending plus acceptanceUnknown, not terminal failure.
UI SendMessageResult.runtimeDelivery can show pending response without treating it as send failure.

Phase 3 - Prompt Delivery Ledger

Files:

src/main/services/team/opencode/delivery/OpenCodePromptDeliveryLedger.ts
src/main/services/team/opencode/store/RuntimeStoreManifest.ts
test/main/services/team/OpenCodePromptDeliveryLedger.test.ts

Definition of done:

Store is versioned and validates records.
Store is lane-scoped for secondary OpenCode lanes.
RuntimeStoreSchemaName, OPENCODE_RUNTIME_STORE_DESCRIPTORS, and schema validators know opencode.promptDeliveryLedger.
Duplicate record with same payload is idempotent.
Same record ID with different payload hash fails loudly.
Due retry query works.
Terminal record pruning is deterministic.

Phase 4 - Relay Integration

Files:

src/main/services/team/TeamProvisioningService.ts
test/main/services/team/TeamProvisioningServiceRelay.test.ts

Definition of done:

OpenCode row is not marked read at prompt acceptance if response is pending/unanswered.
OpenCode row is marked read when response proof is responded_*.
OpenCode row is not re-prompted when response is observed but read commit failed.
Duplicate watcher events do not re-prompt before nextAttemptAt.
One outstanding OpenCode delivery per member is enforced.
Existing native relay tests still pass.

Phase 5 - Watchdog Scheduler

Files:

src/main/services/team/opencode/delivery/OpenCodePromptDeliveryWatchdog.ts
src/main/services/team/TeamProvisioningService.ts
src/main/index.ts
test/main/services/team/TeamProvisioningServiceRelay.test.ts

Definition of done:

Startup/team activation scans due OpenCode prompt deliveries.
File watcher events schedule nearest retry instead of tight loops.
Bounded retry stops at maxAttempts.
Team stop cancels or ignores due retries.

Phase 6 - UI Diagnostics

Files:

src/shared/types/team.ts
src/renderer/store/slices/teamSlice.ts
src/renderer/components/team/messages/MessageComposer.tsx
src/renderer/components/team/dialogs/SendMessageDialog.tsx
src/renderer/components/team/messages/MessagesPanel.tsx

Definition of done:

UI send does not clear as hard failure when OpenCode accepted but response pending.
UI shows a clear pending response diagnostic.
UI does not synthesize teammate replies.
Real bridge failure still shows warning and preserves draft behavior.

Phase 7 - Fixture E2E Coverage

Files:

/Users/belief/dev/projects/claude/agent_teams_orchestrator/src/services/opencode/fixtures/
test/fixtures/team/opencode/
new orchestrator observer tests.
new app relay ledger tests.

Required fixture scenarios:

Real success with OpenCode task/message tool calls.
Empty assistant turn after delivered prompt.
Session stale after prompt acceptance.
Permission blocked.
Plain assistant text fallback.
Multiple unread rows with one active outstanding record.
App restart with accepted ledger record and later transcript proof.
Long transcript where limited reconcile misses the delivered prompt and full-history fallback finds it.
Bridge timeout after possible prompt acceptance, followed by observe-before-retry.

Minimum Real-Fixture Assertions

For every real OpenCode fixture test, assert all of these:

Delivered prompt is matched by inbound messageId, not by latest-user heuristic.
Multiple retry prompts with the same inbound messageId are treated as one logical delivery.
Multiple assistant children under one delivered prompt are all inspected.
Empty assistant child does not count as responded.
Bootstrap-only tool calls do not count as responded.
Successful agent-teams_message_send counts as visible response, but read commit still requires correlation and semantic sufficiency.
Successful task tool call counts as delivery response for task assignment.
Errored agent-teams_message_send does not mark inbox read.
Plain text response counts as responded but emits diagnostic.
Tool names are normalized across agent-teams_, mcp__agent-teams__, and plain forms.
Lowercase OpenCode execution tools such as bash and read count only when actionMode allows execution response proof.

Test Plan

Orchestrator

cd /Users/belief/dev/projects/claude/agent_teams_orchestrator
bun test src/services/opencode/OpenCodeBridgeCommandHandler.test.ts src/services/opencode/OpenCodeDeliveryResponseObserver.test.ts

Target tests:

sendMessage returns accepted with responded_tool_call when child assistant has MCP tool call.
sendMessage returns visibleReplyCorrelation relayOfMessageId when message_send args reference inbound messageId.
sendMessage returns visible_reply_missing_relayOfMessageId diagnostic when direct-child message_send lacks relayOfMessageId.
sendMessage returns accepted with empty_assistant_turn when child assistant has no text/tool/error/finish.
sendMessage returns accepted with pending when OpenCode is busy and no child exists.
sendMessage returns accepted with permission_blocked when pending permission exists.
sendMessage returns accepted with reconcile_failed warning when reconcile throws.
observeMessageDelivery does not call promptAsync.
observeMessageDelivery returns prompt_not_indexed for accepted prompt not visible in transcript yet.
observeMessageDelivery uses full-history fallback when limited reconcile misses the prompt anchor.
observeMessageDelivery is not routed through state-changing bridge command service.
observer normalizes agent-teams, mcp agent-teams, and plain tool names.
Fixture test for real OpenCode transcript projection.

App main process

cd /Users/belief/dev/projects/claude/claude_team
pnpm vitest run \
  test/main/services/team/TeamProvisioningServiceRelay.test.ts \
  test/main/services/team/OpenCodeTeamRuntimeAdapter.test.ts \
  test/main/ipc/teams.test.ts

Target tests:

visible reply with relayOfMessageId commits OpenCode inbox read even when transcript observe is still prompt_not_indexed.
visible reply destination row is preferred over transcript-only response proof.
transcript message_send without destination row is re-observed once before retry.
correlated ack-only visible reply to ask message stays unread and schedules visible answer retry.
correlated ack-only visible reply to task assignment stays unread unless task or execution proof exists.
semantic sufficiency classifier only blocks short exact ack-only phrases and allows concrete status text.
accepted pending OpenCode delivery keeps inbox row unread and ledger accepted.
responded OpenCode delivery marks inbox row read.
direct ask with only non-visible tool activity stays unread and requests visible reply.
reasoning-only assistant child does not mark direct ask read.
task assignment with task_start marks delivery responded and read.
missing ledger rebuilds unread row as acceptanceUnknown and observes before prompt.
failed_terminal unread row is skipped by automatic relay and does not block newer rows.
OpenCode secondary row with attachments fails terminal with attachment unsupported diagnostic and is not text-only delivered.
pending record is not retried into a different lane after recipient provider/model changes.
responded but mark-read failed stores inboxReadCommitError and does not retry prompt.
duplicate watcher while accepted does not call bridge again.
watchdog and watcher share one per-member delivery gate.
onlyMessageId request behind active older delivery returns queued pending, not delivered.
due unanswered row retries once and increments attempts.
watchdog observes before retry and skips prompt when response appeared after previous timeout.
bridge timeout with acceptanceUnknown observes before retrying.
max attempts moves to failed_terminal.
one outstanding record blocks newer unread rows.
app restart scans accepted ledger and commits read when transcript now proves response.

Controller

cd /Users/belief/dev/projects/claude/claude_team
pnpm --filter agent-teams-controller test -- test/controller.test.js

Must stay green, including assignment notification taskRefs.

Additional target tests:

message_send accepts relayOfMessageId and persists it.
OpenCode ack-only runtime_delivery without relayOfMessageId is rejected.
OpenCode substantive runtime_delivery without relayOfMessageId is allowed but not treated as correlated delivery proof.
source alone does not bypass idle ack filtering for OpenCode senders.

Renderer

cd /Users/belief/dev/projects/claude/claude_team
pnpm vitest run \
  test/renderer/store/teamSlice.test.ts \
  test/renderer/components/team/messages/MessagesPanel.test.ts \
  test/renderer/components/team/dialogs/SendMessageDialog.test.tsx

Target tests:

OpenCode accepted but response pending is not treated as send failure.
OpenCode UI relay timeout is shown as pending/unknown, not terminal failure.
Bridge failure still returns warning.
Draft is not cleared on true send failure.
Pending reply indicator remains while responsePending is true and clears on real reply or terminal delivery failure.
Pending reply indicator clears when a user-inbox row arrives with matching relayOfMessageId.
No fake teammate reply is projected.

Rollout And Safety

Recommended rollout:

Default enabled for OpenCode only.
Add an emergency kill switch:

CLAUDE_TEAM_OPENCODE_PROMPT_DELIVERY_WATCHDOG=0

If disabled:

Keep current prompt delivery behavior.
Log that OpenCode prompt delivery watchdog is disabled.

This is a pragmatic safety valve, not a product feature flag.

Logging

Add structured logs for these events:

opencode_prompt_delivery_ledger_created
opencode_prompt_delivery_prompt_accepted
opencode_prompt_delivery_response_observed
opencode_prompt_delivery_unanswered
opencode_prompt_delivery_retry_scheduled
opencode_prompt_delivery_retry_attempted
opencode_prompt_delivery_terminal_failure
opencode_prompt_delivery_inbox_committed_read

Minimum fields:

{
  teamName,
  memberName,
  laneId,
  runId,
  inboxMessageId,
  runtimeSessionId,
  status,
  responseState,
  attempts,
  nextAttemptAt,
  visibleReplyCorrelation,
  visibleReplySemanticallySufficient,
  reason,
}

Do not log full message text by default. Log payloadHash and first safe preview only under debug.

Production Risks And Mitigations

Risk: legitimate long-running work gets retried

Mitigation:

If OpenCode status is busy or retry, do not retry.
For task assignments, use longer grace.
Meaningful tool calls count as response proof after filtering bootstrap and identity-only tools.

Risk: plain text response is not visible in Messages UI

Mitigation:

Classify plain text as responded_plain_text.
Do not retry.
Add diagnostic so we can later improve transcript-to-UI projection.

Risk: unread rows accumulate after terminal failure

Mitigation:

Keep terminal diagnostics in ledger.
Skip terminal-failed rows during automatic relay selection so they do not block newer rows.
Optionally mark terminal failures read only if UI has an explicit "ack failed delivery" action. Do not auto-read in v1.

Risk: user asks a question, model uses a tool, but never answers

Mitigation:

Treat non-visible tool activity as sufficient only for task/do/delegate intents.
For ask messages, require visible agent-teams_message_send or plain assistant text before read commit.
Retry with copy that asks for a visible answer if the session goes idle after only non-visible tool activity.

Risk: model sends correlated acknowledgement but no answer/action

Mitigation:

relayOfMessageId proves correlation only, not semantic completion.
For asks and direct messages, ack-only text is not read-commit sufficient.
For task/do delivery, ack-only text needs task or execution proof before read commit.
Retry copy asks for concrete answer/status and tells the model not to send only acknowledgement.

Risk: retry prompt causes duplicate task work

Mitigation:

Retry prompt explicitly says not to duplicate if already acted.
Observer checks transcript before every retry.
One outstanding delivery per member reduces prompt collisions.

Risk: app and orchestrator disagree on response states

Mitigation:

Response state enum lives in bridge contract.
App treats unknown response states as pending with diagnostic, not success.

Implementation Guardrails

Reject an implementation if any of these are true:

It marks an OpenCode member inbox row read immediately after promptAsync() acceptance.
It marks direct ask messages read after only non-visible tool activity.
It uses summary/time/passive-message linking as OpenCode delivery proof instead of explicit relayOfMessageId or transcript parent/child proof.
It treats source="runtime_delivery" alone as explicit context for OpenCode ack-only messages.
It marks direct asks or task deliveries read after a correlated but ack-only visible message with no answer/status/tool proof.
It requires every message_send caller to provide relayOfMessageId; only OpenCode runtime-delivery replies should be prompted to include it.
It retries a prompt before running observe-only reconcile.
It uses latest user message as response-proof anchor without exact messageId or pre-prompt cursor.
It returns terminal missing-prompt states from a bounded transcript before trying full-history fallback when the anchor is absent.
It treats bootstrap-only tools as response proof.
It routes opencode.observeMessageDelivery through the state-changing bridge command service.
It uses a team-root prompt ledger for secondary OpenCode lanes.
It allows multiple active non-terminal OpenCode prompt deliveries for the same member.
It treats mark-read failure after response proof as a reason to re-prompt.
It hides terminal OpenCode delivery failure by silently marking the inbox row read.
It lets a failed_terminal unread row block newer automatic OpenCode deliveries forever.
It silently drops attachments when delivering an OpenCode secondary inbox row.
It retries a ledger record through a newly resolved lane/provider instead of the record's original OpenCode lane identity.
It changes native Codex, Claude, or Gemini teammate inbox semantics.

The core invariant:

OpenCode prompt accepted != OpenCode teammate responded.

Everything else follows from this.

Open Questions

These are not blockers for v1, but they should be decided before implementation review:

Should failed_terminal rows remain unread forever, or should the UI get an explicit "ack failed delivery" action? Recommended: keep unread and surface diagnostics in v1. 🎯 8 🛡️ 8 🧠 4
Should responded_plain_text be enough to mark read? Recommended: yes, with warning. Retrying plain text is more dangerous than accepting it. 🎯 8 🛡️ 8 🧠 3
Should max attempts be 3 or 2? Recommended: 3 with delays [30s, 90s, 180s]. 🎯 8 🛡️ 8 🧠 2
Should OpenCode process only one unread row per member at a time? Recommended: yes. This is probably as important as the retry ledger. 🎯 9 🛡️ 9 🧠 5

Final Recommendation

Implement the ledger with read-on-responded semantics and one outstanding OpenCode delivery per member.

This fixes the real bug class:

prompt accepted but no action,
inbox row already read,
task stuck forever,
no durable retry path.

It also avoids the two common bad fixes:

prompt spam loops,
fake frontend replies.

93 KiB Raw Permalink Blame History

OpenCode Delivery Ledger And Bounded Retry Plan

Summary

Why This Is Needed

Current Code Paths

App-side OpenCode delivery

Deeper research findings

Orchestrator-side OpenCode bridge

Transcript capabilities

Key Design Decision

Recommended read semantics

Options Considered

Ledger Model

Missing Or Quarantined Ledger Recovery

Record Shape

Stable Record ID

Runtime Identity Binding

Response Versus Inbox Commit

Why Not Use sendSessionMessage() For v1

State Machine

Response Proof Contract

How To Identify The Delivered Prompt

Visible Reply Correlation Contract

Visible Reply Semantic Sufficiency

OpenCode Ack-Only Guard Tightening

Reconcile Limit Fallback

What Counts As Responded

What Counts As Unanswered

Response Proof Priority

Response Sufficiency Policy

Bridge Contract Extension

Add Observe-Only Bridge Command

Retry Policy

Acceptance-Unknown Bridge Failures

UI Timeout Is Not Delivery Failure

Retry Eligibility Matrix

Single Outstanding Delivery Per OpenCode Member

Required Serialization

Active Record Definition

Failed Terminal Rows

UI Send Exception

App-Side Relay Changes

New Service

Ledger API Surface

TeamProvisioningService Relay Flow

Delivery Result Shape

Watchdog Scheduling

Startup Recovery Algorithm

Orchestrator Response Observer

New Pure Helper

Bridge Integration

UI Semantics

Task Assignment Integration

Delivery Watchdog Is Not A Task Stall Monitor

Edge Cases

Duplicate inbox watcher events

App restart after accepted prompt

OpenCode session stale after behavior/config change

Permission request blocks the session

Plain assistant text

Multiple unread messages

Same text, multiple messages

Message text changed under same messageId

Team stopped while retries are scheduled

Lead is OpenCode

Attachments

Implementation Phases

Phase 0b - MCP Visible Reply Correlation

Phase 1 - Orchestrator Response Observer

Phase 1b - Observe-Only Bridge Command

Phase 2 - Bridge Contract Propagation In claude_team

Phase 3 - Prompt Delivery Ledger

Phase 4 - Relay Integration

Phase 5 - Watchdog Scheduler

Phase 6 - UI Diagnostics

Phase 7 - Fixture E2E Coverage

Minimum Real-Fixture Assertions

Test Plan

Orchestrator

App main process

93 KiB

Raw Permalink Blame History

Why Not Use `sendSessionMessage()` For v1