fix: harden opencode delivery e2e flows

777genius 2026-05-14 09:51:29 +03:00
parent 6dc0c1b233
commit 9a8a59757c
11 changed files with 503 additions and 582 deletions

@@ -23,6 +23,7 @@
 | [research-worktrees.md](./research-worktrees.md) | Git worktrees + teams, launching Claude processes from the UI (Phase 2) |
 | [task-queue-derived-agenda-plan.md](./task-queue-derived-agenda-plan.md) | Detailed rollout plan for separating queue/inventory, derived actionOwner, and phased agenda/delta sync |
 | [debugging-agent-teams.md](./debugging-agent-teams.md) | Runtime debugging runbook, including `CLAUDE_TEAM_TEAMMATE_MODE=tmux` for pane-backed teammate debugging |
+| [adaptive-task-graphs-research-note.md](./adaptive-task-graphs-research-note.md) | Research note on LATTE/AgentConductor: dynamic task graphs, frontier scheduling, selective verify, release stragglers |
 ## Key decisions

@@ -0,0 +1,181 @@
# Adaptive Task Graphs For Agent Teams
**Date:** 2026-05-14
**Status:** Research note, not an approved implementation plan
**Scope:** Team Management, task graph scheduling, lead/member coordination, token and conflict reduction
## Sources
- [AgentConductor: Topology Evolution for Multi-Agent Competition-Level Code Generation](https://arxiv.org/abs/2602.17100)
- [Improving the Efficiency of Language Agent Teams with Adaptive Task Graphs](https://arxiv.org/html/2605.06320v1)
## Why This Is Interesting
These papers point at the same product problem we already see in Agent Teams: multi-agent performance is limited less by raw model capability and more by coordination overhead.
The useful idea is not "replace our orchestrator with a research framework". The useful idea is to make the task board itself a more explicit coordination graph:
- tasks are graph nodes
- `blockedBy` / `blocks` are dependency edges
- ready work is the graph frontier
- workers should receive scoped local context, not full team history
- stalled work should be released or reassigned explicitly
- risky or high-impact work should get selective verification
- coordination quality should be measured, not inferred from vibes
This fits our existing direction because the product already has task dependencies, review workflow, stall monitoring, task logs, context tracking, and lead/member briefing surfaces.
## Most Valuable Ideas To Preserve
### 1. LATTE-style dynamic task graph
LATTE is the more directly useful paper for us.
Core idea:
- the lead owns global graph consistency
- workers can propose or claim local work
- structural updates are serialized through the lead or controller
- execution stays parallel where dependencies allow it
- the graph remains inspectable, so coordination decisions are visible in the UI
Relevant operators to consider:
- `Discover` - create a newly discovered task when implementation reveals missing work
- `Assign` - set an owner for a ready task
- `Claim` - allow an idle member to take an unowned ready task
- `Complete` - mark task completion
- `Release` - clear owner or return stalled work to the ready queue
- `Close` - close stale/completed tasks when tests or evidence prove completion
- `Verify` - insert a lightweight review/check task before downstream work proceeds
🎯 Product value: 9/10
🛡️ Reliability if implemented incrementally: 8/10
🧠 Complexity: 6/10
Expected change size for a first useful version: about 700-1400 LOC.
### 2. Frontier-based scheduling
The board should be able to derive "what is actionable now" from graph state:
- a task is ready when all `blockedBy` tasks are completed or approved
- blocked tasks should not be started automatically
- ready unowned tasks can be offered to idle members
- ready owned tasks belong in the owner's operational queue
- lead briefing should show graph bottlenecks and unassigned frontier work
This connects directly to `task-queue-derived-agenda-plan.md`. The key addition is to treat the queue as a graph frontier, not just a filtered task list.
🎯 Product value: 9/10
🛡️ Reliability: 8/10
🧠 Complexity: 5/10
Expected change size: about 500-1000 LOC if built on the current derived agenda work.
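
The readiness rules above can be sketched as a pure function over the board. This is a minimal illustration, not the product's implementation: the `BoardTask` shape, the status names, and `deriveFrontier` are assumptions made for the example; only `blockedBy` and the "completed or approved" rule come from this note.

```typescript
// Hypothetical task shape for illustration; only `blockedBy` mirrors the note.
type TaskStatus = 'pending' | 'in_progress' | 'completed' | 'approved';

interface BoardTask {
  id: string;
  status: TaskStatus;
  owner?: string;
  blockedBy: string[];
}

interface Frontier {
  readyUnowned: BoardTask[]; // can be offered to idle members
  readyOwned: BoardTask[]; // belongs in the owner's operational queue
  blocked: BoardTask[]; // must not be started automatically
}

const isDone = (task: BoardTask | undefined): boolean =>
  task !== undefined && (task.status === 'completed' || task.status === 'approved');

function deriveFrontier(tasks: BoardTask[]): Frontier {
  const byId = new Map(tasks.map((task) => [task.id, task] as const));
  const frontier: Frontier = { readyUnowned: [], readyOwned: [], blocked: [] };
  for (const task of tasks) {
    if (isDone(task)) continue; // finished work is not part of the frontier
    // Assumption: a dependency missing from the board counts as unresolved.
    const ready = task.blockedBy.every((id) => isDone(byId.get(id)));
    if (!ready) frontier.blocked.push(task);
    else if (task.owner) frontier.readyOwned.push(task);
    else frontier.readyUnowned.push(task);
  }
  return frontier;
}
```

Keeping the derivation pure means the same function could feed both the lead briefing and a member's queue without touching runtime state.
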
### 3. Selective verification instead of reviewing everything
LATTE's `Verify` is useful because it scales review cost with risk:
- verify upstream tasks that many other tasks depend on
- verify work touching shared files or public contracts
- verify tasks whose owner reported uncertainty
- skip extra verification for small isolated changes unless policy requires it
This maps well to our existing review UI and task comments. A future implementation could create a verification task or request review based on graph impact.
🎯 Product value: 8/10
🛡️ Reliability: 7/10
🧠 Complexity: 5/10
Expected change size: about 350-800 LOC.
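
One way to picture "review cost scales with risk" is a small selection heuristic over the graph. The sketch below is an assumption-heavy illustration: `ReviewableTask`, its boolean flags, `VerifyPolicy.minDependents`, and `selectTasksToVerify` are hypothetical names, and counting only direct dependents is a simplification of "graph impact".

```typescript
// Hypothetical inputs; the flags stand in for signals the board would have to provide.
interface ReviewableTask {
  id: string;
  blockedBy: string[];
  touchesSharedFiles: boolean;
  ownerReportedUncertainty: boolean;
}

interface VerifyPolicy {
  minDependents: number; // assumed threshold for "many other tasks depend on it"
}

// Count how many tasks list each task id in their `blockedBy`.
function countDirectDependents(tasks: ReviewableTask[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const task of tasks) {
    for (const upstreamId of task.blockedBy) {
      counts.set(upstreamId, (counts.get(upstreamId) ?? 0) + 1);
    }
  }
  return counts;
}

// Return ids of tasks that would get a verification step under this policy.
function selectTasksToVerify(tasks: ReviewableTask[], policy: VerifyPolicy): string[] {
  const dependents = countDirectDependents(tasks);
  return tasks
    .filter(
      (task) =>
        (dependents.get(task.id) ?? 0) >= policy.minDependents ||
        task.touchesSharedFiles ||
        task.ownerReportedUncertainty
    )
    .map((task) => task.id);
}
```

Small isolated tasks fall through all three checks and are skipped, which matches the last bullet above unless a separate policy forces review.
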
### 4. Straggler release as first-class behavior
LATTE explicitly models stalled workers and `Release`. We already have task-stall monitoring, but the next step is to make release/reassign a structured board action, not only a message nudge.
Useful behavior:
- detect a task with weak or stale progress evidence
- notify or nudge the current owner first
- if still stalled, clear owner or reassign with context
- preserve evidence and avoid duplicate nudges
- never auto-start new runtime lanes as a side effect
This must stay compatible with existing OpenCode delivery watchdog and stall-monitor semantics.
🎯 Product value: 8/10
🛡️ Reliability: 7/10
🧠 Complexity: 6/10
Expected change size: about 600-1200 LOC.
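
The escalation order described above (nudge first, release only if the stall persists, no duplicate nudges) can be sketched as a pure decision function. Everything here is hypothetical: `StallState`, `StallPolicy`, the timing fields, and `decideStallAction` are illustrative names and not the existing stall monitor's API. The function only returns a decision and never starts a runtime lane.

```typescript
type StallAction = { kind: 'none' } | { kind: 'nudge' } | { kind: 'release' };

// Hypothetical per-task evidence; timestamps are epoch milliseconds.
interface StallState {
  owner: string | null;
  lastProgressAtMs: number;
  lastNudgeAtMs: number | null;
}

interface StallPolicy {
  stallAfterMs: number; // no progress for this long counts as stalled
  releaseAfterNudgeMs: number; // grace period after a nudge before releasing
}

function decideStallAction(state: StallState, nowMs: number, policy: StallPolicy): StallAction {
  if (state.owner === null) return { kind: 'none' }; // nothing to release
  if (nowMs - state.lastProgressAtMs < policy.stallAfterMs) return { kind: 'none' };
  // No nudge yet for this stall (or progress happened after the last nudge): nudge first.
  if (state.lastNudgeAtMs === null || state.lastNudgeAtMs < state.lastProgressAtMs) {
    return { kind: 'nudge' };
  }
  // Already nudged: wait out the grace period instead of nudging again.
  if (nowMs - state.lastNudgeAtMs < policy.releaseAfterNudgeMs) return { kind: 'none' };
  return { kind: 'release' };
}
```

A caller would apply the returned action through the board (clear owner or reassign with context) and record the evidence, so the decision itself stays side-effect free.
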
### 5. Coordination metrics as a product surface
LATTE is especially useful because it externalizes coordination and measures failures:
- idle rounds
- straggler tail latency
- inter-agent messages
- file conflicts or concurrent writes
- redundant output
- wasted tokens
- task graph growth and bottlenecks
For Agent Teams, this could become a "team efficiency" diagnostic panel and a safer prerequisite before changing scheduling behavior.
🎯 Product value: 8/10
🛡️ Reliability: 9/10
🧠 Complexity: 4/10
Expected change size: about 350-800 LOC.
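
Two of the metrics listed above, idle rounds and concurrent writes, are simple enough to sketch as an aggregation over per-round records. The `RoundRecord` shape and `summarizeCoordination` are assumptions for illustration; in particular, the notion of a discrete "round" is borrowed from the paper's framing and may not map directly onto our runtime events.

```typescript
// Hypothetical per-round log record.
interface RoundRecord {
  activeMembers: string[];
  writes: Array<{ member: string; file: string }>;
}

interface CoordinationSummary {
  idleMemberRounds: number; // sum over rounds of members that did nothing
  conflictingFileWrites: number; // (round, file) pairs written by more than one member
}

function summarizeCoordination(rounds: RoundRecord[], teamSize: number): CoordinationSummary {
  let idleMemberRounds = 0;
  let conflictingFileWrites = 0;
  for (const round of rounds) {
    const active = new Set(round.activeMembers);
    idleMemberRounds += Math.max(0, teamSize - active.size);
    // Group writers by file within the round to spot concurrent writes.
    const writersByFile = new Map<string, Set<string>>();
    for (const write of round.writes) {
      const writers = writersByFile.get(write.file) ?? new Set<string>();
      writers.add(write.member);
      writersByFile.set(write.file, writers);
    }
    writersByFile.forEach((writers) => {
      if (writers.size > 1) conflictingFileWrites += 1;
    });
  }
  return { idleMemberRounds, conflictingFileWrites };
}
```

Because the summary is derived from logs only, it can be computed before any scheduling behavior changes.
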
## AgentConductor Ideas Worth Keeping
AgentConductor is less directly implementable because it depends on an RL/SFT-trained orchestrator and competition-code benchmarks. Still, one product idea is valuable:
**Task difficulty should control graph density.**
Possible lightweight version for Agent Teams:
- easy task - solo or small graph, minimal messaging, no extra verification by default
- medium task - split by independent deliverables, use dependencies only where real ordering exists
- hard task - more explicit roles, denser review/checkpoints, stronger integration pass
- failed execution feedback - adapt the graph instead of repeating the same topology
Do not adopt the paper's full GRPO/SFT training path for now. It is too heavy for the app and not necessary to get product value.
🎯 Product value: 7/10
🛡️ Reliability: 6/10
🧠 Complexity: 7/10
Expected change size for a heuristic MVP: about 600-1300 LOC.
## Objectivity And Risk Notes
The LATTE paper is directionally credible but should not be treated as production proof.
Strong points:
- the core claim matches practical distributed-systems intuition
- the paper compares against several coordination styles, not only one weak baseline
- it evaluates multiple collaborative task types
- it emphasizes metrics we can independently measure
- the mechanism is simple enough to port incrementally
Limitations:
- it is an arXiv preprint, not final production validation
- benchmark tasks are controlled research tasks, not our full Electron plus runtime matrix
- baseline implementations may not match best possible production implementations
- reported improvements should be validated against our own teams, logs, and providers
Practical conclusion:
⚠️ Treat LATTE as a strong design signal, not a dependency or spec. Implement the ideas gradually behind our existing task board, lead/member briefings, and runtime-specific guardrails.
## Recommended Internal Path
1. Add coordination metrics first.
2. Derive a graph frontier from current task state.
3. Make lead and member briefings use the frontier as the operational queue.
4. Add structured release/reassign for stalled work.
5. Add selective verification for high-risk graph nodes.
6. Only after that, consider difficulty-aware graph density hints.
This ordering gives us evidence before automation. It also keeps the rollout compatible with existing `blockedBy`, review flow, task-stall monitor, OpenCode delivery watchdog, and context tracking.

@@ -278,7 +278,11 @@ import { isAgentTeamsToolUse } from './agentTeamsToolNames';
 import { atomicWriteAsync } from './atomicWrite';
 import { peekAutoResumeService } from './AutoResumeService';
 import { ClaudeBinaryResolver } from './ClaudeBinaryResolver';
-import { getConfiguredCliCommandLabel } from './cliFlavor';
+import {
+  getCliFlavorUiOptions,
+  getConfiguredCliCommandLabel,
+  getConfiguredCliFlavor,
+} from './cliFlavor';
 import { withFileLock } from './fileLock';
 import {
   type ClassifiedMainProcessIdle,
@@ -993,6 +997,41 @@ function getPreflightTimeoutMs(providerId: TeamProviderId | undefined): number {
   return getProviderModelProbeTimeoutMs(providerId);
 }
+function getProviderRuntimeFailureLabel(providerId: TeamProviderId): string {
+  switch (providerId) {
+    case 'anthropic':
+      return 'Claude CLI';
+    case 'codex':
+      return 'Codex runtime';
+    case 'gemini':
+      return 'Gemini runtime';
+    case 'opencode':
+      return 'OpenCode runtime';
+  }
+}
+function getRunRuntimeFailureLabel(run: ProvisioningRun): string {
+  const providerIds = new Set<TeamProviderId>();
+  const addProvider = (providerId: TeamProviderId | undefined): void => {
+    if (providerId) {
+      providerIds.add(providerId);
+    }
+  };
+  addProvider(normalizeOptionalTeamProviderId(run.request.providerId));
+  addProvider(inferTeamProviderIdFromModel(run.request.model));
+  for (const member of run.request.members) {
+    addProvider(normalizeOptionalTeamProviderId(member.providerId));
+    addProvider(inferTeamProviderIdFromModel(member.model));
+  }
+  if (providerIds.size === 1) {
+    return getProviderRuntimeFailureLabel([...providerIds][0]!);
+  }
+  return getCliFlavorUiOptions(getConfiguredCliFlavor()).displayName;
+}
 function buildProviderCliCommandArgs(providerArgs: string[], args: string[]): string[] {
   return mergeJsonSettingsArgs([...providerArgs, ...args]);
 }
@@ -32379,7 +32418,8 @@ export class TeamProvisioningService {
       }
       const errorText = buildCliExitError(code, run.stdoutBuffer, run.stderrBuffer);
-      const progress = updateProgress(run, 'failed', 'Claude CLI exited with an error', {
+      const runtimeFailureLabel = getRunRuntimeFailureLabel(run);
+      const progress = updateProgress(run, 'failed', `${runtimeFailureLabel} exited with an error`, {
         error: errorText,
         cliLogsTail: extractCliLogsFromRun(run),
       });

@@ -17,8 +17,13 @@ import {
 import type {
   OpenCodeBridgeCommandLeaseStore,
+  OpenCodeBridgeCommandLease,
   OpenCodeBridgeCommandLedger,
 } from './OpenCodeBridgeCommandLedgerStore';
+import { OpenCodeBridgeCommandLeaseError } from './OpenCodeBridgeCommandLedgerStore';
+const DEFAULT_COMMAND_LEASE_ACQUIRE_TIMEOUT_MS = 10_000;
+const DEFAULT_COMMAND_LEASE_ACQUIRE_RETRY_DELAY_MS = 100;
 export interface OpenCodeBridgeCommandExecutor {
   execute<TBody, TData>(
@@ -63,6 +68,8 @@ export interface OpenCodeStateChangingBridgeCommandServiceOptions {
   requestIdFactory?: () => string;
   diagnosticIdFactory?: () => string;
   clock?: () => Date;
+  leaseAcquireTimeoutMs?: number;
+  leaseAcquireRetryDelayMs?: number;
 }
 export class OpenCodeStateChangingBridgeCommandService {
@@ -76,6 +83,8 @@ export class OpenCodeStateChangingBridgeCommandService {
   private readonly requestIdFactory: () => string;
   private readonly diagnosticIdFactory: () => string;
   private readonly clock: () => Date;
+  private readonly leaseAcquireTimeoutMs: number;
+  private readonly leaseAcquireRetryDelayMs: number;
   constructor(options: OpenCodeStateChangingBridgeCommandServiceOptions) {
     this.expectedClientIdentity = options.expectedClientIdentity;
@@ -89,6 +98,10 @@ export class OpenCodeStateChangingBridgeCommandService {
     this.diagnosticIdFactory =
       options.diagnosticIdFactory ?? (() => `opencode-bridge-diagnostic-${randomUUID()}`);
     this.clock = options.clock ?? (() => new Date());
+    this.leaseAcquireTimeoutMs =
+      options.leaseAcquireTimeoutMs ?? DEFAULT_COMMAND_LEASE_ACQUIRE_TIMEOUT_MS;
+    this.leaseAcquireRetryDelayMs =
+      options.leaseAcquireRetryDelayMs ?? DEFAULT_COMMAND_LEASE_ACQUIRE_RETRY_DELAY_MS;
   }
   async execute<TBody, TData>(input: {
@@ -136,7 +149,7 @@ export class OpenCodeStateChangingBridgeCommandService {
       body: input.body,
     });
     const commandRequestId = this.requestIdFactory();
-    const lease = await this.leaseStore.acquire({
+    const lease = await this.acquireLease({
       teamName: input.teamName,
       laneId: normalizedLaneId,
       runId: input.runId,
@@ -243,6 +256,37 @@ export class OpenCodeStateChangingBridgeCommandService {
     }
   }
+  private async acquireLease(input: {
+    teamName: string;
+    laneId: string | null;
+    runId: string | null;
+    command: OpenCodeBridgeCommandName;
+    ttlMs: number;
+  }): Promise<OpenCodeBridgeCommandLease> {
+    const deadlineMs = Date.now() + Math.max(0, this.leaseAcquireTimeoutMs);
+    let lastError: unknown = null;
+    do {
+      try {
+        return await this.leaseStore.acquire(input);
+      } catch (error) {
+        if (
+          !(error instanceof OpenCodeBridgeCommandLeaseError) ||
+          !isActiveOpenCodeBridgeCommandLeaseError(error)
+        ) {
+          throw error;
+        }
+        lastError = error;
+        if (Date.now() >= deadlineMs) {
+          throw error;
+        }
+        await sleep(Math.max(1, this.leaseAcquireRetryDelayMs));
+      }
+    } while (Date.now() < deadlineMs);
+    throw lastError instanceof Error ? lastError : new Error('OpenCode bridge lease unavailable');
+  }
   private async appendUnknownOutcomeDiagnostic(input: {
     result: OpenCodeBridgeResult<unknown>;
     teamName: string;
@@ -316,3 +360,11 @@ function requiresOpenCodeDeliveryAcceptanceContract(
 function stringifyError(error: unknown): string {
   return error instanceof Error ? error.message : String(error);
 }
+function isActiveOpenCodeBridgeCommandLeaseError(error: OpenCodeBridgeCommandLeaseError): boolean {
+  return error.message.startsWith('OpenCode bridge command lease already active:');
+}
+function sleep(delayMs: number): Promise<void> {
+  return new Promise((resolve) => setTimeout(resolve, delayMs));
+}

@@ -100,6 +100,12 @@ const SECRET_FLAG_PATTERN =
 const BEARER_TOKEN_PATTERN = /\bBearer\s+\S+/gi;
 const SECRET_KEY_PATTERN = /\bsk-[A-Za-z0-9_-]{16,}\b/g;
+function resolveOpenCodeRuntimeSettlementMode(
+  input: Pick<OpenCodeTeamRuntimeMessageInput, 'messageKind'>
+): OpenCodeSendMessageCommandBody['settlementMode'] {
+  return input.messageKind === 'member_work_sync_nudge' ? 'observed' : 'acceptance';
+}
 export class OpenCodeTeamRuntimeAdapter implements TeamLaunchRuntimeAdapter {
   readonly providerId = 'opencode' as const;
   private readonly lastProjectPathByTeamName = new Map<string, string>();
@@ -334,7 +340,7 @@ export class OpenCodeTeamRuntimeAdapter implements TeamLaunchRuntimeAdapter {
         text: buildOpenCodeRuntimeMessageText(input),
         messageId: input.messageId,
         ...(input.deliveryAttemptId ? { deliveryAttemptId: input.deliveryAttemptId } : {}),
-        settlementMode: 'acceptance',
+        settlementMode: resolveOpenCodeRuntimeSettlementMode(input),
         fileParts: input.fileParts,
         actionMode: input.actionMode,
         messageKind: input.messageKind,

@@ -91,6 +91,7 @@ import { type MemberActivityFilter, type MemberDetailTab } from './members/membe
 import type { AddMemberEntry } from './dialogs/AddMemberDialog';
 import type { TeamLaunchDialogMode } from './dialogs/LaunchTeamDialog';
+import type { TeamColorSet } from '@renderer/constants/teamColors';
 import type { TeamMessagesPanelMode } from '@renderer/types/teamMessagesPanelMode';
 import type { ComponentProps, CSSProperties, RefObject } from 'react';
@@ -449,12 +450,16 @@ const TeamLoadingSectionHeader = ({
 type TeamContentLoadingSkeletonProps = Readonly<{
   teamName: string;
+  headerColorSet: TeamColorSet;
+  isLight: boolean;
   contentRef: RefObject<HTMLDivElement | null>;
   provisioningBannerRef: RefObject<HTMLDivElement | null>;
 }>;
 const TeamContentLoadingSkeleton = ({
   teamName,
+  headerColorSet,
+  isLight,
   contentRef,
   provisioningBannerRef,
 }: TeamContentLoadingSkeletonProps): React.JSX.Element => (
@@ -465,28 +470,32 @@ const TeamContentLoadingSkeleton = ({
     role="status"
     aria-label="Loading team"
   >
-    <div className="relative -mx-4 -mt-4 mb-3 overflow-hidden border-b border-[var(--color-border)] bg-purple-950/20 px-4 py-3">
-      <div className="flex items-start justify-between gap-2">
+    <div className="relative -mx-4 -mt-4 mb-3 overflow-hidden border-b border-[var(--color-border)] px-4 py-3">
+      <div
+        className="pointer-events-none absolute inset-0 z-0"
+        style={{ backgroundColor: getThemedBadge(headerColorSet, isLight) }}
+      />
+      <div className="relative z-10 flex items-start justify-between gap-2">
         <div className="min-w-0 flex-1">
-          <div className="flex items-center gap-2">
+          <div className="flex h-6 items-center gap-2">
             <SkeletonPill className="h-5 w-44" />
-            <SkeletonPill className="h-6 w-20 bg-emerald-500/15" />
-          </div>
-          <SkeletonPill className="mt-3 h-4 w-72" />
-          <div className="mt-3 flex flex-wrap items-center gap-3">
-            <SkeletonPill className="h-4 w-28" />
-            <SkeletonPill className="h-6 w-24 rounded-md" />
-            <SkeletonPill className="h-4 w-16" />
+            <SkeletonPill className="h-5 w-20 bg-emerald-500/15" />
           </div>
         </div>
-        <div className="flex shrink-0 items-center gap-2">
+        <div className="flex shrink-0 items-center gap-1.5">
           <SkeletonPill className="h-7 w-16" />
           <SkeletonPill className="size-7 rounded" />
           <SkeletonPill className="size-7 rounded" />
         </div>
       </div>
-      <div className="mt-4 flex justify-end">
-        <SkeletonPill className="h-11 w-40 rounded-full border border-cyan-300/25 bg-cyan-500/10" />
+      <SkeletonPill className="relative z-10 mt-0.5 h-3 w-72 max-w-full" />
+      <div className="relative z-10 mt-1 flex items-start justify-between gap-3">
+        <div className="flex min-w-0 flex-1 flex-wrap items-center gap-x-3 gap-y-0.5">
+          <SkeletonPill className="h-3 w-28" />
+          <SkeletonPill className="h-5 w-24 rounded-md" />
+          <SkeletonPill className="h-3 w-16" />
+        </div>
+        <SkeletonPill className="-mt-2 h-8 w-36 shrink-0 rounded-full border border-cyan-300/25 bg-cyan-500/10" />
       </div>
     </div>
@@ -592,6 +601,8 @@ type TeamLoadingSkeletonProps = Readonly<{
   isActive: boolean | undefined;
   isFocused: boolean | undefined;
   messagesPanelMode: TeamMessagesPanelMode;
+  headerColorSet: TeamColorSet;
+  isLight: boolean;
   contentRef: RefObject<HTMLDivElement | null>;
   provisioningBannerRef: RefObject<HTMLDivElement | null>;
 }>;
@@ -601,6 +612,8 @@ const TeamLoadingSkeleton = ({
   isActive,
   isFocused,
   messagesPanelMode,
+  headerColorSet,
+  isLight,
   contentRef,
   provisioningBannerRef,
 }: TeamLoadingSkeletonProps): React.JSX.Element => (
@@ -618,6 +631,8 @@ const TeamLoadingSkeleton = ({
       <div className="relative min-h-0 min-w-0 flex-1">
         <TeamContentLoadingSkeleton
           teamName={teamName}
+          headerColorSet={headerColorSet}
+          isLight={isLight}
           contentRef={contentRef}
           provisioningBannerRef={provisioningBannerRef}
         />
@@ -1635,6 +1650,8 @@ export const TeamDetailView = memo(function TeamDetailView({
     pendingReviewRequest,
     setPendingReviewRequest,
     summaryKnownTeammateCount,
+    teamSummaryColor,
+    teamSummaryDisplayName,
   } = useStore(
     useShallow((s) => ({
       projects: s.projects,
@@ -1672,6 +1689,8 @@ export const TeamDetailView = memo(function TeamDetailView({
       summaryKnownTeammateCount: teamName
         ? getSummaryKnownTeammateCount(s.teamByName[teamName])
         : 0,
+      teamSummaryColor: teamName ? s.teamByName[teamName]?.color : undefined,
+      teamSummaryDisplayName: teamName ? s.teamByName[teamName]?.displayName : undefined,
       loading: s.selectedTeamName === teamName ? s.selectedTeamLoading : false,
       error: s.selectedTeamName === teamName ? s.selectedTeamError : null,
       refreshTeamData: s.refreshTeamData,
@@ -1701,6 +1720,13 @@ export const TeamDetailView = memo(function TeamDetailView({
   const tabId = useTabIdOptional();
   const isThisTabActive = isActive;
   const wasInteractiveRef = useRef(false);
+  const loadingHeaderColorSet = useMemo(
+    () =>
+      teamSummaryColor
+        ? getTeamColorSet(teamSummaryColor)
+        : nameColorSet(teamSummaryDisplayName || teamName),
+    [teamName, teamSummaryColor, teamSummaryDisplayName]
+  );
   // Messages panel resize
   const { isResizing: isMessagesPanelResizing, handleProps: messagesPanelHandleProps } =
@@ -2509,6 +2535,8 @@ export const TeamDetailView = memo(function TeamDetailView({
         isActive={isThisTabActive}
         isFocused={isPaneFocused}
         messagesPanelMode={messagesPanelMode}
+        headerColorSet={loadingHeaderColorSet}
+        isLight={isLight}
         contentRef={contentRef}
         provisioningBannerRef={provisioningBannerRef}
       />

@@ -1,6 +1,5 @@
 /* eslint-disable react/jsx-props-no-spreading -- Standard shadcn pattern: forward remaining props to underlying elements */
 import * as React from 'react';
-import { createPortal } from 'react-dom';
 import * as DialogPrimitive from '@radix-ui/react-dialog';
 import { cn } from '@renderer/lib/utils';
@@ -9,20 +8,7 @@ import { X } from 'lucide-react';
 const Dialog = DialogPrimitive.Root;
 const DialogTrigger = DialogPrimitive.Trigger;
 const DialogClose = DialogPrimitive.Close;
-type DialogPortalProps = React.ComponentPropsWithoutRef<typeof DialogPrimitive.Portal>;
-const DialogPortal = ({ children, container }: DialogPortalProps): React.ReactPortal | null => {
-  const [mounted, setMounted] = React.useState(false);
-  React.useLayoutEffect(() => {
-    setMounted(true);
-  }, []);
-  const portalContainer = container ?? (mounted ? globalThis.document?.body : null);
-  return portalContainer ? createPortal(<>{children}</>, portalContainer) : null;
-};
-DialogPortal.displayName = DialogPrimitive.Portal.displayName;
+const DialogPortal = DialogPrimitive.Portal;
 const DialogOverlay = React.forwardRef<
   React.ComponentRef<typeof DialogPrimitive.Overlay>,

@@ -1,9 +1,9 @@
 {
-  "generatedAt": "2026-05-09T23:16:07.760Z",
-  "runsPerModel": 3,
+  "generatedAt": "2026-05-14T06:34:47.601Z",
+  "runsPerModel": 1,
   "qualification": {
-    "minimumAverageScore": 90,
-    "minimumSuccessfulRuns": 3,
+    "minimumAverageScore": 70,
+    "minimumSuccessfulRuns": 1,
     "minimumConsistencyScore": 85,
     "requireNoHardFailures": true
   },
@@ -11,93 +11,93 @@
     {
       "model": "opencode/big-pickle",
       "verdict": "recommended",
-      "confidence": "high",
+      "confidence": "low",
       "qualified": true,
       "readinessScore": 100,
       "averageScore": 100,
       "consistencyScore": 100,
       "behavioralAverageScore": 100,
       "minScore": 100,
-      "successfulRuns": 3,
-      "countedRuns": 3,
+      "successfulRuns": 1,
+      "countedRuns": 1,
       "hardFailures": 0,
       "providerInfraFailures": 0,
       "runtimeTransportFailures": 0,
       "modelBehaviorFailures": 0,
       "harnessFailures": 0,
-      "p50DurationMs": 112355,
-      "p95DurationMs": 116891,
+      "p50DurationMs": 132968,
+      "p95DurationMs": 132968,
       "stagePassRates": {
         "launchBootstrap": {
-          "passed": 3,
-          "total": 3,
+          "passed": 1,
+          "total": 1,
           "rate": 100
         },
         "directReply": {
-          "passed": 3,
-          "total": 3,
+          "passed": 1,
+          "total": 1,
           "rate": 100
         },
         "peerRelayAB": {
-          "passed": 3,
-          "total": 3,
+          "passed": 1,
+          "total": 1,
           "rate": 100
         },
         "peerRelayBC": {
-          "passed": 3,
-          "total": 3,
+          "passed": 1,
+          "total": 1,
           "rate": 100
         },
         "concurrentReplies": {
-          "passed": 3,
-          "total": 3,
+          "passed": 1,
+          "total": 1,
           "rate": 100
         },
         "taskRefs": {
-          "passed": 3,
-          "total": 3,
+          "passed": 1,
+          "total": 1,
           "rate": 100
         },
         "cleanTranscript": {
-          "passed": 3,
-          "total": 3,
+          "passed": 1,
+          "total": 1,
           "rate": 100
         },
         "noDuplicateTokens": {
-          "passed": 3,
-          "total": 3,
+          "passed": 1,
+          "total": 1,
           "rate": 100
         },
         "latencyStable": {
-          "passed": 3,
-          "total": 3,
+          "passed": 1,
+          "total": 1,
           "rate": 100
         }
       },
       "taskRefPassRates": {
         "directReply": {
-          "passed": 3,
-          "total": 3,
+          "passed": 1,
+          "total": 1,
           "rate": 100
         },
         "peerRelayAB": {
-          "passed": 3,
-          "total": 3,
+          "passed": 1,
+          "total": 1,
           "rate": 100
         },
         "peerRelayBC": {
-          "passed": 3,
-          "total": 3,
+          "passed": 1,
+          "total": 1,
           "rate": 100
         },
         "concurrentBob": {
-          "passed": 3,
-          "total": 3,
+          "passed": 1,
+          "total": 1,
           "rate": 100
         },
         "concurrentTom": {
-          "passed": 3,
-          "total": 3,
+          "passed": 1,
+          "total": 1,
           "rate": 100
         }
       },
@@ -112,8 +112,8 @@
           "failedRuns": 0,
           "weightedLoss": 0,
           "passRate": {
-            "passed": 3,
-            "total": 3,
+            "passed": 1,
+            "total": 1,
             "rate": 100
           }
         },
@@ -122,8 +122,8 @@
           "failedRuns": 0,
           "weightedLoss": 0,
           "passRate": {
-            "passed": 3,
-            "total": 3,
+            "passed": 1,
+            "total": 1,
             "rate": 100
           }
         },
@@ -132,8 +132,8 @@
           "failedRuns": 0,
           "weightedLoss": 0,
           "passRate": {
-            "passed": 3,
-            "total": 3,
+            "passed": 1,
+            "total": 1,
             "rate": 100
           }
         },
@@ -142,8 +142,8 @@
           "failedRuns": 0,
           "weightedLoss": 0,
           "passRate": {
-            "passed": 3,
-            "total": 3,
+            "passed": 1,
+            "total": 1,
             "rate": 100
           }
         },
@@ -152,8 +152,8 @@
           "failedRuns": 0,
           "weightedLoss": 0,
           "passRate": {
-            "passed": 3,
-            "total": 3,
+            "passed": 1,
+            "total": 1,
             "rate": 100
           }
         },
@@ -162,8 +162,8 @@
           "failedRuns": 0,
           "weightedLoss": 0,
           "passRate": {
-            "passed": 3,
-            "total": 3,
+            "passed": 1,
+            "total": 1,
             "rate": 100
           }
         },
@@ -172,8 +172,8 @@
           "failedRuns": 0,
           "weightedLoss": 0,
           "passRate": {
-            "passed": 3,
-            "total": 3,
+            "passed": 1,
+            "total": 1,
             "rate": 100
           }
         },
@@ -182,8 +182,8 @@
           "failedRuns": 0,
           "weightedLoss": 0,
           "passRate": {
-            "passed": 3,
-            "total": 3,
+            "passed": 1,
+            "total": 1,
             "rate": 100
           }
         },
@@ -192,14 +192,14 @@
           "failedRuns": 0,
           "weightedLoss": 0,
           "passRate": {
-            "passed": 3,
-            "total": 3,
+            "passed": 1,
+            "total": 1,
             "rate": 100
           }
         }
       ],
       "scoreStability": {
-        "sampleSize": 3,
+        "sampleSize": 1,
         "minScore": 100,
         "maxScore": 100,
         "spread": 0,
@@ -217,16 +217,16 @@
           "outcome": "passed",
           "failureCategory": "none",
           "primaryFailure": null,
-          "durationMs": 112344,
+          "durationMs": 132968,
           "hardFailure": false,
           "stageDurationsMs": {
-            "setup": 183,
-            "launchBootstrap": 19933,
-            "materializeTasks": 35,
-            "directReply": 15430,
-            "peerRelayAB": 25001,
-            "peerRelayBC": 28154,
-            "concurrentReplies": 15551,
+            "setup": 2770,
+            "launchBootstrap": 49092,
+            "materializeTasks": 85,
+            "directReply": 13760,
+            "peerRelayAB": 22730,
+            "peerRelayBC": 21484,
+            "concurrentReplies": 14023,
             "hygiene": 1
           },
           "stageFailures": {},
@@ -253,455 +253,7 @@
             "latencyStable": true
           },
           "diagnostics": [
-            "runId=d9d27eb0-2798-4980-a0fa-f082a6edd705"
-          ]
-        },
-        {
-          "runIndex": 2,
-          "passed": true,
-          "score": 100,
-          "countedForRecommendation": true,
-          "outcome": "passed",
-          "failureCategory": "none",
-          "primaryFailure": null,
-          "durationMs": 112355,
-          "hardFailure": false,
-          "stageDurationsMs": {
-            "setup": 11,
-            "launchBootstrap": 18682,
-            "materializeTasks": 36,
-            "directReply": 15126,
-            "peerRelayAB": 24835,
-            "peerRelayBC": 28580,
-            "concurrentReplies": 17164,
-            "hygiene": 1
-          },
-          "stageFailures": {},
-          "taskRefChecks": {
-            "directReply": true,
-            "peerRelayAB": true,
-            "peerRelayBC": true,
-            "concurrentBob": true,
-            "concurrentTom": true
-          },
-          "protocolViolations": {
-            "badMessages": 0,
-            "duplicateOrMissingTokens": []
-          },
-          "stages": {
-            "launchBootstrap": true,
-            "directReply": true,
-            "peerRelayAB": true,
-            "peerRelayBC": true,
-            "concurrentReplies": true,
-            "taskRefs": true,
-            "cleanTranscript": true,
-            "noDuplicateTokens": true,
-            "latencyStable": true
-          },
-          "diagnostics": [
-            "runId=97364154-e06d-460c-94ae-65b73cb1b6f9"
-          ]
-        },
-        {
-          "runIndex": 3,
-          "passed": true,
-          "score": 100,
-          "countedForRecommendation": true,
-          "outcome": "passed",
-          "failureCategory": "none",
-          "primaryFailure": null,
-          "durationMs": 116891,
-          "hardFailure": false,
-          "stageDurationsMs": {
-            "setup": 8,
-            "launchBootstrap": 18926,
-            "materializeTasks": 31,
-            "directReply": 17061,
-            "peerRelayAB": 27842,
-            "peerRelayBC": 27262,
-            "concurrentReplies": 15437,
-            "hygiene": 1
-          },
-          "stageFailures": {},
-          "taskRefChecks": {
-            "directReply": true,
-            "peerRelayAB": true,
-            "peerRelayBC": true,
-            "concurrentBob": true,
-            "concurrentTom": true
-          },
-          "protocolViolations": {
-            "badMessages": 0,
-            "duplicateOrMissingTokens": []
-          },
-          "stages": {
-            "launchBootstrap": true,
-            "directReply": true,
-            "peerRelayAB": true,
-            "peerRelayBC": true,
-            "concurrentReplies": true,
-            "taskRefs": true,
-            "cleanTranscript": true,
-            "noDuplicateTokens": true,
-            "latencyStable": true
-          },
-          "diagnostics": [
-            "runId=7bdd4b2e-dbd6-4474-a8a0-9418df433671"
-          ]
-        }
-      ]
-    },
-    {
-      "model": "opencode/minimax-m2.5-free",
-      "verdict": "strong-candidate",
-      "confidence": "high",
-      "qualified": false,
-      "readinessScore": 88.6,
-      "averageScore": 98.3,
-      "consistencyScore": 93.1,
-      "behavioralAverageScore": 98.3,
-      "minScore": 95,
-      "successfulRuns": 2,
-      "countedRuns": 3,
-      "hardFailures": 1,
-      "providerInfraFailures": 0,
-      "runtimeTransportFailures": 0,
-      "modelBehaviorFailures": 1,
-      "harnessFailures": 0,
-      "p50DurationMs": 108862,
-      "p95DurationMs": 118757,
-      "stagePassRates": {
-        "launchBootstrap": {
-          "passed": 3,
-          "total": 3,
-          "rate": 100
-        },
-        "directReply": {
-          "passed": 3,
-          "total": 3,
-          "rate": 100
-        },
-        "peerRelayAB": {
-          "passed": 3,
-          "total": 3,
-          "rate": 100
-        },
-        "peerRelayBC": {
-          "passed": 3,
-          "total": 3,
-          "rate": 100
-        },
-        "concurrentReplies": {
-          "passed": 3,
-          "total": 3,
-          "rate": 100
-        },
-        "taskRefs": {
-          "passed": 3,
-          "total": 3,
-          "rate": 100
-        },
-        "cleanTranscript": {
-          "passed": 3,
-          "total": 3,
-          "rate": 100
-        },
-        "noDuplicateTokens": {
-          "passed": 2,
-          "total": 3,
-          "rate": 66.7
-        },
-        "latencyStable": {
-          "passed": 3,
-          "total": 3,
-          "rate": 100
-        }
-      },
-      "taskRefPassRates": {
-        "directReply": {
-          "passed": 3,
-          "total": 3,
-          "rate": 100
-        },
-        "peerRelayAB": {
-          "passed": 3,
-          "total": 3,
-          "rate": 100
-        },
-        "peerRelayBC": {
-          "passed": 3,
-          "total": 3,
-          "rate": 100
-        },
-        "concurrentBob": {
-          "passed": 3,
-          "total": 3,
-          "rate": 100
-        },
-        "concurrentTom": {
-          "passed": 3,
-          "total": 3,
-          "rate": 100
-        }
-      },
-      "protocolViolationTotals": {
-        "badMessages": 0,
-        "duplicateOrMissingTokens": 2,
-        "affectedRuns": 1
-      },
-      "stageFailureImpact": [
-        {
-          "stage": "noDuplicateTokens",
-          "failedRuns": 1,
-          "weightedLoss": 5,
-          "passRate": {
-            "passed": 2,
-            "total": 3,
-            "rate": 66.7
-          }
-        },
-        {
-          "stage": "cleanTranscript",
-          "failedRuns": 0,
-          "weightedLoss": 0,
-          "passRate": {
-            "passed": 3,
-            "total": 3,
-            "rate": 100
-          }
-        },
-        {
-          "stage": "concurrentReplies",
-          "failedRuns": 0,
-          "weightedLoss": 0,
-          "passRate": {
-            "passed": 3,
-            "total": 3,
-            "rate": 100
-          }
-        },
-        {
-          "stage": "directReply",
-          "failedRuns": 0,
-          "weightedLoss": 0,
-          "passRate": {
-            "passed": 3,
-            "total": 3,
-            "rate": 100
-          }
-        },
-        {
-          "stage": "latencyStable",
-          "failedRuns": 0,
-          "weightedLoss": 0,
-          "passRate": {
-            "passed": 3,
-            "total": 3,
-            "rate": 100
-          }
-        },
-        {
-          "stage": "launchBootstrap",
-          "failedRuns": 0,
-          "weightedLoss": 0,
-          "passRate": {
-            "passed": 3,
-            "total": 3,
-            "rate": 100
-          }
-        },
-        {
-          "stage": "peerRelayAB",
-          "failedRuns": 0,
-          "weightedLoss": 0,
-          "passRate": {
-            "passed": 3,
-            "total": 3,
-            "rate": 100
-          }
-        },
-        {
-          "stage": "peerRelayBC",
-          "failedRuns": 0,
-          "weightedLoss": 0,
-          "passRate": {
-            "passed": 3,
-            "total": 3,
-            "rate": 100
-          }
-        },
-        {
-          "stage": "taskRefs",
-          "failedRuns": 0,
-          "weightedLoss": 0,
-          "passRate": {
-            "passed": 3,
-            "total": 3,
-            "rate": 100
-          }
-        }
-      ],
-      "scoreStability": {
-        "sampleSize": 3,
-        "minScore": 95,
-        "maxScore": 100,
-        "spread": 5,
-        "standardDeviation": 2.4,
-        "consistencyScore": 93.1
-      },
-      "dominantFailureCategory": "model-behavior",
-      "recommendationBlockers": [
-        "successful runs 2 < 3",
-        "hard failures 1",
-        "model-behavior failures 1",
-        "highest weighted stage loss noDuplicateTokens=5",
-        "protocol violations in 1 runs"
-      ],
-      "runs": [
-        {
-          "runIndex": 1,
-          "passed": true,
-          "score": 100,
-          "countedForRecommendation": true,
-          "outcome": "passed",
-          "failureCategory": "none",
-          "primaryFailure": null,
-          "durationMs": 91530,
-          "hardFailure": false,
-          "stageDurationsMs": {
-            "setup": 10,
-            "launchBootstrap": 18716,
-            "materializeTasks": 31,
-            "directReply": 11557,
-            "peerRelayAB": 16323,
-            "peerRelayBC": 27370,
-            "concurrentReplies": 9606,
-            "hygiene": 1
-          },
-          "stageFailures": {},
-          "taskRefChecks": {
-            "directReply": true,
-            "peerRelayAB": true,
-            "peerRelayBC": true,
-            "concurrentBob": true,
-            "concurrentTom": true
-          },
-          "protocolViolations": {
-            "badMessages": 0,
-            "duplicateOrMissingTokens": []
-          },
-          "stages": {
-            "launchBootstrap": true,
-            "directReply": true,
-            "peerRelayAB": true,
-            "peerRelayBC": true,
-            "concurrentReplies": true,
-            "taskRefs": true,
-            "cleanTranscript": true,
-            "noDuplicateTokens": true,
-            "latencyStable": true
-          },
-          "diagnostics": [
-            "runId=23ae85d2-e79d-41c9-93a6-e843acea6d9e"
-          ]
-        },
-        {
-          "runIndex": 2,
-          "passed": true,
-          "score": 100,
-          "countedForRecommendation": true,
-          "outcome": "passed",
-          "failureCategory": "none",
-          "primaryFailure": null,
-          "durationMs": 108862,
-          "hardFailure": false,
-          "stageDurationsMs": {
-            "setup": 10,
-            "launchBootstrap": 18359,
-            "materializeTasks": 35,
-            "directReply": 7236,
-            "peerRelayAB": 30664,
-            "peerRelayBC": 26124,
-            "concurrentReplies": 18477,
-            "hygiene": 0
-          },
-          "stageFailures": {},
-          "taskRefChecks": {
-            "directReply": true,
-            "peerRelayAB": true,
-            "peerRelayBC": true,
-            "concurrentBob": true,
-            "concurrentTom": true
-          },
-          "protocolViolations": {
-            "badMessages": 0,
-            "duplicateOrMissingTokens": []
-          },
-          "stages": {
-            "launchBootstrap": true,
-            "directReply": true,
-            "peerRelayAB": true,
-            "peerRelayBC": true,
-            "concurrentReplies": true,
-            "taskRefs": true,
-            "cleanTranscript": true,
-            "noDuplicateTokens": true,
-            "latencyStable": true
-          },
-          "diagnostics": [
-            "runId=c3a55d8a-4028-4af7-9e1a-8ae8c87a95e5"
-          ]
-        },
-        {
-          "runIndex": 3,
-          "passed": false,
-          "score": 95,
-          "countedForRecommendation": true,
-          "outcome": "behavioral-fail",
-          "failureCategory": "model-behavior",
-          "primaryFailure": "duplicateOrMissingTokens=GAUNTLET_JACK_USER_OK_3,GAUNTLET_TOM_USER_OK_3",
-          "durationMs": 118757,
-          "hardFailure": true,
-          "stageDurationsMs": {
-            "setup": 9,
-            "launchBootstrap": 19986,
-            "materializeTasks": 37,
-            "directReply": 8036,
-            "peerRelayAB": 37430,
-            "peerRelayBC": 36219,
-            "concurrentReplies": 8551,
-            "hygiene": 0
-          },
-          "stageFailures": {},
-          "taskRefChecks": {
-            "directReply": true,
-            "peerRelayAB": true,
-            "peerRelayBC": true,
-            "concurrentBob": true,
-            "concurrentTom": true
-          },
-          "protocolViolations": {
-            "badMessages": 0,
-            "duplicateOrMissingTokens": [
-              "GAUNTLET_JACK_USER_OK_3",
-              "GAUNTLET_TOM_USER_OK_3"
-            ]
-          },
-          "stages": {
-            "launchBootstrap": true,
-            "directReply": true,
-            "peerRelayAB": true,
-            "peerRelayBC": true,
-            "concurrentReplies": true,
-            "taskRefs": true,
-            "cleanTranscript": true,
-            "noDuplicateTokens": false,
-            "latencyStable": true
-          },
-          "diagnostics": [
-            "runId=2b0610e0-7b10-49fc-88dd-ab30b37abce9",
-            "duplicateOrMissingTokens=GAUNTLET_JACK_USER_OK_3,GAUNTLET_TOM_USER_OK_3"
+            "runId=5f3d0b1b-17eb-44d6-8b61-644e6f8673c6"
           ]
         }
       ]


@ -1,9 +1,9 @@
# OpenCode Model Gauntlet Results
Generated: 2026-05-09T23:16:07.760Z
Generated: 2026-05-14T06:34:47.601Z
Runs per model: 3
Recommended threshold: average >= 90, successful runs >= 3, consistency >= 85, hard failures = 0
Runs per model: 1
Recommended threshold: average >= 70, successful runs >= 1, consistency >= 85, hard failures = 0
Provider-infra runs are reported separately and are not counted as model behavior. They still block a Recommended verdict until rerun succeeds.
@ -13,50 +13,25 @@ Scoring weights: launchBootstrap=15, directReply=10, peerRelayAB=15, peerRelayBC
| Model | Verdict | Confidence | Readiness | Consistency | Score Spread | Behavior Avg | Overall Avg | Counted | Pass Runs | Weakest Stage | Weakest TaskRef | Dominant Failure | Blockers | Provider Infra | Runtime Transport | Model Fails | Protocol Runs | p50 | p95 |
| --- | --- | --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | --- | --- | --- | --- | ---: | ---: | ---: | ---: | ---: | ---: |
| `opencode/big-pickle` | Recommended | high | 100 | 100 | 0 | 100 | 100 | 3/3 | 3/3 | cleanTranscript 3/3 (100%) | concurrentBob 3/3 (100%) | none | - | 0 | 0 | 0 | 0 | 112355ms | 116891ms |
| `opencode/minimax-m2.5-free` | Strong candidate | high | 88.6 | 93.1 | 5 | 98.3 | 98.3 | 3/3 | 2/3 | noDuplicateTokens 2/3 (66.7%) | concurrentBob 3/3 (100%) | model-behavior | successful runs 2 < 3; hard failures 1; model-behavior failures 1; highest weighted stage loss noDuplicateTokens=5; protocol violations in 1 runs | 0 | 0 | 1 | 1 | 108862ms | 118757ms |
| `opencode/big-pickle` | Recommended | low | 100 | 100 | 0 | 100 | 100 | 1/1 | 1/1 | cleanTranscript 1/1 (100%) | concurrentBob 1/1 (100%) | none | - | 0 | 0 | 0 | 0 | 132968ms | 132968ms |
## opencode/big-pickle
Readiness score: 100.
Score stability: consistency=100, min=100, max=100, spread=0, stdDev=0, samples=3.
Score stability: consistency=100, min=100, max=100, spread=0, stdDev=0, samples=1.
Recommendation blockers: -.
Weighted stage impact: -.
Stage pass rates: launchBootstrap:3/3 (100%), directReply:3/3 (100%), peerRelayAB:3/3 (100%), peerRelayBC:3/3 (100%), concurrentReplies:3/3 (100%), taskRefs:3/3 (100%), cleanTranscript:3/3 (100%), noDuplicateTokens:3/3 (100%), latencyStable:3/3 (100%).
Stage pass rates: launchBootstrap:1/1 (100%), directReply:1/1 (100%), peerRelayAB:1/1 (100%), peerRelayBC:1/1 (100%), concurrentReplies:1/1 (100%), taskRefs:1/1 (100%), cleanTranscript:1/1 (100%), noDuplicateTokens:1/1 (100%), latencyStable:1/1 (100%).
TaskRef pass rates: directReply:3/3 (100%), peerRelayAB:3/3 (100%), peerRelayBC:3/3 (100%), concurrentBob:3/3 (100%), concurrentTom:3/3 (100%).
TaskRef pass rates: directReply:1/1 (100%), peerRelayAB:1/1 (100%), peerRelayBC:1/1 (100%), concurrentBob:1/1 (100%), concurrentTom:1/1 (100%).
Protocol totals: badMessages=0, duplicateOrMissingTokens=0, affectedRuns=0.
| Run | Outcome | Category | Score | Counted | Duration | Failed Stages | Slowest Stage | TaskRefs | Protocol | Diagnostics |
| ---: | --- | --- | ---: | --- | ---: | --- | --- | --- | --- | --- |
| 1 | passed | none | 100 | yes | 112344ms | - | peerRelayBC:28154ms | directReply:ok, peerRelayAB:ok, peerRelayBC:ok, concurrentBob:ok, concurrentTom:ok | - | runId=d9d27eb0-2798-4980-a0fa-f082a6edd705 |
| 2 | passed | none | 100 | yes | 112355ms | - | peerRelayBC:28580ms | directReply:ok, peerRelayAB:ok, peerRelayBC:ok, concurrentBob:ok, concurrentTom:ok | - | runId=97364154-e06d-460c-94ae-65b73cb1b6f9 |
| 3 | passed | none | 100 | yes | 116891ms | - | peerRelayAB:27842ms | directReply:ok, peerRelayAB:ok, peerRelayBC:ok, concurrentBob:ok, concurrentTom:ok | - | runId=7bdd4b2e-dbd6-4474-a8a0-9418df433671 |
## opencode/minimax-m2.5-free
Readiness score: 88.6.
Score stability: consistency=93.1, min=95, max=100, spread=5, stdDev=2.4, samples=3.
Recommendation blockers: successful runs 2 < 3; hard failures 1; model-behavior failures 1; highest weighted stage loss noDuplicateTokens=5; protocol violations in 1 runs.
Weighted stage impact: noDuplicateTokens:loss=5, failed=1, pass=2/3 (66.7%).
Stage pass rates: launchBootstrap:3/3 (100%), directReply:3/3 (100%), peerRelayAB:3/3 (100%), peerRelayBC:3/3 (100%), concurrentReplies:3/3 (100%), taskRefs:3/3 (100%), cleanTranscript:3/3 (100%), noDuplicateTokens:2/3 (66.7%), latencyStable:3/3 (100%).
TaskRef pass rates: directReply:3/3 (100%), peerRelayAB:3/3 (100%), peerRelayBC:3/3 (100%), concurrentBob:3/3 (100%), concurrentTom:3/3 (100%).
Protocol totals: badMessages=0, duplicateOrMissingTokens=2, affectedRuns=1.
| Run | Outcome | Category | Score | Counted | Duration | Failed Stages | Slowest Stage | TaskRefs | Protocol | Diagnostics |
| ---: | --- | --- | ---: | --- | ---: | --- | --- | --- | --- | --- |
| 1 | passed | none | 100 | yes | 91530ms | - | peerRelayBC:27370ms | directReply:ok, peerRelayAB:ok, peerRelayBC:ok, concurrentBob:ok, concurrentTom:ok | - | runId=23ae85d2-e79d-41c9-93a6-e843acea6d9e |
| 2 | passed | none | 100 | yes | 108862ms | - | peerRelayAB:30664ms | directReply:ok, peerRelayAB:ok, peerRelayBC:ok, concurrentBob:ok, concurrentTom:ok | - | runId=c3a55d8a-4028-4af7-9e1a-8ae8c87a95e5 |
| 3 | behavioral-fail | model-behavior | 95 | yes | 118757ms | noDuplicateTokens | peerRelayAB:37430ms | directReply:ok, peerRelayAB:ok, peerRelayBC:ok, concurrentBob:ok, concurrentTom:ok | token=GAUNTLET_JACK_USER_OK_3+GAUNTLET_TOM_USER_OK_3 | duplicateOrMissingTokens=GAUNTLET_JACK_USER_OK_3,GAUNTLET_TOM_USER_OK_3 |
| 1 | passed | none | 100 | yes | 132968ms | - | launchBootstrap:49092ms | directReply:ok, peerRelayAB:ok, peerRelayBC:ok, concurrentBob:ok, concurrentTom:ok | - | runId=5f3d0b1b-17eb-44d6-8b61-644e6f8673c6 |
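
The score-stability figures and the recommendation threshold in this report can be reproduced with a short sketch. It is not the gauntlet's actual implementation. `RunResult`, `Threshold`, `summarizeScores`, and `recommendationBlockers` are hypothetical names. The sketch assumes a population standard deviation, which happens to match the reported `stdDev=2.4` for scores 100, 100, 95. The consistency formula is not shown in the report, so the sketch takes that value as an input. The `average` and `consistency` blocker strings are made up for illustration; only the `successful runs` and `hard failures` formats follow the report.

```typescript
// Hypothetical types for illustration; the real gauntlet report types may differ.
interface RunResult {
  score: number;
  passed: boolean;
  hardFailure: boolean;
}

interface Threshold {
  minAverage: number;
  minSuccessfulRuns: number;
  minConsistency: number;
}

const round1 = (value: number): number => Math.round(value * 10) / 10;

function summarizeScores(scores: number[]) {
  if (scores.length === 0) {
    throw new Error('at least one score is required');
  }
  const mean = scores.reduce((sum, s) => sum + s, 0) / scores.length;
  // Assumption: population variance (divide by n), not sample variance.
  const variance = scores.reduce((sum, s) => sum + (s - mean) ** 2, 0) / scores.length;
  const minScore = Math.min(...scores);
  const maxScore = Math.max(...scores);
  return {
    average: round1(mean),
    minScore,
    maxScore,
    spread: maxScore - minScore,
    standardDeviation: round1(Math.sqrt(variance)),
  };
}

function recommendationBlockers(
  runs: RunResult[],
  consistencyScore: number, // taken as given; its formula is not part of this sketch
  threshold: Threshold
): string[] {
  const blockers: string[] = [];
  const { average } = summarizeScores(runs.map((run) => run.score));
  const successful = runs.filter((run) => run.passed).length;
  const hardFailures = runs.filter((run) => run.hardFailure).length;

  if (average < threshold.minAverage) {
    blockers.push(`average ${average} < ${threshold.minAverage}`);
  }
  if (successful < threshold.minSuccessfulRuns) {
    blockers.push(`successful runs ${successful} < ${threshold.minSuccessfulRuns}`);
  }
  if (consistencyScore < threshold.minConsistency) {
    blockers.push(`consistency ${consistencyScore} < ${threshold.minConsistency}`);
  }
  // The threshold requires hard failures = 0, so any hard failure blocks.
  if (hardFailures > 0) {
    blockers.push(`hard failures ${hardFailures}`);
  }
  return blockers;
}
```

Under the earlier threshold (average >= 90, successful runs >= 3, consistency >= 85), the three `opencode/minimax-m2.5-free` runs yield the blockers `successful runs 2 < 3` and `hard failures 1`. Under the new threshold (average >= 70, successful runs >= 1), a single passing run yields none.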


@ -166,6 +166,50 @@ describe('OpenCodeStateChangingBridgeCommandService', () => {
await expect(leaseStore.getActive('team-a')).resolves.toBeNull();
});
it('waits briefly for an active lane lease instead of failing near-concurrent sends', async () => {
clientIdentity.bridgeProtocol.supportedCommands.push('opencode.sendMessage');
const server = peerIdentity('agent_teams_orchestrator');
server.bridgeProtocol.supportedCommands.push('opencode.sendMessage');
server.bridgeProtocol.opencodeDeliveryAcceptanceContractVersion =
OPEN_CODE_DELIVERY_ACCEPTANCE_CONTRACT_VERSION;
handshakePort.nextHandshake = buildHandshakeWithAcceptedCommands(
{ client: clientIdentity, server },
['opencode.launchTeam', 'opencode.stopTeam', 'opencode.sendMessage']
);
bridge.resultFactory = ({ body, command, options }) =>
bridgeSuccess({
requestId: options.requestId,
command,
data: {
runId: 'run-1',
idempotencyKey: body.preconditions.idempotencyKey,
runtimeStoreManifestHighWatermark: 10,
},
});
const service = createService({
leaseAcquireTimeoutMs: 200,
leaseAcquireRetryDelayMs: 5,
});
const activeLease = await leaseStore.acquire({
teamName: 'team-a',
laneId: 'secondary:opencode:bob',
runId: 'run-1',
command: 'opencode.sendMessage',
ttlMs: 10_000,
});
const resultPromise = service.execute(buildSendInput('acceptance'));
await sleep(20);
expect(bridge.calls).toHaveLength(0);
await leaseStore.release(activeLease.leaseId);
await expect(resultPromise).resolves.toMatchObject({ ok: true });
expect(bridge.calls).toHaveLength(1);
expect(bridge.calls[0].body.preconditions.commandLeaseId).toBe('lease-2');
await expect(leaseStore.getActive('team-a')).resolves.toBeNull();
});
it('records unknown outcome after timeout and blocks retry before a duplicate bridge call', async () => {
bridge.resultFactory = ({ body, command, options }) => ({
ok: false,
@ -238,7 +282,12 @@ describe('OpenCodeStateChangingBridgeCommandService', () => {
await expect(leaseStore.getActive('team-a')).resolves.toBeNull();
});
function createService(): OpenCodeStateChangingBridgeCommandService {
function createService(
overrides: {
leaseAcquireTimeoutMs?: number;
leaseAcquireRetryDelayMs?: number;
} = {}
): OpenCodeStateChangingBridgeCommandService {
return new OpenCodeStateChangingBridgeCommandService({
expectedClientIdentity: clientIdentity,
handshakePort,
@ -250,6 +299,7 @@ describe('OpenCodeStateChangingBridgeCommandService', () => {
requestIdFactory: () => 'cmd-1',
diagnosticIdFactory: () => 'diag-1',
clock: () => now,
...overrides,
});
}
});
@ -405,12 +455,12 @@ function buildHandshakeWithAcceptedCommands(
class FakeBridgeExecutor implements OpenCodeBridgeCommandExecutor {
calls: Array<{
command: OpenCodeBridgeCommandName;
body: { prompt: string; preconditions: { idempotencyKey: string } };
body: { prompt: string; preconditions: { idempotencyKey: string; commandLeaseId?: string } };
options: { cwd: string; timeoutMs: number; requestId?: string };
}> = [];
resultFactory: (input: {
command: OpenCodeBridgeCommandName;
body: { prompt: string; preconditions: { idempotencyKey: string } };
body: { prompt: string; preconditions: { idempotencyKey: string; commandLeaseId?: string } };
options: { cwd: string; timeoutMs: number; requestId?: string };
}) => OpenCodeBridgeResult<unknown> = ({ body, options }) =>
bridgeSuccess({
@ -429,7 +479,10 @@ class FakeBridgeExecutor implements OpenCodeBridgeCommandExecutor {
): Promise<OpenCodeBridgeResult<TData>> {
const call = {
command,
body: body as { prompt: string; preconditions: { idempotencyKey: string } },
body: body as {
prompt: string;
preconditions: { idempotencyKey: string; commandLeaseId?: string };
},
options,
};
this.calls.push(call);
@ -460,3 +513,7 @@ class FakeManifestReader implements RuntimeStoreManifestReader {
class FakeDiagnosticsSink implements OpenCodeStateChangingBridgeDiagnosticsSink {
readonly append = vi.fn(async () => {});
}
function sleep(delayMs: number): Promise<void> {
return new Promise((resolve) => setTimeout(resolve, delayMs));
}
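
The "waits briefly for an active lane lease" test above exercises a bounded retry loop around lease acquisition. A minimal sketch of that pattern follows; it is not the service's actual code. `InMemoryLeaseStore`, `tryAcquire`, and `acquireWithRetry` are hypothetical; only the option names `leaseAcquireTimeoutMs` and `leaseAcquireRetryDelayMs` come from the test.

```typescript
// Hypothetical lease shape and store, for illustration only.
interface Lease {
  leaseId: string;
  teamName: string;
}

class InMemoryLeaseStore {
  private readonly active = new Map<string, Lease>();
  private nextId = 1;

  // Returns a new lease, or null while another lease is active for the team.
  tryAcquire(teamName: string): Lease | null {
    if (this.active.has(teamName)) {
      return null;
    }
    const lease: Lease = { leaseId: `lease-${this.nextId++}`, teamName };
    this.active.set(teamName, lease);
    return lease;
  }

  release(leaseId: string): void {
    this.active.forEach((lease, teamName) => {
      if (lease.leaseId === leaseId) {
        this.active.delete(teamName);
      }
    });
  }
}

function sleepMs(delayMs: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, delayMs));
}

// Polls until the lease is free or the timeout elapses, then fails loudly.
async function acquireWithRetry(
  store: InMemoryLeaseStore,
  teamName: string,
  options: { leaseAcquireTimeoutMs: number; leaseAcquireRetryDelayMs: number }
): Promise<Lease> {
  const deadline = Date.now() + options.leaseAcquireTimeoutMs;
  for (;;) {
    const lease = store.tryAcquire(teamName);
    if (lease) {
      return lease;
    }
    if (Date.now() >= deadline) {
      throw new Error(
        `lease for ${teamName} still active after ${options.leaseAcquireTimeoutMs}ms`
      );
    }
    await sleepMs(options.leaseAcquireRetryDelayMs);
  }
}
```

A short timeout with a small retry delay keeps near-concurrent sends from failing immediately, while a lease that never frees still produces an error instead of waiting indefinitely.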


@ -545,6 +545,49 @@ describe('OpenCodeTeamRuntimeAdapter', () => {
expect(sentText).toContain('never use #00000000');
});
it('uses observed settlement for member-work-sync nudges so turn-settled can drive reconcile', async () => {
const sendOpenCodeTeamMessage = vi.fn<
NonNullable<OpenCodeTeamRuntimeBridgePort['sendOpenCodeTeamMessage']>
>(async () => ({
accepted: true,
sessionId: 'oc-session-bob',
memberName: 'bob',
runtimePid: 456,
runtimePromptMessageId: 'msg_prompt_1',
diagnostics: [],
}));
const adapter = new OpenCodeTeamRuntimeAdapter(
bridgePort(readiness({ state: 'ready', launchAllowed: true }), {
sendOpenCodeTeamMessage,
})
);
await expect(
adapter.sendMessageToMember({
runId: 'run-1',
teamName: 'team-a',
laneId: 'secondary:opencode:bob',
memberName: 'bob',
cwd: '/repo',
text: 'sync your current work state',
messageId: 'sync-1',
messageKind: 'member_work_sync_nudge',
taskRefs: [{ taskId: 'task-1', displayId: 'abcd1234', teamName: 'team-a' }],
})
).resolves.toMatchObject({
ok: true,
runtimePromptMessageId: 'msg_prompt_1',
});
expect(sendOpenCodeTeamMessage).toHaveBeenCalledWith(
expect.objectContaining({
messageId: 'sync-1',
messageKind: 'member_work_sync_nudge',
settlementMode: 'observed',
})
);
});
it('observes direct teammate messages by exact accepted runtime prompt id', async () => {
const observeOpenCodeTeamMessageDelivery = vi.fn<
NonNullable<OpenCodeTeamRuntimeBridgePort['observeOpenCodeTeamMessageDelivery']>