# Debugging Agent Teams Use this runbook when a team launch hangs, a teammate is marked `registered` or `failed_to_start`, messages do not appear, or OpenCode participants look online but do not answer. ## First Rule Do not guess from the UI alone. Always correlate: - UI diagnostics copied from the launch/member detail panel - persisted team files under `~/.claude/teams//` - live process table - runtime-specific evidence, especially OpenCode lane manifests ## Key Files Team root: ```bash TEAM="" TEAM_DIR="$HOME/.claude/teams/$TEAM" ``` Important files and folders: - `config.json` - configured members, provider/model selection, project path - `members-meta.json` - member metadata, removed members, worktree settings if present - `launch-state.json` - current app-side truth for member launch/liveness - `bootstrap-state.json` - bootstrap phase summary when present - `bootstrap-journal.jsonl` - ordered bootstrap events from the CLI/runtime - `inboxes/*.json` - durable inbox messages for user, lead, and native teammates - `sentMessages.json` - app-side sent-message records - `tasks/*.json` - task board state - `.opencode-runtime/lanes.json` - OpenCode lane index - `.opencode-runtime/lanes//manifest.json` - lane-scoped runtime store manifest - `.opencode-runtime/lanes//opencode-sessions.json` - committed OpenCode session evidence Quick inspection: ```bash jq '.teamLaunchState, .summary, .members' "$TEAM_DIR/launch-state.json" jq '.lanes' "$TEAM_DIR/.opencode-runtime/lanes.json" 2>/dev/null find "$TEAM_DIR/.opencode-runtime" -maxdepth 3 -type f | sort tail -80 "$TEAM_DIR/bootstrap-journal.jsonl" 2>/dev/null ``` ## Launch Phases Primary launch and OpenCode secondary lanes are different paths. - Primary CLI members are created by the main provisioning process. - OpenCode secondary members are launched as side lanes after primary filesystem readiness. - Missing `inboxes/.json` is not automatically a launch bug. OpenCode side lanes do not have to be primary inbox-created before they start. - The UI can show the team still launching while primary members are already usable, because "all teammates joined" waits for secondary lanes too. When a launch hangs at `Prepared communication channels for X/Y members`, check whether `Y` incorrectly includes secondary OpenCode members. The filesystem monitor should wait for `effectiveMembers`, not every requested member. ## Member State Meanings Common `launch-state.json` cases: - `confirmed_alive` with `bootstrapConfirmed: true` - member is usable. - `registered` / `runtime_pending_bootstrap` - process or lane exists, but bootstrap proof is not committed yet. - `registered_only` - app has persisted metadata, but no live runtime proof. - `runtime_process_candidate` - process/session was observed, but committed runtime evidence is incomplete or pending. - `failed_to_start` with `runtime_process` - a process exists, but the launch gate still failed. Inspect diagnostics and runtime evidence. - `failed_to_start` with `stale_metadata` - persisted pid/session is old or dead. Do not treat `member_briefing` alone as runtime evidence. For OpenCode, the authoritative proof is committed bootstrap/session evidence in the lane runtime store. ## OpenCode Debug Flow For an OpenCode teammate: ```bash MEMBER="" jq --arg member "$MEMBER" '.members[$member]' "$TEAM_DIR/launch-state.json" jq '.lanes' "$TEAM_DIR/.opencode-runtime/lanes.json" 2>/dev/null find "$TEAM_DIR/.opencode-runtime/lanes" -maxdepth 3 -type f | sort ``` Expected healthy OpenCode lane: - `lanes.json` has the lane state `active` - lane `manifest.json` has `activeRunId` - lane manifest has at least one runtime evidence entry, usually `opencode.sessionStore` - lane directory has `opencode-sessions.json` - `launch-state.json` member has `runtimeRunId`, `runtimeSessionId`, and `bootstrapConfirmed: true` If the bridge says bootstrap succeeded but the manifest has `entries: []`, the issue is evidence commit, not model behavior. The member must not be considered deliverable until `opencode-sessions.json` and its manifest entry exist. OpenCode bridge ledger, if needed: ```bash LEDGER="$HOME/Library/Application Support/claude-agent-teams-ui/opencode-bridge/command-ledger.json" jq --arg team "$TEAM" '.data[] | select(.teamName == $team)' "$LEDGER" 2>/dev/null ``` Live process checks: ```bash pgrep -af "opencode serve" ps -p -o pid,ppid,etime,command ``` Do not kill all OpenCode processes as a debugging shortcut. First identify whether the pid belongs to the current team/lane. Some OpenCode temp `libopentui.dylib` files are held by live `opencode serve` processes and should only be cleaned after those processes are stopped. ## Messaging Debug Flow Lead and teammates use different delivery paths: - Lead reads stdin. Messages to lead go through `relayLeadInboxMessages()`. - Native teammates read their inbox files directly. - OpenCode teammates receive prompts through runtime delivery and must reply via `agent-teams_message_send`. - Teammate-to-user replies should appear in `inboxes/user.json` or app sent-message projections. If a notification appears but the Messages UI does not show it: ```bash jq '.' "$TEAM_DIR/inboxes/user.json" 2>/dev/null jq '.' "$TEAM_DIR/sentMessages.json" 2>/dev/null ``` Check `from`, `to`, `messageId`, `relayOfMessageId`, and `taskRefs`. Unknown authors should be rejected or normalized at the write boundary, not silently rendered as fake teammates. For OpenCode "message saved but not delivered" cases, inspect the OpenCode prompt-delivery ledger and response proof. Do not synthesize visible replies in the frontend. ## Task And Work-Stall Debug Flow For task stalls: ```bash TASK="" rg -n "$TASK" "$TEAM_DIR/tasks" "$TEAM_DIR/inboxes" "$TEAM_DIR/bootstrap-journal.jsonl" 2>/dev/null ``` Important distinctions: - Delivery proof means the agent received the message. - Task progress proof means the agent made meaningful task progress. - A weak comment like "starting work" is not strong progress. - `task_add_comment` should be evaluated from the actual persisted comment text, not only from the tool call. Task-stall monitor defaults: - General task-stall monitor is for all agents. - OpenCode direct remediation is provider-specific and should nudge the OpenCode owner first. - If OpenCode remediation is not accepted, fallback to lead alert. - Watchdog/remediation must not auto-start new OpenCode processes. ## Task Log Stream Debug Flow Task Log Stream is a projection, not a separate source of truth. For OpenCode tasks, a healthy stream should show native tool rows such as `read`, `bash`, `edit`, `write`, plus Agent Teams MCP rows. If it only shows `agent-teams_*` calls: - confirm the task has OpenCode attribution for the member/session - confirm the OpenCode transcript contains native tools inside the bounded task window - check whether the task was assigned after the native work happened - do not widen attribution so far that unrelated session work is pulled into the task If Changes says "No file changes recorded" while native `write`/`edit` rows exist, inspect the ledger/backfill path. Task logs can show runtime tools even when `.board-task-changes/**` was not created. ## Safe Fix Checklist Before changing launch or runtime logic: - Preserve stale-run, tombstone, stopped-team, and removed-member guards. - Do not make `member_briefing` runtime evidence. - Do not make delivery/watchdog auto-launch a fresh OpenCode lane. - Keep primary launch readiness separate from secondary OpenCode lane readiness. - Keep runtime evidence lane-scoped. Never let one OpenCode lane satisfy another lane. - Add a regression test for the exact state shape you found in `launch-state.json`. Recommended verification: ```bash pnpm vitest run test/main/services/team/TeamProvisioningService.test.ts pnpm vitest run test/main/services/team/TeamAgentLaunchMatrix.safe-e2e.test.ts pnpm typecheck --pretty false git diff --check ``` Use narrower test commands first when editing a focused path, then run the broader suite that covers launch, delivery, and liveness.