777genius d018002c3e feat(docs): restructure VitePress IA, improve onboarding/troubleshooting docs

- Restructure sidebar: Start → Guide → Operations → Developers → Reference
- Fix EN/RU sidebar order (Installation before Quickstart)
- Expand troubleshooting with diagnostics commands and task-log triage
- Improve quickstart with prerequisites, pitfalls, and contributor links
- Expand installation docs with verification commands
- Add cyberpunk hero theme to landing page
- Add atomicFile utility with tests and stage-runtime script
- Harden team provisioning with better error handling and progress output
- Add cross-team communication, kanban, and workSync improvements

2026-05-15 23:34:06 +03:00

12 KiB

Raw Permalink Blame History

title	description
Troubleshooting – Agent Teams Docs	Fix team launch issues, missing agent replies, rate limits, CLI auth problems, and lane bootstrap stalls with local diagnostics.

Troubleshooting

Most team issues fall into one of four buckets: runtime setup, launch confirmation, task parsing, and provider limits.

Quick evidence setup

For any team lifecycle issue, define these variables first and reuse the same shell:

TEAM="<team-name>"
TEAM_DIR="$HOME/.claude/teams/$TEAM"
TASKS_DIR="$HOME/.claude/tasks/$TEAM"

Then confirm the expected files exist before interpreting UI state:

test -d "$TEAM_DIR" && find "$TEAM_DIR" -maxdepth 2 -type f | sort | sed -n '1,80p'
test -d "$TASKS_DIR" && find "$TASKS_DIR" -maxdepth 1 -name '*.json' | sort | sed -n '1,40p'

::: warning Evidence first Do not fix prompts, provider settings, or process cleanup based only on a stuck badge. First correlate the UI with persisted files, launch artifacts, and runtime evidence. :::

Team does not launch

Check each item in order:

Runtime available — the selected CLI (claude, codex, opencode) is installed
PATH reachable — the binary is available in the environment PATH
Model access — the provider has access to the requested model string (especially for OpenCode, exact provider/model names matter)
Project path — the project directory exists and is readable
Network / VPN — some providers drop traffic when a VPN is active

::: tip Run the runtime binary in a terminal to verify PATH and auth. Example: claude --version or opencode --version. :::

OpenCode: registered but bootstrap unconfirmed

If OpenCode shows registered but bootstrap is unconfirmed, inspect artifacts first before changing team prompts.

Contributor/debugging details live in Contributor Architecture, which links to the canonical agent team debugging runbook.

Look at the newest launch failure artifact:

LATEST_FAILURE="$TEAM_DIR/launch-failure-artifacts/latest.json"
MANIFEST_PATH="$(jq -r '.manifestPath' "$LATEST_FAILURE")"
jq '.classification, .bootstrapTransportBreadcrumb, .memberSpawnStatuses' "$MANIFEST_PATH"

latest.json points to the newest packed artifact directory and its manifest.json. The manifest includes:

classification — why the launch was considered a failure
bootstrapTransportBreadcrumb — delivery path used
Member spawn statuses
Redacted logs and traces

Also check the lane manifest:

jq '.lanes' "$TEAM_DIR/.opencode-runtime/lanes.json" 2>/dev/null
find "$TEAM_DIR/.opencode-runtime/lanes" -maxdepth 2 -name manifest.json -print -exec jq '.activeRunId, .entries' {} \; 2>/dev/null

::: tip Do not guess from the UI Always correlate UI diagnostics with persisted files (launch-state.json, bootstrap-journal.jsonl) and runtime-specific evidence. :::

General diagnostics

Start with persisted files on disk rather than the UI alone.

Team root

printf '%s\n' "$TEAM_DIR"

Key files and what they tell you:

launch-state.json — member launch/liveness state (.teamLaunchState, .summary, .members)
bootstrap-journal.jsonl — ordered bootstrap events from CLI/runtime (tail -80)
bootstrap-state.json — bootstrap phase summary
config.json — provider, model, and project configuration
inboxes/*.json and sentMessages.json — message delivery state

jq '.teamLaunchState, .summary, .members' "$TEAM_DIR/launch-state.json"
tail -80 "$TEAM_DIR/bootstrap-journal.jsonl" 2>/dev/null

OpenCode runtime evidence

For OpenCode teammates, session proof is in the lane runtime store:

.opencode-runtime/lanes.json — lane index with state
.opencode-runtime/lanes/<lane>/manifest.json — activeRunId and evidence entries
.opencode-runtime/lanes/<lane>/opencode-sessions.json — committed session records

Expected healthy state: lane state active, manifest has activeRunId with at least one evidence entry, member has bootstrapConfirmed: true.

jq '.lanes' "$TEAM_DIR/.opencode-runtime/lanes.json" 2>/dev/null
find "$TEAM_DIR/.opencode-runtime" -maxdepth 3 -type f | sort

Launch failure artifacts

When a launch is marked as a failure, inspect latest.json:

LATEST_FAILURE="$TEAM_DIR/launch-failure-artifacts/latest.json"
jq '.' "$LATEST_FAILURE"
jq '.' "$(jq -r '.manifestPath' "$LATEST_FAILURE")"

The manifest includes:

classification — why the launch was considered a failure
bootstrapTransportBreadcrumb — delivery path used
Member spawn statuses and redacted logs/traces

Agent replies are missing

Open task logs and teammate messages. Missing replies often come from:

Runtime delivery retry — the agent may have answered, but the message was not delivered to the app. Check the delivery ledger.
Parsing or filtering — the agent output did not include expected markers or task references.
Task attribution — the work happened during the session but was not linked to the task because the correct task id was missing from the output.

::: warning Do not assume silence means ignoring Do not assume the model ignored the message until logs confirm it. :::

Use the persisted message state to separate "not sent" from "sent but not rendered":

jq '.' "$TEAM_DIR/inboxes/user.json" 2>/dev/null
jq '.' "$TEAM_DIR/sentMessages.json" 2>/dev/null

Check from, to, messageId, relayOfMessageId, and taskRefs. For OpenCode teammates, also inspect runtime delivery evidence before assuming the model ignored the prompt.

Tasks are not linked to changes

Use task-specific logs and code review links. If a diff appears detached:

Check whether the task id or task reference was included in the agent output.
Verify the agent called task_add_comment before making edits.
Ensure the agent called task_start so the board knows work began.

For OpenCode teammates, the authoritative proof that a session belongs to a task is in opencode-sessions.json and the lane manifest entry, not only the UI message stream.

Task log triage

When a task log looks incomplete, search by task id across task JSON, inboxes, and bootstrap events:

TASK="<short-or-full-task-id>"
rg -n "$TASK" "$TASKS_DIR" "$TEAM_DIR/inboxes" "$TEAM_DIR/bootstrap-journal.jsonl" 2>/dev/null

Interpret the result carefully:

Evidence	What it proves	What it does not prove
Message delivered	The app wrote or relayed a prompt	The agent made progress
Task comment	The agent posted board-visible text	The comment is meaningful progress
Native tool rows	The runtime did work in a session	The work belongs to this task unless attribution matches
Change ledger entry	The app recorded file changes	The implementation is correct

For OpenCode, a healthy task log usually includes native runtime rows like read, bash, edit, or write plus Agent Teams MCP rows. If you only see agent-teams_* rows, confirm task attribution and session bounds before widening log matching.

Rate limits

If a provider reports a known reset time, Agent Teams can nudge the lead to continue after cooldown. If reset time is unknown, wait or switch provider/runtime path.

Provider behavior	Suggested action
Known reset time displayed	Wait for cooldown and continue
No reset time shown	Switch provider or runtime path
Repeated 429s	Lower concurrency or use a different model lane

CLI auth issues

`claude login` does not persist

If the CLI is authenticated in one terminal but the app says it is not, verify the auth is saved to the expected config path and that the app process sees the same $HOME.

OpenCode provider key rejected

Double-check the provider name in config.json matches the provider prefix in the model string
Ensure the key is not expired or revoked in the provider dashboard

Auth diagnostic log

Each call to CliInstallerService.getStatus() appends one line to claude-cli-auth-diag.ndjson in the Electron log folder (usually ~/Library/Logs/<product-name>/ on macOS). If the file exceeds 512 KiB, it is truncated to empty before the next write.

Check this file if you see "Not logged in" or auth errors in the packaged app.

Lane bootstrap stuck

For OpenCode secondary lanes:

A missing inboxes/<member>.json is not automatically a bug. OpenCode lanes do not have to be primary-inbox-created before they start.
If the UI shows the team still launching while primary members are already usable, "all teammates joined" is waiting for secondary lanes.
If Prepared communication channels for X/Y members hangs, verify whether Y incorrectly includes secondary OpenCode members.

Lane manifest empty entries

If the bridge says bootstrap succeeded but manifest.json shows entries: [], the issue is evidence commit, not model behavior. The member must not be considered deliverable until opencode-sessions.json and its manifest entry exist.

Common member states

State	Meaning
`confirmed_alive` + `bootstrapConfirmed`	Healthy and ready
`registered` / `runtime_pending_bootstrap`	Process or lane exists, but bootstrap proof has not been committed yet
`failed_to_start` + `runtime_process`	Process exists, but launch gate failed. Check diagnostics
`failed_to_start` + `stale_metadata`	Saved pid/session is stale or dead

::: warning member_briefing by itself is NOT runtime evidence. For OpenCode, authoritative proof is committed runtime evidence such as opencode-sessions.json and the manifest entry. :::

Runtime debug mode

For local debugging, you can force teammates to run in tmux panes:

# Launch from a terminal
CLAUDE_TEAM_TEAMMATE_MODE=tmux pnpm dev

# Or add to custom CLI args
--teammate-mode tmux

Use this to inspect interactive CLI behavior. Do not consider this fully equivalent to the process backend.

Smoke checks

Use the desktop Electron app for normal validation. Browser/web dev mode does not include the full desktop runtime, IPC, provider auth, terminal, or team lifecycle behavior.

Docs-only changes

From the repo root:

pnpm --dir landing docs:build
git diff --check -- landing/product-docs

Team lifecycle changes

Start narrow, then expand:

pnpm test -- test/main/services/team/TeamProvisioningService.test.ts
pnpm test -- test/main/services/team/TeamAgentLaunchMatrix.safe-e2e.test.ts
pnpm typecheck
git diff --check

Live team smoke

Use a small team and a Git-tracked disposable project:

Start the desktop app with pnpm dev.
Create one lead plus one builder.
Ask for a tiny change with an explicit verification command.
Confirm the task moves pending -> in_progress -> completed.
Open task logs and verify tool rows, task comments, and file changes line up.
Stop only the smoke-owned team/processes when cleaning up.

::: warning Narrow cleanup only Do not kill all OpenCode hosts, unrelated tmux panes, or user teams while cleaning up a smoke run. :::

Safe cleanup

When cleaning up stale processes:

Identify the pid and confirm it belongs to the current team / lane.
Stop only processes explicitly belonging to a smoke test or the launch you are debugging.
Do not kill all OpenCode or shared host processes as a shortcut.

When to collect evidence

Before asking for help, collect:

Task id (short or full)
Team name
Runtime path (claude, codex, or opencode)
Launch log excerpt (from latest.json or bootstrap-journal.jsonl)
Provider / model string
Exact time window when the issue occurred

This data is usually enough to debug launch and task lifecycle issues.

::: tip If the issue persists, open the team's persisted files under ~/.claude/teams/<teamName>/ and correlate UI diagnostics with the live process state before changing code. :::

12 KiB Raw Permalink Blame History Unescape Escape