Remove the SessionDetail.messages stripping and related cache-size
change per maintainer feedback. The session-detail optimization will
follow separately after PR #58 lands with the right architectural
pattern (lightweight snapshot + separate endpoints).
This PR now contains only:
- progressPayload helpers (buildProgressLogsTail, buildProgressAssistantOutput)
- cap applied in emitLogsProgress, updateProgress, and the stall-warning and
retry-error emission paths
- throttle raised 300ms -> 1000ms
- tests for the progress payload behavior
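For reviewers, a rough sketch of what the throttle change amounts to. This is not the code in the PR: createThrottledEmitter, LOG_PROGRESS_THROTTLE_MS, and the injected clock are hypothetical names chosen for illustration, and a leading-edge throttle is assumed.

```typescript
// Hypothetical sketch; names and shapes are illustrative, not the project's code.
const LOG_PROGRESS_THROTTLE_MS = 1000; // previously 300

type Emit<T> = (payload: T) => void;

// Leading-edge throttle with an injectable clock so it can be exercised
// without real timers. Calls that arrive inside the window are dropped.
function createThrottledEmitter<T>(
  emit: Emit<T>,
  intervalMs: number = LOG_PROGRESS_THROTTLE_MS,
  now: () => number = Date.now,
): Emit<T> {
  let lastEmittedAt = Number.NEGATIVE_INFINITY;
  return (payload: T): void => {
    const t = now();
    if (t - lastEmittedAt < intervalMs) {
      return; // still inside the throttle window
    }
    lastEmittedAt = t;
    emit(payload);
  };
}
```

A production version would probably also flush the last dropped payload when the window closes, so the final progress state is not lost. That is left out here to keep the sketch short.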
Users with long-running teams (37+ tasks, 10+ agents for an hour) were
hitting constant renderer crashes (issue #36). Two hot paths were
serializing unbounded histories across IPC on every tick:
- Provisioning progress: emitLogsProgress and updateProgress both
joined the full provisioningOutputParts array (~20 event-driven call
sites) plus the full CLI log tail, then fanned that out to the
renderer. After an hour, each tick shipped multi-megabyte payloads,
and the renderer ran out of memory cloning the immutable Zustand state.
- Session detail cache: SessionDetail.messages (the raw parsed JSONL)
was being cached and returned over IPC/HTTP even though the renderer
only reads session/chunks/processes/metrics. This roughly doubled
the per-entry cache footprint on large sessions.
Fixes:
- Add progressPayload helpers that cap the log tail to 200 lines and
assistant output to the last 20 parts; empty/whitespace joins
collapse to undefined so the noop guard is explicit rather than
coincidental.
- Apply the cap inside emitLogsProgress, updateProgress, and the two
inline emission paths (stall warning, retry error). Throttle the
log-progress tick 300ms -> 1000ms so Zustand can keep up.
- Add stripSessionDetailMessages and call it at every SessionDetail
production site that crosses IPC/HTTP (both sessions.ts routes,
both cache stores).
- Raise MAX_CACHE_SESSIONS 5 -> 20 now that the per-entry SessionDetail
footprint is bounded. The previous limit of 5 forced a re-parse on
every session switch.
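A minimal sketch of the capping behavior described for the progressPayload helpers. The helper names and the two caps (200 lines, 20 parts) come from this message. The signatures, the constant names, and the join separators are assumptions and may not match the real implementation.

```typescript
// Caps taken from the commit message; constant names are hypothetical.
const MAX_PROGRESS_LOG_LINES = 200;
const MAX_PROGRESS_ASSISTANT_PARTS = 20;

// Assumed signature: takes the CLI log as an array of lines.
// Returns only the last 200 lines, or undefined if nothing visible remains.
function buildProgressLogsTail(logLines: readonly string[]): string | undefined {
  // slice() copies, so the caller's array is never mutated.
  const tail = logLines.slice(-MAX_PROGRESS_LOG_LINES).join("\n");
  return tail.trim().length > 0 ? tail : undefined;
}

// Assumed signature: takes the accumulated assistant output parts.
// Returns only the last 20 parts joined, or undefined for empty/whitespace.
function buildProgressAssistantOutput(parts: readonly string[]): string | undefined {
  const joined = parts.slice(-MAX_PROGRESS_ASSISTANT_PARTS).join("");
  return joined.trim().length > 0 ? joined : undefined;
}
```

Returning undefined for empty or whitespace-only joins lets callers skip the emission with an explicit check instead of relying on an empty string being falsy.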
Tests: 15 new unit tests covering the helpers (tail slicing, empty
parts, whitespace-only parts, non-mutation of inputs).
- Added support for live runtime model metadata in team provisioning.
- Implemented functions to extract and manage CLI flag values for team members.
- Updated member specifications to include effective models based on provider defaults.
- Enhanced UI dialogs to check selected providers in parallel, improving responsiveness.
- Added tests for handling model unavailability during team bootstrap and launch processes.
- Introduced CLI_INSTALLER_VERIFY_PROVIDER_MODELS IPC channel for on-demand model verification.
- Implemented handler for verifying provider models in the CliInstallerService.
- Enhanced CLI installation status management with model verification state and availability.
- Updated related components to support model verification feedback in the UI.
- Updated components in the agent-graph renderer to read team data through context hooks instead of the store.
- Introduced `useGraphActivityContext` and `useGraphMemberPopoverContext` hooks to streamline data management.
- Refactored `GraphBlockingEdgePopover`, `GraphNodePopover`, and `GraphTaskCard` components for improved performance and readability.
- Reorganized imports in `MemberDetailDialog`.
* fix(team): resolve stuck "reconciling" state and skip resume when teammates never spawned
Addresses #54.
When a team launch fails to bootstrap teammates, the team gets stuck showing
"Last launch is still reconciling" indefinitely, and retrying with --resume
reconnects the lead but does not re-spawn the dead teammates. The only
workaround was enabling "Clear context (fresh session)", which loses the
lead's prior conversation context.
Two root causes addressed:
1. createPersistedLaunchSnapshot counted members still in 'starting' state
(agentToolAccepted=false) as 'pending' regardless of launchPhase. When
launchPhase was 'finished' with never-spawned members, the aggregate
state stayed as 'partial_pending' forever, rendered as "still reconciling".
Fix: when launchPhase != 'active', promote such members to
'failed_to_start' so the aggregate becomes 'partial_failure'
("Launch failed partway"), which correctly signals a terminal state.
2. TeamProvisioningService._launchTeamInner always used --resume when a
previous leadSessionId existed, even if the previous launch had spawned
no teammates. The CLI's deterministic reconnect path
restores lead context but does not re-spawn dead teammates, so the team
stays broken across relaunches. Fix: before adding --resume, read the
persisted launch state. If every expected teammate is 'starting' (never
spawned) or 'failed_to_start', skip --resume so the CLI performs a full
fresh bootstrap that spawns all teammates.
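The two fixes above can be sketched roughly as follows. The state strings 'starting', 'failed_to_start', 'partial_pending', 'partial_failure', and the launchPhase values come from this message. Everything else is hypothetical: the 'ready' and 'all_ready' values, the MemberSnapshot shape, and the function names settleMember, aggregateLaunchState, and shouldUseResume. The real createPersistedLaunchSnapshot and _launchTeamInner may be structured differently.

```typescript
// 'ready' and 'all_ready' are placeholder values for illustration only.
type MemberLaunchState = "starting" | "ready" | "failed_to_start";
type LaunchPhase = "active" | "finished";
type AggregateState = "all_ready" | "partial_pending" | "partial_failure";

interface MemberSnapshot {
  name: string;
  state: MemberLaunchState;
  agentToolAccepted: boolean;
}

// Fix 1: once the launch is no longer active, a member that never spawned
// is treated as failed rather than pending.
function settleMember(member: MemberSnapshot, launchPhase: LaunchPhase): MemberSnapshot {
  const neverSpawned = member.state === "starting" && !member.agentToolAccepted;
  if (neverSpawned && launchPhase !== "active") {
    return { ...member, state: "failed_to_start" };
  }
  return member;
}

function aggregateLaunchState(
  members: readonly MemberSnapshot[],
  launchPhase: LaunchPhase,
): AggregateState {
  const settled = members.map((m) => settleMember(m, launchPhase));
  if (settled.some((m) => m.state === "failed_to_start")) return "partial_failure";
  if (settled.some((m) => m.state === "starting")) return "partial_pending";
  return "all_ready";
}

// Fix 2: only pass --resume when there is a session to resume and at least
// one expected teammate actually spawned in the previous launch.
function shouldUseResume(
  previousLeadSessionId: string | undefined,
  members: readonly MemberSnapshot[],
): boolean {
  if (!previousLeadSessionId) return false;
  // Assumption: a team with no expected teammates keeps the resume path.
  if (members.length === 0) return true;
  const noneSpawned = members.every(
    (m) => m.state === "starting" || m.state === "failed_to_start",
  );
  return !noneSpawned;
}
```

With this shape, a finished launch with one never-spawned member aggregates to 'partial_failure' instead of staying 'partial_pending', and a relaunch where no teammate ever spawned falls back to a fresh bootstrap.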
Verified manually on Linux: a team stuck in "still reconciling" correctly
transitions to "failed partway" after the first fix, and the next Launch
(without "Clear context") fully bootstraps and brings teammates online.
* fix(team): narrow skip resume to never-spawned teammates
---------
Co-authored-by: 777genius <quantjumppro@gmail.com>