Commit graph

1188 commits

Author SHA1 Message Date
777genius
bc660e7e9b perf(renderer): slow process-lite runtime telemetry refresh 2026-05-31 03:46:25 +03:00
777genius
f06a50f859 perf(renderer): reduce sidebar task list render work 2026-05-31 02:55:14 +03:00
iliya
867d5368e1 test(team): align lead session cache tests 2026-05-31 00:29:10 +03:00
iliya
76299a4e44 fix(team): align runtime telemetry tests with process rows 2026-05-31 00:14:16 +03:00
777genius
e90886316f perf(main): cache runtime resource telemetry 2026-05-30 23:59:56 +03:00
777genius
4b5a49e6e8 perf(main): share lead session tail reads 2026-05-30 23:19:15 +03:00
777genius
f4f42e2ca4 perf(main): avoid full team scans for global tasks 2026-05-30 23:14:42 +03:00
777genius
8f1e223d61 perf(renderer): slow task refreshes during launch 2026-05-30 22:57:41 +03:00
777genius
c2ab7a7ff4 perf(main): skip unchanged task resume locks 2026-05-30 22:47:52 +03:00
777genius
7ea57cb012 perf(renderer): defer task project scans 2026-05-30 22:33:17 +03:00
777genius
1f46a14a79 perf(main): trim runtime process liveness scans 2026-05-30 22:14:52 +03:00
777genius
4575255c28 perf(main): skip redundant task interval resume scans 2026-05-30 22:07:24 +03:00
777genius
e7d7b3014e perf(main): reduce runtime launch metadata work 2026-05-30 21:52:36 +03:00
777genius
b8a53dcc09 perf(renderer): reduce member card render work 2026-05-30 20:58:31 +03:00
777genius
60d806135c perf(renderer): lazy mount task context menus 2026-05-30 20:54:22 +03:00
777genius
2c13516d9f perf(renderer): lazy render member hover cards 2026-05-30 20:50:17 +03:00
777genius
d59865f300 perf(team): reduce runtime telemetry polling 2026-05-30 20:03:45 +03:00
777genius
b13ee56359 perf(team): cap global task projections earlier 2026-05-30 18:57:45 +03:00
777genius
304d0a5ef1 perf(team): avoid redundant task cache clone 2026-05-30 18:48:43 +03:00
777genius
a7606032fc fix(runtime): preserve provider status during refresh failures 2026-05-30 18:44:31 +03:00
777genius
a06423a574 fix(team): preserve live overlay in worker message pages 2026-05-30 18:44:12 +03:00
777genius
180bdb7575 perf(team): cache transcript affinity verdicts 2026-05-30 18:39:16 +03:00
777genius
9ed1988346 fix: refresh watch scope on provider fallback 2026-05-30 17:34:05 +03:00
777genius
06036460e9 fix: keep watch scope safe on provider errors 2026-05-30 17:11:31 +03:00
777genius
3b7b5dfd75 fix: preserve file lock acquire errors 2026-05-30 17:08:29 +03:00
777genius
a18009cc0f fix: keep created teams in watch scope 2026-05-30 16:51:47 +03:00
777genius
f4155a6742 test: align idle watchdog expectation with stale window 2026-05-30 16:32:20 +03:00
777genius
d43cd3a0db test: sort team config reader imports 2026-05-30 16:06:01 +03:00
777genius
8cb44cd793 perf: replace readline with a chunked line generator in team JSONL readers
readline.createInterface runs an expensive Unicode line-break regex + extra
stream/string-decoder machinery per chunk. The main transcript parser (parseJsonlStream)
already uses a buffer + manual newline split; these per-team readers still used readline.

Add readJsonlLines(): an async generator that yields a JSONL file's lines via a chunked
utf8 stream read + a plain '\n' split (drop-in for 'for await (const line of rl)'), so the
consumers' loop bodies are unchanged. Stream is utf8-decoded before splitting, so multi-byte
chars across chunk boundaries are safe; trailing CR (CRLF) is stripped; empty lines and a
final newline-less line are yielded, matching readline; breaking out of the loop destroys
the stream via the generator's finally.

Adopt it in MemberStatsComputer, TaskBoundaryParser, and FileContentResolver (file-history
scan). Behavior-identical (their existing tests pass: 18 + 6 + 12) plus 6 new tests for the
generator (CRLF, empty lines, no-trailing-newline, early break, multi-byte chunk boundary).

Note: session-browser readline paths (jsonl metadata extractor, metadataExtraction,
SessionContentFilter) are off the launch path and left as-is for now.
2026-05-30 15:58:09 +03:00
777genius
92f1000a4f test: cover transcript head metadata cache 2026-05-30 15:51:59 +03:00
777genius
127d31ba88 perf: cache unchanged team task reads 2026-05-30 15:51:15 +03:00
777genius
28a55416ca test: isolate runtime usage stubs from process table 2026-05-30 15:46:27 +03:00
777genius
61e2678a5d perf: cache team transcript head metadata 2026-05-30 15:26:09 +03:00
777genius
0a750a9fa8 perf: share a frozen team-summary snapshot instead of cloning on every listTeams
listTeams() deep-cloned ALL team summaries via structuredClone on every call -- even
cache hits and concurrent in-flight awaiters. A heap allocation sample of a launch
put this (listTeams -> cloneTeamSummaries -> structuredClone) as the single largest
memory allocator, driving heap churn + GC pressure during launch (this stand has ~158
teams, and listTeams is called constantly: startup, notification init, task projection,
IPC polls, provisioning).

Build ONE deep-frozen, independent snapshot per uncached load and hand the same
reference to the cache entry, in-flight awaiters, and every later reader. The single
cloneTeamSummaries keeps it independent of any cached config the loader returns;
freezing lets all readers share it safely. Audited every listTeams consumer -- all
iterate / map / filter / serialize, none mutate -- and the freeze turns any stray
future mutation into a loud error rather than silent cross-caller corruption.

TeamConfigReader 26/26 (added a frozen + same-reference regression test), and the
listTeams consumers (TeamDataService 116, CrossTeamService 26) all pass under frozen
summaries.
2026-05-30 15:25:26 +03:00
777genius
58dfac8377 fix: preserve runtime rss fallback when process table misses roots 2026-05-30 15:16:03 +03:00
777genius
f0797e2c12 perf: replace readline with a bounded chunked read in the affinity head scan
fileBelongsToTeam streamed the head window via createReadStream + readline. readline's
line iterator runs an expensive Unicode line-break regex and stream/string-decoder
machinery per chunk, which showed up as a top main-thread cost during launch (the line-
split regex alone was ~5.7% in the warm launch profile).

Replace it with a bounded chunked fs.read + a plain '\n' split. JSONL is strictly
newline-delimited and each line is trim()'d (so a trailing CR from CRLF is dropped),
so a '\n' split is cheaper and more correct (it will not split on a bare CR or a
Unicode line/paragraph separator inside a JSON string value, which readline would). A
StringDecoder preserves multi-byte UTF-8 sequences that straddle a chunk boundary.

Byte-identical semantics to the old loop: inspect up to TEAM_AFFINITY_SCAN_LINES
non-empty lines, first match wins via early break, and a final line is honored even
without a trailing newline. Reads in 64KB chunks so a team decided in its first lines
is not penalized by a huge file. Adds tests for CRLF endings + no-trailing-newline,
a multi-byte char straddling the 64KB boundary, and the 40-line window bound (21 pass).
2026-05-30 14:54:58 +03:00
777genius
776298b0e3 perf: reuse caller stat in fileBelongsToTeam, drop duplicate fs.stat
On the live resolution path collectRootJsonlSessionIds already stat()s each root
jsonl for its mtime-window filter, then fileBelongsToTeam stat()ed the very same
file again for its cache validation -- two fs.stat syscalls (plus two Stats
allocations) per file, every poll. fileBelongsToTeam now takes an optional
precomputed stat and the mtime-filter caller passes the stat it already has, so the
file is statted once. Measured 20 files -> 20 stat calls on the mtime path (was ~40).

Using a single stat snapshot is also slightly more consistent than two reads that
could straddle a concurrent write. The other call site (subagent scan) passes no
stat and is unchanged (fileBelongsToTeam stats it itself). Adds a regression test
that a caller-supplied stat is the one recorded in the affinity cache.
2026-05-30 14:11:58 +03:00
777genius
c8d40be460 perf: cache negative team-affinity verdicts from a full head window
fileBelongsToTeam only cached POSITIVE affinity durably; a negative verdict was
re-decided on any change, so during a launch every non-matching transcript in the
project dir that grew (mtime+size change from an active session) was re-streamed
(createReadStream+readline) and re-parsed (up to 40 head lines) on every bootstrap
poll. A live atlas-hq-5 launch profile put this whole subsystem (readline streaming
+ fileBelongsToTeam + line/team matching) at ~31% of main-thread JS, the single
largest launch cost.

A team's first 40 head lines are immutable for an append-only transcript, so a
`false` decided from a FULL inspected window (>= TEAM_AFFINITY_SCAN_LINES) stays
valid while the file only grows. Track headWindowFull on the cache entry and short-
circuit such negatives the same way positives are short-circuited (size >= cached).
Short files (partial window) are still re-scanned on growth, so a team mention that
later lands inside the head window is still detected. A shrink/rewrite (size <
cached) forces a re-scan, identical to the positive path.

Behavior-preserving for affinity correctness (no new false negatives); only removes
redundant re-streams. Adds regression tests for both the durable-negative and the
short-file-flips-to-true cases.
2026-05-30 13:17:49 +03:00
777genius
126a485477 wip: team messages panel updates and runtime usage cache refinements
Checkpoint of in-progress work:
- renderer: team messages panel/composer, messagesPanelLogic, teamSlice,
  AnimatedHeightReveal plus their tests
- main: runtime process usage-stats caching (ignoreCachedMisses, bounded
  eviction), alive-run-id helpers, team watch-scope notify wiring

Note: the getTeamAgentRuntimeSnapshot rssBytes expectation in
TeamAgentLaunchMatrix.safe-e2e is environment-dependent and still red.
2026-05-30 12:54:11 +03:00
777genius
d0c64fabb8 fix: export notifyTeamWatchScopeChanged so the committed build resolves
TeamProvisioningService imports notifyTeamWatchScopeChanged (added with the
setAliveRunId/deleteAliveRunId helpers) but the export was missing, so a clean
checkout of the branch failed to typecheck. Add the export plus a test; the
call-site wiring stays as in-progress work.
2026-05-30 12:33:57 +03:00
777genius
8b3cec8013 perf: update team watcher incrementally instead of full rebuild
A team launch repeatedly changes the watched target set (new dirs appear), and each
change tore down the chokidar watcher and recreated it over the full target set.
On macOS chokidar uses kqueue with one fd per watched file, so every rebuild
re-opened an fd for EVERY watched file (the large always-watched inbox set plus
scoped dirs). Profiling a 6-member mixed launch showed ~54k open() syscalls dominated
by these rebuilds.

Keep one persistent watcher and apply target-set changes with add()/unwatch() on the
delta only, so a reconcile opens fds for just the newly added dirs. The initial
watcher still uses ignoreInitial for a silent startup baseline, and
emitExistingFilesForNewTargets still backfills files already present in newly added
dirs, so the emitted event surface is unchanged. Because the watcher is no longer
recreated per reconcile, the stale-old-generation and close-throws-during-rebuild
failure modes are gone; their tests are replaced with incremental add/unwatch and
persistent-watcher coverage. All 69 watcher tests pass.
2026-05-30 12:05:23 +03:00
777genius
f79ea145d7 perf: batch per-member task-interval resume into one locked pass
During launch the live-status loop resumes every alive member every audit cycle.
resumeActiveIntervalsForMember runs a synchronous file-lock + full read of every
task file, so for an N-member team with M task files it did N locked passes x M
readFileSync per cycle (e.g. 6 members x 20 task files), blocking the main event
loop. Profiling a 6-member mixed launch showed mutateTeamTasks/withFileLockSync as
a top main-thread cost (~14%).

Add resumeActiveIntervalsForMembers that applies the identical per-member resume
logic against a member set in a single locked pass, and use it in the live-status
loop. Same mutations, but one lock + task read per cycle instead of one per member.
Adds a test covering multi-member resume in one pass.
2026-05-30 10:02:01 +03:00
777genius
aa9a1bba8c perf: debounce team watcher rebuilds during dir-event bursts
A team launch creates many directories/files in quick succession (worktrees,
inboxes, session logs), and each addDir/unlinkDir event triggered a full
TeamTaskWatchRegistry reconcile that tore down and recreated the entire chokidar
watcher (re-opening a kqueue fd per watched file on macOS). Profiling a 6-member
mixed-team launch showed kqueue churn (kevent) as a top native cost and watcher
rebuild as the top remaining main-thread JS cost after the transcript fix.

Debounce the event-driven reconcile (250ms) so a burst collapses into one rebuild.
collectTargets re-reads the current directory state and emitExistingFilesForNewTargets
backfills files created before the rebuild, so no change is missed; requestReconcile,
startup, and the periodic 30s reconcile stay immediate. Adds a test asserting a
burst of addDir events yields a single rebuild.
2026-05-30 09:46:16 +03:00
777genius
35b76f1354 perf: share bootstrap transcript tail parse across members
During launch, the bootstrap-wait loop polls each member and, per member, re-read
and re-JSON.parsed the same growing transcript tail (readRecentBootstrapTranscriptOutcome
was the top main-thread JS hotspot at ~21% during bootstrap, ~40% with its helpers).
The same file was parsed once per member per poll.

Memoize the parsed tail by (filePath, mtime, size) in a shared cache so the file is
read + parsed once per change and reused across all members. The per-member filter
and failure/success scan is byte-for-byte the same logic; only the redundant read +
JSON.parse is removed. Cache is bounded (LRU, same cap as the outcome cache) and
invalidated on mtime/size change, matching the existing outcome cache semantics.

Adds a test asserting the tail is parsed once and shared while per-member outcome
detection is unchanged.
2026-05-30 01:05:54 +03:00
777genius
5d63ecfe32 perf: scope team file watching to active and engaged teams
The main process watched every team directory under ~/.claude/teams (one shallow
chokidar target per team root, per team inboxes, and per task dir). On macOS this
falls back to kqueue, which needs one fd per watched file, so a workspace with
many teams kept ~1600 descriptors open and made startup and reconcile work scale
with the number of teams on disk.

Scope the team-root and task watching to teams that are running or currently
engaged in the UI. The teams root and every team's inboxes are still watched for
all teams, so cross-team message delivery, the lead inbox->stdin relay, and
notifications are unchanged. Idle teams are static, so dropping their team-root/
task watches is safe; opening a team (getData) or launching it re-adds it via an
immediate watch-scope refresh. The provider falls back to watching every team
when unset, and the EMFILE polling fallback is intentionally left unscoped so a
scope change can never look like a deletion.

Measured on a 162-team workspace: open team fds 1600 -> 730, with team-root
watching restored the moment a team is opened or goes live.
2026-05-30 00:25:55 +03:00
777genius
322c63ea8b perf: skip offline runtime polling 2026-05-29 16:11:16 +03:00
777genius
b4b9175287 perf: reuse team summary for comment notification init 2026-05-29 15:43:24 +03:00
777genius
f0514a7d17 perf: skip unverifiable runtime process scans 2026-05-29 14:38:01 +03:00
777genius
9be096f864 perf: cache persisted bootstrap outcome lookups 2026-05-29 14:19:04 +03:00
777genius
d0c6fdd28c perf: extend persisted runtime probe cache 2026-05-29 14:04:28 +03:00