35 KiB
getTeamData Parallel Read Plan
Goal
Reduce team:getData latency by parallelizing only the safe top-level read phase of getTeamData() without changing:
TeamDatapayload shape- message merge semantics
- dedup semantics
- warning semantics
- renderer/main truth boundaries
- launch/process reconciliation behavior
Primary target:
- repeated visible team detail refreshes
Not a target for this patch:
- new IPC endpoints
- renderer refresh changes
- new caches
- message payload slimming
- lower-pipeline refactors
Why This Requires Extra Care
This touches a guarded hot zone:
main process / IPC / heavy team payload assembly
Relevant guardrails:
- preserve state truth and reconciliation
- avoid heavier monolithic snapshot paths
- preserve observability
- prefer the smallest invariant-preserving change
Main failure modes to avoid:
- one async failure aborts the whole snapshot
- warning order becomes nondeterministic
- message merge order changes
- completion-time diagnostics become misleading
- side-effecting reads run concurrently with unrelated file-heavy reads
- naive parallelism creates enough disk contention to erase the perf gain
- the patch quietly assumes tests cover behavior they currently do not cover
- step wrappers silently mask wiring bugs because the wrapped closure does too much work
- nested settlement layers make the control flow harder to audit
presenceIndexPromiserejection semantics change by accident- shared mutable fallback objects leak across steps or tests
- diagnostics state is mutated from concurrent branches and becomes harder to audit
- queued heavy work starts reader I/O before a limiter slot is actually granted
- a queued heavy step records timing metadata at queue time instead of settle time
- fallback factories become mini control-flow hooks and start masking programming bugs
- warning text generation drifts from today because it becomes dynamic instead of literal
Relevant code:
- src/main/services/team/TeamDataService.ts#L501
- src/main/ipc/teams.ts#L540
- src/renderer/store/index.ts#L938
- src/main/services/team/TeamInboxReader.ts#L182
- agent-teams-controller/src/internal/processStore.js#L29
Hard Non-Goals
This patch must not:
- parallelize
readProcesses() - parallelize
resolveMembers() - parallelize
getMemberAdvisories(...) - parallelize
enrichMemberBranches(...) - change the post-load message pipeline
- change warning strings
- change warning order
- change any renderer store logic
- change any
InboxMessageorTeamDatasemantics - move concurrency into
TeamTaskReaderorTeamInboxReader
This patch must also not opportunistically fix adjacent issues such as:
- reusing
listInboxNames()insidegetMessages() - introducing a single-team worker path for tasks
- caching lead-session directory listings
Those are separate patches.
Additional rule:
- keep all new concurrency orchestration local to
getTeamData()
Why:
- moving concurrency into lower-level readers would widen the blast radius
- it would make it harder to reason about total filesystem pressure from this patch
Dependency Graph
The patch must preserve this dependency graph exactly.
Strict prerequisites
configmust complete before:extractLeadSessionTexts(config)- any downstream assembly
Additional rule:
- if
configis missing,getTeamData()must still throw before any read-phase step promises are created
Why:
- this preserves the current fast-fail behavior
- it avoids accidental background work for nonexistent teams
Safe early-start paths
After config:
presenceIndexPromise- tasks
- inboxNames
- messages
- leadTexts
- sentMessages
- metaMembers
- kanbanState
Assembly prerequisites
resolveMembers(...) requires all of:
configmetaMembersinboxNamestasksWithKanban- final assembled
messages
tasksWithKanban requires:
taskskanbanStatepresenceIndexPromise
presenceIndexPromise itself must preserve current semantics:
- it starts early
- it is awaited only after
tasksWithKanbanBase - if it rejects today,
getTeamData()rejects
Rule:
- do not wrap, settle, catch, delay, or parallelize away that rejection behavior in this patch
Explicit downstream-only paths
These must stay after assembly:
getMemberAdvisories(...)enrichMemberBranches(...)readProcesses(teamName)
Why this section exists:
- the patch is only safe if dependency edges remain explicit
- parallelization work tends to hide dependency mistakes behind “it still compiled”
Key Observations
1. processes is not a pure read
readProcesses() looks harmless, but controller.processes.listProcesses() can mark dead PIDs and synchronously rewrite processes.json.
Relevant code:
- src/main/services/team/TeamDataService.ts#L874
- agent-teams-controller/src/internal/processStore.js#L29
Rule:
- keep
processesoutside the parallel read phase
2. Some top-level readers already create their own fan-out
TeamInboxReader.getMessages() already performs bounded concurrent inbox reads.
Relevant code:
Implication:
- do not start every heavyweight reader at once
- control top-level heavy-step concurrency explicitly
3. listInboxNames() is duplicated work today
getTeamData() calls listInboxNames() directly, and getMessages() calls it again internally.
Relevant code:
Implication:
- phase 1 will still duplicate one inbox-directory listing
- that is acceptable for this patch
- do not widen the patch by coupling these readers
4. Warning order is currently deterministic
Current semantic order:
- tasks
- inbox names
- messages
- lead session texts
- sent messages
- member metadata
- kanban state
- runtime advisories
- processes
Rule:
- preserve this order even if reads complete out of order
Additional rule:
- preserve warning presence and wording exactly
- do not deduplicate warnings in this patch
Additional note:
presenceIndexPromiseis intentionally not part of this warning sequence- phase 1 must not introduce a new warning for presence loading
5. The lower pipeline is semantically sensitive
After the read phase, getTeamData() performs:
- message merge
- lead-session/live dedup
- session-id enrichment
- slash-result annotation
- kanban overlay
- member resolution
Rule:
- do not reorder, parallelize, or “clean up” this lower pipeline in phase 1
5a. kanbanGc is part of current diagnostics shape even though no GC runs here
getTeamData() currently records mark('kanbanGc') immediately after kanban load without performing garbage collection.
Relevant code:
Rule:
- keep this mark where it is in phase 1
- do not “clean it up” while touching perf code
Why:
- changing it mixes perf work with observability semantics
- it would make before/after slow-log comparison noisier
6. Existing tests do not cover the riskiest new behavior
Current coverage exercises many data semantics, but not:
- warning order under concurrent failures
- read-phase race behavior
- proof that
processesstayed outside the parallel phase - sync-throw fallback behavior for step wrappers
Rule:
- add those tests explicitly in this patch
Additional note:
- an existing test already asserts that
getTeamData()remains read-only and does not invoke kanban garbage collection - that invariant must stay green throughout this patch, not be replaced or weakened
Safe Scope
Safe to parallelize
After config loads successfully:
taskReader.getTasks(teamName)inboxReader.listInboxNames(teamName)inboxReader.getMessages(teamName)extractLeadSessionTexts(config)sentMessagesStore.readMessages(teamName)membersMetaStore.getMembers(teamName)kanbanManager.getState(teamName)
Explicitly keep serial
configReader.getConfig(teamName)presenceIndexPromisesetup logic- all message merge/dedup/enrichment logic
resolveMembers(...)getMemberAdvisories(...)enrichMemberBranches(...)readProcesses(teamName)
Design Choice
Preferred approach: two-level parallel read phase
After config, split the top read phase into:
Heavy group
Heavier file-system work:
getTasksgetMessagesextractLeadSessionTexts
Run through a small limiter.
Recommended limit:
2
Rules:
- use a fixed limit in phase 1
- do not make it adaptive
- do not make it platform-specific in phase 1
Why:
getMessages()already fans out internallygetTasks()can read many task filesextractLeadSessionTexts()may scan directories and JSONL files- fixed
2is the most predictable low-risk choice before profiling
Light group
Lower-risk direct reads:
listInboxNamesreadMessages(sentMessages)getMembers(metaMembers)getState(kanban)
Run immediately in parallel.
Important nuance:
- “light” here means “safe to start immediately”, not “guaranteed tiny”
- if later profiling shows one of these is expensive, that becomes a follow-up patch
Why not one big Promise.all
- too easy to turn one step failure into whole-snapshot failure
- too easy to lose deterministic warnings
- too easy to create uncontrolled disk contention
- too easy to blur diagnostics and completion timing
Safer fallback if confidence drops during implementation
If implementation reveals unexpected complexity, the fallback patch is:
- parallelize only the light group
- keep heavy steps serial
This fallback has lower upside but even lower risk:
🎯 7 🛡️ 10 🧠 3
Rule:
- prefer shipping the light-group-only version over broadening the patch unsafely
Step Wrapper Design
Do not use scattered try/catch blocks inside concurrent branches.
Introduce a small local helper inside getTeamData() or a tiny private helper in TeamDataService.
Preferred contract:
interface StepResult<T> {
value: T;
warning?: string;
completedAt: number;
}
function startReadStep<T>(options: {
label: string;
createFallback: () => T;
warningText?: string;
load: () => Promise<T>;
}): Promise<StepResult<T>>
Required semantics:
- start the load immediately when invoked
- never reject outward for expected step failures
- convert both sync throws and async rejects into fallback + warning
- always record
completedAt - never mutate shared warning arrays internally
- keep fallback and warning text explicit at the callsite
- create fallback values only on failure
Additional rule:
warningTextshould remain a static literal at each callsite- do not compute warning text dynamically inside the helper
Why:
- this patch must preserve wording exactly
- dynamic warning construction would increase drift risk for no real benefit
Stronger rule:
- the wrapped
loadclosure should be a thin direct call to one reader/service method - do not put assembly logic inside the wrapper closure
Why:
- broad closures increase the chance that a programming bug gets silently turned into fallback + warning
- keeping closures thin confines masking behavior to the existing best-effort I/O boundary
Important constraint:
- keep this helper local to the hot path
- do not introduce a new shared generic abstraction in the same patch
Why:
- this is a surgical performance patch
- broad abstractions increase blast radius and review cost
Additional rule:
- the helper must not inspect or transform returned values
Additional rule:
- the helper must not mutate shared outer state such as
warningsormarks
Why:
- any normalization inside the helper would increase the chance of semantic drift
- the helper should only control error capture and timing metadata
Additional rule:
- fallback values must be created per step callsite
- prefer
createFallback()over passing a prebuilt object/array - do not reuse one module-level empty array or one shared default object between steps
Additional rule:
createFallback()must be synchronous, side-effect free, and trivial- if
createFallback()itself throws, treat that as a programming bug, not an expected degraded read failure
Why:
- arrays like
messages,tasks, andsentMessagesare later merged and sometimes mutated - shared fallback identity would make accidental aliasing harder to spot in tests
- lazy fallback creation makes it harder to accidentally share the same object across successes and failures
- allowing fallback factories to become “smart” would widen the masking boundary beyond the current design intent
Settlement rule
Because each step promise is designed to resolve with a StepResult<T> even on failure:
- prefer
Promise.all(stepPromises)overPromise.allSettled(stepPromises)
Why:
Promise.allSettledbecomes redundant once the wrapper already settles failures- a second settlement layer makes the control flow harder to reason about
- it increases the chance of accidentally swallowing a bug in the orchestration code itself
Additional rule:
- do not add per-step logging inside the helper in phase 1
Why:
- existing observability already relies on slow-log timing and final warnings
- extra step-local logs would change noise characteristics while the patch goal is latency reduction
Diagnostics And Marks
Current mark(...) timestamps feed the slow log in TeamDataService.ts#L804.
Do not lose that.
Rule
- each step still needs a mark timestamp
- mark time should represent actual completion time
- not “when we finally consumed the result later”
Practical approach
When a step settles, keep completedAt in the returned StepResult.
After Promise.all(...) completes, assign that timestamp into marks[label] from the main control flow.
Example:
marks.tasks = tasksStep.completedAtmarks.messages = messagesStep.completedAt
Stronger rule:
- do not mutate
marksfrom inside concurrent branches
Why:
- the values would still be numerically correct, but the write path becomes harder to audit
- keeping all diagnostics writes in one deterministic place lowers review risk
Additional rule:
- do not synthesize marks for unfinished or skipped steps
Why:
- fake timestamps would make slow logs harder to trust
- missing marks should remain visible as
-1via currentmsSince(...)behavior if something truly did not complete
Important nuance
This changes interpretation slightly:
- before: marks reflected serial stage completion
- after: marks reflect “time since method start when this async read completed”
That is acceptable, but it should be documented near the helper or marks assignment in code comments.
Warning Discipline
Warnings must remain deterministic.
Do not:
- call
warnings.push(...)inside async branches
Instead:
- collect each step result independently
- append warnings after the parallel phase settles
- append them in the same semantic order as today
Required final order:
Tasks failed to loadInboxes failed to loadMessages failed to loadLead session texts failed to loadSent messages failed to loadMember metadata failed to loadKanban state failed to loadMember runtime advisories failed to loadProcesses failed to load
Phase 1 only changes how steps 1-7 are loaded, not their order.
Exact Safe Execution Order
Phase A - bootstrap
- load
config - create
warnings - compute
changePresenceEnabled - compute
logSourceSnapshot - start
presenceIndexPromiseexactly as today
Important rule:
- do not wrap
presenceIndexPromiseinstartReadStep - do not convert a
presenceIndexPromiserejection into fallback behavior - do not move its await earlier than today
Phase B - parallel read phase
Start these reads:
- heavy, limiter
2:- tasks
- messages
- leadTexts
- light, direct:
- inboxNames
- sentMessages
- metaMembers
- kanbanState
Wait for all of them to settle.
Preferred implementation detail:
- create every step promise before awaiting any of them
- start light steps immediately
- start heavy steps immediately through the limiter
- only await results after all step promises exist
- use one final await point for the whole phase, not separate group awaits
Why:
- avoids accidental serial gaps
- makes the intended parallelism explicit
- prevents “light first, heavy later” drift from sneaking in during refactor
Additional rule:
- instantiate heavy-step promises in stable semantic order:
- tasks
- messages
- leadTexts
Why:
- this makes limiter behavior deterministic
- it simplifies reasoning in tests that verify bounded heavy-step concurrency
Additional rule:
- do not create step promises inside array literals passed directly into
Promise.all
Why:
- it hides creation order
- it makes breakpoint/debugging worse
- it increases the chance of accidental inline logic expansion
Additional rule:
- queued heavy steps must not execute any reader work before a limiter slot is granted
Why:
- otherwise the code may look bounded while still front-loading real filesystem work
- this is an easy place to accidentally defeat the limiter with an eager closure
Preferred composition:
- queue a thunk that starts the wrapped step only after the limiter grants a slot
- in other words, prefer
runHeavy(() => startReadStep(...))over a shape that startsstartReadStep(...)first and only limits an inner sub-call
Why:
- this makes “not started yet” observable in tests
- it avoids subtle eagerness around fallback creation or timing capture
- it keeps the limiter boundary explicit in code review
Phase C - deterministic assembly
Using the resolved values:
- append warnings in fixed order
- initialize
messagesfrom the resolved inbox messages - append
leadTexts - append
sentMessages - run existing dedup logic unchanged
- run existing session-id enrichment unchanged
- run existing slash annotation unchanged
- build
tasksWithKanbanBase - await
presenceIndexPromise - apply presence overlay
- call
resolveMembers(...)
Important rule:
- do not mutate the
messagesarray during the parallel read phase
Important rule:
- preserve the current
presenceIndexPromiseawait point - it must still happen after
tasksWithKanbanBaseexists and before presence overlay is applied
Additional rule:
- keep local variable names close to the current implementation where practical
Why:
- smaller diffs reduce review risk in a fragile hot path
- easier diff-reading improves confidence that only await ordering changed
Phase D - unchanged lower phase
Keep this serial in phase 1:
getMemberAdvisories(...)enrichMemberBranches(...)readProcesses(teamName)
Important rule:
runtimeAdvisorieswarning must still precedeprocessesprocesseswarning must still be last
Additional rule:
- if phase 1 reveals that
readProcesses()is also a measurable bottleneck, do not “just parallelize it” - handle it in a separate plan because it is a reconcile-like path
Implementation Notes
1. Use a tiny local limiter, not a new dependency
Do not add a library.
Either:
- reuse a tiny local
mapLimitstyle pattern - or add a minimal local limiter helper inside
TeamDataService
The helper should be boring and obvious.
2. Preserve fallback defaults exactly
Defaults today:
- tasks:
[] - inboxNames:
[] - messages:
[] - leadTexts:
[] - sentMessages:
[] - metaMembers:
[] - kanbanState: default empty state
- processes:
[]
Do not change any of these defaults.
Stronger rule:
- instantiate array fallbacks as fresh literals at each step callsite
- instantiate the empty
kanbanStatefallback as a fresh object literal at its callsite
Why:
- it prevents accidental aliasing if later code mutates a returned fallback object
- it keeps tests honest when assertions depend on reference isolation
3. Keep presenceIndexPromise untouched
It already starts early and is gated correctly.
Changing it would add correctness risk with little payoff.
Stronger rule:
- preserve both success and failure semantics exactly
- if
presenceIndexPromiserejects today, the method should still reject after the same downstream point
Why:
- silently degrading change-presence data would be a correctness change, not just a perf change
4. Do not mix adjacent optimizations into this patch
Specifically do not combine:
- inbox-name reuse between
listInboxNames()andgetMessages() - task reader worker-path changes
- lead-session dir listing cache
Why:
- mixed patches are harder to reason about
- perf attribution gets worse
- rollback gets harder
5. Prefer the lowest-risk first wrapper conversion during implementation
If doing the implementation incrementally, the first serial wrapper conversion should be one of:
membersMetaStore.getMembers(teamName)sentMessagesStore.readMessages(teamName)
Avoid using getMessages() or extractLeadSessionTexts(config) as the first wrapper-conversion step.
Why:
- they are semantically denser and more file-heavy
- starting with a lighter reader makes it easier to verify the wrapper boundary before concurrency is introduced
6. Do not add committed benchmark scaffolding in this patch
If extra timing is needed during implementation:
- use temporary local instrumentation or existing slow-log output
- remove any temporary timing helpers before committing
Why:
- the patch goal is a surgical hot-path change, not a new profiling framework
- leaving ad hoc benchmark scaffolding behind would increase maintenance noise
Edge Cases
1. messages fail but leadTexts succeed
Expected:
- final
messagesstill containsleadTextsandsentMessages - warning includes
Messages failed to load
2. leadTexts fail but inbox messages succeed
Expected:
- final
messagesremains valid without lead session history - warning includes
Lead session texts failed to load
3. kanbanState fails
Expected:
- tasks still load
- empty kanban fallback is used
- warning order remains stable
- renderer still sees safe fallback state
4. metaMembers fail
Expected:
- resolver still works from config + inbox names + messages
- warning includes
Member metadata failed to load
5. One heavy read is very slow
Expected:
- light reads are not blocked from starting
- heavy reads are bounded and do not all pile up at once
6. Duplicate inbox-dir reads happen concurrently
Expected:
- this is acceptable in phase 1
- no semantic change occurs
- this should not be “optimized away” inside the same patch
7. processes marks a dead runtime and writes processes.json
Expected:
- this still happens
- it still happens outside the parallel read phase
8. A step throws synchronously before returning a promise
Expected:
- the step wrapper converts it into fallback + warning
getTeamData()still returns a partial snapshot
9. A heavy step finishes before a light step
Expected:
- marks reflect actual completion times
- warnings remain in semantic order
- assembly order remains unchanged
10. A bug occurs in orchestration code outside a step wrapper
Expected:
- it should still surface as a normal test/runtime failure
- it must not be hidden behind generic fallback behavior
Why this matters:
- the wrapper is only meant to preserve current best-effort reader semantics
- it is not meant to hide mistakes in assembly code
11. presenceIndexPromise rejects
Expected:
getTeamData()still rejects- no new warning is added
- the reject point remains after
tasksWithKanbanBaseconstruction, as today
Why this matters:
- changing this would alter correctness semantics for task change presence
- that is outside the scope of a safe latency patch
12. Heavy-step limiter accidentally serializes everything
Expected:
- light steps still start immediately
- at least two heavy steps may be in flight before the first one resolves
Why this matters:
- a broken limiter implementation can silently collapse back to serial behavior while tests still pass
13. A heavy step is queued behind the limiter and throws synchronously when finally started
Expected:
- the wrapper still converts it into fallback + warning
- the limiter slot is released
- downstream heavy steps can still run
Why this matters:
- queued execution introduces a second place where sync throws can happen
- the plan should be explicit that the limiter cannot leak capacity on failure
14. A queued heavy step has not started yet
Expected:
- its underlying reader has not been called
- it has not produced a warning
- it has not stamped
completedAt
Why this matters:
- otherwise the limiter may only be cosmetically bounding concurrency
- timing diagnostics would become misleading if queue time and completion time are confused
15. createFallback() is evaluated
Expected:
- it is evaluated only after the corresponding load fails
- it does not perform I/O or mutate shared state
- if it throws, the test should fail rather than being silently degraded
Why this matters:
- fallback factories are part of the masking boundary
- making them “smart” would quietly widen the behavior of the patch
Testing Plan
Add focused tests in TeamDataService.test.ts.
Preferred test technique
Prefer explicit deferred promises over fake timers for read-phase ordering tests.
Why:
- fake timers tend to couple tests to implementation details like
setTimeout - deferred promises make start-order and release-order assertions clearer
- the existing test file already uses direct promise orchestration patterns successfully
Required tests
-
Parallel read phase does not abort full
getTeamData()when one step fails. -
Warning order remains the same as before even when failures resolve out of order.
-
messagesmerge order remains unchanged:- inbox
- plus leadTexts
- plus sentMessages
- then current dedup/enrichment behavior
-
resolveMembers()is still called with the same effective inputs. -
readProcesses()still runs after the top read phase and still populatesprocesses. -
kanbanfallback remains safe when kanban read fails. -
leadTextssuccess still contributes messages even if inbox read fails. -
slow diagnostics do not crash when marks are produced from async completion times.
-
a step that throws synchronously still degrades to fallback + warning.
-
processesremains outside the parallel read phase. -
orchestration still fails normally if an error is thrown after step resolution and outside the wrapper boundary.
-
light steps start even while heavy steps are still blocked.
-
heavy-step limiter still allows bounded overlap and does not collapse to fully serial execution.
-
presenceIndexPromiserejection semantics remain unchanged. -
no concurrent branch writes directly into
warningsormarks. -
the existing read-only invariant for
getTeamData()remains intact and kanban GC is still not invoked from this path. -
the third heavy step does not call its underlying reader before one of the first two heavy steps settles.
-
a queued heavy step does not stamp
completedAtbefore it actually settles. -
missing-team fast-fail still happens before any read-phase step starts.
-
fallback creation happens only on failure and does not itself become a hidden degraded path.
Nice-to-have tests
- prove heavy-group limiter behavior with deferred promises
- prove warnings do not depend on completion order
- prove light steps can finish before heavy ones without changing assembly order
- prove heavy-step start order remains deterministic under the limiter
- prove queued heavy-step failure releases limiter capacity for the next heavy step
- prove fallback arrays/objects are not shared across unrelated step results
- prove missing-team fast-fail still happens before any read-phase step starts
- prove fallback creation is lazy and does not run on successful steps
Testing note
Current tests exercise many getTeamData() semantics, but they do not currently prove:
- warning order
- read-phase race behavior
- serial
processes
These tests must be added explicitly in this patch.
Additional testing rule:
- at least one new test should assert start order, not only final output
Why:
- output-only assertions can miss accidental serialization or accidental over-parallelization
Rollout Strategy
Phase 1
Implement only:
- local step wrapper
- small local heavy-step limiter
- top-level parallel read phase
- deterministic warning collection
- explicit tests for warning order and serial
processes - explicit tests for bounded heavy overlap and immediate light-step start
- explicit test for unchanged
presenceIndexPromiserejection behavior
Recommended implementation sequence
- Add the local
StepResult<T>shape andstartReadStep(...)helper without changing behavior yet. - Move one low-risk light read to the helper pattern and keep it serial to verify the wrapper boundary stays thin.
- Introduce the local heavy limiter with a tiny focused test before wiring all heavy steps through it.
- Convert the full top read phase into:
- light steps started directly
- heavy steps started via the limiter
- Keep warning collection and mark assignment outside the wrappers and outside concurrent branches.
- Re-run the lower assembly with local variable names as close to the current code as practical.
- Add the ordering and failure-mode tests last, after the implementation shape stabilizes.
Why this order:
- it keeps each transition auditable
- it makes it easier to catch accidental semantic drift before the whole method is reshaped
Additional rule:
- do not combine the helper-contract change and the full concurrency rewrite in one blind edit
Why:
- separating those transitions makes review of semantic drift much easier
Phase 2 only if needed later
- optional
getLeadSessionJsonlPaths()cache - optional inbox-name reuse between
listInboxNames()andgetMessages() - optional broader lower-phase parallelization
Success Criteria
The patch is successful if all are true:
TeamDatapayload shape is unchanged- existing tests pass
- new warning-order and race tests pass
typecheckpasses- targeted
getTeamDatalatency improves on repeated refreshes - no warning-order drift
- no changes to dedup or renderer-visible semantics
Areas With Lower Confidence
1. Best heavy-step limiter value
Confidence: medium.
Why:
2is the safest educated default- it is not backed by real profiling data yet
Decision:
- use
2in phase 1 - tune only after measurement
2. Real contention impact of concurrent getMessages() plus extractLeadSessionTexts()
Confidence: medium.
Why:
- both touch team filesystem data
- one already does internal fan-out
Decision:
- keep the heavy group bounded
- do not add more heavy steps to it in phase 1
3. Diagnostic interpretation shift
Confidence: medium-high.
Why:
- marks will no longer describe a strictly serial stage pipeline
- but they remain operationally useful
Decision:
- preserve labels
- store actual completion times
- document the interpretation shift
4. Perf upside size
Confidence: medium-high.
Why:
- the current serial top read phase is clearly a bottleneck candidate
- exact gains still depend on filesystem behavior and team size
Decision:
- keep the patch narrow
- verify with targeted timings before any broader refactor
5. Exact wrapper boundary
Confidence: medium.
Why:
- too broad a wrapper masks bugs
- too narrow a wrapper creates repetitive code and inconsistent fallback handling
Decision:
- wrap only the direct reader/service call
- keep all merge/assembly logic outside the wrapper
6. presenceIndexPromise semantics
Confidence: medium.
Why:
- it starts early but is awaited later
- it currently rejects the whole method instead of degrading to warning
- it is easy to accidentally “improve” this while introducing wrappers
Decision:
- preserve current reject semantics exactly
- add a regression test explicitly for this
7. Test strength for concurrency behavior
Confidence: medium.
Why:
- concurrency tests often pass while failing to prove the intended scheduling shape
- the current suite has very limited ordering-specific coverage
Decision:
- use explicit deferred promises
- assert both start order and completion-order independence
8. Limiter implementation correctness
Confidence: medium.
Why:
- small limiters are easy to get almost right while still leaking capacity or starting work eagerly
- that kind of bug can preserve output correctness while eliminating the perf gain
Decision:
- keep the limiter tiny and local
- verify queued-failure release behavior in tests
- verify that the third heavy step does not start before one of the first two settles
9. Wrapper and limiter composition
Confidence: medium.
Why:
- the code can look “limited” while still eagerly starting reader work
- this is one of the easiest places to lose the intended perf behavior without breaking output tests
Decision:
- prefer queuing a thunk that starts
startReadStep(...)only after slot acquisition - add a test that asserts the third heavy reader has not been called while both slots are occupied
10. Fallback-factory discipline
Confidence: medium.
Why:
createFallback()looks harmless, but it is part of the same masking boundary as the wrapper- if it grows side effects or branching logic, the patch stops being a narrow perf change
Decision:
- keep fallback factories as trivial literals/object constructors
- keep warning text static at the callsite
- let thrown fallback-factory errors surface as programming bugs
Residual Risk
Even after this patch, the architecture remains sensitive because:
team:getDatastill assembles a large monolithic snapshotmessages[]is still a hot payload- first-open cost still exists
processesremains a reconcile-like read/write path
This patch should therefore be treated as:
- a safe latency reduction
- not a final architectural solution