admin-valentin/agent-ecosystem

Fork 0

777genius 01e9e8350e feat(frontend): improve runtime connection and team setup ux

2026-04-11 19:55:25 +03:00

35 KiB

Raw Blame History

getTeamData Parallel Read Plan

Goal

Reduce team:getData latency by parallelizing only the safe top-level read phase of getTeamData() without changing:

TeamData payload shape
message merge semantics
dedup semantics
warning semantics
renderer/main truth boundaries
launch/process reconciliation behavior

Primary target:

repeated visible team detail refreshes

Not a target for this patch:

new IPC endpoints
renderer refresh changes
new caches
message payload slimming
lower-pipeline refactors

Why This Requires Extra Care

This touches a guarded hot zone:

main process / IPC / heavy team payload assembly

Relevant guardrails:

preserve state truth and reconciliation
avoid heavier monolithic snapshot paths
preserve observability
prefer the smallest invariant-preserving change

Main failure modes to avoid:

one async failure aborts the whole snapshot
warning order becomes nondeterministic
message merge order changes
completion-time diagnostics become misleading
side-effecting reads run concurrently with unrelated file-heavy reads
naive parallelism creates enough disk contention to erase the perf gain
the patch quietly assumes tests cover behavior they currently do not cover
step wrappers silently mask wiring bugs because the wrapped closure does too much work
nested settlement layers make the control flow harder to audit
presenceIndexPromise rejection semantics change by accident
shared mutable fallback objects leak across steps or tests
diagnostics state is mutated from concurrent branches and becomes harder to audit
queued heavy work starts reader I/O before a limiter slot is actually granted
a queued heavy step records timing metadata at queue time instead of settle time
fallback factories become mini control-flow hooks and start masking programming bugs
warning text generation drifts from today because it becomes dynamic instead of literal

Relevant code:

Hard Non-Goals

This patch must not:

parallelize readProcesses()
parallelize resolveMembers()
parallelize getMemberAdvisories(...)
parallelize enrichMemberBranches(...)
change the post-load message pipeline
change warning strings
change warning order
change any renderer store logic
change any InboxMessage or TeamData semantics
move concurrency into TeamTaskReader or TeamInboxReader

This patch must also not opportunistically fix adjacent issues such as:

reusing listInboxNames() inside getMessages()
introducing a single-team worker path for tasks
caching lead-session directory listings

Those are separate patches.

Additional rule:

keep all new concurrency orchestration local to getTeamData()

Why:

moving concurrency into lower-level readers would widen the blast radius
it would make it harder to reason about total filesystem pressure from this patch

Dependency Graph

The patch must preserve this dependency graph exactly.

Strict prerequisites

config must complete before:
- extractLeadSessionTexts(config)
- any downstream assembly

Additional rule:

if config is missing, getTeamData() must still throw before any read-phase step promises are created

Why:

this preserves the current fast-fail behavior
it avoids accidental background work for nonexistent teams

Safe early-start paths

After config:

presenceIndexPromise
tasks
inboxNames
messages
leadTexts
sentMessages
metaMembers
kanbanState

Assembly prerequisites

resolveMembers(...) requires all of:

config
metaMembers
inboxNames
tasksWithKanban
final assembled messages

tasksWithKanban requires:

tasks
kanbanState
presenceIndexPromise

presenceIndexPromise itself must preserve current semantics:

it starts early
it is awaited only after tasksWithKanbanBase
if it rejects today, getTeamData() rejects

Rule:

do not wrap, settle, catch, delay, or parallelize away that rejection behavior in this patch

Explicit downstream-only paths

These must stay after assembly:

getMemberAdvisories(...)
enrichMemberBranches(...)
readProcesses(teamName)

Why this section exists:

the patch is only safe if dependency edges remain explicit
parallelization work tends to hide dependency mistakes behind “it still compiled”

Key Observations

1. `processes` is not a pure read

readProcesses() looks harmless, but controller.processes.listProcesses() can mark dead PIDs and synchronously rewrite processes.json.

Relevant code:

Rule:

keep processes outside the parallel read phase

2. Some top-level readers already create their own fan-out

TeamInboxReader.getMessages() already performs bounded concurrent inbox reads.

Relevant code:

Implication:

do not start every heavyweight reader at once
control top-level heavy-step concurrency explicitly

3. `listInboxNames()` is duplicated work today

getTeamData() calls listInboxNames() directly, and getMessages() calls it again internally.

Relevant code:

Implication:

phase 1 will still duplicate one inbox-directory listing
that is acceptable for this patch
do not widen the patch by coupling these readers

4. Warning order is currently deterministic

Current semantic order:

tasks
inbox names
messages
lead session texts
sent messages
member metadata
kanban state
runtime advisories
processes

Rule:

preserve this order even if reads complete out of order

Additional rule:

preserve warning presence and wording exactly
do not deduplicate warnings in this patch

Additional note:

presenceIndexPromise is intentionally not part of this warning sequence
phase 1 must not introduce a new warning for presence loading

5. The lower pipeline is semantically sensitive

After the read phase, getTeamData() performs:

message merge
lead-session/live dedup
session-id enrichment
slash-result annotation
kanban overlay
member resolution

Rule:

do not reorder, parallelize, or “clean up” this lower pipeline in phase 1

5a. `kanbanGc` is part of current diagnostics shape even though no GC runs here

getTeamData() currently records mark('kanbanGc') immediately after kanban load without performing garbage collection.

Relevant code:

src/main/services/team/TeamDataService.ts#L740

Rule:

keep this mark where it is in phase 1
do not “clean it up” while touching perf code

Why:

changing it mixes perf work with observability semantics
it would make before/after slow-log comparison noisier

6. Existing tests do not cover the riskiest new behavior

Current coverage exercises many data semantics, but not:

warning order under concurrent failures
read-phase race behavior
proof that processes stayed outside the parallel phase
sync-throw fallback behavior for step wrappers

Rule:

add those tests explicitly in this patch

Additional note:

an existing test already asserts that getTeamData() remains read-only and does not invoke kanban garbage collection
that invariant must stay green throughout this patch, not be replaced or weakened

Safe Scope

Safe to parallelize

After config loads successfully:

taskReader.getTasks(teamName)
inboxReader.listInboxNames(teamName)
inboxReader.getMessages(teamName)
extractLeadSessionTexts(config)
sentMessagesStore.readMessages(teamName)
membersMetaStore.getMembers(teamName)
kanbanManager.getState(teamName)

Explicitly keep serial

configReader.getConfig(teamName)
presenceIndexPromise setup logic
all message merge/dedup/enrichment logic
resolveMembers(...)
getMemberAdvisories(...)
enrichMemberBranches(...)
readProcesses(teamName)

Design Choice

Preferred approach: two-level parallel read phase

After config, split the top read phase into:

Heavy group

Heavier file-system work:

getTasks
getMessages
extractLeadSessionTexts

Run through a small limiter.

Recommended limit:

Rules:

use a fixed limit in phase 1
do not make it adaptive
do not make it platform-specific in phase 1

Why:

getMessages() already fans out internally
getTasks() can read many task files
extractLeadSessionTexts() may scan directories and JSONL files
fixed 2 is the most predictable low-risk choice before profiling

Light group

Lower-risk direct reads:

listInboxNames
readMessages(sentMessages)
getMembers(metaMembers)
getState(kanban)

Run immediately in parallel.

Important nuance:

“light” here means “safe to start immediately”, not “guaranteed tiny”
if later profiling shows one of these is expensive, that becomes a follow-up patch

Why not one big `Promise.all`

too easy to turn one step failure into whole-snapshot failure
too easy to lose deterministic warnings
too easy to create uncontrolled disk contention
too easy to blur diagnostics and completion timing

Safer fallback if confidence drops during implementation

If implementation reveals unexpected complexity, the fallback patch is:

parallelize only the light group
keep heavy steps serial

This fallback has lower upside but even lower risk:

🎯 7 🛡️ 10 🧠 3

Rule:

prefer shipping the light-group-only version over broadening the patch unsafely

Step Wrapper Design

Do not use scattered try/catch blocks inside concurrent branches.

Introduce a small local helper inside getTeamData() or a tiny private helper in TeamDataService.

Preferred contract:

interface StepResult<T> {
  value: T;
  warning?: string;
  completedAt: number;
}

function startReadStep<T>(options: {
  label: string;
  createFallback: () => T;
  warningText?: string;
  load: () => Promise<T>;
}): Promise<StepResult<T>>

Required semantics:

start the load immediately when invoked
never reject outward for expected step failures
convert both sync throws and async rejects into fallback + warning
always record completedAt
never mutate shared warning arrays internally
keep fallback and warning text explicit at the callsite
create fallback values only on failure

Additional rule:

warningText should remain a static literal at each callsite
do not compute warning text dynamically inside the helper

Why:

this patch must preserve wording exactly
dynamic warning construction would increase drift risk for no real benefit

Stronger rule:

the wrapped load closure should be a thin direct call to one reader/service method
do not put assembly logic inside the wrapper closure

Why:

broad closures increase the chance that a programming bug gets silently turned into fallback + warning
keeping closures thin confines masking behavior to the existing best-effort I/O boundary

Important constraint:

keep this helper local to the hot path
do not introduce a new shared generic abstraction in the same patch

Why:

this is a surgical performance patch
broad abstractions increase blast radius and review cost

Additional rule:

the helper must not inspect or transform returned values

Additional rule:

the helper must not mutate shared outer state such as warnings or marks

Why:

any normalization inside the helper would increase the chance of semantic drift
the helper should only control error capture and timing metadata

Additional rule:

fallback values must be created per step callsite
prefer createFallback() over passing a prebuilt object/array
do not reuse one module-level empty array or one shared default object between steps

Additional rule:

createFallback() must be synchronous, side-effect free, and trivial
if createFallback() itself throws, treat that as a programming bug, not an expected degraded read failure

Why:

arrays like messages, tasks, and sentMessages are later merged and sometimes mutated
shared fallback identity would make accidental aliasing harder to spot in tests
lazy fallback creation makes it harder to accidentally share the same object across successes and failures
allowing fallback factories to become “smart” would widen the masking boundary beyond the current design intent

Settlement rule

Because each step promise is designed to resolve with a StepResult<T> even on failure:

prefer Promise.all(stepPromises) over Promise.allSettled(stepPromises)

Why:

Promise.allSettled becomes redundant once the wrapper already settles failures
a second settlement layer makes the control flow harder to reason about
it increases the chance of accidentally swallowing a bug in the orchestration code itself

Additional rule:

do not add per-step logging inside the helper in phase 1

Why:

existing observability already relies on slow-log timing and final warnings
extra step-local logs would change noise characteristics while the patch goal is latency reduction

Diagnostics And Marks

Current mark(...) timestamps feed the slow log in TeamDataService.ts#L804.

Do not lose that.

Rule

each step still needs a mark timestamp
mark time should represent actual completion time
not “when we finally consumed the result later”

Practical approach

When a step settles, keep completedAt in the returned StepResult.

After Promise.all(...) completes, assign that timestamp into marks[label] from the main control flow.

Example:

marks.tasks = tasksStep.completedAt
marks.messages = messagesStep.completedAt

Stronger rule:

do not mutate marks from inside concurrent branches

Why:

the values would still be numerically correct, but the write path becomes harder to audit
keeping all diagnostics writes in one deterministic place lowers review risk

Additional rule:

do not synthesize marks for unfinished or skipped steps

Why:

fake timestamps would make slow logs harder to trust
missing marks should remain visible as -1 via current msSince(...) behavior if something truly did not complete

Important nuance

This changes interpretation slightly:

before: marks reflected serial stage completion
after: marks reflect “time since method start when this async read completed”

That is acceptable, but it should be documented near the helper or marks assignment in code comments.

Warning Discipline

Warnings must remain deterministic.

Do not:

call warnings.push(...) inside async branches

Instead:

collect each step result independently
append warnings after the parallel phase settles
append them in the same semantic order as today

Required final order:

Tasks failed to load
Inboxes failed to load
Messages failed to load
Lead session texts failed to load
Sent messages failed to load
Member metadata failed to load
Kanban state failed to load
Member runtime advisories failed to load
Processes failed to load

Phase 1 only changes how steps 1-7 are loaded, not their order.

Exact Safe Execution Order

Phase A - bootstrap

load config
create warnings
compute changePresenceEnabled
compute logSourceSnapshot
start presenceIndexPromise exactly as today

Important rule:

do not wrap presenceIndexPromise in startReadStep
do not convert a presenceIndexPromise rejection into fallback behavior
do not move its await earlier than today

Phase B - parallel read phase

Start these reads:

heavy, limiter 2:
- tasks
- messages
- leadTexts
light, direct:
- inboxNames
- sentMessages
- metaMembers
- kanbanState

Wait for all of them to settle.

Preferred implementation detail:

create every step promise before awaiting any of them
start light steps immediately
start heavy steps immediately through the limiter
only await results after all step promises exist
use one final await point for the whole phase, not separate group awaits

Why:

avoids accidental serial gaps
makes the intended parallelism explicit
prevents “light first, heavy later” drift from sneaking in during refactor

Additional rule:

instantiate heavy-step promises in stable semantic order:
- tasks
- messages
- leadTexts

Why:

this makes limiter behavior deterministic
it simplifies reasoning in tests that verify bounded heavy-step concurrency

Additional rule:

do not create step promises inside array literals passed directly into Promise.all

Why:

it hides creation order
it makes breakpoint/debugging worse
it increases the chance of accidental inline logic expansion

Additional rule:

queued heavy steps must not execute any reader work before a limiter slot is granted

Why:

otherwise the code may look bounded while still front-loading real filesystem work
this is an easy place to accidentally defeat the limiter with an eager closure

Preferred composition:

queue a thunk that starts the wrapped step only after the limiter grants a slot
in other words, prefer runHeavy(() => startReadStep(...)) over a shape that starts startReadStep(...) first and only limits an inner sub-call

Why:

this makes “not started yet” observable in tests
it avoids subtle eagerness around fallback creation or timing capture
it keeps the limiter boundary explicit in code review

Phase C - deterministic assembly

Using the resolved values:

append warnings in fixed order
initialize messages from the resolved inbox messages
append leadTexts
append sentMessages
run existing dedup logic unchanged
run existing session-id enrichment unchanged
run existing slash annotation unchanged
build tasksWithKanbanBase
await presenceIndexPromise
apply presence overlay
call resolveMembers(...)

Important rule:

do not mutate the messages array during the parallel read phase

Important rule:

preserve the current presenceIndexPromise await point
it must still happen after tasksWithKanbanBase exists and before presence overlay is applied

Additional rule:

keep local variable names close to the current implementation where practical

Why:

smaller diffs reduce review risk in a fragile hot path
easier diff-reading improves confidence that only await ordering changed

Phase D - unchanged lower phase

Keep this serial in phase 1:

getMemberAdvisories(...)
enrichMemberBranches(...)
readProcesses(teamName)

Important rule:

runtimeAdvisories warning must still precede processes
processes warning must still be last

Additional rule:

if phase 1 reveals that readProcesses() is also a measurable bottleneck, do not “just parallelize it”
handle it in a separate plan because it is a reconcile-like path

Implementation Notes

1. Use a tiny local limiter, not a new dependency

Do not add a library.

Either:

reuse a tiny local mapLimit style pattern
or add a minimal local limiter helper inside TeamDataService

The helper should be boring and obvious.

2. Preserve fallback defaults exactly

Defaults today:

tasks: []
inboxNames: []
messages: []
leadTexts: []
sentMessages: []
metaMembers: []
kanbanState: default empty state
processes: []

Do not change any of these defaults.

Stronger rule:

instantiate array fallbacks as fresh literals at each step callsite
instantiate the empty kanbanState fallback as a fresh object literal at its callsite

Why:

it prevents accidental aliasing if later code mutates a returned fallback object
it keeps tests honest when assertions depend on reference isolation

3. Keep `presenceIndexPromise` untouched

It already starts early and is gated correctly.

Changing it would add correctness risk with little payoff.

Stronger rule:

preserve both success and failure semantics exactly
if presenceIndexPromise rejects today, the method should still reject after the same downstream point

Why:

silently degrading change-presence data would be a correctness change, not just a perf change

4. Do not mix adjacent optimizations into this patch

Specifically do not combine:

inbox-name reuse between listInboxNames() and getMessages()
task reader worker-path changes
lead-session dir listing cache

Why:

mixed patches are harder to reason about
perf attribution gets worse
rollback gets harder

5. Prefer the lowest-risk first wrapper conversion during implementation

If doing the implementation incrementally, the first serial wrapper conversion should be one of:

membersMetaStore.getMembers(teamName)
sentMessagesStore.readMessages(teamName)

Avoid using getMessages() or extractLeadSessionTexts(config) as the first wrapper-conversion step.

Why:

they are semantically denser and more file-heavy
starting with a lighter reader makes it easier to verify the wrapper boundary before concurrency is introduced

6. Do not add committed benchmark scaffolding in this patch

If extra timing is needed during implementation:

use temporary local instrumentation or existing slow-log output
remove any temporary timing helpers before committing

Why:

the patch goal is a surgical hot-path change, not a new profiling framework
leaving ad hoc benchmark scaffolding behind would increase maintenance noise

Edge Cases

1. `messages` fail but `leadTexts` succeed

Expected:

final messages still contains leadTexts and sentMessages
warning includes Messages failed to load

2. `leadTexts` fail but inbox messages succeed

Expected:

final messages remains valid without lead session history
warning includes Lead session texts failed to load

3. `kanbanState` fails

Expected:

tasks still load
empty kanban fallback is used
warning order remains stable
renderer still sees safe fallback state

4. `metaMembers` fail

Expected:

resolver still works from config + inbox names + messages
warning includes Member metadata failed to load

5. One heavy read is very slow

Expected:

light reads are not blocked from starting
heavy reads are bounded and do not all pile up at once

6. Duplicate inbox-dir reads happen concurrently

Expected:

this is acceptable in phase 1
no semantic change occurs
this should not be “optimized away” inside the same patch

7. `processes` marks a dead runtime and writes `processes.json`

Expected:

this still happens
it still happens outside the parallel read phase

8. A step throws synchronously before returning a promise

Expected:

the step wrapper converts it into fallback + warning
getTeamData() still returns a partial snapshot

9. A heavy step finishes before a light step

Expected:

marks reflect actual completion times
warnings remain in semantic order
assembly order remains unchanged

10. A bug occurs in orchestration code outside a step wrapper

Expected:

it should still surface as a normal test/runtime failure
it must not be hidden behind generic fallback behavior

Why this matters:

the wrapper is only meant to preserve current best-effort reader semantics
it is not meant to hide mistakes in assembly code

11. `presenceIndexPromise` rejects

Expected:

getTeamData() still rejects
no new warning is added
the reject point remains after tasksWithKanbanBase construction, as today

Why this matters:

changing this would alter correctness semantics for task change presence
that is outside the scope of a safe latency patch

12. Heavy-step limiter accidentally serializes everything

Expected:

light steps still start immediately
at least two heavy steps may be in flight before the first one resolves

Why this matters:

a broken limiter implementation can silently collapse back to serial behavior while tests still pass

13. A heavy step is queued behind the limiter and throws synchronously when finally started

Expected:

the wrapper still converts it into fallback + warning
the limiter slot is released
downstream heavy steps can still run

Why this matters:

queued execution introduces a second place where sync throws can happen
the plan should be explicit that the limiter cannot leak capacity on failure

14. A queued heavy step has not started yet

Expected:

its underlying reader has not been called
it has not produced a warning
it has not stamped completedAt

Why this matters:

otherwise the limiter may only be cosmetically bounding concurrency
timing diagnostics would become misleading if queue time and completion time are confused

15. `createFallback()` is evaluated

Expected:

it is evaluated only after the corresponding load fails
it does not perform I/O or mutate shared state
if it throws, the test should fail rather than being silently degraded

Why this matters:

fallback factories are part of the masking boundary
making them “smart” would quietly widen the behavior of the patch

Testing Plan

Add focused tests in TeamDataService.test.ts.

Preferred test technique

Prefer explicit deferred promises over fake timers for read-phase ordering tests.

Why:

fake timers tend to couple tests to implementation details like setTimeout
deferred promises make start-order and release-order assertions clearer
the existing test file already uses direct promise orchestration patterns successfully

Required tests

Parallel read phase does not abort full getTeamData() when one step fails.
Warning order remains the same as before even when failures resolve out of order.
messages merge order remains unchanged:
- inbox
- plus leadTexts
- plus sentMessages
- then current dedup/enrichment behavior
resolveMembers() is still called with the same effective inputs.
readProcesses() still runs after the top read phase and still populates processes.
kanban fallback remains safe when kanban read fails.
leadTexts success still contributes messages even if inbox read fails.
slow diagnostics do not crash when marks are produced from async completion times.
a step that throws synchronously still degrades to fallback + warning.
processes remains outside the parallel read phase.
orchestration still fails normally if an error is thrown after step resolution and outside the wrapper boundary.
light steps start even while heavy steps are still blocked.
heavy-step limiter still allows bounded overlap and does not collapse to fully serial execution.
presenceIndexPromise rejection semantics remain unchanged.
no concurrent branch writes directly into warnings or marks.
the existing read-only invariant for getTeamData() remains intact and kanban GC is still not invoked from this path.
the third heavy step does not call its underlying reader before one of the first two heavy steps settles.
a queued heavy step does not stamp completedAt before it actually settles.
missing-team fast-fail still happens before any read-phase step starts.
fallback creation happens only on failure and does not itself become a hidden degraded path.

Nice-to-have tests

prove heavy-group limiter behavior with deferred promises
prove warnings do not depend on completion order
prove light steps can finish before heavy ones without changing assembly order
prove heavy-step start order remains deterministic under the limiter
prove queued heavy-step failure releases limiter capacity for the next heavy step
prove fallback arrays/objects are not shared across unrelated step results
prove missing-team fast-fail still happens before any read-phase step starts
prove fallback creation is lazy and does not run on successful steps

Testing note

Current tests exercise many getTeamData() semantics, but they do not currently prove:

warning order
read-phase race behavior
serial processes

These tests must be added explicitly in this patch.

Additional testing rule:

at least one new test should assert start order, not only final output

Why:

output-only assertions can miss accidental serialization or accidental over-parallelization

Rollout Strategy

Phase 1

Implement only:

local step wrapper
small local heavy-step limiter
top-level parallel read phase
deterministic warning collection
explicit tests for warning order and serial processes
explicit tests for bounded heavy overlap and immediate light-step start
explicit test for unchanged presenceIndexPromise rejection behavior

Recommended implementation sequence

Add the local StepResult<T> shape and startReadStep(...) helper without changing behavior yet.
Move one low-risk light read to the helper pattern and keep it serial to verify the wrapper boundary stays thin.
Introduce the local heavy limiter with a tiny focused test before wiring all heavy steps through it.
Convert the full top read phase into:
- light steps started directly
- heavy steps started via the limiter
Keep warning collection and mark assignment outside the wrappers and outside concurrent branches.
Re-run the lower assembly with local variable names as close to the current code as practical.
Add the ordering and failure-mode tests last, after the implementation shape stabilizes.

Why this order:

it keeps each transition auditable
it makes it easier to catch accidental semantic drift before the whole method is reshaped

Additional rule:

do not combine the helper-contract change and the full concurrency rewrite in one blind edit

Why:

separating those transitions makes review of semantic drift much easier

Phase 2 only if needed later

optional getLeadSessionJsonlPaths() cache
optional inbox-name reuse between listInboxNames() and getMessages()
optional broader lower-phase parallelization

Success Criteria

The patch is successful if all are true:

TeamData payload shape is unchanged
existing tests pass
new warning-order and race tests pass
typecheck passes
targeted getTeamData latency improves on repeated refreshes
no warning-order drift
no changes to dedup or renderer-visible semantics

Areas With Lower Confidence

1. Best heavy-step limiter value

Confidence: medium.

Why:

2 is the safest educated default
it is not backed by real profiling data yet

Decision:

use 2 in phase 1
tune only after measurement

2. Real contention impact of concurrent `getMessages()` plus `extractLeadSessionTexts()`

Confidence: medium.

Why:

both touch team filesystem data
one already does internal fan-out

Decision:

keep the heavy group bounded
do not add more heavy steps to it in phase 1

3. Diagnostic interpretation shift

Confidence: medium-high.

Why:

marks will no longer describe a strictly serial stage pipeline
but they remain operationally useful

Decision:

preserve labels
store actual completion times
document the interpretation shift

4. Perf upside size

Confidence: medium-high.

Why:

the current serial top read phase is clearly a bottleneck candidate
exact gains still depend on filesystem behavior and team size

Decision:

keep the patch narrow
verify with targeted timings before any broader refactor

5. Exact wrapper boundary

Confidence: medium.

Why:

too broad a wrapper masks bugs
too narrow a wrapper creates repetitive code and inconsistent fallback handling

Decision:

wrap only the direct reader/service call
keep all merge/assembly logic outside the wrapper

6. `presenceIndexPromise` semantics

Confidence: medium.

Why:

it starts early but is awaited later
it currently rejects the whole method instead of degrading to warning
it is easy to accidentally “improve” this while introducing wrappers

Decision:

preserve current reject semantics exactly
add a regression test explicitly for this

7. Test strength for concurrency behavior

Confidence: medium.

Why:

concurrency tests often pass while failing to prove the intended scheduling shape
the current suite has very limited ordering-specific coverage

Decision:

use explicit deferred promises
assert both start order and completion-order independence

8. Limiter implementation correctness

Confidence: medium.

Why:

small limiters are easy to get almost right while still leaking capacity or starting work eagerly
that kind of bug can preserve output correctness while eliminating the perf gain

Decision:

keep the limiter tiny and local
verify queued-failure release behavior in tests
verify that the third heavy step does not start before one of the first two settles

9. Wrapper and limiter composition

Confidence: medium.

Why:

the code can look “limited” while still eagerly starting reader work
this is one of the easiest places to lose the intended perf behavior without breaking output tests

Decision:

prefer queuing a thunk that starts startReadStep(...) only after slot acquisition
add a test that asserts the third heavy reader has not been called while both slots are occupied

10. Fallback-factory discipline

Confidence: medium.

Why:

createFallback() looks harmless, but it is part of the same masking boundary as the wrapper
if it grows side effects or branching logic, the patch stops being a narrow perf change

Decision:

keep fallback factories as trivial literals/object constructors
keep warning text static at the callsite
let thrown fallback-factory errors surface as programming bugs

Residual Risk

Even after this patch, the architecture remains sensitive because:

team:getData still assembles a large monolithic snapshot
messages[] is still a hot payload
first-open cost still exists
processes remains a reconcile-like read/write path

This patch should therefore be treated as:

a safe latency reduction
not a final architectural solution

35 KiB Raw Blame History

getTeamData Parallel Read Plan

Goal

Why This Requires Extra Care

Hard Non-Goals

Dependency Graph

Strict prerequisites

Safe early-start paths

Assembly prerequisites

Explicit downstream-only paths

Key Observations

1. processes is not a pure read

2. Some top-level readers already create their own fan-out

3. listInboxNames() is duplicated work today

4. Warning order is currently deterministic

5. The lower pipeline is semantically sensitive

5a. kanbanGc is part of current diagnostics shape even though no GC runs here

6. Existing tests do not cover the riskiest new behavior

Safe Scope

Safe to parallelize

Explicitly keep serial

Design Choice

Preferred approach: two-level parallel read phase

Heavy group

Light group

Why not one big Promise.all

Safer fallback if confidence drops during implementation

Step Wrapper Design

Settlement rule

Diagnostics And Marks

Rule

Practical approach

Important nuance

Warning Discipline

Exact Safe Execution Order

Phase A - bootstrap

Phase B - parallel read phase

Phase C - deterministic assembly

Phase D - unchanged lower phase

Implementation Notes

1. Use a tiny local limiter, not a new dependency

2. Preserve fallback defaults exactly

3. Keep presenceIndexPromise untouched

4. Do not mix adjacent optimizations into this patch

5. Prefer the lowest-risk first wrapper conversion during implementation

6. Do not add committed benchmark scaffolding in this patch

Edge Cases

1. messages fail but leadTexts succeed

2. leadTexts fail but inbox messages succeed

3. kanbanState fails

4. metaMembers fail

5. One heavy read is very slow

6. Duplicate inbox-dir reads happen concurrently

7. processes marks a dead runtime and writes processes.json

8. A step throws synchronously before returning a promise

9. A heavy step finishes before a light step

10. A bug occurs in orchestration code outside a step wrapper

11. presenceIndexPromise rejects

12. Heavy-step limiter accidentally serializes everything

13. A heavy step is queued behind the limiter and throws synchronously when finally started

14. A queued heavy step has not started yet

15. createFallback() is evaluated

Testing Plan

Preferred test technique

Required tests

Nice-to-have tests

Testing note

Rollout Strategy

Phase 1

Recommended implementation sequence

Phase 2 only if needed later

Success Criteria

Areas With Lower Confidence

1. Best heavy-step limiter value

2. Real contention impact of concurrent getMessages() plus extractLeadSessionTexts()

3. Diagnostic interpretation shift

4. Perf upside size

5. Exact wrapper boundary

6. presenceIndexPromise semantics

7. Test strength for concurrency behavior

35 KiB

Raw Blame History

1. `processes` is not a pure read

3. `listInboxNames()` is duplicated work today

5a. `kanbanGc` is part of current diagnostics shape even though no GC runs here

Why not one big `Promise.all`

3. Keep `presenceIndexPromise` untouched

1. `messages` fail but `leadTexts` succeed

2. `leadTexts` fail but inbox messages succeed

3. `kanbanState` fails

4. `metaMembers` fail

7. `processes` marks a dead runtime and writes `processes.json`

11. `presenceIndexPromise` rejects

15. `createFallback()` is evaluated

2. Real contention impact of concurrent `getMessages()` plus `extractLeadSessionTexts()`

6. `presenceIndexPromise` semantics