matt ae4833e310 feat(planning): add project planning documents and configuration for SSH multi-context workspaces

- Introduced new planning files including PROJECT.md, REQUIREMENTS.md, ROADMAP.md, and STATE.md to outline the vision and requirements for SSH multi-context workspaces.
- Added ARCHITECTURE.md and CONCERNS.md to detail the codebase structure and address technical debt, known bugs, and security considerations.
- Created CONVENTIONS.md to establish coding standards and practices for the project.
- Updated .gitignore to exclude demo files and added configuration for planning tools.

This commit lays the groundwork for enhancing SSH functionality and user experience in managing multiple workspaces.

2026-02-12 09:37:58 +09:00

24 KiB

Raw Blame History

Pitfalls Research

Domain: Multi-context workspace switching for Electron + Zustand apps Researched: 2026-02-12 Confidence: HIGH

Critical Pitfalls

Pitfall 1: Destructive Context Switching with Incomplete State Snapshots

What goes wrong: Switching from SSH to local (or vice versa) calls getFullResetState() which wipes ALL selections and loaded data, destroying the user's previous workspace state. When switching back, users must re-navigate to their project, re-open sessions, and restore their scroll position manually.

Why it happens: Current implementation treats context switch as a "hard reset" instead of maintaining separate state snapshots per context. The connectionSlice spreads getFullResetState() on connect/disconnect, clearing selectedProjectId, selectedSessionId, conversation, tabs, and all derived state.

How to avoid:

Snapshot before switching: Capture full AppState before connect/disconnect
Scope snapshots by context: Use Map<'local' | SshHostKey, AppState> to store separate state per context
Partial restoration: Only restore state compatible with the new context (project IDs may differ between local/SSH)
Preserve UI preferences: Tab layout, expanded groups, scroll positions are context-agnostic and should always restore

Warning signs:

User complaints about "losing their place" when switching contexts
Empty dashboard after SSH connect, despite having viewed sessions before
No tabs open after switching back to local mode
Context panel expansions reset on every switch

Phase to address: Phase 1 (Core Infrastructure) — state snapshot/restore must be foundational before adding multi-context features.

Pitfall 2: EventEmitter Listener Accumulation on Repeated Context Switches

What goes wrong: FileWatcher, SshConnectionManager, and NotificationManager are EventEmitters. Each context switch may call setFileSystemProvider() or re-initialize services without removing old listeners, causing:

Memory leaks (listeners hold references to previous provider instances)
Duplicate event emissions (one event triggers 2x, 3x handlers)
Stale event handlers executing against wrong context

Why it happens: Services use .on() to attach listeners but don't call .removeAllListeners() or track cleanup functions. The initializeNotificationListeners() function in store/index.ts returns a cleanup function, but there's no guarantee it's called before re-initialization.

How to avoid:

Centralized cleanup: Create dispose() methods for all EventEmitter services
Lifecycle tracking: Maintain Map<string, () => void> of cleanup functions keyed by service name
Pre-switch cleanup: Call all cleanup functions BEFORE provider swap
Idempotent initialization: Check if (this.isInitialized) return to prevent double-init
Avoid removeAllListeners() on ipcRenderer: Always specify channel name — blanket removal breaks Electron internals

Warning signs:

Process memory usage (RSS) grows 50-100MB per context switch
DevTools heap snapshot shows multiple FileWatcher/SshConnectionManager instances
Console logs show duplicate "file-change" events for same file
IPC handlers fire 2x for single user action

Phase to address: Phase 1 (Core Infrastructure) — service lifecycle management is critical before multi-context.

Sources:

Pitfall 3: Stale Closures in Zustand Actions Capturing Old Provider References

What goes wrong: Zustand actions like fetchProjects() capture the FileSystemProvider in their closure at store creation time. After calling sshConnectionManager.getProvider() which returns a new SshFileSystemProvider, existing actions still reference the old LocalFileSystemProvider, causing operations to execute against the wrong context.

Why it happens: JavaScript closures capture variables at definition time. When a Zustand action calls window.electronAPI.getProjects(), the IPC handler calls projectScanner.scan(), which calls this.fsProvider.readdir(). If the scanner's provider reference wasn't updated, it uses the stale provider.

How to avoid:

Late binding via getter: Services should call getProvider() on every operation, not cache at construction
Provider as parameter: Pass provider explicitly to service methods instead of storing as instance variable
Re-initialize services: After provider swap, call fileWatcher.setFileSystemProvider(newProvider) on ALL services
Service registry pattern: Centralize provider injection so swap happens in one place
Verify in tests: Mock provider swap and assert service calls hit new provider

Warning signs:

SSH connect succeeds but dashboard shows local projects
File operations fail with "ENOENT" after provider swap
Console shows "LocalFileSystemProvider" operations when SSH is connected
User switches to SSH but sees local data until app restart

Phase to address: Phase 1 (Core Infrastructure) — provider injection architecture must be correct from the start.

Sources:

Pitfall 4: IPC Race Conditions During Context Switch with In-Flight Requests

What goes wrong: User clicks "Connect SSH" → IPC handler starts scanning remote projects → user quickly clicks "Disconnect" → scan completes and populates store with SSH data → store now shows SSH projects in local mode, creating data corruption.

Why it happens: Async IPC calls (getProjects, getSessions) don't track which context initiated them. By the time a response arrives, the context may have switched, but Zustand applies the stale response anyway.

How to avoid:

Context ID stamping: Include contextId (UUID per connection) in every IPC request/response
Response validation: Check if (currentContextId !== response.contextId) return before applying
Request cancellation: Track in-flight requests via AbortController and cancel on context switch
State machine: Use FSM with states: idle → connecting → connected → disconnecting → idle. Reject operations in wrong states.
Sequential transitions: Wait for disconnect cleanup to complete before starting connect

Warning signs:

Dashboard briefly flickers between local and SSH data during transitions
Error notifications appear for wrong context after switching
Search results from previous context appear after switch
Console shows "Cannot read property of null" after rapid connect/disconnect

Phase to address: Phase 1 (Core Infrastructure) — race condition prevention is non-negotiable for reliability.

Sources:

Pitfall 5: FileWatcher Polling Timer Not Cleared on SSH Disconnect

What goes wrong: SSH mode uses setInterval() polling (5s) instead of fs.watch(). When user disconnects, stop() clears pollingTimer, but the cleanup happens in finally block — if an exception occurs, timer keeps running. After 10 switches, 10 polling timers consume CPU checking a disconnected SSH provider.

Why it happens: Interval timers require explicit clearInterval(), and error paths may skip cleanup. The pollingTimer property is set to null without actually clearing the interval, causing orphaned timers.

How to avoid:

Cleanup in finally: Always clear timers in finally block, not just normal path
Defensive clearing: Before creating new timer, clear existing: if (this.timer) clearInterval(this.timer)
Timer registry: Track all active timers in Set<NodeJS.Timeout> and clear all on dispose
Polling refactor: Consider using a single interval that checks isRemote() flag instead of separate modes
Memory profiling: Add setInterval/clearInterval tracking to detect orphaned timers

Warning signs:

CPU usage increases by 5-10% per context switch
Chrome DevTools → Performance shows multiple "pollForChanges" concurrent executions
Memory usage grows 20MB per switch (timer closures hold provider references)
App becomes sluggish after 5-10 SSH switches

Phase to address: Phase 1 (Core Infrastructure) — timer lifecycle bugs cause performance degradation.

Pitfall 6: SSH Connection Not Properly Disposed, Keeping Socket Open

What goes wrong: SshConnectionManager.disconnect() calls client.end(), but if the SFTP channel is still processing operations, the socket remains open (TCP FIN not sent). After 10 context switches, 10 SSH connections consume file descriptors and remote server resources.

Why it happens: ssh2 Client .end() is graceful (waits for channel closure), but .destroy() is immediate. If you don't wait for SFTP operations to complete before calling .end(), the connection may linger. The dispose() method swallows errors, hiding cleanup failures.

How to avoid:

Force close on disconnect: Use client.destroy() instead of .end() for immediate termination
SFTP channel close: Call sftp.end() explicitly before client.end()
Connection timeout: Set 5s timeout on disconnect — if client doesn't close, force destroy
Socket tracking: Expose net.Socket from Client and verify socket.destroyed === true post-cleanup
Resource monitoring: Log open file descriptors (lsof) to detect leaks

Warning signs:

lsof -p <pid> shows 10+ TCP connections to SSH host
SSH host reports "Max sessions exceeded" after many switches
netstat shows multiple ESTABLISHED connections to port 22
Disconnect takes 5+ seconds (waiting for timeout)

Phase to address: Phase 1 (Core Infrastructure) — connection leaks cause operational issues at scale.

Sources:

Pitfall 7: Tab State Restoration Without Context Validation

What goes wrong: Snapshot captures open tabs with projectId: "abc-local-path". User switches to SSH where projects have different IDs. Restoration tries to open tab with local projectId against SSH provider → tab shows "Project not found" error or stale local data.

Why it happens: ProjectId encoding includes file path (-Users-name-project), so local and SSH projects for the same logical directory have different IDs. Tab restoration doesn't validate whether a project exists in the new context.

How to avoid:

Path normalization: Introduce logicalProjectPath (e.g., /home/user/project) separate from encoded ID
Cross-context mapping: Maintain Map<logicalPath, {localId, sshId}> to translate IDs during restore
Graceful degradation: If project doesn't exist in new context, close tab or show empty state instead of error
Restore validation: Before applying snapshot, call window.electronAPI.getProjects() and filter tabs to existing projects
User notification: Show toast "3 tabs closed - projects not available in SSH mode"

Warning signs:

User switches to SSH, tabs show "Failed to load session"
Console errors "Project ID abc-local not found" after context switch
Tab bar shows ghost tabs (visible but non-functional)
Click on restored tab triggers navigation to dashboard instead

Phase to address: Phase 2 (State Persistence) — after Phase 1 establishes snapshots, validation ensures safe restoration.

Pitfall 8: Partial Snapshot Creates Inconsistent Derived State

What goes wrong: Snapshot captures selectedProjectId, selectedSessionId, but omits sessionDetail, conversation, chunks. Restoration sets selections without corresponding data → store state says "session XYZ selected" but detail is null → UI renders empty or crashes on null access.

Why it happens: Developers manually list fields to snapshot instead of capturing full AppState. Derived state (detail, conversation) is computed from selections, but if selections restore before data loads, the UI sees inconsistent state.

How to avoid:

Snapshot everything: Capture AppState wholesale, excluding only ephemeral UI flags (loading, error)
Atomic restoration: Use setState(() => snapshot) so all fields update simultaneously
Lazy detail loading: On restoration, if selectedSessionId exists but sessionDetail is null, trigger background fetch
State versioning: Include snapshotVersion: 1 to handle schema changes (e.g., new fields added)
Whitelist vs blacklist: Easier to exclude ephemeral fields (loading, error) than enumerate 50+ persistent fields

Warning signs:

After switch, session viewer shows spinner indefinitely
Console error "Cannot read property 'chunks' of null"
Dashboard shows selected project but no sessions loaded
Tab labels show "undefined" or "[object Object]"

Phase to address: Phase 1 (Core Infrastructure) — snapshot completeness is foundational to state restoration.

Sources:

Zustand - react state management made easy

Technical Debt Patterns

Shortcuts that seem reasonable but create long-term problems.

Shortcut	Immediate Benefit	Long-term Cost	When Acceptable
Storing provider as instance variable	Simple, no getter needed	Stale closure bugs on provider swap	Never — always use getter
Not tracking cleanup functions	Less boilerplate code	Memory leaks accumulate with usage	Never — cleanup is mandatory
Hard reset on context switch	Easy to implement (3 lines)	User loses workspace state	Never — destroys UX
Using `removeAllListeners()`	Clears everything at once	Breaks Electron IPC internals	Never — specify channel
Sync IPC for connect	Blocks until complete	UI freezes 2-5s on slow networks	Never — SSH is async
Single global DataCache	Shared cache across contexts	Cache pollution (local data in SSH mode)	Never — scope cache per context
Manual field-by-field snapshot	Fine-grained control	Fragile, breaks when state schema evolves	Only for specific performance tuning
Not versioning snapshots	Don't need migration code	Can't safely restore after app updates	Only in MVP — add versioning in Phase 2

Integration Gotchas

Common mistakes when connecting to external services.

Integration	Common Mistake	Correct Approach
SSH2 Client	Using `.end()` without waiting for SFTP close	Call `sftp.end()`, wait 100ms, then `client.destroy()`
SFTP operations	Not catching ENOENT on remote path access	Wrap all `sftp.stat()` in try/catch, validate paths exist
IPC context bridge	Passing large objects (10MB+ session JSON)	Stream via chunks or use temp file + file path
FileWatcher SSH polling	Polling every 1s (too aggressive)	5s minimum — SSH has latency
Zustand subscriptions	Subscribing to full store on every render	Use selectors: `useStore((s) => s.field, shallow)`
React useEffect	Empty deps array with state access	Include all state in deps or use functional updates

Performance Traps

Patterns that work at small scale but fail as usage grows.

Trap	Symptoms	Prevention	When It Breaks
Cache not scoped by context	Search returns SSH results in local mode	Separate caches: `localCache`, `sshCache`	After first context switch
Snapshot entire 50MB state	500ms freeze during switch	Exclude non-serializable (functions, DOM refs)	10+ open sessions
No debounce on provider swap events	10 re-renders per switch	Debounce 100ms, batch provider updates	Rapid switching (QA testing)
Loading all tabs eagerly	UI freezes restoring 20 tabs	Virtual scrolling + lazy load tab content	10+ open tabs
Not canceling in-flight IPC	Stale responses overwrite new data	AbortController per IPC call	Slow network (SSH over VPN)
Synchronous validation during snapshot	Blocks UI during switch	Validate asynchronously post-restoration	Large state (100+ projects)

Security Mistakes

Domain-specific security issues beyond general web security.

Mistake	Risk	Prevention
Storing SSH passwords in snapshot	Plaintext credentials in memory dump	Never snapshot credentials — only host/port/user
Not validating SSH host keys	MITM attacks	Use ssh2 `hostVerifier` callback, check known_hosts
Exposing SSH provider to renderer	Renderer can execute arbitrary commands	Keep provider in main process only, expose via IPC
Logging full AppState snapshots	Sensitive data in logs	Redact: `password`, `privateKey`, `sessionContent`
Not sanitizing project paths	Path traversal attacks	Validate paths with `path.resolve()`, reject ".."

UX Pitfalls

Common user experience mistakes in this domain.

Pitfall	User Impact	Better Approach
No visual feedback during switch	User doesn't know if click worked	Show connecting spinner + progress bar
Immediate switch without confirmation	Accidental clicks lose workspace	"Switch will close X tabs. Continue?" dialog
No indication of current context	User forgets if in local/SSH	Persistent badge in header: "Local" or "SSH: hostname"
Context switch closes all tabs	User loses 10+ open sessions	Restore tabs if projects exist in new context
No way to view both contexts	Can't compare local vs SSH data	Split view or dual panes (future enhancement)
Errors shown as raw exceptions	"ENOTFOUND" means nothing to user	Friendly: "Cannot connect to SSH host. Check network."

"Looks Done But Isn't" Checklist

Things that appear complete but are missing critical pieces.

Context switch — Looks like it works because SSH connects, but missing: cleanup old listeners, cancel in-flight IPC, validate restored tabs
State snapshot — Appears to save state, but missing: exclude ephemeral fields, version snapshots, validate restorable
Provider swap — FileWatcher updated, but missing: update ProjectScanner, SessionParser, ErrorDetector, NotificationManager
Tab restoration — Tabs re-open, but missing: validate project exists, load session detail, restore scroll position
IPC request tracking — Responses return, but missing: context ID validation, abort on disconnect, timeout handling
Service dispose — dispose() method exists, but missing: remove EventEmitter listeners, clear timers, close connections
Cache invalidation — Cache cleared on switch, but missing: scope cache per context, invalidate on reconnect
Error handling — Try/catch present, but missing: rollback state on failure, show user-friendly message, log to sentry

Recovery Strategies

When pitfalls occur despite prevention, how to recover.

Pitfall	Recovery Cost	Recovery Steps
Destructive switch lost state	MEDIUM	Phase 2: Add persistent storage (localStorage) to survive restarts
Listener leaks	LOW	Add dispose tracking, expose `debug.listenerCount()` IPC for diagnosis
Stale closure	HIGH	Refactor service architecture to inject provider per call (breaking change)
IPC race condition	MEDIUM	Add context ID retroactively, reject responses with mismatched ID
Timer leak	LOW	DevTools audit: setInterval/clearInterval coverage, add cleanup tests
SSH connection leak	LOW	Add connection pool with max limit, force destroy after 30s timeout
Tab validation missing	MEDIUM	Add migration: filter invalid tabs on app startup
Partial snapshot	HIGH	Ship hotfix: capture full state, deprecate manual field list

Pitfall-to-Phase Mapping

How roadmap phases should address these pitfalls.

Pitfall	Prevention Phase	Verification
Destructive context switching	Phase 1	Unit test: switch → switch back → assert state restored
EventEmitter listener accumulation	Phase 1	Memory profiler: assert RSS stable after 10 switches
Stale closures	Phase 1	Integration test: mock provider swap, assert service calls new provider
IPC race conditions	Phase 1	E2E test: rapid connect/disconnect, assert no stale data
FileWatcher polling timer	Phase 1	Unit test: call stop(), assert pollingTimer cleared
SSH connection disposal	Phase 1	Integration test: assert socket.destroyed after disconnect
Tab restoration validation	Phase 2	E2E test: open tabs → switch SSH → assert only valid tabs restored
Partial snapshot	Phase 1	Integration test: snapshot → restore → assert detail populated

Sources

Official Documentation

Electron State Management

Memory Leaks and Cleanup

React Stale Closures

Zustand and State Management

VS Code Multi-Root Workspaces (Reference Architecture)

Dependency Injection and Service Lifecycle

Pitfalls research for: Multi-context workspace switching in Electron + Zustand Researched: 2026-02-12

24 KiB Raw Blame History

Pitfalls Research

Critical Pitfalls

Pitfall 1: Destructive Context Switching with Incomplete State Snapshots

Pitfall 2: EventEmitter Listener Accumulation on Repeated Context Switches

Pitfall 3: Stale Closures in Zustand Actions Capturing Old Provider References

Pitfall 4: IPC Race Conditions During Context Switch with In-Flight Requests

Pitfall 5: FileWatcher Polling Timer Not Cleared on SSH Disconnect

Pitfall 6: SSH Connection Not Properly Disposed, Keeping Socket Open

Pitfall 7: Tab State Restoration Without Context Validation

Pitfall 8: Partial Snapshot Creates Inconsistent Derived State

Technical Debt Patterns

Integration Gotchas

Performance Traps

Security Mistakes

UX Pitfalls

"Looks Done But Isn't" Checklist

Recovery Strategies

Pitfall-to-Phase Mapping

Sources

Official Documentation

Electron State Management

Memory Leaks and Cleanup

React Stale Closures

Zustand and State Management

VS Code Multi-Root Workspaces (Reference Architecture)

Dependency Injection and Service Lifecycle

24 KiB

Raw Blame History