777genius 26baaf6924 chore: checkpoint agent launch hardening

2026-05-07 23:26:37 +03:00

Agent launch architecture comparison research

Research date: 2026-05-07

Purpose: record factual research on how different systems launch or execute agents. This is informational context only, not an implementation recommendation.

Scope

Systems compared:

System	Repository / source	Snapshot
Our Agent Teams	Local `claude_team` + `agent_teams_orchestrator`	Local working tree, 2026-05-07
Paperclip	`paperclipai` docs/code research from earlier pass	Public docs / local research
Gastown	`github.com/gastownhall/gastown`	cloned `cfbdf3c`
GoClaw Enterprise / Teams	`github.com/nextlevelbuilder/goclaw`	cloned `a97e502`
GoClaw OpenClaw-compatible gateway	`github.com/roelfdiedericks/goclaw`	cloned `6a7ccdb`

Primary external references:

Topic	Source
Gastown README	https://github.com/gastownhall/gastown/blob/main/README.md
Gastown agent provider integration	https://github.com/gastownhall/gastown/blob/main/docs/agent-provider-integration.md
GoClaw agent loop	https://github.com/nextlevelbuilder/goclaw/blob/main/docs/01-agent-loop.md
GoClaw agent teams	https://github.com/nextlevelbuilder/goclaw/blob/main/docs/11-agent-teams.md
GoClaw team WS events	https://github.com/nextlevelbuilder/goclaw/blob/main/docs/13-ws-team-events.md
Paperclip agent runtime	https://github.com/paperclipai/docs/blob/main/agents-runtime.md
Paperclip adapters overview	https://paperclip.inc/docs/adapters/overview/

Short answer

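There are four distinct launch/execution models. The GoClaw OpenClaw-compatible gateway is not listed separately because its delegated runs are closest to the bounded-run model: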

Model	Used by	Essence
External live CLI process	Our Agent Teams	App/orchestrator launches real teammate runtimes and tracks bootstrap, PID, stderr, process health, runtime evidence, task/message state.
Bounded adapter run	Paperclip	A heartbeat or job starts a short agent run, adapter invokes CLI/provider, result is captured, run exits or times out.
Tmux session orchestration	Gastown	`tmux` is the universal runtime adapter. Agents run in terminal sessions, receive input through tmux, and are observed through panes/session state.
In-process agent loop	GoClaw Enterprise / Teams	Agent execution is a Go `Loop.Run(ctx, RunRequest)` scheduled through lanes. The agent is a logical loop inside the gateway, not necessarily a separate CLI teammate process.

What “in-process agent loop” removes from our live teammate product

An in-process loop is not inherently worse; it is often cleaner. Compared to our external-process teammate model, however, it removes or changes several product properties.

Product property	External process teammate, our model	In-process GoClaw-style loop
Real process identity	Each teammate can have PID/RSS/stdout/stderr/process lifetime.	Agent run is a gateway invocation; no independent teammate PID by default.
CLI-realism	Claude/Codex/OpenCode behave as their real CLI runtimes, including auth, prompts, provider errors, stderr quirks.	Provider/driver behavior is normalized inside gateway; fewer raw CLI lifecycle surfaces.
Per-member restart semantics	Restart means kill/relaunch or reattach a concrete runtime for that member.	Restart is usually cancel/reschedule a logical run/session.
Bootstrap evidence	We can distinguish process alive, bootstrap submitted, bootstrap confirmed, delivery proof, task proof.	The loop itself is already the controlled runtime; less need for low-level bootstrap proof.
UI runtime cards	UI can show memory, process state, liveness source, failed/stalled bootstrap, exact runtime diagnostics.	UI tends to show run/session/task status rather than OS/process-level teammate state.
TTY/process debugging	Process/tmux mode can expose raw CLI behavior when needed.	Debugging is gateway traces/events/logs, not a live CLI pane/process per member.
Failure classes	Auth prompt, no stdin, CLI did not submit bootstrap, process died, stale PID, provider CLI stderr.	Mostly provider/tool/session/run errors inside the loop.
Isolation boundary	OS process boundary per teammate.	Mostly logical/session isolation inside one gateway process, unless it delegates to external providers/tools.

Important distinction: an in-process loop is simpler and can be more stable for gateway/chat products. It is not a drop-in replacement for a desktop product whose value includes live external teammate runtimes.

Our Agent Teams launch/execution model

Our current direction is an app-managed, live, external teammate runtime.

Observed local architecture:

Layer	Role
`claude_team` Electron app	UI, provisioning, runtime projection, team messages, tasks, diagnostics, retries.
`agent_teams_orchestrator`	Multi-agent runtime orchestration, teammate spawning, provider/runtime bridging.
Process backend	Default for app-launched teammates after recent changes. Launch-owned processes are tracked as runtime entities.
Optional tmux mode	Debug/manual mode, not production default. Useful for real TTY inspection.
App-managed bootstrap	Backend injects/records startup context and requires durable readiness evidence instead of trusting “process exists”.
Runtime projection	Maps launch state, process liveness, bootstrap proof, delivery proof, task state and diagnostics to UI.

Key properties:

Dimension	Current behavior
Agent lifetime	Long-lived teammate process/session, not just one request.
Availability proof	A live process is not enough; bootstrap/runtime evidence is required.
Provider mix	Claude, Codex, OpenCode can coexist in one team.
User experience	Live team room: cards, memory, tasks, messages, runtime errors, restart/retry controls.
Complexity cost	High. Many edge cases around launch, cleanup, stale state, delivery, work-sync, retries.
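The "availability proof" row can be illustrated with a small classifier. This is a minimal sketch, not the actual `claude_team` code: the `RuntimeEvidence` and `Readiness` names and their fields are hypothetical, chosen to mirror the evidence levels named in this document (process alive, bootstrap submitted, bootstrap confirmed).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Readiness(Enum):
    """Hypothetical readiness states, ordered from weakest to strongest proof."""
    NOT_STARTED = "not_started"
    PROCESS_DEAD = "process_dead"
    AWAITING_BOOTSTRAP = "awaiting_bootstrap"
    BOOTSTRAP_SUBMITTED = "bootstrap_submitted"
    READY = "ready"


@dataclass(frozen=True)
class RuntimeEvidence:
    """Hypothetical evidence record for one teammate runtime."""
    pid: Optional[int] = None
    process_alive: bool = False
    bootstrap_submitted: bool = False
    bootstrap_confirmed: bool = False


def classify(evidence: RuntimeEvidence) -> Readiness:
    """Derive readiness from evidence; a live PID alone never yields READY."""
    if evidence.pid is None:
        return Readiness.NOT_STARTED
    if not evidence.process_alive:
        return Readiness.PROCESS_DEAD
    if evidence.bootstrap_confirmed:
        return Readiness.READY
    if evidence.bootstrap_submitted:
        return Readiness.BOOTSTRAP_SUBMITTED
    return Readiness.AWAITING_BOOTSTRAP


# A running process without bootstrap evidence is still not "ready".
print(classify(RuntimeEvidence(pid=4242, process_alive=True)).value)
```

The point of the sketch is the ordering of checks: process liveness gates everything, but only confirmed bootstrap evidence promotes a member to ready.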

Technical assessment:

Criterion	Score
Live team product fit	9.2/10
Mixed provider fidelity	8.7/10
Runtime proof strictness	8.8/10
Simplicity	5.8/10
Maintainability today	7.2/10
Overall technical score	8.5/10

Paperclip launch/execution model

Paperclip is closest to a bounded job/heartbeat runner.

Research summary from earlier pass:

Piece	Behavior
Agent invocation	Heartbeat or scheduled run calls adapter execution.
Runtime	Adapter starts/calls CLI or provider, captures output/status/errors.
Lifecycle	Run exits, times out, or is cancelled.
Concurrency	Wakeups coalesce if the agent is already running.
Persistence	Status/logs/tokens/errors are stored per run.

This is operationally clean because there is no expectation that every teammate is a continuously alive process with card-level runtime state.
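The bounded-run shape described above can be sketched as follows. This is an illustration under stated assumptions, not Paperclip's adapter code: `BoundedAgent`, `RunResult`, and the status strings are hypothetical names, and a plain subprocess stands in for a real agent CLI or provider call.

```python
import subprocess
import sys
import threading
from dataclasses import dataclass


@dataclass
class RunResult:
    status: str  # "completed", "failed", "timeout", or "coalesced"
    stdout: str = ""
    stderr: str = ""


class BoundedAgent:
    """Hypothetical adapter: each wakeup is one short run with a hard timeout."""

    def __init__(self, command: list, timeout_s: float) -> None:
        self.command = command
        self.timeout_s = timeout_s
        self._running = threading.Lock()

    def wake(self) -> RunResult:
        # Coalesce: if a run is already in flight, do not start a second one.
        if not self._running.acquire(blocking=False):
            return RunResult(status="coalesced")
        try:
            proc = subprocess.run(
                self.command,
                capture_output=True,
                text=True,
                timeout=self.timeout_s,  # the child is killed when this expires
            )
        except subprocess.TimeoutExpired:
            return RunResult(status="timeout")
        finally:
            self._running.release()
        status = "completed" if proc.returncode == 0 else "failed"
        return RunResult(status, proc.stdout, proc.stderr)


# Usage: a trivial stand-in command instead of a real agent CLI.
result = BoundedAgent([sys.executable, "-c", "print('done')"], timeout_s=30.0).wake()
print(result.status)
```

Every outcome is a terminal status stored per run, so nothing in this shape needs to track a long-lived teammate process.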

Technical assessment:

Criterion	Score
Bounded execution design	9.1/10
Simplicity	8.8/10
Failure boundedness	8.7/10
Live teammate room fit	6.3/10
External CLI fidelity	7.5/10
Overall technical score	8.2/10

Gastown launch/execution model

Gastown is tmux-first.

Facts from gastownhall/gastown:

Piece	Behavior
Main runtime adapter	`tmux` sessions.
Universal integration	Any CLI that runs in a terminal can be started and controlled.
Work unit	Beads/issues and convoys.
Worker identity	Polecats have persistent identity and reusable worktrees.
Session lifetime	Sessions are ephemeral; identity and sandbox can persist.
Communication	Mail, nudges, hooks, Beads state, tmux input/output.
Monitoring	Witness, Deacon, Dogs, Doctor, cleanup commands.
Provider integration	Built-in/custom presets with command, args, env, process names, hooks, readiness delay/prompt.

Gastown explicitly documents a Tier 0 tmux shim: start the CLI in tmux, send work as keystrokes, detect liveness from the pane process, and read output from the captured pane. It also notes that this tier is timing-sensitive and lacks delivery confirmation.

Core model:

gt sling <bead> <rig>
  -> allocate or reuse polecat identity/worktree
  -> create tmux session
  -> set env: GT_ROLE, GT_RIG, GT_POLECAT, BD_ACTOR, GT_AGENT, etc.
  -> inject startup beacon / prompt / hook context
  -> nudge with instructions if provider needs fallback
  -> Witness/Deacon patrol health and cleanup
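The four Tier 0 operations (start, send, detect liveness, read output) map onto standard tmux subcommands. The sketch below only builds the argument lists and does not run them. It is not Gastown's implementation: the helper names and the session name are hypothetical, and the `-e` flag on `new-session` is assumed to be available, which is not the case in older tmux releases.

```python
import shlex


def new_session_argv(session: str, command: list, env: dict) -> list:
    """Start a detached session running the agent CLI with its environment."""
    argv = ["tmux", "new-session", "-d", "-s", session]
    for key, value in sorted(env.items()):
        argv += ["-e", f"{key}={value}"]
    # A single trailing argument is passed to the shell by tmux.
    return argv + [shlex.join(command)]


def send_text_argv(session: str, text: str) -> list:
    """Deliver work as keystrokes: literal text first, then Enter."""
    return [
        ["tmux", "send-keys", "-t", session, "-l", text],
        ["tmux", "send-keys", "-t", session, "Enter"],
    ]


def pane_pid_argv(session: str) -> list:
    """Liveness probe: print the PID of the process running in each pane."""
    return ["tmux", "list-panes", "-t", session, "-F", "#{pane_pid}"]


def capture_pane_argv(session: str) -> list:
    """Read output by printing the visible pane contents to stdout."""
    return ["tmux", "capture-pane", "-p", "-t", session]


# Hypothetical values for environment variables named in the flow above.
env = {"GT_ROLE": "polecat", "GT_RIG": "demo-rig"}
print(new_session_argv("gt-demo", ["claude"], env))
```

Nothing here confirms that the keystrokes were received or acted on, which is the delivery-confirmation gap noted for this tier.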

Technical assessment:

Criterion	Score
Terminal-native ops	9.0/10
Persistent worker identity	8.7/10
Cleanup / doctor culture	8.8/10
Delivery proof strictness	6.4/10
Live product state consistency	6.8/10
Overall technical score	8.0/10

GoClaw Enterprise / Teams launch/execution model

This is nextlevelbuilder/goclaw, the relevant GoClaw for agent teams.

Core architecture from docs/code:

Piece	Behavior
Agent unit	`Loop` configured with provider, model, tools, workspace and agent type.
Run entrypoint	`Loop.Run(ctx, RunRequest)`.
Loop pattern	Think -> Act -> Observe, with max iterations and tool execution.
Scheduler	First-class lane scheduler.
Lanes	`main`, `subagent`, `team`, `cron`.
Queueing	Per-session queues with debounce, drop policy, max concurrent.
Team model	Lead/member, task board, mailbox, delegation.
Task semantics	Atomic claim, status lifecycle, dependencies, blocker escalation, task events.
Events	Typed WS events for delegation, tasks, team messages and agent lifecycle.

Core execution shape:

Inbound message / teammate message / cron / delegation
  -> Scheduler.Schedule(lane, RunRequest)
  -> SessionQueue serializes or bounds per session
  -> Lane worker admits execution
  -> Router.Get(agentID)
  -> Loop.Run(ctx, req)
  -> Provider call + tools + finalization
  -> Events + stored session/task/trace state

GoClaw team member execution is conceptually a scheduled agent run, not an externally spawned teammate CLI process with bootstrap/check-in.
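The scheduling shape (lane admission plus per-session serialization) can be sketched in a few lines. GoClaw itself is written in Go; the Python below is a language-neutral illustration only, and `LaneScheduler`, its `schedule` method, and the lane limits are hypothetical. Only the lane names `main` and `team` come from the source.

```python
import asyncio
from collections import defaultdict


class LaneScheduler:
    """Hypothetical scheduler: bounded lanes, one run per session at a time."""

    def __init__(self, lane_limits: dict) -> None:
        # One semaphore per lane bounds how many runs that lane admits at once.
        self._lanes = {name: asyncio.Semaphore(n) for name, n in lane_limits.items()}
        # One lock per session serializes runs that share a session.
        self._sessions = defaultdict(asyncio.Lock)

    async def schedule(self, lane: str, session_id: str, run):
        async with self._sessions[session_id]:  # per-session queue
            async with self._lanes[lane]:       # lane admission
                return await run()              # the agent loop would run here


async def demo() -> list:
    scheduler = LaneScheduler({"main": 2, "team": 1})
    order = []

    async def run(label: str) -> str:
        order.append(f"start {label}")
        await asyncio.sleep(0.01)  # stands in for provider calls and tools
        order.append(f"end {label}")
        return label

    # Two requests for the same session never overlap.
    await asyncio.gather(
        scheduler.schedule("team", "session-a", lambda: run("a1")),
        scheduler.schedule("team", "session-a", lambda: run("a2")),
    )
    return order


print(asyncio.run(demo()))
```

Because the agent is just an awaited call inside the scheduler, there is no separate process to bootstrap or check in, which matches the distinction drawn above.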

Technical assessment:

Criterion	Score
Scheduler architecture	9.2/10
Agent loop clarity	8.9/10
Team task model	8.8/10
Typed event model	8.8/10
Real external teammate runtime fidelity	6.6/10
Live process UI fit	6.5/10
Overall technical score	8.7/10

GoClaw OpenClaw-compatible gateway model

This is roelfdiedericks/goclaw. It is a different project from nextlevelbuilder/goclaw.

High-level facts:

Piece	Behavior
Product class	Personal AI gateway / OpenClaw-compatible bot runtime.
Main strengths	Transcript search, memory graph, channels, persistent memory, delegated runs, ACP sessions.
Delegated work	`subagent_spawn`, `subagent_fanout`, `subagent_status`, `subagent_cancel`.
Runner	`DefaultRunner` starts active runs as goroutines with run IDs, timeout/cancel, optional concurrency lane semaphore.
UI/control	`/runners` dashboard, SSE events, Telegram/TUI summaries.
Cursor integration	ACP attachment to live Cursor session.

Runner shape:

subagent_spawn / fanout
  -> DefaultRunner.Start(ctx, RunSpec)
  -> create RunRecord queued
  -> goroutine waits for lane admission
  -> execute function runs child work
  -> registry records completed/failed/canceled/timeout
  -> events emitted

This is closer to Paperclip-style delegated bounded runs than to our live teammate process model.
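The runner shape above (queued record, lane admission, terminal status) can be sketched with asyncio tasks standing in for goroutines. This is an illustration, not the project's `DefaultRunner`: the `Runner` class, its methods, and the run ID format are hypothetical, while the status names follow the flow listed above.

```python
import asyncio
import itertools
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunRecord:
    run_id: str
    status: str = "queued"  # then running, completed, failed, canceled or timeout
    error: Optional[str] = None


class Runner:
    """Hypothetical bounded runner; must be created inside a running event loop."""

    def __init__(self, max_concurrent: int) -> None:
        self._lane = asyncio.Semaphore(max_concurrent)
        self._ids = itertools.count(1)
        self._tasks = {}
        self.records = {}

    def start(self, work, timeout_s: float) -> RunRecord:
        record = RunRecord(run_id=f"run-{next(self._ids)}")
        self.records[record.run_id] = record
        self._tasks[record.run_id] = asyncio.create_task(
            self._execute(record, work, timeout_s)
        )
        return record

    async def _execute(self, record: RunRecord, work, timeout_s: float) -> None:
        try:
            async with self._lane:  # wait for lane admission
                record.status = "running"
                await asyncio.wait_for(work(), timeout=timeout_s)
            record.status = "completed"
        except asyncio.TimeoutError:
            record.status = "timeout"
        except Exception as exc:
            record.status = "failed"
            record.error = str(exc)

    def cancel(self, run_id: str) -> bool:
        # Task.cancel() returns False when the run has already finished.
        if self._tasks[run_id].cancel():
            self.records[run_id].status = "canceled"
            return True
        return False

    async def wait(self, run_id: str) -> RunRecord:
        await asyncio.gather(self._tasks[run_id], return_exceptions=True)
        return self.records[run_id]


async def demo() -> list:
    runner = Runner(max_concurrent=1)

    async def quick() -> None:
        await asyncio.sleep(0)

    async def slow() -> None:
        await asyncio.sleep(10)

    async def broken() -> None:
        raise ValueError("boom")

    runs = [runner.start(quick, 1.0), runner.start(slow, 0.05),
            runner.start(broken, 1.0), runner.start(slow, 1.0)]
    runner.cancel(runs[3].run_id)
    return [(await runner.wait(r.run_id)).status for r in runs]


print(asyncio.run(demo()))
```

Every run ends in exactly one terminal status recorded in the registry, which is what makes this model bounded in the same sense as the Paperclip runs.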

Technical assessment:

Criterion	Score
Personal gateway/memory architecture	8.8/10
Delegated run boundedness	8.5/10
Channel/memory richness	9.0/10
Live external teammate fidelity	5.8/10
Team room fit	6.4/10
Overall technical score	8.1/10

Direct comparison table

System	Launch/execution primitive	Separate OS process per agent?	Long-lived teammate?	Task board	Team messages	Scheduler	Tmux	Best fit
Our Agent Teams	Launch-owned external CLI/process runtime	Yes	Yes	Yes	Yes	Partial/ad-hoc today	Optional/debug	Desktop live mixed-provider team room
Paperclip	Bounded adapter heartbeat run	Usually per run	No	Limited/not central	Not team-room focused	Yes, job-like	No core tmux	Reliable background/job agents
Gastown	Tmux session + worktree + Beads	Yes, through tmux	Session ephemeral, identity persistent	Beads/convoys	Mail/nudges	Scheduler/capacity exists	Core	Terminal-native multi-agent ops
GoClaw Enterprise	In-process scheduled agent loop	Not by default	Logical sessions/runs	Yes	Yes	First-class lanes	No core tmux	Multi-agent gateway/platform
GoClaw OpenClaw-compatible	Delegated goroutine runner + gateway sessions	Not by default	Logical runs/sessions	No primary team board in the same sense	Channels	Runner lane semaphore	No core tmux	Personal gateway, memory, delegated runs

Honest overall scores

System	Overall technical score	Why
GoClaw Enterprise / Teams	8.7/10	Cleanest scheduler/team/task/event architecture among compared systems.
Our Agent Teams	8.5/10	Best fit for real live external Claude/Codex/OpenCode teammate product, but high complexity.
Paperclip	8.2/10	Very clean bounded runtime model, but not a live team-room system.
GoClaw OpenClaw-compatible	8.1/10	Strong personal gateway/memory/delegated run model, less comparable to our team runtime.
Gastown	8.0/10	Strong terminal ops and lifecycle culture, but tmux-first delivery/readiness is less proof-strict.

Research conclusions

The systems optimize for different goals:

System	Optimized for
Our Agent Teams	User-visible live team of real external coding agents.
Paperclip	Bounded, simple, resumable background agent runs.
Gastown	Terminal-native agent ops at scale with durable work identity.
GoClaw Enterprise	Clean gateway-native multi-agent scheduling and team task orchestration.
GoClaw OpenClaw-compatible	Long-memory personal agent gateway with delegated subruns.

Most useful conceptual takeaways for future reference:

Idea	Source	Why it matters
First-class scheduler lanes	GoClaw Enterprise	Separates main/team/subagent/cron load and makes cancellation/backpressure more deterministic.
Typed team event catalog	GoClaw Enterprise	Makes UI and state transitions easier to reason about.
Persistent identity vs ephemeral session	Gastown	Useful framing for member identity, runtime session, task ownership and cleanup.
Bounded adapter runs	Paperclip	Good model for cron, background checks and non-live workers.
Patrol/doctor cleanup culture	Gastown	Good operational model for stale runtime/process/data cleanup.

Non-recommendation note: this document intentionally does not propose changing our architecture. It records observed models for future design discussions.
