From d01fcffe32d68391d76032b04faaf7ed3156b25d Mon Sep 17 00:00:00 2001
From: Jiang Bohan
Date: Wed, 11 Feb 2026 17:10:12 +0800
Subject: [PATCH] docs(subagent): add architecture flowchart and README

Document the full subagent lifecycle: spawn, concurrency queue,
execution, completion handling, two-tier announcement delivery, and
record archival. Include provider inheritance chain and error
propagation diagrams.

Co-Authored-By: Claude Opus 4.6
---
 packages/core/src/agent/subagent/README.md | 172 +++++++++++++++++++++
 1 file changed, 172 insertions(+)
 create mode 100644 packages/core/src/agent/subagent/README.md

diff --git a/packages/core/src/agent/subagent/README.md b/packages/core/src/agent/subagent/README.md
new file mode 100644
index 00000000..e007eab1
--- /dev/null
+++ b/packages/core/src/agent/subagent/README.md
@@ -0,0 +1,172 @@
+# Subagent System
+
+The subagent system allows a parent agent to spawn isolated child agents that run tasks in parallel and report results back automatically.
+
+## Architecture Overview
+
+```
+┌─────────────────────────────────────────────────────────────────────┐
+│  Parent Agent (runner.ts)                                           │
+│                                                                     │
+│  tools: sessions_spawn, sessions_list                               │
+│  state: resolvedProvider, toolsOptions                              │
+└──────────┬──────────────────────────────────────────────────────────┘
+           │
+           │ sessions_spawn(task, label, timeoutSeconds)
+           ▼
+┌─────────────────────────────────────────────────────────────────────┐
+│  Spawn Flow (sessions-spawn.ts)                                     │
+│                                                                     │
+│  1. Build subagent system prompt (announce.ts)                      │
+│  2. hub.createSubagent(childSessionId, { provider, model })         │
+│  3. registerSubagentRun({ start: () => childAgent.write(task) })    │
+│  4. Return { status: "accepted", runId, childSessionId }            │
+└──────────┬──────────────────────────────────────────────────────────┘
+           │
+           ▼
+┌─────────────────────────────────────────────────────────────────────┐
+│  Concurrency Queue (command-queue.ts)                               │
+│                                                                     │
+│  Lane: "subagent", max 10 concurrent (configurable)                 │
+│  Queued runs wait for a slot before start() is called               │
+└──────────┬──────────────────────────────────────────────────────────┘
+           │ slot acquired
+           ▼
+┌─────────────────────────────────────────────────────────────────────┐
+│  Child Agent Execution                                              │
+│                                                                     │
+│  ┌───────────────────────────────────────────────────────────────┐  │
+│  │ AsyncAgent (async-agent.ts)                                   │  │
+│  │ - Isolated session with restricted tools (isSubagent=true)    │  │
+│  │ - Inherits parent's LLM provider                              │  │
+│  │ - System prompt: task focus + error reporting rules           │  │
+│  │ - Tracks lastRunError for error propagation                   │  │
+│  └───────────────────────────────────────────────────────────────┘  │
+│                                                                     │
+│  ┌───────────────────────────────────────────────────────────────┐  │
+│  │ watchChildAgent (registry.ts)                                 │  │
+│  │ - Sets startedAt, starts timeout timer                        │  │
+│  │ - waitForIdle(): waits for child's task queue to drain        │  │
+│  │ - onClose(): handles explicit close (timeout kill, etc.)      │  │
+│  └───────────────────────────────────────────────────────────────┘  │
+└──────────┬──────────────────────────────────────────────────────────┘
+           │
+           │ child completes / errors / times out
+           ▼
+┌─────────────────────────────────────────────────────────────────────┐
+│  Completion Handling (registry.ts)                                  │
+│                                                                     │
+│  handleRunCompletion(record)                                        │
+│  │                                                                  │
+│  ├─ Phase 1: captureFindings()                                      │
+│  │    - Read last assistant reply from child session JSONL          │
+│  │    - Falls back to last toolResult if no assistant text          │
+│  │    - Persists findings to record before session deletion         │
+│  │                                                                  │
+│  ├─ Session Cleanup                                                 │
+│  │    - cleanup="delete": rm child session dir + hub.closeAgent()   │
+│  │    - cleanup="keep": preserve for audit                          │
+│  │                                                                  │
+│  └─ Phase 2: checkAndAnnounce(requesterSessionId)                   │
+│       - Finds all unannounced, completed runs with findings         │
+│       - Calls runCoalescedAnnounceFlow()                            │
+│       - Marks records: announced=true, archiveAtMs=now+60min        │
+└──────────┬──────────────────────────────────────────────────────────┘
+           │
+           ▼
+┌─────────────────────────────────────────────────────────────────────┐
+│  Announcement Delivery (announce.ts)                                │
+│                                                                     │
+│  runCoalescedAnnounceFlow(requesterSessionId, records)              │
+│  │                                                                  │
+│  ├─ Format message: formatCoalescedAnnouncementMessage()            │
+│  │    - Single record: task name, status, findings, stats           │
+│  │    - Multiple records: combined report with all findings         │
+│  │                                                                  │
+│  ├─ Two-tier delivery:                                              │
+│  │                                                                  │
+│  │  Tier 1: BUSY (parent running or has pending writes)             │
+│  │    └─ enqueueAnnounce() → announce-queue.ts                      │
+│  │       - Debounce 1s to batch nearby completions                  │
+│  │       - Drain via writeInternal() when parent finishes           │
+│  │                                                                  │
+│  │  Tier 2: IDLE (parent not running)                               │
+│  │    └─ sendAnnounceDirect()                                       │
+│  │       - writeInternal(msg, { forwardAssistant, persistResponse })│
+│  │                                                                  │
+│  └─ All delivery uses writeInternal() (marks as internal: true)     │
+│     → Prevents announcement from showing as user bubble in UI       │
+│     → LLM processes findings and responds naturally to user         │
+└──────────┬──────────────────────────────────────────────────────────┘
+           │
+           ▼
+┌─────────────────────────────────────────────────────────────────────┐
+│  Record Lifecycle (registry.ts)                                     │
+│                                                                     │
+│  created → startedAt → endedAt → findingsCaptured → announced       │
+│                                                                     │
+│  After announcement:                                                │
+│  - Record kept with archiveAtMs = now + 60 min                      │
+│  - sessions_list can still query records during this window         │
+│  - Sweeper runs every 60s, removes expired records                  │
+│  - When all records removed, sweeper stops                          │
+└─────────────────────────────────────────────────────────────────────┘
+```
+
+## Key Files
+
+| File | Purpose |
+|------|---------|
+| `sessions-spawn.ts` | Tool: spawns a child agent with task, label, timeout, provider |
+| `sessions-list.ts` | Tool: lists subagent runs and their status |
+| `registry.ts` | Lifecycle management: register, watch, capture, announce, archive |
+| `announce.ts` | System prompt builder, findings reader, message formatter, delivery |
+| `announce-queue.ts` | Debounced queue for batching announcements when parent is busy |
+| `command-queue.ts` | Concurrency limiter for subagent lane slots |
+| `lanes.ts` | Lane config: max concurrency (10), default timeout (600s) |
+| `types.ts` | Shared types: SubagentRunRecord, SubagentRunOutcome, etc. |
+| `registry-store.ts` | Persistence: save/load runs to disk for crash recovery |
+
+## Provider Inheritance
+
+Subagents inherit the parent's resolved LLM provider:
+
+```
+runner.ts (resolvedProvider)
+  → toolsOptions.provider
+    → tools.ts (CreateToolsOptions.provider)
+      → sessions-spawn.ts (options.provider)
+        → hub.createSubagent({ provider })
+```
+
+When the user switches providers via UI (`setProvider()`), `toolsOptions.provider` is updated in sync so future spawns use the new provider.
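The sync rule described for `setProvider()` can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in `runner.ts`: the `Runner` class, `createSpawnTool()`, and the `Provider` and `SpawnResult` shapes are hypothetical stand-ins. The only behavior taken from the README is that `setProvider()` updates `toolsOptions.provider` so that later spawns pick up the new provider.

```typescript
// Hypothetical shapes; the real types live in tools.ts and types.ts.
interface Provider { name: string }
interface CreateToolsOptions { provider: Provider }
interface SpawnResult { task: string; childSessionId: string; provider: Provider }

// Hypothetical stand-in for the sessions_spawn tool. It reads the provider
// from the shared options object at call time, not at tool-creation time,
// which is what lets a later setProvider() affect future spawns.
function createSpawnTool(options: CreateToolsOptions): (task: string) => SpawnResult {
  let spawned = 0;
  return (task) => {
    spawned += 1;
    return { task, childSessionId: `child-${spawned}`, provider: options.provider };
  };
}

// Hypothetical parent runner holding the resolved provider and the
// options object that is handed to the tools.
class Runner {
  readonly toolsOptions: CreateToolsOptions;
  private resolvedProvider: Provider;

  constructor(initial: Provider) {
    this.resolvedProvider = initial;
    this.toolsOptions = { provider: initial };
  }

  get provider(): Provider {
    return this.resolvedProvider;
  }

  setProvider(next: Provider): void {
    this.resolvedProvider = next;
    // Keep the tool options in sync so future spawns use the new provider.
    this.toolsOptions.provider = next;
  }
}
```

In this sketch, a spawn made before `setProvider()` keeps the provider it was created with, and a spawn made afterwards receives the new one, because both the runner and the tool hold the same `toolsOptions` object.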
+
+## Error Propagation
+
+```
+Child tool error (e.g., API 401)
+  → Subagent LLM sees error, includes in final message (system prompt rule)
+  → captureFindings() reads final message
+  → Announcement includes error in findings
+  → Parent LLM sees error and can inform user
+
+Child run error (e.g., missing API key for provider)
+  → AsyncAgent._lastRunError set
+  → registry.ts checks childAgent.lastRunError after waitForIdle()
+  → outcome = { status: "error", error: "No API key configured..." }
+  → Announcement: "task failed: No API key configured..."
+```
+
+## Timeout Behavior
+
+Default: 600s (10 min). System prompt guides the parent LLM:
+- Simple tasks: 600s (default)
+- Moderate tasks: 900-1200s (15-20 min)
+- Complex tasks: 1200-1800s (20-30 min)
+
+On timeout:
+1. Timeout timer fires in `watchChildAgent()`
+2. `cleanup({ status: "timeout" })` is called
+3. Child agent is closed via `hub.closeAgent()`
+4. Findings are captured from whatever the child wrote so far
+5. Announcement reports "timed out" with partial findings
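The timeout steps above amount to racing the child's idle signal against a timer. The sketch below illustrates that idea only; it is not the implementation in `registry.ts`. `ChildAgentLike`, `timeoutAfter()`, and `watchWithTimeout()` are hypothetical names, `close()` stands in for `hub.closeAgent()`, and findings capture and announcement are left out. The 600 s default and the `lastRunError` check after `waitForIdle()` come from the README.

```typescript
// Outcome shape assumed for illustration; the real type is SubagentRunOutcome.
type Outcome =
  | { status: "ok" }
  | { status: "error"; error: string }
  | { status: "timeout" };

// Hypothetical minimal view of a child agent.
interface ChildAgentLike {
  waitForIdle(): Promise<void>;
  close(): void; // stand-in for hub.closeAgent()
  lastRunError?: string;
}

const DEFAULT_TIMEOUT_SECONDS = 600; // default from the README

// Returns a promise that resolves to "timeout" after ms, plus a cancel hook.
function timeoutAfter(ms: number): { promise: Promise<"timeout">; cancel: () => void } {
  let cancel: () => void = () => {};
  const promise = new Promise<"timeout">((resolve) => {
    const timer = setTimeout(() => resolve("timeout"), ms);
    cancel = () => clearTimeout(timer);
  });
  return { promise, cancel };
}

async function watchWithTimeout(
  child: ChildAgentLike,
  timeoutSeconds: number = DEFAULT_TIMEOUT_SECONDS,
): Promise<Outcome> {
  const timeout = timeoutAfter(timeoutSeconds * 1000);
  try {
    const idle = child.waitForIdle().then(() => "idle" as const);
    const winner = await Promise.race([idle, timeout.promise]);
    if (winner === "timeout") {
      child.close(); // the child is closed when the timer wins
      return { status: "timeout" };
    }
    // The child went idle: a recorded run error takes precedence over success.
    if (child.lastRunError) {
      return { status: "error", error: child.lastRunError };
    }
    return { status: "ok" };
  } finally {
    timeout.cancel(); // never leave the timer running after the race settles
  }
}
```

Cancelling the timer in `finally` keeps a finished run from holding a pending timer for the rest of its timeout window, which matters when many subagents complete early.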