docs(subagent): add architecture flowchart and README
Document the full subagent lifecycle: spawn, concurrency queue, execution, completion handling, two-tier announcement delivery, and record archival. Include provider inheritance chain and error propagation diagrams. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
parent
58ec56234c
commit
d01fcffe32
1 changed files with 172 additions and 0 deletions
172
packages/core/src/agent/subagent/README.md
Normal file
172
packages/core/src/agent/subagent/README.md
Normal file
|
|
@ -0,0 +1,172 @@
|
|||
# Subagent System
|
||||
|
||||
The subagent system allows a parent agent to spawn isolated child agents that run tasks in parallel and report results back automatically.
|
||||
|
||||
## Architecture Overview
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────────────┐
|
||||
│ Parent Agent (runner.ts) │
|
||||
│ │
|
||||
│ tools: sessions_spawn, sessions_list │
|
||||
│ state: resolvedProvider, toolsOptions │
|
||||
└──────────┬──────────────────────────────────────────────────────────┘
|
||||
│
|
||||
│ sessions_spawn(task, label, timeoutSeconds)
|
||||
▼
|
||||
┌─────────────────────────────────────────────────────────────────────┐
|
||||
│ Spawn Flow (sessions-spawn.ts) │
|
||||
│ │
|
||||
│ 1. Build subagent system prompt (announce.ts) │
|
||||
│ 2. hub.createSubagent(childSessionId, { provider, model }) │
|
||||
│ 3. registerSubagentRun({ start: () => childAgent.write(task) }) │
|
||||
│ 4. Return { status: "accepted", runId, childSessionId } │
|
||||
└──────────┬──────────────────────────────────────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌─────────────────────────────────────────────────────────────────────┐
|
||||
│ Concurrency Queue (command-queue.ts) │
|
||||
│ │
|
||||
│ Lane: "subagent" — max 10 concurrent (configurable) │
|
||||
│ Queued runs wait for a slot before start() is called │
|
||||
└──────────┬──────────────────────────────────────────────────────────┘
|
||||
│ slot acquired
|
||||
▼
|
||||
┌─────────────────────────────────────────────────────────────────────┐
|
||||
│ Child Agent Execution │
|
||||
│ │
|
||||
│ ┌───────────────────────────────────────────────────────────────┐ │
|
||||
│ │ AsyncAgent (async-agent.ts) │ │
|
||||
│ │ - Isolated session with restricted tools (isSubagent=true) │ │
|
||||
│ │ - Inherits parent's LLM provider │ │
|
||||
│ │ - System prompt: task focus + error reporting rules │ │
|
||||
│ │ - Tracks lastRunError for error propagation │ │
|
||||
│ └───────────────────────────────────────────────────────────────┘ │
|
||||
│ │
|
||||
│ ┌───────────────────────────────────────────────────────────────┐ │
|
||||
│ │ watchChildAgent (registry.ts) │ │
|
||||
│ │ - Sets startedAt, starts timeout timer │ │
|
||||
│ │ - waitForIdle() — waits for child's task queue to drain │ │
|
||||
│ │ - onClose() — handles explicit close (timeout kill, etc.) │ │
|
||||
│ └───────────────────────────────────────────────────────────────┘ │
|
||||
└──────────┬──────────────────────────────────────────────────────────┘
|
||||
│
|
||||
│ child completes / errors / times out
|
||||
▼
|
||||
┌─────────────────────────────────────────────────────────────────────┐
|
||||
│ Completion Handling (registry.ts) │
|
||||
│ │
|
||||
│ handleRunCompletion(record) │
|
||||
│ │ │
|
||||
│ ├─ Phase 1: captureFindings() │
|
||||
│ │ - Read last assistant reply from child session JSONL │
|
||||
│ │ - Falls back to last toolResult if no assistant text │
|
||||
│ │ - Persists findings to record before session deletion │
|
||||
│ │ │
|
||||
│ ├─ Session Cleanup │
|
||||
│ │ - cleanup="delete": rm child session dir + hub.closeAgent() │
|
||||
│ │ - cleanup="keep": preserve for audit │
|
||||
│ │ │
|
||||
│ └─ Phase 2: checkAndAnnounce(requesterSessionId) │
|
||||
│ - Finds all unannounced, completed runs with findings │
|
||||
│ - Calls runCoalescedAnnounceFlow() │
|
||||
│ - Marks records: announced=true, archiveAtMs=now+60min │
|
||||
└──────────┬──────────────────────────────────────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌─────────────────────────────────────────────────────────────────────┐
|
||||
│ Announcement Delivery (announce.ts) │
|
||||
│ │
|
||||
│ runCoalescedAnnounceFlow(requesterSessionId, records) │
|
||||
│ │ │
|
||||
│ ├─ Format message: formatCoalescedAnnouncementMessage() │
|
||||
│ │ - Single record: task name, status, findings, stats │
|
||||
│ │ - Multiple records: combined report with all findings │
|
||||
│ │ │
|
||||
│ ├─ Two-tier delivery: │
|
||||
│ │ │
|
||||
│ │ Tier 1: BUSY (parent running or has pending writes) │
|
||||
│ │ └─ enqueueAnnounce() → announce-queue.ts │
|
||||
│ │ - Debounce 1s to batch nearby completions │
|
||||
│ │ - Drain via writeInternal() when parent finishes │
|
||||
│ │ │
|
||||
│ │ Tier 2: IDLE (parent not running) │
|
||||
│ │ └─ sendAnnounceDirect() │
|
||||
│ │ - writeInternal(msg, { forwardAssistant, persistResponse })│
|
||||
│ │ │
|
||||
│ └─ All delivery uses writeInternal() (marks as internal: true) │
|
||||
│ → Prevents announcement from showing as user bubble in UI │
|
||||
│ → LLM processes findings and responds naturally to user │
|
||||
└──────────┬──────────────────────────────────────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌─────────────────────────────────────────────────────────────────────┐
|
||||
│ Record Lifecycle (registry.ts) │
|
||||
│ │
|
||||
│ created → startedAt → endedAt → findingsCaptured → announced │
|
||||
│ │
|
||||
│ After announcement: │
|
||||
│ - Record kept with archiveAtMs = now + 60 min │
|
||||
│ - sessions_list can still query records during this window │
|
||||
│ - Sweeper runs every 60s, removes expired records │
|
||||
│ - When all records removed, sweeper stops │
|
||||
└─────────────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
## Key Files
|
||||
|
||||
| File | Purpose |
|
||||
|------|---------|
|
||||
| `sessions-spawn.ts` | Tool: spawns a child agent with task, label, timeout, provider |
|
||||
| `sessions-list.ts` | Tool: lists subagent runs and their status |
|
||||
| `registry.ts` | Lifecycle management: register, watch, capture, announce, archive |
|
||||
| `announce.ts` | System prompt builder, findings reader, message formatter, delivery |
|
||||
| `announce-queue.ts` | Debounced queue for batching announcements when parent is busy |
|
||||
| `command-queue.ts` | Concurrency limiter for subagent lane slots |
|
||||
| `lanes.ts` | Lane config: max concurrency (10), default timeout (600s) |
|
||||
| `types.ts` | Shared types: SubagentRunRecord, SubagentRunOutcome, etc. |
|
||||
| `registry-store.ts` | Persistence: save/load runs to disk for crash recovery |
|
||||
|
||||
## Provider Inheritance
|
||||
|
||||
Subagents inherit the parent's resolved LLM provider:
|
||||
|
||||
```
|
||||
runner.ts (resolvedProvider)
|
||||
→ toolsOptions.provider
|
||||
→ tools.ts (CreateToolsOptions.provider)
|
||||
→ sessions-spawn.ts (options.provider)
|
||||
→ hub.createSubagent({ provider })
|
||||
```
|
||||
|
||||
When the user switches providers via UI (`setProvider()`), `toolsOptions.provider` is updated in sync so future spawns use the new provider.
|
||||
|
||||
## Error Propagation
|
||||
|
||||
```
|
||||
Child tool error (e.g., API 401)
|
||||
→ Subagent LLM sees error, includes in final message (system prompt rule)
|
||||
→ captureFindings() reads final message
|
||||
→ Announcement includes error in findings
|
||||
→ Parent LLM sees error and can inform user
|
||||
|
||||
Child run error (e.g., missing API key for provider)
|
||||
→ AsyncAgent._lastRunError set
|
||||
→ registry.ts checks childAgent.lastRunError after waitForIdle()
|
||||
→ outcome = { status: "error", error: "No API key configured..." }
|
||||
→ Announcement: "task failed: No API key configured..."
|
||||
```
|
||||
|
||||
## Timeout Behavior
|
||||
|
||||
Default: 600s (10 min). System prompt guides the parent LLM:
|
||||
- Simple tasks: 600s (default)
|
||||
- Moderate tasks: 900-1200s (15-20 min)
|
||||
- Complex tasks: 1200-1800s (20-30 min)
|
||||
|
||||
On timeout:
|
||||
1. Timeout timer fires in `watchChildAgent()`
|
||||
2. `cleanup({ status: "timeout" })` is called
|
||||
3. Child agent is closed via `hub.closeAgent()`
|
||||
4. Findings are captured from whatever the child wrote so far
|
||||
5. Announcement reports "timed out" with partial findings
|
||||
Loading…
Add table
Add a link
Reference in a new issue