Commit graph

310 commits

Author SHA1 Message Date
Jiayuan Zhang
e1eaa73e04 fix(agent): infer run-log tool errors from payload 2026-02-17 09:41:37 +08:00
Jiayuan Zhang
9d1ac0049f refactor(protocol): standardize sessionId alias across conversation flows 2026-02-17 09:40:28 +08:00
Jiayuan Zhang
6969790c25 refactor(protocol): deprecate legacy agentId conversation fallback 2026-02-17 09:40:28 +08:00
Jiayuan Zhang
4de89943f2 refactor(session): add agent/conversation hierarchical storage 2026-02-17 09:40:28 +08:00
Jiayuan Zhang
a0bb88e7b7 refactor(hub): enforce conversation-scoped device authorization 2026-02-17 09:39:25 +08:00
Jiayuan Zhang
3123506657 refactor(channels): persist route bindings across restarts 2026-02-17 09:39:25 +08:00
Jiayuan Zhang
dee70ea659 refactor(channels): bind route keys to isolated conversations 2026-02-17 09:39:25 +08:00
Jiayuan Zhang
b7b3d323b8 refactor(hub): decouple agent and conversation runtime model 2026-02-17 09:39:25 +08:00
Jiayuan Zhang
6a778e38e7 test(hub): cover conversation rpc handlers 2026-02-17 09:39:25 +08:00
Jiayuan Zhang
5ccf7bd798 fix(hooks): persist verified main conversation identity 2026-02-17 09:39:25 +08:00
Jiayuan Zhang
3c8569151a refactor(hub): add conversation-first rpc aliases 2026-02-17 09:39:25 +08:00
Jiayuan Zhang
754e604a40 refactor(protocol): add conversationId compatibility across hub/client 2026-02-17 09:39:24 +08:00
Jiayuan Zhang
f4bd5b7bbc
Merge pull request #220 from multica-ai/codex/delegate-progress-timer
feat(desktop): show delegate sub-task progress and running timers
2026-02-17 03:34:52 +08:00
Jiayuan Zhang
d45605283e feat(desktop): show delegate sub-task progress and timers 2026-02-17 03:27:17 +08:00
Jiayuan Zhang
e28ecb9a91
Merge pull request #216 from multica-ai/codex/meta-skill-installer-e2e-skills-benchmark
feat(skills): add ClawHub meta installer and agent-driven E2E benchmark
2026-02-17 02:45:45 +08:00
Jiayuan Zhang
39fde8e4b0
Merge pull request #218 from multica-ai/codex/web-fetch-evidence-coverage
fix(agent): enforce web search fetch evidence coverage
2026-02-17 02:45:12 +08:00
Jiayuan Zhang
4b7f0afb50 fix(agent): guard workaround and local skill mutation commands 2026-02-17 02:37:29 +08:00
Jiayuan Zhang
6fd4819280 fix(agent): surface installed skill ids in prompt 2026-02-17 02:37:29 +08:00
Jiayuan Zhang
7eb18f47fc fix(agent): enforce capability-gap skill recovery guidance 2026-02-17 02:37:29 +08:00
Jiayuan Zhang
850d55336a fix(agent): enforce sufficient search-fetch evidence 2026-02-17 02:08:15 +08:00
Jiayuan Zhang
b5b65c6bae fix(agent): enforce cross-turn web fetch evidence 2026-02-17 01:48:53 +08:00
Jiayuan Zhang
6e71598c2c
Merge pull request #215 from multica-ai/codex/docs-prune-and-regenerate-core-docs
docs: prune stale docs and regenerate prioritized core docs
2026-02-17 01:26:21 +08:00
Jiayuan Zhang
fc8a813120
Merge pull request #214 from multica-ai/codex/chat-context-window-indicator
feat(chat): add context window usage indicator
2026-02-17 00:55:09 +08:00
Jiayuan Zhang
ce6291e9eb fix(agent): enforce web_fetch after successful web_search 2026-02-17 00:49:57 +08:00
Jiayuan Zhang
ecb0cd392e chore(docs): remove non-e2e documentation 2026-02-17 00:46:36 +08:00
Jiayuan Zhang
ec8b62cef1 feat(chat): add context window usage indicator 2026-02-17 00:38:17 +08:00
Jiayuan Zhang
909efb5dab refactor(core): remove legacy subagent registry subsystem 2026-02-17 00:07:15 +08:00
Jiayuan Zhang
43198d9dcc feat(core): add rpc to generate channel welcome messages 2026-02-16 12:24:24 +08:00
Jiayuan Zhang
357bf326e0 fix(data): propagate errors so is_error is set correctly in run-log
Previously the data tool caught all errors and returned them as normal
tool results with error info in the JSON content. This meant pi-agent-core
never saw an exception and always set isError=false in the run-log, even
for rate limit errors (errCode 9001) and other API failures.

Now errors propagate to pi-agent-core which sets isError=true and formats
the error message for the LLM automatically.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-16 03:39:11 +08:00
Jiayuan Zhang
9c8be30d3d fix(test): increase timeout for summary fallback artifact extraction test
UC4 test times out in CI (5s default) because generateSummary's API
provider layer takes longer to fail on slow CI runners. Increase to 15s.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-16 01:10:18 +08:00
Jiayuan Zhang
aada2916f4 fix(agent): clear timeout timer in delegate tool to prevent unhandled rejection
The setTimeout in runSubagentTask was never cleared when childAgent.run()
completed before the timeout. The dangling timer would later reject an
unobserved promise, causing an unhandled promise rejection crash in Node.js
v15+. Capture the timer and clear it in a .finally() block.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-16 01:09:21 +08:00
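Note on aada2916f4: the fix pattern described above (capture the timer, clear it in `.finally()`) can be sketched as follows. This is a minimal illustration, not the project's `runSubagentTask`; the helper name `runWithTimeout` and its signature are assumptions made for the example.

```typescript
// Hypothetical helper, for illustration only.
// Runs `task` with a deadline and always clears the timer afterwards,
// so no timer is left pending once the task has settled.
function runWithTimeout<T>(task: () => Promise<T>, timeoutMs: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;

  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`timed out after ${timeoutMs}ms`)),
      timeoutMs,
    );
  });

  // Whichever promise settles first wins; the finally() handler runs on
  // both success and failure and cancels the pending timer.
  return Promise.race([task(), timeout]).finally(() => {
    clearTimeout(timer);
  });
}
```

Clearing in `.finally()` rather than in the success path alone means the timer is also cancelled when the task itself rejects.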
Jiayuan Zhang
f60551195a chore(agent): remove old sessions_spawn/sessions_list tools and update references
Delete sessions-spawn.ts, sessions-list.ts and their tests. Update CLI
to remove waitForSubagents polling workaround (delegate is synchronous).
Update UI, desktop IPC, SWE-bench, and system prompt tests to use the
new delegate tool name.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-16 01:09:21 +08:00
Jiayuan Zhang
d3ef8ecc31 feat(agent): replace sessions_spawn with synchronous delegate tool
Replace the async sessions_spawn/sessions_list sub-agent system with a
single synchronous `delegate` tool. The new tool runs tasks in parallel
via Promise.all with per-task timeout, returning combined results directly
in the tool response. This eliminates the need for registry, announce queue,
persistence, and Hub involvement.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-16 01:09:21 +08:00
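Note on d3ef8ecc31: a rough sketch of running delegated tasks in parallel via `Promise.all` with a per-task timeout and returning the combined results. The types `DelegateTask` and `DelegateResult` and the function `delegateAll` are hypothetical names chosen for this example; the real `delegate` tool's shape is not shown in the commit message.

```typescript
// Hypothetical types, for illustration only.
interface DelegateTask {
  id: string;
  run: () => Promise<string>;
}

interface DelegateResult {
  id: string;
  ok: boolean;
  output: string; // task output on success, error message on failure
}

// Runs all tasks concurrently. Each task gets its own deadline, and a
// failing or timed-out task is reported in its result instead of
// rejecting the whole batch.
async function delegateAll(
  tasks: DelegateTask[],
  timeoutMs: number,
): Promise<DelegateResult[]> {
  return Promise.all(
    tasks.map(async (task): Promise<DelegateResult> => {
      let timer: ReturnType<typeof setTimeout> | undefined;
      const timeout = new Promise<never>((_, reject) => {
        timer = setTimeout(
          () => reject(new Error(`task ${task.id} timed out`)),
          timeoutMs,
        );
      });
      try {
        const output = await Promise.race([task.run(), timeout]);
        return { id: task.id, ok: true, output };
      } catch (err) {
        const message = err instanceof Error ? err.message : String(err);
        return { id: task.id, ok: false, output: message };
      } finally {
        clearTimeout(timer);
      }
    }),
  );
}
```

Because `Promise.all` preserves input order, the combined results line up with the submitted tasks without any registry or queue.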
Jiayuan Zhang
94ae88ed8b
Merge pull request #208 from multica-ai/forrestchang/compaction-audit
Context window: 4-phase compaction improvements
2026-02-16 00:01:58 +08:00
Jiayuan Zhang
0bce493e10 fix(compaction): handle pi-agent-core toolResult format in truncation and pruning
The pre-emptive truncation, tool result pruning, and summary fallback
only checked for Anthropic-style `role: "user"` messages with
`type: "tool_result"` blocks. The actual runtime uses pi-agent-core
format with `role: "toolResult"`, `toolCallId`, and `toolName` on the
message itself. This caused truncation and pruning to silently skip
all tool results in real agent runs.

Add handlers for the pi-agent-core format in all four affected modules:
- session-manager.ts: check both "user" and "toolResult" roles
- tool-result-truncation.ts: new handler for toolResult format
- tool-result-pruning.ts: new processToolResultMessage() + updated loops
- summary-fallback.ts: include "toolResult" in artifact ref extraction

Verified via agent-driven E2E tests (5 test sessions, 6 artifacts).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 23:39:52 +08:00
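Note on 0bce493e10: the two message shapes named above can be checked with a single predicate. This is a sketch under assumptions: the `SessionMessage` type is simplified, the `content` field on the `toolResult` shape is a guess, and `carriesToolResult` is a hypothetical name rather than a function from the affected modules.

```typescript
// Simplified, hypothetical message shapes, for illustration only.
type ContentBlock = { type: string };

type SessionMessage =
  | { role: "user" | "assistant"; content: string | ContentBlock[] }
  | { role: "toolResult"; toolCallId: string; toolName: string; content: unknown };

// Returns true when the message carries a tool result in either format:
// the pi-agent-core style (role "toolResult") or the Anthropic style
// (role "user" with at least one "tool_result" content block).
function carriesToolResult(msg: SessionMessage): boolean {
  if (msg.role === "toolResult") return true;
  if (msg.role === "user" && Array.isArray(msg.content)) {
    return msg.content.some((block) => block.type === "tool_result");
  }
  return false;
}
```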
Jiayuan Zhang
b15e1eeb2a test(compaction): harden E2E integration tests for artifact pipeline
- Add real user messages for bootstrap protection in pruning tests
- Fix artifact directory path assertions (baseDir vs sessions/baseDir)
- Add cross-phase tests (Phase 1 truncation → Phase 2 pruning)
- Remove conditional assertion guards that could silently skip checks
- All 30 E2E integration tests now pass with mandatory assertions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 23:13:12 +08:00
Jiayuan Zhang
58f02a2080 fix(compaction): match artifact refs from both soft trim and truncation markers
The extractArtifactRef regex only matched "Full result saved to" (from
pre-emptive truncation) but not "Full result available at" (from soft
trim). This caused hard clear to lose artifact references when preceded
by soft trim in the same pruning pass.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 23:13:05 +08:00
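Note on 58f02a2080: one way to match both marker phrasings with a single pattern. Only the two phrases come from the commit message; the optional colon, the whitespace, and the assumption that the reference is a run of non-whitespace characters are guesses, and `extractArtifactRefSketch` is a hypothetical stand-in for the real `extractArtifactRef`.

```typescript
// Matches "Full result saved to <ref>" (pre-emptive truncation) and
// "Full result available at <ref>" (soft trim). The text after the
// phrase is assumed to be an optional colon, whitespace, and a
// whitespace-free reference.
const ARTIFACT_REF_PATTERN = /Full result (?:saved to|available at):?\s+(\S+)/;

// Hypothetical helper, for illustration only.
function extractArtifactRefSketch(text: string): string | null {
  const match = ARTIFACT_REF_PATTERN.exec(text);
  return match?.[1] ?? null;
}
```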
Jiayuan Zhang
a1ac250e2b feat(system-prompt): enhance report with truncation tracking and token estimates
SectionReport now includes truncated/originalChars fields for budget-controlled
sections. formatPromptReport shows estimated token count and truncation details.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 23:02:41 +08:00
Jiayuan Zhang
c3433871a6 feat(system-prompt): add bootstrap budget control for workspace and skills
Workspace.md content is capped at 20k chars and skills prompt at 12k chars.
Oversized content is intelligently truncated (head 70% + marker + tail 20%)
with newline-boundary snapping. Inspired by OpenClaw's bootstrap budget system.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 23:02:34 +08:00
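Note on c3433871a6: a minimal sketch of the head 70% + marker + tail 20% truncation with newline-boundary snapping. The function name `truncateToBudget`, the marker text, and the exact snapping rules are assumptions for illustration; only the 70/20 split and the idea of snapping to newlines come from the commit message.

```typescript
// Hypothetical helper, for illustration only.
// Keeps roughly the first 70% and last 20% of the budget and replaces
// the middle with a marker. Both cut points are moved to a newline so
// that no line is split in half.
function truncateToBudget(
  text: string,
  maxChars: number,
  marker = "\n[... truncated ...]\n",
): string {
  if (text.length <= maxChars) return text;

  let headEnd = Math.floor(maxChars * 0.7);
  let tailStart = text.length - Math.floor(maxChars * 0.2);

  // Snap the head back to the last newline at or before the cut point.
  const lastNewline = text.lastIndexOf("\n", headEnd);
  if (lastNewline > 0) headEnd = lastNewline;

  // Snap the tail forward to the start of the next full line.
  const nextNewline = text.indexOf("\n", tailStart);
  if (nextNewline !== -1) tailStart = nextNewline + 1;

  return text.slice(0, headEnd) + marker + text.slice(tailStart);
}
```

Snapping only ever shortens the kept head and tail, so the result stays within the 70% and 20% shares plus the marker length.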
Jiayuan Zhang
5aa8a52784 feat(compaction): make pruning and summary artifact-aware
Soft trim and hard clear now detect and preserve artifact references
in their markers. Summary instructions include guidance to note artifact
paths. Plain-text fallback extracts and lists all artifact references
in a "Saved Artifacts" section.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 23:02:27 +08:00
Jiayuan Zhang
3f9a30423d feat(session): add artifact storage and pre-emptive tool result truncation
Oversized tool results (>30% of context window) are now saved as artifacts
before being truncated in the session. The LLM sees a truncated version with
head+tail preservation and a marker pointing to the full artifact file,
which it can re-read on demand. This prevents information loss during
context window management.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 23:02:18 +08:00
Jiayuan Zhang
b8fb671a4b feat(agent): add reply context and responsiveness guidance to channel system prompt
Add instructions for the agent to understand [Replying to: "..."] annotations
and to send brief acknowledgments before tool calls when messages come from
messaging channels.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 21:39:45 +08:00
Jiayuan Zhang
b412ca902b refactor(compaction): remove dead code and legacy count mode
- Remove compactMessagesWithSummary (~100 lines, never called; only
  the Chunked variant was used)
- Remove compactMessagesByCount, findSafeCompactionPoint, and all
  count-mode references (~90 lines)
- Narrow CompactionResult.reason to "tokens" | "summary" | "pruning"
- Narrow compactionMode to "tokens" | "summary" (was "count" | ...)
- Simplify session-manager: remove maxMessages/keepLast params,
  enable tool result pruning by default

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 21:37:23 +08:00
Jiayuan Zhang
92cf312843 refactor(compaction): remove pre-flight tool result pruning
Pre-flight compaction runs in-memory only (not persisted), so tool
result pruning in this path was wasted work — results were thrown away
after the LLM call. Post-turn compaction still handles pruning and
persists the results. Only Phase 2 (emergency message drop) remains
as a safety net in pre-flight.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 21:37:15 +08:00
Jiayuan Zhang
fbb0b11c6e fix(compaction): fix system prompt token estimation and reduce safety margin
- estimateSystemPromptTokens now uses estimateTokens() (chars/4) instead
  of chars/2, eliminating the 2x overestimate that caused pre-flight
  compaction to fire on every LLM call at small context windows
- ESTIMATION_SAFETY_MARGIN reduced from 1.5 to 1.2, increasing usable
  context from ~53% to ~73% before compaction triggers

At 200k context, effective usable tokens before compaction improved from
~86k to ~120k message tokens (39% increase).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 21:37:09 +08:00
Jiayuan Zhang
40a2e8ae55 fix(context-window): prioritize config over model for context window resolution
resolveContextWindowInfo now uses config > model > default priority so
explicit --context-window flag overrides model defaults. Also adds
--context-window CLI option to the run command.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 21:37:02 +08:00
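Note on 40a2e8ae55: the config > model > default priority can be sketched as below. The function name `resolveContextWindowSketch`, the returned shape, and the default value are assumptions for illustration and do not reflect the real `resolveContextWindowInfo` signature.

```typescript
type ContextWindowSource = "config" | "model" | "default";

interface ContextWindowChoice {
  tokens: number;
  source: ContextWindowSource;
}

// Assumed fallback value, for illustration only.
const ASSUMED_DEFAULT_CONTEXT_WINDOW = 200_000;

// Hypothetical helper: an explicit config value (for example from a
// --context-window flag) wins over the model's advertised window,
// which wins over the built-in default.
function resolveContextWindowSketch(
  configTokens?: number,
  modelTokens?: number,
): ContextWindowChoice {
  if (configTokens !== undefined && configTokens > 0) {
    return { tokens: configTokens, source: "config" };
  }
  if (modelTokens !== undefined && modelTokens > 0) {
    return { tokens: modelTokens, source: "model" };
  }
  return { tokens: ASSUMED_DEFAULT_CONTEXT_WINDOW, source: "default" };
}
```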
Jiayuan Zhang
71f95d042a fix(agent): prevent double-save of assistant message on abort
Track the last assistant message saved by the message_end event handler
and skip saving it again in the abort handler. This prevents the
duplicate assistant entries in session.jsonl that caused the
"tool_call_id is not found" bug.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 21:31:36 +08:00
Jiayuan Zhang
fa2616c390 fix(session): drop duplicate assistant messages in transcript repair
When a session is aborted mid-tool-execution, the assistant message can
be persisted twice (once by message_end, once by the abort handler).
The repair logic failed to handle this: it generated a synthetic tool
result for the first copy but deduplicated the result for the second,
leaving an orphaned tool call that caused "tool_call_id is not found"
errors on all subsequent API calls.

Detect and remove duplicate assistant messages whose tool call IDs
have all already been paired with results from an earlier copy.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 21:31:30 +08:00
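Note on fa2616c390: the repair rule in the last paragraph (drop an assistant message whose tool call IDs have all already been paired with results from an earlier copy) might look roughly like this. The `TranscriptMessage` shape and the function `dropDuplicateAssistants` are simplified, hypothetical stand-ins, not the project's transcript repair code.

```typescript
// Simplified, hypothetical transcript shape, for illustration only.
type TranscriptMessage =
  | { role: "user"; text: string }
  | { role: "assistant"; toolCallIds: string[] }
  | { role: "toolResult"; toolCallId: string };

function dropDuplicateAssistants(messages: TranscriptMessage[]): TranscriptMessage[] {
  const pending = new Set<string>(); // tool calls still waiting for a result
  const paired = new Set<string>(); // tool calls that already have a result
  const repaired: TranscriptMessage[] = [];

  for (const msg of messages) {
    if (msg.role === "assistant") {
      const ids = msg.toolCallIds;
      // A later copy whose calls were all answered already is a duplicate.
      if (ids.length > 0 && ids.every((id) => paired.has(id))) continue;
      for (const id of ids) pending.add(id);
    } else if (msg.role === "toolResult") {
      if (pending.delete(msg.toolCallId)) paired.add(msg.toolCallId);
    }
    repaired.push(msg);
  }
  return repaired;
}
```

Assistant messages without tool calls are never dropped by this rule, since there is nothing to pair them against.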
Jiayuan Zhang
084657868f revert(agent): remove parallel tool execution patch, keep serial
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 20:43:37 +08:00
Jiayuan Zhang
b394b0ccf9 fix(skills): use python3 and inject skill directory path into prompt
- SKILL.md: python → python3 (macOS has no `python` binary)
- skills/index.ts: inject skill directory path so agent can resolve
  relative paths like scripts/recalc.py to absolute paths

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 19:53:18 +08:00