Previously the data tool caught all errors and returned them as normal
tool results with error info in the JSON content. This meant pi-agent-core
never saw an exception and always set isError=false in the run-log, even
for rate limit errors (errCode 9001) and other API failures.
Now errors propagate to pi-agent-core which sets isError=true and formats
the error message for the LLM automatically.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The UC4 test times out in CI (5s default) because generateSummary's API
provider layer takes longer to fail on slow CI runners. Increase the test
timeout to 15s.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The setTimeout in runSubagentTask was never cleared when childAgent.run()
completed before the timeout. The dangling timer would later reject an
unobserved promise, causing an unhandled promise rejection crash in Node.js
v15+. Capture the timer and clear it in a .finally() block.
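A minimal sketch of the capture-and-clear pattern, assuming a generic helper. `withTimeout` is a hypothetical name; the real runSubagentTask and childAgent.run() are not shown in this log and may be structured differently.

```typescript
// Hypothetical helper illustrating the pattern, not the project's code.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    // Capture the handle so it can be cancelled later
    timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
  });
  // finally() runs whether `work` wins or the timeout wins, so the timer
  // never outlives the race
  return Promise.race([work, timeout]).finally(() => {
    if (timer !== undefined) clearTimeout(timer);
  });
}
```

Clearing in `.finally()` rather than `.then()` covers the case where the work itself rejects.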
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Delete sessions-spawn.ts, sessions-list.ts and their tests. Update CLI
to remove waitForSubagents polling workaround (delegate is synchronous).
Update UI, desktop IPC, SWE-bench, and system prompt tests to use the
new delegate tool name.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace the async sessions_spawn/sessions_list sub-agent system with a
single synchronous `delegate` tool. The new tool runs tasks in parallel
via Promise.all with per-task timeout, returning combined results directly
in the tool response. This eliminates the need for registry, announce queue,
persistence, and Hub involvement.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The pre-emptive truncation, tool result pruning, and summary fallback
only checked for Anthropic-style `role: "user"` messages with
`type: "tool_result"` blocks. The actual runtime uses pi-agent-core
format with `role: "toolResult"`, `toolCallId`, and `toolName` on the
message itself. This caused truncation and pruning to silently skip
all tool results in real agent runs.
Add handlers for the pi-agent-core format in all four affected modules:
- session-manager.ts: check both "user" and "toolResult" roles
- tool-result-truncation.ts: new handler for toolResult format
- tool-result-pruning.ts: new processToolResultMessage() + updated loops
- summary-fallback.ts: include "toolResult" in artifact ref extraction
Verified via agent-driven E2E tests (5 test sessions, 6 artifacts).
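The dual-format check described above could look roughly like this. The message shapes are assumptions reconstructed from the description, and `carriesToolResult` is a hypothetical name rather than a function from the affected modules.

```typescript
// Assumed shapes; the real pi-agent-core and Anthropic types carry more fields.
type ContentBlock = { type: string; [key: string]: unknown };

interface Message {
  role: string;
  content: string | ContentBlock[];
  toolCallId?: string;
  toolName?: string;
}

function carriesToolResult(msg: Message): boolean {
  // pi-agent-core format: the role itself marks a tool result
  if (msg.role === "toolResult") return true;
  // Anthropic format: a user message holding tool_result blocks
  return (
    msg.role === "user" &&
    Array.isArray(msg.content) &&
    msg.content.some((block) => block.type === "tool_result")
  );
}
```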
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add real user messages for bootstrap protection in pruning tests
- Fix artifact directory path assertions (baseDir vs sessions/baseDir)
- Add cross-phase tests (Phase 1 truncation → Phase 2 pruning)
- Remove conditional assertion guards that could silently skip checks
- All 30 E2E integration tests now pass with mandatory assertions
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The extractArtifactRef regex only matched "Full result saved to" (from
pre-emptive truncation) but not "Full result available at" (from soft
trim). This caused hard clear to lose artifact references when preceded
by soft trim in the same pruning pass.
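As an illustration, a pattern accepting both phrasings might look like the sketch below. Only the two marker phrases come from this message; the path capture and the optional colon are assumptions about the marker layout.

```typescript
// Hypothetical reconstruction; the real marker text after the phrase may differ.
const ARTIFACT_REF = /Full result (?:saved to|available at):?\s*([^\s\])]+)/;

function extractArtifactRef(text: string): string | undefined {
  const match = ARTIFACT_REF.exec(text);
  // Group 1 is the path-like token following either phrase
  return match ? match[1] : undefined;
}
```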
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
SectionReport now includes truncated/originalChars fields for budget-controlled
sections. formatPromptReport shows estimated token count and truncation details.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Workspace.md content is capped at 20k chars and the skills prompt at 12k chars.
Oversized content is truncated to its head (70% of the budget) and tail (20%),
joined by a marker, with both cut points snapped to newline boundaries.
Inspired by OpenClaw's bootstrap budget system.
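A rough sketch of head/tail truncation with newline snapping. The 70/20 split comes from this message; the function name, marker wording, and exact snapping rule are assumptions, and the result can slightly exceed the budget because the marker length is not bounded here.

```typescript
// Illustrative only: marker text and snapping rule are assumptions.
function truncateWithBudget(text: string, maxChars: number): string {
  if (text.length <= maxChars) return text;
  let headEnd = Math.floor(maxChars * 0.7);
  let tailStart = text.length - Math.floor(maxChars * 0.2);
  // Snap the head back so it ends just after the last newline inside it
  const headBreak = text.lastIndexOf("\n", headEnd - 1);
  if (headBreak > 0) headEnd = headBreak + 1;
  // Snap the tail forward so it starts at the beginning of a full line
  const tailBreak = text.indexOf("\n", tailStart);
  if (tailBreak !== -1 && tailBreak < text.length - 1) tailStart = tailBreak + 1;
  const omitted = tailStart - headEnd;
  return `${text.slice(0, headEnd)}[... ${omitted} chars truncated ...]\n${text.slice(tailStart)}`;
}
```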
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Soft trim and hard clear now detect and preserve artifact references
in their markers. Summary instructions include guidance to note artifact
paths. Plain-text fallback extracts and lists all artifact references
in a "Saved Artifacts" section.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Oversized tool results (>30% of context window) are now saved as artifacts
before being truncated in the session. The LLM sees a truncated version with
head+tail preservation and a marker pointing to the full artifact file,
which it can re-read on demand. This prevents information loss during
context window management.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add instructions for the agent to understand [Replying to: "..."] annotations
and to send brief acknowledgments before tool calls when messages come from
messaging channels.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove compactMessagesWithSummary (~100 lines, never called; only
the Chunked variant was used)
- Remove compactMessagesByCount, findSafeCompactionPoint, and all
count-mode references (~90 lines)
- Narrow CompactionResult.reason to "tokens" | "summary" | "pruning"
- Narrow compactionMode to "tokens" | "summary" (was "count" | ...)
- Simplify session-manager: remove maxMessages/keepLast params,
enable tool result pruning by default
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pre-flight compaction runs in-memory only (not persisted), so tool
result pruning in this path was wasted work — results were thrown away
after the LLM call. Post-turn compaction still handles pruning and
persists the results. Only Phase 2 (emergency message drop) remains
as a safety net in pre-flight.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- estimateSystemPromptTokens now uses estimateTokens() (chars/4) instead
of chars/2, eliminating the 2x overestimate that caused pre-flight
compaction to fire on every LLM call at small context windows
- ESTIMATION_SAFETY_MARGIN reduced from 1.5 to 1.2, increasing usable
context from ~53% to ~73% before compaction triggers
At 200k context, effective usable tokens before compaction improved from
~86k to ~120k message tokens (39% increase).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
resolveContextWindowInfo now uses config > model > default priority so
explicit --context-window flag overrides model defaults. Also adds
--context-window CLI option to the run command.
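The priority order can be expressed as a nullish-coalescing chain. This is a sketch: the function signature and the default value are hypothetical, and only the config > model > default ordering is taken from this message.

```typescript
// Assumed default, not taken from the source.
const DEFAULT_CONTEXT_WINDOW = 200_000;

function resolveContextWindow(configValue?: number, modelValue?: number): number {
  // Explicit config (e.g. --context-window) wins, then the model's own
  // limit, then the built-in default
  return configValue ?? modelValue ?? DEFAULT_CONTEXT_WINDOW;
}
```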
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Track the last assistant message saved by the message_end event handler
and skip saving it again in the abort handler. This prevents the
duplicate assistant entries in session.jsonl that caused the
"tool_call_id is not found" bug.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When a session is aborted mid-tool-execution, the assistant message can
be persisted twice (once by message_end, once by the abort handler).
The repair logic failed to handle this: it generated a synthetic tool
result for the first copy but deduplicated the result for the second,
leaving an orphaned tool call that caused "tool_call_id is not found"
errors on all subsequent API calls.
Detect and remove duplicate assistant messages whose tool call IDs
have all already been paired with results from an earlier copy.
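One way to picture the duplicate check, using a simplified message shape assumed for illustration; the real repair logic operates on full session entries and may differ in detail.

```typescript
// Simplified, hypothetical message shape.
interface Msg {
  role: "user" | "assistant" | "toolResult";
  toolCallIds?: string[]; // tool calls issued by an assistant message
  toolCallId?: string; // call answered by a toolResult message
}

function dropDuplicateAssistants(messages: Msg[]): Msg[] {
  const paired = new Set<string>();
  const kept: Msg[] = [];
  for (const msg of messages) {
    // Remember every tool call that already has a result
    if (msg.role === "toolResult" && msg.toolCallId) paired.add(msg.toolCallId);
    const ids = msg.toolCallIds ?? [];
    // An assistant message whose calls are all already answered is a later copy
    const isDuplicate =
      msg.role === "assistant" && ids.length > 0 && ids.every((id) => paired.has(id));
    if (!isDuplicate) kept.push(msg);
  }
  return kept;
}
```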
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- SKILL.md: python → python3 (macOS has no `python` binary)
- skills/index.ts: inject skill directory path so agent can resolve
relative paths like scripts/recalc.py to absolute paths
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The LLM often invents custom groupId strings that don't exist in the
registry, causing "group not found" errors. Now auto-creates the
group instead, matching the behavior when `next` is provided.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
toolsOptions spread `options` which had sessionId undefined for
auto-generated sessions. This caused sessions_list and sessions_spawn
to fail with "No session ID available" — breaking sub-agent orchestration.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace sequential for+await tool dispatch with Promise.allSettled for
parallel execution. All tool_execution_start events emit immediately,
tools run concurrently, results are processed in original order.
Also fix run-log toolStartTimes to key by toolCallId instead of toolName
to prevent collisions with parallel same-name tools.
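A sketch of the dispatch shape, assuming a generic `execute` callback. `ToolCall` and `runToolsInParallel` are hypothetical stand-ins for the real dispatcher; the point is that Promise.allSettled preserves input order and that results are keyed by toolCallId.

```typescript
// Hypothetical types standing in for the real tool dispatcher.
interface ToolCall {
  toolCallId: string;
  toolName: string;
}

async function runToolsInParallel(
  calls: ToolCall[],
  execute: (call: ToolCall) => Promise<string>,
): Promise<{ toolCallId: string; ok: boolean; output: string }[]> {
  // Start every tool before awaiting any of them
  const settled = await Promise.allSettled(calls.map((call) => execute(call)));
  // allSettled keeps input order, so results line up with `calls` by index
  return settled.map((result, i) => ({
    toolCallId: calls[i].toolCallId,
    ok: result.status === "fulfilled",
    output: result.status === "fulfilled" ? result.value : String(result.reason),
  }));
}
```

Keying by toolCallId keeps two concurrent calls to the same tool distinguishable, which a toolName key cannot do.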
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Grouped runs now display findings for completed sub-agents (up to 4000
chars). Ungrouped runs increased truncation from 200 to 4000 chars. All
status lines include full runId for subsequent API queries.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Return error: true (boolean) with code field instead of error: "string_code"
to match ToolErrorPayload convention. Also update runner.ts formatRunLogToolSummary
to prefer details.code for error categorization.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Enrich tool_end events with result_chars, result_summary, and
error_type fields. Since run-log.jsonl is append-only and never
compacted, this preserves tool result metadata that would otherwise
be lost when session.jsonl undergoes compaction.
New fields:
- result_chars: total character count of result content
- result_summary: short tool-specific summary (e.g. "10 results",
"12.5KB", "finance/get_price_snapshot")
- error_type: error category for tool errors (e.g. "fetch_failed")
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When SMC_DATA_DIR is set (e.g., for E2E tests), the credentials lookup
now falls back to ~/.super-multica/credentials.json5 if the custom
data dir doesn't have its own credentials file. This mirrors the
existing fallback pattern in auth-store.ts and removes the need for
the SMC_CREDENTIALS_PATH workaround in E2E tests.
Lookup order:
1. SMC_CREDENTIALS_PATH env var (explicit override)
2. {DATA_DIR}/credentials.json5 (current data dir)
3. ~/.super-multica/credentials.json5 (default location fallback)
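The lookup order above can be sketched as a pure function. The function name and the injected `exists` check are assumptions made so the logic can run without touching the filesystem.

```typescript
// Sketch only: name and parameters are hypothetical.
function resolveCredentialsPath(
  env: { SMC_CREDENTIALS_PATH?: string; SMC_DATA_DIR?: string },
  homeDir: string,
  exists: (path: string) => boolean,
): string {
  // 1. Explicit override always wins
  if (env.SMC_CREDENTIALS_PATH) return env.SMC_CREDENTIALS_PATH;
  // 2. Credentials inside the custom data dir, when the file is present
  if (env.SMC_DATA_DIR) {
    const local = `${env.SMC_DATA_DIR}/credentials.json5`;
    if (exists(local)) return local;
  }
  // 3. Fall back to the default location
  return `${homeDir}/.super-multica/credentials.json5`;
}
```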
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add --run-log CLI flag to enable structured run logging without env var.
Print session directory path to stderr when run-log is enabled so coding
agents can easily locate log files for analysis.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Compaction was reporting only 189 tokens removed for 6 messages because
Phase 1 (tool result pruning) hollowed out messages before Phase 2
(summary compaction) measured them. Now captures pre-pruning token count
and reports combined savings from both phases.
Also threads RunLog through SessionManager to emit tool_result_pruning
and compaction_detail events, and adds preflight pruning stats logging.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The transformContext pipeline ran sanitizeToolUseResultPairing() before
preflightCompact(), but compaction (pruneToolResults + compactMessagesTokenAware)
can break tool_use/tool_result pairing by dropping assistant messages while
keeping their tool_result blocks. This caused 400 errors from the Anthropic API:
"unexpected tool_use_id found in tool_result blocks".
Add a second sanitizeToolUseResultPairing() call after preflightCompact()
to repair any orphaned tool_result blocks created during compaction.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Instead of only protecting .env files, use cpSync with force:true to
overlay bundle files onto the existing directory. This preserves any
user-created files (credentials.json, token.json, etc.) that don't
exist in the bundle, rather than deleting and re-copying the entire
directory.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove hardcoded service API key hints from getApiKeyHint() — skill-specific
hints should be discovered dynamically by the agent via web_search/web_fetch
at runtime. Only keep LLM provider hints which are system-level. Update
skill-creator instructions accordingly.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add buildIneligibleSkillsSummary() to SkillManager that surfaces skills
with actionable issues (missing env vars, binaries) in the agent's system
prompt. Expand getApiKeyHint() with common service API providers. Update
buildSkillsSection() to guide the agent to suggest activating inactive
skills when they match user intent.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove centralized skills.env.json5 in favor of per-skill .env files.
Clean up CredentialManager by removing hasEnv/getEnv/getResolvedEnvSnapshot
methods and skills env loading. Update CLI credentials and skills commands.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move skill environment variables from centralized skills.env.json5 to
per-skill .env files within each skill's directory. This makes credential
management more intuitive and self-contained.
- Fix parser to handle metadata.requires, always, os, skillKey, install
- Add minimal .env parser (dotenv.ts) and load .env at skill parse time
- Add env field to Skill type for per-skill environment variables
- Update eligibility checker to use skill.env instead of CredentialManager
- Preserve user .env files across bundled skill upgrades in loader
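For orientation, a minimal .env parser could look like the sketch below. It is not the contents of dotenv.ts; the supported syntax (comments, an optional `export` prefix, one pair of surrounding quotes) is an assumption.

```typescript
// A minimal sketch; the real parser may handle more cases (escapes, multiline values).
function parseDotenv(source: string): Record<string, string> {
  const env: Record<string, string> = {};
  for (const rawLine of source.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue; // skip blanks and comments
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // no "=" or an empty key
    const key = line.slice(0, eq).trim().replace(/^export\s+/, "");
    let value = line.slice(eq + 1).trim();
    // Strip one pair of matching surrounding quotes
    const quoted = /^(["'])(.*)\1$/.exec(value);
    if (quoted) value = quoted[2];
    env[key] = value;
  }
  return env;
}
```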
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
These fields were only checked during eligibility but never injected
at runtime via credentialManager.getEnv(). Remove the half-implemented
per-skill credential config to reduce confusion.
API key configuration remains supported via skills.env.json5 and
process.env.
Refs: MUL-246, MUL-255
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Move the shared office/ directory (pack/unpack, validators, schemas,
soffice wrapper) to skills/_shared/office/ and replace the three
identical copies with symlinks. Update skill loader to dereference
symlinks during copy to managed directory, and skip _-prefixed
directories in the bundled skills scan.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Lazy-read process.env at call time instead of module import time.
This ensures the env bridge in the Electron main process has time to
set process.env.MULTICA_API_URL before the first API request.
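The difference between the two read times, as a sketch. `getApiUrl` and the fallback URL are placeholders, not the project's actual names or defaults.

```typescript
// Placeholder fallback, not the project's real default.
const FALLBACK_API_URL = "http://localhost:3000";

// Import-time read (the old behavior): the value is frozen before the
// Electron env bridge has run.
// const API_URL = process.env.MULTICA_API_URL ?? FALLBACK_API_URL;

// Call-time read: each request sees whatever process.env holds right now
function getApiUrl(): string {
  return process.env.MULTICA_API_URL ?? FALLBACK_API_URL;
}
```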
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The hasToolUse() function was checking for "tool_use" (raw Anthropic format)
but pi-ai normalizes tool call blocks to type "toolCall". This made tool
narration non-functional in the ChannelManager (Desktop/embedded) path.
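A sketch of the corrected check, assuming message content is either a string or an array of typed blocks. Accepting both type strings is an illustrative choice; the actual fix may check only "toolCall".

```typescript
// Block shape is an assumption; only the two type strings come from the text above.
interface Block {
  type: string;
}

function hasToolUse(content: string | Block[]): boolean {
  if (typeof content === "string") return false;
  // Accept the pi-ai normalized type and the raw Anthropic type
  return content.some((block) => block.type === "toolCall" || block.type === "tool_use");
}
```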
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When the agent uses tools (web search, etc.), it generates intermediate
narration text like "Let me search..." before each tool call. These were
being sent as separate Telegram messages, causing message spam. Now we
detect tool_use blocks in the message content and skip sending those
intermediate messages — only the final answer reaches the user.
Applied to both Desktop channel plugin and Gateway Telegram service.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- store.test.ts: use baseDir option instead of mocking paths.js
- session-file-repair.test.ts: remove write-lock mock, assert behavior
- announce-findings.test.ts: use real storage with temp dirs
- sessions-list.test.ts: use real registry with seed helper
- compaction.test.ts: mock only third-party pi-coding-agent, use real
context-window internals
All tests exercise real code paths, improving confidence in actual
behavior per the strict mock policy.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add AuthStoreOptions with baseDir to auth-profiles/store.ts functions,
add baseDir option to announce.ts readLatestAssistantReply, and add
seedSubagentRunForTests helper to registry.ts. These enable tests to
use real implementations with temp directories instead of mocking
internal modules.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>