Oversized tool results (>30% of context window) are now saved as artifacts
before being truncated in the session. The LLM sees a truncated version with
head+tail preservation and a marker pointing to the full artifact file,
which it can re-read on demand. This prevents information loss during
context window management.
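A minimal sketch of the head+tail truncation described above. Only the 30% threshold comes from this commit; the function name, the chars/4 token estimate, the `keepChars` default, and the marker wording are assumptions for illustration.

```typescript
// Hypothetical sketch: names and marker format are illustrative, not the project's API.
interface TruncationResult {
  text: string;
  truncated: boolean;
}

const CHARS_PER_TOKEN = 4; // assumed rough estimate
const OVERSIZE_RATIO = 0.3; // from the commit message: >30% of the context window

function truncateToolResult(
  result: string,
  contextWindowTokens: number,
  artifactPath: string,
  keepChars = 2000, // assumed size of each preserved end
): TruncationResult {
  const estimatedTokens = Math.ceil(result.length / CHARS_PER_TOKEN);
  // Leave results alone when they fit the budget or are too short to cut.
  if (
    estimatedTokens <= contextWindowTokens * OVERSIZE_RATIO ||
    result.length <= keepChars * 2
  ) {
    return { text: result, truncated: false };
  }
  const head = result.slice(0, keepChars);
  const tail = result.slice(result.length - keepChars);
  const omitted = result.length - head.length - tail.length;
  // The marker tells the model where the full result can be re-read.
  const marker = `\n[... ${omitted} chars truncated; full result saved to ${artifactPath} ...]\n`;
  return { text: head + marker + tail, truncated: true };
}
```

The caller is assumed to write the full result to `artifactPath` before substituting the truncated text into the session.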
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Multi-turn test script for compaction behavior with low context window.
Runs 4 turns of file reading to push context pressure and outputs
run-log analysis.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove compactMessagesWithSummary (~100 lines, never called; only
the Chunked variant was used)
- Remove compactMessagesByCount, findSafeCompactionPoint, and all
count-mode references (~90 lines)
- Narrow CompactionResult.reason to "tokens" | "summary" | "pruning"
- Narrow compactionMode to "tokens" | "summary" (was "count" | ...)
- Simplify session-manager: remove maxMessages/keepLast params,
enable tool result pruning by default
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pre-flight compaction runs in-memory only (not persisted), so tool
result pruning in this path was wasted work — results were thrown away
after the LLM call. Post-turn compaction still handles pruning and
persists the results. Only Phase 2 (emergency message drop) remains
as a safety net in pre-flight.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- estimateSystemPromptTokens now uses estimateTokens() (chars/4) instead
of chars/2, eliminating the 2x overestimate that caused pre-flight
compaction to fire on every LLM call at small context windows
- ESTIMATION_SAFETY_MARGIN reduced from 1.5 to 1.2, increasing usable
context from ~53% to ~73% before compaction triggers
At 200k context, effective usable tokens before compaction improved from
~86k to ~120k message tokens (39% increase).
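A rough sketch of how the estimator and the safety margin interact. The chars/4 heuristic and the 1.5 to 1.2 margin change come from this commit; the budget formula and the `reservedOutputTokens` parameter are assumptions, so this does not reproduce the exact ~53%/~73% figures above.

```typescript
const ESTIMATION_SAFETY_MARGIN = 1.2; // was 1.5 before this change

// chars/4 heuristic named in the commit message.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Hypothetical budget formula: whatever is left after the system prompt and a
// reserved output allowance, divided by the margin that absorbs estimate error.
function usableMessageTokens(
  contextWindow: number,
  systemPrompt: string,
  reservedOutputTokens: number,
  margin: number = ESTIMATION_SAFETY_MARGIN,
): number {
  const available =
    contextWindow - reservedOutputTokens - estimateTokens(systemPrompt);
  return Math.max(0, Math.floor(available / margin));
}
```

Under this model, both halving the system prompt estimate and lowering the margin raise the message budget, which matches the direction of the improvement reported above.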
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
resolveContextWindowInfo now uses config > model > default priority so
explicit --context-window flag overrides model defaults. Also adds
--context-window CLI option to the run command.
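The priority order can be sketched as below. The function name, the result shape, and the default value are illustrative assumptions; only the config > model > default ordering comes from this commit.

```typescript
// Hypothetical sketch of config > model > default resolution.
const DEFAULT_CONTEXT_WINDOW = 200_000; // assumed default

interface ContextWindowInfo {
  tokens: number;
  source: "config" | "model" | "default";
}

function resolveContextWindow(
  configTokens?: number,
  modelTokens?: number,
): ContextWindowInfo {
  // An explicit setting (e.g. from --context-window) wins over everything else.
  if (configTokens !== undefined && configTokens > 0) {
    return { tokens: configTokens, source: "config" };
  }
  if (modelTokens !== undefined && modelTokens > 0) {
    return { tokens: modelTokens, source: "model" };
  }
  return { tokens: DEFAULT_CONTEXT_WINDOW, source: "default" };
}
```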
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
In CLI mode, the parent Agent is not registered with the Hub, so the
normal announce flow can't deliver sub-agent results. Add a polling
mechanism that waits for sub-agents to complete and prints their
findings directly to stdout.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- SKILL.md: python → python3 (macOS has no `python` binary)
- skills/index.ts: inject skill directory path so agent can resolve
relative paths like scripts/recalc.py to absolute paths
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The LLM often invents custom groupId strings that don't exist in the
registry, causing "group not found" errors. Now auto-creates the
group instead, matching the behavior when `next` is provided.
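A minimal get-or-create sketch of this behavior. The `GroupRegistry` class and the `Group` shape are hypothetical; the real registry may track more state.

```typescript
// Hypothetical registry: unknown group IDs are created instead of rejected.
interface Group {
  id: string;
  runIds: string[];
}

class GroupRegistry {
  private groups = new Map<string, Group>();

  has(groupId: string): boolean {
    return this.groups.has(groupId);
  }

  // Returns the existing group, or registers a new empty one for this ID.
  getOrCreate(groupId: string): Group {
    let group = this.groups.get(groupId);
    if (group === undefined) {
      group = { id: groupId, runIds: [] };
      this.groups.set(groupId, group);
    }
    return group;
  }
}
```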
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
toolsOptions spread `options` which had sessionId undefined for
auto-generated sessions. This caused sessions_list and sessions_spawn
to fail with "No session ID available" — breaking sub-agent orchestration.
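One plausible shape of this bug, sketched under assumptions: the commit does not show the code, so the function names and the `ToolsOptions` fields here are illustrative. In an object spread, an own property whose value is `undefined` still overwrites an earlier key.

```typescript
// Hypothetical options shape; only sessionId is taken from the commit message.
interface ToolsOptions {
  sessionId?: string | undefined;
  cwd?: string | undefined;
}

// Buggy: `options` carries `sessionId: undefined`, which overwrites the
// resolved ID because the spread comes last.
function buildToolsOptionsBuggy(
  options: ToolsOptions,
  resolvedSessionId: string,
): ToolsOptions {
  return { sessionId: resolvedSessionId, ...options };
}

// Fixed: spread first, then set sessionId explicitly with a fallback.
function buildToolsOptionsFixed(
  options: ToolsOptions,
  resolvedSessionId: string,
): ToolsOptions {
  return { ...options, sessionId: options.sessionId ?? resolvedSessionId };
}
```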
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace sequential for+await tool dispatch with Promise.allSettled for
parallel execution. All tool_execution_start events emit immediately,
tools run concurrently, results are processed in original order.
Also fix run-log toolStartTimes to key by toolCallId instead of toolName
to prevent collisions with parallel same-name tools.
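A sketch of the dispatch pattern, assuming a simplified `ToolCall` shape and an `emit` callback (both hypothetical). It shows start events firing before any tool finishes, results returned in call order, and start times keyed by `toolCallId` so two calls to the same tool don't collide.

```typescript
// Hypothetical shapes for illustration; not the project's actual types.
interface ToolCall {
  toolCallId: string;
  toolName: string;
  run: () => Promise<string>;
}

interface ToolOutcome {
  toolCallId: string;
  ok: boolean;
  output: string;
}

async function dispatchTools(
  calls: ToolCall[],
  emit: (event: "tool_execution_start", toolCallId: string) => void,
): Promise<{ outcomes: ToolOutcome[]; startTimes: Map<string, number> }> {
  // Keyed by toolCallId, not toolName, so same-name tools get separate entries.
  const startTimes = new Map<string, number>();

  // Each callback runs synchronously up to call.run(), so every tool is
  // started (and its start event emitted) before any result is awaited.
  const pending = calls.map(async (call) => {
    startTimes.set(call.toolCallId, Date.now());
    emit("tool_execution_start", call.toolCallId);
    return call.run();
  });

  // allSettled never rejects and preserves input order.
  const settled = await Promise.allSettled(pending);
  const outcomes = settled.map((result, i): ToolOutcome => {
    const call = calls[i]!;
    return result.status === "fulfilled"
      ? { toolCallId: call.toolCallId, ok: true, output: result.value }
      : { toolCallId: call.toolCallId, ok: false, output: String(result.reason) };
  });
  return { outcomes, startTimes };
}
```

Unlike `Promise.all`, a failing tool here does not discard the results of the others; each failure becomes an ordinary error outcome.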
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Grouped runs now display findings for completed sub-agents (up to 4000
chars). Ungrouped runs raise the truncation limit from 200 to 4000 chars.
All status lines include the full runId for subsequent API queries.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Return error: true (boolean) with code field instead of error: "string_code"
to match ToolErrorPayload convention. Also update runner.ts formatRunLogToolSummary
to prefer details.code for error categorization.
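A sketch of the payload convention and the categorization fallback. The `message` field, the helper names, and the `"unknown"` fallback are assumptions; the commit only states `error: true` plus a `code` field, and that `details.code` is preferred.

```typescript
// Assumed shape; the real ToolErrorPayload may carry additional fields.
interface ToolErrorPayload {
  error: true;
  code: string;
  message: string;
}

function toolError(code: string, message: string): ToolErrorPayload {
  return { error: true, code, message };
}

// Prefer details.code; fall back to the legacy string-valued `error` field.
function errorCategory(details: {
  error?: unknown;
  code?: unknown;
}): string | undefined {
  if (typeof details.code === "string") return details.code;
  if (typeof details.error === "string") return details.error; // legacy form
  return details.error === true ? "unknown" : undefined;
}
```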
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Covers the full pipeline: dataset download, agent execution,
result analysis, and official Docker evaluation. Includes
runner options, output format, known limitations, and initial
benchmark results.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Always initialize Hub in CLI run mode to match the Desktop environment
where Hub is always active. This enables sessions_spawn (sub-agent
creation), cron tasks, channel plugins, and other Hub-dependent
features during E2E testing.
Hub constructor is non-blocking — gateway connection failures are
handled gracefully with auto-reconnect. hub.shutdown() in finally
block ensures clean teardown on exit.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Enrich tool_end events with result_chars, result_summary, and
error_type fields. Since run-log.jsonl is append-only and never
compacted, this preserves tool result metadata that would otherwise
be lost when session.jsonl undergoes compaction.
New fields:
- result_chars: total character count of result content
- result_summary: short tool-specific summary (e.g. "10 results",
"12.5KB", "finance/get_price_snapshot")
- error_type: error category for tool errors (e.g. "fetch_failed")
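A sketch of what an enriched event line might look like. The `result_chars`, `result_summary`, and `error_type` fields come from the list above; the `event` and `tool` keys and both helper names are assumptions.

```typescript
// Hypothetical event shape; only the three new fields are from the commit.
interface ToolEndEvent {
  event: "tool_end";
  tool: string;
  result_chars: number;
  result_summary?: string;
  error_type?: string;
}

function buildToolEndEvent(
  tool: string,
  content: string,
  extra: { summary?: string; errorType?: string } = {},
): ToolEndEvent {
  const entry: ToolEndEvent = {
    event: "tool_end",
    tool,
    result_chars: content.length,
  };
  // Optional fields are only written when present, keeping lines compact.
  if (extra.summary !== undefined) entry.result_summary = extra.summary;
  if (extra.errorType !== undefined) entry.error_type = extra.errorType;
  return entry;
}

// One JSON object per line, suitable for appending to a .jsonl file.
function toJsonl(entry: ToolEndEvent): string {
  return JSON.stringify(entry) + "\n";
}
```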
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- download-dataset.py: fetches SWE-bench Lite/Verified/Full from HuggingFace
- run.ts: core runner that clones repos, runs Agent, collects git diff patches
- evaluate.sh: wrapper for official SWE-bench Docker evaluation harness
- analyze.ts: summarizes run results with per-repo and timing breakdowns
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When SMC_DATA_DIR is set (e.g., for E2E tests), the credentials lookup
now falls back to ~/.super-multica/credentials.json5 if the custom
data dir doesn't have its own credentials file. This mirrors the
existing fallback pattern in auth-store.ts and removes the need for
the SMC_CREDENTIALS_PATH workaround in E2E tests.
Lookup order:
1. SMC_CREDENTIALS_PATH env var (explicit override)
2. {DATA_DIR}/credentials.json5 (current data dir)
3. ~/.super-multica/credentials.json5 (default location fallback)
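The lookup order above can be sketched as follows. The function name and the injected `exists` callback are illustrative, and returning the explicit override without checking that it exists is an assumption.

```typescript
// Hypothetical resolver; file-existence checks are injected for testability.
function resolveCredentialsPath(
  env: {
    SMC_CREDENTIALS_PATH?: string | undefined;
    SMC_DATA_DIR?: string | undefined;
  },
  homeDir: string,
  exists: (path: string) => boolean,
): string | undefined {
  // 1. Explicit override always wins.
  if (env.SMC_CREDENTIALS_PATH) return env.SMC_CREDENTIALS_PATH;

  const defaultDir = `${homeDir}/.super-multica`;
  const dataDir = env.SMC_DATA_DIR ?? defaultDir;
  const candidates = [
    `${dataDir}/credentials.json5`, // 2. current data dir
    `${defaultDir}/credentials.json5`, // 3. default location fallback
  ];
  return candidates.find((candidate) => exists(candidate));
}
```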
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
web_search and data tools authenticate via auth.json (sid + deviceId).
When SMC_DATA_DIR is set (e.g. for E2E tests), the auth file may not
exist in the custom dir. getLocalAuth() now falls back to
~/.super-multica-dev/auth.json, which is created by the Desktop login
under pnpm dev:local and is valid for the dev backend (api-dev.copilothub.ai).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
E2E tests now use ~/.super-multica-e2e to avoid polluting dev
(~/.super-multica-dev) or production (~/.super-multica) session data.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add agent-driven E2E testing section to CLAUDE.md so all team members'
Coding Agents automatically know how to run and analyze E2E tests.
Update guide with MULTICA_API_URL requirement discovered during testing.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Comprehensive guide teaching Coding Agents how to perform automated E2E
testing by running the agent CLI with --run-log and analyzing structured
run-log events. Includes feature test playbooks, event reference, and
analysis patterns.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add --run-log CLI flag to enable structured run logging without env var.
Print session directory path to stderr when run-log is enabled so Coding
Agents can easily locate log files for analysis.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Compaction was reporting only 189 tokens removed for 6 messages because
Phase 1 (tool result pruning) hollowed out messages before Phase 2
(summary compaction) measured them. Now captures pre-pruning token count
and reports combined savings from both phases.
Also threads RunLog through SessionManager to emit tool_result_pruning
and compaction_detail events, and adds preflight pruning stats logging.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The transformContext pipeline ran sanitizeToolUseResultPairing() before
preflightCompact(), but compaction (pruneToolResults + compactMessagesTokenAware)
can break tool_use/tool_result pairing by dropping assistant messages while
keeping their tool_result blocks. This caused 400 errors from the Anthropic API:
"unexpected tool_use_id found in tool_result blocks".
Add a second sanitizeToolUseResultPairing() call after preflightCompact()
to repair any orphaned tool_result blocks created during compaction.
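A simplified sketch of the repair step. It is not the actual sanitizeToolUseResultPairing() implementation: the message and block types are reduced to the essentials, and it only handles tool_result blocks whose tool_use was dropped earlier in the history.

```typescript
// Reduced message model for illustration only.
type Block =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string }
  | { type: "tool_result"; tool_use_id: string };

interface Message {
  role: "user" | "assistant";
  content: Block[];
}

function dropOrphanedToolResults(messages: Message[]): Message[] {
  const seenToolUseIds = new Set<string>();
  const repaired: Message[] = [];
  for (const message of messages) {
    if (message.role === "assistant") {
      // Record every tool_use so later results can be matched against it.
      for (const block of message.content) {
        if (block.type === "tool_use") seenToolUseIds.add(block.id);
      }
      repaired.push(message);
      continue;
    }
    // Keep a tool_result only if its tool_use survived compaction.
    const content = message.content.filter(
      (block) =>
        block.type !== "tool_result" || seenToolUseIds.has(block.tool_use_id),
    );
    // Drop user messages that are left with no content at all.
    if (content.length > 0) repaired.push({ ...message, content });
  }
  return repaired;
}
```

Running a pass like this after compaction rather than only before it is what closes the gap described above, since compaction is the step that creates the orphans.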
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Instead of only protecting .env files, use cpSync with force:true to
overlay bundle files onto the existing directory. This preserves any
user-created files (credentials.json, token.json, etc.) that don't
exist in the bundle, rather than deleting and re-copying the entire
directory.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add unhandledRejection and uncaughtException handlers to prevent the
gateway from crashing on unexpected errors. Add SIGTERM/SIGINT handlers
for graceful shutdown via app.close().
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add bot.catch() to prevent unhandled errors from crashing the polling
loop, and catch the 409 "terminated by other getUpdates request" error
specifically when another bot instance is already running.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove hardcoded service API key hints from getApiKeyHint() — skill-specific
hints should be discovered dynamically by the agent via web_search/web_fetch
at runtime. Only keep LLM provider hints which are system-level. Update
skill-creator instructions accordingly.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Update skill-creator SKILL.md with proactive skill activation workflow:
guide users through API key setup, accept keys in chat, write .env files
automatically. Add sections for creating skills with env requirements and
.env file format reference.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add buildIneligibleSkillsSummary() to SkillManager that surfaces skills
with actionable issues (missing env vars, binaries) in the agent's system
prompt. Expand getApiKeyHint() with common service API providers. Update
buildSkillsSection() to guide the agent to suggest activating inactive
skills when they match user intent.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove centralized skills.env.json5 in favor of per-skill .env files.
Clean up CredentialManager by removing hasEnv/getEnv/getResolvedEnvSnapshot
methods and skills env loading. Update CLI credentials and skills commands.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move skill environment variables from centralized skills.env.json5 to
per-skill .env files within each skill's directory. This makes credential
management more intuitive and self-contained.
- Fix parser to handle metadata.requires, always, os, skillKey, install
- Add minimal .env parser (dotenv.ts) and load .env at skill parse time
- Add env field to Skill type for per-skill environment variables
- Update eligibility checker to use skill.env instead of CredentialManager
- Preserve user .env files across bundled skill upgrades in loader
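A minimal .env parser might look like the sketch below. This is not the contents of dotenv.ts; the supported syntax (comments, optional `export` prefix, single or double quotes, no escapes or multi-line values) is an assumption about what "minimal" covers.

```typescript
// Illustrative parser; the real dotenv.ts may accept a different subset.
function parseDotenv(source: string): Record<string, string> {
  const env: Record<string, string> = {};
  for (const rawLine of source.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue; // blank or comment

    const eq = line.indexOf("=");
    if (eq <= 0) continue; // no key or no assignment

    const key = line.slice(0, eq).trim().replace(/^export\s+/, "");
    if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(key)) continue; // invalid name

    // Split only on the first "=", so values may contain "=" themselves.
    let value = line.slice(eq + 1).trim();
    const quote = value.charAt(0);
    if (
      value.length >= 2 &&
      (quote === '"' || quote === "'") &&
      value.endsWith(quote)
    ) {
      value = value.slice(1, -1); // strip matching surrounding quotes
    }
    env[key] = value;
  }
  return env;
}
```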
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Move LoadingIndicator from ChatView into MessageList for consistent padding
- Add isLoading and hasPendingApprovals props to MessageList
- Adjust message spacing (my-1 → my-2) for better visual balance
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Create LoadingIndicator component with "generating" and "streaming" variants
- Remove inline loading indicator from StreamingMarkdown (empty content returns empty fragment)
- Use unified LoadingIndicator in ChatView with consistent positioning
- Eliminates layout shift between different loading states
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Use container utility class consistently across chat components
- Change container max-width from 5xl to 4xl for better readability
- Adjust message bubble padding (p-3 -> p-2)
- Fix logout dropdown alignment and add destructive variant
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>