multica

Author	SHA1	Message	Date
Jiayuan Zhang	3f9a30423d	feat(session): add artifact storage and pre-emptive tool result truncation Oversized tool results (>30% of context window) are now saved as artifacts before being truncated in the session. The LLM sees a truncated version with head+tail preservation and a marker pointing to the full artifact file, which it can re-read on demand. This prevents information loss during context window management. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-15 23:02:18 +08:00
Jiayuan Zhang	7a16df7c56	chore(compaction): add E2E compaction benchmark script Multi-turn test script for compaction behavior with low context window. Runs 4 turns of file reading to push context pressure and outputs run-log analysis. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-15 21:37:36 +08:00
Jiayuan Zhang	b412ca902b	refactor(compaction): remove dead code and legacy count mode - Remove compactMessagesWithSummary (~100 lines, never called; only the Chunked variant was used) - Remove compactMessagesByCount, findSafeCompactionPoint, and all count-mode references (~90 lines) - Narrow CompactionResult.reason to "tokens" | "summary" | "pruning" - Narrow compactionMode to "tokens" | "summary" (was "count" | ...) - Simplify session-manager: remove maxMessages/keepLast params, enable tool result pruning by default Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-15 21:37:23 +08:00
Jiayuan Zhang	92cf312843	refactor(compaction): remove pre-flight tool result pruning Pre-flight compaction runs in-memory only (not persisted), so tool result pruning in this path was wasted work — results were thrown away after the LLM call. Post-turn compaction still handles pruning and persists the results. Only Phase 2 (emergency message drop) remains as a safety net in pre-flight. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-15 21:37:15 +08:00
Jiayuan Zhang	fbb0b11c6e	fix(compaction): fix system prompt token estimation and reduce safety margin - estimateSystemPromptTokens now uses estimateTokens() (chars/4) instead of chars/2, eliminating the 2x overestimate that caused pre-flight compaction to fire on every LLM call at small context windows - ESTIMATION_SAFETY_MARGIN reduced from 1.5 to 1.2, increasing usable context from ~53% to ~73% before compaction triggers At 200k context, effective usable tokens before compaction improved from ~86k to ~120k message tokens (39% increase). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-15 21:37:09 +08:00
Jiayuan Zhang	40a2e8ae55	fix(context-window): prioritize config over model for context window resolution resolveContextWindowInfo now uses config > model > default priority so explicit --context-window flag overrides model defaults. Also adds --context-window CLI option to the run command. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-15 21:37:02 +08:00
Jiayuan Zhang	74c0ca0ddc	Merge pull request #203 from multica-ai/forrestchang/finance-benchmark fix: agent stability, tooling, and sub-agent orchestration	2026-02-15 20:50:44 +08:00
Jiayuan Zhang	a443d3009d	chore: remove leftover patch file from parallel execution revert Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-15 20:45:32 +08:00
Jiayuan Zhang	084657868f	revert(agent): remove parallel tool execution patch, keep serial Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-15 20:43:37 +08:00
Jiayuan Zhang	e39f9a5dfe	feat(cli): wait for sub-agents and output findings in run mode In CLI mode, the parent Agent is not registered with the Hub, so the normal announce flow can't deliver sub-agent results. Added polling mechanism that waits for sub-agents to complete and prints their findings directly to stdout. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-15 19:53:26 +08:00
Jiayuan Zhang	b394b0ccf9	fix(skills): use python3 and inject skill directory path into prompt - SKILL.md: python → python3 (macOS has no `python` binary) - skills/index.ts: inject skill directory path so agent can resolve relative paths like scripts/recalc.py to absolute paths Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-15 19:53:18 +08:00
Jiayuan Zhang	691e33e71e	fix(tools): auto-create group when custom groupId is provided LLM often invents custom groupId strings that don't exist in the registry, causing "group not found" errors. Now auto-creates the group instead, matching the behavior when `next` is provided. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-15 19:53:12 +08:00
Jiayuan Zhang	d162ba98a9	fix(agent): pass sessionId to tools for sub-agent session tracking toolsOptions spread `options` which had sessionId undefined for auto-generated sessions. This caused sessions_list and sessions_spawn to fail with "No session ID available" — breaking sub-agent orchestration. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-15 19:53:03 +08:00
Jiayuan Zhang	a254daff01	feat(agent): enable parallel tool execution via pi-agent-core patch Replace sequential for+await tool dispatch with Promise.allSettled for parallel execution. All tool_execution_start events emit immediately, tools run concurrently, results are processed in original order. Also fix run-log toolStartTimes to key by toolCallId instead of toolName to prevent collisions with parallel same-name tools. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-15 18:47:53 +08:00
Jiayuan Zhang	c012bff246	fix(tools): show findings and full runId in sessions_list list view Grouped runs now display findings for completed sub-agents (up to 4000 chars). Ungrouped runs increased truncation from 200 to 4000 chars. All status lines include full runId for subsequent API queries. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-15 18:47:44 +08:00
Jiayuan Zhang	02ed09b77b	fix(tools): use boolean error flag in web_fetch and web_search error responses Return error: true (boolean) with code field instead of error: "string_code" to match ToolErrorPayload convention. Also update runner.ts formatRunLogToolSummary to prefer details.code for error categorization. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-15 18:47:37 +08:00
Jiayuan Zhang	9cb6ee30a2	Merge pull request #202 from multica-ai/forrestchang/swe-bench-runner feat: add SWE-bench runner for agent evaluation	2026-02-15 18:34:56 +08:00
Jiayuan Zhang	a0a837e76b	docs: add testing & benchmarks section to README Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-15 18:33:11 +08:00
Jiayuan Zhang	45acb965ba	docs: add SWE-bench section to CLAUDE.md Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-15 18:32:04 +08:00
Jiayuan Zhang	10c57c0f7a	docs: add SWE-bench runner guide Covers the full pipeline: dataset download, agent execution, result analysis, and official Docker evaluation. Includes runner options, output format, known limitations, and initial benchmark results. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-15 18:30:58 +08:00
Jiayuan Zhang	b007ddffc8	feat(cli): initialize Hub in run mode for full agent capabilities Always initialize Hub in CLI run mode to match the Desktop environment where Hub is always active. This enables sessions_spawn (sub-agent creation), cron tasks, channel plugins, and other Hub-dependent features during E2E testing. Hub constructor is non-blocking — gateway connection failures are handled gracefully with auto-reconnect. hub.shutdown() in finally block ensures clean teardown on exit. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-15 18:06:51 +08:00
Jiayuan Zhang	755ed5e9de	feat(run-log): add result metadata to tool_end events Enrich tool_end events with result_chars, result_summary, and error_type fields. Since run-log.jsonl is append-only and never compacted, this preserves tool result metadata that would otherwise be lost when session.jsonl undergoes compaction. New fields: - result_chars: total character count of result content - result_summary: short tool-specific summary (e.g. "10 results", "12.5KB", "finance/get_price_snapshot") - error_type: error category for tool errors (e.g. "fetch_failed") Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-15 18:06:42 +08:00
Jiayuan Zhang	90d374ffd5	feat(scripts): add SWE-bench runner for Multica agent evaluation - download-dataset.py: fetches SWE-bench Lite/Verified/Full from HuggingFace - run.ts: core runner that clones repos, runs Agent, collects git diff patches - evaluate.sh: wrapper for official SWE-bench Docker evaluation harness - analyze.ts: summarizes run results with per-repo and timing breakdowns Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-15 18:05:17 +08:00
Jiayuan Zhang	1c24dd2885	fix(credentials): add fallback to ~/.super-multica for custom data dirs When SMC_DATA_DIR is set (e.g., for E2E tests), the credentials lookup now falls back to ~/.super-multica/credentials.json5 if the custom data dir doesn't have its own credentials file. This mirrors the existing fallback pattern in auth-store.ts and removes the need for the SMC_CREDENTIALS_PATH workaround in E2E tests. Lookup order: 1. SMC_CREDENTIALS_PATH env var (explicit override) 2. {DATA_DIR}/credentials.json5 (current data dir) 3. ~/.super-multica/credentials.json5 (default location fallback) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-15 17:48:21 +08:00
Jiayuan Zhang	47f8e621c8	Merge pull request #201 from multica-ai/forrestchang/debug-agent-logs fix(agent): report accurate compaction metrics and add run-log observability	2026-02-15 16:58:27 +08:00
Jiayuan Zhang	75fac3a2d7	fix(auth): fallback to dev auth.json for E2E tests web_search and data tools authenticate via auth.json (sid + deviceId). When SMC_DATA_DIR is set (e.g. for E2E tests), the auth file may not exist in the custom dir. Now getLocalAuth() falls back to ~/.super-multica-dev/auth.json, which is created by pnpm dev:local Desktop login and valid for the dev backend (api-dev.copilothub.ai). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-15 16:37:26 +08:00
Jiayuan Zhang	1ffa8b1389	docs: add SMC_DATA_DIR isolation for E2E test sessions E2E tests now use ~/.super-multica-e2e to avoid polluting dev (~/.super-multica-dev) or production (~/.super-multica) session data. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-15 16:29:39 +08:00
Jiayuan Zhang	a823e391b9	docs: add E2E testing workflow to CLAUDE.md and update guide with MULTICA_API_URL Add agent-driven E2E testing section to CLAUDE.md so all team members' Coding Agents automatically know how to run and analyze E2E tests. Update guide with MULTICA_API_URL requirement discovered during testing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-15 16:23:37 +08:00
Jiayuan Zhang	496eda82d7	docs: add agent-driven E2E testing guide for Coding Agents Comprehensive guide teaching Coding Agents how to perform automated E2E testing by running the agent CLI with --run-log and analyzing structured run-log events. Includes feature test playbooks, event reference, and analysis patterns. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-15 16:03:47 +08:00
Jiayuan Zhang	a2c1379c1d	feat(cli): add --run-log flag and session dir output for agent-driven E2E testing Add --run-log CLI flag to enable structured run logging without env var. Print session directory path to stderr when run-log is enabled so Coding Agents can easily locate log files for analysis. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-15 16:03:40 +08:00
Jiayuan Zhang	239dc5a7c6	fix(agent): report accurate compaction metrics and add run-log observability Compaction was reporting only 189 tokens removed for 6 messages because Phase 1 (tool result pruning) hollowed out messages before Phase 2 (summary compaction) measured them. Now captures pre-pruning token count and reports combined savings from both phases. Also threads RunLog through SessionManager to emit tool_result_pruning and compaction_detail events, and adds preflight pruning stats logging. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-15 15:42:04 +08:00
Jiayuan Zhang	313f826d58	Merge pull request #200 from multica-ai/forrestchang/fix-telegram-conflict fix(gateway): handle Telegram 409 conflict and add error resilience	2026-02-15 14:48:25 +08:00
Jiayuan Zhang	dba0c32d74	Merge pull request #199 from multica-ai/forrestchang/skill-env-storage feat(skills): implement per-skill .env files with auto-discovery	2026-02-15 14:46:32 +08:00
Jiayuan Zhang	51741d5111	Merge pull request #198 from multica-ai/forrestchang/rm-cmd-b-shortcut fix(ui): remove Cmd+B sidebar toggle shortcut	2026-02-15 14:42:12 +08:00
Jiayuan Zhang	99167b9837	fix(agent): re-validate tool pairing after preflight compaction The transformContext pipeline ran sanitizeToolUseResultPairing() before preflightCompact(), but compaction (pruneToolResults + compactMessagesTokenAware) can break tool_use/tool_result pairing by dropping assistant messages while keeping their tool_result blocks. This caused 400 errors from the Anthropic API: "unexpected tool_use_id found in tool_result blocks". Add a second sanitizeToolUseResultPairing() call after preflightCompact() to repair any orphaned tool_result blocks created during compaction. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-15 14:25:48 +08:00
Jiayuan Zhang	57805cddb8	fix(ui): remove Cmd+B sidebar toggle shortcut Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-15 14:18:49 +08:00
Jiayuan Zhang	a4b7deac3e	fix(skills): preserve all user files during bundled skill upgrades Instead of only protecting .env files, use cpSync with force:true to overlay bundle files onto the existing directory. This preserves any user-created files (credentials.json, token.json, etc.) that don't exist in the bundle, rather than deleting and re-copying the entire directory. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-15 14:15:31 +08:00
Jiayuan Zhang	fe7c772219	fix(gateway): add process-level error handlers and graceful shutdown Add unhandledRejection and uncaughtException handlers to prevent the gateway from crashing on unexpected errors. Add SIGTERM/SIGINT handlers for graceful shutdown via app.close(). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-15 14:07:50 +08:00
Jiayuan Zhang	5741402a1a	fix(telegram): handle 409 polling conflict and add global error boundary Add bot.catch() to prevent unhandled errors from crashing the polling loop, and catch the 409 "terminated by other getUpdates request" error specifically when another bot instance is already running. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-15 14:07:49 +08:00
Jiayuan Zhang	8848f09107	refactor(skills): remove hardcoded API key hints, use dynamic web search Remove hardcoded service API key hints from getApiKeyHint() — skill-specific hints should be discovered dynamically by the agent via web_search/web_fetch at runtime. Only keep LLM provider hints which are system-level. Update skill-creator instructions accordingly. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-15 13:39:14 +08:00
Jiayuan Zhang	8004403e1b	feat(skill-creator): add activation flow and API key onboarding instructions Update skill-creator SKILL.md with proactive skill activation workflow: guide users through API key setup, accept keys in chat, write .env files automatically. Add sections for creating skills with env requirements and .env file format reference. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-15 13:34:51 +08:00
Jiayuan Zhang	6f67bb77b8	feat(skills): expose ineligible skills in system prompt for auto-discovery Add buildIneligibleSkillsSummary() to SkillManager that surfaces skills with actionable issues (missing env vars, binaries) in the agent's system prompt. Expand getApiKeyHint() with common service API providers. Update buildSkillsSection() to guide the agent to suggest activating inactive skills when they match user intent. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-15 13:34:51 +08:00
Jiayuan Zhang	0678431a7d	docs: update credential docs for per-skill .env files Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-15 13:34:51 +08:00
Jiayuan Zhang	bd0b380e2e	refactor(credentials): remove skills.env.json5 support Remove centralized skills.env.json5 in favor of per-skill .env files. Clean up CredentialManager by removing hasEnv/getEnv/getResolvedEnvSnapshot methods and skills env loading. Update CLI credentials and skills commands. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-15 13:34:51 +08:00
Jiayuan Zhang	9f98ccca58	feat(skills): store API keys in per-skill .env files Move skill environment variables from centralized skills.env.json5 to per-skill .env files within each skill's directory. This makes credential management more intuitive and self-contained. - Fix parser to handle metadata.requires, always, os, skillKey, install - Add minimal .env parser (dotenv.ts) and load .env at skill parse time - Add env field to Skill type for per-skill environment variables - Update eligibility checker to use skill.env instead of CredentialManager - Preserve user .env files across bundled skill upgrades in loader Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-15 13:34:51 +08:00
Jiayuan Zhang	358fcb3c0e	Merge pull request #197 from multica-ai/forrestchang/create-pr feat(report): add code stats report generator	2026-02-15 13:09:52 +08:00
Naiyuan Qing	59f8802f7f	Merge pull request #196 from multica-ai/fix/chat-input-multiline fix(ui): preserve newlines in chat input multiline text	2026-02-15 11:08:09 +08:00
Naiyuan Qing	430f2c177e	refactor(ui): move LoadingIndicator into MessageList - Move LoadingIndicator from ChatView into MessageList for consistent padding - Add isLoading and hasPendingApprovals props to MessageList - Adjust message spacing (my-1 → my-2) for better visual balance Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-15 11:03:59 +08:00
Naiyuan Qing	deb747a859	refactor(ui): unify loading indicator component - Create LoadingIndicator component with "generating" and "streaming" variants - Remove inline loading indicator from StreamingMarkdown (empty content returns empty fragment) - Use unified LoadingIndicator in ChatView with consistent positioning - Eliminates layout shift between different loading states Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-15 10:52:46 +08:00
Naiyuan Qing	c6ca5f3270	refactor(ui): unify container layout and adjust spacing - Use container utility class consistently across chat components - Change container max-width from 5xl to 4xl for better readability - Adjust message bubble padding (p-3 -> p-2) - Fix logout dropdown alignment and add destructive variant Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-15 10:47:59 +08:00
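
Note on 1c24dd2885: the three-step credentials lookup order listed in that commit can be sketched roughly as below. This is only an illustration under stated assumptions. The function name `resolveCredentialsPath`, the `LookupInput` shape, and the injected `exists` predicate are hypothetical and are not taken from the repository; whether the real code checks that the explicit override path exists is also an assumption.

```typescript
// Hypothetical sketch of the lookup order described in commit 1c24dd2885.
// None of these names come from the repository.
type Exists = (path: string) => boolean;

interface LookupInput {
  env: Record<string, string | undefined>; // e.g. process.env
  dataDir: string; // resolved data dir (SMC_DATA_DIR or the default)
  homeDir: string; // user's home directory
}

function resolveCredentialsPath(
  input: LookupInput,
  exists: Exists,
): string | undefined {
  // 1. Explicit override via env var wins (assumed: returned without an existence check).
  const explicit = input.env["SMC_CREDENTIALS_PATH"];
  if (explicit) return explicit;

  // 2. credentials.json5 in the current data dir, then
  // 3. the default location as a fallback.
  const candidates = [
    `${input.dataDir}/credentials.json5`,
    `${input.homeDir}/.super-multica/credentials.json5`,
  ];
  return candidates.find(exists);
}
```

Passing `exists` in as a parameter keeps the sketch free of filesystem access, so the ordering can be checked in isolation.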

1039 commits
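
Note on 3f9a30423d: the head+tail truncation with an artifact marker described in that commit might look something like the sketch below. It is a minimal illustration, not the project's implementation. The names `isOversized` and `truncateHeadTail`, the marker wording, and the 70/30 head/tail split are all assumptions; only the 30% threshold and the chars/4 token estimate come from the commit messages (3f9a30423d and fbb0b11c6e).

```typescript
// Hypothetical sketch; function names, marker text, and headRatio are assumptions.

// Rough token estimate of chars/4, as mentioned in commit fbb0b11c6e.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// A result counts as oversized when it exceeds 30% of the context window.
function isOversized(content: string, contextWindowTokens: number): boolean {
  return estimateTokens(content) > 0.3 * contextWindowTokens;
}

interface TruncationResult {
  text: string;
  truncated: boolean;
}

// Keep the first and last parts of the content and insert a marker that
// points at the file holding the full result.
function truncateHeadTail(
  content: string,
  maxChars: number,
  artifactPath: string,
  headRatio = 0.7, // assumed split between head and tail
): TruncationResult {
  if (content.length <= maxChars) return { text: content, truncated: false };

  const headLen = Math.floor(maxChars * headRatio);
  const tailLen = maxChars - headLen;
  const omitted = content.length - maxChars;
  const marker =
    `\n[... ${omitted} chars omitted; full result saved to ${artifactPath} ...]\n`;

  const head = content.slice(0, headLen);
  // Guard against slice(-0), which would return the whole string.
  const tail = tailLen > 0 ? content.slice(content.length - tailLen) : "";
  return { text: head + marker + tail, truncated: true };
}
```

In this sketch the caller would write the full content to `artifactPath` before truncating, so the marker always points at a file that already exists.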