Commit graph

2 commits

Author SHA1 Message Date
Jiayuan Zhang
f60551195a chore(agent): remove old sessions_spawn/sessions_list tools and update references
Delete sessions-spawn.ts, sessions-list.ts and their tests. Update CLI
to remove waitForSubagents polling workaround (delegate is synchronous).
Update UI, desktop IPC, SWE-bench, and system prompt tests to use the
new delegate tool name.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-16 01:09:21 +08:00
Jiayuan Zhang
90d374ffd5 feat(scripts): add SWE-bench runner for Multica agent evaluation
- download-dataset.py: fetches SWE-bench Lite/Verified/Full from HuggingFace
- run.ts: core runner that clones repos, runs Agent, collects git diff patches
- evaluate.sh: wrapper for official SWE-bench Docker evaluation harness
- analyze.ts: summarizes run results with per-repo and timing breakdowns

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 18:05:17 +08:00
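
For readers unfamiliar with the runner, the per-repo and timing breakdown attributed to analyze.ts in 90d374ffd5 might look roughly like the sketch below. This is an illustration only, not the code from the commit: the `RunResult` shape, its field names, and `summarizeByRepo` are all hypothetical, since the commit message does not show the result format that run.ts writes.

```typescript
// Hypothetical shape of one per-instance result. The real output format of
// run.ts is not shown in the commit message, so every field here is an assumption.
interface RunResult {
  instanceId: string; // SWE-bench instance id, e.g. "django__django-11099"
  repo: string;       // source repository, e.g. "django/django"
  durationMs: number; // wall-clock time of the agent run
  patch: string;      // collected git diff; empty when the agent changed nothing
}

interface RepoSummary {
  total: number;          // instances attempted for this repo
  withPatch: number;      // instances that produced a non-empty patch
  meanDurationMs: number; // average run time across those instances
}

// Group results by repo and compute counts plus mean duration.
function summarizeByRepo(results: RunResult[]): Map<string, RepoSummary> {
  const acc = new Map<string, { total: number; withPatch: number; durationSum: number }>();
  for (const r of results) {
    const entry = acc.get(r.repo) ?? { total: 0, withPatch: 0, durationSum: 0 };
    entry.total += 1;
    if (r.patch.trim().length > 0) {
      entry.withPatch += 1;
    }
    entry.durationSum += r.durationMs;
    acc.set(r.repo, entry);
  }

  const summary = new Map<string, RepoSummary>();
  acc.forEach((e, repo) => {
    summary.set(repo, {
      total: e.total,
      withPatch: e.withPatch,
      meanDurationMs: e.durationSum / e.total,
    });
  });
  return summary;
}
```

A non-empty patch only means the agent produced a diff. Whether that diff resolves the issue is decided separately by the official SWE-bench Docker harness that evaluate.sh wraps.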