multica

Jiayuan Zhang 90d374ffd5 feat(scripts): add SWE-bench runner for Multica agent evaluation

- download-dataset.py: fetches SWE-bench Lite/Verified/Full from HuggingFace
- run.ts: core runner that clones repos, runs Agent, collects git diff patches
- evaluate.sh: wrapper for official SWE-bench Docker evaluation harness
- analyze.ts: summarizes run results with per-repo and timing breakdowns

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 18:05:17 +08:00
.gitignore	feat(scripts): add SWE-bench runner for Multica agent evaluation	2026-02-15 18:05:17 +08:00
analyze.ts	feat(scripts): add SWE-bench runner for Multica agent evaluation	2026-02-15 18:05:17 +08:00
download-dataset.py	feat(scripts): add SWE-bench runner for Multica agent evaluation	2026-02-15 18:05:17 +08:00
evaluate.sh	feat(scripts): add SWE-bench runner for Multica agent evaluation	2026-02-15 18:05:17 +08:00
run.ts	feat(scripts): add SWE-bench runner for Multica agent evaluation	2026-02-15 18:05:17 +08:00
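
The commit message describes analyze.ts as summarizing run results with per-repo and timing breakdowns. The listing does not show that script's contents, so the following is only a rough sketch of what such a breakdown could look like, written in Python for brevity. The record fields (`repo`, `duration_s`, `patch`) and the `summarize` function are hypothetical names chosen for illustration, not the actual schema or API used by run.ts or analyze.ts.

```python
from collections import defaultdict
from statistics import mean


def summarize(results):
    """Group run records by repo; report instance count, non-empty patches, mean duration.

    Each record is assumed (hypothetically) to look like:
    {"instance_id": str, "repo": str, "duration_s": float, "patch": str}
    """
    by_repo = defaultdict(list)
    for record in results:
        by_repo[record["repo"]].append(record)

    summary = {}
    for repo, rows in sorted(by_repo.items()):
        summary[repo] = {
            "instances": len(rows),
            # A run counts as "with_patch" only if the collected diff is non-empty.
            "with_patch": sum(1 for r in rows if r["patch"].strip()),
            "mean_duration_s": mean(r["duration_s"] for r in rows),
        }
    return summary


# Made-up sample records, for illustration only (not real run output).
sample = [
    {"instance_id": "a-1", "repo": "django/django", "duration_s": 10.0, "patch": "diff --git a/x b/x"},
    {"instance_id": "a-2", "repo": "django/django", "duration_s": 20.0, "patch": ""},
    {"instance_id": "b-1", "repo": "psf/requests", "duration_s": 30.0, "patch": "diff --git a/y b/y"},
]

for repo, stats in summarize(sample).items():
    print(repo, stats)
```

Treating an empty diff as "no patch" is a design choice of this sketch: a run that finishes without modifying the repository is counted separately from one that produced a candidate fix, whether or not the fix later passes evaluation.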