multica/scripts/swe-bench
Jiayuan Zhang 90d374ffd5 feat(scripts): add SWE-bench runner for Multica agent evaluation
- download-dataset.py: fetches SWE-bench Lite/Verified/Full from HuggingFace
- run.ts: core runner that clones repos, runs Agent, collects git diff patches
- evaluate.sh: wrapper for official SWE-bench Docker evaluation harness
- analyze.ts: summarizes run results with per-repo and timing breakdowns

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 18:05:17 +08:00
.gitignore | feat(scripts): add SWE-bench runner for Multica agent evaluation | 2026-02-15 18:05:17 +08:00
analyze.ts | feat(scripts): add SWE-bench runner for Multica agent evaluation | 2026-02-15 18:05:17 +08:00
download-dataset.py | feat(scripts): add SWE-bench runner for Multica agent evaluation | 2026-02-15 18:05:17 +08:00
evaluate.sh | feat(scripts): add SWE-bench runner for Multica agent evaluation | 2026-02-15 18:05:17 +08:00
run.ts | feat(scripts): add SWE-bench runner for Multica agent evaluation | 2026-02-15 18:05:17 +08:00
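
The commit message says analyze.ts summarizes run results with per-repo and timing breakdowns. The contents of analyze.ts are not shown here, so the following is only a minimal sketch of what such a summary might look like. The `RunResult` and `RepoSummary` shapes and the `repoOf` and `summarize` helpers are hypothetical names chosen for illustration; the one grounded assumption is that SWE-bench instance IDs follow the `<owner>__<repo>-<number>` pattern (for example `django__django-11099`).

```typescript
// Hypothetical shape of one run record; the real output format of run.ts is not shown.
interface RunResult {
  instanceId: string; // SWE-bench instance ID, e.g. "django__django-11099"
  patch: string;      // git diff collected after the agent ran ("" if none)
  durationMs: number; // wall-clock time spent on this instance
}

// Hypothetical per-repo summary row.
interface RepoSummary {
  total: number;          // instances attempted for this repo
  withPatch: number;      // instances that produced a non-empty patch
  meanDurationMs: number; // average wall-clock time per instance
}

// Derive "owner/repo" from an instance ID of the form "<owner>__<repo>-<number>".
function repoOf(instanceId: string): string {
  const match = /^(.+)__(.+)-\d+$/.exec(instanceId);
  return match ? `${match[1]}/${match[2]}` : "unknown";
}

// Group results by repo, counting non-empty patches and averaging durations.
function summarize(results: RunResult[]): Map<string, RepoSummary> {
  const totals = new Map<string, { total: number; withPatch: number; durationMs: number }>();
  for (const r of results) {
    const repo = repoOf(r.instanceId);
    const acc = totals.get(repo) || { total: 0, withPatch: 0, durationMs: 0 };
    acc.total += 1;
    if (r.patch.trim() !== "") acc.withPatch += 1;
    acc.durationMs += r.durationMs;
    totals.set(repo, acc);
  }

  const summary = new Map<string, RepoSummary>();
  totals.forEach((acc, repo) => {
    summary.set(repo, {
      total: acc.total,
      withPatch: acc.withPatch,
      meanDurationMs: acc.durationMs / acc.total,
    });
  });
  return summary;
}
```

Note that a non-empty patch only means the agent changed something. Whether the patch resolves the issue is decided by the official SWE-bench Docker harness that evaluate.sh wraps, so a real summary would likely merge in those evaluation results as well.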