multica

marketing-shibata50/multica

Commit graph

Author: Jiayuan Zhang

SHA1: 90d374ffd5

Message:

feat(scripts): add SWE-bench runner for Multica agent evaluation

- download-dataset.py: fetches SWE-bench Lite/Verified/Full from HuggingFace
- run.ts: core runner that clones repos, runs Agent, collects git diff patches
- evaluate.sh: wrapper for official SWE-bench Docker evaluation harness
- analyze.ts: summarizes run results with per-repo and timing breakdowns

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Date: 2026-02-15 18:05:17 +08:00

1 commit