multica/scripts/swe-bench/.gitignore at 63e754114952a7e4dd1ed4eaa8e482c99903d4d2 - marketing-shibata50/multica


Jiayuan Zhang 90d374ffd5 feat(scripts): add SWE-bench runner for Multica agent evaluation

- download-dataset.py: fetches SWE-bench Lite/Verified/Full from HuggingFace
- run.ts: core runner that clones repos, runs Agent, collects git diff patches
- evaluate.sh: wrapper for official SWE-bench Docker evaluation harness
- analyze.ts: summarizes run results with per-repo and timing breakdowns

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-02-15 18:05:17 +08:00

5 lines

81 B

Text


 # Downloaded datasets
 *.jsonl
 # Keep this .gitignore file itself tracked
 !.gitignore
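
One way to check what the two patterns above do is to try them in a scratch repository. The sketch below is illustrative only: the temporary directory and the file names swe-bench-lite.jsonl and run-notes.txt are hypothetical stand-ins, not files confirmed to exist in this repo. It relies on `git status --porcelain --ignored`, which marks ignored paths with `!!` and untracked paths with `??`.

```shell
# Scratch repository, created only for this demonstration.
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q .

# Reproduce the two patterns from the .gitignore shown above.
printf '%s\n' '*.jsonl' '!.gitignore' > .gitignore

# Hypothetical files: one dataset dump and one ordinary file.
touch swe-bench-lite.jsonl run-notes.txt

# "!!" marks ignored paths, "??" marks untracked (not ignored) paths.
status_output="$(git status --porcelain --ignored)"
printf '%s\n' "$status_output"
```

Under these assumptions the .jsonl file should be the only path reported as ignored, while .gitignore and the ordinary file remain visible to git as untracked.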