multica/scripts
Jiayuan Zhang 90d374ffd5 feat(scripts): add SWE-bench runner for Multica agent evaluation
- download-dataset.py: fetches SWE-bench Lite/Verified/Full from HuggingFace
- run.ts: core runner that clones repos, runs Agent, collects git diff patches
- evaluate.sh: wrapper for official SWE-bench Docker evaluation harness
- analyze.ts: summarizes run results with per-repo and timing breakdowns

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 18:05:17 +08:00
swe-bench feat(scripts): add SWE-bench runner for Multica agent evaluation 2026-02-15 18:05:17 +08:00
archive-dev-data.sh feat(scripts): add dev:local:archive to snapshot dev data for debugging 2026-02-15 00:58:39 +08:00
build-cli.js chore(cli): update package.json and build script for unified CLI 2026-02-01 23:09:54 +08:00
dev-local.sh refactor: unify API URL env var to MULTICA_API_URL 2026-02-15 06:31:00 +08:00
generate-code-stats-report.sh feat(report): add code stats report generator 2026-02-15 04:32:30 +08:00
reset-user-data.sh chore: update reset scripts and docs for dev data directory 2026-02-15 00:39:25 +08:00
set-telegram-webhook.sh chore(telegram): add webhook setup script 2026-02-10 17:07:45 +08:00
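
The latest commit describes analyze.ts as summarizing run results with per-repo and timing breakdowns. The listing does not show that script, so the following is only a minimal sketch of what such a summary might look like. The `RunResult` shape, the field names, and the `repoOf` and `summarizeByRepo` helpers are all hypothetical. The sketch also assumes SWE-bench style instance IDs of the form `<owner>__<repo>-<number>`.

```typescript
// Hypothetical result record; the real output format of run.ts is not shown in the source.
interface RunResult {
  instanceId: string; // SWE-bench style ID, e.g. "django__django-11099"
  durationMs: number; // wall-clock time of one agent run
  patch: string; // collected git diff; empty if the agent changed nothing
}

interface RepoSummary {
  repo: string;
  total: number;
  withPatch: number;
  meanDurationMs: number;
}

// Derive "owner/repo" from an ID of the form "<owner>__<repo>-<number>".
function repoOf(instanceId: string): string {
  return instanceId.replace(/-\d+$/, "").replace("__", "/");
}

// Group results by repository, then count runs, non-empty patches, and mean duration.
function summarizeByRepo(results: RunResult[]): RepoSummary[] {
  const groups: Record<string, RunResult[]> = {};
  for (const r of results) {
    const repo = repoOf(r.instanceId);
    if (!groups[repo]) {
      groups[repo] = [];
    }
    groups[repo].push(r);
  }
  return Object.keys(groups)
    .sort()
    .map((repo) => {
      const rs = groups[repo];
      const totalMs = rs.reduce((sum, r) => sum + r.durationMs, 0);
      return {
        repo,
        total: rs.length,
        withPatch: rs.filter((r) => r.patch.trim().length > 0).length,
        meanDurationMs: totalMs / rs.length,
      };
    });
}
```

A non-empty patch only means the agent produced a diff. Whether that diff resolves the issue is decided by the official SWE-bench evaluation harness, which evaluate.sh wraps.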