- download-dataset.py: fetches SWE-bench Lite/Verified/Full from HuggingFace
- run.ts: core runner that clones repos, runs Agent, collects git diff patches
- evaluate.sh: wrapper for official SWE-bench Docker evaluation harness
- analyze.ts: summarizes run results with per-repo and timing breakdowns

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
# Downloaded datasets
*.jsonl

# Don't ignore the scripts themselves
!.gitignore