**Audit Results (256 questions):** - Pass: 231 (90.2%) - Issues: 25 (9.8%) - Critical: 6 (wrong answer/factual error) - Warning: 16 (ambiguous/outdated) - Info: 3 (minor wording) **Critical issues fixed** (see landing repo commit 94bc3db): - Q01-001: npm vs curl for universal install - Q03-011: CLAUDE.md location confusion - Q08-019: auto:N threshold misunderstanding - Q09-003: --headless flag doesn't exist - Q09-029: Boris Cherny attribution - Q12-012: wrong sub-agent count **Warnings to review** (Priority 2): - 5 ambiguities (missing guide context) - 7 factual accuracy issues (stats without sources) - 2 outdated info (version changes) **Healthiest categories:** Q05, Q07, Q11, Q13 (100% pass rate) **Need attention:** Q09 (79.3%), Q10 (75.0%) Audit system: extract-audit-context.py → generate-audit-batches.py → 16 parallel agents → generate-audit-report.py Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
129 lines
5.9 KiB
Markdown
129 lines
5.9 KiB
Markdown
# Quiz Question Audit Report
|
||
|
||
**Generated**: 2026-02-04
|
||
|
||
---
|
||
|
||
## Executive Summary
|
||
|
||
**Total Questions Reviewed**: 256
|
||
**Pass**: 231 (90.2%)
|
||
**Issues Found**: 25 (9.8%)
|
||
|
||
### Issue Breakdown
|
||
|
||
- **Critical**: 6 (wrong answer, major factual error)
|
||
- **Warning**: 16 (ambiguous, outdated, misleading)
|
||
- **Info**: 3 (minor wording, trivial)
|
||
|
||
---
|
||
|
||
## Critical Issues (Immediate Fix Required)
|
||
|
||
### Q01-001
|
||
|
||
**Type**: CORRECT_ANSWER
|
||
**Issue**: Guide shows npm as Universal Method, not curl
|
||
|
||
### Q03-011
|
||
|
||
**Type**: CORRECT_ANSWER
|
||
**Issue**: Guide states CLAUDE.md in .claude/ should be committed (project memory), not gitignored
|
||
|
||
### Q08-019
|
||
|
||
**Type**: CORRECT_ANSWER
|
||
**Issue**: The explanation states "There is no configurable 'auto:N' parameter" yet the question claims auto:N controls lazy loading. The guide (architecture.md:996) shows ENABLE_TOOL_SEARCH=auto:N sets thresholds (5%/10%/20% context), NOT max tools. The 10,000 token threshold is automatic. The question confuses threshold configuration with tool count.
|
||
|
||
### Q09-003
|
||
|
||
**Type**: CORRECT_ANSWER
|
||
**Issue**: Guide shows no CLI flag for headless mode, only mentions "Headless Mode" as section title with no content
|
||
|
||
### Q09-029
|
||
|
||
**Type**: CORRECT_ANSWER
|
||
**Issue**: Guide quote says "I treat Claude.md as compounding memory: every mistake becomes a durable rule for the team" (line 12856) - the 4-step cycle explanation is NOT from Boris Cherny, it's an interpretation. The correct answer explains a process not explicitly stated in the guide.
|
||
|
||
### Q12-012
|
||
|
||
**Type**: CORRECT_ANSWER
|
||
**Issue**: Guide shows 3 sub-agent types (Explore, Plan, general-purpose) but option b incorrectly lists 4 types including Bash as a sub-agent type. Bash is a TOOL not a sub-agent type.
|
||
|
||
---
|
||
|
||
## Warnings (Review & Consider Fixing)
|
||
|
||
### AMBIGUITY (5 questions)
|
||
|
||
- **Q01-014**: Guide context doesn't clearly list what's preserved vs not preserved
|
||
- **Q02-007**: Guide context shows generic section header, not specific content about context poisoning
|
||
- **Q02-015**: Guide context points to Fresh Context Pattern section, not XML prompts usage
|
||
- **Q04-011**: The guide context (line 17354) is irrelevant to multi-agent orchestration pattern; the correct context appears at lines 5280-5310
|
||
- **Q09-005**: "Rev the Engine" describes multiple rounds of PLANNING (think → plan → think harder → refine), not write-critique-improve cycles as stated in explanation
|
||
|
||
### CORRECT_ANSWER (2 questions)
|
||
|
||
- **Q10-001**: Shift+Tab does NOT toggle plan/execute; it cycles through permission modes (default→auto→plan). Use /plan to explicitly enter Plan Mode.
|
||
- **Q14-011**: Both interpretations are technically valid in guide context (ai-traceability.md lines 114-138), but the nuance is slightly different. Guide shows Assisted-by is for when you're the primary author with AI help (LLVM standard), Co-Authored-By is Claude's default (shared authorship). The answer is correct in principle but could be more precise.
|
||
|
||
### FACTUAL_ACCURACY (7 questions)
|
||
|
||
- **Q02-018**: Explanation says "76% fewer tokens with better results" but this specific metric is not in guide context provided
|
||
- **Q03-018**: Explanation mixes guide's 8 domains with Boris Cherny's 4 methods without clear distinction; potential confusion
|
||
- **Q04-018**: Stats cited as "53-79%" but guide (line 6259) shows "~56%" auto-invocation rate from Gao 2026; also "100% reliable" for CLAUDE.md is overstated (no source confirms 100%)
|
||
- **Q06-003**: Guide shows both $ARGUMENTS[0] and $0 as valid syntax, explanation incomplete
|
||
- **Q09-006**: Guide context excerpt shows generic "## Output Format" header (line 4849) unrelated to CLI flags; actual flag exists but wrong context provided
|
||
- **Q09-026**: Guide says ">10 occurrences = established" (line 5227) not ">10 occurrences", threshold should be "10+" or "≥10"
|
||
- **Q10-014**: Guide context shows "nano ~/.claude.json" (line 17354) which is NOT about .gitignore patterns. Correct info is in the explanation but context snippet is wrong file location.
|
||
|
||
### OUTDATED (2 questions)
|
||
|
||
- **Q10-004**: Guide shows 75-90% for /compact (line 1449: "🔴 Red | 75-90% | Use /compact or /clear"). Explanation says 70-90% which conflicts. Threshold updated from 70% to 75% in recent versions.
|
||
- **Q15-011**: Bridge script exists in examples/scripts/bridge.py, not unresolved. Guide context was incorrectly marked as unresolved.
|
||
|
||
---
|
||
|
||
## Info (Minor Issues)
|
||
|
||
- **Q09-028** (FACTUAL_ACCURACY): Guide references Osmani's article which mentions "comprehension debt" but doesn't explicitly define it as "Code you shipped but don't fully understand" - explanation is correct but attribution could be clearer
|
||
- **Q10-002** (FACTUAL_ACCURACY): Explanation correct but context snippet does not show the Esc×2 shortcut (line 15862 only shows "Esc: Dismiss current suggestion", not Esc×2). Guide context incomplete.
|
||
- **Q10-006** (TRIVIAL): Question shows answer in option text: "Bash(git *)" is the only option with wildcard syntax matching the question "allow ALL git commands".
|
||
|
||
---
|
||
|
||
## Health by Category
|
||
|
||
| Category | Pass | Issues | Pass Rate |
|
||
|----------|------|--------|-----------|
|
||
| Category Q01 | 16 | 2 | 88.9% |
|
||
| Category Q02 | 15 | 3 | 83.3% |
|
||
| Category Q03 | 17 | 2 | 89.5% |
|
||
| Category Q04 | 16 | 2 | 88.9% |
|
||
| Category Q05 | 18 | 0 | 100.0% |
|
||
| Category Q06 | 11 | 1 | 91.7% |
|
||
| Category Q07 | 16 | 0 | 100.0% |
|
||
| Category Q08 | 19 | 1 | 95.0% |
|
||
| Category Q09 | 23 | 6 | 79.3% |
|
||
| Category Q10 | 15 | 5 | 75.0% |
|
||
| Category Q11 | 17 | 0 | 100.0% |
|
||
| Category Q12 | 14 | 1 | 93.3% |
|
||
| Category Q13 | 12 | 0 | 100.0% |
|
||
| Category Q14 | 10 | 1 | 90.9% |
|
||
| Category Q15 | 12 | 1 | 92.3% |
|
||
|
||
---
|
||
|
||
## Recommended Actions
|
||
|
||
1. **Fix Critical Issues** (Priority 1)
|
||
- Review each critical issue
|
||
- Fix question/answer or update explanation
|
||
- Rebuild: `python3 scripts/build-questions.py`
|
||
|
||
2. **Review Warnings** (Priority 2)
|
||
- Evaluate ambiguities and outdated info
|
||
- Decide: fix, clarify, or accept
|
||
|
||
3. **Consider Info Issues** (Priority 3)
|
||
- Minor improvements for quality
|