docs: add quiz audit report (6 critical issues found)

**Audit Results (256 questions):**
- Pass: 231 (90.2%)
- Issues: 25 (9.8%)
  - Critical: 6 (wrong answer/factual error)
  - Warning: 16 (ambiguous/outdated/misleading)
  - Info: 3 (minor wording/trivial)

**Critical issues fixed** (see landing repo commit 94bc3db):
- Q01-001: npm vs curl for universal install
- Q03-011: CLAUDE.md in .claude/ should be committed, not gitignored
- Q08-019: auto:N threshold misunderstanding
- Q09-003: --headless flag doesn't exist
- Q09-029: Boris Cherny attribution
- Q12-012: wrong sub-agent count

**Warnings to review** (Priority 2):
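- 5 ambiguities (mismatched or missing guide context)
- 2 correct-answer concerns (Q10-001, Q14-011)
- 7 factual accuracy issues (unsourced stats, wrong context snippets)
- 2 outdated items (changed threshold, stale "unresolved" status)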

**Healthiest categories:** Q05, Q07, Q11, Q13 (100% pass rate)
**Need attention:** Q09 (79.3%), Q10 (75.0%)

Audit system: extract-audit-context.py → generate-audit-batches.py → 16 parallel agents → generate-audit-report.py
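A minimal sketch of the batching step, showing one way 256 questions can become 16 agent batches. It assumes a hypothetical per-batch cap (`MAX_BATCH = 20`) and a hypothetical `make_batches` helper; the actual rule inside generate-audit-batches.py is not shown in this commit. With the per-category sizes from the health table, this rule happens to yield 16 batches and splits Q09 into 14 + 15, matching the batch files in this diff.

```python
from collections import defaultdict
from math import ceil

# Hypothetical cap on questions per agent; the real generate-audit-batches.py rule is not shown here.
MAX_BATCH = 20


def make_batches(question_ids, max_batch=MAX_BATCH):
    """Group IDs such as 'Q09-003' by category and split oversized categories evenly."""
    by_category = defaultdict(list)
    for qid in sorted(question_ids):  # zero-padded IDs sort correctly as strings
        by_category[qid.split("-")[0]].append(qid)

    batches = []
    for category in sorted(by_category):
        ids = by_category[category]
        n_chunks = ceil(len(ids) / max_batch)
        base, extra = divmod(len(ids), n_chunks)
        start = 0
        for i in range(n_chunks):
            # The last `extra` chunks each take one additional question.
            size = base + (1 if i >= n_chunks - extra else 0)
            batches.append(ids[start:start + size])
            start += size
    return batches


if __name__ == "__main__":
    # Per-category question counts taken from the health table.
    counts = {"Q01": 18, "Q02": 18, "Q03": 19, "Q04": 18, "Q05": 18,
              "Q06": 12, "Q07": 16, "Q08": 20, "Q09": 29, "Q10": 20,
              "Q11": 17, "Q12": 15, "Q13": 12, "Q14": 11, "Q15": 13}
    # Illustrative sequential IDs, not the real question bank.
    ids = [f"{cat}-{n:03d}" for cat, total in counts.items() for n in range(1, total + 1)]
    batches = make_batches(ids)
    print(len(batches), [len(b) for b in batches])
    # → 16 [18, 18, 19, 18, 18, 12, 16, 20, 14, 15, 20, 17, 15, 12, 11, 13]
```

Splitting evenly rather than filling each batch to the cap keeps agent workloads similar (14 + 15 instead of 20 + 9).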

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Florian BRUNIAUX 2026-02-04 17:20:35 +01:00
parent a55ff38143
commit 69f493591e
17 changed files with 385 additions and 0 deletions

claudedocs/audit-report.md
@@ -0,0 +1,129 @@
# Quiz Question Audit Report
**Generated**: 2026-02-04
---
## Executive Summary
**Total Questions Reviewed**: 256
**Pass**: 231 (90.2%)
**Issues Found**: 25 (9.8%)
### Issue Breakdown
- **Critical**: 6 (wrong answer, major factual error)
- **Warning**: 16 (ambiguous, outdated, misleading)
- **Info**: 3 (minor wording, trivial)
---
## Critical Issues (Immediate Fix Required)
### Q01-001
**Type**: CORRECT_ANSWER
**Issue**: Guide shows npm as Universal Method, not curl
### Q03-011
**Type**: CORRECT_ANSWER
**Issue**: Guide states CLAUDE.md in .claude/ should be committed (project memory), not gitignored
### Q08-019
**Type**: CORRECT_ANSWER
**Issue**: The question claims auto:N controls lazy loading, while the explanation states "There is no configurable 'auto:N' parameter". The guide (architecture.md:996) shows that ENABLE_TOOL_SEARCH=auto:N sets context thresholds (5%/10%/20% of context), NOT a maximum tool count, and that the 10,000-token threshold is applied automatically. The question confuses threshold configuration with tool count.
### Q09-003
**Type**: CORRECT_ANSWER
**Issue**: The guide documents no CLI flag for headless mode; "Headless Mode" appears only as a section title with no content
### Q09-029
**Type**: CORRECT_ANSWER
**Issue**: The guide quotes Boris Cherny (line 12856): "I treat Claude.md as compounding memory: every mistake becomes a durable rule for the team". The 4-step cycle in the explanation is an interpretation, not his words, so the marked correct answer describes a process the guide never states explicitly.
### Q12-012
**Type**: CORRECT_ANSWER
**Issue**: Guide shows 3 sub-agent types (Explore, Plan, general-purpose) but option b incorrectly lists 4 types including Bash as a sub-agent type. Bash is a TOOL not a sub-agent type.
---
## Warnings (Review & Consider Fixing)
### AMBIGUITY (5 questions)
- **Q01-014**: Guide context doesn't clearly list what's preserved vs not preserved
- **Q02-007**: Guide context shows generic section header, not specific content about context poisoning
- **Q02-015**: Guide context points to Fresh Context Pattern section, not XML prompts usage
- **Q04-011**: The guide context (line 17354) is irrelevant to multi-agent orchestration pattern; the correct context appears at lines 5280-5310
- **Q09-005**: "Rev the Engine" describes multiple rounds of PLANNING (think → plan → think harder → refine), not write-critique-improve cycles as stated in explanation
### CORRECT_ANSWER (2 questions)
- **Q10-001**: Shift+Tab does NOT toggle plan/execute; it cycles through permission modes (default→auto→plan). Use /plan to explicitly enter Plan Mode.
- **Q14-011**: Both interpretations are defensible from the guide (ai-traceability.md lines 114-138), but they differ in nuance: Assisted-by is for when you are the primary author with AI help (LLVM standard), while Co-Authored-By is Claude's default (shared authorship). The answer is correct in principle but could be more precise.
### FACTUAL_ACCURACY (7 questions)
- **Q02-018**: Explanation says "76% fewer tokens with better results" but this specific metric is not in guide context provided
- **Q03-018**: Explanation mixes guide's 8 domains with Boris Cherny's 4 methods without clear distinction; potential confusion
- **Q04-018**: Stats cited as "53-79%" but guide (line 6259) shows "~56%" auto-invocation rate from Gao 2026; also "100% reliable" for CLAUDE.md is overstated (no source confirms 100%)
- **Q06-003**: Guide shows both $ARGUMENTS[0] and $0 as valid syntax, explanation incomplete
- **Q09-006**: Guide context excerpt shows generic "## Output Format" header (line 4849) unrelated to CLI flags; actual flag exists but wrong context provided
- **Q09-026**: Threshold wording is inconsistent: the guide (line 5227) reads ">10 occurrences = established", which excludes exactly 10; the threshold should be stated as "10+" or "≥10"
- **Q10-014**: Guide context shows "nano ~/.claude.json" (line 17354), which is unrelated to .gitignore patterns. The explanation is correct, but the context snippet points to the wrong location.
### OUTDATED (2 questions)
- **Q10-004**: Guide shows 75-90% for /compact (line 1449: "🔴 Red | 75-90% | Use /compact or /clear"). Explanation says 70-90% which conflicts. Threshold updated from 70% to 75% in recent versions.
- **Q15-011**: The bridge script exists at examples/scripts/bridge.py, so it is not unresolved; the guide context was incorrectly marked as unresolved.
---
## Info (Minor Issues)
- **Q09-028** (FACTUAL_ACCURACY): Guide references Osmani's article which mentions "comprehension debt" but doesn't explicitly define it as "Code you shipped but don't fully understand" - explanation is correct but attribution could be clearer
- **Q10-002** (FACTUAL_ACCURACY): Explanation correct but context snippet does not show the Esc×2 shortcut (line 15862 only shows "Esc: Dismiss current suggestion", not Esc×2). Guide context incomplete.
- **Q10-006** (TRIVIAL): The option text gives the answer away: "Bash(git *)" is the only option whose wildcard syntax matches the question's "allow ALL git commands".
---
## Health by Category
| Category | Pass | Issues | Pass Rate |
|----------|------|--------|-----------|
| Category Q01 | 16 | 2 | 88.9% |
| Category Q02 | 15 | 3 | 83.3% |
| Category Q03 | 17 | 2 | 89.5% |
| Category Q04 | 16 | 2 | 88.9% |
| Category Q05 | 18 | 0 | 100.0% |
| Category Q06 | 11 | 1 | 91.7% |
| Category Q07 | 16 | 0 | 100.0% |
| Category Q08 | 19 | 1 | 95.0% |
| Category Q09 | 23 | 6 | 79.3% |
| Category Q10 | 15 | 5 | 75.0% |
| Category Q11 | 17 | 0 | 100.0% |
| Category Q12 | 14 | 1 | 93.3% |
| Category Q13 | 12 | 0 | 100.0% |
| Category Q14 | 10 | 1 | 90.9% |
| Category Q15 | 12 | 1 | 92.3% |
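The table above can be recomputed from the raw batch lines. A minimal sketch, assuming only the two line shapes that appear in the batch files (`PASS: <id>` and `ISSUE: <id> - [<severity>] <TYPE> - <text>`); `summarize` is a hypothetical helper, not the actual generate-audit-report.py.

```python
import re
from collections import Counter, defaultdict

# The two line shapes used in the batch files, e.g.
#   PASS: Q01-002
#   ISSUE: Q01-001 - [critical] CORRECT_ANSWER - Guide shows npm as ...
PASS_RE = re.compile(r"^PASS: (Q\d{2})-\d{3}$")
ISSUE_RE = re.compile(r"^ISSUE: (Q\d{2})-\d{3} - \[(\w+)\] \w+ - ")


def summarize(lines):
    """Return ({category: (passes, issues, pass_rate_pct)}, Counter of severities)."""
    counts = defaultdict(lambda: [0, 0])  # category -> [passes, issues]
    severities = Counter()
    for raw in lines:
        line = raw.strip()
        if m := PASS_RE.match(line):
            counts[m.group(1)][0] += 1
        elif m := ISSUE_RE.match(line):
            counts[m.group(1)][1] += 1
            severities[m.group(2)] += 1
        # Blank or unrecognised lines are ignored.
    table = {
        cat: (p, i, round(100 * p / (p + i), 1))
        for cat, (p, i) in sorted(counts.items())
    }
    return table, severities


if __name__ == "__main__":
    sample = [
        "PASS: Q06-001",
        "PASS: Q06-002",
        "ISSUE: Q06-003 - [warning] FACTUAL_ACCURACY - Guide shows both $ARGUMENTS[0] and $0 as valid syntax, explanation incomplete",
        "",
    ]
    print(summarize(sample))
    # → ({'Q06': (2, 1, 66.7)}, Counter({'warning': 1}))
```

Fed the full Q09 batch files (23 PASS, 6 ISSUE lines), this yields `(23, 6, 79.3)`, the same figure as the table row.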
---
## Recommended Actions
1. **Fix Critical Issues** (Priority 1)
   - Review each critical issue
   - Fix question/answer or update explanation
   - Rebuild: `python3 scripts/build-questions.py`
2. **Review Warnings** (Priority 2)
   - Evaluate ambiguities and outdated info
   - Decide: fix, clarify, or accept
3. **Consider Info Issues** (Priority 3)
   - Minor improvements for quality

@@ -0,0 +1,18 @@
ISSUE: Q01-001 - [critical] CORRECT_ANSWER - Guide shows npm as Universal Method, not curl
PASS: Q01-002
PASS: Q01-003
PASS: Q01-004
PASS: Q01-005
PASS: Q01-006
PASS: Q01-007
PASS: Q01-008
PASS: Q01-009
PASS: Q01-010
PASS: Q01-011
PASS: Q01-012
PASS: Q01-013
ISSUE: Q01-014 - [warning] AMBIGUITY - Guide context doesn't clearly list what's preserved vs not preserved
PASS: Q01-015
PASS: Q01-016
PASS: Q01-017
PASS: Q01-018

@@ -0,0 +1,18 @@
PASS: Q02-001
PASS: Q02-002
PASS: Q02-003
PASS: Q02-004
PASS: Q02-005
PASS: Q02-006
ISSUE: Q02-007 - [warning] AMBIGUITY - Guide context shows generic section header, not specific content about context poisoning
PASS: Q02-008
PASS: Q02-009
PASS: Q02-010
PASS: Q02-011
PASS: Q02-012
PASS: Q02-013
PASS: Q02-014
ISSUE: Q02-015 - [warning] AMBIGUITY - Guide context points to Fresh Context Pattern section, not XML prompts usage
PASS: Q02-016
PASS: Q02-017
ISSUE: Q02-018 - [warning] FACTUAL_ACCURACY - Explanation says "76% fewer tokens with better results" but this specific metric is not in guide context provided

@@ -0,0 +1,19 @@
PASS: Q03-001
PASS: Q03-002
PASS: Q03-003
PASS: Q03-004
PASS: Q03-005
PASS: Q03-006
PASS: Q03-007
PASS: Q03-008
PASS: Q03-009
PASS: Q03-010
ISSUE: Q03-011 - [critical] CORRECT_ANSWER - Guide states CLAUDE.md in .claude/ should be committed (project memory), not gitignored
PASS: Q03-012
PASS: Q03-013
PASS: Q03-014
PASS: Q03-015
PASS: Q03-016
PASS: Q03-017
ISSUE: Q03-018 - [warning] FACTUAL_ACCURACY - Explanation mixes guide's 8 domains with Boris Cherny's 4 methods without clear distinction; potential confusion
PASS: Q03-019

@@ -0,0 +1,18 @@
PASS: Q04-001
PASS: Q04-002
PASS: Q04-003
PASS: Q04-004
PASS: Q04-005
PASS: Q04-006
PASS: Q04-007
PASS: Q04-008
PASS: Q04-009
PASS: Q04-010
ISSUE: Q04-011 - [warning] AMBIGUITY - The guide context (line 17354) is irrelevant to multi-agent orchestration pattern; the correct context appears at lines 5280-5310
PASS: Q04-012
PASS: Q04-013
PASS: Q04-014
PASS: Q04-015
PASS: Q04-016
PASS: Q04-017
ISSUE: Q04-018 - [warning] FACTUAL_ACCURACY - Stats cited as "53-79%" but guide (line 6259) shows "~56%" auto-invocation rate from Gao 2026; also "100% reliable" for CLAUDE.md is overstated (no source confirms 100%)

@@ -0,0 +1,18 @@
PASS: Q05-001
PASS: Q05-002
PASS: Q05-003
PASS: Q05-004
PASS: Q05-005
PASS: Q05-006
PASS: Q05-007
PASS: Q05-008
PASS: Q05-009
PASS: Q05-010
PASS: Q05-011
PASS: Q05-012
PASS: Q05-013
PASS: Q05-014
PASS: Q05-015
PASS: Q05-016
PASS: Q05-017
PASS: Q05-018

@@ -0,0 +1,12 @@
PASS: Q06-001
PASS: Q06-002
ISSUE: Q06-003 - [warning] FACTUAL_ACCURACY - Guide shows both $ARGUMENTS[0] and $0 as valid syntax, explanation incomplete
PASS: Q06-004
PASS: Q06-005
PASS: Q06-006
PASS: Q06-007
PASS: Q06-008
PASS: Q06-009
PASS: Q06-010
PASS: Q06-011
PASS: Q06-012

@@ -0,0 +1,16 @@
PASS: Q07-001
PASS: Q07-002
PASS: Q07-003
PASS: Q07-004
PASS: Q07-005
PASS: Q07-006
PASS: Q07-007
PASS: Q07-008
PASS: Q07-009
PASS: Q07-010
PASS: Q07-011
PASS: Q07-012
PASS: Q07-013
PASS: Q07-014
PASS: Q07-015
PASS: Q07-016

@@ -0,0 +1,20 @@
PASS: Q08-001
PASS: Q08-002
PASS: Q08-003
PASS: Q08-004
PASS: Q08-005
PASS: Q08-006
PASS: Q08-007
PASS: Q08-008
PASS: Q08-009
PASS: Q08-010
PASS: Q08-011
PASS: Q08-012
PASS: Q08-013
PASS: Q08-014
PASS: Q08-015
PASS: Q08-016
PASS: Q08-017
PASS: Q08-018
ISSUE: Q08-019 - [critical] CORRECT_ANSWER - The question claims auto:N controls lazy loading, while the explanation states "There is no configurable 'auto:N' parameter". The guide (architecture.md:996) shows that ENABLE_TOOL_SEARCH=auto:N sets context thresholds (5%/10%/20% of context), NOT a maximum tool count, and that the 10,000-token threshold is applied automatically. The question confuses threshold configuration with tool count.
PASS: Q08-020

@@ -0,0 +1,14 @@
PASS: Q09-001
PASS: Q09-002
ISSUE: Q09-003 - [critical] CORRECT_ANSWER - The guide documents no CLI flag for headless mode; "Headless Mode" appears only as a section title with no content
PASS: Q09-004
ISSUE: Q09-005 - [warning] AMBIGUITY - "Rev the Engine" describes multiple rounds of PLANNING (think → plan → think harder → refine), not write-critique-improve cycles as stated in explanation
ISSUE: Q09-006 - [warning] FACTUAL_ACCURACY - Guide context excerpt shows generic "## Output Format" header (line 4849) unrelated to CLI flags; actual flag exists but wrong context provided
PASS: Q09-007
PASS: Q09-008
PASS: Q09-009
PASS: Q09-010
PASS: Q09-011
PASS: Q09-012
PASS: Q09-013
PASS: Q09-014

@@ -0,0 +1,15 @@
PASS: Q09-015
PASS: Q09-016
PASS: Q09-017
PASS: Q09-018
PASS: Q09-019
PASS: Q09-020
PASS: Q09-021
PASS: Q09-022
PASS: Q09-023
PASS: Q09-024
PASS: Q09-025
ISSUE: Q09-026 - [warning] FACTUAL_ACCURACY - Threshold wording is inconsistent: the guide (line 5227) reads ">10 occurrences = established", which excludes exactly 10; the threshold should be stated as "10+" or "≥10"
PASS: Q09-027
ISSUE: Q09-028 - [info] FACTUAL_ACCURACY - Guide references Osmani's article which mentions "comprehension debt" but doesn't explicitly define it as "Code you shipped but don't fully understand" - explanation is correct but attribution could be clearer
ISSUE: Q09-029 - [critical] CORRECT_ANSWER - The guide quotes Boris Cherny (line 12856): "I treat Claude.md as compounding memory: every mistake becomes a durable rule for the team". The 4-step cycle in the explanation is an interpretation, not his words, so the marked correct answer describes a process the guide never states explicitly.

@@ -0,0 +1,20 @@
ISSUE: Q10-001 - [warning] CORRECT_ANSWER - Shift+Tab does NOT toggle plan/execute; it cycles through permission modes (default→auto→plan). Use /plan to explicitly enter Plan Mode.
ISSUE: Q10-002 - [info] FACTUAL_ACCURACY - Explanation correct but context snippet does not show the Esc×2 shortcut (line 15862 only shows "Esc: Dismiss current suggestion", not Esc×2). Guide context incomplete.
PASS: Q10-003
ISSUE: Q10-004 - [warning] OUTDATED - Guide shows 75-90% for /compact (line 1449: "🔴 Red | 75-90% | Use /compact or /clear"). Explanation says 70-90% which conflicts. Threshold updated from 70% to 75% in recent versions.
ISSUE: Q10-006 - [info] TRIVIAL - The option text gives the answer away: "Bash(git *)" is the only option whose wildcard syntax matches the question's "allow ALL git commands".
PASS: Q10-007
PASS: Q10-008
PASS: Q10-009
PASS: Q10-010
PASS: Q10-011
PASS: Q10-012
PASS: Q10-013
ISSUE: Q10-014 - [warning] FACTUAL_ACCURACY - Guide context shows "nano ~/.claude.json" (line 17354), which is unrelated to .gitignore patterns. The explanation is correct, but the context snippet points to the wrong location.
PASS: Q10-015
PASS: Q10-016
PASS: Q10-017
PASS: Q10-018
PASS: Q10-019
PASS: Q10-020
PASS: Q10-021

@@ -0,0 +1,17 @@
PASS: Q11-001
PASS: Q11-002
PASS: Q11-003
PASS: Q11-004
PASS: Q11-005
PASS: Q11-006
PASS: Q11-007
PASS: Q11-008
PASS: Q11-009
PASS: Q11-010
PASS: Q11-011
PASS: Q11-012
PASS: Q11-013
PASS: Q11-014
PASS: Q11-015
PASS: Q11-016
PASS: Q11-017

@@ -0,0 +1,15 @@
PASS: Q12-001
PASS: Q12-002
PASS: Q12-003
PASS: Q12-004
PASS: Q12-005
PASS: Q12-006
PASS: Q12-007
PASS: Q12-008
PASS: Q12-009
PASS: Q12-010
PASS: Q12-011
ISSUE: Q12-012 - [critical] CORRECT_ANSWER - Guide shows 3 sub-agent types (Explore, Plan, general-purpose) but option b incorrectly lists 4 types including Bash as a sub-agent type. Bash is a TOOL not a sub-agent type.
PASS: Q12-013
PASS: Q12-014
PASS: Q12-015

@@ -0,0 +1,12 @@
PASS: Q13-001
PASS: Q13-002
PASS: Q13-003
PASS: Q13-004
PASS: Q13-005
PASS: Q13-006
PASS: Q13-007
PASS: Q13-008
PASS: Q13-009
PASS: Q13-010
PASS: Q13-011
PASS: Q13-012

@@ -0,0 +1,11 @@
PASS: Q14-001
PASS: Q14-002
PASS: Q14-003
PASS: Q14-004
PASS: Q14-005
PASS: Q14-006
PASS: Q14-007
PASS: Q14-008
PASS: Q14-009
PASS: Q14-010
ISSUE: Q14-011 - [warning] CORRECT_ANSWER - Both interpretations are defensible from the guide (ai-traceability.md lines 114-138), but they differ in nuance: Assisted-by is for when you are the primary author with AI help (LLVM standard), while Co-Authored-By is Claude's default (shared authorship). The answer is correct in principle but could be more precise.

@@ -0,0 +1,13 @@
PASS: Q15-001
PASS: Q15-002
PASS: Q15-003
PASS: Q15-004
PASS: Q15-005
PASS: Q15-006
PASS: Q15-007
PASS: Q15-008
PASS: Q15-009
PASS: Q15-010
ISSUE: Q15-011 - [warning] OUTDATED - The bridge script exists at examples/scripts/bridge.py, so it is not unresolved; the guide context was incorrectly marked as unresolved.
PASS: Q15-012
PASS: Q15-013