docs: add AI productivity research, trust calibration, and exploration workflow

## New Content

### Trust & Verification (ultimate-guide.md)
- Section 1.7 "Trust Calibration: When and How Much to Verify" (~155 lines)
- Research-backed stats (ACM, Veracode, CodeRabbit, Cortex.io)
- Verification spectrum by code type
- Solo vs Team strategies with workflow diagrams
- "Prove It Works" checklist
- New pitfall: "Trust AI output without proportional verification"
- CLAUDE.md size guideline: 4-8KB optimal, >16K degrades coherence

### AI Productivity (learning-with-ai.md)
- Section "The Reality of AI Productivity" (~55 lines)
- Productivity curve phases (Wow Effect → Targeted Gains → Plateau)
- High-gain vs low/negative-gain task categorization
- Team success factors
- Productivity trajectory table by pattern (Dependent/Avoidant/Augmented)
- 5 new sources (GitHub, McKinsey, Stack Overflow, Uplevel, DORA)

### Session Limits (architecture.md)
- "Session Degradation Limits" section
- Turn limits (15-25), token thresholds (80-100K)
- Success rates by scope (1-3 files: ~85%, 8+ files: ~40%)

### Exploration Workflow
- NEW: guide/workflows/exploration-workflow.md
- Anti-anchoring prompts, 3-5 approaches pattern
- iterative-refinement.md: Script Generation Workflow (3-7 iteration pattern)
- anchor-catalog.md: Anti-Anchoring Techniques, Exploration/Iteration Prompts

### Reference Updates
- adoption-approaches.md: Empirical data section
- reference.yaml: New deep_dive entries, updated line numbers

Sources: MetalBear engineering blog, arXiv studies, Addy Osmani (Jan 2026)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-19 19:16:33 +01:00 · fd17414abb
commit fd17414abb
parent a9d302326c
10 changed files with 775 additions and 20 deletions
--- a/guide/architecture.md
+++ b/guide/architecture.md
@@ -269,6 +269,33 @@ When context usage exceeds a threshold, Claude Code automatically summarizes old
 | Specific reads | Know what you need | Read exact files, not directories |
 | CLAUDE.md | Persistent context | Store conventions in memory files |

+### Session Degradation Limits
+
+**Confidence**: 70% (Tier 2 - Practitioner studies, arXiv research)
+
+Claude Code's effectiveness degrades predictably under certain conditions:
+
+| Condition | Observed Threshold | Symptom |
+|-----------|-------------------|---------|
+| Conversation turns | **15-25 turns** | Loses track of earlier constraints |
+| Token accumulation | **80-100K tokens** | Ignores requirements stated early in session |
+| Problem scope | **>5 files simultaneously** | Inconsistent changes, missed files |
+
+**Success rates by scope** (from practitioner studies):
+
+| Scope | Success Rate | Example |
+|-------|--------------|---------|
+| 1-3 files | ~85% | Fix bug in single module |
+| 4-7 files | ~60% | Refactor feature across components |
+| 8+ files | ~40% | Codebase-wide changes |
+
+**Mitigation strategies**:
+
+1. **Checkpoint prompts**: "Before continuing, recap the current requirements and constraints."
+2. **Session resets**: Start fresh for new tasks (`/clear`)
+3. **Scope tightly**: Break large tasks into focused sub-tasks
+4. **Use sub-agents**: Delegate exploration to the `Task` tool to preserve the main context
+
 ---

 ## 4. Sub-Agent Architecture
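
Note on the "Session Degradation Limits" hunk above: the thresholds and mitigation strategies it adds could be sketched as a small check like the one below. This is only an illustration, not part of the commit and not a Claude Code API. `SessionState`, `suggest_mitigations`, and the three constants are hypothetical names, the limits are assumed to be the lower bounds of the ranges in the table, and the pairing of each condition with a mitigation is an assumption made for the example.

```python
from dataclasses import dataclass

# Assumed warning thresholds: the lower bounds of the ranges in the table.
TURN_LIMIT = 15        # "15-25 turns"
TOKEN_LIMIT = 80_000   # "80-100K tokens"
FILE_LIMIT = 5         # ">5 files simultaneously"


@dataclass
class SessionState:
    """Hypothetical snapshot of a session, tracked by the user."""
    turns: int
    tokens: int
    files_in_scope: int


def suggest_mitigations(state: SessionState) -> list[str]:
    """Return the mitigations whose (assumed) trigger condition is met."""
    suggestions = []
    if state.turns >= TURN_LIMIT:
        suggestions.append("checkpoint: ask for a recap of requirements and constraints")
    if state.tokens >= TOKEN_LIMIT:
        suggestions.append("reset: start a fresh session with /clear")
    if state.files_in_scope > FILE_LIMIT:
        suggestions.append("scope: split the task into focused sub-tasks")
    return suggestions


if __name__ == "__main__":
    # Example: a long-ish session touching many files, but still light on tokens.
    state = SessionState(turns=18, tokens=42_000, files_in_scope=7)
    for suggestion in suggest_mitigations(state):
        print(suggestion)
```

Using the lower bound of each range makes the check warn early; since the section rates its own confidence at 70%, the constants are best treated as tunable defaults rather than hard limits.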