docs: add AI productivity research, trust calibration, and exploration workflow

## New Content ### Trust & Verification (ultimate-guide.md) - Section 1.7 "Trust Calibration: When and How Much to Verify" (~155 lines) - Research-backed stats (ACM, Veracode, CodeRabbit, Cortex.io) - Verification spectrum by code type - Solo vs Team strategies with workflow diagrams - "Prove It Works" checklist - New pitfall: "Trust AI output without proportional verification" - CLAUDE.md size guideline: 4-8KB optimal, >16K degrades coherence ### AI Productivity (learning-with-ai.md) - Section "The Reality of AI Productivity" (~55 lines) - Productivity curve phases (Wow Effect → Targeted Gains → Plateau) - High-gain vs low/negative-gain task categorization - Team success factors - Productivity trajectory table by pattern (Dependent/Avoidant/Augmented) - 5 new sources (GitHub, McKinsey, Stack Overflow, Uplevel, DORA) ### Session Limits (architecture.md) - "Session Degradation Limits" section - Turn limits (15-25), token thresholds (80-100K) - Success rates by scope (1-3 files: ~85%, 8+ files: ~40%) ### Exploration Workflow - NEW: guide/workflows/exploration-workflow.md - Anti-anchoring prompts, 3-5 approaches pattern - iterative-refinement.md: Script Generation Workflow (3-7 iteration pattern) - anchor-catalog.md: Anti-Anchoring Techniques, Exploration/Iteration Prompts ### Reference Updates - adoption-approaches.md: Empirical data section - reference.yaml: New deep_dive entries, updated line numbers Sources: MetalBear engineering blog, arXiv studies, Addy Osmani (Jan 2026) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-19 19:16:33 +01:00 · 2026-01-19 19:16:33 +01:00 · fd17414abb
commit fd17414abb
parent a9d302326c
10 changed files with 775 additions and 20 deletions
--- a/guide/workflows/exploration-workflow.md
+++ b/guide/workflows/exploration-workflow.md
@ -0,0 +1,318 @@
+# Exploration Before Implementation
+
+> **Confidence**: Tier 2 — Validated by practitioner studies (+20-30% decision quality, +40% alternatives identified).
+> **Source**: [MetalBear Engineering Blog](https://metalbear.com/blog/engineering-ai-use/), arXiv practitioner studies
+
+Before coding, ask Claude for multiple approaches with trade-offs. This prevents anchoring bias—the tendency to fixate on the first solution proposed.
+
+---
+
+## Table of Contents
+
+1. [TL;DR](#tldr)
+2. [The Pattern](#the-pattern)
+3. [Anti-Anchoring Prompts](#anti-anchoring-prompts)
+4. [When to Use](#when-to-use)
+5. [Integration with Claude Code](#integration-with-claude-code)
+6. [Anti-Patterns](#anti-patterns)
+7. [See Also](#see-also)
+
+---
+
+## TL;DR
+
+```
+1. Describe problem (no code, no preconception)
+2. Request 3-5 approaches with trade-offs
+3. Ask for quantified comparison
+4. Choose approach
+5. Then implement
+```
+
+Key insight: **Once a model proposes a concrete solution, it can unintentionally narrow your thinking.**
+
+---
+
+## The Pattern
+
+### Step 1: Problem Statement Only
+
+Start with the problem, not a solution direction:
+
+```
+I need to handle user sessions in a Node.js API.
+Requirements:
+- Support 10K concurrent users
+- Session data: user ID, permissions, preferences
+- Must survive server restarts
+```
+
+**Not this** (anchors on Redis):
+```
+I'm thinking of using Redis for sessions. How should I implement it?
+```
+
+### Step 2: Request Multiple Approaches
+
+```
+Give me 4 different approaches to solve this.
+For each, include:
+- Architecture overview
+- Pros and cons
+- Performance characteristics
+- Complexity to implement
+```
+
+### Step 3: Quantified Comparison
+
+```
+Now rank these approaches on a 1-10 scale for:
+- Latency (lower is better)
+- Scalability (10K → 100K users)
+- Operational complexity
+- Development time
+```
+
+### Step 4: Choose, Then Implement
+
+```
+I'll go with approach B (JWT + Redis hybrid).
+Now implement it following our existing patterns in src/auth/.
+```
+
+---
+
+## Anti-Anchoring Prompts
+
+LLMs can fixate on their first suggestion. These prompts combat that:
+
+| Prompt Type | Template | Effect |
+|-------------|----------|--------|
+| **Fresh start** | "Ignore any prior ideas. Generate 4 novel approaches to [X]" | Forces diversity |
+| **Reflection loop** | "Generate 3 options, then critique each, then recommend" | Self-correction (-25% anchoring bias) |
+| **Quantified trade-offs** | "Rank by [metric1], [metric2], [metric3] with scores 1-10" | Objective comparison |
+| **Devil's advocate** | "What are the strongest arguments against your recommendation?" | Surface hidden trade-offs |
+| **Constraint variation** | "Now solve the same problem with [opposite constraint]" | Expand solution space |
+
+### Example: Anti-Anchoring Prompt
+
+```
+I need pagination for a REST API with 1M+ records.
+
+IMPORTANT: Don't suggest offset-based pagination first.
+Generate 4 different pagination strategies, including at least one
+unconventional approach. For each:
+
+1. How it works (2-3 sentences)
+2. Best use case
+3. Worst use case
+4. Performance at 1M records
+
+Then recommend one, explaining why it beats the others for my use case.
+```
+
+### Reflection Loop Prompt
+
+```
+For implementing real-time notifications:
+
+Phase 1: Generate 3 approaches (WebSockets, SSE, Long Polling)
+Phase 2: For each, list 2 things that could go wrong in production
+Phase 3: Based on Phase 2, which approach is most resilient?
+
+Show your reasoning for each phase.
+```
+
+---
+
+## When to Use
+
+### Use Exploration
+
+| Scenario | Why |
+|----------|-----|
+| Greenfield features | No existing pattern to follow |
+| Architecture decisions | High impact, hard to reverse |
+| Multiple valid approaches | Need informed choice |
+| Unfamiliar domain | Don't know what you don't know |
+| Team disagreement | Get neutral analysis of options |
+
+### Skip Exploration
+
+| Scenario | Why |
+|----------|-----|
+| Bug fixes | Solution usually obvious from symptoms |
+| Single valid approach | No real choice to make |
+| Time-critical hotfixes | Speed > perfection |
+| Following existing pattern | Decision already made |
+| Trivial changes | Overhead not worth it |
+
+---
+
+## Integration with Claude Code
+
+### With /plan Mode
+
+Exploration happens **before** `/plan`:
+
+```
+# Step 1: Explore (no /plan yet)
+I need to add caching to the API. What are my options?
+
+# Claude responds with 4 approaches
+
+# Step 2: Choose
+Let's go with approach C (edge caching with Cloudflare).
+
+# Step 3: Plan
+/plan
+Implement edge caching using Cloudflare Workers.
+Follow the patterns in our existing middleware.
+```
+
+### With CLAUDE.md
+
+Add exploration triggers to your project instructions:
+
+```markdown
+## Workflow Preferences
+
+### Before New Features
+When implementing new features, first explore 3-4 approaches
+with trade-offs before committing to implementation.
+Use quantified comparison (1-10 scale) for:
+- Performance
+- Maintainability
+- Time to implement
+```
+
+### With TodoWrite
+
+Track exploration as a task:
+
+```
+TodoWrite:
+- [x] Explore caching approaches (4 options analyzed)
+- [x] Choose approach: edge caching with Cloudflare
+- [ ] Implement cache invalidation
+- [ ] Add cache headers to responses
+```
+
+---
+
+## Anti-Patterns
+
+### Premature Anchoring
+
+```
+# Wrong
+"I want to use Redis for caching. How do I set it up?"
+
+# Right
+"I need caching for API responses. What are my options?"
+```
+
+You've eliminated 90% of solutions before exploring.
+
+### Shallow Comparison
+
+```
+# Wrong
+"Which is better: Redis or Memcached?"
+
+# Right
+"Compare Redis, Memcached, and in-process caching for:
+- 50MB cache size
+- Read-heavy workload (95% reads)
+- Single server deployment
+Include latency, memory overhead, and operational complexity."
+```
+
+### Ignoring Trade-offs
+
+```
+# Wrong
+"Just tell me the best approach"
+
+# Right
+"What would I give up by choosing approach A over B?
+What scenarios would make me regret this choice?"
+```
+
+### Analysis Paralysis
+
+```
+# Wrong
+Exploring 10 options for a simple feature
+
+# Right
+Set a limit: "Give me your top 3 approaches, no more"
+Make a decision and iterate if needed
+```
+
+---
+
+## Example Session
+
+### Initial Exploration
+
+```
+User: I need to implement real-time updates in our React dashboard.
+      The dashboard shows ~50 metrics that update every 5 seconds.
+      Currently we poll, but users complain about stale data.
+
+Claude: Here are 4 approaches:
+
+**A) WebSockets**
+- Full duplex, server can push anytime
+- Pros: Lowest latency, true real-time
+- Cons: Connection management, scaling complexity
+- Best for: Sub-second updates, bidirectional communication
+
+**B) Server-Sent Events (SSE)**
+- Unidirectional server → client stream
+- Pros: Simple, HTTP-based, auto-reconnect
+- Cons: One-way only, limited browser connections
+- Best for: Dashboard feeds, notifications
+
+**C) Long Polling**
+- Client holds request until data ready
+- Pros: Works everywhere, simple fallback
+- Cons: Higher latency, more server load
+- Best for: Legacy compatibility
+
+**D) GraphQL Subscriptions**
+- WebSocket-based with GraphQL schema
+- Pros: Type-safe, selective updates
+- Cons: Requires GraphQL infrastructure
+- Best for: Existing GraphQL APIs
+
+User: Rank these for our use case (React, 50 metrics, 5s refresh,
+      team has no WebSocket experience).
+
+Claude: Scores (1-10, higher is better):
+
+| Approach | Latency | Complexity | Team Fit | Scalability |
+|----------|---------|------------|----------|-------------|
+| WebSockets | 10 | 4 | 3 | 7 |
+| SSE | 8 | 8 | 8 | 8 |
+| Long Polling | 5 | 9 | 9 | 5 |
+| GraphQL Subs | 9 | 3 | 2 | 7 |
+
+**Recommendation: SSE**
+Best balance of simplicity and real-time capability.
+Your team can implement it in a day, and it handles your
+50 metrics × 5s updates easily. Upgrade to WebSockets later
+if you need bidirectional or sub-second updates.
+
+User: Let's go with SSE. Now implement it.
+```
+
+---
+
+## See Also
+
+- [plan-driven.md](./plan-driven.md) — For codebase exploration (/plan mode)
+- [iterative-refinement.md](./iterative-refinement.md) — Refine after choosing approach
+- [../examples/semantic-anchors/anchor-catalog.md](../../examples/semantic-anchors/anchor-catalog.md) — Precision vocabulary for prompts
+- [spec-first.md](./spec-first.md) — Define requirements before exploring
--- a/guide/workflows/iterative-refinement.md
+++ b/guide/workflows/iterative-refinement.md
@ -13,8 +13,10 @@ Prompt, observe, reprompt until satisfied. The core loop of effective AI-assiste
 3. [Feedback Patterns](#feedback-patterns)
 4. [Autonomous Loops](#autonomous-loops)
 5. [Integration with Claude Code](#integration-with-claude-code)
-6. [Anti-Patterns](#anti-patterns)
-7. [See Also](#see-also)
+6. [Script Generation Workflow](#script-generation-workflow)
+7. [Iteration Strategies](#iteration-strategies)
+8. [Anti-Patterns](#anti-patterns)
+9. [See Also](#see-also)

 ---

@ -189,6 +191,79 @@ Good progress. Let's checkpoint:

 ---

+## Script Generation Workflow
+
+Script and automation generation delivers the highest ROI for iterative refinement—70-90% time savings in practitioner reports. Scripts are self-contained, testable in isolation, and yield immediate value.
+
+### The 3-7 Iteration Pattern
+
+Most production-ready scripts emerge after 3-7 iterations:
+
+| Iteration | Focus | Prompt Pattern |
+|-----------|-------|----------------|
+| 1 | Basic functionality | "Create a script that [goal]" |
+| 2-3 | Constraints + edge cases | "Add [constraint]. Handle [edge case]." |
+| 4-5 | Hardening | "Add error handling, logging, input validation" |
+| 6-7 | Polish | "Optimize for [metric]. Add usage docs." |
+
+### Example: Kubernetes Pod Manager (PowerShell)
+
+**Iteration 1 — Basic**
+```
+Create a PowerShell function to list pods in a Kubernetes namespace.
+```
+
+**Iteration 2 — Add filtering**
+```
+Add: filter by label selector and pod status.
+Show: pod name, status, age, restarts.
+```
+
+**Iteration 3 — Add actions**
+```
+Add: ability to delete pods matching filter.
+Require: confirmation before deletion.
+```
+
+**Iteration 4 — Error handling**
+```
+Handle: kubectl not found, invalid namespace, permission denied.
+Add: verbose logging with -Verbose flag.
+```
+
+**Iteration 5 — Production ready**
+```
+Add: dry-run mode, output to JSON for piping, help documentation.
+Ensure: works on Windows, Linux, macOS.
+```
+
+### Common Pitfalls
+
+| Pitfall | Example | Mitigation |
+|---------|---------|------------|
+| Hallucinated commands | `apt-get` on macOS | Specify OS: "Ubuntu 22.04 only" |
+| Security gaps | No input validation | Always request: "validate all user inputs" |
+| Over-engineering | Adds unnecessary libs | Request: "minimal dependencies, stdlib preferred" |
+| Context drift | Forgets requirements after iteration 5 | Checkpoint prompt: "Recap current requirements before next change" |
+| Platform assumptions | Assumes bash features in sh | Specify: "POSIX-compliant" or "bash 4+" |
+
+### Script Iteration Template
+
+```
+Current script: [paste or reference]
+
+Iteration goal: [specific improvement]
+
+Constraints:
+- Must preserve: [existing behavior to keep]
+- Must not: [things to avoid]
+- Target environment: [OS, shell, runtime]
+
+Success criteria: [how to verify this iteration works]
+```
+
+---
+
 ## Iteration Strategies

 ### Breadth-First
@ -307,6 +382,7 @@ Perfect. Commit this as "feat: add debounce utility with full TypeScript support

 ## See Also

+- [exploration-workflow.md](./exploration-workflow.md) — Explore alternatives before iterating
 - [tdd-with-claude.md](./tdd-with-claude.md) — TDD is iterative refinement with tests
 - [plan-driven.md](./plan-driven.md) — Plan before iterating
 - [../methodologies.md](../methodologies.md) — Iterative Loops methodology
--- a/guide/workflows/plan-driven.md
+++ b/guide/workflows/plan-driven.md
@ -244,6 +244,7 @@ Plans in `.claude/plans/` serve as decision documentation:

 ## See Also

+- [exploration-workflow.md](./exploration-workflow.md) — Explore alternatives before planning
 - [../ultimate-guide.md](../ultimate-guide.md) — Section 2.3 Plan Mode
 - [tdd-with-claude.md](./tdd-with-claude.md) — Combine with TDD
 - [spec-first.md](./spec-first.md) — Combine with Spec-First