docs: add AI productivity research, trust calibration, and exploration workflow

## New Content

### Trust & Verification (ultimate-guide.md)
- Section 1.7 "Trust Calibration: When and How Much to Verify" (~155 lines)
  - Research-backed stats (ACM, Veracode, CodeRabbit, Cortex.io)
  - Verification spectrum by code type
  - Solo vs Team strategies with workflow diagrams
  - "Prove It Works" checklist
- New pitfall: "Trust AI output without proportional verification"
- CLAUDE.md size guideline: 4-8KB optimal, >16K degrades coherence

### AI Productivity (learning-with-ai.md)
- Section "The Reality of AI Productivity" (~55 lines)
  - Productivity curve phases (Wow Effect → Targeted Gains → Plateau)
  - High-gain vs low/negative-gain task categorization
  - Team success factors
- Productivity trajectory table by pattern (Dependent/Avoidant/Augmented)
- 5 new sources (GitHub, McKinsey, Stack Overflow, Uplevel, DORA)

### Session Limits (architecture.md)
- "Session Degradation Limits" section
  - Turn limits (15-25), token thresholds (80-100K)
  - Success rates by scope (1-3 files: ~85%, 8+ files: ~40%)

### Exploration Workflow
- NEW: guide/workflows/exploration-workflow.md
  - Anti-anchoring prompts, 3-5 approaches pattern
- iterative-refinement.md: Script Generation Workflow (3-7 iteration pattern)
- anchor-catalog.md: Anti-Anchoring Techniques, Exploration/Iteration Prompts

### Reference Updates
- adoption-approaches.md: Empirical data section
- reference.yaml: New deep_dive entries, updated line numbers

Sources: MetalBear engineering blog, arXiv studies, Addy Osmani (Jan 2026)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This commit is contained in:
Florian BRUNIAUX 2026-01-19 19:16:33 +01:00
parent a9d302326c
commit fd17414abb
10 changed files with 775 additions and 20 deletions

View file

@ -0,0 +1,318 @@
# Exploration Before Implementation
> **Confidence**: Tier 2 — Validated by practitioner studies (+20-30% decision quality, +40% alternatives identified).
> **Source**: [MetalBear Engineering Blog](https://metalbear.com/blog/engineering-ai-use/), arXiv practitioner studies
Before coding, ask Claude for multiple approaches with trade-offs. This prevents anchoring bias—the tendency to fixate on the first solution proposed.
---
## Table of Contents
1. [TL;DR](#tldr)
2. [The Pattern](#the-pattern)
3. [Anti-Anchoring Prompts](#anti-anchoring-prompts)
4. [When to Use](#when-to-use)
5. [Integration with Claude Code](#integration-with-claude-code)
6. [Anti-Patterns](#anti-patterns)
7. [See Also](#see-also)
---
## TL;DR
```
1. Describe problem (no code, no preconception)
2. Request 3-5 approaches with trade-offs
3. Ask for quantified comparison
4. Choose approach
5. Then implement
```
Key insight: **Once a model proposes a concrete solution, it can unintentionally narrow your thinking.**
---
## The Pattern
### Step 1: Problem Statement Only
Start with the problem, not a solution direction:
```
I need to handle user sessions in a Node.js API.
Requirements:
- Support 10K concurrent users
- Session data: user ID, permissions, preferences
- Must survive server restarts
```
**Not this** (anchors on Redis):
```
I'm thinking of using Redis for sessions. How should I implement it?
```
### Step 2: Request Multiple Approaches
```
Give me 4 different approaches to solve this.
For each, include:
- Architecture overview
- Pros and cons
- Performance characteristics
- Complexity to implement
```
### Step 3: Quantified Comparison
```
Now rank these approaches on a 1-10 scale for:
- Latency (lower is better)
- Scalability (10K → 100K users)
- Operational complexity
- Development time
```
### Step 4: Choose, Then Implement
```
I'll go with approach B (JWT + Redis hybrid).
Now implement it following our existing patterns in src/auth/.
```
---
## Anti-Anchoring Prompts
LLMs can fixate on their first suggestion. These prompts combat that:
| Prompt Type | Template | Effect |
|-------------|----------|--------|
| **Fresh start** | "Ignore any prior ideas. Generate 4 novel approaches to [X]" | Forces diversity |
| **Reflection loop** | "Generate 3 options, then critique each, then recommend" | Self-correction (-25% anchoring bias) |
| **Quantified trade-offs** | "Rank by [metric1], [metric2], [metric3] with scores 1-10" | Objective comparison |
| **Devil's advocate** | "What are the strongest arguments against your recommendation?" | Surface hidden trade-offs |
| **Constraint variation** | "Now solve the same problem with [opposite constraint]" | Expand solution space |
### Example: Anti-Anchoring Prompt
```
I need pagination for a REST API with 1M+ records.
IMPORTANT: Don't suggest offset-based pagination first.
Generate 4 different pagination strategies, including at least one
unconventional approach. For each:
1. How it works (2-3 sentences)
2. Best use case
3. Worst use case
4. Performance at 1M records
Then recommend one, explaining why it beats the others for my use case.
```
### Reflection Loop Prompt
```
For implementing real-time notifications:
Phase 1: Generate 3 approaches (WebSockets, SSE, Long Polling)
Phase 2: For each, list 2 things that could go wrong in production
Phase 3: Based on Phase 2, which approach is most resilient?
Show your reasoning for each phase.
```
---
## When to Use
### Use Exploration
| Scenario | Why |
|----------|-----|
| Greenfield features | No existing pattern to follow |
| Architecture decisions | High impact, hard to reverse |
| Multiple valid approaches | Need informed choice |
| Unfamiliar domain | Don't know what you don't know |
| Team disagreement | Get neutral analysis of options |
### Skip Exploration
| Scenario | Why |
|----------|-----|
| Bug fixes | Solution usually obvious from symptoms |
| Single valid approach | No real choice to make |
| Time-critical hotfixes | Speed > perfection |
| Following existing pattern | Decision already made |
| Trivial changes | Overhead not worth it |
---
## Integration with Claude Code
### With /plan Mode
Exploration happens **before** `/plan`:
```
# Step 1: Explore (no /plan yet)
I need to add caching to the API. What are my options?
# Claude responds with 4 approaches
# Step 2: Choose
Let's go with approach C (edge caching with Cloudflare).
# Step 3: Plan
/plan
Implement edge caching using Cloudflare Workers.
Follow the patterns in our existing middleware.
```
### With CLAUDE.md
Add exploration triggers to your project instructions:
```markdown
## Workflow Preferences
### Before New Features
When implementing new features, first explore 3-4 approaches
with trade-offs before committing to implementation.
Use quantified comparison (1-10 scale) for:
- Performance
- Maintainability
- Time to implement
```
### With TodoWrite
Track exploration as a task:
```
TodoWrite:
- [x] Explore caching approaches (4 options analyzed)
- [x] Choose approach: edge caching with Cloudflare
- [ ] Implement cache invalidation
- [ ] Add cache headers to responses
```
---
## Anti-Patterns
### Premature Anchoring
```
# Wrong
"I want to use Redis for caching. How do I set it up?"
# Right
"I need caching for API responses. What are my options?"
```
You've eliminated 90% of solutions before exploring.
### Shallow Comparison
```
# Wrong
"Which is better: Redis or Memcached?"
# Right
"Compare Redis, Memcached, and in-process caching for:
- 50MB cache size
- Read-heavy workload (95% reads)
- Single server deployment
Include latency, memory overhead, and operational complexity."
```
### Ignoring Trade-offs
```
# Wrong
"Just tell me the best approach"
# Right
"What would I give up by choosing approach A over B?
What scenarios would make me regret this choice?"
```
### Analysis Paralysis
```
# Wrong
Exploring 10 options for a simple feature
# Right
Set a limit: "Give me your top 3 approaches, no more"
Make a decision and iterate if needed
```
---
## Example Session
### Initial Exploration
```
User: I need to implement real-time updates in our React dashboard.
The dashboard shows ~50 metrics that update every 5 seconds.
Currently we poll, but users complain about stale data.
Claude: Here are 4 approaches:
**A) WebSockets**
- Full duplex, server can push anytime
- Pros: Lowest latency, true real-time
- Cons: Connection management, scaling complexity
- Best for: Sub-second updates, bidirectional communication
**B) Server-Sent Events (SSE)**
- Unidirectional server → client stream
- Pros: Simple, HTTP-based, auto-reconnect
- Cons: One-way only, limited browser connections
- Best for: Dashboard feeds, notifications
**C) Long Polling**
- Client holds request until data ready
- Pros: Works everywhere, simple fallback
- Cons: Higher latency, more server load
- Best for: Legacy compatibility
**D) GraphQL Subscriptions**
- WebSocket-based with GraphQL schema
- Pros: Type-safe, selective updates
- Cons: Requires GraphQL infrastructure
- Best for: Existing GraphQL APIs
User: Rank these for our use case (React, 50 metrics, 5s refresh,
team has no WebSocket experience).
Claude: Scores (1-10, higher is better):
| Approach | Latency | Complexity | Team Fit | Scalability |
|----------|---------|------------|----------|-------------|
| WebSockets | 10 | 4 | 3 | 7 |
| SSE | 8 | 8 | 8 | 8 |
| Long Polling | 5 | 9 | 9 | 5 |
| GraphQL Subs | 9 | 3 | 2 | 7 |
**Recommendation: SSE**
Best balance of simplicity and real-time capability.
Your team can implement it in a day, and it handles your
50 metrics × 5s updates easily. Upgrade to WebSockets later
if you need bidirectional or sub-second updates.
User: Let's go with SSE. Now implement it.
```
---
## See Also
- [plan-driven.md](./plan-driven.md) — For codebase exploration (/plan mode)
- [iterative-refinement.md](./iterative-refinement.md) — Refine after choosing approach
- [../examples/semantic-anchors/anchor-catalog.md](../../examples/semantic-anchors/anchor-catalog.md) — Precision vocabulary for prompts
- [spec-first.md](./spec-first.md) — Define requirements before exploring

View file

@ -13,8 +13,10 @@ Prompt, observe, reprompt until satisfied. The core loop of effective AI-assiste
3. [Feedback Patterns](#feedback-patterns)
4. [Autonomous Loops](#autonomous-loops)
5. [Integration with Claude Code](#integration-with-claude-code)
6. [Anti-Patterns](#anti-patterns)
7. [See Also](#see-also)
6. [Script Generation Workflow](#script-generation-workflow)
7. [Iteration Strategies](#iteration-strategies)
8. [Anti-Patterns](#anti-patterns)
9. [See Also](#see-also)
---
@ -189,6 +191,79 @@ Good progress. Let's checkpoint:
---
## Script Generation Workflow
Script and automation generation delivers the highest ROI for iterative refinement—70-90% time savings in practitioner reports. Scripts are self-contained, testable in isolation, and yield immediate value.
### The 3-7 Iteration Pattern
Most production-ready scripts emerge after 3-7 iterations:
| Iteration | Focus | Prompt Pattern |
|-----------|-------|----------------|
| 1 | Basic functionality | "Create a script that [goal]" |
| 2-3 | Constraints + edge cases | "Add [constraint]. Handle [edge case]." |
| 4-5 | Hardening | "Add error handling, logging, input validation" |
| 6-7 | Polish | "Optimize for [metric]. Add usage docs." |
### Example: Kubernetes Pod Manager (PowerShell)
**Iteration 1 — Basic**
```
Create a PowerShell function to list pods in a Kubernetes namespace.
```
**Iteration 2 — Add filtering**
```
Add: filter by label selector and pod status.
Show: pod name, status, age, restarts.
```
**Iteration 3 — Add actions**
```
Add: ability to delete pods matching filter.
Require: confirmation before deletion.
```
**Iteration 4 — Error handling**
```
Handle: kubectl not found, invalid namespace, permission denied.
Add: verbose logging with -Verbose flag.
```
**Iteration 5 — Production ready**
```
Add: dry-run mode, output to JSON for piping, help documentation.
Ensure: works on Windows, Linux, macOS.
```
### Common Pitfalls
| Pitfall | Example | Mitigation |
|---------|---------|------------|
| Hallucinated commands | `apt-get` on macOS | Specify OS: "Ubuntu 22.04 only" |
| Security gaps | No input validation | Always request: "validate all user inputs" |
| Over-engineering | Adds unnecessary libs | Request: "minimal dependencies, stdlib preferred" |
| Context drift | Forgets requirements after iteration 5 | Checkpoint prompt: "Recap current requirements before next change" |
| Platform assumptions | Assumes bash features in sh | Specify: "POSIX-compliant" or "bash 4+" |
### Script Iteration Template
```
Current script: [paste or reference]
Iteration goal: [specific improvement]
Constraints:
- Must preserve: [existing behavior to keep]
- Must not: [things to avoid]
- Target environment: [OS, shell, runtime]
Success criteria: [how to verify this iteration works]
```
---
## Iteration Strategies
### Breadth-First
@ -307,6 +382,7 @@ Perfect. Commit this as "feat: add debounce utility with full TypeScript support
## See Also
- [exploration-workflow.md](./exploration-workflow.md) — Explore alternatives before iterating
- [tdd-with-claude.md](./tdd-with-claude.md) — TDD is iterative refinement with tests
- [plan-driven.md](./plan-driven.md) — Plan before iterating
- [../methodologies.md](../methodologies.md) — Iterative Loops methodology

View file

@ -244,6 +244,7 @@ Plans in `.claude/plans/` serve as decision documentation:
## See Also
- [exploration-workflow.md](./exploration-workflow.md) — Explore alternatives before planning
- [../ultimate-guide.md](../ultimate-guide.md) — Section 2.3 Plan Mode
- [tdd-with-claude.md](./tdd-with-claude.md) — Combine with TDD
- [spec-first.md](./spec-first.md) — Combine with Spec-First