claude-code-ultimate-guide/guide/methodologies.md
Florian BRUNIAUX 444ce5aa6a docs: add resource evaluation process documentation
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-26 15:10:24 +01:00


# Development Methodologies Reference
> **Confidence**: Tier 2 — Validated by multiple production reports and official documentation.
>
> **Last updated**: January 2026
This is a quick reference for 15 structured development methodologies that have emerged for AI-assisted development in 2025-2026. For hands-on practical workflows, see [workflows/](./workflows/).
---
## Table of Contents
1. [Decision Tree](#decision-tree-what-do-you-need)
2. [The 15 Methodologies](#the-15-methodologies)
3. [SDD Tools Reference](#sdd-tools-reference)
4. [Writing Effective Specs](#writing-effective-specs)
5. [Combination Patterns](#combination-patterns)
6. [Sources](#sources)
---
## Decision Tree: What Do You Need?
```
┌─ "I want quality code" ────────────→ workflows/tdd-with-claude.md
├─ "I want to spec before code" ─────→ workflows/spec-first.md
├─ "I need to plan architecture" ────→ workflows/plan-driven.md
├─ "I'm iterating on something" ─────→ workflows/iterative-refinement.md
└─ "I need methodology theory" ──────→ Continue reading below
```
---
## The 15 Methodologies
Organized in a 6-tier pyramid from strategic orchestration down to optimization techniques.
### Tier 1: Strategic Orchestration
| Name | What | Best For | Claude Fit |
|------|------|----------|------------|
| **BMAD** | Multi-agent governance with constitution as guardrail | Enterprise 10+ teams, long-term projects | ⭐⭐ Niche but powerful |
| **GSD** | Meta-prompting 6-phase workflow with fresh contexts per task | Solo devs, Claude Code CLI | ⭐⭐ Similar to patterns in guide |
**BMAD (Breakthrough Method for Agile AI-Driven Development)** inverts the traditional paradigm: documentation becomes the source of truth, not code. It uses specialized agents (Analyst, PM, Architect, Developer, QA) orchestrated with strict governance.
- **Key concept**: Constitution.md as strategic guardrail
- **When to use**: Complex enterprise projects needing governance
- **When to avoid**: Small teams, MVPs, rapid prototyping
**GSD (Get Shit Done)** addresses context rot through a systematic 6-phase workflow (Initialize → Discuss → Plan → Execute → Verify → Complete) with a fresh 200k-token context per task. Its core concepts (multi-agent orchestration, fresh context management) overlap significantly with existing patterns like Ralph Loop, Gas Town, and BMAD. See [resource evaluation](../docs/resource-evaluations/gsd-evaluation.md) for a detailed comparison.
> **Emerging**: [Ralph Inferno](https://github.com/sandstream/ralph-inferno) implements autonomous multi-persona workflows (Analyst→PM→UX→Architect→Business) with VM-based execution and self-correcting E2E loops. Experimental but interesting for "vibe coding at scale".
---
### Foundational Discipline: Plan-First Workflow
> **"Once the plan is good, the code is good."**
> — Boris Cherny, creator of Claude Code
**Not just a feature (`/plan` command) — a systematic discipline.**
**The Mental Model**:
Planning isn't optional for complex tasks. It's the difference between:
- ❌ 8 iterations of "try → fix → retry → fix again"
- ✅ 1 iteration of "plan → validate → execute cleanly"
**When to plan first**:
| Task Complexity | Plan First? | Why |
|----------------|-------------|-----|
| >3 files modified | ✅ Yes | Cross-file dependencies need architecture |
| >50 lines changed | ✅ Yes | Enough complexity for mistakes |
| Architectural changes | ✅ Yes | Impact analysis required |
| Unfamiliar codebase | ✅ Yes | Need exploration before action |
| Typo/obvious fix | ❌ No | Planning overhead > task time |
| Single-line change | ❌ No | Just do it |
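
The triggers in the table above can be sketched as a small predicate. This is only an illustration: the `Task` fields and the function name are hypothetical, and the thresholds simply restate the table rather than any built-in Claude Code behavior.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical description of a change; field names are illustrative."""
    files_modified: int
    lines_changed: int
    architectural: bool = False
    unfamiliar_codebase: bool = False


def should_plan_first(task: Task) -> bool:
    """Return True when any trigger from the table applies."""
    return (
        task.files_modified > 3
        or task.lines_changed > 50
        or task.architectural
        or task.unfamiliar_codebase
    )


typo_fix = Task(files_modified=1, lines_changed=1)
refactor = Task(files_modified=5, lines_changed=120)
print(should_plan_first(typo_fix))  # False
print(should_plan_first(refactor))  # True
```
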
**How plan-first works**:
1. **Exploration phase** (`/plan` mode):
   - Claude reads files, explores architecture
   - No edits allowed → forces thinking before action
   - Proposes approach with trade-offs
2. **Validation phase** (you review):
   - Plan exposes assumptions and gaps
   - Easier to correct direction now vs after 100 lines written
   - Plan becomes contract for execution
3. **Execution phase** (`/execute`):
   - Plan → code becomes mechanical translation
   - Fewer surprises, cleaner implementation
   - Faster overall despite "slower" start
**Boris Cherny workflow**:
> "I run many sessions, start in plan mode, then switch into execution once the plan looks right. The signature upgrade is verification—giving Claude a way to test and confirm its own output."
**Benefits over "just start coding"**:
- **Fewer correction iterations**: Plan catches issues before they become code
- **Better architecture**: Forced to think about structure first
- **Clearer communication**: Plan is shared understanding with team/Claude
- **Reduced cost**: One clean iteration < multiple messy iterations (even if plan phase costs tokens)
**Integration with CLAUDE.md**:
Document your team's plan-first triggers:
```markdown
## Planning Policy
- ALWAYS plan first: API changes, database migrations, new features
- OPTIONAL planning: Bug fixes <10 lines, test additions
- NEVER skip: Changes affecting >2 modules
```
**See also**: [Plan Mode documentation](./ultimate-guide.md#23-plan-mode) for `/plan` command usage.
---
### Tier 2: Specification & Architecture
| Name | What | Best For | Claude Fit |
|------|------|----------|------------|
| **SDD** | Specs before code | APIs, contracts | ⭐⭐⭐ Core pattern |
| **Doc-Driven** | Docs = source of truth | Cross-team alignment | ⭐⭐⭐ CLAUDE.md native |
| **Req-Driven** | Rich artifact context (20+ artifacts) | Complex requirements | ⭐⭐ Heavy setup |
| **DDD** | Domain language first | Business logic | ⭐⭐ Design-time |
**SDD (Spec-Driven Development)** puts specifications BEFORE code. One well-structured iteration equals 8 unstructured ones. CLAUDE.md IS your spec file.
**Doc-Driven Development** makes living documentation, versioned in git, the single source of truth. Changes to specs trigger implementation.
**Requirements-Driven Development** uses CLAUDE.md as a comprehensive implementation guide with 20+ structured artifacts.
**DDD (Domain-Driven Design)** aligns software with business language through:
- Ubiquitous Language: Shared vocabulary in code
- Bounded Contexts: Isolated domain boundaries
- Domain Distillation: Core vs Support vs Generic domains
---
### Tier 3: Behavior & Acceptance
| Name | What | Best For | Claude Fit |
|------|------|----------|------------|
| **BDD** | Given-When-Then scenarios | Stakeholder collaboration | ⭐⭐⭐ Tests & specs |
| **ATDD** | Acceptance criteria first | Compliance, regulated | ⭐⭐ Process-heavy |
| **CDD** | API contracts as interface | Microservices | ⭐⭐⭐ OpenAPI native |
**BDD (Behavior-Driven Development)** goes beyond testing: it is a collaboration process.
1. Discovery: Involve devs and business experts
2. Formulation: Write Given-When-Then examples
3. Automation: Convert to executable tests (Gherkin/Cucumber)
```gherkin
Feature: Order Management

  Scenario: Cannot buy without stock
    Given product with 0 stock
    When customer attempts purchase
    Then system refuses with error message
```
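
The automation step can be sketched without a BDD framework by mirroring each Given/When/Then line in a plain Python test. The `Product` class, `OutOfStockError`, and the error message below are hypothetical names invented for this illustration; a real project would more likely bind the Gherkin steps through a Cucumber-style runner.

```python
class OutOfStockError(Exception):
    """Raised when a purchase is attempted without enough stock."""


class Product:
    """Hypothetical domain object used only for this illustration."""

    def __init__(self, name: str, stock: int) -> None:
        self.name = name
        self.stock = stock

    def purchase(self, quantity: int = 1) -> None:
        if quantity > self.stock:
            raise OutOfStockError(f"{self.name} is out of stock")
        self.stock -= quantity


def test_cannot_buy_without_stock() -> None:
    # Given product with 0 stock
    product = Product("widget", stock=0)
    # When customer attempts purchase
    try:
        product.purchase()
        error_message = None
    except OutOfStockError as exc:
        error_message = str(exc)
    # Then system refuses with error message
    assert error_message == "widget is out of stock"
    assert product.stock == 0


test_cannot_buy_without_stock()
```
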
**ATDD (Acceptance Test-Driven Development)** defines acceptance criteria BEFORE coding, collaboratively, through the "Three Amigos" (Business, Dev, Test).
**CDD (Contract-Driven Development)** treats API contracts (OpenAPI specs) as the executable interface between teams. Patterns: Contract as Test, Contract as Stub.
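
As a rough sketch of the "Contract as Test" idea, the snippet below checks a response body against a hand-written contract. It is deliberately simplified: `ORDER_CONTRACT` and `contract_violations` are hypothetical, and a real setup would derive the checks from an OpenAPI document with a dedicated tool rather than from a Python dict.

```python
# Hypothetical, hand-written contract: field name -> expected Python type.
ORDER_CONTRACT = {"id": int, "status": str, "total": float}


def contract_violations(response: dict, contract: dict) -> list[str]:
    """List every way a response body deviates from the contract."""
    problems = []
    for field, expected_type in contract.items():
        if field not in response:
            problems.append(f"missing field: {field}")
        elif not isinstance(response[field], expected_type):
            problems.append(
                f"wrong type for {field}: expected {expected_type.__name__}"
            )
    return problems


good = {"id": 1, "status": "paid", "total": 19.99}
bad = {"id": "1", "status": "paid"}
print(contract_violations(good, ORDER_CONTRACT))  # []
print(contract_violations(bad, ORDER_CONTRACT))
```
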
---
### Tier 4: Feature Delivery
| Name | What | Best For | Claude Fit |
|------|------|----------|------------|
| **FDD** | Feature-by-feature delivery | Large teams 10+ | ⭐⭐ Structure |
| **Context Eng.** | Context as first-class design | Long sessions | ⭐⭐⭐ Fundamental |
**FDD (Feature-Driven Development)** follows five processes:
1. Develop Overall Model
2. Build Features List
3. Plan by Feature
4. Design by Feature
5. Build by Feature
Strict iteration: 2 weeks max per feature.
**Context Engineering** treats context as a design element:
- Progressive Disclosure: Let agent discover incrementally
- Memory Management: Conversation vs persistent memory
- Dynamic Refresh: Rewrite TODO list before response
---
### Tier 5: Implementation
| Name | What | Best For | Claude Fit |
|------|------|----------|------------|
| **TDD** | Red-Green-Refactor | Quality code | ⭐⭐⭐ Core workflow |
| **Eval-Driven** | Evals for LLM outputs | AI products | ⭐⭐⭐ Agents |
| **Multi-Agent** | Orchestrate sub-agents | Complex tasks | ⭐⭐⭐ Task tool |
**TDD (Test-Driven Development)** follows the classic cycle:
1. **Red**: Write failing test
2. **Green**: Minimal code to pass
3. **Refactor**: Clean up, tests stay green
With Claude, be explicit: "Write FAILING tests for functionality that doesn't exist yet."
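
A minimal illustration of the cycle, using a hypothetical `slugify` helper that is not part of the guide: the two tests are written first (Red), the one-line implementation is the least code that satisfies them (Green), and any later cleanup must keep both passing (Refactor).

```python
def slugify(title: str) -> str:
    """Green step: the minimal implementation that satisfies the tests below."""
    return "-".join(title.lower().split())


# Red step: these tests are written first and fail until slugify exists.
def test_lowercases_and_joins_words() -> None:
    assert slugify("Hello World") == "hello-world"


def test_collapses_extra_whitespace() -> None:
    assert slugify("  Plan   First  ") == "plan-first"


# Refactor step: rerun the same tests after every cleanup.
for test in (test_lowercases_and_joins_words, test_collapses_extra_whitespace):
    test()
print("all tests pass")
```
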
> **Verification Loops** — A formalized pattern for autonomous iteration (broader than TDD):
>
> **Core principle**: Give Claude a mechanism to verify its own output.
>
> ```
> Code generated → Verification tool → Feedback loop → Improvement
> ```
>
> **Why it works** (Boris Cherny): *"An agent that can 'see' what it has done produces better results."*
>
> **Verification mechanisms by domain**:
>
> | Domain | Verification Tool | What Claude "Sees" |
> |--------|-------------------|-------------------|
> | **Frontend** | Browser preview (live reload) | Visual rendering, layout, interactions |
> | **Backend** | Tests (unit/integration) | Pass/fail status, error messages |
> | **Types** | TypeScript compiler | Type errors, incompatibilities |
> | **Style** | Linters (ESLint, Prettier) | Style violations, formatting issues |
> | **Performance** | Profilers, benchmarks | Execution time, memory usage |
> | **Accessibility** | axe-core, screen readers | WCAG violations, navigation issues |
> | **Security** | Static analyzers (Semgrep) | Vulnerability patterns |
> | **UX** | User testing, recordings | Usability problems, confusion points |
>
> **TDD as canonical example**:
> 1. Claude writes tests for the feature
> 2. Claude iterates code until tests pass
> 3. Continue until explicit completion criteria met
>
> **Official guidance**: *"Tell Claude to keep going until all tests pass. It will usually take a few iterations."* — [Anthropic Best Practices](https://www.anthropic.com/engineering/claude-code-best-practices)
>
> **Implementation patterns**:
> - **Hooks**: PostToolUse hook runs verification after each edit
> - **Browser extension**: Claude in Chrome sees rendered output
> - **Test watchers**: Jest/Vitest watch mode provides instant feedback
> - **CI/CD gates**: GitHub Actions runs full validation suite
> - **Multi-Claude verification**: One Claude codes, another reviews
>
> **Anti-pattern**: Blind iteration without feedback. Without verification mechanism, Claude can't converge toward correct solution—it guesses.
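
One way to picture a verification mechanism is a helper that runs a check and returns both a pass/fail flag and the raw output the agent would "see". The sketch below assumes a command-line verifier; `Verification` and `run_verifier` are hypothetical names, and the failing command is a stand-in for a real test runner, linter, or type checker.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class Verification:
    passed: bool
    feedback: str  # the text fed back to the agent for the next iteration


def run_verifier(command: list[str]) -> Verification:
    """Run a verification command and capture its exit status and output."""
    result = subprocess.run(command, capture_output=True, text=True)
    return Verification(
        passed=result.returncode == 0,
        feedback=(result.stdout + result.stderr).strip(),
    )


# Stand-in for a real test runner, linter, or type checker.
failing_check = [
    sys.executable,
    "-c",
    "import sys; print('1 test failed'); sys.exit(1)",
]
outcome = run_verifier(failing_check)
print(outcome.passed)    # False
print(outcome.feedback)  # 1 test failed
```
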
**Eval-Driven Development** applies TDD to LLMs by testing agent behaviors via evals:
- Code-based: `output == golden_answer`
- LLM-based: Another Claude evaluates
- Human grading: The reference standard, but slow
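
A code-based grader can be sketched in a few lines. Everything here is illustrative: `toy_agent` stands in for a real model call, and the normalization in `exact_match` (trimming and lowercasing) is an assumption that a real eval may not want.

```python
from typing import Callable


def exact_match(output: str, golden_answer: str) -> bool:
    """Code-based grader: normalize whitespace and case, then compare."""
    return output.strip().lower() == golden_answer.strip().lower()


def run_evals(agent: Callable[[str], str], cases: list[tuple[str, str]]) -> float:
    """Return the fraction of cases whose output matches the golden answer."""
    passed = sum(exact_match(agent(prompt), golden) for prompt, golden in cases)
    return passed / len(cases)


def toy_agent(prompt: str) -> str:
    # Hypothetical stand-in for a model call.
    answers = {"2 + 2": "4", "capital of France": "paris"}
    return answers.get(prompt, "unknown")


cases = [("2 + 2", "4"), ("capital of France", "Paris"), ("3 * 3", "9")]
score = run_evals(toy_agent, cases)
print(f"pass rate: {score:.2f}")
```
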
> **Eval Harness** — The infrastructure that runs evaluations end-to-end: providing instructions and tools, running tasks concurrently, recording steps, grading outputs, and aggregating results.
>
> See Anthropic's comprehensive guide: [Demystifying Evals for AI Agents](https://www.anthropic.com/engineering/demystifying-evals-for-ai-agents)
**Multi-Agent Orchestration** moves from a single assistant to an orchestrated team:
```
Meta-Agent (Orchestrator)
├── Analyst (requirements)
├── Architect (design)
├── Developer (code)
└── Reviewer (validation)
```
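
The hierarchy above can be sketched as a sequential pipeline in which the orchestrator passes each role's output to the next role. The role functions below are hypothetical stand-ins that only format strings; in practice each would be a sub-agent call, and an orchestrator may also run roles in parallel or loop back after review.

```python
from typing import Callable

Agent = Callable[[str], str]


# Hypothetical stand-ins for sub-agents; each would normally call a model.
def analyst(task: str) -> str:
    return f"requirements for: {task}"


def architect(requirements: str) -> str:
    return f"design based on [{requirements}]"


def developer(design: str) -> str:
    return f"code implementing [{design}]"


def reviewer(code: str) -> str:
    return f"approved: {code}"


def orchestrate(task: str, pipeline: list[tuple[str, Agent]]) -> dict[str, str]:
    """Pass each role's output to the next role and record every artifact."""
    artifacts: dict[str, str] = {}
    current = task
    for role, agent in pipeline:
        current = agent(current)
        artifacts[role] = current
    return artifacts


PIPELINE = [
    ("Analyst", analyst),
    ("Architect", architect),
    ("Developer", developer),
    ("Reviewer", reviewer),
]
result = orchestrate("add CSV export", PIPELINE)
print(result["Reviewer"])
```
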
---
### Tier 6: Optimization
| Name | What | Best For | Claude Fit |
|------|------|----------|------------|
| **Iterative Loops** | Autonomous refinement | Optimization | ⭐⭐⭐ Core |
| **Fresh Context** | Reset per task, state in files | Long autonomous sessions | ⭐⭐⭐ Power users |
| **Prompt Engineering** | Technique foundation | Everything | ⭐⭐⭐ Prerequisite |
**Iterative Refinement Loops** converge autonomously:
1. Execute prompt
2. Observe result
3. If result ≠ "DONE", refine and repeat
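
The three steps can be sketched as a bounded loop. The `max_iterations` cap is an assumption added here so the loop always terminates, and `toy_execute` and `toy_refine` are hypothetical stand-ins for a model call and a prompt-refinement step.

```python
from typing import Callable


def refine_until_done(
    execute: Callable[[str], str],
    refine: Callable[[str, str], str],
    prompt: str,
    max_iterations: int = 5,
) -> tuple[str, int]:
    """Run execute, observe, refine until the result is "DONE" or the budget runs out."""
    result = ""
    for iteration in range(1, max_iterations + 1):
        result = execute(prompt)         # 1. Execute prompt
        if result == "DONE":             # 2. Observe result
            return result, iteration
        prompt = refine(prompt, result)  # 3. Not done yet: refine and repeat
    return result, max_iterations


# Toy stand-ins: the "model" reports DONE once the prompt carries two hints.
def toy_execute(prompt: str) -> str:
    return "DONE" if prompt.count("+hint") >= 2 else "needs work"


def toy_refine(prompt: str, result: str) -> str:
    return prompt + " +hint"


print(refine_until_done(toy_execute, toy_refine, "optimize query"))  # ('DONE', 3)
```
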
**Prompt Engineering** provides the foundations for ALL Claude usage:
- Zero-Shot Chain of Thought: "Think step by step"
- Few-Shot Learning: 2-3 examples of expected pattern
- Structured Prompts: XML tags for organization
- Position Matters: For long docs, place question at end
**Fresh Context Pattern (Ralph Loop)** solves context rot by spawning fresh agent instances per task. State persists in git + progress files, not chat history. Ideal for long autonomous sessions (migrations, overnight runs). See [Ultimate Guide - Fresh Context Pattern](./ultimate-guide.md#fresh-context-pattern-ralph-loop) for implementation.
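
To make the "state in files, not chat history" idea concrete, here is a minimal sketch that keeps progress in a JSON file and rereads it before every task. It omits the git side entirely, and `run_task_in_fresh_context` is a hypothetical placeholder for spawning a new agent instance; the file name and its structure are also assumptions.

```python
import json
import tempfile
from pathlib import Path


def load_progress(path: Path) -> dict:
    """Read persisted state; a first run starts with no completed tasks."""
    if path.exists():
        return json.loads(path.read_text())
    return {"completed": []}


def run_task_in_fresh_context(task: str) -> str:
    # Hypothetical stand-in for spawning a new agent with an empty context.
    return f"result of {task}"


def run_all(tasks: list[str], progress_file: Path) -> dict:
    for task in tasks:
        progress = load_progress(progress_file)  # state comes from disk
        if task in progress["completed"]:
            continue  # finished in an earlier run, skip it
        run_task_in_fresh_context(task)
        progress["completed"].append(task)
        progress_file.write_text(json.dumps(progress, indent=2))
    return load_progress(progress_file)


with tempfile.TemporaryDirectory() as workdir:
    progress_path = Path(workdir) / "progress.json"
    final_state = run_all(["migrate users", "migrate orders"], progress_path)
    print(final_state["completed"])  # ['migrate users', 'migrate orders']
```
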
---
## SDD Tools Reference
Three tools have emerged to formalize Spec-Driven Development:
| Tool | Use Case | Official Docs | Claude Integration |
|------|----------|---------------|-------------------|
| **Spec Kit** | Greenfield, governance | [github.blog/spec-kit](https://github.blog/ai-and-ml/generative-ai/spec-driven-development-with-ai-get-started-with-a-new-open-source-toolkit/) | `/speckit.constitution`, `/speckit.specify`, `/speckit.plan` |
| **OpenSpec** | Brownfield, changes | [github.com/Fission-AI/OpenSpec](https://github.com/Fission-AI/OpenSpec) | `/openspec:proposal`, `/openspec:apply`, `/openspec:archive` |
| **Specmatic** | API contract testing | [specmatic.io](https://specmatic.io) | MCP agent available |
### Spec Kit (Greenfield)
5-phase workflow:
1. Constitution: `/speckit.constitution` guardrails
2. Specify: `/speckit.specify` requirements
3. Plan: `/speckit.plan` architecture
4. Tasks: `/speckit.tasks` decomposition
5. Implement: `/speckit.implement` code
### OpenSpec (Brownfield)
Two-folder architecture:
```
openspec/
├── specs/ ← Current truth (stable)
└── changes/ ← Proposals (temporary)
```
Workflow: Proposal → Review → Apply → Archive
### Specmatic (API Contracts)
- **Contract as Test**: Auto-generates thousands of tests from an OpenAPI spec
- **Contract as Stub**: Mock server for parallel development
- **Backward Compatibility**: Detects breaking changes
---
## Writing Effective Specs
> Based on analysis of 2,500+ agent configuration files.
> Source: [Addy Osmani](https://addyosmani.com/blog/good-spec/)
### The Six Essential Components
| Component | What to Include | Example |
|-----------|-----------------|---------|
| **Commands** | Executable with flags | `npm test -- --coverage` |
| **Testing** | Framework, coverage, locations | `vitest, 80%, tests/` |
| **Project structure** | Explicit directories | `src/`, `lib/`, `tests/` |
| **Code style** | One example > paragraphs | Show a real function |
| **Git workflow** | Branch, commit, PR format | `feat/name`, conventional commits |
| **Boundaries** | Permission tiers | See below |
### Permission Tiers
| Tier | Symbol | Use For |
|------|--------|---------|
| Always do | ✅ | Safe actions, no approval (lint, format) |
| Ask first | ⚠️ | High-impact changes (delete, publish) |
| Never do | 🚫 | Hard stops (commit secrets, force push main) |
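
The three tiers map naturally onto a lookup table. The sketch below is illustrative only: the action names mirror the examples in the table, while `Tier`, `POLICY`, and the conservative default for unknown actions are assumptions, not a Claude Code permission format.

```python
from enum import Enum


class Tier(Enum):
    ALWAYS = "always do"
    ASK_FIRST = "ask first"
    NEVER = "never do"


# Hypothetical policy; the action names mirror the examples in the table.
POLICY = {
    "lint": Tier.ALWAYS,
    "format": Tier.ALWAYS,
    "delete": Tier.ASK_FIRST,
    "publish": Tier.ASK_FIRST,
    "commit secrets": Tier.NEVER,
    "force push main": Tier.NEVER,
}


def tier_for(action: str) -> Tier:
    """Unknown actions default to ASK_FIRST, a conservative assumption."""
    return POLICY.get(action, Tier.ASK_FIRST)


print(tier_for("lint").value)             # always do
print(tier_for("force push main").value)  # never do
print(tier_for("rename file").value)      # ask first
```
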
### Curse of Instructions
> ⚠️ Research shows **more instructions = worse adherence** to each one.
>
> Solution: Feed only relevant spec sections per task, not the entire document.
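
Feeding only the relevant sections can be sketched as a small filter over a markdown spec. The helper names and the sample `SPEC` are hypothetical, and the sketch assumes sections are delimited by level-2 (`## `) headings.

```python
def split_sections(spec: str) -> dict[str, str]:
    """Split a markdown spec into {heading: body} using level-2 headings."""
    sections: dict[str, str] = {}
    current = None
    for line in spec.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = ""
        elif current is not None:
            sections[current] += line + "\n"
    return sections


def relevant_context(spec: str, wanted: list[str]) -> str:
    """Rebuild a smaller spec containing only the requested sections."""
    sections = split_sections(spec)
    return "\n".join(
        f"## {name}\n{sections[name]}" for name in wanted if name in sections
    )


SPEC = """## Commands
npm test -- --coverage
## Git workflow
Branch names: feat/name
## Boundaries
Never commit secrets
"""

context = relevant_context(SPEC, ["Commands", "Boundaries"])
print(context)
```
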
### Monolithic vs Modular Specs
| Project Size | Approach |
|--------------|----------|
| Small (<10 files) | Single spec file |
| Medium (10-50 files) | Sectioned spec, feed per task |
| Large (50+ files) | Sub-agent routing by domain |
---
## Combination Patterns
Recommended stacks by situation:
| Situation | Recommended Stack | Notes |
|-----------|-------------------|-------|
| Solo MVP | SDD + TDD | Minimal overhead, quality focus |
| Team 5-10, greenfield | Spec Kit + TDD + BDD | Governance + quality + collaboration |
| Microservices | CDD + Specmatic | Contract-first, parallel dev |
| Existing SaaS (100+ features) | OpenSpec + BDD | Change tracking, no spec drift |
| Enterprise 10+ | BMAD + Spec Kit + Specmatic | Full governance + contracts |
| LLM-native product | Eval-Driven + Multi-Agent | Self-improving systems |
---
## Quick Reference Table
| Methodology | Level | Primary Focus | Team Size | Learning Curve |
|-------------|-------|---------------|-----------|----------------|
| BMAD | Orchestration | Governance | 10+ | High |
| SDD | Specification | Contracts | Any | Medium |
| Doc-Driven | Specification | Alignment | Any | Low |
| Req-Driven | Specification | Context | 5+ | Medium |
| DDD | Specification | Domain | 5+ | Very High |
| BDD | Behavior | Collaboration | 5+ | Medium |
| ATDD | Behavior | Compliance | 5+ | Medium |
| CDD | Behavior | APIs | 5+ | Medium |
| FDD | Delivery | Features | 10+ | Medium |
| Context Eng. | Delivery | AI sessions | Any | Low |
| TDD | Implementation | Quality | Any | Low |
| Eval-Driven | Implementation | AI outputs | Any | Medium |
| Multi-Agent | Implementation | Complexity | Any | Medium |
| Iterative | Optimization | Refinement | Any | Low |
| Prompt Eng. | Optimization | Foundation | Any | Very Low |
---
## Sources
### Official Documentation (Tier 1)
- Anthropic: [Claude Code Best Practices](https://www.anthropic.com/engineering/claude-code-best-practices)
- Anthropic: [Effective Context Engineering for AI Agents](https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents)
- Anthropic: [Demystifying Evals for AI Agents](https://www.anthropic.com/engineering/demystifying-evals-for-ai-agents)
- GitHub: [Spec-Driven Development Toolkit](https://github.blog/ai-and-ml/generative-ai/spec-driven-development-with-ai-get-started-with-a-new-open-source-toolkit/)
- Microsoft: [Spec-Driven Development with Spec Kit](https://developer.microsoft.com/blog/spec-driven-development-spec-kit)
### Methodology References (Tier 2)
**SDD & Spec-First**
- Addy Osmani: [How to Write Good Specs for AI Agents](https://addyosmani.com/blog/good-spec/)
- Addy Osmani: [My AI Coding Workflow in 2026](https://addyosmani.com/blog/ai-coding-workflow/): End-to-end workflow covering spec-first, context packing, TDD, git checkpoints
- Martin Fowler: [SDD Tools Analysis](https://martinfowler.com/articles/exploring-gen-ai/sdd-3-tools.html)
- InfoQ: [Spec-Driven Development](https://www.infoq.com/articles/spec-driven-development/)
- Kinde: [Beyond TDD - Why SDD is the Next Step](https://kinde.com/learn/ai-for-software-engineering/best-practice/beyond-tdd-why-spec-driven-development-is-the-next-step/)
- Tessl.io: [Spec-Driven Dev with Claude Code](https://tessl.io/blog/spec-driven-dev-with-claude-code/)
**BMAD**
- GMO Recruit: [The BMAD Method](https://recruit.group.gmo/engineer/jisedai/blog/the-bmad-method-a-framework-for-spec-oriented-ai-driven-development/)
- Benny Cheung: [BMAD - Reclaiming Control in AI Dev](https://bennycheung.github.io/bmad-reclaiming-control-in-ai-dev)
- GitHub: [BMAD-AT-CLAUDE](https://github.com/24601/BMAD-AT-CLAUDE)
**TDD with AI**
- Steve Kinney: [TDD with Claude](https://stevekinney.com/courses/ai-development/test-driven-development-with-claude)
- Nathan Fox: [Taming GenAI Agents](https://www.nathanfox.net/p/taming-genai-agents-like-claude-code)
- Alex Op: [Custom TDD Workflow Claude Code](https://alexop.dev/posts/custom-tdd-workflow-claude-code-vue/)
**BDD & DDD**
- Alex Soyes: [BDD Behavior-Driven Development](https://alexsoyes.com/bdd-behavior-driven-development/)
- Alex Soyes: [DDD Domain-Driven Design](https://alexsoyes.com/ddd-domain-driven-design/)
- Inflectra: [Behavior-Driven Development](https://www.inflectra.com/Ideas/Topic/Behavior-Driven-Development.aspx)
**Context Engineering**
- Intuition Labs: [What is Context Engineering](https://intuitionlabs.ai/articles/what-is-context-engineering)
- Manus.im: [Context Engineering for AI Agents](https://manus.im/blog/Context-Engineering-for-AI-Agents-Lessons-from-Building-Manus)
**Eval-Driven & Multi-Agent**
- Fireworks AI: [Eval-Driven Development with Claude Code](https://fireworks.ai/blog/eval-driven-development-with-claude-code)
- Brandon Casci: [Transform into a Dev Team using Claude Code Agents](https://www.brandoncasci.com/2025/09/21/how-to-transform-yourself-into-a-dev-team-using-claude-codes-ai-agents.html)
- The Unwind AI: [Claude Code's Multi-Agent Orchestration](https://www.theunwindai.com/p/claude-code-s-hidden-multi-agent-orchestration-now-open-source)
### Tools Documentation (Tier 1)
- OpenSpec: [github.com/Fission-AI/OpenSpec](https://github.com/Fission-AI/OpenSpec)
- Spec Kit: [github.com/github/spec-kit](https://github.com/github/spec-kit)
- Specmatic: [specmatic.io](https://specmatic.io)
- Specmatic Article: [Spec-Driven Development with GitHub Spec Kit and Specmatic MCP](https://specmatic.io/article/spec-driven-development-api-design-first-with-github-spec-kit-and-specmatic-mcp/)
### Additional References
- Talent500: [Claude Code TDD Guide](https://talent500.com/blog/claude-code-test-driven-development-guide/)
- Testlio: [Acceptance Test-Driven Development](https://testlio.com/blog/what-is-acceptance-test-driven-development/)
- Monday.com: [Feature-Driven Development](https://monday.com/blog/rnd/feature-driven-development-fdd/)
- Paddo.dev: [Ralph Wiggum Autonomous Loops](https://paddo.dev/blog/ralph-wiggum-autonomous-loops/)
- Walturn: [Prompt Engineering for Claude](https://www.walturn.com/insights/mastering-prompt-engineering-for-claude)
- AWS: [Prompt Engineering with Claude on Bedrock](https://aws.amazon.com/blogs/machine-learning/prompt-engineering-techniques-and-best-practices-learn-by-doing-with-anthropics-claude-3-on-amazon-bedrock/)
---
## See Also
- [workflows/tdd-with-claude.md](./workflows/tdd-with-claude.md): Practical TDD guide
- [workflows/spec-first.md](./workflows/spec-first.md): Spec-first development
- [workflows/plan-driven.md](./workflows/plan-driven.md): Using /plan mode
- [workflows/iterative-refinement.md](./workflows/iterative-refinement.md): Refinement loops
- [ultimate-guide.md#912](./ultimate-guide.md): Section 9.12 summary