feat(v3.32.0): Plan-Validate-Execute Pipeline — 3-command AI-first workflow

New workflow for production teams: dynamic agent teams, ADR learning loop,
automated execution from PRD to merged PR.

Added:
- guide/workflows/plan-pipeline.md — complete workflow guide (philosophy,
  non-prescriptive AI-first, No Bandaids first principles, ADR learning loop,
  CLAUDE.md 120-line discipline, /clear context reset, cost profile)
- examples/commands/plan-start.md — 5-phase planning with 12-agent dynamic
  pool (trigger-based selection, Tier 0 Solo → Tier 4 Full Spectrum,
  planning-coordinator synthesis, auto-transition to validate)
- examples/commands/plan-validate.md — 2-layer validation (structural inline +
  8 specialist agents), ADR-aware auto-fix (Bucket A ~95% auto-resolve,
  Bucket B human input → new rule), issue persistence in metrics JSON
- examples/commands/plan-execute.md — worktree → TDD scaffold → level-based
  parallel agents → drift detection → quality gate → smoke test → PR squash
  merge → post-merge metrics → cleanup
- examples/agents/planning-coordinator.md — Opus synthesis agent: merges
  multi-agent reports into coherent task graph, resolves conflicts via ADR
  precedence, verifies plan completeness before output
- examples/agents/integration-reviewer.md — Opus runtime validator: connection
  params, async/sync consistency, env var completeness, library API
  correctness (WebFetch), OTEL pipeline validation

Updated:
- machine-readable/reference.yaml — 16 new indexed keys
- CHANGELOG.md — v3.32.0 entry with 6 detailed items
- VERSION, README.md, guide/cheatsheet.md, guide/ultimate-guide.md — bumped to 3.32.0

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
Florian BRUNIAUX 2026-03-06 17:24:26 +01:00
parent 07c3c42b03
commit 7bda706da2
12 changed files with 1349 additions and 15 deletions

View file

@ -0,0 +1,236 @@
---
name: plan-execute
description: "Execute a validated plan: worktree isolation, TDD scaffolding, level-based parallel agents, quality gate with smoke test, PR creation and merge. Handles everything through to merged PR."
---
# Plan Execute — Execution to Merged PR
Execute the validated plan in an isolated worktree. Spawn per-task agents, verify quality, create and merge the PR. Handles everything through to cleanup.
Run `/clear` before this command.
---
## Prerequisite
A validated plan must exist at `docs/plans/plan-{name}.md` with all issues resolved (output of `/plan-validate`).
---
## Step 1: Worktree Setup
Create an isolated git worktree:
```bash
git worktree add .worktrees/{plan-name} -b feature/{plan-name}
```
All execution happens inside the worktree. Main branch remains clean throughout.
---
## Step 2: TDD Scaffolding
*Only for tasks marked as TDD in the plan.*
For each TDD task, before any implementation:
1. Write the failing test(s) that define the acceptance criteria
2. Run tests to confirm they fail (red)
3. Commit the failing tests
4. Mark the test file in the task for the implementation agent to find
Do not write implementation code in this step.
---
## Step 3: Level-Based Parallel Execution
Parse the task list from the plan. Group tasks by layer (Layer 1 = foundation, Layer 2 = depends on Layer 1, etc.).
**For each layer:**
1. Identify all tasks in the layer
2. Spawn one agent per task in parallel (Task tool, run_in_background: true)
3. Each agent receives: its task description, files to modify, acceptance criteria, and relevant ADRs
4. Monitor all agents via TaskOutput polling loop
5. Each agent commits on task completion: `git commit -m "feat: {task-description}"`
6. Wait for all tasks in the layer to complete before starting the next layer
**Drift detection**: after each layer, diff the actual changes against the plan spec. If implementation deviates significantly from the plan (new files not in plan, plan files not touched), flag and ask how to proceed. Do not silently continue on drift.
**Agent instructions for each task:**
```
You are implementing one task from a validated plan.
Task: {description}
Files to modify: {file list}
Acceptance criteria: {criteria}
Relevant ADRs: {adr list}
First principles:
- Build state-of-the-art. No workarounds, no legacy patterns.
- Fix at the correct architectural level, never with component-level hacks.
- If you discover that the plan is wrong or missing context, stop and report — do not improvise architecture.
Commit your changes when complete with message: "feat: {task-description}"
```
---
## Step 4: Quality Gate
Run in parallel:
- Linter
- Type checker (if applicable)
- Full test suite
If all pass: proceed to smoke test.
If any fail: spawn a `quality-fixer` debug agent with the failure output. It gets up to **3 auto-fix attempts**. After each attempt, re-run the quality gate. If still failing after 3 attempts: stop, report the failure with the full error output, and wait for human intervention.
**Integration smoke test** *(skip for pure frontend or docs-only plans)*:
Run the smoke commands defined in the plan's `## Integration Verification` section. Additionally:
- If GraphQL: run an introspection probe to verify schema is accessible
- If Docker services: scan container logs for ERROR-level entries
- If new API routes: verify each returns expected status codes
Smoke test failures are debugged by a `quality-fixer-smoke` agent with the same 3-attempt limit.
---
## Step 5: Pre-PR Documentation
*In the worktree, before creating the PR.*
**PRD Reconciliation**: compare the implemented behavior against the original PRD. Note any deviations or additions discovered during implementation. Update the PRD with actuals. These updates ship in the same PR as the feature.
**Plan Archival**: move `docs/plans/plan-{name}.md` to `docs/plans/completed/plan-{name}.md`. Update the status header.
Commit documentation updates: `docs: reconcile PRD and archive plan for {feature-name}`.
---
## Step 6: Push and PR
Push the worktree branch and create the PR:
```bash
git push origin feature/{plan-name}
gh pr create \
--title "{feature-name}: {one-line summary from plan}" \
--body "$(cat .pr-body.md)"
```
PR body template:
```markdown
## Summary
{plan summary paragraph}
## Changes
{auto-generated from task list: bullet per task with files affected}
## ADRs
{list of ADRs created during this plan}
## Test Plan
{from plan test plan section}
## Smoke Test Results
{output from integration verification}
```
Merge using squash:
```bash
gh pr merge --squash --delete-branch
```
---
## Step 7: Post-Merge Metrics
Switch back to develop/main. Update `docs/plans/metrics/{name}.json` with execution data:
- Task count and per-layer breakdown
- TDD task count
- Diff stats (files changed, lines added/removed)
- Quality gate results (pass/fail, fix attempts)
- Smoke test results
- Drift score (0-1, how closely implementation matched plan)
- PR data (number, merge commit, timestamp)
Commit metrics update.
---
## Step 8: Worktree Cleanup
```bash
git worktree remove .worktrees/{plan-name}
```
---
## Usage
```
/plan-execute
```
Picks up the most recent validated plan. Or specify:
```
/plan-execute plan-user-authentication
```
## Output
```
Setting up worktree: .worktrees/user-authentication
Branch: feature/user-authentication
TDD scaffolding: 2 tasks marked TDD
✓ Written failing tests for: auth-token-validation
✓ Written failing tests for: refresh-token-rotation
Committed: "test: failing tests for auth pipeline (TDD)"
Executing Layer 1 (3 tasks, parallel)...
[agent-1] Implementing: JWT token generation service
[agent-2] Implementing: User session model
[agent-3] Implementing: Auth middleware
✓ Layer 1 complete. 3 commits.
Drift check: Layer 1... ✓ No drift detected.
Executing Layer 2 (2 tasks, parallel)...
[agent-4] Implementing: Login endpoint
[agent-5] Implementing: Refresh endpoint
✓ Layer 2 complete. 2 commits.
Quality gate...
✓ Lint passed
✓ Type check passed
✓ Tests: 47 passed, 0 failed
Smoke test...
✓ GraphQL introspection: OK
✓ POST /api/auth/login: 200
✓ POST /api/auth/refresh: 200
Pre-PR docs...
✓ PRD reconciled (1 minor deviation noted)
✓ Plan archived to docs/plans/completed/
PR created: #142 "user-authentication: JWT auth with refresh token rotation"
PR merged (squash). Branch deleted.
Metrics committed. Worktree cleaned.
✅ Feature complete.
```
## When to Use
After `/plan-validate` confirms all issues are resolved. Never skip validation — executing an unvalidated plan skips the independent review that catches ~18 issues on average.
## See Also
- [Plan-Validate-Execute Pipeline](../../guide/workflows/plan-pipeline.md)
- [Git Worktree Command](./git-worktree.md)
- [TDD with Claude](../guide/workflows/tdd-with-claude.md)

View file

@ -0,0 +1,180 @@
---
name: plan-start
description: "5-phase planning command: PRD analysis, design review, technical decisions, dynamic research team, metrics. Produces a complete implementation plan + ADRs before any code is written."
---
# Plan Start — 5-Phase Planning
Analyze the request and produce a complete implementation plan through structured phases. No code is written. Every significant decision is recorded. Run `/clear` after this command before running `/plan-validate`.
---
## Phase 1: PRD & Design Analysis
### Step 1.1 — PRD Analysis
*Skip if no PRD exists (refactor, infra change, bug fix).*
Read all PRD files and `docs/INFORMATION_ARCHITECTURE.md` if present. Scan the codebase to understand current implementation status.
Surface findings in 3 buckets:
**Missing requirements** — acceptance criteria that are absent or incomplete
**Ambiguous requirements** — items with multiple valid interpretations
**Compliance concerns** — security, data privacy, API contract implications
For each finding: present options with concrete pros/cons. Discuss with user. Record every decision in the plan file under a `## Decisions` section before moving on. Do not proceed past unresolved ambiguities.
### Step 1.2 — Design Analysis
*Skip if no UI changes are in scope.*
Read: `DESIGN_SYSTEM.md`, existing UX ADRs, CLAUDE.md UX rules.
Produce specs for:
- **Screen inventory**: new/modified screens, route placement, component reuse audit
- **State catalog**: empty, loading, populated, error, and partial states for every interactive element
- **Interaction specs**: user flows (happy path + alternates), focus/keyboard behavior
- **Animation specs**: map each interaction to existing keyframes or specify new ones, include `prefers-reduced-motion` fallbacks
- **Responsive behavior**: breakpoints, web/mobile divergence decisions
- **Accessibility**: WAI-ARIA pattern selection, live regions, error visibility
Create Design ADRs for significant UX decisions (choice of interaction pattern, new animation convention, platform divergence). Record minor layout choices directly in the plan file.
---
## Phase 2: Technical Analysis
Spawn 1-2 Explore agents for targeted codebase research. Run them in the background via Task tool.
While agents run, check:
- Existing ADRs in `docs/adr/` — if 3+ ADRs confirm a decision → auto-resolve without asking
- PATTERNS.md — apply confirmed patterns directly
When agents return: present architecture decisions with 2-3 options each, concrete pros/cons, and a recommendation. Ask for user input on each unresolved decision.
For each significant decision:
1. Create `docs/adr/ADR-XXXX.md` using standard Nygard format (Context / Decision / Status / Consequences)
2. Update `docs/adr/PATTERNS.md` with the new observation
---
## Phase 3: Scope Assessment
Apply trigger rules to determine which research agents are needed. Present the proposed team with justification for each inclusion.
**Research agent pool:**
| Agent | Trigger | Model |
|-------|---------|-------|
| `code-explorer` | Always | Sonnet |
| `arch-researcher` | Changes touch 2+ architectural layers | Sonnet |
| `database-analyst` | Any DB schema change | Sonnet |
| `security-analyst` | Auth, payments, PII, RBAC, rate limiting | Opus |
| `test-analyzer` | Non-trivial feature (not just a bug fix) | Sonnet |
| `cross-platform-specialist` | Web + mobile parity required | Sonnet |
| `native-app-specialist` | Tasks touch mobile/native UI package | Sonnet |
| `design-system-researcher` | UI changes in scope | Sonnet |
| `dependency-researcher` | New packages being added | Sonnet |
| `devops-specialist` | Docker, env vars, CI/CD changes | Sonnet |
| `integration-researcher` | New services, libraries, OTEL config | Opus |
| `planning-coordinator` | Always, when 2+ agents selected | Opus |
**Tier labels** (descriptive, not prescriptive):
- Tier 0 (0 agents): Solo — inline research, no spawning
- Tier 1 (1-3 agents): Focused
- Tier 2 (4-6 agents): Standard
- Tier 3 (7-9 agents): Comprehensive
- Tier 4 (10+ agents): Full Spectrum
Tell the user: "I recommend a **[Tier N - Label]** team: [agent list with one-line justification each]. Want to add or remove any agents?"
Wait for approval before Phase 4.
---
## Phase 4: Research & Plan Creation
**Tier 0**: Conduct inline research. Write plan directly without spawning agents.
**Tier 1+**: Spawn approved agents in parallel using Task tool (run_in_background: true). For each agent, provide:
- Its specific research scope
- The relevant files/areas to investigate
- The questions it needs to answer
Monitor agents via TaskOutput polling loop. Report progress: "3/6 agents complete..."
When all agents return: if `planning-coordinator` was spawned, send it all agent reports and have it synthesize the final plan. Otherwise, synthesize directly.
**Plan file structure** (`docs/plans/plan-{name}.md`):
```markdown
# Plan: {feature-name}
Created: {date} | Branch: {branch-name} | Tier: {N}
## Summary
One paragraph: what this implements and why.
## Decisions
Decisions recorded during Phase 1 (PRD analysis).
## Architecture
ADRs created, patterns applied, architectural choices made.
## Tasks
Ordered task list with layers (1 = foundation, 2 = depends on 1, etc.)
### Layer 1
- [ ] Task A — description, files affected, acceptance criteria
- [ ] Task B — description, files affected, acceptance criteria
### Layer 2
- [ ] Task C — depends on A — description, files affected, acceptance criteria
## Test Plan
How each task will be verified. TDD tasks marked explicitly.
## Integration Verification
Smoke test commands to run post-execution (if backend/services in scope).
## Out of Scope
What this plan explicitly does not address.
```
Commit: plan file + ADR files + agent report manifests.
---
## Phase 5: Finalize Metrics
Record timestamps, phase durations, agent counts, and cost estimates in `docs/plans/metrics/{name}.json`. Commit.
---
## Auto-Transition
If Phase 1 produced no unresolved ambiguities and Phase 2 produced no unresolved decisions: auto-start `/plan-validate` without asking.
If any human discussion occurred: ask "Ready to validate this plan?" before proceeding.
---
## Usage
```
/plan-start
```
Provide the feature description or point to a PRD file when prompted. The command handles the rest interactively.
## When to Use
Use for any non-trivial feature: anything touching more than 2 files, involving architecture decisions, or where a planning mistake would be expensive to undo.
For simple changes (typos, trivial refactors): use `/plan` mode instead.
## See Also
- [Plan-Validate-Execute Pipeline](../../guide/workflows/plan-pipeline.md)
- [Planning Coordinator Agent](../agents/planning-coordinator.md)
- [ADR Writer Agent](../agents/adr-writer.md)

View file

@ -0,0 +1,187 @@
---
name: plan-validate
description: "2-layer plan validation: instant structural checks + trigger-based specialist agents. Auto-fixes issues using ADRs and first principles. Every issue must be resolved before execution."
---
# Plan Validate — 2-Layer Validation
Independently validate the plan produced by `/plan-start`. No code is written. Run `/clear` after this command before running `/plan-execute`.
Validation is separate from planning by design: validators that didn't write the plan are not anchored to its assumptions.
---
## Prerequisite
A committed plan file must exist at `docs/plans/plan-{name}.md`. If multiple plans exist, list them and ask the user which to validate.
---
## Layer 1: Structural Validation
Run immediately, no agents required. Check the plan document for:
**Format & Completeness**
- [ ] All required sections present (Summary, Decisions, Architecture, Tasks, Test Plan, Out of Scope)
- [ ] Each task has: description, files affected, acceptance criteria, layer assignment
**Dependency Chain**
- [ ] No circular dependencies between tasks
- [ ] Tasks in higher layers only depend on tasks in lower layers
- [ ] All stated dependencies exist in the plan
**File Existence**
- [ ] Every file listed for modification actually exists in the codebase (use Glob)
- [ ] New files are in appropriate directories per project conventions
**ADR Consistency**
- [ ] Plan decisions align with ADRs created during `/plan-start`
- [ ] No contradiction with existing ADRs in `docs/adr/`
**CLAUDE.md Compliance**
- [ ] Plan respects all hard rules in CLAUDE.md
- [ ] No first principles violations (no workarounds, no backward-compat shims)
**Test Coverage**
- [ ] Every new function/component has a corresponding test task
- [ ] TDD-marked tasks have failing test written before implementation task
Record all Layer 1 issues with severity (BLOCKER / WARNING / INFO) before proceeding to Layer 2.
---
## Layer 2: Specialist Review
Select agents by applying trigger rules to the plan content. No user input needed — triggers are objective.
**Validation agent pool:**
| Agent | Trigger | Model |
|-------|---------|-------|
| `security-reviewer` | Auth, payments, PII, RBAC, new public APIs | Opus |
| `db-migration-reviewer` | New tables, columns, indexes, or migration files | Opus |
| `performance-reviewer` | New queries, resolvers, routes, or added dependencies | Sonnet |
| `design-system-reviewer` | New UI components or visual styling changes | Sonnet |
| `ux-reviewer` | New pages, forms, modals, or interaction patterns | Sonnet |
| `cross-platform-reviewer` | Changes touching both web and mobile, or shared packages | Sonnet |
| `native-app-reviewer` | Mobile screens, native UI package changes | Sonnet |
| `integration-reviewer` | New external services, libraries, or OTEL config | Opus |
Spawn triggered agents in parallel (Task tool, run_in_background: true). Each agent receives: the plan file, relevant ADRs, and targeted questions based on its domain.
Monitor via TaskOutput polling loop. Report progress to user.
Each agent must return structured findings:
```
FINDING: [BLOCKER|WARNING|INFO]
Location: [plan section or file reference]
Issue: [concrete description]
Risk: [what breaks if this isn't addressed]
Suggestion: [specific fix or alternative]
```
---
## Auto-Fix Phase
Merge Layer 1 structural issues + Layer 2 specialist findings into a single issue list. Every issue must be resolved. No skipping.
**Triage each issue:**
**Bucket A — Auto-resolve:**
- Issue matches an existing ADR decision → cite ADR, mark resolved
- Issue matches a confirmed pattern in PATTERNS.md → cite pattern, mark resolved
- Issue resolvable from first principles in CLAUDE.md → apply rule, mark resolved
**Bucket B — Needs human input:**
- Novel architectural question not covered by existing decisions
- Conflicting ADRs with no clear precedent
- Blocker with no obvious resolution
For Bucket B items: present the issue, explain why it can't be auto-resolved, propose options, wait for decision. Record the decision in the plan's `## Decisions` section and create a new ADR if it's architecturally significant.
**Apply all fixes in one batch** once all issues are triaged. Update the plan file. Commit the updated plan.
---
## Issue Persistence
Record every issue in `docs/plans/metrics/{name}.json` under `validation.issues`:
```json
{
"id": "S-001",
"layer": 1,
"severity": "WARNING",
"category": "test-coverage",
"description": "No test task for the new webhook handler",
"reporting_agent": "structural",
"triage": "A",
"resolution_source": "first-principles",
"resolution": "Added test task in Layer 2 of the plan"
}
```
This data feeds `/plan-metrics` for pattern analysis over time.
---
## Auto-Transition
If all issues are auto-resolved (Bucket A only): auto-start `/plan-execute` without asking.
If any human input was required (Bucket B): ask "All issues resolved. Ready to execute?" before proceeding.
---
## Usage
```
/plan-validate
```
Picks up the most recent uncommitted plan automatically. Or specify:
```
/plan-validate plan-user-authentication
```
## Output
```
Layer 1: Structural validation...
✓ Format complete
✓ Dependencies valid
⚠ WARNING S-001: Missing test task for webhook handler
✓ CLAUDE.md compliant
Layer 2: Triggering specialist agents...
→ security-reviewer (auth changes detected) [Opus]
→ db-migration-reviewer (new users table) [Opus]
→ performance-reviewer (new query in /api/users) [Sonnet]
Monitoring... 1/3 complete... 2/3 complete... done.
BLOCKER B-001 [security-reviewer]: JWT expiry not validated on refresh endpoint
WARNING B-002 [db-migration-reviewer]: Migration lacks rollback strategy
Auto-fix phase:
S-001 → auto-resolved (first principles: test coverage rule)
B-001 → NEEDS INPUT (no existing ADR for JWT refresh strategy)
B-002 → auto-resolved (ADR-0003: migration rollback pattern)
[User input requested for B-001]
Decision recorded. ADR-0011 created.
All 3 issues resolved. Plan updated.
→ Auto-starting /plan-execute
```
## When to Use
Always — before any `/plan-execute` call. The cost of validation ($0.20-3.00) is negligible against the cost of discovering issues mid-execution.
## See Also
- [Plan-Validate-Execute Pipeline](../../guide/workflows/plan-pipeline.md)
- [Integration Reviewer Agent](../agents/integration-reviewer.md)
- [Plan Challenger Agent](../agents/plan-challenger.md)