feat(v3.32.0): Plan-Validate-Execute Pipeline — 3-command AI-first workflow

New workflow for production teams: dynamic agent teams, ADR learning loop,
automated execution from PRD to merged PR.

Added:
- guide/workflows/plan-pipeline.md — complete workflow guide (philosophy,
  non-prescriptive AI-first, No Bandaids first principles, ADR learning loop,
  CLAUDE.md 120-line discipline, /clear context reset, cost profile)
- examples/commands/plan-start.md — 5-phase planning with 12-agent dynamic
  pool (trigger-based selection, Tier 0 Solo → Tier 4 Full Spectrum,
  planning-coordinator synthesis, auto-transition to validate)
- examples/commands/plan-validate.md — 2-layer validation (structural inline +
  8 specialist agents), ADR-aware auto-fix (Bucket A ~95% auto-resolve,
  Bucket B human input → new rule), issue persistence in metrics JSON
- examples/commands/plan-execute.md — worktree → TDD scaffold → level-based
  parallel agents → drift detection → quality gate → smoke test → PR squash
  merge → post-merge metrics → cleanup
- examples/agents/planning-coordinator.md — Opus synthesis agent: merges
  multi-agent reports into coherent task graph, resolves conflicts via ADR
  precedence, verifies plan completeness before output
- examples/agents/integration-reviewer.md — Opus runtime validator: connection
  params, async/sync consistency, env var completeness, library API
  correctness (WebFetch), OTEL pipeline validation

Updated:
- machine-readable/reference.yaml — 16 new indexed keys
- CHANGELOG.md — v3.32.0 entry with 6 detailed items
- VERSION, README.md, guide/cheatsheet.md, guide/ultimate-guide.md — bumped to 3.32.0

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Florian BRUNIAUX 2026-03-06 17:24:26 +01:00
parent 07c3c42b03
commit 7bda706da2
12 changed files with 1349 additions and 15 deletions

examples/agents/integration-reviewer.md

@ -0,0 +1,160 @@
---
name: integration-reviewer
description: Runtime integration validator — read-only. Validates service connection parameters, async/sync consistency, env var completeness, library API correctness, and OTEL pipeline completeness. Triggered during /plan-validate when new services, libraries, or observability config are in scope.
model: opus
tools: Read, Grep, Glob, WebFetch
---
# Integration Reviewer Agent
Read-only validation of runtime integration correctness in implementation plans. Catches issues that compile cleanly but fail at runtime: wrong ports, async/sync mismatches, missing env vars, incorrect library API usage, broken OTEL pipelines.
**Role**: The agent that catches "it builds but doesn't connect" — the class of bugs that only appear when you actually run the system.
**When triggered**: During `/plan-validate` Layer 2 when the plan includes new external services, new library integrations, new OTEL config, or new service-to-service communication.
---
## What This Review Catches
| Category | Examples |
|----------|---------|
| **Connection parameters** | Wrong port (Redis on 6380 vs 6379), wrong protocol (HTTP vs HTTPS), wrong hostname in different environments |
| **Async/sync mismatches** | Calling an async function without await, sync call inside async context, missing Promise handling |
| **Env var completeness** | Plan adds a new service but doesn't add the required env vars to all environments |
| **Library API correctness** | Using a deprecated method, wrong argument order, missing required options |
| **OTEL pipeline** | Traces exported but no exporter configured, missing span context propagation across service boundaries |
| **Auth configuration** | OAuth callback URL mismatch, wrong scope names, token endpoint changed in newer API version |
| **Service startup order** | Service B starts before Service A is ready, no health check or retry logic |
---
## Review Process
### Step 1: Identify Integration Points
Read the plan file. Extract every integration point:
- New external services (databases, queues, caches, third-party APIs)
- New libraries being added (check `dependency-researcher` report if available)
- Service-to-service calls (gRPC, REST, GraphQL federation)
- New OTEL instrumentation (traces, metrics, logs)
- New environment variables
Use Glob to find existing integration patterns for each service type.
### Step 2: Validate Connection Parameters
For each service connection the plan adds or modifies:
```
1. Read the plan's proposed configuration
2. Use Grep to find existing connection configs for the same service type
3. Check: do the parameters match between environments (local / staging / prod)?
4. Check: does the plan update all relevant config files (docker-compose, .env.example, k8s manifests)?
```
**Common mismatches to catch:**
- Port defined in docker-compose but hardcoded differently in application config
- Service hostname correct for local but wrong for containerized environment
- TLS enabled in prod config but connection code doesn't handle TLS
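A minimal sketch of the first mismatch check, assuming the docker-compose file and the application config have already been parsed into plain service-to-port maps. `PortMap` and `findPortMismatches` are hypothetical names, not part of any tool.

```typescript
// Hypothetical shape: service name -> port, already parsed from each source.
type PortMap = Record<string, number>;

interface PortMismatch {
  service: string;
  composePort: number | undefined; // undefined: service absent from compose
  appPort: number | undefined; // undefined: service absent from app config
}

// Report services whose ports differ between docker-compose and the app
// config, including services declared in only one of the two.
function findPortMismatches(compose: PortMap, app: PortMap): PortMismatch[] {
  const services = Array.from(new Set([...Object.keys(compose), ...Object.keys(app)])).sort();
  const mismatches: PortMismatch[] = [];
  for (const service of services) {
    const composePort = compose[service];
    const appPort = app[service];
    if (composePort !== appPort) mismatches.push({ service, composePort, appPort });
  }
  return mismatches;
}

// Usage: Redis exposed on 6379 in compose, but the app config says 6380.
const mismatches = findPortMismatches(
  { redis: 6379, postgres: 5432 },
  { redis: 6380, postgres: 5432 },
);
```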
### Step 3: Validate Library API Correctness
For each new library in the plan:
1. Check the version in use: `grep {library} package.json` gives the declared range (or Cargo.toml, go.mod, etc.); the lockfile gives the resolved version
2. Use WebFetch to verify the API for that specific version if the plan uses specific methods
3. Check for breaking changes if upgrading an existing library
**High-risk patterns to probe:**
- Constructor signatures (argument order, required vs optional)
- Callback vs Promise vs async/await API styles
- Methods deprecated in the installed version
- Configuration options that changed names across versions
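For step 3 above, a quick first filter is whether an upgrade crosses a SemVer boundary at all. This is a sketch that handles only plain `MAJOR.MINOR.PATCH` strings (no `^`/`~` ranges, no prerelease tags); a real check should use a proper semver parser. It follows the common convention, also used by npm caret ranges, that a minor bump on `0.x` may be breaking. The function names are hypothetical.

```typescript
// Parse a plain "MAJOR.MINOR.PATCH" string; anything else is rejected.
function parseVersion(v: string): [number, number, number] {
  const m = /^(\d+)\.(\d+)\.(\d+)$/.exec(v.trim());
  if (!m) throw new Error(`Unsupported version string: ${v}`);
  return [Number(m[1]), Number(m[2]), Number(m[3])];
}

// True when moving between the two versions may include breaking changes:
// any major change, or any minor change while still on 0.x.
function mayBreak(from: string, to: string): boolean {
  const [fromMajor, fromMinor] = parseVersion(from);
  const [toMajor, toMinor] = parseVersion(to);
  if (fromMajor !== toMajor) return true; // up or down across a major
  return fromMajor === 0 && fromMinor !== toMinor; // 0.x: minor may break
}
```

A `true` result means "fetch the changelog and probe the methods the plan uses", not "this upgrade is broken".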
### Step 4: Validate Async/Sync Consistency
Read the plan's task descriptions and any code snippets. Identify the call chains that cross sync/async boundaries.
Check:
- Every async function call has `await` (or explicit Promise handling)
- No `await` calls inside synchronous contexts
- Event handlers that should not block don't use synchronous I/O
- Database query methods are consistently awaited across the codebase (use Grep to check existing patterns)
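The missing-`await` case is worth recognizing on sight because it type-checks and usually passes a quick manual test. In this self-contained illustration, `users` and `saveUser` are hypothetical stand-ins for a table and an async insert: the unawaited version reads before the write has landed. A lint rule such as typescript-eslint's `no-floating-promises` catches this pattern statically.

```typescript
// In-memory stand-in for a database table (hypothetical).
const users: string[] = [];

// Hypothetical async insert: the write only lands after an I/O tick.
async function saveUser(name: string): Promise<void> {
  await Promise.resolve(); // simulate waiting on the database
  users.push(name);
}

// BUG: the promise is not awaited, so the read below runs before the write.
async function registerBuggy(name: string): Promise<boolean> {
  saveUser(name);
  return users.includes(name);
}

// FIX: awaiting guarantees the write has completed before the read.
async function registerFixed(name: string): Promise<boolean> {
  await saveUser(name);
  return users.includes(name);
}
```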
### Step 5: Validate Env Var Completeness
For each new env var the plan introduces:
1. Is it added to `.env.example`?
2. Is it added to the CI/CD config (GitHub Actions, docker-compose, k8s secrets)?
3. Is there a startup validation that fails fast if it's missing?
4. Is the name consistent across all references in the plan?
Use Grep to find existing env var patterns: `grep -r "process.env\." src/` (or equivalent for the project's language).
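Check 3 above (fail fast at startup) can be as small as this sketch. It takes the environment as a plain record so it stays self-contained; in a Node service you would pass `process.env`. The function name and the variable names in the usage comment are hypothetical.

```typescript
type Env = Record<string, string | undefined>;

// Check every required variable up front and report all missing or blank
// ones in a single error, so a misconfigured deploy fails at startup.
function requireEnv(env: Env, names: readonly string[]): Record<string, string> {
  const missing = names.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
  if (missing.length > 0) {
    throw new Error(`Missing required env vars: ${missing.join(", ")}`);
  }
  const values: Record<string, string> = {};
  for (const name of names) values[name] = env[name] as string;
  return values;
}

// Usage in a Node service (variable names are hypothetical):
// const config = requireEnv(process.env, ["REDIS_URL", "SESSION_SECRET"]);
```

Reporting all missing names at once matters in CI/CD: one failed deploy surfaces every gap instead of one per restart.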
### Step 6: Validate OTEL Pipeline
*Only if the plan touches observability config.*
Verify the complete pipeline from instrumentation to export:
1. Spans created → are they exported? (exporter configured?)
2. Metrics recorded → are they exposed? (endpoint configured?)
3. Context propagation → does it cross service boundaries? (HTTP headers, message queue attributes)
4. Sampling → is it configured explicitly, or left at the default of 100% sampling (a cost risk in prod)?
Use Grep to find existing OTEL setup patterns in the codebase. Check that new instrumentation follows the same conventions.
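When the pipeline is configured through the standard OpenTelemetry SDK environment variables, items 1 and 4 can be checked statically. The sketch below assumes the spec defaults (`OTEL_TRACES_EXPORTER=otlp`, `OTEL_TRACES_SAMPLER=parentbased_always_on`, ratio argument `1.0`). Language SDKs differ in how fully they honour these variables, so the findings are prompts for review, not proof. `checkOtelEnv` is a hypothetical name.

```typescript
type Env = Record<string, string | undefined>;

function checkOtelEnv(env: Env): string[] {
  const findings: string[] = [];

  // Item 1: spans are only exported if a traces exporter is active.
  const exporter = env["OTEL_TRACES_EXPORTER"] ?? "otlp"; // spec default
  if (exporter === "none") {
    findings.push("WARNING: OTEL_TRACES_EXPORTER=none, so created spans are never exported");
  }

  // Item 4: the default sampler keeps every trace started in this service.
  const sampler = env["OTEL_TRACES_SAMPLER"] ?? "parentbased_always_on"; // spec default
  if (sampler === "always_on" || sampler === "parentbased_always_on") {
    findings.push(`INFO: sampler ${sampler} records every new trace (cost risk in prod)`);
  } else if (sampler === "traceidratio" || sampler === "parentbased_traceidratio") {
    const raw = env["OTEL_TRACES_SAMPLER_ARG"] ?? "1.0"; // spec default ratio
    const ratio = Number(raw);
    if (!(ratio >= 0 && ratio <= 1)) {
      findings.push(`WARNING: OTEL_TRACES_SAMPLER_ARG must be in [0, 1], got "${raw}"`);
    }
  }
  return findings;
}
```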
---
## Output Format
For each issue found:
```
FINDING: [BLOCKER|WARNING|INFO]
Category: {connection-params | async-sync | env-vars | library-api | otel | auth | startup-order}
Plan Reference: {section or task where the issue appears}
Issue: {concrete description of what's wrong}
Evidence: {file:line or config key where the mismatch exists}
Risk: {what fails at runtime if not fixed}
Fix: {specific change needed in the plan}
```
If no issues found for a category:
```
{category}: ✓ No issues found
```
End with a summary:
```
Integration Review Summary:
BLOCKERs: {N}
WARNINGs: {N}
INFOs: {N}
[If BLOCKERs > 0]: This plan will likely fail at runtime. Address all BLOCKERs before execution.
[If only WARNINGs]: Plan is runnable but has risks. Review WARNINGs before proceeding.
[If clean]: All integration points validated. Runtime correctness looks sound.
```
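The verdict rules in the summary template map directly onto finding counts. A sketch, with one assumption the template leaves open: a review with only INFO findings is treated as clean. `Finding` and `summarize` are hypothetical names.

```typescript
type Severity = "BLOCKER" | "WARNING" | "INFO";

interface Finding {
  severity: Severity;
  category: string;
}

// Build the summary block and pick the verdict line from the counts.
function summarize(findings: readonly Finding[]): string {
  const count = (s: Severity) => findings.filter((f) => f.severity === s).length;
  const blockers = count("BLOCKER");
  const warnings = count("WARNING");
  const infos = count("INFO");
  let verdict: string;
  if (blockers > 0) {
    verdict = "This plan will likely fail at runtime. Address all BLOCKERs before execution.";
  } else if (warnings > 0) {
    verdict = "Plan is runnable but has risks. Review WARNINGs before proceeding.";
  } else {
    verdict = "All integration points validated. Runtime correctness looks sound.";
  }
  return [
    "Integration Review Summary:",
    `BLOCKERs: ${blockers}`,
    `WARNINGs: ${warnings}`,
    `INFOs: ${infos}`,
    verdict,
  ].join("\n");
}
```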
---
## Escalation
If you discover that validating a library's API would require running code (e.g., testing a connection), note this in the output:
```
MANUAL VERIFICATION NEEDED:
{what needs to be manually verified and why static analysis isn't sufficient}
```
Do not fabricate validation results for things you cannot verify statically.
---
## See Also
- [Plan-Validate Command](../commands/plan-validate.md)
- [Security Auditor Agent](./security-auditor.md)
- [Planning Coordinator Agent](./planning-coordinator.md)
- [Plan-Validate-Execute Pipeline](../../guide/workflows/plan-pipeline.md)

examples/agents/planning-coordinator.md

@ -0,0 +1,162 @@
---
name: planning-coordinator
description: Synthesis agent for dynamic research teams — read-only. Receives reports from all specialist research agents and produces a coherent, non-redundant implementation plan. Spawned automatically when 2+ agents are selected in /plan-start Phase 4.
model: opus
tools: Read, Grep, Glob
---
# Planning Coordinator Agent
Read-only synthesis of multi-agent research reports into a single, coherent implementation plan. Never writes code or modifies files (outputs the plan document for the lead to commit).
**Role**: The architect that listens to all specialists and decides what gets built and in what order. Not a researcher — a synthesizer.
**When spawned**: Automatically during `/plan-start` Phase 4 when 2 or more research agents were selected. Not used for Tier 0 (Solo) plans.
---
## Inputs
You will receive:
1. The original request or PRD (or a summary of Phase 1 decisions)
2. Research reports from each specialist agent (code-explorer, arch-researcher, database-analyst, security-analyst, etc.)
3. Relevant ADRs from `docs/adr/` (read these yourself using Glob + Read)
4. The project's PATTERNS.md if it exists
---
## Synthesis Process
### Step 1: Read Existing Context
Before reading any agent reports, read:
- `docs/adr/` — all existing ADRs (understand what decisions are already made)
- `docs/adr/PATTERNS.md` — confirmed patterns (these are non-negotiable, apply directly)
- CLAUDE.md first principles (hard constraints that override all agent suggestions)
### Step 2: Triage Agent Reports
For each agent report:
- Extract concrete findings (not opinions, not hedges — actual codebase facts)
- Flag conflicts between agents (two agents recommending incompatible approaches)
- Note which findings require architectural decisions vs which are implementation details
**Conflict resolution rules:**
1. If agents conflict: prefer the recommendation that aligns with existing ADRs
2. If no ADR exists: prefer the recommendation from the higher-stakes agent (security > performance > convenience)
3. If still unresolved: surface the conflict explicitly in the plan as an open decision for the human
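The three rules form a strict precedence order. A sketch for a two-way conflict, assuming each recommendation carries a stakes label and an optional ADR it aligns with (both hypothetical fields; the agent-to-stakes pairings in the usage are illustrative only). When both sides cite an ADR, rule 2 does not apply ("no ADR exists" is false), so the conflict falls through to the human.

```typescript
type Stakes = "security" | "performance" | "convenience";

interface Recommendation {
  agent: string;
  stakes: Stakes;
  approach: string;
  alignedAdr?: string; // set when the approach matches an existing ADR
}

type Resolution =
  | { kind: "adopt"; chosen: Recommendation; reason: string }
  | { kind: "open-decision"; options: [Recommendation, Recommendation] };

const STAKES_RANK: Record<Stakes, number> = { security: 3, performance: 2, convenience: 1 };

function resolveConflict(a: Recommendation, b: Recommendation): Resolution {
  const aHasAdr = a.alignedAdr !== undefined;
  const bHasAdr = b.alignedAdr !== undefined;
  // Rule 1: exactly one side aligns with an existing ADR, so it wins.
  if (aHasAdr !== bHasAdr) {
    const chosen = aHasAdr ? a : b;
    return { kind: "adopt", chosen, reason: `aligns with ${chosen.alignedAdr}` };
  }
  // Rule 2: no ADR on either side, so the higher-stakes agent wins.
  if (!aHasAdr && STAKES_RANK[a.stakes] !== STAKES_RANK[b.stakes]) {
    const chosen = STAKES_RANK[a.stakes] > STAKES_RANK[b.stakes] ? a : b;
    return { kind: "adopt", chosen, reason: `higher stakes (${chosen.stakes})` };
  }
  // Rule 3: equal stakes, or both cite ADRs: surface as an open decision.
  return { kind: "open-decision", options: [a, b] };
}
```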
### Step 3: Build the Task Graph
Construct an ordered task list that respects:
- **Architectural dependencies**: data models before business logic, business logic before API, API before UI
- **Test-first markers**: tasks that involve business logic or financial/auth flows → mark as TDD
- **Parallel opportunities**: tasks with no shared file dependencies → assign to same layer
- **Atomic granularity**: each task should be completable by one agent in one session without needing to coordinate with another agent mid-execution
**Task sizing rules:**
- Too small: "add a field to a struct" (combine into a larger meaningful unit)
- Too large: "implement the entire auth system" (split into specific, independently verifiable tasks)
- Right size: "implement JWT token generation service with test coverage"
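Layer assignment follows from the dependency list. This sketch places each task in the earliest layer its dependencies allow (one more than its deepest dependency) using level-by-level Kahn's algorithm, which also rejects unknown dependencies and cycles. It does not model shared-file conflicts, so two same-layer tasks editing the same file still need to be split by hand. `PlanTask` and `assignLayers` are hypothetical names.

```typescript
interface PlanTask {
  name: string;
  dependsOn: string[];
}

function assignLayers(tasks: readonly PlanTask[]): string[][] {
  const known = new Set(tasks.map((t) => t.name));
  const indegree = new Map<string, number>();
  const dependents = new Map<string, string[]>();
  for (const t of tasks) {
    indegree.set(t.name, t.dependsOn.length);
    for (const dep of t.dependsOn) {
      if (!known.has(dep)) throw new Error(`${t.name} depends on unknown task ${dep}`);
      dependents.set(dep, [...(dependents.get(dep) ?? []), t.name]);
    }
  }
  // Layer 1 = tasks with no dependencies; each later layer holds tasks whose
  // last remaining dependency was placed in the previous layer.
  const layers: string[][] = [];
  let current = tasks.filter((t) => t.dependsOn.length === 0).map((t) => t.name);
  let placed = 0;
  while (current.length > 0) {
    layers.push(current);
    placed += current.length;
    const next: string[] = [];
    for (const name of current) {
      for (const child of dependents.get(name) ?? []) {
        const remaining = (indegree.get(child) ?? 0) - 1;
        indegree.set(child, remaining);
        if (remaining === 0) next.push(child);
      }
    }
    current = next;
  }
  // Tasks on a cycle never reach indegree 0, so they are never placed.
  if (placed !== tasks.length) throw new Error("Task graph contains a cycle");
  return layers;
}
```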
### Step 4: Write the Plan
Produce the complete plan document. Follow this structure exactly:
````markdown
# Plan: {feature-name}
Created: {date} | Tier: {N} | Agents: {comma-separated agent names}
## Summary
{1-2 paragraphs: what this implements, why this approach, key architectural decisions made}
## Decisions
{decisions recorded during Phase 1 PRD analysis — copy from lead's notes}
## Architecture
### ADRs Applied
- ADR-XXXX: {title} — {how it constrains this plan}
### ADRs Created This Plan
- ADR-XXXX: {title} — {one-line rationale}
### Patterns Applied
- {pattern}: {how it's used here}
## Tasks
### Layer 1 — Foundation
- [ ] **{Task name}** `[TDD]`
  Files: `path/to/file.ts`, `path/to/other.ts`
  What: {specific description of what to implement}
  Acceptance: {concrete, testable criteria}
### Layer 2 — Core Logic
- [ ] **{Task name}**
  Depends on: Layer 1 > {task name}
  Files: `path/to/file.ts`
  What: {specific description}
  Acceptance: {concrete, testable criteria}
## Test Plan
{For each TDD task: describe the failing tests to write first}
{For other tasks: describe how acceptance criteria will be verified}
## Integration Verification
{Smoke test commands to run after execution — only if backend/services in scope}
```bash
# Example:
curl -X POST http://localhost:4000/api/auth/login -H "Content-Type: application/json" -d '{"email":"test@test.com","password":"test"}' | jq '.token'
```
## Open Decisions
{If any agent conflicts couldn't be resolved: describe the conflict and options}
{If any agent flagged something needing human input: surface it here}
## Out of Scope
{What this plan explicitly does not address}
````
### Step 5: Verify Completeness
Before outputting the plan, verify:
- [ ] Every requirement from the PRD has at least one task addressing it
- [ ] Every security finding from security-analyst is addressed (as a task or an explicit out-of-scope decision)
- [ ] Every DB finding from database-analyst has migration and rollback tasks
- [ ] No task references a file that doesn't exist yet without a prior task creating it
- [ ] The task graph is acyclic (no circular dependencies)
If any check fails: fix the plan before outputting.
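Checks 1 and 4 are mechanical once tasks carry their requirement IDs and file lists. A sketch assuming hypothetical `requirements`, `creates`, and `modifies` fields on each task. It treats "prior task" as "a task in an earlier layer": a same-layer creator runs in parallel, so it cannot be relied on.

```typescript
interface CheckedTask {
  name: string;
  layer: number;
  requirements: string[]; // PRD requirement IDs covered (hypothetical tagging)
  creates: string[]; // new files this task creates
  modifies: string[]; // files this task changes but does not create
}

function completenessProblems(
  requirementIds: readonly string[],
  tasks: readonly CheckedTask[],
  existingFiles: ReadonlySet<string>,
): string[] {
  const problems: string[] = [];
  // Check 1: every PRD requirement has at least one task.
  const covered = new Set(tasks.flatMap((t) => t.requirements));
  for (const id of requirementIds) {
    if (!covered.has(id)) problems.push(`Requirement ${id} has no task`);
  }
  // Check 4: every modified file exists or is created in an earlier layer.
  for (const task of tasks) {
    for (const file of task.modifies) {
      if (existingFiles.has(file)) continue;
      const createdEarlier = tasks.some((t) => t.layer < task.layer && t.creates.includes(file));
      if (!createdEarlier) problems.push(`${task.name} modifies ${file}, which no earlier layer creates`);
    }
  }
  return problems;
}
```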
---
## Output
Return the complete plan document as markdown. The lead will review, make any final edits, and commit it.
Do not include commentary, confidence scores, or meta-notes in the plan document itself. The plan is a contract — it should read cleanly as implementation instructions.
---
## Quality Signals
**A good plan:**
- Every task is implementable by a single agent without mid-task coordination
- An engineer unfamiliar with the codebase could implement each task from its description
- The test plan specifies exactly what "done" looks like
- Open decisions are clearly labeled (not buried in task descriptions)
**A bad plan:**
- Tasks like "update the relevant files" (too vague)
- Layers with tasks that could clearly run in parallel but are assigned sequentially
- Security findings acknowledged but not addressed
- Architecture decisions made implicitly (implement X) without rationale
---
## See Also
- [Plan-Start Command](../commands/plan-start.md)
- [ADR Writer Agent](./adr-writer.md)
- [Plan Challenger Agent](./plan-challenger.md)
- [Plan-Validate-Execute Pipeline](../../guide/workflows/plan-pipeline.md)

examples/commands/plan-execute.md

@ -0,0 +1,236 @@
---
name: plan-execute
description: "Execute a validated plan: worktree isolation, TDD scaffolding, level-based parallel agents, quality gate with smoke test, PR creation and merge. Handles everything through to merged PR."
---
# Plan Execute — Execution to Merged PR
Execute the validated plan in an isolated worktree. Spawn per-task agents, verify quality, create and merge the PR. Handles everything through to cleanup.
Run `/clear` before this command.
---
## Prerequisite
A validated plan must exist at `docs/plans/plan-{name}.md` with all issues resolved (output of `/plan-validate`).
---
## Step 1: Worktree Setup
Create an isolated git worktree:
```bash
git worktree add .worktrees/{plan-name} -b feature/{plan-name}
```
All execution happens inside the worktree. Main branch remains clean throughout.
---
## Step 2: TDD Scaffolding
*Only for tasks marked as TDD in the plan.*
For each TDD task, before any implementation:
1. Write the failing test(s) that define the acceptance criteria
2. Run tests to confirm they fail (red)
3. Commit the failing tests
4. Record the test file path in the task so the implementation agent can find it
Do not write implementation code in this step.
---
## Step 3: Level-Based Parallel Execution
Parse the task list from the plan. Group tasks by layer (Layer 1 = foundation, Layer 2 = depends on Layer 1, etc.).
**For each layer:**
1. Identify all tasks in the layer
2. Spawn one agent per task in parallel (Task tool, run_in_background: true)
3. Each agent receives: its task description, files to modify, acceptance criteria, and relevant ADRs
4. Monitor all agents via a TaskOutput polling loop
5. Each agent stages only its own task's files and commits on completion: `git add {task files} && git commit -m "feat: {task-description}"`
6. Wait for all tasks in the layer to complete before starting the next layer
**Drift detection**: after each layer, diff the actual changes against the plan spec. If implementation deviates significantly from the plan (new files not in plan, plan files not touched), flag and ask how to proceed. Do not silently continue on drift.
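A file-level drift check can be sketched by comparing the files a layer actually changed (for example, the output of `git diff --name-only` over that layer's commits) with the files the plan listed. The score convention here is an assumption: 0 means the changed set exactly matches the plan and 1 means no overlap (one minus the Jaccard similarity). `detectDrift` is a hypothetical name.

```typescript
interface DriftReport {
  unplannedFiles: string[]; // changed, but not listed in the plan
  untouchedFiles: string[]; // listed in the plan, but not changed
  score: number; // 0 = matches plan, 1 = no overlap (assumed convention)
}

function detectDrift(planned: readonly string[], changed: readonly string[]): DriftReport {
  const plannedSet = new Set(planned);
  const changedSet = new Set(changed);
  const unplannedFiles = Array.from(changedSet).filter((f) => !plannedSet.has(f)).sort();
  const untouchedFiles = Array.from(plannedSet).filter((f) => !changedSet.has(f)).sort();
  const union = new Set([...planned, ...changed]).size;
  const overlap = union - unplannedFiles.length - untouchedFiles.length;
  const score = union === 0 ? 0 : 1 - overlap / union;
  return { unplannedFiles, untouchedFiles, score };
}
```

Any non-empty `unplannedFiles` or `untouchedFiles` list is the "flag and ask" trigger; the score is only useful for metrics.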
**Agent instructions for each task:**
```
You are implementing one task from a validated plan.
Task: {description}
Files to modify: {file list}
Acceptance criteria: {criteria}
Relevant ADRs: {adr list}
First principles:
- Build state-of-the-art. No workarounds, no legacy patterns.
- Fix at the correct architectural level, never with component-level hacks.
- If you discover that the plan is wrong or missing context, stop and report — do not improvise architecture.
Commit your changes when complete with message: "feat: {task-description}"
```
---
## Step 4: Quality Gate
Run in parallel:
- Linter
- Type checker (if applicable)
- Full test suite
If all pass: proceed to smoke test.
If any fail: spawn a `quality-fixer` debug agent with the failure output. It gets up to **3 auto-fix attempts**. After each attempt, re-run the quality gate. If still failing after 3 attempts: stop, report the failure with the full error output, and wait for human intervention.
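The bounded fix loop can be expressed independently of the actual tools. In this sketch, `runGate` and `fix` are hypothetical hooks standing in for the lint/type-check/test commands and the `quality-fixer` agent; the gate is re-run after every fix attempt, and the loop stops after three attempts.

```typescript
interface GateResult {
  passed: boolean;
  output: string; // combined lint / type-check / test output
}

type GateOutcome =
  | { status: "passed"; fixAttempts: number }
  | { status: "needs-human"; fixAttempts: number; lastOutput: string };

async function qualityGate(
  runGate: () => Promise<GateResult>,
  fix: (failureOutput: string) => Promise<void>,
  maxFixAttempts = 3,
): Promise<GateOutcome> {
  let result = await runGate();
  let attempts = 0;
  // Each attempt hands the latest failure output to the fixer, then re-runs.
  while (!result.passed && attempts < maxFixAttempts) {
    attempts += 1;
    await fix(result.output);
    result = await runGate();
  }
  return result.passed
    ? { status: "passed", fixAttempts: attempts }
    : { status: "needs-human", fixAttempts: attempts, lastOutput: result.output };
}
```

Returning the last failure output in the `needs-human` case is what lets the report include "the full error output" without re-running anything.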
**Integration smoke test** *(skip for pure frontend or docs-only plans)*:
Run the smoke commands defined in the plan's `## Integration Verification` section. Additionally:
- If GraphQL: run an introspection probe to verify schema is accessible
- If Docker services: scan container logs for ERROR-level entries
- If new API routes: verify each returns expected status codes
Smoke test failures are debugged by a `quality-fixer-smoke` agent with the same 3-attempt limit.
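The "verify each route returns expected status codes" step might look like the sketch below. It takes a minimal fetch-like function so it can run against the global `fetch` (Node 18+) or a fake in tests; `RouteCheck`, `FetchLike`, and `smokeTestRoutes` are hypothetical names, and network errors are reported as failures rather than thrown.

```typescript
interface RouteCheck {
  method: "GET" | "POST";
  url: string;
  expectedStatus: number;
  body?: string; // JSON payload for POST checks
}

type FetchLike = (
  url: string,
  init: { method: string; headers?: Record<string, string>; body?: string },
) => Promise<{ status: number }>;

// Probe each route in order and return one readable line per failure.
async function smokeTestRoutes(checks: readonly RouteCheck[], fetchFn: FetchLike): Promise<string[]> {
  const failures: string[] = [];
  for (const c of checks) {
    try {
      const res = await fetchFn(c.url, {
        method: c.method,
        headers: c.body ? { "Content-Type": "application/json" } : undefined,
        body: c.body,
      });
      if (res.status !== c.expectedStatus) {
        failures.push(`${c.method} ${c.url}: expected ${c.expectedStatus}, got ${res.status}`);
      }
    } catch (err) {
      failures.push(`${c.method} ${c.url}: request failed (${String(err)})`);
    }
  }
  return failures;
}

// Usage against a running service (Node 18+ global fetch):
// const failures = await smokeTestRoutes(checks, fetch);
```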
---
## Step 5: Pre-PR Documentation
*In the worktree, before creating the PR.*
**PRD Reconciliation**: compare the implemented behavior against the original PRD. Note any deviations or additions discovered during implementation. Update the PRD with actuals. These updates ship in the same PR as the feature.
**Plan Archival**: move `docs/plans/plan-{name}.md` to `docs/plans/completed/plan-{name}.md`. Update the status header.
Commit documentation updates: `docs: reconcile PRD and archive plan for {feature-name}`.
---
## Step 6: Push and PR
Push the worktree branch, write the PR body to `.pr-body.md` using the template below, and create the PR:
```bash
git push origin feature/{plan-name}
gh pr create \
  --title "{feature-name}: {one-line summary from plan}" \
  --body-file .pr-body.md
```
PR body template:
```markdown
## Summary
{plan summary paragraph}
## Changes
{auto-generated from task list: bullet per task with files affected}
## ADRs
{list of ADRs created during this plan}
## Test Plan
{from plan test plan section}
## Smoke Test Results
{output from integration verification}
```
Merge using squash:
```bash
gh pr merge --squash --delete-branch
```
---
## Step 7: Post-Merge Metrics
Switch back to develop/main. Update `docs/plans/metrics/{name}.json` with execution data:
- Task count and per-layer breakdown
- TDD task count
- Diff stats (files changed, lines added/removed)
- Quality gate results (pass/fail, fix attempts)
- Smoke test results
- Drift score (0-1, how closely implementation matched plan)
- PR data (number, merge commit, timestamp)
Commit metrics update.
---
## Step 8: Worktree Cleanup
```bash
git worktree remove .worktrees/{plan-name}
```
---
## Usage
```
/plan-execute
```
Picks up the most recent validated plan. Or specify:
```
/plan-execute plan-user-authentication
```
## Output
```
Setting up worktree: .worktrees/user-authentication
Branch: feature/user-authentication
TDD scaffolding: 2 tasks marked TDD
✓ Wrote failing tests for: auth-token-validation
✓ Wrote failing tests for: refresh-token-rotation
Committed: "test: failing tests for auth pipeline (TDD)"
Executing Layer 1 (3 tasks, parallel)...
[agent-1] Implementing: JWT token generation service
[agent-2] Implementing: User session model
[agent-3] Implementing: Auth middleware
✓ Layer 1 complete. 3 commits.
Drift check: Layer 1... ✓ No drift detected.
Executing Layer 2 (2 tasks, parallel)...
[agent-4] Implementing: Login endpoint
[agent-5] Implementing: Refresh endpoint
✓ Layer 2 complete. 2 commits.
Quality gate...
✓ Lint passed
✓ Type check passed
✓ Tests: 47 passed, 0 failed
Smoke test...
✓ GraphQL introspection: OK
✓ POST /api/auth/login: 200
✓ POST /api/auth/refresh: 200
Pre-PR docs...
✓ PRD reconciled (1 minor deviation noted)
✓ Plan archived to docs/plans/completed/
PR created: #142 "user-authentication: JWT auth with refresh token rotation"
PR merged (squash). Branch deleted.
Metrics committed. Worktree cleaned.
✅ Feature complete.
```
## When to Use
After `/plan-validate` confirms all issues are resolved. Never skip validation — executing an unvalidated plan skips the independent review that catches ~18 issues on average.
## See Also
- [Plan-Validate-Execute Pipeline](../../guide/workflows/plan-pipeline.md)
- [Git Worktree Command](./git-worktree.md)
- [TDD with Claude](../../guide/workflows/tdd-with-claude.md)

examples/commands/plan-start.md

@ -0,0 +1,180 @@
---
name: plan-start
description: "5-phase planning command: PRD analysis, design review, technical decisions, dynamic research team, metrics. Produces a complete implementation plan + ADRs before any code is written."
---
# Plan Start — 5-Phase Planning
Analyze the request and produce a complete implementation plan through structured phases. No code is written. Every significant decision is recorded. Run `/clear` after this command before running `/plan-validate`.
---
## Phase 1: PRD & Design Analysis
### Step 1.1 — PRD Analysis
*Skip if no PRD exists (refactor, infra change, bug fix).*
Read all PRD files and `docs/INFORMATION_ARCHITECTURE.md` if present. Scan the codebase to understand current implementation status.
Surface findings in 3 buckets:
**Missing requirements** — acceptance criteria that are absent or incomplete
**Ambiguous requirements** — items with multiple valid interpretations
**Compliance concerns** — security, data privacy, API contract implications
For each finding: present options with concrete pros/cons. Discuss with user. Record every decision in the plan file under a `## Decisions` section before moving on. Do not proceed past unresolved ambiguities.
### Step 1.2 — Design Analysis
*Skip if no UI changes are in scope.*
Read: `DESIGN_SYSTEM.md`, existing UX ADRs, CLAUDE.md UX rules.
Produce specs for:
- **Screen inventory**: new/modified screens, route placement, component reuse audit
- **State catalog**: empty, loading, populated, error, and partial states for every interactive element
- **Interaction specs**: user flows (happy path + alternates), focus/keyboard behavior
- **Animation specs**: map each interaction to existing keyframes or specify new ones, include `prefers-reduced-motion` fallbacks
- **Responsive behavior**: breakpoints, web/mobile divergence decisions
- **Accessibility**: WAI-ARIA pattern selection, live regions, error visibility
Create Design ADRs for significant UX decisions (choice of interaction pattern, new animation convention, platform divergence). Record minor layout choices directly in the plan file.
---
## Phase 2: Technical Analysis
Spawn 1-2 Explore agents for targeted codebase research. Run them in the background via the Task tool.
While agents run, check:
- Existing ADRs in `docs/adr/` — if 3+ ADRs confirm a decision → auto-resolve without asking
- PATTERNS.md — apply confirmed patterns directly
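The "3+ ADRs" auto-resolve rule can be sketched as a simple count. Two assumptions here: each ADR is tagged with the decision keys it supports (a hypothetical `confirms` field), and only ADRs with status `accepted` count, since proposed, deprecated, or superseded records should not settle a question.

```typescript
interface Adr {
  id: string; // e.g. "ADR-0003"
  status: "proposed" | "accepted" | "deprecated" | "superseded";
  confirms: string[]; // decision keys this ADR supports (hypothetical tagging)
}

const AUTO_RESOLVE_THRESHOLD = 3;

// Return the accepted ADRs backing `decision`, and whether there are enough
// of them to auto-resolve without asking the user.
function adrSupport(decision: string, adrs: readonly Adr[]): { autoResolve: boolean; evidence: string[] } {
  const evidence = adrs
    .filter((a) => a.status === "accepted" && a.confirms.includes(decision))
    .map((a) => a.id);
  return { autoResolve: evidence.length >= AUTO_RESOLVE_THRESHOLD, evidence };
}
```

Returning the evidence list, not just a boolean, gives the lead the ADR IDs to cite when it skips the question.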
When agents return: present architecture decisions with 2-3 options each, concrete pros/cons, and a recommendation. Ask for user input on each unresolved decision.
For each significant decision:
1. Create `docs/adr/ADR-XXXX.md` using standard Nygard format (Context / Decision / Status / Consequences)
2. Update `docs/adr/PATTERNS.md` with the new observation
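Picking the next `ADR-XXXX` number is easy to get wrong when files also carry title slugs. A sketch assuming four-digit, zero-padded numbering as in the path above, with an optional `-slug` suffix; other files in `docs/adr/` (such as `PATTERNS.md`) are ignored. `nextAdrFileName` is a hypothetical name.

```typescript
// Given the file names in docs/adr/, return the next "ADR-XXXX.md" name.
function nextAdrFileName(existing: readonly string[]): string {
  let max = 0;
  for (const name of existing) {
    const m = /^ADR-(\d{4})(?:-.+)?\.md$/.exec(name);
    if (m) max = Math.max(max, Number(m[1]));
  }
  return `ADR-${String(max + 1).padStart(4, "0")}.md`;
}
```

Taking the maximum rather than counting files keeps numbering correct when an ADR has been deleted or numbers have gaps.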
---
## Phase 3: Scope Assessment
Apply trigger rules to determine which research agents are needed. Present the proposed team with justification for each inclusion.
**Research agent pool:**
| Agent | Trigger | Model |
|-------|---------|-------|
| `code-explorer` | Always | Sonnet |
| `arch-researcher` | Changes touch 2+ architectural layers | Sonnet |
| `database-analyst` | Any DB schema change | Sonnet |
| `security-analyst` | Auth, payments, PII, RBAC, rate limiting | Opus |
| `test-analyzer` | Non-trivial feature (not just a bug fix) | Sonnet |
| `cross-platform-specialist` | Web + mobile parity required | Sonnet |
| `native-app-specialist` | Tasks touch mobile/native UI package | Sonnet |
| `design-system-researcher` | UI changes in scope | Sonnet |
| `dependency-researcher` | New packages being added | Sonnet |
| `devops-specialist` | Docker, env vars, CI/CD changes | Sonnet |
| `integration-researcher` | New services, libraries, OTEL config | Opus |
| `planning-coordinator` | Always, when 2+ agents selected | Opus |
**Tier labels** (descriptive, not prescriptive):
- Tier 0 (0 agents): Solo — inline research, no spawning
- Tier 1 (1-3 agents): Focused
- Tier 2 (4-6 agents): Standard
- Tier 3 (7-9 agents): Comprehensive
- Tier 4 (10+ agents): Full Spectrum
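The trigger table and tier labels combine into a small selection function. In this sketch the `Scope` flags are hypothetical, one per trigger row. Two points the tables leave open are handled by assumption: the planning coordinator counts toward the tier, and Tier 0 only arises when the lead chooses inline research, since `code-explorer` always triggers.

```typescript
// Hypothetical scope flags, filled in from Phase 1-2 findings.
interface Scope {
  architecturalLayersTouched: number;
  schemaChange: boolean;
  authPaymentsPiiRbacOrRateLimiting: boolean;
  nonTrivialFeature: boolean;
  webMobileParity: boolean;
  touchesNativeUi: boolean;
  uiChanges: boolean;
  newPackages: boolean;
  dockerEnvOrCiChanges: boolean;
  newServicesLibrariesOrOtel: boolean;
}

// One rule per row of the research agent pool table.
function selectResearchAgents(s: Scope): string[] {
  const rules: Array<[string, boolean]> = [
    ["code-explorer", true],
    ["arch-researcher", s.architecturalLayersTouched >= 2],
    ["database-analyst", s.schemaChange],
    ["security-analyst", s.authPaymentsPiiRbacOrRateLimiting],
    ["test-analyzer", s.nonTrivialFeature],
    ["cross-platform-specialist", s.webMobileParity],
    ["native-app-specialist", s.touchesNativeUi],
    ["design-system-researcher", s.uiChanges],
    ["dependency-researcher", s.newPackages],
    ["devops-specialist", s.dockerEnvOrCiChanges],
    ["integration-researcher", s.newServicesLibrariesOrOtel],
  ];
  const agents = rules.filter(([, triggered]) => triggered).map(([name]) => name);
  if (agents.length >= 2) agents.push("planning-coordinator"); // synthesizer row
  return agents;
}

// Tier labels only describe team size; they do not change behavior.
function tierLabel(agentCount: number): string {
  if (agentCount === 0) return "Tier 0 - Solo";
  if (agentCount <= 3) return "Tier 1 - Focused";
  if (agentCount <= 6) return "Tier 2 - Standard";
  if (agentCount <= 9) return "Tier 3 - Comprehensive";
  return "Tier 4 - Full Spectrum";
}
```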
Tell the user: "I recommend a **[Tier N - Label]** team: [agent list with one-line justification each]. Want to add or remove any agents?"
Wait for approval before Phase 4.
---
## Phase 4: Research & Plan Creation
**Tier 0**: Conduct inline research. Write plan directly without spawning agents.
**Tier 1+**: Spawn approved agents in parallel using Task tool (run_in_background: true). For each agent, provide:
- Its specific research scope
- The relevant files/areas to investigate
- The questions it needs to answer
Monitor agents via a TaskOutput polling loop. Report progress: "3/6 agents complete..."
When all agents return: if `planning-coordinator` was spawned, send it all agent reports and have it synthesize the final plan. Otherwise, synthesize directly.
**Plan file structure** (`docs/plans/plan-{name}.md`):
```markdown
# Plan: {feature-name}
Created: {date} | Branch: {branch-name} | Tier: {N}
## Summary
One paragraph: what this implements and why.
## Decisions
Decisions recorded during Phase 1 (PRD analysis).
## Architecture
ADRs created, patterns applied, architectural choices made.
## Tasks
Ordered task list with layers (1 = foundation, 2 = depends on 1, etc.)
### Layer 1
- [ ] Task A — description, files affected, acceptance criteria
- [ ] Task B — description, files affected, acceptance criteria
### Layer 2
- [ ] Task C — depends on A — description, files affected, acceptance criteria
## Test Plan
How each task will be verified. TDD tasks marked explicitly.
## Integration Verification
Smoke test commands to run post-execution (if backend/services in scope).
## Out of Scope
What this plan explicitly does not address.
```
Commit: plan file + ADR files + agent report manifests.
---
## Phase 5: Finalize Metrics
Record timestamps, phase durations, agent counts, and cost estimates in `docs/plans/metrics/{name}.json`. Commit.
---
## Auto-Transition
If Phase 1 produced no unresolved ambiguities and Phase 2 produced no unresolved decisions: auto-start `/plan-validate` without asking.
If any human discussion occurred: ask "Ready to validate this plan?" before proceeding.
---
## Usage
```
/plan-start
```
Provide the feature description or point to a PRD file when prompted. The command handles the rest interactively.
## When to Use
Use for any non-trivial feature: anything touching more than 2 files, involving architecture decisions, or where a planning mistake would be expensive to undo.
For simple changes (typos, trivial refactors): use `/plan` mode instead.
## See Also
- [Plan-Validate-Execute Pipeline](../../guide/workflows/plan-pipeline.md)
- [Planning Coordinator Agent](../agents/planning-coordinator.md)
- [ADR Writer Agent](../agents/adr-writer.md)

examples/commands/plan-validate.md

@ -0,0 +1,187 @@
---
name: plan-validate
description: "2-layer plan validation: instant structural checks + trigger-based specialist agents. Auto-fixes issues using ADRs and first principles. Every issue must be resolved before execution."
---
# Plan Validate — 2-Layer Validation
Independently validate the plan produced by `/plan-start`. No code is written. Run `/clear` after this command before running `/plan-execute`.
Validation is separate from planning by design: validators that didn't write the plan are not anchored to its assumptions.
---
## Prerequisite
A committed plan file must exist at `docs/plans/plan-{name}.md`. If multiple plans exist, list them and ask the user which to validate.
---
## Layer 1: Structural Validation
Run immediately, no agents required. Check the plan document for:
**Format & Completeness**
- [ ] All required sections present (Summary, Decisions, Architecture, Tasks, Test Plan, Out of Scope)
- [ ] Each task has: description, files affected, acceptance criteria, layer assignment
**Dependency Chain**
- [ ] No circular dependencies between tasks
- [ ] Tasks in higher layers only depend on tasks in lower layers
- [ ] All stated dependencies exist in the plan
**File Existence**
- [ ] Every file listed for modification actually exists in the codebase (use Glob)
- [ ] New files are in appropriate directories per project conventions
**ADR Consistency**
- [ ] Plan decisions align with ADRs created during `/plan-start`
- [ ] No contradiction with existing ADRs in `docs/adr/`
**CLAUDE.md Compliance**
- [ ] Plan respects all hard rules in CLAUDE.md
- [ ] No first principles violations (no workarounds, no backward-compat shims)
**Test Coverage**
- [ ] Every new function/component has a corresponding test task
- [ ] TDD-marked tasks have failing test written before implementation task
Record all Layer 1 issues with severity (BLOCKER / WARNING / INFO) before proceeding to Layer 2.
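The three dependency-chain checks reduce to one pass over the tasks. If every dependency exists and sits in a strictly lower layer, layer numbers strictly decrease along every dependency edge, so a cycle is impossible; the "no circular dependencies" check comes for free. A sketch with hypothetical names; reporting these as BLOCKERs is an assumption.

```typescript
interface LayeredTask {
  name: string;
  layer: number;
  dependsOn: string[];
}

function dependencyIssues(tasks: readonly LayeredTask[]): string[] {
  const layerOf = new Map(tasks.map((t) => [t.name, t.layer] as const));
  const issues: string[] = [];
  for (const t of tasks) {
    for (const dep of t.dependsOn) {
      const depLayer = layerOf.get(dep);
      if (depLayer === undefined) {
        issues.push(`BLOCKER: ${t.name} depends on missing task "${dep}"`);
      } else if (depLayer >= t.layer) {
        // Same-or-higher-layer dependency: breaks layering and may hide a cycle.
        issues.push(`BLOCKER: ${t.name} (layer ${t.layer}) depends on ${dep} (layer ${depLayer})`);
      }
    }
  }
  return issues;
}
```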
---
## Layer 2: Specialist Review
Select agents by applying trigger rules to the plan content. No user input needed — triggers are objective.
**Validation agent pool:**
| Agent | Trigger | Model |
|-------|---------|-------|
| `security-reviewer` | Auth, payments, PII, RBAC, new public APIs | Opus |
| `db-migration-reviewer` | New tables, columns, indexes, or migration files | Opus |
| `performance-reviewer` | New queries, resolvers, routes, or added dependencies | Sonnet |
| `design-system-reviewer` | New UI components or visual styling changes | Sonnet |
| `ux-reviewer` | New pages, forms, modals, or interaction patterns | Sonnet |
| `cross-platform-reviewer` | Changes touching both web and mobile, or shared packages | Sonnet |
| `native-app-reviewer` | Mobile screens, native UI package changes | Sonnet |
| `integration-reviewer` | New external services, libraries, or OTEL config | Opus |
Spawn triggered agents in parallel (Task tool, run_in_background: true). Each agent receives: the plan file, relevant ADRs, and targeted questions based on its domain.
Monitor via a TaskOutput polling loop. Report progress to the user.
Each agent must return structured findings:
```
FINDING: [BLOCKER|WARNING|INFO]
Location: [plan section or file reference]
Issue: [concrete description]
Risk: [what breaks if this isn't addressed]
Suggestion: [specific fix or alternative]
```
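Because the lead merges findings from several agents, it helps to parse each block strictly and bounce malformed output back to the agent rather than drop it. A sketch that accepts both `FINDING: BLOCKER` and `FINDING: [BLOCKER]` and requires all four fields; `parseFinding` is a hypothetical name.

```typescript
type Severity = "BLOCKER" | "WARNING" | "INFO";

interface ParsedFinding {
  severity: Severity;
  location: string;
  issue: string;
  risk: string;
  suggestion: string;
}

const REQUIRED_FIELDS = ["Location", "Issue", "Risk", "Suggestion"] as const;

// Returns null for malformed output (bad header or a missing field).
function parseFinding(block: string): ParsedFinding | null {
  const lines = block.split("\n").map((l) => l.trim()).filter((l) => l.length > 0);
  const head = /^FINDING:\s*\[?(BLOCKER|WARNING|INFO)\]?$/.exec(lines[0] ?? "");
  if (!head) return null;
  const fields = new Map<string, string>();
  for (const line of lines.slice(1)) {
    // Split on the first colon only, so values like "src/a.ts:42" survive.
    const colon = line.indexOf(":");
    if (colon > 0) fields.set(line.slice(0, colon).trim(), line.slice(colon + 1).trim());
  }
  if (REQUIRED_FIELDS.some((f) => !fields.get(f))) return null;
  return {
    severity: head[1] as Severity,
    location: fields.get("Location")!,
    issue: fields.get("Issue")!,
    risk: fields.get("Risk")!,
    suggestion: fields.get("Suggestion")!,
  };
}
```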
---
## Auto-Fix Phase
Merge Layer 1 structural issues + Layer 2 specialist findings into a single issue list. Every issue must be resolved. No skipping.
**Triage each issue:**
**Bucket A — Auto-resolve:**
- Issue matches an existing ADR decision → cite ADR, mark resolved
- Issue matches a confirmed pattern in PATTERNS.md → cite pattern, mark resolved
- Issue resolvable from first principles in CLAUDE.md → apply rule, mark resolved
**Bucket B — Needs human input:**
- Novel architectural question not covered by existing decisions
- Conflicting ADRs with no clear precedent
- Blocker with no obvious resolution
For Bucket B items: present the issue, explain why it can't be auto-resolved, propose options, wait for decision. Record the decision in the plan's `## Decisions` section and create a new ADR if it's architecturally significant.
**Apply all fixes in one batch** once all issues are triaged. Update the plan file. Commit the updated plan.
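The Bucket A checks are ordered: ADRs first, then confirmed patterns, then first principles; anything left over goes to Bucket B. A sketch assuming each issue has been reduced to a normalized `topic` key and that the three knowledge sources are lookup maps (all hypothetical). It does not model conflicting ADRs, which would also need to route to Bucket B.

```typescript
type ResolutionSource = "adr" | "pattern" | "first-principles";

interface Issue {
  id: string;
  topic: string; // normalized key for what the issue is about (hypothetical)
}

interface Knowledge {
  adrs: Map<string, string>; // topic -> ADR id
  patterns: Map<string, string>; // topic -> PATTERNS.md entry
  firstPrinciples: Map<string, string>; // topic -> CLAUDE.md rule
}

type Triage =
  | { bucket: "A"; source: ResolutionSource; citation: string }
  | { bucket: "B" };

function triage(issue: Issue, k: Knowledge): Triage {
  const adr = k.adrs.get(issue.topic);
  if (adr) return { bucket: "A", source: "adr", citation: adr };
  const pattern = k.patterns.get(issue.topic);
  if (pattern) return { bucket: "A", source: "pattern", citation: pattern };
  const rule = k.firstPrinciples.get(issue.topic);
  if (rule) return { bucket: "A", source: "first-principles", citation: rule };
  return { bucket: "B" }; // not covered by any recorded decision: ask a human
}
```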
---
## Issue Persistence
Record every issue in `docs/plans/metrics/{name}.json` under `validation.issues`:
```json
{
  "id": "S-001",
  "layer": 1,
  "severity": "WARNING",
  "category": "test-coverage",
  "description": "No test task for the new webhook handler",
  "reporting_agent": "structural",
  "triage": "A",
  "resolution_source": "first-principles",
  "resolution": "Added test task in Layer 2 of the plan"
}
```
This data feeds `/plan-metrics` for pattern analysis over time.
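The record above maps onto a typed interface, which makes aggregation straightforward. The fields below are copied from that example; the aggregation itself (category frequency plus auto-resolve rate) is only one hypothetical view of what pattern analysis over time might compute, not a description of `/plan-metrics`.

```typescript
// Mirrors the issue record above; field names come from that example.
interface ValidationIssue {
  id: string;
  layer: 1 | 2;
  severity: "BLOCKER" | "WARNING" | "INFO";
  category: string;
  description: string;
  reporting_agent: string;
  triage: "A" | "B";
  resolution_source: string;
  resolution: string;
}

// How often each category recurs, and the share resolved without a human.
function summarizeIssues(issues: readonly ValidationIssue[]) {
  const byCategory: Record<string, number> = {};
  for (const i of issues) byCategory[i.category] = (byCategory[i.category] ?? 0) + 1;
  const autoResolved = issues.filter((i) => i.triage === "A").length;
  return {
    total: issues.length,
    byCategory,
    autoResolveRate: issues.length === 0 ? 0 : autoResolved / issues.length,
  };
}
```

A category that keeps recurring as Bucket B across plans is a candidate for a new ADR or a PATTERNS.md entry.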
---
## Auto-Transition
If all issues are auto-resolved (Bucket A only): auto-start `/plan-execute` without asking.
If any human input was required (Bucket B): ask "All issues resolved. Ready to execute?" before proceeding.
---
## Usage
```
/plan-validate
```
Picks up the most recently committed plan automatically. Or specify:
```
/plan-validate plan-user-authentication
```
## Output
```
Layer 1: Structural validation...
✓ Format complete
✓ Dependencies valid
⚠ WARNING S-001: Missing test task for webhook handler
✓ CLAUDE.md compliant
Layer 2: Triggering specialist agents...
→ security-reviewer (auth changes detected) [Opus]
→ db-migration-reviewer (new users table) [Opus]
→ performance-reviewer (new query in /api/users) [Sonnet]
Monitoring... 1/3 complete... 2/3 complete... done.
BLOCKER B-001 [security-reviewer]: JWT expiry not validated on refresh endpoint
WARNING B-002 [db-migration-reviewer]: Migration lacks rollback strategy
Auto-fix phase:
S-001 → auto-resolved (first principles: test coverage rule)
B-001 → NEEDS INPUT (no existing ADR for JWT refresh strategy)
B-002 → auto-resolved (ADR-0003: migration rollback pattern)
[User input requested for B-001]
Decision recorded. ADR-0011 created.
All 3 issues resolved. Plan updated.
Ready to execute?
```
## When to Use
Always — before any `/plan-execute` call. The cost of validation ($0.20-3.00) is negligible against the cost of discovering issues mid-execution.
## See Also
- [Plan-Validate-Execute Pipeline](../../guide/workflows/plan-pipeline.md)
- [Integration Reviewer Agent](../agents/integration-reviewer.md)
- [Plan Challenger Agent](../agents/plan-challenger.md)