From b6ce1ef72fcf89c55cefbdac630f164f5f5fa518 Mon Sep 17 00:00:00 2001 From: Florian BRUNIAUX Date: Fri, 13 Mar 2026 16:22:57 +0100 Subject: [PATCH] docs: add RPI workflow, changelog fragments, smart-suggest hook + LLM variance MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - guide/workflows/rpi.md (new): Research → Plan → Implement, 3-phase pattern with explicit GO gates, slash command templates, worked example - guide/workflows/changelog-fragments.md (new): per-PR YAML fragment enforcement, 3-layer system (CLAUDE.md rule + UserPromptSubmit hook + CI gate) - examples/hooks/bash/smart-suggest.sh (new): UserPromptSubmit behavioral coach, 3-tier priority (enforcement/discovery/contextual), ROI logging - guide/core/known-issues.md: LLM Day-to-Day Performance Variance section, 4 root causes (probabilistic inference, MoE routing, infra, context sensitivity) - guide/workflows/README.md: added RPI entry + quick selection row - machine-readable/reference.yaml: added entries for changelog_fragments, smart_suggest - CHANGELOG.md: [Unreleased] entries for all 4 new items - IDEAS.md: prompt-caching MCP plugin research note (testing in progress) Co-Authored-By: Claude Sonnet 4.6 --- CHANGELOG.md | 8 + IDEAS.md | 30 + examples/hooks/bash/smart-suggest.sh | 176 ++++++ guide/core/known-issues.md | 55 ++ guide/workflows/README.md | 9 + guide/workflows/changelog-fragments.md | 200 +++++++ guide/workflows/rpi.md | 766 +++++++++++++++++++++++++ machine-readable/reference.yaml | 7 + 8 files changed, 1251 insertions(+) create mode 100644 examples/hooks/bash/smart-suggest.sh create mode 100644 guide/workflows/changelog-fragments.md create mode 100644 guide/workflows/rpi.md diff --git a/CHANGELOG.md b/CHANGELOG.md index ac66808..be1855a 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -8,6 +8,14 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/). ### Added +- **`guide/workflows/rpi.md`** — New workflow guide (RPI: Research → Plan → Implement). 3-phase feature development pattern with explicit validation gates: Research produces `RESEARCH.md`, Plan produces `PLAN.md`, Implement produces working code. Each gate requires explicit GO before the next phase. Includes slash command templates (`/rpi:research`, `/rpi:plan`, `/rpi:implement`), a worked example (adding rate limiting to an Express API), and comparison matrix vs Plan-Driven, TDD, and Spec-First. Best for features where discovering a wrong assumption late is expensive. + +- **`guide/workflows/changelog-fragments.md`** — New workflow guide for the Changelog Fragments pattern: one YAML fragment per PR, written at implementation time, validated by CI, assembled automatically at release. Covers 3-layer enforcement: (1) CLAUDE.md workflow rule for autonomous fragment creation, (2) `UserPromptSubmit` hook with 3-tier priority (enforcement → discovery → contextual), (3) independent CI migration check job. Includes the `UserPromptSubmit` tier pattern as a reusable hook architecture for any mandatory workflow step. + +- **`examples/hooks/bash/smart-suggest.sh`** — New `UserPromptSubmit` hook implementing the 3-tier behavioral coach pattern: Tier 0 enforcement (changelog fragment required before PR, plan-before-code), Tier 1 discovery (test-loop, retex, dupes, monitoring, security, release), Tier 2 contextual (code review, debugging, architecture, session resume). Max 1 suggestion per prompt (first match wins), dedup guard, ROI logging to `~/.claude/logs/smart-suggest.jsonl`, silent exit on no match. + +- **`guide/core/known-issues.md`** — New section "LLM Day-to-Day Performance Variance": documents session-to-session output quality variance (shorter responses, conservative suggestions, edge-case refusals) as expected behavior, not a bug. Explains the 4 root causes (probabilistic inference, MoE routing variance, infrastructure variance, context sensitivity) and provides an observable signals table. Includes a practical checklist for ruling out controllable factors before concluding "the model degraded." + - **`cc-sessions discover` documentation** — New subsection "Session Pattern Discovery" in §2.x (Session Management) covering the `discover` subcommand: n-gram mode (local, free, ~3s for 12 projects) vs `--llm` mode (semantic analysis via `claude --print`). Includes example output, the 20% rule decision framework (CLAUDE.md rule / skill / command categorization), and install instructions. Cross-reference added after the 20% rule callout in §5.1. - **`examples/scripts/cc-sessions.py` synced** — Updated from 498-line stale copy to the full 1225-line version from `~/bin/cc-sessions`. Includes the complete `discover` subcommand (n-gram analysis + `--llm` mode), incremental discover cache, Jaccard deduplication, and all filtering logic. GitHub source header added. diff --git a/IDEAS.md b/IDEAS.md index 5a5168c..f2750ec 100644 --- a/IDEAS.md +++ b/IDEAS.md @@ -85,6 +85,36 @@ CLAUDE.md configuration examples by framework: ## Watching (Waiting for Demand) +a### prompt-caching MCP Plugin + +MCP plugin that automates `cache_control` placement for developers building apps on the Anthropic SDK. Installed locally at `/Users/florianbruniaux/Sites/prompt-caching` and connected to Claude Code via `~/.claude.json`. + +**Status:** Testing in progress. Real usage data required before any documentation decision. + +**What we know:** +- 29 stars, v1.3.0, solo maintainer — maintenance risk +- Author-reported benchmarks (80-92% savings) — unverified, cannot cite +- Fills a real gap: no other MCP tool does this; Spring AI / LiteLLM / Pydantic AI serve different audiences +- Blog post (Mathieu Grenier) independently documents the same pain point + 5 antipatterns — score 3/5, worth integrating in "Strategy 6" regardless + +**Open questions:** +- [ ] Do real sessions on this project actually hit the cache? (run `get_cache_stats` after 10+ turns) +- [ ] Is the plugin stable enough to recommend? Any errors, memory leaks, session issues? +- [ ] What are the real savings on a CLAUDE.md-heavy project like this guide? + +**If test results are positive (cache hits confirmed, no stability issues):** +- Add to `guide/ecosystem/third-party-tools.md` with verified stats (not README claims) +- Add to landing third-party tools section +- Score upgrade: 3/5 → 4/5 + +**If test results are inconclusive or plugin is unstable:** +- Move to Discarded Ideas +- Keep the Mathieu Grenier blog post integration (independent value) + +**Check again:** After 1 week of real usage + +--- + ### Multi-LLM Consultation Patterns Using external LLMs (Gemini, GPT-4) as "second opinion" from Claude Code. diff --git a/examples/hooks/bash/smart-suggest.sh b/examples/hooks/bash/smart-suggest.sh new file mode 100644 index 0000000..0e9db79 --- /dev/null +++ b/examples/hooks/bash/smart-suggest.sh @@ -0,0 +1,176 @@ +#!/bin/bash +# .claude/hooks/smart-suggest.sh +# Event: UserPromptSubmit +# Non-blocking behavioral coach: suggests the right command or agent based on detected intent +# +# Architecture: 3-tier priority system +# Tier 0 — Enforcement: mandatory workflow gates (runs first, exits on match) +# Tier 1 — Discovery: high-value tools the developer may not know exist +# Tier 2 — Contextual: opportunistic suggestions for common patterns +# +# Properties: +# - Max 1 suggestion per prompt (first match wins) +# - Dedup guard: never suggest a command already in the prompt +# - ROI logging: records suggestions to ~/.claude/logs/smart-suggest.jsonl +# - Silent exit on no match (non-blocking, never fails) +# +# Customization: replace patterns with your own commands/agents. +# Keep Tier 0 small and precise — it intercepts before everything else. + +set -euo pipefail + +INPUT=$(cat) +PROMPT=$(echo "$INPUT" | jq -r '.prompt // empty' 2>/dev/null || true) + +# Skip prompts that are too short to match anything meaningful +if [[ -z "$PROMPT" || ${#PROMPT} -lt 8 ]]; then + exit 0 +fi + +PROMPT_LC=$(echo "$PROMPT" | tr '[:upper:]' '[:lower:]') + +# Skip slash commands — user already knows what they want +if [[ "$PROMPT_LC" =~ ^/ ]]; then + exit 0 +fi + +LABEL_CMD="Command" +LABEL_AGENT="Agent" + +# ─── suggest() ──────────────────────────────────────────────────────────────── +# Emits one suggestion then exits (guarantees max 1 suggestion per prompt). +# Dedup: if the command/agent name is already in the prompt, exits silently. +suggest() { + local label="$1" name="$2" reason="$3" + + # Strip leading slash, take first token — prevents matching "pr" in "improve" + # e.g. "/changelog:add" → "changelog:add", "code-reviewer" → "code-reviewer" + local check + check=$(echo "$name" | tr '[:upper:]' '[:lower:]' | sed 's|^/||' | awk '{print $1}') + if echo "$PROMPT_LC" | grep -qF "$check"; then + exit 0 + fi + + # Append to JSONL log for ROI measurement (non-blocking — failure is safe) + local logdir="${HOME}/.claude/logs" + mkdir -p "$logdir" 2>/dev/null || true + printf '{"ts":"%s","suggested":"%s","prompt_len":%d}\n' \ + "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$name" "${#PROMPT}" \ + >> "$logdir/smart-suggest.jsonl" 2>/dev/null || true + + cat << EOF +{ + "hookSpecificOutput": { + "hookEventName": "UserPromptSubmit", + "additionalContext": "[Suggestion] $label: $name -- $reason" + } +} +EOF + exit 0 +} + +# ============================================================================== +# TIER 0 — Enforcement +# Intercepts workflow violations before any other suggestion. +# Keep patterns tight: a false positive here is more disruptive than a miss. +# ============================================================================== + +# Changelog fragment enforcement +# If the developer signals intent to create a PR without mentioning a changelog +# fragment, redirect to the fragment creation step first. +# When fragment/skip-changelog is already mentioned → suggest the PR command normally. +if echo "$PROMPT_LC" | grep -qE '(create.*pr|open.*pr|make.*pr|pull.?request|push.*pr)'; then + if ! echo "$PROMPT_LC" | grep -qE '(changelog|fragment|skip-changelog)'; then + suggest "$LABEL_CMD" "pnpm changelog:add" \ + "REQUIRED before merge — creates changelog/fragments/{PR}-{slug}.yml (or add label 'skip-changelog' for tooling PRs)" + else + suggest "$LABEL_CMD" "/pr" \ + "PR creation with structured description and auto-detected scope" + fi +fi + +# Plan-before-code enforcement +# Intercepts implementation intent when no planning context is present. +# Negative lookahead excludes: plan, test, fix, review, explain, refactor. +if echo "$PROMPT_LC" | grep -qE '(implement|add (the|a|an)|create (the|a|an) (service|router|endpoint|feature|component)|write (the )?(code|logic|service))'; then + if ! echo "$PROMPT_LC" | grep -qE '(plan|test|fix|bug|review|refactor|explain|why|how)'; then + suggest "$LABEL_CMD" "/plan" \ + "Run /plan BEFORE coding — Research → Spec → Implement reduces rework" + fi +fi + +# ============================================================================== +# TIER 1 — Discovery +# High-value tools the developer may not know about. +# Use multi-word patterns to reduce false positives. +# ============================================================================== + +# Failing tests — suggest auto-fix loop before generic debug +if echo "$PROMPT_LC" | grep -qE '(tests? (fail|broken|red)|fix (the |these )?tests?|tests? (are )?failing)'; then + if ! echo "$PROMPT_LC" | grep -qE '(api|server|endpoint|500|404)'; then + suggest "$LABEL_CMD" "/test-loop" \ + "Auto-fix loop: 1 test per cycle with convergence stats" + fi +fi + +# Lesson learned / rollback scenario +if echo "$PROMPT_LC" | grep -qE '(wrong approach|bad approach|should have|dead end|root cause was|lesson learned|introduced.*bug|regression)'; then + suggest "$LABEL_CMD" "/retex" \ + "Capture this lesson in persistent memory (searchable across sessions)" +fi + +# Code duplication detected +if echo "$PROMPT_LC" | grep -qE '(duplicat|copy.?past|same code|repeated (code|logic)|identical (code|function))'; then + suggest "$LABEL_CMD" "/dupes" \ + "Detect copy-paste code with jscpd before the refactor" +fi + +# Monitoring / polling intent — suggest background loop +if echo "$PROMPT_LC" | grep -qE '(check (every|each|all)|wait (until|for)|poll|monitor|watch.*for|when (it.?s |it is )?done|once (the )?deploy|once (the )?ci)'; then + suggest "$LABEL_CMD" "/loop [interval] [prompt]" \ + "Run in background while you work: /loop 5m check the deploy status" +fi + +# Security-sensitive keywords +if echo "$PROMPT_LC" | grep -qE '(sql injection|xss|owasp|vulnerability|security audit|auth bypass)'; then + suggest "$LABEL_AGENT" "security-auditor" \ + "Read-only OWASP audit — detects injection, auth, and data exposure issues" +fi + +# Release assembly — suggest structured release command +if echo "$PROMPT_LC" | grep -qE '(cut.*release|prepare.*release|new (version|release)|release.*[0-9]+\.[0-9]|tag.*version|deploy.*main|merge.*main)'; then + suggest "$LABEL_CMD" "/release" \ + "Assembles fragments → CHANGELOG section, bumps version, creates release branch" +fi + +# ============================================================================== +# TIER 2 — Contextual +# General patterns, lower priority, regex kept tight to avoid noise. +# ============================================================================== + +# Code review before merge +if echo "$PROMPT_LC" | grep -qE '(review (the |this )?(code|pr|file)|code review|before (merge|merging))'; then + suggest "$LABEL_AGENT" "code-reviewer" \ + "Quality + security + performance review" +fi + +# Debugging — generic +if echo "$PROMPT_LC" | grep -qE '(debug (this|the)|fix (this |the )?(bug|crash|error)|stack trace|not (working|loading))'; then + suggest "$LABEL_AGENT" "debugger" \ + "Systematic root cause analysis" +fi + +# Architecture decision +if echo "$PROMPT_LC" | grep -qE '(architecture (decision|review)|system design|big refactor|large.*refactor)'; then + suggest "$LABEL_AGENT" "architect-review" \ + "Validates architectural decisions across scalability, coupling, and tradeoffs" +fi + +# Resume previous session +if echo "$PROMPT_LC" | grep -qE '(resume (session|work)|pick up (where|from)|continue (from )?(yesterday|last session))'; then + suggest "$LABEL_CMD" "/resume" \ + "Restores session context from persistent memory" +fi + +# No match — silent exit (never blocks the user) +exit 0 diff --git a/guide/core/known-issues.md b/guide/core/known-issues.md index 144d8f2..69cd1f8 100644 --- a/guide/core/known-issues.md +++ b/guide/core/known-issues.md @@ -234,6 +234,61 @@ Key quote: --- +## 🔄 LLM Day-to-Day Performance Variance + +**Type**: Expected behavior (not a bug) +**Severity**: 🟡 **LOW - AWARENESS** +**Status**: Inherent to LLM inference, not specific to any version + +### What This Is + +Claude's output quality can vary noticeably from session to session, even with identical prompts and a clean context window. This is distinct from context window degradation (which happens within a session as context fills up). This is about variance between fresh sessions. + +Users sometimes report shorter responses, more conservative suggestions, or unexpected refusals on tasks that worked fine the day before. This can feel like a model downgrade, but it is not. + +### Root Causes + +**Probabilistic inference**: Temperature above 0 means every inference run is non-deterministic. Two runs of the same prompt will produce different token sequences. This is fundamental to how language models work. + +**MoE routing variance**: Claude uses a Mixture of Experts architecture. On each forward pass, a routing mechanism selects which expert weights to activate. Different runs activate different combinations, producing different outputs even for semantically identical inputs. + +**Infrastructure variance**: In production, requests hit different servers with different load levels, hardware generations, and thermal states. These factors influence numerical precision in floating-point arithmetic during inference, creating subtle but real output differences. + +**Context sensitivity**: Even with `/clear`, tiny differences between sessions accumulate. The system prompt, tool list, and session initialization all slightly affect the model's first outputs. + +### Observable Signals + +| Signal | What You See | What It Means | +|--------|-------------|---------------| +| Response length | Shorter, less detailed than usual | Routing hit a more conservative path | +| Refusals | Edge cases that normally work get refused | Different safety calibration on this run | +| Code style | More verbose or more minimal than expected | Expert mix activated differently | +| Creativity | More conservative, less inventive suggestions | Not a capability loss, a sampling outcome | +| Verbosity | More caveats and disclaimers than usual | Normal variance in token probabilities | + +### What This Is NOT + +- **Not a model downgrade**: Anthropic versions models deliberately and documents changes. Day-to-day variance happens within the same model version. +- **Not a bug to report**: This behavior is expected and documented in LLM literature. It is inherent to probabilistic inference. +- **Not permanent**: The next session will likely behave differently. A "bad" run does not indicate a lasting change. +- **Not context window degradation**: That is a within-session phenomenon caused by token accumulation. This is between-session variance on fresh starts. + +> The Aug-Sep 2025 incident ([see Resolved Issues above](#model-quality-degradation-aug-sep-2025)) was the exception: Anthropic confirmed actual infrastructure bugs causing systematic degradation. True systematic degradation is rare and Anthropic investigates it. Normal session-to-session variance is something else. + +### Mitigation Strategies + +**Constrain the prompt**: More specific prompts reduce the output space and make variance less noticeable. "Write a function that does X, Y, Z, returns type T, handles edge case E" produces more consistent outputs than "write me something to handle X." + +**Fresh context before important work**: Run `/clear` before a high-stakes task. Accumulated session noise from earlier exploratory work can skew subsequent outputs even within the same session. + +**Reformulate and retry**: If an output seems off compared to your expectations, try the same request with different framing. A second formulation often routes through different expert paths and produces a better result. + +**Compare against a known-good prompt**: If you have a prompt from a previous session that produced excellent output, use it as a reference. If today's version of that prompt produces visibly worse output consistently, that warrants closer investigation (and potentially a GitHub issue if reproducible). + +**Calibrate expectations by task type**: Deterministic tasks (regex, simple transforms, well-defined algorithms) show less variance than creative or judgment-heavy tasks. Use Claude Code for the former with high reliability; for the latter, build review steps into your workflow. + +--- + ## 📊 Issue Statistics (as of Jan 28, 2026) | Metric | Count | Source | diff --git a/guide/workflows/README.md b/guide/workflows/README.md index ed2a9f7..b3ac3be 100644 --- a/guide/workflows/README.md +++ b/guide/workflows/README.md @@ -83,6 +83,14 @@ Eliminates merge conflicts on `CHANGELOG.md`, captures context at implementation - Conditional suggestion pattern: "if PR-intent without fragment-mention" - CI enforcement with independent migration check job +### [RPI: Research → Plan → Implement](./rpi.md) ⭐ NEW + +**3-phase feature development with explicit validation gates between phases** + +Build features in three locked phases: Research feasibility first, plan the implementation second, write code third. Each phase produces a concrete artifact (RESEARCH.md → PLAN.md → code). Each gate requires an explicit GO before the next phase starts. + +**When to use**: Features with unclear feasibility, more than a day of work, unknown technical territory, or anywhere discovering a wrong assumption late is costly + ### [Cognitive Mode Switching](./gstack-workflow.md) ⭐ NEW Switch between specialist roles across your ship cycle: strategic product gate, architecture review, paranoid code review, automated release, native browser QA, and retrospective. @@ -190,6 +198,7 @@ Multi-session task tracking with TodoWrite, tasks API, and context persistence a | **New project from template** | [Skeleton Projects](./skeleton-projects.md) | | **Team AI instructions** | [Team AI Instructions](./team-ai-instructions.md) | | **Enforce mandatory workflow steps** | [Changelog Fragments](./changelog-fragments.md) | +| **Unknown feasibility, multi-day feature** | [RPI: Research → Plan → Implement](./rpi.md) | | **Documentation** | [PDF Generation](./pdf-generation.md) | | **Social previews** | [OG Image Generation](./og-image-generation.md) | | **Conference talk from raw material** | [Talk Preparation Pipeline](./talk-pipeline.md) | diff --git a/guide/workflows/changelog-fragments.md b/guide/workflows/changelog-fragments.md new file mode 100644 index 0000000..f830c3b --- /dev/null +++ b/guide/workflows/changelog-fragments.md @@ -0,0 +1,200 @@ +# Changelog Fragments: Enforced Per-PR Documentation + +A 3-layer enforcement pattern that ensures every PR is documented at write time, never at release time. + +--- + +## The Problem + +Single `CHANGELOG.md` files break on active teams. Three open feature branches all touching the same file means merge conflicts on every merge. Someone resolves the conflict, drops a line, and release notes are wrong before they're published. + +The deeper problem is timing. "Document at release time" sounds reasonable until you're staring at PR #840 three weeks after it merged, trying to reconstruct what changed for users. The commit says `fix session handling`. The developer is in a different timezone. Context is gone. + +Enforcement without CI gates means changelogs become a nag job. Someone has to chase people before every release, under time pressure, filling in blanks from git log. + +The solution: one YAML fragment per PR, written while implementing, validated by CI, assembled automatically at release. + +--- + +## The 3-Layer Architecture + +The system works because enforcement happens at three independent levels. Each layer catches a different failure mode. + +### Layer 1: CLAUDE.md Workflow Rule + +The first layer is a rule loaded into Claude Code's context at every session. It encodes the entire fragment workflow so Claude can complete it autonomously when asked to create a PR. + +```markdown +# git-workflow.md (loaded via CLAUDE.md) + +## Changelog Fragment — Required Before Every PR + +Before creating a PR, always generate a changelog fragment. + +### Steps + +1. **Infer from git diff** — analyze `git diff main...HEAD` to determine: + - `type`: feat | fix | perf | refactor | security | docs | chore + - `scope`: the functional area affected (auth, sessions, api, etc.) + - `title`: one-line user-facing summary (< 80 chars) + +2. **Create the fragment** — run `pnpm changelog:add` or write directly: + ``` + changelog/fragments/{PR_NUMBER}-{slug}.yml + ``` + +3. **Validate** — run `pnpm changelog:validate changelog/fragments/{file}.yml` + +4. **Commit alongside the PR** — include the fragment in the same branch + +### Fragment schema + +```yaml +pr: 886 # must match filename prefix +type: fix # feat|fix|perf|refactor|security|docs|chore +scope: "visiochat" +title: "Fix empty chat after SSE race condition" # < 80 chars +description: | # optional — explain user impact, not implementation + SSE workplan fires before AI stream completes, causing ChatWrapper + to mount with 0 messages. +breaking: false +migration: false # set true if PR adds a DB migration +``` + +### Bypass + +Add label `skip-changelog` for PRs with no user impact (CI config, deps updates, release commits). +``` + +This rule makes Claude Code a participant in enforcement, not just a coding tool. When a developer says "make the PR," Claude infers the fragment content from the diff and creates it before opening the PR. + +### Layer 2: UserPromptSubmit Hook (Behavioral Detection) + +The second layer intercepts intent before it becomes action. When the developer types something that signals PR creation intent, the hook checks whether a changelog fragment was mentioned. + +```bash +# .claude/hooks/smart-suggest.sh (excerpt — Tier 0 enforcement) + +# PR creation intent detected +if echo "$PROMPT_LC" | grep -qE '(create.*pr|open.*pr|make.*pr|pull.?request|push.*pr)'; then + # Fragment not mentioned → redirect to creation step first + if ! echo "$PROMPT_LC" | grep -qE '(changelog|fragment|skip-changelog)'; then + suggest "pnpm changelog:add" \ + "REQUIRED before merge — creates changelog/fragments/{PR}-{slug}.yml" + else + # Already mentioned → suggest the PR command normally + suggest "/pr" "PR creation with structured description" + fi +fi +``` + +The hook is `UserPromptSubmit`: non-blocking, max one suggestion per prompt, silent on no match. It runs before Claude Code processes the prompt, so the developer sees the reminder inline before Claude starts doing anything. + +The conditional logic (`if X without Y`) is the key pattern here. It's not a blanket blocker — it adapts to context. If the developer already mentioned the fragment, they get the normal suggestion. If they didn't, they get the enforcement reminder. + +**Full hook with 3-tier architecture**: [`examples/hooks/bash/smart-suggest.sh`](../../examples/hooks/bash/smart-suggest.sh) + +### Layer 3: CI Enforcement (GitHub Actions) + +The third layer is the hard gate. Two independent jobs run on every PR targeting the main branch. + +**`check-fragment` job**: checks bypass labels first (closed list), then requires `changelog/fragments/{PR_NUMBER}-*.yml` to exist and pass structural validation. + +```yaml +- name: Check fragment exists and is valid + env: + PR_NUMBER: ${{ github.event.pull_request.number }} + PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }} + run: | + SKIP_LABELS=("skip-changelog" "dependencies" "release" "chore: deps") + for LABEL in "${SKIP_LABELS[@]}"; do + if echo "$PR_LABELS" | grep -q "\"$LABEL\""; then + echo "Bypass label detected — fragment not required" + exit 0 + fi + done + + FRAGMENT=$(ls "changelog/fragments/${PR_NUMBER}-"*.yml 2>/dev/null | head -1) + if [ -z "$FRAGMENT" ]; then + echo "Fragment missing. Run: pnpm changelog:add" + exit 1 + fi + + pnpm tsx changelog/scripts/validate.ts "$FRAGMENT" +``` + +**`check-migration-flag` job** (runs independently, no bypass): detects new SQL migration files with `git diff --name-only --diff-filter=A`. If migrations are present and `migration: false` in the fragment, it fails. This job cannot be bypassed by labels — a `skip-changelog` PR that adds a migration still triggers the check. + +The two jobs are independent by design. A PR can bypass fragment creation (via label) but still fail the migration check. + +--- + +## Fragment Assembly at Release + +Fragments accumulate in `changelog/fragments/` as PRs merge. At release time, one command assembles them into a versioned CHANGELOG section. + +```bash +pnpm changelog:assemble --version 1.8.0 [--dry-run] +``` + +What it does: +1. Reads all `changelog/fragments/*.yml` +2. Groups by type in fixed order (feat, fix, perf, refactor, security, docs, chore) +3. Pulls `breaking: true` entries into a dedicated `🔨 Breaking Changes` section +4. Annotates `migration: true` entries inline with `⚠️ Migration DB.` +5. Replaces the `## [Next Release]` placeholder in `CHANGELOG.md` +6. Archives fragments to `changelog/fragments/released/{version}/` + +Output: +```markdown +## [1.8.0] - 2026-03-15 + +### 🔨 Breaking Changes +- **Remove legacy token format (#871)** — Tokens issued before v1.6.0 are invalid. + +### ✨ New Features +- **Add real-time presence indicators (#892)** + +### 🔧 Bug Fixes +- **Fix empty chat after SSE race condition (#886)** — SSE workplan fires before + AI stream completes, causing ChatWrapper to mount with 0 messages. +``` + +--- + +## Why 3 Layers, Not 1 + +Each layer catches a different failure mode: + +| Layer | Failure caught | When | +|-------|---------------|------| +| CLAUDE.md rule | Claude forgets the workflow | Every session | +| UserPromptSubmit hook | Developer types "make the PR" without thinking | Pre-prompt | +| CI gate | Fragment was skipped or corrupt | Pre-merge | + +A single CI gate catches the issue too late — the developer has to context-switch back after their PR is already open. The hook catches it at intent time. The CLAUDE.md rule means Claude handles it autonomously when given the task. + +The layers don't conflict. They reinforce each other. A developer who sees the hook suggestion will run `pnpm changelog:add`. Claude will follow the CLAUDE.md rule and validate the output. CI confirms everything before merge. + +--- + +## Adopting This Pattern + +The TypeScript scripts (add, validate, assemble, audit) are specific to the Méthode Aristote stack. The 3-layer enforcement pattern is not — it works with any fragment format, any CI system, any assembler. + +**Minimum viable setup:** + +1. **Define your fragment schema** (YAML, JSON, whatever fits your stack) +2. **Add a CLAUDE.md rule** encoding the creation workflow so Claude can handle it autonomously +3. **Add a `UserPromptSubmit` hook** with the `if PR-intent without fragment-mention → suggest` pattern +4. **Add a CI job** that checks for fragment existence before merge + +The hook pattern generalizes to any mandatory workflow step. Substitute "changelog fragment" with "ADR", "migration flag", "test coverage check" — the conditional detection logic is the same. + +--- + +## Related + +- Hook example: [`examples/hooks/bash/smart-suggest.sh`](../../examples/hooks/bash/smart-suggest.sh) +- Hook documentation: [UserPromptSubmit Hooks](../ultimate-guide.md) (search "UserPromptSubmit") +- Fragment validator and assembler scripts: available in the Méthode Aristote repository diff --git a/guide/workflows/rpi.md b/guide/workflows/rpi.md new file mode 100644 index 0000000..d322e76 --- /dev/null +++ b/guide/workflows/rpi.md @@ -0,0 +1,766 @@ +--- +title: "RPI: Research → Plan → Implement" +description: "A 3-phase feature development pattern with explicit validation gates between phases" +tags: [workflow, architecture, design-patterns, validation] +--- + +# RPI: Research → Plan → Implement + +> **Confidence**: Tier 2 — Synthesized from production team patterns. The gate-based structure aligns with Anthropic's guidance on agent task decomposition and agentic loop control. + +Build features in three locked phases: Research feasibility first, plan the implementation second, write code third. Each phase produces a concrete artifact. Each gate requires an explicit GO before the next phase starts. + +--- + +## Table of Contents + +1. [TL;DR](#tldr) +2. [When to Use RPI](#when-to-use-rpi) +3. [How the Gates Work](#how-the-gates-work) +4. [Phase 1: Research](#phase-1-research) +5. [Phase 2: Plan](#phase-2-plan) +6. [Phase 3: Implement](#phase-3-implement) +7. [Slash Command Templates](#slash-command-templates) +8. [Worked Example](#worked-example) +9. [Comparison to Other Workflows](#comparison-to-other-workflows) +10. [Tips and Troubleshooting](#tips-and-troubleshooting) +11. [See Also](#see-also) + +--- + +## TL;DR + +``` +Phase 1 — Research: + Claude explores feasibility, surfaces risks, asks decision questions + Output: RESEARCH.md + Gate: You decide GO / NO-GO + +Phase 2 — Plan: + Claude writes architecture decisions, user stories, test plan + Output: PLAN.md + Gate: You approve the plan before any code is written + +Phase 3 — Implement: + Claude implements step by step, tests pass before each next step + Output: working code + passing tests + Gate: each implementation step validated before the next begins +``` + +**Best for**: Features with unclear feasibility, more than a day of work, unknown technical territory, or anything where discovering a wrong assumption late is costly. + +--- + +## When to Use RPI + +### Use RPI When + +- **Feasibility is unknown**: You have an idea but aren't sure it holds up technically +- **Scope is large**: More than a day's worth of implementation work +- **Requirements are fuzzy**: You know the outcome you want, not the path to get there +- **Risk of wrong direction is high**: Security, payments, data migrations, integrations with external systems +- **You've been surprised before**: A feature looked simple, turned out to involve 6 other systems + +### Skip RPI When + +| Scenario | Better Approach | +|----------|----------------| +| Fix is obvious (typo, wrong color) | Direct edit | +| Feature is well-understood, requirements are clear | [Spec-First](./spec-first.md) or [dual-instance](./dual-instance-planning.md) | +| Exploration mode — you don't know what you want yet | Exploration workflow | +| Tiny change, single file | Just do it | + +### Decision Heuristic + +Ask yourself: "If the research phase reveals a serious problem, am I glad I didn't spend 2 days implementing first?" + +If yes, run RPI. The research phase typically takes 30-60 minutes and can save many hours. + +--- + +## How the Gates Work + +RPI has two human gates and one automated gate per implementation step. + +``` +[Idea] + | + v +[Phase 1: Research] + | + +- NO-GO -> Stop. Document why. Archive RESEARCH.md. + | + +- GO --------------------------------------------------------> + | + [Phase 2: Plan] + | + +- Needs revision -> iterate with Claude + | + +- Approved --------------------------------> + | + [Phase 3: Implement] + Step 1 -> Test gate + Step 2 -> Test gate + Step 3 -> Test gate + | + [Done] +``` + +**Gate 1 (after Research)**: You read RESEARCH.md and make a GO/NO-GO decision. This is the most important gate — it prevents building the wrong thing entirely. + +**Gate 2 (after Plan)**: You review PLAN.md before any code is written. Minor revisions happen here at zero cost. + +**Step gates (during Implement)**: Each implementation step must have passing tests before Claude moves to the next step. Automated, no human action required unless a step fails. + +--- + +## Phase 1: Research + +### What Research Covers + +The research phase answers five questions: + +1. **What already exists?** Relevant code, libraries, prior attempts in the codebase +2. **What needs to be built?** Scope boundary, components to create vs modify +3. **What are the risks?** Technical, security, integration, performance +4. **What are the decision points?** Architecture choices that affect the whole plan +5. **What is the effort estimate?** Rough sizing before committing to a plan + +Claude explores the codebase, reads relevant files, checks dependencies, and surfaces any constraint that would change the plan. The result is RESEARCH.md. + +### Starting Research + +Create the feature folder and invoke research: + +```bash +mkdir -p .claude/features/[feature-name] +``` + +Then in Claude: + +``` +/rpi:research [feature description] +``` + +Or without the slash command: + +``` +Run RPI Phase 1 (Research) for: [feature description] + +Save output to .claude/features/[feature-name]/RESEARCH.md +Use Plan Mode to explore without modifying code. +``` + +### RESEARCH.md Template + +```markdown +# Research: [Feature Name] + +**Date**: [YYYY-MM-DD] +**Requested**: [One-sentence description of the feature] +**Status**: PENDING DECISION + +--- + +## What Exists Today + +### Relevant Code +- [file path]: [what it does, why it matters] +- [file path]: [what it does, why it matters] + +### Relevant Libraries +- [library]: [currently used / available / needs to be added] + +### Prior Attempts or Related Work +- [any existing partial implementation, related PR, note in codebase] + +--- + +## What Needs to Be Built + +### New Files +- [file path]: [purpose] +- [file path]: [purpose] + +### Files to Modify +- [file path]: [what changes, why] + +### External Dependencies +- [dependency]: [reason needed, version constraint if any] + +--- + +## Risks + +| Risk | Likelihood | Impact | Notes | +|------|-----------|--------|-------| +| [risk description] | Low/Med/High | Low/Med/High | [mitigation or blocker] | + +--- + +## Architecture Decision Points + +Questions that need a decision before planning can start: + +1. **[Decision]**: Option A (pros: X, cons: Y) vs Option B (pros: X, cons: Y) +2. **[Decision]**: [Options and trade-offs] + +--- + +## Effort Estimate + +- Research-to-plan: [time] +- Implementation: [time range] +- Testing: [time] +- **Total estimate**: [range] + +**Confidence in estimate**: Low / Medium / High +**Why**: [reason for confidence level] + +--- + +## Recommendation + +[GO / NO-GO / NEEDS CLARIFICATION] + +[1-3 sentences explaining the recommendation] + +--- + +**Decision**: [ ] GO [ ] NO-GO [ ] NEEDS CLARIFICATION +**Notes**: [human fills this in] +``` + +### What a NO-GO Looks Like + +Not every research phase ends in GO. Common NO-GO reasons: + +- **Technical blocker**: External API doesn't support the required operation +- **Scope creep discovered**: "Simple feature" turns out to require rewriting the auth layer +- **Better alternative exists**: Research reveals an existing library or config change that solves the problem more simply +- **Risk too high for now**: The feature is valid but the timing is wrong + +Archive NO-GO research docs — they're valuable records of decisions made and why. + +--- + +## Phase 2: Plan + +### What the Plan Covers + +Phase 2 starts only after you mark RESEARCH.md with GO. Claude reads the research doc and produces a precise implementation plan. + +A good plan is specific enough that a different engineer (or Claude in a new session) could execute it without asking questions. + +``` +/rpi:plan .claude/features/[feature-name]/RESEARCH.md +``` + +Or without the slash command: + +``` +Run RPI Phase 2 (Plan) using: +- Research: .claude/features/[feature-name]/RESEARCH.md + +Save output to .claude/features/[feature-name]/PLAN.md +Do not write any code yet. Plan only. +``` + +### PLAN.md Template + +```markdown +# Plan: [Feature Name] + +**Date**: [YYYY-MM-DD] +**Research**: [link to RESEARCH.md] +**Estimated effort**: [from research] +**Risk level**: Low / Medium / High + +--- + +## Summary + +[2-4 sentences: what this implements, major design decisions taken, what it does NOT include] + +--- + +## Architecture Decisions + +[Record decisions made from the research decision points] + +1. **[Decision]**: Chose [Option] because [reason] +2. **[Decision]**: Chose [Option] because [reason] + +--- + +## Implementation Steps + +Steps must be sequential and independently testable. + +### Step 1: [Name] + +**Files**: [list of files to create or modify] +**What to build**: [precise description] +**Test gate**: [specific test or check that must pass before Step 2 starts] + +### Step 2: [Name] + +**Files**: [list of files to create or modify] +**What to build**: [precise description] +**Test gate**: [specific test or check that must pass before Step 3 starts] + +[...continue for all steps] + +--- + +## Success Criteria + +- [ ] [Testable criterion — describes observable behavior, not implementation] +- [ ] [Testable criterion] +- [ ] All step test gates pass + +--- + +## Out of Scope + +Explicitly list what this plan does NOT cover: +- [thing excluded] +- [thing excluded] + +--- + +## Risks Accepted + +[From RESEARCH.md risks, list which are accepted and how they're mitigated] + +--- + +## Rollback Plan + +If implementation fails mid-way: +- [what to undo] +- [how to restore previous state] + +--- + +**Plan approved?** [ ] YES — proceed to implementation +**Revision notes**: [human fills this in if changes needed] +``` + +### Reviewing the Plan + +Read PLAN.md carefully before approving. The goal is to catch design problems now, not during implementation. Specific things to check: + +- Implementation steps are in the right order, with no hidden dependencies +- Test gates are concrete and runnable (not "it looks right") +- Out-of-scope is explicit (prevents Claude from over-building) +- Rollback plan exists for anything touching data or shared state + +If the plan needs changes, ask Claude to revise before approving. This is free — revision after approval costs implementation time. + +--- + +## Phase 3: Implement + +### The Step-Gate Pattern + +Implementation runs step by step. Each step has a test gate. Claude does not start the next step until the current gate passes. + +``` +/rpi:implement .claude/features/[feature-name]/PLAN.md +``` + +Or without the slash command: + +``` +Run RPI Phase 3 (Implement) using: +- Plan: .claude/features/[feature-name]/PLAN.md + +Rules: +- Implement one step at a time +- After each step, run the test gate specified in the plan +- Do not start the next step until the test gate passes +- If a test gate fails, stop and report the failure — do not improvise a fix +- Commit after each step that passes its gate +``` + +### What Happens During Implementation + +Claude works through the plan's steps sequentially. For each step: + +1. Implements the specified files and changes +2. Runs the test gate (unit tests, integration check, or manual verification) +3. If gate passes: commits with a message referencing the step, announces readiness for the next step +4. If gate fails: reports the failure, the specific test output, and the likely cause. Does not attempt to fix unless you confirm. + +The step-commit pattern gives you a clean git history that mirrors the plan. If something goes wrong in Step 4, you can roll back to the Step 3 commit cleanly. + +### Step-Gate Failure Protocol + +When a test gate fails, Claude reports: + +``` +Step [N] gate failed. + +Gate: [what was supposed to pass] +Output: +[actual test output] + +Likely cause: [Claude's diagnosis] +Options: +1. Fix: [specific change that would likely fix it] +2. Revise plan: [if the plan step itself has a flaw] +3. Stop and investigate: [if the failure reveals something unexpected] + +Which should I do? +``` + +You decide. Claude does not auto-fix and proceed — that's how implementations drift from plans. + +--- + +## Slash Command Templates + +Save these to `.claude/commands/` to invoke each phase directly. + +### `/rpi:research` + +Save to `.claude/commands/rpi-research.md`: + +```markdown +# RPI Phase 1: Research + +Run feasibility research for the requested feature. + +## Instructions + +1. Enter Plan Mode (do not modify files during research) +2. Explore the codebase to answer these questions: + - What already exists that's relevant? + - What files will need to change or be created? + - What are the technical risks? + - What decisions need to be made before planning? + - What is a rough effort estimate? +3. Save output to `.claude/features/$ARGUMENTS/RESEARCH.md` using the template below +4. End with a clear recommendation: GO, NO-GO, or NEEDS CLARIFICATION +5. Ask the user for their GO/NO-GO decision before proceeding + +## Constraints + +- Do NOT write any code +- Do NOT modify any files +- Do NOT start planning implementation steps +- If uncertain about scope, surface it as a decision point +``` + +### `/rpi:plan` + +Save to `.claude/commands/rpi-plan.md`: + +```markdown +# RPI Phase 2: Plan + +Create an implementation plan based on approved research. + +## Pre-check + +Before starting: +1. Read the RESEARCH.md file specified in $ARGUMENTS +2. Verify it has a GO decision marked +3. If no GO decision found, stop and ask the user to decide first + +## Instructions + +1. Read RESEARCH.md carefully +2. Create PLAN.md in the same feature folder +3. Architecture decisions: resolve all decision points from research +4. Implementation steps: each step must have a concrete test gate +5. Success criteria: observable, testable, not implementation-internal +6. Out-of-scope: explicit list of what this plan does NOT cover +7. Do NOT write any implementation code +8. After writing the plan, ask the user to review and approve + +## Constraints + +- Do NOT write implementation code +- Steps must be sequential and independently testable +- Test gates must be runnable commands or precise manual checks, not vague descriptions +- Each step should be achievable in a single focused session +``` + +### `/rpi:implement` + +Save to `.claude/commands/rpi-implement.md`: + +```markdown +# RPI Phase 3: Implement + +Implement the feature following an approved plan, one step at a time. + +## Pre-check + +Before starting: +1. Read the PLAN.md file specified in $ARGUMENTS +2. Verify it has an approval marked +3. If no approval found, stop and ask the user to approve first + +## Instructions + +For each step in the plan: +1. Read the step description carefully +2. Implement only what the step specifies — nothing more +3. Run the test gate exactly as written in the plan +4. If the gate passes: + - Commit with message: "feat([feature]): step [N] — [step name]" + - Announce step completion and readiness for next step +5. If the gate fails: + - Report the exact failure output + - Diagnose the likely cause + - Present options (fix, revise plan, stop) + - Wait for human decision before proceeding + +## Constraints + +- Never skip a test gate +- Never start the next step before the current gate passes +- Never modify files outside the scope of the current step +- If you encounter something unexpected that changes the plan, stop and report it +- Commit after each passing step — not at the end +``` + +--- + +## Worked Example + +**Request**: "Add rate limiting to the public API endpoints." + +### Phase 1: Research Output (abbreviated) + +```markdown +# Research: API Rate Limiting + +**Date**: 2026-03-12 +**Status**: PENDING DECISION + +## What Exists Today + +- `src/middleware/` — has auth middleware, no rate limiting +- `package.json` — express-rate-limit not installed, redis available +- `src/routes/api.ts` — 14 public endpoints, unauthenticated routes mixed with authenticated ones + +## What Needs to Be Built + +- Rate limiter middleware for public endpoints +- Separate limits for authenticated vs unauthenticated users +- Redis store for distributed rate limiting (app runs on 3 instances) + +## Risks + +| Risk | Likelihood | Impact | Notes | +|------|-----------|--------|-------| +| Redis connection failure disables all API access | Low | High | Need fallback to in-memory if Redis unavailable | +| Rate limit too aggressive — breaks existing integrations | Medium | High | Need to survey current usage patterns first | + +## Architecture Decision Points + +1. **Library**: express-rate-limit (maintained, battle-tested) vs custom middleware +2. **Bypass for trusted IPs**: Allow internal services to bypass rate limiting? + +## Effort Estimate + +- Implementation: 2-4 hours +- Testing: 2 hours +- **Total**: 4-6 hours + +**Recommendation**: GO — standard problem, good library options, main risk is Redis fallback which is solvable. +``` + +**Human decision**: GO. Use express-rate-limit. No IP bypass for now. + +### Phase 2: Plan (abbreviated) + +```markdown +# Plan: API Rate Limiting + +**Risk level**: Medium (shared Redis state, potential to block legitimate traffic) + +## Architecture Decisions + +1. Library: express-rate-limit with rate-limit-redis store +2. No IP bypass initially — revisit if internal service issues arise +3. Unauthenticated: 100 requests/15 minutes. Authenticated: 1000 requests/15 minutes. +4. Redis failure fallback: in-memory store (accepts single-instance inconsistency) + +## Implementation Steps + +### Step 1: Install dependencies and configure Redis store + +**Files**: `package.json`, `src/config/rate-limit.ts` +**Test gate**: `npm install` completes, `src/config/rate-limit.ts` exports config without errors + +### Step 2: Implement rate limiter middleware + +**Files**: `src/middleware/rate-limit.ts` +**Test gate**: Unit test — limiter blocks 101st request from same IP within 15 minutes + +### Step 3: Apply to routes + +**Files**: `src/routes/api.ts` +**Test gate**: Integration test — unauthenticated route returns 429 after 100 requests; authenticated route does not + +### Step 4: Add Redis fallback + +**Files**: `src/config/rate-limit.ts` +**Test gate**: Test with Redis unavailable — API still responds (200, not 500), in-memory limiting active +``` + +**Human review**: Approved. + +### Phase 3: Implementation + +Claude implements Step 1, runs the gate (`npm install` + import check), commits `feat(rate-limit): step 1 — dependencies and config`. Then Step 2, Step 3, Step 4 in sequence. Each commit is clean. Each gate must pass before proceeding. + +--- + +## Comparison to Other Workflows + +| Workflow | Phase structure | Human gates | Best for | +|----------|----------------|------------|----------| +| **RPI** | Research + Plan + Implement | GO/NO-GO + plan approval | Unknown feasibility, >1 day, high risk of wrong direction | +| **Dual-Instance** | Plan + Implement (separate Claude instances) | Plan approval | Known features needing careful execution, spec-heavy work | +| **Spec-First** | Spec + Implement | None (spec is implicit gate) | Design-focused work, API contracts, team alignment | +| **TDD** | Test-first + Implement | None (tests are the gate) | Test coverage as driver, refactoring, incremental behavior | +| **Direct** | None | None | Simple changes, obvious scope, less than 2 hours | + +### RPI vs Dual-Instance + +Dual-instance separates planning and implementation into two Claude instances with strict role enforcement. It works well when you already know what you're building and want a high-quality plan. RPI adds a feasibility phase before planning, which makes it better for ambiguous requests. If you already have a clear spec, skip Research and use dual-instance or spec-first. + +### RPI vs Spec-First + +Spec-first is design-oriented: you define what the system should do, then Claude implements it. RPI is implementation-oriented with validation gates: you describe a goal, Claude researches how to achieve it, then the two of you agree on a plan before touching code. Use spec-first when the design is clear. Use RPI when the technical path is not. + +### RPI vs Direct Coding + +For anything under 2 hours with a clear scope, just ask Claude to do it. RPI adds overhead that isn't justified for small tasks. The research phase alone takes 30-60 minutes. That overhead pays off on multi-day features where discovering a wrong assumption late is much more expensive. + +--- + +## Tips and Troubleshooting + +### Claude Skips to Implementation in Research Phase + +**Problem**: Claude starts writing code during the research phase. + +**Solution**: Use Plan Mode (Shift+Tab twice) explicitly for the research phase. Include this line in the research command: + +``` +You are in Plan Mode. Do not modify files. Do not write implementation code. +Research only. Output goes to RESEARCH.md. +``` + +Or add to your CLAUDE.md: + +```markdown +## RPI Rules +- /rpi:research runs in Plan Mode only — no file modifications +- /rpi:plan produces only PLAN.md — no implementation code +- /rpi:implement runs one step at a time, waits for test gate before next step +``` + +### Research Phase Runs Too Long + +**Problem**: Claude explores the entire codebase instead of focusing on what's relevant. + +**Solution**: Scope the research explicitly: + +``` +/rpi:research payment-processing + +Focus area: src/payments/, src/routes/checkout.ts +Do not explore: frontend, auth, unrelated backend modules +Time budget: complete research in one session +``` + +### Plan Has Too Many Steps + +**Problem**: PLAN.md has 12 steps, making the implementation unwieldy. + +**Solution**: A plan with more than 6-8 steps usually needs a scope reduction, not more granular implementation. Ask Claude: + +``` +The plan has too many steps. What is the minimal viable scope that delivers +the core value? Revise the plan to implement only that, with a clear +"Future work" section for the rest. +``` + +### Test Gates Are Vague + +**Problem**: A step's test gate says "verify it works" rather than a specific command. + +**Solution**: Reject vague gates before approving the plan. Push back with: + +``` +Step 3's test gate is "verify rate limiting works." Make it specific: +what command do I run, and what output do I expect to see when it passes? +``` + +A good test gate: + +``` +Test gate: `npm test src/middleware/rate-limit.test.ts` — all 4 tests pass +``` + +A bad test gate: + +``` +Test gate: rate limiting is working correctly +``` + +### A Step Gate Fails Repeatedly + +**Problem**: Step 2's gate keeps failing even after attempted fixes. + +**Solution**: Two failures means investigate the plan, not keep iterating on fixes. + +``` +Stop implementation. The Step 2 gate has failed twice. +Review the plan — is the test gate achievable given the Step 1 output? +Do we need to revise the plan before continuing? +``` + +--- + +## File Structure Summary + +``` +.claude/ +└── features/ + └── [feature-name]/ + ├── RESEARCH.md # Phase 1 output (human annotates GO/NO-GO) + └── PLAN.md # Phase 2 output (human annotates approval) + +.claude/commands/ +├── rpi-research.md # /rpi:research slash command +├── rpi-plan.md # /rpi:plan slash command +└── rpi-implement.md # /rpi:implement slash command +``` + +Archive completed features: + +```bash +mkdir -p .claude/features/_archive +mv .claude/features/payment-processing .claude/features/_archive/ +``` + +The archive is a learning resource: completed RESEARCH.md and PLAN.md files show how previous features were reasoned about. + +--- + +## See Also + +- [dual-instance-planning.md](./dual-instance-planning.md) — Two-instance pattern for spec-heavy implementation +- [spec-first.md](./spec-first.md) — Design-first workflow using CLAUDE.md as contract +- [tdd-with-claude.md](./tdd-with-claude.md) — Combine with TDD for test-gate implementation +- [task-management.md](./task-management.md) — Managing multi-phase tasks across sessions +- **Main guide**: Advanced Patterns section — Multi-instance and planning pattern overview \ No newline at end of file diff --git a/machine-readable/reference.yaml b/machine-readable/reference.yaml index 2f4459b..1ce94d7 100644 --- a/machine-readable/reference.yaml +++ b/machine-readable/reference.yaml @@ -106,6 +106,13 @@ deep_dive: rpi_phase3_implement: "guide/workflows/rpi.md:237" # Step-gate pattern + failure protocol rpi_slash_commands: "guide/workflows/rpi.md:281" # /rpi:research /rpi:plan /rpi:implement templates rpi_vs_other_workflows: "guide/workflows/rpi.md:391" # Comparison table: RPI vs dual-instance vs spec-first + # Changelog Fragments Workflow + changelog_fragments_workflow: "guide/workflows/changelog-fragments.md" # Per-PR YAML fragments, 3-layer enforcement + changelog_fragments_claude_rule: "guide/workflows/changelog-fragments.md:26" # CLAUDE.md rule for autonomous fragment creation + changelog_fragments_hook: "guide/workflows/changelog-fragments.md:88" # UserPromptSubmit 3-tier hook (enforcement/discovery/contextual) + changelog_fragments_ci: "guide/workflows/changelog-fragments.md:169" # Independent CI migration check job + # Smart-Suggest Hook + smart_suggest_hook: "examples/hooks/bash/smart-suggest.sh" # UserPromptSubmit behavioral coach, 3-tier priority, ROI logging # Template Installation install_templates_script: "scripts/install-templates.sh" # Session management