docs: add RPI workflow, changelog fragments, smart-suggest hook + LLM variance

- guide/workflows/rpi.md (new): Research → Plan → Implement, 3-phase pattern with explicit GO gates, slash command templates, worked example - guide/workflows/changelog-fragments.md (new): per-PR YAML fragment enforcement, 3-layer system (CLAUDE.md rule + UserPromptSubmit hook + CI gate) - examples/hooks/bash/smart-suggest.sh (new): UserPromptSubmit behavioral coach, 3-tier priority (enforcement/discovery/contextual), ROI logging - guide/core/known-issues.md: LLM Day-to-Day Performance Variance section, 4 root causes (probabilistic inference, MoE routing, infra, context sensitivity) - guide/workflows/README.md: added RPI entry + quick selection row - machine-readable/reference.yaml: added entries for changelog_fragments, smart_suggest - CHANGELOG.md: [Unreleased] entries for all 4 new items - IDEAS.md: prompt-caching MCP plugin research note (testing in progress) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-13 16:22:57 +01:00 · 2026-03-13 16:22:57 +01:00 · b6ce1ef72f
commit b6ce1ef72f
parent 13efb5a774
8 changed files with 1251 additions and 0 deletions
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@ -8,6 +8,14 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).

 ### Added

+- **`guide/workflows/rpi.md`** — New workflow guide (RPI: Research → Plan → Implement). 3-phase feature development pattern with explicit validation gates: Research produces `RESEARCH.md`, Plan produces `PLAN.md`, Implement produces working code. Each gate requires explicit GO before the next phase. Includes slash command templates (`/rpi:research`, `/rpi:plan`, `/rpi:implement`), a worked example (adding rate limiting to an Express API), and comparison matrix vs Plan-Driven, TDD, and Spec-First. Best for features where discovering a wrong assumption late is expensive.
+
+- **`guide/workflows/changelog-fragments.md`** — New workflow guide for the Changelog Fragments pattern: one YAML fragment per PR, written at implementation time, validated by CI, assembled automatically at release. Covers 3-layer enforcement: (1) CLAUDE.md workflow rule for autonomous fragment creation, (2) `UserPromptSubmit` hook with 3-tier priority (enforcement → discovery → contextual), (3) independent CI migration check job. Includes the `UserPromptSubmit` tier pattern as a reusable hook architecture for any mandatory workflow step.
+
+- **`examples/hooks/bash/smart-suggest.sh`** — New `UserPromptSubmit` hook implementing the 3-tier behavioral coach pattern: Tier 0 enforcement (changelog fragment required before PR, plan-before-code), Tier 1 discovery (test-loop, retex, dupes, monitoring, security, release), Tier 2 contextual (code review, debugging, architecture, session resume). Max 1 suggestion per prompt (first match wins), dedup guard, ROI logging to `~/.claude/logs/smart-suggest.jsonl`, silent exit on no match.
+
+- **`guide/core/known-issues.md`** — New section "LLM Day-to-Day Performance Variance": documents session-to-session output quality variance (shorter responses, conservative suggestions, edge-case refusals) as expected behavior, not a bug. Explains the 4 root causes (probabilistic inference, MoE routing variance, infrastructure variance, context sensitivity) and provides an observable signals table. Includes a practical checklist for ruling out controllable factors before concluding "the model degraded."
+
 - **`cc-sessions discover` documentation** — New subsection "Session Pattern Discovery" in §2.x (Session Management) covering the `discover` subcommand: n-gram mode (local, free, ~3s for 12 projects) vs `--llm` mode (semantic analysis via `claude --print`). Includes example output, the 20% rule decision framework (CLAUDE.md rule / skill / command categorization), and install instructions. Cross-reference added after the 20% rule callout in §5.1.

 - **`examples/scripts/cc-sessions.py` synced** — Updated from 498-line stale copy to the full 1225-line version from `~/bin/cc-sessions`. Includes the complete `discover` subcommand (n-gram analysis + `--llm` mode), incremental discover cache, Jaccard deduplication, and all filtering logic. GitHub source header added.
--- a/IDEAS.md
+++ b/IDEAS.md
@ -85,6 +85,36 @@ CLAUDE.md configuration examples by framework:

 ## Watching (Waiting for Demand)

+a### prompt-caching MCP Plugin
+
+MCP plugin that automates `cache_control` placement for developers building apps on the Anthropic SDK. Installed locally at `/Users/florianbruniaux/Sites/prompt-caching` and connected to Claude Code via `~/.claude.json`.
+
+**Status:** Testing in progress. Real usage data required before any documentation decision.
+
+**What we know:**
+- 29 stars, v1.3.0, solo maintainer — maintenance risk
+- Author-reported benchmarks (80-92% savings) — unverified, cannot cite
+- Fills a real gap: no other MCP tool does this; Spring AI / LiteLLM / Pydantic AI serve different audiences
+- Blog post (Mathieu Grenier) independently documents the same pain point + 5 antipatterns — score 3/5, worth integrating in "Strategy 6" regardless
+
+**Open questions:**
+- [ ] Do real sessions on this project actually hit the cache? (run `get_cache_stats` after 10+ turns)
+- [ ] Is the plugin stable enough to recommend? Any errors, memory leaks, session issues?
+- [ ] What are the real savings on a CLAUDE.md-heavy project like this guide?
+
+**If test results are positive (cache hits confirmed, no stability issues):**
+- Add to `guide/ecosystem/third-party-tools.md` with verified stats (not README claims)
+- Add to landing third-party tools section
+- Score upgrade: 3/5 → 4/5
+
+**If test results are inconclusive or plugin is unstable:**
+- Move to Discarded Ideas
+- Keep the Mathieu Grenier blog post integration (independent value)
+
+**Check again:** After 1 week of real usage
+
+---
+
 ### Multi-LLM Consultation Patterns
 Using external LLMs (Gemini, GPT-4) as "second opinion" from Claude Code.

--- a/examples/hooks/bash/smart-suggest.sh
+++ b/examples/hooks/bash/smart-suggest.sh
@ -0,0 +1,176 @@
+#!/bin/bash
+# .claude/hooks/smart-suggest.sh
+# Event: UserPromptSubmit
+# Non-blocking behavioral coach: suggests the right command or agent based on detected intent
+#
+# Architecture: 3-tier priority system
+#   Tier 0 — Enforcement: mandatory workflow gates (runs first, exits on match)
+#   Tier 1 — Discovery: high-value tools the developer may not know exist
+#   Tier 2 — Contextual: opportunistic suggestions for common patterns
+#
+# Properties:
+#   - Max 1 suggestion per prompt (first match wins)
+#   - Dedup guard: never suggest a command already in the prompt
+#   - ROI logging: records suggestions to ~/.claude/logs/smart-suggest.jsonl
+#   - Silent exit on no match (non-blocking, never fails)
+#
+# Customization: replace patterns with your own commands/agents.
+# Keep Tier 0 small and precise — it intercepts before everything else.
+
+set -euo pipefail
+
+INPUT=$(cat)
+PROMPT=$(echo "$INPUT" | jq -r '.prompt // empty' 2>/dev/null || true)
+
+# Skip prompts that are too short to match anything meaningful
+if [[ -z "$PROMPT" || ${#PROMPT} -lt 8 ]]; then
+    exit 0
+fi
+
+PROMPT_LC=$(echo "$PROMPT" | tr '[:upper:]' '[:lower:]')
+
+# Skip slash commands — user already knows what they want
+if [[ "$PROMPT_LC" =~ ^/ ]]; then
+    exit 0
+fi
+
+LABEL_CMD="Command"
+LABEL_AGENT="Agent"
+
+# ─── suggest() ────────────────────────────────────────────────────────────────
+# Emits one suggestion then exits (guarantees max 1 suggestion per prompt).
+# Dedup: if the command/agent name is already in the prompt, exits silently.
+suggest() {
+    local label="$1" name="$2" reason="$3"
+
+    # Strip leading slash, take first token — prevents matching "pr" in "improve"
+    # e.g. "/changelog:add" → "changelog:add", "code-reviewer" → "code-reviewer"
+    local check
+    check=$(echo "$name" | tr '[:upper:]' '[:lower:]' | sed 's|^/||' | awk '{print $1}')
+    if echo "$PROMPT_LC" | grep -qF "$check"; then
+        exit 0
+    fi
+
+    # Append to JSONL log for ROI measurement (non-blocking — failure is safe)
+    local logdir="${HOME}/.claude/logs"
+    mkdir -p "$logdir" 2>/dev/null || true
+    printf '{"ts":"%s","suggested":"%s","prompt_len":%d}\n' \
+        "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$name" "${#PROMPT}" \
+        >> "$logdir/smart-suggest.jsonl" 2>/dev/null || true
+
+    cat << EOF
+{
+  "hookSpecificOutput": {
+    "hookEventName": "UserPromptSubmit",
+    "additionalContext": "[Suggestion] $label: $name -- $reason"
+  }
+}
+EOF
+    exit 0
+}
+
+# ==============================================================================
+# TIER 0 — Enforcement
+# Intercepts workflow violations before any other suggestion.
+# Keep patterns tight: a false positive here is more disruptive than a miss.
+# ==============================================================================
+
+# Changelog fragment enforcement
+# If the developer signals intent to create a PR without mentioning a changelog
+# fragment, redirect to the fragment creation step first.
+# When fragment/skip-changelog is already mentioned → suggest the PR command normally.
+if echo "$PROMPT_LC" | grep -qE '(create.*pr|open.*pr|make.*pr|pull.?request|push.*pr)'; then
+    if ! echo "$PROMPT_LC" | grep -qE '(changelog|fragment|skip-changelog)'; then
+        suggest "$LABEL_CMD" "pnpm changelog:add" \
+            "REQUIRED before merge — creates changelog/fragments/{PR}-{slug}.yml (or add label 'skip-changelog' for tooling PRs)"
+    else
+        suggest "$LABEL_CMD" "/pr" \
+            "PR creation with structured description and auto-detected scope"
+    fi
+fi
+
+# Plan-before-code enforcement
+# Intercepts implementation intent when no planning context is present.
+# Negative lookahead excludes: plan, test, fix, review, explain, refactor.
+if echo "$PROMPT_LC" | grep -qE '(implement|add (the|a|an)|create (the|a|an) (service|router|endpoint|feature|component)|write (the )?(code|logic|service))'; then
+    if ! echo "$PROMPT_LC" | grep -qE '(plan|test|fix|bug|review|refactor|explain|why|how)'; then
+        suggest "$LABEL_CMD" "/plan" \
+            "Run /plan BEFORE coding — Research → Spec → Implement reduces rework"
+    fi
+fi
+
+# ==============================================================================
+# TIER 1 — Discovery
+# High-value tools the developer may not know about.
+# Use multi-word patterns to reduce false positives.
+# ==============================================================================
+
+# Failing tests — suggest auto-fix loop before generic debug
+if echo "$PROMPT_LC" | grep -qE '(tests? (fail|broken|red)|fix (the |these )?tests?|tests? (are )?failing)'; then
+    if ! echo "$PROMPT_LC" | grep -qE '(api|server|endpoint|500|404)'; then
+        suggest "$LABEL_CMD" "/test-loop" \
+            "Auto-fix loop: 1 test per cycle with convergence stats"
+    fi
+fi
+
+# Lesson learned / rollback scenario
+if echo "$PROMPT_LC" | grep -qE '(wrong approach|bad approach|should have|dead end|root cause was|lesson learned|introduced.*bug|regression)'; then
+    suggest "$LABEL_CMD" "/retex" \
+        "Capture this lesson in persistent memory (searchable across sessions)"
+fi
+
+# Code duplication detected
+if echo "$PROMPT_LC" | grep -qE '(duplicat|copy.?past|same code|repeated (code|logic)|identical (code|function))'; then
+    suggest "$LABEL_CMD" "/dupes" \
+        "Detect copy-paste code with jscpd before the refactor"
+fi
+
+# Monitoring / polling intent — suggest background loop
+if echo "$PROMPT_LC" | grep -qE '(check (every|each|all)|wait (until|for)|poll|monitor|watch.*for|when (it.?s |it is )?done|once (the )?deploy|once (the )?ci)'; then
+    suggest "$LABEL_CMD" "/loop [interval] [prompt]" \
+        "Run in background while you work: /loop 5m check the deploy status"
+fi
+
+# Security-sensitive keywords
+if echo "$PROMPT_LC" | grep -qE '(sql injection|xss|owasp|vulnerability|security audit|auth bypass)'; then
+    suggest "$LABEL_AGENT" "security-auditor" \
+        "Read-only OWASP audit — detects injection, auth, and data exposure issues"
+fi
+
+# Release assembly — suggest structured release command
+if echo "$PROMPT_LC" | grep -qE '(cut.*release|prepare.*release|new (version|release)|release.*[0-9]+\.[0-9]|tag.*version|deploy.*main|merge.*main)'; then
+    suggest "$LABEL_CMD" "/release" \
+        "Assembles fragments → CHANGELOG section, bumps version, creates release branch"
+fi
+
+# ==============================================================================
+# TIER 2 — Contextual
+# General patterns, lower priority, regex kept tight to avoid noise.
+# ==============================================================================
+
+# Code review before merge
+if echo "$PROMPT_LC" | grep -qE '(review (the |this )?(code|pr|file)|code review|before (merge|merging))'; then
+    suggest "$LABEL_AGENT" "code-reviewer" \
+        "Quality + security + performance review"
+fi
+
+# Debugging — generic
+if echo "$PROMPT_LC" | grep -qE '(debug (this|the)|fix (this |the )?(bug|crash|error)|stack trace|not (working|loading))'; then
+    suggest "$LABEL_AGENT" "debugger" \
+        "Systematic root cause analysis"
+fi
+
+# Architecture decision
+if echo "$PROMPT_LC" | grep -qE '(architecture (decision|review)|system design|big refactor|large.*refactor)'; then
+    suggest "$LABEL_AGENT" "architect-review" \
+        "Validates architectural decisions across scalability, coupling, and tradeoffs"
+fi
+
+# Resume previous session
+if echo "$PROMPT_LC" | grep -qE '(resume (session|work)|pick up (where|from)|continue (from )?(yesterday|last session))'; then
+    suggest "$LABEL_CMD" "/resume" \
+        "Restores session context from persistent memory"
+fi
+
+# No match — silent exit (never blocks the user)
+exit 0
--- a/guide/core/known-issues.md
+++ b/guide/core/known-issues.md
@ -234,6 +234,61 @@ Key quote:

 ---

+## 🔄 LLM Day-to-Day Performance Variance
+
+**Type**: Expected behavior (not a bug)
+**Severity**: 🟡 **LOW - AWARENESS**
+**Status**: Inherent to LLM inference, not specific to any version
+
+### What This Is
+
+Claude's output quality can vary noticeably from session to session, even with identical prompts and a clean context window. This is distinct from context window degradation (which happens within a session as context fills up). This is about variance between fresh sessions.
+
+Users sometimes report shorter responses, more conservative suggestions, or unexpected refusals on tasks that worked fine the day before. This can feel like a model downgrade, but it is not.
+
+### Root Causes
+
+**Probabilistic inference**: Temperature above 0 means every inference run is non-deterministic. Two runs of the same prompt will produce different token sequences. This is fundamental to how language models work.
+
+**MoE routing variance**: Claude uses a Mixture of Experts architecture. On each forward pass, a routing mechanism selects which expert weights to activate. Different runs activate different combinations, producing different outputs even for semantically identical inputs.
+
+**Infrastructure variance**: In production, requests hit different servers with different load levels, hardware generations, and thermal states. These factors influence numerical precision in floating-point arithmetic during inference, creating subtle but real output differences.
+
+**Context sensitivity**: Even with `/clear`, tiny differences between sessions accumulate. The system prompt, tool list, and session initialization all slightly affect the model's first outputs.
+
+### Observable Signals
+
+| Signal | What You See | What It Means |
+|--------|-------------|---------------|
+| Response length | Shorter, less detailed than usual | Routing hit a more conservative path |
+| Refusals | Edge cases that normally work get refused | Different safety calibration on this run |
+| Code style | More verbose or more minimal than expected | Expert mix activated differently |
+| Creativity | More conservative, less inventive suggestions | Not a capability loss, a sampling outcome |
+| Verbosity | More caveats and disclaimers than usual | Normal variance in token probabilities |
+
+### What This Is NOT
+
+- **Not a model downgrade**: Anthropic versions models deliberately and documents changes. Day-to-day variance happens within the same model version.
+- **Not a bug to report**: This behavior is expected and documented in LLM literature. It is inherent to probabilistic inference.
+- **Not permanent**: The next session will likely behave differently. A "bad" run does not indicate a lasting change.
+- **Not context window degradation**: That is a within-session phenomenon caused by token accumulation. This is between-session variance on fresh starts.
+
+> The Aug-Sep 2025 incident ([see Resolved Issues above](#model-quality-degradation-aug-sep-2025)) was the exception: Anthropic confirmed actual infrastructure bugs causing systematic degradation. True systematic degradation is rare and Anthropic investigates it. Normal session-to-session variance is something else.
+
+### Mitigation Strategies
+
+**Constrain the prompt**: More specific prompts reduce the output space and make variance less noticeable. "Write a function that does X, Y, Z, returns type T, handles edge case E" produces more consistent outputs than "write me something to handle X."
+
+**Fresh context before important work**: Run `/clear` before a high-stakes task. Accumulated session noise from earlier exploratory work can skew subsequent outputs even within the same session.
+
+**Reformulate and retry**: If an output seems off compared to your expectations, try the same request with different framing. A second formulation often routes through different expert paths and produces a better result.
+
+**Compare against a known-good prompt**: If you have a prompt from a previous session that produced excellent output, use it as a reference. If today's version of that prompt produces visibly worse output consistently, that warrants closer investigation (and potentially a GitHub issue if reproducible).
+
+**Calibrate expectations by task type**: Deterministic tasks (regex, simple transforms, well-defined algorithms) show less variance than creative or judgment-heavy tasks. Use Claude Code for the former with high reliability; for the latter, build review steps into your workflow.
+
+---
+
 ## 📊 Issue Statistics (as of Jan 28, 2026)

 | Metric | Count | Source |
--- a/guide/workflows/README.md
+++ b/guide/workflows/README.md
@ -83,6 +83,14 @@ Eliminates merge conflicts on `CHANGELOG.md`, captures context at implementation
 - Conditional suggestion pattern: "if PR-intent without fragment-mention"
 - CI enforcement with independent migration check job

+### [RPI: Research → Plan → Implement](./rpi.md) ⭐ NEW
+
+**3-phase feature development with explicit validation gates between phases**
+
+Build features in three locked phases: Research feasibility first, plan the implementation second, write code third. Each phase produces a concrete artifact (RESEARCH.md → PLAN.md → code). Each gate requires an explicit GO before the next phase starts.
+
+**When to use**: Features with unclear feasibility, more than a day of work, unknown technical territory, or anywhere discovering a wrong assumption late is costly
+
 ### [Cognitive Mode Switching](./gstack-workflow.md) ⭐ NEW

 Switch between specialist roles across your ship cycle: strategic product gate, architecture review, paranoid code review, automated release, native browser QA, and retrospective.
@ -190,6 +198,7 @@ Multi-session task tracking with TodoWrite, tasks API, and context persistence a
 | **New project from template** | [Skeleton Projects](./skeleton-projects.md) |
 | **Team AI instructions** | [Team AI Instructions](./team-ai-instructions.md) |
 | **Enforce mandatory workflow steps** | [Changelog Fragments](./changelog-fragments.md) |
+| **Unknown feasibility, multi-day feature** | [RPI: Research → Plan → Implement](./rpi.md) |
 | **Documentation** | [PDF Generation](./pdf-generation.md) |
 | **Social previews** | [OG Image Generation](./og-image-generation.md) |
 | **Conference talk from raw material** | [Talk Preparation Pipeline](./talk-pipeline.md) |
--- a/guide/workflows/changelog-fragments.md
+++ b/guide/workflows/changelog-fragments.md
@ -0,0 +1,200 @@
+# Changelog Fragments: Enforced Per-PR Documentation
+
+A 3-layer enforcement pattern that ensures every PR is documented at write time, never at release time.
+
+---
+
+## The Problem
+
+Single `CHANGELOG.md` files break on active teams. Three open feature branches all touching the same file means merge conflicts on every merge. Someone resolves the conflict, drops a line, and release notes are wrong before they're published.
+
+The deeper problem is timing. "Document at release time" sounds reasonable until you're staring at PR #840 three weeks after it merged, trying to reconstruct what changed for users. The commit says `fix session handling`. The developer is in a different timezone. Context is gone.
+
+Enforcement without CI gates means changelogs become a nag job. Someone has to chase people before every release, under time pressure, filling in blanks from git log.
+
+The solution: one YAML fragment per PR, written while implementing, validated by CI, assembled automatically at release.
+
+---
+
+## The 3-Layer Architecture
+
+The system works because enforcement happens at three independent levels. Each layer catches a different failure mode.
+
+### Layer 1: CLAUDE.md Workflow Rule
+
+The first layer is a rule loaded into Claude Code's context at every session. It encodes the entire fragment workflow so Claude can complete it autonomously when asked to create a PR.
+
+```markdown
+# git-workflow.md (loaded via CLAUDE.md)
+
+## Changelog Fragment — Required Before Every PR
+
+Before creating a PR, always generate a changelog fragment.
+
+### Steps
+
+1. **Infer from git diff** — analyze `git diff main...HEAD` to determine:
+   - `type`: feat | fix | perf | refactor | security | docs | chore
+   - `scope`: the functional area affected (auth, sessions, api, etc.)
+   - `title`: one-line user-facing summary (< 80 chars)
+
+2. **Create the fragment** — run `pnpm changelog:add` or write directly:
+   ```
+   changelog/fragments/{PR_NUMBER}-{slug}.yml
+   ```
+
+3. **Validate** — run `pnpm changelog:validate changelog/fragments/{file}.yml`
+
+4. **Commit alongside the PR** — include the fragment in the same branch
+
+### Fragment schema
+
+```yaml
+pr: 886                    # must match filename prefix
+type: fix                  # feat|fix|perf|refactor|security|docs|chore
+scope: "visiochat"
+title: "Fix empty chat after SSE race condition"   # < 80 chars
+description: |             # optional — explain user impact, not implementation
+  SSE workplan fires before AI stream completes, causing ChatWrapper
+  to mount with 0 messages.
+breaking: false
+migration: false           # set true if PR adds a DB migration
+```
+
+### Bypass
+
+Add label `skip-changelog` for PRs with no user impact (CI config, deps updates, release commits).
+```
+
+This rule makes Claude Code a participant in enforcement, not just a coding tool. When a developer says "make the PR," Claude infers the fragment content from the diff and creates it before opening the PR.
+
+### Layer 2: UserPromptSubmit Hook (Behavioral Detection)
+
+The second layer intercepts intent before it becomes action. When the developer types something that signals PR creation intent, the hook checks whether a changelog fragment was mentioned.
+
+```bash
+# .claude/hooks/smart-suggest.sh (excerpt — Tier 0 enforcement)
+
+# PR creation intent detected
+if echo "$PROMPT_LC" | grep -qE '(create.*pr|open.*pr|make.*pr|pull.?request|push.*pr)'; then
+    # Fragment not mentioned → redirect to creation step first
+    if ! echo "$PROMPT_LC" | grep -qE '(changelog|fragment|skip-changelog)'; then
+        suggest "pnpm changelog:add" \
+            "REQUIRED before merge — creates changelog/fragments/{PR}-{slug}.yml"
+    else
+        # Already mentioned → suggest the PR command normally
+        suggest "/pr" "PR creation with structured description"
+    fi
+fi
+```
+
+The hook is `UserPromptSubmit`: non-blocking, max one suggestion per prompt, silent on no match. It runs before Claude Code processes the prompt, so the developer sees the reminder inline before Claude starts doing anything.
+
+The conditional logic (`if X without Y`) is the key pattern here. It's not a blanket blocker — it adapts to context. If the developer already mentioned the fragment, they get the normal suggestion. If they didn't, they get the enforcement reminder.
+
+**Full hook with 3-tier architecture**: [`examples/hooks/bash/smart-suggest.sh`](../../examples/hooks/bash/smart-suggest.sh)
+
+### Layer 3: CI Enforcement (GitHub Actions)
+
+The third layer is the hard gate. Two independent jobs run on every PR targeting the main branch.
+
+**`check-fragment` job**: checks bypass labels first (closed list), then requires `changelog/fragments/{PR_NUMBER}-*.yml` to exist and pass structural validation.
+
+```yaml
+- name: Check fragment exists and is valid
+  env:
+    PR_NUMBER: ${{ github.event.pull_request.number }}
+    PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
+  run: |
+    SKIP_LABELS=("skip-changelog" "dependencies" "release" "chore: deps")
+    for LABEL in "${SKIP_LABELS[@]}"; do
+      if echo "$PR_LABELS" | grep -q "\"$LABEL\""; then
+        echo "Bypass label detected — fragment not required"
+        exit 0
+      fi
+    done
+
+    FRAGMENT=$(ls "changelog/fragments/${PR_NUMBER}-"*.yml 2>/dev/null | head -1)
+    if [ -z "$FRAGMENT" ]; then
+      echo "Fragment missing. Run: pnpm changelog:add"
+      exit 1
+    fi
+
+    pnpm tsx changelog/scripts/validate.ts "$FRAGMENT"
+```
+
+**`check-migration-flag` job** (runs independently, no bypass): detects new SQL migration files with `git diff --name-only --diff-filter=A`. If migrations are present and `migration: false` in the fragment, it fails. This job cannot be bypassed by labels — a `skip-changelog` PR that adds a migration still triggers the check.
+
+The two jobs are independent by design. A PR can bypass fragment creation (via label) but still fail the migration check.
+
+---
+
+## Fragment Assembly at Release
+
+Fragments accumulate in `changelog/fragments/` as PRs merge. At release time, one command assembles them into a versioned CHANGELOG section.
+
+```bash
+pnpm changelog:assemble --version 1.8.0 [--dry-run]
+```
+
+What it does:
+1. Reads all `changelog/fragments/*.yml`
+2. Groups by type in fixed order (feat, fix, perf, refactor, security, docs, chore)
+3. Pulls `breaking: true` entries into a dedicated `🔨 Breaking Changes` section
+4. Annotates `migration: true` entries inline with `⚠️ Migration DB.`
+5. Replaces the `## [Next Release]` placeholder in `CHANGELOG.md`
+6. Archives fragments to `changelog/fragments/released/{version}/`
+
+Output:
+```markdown
+## [1.8.0] - 2026-03-15
+
+### 🔨 Breaking Changes
+- **Remove legacy token format (#871)** — Tokens issued before v1.6.0 are invalid.
+
+### ✨ New Features
+- **Add real-time presence indicators (#892)**
+
+### 🔧 Bug Fixes
+- **Fix empty chat after SSE race condition (#886)** — SSE workplan fires before
+  AI stream completes, causing ChatWrapper to mount with 0 messages.
+```
+
+---
+
+## Why 3 Layers, Not 1
+
+Each layer catches a different failure mode:
+
+| Layer | Failure caught | When |
+|-------|---------------|------|
+| CLAUDE.md rule | Claude forgets the workflow | Every session |
+| UserPromptSubmit hook | Developer types "make the PR" without thinking | Pre-prompt |
+| CI gate | Fragment was skipped or corrupt | Pre-merge |
+
+A single CI gate catches the issue too late — the developer has to context-switch back after their PR is already open. The hook catches it at intent time. The CLAUDE.md rule means Claude handles it autonomously when given the task.
+
+The layers don't conflict. They reinforce each other. A developer who sees the hook suggestion will run `pnpm changelog:add`. Claude will follow the CLAUDE.md rule and validate the output. CI confirms everything before merge.
+
+---
+
+## Adopting This Pattern
+
+The TypeScript scripts (add, validate, assemble, audit) are specific to the Méthode Aristote stack. The 3-layer enforcement pattern is not — it works with any fragment format, any CI system, any assembler.
+
+**Minimum viable setup:**
+
+1. **Define your fragment schema** (YAML, JSON, whatever fits your stack)
+2. **Add a CLAUDE.md rule** encoding the creation workflow so Claude can handle it autonomously
+3. **Add a `UserPromptSubmit` hook** with the `if PR-intent without fragment-mention → suggest` pattern
+4. **Add a CI job** that checks for fragment existence before merge
+
+The hook pattern generalizes to any mandatory workflow step. Substitute "changelog fragment" with "ADR", "migration flag", "test coverage check" — the conditional detection logic is the same.
+
+---
+
+## Related
+
+- Hook example: [`examples/hooks/bash/smart-suggest.sh`](../../examples/hooks/bash/smart-suggest.sh)
+- Hook documentation: [UserPromptSubmit Hooks](../ultimate-guide.md) (search "UserPromptSubmit")
+- Fragment validator and assembler scripts: available in the Méthode Aristote repository
--- a/guide/workflows/rpi.md
+++ b/guide/workflows/rpi.md
@ -0,0 +1,766 @@
+---
+title: "RPI: Research → Plan → Implement"
+description: "A 3-phase feature development pattern with explicit validation gates between phases"
+tags: [workflow, architecture, design-patterns, validation]
+---
+
+# RPI: Research → Plan → Implement
+
+> **Confidence**: Tier 2 — Synthesized from production team patterns. The gate-based structure aligns with Anthropic's guidance on agent task decomposition and agentic loop control.
+
+Build features in three locked phases: Research feasibility first, plan the implementation second, write code third. Each phase produces a concrete artifact. Each gate requires an explicit GO before the next phase starts.
+
+---
+
+## Table of Contents
+
+1. [TL;DR](#tldr)
+2. [When to Use RPI](#when-to-use-rpi)
+3. [How the Gates Work](#how-the-gates-work)
+4. [Phase 1: Research](#phase-1-research)
+5. [Phase 2: Plan](#phase-2-plan)
+6. [Phase 3: Implement](#phase-3-implement)
+7. [Slash Command Templates](#slash-command-templates)
+8. [Worked Example](#worked-example)
+9. [Comparison to Other Workflows](#comparison-to-other-workflows)
+10. [Tips and Troubleshooting](#tips-and-troubleshooting)
+11. [See Also](#see-also)
+
+---
+
+## TL;DR
+
+```
+Phase 1 — Research:
+  Claude explores feasibility, surfaces risks, asks decision questions
+  Output: RESEARCH.md
+  Gate: You decide GO / NO-GO
+
+Phase 2 — Plan:
+  Claude writes architecture decisions, user stories, test plan
+  Output: PLAN.md
+  Gate: You approve the plan before any code is written
+
+Phase 3 — Implement:
+  Claude implements step by step, tests pass before each next step
+  Output: working code + passing tests
+  Gate: each implementation step validated before the next begins
+```
+
+**Best for**: Features with unclear feasibility, more than a day of work, unknown technical territory, or anything where discovering a wrong assumption late is costly.
+
+---
+
+## When to Use RPI
+
+### Use RPI When
+
+- **Feasibility is unknown**: You have an idea but aren't sure it holds up technically
+- **Scope is large**: More than a day's worth of implementation work
+- **Requirements are fuzzy**: You know the outcome you want, not the path to get there
+- **Risk of wrong direction is high**: Security, payments, data migrations, integrations with external systems
+- **You've been surprised before**: A feature looked simple, turned out to involve 6 other systems
+
+### Skip RPI When
+
+| Scenario | Better Approach |
+|----------|----------------|
+| Fix is obvious (typo, wrong color) | Direct edit |
+| Feature is well-understood, requirements are clear | [Spec-First](./spec-first.md) or [dual-instance](./dual-instance-planning.md) |
+| Exploration mode — you don't know what you want yet | Exploration workflow |
+| Tiny change, single file | Just do it |
+
+### Decision Heuristic
+
+Ask yourself: "If the research phase reveals a serious problem, am I glad I didn't spend 2 days implementing first?"
+
+If yes, run RPI. The research phase typically takes 30-60 minutes and can save many hours.
+
+---
+
+## How the Gates Work
+
+RPI has two human gates and one automated gate per implementation step.
+
+```
+[Idea]
+   |
+   v
+[Phase 1: Research]
+   |
+   +- NO-GO -> Stop. Document why. Archive RESEARCH.md.
+   |
+   +- GO -------------------------------------------------------->
+                                                                 |
+                                                       [Phase 2: Plan]
+                                                                 |
+                                          +- Needs revision -> iterate with Claude
+                                          |
+                                          +- Approved -------------------------------->
+                                                                                      |
+                                                                          [Phase 3: Implement]
+                                                                              Step 1 -> Test gate
+                                                                              Step 2 -> Test gate
+                                                                              Step 3 -> Test gate
+                                                                                   |
+                                                                                [Done]
+```
+
+**Gate 1 (after Research)**: You read RESEARCH.md and make a GO/NO-GO decision. This is the most important gate — it prevents building the wrong thing entirely.
+
+**Gate 2 (after Plan)**: You review PLAN.md before any code is written. Minor revisions happen here at zero cost.
+
+**Step gates (during Implement)**: Each implementation step must have passing tests before Claude moves to the next step. Automated, no human action required unless a step fails.
+
+---
+
+## Phase 1: Research
+
+### What Research Covers
+
+The research phase answers five questions:
+
+1. **What already exists?** Relevant code, libraries, prior attempts in the codebase
+2. **What needs to be built?** Scope boundary, components to create vs modify
+3. **What are the risks?** Technical, security, integration, performance
+4. **What are the decision points?** Architecture choices that affect the whole plan
+5. **What is the effort estimate?** Rough sizing before committing to a plan
+
+Claude explores the codebase, reads relevant files, checks dependencies, and surfaces any constraint that would change the plan. The result is RESEARCH.md.
+
+### Starting Research
+
+Create the feature folder and invoke research:
+
+```bash
+mkdir -p .claude/features/[feature-name]
+```
+
+Then in Claude:
+
+```
+/rpi:research [feature description]
+```
+
+Or without the slash command:
+
+```
+Run RPI Phase 1 (Research) for: [feature description]
+
+Save output to .claude/features/[feature-name]/RESEARCH.md
+Use Plan Mode to explore without modifying code.
+```
+
+### RESEARCH.md Template
+
+```markdown
+# Research: [Feature Name]
+
+**Date**: [YYYY-MM-DD]
+**Requested**: [One-sentence description of the feature]
+**Status**: PENDING DECISION
+
+---
+
+## What Exists Today
+
+### Relevant Code
+- [file path]: [what it does, why it matters]
+- [file path]: [what it does, why it matters]
+
+### Relevant Libraries
+- [library]: [currently used / available / needs to be added]
+
+### Prior Attempts or Related Work
+- [any existing partial implementation, related PR, note in codebase]
+
+---
+
+## What Needs to Be Built
+
+### New Files
+- [file path]: [purpose]
+- [file path]: [purpose]
+
+### Files to Modify
+- [file path]: [what changes, why]
+
+### External Dependencies
+- [dependency]: [reason needed, version constraint if any]
+
+---
+
+## Risks
+
+| Risk | Likelihood | Impact | Notes |
+|------|-----------|--------|-------|
+| [risk description] | Low/Med/High | Low/Med/High | [mitigation or blocker] |
+
+---
+
+## Architecture Decision Points
+
+Questions that need a decision before planning can start:
+
+1. **[Decision]**: Option A (pros: X, cons: Y) vs Option B (pros: X, cons: Y)
+2. **[Decision]**: [Options and trade-offs]
+
+---
+
+## Effort Estimate
+
+- Research-to-plan: [time]
+- Implementation: [time range]
+- Testing: [time]
+- **Total estimate**: [range]
+
+**Confidence in estimate**: Low / Medium / High
+**Why**: [reason for confidence level]
+
+---
+
+## Recommendation
+
+[GO / NO-GO / NEEDS CLARIFICATION]
+
+[1-3 sentences explaining the recommendation]
+
+---
+
+**Decision**: [ ] GO  [ ] NO-GO  [ ] NEEDS CLARIFICATION
+**Notes**: [human fills this in]
+```
+
+### What a NO-GO Looks Like
+
+Not every research phase ends in GO. Common NO-GO reasons:
+
+- **Technical blocker**: External API doesn't support the required operation
+- **Scope creep discovered**: "Simple feature" turns out to require rewriting the auth layer
+- **Better alternative exists**: Research reveals an existing library or config change that solves the problem more simply
+- **Risk too high for now**: The feature is valid but the timing is wrong
+
+Archive NO-GO research docs — they're valuable records of decisions made and why.
+
+---
+
+## Phase 2: Plan
+
+### What the Plan Covers
+
+Phase 2 starts only after you mark RESEARCH.md with GO. Claude reads the research doc and produces a precise implementation plan.
+
+A good plan is specific enough that a different engineer (or Claude in a new session) could execute it without asking questions.
+
+```
+/rpi:plan .claude/features/[feature-name]/RESEARCH.md
+```
+
+Or without the slash command:
+
+```
+Run RPI Phase 2 (Plan) using:
+- Research: .claude/features/[feature-name]/RESEARCH.md
+
+Save output to .claude/features/[feature-name]/PLAN.md
+Do not write any code yet. Plan only.
+```
+
+### PLAN.md Template
+
+```markdown
+# Plan: [Feature Name]
+
+**Date**: [YYYY-MM-DD]
+**Research**: [link to RESEARCH.md]
+**Estimated effort**: [from research]
+**Risk level**: Low / Medium / High
+
+---
+
+## Summary
+
+[2-4 sentences: what this implements, major design decisions taken, what it does NOT include]
+
+---
+
+## Architecture Decisions
+
+[Record decisions made from the research decision points]
+
+1. **[Decision]**: Chose [Option] because [reason]
+2. **[Decision]**: Chose [Option] because [reason]
+
+---
+
+## Implementation Steps
+
+Steps must be sequential and independently testable.
+
+### Step 1: [Name]
+
+**Files**: [list of files to create or modify]
+**What to build**: [precise description]
+**Test gate**: [specific test or check that must pass before Step 2 starts]
+
+### Step 2: [Name]
+
+**Files**: [list of files to create or modify]
+**What to build**: [precise description]
+**Test gate**: [specific test or check that must pass before Step 3 starts]
+
+[...continue for all steps]
+
+---
+
+## Success Criteria
+
+- [ ] [Testable criterion — describes observable behavior, not implementation]
+- [ ] [Testable criterion]
+- [ ] All step test gates pass
+
+---
+
+## Out of Scope
+
+Explicitly list what this plan does NOT cover:
+- [thing excluded]
+- [thing excluded]
+
+---
+
+## Risks Accepted
+
+[From RESEARCH.md risks, list which are accepted and how they're mitigated]
+
+---
+
+## Rollback Plan
+
+If implementation fails mid-way:
+- [what to undo]
+- [how to restore previous state]
+
+---
+
+**Plan approved?** [ ] YES — proceed to implementation
+**Revision notes**: [human fills this in if changes needed]
+```
+
+### Reviewing the Plan
+
+Read PLAN.md carefully before approving. The goal is to catch design problems now, not during implementation. Specific things to check:
+
+- Implementation steps are in the right order, with no hidden dependencies
+- Test gates are concrete and runnable (not "it looks right")
+- Out-of-scope is explicit (prevents Claude from over-building)
+- Rollback plan exists for anything touching data or shared state
+
+If the plan needs changes, ask Claude to revise before approving. This is free — revision after approval costs implementation time.
+
+---
+
+## Phase 3: Implement
+
+### The Step-Gate Pattern
+
+Implementation runs step by step. Each step has a test gate. Claude does not start the next step until the current gate passes.
+
+```
+/rpi:implement .claude/features/[feature-name]/PLAN.md
+```
+
+Or without the slash command:
+
+```
+Run RPI Phase 3 (Implement) using:
+- Plan: .claude/features/[feature-name]/PLAN.md
+
+Rules:
+- Implement one step at a time
+- After each step, run the test gate specified in the plan
+- Do not start the next step until the test gate passes
+- If a test gate fails, stop and report the failure — do not improvise a fix
+- Commit after each step that passes its gate
+```
+
+### What Happens During Implementation
+
+Claude works through the plan's steps sequentially. For each step:
+
+1. Implements the specified files and changes
+2. Runs the test gate (unit tests, integration check, or manual verification)
+3. If gate passes: commits with a message referencing the step, announces readiness for the next step
+4. If gate fails: reports the failure, the specific test output, and the likely cause. Does not attempt to fix unless you confirm.
+
+The step-commit pattern gives you a clean git history that mirrors the plan. If something goes wrong in Step 4, you can roll back to the Step 3 commit cleanly.
+
+### Step-Gate Failure Protocol
+
+When a test gate fails, Claude reports:
+
+```
+Step [N] gate failed.
+
+Gate: [what was supposed to pass]
+Output:
+[actual test output]
+
+Likely cause: [Claude's diagnosis]
+Options:
+1. Fix: [specific change that would likely fix it]
+2. Revise plan: [if the plan step itself has a flaw]
+3. Stop and investigate: [if the failure reveals something unexpected]
+
+Which should I do?
+```
+
+You decide. Claude does not auto-fix and proceed — that's how implementations drift from plans.
+
+---
+
+## Slash Command Templates
+
+Save these to `.claude/commands/` to invoke each phase directly.
+
+### `/rpi:research`
+
+Save to `.claude/commands/rpi-research.md`:
+
+```markdown
+# RPI Phase 1: Research
+
+Run feasibility research for the requested feature.
+
+## Instructions
+
+1. Enter Plan Mode (do not modify files during research)
+2. Explore the codebase to answer these questions:
+   - What already exists that's relevant?
+   - What files will need to change or be created?
+   - What are the technical risks?
+   - What decisions need to be made before planning?
+   - What is a rough effort estimate?
+3. Save output to `.claude/features/$ARGUMENTS/RESEARCH.md` using the template below
+4. End with a clear recommendation: GO, NO-GO, or NEEDS CLARIFICATION
+5. Ask the user for their GO/NO-GO decision before proceeding
+
+## Constraints
+
+- Do NOT write any code
+- Do NOT modify any files
+- Do NOT start planning implementation steps
+- If uncertain about scope, surface it as a decision point
+```
+
+### `/rpi:plan`
+
+Save to `.claude/commands/rpi-plan.md`:
+
+```markdown
+# RPI Phase 2: Plan
+
+Create an implementation plan based on approved research.
+
+## Pre-check
+
+Before starting:
+1. Read the RESEARCH.md file specified in $ARGUMENTS
+2. Verify it has a GO decision marked
+3. If no GO decision found, stop and ask the user to decide first
+
+## Instructions
+
+1. Read RESEARCH.md carefully
+2. Create PLAN.md in the same feature folder
+3. Architecture decisions: resolve all decision points from research
+4. Implementation steps: each step must have a concrete test gate
+5. Success criteria: observable, testable, not implementation-internal
+6. Out-of-scope: explicit list of what this plan does NOT cover
+7. Do NOT write any implementation code
+8. After writing the plan, ask the user to review and approve
+
+## Constraints
+
+- Do NOT write implementation code
+- Steps must be sequential and independently testable
+- Test gates must be runnable commands or precise manual checks, not vague descriptions
+- Each step should be achievable in a single focused session
+```
+
+### `/rpi:implement`
+
+Save to `.claude/commands/rpi-implement.md`:
+
+```markdown
+# RPI Phase 3: Implement
+
+Implement the feature following an approved plan, one step at a time.
+
+## Pre-check
+
+Before starting:
+1. Read the PLAN.md file specified in $ARGUMENTS
+2. Verify it has an approval marked
+3. If no approval found, stop and ask the user to approve first
+
+## Instructions
+
+For each step in the plan:
+1. Read the step description carefully
+2. Implement only what the step specifies — nothing more
+3. Run the test gate exactly as written in the plan
+4. If the gate passes:
+   - Commit with message: "feat([feature]): step [N] — [step name]"
+   - Announce step completion and readiness for next step
+5. If the gate fails:
+   - Report the exact failure output
+   - Diagnose the likely cause
+   - Present options (fix, revise plan, stop)
+   - Wait for human decision before proceeding
+
+## Constraints
+
+- Never skip a test gate
+- Never start the next step before the current gate passes
+- Never modify files outside the scope of the current step
+- If you encounter something unexpected that changes the plan, stop and report it
+- Commit after each passing step — not at the end
+```
+
+---
+
+## Worked Example
+
+**Request**: "Add rate limiting to the public API endpoints."
+
+### Phase 1: Research Output (abbreviated)
+
+```markdown
+# Research: API Rate Limiting
+
+**Date**: 2026-03-12
+**Status**: PENDING DECISION
+
+## What Exists Today
+
+- `src/middleware/` — has auth middleware, no rate limiting
+- `package.json` — express-rate-limit not installed, redis available
+- `src/routes/api.ts` — 14 public endpoints, unauthenticated routes mixed with authenticated ones
+
+## What Needs to Be Built
+
+- Rate limiter middleware for public endpoints
+- Separate limits for authenticated vs unauthenticated users
+- Redis store for distributed rate limiting (app runs on 3 instances)
+
+## Risks
+
+| Risk | Likelihood | Impact | Notes |
+|------|-----------|--------|-------|
+| Redis connection failure disables all API access | Low | High | Need fallback to in-memory if Redis unavailable |
+| Rate limit too aggressive — breaks existing integrations | Medium | High | Need to survey current usage patterns first |
+
+## Architecture Decision Points
+
+1. **Library**: express-rate-limit (maintained, battle-tested) vs custom middleware
+2. **Bypass for trusted IPs**: Allow internal services to bypass rate limiting?
+
+## Effort Estimate
+
+- Implementation: 2-4 hours
+- Testing: 2 hours
+- **Total**: 4-6 hours
+
+**Recommendation**: GO — standard problem, good library options, main risk is Redis fallback which is solvable.
+```
+
+**Human decision**: GO. Use express-rate-limit. No IP bypass for now.
+
+### Phase 2: Plan (abbreviated)
+
+```markdown
+# Plan: API Rate Limiting
+
+**Risk level**: Medium (shared Redis state, potential to block legitimate traffic)
+
+## Architecture Decisions
+
+1. Library: express-rate-limit with rate-limit-redis store
+2. No IP bypass initially — revisit if internal service issues arise
+3. Unauthenticated: 100 requests/15 minutes. Authenticated: 1000 requests/15 minutes.
+4. Redis failure fallback: in-memory store (accepts single-instance inconsistency)
+
+## Implementation Steps
+
+### Step 1: Install dependencies and configure Redis store
+
+**Files**: `package.json`, `src/config/rate-limit.ts`
+**Test gate**: `npm install` completes, `src/config/rate-limit.ts` exports config without errors
+
+### Step 2: Implement rate limiter middleware
+
+**Files**: `src/middleware/rate-limit.ts`
+**Test gate**: Unit test — limiter blocks 101st request from same IP within 15 minutes
+
+### Step 3: Apply to routes
+
+**Files**: `src/routes/api.ts`
+**Test gate**: Integration test — unauthenticated route returns 429 after 100 requests; authenticated route does not
+
+### Step 4: Add Redis fallback
+
+**Files**: `src/config/rate-limit.ts`
+**Test gate**: Test with Redis unavailable — API still responds (200, not 500), in-memory limiting active
+```
+
+**Human review**: Approved.
+
+### Phase 3: Implementation
+
+Claude implements Step 1, runs the gate (`npm install` + import check), commits `feat(rate-limit): step 1 — dependencies and config`. Then Step 2, Step 3, Step 4 in sequence. Each commit is clean. Each gate must pass before proceeding.
+
+---
+
+## Comparison to Other Workflows
+
+| Workflow | Phase structure | Human gates | Best for |
+|----------|----------------|------------|----------|
+| **RPI** | Research + Plan + Implement | GO/NO-GO + plan approval | Unknown feasibility, >1 day, high risk of wrong direction |
+| **Dual-Instance** | Plan + Implement (separate Claude instances) | Plan approval | Known features needing careful execution, spec-heavy work |
+| **Spec-First** | Spec + Implement | None (spec is implicit gate) | Design-focused work, API contracts, team alignment |
+| **TDD** | Test-first + Implement | None (tests are the gate) | Test coverage as driver, refactoring, incremental behavior |
+| **Direct** | None | None | Simple changes, obvious scope, less than 2 hours |
+
+### RPI vs Dual-Instance
+
+Dual-instance separates planning and implementation into two Claude instances with strict role enforcement. It works well when you already know what you're building and want a high-quality plan. RPI adds a feasibility phase before planning, which makes it better for ambiguous requests. If you already have a clear spec, skip Research and use dual-instance or spec-first.
+
+### RPI vs Spec-First
+
+Spec-first is design-oriented: you define what the system should do, then Claude implements it. RPI is implementation-oriented with validation gates: you describe a goal, Claude researches how to achieve it, then the two of you agree on a plan before touching code. Use spec-first when the design is clear. Use RPI when the technical path is not.
+
+### RPI vs Direct Coding
+
+For anything under 2 hours with a clear scope, just ask Claude to do it. RPI adds overhead that isn't justified for small tasks. The research phase alone takes 30-60 minutes. That overhead pays off on multi-day features where discovering a wrong assumption late is much more expensive.
+
+---
+
+## Tips and Troubleshooting
+
+### Claude Skips to Implementation in Research Phase
+
+**Problem**: Claude starts writing code during the research phase.
+
+**Solution**: Use Plan Mode (Shift+Tab twice) explicitly for the research phase. Include this line in the research command:
+
+```
+You are in Plan Mode. Do not modify files. Do not write implementation code.
+Research only. Output goes to RESEARCH.md.
+```
+
+Or add to your CLAUDE.md:
+
+```markdown
+## RPI Rules
+- /rpi:research runs in Plan Mode only — no file modifications
+- /rpi:plan produces only PLAN.md — no implementation code
+- /rpi:implement runs one step at a time, waits for test gate before next step
+```
+
+### Research Phase Runs Too Long
+
+**Problem**: Claude explores the entire codebase instead of focusing on what's relevant.
+
+**Solution**: Scope the research explicitly:
+
+```
+/rpi:research payment-processing
+
+Focus area: src/payments/, src/routes/checkout.ts
+Do not explore: frontend, auth, unrelated backend modules
+Time budget: complete research in one session
+```
+
+### Plan Has Too Many Steps
+
+**Problem**: PLAN.md has 12 steps, making the implementation unwieldy.
+
+**Solution**: A plan with more than 6-8 steps usually needs a scope reduction, not more granular implementation. Ask Claude:
+
+```
+The plan has too many steps. What is the minimal viable scope that delivers
+the core value? Revise the plan to implement only that, with a clear
+"Future work" section for the rest.
+```
+
+### Test Gates Are Vague
+
+**Problem**: A step's test gate says "verify it works" rather than a specific command.
+
+**Solution**: Reject vague gates before approving the plan. Push back with:
+
+```
+Step 3's test gate is "verify rate limiting works." Make it specific:
+what command do I run, and what output do I expect to see when it passes?
+```
+
+A good test gate:
+
+```
+Test gate: `npm test src/middleware/rate-limit.test.ts` — all 4 tests pass
+```
+
+A bad test gate:
+
+```
+Test gate: rate limiting is working correctly
+```
+
+### A Step Gate Fails Repeatedly
+
+**Problem**: Step 2's gate keeps failing even after attempted fixes.
+
+**Solution**: Two failures means investigate the plan, not keep iterating on fixes.
+
+```
+Stop implementation. The Step 2 gate has failed twice.
+Review the plan — is the test gate achievable given the Step 1 output?
+Do we need to revise the plan before continuing?
+```
+
+---
+
+## File Structure Summary
+
+```
+.claude/
+└── features/
+    └── [feature-name]/
+        ├── RESEARCH.md     # Phase 1 output (human annotates GO/NO-GO)
+        └── PLAN.md         # Phase 2 output (human annotates approval)
+
+.claude/commands/
+├── rpi-research.md         # /rpi:research slash command
+├── rpi-plan.md             # /rpi:plan slash command
+└── rpi-implement.md        # /rpi:implement slash command
+```
+
+Archive completed features:
+
+```bash
+mkdir -p .claude/features/_archive
+mv .claude/features/payment-processing .claude/features/_archive/
+```
+
+The archive is a learning resource: completed RESEARCH.md and PLAN.md files show how previous features were reasoned about.
+
+---
+
+## See Also
+
+- [dual-instance-planning.md](./dual-instance-planning.md) — Two-instance pattern for spec-heavy implementation
+- [spec-first.md](./spec-first.md) — Design-first workflow using CLAUDE.md as contract
+- [tdd-with-claude.md](./tdd-with-claude.md) — Combine with TDD for test-gate implementation
+- [task-management.md](./task-management.md) — Managing multi-phase tasks across sessions
+- **Main guide**: Advanced Patterns section — Multi-instance and planning pattern overview
--- a/machine-readable/reference.yaml
+++ b/machine-readable/reference.yaml
@ -106,6 +106,13 @@ deep_dive:
  rpi_phase3_implement: "guide/workflows/rpi.md:237"  # Step-gate pattern + failure protocol
  rpi_slash_commands: "guide/workflows/rpi.md:281"  # /rpi:research /rpi:plan /rpi:implement templates
  rpi_vs_other_workflows: "guide/workflows/rpi.md:391"  # Comparison table: RPI vs dual-instance vs spec-first
+  # Changelog Fragments Workflow
+  changelog_fragments_workflow: "guide/workflows/changelog-fragments.md"  # Per-PR YAML fragments, 3-layer enforcement
+  changelog_fragments_claude_rule: "guide/workflows/changelog-fragments.md:26"  # CLAUDE.md rule for autonomous fragment creation
+  changelog_fragments_hook: "guide/workflows/changelog-fragments.md:88"  # UserPromptSubmit 3-tier hook (enforcement/discovery/contextual)
+  changelog_fragments_ci: "guide/workflows/changelog-fragments.md:169"  # Independent CI migration check job
+  # Smart-Suggest Hook
+  smart_suggest_hook: "examples/hooks/bash/smart-suggest.sh"  # UserPromptSubmit behavioral coach, 3-tier priority, ROI logging
  # Template Installation
  install_templates_script: "scripts/install-templates.sh"
  # Session management