docs: add RPI workflow, changelog fragments, smart-suggest hook + LLM variance

- guide/workflows/rpi.md (new): Research → Plan → Implement, 3-phase pattern
  with explicit GO gates, slash command templates, worked example
- guide/workflows/changelog-fragments.md (new): per-PR YAML fragment enforcement,
  3-layer system (CLAUDE.md rule + UserPromptSubmit hook + CI gate)
- examples/hooks/bash/smart-suggest.sh (new): UserPromptSubmit behavioral coach,
  3-tier priority (enforcement/discovery/contextual), ROI logging
- guide/core/known-issues.md: LLM Day-to-Day Performance Variance section,
  4 root causes (probabilistic inference, MoE routing, infra, context sensitivity)
- guide/workflows/README.md: added RPI entry + quick selection row
- machine-readable/reference.yaml: added entries for changelog_fragments, smart_suggest
- CHANGELOG.md: [Unreleased] entries for all 4 new items
- IDEAS.md: prompt-caching MCP plugin research note (testing in progress)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
Florian BRUNIAUX 2026-03-13 16:22:57 +01:00
parent 13efb5a774
commit b6ce1ef72f
8 changed files with 1251 additions and 0 deletions

View file

@ -8,6 +8,14 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
### Added
- **`guide/workflows/rpi.md`** — New workflow guide (RPI: Research → Plan → Implement). 3-phase feature development pattern with explicit validation gates: Research produces `RESEARCH.md`, Plan produces `PLAN.md`, Implement produces working code. Each gate requires explicit GO before the next phase. Includes slash command templates (`/rpi:research`, `/rpi:plan`, `/rpi:implement`), a worked example (adding rate limiting to an Express API), and comparison matrix vs Plan-Driven, TDD, and Spec-First. Best for features where discovering a wrong assumption late is expensive.
- **`guide/workflows/changelog-fragments.md`** — New workflow guide for the Changelog Fragments pattern: one YAML fragment per PR, written at implementation time, validated by CI, assembled automatically at release. Covers 3-layer enforcement: (1) CLAUDE.md workflow rule for autonomous fragment creation, (2) `UserPromptSubmit` hook with 3-tier priority (enforcement → discovery → contextual), (3) independent CI migration check job. Includes the `UserPromptSubmit` tier pattern as a reusable hook architecture for any mandatory workflow step.
- **`examples/hooks/bash/smart-suggest.sh`** — New `UserPromptSubmit` hook implementing the 3-tier behavioral coach pattern: Tier 0 enforcement (changelog fragment required before PR, plan-before-code), Tier 1 discovery (test-loop, retex, dupes, monitoring, security, release), Tier 2 contextual (code review, debugging, architecture, session resume). Max 1 suggestion per prompt (first match wins), dedup guard, ROI logging to `~/.claude/logs/smart-suggest.jsonl`, silent exit on no match.
- **`guide/core/known-issues.md`** — New section "LLM Day-to-Day Performance Variance": documents session-to-session output quality variance (shorter responses, conservative suggestions, edge-case refusals) as expected behavior, not a bug. Explains the 4 root causes (probabilistic inference, MoE routing variance, infrastructure variance, context sensitivity) and provides an observable signals table. Includes a practical checklist for ruling out controllable factors before concluding "the model degraded."
- **`cc-sessions discover` documentation** — New subsection "Session Pattern Discovery" in §2.x (Session Management) covering the `discover` subcommand: n-gram mode (local, free, ~3s for 12 projects) vs `--llm` mode (semantic analysis via `claude --print`). Includes example output, the 20% rule decision framework (CLAUDE.md rule / skill / command categorization), and install instructions. Cross-reference added after the 20% rule callout in §5.1.
- **`examples/scripts/cc-sessions.py` synced** — Updated from 498-line stale copy to the full 1225-line version from `~/bin/cc-sessions`. Includes the complete `discover` subcommand (n-gram analysis + `--llm` mode), incremental discover cache, Jaccard deduplication, and all filtering logic. GitHub source header added.

View file

@ -85,6 +85,36 @@ CLAUDE.md configuration examples by framework:
## Watching (Waiting for Demand)
a### prompt-caching MCP Plugin
MCP plugin that automates `cache_control` placement for developers building apps on the Anthropic SDK. Installed locally at `/Users/florianbruniaux/Sites/prompt-caching` and connected to Claude Code via `~/.claude.json`.
**Status:** Testing in progress. Real usage data required before any documentation decision.
**What we know:**
- 29 stars, v1.3.0, solo maintainer — maintenance risk
- Author-reported benchmarks (80-92% savings) — unverified, cannot cite
- Fills a real gap: no other MCP tool does this; Spring AI / LiteLLM / Pydantic AI serve different audiences
- Blog post (Mathieu Grenier) independently documents the same pain point + 5 antipatterns — score 3/5, worth integrating in "Strategy 6" regardless
**Open questions:**
- [ ] Do real sessions on this project actually hit the cache? (run `get_cache_stats` after 10+ turns)
- [ ] Is the plugin stable enough to recommend? Any errors, memory leaks, session issues?
- [ ] What are the real savings on a CLAUDE.md-heavy project like this guide?
**If test results are positive (cache hits confirmed, no stability issues):**
- Add to `guide/ecosystem/third-party-tools.md` with verified stats (not README claims)
- Add to landing third-party tools section
- Score upgrade: 3/5 → 4/5
**If test results are inconclusive or plugin is unstable:**
- Move to Discarded Ideas
- Keep the Mathieu Grenier blog post integration (independent value)
**Check again:** After 1 week of real usage
---
### Multi-LLM Consultation Patterns
Using external LLMs (Gemini, GPT-4) as "second opinion" from Claude Code.

View file

@ -0,0 +1,176 @@
#!/bin/bash
# .claude/hooks/smart-suggest.sh
# Event: UserPromptSubmit
# Non-blocking behavioral coach: suggests the right command or agent based on detected intent
#
# Architecture: 3-tier priority system
# Tier 0 — Enforcement: mandatory workflow gates (runs first, exits on match)
# Tier 1 — Discovery: high-value tools the developer may not know exist
# Tier 2 — Contextual: opportunistic suggestions for common patterns
#
# Properties:
# - Max 1 suggestion per prompt (first match wins)
# - Dedup guard: never suggest a command already in the prompt
# - ROI logging: records suggestions to ~/.claude/logs/smart-suggest.jsonl
# - Silent exit on no match (non-blocking, never fails)
#
# Customization: replace patterns with your own commands/agents.
# Keep Tier 0 small and precise — it intercepts before everything else.
set -euo pipefail
INPUT=$(cat)
PROMPT=$(echo "$INPUT" | jq -r '.prompt // empty' 2>/dev/null || true)
# Skip prompts that are too short to match anything meaningful
if [[ -z "$PROMPT" || ${#PROMPT} -lt 8 ]]; then
exit 0
fi
PROMPT_LC=$(echo "$PROMPT" | tr '[:upper:]' '[:lower:]')
# Skip slash commands — user already knows what they want
if [[ "$PROMPT_LC" =~ ^/ ]]; then
exit 0
fi
LABEL_CMD="Command"
LABEL_AGENT="Agent"
# ─── suggest() ────────────────────────────────────────────────────────────────
# Emits one suggestion then exits (guarantees max 1 suggestion per prompt).
# Dedup: if the command/agent name is already in the prompt, exits silently.
suggest() {
local label="$1" name="$2" reason="$3"
# Strip leading slash, take first token — prevents matching "pr" in "improve"
# e.g. "/changelog:add" → "changelog:add", "code-reviewer" → "code-reviewer"
local check
check=$(echo "$name" | tr '[:upper:]' '[:lower:]' | sed 's|^/||' | awk '{print $1}')
if echo "$PROMPT_LC" | grep -qF "$check"; then
exit 0
fi
# Append to JSONL log for ROI measurement (non-blocking — failure is safe)
local logdir="${HOME}/.claude/logs"
mkdir -p "$logdir" 2>/dev/null || true
printf '{"ts":"%s","suggested":"%s","prompt_len":%d}\n' \
"$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$name" "${#PROMPT}" \
>> "$logdir/smart-suggest.jsonl" 2>/dev/null || true
cat << EOF
{
"hookSpecificOutput": {
"hookEventName": "UserPromptSubmit",
"additionalContext": "[Suggestion] $label: $name -- $reason"
}
}
EOF
exit 0
}
# ==============================================================================
# TIER 0 — Enforcement
# Intercepts workflow violations before any other suggestion.
# Keep patterns tight: a false positive here is more disruptive than a miss.
# ==============================================================================
# Changelog fragment enforcement
# If the developer signals intent to create a PR without mentioning a changelog
# fragment, redirect to the fragment creation step first.
# When fragment/skip-changelog is already mentioned → suggest the PR command normally.
if echo "$PROMPT_LC" | grep -qE '(create.*pr|open.*pr|make.*pr|pull.?request|push.*pr)'; then
if ! echo "$PROMPT_LC" | grep -qE '(changelog|fragment|skip-changelog)'; then
suggest "$LABEL_CMD" "pnpm changelog:add" \
"REQUIRED before merge — creates changelog/fragments/{PR}-{slug}.yml (or add label 'skip-changelog' for tooling PRs)"
else
suggest "$LABEL_CMD" "/pr" \
"PR creation with structured description and auto-detected scope"
fi
fi
# Plan-before-code enforcement
# Intercepts implementation intent when no planning context is present.
# Negative lookahead excludes: plan, test, fix, review, explain, refactor.
if echo "$PROMPT_LC" | grep -qE '(implement|add (the|a|an)|create (the|a|an) (service|router|endpoint|feature|component)|write (the )?(code|logic|service))'; then
if ! echo "$PROMPT_LC" | grep -qE '(plan|test|fix|bug|review|refactor|explain|why|how)'; then
suggest "$LABEL_CMD" "/plan" \
"Run /plan BEFORE coding — Research → Spec → Implement reduces rework"
fi
fi
# ==============================================================================
# TIER 1 — Discovery
# High-value tools the developer may not know about.
# Use multi-word patterns to reduce false positives.
# ==============================================================================
# Failing tests — suggest auto-fix loop before generic debug
if echo "$PROMPT_LC" | grep -qE '(tests? (fail|broken|red)|fix (the |these )?tests?|tests? (are )?failing)'; then
if ! echo "$PROMPT_LC" | grep -qE '(api|server|endpoint|500|404)'; then
suggest "$LABEL_CMD" "/test-loop" \
"Auto-fix loop: 1 test per cycle with convergence stats"
fi
fi
# Lesson learned / rollback scenario
if echo "$PROMPT_LC" | grep -qE '(wrong approach|bad approach|should have|dead end|root cause was|lesson learned|introduced.*bug|regression)'; then
suggest "$LABEL_CMD" "/retex" \
"Capture this lesson in persistent memory (searchable across sessions)"
fi
# Code duplication detected
if echo "$PROMPT_LC" | grep -qE '(duplicat|copy.?past|same code|repeated (code|logic)|identical (code|function))'; then
suggest "$LABEL_CMD" "/dupes" \
"Detect copy-paste code with jscpd before the refactor"
fi
# Monitoring / polling intent — suggest background loop
if echo "$PROMPT_LC" | grep -qE '(check (every|each|all)|wait (until|for)|poll|monitor|watch.*for|when (it.?s |it is )?done|once (the )?deploy|once (the )?ci)'; then
suggest "$LABEL_CMD" "/loop [interval] [prompt]" \
"Run in background while you work: /loop 5m check the deploy status"
fi
# Security-sensitive keywords
if echo "$PROMPT_LC" | grep -qE '(sql injection|xss|owasp|vulnerability|security audit|auth bypass)'; then
suggest "$LABEL_AGENT" "security-auditor" \
"Read-only OWASP audit — detects injection, auth, and data exposure issues"
fi
# Release assembly — suggest structured release command
if echo "$PROMPT_LC" | grep -qE '(cut.*release|prepare.*release|new (version|release)|release.*[0-9]+\.[0-9]|tag.*version|deploy.*main|merge.*main)'; then
suggest "$LABEL_CMD" "/release" \
"Assembles fragments → CHANGELOG section, bumps version, creates release branch"
fi
# ==============================================================================
# TIER 2 — Contextual
# General patterns, lower priority, regex kept tight to avoid noise.
# ==============================================================================
# Code review before merge
if echo "$PROMPT_LC" | grep -qE '(review (the |this )?(code|pr|file)|code review|before (merge|merging))'; then
suggest "$LABEL_AGENT" "code-reviewer" \
"Quality + security + performance review"
fi
# Debugging — generic
if echo "$PROMPT_LC" | grep -qE '(debug (this|the)|fix (this |the )?(bug|crash|error)|stack trace|not (working|loading))'; then
suggest "$LABEL_AGENT" "debugger" \
"Systematic root cause analysis"
fi
# Architecture decision
if echo "$PROMPT_LC" | grep -qE '(architecture (decision|review)|system design|big refactor|large.*refactor)'; then
suggest "$LABEL_AGENT" "architect-review" \
"Validates architectural decisions across scalability, coupling, and tradeoffs"
fi
# Resume previous session
if echo "$PROMPT_LC" | grep -qE '(resume (session|work)|pick up (where|from)|continue (from )?(yesterday|last session))'; then
suggest "$LABEL_CMD" "/resume" \
"Restores session context from persistent memory"
fi
# No match — silent exit (never blocks the user)
exit 0

View file

@ -234,6 +234,61 @@ Key quote:
---
## 🔄 LLM Day-to-Day Performance Variance
**Type**: Expected behavior (not a bug)
**Severity**: 🟡 **LOW - AWARENESS**
**Status**: Inherent to LLM inference, not specific to any version
### What This Is
Claude's output quality can vary noticeably from session to session, even with identical prompts and a clean context window. This is distinct from context window degradation (which happens within a session as context fills up). This is about variance between fresh sessions.
Users sometimes report shorter responses, more conservative suggestions, or unexpected refusals on tasks that worked fine the day before. This can feel like a model downgrade, but it is not.
### Root Causes
**Probabilistic inference**: Temperature above 0 means every inference run is non-deterministic. Two runs of the same prompt will produce different token sequences. This is fundamental to how language models work.
**MoE routing variance**: Claude uses a Mixture of Experts architecture. On each forward pass, a routing mechanism selects which expert weights to activate. Different runs activate different combinations, producing different outputs even for semantically identical inputs.
**Infrastructure variance**: In production, requests hit different servers with different load levels, hardware generations, and thermal states. These factors influence numerical precision in floating-point arithmetic during inference, creating subtle but real output differences.
**Context sensitivity**: Even with `/clear`, tiny differences between sessions accumulate. The system prompt, tool list, and session initialization all slightly affect the model's first outputs.
### Observable Signals
| Signal | What You See | What It Means |
|--------|-------------|---------------|
| Response length | Shorter, less detailed than usual | Routing hit a more conservative path |
| Refusals | Edge cases that normally work get refused | Different safety calibration on this run |
| Code style | More verbose or more minimal than expected | Expert mix activated differently |
| Creativity | More conservative, less inventive suggestions | Not a capability loss, a sampling outcome |
| Verbosity | More caveats and disclaimers than usual | Normal variance in token probabilities |
### What This Is NOT
- **Not a model downgrade**: Anthropic versions models deliberately and documents changes. Day-to-day variance happens within the same model version.
- **Not a bug to report**: This behavior is expected and documented in LLM literature. It is inherent to probabilistic inference.
- **Not permanent**: The next session will likely behave differently. A "bad" run does not indicate a lasting change.
- **Not context window degradation**: That is a within-session phenomenon caused by token accumulation. This is between-session variance on fresh starts.
> The Aug-Sep 2025 incident ([see Resolved Issues above](#model-quality-degradation-aug-sep-2025)) was the exception: Anthropic confirmed actual infrastructure bugs causing systematic degradation. True systematic degradation is rare and Anthropic investigates it. Normal session-to-session variance is something else.
### Mitigation Strategies
**Constrain the prompt**: More specific prompts reduce the output space and make variance less noticeable. "Write a function that does X, Y, Z, returns type T, handles edge case E" produces more consistent outputs than "write me something to handle X."
**Fresh context before important work**: Run `/clear` before a high-stakes task. Accumulated session noise from earlier exploratory work can skew subsequent outputs even within the same session.
**Reformulate and retry**: If an output seems off compared to your expectations, try the same request with different framing. A second formulation often routes through different expert paths and produces a better result.
**Compare against a known-good prompt**: If you have a prompt from a previous session that produced excellent output, use it as a reference. If today's version of that prompt produces visibly worse output consistently, that warrants closer investigation (and potentially a GitHub issue if reproducible).
**Calibrate expectations by task type**: Deterministic tasks (regex, simple transforms, well-defined algorithms) show less variance than creative or judgment-heavy tasks. Use Claude Code for the former with high reliability; for the latter, build review steps into your workflow.
---
## 📊 Issue Statistics (as of Jan 28, 2026)
| Metric | Count | Source |

View file

@ -83,6 +83,14 @@ Eliminates merge conflicts on `CHANGELOG.md`, captures context at implementation
- Conditional suggestion pattern: "if PR-intent without fragment-mention"
- CI enforcement with independent migration check job
### [RPI: Research → Plan → Implement](./rpi.md) ⭐ NEW
**3-phase feature development with explicit validation gates between phases**
Build features in three locked phases: Research feasibility first, plan the implementation second, write code third. Each phase produces a concrete artifact (RESEARCH.md → PLAN.md → code). Each gate requires an explicit GO before the next phase starts.
**When to use**: Features with unclear feasibility, more than a day of work, unknown technical territory, or anywhere discovering a wrong assumption late is costly
### [Cognitive Mode Switching](./gstack-workflow.md) ⭐ NEW
Switch between specialist roles across your ship cycle: strategic product gate, architecture review, paranoid code review, automated release, native browser QA, and retrospective.
@ -190,6 +198,7 @@ Multi-session task tracking with TodoWrite, tasks API, and context persistence a
| **New project from template** | [Skeleton Projects](./skeleton-projects.md) |
| **Team AI instructions** | [Team AI Instructions](./team-ai-instructions.md) |
| **Enforce mandatory workflow steps** | [Changelog Fragments](./changelog-fragments.md) |
| **Unknown feasibility, multi-day feature** | [RPI: Research → Plan → Implement](./rpi.md) |
| **Documentation** | [PDF Generation](./pdf-generation.md) |
| **Social previews** | [OG Image Generation](./og-image-generation.md) |
| **Conference talk from raw material** | [Talk Preparation Pipeline](./talk-pipeline.md) |

View file

@ -0,0 +1,200 @@
# Changelog Fragments: Enforced Per-PR Documentation
A 3-layer enforcement pattern that ensures every PR is documented at write time, never at release time.
---
## The Problem
Single `CHANGELOG.md` files break on active teams. Three open feature branches all touching the same file means merge conflicts on every merge. Someone resolves the conflict, drops a line, and release notes are wrong before they're published.
The deeper problem is timing. "Document at release time" sounds reasonable until you're staring at PR #840 three weeks after it merged, trying to reconstruct what changed for users. The commit says `fix session handling`. The developer is in a different timezone. Context is gone.
Enforcement without CI gates means changelogs become a nag job. Someone has to chase people before every release, under time pressure, filling in blanks from git log.
The solution: one YAML fragment per PR, written while implementing, validated by CI, assembled automatically at release.
---
## The 3-Layer Architecture
The system works because enforcement happens at three independent levels. Each layer catches a different failure mode.
### Layer 1: CLAUDE.md Workflow Rule
The first layer is a rule loaded into Claude Code's context at every session. It encodes the entire fragment workflow so Claude can complete it autonomously when asked to create a PR.
```markdown
# git-workflow.md (loaded via CLAUDE.md)
## Changelog Fragment — Required Before Every PR
Before creating a PR, always generate a changelog fragment.
### Steps
1. **Infer from git diff** — analyze `git diff main...HEAD` to determine:
- `type`: feat | fix | perf | refactor | security | docs | chore
- `scope`: the functional area affected (auth, sessions, api, etc.)
- `title`: one-line user-facing summary (< 80 chars)
2. **Create the fragment** — run `pnpm changelog:add` or write directly:
```
changelog/fragments/{PR_NUMBER}-{slug}.yml
```
3. **Validate** — run `pnpm changelog:validate changelog/fragments/{file}.yml`
4. **Commit alongside the PR** — include the fragment in the same branch
### Fragment schema
```yaml
pr: 886 # must match filename prefix
type: fix # feat|fix|perf|refactor|security|docs|chore
scope: "visiochat"
title: "Fix empty chat after SSE race condition" # < 80 chars
description: | # optional — explain user impact, not implementation
SSE workplan fires before AI stream completes, causing ChatWrapper
to mount with 0 messages.
breaking: false
migration: false # set true if PR adds a DB migration
```
### Bypass
Add label `skip-changelog` for PRs with no user impact (CI config, deps updates, release commits).
```
This rule makes Claude Code a participant in enforcement, not just a coding tool. When a developer says "make the PR," Claude infers the fragment content from the diff and creates it before opening the PR.
### Layer 2: UserPromptSubmit Hook (Behavioral Detection)
The second layer intercepts intent before it becomes action. When the developer types something that signals PR creation intent, the hook checks whether a changelog fragment was mentioned.
```bash
# .claude/hooks/smart-suggest.sh (excerpt — Tier 0 enforcement)
# PR creation intent detected
if echo "$PROMPT_LC" | grep -qE '(create.*pr|open.*pr|make.*pr|pull.?request|push.*pr)'; then
# Fragment not mentioned → redirect to creation step first
if ! echo "$PROMPT_LC" | grep -qE '(changelog|fragment|skip-changelog)'; then
suggest "pnpm changelog:add" \
"REQUIRED before merge — creates changelog/fragments/{PR}-{slug}.yml"
else
# Already mentioned → suggest the PR command normally
suggest "/pr" "PR creation with structured description"
fi
fi
```
The hook is `UserPromptSubmit`: non-blocking, max one suggestion per prompt, silent on no match. It runs before Claude Code processes the prompt, so the developer sees the reminder inline before Claude starts doing anything.
The conditional logic (`if X without Y`) is the key pattern here. It's not a blanket blocker — it adapts to context. If the developer already mentioned the fragment, they get the normal suggestion. If they didn't, they get the enforcement reminder.
**Full hook with 3-tier architecture**: [`examples/hooks/bash/smart-suggest.sh`](../../examples/hooks/bash/smart-suggest.sh)
### Layer 3: CI Enforcement (GitHub Actions)
The third layer is the hard gate. Two independent jobs run on every PR targeting the main branch.
**`check-fragment` job**: checks bypass labels first (closed list), then requires `changelog/fragments/{PR_NUMBER}-*.yml` to exist and pass structural validation.
```yaml
- name: Check fragment exists and is valid
env:
PR_NUMBER: ${{ github.event.pull_request.number }}
PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
run: |
SKIP_LABELS=("skip-changelog" "dependencies" "release" "chore: deps")
for LABEL in "${SKIP_LABELS[@]}"; do
if echo "$PR_LABELS" | grep -q "\"$LABEL\""; then
echo "Bypass label detected — fragment not required"
exit 0
fi
done
FRAGMENT=$(ls "changelog/fragments/${PR_NUMBER}-"*.yml 2>/dev/null | head -1)
if [ -z "$FRAGMENT" ]; then
echo "Fragment missing. Run: pnpm changelog:add"
exit 1
fi
pnpm tsx changelog/scripts/validate.ts "$FRAGMENT"
```
**`check-migration-flag` job** (runs independently, no bypass): detects new SQL migration files with `git diff --name-only --diff-filter=A`. If migrations are present and `migration: false` in the fragment, it fails. This job cannot be bypassed by labels — a `skip-changelog` PR that adds a migration still triggers the check.
The two jobs are independent by design. A PR can bypass fragment creation (via label) but still fail the migration check.
---
## Fragment Assembly at Release
Fragments accumulate in `changelog/fragments/` as PRs merge. At release time, one command assembles them into a versioned CHANGELOG section.
```bash
pnpm changelog:assemble --version 1.8.0 [--dry-run]
```
What it does:
1. Reads all `changelog/fragments/*.yml`
2. Groups by type in fixed order (feat, fix, perf, refactor, security, docs, chore)
3. Pulls `breaking: true` entries into a dedicated `🔨 Breaking Changes` section
4. Annotates `migration: true` entries inline with `⚠️ Migration DB.`
5. Replaces the `## [Next Release]` placeholder in `CHANGELOG.md`
6. Archives fragments to `changelog/fragments/released/{version}/`
Output:
```markdown
## [1.8.0] - 2026-03-15
### 🔨 Breaking Changes
- **Remove legacy token format (#871)** — Tokens issued before v1.6.0 are invalid.
### ✨ New Features
- **Add real-time presence indicators (#892)**
### 🔧 Bug Fixes
- **Fix empty chat after SSE race condition (#886)** — SSE workplan fires before
AI stream completes, causing ChatWrapper to mount with 0 messages.
```
---
## Why 3 Layers, Not 1
Each layer catches a different failure mode:
| Layer | Failure caught | When |
|-------|---------------|------|
| CLAUDE.md rule | Claude forgets the workflow | Every session |
| UserPromptSubmit hook | Developer types "make the PR" without thinking | Pre-prompt |
| CI gate | Fragment was skipped or corrupt | Pre-merge |
A single CI gate catches the issue too late — the developer has to context-switch back after their PR is already open. The hook catches it at intent time. The CLAUDE.md rule means Claude handles it autonomously when given the task.
The layers don't conflict. They reinforce each other. A developer who sees the hook suggestion will run `pnpm changelog:add`. Claude will follow the CLAUDE.md rule and validate the output. CI confirms everything before merge.
---
## Adopting This Pattern
The TypeScript scripts (add, validate, assemble, audit) are specific to the Méthode Aristote stack. The 3-layer enforcement pattern is not — it works with any fragment format, any CI system, any assembler.
**Minimum viable setup:**
1. **Define your fragment schema** (YAML, JSON, whatever fits your stack)
2. **Add a CLAUDE.md rule** encoding the creation workflow so Claude can handle it autonomously
3. **Add a `UserPromptSubmit` hook** with the `if PR-intent without fragment-mention → suggest` pattern
4. **Add a CI job** that checks for fragment existence before merge
The hook pattern generalizes to any mandatory workflow step. Substitute "changelog fragment" with "ADR", "migration flag", "test coverage check" — the conditional detection logic is the same.
---
## Related
- Hook example: [`examples/hooks/bash/smart-suggest.sh`](../../examples/hooks/bash/smart-suggest.sh)
- Hook documentation: [UserPromptSubmit Hooks](../ultimate-guide.md) (search "UserPromptSubmit")
- Fragment validator and assembler scripts: available in the Méthode Aristote repository

766
guide/workflows/rpi.md Normal file
View file

@ -0,0 +1,766 @@
---
title: "RPI: Research → Plan → Implement"
description: "A 3-phase feature development pattern with explicit validation gates between phases"
tags: [workflow, architecture, design-patterns, validation]
---
# RPI: Research → Plan → Implement
> **Confidence**: Tier 2 — Synthesized from production team patterns. The gate-based structure aligns with Anthropic's guidance on agent task decomposition and agentic loop control.
Build features in three locked phases: Research feasibility first, plan the implementation second, write code third. Each phase produces a concrete artifact. Each gate requires an explicit GO before the next phase starts.
---
## Table of Contents
1. [TL;DR](#tldr)
2. [When to Use RPI](#when-to-use-rpi)
3. [How the Gates Work](#how-the-gates-work)
4. [Phase 1: Research](#phase-1-research)
5. [Phase 2: Plan](#phase-2-plan)
6. [Phase 3: Implement](#phase-3-implement)
7. [Slash Command Templates](#slash-command-templates)
8. [Worked Example](#worked-example)
9. [Comparison to Other Workflows](#comparison-to-other-workflows)
10. [Tips and Troubleshooting](#tips-and-troubleshooting)
11. [See Also](#see-also)
---
## TL;DR
```
Phase 1 — Research:
Claude explores feasibility, surfaces risks, asks decision questions
Output: RESEARCH.md
Gate: You decide GO / NO-GO
Phase 2 — Plan:
Claude writes architecture decisions, user stories, test plan
Output: PLAN.md
Gate: You approve the plan before any code is written
Phase 3 — Implement:
Claude implements step by step, tests pass before each next step
Output: working code + passing tests
Gate: each implementation step validated before the next begins
```
**Best for**: Features with unclear feasibility, more than a day of work, unknown technical territory, or anything where discovering a wrong assumption late is costly.
---
## When to Use RPI
### Use RPI When
- **Feasibility is unknown**: You have an idea but aren't sure it holds up technically
- **Scope is large**: More than a day's worth of implementation work
- **Requirements are fuzzy**: You know the outcome you want, not the path to get there
- **Risk of wrong direction is high**: Security, payments, data migrations, integrations with external systems
- **You've been surprised before**: A feature looked simple, turned out to involve 6 other systems
### Skip RPI When
| Scenario | Better Approach |
|----------|----------------|
| Fix is obvious (typo, wrong color) | Direct edit |
| Feature is well-understood, requirements are clear | [Spec-First](./spec-first.md) or [dual-instance](./dual-instance-planning.md) |
| Exploration mode — you don't know what you want yet | Exploration workflow |
| Tiny change, single file | Just do it |
### Decision Heuristic
Ask yourself: "If the research phase reveals a serious problem, am I glad I didn't spend 2 days implementing first?"
If yes, run RPI. The research phase typically takes 30-60 minutes and can save many hours.
---
## How the Gates Work
RPI has two human gates and one automated gate per implementation step.
```
[Idea]
|
v
[Phase 1: Research]
|
+- NO-GO -> Stop. Document why. Archive RESEARCH.md.
|
+- GO -------------------------------------------------------->
|
[Phase 2: Plan]
|
+- Needs revision -> iterate with Claude
|
+- Approved -------------------------------->
|
[Phase 3: Implement]
Step 1 -> Test gate
Step 2 -> Test gate
Step 3 -> Test gate
|
[Done]
```
**Gate 1 (after Research)**: You read RESEARCH.md and make a GO/NO-GO decision. This is the most important gate — it prevents building the wrong thing entirely.
**Gate 2 (after Plan)**: You review PLAN.md before any code is written. Minor revisions happen here at zero cost.
**Step gates (during Implement)**: Each implementation step must have passing tests before Claude moves to the next step. Automated, no human action required unless a step fails.
---
## Phase 1: Research
### What Research Covers
The research phase answers five questions:
1. **What already exists?** Relevant code, libraries, prior attempts in the codebase
2. **What needs to be built?** Scope boundary, components to create vs modify
3. **What are the risks?** Technical, security, integration, performance
4. **What are the decision points?** Architecture choices that affect the whole plan
5. **What is the effort estimate?** Rough sizing before committing to a plan
Claude explores the codebase, reads relevant files, checks dependencies, and surfaces any constraint that would change the plan. The result is RESEARCH.md.
### Starting Research
Create the feature folder and invoke research:
```bash
mkdir -p .claude/features/[feature-name]
```
Then in Claude:
```
/rpi:research [feature description]
```
Or without the slash command:
```
Run RPI Phase 1 (Research) for: [feature description]
Save output to .claude/features/[feature-name]/RESEARCH.md
Use Plan Mode to explore without modifying code.
```
### RESEARCH.md Template
```markdown
# Research: [Feature Name]
**Date**: [YYYY-MM-DD]
**Requested**: [One-sentence description of the feature]
**Status**: PENDING DECISION
---
## What Exists Today
### Relevant Code
- [file path]: [what it does, why it matters]
- [file path]: [what it does, why it matters]
### Relevant Libraries
- [library]: [currently used / available / needs to be added]
### Prior Attempts or Related Work
- [any existing partial implementation, related PR, note in codebase]
---
## What Needs to Be Built
### New Files
- [file path]: [purpose]
- [file path]: [purpose]
### Files to Modify
- [file path]: [what changes, why]
### External Dependencies
- [dependency]: [reason needed, version constraint if any]
---
## Risks
| Risk | Likelihood | Impact | Notes |
|------|-----------|--------|-------|
| [risk description] | Low/Med/High | Low/Med/High | [mitigation or blocker] |
---
## Architecture Decision Points
Questions that need a decision before planning can start:
1. **[Decision]**: Option A (pros: X, cons: Y) vs Option B (pros: X, cons: Y)
2. **[Decision]**: [Options and trade-offs]
---
## Effort Estimate
- Research-to-plan: [time]
- Implementation: [time range]
- Testing: [time]
- **Total estimate**: [range]
**Confidence in estimate**: Low / Medium / High
**Why**: [reason for confidence level]
---
## Recommendation
[GO / NO-GO / NEEDS CLARIFICATION]
[1-3 sentences explaining the recommendation]
---
**Decision**: [ ] GO [ ] NO-GO [ ] NEEDS CLARIFICATION
**Notes**: [human fills this in]
```
### What a NO-GO Looks Like
Not every research phase ends in GO. Common NO-GO reasons:
- **Technical blocker**: External API doesn't support the required operation
- **Scope creep discovered**: "Simple feature" turns out to require rewriting the auth layer
- **Better alternative exists**: Research reveals an existing library or config change that solves the problem more simply
- **Risk too high for now**: The feature is valid but the timing is wrong
Archive NO-GO research docs — they're valuable records of decisions made and why.
---
## Phase 2: Plan
### What the Plan Covers
Phase 2 starts only after you mark RESEARCH.md with GO. Claude reads the research doc and produces a precise implementation plan.
A good plan is specific enough that a different engineer (or Claude in a new session) could execute it without asking questions.
```
/rpi:plan .claude/features/[feature-name]/RESEARCH.md
```
Or without the slash command:
```
Run RPI Phase 2 (Plan) using:
- Research: .claude/features/[feature-name]/RESEARCH.md
Save output to .claude/features/[feature-name]/PLAN.md
Do not write any code yet. Plan only.
```
### PLAN.md Template
```markdown
# Plan: [Feature Name]
**Date**: [YYYY-MM-DD]
**Research**: [link to RESEARCH.md]
**Estimated effort**: [from research]
**Risk level**: Low / Medium / High
---
## Summary
[2-4 sentences: what this implements, major design decisions taken, what it does NOT include]
---
## Architecture Decisions
[Record decisions made from the research decision points]
1. **[Decision]**: Chose [Option] because [reason]
2. **[Decision]**: Chose [Option] because [reason]
---
## Implementation Steps
Steps must be sequential and independently testable.
### Step 1: [Name]
**Files**: [list of files to create or modify]
**What to build**: [precise description]
**Test gate**: [specific test or check that must pass before Step 2 starts]
### Step 2: [Name]
**Files**: [list of files to create or modify]
**What to build**: [precise description]
**Test gate**: [specific test or check that must pass before Step 3 starts]
[...continue for all steps]
---
## Success Criteria
- [ ] [Testable criterion — describes observable behavior, not implementation]
- [ ] [Testable criterion]
- [ ] All step test gates pass
---
## Out of Scope
Explicitly list what this plan does NOT cover:
- [thing excluded]
- [thing excluded]
---
## Risks Accepted
[From RESEARCH.md risks, list which are accepted and how they're mitigated]
---
## Rollback Plan
If implementation fails mid-way:
- [what to undo]
- [how to restore previous state]
---
**Plan approved?** [ ] YES — proceed to implementation
**Revision notes**: [human fills this in if changes needed]
```
### Reviewing the Plan
Read PLAN.md carefully before approving. The goal is to catch design problems now, not during implementation. Specific things to check:
- Implementation steps are in the right order, with no hidden dependencies
- Test gates are concrete and runnable (not "it looks right")
- Out-of-scope is explicit (prevents Claude from over-building)
- Rollback plan exists for anything touching data or shared state
If the plan needs changes, ask Claude to revise before approving. This is free — revision after approval costs implementation time.
---
## Phase 3: Implement
### The Step-Gate Pattern
Implementation runs step by step. Each step has a test gate. Claude does not start the next step until the current gate passes.
```
/rpi:implement .claude/features/[feature-name]/PLAN.md
```
Or without the slash command:
```
Run RPI Phase 3 (Implement) using:
- Plan: .claude/features/[feature-name]/PLAN.md
Rules:
- Implement one step at a time
- After each step, run the test gate specified in the plan
- Do not start the next step until the test gate passes
- If a test gate fails, stop and report the failure — do not improvise a fix
- Commit after each step that passes its gate
```
### What Happens During Implementation
Claude works through the plan's steps sequentially. For each step:
1. Implements the specified files and changes
2. Runs the test gate (unit tests, integration check, or manual verification)
3. If gate passes: commits with a message referencing the step, announces readiness for the next step
4. If gate fails: reports the failure, the specific test output, and the likely cause. Does not attempt to fix unless you confirm.
The step-commit pattern gives you a clean git history that mirrors the plan. If something goes wrong in Step 4, you can roll back to the Step 3 commit cleanly.
### Step-Gate Failure Protocol
When a test gate fails, Claude reports:
```
Step [N] gate failed.
Gate: [what was supposed to pass]
Output:
[actual test output]
Likely cause: [Claude's diagnosis]
Options:
1. Fix: [specific change that would likely fix it]
2. Revise plan: [if the plan step itself has a flaw]
3. Stop and investigate: [if the failure reveals something unexpected]
Which should I do?
```
You decide. Claude does not auto-fix and proceed — that's how implementations drift from plans.
---
## Slash Command Templates
Save these to `.claude/commands/` to invoke each phase directly.
### `/rpi:research`
Save to `.claude/commands/rpi-research.md`:
```markdown
# RPI Phase 1: Research
Run feasibility research for the requested feature.
## Instructions
1. Enter Plan Mode (do not modify files during research)
2. Explore the codebase to answer these questions:
- What already exists that's relevant?
- What files will need to change or be created?
- What are the technical risks?
- What decisions need to be made before planning?
- What is a rough effort estimate?
3. Save output to `.claude/features/$ARGUMENTS/RESEARCH.md` using the template below
4. End with a clear recommendation: GO, NO-GO, or NEEDS CLARIFICATION
5. Ask the user for their GO/NO-GO decision before proceeding
## Constraints
- Do NOT write any code
- Do NOT modify any files
- Do NOT start planning implementation steps
- If uncertain about scope, surface it as a decision point
```
### `/rpi:plan`
Save to `.claude/commands/rpi-plan.md`:
```markdown
# RPI Phase 2: Plan
Create an implementation plan based on approved research.
## Pre-check
Before starting:
1. Read the RESEARCH.md file specified in $ARGUMENTS
2. Verify it has a GO decision marked
3. If no GO decision found, stop and ask the user to decide first
## Instructions
1. Read RESEARCH.md carefully
2. Create PLAN.md in the same feature folder
3. Architecture decisions: resolve all decision points from research
4. Implementation steps: each step must have a concrete test gate
5. Success criteria: observable, testable, not implementation-internal
6. Out-of-scope: explicit list of what this plan does NOT cover
7. Do NOT write any implementation code
8. After writing the plan, ask the user to review and approve
## Constraints
- Do NOT write implementation code
- Steps must be sequential and independently testable
- Test gates must be runnable commands or precise manual checks, not vague descriptions
- Each step should be achievable in a single focused session
```
### `/rpi:implement`
Save to `.claude/commands/rpi-implement.md`:
```markdown
# RPI Phase 3: Implement
Implement the feature following an approved plan, one step at a time.
## Pre-check
Before starting:
1. Read the PLAN.md file specified in $ARGUMENTS
2. Verify it has an approval marked
3. If no approval found, stop and ask the user to approve first
## Instructions
For each step in the plan:
1. Read the step description carefully
2. Implement only what the step specifies — nothing more
3. Run the test gate exactly as written in the plan
4. If the gate passes:
- Commit with message: "feat([feature]): step [N] — [step name]"
- Announce step completion and readiness for next step
5. If the gate fails:
- Report the exact failure output
- Diagnose the likely cause
- Present options (fix, revise plan, stop)
- Wait for human decision before proceeding
## Constraints
- Never skip a test gate
- Never start the next step before the current gate passes
- Never modify files outside the scope of the current step
- If you encounter something unexpected that changes the plan, stop and report it
- Commit after each passing step — not at the end
```
---
## Worked Example
**Request**: "Add rate limiting to the public API endpoints."
### Phase 1: Research Output (abbreviated)
```markdown
# Research: API Rate Limiting
**Date**: 2026-03-12
**Status**: PENDING DECISION
## What Exists Today
- `src/middleware/` — has auth middleware, no rate limiting
- `package.json` — express-rate-limit not installed, redis available
- `src/routes/api.ts` — 14 public endpoints, unauthenticated routes mixed with authenticated ones
## What Needs to Be Built
- Rate limiter middleware for public endpoints
- Separate limits for authenticated vs unauthenticated users
- Redis store for distributed rate limiting (app runs on 3 instances)
## Risks
| Risk | Likelihood | Impact | Notes |
|------|-----------|--------|-------|
| Redis connection failure disables all API access | Low | High | Need fallback to in-memory if Redis unavailable |
| Rate limit too aggressive — breaks existing integrations | Medium | High | Need to survey current usage patterns first |
## Architecture Decision Points
1. **Library**: express-rate-limit (maintained, battle-tested) vs custom middleware
2. **Bypass for trusted IPs**: Allow internal services to bypass rate limiting?
## Effort Estimate
- Implementation: 2-4 hours
- Testing: 2 hours
- **Total**: 4-6 hours
**Recommendation**: GO — standard problem, good library options, main risk is Redis fallback which is solvable.
```
**Human decision**: GO. Use express-rate-limit. No IP bypass for now.
### Phase 2: Plan (abbreviated)
```markdown
# Plan: API Rate Limiting
**Risk level**: Medium (shared Redis state, potential to block legitimate traffic)
## Architecture Decisions
1. Library: express-rate-limit with rate-limit-redis store
2. No IP bypass initially — revisit if internal service issues arise
3. Unauthenticated: 100 requests/15 minutes. Authenticated: 1000 requests/15 minutes.
4. Redis failure fallback: in-memory store (accepts single-instance inconsistency)
## Implementation Steps
### Step 1: Install dependencies and configure Redis store
**Files**: `package.json`, `src/config/rate-limit.ts`
**Test gate**: `npm install` completes, `src/config/rate-limit.ts` exports config without errors
### Step 2: Implement rate limiter middleware
**Files**: `src/middleware/rate-limit.ts`
**Test gate**: Unit test — limiter blocks 101st request from same IP within 15 minutes
### Step 3: Apply to routes
**Files**: `src/routes/api.ts`
**Test gate**: Integration test — unauthenticated route returns 429 after 100 requests; authenticated route does not
### Step 4: Add Redis fallback
**Files**: `src/config/rate-limit.ts`
**Test gate**: Test with Redis unavailable — API still responds (200, not 500), in-memory limiting active
```
**Human review**: Approved.
### Phase 3: Implementation
Claude implements Step 1, runs the gate (`npm install` + import check), commits `feat(rate-limit): step 1 — dependencies and config`. Then Step 2, Step 3, Step 4 in sequence. Each commit is clean. Each gate must pass before proceeding.
---
## Comparison to Other Workflows
| Workflow | Phase structure | Human gates | Best for |
|----------|----------------|------------|----------|
| **RPI** | Research + Plan + Implement | GO/NO-GO + plan approval | Unknown feasibility, >1 day, high risk of wrong direction |
| **Dual-Instance** | Plan + Implement (separate Claude instances) | Plan approval | Known features needing careful execution, spec-heavy work |
| **Spec-First** | Spec + Implement | None (spec is implicit gate) | Design-focused work, API contracts, team alignment |
| **TDD** | Test-first + Implement | None (tests are the gate) | Test coverage as driver, refactoring, incremental behavior |
| **Direct** | None | None | Simple changes, obvious scope, less than 2 hours |
### RPI vs Dual-Instance
Dual-instance separates planning and implementation into two Claude instances with strict role enforcement. It works well when you already know what you're building and want a high-quality plan. RPI adds a feasibility phase before planning, which makes it better for ambiguous requests. If you already have a clear spec, skip Research and use dual-instance or spec-first.
### RPI vs Spec-First
Spec-first is design-oriented: you define what the system should do, then Claude implements it. RPI is implementation-oriented with validation gates: you describe a goal, Claude researches how to achieve it, then the two of you agree on a plan before touching code. Use spec-first when the design is clear. Use RPI when the technical path is not.
### RPI vs Direct Coding
For anything under 2 hours with a clear scope, just ask Claude to do it. RPI adds overhead that isn't justified for small tasks. The research phase alone takes 30-60 minutes. That overhead pays off on multi-day features where discovering a wrong assumption late is much more expensive.
---
## Tips and Troubleshooting
### Claude Skips to Implementation in Research Phase
**Problem**: Claude starts writing code during the research phase.
**Solution**: Use Plan Mode (Shift+Tab twice) explicitly for the research phase. Include this line in the research command:
```
You are in Plan Mode. Do not modify files. Do not write implementation code.
Research only. Output goes to RESEARCH.md.
```
Or add to your CLAUDE.md:
```markdown
## RPI Rules
- /rpi:research runs in Plan Mode only — no file modifications
- /rpi:plan produces only PLAN.md — no implementation code
- /rpi:implement runs one step at a time, waits for test gate before next step
```
### Research Phase Runs Too Long
**Problem**: Claude explores the entire codebase instead of focusing on what's relevant.
**Solution**: Scope the research explicitly:
```
/rpi:research payment-processing
Focus area: src/payments/, src/routes/checkout.ts
Do not explore: frontend, auth, unrelated backend modules
Time budget: complete research in one session
```
### Plan Has Too Many Steps
**Problem**: PLAN.md has 12 steps, making the implementation unwieldy.
**Solution**: A plan with more than 6-8 steps usually needs a scope reduction, not more granular implementation. Ask Claude:
```
The plan has too many steps. What is the minimal viable scope that delivers
the core value? Revise the plan to implement only that, with a clear
"Future work" section for the rest.
```
### Test Gates Are Vague
**Problem**: A step's test gate says "verify it works" rather than a specific command.
**Solution**: Reject vague gates before approving the plan. Push back with:
```
Step 3's test gate is "verify rate limiting works." Make it specific:
what command do I run, and what output do I expect to see when it passes?
```
A good test gate:
```
Test gate: `npm test src/middleware/rate-limit.test.ts` — all 4 tests pass
```
A bad test gate:
```
Test gate: rate limiting is working correctly
```
### A Step Gate Fails Repeatedly
**Problem**: Step 2's gate keeps failing even after attempted fixes.
**Solution**: Two failures means investigate the plan, not keep iterating on fixes.
```
Stop implementation. The Step 2 gate has failed twice.
Review the plan — is the test gate achievable given the Step 1 output?
Do we need to revise the plan before continuing?
```
---
## File Structure Summary
```
.claude/
└── features/
└── [feature-name]/
├── RESEARCH.md # Phase 1 output (human annotates GO/NO-GO)
└── PLAN.md # Phase 2 output (human annotates approval)
.claude/commands/
├── rpi-research.md # /rpi:research slash command
├── rpi-plan.md # /rpi:plan slash command
└── rpi-implement.md # /rpi:implement slash command
```
Archive completed features:
```bash
mkdir -p .claude/features/_archive
mv .claude/features/payment-processing .claude/features/_archive/
```
The archive is a learning resource: completed RESEARCH.md and PLAN.md files show how previous features were reasoned about.
---
## See Also
- [dual-instance-planning.md](./dual-instance-planning.md) — Two-instance pattern for spec-heavy implementation
- [spec-first.md](./spec-first.md) — Design-first workflow using CLAUDE.md as contract
- [tdd-with-claude.md](./tdd-with-claude.md) — Combine with TDD for test-gate implementation
- [task-management.md](./task-management.md) — Managing multi-phase tasks across sessions
- **Main guide**: Advanced Patterns section — Multi-instance and planning pattern overview

View file

@ -106,6 +106,13 @@ deep_dive:
rpi_phase3_implement: "guide/workflows/rpi.md:237" # Step-gate pattern + failure protocol
rpi_slash_commands: "guide/workflows/rpi.md:281" # /rpi:research /rpi:plan /rpi:implement templates
rpi_vs_other_workflows: "guide/workflows/rpi.md:391" # Comparison table: RPI vs dual-instance vs spec-first
# Changelog Fragments Workflow
changelog_fragments_workflow: "guide/workflows/changelog-fragments.md" # Per-PR YAML fragments, 3-layer enforcement
changelog_fragments_claude_rule: "guide/workflows/changelog-fragments.md:26" # CLAUDE.md rule for autonomous fragment creation
changelog_fragments_hook: "guide/workflows/changelog-fragments.md:88" # UserPromptSubmit 3-tier hook (enforcement/discovery/contextual)
changelog_fragments_ci: "guide/workflows/changelog-fragments.md:169" # Independent CI migration check job
# Smart-Suggest Hook
smart_suggest_hook: "examples/hooks/bash/smart-suggest.sh" # UserPromptSubmit behavioral coach, 3-tier priority, ROI logging
# Template Installation
install_templates_script: "scripts/install-templates.sh"
# Session management