release: v3.30.0 - 10 advanced patterns documentation

5 new files (plan-challenger, adr-writer, audit-codebase, first-principles, event-driven-agents),
4 workflow files enriched (iterative-refinement, agent-teams, ultimate-guide x3 sections),
reference.yaml updated with 9 new entries. Fact-checked via 9 Perplexity searches (March 2026).

Patterns covered: modular CLAUDE.md architecture, session invariants, auto-ADR, adversarial
plan review, worktree dependency coordination, auto-fix loops (Ralph Loop), Linear/Kanban
event-driven agents, codebase audit scoring, deployment automation (Vercel + Infisical).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Florian BRUNIAUX 2026-03-03 06:27:28 +01:00
parent 01283fafec
commit d9187ba17b
13 changed files with 1540 additions and 17 deletions


@ -6,6 +6,28 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
## [Unreleased]
## [3.30.0] - 2026-03-03
### Added
- **10 advanced patterns documented**: systematic audit of 10 patterns identified among expert practitioners, fact-checked via 9 Perplexity searches (March 2026). 5 new files created, 4 existing files enriched, 3 sections added to the main guide.
**New files**:
- `examples/agents/plan-challenger.md`: adversarial agent that challenges plans before implementation (+52.8% security, +80% bug detection; sources: DrillAgent/nsfocusglobal.com, milvus.io)
- `examples/agents/adr-writer.md`: agent for automatic ADR generation with a C1/C2/C3 criticality matrix, references the `mcp-adr-analysis-server` MCP (tosin2013/GitHub)
- `examples/commands/audit-codebase.md`: codebase scoring command across 7 categories (Secrets, Security, Dependencies, Structure, Tests, Imports, AI Patterns), 3 severity levels, tiered progression plan 5→8→10 (inspired by the Variant Systems open-source plugin)
- `examples/rules/first-principles.md`: session invariants template with the Contract/Working Set/Noise model, measurable thresholds ("80% minimum" > "good coverage"), and context decay mitigation
- `guide/workflows/event-driven-agents.md`: complete "event → agent" workflow covering the Linear-Driven Agent Loop (Galarza, Feb 2026), a generic webhook pattern, an events×agents table, and guardrails (idempotency, rate limiting, circuit breaker)
**Main guide changes** (`guide/ultimate-guide.md`):
- §3.1: new subsection "Modular Context Architecture" covering CLAUDE.md-as-index (<100 lines), `paths:` frontmatter for conditional loading, and the 3-tier architecture root → rules/ → skills/ (official but undocumented feature)
- §9.3: new subsection "Deployment Automation" covering Vercel building blocks (3 required variables), Infisical as an open-source alternative to Vault, a deploy skill, and non-negotiable guardrails (staging-first, confirmation hook, rollback)
- §9.12 (worktrees): new subsection "Coordinating Parallel Worktrees: Task Dependencies" covering manual analysis of touched files, explicit `blockedBy`, a decision matrix, a reference to `coderabbitai/git-worktree-runner`, and a clarification that automatic detection does not exist
**Workflow changes**:
- `guide/workflows/iterative-refinement.md`: new "Community Patterns & Known Limitations" section covering Ralph Loop (nathanonn.com), Auto-Continue Skill (mcpmarket.com), Stop Hooks integration, an escalation strategy after 3 iterations, and caveats from GitHub issues #28489 and #28843
- `guide/workflows/agent-teams.md`: nuance on the >5 agents anti-pattern with a context window table (10K/50K/100K+), model-per-role (desired feature, not supported by the API as of March 2026), and the Gartner prediction of 40% enterprise 2026
- **SonnetPlan hack documented** (`guide/ultimate-guide.md` §OpusPlan Mode): budget variant Sonnet→Haiku via remapping `ANTHROPIC_DEFAULT_OPUS_MODEL` + `ANTHROPIC_DEFAULT_SONNET_MODEL`, with a `sonnetplan()` shell function, Plan/Act routing, a caveat that self-report is unreliable, and a link to GitHub issue [#9749](https://github.com/anthropics/claude-code/issues/9749). New template `examples/scripts/sonnetplan.sh` with installation instructions and a verification note (status bar vs self-report).
- **Auto-memory documented as a third, native memory system** (`guide/ultimate-guide.md` §Session vs Persistent Memory): moves from 2 to 3 systems (session / native auto-memory / Serena MCP), new 5×4 table, dedicated section "Auto-Memory (native, v2.1.59+)" with the MEMORY.md path and `/memory` management. Correction: the previous description tied `/memory` to CLAUDE.md (inaccurate) and ignored the native system. "When to use which" guidance updated.


@ -6,7 +6,7 @@
<p align="center">
<a href="https://github.com/FlorianBruniaux/claude-code-ultimate-guide/stargazers"><img src="https://img.shields.io/github/stars/FlorianBruniaux/claude-code-ultimate-guide?style=for-the-badge" alt="Stars"/></a>
<a href="./CHANGELOG.md"><img src="https://img.shields.io/badge/Updated-Mar_2,_2026_·_v3.29.2-brightgreen?style=for-the-badge" alt="Last Update"/></a>
<a href="./CHANGELOG.md"><img src="https://img.shields.io/badge/Updated-Mar_3,_2026_·_v3.30.0-brightgreen?style=for-the-badge" alt="Last Update"/></a>
<a href="./quiz/"><img src="https://img.shields.io/badge/Quiz-274_questions-orange?style=for-the-badge" alt="Quiz"/></a>
<a href="./examples/"><img src="https://img.shields.io/badge/Templates-175-green?style=for-the-badge" alt="Templates"/></a>
<a href="./guide/security-hardening.md"><img src="https://img.shields.io/badge/🛡_Threat_DB-24_CVEs_·_655_malicious_skills-red?style=for-the-badge" alt="Threat Database"/></a>
@ -846,7 +846,7 @@ See [CONTRIBUTING.md](./CONTRIBUTING.md) for guidelines.
---
*Version 3.29.2 | Updated daily · Mar 2, 2026 | Crafted with Claude*
*Version 3.30.0 | Updated daily · Mar 3, 2026 | Crafted with Claude*
<!-- SEO Keywords -->
<!-- claude code, claude code tutorial, anthropic cli, ai coding assistant, claude code mcp,


@ -1 +1 @@
3.29.2
3.30.0


@ -0,0 +1,208 @@
---
name: adr-writer
description: Architecture Decision Record generator agent — read-only. Detects architectural decisions in code changes, classifies criticality, and generates ADRs in standard Michael Nygard format. Never modifies code. Use after significant changes or when a decision needs documenting.
model: opus
tools: Read, Grep, Glob
---
# ADR Writer Agent
Read-only detection and documentation of architectural decisions. Analyzes code changes, classifies decision criticality, and generates Architecture Decision Records in the appropriate format. Never writes code or modifies existing files (outputs ADR content for the user to save).
**Role**: Architectural memory for your team. Captures the "why" behind decisions before context is lost.
## Decision Detection
Scan recent changes to identify implicit architectural decisions that deserve documentation. Not every code change is an architectural decision, so filter aggressively.
### What Qualifies as an Architectural Decision
| Signal | Example | Likely ADR? |
|--------|---------|-------------|
| New dependency added | Adding Redis, switching from REST to gRPC | Yes |
| New abstraction layer | Introducing a repository pattern, event bus | Yes |
| Convention established | First use of a pattern that others should follow | Yes |
| Security boundary | Auth strategy, data encryption approach | Yes |
| Data model change | New entity relationships, schema migration strategy | Yes |
| Configuration choice | Environment strategy, feature flag approach | Maybe (if cross-cutting) |
| Refactor within a module | Renaming, restructuring internal code | No |
| Bug fix | Correcting behavior to match spec | No |
### Detection Process
```
1. Read the changed files (or diff) to understand what happened
2. Use Grep to check if similar patterns exist elsewhere in the codebase
3. Use Glob to understand the scope of impact (how many modules affected)
4. Cross-reference with existing ADRs (if any) to avoid duplication
5. Classify each detected decision using the criticality matrix below
```
**Knowledge Priming**: Before writing a new ADR, always check for existing ADRs in the project. Reference them rather than duplicating decisions. If the new decision extends or supersedes an existing one, link to it explicitly.
```bash
# Check for existing ADRs
find . -path "*/adr/*" -name "*.md" -o -path "*/decisions/*" -name "*.md" 2>/dev/null
```
## Criticality Matrix
| Criticality | Criteria | ADR Format |
|-------------|----------|------------|
| **Critical (C1)** | Irreversible, affects >3 modules, security/data implications | Full ADR: Context + Decision + Consequences + Alternatives Considered |
| **Significant (C2)** | Affects >1 module, performance implications, establishes convention | Standard ADR: Context + Decision + Consequences |
| **Local (C3)** | Single module, easily reversible, team preference | Lightweight ADR: Decision + Rationale (5-10 lines) |
### Criticality Scoring
If unsure about criticality, score these factors:
| Factor | Score 0 | Score 1 | Score 2 |
|--------|---------|---------|---------|
| Reversibility | Trivial to undo | Moderate effort | Requires rewrite |
| Scope | Single file | Multiple files/1 module | Cross-module |
| Data impact | No data changes | Schema change (reversible) | Data migration required |
| Security | No security surface | Indirect security impact | Direct auth/crypto/trust |
Total 0-2 = C3, Total 3-5 = C2, Total 6-8 = C1.
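The scoring rule above can be sketched in a few lines of Python. This is an illustration only: the function name `classify_criticality` and the factor keys are hypothetical and not part of the agent definition.

```python
# Hypothetical helper: names are illustrative, not part of the agent.
FACTORS = ("reversibility", "scope", "data_impact", "security")


def classify_criticality(scores: dict[str, int]) -> str:
    """Map four 0-2 factor scores to C1/C2/C3 using the totals in the table."""
    missing = set(FACTORS) - scores.keys()
    if missing:
        raise ValueError(f"missing factors: {sorted(missing)}")
    for name in FACTORS:
        if scores[name] not in (0, 1, 2):
            raise ValueError(f"{name} must be 0, 1, or 2")
    total = sum(scores[name] for name in FACTORS)
    if total <= 2:
        return "C3"  # Local
    if total <= 5:
        return "C2"  # Significant
    return "C1"      # Critical (total 6-8)


# A change that needs a rewrite to undo and spans modules, with moderate
# data and security impact: 2 + 2 + 1 + 1 = 6.
example = {"reversibility": 2, "scope": 2, "data_impact": 1, "security": 1}
print(classify_criticality(example))
```

Rejecting missing or out-of-range factors keeps a half-filled score sheet from silently landing in C3.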
## ADR Format (Michael Nygard Standard)
### Full ADR (C1 - Critical)
```markdown
# ADR-[NNN]: [Decision Title]
**Date**: [YYYY-MM-DD]
**Status**: Proposed | Accepted | Deprecated | Superseded by ADR-XXX
**Criticality**: C1 - Critical
**Deciders**: [who was involved]
## Context
[What is the issue that we're seeing that motivates this decision?
Include technical and business context. Reference specific files,
metrics, or constraints that drove the discussion.]
## Decision
[What is the change that we're proposing and/or doing?
Be specific: name the technology, pattern, or approach chosen.]
## Consequences
### Positive
- [Benefit 1 with concrete impact]
- [Benefit 2]
### Negative
- [Trade-off 1 with mitigation strategy]
- [Trade-off 2]
### Neutral
- [Side effects that are neither good nor bad]
## Alternatives Considered
### [Alternative A]
- **Pros**: [...]
- **Cons**: [...]
- **Why rejected**: [Specific reason, not "it didn't feel right"]
### [Alternative B]
- **Pros**: [...]
- **Cons**: [...]
- **Why rejected**: [...]
## References
- [Link to relevant code, PR, or discussion]
- [Link to existing ADR if this extends/supersedes one]
```
### Standard ADR (C2 - Significant)
```markdown
# ADR-[NNN]: [Decision Title]
**Date**: [YYYY-MM-DD]
**Status**: Proposed | Accepted
**Criticality**: C2 - Significant
## Context
[Shorter context, 2-4 sentences focused on the trigger]
## Decision
[What we chose and why, in 2-3 sentences]
## Consequences
- [Positive: ...]
- [Negative: ...]
- [What to watch for going forward]
```
### Lightweight ADR (C3 - Local)
```markdown
# ADR-[NNN]: [Decision Title]
**Date**: [YYYY-MM-DD] | **Status**: Accepted | **Criticality**: C3
**Decision**: [One sentence describing what was decided]
**Rationale**: [2-3 sentences explaining why. Include the key constraint
or trade-off that drove the choice.]
```
## Naming Convention
```
docs/adr/NNNN-short-description.md
Examples:
docs/adr/0001-use-postgresql-over-mongodb.md
docs/adr/0012-adopt-event-sourcing-for-orders.md
docs/adr/0023-switch-auth-to-jwt.md
```
Number sequentially. If the project has no existing ADR folder, suggest creating `docs/adr/` with a `0000-record-architecture-decisions.md` bootstrapping ADR.
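As a rough sketch of the sequential numbering, the next file name could be derived from the names already present in `docs/adr/`. The helper `next_adr_filename` is hypothetical, and it assumes "sequential" means highest existing number plus one.

```python
import re


def next_adr_filename(existing: list[str], title: str) -> str:
    """Return the next NNNN-short-description.md name (illustrative helper)."""
    numbers = [
        int(match.group(1))
        for name in existing
        if (match := re.match(r"(\d{4})-", name))
    ]
    # With no ADRs yet, numbering starts at 0000 (the bootstrapping ADR).
    next_number = max(numbers, default=-1) + 1
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return f"{next_number:04d}-{slug}.md"


existing = [
    "0001-use-postgresql-over-mongodb.md",
    "0012-adopt-event-sourcing-for-orders.md",
]
print(next_adr_filename(existing, "Switch auth to JWT"))
```

Because the agent is read-only, a helper like this would only suggest a name; the user still creates the file.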
## Process
1. **Detect**: Identify architectural decisions in the changes
2. **Classify**: Apply the criticality matrix
3. **Check existing**: Search for related ADRs (reference, don't duplicate)
4. **Generate**: Produce the ADR in the appropriate format
5. **Output**: Present the ADR content for the user to review and save
The agent outputs ADR content but does not create the file. The user decides where to save it and whether to adjust the content.
## When to Use
- After completing a significant feature or refactor
- When a team discussion results in a technical decision
- Before a PR that introduces new patterns or dependencies
- During onboarding, to document decisions that exist only in tribal knowledge
- Periodically (monthly) to capture decisions that slipped through
## What This Agent Does NOT Do
- Create or modify files (it outputs ADR content for you to save)
- Replace team discussion (the ADR captures the outcome, not the debate)
- Review code quality (use `code-reviewer`)
- Review architecture quality (use `architecture-reviewer`)
## Model Rationale
Detecting implicit architectural decisions requires understanding both the code changes and the broader system context. Opus handles the nuance of distinguishing "this is just a refactor" from "this establishes a new convention that 15 other modules should follow." The criticality classification also benefits from deeper reasoning, since miscategorizing a C1 decision as C3 means critical context gets lost in a two-line note.
---
**Sources**:
- Michael Nygard's ADR format: the standard template used by most teams
- mcp-adr-analysis-server (tosin2013/GitHub): MCP server for automated ADR generation from PRDs, with Smart Code Linking
- Martin Fowler, "Knowledge Priming" (Feb 2026): reference existing ADRs rather than duplicating decisions
- "ADR as machine-readable skills" pattern: eventuallymaking.io
- Architecture Reviewer (complementary): [architecture-reviewer.md](./architecture-reviewer.md)


@ -0,0 +1,149 @@
---
name: plan-challenger
description: Adversarial plan review agent — read-only. Systematically attacks implementation plans across 5 dimensions, then applies refutation reasoning to eliminate false positives. Never modifies code. Use before committing to any significant implementation plan.
model: opus
tools: Read, Grep, Glob
---
# Plan Challenger Agent
Read-only adversarial review of implementation plans. Produces structured challenges with severity ratings, then self-checks by attempting to refute each challenge. Never writes or edits files.
**Role**: Red team for implementation plans. Finds the holes before your team spends a week building on a flawed foundation.
**Why adversarial review works**: Multi-agent review, in which agents exchange information, is reported to outperform single-model analysis. The cited sources report a +52.8% security improvement over baseline reviews for the DrillAgent approach (adversarial probing) and +80% bug detection for model debate techniques, which force explicit reasoning about counterarguments.
## Challenge Dimensions
Attack the plan systematically across these 5 dimensions:
| Dimension | What to Challenge | Kill Question |
|-----------|------------------|---------------|
| **Assumptions** | Implicit beliefs the plan relies on without evidence | "What if this assumption is wrong?" |
| **Missing Cases** | Edge cases, error paths, concurrency, empty states | "What happens when X is null, empty, concurrent, or at scale?" |
| **Security Risks** | Auth gaps, injection surfaces, data exposure, trust boundaries | "How can a malicious actor exploit this?" |
| **Architectural Concerns** | Coupling, irreversibility, convention breaks, scaling walls | "Can we undo this in 6 months without rewriting?" |
| **Complexity Creep** | Over-engineering, premature abstraction, YAGNI violations | "Is this solving a real problem or a hypothetical one?" |
## Process
### Step 1: Understand the Plan
Read the full plan before challenging anything. Use Glob and Grep to verify the codebase context the plan references.
```
- Read the plan document completely
- Identify the stated goals and constraints
- Map which existing files/modules are affected (use Glob)
- Verify any claims about existing patterns (use Grep to count occurrences)
```
### Step 2: Attack Each Dimension
For each dimension, generate challenges. Be aggressive but grounded: every challenge must reference something concrete in the plan or codebase.
**Rules for good challenges:**
- Cite the specific part of the plan you're challenging
- Explain the failure scenario concretely (not "this could cause issues")
- Propose what would need to change if the challenge is valid
- If a challenge requires codebase evidence, gather it before making the claim
### Step 3: Refutation Check
This is the critical differentiator. For every challenge you raised, try to disprove it. This step eliminates noise and builds trust in the remaining findings.
For each challenge, ask:
1. Does the plan already address this elsewhere?
2. Is this handled by an existing pattern in the codebase? (Grep to verify)
3. Is the failure scenario actually possible given the constraints?
4. Is the risk proportional to the effort of addressing it?
Mark each challenge as:
- **Stands**: refutation attempt failed, the challenge is valid
- **Weakened**: partially addressed but still worth noting
- **Refuted**: the plan handles this, or the scenario is implausible. Drop it from the findings and list it only under "Refuted Challenges".
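The verdict bookkeeping can be pictured with a small Python sketch. The `Challenge` class and both helper functions are hypothetical; the agent itself works in prose, so this only illustrates how refuted challenges leave the findings while staying available for a transparency list.

```python
from dataclasses import dataclass

VERDICTS = {"stands", "weakened", "refuted"}


@dataclass
class Challenge:
    title: str
    dimension: str
    verdict: str  # one of VERDICTS


def split_by_verdict(challenges: list[Challenge]) -> tuple[list[Challenge], list[Challenge]]:
    """Return (findings, refuted): refuted items are kept apart, not reported as findings."""
    for challenge in challenges:
        if challenge.verdict not in VERDICTS:
            raise ValueError(f"unknown verdict: {challenge.verdict!r}")
    findings = [c for c in challenges if c.verdict != "refuted"]
    refuted = [c for c in challenges if c.verdict == "refuted"]
    return findings, refuted


def challenge_score(findings: list[Challenge]) -> int:
    """Count the distinct dimensions that still have at least one finding."""
    return len({c.dimension for c in findings})
```

Counting distinct dimensions rather than individual challenges matches an "X/5 dimensions with findings" style of summary.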
## Output Format
```markdown
## Plan Challenge: [Plan/Feature Name]
### Summary
[2-3 sentence overall assessment. Is this plan solid with minor gaps, or fundamentally flawed?]
### Challenge Score: X/5 dimensions with findings
---
### 🔴 Blockers (Do not proceed until resolved)
1. **[Challenge title]** — Dimension: [which]
- **Plan reference**: [Quote or cite the relevant section]
- **Attack**: [What breaks, concretely]
- **Evidence**: [Codebase evidence if applicable, with file:line]
- **Refutation attempt**: [How you tried to disprove this]
- **Verdict**: Stands / Weakened
- **Required change**: [What the plan must address]
### 🟡 Concerns (Address before implementation, or accept the risk explicitly)
[Same structure]
### 🟢 Nitpicks (Low risk, address if convenient)
[Same structure]
### Refuted Challenges (Transparency)
[List challenges you raised but then successfully disproved. This builds trust
in the remaining findings and shows your reasoning.]
### What's Solid
[Specific parts of the plan that survived adversarial review. Be concrete.]
### ❓ Needs Human Decision
- [ ] [Decisions where both options have legitimate trade-offs]
```
## Severity Classification
| Severity | Criteria | Action Required |
|----------|----------|----------------|
| **Blocker** | Will cause data loss, security breach, or require rewrite within 3 months | Must resolve before implementing |
| **Concern** | Creates technical debt, limits future options, or misses edge cases | Resolve or explicitly accept the risk with rationale |
| **Nitpick** | Suboptimal but functional, minor convention deviation | Fix if easy, skip if not |
## When to Use
- After a planner agent or human produces an implementation plan
- Before committing to a multi-day implementation effort
- When the team can't agree on an approach (use challenges to surface hidden assumptions)
- Before any irreversible architectural decision (database schema, public API contract)
## What This Agent Does NOT Do
- Write code or modify files
- Produce an alternative plan (it challenges plans; it does not design them)
- Review code quality or style (use `code-reviewer` for that)
- Perform architecture review of existing code (use `architecture-reviewer` for that)
## Complementary Agents
Use these agents together for comprehensive review:
| Agent | When | Relationship |
|-------|------|-------------|
| **architecture-reviewer** | After plan is approved, during implementation | Reviews the actual code structure |
| **plan-challenger** (this) | Before implementation starts | Reviews the plan itself |
| **security-auditor** | After implementation | Deep OWASP-level security review |
The pattern works best as a pipeline: plan-challenger validates the plan, then architecture-reviewer validates the implementation matches the (now-improved) plan.
## Model Rationale
Adversarial reasoning requires holding multiple perspectives simultaneously and systematically exploring failure modes. Opus's deeper reasoning is justified here because a missed blocker in plan review costs days of wasted implementation, while the review itself runs once per plan. The refutation step particularly benefits from stronger reasoning, since weak models tend to either over-challenge (generating noise) or under-refute (not catching their own false positives).
---
**Sources**:
- DrillAgent adversarial probing (+52.8% security improvement): [nsfocusglobal.com](https://nsfocusglobal.com)
- Model debate for bug detection (+80%): [milvus.io](https://milvus.io)
- Refutation reasoning pattern: secondary module refutes primary findings to eliminate false positives
- Architecture Reviewer (for code-level review): [architecture-reviewer.md](./architecture-reviewer.md)
- Code Reviewer (for style/quality): [code-reviewer.md](./code-reviewer.md)


@ -0,0 +1,340 @@
---
name: audit-codebase
description: "Codebase health audit scoring 7 categories with progression plan"
---
# Codebase Health Audit
Score your codebase across 7 health categories, identify weak spots, and get a prioritized progression plan. Each category is scored 1-10 with specific, actionable findings.
**Time**: 3-8 minutes depending on codebase size | **Scope**: Full project
## Instructions
You are a senior engineering consultant performing a codebase health assessment. Analyze the project across all 7 categories (or a subset if `$ARGUMENTS` specifies categories), score each one, and produce a progression plan.
If `$ARGUMENTS` contains category names (e.g., "secrets security tests"), only audit those categories. Otherwise, audit all 7.
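The filtering rule can be sketched as follows. This is a hypothetical illustration: the command is interpreted by Claude, not by code, and the hyphenated token `ai-patterns` for the two-word category is an assumption.

```python
# Hypothetical token names; "ai-patterns" is an assumed spelling for "AI Patterns".
CATEGORIES = (
    "secrets", "security", "dependencies", "structure",
    "tests", "imports", "ai-patterns",
)


def select_categories(arguments: str) -> list[str]:
    """Return the requested categories, or all 7 when none are recognized."""
    requested = {token.lower() for token in arguments.split()}
    selected = [name for name in CATEGORIES if name in requested]
    return selected or list(CATEGORIES)


print(select_categories("secrets security tests"))
print(len(select_categories("")))
```

Unrecognized tokens are ignored here, so a typo falls back to the full audit rather than an empty one.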
---
### Category 1: Secrets (Weight: 15%)
Scan for hardcoded credentials, API keys, and sensitive data in code.
```bash
# API keys and tokens in code (case-insensitive)
# The --include brace list is left unquoted so bash expands it to one flag per extension
grep -rni --include=*.{js,ts,py,go,java,rb,php,yaml,yml,json,toml,env,cfg,ini,conf} \
-E '(api[_-]?key|apikey|secret[_-]?key|password|passwd|token|bearer)\s*[=:]\s*["'\''"][^"'\'']{8,}' \
--exclude-dir={node_modules,vendor,.git,dist,build,target,__pycache__,.venv} . 2>/dev/null | head -20
# Known provider patterns
grep -rn -E 'sk-[a-zA-Z0-9]{20,}|ghp_[a-zA-Z0-9]{36}|AKIA[A-Z0-9]{16}|xox[bps]-[a-zA-Z0-9\-]{20,}' \
--exclude-dir={node_modules,vendor,.git,dist,build,target} . 2>/dev/null | head -10
# .env files committed
find . -name ".env*" -not -name ".env.example" -not -path "*/node_modules/*" -not -path "*/.git/*" -type f 2>/dev/null
# .gitignore coverage
[ -f ".gitignore" ] && {
for pattern in ".env" "*.pem" "*.key" "*.p12"; do
grep -q "$pattern" .gitignore 2>/dev/null && echo "OK: $pattern in .gitignore" || echo "MISSING: $pattern not in .gitignore"
done
}
```
**Scoring:**
- 10: Zero secrets, .gitignore covers all sensitive patterns, .env.example exists
- 7-9: No secrets in code, minor .gitignore gaps
- 4-6: 1-3 potential secrets found (may be false positives), or .env committed
- 1-3: Multiple secrets in code, private keys committed, no .gitignore protection
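Before running the provider-pattern scan on a real repository, it can help to check the regex against known-fake strings. The sketch below reuses the same pattern in Python; the helper name `find_provider_keys` and the sample strings are made up for illustration.

```python
import re

# Same alternation as the "Known provider patterns" grep above.
PROVIDER_PATTERN = re.compile(
    r"sk-[a-zA-Z0-9]{20,}|ghp_[a-zA-Z0-9]{36}|AKIA[A-Z0-9]{16}|xox[bps]-[a-zA-Z0-9\-]{20,}"
)


def find_provider_keys(text: str) -> list[str]:
    """Return every substring that looks like a provider-issued key."""
    return PROVIDER_PATTERN.findall(text)


# Obviously fake values, built at runtime so no key-like literal sits in the file.
fake_aws = "AKIA" + "A" * 16
fake_github = "ghp_" + "b" * 36

sample = f'aws_key = "{fake_aws}"\ntoken = "{fake_github}"\nshort = "sk-tooshort"'
print(find_provider_keys(sample))
```

A match is only a candidate: test fixtures and documentation often contain key-shaped strings, which is why the 4-6 scoring band allows for false positives.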
---
### Category 2: Security (Weight: 15%)
Check for OWASP-style vulnerabilities and unsafe patterns.
```bash
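# SQL injection patterns (concatenation or interpolation inside a query call)
# The --include brace list is left unquoted so bash expands it to one flag per extension
grep -rn --include=*.{js,ts,py,java,go,rb,php} \
-E '(query|execute|exec)\s*\(\s*[`"'\''"].*(\+|\$\{|%s|\.format\()' \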
--exclude-dir={node_modules,vendor,.git,dist,build,target,test,__test__} . 2>/dev/null | head -15
# eval/exec usage
grep -rn -E '\b(eval|exec|execSync|Function\(|setTimeout\([^,]*[+`]|setInterval\([^,]*[+`])' \
--include=*.{js,ts,py} --exclude-dir={node_modules,vendor,.git,dist} . 2>/dev/null | head -10
# Unsafe deserialization
grep -rn -E '(pickle\.loads|yaml\.load\(|JSON\.parse\(.*user|unserialize\()' \
--exclude-dir={node_modules,vendor,.git,dist} . 2>/dev/null | head -10
# Count route/endpoint definitions to review for missing input validation
grep -rn -E '(app\.(get|post|put|delete|patch)|router\.(get|post|put|delete))' \
--include=*.{js,ts} --exclude-dir={node_modules,.git,dist} . 2>/dev/null | wc -l
```
**Scoring:**
- 10: No injection patterns, no eval/exec, input validation on all endpoints, CSP headers
- 7-9: Minor issues (1-2 eval usages in non-user-facing code)
- 4-6: Some injection patterns, missing validation on several endpoints
- 1-3: Active SQL injection risk, eval with user input, no input sanitization
---
### Category 3: Dependencies (Weight: 15%)
Audit package health, known CVEs, and freshness.
```bash
# Node.js audit
[ -f "package-lock.json" ] && npm audit --json 2>/dev/null | jq '.metadata.vulnerabilities' 2>/dev/null
[ -f "package.json" ] && npx npm-check 2>/dev/null | tail -20
# Python
[ -f "requirements.txt" ] && pip-audit -r requirements.txt 2>/dev/null | tail -20
[ -f "pyproject.toml" ] && pip-audit 2>/dev/null | tail -20
# Rust
[ -f "Cargo.toml" ] && cargo audit 2>/dev/null | tail -20
# Go
[ -f "go.mod" ] && govulncheck ./... 2>/dev/null | tail -20
# Lockfile presence
for lockfile in package-lock.json yarn.lock pnpm-lock.yaml Cargo.lock go.sum poetry.lock; do
[ -f "$lockfile" ] && echo "OK: $lockfile exists"
done
[ ! -f "package-lock.json" ] && [ ! -f "yarn.lock" ] && [ ! -f "pnpm-lock.yaml" ] && [ -f "package.json" ] && echo "MISSING: No lockfile for Node.js project"
```
**Scoring:**
- 10: Zero CVEs, lockfile present, all dependencies <6 months old
- 7-9: No critical/high CVEs, minor outdated packages
- 4-6: 1-3 high CVEs, or >50% dependencies outdated by a year+
- 1-3: Critical CVEs, no lockfile, abandoned dependencies
---
### Category 4: Structure (Weight: 10%)
Evaluate file organization, naming conventions, and module boundaries.
```bash
# File count per top-level directory
for dir in */; do
[ -d "$dir" ] && [ "$dir" != "node_modules/" ] && [ "$dir" != ".git/" ] && [ "$dir" != "vendor/" ] && \
echo "$dir: $(find "$dir" -type f -not -path "*/node_modules/*" -not -path "*/.git/*" 2>/dev/null | wc -l) files"
done
# Deeply nested files (complexity indicator)
find . -mindepth 6 -type f -not -path "*/node_modules/*" -not -path "*/.git/*" -not -path "*/vendor/*" 2>/dev/null | head -10
# Mixed naming conventions
find . -type f -name "*_*" -not -path "*/node_modules/*" -not -path "*/.git/*" 2>/dev/null | head -5
find . -type f -name "*-*" -not -path "*/node_modules/*" -not -path "*/.git/*" 2>/dev/null | head -5
# Circular dependency indicators (for JS/TS projects)
[ -f "package.json" ] && npx madge --circular --extensions ts,js src/ 2>/dev/null | head -20
```
**Scoring:**
- 10: Clear module boundaries, consistent naming, no circular deps, flat hierarchy
- 7-9: Good structure with minor inconsistencies
- 4-6: Mixed conventions, some circular deps, unclear module boundaries
- 1-3: No clear structure, deeply nested files, widespread circular deps
---
### Category 5: Tests (Weight: 15%)
Assess test coverage, test quality, and testing practices.
```bash
# Test file count vs source file count
TEST_COUNT=$(find . -type f \( -name "*.test.*" -o -name "*.spec.*" -o -name "test_*" -o -path "*/test/*" -o -path "*/__tests__/*" \) \
-not -path "*/node_modules/*" -not -path "*/.git/*" 2>/dev/null | wc -l)
SRC_COUNT=$(find . -type f \( -name "*.ts" -o -name "*.js" -o -name "*.py" -o -name "*.go" -o -name "*.java" \) \
-not -name "*.test.*" -not -name "*.spec.*" -not -name "test_*" \
-not -path "*/node_modules/*" -not -path "*/.git/*" -not -path "*/dist/*" 2>/dev/null | wc -l)
echo "Test files: $TEST_COUNT | Source files: $SRC_COUNT | Ratio: $(echo "scale=2; $TEST_COUNT / ($SRC_COUNT + 1)" | bc)"
# Coverage config presence
for cfg in jest.config.* vitest.config.* .nycrc .coveragerc pytest.ini setup.cfg; do
[ -f "$cfg" ] && echo "OK: $cfg exists"
done
# Coverage report (if available)
[ -d "coverage" ] && [ -f "coverage/coverage-summary.json" ] && cat coverage/coverage-summary.json | jq '.total' 2>/dev/null
# Snapshot test count (potential maintenance burden)
find . -name "*.snap" -not -path "*/node_modules/*" 2>/dev/null | wc -l
```
**Scoring:**
- 10: >0.8 test ratio, coverage >80%, CI runs tests, no stale snapshots
- 7-9: >0.5 test ratio, coverage >60%, coverage config present
- 4-6: Some tests exist but gaps are obvious, no coverage tracking
- 1-3: <0.2 test ratio or no tests at all
---
### Category 6: Imports (Weight: 10%)
Check for unused imports, circular dependencies, and type coverage.
```bash
# Unused imports (TypeScript/JavaScript)
[ -f "tsconfig.json" ] && npx tsc --noEmit 2>&1 | grep -c "declared but" 2>/dev/null
[ -f "tsconfig.json" ] && npx tsc --noEmit 2>&1 | grep "declared but" | head -10
# TypeScript strict mode
[ -f "tsconfig.json" ] && grep -E '"strict"|"noImplicitAny"|"strictNullChecks"' tsconfig.json 2>/dev/null
# Python unused imports
[ -f "pyproject.toml" ] || [ -f "setup.py" ] && python -m pyflakes . 2>/dev/null | grep "imported but unused" | head -10
# Wildcard imports (code smell)
grep -rn 'import \*' --include=*.{py,ts,js} --exclude-dir={node_modules,vendor,.git} . 2>/dev/null | head -10
```
**Scoring:**
- 10: Zero unused imports, TypeScript strict mode, no wildcard imports
- 7-9: <5 unused imports, strict mode enabled with minor gaps
- 4-6: 5-20 unused imports, no strict mode, some wildcard imports
- 1-3: >20 unused imports, widespread wildcard imports, no type checking
---
### Category 7: AI Patterns (Weight: 20%)
Evaluate Claude Code configuration maturity and AI-assisted development readiness.
```bash
# CLAUDE.md presence and quality
[ -f "CLAUDE.md" ] && echo "OK: CLAUDE.md exists ($(wc -l < CLAUDE.md) lines)" || echo "MISSING: No CLAUDE.md"
[ -f ".claude/settings.json" ] && echo "OK: .claude/settings.json exists" || echo "MISSING: No .claude/settings.json"
# Custom commands
COMMANDS=$(find .claude/commands -name "*.md" 2>/dev/null | wc -l)
echo "Custom commands: $COMMANDS"
# Hooks
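# grep -c already prints 0 when nothing matches; default to 0 only if the file is missing
HOOKS_CFG=$(grep -c "hooks" .claude/settings.json 2>/dev/null)
HOOKS_CFG=${HOOKS_CFG:-0}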
echo "Hook configurations: $HOOKS_CFG"
# Rules files
RULES=$(find .claude/rules -name "*.md" 2>/dev/null | wc -l)
echo "Rule files: $RULES"
# Agents
AGENTS=$(find .claude/agents -name "*.md" 2>/dev/null | wc -l)
echo "Agent definitions: $AGENTS"
# Skills
SKILLS=$(find .claude/skills -name "*.md" 2>/dev/null | wc -l)
echo "Skills: $SKILLS"
# .gitignore for AI artifacts
grep -q "claude" .gitignore 2>/dev/null && echo "OK: Claude patterns in .gitignore" || echo "INFO: No Claude patterns in .gitignore"
```
**Scoring:**
- 10: CLAUDE.md with conventions, hooks configured, custom commands, rules, agents
- 7-9: CLAUDE.md exists with project context, some commands or rules
- 4-6: Basic CLAUDE.md, no hooks or commands
- 1-3: No CLAUDE.md or empty CLAUDE.md
---
## Scoring & Report
### Overall Score Calculation
```
Overall = (Secrets * 0.15) + (Security * 0.15) + (Dependencies * 0.15) +
(Structure * 0.10) + (Tests * 0.15) + (Imports * 0.10) +
(AI Patterns * 0.20)
```
Round to one decimal place.
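As a worked example of the formula, here is a minimal Python sketch. The function name and the sample scores are illustrative, and it assumes all 7 categories were audited; how to weight a filtered subset is not specified here.

```python
WEIGHTS = {
    "Secrets": 0.15,
    "Security": 0.15,
    "Dependencies": 0.15,
    "Structure": 0.10,
    "Tests": 0.15,
    "Imports": 0.10,
    "AI Patterns": 0.20,
}


def overall_score(scores: dict[str, float]) -> float:
    """Weighted sum of the 7 category scores, rounded to one decimal place."""
    if scores.keys() != WEIGHTS.keys():
        raise ValueError("expected one score for each of the 7 categories")
    total = sum(scores[name] * weight for name, weight in WEIGHTS.items())
    return round(total, 1)


sample_scores = {
    "Secrets": 8, "Security": 6, "Dependencies": 7, "Structure": 9,
    "Tests": 5, "Imports": 8, "AI Patterns": 4,
}
# 1.2 + 0.9 + 1.05 + 0.9 + 0.75 + 0.8 + 0.8 = 6.4
print(overall_score(sample_scores))
```

Because AI Patterns carries the largest weight, a low score there pulls the overall result down more than a low score in any other single category.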
### Output Format
```markdown
## Codebase Health Audit
**Project**: [directory name]
**Date**: [timestamp]
**Categories audited**: [all 7 or filtered subset]
### Overall Score: [X.X] / 10
| Category | Score | Weight | Weighted | Key Finding |
|----------|-------|--------|----------|-------------|
| Secrets | X/10 | 15% | X.XX | [one-line summary] |
| Security | X/10 | 15% | X.XX | [one-line summary] |
| Dependencies | X/10 | 15% | X.XX | [one-line summary] |
| Structure | X/10 | 10% | X.XX | [one-line summary] |
| Tests | X/10 | 15% | X.XX | [one-line summary] |
| Imports | X/10 | 10% | X.XX | [one-line summary] |
| AI Patterns | X/10 | 20% | X.XX | [one-line summary] |
| **Overall** | | **100%** | **X.XX** | |
### Detailed Findings
#### 🔴 Critical (fix immediately)
- [Finding with file:line reference and concrete fix]
#### 🟡 Warning (fix this week)
- [Finding with context and suggested approach]
#### 🟢 Info (nice to improve)
- [Observation with optional suggestion]
### Progression Plan
[Based on overall score, show the appropriate tier]
#### Tier 1: Foundation (current score <5, target: 5)
Focus on eliminating critical risks before anything else.
| Priority | Action | Category | Impact | Effort |
|----------|--------|----------|--------|--------|
| 1 | [specific action] | [category] | [score gain] | [time estimate] |
| 2 | [specific action] | [category] | [score gain] | [time estimate] |
| ... | | | | |
#### Tier 2: Solid (current score 5 to <8, target: 8)
Build reliable practices on top of the foundation.
| Priority | Action | Category | Impact | Effort |
|----------|--------|----------|--------|--------|
| 1 | [specific action] | [category] | [score gain] | [time estimate] |
| ... | | | | |
#### Tier 3: Excellent (current score 8+, target: 10)
Polish and optimize for maximum team velocity.
| Priority | Action | Category | Impact | Effort |
|----------|--------|----------|--------|--------|
| 1 | [specific action] | [category] | [score gain] | [time estimate] |
| ... | | | | |
### Quick Wins (< 30 minutes each)
1. [Action that improves score with minimal effort]
2. [...]
3. [...]
```
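Choosing which tier to show is a simple threshold lookup. The sketch below is one way to express it; the function name `progression_tier` is hypothetical, and it assumes that scores between 7 and 8 (for example 7.5) belong to Tier 2, since the overall score is rounded to one decimal and has to land in exactly one tier.

```python
def progression_tier(overall: float) -> str:
    """Map an overall score (0-10) to the progression tier to display."""
    if not 0 <= overall <= 10:
        raise ValueError("overall score must be between 0 and 10")
    if overall < 5:
        return "Tier 1: Foundation"
    if overall < 8:
        # Assumption: anything from 5 up to (but not including) 8 is Tier 2.
        return "Tier 2: Solid"
    return "Tier 3: Excellent"
```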
### Severity Split
Approximately 70% of findings should be automatable (scripts, linters, CI checks can detect them). Flag the remaining 30% as requiring human judgment, and explain why automation falls short for those cases.
---
**Sources**:
- Variant Systems codebase analyzer plugin (variantsystems.io, Feb 2026): 7-category analysis framework
- OWASP Top 10 (2021): Security category patterns
- Claude Code Security Hardening Guide: AI Patterns category baseline
$ARGUMENTS


@ -0,0 +1,151 @@
---
description: "Session invariant template - hard constraints, quality thresholds, and anti-patterns that Claude must respect throughout a session"
---
# First Principles: Session Invariants
This is a template for the "Contract" layer of your Claude Code rules. These are constraints that must hold true for the entire session, regardless of which task is active or how much context has accumulated.
Customize the sections below to match your team's standards. Replace the example values with your own thresholds.
> **Why this matters**: As conversation context grows, earlier instructions lose influence on Claude's behavior. This is called "context decay." Session invariants placed in CLAUDE.md or rules files act as compression anchors that resist this decay, because they're injected at the start of every context window.
## Hard Constraints
These rules never have exceptions. If Claude is about to violate one, it must stop and flag the conflict rather than proceeding.
```markdown
# Hard Constraints (never-break rules)
## Data Safety
- Never delete production data without explicit user confirmation in the same message
- Never store secrets (API keys, passwords, tokens) in code files or commit them
- Never run DROP or TRUNCATE, or DELETE without a WHERE clause, on production databases
## Code Safety
- Never disable TypeScript strict mode or ESLint rules to make code compile
- Never catch errors silently (empty catch blocks, swallowed promises)
- Never use `any` type in TypeScript except in test fixtures
## Process Safety
- Never force-push to main/master
- Never skip pre-commit hooks (no --no-verify)
- Never amend a commit that has been pushed to a shared branch
## Scope Safety
- Never modify files outside the directories specified in the current task
- Never add dependencies without stating the reason and checking bundle size impact
- Never refactor code that isn't part of the current task (note it for later instead)
```
## Quality Thresholds
Thresholds beat vague adjectives. "Good coverage" means different things to different people; "80% line coverage" is unambiguous. Define your numbers here.
```markdown
# Quality Thresholds
## Testing
- Minimum test coverage: 80% line coverage for new code
- Every public function must have at least one test
- Every bug fix must include a regression test
- Integration tests required for any endpoint that touches the database
## Performance
- API response time: p95 < 200ms for read endpoints, < 500ms for writes
- Bundle size: Total JS < 250KB gzipped (check with `npx bundlesize`)
- No N+1 queries (use DataLoader or equivalent for batch fetching)
- Database queries: no query > 100ms in development (enable slow query log)
## Code Quality
- Cyclomatic complexity: no function > 15 (enforce via ESLint rule)
- File length: no file > 400 lines (split when approaching limit)
- Function length: no function > 50 lines
- Nesting depth: no code > 4 levels of indentation
## Dependencies
- No dependency with known critical CVE
- No dependency abandoned > 2 years (check last publish date)
- Maximum 3 direct dependencies per feature module
```
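Numeric thresholds have the advantage that they can be checked mechanically. The sketch below shows one possible shape for such a check, under stated assumptions: the metric names and the `violations` helper are invented for this example, the limits mirror the sample values in the template, and for simplicity a value exactly at the limit is treated as passing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    name: str               # hypothetical metric identifier
    limit: float
    higher_is_better: bool  # True for coverage, False for sizes and latencies


# Limits taken from the example template; replace them with your own.
THRESHOLDS = [
    Threshold("line_coverage_pct", 80, higher_is_better=True),
    Threshold("p95_read_ms", 200, higher_is_better=False),
    Threshold("bundle_kb_gzipped", 250, higher_is_better=False),
    Threshold("max_cyclomatic_complexity", 15, higher_is_better=False),
    Threshold("max_file_lines", 400, higher_is_better=False),
]


def violations(measured: dict[str, float]) -> list[str]:
    """Return a readable line for every measured metric that misses its limit."""
    failed = []
    for threshold in THRESHOLDS:
        if threshold.name not in measured:
            continue  # metric not collected in this run
        value = measured[threshold.name]
        if threshold.higher_is_better:
            ok = value >= threshold.limit
        else:
            ok = value <= threshold.limit
        if not ok:
            failed.append(f"{threshold.name}: {value} (limit {threshold.limit})")
    return failed
```

A check like this could run in CI, so the thresholds are enforced by tooling as well as stated in the rules file.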
## Workflow Invariants
Process constraints that ensure consistency across the session, especially when switching between tasks or when sub-agents are involved.
```markdown
# Workflow Invariants
## Commit Discipline
- Every commit must pass all existing tests before being created
- Commit messages follow Conventional Commits format (feat:, fix:, docs:, etc.)
- One logical change per commit (don't mix refactor with feature)
## Review Before Action
- Read a file before modifying it (no blind edits)
- Run tests after every significant change (not just at the end)
- Verify imports after adding/removing dependencies
## Communication
- When uncertain between two approaches, present both with trade-offs (don't pick silently)
- When a task will take more than 5 tool calls, outline the plan first
- When hitting an unexpected error, diagnose before retrying
```
## Anti-Patterns to Detect
Patterns Claude should flag when it encounters them in the codebase or in its own output. These work like automated code review rules, but for the AI's behavior during a session.
```markdown
# Anti-Patterns to Detect
## Code Smells to Flag
- God objects: classes with >10 public methods or >5 injected dependencies
- Feature envy: a function that references another module's internals more than its own
- Primitive obsession: passing >3 related primitives instead of a typed object
- Temporal coupling: functions that must be called in a specific order without enforcement
## Process Smells to Flag
- Yak shaving: spending >3 tool calls on something tangential to the task
- Gold plating: adding features, abstractions, or error handling not requested
- Shotgun surgery: a single change requiring edits in >5 files (suggests missing abstraction)
- Copy-paste programming: duplicating >5 lines instead of extracting a function
## AI-Specific Anti-Patterns
- Hallucinated APIs: calling a method that doesn't exist in the current version
- Stale context: referencing file contents from earlier in the conversation that may have changed
- Over-apology: spending tokens on apologies instead of fixing the issue
- Premature optimization: adding caching, lazy loading, or memoization without evidence of a perf problem
```
## Mitigating Context Decay
Three practical strategies to keep these invariants effective across long sessions:
1. **Place in CLAUDE.md**: Rules in CLAUDE.md are injected at the start of every context window, surviving auto-compaction. This is the strongest position for invariants.
2. **Use rules files for domain-specific constraints**: Put testing thresholds in `.claude/rules/testing.md`, security rules in `.claude/rules/security.md`. They load with CLAUDE.md but keep each file focused.
3. **LEARNINGS.md hook pattern**: Configure a hook that injects accumulated session learnings into sub-agents, so constraints discovered mid-session propagate to delegated work:
```json
// .claude/settings.json (hooks section)
{
  "hooks": {
    "PreToolUse": [{
      "matcher": "Task",
      "hooks": [{
        "type": "command",
        "command": "cat .claude/LEARNINGS.md 2>/dev/null || true"
      }]
    }]
  }
}
```
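The intent is that sub-agents spawned via the Task tool receive the same session-specific learnings and constraints. How hook output is surfaced depends on the hook event and the Claude Code version, so confirm against the hooks reference that the text actually reaches the sub-agent's context.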
---
**Sources**:
- 3-layer context model (Contract / Working Set / Noise): codeaholicguy.com (Feb 2026)
- CLAUDE.md as "context compression anchors": Craig Johnston (imti.co)
- "Thresholds, not vibes" pattern: specific numbers over vague adjectives
- LEARNINGS.md hook pattern: community practice for propagating context to sub-agents


@ -12,7 +12,7 @@ tags: [cheatsheet, reference]
**Written with**: Claude (Anthropic)
**Version**: 3.29.2 | **Last Updated**: February 2026
**Version**: 3.30.0 | **Last Updated**: February 2026
---
@ -608,4 +608,4 @@ where.exe claude; claude doctor; claude mcp list
**Author**: Florian BRUNIAUX | [@Méthode Aristote](https://methode-aristote.fr) | Written with Claude
*Last updated: February 2026 | Version 3.29.2*
*Last updated: February 2026 | Version 3.30.0*


@ -16,7 +16,7 @@ tags: [guide, reference, workflows, agents, hooks, mcp, security]
**Last updated**: January 2026
**Version**: 3.29.2
**Version**: 3.30.0
---
@ -4642,6 +4642,78 @@ Express + Prisma backend.
**Production Safety**: For teams deploying Claude Code in production, see [Production Safety Rules](production-safety.md) for port stability, database safety, and infrastructure lock patterns.
### Modular Context Architecture
As projects grow, keeping everything in a single CLAUDE.md file becomes unwieldy. The community has converged on a modular approach that separates the index from the detail, using Claude's native file-loading mechanisms.
**The pattern**: CLAUDE.md stays under 100 lines and acts as a routing index. Domain-specific rules live in `.claude/rules/*.md` files, loaded automatically at session start. Skills and workflows live in `.claude/skills/`.
```
.claude/
├── CLAUDE.md              # Index only, under 100 lines
├── rules/
│   ├── testing.md         # Test conventions, coverage thresholds
│   ├── security.md        # Security invariants
│   ├── architecture.md    # Design decisions, ADR references
│   └── api-conventions.md # API standards, naming rules
└── skills/
    ├── deploy.md          # Deployment workflow
    └── review.md          # Code review process
```
**Why this works**: Claude automatically loads the files in `.claude/rules/` at session start (Section 3.2), except rules scoped with `paths:` frontmatter, which load only for matching files. The CLAUDE.md index stays readable at a glance while the relevant rule set is always active.
**Path-based conditional loading**: Claude supports `paths:` frontmatter in rule files to restrict rules to specific directories. A rule that only applies to notebook code doesn't need to load in every session:
```yaml
---
paths:
  - notebooks/**
  - experiments/**
---
# Jupyter Conventions
Always include a markdown cell explaining the experiment goal before any code.
Never use global state between notebook cells.
```
Rules without a `paths:` key load unconditionally. Rules with `paths:` only load when Claude is working with files that match those patterns.
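To make the loading behavior concrete, here is a small Python illustration of glob-based scoping. It is only a mental model: it uses the standard library's `fnmatch`, which is not Claude Code's actual matcher, and the rule file names (`testing.md`, `jupyter.md`) and the `active_rules` helper are assumptions for the example.

```python
from fnmatch import fnmatch

# Hypothetical rule files: None means "no paths: key", so the rule always loads.
RULES = {
    "testing.md": None,
    "jupyter.md": ["notebooks/**", "experiments/**"],
}


def active_rules(touched_files: list[str]) -> list[str]:
    """Return the rule files that would apply to the files being worked on."""
    active = []
    for rule, patterns in RULES.items():
        if patterns is None:
            active.append(rule)  # unconditional rule
        elif any(fnmatch(path, pattern)
                 for path in touched_files
                 for pattern in patterns):
            active.append(rule)  # at least one touched file matches a glob
    return active
```

With these sample rules, editing `src/app.ts` activates only `testing.md`, while editing anything under `notebooks/` activates both.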
**The 3-tier hierarchy** (community-validated pattern):
| Tier | Location | Content | When it loads |
|------|----------|---------|---------------|
| **Index** | `CLAUDE.md` | Commands, stack, critical constraints | Always |
| **Domain rules** | `.claude/rules/*.md` | Conventions by domain (testing, security, API) | Always (or path-scoped) |
| **Skills** | `.claude/skills/*.md` | Reusable workflows | On-demand via `/skill-name` |
**Practical example** for a full-stack project:
```markdown
# CLAUDE.md (index — 60 lines max)
## Stack
Next.js 14, TypeScript, PostgreSQL/Prisma, TailwindCSS
## Commands
- `pnpm dev` — start dev server
- `pnpm test` — run tests
- `pnpm build` — production build
## Rules loaded automatically
See .claude/rules/ for domain-specific conventions:
- testing.md — coverage minimums, test patterns
- security.md — auth rules, input validation
- api-conventions.md — REST naming, error format
## Critical constraints
- Never modify files in src/generated/ (auto-generated by Prisma)
- Always use pnpm, never npm or yarn
```
This separation keeps the daily-use index scannable while ensuring domain experts can expand their area without cluttering the shared index.
> **Source**: Pattern documented by the Claude Code community (joseparreogarcia.substack.com, 2026); 78% of developers create a CLAUDE.md within 48h of starting with Claude Code (SFEIR Institute survey). Path-based conditional loading is an official feature documented in the [Claude Code settings reference](https://docs.anthropic.com/en/docs/claude-code/settings).
---
## 3.2 The .claude/ Folder Structure
@ -4695,7 +4767,7 @@ The `.claude/` folder is your project's Claude Code directory for memory, settin
| Personal preferences | `CLAUDE.md` | ❌ Gitignore |
| Personal permissions | `settings.local.json` | ❌ Gitignore |
### 3.29.2 Version Control & Backup
### 3.30.0 Version Control & Backup
**Problem**: Without version control, losing your Claude Code configuration means hours of manual reconfiguration across agents, skills, hooks, and MCP servers.
@ -13097,6 +13169,98 @@ We've made your experience faster and more personal:
- Solution: Clean up commit history before generation (interactive rebase)
- Better: Enforce commit message format with git hooks
### Deployment Automation
Claude Code can automate deployments to Vercel, GCP, and other platforms using stored credentials. The key is assembling three components: secret management, a deploy skill, and mandatory guardrails.
#### Required secrets
Store credentials in the OS keychain rather than `.env` files. The commands below use the macOS `security` CLI; other platforms need their own keychain tool:
```bash
# Vercel deployment (3 required variables)
security add-generic-password -a claude -s VERCEL_TOKEN -w "your_token"
security add-generic-password -a claude -s VERCEL_ORG_ID -w "your_org_id"
security add-generic-password -a claude -s VERCEL_PROJECT_ID -w "your_project_id"
# Retrieve in scripts
VERCEL_TOKEN=$(security find-generic-password -s VERCEL_TOKEN -w)
```
For multi-platform secrets (GitHub, Vercel, AWS simultaneously), **Infisical** provides centralized management with versioning and point-in-time recovery — a useful open-source alternative to HashiCorp Vault:
```bash
# Install Infisical CLI
brew install infisical/get-cli/infisical
# Inject secrets into Claude Code session
infisical run -- claude
# Infisical automatically sets all project secrets as env vars
```
#### Deployment skill
Create a skill that encapsulates the full deploy workflow:
```yaml
---
name: deploy-to-vercel
description: Deploy to Vercel staging then production with smoke tests
allowed-tools: Bash
---
## Deploy Workflow
1. Run tests: `pnpm test` — stop if any fail
2. Build: `pnpm build` — stop if build fails
3. Deploy to staging: `vercel deploy`
4. Run smoke tests against staging URL
5. **PAUSE** — output staging URL and ask for human confirmation before production
6. On approval: `vercel deploy --prod`
7. Verify production URL responds with HTTP 200
```
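The control flow of this skill (stop on the first failure, pause before production) can be sketched in Python. This is an illustrative outline rather than a replacement for the skill: the `deploy`, `run`, and `confirm` names are assumptions, smoke tests are left out, and the only commands used are the ones listed in the workflow above.

```python
import subprocess
from typing import Callable


def run(command: str) -> bool:
    """Run a shell command and report whether it exited with status 0."""
    return subprocess.run(command, shell=True).returncode == 0


def deploy(
    run_step: Callable[[str], bool] = run,
    confirm: Callable[[], bool] = lambda: False,
) -> str:
    """Walk the workflow in order; stop at the first failure or missing approval."""
    for command in ("pnpm test", "pnpm build", "vercel deploy"):
        if not run_step(command):
            return f"stopped: '{command}' failed"
    # Smoke tests against the staging URL are omitted from this sketch.
    if not confirm():
        # Default behavior: never promote without an explicit human "yes".
        return "paused: awaiting human confirmation for production"
    if not run_step("vercel deploy --prod"):
        return "stopped: 'vercel deploy --prod' failed"
    return "deployed to production"
```

The design choice worth copying is that `confirm` defaults to refusing, so a caller that forgets to wire up the approval step gets a paused deploy rather than an unreviewed production release.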
#### Non-negotiable guardrails
These guardrails are not optional. Production deployments without them create incidents:
| Guardrail | Implementation | Why |
|-----------|---------------|-----|
| **Staging-first** | Always deploy to staging before prod | Catch environment-specific failures |
| **Human confirmation** | Stop and ask before `--prod` flag | No autonomous production deploys |
| **Smoke test** | Verify HTTP 200 on key endpoints after deploy | Catch silent deployment failures |
| **Rollback ready** | Record the previous deployment ID before promoting | Enables a fast revert with `vercel rollback <deployment-id>` |
**Hook for confirmation** (prevent accidental production deploys):
```json
// .claude/settings.json
{
  "hooks": {
    "PreToolUse": [{
      "matcher": "Bash",
      "hooks": [{
        "type": "command",
        "command": "scripts/check-prod-deploy.sh"
      }]
    }]
  }
}
```
```bash
#!/bin/bash
# check-prod-deploy.sh: exit 2 to block, exit 0 to allow
INPUT=$(cat)
# -E gives portable alternation; the patterns are examples to adapt to your CLI usage
if echo "$INPUT" | grep -qE "vercel.*--prod|gcloud.* deploy.*production"; then
  # With exit code 2, the reason is read from stderr
  echo "BLOCKED: Production deploy requires manual confirmation. Run the command directly from your terminal." >&2
  exit 2
fi
exit 0
```
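The same check can be written against the structured hook payload instead of grepping raw text. The Python sketch below assumes the hook receives a JSON object on stdin whose `tool_input.command` field holds the Bash command; verify that shape against the hooks reference for your version. The names `should_block` and `main` and the regular expression are illustrative.

```python
import json
import re
import sys

# Example patterns only; extend them to cover the deploy commands your team uses.
PROD_DEPLOY = re.compile(r"vercel\b.*--prod\b|gcloud\b.*\bdeploy\b.*production")


def should_block(raw_event: str) -> bool:
    """Return True when the hook payload describes a production deploy command."""
    try:
        event = json.loads(raw_event)
    except json.JSONDecodeError:
        return False  # not a payload this sketch understands; let it through
    if not isinstance(event, dict):
        return False
    tool_input = event.get("tool_input")
    command = tool_input.get("command", "") if isinstance(tool_input, dict) else ""
    return bool(PROD_DEPLOY.search(str(command)))


def main() -> int:
    """Read the payload from stdin; return 2 to block, 0 to allow."""
    if should_block(sys.stdin.read()):
        print("BLOCKED: production deploy requires manual confirmation.",
              file=sys.stderr)
        return 2
    return 0
```

To install it as a hook script, call `sys.exit(main())` at the bottom of the file. Matching on the parsed command avoids false positives from other fields in the payload that happen to mention a deploy.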
> **Sources**: Vercel deploy skill pattern documented by the community (lobehub.com, haniakrim21); Infisical multi-platform secrets management at [infisical.com](https://infisical.com). No end-to-end automated deploy workflow exists in the community as of March 2026 — the building blocks are available but the staging-to-production promotion pattern is something each team assembles themselves.
## 9.4 IDE Integration
### VS Code Integration
@ -15013,6 +15177,48 @@ neonctl branches delete feature-payments
- [Neon Branching](https://neon.tech/docs/guides/branching) - Official Neon documentation
- [PlanetScale Branching](https://planetscale.com/docs/concepts/branching) - Official PlanetScale guide
### Coordinating Parallel Worktrees: Task Dependencies
When running multiple agents in parallel worktrees, the hardest problem isn't setup — it's coordination. There is no built-in automatic dependency detection between worktree agents. You manage it explicitly.
**The pattern: analyze files touched, then set `blockedBy` manually**
Before spawning parallel agents, identify which tasks share files:
```bash
# Quick dependency check: list files each task will touch
echo "Task A (auth feature):"
grep -r "UserService\|auth/" src/ --include="*.ts" -l
echo "Task B (payment feature):"
grep -r "PaymentService\|billing/" src/ --include="*.ts" -l
# No overlap? Safe to parallelize.
# Overlap detected? Sequence them.
```
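Comparing the two file lists by eye gets error-prone on larger codebases, so the overlap can be computed directly. The Python sketch below mirrors the `grep -r ... -l` calls above under a few assumptions: the helper names `files_touching` and `overlap` are hypothetical, only file contents are searched (as with `grep`), and the patterns are ordinary regular expressions.

```python
import re
from pathlib import Path


def files_touching(root: Path, pattern: str, suffix: str = ".ts") -> set[Path]:
    """Return files under root whose contents match the regex pattern."""
    regex = re.compile(pattern)
    hits = set()
    for path in root.rglob(f"*{suffix}"):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        if regex.search(text):
            hits.add(path)
    return hits


def overlap(root: Path, pattern_a: str, pattern_b: str) -> set[Path]:
    """Files that both tasks are likely to touch; empty means safe to parallelize."""
    return files_touching(root, pattern_a) & files_touching(root, pattern_b)
```

For the example tasks above, `overlap(Path("src"), r"UserService|auth/", r"PaymentService|billing/")` returns the shared files. An empty set suggests the tasks can run in parallel, though a text search cannot see indirect coupling such as shared database tables.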
In the Tasks API, set `blockedBy` for tasks that depend on others completing first:
```
// Illustrative pseudocode (not valid JSON): Task B cannot start until Task A merges
TaskCreate("Implement payment service", { blockedBy: ["task-a-id"] })
```
**Decision matrix**:
| Scenario | Strategy |
|----------|----------|
| Tasks touch different files, different modules | Parallelize freely |
| Tasks touch same module, different files | Parallelize with explicit conflict resolution step |
| Tasks touch same files | Sequence them |
| Task B needs Task A's API contract | Block Task B until Task A's interface is defined |
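The matrix reads as a short chain of checks, evaluated from the most restrictive case down. A minimal Python rendering follows; the function name `coordination_strategy` and its boolean parameters are assumptions for the example, and the precedence between rows is my reading of the table, not something it states.

```python
def coordination_strategy(
    shared_files: bool,
    shared_module: bool,
    needs_upstream_contract: bool = False,
) -> str:
    """Pick a coordination strategy for two tasks, most restrictive case first."""
    if needs_upstream_contract:
        return "block until the upstream interface is defined"
    if shared_files:
        return "sequence"
    if shared_module:
        return "parallelize with an explicit conflict resolution step"
    return "parallelize freely"
```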
**Practical rule**: A 5-minute analysis to find file overlaps before spawning agents saves hours of merge conflict resolution.
**Tooling**: [coderabbitai/git-worktree-runner](https://github.com/coderabbitai/git-worktree-runner) provides a bash-based worktree manager with basic AI tool integration. It handles the worktree lifecycle but not dependency detection — that stays manual.
> **Note**: Fully automatic dependency detection (where the system infers which tasks conflict) doesn't exist in Claude Code or the broader ecosystem as of March 2026. The approaches above are the practical state of the art.
---
## 9.13 Cost Optimization Strategies
@ -21938,4 +22144,4 @@ We'll evaluate and add it to this section if it meets quality criteria.
**Contributions**: Issues and PRs welcome.
**Last updated**: January 2026 | **Version**: 3.29.2
**Last updated**: January 2026 | **Version**: 3.30.0


@ -117,6 +117,30 @@ Agent teams represent the evolution from "single agent" to "coordinated teams" p
| **Over-delegation** | Context switching cost exceeds gains | Active human oversight on critical decisions |
| **Premature automation** | Automating workflow not mastered manually | Manual → Semi-auto → Full-auto (progressive) |
### When Large Teams ARE Justified
The ">5 agents" rule above is a sensible default, but it breaks down in specific scenarios where the math favors larger teams. The real question is not "how many agents?" but "is the coordination overhead less costly than the context overflow?"
**Context window as the deciding factor**: A single Claude Code agent on a 50K+ line codebase fills 80-90% of its context window just loading the relevant files (source: atcyrus.com). At that point, the agent has almost no room left for reasoning. Splitting across multiple agents keeps each one at ~40% context usage, which leaves headroom for actual problem-solving.
| Scenario | Single Agent | 3-Agent Team | 5-Agent Team |
|----------|-------------|--------------|--------------|
| 10K line codebase | ~30% context, comfortable | Overkill | Overkill |
| 50K line codebase | 80-90% context, degraded reasoning | Ideal split | Justified if truly parallel modules |
| 100K+ line codebase | Context overflow, agent misses files | May still overflow per agent | Justified, consider even more |
**When more agents make sense**:
- Independent modules with zero shared state (no coordination overhead to pay)
- Parallel refactoring across isolated file trees (frontend vs backend vs infra)
- Read-heavy analysis where each agent covers a different subsystem
- The codebase physically cannot fit in one agent's context with room to spare
**When more agents hurt**: If agents constantly need to read each other's output or modify shared files, adding agents adds merge conflicts and coordination messages that eat into the very context you were trying to save.
> **Note on model selection per role**: As of March 2026, all agents in a team run the same model (Opus 4.6, required for Agent Teams). The community has requested role-based model selection where the team lead runs Opus for planning, implementation agents run Sonnet for speed, and test agents run Haiku for cost efficiency. This is not yet supported. The current workaround is spawning separate Claude Code processes with explicit `--model` flags, but you lose the built-in coordination and shared task list. Track this as a community feature request.
For broader industry context: Gartner predicts 40% of enterprise applications will incorporate task-specific agents by end of 2026. The team coordination patterns being established now in Claude Code and similar tools will likely become standard practice.
### Cost-Benefit Analysis
**Agent Teams** vs **Multi-Instance Manual**:


@ -0,0 +1,293 @@
---
title: "Event-Driven Agent Automation"
description: "Trigger Claude Code agents from external events like Kanban card moves, GitHub issues, and Jira transitions"
tags: [workflow, agents, automation, event-driven, kanban]
---
# Event-Driven Agent Automation
> **Confidence**: Tier 3 — Emerging pattern, early adopters report positive results but tooling is still maturing.
Instead of manually invoking Claude Code for each task, let external events drive the work. A card moves to "In Progress" in Linear, and Claude picks it up automatically. A GitHub issue gets labeled `claude-fix`, and an agent starts working on it within seconds.
This is the shift from pull-based ("hey Claude, do this") to push-based ("events trigger agents").
---
## Table of Contents
1. [Core Concept](#core-concept)
2. [The Linear-Driven Agent Loop](#the-linear-driven-agent-loop)
3. [Generic Event-to-Agent Pattern](#generic-event-to-agent-pattern)
4. [Implementation Example](#implementation-example)
5. [Event Source Compatibility](#event-source-compatibility)
6. [Guardrails](#guardrails)
7. [Anti-Patterns](#anti-patterns)
8. [Tools & Resources](#tools--resources)
9. [See Also](#see-also)
---
## Core Concept
Traditional Claude Code usage is interactive: you open a terminal, type a prompt, iterate. Event-driven automation removes the human from the trigger step. The human still reviews output (PRs, code changes), but initiation happens through your existing project management workflow.
```mermaid
flowchart LR
A[Event Source] -->|webhook/poll| B[Event Filter]
B -->|matches rules| C[Context Extraction]
C -->|task data| D[Agent Selection]
D -->|spawn| E[Claude Code Agent]
E -->|results| F[Output Routing]
F -->|PR, comment, card update| A
style A fill:#f9f,stroke:#333
style E fill:#bbf,stroke:#333
style F fill:#bfb,stroke:#333
```
The loop is self-reinforcing: the agent's output (a PR, a status update) feeds back into the event source, which can trigger the next step.
---
## The Linear-Driven Agent Loop
The most documented pattern comes from Damian Galarza's workflow (damiangalarza.com, February 2026). Linear serves as the single source of truth for what needs doing, and Claude Code handles implementation end to end.
### Flow
```mermaid
flowchart TD
A[Developer moves card to 'In Progress'] -->|Linear webhook| B[Agent picks up card]
B --> C[Read card description + acceptance criteria]
C --> D[Claude Code implements feature]
D --> E[Run tests + lint]
E -->|pass| F[Open PR automatically]
E -->|fail| D
F --> G[Move card to 'In Review']
G --> H[Human reviews PR]
H -->|approve + merge| I[Move card to 'Done']
H -->|request changes| D
```
### What makes it work
The card description acts as the prompt. Good cards with clear acceptance criteria produce good code. Vague cards produce vague code, same as with human developers. The quality of your tickets directly determines the quality of the automation.
Linear's structured fields (description, acceptance criteria, labels, priority) map naturally to Claude Code's needs: what to build, how to verify it, and what constraints apply.
### Key requirements
- Cards must have clear acceptance criteria (not just a title)
- The repo needs a solid test suite for automated verification
- Branch naming conventions should be deterministic (e.g., `feat/LINEAR-123-card-title`)
- PR templates help standardize the agent's output
---
## Generic Event-to-Agent Pattern
The Linear example is specific, but the pattern generalizes to any event source. Five components make up the pipeline:
### 1. Event Source
Where the trigger originates. Could be a project management tool, a CI system, a monitoring alert, or a custom webhook.
### 2. Event Filter
Not every event should spawn an agent. Filters determine which events are actionable:
```bash
# Example: only process cards with the "claude-auto" label
if [[ "$CARD_LABELS" != *"claude-auto"* ]]; then
  echo "Skipping: no claude-auto label"
  exit 0
fi
```
### 3. Context Extraction
Pull the relevant data from the event payload and format it as a Claude Code prompt. This is where you translate from your tool's schema to natural language instructions.
### 4. Agent Selection
Different event types might need different agent configurations. A bug report needs a different CLAUDE.md context than a feature request. You might use different allowed tools, different models, or different safety constraints.
### 5. Output Routing
Where do the results go? Typically a combination of:
- Git branch + PR (code changes)
- Comment on the original issue/card (status updates)
- State transition on the card (moving to next column)
- Slack notification (human awareness)
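The five components can be sketched as plain functions that hand data to one another. The Python outline below is a hedged illustration, not a reference implementation: the `Event` fields, the agent profile names (`bugfix-agent`, `feature-agent`, `default-agent`), and the output labels are all invented for the example, while the `claude-auto` label comes from the filter shown earlier.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Event:
    source: str                      # 1. event source, e.g. "linear"
    kind: str                        # e.g. "bug" or "feature"
    title: str
    body: str
    labels: list[str] = field(default_factory=list)


def is_actionable(event: Event) -> bool:
    """2. Event filter: only labeled events spawn an agent."""
    return "claude-auto" in event.labels


def build_prompt(event: Event) -> str:
    """3. Context extraction: translate the payload into instructions."""
    return f"Title: {event.title}\nDescription: {event.body}"


# 4. Agent selection: hypothetical profile names keyed by event kind.
AGENT_PROFILES = {"bug": "bugfix-agent", "feature": "feature-agent"}


def select_agent(event: Event) -> str:
    return AGENT_PROFILES.get(event.kind, "default-agent")


def route(event: Event) -> Optional[dict]:
    """Run the pipeline; 5. output routing is expressed as a list of targets."""
    if not is_actionable(event):
        return None
    return {
        "agent": select_agent(event),
        "prompt": build_prompt(event),
        "outputs": ["pull_request", "card_comment"],
    }
```

Keeping each stage a separate function makes it easy to swap the event source (Linear, GitHub, Jira) without touching agent selection or output routing.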
---
## Implementation Example
A minimal bash loop that polls Linear for "In Progress" cards and spawns Claude Code agents:
```bash
#!/bin/bash
# linear-agent-loop.sh
# Polls Linear for cards in "In Progress" state and spawns Claude agents

LINEAR_API_KEY="${LINEAR_API_KEY:?Missing LINEAR_API_KEY}"
TEAM_ID="${LINEAR_TEAM_ID:?Missing LINEAR_TEAM_ID}"
PROCESSED_FILE="/tmp/linear-agent-processed.txt"
MAX_CONCURRENT=3

touch "$PROCESSED_FILE"

poll_linear() {
  # Emit one tab-separated line per card; @tsv escapes embedded tabs and newlines
  curl -s -X POST https://api.linear.app/graphql \
    -H "Authorization: $LINEAR_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "query": "query { team(id: \"'"$TEAM_ID"'\") { issues(filter: { state: { name: { eq: \"In Progress\" } }, labels: { name: { eq: \"claude-auto\" } } }) { nodes { id title description } } } }"
    }' | jq -r '.data.team.issues.nodes[] | [.id, .title, .description] | @tsv'
}

spawn_agent() {
  local issue_id="$1"
  local title="$2"
  local description="$3"

  echo "[$(date)] Spawning agent for: $title ($issue_id)"

  # --dangerously-skip-permissions removes the approval prompts; pair it with the guardrails below
  claude --print --dangerously-skip-permissions \
    "Implement the following Linear card:
Title: $title
Description: $description
Requirements:
1. Create a feature branch named feat/$issue_id
2. Implement the described feature
3. Run tests and fix any failures
4. Create a PR with the card title" \
    2>&1 | tee "/tmp/agent-$issue_id.log"
}

while true; do
  # Process substitution keeps this loop in the current shell, so the
  # background jobs started here are visible to `jobs -r`
  while IFS=$'\t' read -r id title description; do
    if grep -qxF "$id" "$PROCESSED_FILE"; then
      continue  # Already processed
    fi
    if [ "$(jobs -r | wc -l)" -ge "$MAX_CONCURRENT" ]; then
      echo "[$(date)] Max concurrent agents reached ($MAX_CONCURRENT), deferring remaining cards"
      break
    fi
    # Record the card before spawning so the next poll does not pick it up again
    echo "$id" >> "$PROCESSED_FILE"
    spawn_agent "$id" "$title" "$description" &
  done < <(poll_linear)

  sleep 60  # Poll interval
done
```
This is a starting point, not production code. Real deployments need proper error handling, a persistent state store (not a text file), and webhook-based triggers instead of polling.
---
## Event Source Compatibility
| Event Source | Trigger Events | Agent Use Case | Integration Method |
|-------------|----------------|----------------|-------------------|
| **Linear** | Card state change, label added | Feature implementation, bug fix | GraphQL API / MCP server |
| **GitHub Issues** | Issue created, labeled | Bug triage, investigation, fix PR | GitHub Actions / webhooks |
| **GitHub PR** | PR opened, review requested | Code review, automated fixes | GitHub Actions |
| **Jira** | Transition, sprint assignment | Feature work, tech debt cleanup | REST API / webhooks |
| **Slack** | Message in channel, emoji reaction | Quick fixes, investigations | Slack API / bot |
| **PagerDuty** | Incident created | Diagnostic scripts, initial triage | Webhooks |
| **Custom webhook** | Any HTTP POST | Anything | Direct HTTP endpoint |
---
## Guardrails
Event-driven agents run with less human oversight by design, so guardrails become critical.
### Idempotency
An agent might process the same event twice (network retry, duplicate webhook). The agent must check if work already exists before starting:
```bash
# Check if branch already exists for this card
if git ls-remote --heads origin "feat/$ISSUE_ID" | grep -q "feat/$ISSUE_ID"; then
  echo "Branch already exists, skipping"
  exit 0
fi
```
### Rate Limiting
Don't let a burst of events spawn 50 agents simultaneously. Set hard limits:
- **Max concurrent agents**: 3-5 for most teams
- **Cooldown period**: Minimum 30 seconds between agent spawns
- **Daily budget cap**: Set a maximum token spend per day
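The first two limits can be combined in one small gate that every spawn request passes through. The sketch below is an in-memory illustration under stated assumptions: the `SpawnLimiter` class is hypothetical, the defaults come from the bullets above, and the daily budget cap is left out because it depends on how token spend is measured in your setup.

```python
import time
from typing import Callable, Optional


class SpawnLimiter:
    """Gate agent spawns by concurrency and by a cooldown between spawns."""

    def __init__(
        self,
        max_concurrent: int = 3,
        cooldown_seconds: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_concurrent = max_concurrent
        self.cooldown_seconds = cooldown_seconds
        self._clock = clock          # injectable so the limiter can be tested
        self._active = 0
        self._last_spawn: Optional[float] = None

    def try_acquire(self) -> bool:
        """Return True and reserve a slot if a new agent may start now."""
        now = self._clock()
        if self._active >= self.max_concurrent:
            return False
        if (self._last_spawn is not None
                and now - self._last_spawn < self.cooldown_seconds):
            return False
        self._active += 1
        self._last_spawn = now
        return True

    def release(self) -> None:
        """Call when an agent finishes, whether it succeeded or failed."""
        self._active = max(0, self._active - 1)
```

Events rejected by `try_acquire` should be queued for a later attempt rather than dropped, so a burst of card moves is spread out instead of lost.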
### Circuit Breaker
If agents keep failing on a particular type of task, stop trying:
```bash
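# grep -c already prints 0 when nothing matches, so only default when the log is missing
FAILURE_COUNT=$(grep -c "FAILED" "/tmp/agent-failures.log" 2>/dev/null)
FAILURE_COUNT=${FAILURE_COUNT:-0}
if [ "$FAILURE_COUNT" -gt 5 ]; then
  echo "Circuit breaker triggered: too many failures"
  # Notify human, pause automation
  exit 1
fi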
```
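The shell check above counts failures globally. A per-task variant stops retrying one stubborn card without pausing everything else. The Python sketch below shows that idea under stated assumptions: the `CircuitBreaker` class and its `dead_letters` list are hypothetical, the default limit of 3 is borrowed from the Anti-Patterns table in this document, and state is kept in memory where a real deployment would persist it.

```python
from collections import Counter


class CircuitBreaker:
    """Track failures per task and stop retrying once a task hits the limit."""

    def __init__(self, max_failures: int = 3) -> None:
        self.max_failures = max_failures
        self._failures = Counter()
        self.dead_letters = []  # task ids parked for human review

    def record_failure(self, task_id: str) -> None:
        self._failures[task_id] += 1
        if self._failures[task_id] == self.max_failures:
            # Park the task exactly once, when it first reaches the limit.
            self.dead_letters.append(task_id)

    def allows(self, task_id: str) -> bool:
        """Return True while the task is still under its failure limit."""
        return self._failures[task_id] < self.max_failures
```

Parked tasks stay visible in `dead_letters`, which gives a human a list to review instead of failures that silently disappear.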
### Human-in-the-Loop Checkpoints
Even in fully automated flows, keep humans in the loop at critical points:
- PR review remains manual (agents create PRs, humans approve them)
- Database migrations never auto-apply
- Deployment is a separate, human-triggered step
- Any card touching auth, billing, or PII requires explicit human approval
---
## Anti-Patterns
| Anti-Pattern | Problem | Solution |
|-------------|---------|----------|
| **Aggressive polling** | Hammering the API every 5 seconds wastes resources, gets you rate-limited | Use webhooks when available, poll no faster than every 60 seconds |
| **No circuit breaker** | Agent fails repeatedly on same task, burning tokens indefinitely | Track failures per task, stop after 3 attempts, alert human |
| **No dead letter queue** | Failed events disappear, nobody knows work was missed | Log failed events to a persistent store for manual review |
| **Unbounded concurrency** | 20 cards move at once, 20 agents spawn, machine melts | Hard cap on concurrent agents (3-5 is reasonable) |
| **Vague cards as prompts** | "Fix the thing" produces garbage code | Enforce card quality standards, skip cards without acceptance criteria |
| **No state persistence** | Script restarts, re-processes everything from scratch | Store processed event IDs in a database, not in-memory |
| **Skipping PR review** | Agent pushes directly to main | Always go through PR flow, humans review the output |
---
## Tools & Resources
### MCP Servers
- **linear-kanban-mcp** (0xikarus on GitHub): Exposes the Linear API for kanban board management directly from Claude Code. Enables reading cards, updating states, and managing labels without leaving the agent context.
### Skills & Platforms
- **skillsllm.com**: Offers a skill that orchestrates the full planning, validation, and execution cycle starting from a Linear card. Handles the translation from card metadata to structured Claude Code prompts.
### Agent Templates
- **Scrum Master Agent** (lobehub.com): Auto-detects whether it is running inside Claude Desktop or Claude Code and adapts its behavior accordingly. Useful as a starting point for context-aware agent design.
---
## See Also
- [agent-teams.md](./agent-teams.md) — Multi-agent parallel coordination
- [iterative-refinement.md](./iterative-refinement.md) — The core prompt-observe-reprompt loop
- [plan-driven.md](./plan-driven.md) — Plan before executing
- [../../examples/agents/](../../examples/agents/) — Ready-to-use agent templates


@ -22,7 +22,8 @@ Prompt, observe, reprompt until satisfied. The core loop of effective AI-assiste
6. [Script Generation Workflow](#script-generation-workflow)
7. [Iteration Strategies](#iteration-strategies)
8. [Anti-Patterns](#anti-patterns)
9. [See Also](#see-also)
9. [Community Patterns & Known Limitations](#community-patterns--known-limitations)
10. [See Also](#see-also)
---
@ -511,6 +512,124 @@ Perfect. Commit this as "feat: add debounce utility with full TypeScript support
---
## Community Patterns & Known Limitations
The community has built several patterns on top of Claude Code's iterative loop. Some solve real pain points; others expose current limitations worth knowing about.
### Ralph Loop (Test-Driven Autonomous Iteration)
Source: nathanonn.com, February 2026.
The Ralph Loop constrains autonomous iteration to one test case per cycle instead of running the full suite every time. This keeps each cycle focused and prevents the agent from chasing multiple failures at once.
How it works:
1. Pick one failing test case
2. Fix it, verify it passes
3. Save progress to a JSON state file
4. Move to the next failing test case
5. After 3 failed attempts on the same case, mark it as `known_issue` and skip it
```json
{
  "current_case": "test_auth_token_refresh",
  "attempts": 2,
  "known_issues": ["test_legacy_migration_edge_case"],
  "completed": ["test_login", "test_logout", "test_session_timeout"]
}
```
The state file is the key innovation here. It survives context resets, `/compact` operations, and even full session restarts. The agent reads the file at the start of each cycle to know exactly where it left off, which cases are done, and which ones to skip.
The 3-attempt limit prevents the infinite loop trap that plagues naive autonomous loops. Rather than burning tokens on a stubborn test case, the agent moves forward and flags the issue for human review later.
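
The cycle and state file described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the implementation from the original post: the function names and the `run_case` callback (which should attempt a fix and return whether the test now passes) are hypothetical. The JSON keys match the example state file.

```python
import json
from pathlib import Path
from typing import Callable, List, Optional

MAX_ATTEMPTS = 3  # after this many failed attempts, the case becomes a known issue


def load_state(path: Path) -> dict:
    """Read the state file, or start fresh if it does not exist yet."""
    if path.exists():
        return json.loads(path.read_text())
    return {"current_case": None, "attempts": 0, "known_issues": [], "completed": []}


def save_state(path: Path, state: dict) -> None:
    path.write_text(json.dumps(state, indent=2))


def next_case(state: dict, failing: List[str]) -> Optional[str]:
    """Resume the case in progress, else pick the first one not done or skipped."""
    if state["current_case"] is not None:
        return state["current_case"]
    skip = set(state["completed"]) | set(state["known_issues"])
    return next((case for case in failing if case not in skip), None)


def run_cycle(path: Path, failing: List[str],
              run_case: Callable[[str], bool]) -> Optional[str]:
    """One cycle: work on exactly one test case, then persist the outcome."""
    state = load_state(path)
    case = next_case(state, failing)
    if case is None:
        return None  # nothing left to do
    state["current_case"] = case
    state["attempts"] += 1
    if run_case(case):
        state["completed"].append(case)
        state["current_case"], state["attempts"] = None, 0
    elif state["attempts"] >= MAX_ATTEMPTS:
        state["known_issues"].append(case)  # flag for human review and move on
        state["current_case"], state["attempts"] = None, 0
    save_state(path, state)
    return case
```

Each call to `run_cycle` starts by reading the file, so it behaves the same whether the previous cycle ran a second ago or in a different session.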
### Auto-Continue Skill
Source: mcpmarket.com.
A confidence-based continuation system that decides whether the agent should keep going or stop for human input. Instead of a fixed iteration count, it evaluates the situation after each cycle:
**Auto-continues when**:
- Tests pass
- Build succeeds
- No new error types detected
- Confidence score remains above threshold
**Stops for human input when**:
- Confidence drops below threshold
- A new category of error appears (not just a new instance of a known error)
- Build or type-check fails in a way the agent hasn't seen before
This pairs well with Claude Code's Stop hooks. The skill can trigger post-task verification and decide whether to resume based on the results.
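
The continue/stop rules listed above can be expressed as a small decision function. The sketch below is not the skill's actual code, which is not documented here: `CycleResult`, `should_continue`, and the default threshold of 0.7 are hypothetical, and the function takes the conservative reading that every auto-continue condition must hold.

```python
from dataclasses import dataclass, field
from typing import Set, Tuple


@dataclass
class CycleResult:
    """Outcome of one iteration (hypothetical shape, for illustration only)."""
    tests_passed: bool
    build_succeeded: bool
    confidence: float  # 0.0 to 1.0, however the skill computes it
    error_types: Set[str] = field(default_factory=set)


def should_continue(result: CycleResult, seen_error_types: Set[str],
                    threshold: float = 0.7) -> Tuple[bool, str]:
    """Return (decision, reason). Any stop condition wins over continuing."""
    new_types = result.error_types - seen_error_types
    if result.confidence < threshold:
        return False, "confidence below threshold"
    if new_types:
        # A new category of error, not just another instance of a known one.
        return False, "new error category: " + ", ".join(sorted(new_types))
    if not (result.tests_passed and result.build_succeeded):
        return False, "tests or build failing"
    return True, "all checks green"
```

Returning the reason alongside the decision makes it easy to show the human why the agent stopped.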
### Stop Hooks for Automatic Verification
This pattern turns Claude Code's hook system into an automatic quality gate between iterations. Despite the name, the variant shown here hooks `PostToolUse` rather than the `Stop` event:
1. Claude finishes a task (or an iteration)
2. A `PostToolUse` hook on `TodoWrite` triggers a verification script
3. The script runs type-check, lint, and tests
4. Errors get piped back to Claude automatically
5. Claude fixes the issues without human intervention
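```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "TodoWrite",
        "hooks": [
          {
            "type": "command",
            "command": "bash -c 'fail=0; npm run typecheck >&2 || fail=1; npm run lint >&2 || fail=1; npm test >&2 || fail=1; [ $fail -eq 0 ] || exit 2'"
          }
        ]
      }
    ]
  }
}
```
The hook fires every time Claude updates its todo list, which includes marking a task as done. If any check fails, the command exits with code 2 and its output goes to stderr, which Claude Code feeds back to Claude so it can self-correct before moving to the next task.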
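
Once the verification grows beyond a one-liner, it is easier to maintain as a script that the hook calls. The sketch below is one possible shape, with several assumptions: the `CHECKS` list and function names are hypothetical, and the exit-code convention (exit code 2 with the report on stderr is shown to Claude) should be verified against the hooks documentation for your Claude Code version.

```python
import subprocess
import sys
from typing import List, Sequence, Tuple

# Hypothetical check list; replace with your project's commands.
CHECKS = [["npm", "run", "typecheck"], ["npm", "run", "lint"], ["npm", "test"]]


def run_checks(checks: Sequence[Sequence[str]]) -> Tuple[bool, str]:
    """Run every check in order and collect the output of the ones that fail."""
    failures: List[str] = []
    for cmd in checks:
        label = "$ " + " ".join(cmd)
        try:
            proc = subprocess.run(list(cmd), capture_output=True, text=True)
        except FileNotFoundError:
            failures.append(label + "\ncommand not found")
            continue
        if proc.returncode != 0:
            failures.append(label + "\n" + proc.stdout + proc.stderr)
    return (not failures, "\n".join(failures))


def main() -> int:
    """Return 0 when all checks pass, 2 (assumed blocking code) otherwise."""
    ok, report = run_checks(CHECKS)
    if ok:
        return 0
    print(report, file=sys.stderr)  # stderr is what gets surfaced to Claude
    return 2
```

To use it as a hook command, save it as a script, end the file with `sys.exit(main())`, and point the hook's `command` at it.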
### Escalation Strategy
When 3 iterations fail on the same problem, follow a structured escalation path instead of looping forever or giving up:
1. **Decompose**: Break the failing task into 2-3 smaller sub-tasks that can be tackled independently
2. **Collect context**: Dump all error messages, stack traces, and attempted fixes into a structured file
3. **Model escalation**: If using Sonnet, retry the specific failing case with Opus for deeper reasoning
4. **Human escalation**: If the model upgrade doesn't help, create a GitHub issue with the full error context and mark the task as `known_issue`
```bash
# Escalation in practice
if [ "$ATTEMPT_COUNT" -ge 3 ]; then
  # Collect context
  cat errors.log attempts.log > escalation-context.md
  # Try with Opus (-p runs Claude Code non-interactively)
  claude -p --model claude-opus-4-6 \
    "Fix this failing test. Context: $(cat escalation-context.md)"
  # If still failing, create issue. Caveat: $? only says whether the claude
  # run itself failed; re-run the test to confirm the fix actually worked.
  if [ $? -ne 0 ]; then
    gh issue create \
      --title "Auto-escalation: $TEST_NAME fails after 3 attempts" \
      --body "$(cat escalation-context.md)" \
      --label "known_issue,needs-human"
  fi
fi
```
The goal is never to silently drop work. Every failure either gets resolved, escalated, or explicitly tracked.
### Known Limitations
This is an honest list of what doesn't work yet, so you don't waste time looking for built-in solutions that don't exist.
**No built-in retry/verify/resume** (GitHub issue #28489): Headless automation in Claude Code lacks native support for retry logic, verification gates, and session resumption. Every team implementing autonomous loops builds their own version of this. State files, hook-based verification, and escalation scripts are all community workarounds for a gap in the platform.
**Agent iterations can be lost** (GitHub issue #28843): In multi-day workflows, agent iterations and their accumulated context can be destroyed. If you're running a workflow that spans multiple sessions or days, save explicit state files every N iterations. Do not rely on Claude's conversation memory as your only source of truth.
**Multi-day workflow fragility**: Long-running automation needs checkpointing discipline. Save state to disk (JSON files, git commits, issue comments) at regular intervals. The pattern is simple but easy to forget: if you can't reconstruct the agent's progress from files on disk alone, your workflow will break on session boundaries.
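
A checkpoint is only useful if it is never left half-written. The sketch below shows one way to save state every N iterations using a write-then-rename step, so a crash mid-write leaves the previous checkpoint intact. The function names and the default interval of 5 are hypothetical choices for illustration.

```python
import json
import os
import tempfile
from pathlib import Path


def write_checkpoint(path: Path, state: dict) -> None:
    """Write state atomically: temp file in the same directory, then rename."""
    fd, tmp = tempfile.mkstemp(dir=str(path.parent), suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f, indent=2)
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes reach the disk
        os.replace(tmp, path)  # atomic swap on the same filesystem
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def read_checkpoint(path: Path, default: dict) -> dict:
    """Load the last checkpoint, or fall back to a default on first run."""
    if path.exists():
        return json.loads(path.read_text())
    return default


def maybe_checkpoint(path: Path, state: dict, iteration: int,
                     every_n: int = 5) -> bool:
    """Save on every Nth iteration; return whether a save happened."""
    if iteration % every_n == 0:
        write_checkpoint(path, state)
        return True
    return False
```

On startup, call `read_checkpoint` first and resume from whatever it returns; if that is not enough to reconstruct progress, the state dictionary is missing something.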
---
## See Also
- [exploration-workflow.md](./exploration-workflow.md) — Explore alternatives before iterating


@ -3,8 +3,8 @@
# Source: guide/ultimate-guide.md
# Purpose: Condensed index for LLMs to quickly answer user questions about Claude Code
version: "3.29.2"
updated: "2026-02-26"
version: "3.30.0"
updated: "2026-03-03"
# ════════════════════════════════════════════════════════════════
# DEEP DIVE - Line numbers in guide/ultimate-guide.md
@ -47,6 +47,17 @@ deep_dive:
rules_code_quality_review: "examples/rules/code-quality-review.md"
rules_test_review: "examples/rules/test-review.md"
rules_performance_review: "examples/rules/performance-review.md"
rules_first_principles: "examples/rules/first-principles.md" # Session invariants: Contract/Working Set/Noise model
# Advanced Patterns (10 practitioner patterns, fact-checked March 2026)
modular_context_architecture: "guide/ultimate-guide.md:4645" # Pattern #1: CLAUDE.md-as-index + path-scoped rules
adversarial_plan_review: "examples/agents/plan-challenger.md" # Pattern #5: +52.8% security, +80% bug detection
adr_auto_generation: "examples/agents/adr-writer.md" # Pattern #3: criticality matrix C1/C2/C3
codebase_audit_scoring: "examples/commands/audit-codebase.md" # Pattern #9: 7-category scoring (Variant Systems)
event_driven_agents: "guide/workflows/event-driven-agents.md" # Pattern #8: Linear/GitHub/Jira webhook→agent
worktree_dependency_management: "guide/ultimate-guide.md:15180" # Pattern #6: manual analysis, no auto-detection
deployment_automation: "guide/ultimate-guide.md:13172" # Pattern #10: Vercel + Infisical guardrails
iterative_refinement_community: "guide/workflows/iterative-refinement.md:515" # Pattern #7: Ralph Loop + Auto-Continue
agent_teams_large_justified: "guide/workflows/agent-teams.md:120" # Pattern #4: when >5 agents are justified
# Team Configuration at Scale (Profile-Based Module Assembly)
team_ai_instructions_section: "guide/ultimate-guide.md#35-team-configuration-at-scale"
team_ai_instructions_workflow: "guide/workflows/team-ai-instructions.md"
@ -1415,7 +1426,7 @@ ecosystem:
- "Cross-links modified → Update all 4 repos"
history:
- date: "2026-01-20"
event: "Code Landing sync v3.29.2, 66 templates, cross-links"
event: "Code Landing sync v3.30.0, 66 templates, cross-links"
commit: "5b5ce62"
- date: "2026-01-20"
event: "Cowork Landing fix (paths, README, UI badges)"
@ -1427,7 +1438,7 @@ ecosystem:
onboarding_matrix_meta:
version: "2.0.0"
last_updated: "2026-02-05"
aligned_with_guide: "3.29.2"
aligned_with_guide: "3.30.0"
changelog:
- version: "2.0.0"
date: "2026-02-05"
@ -1455,7 +1466,7 @@ onboarding_matrix:
core: [rules, sandbox_native_guide, commands]
time_budget: "5 min"
topics_max: 3
note: "SECURITY FIRST - sandbox before commands (v3.29.2 critical fix)"
note: "SECURITY FIRST - sandbox before commands (v3.30.0 critical fix)"
beginner_15min:
core: [rules, sandbox_native_guide, workflow, essential_commands]
@ -1540,7 +1551,7 @@ onboarding_matrix:
- default: agent_validation_checklist
time_budget: "60 min"
topics_max: 6
note: "Dual-instance pattern for quality workflows (v3.29.2)"
note: "Dual-instance pattern for quality workflows (v3.30.0)"
learn_security:
intermediate_30min:
@ -1551,7 +1562,7 @@ onboarding_matrix:
- default: permission_modes
time_budget: "30 min"
topics_max: 4
note: "NEW goal (v3.29.2) - Security-focused learning path"
note: "NEW goal (v3.30.0) - Security-focused learning path"
power_60min:
core: [sandbox_native_guide, mcp_secrets_management, security_hardening]
@ -1576,7 +1587,7 @@ onboarding_matrix:
core: [rules, sandbox_native_guide, workflow, essential_commands, context_management, plan_mode]
time_budget: "60 min"
topics_max: 6
note: "Security foundation + core workflow (v3.29.2 sandbox added)"
note: "Security foundation + core workflow (v3.30.0 sandbox added)"
intermediate_120min:
core: [plan_mode, agents, skills, config_hierarchy, git_mcp_guide, hooks, mcp_servers]