diff --git a/CHANGELOG.md b/CHANGELOG.md
index 6ad7085..4f159c2 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -6,6 +6,28 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
 ## [Unreleased]
+## [3.30.0] - 2026-03-03
+
+### Added
+
+- **10 advanced patterns documented**: systematic audit of 10 patterns identified among expert practitioners, fact-checked via 9 Perplexity searches (March 2026). 5 new files created, 4 existing files enriched, 3 sections added to the main guide.
+
+  **New files**:
+  - `examples/agents/plan-challenger.md`: adversarial agent that challenges plans before implementation (+52.8% security, +80% bug detection; sources: DrillAgent/nsfocusglobal.com, milvus.io)
+  - `examples/agents/adr-writer.md`: agent for automatic ADR generation with a C1/C2/C3 criticality matrix; references the `mcp-adr-analysis-server` MCP (tosin2013/GitHub)
+  - `examples/commands/audit-codebase.md`: codebase scoring command across 7 categories (Secrets, Security, Dependencies, Structure, Tests, Imports, AI Patterns), 3 severity levels, tiered progression plan 5→8→10 (inspired by the Variant Systems open-source plugin)
+  - `examples/rules/first-principles.md`: session invariants template: Contract/Working Set/Noise model, measurable thresholds ("80% minimum" > "good coverage"), context decay mitigation
+  - `guide/workflows/event-driven-agents.md`: complete "event → agent" workflow: Linear-Driven Agent Loop (Galarza, Feb 2026), generic webhook pattern, events×agents table, guardrails (idempotency, rate limiting, circuit breaker)
+
+  **Main guide changes** (`guide/ultimate-guide.md`):
+  - §3.1: new subsection "Modular Context Architecture": CLAUDE.md-as-index (<100 lines), `paths:` frontmatter for conditional loading, 3-tier architecture root→rules/→skills/ (official but undocumented feature)
+  - §9.3: new subsection "Deployment Automation": Vercel building blocks (3 required variables), Infisical as an open-source alternative to Vault, deploy skill, non-negotiable guardrails (staging-first, confirmation hook, rollback)
+  - §9.12 (worktrees): new subsection "Coordinating Parallel Worktrees: Task Dependencies": manual analysis of touched files, explicit `blockedBy`, decision matrix, reference to `coderabbitai/git-worktree-runner`, clarification that automatic detection does not exist
+
+  **Workflow changes**:
+  - `guide/workflows/iterative-refinement.md`: new section "Community Patterns & Known Limitations": Ralph Loop (nathanonn.com), Auto-Continue Skill (mcpmarket.com), Stop Hooks integration, escalation strategy after 3 iterations, caveats from GitHub issues #28489 and #28843
+  - `guide/workflows/agent-teams.md`: nuances the >5 agents anti-pattern: context window table 10K/50K/100K+, model-per-role (desired feature, not supported by the API as of March 2026), Gartner prediction (40% enterprise, 2026)
+
 - **SonnetPlan hack documented** (`guide/ultimate-guide.md` §OpusPlan Mode): budget variant Sonnet→Haiku via remapping `ANTHROPIC_DEFAULT_OPUS_MODEL` + `ANTHROPIC_DEFAULT_SONNET_MODEL`: `sonnetplan()` shell function, Plan/Act routing, caveat that self-report is unreliable, link to GitHub issue [#9749](https://github.com/anthropics/claude-code/issues/9749). New template `examples/scripts/sonnetplan.sh` with installation instructions and a verification note (status bar vs self-report).
 - **Auto-memory documented as a third (native) memory system** (`guide/ultimate-guide.md` §Session vs Persistent Memory): goes from 2 to 3 systems (session / native auto-memory / Serena MCP), new 5×4 table, dedicated section "Auto-Memory (native, v2.1.59+)" with the MEMORY.md path and `/memory` management. Fix: the previous description tied `/memory` to CLAUDE.md (inaccurate) and ignored the native system. "When to use which" guidance updated.
diff --git a/README.md b/README.md index 917a308..dfb809d 100644 --- a/README.md +++ b/README.md @@ -6,7 +6,7 @@

Stars - Last Update + Last Update Quiz Templates Threat Database @@ -846,7 +846,7 @@ See [CONTRIBUTING.md](./CONTRIBUTING.md) for guidelines. --- -*Version 3.29.2 | Updated daily · Mar 2, 2026 | Crafted with Claude* +*Version 3.30.0 | Updated daily · Mar 3, 2026 | Crafted with Claude* |webhook/poll| B[Event Filter] + B -->|matches rules| C[Context Extraction] + C -->|task data| D[Agent Selection] + D -->|spawn| E[Claude Code Agent] + E -->|results| F[Output Routing] + F -->|PR, comment, card update| A + + style A fill:#f9f,stroke:#333 + style E fill:#bbf,stroke:#333 + style F fill:#bfb,stroke:#333 +``` + +The loop is self-reinforcing: the agent's output (a PR, a status update) feeds back into the event source, which can trigger the next step. + +--- + +## The Linear-Driven Agent Loop + +The most documented pattern comes from Damian Galarza's workflow (damiangalarza.com, February 2026). Linear serves as the single source of truth for what needs doing, and Claude Code handles implementation end to end. + +### Flow + +```mermaid +flowchart TD + A[Developer moves card to 'In Progress'] -->|Linear webhook| B[Agent picks up card] + B --> C[Read card description + acceptance criteria] + C --> D[Claude Code implements feature] + D --> E[Run tests + lint] + E -->|pass| F[Open PR automatically] + E -->|fail| D + F --> G[Move card to 'In Review'] + G --> H[Human reviews PR] + H -->|approve + merge| I[Move card to 'Done'] + H -->|request changes| D +``` + +### What makes it work + +The card description acts as the prompt. Good cards with clear acceptance criteria produce good code. Vague cards produce vague code, same as with human developers. The quality of your tickets directly determines the quality of the automation. + +Linear's structured fields (description, acceptance criteria, labels, priority) map naturally to Claude Code's needs: what to build, how to verify it, and what constraints apply. 
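One way to picture this mapping is a small helper that assembles the prompt from the card's fields. This is a sketch under assumptions: the function name `build_card_prompt`, its argument order, and the prompt wording are illustrative and not part of Galarza's workflow or the Linear API.

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn card fields into a Claude Code prompt.
# The field names follow the paragraph above; the prompt wording is an assumption.
build_card_prompt() {
  local title="$1" description="$2" criteria="$3" labels="$4"

  # Refuse cards without acceptance criteria: there is nothing to verify against.
  if [[ -z "$criteria" ]]; then
    echo "error: card '$title' has no acceptance criteria" >&2
    return 1
  fi

  printf 'Implement the following card.\n\n'
  printf 'What to build:\n%s\n%s\n\n' "$title" "$description"
  printf 'How to verify it (acceptance criteria):\n%s\n\n' "$criteria"
  printf 'Constraints (labels): %s\n' "${labels:-none}"
}
```

A caller could pass the result straight to the agent, for example `claude --print "$(build_card_prompt "$title" "$description" "$criteria" "$labels")"`, and skip the card when the helper returns non-zero.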
+ +### Key requirements + +- Cards must have clear acceptance criteria (not just a title) +- The repo needs a solid test suite for automated verification +- Branch naming conventions should be deterministic (e.g., `feat/LINEAR-123-card-title`) +- PR templates help standardize the agent's output + +--- + +## Generic Event-to-Agent Pattern + +The Linear example is specific, but the pattern generalizes to any event source. Five components make up the pipeline: + +### 1. Event Source + +Where the trigger originates. Could be a project management tool, a CI system, a monitoring alert, or a custom webhook. + +### 2. Event Filter + +Not every event should spawn an agent. Filters determine which events are actionable: + +```bash +# Example: only process cards with the "claude-auto" label +if [[ "$CARD_LABELS" != *"claude-auto"* ]]; then + echo "Skipping: no claude-auto label" + exit 0 +fi +``` + +### 3. Context Extraction + +Pull the relevant data from the event payload and format it as a Claude Code prompt. This is where you translate from your tool's schema to natural language instructions. + +### 4. Agent Selection + +Different event types might need different agent configurations. A bug report needs a different CLAUDE.md context than a feature request. You might use different allowed tools, different models, or different safety constraints. + +### 5. Output Routing + +Where do the results go? 
Typically a combination of:
+- Git branch + PR (code changes)
+- Comment on the original issue/card (status updates)
+- State transition on the card (moving to next column)
+- Slack notification (human awareness)
+
+---
+
+## Implementation Example
+
+A minimal bash loop that polls Linear for "In Progress" cards and spawns Claude Code agents:
+
+```bash
+#!/bin/bash
+# linear-agent-loop.sh
+# Polls Linear for cards in "In Progress" state and spawns Claude agents
+
+LINEAR_API_KEY="${LINEAR_API_KEY:?Missing LINEAR_API_KEY}"
+TEAM_ID="${LINEAR_TEAM_ID:?Missing LINEAR_TEAM_ID}"
+PROCESSED_FILE="/tmp/linear-agent-processed.txt"
+MAX_CONCURRENT=3
+
+touch "$PROCESSED_FILE"
+
+# Emits one "id|title|description" line per matching card.
+# Newlines in the description are flattened so each card stays on one line.
+poll_linear() {
+  curl -s -X POST https://api.linear.app/graphql \
+    -H "Authorization: $LINEAR_API_KEY" \
+    -H "Content-Type: application/json" \
+    -d '{
+      "query": "query { team(id: \"'"$TEAM_ID"'\") { issues(filter: { state: { name: { eq: \"In Progress\" } }, labels: { name: { eq: \"claude-auto\" } } }) { nodes { id title description } } } }"
+    }' | jq -r '.data.team.issues.nodes[] | "\(.id)|\(.title)|\(.description // "" | gsub("\n"; " "))"'
+}
+
+spawn_agent() {
+  local issue_id="$1"
+  local title="$2"
+  local description="$3"
+
+  echo "[$(date)] Spawning agent for: $title ($issue_id)"
+
+  claude --print --dangerously-skip-permissions \
+    "Implement the following Linear card:
+    Title: $title
+    Description: $description
+
+    Requirements:
+    1. Create a feature branch named feat/$issue_id
+    2. Implement the described feature
+    3. Run tests and fix any failures
+    4. Create a PR with the card title" \
+    2>&1 | tee "/tmp/agent-$issue_id.log"
+}
+
+while true; do
+  # Read via process substitution, not a pipe: a piped `while` runs in a
+  # subshell, so its background jobs would be invisible to `jobs -r` here.
+  while IFS='|' read -r id title description; do
+    if grep -qxF "$id" "$PROCESSED_FILE"; then
+      continue  # Already claimed
+    fi
+    if [ "$(jobs -r | wc -l)" -ge "$MAX_CONCURRENT" ]; then
+      echo "[$(date)] Max concurrent agents reached ($MAX_CONCURRENT), waiting..."
+      break  # Unclaimed cards are picked up on the next poll
+    fi
+    # Claim the card before spawning so the next poll does not start a duplicate
+    echo "$id" >> "$PROCESSED_FILE"
+    spawn_agent "$id" "$title" "$description" &
+  done < <(poll_linear)
+
+  sleep 60  # Poll interval
+done
+```
+
+This is a starting point, not production code. Real deployments need proper error handling, a persistent state store (not a text file), and webhook-based triggers instead of polling.
+
+---
+
+## Event Source Compatibility
+
+| Event Source | Trigger Events | Agent Use Case | Integration Method |
+|-------------|----------------|----------------|-------------------|
+| **Linear** | Card state change, label added | Feature implementation, bug fix | GraphQL API / MCP server |
+| **GitHub Issues** | Issue created, labeled | Bug triage, investigation, fix PR | GitHub Actions / webhooks |
+| **GitHub PR** | PR opened, review requested | Code review, automated fixes | GitHub Actions |
+| **Jira** | Transition, sprint assignment | Feature work, tech debt cleanup | REST API / webhooks |
+| **Slack** | Message in channel, emoji reaction | Quick fixes, investigations | Slack API / bot |
+| **PagerDuty** | Incident created | Diagnostic scripts, initial triage | Webhooks |
+| **Custom webhook** | Any HTTP POST | Anything | Direct HTTP endpoint |
+
+---
+
+## Guardrails
+
+Event-driven agents run with less human oversight by design, so guardrails become critical.
+
+### Idempotency
+
+An agent might process the same event twice (network retry, duplicate webhook). The agent must check if work already exists before starting:
+
+```bash
+# Check if branch already exists for this card
+if git ls-remote --heads origin "feat/$ISSUE_ID" | grep -q "feat/$ISSUE_ID"; then
+  echo "Branch already exists, skipping"
+  exit 0
+fi
+```
+
+### Rate Limiting
+
+Don't let a burst of events spawn 50 agents simultaneously. 
Set hard limits:
+
+- **Max concurrent agents**: 3-5 for most teams
+- **Cooldown period**: Minimum 30 seconds between agent spawns
+- **Daily budget cap**: Set a maximum token spend per day
+
+### Circuit Breaker
+
+If agents keep failing on a particular type of task, stop trying:
+
+```bash
+# grep -c prints 0 but exits non-zero when nothing matches, so `|| echo 0`
+# would yield "0" twice. Use `|| true` and default to 0 if the log is missing.
+FAILURE_COUNT=$(grep -c "FAILED" "/tmp/agent-failures.log" 2>/dev/null || true)
+if [ "${FAILURE_COUNT:-0}" -gt 5 ]; then
+  echo "Circuit breaker triggered: too many failures"
+  # Notify human, pause automation
+  exit 1
+fi
+```
+
+### Human-in-the-Loop Checkpoints
+
+Even in fully automated flows, keep humans in the loop at critical points:
+
+- PR review remains manual (agents create PRs, humans approve them)
+- Database migrations never auto-apply
+- Deployment is a separate, human-triggered step
+- Any card touching auth, billing, or PII requires explicit human approval
+
+---
+
+## Anti-Patterns
+
+| Anti-Pattern | Problem | Solution |
+|-------------|---------|----------|
+| **Aggressive polling** | Hammering the API every 5 seconds wastes resources, gets you rate-limited | Use webhooks when available, poll no faster than every 60 seconds |
+| **No circuit breaker** | Agent fails repeatedly on same task, burning tokens indefinitely | Track failures per task, stop after 3 attempts, alert human |
+| **No dead letter queue** | Failed events disappear, nobody knows work was missed | Log failed events to a persistent store for manual review |
+| **Unbounded concurrency** | 20 cards move at once, 20 agents spawn, machine melts | Hard cap on concurrent agents (3-5 is reasonable) |
+| **Vague cards as prompts** | "Fix the thing" produces garbage code | Enforce card quality standards, skip cards without acceptance criteria |
+| **No state persistence** | Script restarts, re-processes everything from scratch | Store processed event IDs in a database, not in-memory |
+| **Skipping PR review** | Agent pushes directly to main | Always go through PR flow, humans review the output |
+
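As a rough illustration of the "track failures per task" and "dead letter queue" rows above, the sketch below counts failures per task ID and writes a task to a dead letter file once it hits the limit. The state directory, file names, and function names are hypothetical; a real deployment would more likely use a database, as the table suggests.

```bash
#!/usr/bin/env bash
# Hypothetical per-task failure tracker with a dead letter log.
STATE_DIR="${AGENT_STATE_DIR:-/tmp/agent-state}"   # assumed location
MAX_ATTEMPTS=3
mkdir -p "$STATE_DIR"

# Print how many times a task has failed so far (0 if never).
failure_count() {
  local task_id="$1" n
  # grep -c exits non-zero on zero matches; `|| true` keeps the printed count.
  n=$(grep -cxF "$task_id" "$STATE_DIR/failures.log" 2>/dev/null || true)
  echo "${n:-0}"
}

# Record one failure; on reaching MAX_ATTEMPTS, log the task for manual review.
record_failure() {
  local task_id="$1" reason="$2"
  echo "$task_id" >> "$STATE_DIR/failures.log"
  if [ "$(failure_count "$task_id")" -eq "$MAX_ATTEMPTS" ]; then
    printf '%s\t%s\t%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$task_id" "$reason" \
      >> "$STATE_DIR/dead-letter.tsv"
  fi
}

# Succeeds (exit 0) while the task is still under the attempt limit.
should_retry() {
  [ "$(failure_count "$1")" -lt "$MAX_ATTEMPTS" ]
}
```

The polling loop would call `should_retry "$id"` before spawning an agent and `record_failure "$id" "$reason"` when one fails, so a stuck card stops consuming tokens after three attempts and still leaves a trace on disk.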
+--- + +## Tools & Resources + +### MCP Servers + +- **linear-kanban-mcp** (0xikarus on GitHub): Exposes the Linear API for kanban board management directly from Claude Code. Enables reading cards, updating states, and managing labels without leaving the agent context. + +### Skills & Platforms + +- **skillsllm.com**: Offers a skill that orchestrates the full planning, validation, and execution cycle starting from a Linear card. Handles the translation from card metadata to structured Claude Code prompts. + +### Agent Templates + +- **Scrum Master Agent** (lobehub.com): Auto-detects whether it is running inside Claude Desktop or Claude Code and adapts its behavior accordingly. Useful as a starting point for context-aware agent design. + +--- + +## See Also + +- [agent-teams.md](./agent-teams.md) — Multi-agent parallel coordination +- [iterative-refinement.md](./iterative-refinement.md) — The core prompt-observe-reprompt loop +- [plan-driven.md](./plan-driven.md) — Plan before executing +- [../../examples/agents/](../../examples/agents/) — Ready-to-use agent templates diff --git a/guide/workflows/iterative-refinement.md b/guide/workflows/iterative-refinement.md index a3916c5..b67aa72 100644 --- a/guide/workflows/iterative-refinement.md +++ b/guide/workflows/iterative-refinement.md @@ -22,7 +22,8 @@ Prompt, observe, reprompt until satisfied. The core loop of effective AI-assiste 6. [Script Generation Workflow](#script-generation-workflow) 7. [Iteration Strategies](#iteration-strategies) 8. [Anti-Patterns](#anti-patterns) -9. [See Also](#see-also) +9. [Community Patterns & Known Limitations](#community-patterns--known-limitations) +10. [See Also](#see-also) --- @@ -511,6 +512,124 @@ Perfect. Commit this as "feat: add debounce utility with full TypeScript support --- +## Community Patterns & Known Limitations + +The community has built several patterns on top of Claude Code's iterative loop. 
Some solve real pain points, others expose current limitations worth knowing about. + +### Ralph Loop (Test-Driven Autonomous Iteration) + +Source: nathanonn.com, February 2026. + +The Ralph Loop constrains autonomous iteration to one test case per cycle instead of running the full suite every time. This keeps each cycle focused and prevents the agent from chasing multiple failures at once. + +How it works: + +1. Pick one failing test case +2. Fix it, verify it passes +3. Save progress to a JSON state file +4. Move to the next failing test case +5. After 3 failed attempts on the same case, mark it as `known_issue` and skip it + +```json +{ + "current_case": "test_auth_token_refresh", + "attempts": 2, + "known_issues": ["test_legacy_migration_edge_case"], + "completed": ["test_login", "test_logout", "test_session_timeout"] +} +``` + +The state file is the key innovation here. It survives context resets, `/compact` operations, and even full session restarts. The agent reads the file at the start of each cycle to know exactly where it left off, which cases are done, and which ones to skip. + +The 3-attempt limit prevents the infinite loop trap that plagues naive autonomous loops. Rather than burning tokens on a stubborn test case, the agent moves forward and flags the issue for human review later. + +### Auto-Continue Skill + +Source: mcpmarket.com. + +A confidence-based continuation system that decides whether the agent should keep going or stop for human input. Instead of a fixed iteration count, it evaluates the situation after each cycle: + +**Auto-continues when**: +- Tests pass +- Build succeeds +- No new error types detected +- Confidence score remains above threshold + +**Stops for human input when**: +- Confidence drops below threshold +- A new category of error appears (not just a new instance of a known error) +- Build or type-check fails in a way the agent hasn't seen before + +This pairs well with Claude Code's Stop hooks. 
The skill can trigger post-task verification and decide whether to resume based on the results.
+
+### Stop Hooks for Automatic Verification
+
+A pattern that turns Claude Code's hook system into an automatic quality gate between iterations:
+
+1. Claude finishes a task (or an iteration)
+2. A `PostToolUse` hook on `TodoWrite` triggers a verification script
+3. The script runs type-check, lint, and tests
+4. If any check fails, the hook exits with code 2 and its stderr is fed back to Claude
+5. Claude fixes the issues without human intervention
+
+```json
+{
+  "hooks": {
+    "PostToolUse": [
+      {
+        "matcher": "TodoWrite",
+        "hooks": [
+          {
+            "type": "command",
+            "command": "bash -c 'rc=0; npm run typecheck >&2 || rc=2; npm run lint >&2 || rc=2; npm test >&2 || rc=2; exit $rc'"
+          }
+        ]
+      }
+    ]
+  }
+}
+```
+
+The hook fires every time Claude updates its todo list, which includes marking a task as done. If the verification catches something, Claude sees the output and can self-correct before moving to the next task.
+
+### Escalation Strategy
+
+What to do when 3 iterations fail on the same problem. Instead of looping forever or giving up, follow a structured escalation path:
+
+1. **Decompose**: Break the failing task into 2-3 smaller sub-tasks that can be tackled independently
+2. **Collect context**: Dump all error messages, stack traces, and attempted fixes into a structured file
+3. **Model escalation**: If using Sonnet, retry the specific failing case with Opus for deeper reasoning
+4. **Human escalation**: If the model upgrade doesn't help, create a GitHub issue with the full error context and mark the task as `known_issue`
+
+```bash
+# Escalation in practice
+# TEST_CMD is a placeholder for the command that re-runs the failing test
+if [ "$ATTEMPT_COUNT" -ge 3 ]; then
+  # Collect context
+  cat errors.log attempts.log > escalation-context.md
+
+  # Try with Opus (--print runs non-interactively and exits when done)
+  claude --print --model claude-opus-4-6 \
+    "Fix this failing test. Context: $(cat escalation-context.md)"
+
+  # Claude's exit code does not say whether the test passes, so re-run it.
+  # If still failing, create issue
+  if ! $TEST_CMD; then
+    gh issue create \
+      --title "Auto-escalation: $TEST_NAME fails after 3 attempts" \
+      --body "$(cat escalation-context.md)" \
+      --label "known_issue,needs-human"
+  fi
+fi
+```
+
+The goal is never to silently drop work. Every failure gets resolved, escalated, or explicitly tracked.
+
+### Known Limitations
+
+These are the gaps to know about up front, so you don't waste time looking for built-in solutions that don't exist.
+
+**No built-in retry/verify/resume** (GitHub issue #28489): Headless automation in Claude Code lacks native support for retry logic, verification gates, and session resumption. Every team implementing autonomous loops builds their own version of this. State files, hook-based verification, and escalation scripts are all community workarounds for a gap in the platform.
+
+**Agent iterations can be lost** (GitHub issue #28843): In multi-day workflows, agent iterations and their accumulated context can be destroyed. If you're running a workflow that spans multiple sessions or days, save explicit state files every N iterations. Do not rely on Claude's conversation memory as your only source of truth.
+
+**Multi-day workflow fragility**: Long-running automation needs checkpointing discipline. Save state to disk (JSON files, git commits, issue comments) at regular intervals. The pattern is simple but easy to forget: if you can't reconstruct the agent's progress from files on disk alone, your workflow will break on session boundaries. 
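A minimal sketch of that checkpointing discipline, assuming a plain text file with one completed case per line (a simplification of the JSON state file shown for the Ralph Loop). The file path and the function names `mark_done` and `next_case` are illustrative only.

```bash
#!/usr/bin/env bash
# Hypothetical checkpoint helpers: progress lives in a plain text file
# (one completed case per line) so a fresh session can resume from disk.
CHECKPOINT_FILE="${CHECKPOINT_FILE:-.agent-progress.txt}"   # assumed path

# Mark a case as completed. Write to a temp file and rename it, so a crash
# mid-write never leaves a truncated checkpoint behind.
mark_done() {
  local case_id="$1" tmp
  tmp=$(mktemp "${CHECKPOINT_FILE}.XXXXXX")
  { cat "$CHECKPOINT_FILE" 2>/dev/null; echo "$case_id"; } | sort -u > "$tmp"
  mv "$tmp" "$CHECKPOINT_FILE"
}

# Print the first case from the arguments that is not yet checkpointed.
# Prints nothing when every case is done.
next_case() {
  local case_id
  for case_id in "$@"; do
    if ! grep -qxF "$case_id" "$CHECKPOINT_FILE" 2>/dev/null; then
      echo "$case_id"
      return 0
    fi
  done
}
```

Because `next_case` reads only the file, a restarted session can call it with the full list of cases and pick up where the previous one stopped, without relying on conversation memory.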
+ +--- + ## See Also - [exploration-workflow.md](./exploration-workflow.md) — Explore alternatives before iterating diff --git a/machine-readable/reference.yaml b/machine-readable/reference.yaml index 2266ff0..0c4944f 100644 --- a/machine-readable/reference.yaml +++ b/machine-readable/reference.yaml @@ -3,8 +3,8 @@ # Source: guide/ultimate-guide.md # Purpose: Condensed index for LLMs to quickly answer user questions about Claude Code -version: "3.29.2" -updated: "2026-02-26" +version: "3.30.0" +updated: "2026-03-03" # ════════════════════════════════════════════════════════════════ # DEEP DIVE - Line numbers in guide/ultimate-guide.md @@ -47,6 +47,17 @@ deep_dive: rules_code_quality_review: "examples/rules/code-quality-review.md" rules_test_review: "examples/rules/test-review.md" rules_performance_review: "examples/rules/performance-review.md" + rules_first_principles: "examples/rules/first-principles.md" # Session invariants: Contract/Working Set/Noise model + # Advanced Patterns (10 practitioner patterns, fact-checked March 2026) + modular_context_architecture: "guide/ultimate-guide.md:4645" # Pattern #1: CLAUDE.md-as-index + path-scoped rules + adversarial_plan_review: "examples/agents/plan-challenger.md" # Pattern #5: +52.8% security, +80% bug detection + adr_auto_generation: "examples/agents/adr-writer.md" # Pattern #3: criticality matrix C1/C2/C3 + codebase_audit_scoring: "examples/commands/audit-codebase.md" # Pattern #9: 7-category scoring (Variant Systems) + event_driven_agents: "guide/workflows/event-driven-agents.md" # Pattern #8: Linear/GitHub/Jira webhook→agent + worktree_dependency_management: "guide/ultimate-guide.md:15180" # Pattern #6: manual analysis, no auto-detection + deployment_automation: "guide/ultimate-guide.md:13172" # Pattern #10: Vercel + Infisical guardrails + iterative_refinement_community: "guide/workflows/iterative-refinement.md:515" # Pattern #7: Ralph Loop + Auto-Continue + agent_teams_large_justified: 
"guide/workflows/agent-teams.md:120" # Pattern #4: when >5 agents are justified # Team Configuration at Scale (Profile-Based Module Assembly) team_ai_instructions_section: "guide/ultimate-guide.md#35-team-configuration-at-scale" team_ai_instructions_workflow: "guide/workflows/team-ai-instructions.md" @@ -1415,7 +1426,7 @@ ecosystem: - "Cross-links modified → Update all 4 repos" history: - date: "2026-01-20" - event: "Code Landing sync v3.29.2, 66 templates, cross-links" + event: "Code Landing sync v3.30.0, 66 templates, cross-links" commit: "5b5ce62" - date: "2026-01-20" event: "Cowork Landing fix (paths, README, UI badges)" @@ -1427,7 +1438,7 @@ ecosystem: onboarding_matrix_meta: version: "2.0.0" last_updated: "2026-02-05" - aligned_with_guide: "3.29.2" + aligned_with_guide: "3.30.0" changelog: - version: "2.0.0" date: "2026-02-05" @@ -1455,7 +1466,7 @@ onboarding_matrix: core: [rules, sandbox_native_guide, commands] time_budget: "5 min" topics_max: 3 - note: "SECURITY FIRST - sandbox before commands (v3.29.2 critical fix)" + note: "SECURITY FIRST - sandbox before commands (v3.30.0 critical fix)" beginner_15min: core: [rules, sandbox_native_guide, workflow, essential_commands] @@ -1540,7 +1551,7 @@ onboarding_matrix: - default: agent_validation_checklist time_budget: "60 min" topics_max: 6 - note: "Dual-instance pattern for quality workflows (v3.29.2)" + note: "Dual-instance pattern for quality workflows (v3.30.0)" learn_security: intermediate_30min: @@ -1551,7 +1562,7 @@ onboarding_matrix: - default: permission_modes time_budget: "30 min" topics_max: 4 - note: "NEW goal (v3.29.2) - Security-focused learning path" + note: "NEW goal (v3.30.0) - Security-focused learning path" power_60min: core: [sandbox_native_guide, mcp_secrets_management, security_hardening] @@ -1576,7 +1587,7 @@ onboarding_matrix: core: [rules, sandbox_native_guide, workflow, essential_commands, context_management, plan_mode] time_budget: "60 min" topics_max: 6 - note: "Security foundation + 
core workflow (v3.29.2 sandbox added)" + note: "Security foundation + core workflow (v3.30.0 sandbox added)" intermediate_120min: core: [plan_mode, agents, skills, config_hierarchy, git_mcp_guide, hooks, mcp_servers]