feat: smart-suggest ROI script + hook tuning + guide updates (Mar 16)

- Add examples/scripts/smart-suggest-roi.py: stdlib-only analyzer correlating suggestion log with session JSONL files to measure command acceptance rate. 4 acceptance signals, tier breakdown, daily trend, --json/--since/--no-sessions CLI. - Tune Aristote smart-suggest hook: tighten 5 over-firing triggers (/tech:commit, /tech:sonarqube, /tech:dupes, /check-conventions a11y, /tech:worktree) - Guide: identity re-injection hook, context engineering maturity grid, code review workflow, 1M context window GA update, Spring Break promo, security audit patterns - Resource evaluations: Nick Tune hooks (3/5), VicKayro security audit (2/5), Karl Mazier CLAUDE.md templates, Paul Rayner ContextFlow, Siddhant agent trace, Andrew Yng context hub, JP Caparas 1M context window Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-16 12:20:40 +01:00 · 2026-03-16 12:20:40 +01:00 · da8bc09f2d
commit da8bc09f2d
parent d9cff74d71
19 changed files with 1963 additions and 6 deletions
--- a/guide/core/architecture.md
+++ b/guide/core/architecture.md
@ -467,6 +467,23 @@ Claude Code's effectiveness degrades predictably under certain conditions:
 3. **Scope tightly**: Break large tasks into focused sub-tasks
 4. **Use sub-agents**: Delegate exploration to `Task` tool to preserve main context

+### Failure-Triggered Context Drift
+
+A separate degradation mode that does not depend on context size: repeated tool failures. When a tool call fails and Claude retries, error output accumulates in the context window. Stack traces, retry noise, and error messages dilute the original intent — subsequent attempts follow the error narrative rather than the task goal. The context window is not full, but the signal-to-noise ratio has degraded.
+
+This is distinct from compaction drift. Compaction addresses context *size*; failure re-injection addresses context *quality* within a bounded window.
+
+**Pattern**: re-inject the core task instruction on every command failure, not just after `/compact`. A `PostToolUse` hook can prefix retried prompts with a condensed version of the original task and constraints:
+
+```bash
+# PostToolUse hook: re-inject intent after failures
+if [[ "$CLAUDE_TOOL_EXIT_CODE" != "0" ]]; then
+  echo "REMINDER: The current task is: $ORIGINAL_TASK_SUMMARY. Ignore the above error if non-blocking and continue toward that goal."
+fi
+```
+
+Source: [Nick Tune — Workflow DSL: Domain-Driven Claude Code Workflows](https://nick-tune.me/blog/2026-03-01-workflow-dsl-domain-driven-claude-code-workflows/) (2026-03-01)
+
 ---

 ## 4. Sub-Agent Architecture
--- a/guide/core/context-engineering.md
+++ b/guide/core/context-engineering.md
@ -26,6 +26,7 @@ This guide covers everything from the token math behind context budgets to build
 6. [Context Lifecycle](#6-context-lifecycle)
 7. [Quality Measurement](#7-quality-measurement)
 8. [Context Reduction Techniques](#8-context-reduction-techniques)
+9. [Maturity Assessment](#9-maturity-assessment)

 ---

@ -67,6 +68,8 @@ LLMs are context-window computers. The quality of output is bounded by the quali

 Teams that invest in context engineering consistently report fewer revision cycles, better adherence to conventions, and more predictable outputs. The investment is front-loaded (building the system), but the returns compound across every interaction.

+A useful diagnostic reframe: **most AI output failures are context failures, not model failures.** When Claude generates a generic response, ignores a convention, or produces code that doesn't match your stack, the model is almost never broken — the context it received was incomplete, contradictory, or missing the right information at the right time. This reframe shifts troubleshooting from "the AI is bad at this" to "what is missing from the context?"
+
 ### The Three Layers

 Context engineering in Claude Code operates across three distinct layers:
@ -81,6 +84,21 @@ Each layer has different tradeoffs. Global config is always-on but cannot refere

 Good context engineering means putting each piece of information in the right layer — not cramming everything into one file, and not leaving critical knowledge in the session layer where it evaporates after every conversation.

+### Static vs. Dynamic Context
+
+The three-layer system above is *static context* — configuration files that are assembled before a session begins and remain stable throughout. Claude Code is primarily a static context system, which is why CLAUDE.md structure and path-scoping matter so much.
+
+As you move toward agent workflows, a second category appears: *dynamic context*, assembled at inference time as the agent operates.
+
+| Type | How assembled | Examples in Claude Code |
+|------|--------------|-------------------------|
+| **Static** | Before session, from files | CLAUDE.md, path-scoped modules, skills |
+| **Dynamic** | At runtime, from tools | Tool outputs, file reads, web fetches, MCP data |
+
+In practice, every Claude Code session uses both. The static context (your configuration) sets the behavioral envelope; the dynamic context (files Claude reads, tool results it processes) provides the specific information for each task. Context engineering covers both, but the failure modes differ: static context problems manifest as consistent convention violations; dynamic context problems manifest as Claude acting on stale or incomplete information mid-task.
+
+For teams building automated pipelines and agents, Anthropic's September 2025 engineering post ["Effective context engineering for AI agents"](https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents) covers the dynamic side in depth.
+
 ---

 ## 2. The Context Budget
@ -1174,6 +1192,50 @@ The highest-leverage sequence for a project with context debt:

 ---

+## 9. Maturity Assessment
+
+Context engineering capability develops in stages. Most teams reach Level 2 and stop — not because higher levels are complex, but because the failures at Level 2 are invisible. Output quality is acceptable, so the pressure to go further never appears. This assessment makes the gap visible.
+
+### The Six Levels
+
+| Level | Name | What exists | Failure mode |
+|-------|------|-------------|--------------|
+| **0** | No configuration | LLM with no CLAUDE.md | Generic outputs, zero project awareness |
+| **1** | Flat config | Single CLAUDE.md, no structure | Rules pile up, adherence degrades after ~100 lines |
+| **2** | Structured config | Sections, clear organization, global/project separation | Works solo, breaks at team scale |
+| **3** | Modular config | Path-scoped modules, deliberate layering | Rules maintained but no verification |
+| **4** | Measured config | Canary tests, adherence tracking, lifecycle management | System works but drifts silently over time |
+| **5** | Engineered system | Profiles, CI drift detection, ACE pipeline, quarterly audit rhythm | — |
+
+### Self-Assessment
+
+Answer each question. Stop at the first "No" — that is your current level.
+
+**Level 0 → 1**: Do you have a CLAUDE.md file in your project?
+
+**Level 1 → 2**: Does your configuration distinguish between global conventions (in `~/.claude/CLAUDE.md`) and project-specific rules (in `./CLAUDE.md`)? Are sections clearly separated?
+
+**Level 2 → 3**: Are subsystem-specific rules in path-scoped modules rather than the root CLAUDE.md? Does your root CLAUDE.md stay under 150 lines?
+
+**Level 3 → 4**: Do you have canary checks that verify key conventions? Do you track violation rates for your most important rules? Do you run a context audit after major milestones?
+
+**Level 4 → 5**: Do team members assemble their CLAUDE.md from profiles rather than editing it directly? Is there CI drift detection that alerts when configuration diverges from source modules? Do you run session retrospectives to feed new patterns back into configuration?
+
+### What to Do at Each Level
+
+| Your level | Next action |
+|------------|-------------|
+| 0 | Create a minimal CLAUDE.md with 5-10 rules. See §3 for what belongs there. |
+| 1 | Split global and project config. Move cross-project preferences to `~/.claude/CLAUDE.md`. |
+| 2 | Identify the 2-3 highest-traffic subsystems. Create path-scoped modules for them. |
+| 3 | Write 3-5 canary prompts for your most violated rules. Automate them. |
+| 4 | Introduce profiles for team members. Add CI drift detection. Start session retrospectives. |
+| 5 | Maintain quarterly audits. The system is built — the work is ongoing calibration. |
+
+Most teams move from Level 0 to Level 2 in a single afternoon. Moving from Level 3 to Level 4 requires a measurement habit, not more configuration. The bottleneck at the higher levels is not knowledge — it is the discipline to treat configuration as a living system rather than a one-time setup.
+
+---
+
 ## Cross-References

 - Architecture and project structure patterns: `guide/core/architecture.md`
--- a/guide/ultimate-guide.md
+++ b/guide/ultimate-guide.md
@ -2018,7 +2018,7 @@ The default model depends on your subscription: **Max/Team Premium** subscribers
 | **Sonnet 4.6** | $3.00 | $15.00 | 200K tokens | Default model (Feb 2026) |
 | Sonnet 4.5 | $3.00 | $15.00 | 200K tokens | Legacy (same price) |
 | Opus 4.6 (standard) | $5.00 | $25.00 | 200K tokens | Released Feb 2026 |
-| Opus 4.6 (1M context beta) | $10.00 | $37.50 | 1M tokens | Requests >200K context |
+| Opus 4.6 (1M context) | $5.00 | $25.00 | 1M tokens | GA for Max/Team/Enterprise; API requires tier 4 |
 | Opus 4.6 (fast mode) | $30.00 | $150.00 | 200K tokens | 2.5x faster, 6x price |
 | Haiku 4.5 | $0.80 | $4.00 | 200K tokens | Budget option |

@ -2028,7 +2028,7 @@ The default model depends on your subscription: **Max/Team Premium** subscribers

 #### 200K vs 1M Context: Performance, Cost & Use Cases

-The 1M context window (beta, API + usage tier 4 required) is a significant capability jump — but community feedback consistently frames it as a **niche premium tool**, not a default.
+The 1M context window (GA for Max/Team/Enterprise plans; API tier 4 still required for direct API use) is a significant capability jump — but community feedback consistently frames it as a **niche premium tool**, not a default.

 **Retrieval accuracy at scale (MRCR v2 8-needle 1M variant)**

@ -2042,13 +2042,13 @@ The benchmark is the "8-needle 1M variant" — finding 8 specific facts in a 1M-

 **Cost per session (approximate)**

-Above 200K input tokens, **all tokens** in the request are charged at premium rates — not just the excess. Applies to both Sonnet 4.6 and Opus 4.6.
+Above 200K input tokens on direct API, **all tokens** in the request are charged at premium rates — not just the excess. Note: on Max/Team/Enterprise Claude Code plans, Opus 4.6 1M is the default at standard rates (no premium) as of v2.1.75 (March 2026).

 | Session type | ~Tokens in | ~Tokens out | Sonnet 4.6 | Opus 4.6 |
 |---|---|---|---|---|
 | Bug fix / PR review (≤200K) | 50K | 5K | ~$0.23 | ~$0.38 |
 | Module refactoring (≤200K) | 150K | 20K | ~$0.75 | ~$1.25 |
-| Full service analysis (>200K, 1M beta) | 500K | 50K | ~$4.13 | ~$6.88 |
+| Full service analysis (>200K, 1M context) | 500K | 50K | ~$4.13 | ~$6.88 |

 For comparison: Gemini 1.5 Pro offers a 2M context window at $3.50/$10.50/MTok — significantly cheaper for pure long-context RAG. Community advice: use Gemini for large-document RAG, Claude for reasoning quality and agentic workflows.

@ -2065,8 +2065,8 @@ For comparison: Gemini 1.5 Pro offers a 2M context window at $3.50/$10.50/MTok
 **Key facts**
 - Opus 4.6 max output: **128K tokens**; Sonnet 4.6 max output: **64K tokens**
 - 1M context ≈ 30,000 lines of code / 750,000 words
- 1M context is **beta** — requires `anthropic-beta: context-1m-2025-08-07` header, usage tier 4 or custom rate limits
- Above 200K input tokens: Sonnet 4.6 doubles to $6/$22.50/MTok; Opus 4.6 doubles to $10/$37.50/MTok
+- 1M context is **GA for Max/Team/Enterprise Claude Code plans** (v2.1.75, March 2026) — API direct use still requires tier 4 or custom rate limits
+- API direct use above 200K input tokens: Sonnet 4.6 doubles to $6/$22.50/MTok; Opus 4.6 doubles to $10/$37.50/MTok (standard rate applies for Claude Code Max/Team/Enterprise plans)
 - If input stays ≤200K, standard pricing applies even with the beta flag enabled
 - **Practical workaround**: check context at ~70% and open a new session rather than hitting compaction ([HN pattern](https://news.ycombinator.com/item?id=46902427))
 - Community consensus: 200K + RAG is the default; 1M Opus is reserved for cases where loading everything at once is genuinely necessary
@ -10170,6 +10170,85 @@ echo '{"session_id":"test","cwd":"'$(pwd)'"}' | .claude/hooks/session-summary.sh

 ---

+### Identity Re-injection After Compaction
+
+**The problem**: When Claude compacts context during a long session, agents configured with a specific role — team lead, developer, reviewer — can "forget" their identity. The compacted transcript no longer contains the original system instructions, so the next response drops the role entirely and starts behaving generically.
+
+This is most visible in agent teams with explicit identity prefixes. A developer agent that was consistently marking messages with `🔨 DEVELOPER:` suddenly stops after compaction and starts responding as a generic assistant.
+
+**The pattern**: Store the agent's identity in a file (`.claude/agent-identity.txt`). After each user message, a `UserPromptSubmit` hook checks whether the last assistant response includes the expected identity marker. If not — which happens after compaction — it injects the identity file contents as `additionalContext`. The next response re-establishes the role without human intervention.
+
+```bash
+# .claude/agent-identity.txt
+# Your agent's identity instructions — anything that should survive compaction
+
+You are the feature team lead. You coordinate the team — you do not write code
+and you do not review code.
+
+Prefix every message with the current state:
+  SPAWN / PLANNING / DEVELOPING / REVIEWING / COMMITTING / COMPLETE
+```
+
+```bash
+# .claude/hooks/identity-reinjection.sh
+# UserPromptSubmit hook — re-injects identity after compaction
+
+IDENTITY_FILE="${CLAUDE_IDENTITY_FILE:-.claude/agent-identity.txt}"
+IDENTITY_MARKER="${CLAUDE_IDENTITY_MARKER:-}"
+
+[[ ! -f "$IDENTITY_FILE" ]] && exit 0
+
+IDENTITY=$(cat "$IDENTITY_FILE")
+[[ -z "$IDENTITY" ]] && exit 0
+
+# Default marker: first non-empty line of the identity file
+[[ -z "$IDENTITY_MARKER" ]] && IDENTITY_MARKER=$(grep -m1 -v '^#' "$IDENTITY_FILE" | head -c 40)
+
+TRANSCRIPT_PATH=$(echo "$INPUT" | jq -r '.transcript_path // empty')
+[[ -z "$TRANSCRIPT_PATH" || ! -f "$TRANSCRIPT_PATH" ]] && exit 0
+
+LAST_ASSISTANT=$(jq -r '
+    [.[] | select(.role == "assistant")] | last | .content |
+    if type == "array" then map(select(.type == "text") | .text) | join("") else . end
+' "$TRANSCRIPT_PATH" 2>/dev/null)
+
+# Identity intact: no action
+echo "$LAST_ASSISTANT" | grep -qF "$IDENTITY_MARKER" && exit 0
+
+# Identity missing: re-inject
+jq -n --arg ctx "[Identity reminder]\n\n$IDENTITY" '{"additionalContext": $ctx}'
+exit 0
+```
+
+**Configuration** (`settings.json`):
+
+```json
+{
+  "hooks": {
+    "UserPromptSubmit": [{
+      "hooks": [{
+        "type": "command",
+        "command": ".claude/hooks/identity-reinjection.sh"
+      }]
+    }]
+  }
+}
+```
+
+**How it behaves**:
+- Zero overhead when identity marker is present (exits immediately on match)
+- Silent no-op when no `.claude/agent-identity.txt` file exists
+- Triggers automatically after compaction — no manual intervention needed
+- Works in both solo sessions (long-running agents) and agent team configurations
+
+**Customization**: Set `CLAUDE_IDENTITY_MARKER` in your environment to a short, distinctive string from the agent's standard output (e.g. `"LEAD:"`, `"DEVELOPER:"`, `"🔨"`). If not set, the hook uses the first 40 characters of the identity file as the marker.
+
+> **Full implementation**: [`examples/hooks/bash/identity-reinjection.sh`](../../examples/hooks/bash/identity-reinjection.sh)
+>
+> **Origin**: Pattern sourced from Nick Tune's [hook-driven dev workflows](https://nick-tune.me/blog/2026-02-28-hook-driven-dev-workflows-with-claude-code/) (2026-02-28). The broader article covers state machine workflows with agent teams — see [Agent Teams Workflow](../workflows/agent-teams.md) for context.
+
+---
+
 # 8. MCP Servers

 _Quick jump:_ [What is MCP](#81-what-is-mcp) · [Available Servers](#82-available-servers) · [Configuration](#83-configuration) · [Server Selection Guide](#84-server-selection-guide) · [Plugin System](#85-plugin-system) · [MCP Security](#86-mcp-security)
@ -13391,6 +13470,8 @@ For critical work, combine everything:

 > **📖 Complete Workflow Guide**: See [GitHub Actions Workflows](./workflows/github-actions.md) for 5 production-ready patterns using the official `anthropics/claude-code-action` (PR review, triage, security, scheduled maintenance).

+> **Code Review (Teams/Enterprise)**: For automated PR review without manual prompting, see [Code Review](./workflows/code-review.md) — Anthropic's multi-agent review feature that posts inline GitHub comments on every PR.
+
 ### Headless Mode

 Run Claude Code without interactive prompts:
--- a/guide/workflows/code-review.md
+++ b/guide/workflows/code-review.md
@ -0,0 +1,158 @@
+---
+title: "Code Review (Claude Code feature)"
+description: "Automated multi-agent PR review for Teams and Enterprise — setup, triggers, REVIEW.md configuration, and cost management"
+tags: [feature, teams, enterprise, github, code-review]
+---
+
+# Code Review
+
+> **Availability**: Research preview — Teams and Enterprise plans only. Not available on Free/Pro accounts, nor for organizations with Zero Data Retention (ZDR) enabled.
+> **Launched**: March 9, 2026
+
+Claude Code's Code Review feature runs a multi-agent review on every GitHub pull request. A fleet of specialized agents examines the diff in the context of the full codebase, each looking for a different class of issue (logic errors, security vulnerabilities, edge cases, regressions), followed by a verification pass that filters false positives.
+
+Findings are posted as inline PR comments on the specific lines where issues were found, tagged by severity. Reviews don't approve or block PRs, so existing review workflows stay intact.
+
+---
+
+## How it works
+
+1. Trigger fires (PR opened, push, or manual `@claude review` comment)
+2. Multiple agents analyze the diff and surrounding code in parallel on Anthropic infrastructure
+3. Each agent targets a different class of issue
+4. A verification step checks candidates against actual code behavior to remove false positives
+5. Results are deduplicated, ranked by severity, and posted as inline PR comments
+6. If no issues are found, Claude posts a short confirmation comment
+
+Reviews complete in **20 minutes on average**, scaling with PR size and complexity.
+
+### Severity levels
+
+| Marker | Severity | Meaning |
+|:-------|:---------|:--------|
+| 🔴 | Normal | A bug that should be fixed before merging |
+| 🟡 | Nit | Minor issue, worth fixing but not blocking |
+| 🟣 | Pre-existing | A bug in the codebase not introduced by this PR |
+
+Each finding includes a collapsible extended reasoning section explaining why Claude flagged the issue and how it verified the problem.
+
+---
+
+## Setup
+
+An admin enables Code Review once for the organization and selects which repositories to include.
+
+### 1. Open admin settings
+
+Go to [claude.ai/admin-settings/claude-code](https://claude.ai/admin-settings/claude-code) and find the **Code Review** section. Requires admin access to both your Claude organization and permission to install GitHub Apps in your GitHub organization.
+
+### 2. Click Setup
+
+This begins the GitHub App installation flow.
+
+### 3. Install the Claude GitHub App
+
+Follow the prompts to install the Claude GitHub App on your GitHub organization. The app requests:
+
+- **Contents**: read and write
+- **Issues**: read and write
+- **Pull requests**: read and write
+
+Code Review uses read access to contents and write access to pull requests. This permission set also supports [GitHub Actions](/en/github-actions) if you enable that later.
+
+### 4. Select repositories
+
+Choose which repositories to enable. If a repo is missing, ensure you granted the GitHub App access during installation. You can add more repositories later from the admin settings table.
+
+### 5. Set review triggers per repo
+
+For each repository, choose when reviews run:
+
+| Trigger | When it runs | Cost profile |
+|---------|-------------|--------------|
+| **Once after PR creation** | Once when PR opens or is marked ready | Lowest |
+| **After every push** | On every push to the PR branch | Highest (multiplied by push count) |
+| **Manual** | Only when someone comments `@claude review` | Controlled |
+
+After the `@claude review` comment, subsequent pushes to that PR trigger reviews automatically regardless of the configured trigger.
+
+**Manual mode** is useful for high-traffic repos where you want to opt specific PRs into review, or only start reviewing when the PR is ready for review.
+
+---
+
+## Manual trigger
+
+Comment `@claude review` on any open, non-draft PR to start a review immediately. Requirements:
+
+- Top-level PR comment (not an inline diff comment)
+- `@claude review` at the start of the comment
+- Owner, member, or collaborator access on the repository
+
+If a review is already running, the request queues until the in-progress review completes.
+
+---
+
+## Configure reviews
+
+Two files control what Claude flags. Both are additive on top of the default correctness checks.
+
+### CLAUDE.md
+
+Claude reads all `CLAUDE.md` files in your directory hierarchy. Newly-introduced violations are flagged as nit-level findings. Bidirectional: if a PR makes a `CLAUDE.md` statement outdated, Claude flags that the docs need updating too.
+
+Use `CLAUDE.md` for guidance that also applies to interactive Claude Code sessions.
+
+### REVIEW.md
+
+Add `REVIEW.md` to your **repository root** for review-only rules. Auto-discovered, no configuration needed.
+
+```markdown
+# Code Review Guidelines
+
+## Always check
+- New API endpoints have corresponding integration tests
+- Database migrations are backward-compatible
+- Error messages don't leak internal details to users
+
+## Style
+- Prefer early returns over nested conditionals
+- Use structured logging, not f-string interpolation in log calls
+
+## Skip
+- Generated files under `src/gen/`
+- Formatting-only changes in `*.lock` files
+- Migration files in `db/migrations/`
+```
+
+Use `REVIEW.md` for rules that would clutter `CLAUDE.md` for general sessions (linter conventions, skip lists, team-specific patterns).
+
+---
+
+## Pricing
+
+Code Review is billed on token usage, **separately from your plan's included usage** (via [extra usage](https://support.claude.com/en/articles/12429409-extra-usage-for-paid-claude-plans)).
+
+- Average cost: **$15–25 per review**, scaling with PR size, codebase complexity, and the number of issues requiring verification
+- "After every push" multiplies cost by push count
+- To set a monthly spend cap: [claude.ai/admin-settings/usage](https://claude.ai/admin-settings/usage) → configure limit for the "Claude Code Review" service
+- Monitor spend: [claude.ai/analytics/code-review](https://claude.ai/analytics/code-review) (daily PR count, weekly spend, per-repo breakdown)
+
+---
+
+## Cross-reference
+
+For manual code review workflows (CLI, no Teams/Enterprise required):
+- [Multi-agent code review workflow](../ultimate-guide.md#multi-agent-code-review) — DIY agent teams via CLI
+- [GitHub Actions integration](./github-actions.md) — custom CI/CD automation (self-hosted alternative to this managed service)
+- [GitLab CI/CD](/en/gitlab-ci-cd) — self-hosted Claude integration for GitLab pipelines
+- Code Review plugin — on-demand local reviews before pushing (available in the plugin marketplace)
+
+---
+
+## Known limitations (research preview)
+
+- Teams and Enterprise only — no Free/Pro access
+- Not available for organizations with Zero Data Retention (ZDR) enabled
+- GitHub only for the managed service (GitLab supported via CI/CD integration, not this feature)
+- Full-repo indexing latency on first activation for large repos
+- Anthropic internal stats: ~7.5 issues found per PR >1000 lines, <1% false positive rate — self-reported, not independently verified