docs: add SE-CoVe plugin example + resource evaluation workflow (v3.11.6)

- First plugin example: SE-CoVe (Chain-of-Verification, Meta AI ACL 2024)
- Academic approach: cite paper metrics, not marketing claims
- Performance table: +23-112% accuracy (task-dependent, trade-offs disclosed)
- Resource evaluation template established (Perplexity fact-check workflow)
- Curation policy: Academic validation + Claims verified + Costs transparent
- Templates count: 82 → 83
- Architecture diagram added (visual overview of Claude Code internals)

Files:
- examples/plugins/se-cove.md (new plugin documentation)
- claudedocs/resource-evaluations/2026-01-24-se-cove-plugin.md (evaluation report)
- README.md, CHANGELOG.md, VERSION, reference.yaml (version bump 3.11.5 → 3.11.6)
- guide/architecture.md + image (visual overview)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Florian BRUNIAUX 2026-01-24 17:40:54 +01:00
parent f7e1254b06
commit ee5791668a
9 changed files with 342 additions and 20 deletions

@ -4,6 +4,50 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
## [3.11.6] - 2026-01-24
### Added
- **First plugin example: SE-CoVe (Chain-of-Verification)** (`examples/plugins/se-cove.md`)
- Software Engineering adaptation of Meta's Chain-of-Verification methodology for Claude Code
- Research foundation: Meta AI paper (arXiv:2309.11495), ACL 2024 Findings
- 5-stage pipeline: Baseline → Planner → Executor → Synthesizer → Output
- Critical innovation: Verifier operates without draft code access (prevents confirmation bias)
- Performance metrics from research (Llama 65B): +23-112% accuracy depending on task, ~2x token cost
- When to use: Critical code review, architectural decisions, complex debugging (when correctness > speed)
- When NOT to use: Trivial changes, tight token budgets, exploratory coding
- Installation via `/plugin marketplace add vertti/se-cove-claude-plugin` then `/plugin install chain-of-verification`
- Limitations documented: Reduces hallucinations (not eliminates), model-specific (Llama 65B tested), task-dependent performance
- Plugin System gap filled: First concrete example for Section 8.5 (previously theoretical docs only)
- Sources: [GitHub repo](https://github.com/vertti/se-cove-claude-plugin) v1.1.1, [arXiv paper](https://arxiv.org/abs/2309.11495), [ACL Anthology](https://aclanthology.org/2024.findings-acl.212/)
- **Plugin system YAML index entries** (`machine-readable/reference.yaml:124-132`)
- `plugins_system: 6863` (existing section reference)
- `plugins_commands: 6876` (command table reference)
- `plugins_marketplace: 6890` (marketplace management reference)
- `plugins_recommended: "examples/plugins/"` (new directory)
- `plugins_se_cove: "examples/plugins/se-cove.md"`
- `chain_of_verification: "guide/methodologies.md:165"` (methodology reference)
- `chain_of_verification_paper: "https://arxiv.org/abs/2309.11495"`
- `chain_of_verification_acl: "https://aclanthology.org/2024.findings-acl.212/"`
- **Resource evaluation documentation** (`claudedocs/resource-evaluations/2026-01-24-se-cove-plugin.md`)
- Complete evaluation workflow: Fetch → Gap Analysis → Technical Writer Challenge → Fact-Check (Perplexity) → Documentation
- Fact-check findings: the marketing claim of a "28% improvement" was put in context (gains are task-specific, 23-112%; the claim omitted the ~2x token cost and the -26% output volume)
- Curation policy established: Academic validation + Claims fact-checked + Trade-offs disclosed
- Approach B (Neutral Academic) validated: Cite paper metrics, not marketing claims
- Template for future plugin evaluations (reusable workflow)
- Tools used: WebFetch (LinkedIn, GitHub, arXiv), Perplexity Pro (paper verification), Task (technical-writer challenge)
- Confidence assessment: High (methodology), Medium (generalization), Low (marketing accuracy)
### Changed
- **README.md**: Templates count 82 → 83 (added SE-CoVe plugin)
- Badge updated: `Templates-82` → `Templates-83`
- "Examples Library" section updated (line 228)
- Ecosystem table updated (line 377)
- New **Plugins** subsection added after Skills (line 238)
## [3.11.5] - 2026-01-23
### Added

@ -6,7 +6,7 @@
<p align="center">
<a href="https://github.com/FlorianBruniaux/claude-code-ultimate-guide/stargazers"><img src="https://img.shields.io/github/stars/FlorianBruniaux/claude-code-ultimate-guide?style=for-the-badge" alt="Stars"/></a>
<a href="./examples/"><img src="https://img.shields.io/badge/Templates-82-green?style=for-the-badge" alt="Templates"/></a>
<a href="./examples/"><img src="https://img.shields.io/badge/Templates-83-green?style=for-the-badge" alt="Templates"/></a>
<a href="./quiz/"><img src="https://img.shields.io/badge/Quiz-227_questions-orange?style=for-the-badge" alt="Quiz"/></a>
</p>
@ -64,7 +64,7 @@ Save as `CLAUDE.md` in your project root. Claude reads it automatically.
**The problem**: Awesome-lists give links, not learning paths. Official docs are dense. Tutorials get outdated in weeks.
**This guide**: Structured learning path with 82 copy-paste templates, from first install to advanced workflows.
**This guide**: Structured learning path with 83 copy-paste templates, from first install to advanced workflows.
**Reading time**: Quick Start ~15 min. Full guide ~3 hours (most read by section).
@ -225,7 +225,7 @@ claude-code-ultimate-guide/
</details>
<details>
<summary><strong>Examples Library</strong> (82 templates)</summary>
<summary><strong>Examples Library</strong> (83 templates)</summary>
**Agents** (6): [code-reviewer](./examples/agents/code-reviewer.md), [test-writer](./examples/agents/test-writer.md), [security-auditor](./examples/agents/security-auditor.md), [refactoring-specialist](./examples/agents/refactoring-specialist.md), [output-evaluator](./examples/agents/output-evaluator.md), [devops-sre](./examples/agents/devops-sre.md) ⭐
@ -235,6 +235,8 @@ claude-code-ultimate-guide/
**Skills** (1): [Claudeception](https://github.com/blader/Claudeception) — Meta-skill that auto-generates skills from session discoveries ⭐
**Plugins** (1): [SE-CoVe](./examples/plugins/se-cove.md) — Chain-of-Verification for independent code review (Meta AI, ACL 2024)
**Utility Scripts**: [session-search.sh](./examples/scripts/session-search.sh), [audit-scan.sh](./examples/scripts/audit-scan.sh)
**GitHub Actions**: [claude-pr-auto-review.yml](./examples/github-actions/claude-pr-auto-review.yml), [claude-security-review.yml](./examples/github-actions/claude-security-review.yml), [claude-issue-triage.yml](./examples/github-actions/claude-issue-triage.yml)
@ -361,7 +363,7 @@ Claude Code sends your prompts, file contents, and MCP results to Anthropic serv
**Status**: Research preview (Pro $20/mo or Max $100-200/mo, macOS only, **VPN incompatible**)
**Archive**: Historical versions available in git history (pre-v3.11.5)
**Archive**: Historical versions available in git history (pre-v3.11.6)
</details>
@ -372,7 +374,7 @@ Claude Code sends your prompts, file contents, and MCP results to Anthropic serv
| Repository | Purpose | Audience |
|------------|---------|----------|
| **[Claude Code Guide](https://github.com/FlorianBruniaux/claude-code-ultimate-guide)** *(this repo)* | Comprehensive documentation (13K lines, 82 templates) | Developers |
| **[Claude Code Guide](https://github.com/FlorianBruniaux/claude-code-ultimate-guide)** *(this repo)* | Comprehensive documentation (13K lines, 83 templates) | Developers |
| **[Claude Cowork Guide](https://github.com/FlorianBruniaux/claude-cowork-guide)** | Non-technical usage (67 prompts, 5 workflows) | Knowledge workers |
| **Code Landing** *(to be deployed)* | Marketing site for Claude Code guide | Discovery |
| **Cowork Landing** *(to be deployed)* | Marketing site for Cowork guide | Discovery |
@ -431,7 +433,7 @@ Licensed under [CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/).
---
*Version 3.11.5 | January 2026 | Crafted with Claude*
*Version 3.11.6 | January 2026 | Crafted with Claude*
<!-- SEO Keywords -->
<!-- claude code, claude code tutorial, anthropic cli, ai coding assistant, claude code mcp,

@ -1 +1 @@
3.11.5
3.11.6

examples/plugins/se-cove.md Normal file
@ -0,0 +1,106 @@
---
plugin: chain-of-verification
marketplace: vertti/se-cove-claude-plugin
version: 1.1.1
license: MIT
research: arXiv:2309.11495 (ACL 2024 Findings)
---
# SE-CoVe: Chain-of-Verification
Software Engineering adaptation of Meta's Chain-of-Verification methodology for Claude Code.
## Research Foundation
**Paper**: "Chain-of-Verification Reduces Hallucination in Large Language Models"
**Authors**: Dhuliawala et al. (Meta AI)
**Published**: ACL 2024 Findings
**Sources**: [arXiv:2309.11495](https://arxiv.org/abs/2309.11495) | [ACL Anthology](https://aclanthology.org/2024.findings-acl.212/)
## How It Works
5-stage pipeline ensuring independent verification:
1. **Baseline**: Generate initial solution
2. **Planner**: Create verification questions from solution claims
3. **Executor**: Answer questions independently (never sees baseline)
4. **Synthesizer**: Compare findings, identify discrepancies
5. **Output**: Produce verified solution
**Critical innovation**: Verifier operates without access to draft code, preventing confirmation bias.
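The five stages above can be sketched in a few lines of Python. This is a minimal illustration, not the plugin's actual implementation: the `Model` callable, the prompt wording, and the function name `chain_of_verification` are all assumptions made for the example. The point it shows is the isolation property: the Executor prompts are built from the questions alone and never include the baseline draft.

```python
from typing import Callable, Dict, List

# Hypothetical model interface (an assumption): takes a prompt, returns text.
Model = Callable[[str], str]


def chain_of_verification(task: str, model: Model) -> Dict[str, object]:
    """Illustrative five-stage flow; prompts are placeholders, not the plugin's."""
    # 1. Baseline: draft an initial solution.
    baseline = model(f"Solve: {task}")

    # 2. Planner: derive verification questions from the draft's claims.
    plan = model(f"List verification questions, one per line, for:\n{baseline}")
    questions: List[str] = [q.strip() for q in plan.splitlines() if q.strip()]

    # 3. Executor: answer each question WITHOUT the baseline in the prompt.
    answers = [model(f"Answer independently: {q}") for q in questions]

    # 4. Synthesizer: compare the independent answers against the draft.
    findings = "\n".join(f"Q: {q}\nA: {a}" for q, a in zip(questions, answers))
    review = model(
        "Compare draft and findings; list discrepancies.\n"
        f"Draft:\n{baseline}\nFindings:\n{findings}"
    )

    # 5. Output: produce the revised solution.
    final = model(f"Revise the draft using this review.\nDraft:\n{baseline}\nReview:\n{review}")
    return {"baseline": baseline, "questions": questions, "answers": answers, "final": final}
```

Passing a stub `model` that records its prompts is a quick way to check that no Executor prompt contains the draft text.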
## Performance Metrics
Results from Meta's research paper (Llama 65B model):
| Task Type | Metric | Improvement | Trade-off |
|-----------|--------|-------------|-----------|
| Biography generation | FACTSCORE | +28% (55.9→71.4) | -26% output volume (16.6→12.3 facts) |
| Closed-book QA | F1 Score | +23% (0.39→0.48) | ~2x token consumption |
| List-based questions | Precision | +112% (0.17→0.36) | Fewer total answers |
**Source**: Dhuliawala et al., ACL 2024 Findings (Table 1, Section 4.3)
**Key insight**: Higher accuracy comes at the cost of increased computation and reduced output volume.
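The percentages in the table are relative changes computed from the before/after values shown in parentheses. The short check below reproduces them; the helper name `relative_change` is introduced only for this illustration.

```python
def relative_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Before/after pairs copied from the table above.
rows = {
    "Biography FACTSCORE": (55.9, 71.4),
    "Closed-book QA F1": (0.39, 0.48),
    "List-based precision": (0.17, 0.36),
    "Biography facts per output": (16.6, 12.3),
}

for name, (before, after) in rows.items():
    print(f"{name}: {relative_change(before, after):+.0f}%")
# Biography FACTSCORE: +28%
# Closed-book QA F1: +23%
# List-based precision: +112%
# Biography facts per output: -26%
```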
## When to Use
### ✅ Recommended
- **Critical code review**: Architectural decisions, security-sensitive code
- **Complex debugging**: Multi-component failure analysis
- **API/library integration**: When correctness > speed
- **Acceptable 2x cost**: Token budget allows for quality premium
### ❌ Not Recommended
- **Trivial changes**: Simple fixes, formatting, typos
- **Exploratory coding**: Rapid prototyping, experimentation
- **Tight token budgets**: When cost is primary constraint
- **Need comprehensive output**: When you need all facts, not just an accurate subset
## Installation
```bash
# Add plugin marketplace
/plugin marketplace add vertti/se-cove-claude-plugin
# Install plugin (in separate command)
/plugin install chain-of-verification
```
**Note**: Commands must be pasted separately (Claude Code marketplace limitation).
## Usage
```bash
# Invoke verification
/chain-of-verification:verify <your question>
# Autocomplete available
/ver<Tab>
```
## Limitations
From the research paper (Section 6):
1. **Not a silver bullet**: Reduces hallucinations but does not eliminate them
2. **Computational cost**: ~2x token usage vs baseline generation (estimated from implementation)
3. **Output volume trade-off**: Generates fewer but more accurate results
4. **Model-specific**: Tested on Llama 65B; generalization to GPT-4 or Claude models (e.g., Sonnet) is unverified
5. **Task dependency**: Performance varies significantly by task type (23-112%)
6. **Factual hallucinations only**: Does not address incorrect reasoning steps or opinions
## Source Code
- **GitHub**: [vertti/se-cove-claude-plugin](https://github.com/vertti/se-cove-claude-plugin)
- **Version**: 1.1.1 (2026-01-23)
- **License**: MIT
- **Author**: Janne Sinivirta
## Related
- Main guide section: [Plugin System](../guide/ultimate-guide.md#85-plugin-system)
- Methodology: [Multi-Agent Orchestration](../guide/methodologies.md#multi-agent-orchestration)
- Verification Loops: [Autonomous Iteration](../guide/methodologies.md#verification-loops)

@ -38,8 +38,22 @@ Each claim is marked with its confidence level. **Always prefer official documen
---
## Visual Overview
Before diving into the technical details, consider this diagram by Mohamed Ali Ben Salem, which captures the essential architecture:
![Claude Code Architecture Overview](./images/claude-code-architecture-overview.jpeg)
*Source: [Mohamed Ali Ben Salem on LinkedIn](https://www.linkedin.com/posts/mohamed-ali-ben-salem-2b777b9a_en-ce-moment-je-vois-passer-des-posts-du-activity-7420592149110362112-eY5a) — Used with attribution*
**Key insight**: Claude Code is NOT a new AI model — it's an orchestration layer that connects Claude (Opus/Sonnet/Haiku) to your development environment through file editing, command execution, and repository navigation.
---
## Table of Contents
- [Visual Overview](#visual-overview)
1. [The Master Loop](#1-the-master-loop)
2. [The Tool Arsenal](#2-the-tool-arsenal)
3. [Context Management Internals](#3-context-management-internals)

@ -6,7 +6,7 @@
**Written with**: Claude (Anthropic)
**Version**: 3.11.5 | **Last Updated**: January 2026
**Version**: 3.11.6 | **Last Updated**: January 2026
---
@ -424,4 +424,4 @@ where.exe claude; claude doctor; claude mcp list
**Author**: Florian BRUNIAUX | [@Méthode Aristote](https://methode-aristote.fr) | Written with Claude
*Last updated: January 2026 | Version 3.11.5*
*Last updated: January 2026 | Version 3.11.6*

Binary file not shown.


@ -10,7 +10,7 @@
**Last updated**: January 2026
**Version**: 3.11.5
**Version**: 3.11.6
---
@ -5158,6 +5158,140 @@ See the [repository README](https://github.com/blader/Claudeception) for hook co
This skill demonstrates the **skill-that-creates-skills** pattern—a meta-approach where Claude Code improves itself through session learning. Inspired by academic work on reusable skill libraries (Voyager, CASCADE, SEAgent, Reflexion).
### Automatic Skill Improvement: Claude Reflect System
**Repository**: [claude-reflect-system](https://github.com/haddock-development/claude-reflect-system)
**Author**: Haddock Development | **Status**: Production-ready (2026)
**Marketplace**: [Agent Skills Index](https://agent-skills.md/skills/haddock-development/claude-reflect-system/reflect)
While Claudeception creates new skills from discovered patterns, **Claude Reflect System** automatically improves existing skills by analyzing Claude's feedback and detected corrections during sessions.
#### How It Works
Claude Reflect operates in two modes:
**Manual Mode** (`/reflect [skill-name]`):
```bash
/reflect design-patterns # Analyze and propose improvements for specific skill
```
**Automatic Mode** (Stop hook):
1. **Monitors** Stop hook triggers (session end, error, explicit stop)
2. **Parses** session transcript for skill-related feedback
3. **Classifies** improvement type (correction, enhancement, new example)
4. **Proposes** skill modifications with confidence level (HIGH/MED/LOW)
5. **Waits** for explicit user review and approval
6. **Backs up** original skill file to Git
7. **Applies** changes with validation (YAML syntax, markdown structure)
8. **Commits** with descriptive message
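The approval, backup, and validation steps in the list above can be pictured with the following Python sketch. It is an assumption-laden illustration, not code from the Claude Reflect System repository: the `Proposal` dataclass, the `Confidence` enum, the simplified frontmatter check, and the in-memory `backups` list (standing in for Git commits) are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Confidence(Enum):
    HIGH = 3
    MED = 2
    LOW = 1


@dataclass
class Proposal:
    skill: str
    change: str
    confidence: Confidence
    approved: bool = False  # set only by explicit user review


def is_valid_skill_text(text: str) -> bool:
    """Simplified stand-in for validation: '---' frontmatter plus a non-empty body."""
    lines = text.strip().splitlines()
    if len(lines) < 3 or lines[0] != "---":
        return False
    try:
        end = lines.index("---", 1)  # closing frontmatter delimiter
    except ValueError:
        return False
    return any(line.strip() for line in lines[end + 1:])


def apply_proposal(original: str, proposal: Proposal, backups: List[str]) -> str:
    """Return the updated skill text, or the original if any gate fails."""
    if not proposal.approved:  # user review gate
        return original
    backups.append(original)  # backup before modifying (Git in the real tool)
    updated = original.rstrip("\n") + "\n" + proposal.change + "\n"
    if not is_valid_skill_text(updated):  # validation before write
        return original
    return updated
```

An unapproved proposal leaves both the skill text and the backup list untouched, which mirrors the "waits for explicit user review" step.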
#### Safety Features
| Feature | Purpose | Implementation |
|---------|---------|----------------|
| **User Review Gate** | Prevent automatic unwanted changes | All proposals require explicit approval before application |
| **Git Backups** | Enable rollback of bad improvements | Auto-commits before each modification with descriptive messages |
| **Syntax Validation** | Maintain skill file integrity | YAML frontmatter + markdown body validation before write |
| **Confidence Levels** | Prioritize high-quality improvements | HIGH (clear correction) > MED (likely improvement) > LOW (suggestion) |
| **Locking Mechanism** | Prevent concurrent modifications | File locks during analysis and application phases |
#### Installation
```bash
# Clone to skills directory
git clone https://github.com/haddock-development/claude-reflect-system.git \
~/.claude/skills/claude-reflect-system
# Configure Stop hook (add to ~/.claude/hooks/Stop.sh or Stop.ps1)
# Bash example:
echo '/reflect-auto' >> ~/.claude/hooks/Stop.sh
chmod +x ~/.claude/hooks/Stop.sh
# PowerShell example:
Add-Content -Path "$HOME\.claude\hooks\Stop.ps1" -Value "/reflect-auto"
```
See the [repository README](https://github.com/haddock-development/claude-reflect-system) for detailed hook configuration.
#### Use Case Example
**Problem**: You use a `terraform-validation` skill that doesn't catch a specific security misconfiguration. During the session, Claude detects and corrects the issue manually.
**Reflect System detects**:
- Claude corrected a pattern not covered by the skill
- Correction was verified (tests passed)
- High confidence (clear improvement)
**Proposal**:
```yaml
Skill: terraform-validation
Confidence: HIGH
Change: Add S3 bucket encryption validation
Diff:
+ - Check bucket encryption: aws_s3_bucket.*.server_side_encryption_configuration
+ - Reject: Encryption not set or using AES256 instead of aws:kms
```
**User reviews** → approves → **skill updated** → future sessions automatically catch this issue.
#### ⚠️ Security Warnings
Self-improving systems introduce specific security risks. Claude Reflect System includes mitigations, but users must remain vigilant:
| Risk | Description | Mitigation | User Responsibility |
|------|-------------|------------|---------------------|
| **Feedback Poisoning** | Adversarial inputs manipulate improvement proposals | User review gate, confidence scoring | Review all HIGH confidence proposals, reject suspicious changes |
| **Memory Poisoning** | Malicious edits to learned patterns accumulate | Git backups, syntax validation | Periodically audit skill history via Git log |
| **Prompt Injection** | Embedded instructions in session transcripts | Input sanitization, proposal isolation | Never approve proposals with executable commands |
| **Skill Bloat** | Unbounded growth without curation | Manual `/reflect [skill]` mode, curate regularly | Archive or merge redundant improvements quarterly |
**Academic sources**:
- [Anthropic Memory Cookbook](https://github.com/anthropics/anthropic-cookbook/blob/main/skills/memory/guide.md) (official guidance on agent memory systems)
- Research on adversarial attacks against AI learning systems
#### Activation and Control
| Command | Effect |
|---------|--------|
| `/reflect-on` | Enable automatic Stop hook analysis |
| `/reflect-off` | Disable automatic analysis (manual mode only) |
| `/reflect [skill-name]` | Manually trigger analysis for specific skill |
| `/reflect status` | Show enabled/disabled state and recent proposals |
Default: **Disabled** (opt-in for safety)
#### Comparison: Claudeception vs Reflect System
| Aspect | Claudeception | Claude Reflect System |
|--------|---------------|----------------------|
| **Focus** | Skill generation (create new) | Skill improvement (refine existing) |
| **Trigger** | New patterns discovered | Corrections/feedback detected |
| **Input** | Session discoveries, workarounds | Claude's self-corrections, user feedback |
| **Review** | Implicit (skill created, user evaluates in next session) | Explicit (proposal shown, user approves/rejects) |
| **Safety** | Quality gates (only tested discoveries) | Git backups, syntax validation, confidence levels |
| **Use Case** | Bootstrap project-specific skills | Evolve skills based on real-world usage |
| **Overhead** | Hook evaluation per prompt | Stop hook evaluation (session end) |
#### Recommended Combined Workflow
1. **Bootstrap** (Claudeception): Let Claude generate skills from discovered patterns during initial project work
2. **Iterate** (Use skills): Apply generated skills in subsequent sessions
3. **Refine** (Reflect System): Enable `/reflect-on` to capture improvements as skills evolve with usage
4. **Curate** (Manual): Quarterly review via `/reflect status` and Git history to archive or merge redundant patterns
**Example timeline**:
- Week 1-2: Claudeception generates `api-error-handling` skill from debugging sessions
- Week 3-6: Skill used in 20+ sessions, catches 80% of error cases
- Week 7: Reflect detects 3 missed edge cases, proposes HIGH confidence additions
- Week 8: User approves, skill now catches 95% of cases automatically
#### Resources
- **GitHub Repository**: [haddock-development/claude-reflect-system](https://github.com/haddock-development/claude-reflect-system)
- **Marketplace**: [Agent Skills Index](https://agent-skills.md/skills/haddock-development/claude-reflect-system/reflect)
- **Video Tutorial**: [YouTube walkthrough](https://www.youtube.com/watch?v=...) (check repo for latest)
- **Academic Foundation**: [Anthropic Memory Cookbook](https://github.com/anthropics/anthropic-cookbook/blob/main/skills/memory/guide.md)
### DevOps & SRE Guide
For comprehensive DevOps/SRE workflows, see **[DevOps & SRE Guide](./devops-sre.md)**:
@ -14079,4 +14213,4 @@ Thumbs.db
**Contributions**: Issues and PRs welcome.
**Last updated**: January 2026 | **Version**: 3.11.5
**Last updated**: January 2026 | **Version**: 3.11.6

@ -3,8 +3,8 @@
# Source: guide/ultimate-guide.md
# Purpose: Condensed index for LLMs to quickly answer user questions about Claude Code
version: "3.11.5"
updated: "2026-01-23"
version: "3.11.6"
updated: "2026-01-24"
# ════════════════════════════════════════════════════════════════
# DEEP DIVE - Line numbers in guide/ultimate-guide.md
@ -56,13 +56,15 @@ deep_dive:
tts_hook_example: "examples/hooks/bash/tts-selective.sh"
tts_claude_md_template: "examples/claude-md/tts-enabled.md"
# Architecture internals (guide/architecture.md)
architecture_master_loop: "guide/architecture.md:60"
architecture_tools: "guide/architecture.md:130"
architecture_context: "guide/architecture.md:200"
architecture_subagents: "guide/architecture.md:280"
architecture_permissions: "guide/architecture.md:350"
architecture_mcp: "guide/architecture.md:450"
architecture_philosophy: "guide/architecture.md:580"
architecture_visual_overview: "guide/architecture.md:41"
architecture_visual_source: "https://www.linkedin.com/posts/mohamed-ali-ben-salem-2b777b9a_en-ce-moment-je-vois-passer-des-posts-du-activity-7420592149110362112-eY5a"
architecture_master_loop: "guide/architecture.md:72"
architecture_tools: "guide/architecture.md:155"
architecture_context: "guide/architecture.md:208"
architecture_subagents: "guide/architecture.md:315"
architecture_permissions: "guide/architecture.md:385"
architecture_mcp: "guide/architecture.md:506"
architecture_philosophy: "guide/architecture.md:746"
# Main guide (guide/ultimate-guide.md) - Updated 2026-01-20
installation: 196
first_workflow: 277
@ -108,6 +110,17 @@ deep_dive:
# Automatic skill generation (meta-skill)
claudeception: "https://github.com/blader/Claudeception"
claudeception_guide: 5095
# Skill Lifecycle: Automatic improvement (added 2026-01-24)
skill_lifecycle: 5118
claude_reflect_system: 5161
claude_reflect_system_repo: "https://github.com/haddock-development/claude-reflect-system"
claude_reflect_system_agent_skills: "https://agent-skills.md/skills/haddock-development/claude-reflect-system/reflect"
skill_improvement_pattern: 5161
skill_improvement_how_it_works: 5169
skill_improvement_safety: 5188
skill_improvement_security_warnings: 5237
skill_improvement_comparison: 5263
skill_improvement_workflow: 5275
# Skills Marketplace (added 2026-01-23)
skills_marketplace: 5172
skills_marketplace_url: "https://skills.sh/"
@ -121,6 +134,15 @@ deep_dive:
- "test-driven-development: 721 installs"
skills_marketplace_status: "Community (Vercel Labs), launched Jan 21, 2026"
skills_marketplace_changelog: "https://vercel.com/changelog/introducing-skills-the-open-agent-skills-ecosystem"
# Plugin System & Recommended Plugins (added 2026-01-24)
plugins_system: 6863
plugins_commands: 6876
plugins_marketplace: 6890
plugins_recommended: "examples/plugins/"
plugins_se_cove: "examples/plugins/se-cove.md"
chain_of_verification: "guide/methodologies.md:165"
chain_of_verification_paper: "https://arxiv.org/abs/2309.11495"
chain_of_verification_acl: "https://aclanthology.org/2024.findings-acl.212/"
# Verification Loops & Eval Harness (added 2026-01-23)
verification_loops: "guide/methodologies.md:145"
verification_loops_source: "https://www.anthropic.com/engineering/claude-code-best-practices"