marketing-shibata50/claude-code-ultimate-guide

Florian BRUNIAUX ee5791668a docs: add SE-CoVe plugin example + resource evaluation workflow (v3.11.6)

- First plugin example: SE-CoVe (Chain-of-Verification, Meta AI ACL 2024)
- Academic approach: cite paper metrics, not marketing claims
- Performance table: +23-112% accuracy (task-dependent, trade-offs disclosed)
- Resource evaluation template established (Perplexity fact-check workflow)
- Curation policy: Academic validation + Claims verified + Costs transparent
- Templates count: 82 → 83
- Architecture diagram added (visual overview of Claude Code internals)

Files:
- examples/plugins/se-cove.md (new plugin documentation)
- claudedocs/resource-evaluations/2026-01-24-se-cove-plugin.md (evaluation report)
- README.md, CHANGELOG.md, VERSION, reference.yaml (version bump 3.11.5 → 3.11.6)
- guide/architecture.md + image (visual overview)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

2026-01-24 17:40:54 +01:00


plugin	marketplace	version	license	research
chain-of-verification	vertti/se-cove-claude-plugin	1.1.1	MIT	arXiv:2309.11495 (ACL 2024 Findings)

SE-CoVe: Chain-of-Verification

A software engineering adaptation of Meta's Chain-of-Verification methodology for Claude Code.

Research Foundation

Paper: "Chain-of-Verification Reduces Hallucination in Large Language Models"
Authors: Dhuliawala et al. (Meta AI)
Published: ACL 2024 Findings
Sources: arXiv:2309.11495 | ACL Anthology

How It Works

A five-stage pipeline that keeps verification independent of the initial draft:

Baseline: Generate initial solution
Planner: Create verification questions from solution claims
Executor: Answer questions independently (never sees baseline)
Synthesizer: Compare findings, identify discrepancies
Output: Produce verified solution

Critical innovation: The Executor (the verifier) answers without access to the draft code, which prevents confirmation bias.
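The five stages above can be sketched in a few lines of Python. This is a minimal illustration, not the plugin's actual implementation: the `llm` callable, the prompt wording, and the function name `chain_of_verification` are all hypothetical. The point it shows is that the Executor prompt contains only the question, never the baseline.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a model call: takes a prompt, returns text.
LLM = Callable[[str], str]


def chain_of_verification(task: str, llm: LLM) -> Dict[str, object]:
    """Sketch of the five stages; prompts and names are illustrative only."""
    # 1. Baseline: generate an initial solution.
    baseline = llm(f"Solve the task:\n{task}")

    # 2. Planner: derive verification questions from the baseline's claims.
    plan = llm(f"List one verification question per line for:\n{baseline}")
    questions: List[str] = [q.strip() for q in plan.splitlines() if q.strip()]

    # 3. Executor: answer each question independently.
    #    The prompt holds only the question, never the baseline.
    answers = [llm(f"Answer independently:\n{q}") for q in questions]

    # 4. Synthesizer: compare the independent findings with the baseline.
    findings = "\n".join(f"Q: {q}\nA: {a}" for q, a in zip(questions, answers))
    discrepancies = llm(
        f"Baseline:\n{baseline}\n\nFindings:\n{findings}\n\nList discrepancies."
    )

    # 5. Output: produce the verified solution.
    final = llm(
        f"Task:\n{task}\n\nBaseline:\n{baseline}\n\n"
        f"Discrepancies:\n{discrepancies}\n\nWrite the corrected solution."
    )
    return {
        "baseline": baseline,
        "questions": questions,
        "answers": answers,
        "discrepancies": discrepancies,
        "final": final,
    }
```

Passing any function that maps a prompt string to a response string as `llm` is enough to run the sketch. A stub that records its prompts makes it easy to confirm that stage 3 never receives the baseline text.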

Performance Metrics

Results from Meta's research paper (Llama 65B model):

Task Type	Metric	Relative Improvement	Trade-off
Biography generation	FACTSCORE	+28% (55.9→71.4)	-26% output volume (16.6→12.3 facts)
Closed-book QA	F1 Score	+23% (0.39→0.48)	~2x token consumption (estimated)
List-based questions	Precision	+112% (0.17→0.36)	Fewer total answers

Source: Dhuliawala et al., ACL 2024 Findings (Table 1, Section 4.3)

Key insight: Higher accuracy comes at the cost of increased computation and reduced output volume.
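The percentages in the table are relative changes, not percentage-point differences. A short check, using only the before/after values quoted in the table (the helper name `relative_change` and the dictionary labels are illustrative):

```python
def relative_change(before: float, after: float) -> float:
    """Percentage change from `before` to `after`, relative to `before`."""
    return (after - before) / before * 100


# (before, after) pairs copied from the table above.
results = {
    "FACTSCORE (biography generation)": (55.9, 71.4),
    "F1 (closed-book QA)": (0.39, 0.48),
    "Precision (list-based questions)": (0.17, 0.36),
    "Facts per biography (output volume)": (16.6, 12.3),
}

for name, (before, after) in results.items():
    print(f"{name}: {relative_change(before, after):+.0f}%")
```

Rounded to whole percentages, this gives +28%, +23%, +112%, and -26%, matching the table.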

When to Use

✅ Recommended

Critical code review: Architectural decisions, security-sensitive code
Complex debugging: Multi-component failure analysis
API/library integration: When correctness matters more than speed
Acceptable 2x cost: Token budget allows for a quality premium

❌ Not Recommended

Trivial changes: Simple fixes, formatting, typos
Exploratory coding: Rapid prototyping, experimentation
Tight token budgets: When cost is the primary constraint
Need comprehensive output: When you need all facts, not just an accurate subset
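The token-budget criteria above can be turned into a rough pre-check. This is a sketch under stated assumptions: the function `fits_budget` is hypothetical, and the default multiplier of 2.0 simply mirrors the ~2x estimate quoted in this document. It is not a measured value for any particular task.

```python
def fits_budget(
    baseline_tokens: int, budget_tokens: int, multiplier: float = 2.0
) -> bool:
    """Return True if a verified run is expected to fit within the budget.

    `multiplier` defaults to the ~2x estimate quoted in this document;
    treat it as an assumption and adjust it to your own measurements.
    """
    return baseline_tokens * multiplier <= budget_tokens
```

For example, a task that normally consumes 4,000 tokens would be expected to fit a 10,000-token budget with verification, while one that consumes 6,000 would not.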

Installation

# Add plugin marketplace
/plugin marketplace add vertti/se-cove-claude-plugin

# Install plugin (in separate command)
/plugin install chain-of-verification

Note: Commands must be pasted separately (Claude Code marketplace limitation).

Usage

# Invoke verification
/chain-of-verification:verify <your question>

# Autocomplete available
/ver<Tab>

Limitations

From the research paper (Section 6):

Not a silver bullet: Reduces hallucinations but does not eliminate them
Computational cost: ~2x token usage vs baseline generation (estimated from implementation)
Output volume trade-off: Generates fewer but more accurate results
Model-specific: Tested on Llama 65B; generalization to GPT-4 and Claude models is unverified
Task dependency: Improvement varies significantly by task type (+23% to +112%)
Factual hallucinations only: Does not address incorrect reasoning steps or opinions

Source Code

GitHub: vertti/se-cove-claude-plugin
Version: 1.1.1 (2026-01-23)
License: MIT
Author: Janne Sinivirta

Main guide section: Plugin System
Methodology: Multi-Agent Orchestration
Verification Loops: Autonomous Iteration
