Florian BRUNIAUX ee5791668a docs: add SE-CoVe plugin example + resource evaluation workflow (v3.11.6)
- First plugin example: SE-CoVe (Chain-of-Verification, Meta AI ACL 2024)
- Academic approach: cite paper metrics, not marketing claims
- Performance table: +23-112% accuracy (task-dependent, trade-offs disclosed)
- Resource evaluation template established (Perplexity fact-check workflow)
- Curation policy: Academic validation + Claims verified + Costs transparent
- Templates count: 82 → 83
- Architecture diagram added (visual overview of Claude Code internals)

Files:
- examples/plugins/se-cove.md (new plugin documentation)
- claudedocs/resource-evaluations/2026-01-24-se-cove-plugin.md (evaluation report)
- README.md, CHANGELOG.md, VERSION, reference.yaml (version bump 3.11.5 → 3.11.6)
- guide/architecture.md + image (visual overview)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-24 17:40:54 +01:00

---
plugin: chain-of-verification
marketplace: vertti/se-cove-claude-plugin
version: 1.1.1
license: MIT
research: arXiv:2309.11495 (ACL 2024 Findings)
---
# SE-CoVe: Chain-of-Verification
Software Engineering adaptation of Meta's Chain-of-Verification methodology for Claude Code.
## Research Foundation
**Paper**: "Chain-of-Verification Reduces Hallucination in Large Language Models"
**Authors**: Dhuliawala et al. (Meta AI)
**Published**: ACL 2024 Findings
**Sources**: [arXiv:2309.11495](https://arxiv.org/abs/2309.11495) | [ACL Anthology](https://aclanthology.org/2024.findings-acl.212/)
## How It Works
The plugin runs a 5-stage pipeline designed to keep verification independent of the draft:
1. **Baseline**: Generate initial solution
2. **Planner**: Create verification questions from solution claims
3. **Executor**: Answer questions independently (never sees baseline)
4. **Synthesizer**: Compare findings, identify discrepancies
5. **Output**: Produce verified solution
**Critical innovation**: The verifier answers its questions without access to the draft code, so it cannot simply repeat the draft's mistakes. This reduces confirmation bias.
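To make the independence property concrete, here is a minimal Python sketch of the five stages. It assumes a generic `model(prompt) -> str` callable; the prompts, `chain_of_verification`, `CoVeResult`, and `stub_model` are hypothetical illustrations, not the plugin's actual implementation. The point to notice is stage 3: each executor prompt contains only the task and one question, never the baseline draft.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical model interface: prompt in, text out. This is NOT the
# plugin's API; wrap whatever LLM client you use in this shape.
Model = Callable[[str], str]


@dataclass
class CoVeResult:
    baseline: str
    questions: list[str]
    answers: dict[str, str]
    final: str


def chain_of_verification(task: str, model: Model) -> CoVeResult:
    # 1. Baseline: draft an initial solution.
    baseline = model(f"Solve this task:\n{task}")

    # 2. Planner: turn the draft's claims into verification questions, one per line.
    plan = model(f"List one verification question per line for the claims in:\n{baseline}")
    questions = [line.strip() for line in plan.splitlines() if line.strip()]

    # 3. Executor: answer each question in a fresh prompt that contains the
    #    task and the question only. The baseline draft is never included.
    answers = {q: model(f"Task: {task}\nAnswer independently: {q}") for q in questions}

    # 4. Synthesizer: compare the draft with the independent answers.
    evidence = "\n".join(f"Q: {q}\nA: {a}" for q, a in answers.items())
    report = model(f"Draft:\n{baseline}\n\nIndependent checks:\n{evidence}\n\nList any discrepancies.")

    # 5. Output: revise the draft in light of the discrepancies.
    final = model(f"Draft:\n{baseline}\n\nDiscrepancies:\n{report}\n\nWrite the corrected solution.")
    return CoVeResult(baseline, questions, answers, final)


# Usage with a stub model that records every prompt it receives.
seen: list[str] = []


def stub_model(prompt: str) -> str:
    seen.append(prompt)
    if prompt.startswith("List one verification"):
        return "Does max() handle an empty list?\nIs the comparison strict?"
    return f"<response {len(seen)}>"


result = chain_of_verification("Write a max() helper", stub_model)
```

Recording prompts in the stub lets you check that no executor prompt contains the baseline text. A real setup would also run each executor call in a fresh context, which this sketch only approximates by building separate prompts.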
## Performance Metrics
Results from Meta's research paper (Llama 65B model):
| Task Type | Metric | Improvement | Trade-off |
|-----------|--------|-------------|-----------|
| Biography generation | FACTSCORE | +28% (55.9→71.4) | -26% output volume (16.6→12.3 facts per biography) |
| Closed-book QA | F1 Score | +23% (0.39→0.48) | ~2x token consumption (estimated from implementation) |
| List-based questions | Precision | +112% (0.17→0.36) | Fewer total answers |
**Source**: Dhuliawala et al., ACL 2024 Findings (Table 1, Section 4.3)
**Key insight**: Higher accuracy comes at the cost of more computation and less output.
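The percentages in the table are relative changes, (after - before) / before, not absolute point differences. This short sketch recomputes them from the raw scores listed above; the `rows` labels and `relative_change` helper are illustrative names only.

```python
# Raw scores copied from the table above (Llama 65B, baseline -> CoVe).
rows = {
    "Biography FACTSCORE": (55.9, 71.4),
    "Closed-book QA F1": (0.39, 0.48),
    "List-based precision": (0.17, 0.36),
    "Biography facts per answer": (16.6, 12.3),
}


def relative_change(before: float, after: float) -> int:
    """Percent change from `before` to `after`, rounded to the nearest integer."""
    return round((after - before) / before * 100)


changes = {name: relative_change(b, a) for name, (b, a) in rows.items()}
for name, pct in changes.items():
    print(f"{name}: {pct:+d}%")
```

This is why a small absolute gain on a low baseline (0.17 to 0.36 precision) shows up as +112%, while a larger absolute FACTSCORE gain yields only +28%.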
## When to Use
### ✅ Recommended
- **Critical code review**: Architectural decisions, security-sensitive code
- **Complex debugging**: Multi-component failure analysis
- **API/library integration**: When correctness > speed
- **~2x cost is acceptable**: Token budget allows paying for higher accuracy
### ❌ Not Recommended
- **Trivial changes**: Simple fixes, formatting, typos
- **Exploratory coding**: Rapid prototyping, experimentation
- **Tight token budgets**: When cost is primary constraint
- **Need comprehensive output**: When you need all facts, not just an accurate subset
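One way to apply the lists above consistently is a small pre-flight check before invoking the plugin. This is a hypothetical sketch: the task categories, `Task`, and `should_use_cove` are invented for illustration, and the 2x multiplier is this guide's own estimate, not a figure from the paper.

```python
from dataclasses import dataclass

# Hypothetical task categories mirroring the two lists above.
RECOMMENDED = {"critical_review", "complex_debugging", "api_integration"}
NOT_RECOMMENDED = {"trivial_change", "exploratory"}

# Assumption: verification costs about 2x the tokens of a single pass
# (this guide's estimate, not a figure from the paper).
COST_MULTIPLIER = 2.0


@dataclass
class Task:
    kind: str
    baseline_tokens: int  # expected tokens for an ordinary single-pass answer
    token_budget: int
    needs_full_coverage: bool = False  # True if you need every fact, not an accurate subset


def should_use_cove(task: Task) -> bool:
    """Return True when the checklist favors running /chain-of-verification:verify."""
    if task.kind in NOT_RECOMMENDED or task.needs_full_coverage:
        return False
    if task.baseline_tokens * COST_MULTIPLIER > task.token_budget:
        return False  # tight budget: the ~2x overhead does not fit
    return task.kind in RECOMMENDED


print(should_use_cove(Task("critical_review", baseline_tokens=10_000, token_budget=25_000)))  # True
print(should_use_cove(Task("critical_review", baseline_tokens=10_000, token_budget=15_000)))  # False
```

Unknown task kinds default to `False`, so the check stays conservative: verification is opt-in for work that clearly benefits from it.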
## Installation
```bash
# Add plugin marketplace
/plugin marketplace add vertti/se-cove-claude-plugin
# Install plugin (run as a separate command)
/plugin install chain-of-verification
```
**Note**: Commands must be pasted separately (Claude Code marketplace limitation).
## Usage
```bash
# Invoke verification
/chain-of-verification:verify <your question>
# Or type a prefix and press Tab to autocomplete
/ver<Tab>
```
## Limitations
From the research paper (Section 6):
1. **Not a silver bullet**: Reduces hallucinations but does not eliminate them
2. **Computational cost**: ~2x token usage vs baseline generation (estimated from implementation)
3. **Output volume trade-off**: Generates fewer but more accurate results
4. **Model-specific**: Tested on Llama 65B; generalization to GPT-4 or Claude models is unverified
5. **Task dependency**: Relative improvement varies significantly by task type (+23% to +112%)
6. **Factual hallucinations only**: Does not address incorrect reasoning steps or opinions
## Source Code
- **GitHub**: [vertti/se-cove-claude-plugin](https://github.com/vertti/se-cove-claude-plugin)
- **Version**: 1.1.1 (2026-01-23)
- **License**: MIT
- **Author**: Janne Sinivirta
## Related
- Main guide section: [Plugin System](../../guide/ultimate-guide.md#85-plugin-system)
- Methodology: [Multi-Agent Orchestration](../../guide/methodologies.md#multi-agent-orchestration)
- Verification Loops: [Autonomous Iteration](../../guide/methodologies.md#verification-loops)