docs: add SE-CoVe plugin example + resource evaluation workflow (v3.11.6)

- First plugin example: SE-CoVe (Chain-of-Verification, Meta AI ACL 2024) - Academic approach: cite paper metrics, not marketing claims - Performance table: +23-112% accuracy (task-dependent, trade-offs disclosed) - Resource evaluation template established (Perplexity fact-check workflow) - Curation policy: Academic validation + Claims verified + Costs transparent - Templates count: 82 → 83 - Architecture diagram added (visual overview of Claude Code internals) Files: - examples/plugins/se-cove.md (new plugin documentation) - claudedocs/resource-evaluations/2026-01-24-se-cove-plugin.md (evaluation report) - README.md, CHANGELOG.md, VERSION, reference.yaml (version bump 3.11.5 → 3.11.6) - guide/architecture.md + image (visual overview) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-24 17:40:54 +01:00 · 2026-01-24 17:40:54 +01:00 · ee5791668a
commit ee5791668a
parent f7e1254b06
9 changed files with 342 additions and 20 deletions
--- a/examples/plugins/se-cove.md
+++ b/examples/plugins/se-cove.md
@ -0,0 +1,106 @@
+---
+plugin: chain-of-verification
+marketplace: vertti/se-cove-claude-plugin
+version: 1.1.1
+license: MIT
+research: arXiv:2309.11495 (ACL 2024 Findings)
+---
+
+# SE-CoVe: Chain-of-Verification
+
+Software Engineering adaptation of Meta's Chain-of-Verification methodology for Claude Code.
+
+## Research Foundation
+
+**Paper**: "Chain-of-Verification Reduces Hallucination in Large Language Models"
+**Authors**: Dhuliawala et al. (Meta AI)
+**Published**: ACL 2024 Findings
+**Sources**: [arXiv:2309.11495](https://arxiv.org/abs/2309.11495) | [ACL Anthology](https://aclanthology.org/2024.findings-acl.212/)
+
+## How It Works
+
+5-stage pipeline ensuring independent verification:
+
+1. **Baseline**: Generate initial solution
+2. **Planner**: Create verification questions from solution claims
+3. **Executor**: Answer questions independently (never sees baseline)
+4. **Synthesizer**: Compare findings, identify discrepancies
+5. **Output**: Produce verified solution
+
+**Critical innovation**: Verifier operates without access to draft code, preventing confirmation bias.
+
+## Performance Metrics
+
+Results from Meta's research paper (Llama 65B model):
+
+| Task Type | Metric | Improvement | Computational Cost |
+|-----------|--------|-------------|-------------------|
+| Biography generation | FACTSCORE | +28% (55.9→71.4) | -26% output volume (16.6→12.3 facts) |
+| Closed-book QA | F1 Score | +23% (0.39→0.48) | ~2x token consumption |
+| List-based questions | Precision | +112% (0.17→0.36) | Fewer total answers |
+
+**Source**: Dhuliawala et al., ACL 2024 Findings (Table 1, Section 4.3)
+
+**Key insight**: Higher accuracy comes at cost of increased computation and reduced output volume.
+
+## When to Use
+
+### ✅ Recommended
+
+- **Critical code review**: Architectural decisions, security-sensitive code
+- **Complex debugging**: Multi-component failure analysis
+- **API/library integration**: When correctness > speed
+- **Acceptable 2x cost**: Token budget allows for quality premium
+
+### ❌ Not Recommended
+
+- **Trivial changes**: Simple fixes, formatting, typos
+- **Exploratory coding**: Rapid prototyping, experimentation
+- **Tight token budgets**: When cost is primary constraint
+- **Need comprehensive output**: When you need all facts, not just accurate subset
+
+## Installation
+
+```bash
+# Add plugin marketplace
+/plugin marketplace add vertti/se-cove-claude-plugin
+
+# Install plugin (in separate command)
+/plugin install chain-of-verification
+```
+
+**Note**: Commands must be pasted separately (Claude Code marketplace limitation).
+
+## Usage
+
+```bash
+# Invoke verification
+/chain-of-verification:verify <your question>
+
+# Autocomplete available
+/ver<Tab>
+```
+
+## Limitations
+
+From the research paper (Section 6):
+
+1. **Not a silver bullet**: Reduces hallucinations but does not eliminate them
+2. **Computational cost**: ~2x token usage vs baseline generation (estimated from implementation)
+3. **Output volume trade-off**: Generates fewer but more accurate results
+4. **Model-specific**: Tested on Llama 65B; generalization to GPT-4/Claude/Sonnet unverified
+5. **Task dependency**: Performance varies significantly by task type (23-112%)
+6. **Factual hallucinations only**: Does not address incorrect reasoning steps or opinions
+
+## Source Code
+
+- **GitHub**: [vertti/se-cove-claude-plugin](https://github.com/vertti/se-cove-claude-plugin)
+- **Version**: 1.1.1 (2026-01-23)
+- **License**: MIT
+- **Author**: Janne Sinivirta
+
+## Related
+
+- Main guide section: [Plugin System](../guide/ultimate-guide.md#85-plugin-system)
+- Methodology: [Multi-Agent Orchestration](../guide/methodologies.md#multi-agent-orchestration)
+- Verification Loops: [Autonomous Iteration](../guide/methodologies.md#verification-loops)