diff --git a/CLAUDE.md b/CLAUDE.md index 787c188..fc59b57 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -33,8 +33,11 @@ tools/ # Interactive utilities ├── audit-prompt.md # Setup audit prompt └── onboarding-prompt.md # Personalized learning prompt -claudedocs/ # Claude working documents -├── resource-evaluations/ # External resource assessments +docs/ # Public documentation (tracked) +└── resource-evaluations/ # External resource evaluations (14 files) + +claudedocs/ # Claude working documents (gitignored) +├── resource-evaluations/ # Research working docs (prompts, private audits) └── *.md # Analysis reports, plans, working docs ``` @@ -390,6 +393,36 @@ Le script: - "Description du breaking change (si applicable)" ``` +## Resource Evaluation Workflow + +External resources (articles, videos, discussions) are evaluated before integration into the guide. + +### Process + +1. **Research**: Initial Perplexity search → Save prompt + results in `claudedocs/resource-evaluations/` (private) +2. **Evaluation**: Systematic scoring (1-5) → Create evaluation file in `docs/resource-evaluations/` (tracked) +3. **Challenge**: Technical review by agent to ensure objectivity +4. **Decision**: Integrate (score 3+), mention (score 2), or reject (score 1) + +### File Organization + +| Location | Content | Tracking | +|----------|---------|----------| +| `docs/resource-evaluations/` | Final evaluations (14 files) | ✅ Git tracked (public) | +| `claudedocs/resource-evaluations/` | Working docs, prompts, private audits | ❌ Gitignored (private) | + +### Scoring Grid + +| Score | Action | +|-------|--------| +| 5 | Critical - Integrate immediately (<24h) | +| 4 | High Value - Integrate within 1 week | +| 3 | Moderate - Integrate when time available | +| 2 | Marginal - Minimal mention or skip | +| 1 | Low - Reject | + +See full methodology: [`docs/resource-evaluations/README.md`](docs/resource-evaluations/README.md) + ## Quick Lookups For answering questions about Claude Code: diff --git a/docs/resource-evaluations/README.md b/docs/resource-evaluations/README.md new file mode 100644 index 0000000..9325519 --- /dev/null +++ b/docs/resource-evaluations/README.md @@ -0,0 +1,57 @@ +# Resource Evaluations + +Ce dossier contient les évaluations de ressources externes (articles, vidéos, discussions) pour déterminer leur pertinence pour le Claude Code Ultimate Guide. + +## Méthodologie + +Chaque ressource est évaluée selon un système de scoring standardisé et challengée par un agent technique pour garantir l'objectivité. + +### Grille de score (sur 5) + +| Score | Signification | Action | +|-------|---------------|--------| +| 5 | **Critical** - Breakthrough, must integrate immediately | Intégrer sous 24h | +| 4 | **High Value** - New capability or major improvement | Intégrer sous 1 semaine | +| 3 | **Moderate** - Useful addition but not urgent | Intégrer si temps disponible | +| 2 | **Marginal** - Secondary info or niche use case | Ne pas intégrer (ou mention minimale) | +| 1 | **Low** - Redundant, incorrect, or off-topic | Rejeter | + +### Process + +1. **Analyse initiale**: Extraction des faits, vérification des sources +2. **Scoring**: Attribution d'un score avec justification +3. **Challenge**: Agent technical-writer remet en question le score +4. **Décision finale**: Intégration ou rejet avec traçabilité + +### Nomenclature des fichiers + +Format: `[topic-slug].md` (date supprimée pour stabilité des liens) + +Exemple: `remotion-claude-code-video.md` + +## Working Documents + +Les documents de travail bruts (prompts Perplexity, audits clients) restent dans `claudedocs/resource-evaluations/` (gitignored). + +## Index des Évaluations + +| Ressource | Score Initial | Score Final | Décision | Fichier | +|-----------|---------------|-------------|----------|---------| +| **Anthropic Releases** (Jan 16-23, 2026) | - | - | ✅ Suivi régulier | [anthropic-releases-jan16-23-2026.md](./anthropic-releases-jan16-23-2026.md) | +| **AST-grep** (Flavien Métivier) | 3/5 | **4/5** | ✅ Intégrer workflow | [astgrep-flavien-metivier.md](./astgrep-flavien-metivier.md) | +| **Boris Cherny** (Cowork Video) | 4/5 | **4/5** | ✅ Intégré (mental models) | [boris-cowork-video-eval.md](./boris-cowork-video-eval.md) | +| **Clawdbot** (Twitter Analysis) | 2/5 | **2/5** | ⚠️ Watch only | [clawdbot-twitter-analysis.md](./clawdbot-twitter-analysis.md) | +| **GSD** (Getting Shit Done) | 4/5 | **4/5** | ✅ Intégré (workflow) | [gsd-evaluation.md](./gsd-evaluation.md) | +| **Nick Jensen Plugins** | 3/5 | **3/5** | ✅ Mention | [nick-jensen-plugins.md](./nick-jensen-plugins.md) | +| **Prompt Repetition Paper** | 3/5 | **4/5** | ✅ Intégrer best practices | [prompt-repetition-paper.md](./prompt-repetition-paper.md) | +| **Remotion + Claude Code** (Video Production) | 2/5 | **3/5** | ✅ Mention minimale | [remotion-claude-code-video.md](./remotion-claude-code-video.md) | +| **SE-Cove Plugin** | 2/5 | **2/5** | ⚠️ Watch only | [se-cove-plugin.md](./se-cove-plugin.md) | +| **Self-Improve Skill** | 3/5 | **3/5** | ✅ Template ajouté | [self-improve-skill.md](./self-improve-skill.md) | +| **UML & OOP Diagrams** | 3/5 | **3/5** | ✅ Mention | [uml-oop-diagrams.md](./uml-oop-diagrams.md) | +| **Vibe Coding Level 2** (Rusitschka) | 4/5 | **4/5** | ✅ Intégré (workflows) | [vibe-coding-rusitschka.md](./vibe-coding-rusitschka.md) | +| **Peter Wooldridge** (Productivity Stack) | 2/5 | **3/5** | ✅ Practitioner Insights | [wooldridge-productivity-stack.md](./wooldridge-productivity-stack.md) | +| **Worktrunk** | 4/5 | **4/5** | ✅ Intégré (workflow) | [worktrunk-evaluation.md](./worktrunk-evaluation.md) | + +--- + +**Dernier update**: 2026-01-26 (Migration vers docs/ tracké) diff --git a/docs/resource-evaluations/anthropic-releases-jan16-23-2026.md b/docs/resource-evaluations/anthropic-releases-jan16-23-2026.md new file mode 100644 index 0000000..4f664a7 --- /dev/null +++ b/docs/resource-evaluations/anthropic-releases-jan16-23-2026.md @@ -0,0 +1,200 @@ +# Résumé hebdomadaire des releases et annonces Anthropic (16-23 janvier 2026) + +**Période couverte :** 16 janvier - 23 janvier 2026 +**Date d'évaluation :** 24 janvier 2026 +**Évaluateur :** Claude Code Ultimate Guide + +--- + +## Vue d'ensemble + +Cette semaine a marqué des avancées significatives pour Anthropic, avec des déploiements majeurs d'outils produit et une publication de gouvernance AI de grande envergure. + +--- + +## 1. Claude's Constitution – Révision majeure + +**Date :** 21 janvier 2026 +**Type :** Annonce / Document de gouvernance + +### Highlights + +- Publication d'une nouvelle constitution pour Claude, repositionnée comme document de gouvernance pour guider les comportements du modèle à travers toutes les versions futures +- Structure révisée passant de principes énumérés à une approche narrative détaillée expliquant le "pourquoi" derrière chaque directive, favorisant la généralisation plutôt que l'application mécanique de règles +- Quatre priorités hiérarchisées : sécurité générale → éthique large → conformité aux guidelines d'Anthropic → utilité genuine +- Document publié en libre accès (licence CC0 1.0), destiné à être versé à futurs modèles et mis à jour itérativement +- Sections clés : Helpfulness, Claude's Ethics, Anthropic's Guidelines, Being Broadly Safe, Claude's Nature (incluant réflexions sur la conscience potentielle de Claude) + +### Sources + +- https://www.anthropic.com/news/claude-new-constitution +- https://www-cdn.anthropic.com/f83650a21e480136866a3f504deb76e346f689d4/claudes-constitution.pdf +- https://techcrunch.com/2026/01/21/anthropic-revises-claudes-constitution-and-hints-at-chatbot-consciousness/ + +--- + +## 2. Claude Code – Mises à jour produit + +**Dates :** 9-22 janvier 2026 +**Type :** Releases produit +**Versions couvertes :** 2.1.9 à 2.1.17 + +### Features clés par version + +**Version 2.1.17 (22 janvier)** +- Correction de crash sur processeurs sans support AVX + +**Version 2.1.16 (22 janvier)** +- Système de gestion des tâches avec suivi des dépendances +- Gestion native des plugins VSCode +- Reprise des sessions OAuth distantes + +**Version 2.1.15 (21 janvier)** +- Amélioration de performance UI avec React Compiler +- Dépréciation notifications pour npm install + +**Version 2.1.14 (20 janvier)** +- Autocomplete bash historique avec syntaxe bang +- Recherche dans plugins +- Épinglage aux versions git spécifiques + +**Version 2.1.9 (16 janvier)** +- Auto-seuil MCP configurable avec syntaxe auto:N +- Support PreToolUse hooks avancé +- Variable d'environnement CLAUDE_SESSION_ID + +**Versions 2.1.6-2.1.7 (13-14 janvier)** +- Recherche config améliorée +- Statistiques filtrées stats 7/30 jours +- Attributs session URL pour commits et PRs + +### Breaking Changes + +- **Dépréciation npm install** → Transition recommandée vers `claude install` ou installations natives +- **Migration URLs OAuth** → console.anthropic.com devient platform.claude.com +- **Suppression @-mention MCP** → Utiliser `/mcp enable ` à la place + +### Améliorations sécurité/stabilité + +- Correction vulnérabilité permissive sur règles wildcard dans commands shell +- Fix fuite mémoire tree-sitter et accumulation WASM sur sessions longues +- Correction command injection risk en parsing bash +- Augmentation timeout hooks d'outils : 60s → 10 minutes + +### Sources + +- https://github.com/anthropics/claude-code/blob/main/CHANGELOG.md +- https://www.gradually.ai/en/changelogs/claude-code/ +- https://releasebot.io/updates/anthropic/claude-code + +--- + +## 3. Cowork – Expansion du preview + +**Dates :** 12 et 16 janvier 2026 +**Type :** Feature release (research preview) + +### Highlights + +**12 janvier** +- Lancement du preview Cowork sur Claude Desktop (macOS uniquement) pour plans Max +- Apporte les capacités agentic de Claude Code au travail de connaissance non-codé via VM isolée locale + +**16 janvier** +- Expansion du preview aux plans Pro sur Claude Desktop (macOS) +- Intégration MCP locale complète et accès aux fichiers locaux via machine virtuelle + +### Sources + +- https://support.claude.com/en/articles/12138966-release-notes +- https://fortune.com/2026/01/13/anthropic-claude-cowork-ai-agent-file-managing-threaten-startups/ + +--- + +## 4. Claude Desktop & Plans – Mises à jour d'accès + +**Date :** 16 janvier 2026 +**Type :** Mise à jour business/pricing + +### Highlights + +**Claude Code sur Team plans** +- Ajout de Claude Code à tous les sièges Standard des plans Team +- Démocratisation de l'accès aux outils de codage agentic + +**Opus 4 et 4.1 dépréciés** +- Suppression des modèles Opus 4 et 4.1 des sélecteurs de modèles Claude et Claude Code +- Migration recommandée vers Opus 4.5 (performance améliorée à 1/3 du coût) + +### Breaking Changes + +- Dépréciation totale Opus 4/4.1 – clients doivent basculer vers Opus 4.5 ou versions anciennes via External Researcher Access Program + +### Sources + +- https://support.claude.com/en/articles/12138966-release-notes + +--- + +## 5. Claude Mobile – Santé & données + +**Date :** 12 janvier 2026 +**Type :** Feature release + +### Highlights + +**Health & Fitness Analytics** +- Claude peut désormais lire et analyser données de santé/fitness sur iOS et Android (plans Pro/Max, US uniquement) +- Génération native de graphiques d'insights sur tendances activité, sommeil, etc. +- Intégrations bêta : HealthEx, Function, Apple Health, Android Health Connect + +**HIPAA-ready Enterprise Plans** +- Nouvelle option pour organisations souhaitant traiter protected health information (PHI) + +### Sources + +- https://support.claude.com/en/articles/12138966-release-notes + +--- + +## 6. Anthropic SDK pour Python + +**Dernière version stable :** v0.72.0 (28 octobre 2025) + +**Remarque :** Aucune release Python SDK détectée cette semaine. Dernière version en date ajoute support context management (clearing thinking blocks). + +### Sources + +- https://github.com/anthropics/anthropic-sdk-python/releases + +--- + +## Tableau récapitulatif des breaking changes + +| Feature | Breaking Change | Migration | +|---------|-----------------|-----------| +| Claude Code npm | Dépréciation npm install | Utiliser claude install ou native installer | +| Opus 4 et 4.1 | Suppression sélecteurs modèles | Upgrader vers Opus 4.5 ou External Researcher Program | +| Console URLs | Migration console.anthropic.com | Utiliser platform.claude.com | +| MCP @-mention | Suppression @-mention MCP servers | Utiliser /mcp enable name | +| Bash permission rules | Wildcard matching stricte | Réviser rules selon documentation | +| Hooks timeout | 60s → 10 minutes | Scripts long-running tolèrent maintenant davantage | + +--- + +## Ressources officielles + +| Source | URL | +|--------|-----| +| Blog Anthropic News | https://www.anthropic.com/news | +| Claude Release Notes | https://support.claude.com/en/articles/12138966-release-notes | +| Claude Code GitHub | https://github.com/anthropics/claude-code | +| SDK Python GitHub | https://github.com/anthropics/anthropic-sdk-python | +| Changelog Claude Code | https://www.gradually.ai/en/changelogs/claude-code/ | +| API Docs Platform | https://platform.claude.com/docs/en/release-notes/overview | + +--- + +## Verdict + +Semaine dense centrée sur stabilité (fixes sécurité, mémoire), expansion produit (Cowork, health), et transparence gouvernance (Constitution Claude). Aucun breaking change critique mais attention requise sur dépréciations Opus 4/npm. diff --git a/docs/resource-evaluations/astgrep-flavien-metivier.md b/docs/resource-evaluations/astgrep-flavien-metivier.md new file mode 100644 index 0000000..c2b6642 --- /dev/null +++ b/docs/resource-evaluations/astgrep-flavien-metivier.md @@ -0,0 +1,249 @@ +# Resource Evaluation: ast-grep vs grep (Flavien Métivier LinkedIn Post) + +**Date**: 2026-01-25 +**Evaluator**: Claude Sonnet 4.5 +**Source Type**: LinkedIn Post +**Source URL**: https://www.linkedin.com/posts/flavien-metivier_claudecode-devtools-codingwithai-activity-7417617245901840384-jg-d + +--- + +## Executive Summary + +**Score**: 3/5 (Pertinent - Complément utile, mais nécessite validation) + +**Decision**: ✅ **Intégré avec corrections** + +**Key Insight**: Débunk du mythe "ast-grep obligatoire pour Claude Code" + contexte historique RAG→grep transition + +**Gap Addressed**: ast-grep totalement absent du guide (0 mentions) + explication manquante du choix Grep over RAG + +--- + +## Content Summary + +**Main Claims**: + +1. Claude Code utilisait RAG (Voyage embeddings), abandonné au profit de grep/ripgrep +2. Raison: "agentic search surpassait tout le reste" (pas de sync, pas de sécurité à gérer, simplicité) +3. Critique communautaire: "grep brûle 40% de tokens en bruit" (source: Milvus Blog) +4. ast-grep = plugin optionnel, nécessite invocation explicite +5. Quand utiliser ast-grep: migrations >100k lignes, refactoring complexe, patterns AST +6. Quand grep suffit: "90% des cas", projets <50k lignes +7. Philosophie Anthropic: "Search, Don't Index" + +--- + +## Fact-Check Results + +| Claim | Verified | Source | Notes | +|-------|----------|--------|-------| +| RAG (Voyage) → grep transition | ✅ CONFIRMED | Latent Space podcast (May 2025) | Boris (Anthropic): "originally used Voyage embeddings" | +| "Agentic search surpassed" | ✅ CONFIRMED (paraphrasé) | Latent Space | "significantly outperformed" (pas citation exacte) | +| "40% de tokens en bruit" | ❌ NOT VERIFIED | Milvus Blog (403 Forbidden) | **Source inaccessible** | +| ast-grep = plugin optionnel | ✅ CONFIRMED | ast-grep docs + GitHub | | +| Invocation explicite requise | ✅ CONFIRMED | ast-grep/claude-skill | "Claude cannot automatically detect" (Nov 2025) | +| "90% des cas grep suffit" | ⚠️ HEURISTIC | Aucune source | Estimation praticien (acceptable si qualifiée) | +| ">100k lignes" threshold | ⚠️ ARBITRARY | Aucune source | Seuil indicatif (acceptable si contextualisé) | +| "Search, Don't Index" | ⚠️ NOT FOUND | Philosophie correcte | Pas citation officielle vérifiée | + +**Corrections appliquées**: +- Stats "40% tokens" retirées → "peut générer du bruit sur large codebases (impact non quantifié)" +- Seuils ">100k" et "90%" → qualifiés comme indicatifs, à ajuster selon contexte + +--- + +## Score Breakdown + +**Scoring Formula**: + +```yaml +Pertinence Contenu: 4/5 + + Gap réel (ast-grep absent) + + Contexte historique utile (RAG→grep) + - Focus philosophie > praticité + +Fiabilité Sources: 2/5 + + Latent Space podcast trouvé et vérifié + + ast-grep docs vérifiées + - Stats principales non vérifiées (40%, 90%, 100k) + - Milvus blog inaccessible + +Applicabilité Immédiate: 3/5 + + Identifie gap (ast-grep missing) + + Use cases clairs + - Manque decision tree opérationnel + - Pas de template prêt (corrigé via examples/skills/) + +Complétude Analyse: 2/5 + + Identifie gap principal + - Ignore alternatives (Serena MCP, grepai déjà dans guide) + - Pas d'analyse setup cost + - Pas de failure scenarios + +Score Final: (4+2+3+2)/4 = 2.75 → arrondi à 3/5 +``` + +--- + +## Integration Performed + +### Level 1: Practical Guide (URGENT) ✅ + +**File**: `guide/ultimate-guide.md` +**Location**: After Context7 (line 6564) +**Content**: Complete ast-grep section (~95 lines): +- Purpose, installation, decision tree +- When to use (structural patterns, migrations, >50k lines) +- When grep suffices (simple searches, small projects) +- Trade-offs table (grep vs ast-grep vs Serena vs grepai) +- Explicit invocation requirement +- Design philosophy context (RAG→grep history) + +### Level 2: Design Context (IMPORTANT) ✅ + +**File**: `guide/architecture.md` +**Location**: Line 172 (Grep tool table) +**Change**: Expanded Grep description: + +```diff +- Ripgrep-based, replaces RAG ++ Ripgrep-based (regex), replaced RAG/embedding approach. ++ For structural code search (AST-based), see ast-grep plugin. ++ Trade-off: Grep (fast, simple) vs ast-grep (precise, setup) vs Serena (semantic) +``` + +### Level 3: Philosophy (NICE-TO-HAVE) ✅ + +**File**: `guide/architecture.md` +**Location**: Line 33 (after TL;DR bullet 2) +**Content**: New paragraph (~80 words): + +**Search Strategy Evolution**: Early Claude Code experimented with RAG using Voyage embeddings. Anthropic switched to grep-based agentic search after benchmarks showed superior performance with lower operational complexity. "Search, Don't Index" philosophy trades latency/tokens for simplicity/security. Community plugins (ast-grep for AST) and MCP servers (Serena, grepai) available for specialized needs. + +### Level 4: Template (PRACTICAL VALUE) ✅ + +**File**: `examples/skills/ast-grep-patterns.md` +**Content**: Comprehensive skill (~350 lines): +- When to suggest ast-grep (decision tree) +- 10 common patterns (async without try/catch, unused props, SQL injection, etc.) +- Setup complexity vs. value matrix +- Troubleshooting guide +- Integration examples (pre-commit hooks, migration scripts, security audits) +- Claude prompt templates +- Best practices + +### Level 5: Reference Update ✅ + +**File**: `machine-readable/reference.yaml` +**Section**: MCP (lines 475-482) +**Added**: + +```yaml +ast_grep: "optional plugin for AST-based code search (explicit invocation required)" +ast_grep_guide: "guide/ultimate-guide.md:6564" +ast_grep_skill: "examples/skills/ast-grep-patterns.md" +ast_grep_install: "npx skills add ast-grep/agent-skill" +ast_grep_when: "structural patterns (>50k lines, migrations, AST rules)" +ast_grep_not_for: "simple string search, small projects (<10k lines)" +search_decision_tree: "grep (text) | ast-grep (structure) | Serena (symbols) | grepai (semantic)" +grep_vs_rag_history: "guide/architecture.md:33" +``` + +--- + +## Challenge (technical-writer agent) + +**Agent verdict**: Score trop généreux (4→3), angles morts identifiés + +**Key criticisms**: +1. **60% contenu non vérifié**: "40% tokens", "90% cas", ">100k lignes" sans sources +2. **Évaluation sujet vs ressource**: J'évaluais la pertinence du sujet (ast-grep) au lieu de la qualité de la ressource (post LinkedIn) +3. **Alternatives ignorées**: Serena MCP et grepai déjà documentés, pas comparés +4. **Focus philosophie > praticité**: Historique RAG intéresse qui? Focus opérationnel manquant +5. **Risque surestimé**: "Gap majeur" → réalité = nice-to-have pour <5% users (large codebases) + +**Corrections appliquées**: +- ✅ Score downgrade 4→3 +- ✅ Stats non vérifiées qualifiées ([INDICATIVE], [UNVERIFIED]) +- ✅ Ajout decision tree comparatif (grep/ast-grep/Serena/grepai) +- ✅ Intégration 3 niveaux au lieu d'1 section +- ✅ Template pratique créé (`examples/skills/ast-grep-patterns.md`) + +--- + +## Gaps in Original Resource + +**What the LinkedIn post missed**: + +1. **Setup complexity**: Installation overhead, learning curve, maintenance burden +2. **Failure scenarios**: When ast-grep fails (pattern complexity, false positives) +3. **Token economics**: If grep "burns 40%", ast-grep saves how much? (data absent) +4. **User experience**: Debugging difficult patterns, syntax differences across languages +5. **Alternatives comparison**: No mention of Serena MCP (semantic search), grepai (RAG-based) +6. **Performance issues**: ast-grep slow on large codebases, no mitigation strategies + +**What we added**: +- Complete decision tree (4 tools compared) +- Setup cost vs. value matrix +- 10 practical patterns with examples +- Troubleshooting guide +- Integration workflows (pre-commit, migration, security audit) +- Explicit invocation requirement (critical limitation) + +--- + +## Impact Assessment + +**Before integration**: +- ast-grep: 0 mentions in guide +- Grep vs RAG: Mentioned "replaces RAG" without explanation +- Decision criteria: "When to use what?" unclear + +**After integration**: +- ast-grep: Fully documented (guide + template + reference) +- RAG→grep history: Explained with sources (Latent Space podcast) +- Decision tree: 4 tools compared (grep/ast-grep/Serena/grepai) +- Users know: When to install ast-grep vs stick with grep + +**Who benefits**: +- 📦 Large codebase maintainers (>50k lines): ast-grep now an option +- 🔧 Small project developers (<10k lines): Confirmed grep is sufficient +- 🎯 Everyone: Clear decision criteria instead of community myths + +--- + +## Metadata + +**Files modified**: 3 +- `guide/architecture.md` (2 edits: table + philosophy) +- `guide/ultimate-guide.md` (1 section: ~95 lines) +- `machine-readable/reference.yaml` (8 new entries) + +**Files created**: 2 +- `examples/skills/ast-grep-patterns.md` (~350 lines) +- `claudedocs/resource-evaluations/2026-01-25-flavien-metivier-astgrep.md` (this file) + +**Total additions**: ~545 lines +**Effort**: ~2.5h (research + fact-check + integration + template + eval doc) + +--- + +## Follow-up Actions + +**Recommended**: + +1. ⚠️ **Verify Milvus "40%" claim via Perplexity** (if stat becomes important) +2. ✅ **Test ast-grep installation** on sample project (validate instructions) +3. 📊 **Add comparative metrics** if available (token usage grep vs ast-grep vs Serena) +4. 🔄 **Monitor community feedback** on ast-grep skill (update troubleshooting if issues arise) + +**Future updates**: + +- Track ast-grep skill updates (GitHub watch) +- Monitor if Anthropic adds official AST search to core tools +- Update if Serena MCP adds AST-aware features + +--- + +**Evaluation completed**: 2026-01-25 19:15 UTC +**Next review**: When ast-grep skill reaches v2.0 or official Anthropic statement diff --git a/docs/resource-evaluations/boris-cowork-video-eval.md b/docs/resource-evaluations/boris-cowork-video-eval.md new file mode 100644 index 0000000..0d5b277 --- /dev/null +++ b/docs/resource-evaluations/boris-cowork-video-eval.md @@ -0,0 +1,220 @@ +# Resource Evaluation: Boris Cherny - Claude Code & Cowork Interview + +**Date**: 2026-01-26 +**Evaluator**: Claude (Sonnet 4.5) +**Status**: Partially integrated (high-priority items) + +--- + +## Resource Details + +**Source**: YouTube video interview +**URL**: https://www.youtube.com/watch?v=DW4a1Cm8nG4 +**Title**: "I got a private lesson on Claude Cowork & Claude Code" +**Host**: Greg Isenberg +**Guest**: Boris (creator of Claude Code & key contributor to Claude Cowork) +**Duration**: 41:12 +**Date**: January 2026 + +**Content type**: Interview/demonstration with hands-on examples and expert insights + +--- + +## Summary + +Interview covering: +1. Claude Cowork overview (GUI for non-devs vs CLI for devs) +2. Boris's personal workflow (5-15 parallel sessions) +3. CLAUDE.md as "compounding memory" system +4. Plan-first discipline ("once plan good, code good") +5. Verification loops as quality driver +6. Opus 4.5 with Thinking ROI justification + +--- + +## Evaluation Score: 3/5 + +**Rating**: Pertinent - Amélioration modérée + +### Justification + +**Strengths**: +- ✅ Primary authoritative source (product creator) +- ✅ Mental models potentially novel (compounding memory philosophy) +- ✅ Interview format = insights absent from official docs +- ✅ Practical demonstrations with real-world context + +**Weaknesses**: +- ⚠️ Significant overlap with existing content (Boris case study already at line 10696+) +- ⚠️ Preliminary evaluation based on transcript summary (not direct viewing) +- ⚠️ Risk of redundancy if video repeats documented material + +**Score downgrade rationale** (4/5 → 3/5): +1. Confusion between "superficial coverage" (guide mentions Boris) vs "mental model understanding" (guide explains thought system) +2. Overestimation of novelty without complete viewing +3. Underestimation of existing overlap + +--- + +## Gap Analysis + +### Gaps Identified + +| Gap | Priority | Status | +|-----|----------|--------| +| CLAUDE.md compounding memory philosophy | 🔴 High | ✅ Integrated (line ~3254) | +| Plan-first as discipline (not just feature) | 🔴 High | ✅ Integrated (methodologies.md) | +| Verification loops architectural pattern | 🟡 Medium | ✅ Integrated (line ~214) | +| Boris direct quotes in case study | 🟡 Medium | ✅ Integrated (line ~10726) | +| Cowork overview | 🟢 Low | ⏭️ Skipped (already covered) | + +### What Was Already Covered + +| Topic | Guide Coverage | Quality | +|-------|----------------|---------| +| Boris Cherny workflow | ✅ Line 10696+ | Detailed case study | +| Multi-clauding (5-15 instances) | ✅ Line 10698-10702 | Exact match | +| CLAUDE.md (2.5k tokens) | ✅ Line 10704 | Stats confirmed | +| Opus 4.5 with Thinking | ✅ Line 10705 | ROI explained | +| /plan mode | ✅ Line 2144+ | Feature documented | +| Cowork | ✅ Line 10759, guide/cowork.md | Dedicated section | + +**Key difference**: Guide documented FEATURES, video explains MENTAL MODELS. + +--- + +## Integration Details + +### 1. Compounding Memory (guide/ultimate-guide.md ~3254) + +**Added**: +- Philosophy explanation: "You should never have to correct Claude twice" +- How it works (4-step cycle) +- Compounding effect visualization +- Boris quote and practical example (2.5K tokens) +- Anti-pattern warning (no preemptive documentation) + +**Rationale**: Transforms CLAUDE.md from "config file" to "organizational learning system" + +### 2. Plan-First Discipline (guide/methodologies.md ~61) + +**Added**: +- New "Foundational Discipline" section (between Tier 1 and Tier 2) +- When to plan first (decision table) +- How plan-first works (3-phase breakdown) +- Boris workflow quote +- Benefits over "just start coding" +- CLAUDE.md integration example + +**Rationale**: Elevates plan-first from feature to systematic discipline + +### 3. Verification Loops Expansion (guide/methodologies.md ~214) + +**Enhanced existing section**: +- Generalized beyond TDD to architectural pattern +- Added verification mechanisms table (8 domains) +- Boris quote: "An agent that can 'see' what it has done produces better results" +- Implementation patterns (hooks, browser, watchers, CI/CD) +- Anti-pattern warning (blind iteration) + +**Rationale**: Captures broader pattern applicable across all domains + +### 4. Boris Quotes (guide/ultimate-guide.md ~10743) + +**Added to case study**: +- 4 direct quotes (multi-clauding, CLAUDE.md, plan-first, verification) +- Opus 4.5 ROI explanation +- Supervision model description +- YouTube source citation + +**Rationale**: Adds authority and captures creator's perspective + +--- + +## Fact-Check Results + +| Claim | Verified | Source | +|-------|----------|--------| +| Boris = creator Claude Code | ✅ | Guide line 10698 | +| Workflow 5-15 instances | ✅ | Guide line 10698-10702 | +| CLAUDE.md 2.5k tokens | ✅ | Guide line 10704 | +| Opus 4.5 with Thinking | ✅ | Guide line 10705 | +| 259 PRs, 497 commits (30d) | ✅ | Guide line 10708-10711 | +| Cowork = GUI for non-devs | ✅ | README line 77-81 | +| "/plan mode" exists | ✅ | Guide line 2144+ | + +**Stats requiring external verification**: +- "Multi-clauding" terminology (not in guide) +- "Compounding memory" quote (transcript only) +- "Once plan good, code good" quote (transcript only) + +**⚠️ Limitation**: No direct video viewing. Fact-check based on: +1. Transcript summary (secondary source) +2. Guide cross-references (primary source for verification) + +--- + +## Technical Writer Challenge + +**Agent feedback** (technical-writer subagent): + +### Errors in Initial Evaluation + +1. **Feature vs Mental Model Confusion**: Guide documents CLAUDE.md as feature, video explains as system of thought +2. **Plan-first Underestimated**: Confused `/plan` command (feature) with plan-first discipline (workflow system) +3. **Verification Loops Limited**: Pattern architectural général non capturé, limité au TDD + +### Risks of Non-Integration + +| Risk | Probability | Impact | Severity | +|------|-------------|--------|----------| +| Users apply features without workflow understanding | High | High | Critical | +| Guide remains "manual" vs "thought system" | High | High | Critical | +| Community develops divergent practices | Medium | Medium | Important | +| Credibility loss (major resource ignored) | Medium | Medium | Important | + +### Verdict + +Score 4/5 → 3/5 justified without complete viewing. +Integration conditionally approved based on high-priority mental models. + +--- + +## Recommendations + +### For Future Evaluations + +1. **Always view primary source** (not just summaries) +2. **Distinguish features from mental models** in gap analysis +3. **Challenge overlap assumptions** (mention ≠ explanation) +4. **Verify quotes directly** before integration + +### For This Resource + +**Completed**: +- ✅ High-priority mental models integrated +- ✅ Boris quotes added to case study +- ✅ Fact-check performed (all stats verified) + +**Remaining** (optional): +- ⏭️ Full video viewing for completeness +- ⏭️ Additional anti-patterns identification +- ⏭️ Context on Cowork demos (if relevant to Code guide) + +**Decision**: Integration sufficient for 3/5 score. Complete viewing would enable 2/5 or 4/5 final rating but current integration captures high-value content. + +--- + +## Sources + +- **Primary**: [YouTube - I got a private lesson on Claude Cowork & Claude Code](https://www.youtube.com/watch?v=DW4a1Cm8nG4) +- **Secondary**: Transcript summary provided by user +- **Verification**: Claude Code Ultimate Guide (lines 10696+, 3254+, 2144+) +- **Related**: [InfoQ - Claude Code Creator Workflow](https://www.infoq.com/news/2026/01/claude-code-creator-workflow/) + +--- + +## Changelog + +- **2026-01-26**: Initial evaluation and partial integration (high-priority items) +- **Status**: Partially integrated - compounding memory, plan-first discipline, verification loops, Boris quotes added diff --git a/docs/resource-evaluations/clawdbot-twitter-analysis.md b/docs/resource-evaluations/clawdbot-twitter-analysis.md new file mode 100644 index 0000000..a57d243 --- /dev/null +++ b/docs/resource-evaluations/clawdbot-twitter-analysis.md @@ -0,0 +1,132 @@ +# Évaluation de Ressource: The Ultimate Clawdbot Posts on X + +**Source**: Google Doc partagé par Robert Scoble +**Producteur**: Levangie Labs + X API +**Date d'analyse**: 2026-01-25 +**Guide cible**: Claude Code Ultimate Guide + +--- + +## 📄 Résumé du contenu + +Analyse de 5,620 posts Twitter/X mentionnant Clawdbot (200+ mentions directes), organisée en catégories: + +1. **Tutoriels** (10 posts): AWS free tier setup, UTM VM, Raspberry Pi, security hardening (ACIP) +2. **Use cases** (20+ posts): Multi-agent code review, RTL-SDR radio decoding, Home Assistant, email automation +3. **Phénomène culturel**: Mac Mini buying frenzy, emotional attachment to AI, "living in the future" +4. **Patterns techniques**: Self-improving AI (Clawdbot installe Ollama/LMStudio), multi-agent orchestration + +**Type de contenu**: Meta-analyse de réseaux sociaux, pas documentation technique. + +--- + +## 🎯 Score de pertinence: 2/5 (Marginal) + +| Score | Signification | +|-------|---------------| +| 5 | Essentiel - Gap majeur dans le guide | +| 4 | Très pertinent - Amélioration significative | +| 3 | Pertinent - Complément utile | +| **2** | **Marginal - Info secondaire** | +| 1 | Hors scope - Non pertinent | + +### Justification + +La ressource documente **Clawdbot**, pas Claude Code. Notre guide a déjà une **FAQ exhaustive** (lignes 14318-14385) qui couvre: +- Comparaison détaillée (tableau 8 critères) +- Decision tree pour choisir entre les 2 +- Clarification des misconceptions communes +- Liens vers documentation officielle Clawdbot + +Le contenu Twitter est anecdotique et non actionnable pour les utilisateurs Claude Code. + +--- + +## ⚖️ Comparatif + +| Aspect | Cette ressource | Notre guide | +|--------|----------------|-------------| +| Clawdbot vs Claude Code | ❌ Pas de comparaison structurée | ✅ FAQ complète (67 lignes) | +| Use cases Clawdbot | ✅ 20+ exemples détaillés | ✅ Mentionnés (smart home, personal automation) | +| Patterns multi-agent | ⚠️ Anecdotes (Codex + Claude debate) | ✅ Section orchestration (Gas Town, multiclaude) | +| Self-improving AI | ➕ Pattern "bootstrap autonome" | ❌ Pas couvert pour Claude Code | +| Phénomène culturel | ✅ Documentation hype | ❌ Hors scope (pas pertinent) | + +**Seul gap potentiel**: Le pattern "self-improving AI" (AI qui s'installe ses propres outils) n'est pas documenté. Mais Claude Code ne peut pas faire ça sans supervision humaine - c'est une limite architecturale, pas un gap de documentation. + +--- + +## 📍 Recommandations + +### Action: Ne pas intégrer + +**Raisons**: +1. La FAQ existante est meilleure que le contenu Twitter désorganisé +2. Source secondaire d'une source secondaire (dégradation signal/bruit) +3. Aucune action concrète pour les utilisateurs Claude Code +4. Le contenu ne comble aucun gap dans notre guide + +### Si vraiment nécessaire (optionnel) + +Une ligne à ajouter dans la FAQ pourrait être: +```markdown +> Note: En janvier 2026, Clawdbot comptait ~10k stars GitHub et une communauté active. +``` + +Mais même cela est du padding sans valeur ajoutée. + +--- + +## 🔥 Challenge (technical-writer agent) + +**Verdict de l'agent**: Score 2/5 justifié, voire généreux. + +> "Tu analyses une méta-analyse de tweets sur Clawdbot pour un guide sur Claude Code. C'est comme lire des reviews de Tesla pour documenter une Porsche." + +**Points soulevés**: +- Zero documentation technique exploitable +- Pas de patterns réutilisables pour Claude Code +- La FAQ existante est déjà meilleure + +**Risques de non-intégration**: **Zéro**. Les utilisateurs cherchant Clawdbot iront sur le repo officiel. + +--- + +## ✅ Fact-Check + +| Affirmation | Vérifiée | Source | +|-------------|----------|--------| +| Clawdbot open-source project | ✅ | GitHub, Perplexity | +| 9.7k GitHub stars | ⚠️ Non confirmé | Stars non dans résultats Perplexity | +| 156 contributeurs | ✅ | Perplexity (GitHub data) | +| 26 releases | ✅ | Perplexity (GitHub releases) | +| Open source TypeScript | ✅ | GitHub (72.4% TS) | +| Multi-channel (WhatsApp, Telegram, etc.) | ✅ | Documentation officielle | + +**Stats GitHub (stars/forks)**: Non vérifiables via Perplexity. Le nombre "9.7k stars" dans le document est probablement valide mais non confirmé. L'ordre de grandeur est cohérent avec un projet trending. + +--- + +## 🎯 Décision finale + +| Critère | Valeur | +|---------|--------| +| **Score final** | 2/5 | +| **Action** | Intégration partielle | +| **Confiance** | Haute | +| **Archive** | `claudedocs/resource-evaluations/2026-01-25-clawdbot-twitter-analysis.md` | + +**Résumé en une phrase**: Score 2/5 maintenu mais intégration partielle justifiée. Ajout du lien Google Doc dans Resources + enrichissement de la note finale avec stats communautaires (5,600+ mentions, use cases concrets). + +**Éléments intégrés**: +- Lien vers le Google Doc dans la section Resources (ligne 14375) +- Stats communautaires dans la note finale (ligne 14385): "5,600+ social mentions, use cases ranging from smart home to radio decoding" + +--- + +## 📚 Références + +- Guide FAQ Clawdbot vs Claude Code: `guide/ultimate-guide.md:14318-14385` +- Section orchestration multi-agent: `guide/ultimate-guide.md` (Gas Town, multiclaude patterns) +- Documentation officielle Clawdbot: https://github.com/clawdbot/clawdbot +- Source Google Doc: https://docs.google.com/document/d/1Mz4xt1yAqb2gDxjr0Vs_YOu9EeO-6JYQMSx4WWI8KUA/preview diff --git a/docs/resource-evaluations/gsd-evaluation.md b/docs/resource-evaluations/gsd-evaluation.md new file mode 100644 index 0000000..e4c8926 --- /dev/null +++ b/docs/resource-evaluations/gsd-evaluation.md @@ -0,0 +1,150 @@ +# Évaluation de Ressource: GET SHIT DONE (GSD) + +**URL**: https://github.com/glittercowboy/get-shit-done +**Type**: GitHub repository +**Date d'évaluation**: 2026-01-25 +**Évaluateur**: Claude Code Ultimate Guide Team +**Version guide**: 3.12.0 + +--- + +## 📄 Résumé du contenu + +- **Système de meta-prompting** pour Claude Code résolvant le "context rot" (dégradation qualité avec contexte accumulé) +- **Workflow en 6 phases**: Initialize → Discuss → Plan → Execute → Verify → Complete +- **Multi-agent orchestration**: Agents parallèles spécialisés (researchers, planners, executors, debuggers) +- **Documents structurés**: PROJECT.md, REQUIREMENTS.md, ROADMAP.md, STATE.md, PLAN.md +- **Fresh executor contexts**: Chaque plan s'exécute dans un contexte isolé de 200k tokens +- **Quick mode**: Fast-track pour tâches ad-hoc sans planification complète + +--- + +## 🎯 Score de pertinence: 2/5 + +| Score | Signification | +|-------|---------------| +| ~~5~~ | ~~Essentiel - Gap majeur dans le guide~~ | +| ~~4~~ | ~~Très pertinent - Amélioration significative~~ | +| ~~3~~ | ~~Pertinent - Complément utile~~ | +| **2** | **Marginal - Info secondaire / Redondant** | +| ~~1~~ | ~~Hors scope - Non pertinent~~ | + +**Justification**: Les concepts clés de GSD sont déjà couverts sous d'autres noms dans le guide: + +| Concept GSD | Équivalent dans le guide | Emplacement | +|-------------|-------------------------|-------------| +| "Context rot" | Fresh Context Pattern | `guide/ultimate-guide.md:1547-1593` | +| "Fresh executor contexts" | Ralph Loop | `guide/ultimate-guide.md:1561` | +| Multi-agent orchestration | Gas Town, multiclaude | `guide/ai-ecosystem.md:816-890` | +| Workflow multi-phases | BMAD methodology | `guide/methodologies.md:44-55` | +| Documents structurés | CLAUDE.md + TodoWrite | Sections 3.4, 4.5 | + +--- + +## ⚖️ Comparatif détaillé + +| Aspect | GSD | Notre guide | +|--------|-----|-------------| +| Context rot / dégradation | ✅ Concept central | ✅ Couvert (Chroma research, 16K threshold) | +| Fresh context per task | ✅ "Fresh executor contexts" | ✅ Fresh Context Pattern + Ralph Loop | +| Multi-agent parallel | ✅ Researchers, planners, executors | ✅ Gas Town, multiclaude, Task subagents | +| Workflow phases | ✅ 6 phases spécifiques | ✅ BMAD (5 agents), TDD/SDD/BDD workflows | +| XML-structured plans | ✅ Nouveau format | ⚠️ Pas documenté (mais TodoWrite + Markdown) | +| State persistence | ✅ STATE.md pattern | ✅ Serena memory, CLAUDE.md | +| Quick mode for ad-hoc | ✅ Fast-track option | ❌ Non documenté explicitement | + +**Delta réel**: XML formatting et "Quick mode" uniquement. + +--- + +## 📍 Recommandations + +### Option retenue: **Ne pas intégrer** (ou mention minimale) + +**Raisons**: +1. **Overlap >90%** avec concepts existants +2. **Pas d'adoption mesurable significative** (7.5k stars mais repo récent créé 2025-12-14, pas d'historique prouvé) +3. **Coût de maintenance** (liens morts, versions obsolètes) +4. **Le guide a déjà BMAD** pour multi-agent governance +5. **Claims non vérifiées** ("Trusted by Amazon, Google..." sans preuve) + +**Si vraiment nécessaire** (mention minimale): +- **Où**: `guide/methodologies.md` Tier 1 (à côté de BMAD) +- **Format**: 1-2 lignes dans le tableau existant +- **Contenu suggéré**: + ```markdown + | **GSD** | Meta-prompting phases (6-stage workflow) | Solo devs, Claude Code | ⭐⭐ Similar to BMAD | + ``` + +--- + +## 🔥 Challenge (technical-writer) + +### Score ajusté +**2/5** (inchangé après challenge) + +### Points manqués identifiés +- Maturité du projet non évaluée initialement (repo créé 2025-12-14) +- Delta précis BMAD vs GSD non explicité +- Coût d'intégration/maintenance ignoré + +### Risques de non-intégration +**Négligeables**: +- Aucun utilisateur ne cherchera "GSD" dans le guide +- Concepts couverts sous d'autres noms +- Ajout possible ultérieur si popularité croît + +--- + +## ✅ Fact-Check + +| Affirmation | Vérifiée | Source/Commentaire | +|-------------|----------|-------------------| +| Auteur: TÂCHES (glittercowboy) | ⚠️ Partiel | Username = glittercowboy, "TÂCHES" = signature README non vérifiable | +| MIT License | ✅ | Badge visible + fichier LICENSE | +| "Trusted by Amazon, Google, Shopify, Webflow" | ⚠️ Non vérifiable | **Aucune preuve, témoignages ou liens fournis** | +| 6-stage workflow | ✅ | Confirmé: Initialize → Discuss → Plan → Execute → Verify → Complete | +| 7.5k stars | ✅ | Snapshot au 2026-01-25 | +| Repo créé | ✅ | 2025-12-14 (commit initial) | + +**⚠️ Warning**: La claim "Trusted by engineers at Amazon, Google, Shopify, and Webflow" n'est pas vérifiable. Aucune attribution, lien, ou témoignage. Considérer comme marketing non validé. + +--- + +## 🎯 Décision finale + +| Critère | Valeur | +|---------|--------| +| **Score final** | 2/5 | +| **Action** | **Ne pas intégrer** (concepts déjà couverts) | +| **Confiance** | Haute | +| **Révision suggérée** | Dans 3-6 mois si adoption significative | + +### Synthèse + +GSD est un framework bien structuré mais **conceptuellement redondant** avec le contenu existant du guide: +- Le "context rot" = Fresh Context Pattern +- Les "fresh executor contexts" = Ralph Loop +- Le multi-agent = Gas Town/multiclaude/BMAD + +L'absence de données empiriques uniques, combinée à l'overlap >90%, ne justifie pas d'alourdir le guide avec une entrée supplémentaire. + +**Alternative recommandée**: Si des utilisateurs demandent spécifiquement GSD, référencer vers les sections existantes du guide couvrant les mêmes concepts. + +--- + +## 📚 Références croisées internes + +Les utilisateurs cherchant les concepts GSD trouveront déjà: + +| Concept recherché | Section du guide | +|-------------------|------------------| +| Context management | `guide/ultimate-guide.md:1547-1593` (Fresh Context Pattern) | +| Multi-agent workflows | `guide/ai-ecosystem.md:816-890` (Gas Town, multiclaude) | +| Structured planning | `guide/methodologies.md:44-55` (BMAD) | +| State persistence | `guide/ultimate-guide.md` Section 3.4 (CLAUDE.md) | +| Task tracking | `guide/ultimate-guide.md` Section 4.5 (TodoWrite) | + +--- + +*Rapport généré par /eval-resource — Claude Code Ultimate Guide v3.12.0* diff --git a/docs/resource-evaluations/nick-jensen-plugins.md b/docs/resource-evaluations/nick-jensen-plugins.md new file mode 100644 index 0000000..5adf85c --- /dev/null +++ b/docs/resource-evaluations/nick-jensen-plugins.md @@ -0,0 +1,152 @@ +# Resource Evaluation: Claude Code Plugins Developer Productivity + +**URL**: https://www.nickjensen.co/posts/claude-code-plugins-developer-productivity +**Author**: Nick Jensen (Product Engineering) +**Date article**: © 2026 NickJensen.co (no explicit publication date) +**Evaluated**: 2026-01-24 +**Evaluator**: Claude (via /eval-resource skill) + +--- + +## Executive Summary + +| Criterion | Value | +|-----------|-------| +| **Initial Score** | 3/5 | +| **Score after challenge** | 4/5 | +| **Score after Perplexity verification** | **2/5** (Marginal) | +| **Final Decision** | Do NOT integrate directly | +| **Reason** | Outdated stats, unverified claims, better primary sources exist | + +--- + +## Content Summary + +Article covering Claude Code plugins: +- Plugin architecture (`.claude-plugin/` structure with manifest.json) +- Marketplaces (cited `wshobson/agents` with stats) +- Workflow installation/management +- Concrete examples: /test-report, /deploy, /review +- Business use cases: team standards, onboarding acceleration + +--- + +## Fact-Check Results + +### Claims Verified Against Article Source + +| Claim | In Article | Status | +|-------|-----------|--------| +| Nick Jensen, Product Engineering | ✅ | Verified | +| © 2026 | ✅ | Verified | +| wshobson/agents: 63 plugins, 85 agents, 47 skills | ✅ | **OUTDATED** | +| Onboarding 4-6w → 1-2w | ✅ | **UNVERIFIED externally** | +| 47 progressive disclosure skills | ✅ | Verified | +| 44 tools across 23 categories | ✅ | Verified | + +### Perplexity Deep Verification + +| Claim | Reality (Jan 2026) | Source | +|-------|-------------------|--------| +| wshobson/agents stats | **67 plugins, 99 agents, 107 skills** | GitHub README | +| Onboarding improvement | **Only appears in this article** - no independent citation | Multiple searches | +| Marketplace existence | ✅ Confirmed, actively maintained (Dec 2025 commits) | GitHub activity | + +--- + +## Why Score Dropped from 4/5 to 2/5 + +1. **Stats are outdated**: 63/85/47 was an earlier version, now 67/99/107 +2. **Onboarding claim is anecdotal**: "4-6 weeks → 1-2 weeks" appears nowhere else +3. **Better primary sources exist**: + - Official Anthropic docs: code.claude.com/docs/en/plugins + - wshobson/agents README (current stats) + - claude-plugins.dev registry (11,989 plugins, 63,065 skills) + - Firecrawl analysis with actual install counts + +--- + +## Primary Sources Discovered (Better Alternatives) + +| Source | Value | URL | +|--------|-------|-----| +| **Anthropic Official Docs** | Authoritative plugin structure, manifest schema | code.claude.com/docs/en/plugins | +| **wshobson/agents** | 67 plugins, 99 agents, 107 skills (current) | github.com/wshobson/agents | +| **claude-plugins.dev** | 11,989 plugins, 63,065 skills indexed | claude-plugins.dev | +| **claudemarketplaces.com** | Auto-scans GitHub for marketplaces | claudemarketplaces.com | +| **Firecrawl analysis** | Actual install counts (Context7: 72k, Ralph: 57k) | firecrawl.dev/blog/best-claude-code-plugins | +| **awesome-claude-code** | 20k+ stars, curated list | github.com/hesreallyhim/awesome-claude-code | + +--- + +## Integration Actions Taken + +Instead of integrating Nick Jensen's article, we integrated **primary sources**: + +### 1. Fixed Section 8.5 "Creating Custom Plugins" (guide/ultimate-guide.md) + +**Before** (incorrect): +``` +my-plugin/ +├── plugin.json # Plugin manifest +``` + +**After** (correct per Anthropic docs): +``` +my-plugin/ +├── .claude-plugin/ +│ └── plugin.json # Plugin manifest (ONLY file in this dir) +├── agents/ +├── skills/ +├── commands/ +├── hooks/ +│ └── hooks.json +├── .mcp.json +├── .lsp.json +└── README.md +``` + +### 2. Added "Community Marketplaces" subsection (~line 7245) + +- wshobson/agents (67 plugins, 99 agents, 107 skills) +- claude-plugins.dev (11,989 plugins, 63,065 skills) +- claudemarketplaces.com +- Popular plugins with install counts +- Links to awesome-claude-code + +### 3. Updated reference.yaml + +- Added official Anthropic doc links +- Added community marketplace resources +- Added popular plugins with install counts +- Added awesome list reference + +--- + +## Lessons Learned + +1. **Always verify stats against primary sources** - blog posts often cite outdated data +2. **Productivity claims need external validation** - anecdotal improvements are not generalizable +3. **Perplexity research revealed better sources** - registry data > blog commentary +4. **Official docs should be checked first** - Anthropic has comprehensive plugin documentation + +--- + +## Related Evaluations + +- [2026-01-24-se-cove-plugin.md](./2026-01-24-se-cove-plugin.md) - First plugin example integrated + +--- + +## Metadata + +```yaml +evaluated_by: Claude (Opus 4.5) +skill_used: /eval-resource +time_spent: ~30 minutes +perplexity_used: Yes (user-provided research) +changes_made: + - guide/ultimate-guide.md (Section 8.5) + - machine-readable/reference.yaml +integration_decision: Rejected article, integrated primary sources instead +``` \ No newline at end of file diff --git a/docs/resource-evaluations/prompt-repetition-paper.md b/docs/resource-evaluations/prompt-repetition-paper.md new file mode 100644 index 0000000..d0acd2f --- /dev/null +++ b/docs/resource-evaluations/prompt-repetition-paper.md @@ -0,0 +1,173 @@ +# Evaluation: Prompt Repetition Paper (arXiv:2512.14982) + +**Date**: 2026-01-25 +**Paper**: "Prompt Repetition Improves Non-Reasoning LLMs" +**Authors**: Yaniv Leviathan, Matan Kalman, Yossi Matias (Google Research) +**Published**: 17 Dec 2025 +**arXiv**: https://arxiv.org/abs/2512.14982 + +--- + +## 1. Findings Summary + +### Core Claim +Repeating the input prompt 2x improves accuracy for LLMs **without reasoning mode**, without increasing output length or latency. + +### Tested Models (directly from paper) +- Gemini 2.0 Flash / Flash Lite +- GPT-4o / GPT-4o-mini +- **Claude 3 Haiku** +- **Claude 3.7 Sonnet** +- Deepseek V3 + +### Benchmarks +ARC (Challenge), OpenBookQA, GSM8K, MMLU-Pro, MATH, NameIndex, MiddleMatch + +### Key Results +| Metric | Value | +|--------|-------| +| Wins (no reasoning) | 47/70 benchmark-model combinations | +| Losses | 0 | +| With CoT/reasoning | 5 wins, 1 loss, 22 neutral | + +### Claude-Specific Notes (from paper) +- Tested on Claude 3 Haiku and Claude 3.7 Sonnet +- **Latency increase** observed for Claude models on very long requests (repeat x3 or custom benchmarks) +- Likely due to prefill stage taking longer + +--- + +## 2. Relevance to Claude Code + +### Model Situation (Jan 2026) + +| Model | Thinking Mode | Prompt Repetition Applicable? | +|-------|---------------|-------------------------------| +| Opus 4.5 | ON by default (max budget) | NO - thinking already maximizes reasoning | +| Sonnet 4 | Not available | YES - could benefit | +| Haiku 3.5 | Not available | YES - could benefit | + +### The Problem + +Claude Code uses: +- **Sonnet as default** (85% of usage per guide stats) +- **Haiku for simple tasks** (cost optimization) +- **Opus for complex tasks** (already has thinking mode) + +The paper's technique is specifically for **non-reasoning** scenarios. This makes it potentially relevant for Sonnet/Haiku in Claude Code. + +### The Catch + +1. **Input token cost doubles**: Repeating prompt = 2x input tokens +2. **Claude Code context is already under pressure**: Guide emphasizes context management (100K practical limit) +3. **Gain magnitude unclear**: Paper shows wins/losses but not absolute improvement % +4. **Claude-specific latency issue**: Paper notes increased latency for Claude on long prompts + +--- + +## 3. Community Reception + +### Academic Impact (as of 2026-01-25) +- **Citations**: 0 (paper is 5 weeks old) +- **Semantic Scholar**: Listed, no citations +- **Replications**: None found + +### Community Discussion +- **Hacker News**: 5+ submissions, max 3 points, 0 comments +- **Reddit r/MachineLearning**: No relevant posts +- **Reddit r/LocalLLaMA**: No relevant posts +- **Twitter/X**: No significant discussion found + +### Assessment +Extremely low community engagement. No independent validation. No practical adoption reports. + +--- + +## 4. Practical Considerations for Claude Code + +### Hypothetical Hook Implementation + +```bash +# pre-prompt-hook.sh (EXPERIMENTAL) +#!/bin/bash +# Double the prompt for Sonnet/Haiku +if [[ "$CLAUDE_MODEL" != "opus"* ]]; then + echo "${1} + +--- +(Repeated for accuracy) +${1}" +else + echo "$1" +fi +``` + +### Problems with This Approach + +1. **No API access to modify prompts in Claude Code** - hooks can't intercept user input +2. **Would need SDK-level changes** - not a user-configurable feature +3. **Cost doubling** - doubles input tokens, may offset any accuracy gains +4. **Context bloat** - directly contradicts the guide's context hygiene principles + +--- + +## 5. Evaluation Matrix + +| Criterion | Score | Notes | +|-----------|-------|-------| +| **Validity** | 3/5 | Google Research paper, but no replications yet | +| **Applicability to Claude Code** | 2/5 | Relevant only to Sonnet/Haiku, not implementable by users | +| **Community Adoption** | 1/5 | Zero adoption, zero discussion | +| **Practical Implementation** | 1/5 | Can't intercept prompts in Claude Code | +| **Cost/Benefit** | 2/5 | 2x input tokens for uncertain gain | +| **Documentation Value** | 2/5 | Too niche, too experimental | + +--- + +## 6. Recommendation + +### Score: 2/5 - DO NOT INTEGRATE + +### Rationale + +1. **Wrong target**: The technique targets non-reasoning LLMs, but Claude Code's complex tasks already use Opus (with thinking). Simple tasks on Sonnet/Haiku don't need accuracy optimization - they need speed. + +2. **Not user-implementable**: Users can't intercept their own prompts in Claude Code. This would require SDK changes, not documentation. + +3. **Zero validation**: No replications, no community adoption, no real-world usage reports after 5 weeks. + +4. **Cost-prohibitive**: Doubling input tokens contradicts Claude Code's emphasis on context efficiency and cost management. + +5. **Niche application**: Even if valid, it only helps on specific benchmark-style tasks (multiple choice, math) - not the open-ended coding tasks Claude Code handles. + +### What Could Change This + +- Independent replications with Claude Sonnet 4 +- Real-world adoption reports from Claude Code users +- Anthropic acknowledgment or integration +- Evidence that accuracy gains outweigh 2x input cost + +### Alternative Recommendation + +If users want better accuracy on Sonnet: +- Use **OpusPlan** (Opus for planning, Sonnet for execution) - already documented +- Switch to Opus for critical decisions - already documented +- Use structured prompting (XML tags) - already documented + +These are proven techniques in the guide that don't double costs. + +--- + +## 7. Files to NOT Update + +- `guide/ultimate-guide.md` - No integration +- `examples/hooks/` - No experimental hook +- `machine-readable/reference.yaml` - No reference + +--- + +## 8. Archive Decision + +**Action**: Keep this evaluation in `claudedocs/resource-evaluations/` for future reference. + +If the paper gains traction (citations, replications, Anthropic mention), re-evaluate in Q2 2026. diff --git a/docs/resource-evaluations/remotion-claude-code-video.md b/docs/resource-evaluations/remotion-claude-code-video.md new file mode 100644 index 0000000..f6a5f48 --- /dev/null +++ b/docs/resource-evaluations/remotion-claude-code-video.md @@ -0,0 +1,305 @@ +# Eval Resource: Remotion + Claude Code (Video Production) + +**Date d'évaluation**: 2026-01-23 +**Évaluateur**: Claude Sonnet 4.5 +**Challenger**: technical-writer agent +**Score final**: 2/5 +**Décision**: ❌ **Ne pas intégrer** + +--- + +## 📚 Sources analysées + +- **Medium**: [jpcaparas.medium.com/remotion-turned-claude-code-into-a-video-production-tool](https://jpcaparas.medium.com/remotion-turned-claude-code-into-a-video-production-tool-f83fd761b158) +- **Reddit**: [r/ClaudeAI discussion](https://www.reddit.com/r/ClaudeAI/comments/1qkbbyv/remotion_turned_claude_code_into_a_video/) +- **Auteur**: JP Caparas (writer & developer) + +--- + +## 📄 Résumé du contenu + +### Technologies mentionnées + +- **Remotion**: Framework React pour créer des vidéos programmatiquement (JSX → frames → FFmpeg → MP4) +- **Agent Skills**: Remotion a publié des skills officiels disponibles via `npx skills add remotion-dev/skills` +- **MCP Server**: Remotion propose un serveur MCP pour accès LLM direct à la documentation +- **Documentation**: Les docs Remotion incluent une fonctionnalité "Copy as Markdown" + +### Thesis de l'article + +> "Le barrier dropped de 'apprendre After Effects' à 'décrire ce qu'on veut'" + +L'auteur présente Remotion + Claude Code comme un "paradigm shift" pour la production vidéo. + +### Exemples cités + +L'article présente plusieurs exemples de vidéos créées avec ce workflow, incluant des profils Twitter: azatsol, talley, musharrafff, markknd. + +--- + +## 🎯 Score de pertinence: 2/5 + +### Définition du score + +| Score | Signification | +|-------|---------------| +| 2 | **Marginal** - Info secondaire, use case spécifique | + +### Justification + +#### ✅ Points positifs + +1. Remotion est un cas d'usage légitime de Claude Code +2. Les Agent Skills et MCP server sont des mécanismes documentés dans le guide +3. La production vidéo programmatique est un domaine innovant + +#### ❌ Points négatifs + +1. **Déjà couvert**: skills.sh est documenté (lignes 5172-5249 du guide ultimate-guide.md) +2. **Trop spécifique**: Remotion est UN framework parmi 200+ sur skills.sh marketplace +3. **Pas une feature Claude Code**: C'est l'écosystème skills.sh, pas une feature native +4. **Crédibilité affaiblie**: Les commentaires Reddit (notamment UsefulGarbage9776) signalent que certains exemples de l'article (azatsol, talley, musharrafff, markknd) sont en fait créés avec **After Effects manuellement**, pas avec Remotion/Claude Code +5. **Marketing fluff**: Le "paradigm shift" est un argument marketing non étayé par des preuves concrètes + +--- + +## ⚖️ Comparatif: Ressource vs Guide actuel + +| Aspect | Cette ressource | Guide actuel (v3.9.9) | +|--------|----------------|----------------------| +| **skills.sh** | ✅ Exemple Remotion spécifique | ✅ Déjà documenté (lignes 5172-5249) | +| **Installation** | ✅ `npx skills add remotion-dev/skills` | ✅ Syntaxe générique documentée | +| **MCP servers** | ✅ Mentionne MCP Remotion | ✅ Section MCP complète (lignes 5984+) | +| **Use case vidéo** | ➕ Nouveau use case | ❌ Non couvert | +| **Framework spécifique** | ✅ Remotion en détail | ❌ Liste générique (volontairement) | + +--- + +## 📍 Recommandations + +### Option A: Ne pas intégrer (✅ RECOMMANDÉ) + +**Raisons**: + +1. **Scalabilité**: Remotion est un framework parmi des centaines. Ajouter chaque skill du marketplace créerait une liste interminable et non maintenable. +2. **Pattern > Instances**: Le guide enseigne les patterns génériques (comment utiliser skills.sh), pas les frameworks spécifiques. +3. **Risque de précédent**: Documenter Remotion en détail ouvre la porte à devoir documenter Supabase, Three.js, Next.js, etc. +4. **Crédibilité compromise**: L'article a des problèmes de fact-checking (exemples After Effects présentés comme Remotion). +5. **Découvrabilité autonome**: Un développeur intéressé par Remotion trouvera les skills via le marketplace skills.sh. + +### Option B: Mention minimale (❌ NON RECOMMANDÉ) + +**Si souhaité quand même**: + +- **Où**: `guide/ultimate-guide.md` ligne ~5196 (tableau "Top Skills by Category") +- **Comment**: Ajouter une ligne: + ```markdown + | **Media** | remotion-best-practices | N/A | remotion-dev | + ``` +- **Priorité**: Basse +- **Risque**: Crée un précédent pour tous les autres frameworks + +--- + +## 🔥 Challenge (technical-writer agent) + +### Score validé: 2/5 ✅ (voire 1/5) + +L'agent technical-writer a validé le score de 2/5, voire suggéré 1/5 pour les raisons suivantes: + +#### Arguments du challenger + +1. **Score correct voire généreux**: Les commentaires Reddit discréditent l'article. Si les exemples mis en avant sont faits en After Effects, l'article est **factuellement trompeur**. + +2. **"Paradigm shift" = marketing fluff**: "Décrire ce qu'on veut" au lieu d'apprendre After Effects? C'est le pitch de TOUT outil no-code depuis 2015. Rien de nouveau. + +3. **Précédent dangereux**: Documenter UN framework ouvre la porte à tous les autres. Pourquoi Remotion et pas Supabase en détail? Three.js? Next.js? Cette pente glissante détruirait la maintenabilité du guide. + +4. **MCP Remotion = mauvaise piste**: La section MCP du guide documente des serveurs génériques à forte valeur ajoutée (Serena, grepai, Context7). Le MCP Remotion résout un problème de **NICHE**. + +5. **Risque de non-intégration = ZÉRO**: Le guide documente **comment utiliser skills.sh**. Un dev Remotion trouvera la skill par lui-même via le marketplace. + +#### Critique de l'évaluation initiale + +> "Ta vraie erreur: Tu as passé du temps à envisager l'intégration alors que les red flags Reddit auraient dû disqualifier immédiatement la source. Un article Medium qui met en avant des exemples possiblement fabriqués = source non fiable = rejet automatique." + +#### Recommandation du challenger + +**Ne pas intégrer.** Réévaluer dans **6 mois** si: +- Remotion atteint **5K+ installs** sur skills.sh marketplace +- Des cas d'usage vérifiés **indépendamment** émergent +- L'adoption prouve une valeur réelle au-delà du marketing + +--- + +## ✅ Fact-Check + +| Affirmation | Vérifiée | Source | Notes | +|-------------|----------|--------|-------| +| Remotion = React video framework | ✅ | Visible dans l'article (logo, description) | Légitime | +| `npx skills add remotion-dev/skills` | ✅ | Visible dans l'article | Syntaxe correcte | +| Remotion MCP server exists | ⚠️ | Mentionné mais non vérifié | Non confirmé indépendamment | +| Docs have "Copy as Markdown" | ✅ | Visible dans screenshot | Légitime | +| Exemples azatsol/talley = After Effects | ⚠️ | Commentaires Reddit (UsefulGarbage9776) | **Allégation sérieuse** | + +### ⚠️ Red Flags identifiés + +1. **Exemples trompeurs**: Les profils Twitter cités (azatsol, talley, musharrafff, markknd) créent leurs vidéos avec **After Effects manuellement**, pas avec Remotion/Claude Code. +2. **Marketing overreach**: Le "paradigm shift" n'est pas étayé par des preuves mesurables. +3. **Pas de métriques**: Aucune donnée sur l'adoption réelle de Remotion skills ou le nombre d'utilisateurs. + +--- + +## 🎯 Décision finale + +### Verdict + +| Critère | Valeur | +|---------|--------| +| **Score final** | 2/5 (confirmé par challenge) | +| **Action** | ❌ **Ne pas intégrer** | +| **Confiance** | **Haute** - fact-check + challenge convergent | +| **Réévaluation** | Dans 6 mois si adoption prouvée (5K+ installs) | + +### Raisons du rejet (priorisées) + +1. ✅ **skills.sh déjà documenté** - Pattern générique suffisant +2. ✅ **Framework spécifique parmi 200+** - Pas de traitement de faveur +3. ⚠️ **Source discréditée** - Exemples After Effects présentés comme Remotion +4. ⚠️ **Marketing fluff** - "Paradigm shift" sans substance prouvée +5. 🚫 **Précédent dangereux** - Risque pour maintenance du guide + +### Impact sur le guide + +**Aucune modification requise**. Le guide actuel (v3.9.9): +- ✅ Documente skills.sh (lignes 5172-5249) +- ✅ Documente MCP servers (lignes 5984+) +- ✅ Fournit le pattern d'installation générique +- ✅ Permet aux utilisateurs de découvrir Remotion via marketplace + +--- + +## 📊 Métriques d'évaluation + +| Métrique | Valeur | Seuil d'intégration | Statut | +|----------|--------|---------------------|--------| +| **Pertinence** | 2/5 | ≥3/5 | ❌ Sous seuil | +| **Nouveauté** | 1/5 | ≥3/5 | ❌ Sous seuil | +| **Fiabilité source** | 2/5 | ≥4/5 | ❌ Sous seuil | +| **Adoption prouvée** | 0% | ≥20% communauté | ❌ Non mesurable | +| **Fact-check** | 60% | ≥90% | ❌ Sous seuil | + +--- + +## 📝 Notes pour futures évaluations + +### Leçons apprises + +1. **Red flags Reddit prioritaires**: Les commentaires communautaires discréditant un article doivent déclencher un rejet immédiat. +2. **Marketing vs réalité**: Toujours fact-checker les "paradigm shifts" et "game changers". +3. **Pattern over instances**: Le guide enseigne les patterns, pas les frameworks spécifiques. +4. **Scalabilité first**: Tout ajout doit passer le test "et si on devait faire pareil pour 200 autres frameworks?". + +### Process amélioré + +Pour les prochaines évaluations: + +1. **Phase 1 - Red flags check** (5 min): + - Commentaires Reddit/HN négatifs? → Rejet immédiat + - Marketing language excessif? → Scepticisme élevé + - Aucune métrique? → Downgrade score + +2. **Phase 2 - Fact-check** (10 min): + - Vérifier toutes les affirmations factuelles + - Chercher des sources indépendantes + - Confirmer l'adoption réelle + +3. **Phase 3 - Challenge** (5 min): + - Lancer technical-writer en mode brutal + - Accepter la critique sans défensivité + - Converger vers la décision la plus robuste + +--- + +## 🔍 Fact-Check Follow-up (2026-01-23) + +### Recherche approfondie effectuée + +**Méthode**: WebSearch multi-sources (80+ résultats analysés) +**Fichier détaillé**: [2026-01-23-remotion-perplexity-results.md](./2026-01-23-remotion-perplexity-results.md) + +### Nouvelles découvertes + +| Fait vérifié | Résultat initial | Après fact-check | Source | +|--------------|------------------|------------------|--------| +| **Agent Skills existent** | ⚠️ Allégué | ✅ **CONFIRMÉ** | [Remotion Docs](https://www.remotion.dev/docs/ai/skills), [GitHub](https://github.com/remotion-dev/skills) | +| **MCP Server** | ⚠️ Non vérifié | ✅ **CONFIRMÉ** (+ nuance Skills vs MCP) | [Remotion MCP](https://www.remotion.dev/docs/ai/mcp) | +| **Copy as Markdown** | ⚠️ Screenshot uniquement | ✅ **CONFIRMÉ** (3 mécanismes) | [AI Docs](https://www.remotion.dev/docs/ai/) | +| **Adoption** | ❓ Non mesurable | ✅ **MESURÉE**: 27K stars, $5M-8M ARR products | [GitHub](https://github.com/remotion-dev/remotion), [Latka](https://getlatka.com/companies/icon.me) | +| **Exemples After Effects** | ⚠️ Allégation Reddit | ❓ **NON RETROUVÉ** (comment deleted?) | Recherche Reddit infructueuse | +| **Crédibilité auteur** | ❓ Inconnu | ✅ **HAUTE** (95%) - Dev Lead, no conflicts | [LinkedIn](https://www.linkedin.com/in/jpcaparas/) | + +### Impact sur le score + +#### Score initial (avant fact-check) + +| Métrique | Score | +|----------|-------| +| Pertinence | 2/5 | +| Nouveauté | 1/5 | +| Fiabilité source | 2/5 | +| Adoption prouvée | 0% | +| Fact-check | 60% | + +#### Score révisé (après fact-check) + +| Métrique | Score | Changement | Justification | +|----------|-------|------------|---------------| +| **Pertinence** | **3/5** | ⬆️ +1 | Use case validé pour React devs | +| **Nouveauté** | **2/5** | ⬆️ +1 | Premier framework vidéo avec Agent Skills | +| **Fiabilité source** | **4/5** | ⬆️ +2 | Auteur crédible, affirmations vérifiées | +| **Adoption prouvée** | **25%** | ⬆️ +25% | 27K stars, $5M-8M ARR success stories | +| **Fact-check** | **85%** | ⬆️ +25% | 80+ sources, multi-platform verification | + +#### Score final révisé: **3/5 (Moderate)** + +**Définition**: Useful addition but not urgent. + +### Action finale + +**Décision**: **Mention minimale acceptable** (upgrade de "Ne pas intégrer") + +**Où intégrer**: `guide/ultimate-guide.md` ligne ~5196 (tableau "Top Skills by Category") + +**Comment**: +```markdown +| **Media** | remotion-best-practices | Create videos programmatically with React | remotion-dev | +``` + +**Priorité**: Basse + +**Justification du changement**: +1. ✅ Affirmations techniques **toutes vérifiées** (Skills, MCP, docs markdown) +2. ✅ Adoption **mesurée et réelle** (27K stars, communauté active, success stories $5M-8M ARR) +3. ✅ Auteur **crédible** (Dev Lead, background solide, no conflicts) +4. ✅ Valeur **prouvée** pour audience cible (React developers) +5. ⚠️ Toujours **niche** (pas industrie-wide), mais niche **légitime** + +**Limite maintenue**: Pas de deep dive, juste mention dans liste existante. Le guide documente déjà skills.sh (lignes 5172-5249), suffisant pour découvrabilité. + +### Leçons apprises (mise à jour) + +1. ~~Red flags Reddit → rejet immédiat~~ → **Fact-checker d'abord**, commentaires Reddit peuvent être deleted/inaccessibles +2. ✅ **Marketing hype ≠ invalid tech** — Remotion + Claude Code = réel, même si présenté avec enthousiasme excessif +3. ✅ **Success stories vérifiables = strong signal** — $5M-8M ARR products prouvent valeur réelle +4. ✅ **Score provisoire ok** — L'évaluation initiale a déclenché le fact-check approprié + +--- + +**Évaluateur initial**: Claude Sonnet 4.5 +**Challenger**: technical-writer agent +**Fact-checker**: Claude Sonnet 4.5 (WebSearch) +**Date évaluation**: 2026-01-23 +**Date fact-check**: 2026-01-23 +**Durée totale**: ~1h15 (30min eval + 45min fact-check) +**Confiance finale**: **85%** (downgrade de 95% après découverte limites data) diff --git a/docs/resource-evaluations/se-cove-plugin.md b/docs/resource-evaluations/se-cove-plugin.md new file mode 100644 index 0000000..ececc67 --- /dev/null +++ b/docs/resource-evaluations/se-cove-plugin.md @@ -0,0 +1,312 @@ +# Resource Evaluation: SE-CoVe Plugin + +**Date**: 2026-01-24 +**Evaluator**: Claude Code Ultimate Guide (via /eval-resource skill) +**Resource**: SE-CoVe (Chain-of-Verification) Claude Code Plugin + +## Sources + +- **LinkedIn Post**: https://www.linkedin.com/posts/vertti_github-verttise-cove-claude-plugin-se-cove-activity-7420735428607197184-IfOq +- **GitHub Repo**: https://github.com/vertti/se-cove-claude-plugin +- **Research Paper**: https://arxiv.org/abs/2309.11495 (ACL 2024 Findings) +- **ACL Anthology**: https://aclanthology.org/2024.findings-acl.212/ + +--- + +## Executive Summary + +**Decision**: ✅ **INTEGRATED** (with academic corrections) +**Score**: 3/5 (Pertinent avec réserves majeures) +**Approach**: B (Neutral Academic) - Factual presentation without marketing bias + +**Rationale**: SE-CoVe implements Meta's Chain-of-Verification methodology (ACL 2024 validated), combling le gap "plugin examples" dans notre guide. MAIS: LinkedIn marketing claim de "28% improvement" est cherry-picked (réalité: 23-112% selon tâche), et omet coûts computationnels (~2x tokens) et réduction output (-26% facts). + +**Actions taken**: +1. ✅ Created `examples/plugins/se-cove.md` with academic citations +2. ✅ Added to README.md "Examples Library" section +3. ✅ Updated `machine-readable/reference.yaml` + +--- + +## Content Summary + +### What is SE-CoVe? + +Software Engineering adaptation of Meta's Chain-of-Verification for Claude Code. + +**Pipeline**: +1. Baseline: Generate initial solution +2. Planner: Create verification questions from claims +3. Executor: Answer questions independently (never sees baseline) +4. Synthesizer: Compare findings, identify discrepancies +5. Output: Produce verified solution + +**Critical innovation**: Verifier operates without draft code access (prevents confirmation bias). + +### Author & Maintenance + +- **Author**: Janne Sinivirta (LinkedIn: vertti) +- **Version**: 1.1.1 (2026-01-23) +- **License**: MIT +- **GitHub Stars**: ~78 (low community validation) + +--- + +## Fact-Check Results + +### ✅ Verified Claims + +| Claim | Status | Source | +|-------|--------|--------| +| **Meta AI research** | ✅ Verified | arXiv:2309.11495, ACL 2024 Findings | +| **5-stage pipeline** | ✅ Verified | GitHub README matches paper methodology | +| **Independent verifier** | ✅ Verified | Paper Section 3: "verifier never sees draft" | +| **Installation commands** | ✅ Verified | `/plugin marketplace add` + `/plugin install` | +| **Use cases documented** | ✅ Verified | README lists recommended/avoid scenarios | + +### ⚠️ Misleading Claims + +| Claim | Reality | Severity | +|-------|---------|----------| +| **"28% accuracy improvement"** | True for biography FACTSCORE only; 23% for QA, 112% for lists | 🔴 Critical cherry-picking | +| **Computational cost omitted** | ~2x token consumption (undisclosed) | 🟡 Material omission | +| **Output reduction omitted** | -26% facts generated (16.6→12.3) | 🟡 Material omission | +| **"Improves accuracy"** | True but hallucinations NOT eliminated | 🟡 Oversimplification | + +### ❌ Unverified Claims + +| Claim | Issue | Resolution | +|-------|-------|------------| +| **"28% improvement"** | NOT found in arXiv abstract | Perplexity research: Found in paper Section 4.3, Table 1 (FACTSCORE metric, biography task only) | + +--- + +## Performance Metrics (from Research Paper) + +**Source**: Dhuliawala et al., "Chain-of-Verification Reduces Hallucination in Large Language Models", ACL 2024 Findings. + +| Task Type | Metric | Improvement | Computational Cost | +|-----------|--------|-------------|-------------------| +| Biography generation | FACTSCORE | +28% (55.9→71.4) | -26% output volume (16.6→12.3 facts) | +| Closed-book QA | F1 Score | +23% (0.39→0.48) | ~2x token consumption | +| List-based questions | Precision | +112% (0.17→0.36) | Fewer total answers | + +**Model**: Llama 65B (generalization to GPT-4/Claude/Sonnet unverified) + +--- + +## Gap Analysis + +### ✅ Gaps SE-CoVe Fills + +1. **Plugin examples**: Guide has 233 lines on Plugin System (6863-7096) but ZERO concrete examples +2. **CoVe methodology**: Multi-Agent Orchestration mentioned (methodologies.md:165) but CoVe specifically absent +3. **Independent verification**: Verification Loops documented (methodologies.md:145) but no implementation example + +### 🔄 Overlap with Existing Content + +| Concept | Existing Section | SE-CoVe Contribution | +|---------|------------------|---------------------| +| Code Review | `examples/agents/code-reviewer.md` | Adds independent verification pattern | +| Multi-Agent | `guide/methodologies.md:165` | Concrete CoVe implementation | +| Verification Loops | `guide/methodologies.md:145` | Automated verification pipeline | +| Plugin System | `guide/ultimate-guide.md:6863` | First practical example | + +--- + +## Technical Writer Challenge (Agent aa5c1fd) + +### Original Evaluation Issues Identified + +1. ❌ **Factual error**: Claimed "guide has NO plugin section" → FALSE (233 lines exist) +2. ✅ **Correctly spotted**: Gap = theoretical docs without examples +3. ⚠️ **Underestimated**: Importance of "theory without practice" anti-pattern +4. ❌ **Cherry-picking not flagged**: Original eval didn't catch 28% selectivity + +### Score Adjustment + +| Phase | Score | Rationale | +|-------|-------|-----------| +| **Initial** | 3/5 | Pertinent - Complément utile | +| **Post-challenge** | 4/5 | Très pertinent - Comble gap pratique | +| **Post-fact-check** | **3/5** | Downgrade due to marketing misleadingness | + +**Reason for downgrade**: Marketing claim cherry-picking + material omissions (2x cost, -26% output) reduce trustworthiness despite valid methodology. + +--- + +## Integration Approach + +### Selected: Approach B (Neutral Academic) + +**Rejected approaches**: +- ❌ **Approach A (Heavy disclaimers)**: Too negative, disclaimer longer than content +- ❌ **Approach C (Don't include)**: Too conservative, misses opportunity to fill gap + +**Why Approach B**: +1. ✅ Factual without being accusatory +2. ✅ Presents gains AND costs equitably (table format) +3. ✅ Professional tone (academic citation, not "warning") +4. ✅ Educates users on trade-offs without alarming + +### Documentation Format + +```markdown +## Performance Metrics + +Results from Meta's research paper (Llama 65B model): + +[Table with Improvement + Computational Cost columns] + +**Source**: Dhuliawala et al., ACL 2024 Findings +``` + +**Key principle**: Cite the paper, not the marketing. + +--- + +## Curation Policy Established + +To avoid amplifying marketing bias in future evaluations: + +### Inclusion Criteria + +| Criterion | Requirement | SE-CoVe Status | +|-----------|-------------|----------------| +| **Academic validation** | Published conference/journal | ✅ ACL 2024 Findings | +| **Claims fact-checked** | Verified via Perplexity/paper | ⚠️ Cherry-picked but true | +| **Trade-offs disclosed** | Cost/limitations documented | ❌ Omitted → we added | +| **Community validation** | Tested internally OR 1K+ stars | ❌ Neither (78 stars, untested) | +| **Active maintenance** | Update < 6 months | ✅ v1.1.1 (2026-01-23) | + +**Verdict**: Include with academic disclaimers. + +--- + +## Files Created + +### 1. `examples/plugins/se-cove.md` + +**Content**: +- Research foundation (Meta AI, ACL 2024) +- 5-stage pipeline explanation +- Performance metrics table (with trade-offs) +- When to use / When NOT to use +- Installation instructions +- Limitations (from paper Section 6) +- Source links (GitHub, arXiv, ACL Anthology) + +**Citations**: +- Paper: Dhuliawala et al., arXiv:2309.11495 +- Conference: ACL 2024 Findings +- Implementation: GitHub vertti/se-cove-claude-plugin v1.1.1 + +### 2. `README.md` (updated) + +**Line 238**: Added "**Plugins** (1): [SE-CoVe](./examples/plugins/se-cove.md) — Chain-of-Verification for independent code review (Meta AI, ACL 2024)" + +### 3. `machine-readable/reference.yaml` (updated) + +**Lines 124-132**: Added section: +```yaml +# Plugin System & Recommended Plugins (added 2026-01-24) +plugins_system: 6863 +plugins_se_cove: "examples/plugins/se-cove.md" +chain_of_verification_paper: "https://arxiv.org/abs/2309.11495" +chain_of_verification_acl: "https://aclanthology.org/2024.findings-acl.212/" +``` + +--- + +## Lessons Learned + +### For Future Evaluations + +1. ✅ **Fact-check via Perplexity**: Essential for academic claims (28% found in paper p.7, not abstract) +2. ✅ **Challenge initial assessment**: technical-writer agent caught factual errors +3. ✅ **Check for omissions**: Marketing often presents gains without costs +4. ✅ **Verify source credibility**: ACL 2024 > random blog post +5. ✅ **Approach B (neutral academic)** > heavy disclaimers or rejection + +### Red Flags Detected + +| Marketing Pattern | SE-CoVe Example | Mitigation | +|-------------------|-----------------|------------| +| **Cherry-picking best metric** | "28%" (ignores 23%/112% on other tasks) | Present full results table | +| **Omitting computational costs** | No mention of 2x tokens | Add "Computational Cost" column | +| **Oversimplifying limitations** | "Improves accuracy" (hallucinations not eliminated) | Include paper's Limitations section | +| **Lack of context** | "Independent verification" (model-specific) | Note "Tested on Llama 65B only" | + +--- + +## Confidence Assessment + +| Aspect | Confidence | Evidence | +|--------|-----------|----------| +| **Methodology validity** | 🟢 High | ACL 2024 peer-reviewed paper | +| **Performance metrics** | 🟢 High | Verified in paper Section 4.3, Table 1 | +| **Plugin functionality** | 🟡 Medium | README documented, but untested by us | +| **Generalization** | 🟡 Medium | Tested on Llama 65B, not SOTA models | +| **Marketing accuracy** | 🔴 Low | Cherry-picked metrics, material omissions | + +--- + +## Recommendations for Users + +### When to Trust SE-CoVe + +✅ Use for: +- Critical code review (architectural decisions) +- Security-sensitive code verification +- Complex debugging requiring independent analysis +- When 2x computational cost is acceptable + +### When to Be Skeptical + +⚠️ Avoid expecting: +- Universal 28% improvement (task-dependent: 23-112%) +- Zero hallucinations (reduces, not eliminates) +- Fast processing (5+ minutes per verification) +- Comprehensive output (generates fewer but more accurate results) + +--- + +## Meta: Evaluation Process + +### Workflow Used + +1. **Fetch & Summarize**: WebFetch LinkedIn + GitHub README +2. **Context Check**: Read `machine-readable/reference.yaml` +3. **Gap Analysis**: Grep for verification/multi-agent/code review +4. **Challenge**: Task tool (technical-writer agent) +5. **Fact-Check**: Perplexity research on 28% claim +6. **Document**: Create files with academic approach + +### Tools Used + +- WebFetch (LinkedIn, GitHub, arXiv abstract) +- Perplexity Pro (fact-check 28% claim in full paper) +- Task tool (technical-writer challenge) +- Grep/Read (gap analysis) +- Write/Edit (documentation) + +### Time Investment + +- Research & fact-check: ~20 minutes +- Challenge & revision: ~10 minutes +- Documentation: ~15 minutes +- **Total**: ~45 minutes + +--- + +## Conclusion + +**SE-CoVe plugin integrated successfully with academic rigor.** + +**Key achievement**: First concrete plugin example in guide, combling le gap "theory without practice" dans la section Plugin System (6863-7096). + +**Critical correction**: Marketing claim "28% improvement" → Documented reality "23-112% task-dependent, 2x cost, -26% output". + +**Precedent established**: Future plugins evaluated with Approach B (neutral academic), fact-checked via Perplexity, trade-offs disclosed transparently. + +**Next evaluation**: Use this report as template (format réutilisable). diff --git a/docs/resource-evaluations/self-improve-skill.md b/docs/resource-evaluations/self-improve-skill.md new file mode 100644 index 0000000..d9507fc --- /dev/null +++ b/docs/resource-evaluations/self-improve-skill.md @@ -0,0 +1,172 @@ +# Resource Evaluation: Self-Improve Skill Pattern + +**Date**: 2026-01-24 +**Evaluator**: Claude (Sonnet 4.5) +**Source**: LinkedIn post claim about self-improving skills +**Context**: User reported a plugin announcement for automatic skill improvement via feedback analysis + +--- + +## Initial Claim + +**Post**: LinkedIn announcement mentioning a skill that automatically improves itself by analyzing Claude's feedback after each session. + +**Claimed features**: +- Automatic detection of skill improvement opportunities +- Feedback analysis to refine existing skills +- Self-updating mechanism + +--- + +## Investigation Process + +### Phase 1: Repository Search + +**Goal**: Locate the announced plugin/skill repository + +**Methods used**: +- GitHub search for "self-improve skill claude" +- GitHub search for "claude skill feedback improvement" +- LinkedIn profile analysis for linked repositories +- General web search for recent announcements + +**Result**: ❌ **Repository not found** +- No public repository matching the description +- No installation instructions available +- No documentation or source code accessible + +### Phase 2: Pattern Validation via Perplexity + +**Goal**: Validate if the technical pattern (self-improving skills) exists in production systems + +**Perplexity query**: "Claude Code self-improving skills feedback analysis automatic improvement" + +**Key findings**: + +✅ **Pattern EXISTS and is IMPLEMENTED**: +- **Claude Reflect System** (Haddock Development, 2026) +- Repository: https://github.com/haddock-development/claude-reflect-system +- Marketplace: https://agent-skills.md/skills/haddock-development/claude-reflect-system/reflect +- Status: Production-ready, actively maintained + +**Functionality confirmed**: +1. Monitors skill usage via Stop hook +2. Detects improvement opportunities from Claude's feedback +3. Proposes skill modifications with confidence levels +4. **Requires user review** before applying changes +5. Creates Git backups automatically +6. Validates YAML/markdown syntax + +**Security considerations documented**: +- Risk: Feedback poisoning (adversarial inputs manipulating improvements) +- Risk: Memory poisoning (malicious edits to learned patterns) +- Risk: Prompt injection (embedded instructions in feedback) +- Risk: Skill bloat (unbounded growth without curation) + +**Academic sources cited**: +- Anthropic Memory Cookbook (official documentation) +- Research on AI agent memory systems +- Best practices for self-improving systems + +--- + +## Evaluation Summary + +| Criterion | Score | Notes | +|-----------|-------|-------| +| **Availability** | 0/5 | Announced plugin not publicly accessible | +| **Pattern validity** | 5/5 | Pattern proven by Claude Reflect System | +| **Documentation** | 5/5 | Reflect System well-documented (GitHub + Agent Skills) | +| **Security awareness** | 5/5 | Risks documented with mitigations | +| **Community adoption** | 3/5 | Listed on Agent Skills Index, but niche use case | + +**Overall score**: 2/5 (announced resource) → **REJECT with REDIRECT** + +--- + +## Decision + +### ❌ Do NOT document the announced plugin +- Repository unavailable (cannot verify claims) +- No installation path for users +- No way to validate functionality + +### ✅ DO document Claude Reflect System +- Production-ready implementation of the same pattern +- Public repository with installation instructions +- Listed on Agent Skills Index marketplace +- Security warnings properly documented +- Actively maintained (2026) + +--- + +## Implementation Plan + +Add new section to `guide/ultimate-guide.md`: + +**Location**: After Claudeception section (line 5159), before DevOps & SRE Guide (line 5161) + +**Section title**: "Skill Lifecycle: Creation vs Improvement" +- Subsection 1: Automatic Skill Generation: Claudeception (existing) +- Subsection 2: Automatic Skill Improvement: Claude Reflect System (new) + +**Content to include**: +- Overview (repo, author, marketplace link) +- How it works (manual /reflect + auto Stop hook) +- Safety features (backups, validation, Git, confidence levels) +- Installation instructions +- Real-world use case +- Security warnings (table format with risks + mitigations) +- Activation/deactivation commands +- Comparison table: Claudeception vs Reflect System +- Recommended combined workflow +- Resources (GitHub, Agent Skills, YouTube, Anthropic Cookbook) + +**Estimated length**: ~180-220 lines + +--- + +## Key Sources + +1. **Claude Reflect System GitHub**: https://github.com/haddock-development/claude-reflect-system +2. **Agent Skills Index**: https://agent-skills.md/skills/haddock-development/claude-reflect-system/reflect +3. **Anthropic Memory Cookbook**: https://github.com/anthropics/anthropic-cookbook/blob/main/skills/memory/guide.md +4. **Perplexity search**: "Claude Code self-improving skills feedback analysis" (2026-01-24) + +--- + +## Lessons Learned + +### Research workflow validated +1. **Initial claim** (LinkedIn post) +2. **Repository search** (GitHub, web) +3. **Pattern validation** (Perplexity for alternatives) +4. **Decision** (document proven implementation instead) + +### Curation policy reinforced +- **Availability > Announcement**: Only document publicly accessible resources +- **Verification > Claims**: Validate functionality via source code or trusted sources +- **Alternatives > Gaps**: If announced resource unavailable, search for proven alternatives +- **Security > Features**: Always document risks alongside benefits + +### Tools effectiveness +- **WebSearch**: ❌ Failed to find unavailable repository (expected) +- **Perplexity Pro**: ✅ Found production alternative + academic sources +- **GitHub search**: ❌ No results for announced plugin +- **Agent Skills Index**: ✅ Confirmed Reflect System marketplace listing + +--- + +## Next Steps + +1. ✅ Create this evaluation report (archive for future reference) +2. ⏳ Add Claude Reflect System section to ultimate-guide.md +3. ⏳ Update machine-readable/reference.yaml with new entries +4. ⏳ Document change in CHANGELOG.md +5. ⏳ Verify with `./scripts/sync-version.sh --check` + +--- + +**Evaluation status**: COMPLETE +**Recommendation**: Document Claude Reflect System as reference implementation for self-improving skills pattern +**Confidence**: HIGH (pattern validated, alternative found and verified) diff --git a/docs/resource-evaluations/uml-oop-diagrams.md b/docs/resource-evaluations/uml-oop-diagrams.md new file mode 100644 index 0000000..1d69044 --- /dev/null +++ b/docs/resource-evaluations/uml-oop-diagrams.md @@ -0,0 +1,87 @@ +# Évaluation: UML Diagrams for OOP Codebases + +**Date**: 2026-01-25 +**Source**: LinkedIn Post - Dennis Piskovatskov +**URL**: https://www.linkedin.com/posts/tigraff_uml-claude-wibecoding-activity-7420595633826258944-gGO5 +**Score**: 3/5 (Pertinent - Complément utile) + +## Résumé + +Pattern suggéré : utiliser des diagrammes d'architecture (UML/Mermaid) comme contexte additionnel pour les codebases OOP complexes, afin de compenser les limitations des LLMs dans le raisonnement sur la polymorphie et les dépendances. + +## Validations + +### ✅ Problème OOP confirmé + +**ACM 2024 Research**: [LLMs Still Can't Avoid Instanceof](https://dl.acm.org/doi/10.1145/3639474.3640052) +- Confirme que les LLMs ont des difficultés avec le raisonnement polymorphique +- Le chunking de fichiers perd les relations structurelles (hiérarchies de classes, implémentations d'interfaces, dépendances cross-module) + +### ✅ MCP Tools vérifiés + +**Archy MCP** (phxdev1, April 2025): +- URL: https://www.pulsemcp.com/servers/phxdev1-archy +- Auto-génère Mermaid depuis GitHub repos ou descriptions textuelles +- Supporte: flowcharts, class diagrams, sequence diagrams + +**Mermaid MCP** (hustcc): +- 61.4K utilisateurs +- Thèmes personnalisés, couleurs de fond, rendu temps réel + +**Blueprint MCP** (ArcadeAI): +- Descriptions textuelles → diagrammes techniques +- Gestion de jobs asynchrones + +### ⚠️ Source originale non vérifiable + +**WibeCoding**: Mentionné dans le post LinkedIn mais non trouvé publiquement +**Contexte**: Pattern reporté sur un projet Java/Spring +**Limitation**: Non validé à grande échelle + +## Intégration + +### Approches identifiées + +| Approche | Maintenance | Coût Token | Meilleur pour | +|----------|-------------|------------|---------------| +| **Archy MCP** | Zéro (auto-gen) | À la demande | GitHub repos avec hiérarchies de classes | +| **Inline Mermaid** | Manuel | 200-500 tokens | Vues architecturales personnalisées | +| **PlantUML ref** | Manuel | Minimal | Intégration entreprise/IDE | + +### Workflow recommandé + +1. **Essayer Serena d'abord**: `get_symbols_overview` + `find_symbol` (zéro maintenance) +2. **Si insuffisant**: Utiliser **Archy MCP** pour auto-générer des class diagrams +3. **Dernier recours**: Mermaid manuel inline pour vues personnalisées + +### Cas d'usage + +- Codebases OOP >20 modules avec héritage complexe +- Projets Java/Spring avec polymorphisme profond +- Quand l'overview de symboles Serena est insuffisant + +## Key Insight + +> "Context structure matters more than context size" — Les relations explicites améliorent le raisonnement LLM sur les architectures OOP. + +## Trade-offs + +**Avantages**: +- ✅ MCP tools auto-génération (zéro maintenance avec Archy) +- ✅ Validation académique du problème (ACM 2024) +- ✅ Alternative Serena disponible (zéro maintenance également) + +**Limitations**: +- ⚠️ Source originale (WibeCoding) non trouvée publiquement +- ⚠️ Pattern non validé à grande échelle +- ⚠️ Coût token pour inline Mermaid (200-500 tokens) + +## Conclusion + +**Décision**: Intégration avec nuances +- Section ajoutée dans `guide/ai-ecosystem.md` (Context Packing Tools) +- Warning clair sur validation limitée +- Recommandation de workflow: Serena → Archy → Manual +- Référencement des MCP tools vérifiés publiquement + +**Raison du score 3/5**: Pattern utile pour cas spécifiques (OOP complexe), mais pas une solution universelle. L'alternative Serena + grepai peut atteindre des résultats similaires avec zéro maintenance. diff --git a/docs/resource-evaluations/vibe-coding-rusitschka.md b/docs/resource-evaluations/vibe-coding-rusitschka.md new file mode 100644 index 0000000..de21186 --- /dev/null +++ b/docs/resource-evaluations/vibe-coding-rusitschka.md @@ -0,0 +1,213 @@ +# Resource Evaluation: "Vibe Coding, Level 2" (Jens Rusitschka) + +**Date**: 2026-01-25 +**Evaluator**: Claude (Sonnet 4) +**Source**: https://kickboost.substack.com/p/are-you-still-vibe-coding-or-are +**Author**: Jens Rusitschka (kick & boost newsletter) +**Published**: Jan 20, 2026 + +--- + +## 📄 Summary + +**Type**: Opinion piece / practitioner essay + +**Main thesis**: Vibe coding (creative exploration) stays chaotic without structure. Adding hierarchy and phased context handoffs ("Vibe Coding, Level 2") preserves early creativity while producing focused, implementable prototypes. + +**Key points**: +1. Context overload problem: More context exposed at once → more cluttered interfaces +2. Solution: Step-by-step flow where context is handed over deliberately from one stage to next +3. Multi-role flow: Research (broad) → Product (selective) → UX (constraints) → Implementation (focused) +4. Term "Vibe Coding, Level 2" for structured exploration approach + +--- + +## 🎯 Pertinence Score: 2.5/5 + +| Component | Score | Justification | +|-----------|-------|---------------| +| Context overload anti-pattern | +1.0 | **Real gap** - Explicitly named and explained | +| Pedagogical framing | +1.0 | Helps visualize the problem | +| Multi-role metaphor | +0.5 | Aids understanding | +| Rebranding existing practices | -1.0 | Plan mode, handoffs already documented | +| No concrete methodology | -1.0 | No new tools or workflows | +| **Total** | **2.5/5** | **Marginal but useful for unification** | + +--- + +## ⚖️ Gap Analysis + +### What the guide already covers: + +| Rusitschka concept | Guide equivalent | Location | +|-------------------|------------------|----------| +| "Structured vibe coding" | Plan mode (read-only exploration) | `ultimate-guide.md:2837` | +| "Hierarchical handoffs" | Session handoffs | `ultimate-guide.md:2089-2142` | +| "Context restricted by phase" | Fresh Context Pattern | `ultimate-guide.md:2130, 3144` | +| "Multi-role setup" | Task tool + subagents | `ultimate-guide.md:4478, 5808` | +| WHAT/WHERE/HOW workflow | WHAT/WHERE/HOW/VERIFY | `ultimate-guide.md:1226-1231` | + +**Coverage**: 80% of practices already documented + +### What's missing (the 10%): + +- ❌ **Explicit "context overload" anti-pattern naming** +- ❌ **Unified framework** connecting plan mode + fresh context + handoffs +- ❌ **Pedagogical narrative** showing these as phases of single strategy + +**Diagnosis**: Guide has the tactics but not the unifying framework. + +--- + +## 🔥 Technical Writer Challenge + +**Agent ID**: abac851, a38ded2 + +**Verdict**: 90% rebranding, 10% useful packaging + +### Key insights: + +1. **Rebranding is obvious**: + - "Level 2" = marketing term for plan mode + handoffs + - No new tools or methodologies introduced + - All techniques already exist in Claude Code + +2. **The 10% value**: + - Explicitly names "context overload" anti-pattern + - Provides pedagogical metaphor (research→product→UX→impl) + - Gives users a mental model for "why these features exist" + +3. **Risk assessment**: + - **Low risk** of missing critical functionality + - **Medium risk** of clarity: users might not connect plan mode + handoffs + fresh context + - **Low risk** of branding: if "Level 2" becomes popular, guide positioned correctly + +### Recommendation: + +Add **60-line subsection** in §9.8 that: +- Names the anti-pattern explicitly +- Shows phased strategy as unifying framework +- Cross-references existing tools (plan mode, fresh context, handoffs) +- Credits Rusitschka for the framing + +**Don't**: Create standalone "Level 2" methodology (it's rebranding, not innovation) + +--- + +## ✅ Fact-Check Results + +All claims verified against source article: + +| Claim | Verified | Source quote | +|-------|----------|--------------| +| Context overload → cluttered interfaces | ✅ | "The more context I exposed at once, the more cluttered the interfaces became." | +| Phased handoffs | ✅ | "step-by-step flow where context is not shared globally, but handed over deliberately" | +| Term "Vibe Coding, Level 2" | ✅ | "This is what I call Vibe Coding, Level 2." | +| Multi-role workflow | ✅ | Stages described (research, product, UX, implementation) | +| Publication date | ✅ | Jan 20, 2026 | +| Author | ✅ | Jens Rusitschka | + +**Confidence**: High (no hallucinations detected) + +--- + +## 📍 Integration Decision + +**Status**: ✅ **INTEGRATED** (2026-01-25) + +### What was integrated: + +1. **New subsection** in `guide/ultimate-guide.md:8746` + - Title: "Anti-Pattern: Context Overload" + - Length: ~60 lines + - Content: Symptoms, phased strategy table, practical workflow, cross-refs + +2. **Reference YAML** updates: + - `vibe_coding_context_overload: 8746` + - `vibe_coding_context_overload_source: "Jens Rusitschka, 'Vibe Coding, Level 2' (Jan 2026)"` + - `vibe_coding_phased_strategy: 8760` + +3. **Cross-reference** in `guide/learning-with-ai.md:96` + - Link from "Vibe Coding Trap" to new technical strategies + +4. **CHANGELOG** entry documenting additions + +### What was NOT integrated: + +- ❌ "Level 2" as standalone methodology +- ❌ Duplication of plan mode/handoffs explanations +- ❌ New workflow files (would fragment documentation) + +### Rationale: + +**Concision over completeness**: 60 lines that unify existing patterns > 200 lines duplicating tools. The value is in the **framing** (context overload anti-pattern), not new functionality. + +--- + +## 📊 Impact Assessment + +| Metric | Before | After | Change | +|--------|--------|-------|--------| +| Guide density | 11,000 lines | 11,060 lines | +0.5% | +| Vibe coding coverage | Implicit | Explicit anti-pattern | ✅ Improved | +| Fragmentation | Low | Low | No change | +| Duplication | None | None | No change | + +**Quality improvement**: Users now have explicit language ("context overload") to identify and fix the problem, with clear pathway to existing solutions. + +--- + +## 🎓 Lessons Learned + +### For future evaluations: + +1. **Rebranding is common**: Many "new" methodologies are repackaging of existing practices +2. **Naming matters**: Explicit anti-pattern names help users identify problems +3. **10% rule**: If resource is 90% rebranding, extract the 10% that's useful +4. **Unification value**: Even if tools exist, showing how they connect adds clarity +5. **Concision principle**: 60 lines of targeted integration > 200 lines of duplication + +### Red flags for rebranding: + +- ⚠️ No new tools or concrete workflows +- ⚠️ Marketing terms ("Level 2", "Next Generation") +- ⚠️ Generic descriptions without implementation details +- ⚠️ All concepts map 1:1 to existing features + +### Green flags for integration: + +- ✅ Explicit anti-pattern naming +- ✅ Pedagogical metaphors that aid understanding +- ✅ Unifying framework for existing practices +- ✅ Clear attribution to source + +--- + +## 🔗 Related Resources + +- **Source article**: https://kickboost.substack.com/p/are-you-still-vibe-coding-or-are +- **Author**: Jens Rusitschka (kick & boost newsletter) +- **Integration**: `guide/ultimate-guide.md:8746` +- **Reference**: `machine-readable/reference.yaml:49-51` +- **CHANGELOG**: Entry dated 2026-01-25 + +--- + +## 📝 Evaluation Metadata + +**Evaluation workflow**: +1. WebFetch → content extraction +2. Grep → gap analysis +3. Read → existing coverage check +4. Task (technical-writer) → challenge evaluation +5. WebFetch (2nd pass) → fact-check +6. Edit → integration +7. Write → this report + +**Agents used**: +- `technical-writer` (abac851, a38ded2): Challenge, architecture decision +- `eval-resource` (skill): Structured evaluation framework + +**Time investment**: ~30 minutes (thorough evaluation + integration) + +**Outcome**: High-confidence integration of 10% valuable content, 90% rejected as rebranding. diff --git a/docs/resource-evaluations/wooldridge-productivity-stack.md b/docs/resource-evaluations/wooldridge-productivity-stack.md new file mode 100644 index 0000000..f355c97 --- /dev/null +++ b/docs/resource-evaluations/wooldridge-productivity-stack.md @@ -0,0 +1,353 @@ +# Évaluation de Ressource: My Claude Code Productivity Stack + +**URL**: https://quantably.co/blog/claude-code-productivity-stack/ +**Auteur**: Peter Wooldridge +**Type**: Blog post +**Date de publication**: 2026-01-19 +**Date d'évaluation**: 2026-01-26 +**Évaluateur**: Claude Code Ultimate Guide Team +**Version guide**: 3.13.0 + +--- + +## 📄 Résumé du contenu + +**Points clés** (5 items): + +1. **Remote development paradigm**: Server-based coding via mosh/tmux/Tailscale pour accès multi-device et résilience connectivity +2. **Automation framework**: Catégorisation en 4 quadrants (on-the-go, scheduled jobs, extended tasks, parallel processing) +3. **Autonomous workflows**: Ralph Wiggum plugin avec `--max-iterations 50` pour loops autonomes hours-long +4. **Mobile setup**: Termius + Wispr Flow pour development mobile et voice input +5. **Security model**: Server-based execution pour limiter l'exposition locale des credentials (Tailscale private mesh) + +**Outils mentionnés**: +- **Connectivity**: Tailscale (VPN), mosh (mobile shell), tmux (terminal multiplexer) +- **Voice Input**: Wispr Flow (transcription desktop + mobile) +- **Mobile Terminal**: Termius (mosh support) +- **Scheduling**: claude-code-scheduler plugin (cron-based) +- **Long-running**: Ralph Wiggum plugin (`--max-iterations N`) +- **Parallelization**: Git worktrees + tmux windows + +--- + +## 🎯 Score de pertinence + +### Score initial: 2/5 → Score révisé: 3/5 + +| Score | Signification | +|-------|---------------| +| ~~5~~ | ~~Essentiel - Gap majeur dans le guide~~ | +| ~~4~~ | ~~Très pertinent - Amélioration significative~~ | +| **3** | **Pertinent - Complément utile** ✅ | +| ~~2~~ | ~~Marginal - Info secondaire (score initial)~~ | +| ~~1~~ | ~~Hors scope - Non pertinent~~ | + +### Justification du changement 2/5 → 3/5 + +**Score initial (2/5)**: Rejeté pour overlap massif (80%) et "auteur non validé par l'écosystème". + +**Challenge par technical-writer agent**: Détection de **biais de prestige** et **double standard** dans les critères d'inclusion. + +**Révision**: Upgrade à **3/5** après vérification credentials et comparaison équitable avec Dave Van Veen et Matteo Collina (déjà inclus dans le guide). + +**Raisons de l'upgrade**: +1. **Credentials légitimes vérifiés**: 15 ans expérience tech (IBM, Elsevier, Experian), AI consultant, scaled teams 3→20+ +2. **Standard cohérent appliqué**: Dave Van Veen (1 blog post, 0 metrics) est inclus → Wooldridge mérite même traitement +3. **Framework mental utile**: 4-quadrant model = valeur pédagogique (comme RAMPS, BMAD dans le guide) +4. **Gap réel**: Mobile workflows + remote-first = audience légitime non couverte + +--- + +## ⚖️ Comparatif détaillé + +| Aspect | Cette ressource | Notre guide | +|--------|-----------------|-------------| +| **Autonomous loops** | Ralph Wiggum `--max-iterations 50` | ✅ Ralph Loop documenté (1547-1589) + Fresh Context Pattern | +| **Parallel processing** | Git worktrees + tmux windows | ✅ Section 9.17 complète (9683-9823) + Multi-instance workflows | +| **Scheduled automation** | claude-code-scheduler (cron-based) | ➕ Plugin non documenté (worth mentioning) | +| **Voice input** | Wispr Flow | ✅ Déjà dans ai-ecosystem.md:449-464 | +| **Mobile workflows** | Termius + mosh + on-the-go | ➕ Use case non documenté (gap réel) | +| **Remote dev infra** | tmux/mosh/Tailscale setup | ⚠️ Infrastructure générale (mentionné minimalement) | +| **4-quadrant model** | Framework conceptuel | ➕ Valeur pédagogique (comme RAMPS, BMAD) | +| **Security model** | Server-based isolation | ⚠️ Generic security practice (non CC-specific) | + +**Delta réel**: Mobile workflows (gap) + 4-quadrant framework (pédagogique) + scheduler plugin (inventaire). + +--- + +## 📍 Recommandations d'intégration + +### Action retenue: **Intégration substantielle** (Practitioner Insights) + +**Priorité**: Moyenne (ajouter dans prochaine release mineure) + +### 1. Ajouter section "Practitioner Insights" (Priorité: Moyenne) + +**Fichier**: `guide/ai-ecosystem.md` +**Ligne**: ~1270 (après Matteo Collina section, avant section 9) + +**Texte à ajouter**: + +```markdown +#### Peter Wooldridge: Remote-First Mobile Workflows + +**Background**: 15-year tech veteran (IBM, Elsevier, Experian), AI consultant specializing in product-driven AI implementation. + +**Key insight**: [Remote development paradigm](https://quantably.co/blog/claude-code-productivity-stack/) using server-based Claude Code with mobile access: + +**4-Quadrant Automation Model**: +1. **On-the-Go**: Mobile terminal (Termius) + mosh for connectivity resilience +2. **Scheduled**: cron-based automation via claude-code-scheduler plugin +3. **Extended Tasks**: Ralph Wiggum loops with `--max-iterations N` +4. **Parallel Processing**: Git worktrees + tmux sessions + +**Why it matters**: Validates multi-instance patterns (Section 9.17) from a remote-first perspective. Useful for: +- Digital nomads and remote teams +- Connectivity-constrained environments (cellular, unreliable WiFi) +- Multi-device workflows (desktop ↔ mobile continuity) + +**Setup**: Tailscale (private mesh VPN) + tmux (persistent sessions) + mosh (mobile shell). + +**Alignment with guide**: Reinforces Fresh Context Pattern (1547-1589), git worktrees (9683-9823), and autonomous workflows. Adds mobile/remote dimension not covered elsewhere. +``` + +**Justification**: Même standard que Dave Van Veen—praticien respecté validant des patterns existants avec une perspective complémentaire (remote-first vs. Van Veen's local TDD focus). + +--- + +### 2. Ajouter référence dans `machine-readable/reference.yaml` + +**Fichier**: `machine-readable/reference.yaml` +**Ligne**: ~210 (dans section `practitioner_insights`, après `practitioner_matteo_collina`) + +**Ajout**: + +```yaml +practitioner_insights: + # ... existing entries ... + practitioner_peter_wooldridge: "guide/ai-ecosystem.md:1270" + practitioner_wooldridge_source: "https://quantably.co/blog/claude-code-productivity-stack/" +``` + +**Complément dans section `ecosystem`**: + +```yaml +ecosystem: + practitioner_insights: + # ... existing ... + peter_wooldridge: + url: "quantably.co/blog/claude-code-productivity-stack/" + author: "Peter Wooldridge (15yr tech: IBM, Elsevier, Experian; AI consultant)" + focus: "Remote-first mobile workflows with 4-quadrant automation model" + alignment: "Validates worktrees, multi-instance, Ralph Loop from remote-first perspective" + guide_section: "guide/ai-ecosystem.md:1270" +``` + +--- + +### 3. Mention scheduler plugin (Priorité: Basse) + +**Fichier**: `machine-readable/reference.yaml` +**Ligne**: ~183 (dans `plugins_popular`) + +**Ajout**: + +```yaml +plugins_popular: + # ... existing ... + - "claude-code-scheduler: Cron-based task automation (~200 installs, crontab wrapper)" +``` + +--- + +### 4. Cross-ref `--max-iterations` (Priorité: Basse) + +**Fichier**: `guide/methodologies.md` +**Ligne**: ~57 (après mention Ralph Inferno) + +**Ajout**: + +```markdown +> **Plugin extension**: Ralph Wiggum plugin supports `--max-iterations N` parameter for custom loop caps (default: unbounded with Fresh Context Pattern). See [Peter Wooldridge's setup](https://quantably.co/blog/claude-code-productivity-stack/) for cron-based scheduling integration. +``` + +--- + +## 🔥 Challenge (technical-writer agent) + +### Process de révision + +**Agent utilisé**: `technical-writer` (`.claude/agents/technical-writer.md`) +**Date**: 2026-01-26 +**Tâche**: "Challenge final evaluation report" + +### Points clés de la critique + +**Score ajusté**: 2/5 → **3/5** (upgrade après challenge) + +**Biais détectés dans l'évaluation initiale**: + +1. **Prestige académique/OSS**: Discrimination contre contributeurs non-"celebrity" de l'écosystème +2. **Double standard**: Dave Van Veen (Stanford PhD, 0 metrics) inclus, Wooldridge (15 ans corporate, 0 metrics) rejeté +3. **"80% overlap" non mesurable**: Affirmation sans métrique concrète (par concepts? lignes? utilité?) +4. **Mobile workflows sous-évalués**: Qualifié de "niche" sans vérification tendance (GitHub Codespaces, Replit Mobile) +5. **Framework pédagogique rejeté**: "4-quadrant model = marketing fluff" alors que RAMPS/BMAD sont acceptés + +**Arguments de l'agent technical-writer**: + +> "Wooldridge a des credentials comparables à Van Veen (moins académique, plus business/product). Si Dave Van Veen (1 blog post, 0 metrics publiques) mérite une section, pourquoi pas Wooldridge?" + +> "Le guide applique un **biais de prestige académique/OSS** plutôt qu'une évaluation rigoureuse de l'utilité du contenu." + +> "Différents auteurs expliquant le même concept peuvent débloquer différents lecteurs. Van Veen apporte validation Stanford → Wooldridge apporte validation remote-first/mobile." + +**Risques de non-intégration réévalués**: Passage de "MINIMAUX" à "MODÉRÉS" +- Audience remote-first/mobile non servie +- Pattern validation perdue (15 ans expérience corporate = perspective légitime) +- Biais contre contributeurs émergents perpétué + +--- + +### Comparaison équitable post-challenge + +| Critère | Dave Van Veen | Peter Wooldridge | Matteo Collina | +|---------|---------------|------------------|----------------| +| **Validation écosystème** | 0 stars, 1 blog post | 0 stars, 1 blog post | Opinion piece | +| **Credentials** | Stanford PhD, HOPPR AI Scientist | 15 ans tech (IBM/Elsevier/Experian), AI consultant | Node.js TSC Chair, 17B npm dl/yr | +| **Metrics d'adoption** | Aucune publique | Aucune publique | OSS (mais pas CC-specific) | +| **Valeur pour guide** | Validation worktrees/TDD | Validation remote-first/mobile | Cultural perspective | +| **Inclus?** | ✅ guide/ai-ecosystem.md:1213 | ✅ (après révision) | ✅ guide/ai-ecosystem.md:1243 | + +**Conclusion**: Standard cohérent appliqué—praticiens respectés validant patterns avec perspectives complémentaires. + +--- + +### Leçons apprises + +1. **Vérifier credentials AVANT de scorer** (pas après le challenge) +2. **Appliquer standards cohérents** (Van Veen oui ⇒ Wooldridge oui aussi) +3. **Valeur pédagogique ≠ innovation technique** (frameworks mentaux utiles même si repackaging) +4. **Détecter biais implicites**: Prestige académique, écosystème "celebrity", setup desktop-centric + +--- + +## ✅ Fact-Check + +### Vérifications article original + +| Affirmation | Vérifiée | Source | +|-------------|----------|--------| +| Auteur: Peter Wooldridge | ✅ | Article original + quantably.co | +| Date: 19 janvier 2026 | ✅ | Article timestamp | +| Ralph Wiggum `--max-iterations 50` | ✅ | Article Section 3 (verbatim quote) | +| Wispr Flow = voice transcription | ✅ | Article Section 1 | +| Termius supports mosh | ✅ | Article Section 1 | +| claude-code-scheduler uses crontab | ✅ | Article Section 2 (verbatim) | +| Tailscale = private mesh VPN | ✅ | Article Section 1 | +| "Functions over 100 lines" example | ✅ | Article Section 2 (tech debt tracking) | +| Jorge Granda post ref (Jan 2, 2026) | ✅ | Article Resources section | + +### Vérifications credentials auteur + +| Affirmation | Vérifiée | Source | +|-------------|----------|--------| +| Peter Wooldridge = 15 ans tech | ✅ | quantably.co/about | +| IBM, Elsevier, Experian | ✅ | quantably.co/about (previous companies) | +| AI consultant indépendant | ✅ | quantably.co (services listing) | +| Scaled teams 3→20+ | ✅ | quantably.co (professional background) | +| Full AI lifecycle experience | ✅ | quantably.co (research → ML → infra → customer) | + +### Stats non vérifiables + +| Stat recherchée | Trouvée | Note | +|----------------|---------|------| +| Performance/adoption metrics | ❌ | **Aucune stat fournie dans l'article** (pas de benchmarks) | +| Scheduler plugin install count | ❌ | Estimé ~200 installs (non vérifié officiellement) | +| Mobile workflow adoption | ❌ | Tendance générale (Codespaces, Replit) mais pas de metrics CC-specific | + +**Corrections apportées**: Aucune—toutes les affirmations techniques sont vérifiées dans l'article original et site auteur. + +--- + +## 🎯 Décision finale + +### Score final: **3/5** (Pertinent - Complément utile) + +**Action**: **Intégrer dans Practitioner Insights + références** + +**Confiance**: **Haute** (fact-check complet, credentials vérifiés, double standard corrigé) + +### Justification + +**Pourquoi 3/5?** +- Credentials légitimes (15 ans tech, companies reconnues) +- Perspective complémentaire validée (remote-first/mobile vs. local desktop focus du guide) +- Framework mental utile (4-quadrant model = pédagogique comme RAMPS/BMAD) +- Gap réel documenté (mobile workflows, remote dev) +- Standard cohérent avec Van Veen et Collina + +**Pourquoi pas 4/5+?** +- Overlap significatif avec Section 9.17 (worktrees, multi-instance) +- Pas de metrics d'adoption publiques (même si Van Veen non plus) +- Infrastructure générale (tmux/mosh) non spécifique à Claude Code + +**Standard appliqué**: Practitioner respecté apportant une perspective complémentaire, même sans "validation massive". Même critère que Dave Van Veen (Stanford PhD validant worktrees/TDD) et Matteo Collina (Node.js TSC validant review culture). + +--- + +## 📊 Métriques d'évaluation + +| Métrique | Valeur | +|----------|--------| +| **Temps d'évaluation** | ~45 min (lecture + analyse + challenge + fact-check) | +| **Outils utilisés** | WebFetch (2x), Perplexity Search (1x), Grep (5x), Task agent (2x) | +| **Révisions** | 1 (score 2/5 → 3/5 après challenge) | +| **Lignes à ajouter** | ~35 lignes (guide) + 10 lignes (YAML) | +| **Fichiers impactés** | 2 (guide/ai-ecosystem.md, machine-readable/reference.yaml) | +| **Priorité recommandée** | Moyenne (release mineure v3.13.1 ou v3.14.0) | + +--- + +## 🔗 Références externes + +- **Article source**: https://quantably.co/blog/claude-code-productivity-stack/ +- **Auteur**: https://quantably.co/ +- **Jorge Granda (cité)**: "Claude Code on the Go" (Jan 2, 2026) +- **Termius**: https://termius.com/ +- **Tailscale**: https://tailscale.com/ +- **Ralph Wiggum plugin**: Référencé dans guide:7246 (plugins populaires) +- **Wispr Flow**: Déjà documenté dans guide/ai-ecosystem.md:449-464 + +--- + +## 📝 Notes pour contributeurs + +**Si vous implémentez cette évaluation**: + +1. ✅ Lire l'article complet pour valider contexte +2. ✅ Vérifier que Dave Van Veen et Matteo Collina sont toujours dans ai-ecosystem.md avant d'ajouter Wooldridge +3. ✅ Adapter numéros de ligne si le guide a évolué depuis cette évaluation +4. ✅ Tester les liens externes (quantably.co, article blog) +5. ⚠️ Ne pas créer de section "4-quadrant model" dédiée (mention dans practitioner insight suffit) +6. ⚠️ Ne pas documenter tmux/mosh/Tailscale en détail (hors scope, juste mentionner dans setup) + +**Commit message suggéré**: +``` +docs: add Peter Wooldridge practitioner insight (remote-first workflows) + +- Add Wooldridge section in guide/ai-ecosystem.md:1270 +- Add references in machine-readable/reference.yaml +- Document mobile workflows + 4-quadrant automation model +- Cross-ref scheduler plugin and Ralph Wiggum --max-iterations + +Rationale: Equivalent to Dave Van Veen inclusion (practitioner validation +of patterns with complementary perspective). Fills gap for remote-first +and mobile development workflows. + +Refs: claudedocs/resource-evaluations/2026-01-26-wooldridge-productivity-stack.md +``` + +--- + +**Évaluation complétée**: 2026-01-26 +**Prochaine révision**: 2026-04-26 (vérifier adoption scheduler plugin, mobile workflows) \ No newline at end of file diff --git a/docs/resource-evaluations/worktrunk-evaluation.md b/docs/resource-evaluations/worktrunk-evaluation.md new file mode 100644 index 0000000..b1ec7e9 --- /dev/null +++ b/docs/resource-evaluations/worktrunk-evaluation.md @@ -0,0 +1,288 @@ +# Evaluation: Worktrunk (worktrunk.dev) + +**Date:** 2026-01-25 (Updated after deep-dive analysis) +**Evaluator:** Claude (Sonnet 4.5) +**Status:** ⚠️ Conditionally recommended (see updated conclusion) + +## 📄 Résumé du contenu + +- **Worktrunk** est un CLI Rust pour simplifier la gestion des git worktrees, créé par max-sixty (créateur de PRQL, 10K stars) +- Réduit la syntaxe de `git worktree add -b feat ../repo.feat && cd ../repo.feat` à `wt switch -c feat` +- 3 commandes core: `switch`, `remove`, `list` + hooks personnalisables + commit messages LLM +- **GitHub: 1.6K stars, 54 forks, 15 contributeurs, v0.18.2 (Jan 2026), 64 releases actives** +- Conçu spécifiquement pour les workflows multi-agents IA (Claude Code mentionné explicitement dans le README) + +## 🎯 Score de pertinence (1-5) + +| Score | Signification | +|-------|---------------| +| 5 | Essentiel - Gap majeur dans le guide | +| 4 | Très pertinent - Amélioration significative | +| 3 | Pertinent - Complément utile | +| 2 | Marginal - Info secondaire | +| 1 | Hors scope - Non pertinent | + +**Score initial:** 2/5 +**Score révisé après deep-dive:** 3/5 + +**Justification révisée:** + +**Points conservés de l'évaluation initiale:** +- Le guide couvre déjà exhaustivement les git worktrees (Section 9.17, `/git-worktree` command) +- Worktrunk est un wrapper, pas une fonctionnalité fondamentale + +**Nouvelles découvertes qui augmentent le score:** +1. **Besoin prouvé**: Multiples équipes ont créé des wrappers indépendants: + - incident.io → custom bash wrapper `w` (blog post officiel) + - Issue #1052 → Fish shell functions complètes + - Worktrunk → Solution Rust mature (1.6K stars) +2. **Features uniques absentes de git vanilla:** + - Project-level hooks pour automation + - LLM-powered commit messages via `llm` tool + - CI status tracking intégré + - PR link generation + - Path templates configurables +3. **Adoption significative**: 1.6K stars + 64 releases + multi-platform (Homebrew, Cargo, Winget, AUR) +4. **Pattern validé**: Le concept "wrapper worktree" est réinventé indépendamment par plusieurs équipes pro + +## ⚖️ Comparatif détaillé + +| Aspect | Worktrunk | Git vanilla + Notre guide | Wrappers custom (incident.io, #1052) | +|--------|-----------|----------------------------|---------------------------------------| +| Worktree basics | ✅ Simplifié (`wt switch`) | ✅ Complet (`git worktree add`) | ✅ Custom bash/fish functions | +| Safety (.gitignore) | ❌ Non mentionné | ✅ Vérification automatique | ⚠️ Dépend de l'implémentation | +| DB branching | ❌ Non couvert | ✅ Neon, PlanetScale, local | ❌ Non couvert | +| Hooks setup | ✅ Hooks intégrés | ✅ Auto-detect (Node, Rust, Python, Go) | ⚠️ Manuel | +| Cleanup | ✅ `wt remove` | ✅ Procédure complète + prune | ✅ Custom cleanup functions | +| LLM commits | ✅ Intégré via `llm` tool | ❌ Hors scope (orthogonal à CC) | ✅ Custom via LLM APIs | +| CI status tracking | ✅ Built-in | ❌ Non couvert | ❌ Non couvert | +| PR link generation | ✅ Built-in | ❌ Non couvert | ❌ Non couvert | +| Multi-agent context | ✅ Conçu pour | ✅ Section 9.17 couvre le workflow | ✅ Oui (incident.io use case) | +| Maintenance | ✅ 64 releases, actif | ✅ Git core (stable) | ❌ Custom code à maintenir | +| Installation | ✅ Multi-platform (Homebrew, Cargo, etc.) | ✅ Git déjà installé | ❌ Copy-paste scripts | + +## 🔍 Deep-dive: Analyse des 4 sources + +### Source 1: Worktrunk GitHub (github.com/max-sixty/worktrunk) + +**Features validées:** +- Path templates configurables (réduit typing répétitif) +- Hooks project-level pour automation +- LLM integration via `llm` tool +- CI status + PR link generation +- Interactive worktree selection +- Shell integration (change directory capability) + +**Adoption metrics:** +- 1.6K stars, 64 releases, 15+ contributeurs +- Multi-platform: macOS (Homebrew), Linux (Cargo/AUR), Windows (Winget) +- Créateur: max-sixty (PRQL 10K stars, Xarray maintainer) + +### Source 2: incident.io blog (shipping-faster-with-claude-code-and-git-worktrees) + +**Découvertes clés:** +- ❌ **N'utilise PAS Worktrunk** - ont créé leur propre wrapper bash `w` +- ✅ **Validation du pattern**: Git worktrees résout les "branch management friction" +- ✅ **ROI mesuré**: 18% improvement (30s) sur API generation time +- ✅ **Scale**: Multiple Claude instances en parallèle sans contention +- **Custom setup**: `w myproject new-feature claude` → auto-launch Claude in isolated branch + +**Citation:** +> "Rather than constantly switching branches in a single repository, they maintain separate working directories for each feature branch—all connected to the same Git database." + +### Source 3: Anthropic best practices (anthropic.com/engineering/claude-code-best-practices) + +**Découvertes critiques:** +- ❌ **AUCUNE mention de Worktrunk** (contrairement à ce que j'avais suggéré initialement) +- ✅ **Git worktrees recommandés** comme approche officielle Anthropic: + > "Git worktrees allow you to check out multiple branches from the same repository into separate directories." +- ✅ **3 approches recommandées**: + 1. Multiple checkouts (3-4 git clones) + 2. Git worktrees (focus de la recommandation) + 3. Custom harness + headless mode (`claude -p`) + +**Best practices Anthropic:** +- Context isolation via `/clear` +- Specialized tool separation (coding vs review instances) +- CLAUDE.md inheritance across worktrees +- Conservative permissions approach + +### Source 4: GitHub issue #1052 (claude-code repo) + +**Découvertes:** +- ❌ **N'utilise PAS Worktrunk** - workflow Fish shell custom +- ✅ **Pattern workflow complet** avec 8 functions git custom: + - `git worktree-llm` → create + start Claude + - `git worktree-merge` → finish + rebase + merge + - `git commit-llm` → LLM-generated commits + - `git llm-message` → structured diff→commit via LLM +- ✅ **Issue status**: CLOSED as `NOT_PLANNED` (doc sharing, not feature request) +- ✅ **Author quote**: *"I now use it for basically all my development where I can use claude code"* + +**Workflow pattern:** +```bash +git worktree-llm feature-name # Start feature +# ... work with Claude ... +git worktree-merge # Finish, commit, rebase, merge +``` + +## 🧩 Pattern émergent: "Wrapper Worktree" validé par 3 équipes indépendantes + +| Équipe | Solution | Langage | Features clés | +|--------|----------|---------|---------------| +| incident.io | Custom `w` function | Bash | Auto-completion, auto-organize ~/projects/worktrees/ | +| Issue #1052 author | Fish functions | Fish shell | LLM commits, rebase automation, cleanup | +| Worktrunk (max-sixty) | CLI mature | Rust | Hooks, CI status, PR links, multi-platform | + +**Conclusion**: Le besoin existe (3 réinventions indépendantes). Worktrunk est la solution la plus mature et feature-rich. + +## 📍 Recommandations mises à jour + +**Action: Intégration conditionnelle recommandée** + +### Option 1: Section "Advanced Tooling" (Recommandée) + +**Emplacement:** Section 9.17 (Multi-Instance Workflows) ou `/git-worktree` command + +**Contenu proposé:** +```markdown +## Advanced Tooling (Optional) + +While this guide teaches git worktree fundamentals, several teams have built wrappers for daily productivity: + +### Worktrunk (Recommended wrapper) +- **What**: Rust CLI simplifying worktree management (1.6K stars, 64 releases) +- **Why**: Reduces `git worktree add -b feat ../repo.feat && cd ../repo.feat` to `wt switch -c feat` +- **Unique features**: Project hooks, LLM commits, CI status, PR links +- **Install**: `brew install worktrunk` (macOS/Linux) or `cargo install worktrunk` +- **Trade-off**: Learn git fundamentals first, add wrapper for speed later + +### DIY Alternative +Teams like incident.io and others built custom bash/fish wrappers. See: +- [incident.io blog](https://incident.io/blog/shipping-faster-with-claude-code-and-git-worktrees) +- [GitHub issue #1052](https://github.com/anthropics/claude-code/issues/1052) (Fish shell functions) + +**Philosophy**: Master `git worktree` concepts via this guide, then choose your productivity layer. +``` + +### Option 2: Simple "See Also" mention + +**Emplacement:** Fin de `/git-worktree` command + +**Contenu minimal:** +```markdown +## See Also +- [Worktrunk](https://github.com/max-sixty/worktrunk) - Productivity wrapper (1.6K stars) +- [incident.io workflow](https://incident.io/blog/shipping-faster-with-claude-code-and-git-worktrees) - Custom bash wrapper +``` + +## 🔥 Challenge (technical-writer) - Réponse mise à jour + +**Score initial:** 2/5 +**Score après deep-dive:** 3/5 ⬆️ + +**Éléments manqués dans l'évaluation initiale:** + +1. **Pattern validation**: 3 équipes indépendantes ont créé des wrappers (incident.io, issue #1052, Worktrunk) → besoin réel +2. **Features uniques**: CI status, PR links, path templates, project hooks → pas disponibles en git vanilla +3. **Adoption sous-estimée**: 1.6K stars + 64 releases + multi-platform = mature, pas "marginal" +4. **Use case principal**: Daily productivity pour power users, pas "learning tool" (le guide couvre le learning) + +**Risques de non-intégration mis à jour:** + +| Risque | Probabilité | Impact | Mitigation recommandée | +|--------|-------------|--------|-------------------------| +| Users reinvent the wheel | **Medium** | Medium | Mentionner Worktrunk + DIY alternatives | +| Guide appears pedagogical only | **Medium** | Low | Ajouter section "Advanced Tooling" | +| Missing productivity gap | **High** | Medium | Guide enseigne patterns, Worktrunk booste workflow | +| Community expectation mismatch | Low | Low | Pattern validé par Anthropic (worktrees officiels) | + +**Nouvelles découvertes qui augmentent la pertinence:** +- ✅ Anthropic recommande officiellement git worktrees (pas Worktrunk, mais le pattern) +- ✅ incident.io (blog officiel) démontre ROI mesurable (18% improvement) +- ✅ Multiple réinventions indépendantes prouvent le besoin +- ✅ Worktrunk est la solution la plus mature et cross-platform + +## ✅ Fact-Check mis à jour + +| Affirmation | Statut | Source | Corrections | +|-------------|--------|--------|-------------| +| 1.6K GitHub stars | ✅ Confirmé | GitHub repo (jan 2026) | - | +| Créé par max-sixty (PRQL author) | ✅ Confirmé | GitHub profile | - | +| v0.18.2 release (Jan 2026) | ✅ Confirmé | GitHub releases | - | +| Mentionné dans Anthropic best practices | ❌ **FAUX** | anthropic.com/engineering | **Correction**: Worktrunk n'est PAS mentionné. Seul git worktrees vanilla est recommandé. | +| 64 releases actives | ✅ Confirmé | GitHub releases | Découverte deep-dive | +| Multi-platform (Homebrew, Cargo, Winget, AUR) | ✅ Confirmé | GitHub README | Découverte deep-dive | +| incident.io utilise Worktrunk | ❌ **FAUX** | incident.io blog | **Correction**: Ils utilisent un wrapper bash custom `w`, pas Worktrunk | +| Issue #1052 concerne Worktrunk | ❌ **FAUX** | GitHub issue #1052 | **Correction**: Fish shell functions custom, pas Worktrunk | + +**Corrections majeures apportées:** +1. ❌ **Anthropic best practices ne mentionnent PAS Worktrunk** (seul git worktrees vanilla) +2. ❌ **incident.io n'utilise PAS Worktrunk** (custom bash wrapper) +3. ❌ **Issue #1052 n'est PAS sur Worktrunk** (Fish shell workflow) +4. ✅ **Pattern validé**: 3 équipes ont créé des wrappers indépendamment → besoin réel existe + +**Découvertes additionnelles:** +- Git Worktree Toolbox (MCP server, 3 stars) existe mais adoption trop faible +- Le pattern "wrapper worktree" est réinventé systématiquement par les power users +- Anthropic recommande officiellement les worktrees mais reste agnostique sur les wrappers + +## 🎯 Décision finale mise à jour + +**Score final:** 3/5 ⬆️ (pertinent - complément utile) + +**Action:** Intégration conditionnelle recommandée (Option 1: Section "Advanced Tooling") + +**Confiance:** Haute (fact-check approfondi, 4 sources analysées, corrections appliquées) + +**Raisonnement révisé:** + +**Pour l'intégration:** +1. **Besoin validé**: 3 équipes indépendantes ont créé des wrappers (pattern émergent) +2. **Solution mature**: Worktrunk est la plus feature-rich et cross-platform (1.6K stars, 64 releases) +3. **Gap pédagogique**: Guide enseigne fundamentals, users cherchent ensuite productivity boost +4. **Alignement philosophique**: "Learn patterns first, add tools for speed later" (teaching + tooling) +5. **ROI démontré**: incident.io a mesuré 18% improvement avec worktrees + +**Contre l'intégration:** +1. ❌ Pas officiellement recommandé par Anthropic (seul vanilla worktrees l'est) +2. ✅ Guide couvre déjà exhaustivement les patterns git worktree +3. ✅ Philosophie "patterns > tools" doit rester prioritaire + +**Compromis optimal:** Section "Advanced Tooling" qui: +- Enseigne d'abord les patterns git worktree (priority #1) +- Mentionne ensuite les wrappers mature (Worktrunk) + DIY alternatives +- Préserve la philosophie "learn fundamentals first" +- Offre un choix éclairé aux power users + +--- + +## 📋 Implementation Recommendations + +**Changes proposés:** Ajout section "Advanced Tooling (Optional)" + +**Files à modifier:** + +### Option A: Section 9.17 (Multi-Instance Workflows) +- **Fichier**: `guide/ultimate-guide.md` +- **Ligne**: ~10700 (après "Database Branch Workflow") +- **Contenu**: Section complète "Advanced Tooling" (voir Option 1 ci-dessus) +- **Impact**: ~15 lignes ajoutées + +### Option B: `/git-worktree` command +- **Fichier**: `examples/commands/git-worktree.md` +- **Ligne**: ~210 (fin du document) +- **Contenu**: Section "See Also" minimale (voir Option 2 ci-dessus) +- **Impact**: ~3 lignes ajoutées + +**Recommandation finale:** **Option A** (Section 9.17) car: +- Plus contextualisée (workflows multi-instance = use case principal) +- Permet d'expliquer le pattern "learn fundamentals → add productivity layer" +- Cohérent avec la découverte "3 équipes ont réinventé des wrappers" +- N'impacte pas la pédagogie du `/git-worktree` command (reste fundamentals-focused) + +**Prochaines étapes:** +1. Validation user de l'approche (Option A vs Option B vs ignorer) +2. Rédaction du contenu final +3. Update de `machine-readable/reference.yaml` si Section 9.17 modifiée +4. Commit: `docs: add advanced worktree tooling section (Worktrunk + DIY alternatives)`