Commit graph

26 commits

Author SHA1 Message Date
Florian BRUNIAUX
d5375e32a5 docs: add 2 resource evaluations (Osmani LinkedIn + Beyond Vibe Coding)
Added:
- Resource Evaluation: Addy Osmani LinkedIn Post (scored 2/5, Marginal)
  - Post about Anthropic study (17% comprehension gap)
  - 100% overlap with Shen & Tamkin 2026 already documented
  - Decision: Tracking mention only (mainstream diffusion timeline)
  - New criterion: "Influencer Amplification" pattern documented

- Resource Evaluation: "Beyond Vibe Coding" Book (scored 3/5, Pertinent)
  - Comprehensive O'Reilly book by Addy Osmani
  - 90% overlap analysis (10/14 topics covered 100%)
  - Decision: Minimal integration (tracking mention + cross-refs)
  - Cross-validation with 2 Osmani articles already integrated

Updated:
- CHANGELOG.md: [Unreleased] section with detailed entries
- README.md: Resource evaluations count (36 → 38 assessments)

Files created:
- docs/resource-evaluations/addy-osmani-linkedin-anthropic-study.md
- docs/resource-evaluations/beyond-vibe-coding.md
- docs/resource-evaluations/nick-tune-feedback-loops.md

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-01 23:30:03 +01:00
Florian BRUNIAUX
fb339b8575 docs: update RTK evaluation (v0.2.0 → v0.7.0)
BREAKING UPDATE: All gaps from initial evaluation resolved upstream.

## Version Evolution
- Initial eval: v0.2.0 (2026-01-28, score 4/5)
- Updated eval: v0.7.0 (2026-02-01, score 4.5/5)
- Development: 5 major releases in 9 days

## Critical Changes Resolved
- pnpm support (v0.6.0) - was MISSING
- npm/vitest support (v0.6.0) - was MISSING
- Git arg parsing (v0.7.0) - was BROKEN
- grep functionality (v0.7.0) - was BROKEN
- ls efficiency (v0.7.0+) - was BROKEN (-274% savings, i.e. filtered output larger than plain ls)
- Analytics (v0.4.0) - rtk gain temporal audit
- Opportunity scanner (v0.7.0) - rtk discover
- GitHub CLI (v0.6.0) - full gh support
- Cargo commands (v0.6.0) - build/test/clippy
- Auto-rewrite hook (v0.7.0) - PreToolUse integration
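The "-274%" figure for `ls` reads most naturally as a negative savings ratio, meaning filtering produced more tokens than the raw command. This is a minimal sketch that assumes savings are computed as (raw - filtered) / raw over token counts. RTK's own accounting may use a different formula, and the numbers below are illustrative, not measurements.

```python
def token_savings(raw_tokens: int, filtered_tokens: int) -> float:
    """Percentage of tokens saved by filtering; negative means the output grew.

    Assumed definition: (raw - filtered) / raw * 100. This only illustrates
    how a negative figure such as -274% can arise.
    """
    if raw_tokens <= 0:
        raise ValueError("raw_tokens must be positive")
    return (raw_tokens - filtered_tokens) / raw_tokens * 100


if __name__ == "__main__":
    # Hypothetical token counts, not RTK benchmark data.
    print(f"{token_savings(1000, 274):.1f}%")  # 72.6%: filtered output is much smaller
    print(f"{token_savings(100, 374):.1f}%")   # -274.0%: filtered output is 3.74x larger
```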

## Score Changes
| Criterion | v0.2.0 | v0.7.0 | Change |
|-----------|--------|--------|--------|
| Accuracy & Reliability | 3 | 4 | +1 |
| Depth & Comprehensiveness | 4 | 5 | +1 |
| Practical Value | 5 | 5 | 0 |
| Originality & Uniqueness | 5 | 5 | 0 |
| Production Readiness | 3 | 4 | +1 |
| Community Validation | 2 | 3 | +1 |
| **TOTAL** | 3.90 | 4.33 | +0.43 |

Rounded: 4/5 → **4.5/5**

## Community Growth
- Stars: 8 → 17 (+113%)
- Forks: 0 → 2 (+2; a percentage change from 0 is undefined)
- PRs merged: 0 → 10+ (community contributions)
- Contributors: 1 → 2+

## Architecture Maturity
- 24 command modules (was 12)
- 9 filtering strategies (50-99% reduction)
- SQLite token tracking (~/.local/share/rtk/history.db)
- Configuration system (~/.config/rtk/config.toml)

## Recommendation Update
- OLD: "GOOD (4/5) - git-only, bugs, experimental"
- NEW: "EXCELLENT (4.5/5) - production-ready, full stack"

## Fork Status
- Fork (FlorianBruniaux) contributed 10+ PRs to upstream
- All features merged → fork no longer needed
- Recommendation: Use upstream v0.7.0 directly

## Impact
- Token reduction: 72.6% (git) → 89.4% (full stack)
- Command coverage: 40% → 85% (dev sessions)
- Maturity: experimental → production-ready (early adopters)

File changes: 633 lines (+69), 405 insertions, 335 deletions (major rewrite)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-01 23:07:12 +01:00
Florian BRUNIAUX
fdee3305c5 docs: RTK documentation update - upstream + fork integration
- Update guide/ultimate-guide.md: RTK section (l.11084-11174)
  - Two repositories: upstream (stable) + fork (extended features)
  - Fork features: vitest, pnpm, prisma, gain, discover
  - Bug fixes documented (grep/ls fixed in fork)
  - Installation options: cargo, fork, binary

- Add guide/third-party-tools.md: RTK card (l.86)
  - Comparison upstream vs fork
  - Token savings: 70-90% depending on stack
  - Cross-reference to ultimate-guide Section 9

- Update machine-readable/reference.yaml:
  - rtk_upstream + rtk_fork_extended (two repos)
  - third_party_tools_rtk entry added
  - Line numbers updated

- Update docs/resource-evaluations/rtk-evaluation.md:
  - UPDATE 2026-02-01 section with fork comparison
  - Fork features table (JS/TS stack support)
  - Installation instructions for fork

Total: 4 files, ~320 lines modified

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-01 22:20:43 +01:00
Florian BRUNIAUX
a5942f1c53 docs: add Addy Osmani spec-writing evaluation (4/5) + spec-first.md sections
Integration of "How to write a good spec for AI agents" by Addy Osmani:

Evaluation (docs/resource-evaluations/addy-osmani-good-spec.md):
- Score: 4/5 (High Value - Integrate within 1 week)
- Fills gaps: modular design, operational boundaries, command specs
- Fact-checked: credentials verified via Perplexity, all claims sourced
- Challenge phase: technical-writer agent corrected initial 3/5 → 4/5

Spec-First Workflow Updates (guide/workflows/spec-first.md):
- NEW: "Modular Spec Design" section (~50 lines, line 322)
  Pattern: Split large specs into focused files (CLAUDE-[domain].md)
- NEW: "Operational Boundaries" section (~60 lines, line 372)
  Three-tier system: Always/Ask First/Never → maps to Claude Code modes
- NEW: "Command Spec Template" section (~40 lines, line 432)
  Executable command specs with expected outputs & error handling
- NEW: "Anti-Pattern: Monolithic CLAUDE.md" section (~30 lines, line 472)
  Explains cognitive load problem (>200 lines = context pollution)
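As a rough illustration of the three-tier boundary idea, the sketch below checks a shell command against Never, Ask First and Always pattern lists, starting with the most restrictive. The pattern lists, the `tier_for` helper and the ask-first default for unlisted commands are all hypothetical. The spec-first.md section itself is not reproduced here.

```python
from fnmatch import fnmatchcase

# Hypothetical boundary rules; a real project defines its own lists.
BOUNDARIES = {
    "never": ["rm -rf *", "git push --force*", "*DROP TABLE*"],
    "ask_first": ["git push*", "npm publish*", "prisma migrate*"],
    "always": ["git status*", "git diff*", "npm test*", "pnpm test*"],
}


def tier_for(command: str) -> str:
    """Return the boundary tier ("never", "ask_first" or "always") for a command.

    Tiers are checked from most to least restrictive, so a command that
    matches both a "never" and an "always" pattern is refused. Unlisted
    commands fall back to "ask_first" (an assumed, conservative default).
    """
    for tier in ("never", "ask_first", "always"):
        if any(fnmatchcase(command, pattern) for pattern in BOUNDARIES[tier]):
            return tier
    return "ask_first"


if __name__ == "__main__":
    for cmd in ("git status", "git push origin main", "git push --force origin main"):
        print(f"{cmd!r}: {tier_for(cmd)}")
```

Checking "never" first means a broad allow pattern cannot override an explicit prohibition. In Claude Code terms, the three lists would correspond roughly to deny, ask and allow permission rules. Treat that mapping as an assumption and verify it against the settings documentation.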

Reference Index (machine-readable/reference.yaml):
- 8 new entries: spec_first_workflow → spec_osmani_score
- Links to new spec-first.md sections with line numbers
- Source attribution: https://addyosmani.com/blog/good-spec/

Public Facing (README.md):
- Incremented resource evaluations count: 35 → 36

File growth: spec-first.md 327 → 507 lines (+180)
Source: Addy Osmani (former Chrome team, 14y), published Jan 13, 2026

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-01 21:30:34 +01:00
Florian BRUNIAUX
bc86c8ed7f release: v3.20.6 - agentskills.io integration + 4 resource evaluations
- agentskills.io open standard: frontmatter table, skills-ref CLI, portability section
- Agent Skills supply chain risks (security-hardening.md §1.2)
- anthropics/skills (60K+★) added to complementary resources
- 16 new reference.yaml entries
- Resource evaluations: agentskills.io (4/5), Skill Doctor (2/5), dclaude (new), paddo (new)
- Sandbox isolation + README updates

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-01 16:49:33 +01:00
Florian BRUNIAUX
950370e81b release: v3.20.2 - Sandbox Isolation for Coding Agents
New guide file covering Docker Sandboxes (microVM isolation),
cloud alternatives (Fly.io Sprites, E2B, Vercel, Cloudflare),
safe autonomy workflows, and comparison matrix.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-31 19:08:25 +01:00
Florian BRUNIAUX
22f2b91b83 docs: integrate Contribution Metrics blog (4/5) - Anthropic Jan 2026 data
New subsection in ultimate-guide.md with +67% PRs merged and 70-90%
AI-assisted code metrics. Separate from Aug 2025 study (different
methodology: PR-based vs self-reported). ROI cross-reference added.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-30 23:34:15 +01:00
Florian BRUNIAUX
26ee4ef894 release: v3.20.1 - Vercel AGENTS.md vs Skills evaluation
- New resource evaluation (025): Vercel blog on eager context vs lazy
  skill invocation (Gao, Jan 2026). Score 3/5, 13/13 fact-checked.
- Guide: added 8KB compression benchmark to CLAUDE.md sizing (line 3527)
- Guide: added 56% skill invocation warning to Memory Loading (line 4082)
- Guide: added invocation reliability caveat to skills.sh trade-offs
- Version sync 3.20.0 → 3.20.1

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-30 21:45:14 +01:00
Florian BRUNIAUX
fd4550cbd3 release: v3.20.0 - Multi-Agent Code Review Automation
Integration of production-grade PR review patterns from Pat Cullen + Méthode Aristote.

New Features:
- Resource evaluation: Pat Cullen Final Review (5/5 - Critical)
- Enhanced /review-pr: +150 lines with Advanced Multi-Agent Review section
- Enhanced code-reviewer agent: +219 lines with anti-hallucination rules
- New workflow: Review Auto-Correction Loop in iterative-refinement.md
- Production example: Multi-Agent Code Review in ultimate-guide.md
- Reference updates: +3 entries (review_pr_advanced, review_anti_hallucination, review_auto_fix_loop)
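The Review Auto-Correction Loop is only summarized in this release (review → fix → re-review, max 3 iterations, severity-tagged findings). This is a minimal sketch of that control flow. It assumes `review` and `fix` are caller-supplied callables standing in for the reviewer and fixer agents (hypothetical), and that only 🔴 Must Fix findings keep the loop running.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Finding:
    severity: str  # "must_fix" (🔴), "should_fix" (🟡) or "can_skip" (🟢)
    message: str


def review_fix_loop(
    review: Callable[[], List[Finding]],
    fix: Callable[[List[Finding]], None],
    max_iterations: int = 3,
) -> List[Finding]:
    """Run review, then up to max_iterations rounds of fix + re-review.

    Stops early once no must-fix findings remain. Returns the findings of
    the last review so that unresolved issues are surfaced, not hidden.
    """
    findings = review()
    for _ in range(max_iterations):
        blocking = [f for f in findings if f.severity == "must_fix"]
        if not blocking:
            break
        fix(blocking)
        findings = review()
    return findings
```

The iteration cap guarantees termination even if a fix keeps reintroducing a blocking issue. In that case, the remaining 🔴 findings are returned for a human to handle.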

Key Patterns:
- 3 specialized agents: Consistency, SOLID, Defensive Code Auditor
- Pre-flight check: git log Co-Authored-By detection
- Anti-hallucination: Grep/Glob verification before suggestions
- Severity classification: 🔴 Must Fix / 🟡 Should Fix / 🟢 Can Skip
- Convergence loop: review → fix → re-review (max 3 iterations)
- Conditional context loading: stack-agnostic decision table
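The pre-flight check is only named above. One way to detect AI co-authorship from `git log` is to scan commits for Co-Authored-By trailers like the ones in this history. The marker strings and helper names below are hypothetical, and this log does not describe what /review-pr does with the result.

```python
import re
import subprocess
from typing import List, Sequence

# Git trailer such as "Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>".
CO_AUTHOR_RE = re.compile(
    r"^co-authored-by:\s*(.+?)\s*<([^>]+)>\s*$", re.IGNORECASE | re.MULTILINE
)


def ai_co_authors(message: str, markers: Sequence[str] = ("claude", "anthropic.com")) -> List[str]:
    """Return Co-Authored-By trailers whose name or email contains a marker."""
    hits = []
    for name, email in CO_AUTHOR_RE.findall(message):
        if any(m in f"{name} {email}".lower() for m in markers):
            hits.append(f"{name} <{email}>")
    return hits


def recent_commit_messages(limit: int = 20) -> List[str]:
    """Full messages of the last `limit` commits (%B = raw body, %x00 = NUL separator)."""
    out = subprocess.run(
        ["git", "log", f"-{limit}", "--format=%B%x00"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [m.strip() for m in out.split("\x00") if m.strip()]


if __name__ == "__main__":
    flagged = [m.splitlines()[0] for m in recent_commit_messages() if ai_co_authors(m)]
    print(f"{len(flagged)} recent commit(s) carry an AI Co-Authored-By trailer")
```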

Design Principles:
- Enrich existing files (no fragmentation)
- No breaking changes (simple review-pr.md template preserved)
- Complete attribution (Pat Cullen + Méthode Aristote with links)
- Audience-aware (beginner → advanced progression)

Files Modified:
- CHANGELOG.md, VERSION: bumped to 3.20.0
- docs/resource-evaluations/017-pat-cullen-final-review.md: NEW (120 lines)
- examples/commands/review-pr.md: 80 → 230 lines (+150)
- examples/agents/code-reviewer.md: 72 → 291 lines (+219)
- guide/workflows/iterative-refinement.md: 389 → 522 lines (+133)
- guide/ultimate-guide.md: +28 lines (Production Example section)
- machine-readable/reference.yaml: +3 entries
- README.md, guide/cheatsheet.md: version sync

Total: +537 insertions, 0 deletions (no breaking changes)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-30 16:07:09 +01:00
Florian BRUNIAUX
97d41b8598 release: v3.19.0 - Hook Execution Model documentation
Adds comprehensive async hooks documentation, filling a critical gap.
Includes decision matrix, migration guide, and Aristote case study.

Changes:
- Added Hook Execution Model section to ultimate-guide.md (~97 lines)
- Documented sync vs async hooks (v2.1.0+) with configuration examples
- Added decision matrix for 15 use cases
- Updated reference.yaml with 7 new hook async entries
- Resource evaluation: Melvyn Malherbe LinkedIn post (score 1/5)
- Aristote case study: 7 hooks analyzed, 3 migrated async
- Version bump: 3.18.2 → 3.19.0
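The 15-case decision matrix is not reproduced in this log. The heuristic below sketches the question such a matrix usually answers. A hook whose result must block or modify the tool call, or feed back into the current turn, stays synchronous. Pure side effects can run async. The criteria and the `HookUseCase` type are assumptions, not the guide's matrix.

```python
from dataclasses import dataclass


@dataclass
class HookUseCase:
    name: str
    must_block_or_modify: bool     # the outcome gates or changes the tool call
    output_needed_by_claude: bool  # Claude should see the result in this turn


def recommended_mode(use_case: HookUseCase) -> str:
    """Heuristic: anything that gates, modifies or informs the current turn stays sync."""
    if use_case.must_block_or_modify or use_case.output_needed_by_claude:
        return "sync"
    return "async"  # fire-and-forget side effects: logging, notifications, metrics


if __name__ == "__main__":
    cases = [
        HookUseCase("block dangerous rm", True, False),
        HookUseCase("type-check feedback after edit", False, True),
        HookUseCase("desktop notification", False, False),
    ]
    for case in cases:
        print(f"{case.name}: {recommended_mode(case)}")
```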

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-30 12:37:23 +01:00
Florian BRUNIAUX
8b58f014e7 docs: add Addy Osmani 80% problem to Practitioner Insights
Add Addy Osmani (Google Chrome Team) article "The 80% Problem in
Agentic Coding" to AI Ecosystem Practitioner Insights section.

Changes:
- guide/ai-ecosystem.md: Add 32-line entry after Steinberger (~line 2024)
  * "80% problem" framework and comprehension debt concept
  * Three new failure modes (overengineering, assumption propagation, sycophantic)
  * Productivity paradox data (+98% PRs, +91% review time)
  * Alignment table mapping to existing guide sections
  * Transparent note: "secondary synthesis, primary sources documented"

- machine-readable/reference.yaml: Add 4 new references
  * practitioner_addy_osmani, practitioner_osmani_source
  * eighty_percent_problem, comprehension_debt_secondary

- docs/resource-evaluations/024-addy-osmani-80-percent-problem.md: Complete evaluation
  * Score: 3/5 (Pertinent) - downgraded from initial 4/5 after technical-writer challenge
  * Minimal integration (32 lines vs rejected 250 lines)
  * Fact-check: 6 stats verified, 1 Stack Overflow stat incorrect
  * Rationale: 90% overlap with existing content (Vibe Coding Trap, Trust Calibration)

- CHANGELOG.md: Document addition in v3.19.0

Decision: Minimal integration approach chosen to avoid duplication while
recognizing value of synthesis from respected author. Article aggregates
existing research already cited in guide with primary sources.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-30 12:32:38 +01:00
Florian BRUNIAUX
7df11b224f release: v3.18.2 - Steinberger Practitioner Insight
Add Peter Steinberger (PSPDFKit Founder, Moltbot Creator) to Practitioner
Insights with model-agnostic workflow patterns.

Changes:
- Add Steinberger entry in guide/ai-ecosystem.md (stream monitoring,
  multi-project juggling, fresh context validation, iterative exploration)
- Complete evaluation in docs/resource-evaluations/steinberger-inference-speed.md
  (score 3/5, fact-checked GPT-5.2, validated credentials)
- Update docs/resource-evaluations/README.md (15→16 evaluations)
- Add practitioner_steinberger references in machine-readable/reference.yaml
- Version bump 3.18.1 → 3.18.2 (VERSION + sync all docs)
- Update CHANGELOG.md with detailed v3.18.2 entry
- Update README.md evaluations count (22→25)

Scope: Model-agnostic patterns only, zero model comparisons.
Source: https://steipete.me/posts/2025/shipping-at-inference-speed

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-30 09:49:55 +01:00
Florian BRUNIAUX
940caf3f1e docs: add verified critical bugs tracker (known-issues.md)
NEW: guide/known-issues.md (285 lines)
- GitHub issue auto-creation bug (Issue #13797, v2.0.65+, ACTIVE)
  * 17+ confirmed accidental public disclosures
  * Security/privacy risk documented
  * Workarounds: explicit repo, manual approval, pre-execution verification
- Excessive token consumption (Issue #16856, v2.1.1+, Jan 2026)
  * 20+ reports of 4x+ faster consumption
  * Anthropic: "Not officially confirmed as bug" (investigating)
  * Workarounds: /context monitoring, shorter sessions, disable auto-compact
- Model quality degradation (Aug-Sep 2025, RESOLVED)
  * Anthropic official postmortem: 3 infrastructure bugs
  * Community theories (quantization) debunked
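The "explicit repo" and "pre-execution verification" workarounds can be combined into one check. This is a minimal sketch that assumes the command string comes from a Bash tool call. The function name, the allowed repository and the stdin payload shape in `__main__` are hypothetical. Confirm the exit-code-2 blocking behavior against the Claude Code hooks documentation.

```python
import json
import shlex
import sys
from typing import Optional


def check_gh_issue_create(command: str, allowed_repo: str) -> Optional[str]:
    """Return a reason to block the command, or None if it is acceptable.

    Hypothetical rule: `gh issue create` must name its target with
    --repo/-R, and that target must be the expected repository.
    """
    if "gh issue create" not in command:
        return None
    try:
        args = shlex.split(command)
    except ValueError:
        return "could not parse gh command"
    start = next(
        (i for i in range(len(args) - 2) if args[i:i + 3] == ["gh", "issue", "create"]),
        None,
    )
    if start is None:
        return None  # the phrase appeared only inside a quoted string
    rest = args[start + 3:]
    repo = None
    for i, arg in enumerate(rest):
        if arg in ("--repo", "-R") and i + 1 < len(rest):
            repo = rest[i + 1]
        elif arg.startswith("--repo="):
            repo = arg.split("=", 1)[1]
    if repo is None:
        return "gh issue create without explicit --repo"
    if repo != allowed_repo:
        return f"target repo {repo!r} is not {allowed_repo!r}"
    return None


if __name__ == "__main__":
    payload = json.load(sys.stdin)  # assumed shape: {"tool_input": {"command": "..."}}
    reason = check_gh_issue_create(payload.get("tool_input", {}).get("command", ""), "my-org/my-repo")
    if reason:
        print(reason, file=sys.stderr)
        sys.exit(2)  # assumed: exit code 2 asks Claude Code to block the call
```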

FACT-CHECKED: Perplexity Pro + GitHub API direct queries
- Verified: 5,702 open issues (not 4,697), 527 invalid labels
- Corrected: v2.1.1 token bug (not non-existent v2.0.61)
- Sources: GitHub Issues, Anthropic postmortem, The Register

UPDATED:
- guide/README.md: Added known-issues.md to docs table
- machine-readable/reference.yaml: 4 new entries for issue tracking
- CHANGELOG.md: Documented integration process

NEW: docs/resource-evaluations/023-community-discussions-report-jan2026.md
- Full evaluation process documented
- Fact-check methodology: Perplexity + GitHub API
- Score: 2/5 (Marginal - partial integration only)
- Lesson: Always verify community reports with primary sources

Impact: Critical security awareness for users, actionable workarounds

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-28 17:59:16 +01:00
Florian BRUNIAUX
c28161dca8 docs: enrich RTK evaluation with T3 Stack production testing
Real-World Testing Results (Méthode Aristote - T3 Stack):
- Project: Next.js 15 + tRPC + Prisma + pnpm
- Commands tested: 12 (git, pnpm, Vitest, TypeScript, Prisma)
- Git workflows validated: 85.6% avg reduction (up from 72.6%)

Critical Bug Discovered:
- git argument parsing broken (`--oneline`, `--graph` blocked)
- Workaround: `rtk git log -- -20` (works)
- Impact: CRITICAL - affects ALL git users
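RTK's parser is written in Rust and is not shown here, but the failure mode is generic. A wrapper that does not own git's flags will reject or strip them unless it forwards everything after the subcommand verbatim. The sketch below illustrates both behaviors with Python's `argparse`. The `wrapper-git` program name and the helper functions are hypothetical.

```python
import argparse
from typing import List, Tuple


def strict_parse(argv: List[str]) -> Tuple[List[str], List[str]]:
    """Wrapper that declares the pass-through args as a plain positional list.

    Flags such as --oneline look like options the wrapper does not own, so
    they are split off as leftovers (parse_args would reject them outright).
    """
    parser = argparse.ArgumentParser(prog="wrapper-git")
    parser.add_argument("git_args", nargs="*")
    ns, leftover = parser.parse_known_args(argv)
    return ns.git_args, leftover


def passthrough_parse(argv: List[str]) -> List[str]:
    """Wrapper that forwards everything after the subcommand verbatim."""
    parser = argparse.ArgumentParser(prog="wrapper-git")
    parser.add_argument("subcommand")
    parser.add_argument("git_args", nargs=argparse.REMAINDER)
    ns = parser.parse_args(argv)
    return [ns.subcommand, *ns.git_args]


if __name__ == "__main__":
    argv = ["log", "--oneline", "--graph"]
    print(strict_parse(argv))       # (['log'], ['--oneline', '--graph'])
    print(passthrough_parse(argv))  # ['log', '--oneline', '--graph']
```

Rust argument parsers such as clap have comparable trailing-argument pass-through settings. The design choice is the same: let the wrapped tool, not the wrapper, interpret its own flags.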

Modern Stack Gaps Identified:
- pnpm support MISSING (80-90% reduction possible, CRITICAL impact)
- Vitest support MISSING (90% reduction possible, HIGH impact)
- TypeScript support MISSING (70% reduction possible, MEDIUM impact)

ROI Analysis:
- Current v0.2.0: 40% command coverage, 55% token reduction
- Proposed v0.3.0 (pnpm + Vitest): 85% coverage, 80% reduction
- Dev effort: 1 week (7 days)

New Deliverables:
- Benchmark script: examples/scripts/rtk-benchmark.sh (reproducible tests)
- Test results: claudedocs/rtk-test-results-aristote.md (53KB, gitignored)
- Updated PR proposals: claudedocs/rtk-pr-proposals.md (P0-P2 ranking)
- GitHub issues: claudedocs/rtk-github-issue-template.md (ready for upstream)

Updated Evaluation:
- Score: Still 4/5 (GOOD) but clearer path to 5/5 (CRITICAL)
- Blockers: git args bug + pnpm/Vitest gaps
- Strength: 85.6% git reduction validated on production codebase

Full report: claudedocs/rtk-test-results-aristote.md (23K detailed analysis)
2026-01-28 14:01:37 +01:00
Florian BRUNIAUX
1000cb6e85 docs: add RTK integration templates and evaluation
- Evaluation: docs/resource-evaluations/rtk-evaluation.md (4/5 score, comprehensive benchmarks)
- CLAUDE.md template: examples/claude-md/rtk-optimized.md (manual usage instructions)
- Skill template: examples/skills/rtk-optimizer/SKILL.md (auto-suggestion)
- Hook template: examples/hooks/bash/rtk-auto-wrapper.sh (PreToolUse auto-wrapper)
- PR proposals: claudedocs/rtk-pr-proposals.md (7 upstream improvements)
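The hook template itself (rtk-auto-wrapper.sh) is not reproduced in this log. The Python sketch below shows only the rewrite decision. It prefixes simple commands with `rtk` when RTK filters their program, and it leaves pipelines, compound commands and already-wrapped commands alone. The command set is an assumption drawn from the features mentioned in this history. This sketch does not cover how the rewritten command is handed back to Claude Code.

```python
import shlex

# Assumed set of programs with RTK filters (git, gh, cargo, pnpm, ls, grep per this log).
RTK_COMMANDS = {"git", "gh", "cargo", "pnpm", "ls", "grep"}
# Conservative: any of these means "not a simple command", so leave it untouched.
SHELL_OPERATORS = ("&&", "||", "|", ";", ">", "<", "$(", "`")


def rtk_rewrite(command: str) -> str:
    """Prefix a simple command with `rtk` when its program is in RTK_COMMANDS."""
    stripped = command.strip()
    if any(op in stripped for op in SHELL_OPERATORS):
        return command
    try:
        words = shlex.split(stripped)
    except ValueError:
        return command  # unbalanced quotes: do not guess
    if not words or words[0] == "rtk" or words[0] not in RTK_COMMANDS:
        return command
    return f"rtk {stripped}"


if __name__ == "__main__":
    for cmd in ("git status", "rtk git status", "git log | head", "python app.py"):
        print(f"{cmd!r} -> {rtk_rewrite(cmd)!r}")
```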

These templates enable 3 RTK integration strategies referenced in guide:10478
2026-01-28 13:03:10 +01:00
Florian BRUNIAUX
34b7376408 fix: correct mgrep misattribution in Everything Claude Code evaluation
Issue:
- Incorrectly claimed Everything Claude Code contained "mgrep (50% token reduction)" tool
- No such tool exists in affaan-m/everything-claude-code (verified via WebFetch + repo search)
- Confused mgrep (mixedbread-ai semantic search) with non-existent token reduction tool

Files corrected:
- docs/resource-evaluations/015-everything-claude-code-github-repo.md (14 occurrences removed)
- machine-readable/reference.yaml:724 (unique patterns list updated)
- guide/ultimate-guide.md:14821 (replaced with verified patterns)
- CHANGELOG.md (v3.17.0 and v3.15.0 entries updated)

Verified patterns now documented:
- hookify (conversational hooks)
- pass@k metrics (sampling-based verification)
- sandboxed subagents (tool restrictions)
- strategic compaction skills (context management)
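pass@k is a sampling-based metric: the probability that at least one of k generated attempts passes the tests. The commonly used unbiased estimator (Chen et al., 2021) is sketched below for reference. This log does not establish whether everything-claude-code computes it this way.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples, of which c passed.

    pass@k = 1 - C(n - c, k) / C(n, k): the probability that a random
    size-k subset of the n samples contains at least one passing sample.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        return 1.0  # every size-k subset must include a passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # 10 attempts, 3 passed: chance that a budget of k attempts finds a pass.
    for k in (1, 3, 5):
        print(f"pass@{k} = {pass_at_k(10, 3, k):.3f}")
```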

Impact: Maintains guide accuracy, prevents user confusion

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-28 09:50:07 +01:00
Florian BRUNIAUX
11d2e4dfe3 docs: add everything-claude-code repository evaluation (5/5 CRITICAL)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-27 16:29:12 +01:00
Florian BRUNIAUX
3a5012eef7 docs: document Tasks API field visibility limitations (Gang Rui analysis)
Integration of community practitioner feedback on Tasks API (v2.1.16+)
field visibility constraints discovered through real-world usage.

Changes:
- guide/ultimate-guide.md:
  * Added 3 rows to comparison table (field visibility, metadata, overhead)
  * New subsection "⚠️ Tasks API Limitations (Critical)" (~40 lines)
  * Field visibility constraint table, cost examples, 3 workaround patterns

- guide/workflows/task-management.md:
  * New subsection "⚠️ Field Visibility Limitations" (~35 lines)
  * Workflow adjustments, cost awareness, mitigation strategies

- guide/cheatsheet.md:
  * Added limitation note with actionable tip (~3 lines)

- machine-readable/reference.yaml:
  * 4 new entries: limitations, field_visibility, cost_overhead, workarounds
  * Updated resource_evaluations_count: 16 → 22

- docs/resource-evaluations/016-gang-rui-tasks-api-limitations.md:
  * New comprehensive evaluation (score 5/5 CRITICAL)
  * Fact-check, challenge phase, integration details

- README.md:
  * Updated resource evaluations count: 15 → 22 assessments

Score: 5/5 (CRITICAL) - Breaks recommended workflow, 11x-51x cost
overhead, prevents user frustration, maintains guide credibility.

Source: https://www.linkedin.com/posts/limgangrui_i-explored-the-new-claude-codes-task-system-activity-7420651412881268736-Hpd6
Date: 2026-01-24

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-27 16:16:49 +01:00
Florian BRUNIAUX
edf74b38c5 docs: add missing hook events from official CHANGELOG (v2.1.9-v2.1.10)
- Add 3 missing events to Section 7.1: Setup, PermissionRequest, SubagentStop
- Document PreToolUse additionalContext feature (v2.1.9+)
- Create 3 production-ready hook templates (setup, permission, subagent)
- Add resource evaluation documenting rejection of secondary source

Source: Official Claude Code CHANGELOG, not external blog posts
Closes gap identified during resource evaluation process

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-27 12:45:47 +01:00
Florian BRUNIAUX
6e621806a4 docs: add Myths vs Reality appendix + TeammateTool documentation
- Appendix D: Myths vs Reality
  - Myth: Hidden features with secret flags
  - Myth: Tasks API = autonomous agents
  - Myth: 100x faster claims
  - Reality: Documented strengths of Claude Code
  - How to spot reliable vs unreliable sources

- New: TeammateTool experimental feature documentation
  - Multi-agent orchestration capabilities
  - Execution backends (in-process, tmux, iTerm2)
  - Usage patterns (parallel specialists, swarm)
  - Clear warnings about experimental status
  - Community sources cited

- Cheatsheet: "Features Méconnues (But Official!)" section (méconnues = little-known)
  - Tasks API, Background Agents, TeammateTool
  - Session Forking, LSP Tool
  - Pro tip: Read the CHANGELOG

- Reference.yaml: Added line numbers for new sections

- Resource evaluation: Rejected low-quality social media post
  (docs/resource-evaluations/2026-01-27-claude-code-hidden-feature-social-post.md)

Addresses community misinformation while documenting real experimental features with proper sourcing.
2026-01-27 09:45:06 +01:00
Florian BRUNIAUX
a8d0f0273d release: version 3.15.0 - MCP Apps integration
Bump version to 3.15.0 with comprehensive MCP Apps documentation.

Version updates:
- VERSION: 3.14.0 → 3.15.0
- Synced across: README.md, cheatsheet.md, ultimate-guide.md, reference.yaml
- Updated date: reference.yaml (2026-01-27)

CHANGELOG.md:
- Added MCP Apps (SEP-1865) documentation entry in [Unreleased]
- ~50 lines detailing all changes:
  - architecture.md section (~150 lines)
  - ultimate-guide.md section (~90 lines)
  - Table update (Plugin vs MCP vs MCP Apps)
  - machine-readable/reference.yaml (8 entries)
  - Resource evaluation (159 lines, score 4/5)
- Key facts: First official MCP extension, co-authored by OpenAI and Anthropic
- 9 interactive tools at launch (Asana, Slack, Figma, etc.)
- Platform support: Claude Desktop, VS Code, ChatGPT, Goose
- CLI relevance: Indirect (ecosystem, dev, hybrid workflows)

README.md:
- Resource Evaluations: 14 → 15 assessments

docs/resource-evaluations/README.md:
- Added MCP Apps entry in index table
- Score: 3/5 → 4/5 (High Value)
- Updated date: 2026-01-27
- Confirmed count: 15 evaluations

Total changes:
- 2 commits (MCP Apps docs + version bump)
- ~240 lines documentation (architecture + guide)
- 15 resource evaluations tracked
- 4/5 integration score (ecosystem impact)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-27 08:24:25 +01:00
Florian BRUNIAUX
18ea240e12 docs: add MCP Apps (SEP-1865) documentation
Integrate comprehensive documentation for MCP Apps, the first official
MCP extension enabling interactive UI delivery.

Changes:
- guide/architecture.md (656): New section "MCP Extensions: Apps"
  - Technical architecture (primitives, SDK, security)
  - Platform support (Claude Desktop, VS Code, ChatGPT, Goose)
  - Example implementations (9 production tools at launch)
  - Developer workflow and SDK usage
  - ~150 lines of technical documentation

- guide/ultimate-guide.md (6509): New section "MCP Evolution: Apps"
  - User context and use cases
  - Available interactive tools (Asana, Slack, Figma, etc.)
  - Platform support matrix
  - Hybrid workflow examples
  - ~90 lines of user-facing documentation

- guide/ultimate-guide.md (7522): Table update
  - Added "Interactive UI" row to Plugin vs. MCP Server comparison
  - Clarified MCP Apps = "What Claude can show"

- machine-readable/reference.yaml: 8 new entries
  - mcp_apps_architecture, mcp_apps_evolution
  - Links to spec, SDK, blog posts
  - CLI relevance note (indirect)

- docs/resource-evaluations/mcp-apps-announcement.md: New evaluation
  - Score: 4/5 (High Value - Integrate within 1 week)
  - Fact-checked with Perplexity searches
  - Technical review by agent

Resource evaluated:
- https://blog.modelcontextprotocol.io/posts/2026-01-26-mcp-apps/
- https://claude.com/blog/interactive-tools-in-claude

Total documentation: ~240 lines across 3 files
Score: 4/5 (High Value)
CLI relevance: Indirect (ecosystem understanding, MCP server dev, hybrid workflows)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-27 08:14:49 +01:00
Florian BRUNIAUX
3a7671ac5e docs: update $ARGUMENTS syntax for v2.1.19 breaking change + evaluation
Updated all documentation and examples to reflect Claude Code v2.1.19
breaking change: $ARGUMENTS.0 → $ARGUMENTS[0] (bracket syntax).

Changes:
- guide/ultimate-guide.md: 7 occurrences updated to bracket/shorthand syntax
- guide/cheatsheet.md: Command template updated
- Added migration note in § 6.2 Variable Interpolation
- Created migration scripts: migrate-arguments-syntax.{sh,ps1}
  • Automated detection + conversion with backups
  • Dry-run mode, cross-platform (macOS/Linux/Windows)
- Added formal evaluation: eval-claude-code-releases-jan2026.md
  • Score: 5/5 (Critical - Integrate Immediately)
  • Covers releases 2.1.0 to 2.1.19 (January 2026)
  • Technical accuracy verified against GitHub CHANGELOG
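The shipped .sh and .ps1 scripts are not reproduced here. Below is a Python sketch of the same detect, back up and convert idea, with dry-run as the default. The `--apply` flag, the `.bak` suffix and the function names are hypothetical. The sketch does not handle the shorthand syntax.

```python
import re
import shutil
import sys
from pathlib import Path
from typing import Tuple

# Pre-v2.1.19 dot syntax: $ARGUMENTS.0, $ARGUMENTS.12, ...
OLD_SYNTAX = re.compile(r"\$ARGUMENTS\.(\d+)")


def migrate_text(text: str) -> Tuple[str, int]:
    """Convert $ARGUMENTS.N to $ARGUMENTS[N]; return (new_text, replacements)."""
    return OLD_SYNTAX.subn(r"$ARGUMENTS[\1]", text)


def migrate_file(path: Path, dry_run: bool = True) -> int:
    """Report (dry run) or apply the conversion, keeping a .bak copy first."""
    text = path.read_text(encoding="utf-8")
    new_text, count = migrate_text(text)
    if count and not dry_run:
        shutil.copy2(path, path.with_name(path.name + ".bak"))
        path.write_text(new_text, encoding="utf-8")
    return count


if __name__ == "__main__":
    # Usage: python migrate_arguments.py [--apply] FILE...
    apply = "--apply" in sys.argv[1:]
    for name in (a for a in sys.argv[1:] if a != "--apply"):
        n = migrate_file(Path(name), dry_run=not apply)
        print(f"{name}: {n} occurrence(s){'' if apply else ' (dry run)'}")
```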

Files:
- guide/ultimate-guide.md (+23 lines, 7 occurrences fixed)
- guide/cheatsheet.md (+1 line)
- examples/scripts/migrate-arguments-syntax.sh (+152 lines)
- examples/scripts/migrate-arguments-syntax.ps1 (+143 lines)
- docs/resource-evaluations/eval-claude-code-releases-jan2026.md (+273 lines)
- CHANGELOG.md (+12 lines, Unreleased section)

Total: +605 lines

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-26 17:37:49 +01:00
Florian BRUNIAUX
08a2a4261f docs: add system prompts official sources (Anthropic + community analyses)
- Add System Prompt Contents section in architecture.md (lines 300-326)
  - Official Anthropic sources (Tier 1, 100% confidence)
  - Community analyses (Simon Willison, PromptHub)
  - Claude.ai vs API vs Code CLI distinctions

- Update reference.yaml with 4 new system prompts entries
  - system_prompts_official, willison_analysis, prompthub, architecture

- Create resource evaluation: system-prompts-official-vs-community.md
  - Score evolution: 4/5 → 3/5 → 2/5 (after Perplexity fact-check)
  - Decision: Do not integrate x1xhlol repository (redundant)
  - Use official Anthropic sources instead

- Create watchlist for non-integrated resources
  - claudedocs/resource-evaluations/watch-list.md (gitignored)
  - Re-evaluation triggers: Q2 2026 or if 10+ user requests

Related evaluation: docs/resource-evaluations/system-prompts-official-vs-community.md

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-26 17:03:42 +01:00
Florian BRUNIAUX
ab8bfcf782 docs: add Anaconda Croce evaluation (minimal integration)
Resource evaluated: "What I Learned Challenging Claude to a Coding Competition"
by Steve Croce (Anaconda Field CTO, Jan 16, 2026)

Score: 2/5 (Marginal - Secondary info)

Integration:
- Added "Community Experiences" section in guide/learning-with-ai.md
- 2-paragraph mention with strong caveats (N=1, non-representative context)
- Full evaluation in docs/resource-evaluations/anaconda-croce-evaluation.md
- Updated reference.yaml count (14 → 16 evaluations)

Rationale:
- Provides light empirical validation (90s vs 60min on Advent of Code)
- Highlights "collaboration cost" angle (decreased Slack engagement)
- Limitations prevent extensive integration (solo puzzles ≠ team dev)
- Commercial bias noted (Anaconda blog by Anaconda CTO)

Technical review challenged initial 4/5 score → adjusted to 2/5.
Maintains guide rigor through minimal integration + explicit caveats.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-26 16:53:48 +01:00
Florian BRUNIAUX
1136dc683f docs: add resource-evaluations to tracked docs
- Create docs/resource-evaluations/ with 15 evaluation files
- Standardize filenames (remove date prefixes)
- Keep working docs and private audits in claudedocs/ (gitignored)
- Add resource evaluation workflow to CLAUDE.md
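Removing date prefixes to standardize filenames amounts to a small regex rename. This is a minimal sketch that assumes prefixes of the form YYYY-MM-DD- on Markdown files. The example date is hypothetical. In a git repository, `git mv` would normally be used instead of `Path.rename`.

```python
import re
from pathlib import Path
from typing import List, Tuple

DATE_PREFIX = re.compile(r"^\d{4}-\d{2}-\d{2}-")


def standardized_name(filename: str) -> str:
    """Drop a leading YYYY-MM-DD- prefix: '2026-01-15-gsd.md' -> 'gsd.md'."""
    return DATE_PREFIX.sub("", filename, count=1)


def plan_renames(directory: Path) -> List[Tuple[Path, Path]]:
    """List (old, new) pairs, refusing any rename that would overwrite a file."""
    plan, targets = [], set()
    for path in sorted(directory.glob("*.md")):
        target = path.with_name(standardized_name(path.name))
        if target == path:
            continue  # already standardized
        if target.exists() or target in targets:
            raise FileExistsError(f"name clash: {target.name}")
        targets.add(target)
        plan.append((path, target))
    return plan


def apply_renames(plan: List[Tuple[Path, Path]]) -> None:
    for old, new in plan:
        old.rename(new)
```

Planning before renaming makes it possible to review the changes as a dry run and detect name clashes before any file moves.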

Files migrated:
- gsd, worktrunk, boris-cowork-video, wooldridge-productivity-stack
- remotion, nick-jensen, se-cove, self-improve-skill
- astgrep, clawdbot, prompt-repetition, uml-diagrams
- vibe-coding-rusitschka, anthropic-releases

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-26 14:02:05 +01:00