release: v3.24.0 - Agent Evaluation Framework

Major addition: Complete agent evaluation framework with production-ready template.

## Added

- **Resource Evaluation**: nao framework (score 3/5)
  - Identified critical gap: agent evaluation not documented
  - Technical challenge adjusted score 2/5 → 3/5
  - All claims fact-checked (TypeScript 58.9%, Python 38.5%)

- **Guide Section**: Agent Evaluation (guide/agent-evaluation.md, ~3K tokens)
  - Metrics: response quality, tool usage, performance, satisfaction
  - Patterns: logging hooks, unit tests, A/B testing, feedback loops
  - Example: analytics agent with built-in metrics
  - Tools: nao framework reference, Claude Code hooks integration

- **AI Ecosystem**: Section 8.2 Domain-Specific Agent Frameworks
  - nao (Analytics Agents): Database-agnostic, built-in evaluation
  - Transposable patterns: context builder, evaluation hooks, DB integrations

- **Template**: Analytics Agent with Evaluation (5 files, ~1K lines)
  - README: setup, usage, troubleshooting
  - Agent: SQL generator with evaluation criteria, safety rules
  - Hook: automated metrics logging (safety, performance, errors)
  - Script: analysis with stats, safety reports, recommendations
  - Report template: monthly evaluation format

## Changed

- Agent Evaluation Guide: updated template references, verified links
- Landing Site: templates count 110 → 114
- Version: 3.23.5 → 3.24.0

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
This commit is contained in:
Florian BRUNIAUX 2026-02-10 11:52:13 +01:00
parent 1fb783ebb8
commit ef7cdd899e
15 changed files with 1782 additions and 16 deletions

View file

@ -3,7 +3,7 @@
# Source: guide/ultimate-guide.md
# Purpose: Condensed index for LLMs to quickly answer user questions about Claude Code
version: "3.23.5"
version: "3.24.0"
updated: "2026-02-09"
# ════════════════════════════════════════════════════════════════
@ -173,7 +173,7 @@ deep_dive:
third_party_toad: "https://github.com/batrachianai/toad"
third_party_conductor: "https://docs.conductor.build"
# Configuration Management & Backup (Added 2026-02-02)
config_management_guide: "guide/ultimate-guide.md:4085" # Section 3.23.5
config_management_guide: "guide/ultimate-guide.md:4085" # Section 3.24.0
config_hierarchy: "guide/ultimate-guide.md:4095" # Global → Project → Local precedence
config_git_strategy_project: "guide/ultimate-guide.md:4110" # What to commit in .claude/
config_git_strategy_global: "guide/ultimate-guide.md:4133" # Version control ~/.claude/
@ -1169,7 +1169,7 @@ ecosystem:
- "Cross-links modified → Update all 4 repos"
history:
- date: "2026-01-20"
event: "Code Landing sync v3.23.5, 66 templates, cross-links"
event: "Code Landing sync v3.24.0, 66 templates, cross-links"
commit: "5b5ce62"
- date: "2026-01-20"
event: "Cowork Landing fix (paths, README, UI badges)"
@ -1181,7 +1181,7 @@ ecosystem:
onboarding_matrix_meta:
version: "2.0.0"
last_updated: "2026-02-05"
aligned_with_guide: "3.23.5"
aligned_with_guide: "3.24.0"
changelog:
- version: "2.0.0"
date: "2026-02-05"
@ -1209,7 +1209,7 @@ onboarding_matrix:
core: [rules, sandbox_native_guide, commands]
time_budget: "5 min"
topics_max: 3
note: "SECURITY FIRST - sandbox before commands (v3.23.5 critical fix)"
note: "SECURITY FIRST - sandbox before commands (v3.24.0 critical fix)"
beginner_15min:
core: [rules, sandbox_native_guide, workflow, essential_commands]
@ -1294,7 +1294,7 @@ onboarding_matrix:
- default: agent_validation_checklist
time_budget: "60 min"
topics_max: 6
note: "Dual-instance pattern for quality workflows (v3.23.5)"
note: "Dual-instance pattern for quality workflows (v3.24.0)"
learn_security:
intermediate_30min:
@ -1305,7 +1305,7 @@ onboarding_matrix:
- default: permission_modes
time_budget: "30 min"
topics_max: 4
note: "NEW goal (v3.23.5) - Security-focused learning path"
note: "NEW goal (v3.24.0) - Security-focused learning path"
power_60min:
core: [sandbox_native_guide, mcp_secrets_management, security_hardening]
@ -1330,7 +1330,7 @@ onboarding_matrix:
core: [rules, sandbox_native_guide, workflow, essential_commands, context_management, plan_mode]
time_budget: "60 min"
topics_max: 6
note: "Security foundation + core workflow (v3.23.5 sandbox added)"
note: "Security foundation + core workflow (v3.24.0 sandbox added)"
intermediate_120min:
core: [plan_mode, agents, skills, config_hierarchy, git_mcp_guide, hooks, mcp_servers]