From ef7cdd899e1fd4c1a9fca2bca0a6fe1cdbfad975 Mon Sep 17 00:00:00 2001
From: Florian BRUNIAUX
Date: Tue, 10 Feb 2026 11:52:13 +0100
Subject: [PATCH] release: v3.24.0 - Agent Evaluation Framework
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Major addition: Complete agent evaluation framework with production-ready template.

## Added
- **Resource Evaluation**: nao framework (score 3/5)
  - Identified critical gap: agent evaluation not documented
  - Technical challenge adjusted score 2/5 → 3/5
  - All claims fact-checked (TypeScript 58.9%, Python 38.5%)
- **Guide Section**: Agent Evaluation (guide/agent-evaluation.md, ~3K tokens)
  - Metrics: response quality, tool usage, performance, satisfaction
  - Patterns: logging hooks, unit tests, A/B testing, feedback loops
  - Example: analytics agent with built-in metrics
  - Tools: nao framework reference, Claude Code hooks integration
- **AI Ecosystem**: Section 8.2 Domain-Specific Agent Frameworks
  - nao (Analytics Agents): Database-agnostic, built-in evaluation
  - Transposable patterns: context builder, evaluation hooks, DB integrations
- **Template**: Analytics Agent with Evaluation (5 files, ~1K lines)
  - README: setup, usage, troubleshooting
  - Agent: SQL generator with evaluation criteria, safety rules
  - Hook: automated metrics logging (safety, performance, errors)
  - Script: analysis with stats, safety reports, recommendations
  - Report template: monthly evaluation format

## Changed
- Agent Evaluation Guide: updated template references, verified links
- Landing Site: templates count 110 → 114
- Version: 3.23.5 → 3.24.0

Co-Authored-By: Claude Sonnet 4.5
---
 CHANGELOG.md                                  |  48 +-
 README.md                                     |   2 +-
 VERSION                                       |   2 +-
 docs/resource-evaluations/nao-framework.md    | 202 ++++++++
 examples/agents/analytics-with-eval/README.md | 243 ++++++++++
 .../analytics-with-eval/analytics-agent.md    | 315 +++++++++++++
 .../analytics-with-eval/eval/metrics.sh       | 137 ++++++
 .../eval/report-template.md                   | 257 +++++++++++
 .../hooks/post-response-metrics.sh            | 103 +++++
 guide/README.md                               |   1 +
 guide/agent-evaluation.md                     | 436 ++++++++++++++++++
 guide/ai-ecosystem.md                         |  26 ++
 guide/cheatsheet.md                           |   4 +-
 guide/ultimate-guide.md                       |   6 +-
 machine-readable/reference.yaml               |  16 +-
 15 files changed, 1782 insertions(+), 16 deletions(-)
 create mode 100644 docs/resource-evaluations/nao-framework.md
 create mode 100644 examples/agents/analytics-with-eval/README.md
 create mode 100644 examples/agents/analytics-with-eval/analytics-agent.md
 create mode 100644 examples/agents/analytics-with-eval/eval/metrics.sh
 create mode 100644 examples/agents/analytics-with-eval/eval/report-template.md
 create mode 100644 examples/agents/analytics-with-eval/hooks/post-response-metrics.sh
 create mode 100644 guide/agent-evaluation.md

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 18e318e..c7f91a2 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -8,7 +8,53 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
-
+## [3.24.0] - 2026-02-10
+
+### Added
+
+- **Resource Evaluation**: nao framework (`docs/resource-evaluations/nao-framework.md`)
+  - Evaluated open-source framework for building analytics agents
+  - Score: 3/5 (Moderate - Useful Complement)
+  - Identified critical gap: Agent evaluation not documented in guide
+  - Technical challenge by technical-writer agent adjusted score from 2/5 to 3/5
+  - All technical claims fact-checked (TypeScript 58.9%, Python 38.5%, stack verified)
+
+- **New Guide Section**: Agent Evaluation (`guide/agent-evaluation.md`, ~3000 tokens)
+  - **Why Evaluate Agents**: Quantify quality, compare configurations, build feedback loops
+  - **Metrics to Track**: Response quality, tool usage, performance, user satisfaction
+  - **Implementation Patterns**: Logging hooks, unit tests, A/B testing, feedback loops
+  - **Example**: Analytics agent with built-in metrics collection
+  - **Tools & References**: nao framework as reference, Claude Code hooks integration
+  - Addresses critical gap identified in nao evaluation
+  - Navigation: After `guide/ultimate-guide.md` Section 4 (Agents)
+
+- **AI Ecosystem Update**: Section 8.2 Domain-Specific Agent Frameworks (`guide/ai-ecosystem.md`)
+  - New subsection after "Multi-Agent Orchestration Systems"
+  - **nao (Analytics Agents)**: Database-agnostic framework with built-in evaluation
+  - Transposable patterns: Context builder architecture, evaluation hooks, database integrations
+  - Links to new `guide/agent-evaluation.md` for implementation details
+  - Location: guide/ai-ecosystem.md lines 1612-1638
+
+- **Template**: Analytics Agent with Evaluation (`examples/agents/analytics-with-eval/`, 5 files)
+  - **README.md**: Complete setup, usage, troubleshooting (production-ready)
+  - **analytics-agent.md**: SQL query generator with evaluation criteria and safety rules
+  - **hooks/post-response-metrics.sh**: Automated metrics logging (safety, performance, errors)
+  - **eval/metrics.sh**: Analysis script for aggregating collected metrics
+  - **eval/report-template.md**: Monthly evaluation report template
+  - Demonstrates patterns from `guide/agent-evaluation.md` in complete implementation
+  - Includes safety checks (destructive operations), performance monitoring, feedback loops
+
+### Changed
+
+- **Agent Evaluation Guide**: Updated template reference (line 434)
+  - Changed "(coming soon)" to "with hooks, scripts, and report template"
+  - Added reference to complete template in "Example" section (line 277)
+  - All links verified and functional
+
+- **Landing Site**: Templates count synchronized
+  - Updated index.html: 110 → 114 templates
+  - Updated examples/index.html: 110 → 114 templates
+  - Reflects addition of analytics-with-eval template (5 new files)
 ## [3.23.5] - 2026-02-10
diff --git a/README.md b/README.md
index 6560bf7..e1ad333 100644
--- a/README.md
+++ b/README.md
@@ -509,7 +509,7 @@ See [CONTRIBUTING.md](./CONTRIBUTING.md) for guidelines.
 ---
-*Version 3.23.5 | February 2026 | Crafted with Claude*
+*Version 3.24.0 | February 2026 | Crafted with Claude*