release: v3.24.0 - Agent Evaluation Framework

Major addition: Complete agent evaluation framework with production-ready template.

## Added

- **Resource Evaluation**: nao framework (score 3/5)
  - Identified critical gap: agent evaluation not documented
  - Technical challenge adjusted score 2/5 → 3/5
  - All claims fact-checked (TypeScript 58.9%, Python 38.5%)

- **Guide Section**: Agent Evaluation (guide/agent-evaluation.md, ~3K tokens)
  - Metrics: response quality, tool usage, performance, satisfaction
  - Patterns: logging hooks, unit tests, A/B testing, feedback loops
  - Example: analytics agent with built-in metrics
  - Tools: nao framework reference, Claude Code hooks integration

- **AI Ecosystem**: Section 8.2 Domain-Specific Agent Frameworks
  - nao (Analytics Agents): Database-agnostic, built-in evaluation
  - Transposable patterns: context builder, evaluation hooks, DB integrations

- **Template**: Analytics Agent with Evaluation (5 files, ~1K lines)
  - README: setup, usage, troubleshooting
  - Agent: SQL generator with evaluation criteria, safety rules
  - Hook: automated metrics logging (safety, performance, errors)
  - Script: analysis with stats, safety reports, recommendations
  - Report template: monthly evaluation format

## Changed

- Agent Evaluation Guide: updated template references, verified links
- Landing Site: templates count 110 → 114
- Version: 3.23.5 → 3.24.0

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-10 11:52:13 +01:00
commit ef7cdd899e
parent 1fb783ebb8
15 changed files with 1782 additions and 16 deletions
--- a/guide/ai-ecosystem.md
+++ b/guide/ai-ecosystem.md
@@ -1609,6 +1609,32 @@ If you're not using Gas Town/multiclaude, you can still:
 - Experimentation tolerance is high (work may be lost/redone)
 - Team has SRE capacity to monitor/intervene

+### 8.2 Domain-Specific Agent Frameworks
+
+Beyond general-purpose coding assistants, specialized frameworks target specific use cases with built-in context, evaluation, and deployment patterns.
+
+#### nao (Analytics Agents)
+
+**URL**: [github.com/getnao/nao](https://github.com/getnao/nao/) | **Stack**: TypeScript 58.9%, Python 38.5%
+
+**What it is**: Open-source framework for building and deploying analytics agents. Two-step architecture: build agent context via CLI (databases, docs, metadata) → deploy chat UI for natural language data queries.
+
+**Key features**:
+- Database agnostic (PostgreSQL, BigQuery, Snowflake, Databricks)
+- Built-in evaluation framework with unit testing
+- Native data visualization in chat interface
+- Self-hosted deployment with Docker
+- Stack: Fastify, Drizzle ORM, tRPC, React, shadcn UI
+
+**Relevance to Claude Code**: While nao deploys agents as standalone services (not Claude Code plugins), its patterns are transposable:
+- **Context builder architecture**: Structuring complex agent context (similar to `.claude/agents/` best practices)
+- **Evaluation framework**: Measuring agent quality through metrics, unit tests, and feedback loops (gap in current Claude Code workflows)
+- **Database integrations**: Patterns for injecting database context into agent prompts
+
+**When to use**: Data teams building conversational analytics interfaces for business users. For Claude Code users, nao serves as reference architecture for agent evaluation and database context patterns.
+
+**Status**: Active open-source project, production-ready, well-documented
+
 ---

 ## 9. Cost & Subscription Strategy