claude-code-ultimate-guide/docs/resource-evaluations/nao-framework.md

# Resource Evaluation: nao Framework

**URL**: https://github.com/getnao/nao/
**Type**: Open-source framework for building analytics agents
**Evaluation Date**: 2026-02-10
**Evaluator**: Claude Code (with technical-writer challenge)
**Target Guide**: Claude Code Ultimate Guide

---

## Summary

**nao** is an open-source framework for building and deploying analytics agents with a two-step architecture: build agent context via CLI tools, then deploy chat UI for end users to query data conversationally.

**Key Features**:
- Context builder supporting flexible data, metadata, documentation, and tool integrations
- Database agnostic (PostgreSQL, BigQuery, Snowflake, Databricks)
- Built-in evaluation framework with unit testing capabilities
- Self-hosted deployment with Docker containerization
- Native data visualization within chat interface
- Tech stack: Fastify, Drizzle ORM, tRPC, React, shadcn UI (TypeScript 58.9%, Python 38.5%)

---

## Relevance Score: 3/5 (Moderate - Useful Complement)

### Initial Score: 2/5 → Adjusted to 3/5 after technical challenge

**Justification for 3/5**:

**Relevance (+2 points)**:
- ✅ **Transposable architecture patterns** to Claude Code agents (.claude/agents/)
- ✅ **Evaluation framework** addresses critical gap in guide (no section on agent evaluation)
- ✅ **Database context patterns** applicable to agents with DB integrations

**Limitations (-2 points)**:
- ❌ No direct integration with Claude Code CLI (not a plugin/MCP server)
- ❌ Deployment scope (production chat UI) differs from guide's dev CLI focus

**Partial overlap**: Agent concepts and evaluation patterns are transposable, but final product (deployed analytics UI) is not guide's focus.

---

## Comparative Analysis

| Aspect | nao | Current Guide | Gap? |
|--------|-----|---------------|------|
| **Custom Agents** | ✅ Complete build + deploy framework | ✅ Extensive docs (.claude/agents/) | ➖ No |
| **Agent Architecture** | ➕ Structured context builder pattern | ⚠️ No complex context patterns | ✅ **Minor gap** |
| **Agent Evaluation** | ➕ Integrated framework (metrics, unit tests, feedback) | ❌ No mention | ✅ **Critical gap** |
| **Database Integrations** | ➕ Native support: BigQuery, Snowflake, Databricks, PostgreSQL | ✅ Mentioned (database-branch-setup.md) | ➖ No |
| **Agent Deployment** | ➕ Deployable chat UI with visualizations | ❌ Not covered (CLI focus) | ⚠️ Out of scope |
| **Analytics-Specific** | ➕ Specialized for conversational analytics | ❌ No analytics focus | ➖ No |
| **Context Management** | ✅ Context builder with docs/metadata | ✅ CLAUDE.md, .claude/rules/ | ➖ No |
| **Self-Hosted** | ✅ Docker + deployment guides | ✅ Mentioned (security-hardening.md) | ➖ No |

---

## Integration Recommendations

**Score 3/5 → Integrate (3 approaches)**

### ✅ Priority 1: Dedicated Section (Recommended)

**Create**: `guide/agent-evaluation.md` (~800 tokens)

**Content**:
- **Why Evaluate?**: Measure quality, track usage, identify bottlenecks
- **Metrics to Track**: Response time, tool call success rate, context relevance, error rate
- **Implementation Patterns**: Logging hooks, metrics aggregation, A/B testing, feedback collection
- **Reference**: Mention nao as complete evaluation framework

**Estimated tokens**: 500-800
**Suggested timeline**: Week 1
**Insertion point**: After `guide/ultimate-guide.md` section 4 (Agents)

---

### ✅ Priority 2: Agent Template with Evaluation (Optional)

**Create**: `examples/agents/analytics-with-eval/`

**Structure**:
```
analytics-with-eval/
├── config.yaml          # Agent config
├── context.md           # Agent instructions
├── eval/
│   ├── metrics.sh       # Collect metrics
│   └── report.md        # Example report
└── hooks/
    └── post_response.sh # Log response metrics
```

**Estimated tokens**: 300-400 (code + README)
**Suggested timeline**: Week 2-3
**Value**: Demonstrates concrete evaluation patterns

---

### ✅ Priority 3: Ecosystem Mention (Minimal)

**Add**: Section "Domain-Specific Agent Frameworks" in `guide/ai-ecosystem.md`

**Content**: 1 paragraph + link to nao

**Estimated tokens**: 100-150
**Suggested timeline**: Immediate
**Suggested line**: After orchestration frameworks section (~line 2200)

---

## Technical Challenge Results

The technical-writer agent identified several biases in the initial evaluation:

### Missed Points in Initial Evaluation

1. **Transposable agent architecture**: nao's `context builder` pattern applicable to Claude Code agents
2. **Evaluation framework = critical gap**: Guide has NO mention of agent evaluation (metrics, testing, feedback)
3. **Database context patterns**: Patterns for context injection from databases not documented

### Score Justification

**Correction**: 2/5 → **3/5** (Moderate)

**Arguments for 3/5**:
- Transposable architecture patterns (+1)
- Evaluation framework addresses identified gap (+1)
- Usable database context patterns (+0.5)
- Open-source, well-documented (+0.5)

### Gap "Agent Evaluation" Must Be Addressed

**YES, the guide MUST have this section** because:
- Devs create agents without knowing how to measure quality
- Anthropic docs mention evaluations but not in Claude Code context
- nao proves this is feasible and useful (production-ready)

### Risks of Non-Integration

1. **Evaluation gap remains undocumented** → Devs don't know how to measure agent quality
2. **Database context patterns undocumented** → Devs reinvent already-proven patterns
3. **Loss of credibility** → If evaluation becomes standard, guide will be behind

### Why Initial Evaluation Was Biased

1. **Confusion between scope and relevance**: Different scope ≠ not relevant
2. **Focus on final product**: Evaluated nao as *competing product*, not *pattern source*
3. **Underestimation of gaps**: Agent evaluation = critical gap not previously identified
4. **Premature rejection**: "Don't integrate" despite identifying 2 major gaps

**Lesson**: Evaluate resources for *transposable patterns*, not just direct integration.

---

## Fact-Check Results

All technical claims verified by re-fetching GitHub repository:

| Claim | Verified | Source |
|-------|----------|--------|
| TypeScript 58.9%, Python 38.5% | ✅ | Repository footer |
| Stack: Fastify, Drizzle, tRPC, React, shadcn, TanStack Query | ✅ | "Stack" section |
| Databases: PostgreSQL, BigQuery, Snowflake, Databricks | ✅ | Repository topics |
| Evaluation framework with unit testing | ✅ | "Evaluation framework" section |
| GitHub repo integration + Slack bot | ✅ | Quickstart + topics |
| Docker containerization + self-hosted | ✅ | "Docker" section + docs |

**Corrections made**: None (all initial claims were accurate)

**Stats requiring external research**: None (all verifiable on GitHub page)

---

## Final Decision

- **Final Score**: **3/5 (Moderate - Useful Complement)**
- **Action**: **Integrate** via 3 approaches (priority 1 + 2 + 3)
- **Confidence**: **High** (all claims fact-checked ✅)

### Concrete Action Plan

1. **Immediate** (today): Add mention in `guide/ai-ecosystem.md` section "Domain-Specific Agent Frameworks"
2. **Week 1**: Create `guide/agent-evaluation.md` with patterns inspired by nao
3. **Week 2-3**: Create template `examples/agents/analytics-with-eval/` with metrics

### Added Value for Guide

- ✅ Addresses critical gap (agent evaluation)
- ✅ Adds transposable patterns (context builder, DB integrations)
- ✅ Demonstrates complete lifecycle (build → eval → iterate)
- ✅ References production-ready framework in specific domain

---

## Metadata

**Evaluation performed with**: WebFetch (2×), Grep (3×), Task (technical-writer challenge)
**Evaluation time**: ~15 minutes
**Quality**: Complete challenge + fact-check ✅
**Follow-up**: Implement integration recommendations (A, B, C)