claude-code-ultimate-guide/examples/agents/analytics-with-eval/README.md
Florian BRUNIAUX ef7cdd899e release: v3.24.0 - Agent Evaluation Framework
Major addition: Complete agent evaluation framework with production-ready template.

## Added

- **Resource Evaluation**: nao framework (score 3/5)
  - Identified critical gap: agent evaluation not documented
  - Technical challenge adjusted score 2/5 → 3/5
  - All claims fact-checked (TypeScript 58.9%, Python 38.5%)

- **Guide Section**: Agent Evaluation (guide/agent-evaluation.md, ~3K tokens)
  - Metrics: response quality, tool usage, performance, satisfaction
  - Patterns: logging hooks, unit tests, A/B testing, feedback loops
  - Example: analytics agent with built-in metrics
  - Tools: nao framework reference, Claude Code hooks integration

- **AI Ecosystem**: Section 8.2 Domain-Specific Agent Frameworks
  - nao (Analytics Agents): Database-agnostic, built-in evaluation
  - Transposable patterns: context builder, evaluation hooks, DB integrations

- **Template**: Analytics Agent with Evaluation (5 files, ~1K lines)
  - README: setup, usage, troubleshooting
  - Agent: SQL generator with evaluation criteria, safety rules
  - Hook: automated metrics logging (safety, performance, errors)
  - Script: analysis with stats, safety reports, recommendations
  - Report template: monthly evaluation format

## Changed

- Agent Evaluation Guide: updated template references, verified links
- Landing Site: templates count 110 → 114
- Version: 3.23.5 → 3.24.0

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-10 11:52:13 +01:00

243 lines
5 KiB
Markdown

# Analytics Agent with Built-in Evaluation
**Template**: Production-ready analytics agent with automated metrics collection
**Use Case**: SQL query generation with quality tracking, performance monitoring, and safety validation
**Related Guide**: [Agent Evaluation](../../../guide/agent-evaluation.md)
---
## What's Included
| File | Purpose |
|------|---------|
| `analytics-agent.md` | Agent definition with evaluation criteria |
| `hooks/post-response-metrics.sh` | Automated metrics logging after each response |
| `eval/metrics.sh` | Analysis script for aggregating collected metrics |
| `eval/report-template.md` | Template for monthly evaluation reports |
---
## Setup
### 1. Copy Agent to Project
```bash
# Copy agent definition
cp analytics-agent.md ~/.claude/agents/
# Or for project-specific:
cp analytics-agent.md /path/to/project/.claude/agents/
```
### 2. Install Hook
```bash
# Copy hook to project
cp hooks/post-response-metrics.sh /path/to/project/.claude/hooks/
# Make executable
chmod +x /path/to/project/.claude/hooks/post-response-metrics.sh
```
### 3. Configure Hook Trigger
Add to `.claude/settings.json`:
```json
{
"hooks": {
"postToolUse": [
{
"command": ".claude/hooks/post-response-metrics.sh",
"enabled": true,
"description": "Log analytics agent metrics"
}
]
}
}
```
### 4. Create Logs Directory
```bash
mkdir -p /path/to/project/.claude/logs
```
---
## Usage
### Invoke Agent
```bash
# In Claude Code session
"Use analytics-agent to generate a SQL query for [task description]"
```
### Check Metrics
```bash
# View raw metrics log
cat .claude/logs/analytics-metrics.jsonl
# Run analysis
./examples/agents/analytics-with-eval/eval/metrics.sh
```
### Generate Monthly Report
```bash
# Copy template
cp eval/report-template.md reports/analytics-2026-02.md
# Fill in metrics from metrics.sh output
```
---
## Metrics Collected
The hook automatically logs:
| Metric | Description | Source |
|--------|-------------|--------|
| `timestamp` | ISO 8601 timestamp | System |
| `query` | Generated SQL query | Agent response |
| `exec_time` | Query execution time | Database |
| `safety` | PASS/FAIL for destructive ops | Query analysis |
| `row_count` | Rows returned | Database |
| `error` | Error message if query failed | Database |
**Log format**: JSONL (JSON Lines) in `.claude/logs/analytics-metrics.jsonl`
---
## Example Metrics Output
```bash
$ ./eval/metrics.sh
=== Analytics Agent Metrics Report ===
Period: 2026-02-01 to 2026-02-10
Total queries: 45
Safety checks:
- PASS: 42 (93%)
- FAIL: 3 (7%)
Execution time:
- Mean: 2.3s
- Median: 1.8s
- P95: 5.2s
- P99: 8.1s
Common failures:
1. DELETE without WHERE clause (2 occurrences)
2. DROP TABLE in query (1 occurrence)
Recommendations:
- Review agent instructions to emphasize WHERE clause requirement
- Add explicit prohibition on DROP operations
```
---
## Customization
### Modify Safety Checks
Edit `hooks/post-response-metrics.sh` line 12-16:
```bash
# Add more patterns
if echo "$QUERY" | grep -iE 'DELETE|DROP|TRUNCATE|ALTER'; then
SAFETY="FAIL"
fi
```
### Add Custom Metrics
Extend the JSON log structure:
```bash
echo "{
\"timestamp\":\"$(date -Iseconds)\",
\"query\":\"$QUERY\",
\"your_metric\":\"$VALUE\"
}" >> .claude/logs/analytics-metrics.jsonl
```
### Change Log Location
Update `POST_RESPONSE_LOG` variable in hook script.
---
## Troubleshooting
### Hook Not Triggering
**Check**:
1. Hook is executable: `ls -l .claude/hooks/*.sh`
2. Hook is enabled in settings.json
3. Agent name matches: `analytics-agent` (hyphenated)
**Debug**:
```bash
# Test hook manually
export CLAUDE_RESPONSE='{"content":"SELECT * FROM users;"}'
./.claude/hooks/post-response-metrics.sh
```
### No Metrics in Log
**Check**:
1. Log directory exists: `mkdir -p .claude/logs`
2. Write permissions: `touch .claude/logs/test.log`
3. Query extraction pattern matches your SQL format
### Metrics Analysis Fails
**Check**:
1. `jq` is installed: `which jq`
2. Log file is valid JSONL: `jq . .claude/logs/analytics-metrics.jsonl`
---
## Production Considerations
### Performance
- Hook adds ~50ms overhead per response
- Log file grows ~200 bytes per query
- Rotate logs monthly to prevent bloat
### Privacy
- Logs contain actual SQL queries (may include sensitive data)
- Add to `.gitignore`: `.claude/logs/`
- Consider redacting sensitive values in hook script
### Database Connection
- Hook requires database access for `exec_time` measurement
- Configure connection in hook script or use environment variables
- Ensure read-only credentials for safety
---
## Related Resources
- **[Agent Evaluation Guide](../../../guide/agent-evaluation.md)**: Complete evaluation methodology
- **[Hooks Documentation](../../../guide/ultimate-guide.md#5-hooks)**: Hook system reference
- **[nao Framework](https://github.com/getnao/nao/)**: Production analytics agent framework (inspiration)
---
## License
Same as parent repository (MIT)
**Questions?** Open an issue or discussion in the main repository.