docs: add E2E testing workflow to CLAUDE.md and update guide with MULTICA_API_URL

Add agent-driven E2E testing section to CLAUDE.md so all team members' Coding Agents automatically know how to run and analyze E2E tests. Update guide with MULTICA_API_URL requirement discovered during testing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

parent 496eda82d7
commit a823e391b9
2 changed files with 61 additions and 11 deletions
CLAUDE.md (57 changed lines)
@@ -170,19 +170,64 @@ Fonts are loaded via `@fontsource` packages (not Google Fonts) for cross-platfor
 The agent engine supports structured run logging for debugging. When enabled, it writes all key execution events to `~/.super-multica/sessions/{sessionId}/run-log.jsonl` alongside the session data.

 ```bash
-# Enable via environment variable
-MULTICA_RUN_LOG=1 pnpm multica run "your prompt"
+# Enable via CLI flag
+pnpm multica run --run-log "your prompt"

-# Enable during tests
-MULTICA_RUN_LOG=1 pnpm --filter @multica/core test
+# Or via environment variable
+MULTICA_RUN_LOG=1 pnpm multica run "your prompt"

 # Or programmatically
 const agent = new Agent({ enableRunLog: true });
 ```

-Logged events: `run_start`, `run_end`, `llm_call`, `llm_result`, `tool_start`, `tool_end`, `context_overflow`, `auth_rotate`, `error_classify`, `preflight_compact_start/end`, `compaction`.
+When `--run-log` is enabled, the CLI prints the session directory path to stderr:
+```
+[session: 019c584a-...]
+[session-dir: ~/.super-multica/sessions/019c584a-...]
+```

-Each line is a JSON object with `ts` (timestamp) and `event` (type), suitable for AI-assisted log analysis. Implementation: `packages/core/src/agent/run-log.ts`.
+Logged events: `run_start`, `run_end`, `llm_call`, `llm_result`, `tool_start`, `tool_end`, `context_overflow`, `auth_rotate`, `error_classify`, `preflight_compact_start/end`, `tool_result_pruning`, `compaction`, `compaction_detail`.
+
+Each line is a JSON object with `ts` (timestamp) and `event` (type), suitable for AI-assisted log analysis. Full event reference: `packages/core/src/agent/run-log.ts`.
+
+## E2E Testing (Agent-Driven)
+
+E2E tests are executed and analyzed by the Coding Agent (Claude Code), not by vitest. The Coding Agent runs the Multica agent via CLI, reads the structured run-log, and intelligently analyzes intermediate behavior and results.
+
+### How to Run
+
+```bash
+# Basic E2E test (web_search/data tools require MULTICA_API_URL)
+MULTICA_API_URL=https://api-dev.copilothub.ai pnpm multica run --run-log "your test prompt"
+
+# With specific provider
+MULTICA_API_URL=https://api-dev.copilothub.ai pnpm multica run --run-log --provider kimi-coding "your test prompt"
+
+# Multi-turn test (reuse session)
+MULTICA_API_URL=https://api-dev.copilothub.ai pnpm multica run --run-log --session <session-id> "follow-up prompt"
+```
+
+### Analysis Workflow
+
+After running, the Coding Agent should:
+1. Read `{session-dir}/run-log.jsonl` — structured execution events
+2. Read `{session-dir}/session.jsonl` — full conversation transcript (if needed)
+3. Analyze event sequence, tool calls, errors, and timing
+4. Report findings with verdict (pass/fail + details)
+
+### What to Check
+
+- **Event completeness**: `run_start` → ... → `run_end` (no orphaned starts)
+- **Tool pairing**: every `tool_start` has a matching `tool_end`
+- **Error handling**: `is_error`, `error_classify`, `auth_rotate` events
+- **Compaction health**: `tokens_removed > 0` when compaction fires
+- **Performance**: `llm_result.duration_ms`, tool execution times
+
+### Important
+
+- **`MULTICA_API_URL=https://api-dev.copilothub.ai`** is required for `web_search` and `data` tools. Without it, these tools fail with `MULTICA_API_URL is required`.
+- Default provider is `kimi-coding`. Override with `--provider`.
+- Detailed guide with feature-specific test playbooks: `docs/e2e-testing-guide.md`

 ## Credentials Setup

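Reviewer sketch (not part of the commit): the "What to Check" list added in the hunk above could be partly automated along the following lines. This is a minimal Python illustration under stated assumptions, not the project's tooling. It relies only on what the diff documents: one JSON object per line with an `event` field. Pairing is checked by count because the diff does not document a per-call identifier, and placing `tokens_removed` on the `compaction` event is an assumption. The function name `check_run_log` and the sample `ts` values are hypothetical.

```python
import json
from collections import Counter


def check_run_log(lines):
    """Run basic structural checks over the lines of a run-log.jsonl file."""
    # Parse every non-empty line as one JSON event object.
    events = [json.loads(raw) for raw in lines if raw.strip()]
    counts = Counter(e.get("event") for e in events)
    problems = []

    # Event completeness: every run_start should have a run_end.
    if counts["run_start"] != counts["run_end"]:
        problems.append(
            f"run_start={counts['run_start']} but run_end={counts['run_end']}"
        )

    # Tool pairing, checked by count only (no call ID is assumed).
    if counts["tool_start"] != counts["tool_end"]:
        problems.append(
            f"tool_start={counts['tool_start']} but tool_end={counts['tool_end']}"
        )

    # Compaction health. Assumption: `tokens_removed` is a field of the
    # `compaction` event itself.
    for e in events:
        if e.get("event") == "compaction" and e.get("tokens_removed", 0) <= 0:
            problems.append("compaction fired but removed no tokens")

    return {
        "verdict": "fail" if problems else "pass",
        "counts": dict(counts),
        "problems": problems,
    }


# Illustrative input: a tool_start that never gets its tool_end.
sample = [
    '{"ts": 1, "event": "run_start"}',
    '{"ts": 2, "event": "tool_start"}',
    '{"ts": 3, "event": "run_end"}',
]
print(check_run_log(sample)["verdict"])  # prints "fail"
```

To run it against a real session, pass an open file handle for `{session-dir}/run-log.jsonl`, since iterating a file yields its lines. Count-based pairing cannot tell which call is orphaned; it only flags that one is.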
@@ -21,32 +21,37 @@ This approach is superior to static assertions because:
 1. **Credentials configured**: Run `pnpm multica credentials init` or ensure `~/.super-multica/credentials.json5` has valid provider credentials
 2. **Available providers**: Check with `pnpm multica profile list` or inspect credentials file
 3. **Default provider**: `kimi-coding` (Kimi Code, free tier available). Can override with `--provider`
+4. **`MULTICA_API_URL`**: Required for `web_search` and `data` tools. Set to `https://api-dev.copilothub.ai` for dev environment. Without this, web search and financial data tools will fail with `MULTICA_API_URL is required`

 ## Running a Test

 ### Basic command

 ```bash
+# For prompts that only need exec/read/write tools:
 pnpm multica run --run-log "your test prompt here"
+
+# For prompts that need web_search or data tools (requires API URL):
+MULTICA_API_URL=https://api-dev.copilothub.ai pnpm multica run --run-log "your test prompt here"
 ```

 ### With provider override

 ```bash
-pnpm multica run --run-log --provider claude-code "your test prompt"
-pnpm multica run --run-log --provider kimi-coding "your test prompt"
-pnpm multica run --run-log --provider anthropic --api-key sk-ant-... "your test prompt"
+MULTICA_API_URL=https://api-dev.copilothub.ai pnpm multica run --run-log --provider claude-code "your test prompt"
+MULTICA_API_URL=https://api-dev.copilothub.ai pnpm multica run --run-log --provider kimi-coding "your test prompt"
+MULTICA_API_URL=https://api-dev.copilothub.ai pnpm multica run --run-log --provider anthropic --api-key sk-ant-... "your test prompt"
 ```

 ### Resume a session (multi-turn testing)

 ```bash
 # First turn
-pnpm multica run --run-log "Create a file called test.txt with content 'hello'"
+MULTICA_API_URL=https://api-dev.copilothub.ai pnpm multica run --run-log "Create a file called test.txt with content 'hello'"
 # Note the session ID from stderr output: [session: 019c584a-...]

 # Second turn (same session)
-pnpm multica run --run-log --session 019c584a-... "Read the file test.txt and tell me its content"
+MULTICA_API_URL=https://api-dev.copilothub.ai pnpm multica run --run-log --session 019c584a-... "Read the file test.txt and tell me its content"
 ```

 ### Output
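
Reviewer sketch (not part of the commit): the multi-turn flow in the hunk above asks the reader to note the session ID from stderr by hand. A Coding Agent could extract it instead, roughly as below. This is a minimal Python illustration that assumes the stderr markers look exactly like the `[session: ...]` and `[session-dir: ...]` lines shown in the diff. The helper names `parse_session_markers` and `follow_up_command` are hypothetical, and the session ID in the usage line is a placeholder, not a real ID.

```python
import re

# Matches "[session: <value>]" and "[session-dir: <value>]" on their own line.
MARKER = re.compile(r"^\[(session|session-dir):[ \t]*(.+?)\][ \t]*$", re.MULTILINE)


def parse_session_markers(stderr_text):
    """Return a dict such as {"session": ..., "session-dir": ...}."""
    return {name: value for name, value in MARKER.findall(stderr_text)}


def follow_up_command(session_id, prompt):
    """Build the argv for a second turn using the flags shown in the guide."""
    return ["pnpm", "multica", "run", "--run-log", "--session", session_id, prompt]


# Illustrative stderr capture with a placeholder session ID.
captured = (
    "[session: EXAMPLE-ID]\n"
    "[session-dir: ~/.super-multica/sessions/EXAMPLE-ID]\n"
)
markers = parse_session_markers(captured)
print(follow_up_command(markers["session"], "Read the file test.txt"))
```

The returned list can be passed to `subprocess.run`, with `MULTICA_API_URL` supplied through the `env` argument rather than the command line. The `session-dir` value keeps its leading `~`, so expand it with `os.path.expanduser` before opening `run-log.jsonl`.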