docs: add E2E testing workflow to CLAUDE.md and update guide with MULTICA_API_URL

Add agent-driven E2E testing section to CLAUDE.md so all team members'
Coding Agents automatically know how to run and analyze E2E tests.
Update guide with MULTICA_API_URL requirement discovered during testing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Jiayuan Zhang 2026-02-15 16:23:37 +08:00
parent 496eda82d7
commit a823e391b9
2 changed files with 61 additions and 11 deletions

@ -170,19 +170,64 @@ Fonts are loaded via `@fontsource` packages (not Google Fonts) for cross-platfor
The agent engine supports structured run logging for debugging. When enabled, it writes all key execution events to `~/.super-multica/sessions/{sessionId}/run-log.jsonl` alongside the session data.
```bash
# Enable via environment variable
MULTICA_RUN_LOG=1 pnpm multica run "your prompt"
# Enable via CLI flag
pnpm multica run --run-log "your prompt"
# Enable during tests
MULTICA_RUN_LOG=1 pnpm --filter @multica/core test
# Or via environment variable
MULTICA_RUN_LOG=1 pnpm multica run "your prompt"
# Or programmatically (TypeScript, not a shell command):
#   const agent = new Agent({ enableRunLog: true });
```
Logged events: `run_start`, `run_end`, `llm_call`, `llm_result`, `tool_start`, `tool_end`, `context_overflow`, `auth_rotate`, `error_classify`, `preflight_compact_start/end`, `compaction`.
When `--run-log` is enabled, the CLI prints the session directory path to stderr:
```
[session: 019c584a-...]
[session-dir: ~/.super-multica/sessions/019c584a-...]
```
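When a script drives the CLI, the two stderr lines shown above can be picked out with a small parser. The following is a minimal sketch, assuming the lines appear exactly in the `[session: ...]` and `[session-dir: ...]` form; `parseSessionInfo` and `SessionInfo` are hypothetical names and are not part of the Multica codebase.

```typescript
// Hypothetical helper (not part of Multica): extracts the session ID and
// session directory from captured stderr text, assuming the bracketed
// "[session: ...]" / "[session-dir: ...]" line format shown above.
interface SessionInfo {
  sessionId: string | null;
  sessionDir: string | null;
}

function parseSessionInfo(stderr: string): SessionInfo {
  const info: SessionInfo = { sessionId: null, sessionDir: null };
  for (const rawLine of stderr.split("\n")) {
    const match = /^\[(session|session-dir):\s*(.+)\]$/.exec(rawLine.trim());
    if (match === null) continue;
    const key = match[1];
    const value = match[2];
    if (value === undefined) continue;
    if (key === "session") {
      info.sessionId = value;
    } else {
      info.sessionDir = value;
    }
  }
  return info;
}
```

The returned `sessionId` can then be passed to `--session` for a follow-up turn. Lines that do not match the bracketed form are ignored.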
Each line is a JSON object with `ts` (timestamp) and `event` (type), suitable for AI-assisted log analysis. Implementation: `packages/core/src/agent/run-log.ts`.
Logged events: `run_start`, `run_end`, `llm_call`, `llm_result`, `tool_start`, `tool_end`, `context_overflow`, `auth_rotate`, `error_classify`, `preflight_compact_start/end`, `tool_result_pruning`, `compaction`, `compaction_detail`.
Each line is a JSON object with `ts` (timestamp) and `event` (type), suitable for AI-assisted log analysis. Full event reference: `packages/core/src/agent/run-log.ts`.
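Because each line is an independent JSON object, the log can be parsed line by line. This is a minimal sketch that relies only on the documented `ts` and `event` fields; the `RunLogEvent` type and the `parseRunLog` and `countByEvent` helpers are hypothetical, and the type of `ts` is left open because its format is not specified here.

```typescript
// Hypothetical types and helpers for illustration; only `ts` and `event`
// are documented fields, so everything else is kept as unknown.
interface RunLogEvent {
  ts: unknown;
  event: string;
  [key: string]: unknown;
}

// Parses the text content of a run-log.jsonl file into event objects.
function parseRunLog(jsonl: string): RunLogEvent[] {
  const events: RunLogEvent[] = [];
  for (const rawLine of jsonl.split("\n")) {
    const line = rawLine.trim();
    if (line === "") continue; // tolerate blank lines and a trailing newline
    const parsed: unknown = JSON.parse(line);
    if (typeof parsed !== "object" || parsed === null) {
      throw new Error("run-log line is not a JSON object");
    }
    const record = parsed as Record<string, unknown>;
    const event = record["event"];
    if (typeof event !== "string" || !("ts" in record)) {
      throw new Error("run-log line is missing `ts` or `event`");
    }
    events.push({ ...record, ts: record["ts"], event });
  }
  return events;
}

// Tallies how many times each event type occurs.
function countByEvent(events: RunLogEvent[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const e of events) {
    counts[e.event] = (counts[e.event] ?? 0) + 1;
  }
  return counts;
}
```

A per-event tally is a quick first look at a run before reading individual entries.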
## E2E Testing (Agent-Driven)
E2E tests are executed and analyzed by the Coding Agent (Claude Code), not by vitest. The Coding Agent runs the Multica agent via CLI, reads the structured run-log, and intelligently analyzes intermediate behavior and results.
### How to Run
```bash
# Basic E2E test (web_search/data tools require MULTICA_API_URL)
MULTICA_API_URL=https://api-dev.copilothub.ai pnpm multica run --run-log "your test prompt"
# With specific provider
MULTICA_API_URL=https://api-dev.copilothub.ai pnpm multica run --run-log --provider kimi-coding "your test prompt"
# Multi-turn test (reuse session)
MULTICA_API_URL=https://api-dev.copilothub.ai pnpm multica run --run-log --session <session-id> "follow-up prompt"
```
### Analysis Workflow
After running, the Coding Agent should:
1. Read `{session-dir}/run-log.jsonl` — structured execution events
2. Read `{session-dir}/session.jsonl` — full conversation transcript (if needed)
3. Analyze event sequence, tool calls, errors, and timing
4. Report findings with verdict (pass/fail + details)
### What to Check
- **Event completeness**: `run_start` → ... → `run_end` (no orphaned starts)
- **Tool pairing**: every `tool_start` has a matching `tool_end`
- **Error handling**: `is_error`, `error_classify`, `auth_rotate` events
- **Compaction health**: `tokens_removed > 0` when compaction fires
- **Performance**: `llm_result.duration_ms`, tool execution times
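The first two checks above can be approximated mechanically before the agent does its deeper analysis. This is a minimal sketch, assuming start and end events are paired by order and count, since the run-log fields that would link a specific `tool_start` to its `tool_end` are not described here. `checkPairing` and `checkRunLog` are hypothetical names.

```typescript
// Hypothetical helpers: count-based pairing checks over parsed run-log events.
interface LoggedEvent {
  event: string;
}

// Reports end events with no open start, and starts that are never closed.
function checkPairing(events: LoggedEvent[], startName: string, endName: string): string[] {
  const problems: string[] = [];
  let open = 0;
  for (const e of events) {
    if (e.event === startName) {
      open += 1;
    } else if (e.event === endName) {
      if (open === 0) {
        problems.push(`${endName} without a preceding ${startName}`);
      } else {
        open -= 1;
      }
    }
  }
  if (open > 0) {
    problems.push(`${open} ${startName} event(s) without a matching ${endName}`);
  }
  return problems;
}

// Runs the completeness and tool-pairing checks and collects all problems.
function checkRunLog(events: LoggedEvent[]): { ok: boolean; problems: string[] } {
  const problems = [
    ...checkPairing(events, "run_start", "run_end"),
    ...checkPairing(events, "tool_start", "tool_end"),
  ];
  return { ok: problems.length === 0, problems };
}
```

An empty `problems` list only means the event sequence is structurally balanced. Error handling, compaction health, and timing still need the agent's judgment.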
### Important
- **`MULTICA_API_URL=https://api-dev.copilothub.ai`** is required for `web_search` and `data` tools. Without it, these tools fail with `MULTICA_API_URL is required`.
- Default provider is `kimi-coding`. Override with `--provider`.
- Detailed guide with feature-specific test playbooks: `docs/e2e-testing-guide.md`
## Credentials Setup

@ -21,32 +21,37 @@ This approach is superior to static assertions because:
1. **Credentials configured**: Run `pnpm multica credentials init` or ensure `~/.super-multica/credentials.json5` has valid provider credentials
2. **Available providers**: Check with `pnpm multica profile list` or inspect credentials file
3. **Default provider**: `kimi-coding` (Kimi Code, free tier available). Can override with `--provider`
4. **`MULTICA_API_URL`**: Required for `web_search` and `data` tools. Set to `https://api-dev.copilothub.ai` for dev environment. Without this, web search and financial data tools will fail with `MULTICA_API_URL is required`
## Running a Test
### Basic command
```bash
# For prompts that only need exec/read/write tools:
pnpm multica run --run-log "your test prompt here"
# For prompts that need web_search or data tools (requires API URL):
MULTICA_API_URL=https://api-dev.copilothub.ai pnpm multica run --run-log "your test prompt here"
```
### With provider override
```bash
pnpm multica run --run-log --provider claude-code "your test prompt"
pnpm multica run --run-log --provider kimi-coding "your test prompt"
pnpm multica run --run-log --provider anthropic --api-key sk-ant-... "your test prompt"
MULTICA_API_URL=https://api-dev.copilothub.ai pnpm multica run --run-log --provider claude-code "your test prompt"
MULTICA_API_URL=https://api-dev.copilothub.ai pnpm multica run --run-log --provider kimi-coding "your test prompt"
MULTICA_API_URL=https://api-dev.copilothub.ai pnpm multica run --run-log --provider anthropic --api-key sk-ant-... "your test prompt"
```
### Resume a session (multi-turn testing)
```bash
# First turn
pnpm multica run --run-log "Create a file called test.txt with content 'hello'"
MULTICA_API_URL=https://api-dev.copilothub.ai pnpm multica run --run-log "Create a file called test.txt with content 'hello'"
# Note the session ID from stderr output: [session: 019c584a-...]
# Second turn (same session)
pnpm multica run --run-log --session 019c584a-... "Read the file test.txt and tell me its content"
MULTICA_API_URL=https://api-dev.copilothub.ai pnpm multica run --run-log --session 019c584a-... "Read the file test.txt and tell me its content"
```
### Output