diff --git a/CLAUDE.md b/CLAUDE.md
index e08524ea..d32c7d78 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -170,19 +170,64 @@ Fonts are loaded via `@fontsource` packages (not Google Fonts) for cross-platfor
 The agent engine supports structured run logging for debugging. When enabled, it writes all key execution events to `~/.super-multica/sessions/{sessionId}/run-log.jsonl` alongside the session data.
 
 ```bash
-# Enable via environment variable
-MULTICA_RUN_LOG=1 pnpm multica run "your prompt"
+# Enable via CLI flag
+pnpm multica run --run-log "your prompt"
 
-# Enable during tests
-MULTICA_RUN_LOG=1 pnpm --filter @multica/core test
+# Or via environment variable
+MULTICA_RUN_LOG=1 pnpm multica run "your prompt"
 
 # Or programmatically
 const agent = new Agent({ enableRunLog: true });
 ```
 
-Logged events: `run_start`, `run_end`, `llm_call`, `llm_result`, `tool_start`, `tool_end`, `context_overflow`, `auth_rotate`, `error_classify`, `preflight_compact_start/end`, `compaction`.
+When `--run-log` is enabled, the CLI prints the session directory path to stderr:
+```
+[session: 019c584a-...]
+[session-dir: ~/.super-multica/sessions/019c584a-...]
+```
 
-Each line is a JSON object with `ts` (timestamp) and `event` (type), suitable for AI-assisted log analysis. Implementation: `packages/core/src/agent/run-log.ts`.
+Logged events: `run_start`, `run_end`, `llm_call`, `llm_result`, `tool_start`, `tool_end`, `context_overflow`, `auth_rotate`, `error_classify`, `preflight_compact_start/end`, `tool_result_pruning`, `compaction`, `compaction_detail`.
+
+Each line is a JSON object with `ts` (timestamp) and `event` (type), suitable for AI-assisted log analysis. Full event reference: `packages/core/src/agent/run-log.ts`.
+
+## E2E Testing (Agent-Driven)
+
+E2E tests are executed and analyzed by the Coding Agent (Claude Code), not by vitest. The Coding Agent runs the Multica agent via CLI, reads the structured run-log, and intelligently analyzes intermediate behavior and results.
+
+### How to Run
+
+```bash
+# Basic E2E test (web_search/data tools require MULTICA_API_URL)
+MULTICA_API_URL=https://api-dev.copilothub.ai pnpm multica run --run-log "your test prompt"
+
+# With specific provider
+MULTICA_API_URL=https://api-dev.copilothub.ai pnpm multica run --run-log --provider kimi-coding "your test prompt"
+
+# Multi-turn test (reuse session)
+MULTICA_API_URL=https://api-dev.copilothub.ai pnpm multica run --run-log --session <session-id> "follow-up prompt"
+```
+
+### Analysis Workflow
+
+After running, the Coding Agent should:
+1. Read `{session-dir}/run-log.jsonl`: structured execution events
+2. Read `{session-dir}/session.jsonl`: full conversation transcript (if needed)
+3. Analyze event sequence, tool calls, errors, and timing
+4. Report findings with verdict (pass/fail + details)
+
+### What to Check
+
+- **Event completeness**: `run_start` → ... → `run_end` (no orphaned starts)
+- **Tool pairing**: every `tool_start` has a matching `tool_end`
+- **Error handling**: `is_error`, `error_classify`, `auth_rotate` events
+- **Compaction health**: `tokens_removed > 0` when compaction fires
+- **Performance**: `llm_result.duration_ms`, tool execution times
+
+### Important
+
+- **`MULTICA_API_URL=https://api-dev.copilothub.ai`** is required for `web_search` and `data` tools. Without it, these tools fail with `MULTICA_API_URL is required`.
+- Default provider is `kimi-coding`. Override with `--provider`.
+- Detailed guide with feature-specific test playbooks: `docs/e2e-testing-guide.md`
 
 ## Credentials Setup
 
diff --git a/docs/e2e-testing-guide.md b/docs/e2e-testing-guide.md
index b83350a5..9deacb6e 100644
--- a/docs/e2e-testing-guide.md
+++ b/docs/e2e-testing-guide.md
@@ -21,32 +21,37 @@ This approach is superior to static assertions because:
 1. **Credentials configured**: Run `pnpm multica credentials init` or ensure `~/.super-multica/credentials.json5` has valid provider credentials
 2. **Available providers**: Check with `pnpm multica profile list` or inspect credentials file
 3. **Default provider**: `kimi-coding` (Kimi Code, free tier available). Can override with `--provider`
+4. **`MULTICA_API_URL`**: Required for `web_search` and `data` tools. Set to `https://api-dev.copilothub.ai` for dev environment. Without this, web search and financial data tools will fail with `MULTICA_API_URL is required`
 
 ## Running a Test
 
 ### Basic command
 
 ```bash
+# For prompts that only need exec/read/write tools:
 pnpm multica run --run-log "your test prompt here"
+
+# For prompts that need web_search or data tools (requires API URL):
+MULTICA_API_URL=https://api-dev.copilothub.ai pnpm multica run --run-log "your test prompt here"
 ```
 
 ### With provider override
 
 ```bash
-pnpm multica run --run-log --provider claude-code "your test prompt"
-pnpm multica run --run-log --provider kimi-coding "your test prompt"
-pnpm multica run --run-log --provider anthropic --api-key sk-ant-... "your test prompt"
+MULTICA_API_URL=https://api-dev.copilothub.ai pnpm multica run --run-log --provider claude-code "your test prompt"
+MULTICA_API_URL=https://api-dev.copilothub.ai pnpm multica run --run-log --provider kimi-coding "your test prompt"
+MULTICA_API_URL=https://api-dev.copilothub.ai pnpm multica run --run-log --provider anthropic --api-key sk-ant-... "your test prompt"
 ```
 
 ### Resume a session (multi-turn testing)
 
 ```bash
 # First turn
-pnpm multica run --run-log "Create a file called test.txt with content 'hello'"
+MULTICA_API_URL=https://api-dev.copilothub.ai pnpm multica run --run-log "Create a file called test.txt with content 'hello'"
 # Note the session ID from stderr output: [session: 019c584a-...]
 
 # Second turn (same session)
-pnpm multica run --run-log --session 019c584a-... "Read the file test.txt and tell me its content"
+MULTICA_API_URL=https://api-dev.copilothub.ai pnpm multica run --run-log --session 019c584a-... "Read the file test.txt and tell me its content"
 ```
 
 ### Output