Merge pull request #174 from multica-ai/forrestchang/arch-analysis

test(core): migrate tests to strict mock policy with real implementations

2026-02-14 00:13:24 +08:00

11 KiB

CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

Super Multica is a distributed AI agent framework with a monorepo architecture. It includes an agent engine with multi-provider LLM support, an Electron desktop app with embedded Hub, a WebSocket gateway for remote access, and a Next.js web app.

Monorepo Structure

super-multica/
├── apps/
│   ├── cli/           ← Command-line interface (`@multica/cli`)
│   ├── desktop/       ← Electron + Vite + React (`@multica/desktop`) — primary target
│   ├── gateway/       ← NestJS WebSocket gateway (`@multica/gateway`)
│   ├── server/        ← NestJS REST API server (`@multica/server`)
│   ├── web/           ← Next.js 16 web app (`@multica/web`, port 3000)
│   └── mobile/        ← React Native mobile app (`@multica/mobile`)
│
├── packages/
│   ├── core/          ← Core agent engine, hub, channels (`@multica/core`)
│   ├── sdk/           ← Gateway client SDK (`@multica/sdk`, Socket.io)
│   ├── ui/            ← Shared UI components (`@multica/ui`, Shadcn/Tailwind v4)
│   ├── store/         ← Zustand state management (`@multica/store`)
│   ├── hooks/         ← React hooks (`@multica/hooks`)
│   ├── types/         ← Shared TypeScript types (`@multica/types`)
│   └── utils/         ← Utility functions (`@multica/utils`)
│
└── skills/            ← Bundled agent skills

Common Commands

# Install dependencies
pnpm install

# Multica CLI (unified entry point)
pnpm multica                   # Interactive mode (default)
pnpm multica run "<prompt>"    # Run a single prompt
pnpm multica chat              # Interactive REPL mode
pnpm multica session list      # List sessions
pnpm multica profile list      # List profiles
pnpm multica skills list       # List skills
pnpm multica tools list        # List tools
pnpm multica credentials init  # Initialize credentials
pnpm multica help              # Show help

# Development servers
pnpm dev                       # Desktop app (connects to dev gateway by default)
pnpm dev:desktop               # Same as above
pnpm dev:gateway               # WebSocket gateway only
pnpm dev:web                   # Next.js web app
pnpm dev:all                   # Gateway + web app

# Override gateway URL (e.g. local gateway)
GATEWAY_URL=http://localhost:3000 pnpm dev

# Build
pnpm build                     # Build all (turbo-orchestrated)
pnpm --filter @multica/desktop build
pnpm --filter @multica/core build

# Type checking
pnpm typecheck

# Testing (vitest)
pnpm test                      # Single run
pnpm test:watch                # Watch mode
pnpm test:coverage             # With v8 coverage

Architecture

Desktop App (standalone, recommended)
  └─ Hub (embedded)
     └─ Agent Engine (LLM runner, sessions, skills, tools)
        └─ (Optional) Gateway connection for remote access

Web App (requires Gateway)
  → @multica/sdk (GatewayClient, Socket.io)
    → Gateway (NestJS, WebSocket, port 3000)
      → Hub + Agent Engine

Agent Engine (packages/core/src/agent/): Orchestrates LLM interactions with multi-provider support (OpenAI, Anthropic, DeepSeek, Kimi, Groq, Mistral, Google, Together). Features session management (JSONL-based, UUIDv7 IDs), profile system (~/.super-multica/agent-profiles/), modular skills with hot-reload, and token-aware context window guards.
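Session IDs are described as UUIDv7. How the engine generates them is not shown in this file, so the following is only a minimal sketch of the UUIDv7 layout from RFC 9562 (48-bit millisecond timestamp, version nibble 7, variant bits 10, random remainder). The function name `uuidv7` is hypothetical and not taken from the codebase.

```typescript
import { randomBytes } from "node:crypto";

// Hypothetical helper (not from the repository): builds a UUIDv7 string
// following the RFC 9562 field layout.
function uuidv7(nowMs: number = Date.now()): string {
  // Start from 16 random bytes, then overwrite the fixed fields.
  const bytes = randomBytes(16);

  // Bytes 0-5: Unix timestamp in milliseconds, big-endian (48 bits).
  bytes.writeUIntBE(nowMs, 0, 6);

  // Byte 6, high nibble: version 7.
  bytes.writeUInt8((bytes.readUInt8(6) & 0x0f) | 0x70, 6);

  // Byte 8, top two bits: variant 10.
  bytes.writeUInt8((bytes.readUInt8(8) & 0x3f) | 0x80, 8);

  const hex = bytes.toString("hex");
  return [
    hex.slice(0, 8),
    hex.slice(8, 12),
    hex.slice(12, 16),
    hex.slice(16, 20),
    hex.slice(20),
  ].join("-");
}
```

Because the timestamp occupies the leading bytes, IDs created in different milliseconds sort by creation time when compared as strings, which is convenient for listing sessions chronologically.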

Hub (packages/core/src/hub/): Manages agents and communication channels. It is embedded in the desktop app, or runs standalone to serve web clients.

Gateway (apps/gateway/): NestJS WebSocket server with Socket.io for remote client access, message routing, and device verification.

CLI (apps/cli/): Command-line interface. Entry point: apps/cli/src/index.ts.

Tech Stack & Config

Package manager: pnpm 10 with workspaces (pnpm-workspace.yaml)
Build orchestration: Turborepo (turbo.json)
TypeScript: ESNext target, NodeNext modules, strict mode
Testing: Vitest with globals enabled
Frontend: React 19, Next.js 16, Tailwind CSS v4, Shadcn/UI
Backend: NestJS 11, Socket.io, Pino logging
Desktop: Electron 33+, electron-vite, electron-builder

pnpm Configuration

Required .npmrc for Electron packaging:

shamefully-hoist=true

After adding/changing .npmrc:

rm -rf node_modules apps/*/node_modules packages/*/node_modules
rm pnpm-lock.yaml
pnpm install

See docs/package-management.md for detailed package management guide.

Code Style

Comments: Always write code comments in English, regardless of the conversation language.

Design System

The UI follows a restrained, professional design language. This is a work tool, not a consumer app.

Core Principles

Restraint over decoration — No flashy colors, minimal animations
Clarity over cleverness — Obvious > subtle, explicit > implicit
Consistency over novelty — Use Shadcn/UI patterns, don't reinvent
Density over sprawl — Respect screen real estate

Typography

Font	CSS Variable	Usage
Geist Sans	`font-sans`	Primary UI text
Geist Mono	`font-mono`	Code, technical values
Playfair Display	`font-brand`	Brand name "Multica" ONLY

Fonts are loaded via @fontsource packages (not Google Fonts) for cross-platform consistency.

Colors

No brand color — Purple/blue "AI colors" feel generic. We use neutral grays.
Color is for state — Running (blue), success (green), error (red)
Dark mode is true dark — Not gray, actual near-black

Component Library

Base: Shadcn/UI (Radix primitives + Tailwind)
Styling: Tailwind CSS v4 with OKLCH colors
Config: packages/ui/src/styles/globals.css

When Building UI

Prefer existing Shadcn components over custom implementations
Use semantic color variables (--muted, --destructive), not raw colors
Keep animations subtle and purposeful (no gratuitous motion)
Test in both light and dark modes

Debugging: Run Log

The agent engine supports structured run logging for debugging. When enabled, it writes all key execution events to ~/.super-multica/sessions/{sessionId}/run-log.jsonl alongside the session data.

# Enable via environment variable
MULTICA_RUN_LOG=1 pnpm multica run "your prompt"

# Enable during tests
MULTICA_RUN_LOG=1 pnpm --filter @multica/core test

// Or programmatically (TypeScript)
const agent = new Agent({ enableRunLog: true });

Logged events: run_start, run_end, llm_call, llm_result, tool_start, tool_end, context_overflow, auth_rotate, error_classify, preflight_compact_start/end, compaction.

Each line is a JSON object with ts (timestamp) and event (type), suitable for AI-assisted log analysis. Implementation: packages/core/src/agent/run-log.ts.
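Since each line is a standalone JSON object, a run log can be summarized with a few lines of code. The sketch below assumes only what is stated above (every entry has `ts` and `event`). The `RunLogEntry` type and the `parseRunLog` and `countEvents` helpers are hypothetical names for illustration, not the API of run-log.ts.

```typescript
// Hypothetical entry shape: only `ts` and `event` are guaranteed by the text above.
interface RunLogEntry {
  ts: number | string;
  event: string;
  [key: string]: unknown;
}

// Parse JSONL text into entries, skipping blank lines and anything
// that is not a JSON object with a string `event` field.
function parseRunLog(text: string): RunLogEntry[] {
  const entries: RunLogEntry[] = [];
  for (const line of text.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue;
    try {
      const parsed: unknown = JSON.parse(trimmed);
      if (
        typeof parsed === "object" &&
        parsed !== null &&
        typeof (parsed as { event?: unknown }).event === "string"
      ) {
        entries.push(parsed as RunLogEntry);
      }
    } catch {
      // Malformed line (for example a partially written last line): ignore it.
    }
  }
  return entries;
}

// Count how many times each event type occurs.
function countEvents(entries: RunLogEntry[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const { event } of entries) {
    counts.set(event, (counts.get(event) ?? 0) + 1);
  }
  return counts;
}
```

To use it on a real log, read the run-log.jsonl file with `readFileSync(path, "utf8")` from `node:fs` and pass the text to `parseRunLog`. Comparing the counts of `llm_call` and `llm_result`, or `tool_start` and `tool_end`, is a quick way to spot a run that stopped mid-step.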

Credentials Setup

pnpm multica credentials init

Creates:

~/.super-multica/credentials.json5 (LLM providers + built-in tools)
~/.super-multica/skills.env.json5 (skills / plugins / integrations)

Atomic Commits

After completing any task that modifies code, create atomic commits:

Run git status and git diff to see all modifications
Skip if no changes exist
Group changes by logical purpose (feature, fix, refactor, docs, test, chore)
Stage and commit each group separately

Format: <type>(<scope>): <description>

Types: feat, fix, refactor, docs, test, chore

Examples

git add packages/core/src/agent/runner.ts packages/core/src/agent/runner.test.ts
git commit -m "feat(agent): add streaming support"

git add packages/utils/src/format.ts
git commit -m "refactor(utils): simplify date formatting"

git add README.md
git commit -m "docs: update API documentation"
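The commit format above can be checked mechanically, for example in a commit-msg hook. The following is a sketch, not a tool that exists in this repository: `isValidCommitMessage` is a hypothetical name, the scope is treated as optional because the `docs:` example omits it, and the allowed scope characters (lowercase letters, digits, hyphens) are an assumption.

```typescript
// Allowed types, taken from the list above.
const COMMIT_TYPES = ["feat", "fix", "refactor", "docs", "test", "chore"] as const;

// <type>(<scope>): <description>
// Assumption: scope is optional and limited to lowercase letters, digits, and hyphens.
const COMMIT_PATTERN = new RegExp(
  `^(${COMMIT_TYPES.join("|")})(\\([a-z0-9-]+\\))?: \\S.*$`
);

// Hypothetical helper: returns true when the subject line matches the format.
function isValidCommitMessage(subject: string): boolean {
  return COMMIT_PATTERN.test(subject);
}
```

With this pattern, `feat(agent): add streaming support` and `docs: update API documentation` are accepted, while an unknown type such as `feature(agent): ...` or a missing colon is rejected.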

Testing Guidelines

Mock Policy: External Only

CRITICAL RULE: Only mock third-party/external dependencies. NEVER mock internal modules.

Type	Examples	Can Mock?
Internal modules	`./runner.js`, `../utils/format.js`	NO
Monorepo packages	`@multica/core`, `@multica/utils`	NO
Third-party packages	`openai`, `@anthropic-ai/sdk`, `@mariozechner/*`	YES
System/time APIs	`vi.useFakeTimers()`, `vi.setSystemTime()`	YES
Network calls	External HTTP requests, WebSocket connections	YES

When AI writes code, tests become more valuable than the code itself. Mocking internal modules creates brittle tests that don't verify real integration between modules, hides bugs, and requires maintaining parallel mock implementations.
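For module mocks, the table above reduces to a rule on the import specifier passed to `vi.mock`. The sketch below encodes that rule. `canMock` is a hypothetical helper, not something in the codebase, and it assumes all monorepo packages live under the `@multica/` scope as shown in the structure listing.

```typescript
// Hypothetical helper: decides whether a module specifier may be mocked
// under the "external only" policy.
function canMock(specifier: string): boolean {
  // Relative or absolute paths point at internal modules: never mock.
  if (
    specifier === "." ||
    specifier === ".." ||
    specifier.startsWith("./") ||
    specifier.startsWith("../") ||
    specifier.startsWith("/")
  ) {
    return false;
  }

  // Monorepo packages (assumed to share the @multica scope): never mock.
  if (specifier.startsWith("@multica/")) {
    return false;
  }

  // Anything else is a third-party package and may be mocked.
  return true;
}
```

A check like this could back a lint script that scans test files for `vi.mock(...)` calls and flags the ones where `canMock` returns false. Time and network mocking are covered by the table separately and are not module specifiers, so they fall outside this helper.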

Preferred Patterns

Temp directories for I/O tests (no filesystem mocking):

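import { mkdirSync, rmSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";
const testDir = join(tmpdir(), `multica-test-${Date.now()}`);
beforeEach(() => mkdirSync(testDir, { recursive: true }));
afterEach(() => rmSync(testDir, { recursive: true, force: true }));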

Test reset functions for stateful modules:

// In the module itself:
export function resetForTests() { /* clear in-memory state */ }

// In tests:
beforeEach(() => resetForTests());

Pure function tests — no mocking needed:

const result = resolveContextWindowInfo({ modelContextWindow: 100_000 });
expect(result.tokens).toBe(100_000);

Constructor/parameter injection over module mocking:

// Good: pass baseDir as parameter
const session = new SessionManager({ sessionId: "test", baseDir: testDir });

// Bad: mock the paths module
vi.mock("../../shared/paths.js", () => ({ DATA_DIR: "/tmp/test" }));

Anti-Patterns

vi.mock("./internal-module.js") — NEVER mock internal modules
Mock objects with 10+ method stubs — sign you should use the real implementation
vi.mock("../context-window/index.js") with simplified logic — hides real behavior
Tests that pass but don't exercise any real code paths ("fake green")

Reference Tests

Good patterns to follow:

packages/core/src/agent/session/session-manager.display.test.ts — real SessionManager + temp dirs
packages/core/src/agent/skills/loader.test.ts — real skill loading + temp filesystem
packages/core/src/agent/context-window/guard.test.ts — pure function tests
packages/core/src/agent/subagent/registry.test.ts — real registry + resetSubagentRegistryForTests()

Known violations (to be migrated):

packages/core/src/agent/async-agent.test.ts — mocks internal ./runner.js
packages/core/src/agent/session/compaction.test.ts — mocks internal ../context-window/index.js

Pre-push Checks

Before pushing, always run:

pnpm typecheck          # Type check all packages
pnpm test               # Run tests

This catches type errors and test failures locally before CI does. For a clean check (no cache):

pnpm turbo typecheck --force
