chore(docs): remove non-e2e documentation

Jiayuan Zhang 2026-02-17 00:46:36 +08:00
parent 292e2b9454
commit ecb0cd392e
47 changed files with 0 additions and 11844 deletions

379
CLAUDE.md

@ -1,379 +0,0 @@
# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
Super Multica is a distributed AI agent framework with a monorepo architecture. It includes an agent engine with multi-provider LLM support, an Electron desktop app with embedded Hub, a WebSocket gateway for remote access, and a Next.js web app.
## Monorepo Structure
```
super-multica/
├── apps/
│ ├── cli/ ← Command-line interface (`@multica/cli`)
│ ├── desktop/ ← Electron + Vite + React (`@multica/desktop`) — primary target
│ ├── gateway/ ← NestJS WebSocket gateway (`@multica/gateway`)
│ ├── server/ ← NestJS REST API server (`@multica/server`)
│ ├── web/ ← Next.js 16 web app (`@multica/web`, port 3000)
│ └── mobile/ ← React Native mobile app (`@multica/mobile`)
├── packages/
│ ├── core/ ← Core agent engine, hub, channels (`@multica/core`)
│ ├── sdk/ ← Gateway client SDK (`@multica/sdk`, Socket.io)
│ ├── ui/ ← Shared UI components (`@multica/ui`, Shadcn/Tailwind v4)
│ ├── store/ ← Zustand state management (`@multica/store`)
│ ├── hooks/ ← React hooks (`@multica/hooks`)
│ ├── types/ ← Shared TypeScript types (`@multica/types`)
│ └── utils/ ← Utility functions (`@multica/utils`)
└── skills/ ← Bundled agent skills
```
## Common Commands
```bash
# Install dependencies
pnpm install
# Multica CLI (unified entry point)
pnpm multica # Interactive mode (default)
pnpm multica run "<prompt>" # Run a single prompt
pnpm multica chat # Interactive REPL mode
pnpm multica session list # List sessions
pnpm multica profile list # List profiles
pnpm multica skills list # List skills
pnpm multica tools list # List tools
pnpm multica credentials init # Initialize credentials
pnpm multica help # Show help
# Development servers
pnpm dev # Desktop app (connects to dev gateway by default)
pnpm dev:desktop # Same as above
pnpm dev:gateway # WebSocket gateway only
pnpm dev:web # Next.js web app
pnpm dev:all # Gateway + web app
# Override gateway URL (e.g. local gateway)
GATEWAY_URL=http://localhost:3000 pnpm dev
# Build
pnpm build # Build all (turbo-orchestrated)
pnpm --filter @multica/desktop build
pnpm --filter @multica/core build
# Type checking
pnpm typecheck
# Testing (vitest)
pnpm test # Single run
pnpm test:watch # Watch mode
pnpm test:coverage # With v8 coverage
```
## Architecture
```
Desktop App (standalone, recommended)
└─ Hub (embedded)
└─ Agent Engine (LLM runner, sessions, skills, tools)
└─ (Optional) Gateway connection for remote access
Web App (requires Gateway)
@multica/sdk (GatewayClient, Socket.io)
→ Gateway (NestJS, WebSocket, port 3000)
→ Hub + Agent Engine
```
**Agent Engine** (`packages/core/src/agent/`): Orchestrates LLM interactions with multi-provider support (OpenAI, Anthropic, DeepSeek, Kimi, Groq, Mistral, Google, Together). Features session management (JSONL-based, UUIDv7 IDs), profile system (`~/.super-multica/agent-profiles/`), modular skills with hot-reload, and token-aware context window guards.
**Hub** (`packages/core/src/hub/`): Manages agents and communication channels. Embedded in desktop app, or runs standalone for web clients.
**Gateway** (`apps/gateway/`): NestJS WebSocket server with Socket.io for remote client access, message routing, and device verification.
**CLI** (`apps/cli/`): Command-line interface. Entry point: `apps/cli/src/index.ts`.
## Tech Stack & Config
- **Package manager**: pnpm 10 with workspaces (`pnpm-workspace.yaml`)
- **Build orchestration**: Turborepo (`turbo.json`)
- **TypeScript**: ESNext target, NodeNext modules, strict mode
- **Testing**: Vitest with globals enabled
- **Frontend**: React 19, Next.js 16, Tailwind CSS v4, Shadcn/UI
- **Backend**: NestJS 11, Socket.io, Pino logging
- **Desktop**: Electron 33+, electron-vite, electron-builder
## pnpm Configuration
**Required `.npmrc` for Electron packaging:**
```ini
shamefully-hoist=true
```
After adding/changing `.npmrc`:
```bash
rm -rf node_modules apps/*/node_modules packages/*/node_modules
rm pnpm-lock.yaml
pnpm install
```
See `docs/package-management.md` for detailed package management guide.
## Code Style
- **Comments**: Always write code comments in English, regardless of the conversation language.
## Design System
The UI follows a **restrained, professional** design language. This is a work tool, not a consumer app.
### Core Principles
1. **Restraint over decoration** — No flashy colors, minimal animations
2. **Clarity over cleverness** — Obvious > subtle, explicit > implicit
3. **Consistency over novelty** — Use Shadcn/UI patterns, don't reinvent
4. **Density over sprawl** — Respect screen real estate
### Typography
| Font | CSS Variable | Usage |
|------|--------------|-------|
| Geist Sans | `font-sans` | Primary UI text |
| Geist Mono | `font-mono` | Code, technical values |
| Playfair Display | `font-brand` | Brand name "Multica" ONLY |
Fonts are loaded via `@fontsource` packages (not Google Fonts) for cross-platform consistency.
### Colors
- **No brand color** — Purple/blue "AI colors" feel generic. We use neutral grays.
- **Color is for state** — Running (blue), success (green), error (red)
- **Dark mode is true dark** — Not gray, actual near-black
### Component Library
- **Base**: Shadcn/UI (Radix primitives + Tailwind)
- **Styling**: Tailwind CSS v4 with OKLCH colors
- **Config**: `packages/ui/src/styles/globals.css`
### When Building UI
- Prefer existing Shadcn components over custom implementations
- Use semantic color variables (`--muted`, `--destructive`), not raw colors
- Keep animations subtle and purposeful (no gratuitous motion)
- Test in both light and dark modes
## Debugging: Run Log
The agent engine supports structured run logging for debugging. When enabled, it writes all key execution events to `~/.super-multica/sessions/{sessionId}/run-log.jsonl` alongside the session data.
```bash
# Enable via CLI flag
pnpm multica run --run-log "your prompt"
# Or via environment variable
MULTICA_RUN_LOG=1 pnpm multica run "your prompt"
# Or programmatically (TypeScript, not a shell command):
#   const agent = new Agent({ enableRunLog: true });
```
When `--run-log` is enabled, the CLI prints the session directory path to stderr:
```
[session: 019c584a-...]
[session-dir: ~/.super-multica/sessions/019c584a-...]
```
Logged events: `run_start`, `run_end`, `llm_call`, `llm_result`, `tool_start`, `tool_end`, `context_overflow`, `auth_rotate`, `error_classify`, `preflight_compact_start/end`, `tool_result_pruning`, `compaction`, `compaction_detail`.
Each line is a JSON object with `ts` (timestamp) and `event` (type), suitable for AI-assisted log analysis. Full event reference: `packages/core/src/agent/run-log.ts`.
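As a rough illustration of consuming this format, the sketch below parses the text of a `run-log.jsonl` file and counts events by type. It relies only on the two documented fields, `ts` and `event`; the names `RunLogEvent`, `parseRunLog`, and `countByEvent` are hypothetical and not part of `@multica/core`.

```typescript
// Sketch only: `ts` and `event` are the two fields the run log documents;
// every other field is treated as unknown here.
interface RunLogEvent {
  ts: unknown;
  event: string;
  [key: string]: unknown;
}

// Parse the text of a run-log.jsonl file, one JSON object per line.
function parseRunLog(text: string): RunLogEvent[] {
  const events: RunLogEvent[] = [];
  for (const line of text.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue; // tolerate blank lines and a trailing newline
    const parsed: unknown = JSON.parse(trimmed);
    if (typeof parsed !== "object" || parsed === null) continue;
    const record = parsed as Record<string, unknown>;
    const event = record["event"];
    // Keep only lines that carry both documented fields.
    if (typeof event === "string" && "ts" in record) {
      events.push({ ...record, ts: record["ts"], event });
    }
  }
  return events;
}

// Count how often each event type occurs.
function countByEvent(events: RunLogEvent[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const e of events) {
    counts.set(e.event, (counts.get(e.event) ?? 0) + 1);
  }
  return counts;
}
```

Reading the file itself (for example with `readFileSync` from `node:fs`) is left out so the sketch stays independent of where the session directory lives.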
## SWE-bench (Agent Benchmark)
Run the Multica agent against [SWE-bench](https://www.swebench.com/), the standard benchmark for evaluating AI coding agents on real GitHub issues.
```bash
# Download dataset
python scripts/swe-bench/download-dataset.py --dataset lite --limit 5
# Run agent against tasks
npx tsx scripts/swe-bench/run.ts --limit 5 --provider kimi-coding
# Analyze results
npx tsx scripts/swe-bench/analyze.ts
# Official evaluation (requires Docker)
bash scripts/swe-bench/evaluate.sh
```
Scripts are in `scripts/swe-bench/`. Full guide: `docs/swe-bench.md`.
## E2E Testing (Agent-Driven)
E2E tests are executed and analyzed by the Coding Agent (Claude Code), not by vitest. The Coding Agent runs the Multica agent via CLI, reads the structured run-log, and intelligently analyzes intermediate behavior and results.
### How to Run
E2E tests use an isolated data directory (`~/.super-multica-e2e`) to avoid polluting dev or production session data.
```bash
# Basic E2E test (web_search/data tools require MULTICA_API_URL)
SMC_DATA_DIR=~/.super-multica-e2e MULTICA_API_URL=https://api-dev.copilothub.ai pnpm multica run --run-log "your test prompt"
# With specific provider
SMC_DATA_DIR=~/.super-multica-e2e MULTICA_API_URL=https://api-dev.copilothub.ai pnpm multica run --run-log --provider kimi-coding "your test prompt"
# Multi-turn test (reuse session)
SMC_DATA_DIR=~/.super-multica-e2e MULTICA_API_URL=https://api-dev.copilothub.ai pnpm multica run --run-log --session <session-id> "follow-up prompt"
# Clean up all E2E test data
rm -rf ~/.super-multica-e2e
```
### Analysis Workflow
After running, the Coding Agent should:
1. Read `{session-dir}/run-log.jsonl` — structured execution events
2. Read `{session-dir}/session.jsonl` — full conversation transcript (if needed)
3. Analyze event sequence, tool calls, errors, and timing
4. Report findings with verdict (pass/fail + details)
### What to Check
- **Event completeness**: `run_start` → ... → `run_end` (no orphaned starts)
- **Tool pairing**: every `tool_start` has a matching `tool_end`
- **Error handling**: `is_error`, `error_classify`, `auth_rotate` events
- **Compaction health**: `tokens_removed > 0` when compaction fires
- **Performance**: `llm_result.duration_ms`, tool execution times
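The first two checks can be approximated mechanically. The sketch below is a minimal, count-based version: it assumes events arrive in file order and carry only the documented `event` name, so it detects orphaned starts and ends but does not match a specific `tool_end` to a specific `tool_start`. The names `LoggedEvent`, `RunLogVerdict`, and `checkRunLog` are hypothetical.

```typescript
// Sketch only: a count-based check, not the project's own analyzer.
interface LoggedEvent {
  event: string;
}

interface RunLogVerdict {
  pass: boolean;
  problems: string[];
}

function checkRunLog(events: readonly LoggedEvent[]): RunLogVerdict {
  const problems: string[] = [];
  let openRuns = 0; // run_start events not yet closed by run_end
  let openTools = 0; // tool_start events not yet closed by tool_end

  for (const e of events) {
    switch (e.event) {
      case "run_start":
        openRuns += 1;
        break;
      case "run_end":
        if (openRuns === 0) problems.push("run_end without run_start");
        else openRuns -= 1;
        break;
      case "tool_start":
        openTools += 1;
        break;
      case "tool_end":
        if (openTools === 0) problems.push("tool_end without tool_start");
        else openTools -= 1;
        break;
      default:
        break; // other event types are ignored by this check
    }
  }

  if (openRuns > 0) problems.push(`${openRuns} run_start without run_end`);
  if (openTools > 0) problems.push(`${openTools} tool_start without tool_end`);
  return { pass: problems.length === 0, problems };
}
```

If the log records a per-call identifier, pairing on that identifier would be stricter than counting; whether such a field exists is not stated here.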
### Important
- **`SMC_DATA_DIR=~/.super-multica-e2e`** isolates E2E test sessions from dev (`~/.super-multica-dev`) and production (`~/.super-multica`) data. Always set this.
- **`MULTICA_API_URL=https://api-dev.copilothub.ai`** is required for `web_search` and `data` tools. Without it, these tools fail with `MULTICA_API_URL is required`.
- **Auth for `web_search`/`data`**: These tools need dev backend auth. The auth store auto-falls back to `~/.super-multica-dev/auth.json`. If missing, run `pnpm dev:local` first and log in through the Desktop app.
- Default provider is `kimi-coding`. Override with `--provider`.
- Run-log and session data are at `~/.super-multica-e2e/sessions/{sessionId}/`
- Detailed guide with feature-specific test playbooks: `docs/e2e-testing-guide.md`
## Credentials Setup
```bash
pnpm multica credentials init
```
Creates:
- `~/.super-multica/credentials.json5` (LLM providers + built-in tools)
Skill-specific API keys go in `.env` files within each skill's directory:
- `~/.super-multica/skills/<skill-id>/.env`
## Atomic Commits
After completing any task that modifies code, create atomic commits:
1. Run `git status` and `git diff` to see all modifications
2. Skip if no changes exist
3. Group changes by logical purpose (feature, fix, refactor, docs, test, chore)
4. Stage and commit each group separately
**Format**: `<type>(<scope>): <description>`
Types: `feat`, `fix`, `refactor`, `docs`, `test`, `chore`
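The format can be checked with a small parser. This is a sketch under one assumption: the scope is optional, since one of the examples in this section (`docs: update API documentation`) omits it. `parseCommitSubject` and the other names are hypothetical helpers, not an existing project utility.

```typescript
// Sketch only: validates a commit subject against <type>(<scope>): <description>.
const COMMIT_TYPES = ["feat", "fix", "refactor", "docs", "test", "chore"] as const;
type CommitType = (typeof COMMIT_TYPES)[number];

interface ParsedCommit {
  type: CommitType;
  scope: string | undefined; // undefined when the subject has no "(scope)" part
  description: string;
}

// The scope group is optional; the type must be one of the listed types.
const COMMIT_PATTERN = new RegExp(
  `^(${COMMIT_TYPES.join("|")})(?:\\(([^()]+)\\))?: (.+)$`,
);

function parseCommitSubject(subject: string): ParsedCommit | null {
  const match = COMMIT_PATTERN.exec(subject);
  if (match === null) return null; // subject does not follow the format
  return {
    type: match[1] as CommitType,
    scope: match[2],
    description: match[3] ?? "",
  };
}
```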
### Examples
```bash
git add packages/core/src/agent/runner.ts packages/core/src/agent/runner.test.ts
git commit -m "feat(agent): add streaming support"
git add packages/utils/src/format.ts
git commit -m "refactor(utils): simplify date formatting"
git add README.md
git commit -m "docs: update API documentation"
```
## Testing Guidelines
### Mock Policy: External Only
**CRITICAL RULE**: Only mock third-party/external dependencies. NEVER mock internal modules.
| Type | Examples | Can Mock? |
|------|----------|-----------|
| Internal modules | `./runner.js`, `../utils/format.js` | NO |
| Monorepo packages | `@multica/core`, `@multica/utils` | NO |
| Third-party packages | `openai`, `@anthropic-ai/sdk`, `@mariozechner/*` | YES |
| System/time APIs | `vi.useFakeTimers()`, `vi.setSystemTime()` | YES |
| Network calls | External HTTP requests, WebSocket connections | YES |
When AI writes code, tests become more valuable than the code itself. Mocking internal modules creates brittle tests that don't verify real integration between modules, hides bugs, and requires maintaining parallel mock implementations.
### Preferred Patterns
**Temp directories for I/O tests** (no filesystem mocking):
```typescript
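import { mkdirSync, rmSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";
const testDir = join(tmpdir(), `multica-test-${Date.now()}`);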
beforeEach(() => mkdirSync(testDir, { recursive: true }));
afterEach(() => rmSync(testDir, { recursive: true, force: true }));
```
**Test reset functions for stateful modules**:
```typescript
// In the module itself:
export function resetForTests() { /* clear in-memory state */ }
// In tests:
beforeEach(() => resetForTests());
```
**Pure function tests** — no mocking needed:
```typescript
const result = resolveContextWindowInfo({ modelContextWindow: 100_000 });
expect(result.tokens).toBe(100_000);
```
**Constructor/parameter injection** over module mocking:
```typescript
// Good: pass baseDir as parameter
const session = new SessionManager({ sessionId: "test", baseDir: testDir });
// Bad: mock the paths module
vi.mock("../../shared/paths.js", () => ({ DATA_DIR: "/tmp/test" }));
```
### Anti-Patterns
- `vi.mock("./internal-module.js")` — NEVER mock internal modules
- Mock objects with 10+ method stubs — sign you should use the real implementation
- `vi.mock("../context-window/index.js")` with simplified logic — hides real behavior
- Tests that pass but don't exercise any real code paths ("fake green")
### Reference Tests
Good patterns to follow:
- `packages/core/src/agent/session/session-manager.display.test.ts` — real SessionManager + temp dirs
- `packages/core/src/agent/skills/loader.test.ts` — real skill loading + temp filesystem
- `packages/core/src/agent/context-window/guard.test.ts` — pure function tests
- `packages/core/src/agent/subagent/registry.test.ts` — real registry + `resetSubagentRegistryForTests()`
Known violations (to be migrated):
- `packages/core/src/agent/async-agent.test.ts` — mocks internal `./runner.js`
- `packages/core/src/agent/session/compaction.test.ts` — mocks internal `../context-window/index.js`
## Pre-push Checks
Before pushing, always run:
```bash
pnpm typecheck # Type check all packages
pnpm test # Run tests
```
This ensures CI will pass. For a clean check (no cache):
```bash
pnpm turbo typecheck --force
```

122
README.md

@ -1,122 +0,0 @@
# Super Multica
**Multiplexed Information & Computing Agent**
An always-on AI agent that pulls real data, runs real computation, and takes real action — monitoring, analyzing, and acting within user-defined authorization boundaries.
See [Memo](./docs/memo.md) for product vision, architecture, and roadmap.
## Project Structure
```
apps/
├── cli/ # Command-line interface
├── desktop/ # Electron desktop app (recommended)
├── gateway/ # NestJS WebSocket gateway
├── server/ # NestJS REST API server
├── web/ # Next.js web app
└── mobile/ # React Native mobile app
packages/
├── core/ # Agent engine, hub, channels
├── sdk/ # Gateway client SDK
├── ui/ # Shared UI components (Shadcn/Tailwind v4)
├── store/ # Zustand state management
├── hooks/ # React hooks
├── types/ # Shared TypeScript types
└── utils/ # Utility functions
skills/ # Bundled agent skills
```
## Quick Start
```bash
pnpm install
```
### Development
```bash
pnpm dev # Desktop app (standalone, no Gateway needed)
pnpm dev:gateway # Gateway only
pnpm dev:web # Web app only
pnpm dev:all # Gateway + Web
```
### Local Full-Stack Development
`pnpm dev:local` starts the entire stack locally (Gateway + Desktop + Web) with isolated data directories, useful for end-to-end development and testing.
**Setup:**
1. Copy `.env.example` to `.env` at the repo root
2. Fill in `TELEGRAM_BOT_TOKEN` (get from [@BotFather](https://t.me/BotFather))
3. Run `pnpm dev:local`
**What it starts:**
| Service | Address | Notes |
|---------|---------|-------|
| Gateway | `http://localhost:4000` | Telegram long-polling mode |
| Web | `http://localhost:3000` | OAuth login flow |
| Desktop | — | Connects to local Gateway + Web |
**Data isolation:** All data goes to `~/.super-multica-dev` and `~/Documents/Multica-dev`, separate from production `~/.super-multica`.
**Related commands:**
```bash
pnpm dev:local:archive # Archive dev data and start fresh
```
## Architecture
```
Desktop App (standalone, recommended)
└─ Hub (embedded)
└─ Agent Engine
Web/Mobile Clients
→ Gateway (WebSocket, :3000)
→ Hub
→ Agent Engine
```
- **Desktop App**: Electron app with embedded Hub, no Gateway needed
- **Gateway**: WebSocket server for remote clients
- **Hub**: Agent lifecycle and event distribution
## Documentation
**Getting Started**
| Topic | Link |
|-------|------|
| Development guide | [docs/development.md](./docs/development.md) |
| Credentials & LLM providers | [docs/credentials.md](./docs/credentials.md) |
| CLI usage | [docs/cli.md](./docs/cli.md) |
| Skills & tools | [docs/skills-and-tools.md](./docs/skills-and-tools.md) |
| Package management | [docs/package-management.md](./docs/package-management.md) |
| Mobile development | [docs/mobile/guide.md](./docs/mobile/guide.md) |
**Testing & Benchmarks**
| Topic | Link |
|-------|------|
| SWE-bench runner | [docs/swe-bench.md](./docs/swe-bench.md) |
| E2E testing guide | [docs/e2e-testing-guide.md](./docs/e2e-testing-guide.md) |
**Architecture & Protocols**
| Topic | Link |
|-------|------|
| Product capabilities | [docs/product-capabilities.md](./docs/product-capabilities.md) |
| Message paths (Desktop/Web/Channel) | [docs/message-paths.md](./docs/message-paths.md) |
| Client streaming protocol | [docs/client-streaming-protocol.md](./docs/client-streaming-protocol.md) |
| Hub RPC protocol | [docs/rpc.md](./docs/rpc.md) |
| Exec approval protocol | [docs/exec-approval.md](./docs/exec-approval.md) |
| Time injection design | [docs/time-injection.md](./docs/time-injection.md) |
| Channel system | [docs/channels/README.md](./docs/channels/README.md) |
| Channel media handling | [docs/channels/media-handling.md](./docs/channels/media-handling.md) |
| Desktop login integration | [docs/auth/desktop-integration.md](./docs/auth/desktop-integration.md) |


@ -1,490 +0,0 @@
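# Multica Desktop App Design Document
## Product Positioning
Multica Desktop is a unified desktop application with a dual identity:
1. **Host mode**: runs a Hub + Agent on the local machine that other devices can connect to
2. **Client mode**: connects to an Agent on another Hub to chat
A user installs a single app and can either host an Agent (letting other devices connect by scanning a QR code) or scan a QR code to connect to someone else's Agent.
### Architecture Diagram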
```
┌─────────────────────────────────────────────────────────────────────────────┐
│ Multica Desktop App │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ React UI (Renderer) │ │
│ │ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │ │
│ │ │ Home │ │ Chat │ │ Tools │ │ Skills │ │Settings │ │ │
│ │ └─────────┘ └─────────┘ └─────────┘ └─────────┘ └─────────┘ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ┌───────────────┴───────────────┐ │
│ │ │ │
│ Direct call (local) WebSocket (remote) │
│ │ │ │
│ ▼ ▼ │
│ ┌─────────────────────────────┐ ┌─────────────────────────────┐ │
│ │ Local Hub + Agent │ │ Remote Hub (via Gateway) │ │
│ │ (in-process) │ │ (another device) │ │
│ └─────────────────────────────┘ └─────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────────┘
│ WebSocket
┌─────────────────────┐
│ Gateway │
│ (public WebSocket) │
└─────────────────────┘
```
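**Key points**:
- **Unified app**: there is no separate Admin App and Client App; one app serves both uses
- **Dual-mode Chat**: the Chat page can talk to the local Agent or connect to a remote Agent
- **Local Agent**: the Hub + Agent run inside the Electron main process, and the UI reaches them through IPC calls
- **Remote connection**: connects to a Hub on another device over the Gateway WebSocket
**Constraint**: phase one supports 1 Client - 1 Hub - 1 Agent Session
---
## Technical Implementation Design
### Tech Stack
| Layer | Technology | Notes |
| ------ | ------------------------ | -------------- |
| Framework | Electron 30 | Desktop application |
| Frontend | React 19 + Vite | Renderer process |
| Routing | react-router-dom v7 | HashRouter |
| State | @multica/store (Zustand) | Reuses the existing store |
| UI | @multica/ui (Shadcn) | Reuses existing components |
| QR code | qrcode.react | QR code generation |
| Communication | @multica/sdk | Gateway connection |
### File Structure Plan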
```
apps/desktop/
├── electron/
│ ├── main.ts # Main process (Hub + Agent)
│ └── preload.ts # Preload script (if IPC is needed)
├── src/
│ ├── main.tsx # React entry point
│ ├── App.tsx # Route configuration
│ ├── pages/
│ │ ├── home.tsx # Home landing page (three options)
│ │ ├── chat.tsx # Chat page (Local/Remote dual mode)
│ │ ├── tools.tsx # Tools management page
│ │ ├── skills.tsx # Skills management page
│ │ └── layout.tsx # Global layout (Header + Tabs)
│ ├── components/
│ │ ├── qr-code.tsx # QR code component
│ │ ├── qr-scanner.tsx # QR scanner component
│ │ ├── connection-status.tsx # Connection status
│ │ ├── tool-list.tsx # Tools list
│ │ └── skill-list.tsx # Skills list
│ └── hooks/
│ ├── use-local-agent.ts # Local Agent management
│ ├── use-remote-agent.ts # Remote Agent connection
│ └── use-connection.ts # Connection state management
└── package.json
```
### Core Implementation Points
#### 1. QR Code Generation and Connection
QR code payload format:
```json
{
"type": "multica-connect",
"gateway": "wss://gateway.multica.ai",
"hubId": "019c1d32-xxxx",
"agentId": "019c1d32-yyyy",
"token": "random-uuid-token",
"expires": 1234567890
}
```
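Connection flow:
```
1. Admin starts → Hub connects to the public Gateway → registers as deviceType: "hub"
2. Admin creates an Agent → generates a token → encodes it into the QR code (with hubId + agentId + token)
3. Client scans the code → parses the QR code → connects to the same Gateway
4. Client sends "connect-request" to hubId (with the token)
5. Admin verifies that the token is valid and not expired → establishes the pairing
6. Client sends subsequent messages to hubId, with agentId in the payload
7. Hub routes each message to the corresponding Agent
```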
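Steps 3 and 5 of the flow can be sketched as two small functions: one parses and validates the scanned QR payload, the other checks the token and its expiry. This is an illustration under assumptions, not the app's actual code: `expires` is assumed to be a Unix timestamp in seconds, and `ConnectPayload`, `parseConnectPayload`, and `isTokenValid` are hypothetical names.

```typescript
// Sketch only: field names follow the QR payload format shown above.
interface ConnectPayload {
  type: "multica-connect";
  gateway: string;
  hubId: string;
  agentId: string;
  token: string;
  expires: number; // assumed: Unix time in seconds
}

// Step 3: parse the scanned text and reject anything that is not a connect payload.
function parseConnectPayload(raw: string): ConnectPayload | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not JSON at all
  }
  if (typeof data !== "object" || data === null) return null;
  const record = data as Record<string, unknown>;
  if (record["type"] !== "multica-connect") return null;
  for (const key of ["gateway", "hubId", "agentId", "token"]) {
    if (typeof record[key] !== "string") return null;
  }
  if (typeof record["expires"] !== "number") return null;
  return record as unknown as ConnectPayload;
}

// Step 5: the host accepts the request only if the token matches and has not expired.
function isTokenValid(
  payload: ConnectPayload,
  expectedToken: string,
  nowSeconds: number,
): boolean {
  return payload.token === expectedToken && nowSeconds < payload.expires;
}
```

Passing the current time in as a parameter keeps the expiry check deterministic in tests.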
#### 2. Tools Management
**Existing CLI commands** (already implemented):
```bash
multica tools list # List all tools
multica tools list --profile coding # Filter by profile
multica tools groups # Show tool groups
multica tools profiles # Show preset profiles
```
**Admin App implementation**: call the Main Process over IPC:
```typescript
// Renderer process (React hook)
const tools = await window.electronAPI.tools.list();
const groups = await window.electronAPI.tools.getGroups();
const profiles = await window.electronAPI.tools.getProfiles();
await window.electronAPI.tools.setStatus('exec', false);
// Main process (IPC handler)
ipcMain.handle('tools:list', async () => {
const allTools = createAllTools(process.cwd());
return allTools.map((t) => ({
name: t.name,
group: TOOL_GROUPS[t.name],
enabled: true,
}));
});
```
**Note**: The Renderer process runs in a sandbox and cannot access Node.js APIs directly; it must call the Main Process over IPC.
#### 3. Skills Management
**Existing CLI commands** (already implemented):
```bash
multica skills list # List all skills
multica skills status # Show a status summary
multica skills status <id> # Details for a single skill
multica skills add owner/repo # Add from GitHub
multica skills remove <name> # Remove a skill
multica skills install <id> # Install dependencies
```
**Admin App implementation**: call the Main Process over IPC:
```typescript
// Renderer process (React hook)
const skills = await window.electronAPI.skills.list();
await window.electronAPI.skills.add('anthropics/skills');
await window.electronAPI.skills.remove('pdf');
await window.electronAPI.skills.setEnabled('commit', false);
// Main process (IPC handler)
ipcMain.handle('skills:list', async () => {
return await listAllSkillsWithStatus();
});
ipcMain.handle('skills:add', async (_, source: string) => {
await addSkill({ source, force: false });
});
```
---
## 4. Hub Integration Technical Design
### Architecture Overview
The Desktop App uses an **Electron IPC + Hub instance** architecture:
```
┌─────────────────────────────────────────────────────────────────────────────┐
│ Electron Desktop App │
│ │
│ ┌────────────────────────────────────────────────────────────────────────┐ │
│ │ Renderer Process (React UI) │ │
│ │ │ │
│ │ home.tsx → useHub() → window.electronAPI.hub.getStatus() │ │
│ │ tools.tsx → useTools() → window.electronAPI.tools.list() │ │
│ │ skills.tsx→ useSkills()→ window.electronAPI.skills.list() │ │
│ │ │ │
│ └──────────────────────────────┬─────────────────────────────────────────┘ │
│ │ IPC (contextBridge) │
│ ▼ │
│ ┌────────────────────────────────────────────────────────────────────────┐ │
│ │ Main Process (Node.js) │ │
│ │ │ │
│ │ ┌──────────────────────────────────────────────────────────────┐ │ │
│ │ │ Hub Instance │ │ │
│ │ │ - hubId: UUIDv7 │ │ │
│ │ │ - agents: Map<agentId, AsyncAgent> │ │ │
│ │ │ - status: 'starting' | 'ready' | 'error' │ │ │
│ │ │ - GatewayClient: connects to the public Gateway (optional) │ │ │
│ │ └──────────────────────────────────────────────────────────────┘ │ │
│ │ │ │ │
│ │ ┌────────────────────────────▼────────────────────────────────┐ │ │
│ │ │ AsyncAgent Instance │ │ │
│ │ │ - agentId: UUIDv7 │ │ │
│ │ │ - runner: AgentRunner (LLM interaction) │ │ │
│ │ │ - tools: Tool[] (dynamically updatable) │ │ │
│ │ │ - skills: SkillInfo[] │ │ │
│ │ └─────────────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ └────────────────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────────┘
│ WebSocket (optional, for remote Client connections)
┌─────────────────────┐
│ Public Gateway │
│ (wss://xxx) │
└─────────────────────┘
```
### IPC Communication Mechanism
**How it works**:
1. **Main Process**: creates the Hub and Agent instances in the Electron main process
2. **Preload Script**: exposes a safe API via `contextBridge.exposeInMainWorld`
3. **Renderer Process**: the React UI calls main-process functionality through `window.electronAPI`
**Relationship to CLI commands**:
| CLI command | IPC Handler | Underlying call |
| -------------------------- | ----------------- | -------------------------------------------- |
| `multica tools list` | `tools:list` | `createAllTools()` + `getToolStatus()` |
| `multica tools enable xxx` | `tools:setStatus` | `setToolStatus()` |
| `multica skills list` | `skills:list` | `loadSkills()` + `listAllSkillsWithStatus()` |
| `multica skills add xxx` | `skills:add` | `addSkill()` |
**In essence, the CLI and the Admin App call the same underlying modules**; the only difference is:
- CLI: parses command-line arguments and calls the modules directly
- Admin App: forwards the call over IPC
### Core Files
```
apps/desktop/
├── electron/
│ ├── main.ts # Main process entry: creates the window + registers IPC
│ ├── preload.ts # Exposes electronAPI
│ └── ipc/
│ ├── index.ts # Registers all IPC handlers in one place
│ ├── hub.ts # Hub management (create / status / connect to Gateway)
│ ├── agent.ts # Agent management (read/write Tools)
│ └── skills.ts # Skills management
├── src/
│ └── hooks/
│ ├── use-hub.ts # Fetches Hub status
│ ├── use-tools.ts # Tools CRUD
│ └── use-skills.ts # Skills CRUD
```
### IPC Interface Definitions
```typescript
// API exposed by electron/preload.ts
interface ElectronAPI {
hub: {
getStatus: () => Promise<HubStatus>;
getAgentInfo: () => Promise<AgentInfo | null>;
};
tools: {
list: () => Promise<ToolStatus[]>;
setStatus: (toolName: string, enabled: boolean) => Promise<void>;
getGroups: () => Promise<Record<string, string[]>>;
getProfiles: () => Promise<string[]>;
};
skills: {
list: () => Promise<SkillInfo[]>;
add: (source: string) => Promise<void>;
remove: (name: string) => Promise<void>;
setEnabled: (name: string, enabled: boolean) => Promise<void>;
};
}
// Type definitions
interface HubStatus {
hubId: string;
status: 'starting' | 'ready' | 'error';
agentCount: number;
gatewayConnected: boolean;
gatewayUrl?: string;
}
interface AgentInfo {
agentId: string;
provider: string;
model: string;
status: 'idle' | 'running';
}
interface ToolStatus {
name: string;
group: string;
enabled: boolean;
needsConfig?: boolean;
}
interface SkillInfo {
name: string;
command: string;
source: 'bundled' | 'global' | 'profile';
status: 'ready' | 'missing-deps' | 'disabled';
description?: string;
}
```
### Hub Lifecycle
```typescript
// Simplified logic from electron/ipc/hub.ts
let hub: Hub | null = null;
export function registerHubHandlers(ipcMain: IpcMain) {
// The Hub is created automatically on the first status request after app start
ipcMain.handle('hub:getStatus', async () => {
if (!hub) {
hub = new Hub();
await hub.start();
// Create the default Agent
const agent = await hub.createAgent({
provider: credentialManager.getLlmProvider(),
model: credentialManager.getLlmProviderConfig()?.model,
});
}
return {
hubId: hub.id,
status: hub.status,
agentCount: hub.agents.size,
gatewayConnected: hub.gateway?.connected ?? false,
};
});
}
```
### Real-Time Tools Update Mechanism
When the user toggles a Tool switch in the UI:
```
1. UI: Switch onChange → useTools.setToolStatus('exec', false)
2. Hook: await window.electronAPI.tools.setStatus('exec', false)
3. IPC: ipcMain.handle('tools:setStatus') → agent.updateTools(...)
4. Agent: re-filters the tools list; the next LLM call uses the new configuration
```
**Note**: Tool status is currently kept in memory and resets on restart. It could later be persisted to `~/.super-multica/tool-config.json`.
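The in-memory behavior described here can be pictured with a small registry that records which tools are switched off and filters the tool list before each LLM call. This is a hypothetical sketch: `ToolStatusRegistry` and `NamedTool` are illustrative names, and the real `setToolStatus()` and `agent.updateTools()` may work differently.

```typescript
// Sketch only: an in-memory tool switchboard; nothing is persisted to disk.
interface NamedTool {
  name: string;
}

class ToolStatusRegistry {
  // Tools are enabled by default; only disabled names are stored.
  private readonly disabled = new Set<string>();

  setStatus(name: string, enabled: boolean): void {
    if (enabled) this.disabled.delete(name);
    else this.disabled.add(name);
  }

  isEnabled(name: string): boolean {
    return !this.disabled.has(name);
  }

  // Return only the tools that should be offered on the next LLM call.
  filterEnabled<T extends NamedTool>(tools: readonly T[]): T[] {
    return tools.filter((tool) => this.isEnabled(tool.name));
  }
}
```

Because the set lives only in the process, a fresh registry after a restart reports every tool as enabled, which matches the reset-on-restart behavior noted above.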
---
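## 6. RPC vs. IPC
**Q**: How do the Admin UI and the Hub/Agent communicate?
**A**: Through **Electron IPC (inter-process communication)**, not network RPC.
| Type | Scenario | Protocol |
| -------- | ------------------------------- | ------------------- |
| IPC | Admin UI ↔ Hub (same device) | Electron IPC (in-memory) |
| RPC | Client ↔ Gateway ↔ Hub (cross-device) | WebSocket |
**Why IPC rather than a direct import?**
1. **Security isolation**: the Renderer process should not access Node.js APIs or the file system directly
2. **Process isolation**: Electron recommends running the Renderer in a sandbox
3. **Consistency**: it calls the same underlying modules as the CLI, which eases maintenance
4. **Extensibility**: RPC support for remote management can be added easily later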
```
┌─────────────────────────────────────────────────────────────────┐
│ Electron App │
│ │
│ ┌──────────────────────┐ ┌─────────────────────────────┐ │
│ │ Renderer Process │ │ Main Process │ │
│ │ (React UI, sandboxed) │ │ (Node.js, full privileges) │ │
│ │ │ IPC │ │ │
│ │ useTools() ──────────────► │ ipcMain.handle('tools:*') │ │
│ │ useSkills() ─────────────► │ ipcMain.handle('skills:*') │ │
│ │ useHub() ────────────────► │ Hub + Agent instances │ │
│ └──────────────────────┘ └─────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
```
**IPC call example**:
```typescript
// Renderer (React component)
const tools = await window.electronAPI.tools.list();
// Main Process (IPC Handler)
ipcMain.handle('tools:list', async () => {
const allTools = createAllTools(process.cwd());
return allTools.map((t) => ({
name: t.name,
group: TOOL_GROUPS[t.name] || 'other',
enabled: getToolStatus(t.name),
}));
});
```
---
## 7. Installing Dependencies
```bash
# QR code generation
pnpm --filter @multica/desktop add qrcode.react
# Type definitions (if needed)
pnpm --filter @multica/desktop add -D @types/qrcode.react
```
---
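## 8. Data Flow Architecture
The Chat page supports two modes that share the same UI components and Store.
### Local Mode (direct IPC)
Chats with the local Agent need no Gateway and communicate directly over Electron IPC: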
```
ChatInput → useLocalChat.sendMessage()
→ IPC: localChat:send → agent.write()
→ agent.subscribe() → IPC: localChat:event
→ useLocalChat.onEvent() → useMessagesStore.startStream/appendStream/endStream
→ MessageList renders
```
### Remote Mode (Gateway)
Chats with a remote Agent go through a WebSocket connection to the Gateway:
```
ChatInput → useMessagesStore.sendMessage()
→ ConnectionStore.send() → WebSocket → Gateway → Hub → agent.write()
→ Hub.consumeAgent() → WebSocket: stream event
→ ConnectionStore.onMessage() → useMessagesStore.startStream/appendStream/endStream
→ MessageList renders
```
### Reuse by Layer
| Layer | Component/Module | Reuse |
| -------- | ----------------------------------- | ----------- |
| UI layer | `MessageList`, `ChatInput` | ✅ Fully reused |
| Store layer | `useMessagesStore` | ✅ Fully reused |
| Agent layer | `AsyncAgent.write()`, `subscribe()` | ✅ Fully reused |
| Transport layer | IPC vs WebSocket | ❌ Implemented separately |
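The split between a shared store and a per-mode transport can be illustrated with a small interface. The sketch below is hypothetical throughout: `StreamEvent`, `Transport`, `MessageStore`, `bindTransport`, and `EchoTransport` are illustrative names, and only the `startStream` / `appendStream` / `endStream` method names come from the flows above (their real signatures in `useMessagesStore` are not shown here and are assumed).

```typescript
// Sketch only: one store, interchangeable transports (IPC or WebSocket).
type StreamEvent =
  | { kind: "start"; messageId: string }
  | { kind: "delta"; messageId: string; text: string }
  | { kind: "end"; messageId: string };

// The only layer that differs between Local Mode and Remote Mode.
interface Transport {
  send(text: string): void;
  onEvent(handler: (event: StreamEvent) => void): void;
}

// Shared layer: accumulates streamed text per message.
class MessageStore {
  readonly messages = new Map<string, { text: string; done: boolean }>();

  startStream(id: string): void {
    this.messages.set(id, { text: "", done: false });
  }

  appendStream(id: string, text: string): void {
    const message = this.messages.get(id);
    if (message) message.text += text;
  }

  endStream(id: string): void {
    const message = this.messages.get(id);
    if (message) message.done = true;
  }
}

// Wire any transport to the store; the store never knows which transport it is.
function bindTransport(transport: Transport, store: MessageStore): void {
  transport.onEvent((event) => {
    switch (event.kind) {
      case "start":
        store.startStream(event.messageId);
        break;
      case "delta":
        store.appendStream(event.messageId, event.text);
        break;
      case "end":
        store.endStream(event.messageId);
        break;
    }
  });
}

// Stand-in transport for demonstration: echoes the input back as a stream.
class EchoTransport implements Transport {
  private handler: ((event: StreamEvent) => void) | undefined;

  onEvent(handler: (event: StreamEvent) => void): void {
    this.handler = handler;
  }

  send(text: string): void {
    const handler = this.handler;
    if (!handler) return;
    handler({ kind: "start", messageId: "m1" });
    handler({ kind: "delta", messageId: "m1", text: "echo: " });
    handler({ kind: "delta", messageId: "m1", text });
    handler({ kind: "end", messageId: "m1" });
  }
}
```

An IPC-backed or WebSocket-backed class implementing the same `Transport` interface could replace `EchoTransport` without touching the store, which is the reuse the table describes.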
---
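## 9. TODO
- [ ] **Improve the Memory Tool logic**: the memory tool and memory.md are currently not unified and need to be consolidated
- [ ] **Improve Agent Profile loading**: refine how Profiles are loaded
- [ ] **Agent self-iterating Profile**: let the Agent modify files in its own Profile during a conversation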

View file

@ -1,50 +0,0 @@
# Welcome to your Expo app 👋
This is an [Expo](https://expo.dev) project created with [`create-expo-app`](https://www.npmjs.com/package/create-expo-app).
## Get started
1. Install dependencies
```bash
npm install
```
2. Start the app
```bash
npx expo start
```
In the output, you'll find options to open the app in a
- [development build](https://docs.expo.dev/develop/development-builds/introduction/)
- [Android emulator](https://docs.expo.dev/workflow/android-studio-emulator/)
- [iOS simulator](https://docs.expo.dev/workflow/ios-simulator/)
- [Expo Go](https://expo.dev/go), a limited sandbox for trying out app development with Expo
You can start developing by editing the files inside the **app** directory. This project uses [file-based routing](https://docs.expo.dev/router/introduction).
## Get a fresh project
When you're ready, run:
```bash
npm run reset-project
```
This command will move the starter code to the **app-example** directory and create a blank **app** directory where you can start developing.
## Learn more
To learn more about developing your project with Expo, look at the following resources:
- [Expo documentation](https://docs.expo.dev/): Learn fundamentals, or go into advanced topics with our [guides](https://docs.expo.dev/guides).
- [Learn Expo tutorial](https://docs.expo.dev/tutorial/introduction/): Follow a step-by-step tutorial where you'll create a project that runs on Android, iOS, and the web.
## Join the community
Join our community of developers creating universal apps.
- [Expo on GitHub](https://github.com/expo/expo): View our open source platform and contribute.
- [Discord community](https://chat.expo.dev): Chat with Expo users and ask questions.

View file

@ -1,182 +0,0 @@
# @multica/web
Next.js web client for Super Multica. This app is a **thin shell** — it contains only layout and page entry points. All business logic, state management, UI components, and network requests live in shared packages.
## Architecture
```
apps/web/app/
├── layout.tsx — Root layout, setConfig(), providers
├── page.tsx — Page entry, renders <Chat />
└── icon.png — Favicon
```
Everything else comes from packages:
| Package | Responsibility | Examples |
|---------|---------------|----------|
| `@multica/store` | Global state (Zustand) | Hub, Messages, Gateway, DeviceId |
| `@multica/ui` | Components & UI hooks | Chat, HubSidebar, Skeleton, useScrollFade |
| `@multica/fetch` | HTTP client & URL config | `consoleApi`, `setConfig()` |
| `@multica/sdk` | WebSocket client | `GatewayClient` |
### Where does new code go?
- **Page-scoped UI hook** (e.g. form toggle, scroll position) → `@multica/ui/hooks/`
- **Cross-component state** (e.g. user preferences, notifications) → `@multica/store`
- **Reusable component** → `@multica/ui/components/`
- **HTTP request helper** → `@multica/fetch`
- **This app** → Only if it's Next.js-specific (middleware, route handlers, `next.config`)
> Principle: desktop also consumes these packages, so anything reusable must NOT live in `apps/web`.
## Network Requests
Two communication channels, two packages:
```
HTTP → @multica/fetch (consoleApi) → Console :4000 (Hub, Agent CRUD)
WS → @multica/store (gateway) → Gateway :3000 (Chat messages)
```
Rules:
1. **Never hardcode URLs.** Use `consoleApi` for HTTP, `useGatewayStore` for WS. Both read from `setConfig()` in `layout.tsx`.
2. **HTTP for management, WS for real-time.** Creating/deleting agents is HTTP. Sending/receiving chat messages is WS.
3. **Future: gateway may proxy HTTP.** The current two-endpoint setup may merge into one. Because all requests go through `@multica/fetch` and `@multica/store`, business code won't need changes.
## State Management
We use Zustand. Follow these rules:
### Subscribe only to what you render
```tsx
// Good — component re-renders only when status changes
const status = useHubStore((s) => s.status)
// Bad — re-renders on ANY store change
const { status } = useHubStore()
```
### Use getState() in callbacks
Don't subscribe to state that's only used inside event handlers. Read it at call time instead.
```tsx
// Good — no subscription, no re-render
const handleSend = useCallback((text: string) => {
const hub = useHubStore.getState().hub
const agentId = useHubStore.getState().activeAgentId
if (!hub?.hubId || !agentId) return
useMessagesStore.getState().addUserMessage(text, agentId)
useGatewayStore.getState().send(hub.hubId, "message", { agentId, content: text })
}, [])
// Bad — subscribes to hub and activeAgentId just to use them in onClick
const hub = useHubStore((s) => s.hub)
const activeAgentId = useHubStore((s) => s.activeAgentId)
```
### Subscribe to derived values, not raw objects
```tsx
// Good — re-renders only when the boolean flips
const isConnected = useHubStore((s) => s.status === "connected")
// Bad — re-renders when any field of hub changes
const hub = useHubStore((s) => s.hub)
const isConnected = hub !== null
```
### Filter/derive with useMemo, not inside selectors
Selectors that return new references (`.filter()`, `.map()`) never compare equal to their previous result, which causes needless or even infinite re-renders. Derive outside the selector.
```tsx
// Good
const messages = useMessagesStore((s) => s.messages)
const filtered = useMemo(
() => messages.filter((m) => m.agentId === activeAgentId),
[messages, activeAgentId]
)
// Bad — .filter() returns a new array every time, triggers infinite loop
const filtered = useMessagesStore((s) => s.messages.filter(...))
```
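The problem comes down to reference equality. The sketch below is a simplified stand-in for a selector-based store, not Zustand itself: `createStore` and `subscribeSelected` are hypothetical names, and it assumes selected values are compared with `Object.is`. It shows a `.filter()` selector reporting a change on an update that never touched `messages`:

```typescript
// Simplified stand-in for a selector-based store (not the Zustand API).
// Assumption: a subscriber is notified only when its selected value
// differs from the previous one under Object.is.
function createStore<S extends object>(initial: S) {
  let state = initial;
  const listeners = new Set<() => void>();
  return {
    setState(patch: Partial<S>): void {
      state = { ...state, ...patch };
      listeners.forEach((listener) => listener());
    },
    subscribeSelected<T>(selector: (s: S) => T, onChange: () => void): void {
      let previous = selector(state);
      listeners.add(() => {
        const next = selector(state);
        if (!Object.is(previous, next)) {
          previous = next;
          onChange();
        }
      });
    },
  };
}

interface ChatState {
  status: string;
  messages: { agentId: string; text: string }[];
}

const store = createStore<ChatState>({
  status: "idle",
  messages: [{ agentId: "a1", text: "hi" }],
});

let stableNotifications = 0;
let derivedNotifications = 0;

// Stable: returns the same array reference until `messages` is replaced.
store.subscribeSelected((s) => s.messages, () => stableNotifications++);
// Unstable: .filter() allocates a new array on every call.
store.subscribeSelected(
  (s) => s.messages.filter((m) => m.agentId === "a1"),
  () => derivedNotifications++,
);

// An update that never touches `messages`.
store.setState({ status: "connected" });

console.log(stableNotifications, derivedNotifications);
```

The stable selector stays silent while the derived one fires, which is why the filtering belongs in `useMemo`, keyed on the stable `messages` reference.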
### Initialize once
Side-effectful operations (WS connection, SDK init) must have guards to prevent double execution.
```tsx
// Inside store
connect: (deviceId) => {
if (client) return // Already connected, skip
client = new GatewayClient(...)
client.connect()
}
```
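A self-contained version of the guard might look like the sketch below. `FakeClient` is a hypothetical stand-in for `GatewayClient`, whose constructor arguments are elided above:

```typescript
// FakeClient is a hypothetical stand-in for the real WebSocket client.
class FakeClient {
  static connections = 0;
  constructor(readonly deviceId: string) {}
  connect(): void {
    FakeClient.connections++;
  }
}

let client: FakeClient | null = null;

function connect(deviceId: string): void {
  if (client) return; // already connected: a second call is a no-op
  client = new FakeClient(deviceId);
  client.connect();
}

// Callers may invoke connect() more than once, e.g. when a layout re-mounts.
connect("device-1");
connect("device-1");
console.log(FakeClient.connections);
```

Keeping the guard inside the store means callers do not need to track whether a connection already exists.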
## Imports
### Use direct paths for @multica/ui
```tsx
// Good
import { Chat } from "@multica/ui/components/chat"
import { Button } from "@multica/ui/components/ui/button"
import { useScrollFade } from "@multica/ui/hooks/use-scroll-fade"
// Bad — barrel import pulls in everything
import { Chat, Button, useScrollFade } from "@multica/ui"
```
`@multica/store` barrel import is fine — it has few exports and all are lightweight Zustand stores.
### Heavy components: use dynamic import
For large dependencies (code editors, chart libraries, PDF viewers), lazy-load to keep the initial bundle small.
```tsx
import dynamic from "next/dynamic"
const CodeEditor = dynamic(
() => import("@multica/ui/components/code-editor"),
{ ssr: false }
)
```
## Conditional Rendering
Use ternary expressions, not `&&`, to avoid rendering `0` or `""` as visible content.
```tsx
// Good
{status === "connected" ? <AgentList /> : null}
// Bad — if agents.length is 0, renders "0" on screen
{agents.length && <AgentList />}
```
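The difference is plain JavaScript evaluation and can be checked without React. In this minimal sketch, strings stand in for JSX and the function names are illustrative only:

```typescript
// Strings stand in for JSX so the example runs without React.
const withAnd = (count: number) => count && "<AgentList />";
const withTernary = (count: number) => (count > 0 ? "<AgentList />" : null);

console.log(withAnd(0));     // the number 0, which React would render as text
console.log(withTernary(0)); // null, which React renders as nothing
console.log(withAnd(3) === withTernary(3));
```

`&&` returns its left operand when that operand is falsy, so a zero count leaks through as a renderable value.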
## Development
```bash
# Start web dev server (port 3001)
multica dev web
# Or start all services
multica dev
# Typecheck
cd apps/web && npx tsc --noEmit
```
## Adding a New Feature — Checklist
1. Does it need global state? → Create a store in `@multica/store`
2. Does it need HTTP calls? → Use `consoleApi` from `@multica/fetch`
3. Does it need a UI component? → Add to `@multica/ui/components/`
4. Does it need a UI hook? → Add to `@multica/ui/hooks/`
5. Is it Next.js-specific? → Only then add to `apps/web`
6. Is the component heavy (>50KB)? → Use `next/dynamic` with `{ ssr: false }`


@ -1,75 +0,0 @@
# Desktop Login Integration
## Login Flow
```
Desktop: user clicks "Log in"
Start a local HTTP server (random port, e.g. 54321)
Open browser → http://localhost:3000/api/desktop/session?port=54321&platform=web
Web redirects → /login?next=...
User logs in, calling /api/v1/auth/login (proxied to api-dev.copilothub.ai)
Login succeeds, callback → http://127.0.0.1:54321/callback?sid=xxx&user=xxx
Desktop saves to ~/.super-multica/auth.json
```
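## Frontend Logic
### Web
- Port: **3000**
- Login API: `/api/v1/auth/login` (proxied to the backend via Next.js rewrites)
- Callback after a successful login: `http://127.0.0.1:{port}/callback?sid=xxx&user=xxx`
### Desktop
- Click "Log in" → start the local server → open the browser
- Receive the callback → save to a local file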
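The two URLs in this flow can be sketched with the standard `URL` API. The helper names `buildSessionUrl` and `parseCallback` are hypothetical, and the `user` parameter is treated as an opaque string because its encoding is not documented here:

```typescript
// Hypothetical helpers; only the URL shapes come from the login flow.
function buildSessionUrl(webOrigin: string, port: number): string {
  const url = new URL("/api/desktop/session", webOrigin);
  url.searchParams.set("port", String(port));
  url.searchParams.set("platform", "web");
  return url.toString();
}

interface CallbackParams {
  sid: string;
  user: string; // opaque: how `user` is encoded is left open here
}

// Parses the request path received by the local callback server.
function parseCallback(requestUrl: string, port: number): CallbackParams | null {
  const url = new URL(requestUrl, `http://127.0.0.1:${port}`);
  if (url.pathname !== "/callback") return null;
  const sid = url.searchParams.get("sid");
  const user = url.searchParams.get("user");
  return sid && user ? { sid, user } : null;
}

console.log(buildSessionUrl("http://localhost:3000", 54321));
console.log(parseCallback("/callback?sid=abc&user=u1", 54321));
```

Returning `null` for any other path or for missing parameters lets the local server reject stray requests before anything is written to disk.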
## Storage
**Path:** `~/.super-multica/auth.json`
After a successful Desktop login, the SID and user info are stored in a local file:
```json
{
"sid": "session-id-from-backend",
"user": {
"uid": "user-id",
"name": "User Name",
"email": "user@example.com"
}
}
```
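Subsequent requests can read `sid` from this file for authentication.
## Logout
**The backend only needs to return an error; the frontend handles logout automatically.**
When the frontend receives an authentication error, it:
1. Calls `auth:clear` to clear local data
2. Redirects to the login page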
## Local Debugging
```bash
# 1. Start Web (Next.js rewrites automatically proxy /api/* to api-dev.copilothub.ai)
pnpm dev:web
# 2. Start Desktop
pnpm dev:desktop
```
During local debugging, Next.js rewrites (configured in `apps/web/next.config.ts`) automatically proxy `/api/*` requests to the backend specified by `MULTICA_API_URL`.
## References
- **Cap** - https://github.com/CapSoftware/Cap


@ -1,288 +0,0 @@
# Channel System
The Channel system connects external messaging platforms (Telegram, Discord, etc.) to the Hub's agent. Each platform is a **plugin** that translates platform-specific APIs into a unified interface.
> For media handling details (audio transcription, image/video description), see [media-handling.md](./media-handling.md).
> For message flow across all three I/O paths (Desktop / Web / Channel), see [message-paths.md](../message-paths.md).
## Architecture
```
┌─────────────────────────────────────────────────────────────┐
│ credentials.json5 │
│ { channels: { telegram: { default: { botToken } } } } │
└──────────────────────┬──────────────────────────────────────┘
│ loadChannelsConfig()
┌─────────────────────────────────────────────────────────────┐
│ Channel Manager (manager.ts) │
│ │
│ startAll() → iterate plugins → startAccount() per account │
│ ensureSubscribed() → listen for agent lifecycle events │
│ │
│ Incoming: │
│ routeIncoming() → 👀 ack + debouncer → agent.write() │
│ Outgoing: │
│ activeRoute → aggregator → plugin.outbound.*() │
│ │
│ State: │
│ pendingRoutes[] ─(FIFO)→ activeRoute + activeAcks │
│ ackBuffer[] ─(snapshot on flush)→ pendingRoutes[].acks │
└──────────┬──────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ InboundDebouncer (inbound-debouncer.ts) │
│ 500ms idle window / 2000ms hard cap per conversationId │
│ Each flush → snapshot route + acks → agent.write() │
└──────────┬──────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ Plugin Registry (registry.ts) │
│ registerChannel(plugin) / listChannels() / getChannel(id) │
└──────────┬──────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ Channel Plugins (e.g. telegram.ts) │
│ │
│ config — resolve account credentials │
│ gateway — receive messages (polling / webhook) │
│ outbound — send replies, typing, reactions (👀 ack) │
│ downloadMedia() — download media files to local disk │
└─────────────────────────────────────────────────────────────┘
```
## Plugin Interface
Each channel plugin implements `ChannelPlugin` (defined in `types.ts`):
```typescript
interface ChannelPlugin {
readonly id: string; // "telegram", "discord", etc.
readonly meta: { name: string; description: string };
readonly chunkerConfig?: BlockChunkerConfig; // override text chunking per platform
readonly config: ChannelConfigAdapter; // credential resolution
readonly gateway: ChannelGatewayAdapter; // receive messages
readonly outbound: ChannelOutboundAdapter; // send replies
downloadMedia?(fileId: string, accountId: string): Promise<string>; // optional
}
```
### Three Adapters
| Adapter | Role | Key Methods |
|---------|------|-------------|
| **config** | Resolve credentials from `credentials.json5` | `listAccountIds()`, `resolveAccount()`, `isConfigured()` |
| **gateway** | Receive inbound messages from the platform | `start(accountId, config, onMessage, signal)` |
| **outbound** | Send replies back to the platform | `sendText()`, `replyText()`, `sendTyping?()`, `addReaction?()`, `removeReaction?()` |
### downloadMedia (optional)
Platforms that support media (voice, image, video, document) implement `downloadMedia()` to download files to `~/.super-multica/cache/media/` with UUID filenames. The Manager calls this before processing media.
## Message Flow
### Inbound (Platform → Agent)
```
User sends message in Telegram
→ grammy long-polling → onMessage callback
→ ChannelManager.routeIncoming()
1. Update lastRoute (reply target)
2. Start typing indicator (repeats every 5s)
3. Add 👀 reaction to this message (ack)
4. Push ack route to ackBuffer
5. If media: routeMedia() → download → transcribe/describe → text
6. Push text into InboundDebouncer
InboundDebouncer (per conversationId):
┌─ 500ms idle window: wait for more messages
│ If another message arrives within 500ms, reset timer and append
│ If 2000ms since first message, force-flush immediately
└─ On flush:
1. Snapshot lastRoute → route
2. Snapshot ackBuffer → acks, clear buffer
3. Push { route, acks } to pendingRoutes queue
4. Call agent.write(combinedText)
```
All media is converted to text before the agent sees it. See [media-handling.md](./media-handling.md) for details.
### Outbound (Agent → Platform)
```
agent.write() queued → agent.run() starts
→ agent_start event
1. Shift entry from pendingRoutes queue
2. Set activeRoute = entry.route (stable for entire run)
3. Set activeAcks = entry.acks
→ message_start (assistant)
1. Create MessageAggregator wired to activeRoute
→ message_update (assistant)
1. Feed text deltas to aggregator
→ message_end (assistant)
1. Aggregator flushes final block, then null out
(May repeat if agent does multi-turn tool calls)
→ Aggregator emits BlockReply chunks:
Block 0: plugin.outbound.replyText() // reply to original message
Block N: plugin.outbound.sendText() // follow-up messages
→ agent_end event
1. Remove 👀 from all activeAcks messages
2. Clear activeRoute and activeAcks
3. If pendingRoutes is empty → stop typing
If more pending → keep typing for next run
```
The **MessageAggregator** buffers streaming LLM output and splits it into blocks at natural text boundaries (paragraphs, code blocks). This is necessary because messaging platforms cannot consume raw streaming deltas.
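The boundary rule can be illustrated with a non-streaming sketch. It assumes blocks are separated by blank lines and that a fenced code block must stay in one piece. `splitIntoBlocks` is a hypothetical name, and the real `MessageAggregator` works incrementally on deltas and honors `BlockChunkerConfig`, neither of which is modeled here:

```typescript
// Built with repeat() so this example can itself sit inside a fenced block.
const FENCE = "`".repeat(3);

// Splits text at blank lines, but never inside a fenced code block.
function splitIntoBlocks(text: string): string[] {
  const blocks: string[] = [];
  let current: string[] = [];
  let inFence = false;

  for (const line of text.split("\n")) {
    if (line.trimStart().startsWith(FENCE)) inFence = !inFence;

    if (line.trim() === "" && !inFence) {
      // Blank line outside a fence: close the current block, if any.
      if (current.length > 0) {
        blocks.push(current.join("\n"));
        current = [];
      }
    } else {
      current.push(line);
    }
  }
  if (current.length > 0) blocks.push(current.join("\n"));
  return blocks;
}

const sample = ["Intro", "", FENCE + "ts", "const a = 1;", "", "const b = 2;", FENCE, "", "Outro"].join("\n");
console.log(splitIntoBlocks(sample).length);
```

Under this rule the first block would go out via `replyText()` and later blocks via `sendText()`, matching the Block 0 / Block N split above.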
## Route Queue Pattern
The channel system uses a FIFO queue to correctly route replies when multiple messages arrive while the agent is busy. This solves the "reply-to mismatch" problem where rapid-fire messages would cause replies to target the wrong original message.
### State Fields
| Field | Type | Purpose |
|-------|------|---------|
| `lastRoute` | `LastRoute \| null` | Where the most recent channel message came from. Updated on every incoming message. |
| `pendingRoutes` | `{ route, acks }[]` | FIFO queue of snapshotted routes, one per debouncer flush. Dequeued on `agent_start`. |
| `activeRoute` | `LastRoute \| null` | Route for the currently running agent. Set on `agent_start`, cleared on `agent_end`. Stable across all turns within one run. |
| `ackBuffer` | `LastRoute[]` | Accumulates 👀 ack targets between debouncer flushes. Snapshotted and cleared on each flush. |
| `activeAcks` | `LastRoute[]` | All messages with 👀 in the current run. Cleaned up on `agent_end`. |
### Lifecycle
```
Message A arrives → lastRoute = A, ackBuffer = [A], 👀 on A
Message B arrives (50ms) → lastRoute = B, ackBuffer = [A, B], 👀 on B
─── 500ms idle ───
Debouncer flushes → pendingRoutes.push({ route: B, acks: [A, B] })
ackBuffer = [], agent.write("A\nB")
Message C arrives → lastRoute = C, ackBuffer = [C], 👀 on C
─── 500ms idle ───
Debouncer flushes → pendingRoutes.push({ route: C, acks: [C] })
ackBuffer = [], agent.write("C")
agent_start (run 1) → activeRoute = B, activeAcks = [A, B]
(agent processes "A\nB", replies to message B)
agent_end (run 1) → remove 👀 from A and B, pendingRoutes still has 1 → keep typing
agent_start (run 2) → activeRoute = C, activeAcks = [C]
(agent processes "C", replies to message C)
agent_end (run 2) → remove 👀 from C, pendingRoutes empty → stop typing
```
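The lifecycle above can be replayed with a small state machine. This is a sketch under assumptions: `RouteQueue` and its method names are hypothetical, a plain string stands in for `LastRoute`, and the actual reaction and typing calls are left to the caller:

```typescript
type Route = string; // stand-in for LastRoute

class RouteQueue {
  lastRoute: Route | null = null;
  ackBuffer: Route[] = [];
  pendingRoutes: { route: Route; acks: Route[] }[] = [];
  activeRoute: Route | null = null;
  activeAcks: Route[] = [];

  // Every incoming message updates the reply target and records an ack.
  onIncoming(route: Route): void {
    this.lastRoute = route;
    this.ackBuffer.push(route);
  }

  // Each debouncer flush snapshots the route and acks into the FIFO queue.
  onDebouncerFlush(): void {
    if (this.lastRoute === null) return;
    this.pendingRoutes.push({ route: this.lastRoute, acks: this.ackBuffer });
    this.ackBuffer = [];
  }

  // agent_start: dequeue one entry; it stays stable for the whole run.
  onAgentStart(): void {
    const entry = this.pendingRoutes.shift();
    if (!entry) return;
    this.activeRoute = entry.route;
    this.activeAcks = entry.acks;
  }

  // agent_end: report which acks to remove and whether typing continues.
  onAgentEnd(): { clearedAcks: Route[]; keepTyping: boolean } {
    const clearedAcks = this.activeAcks;
    this.activeRoute = null;
    this.activeAcks = [];
    return { clearedAcks, keepTyping: this.pendingRoutes.length > 0 };
  }
}

const queue = new RouteQueue();
queue.onIncoming("A");
queue.onIncoming("B");
queue.onDebouncerFlush(); // batch 1: route B, acks [A, B]
queue.onIncoming("C");
queue.onDebouncerFlush(); // batch 2: route C, acks [C]
queue.onAgentStart();
console.log(queue.activeRoute, queue.activeAcks);
```

Because `activeRoute` is copied out of the queue at `agent_start`, later calls to `onIncoming()` cannot redirect the reply of a run that is already in progress.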
### Why agent_start / agent_end (not message_end)
In multi-turn agent runs (e.g. when the agent uses tools), `message_end` fires once per assistant message — potentially multiple times per `agent.run()`. Using `message_end` for state management would:
- Clear `activeRoute` mid-run, causing the next turn's aggregator to pick up the wrong route
- Remove 👀 too early (before the agent is actually done)
- Stop typing between tool-call turns
`agent_start` and `agent_end` fire exactly once per `agent.run()`, making them the correct lifecycle boundaries.
### lastRoute vs activeRoute
- **`lastRoute`** — global, updated on every incoming message. Used for: typing indicators, error reporting, creating aggregators when no activeRoute exists.
- **`activeRoute`** — per-run, set from queue on `agent_start`. Used for: reply targeting via aggregator. Guarantees that a run's reply goes to the correct message even if new messages arrive during processing.
Desktop and Web always receive agent events independently via their own mechanisms (IPC / Gateway). `clearLastRoute()` is called when a desktop/web message arrives to prevent channel forwarding.
## Inbound Debouncer
The `InboundDebouncer` (`inbound-debouncer.ts`) batches rapid-fire messages from the same conversation into a single `agent.write()` call. This prevents the agent from processing incomplete thoughts when users send multiple short messages quickly.
**Parameters:**
- `delayMs` (default 500ms) — idle window: how long to wait after each message before flushing
- `maxWaitMs` (default 2000ms) — hard cap: max time since first message before force-flushing
**Behavior:**
- Messages within 500ms of each other are combined with newlines
- Messages >500ms apart get independent flushes and separate agent runs
- No busy-awareness: each flush is independent regardless of agent state
- Each flush triggers a route snapshot (lastRoute + ackBuffer) pushed to the pendingRoutes queue
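The interaction between the idle window and the hard cap is easiest to see in a timer-free simulation. This sketch assumes messages for one conversation arrive sorted by time, and that a message arriving exactly at the deadline starts a new batch. `simulateDebounce` is a hypothetical helper, not the `InboundDebouncer` API:

```typescript
interface Incoming {
  at: number; // arrival time in ms
  text: string;
}

interface Batch {
  flushedAt: number;
  text: string;
}

// Replays the batching rule for one conversation without real timers.
function simulateDebounce(messages: Incoming[], delayMs = 500, maxWaitMs = 2000): Batch[] {
  const batches: Batch[] = [];
  let parts: string[] = [];
  let firstAt = 0;
  let deadline = 0;

  for (const msg of messages) {
    // The pending batch has already flushed if its deadline has passed.
    if (parts.length > 0 && msg.at >= deadline) {
      batches.push({ flushedAt: deadline, text: parts.join("\n") });
      parts = [];
    }
    if (parts.length === 0) firstAt = msg.at;
    parts.push(msg.text);
    // Idle window resets on each message; the hard cap is fixed at first arrival.
    deadline = Math.min(msg.at + delayMs, firstAt + maxWaitMs);
  }
  if (parts.length > 0) batches.push({ flushedAt: deadline, text: parts.join("\n") });
  return batches;
}

const quick = simulateDebounce([
  { at: 0, text: "A" },
  { at: 50, text: "B" },
  { at: 1000, text: "C" },
]);
console.log(quick);
```

Here A and B share one flush at 550 ms and C flushes alone at 1500 ms. A steady stream of messages 400 ms apart would never go idle, which is the case the 2000 ms cap exists for.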
## Typing and Reaction Lifecycle
### Typing Indicator
- **Start:** `routeIncoming()` — starts a 5s repeating interval (Telegram requires re-sending "typing" every 5s)
- **Stop:** `agent_end` — only if `pendingRoutes` is empty (all queued runs complete). If runs remain queued, typing persists.
- **Also stops on:** `clearLastRoute()` (desktop/web message), `stopAccount()`, `stopAll()`, `agent_error`
### 👀 Ack Reaction
- **Add:** `routeIncoming()` — immediately on each message, before debouncing
- **Track:** pushed to `ackBuffer`, then snapshotted into `pendingRoutes[].acks` on debouncer flush, then moved to `activeAcks` on `agent_start`
- **Remove:** `agent_end` — iterates `activeAcks` and removes 👀 from each message
- **Also removed on:** `agent_error`
This ensures every queued message shows 👀 while waiting, and all 👀 are cleaned up precisely when the agent finishes processing that batch.
## Configuration
Channel credentials are stored in `~/.super-multica/credentials.json5` under the `channels` key:
```json5
{
channels: {
telegram: {
default: {
botToken: "123456:ABC-DEF..."
}
},
// discord: { default: { botToken: "..." } },
}
}
```
Each channel ID maps to accounts (keyed by account ID, typically `"default"`). The config adapter for each plugin knows how to extract and validate its credentials.
## Adding a New Plugin
1. Create `src/channels/plugins/<name>.ts` implementing `ChannelPlugin`
2. Register it in `src/channels/index.ts`:
```typescript
import { <name>Channel } from "./plugins/<name>.js";
registerChannel(<name>Channel);
```
3. Add the config shape to the `channels` section of `credentials.json5`
### Implementation Checklist
- [ ] `config` adapter: parse credentials from `credentials.json5`
- [ ] `gateway` adapter: connect to platform, normalize messages to `ChannelMessage`
- [ ] `outbound` adapter: `sendText`, `replyText`, optional `sendTyping`, `addReaction`, `removeReaction`
- [ ] `downloadMedia` (if platform supports media): download to `MEDIA_CACHE_DIR`
- [ ] Group filtering: only respond to messages directed at the bot
- [ ] Graceful shutdown: respect the `AbortSignal` passed to `gateway.start()`
## File Map
| File | Role |
|------|------|
| `src/channels/types.ts` | All type definitions (`ChannelPlugin`, `ChannelMessage`, `DeliveryContext`, etc.) |
| `src/channels/manager.ts` | `ChannelManager` — bridges plugins to the Hub's agent, route queue, typing/ack lifecycle |
| `src/channels/inbound-debouncer.ts` | `InboundDebouncer` — batches rapid-fire messages per conversationId |
| `src/channels/registry.ts` | Plugin registry (`registerChannel`, `listChannels`, `getChannel`) |
| `src/channels/config.ts` | Load channel config from `credentials.json5` |
| `src/channels/index.ts` | Bootstrap: register built-in plugins, re-export public API |
| `src/channels/plugins/telegram.ts` | Telegram plugin (grammy, long polling) |
| `src/channels/plugins/telegram-format.ts` | Markdown → Telegram HTML converter |
| `src/media/transcribe.ts` | Audio transcription (local whisper → OpenAI API) |
| `src/media/describe-image.ts` | Image description (OpenAI Vision API) |
| `src/media/describe-video.ts` | Video description (ffmpeg frame + Vision API) |
| `src/shared/paths.ts` | `MEDIA_CACHE_DIR` path constant |
| `src/hub/message-aggregator.ts` | Streaming text → block chunking for channel delivery |
| `packages/ui/src/components/message-list.tsx` | UI rendering with `stripUserMetadata()` for clean display |
## Current Plugins
| Plugin | Platform | Transport | Library |
|--------|----------|-----------|---------|
| `telegram` | Telegram | Long polling | grammy |
Planned: Discord, Feishu, LINE, etc.


@ -1,161 +0,0 @@
# Channel Media Handling
How multimedia messages (voice, image, video, document) from messaging platforms are processed before reaching the Agent.
## Core Principle
All media is converted to text before the Agent sees it. The Agent only ever receives plain text via `agent.write()`.
```
Platform message (voice/image/video/doc)
→ Plugin: detect type + download file
→ Manager: convert to text (API transcription / vision description)
→ Agent receives text via agent.write()
```
## Reference Architecture (OpenClaw)
OpenClaw supports 6 platforms (Telegram, Discord, LINE, Signal, iMessage, Slack). All share the same media processing pipeline.
### Per-Platform Layer (different for each platform)
Each platform detects media type using its own API:
| Platform | Detection Method |
|----------|-----------------|
| Telegram | `msg.voice`, `msg.audio`, `msg.photo`, `msg.video`, `msg.document` |
| Discord | `attachment.content_type` MIME prefix (`audio/`, `image/`, `video/`) |
| LINE | `message.type` field (`"audio"`, `"image"`, `"video"`, `"file"`) |
| Signal | `attachment.contentType` MIME prefix |
| iMessage | `attachment.mime_type` MIME prefix |
| Slack | Any file attachment (MIME-based detection happens later) |
Each platform downloads the file using its own API, saves to local disk, and tags it:
- `<media:audio>` for voice/audio
- `<media:image>` for images
- `<media:video>` for video
- `<media:document>` for files
### Shared Layer (`applyMediaUnderstanding()`)
One function handles all conversions, called automatically before the Agent sees the message:
1. Reads local file path + MIME type
2. Selects conversion method based on type:
- **audio** → transcription (whisper local / OpenAI API / Groq / Deepgram / Google)
- **image** → vision model description (Gemini / OpenAI / Anthropic)
- **video** → vision model description
3. Replaces placeholder with formatted text:
- Audio: `[Audio]\nTranscript:\n<transcribed text>`
- Image: `[Image]\nDescription:\n<description text>`
4. If conversion fails (no provider configured), the raw placeholder stays in the message
### Transcription Provider Priority
Auto-detection order:
1. sherpa-onnx-offline (local)
2. whisper-cli / whisper.cpp (local)
3. whisper Python CLI (local)
4. gemini CLI (local)
5. API providers: OpenAI → Groq → Deepgram → Google
### Skill Integration
Whisper skills declare requirements in `SKILL.md` metadata:
```yaml
requires:
bins: ["whisper"] # must exist in PATH
```
If the binary is missing, the skill is filtered out — the Agent never sees it. If present, the Agent can use it for transcription.
---
## Our Implementation
All media is converted to text in the Manager layer (`routeMedia()`) before reaching the Agent, matching OpenClaw's `applyMediaUnderstanding()` pattern.
### Architecture
```
┌─────────────────────────────────────────────────────┐
│ Platform Plugin (e.g. telegram.ts) │
│ │
│ bot.on("message:voice") → detect type │
│ bot.api.getFile() → download to local disk │
│ Emit ChannelMessage with media attachment │
└──────────────────┬──────────────────────────────────┘
┌─────────────────────────────────────────────────────┐
│ Channel Manager (manager.ts → routeMedia()) │
│ │
│ Download file via plugin.downloadMedia() │
│ audio → transcribeAudio() → text │
│ image → describeImage() → text │
│ video → describeVideo() (ffmpeg frame + vision) → text │
│ document → file path info │
│ All results → agent.write(text) │
└──────────────────┬──────────────────────────────────┘
┌─────────────────────────────────────────────────────┐
│ Agent receives plain text only │
│ e.g. "[Voice Message]\nTranscript: ..." │
│ e.g. "[Image]\nDescription: ..." │
│ e.g. "[Video]\nDescription: ..." │
└─────────────────────────────────────────────────────┘
```
### Media Processing Modules
| Type | Module | Method | API |
|------|--------|--------|-----|
| audio | `src/media/transcribe.ts` | `transcribeAudio()` | Local whisper/whisper-cli → OpenAI Whisper API (`whisper-1`) |
| image | `src/media/describe-image.ts` | `describeImage()` | OpenAI Vision API (`gpt-4o-mini`) |
| video | `src/media/describe-video.ts` | `describeVideo()` | ffmpeg frame extraction + Vision API |
| document | (inline in manager) | — | File path info only |
### Agent Output Format
| Type | Success | No API Key |
|------|---------|------------|
| audio | `[Voice Message]\nTranscript: <text>` | `[audio message received]\nFile: <path>` |
| image | `[Image]\nDescription: <text>` | `[image message received]\nFile: <path>` |
| video | `[Video]\nDescription: <text>` | `[video message received]\nFile: <path>` |
| document | `[document message received]\nFile: <path>` | same |
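The table can be expressed as one formatting function. This is a minimal sketch: `formatMediaText` and `SUCCESS_LABELS` are hypothetical names, and `converted` is assumed to be `null` whenever no provider produced text:

```typescript
type MediaType = "audio" | "image" | "video" | "document";

// Header and field label used when conversion succeeds.
const SUCCESS_LABELS: Record<Exclude<MediaType, "document">, [string, string]> = {
  audio: ["[Voice Message]", "Transcript"],
  image: ["[Image]", "Description"],
  video: ["[Video]", "Description"],
};

function formatMediaText(type: MediaType, filePath: string, converted: string | null): string {
  if (type !== "document" && converted !== null) {
    const [header, label] = SUCCESS_LABELS[type];
    return `${header}\n${label}: ${converted}`;
  }
  // Documents, and media with no usable provider, fall back to the file path.
  return `[${type} message received]\nFile: ${filePath}`;
}

console.log(formatMediaText("audio", "/tmp/voice.ogg", "hello there"));
console.log(formatMediaText("image", "/tmp/photo.jpg", null));
```

Keeping the fallback in the same function means the agent always receives some text, even when every provider is unavailable.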
### Audio Transcription Priority
`transcribeAudio()` tries providers in order, matching OpenClaw's local-first approach:
1. **Local whisper/whisper-cli** — Free, no network latency, works offline. Detected via `which` and cached.
2. **OpenAI Whisper API** (`whisper-1`) — Requires API key in `credentials.json5`.
3. **null** — No provider available. Placeholder stays in message, agent naturally responds (e.g. suggests installing whisper).
### Whisper Skill (Agent Fallback)
The `skills/whisper/SKILL.md` skill is a secondary safety net. If transcription returned null (no local binary, no API key), the agent receives a placeholder with the file path. If whisper is installed, the skill tells the agent how to transcribe it via the exec tool.
### File Map
| File | Role |
|------|------|
| `src/channels/types.ts` | `ChannelMediaAttachment`, `ChannelMessage.media`, `ChannelPlugin.downloadMedia` |
| `src/channels/plugins/telegram.ts` | Detect voice/audio/photo/video/document + download via grammy API |
| `src/channels/manager.ts` | `routeMedia()` — download, convert, `agent.write(text)` |
| `src/media/transcribe.ts` | Audio → text (local whisper → OpenAI Whisper API) |
| `src/media/describe-image.ts` | Image → text via OpenAI Vision API (gpt-4o-mini) |
| `src/media/describe-video.ts` | Video → extract frame (ffmpeg) → text via Vision API |
| `src/shared/paths.ts` | `MEDIA_CACHE_DIR` (`~/.super-multica/cache/media/`) |
| `skills/whisper/SKILL.md` | Local whisper CLI fallback skill |
### Future Work
| Task | Scope |
|------|-------|
| Groq / Deepgram fallback for audio | `src/media/transcribe.ts` |
| Multi-provider vision support (Gemini, Anthropic) | `src/media/describe-image.ts` |
| Document text extraction (PDF, DOCX) | `src/media/` |
| Media cache cleanup (delete old files) | `src/shared/` |
| Outbound media (send images/audio back to channels) | `types.ts`, plugins |


@ -1,30 +0,0 @@
# CLI
```bash
multica # Interactive mode
multica run "prompt" # Single prompt
multica chat --profile my-agent # Use profile
multica --session abc123 # Continue session
multica session list # List sessions
multica profile list # List profiles
multica skills list # List skills
multica help # Show help
```
Short alias: `mu`
## Sessions
Sessions persist to `~/.super-multica/sessions/<id>/` with JSONL message history and JSON metadata. Context windows are automatically managed with token-aware compaction.
## Profiles
Profiles define agent identity, personality, and memory in `~/.super-multica/agent-profiles/<id>/`.
```bash
multica profile new my-agent # Create profile
multica profile list # List all
multica profile edit my-agent # Open in file manager
```
Profile files: `soul.md`, `user.md`, `workspace.md`, `memory.md`, `memory/*.md`


@ -1,338 +0,0 @@
# Client Streaming Protocol
How clients receive real-time agent events via WebSocket (Gateway mode) or IPC (Desktop mode), and what data structures to use for rendering.
## Transport Overview
```
Gateway mode (Web App):
Client ←──WebSocket──→ Gateway ←──→ Hub ←──→ Agent
Desktop mode (Electron):
Renderer ←──IPC──→ Main Process (Hub + Agent)
```
Both transports deliver the same logical events. The client receives a `StreamPayload` envelope containing an event, and routes it to the store for rendering.
## StreamPayload Envelope
Every real-time event arrives wrapped in a `StreamPayload`:
```ts
interface StreamPayload {
streamId: string; // groups events belonging to the same assistant turn
agentId: string; // which agent produced this event
event: AgentEvent | CompactionEvent;
}
```
In Gateway mode, these arrive as Socket.io messages with `action = "stream"`. In Desktop IPC mode, they arrive as `localChat:event` messages with the same structure.
## Event Types
### 1. Message Lifecycle Events (AgentEvent)
These events represent an LLM response being generated in real time.
#### `message_start`
A new assistant message has begun streaming.
```json
{
"streamId": "019abc12-...",
"agentId": "019def34-...",
"event": {
"type": "message_start",
"message": {
"role": "assistant",
"content": []
}
}
}
```
**Client action:** Create a new empty assistant message bubble. Use `streamId` as the message ID for subsequent updates.
#### `message_update`
Partial content has arrived for the current message.
```json
{
"streamId": "019abc12-...",
"agentId": "019def34-...",
"event": {
"type": "message_update",
"message": {
"role": "assistant",
"content": [
{ "type": "text", "text": "Here is the partial response so far..." },
{ "type": "thinking", "thinking": "Let me consider..." }
]
}
}
}
```
**Client action:** Replace the message's `content` array with the new snapshot. Each update contains the full accumulated content, not a delta.
#### `message_end`
The assistant message is complete.
```json
{
"streamId": "019abc12-...",
"agentId": "019def34-...",
"event": {
"type": "message_end",
"message": {
"role": "assistant",
"content": [
{ "type": "text", "text": "Final complete response." }
],
"stopReason": "end_turn"
}
}
}
```
**Client action:** Finalize the message. Mark streaming as complete. Extract `stopReason` if needed.
### 2. Tool Execution Events (AgentEvent)
These events track tool calls made by the assistant during a turn.
#### `tool_execution_start`
The agent has begun executing a tool.
```json
{
"streamId": "019abc12-...",
"agentId": "019def34-...",
"event": {
"type": "tool_execution_start",
"toolCallId": "toolu_01ABC...",
"toolName": "Bash",
"args": { "command": "ls -la" }
}
}
```
**Client action:** Create a tool result message with `toolStatus: "running"`. Display a spinner or loading indicator.
#### `tool_execution_end`
The tool has finished executing.
```json
{
"streamId": "019abc12-...",
"agentId": "019def34-...",
"event": {
"type": "tool_execution_end",
"toolCallId": "toolu_01ABC...",
"result": "file1.txt\nfile2.txt\n",
"isError": false
}
}
```
**Client action:** Update the matching tool result message. Set `toolStatus` to `"success"` or `"error"` based on `isError`. Render `result` as the tool output.
### 3. Compaction Events (CompactionEvent)
These events notify the client when context window compaction occurs. They use a synthetic `streamId` of `compaction:{agentId}` and do not belong to any message stream.
#### `compaction_start`
Context compaction has begun. The agent is removing old messages to free up context window space.
```json
{
"streamId": "compaction:019def34-...",
"agentId": "019def34-...",
"event": {
"type": "compaction_start"
}
}
```
**Client action:** Show a compaction indicator (e.g., "Compacting context...").
#### `compaction_end`
Compaction is complete. Includes statistics about what was removed.
```json
{
"streamId": "compaction:019def34-...",
"agentId": "019def34-...",
"event": {
"type": "compaction_end",
"removed": 24,
"kept": 8,
"tokensRemoved": 45000,
"tokensKept": 12000,
"reason": "tokens"
}
}
```
| Field | Type | Description |
|-------|------|-------------|
| `removed` | `number` | Number of messages removed |
| `kept` | `number` | Number of messages retained |
| `tokensRemoved` | `number?` | Estimated tokens freed (absent in count mode) |
| `tokensKept` | `number?` | Estimated tokens remaining (absent in count mode) |
| `reason` | `string` | What triggered compaction: `"tokens"`, `"count"`, or `"summary"` |
**Client action:** Hide the compaction indicator. Optionally display a toast or inline notice with the stats.
## Content Block Types
Message content is an array of `ContentBlock`, which is a union of:
```ts
// Plain text
interface TextContent {
type: "text";
text: string;
}
// LLM reasoning (extended thinking)
interface ThinkingContent {
type: "thinking";
thinking: string;
}
// Tool invocation (appears in assistant messages)
interface ToolCall {
type: "toolCall";
id: string;
name: string;
arguments: Record<string, unknown>;
}
// Image content (appears in user messages)
interface ImageContent {
type: "image";
source: { type: "base64"; media_type: string; data: string };
}
```
## Client-Side Store Structure
The recommended Zustand store shape for rendering:
```ts
interface Message {
id: string;
role: "user" | "assistant" | "toolResult";
content: ContentBlock[];
agentId: string;
stopReason?: string;
// Tool result fields (role === "toolResult" only)
toolCallId?: string;
toolName?: string;
toolArgs?: Record<string, unknown>;
toolStatus?: "running" | "success" | "error" | "interrupted";
isError?: boolean;
}
interface CompactionStats {
removed: number;
kept: number;
tokensRemoved?: number;
tokensKept?: number;
reason: string;
}
interface MessagesState {
messages: Message[];
streamingIds: Set<string>; // IDs of messages currently streaming
compacting: boolean; // true while compaction is in progress
lastCompaction: CompactionStats | null; // stats from most recent compaction
}
```
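The two compaction fields could be updated by a pair of pure transitions like the ones sketched below. The `CompactionSlice` type and both function names are assumptions made for illustration; the actual `@multica/store` actions may be structured differently.

```typescript
interface CompactionStats {
  removed: number;
  kept: number;
  tokensRemoved?: number;
  tokensKept?: number;
  reason: string;
}

// Only the compaction-related part of the store state (illustrative name).
interface CompactionSlice {
  compacting: boolean;
  lastCompaction: CompactionStats | null;
}

// On `compaction_start`: raise the flag, keep the previous stats untouched.
function startCompaction(state: CompactionSlice): CompactionSlice {
  return { ...state, compacting: true };
}

// On `compaction_end`: lower the flag and record the new stats.
function endCompaction(state: CompactionSlice, stats: CompactionStats): CompactionSlice {
  return { ...state, compacting: false, lastCompaction: stats };
}
```

Keeping the transitions pure makes them easy to wrap in a Zustand `set` call or to unit test on their own.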
## Event Routing Pseudocode
```ts
function handleStreamEvent(payload: StreamPayload) {
const { streamId, agentId, event } = payload;
switch (event.type) {
case "message_start":
store.startStream(streamId, agentId);
break;
case "message_update":
store.appendStream(streamId, event.message.content);
break;
case "message_end":
store.endStream(streamId, event.message.content, event.message.stopReason);
break;
case "tool_execution_start":
store.startToolExecution(agentId, event.toolCallId, event.toolName, event.args);
break;
case "tool_execution_end":
store.endToolExecution(event.toolCallId, event.result, event.isError);
break;
case "compaction_start":
store.startCompaction();
break;
case "compaction_end":
store.endCompaction({
removed: event.removed,
kept: event.kept,
tokensRemoved: event.tokensRemoved,
tokensKept: event.tokensKept,
reason: event.reason,
});
break;
}
}
```
## Message History via RPC
Clients can also fetch historical messages using the `getAgentMessages` RPC method. See [rpc.md](./rpc.md) for details.
The response returns `AgentMessage[]` which must be normalized into the `Message` format above. Key differences from streaming:
- Historical messages don't have `toolStatus`; infer it from `isError`: `"error"` when `isError` is true, otherwise `"success"`.
- Historical messages may have `content` as a plain `string` instead of `ContentBlock[]` — normalize by wrapping in `[{ type: "text", text: content }]`.
- Tool arguments are not stored on `toolResult` messages — build a lookup map from assistant `ToolCall` blocks by `toolCallId` to reconstruct `toolArgs`.
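The three normalization rules above can be sketched as a single function. The `HistoricalMessage` shape below is an assumption standing in for the real `AgentMessage` type, which may carry different or additional fields; `normalizeHistory` and `toBlocks` are hypothetical helper names.

```typescript
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "thinking"; thinking: string }
  | { type: "toolCall"; id: string; name: string; arguments: Record<string, unknown> };

// Assumed shape of a historical message; the real `AgentMessage` may differ.
interface HistoricalMessage {
  id: string;
  role: "user" | "assistant" | "toolResult";
  content: string | ContentBlock[];
  toolCallId?: string;
  toolName?: string;
  isError?: boolean;
}

interface NormalizedMessage {
  id: string;
  role: "user" | "assistant" | "toolResult";
  content: ContentBlock[];
  agentId: string;
  toolCallId?: string;
  toolName?: string;
  toolArgs?: Record<string, unknown>;
  toolStatus?: "success" | "error";
  isError?: boolean;
}

// Rule 2: wrap plain-string content in a single text block.
function toBlocks(content: string | ContentBlock[]): ContentBlock[] {
  return typeof content === "string" ? [{ type: "text", text: content }] : content;
}

function normalizeHistory(agentId: string, history: HistoricalMessage[]): NormalizedMessage[] {
  // Rule 3: index tool arguments by call ID from assistant ToolCall blocks.
  const argsByCallId = new Map<string, Record<string, unknown>>();
  for (const msg of history) {
    if (msg.role !== "assistant") continue;
    for (const block of toBlocks(msg.content)) {
      if (block.type === "toolCall") argsByCallId.set(block.id, block.arguments);
    }
  }

  return history.map((msg): NormalizedMessage => {
    const base: NormalizedMessage = { id: msg.id, role: msg.role, content: toBlocks(msg.content), agentId };
    if (msg.role !== "toolResult") return base;
    return {
      ...base,
      toolCallId: msg.toolCallId,
      toolName: msg.toolName,
      isError: msg.isError,
      // Rule 1: derive the status from isError.
      toolStatus: msg.isError ? "error" : "success",
      toolArgs: msg.toolCallId === undefined ? undefined : argsByCallId.get(msg.toolCallId),
    };
  });
}
```

Building the lookup map in a first pass means the order of assistant and tool-result messages in the history does not matter.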
## SDK Imports
All types are available from `@multica/sdk`:
```ts
import {
StreamAction,
type StreamPayload,
type AgentEvent,
type CompactionEvent,
type CompactionStartEvent,
type CompactionEndEvent,
type ContentBlock,
type TextContent,
type ThinkingContent,
type ToolCall,
type ImageContent,
} from "@multica/sdk";
```
Store types are available from `@multica/store`:
```ts
import {
useMessagesStore,
type Message,
type CompactionStats,
type ToolStatus,
} from "@multica/store";
```


@ -1,64 +0,0 @@
# Credentials & LLM Providers
## Setup
```bash
multica credentials init
```
Creates:
- `~/.super-multica/credentials.json5` — LLM providers + tools
Example `credentials.json5`:
```json5
{
version: 1,
llm: {
provider: "openai",
providers: {
openai: { apiKey: "sk-xxx", model: "gpt-4o" }
}
},
tools: {
brave: { apiKey: "brv-..." }
}
}
```
## Skill API Keys
Skill-specific API keys are stored in `.env` files within each skill's directory:
```
~/.super-multica/skills/<skill-id>/.env
```
Example for the `earnings-analysis` skill:
```bash
# ~/.super-multica/skills/earnings-analysis/.env
FINANCIAL_DATASETS_API_KEY=your-key-here
```
Skills declare their required environment variables in `SKILL.md` frontmatter:
```yaml
metadata:
requires:
env:
- FINANCIAL_DATASETS_API_KEY
```
The `.env` file is preserved across skill upgrades and is never committed to version control.
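A loader could compare the variables declared under `requires.env` against what is actually set before running a skill. The sketch below is only an illustration of that check; `missingEnvVars` is a hypothetical helper, and how the real loader reads `SKILL.md` or the `.env` file is not shown here.

```typescript
// Returns the names from `required` that are unset or empty in `env`.
// `env` is a plain name-to-value map, e.g. parsed from the skill's .env file.
function missingEnvVars(
  required: readonly string[],
  env: Readonly<Record<string, string | undefined>>,
): string[] {
  return required.filter((name) => !env[name]);
}
```

An empty string is treated as missing, which is an assumption of this sketch rather than documented behavior.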
## LLM Providers
**OAuth Providers** (external CLI login):
- `claude-code` — requires `claude login`
- `openai-codex` — requires `codex login`
**API Key Providers** (configure in `credentials.json5`):
- `anthropic`, `openai`, `kimi-coding`, `google`, `groq`, `mistral`, `xai`, `openrouter`
To check provider status, run `/provider` in interactive mode.


@ -1,82 +0,0 @@
# Development Guide
## Dev Commands
```bash
pnpm dev # Desktop app (recommended)
pnpm dev:desktop # Desktop only (skips package watching)
pnpm dev:gateway # Gateway only
pnpm dev:web # Web app only
pnpm dev:all # Gateway + Web
pnpm build # Production build (turbo-orchestrated)
pnpm typecheck # Type check all packages
pnpm test # Run tests
pnpm test:watch # Watch mode
pnpm test:coverage # With v8 coverage
```
## Local Full-Stack Development
`pnpm dev:local` starts Gateway + Desktop + Web together with isolated data directories.
**Setup:**
1. Copy `.env.example` to `.env` at the repo root
2. Fill in `TELEGRAM_BOT_TOKEN` (get from [@BotFather](https://t.me/BotFather))
3. Run `pnpm dev:local`
| Service | Address | Notes |
|---------|---------|-------|
| Gateway | `http://localhost:4000` | Telegram long-polling mode |
| Web | `http://localhost:3000` | OAuth login flow |
| Desktop | — | Connects to local Gateway + Web |
Data is stored in `~/.super-multica-dev` and `~/Documents/Multica-dev`, isolated from production.
```bash
pnpm dev:local:archive # Archive dev data and start fresh
```
## Environment Configuration
**Desktop** (`apps/desktop/.env.*`):
| Variable | Description |
|----------|-------------|
| `MAIN_VITE_GATEWAY_URL` | WebSocket Gateway URL for remote device pairing |
| `MAIN_VITE_WEB_URL` | Web app URL for OAuth login redirect |
**Web** (`apps/web/next.config.ts`):
| Variable | Description |
|----------|-------------|
| `MULTICA_API_URL` | Backend API URL (required, no default) |
**Build for different environments:**
```bash
# Desktop
pnpm --filter @multica/desktop build # Production (.env.production)
pnpm --filter @multica/desktop build:staging # Staging (.env.staging)
# Web (Vercel)
# Set MULTICA_API_URL in Vercel Dashboard → Settings → Environment Variables
```
See `apps/desktop/.env.example` for the full variable reference.
## Monorepo Workflow
| Command | Purpose |
|---------|---------|
| `pnpm dev` | Full dev mode — watches `core`, `types`, `utils` packages |
| `pnpm dev:desktop` | Desktop only — skip package watching |
**When modifying packages:**
1. Edit code in `packages/core`, `packages/types`, or `packages/utils`
2. Terminal shows `[core] ESM ⚡️ Build success` (~100ms)
3. Restart Desktop to apply changes (Ctrl+C, then `pnpm dev`)
> **Why restart?** Electron main process does not support hot reload — this is an Electron limitation, not ours.


@ -1,235 +0,0 @@
# Exec Approval Protocol
Human-in-the-loop command execution approval for the `exec` tool. When an agent attempts to run a shell command that doesn't pass safety checks, the Hub requests approval from the connected client before proceeding.
## Architecture Overview
```
Agent (exec tool) Hub Gateway Client (UI)
| | | |
|-- onApprovalNeeded -->| | |
| |-- evaluateCommandSafety() |
| |-- requiresApproval()? |
| | | |
| |== exec-approval-request =============> |
| | | |-- show UI
| | | |-- user decides
| | <== resolveExecApproval RPC ==========|
| | | |
| <-- approved/denied -| | |
| | | |
```
1. The **Agent** calls the `exec` tool with a shell command.
2. The `exec` tool invokes the `onApprovalNeeded` callback (injected by the Hub).
3. The **Hub** evaluates the command through a 4-layer safety engine.
4. If approval is needed, the Hub sends an `exec-approval-request` message to the Client via the Gateway.
5. The **Client** displays the approval UI and the user makes a decision.
6. The Client calls the `resolveExecApproval` RPC with the decision.
7. The Hub resolves the pending promise and the command is either executed or denied.
## Safety Evaluation
Before requesting approval, the Hub evaluates the command through 4 layers:
| Layer | Description | Example |
|-------|-------------|---------|
| **Allowlist** | Glob patterns of pre-approved commands | `git **`, `pnpm **` |
| **Shell syntax** | Detects dangerous shell constructs | `\|&`, `` ` ` ``, `$()`, `;` |
| **Safe binaries** | ~40 known-safe commands (no file-path args) | `ls`, `cat`, `git status` |
| **Dangerous patterns** | 25+ regex patterns for risky commands | `rm -rf`, `sudo`, `curl \| sh` |
The result is a risk level: `"safe"`, `"needs-review"`, or `"dangerous"`.
### Configuration
Stored in profile config (`~/.super-multica/agent-profiles/{profileId}/config.json`):
```json
{
"execApproval": {
"security": "allowlist",
"ask": "on-miss",
"timeoutMs": 60000,
"askFallback": "deny",
"allowlist": [
{ "pattern": "git **" },
{ "pattern": "pnpm **" }
]
}
}
```
| Field | Values | Default | Description |
|-------|--------|---------|-------------|
| `security` | `"deny"` \| `"allowlist"` \| `"full"` | `"allowlist"` | `deny` blocks all exec, `full` allows all, `allowlist` requires matching |
| `ask` | `"off"` \| `"on-miss"` \| `"always"` | `"on-miss"` | `off` never asks, `on-miss` asks when allowlist misses, `always` always asks |
| `timeoutMs` | number (ms) | `60000` | Time before auto-deny |
| `askFallback` | `"deny"` \| `"allowlist"` \| `"full"` | `"deny"` | What happens on timeout |
| `allowlist` | array of entries | `[]` | Pre-approved command patterns |
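The defaults in the table can be expressed as a typed object that is merged with whatever the profile provides. The type and helper names below are illustrative assumptions; only the field names, allowed values, and default values are taken from the table.

```typescript
type SecurityMode = "deny" | "allowlist" | "full";
type AskMode = "off" | "on-miss" | "always";

interface ExecApprovalConfig {
  security: SecurityMode;
  ask: AskMode;
  timeoutMs: number;
  askFallback: SecurityMode;
  allowlist: { pattern: string }[];
}

// Defaults copied from the table above.
const DEFAULT_EXEC_APPROVAL: ExecApprovalConfig = {
  security: "allowlist",
  ask: "on-miss",
  timeoutMs: 60000,
  askFallback: "deny",
  allowlist: [],
};

// Hypothetical helper: fills in any omitted field with its documented default.
function withExecApprovalDefaults(partial: Partial<ExecApprovalConfig> = {}): ExecApprovalConfig {
  return { ...DEFAULT_EXEC_APPROVAL, ...partial };
}
```

A profile that sets only `allowlist` would therefore still time out after 60 seconds and deny on timeout.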
## WebSocket Protocol
### Step 1: Approval Request (Hub → Client)
When a command requires approval, the Hub sends a push message with action `exec-approval-request`:
```json
{
"id": "019444a0-0000-7000-8000-000000000001",
"from": "<hubDeviceId>",
"to": "<clientDeviceId>",
"action": "exec-approval-request",
"payload": {
"approvalId": "019444a0-1234-7abc-8000-abcdef123456",
"agentId": "019444a0-5678-7def-8000-123456abcdef",
"command": "rm -rf /tmp/test-data",
"cwd": "/Users/alice/projects/my-app",
"riskLevel": "dangerous",
"riskReasons": [
"Matches dangerous pattern: rm with -r or -f flags",
"Uses recursive/force deletion flags"
],
"expiresAtMs": 1738700060000
}
}
```
#### Payload Fields
| Field | Type | Description |
|-------|------|-------------|
| `approvalId` | `string` | Unique ID for this approval request (UUIDv7). Must be included in the response. |
| `agentId` | `string` | Session ID of the agent that initiated the command. |
| `command` | `string` | The shell command to be executed. |
| `cwd` | `string?` | Working directory for the command. Optional. |
| `riskLevel` | `"safe" \| "needs-review" \| "dangerous"` | Evaluated risk level. |
| `riskReasons` | `string[]` | Human-readable reasons for the risk assessment. |
| `expiresAtMs` | `number` | Unix timestamp (ms) when this request expires. After this time, the Hub auto-resolves based on `askFallback`. |
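An approval dialog will usually want a countdown derived from `expiresAtMs`. A minimal sketch, assuming the client clock is reasonably in sync with the Hub; `secondsRemaining` and `ApprovalDeadline` are hypothetical names.

```typescript
// Only the field this sketch needs from the request payload.
interface ApprovalDeadline {
  expiresAtMs: number;
}

// Whole seconds left before the Hub auto-resolves the request; never negative.
function secondsRemaining(payload: ApprovalDeadline, nowMs: number = Date.now()): number {
  return Math.max(0, Math.ceil((payload.expiresAtMs - nowMs) / 1000));
}
```

Passing `nowMs` explicitly keeps the function deterministic in tests; in a UI it would be called on a timer with the default.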
### Step 2: User Decision (Client → Hub)
The client sends a standard RPC request with method `resolveExecApproval`:
```json
{
"id": "019444a0-0000-7000-8000-000000000002",
"from": "<clientDeviceId>",
"to": "<hubDeviceId>",
"action": "request",
"payload": {
"requestId": "client-req-001",
"method": "resolveExecApproval",
"params": {
"approvalId": "019444a0-1234-7abc-8000-abcdef123456",
"decision": "allow-once"
}
}
}
```
#### Decision Values
| Decision | Effect |
|----------|--------|
| `"allow-once"` | Allow this command to execute. No persistent change. |
| `"allow-always"` | Allow this command and add its binary to the profile allowlist (e.g., `rm **`). Later commands that use the same binary are approved automatically. |
| `"deny"` | Block the command. The agent receives a denial message. |
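To make the `allow-always` example concrete, the sketch below derives a pattern such as `rm **` from a command string. This is a simplification under stated assumptions: it takes the first whitespace-separated token as the binary, and the Hub's actual parsing (quoting, environment prefixes, absolute paths) may differ. `allowlistPatternFor` is a hypothetical name.

```typescript
// Derives an allowlist pattern from the command's first token, or null if there is none.
function allowlistPatternFor(command: string): string | null {
  const binary = command.trim().split(/\s+/)[0];
  return binary ? `${binary} **` : null;
}
```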
### Step 3: RPC Response (Hub → Client)
**Success** — the approval was found and resolved:
```json
{
"id": "019444a0-0000-7000-8000-000000000003",
"from": "<hubDeviceId>",
"to": "<clientDeviceId>",
"action": "response",
"payload": {
"requestId": "client-req-001",
"ok": true,
"payload": {
"ok": true
}
}
}
```
**Error** — the approval was not found (already resolved or expired):
```json
{
"id": "019444a0-0000-7000-8000-000000000004",
"from": "<hubDeviceId>",
"to": "<clientDeviceId>",
"action": "response",
"payload": {
"requestId": "client-req-001",
"ok": false,
"error": {
"code": "NOT_FOUND",
"message": "Approval request not found or already resolved"
}
}
}
```
## Timeout Behavior
If the client does not respond within `timeoutMs` (default: 60 seconds), the Hub resolves the approval automatically based on the `askFallback` configuration:
| `askFallback` | Behavior on timeout |
|---------------|---------------------|
| `"deny"` (default) | Command is denied (fail-closed). |
| `"full"` | Command is allowed. |
| `"allowlist"` | Command is allowed only if it matched the allowlist; otherwise denied. |
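The timeout table reduces to a small decision function. In the sketch below, `matchedAllowlist` is assumed to be the result of the Hub's earlier allowlist check; the function name is hypothetical.

```typescript
type AskFallback = "deny" | "allowlist" | "full";

// Returns true when the command may run after the approval request times out.
function allowedOnTimeout(askFallback: AskFallback, matchedAllowlist: boolean): boolean {
  switch (askFallback) {
    case "deny":
      return false; // fail-closed default
    case "full":
      return true;
    case "allowlist":
      return matchedAllowlist;
  }
}
```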
## SDK Types
All protocol types are exported from `@multica/sdk`:
```ts
import {
ExecApprovalRequestAction, // "exec-approval-request"
type ApprovalDecision, // "allow-once" | "allow-always" | "deny"
type ExecApprovalRequestPayload,
type ResolveExecApprovalParams,
type ResolveExecApprovalResult,
} from "@multica/sdk";
```
## Client Implementation Guide
A minimal client handling exec approvals:
```ts
import { GatewayClient, ExecApprovalRequestAction } from "@multica/sdk";
import type { ExecApprovalRequestPayload, ApprovalDecision } from "@multica/sdk";
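// Assumes `client` is an already-connected GatewayClient and `hubDeviceId`
// is the device ID of the paired Hub; both are set up elsewhere.
// Listen for approval requests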
client.onMessage((msg) => {
if (msg.action === ExecApprovalRequestAction) {
const payload = msg.payload as ExecApprovalRequestPayload;
showApprovalUI(payload);
}
});
// When user makes a decision
async function respondToApproval(approvalId: string, decision: ApprovalDecision) {
const result = await client.request(hubDeviceId, "resolveExecApproval", {
approvalId,
decision,
});
// result.ok === true if resolved successfully
}
```
## Error Handling
The system is designed to be **fail-closed**:
- If sending the approval request to the client fails → command is denied.
- If the client disconnects before responding → timeout fires, command follows `askFallback` (default: deny).
- If the `resolveExecApproval` RPC references an unknown `approvalId` → `NOT_FOUND` error returned, no side effects.
- If the agent is closed while an approval is pending → all pending approvals for that agent are auto-denied.
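The bookkeeping behind the last two rules could look like the registry sketched below. This is a hypothetical in-memory model, not the Hub's implementation: `PendingApprovals` and its methods are illustrative names, and timeouts are left out to keep the example short.

```typescript
type ApprovalDecision = "allow-once" | "allow-always" | "deny";

interface PendingApproval {
  agentId: string;
  settle: (decision: ApprovalDecision) => void;
}

class PendingApprovals {
  private readonly pending = new Map<string, PendingApproval>();

  register(approvalId: string, agentId: string, settle: (decision: ApprovalDecision) => void): void {
    this.pending.set(approvalId, { agentId, settle });
  }

  // Returns false for unknown or already-resolved IDs (the NOT_FOUND case), with no side effects.
  resolve(approvalId: string, decision: ApprovalDecision): boolean {
    const entry = this.pending.get(approvalId);
    if (!entry) return false;
    this.pending.delete(approvalId);
    entry.settle(decision);
    return true;
  }

  // Fail closed: deny everything still pending for an agent that is being closed.
  denyAllForAgent(agentId: string): number {
    const ids: string[] = [];
    this.pending.forEach((entry, approvalId) => {
      if (entry.agentId === agentId) ids.push(approvalId);
    });
    let denied = 0;
    for (const approvalId of ids) {
      if (this.resolve(approvalId, "deny")) denied += 1;
    }
    return denied;
  }
}
```

Deleting the entry before calling `settle` ensures a second `resolve` for the same ID is reported as not found rather than settling twice.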


@ -1,265 +0,0 @@
# Multica Memo
**Multiplexed Information & Computing Agent**
---
## What is Multica
Multica is an always-on AI agent that pulls real data, runs real computation, and takes real action on behalf of users.
It is not a chatbot. It is not a search engine. It is not an analytics dashboard. It is an **autonomous employee** that works 24/7 — monitoring, analyzing, and acting within user-defined authorization boundaries.
Users interact with Multica through natural conversation. They can ask for immediate analysis, or tell the agent to run recurring tasks in the background. The same interface handles both modes — no separate workflow builder, no configuration forms. You talk to it like you'd talk to a team member.
---
## Core Insight
The value chain of knowledge work is: **Data → Analysis → Decision → Action**.
Existing AI products truncate this chain. ChatGPT and Claude stop at conversation. Perplexity stops at search. BI dashboards stop at visualization. Each one hands the remaining work back to the human.
Multica completes the full chain:
- **Data**: Pulls structured data from multiple sources through a unified `data` tool, backed by Multica's centralized data infrastructure. Users never configure API keys or deal with data providers.
- **Analysis**: Runs actual computation — Python, statistical models, charts — not just text summaries. The agent writes and executes code to derive quantitative insights.
- **Decision**: Applies domain-specific analytical frameworks encoded as Skills to evaluate the data and form actionable conclusions.
- **Action**: Executes real-world actions (trade, send email, update records) within a tiered authorization model that the user controls.
---
## Architecture
### One Tool, Infinite Domains
Multica's extensibility model is designed for horizontal scaling across verticals without agent-side complexity growth.
```
Finance Legal Medical ...
┌──────────┐ ┌──────────┐ ┌──────────┐
Skills │ Earnings │ │ Case │ │ Literature│
(Markdown) │ Screening │ │ Contract │ │ Drug │
│ Macro │ │ Compliance│ │ Clinical │
└─────┬─────┘ └─────┬─────┘ └─────┬────┘
│ │ │
┌─────┴───────────────┴────────────────┴────┐
Tool │ data(query, domain) │
(single) └─────────────────┬──────────────────────────┘
┌─────────────────┴──────────────────────────┐
Backend │ Multica Data Service │
│ routing / caching / normalization │
├─────────┬───────────┬───────────┬──────────┤
│ Polygon │ FRED │ PubMed │ Court- │
│ SEC │ NewsAPI │ OpenFDA │ listener │
└─────────┴───────────┴───────────┴──────────┘
```
**One `data` tool** serves all verticals. Adding a new domain means adding backend source adapters and writing Skill markdown files. The agent engine, tool set, and product surface remain unchanged.
**Skills encode domain expertise, not data plumbing.** A Skill is a Markdown file that teaches the agent an analytical workflow: what data to request, how to process it, what to look for, how to present findings. Domain experts can author Skills without writing code.
**Multica proxies all data access.** Users never register for third-party data APIs. Multica's backend handles authentication, rate limiting, caching, and normalization. This simplifies the user experience and creates a natural monetization layer.
### Foreground + Background, One Interface
```
User in conversation:
"Analyze TSLA" → Immediate execution
"Send me a market briefing every morning" → Agent schedules cron task
"Alert me if NVDA drops below 100" → Agent sets event trigger
"Cancel the morning briefing" → Agent removes cron task
```
The agent manages its own background tasks through existing tools (`cron`, `exec`). There is no separate workflow configuration UI. Conversation is the control plane.
Background tasks run persistently, independent of the app being open. Results are delivered through the user's preferred channel (email, Slack, Telegram, push notification, or in-app).
### Tiered Action Authorization
The agent's ability to take action is governed by a user-controlled trust gradient:
| Level | Behavior | Example |
|-------|----------|---------|
| 0 — Read-only | Pull data, analyze, report | Generate earnings analysis |
| 1 — Notify | Detect signal, alert user | "TSLA broke your stop-loss level" |
| 2 — Confirm | Propose action, wait for approval | "Sell 50% TSLA position? [Confirm]" |
| 3 — Autonomous | Execute within preset rules, notify after | Auto-rebalance portfolio within mandate |
Each action type can be independently configured. Users start conservative and escalate trust as they build confidence in the agent. Authorization constraints include per-action limits, daily caps, and scope restrictions.
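One way to picture the per-action configuration is a lookup from action type to trust level, as sketched below. The names, the gate labels, and the choice to default unconfigured actions to level 0 are assumptions made for illustration; the memo does not specify an implementation.

```typescript
// Levels from the table above.
type TrustLevel = 0 | 1 | 2 | 3;
type Gate = "report-only" | "notify" | "await-confirmation" | "execute";

const GATES: Record<TrustLevel, Gate> = {
  0: "report-only",
  1: "notify",
  2: "await-confirmation",
  3: "execute",
};

// Hypothetical lookup: action types without an explicit setting fall back to level 0.
function gateFor(actionType: string, levels: { [actionType: string]: TrustLevel | undefined }): Gate {
  const level: TrustLevel = levels[actionType] ?? 0;
  return GATES[level];
}
```

Defaulting to the most conservative level mirrors the idea that users start conservative and raise trust per action type.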
---
## Product
### Form Factor
**Web-first** for distribution, with desktop and mobile for persistent background operation.
The primary interface is conversational — but output is structured. When the agent produces an analysis, it renders as a formatted report with charts, tables, and data citations, not a chat bubble. Reports are exportable (PDF, Excel).
The secondary interface is **the user's inbox**. Background tasks deliver results via email or messaging. Many users will interact with Multica more through their email than through the app itself.
### User Experience
A new user's first 24 hours:
1. Sign up (web, 30 seconds)
2. Tell the agent which stocks/sectors they follow
3. Next morning: first market briefing arrives in their inbox
4. Open the app, ask a follow-up question about something in the briefing
5. Tell the agent "do this every morning"
**Time to first value: < 24 hours, zero configuration, zero learning curve.**
### Cross-Domain Composition
The most powerful use cases combine multiple domains in a single workflow:
> "We're evaluating an acquisition of a gene-editing company. Give me a full due diligence report."
>
> Agent combines:
> - `data(query, "finance")` → Target's financials, valuation comps
> - `data(query, "legal")` → Patent portfolio, regulatory filings
> - `data(query, "medical")` → Clinical pipeline, trial results
> - `exec` → Python analysis, charts, risk scoring
> - Output: Integrated due diligence report spanning finance + IP + science
One `data` tool, three domains, agent orchestrates autonomously.
---
## Go-to-Market
### First Vertical: Finance
Finance is the right starting point because:
- **Data accessibility**: Abundant free and commercial APIs (market data, filings, macro indicators)
- **Willingness to pay**: Finance professionals value time; current tools (Bloomberg terminal: $24k/year) prove the market pays for information advantage
- **Quantitative output**: The agent's ability to compute (not just chat) is most visible in finance — ratios, models, charts, backtests
- **Recurring workflows**: Daily briefings, portfolio monitoring, earnings tracking — these drive retention naturally
### Target User
Individual investors, independent financial advisors, small fund analysts (< $50M AUM). They currently cobble together Yahoo Finance + SEC EDGAR + Excel + maybe Python scripts. A full company analysis takes them half a day.
Multica does it in 2 minutes.
### Distribution
| Channel | Approach |
|---------|----------|
| Twitter/X FinTwit | Real analysis examples as content — the output IS the demo |
| YouTube | "AI analyst built my morning briefing in 2 minutes" |
| Finance newsletters (Substack) | Weekly analysis pieces generated by Multica, attributed |
| Reddit (r/investing, r/SecurityAnalysis) | High-quality analysis posts, organic |
| Finance KOLs | Free Pro accounts, let them showcase their own output |
### Growth Loop
```
Free daily briefing (user signs up, picks stocks)
Briefing arrives next morning (immediate value)
User shares briefing excerpt on social media
Report footer: "Generated with Multica"
New user sees it → signs up
```
The output is inherently shareable. Every analysis report is a marketing asset.
### Pricing
| Tier | Price | Includes |
|------|-------|---------|
| Free | $0/mo | 5 analyses/month, 1 daily briefing, delayed data |
| Pro | $29/mo | Unlimited analyses, custom briefings, real-time data, export, action (Level 0-2) |
| Team | $79/user/mo | Shared workspace, collaborative Skills, API access |
| Enterprise | Custom | Private deployment, custom data sources, autonomous actions (Level 3), SLA |
---
## Roadmap
### Phase 0→1: Finance MVP (8 weeks)
| Week | Deliverable |
|------|-------------|
| 1-2 | `data` tool backend + 2 sources (market data, macro) |
| 3-4 | 3 finance Skills (company analysis, screening, macro briefing) |
| 5-6 | Email channel (agent sends results, receives instructions) |
| 7-8 | Web app (conversation + report rendering + task management) |
**Launch artifact**: "Sign up, pick 3 stocks, get your first AI briefing tomorrow morning."
### Phase 1→10: Deepen and Expand (months 3-12)
**Months 3-6 — Deepen finance:**
- More data sources (SEC filings, alternative data, earnings call transcripts)
- More Skills (DCF modeling, options analysis, sector comparison, portfolio review)
- Portfolio binding (user connects brokerage, agent gives personalized analysis)
- Event triggers (price alerts, earnings surprises, insider trading signals)
- Action capability (Level 1-2: trade proposals with confirmation)
**Months 6-12 — Adjacent verticals:**
- Finance + Legal (M&A due diligence, SEC compliance, patent analysis)
- Finance + Macro (policy impact, central bank analysis, geopolitical risk)
- Open Skill authoring (users create and share their own Skills)
### Phase 10→100: Platform (year 2+)
**Skill Ecosystem:**
```
multica.ai/skills/
├── @multica/ Official Skills (free)
├── @analyst-pro/ Community contributor (free/paid)
├── @hedgefund-x/ Enterprise private Skills
└── @lawfirm-y/ Vertical-specific paid Skills
```
- Anyone can publish a Skill (it's a Markdown file)
- Enterprises deploy private Skills for their teams
- Paid Skills: creator sets price, Multica takes platform fee
**Data Marketplace:**
- Third-party data providers plug into Multica's backend
- Premium data sources available to paying users
- Multica becomes the distribution channel for data providers
**Multi-vertical expansion:**
- Each new vertical = backend source adapters + domain Skills
- Agent engine unchanged
- Same authorization model, same product surface
---
## Defensibility
| Layer | Moat |
|-------|------|
| Data infrastructure | Aggregated, normalized, cached — hard to replicate per-source |
| Skill ecosystem | Network effects: more Skills → more users → more Skill creators |
| User data | Portfolio history, preference patterns, analysis history — switching cost |
| Trust calibration | User's authorization levels and constraints are personalized over time |
| Domain compounding | Cross-vertical composition (finance + legal + medical) is uniquely enabled by the unified `data` tool architecture |
---
## Summary
Multica is an always-on AI agent that completes the full knowledge work chain: data → analysis → decision → action.
It starts in finance — where data is accessible, users pay, and quantitative output is the clearest differentiator — with a daily briefing that delivers value in < 24 hours.
It scales horizontally through a unified `data` tool + Skill architecture that adds new verticals without changing the agent engine.
It builds a platform moat through a Skill ecosystem where domain experts encode their workflows as shareable, composable Markdown files.
The product is not a tool you open. It's an employee that works while you sleep.


@ -1,232 +0,0 @@
# Message Paths — Desktop / Web / Channel
Three independent paths deliver messages to and from the Hub's agent.
All three share the same `AsyncAgent` instance — they are just different I/O surfaces.
---
## Overview
```
Desktop (Electron IPC) Web (WebSocket via Gateway) Channel (Bot API, e.g. Telegram)
│ │ │
▼ ▼ ▼
localChat:send IPC client.send → Gateway WS plugin.gateway (polling/webhook)
│ │ │
▼ ▼ ▼
hub.ts / ipc/hub.ts hub.ts / onMessage manager.ts / routeIncoming
clearLastRoute() clearLastRoute() set lastRoute
│ │ │
└────────────────► agent.write(text) ◄──────────────────────────────┘
AsyncAgent.run()
┌────────────┴────────────────┐
▼ ▼
agent.subscribe() agent.read()
(multi-consumer) (single-consumer iterable)
│ │
┌────────┴────────┐ ▼
▼ ▼ hub.ts / consumeAgent()
Desktop IPC Channel Manager │
(ipc/hub.ts) (manager.ts) ▼
│ │ Gateway WS → Web client
▼ ▼
localChat:event Bot API reply
→ renderer (via lastRoute)
```
---
## Path 1: Desktop (Electron IPC)
### Send (User → Agent)
```
Renderer: sendMessage(text)
→ IPC: localChat:send
→ ipc/hub.ts handler
→ hub.channelManager.clearLastRoute() // reply stays in desktop
→ agent.write(text)
```
**File**: `apps/desktop/electron/ipc/hub.ts`: `localChat:send` handler (line ~373)
### Receive (Agent → User)
```
Agent runs LLM
→ pi-agent-core fires AgentEvent
→ Agent.subscribeAll() → AsyncAgent channel + subscribers
→ agent.subscribe() callback in ipc/hub.ts
→ Filter: assistant messages + tool_execution + passthrough (compaction, agent_error)
→ IPC: mainWindow.webContents.send('localChat:event', { agentId, streamId, event })
→ Renderer: use-local-chat.ts onEvent callback
→ chat.handleStream(payload)
```
**Files**:
- `apps/desktop/electron/ipc/hub.ts`: `localChat:subscribe` handler (line ~248)
- `apps/desktop/src/hooks/use-local-chat.ts`: `onEvent` listener (line ~54)
- `packages/hooks/src/use-chat.ts`: `handleStream()` (line ~133)
### Error Handling
```
Agent.run() throws / returns error
→ AsyncAgent.write() catch block
→ channel.send(legacy Message) // for read() consumers (Web)
→ agent.emitMulticaEvent({ type: "agent_error", error }) // for subscribe() consumers
→ ipc/hub.ts subscriber → passthrough event → localChat:event
→ use-local-chat.ts → chat.setError() + setIsLoading(false)
```
---
## Path 2: Web (WebSocket via Gateway)
### Send (User → Agent)
```
Web app: sendMessage(text)
→ GatewayClient.send(hubId, "message", { agentId, content })
→ Socket.io → Gateway server → routes to Hub device
→ hub.ts / onMessage handler
→ channelManager.clearLastRoute() // reply stays in gateway
→ agentSenders.set(agentId, deviceId)
→ agent.write(content)
```
**File**: `src/hub/hub.ts`: `onMessage` handler (line ~154)
### Receive (Agent → User)
```
Agent runs LLM
→ pi-agent-core fires AgentEvent
→ Agent.subscribeAll() → AsyncAgent channel + subscribers
→ agent.read() consumed by hub.ts / consumeAgent()
→ Filter: assistant messages + tool_execution + passthrough (compaction, agent_error)
→ client.send(targetDeviceId, StreamAction, { streamId, agentId, event })
→ Socket.io → Gateway → routes to Web client device
→ GatewayClient.onMessage callback
→ use-gateway-chat.ts → chat.handleStream(payload)
```
**Files**:
- `src/hub/hub.ts`: `consumeAgent()` (line ~314)
- `packages/hooks/src/use-gateway-chat.ts`: `onMessage` listener (line ~50)
- `packages/hooks/src/use-chat.ts`: `handleStream()` (line ~133)
### Error Handling
```
Agent.run() throws / returns error
→ AsyncAgent.write() catch block
→ channel.send(legacy Message) // consumed by consumeAgent() → sent as "message" action
→ agent.emitMulticaEvent({ type: "agent_error", error })
→ read() → consumeAgent() → passthrough event → StreamAction
→ GatewayClient → use-gateway-chat.ts → chat.setError() + setIsLoading(false)
```
**Note**: Legacy error Messages also reach the Web client as `"message"` action (a plain text fallback). The `agent_error` event provides structured error info for proper UI rendering.
---
## Path 3: Channel (Bot API, e.g. Telegram)
### Send (User → Agent)
```
User sends message in Telegram
→ grammy long-polling receives Update
→ plugin.gateway.start() callback: onMessage(channelMessage)
→ ChannelManager.routeIncoming()
→ Set lastRoute = { plugin, deliveryCtx } // reply goes back to Telegram
→ agent.write(text) // same as desktop/web
```
**File**: `src/channels/manager.ts`: `routeIncoming()` (line ~233)
### Receive (Agent → User)
```
Agent runs LLM
→ pi-agent-core fires AgentEvent
→ Agent.subscribeAll() → AsyncAgent channel + subscribers
→ agent.subscribe() callback in ChannelManager.subscribeToAgent()
→ Check: if (!lastRoute) return // no active channel route, skip
→ Filter: only assistant messages
→ message_start → createAggregator() // MessageAggregator buffers/chunks text
→ message_update → aggregator.handleEvent()
→ message_end → aggregator.handleEvent() → null aggregator
→ Aggregator emits text blocks
→ Block 0: plugin.outbound.replyText(deliveryCtx, text) // Telegram reply
→ Block N: plugin.outbound.sendText(deliveryCtx, text) // follow-up messages
```
**Files**:
- `src/channels/manager.ts`: `subscribeToAgent()` (line ~151), `createAggregator()` (line ~205)
- `src/hub/message-aggregator.ts`: text chunking/buffering logic
### Error Handling
```
Agent.run() throws / returns error
→ AsyncAgent.write() catch block
→ agent.emitMulticaEvent({ type: "agent_error", error })
→ subscribe() → ChannelManager subscriber
→ if lastRoute exists:
→ plugin.outbound.sendText(deliveryCtx, "[Error] ${errorMsg}")
```
---
## Comparison Table
| Aspect | Desktop (IPC) | Web (WebSocket) | Channel (Bot API) |
|---------------------|------------------------|---------------------------|--------------------------|
| **Transport** | Electron IPC | Socket.io via Gateway | Bot API (HTTP) |
| **Send entry** | `localChat:send` | `client.send` → Gateway | `routeIncoming` |
| **Receive method** | `agent.subscribe()` | `agent.read()` (iterable) | `agent.subscribe()` |
| **Consumer** | ipc/hub.ts subscriber | hub.ts `consumeAgent()` | manager.ts subscriber |
| **Frontend hook** | `use-local-chat.ts` | `use-gateway-chat.ts` | N/A (Bot API) |
| **State hook** | `use-chat.ts` | `use-chat.ts` | N/A |
| **Reply routing** | Always (IPC channel) | `agentSenders` Map | `lastRoute` pattern |
| **clearLastRoute** | Yes (on send) | Yes (on send) | No (sets lastRoute) |
| **Error display** | `agent_error` → UI | `agent_error` → UI | `agent_error` → Bot text |
| **Tool results** | Rendered in UI | Rendered in UI | Skipped (text only) |
| **Text chunking** | No (full stream) | No (full stream) | Yes (MessageAggregator) |
---
## lastRoute Pattern
`lastRoute` records which channel most recently sent a message. When the agent replies:
- If `lastRoute` is set → reply goes to that channel (e.g. Telegram)
- If `lastRoute` is null → reply goes to Desktop/Web only (via their own mechanisms)
**Clearing**: Desktop and Web both call `channelManager.clearLastRoute()` before `agent.write()`, so channel replies stop when the user switches to desktop/web.
**Setting**: `routeIncoming()` sets `lastRoute` when a channel message arrives.
Desktop and Web always receive agent events regardless of `lastRoute` — they use their own independent delivery mechanisms (IPC subscribe / Gateway read).
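The set/clear behaviour described above can be sketched as a small state holder. This is an illustration only: the `LastRouteTracker` class, the `Route` shape, and its field names are hypothetical and are not taken from the `ChannelManager` source.

```typescript
// Hypothetical sketch of the lastRoute pattern; all names are illustrative.
interface Route {
  channel: string; // e.g. "telegram"
  conversationId: string; // where a channel reply should be delivered
}

class LastRouteTracker {
  private lastRoute: Route | null = null;

  // Plays the role of routeIncoming(): a channel message sets the route.
  setFromChannel(route: Route): void {
    this.lastRoute = route;
  }

  // Plays the role of clearLastRoute(): Desktop/Web call this before agent.write().
  clear(): void {
    this.lastRoute = null;
  }

  // null means "no channel delivery"; Desktop/Web still receive events on their own paths.
  replyTarget(): Route | null {
    return this.lastRoute;
  }
}

const tracker = new LastRouteTracker();
tracker.setFromChannel({ channel: "telegram", conversationId: "42" });
const afterChannelMessage = tracker.replyTarget(); // the Telegram route
tracker.clear();
const afterDesktopMessage = tracker.replyTarget(); // null
```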
---
## Event Filtering
All three paths filter raw agent events. Only these are forwarded to consumers:
| Event Type | Desktop | Web | Channel |
|-------------------------|---------|-----|---------|
| `message_start` | assistant only | assistant only | assistant only |
| `message_update` | assistant only | assistant only | assistant only |
| `message_end` | assistant only | assistant only | assistant only |
| `tool_execution_start` | Yes | Yes | No |
| `tool_execution_end` | Yes | Yes | No |
| `compaction_start` | Yes (passthrough) | Yes (passthrough) | No |
| `compaction_end` | Yes (passthrough) | Yes (passthrough) | No |
| `agent_error` | Yes (passthrough) | Yes (passthrough) | Yes (→ text) |
| User message events | Filtered out | Filtered out | Filtered out |
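
The table can be read as a single predicate per consumer. The sketch below encodes it that way; the `AgentEvent` shape (in particular a `role` field on message events) and the `shouldForward` name are assumptions for illustration, not the actual filter code in any of the three paths.

```typescript
// Hypothetical encoding of the filtering table; event shape is assumed.
type Consumer = "desktop" | "web" | "channel";

interface AgentEvent {
  type: string;
  role?: "user" | "assistant"; // assumed to be present on message_* events
}

const MESSAGE_EVENTS = new Set(["message_start", "message_update", "message_end"]);
const UI_ONLY_EVENTS = new Set([
  "tool_execution_start",
  "tool_execution_end",
  "compaction_start",
  "compaction_end",
]);

function shouldForward(event: AgentEvent, consumer: Consumer): boolean {
  // Message events: assistant only, on every path (user messages are filtered out).
  if (MESSAGE_EVENTS.has(event.type)) return event.role === "assistant";
  // Tool and compaction events: Desktop and Web only.
  if (UI_ONLY_EVENTS.has(event.type)) return consumer !== "channel";
  // Errors reach every consumer (channels render them as text).
  return event.type === "agent_error";
}
```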

@ -1,497 +0,0 @@
# Mobile Development Guide
Complete lifecycle guide for developing, testing, and publishing the Expo React Native app — from first line of code to App Store / Google Play.
## Overview
```
Phase 1: Environment Setup          You are here if starting fresh
Phase 2: Development & Testing      Daily work loop
Phase 3: Pre-Release Preparation    Before your first submission
Phase 4: Build & Submit             Ship to stores
Phase 5: Post-Launch                Maintain and update
```
---
## Phase 1: Environment Setup
### 1.1 Required Software
| Tool | Purpose | Install |
|------|---------|---------|
| **Node.js** (LTS) | JS runtime | `brew install node` or [nodejs.org](https://nodejs.org) |
| **pnpm** | Package manager | `corepack enable && corepack prepare pnpm@latest --activate` |
| **Xcode** | iOS build toolchain | Mac App Store (free) |
| **Xcode Command Line Tools** | Compilers, simulators | `xcode-select --install` |
| **CocoaPods** | iOS dependency manager | `sudo gem install cocoapods` |
| **Android Studio** | Android emulator + SDK (optional, iOS-first) | [developer.android.com](https://developer.android.com/studio) |
| **EAS CLI** | Expo build & submit | `npm install -g eas-cli` |
| **Expo CLI** | Dev server | Bundled with `npx expo` |
### 1.2 Xcode First-Time Setup
1. Open Xcode at least once to accept the license and install components
2. **Add your Apple ID** (free account is enough for development):
- Xcode → Settings → Accounts → `+` → Apple ID
- This creates a "Personal Team" for free code signing
3. Verify simulators are installed:
- Xcode → Settings → Components → download an iOS Simulator runtime
### 1.3 iPhone First-Time Setup (for Real Device Testing)
1. **Enable Developer Mode** (required on iOS 16+):
- Settings → Privacy & Security → Developer Mode → ON
- Device will restart
2. Connect iPhone to Mac via USB/USB-C cable
3. When prompted "Trust This Computer?" → tap Trust
### 1.4 Project Setup
```bash
# Install dependencies
pnpm install
# Generate native project files (creates ios/ and android/ directories)
npx expo prebuild
# Initialize EAS configuration (creates eas.json)
eas build:configure
```
### 1.5 Expo Account
```bash
# Create account at expo.dev, then:
eas login
eas whoami # verify
```
**No paid accounts needed at this stage.** Free Apple ID + free Expo account is enough for development.
---
## Phase 2: Development & Testing
### 2.1 Running on iOS Simulator
```bash
# Start the app in iOS simulator (no real device needed)
npx expo run:ios
```
- Fastest iteration loop — code changes hot-reload instantly
- Good for: UI layout, navigation, business logic, API calls
- **Cannot test**: camera, barcode scanner, real push notifications, biometrics
### 2.2 Running on Real iPhone
```bash
# Connect iPhone via USB, then:
npx expo run:ios --device
```
Expo CLI will:
1. Detect your connected device
2. Sign the app with your Personal Team (free Apple ID)
3. Build, install, and launch the app
**First time only**: After installation, go to:
- Settings → General → VPN & Device Management → Trust your developer certificate
#### Free Signing Limitations
| Limitation | Detail |
|-----------|--------|
| 7-day expiry | App stops launching after 7 days — just re-run `npx expo run:ios --device` |
| 3 devices max | Can register up to 3 test devices per Apple ID |
| Some entitlements unavailable | Push notifications, Apple Pay, iCloud require paid account |
| Cannot distribute to others | Only works on your own registered devices |
**Camera, barcode scanner, GPS, sensors all work fine with free signing.**
### 2.3 Daily Development Workflow
```
First time (or after native config changes):
  npx expo prebuild               Generate/update native projects
  npx expo run:ios --device       Build and install on device

Every day after that:
  npx expo start --dev-client     Start dev server only (no rebuild)
  → Open the app on device        It connects automatically
  → Edit code, save               Hot-reload updates instantly
```
**When do you need to rebuild?**
| Change | Rebuild needed? |
|--------|----------------|
| JS/TS code, React components | No — hot-reload |
| Styles, images, assets | No — hot-reload |
| Added new Expo SDK module | **Yes**: `npx expo prebuild && npx expo run:ios --device` |
| Changed `app.json` permissions | **Yes** — rebuild |
| Updated native dependency | **Yes** — rebuild |
| Upgraded Expo SDK version | **Yes** — rebuild |
### 2.4 Testing Native Features (Camera, Scanner)
| Feature | Simulator | Real Device |
|---------|-----------|-------------|
| Camera preview | Not available | Works |
| Barcode / QR scan | Not available | Works |
| GPS location | Simulated location via Xcode menu | Real GPS |
| Push notifications | Not available | Requires paid Apple Developer account |
| Haptic feedback | Not available | Works |
| Device sensors (accelerometer, gyroscope) | Not available | Works |
For camera/scanner features, **always test on a real device**.
### 2.5 Debugging Tools
#### Developer Menu
Press `m` in the terminal (or shake the device) to open:
- Toggle Performance Monitor
- Toggle Element Inspector
- Open React Native DevTools
#### React Native DevTools
The primary debugging tool (it has replaced Chrome DevTools as the default since RN 0.76):
| Tab | Use |
|-----|-----|
| Console | View logs, execute JS in app context |
| Sources | Set breakpoints, step through code |
| Network | Inspect API requests (Expo only) |
| Components | Inspect React component tree and props |
| Profiler | Measure render performance |
#### VS Code Integration
Install the **Expo Tools** extension for:
- Breakpoint debugging directly in VS Code
- `app.json` / `app.config.ts` IntelliSense
#### Native Crash Debugging
For crashes in native modules (not JS):
- **iOS**: Open Xcode → Window → Devices and Simulators → View Device Logs
- **Android**: `adb logcat` in terminal
---
## Phase 3: Pre-Release Preparation
**This is when you need to start spending money.**
### 3.1 Accounts & Fees
| Platform | Cost | Registration Time | Required For |
|----------|------|-------------------|--------------|
| **Apple Developer Program** | $99/year | 1-2 days review | App Store distribution |
| **Google Play Console** | $25 one-time | Days to weeks review | Play Store distribution |
| **Expo Account** | Free tier sufficient | Instant | EAS Build & Submit |
Register early — account review takes time, especially Google.
### 3.2 App Configuration
Update `app.json` or `app.config.ts`:
```jsonc
{
  "name": "Multica",
  "slug": "multica",
  "version": "1.0.0",
  "ios": {
    "bundleIdentifier": "com.multica.app",
    "buildNumber": "1", // increment each submission
    "infoPlist": {
      "NSCameraUsageDescription": "Used to scan QR codes and take photos",
      "NSPhotoLibraryUsageDescription": "Used to save scanned images"
    }
  },
  "android": {
    "package": "com.multica.app",
    "versionCode": 1, // increment each submission
    "permissions": ["CAMERA"]
  },
  "icon": "./assets/icon.png", // 1024x1024 PNG, no transparency
  "splash": {
    "image": "./assets/splash.png"
  }
}
```
### 3.3 EAS Build Profiles
`eas.json`:
```json
{
  "cli": { "version": ">= 10.0.0" },
  "build": {
    "development": {
      "developmentClient": true,
      "distribution": "internal"
    },
    "preview": {
      "distribution": "internal"
    },
    "production": {}
  },
  "submit": {
    "production": {}
  }
}
```
### 3.4 App Signing & Credentials
#### iOS
EAS auto-manages credentials (recommended):
- Distribution Certificate
- Provisioning Profile
- Or create manually in [Apple Developer Portal](https://developer.apple.com)
#### Android
- EAS auto-generates Keystore, stored securely on EAS servers
- **Back up your Keystore** — losing it means you can never update the published app
- Play Store requires AAB (Android App Bundle) format
### 3.5 Required Assets
| Asset | Spec |
|-------|------|
| **App Icon** | 1024x1024 PNG, no alpha/transparency (iOS) |
| **Splash Screen** | Platform-appropriate sizes |
| **iOS Screenshots** | 6.7", 6.5", 5.5" iPhone sizes + iPad (if universal) |
| **Android Screenshots** | 2-8 screenshots |
### 3.6 Required Metadata
#### Both Platforms
| Item | Notes |
|------|-------|
| **Privacy Policy URL** | Publicly accessible. Must disclose data collection, third-party sharing, AI usage, deletion rights |
| **App Description** | Short (≤80 chars for Google) + full description |
| **Support URL** | Where users can get help |
| **Account Deletion** | If app has registration, must support in-app account + data deletion |
#### Apple App Store Connect
| Item | Details |
|------|---------|
| Privacy Nutrition Labels | Data collection practices per category |
| App Review Information | Reviewer contact info, demo/test account |
| Content Rating | Age classification |
| Export Compliance | Encryption usage declaration |
| Info.plist Permission Strings | Clear purpose description for each permission |
#### Google Play Console
| Item | Details |
|------|---------|
| Data Safety Form | Required even if no data is collected |
| Content Rating Questionnaire | IARC rating |
| Target Audience | Must declare if targeting children |
| First Upload | Must upload AAB manually (Google API limitation) |
---
## Phase 4: Build & Submit
### 4.1 Production Build
```bash
# iOS
eas build --platform ios --profile production
# Android
eas build --platform android --profile production
# Both platforms
eas build --platform all --profile production
```
Builds run in Expo cloud — no local Xcode or Android Studio needed for production builds.
### 4.2 Submit to Apple App Store
```bash
eas submit --platform ios
```
This uploads the build to **App Store Connect / TestFlight**. Then:
1. Log into [App Store Connect](https://appstoreconnect.apple.com)
2. Select the uploaded build
3. Associate it with a version
4. Fill in all metadata, screenshots, privacy nutrition labels
5. Submit for App Review
### 4.3 Submit to Google Play Store
```bash
eas submit --platform android
```
**First time**: Must upload AAB manually in [Play Console](https://play.google.com/console).
After initial upload:
1. Navigate to Production → Create new release
2. Upload AAB or use the EAS-submitted build
3. Fill in description, screenshots, data safety form
4. Submit for review
### 4.4 Auto-Submit (Optional)
Build and submit in one step:
```bash
eas build --platform all --profile production --auto-submit
```
### 4.5 App Review
| | Apple | Google |
|---|---|---|
| Review time | Typically 24-48 hours | Hours to 7 days |
| Common rejections | Incomplete features, misleading screenshots, missing privacy policy, unclear permission strings | Data safety form mismatch, policy violations |
| After rejection | Fix issues, resubmit | Fix issues, resubmit |
---
## Phase 5: Post-Launch
### 5.1 OTA Updates (No Re-Review)
For JS/asset-only changes, push updates without going through App Review:
```bash
eas update --branch production
```
- Instant delivery to users — no store review
- Only works for JavaScript and asset changes
- **Native code changes still require a new build + review**
### 5.2 Version Bumping
For each new store submission:
- iOS: increment `buildNumber` in `app.json`
- Android: increment `versionCode` in `app.json`
- Bump `version` for user-visible version changes
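
The bump rules above are easy to get wrong by hand, so they could be scripted. The following is a minimal sketch, assuming a config object shaped like the `app.json` example in section 3.2; the `AppConfig` interface and the `bumpForSubmission` helper are hypothetical and not part of Expo or EAS.

```typescript
// Hypothetical helper; the AppConfig shape mirrors the app.json example in 3.2.
interface AppConfig {
  version: string;
  ios: { buildNumber: string }; // stored as a string, e.g. "1"
  android: { versionCode: number }; // stored as an integer
}

// Returns a new config with both store counters incremented.
// Pass `newVersion` only when the user-visible version should change.
function bumpForSubmission(config: AppConfig, newVersion?: string): AppConfig {
  return {
    ...config,
    version: newVersion ?? config.version,
    ios: { ...config.ios, buildNumber: String(Number(config.ios.buildNumber) + 1) },
    android: { ...config.android, versionCode: config.android.versionCode + 1 },
  };
}

const current: AppConfig = {
  version: "1.0.0",
  ios: { buildNumber: "1" },
  android: { versionCode: 1 },
};
const next = bumpForSubmission(current, "1.0.1");
```

The helper returns a copy instead of mutating its input, so the previous values remain available for logging or rollback.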
### 5.3 CI/CD Automation
Create `.eas/workflows/build-and-submit.yml` to auto-build and submit on push to main.
#### Google Service Account Key (Automated Android Submissions)
1. EAS dashboard → Credentials → Android
2. Click Application identifier → Service Credentials
3. Add Google Service Account Key
---
## Quick Reference
### Common Commands
```bash
# Development
npx expo prebuild # Generate native projects
npx expo run:ios # Run on iOS simulator
npx expo run:ios --device # Run on connected iPhone
npx expo start --dev-client # Start dev server (after initial install)
# Building
eas build --platform ios --profile development # Dev build (for device testing)
eas build --platform ios --profile production # Production build
eas build --platform all --profile production # Both platforms
# Submitting
eas submit --platform ios # Submit to App Store
eas submit --platform android # Submit to Play Store
# OTA Updates
eas update --branch production # Push JS update to users
```
### Cost Summary
| Phase | Cost |
|-------|------|
| Development + local testing | **Free** (free Apple ID + Xcode) |
| EAS cloud builds | Free tier: 30 iOS + 30 Android builds/month |
| App Store submission | **$99/year** (Apple Developer Program) |
| Play Store submission | **$25 one-time** (Google Play Console) |
---
## Master Checklist
### Development Phase
- [ ] Install Node.js, pnpm, Xcode, EAS CLI
- [ ] Add Apple ID to Xcode (Settings → Accounts)
- [ ] Enable Developer Mode on iPhone
- [ ] Run `npx expo prebuild`
- [ ] Test on simulator: `npx expo run:ios`
- [ ] Test on real device: `npx expo run:ios --device`
- [ ] Trust developer certificate on device
- [ ] Verify camera/scanner functionality on real device
### Pre-Release Phase
- [ ] Register Apple Developer Program ($99/year)
- [ ] Register Google Play Console ($25)
- [ ] Configure `app.json` (bundleIdentifier, permissions, icon, splash)
- [ ] Configure `eas.json` build profiles
- [ ] Prepare app icon (1024x1024 PNG)
- [ ] Prepare splash screen
- [ ] Take App Store screenshots (all required sizes)
- [ ] Write and host privacy policy URL
- [ ] Write app description (short + full)
- [ ] Set up support URL
- [ ] Implement in-app account deletion (if registration exists)
### Submission Phase
- [ ] Run `eas build --platform all --profile production`
- [ ] iOS: `eas submit --platform ios`
- [ ] iOS: Fill metadata + privacy labels in App Store Connect
- [ ] iOS: Submit for App Review
- [ ] Android: Upload first AAB manually in Play Console
- [ ] Android: `eas submit --platform android`
- [ ] Android: Fill data safety form + metadata in Play Console
- [ ] Android: Submit for review
- [ ] Wait for review approval → app goes live
### Post-Launch Phase
- [ ] Set up `eas update` for OTA updates
- [ ] Set up CI/CD workflow (optional)
- [ ] Configure Google Service Account Key for automated Android submissions (optional)
---
## References
- [Expo: Getting Started](https://docs.expo.dev/get-started/introduction/)
- [Expo: Development Builds](https://docs.expo.dev/develop/development-builds/introduction/)
- [Expo: Local App Development](https://docs.expo.dev/guides/local-app-development/)
- [Expo: Debugging Tools](https://docs.expo.dev/debugging/tools/)
- [Expo: Submit to App Stores](https://docs.expo.dev/deploy/submit-to-app-stores/)
- [Expo: EAS Submit](https://docs.expo.dev/submit/introduction/)
- [Expo: EAS Update](https://docs.expo.dev/eas-update/introduction/)
- [Apple App Review Guidelines](https://developer.apple.com/app-store/review/guidelines/)
- [Apple App Privacy Details](https://developer.apple.com/app-store/app-privacy-details/)
- [Google Play Data Safety](https://support.google.com/googleplay/android-developer/answer/10787469)
- [Google Play Developer Policy Center](https://play.google/developer-content-policy/)

@ -1,315 +0,0 @@
# Package Management Guide
## Overview
Super Multica uses **pnpm workspaces** for monorepo management. This document covers package management, dependency handling, and merge conflict resolution.
---
## Directory Structure
```
super-multica/
├── apps/                    # Deployable applications
│   ├── cli/                 # @multica/cli
│   ├── desktop/             # @multica/desktop (Electron)
│   ├── gateway/             # @multica/gateway (NestJS WebSocket)
│   ├── server/              # @multica/server (NestJS REST)
│   ├── web/                 # @multica/web (Next.js)
│   └── mobile/              # @multica/mobile (React Native)
├── packages/                # Shared libraries
│   ├── core/                # @multica/core (agent, hub, channels)
│   ├── sdk/                 # @multica/sdk (gateway client)
│   ├── ui/                  # @multica/ui (shared components)
│   ├── store/               # @multica/store (Zustand)
│   ├── hooks/               # @multica/hooks (React hooks)
│   ├── types/               # @multica/types (TypeScript types)
│   └── utils/               # @multica/utils (utility functions)
├── skills/                  # Bundled agent skills
├── pnpm-workspace.yaml      # Workspace definition
├── pnpm-lock.yaml           # Lockfile (auto-generated)
└── .npmrc                   # pnpm configuration
```
---
## Key Configuration Files
### pnpm-workspace.yaml
Defines which directories are workspace packages:
```yaml
packages:
  - "apps/*"
  - "packages/*"
```
### .npmrc
**Required configuration for Electron packaging:**
```ini
shamefully-hoist=true
```
**Why?** electron-builder requires all dependencies to be hoisted to the root `node_modules`. Without this, Electron builds will fail with "Cannot find module" errors.
### pnpm-lock.yaml
- Auto-generated lockfile
- **Never manually edit**
- Always regenerate on conflicts
---
## Common Commands
### Install Dependencies
```bash
# Install all workspace dependencies
pnpm install
# Clean install (after changing .npmrc or major updates)
rm -rf node_modules apps/*/node_modules packages/*/node_modules
rm pnpm-lock.yaml
pnpm install
```
### Add Dependencies
```bash
# Add to root (shared dev tools)
pnpm add -D typescript -w
# Add to specific package
pnpm add lodash --filter @multica/core
# Add dev dependency to specific package
pnpm add -D vitest --filter @multica/core
# Add workspace dependency (internal package)
pnpm add @multica/utils --filter @multica/core --workspace
```
### Update Dependencies
```bash
# Update all
pnpm update --recursive
# Update specific package
pnpm update lodash --filter @multica/core
# Interactive update
pnpm update --interactive --recursive
```
### Run Scripts
```bash
# Run script in specific package
pnpm --filter @multica/desktop dev
pnpm --filter @multica/core build
# Run script in all packages
pnpm --recursive run build
# Run script in root
pnpm multica --help
```
---
## Workspace Dependencies
### Internal References
Use `workspace:*` for internal dependencies:
```json
{
  "name": "@multica/desktop",
  "dependencies": {
    "@multica/core": "workspace:*",
    "@multica/ui": "workspace:*",
    "@multica/utils": "workspace:*"
  }
}
```
### Dependency Direction
```
apps/          → depends on → packages/
packages/ui    → depends on → packages/core
packages/core  → depends on → packages/types, packages/utils

❌ Circular dependencies are forbidden
```
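A rule like "no circular dependencies" is simple to check mechanically. The sketch below runs a depth-first search over a dependency map; the `findCycle` helper and the hand-written `workspace` graph are illustrative assumptions (in practice the graph would be read from each `package.json`), not an existing script in this repository.

```typescript
// Hypothetical check for circular workspace dependencies (depth-first search).
type DepGraph = Record<string, string[]>;

// Returns the first cycle found as a path (e.g. ["a", "b", "a"]), or null.
function findCycle(graph: DepGraph): string[] | null {
  const state = new Map<string, "visiting" | "done">();
  const stack: string[] = [];

  const visit = (node: string): string[] | null => {
    if (state.get(node) === "done") return null;
    if (state.get(node) === "visiting") {
      // Reached a node that is still on the current path: that is a cycle.
      return [...stack.slice(stack.indexOf(node)), node];
    }
    state.set(node, "visiting");
    stack.push(node);
    for (const dep of graph[node] ?? []) {
      const cycle = visit(dep);
      if (cycle) return cycle;
    }
    stack.pop();
    state.set(node, "done");
    return null;
  };

  for (const node of Object.keys(graph)) {
    const cycle = visit(node);
    if (cycle) return cycle;
  }
  return null;
}

// The dependency direction described above, written out by hand.
const workspace: DepGraph = {
  "@multica/desktop": ["@multica/core", "@multica/ui", "@multica/utils"],
  "@multica/ui": ["@multica/core"],
  "@multica/core": ["@multica/types", "@multica/utils"],
  "@multica/types": [],
  "@multica/utils": [],
};
```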
### Catalog (Shared Versions)
`pnpm-workspace.yaml` defines shared versions:
```yaml
catalog:
  react: "19.2.3"
  typescript: "^5.9.3"
```
Use in package.json:
```json
{
  "dependencies": {
    "react": "catalog:"
  }
}
```
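Conceptually, a `catalog:` specifier is replaced by the version listed for that package in the workspace catalog. The sketch below mimics that substitution for the default catalog only; `resolveCatalog` is a hypothetical helper written for illustration and is not how pnpm implements the feature internally.

```typescript
// Hypothetical illustration of catalog substitution (default catalog only).
type Catalog = Record<string, string>;

function resolveCatalog(
  deps: Record<string, string>,
  catalog: Catalog,
): Record<string, string> {
  const resolved: Record<string, string> = {};
  for (const [name, spec] of Object.entries(deps)) {
    if (spec === "catalog:") {
      const version = catalog[name];
      // A missing entry is an error: the package.json asks for a shared version
      // that the workspace file does not define.
      if (version === undefined) throw new Error(`No catalog entry for ${name}`);
      resolved[name] = version;
    } else {
      resolved[name] = spec; // ordinary specifiers pass through untouched
    }
  }
  return resolved;
}

const catalog: Catalog = { react: "19.2.3", typescript: "^5.9.3" };
const resolvedDeps = resolveCatalog({ react: "catalog:", "@multica/core": "workspace:*" }, catalog);
```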
---
## Branch Merge & Conflicts
### High-Conflict Files
| File | Conflict Type | Resolution Strategy |
|------|---------------|---------------------|
| `pnpm-lock.yaml` | Auto-generated | **Always regenerate** |
| `*/package.json` | Version/deps | Manual merge |
| `pnpm-workspace.yaml` | Catalog versions | Manual merge |
| `turbo.json` | Pipeline config | Manual merge |
### Resolving pnpm-lock.yaml Conflicts
**Never manually resolve `pnpm-lock.yaml` conflicts.** It's a machine-generated file with complex checksums.
```bash
# 1. Accept either version (doesn't matter which)
git checkout --theirs pnpm-lock.yaml
# or
git checkout --ours pnpm-lock.yaml
# 2. Delete and regenerate
rm pnpm-lock.yaml
pnpm install
# 3. Stage the new lockfile
git add pnpm-lock.yaml
# 4. Continue with merge
git merge --continue
# or
git commit
```
### Standard Merge Workflow
```bash
# 1. Fetch and merge
git fetch origin main
git merge origin/main
# 2. If conflicts in pnpm-lock.yaml:
git checkout --theirs pnpm-lock.yaml
rm pnpm-lock.yaml
pnpm install
git add pnpm-lock.yaml
# 3. Resolve other conflicts manually
# Edit conflicted files...
git add <resolved-files>
# 4. Complete merge
git commit
# 5. Verify build
pnpm build
pnpm test
```
### After Major Merges
Always verify:
```bash
pnpm install # Ensure deps are correct
pnpm build # Verify build works
pnpm test # Run tests
pnpm typecheck # Check types
```
---
## Troubleshooting
### "Cannot find module" in Electron Build
**Cause:** electron-builder can't find hoisted dependencies.
**Solution:**
```bash
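# Ensure .npmrc contains the setting (append only if missing, so other settings are kept):
grep -qxF 'shamefully-hoist=true' .npmrc 2>/dev/null || echo 'shamefully-hoist=true' >> .npmrc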
# Clean reinstall
rm -rf node_modules apps/*/node_modules packages/*/node_modules
rm pnpm-lock.yaml
pnpm install
```
### Workspace Protocol Not Resolved
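**Cause:** `workspace:*` fails to resolve, usually because the target package is not matched by `pnpm-workspace.yaml` or its `name` differs from the dependency key.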
**Solution:**
```bash
# Check pnpm-workspace.yaml includes the package
# Ensure package name matches exactly
pnpm install
```
### Peer Dependency Warnings
**Cause:** Missing peer dependencies.
**Solution:**
```bash
# Usually safe to ignore, but if causing issues:
pnpm add <missing-peer> --filter <package>
```
### Build Order Issues
**Cause:** Turborepo not building dependencies first.
**Solution:** Check `turbo.json` has correct `dependsOn`:
```json
{
  "tasks": {
    "build": {
      "dependsOn": ["^build"]
    }
  }
}
```
---
## Best Practices
1. **Always use pnpm**: Don't mix npm/yarn
2. **Commit lockfile**: Always commit `pnpm-lock.yaml` changes
3. **Don't edit lockfile manually**: Regenerate on conflicts
4. **Use `workspace:*`**: For internal dependencies
5. **Use `catalog:`**: For shared version management
6. **Clean install after .npmrc changes**: Delete node_modules and lockfile
7. **Verify after merge**: Run build and tests

@ -1,365 +0,0 @@
# Hub RPC Protocol
The Hub exposes an RPC (Remote Procedure Call) interface over the Gateway WebSocket transport. Clients can invoke methods on the Hub and receive structured responses, all routed through the same Gateway message layer used for regular chat.
## Architecture Overview
```
Client (SDK) Gateway (WebSocket) Hub
| | |
|-- send(RequestAction) ------->|-- route to Hub ----------->|
| | |-- dispatch(method, params)
| | |-- handler executes
|<-- receive(ResponseAction) ---|<-- route to Client --------|
| | |
```
1. The **Client** calls `client.request(hubDeviceId, method, params)`.
2. The SDK generates a `requestId` (UUIDv7), wraps it into a `RequestPayload`, and sends a message with `action = "request"` to the Hub via the Gateway.
3. The **Gateway** routes the message to the Hub's socket (standard device-to-device routing).
4. The **Hub** detects `action === "request"` in its `onMessage` handler and delegates to `RpcDispatcher.dispatch()`.
5. The dispatcher looks up the registered handler for the given `method` and invokes it.
6. The Hub sends back a message with `action = "response"` containing either a success or error payload, addressed to the original sender.
7. The **Client SDK** intercepts incoming `"response"` messages in its `RECEIVE` listener, matches by `requestId`, and resolves (or rejects) the corresponding `Promise`.
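
Steps 2 and 7 amount to a map of pending requests keyed by `requestId`. The following is a minimal sketch of that bookkeeping, assuming the response shape documented below; the `PendingRequests` class and its method names are hypothetical and do not describe the SDK's actual internals.

```typescript
// Hypothetical sketch of requestId-based response matching with timeouts.
type Pending = {
  resolve: (value: unknown) => void;
  reject: (reason: Error) => void;
  timer: ReturnType<typeof setTimeout>;
};

interface IncomingResponse {
  requestId: string;
  ok: boolean;
  payload?: unknown;
  error?: { code: string; message: string };
}

class PendingRequests {
  private pending = new Map<string, Pending>();

  // Register a request; the Promise settles when a matching response arrives
  // or rejects once the timeout elapses.
  track(requestId: string, timeoutMs = 10_000): Promise<unknown> {
    return new Promise((resolve, reject) => {
      const timer = setTimeout(() => {
        this.pending.delete(requestId);
        reject(new Error(`RPC timeout: ${requestId}`));
      }, timeoutMs);
      this.pending.set(requestId, { resolve, reject, timer });
    });
  }

  // Call for every incoming "response" message. Returns false when the
  // requestId is unknown (for example, the request already timed out).
  settle(response: IncomingResponse): boolean {
    const entry = this.pending.get(response.requestId);
    if (!entry) return false;
    clearTimeout(entry.timer);
    this.pending.delete(response.requestId);
    if (response.ok) {
      entry.resolve(response.payload);
    } else {
      entry.reject(new Error(`${response.error?.code}: ${response.error?.message}`));
    }
    return true;
  }

  get size(): number {
    return this.pending.size;
  }
}
```

Clearing the timer in `settle()` matters: without it, a late timeout would try to reject a Promise that has already been resolved and would keep the process alive until it fired.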
## Message Format
All RPC messages use the standard `RoutedMessage` envelope:
```ts
interface RoutedMessage<T> {
  id: string; // UUIDv7 message ID
  uid: string | null;
  from: string; // sender deviceId
  to: string; // recipient deviceId
  action: string; // "request" or "response"
  payload: T;
}
```
### Request Payload
```ts
interface RequestPayload<T = unknown> {
  requestId: string; // UUIDv7, generated by the SDK
  method: string; // RPC method name
  params?: T; // method-specific parameters
}
```
### Response Payload (Success)
```ts
interface ResponseSuccessPayload<T = unknown> {
  requestId: string; // matches the request
  ok: true;
  payload: T; // method-specific result
}
```
### Response Payload (Error)
```ts
interface ResponseErrorPayload {
  requestId: string; // matches the request
  ok: false;
  error: {
    code: string; // machine-readable error code
    message: string; // human-readable description
  };
}
```
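Because `ok` is a literal `true` or `false`, the two payloads form a discriminated union that TypeScript can narrow. The sketch below redeclares the two interfaces so it is self-contained; the `ResponsePayload` alias and the `unwrap` helper are assumptions added for illustration and are not confirmed SDK exports.

```typescript
// Redeclared here so the sketch is self-contained.
interface ResponseSuccessPayload<T = unknown> {
  requestId: string;
  ok: true;
  payload: T;
}

interface ResponseErrorPayload {
  requestId: string;
  ok: false;
  error: { code: string; message: string };
}

// Hypothetical alias: the union of both response shapes.
type ResponsePayload<T = unknown> = ResponseSuccessPayload<T> | ResponseErrorPayload;

// Hypothetical helper: return the result on success, throw on error.
function unwrap<T>(response: ResponsePayload<T>): T {
  if (response.ok) {
    return response.payload; // narrowed to ResponseSuccessPayload<T>
  }
  // narrowed to ResponseErrorPayload
  throw new Error(`[${response.error.code}] ${response.error.message}`);
}
```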
## Error Codes
| Code | Description |
|---|---|
| `METHOD_NOT_FOUND` | The requested RPC method does not exist. |
| `INVALID_PARAMS` | Missing or malformed parameters. |
| `AGENT_NOT_FOUND` | No session file found for the given agent ID. |
| `RPC_ERROR` | Catch-all for unexpected errors. |
## Client SDK Usage
The `GatewayClient` provides a `request()` method that handles the full request/response lifecycle:
```ts
request<T = unknown>(
  to: string, // target deviceId (Hub's deviceId)
  method: string, // RPC method name
  params?: unknown, // method parameters
  timeout?: number, // timeout in ms (default: 10000)
): Promise<T>
```
The method:
- Generates a `requestId` internally.
- Sends a `RequestPayload` via the Gateway.
- Returns a `Promise` that resolves with the response payload on success, or rejects with an `Error` on failure or timeout.
- Automatically cleans up pending requests on disconnect.
### Example
```ts
import { GatewayClient, type GetAgentMessagesResult } from "@multica/sdk";

const client = new GatewayClient({
  url: "http://localhost:3000",
  deviceId: "my-client",
  deviceType: "client",
});

client.connect();

client.onRegistered(async () => {
  try {
    const result = await client.request<GetAgentMessagesResult>(
      "hub-device-id",
      "getAgentMessages",
      { agentId: "019abc12-...", offset: 0, limit: 20 },
    );
    console.log(`Total: ${result.total}, returned: ${result.messages.length}`);
  } catch (err) {
    // `err` is `unknown` under strict TypeScript, so narrow before reading `.message`.
    console.error("RPC failed:", err instanceof Error ? err.message : err);
  }
});
```
## Available RPC Methods
### `getAgentMessages`
Retrieves the message history for a given agent session. Works for both active and closed agents as long as the session file exists on disk.
**Parameters:**
```ts
interface GetAgentMessagesParams {
  agentId: string; // required - the agent/session ID
  offset?: number; // starting index (default: 0)
  limit?: number; // max messages to return (default: 50)
}
```
**Response:**
```ts
interface GetAgentMessagesResult {
  messages: AgentMessage[]; // array of messages
  total: number; // total message count in the session
  offset: number; // the offset used
  limit: number; // the limit used
}
```
Each `AgentMessage` in the array is one of:
- **UserMessage** (`role: "user"`) - User input (text or multimodal content).
- **AssistantMessage** (`role: "assistant"`) - LLM response, may contain `TextContent`, `ThinkingContent`, or `ToolCall` blocks. Includes `usage` (token counts and costs), `model`, `provider`, and `stopReason`.
- **ToolResultMessage** (`role: "toolResult"`) - Result of a tool invocation, with `toolCallId`, `toolName`, `content`, and `isError`.
**Example request:**
```ts
const result = await client.request<GetAgentMessagesResult>(
  hubDeviceId,
  "getAgentMessages",
  { agentId: "019abc12-3def-7000-8000-000000000001", offset: 0, limit: 10 },
);
```
**Example success response payload:**
```json
{
  "requestId": "019abc12-...",
  "ok": true,
  "payload": {
    "messages": [
      { "role": "user", "content": "Hello", "timestamp": 1700000000000 },
      {
        "role": "assistant",
        "content": [{ "type": "text", "text": "Hi! How can I help?" }],
        "model": "claude-sonnet-4-20250514",
        "provider": "anthropic",
        "usage": { "input": 10, "output": 15, "totalTokens": 25 },
        "stopReason": "end_turn",
        "timestamp": 1700000001000
      }
    ],
    "total": 42,
    "offset": 0,
    "limit": 10
  }
}
```
**Example error response payload:**
```json
{
  "requestId": "019abc12-...",
  "ok": false,
  "error": {
    "code": "AGENT_NOT_FOUND",
    "message": "No session found for agent: 019abc12-bad-id"
  }
}
```
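Since the result carries `total`, `offset`, and `limit`, a client can page through a long session. Below is a minimal sketch of such a loop; `fetchAllMessages`, `Page`, and `FetchPage` are hypothetical names, and the page fetcher is passed in as a function so the sketch does not depend on the SDK.

```typescript
// Hypothetical pagination helper built on the offset/limit/total fields.
interface Page<M> {
  messages: M[];
  total: number;
  offset: number;
  limit: number;
}

type FetchPage<M> = (offset: number, limit: number) => Promise<Page<M>>;

async function fetchAllMessages<M>(fetchPage: FetchPage<M>, pageSize = 50): Promise<M[]> {
  const all: M[] = [];
  let offset = 0;
  while (true) {
    const page = await fetchPage(offset, pageSize);
    all.push(...page.messages);
    offset += page.messages.length;
    // Stop once everything is fetched; the empty-page check guards against
    // looping forever if `total` changes or is reported incorrectly.
    if (offset >= page.total || page.messages.length === 0) break;
  }
  return all;
}
```

With the SDK, the fetcher would presumably wrap the documented call, for example `(offset, limit) => client.request<GetAgentMessagesResult>(hubDeviceId, "getAgentMessages", { agentId, offset, limit })`.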
### `getHubInfo`
Returns Hub status information. No parameters required.
**Response:**
```ts
interface GetHubInfoResult {
  hubId: string; // Hub device ID
  url: string; // Current Gateway URL
  connectionState: string; // "disconnected" | "connecting" | "connected" | "registered"
  agentCount: number; // Number of active agents
}
```
**Example:**
```ts
const info = await client.request<GetHubInfoResult>(hubDeviceId, "getHubInfo");
```
---
### `listAgents`
Lists all active agents. No parameters required.
**Response:**
```ts
interface ListAgentsResult {
  agents: { id: string; closed: boolean }[];
}
```
**Example:**
```ts
const result = await client.request<ListAgentsResult>(hubDeviceId, "listAgents");
```
---
### `createAgent`
Creates a new agent or restores an existing one.
**Parameters:**
```ts
interface CreateAgentParams {
  id?: string; // optional - reuse existing session ID
}
```
**Response:**
```ts
interface CreateAgentResult {
  id: string; // the created/restored agent session ID
}
```
**Example:**
```ts
const created = await client.request<CreateAgentResult>(hubDeviceId, "createAgent");
// or restore a session with a specific ID:
const restored = await client.request<CreateAgentResult>(hubDeviceId, "createAgent", { id: "existing-id" });
```
---
### `deleteAgent`
Closes and removes an agent.
**Parameters:**
```ts
interface DeleteAgentParams {
  id: string; // required - agent ID to delete
}
```
**Response:**
```ts
interface DeleteAgentResult {
  ok: boolean; // true if agent was found and deleted
}
```
**Example:**
```ts
const result = await client.request<DeleteAgentResult>(hubDeviceId, "deleteAgent", { id: "019abc12-..." });
```
---
### `updateGateway`
Reconnects the Hub to a different Gateway URL.
**Parameters:**
```ts
interface UpdateGatewayParams {
  url: string; // required - new Gateway URL
}
```
**Response:**
```ts
interface UpdateGatewayResult {
  url: string; // the new URL
  connectionState: string; // connection state after reconnect
}
```
**Example:**
```ts
const result = await client.request<UpdateGatewayResult>(hubDeviceId, "updateGateway", { url: "http://localhost:4000" });
```
---
## Adding New RPC Methods
1. Create a handler file in `src/hub/rpc/handlers/`:
```ts
// src/hub/rpc/handlers/my-method.ts
import { RpcError, type RpcHandler } from "../dispatcher.js";

export function createMyMethodHandler(): RpcHandler {
  return (params: unknown) => {
    if (!params || typeof params !== "object") {
      throw new RpcError("INVALID_PARAMS", "params must be an object");
    }
    // ... validate and handle
    return { /* result */ };
  };
}
```
2. Register it in `src/hub/hub.ts` constructor:
```ts
this.rpc.register("myMethod", createMyMethodHandler());
```
3. (Optional) Add typed params/result interfaces in `packages/sdk/src/actions/rpc.ts` and export them from `packages/sdk/src/actions/index.ts` for client-side type safety.
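
To show how registration, lookup, and the error codes fit together, here is a minimal dispatcher sketch. It is not the real `RpcDispatcher` from `src/hub/rpc/dispatcher.ts`: `MiniDispatcher`, `RpcResult`, and the local `RpcError` and `RpcHandler` declarations are stand-ins written for illustration, and the real class may differ in signature and behaviour.

```typescript
// Hypothetical stand-ins; the real RpcError/RpcHandler live in dispatcher.ts.
class RpcError extends Error {
  readonly code: string;
  constructor(code: string, message: string) {
    super(message);
    this.name = "RpcError";
    this.code = code;
  }
}

type RpcHandler = (params: unknown) => unknown | Promise<unknown>;

type RpcResult =
  | { ok: true; payload: unknown }
  | { ok: false; error: { code: string; message: string } };

class MiniDispatcher {
  private handlers = new Map<string, RpcHandler>();

  register(method: string, handler: RpcHandler): void {
    this.handlers.set(method, handler);
  }

  async dispatch(method: string, params: unknown): Promise<RpcResult> {
    const handler = this.handlers.get(method);
    if (!handler) {
      return { ok: false, error: { code: "METHOD_NOT_FOUND", message: `Unknown method: ${method}` } };
    }
    try {
      return { ok: true, payload: await handler(params) };
    } catch (err) {
      // Handlers signal expected failures with RpcError; anything else
      // falls back to the catch-all RPC_ERROR code.
      if (err instanceof RpcError) {
        return { ok: false, error: { code: err.code, message: err.message } };
      }
      const message = err instanceof Error ? err.message : String(err);
      return { ok: false, error: { code: "RPC_ERROR", message } };
    }
  }
}

const rpc = new MiniDispatcher();
rpc.register("echo", (params) => {
  if (!params || typeof params !== "object") {
    throw new RpcError("INVALID_PARAMS", "params must be an object");
  }
  return params;
});
```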

@ -1,19 +0,0 @@
# Skills & Tools
## Skills
Skills extend agent functionality via `SKILL.md` files. See [Skills Documentation](../packages/core/src/agent/skills/README.md).
```bash
multica skills list # List skills
multica skills add owner/repo # Install from GitHub
multica skills status # Check status
```
Built-in: `commit`, `code-review`, `skill-creator`
## Tools
Available tools: `read`, `write`, `edit`, `glob`, `exec`, `process`, `web_fetch`, `web_search`, `memory_search`, `sessions_spawn`
See [Tools Documentation](../packages/core/src/agent/tools/README.md) for details.

@ -1,253 +0,0 @@
# SWE-bench: Agent Coding Benchmark
Run and evaluate the Multica agent against [SWE-bench](https://www.swebench.com/), the standard benchmark for AI coding agents. SWE-bench tasks are real GitHub issues from open-source Python projects — the agent must read the issue, explore the codebase, and produce a patch that fixes the bug.
## Quick Start
```bash
# 1. Download dataset (requires: pip install datasets)
python scripts/swe-bench/download-dataset.py --dataset lite --limit 5
# 2. Run the agent
npx tsx scripts/swe-bench/run.ts --limit 5
# 3. Analyze results
npx tsx scripts/swe-bench/analyze.ts
```
## Scripts
```
scripts/swe-bench/
├── download-dataset.py # Download from HuggingFace → JSONL
├── run.ts # Core runner: Agent API → git diff → predictions
├── evaluate.sh # Official Docker evaluation harness wrapper
├── analyze.ts # Summarize run results
└── .gitignore # Ignores downloaded datasets and output files
```
## Pipeline
```
                                  ┌──────────────────┐
HuggingFace ──download──► JSONL ──┤ For each task:   │
                                  │  1. git clone    │
                                  │  2. git checkout │
                                  │  3. Agent.run()  │
                                  │  4. git diff     │
                                  └────────┬─────────┘
                         predictions.jsonl (SWE-bench format)
                           ┌───────────────┴───────────────┐
                           │  swebench.harness (Docker)    │
                           │  Apply patch → run tests      │
                           │  → pass/fail verdict          │
                           └───────────────────────────────┘
```
## Dataset Variants
| Variant | Size | HuggingFace ID | Recommended For |
|---------|------|----------------|-----------------|
| **Lite** | 300 tasks | `princeton-nlp/SWE-bench_Lite` | Quick iteration, development |
| **Verified** | 500 tasks | `princeton-nlp/SWE-bench_Verified` | Official benchmarking, leaderboard |
| **Full** | ~2294 tasks | `princeton-nlp/SWE-bench` | Comprehensive evaluation |
```bash
# Download specific variant
python scripts/swe-bench/download-dataset.py --dataset verified
python scripts/swe-bench/download-dataset.py --dataset lite --limit 20
```
## Runner Options
```bash
npx tsx scripts/swe-bench/run.ts [options]
Options:
  --dataset PATH     JSONL dataset path (default: scripts/swe-bench/lite.jsonl)
  --provider NAME    LLM provider (default: kimi-coding)
  --model NAME       Model override
  --limit N          Max tasks to run (default: all)
  --offset N         Skip first N tasks (default: 0)
  --output PATH      Output predictions JSONL (default: scripts/swe-bench/predictions.jsonl)
  --workdir PATH     Repo clone directory (default: /tmp/swe-bench)
  --timeout MS       Per-task timeout (default: 300000 = 5min)
  --instance ID      Run a single instance
  --debug            Enable debug logging
```
### Examples
```bash
# Run 10 tasks with Anthropic Claude
npx tsx scripts/swe-bench/run.ts --limit 10 --provider anthropic
# Run a specific instance
npx tsx scripts/swe-bench/run.ts --instance "django__django-16379"
# Resume from task 50 with longer timeout
npx tsx scripts/swe-bench/run.ts --offset 50 --limit 10 --timeout 600000
# Compare providers (run separately, different output files)
npx tsx scripts/swe-bench/run.ts --provider kimi-coding --output scripts/swe-bench/pred-kimi.jsonl
npx tsx scripts/swe-bench/run.ts --provider anthropic --output scripts/swe-bench/pred-claude.jsonl
```
## How the Agent Solves Tasks
For each task, the runner:
1. **Clones the repository** to `/tmp/swe-bench/<instance_id>/` and checks out `base_commit`
2. **Creates an Agent** with a focused system prompt and restricted tools (coding only — no web, no cron, no sessions)
3. **Runs the agent** with the issue description as the prompt
4. **Collects `git diff`** as the patch after the agent finishes
5. **Appends** the prediction to `predictions.jsonl` in SWE-bench format
The agent has access to:
- `read`, `write`, `edit` — file operations
- `exec`, `process` — shell commands (for exploring code, running tests)
- `glob` — file search
Tools explicitly denied: `web_fetch`, `web_search`, `cron`, `data`, `sessions_spawn`, `sessions_list`, `memory_search`, `send_file`.
## Output Files
After a run, two files are produced:
### `predictions.jsonl` — SWE-bench format
```json
{"instance_id": "astropy__astropy-12907", "model_patch": "diff --git a/...", "model_name_or_path": "multica-kimi-coding"}
```
This file is the input to the official evaluation harness.
### `predictions.results.jsonl` — detailed run metrics
```json
{
  "instance_id": "astropy__astropy-12907",
  "success": true,
  "patch": "diff --git a/...",
  "error": null,
  "duration_ms": 141892,
  "session_id": "019c60c7-52ac-702a-9b9c-dc53c0daea6b"
}
```
## Analyzing Results
```bash
# Summary report
npx tsx scripts/swe-bench/analyze.ts
# Or specify a results file
npx tsx scripts/swe-bench/analyze.ts scripts/swe-bench/pred-kimi.results.jsonl
```
Output includes:
- Patch rate (how many tasks produced a diff)
- Duration statistics (avg/min/max)
- Error breakdown
- Per-repository stats
- Slowest tasks
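The report items above can be sketched as a small aggregation over the records in `predictions.results.jsonl`. This is a minimal sketch, not the actual `analyze.ts`: the `RunResult` type only mirrors the fields shown in the sample record, `parseJsonl` and `summarize` are hypothetical helpers, and it assumes a task counts as patched when `patch` is a non-empty string.

```typescript
// Hypothetical record shape, mirroring the sample in predictions.results.jsonl.
interface RunResult {
  instance_id: string;
  success: boolean;
  patch: string | null;
  error: string | null;
  duration_ms: number;
}

interface Summary {
  total: number;
  patched: number;
  patchRate: number; // fraction in [0, 1]
  avgMs: number;
  minMs: number;
  maxMs: number;
  errors: Record<string, number>; // error message -> occurrence count
}

// Parse JSONL text: one JSON object per non-empty line.
function parseJsonl(text: string): RunResult[] {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as RunResult);
}

// Aggregate patch rate, duration statistics, and an error breakdown.
function summarize(results: RunResult[]): Summary {
  const total = results.length;
  const durations = results.map((r) => r.duration_ms);
  // Assumption: "patched" means the agent left a non-empty diff.
  const patched = results.filter((r) => (r.patch ?? "").trim() !== "").length;
  const errors: Record<string, number> = {};
  for (const r of results) {
    if (r.error) errors[r.error] = (errors[r.error] ?? 0) + 1;
  }
  return {
    total,
    patched,
    patchRate: total === 0 ? 0 : patched / total,
    avgMs: total === 0 ? 0 : durations.reduce((a, b) => a + b, 0) / total,
    minMs: total === 0 ? 0 : Math.min(...durations),
    maxMs: total === 0 ? 0 : Math.max(...durations),
    errors,
  };
}
```

Per-repository stats could be derived the same way by grouping on the prefix of `instance_id` before the double underscore, assuming that naming convention holds for the dataset in use.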
### Run-Log Analysis
Each agent session writes a structured `run-log.jsonl` to `~/.super-multica/sessions/<session-id>/`. This captures every LLM call, tool invocation, and timing:
```bash
# Find a session's run log
cat ~/.super-multica/sessions/<session-id>/run-log.jsonl | head -5
# Quick stats from a run log
cat ~/.super-multica/sessions/<session-id>/run-log.jsonl | python3 -c "
import json, sys
events = [json.loads(l) for l in sys.stdin if l.strip()]
tools = [e for e in events if e['event'] == 'tool_start']
llm_ms = sum(e.get('duration_ms', 0) for e in events if e['event'] == 'llm_result')
print(f'LLM time: {llm_ms/1000:.1f}s | Tool calls: {len(tools)}')
"
```
## Official Evaluation (Docker)
The runner produces patches, but **only the official SWE-bench harness determines pass/fail** by applying the patch and running the project's test suite.
### Prerequisites
- Docker running (at least 120GB storage, 16GB RAM, 8 CPU cores)
- `pip install swebench`
### Run Evaluation
```bash
# Using the wrapper script
bash scripts/swe-bench/evaluate.sh
# Or directly
python -m swebench.harness.run_evaluation \
--dataset_name princeton-nlp/SWE-bench_Lite \
--predictions_path scripts/swe-bench/predictions.jsonl \
--max_workers 4 \
--run_id multica
```
Results are written to `logs/` and `evaluation_results/`.
## Known Limitations and Improvements
### Current Limitations
1. **No Docker isolation for agent execution**: The agent runs on the host, so `pip install` and other commands affect the system Python. SWE-bench standard practice is to run each task in a Docker container.
2. **`SMC_DATA_DIR` timing**: Setting `SMC_DATA_DIR` at runtime doesn't affect `DATA_DIR` (resolved at module import time). Sessions currently write to `~/.super-multica/sessions/`. To isolate, set the env var before the process starts:
```bash
SMC_DATA_DIR=~/.swe-bench-eval npx tsx scripts/swe-bench/run.ts --limit 5
```
3. **Sequential execution**: Tasks run one at a time. For large-scale runs, launch multiple processes with `--offset`/`--limit` to parallelize:
```bash
# Run 4 workers in parallel
npx tsx scripts/swe-bench/run.ts --offset 0 --limit 75 --output pred-0.jsonl &
npx tsx scripts/swe-bench/run.ts --offset 75 --limit 75 --output pred-1.jsonl &
npx tsx scripts/swe-bench/run.ts --offset 150 --limit 75 --output pred-2.jsonl &
npx tsx scripts/swe-bench/run.ts --offset 225 --limit 75 --output pred-3.jsonl &
wait
# Merge only the prediction files; pred-*.jsonl would also match pred-N.results.jsonl
cat pred-[0-3].jsonl > predictions.jsonl
```
4. **Repo cloning per instance**: Each instance clones the full repo. For repos with many tasks (e.g., astropy, django), a shared clone with `git worktree` would be faster.
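The `--offset`/`--limit` split described in item 3 can be computed rather than written by hand. The sketch below is illustrative only: `planShards` and `toCommand` are hypothetical helpers that are not part of the runner, and the `pred-<n>.jsonl` naming simply follows the example commands.

```typescript
interface Shard {
  offset: number;
  limit: number;
  output: string;
}

// Split `totalTasks` into at most `workers` contiguous shards of near-equal size.
function planShards(totalTasks: number, workers: number): Shard[] {
  if (!Number.isInteger(totalTasks) || totalTasks < 0) {
    throw new RangeError("totalTasks must be a non-negative integer");
  }
  if (!Number.isInteger(workers) || workers < 1) {
    throw new RangeError("workers must be a positive integer");
  }
  const size = Math.ceil(totalTasks / workers);
  const shards: Shard[] = [];
  for (let i = 0, offset = 0; offset < totalTasks; i++, offset += size) {
    shards.push({
      offset,
      limit: Math.min(size, totalTasks - offset), // last shard may be shorter
      output: `pred-${i}.jsonl`,
    });
  }
  return shards;
}

// Render one shard as a runner invocation using the documented flags.
function toCommand(shard: Shard): string {
  return (
    "npx tsx scripts/swe-bench/run.ts" +
    ` --offset ${shard.offset} --limit ${shard.limit} --output ${shard.output}`
  );
}
```

For the 300-task Lite set and 4 workers this yields the same offsets as the example (0, 75, 150, 225), each with a limit of 75.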
### Potential Improvements
- **Docker-per-task**: Run each agent in a Docker container matching the SWE-bench environment spec (correct Python version, pre-installed dependencies)
- **Shared repo pool**: Clone each unique repo once, use `git worktree` for per-task isolation
- **Cost tracking**: Parse run-log token counts for per-task and aggregate cost estimates
- **Multi-turn retries**: If the agent produces no patch, retry with feedback
- **System prompt tuning**: The current prompt is minimal; more detailed guidance (e.g., "search for related test files to understand expected behavior") could improve solve rate
## Related Benchmarks
| Benchmark | Focus | Notes |
|-----------|-------|-------|
| [SWE-bench Verified](https://openai.com/index/introducing-swe-bench-verified/) | Bug fixing (Python) | Gold standard, 500 human-verified tasks |
| [SWE-bench Multilingual](https://github.com/SWE-bench/SWE-bench) | Bug fixing (7 languages) | Java, TS, JS, Go, Rust, C, C++ |
| [Terminal-Bench](https://www.swebench.com/) | CLI workflows | Multi-step sandboxed terminal tasks |
| [Aider Polyglot](https://aider.chat/docs/leaderboards/) | Code editing | 225 Exercism exercises, 6 languages |
| [DPAI Arena](https://www.jetbrains.com/) | Full dev workflow | JetBrains: patch, test, review, analysis |
| [HumanEval](https://github.com/openai/human-eval) | Function generation | 164 Python function tasks, largely saturated |
## Initial Results (kimi-coding, 3 tasks)
First run on 3 SWE-bench Lite tasks (all astropy):
| Task | Status | Duration | LLM Time | Tools | Fix |
|------|--------|----------|----------|-------|-----|
| `astropy__astropy-12907` | PATCHED | 141.9s | 125.1s | 30 | `_cstack`: `= 1` → `= right` |
| `astropy__astropy-14182` | PATCHED | 192.0s | 166.9s | 56 | Added `header_rows` param to RST writer |
| `astropy__astropy-14365` | PATCHED | 65.7s | 49.6s | 23 | `re.compile()` + `re.IGNORECASE` |
3/3 tasks produced patches. Formal evaluation pending (requires Docker harness).


@ -1,36 +0,0 @@
# Time Injection Design
Super Multica uses **message-level timestamp injection** for time awareness.
Instead of placing dynamic time text in the system prompt, user turns are stamped at runtime.
```mermaid
flowchart TD
A[Incoming turn] --> B{Entry point}
B -->|Desktop/Gateway/Cron/Subagent| C[AsyncAgent.write]
B -->|Heartbeat poll| D[AsyncAgent.write injectTimestamp=false]
C --> E{Already stamped or has 'Current time:'?}
E -->|Yes| F[Keep original message]
E -->|No| G["Prefix: [DOW YYYY-MM-DD HH:mm TZ]"]
D --> H[Keep original heartbeat prompt]
F --> I[Agent.run]
G --> I
H --> I
I --> J[LLM receives final turn text]
```
## Injection Matrix
| Path | Runtime call | Timestamp injected? | Notes |
| --- | --- | --- | --- |
| Desktop direct chat | `agent.write(content)` | Yes | Default behavior |
| Gateway/remote chat | `agent.write(content)` | Yes | Same entry path as desktop |
| `sessions_spawn` child task | `childAgent.write(task)` | Yes | Child turn gets current time context |
| Cron `agent-turn` payload | `agent.write(cronMessage)` | Yes (guarded) | Skips if message already carries `Current time:` |
| Heartbeat runner | `agent.write(prompt, { injectTimestamp: false })` | No | Prevents heartbeat prompt matching from breaking |
| Internal orchestration | `writeInternal(...)` | No | Uses separate internal run path |
## Why This Design
- Keeps system prompt cache-stable (no per-turn date churn in system prompt text)
- Gives the model an explicit "now" reference on each user turn
- Uses guardrails to avoid double-stamping and heartbeat regressions
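The guard logic in the flowchart and matrix can be sketched as a pure function. This is an assumption-laden illustration, not the `AsyncAgent.write` implementation: `formatStamp` and `stampTurn` are hypothetical names, the stamp is rendered in UTC to keep the example deterministic, and the single space between the stamp and the message is a guess.

```typescript
const DAYS = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"];

// Matches a leading "[DOW YYYY-MM-DD HH:mm TZ]" stamp.
const STAMP_PATTERN =
  /^\[(Sun|Mon|Tue|Wed|Thu|Fri|Sat) \d{4}-\d{2}-\d{2} \d{2}:\d{2} [^\]]+\]/;

function pad2(n: number): string {
  return (n < 10 ? "0" : "") + n;
}

// Assumption: UTC is used here; the real system may use the local time zone.
function formatStamp(now: Date): string {
  const date = `${now.getUTCFullYear()}-${pad2(now.getUTCMonth() + 1)}-${pad2(now.getUTCDate())}`;
  const time = `${pad2(now.getUTCHours())}:${pad2(now.getUTCMinutes())}`;
  return `[${DAYS[now.getUTCDay()]} ${date} ${time} UTC]`;
}

// Prefix the turn with a timestamp unless injection is disabled (heartbeat),
// the turn is already stamped, or it already carries "Current time:" (cron).
function stampTurn(message: string, now: Date, injectTimestamp = true): string {
  if (!injectTimestamp) return message;
  if (STAMP_PATTERN.test(message) || message.includes("Current time:")) {
    return message;
  }
  return `${formatStamp(now)} ${message}`;
}
```

Because the stamp lives in the user turn, the system prompt text stays byte-identical across turns, which is the cache-stability property listed above.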


@ -1,197 +0,0 @@
# Agent Profile System
The Agent Profile system allows you to define and manage agent personalities, capabilities, and configurations. Each profile is a collection of markdown files and a JSON configuration file stored in a directory.
## Directory Structure
```
~/.super-multica/agent-profiles/
└── <profile-id>/
    ├── soul.md          # Personality constraints and behavior style
    ├── identity.md      # Agent's name and self-awareness
    ├── tools.md         # Custom tool usage instructions
    ├── memory.md        # Persistent knowledge base
    ├── bootstrap.md     # Guidance for each conversation start
    └── config.json      # Profile configuration (tools, provider, model)
```
## Profile Files
### soul.md
Defines the agent's personality constraints and behavior boundaries.
```markdown
# Soul
You are a helpful AI assistant. Follow these guidelines:
- Be concise and direct in your responses
- Ask clarifying questions when requirements are ambiguous
- Admit when you don't know something
```
### identity.md
Contains the agent's identity information.
```markdown
# Identity
- Name: CodeBot
- Role: Software development assistant
```
### tools.md
Custom instructions for tool usage (appended to the system prompt).
### memory.md
Persistent knowledge base that survives across conversations.
### bootstrap.md
Guidance information provided at the start of each conversation.
### config.json
JSON configuration for the profile:
```json
{
  "tools": {
    "profile": "coding",
    "allow": ["web_fetch"],
    "deny": ["exec"]
  },
  "provider": "anthropic",
  "model": "claude-sonnet-4-20250514",
  "thinkingLevel": "medium"
}
```
## Configuration Options
### tools
Tool policy configuration. See [Tools README](../tools/README.md) for details.
| Field | Type | Description |
|-------|------|-------------|
| `profile` | string | Base profile: `minimal`, `coding`, `web`, `full` |
| `allow` | string[] | Additional tools to allow (supports `group:*` syntax) |
| `deny` | string[] | Tools to block (takes precedence over allow) |
| `byProvider` | object | Provider-specific tool rules |
Example configurations:
```json
// Minimal - only file operations
{
  "tools": {
    "profile": "minimal",
    "allow": ["group:fs"]
  }
}
// Coding without web access
{
  "tools": {
    "profile": "coding",
    "deny": ["group:web"]
  }
}
// Full access except shell execution
{
  "tools": {
    "deny": ["exec", "process"]
  }
}
```
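How `allow` and `deny` combine with a base profile can be sketched as set arithmetic. This is a rough illustration, not the project's resolver: `resolveTools` and `expandEntries` are hypothetical names, and the membership of `group:fs` and `group:web` in `TOOL_GROUPS` is assumed for the example (only the group names appear in the configurations above).

```typescript
// Hypothetical group table: the group names come from the examples above,
// but the membership listed here is an assumption for illustration.
const TOOL_GROUPS: Record<string, string[]> = {
  "group:fs": ["read", "write", "edit", "glob"],
  "group:web": ["web_fetch", "web_search"],
};

interface ToolsPolicy {
  allow?: string[];
  deny?: string[];
}

// Expand `group:*` entries into tool names; plain names pass through.
function expandEntries(entries: string[] = []): string[] {
  const out: string[] = [];
  for (const entry of entries) {
    out.push(...(TOOL_GROUPS[entry] ?? [entry]));
  }
  return out;
}

// Start from the base profile's tools, add `allow`, then remove `deny`.
// Removing last is what makes deny take precedence over allow.
function resolveTools(baseTools: string[], policy: ToolsPolicy): string[] {
  const resolved = new Set<string>([...baseTools, ...expandEntries(policy.allow)]);
  for (const tool of expandEntries(policy.deny)) {
    resolved.delete(tool);
  }
  return Array.from(resolved).sort();
}
```

Applying deny after allow means a tool named in both lists ends up blocked, which matches the precedence stated in the options table.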
### provider
Default LLM provider for this profile.
### model
Default model ID for this profile.
### thinkingLevel
Default thinking level: `none`, `low`, `medium`, `high`.
## Usage
### CLI
```bash
# Use a specific profile
pnpm agent:cli --profile my-agent "Hello"
# Profile with custom base directory
pnpm agent:cli --profile my-agent --profile-dir /path/to/profiles "Hello"
```
### Programmatic
```typescript
import { ProfileManager } from "./profile/index.js";
// Load existing profile
const manager = new ProfileManager({
  profileId: "my-agent",
  baseDir: "/custom/path", // optional
});
// Get profile (returns undefined if it does not exist)
const profile = manager.getProfile();
// Get or create with defaults
const ensuredProfile = manager.getOrCreateProfile(true); // useTemplates
// Build system prompt from profile
const systemPrompt = manager.buildSystemPrompt();
// Get tools configuration
const toolsConfig = manager.getToolsConfig();
// Get full profile config
const config = manager.getProfileConfig();
```
## Config Priority
When using a profile, configurations are merged with CLI options:
1. **Profile config.json** - Base configuration
2. **CLI options** - Override profile settings
```bash
# Profile has tools.profile = "coding"
# CLI adds --tools-deny exec
# Result: coding profile without exec tool
pnpm agent:cli --profile my-agent --tools-deny exec "list files"
```
The merge behavior:
- `profile`: CLI wins if specified
- `allow`: Union of both lists
- `deny`: Union of both lists
- `byProvider`: Deep merge with CLI taking precedence
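The merge rules above can be written down as a small function. This is a sketch under stated assumptions rather than the project's merge code: `mergeToolsConfig` is a hypothetical name, the shape of `byProvider` (provider name mapped to `allow`/`deny` lists) is assumed, and the "deep merge" is approximated by merging one level per provider.

```typescript
// Assumed shape: provider name -> per-provider allow/deny lists.
type ProviderRules = Record<string, { allow?: string[]; deny?: string[] }>;

interface ToolsConfig {
  profile?: string;
  allow?: string[];
  deny?: string[];
  byProvider?: ProviderRules;
}

// Union of two lists, preserving first-seen order.
function union(a: string[] = [], b: string[] = []): string[] {
  return Array.from(new Set<string>([...a, ...b]));
}

function mergeToolsConfig(profileCfg: ToolsConfig, cli: ToolsConfig): ToolsConfig {
  const byProvider: ProviderRules = { ...(profileCfg.byProvider ?? {}) };
  const cliRules = cli.byProvider ?? {};
  for (const provider of Object.keys(cliRules)) {
    // CLI fields override profile fields for the same provider.
    byProvider[provider] = { ...(byProvider[provider] ?? {}), ...cliRules[provider] };
  }
  return {
    profile: cli.profile ?? profileCfg.profile, // CLI wins if specified
    allow: union(profileCfg.allow, cli.allow),
    deny: union(profileCfg.deny, cli.deny),
    byProvider,
  };
}
```

With a profile that sets `profile: "coding"` and a CLI call that adds `deny: ["exec"]`, the result keeps the `coding` base and denies `exec`, as in the command above.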
## Creating a Profile
### Manual Creation
1. Create directory: `mkdir -p ~/.super-multica/agent-profiles/my-agent`
2. Create markdown files (soul.md, identity.md, etc.)
3. Create config.json with your settings
### Programmatic Creation
```typescript
import { createAgentProfile } from "./profile/index.js";
// Create with default templates
const profile = createAgentProfile("my-agent", {
  useTemplates: true, // Fill with default content
});
// Create empty profile
const emptyProfile = createAgentProfile("minimal-agent", {
  useTemplates: false,
});
```


@ -1,438 +0,0 @@
# Skills System
[English](./README.md) | [中文](./README.zh-CN.md)
Skills extend agent capabilities through `SKILL.md` definition files.
## Table of Contents
- [SKILL.md Specification](#skillmd-specification)
- [Skill Invocation](#skill-invocation)
- [Loading & Precedence](#loading--precedence)
- [CLI Commands](#cli-commands)
---
## SKILL.md Specification
Each skill is a directory containing a `SKILL.md` file with YAML frontmatter + Markdown content.
### Basic Structure
```markdown
---
name: My Skill
version: 1.0.0
description: What this skill does
metadata:
  emoji: "🔧"
  requires:
    bins: [git]
---
# Instructions
Detailed instructions injected into the agent's system prompt...
```
### Frontmatter Fields
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `name` | string | Yes | Display name |
| `version` | string | No | Version number |
| `description` | string | No | Short description |
| `homepage` | string | No | Homepage URL |
| `metadata` | object | No | See below |
| `config` | object | No | See below |
| `install` | array | No | See below |
### metadata.requires
Defines eligibility requirements:
```yaml
metadata:
  emoji: "📝"
  requires:
    bins: [git, node]          # All must exist
    anyBins: [npm, pnpm]       # At least one must exist
    env: [API_KEY]             # All must be set
    platforms: [darwin, linux] # Current OS must match
```
| Field | Description |
|-------|-------------|
| `bins` | Required binaries (all must exist in PATH) |
| `anyBins` | Alternative binaries (at least one must exist) |
| `env` | Required environment variables |
| `platforms` | Supported platforms: `darwin`, `linux`, `win32` |
### config
Runtime configuration options:
```yaml
config:
  enabled: true
  requiresConfig: ["skills.myskill.apiKey"]
  options:
    timeout: 30000
```
### install
Dependency installation specifications:
```yaml
install:
  - kind: brew
    package: jq
  - kind: npm
    package: typescript
    global: true
  - kind: uv
    package: requests
  - kind: go
    package: github.com/example/tool@latest
  - kind: download
    url: https://example.com/tool.tar.gz
    archiveType: tar.gz
    stripComponents: 1
```
**Supported install kinds:**
| Kind | Description | Key Fields |
|------|-------------|------------|
| `brew` | Homebrew | `package`, `cask` |
| `npm` | npm/pnpm/yarn | `package`, `global` |
| `uv` | Python uv | `package` |
| `go` | Go install | `package` |
| `download` | Download & extract | `url`, `archiveType` |
**Common fields:** `id`, `label`, `platforms`, `when`
---
## Skill Invocation
Skills can be invoked by users via slash commands (`/skill-name`) or automatically by the AI model.
### User Invocation
In the interactive CLI, type `/` followed by a skill name to invoke it:
```
You: /pdf analyze report.pdf
```
**Tab completion**: Type `/p` then press Tab to see matching skills like `/pdf`.
**List available skills**: Type `/help` to see all available skill commands.
### Invocation Control
Control how skills can be invoked using frontmatter fields:
```yaml
---
name: My Skill
user-invocable: true # Can be invoked via /command (default: true)
disable-model-invocation: false # Include in AI prompt (default: false)
---
```
| Field | Default | Description |
|-------|---------|-------------|
| `user-invocable` | `true` | Enable `/command` invocation in CLI |
| `disable-model-invocation` | `false` | If `true`, skill is hidden from AI's system prompt |
**Use cases:**
- **User-only skill** (`disable-model-invocation: true`): User can invoke via `/command`, but AI won't use it automatically
- **AI-only skill** (`user-invocable: false`): AI can use it, but no `/command` available
- **Disabled skill** (`user-invocable: false` and `disable-model-invocation: true`): Hidden from both user and AI
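The two flags and their defaults from the table can be reduced to a pair of booleans. A minimal sketch, assuming the frontmatter has already been parsed into an object; `resolveInvocation` and the result field names are hypothetical.

```typescript
interface InvocationFlags {
  "user-invocable"?: boolean;
  "disable-model-invocation"?: boolean;
}

interface InvocationPolicy {
  slashCommand: boolean;  // user can type /command
  inModelPrompt: boolean; // skill is listed in the AI's system prompt
}

// Apply the documented defaults: user-invocable = true,
// disable-model-invocation = false.
function resolveInvocation(flags: InvocationFlags): InvocationPolicy {
  return {
    slashCommand: flags["user-invocable"] ?? true,
    inModelPrompt: !(flags["disable-model-invocation"] ?? false),
  };
}
```

A skill with no flags set is available through both channels, and a skill is hidden from both only when `user-invocable` is `false` and `disable-model-invocation` is `true`.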
### Command Dispatch
For advanced integrations, skills can dispatch directly to tools:
```yaml
---
name: PDF Tool
command-dispatch: tool
command-tool: pdf-processor
command-arg-mode: raw
---
```
| Field | Description |
|-------|-------------|
| `command-dispatch` | Set to `tool` to enable tool dispatch |
| `command-tool` | Name of the tool to invoke |
| `command-arg-mode` | How arguments are passed (`raw` = as-is) |
### Command Name Normalization
Skill names are normalized for command use:
- Converted to lowercase
- Special characters replaced with underscores
- Truncated to 32 characters max
- Duplicate names get numeric suffixes (e.g., `pdf_2`)
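The normalization rules above can be sketched as follows. This is illustrative only: `normalizeCommandName` is a hypothetical name, "special characters" is assumed to mean anything other than `a-z`, `0-9`, and `_` (the real rule may keep other characters such as hyphens), and keeping the numeric suffix inside the 32-character limit is also an assumption.

```typescript
const MAX_COMMAND_LENGTH = 32;

// Normalize a skill name into a command name and reserve it in `taken`.
function normalizeCommandName(name: string, taken: Set<string>): string {
  const base = name
    .toLowerCase()
    .replace(/[^a-z0-9_]/g, "_") // assumed definition of "special characters"
    .slice(0, MAX_COMMAND_LENGTH);

  let candidate = base;
  // On collision, append _2, _3, ... trimming the base so the result
  // still fits within the length limit.
  for (let n = 2; taken.has(candidate); n++) {
    const suffix = `_${n}`;
    candidate = base.slice(0, MAX_COMMAND_LENGTH - suffix.length) + suffix;
  }
  taken.add(candidate);
  return candidate;
}
```

Passing one shared `taken` set for a whole load pass is what produces suffixes such as `pdf_2` for the second skill that normalizes to `pdf`.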
---
## Loading & Precedence
Skills load from two sources with precedence (lowest to highest):
| Priority | Source | Path | Description |
|----------|--------|------|-------------|
| 1 | managed | `~/.super-multica/skills/` | Global skills (CLI-installed + bundled) |
| 2 | profile | `~/.super-multica/agent-profiles/<id>/skills/` | Profile-specific skills |
Higher priority sources override skills with the same ID.
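The override rule amounts to inserting sources into a map in priority order so that later entries replace earlier ones. A minimal sketch with hypothetical types (`LoadedSkill`, `mergeBySource`); the actual loader's data structures are not shown in this document.

```typescript
type SkillSource = "managed" | "profile";

interface LoadedSkill {
  id: string;
  source: SkillSource;
  path: string;
}

// Insert lowest priority first; a later `set` with the same ID overrides it.
function mergeBySource(
  managed: LoadedSkill[],
  profile: LoadedSkill[],
): Map<string, LoadedSkill> {
  const merged = new Map<string, LoadedSkill>();
  for (const skill of managed) merged.set(skill.id, skill);
  for (const skill of profile) merged.set(skill.id, skill);
  return merged;
}
```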
### Initialization
On first run, bundled skills are automatically copied to the managed directory (`~/.super-multica/skills/`). This makes them editable and allows users to customize or remove them.
### Adding Profile-Specific Skills
You can install skills directly to a profile using the `--profile` option:
```bash
# Install skill to a specific profile
multica skills add owner/repo --profile my-agent
# Install with force overwrite
multica skills add owner/repo/skill-name --profile my-agent --force
```
Alternatively, create them manually:
```bash
# Create profile skills directory
mkdir -p ~/.super-multica/agent-profiles/<profile-id>/skills/<skill-name>
# Create the SKILL.md file
cat > ~/.super-multica/agent-profiles/<profile-id>/skills/<skill-name>/SKILL.md << 'EOF'
---
name: My Profile Skill
version: 1.0.0
description: A skill specific to this profile
---
# Instructions
Your skill instructions here...
EOF
```
Profile skills automatically override managed skills with the same ID, allowing per-profile customization.
### Eligibility Filtering
After loading, skills are filtered by:
1. Platform check (`platforms`)
2. Binary check (`bins`, `anyBins`)
3. Environment check (`env`)
4. Config check (`requiresConfig`)
5. Enabled check (`config.enabled`)
Only skills passing all checks are marked as eligible.
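The five checks can be sketched as a pure function that takes the host facts as input, which keeps it testable. The types and the `diagnose` helper are hypothetical; the returned labels reuse the diagnostic type names from the status documentation, and treating an unset or empty environment variable as missing is an assumption.

```typescript
interface SkillRequirements {
  platforms?: string[];
  bins?: string[];
  anyBins?: string[];
  env?: string[];
}

interface SkillEntry {
  id: string;
  requires?: SkillRequirements;
  requiresConfig?: string[];
  enabled?: boolean;
}

// Facts about the host, injected so the check itself has no side effects.
interface HostContext {
  platform: string;
  hasBinary(name: string): boolean;
  env: Record<string, string | undefined>;
  hasConfig(path: string): boolean;
}

type Diagnostic = "platform" | "binary" | "any_binary" | "env" | "config" | "disabled";

// Run the checks in the documented order and collect every failure.
function diagnose(skill: SkillEntry, host: HostContext): Diagnostic[] {
  const req = skill.requires ?? {};
  const issues: Diagnostic[] = [];
  const platforms = req.platforms ?? [];
  if (platforms.length > 0 && platforms.indexOf(host.platform) === -1) issues.push("platform");
  if ((req.bins ?? []).some((b) => !host.hasBinary(b))) issues.push("binary");
  const anyBins = req.anyBins ?? [];
  if (anyBins.length > 0 && !anyBins.some((b) => host.hasBinary(b))) issues.push("any_binary");
  if ((req.env ?? []).some((name) => !host.env[name])) issues.push("env");
  if ((skill.requiresConfig ?? []).some((p) => !host.hasConfig(p))) issues.push("config");
  if (skill.enabled === false) issues.push("disabled");
  return issues;
}

// A skill is eligible only when no check fails.
function isEligible(skill: SkillEntry, host: HostContext): boolean {
  return diagnose(skill, host).length === 0;
}
```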
---
## CLI Commands
All commands use the unified `multica` CLI (or `pnpm multica` during development).
### List Skills
```bash
multica skills list # List all skills
multica skills list -v # Verbose mode
multica skills status # Summary status
multica skills status <id> # Specific skill status
```
### Install from GitHub
**Example: Installing from [anthropics/skills](https://github.com/anthropics/skills)**
The repository structure:
```
anthropics/skills/
├── skills/
│   ├── algorithmic-art/
│   │   └── SKILL.md
│   ├── brand-guidelines/
│   │   └── SKILL.md
│   ├── pdf/
│   │   └── SKILL.md
│   └── ... (16 skills total)
```
Install the entire repository (all 16 skills):
```bash
multica skills add anthropics/skills
# Installs to: ~/.super-multica/skills/skills/
# All skills available: algorithmic-art, brand-guidelines, pdf, etc.
```
Install a single skill only:
```bash
multica skills add anthropics/skills/skills/pdf
# Installs to: ~/.super-multica/skills/pdf/
# Only the pdf skill is installed
```
Install from a specific branch or tag:
```bash
multica skills add anthropics/skills@main
```
Using full URL:
```bash
multica skills add https://github.com/anthropics/skills
multica skills add https://github.com/anthropics/skills/tree/main/skills/pdf
```
Force overwrite existing:
```bash
multica skills add anthropics/skills --force
```
**Supported formats:**
| Format | Example | Description |
|--------|---------|-------------|
| `owner/repo` | `anthropics/skills` | Clone entire repository |
| `owner/repo/path` | `anthropics/skills/skills/pdf` | Single directory (sparse checkout) |
| `owner/repo@ref` | `anthropics/skills@v1.0.0` | Specific branch or tag |
| Full URL | `https://github.com/anthropics/skills` | GitHub URL |
| Full URL + path | `https://github.com/.../tree/main/skills/pdf` | URL with specific path |
### Remove Skills
```bash
multica skills remove <name> # Remove installed skill
multica skills remove # List installed skills
```
### Install Dependencies
```bash
multica skills install <id> # Install skill dependencies
multica skills install <id> <install-id> # Specific install option
```
---
## Status Diagnostics
The `status` command provides detailed diagnostics for understanding why skills are or aren't eligible.
### Summary Status
```bash
multica skills status # Show summary with grouping by issue type
multica skills status -v # Verbose mode with hints
```
Output shows:
- Total/eligible/ineligible counts
- Ineligible skills grouped by issue type (binary, env, platform, etc.)
### Detailed Skill Status
```bash
multica skills status <skill-id>
```
Output includes:
- Basic skill info (name, version, source, path)
- **Eligibility status** with detailed diagnostics
- **Requirements checklist** showing which binaries/env vars are present
- **Install options** with availability status
- **Quick actions** with actionable hints to resolve issues
### Diagnostic Types
| Type | Description | Example Hint |
|------|-------------|--------------|
| `disabled` | Skill disabled in config | Enable via `skills.<id>.enabled: true` |
| `not_in_allowlist` | Bundled skill not allowed | Add to `config.allowBundled` array |
| `platform` | Platform mismatch | "Only works on: darwin, linux" |
| `binary` | Missing required binary | "brew install git" |
| `any_binary` | No alternative binary found | "Install any of: npm, pnpm, yarn" |
| `env` | Missing environment variable | "export OPENAI_API_KEY=..." |
| `config` | Missing config value | "Set config path: browser.enabled" |
---
## Async Serialization
The skills system uses async serialization to prevent concurrent operations from corrupting files or causing race conditions.
### How It Works
Operations with the same key are executed sequentially:
```typescript
import { serialize, SerializeKeys } from "./skills/index.js";
// These will execute sequentially, not in parallel
const p1 = serialize(SerializeKeys.skillAdd("my-skill"), () => addSkill(...));
const p2 = serialize(SerializeKeys.skillAdd("my-skill"), () => addSkill(...));
// This runs in parallel (different key)
const p3 = serialize(SerializeKeys.skillAdd("other-skill"), () => addSkill(...));
```
### Built-in Serialization
The following operations are automatically serialized:
- `addSkill()` - by skill name
- `removeSkill()` - by skill name
- `installSkill()` - by skill ID
### Utility Functions
```typescript
import {
  isProcessing,   // Check if key is being processed
  getQueueLength, // Get pending operations count
  getActiveKeys,  // Get all active operation keys
  waitForKey,     // Wait for key operations to complete
  waitForAll,     // Wait for all operations
} from "./skills/index.js";
```
---
## Troubleshooting
**Skill not showing as eligible?**
Run `multica skills status <skill-id>` to see detailed diagnostics with actionable hints.
**Override a bundled skill?**
Create a skill with the same ID in `~/.super-multica/skills/` or profile skills directory.
**Hot reload not working?**
Ensure `chokidar` is installed: `pnpm add chokidar`
**Concurrent operations causing issues?**
All add/remove/install operations are automatically serialized. If you're building custom integrations, use the `serialize()` function with appropriate keys.


@ -1,438 +0,0 @@
# Skills 系统
[English](./README.md) | [中文](./README.zh-CN.md)
Skills 通过 `SKILL.md` 定义文件扩展 Agent 的能力。
## 目录
- [SKILL.md 规范](#skillmd-规范)
- [Skill 调用](#skill-调用)
- [加载与优先级](#加载与优先级)
- [CLI 命令](#cli-命令)
---
## SKILL.md 规范
每个 skill 是一个包含 `SKILL.md` 文件的目录,文件包含 YAML frontmatter 和 Markdown 内容。
### 基本结构
```markdown
---
name: My Skill
version: 1.0.0
description: 这个 skill 的功能描述
metadata:
emoji: "🔧"
requires:
bins: [git]
---
# 说明
注入到 agent 系统提示词中的详细说明...
```
### Frontmatter 字段
| 字段 | 类型 | 必需 | 描述 |
|------|------|------|------|
| `name` | string | 是 | 显示名称 |
| `version` | string | 否 | 版本号 |
| `description` | string | 否 | 简短描述 |
| `homepage` | string | 否 | 主页 URL |
| `metadata` | object | 否 | 见下文 |
| `config` | object | 否 | 见下文 |
| `install` | array | 否 | 见下文 |
### metadata.requires
定义资格要求:
```yaml
metadata:
emoji: "📝"
requires:
bins: [git, node] # 全部必须存在
anyBins: [npm, pnpm] # 至少一个必须存在
env: [API_KEY] # 全部必须设置
platforms: [darwin, linux] # 当前操作系统必须匹配
```
| 字段 | 描述 |
|------|------|
| `bins` | 必需的二进制文件(全部必须存在于 PATH 中) |
| `anyBins` | 备选二进制文件(至少一个必须存在) |
| `env` | 必需的环境变量 |
| `platforms` | 支持的平台:`darwin``linux``win32` |
### config
运行时配置选项:
```yaml
config:
enabled: true
requiresConfig: ["skills.myskill.apiKey"]
options:
timeout: 30000
```
### install
依赖安装规范:
```yaml
install:
- kind: brew
package: jq
- kind: npm
package: typescript
global: true
- kind: uv
package: requests
- kind: go
package: github.com/example/tool@latest
- kind: download
url: https://example.com/tool.tar.gz
archiveType: tar.gz
stripComponents: 1
```
**支持的安装类型:**
| 类型 | 描述 | 关键字段 |
|------|------|----------|
| `brew` | Homebrew | `package``cask` |
| `npm` | npm/pnpm/yarn | `package``global` |
| `uv` | Python uv | `package` |
| `go` | Go install | `package` |
| `download` | 下载并解压 | `url``archiveType` |
**通用字段:** `id``label``platforms``when`
---
## Skill 调用
用户可以通过斜杠命令(`/skill-name`)调用 skillsAI 模型也可以自动调用。
### 用户调用
在交互式 CLI 中,输入 `/` 加上 skill 名称来调用:
```
You: /pdf analyze report.pdf
```
**Tab 补全**:输入 `/p` 然后按 Tab 键查看匹配的 skills`/pdf`
**列出可用 skills**:输入 `/help` 查看所有可用的 skill 命令。
### 调用控制
使用 frontmatter 字段控制 skill 的调用方式:
```yaml
---
name: My Skill
user-invocable: true # 可通过 /command 调用默认true
disable-model-invocation: false # 包含在 AI 提示词中默认false
---
```
| 字段 | 默认值 | 描述 |
|------|--------|------|
| `user-invocable` | `true` | 在 CLI 中启用 `/command` 调用 |
| `disable-model-invocation` | `false` | 如果为 `true`skill 对 AI 的系统提示词隐藏 |
**使用场景:**
- **仅用户 skill**`disable-model-invocation: true`):用户可通过 `/command` 调用,但 AI 不会自动使用
- **仅 AI skill**`user-invocable: false`AI 可使用,但没有 `/command` 可用
- **禁用 skill**(两者都为 `false`):对用户和 AI 都隐藏
### 命令分发
对于高级集成skills 可以直接分发到工具:
```yaml
---
name: PDF Tool
command-dispatch: tool
command-tool: pdf-processor
command-arg-mode: raw
---
```
| 字段 | 描述 |
|------|------|
| `command-dispatch` | 设置为 `tool` 启用工具分发 |
| `command-tool` | 要调用的工具名称 |
| `command-arg-mode` | 参数传递方式(`raw` = 原样传递) |
### 命令名称规范化
Skill 名称会被规范化以用作命令:
- 转换为小写
- 特殊字符替换为下划线
- 截断至最多 32 个字符
- 重复名称添加数字后缀(如 `pdf_2`
---
## 加载与优先级
Skills 从两个来源加载,优先级从低到高:
| 优先级 | 来源 | 路径 | 描述 |
|--------|------|------|------|
| 1 | managed | `~/.super-multica/skills/` | 全局 skillsCLI 安装 + 内置) |
| 2 | profile | `~/.super-multica/agent-profiles/<id>/skills/` | Profile 专属 skills |
高优先级来源会覆盖具有相同 ID 的 skills。
### 初始化
首次运行时,内置 skills 会自动复制到 managed 目录(`~/.super-multica/skills/`)。这使得用户可以编辑或删除它们。
### 添加 Profile 专属 Skills
可以使用 `--profile` 选项直接安装 skills 到特定 profile
```bash
# 安装 skill 到特定 profile
multica skills add owner/repo --profile my-agent
# 强制覆盖安装
multica skills add owner/repo/skill-name --profile my-agent --force
```
也可以手动创建:
```bash
# 创建 profile skills 目录
mkdir -p ~/.super-multica/agent-profiles/<profile-id>/skills/<skill-name>
# 创建 SKILL.md 文件
cat > ~/.super-multica/agent-profiles/<profile-id>/skills/<skill-name>/SKILL.md << 'EOF'
---
name: My Profile Skill
version: 1.0.0
description: 此 profile 专属的 skill
---
# 说明
你的 skill 说明内容...
EOF
```
Profile skills 会自动覆盖同 ID 的 managed skills允许按 profile 自定义。
### 资格过滤
加载后skills 会按以下条件过滤:
1. 平台检查(`platforms`
2. 二进制文件检查(`bins``anyBins`
3. 环境变量检查(`env`
4. 配置检查(`requiresConfig`
5. 启用检查(`config.enabled`
只有通过所有检查的 skills 才会被标记为符合条件。
---
## CLI 命令
所有命令使用统一的 `multica` CLI开发时使用 `pnpm multica`)。
### 列出 Skills
```bash
multica skills list # 列出所有 skills
multica skills list -v # 详细模式
multica skills status # 汇总状态
multica skills status <id> # 特定 skill 状态
```
### 从 GitHub 安装
**示例:从 [anthropics/skills](https://github.com/anthropics/skills) 安装**
仓库结构:
```
anthropics/skills/
├── skills/
│ ├── algorithmic-art/
│ │ └── SKILL.md
│ ├── brand-guidelines/
│ │ └── SKILL.md
│ ├── pdf/
│ │ └── SKILL.md
│ └── ... (共 16 个 skills)
```
安装整个仓库(所有 16 个 skills
```bash
multica skills add anthropics/skills
# 安装到:~/.super-multica/skills/skills/
# 所有 skills 可用algorithmic-art、brand-guidelines、pdf 等
```
只安装单个 skill
```bash
multica skills add anthropics/skills/skills/pdf
# 安装到:~/.super-multica/skills/pdf/
# 只安装 pdf skill
```
从特定分支或标签安装:
```bash
multica skills add anthropics/skills@main
```
使用完整 URL
```bash
multica skills add https://github.com/anthropics/skills
multica skills add https://github.com/anthropics/skills/tree/main/skills/pdf
```
强制覆盖现有:
```bash
multica skills add anthropics/skills --force
```
**支持的格式:**
| 格式 | 示例 | 描述 |
|------|------|------|
| `owner/repo` | `anthropics/skills` | 克隆整个仓库 |
| `owner/repo/path` | `anthropics/skills/skills/pdf` | 单个目录(稀疏检出) |
| `owner/repo@ref` | `anthropics/skills@v1.0.0` | 特定分支或标签 |
| 完整 URL | `https://github.com/anthropics/skills` | GitHub URL |
| 完整 URL + 路径 | `https://github.com/.../tree/main/skills/pdf` | 带特定路径的 URL |
### 移除 Skills
```bash
multica skills remove <name> # 移除已安装的 skill
multica skills remove # 列出已安装的 skills
```
### 安装依赖
```bash
multica skills install <id> # 安装 skill 依赖
multica skills install <id> <install-id> # 特定安装选项
```
---
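## Status Diagnostics
The `status` command provides detailed diagnostics that help explain why a skill is or is not eligible.
### Summary Status
```bash
multica skills status # Show a summary grouped by issue type
multica skills status -v # Verbose mode with hints
```
The output shows:
- Total / eligible / ineligible counts
- Ineligible skills grouped by issue type (binary, env, platform, etc.)
### Detailed Skill Status
```bash
multica skills status <skill-id>
```
The output includes:
- Basic skill information (name, version, source, path)
- **Eligibility status** with detailed diagnostics
- **Requirements checklist** showing which binaries and environment variables are present
- **Install options** with their availability status
- **Quick actions** with actionable hints
### Diagnostic Types
| Type | Description | Example hint |
|------|------|----------|
| `disabled` | Skill is disabled in the config | Enable via `skills.<id>.enabled: true` |
| `not_in_allowlist` | Bundled skill is not in the allowlist | Add it to the `config.allowBundled` array |
| `platform` | Platform mismatch | "Only supported on: darwin, linux" |
| `binary` | A required binary is missing | "brew install git" |
| `any_binary` | None of the alternative binaries was found | "Install any of: npm, pnpm, yarn" |
| `env` | An environment variable is missing | "export OPENAI_API_KEY=..." |
| `config` | A config value is missing | "Set config path: browser.enabled" |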
---
## Async Serialization
The skills system uses async serialization to prevent concurrent operations from corrupting files or causing race conditions.
### How It Works
Operations that share the same key run sequentially:
```typescript
import { serialize, SerializeKeys } from "./skills/index.js";
// These run sequentially, not in parallel
const p1 = serialize(SerializeKeys.skillAdd("my-skill"), () => addSkill(...));
const p2 = serialize(SerializeKeys.skillAdd("my-skill"), () => addSkill(...));
// This one runs in parallel (different key)
const p3 = serialize(SerializeKeys.skillAdd("other-skill"), () => addSkill(...));
```
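
One common way to get this behavior is to chain each new operation onto the tail of the previous promise for the same key. The sketch below illustrates that idea only. `serializeByKey`, `tails`, and `demo` are hypothetical names, and the real `serialize()` may be implemented differently.

```typescript
// Hypothetical re-implementation for illustration; the real serialize() may differ.
const tails = new Map<string, Promise<unknown>>();

function serializeByKey<T>(key: string, task: () => Promise<T>): Promise<T> {
  const previous = tails.get(key) ?? Promise.resolve();
  // Start the task only after the previous operation for this key has settled.
  const run = previous.then(() => task());
  // Store a tail that never rejects so one failure does not block later operations.
  const tail = run.catch(() => undefined);
  tails.set(key, tail);
  // Drop the entry once no newer operation has been queued behind this one.
  void tail.then(() => {
    if (tails.get(key) === tail) tails.delete(key);
  });
  return run;
}

// Usage: two jobs with the same key never overlap, even though the first is slower.
async function demo(): Promise<string[]> {
  const log: string[] = [];
  const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));
  const job = (name: string, ms: number) => async (): Promise<void> => {
    log.push(`start ${name}`);
    await sleep(ms);
    log.push(`end ${name}`);
  };
  await Promise.all([
    serializeByKey("skill:add:my-skill", job("a", 20)),
    serializeByKey("skill:add:my-skill", job("b", 5)),
  ]);
  return log;
}
```

Storing a non-rejecting tail is a deliberate choice in this sketch: callers still see the original rejection through the returned promise, but the queue for that key keeps moving.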
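### Built-in Serialization
The following operations are serialized automatically:
- `addSkill()` - by skill name
- `removeSkill()` - by skill name
- `installSkill()` - by skill ID
### Utility Functions
```typescript
import {
  isProcessing, // Check whether a key is currently being processed
  getQueueLength, // Get the number of pending operations
  getActiveKeys, // Get all active operation keys
  waitForKey, // Wait for a key's operations to complete
  waitForAll, // Wait for all operations
} from "./skills/index.js";
```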
---
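## Troubleshooting
**Skill not showing as eligible?**
Run `multica skills status <skill-id>` for detailed diagnostics and actionable hints.
**Overriding a bundled skill?**
Create a skill with the same ID in `~/.super-multica/skills/` or in the profile's skills directory.
**Hot reload not working?**
Make sure `chokidar` is installed: `pnpm add chokidar`
**Concurrent operations causing problems?**
All add/remove/install operations are serialized automatically. If you are building a custom integration, use the `serialize()` function with an appropriate key.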


@ -1,172 +0,0 @@
# Subagent System
The subagent system allows a parent agent to spawn isolated child agents that run tasks in parallel and report results back automatically.
## Architecture Overview
```
┌─────────────────────────────────────────────────────────────────────┐
│ Parent Agent (runner.ts) │
│ │
│ tools: sessions_spawn, sessions_list │
│ state: resolvedProvider, toolsOptions │
└──────────┬──────────────────────────────────────────────────────────┘
│ sessions_spawn(task, label, timeoutSeconds)
┌─────────────────────────────────────────────────────────────────────┐
│ Spawn Flow (sessions-spawn.ts) │
│ │
│ 1. Build subagent system prompt (announce.ts) │
│ 2. hub.createSubagent(childSessionId, { provider, model }) │
│ 3. registerSubagentRun({ start: () => childAgent.write(task) }) │
│ 4. Return { status: "accepted", runId, childSessionId } │
└──────────┬──────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────┐
│ Concurrency Queue (command-queue.ts) │
│ │
│ Lane: "subagent" — max 10 concurrent (configurable) │
│ Queued runs wait for a slot before start() is called │
└──────────┬──────────────────────────────────────────────────────────┘
│ slot acquired
┌─────────────────────────────────────────────────────────────────────┐
│ Child Agent Execution │
│ │
│ ┌───────────────────────────────────────────────────────────────┐ │
│ │ AsyncAgent (async-agent.ts) │ │
│ │ - Isolated session with restricted tools (isSubagent=true) │ │
│ │ - Inherits parent's LLM provider │ │
│ │ - System prompt: task focus + error reporting rules │ │
│ │ - Tracks lastRunError for error propagation │ │
│ └───────────────────────────────────────────────────────────────┘ │
│ │
│ ┌───────────────────────────────────────────────────────────────┐ │
│ │ watchChildAgent (registry.ts) │ │
│ │ - Sets startedAt, starts timeout timer │ │
│ │ - waitForIdle() — waits for child's task queue to drain │ │
│ │ - onClose() — handles explicit close (timeout kill, etc.) │ │
│ └───────────────────────────────────────────────────────────────┘ │
└──────────┬──────────────────────────────────────────────────────────┘
│ child completes / errors / times out
┌─────────────────────────────────────────────────────────────────────┐
│ Completion Handling (registry.ts) │
│ │
│ handleRunCompletion(record) │
│ │ │
│ ├─ Phase 1: captureFindings() │
│ │ - Read last assistant reply from child session JSONL │
│ │ - Falls back to last toolResult if no assistant text │
│ │ - Persists findings to record before session deletion │
│ │ │
│ ├─ Session Cleanup │
│ │ - cleanup="delete": rm child session dir + hub.closeAgent() │
│ │ - cleanup="keep": preserve for audit │
│ │ │
│ └─ Phase 2: checkAndAnnounce(requesterSessionId) │
│ - Finds all unannounced, completed runs with findings │
│ - Calls runCoalescedAnnounceFlow() │
│ - Marks records: announced=true, archiveAtMs=now+60min │
└──────────┬──────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────┐
│ Announcement Delivery (announce.ts) │
│ │
│ runCoalescedAnnounceFlow(requesterSessionId, records) │
│ │ │
│ ├─ Format message: formatCoalescedAnnouncementMessage() │
│ │ - Single record: task name, status, findings, stats │
│ │ - Multiple records: combined report with all findings │
│ │ │
│ ├─ Two-tier delivery: │
│ │ │
│ │ Tier 1: BUSY (parent running or has pending writes) │
│ │ └─ enqueueAnnounce() → announce-queue.ts │
│ │ - Debounce 1s to batch nearby completions │
│ │ - Drain via writeInternal() when parent finishes │
│ │ │
│ │ Tier 2: IDLE (parent not running) │
│ │ └─ sendAnnounceDirect() │
│ │ - writeInternal(msg, { forwardAssistant, persistResponse })│
│ │ │
│ └─ All delivery uses writeInternal() (marks as internal: true) │
│ → Prevents announcement from showing as user bubble in UI │
│ → LLM processes findings and responds naturally to user │
└──────────┬──────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────┐
│ Record Lifecycle (registry.ts) │
│ │
│ created → startedAt → endedAt → findingsCaptured → announced │
│ │
│ After announcement: │
│ - Record kept with archiveAtMs = now + 60 min │
│ - sessions_list can still query records during this window │
│ - Sweeper runs every 60s, removes expired records │
│ - When all records removed, sweeper stops │
└─────────────────────────────────────────────────────────────────────┘
```
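
The concurrency-queue stage in the diagram above can be sketched as a small slot limiter. This is a hypothetical illustration, not the actual `command-queue.ts` code: the `Lane` class and its `run` method are names invented here, and only the limit of 10 comes from the description above.

```typescript
// Hypothetical sketch of a concurrency lane; not the actual command-queue.ts.
class Lane {
  private active = 0;
  private readonly waiting: Array<() => void> = [];
  private readonly maxConcurrent: number;

  constructor(maxConcurrent: number) {
    this.maxConcurrent = maxConcurrent;
  }

  async run<T>(start: () => Promise<T>): Promise<T> {
    if (this.active >= this.maxConcurrent) {
      // Lane is full: wait until a finishing run hands over its slot.
      await new Promise<void>((resolve) => this.waiting.push(resolve));
    } else {
      this.active++;
    }
    try {
      return await start();
    } finally {
      const next = this.waiting.shift();
      if (next) {
        next(); // pass the slot directly to the next queued run
      } else {
        this.active--;
      }
    }
  }
}

// Assumed default from the diagram: up to 10 subagents run at once.
const subagentLane = new Lane(10);
```

Handing the slot directly to the next waiter keeps the active count from dipping below the limit while work is queued, so a newly arriving run cannot jump ahead of one that was already waiting.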
## Key Files
| File | Purpose |
|------|---------|
| `sessions-spawn.ts` | Tool: spawns a child agent with task, label, timeout, provider |
| `sessions-list.ts` | Tool: lists subagent runs and their status |
| `registry.ts` | Lifecycle management: register, watch, capture, announce, archive |
| `announce.ts` | System prompt builder, findings reader, message formatter, delivery |
| `announce-queue.ts` | Debounced queue for batching announcements when parent is busy |
| `command-queue.ts` | Concurrency limiter for subagent lane slots |
| `lanes.ts` | Lane config: max concurrency (10), default timeout (1800s) |
| `types.ts` | Shared types: SubagentRunRecord, SubagentRunOutcome, etc. |
| `registry-store.ts` | Persistence: save/load runs to disk for crash recovery |
## Provider Inheritance
Subagents inherit the parent's resolved LLM provider:
```
runner.ts (resolvedProvider)
→ toolsOptions.provider
→ tools.ts (CreateToolsOptions.provider)
→ sessions-spawn.ts (options.provider)
→ hub.createSubagent({ provider })
```
When the user switches providers via UI (`setProvider()`), `toolsOptions.provider` is updated in sync so future spawns use the new provider.
## Error Propagation
```
Child tool error (e.g., API 401)
→ Subagent LLM sees error, includes in final message (system prompt rule)
→ captureFindings() reads final message
→ Announcement includes error in findings
→ Parent LLM sees error and can inform user
Child run error (e.g., missing API key for provider)
→ AsyncAgent._lastRunError set
→ registry.ts checks childAgent.lastRunError after waitForIdle()
→ outcome = { status: "error", error: "No API key configured..." }
→ Announcement: "task failed: No API key configured..."
```
## Timeout Behavior
Default: 1800s (30 min). System prompt guides the parent LLM:
- Simple tasks: 1800s (default)
- Moderate tasks: 1800-2400s (30-40 min)
- Complex tasks: 2400-3600s (40-60 min)
On timeout:
1. Timeout timer fires in `watchChildAgent()`
2. `cleanup({ status: "timeout" })` is called
3. Child agent is closed via `hub.closeAgent()`
4. Findings are captured from whatever the child wrote so far
5. Announcement reports "timed out" with partial findings
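
The timeout path above amounts to racing the child's idle signal against a timer. The following is a minimal sketch under that assumption. `watchWithTimeout`, `RunOutcome`, and the `"ok"` status are hypothetical and only loosely modeled on the behavior described for `watchChildAgent()`.

```typescript
// Hypothetical sketch; names and the "ok" status are assumptions, not the registry.ts API.
type RunOutcome =
  | { status: "ok" }
  | { status: "error"; error: string }
  | { status: "timeout" };

function watchWithTimeout(
  waitForIdle: () => Promise<void>,
  timeoutMs: number,
  closeChild: () => void,
): Promise<RunOutcome> {
  return new Promise<RunOutcome>((resolve) => {
    const timer = setTimeout(() => {
      closeChild(); // stop the child before reporting the timeout
      resolve({ status: "timeout" });
    }, timeoutMs);
    waitForIdle().then(
      () => {
        clearTimeout(timer);
        resolve({ status: "ok" });
      },
      (err: unknown) => {
        clearTimeout(timer);
        resolve({ status: "error", error: String(err) });
      },
    );
  });
}
```

Whichever branch settles first wins; a later call to `resolve` is a no-op, so a child that finishes just after the timer fires is still reported as timed out.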


@ -1,266 +0,0 @@
# Tools System
[Chinese documentation](./README.zh-CN.md)
The tools system provides LLM agents with capabilities to interact with the external world. Tools are the "hands and feet" of an agent - without tools, an LLM can only generate text responses.
## Architecture Overview
```
┌─────────────────────────────────────────────────────────────────┐
│ Tool Definition │
│ (AgentTool from @mariozechner/pi-agent-core) │
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ name │ │ description │ │ parameters │ │
│ │ label │ │ execute │ │ (TypeBox) │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ 3-Layer Policy Filter │
│ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Layer 1: Global Allow/Deny │ │
│ │ User customization via CLI or config │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Layer 2: Provider-Specific │ │
│ │ Different rules for different LLM providers │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Layer 3: Subagent Restrictions │ │
│ │ Limited tools for spawned child agents │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ Filtered Tools │
│ (passed to pi-agent-core) │
└─────────────────────────────────────────────────────────────────┘
```
## Available Tools
| Tool | Name | Description |
| -------------- | ---------------- | ----------------------------------- |
| Read | `read` | Read file contents |
| Write | `write` | Write content to files |
| Edit | `edit` | Edit existing files |
| Glob | `glob` | Find files by pattern |
| Exec | `exec` | Execute shell commands |
| Process | `process` | Manage long-running processes |
| Web Fetch | `web_fetch` | Fetch and extract content from URLs |
| Web Search | `web_search` | Search the web via Devv Search |
| Sessions Spawn | `sessions_spawn` | Spawn a sub-agent session |
> **Note**: Agents use file-based memory (`memory.md`, `memory/*.md`) via `read` and `edit` tools instead of dedicated memory tools.
## Tool Groups
Groups provide shortcuts for allowing/denying multiple tools at once:
| Group | Tools |
| ---------------- | ------------------------------------ |
| `group:fs` | read, write, edit, glob |
| `group:runtime` | exec, process |
| `group:web` | web_search, web_fetch |
| `group:subagent` | sessions_spawn |
| `group:core` | All fs, runtime, and web tools |
## Usage
### CLI Usage
All commands use the unified `multica` CLI (or `pnpm multica` during development).
```bash
# Allow only specific tools
multica run --tools-allow group:fs,group:runtime "list files"
# Deny specific tools
multica run --tools-deny exec,process "read file.txt"
# Use tool groups
multica run --tools-allow group:fs "read config.json"
```
### Programmatic Usage
```typescript
import { Agent } from './runner.js';
const agent = new Agent({
  tools: {
    // Layer 1: Global allow/deny
    allow: ['group:fs', 'group:runtime', 'web_fetch'],
    deny: ['exec'],
    // Layer 2: Provider-specific rules
    byProvider: {
      google: {
        deny: ['exec', 'process'], // Google models can't use runtime tools
      },
    },
  },
  // Layer 3: Subagent mode
  isSubagent: false,
});
```
### Inspecting Tool Configuration
Use the tools CLI to inspect and test configurations:
```bash
# List all available tools
multica tools list
# List tools with allow rules
multica tools list --allow group:fs,group:runtime
# List tools with deny rules
multica tools list --deny exec
# Show all tool groups
multica tools groups
```
## Policy System Details
### Layer 1: Global Allow/Deny
User-specified allow/deny lists:
- `allow`: Only these tools are available (supports group:\* syntax)
- `deny`: These tools are blocked (takes precedence over allow)
If no `allow` list is specified, all tools are available by default.
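
As a rough illustration of these Layer 1 rules, the sketch below expands groups and applies allow and deny lists with deny taking precedence. `GROUPS`, `expand`, and `filterTools` are hypothetical names for this example. The group contents are copied from the Tool Groups table, and treating `"*"` as "allow everything" is an assumption.

```typescript
// Hypothetical sketch of Layer 1 filtering; not the actual policy implementation.
const GROUPS: Record<string, string[]> = {
  "group:fs": ["read", "write", "edit", "glob"],
  "group:runtime": ["exec", "process"],
  "group:web": ["web_search", "web_fetch"],
  "group:subagent": ["sessions_spawn"],
};

// Replace each group entry by its member tools; plain tool names pass through.
function expand(entries: string[]): Set<string> {
  const names = new Set<string>();
  for (const entry of entries) {
    for (const name of GROUPS[entry] ?? [entry]) {
      names.add(name);
    }
  }
  return names;
}

function filterTools(
  allTools: string[],
  policy: { allow?: string[]; deny?: string[] },
): string[] {
  // No allow list means every tool is allowed by default.
  const allow = policy.allow ? expand(policy.allow) : null;
  const deny = expand(policy.deny ?? []);
  return allTools.filter((name) => {
    const allowed = allow === null || allow.has("*") || allow.has(name);
    return allowed && !deny.has(name); // deny always wins over allow
  });
}
```

With `allow: ['group:fs', 'group:runtime', 'web_fetch']` and `deny: ['exec']`, this keeps `read`, `write`, `edit`, `glob`, `process`, and `web_fetch`, because `exec` is removed even though `group:runtime` allowed it.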
### Layer 2: Provider-Specific
Different LLM providers may have different capabilities or restrictions:
```typescript
{
  byProvider: {
    google: { deny: ["exec"] }, // Gemini can't execute commands
    anthropic: { allow: ["*"] }, // Claude has full access
  }
}
```
### Layer 3: Subagent Restrictions
When `isSubagent: true`, additional restrictions are applied to prevent spawned agents from accessing sensitive tools like session management.
## Adding New Tools
1. Create a new file in `src/agent/tools/` (e.g., `my-tool.ts`)
2. Define the tool using TypeBox for the schema:
```typescript
import { Type } from '@sinclair/typebox';
import type { AgentTool } from '@mariozechner/pi-agent-core';
const MyToolSchema = Type.Object({
  param1: Type.String({ description: 'Parameter description' }),
  param2: Type.Optional(Type.Number()),
});
export function createMyTool(): AgentTool<typeof MyToolSchema> {
  return {
    name: 'my_tool',
    label: 'My Tool',
    description: 'What this tool does',
    parameters: MyToolSchema,
    execute: async (toolCallId, args) => {
      // Implementation
      return { result: 'success' };
    },
  };
}
```
3. Register the tool in `src/agent/tools.ts`:
```typescript
import { createMyTool } from './tools/my-tool.js';
export function createAllTools(cwd: string): AgentTool<any>[] {
  // ... existing tools
  const myTool = createMyTool();
  return [
    ...baseTools,
    myTool as AgentTool<any>,
    // ...
  ];
}
```
4. Add the tool to appropriate groups in `groups.ts`:
```typescript
export const TOOL_GROUPS: Record<string, string[]> = {
  'group:my_category': ['my_tool', 'other_tool'],
  // ...
};
```
## Testing
Run the policy system tests:
```bash
pnpm test src/agent/tools/policy.test.ts
```
## Agent Profile Integration
Tools configuration can be defined in Agent Profile's `config.json`, allowing different agents to have different tool capabilities:
```
┌─────────────────────────────────────────────────────────────────┐
│ Super Multica Hub │
│ │
│ ┌───────────┐ ┌───────────┐ ┌───────────┐ │
│ │ Agent A │ │ Agent B │ │ Agent C │ │
│ │ Profile: │ │ Profile: │ │ Profile: │ │
│ │ coder │ │ reviewer │ │ devops │ │
│ │ │ │ │ │ │ │
│ │ tools: │ │ tools: │ │ tools: │ │
│ │ allow:fs │ │ deny:* │ │ allow:* │ │
│ └─────┬─────┘ └─────┬─────┘ └─────┬─────┘ │
│ │ │ │ │
└─────────┼────────────────┼────────────────┼─────────────────────┘
│ │ │
▼ ▼ ▼
┌──────────┐ ┌──────────┐ ┌──────────┐
│ Client │ │ Client │ │ Client │
└──────────┘ └──────────┘ └──────────┘
```
Each Agent's Profile can define its own tools configuration in `config.json`:
```json
{
  "tools": {
    "allow": ["group:fs", "group:runtime"],
    "deny": ["exec"]
  },
  "provider": "anthropic",
  "model": "claude-sonnet-4-20250514"
}
```
See [Profile README](../profile/README.md) for full documentation.


@ -1,268 +0,0 @@
# Tools System
[English](./README.md)
The tools system gives LLM agents the ability to interact with the external world. Tools are an agent's "hands and feet": without tools, an LLM can only generate text responses.
## Architecture Overview
```
┌─────────────────────────────────────────────────────────────────┐
│ Tool Definition │
│ (AgentTool from @mariozechner/pi-agent-core) │
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ name │ │ description │ │ parameters │ │
│ │ label │ │ execute │ │ (TypeBox) │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ 3-Layer Policy Filter │
│ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Layer 1: Global Allow/Deny │ │
│ │ User customization via CLI or config file │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Layer 2: Provider-Specific Rules │ │
│ │ Different rules for different LLM providers │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Layer 3: Subagent Restrictions │ │
│ │ Restricted tool access for child agents │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ Filtered Tools │
│ (passed to pi-agent-core) │
└─────────────────────────────────────────────────────────────────┘
```
## Available Tools
| Tool | Name | Description |
| -------------- | ---------------- | ------------------------------ |
| Read | `read` | Read file contents |
| Write | `write` | Write content to files |
| Edit | `edit` | Edit existing files |
| Glob | `glob` | Find files by pattern |
| Exec | `exec` | Execute shell commands |
| Process | `process` | Manage long-running processes |
| Web Fetch | `web_fetch` | Fetch and extract content from URLs |
| Web Search | `web_search` | Search the web (requires an API key) |
| Memory Search | `memory_search` | Search memory files (requires a Profile) |
| Sessions Spawn | `sessions_spawn` | Spawn a sub-agent session |
> **Note**: The `memory_search` tool searches `memory.md` and `memory/*.md` files by keyword. Agents work with the contents of memory files via the `read` and `edit` tools.
## Tool Groups
Tool groups provide shortcuts for allowing or denying multiple tools at once:
| Group | Tools |
| ---------------- | ------------------------------ |
| `group:fs` | read, write, edit, glob |
| `group:runtime` | exec, process |
| `group:web` | web_search, web_fetch |
| `group:memory` | memory_search |
| `group:subagent` | sessions_spawn |
| `group:core` | All fs, runtime, and web tools |
## Usage
### CLI Usage
All commands use the unified `multica` CLI (use `pnpm multica` during development).
```bash
# Allow only specific tools
multica run --tools-allow group:fs,group:runtime "list files"
# Deny specific tools
multica run --tools-deny exec,process "read file.txt"
# Use tool groups
multica run --tools-allow group:fs "read config.json"
```
### Programmatic Usage
```typescript
import { Agent } from './runner.js';
const agent = new Agent({
  tools: {
    // Layer 1: Global allow/deny
    allow: ['group:fs', 'group:runtime', 'web_fetch'],
    deny: ['exec'],
    // Layer 2: Provider-specific rules
    byProvider: {
      google: {
        deny: ['exec', 'process'], // Google models can't use runtime tools
      },
    },
  },
  // Layer 3: Subagent mode
  isSubagent: false,
});
```
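### Inspecting Tool Configuration
Use the tools CLI to inspect and test configurations:
```bash
# List all available tools
multica tools list
# List tools with allow rules
multica tools list --allow group:fs,group:runtime
# List tools with deny rules
multica tools list --deny exec
# Show all tool groups
multica tools groups
```
## Policy System Details
### Layer 1: Global Allow/Deny
User-specified allow/deny lists:
- `allow`: Only these tools are available (supports group:\* syntax)
- `deny`: These tools are blocked (takes precedence over allow)
If no `allow` list is specified, all tools are available by default.
### Layer 2: Provider-Specific Rules
Different LLM providers may have different capabilities or restrictions:
```typescript
{
  byProvider: {
    google: { deny: ["exec"] }, // Gemini can't execute commands
    anthropic: { allow: ["*"] }, // Claude has full access
  }
}
```
### Layer 3: Subagent Restrictions
When `isSubagent: true`, additional restrictions are applied to prevent child agents from accessing sensitive tools such as session management.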
## Adding New Tools
1. Create a new file in `src/agent/tools/` (e.g., `my-tool.ts`)
2. Define the tool's schema using TypeBox:
```typescript
import { Type } from '@sinclair/typebox';
import type { AgentTool } from '@mariozechner/pi-agent-core';
const MyToolSchema = Type.Object({
  param1: Type.String({ description: 'Parameter description' }),
  param2: Type.Optional(Type.Number()),
});
export function createMyTool(): AgentTool<typeof MyToolSchema> {
  return {
    name: 'my_tool',
    label: 'My Tool',
    description: 'What this tool does',
    parameters: MyToolSchema,
    execute: async (toolCallId, args) => {
      // Implementation
      return { result: 'success' };
    },
  };
}
```
3. Register the tool in `src/agent/tools.ts`:
```typescript
import { createMyTool } from './tools/my-tool.js';
export function createAllTools(cwd: string): AgentTool<any>[] {
  // ... existing tools
  const myTool = createMyTool();
  return [
    ...baseTools,
    myTool as AgentTool<any>,
    // ...
  ];
}
```
4. Add the tool to the appropriate groups in `groups.ts`:
```typescript
export const TOOL_GROUPS: Record<string, string[]> = {
  'group:my_category': ['my_tool', 'other_tool'],
  // ...
};
```
## Testing
Run the policy system tests:
```bash
pnpm test src/agent/tools/policy.test.ts
```
## Agent Profile Integration
Tool configuration can be defined in an Agent Profile's `config.json`, allowing different agents to have different tool capabilities:
```
┌─────────────────────────────────────────────────────────────────┐
│ Super Multica Hub │
│ │
│ ┌───────────┐ ┌───────────┐ ┌───────────┐ │
│ │ Agent A │ │ Agent B │ │ Agent C │ │
│ │ Profile: │ │ Profile: │ │ Profile: │ │
│ │ coder │ │ reviewer │ │ devops │ │
│ │ │ │ │ │ │ │
│ │ tools: │ │ tools: │ │ tools: │ │
│ │ allow:fs │ │ deny:* │ │ allow:* │ │
│ └─────┬─────┘ └─────┬─────┘ └─────┬─────┘ │
│ │ │ │ │
└─────────┼────────────────┼────────────────┼─────────────────────┘
│ │ │
▼ ▼ ▼
┌──────────┐ ┌──────────┐ ┌──────────┐
│ Client │ │ Client │ │ Client │
└──────────┘ └──────────┘ └──────────┘
```
Each Agent's Profile can define its own tool configuration in `config.json`:
```json
{
  "tools": {
    "allow": ["group:fs", "group:runtime"],
    "deny": ["exec"]
  },
  "provider": "anthropic",
  "model": "claude-sonnet-4-20250514"
}
```
See the [Profile README](../profile/README.md) for details.


@ -1,17 +0,0 @@
# @multica/store
Zustand state management for Multica apps.
## Usage
```tsx
// From barrel
import { useHubStore, useMessagesStore, useGatewayStore } from '@multica/store'
// Per-file subpath import
import { useGatewayStore } from '@multica/store/gateway'
import { useHubStore } from '@multica/store/hub'
import { useMessagesStore } from '@multica/store/messages'
import { useHubInit } from '@multica/store/hub-init'
import { useDeviceId } from '@multica/store/device-id'
```


@ -1,32 +0,0 @@
# @multica/ui
Shared UI component library. Shadcn + Tailwind CSS v4.
## Usage
```tsx
// UI components — subpath imports, no barrel
import { Button } from '@multica/ui/components/ui/button'
import { Card, CardContent } from '@multica/ui/components/ui/card'
// Feature components
import { ThemeProvider } from '@multica/ui/components/theme-provider'
import { Chat } from '@multica/ui/components/chat'
import { Markdown } from '@multica/ui/components/markdown'
// Hooks
import { useIsMobile } from '@multica/ui/hooks/use-mobile'
import { useAutoScroll } from '@multica/ui/hooks/use-auto-scroll'
// Utilities
import { cn } from '@multica/ui/lib/utils'
// Styles (app entry point)
import '@multica/ui/globals.css'
```
## Adding Components
```bash
pnpm --filter @multica/ui dlx shadcn@latest add <component>
```


@ -1,213 +0,0 @@
---
name: DCF Valuation
description: Perform Discounted Cash Flow (DCF) valuation analysis for public companies. Use when the user asks to value a stock, calculate intrinsic value, fair value, perform DCF analysis, determine if a stock is undervalued or overvalued, or estimate a price target.
version: 1.1.1
metadata:
emoji: "\U0001F9EE"
tags:
- finance
- valuation
- dcf
userInvocable: true
disableModelInvocation: false
---
## Instructions
Perform a rigorous Discounted Cash Flow (DCF) valuation. Follow all steps and show your work. Use external macro context when assumptions are time-sensitive (for example, risk-free rate regime shifts).
### Progress Checklist
```
DCF Analysis Progress:
- [ ] Step 1: Gather financial data
- [ ] Step 2: Calculate historical FCF and growth
- [ ] Step 3: Estimate WACC
- [ ] Step 4: Project future cash flows
- [ ] Step 5: Calculate present value and fair value
- [ ] Step 6: Sensitivity analysis
- [ ] Step 7: Validate results
- [ ] Step 8: Present findings
```
### Step 1: Gather Financial Data
Use `data` tool with `domain="finance"` for all calls:
1. **Cash Flow History** (5 years):
```
action: "get_cash_flow_statements"
params: { ticker: "[TICKER]", period: "annual", limit: 5 }
```
Extract: `free_cash_flow`, `net_cash_flow_from_operations`, `capital_expenditure`
Fallback: FCF = Operating Cash Flow - CapEx
2. **Income Statements** (5 years):
```
action: "get_income_statements"
params: { ticker: "[TICKER]", period: "annual", limit: 5 }
```
Extract: `revenue`, `operating_income`, `net_income`, `income_tax_expense`
3. **Balance Sheet** (latest):
```
action: "get_balance_sheets"
params: { ticker: "[TICKER]", period: "annual", limit: 1 }
```
Extract: `total_debt`, `cash_and_equivalents`, `outstanding_shares`
4. **Financial Metrics** (current):
```
action: "get_financial_metrics_snapshot"
params: { ticker: "[TICKER]" }
```
Extract: `market_cap`, `enterprise_value`, `return_on_invested_capital`, `debt_to_equity`, `free_cash_flow_per_share`
5. **Analyst Estimates**:
```
action: "get_analyst_estimates"
params: { ticker: "[TICKER]", period: "annual" }
```
Extract: Forward EPS estimates for growth validation
6. **Current Price**:
```
action: "get_price_snapshot"
params: { ticker: "[TICKER]" }
```
7. **Company Facts**:
```
action: "get_company_facts"
params: { ticker: "[TICKER]" }
```
Extract: `sector` — use to determine WACC range from [sector-wacc.md](references/sector-wacc.md)
8. **Recent Event Context**:
- Pull company-specific headlines with:
```
action: "get_news"
params: { ticker: "[TICKER]", limit: 10 }
```
- Use this to flag event risk (guidance reset, litigation, regulation, one-off gains/losses) that may distort near-term FCF extrapolation.
### Step 2: Calculate Historical FCF and Growth
- Compute FCF for each of the last 5 years
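- Calculate the FCF CAGR over the 5-year window: `(FCF_latest / FCF_earliest)^(1/years) - 1`, where `years` is the number of annual periods between the two endpoints (4 when using five annual data points)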
- Cross-validate with: revenue growth, operating income growth, analyst EPS growth
- **Cap projected growth at 15%** (sustained higher growth is rare)
- If FCF is volatile, weight analyst estimates more heavily
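
The CAGR and cap rules above can be sketched as follows. `fcfCagr` and `cappedGrowth` are hypothetical helper names. The sketch assumes the series is ordered oldest to newest and that the number of periods is one less than the number of data points.

```typescript
// Hypothetical helpers for illustration; inputs are annual FCF values, oldest first.
function fcfCagr(fcfOldestFirst: number[]): number | null {
  const periods = fcfOldestFirst.length - 1;
  if (periods < 1) return null;
  const earliest = Number(fcfOldestFirst[0]);
  const latest = Number(fcfOldestFirst[periods]);
  // CAGR is not meaningful when either endpoint is zero or negative.
  if (!(earliest > 0) || !(latest > 0)) return null;
  return Math.pow(latest / earliest, 1 / periods) - 1;
}

// Cap projected growth at 15%, as the step above requires.
function cappedGrowth(rate: number, cap: number = 0.15): number {
  return Math.min(rate, cap);
}
```

Returning `null` for non-positive endpoints is one way to signal the "FCF is volatile" case, in which analyst estimates should carry more weight.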
### Step 3: Estimate WACC
Use the company's `sector` to look up the base WACC range from [sector-wacc.md](references/sector-wacc.md).
**Calculate WACC:**
```
WACC = (E/V) * Re + (D/V) * Rd * (1 - Tax Rate)
Where:
E = Market cap (equity value)
D = Total debt
V = E + D
Re = Risk-free rate + Beta * Equity Risk Premium
Rd = Cost of debt (estimate from interest expense / total debt)
Tax Rate = Effective tax rate from income statements
```
**Default assumptions:**
- Risk-free rate: pull latest 10-year Treasury yield using `web_search` (preferred) and cite date/source. Fallback range: ~4.0-4.5%.
- Equity risk premium: ~5.5%
- If beta unavailable, use sector average
**Sanity check:** WACC should be 2-4% below ROIC for value-creating companies.
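
The WACC formula above translates directly into code. The sketch below is illustrative: the `WaccInputs` field names are assumptions chosen for this example, and all rates are expressed as decimals (0.04 for 4%).

```typescript
// Hypothetical input shape for illustration; rates are decimals, not percentages.
interface WaccInputs {
  marketCap: number; // E: equity value
  totalDebt: number; // D
  beta: number;
  riskFreeRate: number;
  equityRiskPremium: number;
  costOfDebt: number; // Rd, e.g. interest expense / total debt
  taxRate: number; // effective tax rate
}

function computeWacc(input: WaccInputs): number {
  const totalCapital = input.marketCap + input.totalDebt; // V = E + D
  const costOfEquity = input.riskFreeRate + input.beta * input.equityRiskPremium; // Re
  const equityPart = (input.marketCap / totalCapital) * costOfEquity;
  const debtPart =
    (input.totalDebt / totalCapital) * input.costOfDebt * (1 - input.taxRate);
  return equityPart + debtPart;
}
```

With 80% equity, a beta of 1.0, a 4% risk-free rate, a 5.5% equity risk premium, a 5% cost of debt, and a 20% tax rate, the equity part is 0.8 × 9.5% = 7.6% and the debt part is 0.2 × 5% × 0.8 = 0.8%, giving a WACC of 8.4%.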
### Step 4: Project Future Cash Flows (Years 1-5)
- Apply the growth rate with a decaying multiplier (1.00, 0.95, 0.90, 0.85, 0.80 for Years 1-5), compounding each year from the prior year's FCF (FCF_0 is the latest historical FCF)
- Year 1: FCF_0 * (1 + growth_rate)
- Year 2: FCF_1 * (1 + growth_rate * 0.95)
- Year 3: FCF_2 * (1 + growth_rate * 0.90)
- Year 4: FCF_3 * (1 + growth_rate * 0.85)
- Year 5: FCF_4 * (1 + growth_rate * 0.80)
**Terminal Value** (Gordon Growth Model):
```
TV = FCF_Year5 * (1 + g) / (WACC - g)
Where g = terminal growth rate (2.5% default, GDP proxy)
```
### Step 5: Calculate Present Value and Fair Value
```
PV of each FCF = FCF_t / (1 + WACC)^t
PV of Terminal Value = TV / (1 + WACC)^5
Enterprise Value = Sum of PV(FCFs) + PV(Terminal Value)
Net Debt = Total Debt - Cash and Equivalents
Equity Value = Enterprise Value - Net Debt
Fair Value per Share = Equity Value / Shares Outstanding
```
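
Steps 4 and 5 can be combined into one small function. This is a sketch under stated assumptions: `DcfInputs`, `DcfResult`, and `dcfFairValue` are hypothetical names, each projected year compounds from the prior year's FCF, and the growth multipliers are the 1.00 to 0.80 schedule from Step 4.

```typescript
// Hypothetical sketch of Steps 4-5; field names are assumptions for illustration.
interface DcfInputs {
  baseFcf: number; // latest historical free cash flow
  growthRate: number; // already capped, as a decimal
  wacc: number;
  terminalGrowth: number;
  totalDebt: number;
  cash: number;
  sharesOutstanding: number;
}

interface DcfResult {
  enterpriseValue: number;
  equityValue: number;
  fairValuePerShare: number;
  terminalShare: number; // PV of terminal value as a fraction of EV
}

function dcfFairValue(input: DcfInputs): DcfResult {
  const { baseFcf, growthRate, wacc, terminalGrowth } = input;
  if (wacc <= terminalGrowth) {
    throw new Error("WACC must exceed the terminal growth rate");
  }
  const decay = [1.0, 0.95, 0.9, 0.85, 0.8];
  let fcf = baseFcf;
  let pvOfFcfs = 0;
  decay.forEach((factor, index) => {
    fcf *= 1 + growthRate * factor; // compound from the prior year
    pvOfFcfs += fcf / Math.pow(1 + wacc, index + 1); // discount year t = index + 1
  });
  // Gordon Growth terminal value based on the Year 5 cash flow.
  const terminalValue = (fcf * (1 + terminalGrowth)) / (wacc - terminalGrowth);
  const pvOfTerminal = terminalValue / Math.pow(1 + wacc, decay.length);
  const enterpriseValue = pvOfFcfs + pvOfTerminal;
  const netDebt = input.totalDebt - input.cash;
  const equityValue = enterpriseValue - netDebt;
  return {
    enterpriseValue,
    equityValue,
    fairValuePerShare: equityValue / input.sharesOutstanding,
    terminalShare: pvOfTerminal / enterpriseValue,
  };
}
```

A quick consistency check: with zero growth everywhere, the result collapses to a plain perpetuity, so an FCF of 100 at a 10% WACC gives an enterprise value of 1,000.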
### Step 6: Sensitivity Analysis
Create a matrix varying two key assumptions:
| | TG 2.0% | TG 2.5% | TG 3.0% |
|---|---|---|---|
| **WACC -1%** | $ | $ | $ |
| **WACC base** | $ | $ | $ |
| **WACC +1%** | $ | $ | $ |
(TG = Terminal Growth Rate)
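
The matrix can be filled by re-running the valuation over a small grid. In this sketch, `sensitivityMatrix` is a hypothetical helper, and the `value` callback stands in for whatever valuation function is used, so the grid logic stays independent of the model.

```typescript
// Hypothetical helper; rows are WACC -1%, base, +1%; columns are TG 2.0%, 2.5%, 3.0%.
function sensitivityMatrix(
  baseWacc: number,
  value: (wacc: number, terminalGrowth: number) => number,
): number[][] {
  const waccs = [baseWacc - 0.01, baseWacc, baseWacc + 0.01];
  const growths = [0.02, 0.025, 0.03];
  return waccs.map((wacc) => growths.map((growth) => value(wacc, growth)));
}
```

The top-right cell (lowest WACC, highest terminal growth) should always be the largest value and the bottom-left the smallest, which is a cheap way to spot a sign or ordering mistake.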
### Step 7: Validate Results
Before presenting, check:
1. **EV comparison**: Calculated EV within 30% of reported enterprise_value
- If off by >30%, revisit WACC or growth assumptions
2. **Terminal value ratio**: Should be 50-80% of total EV for mature companies
- If >90%, growth rate may be too high
- If <40%, near-term projections may be aggressive
3. **FCF yield check**: Compare fair value FCF yield to current market FCF yield
If validation fails, adjust assumptions and recalculate.
### Step 8: Present Results
Format clearly with:
1. **Executive Summary**
- Current price vs. fair value estimate
- Upside/downside percentage
- Verdict: Undervalued / Fairly Valued / Overvalued
2. **Key Assumptions Table**
| Assumption | Value | Source |
|---|---|---|
| Growth Rate | X% | 5Y CAGR + analyst cross-check |
| WACC | X% | Sector range + company adjustments |
| Terminal Growth | X% | GDP proxy |
| Tax Rate | X% | Effective rate from financials |
3. **Projected FCF Table**
| Year | FCF | Growth | PV of FCF |
|---|---|---|---|
4. **Valuation Bridge**
- PV of projected FCFs
- PV of Terminal Value
- = Enterprise Value
- - Net Debt
- = Equity Value
- / Shares Outstanding
- = **Fair Value per Share**
5. **Sensitivity Matrix** (from Step 6)
6. **Risks & Caveats**
- Key risks to the valuation thesis
- DCF limitations (sensitive to growth and WACC assumptions)
- Company-specific caveats (high debt, cyclicality, early-stage, etc.)


@ -1,40 +0,0 @@
# Sector WACC Reference
Use the company's `sector` from `get_company_facts` to look up the base WACC range below, then adjust for company-specific factors.
## WACC by Sector
| Sector | Typical WACC Range | Notes |
|--------|-------------------|-------|
| Communication Services | 8-10% | Mix of stable telecom and growth media |
| Consumer Discretionary | 8-10% | Cyclical exposure |
| Consumer Staples | 7-8% | Defensive, stable demand |
| Energy | 9-11% | Commodity price exposure |
| Financials | 8-10% | Leverage already in business model |
| Health Care | 8-10% | Regulatory and pipeline risk |
| Industrials | 8-9% | Moderate cyclicality |
| Information Technology | 8-12% | Higher end for high-growth; lower for mature |
| Materials | 8-10% | Cyclical, commodity exposure |
| Real Estate | 7-9% | Interest rate sensitivity |
| Utilities | 6-7% | Regulated, stable cash flows |
## Adjustment Factors
**Add to base WACC:**
- High debt (D/E > 1.5): +1-2%
- Small cap (< $2B market cap): +1-2%
- Emerging markets exposure: +1-3%
- Concentrated customer base: +0.5-1%
- Regulatory uncertainty: +0.5-1.5%
**Subtract from base WACC:**
- Market leader with moat: -0.5-1%
- Recurring revenue model: -0.5-1%
- Investment grade credit: -0.5%
## Sanity Checks
- WACC should typically be 2-4% below ROIC for value-creating companies
- If WACC > ROIC, the company may be destroying value
- Typical range for US large-cap: 7-12%
- Anything below 6% or above 14% warrants extra scrutiny


@ -1,513 +0,0 @@
---
name: Word Document
description: "Use this skill whenever the user wants to create, read, edit, or manipulate Word documents (.docx files). Triggers include: any mention of \"Word doc\", \"word document\", \".docx\", or requests to produce professional documents with formatting like tables of contents, headings, page numbers, or letterheads. Also use when extracting or reorganizing content from .docx files, inserting or replacing images in documents, performing find-and-replace in Word files, working with tracked changes or comments, or converting content into a polished Word document. If the user asks for a \"report\", \"memo\", \"letter\", \"template\", or similar deliverable as a Word or .docx file, use this skill. Do NOT use for PDFs, spreadsheets, Google Docs, or general coding tasks unrelated to document generation."
version: 1.0.0
metadata:
emoji: "📄"
tags:
- office
- document
- docx
install:
- id: brew-pandoc
kind: brew
formula: pandoc
bins: [pandoc]
label: "Install pandoc for text extraction"
os: [darwin, linux]
- id: brew-libreoffice
kind: brew
formula: libreoffice
bins: [soffice]
label: "Install LibreOffice for PDF conversion"
os: [darwin]
- id: brew-poppler
kind: brew
formula: poppler
bins: [pdftoppm]
label: "Install poppler for PDF to image conversion"
os: [darwin, linux]
- id: npm-docx
kind: node
formula: docx
bins: []
label: "Install docx-js for document creation"
userInvocable: true
disableModelInvocation: false
---
# DOCX creation, editing, and analysis
## Overview
A .docx file is a ZIP archive containing XML files.
## Quick Reference
| Task | Approach |
|------|----------|
| Read/analyze content | `pandoc` or unpack for raw XML |
| Create new document | Use `docx-js` - see Creating New Documents below |
| Edit existing document | Unpack → edit XML → repack - see Editing Existing Documents below |
### Converting .doc to .docx
Legacy `.doc` files must be converted before editing:
```bash
python scripts/office/soffice.py --headless --convert-to docx document.doc
```
### Reading Content
```bash
# Text extraction with tracked changes
pandoc --track-changes=all document.docx -o output.md
# Raw XML access
python scripts/office/unpack.py document.docx unpacked/
```
### Converting to Images
```bash
python scripts/office/soffice.py --headless --convert-to pdf document.docx
pdftoppm -jpeg -r 150 document.pdf page
```
### Accepting Tracked Changes
To produce a clean document with all tracked changes accepted (requires LibreOffice):
```bash
python scripts/accept_changes.py input.docx output.docx
```
---
## Creating New Documents
Generate .docx files with JavaScript, then validate. Install: `npm install -g docx`
### Setup
```javascript
const fs = require('fs');
const { Document, Packer, Paragraph, TextRun, Table, TableRow, TableCell, ImageRun,
Header, Footer, AlignmentType, PageOrientation, LevelFormat, ExternalHyperlink,
TableOfContents, HeadingLevel, BorderStyle, WidthType, ShadingType,
VerticalAlign, PageNumber, PageBreak } = require('docx');
const doc = new Document({ sections: [{ children: [/* content */] }] });
Packer.toBuffer(doc).then(buffer => fs.writeFileSync("doc.docx", buffer));
```
### Validation
After creating the file, validate it. If validation fails, unpack, fix the XML, and repack.
```bash
python scripts/office/validate.py doc.docx
```
### Page Size
```javascript
// CRITICAL: docx-js defaults to A4, not US Letter
// Always set page size explicitly for consistent results
sections: [{
properties: {
page: {
size: {
width: 12240, // 8.5 inches in DXA
height: 15840 // 11 inches in DXA
},
margin: { top: 1440, right: 1440, bottom: 1440, left: 1440 } // 1 inch margins
}
},
children: [/* content */]
}]
```
**Common page sizes (DXA units, 1440 DXA = 1 inch):**
| Paper | Width | Height | Content Width (1" margins) |
|-------|-------|--------|---------------------------|
| US Letter | 12,240 | 15,840 | 9,360 |
| A4 (default) | 11,906 | 16,838 | 9,026 |
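The arithmetic behind this table can be sketched as a small helper. This is an illustration only: `inchesToDxa` and `contentWidth` are hypothetical names, not part of docx-js, and the only fact relied on is the 1440 DXA per inch ratio stated above.

```javascript
// Hypothetical helpers (not part of docx-js) for working in DXA units.
const DXA_PER_INCH = 1440;

// Convert inches to DXA, rounded to a whole unit.
function inchesToDxa(inches) {
  return Math.round(inches * DXA_PER_INCH);
}

// Usable width between the left and right margins, all values in DXA.
function contentWidth(pageWidth, marginLeft, marginRight) {
  return pageWidth - marginLeft - marginRight;
}

// US Letter with 1 inch margins on both sides.
const letterWidth = inchesToDxa(8.5);
const letterContent = contentWidth(letterWidth, inchesToDxa(1), inchesToDxa(1));
console.log(letterWidth, letterContent);
```

The resulting content width is the value to reuse as the table `width` for full-width tables.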
**Landscape orientation:** docx-js swaps width/height internally, so pass portrait dimensions and let it handle the swap:
```javascript
size: {
width: 12240, // Pass SHORT edge as width
height: 15840, // Pass LONG edge as height
orientation: PageOrientation.LANDSCAPE // docx-js swaps them in the XML
},
// Content width = 15840 - left margin - right margin (uses the long edge)
```
### Styles (Override Built-in Headings)
Use Arial as the default font (universally supported). Keep titles black for readability.
```javascript
const doc = new Document({
styles: {
default: { document: { run: { font: "Arial", size: 24 } } }, // 12pt default
paragraphStyles: [
// IMPORTANT: Use exact IDs to override built-in styles
{ id: "Heading1", name: "Heading 1", basedOn: "Normal", next: "Normal", quickFormat: true,
run: { size: 32, bold: true, font: "Arial" },
paragraph: { spacing: { before: 240, after: 240 }, outlineLevel: 0 } }, // outlineLevel required for TOC
{ id: "Heading2", name: "Heading 2", basedOn: "Normal", next: "Normal", quickFormat: true,
run: { size: 28, bold: true, font: "Arial" },
paragraph: { spacing: { before: 180, after: 180 }, outlineLevel: 1 } },
]
},
sections: [{
children: [
new Paragraph({ heading: HeadingLevel.HEADING_1, children: [new TextRun("Title")] }),
]
}]
});
```
### Lists (NEVER use unicode bullets)
```javascript
// WRONG - never manually insert bullet characters
new Paragraph({ children: [new TextRun("• Item")] }) // BAD
new Paragraph({ children: [new TextRun("\u2022 Item")] }) // BAD
// CORRECT - use numbering config with LevelFormat.BULLET
const doc = new Document({
numbering: {
config: [
{ reference: "bullets",
levels: [{ level: 0, format: LevelFormat.BULLET, text: "\u2022", alignment: AlignmentType.LEFT,
style: { paragraph: { indent: { left: 720, hanging: 360 } } } }] },
{ reference: "numbers",
levels: [{ level: 0, format: LevelFormat.DECIMAL, text: "%1.", alignment: AlignmentType.LEFT,
style: { paragraph: { indent: { left: 720, hanging: 360 } } } }] },
]
},
sections: [{
children: [
new Paragraph({ numbering: { reference: "bullets", level: 0 },
children: [new TextRun("Bullet item")] }),
new Paragraph({ numbering: { reference: "numbers", level: 0 },
children: [new TextRun("Numbered item")] }),
]
}]
});
// Each reference creates INDEPENDENT numbering
// Same reference = continues (1,2,3 then 4,5,6)
// Different reference = restarts (1,2,3 then 1,2,3)
```
### Tables
**CRITICAL: Tables need dual widths** - set both `columnWidths` on the table AND `width` on each cell. Without both, tables render incorrectly on some platforms.
```javascript
// CRITICAL: Always set table width for consistent rendering
// CRITICAL: Use ShadingType.CLEAR (not SOLID) to prevent black backgrounds
const border = { style: BorderStyle.SINGLE, size: 1, color: "CCCCCC" };
const borders = { top: border, bottom: border, left: border, right: border };
new Table({
width: { size: 9360, type: WidthType.DXA }, // Always use DXA (percentages break in Google Docs)
columnWidths: [4680, 4680], // Must sum to table width (DXA: 1440 = 1 inch)
rows: [
new TableRow({
children: [
new TableCell({
borders,
width: { size: 4680, type: WidthType.DXA }, // Also set on each cell
shading: { fill: "D5E8F0", type: ShadingType.CLEAR }, // CLEAR not SOLID
margins: { top: 80, bottom: 80, left: 120, right: 120 }, // Cell padding (internal, not added to width)
children: [new Paragraph({ children: [new TextRun("Cell")] })]
})
]
})
]
})
```
**Table width calculation:**
Always use `WidthType.DXA`; `WidthType.PERCENTAGE` breaks in Google Docs.
```javascript
// Table width = sum of columnWidths = content width
// US Letter with 1" margins: 12240 - 2880 = 9360 DXA
width: { size: 9360, type: WidthType.DXA },
columnWidths: [7000, 2360] // Must sum to table width
```
**Width rules:**
- **Always use `WidthType.DXA`** — never `WidthType.PERCENTAGE` (incompatible with Google Docs)
- Table width must equal the sum of `columnWidths`
- Cell `width` must match corresponding `columnWidth`
- Cell `margins` are internal padding - they reduce content area, not add to cell width
- For full-width tables: use content width (page width minus left and right margins)
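One way to keep `columnWidths` summing exactly to the table width is to compute them from integer weights. The sketch below is a hypothetical helper (`splitColumns` is not a docx-js function); it assumes whole-number DXA widths and gives any rounding remainder to the last column.

```javascript
// Hypothetical helper: split a table width (DXA) into whole-number column
// widths proportional to integer weights, so the columns sum exactly.
function splitColumns(tableWidth, weights) {
  const totalWeight = weights.reduce((sum, w) => sum + w, 0);
  const widths = weights.map((w) => Math.floor((tableWidth * w) / totalWeight));
  // Give any rounding remainder to the last column.
  const used = widths.reduce((sum, w) => sum + w, 0);
  widths[widths.length - 1] += tableWidth - used;
  return widths;
}

// US Letter content width split 3:1.
const columnWidths = splitColumns(9360, [3, 1]);
console.log(columnWidths);
```

The same array can feed both the table's `columnWidths` and each cell's `width.size`, which keeps the two in sync.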
### Images
```javascript
// CRITICAL: type parameter is REQUIRED
new Paragraph({
children: [new ImageRun({
type: "png", // Required: png, jpg, jpeg, gif, bmp, svg
data: fs.readFileSync("image.png"),
transformation: { width: 200, height: 150 },
altText: { title: "Title", description: "Desc", name: "Name" } // All three required
})]
})
```
### Page Breaks
```javascript
// CRITICAL: PageBreak must be inside a Paragraph
new Paragraph({ children: [new PageBreak()] })
// Or use pageBreakBefore
new Paragraph({ pageBreakBefore: true, children: [new TextRun("New page")] })
```
### Table of Contents
```javascript
// CRITICAL: Headings must use HeadingLevel ONLY - no custom styles
new TableOfContents("Table of Contents", { hyperlink: true, headingStyleRange: "1-3" })
```
### Headers/Footers
```javascript
sections: [{
properties: {
page: { margin: { top: 1440, right: 1440, bottom: 1440, left: 1440 } } // 1440 = 1 inch
},
headers: {
default: new Header({ children: [new Paragraph({ children: [new TextRun("Header")] })] })
},
footers: {
default: new Footer({ children: [new Paragraph({
children: [new TextRun("Page "), new TextRun({ children: [PageNumber.CURRENT] })]
})] })
},
children: [/* content */]
}]
```
### Critical Rules for docx-js
- **Set page size explicitly** - docx-js defaults to A4; use US Letter (12240 x 15840 DXA) for US documents
- **Landscape: pass portrait dimensions** - docx-js swaps width/height internally; pass short edge as `width`, long edge as `height`, and set `orientation: PageOrientation.LANDSCAPE`
- **Never use `\n`** - use separate Paragraph elements
- **Never use unicode bullets** - use `LevelFormat.BULLET` with numbering config
- **PageBreak must be in Paragraph** - standalone creates invalid XML
- **ImageRun requires `type`** - always specify png/jpg/etc
- **Always set table `width` with DXA** - never use `WidthType.PERCENTAGE` (breaks in Google Docs)
- **Tables need dual widths** - `columnWidths` array AND cell `width`, both must match
- **Table width = sum of columnWidths** - for DXA, ensure they add up exactly
- **Always add cell margins** - use `margins: { top: 80, bottom: 80, left: 120, right: 120 }` for readable padding
- **Use `ShadingType.CLEAR`** - never SOLID for table shading
- **TOC requires HeadingLevel only** - no custom styles on heading paragraphs
- **Override built-in styles** - use exact IDs: "Heading1", "Heading2", etc.
- **Include `outlineLevel`** - required for TOC (0 for H1, 1 for H2, etc.)
---
## Editing Existing Documents
**Follow all 3 steps in order.**
### Step 1: Unpack
```bash
python scripts/office/unpack.py document.docx unpacked/
```
Extracts XML, pretty-prints, merges adjacent runs, and converts smart quotes to XML entities (`&#x201C;` etc.) so they survive editing. Use `--merge-runs false` to skip run merging.
### Step 2: Edit XML
Edit files in `unpacked/word/`. See XML Reference below for patterns.
**Use "Claude" as the author** for tracked changes and comments, unless the user explicitly requests use of a different name.
**Use the Edit tool directly for string replacement. Do not write Python scripts.** Scripts introduce unnecessary complexity. The Edit tool shows exactly what is being replaced.
**CRITICAL: Use smart quotes for new content.** When adding text with apostrophes or quotes, use XML entities to produce smart quotes:
```xml
<!-- Use these entities for professional typography -->
<w:t>Here&#x2019;s a quote: &#x201C;Hello&#x201D;</w:t>
```
| Entity | Character |
|--------|-----------|
| `&#x2018;` | ‘ (left single) |
| `&#x2019;` | ’ (right single / apostrophe) |
| `&#x201C;` | “ (left double) |
| `&#x201D;` | ” (right double) |
**Adding comments:** Use `comment.py` to handle boilerplate across multiple XML files (text must be pre-escaped XML):
```bash
python scripts/comment.py unpacked/ 0 "Comment text with &amp; and &#x2019;"
python scripts/comment.py unpacked/ 1 "Reply text" --parent 0 # reply to comment 0
python scripts/comment.py unpacked/ 0 "Text" --author "Custom Author" # custom author name
```
Then add markers to document.xml (see Comments in XML Reference).
### Step 3: Pack
```bash
python scripts/office/pack.py unpacked/ output.docx --original document.docx
```
Validates with auto-repair, condenses XML, and creates DOCX. Use `--validate false` to skip.
**Auto-repair will fix:**
- `durableId` >= 0x7FFFFFFF (regenerates valid ID)
- Missing `xml:space="preserve"` on `<w:t>` with whitespace
**Auto-repair won't fix:**
- Malformed XML, invalid element nesting, missing relationships, schema violations
### Common Pitfalls
- **Replace entire `<w:r>` elements**: When adding tracked changes, replace the whole `<w:r>...</w:r>` block with `<w:del>...<w:ins>...` as siblings. Don't inject tracked change tags inside a run.
- **Preserve `<w:rPr>` formatting**: Copy the original run's `<w:rPr>` block into your tracked change runs to maintain bold, font size, etc.
---
## XML Reference
### Schema Compliance
- **Element order in `<w:pPr>`**: `<w:pStyle>`, `<w:numPr>`, `<w:spacing>`, `<w:ind>`, `<w:jc>`, `<w:rPr>` last
- **Whitespace**: Add `xml:space="preserve"` to `<w:t>` with leading/trailing spaces
- **RSIDs**: Must be 8-digit hex (e.g., `00AB1234`)
### Tracked Changes
**Insertion:**
```xml
<w:ins w:id="1" w:author="Claude" w:date="2025-01-01T00:00:00Z">
<w:r><w:t>inserted text</w:t></w:r>
</w:ins>
```
**Deletion:**
```xml
<w:del w:id="2" w:author="Claude" w:date="2025-01-01T00:00:00Z">
<w:r><w:delText>deleted text</w:delText></w:r>
</w:del>
```
**Inside `<w:del>`**: Use `<w:delText>` instead of `<w:t>`, and `<w:delInstrText>` instead of `<w:instrText>`.
**Minimal edits** - only mark what changes:
```xml
<!-- Change "30 days" to "60 days" -->
<w:r><w:t>The term is </w:t></w:r>
<w:del w:id="1" w:author="Claude" w:date="...">
<w:r><w:delText>30</w:delText></w:r>
</w:del>
<w:ins w:id="2" w:author="Claude" w:date="...">
<w:r><w:t>60</w:t></w:r>
</w:ins>
<w:r><w:t> days.</w:t></w:r>
```
**Deleting entire paragraphs/list items** - when removing ALL content from a paragraph, also mark the paragraph mark as deleted so it merges with the next paragraph. Add `<w:del/>` inside `<w:pPr><w:rPr>`:
```xml
<w:p>
<w:pPr>
<w:numPr>...</w:numPr> <!-- list numbering if present -->
<w:rPr>
<w:del w:id="1" w:author="Claude" w:date="2025-01-01T00:00:00Z"/>
</w:rPr>
</w:pPr>
<w:del w:id="2" w:author="Claude" w:date="2025-01-01T00:00:00Z">
<w:r><w:delText>Entire paragraph content being deleted...</w:delText></w:r>
</w:del>
</w:p>
```
Without the `<w:del/>` in `<w:pPr><w:rPr>`, accepting changes leaves an empty paragraph/list item.
**Rejecting another author's insertion** - nest deletion inside their insertion:
```xml
<w:ins w:author="Jane" w:id="5">
<w:del w:author="Claude" w:id="10">
<w:r><w:delText>their inserted text</w:delText></w:r>
</w:del>
</w:ins>
```
**Restoring another author's deletion** - add insertion after (don't modify their deletion):
```xml
<w:del w:author="Jane" w:id="5">
<w:r><w:delText>deleted text</w:delText></w:r>
</w:del>
<w:ins w:author="Claude" w:id="10">
<w:r><w:t>deleted text</w:t></w:r>
</w:ins>
```
### Comments
After running `comment.py` (see Step 2), add markers to document.xml. For replies, use `--parent` flag and nest markers inside the parent's.
**CRITICAL: `<w:commentRangeStart>` and `<w:commentRangeEnd>` are siblings of `<w:r>`, never inside `<w:r>`.**
```xml
<!-- Comment markers are direct children of w:p, never inside w:r -->
<w:commentRangeStart w:id="0"/>
<w:del w:id="1" w:author="Claude" w:date="2025-01-01T00:00:00Z">
<w:r><w:delText>deleted</w:delText></w:r>
</w:del>
<w:r><w:t> more text</w:t></w:r>
<w:commentRangeEnd w:id="0"/>
<w:r><w:rPr><w:rStyle w:val="CommentReference"/></w:rPr><w:commentReference w:id="0"/></w:r>
<!-- Comment 0 with reply 1 nested inside -->
<w:commentRangeStart w:id="0"/>
<w:commentRangeStart w:id="1"/>
<w:r><w:t>text</w:t></w:r>
<w:commentRangeEnd w:id="1"/>
<w:commentRangeEnd w:id="0"/>
<w:r><w:rPr><w:rStyle w:val="CommentReference"/></w:rPr><w:commentReference w:id="0"/></w:r>
<w:r><w:rPr><w:rStyle w:val="CommentReference"/></w:rPr><w:commentReference w:id="1"/></w:r>
```
### Images
1. Add image file to `word/media/`
2. Add relationship to `word/_rels/document.xml.rels`:
```xml
<Relationship Id="rId5" Type=".../image" Target="media/image1.png"/>
```
3. Add content type to `[Content_Types].xml`:
```xml
<Default Extension="png" ContentType="image/png"/>
```
4. Reference in document.xml:
```xml
<w:drawing>
<wp:inline>
<wp:extent cx="914400" cy="914400"/> <!-- EMUs: 914400 = 1 inch -->
<a:graphic>
<a:graphicData uri=".../picture">
<pic:pic>
<pic:blipFill><a:blip r:embed="rId5"/></pic:blipFill>
</pic:pic>
</a:graphicData>
</a:graphic>
</wp:inline>
</w:drawing>
```
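The `cx`/`cy` values in `<wp:extent>` are in EMUs, so sizes given in inches or pixels need converting first. A minimal sketch, assuming 914400 EMU per inch as noted above and, for pixel input, a display resolution of 96 DPI (an assumption; use the image's actual DPI if known). The function names are hypothetical.

```javascript
// Hypothetical helpers for computing <wp:extent> cx/cy values.
const EMU_PER_INCH = 914400;

// Convert inches to EMUs, rounded to a whole unit.
function inchesToEmu(inches) {
  return Math.round(inches * EMU_PER_INCH);
}

// Convert pixels to EMUs; dpi defaults to 96, which is an assumption.
function pixelsToEmu(pixels, dpi = 96) {
  return Math.round((pixels / dpi) * EMU_PER_INCH);
}

console.log(inchesToEmu(1.5), pixelsToEmu(200));
```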
---
## Dependencies
- **pandoc**: Text extraction
- **docx**: `npm install -g docx` (new documents)
- **LibreOffice**: PDF conversion (auto-configured for sandboxed environments via `scripts/office/soffice.py`)
- **Poppler**: `pdftoppm` for images


@ -1,463 +0,0 @@
---
name: Earnings Analysis
description: >-
Analyze a company's financial statements (income statement, balance sheet,
cash flow statement) to assess financial health, earnings quality, and
competitive advantage. Use when the user asks to read/analyze financial
statements, check earnings quality, assess financial health, evaluate
profitability trends, or screen for competitive moats.
version: 1.0.0
metadata:
emoji: "\U0001F4D1"
requires:
env:
- FINANCIAL_DATASETS_API_KEY
tags:
- finance
- earnings
- analysis
- statements
- buffett
userInvocable: true
disableModelInvocation: false
---
## Instructions
You are performing a structured financial statement analysis. Follow all steps in order and show your work. Output language must match the user's input language.
**IMPORTANT: This analysis requires BOTH structured data AND external context.** You MUST use `web_search` to gather earnings call insights, industry context, and explanations for data anomalies. An analysis based only on API data without any web research is incomplete. Expect to make 3-8 web searches throughout the analysis.
### Progress Checklist
```
Earnings Analysis Progress:
- [ ] Step 1: Gather financial data
- [ ] Step 2: Income statement analysis
- [ ] Step 3: Balance sheet analysis
- [ ] Step 4: Cash flow statement analysis
- [ ] Step 5: Buffett competitive advantage scoring
- [ ] Step 6: Quality of earnings assessment
- [ ] Step 7: SEC filing qualitative analysis
- [ ] Step 8: Peer comparison (if requested)
- [ ] Step 9: Present findings
```
### Step 1: Gather Financial Data
Use `data` tool with `domain="finance"` for all structured data calls.
#### 1a. Structured Data
1. **Annual financial statements** (5 years):
```
action: "get_all_financial_statements"
params: { ticker: "[TICKER]", period: "annual", limit: 5 }
```
This returns income statements, balance sheets, and cash flow statements together.
2. **Quarterly financial statements** (last 4 quarters):
```
action: "get_all_financial_statements"
params: { ticker: "[TICKER]", period: "quarterly", limit: 4 }
```
3. **Current financial metrics**:
```
action: "get_financial_metrics_snapshot"
params: { ticker: "[TICKER]" }
```
4. **Company facts**:
```
action: "get_company_facts"
params: { ticker: "[TICKER]" }
```
Extract: `sector`, `industry` — needed for benchmark comparisons in later steps.
5. **Current stock price**:
```
action: "get_price_snapshot"
params: { ticker: "[TICKER]" }
```
6. **Recent news**:
```
action: "get_news"
params: { ticker: "[TICKER]", limit: 10 }
```
Scan headlines for material events (earnings surprises, guidance changes, M&A, restructuring).
#### 1b. External Context (Web Search) — MANDATORY
You MUST run the first two web searches below after gathering structured data. These are not optional. The third search is conditional.
1. **Latest earnings call highlights** (REQUIRED):
```
web_search("[COMPANY] latest earnings call highlights key takeaways [CURRENT_YEAR]")
```
Extract: management guidance, segment commentary, strategic priorities, forward outlook.
This provides the "why" behind the numbers that structured data cannot explain.
2. **Industry/macro backdrop** (REQUIRED):
```
web_search("[INDUSTRY] industry outlook trends [CURRENT_YEAR]")
```
Extract: industry growth rate, tailwinds/headwinds, regulatory changes, competitive dynamics.
This is needed to assess whether the company's performance is company-specific or industry-wide.
3. **Company-specific events** (conditional — run if news headlines or data show a material event):
```
web_search("[COMPANY] [EVENT_KEYWORD] impact analysis")
```
Examples: acquisition, restructuring, product launch, lawsuit, management change.
**Checkpoint:** Before proceeding to Step 2, verify that you have completed at least 2 web searches above. If you have not, go back and run them now.
### Step 2: Income Statement Analysis
Analyze the income statement across all 5 annual periods. Calculate and present:
1. **Revenue trend**:
- Year-over-year growth rate for each year
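- 5-year CAGR: `(Revenue_latest / Revenue_earliest)^(1/years) - 1`, where `years` is the number of annual intervals between the two periods (4 when using five annual data points)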
- Flag any years with revenue decline
2. **Margin analysis** (calculate for each year, show the trend):
- Gross Margin = Gross Profit / Revenue
- Operating Margin = Operating Income / Revenue
- Net Margin = Net Income / Revenue
3. **Margin benchmarks** (from [financial-ratios-benchmarks.md](references/financial-ratios-benchmarks.md)):
- Compare each margin to sector benchmarks
- Flag margins that are significantly above or below sector range
4. **EPS analysis**:
- EPS trend over 5 years
- EPS growth consistency (note any years of decline)
5. **Expense structure**:
- Cost of revenue as % of revenue (trend)
- SG&A as % of revenue (trend)
- R&D as % of revenue (trend, if applicable)
- Flag any expense category growing faster than revenue
6. **Contextual explanation** (REQUIRED — use web search results from Step 1b):
- For each significant trend or inflection point in the data above, provide a **why** explanation using the earnings call and industry context gathered in Step 1b.
- If revenue growth changed direction significantly (acceleration or deceleration > 10pp), run an additional search:
`web_search("[COMPANY] revenue [growth/decline] reason [YEAR]")`
- If margins shifted by more than 5pp year-over-year, run an additional search:
`web_search("[COMPANY] margin [expansion/compression] [YEAR]")`
- **Do not present a data table without narrative.** Every major trend must have a "why" attached, citing the source (earnings call, industry report, or company announcement).
Present as a table:
| Metric | Year 1 | Year 2 | Year 3 | Year 4 | Year 5 | 5Y CAGR |
|--------|--------|--------|--------|--------|--------|---------|
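The growth and margin formulas in this step can be sketched as follows. The helper names and the revenue series are made up for illustration; `years` is taken as the number of intervals between the earliest and latest periods, which is an interpretation of the CAGR formula rather than something the data tool defines.

```javascript
// Hypothetical helpers; inputs are plain numbers in one consistent currency unit.

// Compound annual growth rate over `years` intervals.
function cagr(earliest, latest, years) {
  return Math.pow(latest / earliest, 1 / years) - 1;
}

// Year-over-year growth for each consecutive pair, oldest first.
function yoyGrowth(values) {
  return values.slice(1).map((v, i) => v / values[i] - 1);
}

// Margin as a fraction of revenue (gross, operating, or net).
function margin(numerator, revenue) {
  return numerator / revenue;
}

// Illustrative (made-up) revenue for five annual periods, oldest first.
const revenue = [100, 110, 121, 133.1, 146.41];
console.log(cagr(revenue[0], revenue[revenue.length - 1], revenue.length - 1));
console.log(yoyGrowth(revenue));
console.log(margin(30, 100));
```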
### Step 3: Balance Sheet Analysis
Analyze the balance sheet across all 5 annual periods:
1. **Liquidity**:
- Current Ratio = Current Assets / Current Liabilities
- Quick Ratio = (Current Assets - Inventory) / Current Liabilities
- Cash and equivalents trend
2. **Leverage**:
- Cash vs. Total Debt (short-term + long-term debt)
- Debt-to-Equity = Total Liabilities / Total Shareholders' Equity
- Interest Coverage = Operating Income / Interest Expense
- Debt payoff capacity = Total Debt / Net Income (in years)
3. **Asset quality**:
- Receivables Turnover = Revenue / Accounts Receivable
- Inventory Turnover = Cost of Revenue / Inventory (if applicable)
- Goodwill as % of Total Assets (flag if > 30%)
4. **Equity structure**:
- Retained earnings: year-over-year changes (growing?)
- Preferred stock: present or absent?
- Treasury stock: present? growing? (indicates buybacks)
5. **Working capital trend**:
- Net Working Capital = Current Assets - Current Liabilities
- Direction of change over 5 years
6. **Contextual explanation** (use web search results from Step 1b + additional searches as needed):
- Explain major balance sheet changes using earnings call context from Step 1b.
- If total debt changed significantly (> 30% YoY), you MUST search for the reason:
`web_search("[COMPANY] debt [issuance/repayment] [YEAR]")`
- If goodwill jumped, you MUST search for acquisition context:
`web_search("[COMPANY] acquisition [YEAR]")`
- Large treasury stock changes → confirm buyback program details:
`web_search("[COMPANY] share buyback program")`
Compare key ratios to sector benchmarks from [financial-ratios-benchmarks.md](references/financial-ratios-benchmarks.md).
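The ratio definitions in this step can be computed together, for example as below. The field names are hypothetical and would need mapping from whatever the `data` tool actually returns; the sample figures are made up.

```javascript
// Hypothetical field names; map them from the data tool's actual response.
function balanceSheetRatios(s) {
  const totalDebt = s.shortTermDebt + s.longTermDebt;
  return {
    currentRatio: s.currentAssets / s.currentLiabilities,
    quickRatio: (s.currentAssets - s.inventory) / s.currentLiabilities,
    debtToEquity: s.totalLiabilities / s.shareholdersEquity,
    interestCoverage: s.operatingIncome / s.interestExpense,
    debtPayoffYears: totalDebt / s.netIncome,
    goodwillShare: s.goodwill / s.totalAssets, // flag if > 0.30
    netWorkingCapital: s.currentAssets - s.currentLiabilities,
  };
}

// Illustrative (made-up) figures for one annual period.
const sample = {
  currentAssets: 500, currentLiabilities: 250, inventory: 100,
  totalLiabilities: 600, shareholdersEquity: 400,
  operatingIncome: 200, interestExpense: 20,
  shortTermDebt: 50, longTermDebt: 250, netIncome: 150,
  goodwill: 100, totalAssets: 1000,
};
console.log(balanceSheetRatios(sample));
```

Running this once per annual period gives the five-year series needed for the trend comparison.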
### Step 4: Cash Flow Statement Analysis
Analyze cash flow statements across all 5 annual periods:
1. **Operating cash flow quality**:
- OCF vs. Net Income ratio for each year
- Target: OCF/NI > 1.0 (cash earnings exceed accrual earnings)
- Trend direction
2. **Free cash flow**:
- FCF = Operating Cash Flow - Capital Expenditure
- FCF Margin = FCF / Revenue
- 5-year FCF trend and CAGR
3. **Capital intensity**:
- CapEx / Revenue ratio
- CapEx / Net Income ratio (Buffett benchmark: < 25% excellent, < 50% acceptable)
- Is CapEx growing faster than revenue? (potential red flag)
4. **Cash flow composition**:
- Net cash from operating activities (should be consistently positive)
- Net cash from investing activities (negative = investing in growth)
- Net cash from financing activities (pattern: debt vs. equity funded?)
5. **Shareholder returns**:
- Dividends paid (from financing activities)
- Share buybacks / treasury stock repurchase
- Total payout ratio = (Dividends + Buybacks) / Net Income
- Is the company returning cash while maintaining growth?
6. **Contextual explanation** (use web search results from Step 1b + additional searches as needed):
- Explain cash flow patterns using earnings call context from Step 1b.
- If CapEx spiked significantly in a particular year, you MUST search for what was built:
`web_search("[COMPANY] capital expenditure investment [YEAR]")`
- If FCF diverged sharply from net income, search for restructuring or working capital events.
Present a summary table:
| Metric | Year 1 | Year 2 | Year 3 | Year 4 | Year 5 |
|--------|--------|--------|--------|--------|--------|
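A minimal sketch of the cash flow metrics above, under two assumptions: CapEx, dividends, and buybacks are passed as positive amounts (some data sources report them as negative outflows), and a CapEx/Net Income ratio of 50% or more is labeled "high", a label not defined in this document. Field names are hypothetical.

```javascript
// Hypothetical field names; capex, dividends, and buybacks are positive amounts.
function cashFlowMetrics(y) {
  const freeCashFlow = y.operatingCashFlow - y.capex;
  const capexToNetIncome = y.capex / y.netIncome;

  // Buffett benchmark from the text: < 25% excellent, < 50% acceptable.
  let capexRating = "high"; // assumed label for 50% and above
  if (capexToNetIncome < 0.25) capexRating = "excellent";
  else if (capexToNetIncome < 0.5) capexRating = "acceptable";

  return {
    ocfToNetIncome: y.operatingCashFlow / y.netIncome, // target > 1.0
    freeCashFlow,
    fcfMargin: freeCashFlow / y.revenue,
    capexToRevenue: y.capex / y.revenue,
    capexToNetIncome,
    capexRating,
    payoutRatio: (y.dividends + y.buybacks) / y.netIncome,
  };
}

// Illustrative (made-up) figures for one annual period.
const year = {
  revenue: 1000, netIncome: 150, operatingCashFlow: 180,
  capex: 30, dividends: 40, buybacks: 35,
};
console.log(cashFlowMetrics(year));
```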
### Step 5: Buffett Competitive Advantage Scoring
Apply the scoring framework from [buffett-checklist.md](references/buffett-checklist.md).
For each of the 13 criteria across 4 categories:
1. Calculate the metric value from the data gathered in Steps 1-4
2. Determine the score based on the threshold table
3. Note the sector-specific caveats (Financials, Utilities, REITs, Growth-stage)
Present the full scorecard table and the overall rating (Excellent / Good / Average / Weak).
### Step 6: Quality of Earnings Assessment
Assess whether reported earnings are backed by real cash and sustainable operations:
1. **Accrual ratio**:
- Formula: (Net Income - Operating Cash Flow) / Total Assets
- Interpretation: Lower is better. High positive values suggest earnings are driven by accruals rather than cash.
- Red flag threshold: > 10%
2. **Revenue recognition quality**:
- Compare Accounts Receivable growth rate vs. Revenue growth rate
- If AR grows significantly faster than revenue → potential aggressive revenue recognition
- Red flag threshold: AR growth > Revenue growth + 5 percentage points
3. **Inventory quality** (if applicable):
- Compare Inventory growth rate vs. Cost of Revenue growth rate
- Rising inventory vs. flat/declining COGS → potential obsolescence risk
- Red flag threshold: Inventory growth > COGS growth + 10 percentage points
4. **One-time items**:
- Identify significant non-recurring charges or gains in the income statement
- Calculate adjusted net income excluding one-time items
- Compare adjusted vs. reported margins
5. **Deferred revenue trend** (if applicable):
- Growing deferred revenue is a positive signal (future revenue already contracted)
- Declining deferred revenue may signal weakening demand pipeline
6. **External validation** (web search):
- If any red flags were triggered above, search for corroborating or mitigating context:
`web_search("[COMPANY] accounting concerns OR restatement OR SEC inquiry")`
- Check for auditor changes (can signal accounting issues):
`web_search("[COMPANY] auditor change OR audit opinion")`
- Only run these searches if quantitative red flags exist. Do not search proactively for every company.
Summarize quality of earnings as: **High** / **Moderate** / **Low** with supporting evidence.
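The three quantitative red-flag thresholds above (accrual ratio, receivables vs. revenue, inventory vs. cost of revenue) can be checked mechanically before deciding whether the external validation searches are needed. This is a sketch with hypothetical field names and made-up figures; growth rates are compared in percentage points expressed as fractions.

```javascript
// Hypothetical field names; `prev` and `curr` are consecutive annual periods.
function earningsQualityFlags(prev, curr) {
  const growth = (before, after) => after / before - 1;

  const accrualRatio =
    (curr.netIncome - curr.operatingCashFlow) / curr.totalAssets;
  const receivablesGap =
    growth(prev.receivables, curr.receivables) - growth(prev.revenue, curr.revenue);
  const inventoryGap =
    growth(prev.inventory, curr.inventory) -
    growth(prev.costOfRevenue, curr.costOfRevenue);

  return {
    accrualRatio,
    accrualFlag: accrualRatio > 0.1,       // red flag: > 10%
    receivablesFlag: receivablesGap > 0.05, // AR growth > revenue growth + 5pp
    inventoryFlag: inventoryGap > 0.1,      // inventory growth > COGS growth + 10pp
  };
}

// Illustrative (made-up) figures.
const prevYear = { revenue: 1000, receivables: 100, inventory: 80, costOfRevenue: 600 };
const currYear = {
  revenue: 1100, receivables: 130, inventory: 84, costOfRevenue: 660,
  netIncome: 150, operatingCashFlow: 120, totalAssets: 1000,
};
console.log(earningsQualityFlags(prevYear, currYear));
```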
### Step 7: SEC Filing Qualitative Analysis
Pull and analyze the most recent annual or quarterly filing:
1. **Get filing list**:
```
action: "get_filings"
params: { ticker: "[TICKER]", filing_type: "10-K", limit: 1 }
```
If 10-K is not recent enough, also pull 10-Q:
```
action: "get_filings"
params: { ticker: "[TICKER]", filing_type: "10-Q", limit: 1 }
```
2. **Read MD&A section** (Management's Discussion and Analysis):
```
action: "get_filing_items"
params: { ticker: "[TICKER]", filing_type: "10-K", item: "7" }
```
For 10-Q, MD&A is item "2":
```
action: "get_filing_items"
params: { ticker: "[TICKER]", filing_type: "10-Q", item: "2" }
```
3. **Read Risk Factors**:
```
action: "get_filing_items"
params: { ticker: "[TICKER]", filing_type: "10-K", item: "1A" }
```
4. **Extract and analyze**:
- Management's explanation of revenue and margin trends
- Forward-looking statements and guidance
- Key risk factors that could impact financial health
- Any disclosures about accounting policy changes
- Cross-validate: Does management narrative align with the quantitative data from Steps 2-4?
- Flag contradictions between management tone and actual numbers
5. **Supplement with earnings call transcript** (REQUIRED — web search/fetch):
You MUST search for and incorporate the most recent earnings call. This is critical for understanding management's forward-looking view.
- Search for the transcript:
`web_search("[COMPANY] [QUARTER] [YEAR] earnings call transcript")`
- If a transcript URL is found, use `web_fetch` to read key sections (CEO/CFO prepared remarks, Q&A highlights).
- Extract: forward guidance, segment-level commentary, management tone on competitive position, key analyst concerns.
- Cross-reference earnings call statements with MD&A disclosures — flag any inconsistencies.
6. **Summarize key insights**:
- What management says about the business trajectory
- Material risks not visible in the numbers alone
- Any changes in risk factors vs. prior filings (if noticeable)
- Key analyst questions and management responses from earnings call (if available)
### Step 8: Peer Comparison (Conditional)
**Execute this step only when the user explicitly requests peer comparison or industry benchmarking.**
1. **Identify peers**:
- Use the `sector` and `industry` from `get_company_facts`
- Select 2-3 publicly traded competitors in the same industry
- If the user specifies peers, use those instead
2. **Pull peer data** (for each peer):
```
action: "get_financial_metrics_snapshot"
params: { ticker: "[PEER_TICKER]" }
```
```
action: "get_income_statements"
params: { ticker: "[PEER_TICKER]", period: "annual", limit: 1 }
```
```
action: "get_balance_sheets"
params: { ticker: "[PEER_TICKER]", period: "annual", limit: 1 }
```
3. **Comparative table**:
| Metric | [TARGET] | [PEER 1] | [PEER 2] | [PEER 3] | Sector Avg |
|--------|----------|----------|----------|----------|------------|
| Revenue Growth (YoY) | | | | | |
| Gross Margin | | | | | |
| Net Margin | | | | | |
| ROE | | | | | |
| D/E Ratio | | | | | |
| FCF Margin | | | | | |
| P/E Ratio | | | | | |
4. **Competitive position assessment**:
- Where does the target company rank among peers on each metric?
- Identify clear advantages and disadvantages relative to peers
- Note if the target trades at a premium or discount to peers and whether it's justified
### Step 9: Present Findings
Compile the full analysis into a structured report. Follow this exact structure:
#### 1. Executive Summary
- Company name, ticker, sector, current price
- One-paragraph thesis: Is this a financially healthy company with a durable competitive advantage?
- Financial health rating from Buffett scorecard (Excellent / Good / Average / Weak)
- Earnings quality assessment (High / Moderate / Low)
#### 2. Financial Health Scorecard
- Full Buffett checklist scorecard table from Step 5
- Total score and rating
#### 3. Trend Dashboard
- 5-year key metrics trend table from Steps 2-4:
| Metric | Y1 | Y2 | Y3 | Y4 | Y5 | Trend |
|--------|----|----|----|----|----|----|
| Revenue | | | | | | arrow |
| Gross Margin | | | | | | arrow |
| Net Margin | | | | | | arrow |
| ROE | | | | | | arrow |
| D/E Ratio | | | | | | arrow |
| FCF | | | | | | arrow |
| OCF/NI | | | | | | arrow |
| CapEx/NI | | | | | | arrow |
Use directional indicators in the Trend column.
#### 4. Quality of Earnings
- Summary from Step 6 with key metrics and assessment
#### 5. Key Strengths & Red Flags
- **Strengths**: List 3-5 financial strengths with supporting data
- **Red Flags**: List any warning signs discovered during analysis. If none, state "No material red flags identified."
Common red flags to watch for:
- Revenue growth but declining margins
- Net income growing but OCF declining
- AR growing faster than revenue
- Inventory building up vs. flat COGS
- Rising debt with declining interest coverage
- Retained earnings declining
- Large goodwill relative to total assets
- CapEx consistently > 50% of net income
- Management tone in MD&A contradicts financial data
#### 6. SEC Filing Insights
- Key findings from Step 7
- Management's outlook and material risks
#### 7. Peer Comparison (if Step 8 was executed)
- Comparative table and competitive position assessment
### Guardrails
- Always state the date range of financial data used.
- If any data is missing or unavailable, explicitly note it and adjust the analysis scope.
- Do not present calculated ratios as precise — round to one decimal place.
- Clearly distinguish between facts (from data) and interpretive conclusions.
- The Buffett scorecard is a screening framework, not a buy/sell recommendation. State this in the output.
- For non-US companies or companies not filing with the SEC, skip Step 7 and note the limitation.
- Output language must match the user's input language (Chinese input → Chinese output, English input → English output).
### Web Search Requirements
**Minimum mandatory searches (you MUST perform these):**
1. Earnings call highlights (Step 1b) — for management's own explanation of results
2. Industry outlook (Step 1b) — for macro/sector context
3. Earnings call transcript (Step 7) — for forward guidance and analyst Q&A
**Additional searches (trigger when data shows anomalies):**
- Revenue or margin inflection points (Steps 2-4)
- Major debt changes or acquisitions (Step 3)
- CapEx spikes (Step 4)
- Quality-of-earnings red flags (Step 6)
**Search principles:**
- **Source quality**: Prefer primary sources (SEC filings, company press releases, earnings call transcripts) over secondary sources (analyst blogs, news aggregators).
- **Cite with dates**: Always include source name and date when referencing external information.
- **Separate fact from opinion**: Label analyst or media commentary as external opinion, not fact.
- **Total budget**: Expect 3-8 web searches per analysis. Fewer than 3 means you are likely missing critical context.


@ -1,99 +0,0 @@
# Buffett Competitive Advantage Checklist
Score each criterion and calculate a total. Use this to assess whether a company has a durable competitive advantage (economic moat).
## Scoring System
Total: 100 points across 4 categories (25 points each).
### Category 1: Profitability (25 points)
| # | Criterion | Excellent | Good | Weak |
|---|-----------|-----------|------|------|
| 1 | **Gross Margin** | > 40% → **10 pts** | 30-40% → **6 pts** | < 30% → **2 pts** |
| 2 | **Net Margin** | > 20% → **10 pts** | 10-20% → **6 pts** | < 10% → **2 pts** |
| 3 | **Return on Equity (ROE)** | > 15% → **5 pts** | 10-15% → **3 pts** | < 10% → **1 pt** |
How to calculate:
- Gross Margin = Gross Profit / Revenue
- Net Margin = Net Income / Revenue
- ROE = Net Income / Total Shareholders' Equity
- Use the most recent annual figures; cross-check with 5-year average
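The Category 1 thresholds can be sketched as a small scoring helper. This is an illustrative sketch, not part of the skill's tooling: the function names are hypothetical, inputs are assumed to be fractions (0.432 for 43.2%), and a value sitting exactly on a boundary (for example 40%) is assumed to fall in the "Good" band.

```python
def band_score(value: float, excellent_above: float, good_from: float, points: tuple) -> int:
    """Map a ratio to (excellent, good, weak) points using the table's thresholds."""
    if value > excellent_above:
        return points[0]
    if value >= good_from:
        return points[1]
    return points[2]


def score_profitability(gross_margin: float, net_margin: float, roe: float) -> int:
    """Category 1 score (max 25). All inputs are fractions, e.g. 0.251 for 25.1%."""
    return (
        band_score(gross_margin, 0.40, 0.30, (10, 6, 2))
        + band_score(net_margin, 0.20, 0.10, (10, 6, 2))
        + band_score(roe, 0.15, 0.10, (5, 3, 1))
    )
```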
### Category 2: Balance Sheet Health (25 points)
| # | Criterion | Pass | Partial | Fail |
|---|-----------|------|---------|------|
| 4 | **Cash > Total Debt** | Yes → **8 pts** | Cash > 50% of Debt → **4 pts** | Cash < 50% of Debt → **1 pt** |
| 5 | **Debt-to-Equity Ratio** | < 0.8 → **7 pts** | 0.8-1.5 → **4 pts** | > 1.5 → **1 pt** |
| 6 | **No Preferred Stock** | None → **5 pts** | — | Has Preferred → **0 pts** |
| 7 | **Retained Earnings Growth** | Growing 5 consecutive years → **5 pts** | Growing 3-4 years → **3 pts** | Declining or flat → **1 pt** |
How to calculate:
- Cash = Cash and Cash Equivalents + Short-term Investments
- Total Debt = Short-term Debt + Long-term Debt
- D/E = Total Liabilities / Total Shareholders' Equity
- Retained Earnings: Compare year-over-year from balance sheets
Special note on D/E:
- Exclude operating lease liabilities from "debt" for this assessment (they are contractual obligations, not financial debt)
- If treasury stock is large, it reduces equity and inflates D/E — note this in analysis
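Criteria 4 and 5 can be sketched the same way, using the definitions of Cash, Total Debt, and D/E given above. The function names are hypothetical. Two edge cases are not covered by the table and are handled here by assumption: a company with no debt passes criterion 4, and non-positive equity scores as "Fail" on criterion 5.

```python
def score_cash_vs_debt(cash_and_equivalents: float, short_term_investments: float,
                       short_term_debt: float, long_term_debt: float) -> int:
    """Criterion 4: compare cash (incl. short-term investments) with total debt."""
    cash = cash_and_equivalents + short_term_investments
    total_debt = short_term_debt + long_term_debt
    if total_debt == 0 or cash > total_debt:  # no debt is treated as a pass (assumption)
        return 8
    if cash > 0.5 * total_debt:
        return 4
    return 1


def score_debt_to_equity(total_liabilities: float, shareholders_equity: float) -> int:
    """Criterion 5: D/E = total liabilities / total shareholders' equity."""
    if shareholders_equity <= 0:  # negative or zero equity treated as "Fail" (assumption)
        return 1
    debt_to_equity = total_liabilities / shareholders_equity
    if debt_to_equity < 0.8:
        return 7
    if debt_to_equity <= 1.5:
        return 4
    return 1
```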
### Category 3: Cash Flow Quality (25 points)
| # | Criterion | Excellent | Good | Weak |
|---|-----------|-----------|------|------|
| 8 | **CapEx / Net Income** | < 25% → **10 pts** | 25-50% → **6 pts** | > 50% → **2 pts** |
| 9 | **Operating CF > Net Income** | OCF/NI > 1.0 → **8 pts** | OCF/NI = 0.8-1.0 → **4 pts** | OCF/NI < 0.8 → **1 pt** |
| 10 | **Shareholder Returns** | Buybacks + Dividends → **7 pts** | Dividends only → **4 pts** | Neither → **1 pt** |
How to calculate:
- CapEx: Capital Expenditure from cash flow statement (use absolute value)
- Operating CF: Net Cash from Operating Activities
- Buybacks: Check if Treasury Stock increased year-over-year, or look at "repurchase of common stock" in financing activities
- Dividends: Look at "dividends paid" in financing activities
Note on CapEx:
- One-time large CapEx (e.g., new factory, data center buildout) should be noted but not penalized if the 5-year average CapEx/NI is still within range
- Asset-light businesses (software, services) naturally score well here
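A sketch of criteria 8 and 9 follows. The function names are hypothetical, and scoring a loss-making year (net income at or below zero) as "Weak" is an assumption, since the ratios are not meaningful in that case and the table does not cover it.

```python
def score_capex_to_net_income(capex: float, net_income: float) -> int:
    """Criterion 8: CapEx / Net Income, using the absolute value of CapEx."""
    if net_income <= 0:  # ratio is undefined for losses; scored as weak (assumption)
        return 2
    ratio = abs(capex) / net_income
    if ratio < 0.25:
        return 10
    if ratio <= 0.50:
        return 6
    return 2


def score_ocf_to_net_income(operating_cash_flow: float, net_income: float) -> int:
    """Criterion 9: operating cash flow relative to net income."""
    if net_income <= 0:  # scored as weak for loss-making years (assumption)
        return 1
    ratio = operating_cash_flow / net_income
    if ratio > 1.0:
        return 8
    if ratio >= 0.8:
        return 4
    return 1
```

For the one-time CapEx exception described above, one option is to pass the 5-year average CapEx and net income instead of a single year's figures.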
### Category 4: Consistency (25 points)
| # | Criterion | Excellent | Good | Weak |
|---|-----------|-----------|------|------|
| 11 | **Revenue Growth Streak** | 5+ consecutive years growing → **10 pts** | 3-4 years → **6 pts** | < 3 years → **2 pts** |
| 12 | **Net Income Growth Streak** | 5+ consecutive years growing → **10 pts** | 3-4 years → **6 pts** | < 3 years → **2 pts** |
| 13 | **Recession Resilience** | Profitable through last recession → **5 pts** | Revenue dip < 10% → **3 pts** | Significant losses → **1 pt** |
How to assess:
- Revenue/NI growth: Check year-over-year changes for the last 5 years
- Recession resilience: Check 2020 (COVID) and 2022 (rate hikes) performance. For older data, check 2008-2009 if available.
- A single flat year in an otherwise consistent growth streak can be scored as "Good"
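The growth-streak criteria can be sketched as a count of consecutive year-over-year increases ending at the most recent year. This is an assumption about how to read "consecutive years growing": five growing years then require six annual data points. The "single flat year" allowance above is a judgment call and is not encoded here. Function names are hypothetical.

```python
def growth_streak(values: list) -> int:
    """Count consecutive year-over-year increases ending at the latest year.

    `values` is ordered oldest to newest (e.g. annual revenue).
    """
    streak = 0
    # Walk backwards through (previous, current) pairs, newest pair first.
    for prev, curr in zip(reversed(values[:-1]), reversed(values[1:])):
        if curr > prev:
            streak += 1
        else:
            break
    return streak


def score_growth_streak(values: list) -> int:
    """Criteria 11 and 12: 5+ years gives 10 pts, 3-4 years 6 pts, otherwise 2 pts."""
    streak = growth_streak(values)
    if streak >= 5:
        return 10
    if streak >= 3:
        return 6
    return 2
```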
## Score Interpretation
| Total Score | Rating | Interpretation |
|-------------|--------|----------------|
| 80-100 | **Excellent** | Strong durable competitive advantage. Consistent profitability, fortress balance sheet, capital-light operations. Classic Buffett-style investment candidate. |
| 60-79 | **Good** | Solid business with some competitive advantages. May have minor weaknesses in one category. Worth deeper investigation. |
| 40-59 | **Average** | Mediocre competitive position. Multiple areas of concern. Higher risk of margin erosion or competitive disruption. |
| < 40 | **Weak** | No clear competitive advantage. High debt, inconsistent earnings, or capital-intensive operations. Not a typical Buffett investment. |
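The interpretation table maps directly to a lookup. A minimal sketch, with a hypothetical function name:

```python
def rating_for_score(total: int) -> str:
    """Translate a 0-100 checklist total into the rating bands from the table."""
    if total >= 80:
        return "Excellent"
    if total >= 60:
        return "Good"
    if total >= 40:
        return "Average"
    return "Weak"
```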
## Sector-Specific Caveats
- **Financials**: Skip gross margin (criterion 1). Use net interest margin > 3% as substitute for 10 pts. D/E ratio thresholds don't apply — use Tier 1 Capital Ratio > 10% for 7 pts instead.
- **Utilities**: Naturally capital-intensive (CapEx criterion will score low). Offset by checking regulated return stability. If regulated ROE is consistently 9-11%, award 6 pts for criterion 8.
- **REITs**: Required to pay out 90%+ as dividends, so retained earnings won't grow. Skip criterion 7; award 5 pts if FFO per share grows consistently instead.
- **Growth-stage Tech**: May not yet have 5 years of profitability. Score consistency based on revenue growth and gross margin expansion trajectory. Note that the overall score may be artificially low.
## Output Format
Present the scorecard as a table:
| # | Criterion | Value | Score | Max |
|---|-----------|-------|-------|-----|
| 1 | Gross Margin | 43.2% | 10 | 10 |
| 2 | Net Margin | 25.1% | 10 | 10 |
| ... | ... | ... | ... | ... |
| | **Total** | | **XX** | **100** |
| | **Rating** | | **Excellent/Good/Average/Weak** | |


@ -1,70 +0,0 @@
# Financial Ratios Benchmarks by Sector
Use the company's `sector` from `get_company_facts` to look up benchmark ranges below. Compare the company's ratios against these benchmarks and note deviations.
## Profitability Benchmarks
| Sector | Gross Margin | Operating Margin | Net Margin | ROE | ROA |
|--------|-------------|-----------------|------------|-----|-----|
| Communication Services | 50-60% | 15-25% | 10-18% | 12-20% | 5-10% |
| Consumer Discretionary | 35-50% | 8-15% | 5-10% | 15-25% | 5-10% |
| Consumer Staples | 35-45% | 12-18% | 8-12% | 20-30% | 8-12% |
| Energy | 30-50% | 10-20% | 5-15% | 10-20% | 5-10% |
| Financials | N/A | 25-35% | 15-25% | 10-15% | 1-2% |
| Health Care | 55-70% | 15-25% | 10-20% | 15-25% | 8-12% |
| Industrials | 25-35% | 10-15% | 6-10% | 15-20% | 5-8% |
| Information Technology | 55-70% | 20-30% | 15-25% | 20-35% | 10-15% |
| Materials | 25-35% | 10-18% | 5-12% | 10-18% | 5-8% |
| Real Estate | 55-70% | 25-40% | 15-30% | 5-10% | 2-5% |
| Utilities | 35-50% | 15-25% | 8-15% | 8-12% | 3-5% |
## Balance Sheet Benchmarks
| Sector | Current Ratio | Quick Ratio | D/E Ratio | Interest Coverage |
|--------|--------------|-------------|-----------|-------------------|
| Communication Services | 1.0-1.5 | 0.8-1.2 | 0.8-1.5 | 4-8x |
| Consumer Discretionary | 1.2-2.0 | 0.8-1.5 | 0.5-1.2 | 5-10x |
| Consumer Staples | 1.0-1.5 | 0.6-1.0 | 0.5-1.0 | 8-15x |
| Energy | 1.0-1.5 | 0.8-1.2 | 0.3-0.8 | 5-10x |
| Financials | N/A | N/A | 2.0-8.0 | N/A |
| Health Care | 1.5-2.5 | 1.2-2.0 | 0.3-0.8 | 8-15x |
| Industrials | 1.2-2.0 | 0.8-1.5 | 0.5-1.0 | 6-12x |
| Information Technology | 2.0-3.5 | 1.5-3.0 | 0.2-0.6 | 15-30x |
| Materials | 1.5-2.5 | 1.0-1.5 | 0.4-0.8 | 6-12x |
| Real Estate | 1.0-1.5 | 0.5-1.0 | 0.8-1.5 | 3-5x |
| Utilities | 0.8-1.2 | 0.5-0.8 | 1.0-2.0 | 3-5x |
## Cash Flow Benchmarks
| Sector | FCF Margin | CapEx/Revenue | Op. CF / Net Income |
|--------|-----------|---------------|---------------------|
| Communication Services | 10-20% | 10-20% | 1.2-1.8x |
| Consumer Discretionary | 5-12% | 3-8% | 1.1-1.5x |
| Consumer Staples | 8-15% | 3-6% | 1.2-1.5x |
| Energy | 5-15% | 15-30% | 1.5-2.5x |
| Financials | N/A | 1-3% | N/A |
| Health Care | 15-25% | 3-8% | 1.2-1.8x |
| Industrials | 5-12% | 3-8% | 1.2-1.6x |
| Information Technology | 20-35% | 3-10% | 1.2-1.8x |
| Materials | 5-12% | 5-12% | 1.3-2.0x |
| Real Estate | 15-30% | 5-15% | 1.5-3.0x |
| Utilities | 5-10% | 15-25% | 2.0-3.5x |
## Usage Notes
- **Financials sector**: Gross margin and current/quick ratios are not meaningful for banks and insurers. Use net interest margin and capital adequacy ratios instead.
- **Real Estate**: High depreciation makes net margin less useful. Focus on Funds From Operations (FFO).
- **Growth-stage companies**: May have negative margins. Compare against growth-stage peers rather than mature sector benchmarks.
- **Cyclical sectors** (Energy, Materials, Industrials): Use cycle-average margins (5-7 years) rather than single-year comparisons.
- **Post-M&A**: Goodwill and amortization may distort margins for 1-2 years after acquisitions. Note any large acquisitions.
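Comparing a company against its sector range can be sketched as follows. The dictionary is a hand-copied excerpt of the Information Technology rows from the tables above, expressed as fractions; the names `IT_BENCHMARKS`, `compare_to_benchmark`, and `deviations` are hypothetical.

```python
# Excerpt of the Information Technology benchmark ranges, as (low, high) fractions.
IT_BENCHMARKS = {
    "gross_margin": (0.55, 0.70),
    "net_margin": (0.15, 0.25),
    "roe": (0.20, 0.35),
}


def compare_to_benchmark(value: float, low: float, high: float) -> str:
    """Classify a ratio as below, within, or above its sector range."""
    if value < low:
        return "below"
    if value > high:
        return "above"
    return "within"


def deviations(ratios: dict, benchmarks: dict) -> dict:
    """Compare every ratio that has a benchmark; ratios without one are skipped."""
    return {
        name: compare_to_benchmark(value, *benchmarks[name])
        for name, value in ratios.items()
        if name in benchmarks
    }
```

Ratios marked `N/A` in the tables (for example gross margin for Financials) would simply be absent from the benchmark dictionary and therefore skipped.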
## Buffett's Rules of Thumb (Quick Reference)
| Metric | Excellent | Good | Weak |
|--------|-----------|------|------|
| Gross Margin | > 40% | 30-40% | < 30% |
| Net Margin | > 20% | 10-20% | < 10% |
| ROE | > 15% | 10-15% | < 10% |
| D/E Ratio | < 0.5 | 0.5-0.8 | > 0.8 |
| CapEx / Net Income | < 25% | 25-50% | > 50% |
| Debt Payoff (years) | < 2 | 2-4 | > 4 |


@ -1,171 +0,0 @@
---
name: Finance Research
description: Conduct analyst-grade financial research across primary and secondary markets using structured financial data plus macro and public-information cross-checks.
version: 1.1.1
metadata:
  emoji: "\U0001F4CA"
  tags:
    - finance
    - research
    - stocks
    - data
    - macro
    - sentiment
userInvocable: true
disableModelInvocation: false
---
## Instructions
You are conducting financial research with an analyst-grade standard. Tool usage is a dynamic decision. Do not force tool combinations. Choose tools based on evidence sufficiency for the specific question.
### Available Data Actions
#### Price Data
- `get_price_snapshot` — Current stock price. Params: `{ ticker }`
- `get_prices` — Historical OHLCV prices. Params: `{ ticker, start_date, end_date, interval?, interval_multiplier? }`
- interval: "day" (default), "week", "month", "year"
- `get_crypto_price_snapshot` — Current crypto price. Params: `{ ticker }` (e.g. "BTC-USD")
- `get_crypto_prices` — Historical crypto prices. Same params as get_prices.
- `get_available_crypto_tickers` — List available crypto tickers. Params: `{}`
#### Financial Statements
All share params: `{ ticker, period, limit?, report_period_gt?, report_period_gte?, report_period_lt?, report_period_lte? }`
- period: "annual", "quarterly", or "ttm"
- Dates in YYYY-MM-DD format
Actions:
- `get_income_statements` — Revenue, expenses, net income, EPS
- `get_balance_sheets` — Assets, liabilities, equity, debt, cash
- `get_cash_flow_statements` — Operating, investing, financing cash flows, FCF
- `get_all_financial_statements` — All three at once (more efficient when you need multiple)
#### Metrics & Estimates
- `get_financial_metrics_snapshot` — Current key ratios (P/E, market cap, margins, etc.). Params: `{ ticker }`
- `get_financial_metrics` — Historical metrics. Params: `{ ticker, period?, limit?, report_period*? }`
- `get_analyst_estimates` — EPS and revenue estimates. Params: `{ ticker, period? }`
#### Company Info
- `get_company_facts` — Sector, industry, employees, exchange, website. Params: `{ ticker }`
- `get_news` — Recent company news articles. Params: `{ ticker, start_date?, end_date?, limit? }`
- `get_insider_trades` — Insider buying/selling (SEC Form 4). Params: `{ ticker, limit?, filing_date*? }`
- `get_segmented_revenues` — Revenue by segment/geography. Params: `{ ticker, period, limit? }`
#### SEC Filings
- `get_filings` — List filings metadata. Params: `{ ticker, filing_type?, limit? }`
- `get_filing_items` — Read filing sections. Params: `{ ticker, filing_type, accession_number?, item? }`
### Evidence Sufficiency Gate (Internal Decision)
Before deep analysis, make an internal evidence decision. Do not output a technical decision block by default.
If the user explicitly asks for methodology or reasoning transparency, provide a concise plain-language explanation of your research approach.
Decision policy:
- Start with `data_only` when structured data can support the requested conclusion.
- Escalate to `hybrid` when the task is event-driven, time-sensitive, or requires causal explanation not visible in structured data alone.
- Use `web_first` only when the task is mainly document/news/policy driven (common in pre-IPO without stable ticker coverage).
- If a tool is unavailable, continue with available tools and explicitly downgrade confidence.
### Core Analysis Framework
1. **Scope & Market Type**
- Identify if this is primary market (IPO, pre-IPO, follow-on, placement) or secondary market (listed stock/sector/index).
- State region and analysis horizon (event-driven, 3-6 months, 1-3 years).
2. **Core Company Data (Structured)**
- Start with: `get_price_snapshot`, `get_company_facts`, `get_financial_metrics_snapshot`.
- Pull statements (`get_all_financial_statements`) and estimates as needed.
3. **Macro & Policy Context (Conditional)**
- Use `web_search` / `web_fetch` only if required by your internal evidence decision.
- If used, prefer high-signal primary sources (central bank, regulator, official releases).
- For time-sensitive conclusions, include source dates explicitly.
4. **News & Sentiment Context (Conditional)**
- Use `get_news` for company-linked coverage when available.
- Add web cross-checks only when event validation materially affects the conclusion.
5. **Synthesis & Decision**
- Separate **facts**, **inference**, and **assumptions**.
- Build bull/base/bear scenarios with explicit trigger conditions.
- Provide confidence level and explain the main uncertainty drivers.
### Primary Market (一级市场) Workflow
When asked about IPOs, pre-IPO, or new issuance:
1. **Deal Basics**
- Identify issuer, listing venue, offering structure (primary/secondary shares), expected timeline.
- Determine whether a reliable ticker exists in current data coverage.
2. **Filing/Prospectus Review**
- Prefer official documents (e.g., S-1/F-1/prospectus) via `web_search` + `web_fetch`.
- Extract: use of proceeds, customer concentration, related-party transactions, share classes, lock-up, dilution risks.
Primary-market capability boundary:
- If `ticker` is available and filings are retrievable, run hybrid analysis (structured + document evidence).
- If `ticker` is unavailable or structured filing fields are limited, run web-led analysis and clearly label it as partial-coverage with reduced confidence.
3. **Valuation & Comparable Set**
- Build peer set from listed comps (secondary market tickers) and compare growth, margin, and valuation multiples.
- Flag gaps between issuer narrative and peer reality.
4. **Deal Risk Map**
- Highlight red flags: weak FCF quality, aggressive non-GAAP adjustments, concentrated revenue, regulatory overhang.
- Provide post-listing watch items: lock-up expiry, first earnings, guidance revisions.
### Secondary Market (二级市场) Workflow
When asked about listed equities:
1. **Trend & Positioning**
- Pull 1y price history (`get_prices`) and identify regime (uptrend/range/downtrend) with volatility context.
2. **Fundamentals**
- Analyze growth quality (revenue vs FCF), margin durability, leverage, and capital allocation.
3. **Valuation**
- Compare current multiples to historical bands and peers (when peer data is available).
- Connect valuation premium/discount to expected growth and risk profile.
4. **Catalysts & Risks**
- Earnings, guidance, product cycle, policy changes, rates/FX/commodity sensitivity, insider activity.
### Output Standard
Always include:
1. **Executive Summary** (thesis + stance + confidence)
2. **Evidence Table** with columns:
- Signal
- Direction (Bull/Bear/Neutral)
- Why it matters
- Source
- Date
3. **Scenario Table** (bull/base/bear with probabilities or relative weights)
4. **Key Monitoring Triggers** (what would invalidate current thesis)
### Guardrails
- Always state data cutoff dates.
- If data is missing, explicitly mark it and show the impact on confidence.
- Do not present assumptions as facts.
- For event-driven conclusions, if you skip web validation, explicitly explain why structured evidence is still sufficient.
### Example: Secondary Market Analysis
For "Analyze Apple's investment outlook":
1. `data(domain="finance", action="get_price_snapshot", params={ticker: "AAPL"})`
2. `data(domain="finance", action="get_company_facts", params={ticker: "AAPL"})`
3. `data(domain="finance", action="get_all_financial_statements", params={ticker: "AAPL", period: "annual", limit: 3})`
4. `data(domain="finance", action="get_financial_metrics", params={ticker: "AAPL", period: "quarterly", limit: 8})`
5. `data(domain="finance", action="get_analyst_estimates", params={ticker: "AAPL", period: "annual"})`
6. `data(domain="finance", action="get_news", params={ticker: "AAPL", limit: 10})`
7. `web_search(query="latest Fed policy decision impact on US mega-cap tech valuations")`
8. `web_search(query="Apple supply chain or regulatory news latest quarter")`
Then synthesize fundamental trend, macro regime, and event sentiment into a scenario-based conclusion.


@ -1,335 +0,0 @@
---
name: PDF Processing
description: Use this skill whenever the user wants to do anything with PDF files. This includes reading or extracting text/tables from PDFs, combining or merging multiple PDFs into one, splitting PDFs apart, rotating pages, adding watermarks, creating new PDFs, filling PDF forms, encrypting/decrypting PDFs, extracting images, and OCR on scanned PDFs to make them searchable. If the user mentions a .pdf file or asks to produce one, use this skill.
version: 1.0.0
metadata:
  emoji: "📕"
  tags:
    - office
    - document
    - pdf
  install:
    - id: brew-poppler
      kind: brew
      formula: poppler
      bins: [pdftoppm, pdftotext, pdfimages]
      label: "Install poppler for PDF text/image extraction"
      os: [darwin, linux]
    - id: brew-qpdf
      kind: brew
      formula: qpdf
      bins: [qpdf]
      label: "Install qpdf for advanced PDF manipulation"
      os: [darwin, linux]
userInvocable: true
disableModelInvocation: false
---
# PDF Processing Guide
## Overview
This guide covers essential PDF processing operations using Python libraries and command-line tools. For advanced features, JavaScript libraries, and detailed examples, see reference.md. If you need to fill out a PDF form, read forms.md and follow its instructions.
## Quick Start
```python
from pypdf import PdfReader, PdfWriter
# Read a PDF
reader = PdfReader("document.pdf")
print(f"Pages: {len(reader.pages)}")
# Extract text
text = ""
for page in reader.pages:
    text += page.extract_text()
```
## Python Libraries
### pypdf - Basic Operations
#### Merge PDFs
```python
from pypdf import PdfWriter, PdfReader
writer = PdfWriter()
for pdf_file in ["doc1.pdf", "doc2.pdf", "doc3.pdf"]:
    reader = PdfReader(pdf_file)
    for page in reader.pages:
        writer.add_page(page)
with open("merged.pdf", "wb") as output:
    writer.write(output)
```
#### Split PDF
```python
from pypdf import PdfReader, PdfWriter
reader = PdfReader("input.pdf")
# Write each page to its own single-page PDF
for i, page in enumerate(reader.pages):
    writer = PdfWriter()
    writer.add_page(page)
    with open(f"page_{i+1}.pdf", "wb") as output:
        writer.write(output)
```
#### Extract Metadata
```python
from pypdf import PdfReader
reader = PdfReader("document.pdf")
meta = reader.metadata
print(f"Title: {meta.title}")
print(f"Author: {meta.author}")
print(f"Subject: {meta.subject}")
print(f"Creator: {meta.creator}")
```
#### Rotate Pages
```python
from pypdf import PdfReader, PdfWriter
reader = PdfReader("input.pdf")
writer = PdfWriter()
page = reader.pages[0]
page.rotate(90) # Rotate 90 degrees clockwise
writer.add_page(page)
with open("rotated.pdf", "wb") as output:
    writer.write(output)
```
### pdfplumber - Text and Table Extraction
#### Extract Text with Layout
```python
import pdfplumber
with pdfplumber.open("document.pdf") as pdf:
    for page in pdf.pages:
        text = page.extract_text()
        print(text)
```
#### Extract Tables
```python
import pdfplumber
with pdfplumber.open("document.pdf") as pdf:
    for i, page in enumerate(pdf.pages):
        tables = page.extract_tables()
        for j, table in enumerate(tables):
            print(f"Table {j+1} on page {i+1}:")
            for row in table:
                print(row)
```
#### Advanced Table Extraction
```python
import pandas as pd
import pdfplumber
with pdfplumber.open("document.pdf") as pdf:
    all_tables = []
    for page in pdf.pages:
        tables = page.extract_tables()
        for table in tables:
            if table: # Check if table is not empty
                # First row is treated as the header
                df = pd.DataFrame(table[1:], columns=table[0])
                all_tables.append(df)
# Combine all tables
if all_tables:
    combined_df = pd.concat(all_tables, ignore_index=True)
    combined_df.to_excel("extracted_tables.xlsx", index=False)
```
### reportlab - Create PDFs
#### Basic PDF Creation
```python
from reportlab.lib.pagesizes import letter
from reportlab.pdfgen import canvas
c = canvas.Canvas("hello.pdf", pagesize=letter)
width, height = letter
# Add text
c.drawString(100, height - 100, "Hello World!")
c.drawString(100, height - 120, "This is a PDF created with reportlab")
# Add a line
c.line(100, height - 140, 400, height - 140)
# Save
c.save()
```
#### Create PDF with Multiple Pages
```python
from reportlab.lib.pagesizes import letter
from reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer, PageBreak
from reportlab.lib.styles import getSampleStyleSheet
doc = SimpleDocTemplate("report.pdf", pagesize=letter)
styles = getSampleStyleSheet()
story = []
# Add content
title = Paragraph("Report Title", styles['Title'])
story.append(title)
story.append(Spacer(1, 12))
body = Paragraph("This is the body of the report. " * 20, styles['Normal'])
story.append(body)
story.append(PageBreak())
# Page 2
story.append(Paragraph("Page 2", styles['Heading1']))
story.append(Paragraph("Content for page 2", styles['Normal']))
# Build PDF
doc.build(story)
```
#### Subscripts and Superscripts
**IMPORTANT**: Never use Unicode subscript/superscript characters in ReportLab PDFs. The built-in fonts do not include these glyphs, causing them to render as solid black boxes.
Instead, use ReportLab's XML markup tags in Paragraph objects:
```python
from reportlab.platypus import Paragraph
from reportlab.lib.styles import getSampleStyleSheet
styles = getSampleStyleSheet()
# Subscripts: use <sub> tag
chemical = Paragraph("H<sub>2</sub>O", styles['Normal'])
# Superscripts: use <super> tag
squared = Paragraph("x<super>2</super> + y<super>2</super>", styles['Normal'])
```
For canvas-drawn text (not Paragraph objects), manually adjust the font size and position rather than using Unicode subscripts/superscripts.
## Command-Line Tools
### pdftotext (poppler-utils)
```bash
# Extract text
pdftotext input.pdf output.txt
# Extract text preserving layout
pdftotext -layout input.pdf output.txt
# Extract specific pages
pdftotext -f 1 -l 5 input.pdf output.txt # Pages 1-5
```
### qpdf
```bash
# Merge PDFs
qpdf --empty --pages file1.pdf file2.pdf -- merged.pdf
# Split pages
qpdf input.pdf --pages . 1-5 -- pages1-5.pdf
qpdf input.pdf --pages . 6-10 -- pages6-10.pdf
# Rotate pages
qpdf input.pdf output.pdf --rotate=+90:1 # Rotate page 1 by 90 degrees
# Remove password
qpdf --password=mypassword --decrypt encrypted.pdf decrypted.pdf
```
### pdftk (if available)
```bash
# Merge
pdftk file1.pdf file2.pdf cat output merged.pdf
# Split
pdftk input.pdf burst
# Rotate
pdftk input.pdf rotate 1east output rotated.pdf
```
## Common Tasks
### Extract Text from Scanned PDFs
```python
# Requires: pip install pytesseract pdf2image
import pytesseract
from pdf2image import convert_from_path
# Convert PDF to images
images = convert_from_path('scanned.pdf')
# OCR each page
text = ""
for i, image in enumerate(images):
    text += f"Page {i+1}:\n"
    text += pytesseract.image_to_string(image)
    text += "\n\n"
print(text)
```
### Add Watermark
```python
from pypdf import PdfReader, PdfWriter
# Create watermark (or load existing)
watermark = PdfReader("watermark.pdf").pages[0]
# Apply to all pages
reader = PdfReader("document.pdf")
writer = PdfWriter()
for page in reader.pages:
    page.merge_page(watermark)
    writer.add_page(page)
with open("watermarked.pdf", "wb") as output:
    writer.write(output)
```
### Extract Images
```bash
# Using pdfimages (poppler-utils)
pdfimages -j input.pdf output_prefix
# This extracts all images as output_prefix-000.jpg, output_prefix-001.jpg, etc.
```
### Password Protection
```python
from pypdf import PdfReader, PdfWriter
reader = PdfReader("input.pdf")
writer = PdfWriter()
for page in reader.pages:
    writer.add_page(page)
# Add password
writer.encrypt("userpassword", "ownerpassword")
with open("encrypted.pdf", "wb") as output:
    writer.write(output)
```
## Quick Reference
| Task | Best Tool | Command/Code |
|------|-----------|--------------|
| Merge PDFs | pypdf | `writer.add_page(page)` |
| Split PDFs | pypdf | One page per file |
| Extract text | pdfplumber | `page.extract_text()` |
| Extract tables | pdfplumber | `page.extract_tables()` |
| Create PDFs | reportlab | Canvas or Platypus |
| Command line merge | qpdf | `qpdf --empty --pages ...` |
| OCR scanned PDFs | pytesseract | Convert to image first |
| Fill PDF forms | pdf-lib or pypdf (see forms.md) | See forms.md |
## Next Steps
- For advanced pypdfium2 usage, see reference.md
- For JavaScript libraries (pdf-lib), see reference.md
- If you need to fill out a PDF form, follow the instructions in forms.md
- For troubleshooting guides, see reference.md


@ -1,294 +0,0 @@
**CRITICAL: You MUST complete these steps in order. Do not skip ahead to writing code.**
If you need to fill out a PDF form, first check to see if the PDF has fillable form fields. Run this script from this file's directory:
`python scripts/extract_form_field_info.py --check <file.pdf>`. Depending on the result, go to either the "Fillable fields" or the "Non-fillable fields" section and follow its instructions.
# Fillable fields
If the PDF has fillable form fields:
- Run this script from this file's directory: `python scripts/extract_form_field_info.py <input.pdf> <field_info.json>`. It will create a JSON file with a list of fields in this format:
```
[
  {
    "field_id": (unique ID for the field),
    "page": (page number, 1-based),
    "rect": ([left, bottom, right, top] bounding box in PDF coordinates, y=0 is the bottom of the page),
    "type": ("text", "checkbox", "radio_group", or "choice"),
  },
  // Checkboxes have "checked_value" and "unchecked_value" properties:
  {
    "field_id": (unique ID for the field),
    "page": (page number, 1-based),
    "type": "checkbox",
    "checked_value": (Set the field to this value to check the checkbox),
    "unchecked_value": (Set the field to this value to uncheck the checkbox),
  },
  // Radio groups have a "radio_options" list with the possible choices.
  {
    "field_id": (unique ID for the field),
    "page": (page number, 1-based),
    "type": "radio_group",
    "radio_options": [
      {
        "value": (set the field to this value to select this radio option),
        "rect": (bounding box for the radio button for this option)
      },
      // Other radio options
    ]
  },
  // Multiple choice fields have a "choice_options" list with the possible choices:
  {
    "field_id": (unique ID for the field),
    "page": (page number, 1-based),
    "type": "choice",
    "choice_options": [
      {
        "value": (set the field to this value to select this option),
        "text": (display text of the option)
      },
      // Other choice options
    ],
  }
]
```
- Convert the PDF to PNGs (one image for each page) with this script (run from this file's directory):
`python scripts/convert_pdf_to_images.py <file.pdf> <output_directory>`
Then analyze the images to determine the purpose of each form field (make sure to convert the bounding box PDF coordinates to image coordinates).
- Create a `field_values.json` file in this format with the values to be entered for each field:
```
[
  {
    "field_id": "last_name", // Must match the field_id from `extract_form_field_info.py`
    "description": "The user's last name",
    "page": 1, // Must match the "page" value in field_info.json
    "value": "Simpson"
  },
  {
    "field_id": "Checkbox12",
    "description": "Checkbox to be checked if the user is 18 or over",
    "page": 1,
    "value": "/On" // If this is a checkbox, use its "checked_value" value to check it. If it's a radio button group, use one of the "value" values in "radio_options".
  },
  // more fields
]
```
- Run the `fill_fillable_fields.py` script from this file's directory to create a filled-in PDF:
`python scripts/fill_fillable_fields.py <input pdf> <field_values.json> <output pdf>`
This script will verify that the field IDs and values you provide are valid; if it prints error messages, correct the appropriate fields and try again.
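The coordinate conversion mentioned in the image-analysis step can be sketched as below. It assumes the `rect` format described above (`[left, bottom, right, top]`, y=0 at the bottom of the page) and an image whose origin is the top-left corner. The function name and parameters are hypothetical, and the page and image sizes must be read from the actual PDF and PNG.

```python
def pdf_rect_to_image_box(rect, pdf_width, pdf_height, image_width, image_height):
    """Convert [left, bottom, right, top] in PDF points (origin bottom-left)
    to (left, top, right, bottom) in image pixels (origin top-left)."""
    left, bottom, right, top = rect
    scale_x = image_width / pdf_width
    scale_y = image_height / pdf_height
    return (
        left * scale_x,
        (pdf_height - top) * scale_y,     # flip the y-axis: PDF top becomes image top
        right * scale_x,
        (pdf_height - bottom) * scale_y,  # PDF bottom becomes image bottom
    )
```

For a US Letter page (612 x 792 points) rendered at twice that size, a field at `[100, 700, 200, 720]` maps to the pixel box `(200, 144, 400, 184)`.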
# Non-fillable fields
If the PDF doesn't have fillable form fields, you'll add text annotations. First try to extract coordinates from the PDF structure (more accurate), then fall back to visual estimation if needed.
## Step 1: Try Structure Extraction First
Run this script to extract text labels, lines, and checkboxes with their exact PDF coordinates:
`python scripts/extract_form_structure.py <input.pdf> form_structure.json`
This creates a JSON file containing:
- **labels**: Every text element with exact coordinates (x0, top, x1, bottom in PDF points)
- **lines**: Horizontal lines that define row boundaries
- **checkboxes**: Small square rectangles that are checkboxes (with center coordinates)
- **row_boundaries**: Row top/bottom positions calculated from horizontal lines
**Check the results**: If `form_structure.json` has meaningful labels (text elements that correspond to form fields), use **Approach A: Structure-Based Coordinates**. If the PDF is scanned/image-based and has few or no labels, use **Approach B: Visual Estimation**.
---
## Approach A: Structure-Based Coordinates (Preferred)
Use this when `extract_form_structure.py` found text labels in the PDF.
### A.1: Analyze the Structure
Read form_structure.json and identify:
1. **Label groups**: Adjacent text elements that form a single label (e.g., "Last" + "Name")
2. **Row structure**: Labels with similar `top` values are in the same row
3. **Field columns**: Entry areas start after label ends (x0 = label.x1 + gap)
4. **Checkboxes**: Use the checkbox coordinates directly from the structure
**Coordinate system**: PDF coordinates where y=0 is at TOP of page, y increases downward.
### A.2: Check for Missing Elements
The structure extraction may not detect all form elements. Common cases:
- **Circular checkboxes**: Only square rectangles are detected as checkboxes
- **Complex graphics**: Decorative elements or non-standard form controls
- **Faded or light-colored elements**: May not be extracted
If you see form fields in the PDF images that aren't in form_structure.json, you'll need to use **visual analysis** for those specific fields (see "Hybrid Approach" below).
### A.3: Create fields.json with PDF Coordinates
For each field, calculate entry coordinates from the extracted structure:
**Text fields:**
- entry x0 = label x1 + 5 (small gap after label)
- entry x1 = next label's x0, or row boundary
- entry top = same as label top
- entry bottom = row boundary line below, or label bottom + row_height
**Checkboxes:**
- Use the checkbox rectangle coordinates directly from form_structure.json
- entry_bounding_box = [checkbox.x0, checkbox.top, checkbox.x1, checkbox.bottom]
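The text-field rules above amount to a small calculation. The sketch below assumes a label dict with `x1` and `top` keys; the `text_entry_box` helper and its parameter names are hypothetical.

```python
def text_entry_box(label, next_label_x0, row_bottom, gap=5):
    """Return [x0, top, x1, bottom] for the entry area to the right of a label."""
    return [label["x1"] + gap, label["top"], next_label_x0, row_bottom]


# A label ending at x1=87 on a row whose boundary line sits at y=79
label = {"x0": 43, "top": 63, "x1": 87, "bottom": 73}
print(text_entry_box(label, next_label_x0=260, row_bottom=79))  # [92, 63, 260, 79]
```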
Create fields.json using `pdf_width` and `pdf_height` (signals PDF coordinates):
```json
{
"pages": [
{"page_number": 1, "pdf_width": 612, "pdf_height": 792}
],
"form_fields": [
{
"page_number": 1,
"description": "Last name entry field",
"field_label": "Last Name",
"label_bounding_box": [43, 63, 87, 73],
"entry_bounding_box": [92, 63, 260, 79],
"entry_text": {"text": "Smith", "font_size": 10}
},
{
"page_number": 1,
"description": "US Citizen Yes checkbox",
"field_label": "Yes",
"label_bounding_box": [260, 200, 280, 210],
"entry_bounding_box": [285, 197, 292, 205],
"entry_text": {"text": "X"}
}
]
}
```
**Important**: Use `pdf_width`/`pdf_height` and coordinates directly from form_structure.json.
### A.4: Validate Bounding Boxes
Before filling, check your bounding boxes for errors:
`python scripts/check_bounding_boxes.py fields.json`
This checks for intersecting bounding boxes and entry boxes that are too small for the font size. Fix any reported errors before filling.
---
## Approach B: Visual Estimation (Fallback)
Use this when the PDF is scanned/image-based and structure extraction found no usable text labels (e.g., all text shows as "(cid:X)" patterns).
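One way to make the "no usable text labels" decision mechanical is to measure how many extracted labels are `(cid:X)` placeholders. This is a rough sketch, assuming the label texts have already been pulled out of form_structure.json as a list of strings; the `labels_are_usable` helper and the 50% threshold are hypothetical.

```python
import re

CID_PATTERN = re.compile(r"\(cid:\d+\)")


def labels_are_usable(label_texts, max_cid_fraction=0.5):
    """Return False when there are no labels or most of them contain "(cid:N)" placeholders."""
    if not label_texts:
        return False
    garbled = sum(1 for text in label_texts if CID_PATTERN.search(text))
    return garbled / len(label_texts) <= max_cid_fraction
```

When this returns False, Approach B is likely the better starting point.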
### B.1: Convert PDF to Images
`python scripts/convert_pdf_to_images.py <input.pdf> <images_dir/>`
### B.2: Initial Field Identification
Examine each page image to identify form sections and get **rough estimates** of field locations:
- Form field labels and their approximate positions
- Entry areas (lines, boxes, or blank spaces for text input)
- Checkboxes and their approximate locations
For each field, note approximate pixel coordinates (they don't need to be precise yet).
### B.3: Zoom Refinement (CRITICAL for accuracy)
For each field, crop a region around the estimated position to refine coordinates precisely.
**Create a zoomed crop using ImageMagick:**
```bash
magick <page_image> -crop <width>x<height>+<x>+<y> +repage <crop_output.png>
```
Where:
- `<x>, <y>` = top-left corner of crop region (use your rough estimate minus padding)
- `<width>, <height>` = size of crop region (field area plus ~50px padding on each side)
**Example:** To refine a "Name" field estimated around (100, 150):
```bash
magick images_dir/page_1.png -crop 300x80+50+120 +repage crops/name_field.png
```
(Note: if the `magick` command isn't available, as on ImageMagick 6 installs, try `convert` with the same arguments.)
**Examine the cropped image** to determine precise coordinates:
1. Identify the exact pixel where the entry area begins (after the label)
2. Identify where the entry area ends (before next field or edge)
3. Identify the top and bottom of the entry line/box
**Convert crop coordinates back to full image coordinates:**
- full_x = crop_x + crop_offset_x
- full_y = crop_y + crop_offset_y
Example: If the crop started at (50, 120) and the entry box starts at (52, 18) within the crop:
- entry_x0 = 52 + 50 = 102
- entry_top = 18 + 120 = 138
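The conversion above is a simple offset addition. A minimal sketch, with a hypothetical `crop_to_full` helper:

```python
def crop_to_full(crop_x, crop_y, crop_offset_x, crop_offset_y):
    """Map a point measured inside a crop back to full-image pixel coordinates."""
    return crop_x + crop_offset_x, crop_y + crop_offset_y


# Crop started at (50, 120); the entry box starts at (52, 18) inside the crop
print(crop_to_full(52, 18, 50, 120))  # (102, 138)
```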
**Repeat for each field**, grouping nearby fields into single crops when possible.
### B.4: Create fields.json with Refined Coordinates
Create fields.json using `image_width` and `image_height` (signals image coordinates):
```json
{
"pages": [
{"page_number": 1, "image_width": 1700, "image_height": 2200}
],
"form_fields": [
{
"page_number": 1,
"description": "Last name entry field",
"field_label": "Last Name",
"label_bounding_box": [120, 175, 242, 198],
"entry_bounding_box": [255, 175, 720, 218],
"entry_text": {"text": "Smith", "font_size": 10}
}
]
}
```
**Important**: Use `image_width`/`image_height` and the refined pixel coordinates from the zoom analysis.
### B.5: Validate Bounding Boxes
Before filling, check your bounding boxes for errors:
`python scripts/check_bounding_boxes.py fields.json`
This checks for intersecting bounding boxes and entry boxes that are too small for the font size. Fix any reported errors before filling.
---
## Hybrid Approach: Structure + Visual
Use this when structure extraction works for most fields but misses some elements (e.g., circular checkboxes, unusual form controls).
1. **Use Approach A** for fields that were detected in form_structure.json
2. **Convert PDF to images** for visual analysis of missing fields
3. **Use zoom refinement** (from Approach B) for the missing fields
4. **Combine coordinates**: For fields from structure extraction, use `pdf_width`/`pdf_height`. For visually-estimated fields, you must convert image coordinates to PDF coordinates:
- pdf_x = image_x * (pdf_width / image_width)
- pdf_y = image_y * (pdf_height / image_height)
5. **Use a single coordinate system** in fields.json - convert all to PDF coordinates with `pdf_width`/`pdf_height`
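The image-to-PDF conversion in step 4 can be applied to a whole bounding box at once. This sketch assumes boxes are `[x0, top, x1, bottom]` lists as in the fields.json examples; `image_box_to_pdf` is a hypothetical helper name.

```python
def image_box_to_pdf(box, image_width, image_height, pdf_width, pdf_height):
    """Scale an image-pixel box [x0, top, x1, bottom] into PDF coordinates."""
    x_scale = pdf_width / image_width
    y_scale = pdf_height / image_height
    x0, top, x1, bottom = box
    return [x0 * x_scale, top * y_scale, x1 * x_scale, bottom * y_scale]


# A 1700x2200 px render of a 612x792 pt page scales both axes by 0.36
pdf_box = image_box_to_pdf([255, 175, 720, 218], 1700, 2200, 612, 792)
```

Both axes use the same origin (top-left), so only scaling is needed and no y-axis flip.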
---
## Step 2: Validate Before Filling
**Always validate bounding boxes before filling:**
`python scripts/check_bounding_boxes.py fields.json`
This checks for:
- Intersecting bounding boxes (which would cause overlapping text)
- Entry boxes that are too small for the specified font size
Fix any reported errors in fields.json before proceeding.
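For intuition, the two checks can be sketched as below. This is an illustration only, not the implementation of `check_bounding_boxes.py`, whose exact rules may differ; it assumes boxes are `[x0, top, x1, bottom]` and that box height and font size share the same units.

```python
def boxes_intersect(a, b):
    """True when two [x0, top, x1, bottom] boxes overlap with positive area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def entry_too_small(entry_box, font_size):
    """True when the entry box is shorter than the font size (assumed rule)."""
    return (entry_box[3] - entry_box[1]) < font_size
```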
## Step 3: Fill the Form
The fill script auto-detects the coordinate system and handles conversion:
`python scripts/fill_pdf_form_with_annotations.py <input.pdf> fields.json <output.pdf>`
## Step 4: Verify Output
Convert the filled PDF to images and verify text placement:
`python scripts/convert_pdf_to_images.py <output.pdf> <verify_images/>`
If text is mispositioned:
- **Approach A**: Check that you're using PDF coordinates from form_structure.json with `pdf_width`/`pdf_height`
- **Approach B**: Check that image dimensions match and coordinates are accurate pixels
- **Hybrid**: Ensure coordinate conversions are correct for visually-estimated fields


@ -1,612 +0,0 @@
# PDF Processing Advanced Reference
This document contains advanced PDF processing features, detailed examples, and additional libraries not covered in the main skill instructions.
## pypdfium2 Library (Apache/BSD License)
### Overview
pypdfium2 is a Python binding for PDFium (Chromium's PDF library). It is well suited to fast PDF rendering and image generation, and can serve as a PyMuPDF replacement.
### Render PDF to Images
```python
import pypdfium2 as pdfium
from PIL import Image
# Load PDF
pdf = pdfium.PdfDocument("document.pdf")
# Render page to image
page = pdf[0] # First page
bitmap = page.render(
    scale=2.0,   # Higher resolution
    rotation=0,  # No rotation
)
# Convert to PIL Image
img = bitmap.to_pil()
img.save("page_1.png", "PNG")
# Process multiple pages
for i, page in enumerate(pdf):
    bitmap = page.render(scale=1.5)
    img = bitmap.to_pil()
    img.save(f"page_{i+1}.jpg", "JPEG", quality=90)
```
### Extract Text with pypdfium2
```python
import pypdfium2 as pdfium
pdf = pdfium.PdfDocument("document.pdf")
for i, page in enumerate(pdf):
    # pypdfium2 exposes page text through a text page object
    textpage = page.get_textpage()
    text = textpage.get_text_range()
    print(f"Page {i+1} text length: {len(text)} chars")
```
## JavaScript Libraries
### pdf-lib (MIT License)
pdf-lib is a powerful JavaScript library for creating and modifying PDF documents in any JavaScript environment.
#### Load and Manipulate Existing PDF
```javascript
import { PDFDocument } from 'pdf-lib';
import fs from 'fs';
async function manipulatePDF() {
// Load existing PDF
const existingPdfBytes = fs.readFileSync('input.pdf');
const pdfDoc = await PDFDocument.load(existingPdfBytes);
// Get page count
const pageCount = pdfDoc.getPageCount();
console.log(`Document has ${pageCount} pages`);
// Add new page
const newPage = pdfDoc.addPage([600, 400]);
newPage.drawText('Added by pdf-lib', {
x: 100,
y: 300,
size: 16
});
// Save modified PDF
const pdfBytes = await pdfDoc.save();
fs.writeFileSync('modified.pdf', pdfBytes);
}
```
#### Create Complex PDFs from Scratch
```javascript
import { PDFDocument, rgb, StandardFonts } from 'pdf-lib';
import fs from 'fs';
async function createPDF() {
const pdfDoc = await PDFDocument.create();
// Add fonts
const helveticaFont = await pdfDoc.embedFont(StandardFonts.Helvetica);
const helveticaBold = await pdfDoc.embedFont(StandardFonts.HelveticaBold);
// Add page
const page = pdfDoc.addPage([595, 842]); // A4 size
const { width, height } = page.getSize();
// Add text with styling
page.drawText('Invoice #12345', {
x: 50,
y: height - 50,
size: 18,
font: helveticaBold,
color: rgb(0.2, 0.2, 0.8)
});
// Add rectangle (header background)
page.drawRectangle({
x: 40,
y: height - 100,
width: width - 80,
height: 30,
color: rgb(0.9, 0.9, 0.9)
});
// Add table-like content
const items = [
['Item', 'Qty', 'Price', 'Total'],
['Widget', '2', '$50', '$100'],
['Gadget', '1', '$75', '$75']
];
let yPos = height - 150;
items.forEach(row => {
let xPos = 50;
row.forEach(cell => {
page.drawText(cell, {
x: xPos,
y: yPos,
size: 12,
font: helveticaFont
});
xPos += 120;
});
yPos -= 25;
});
const pdfBytes = await pdfDoc.save();
fs.writeFileSync('created.pdf', pdfBytes);
}
```
#### Advanced Merge and Split Operations
```javascript
import { PDFDocument } from 'pdf-lib';
import fs from 'fs';
async function mergePDFs() {
// Create new document
const mergedPdf = await PDFDocument.create();
// Load source PDFs
const pdf1Bytes = fs.readFileSync('doc1.pdf');
const pdf2Bytes = fs.readFileSync('doc2.pdf');
const pdf1 = await PDFDocument.load(pdf1Bytes);
const pdf2 = await PDFDocument.load(pdf2Bytes);
// Copy pages from first PDF
const pdf1Pages = await mergedPdf.copyPages(pdf1, pdf1.getPageIndices());
pdf1Pages.forEach(page => mergedPdf.addPage(page));
// Copy specific pages from second PDF (pages 0, 2, 4)
const pdf2Pages = await mergedPdf.copyPages(pdf2, [0, 2, 4]);
pdf2Pages.forEach(page => mergedPdf.addPage(page));
const mergedPdfBytes = await mergedPdf.save();
fs.writeFileSync('merged.pdf', mergedPdfBytes);
}
```
### pdfjs-dist (Apache License)
PDF.js is Mozilla's JavaScript library for rendering PDFs in the browser.
#### Basic PDF Loading and Rendering
```javascript
import * as pdfjsLib from 'pdfjs-dist';
// Configure worker (important for performance)
pdfjsLib.GlobalWorkerOptions.workerSrc = './pdf.worker.js';
async function renderPDF() {
// Load PDF
const loadingTask = pdfjsLib.getDocument('document.pdf');
const pdf = await loadingTask.promise;
console.log(`Loaded PDF with ${pdf.numPages} pages`);
// Get first page
const page = await pdf.getPage(1);
const viewport = page.getViewport({ scale: 1.5 });
// Render to canvas
const canvas = document.createElement('canvas');
const context = canvas.getContext('2d');
canvas.height = viewport.height;
canvas.width = viewport.width;
const renderContext = {
canvasContext: context,
viewport: viewport
};
await page.render(renderContext).promise;
document.body.appendChild(canvas);
}
```
#### Extract Text with Coordinates
```javascript
import * as pdfjsLib from 'pdfjs-dist';
async function extractText() {
const loadingTask = pdfjsLib.getDocument('document.pdf');
const pdf = await loadingTask.promise;
let fullText = '';
// Extract text from all pages
for (let i = 1; i <= pdf.numPages; i++) {
const page = await pdf.getPage(i);
const textContent = await page.getTextContent();
const pageText = textContent.items
.map(item => item.str)
.join(' ');
fullText += `\n--- Page ${i} ---\n${pageText}`;
// Get text with coordinates for advanced processing
const textWithCoords = textContent.items.map(item => ({
text: item.str,
x: item.transform[4],
y: item.transform[5],
width: item.width,
height: item.height
}));
}
console.log(fullText);
return fullText;
}
```
#### Extract Annotations and Forms
```javascript
import * as pdfjsLib from 'pdfjs-dist';
async function extractAnnotations() {
const loadingTask = pdfjsLib.getDocument('annotated.pdf');
const pdf = await loadingTask.promise;
for (let i = 1; i <= pdf.numPages; i++) {
const page = await pdf.getPage(i);
const annotations = await page.getAnnotations();
annotations.forEach(annotation => {
console.log(`Annotation type: ${annotation.subtype}`);
console.log(`Content: ${annotation.contents}`);
console.log(`Coordinates: ${JSON.stringify(annotation.rect)}`);
});
}
}
```
## Advanced Command-Line Operations
### poppler-utils Advanced Features
#### Extract Text with Bounding Box Coordinates
```bash
# Extract text with bounding box coordinates (essential for structured data)
pdftotext -bbox-layout document.pdf output.xml
# The XML output contains precise coordinates for each text element
```
#### Advanced Image Conversion
```bash
# Convert to PNG images with specific resolution
pdftoppm -png -r 300 document.pdf output_prefix
# Convert specific page range with high resolution
pdftoppm -png -r 600 -f 1 -l 3 document.pdf high_res_pages
# Convert to JPEG with quality setting
pdftoppm -jpeg -jpegopt quality=85 -r 200 document.pdf jpeg_output
```
#### Extract Embedded Images
```bash
# Extract all embedded images with metadata
pdfimages -j -p document.pdf page_images
# List image info without extracting
pdfimages -list document.pdf
# Extract images in their original format
pdfimages -all document.pdf images/img
```
### qpdf Advanced Features
#### Complex Page Manipulation
```bash
# Split PDF into groups of 3 pages (%d is replaced with the zero-padded page range)
qpdf --split-pages=3 input.pdf output_group_%d.pdf
# Extract specific pages with complex ranges
qpdf input.pdf --pages input.pdf 1,3-5,8,10-end -- extracted.pdf
# Merge specific pages from multiple PDFs
qpdf --empty --pages doc1.pdf 1-3 doc2.pdf 5-7 doc3.pdf 2,4 -- combined.pdf
```
#### PDF Optimization and Repair
```bash
# Optimize PDF for web (linearize for streaming)
qpdf --linearize input.pdf optimized.pdf
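# Recompress streams and pack objects into object streams (unreferenced objects are dropped on rewrite)
qpdf --object-streams=generate --compress-streams=y --recompress-flate --compression-level=9 input.pdf compressed.pdf
# Check for structural problems, then rewrite the file (qpdf attempts recovery while reading damaged input)
qpdf --check damaged.pdf
qpdf damaged.pdf repaired.pdf
# Show page objects for debugging
qpdf --show-pages input.pdf > structure.txt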
```
#### Advanced Encryption
```bash
# Add password protection with specific permissions
qpdf --encrypt user_pass owner_pass 256 --print=none --modify=none -- input.pdf encrypted.pdf
# Check encryption status
qpdf --show-encryption encrypted.pdf
# Remove password protection (requires password)
qpdf --password=secret123 --decrypt encrypted.pdf decrypted.pdf
```
## Advanced Python Techniques
### pdfplumber Advanced Features
#### Extract Text with Precise Coordinates
```python
import pdfplumber
with pdfplumber.open("document.pdf") as pdf:
    page = pdf.pages[0]
    # Extract all text with coordinates
    chars = page.chars
    for char in chars[:10]:  # First 10 characters
        print(f"Char: '{char['text']}' at x:{char['x0']:.1f} y:{char['y0']:.1f}")
    # Extract text by bounding box (left, top, right, bottom)
    bbox_text = page.within_bbox((100, 100, 400, 200)).extract_text()
```
#### Advanced Table Extraction with Custom Settings
```python
import pdfplumber
import pandas as pd
with pdfplumber.open("complex_table.pdf") as pdf:
    page = pdf.pages[0]
    # Extract tables with custom settings for complex layouts
    table_settings = {
        "vertical_strategy": "lines",
        "horizontal_strategy": "lines",
        "snap_tolerance": 3,
        "intersection_tolerance": 15
    }
    tables = page.extract_tables(table_settings)
    # Visual debugging for table extraction
    img = page.to_image(resolution=150)
    img.save("debug_layout.png")
```
### reportlab Advanced Features
#### Create Professional Reports with Tables
```python
from reportlab.platypus import SimpleDocTemplate, Table, TableStyle, Paragraph
from reportlab.lib.styles import getSampleStyleSheet
from reportlab.lib import colors
# Sample data
data = [
['Product', 'Q1', 'Q2', 'Q3', 'Q4'],
['Widgets', '120', '135', '142', '158'],
['Gadgets', '85', '92', '98', '105']
]
# Create PDF with table
doc = SimpleDocTemplate("report.pdf")
elements = []
# Add title
styles = getSampleStyleSheet()
title = Paragraph("Quarterly Sales Report", styles['Title'])
elements.append(title)
# Add table with advanced styling
table = Table(data)
table.setStyle(TableStyle([
('BACKGROUND', (0, 0), (-1, 0), colors.grey),
('TEXTCOLOR', (0, 0), (-1, 0), colors.whitesmoke),
('ALIGN', (0, 0), (-1, -1), 'CENTER'),
('FONTNAME', (0, 0), (-1, 0), 'Helvetica-Bold'),
('FONTSIZE', (0, 0), (-1, 0), 14),
('BOTTOMPADDING', (0, 0), (-1, 0), 12),
('BACKGROUND', (0, 1), (-1, -1), colors.beige),
('GRID', (0, 0), (-1, -1), 1, colors.black)
]))
elements.append(table)
doc.build(elements)
```
## Complex Workflows
### Extract Figures/Images from PDF
#### Method 1: Using pdfimages (fastest)
```bash
# Extract all images with original quality
pdfimages -all document.pdf images/img
```
#### Method 2: Using pypdfium2 + Image Processing
```python
import pypdfium2 as pdfium
from PIL import Image
import numpy as np
def extract_figures(pdf_path, output_dir):
    pdf = pdfium.PdfDocument(pdf_path)
    for page_num, page in enumerate(pdf):
        # Render high-resolution page
        bitmap = page.render(scale=3.0)
        img = bitmap.to_pil()
        # Convert to numpy for processing
        img_array = np.array(img)
        # Simple figure detection (non-white regions)
        mask = np.any(img_array != [255, 255, 255], axis=2)
        # Find contours and extract bounding boxes
        # (This is simplified - real implementation would need more sophisticated detection)
        # Save detected figures
        # ... implementation depends on specific needs
```
### Batch PDF Processing with Error Handling
```python
import os
import glob
from pypdf import PdfReader, PdfWriter
import logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
def batch_process_pdfs(input_dir, operation='merge'):
    pdf_files = glob.glob(os.path.join(input_dir, "*.pdf"))
    if operation == 'merge':
        writer = PdfWriter()
        for pdf_file in pdf_files:
            try:
                reader = PdfReader(pdf_file)
                for page in reader.pages:
                    writer.add_page(page)
                logger.info(f"Processed: {pdf_file}")
            except Exception as e:
                logger.error(f"Failed to process {pdf_file}: {e}")
                continue
        with open("batch_merged.pdf", "wb") as output:
            writer.write(output)
    elif operation == 'extract_text':
        for pdf_file in pdf_files:
            try:
                reader = PdfReader(pdf_file)
                text = ""
                for page in reader.pages:
                    text += page.extract_text()
                output_file = pdf_file.replace('.pdf', '.txt')
                with open(output_file, 'w', encoding='utf-8') as f:
                    f.write(text)
                logger.info(f"Extracted text from: {pdf_file}")
            except Exception as e:
                logger.error(f"Failed to extract text from {pdf_file}: {e}")
                continue
```
### Advanced PDF Cropping
```python
from pypdf import PdfWriter, PdfReader
reader = PdfReader("input.pdf")
writer = PdfWriter()
# Crop page (left, bottom, right, top in points)
page = reader.pages[0]
page.mediabox.left = 50
page.mediabox.bottom = 50
page.mediabox.right = 550
page.mediabox.top = 750
writer.add_page(page)
with open("cropped.pdf", "wb") as output:
    writer.write(output)
```
## Performance Optimization Tips
### 1. For Large PDFs
- Use streaming approaches instead of loading entire PDF in memory
- Use `qpdf --split-pages` for splitting large files
- Process pages individually with pypdfium2
### 2. For Text Extraction
- Plain `pdftotext` is fastest for plain text extraction; add `-bbox-layout` only when you need coordinates, since it emits XML rather than plain text
- Use pdfplumber for structured data and tables
- Avoid pypdf's `page.extract_text()` for very large documents
### 3. For Image Extraction
- `pdfimages` is much faster than rendering pages
- Use low resolution for previews, high resolution for final output
### 4. For Form Filling
- pdf-lib maintains form structure better than most alternatives
- Pre-validate form fields before processing
### 5. Memory Management
```python
from pypdf import PdfReader, PdfWriter

# Process PDFs in chunks
def process_large_pdf(pdf_path, chunk_size=10):
    reader = PdfReader(pdf_path)
    total_pages = len(reader.pages)
    for start_idx in range(0, total_pages, chunk_size):
        end_idx = min(start_idx + chunk_size, total_pages)
        writer = PdfWriter()
        for i in range(start_idx, end_idx):
            writer.add_page(reader.pages[i])
        # Write each chunk to its own file
        with open(f"chunk_{start_idx//chunk_size}.pdf", "wb") as output:
            writer.write(output)
```
## Troubleshooting Common Issues
### Encrypted PDFs
```python
# Handle password-protected PDFs
from pypdf import PdfReader
try:
    reader = PdfReader("encrypted.pdf")
    if reader.is_encrypted:
        reader.decrypt("password")
except Exception as e:
    print(f"Failed to decrypt: {e}")
```
### Corrupted PDFs
```bash
# Use qpdf to repair
qpdf --check corrupted.pdf
qpdf --replace-input corrupted.pdf
```
### Text Extraction Issues
```python
# Fallback to OCR for scanned PDFs
import pytesseract
from pdf2image import convert_from_path
def extract_text_with_ocr(pdf_path):
    images = convert_from_path(pdf_path)
    text = ""
    for i, image in enumerate(images):
        text += pytesseract.image_to_string(image)
    return text
```
## License Information
- **pypdf**: BSD License
- **pdfplumber**: MIT License
- **pypdfium2**: Apache/BSD License
- **reportlab**: BSD License
- **poppler-utils**: GPL-2 License
- **qpdf**: Apache License
- **pdf-lib**: MIT License
- **pdfjs-dist**: Apache License


@ -1,263 +0,0 @@
---
name: PowerPoint Presentation
description: "Use this skill any time a .pptx file is involved in any way -- as input, output, or both. This includes: creating slide decks, pitch decks, or presentations; reading, parsing, or extracting text from any .pptx file (even if the extracted content will be used elsewhere, like in an email or summary); editing, modifying, or updating existing presentations; combining or splitting slide files; working with templates, layouts, speaker notes, or comments. Trigger whenever the user mentions \"deck,\" \"slides,\" \"presentation,\" or references a .pptx filename, regardless of what they plan to do with the content afterward. If a .pptx file needs to be opened, created, or touched, use this skill."
version: 1.0.0
metadata:
emoji: "📊"
tags:
- office
- presentation
- pptx
install:
- id: brew-libreoffice
kind: brew
formula: libreoffice
bins: [soffice]
label: "Install LibreOffice for PDF conversion"
os: [darwin]
- id: brew-poppler
kind: brew
formula: poppler
bins: [pdftoppm]
label: "Install poppler for PDF to image conversion"
os: [darwin, linux]
- id: npm-pptxgenjs
kind: node
formula: pptxgenjs
bins: []
label: "Install PptxGenJS for creating presentations"
- id: npm-markitdown
kind: uv
formula: "markitdown[pptx]"
bins: [markitdown]
label: "Install markitdown for text extraction"
userInvocable: true
disableModelInvocation: false
---
# PPTX Skill
## Quick Reference
| Task | Guide |
|------|-------|
| Read/analyze content | `python -m markitdown presentation.pptx` |
| Edit or create from template | Read [editing.md](editing.md) |
| Create from scratch | Read [pptxgenjs.md](pptxgenjs.md) |
---
## Reading Content
```bash
# Text extraction
python -m markitdown presentation.pptx
# Visual overview
python scripts/thumbnail.py presentation.pptx
# Raw XML
python scripts/office/unpack.py presentation.pptx unpacked/
```
---
## Editing Workflow
**Read [editing.md](editing.md) for full details.**
1. Analyze template with `thumbnail.py`
2. Unpack → manipulate slides → edit content → clean → pack
---
## Creating from Scratch
**Read [pptxgenjs.md](pptxgenjs.md) for full details.**
Use when no template or reference presentation is available.
---
## Design Ideas
**Don't create boring slides.** Plain bullets on a white background won't impress anyone. Consider ideas from this list for each slide.
### Before Starting
- **Pick a bold, content-informed color palette**: The palette should feel designed for THIS topic. If swapping your colors into a completely different presentation would still "work," you haven't made specific enough choices.
- **Dominance over equality**: One color should dominate (60-70% visual weight), with 1-2 supporting tones and one sharp accent. Never give all colors equal weight.
- **Dark/light contrast**: Dark backgrounds for title + conclusion slides, light for content ("sandwich" structure). Or commit to dark throughout for a premium feel.
- **Commit to a visual motif**: Pick ONE distinctive element and repeat it -- rounded image frames, icons in colored circles, thick single-side borders. Carry it across every slide.
### Color Palettes
Choose colors that match your topic -- don't default to generic blue. Use these palettes as inspiration:
| Theme | Primary | Secondary | Accent |
|-------|---------|-----------|--------|
| **Midnight Executive** | `1E2761` (navy) | `CADCFC` (ice blue) | `FFFFFF` (white) |
| **Forest & Moss** | `2C5F2D` (forest) | `97BC62` (moss) | `F5F5F5` (cream) |
| **Coral Energy** | `F96167` (coral) | `F9E795` (gold) | `2F3C7E` (navy) |
| **Warm Terracotta** | `B85042` (terracotta) | `E7E8D1` (sand) | `A7BEAE` (sage) |
| **Ocean Gradient** | `065A82` (deep blue) | `1C7293` (teal) | `21295C` (midnight) |
| **Charcoal Minimal** | `36454F` (charcoal) | `F2F2F2` (off-white) | `212121` (black) |
| **Teal Trust** | `028090` (teal) | `00A896` (seafoam) | `02C39A` (mint) |
| **Berry & Cream** | `6D2E46` (berry) | `A26769` (dusty rose) | `ECE2D0` (cream) |
| **Sage Calm** | `84B59F` (sage) | `69A297` (eucalyptus) | `50808E` (slate) |
| **Cherry Bold** | `990011` (cherry) | `FCF6F5` (off-white) | `2F3C7E` (navy) |
### For Each Slide
**Every slide needs a visual element** -- image, chart, icon, or shape. Text-only slides are forgettable.
**Layout options:**
- Two-column (text left, illustration on right)
- Icon + text rows (icon in colored circle, bold header, description below)
- 2x2 or 2x3 grid (image on one side, grid of content blocks on other)
- Half-bleed image (full left or right side) with content overlay
**Data display:**
- Large stat callouts (big numbers 60-72pt with small labels below)
- Comparison columns (before/after, pros/cons, side-by-side options)
- Timeline or process flow (numbered steps, arrows)
**Visual polish:**
- Icons in small colored circles next to section headers
- Italic accent text for key stats or taglines
### Typography
**Choose an interesting font pairing** -- don't default to Arial. Pick a header font with personality and pair it with a clean body font.
| Header Font | Body Font |
|-------------|-----------|
| Georgia | Calibri |
| Arial Black | Arial |
| Calibri | Calibri Light |
| Cambria | Calibri |
| Trebuchet MS | Calibri |
| Impact | Arial |
| Palatino | Garamond |
| Consolas | Calibri |
| Element | Size |
|---------|------|
| Slide title | 36-44pt bold |
| Section header | 20-24pt bold |
| Body text | 14-16pt |
| Captions | 10-12pt muted |
### Spacing
- 0.5" minimum margins
- 0.3-0.5" between content blocks
- Leave breathing room--don't fill every inch
### Avoid (Common Mistakes)
- **Don't repeat the same layout** -- vary columns, cards, and callouts across slides
- **Don't center body text** -- left-align paragraphs and lists; center only titles
- **Don't skimp on size contrast** -- titles need 36pt+ to stand out from 14-16pt body
- **Don't default to blue** -- pick colors that reflect the specific topic
- **Don't mix spacing randomly** -- choose 0.3" or 0.5" gaps and use consistently
- **Don't style one slide and leave the rest plain** -- commit fully or keep it simple throughout
- **Don't create text-only slides** -- add images, icons, charts, or visual elements; avoid plain title + bullets
- **Don't forget text box padding** -- when aligning lines or shapes with text edges, set `margin: 0` on the text box or offset the shape to account for padding
- **Don't use low-contrast elements** -- icons AND text need strong contrast against the background; avoid light text on light backgrounds or dark text on dark backgrounds
- **NEVER use accent lines under titles** -- these are a hallmark of AI-generated slides; use whitespace or background color instead
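
The contrast guideline can be checked numerically. The sketch below uses the WCAG 2.x contrast-ratio formula, which is an assumption brought in here for illustration and not something this skill prescribes; the helper names are hypothetical, and colors are 6-digit hex strings without `#`, as in the palette table.

```python
def _linear(channel):
    """Convert an 8-bit sRGB channel to linear light (WCAG 2.x definition)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color):
    r, g, b = (int(hex_color[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(hex_a, hex_b):
    """Return the contrast ratio between two colors, from 1 (identical) to 21 (black on white)."""
    lighter, darker = sorted(
        (relative_luminance(hex_a), relative_luminance(hex_b)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)
```

WCAG suggests at least 4.5 for body text; a pair like `F5F5F5` on `FFFFFF` falls far below that.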
---
## QA (Required)
**Assume there are problems. Your job is to find them.**
Your first render is almost never correct. Approach QA as a bug hunt, not a confirmation step. If you found zero issues on first inspection, you weren't looking hard enough.
### Content QA
```bash
python -m markitdown output.pptx
```
Check for missing content, typos, wrong order.
**When using templates, check for leftover placeholder text:**
```bash
python -m markitdown output.pptx | grep -iE "xxxx|lorem|ipsum|this.*(page|slide).*layout"
```
If grep returns results, fix them before declaring success.
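The same check can be run from Python when the markitdown output is already in a string. This sketch mirrors the grep pattern above line by line; `find_placeholder_lines` is a hypothetical helper name.

```python
import re

# Same alternatives as the grep -iE pattern, matched case-insensitively
PLACEHOLDER = re.compile(r"xxxx|lorem|ipsum|this.*(page|slide).*layout", re.IGNORECASE)


def find_placeholder_lines(text):
    """Return the lines of `text` that still contain template placeholder wording."""
    return [line for line in text.splitlines() if PLACEHOLDER.search(line)]
```

An empty list corresponds to grep returning no results.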
### Visual QA
**USE SUBAGENTS** -- even for 2-3 slides. You've been staring at the code and will see what you expect, not what's there. Subagents have fresh eyes.
Convert slides to images (see [Converting to Images](#converting-to-images)), then use this prompt:
```
Visually inspect these slides. Assume there are issues -- find them.
Look for:
- Overlapping elements (text through shapes, lines through words, stacked elements)
- Text overflow or cut off at edges/box boundaries
- Decorative lines positioned for single-line text but title wrapped to two lines
- Source citations or footers colliding with content above
- Elements too close (< 0.3" gaps) or cards/sections nearly touching
- Uneven gaps (large empty area in one place, cramped in another)
- Insufficient margin from slide edges (< 0.5")
- Columns or similar elements not aligned consistently
- Low-contrast text (e.g., light gray text on cream-colored background)
- Low-contrast icons (e.g., dark icons on dark backgrounds without a contrasting circle)
- Text boxes too narrow causing excessive wrapping
- Leftover placeholder content
For each slide, list issues or areas of concern, even if minor.
Read and analyze these images:
1. /path/to/slide-01.jpg (Expected: [brief description])
2. /path/to/slide-02.jpg (Expected: [brief description])
Report ALL issues found, including minor ones.
```
### Verification Loop
1. Generate slides → Convert to images → Inspect
2. **List issues found** (if none found, look again more critically)
3. Fix issues
4. **Re-verify affected slides** -- one fix often creates another problem
5. Repeat until a full pass reveals no new issues
**Do not declare success until you've completed at least one fix-and-verify cycle.**
---
## Converting to Images
Convert presentations to individual slide images for visual inspection:
```bash
python scripts/office/soffice.py --headless --convert-to pdf output.pptx
pdftoppm -jpeg -r 150 output.pdf slide
```
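This creates one image per slide, such as `slide-01.jpg`, `slide-02.jpg`, etc. `pdftoppm` zero-pads the number to the digit count of the document's page total, so a deck with fewer than 10 slides yields `slide-1.jpg`, `slide-2.jpg`, and so on.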
To re-render specific slides after fixes:
```bash
pdftoppm -jpeg -r 150 -f N -l N output.pdf slide-fixed
```
---
## Dependencies
- `pip install "markitdown[pptx]"` - text extraction
- `pip install Pillow` - thumbnail grids
- `npm install -g pptxgenjs` - creating from scratch
- LibreOffice (`soffice`) - PDF conversion (auto-configured for sandboxed environments via `scripts/office/soffice.py`)
- Poppler (`pdftoppm`) - PDF to images


@ -1,205 +0,0 @@
# Editing Presentations
## Template-Based Workflow
When using an existing presentation as a template:
1. **Analyze existing slides**:
```bash
python scripts/thumbnail.py template.pptx
python -m markitdown template.pptx
```
Review `thumbnails.jpg` to see layouts, and markitdown output to see placeholder text.
2. **Plan slide mapping**: For each content section, choose a template slide.
⚠️ **USE VARIED LAYOUTS** — monotonous presentations are a common failure mode. Don't default to basic title + bullet slides. Actively seek out:
- Multi-column layouts (2-column, 3-column)
- Image + text combinations
- Full-bleed images with text overlay
- Quote or callout slides
- Section dividers
- Stat/number callouts
- Icon grids or icon + text rows
**Avoid:** Repeating the same text-heavy layout for every slide.
Match content type to layout style (e.g., key points → bullet slide, team info → multi-column, testimonials → quote slide).
3. **Unpack**: `python scripts/office/unpack.py template.pptx unpacked/`
4. **Build presentation** (do this yourself, not with subagents):
- Delete unwanted slides (remove from `<p:sldIdLst>`)
- Duplicate slides you want to reuse (`add_slide.py`)
- Reorder slides in `<p:sldIdLst>`
- **Complete all structural changes before step 5**
5. **Edit content**: Update text in each `slide{N}.xml`.
**Use subagents here if available** — slides are separate XML files, so subagents can edit in parallel.
6. **Clean**: `python scripts/clean.py unpacked/`
7. **Pack**: `python scripts/office/pack.py unpacked/ output.pptx --original template.pptx`
---
## Scripts
| Script | Purpose |
|--------|---------|
| `unpack.py` | Extract and pretty-print PPTX |
| `add_slide.py` | Duplicate slide or create from layout |
| `clean.py` | Remove orphaned files |
| `pack.py` | Repack with validation |
| `thumbnail.py` | Create visual grid of slides |
### unpack.py
```bash
python scripts/office/unpack.py input.pptx unpacked/
```
Extracts PPTX, pretty-prints XML, escapes smart quotes.
### add_slide.py
```bash
python scripts/add_slide.py unpacked/ slide2.xml # Duplicate slide
python scripts/add_slide.py unpacked/ slideLayout2.xml # From layout
```
Prints `<p:sldId>` to add to `<p:sldIdLst>` at desired position.
### clean.py
```bash
python scripts/clean.py unpacked/
```
Removes slides not in `<p:sldIdLst>`, unreferenced media, orphaned rels.
### pack.py
```bash
python scripts/office/pack.py unpacked/ output.pptx --original input.pptx
```
Validates, repairs, condenses XML, re-encodes smart quotes.
### thumbnail.py
```bash
python scripts/thumbnail.py input.pptx [output_prefix] [--cols N]
```
Creates `thumbnails.jpg` with slide filenames as labels. Default 3 columns, max 12 per grid.
**Use for template analysis only** (choosing layouts). For visual QA, use `soffice` + `pdftoppm` to create full-resolution individual slide images—see SKILL.md.
---
## Slide Operations
Slide order is defined in `ppt/presentation.xml`, under `<p:sldIdLst>`.
**Reorder**: Rearrange `<p:sldId>` elements.
**Delete**: Remove `<p:sldId>`, then run `clean.py`.
**Add**: Use `add_slide.py`. Never manually copy slide files—the script handles notes references, Content_Types.xml, and relationship IDs that manual copying misses.
---
## Editing Content
**Subagents:** If available, use them here (after completing step 4). Each slide is a separate XML file, so subagents can edit in parallel. In your prompt to subagents, include:
- The slide file path(s) to edit
- **"Use the Edit tool for all changes"**
- The formatting rules and common pitfalls below
For each slide:
1. Read the slide's XML
2. Identify ALL placeholder content—text, images, charts, icons, captions
3. Replace each placeholder with final content
**Use the Edit tool, not sed or Python scripts.** The Edit tool forces specificity about what to replace and where, yielding better reliability.
### Formatting Rules
- **Bold all headers, subheadings, and inline labels**: Use `b="1"` on `<a:rPr>`. This includes:
- Slide titles
- Section headers within a slide
- Inline labels (e.g., "Status:", "Description:") at the start of a line
- **Never use unicode bullets (•)**: Use proper list formatting with `<a:buChar>` or `<a:buAutoNum>`
- **Bullet consistency**: Let bullets inherit from the layout. Only specify `<a:buChar>` or `<a:buNone>`.
---
## Common Pitfalls
### Template Adaptation
When source content has fewer items than the template:
- **Remove excess elements entirely** (images, shapes, text boxes), don't just clear text
- Check for orphaned visuals after clearing text content
- Run visual QA to catch mismatched counts
When replacing text with different length content:
- **Shorter replacements**: Usually safe
- **Longer replacements**: May overflow or wrap unexpectedly
- Test with visual QA after text changes
- Consider truncating or splitting content to fit the template's design constraints
**Template slots ≠ Source items**: If template has 4 team members but source has 3 users, delete the 4th member's entire group (image + text boxes), not just the text.
### Multi-Item Content
If source has multiple items (numbered lists, multiple sections), create separate `<a:p>` elements for each — **never concatenate into one string**.
**❌ WRONG** — all items in one paragraph:
```xml
<a:p>
<a:r><a:rPr .../><a:t>Step 1: Do the first thing. Step 2: Do the second thing.</a:t></a:r>
</a:p>
```
**✅ CORRECT** — separate paragraphs with bold headers:
```xml
<a:p>
<a:pPr algn="l"><a:lnSpc><a:spcPts val="3919"/></a:lnSpc></a:pPr>
<a:r><a:rPr lang="en-US" sz="2799" b="1" .../><a:t>Step 1</a:t></a:r>
</a:p>
<a:p>
<a:pPr algn="l"><a:lnSpc><a:spcPts val="3919"/></a:lnSpc></a:pPr>
<a:r><a:rPr lang="en-US" sz="2799" .../><a:t>Do the first thing.</a:t></a:r>
</a:p>
<a:p>
<a:pPr algn="l"><a:lnSpc><a:spcPts val="3919"/></a:lnSpc></a:pPr>
<a:r><a:rPr lang="en-US" sz="2799" b="1" .../><a:t>Step 2</a:t></a:r>
</a:p>
<!-- continue pattern -->
```
Copy `<a:pPr>` from the original paragraph to preserve line spacing. Use `b="1"` on headers.
### Smart Quotes
Smart quotes are handled automatically by unpack/pack, but the Edit tool converts them to ASCII.
**When adding new text with quotes, use XML entities:**
```xml
<a:t>the &#x201C;Agreement&#x201D;</a:t>
```
| Character | Name | Unicode | XML Entity |
|-----------|------|---------|------------|
| `“` | Left double quote | U+201C | `&#x201C;` |
| `”` | Right double quote | U+201D | `&#x201D;` |
| `‘` | Left single quote | U+2018 | `&#x2018;` |
| `’` | Right single quote | U+2019 | `&#x2019;` |
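As a rough illustration of the mapping in the table above, the sketch below converts typographic quotes in a string to their XML numeric entities before the text is pasted into an `<a:t>` element. The helper name `encodeSmartQuotes` is hypothetical and not part of the bundled scripts.

```javascript
// Hypothetical helper (not one of the bundled scripts): maps the four
// smart-quote characters from the table above to XML numeric entities.
const SMART_QUOTE_ENTITIES = {
  "\u201C": "&#x201C;", // left double quote
  "\u201D": "&#x201D;", // right double quote
  "\u2018": "&#x2018;", // left single quote
  "\u2019": "&#x2019;", // right single quote
};

function encodeSmartQuotes(text) {
  // Replace each smart quote with its entity; all other characters pass through.
  return text.replace(/[\u201C\u201D\u2018\u2019]/g, (ch) => SMART_QUOTE_ENTITIES[ch]);
}

console.log(encodeSmartQuotes("the \u201CAgreement\u201D"));
```

This only prepares the replacement string; the actual change to the slide XML would still go through the Edit tool.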
### Other
- **Whitespace**: Use `xml:space="preserve"` on `<a:t>` with leading/trailing spaces
- **XML parsing**: Use `defusedxml.minidom`, not `xml.etree.ElementTree` (corrupts namespaces)


@ -1,420 +0,0 @@
# PptxGenJS Tutorial
## Setup & Basic Structure
```javascript
const pptxgen = require("pptxgenjs");
let pres = new pptxgen();
pres.layout = 'LAYOUT_16x9'; // or 'LAYOUT_16x10', 'LAYOUT_4x3', 'LAYOUT_WIDE'
pres.author = 'Your Name';
pres.title = 'Presentation Title';
let slide = pres.addSlide();
slide.addText("Hello World!", { x: 0.5, y: 0.5, fontSize: 36, color: "363636" });
pres.writeFile({ fileName: "Presentation.pptx" });
```
## Layout Dimensions
Slide dimensions (coordinates in inches):
- `LAYOUT_16x9`: 10" × 5.625" (default)
- `LAYOUT_16x10`: 10" × 6.25"
- `LAYOUT_4x3`: 10" × 7.5"
- `LAYOUT_WIDE`: 13.3" × 7.5"
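
Because every coordinate is in inches, it can help to check that a box stays inside the chosen layout before adding it. The following is a minimal sketch using the dimensions listed above; `LAYOUT_SIZES` and `fitsOnSlide` are hypothetical helpers, not part of PptxGenJS.

```javascript
// Slide sizes in inches, copied from the list above.
const LAYOUT_SIZES = {
  LAYOUT_16x9: { w: 10, h: 5.625 },
  LAYOUT_16x10: { w: 10, h: 6.25 },
  LAYOUT_4x3: { w: 10, h: 7.5 },
  LAYOUT_WIDE: { w: 13.3, h: 7.5 },
};

// Hypothetical helper: true when the box {x, y, w, h} lies fully on the slide.
function fitsOnSlide(layout, box) {
  const size = LAYOUT_SIZES[layout];
  if (!size) throw new Error(`Unknown layout: ${layout}`);
  const EPS = 1e-9; // tolerate floating-point noise at the edges
  return (
    box.x >= 0 &&
    box.y >= 0 &&
    box.x + box.w <= size.w + EPS &&
    box.y + box.h <= size.h + EPS
  );
}

// A 6" x 3" box placed at y = 4 overflows a 16:9 slide but fits on LAYOUT_WIDE.
console.log(fitsOnSlide("LAYOUT_16x9", { x: 0.5, y: 4, w: 6, h: 3 }));
console.log(fitsOnSlide("LAYOUT_WIDE", { x: 0.5, y: 4, w: 6, h: 3 }));
```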
---
## Text & Formatting
```javascript
// Basic text
slide.addText("Simple Text", {
x: 1, y: 1, w: 8, h: 2, fontSize: 24, fontFace: "Arial",
color: "363636", bold: true, align: "center", valign: "middle"
});
// Character spacing (use charSpacing, not letterSpacing which is silently ignored)
slide.addText("SPACED TEXT", { x: 1, y: 1, w: 8, h: 1, charSpacing: 6 });
// Rich text arrays
slide.addText([
{ text: "Bold ", options: { bold: true } },
{ text: "Italic ", options: { italic: true } }
], { x: 1, y: 3, w: 8, h: 1 });
// Multi-line text (requires breakLine: true)
slide.addText([
{ text: "Line 1", options: { breakLine: true } },
{ text: "Line 2", options: { breakLine: true } },
{ text: "Line 3" } // Last item doesn't need breakLine
], { x: 0.5, y: 0.5, w: 8, h: 2 });
// Text box margin (internal padding)
slide.addText("Title", {
x: 0.5, y: 0.3, w: 9, h: 0.6,
margin: 0 // Use 0 when aligning text with other elements like shapes or icons
});
```
**Tip:** Text boxes have internal margin by default. Set `margin: 0` when you need text to align precisely with shapes, lines, or icons at the same x-position.
---
## Lists & Bullets
```javascript
// ✅ CORRECT: Multiple bullets
slide.addText([
{ text: "First item", options: { bullet: true, breakLine: true } },
{ text: "Second item", options: { bullet: true, breakLine: true } },
{ text: "Third item", options: { bullet: true } }
], { x: 0.5, y: 0.5, w: 8, h: 3 });
// ❌ WRONG: Never use unicode bullets
slide.addText("• First item", { x: 0.5, y: 0.5, w: 8, h: 1 }); // Creates double bullets
// Numbered lists and sub-items
slide.addText([
  { text: "First", options: { bullet: { type: "number" }, breakLine: true } },
  { text: "Sub-item", options: { bullet: true, indentLevel: 1 } }
], { x: 0.5, y: 0.5, w: 8, h: 2 });
```
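When the bullet text comes from a plain array of strings, the run objects can be generated instead of written by hand. This is a sketch under the conventions shown above (`bullet: true` on every item, `breakLine: true` on all but the last); `toBulletRuns` is a hypothetical helper name, and it also strips a leading "•" so pasted text does not produce double bullets.

```javascript
// Hypothetical helper: turn ["A", "B"] into the run array addText() expects.
function toBulletRuns(items) {
  return items.map((raw, i) => {
    // Drop a leading unicode bullet, which would otherwise render twice.
    const text = raw.replace(/^\s*\u2022\s*/, "");
    const isLast = i === items.length - 1;
    // A fresh options object is built for every item, so nothing is shared.
    const options = isLast ? { bullet: true } : { bullet: true, breakLine: true };
    return { text, options };
  });
}

console.log(JSON.stringify(toBulletRuns(["\u2022 First item", "Second item"]), null, 2));
```

The result can be passed as the first argument of `slide.addText(...)` together with the usual position options.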
---
## Shapes
```javascript
slide.addShape(pres.shapes.RECTANGLE, {
x: 0.5, y: 0.8, w: 1.5, h: 3.0,
fill: { color: "FF0000" }, line: { color: "000000", width: 2 }
});
slide.addShape(pres.shapes.OVAL, { x: 4, y: 1, w: 2, h: 2, fill: { color: "0000FF" } });
slide.addShape(pres.shapes.LINE, {
x: 1, y: 3, w: 5, h: 0, line: { color: "FF0000", width: 3, dashType: "dash" }
});
// With transparency
slide.addShape(pres.shapes.RECTANGLE, {
x: 1, y: 1, w: 3, h: 2,
fill: { color: "0088CC", transparency: 50 }
});
// Rounded rectangle (rectRadius only works with ROUNDED_RECTANGLE, not RECTANGLE)
// ⚠️ Don't pair with rectangular accent overlays — they won't cover rounded corners. Use RECTANGLE instead.
slide.addShape(pres.shapes.ROUNDED_RECTANGLE, {
x: 1, y: 1, w: 3, h: 2,
fill: { color: "FFFFFF" }, rectRadius: 0.1
});
// With shadow
slide.addShape(pres.shapes.RECTANGLE, {
x: 1, y: 1, w: 3, h: 2,
fill: { color: "FFFFFF" },
shadow: { type: "outer", color: "000000", blur: 6, offset: 2, angle: 135, opacity: 0.15 }
});
```
Shadow options:
| Property | Type | Range | Notes |
|----------|------|-------|-------|
| `type` | string | `"outer"`, `"inner"` | |
| `color` | string | 6-char hex (e.g. `"000000"`) | No `#` prefix, no 8-char hex — see Common Pitfalls |
| `blur` | number | 0-100 pt | |
| `offset` | number | 0-200 pt | **Must be non-negative** — negative values corrupt the file |
| `angle` | number | 0-359 degrees | Direction the shadow falls (135 = bottom-right, 270 = upward) |
| `opacity` | number | 0.0-1.0 | Use this for transparency, never encode in color string |
To cast a shadow upward (e.g. on a footer bar), use `angle: 270` with a positive offset — do **not** use a negative offset.
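
The ranges in the table above can be checked before a shadow object is handed to `addShape`. The sketch below is an illustration only; `checkShadow` is a hypothetical helper that returns a list of problems and is not part of PptxGenJS.

```javascript
// Hypothetical validator based on the shadow options table above.
function checkShadow(shadow) {
  const problems = [];
  const inRange = (v, lo, hi) => typeof v === "number" && v >= lo && v <= hi;

  if (!["outer", "inner"].includes(shadow.type)) {
    problems.push('type must be "outer" or "inner"');
  }
  // Exactly six hex digits: no "#" prefix and no 8-char color with alpha.
  if (!/^[0-9A-Fa-f]{6}$/.test(String(shadow.color))) {
    problems.push("color must be a 6-char hex string without #");
  }
  if (!inRange(shadow.blur, 0, 100)) problems.push("blur must be 0-100");
  if (!inRange(shadow.offset, 0, 200)) problems.push("offset must be 0-200 (never negative)");
  if (!inRange(shadow.angle, 0, 359)) problems.push("angle must be 0-359");
  if (!inRange(shadow.opacity, 0, 1)) problems.push("opacity must be 0.0-1.0");
  return problems;
}

console.log(checkShadow({ type: "outer", color: "#000000", blur: 6, offset: -2, angle: 270, opacity: 0.15 }));
```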
**Note**: Gradient fills are not natively supported. Use a gradient image as a background instead.
---
## Images
### Image Sources
```javascript
// From file path
slide.addImage({ path: "images/chart.png", x: 1, y: 1, w: 5, h: 3 });
// From URL
slide.addImage({ path: "https://example.com/image.jpg", x: 1, y: 1, w: 5, h: 3 });
// From base64 (faster, no file I/O)
slide.addImage({ data: "image/png;base64,iVBORw0KGgo...", x: 1, y: 1, w: 5, h: 3 });
```
### Image Options
```javascript
slide.addImage({
path: "image.png",
x: 1, y: 1, w: 5, h: 3,
rotate: 45, // 0-359 degrees
rounding: true, // Circular crop
transparency: 50, // 0-100
flipH: true, // Horizontal flip
flipV: false, // Vertical flip
altText: "Description", // Accessibility
hyperlink: { url: "https://example.com" }
});
```
### Image Sizing Modes
```javascript
// Contain - fit inside, preserve ratio
{ sizing: { type: 'contain', w: 4, h: 3 } }
// Cover - fill area, preserve ratio (may crop)
{ sizing: { type: 'cover', w: 4, h: 3 } }
// Crop - cut specific portion
{ sizing: { type: 'crop', x: 0.5, y: 0.5, w: 2, h: 2 } }
```
### Calculate Dimensions (preserve aspect ratio)
```javascript
const origWidth = 1978, origHeight = 923, maxHeight = 3.0;
const calcWidth = maxHeight * (origWidth / origHeight);
const centerX = (10 - calcWidth) / 2;
slide.addImage({ path: "image.png", x: centerX, y: 1.2, w: calcWidth, h: maxHeight });
```
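The calculation above constrains only the height. A slightly more general sketch, which fits an image inside both a maximum width and a maximum height, is shown here; `fitContain` and `centerOffset` are hypothetical helper names, not PptxGenJS functions.

```javascript
// Hypothetical helper: scale (origW x origH) to fit inside (maxW x maxH)
// while preserving the aspect ratio. Units are whatever the caller uses
// for maxW/maxH (inches for slide coordinates).
function fitContain(origW, origH, maxW, maxH) {
  const scale = Math.min(maxW / origW, maxH / origH);
  return { w: origW * scale, h: origH * scale };
}

// Offset that centers an item of the given size within a container.
function centerOffset(containerSize, itemSize) {
  return (containerSize - itemSize) / 2;
}

// Same image as above, limited to 8" x 3" on a 10"-wide slide.
const fitted = fitContain(1978, 923, 8, 3);
console.log(fitted, centerOffset(10, fitted.w));
```

The returned `w` and `h` can be used directly in `slide.addImage({ ..., w, h })`.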
### Supported Formats
- **Standard**: PNG, JPG, GIF (animated GIFs work in Microsoft 365)
- **SVG**: Works in modern PowerPoint/Microsoft 365
---
## Icons
Use react-icons to generate SVG icons, then rasterize to PNG for universal compatibility.
### Setup
```javascript
const React = require("react");
const ReactDOMServer = require("react-dom/server");
const sharp = require("sharp");
const { FaCheckCircle, FaChartLine } = require("react-icons/fa");
function renderIconSvg(IconComponent, color = "#000000", size = 256) {
  return ReactDOMServer.renderToStaticMarkup(
    React.createElement(IconComponent, { color, size: String(size) })
  );
}
async function iconToBase64Png(IconComponent, color, size = 256) {
  const svg = renderIconSvg(IconComponent, color, size);
  const pngBuffer = await sharp(Buffer.from(svg)).png().toBuffer();
  return "image/png;base64," + pngBuffer.toString("base64");
}
```
### Add Icon to Slide
```javascript
const iconData = await iconToBase64Png(FaCheckCircle, "#4472C4", 256);
slide.addImage({
data: iconData,
x: 1, y: 1, w: 0.5, h: 0.5 // Size in inches
});
```
**Note**: Use size 256 or higher for crisp icons. The size parameter controls the rasterization resolution, not the display size on the slide (which is set by `w` and `h` in inches).
### Icon Libraries
Install: `npm install -g react-icons react react-dom sharp`
Popular icon sets in react-icons:
- `react-icons/fa` - Font Awesome
- `react-icons/md` - Material Design
- `react-icons/hi` - Heroicons
- `react-icons/bi` - Bootstrap Icons
---
## Slide Backgrounds
```javascript
// Solid color
slide.background = { color: "F1F1F1" };
// Color with transparency
slide.background = { color: "FF3399", transparency: 50 };
// Image from URL
slide.background = { path: "https://example.com/bg.jpg" };
// Image from base64
slide.background = { data: "image/png;base64,iVBORw0KGgo..." };
```
---
## Tables
```javascript
slide.addTable([
["Header 1", "Header 2"],
["Cell 1", "Cell 2"]
], {
x: 1, y: 1, w: 8, h: 2,
border: { pt: 1, color: "999999" }, fill: { color: "F1F1F1" }
});
// Advanced with merged cells
let tableData = [
[{ text: "Header", options: { fill: { color: "6699CC" }, color: "FFFFFF", bold: true } }, "Cell"],
[{ text: "Merged", options: { colspan: 2 } }]
];
slide.addTable(tableData, { x: 1, y: 3.5, w: 8, colW: [4, 4] });
```
---
## Charts
```javascript
// Bar chart
slide.addChart(pres.charts.BAR, [{
name: "Sales", labels: ["Q1", "Q2", "Q3", "Q4"], values: [4500, 5500, 6200, 7100]
}], {
x: 0.5, y: 0.6, w: 6, h: 3, barDir: 'col',
showTitle: true, title: 'Quarterly Sales'
});
// Line chart
slide.addChart(pres.charts.LINE, [{
name: "Temp", labels: ["Jan", "Feb", "Mar"], values: [32, 35, 42]
}], { x: 0.5, y: 4, w: 6, h: 3, lineSize: 3, lineSmooth: true });
// Pie chart
slide.addChart(pres.charts.PIE, [{
name: "Share", labels: ["A", "B", "Other"], values: [35, 45, 20]
}], { x: 7, y: 1, w: 5, h: 4, showPercent: true });
```
### Better-Looking Charts
Default charts look dated. Apply these options for a modern, clean appearance:
```javascript
slide.addChart(pres.charts.BAR, chartData, {
x: 0.5, y: 1, w: 9, h: 4, barDir: "col",
// Custom colors (match your presentation palette)
chartColors: ["0D9488", "14B8A6", "5EEAD4"],
// Clean background
chartArea: { fill: { color: "FFFFFF" }, roundedCorners: true },
// Muted axis labels
catAxisLabelColor: "64748B",
valAxisLabelColor: "64748B",
// Subtle grid (value axis only)
valGridLine: { color: "E2E8F0", size: 0.5 },
catGridLine: { style: "none" },
// Data labels on bars
showValue: true,
dataLabelPosition: "outEnd",
dataLabelColor: "1E293B",
// Hide legend for single series
showLegend: false,
});
```
**Key styling options:**
- `chartColors: [...]` - hex colors for series/segments
- `chartArea: { fill, border, roundedCorners }` - chart background
- `catGridLine/valGridLine: { color, style, size }` - grid lines (`style: "none"` to hide)
- `lineSmooth: true` - curved lines (line charts)
- `legendPos: "r"` - legend position: "b", "t", "l", "r", "tr"
---
## Slide Masters
```javascript
pres.defineSlideMaster({
title: 'TITLE_SLIDE', background: { color: '283A5E' },
objects: [{
placeholder: { options: { name: 'title', type: 'title', x: 1, y: 2, w: 8, h: 2 } }
}]
});
let titleSlide = pres.addSlide({ masterName: "TITLE_SLIDE" });
titleSlide.addText("My Title", { placeholder: "title" });
```
---
## Common Pitfalls
⚠️ These issues cause file corruption, visual bugs, or broken output. Avoid them.
1. **NEVER use "#" with hex colors** - causes file corruption
```javascript
color: "FF0000" // ✅ CORRECT
color: "#FF0000" // ❌ WRONG
```
2. **NEVER encode opacity in hex color strings** - 8-char colors (e.g., `"00000020"`) corrupt the file. Use the `opacity` property instead.
```javascript
shadow: { type: "outer", blur: 6, offset: 2, color: "00000020" } // ❌ CORRUPTS FILE
shadow: { type: "outer", blur: 6, offset: 2, color: "000000", opacity: 0.12 } // ✅ CORRECT
```
3. **Use `bullet: true`** - NEVER unicode symbols like "•" (creates double bullets)
4. **Use `breakLine: true`** between array items - otherwise the text runs together
5. **Avoid `lineSpacing` with bullets** - causes excessive gaps; use `paraSpaceAfter` instead
6. **Each presentation needs a fresh instance** - don't reuse `pptxgen()` objects
7. **NEVER reuse option objects across calls** - PptxGenJS mutates objects in-place (e.g. converting shadow values to EMU). Sharing one object between multiple calls corrupts the second shape.
```javascript
const shadow = { type: "outer", blur: 6, offset: 2, color: "000000", opacity: 0.15 };
slide.addShape(pres.shapes.RECTANGLE, { shadow, ... }); // ❌ second call gets already-converted values
slide.addShape(pres.shapes.RECTANGLE, { shadow, ... });
const makeShadow = () => ({ type: "outer", blur: 6, offset: 2, color: "000000", opacity: 0.15 });
slide.addShape(pres.shapes.RECTANGLE, { shadow: makeShadow(), ... }); // ✅ fresh object each time
slide.addShape(pres.shapes.RECTANGLE, { shadow: makeShadow(), ... });
```
8. **Don't use `ROUNDED_RECTANGLE` with accent borders** - rectangular overlay bars won't cover rounded corners. Use `RECTANGLE` instead.
```javascript
// ❌ WRONG: Accent bar doesn't cover rounded corners
slide.addShape(pres.shapes.ROUNDED_RECTANGLE, { x: 1, y: 1, w: 3, h: 1.5, fill: { color: "FFFFFF" } });
slide.addShape(pres.shapes.RECTANGLE, { x: 1, y: 1, w: 0.08, h: 1.5, fill: { color: "0891B2" } });
// ✅ CORRECT: Use RECTANGLE for clean alignment
slide.addShape(pres.shapes.RECTANGLE, { x: 1, y: 1, w: 3, h: 1.5, fill: { color: "FFFFFF" } });
slide.addShape(pres.shapes.RECTANGLE, { x: 1, y: 1, w: 0.08, h: 1.5, fill: { color: "0891B2" } });
```
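Pitfalls 1 and 2 can be guarded against by normalizing color strings before they reach PptxGenJS. The sketch below is a hypothetical helper (`normalizeColor` is not a library function) and assumes that an 8-character value is laid out as RRGGBBAA; if the source uses a different order, the split would need to change.

```javascript
// Hypothetical helper: strip "#" and split an 8-char hex into color + opacity.
// Assumption: 8-char input is RRGGBBAA (alpha in the last two digits).
function normalizeColor(input) {
  const hex = String(input).replace(/^#/, "").toUpperCase();
  if (/^[0-9A-F]{6}$/.test(hex)) {
    return { color: hex };
  }
  if (/^[0-9A-F]{8}$/.test(hex)) {
    const alpha = parseInt(hex.slice(6), 16) / 255;
    // Round to two decimals so the value reads like a hand-written opacity.
    return { color: hex.slice(0, 6), opacity: Number(alpha.toFixed(2)) };
  }
  throw new Error(`Not a 6- or 8-digit hex color: ${input}`);
}

console.log(normalizeColor("#ff0000"));
console.log(normalizeColor("00000020"));
```

For shadows the resulting `opacity` maps onto the `opacity` property; fills in the examples above use `transparency` (0-100) instead, so the value would need converting there.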
---
## Quick Reference
- **Shapes**: RECTANGLE, OVAL, LINE, ROUNDED_RECTANGLE
- **Charts**: BAR, LINE, PIE, DOUGHNUT, SCATTER, BUBBLE, RADAR
- **Layouts**: LAYOUT_16x9 (10"×5.625"), LAYOUT_16x10, LAYOUT_4x3, LAYOUT_WIDE
- **Alignment**: "left", "center", "right"
- **Chart data labels**: "outEnd", "inEnd", "center"


@ -1,91 +0,0 @@
---
name: Profile Setup
description: Interactive setup wizard to personalize your agent profile
version: 1.0.0
metadata:
  emoji: "🧙"
  tags:
    - profile
    - setup
    - onboarding
---
## Instructions
You are conducting an interactive setup to personalize the agent profile. Your goal is to learn about the user through natural conversation and update their profile files accordingly.
### Setup Context
The user has just created a new agent profile and wants to personalize it. You have access to the profile directory and can update the following files:
- `soul.md` - Agent identity (name, role, style)
- `user.md` - Information about the user (name, preferences)
### Conversation Flow
Have a natural conversation to configure the agent and learn about the user. Don't follow a rigid script - adapt based on their responses. Here are topics to explore:
1. **Agent Identity** (for soul.md)
- What would you like to call me? (agent's name)
- What personality/style do you prefer? (concise and direct, warm and friendly, formal, casual, etc.)
2. **About the User** (for user.md)
- What should I call you?
- What's your timezone or location? (for context)
3. **Communication Preferences**
- How do you prefer responses? (concise vs detailed)
- Any language preferences? (English, Chinese, mixed)
- Anything that annoys you in AI responses?
### Guidelines
- **Be conversational**: This is a dialogue, not an interrogation. Ask follow-up questions naturally.
- **Don't ask everything**: Pick the most relevant questions based on context. Skip what doesn't apply.
- **Summarize and confirm**: After gathering information, summarize what you learned and ask if it's accurate.
- **Update files progressively**: As you learn things, update the relevant profile files.
- **End gracefully**: When you have enough information, wrap up the conversation and let the user know their profile is ready.
### File Updates
When updating files, use the `edit` tool to modify specific sections:
**soul.md - Update the Identity section:**
```markdown
## Identity
- **Name:** Jarvis
- **Role:** General-purpose AI assistant
- **Style:** Concise, direct, and friendly
```
**user.md example:**
```markdown
# User
- **Name:** Jiayuan
- **Call me:** Jiayuan
- **Timezone:** Asia/Shanghai
## Preferences
- Prefers concise responses
- Language: Chinese preferred, English for technical terms
```
### Starting the Conversation
Begin with a friendly greeting and explain what you're doing. Start by asking about the agent's identity first, then move to learning about the user. For example:
"Hi! I'm here to help set up your agent profile. Let me ask you a few questions so I can be configured to assist you better.
First, what would you like to call me? (Or just press enter to keep the default name 'Assistant')"
### Ending the Conversation
When you've gathered enough information, summarize and close:
"Great! I've updated your profile with what I learned:
- [Summary of key points]
Your profile is ready. You can always update these files later or run setup again. Feel free to start chatting with me anytime!"


@ -1,301 +0,0 @@
---
name: Skill Creator
description: Create, edit, and manage custom skills to extend agent capabilities. Also activates inactive skills by guiding users through API key setup. Use when the user asks to create a new skill, build a custom capability, extend the agent's functionality, or when an inactive skill matches the user's intent.
version: 1.2.0
metadata:
  emoji: "🛠️"
  always: true
  tags:
    - meta
    - skills
    - developer-tools
---
## Instructions
You can create, edit, and manage skills to extend your own capabilities or help users build custom skills. You also activate inactive skills by guiding users through API key configuration.
## Activating Inactive Skills
When a user's request matches an **inactive skill** (listed under "Installed But Inactive Skills" in your system prompt), follow this flow:
1. **Inform the user**: Tell them the skill exists but needs setup
2. **Explain what's missing**: Reference the diagnostic info (e.g., "The Gmail skill requires a GMAIL_API_KEY")
3. **Guide them to get the key**: Use `web_search` or `web_fetch` to find how to obtain the required API key, then provide clear step-by-step instructions to the user
4. **Accept the key in chat**: Ask the user to paste the API key directly in the conversation
5. **Write the `.env` file**: Use the `write` tool to create the skill's `.env` file:
```
~/.super-multica/skills/<skill-id>/.env
```
Content format:
```
# API key for <Skill Name>
<ENV_VAR_NAME>=<pasted-key>
```
6. **Confirm activation**: The skill system auto-reloads on file changes. Tell the user the skill is now active and proceed with their original request.
**IMPORTANT**: The user's API key is written to a local file only. Never log, echo, or transmit the key anywhere else.
### Example (hypothetical — only act on skills that actually appear in your system prompt)
Suppose the system prompt contains an inactive skill entry like:
```
- **Stock Tracker** (`stock-tracker`): Track stock prices
- Missing environment variables: STOCK_API_KEY
- Fix: Set STOCK_API_KEY in ~/.super-multica/skills/stock-tracker/.env
```
Then the conversation would be:
```
User: "What's AAPL trading at?"
Agent: *sees stock-tracker in inactive skills list*
Agent: *uses web_search to find how to get a Stock API key*
Agent: "I have a Stock Tracker skill but it needs a STOCK_API_KEY. Here's how to get one: ..."
User: "sk-abc123..."
Agent: *writes ~/.super-multica/skills/stock-tracker/.env*
Agent: "Done! Stock Tracker is active. Let me check AAPL for you..."
```
**CRITICAL**: Only reference skills that are actually listed in your system prompt under "Installed But Inactive Skills". Never assume a skill exists without seeing it there.
## Creating New Skills When No Match Exists
If the user asks for a capability that doesn't match any existing or inactive skill:
1. **Suggest creating a new skill** if the capability is well-defined and repeatable
2. Briefly describe what the skill would do and ask for confirmation
3. Follow the **Skill Creation Process** below to create it
4. If the new skill needs API keys, guide the user through obtaining and configuring them
## Skill Creation Process
**ALWAYS follow these steps in order when creating a new skill:**
1. Understand what the skill should do
2. Initialize the skill (create its directory and `SKILL.md`)
3. Edit the generated SKILL.md
4. Test the skill
### Step 1: Understand the Skill
Before creating, clarify:
- What functionality should the skill provide?
- When should it be triggered?
- Does it need helper scripts?
### Step 2: Initialize the Skill
**CRITICAL: Never create skills in the current working directory.**
**Choose the correct directory based on context:**
- **If running under a profile**: Create in `~/.super-multica/agent-profiles/<profile-id>/skills/` (profile-specific)
- **If no profile**: Create in `~/.super-multica/skills/` (global)
```bash
# For profile-specific skill (when running under a profile):
mkdir -p ~/.super-multica/agent-profiles/<profile-id>/skills/<skill-name>
# For global skill (when no profile is active):
mkdir -p ~/.super-multica/skills/<skill-name>
```
Create SKILL.md with proper structure:
```bash
# Replace <SKILL_DIR> with the appropriate path from above
cat > <SKILL_DIR>/SKILL.md << 'EOF'
---
name: <Skill Name>
description: <What this skill does and when to use it>
version: 1.0.0
metadata:
  emoji: "🔧"
  tags:
    - custom
---
## Instructions
<Instructions for using this skill>
EOF
# (Optional) Create scripts directory if needed
mkdir -p <SKILL_DIR>/scripts
```
**Example - Creating a translator skill (global):**
```bash
mkdir -p ~/.super-multica/skills/translator
cat > ~/.super-multica/skills/translator/SKILL.md << 'EOF'
---
name: Translator
description: Translate text between languages. Use when user asks to translate text.
version: 1.0.0
metadata:
  emoji: "🌐"
  tags:
    - language
---
## Instructions
When asked to translate text:
1. Identify source and target languages
2. Provide accurate, natural translations
3. For ambiguous terms, offer alternatives
EOF
```
### Step 3: Edit the Skill
After initialization, edit the `SKILL.md` file in the skill directory:
1. Update the `description` - This is the primary trigger mechanism
2. Write clear `## Instructions` - What the agent should do
3. Add helper scripts to `scripts/` if needed
4. Add reference docs to `references/` if needed
### Step 4: Test the Skill
The skill is automatically loaded (hot-reload). Verify with:
```bash
pnpm skills:cli list | grep <skill-name>
```
**IMPORTANT: Do NOT create .skill package files.** Skills are loaded directly from the directory structure. There is no packaging step needed.
## SKILL.md Format
Every skill must have a `SKILL.md` file with YAML frontmatter:
```markdown
---
name: Skill Display Name
description: Brief description of what this skill does
version: 1.0.0
metadata:
  emoji: "🔧"
  tags:
    - category1
  requires:
    bins: [required-binary]
    env: [REQUIRED_ENV_VAR]
---
## Instructions
Detailed instructions for using this skill...
```
### Frontmatter Fields
| Field | Required | Description |
|-------|----------|-------------|
| `name` | Yes | Display name for the skill |
| `description` | Yes | Short description (triggers skill selection) |
| `version` | No | Semantic version |
| `metadata.emoji` | No | Emoji for display |
| `metadata.tags` | No | Categorization tags |
| `metadata.requires.bins` | No | Required binaries (all must exist) |
| `metadata.requires.anyBins` | No | Alternative binaries (one must exist) |
| `metadata.requires.env` | No | Required environment variables |
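
To make the `requires` semantics in the table concrete, here is a minimal sketch of an eligibility check: every entry in `bins` must exist, at least one entry in `anyBins` must exist, and every variable in `env` must be set. The function name, the `hasBin` callback and the shape of the result are all hypothetical; the real skill loader may work differently.

```javascript
// Hypothetical sketch of how metadata.requires could be evaluated.
// `hasBin` is a caller-supplied lookup (e.g. a PATH search); `env` is a
// plain object of environment variables.
function missingRequirements(requires = {}, { hasBin, env }) {
  const missing = [];

  // bins: all must exist
  for (const bin of requires.bins ?? []) {
    if (!hasBin(bin)) missing.push(`bin:${bin}`);
  }

  // anyBins: at least one must exist
  const anyBins = requires.anyBins ?? [];
  if (anyBins.length > 0 && !anyBins.some((bin) => hasBin(bin))) {
    missing.push(`anyBins:${anyBins.join("|")}`);
  }

  // env: every variable must be set to a non-empty value
  for (const name of requires.env ?? []) {
    if (!env[name]) missing.push(`env:${name}`);
  }

  return missing; // an empty array means the skill is eligible
}

const context = { hasBin: (bin) => bin === "whisper", env: {} };
console.log(missingRequirements({ anyBins: ["whisper", "whisper-cli"], env: ["STOCK_API_KEY"] }, context));
```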
## Directory Structure
Skills are stored in two locations:
```
# Global skills (available to all profiles)
~/.super-multica/skills/
├── my-skill/
│   └── SKILL.md
└── another-skill/
    ├── SKILL.md
    ├── scripts/
    │   └── helper.py
    └── references/
        └── api-docs.md
# Profile-specific skills (only for this profile)
~/.super-multica/agent-profiles/<profile-id>/skills/
└── profile-only-skill/
    └── SKILL.md
```
## Editing Existing Skills
To modify an existing skill:
1. Read the current SKILL.md file
2. Make changes to frontmatter or instructions
3. Save - changes take effect immediately (hot-reload)
## Listing and Removing Skills
```bash
# List all skills
pnpm skills:cli list
# Check skill status
pnpm skills:cli status <skill-name>
# Remove a global skill
pnpm skills:cli remove <skill-name>
# or
rm -rf ~/.super-multica/skills/<skill-name>
# Remove a profile-specific skill
rm -rf ~/.super-multica/agent-profiles/<profile-id>/skills/<skill-name>
```
## Skills with API Key Requirements
When creating a skill that needs an API key:
1. Declare env requirements in the SKILL.md frontmatter:
```yaml
metadata:
  requires:
    env: [SERVICE_API_KEY]
```
2. After creating the SKILL.md, write the `.env` file in the same directory:
```
# API key for <Service Name>
SERVICE_API_KEY=<key-value>
```
3. The skill becomes eligible immediately (hot-reload is automatic).
### .env File Format
Each skill stores its credentials in `~/.super-multica/skills/<skill-id>/.env`:
```
# Lines starting with # are comments
KEY_NAME=value
ANOTHER_KEY="value with spaces"
```
Rules:
- One key per line, `KEY=VALUE` format
- Quotes are optional (stripped automatically)
- Each skill has its own `.env` — no centralized credential file
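
The rules above are simple enough to sketch as a parser. The following is an illustration of the described format only (comments, `KEY=VALUE` lines, optional quotes); `parseEnv` is a hypothetical function and is not taken from the skill loader's source.

```javascript
// Hypothetical sketch of a parser for the .env format described above.
function parseEnv(text) {
  const result = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue; // blank line or comment

    const eq = line.indexOf("=");
    if (eq <= 0) continue; // not in KEY=VALUE form

    const key = line.slice(0, eq).trim();
    let value = line.slice(eq + 1).trim();

    // Strip one pair of matching surrounding quotes, if present.
    const first = value[0];
    const quoted =
      value.length >= 2 &&
      (first === '"' || first === "'") &&
      value[value.length - 1] === first;
    if (quoted) value = value.slice(1, -1);

    result[key] = value;
  }
  return result;
}

const sample = '# Lines starting with # are comments\nKEY_NAME=value\nANOTHER_KEY="value with spaces"\n';
console.log(parseEnv(sample));
```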
## Best Practices
1. **Correct directory** - Never create skills in the current working directory
2. **Clear description** - Include "when to use" triggers in the description
3. **Concise instructions** - Keep SKILL.md under 500 lines
4. **Test scripts** - Run helper scripts to verify they work
5. **Single responsibility** - Each skill should do one thing well
6. **Proactive activation** - When you see an inactive skill matching user intent, suggest activating it
## Skill Precedence
Skills load from two sources (highest priority wins):
1. Profile-specific skills (`~/.super-multica/agent-profiles/<id>/skills/`)
2. Global skills (`~/.super-multica/skills/`)
Profile skills override global skills with the same ID.
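
The override rule can be pictured as a merge keyed by skill ID, with profile entries written last. This is a sketch of the stated precedence only; `resolveSkills` and the `{ id, ... }` record shape are assumptions, not the loader's actual data model.

```javascript
// Hypothetical sketch: merge global and profile skills by ID.
// Profile skills are applied second, so they replace global skills
// that share the same ID.
function resolveSkills(globalSkills, profileSkills) {
  const merged = new Map();
  for (const skill of globalSkills) {
    merged.set(skill.id, { ...skill, source: "global" });
  }
  for (const skill of profileSkills) {
    merged.set(skill.id, { ...skill, source: "profile" });
  }
  return merged;
}

const resolved = resolveSkills(
  [{ id: "translator", version: "1.0.0" }, { id: "stock-tracker", version: "1.0.0" }],
  [{ id: "translator", version: "2.0.0" }]
);
console.log([...resolved.values()]);
```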


@ -1,55 +0,0 @@
---
name: Audio Transcription
description: Transcribe audio files using local Whisper CLI when automatic pre-processing is unavailable
version: 1.1.0
metadata:
emoji: "🎙️"
requires:
anyBins:
- whisper
- whisper-cli
install:
- id: brew-whisper
kind: brew
formula: openai-whisper
bins: [whisper]
label: "Install OpenAI Whisper via Homebrew"
os: [darwin]
tags:
- audio
- transcription
- media
userInvocable: false
disableModelInvocation: false
---
## Audio Transcription (Agent Fallback)
Voice messages from channels are pre-processed before reaching you. The transcription
priority is:
1. **Local whisper CLI** (free, offline) — requires `whisper` or `whisper-cli` in PATH
2. **OpenAI Whisper API** — requires an OpenAI API key in credentials
3. **No provider available** — you receive a raw file path instead of a transcript
When both providers are unavailable, you will receive `[audio message received]` with a
`File:` path instead of `[Voice Message]` with a transcript. Use local whisper to
transcribe manually:
```
whisper "<file_path>" --model base --output_format txt --output_dir /tmp
```
Then read the `.txt` file from `/tmp/` and respond based on the transcribed content.
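
The priority order described above amounts to a small decision function. The sketch below is only an illustration of that order; the function name, the option names and the returned labels are hypothetical and do not come from the channel pre-processing code.

```javascript
// Hypothetical sketch of the transcription priority described above.
function pickTranscriptionProvider({ hasLocalWhisper, hasOpenAiKey }) {
  if (hasLocalWhisper) return "local-whisper"; // 1. free, offline
  if (hasOpenAiKey) return "openai-whisper-api"; // 2. needs an OpenAI API key
  return "none"; // 3. the agent receives a raw file path instead
}

console.log(pickTranscriptionProvider({ hasLocalWhisper: false, hasOpenAiKey: true }));
```

Only the `"none"` case leads to the manual fallback covered by this skill.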
### Setup
To enable automatic local transcription (recommended):
```bash
brew install openai-whisper
```
The first run will download the `base` model (~139MB) to `~/.cache/whisper/`.
No app restart is required — the binary is detected automatically on the next
voice message.


@ -1,307 +0,0 @@
---
name: Excel Spreadsheet
description: "Use this skill any time a spreadsheet file is the primary input or output. This means any task where the user wants to: open, read, edit, or fix an existing .xlsx, .xlsm, .csv, or .tsv file (e.g., adding columns, computing formulas, formatting, charting, cleaning messy data); create a new spreadsheet from scratch or from other data sources; or convert between tabular file formats. Trigger especially when the user references a spreadsheet file by name or path and wants something done to it or produced from it. Also trigger for cleaning or restructuring messy tabular data files into proper spreadsheets. The deliverable must be a spreadsheet file. Do NOT trigger when the primary deliverable is a Word document, HTML report, standalone Python script, database pipeline, or Google Sheets API integration, even if tabular data is involved."
version: 1.0.0
metadata:
emoji: "📗"
tags:
- office
- spreadsheet
- xlsx
install:
- id: brew-libreoffice
kind: brew
formula: libreoffice
bins: [soffice]
label: "Install LibreOffice for formula recalculation"
os: [darwin]
userInvocable: true
disableModelInvocation: false
---
# Requirements for Outputs
## All Excel files
### Professional Font
- Use a consistent, professional font (e.g., Arial, Times New Roman) for all deliverables unless otherwise instructed by the user
### Zero Formula Errors
- Every Excel model MUST be delivered with ZERO formula errors (#REF!, #DIV/0!, #VALUE!, #N/A, #NAME?)
### Preserve Existing Templates (when updating templates)
- Study and EXACTLY match existing format, style, and conventions when modifying files
- Never impose standardized formatting on files with established patterns
- Existing template conventions ALWAYS override these guidelines
## Financial models
### Color Coding Standards
Apply these conventions unless the user or an existing template specifies otherwise.
#### Industry-Standard Color Conventions
- **Blue text (RGB: 0,0,255)**: Hardcoded inputs, and numbers users will change for scenarios
- **Black text (RGB: 0,0,0)**: ALL formulas and calculations
- **Green text (RGB: 0,128,0)**: Links pulling from other worksheets within same workbook
- **Red text (RGB: 255,0,0)**: External links to other files
- **Yellow background (RGB: 255,255,0)**: Key assumptions needing attention or cells that need to be updated
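One way to keep these colors consistent is a single lookup table. The sketch below is illustrative only: `MODEL_COLORS` and `rgb_to_hex` are hypothetical names, and the role keys are labels chosen here, not terms from any library.

```python
# Color conventions from the list above, as (R, G, B) tuples.
MODEL_COLORS = {
    "input": (0, 0, 255),                 # hardcoded inputs and scenario numbers
    "formula": (0, 0, 0),                 # formulas and calculations
    "internal_link": (0, 128, 0),         # links to other sheets in the workbook
    "external_link": (255, 0, 0),         # links to other files
    "key_assumption_fill": (255, 255, 0), # background fill, not a font color
}


def rgb_to_hex(rgb):
    """Convert an (R, G, B) tuple to a 6-digit uppercase hex string."""
    r, g, b = rgb
    for channel in (r, g, b):
        if not 0 <= channel <= 255:
            raise ValueError(f"channel out of range: {channel}")
    return f"{r:02X}{g:02X}{b:02X}"
```

The hex string is the form openpyxl styles take, for example `Font(color=rgb_to_hex(MODEL_COLORS["input"]))` or `PatternFill('solid', start_color=rgb_to_hex(MODEL_COLORS["key_assumption_fill"]))`.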
### Number Formatting Standards
#### Required Format Rules
- **Years**: Format as text strings (e.g., "2024" not "2,024")
- **Currency**: Use $#,##0 format; ALWAYS specify units in headers ("Revenue ($mm)")
- **Zeros**: Use number formatting to make all zeros "-", including percentages (e.g., "$#,##0;($#,##0);-")
- **Percentages**: Default to 0.0% format (one decimal)
- **Multiples**: Format as 0.0x for valuation multiples (EV/EBITDA, P/E)
- **Negative numbers**: Use parentheses (123) not minus -123
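The format rules above map onto Excel custom format codes. The sketch below is one possible spelling: only the currency code comes from the list above, while the percent and multiple codes (including the quoted `"x"` literal) are assumptions about how to express the same rules. `NUMBER_FORMATS` and `apply_number_format` are hypothetical names.

```python
from types import SimpleNamespace

# Section order in each code is positive;negative;zero.
NUMBER_FORMATS = {
    "currency": "$#,##0;($#,##0);-",  # taken from the rule above
    "percent": "0.0%;(0.0%);-",       # assumption: same pattern for percentages
    "multiple": '0.0"x"',             # assumption: quoted literal suffix
}


def apply_number_format(cell, kind):
    """Set the number format on any object exposing `number_format`."""
    cell.number_format = NUMBER_FORMATS[kind]
    return cell


# Stand-in for an openpyxl cell so the sketch runs without a workbook.
demo = apply_number_format(SimpleNamespace(value=0, number_format="General"), "currency")
```

With openpyxl the same call works on a real cell, for example `apply_number_format(sheet['B2'], "percent")`, because openpyxl cells expose a writable `number_format` attribute.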
### Formula Construction Rules
#### Assumptions Placement
- Place ALL assumptions (growth rates, margins, multiples, etc.) in separate assumption cells
- Use cell references instead of hardcoded values in formulas
- Example: Use =B5*(1+$B$6) instead of =B5*1.05
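To keep the growth formula identical across projection periods, the formula strings can be generated rather than typed. This is a minimal sketch under stated assumptions: `projection_formulas` is a hypothetical helper, the assumption cell defaults to `$B$6` as in the example above, and it handles single-letter columns only.

```python
def projection_formulas(row, start_col, periods, rate_cell="$B$6"):
    """Map each projection cell to a formula that grows the cell to its left.

    start_col is the single letter (B..Z) of the first projected column.
    """
    if len(start_col) != 1 or not "B" <= start_col <= "Z":
        raise ValueError("start_col must be a single letter from B to Z")
    if periods < 1 or ord(start_col) + periods - 1 > ord("Z"):
        raise ValueError("projection must stay within columns B..Z")
    formulas = {}
    for offset in range(periods):
        col = chr(ord(start_col) + offset)
        prev = chr(ord(start_col) + offset - 1)
        # The absolute reference keeps every period pointing at one assumption.
        formulas[f"{col}{row}"] = f"={prev}{row}*(1+{rate_cell})"
    return formulas
```

Each entry can then be written with `sheet[address] = formula`, so changing the rate in the assumption cell updates every period at once.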
#### Formula Error Prevention
- Verify all cell references are correct
- Check for off-by-one errors in ranges
- Ensure consistent formulas across all projection periods
- Test with edge cases (zero values, negative numbers)
- Verify no unintended circular references
#### Documentation Requirements for Hardcodes
- Document each hardcode in a cell comment, or in the adjacent cell if it sits at the end of a table. Format: "Source: [System/Document], [Date], [Specific Reference], [URL if applicable]"
- Examples:
- "Source: Company 10-K, FY2024, Page 45, Revenue Note, [SEC EDGAR URL]"
- "Source: Company 10-Q, Q2 2025, Exhibit 99.1, [SEC EDGAR URL]"
- "Source: Bloomberg Terminal, 8/15/2025, AAPL US Equity"
- "Source: FactSet, 8/20/2025, Consensus Estimates Screen"
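A small formatter keeps source notes in the pattern shown above. `source_note` is a hypothetical helper introduced for illustration; the reference and URL parts are optional, as in the examples.

```python
def source_note(system, date, reference=None, url=None):
    """Build a 'Source: ...' note, skipping parts that were not provided."""
    parts = [system, date]
    if reference:
        parts.append(reference)
    if url:
        parts.append(url)
    return "Source: " + ", ".join(parts)
```

The resulting string can go into an adjacent cell or into a cell comment; with openpyxl, a comment is attached via `Comment` from `openpyxl.comments`.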
# XLSX creation, editing, and analysis
## Overview
A user may ask you to create, edit, or analyze the contents of an .xlsx file. You have different tools and workflows available for different tasks.
## Important Requirements
**LibreOffice Required for Formula Recalculation**: You can assume LibreOffice is installed for recalculating formula values using the `scripts/recalc.py` script. The script automatically configures LibreOffice on first run, including in sandboxed environments where Unix sockets are restricted (handled by `scripts/office/soffice.py`).
## Reading and analyzing data
### Data analysis with pandas
For data analysis, visualization, and basic operations, use **pandas**, which provides powerful data manipulation capabilities:
```python
import pandas as pd
# Read Excel
df = pd.read_excel('file.xlsx') # Default: first sheet
all_sheets = pd.read_excel('file.xlsx', sheet_name=None) # All sheets as dict
# Analyze
df.head() # Preview data
df.info() # Column info
df.describe() # Statistics
# Write Excel
df.to_excel('output.xlsx', index=False)
```
## Excel File Workflows
## CRITICAL: Use Formulas, Not Hardcoded Values
**Always use Excel formulas instead of calculating values in Python and hardcoding them.** This ensures the spreadsheet remains dynamic and updateable.
### WRONG - Hardcoding Calculated Values
```python
# Bad: Calculating in Python and hardcoding result
total = df['Sales'].sum()
sheet['B10'] = total # Hardcodes 5000
# Bad: Computing growth rate in Python
growth = (df.iloc[-1]['Revenue'] - df.iloc[0]['Revenue']) / df.iloc[0]['Revenue']
sheet['C5'] = growth # Hardcodes 0.15
# Bad: Python calculation for average
avg = sum(values) / len(values)
sheet['D20'] = avg # Hardcodes 42.5
```
### CORRECT - Using Excel Formulas
```python
# Good: Let Excel calculate the sum
sheet['B10'] = '=SUM(B2:B9)'
# Good: Growth rate as Excel formula
sheet['C5'] = '=(C4-C2)/C2'
# Good: Average using Excel function
sheet['D20'] = '=AVERAGE(D2:D19)'
```
This applies to ALL calculations - totals, percentages, ratios, differences, etc. The spreadsheet should be able to recalculate when source data changes.
## Common Workflow
1. **Choose tool**: pandas for data, openpyxl for formulas/formatting
2. **Create/Load**: Create new workbook or load existing file
3. **Modify**: Add/edit data, formulas, and formatting
4. **Save**: Write to file
5. **Recalculate formulas (MANDATORY IF USING FORMULAS)**: Use the scripts/recalc.py script
```bash
python3 scripts/recalc.py output.xlsx
```
6. **Verify and fix any errors**:
   - The script returns JSON with error details
   - If `status` is `errors_found`, check `error_summary` for specific error types and locations
   - Fix the identified errors and recalculate again
   - Common errors to fix:
     - `#REF!`: Invalid cell references
     - `#DIV/0!`: Division by zero
     - `#VALUE!`: Wrong data type in formula
     - `#NAME?`: Unrecognized formula name
### Creating new Excel files
```python
# Using openpyxl for formulas and formatting
from openpyxl import Workbook
from openpyxl.styles import Font, PatternFill, Alignment
wb = Workbook()
sheet = wb.active
# Add data
sheet['A1'] = 'Hello'
sheet['B1'] = 'World'
sheet.append(['Row', 'of', 'data'])
# Add formula
sheet['B2'] = '=SUM(A1:A10)'
# Formatting
sheet['A1'].font = Font(bold=True, color='FF0000')
sheet['A1'].fill = PatternFill('solid', start_color='FFFF00')
sheet['A1'].alignment = Alignment(horizontal='center')
# Column width
sheet.column_dimensions['A'].width = 20
wb.save('output.xlsx')
```
### Editing existing Excel files
```python
# Using openpyxl to preserve formulas and formatting
from openpyxl import load_workbook
# Load existing file
wb = load_workbook('existing.xlsx')
sheet = wb.active # or wb['SheetName'] for specific sheet
# Working with multiple sheets (use a separate name so `sheet` keeps pointing at the active sheet)
for sheet_name in wb.sheetnames:
    ws = wb[sheet_name]
    print(f"Sheet: {sheet_name} ({ws.max_row} rows)")
# Modify cells
sheet['A1'] = 'New Value'
sheet.insert_rows(2) # Insert row at position 2
sheet.delete_cols(3) # Delete column 3
# Add new sheet
new_sheet = wb.create_sheet('NewSheet')
new_sheet['A1'] = 'Data'
wb.save('modified.xlsx')
```
## Recalculating formulas
Excel files created or modified by openpyxl contain formulas as strings but not calculated values. Use the provided `scripts/recalc.py` script to recalculate formulas:
```bash
python3 scripts/recalc.py <excel_file> [timeout_seconds]
```
Example:
```bash
python3 scripts/recalc.py output.xlsx 30
```
The script:
- Automatically sets up a LibreOffice macro on first run
- Recalculates all formulas in all sheets
- Scans ALL cells for Excel errors (#REF!, #DIV/0!, etc.)
- Returns JSON with detailed error locations and counts
- Works on both Linux and macOS
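When the recalculation is driven from Python, the command above can be wrapped in a helper. This is a sketch under assumptions: it assumes the script prints its JSON report to stdout, and `recalc_command` and `run_recalc` are hypothetical names, not part of the script.

```python
import json
import subprocess


def recalc_command(excel_file, timeout_seconds=None, script="scripts/recalc.py"):
    """Build the argument list for the documented command line."""
    cmd = ["python3", script, str(excel_file)]
    if timeout_seconds is not None:
        cmd.append(str(timeout_seconds))
    return cmd


def run_recalc(excel_file, timeout_seconds=None):
    """Run the script and parse its report.

    Assumption: the JSON report is written to stdout. The exit code is not
    checked here because its meaning is not documented above.
    """
    proc = subprocess.run(
        recalc_command(excel_file, timeout_seconds),
        capture_output=True,
        text=True,
    )
    return json.loads(proc.stdout)
```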
## Formula Verification Checklist
Quick checks to ensure formulas work correctly:
### Essential Verification
- [ ] **Test 2-3 sample references**: Verify they pull correct values before building full model
- [ ] **Column mapping**: Confirm Excel columns match (e.g., column 64 = BL, not BK)
- [ ] **Row offset**: Remember Excel rows are 1-indexed and row 1 usually holds the header (the 5th DataFrame row, index 4, is Excel row 6)
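The column and row checks above are easy to get wrong by one, so a pair of small converters can help. The helper names below are hypothetical; openpyxl also ships `get_column_letter` in `openpyxl.utils` for the column case. The row helper assumes the DataFrame was read with a header occupying the first `header_rows` rows of the sheet.

```python
def column_letter(index):
    """Convert a 1-based column index to its Excel letter (1 -> A, 27 -> AA)."""
    if index < 1:
        raise ValueError("column index must be >= 1")
    letters = ""
    while index > 0:
        # Excel columns are bijective base 26, hence the -1 before dividing.
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def excel_row(df_index, header_rows=1):
    """Convert a 0-based DataFrame row position to a 1-based Excel row."""
    if df_index < 0:
        raise ValueError("df_index must be >= 0")
    return df_index + header_rows + 1
```

For example, `column_letter(64)` gives `BL`, and `excel_row(4)` gives 6 for a sheet with one header row.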
### Common Pitfalls
- [ ] **NaN handling**: Check for null values with `pd.notna()`
- [ ] **Far-right columns**: FY data often in columns 50+
- [ ] **Multiple matches**: Search all occurrences, not just first
- [ ] **Division by zero**: Check denominators before using `/` in formulas (#DIV/0!)
- [ ] **Wrong references**: Verify all cell references point to intended cells (#REF!)
- [ ] **Cross-sheet references**: Use correct format (Sheet1!A1) for linking sheets
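Cross-sheet references need single quotes when the sheet name contains spaces or punctuation, or when it looks like a cell address. The sketch below is a conservative approximation of that rule, not a full implementation of Excel's grammar; `sheet_ref` is a hypothetical helper, and quoting is always accepted when in doubt.

```python
import re

_SIMPLE_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
_LOOKS_LIKE_CELL = re.compile(r"[A-Za-z]{1,3}[0-9]+")


def sheet_ref(sheet_name, cell):
    """Build a cross-sheet reference, quoting the sheet name when needed."""
    simple = _SIMPLE_NAME.fullmatch(sheet_name) is not None
    cell_like = _LOOKS_LIKE_CELL.fullmatch(sheet_name) is not None
    if simple and not cell_like:
        return f"{sheet_name}!{cell}"
    # Inside quotes, an apostrophe in the name is written twice.
    escaped = sheet_name.replace("'", "''")
    return f"'{escaped}'!{cell}"
```

A formula can then be assembled as `"=" + sheet_ref("Q1 Data", "B2")`, which avoids a `#REF!` caused by an unquoted name.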
### Formula Testing Strategy
- [ ] **Start small**: Test formulas on 2-3 cells before applying broadly
- [ ] **Verify dependencies**: Check all cells referenced in formulas exist
- [ ] **Test edge cases**: Include zero, negative, and very large values
### Interpreting scripts/recalc.py Output
The script returns JSON with error details:
```json
{
"status": "success", // or "errors_found"
"total_errors": 0, // Total error count
"total_formulas": 42, // Number of formulas in file
"error_summary": { // Only present if errors found
"#REF!": {
"count": 2,
"locations": ["Sheet1!B5", "Sheet1!C10"]
}
}
}
```
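A report in this shape can be reduced to a pass/fail flag plus one line per error type. This is a minimal sketch that assumes only the fields shown above; `summarize_recalc` is a hypothetical helper, and the sample report mirrors the example rather than real script output.

```python
import json


def summarize_recalc(result):
    """Return (ok, lines): ok is True when the report contains no errors."""
    if result.get("status") == "success" and result.get("total_errors", 0) == 0:
        return True, []
    lines = []
    for error_type, detail in sorted(result.get("error_summary", {}).items()):
        locations = ", ".join(detail.get("locations", []))
        lines.append(f"{error_type} x{detail.get('count', 0)}: {locations}")
    return False, lines


# Sample report modeled on the structure above (comments removed so it parses).
sample = json.loads("""
{
  "status": "errors_found",
  "total_errors": 2,
  "total_formulas": 42,
  "error_summary": {
    "#REF!": {"count": 2, "locations": ["Sheet1!B5", "Sheet1!C10"]}
  }
}
""")
ok, report = summarize_recalc(sample)
```

The listed locations are the cells to fix before running the recalculation again.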
## Best Practices
### Library Selection
- **pandas**: Best for data analysis, bulk operations, and simple data export
- **openpyxl**: Best for complex formatting, formulas, and Excel-specific features
### Working with openpyxl
- Cell indices are 1-based (row=1, column=1 refers to cell A1)
- Use `data_only=True` to read calculated values: `load_workbook('file.xlsx', data_only=True)`
- **Warning**: If opened with `data_only=True` and saved, formulas are replaced with values and permanently lost
- For large files: Use `read_only=True` for reading or `write_only=True` for writing
- Formulas are preserved but not evaluated - use scripts/recalc.py to update values
### Working with pandas
- Specify data types to avoid inference issues: `pd.read_excel('file.xlsx', dtype={'id': str})`
- For large files, read specific columns: `pd.read_excel('file.xlsx', usecols='A,C,E')` (a string selects Excel column letters; a list of strings selects by header name)
- Handle dates properly: `pd.read_excel('file.xlsx', parse_dates=['date_column'])`
## Code Style Guidelines
**IMPORTANT**: When generating Python code for Excel operations:
- Write minimal, concise Python code without unnecessary comments
- Avoid verbose variable names and redundant operations
- Avoid unnecessary print statements
**For Excel files themselves**:
- Add comments to cells with complex formulas or important assumptions
- Document data sources for hardcoded values
- Include notes for key calculations and model sections