9router/gitbook/content/en/features/combos.md
2026-05-11 11:50:24 +07:00

# Combos - Custom Fallback Chains
Create custom model combinations with automatic fallback. Combos let you define your own routing strategy based on cost, quality, and availability.

---
## What Are Combos?
Combos are **custom fallback chains** that you create in the dashboard. Instead of using a single model, you define a sequence of models that 9Router tries in order.
**Example:**
```
Combo name: premium-coding
Models:
1. cc/claude-opus-4-5-20251101 (try first)
2. glm/glm-4.7 (if #1 quota exhausted)
3. minimax/MiniMax-M2.1 (if #2 quota exhausted)
```
**Usage in CLI:**
```
Model: premium-coding
```
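The try-in-order behavior can be pictured with a minimal Python sketch. This is an illustration of the idea only, not 9Router's actual implementation: `QuotaExhausted`, `run_combo`, and `fake_call` are hypothetical names, and the sketch assumes a failed model signals itself by raising an exception.

```python
from typing import Callable, Sequence


class QuotaExhausted(Exception):
    """Hypothetical error a provider call raises when its quota is used up."""


def run_combo(
    models: Sequence[str],
    call_model: Callable[[str, str], str],
    prompt: str,
) -> tuple[str, str]:
    """Try each model in order; return (model, reply) from the first success."""
    failures = []
    for model in models:
        try:
            return model, call_model(model, prompt)
        except QuotaExhausted as exc:
            # Remember why this model was skipped, then fall through to the next.
            failures.append(f"{model}: {exc}")
    raise RuntimeError("all models in combo failed: " + "; ".join(failures))


# The premium-coding chain from the example above.
PREMIUM_CODING = [
    "cc/claude-opus-4-5-20251101",
    "glm/glm-4.7",
    "minimax/MiniMax-M2.1",
]


def fake_call(model: str, prompt: str) -> str:
    """Stand-in provider call that pretends the Claude quota is exhausted."""
    if model.startswith("cc/"):
        raise QuotaExhausted("quota exhausted")
    return f"[{model}] reply to: {prompt}"
```

With the stand-in above, `run_combo(PREMIUM_CODING, fake_call, "hello")` skips the first entry and returns the reply from `glm/glm-4.7`.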
9Router automatically tries each model in sequence until one succeeds.

---
## Why Use Combos?
### 1. Maximize Subscription Value
```
cc/claude-opus → glm/glm-4.7 → if/kimi-k2-thinking
→ Use subscription first, cheap backup, free emergency
→ Get full value from subscriptions you already pay for
```
### 2. Minimize Costs
```
glm/glm-4.7 → minimax/MiniMax-M2.1 → if/kimi-k2-thinking
→ Start with cheapest paid option ($0.60/1M)
→ Fallback to even cheaper ($0.20/1M)
→ Emergency free tier
→ Total cost: ~$5-10/month vs $2000 on ChatGPT API
```
### 3. Ensure 24/7 Availability
```
cc/claude-opus → cx/gpt-5.2-codex → glm/glm-4.7 → if/kimi-k2-thinking
→ Always include free tier at the end
→ Never run out of quota
→ Code anytime, anywhere
```
### 4. Optimize for Quality
```
cc/claude-opus-4-5 → cx/gpt-5.2-codex → gc/gemini-3-pro
→ Best models first
→ Fallback to other premium models
→ Maintain high quality across fallback chain
```
---
## How to Create Combos
### Step 1: Open Dashboard
```
http://localhost:20128
→ Login with your password
```
### Step 2: Navigate to Combos
```
Dashboard → Combos → Create New Combo
```
### Step 3: Configure Combo
**Combo Name:**
```
premium-coding
```
**Description (optional):**
```
Subscription first, cheap backup, free emergency
```
**Select Models:**
```
1. cc/claude-opus-4-5-20251101
2. glm/glm-4.7
3. minimax/MiniMax-M2.1
```
**Drag to reorder**: priority runs from top to bottom.
### Step 4: Save
```
Click "Save Combo"
→ Combo appears in model list
```
### Step 5: Use in CLI
```
Cursor/Cline/Any tool:
Model: premium-coding
```
---
## Example Combos
### Example 1: Premium Coding (Subscription → Cheap → Free)
**Goal**: Maximize subscription value, minimize extra costs.
```
Dashboard → Combos → Create New
Name: premium-coding
Models:
1. cc/claude-opus-4-5-20251101
2. glm/glm-4.7
3. minimax/MiniMax-M2.1
```
**Usage:**
```
Cursor IDE:
Model: premium-coding
```
**Behavior:**
```
Morning (fresh quota):
Request → cc/claude-opus-4-5 ✅
Afternoon (Claude quota out):
Request → glm/glm-4.7 ✅ (auto switched)
Evening (GLM quota out):
Request → minimax/MiniMax-M2.1 ✅ (auto switched)
```
**Monthly cost (100M tokens):**
```
80M via Claude Code: $0 (subscription)
15M via GLM: $9
5M via MiniMax: $1
Total: $10 + your subscription
```
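The arithmetic behind this breakdown can be reproduced with a short sketch. It uses the per-1M-token prices quoted earlier in this guide ($0.60 for GLM, $0.20 for MiniMax) and counts subscription usage as $0 extra; `PRICE_PER_MILLION` and `extra_cost` are illustrative names, not part of 9Router.

```python
# Prices per 1M tokens as quoted in this guide; subscription usage adds $0.
PRICE_PER_MILLION = {
    "cc/claude-opus-4-5-20251101": 0.00,
    "glm/glm-4.7": 0.60,
    "minimax/MiniMax-M2.1": 0.20,
}


def extra_cost(usage_millions: dict[str, float]) -> float:
    """Return the pay-per-token cost in dollars for a monthly usage split."""
    total = sum(
        PRICE_PER_MILLION[model] * millions
        for model, millions in usage_millions.items()
    )
    return round(total, 2)


# The 100M-token split from the breakdown above.
usage = {
    "cc/claude-opus-4-5-20251101": 80,
    "glm/glm-4.7": 15,
    "minimax/MiniMax-M2.1": 5,
}
print(extra_cost(usage))  # 10.0
```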
**Savings**: ~99% vs ChatGPT API ($2000), not counting your subscription fee.

---
### Example 2: Budget Combo (Cheap → Free)
**Goal**: Minimize costs, use free tier as backup.
```
Dashboard → Combos → Create New
Name: budget-combo
Models:
1. glm/glm-4.7
2. minimax/MiniMax-M2.1
3. if/kimi-k2-thinking
```
**Usage:**
```
Cline:
Provider: OpenAI Compatible
Base URL: http://localhost:20128/v1
Model: budget-combo
```
**Behavior:**
```
Request → glm/glm-4.7
✅ Daily quota available → Use GLM ($0.60/1M)
❌ Quota exhausted → Try MiniMax ($0.20/1M)
❌ MiniMax quota out → Use iFlow (FREE)
```
**Monthly cost (100M tokens):**
```
70M via GLM: $42
20M via MiniMax: $4
10M via iFlow: $0
Total: $46 vs $2000 on ChatGPT API
```
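The savings figure follows directly from the two totals. A minimal sketch, taking the $2000 ChatGPT API baseline from this guide as given (`savings_percent` is an illustrative helper name):

```python
def savings_percent(combo_cost: float, baseline_cost: float) -> float:
    """Return how much cheaper the combo is than the baseline, in percent."""
    if baseline_cost <= 0:
        raise ValueError("baseline_cost must be positive")
    return round((1 - combo_cost / baseline_cost) * 100, 1)


print(savings_percent(46, 2000))  # 97.7
```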
**Savings**: about 97.7% ($46 vs $2000).

---
### Example 3: Free Combo (Zero Cost)
**Goal**: 100% free, no costs ever.
```
Dashboard → Combos → Create New
Name: free-combo
Models:
1. if/kimi-k2-thinking
2. qw/qwen3-coder-plus
3. kr/claude-sonnet-4.5
```
**Usage:**
```
Claude Desktop:
Model: free-combo
```
**Behavior:**
```
Request → if/kimi-k2-thinking
✅ Available → Use iFlow
❌ Error → Try Qwen
❌ Error → Try Kiro
```
**Monthly cost:**
```
100M tokens via free providers: $0
Total: $0 forever
```
**Use case**: Personal projects, learning, experimentation.

---
### Example 4: Quality First (Premium Models Only)
**Goal**: Best quality, no cheap fallback.
```
Dashboard → Combos → Create New
Name: quality-first
Models:
1. cc/claude-opus-4-5-20251101
2. cx/gpt-5.2-codex
3. gc/gemini-3-pro-preview
```
**Usage:**
```
Codex CLI:
export OPENAI_BASE_URL="http://localhost:20128"
Model: quality-first
```
**Behavior:**
```
Request → cc/claude-opus-4-5
❌ Quota out → cx/gpt-5.2-codex
❌ Quota out → gc/gemini-3-pro-preview
❌ All out → Return error (no cheap fallback)
```
**Use case**: Critical production code, complex refactoring.

---
### Example 5: Multi-Subscription (Maximize All)
**Goal**: Use all subscriptions before paying extra.
```
Dashboard → Combos → Create New
Name: multi-sub
Models:
1. gc/gemini-3-flash-preview (FREE 180K/month)
2. cc/claude-opus-4-5-20251101 (Pro subscription)
3. cx/gpt-5.2-codex (Plus subscription)
4. gh/gpt-5 (Copilot subscription)
5. glm/glm-4.7 (Cheap backup)
6. if/kimi-k2-thinking (Free emergency)
```
**Monthly cost (200M tokens):**
```
50M via Gemini CLI: $0 (free tier)
80M via Claude Code: $0 (subscription)
40M via Codex: $0 (subscription)
20M via Copilot: $0 (subscription)
8M via GLM: $4.80
2M via iFlow: $0
Total: $4.80 + existing subscriptions
```
**Result**: 190M tokens come from subscriptions and the Gemini free tier, 2M from iFlow, and only 8M are paid for ($4.80 extra).

---
### Example 6: Quota Reset Optimization
**Goal**: Distribute usage based on reset times.
```
Dashboard → Combos → Create New
Name: reset-optimized
Models:
1. cc/claude-opus-4-5 (5h reset, use morning)
2. gc/gemini-3-flash (1K/day, use afternoon)
3. glm/glm-4.7 (daily 10AM reset, use evening)
4. minimax/MiniMax-M2.1 (5h rolling, use night)
5. if/kimi-k2-thinking (unlimited, emergency)
```
**Daily routine:**
```
08:00 - 13:00: Claude Code (fresh 5h quota)
13:00 - 18:00: Gemini CLI (1K/day quota)
18:00 - 22:00: GLM (resets 10AM next day)
22:00 - 08:00: MiniMax (5h rolling) or iFlow
```
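The routine above is a plan, not a scheduler: the combo still just tries models in order, and the time windows describe when each quota is expected to be the one in use. As a planning aid, the table can be written as a small lookup. `ROUTINE` and `expected_provider` are illustrative names only.

```python
# (start_hour, end_hour, provider) windows copied from the daily routine above.
ROUTINE = [
    (8, 13, "Claude Code"),
    (13, 18, "Gemini CLI"),
    (18, 22, "GLM"),
    (22, 8, "MiniMax or iFlow"),  # this window wraps past midnight
]


def expected_provider(hour: int) -> str:
    """Return the provider the routine expects to be serving at a given hour."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be between 0 and 23")
    for start, end, provider in ROUTINE:
        if start < end:
            if start <= hour < end:
                return provider
        elif hour >= start or hour < end:
            # Overnight window: matches late evening and early morning.
            return provider
    raise AssertionError("routine does not cover this hour")
```

For example, `expected_provider(9)` returns `"Claude Code"` and `expected_provider(3)` returns `"MiniMax or iFlow"`.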
**Result**: Code 24/7 with minimal costs.

---
## Use Combos in CLI Tools
### Cursor IDE
```
Settings → Models → Advanced:
OpenAI API Base URL: http://localhost:20128/v1
OpenAI API Key: [from dashboard]
Model: premium-coding
```
### Claude Desktop
Edit `~/.claude/config.json`:
```json
{
"anthropic_api_base": "http://localhost:20128/v1",
"anthropic_api_key": "your-9router-api-key",
"model": "budget-combo"
}
```
### Codex CLI
```bash
export OPENAI_BASE_URL="http://localhost:20128"
export OPENAI_API_KEY="your-9router-api-key"
codex --model quality-first "your prompt"
```
### Cline / Continue / RooCode
```
Provider: OpenAI Compatible
Base URL: http://localhost:20128/v1
API Key: [from dashboard]
Model: free-combo
```
### API Request
```bash
curl http://localhost:20128/v1/chat/completions \
-H "Authorization: Bearer your-api-key" \
-H "Content-Type: application/json" \
-d '{
"model": "premium-coding",
"messages": [
{"role": "user", "content": "Write a function to..."}
],
"stream": true
}'
```
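The same request can be built from Python with only the standard library. This is a sketch under assumptions: it uses the endpoint and headers from the curl example, but sets `"stream": False` to keep response handling simple, and it assumes the server returns a single JSON body in that mode. `build_chat_request` and `send` are illustrative helper names.

```python
import json
import urllib.request

BASE_URL = "http://localhost:20128/v1"  # base URL used throughout this guide


def build_chat_request(model: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Build a POST request to the chat completions endpoint for a combo."""
    payload = {
        "model": model,  # a combo name such as "premium-coding"
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # assumption: non-streaming responses are supported
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)
```

Calling `send(build_chat_request("premium-coding", "Write a function to...", "your-api-key"))` requires a running 9Router instance; building the request alone does not touch the network.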
---
## Best Practices
### 1. Always Include Free Tier
```
✅ Good:
cc/claude-opus → glm/glm-4.7 → if/kimi-k2-thinking
❌ Bad:
cc/claude-opus → glm/glm-4.7
(no free fallback, can run out of quota)
```
**Why**: Ensures 24/7 availability, never blocked by quota.
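A quick check for this rule can be sketched in a few lines. It assumes a model counts as free when it uses one of the provider prefixes from the free-combo example (`if/`, `qw/`, `kr/`); `FREE_PREFIXES` and `ends_with_free_tier` are illustrative names, and the prefix list should be adjusted to the providers you actually use.

```python
# Assumption: prefixes of the providers used in the free-combo example.
FREE_PREFIXES = ("if/", "qw/", "kr/")


def ends_with_free_tier(models: list[str]) -> bool:
    """Return True if the last model in the chain is a free-tier model."""
    return bool(models) and models[-1].startswith(FREE_PREFIXES)


good = ["cc/claude-opus", "glm/glm-4.7", "if/kimi-k2-thinking"]
bad = ["cc/claude-opus", "glm/glm-4.7"]
print(ends_with_free_tier(good), ends_with_free_tier(bad))  # True False
```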
### 2. Order by Cost (Cheap to Expensive)
```
✅ Good:
glm/glm-4.7 → minimax/MiniMax-M2.1 → cc/claude-opus
❌ Bad:
cc/claude-opus → glm/glm-4.7
(wastes subscription quota on simple tasks)
```
**Exception**: If you want to maximize subscription value, put the subscription model first.
### 3. Match Quality Requirements
```
For production code:
cc/claude-opus → cx/gpt-5.2-codex → glm/glm-4.7
For quick tasks:
glm/glm-4.7 → if/kimi-k2-thinking
For experimentation:
if/kimi-k2-thinking → qw/qwen3-coder-plus
```
### 4. Consider Quota Reset Times
```
Morning combo (fresh quotas):
cc/claude-opus → cx/gpt-5.2-codex
Evening combo (quotas likely exhausted):
glm/glm-4.7 → minimax/MiniMax-M2.1 → if/kimi-k2-thinking
```
### 5. Create Multiple Combos for Different Use Cases
```
premium-coding: For complex tasks
budget-combo: For simple tasks
free-combo: For experimentation
quality-first: For production code
```
**Switch between combos** based on task requirements.
### 6. Monitor Combo Performance
```
Dashboard → Analytics → Combo Usage:
premium-coding:
80% via cc/claude-opus (good, using subscription)
15% via glm/glm-4.7 (acceptable backup)
5% via minimax (rare fallback)
```
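If you keep your own record of which model served each request, the same percentages can be computed with a short sketch. The input format (a plain list of model names, one per request) and the names `usage_share` and `fallback_share` are assumptions for illustration, not a 9Router export format.

```python
from collections import Counter


def usage_share(served_by: list[str]) -> dict[str, float]:
    """Return the percentage of requests served by each model."""
    total = len(served_by)
    if total == 0:
        return {}
    counts = Counter(served_by)
    return {model: round(100 * n / total, 1) for model, n in counts.items()}


def fallback_share(served_by: list[str], primary: str) -> float:
    """Return the percentage of requests not served by the primary model."""
    if not served_by:
        return 0.0
    fallbacks = sum(1 for model in served_by if model != primary)
    return round(100 * fallbacks / len(served_by), 1)


# Hypothetical log matching the split shown above.
log = ["cc/claude-opus"] * 80 + ["glm/glm-4.7"] * 15 + ["minimax"] * 5
print(fallback_share(log, "cc/claude-opus"))  # 20.0
```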
**Optimize**: If fallback usage is too high, increase the primary model's quota or reorder the models.

---
## Advanced Configuration
### Set Budget Limits per Combo
```
Dashboard → Combos → Edit → Budget:
Daily limit: $5
Monthly limit: $50
```
When a limit is reached, 9Router skips paid models and uses the free tier only.
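The effect of a budget limit on a chain can be sketched as a simple filter. This only illustrates the rule stated above and is not 9Router's internal logic; `models_within_budget` and the `is_free` callback are hypothetical, and the sketch checks a single daily limit.

```python
from typing import Callable


def models_within_budget(
    models: list[str],
    spent_today: float,
    daily_limit: float,
    is_free: Callable[[str], bool],
) -> list[str]:
    """Return the models still eligible once the daily budget is considered."""
    if spent_today < daily_limit:
        return list(models)  # under budget: the whole chain stays available
    # Limit reached: keep only free-tier models, preserving their order.
    return [model for model in models if is_free(model)]


chain = ["cc/claude-opus-4-5", "glm/glm-4.7", "if/kimi-k2-thinking"]


def is_iflow(model: str) -> bool:
    """Assumption for this example: only iFlow (if/) models are free."""
    return model.startswith("if/")


print(models_within_budget(chain, 5.0, 5.0, is_iflow))  # ['if/kimi-k2-thinking']
```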
### Enable/Disable Models in Combo
```
Dashboard → Combos → Edit → Models:
✅ cc/claude-opus-4-5 (enabled)
❌ glm/glm-4.7 (temporarily disabled)
✅ if/kimi-k2-thinking (enabled)
```
**Use case**: Temporarily disable expensive models without deleting the combo.
### Clone Existing Combo
```
Dashboard → Combos → Clone "premium-coding"
→ Creates copy with "-copy" suffix
→ Modify and save as new combo
```
**Use case**: Create variations for different scenarios.

---
## Troubleshooting
**Issue: Combo not appearing in model list**

**Solution:**
1. Refresh the dashboard
2. Check that the combo is saved (green checkmark)
3. Restart the CLI tool to refresh its model list

**Issue: Combo always uses last model (free tier)**

**Solution:**
1. Check quota for primary models (Dashboard → Quota)
2. Verify API keys are valid (Dashboard → Providers)
3. Check that budget limits have not been exceeded

**Issue: Combo costs more than expected**

**Solution:**
1. Dashboard → Analytics → Review combo usage
2. Check if primary models are quota-exhausted
3. Reorder models (put cheaper first)
4. Set budget limits
---
## Related
- [Smart Routing](./smart-routing.md) - How auto fallback works
- [Quota Tracking](./quota-tracking.md) - Monitor usage and costs